Canonical forms of Linear Transformations

David M. Rocke
Department of Applied Science
UC Davis

April 16, 2003

1. Overview
A canonical form of a linear transformation is a matrix representation in a basis chosen to
make that representation simple in form. The most common canonical form is a diagonal
matrix. In this section of the course, we explore canonical forms with three main types of
results:

1. For real symmetric or complex Hermitian matrices, we show that they are always
diagonalizable; i.e., there exists a basis wrt which the matrix representation is diagonal.

2. For a general linear transformation, we show that it is diagonalizable if and only if its
minimal polynomial is a product of distinct linear factors.

3. For any linear transformation for which the characteristic polynomial factors com-
pletely (this is all linear transformations if the field is C), there is a matrix represen-
tation in Jordan canonical form.

Most of this will be shown directly in class, assuming the standard facts about real and
complex numbers and the solution and factoring of polynomials. The Cayley-Hamilton theorem
is given without proof, as its proof would be too lengthy.

2. Eigenvalues and Eigenvectors


Let T ∈ L(V ). If T x = λx for some nonzero x ∈ V , we say that λ is an eigenvalue with
associated eigenvector x. If λ is an eigenvalue of T , then the eigenspace associated with λ is
{x ∈ V | T x = λx}.
The following is an immediate consequence of the definitions.

Theorem 1 Suppose T ∈ L(V ), with V finite dimensional. Then the following are equiv-
alent:

1. λ is an eigenvalue of T .

2. T − λI is singular and therefore not invertible.

3. |T − λI| = 0.

If u is a scalar variable, then |T − uI| is a polynomial of degree n in u called the
characteristic polynomial. Its roots are exactly the eigenvalues, according to the theorem
above. Next, we show that the characteristic polynomial is independent of the basis
representation.
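
Before doing so, here is a quick numerical check of Theorem 1 (a Python sketch assuming NumPy is available; the 2 × 2 matrix is an arbitrary example): the roots of the characteristic polynomial agree with the eigenvalues, and A − λI is singular at each of them.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    coeffs = np.poly(A)              # coefficients of det(uI - A); same roots as |A - uI|
    roots = np.roots(coeffs)         # roots of the characteristic polynomial
    eigvals = np.linalg.eigvals(A)   # eigenvalues computed directly

    print(roots, eigvals)                              # the two lists agree (up to ordering)
    for lam in eigvals:
        print(np.linalg.det(A - lam * np.eye(2)))      # each is ~0: A - lam*I is singular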

Lemma 1 Let T ∈ L(V ), V finite dimensional, and let A be a matrix representation of T
in a particular basis. If P is a basis transformation, then a point x in the old basis is P x in
the new basis, and T has matrix representation B = P^{-1}AP in the new basis. Then

    |B − uI| = |P^{-1}AP − uI|
             = |P^{-1}AP − uP^{-1}P|
             = |P^{-1}(A − uI)P|
             = |P^{-1}| |A − uI| |P|
             = |A − uI|

since |P^{-1}| = |P|^{-1}.
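
A quick numerical illustration of Lemma 1 (a sketch with randomly generated matrices, assuming NumPy; P is invertible with probability one): the similar matrix B = P^{-1}AP has the same characteristic polynomial as A.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    P = rng.standard_normal((4, 4))              # invertible with probability 1
    B = np.linalg.inv(P) @ A @ P                 # the same transformation in a new basis

    print(np.allclose(np.poly(A), np.poly(B)))   # True: identical characteristic polynomials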

A linear transformation T is diagonalizable if there exists a basis wrt which the matrix
representation of T is diagonal, or, equivalently, if there is a basis of the whole vector space
consisting of eigenvectors of T . If T has a diagonal representation, and λ1 , λ2 , . . . , λk are
the distinct eigenvalues, then T ’s matrix representation consists of k diagonal blocks, of
which the ith block is the diagonal matrix λi I_{di×di} , where di is the number of repetitions
of λi , and the characteristic polynomial of T is (u − λ1)^{d1} (u − λ2)^{d2} · · · (u − λk)^{dk} .
For diagonalizable transformations, the dimension of the eigenspace of λi is di .

Lemma 2 If T ∈ L(V ), V finite dimensional of dimension n, and if T has n distinct
eigenvalues, then T is diagonalizable.

Proof. Let (λ1 , x1 ), (λ2 , x2 ), . . . , (λn , xn ) be the distinct eigenvalues and an associated
eigenvector for each. We show that {x1 , x2 , . . . , xn } is a linearly independent set. Then this
must be a basis of V since the space spanned by them is of dimension n and must therefore
be the whole space.
Suppose, on the contrary, that

    a1 x1 + a2 x2 + · · · + an xn = 0                                        (2.1)

for some set of constants ai not all zero. Without loss of generality, let a1 ≠ 0. Applying T
to (2.1), we get

    a1 λ1 x1 + a2 λ2 x2 + · · · + an λn xn = 0

so that

    0 = λn (a1 x1 + a2 x2 + · · · + an xn ) − (a1 λ1 x1 + a2 λ2 x2 + · · · + an λn xn )
      = a1 (λn − λ1 )x1 + a2 (λn − λ2 )x2 + · · · + an (λn − λn )xn
      = a1 (λn − λ1 )x1 + a2 (λn − λ2 )x2 + · · · + an−1 (λn − λn−1 )xn−1

We can then apply T again, multiply by λn−1 and subtract, and continue the process
to obtain

    a1 (λn − λ1 )(λn−1 − λ1 ) · · · (λ2 − λ1 )x1 = 0

Since all the eigenvalues are distinct by hypothesis and x1 ≠ 0, this means that a1 = 0, a
contradiction. Thus, the xi are linearly independent.
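
A small computational sketch of Lemma 2 (the matrix is an arbitrary example, assuming NumPy): its three eigenvalues are distinct, so the matrix X of eigenvectors is invertible and X^{-1}AX is diagonal.

    import numpy as np

    A = np.array([[3.0, 1.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [0.0, 0.0, 1.0]])              # upper triangular, eigenvalues 3, 2, 1

    vals, X = np.linalg.eig(A)                   # columns of X are eigenvectors
    print(np.linalg.matrix_rank(X))              # 3: the eigenvectors are linearly independent
    D = np.linalg.inv(X) @ A @ X
    print(np.allclose(D, np.diag(vals)))         # True: A is diagonalizable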

3. Symmetric and Hermitian matrices


A symmetric matrix A is one in which A^T = A. A Hermitian matrix A has A∗ = A,
where A∗ is the conjugate transpose of A. An orthogonal matrix A is one in which
A^T A = I and a unitary matrix A is one in which A∗ A = I.

Lemma 3 All eigenvalues of a symmetric or Hermitian matrix are real.

Proof. Let λ be an eigenvalue of a Hermitian matrix A and let x be an associated
eigenvector. Then Ax = λx, so multiplying by x∗ on the left we obtain x∗ Ax = λx∗ x.
Taking the adjoint of both sides of the equation defining the eigenvalue, we obtain x∗ A∗ =
λ̄x∗ , yielding x∗ A∗ x = λ̄x∗ x by multiplying by x on the right. Since A∗ = A, this implies
that λx∗ x = λ̄x∗ x. Since x∗ x cannot be zero for nonzero x, this implies λ̄ = λ, so that λ is
real. The proof for symmetric matrices is the same.

Lemma 4 Eigenvectors corresponding to distinct eigenvalues of a real symmetric or
Hermitian matrix are orthogonal.

Proof. Suppose that (λ1 , x1 ) and (λ2 , x2 ) are eigenvalue/eigenvector pairs for a Hermitian
matrix A, and suppose that λ1 ≠ λ2 . Then Ax1 = λ1 x1 and so x∗2 Ax1 = λ1 x∗2 x1 (multiplying
on the left by x∗2 ). Also, Ax2 = λ2 x2 , so x∗2 A∗ = x∗2 A = λ2 x∗2 (using the definition of
Hermitian and the previous lemma), and thus x∗2 Ax1 = λ2 x∗2 x1 (multiplying on the right
by x1 ). This in turn implies that λ1 x∗2 x1 = λ2 x∗2 x1 . Since λ1 ≠ λ2 , x∗2 x1 = 0; that is, the
two vectors are orthogonal. The proof for symmetric matrices is the same.
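
A numerical check of Lemmas 3 and 4 on a small Hermitian matrix (chosen only as an example, assuming NumPy): the eigenvalues come out real, and the eigenvectors for the two distinct eigenvalues are orthogonal.

    import numpy as np

    A = np.array([[2.0, 1.0 - 1.0j],
                  [1.0 + 1.0j, 3.0]])            # A* = A, i.e. A is Hermitian

    vals, vecs = np.linalg.eig(A)
    print(np.allclose(vals.imag, 0.0))           # True: the eigenvalues are real (Lemma 3)
    x1, x2 = vecs[:, 0], vecs[:, 1]
    print(abs(np.vdot(x2, x1)) < 1e-10)          # True: x2* x1 = 0 (Lemma 4)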

Theorem 2 Let A be a Hermitian (real symmetric) matrix. Then A is diagonalizable, and
the basis transformation matrix P can be chosen to be unitary (orthogonal).

Proof. Let A be Hermitian and let (λ1 , x1 ) be an eigenvalue/eigenvector pair. Construct
an orthonormal basis of V consisting of y1 = x1 /||x1 ||, y2 , . . . , yn . The basis transformation
into this new coordinate system has a matrix P satisfying P ∗ P = I because the new basis
is orthonormal. Let B be the matrix in the new coordinate system, so that B = P ∗ AP .
Note that B is still Hermitian because B ∗ = (P ∗ AP )∗ = P ∗ A∗ P = P ∗ AP = B. Also,
B has the same eigenvalues as A since it has the same characteristic polynomial, and
y1 is an eigenvector because y1 in the new coordinate system is P ∗ x1 /||x1 || and By1 =
P ∗ AP P ∗ x1 /||x1 || = P ∗ Ax1 /||x1 || = λ1 P ∗ x1 /||x1 || = λ1 y1 . The fact that Ay1 = λ1 y1
implies that the matrix representation of the linear transformation in the new basis has a
first column consisting of (λ1 , 0, . . . , 0)^T . This is so because we know that the vector with
coordinates (1, 0, . . . , 0) is mapped into (λ1 , 0, . . . , 0). Now consider a basis vector yi with
i ≠ 1. Byi ∈ V , so Byi = cy1 + ỹ, where ỹ is a linear combination of y2 , y3 , . . . , yn and is
therefore orthogonal to y1 . Now

    y1∗ Byi = cy1∗ y1 + y1∗ ỹ = c.

However, y1∗ B = λ1 y1∗ so y1∗ Byi = λ1 y1∗ yi = 0. Thus, c = 0. This means that B has a first
row in which all the off-diagonal elements are zero and is of the form
    B = [ λ1  0 ]
        [ 0   C ]

Here C is a Hermitian (real symmetric) matrix of degree n − 1, since

    B∗ = [ λ1  0 ]∗  =  [ λ1  0  ]
         [ 0   C ]      [ 0   C∗ ]

and B∗ = B, so that C∗ = C.
We now finish the proof by induction on the dimension n. For n = 1, the result is obvious
since 1 × 1 matrices are trivially diagonal. If the result is true for Hermitian matrices of
dimension n − 1, then there is a unitary matrix Q such that Q∗ CQ is diagonal. This implies
that B is diagonalized by the unitary matrix

    R = [ 1  0 ]
        [ 0  Q ]

and hence A is diagonalized by the unitary (orthogonal in the real symmetric case) matrix
P R, which completes the induction.
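
Theorem 2 in computational form, for a sample Hermitian matrix (the matrix itself is an assumption for the sketch; np.linalg.eigh is NumPy's routine for Hermitian eigenproblems): the eigenvector matrix P is unitary and P∗AP is diagonal.

    import numpy as np

    A = np.array([[2.0, 1.0 - 1.0j, 0.0],
                  [1.0 + 1.0j, 3.0, 1.0j],
                  [0.0, -1.0j, 1.0]])            # Hermitian: A* = A

    vals, P = np.linalg.eigh(A)                  # columns of P form an orthonormal eigenbasis
    print(np.allclose(P.conj().T @ P, np.eye(3)))            # True: P is unitary
    print(np.allclose(P.conj().T @ A @ P, np.diag(vals)))    # True: diagonal in the new basis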

4. A Criterion for General Diagonalizability


In this section, we derive a criterion for diagonalizability based on the minimal polynomial,
which is related to the characteristic polynomial. Consider a diagonal matrix A with distinct
eigenvalues λ1 , λ2 , . . . , λk . Then as previously observed, A is of the form

    A = [ λ1 I1     0     · · ·     0    ]
        [   0     λ2 I2   · · ·     0    ]
        [   .       .       .       .    ]
        [   0       0     · · ·   λk Ik  ]

where Ii is a di × di identity matrix. This means that the characteristic polynomial of
A is p(u) = (u − λ1)^{d1} (u − λ2)^{d2} · · · (u − λk)^{dk} . Now A − λ1 I has the first d1 diagonal
entries equal to zero, A − λ2 I has the next d2 diagonal entries equal to zero, and so on, so
that the matrix (A − λ1 I)(A − λ2 I) · · · (A − λk I) = 0, the matrix of all zeroes. A polynomial
of lower degree than the characteristic polynomial may therefore serve: whenever some
eigenvalue is repeated (di > 1), the product of the distinct linear factors has degree k < n
but still gives the zero matrix when A is substituted for the variable. Thus, for any diagonal
matrix A, there is a polynomial p(u) of degree k ≤ n which is a product of distinct linear
factors and which, when A is substituted in for the variable, produces the zero linear
transformation. Note also that this polynomial has the same distinct linear factors as the
characteristic polynomial, and in any case divides the characteristic polynomial. The rest
of the section is spent showing that this condition is sufficient as well as necessary. To do
this, we will need to take a short detour into the study of polynomial ideals.
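
Before that detour, a concrete instance of the observation above (a sketch with a made-up diagonal matrix, assuming NumPy): the eigenvalue 5 is repeated, so the characteristic polynomial (u − 5)^2 (u − 2) has degree 3, yet the degree-2 product of the distinct linear factors already sends A to zero.

    import numpy as np

    A = np.diag([5.0, 5.0, 2.0])                     # diagonal, eigenvalues 5 (twice) and 2
    I = np.eye(3)

    print(np.allclose((A - 5*I) @ (A - 2*I), 0.0))   # True: (A - 5I)(A - 2I) = 0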

4.1. Polynomial Ideals


The set of polynomials P is a vector space, and in fact is a commutative algebra with
ordinary multiplication of polynomials as the algebra’s multiplication. A polynomial ideal
is a subspace F ⊂ P which is closed under multiplication by polynomials; that is, whenever
f ∈ F and p ∈ P, then f p ∈ F . For f ∈ P, the principal ideal generated by f , denoted
⟨f ⟩, is the smallest polynomial ideal containing f . It consists of all polynomials with f as
a factor. As it turns out, every polynomial ideal is a principal ideal:

Lemma 5 Let F ⊂ P be a polynomial ideal. Then there exists a monic (leading coefficient
1) polynomial f with F = ⟨f ⟩.

Proof. Let f be a nonzero polynomial in F of smallest degree, and wlog let f be monic.
We show that F = ⟨f ⟩. Let g ∈ F. It is sufficient to show that f divides g. By polynomial
division with remainders, we can write g = f q + r, where r = 0 or deg(r) < deg(f ). But by hypoth-
esis, g ∈ F, and f q ∈ F because F is a polynomial ideal, so r ∈ F since F is a vector
subspace of P. But f is by choice a polynomial in F of lowest degree possible for nonzero
polynomials, so r = 0, and g is therefore a multiple of f .
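
The division-with-remainder step used above can be carried out explicitly; a short SymPy sketch, with both polynomials invented for the illustration:

    from sympy import symbols, div

    u = symbols('u')
    f = u**2 - 3*u + 2
    g = u**4 - 1
    q, r = div(g, f, u)                       # g = f*q + r with r = 0 or deg(r) < deg(f)
    print(q, r)                               # u**2 + 3*u + 7 and 15*u - 15
    print(g.expand() == (f*q + r).expand())   # True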

4.2. Cyclic Subspaces and Annihilating Polynomials


Consider T ∈ L(V ), V of finite dimension n, and let x ∈ V . Consider the sequence of
images of x consisting of x, T x, T^2 x, . . . and consider the sequence of subspaces V0 = ⟨x⟩,
V1 = ⟨x, T x⟩, V2 = ⟨x, T x, T^2 x⟩, . . . . At some point, the sequence of subspaces must stop
increasing in size, since the dimension increases by 1 at each step if the size increases, and
the dimension is limited to n. Thus, at some point T^k x ∈ Vk−1 , so

    T^k x = −c0 x − c1 T x − c2 T^2 x − · · · − ck−1 T^{k−1} x

so p(T )x = 0 for the polynomial p(u) = c0 + c1 u + · · · + ck−1 u^{k−1} + u^k .
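
A sketch of this construction in NumPy, with a matrix T and starting vector x made up for the example: stack x, T x, T^2 x, . . . until the rank stops growing, then solve for the dependence, which gives a polynomial p with p(T )x = 0.

    import numpy as np

    T = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [6.0, -11.0, 6.0]])
    x = np.array([1.0, 0.0, 0.0])

    vectors = [x]                             # x, T x, T^2 x, ...
    while True:
        nxt = T @ vectors[-1]
        if np.linalg.matrix_rank(np.column_stack(vectors + [nxt])) == len(vectors):
            break                             # T^k x already lies in V_{k-1}
        vectors.append(nxt)

    k = len(vectors)
    # solve T^k x = d0*x + d1*T x + ... + d_{k-1}*T^{k-1} x for the coefficients d_i
    d, *_ = np.linalg.lstsq(np.column_stack(vectors), T @ vectors[-1], rcond=None)
    print(k, d)   # k = 3 and d = [6, -11, 6], so p(u) = u^3 - 6u^2 + 11u - 6 has p(T)x = 0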


Now this is for a specific vector x, but we can also look at polynomials in T that are zero
for every x. First, there must be at least one of degree at most n^2, since I, T, T^2 , . . . , T^{n^2}
cannot be linearly independent: R^{n×n} is of dimension n^2 and this set of powers of T has
n^2 + 1 members. Thus there is some linear combination c0 I + c1 T + c2 T^2 + · · · + c_{n^2} T^{n^2} = 0,
and the polynomial p(u) = c0 + c1 u + c2 u^2 + · · · + c_{n^2} u^{n^2} has p(T ) = 0. Such a polynomial
is said to annihilate T . Now consider the set AT ⊂ P of polynomials that annihilate a fixed
linear transformation T . The set is non-empty, and is a vector subspace, since sums and
scalar multiples of annihilating polynomials again annihilate T . Furthermore, it is a polynomial
ideal, since if p(T ) = 0, then also p(T )q(T ) = 0 for any polynomial q. Thus, it must be the
principal ideal generated by a unique monic polynomial m(u), called the minimal polynomial.

Lemma 6 Let m(u) be the minimal polynomial of a linear transformation T and let λ be
an eigenvalue of T . Then m(λ) = 0. Conversely, if m(c) = 0, then c is an eigenvalue of T .

Proof. First, suppose that λ is an eigenvalue of T . Then T x = λx, so 0 = m(T )x = m(λ)x,
so m(λ) = 0. Conversely, if m(c) = 0, then m(u) = (u − c)q(u) and q(T ) ≠ 0, since q is
of lower degree than the minimal polynomial m. Let x be such that y = q(T )x ≠ 0. Then
0 = m(T )x = (T − cI)q(T )x = (T − cI)y, so c is an eigenvalue of T .
The last result means that the minimal polynomial and the characteristic polynomial
have the same linear factors. Furthermore, if T is diagonalizable, then the minimal
polynomial for T must be exactly the product of the factors (u − λi ) over the distinct
eigenvalues λi .
We state the following without proof.

Theorem 3 (Cayley-Hamilton) Let p be the characteristic polynomial for T ∈ L(V ), V
finite dimensional. Then p(T ) = 0.
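
A numerical spot-check of the theorem (a sketch with a randomly generated matrix, assuming NumPy): substituting A into its own characteristic polynomial gives the zero matrix, up to rounding error.

    import numpy as np
    from numpy.linalg import matrix_power

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    coeffs = np.poly(A)                          # monic characteristic polynomial of A

    p_of_A = sum(c * matrix_power(A, len(coeffs) - 1 - i) for i, c in enumerate(coeffs))
    print(np.allclose(p_of_A, 0.0))              # True: p(A) = 0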

This implies that the minimal polynomial must divide the characteristic polynomial. We
now show that T is diagonalizable if and only if the minimal polynomial for T is a product
of distinct linear factors.

Lemma 7 Let T ∈ L(V ), V of finite dimension n, and let m(u) be the minimal polynomial
of T . Suppose that m(u) = (u − c1)^{r1} (u − c2)^{r2} · · · (u − ck)^{rk} . Let W ⊂ V , W ≠ V , and
suppose that W is invariant under T , meaning that if w ∈ W , then T w ∈ W . Then there
exists y ∈ V with y ∉ W and some eigenvalue λ of T such that (T − λI)y ∈ W .

Proof. Fix x ∈ V \ W and consider the set F of polynomials p(u) such that p(T )x ∈ W .
The minimal polynomial m(u) is in F, so the set is non-empty. Also, we can show that F
is a polynomial ideal. First, it is clearly a vector subspace. Next, if p ∈ F and q ∈ P, then
p(T )x ∈ W because p ∈ F, and q(T )[p(T )x] ∈ W because W is invariant under T , and thus
pq ∈ F. Since F is a polynomial ideal, it must be the principal ideal generated by some
monic polynomial g. Since m ∈ F, g|m, so g(u) = (u − c1)^{e1} (u − c2)^{e2} · · · (u − ck)^{ek} , with
ei ≤ ri . Since g ≠ 1, we must have ej > 0 for at least one j. Then g(u) = (u − cj )h(u)
and h(u) ∉ F because g is a polynomial of minimal degree in F . Now let y = h(T )x ∉ W ;
then (T − cj I)y = (T − cj I)h(T )x = g(T )x ∈ W , and cj is an eigenvalue of T by Lemma 6,
as required.

Theorem 4 Let T ∈ L(V ) with V of finite dimension n. Then T is diagonalizable if
and only if the minimal polynomial for T is a product of distinct linear factors m(u) =
(u − λ1 )(u − λ2 ) · · · (u − λk ), where the λi are the distinct eigenvalues of T .

Proof. We have already shown that if T is diagonalizable, then the minimal polynomial of T is of
the required form. Now we show the converse. Suppose that the minimal polynomial of T is
of the required form. If V is spanned by eigenvectors of T , then T has a basis of eigenvectors,
and is thus diagonal in that basis. So suppose that the subspace spanned by the eigenvectors
is W , a proper subspace of V . W is invariant under T since each eigenvector of T is taken
to a scalar multiple of itself. Then, by the previous lemma, there exists a vector x ∈ V \ W
and an eigenvalue λj of T such that y = (T − λj I)x ∈ W . By hypothesis,
m(u) = (u − λj )q(u) with q(λj ) ≠ 0. The polynomial q(u) − q(λj ) has a zero at λj , so factors
as q(u) − q(λj ) = (u − λj )h(u). Now q(T )x − q(λj )x = h(T )(T − λj I)x = h(T )y ∈ W . But
0 = m(T )x = (T − λj I)q(T )x, so q(T )x is either zero or an eigenvector of T , and in either
case lies in W . Thus q(λj )x = q(T )x − h(T )y must also lie in W , which is a contradiction
since x ∉ W and q(λj ) ≠ 0.
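
A numerical contrast illustrating Theorem 4 (both matrices are made-up examples with eigenvalues 2, 2, 3, assuming NumPy): for the diagonalizable A the product of distinct linear factors (A − 2I)(A − 3I) vanishes, while for the non-diagonalizable J it does not, since the minimal polynomial of J is (u − 2)^2 (u − 3).

    import numpy as np

    I = np.eye(3)
    A = np.diag([2.0, 2.0, 3.0])                 # diagonalizable
    J = np.array([[2.0, 0.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0]])              # a 2x2 Jordan block for 2, plus a 1x1 block for 3

    for M in (A, J):
        print(np.allclose((M - 2*I) @ (M - 3*I), 0.0))   # True for A, False for J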

5. Jordan Canonical Form


We give the following result without proof.

Theorem 5 Let T ∈ L(V ), with V of finite dimension n. Suppose that the characteristic
polynomial of T factors completely as (u − λ1)^{d1} (u − λ2)^{d2} · · · (u − λk)^{dk} . Then there exists a
basis wrt which the matrix representation A of T has the following form: A is block diagonal
with blocks A1 , A2 ,. . . ,Ak . Each block corresponds to one eigenvalue and is in turn block
diagonal with blocks Bi1 , Bi2 ,. . . , Bini . Each of the Bij has a diagonal consisting of the
eigenvalue λi , has 1 on each entry directly below the diagonal, and has zeroes elsewhere.
There is, up to scalar multiples, exactly one eigenvector corresponding to each Bij .
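
As an illustration, SymPy can compute a Jordan form directly; here a matrix with a known Jordan structure is built from an invented change of basis Q, and that structure is recovered. Note that SymPy places the 1's directly above the diagonal, while the statement above puts them below; the two conventions differ only by reversing the order of the basis vectors within each block.

    from sympy import Matrix

    J0 = Matrix([[2, 1, 0],
                 [0, 2, 0],
                 [0, 0, 3]])               # one 2x2 block for eigenvalue 2, one 1x1 block for 3
    Q = Matrix([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])                # an invertible change of basis
    A = Q * J0 * Q.inv()

    P, J = A.jordan_form()                 # returns P and J with A = P * J * P.inv()
    print(J)                               # the Jordan matrix (blocks possibly reordered)
    print(A == P * J * P.inv())            # True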
