LinearAlgebra Author Benjamin
LinearAlgebra Author Benjamin
LinearAlgebra Author Benjamin
Linear Algebra
January 2, 2008
Preface
Up close, smooth things look atthe picture behind dierential calculus. In mathematical language, we can approximate smoothly varying functions by linear functions. In calculus of several variables, the resulting linear functions can be complicated: you need to study linear algebra. Problems appear throughout the text, which you must learn to solve. They often provide vital results used in the course. Most of these problems have hints, particularly the more important ones. There are also review problems at the end of each section, and you should try to solve a few from each section. Try to solve each problem rst before looking up the hint. Never use decimal approximations (for instance, from a calculator) on any problem, except to check your work; many problems are very sensitive to small errors and must be worked out precisely. Whenever ridiculously large numbers appear in the statement of a problem, this is a hint that they must play little or no role in the solution. The prerequisites for this course are basic arithmetic and elementary algebra, typically learned in high school, and some comfort and facility with proofs, particularly using mathematical induction. You cant prove that all men are wearing hats just by pointing out one example of a man in a hat; most proofs require an argument, and not just examples. Polya [9] and Solow [12] explain induction and provide help with proofs. Bretscher [2] and Strang [15] are excellent introductory textbooks of linear algebra.
iii
Contents
Matrix Calculations
1 2 3 4 5 6 7 Solving Linear Equations Matrices Important Types of Matrices Elimination Via Matrix Arithmetic Finding the Inverse of a Matrix The Determinant The Determinant via Elimination 3 17 25 35 43 51 61
Eigenvectors
11 Eigenvalues and Eigenvectors 12 Bases of Eigenvectors 101 109
Abstraction
16 Vector Spaces 17 Fields 161 173
Contents
v 23 24 25 26 27 Jordan Normal Form Decomposition and Minimal Polynomial Matrix Functions of a Matrix Variable Symmetric Functions of Eigenvalues The Pfaan 213 225 235 245 251
Factorizations
28 Dual Spaces and Quotient Spaces 29 Singular Value Factorization 30 Factorizations 265 273 279
Tensors
31 32 33 34 Quadratic Forms Tensors and Indices Tensors Exterior Forms 285 295 303 313 317 477 479 481
Matrix Calculations
1.1 Elimination
Consider equations 6 x3 + x2 = 0 3 x1 + 7 x2 + 4 x3 = 9 3 x1 + 5 x2 + 8 x3 = 3. They are called linear because they are sums of constants and constant multiples of variables. How can we solve them (or teach a computer to solve them)? To solve means to nd values for each of the variables x1 , x2 and x3 satisfying all three of the equations.
Preliminaries
a. Line up the variables: x2 x3 = 6 3x1 + 7x2 + 4x3 = 9 3x1 + 5x2 + 8x3 = 3 All of the x1 s are in the same column, etc. and all constants on the right hand side. b. Drop the variables and equals signs, just writing the numbers. 0 1 1 6 3 7 4 9 . 3 5 8 3 This saves rewriting the variables at each step. We put brackets around for decoration. 3
Forward elimination
(1) If the pivot is zero, then swap rows with a lower row to get the pivot to be nonzero. This gives 7 4 9 3 1 1 6 . 0 3 5 8 3 (Going back to the linear equations we started with, we are swapping the order in which we write them down.) If you cant nd any row to swap with (because every lower row also has a zero in the pivot column), then move pivot one step to the right and repeat step (1). (2) Add whatever multiples of the pivot row you need to each lower row, in order to kill o every entry under the pivot. (Kill o means make into 0). This requires us to add (row 1) to row 3 to kill o the 3 under the pivot, giving 3 7 4 9 0 1 1 6 . 0 2 4 6 (Going back to the linear equations, we are adding equations together which doesnt change the answerswe could reverse this step by subtracting again.) (3) Make a new pivot one step down and to the right: . 3 0 0 and start again at step (1). In our example, our next pivot, 1, must kill everything beneath it: 2. So we add 2(row 2) to (row 3), giving 3 0 0 7 1 0 4 1 2 9 6 . 6 7 1 2 4 1 4 9 6 . 6
1.1. Elimination
Figure 1.1: Forward elimination on a large matrix. The shaded boxes are nonzero entries, while zero entries are left blank. The pivots are outlined. You can see the rst few steps, and then a step somewhere in the middle of the calculation, and then the nal result.
. 4 1 2 9 6 . 6
Forward elimination is done. Lets turn the numbers back into equations, to see what we have: +7x2 +4x3 =9 3x1 x2 x 3 2x3 =6 =6
Each pivot solves for one variable in terms of later variables. Problem 1.1. Apply forward elimination to 0 0 1 0 1 0 1 0 3 1 1 1
1 1 0 1
Back Substitution
Starting at the last pivot, and working up: a. Rescale the entire row to turn the pivot into a 1.
Figure 1.2: Back substitution on the large matrix from gure 1.1. You can see the rst few steps, and then a step somewhere in the middle of the calculation, and then the nal result. You can see the pivots turn into 1s.
b. Add whatever multiples of the pivot row you need to each higher row, in order to kill o every entry above the pivot. Applied to our example: 7 4 9 3 1 1 6 , 0 0 0 2 6 Scale row 3 by
1 2
: 3 0 0 7 1 0 4 1 1 9 6 3
3 9 3
3 0 0
0 1 0
0 0 1
66 9 3 22 9 3
1 Scale row 1 by 3 .
0 1 0
0 0 1
x1 = 22 x2 = 9 x3 = 3.
1.2. Examples
Denition 1.1. Forward elimination and back substitution together are called GaussJordan elimination or just elimination. (Forward elimination is often called Gaussian elimination.) Remark 1.2. Forward elimination already shows us what is going to happen: which variables are solved for in terms of which other variables. So for answering most questions, we usually only need to carry out forward elimination, without back substitution.
1.2 Examples
Example 1.3 (More than one solution). x1 + x2 + x3 + x4 = 7 x1 + 2x3 =1 x2 + x3 =0 Write down the numbers:
1 1 0
1 0 1
1 2 1
1 0 0
7 1 . 0
Kill everything under the pivot: add 1 1 0 1 0 1 Done with that pivot; move 1 0 0 Kill: add row 2 to row 3: 1 0 0 Move . 1 1 1
(row 1) to row 2. 1 1 7 1 1 6 . 1 0 0
1 1 1
1 1 0
7 6 . 0
1 1 0
1 1 2
1 1 1
7 6 . 6
x 2 +x 3
2 x 3 x 4 =6
Look: each pivot solves for one variable, in terms of later variables. There was never any pivot in the x4 column, so x4 is a free variable : x4 can take on any value, and then we just use each pivot to solve for the other variables, bottom up. Problem 1.3. Back substitute to nd the values of x1 , x2 , x3 in terms of x4 . Example 1.4 (No solutions). Consider the equations x1 + x2 + x3 = 1 2 x1 + x2 + x3 = 0 4x1 + 3x2 + 3x3 = 1. Forward eliminate: 1 2 4 1 1 3 2 1 5 1 0 1
Add 2(row 1) to row 2, 4(row 1) to row 3. 1 0 0 Move the pivot . 1 0 0 Add (row 2) to row 3. 1 0 0 1 1 0 2 3 0 1 2 1 1 1 1 2 3 3 1 2 3 1 1 1 2 3 3 1 2 3
1.3. Summary
9 . 1 0 0 1 1 0 2 3 0 1 2 1
Turn back into equations: x1 + x2 + 2 x3 = 1 x 2 3 x 3 = 2 0 = 1. You cant solve these equations: 0 cant equal 1. So you cant solve the original equations either: there are no solutions. Two lessons that save you time and eort: a. If a pivot appears in the constants column, then there are no solutions. b. You dont need to back substitute for this problem; forward elimination already tells you if there are any solutions.
1.3 Summary
We can turn linear equations into a box of numbers. Start a pivot at the top left corner, swap rows if needed, move if swapping wont work, kill o everything under the pivot, and then make a new pivot from the last one. After forward elimination, we will say that the resulting equations are in echelon form (often called row echelon form ). The echelon form equations have the same solutions as the original equations. Each column except the last (the column of constants) represents a variable. Each pivot solves for one variable in terms of later variables (each pivot binds a variable, so that the variable is not free). The original equations have no solutions just when the echelon equations have a pivot in the column of constants. Otherwise there are solutions, and any pivotless column (besides the column of constants) gives a free variable (a variable whose value is not xed by the equations). The value of any free variable can be picked as we like. So if there are solutions, there is either only one solution (no free variables), or there are innitely many solutions (free variables). Setting free variables to dierent values gives dierent solutions. The number of pivots is called the rank. Forward elimination makes the pattern of pivots clear; often we dont need to back substitute.
10
Remark 1.5. We often encounter systems of linear equations for which all of the constants are zero (the right hand sides). When this happens, to save time we wont write out a column of constants, since the constants would just remain zero all the way through forward elimination and back substitution. Problem 1.4. Use elimination to solve the linear equations 2 x2 + x3 = 1 4 x1 x2 + x3 = 2 4 x1 + 3 x2 + 3 x3 = 4
Problem 1.6. Apply forward elimination to 1 1 1 1 1 1 1 1 0 Problem 1.7. Apply forward elimination to 1 2 2 1 2 2 2 0 0 Problem 1.8. Apply forward elimination 0 0 1 0 1 1 to 1 1 1
1 2 1
0 0 0 1
1.3. Summary
11
Problem 1.10. Apply forward elimination to 1 3 2 6 2 5 4 1 3 8 6 7 Problem 1.11. Apply back substitution to the result of problem 1.2 on page 5.
Problem 1.12. Apply back substitution to 1 1 0 0 2 0 0 0 1 Problem 1.13. Apply back substitution to 1 0 1 0 1 1 0 0 0 Problem 1.14. Apply back substitution to 2 1 1 0 3 1 0 0 0 Problem 1.15. Apply back substitution to 3 0 2 2 0 0 0 0 3 0 0 0
2 1 2 2
12
Problem 1.19. Write down the simplest example you can to show that adding one to each entry in a row can change the answers to the linear equations. So adding numbers to rows is not allowed. Problem 1.20. Write down the simplest systems of linear equations you can come up with that have a. One solution. b. No solutions. c. Innitely many solutions.
Problem 1.21. If all of the constants in some linear equations are zeros, must the equations have a solution?
1 2 Problem 1.22. Draw the two lines 1 2 x1 x2 = 2 and 2 x1 + x2 = 3 in R . In your drawing indicate the points which satisfy both equations.
1.3. Summary
13
Problem 1.23. Which pair of equations cuts out which pair of lines? How many solutions does each pair of equations have? x1 x2 = 0 x1 + x2 = 1 x1 x2 = 4 2 x 1 + 2 x 2 = 1 x1 x2 = 1 3 x 1 + 3 x 2 = 3 (3) (2) (1)
(a)
(b)
(c)
Problem 1.24. Draw the two lines 2 x1 + x2 = 1 and x1 2 x2 = 1 in the x1 x2 -plane. Explain geometrically where the solution of this pair of equations lies. Carry out forward elimination on the pair, to obtain a new pair of equations. Draw the lines corresponding to each new equation. Explain why one of these lines is parallel to one of the axes. Problem 1.25. Find the quadratic function y = ax2 + bx + c which passes through the points (x, y ) = (0, 2), (1, 1), (2, 6). Problem 1.26. Give a simple example of a system of linear equations which has a solution, but for which, if you alter one of the coecients by a tiny amount (as tiny as you like), then there is no solution. Problem 1.27. If you write down just one linear equation in three variables, like 2x1 + x2 x3 = 1, the solutions draw out a plane. So a system of three linear equations draws out three dierent planes. The solutions of two of the equations lie on the intersections of the two corresponding planes. The solutions of the whole system are the points where all three planes intersect. Which system of equations in table 1.1 on the next page draws out which picture of planes from gure 1.3 on page 15?
14
(1)
(2)
(3)
(4)
(5)
1.3. Summary
15
(a)
(b)
(c)
(d)
(e)
Figure 1.3: When you have three equations in three variables, each one draws a plane. Solutions of a pair of equations lie where their planes intersect. Solutions of all three equations lie where all three planes intersect.
2 Matrices
The boxes of numbers we have been writing are called matrices. Lets learn the arithmetic of matrices.
2.1 Denitions
Denition 2.1. A matrix is a nite box A of numbers, arranged in rows and columns. We write it as A11 A12 ... A1q A21 A22 ... A2q A= . . . . . . . . . . . . Ap 1 Ap 2 ... Apq and say that A is p q if it has p rows and q columns. If there are as many rows as columns, we will say that the matrix is square. Remark 2.2. A31 is in row 3, column 1. If we have 10 or more rows or columns (which wont happen in this book), we might write A1,1 instead of A11 . For example, we can distinguish A11,1 from A1,11 . Denition 2.3. A matrix x with only one column is called a vector and written x1 x2 . x= . . . xn The collection of all vectors with n real number entries is called Rn . Think of R2 as the xy -plane, writing each point as x y instead of (x, y ). We draw a vector, for example the vector 2 , 3 17
18
Matrices
Figure 2.1: Echelon form: a staircase, each step down by only 1, but across to the right by maybe more than one. The pivots are the steps down, and below them, in the unshaded part, are zeros.
as an arrow, pointing out of the origin, with the arrow head at the point x = 2, y = 3 :
We draw vectors either as dots or more often as arrows. If there are too many vectors, pictures of arrows can get cluttered, so we prefer to draw dots. Sometimes we distinguish between points (drawn as dots) and vectors (drawn as arrows), but algebraically they are the same objects: columns of numbers.
19
Problem 2.3. Draw dots where the pivots are in gure 2.1 on the facing page.
Problem 2.4. Give the simplest examples you can of two matrices which are not in echelon form, each for a dierent reason.
Problem 2.5. The entries A11 , A22 , . . . of a square matrix A are called the diagonal. Prove that every square matrix in echelon form has all pivots lying on or above the diagonal. Problem 2.6. Prove that a square matrix in echelon form has a zero row just when it is either all zeroes or it has a pivot above the diagonal. Problem 2.7. Prove that a square matrix in echelon form has a column with no pivot just when it has a zero row. Thus all diagonal entries are pivots or else there is a zero row. Theorem 2.5. Forward elimination brings any matrix to echelon form, without altering the solutions of the associated linear equations. Obviously proof is by induction, but the result is clear enough, so we wont give a proof.
0 .
(We will often colour various rows and columns of matrices, just to make the discussion easier to follow. The colours have no mathematical meaning.)
20
Matrices
Denition 2.6. Any matrix which has only zero entries will be written 0. Problem 2.9. What could (0 0) mean?
0) mean if 1 3 2 ? 4
A=
Denition 2.8. If two matrices have matching numbers of rows and columns, we add them by adding their components: (A + B )ij = Aij + Bij . Similarly for subtracting. Problem 2.11. Let A= Find A + B . When we add matrices in blocks, A B + C D = A+C B+D 1 3 2 , B= 4 1 1 2 . 2
(as long as A and C have the same numbers of rows and columns and B and D do as well). Problem 2.12. Draw the vectors u= 2 , v= 1 3 , 1
and the vectors 0 and u + v . In your picture, you should see that they form the vertices of a parallelogram (a quadrilateral whose opposite sides are parallel). Multiply by numbers like: 7 1 3 2 4 = 71 73 72 . 74
21
Denition 2.9. If A is a matrix and c is a number, cA is the matrix with (cA)ij = cAij . Example 2.10. Let x= 1 . 2
The multiples x, 2x, 3x, . . . and x, 2x, 3x, . . . live on a straight line through 0: 3x 2x x 0 x 2x 3x
Put your left hand index nger on the row, and your right hand index nger on the column, and as you run your left hand along, run your right hand down: .
As your ngers travel, you multiply the entries you hit, and add up all of the products. Problem 2.13. Multiply 8 2 1 3
Matrices
As your left hand nger travels along a row, and your right hand down a column, you produce the entry in that row and column; the second row of A times the rst column of B gives the entry of AB in second row, rst column. Problem 2.14. Multiply 1 2 2 3 1 3 2 4 3 2 3 4
Denition 2.13. We write k in front of an expression to mean the sum for k taking on all possible values for which the expression makes sense. For example, if x is a vector with 3 entries, x1 x = x2 , x3 then
k
xk = x1 + x2 + x3 .
Denition 2.14. If A is p q and B is q r, then AB is the p r matrix whose entries are (AB )ij = k Aik Bkj .
0 0
1 , C= 1
2 0
1 1
2 , D= 1
0 2
0 1
Compute all of the following which are dened: AB, AC, AD, BC, CA, CD.
23
Problem 2.17. Find some 2 2 matrices A and B with no zero entries for which AB = 0. Problem 2.18. Find a 2 2 matrix A with no zero entries for which A2 = 0.
Problem 2.19. Suppose that we have a matrix A, so that whenever x is a vector with integer entries, then Ax is also a vector with integer entries. Prove that A has integer entries. Problem 2.20. A matrix is called upper triangular if all entries below the diagonal are zero. Prove that the product of upper triangular square matrices is upper triangular, and if, for example A11 A12 A13 A14 ... A1n A22 A23 A24 ... A2n A33 A34 ... A3n . A= , .. .. . . . . . .. . . . Ann (with zeroes under the diagonal) and B11 B= then AB = A11 B11 A22 B22 A33 B33 .. . ... ... ... .. . .. . . . . Ann Bnn . B12 B22 B13 B23 B33 B14 B24 B34 .. . ... ... ... .. . .. . B1n B2n B3n . , . . . . . Bnn
Problem 2.21. Prove the analogous result for lower triangular matrices.
24
Matrices
Problem 2.23. Prove that matrix multiplication is associative: (AB )C = A(BC ) (and that if either side is dened, then the other is, and they are equal).
Problem 2.24. Prove that matrix multiplication is distributive: A(B + C ) = AB + AC and (P + Q)R = P R + QR for any matrices A, B, C, P, Q, R (again if one side is dened, then both are and they are equal). Running your nger along rows and columns, you see that blocks multiply like: A etc. Problem 2.25. To make sense of this last statement, what do we need to know about the numbers of rows and columns of A, B, C and D? B C D = AC + BD
I1 = (1) , I2 =
Denition 3.1. The n n matrix with 1s on the diagonal and zeros everywhere else is called the identity matrix, and written In . We often write it as I to be deliberately ambiguous about what size it is. An equivalent denition: Iij = 1 0 if i = j if i = j.
Problem 3.1. What could I13 mean? (Careful: it has two meanings.) What does I2 mean? Problem 3.2. Prove that IA = AI = A for any matrix A.
Problem 3.3. Suppose that B is an n n matrix, and that AB = A for any n n matrix A. Prove that B = In .
Problem 3.4. If A and B are two matrices and Ax = Bx for any vector x, prove that A = B . Denition 3.2. The columns of In are vectors called e1 , e2 , . . . , en . Problem 3.5. Consider the identity matrix I3 . What are the vectors e1 , e2 , e3 ? 25
26
Problem 3.6. The vector ej has a one in which row? And zeroes in which rows?
Problem 3.8. If A is any matrix, prove that Aej is the j -th column of A. If A is a p q matrix, by the previous exercise, A = (Ae1 Ae2 . . . Aeq ) . In particular, when we multiply matrices AB = (ABe1 ABe2 . . . ABeq ) (and if either side of this equation is dened, then both sides are and they are equal). In other words, the columns of AB are A times the columns of B . This next exercise is particularly vital: Problem 3.9. If A is a matrix, and x a vector, prove that Ax is a sum of the columns of A, each weighted by entries of x: Ax = x1 (Ae1 ) + x2 (Ae2 ) + + xn (Aen ) .
Problem 3.12. Prove that the rows of AB are the rows of A multiplied by B .
27
(b)
(c)
(d)
(e)
Figure 3.1: Faces formed by y = Ax. Each face is centered at the origin.
Problem 3.13. The Fibonacci numbers are the numbers x0 = 1, x1 = 1, xn+1 = xn + xn1 . Write down x0 , x1 , x2 , x3 and x4 . Let A= Prove that xn+1 xn Problem 3.14. Let x= 1 , y= 0 0 . 2 = An 1 . 1 1 1 1 . 0
D=
For each matrix M = A, B, C, D, E, F draw M x and M y (in a dierent colour for each matrix), and explain in words what each matrix is doing (for example, rotating, attening onto a line, expanding, contracting, etc.). Problem 3.15. The rst picture in gure 3.1 is the original in the x1 , x2 plane, and the center of the circular face is at the origin. If we pick a matrix A and set y = Ax, and draw the image in the y1 , y2 plane, which matrix below draws which picture?
28 2 1 0 , 0 0 1 1 , 0 2 1
0 , 1
1 1
0 1
Problem 3.16. Can you gure out which matrices give rise to the pictures in the last problem, just by looking at the pictures? Assume that you known that all the entries of each matrix are integers between -2 and 2.
Problem 3.17. What are the simplest examples you can nd of 2 2 matrices A for which taking vectors x to Ax (a) contracts the plane, (b) dilates the plane, (c) dilates one direction, while contracting another, (d) rotates the plane by a right angle, (e) reects the plane in a line, (f) moves the vertical axis, but leaves every point of the horizontal axis where it is (a shear)?
3.2 Inverses
Denition 3.3. A matrix is called square if it has the same number of rows as columns. Denition 3.4. If A is a square matrix, a square matrix B of the same size as A is called the inverse of A and written A1 if AB = BA = I . Problem 3.18. If A= check that A 1 = 1 1 1 . 2 2 1 1 , 1
Problem 3.19. If A, B and C are square matrices, and AB = I and CA = I , prove that B = C . In particular, there is only one inverse (if there is one at all).
Problem 3.20. Which 1 1 matrices have inverses, and what are their inverses? Problem 3.21. By multiplying out the matrices, prove that any 2 2 matrix A= a c b d
3.2. Inverses
29
has inverse A 1 = as long as ad bc = 0. Problem 3.22. If A and B are invertible matrices, prove that AB is invertible, 1 and (AB ) = B 1 A1 . 1 ad bc d c b a
explain how to nd M 1 in terms of A1 and D1 . (Warning: for a matrix which splits into blocks like M= A C B D
the inverse of M cannot be expressed in any elementary way in terms of the blocks and their inverses.)
30 Original
Matrices
Inverse matrices
Figure 3.2: Images coming from some matrices, and from their inverses.
Problem 3.29. Figure 3.2 shows how various matrices (on the left hand side) and their inverses (on the right hand side) aect vectors. But the two columns are scrambled up. Which right hand side picture is produced by the inverse matrix of each left hand side picture?
31
Problem 3.30. Suppose that I write down a list of numbers, and (a) they are all dierent and (b) there are 25 of them and (c) all of them are integers between 1 and 25. Prove that this list is a permutation of the numbers 1, 2, . . . , 25. We can draw a picture of a permutation: the permutation 2,4,1,3 is
1 2 3 4 2 4 1 3
In general, write the numbers 1, 2, . . . , n down the left side, and the permutation down the right side, and then connect left to right, 1 to 1, 2 to 2, etc. We dont need to write out the labels, so from now on lets draw 2,4,1,3 as Problem 3.31. Which permutations are , , ?
Inverting a Permutation
Denition 3.6. Flip a picture of a permutation left to right to give the inverse permutation : to . So the inverse of 2, 4, 1, 3 is 3, 1, 4, 2. Problem 3.32. Find the inverses of a. 1,2,4,3 b. 4,3,2,1 c. 3,1,2
Multiplying
We need to write down names for permutations. Write down the numbers in a permutation as p(1), p(2), . . . , p(n), and call the permutation p. If p and q are permutations, write pq for the permutation p(q (1)), p(q (2)), . . . , p(q (n)) (the product of p and q ). So pq scrambles up numbers by rst getting q to scramble them, and then getting p to scramble them. Multiply by drawing the pictures beside each other: . p = , q = , pq = Write the inverse permutation of p as p1 . So pp1 = p1 p = 1, where 1 (the identity permutation ) means the permutation that just leaves all of the numbers where they were: 1, 2, . . . , n. Problem 3.33. Let p be 2, 3, 1, 4 and q be 1, 4, 2, 3. What is pq ?
Problem 3.34. Write down two permutations p and q for which pq = qp.
32
Transpositions
Denition 3.7. A transposition is a permutation which swaps precisely two numbers, leaving all of the others alone. For example, is a transposition.
Problem 3.36. Draw each of the following permutations as a product of transpositions: (a) 3,1,2 (b) 4,3,2,1 (c) 4,1,2,3 Flipping the picture of a transposition left to right gives the same picture: a transposition is its own inverse. When a permutation is written as a product of transpositions, its inverse is written as the same transpositions, taken in reverse order. Problem 3.37. Prove that every permutation is a product of transpositions swapping successive numbers, i.e. swapping 1 with 2, or 2 with 3, etc.
Permutation Matrices
Problem 3.38. Let P = 0 1 1 . 0
Prove that, for any vector x in R2 , P x is x with rows 1 and 2 swapped. Denition 3.8. The permutation matrix associated to a permutation p of the numbers 1, 2, . . . , n is P = ep(1) ep(2) ... ep(n) .
Example 3.9. Let p be the permutation 3, 1, 2. The permutation matrix of p is 0 1 0 P = e3 e1 e2 = 0 0 1 . 1 0 0 Problem 3.39. What is the permutation matrix P associated to the permutation 4, 2, 5, 3, 1?
33
Problem 3.40. What permutation p has permutation matrix 0 0 1 0 0 0 0 1 ? 0 1 0 0 0 0 0 1 Problem 3.41. What is the permutation matrix of the identity permutation? Problem 3.42. Prove that a matrix is the permutation matrix of some permutation just when a. its entries are all 0s or 1s and b. it has exactly one 1 in each column and c. it has exactly one 1 in each row.
Lemma 3.10. Let P be the permutation matrix associated to a permutation p. Then for any vector x, P x is just x with the x1 entry moved to row p(1), the x2 entry moved to row p(2), etc. Remark 3.11. All we need to remember is that P x is x with rows permuted somehow, and that we could permute the rows of x any way we want by choosing a suitable permutation matrix P . We will never have to actually nd the permutation. Proof. Px = P
j
xj ej xj P ej
=
j
=
j
xj ep(j ) .
Proposition 3.12. Let P be the permutation matrix associated to a permutation p. Then for any matrix A, P A is just A with the row 1 moved to row p(1), row 2 moved to row p(2), etc. Proof. The columns of P A are just P multiplied by the columns of A. Remark 3.13. It is faster and easier to work directly with permutations than with permutation matrices. Avoid writing down permutation matrices if you can; otherwise you end up wasting time juggling huge numbers of 0s. Replace permutation matrices by permutations.
34
Problem 3.43. If p and q are two permutations with permutation matrices P and Q, prove that pq has permutation matrix P Q.
Which permutation p do we need to make sure that P A is in echelon form, with P the permutation matrix of p? Problem 3.45. Confusing: Prove that the permutation matrix P of a permutation p is the result of permuting the columns of the identity matrix by p, or the rows of the identity matrix by p1 .
Problem 3.46. What happens to a permutation matrix when you carry out forward elimination? back substitution? Problem 3.47. Let p be a permutation with permutation matrix P . Prove that P 1 is the permutation matrix of p1 .
1 S32 . . .
1 . . .
36
adds S21 x1 to x2 , etc. If A is any matrix then the columns of SA are S times columns of A. Problem 4.2. Let S be a matrix so that for any matrix A (of appropriate size), SA is A with multiples of some rows added to later rows. Prove that S is strictly lower triangular.
Problem 4.3. Which 3 3 matrix S adds 5 row 2 to row 3, and 7 row 1 to row 2?
Problem 4.4. Prove that if R and S are strictly lower triangular, then RS is too.
Problem 4.5. Say that a strictly lower triangular matrix is elementary if it has only one nonzero entry below the diagonal. Prove that every strictly lower triangular matrix is a product of elementary strictly lower triangular matrices.
Lemma 4.3. Every strictly lower triangular matrix is invertible, and its inverse is also strictly lower triangular. Proof. Clearly true for 1 1 matrices. Lets consider an n n strictly lower triangular matrix S , and assume that we have already proven the result for all matrices of smaller size. Write S= 1 c 0 A
where c is a column and A is a smaller strictly lower triangular matrix. Then S 1 = which is strictly lower triangular. Denition 4.4. A matrix M is strictly upper triangular if it has ones down the diagonal zeroes everywhere below the diagonal. Problem 4.6. For each fact proven above about strictly lower triangular matrices, prove an analogue for strictly upper triangular matrices. 1 A1 c 0 A
1
37
Problem 4.7. Draw a picture indicating where some vectors lie in the x1 x2 plane, and where they get mapped to in the y1 y2 plane by y = Ax with 1 2 0 . 1
A=
3 6 .
9 7
Problem 4.9. Prove that a diagonal matrix D is invertible just when none of its diagonal entries are zero. Find its inverse.
t2 .. . tn
x1 x2 . . . xn
t1 x1 t2 x2 = . . . t n xn (just running your ngers along rows and down columns). So D scales row i by ti . For any matrix A, the columns of DA are D times columns of A.
3 5
4 6
Problem 4.12. Draw a picture indicating where some vectors lie in the x1 x2 plane, and where they get mapped to in the y1 y2 plane by y = Ax with each of the following matrices playing the part of A: 2 0 0 , 3 1 0 0 , 1 2 0 0 . 3
39
become
A11 A21 . . . Ap 1
A12 A22 . . . Ap 2
40 Swap rows 1 and 2 (and 0 1 0 Add (row 1) to (row 1 0 0 1 1 0 The string of to (row 3). 1 0 0 1 0 2
3): 0 0 0 1 0 1
1 0 0
3 0 A = 0 0 1 0
7 1 2
4 1 4
0 1 0
1 0 0
7 1 0
0 1 0
0 0 0 1 1 0
1 0 0
0 0 A. 1
Remark 4.7. We wont write out these tedious matrices on the left side of A ever again, but it is important to see it done once. Remark 4.8. Back substitution is similarly carried out by multiplying by strictly upper triangular and invertible diagonal matrices.
Problem 4.15. Let S be the 3 3 strictly lower triangular matrix which adds 2 (row 1) to row 3. What does the 3 3 matrix S 101 do? Write it down.
Problem 4.16. Which 3 3 matrix adds twice the rst row to the second row when you multiply by it? Problem 4.17. Which 4 4 matrix swaps the second and fourth rows when you multiply by it?
41
Problem 4.18. Which 4 4 matrix doubles the second and quadruples the third rows when you multiply by it? Problem 4.19. If P is the permutation matrix of a permutation p, what is AP ? Problem 4.20. If we start with 0 A = 2 0 and end up with 2 P A = 0 0 what permutation matrix is P ? Problem 4.21. If A is a 2 2 matrix, and AP = P A for every 2 2 permutation matrix P or strictly lower triangular matrix, then prove that A = c I for some number c. Problem 4.22. If the third and fourth columns of a matrix A are equal, are they still equal after we carry out forward elimination? After back substitution? Problem 4.23. How many pivots can there be in a 3 5 matrix in echelon form? 3 5 0 4 6 1 0 3 5 1 4 6
Problem 4.24. Write down the simplest 3 5 matrices you can come up with in echelon form and for which a. The second and third variables are the only free variables. b. There are no free variables. c. There are pivots in precisely the columns 3 and 4.
Problem 4.25. Write down the simplest matrices A you can for which the number of solutions to Ax = b is a. 1 for any b; b. 0 for some b, and for other b; c. 0 for some b, and 1 for other b; d. for any b. Problem 4.26. Suppose that A is a square matrix. Prove that all entries of A are positive just when, for any nonzero vector x which has no negative entries, the vector Ax has only positive entries.
42
Problem 4.27. Prove that short matrices kill. A matrix is called short if it is wider than it is tall. We say that a matrix A kills a vector x if x = 0 but Ax = 0.
4.5 Summary
The many steps of elimination can each be encoded into a matrix multiplication. The resulting matrices can all be multiplied together to give the single equation U = V A, where A is the matrix we started with, U is the echelon matrix we end up with and V is the product of the various matrices that carry out all of our elimination steps. There is a big idea at work here: encode a possibly huge number of steps into a single algebraic equation (in this case the equation U = V A), turning a large computation into a simple piece of algebra. We will return to this idea periodically.
So A 1 = 3 2 2 . 1
Theorem 5.2. Let A be a square matrix. Suppose that GaussJordan elimination applied to the matrix (A I ) ends up with (U V ) with U and V square matrices. A is invertible just when U = I , in which case V = A1 . 43
44
Example 5.3. Before the proof, lets have an example. Lets invert A= 1 2 2 . 3
1 2
2 3
1 0
0 1
2 1
1 2
0 1
2 1
1 2
0 1
0 1 V
3 2 .
2 1
Obviously these are the same steps we used in the example above; the shaded part represents coecients in front of the y vector above. Since U = I , A is invertible and A 1 = V = 3 2 2 . 1
Proof. GaussJordan elimination on (A I ) is carried out by multiplying by various invertible matrices (strictly lower triangular, permutation, invertible diagonal and strictly upper triangular), say like U So U = MN MN 1 . . . M2 M1 A V = M N M N 1 . . . M 2 M 1 , V = MN MN 1 . . . M2 M1 A I .
45
which we summarize as U = V A. Clearly V is a product of invertible matrices, so invertible. Thus U is invertible just when A is. First suppose that U has pivots all down the diagonal. Every pivot is a 1. Entries above and below each pivot are 0, so U = I . Since U = V A, we nd I = V A. Multiply both sides on the left by V 1 , to see that V 1 = A. But then multiply on the right by V to see that I = AV . So A and V are inverses of one another. Next suppose that U doesnt have pivots all down the diagonal. We always start GaussJordan elimination on the diagonal, so we fail to place a pivot somewhere along the diagonal just because we move during forward elimination. That move makes a pivotless column, hence a free variable for the equation Ax = 0. Setting the free variable to a nonzero value produces a nonzero x with Ax = 0. By problem 3.24 on page 29, A is not invertible. Problem 5.1. Find the inverse of 0 A = 1 1
0 1 2
1 0 . 1
2 2 0
1 1 1
1 1 . 1
1 1 1
1 3 . 3
Problem 5.5. Is there a faster method than GaussJordan elimination to nd the inverse of a permutation matrix?
46
47
Problem 5.7. Is
0 1 1
1 0 1
0 1 1
invertible?
Problem 5.8. Prove that a square matrix A is invertible just when the only solution x to the equation Ax = 0 is x = 0.
have a unique solution, because they are Ax = b with A= which has echelon form U= 1 0 2 4 . 1 1 2 2
48
Problem 5.9. Suppose that A and B are n n matrices and AB = I . Prove that A and B are both invertible, and that B = A1 and that A = B 1 .
Problem 5.10. Prove that for square matrices A and B of the same size (AB )
1
= B 1 A 1
(and if either side is dened, then the other is and they are equal).
Problem 5.12. How many solutions are there to the following equations? x1 + 2x2 + 3x3 = 284905309485083 x1 + 2x2 + x3 = 92850234853408 x2 + 15x3 = 4250348503489085.
Problem 5.13. Let A be the n n matrix which has 1 in every entry on or under the diagonal, and 0 in every entry above the diagonal. Find A1 . Problem 5.14. Let A be the n n matrix which has 1 in every entry on or above the diagonal, and 0 in every entry below the diagonal. Find A1 . Problem 5.15. Give an example of a 3 3 invertible matrix A for which A and At have dierent values for their pivots.
Problem 5.16. Imagine that you start with a matrix invertible, and carry out forward elimination on (A I ). (U V ) = 0 0 0 0 2 8 0 0 0 0 0 1
49
with some pivots somewhere on the rst two rows of U . Fact: you can solve Ax = b just for those vectors b which solve the equations 2b1 +8b2 +3b3 +9b4 = 0 . b2 +5b3 +2b4 = 0 Explain why.
6 The Determinant
We can see whether a matrix is invertible by computing a single number, the determinant. Problem 6.1. Use forward elimination to prove that a 2 2 matrix A= is invertible just when ad bc = 0. a b , the determinant is ad bc. For larger c d matrices, the determinant is complicated. For any 2 2 matrix a c b d
6.1 Denition
Determinants are computed as in gure 6.1 on the next page. To compute a determinant, run your nger down the rst column, writing down plus and minus signs in the pattern +, , +, , . . . in front the entry your nger points at, and then writing down the determinant of the matrix you get by deleting the row and column where your nger lies (always the rst column), and add up. Problem 6.2. Prove that det a c b d = ad bc.
52
The Determinant
3 det 1 6
2 4 7
2 4 7 2 4 7 2 4 7 det 2 7
1 5 2 1 5 2 1 5 2 1 2 + 6 det 2 4 1 5
0 1 1
Problem 6.5. Does A2 11 appear in the expression for det A, when you expand out all of the determinants in the expression completely? Problem 6.6. Prove that the 0 0 0 0 0 0 A= 0 0 0 0 0 0 determinant of 0 0 0 0 0 0
is zero, no matter what number we put in place of the s, even if the numbers are all dierent.
53
Problem 6.7. Give an example of a matrix all of whose entries are positive, even though its determinant is zero. Problem 6.8. What is det I ? Justify your answer. Problem 6.9. Prove that det A 0 B C = det A det C,
for A and C any square matrices, and B any matrix of appropriate size to t in here.
54
The Determinant
Lemma 6.3. The determinant of an upper triangular square matrix U11 U12 U13 U14 ... U 1n U22 U23 U24 ... U 2n U33 U34 ... U 3n . U = .. .. . . . . . .. . . . Unn is the product of the diagonal terms: det U = U11 U22 . . . Unn . Corollary 6.4. A square matrix A is invertible just when det U = 0, with U obtained from A by forward elimination. Proof. The matrix U is upper triangular. The fact that det U = 0 says just precisely that all diagonal entries of U are not zero, so are pivotsa pivot in every column. Apply theorem 5.6 on page 46.
Problem 6.11. Suppose that U is an invertible upper triangular matrix. a. Prove that U 1 is upper triangular. b. Prove that the diagonal entries of U 1 are the reciprocals of the diagonal entries of U . c. How can you calculate by induction the entries of U 1 in terms of the entries of U ?
Problem 6.12. Let U be any upper triangular matrix with integer entries. Prove that U 1 has integer entries just when det U = 1.
55
Proof. It is obvious for 1 1 (you cant swap anything). It is easy to check for a 2 2. Picture a 3 3 matrix A (like example 6.1 on page 52). For simplicity, lets swap rows 1 and 2. Then the plus sign of row 1 and the minus sign of row 2 are clearly switched in the 1st and 2nd terms in the determinant. In the 3rd term, the leading plus sign is not switched. Look at the determinant in the 3rd term: rows 1 and 2 dont get crossed out, and have been switched, so the determinant factor changes sign. So all terms in the determinant formula have changed sign, and therefore the determinant has changed sign. The argument goes through identically with any size of matrix (by induction) and any two neighboring rows, instead of just rows 1 and 2. Lemma 6.6. Swapping any two rows of a square matrix changes the sign of the determinant, so det P A = det A for P the permutation matrix of a transposition. Proof. Suppose that we want to swap two rows, not neighboring. For concreteness, imagine rows 1 and 4. Swapping the rst with the second, then second with third, etc., a total of 3 swaps will drive row 1 into place in row 4, and drives the old row 4 into row 3. Two more swaps (of row 3 with row 2, row 2 with row 1) puts everything where we want it. More generally, to swap two rows, start by swapping the higher of the two with the row immediately under it, repeatedly until it ts into place. Some number s of swaps will do the trick. Now the row which was the lower of the two has become the higher of the two, and we have to swap it s 1 swaps into place. So 2s 1 swaps in all, an odd number. Problem 6.13. If a square matrix has two rows the same, prove that it has determinant 0.
Problem 6.14. Find 2 2 matrices A and B for which det(A + B ) = det A + det B. So det doesnt behave well under adding matrices. But it does behave well under adding rows of matrices. Example 6.7. Watch each row: det 1+5 3 2+6 4 = det 1 3 2 4 + det 5 3 6 4 .
Theorem 6.8. The determinant of any square matrix scales when you scale across any row like det 71 3 72 4 = 7 det 1 3 2 4
The Determinant
It adds when you add across any row like det 1+5 3 2+6 4 = det 1 3 2 4 + det 5 3 6 4
or when you add down any column like det 1 3 2+5 4+6 = det 1 3 2 4 + det 1 3 5 . 6
Proof. To compute a determinant, you pick an entry from the rst column, and then delete its row and column. You then multiply it by the determinant of what is left over, which is computed by picking out an entry from the second column, not from the same row, etc. If we ignore for a moment the plus and minus signs, we can see the pattern emerging: you just pick something from the rst column, and cross out its row and column,
and then something from the second column, and cross out its row and column, , , ...,
and so on. Finally, you have picked one entry from each column, all from dierent rows. In our example, we picked A31 , A52 , A23 , A14 , A45 . Multiply these together, and you get just one term from the determinant: A31 A52 A23 A14 A45 . Your term has exactly one entry from the rst column, and then you crossed out the rst column and moved on. Suppose that you double all of the entries in the rst column. Your term contains exactly one entry from that column, A31 in our example, so your term doubles. Adding up the terms, the determinant doubles. The determinant is the sum over all choices you could make of rows to pick at each step; and of course, there are some plus and minus signs which we are
57
still ignoring. For example, with this kind of picture, a 2 2 determinant looks like = A11 A22 A21 A12 .
In the same way, scaling any column, you scale your entry from that column, so you scale your term. You scale all of the terms, so you scale the determinant. When you cobbled together your term, you picked out an entry from some row, and then crossed out that row. So you didnt use the same row twice. There are as many rows as columns, and you picked an entry in each column, so you have picked as many entries as there are rows, never using the same row twice. So you must have picked out exactly one entry from each row. In our example term above, we see this clearly: the rows used were 3, 5, 2, 1, 4. By the same argument as for columns, if you scale row 2, you must scale the entry A23 , any only that entry, so you scale the term. Adding up all possible terms, you scale the determinant. Lets see why we can add across rows. If I try to add entries across the rst row, a single term looks like = (4 + 9) (. . . ) where the (. . . ) indicates all of the other factors from the lower rows, which we will leave unspecied, = 4 ( . . . ) + 9 (. . . ) 1 2 3 = 1+6 2+7 3+8 4+9 5 + 10
10
since we keep all of the entries in the lower rows exactly the same in each matrix. This shows that each term adds when you add across a single row, so the sum of the terms, the determinant, must add. This reasoning works for any size of matrix in the same way. Moreover, it works for columns just in the same way as for rows. Problem 6.15. What happens to the determinant if I double the rst row and then triple the second row?
58
The Determinant
Proposition 6.9. Suppose that S is the strictly upper or strictly lower triangular matrix which adds a multiple of one row to another row. Then det SA = det A. i.e. we can add a multiple of any row to any other row without aecting the determinant. Proof. We can always swap rows as needed, to get the rows involved to be the rst and second rows. Then swap back again. This just changes signs somehow, and then changes them back again. So we need only work with the rst and second rows. For simplicity, picture a 3 3 matrix as 3 rows: a1 A = a2 . a3 Adding s (row 1) to (row 2) gives
a1 a2 + s a1 a3
a1 = det a2 a3
a1 + s det a1 a3
by the last lemma. The second determinant vanishes because it has two identical rows. The general case is just the same with more notation: we stu more rows around the three rows we had above. Problem 6.16. Which property of the determinant is illustrated in each of these examples? (a) 10 5 5 2 1 1 1 0 2 = 5 1 0 2 1 0 1 1 0 1 (b) 1 1 3 2 0 0 3 1 2 = 3 2 1 2 0 0 3 2 2
59
(c) 1 4 2 1 0 2 1 2 2 = 4 0 1 1 0 0 2 2 3
with one row swap so det A = (2)(7) = 14. Remark 7.3. The fast formula isnt actually any faster for small matrices, so for a 2 2 or 3 3 you wouldnt use it. But we need the fast formula anyway; each of the two formulas gives dierent insight. Proof. We can see how the determinant changes during elimination: adding multiples of rows to other rows does nothing, swapping rows changes sign. Problem 7.1. Use the fast formula to nd the determinant of 2 5 5 A = 2 5 7 2 6 11
61
62 Problem 7.2. Just by looking, 1001 2002 det 2343 9873 nd 1002 2004 6787 7435
Problem 7.3. Prove that a square matrix is invertible just when its determinant is not zero.
1 0 1 0
2 0 0 1
0 0 1 1
1 0 1
0 1 1
63
1 2 1
1 2 0
Problem 7.10. Prove that a square matrix with a zero row has determinant 0.
Problem 7.11. Prove that det P A = (1)N det A if P is the permutation matrix of a product of N transpositions. Problem 7.12. Use the fast formula to nd 0 2 A = 3 1 3 5 the determinant of 1 2 2
Problem 7.13. Prove that the determinant of any lower triangular square matrix L11 L 21 L31 L= L41 . . . Ln1 L22 L32 L42 . . . Ln2 L33 L43 . . . Ln3 . . . . ... .. .. Lnn
Ln(n1)
(with zeroes above the diagonal) is the product of the diagonal terms: det L = L11 L22 . . . Lnn .
64
Proof. Suppose that det A = 0. By the fast formula, A is not invertible. Problem 5.10 on page 48 tells us that therefore AB is not invertible, and both sides of equation 7.1 are 0. So we can safely suppose that det A = 0. Via Gauss-Jordan elimination, any invertible matrix is a product of matrices each of which adds a multiple of one row to another, or scales a row, or swaps two rows. Write A as a product of such matrices, and peel o one factor at a time, applying lemma 6.5 on page 54 and proposition 6.9 on page 58. Example 7.5. If 1 A = 0 0 4 2 0 1 6 , B = 2 5 7 3 0 2 5 0 0 , 4
then it is hard to compute out AB , and then compute out det AB . But det AB = det A det B = (1)(2)(3)(1)(2)(4) = 48.
7.2 Transpose
Denition 7.6. The transpose of a matrix A is the matrix At whose entries are At ij = Aji (switching rows with columns). Example 7.7. Flip over the diagonal: 10 2 t A=3 40 , A = 5 6 Problem 7.14. Find the transpose of 1 2 A = 4 5 0 0 Problem 7.15. Prove that (AB ) = B t At . (The transpose of the product is the product of the transposes, in the reverse order.)
t
10 2
3 40
5 . 6
3 6 . 0
65
Problem 7.16. Prove that the transpose of any permutation matrix is a permutation matrix. How is the permutation of the transpose related to the original permutation?
Proof. Forward elimination gives U = V A, U upper triangular and V a product of permutation and strictly lower triangular matrices. Tranpose: U t = At V t . But V t is a product of permutation and strictly upper triangular matrices, with the same number of row swaps as V , so det V t = det V = 1. The matrix U t is lower triangular, so det U t is the product of the diagonal entries of U t (by problem 7.13 on page 63), which are the diagonal entries of U , so det U t = det U .
Theorem 7.9. We can compute the determinant of any square matrix A by picking any column (or any row) of A, writing down plus and minus signs from the same column (or row) of the checkboard matrix, writing down the entries of A from that column (or row), multiplying each of these entries by the determinant obtained from deleting the row and column of that entry, and adding all of these up.
66 if we expand along the second row, we get 3 det A = (1) det 1 6 3 + (4) det 1 6 3 (5) det 1 6
2 4 7 2 4 7 2 4 7
1 5 2 1 5 2 1 5 2
Proof. By swapping columns (or rows), we change signs of the determinant. Swap columns (or rows) to get the required column (or row) to slide over to become the rst column (or row). Take the sign changes into account with the checkboard pattern: changing all plus and minus signs for each swap. Problem 7.17. Use this to calculate the determinant of 1 2 0 1 4 0 0 3 A= . 0 0 0 2 839 1702 1 493
7.4 Summary
Determinants (a) scale when you scale across a row (or down a column), (b) add when you add across a row (or down a column), (c) switch sign when you swap two rows, (or when you swap two columns), (d) dont change when you add a multiple of one row to another row (or a multiple of one column to another column), (e) dont change when you transpose, (f) multiply when you multiply matrices. The determinant of (a) an upper (or lower) triangular matrix is the product of the diagonal entries. (b) a permutation matrix is (1)# of transpositions . (c) a matrix is not zero just when the matrix is invertible. (d) any matrix is det A = (1)N det U , if A is taken by forward elimination with N row swaps to a matrix U .
7.4. Summary
67
Problem 7.19. Use this last exercise to nd det A2222444466668888 where A= 0 1 1 . 1234567890
Problem 7.22. How many solutions are there to the following equations? x1 + 1010x2 + 130923x3 = 2839040283 2x2 + 23932x3 = 2390843248 3x3 = 98234092384
Problem 7.23. Prove that no matter which entry of an n n matrix you pick (n > 1), you can nd some invertible n n matrix for which that entry is zero.
69
8 Span
We want to think not only about vectors, but also about lines and planes. We will nd a convenient language in which to describe lines and planes and similar objects.
There are many solutions. Each is a point in R3 , and together they draw out a plane. But how do we write down this plane? The picture is uselesswe cant see for sure which vectors live on it. We need a clear method to write down planes, lines, and similar things, so that we can communicate about them (e.g. over the telephone or to a computer). One method to describe a plane is to write down an equation, like x1 + 2x2 + x3 = 0, cutting out the plane. But there is another method, which we will often prefer, building up the plane out of vectors.
8.2 Span
Example 8.1. Consider the equations x1 + 2x2 7 x4 = 0 x3 + x4 = 0
Solutions have x 1 = 2 x 2 + 7 x 4 x 3 = x 4 , 71
72 giving x1 x x = 2 x3 x4 2 x2 + 7 x4 x2 = x4 x4 2 7 1 0 = x2 + x4 0 1 0 1
Span
But x2 and x4 are freethey can be anything. The solutions are just arbitrary combinations of 2 7 1 0 and 0 1 0 1 We can just remember these two vectors, to describe all of the solutions. Denition 8.2. A multiple of a vector v is a vector cv where c is a number. A linear combination of some vectors v1 , v2 , . . . , vp in Rn is a vector v = c1 v1 + c2 v2 + + cp vp , for some numbers c1 , c2 , . . . , cp (a sum of multiples). The span of some vectors is the collection of all of their linear combinations.
73
74
Span
Denition 8.4. A pivot column of a matrix A is a column in which a pivot appears when we forward eliminate A. Example 8.5. The matrix A= has echelon form U= so columns 1 and 3 of A: A= are pivot columns. Lemma 8.6. Write some vectors into the columns of a matrix, say A = x1 x2 ... xp y 0 1 0 1 1 1 1 0 1 0 1 1 0 1 0 1 1 1
and apply forward elimination. Then y lies in the span of x1 , x2 , . . . , xp just when y is not a pivot column. Proof. As in example 8.3 on the preceding page, the problem is precisely whether we can solve the linear equations whose matrix is A, with y the column of constants. We already know that linear equations have solutions just when the column of constants is not a pivot column. Applied to our example, this gives A = x1 1 = 2 3 x2 1 0 1 y 1 4 7
1 4 7
75
There is no pivot in the last column, so y is a linear combination of x1 and x2 , i.e. lies in their span. (In fact, in the echelon form, we see that the last column is twice the rst column minus the second column. So this must hold in the original matrix too: y = 2 x1 x2 .) Problem 8.2. What if we have a lot of vectors y to test? Prove that vectors y1 , y2 , . . . , yq all lie in the span of vectors x1 , x2 , . . . , xp just when the matrix x1 x2 ... xp y1 y2 ... yq
76
Span
Problem 8.5. Does the vector 1 0 1 lie in the span of the vectors
1 2 0 1 , 1 , 1? 1 2 0
0 2 0
lie in the span of the vectors 4 2 2 , , 0 1 0? 0 1 0 Problem 8.7. Does the vector 1 0 1 lie in the span of the vectors
0 2 1 1 , 0 , 1 ? 1 1 0 0 3 6
1 1 3 1 , 2 , 0 ? 0 6 6
8.5. Subspaces
77
Problem 8.9. Find a linear equation satised on the span of the vectors 1 2 1 , 0 1 1
8.5 Subspaces
Picture a straight line through the origin, or a plane through the origin. We generalize this picture: Denition 8.7. A subspace P of Rn is a collection of vectors in Rn so that a. P is not empty (i.e. some vector belongs to the collection P ) b. If x belongs to P , then ax does too, for any number a. c. If x and y belong to P , then x + y does too. We can see in pictures that a plane through the origin is a subspace:
Problem 8.10. Prove that 0 belongs to every subspace. Problem 8.11. Prove that if a subspace contains some vectors, then it contains their span. Intuitively, a subspace is a at object, like a line or a plane, passing through the origin 0 of Rn . Example 8.8. The set P of vectors x= x1 x2
for which x1 + 2x2 = 0 is a subspace of R2 , because a. x = 0 satises x1 + 2x2 = 0 (so P is not empty). b. If x satises x1 + 2x2 = 0, then ax satises (ax)1 + 2(ax)2 = a x1 + 2a x2 = a (x1 + 2 x2 ) = 0.
78 c. If x and y are points of P , satisfying x1 + 2 x2 = 0 y1 + 2y2 = 0 then x + y satises (x1 + y1 ) + 2 (x2 + y2 ) = (x1 + 2x2 ) + (y1 + 2y2 ) = 0. Problem 8.12. Is the set S of all points x= of the plane with x2 = 1 a subspace? x1 x2
Span
Problem 8.13. Is the set P of all points x1 x = x2 x3 with x1 + x2 + x3 = 0 a subspace? The word subspace really means just the same as the span of some vectors, as we will eventually see. Proposition 8.9. The span of a set of vectors is a subspace; in fact, it is the smallest subspace containing those vectors. Conversely, every subspace is a span: the span of all of the vectors inside it. Remark 8.10. In order to make this proposition true, we have to change our denitions just a little: if we have an empty collection of vectors (i.e. we dont have any vectors at all), then we will declare that the span of that empty collection is the origin. Remark 8.11. If we have an innite collection of vectors, then their span just means the collection of all linear combinations we can build up from all possible choices we can make of any nite number of vectors from our collection. We dont allow innite sums. We would really like to avoid using spans of innite sets of vectors; we will address this problem in chapter 9. Proof. Given any set of vectors X in Rn , let U be their span. So any vector in U is a linear combination of vectors from X . Scaling any linear combination yields another linear combination, and adding two linear combinations yields a further linear combination, so U is a subspace. If W is any other subspace
8.5. Subspaces
79
containing X , then we can add and scale vectors from W , yielding more vectors from W , so we can make linear combinations of any vectors from W making more vectors from W . Therefore W contains the span of X , i.e. contains U . Finally, if V is any subspace, then we can add and scale vectors from V to make more vectors from V , so V is the span of all vectors in V . Problem 8.14. Prove that every subspace is the span of the vectors that it contains. (Warning: this fact isnt very helpful, because any subspace will either contain only the origin, or contain innitely many vectors. We would really rather only think about spans of nitely many vectors. So we will have to reconsider this problem later.)
Problem 8.16. If U and V are subspaces of Rn : a. Let W be the set of vectors which either belong to U or belong to V . Is W a subspace? b. Let Z be the set of vectors which belong to U and to V . Is Z a subspace?
Problem 8.18. a. The set of b. The set of c. The set of d. The set of
Which of the following are subspaces of R4 ? points x for which x1 x4 = x2 x3 . points x for which 2x1 = 3x2 . points x for which x1 + x2 + x3 + x4 = 0. points x for which x1 , x2 , x3 and x4 are all 0.
Problem 8.19. Is a circle in the plane a subspace? Prove your answer. Draw pictures to explain your answer. Problem 8.20. Which lines in the plane are subspaces? Draw pictures to explain your answer.
80
Span
8.6 Summary
We have solved the problem of this chapter: to describe a subspace. You write down a set of vectors spanning it. If I write down a dierent set of vectors, you can check to see if mine are linear combinations of yours, and if yours are linear combinations of mine, so you know when yours and mine span the same subspace.
9 Bases
Our goal in this book is to greatly simplify equations in many variables by changing to new variables. In linear algebra, the concept of changing variables is replaced with the more concrete concept of a basis.
9.1 Denition
A basis is a list of just enough vectors to span a subspace. For example, we should be able to span a line by writing down just one vector lying in it, a plane with just two vectors, etc. Denition 9.1. A linear relation among some vectors x1 , x2 , . . . , xp in Rn is an equation c1 x1 + c2 x2 + + cp xp = 0, where c1 , c2 , . . . , cp are not all zero. A set of vectors is linearly independent if the vectors admit no linear relation. A set of vectors is a basis of Rn if (1) the vectors are linearly independent and (2) adding any other vector into the set would render them no longer linearly independent. Example 9.2. The vectors x1 = 1 , x2 = 2 2 4 The vector sticking up is linearly independent of the other two vectors.
Only this plane contains 0 and these two vectors. Threelegged tables dont wobble, unless all of the feet of the table legs lie on the same straight line.
9.2 Properties
Lemma 9.3. The columns of a matrix are linearly independent just when each one is a pivot column.
Proof. Obvious from lemma 8.6 on page 74.
Problem 9.1. Is (0, 1), (1, 1) a basis of R2?
Problem 9.2. The standard basis of Rn is the basis e1 , e2 , . . . , en (where e1 is the rst column of In , etc.). Prove that the standard basis of Rn is a basis.
Problem 9.3. Prove that there is a linear relation between some vectors w1, w2, . . . , wq just when one of those vectors, say wk, is a linear combination of earlier vectors w1, w2, . . . , w_{k-1}.
Theorem 9.4. Every linearly independent set of vectors in Rn consists of at most n vectors, and consists of exactly n vectors just when it is a basis.
Proof. Suppose that x1, x2, . . . , xp are linearly independent. Let A = (x1 x2 . . . xp).
There is either one pivot or no pivot in each row. So the number of rows is at least as large as the number of pivots. There are n rows. There is one pivot in each column, so p pivots. So p ≤ n. If p = n, we have one pivot in each row, so adding another vector (another column) can't add another pivot. Therefore adding any other vector to the vectors x1, x2, . . . , xp would break linear independence. If p < n, then we have zero rows after forward elimination. Suppose that forward elimination yields U = V A. Then (U e_{p+1}) has more pivot columns than U has, so (A V⁻¹e_{p+1}) has more pivot columns than A has. Thus adding the new vector x_{p+1} = V⁻¹e_{p+1} to the collection of vectors x1, x2, . . . , xp, we have a larger linearly independent collection.
Problem 9.4. Prove that every linearly independent set of vectors in Rn belongs to a basis.
Lemma 9.5. A set of vectors u1, u2, . . . , un is a basis of Rn just when every vector b in Rn can be written as a linear combination b = a1 u1 + a2 u2 + ... + an un, for a unique choice of numbers a1, a2, . . . , an.
Proof. Let A = (u1 u2 . . . un) and let a be the vector with entries a1, a2, . . . , an.
Problem 9.7. Can you find matrices A and B so that A is 3 × 5 and B is 5 × 3, and AB = 1?
Problem 9.8. Suppose that A is 3 × 5 and B is 5 × 3, and that AB is invertible. Must the columns of B be linearly independent? the rows of B? the columns of A? the rows of A?
Problem 9.9. Give an example of a 3 × 3 matrix for which any two columns are linearly independent, but the three columns together are not linearly independent. Can such a matrix be invertible?
Note that F e1 = u1 , F e2 = u2 , . . . , F en = un . So taking x to F x is a change of basis, taking the standard basis to the new basis. Problem 9.10. Prove that an n n matrix A is the change of basis matrix of a basis just when the equation Ax = 0 has x = 0 as its only solution, which occurs just when A is invertible. Suppose that you and I look at the sky and watch a falling star. You measure its position against the xed choice of basis e1 , e2 , e3 , while I measure against
some funny choice of basis u1, u2, u3. The actual position is some vector p in R3. Let's say
p = x1 e1 + x2 e2 + x3 e3 as you measure it,
  = y1 u1 + y2 u2 + y3 u3 as I measure it.
Let F = (u1 u2 u3) be the change of basis matrix, so that F e1 = u1, etc. So F takes your basis to mine. If we let x = (x1, x2, x3) and y = (y1, y2, y3), then in your basis:
p = x1 e1 + x2 e2 + x3 e3 = x,
but in mine:
p = y1 u1 + y2 u2 + y3 u3 = y1 F e1 + y2 F e2 + y3 F e3 = F (y1 e1 + y2 e2 + y3 e3) = F y.
So x = F y converts my measurements to yours.
Remark 9.7. Suppose that we change variables by x = F y and so y = F⁻¹x, with F some invertible matrix. Then any matrix A acting on the x variables by taking x to Ax is represented in the y variables as the matrix F⁻¹AF: first F turns y's into x's, then A acts on the x's, then F⁻¹ turns the x's back into y's.
Problem 9.11. Take
1 F = 0 0
0 1 0
1 1 0 , A = 0 1 0
0 2 0
0 0 . 2
Compute F 1 AF .
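For readers who like to check such computations by machine, here is a minimal numerical sketch of remark 9.7 using NumPy (my choice of tool, not the book's). The matrices F and A below are made-up illustrations, not the ones from the problem above.

import numpy as np

# Hypothetical example: F holds a new basis in its columns, A acts on x-coordinates.
F = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])   # invertible, so its columns form a basis
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])

# In the y-coordinates defined by x = F y, the map x -> A x is represented by F^{-1} A F.
B = np.linalg.inv(F) @ A @ F
print(B)

# Check on one vector: converting coordinates before or after applying the map agrees.
y = np.array([1.0, -2.0, 3.0])
x = F @ y
assert np.allclose(F @ (B @ y), A @ x)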
Problem 9.12. A shower of falling stars falls to Earth. Each star falls from a position x = (x1, x2, x3) to a position on the ground Ax = (x1, x2, 0). What is the matrix A? Suppose that I measure the positions of the stars against the basis
u1 = (1, 0, 0),  u2 = (2, 1, 0),  u3 = (0, 2, 1).
Find the change of basis matrix F, and find F⁻¹AF, the matrix that describes how each star falls from the sky as measured against my basis.
a basis of R3 ? Problem 9.14. If A is a matrix, show how each vector which A kills determines a linear relation between the columns of A, and vice versa.
Problem 9.16. Write down a basis of R2 other than the standard basis, and prove that your basis really is a basis.
Problem 9.18. Is
[ 0  1 ]
[ 1  1 ]
a change of basis matrix? If so, for what basis?
Problem 9.19. If x1, x2, . . . , xn and y1, y2, . . . , yn are two bases of Rn, prove that there is a unique invertible matrix A so that A x1 = y1, A x2 = y2, etc.
so the rst and second columns are pivot columns. Therefore 1 1 , 1 1 1 0 are a basis for the span. Are there any bases? Proposition 9.12. Every subspace of Rn has a basis. Moreover, any basis v1 , v2 , . . . , vp of a subspace V of Rn lives in a basis v1 , v2 , . . . , vp , w1 , w2 , . . . , wq of Rn . Proof. If V only contains the 0 vector, then we can take no vectors as a basis for V , and let w1 , w2 , . . . , wn be any basis for Rn . On the other hand, if V contains a nonzero vector, then pick as many linearly independent vectors from V as possible. By theorem 9.4 on page 82, we could only pick at most n vectors. They must span V , because otherwise we could pick another one. If V = Rn , then we are nished. Otherwise, pick as many vectors from Rn as possible which are linearly independent of v1 , v2 , . . . , vp . Clearly we stop just when we hit a total of n vectors.
9.5 Dimension
Do all bases look pretty much the same? Theorem 9.13. Any two bases of a subspace have the same number of vectors. Proof. Imagine two bases, say x1 , x2 , . . . , xp and y1 , y2 , . . . , yq , for the same subspace. Forward eliminate x1 x2 ... xp y1 y2 ... yq
yielding an echelon form.
Each x vector generates a pivot, p pivots in all, straight down the diagonal. Forward eliminate the right hand portion of the matrix, yielding a new echelon form,
giving at most p pivots because of the zero rows. Each y vector generates a pivot. So there arent more than p of these y vectors. Thus no more y vectors than x vectors. Reversing the roles of x and y vectors, we nd that there cant be more x vectors than y vectors. Problem 9.20. Prove that every subspace of Rn has a basis with at most n vectors. Denition 9.14. The dimension of a subspace is the number of vectors in any basis. Write the dimension of a subspace U as dim U .
9.6 Summary
A subspace is a at thing passing through 0. A basis for a subspace is a collection of just enough vectors to span the subspace. A change of basis matrix is a basis organized into the columns of a matrix.
10.1 Kernel
Definition 10.1. If A is any matrix, say p × q, then the vectors x in Rq for which Ax = 0 (vectors killed by A) form a subspace of Rq called the kernel of A, and written ker A. The kernel is a subspace, because
a. 0 belongs to the kernel of any matrix A, since A0 = 0 (everything kills 0).
b. If Ax = 0 and Ay = 0, then A(x + y) = Ax + Ay = 0 (when you kill two vectors, you kill their sum).
c. If Ax = 0, then A(ax) = aAx = 0 (when you kill a vector, you kill its multiples).
Problem 10.1. If a matrix is wider than it is tall (a short matrix), then its kernel contains nonzero vectors.
Problem 10.2. Prove that the kernel of AB contains the kernel of B. Does it have to contain the kernel of A?
We will often need to find kernels of matrices. To rapidly calculate the kernel of a matrix, for example
A =
[ 2   0  1  1 ]
[ 1   1  2  1 ]
[ 3  -1  0  1 ]
a. Carry out forward elimination and back substitution.
[ 1  0  1/2  1/2 ]
[ 0  1  3/2  1/2 ]
[ 0  0   0    0  ]
b. Cut out all zero rows.
[ 1  0  1/2  1/2 ]
[ 0  1  3/2  1/2 ]
c. Change the signs of all of the entries after each pivot.
[ 1  0  -1/2  -1/2 ]
[ 0  1  -3/2  -1/2 ]
(This corresponds to changing equations like x1 + (1/2) x3 + (1/2) x4 = 0 to x1 = -(1/2) x3 - (1/2) x4. Think of it as moving everything after the pivot over to the right hand side, although we won't actually move anything.)
d. Stuff in whatever rows from the identity matrix you need into your matrix, so that it ends up with nonzero entries all down the diagonal.
[ 1  0  -1/2  -1/2 ]
[ 0  1  -3/2  -1/2 ]
[ 0  0    1     0  ]
[ 0  0    0     1  ]
We won't mark the new rows with pivots. Each new row corresponds to setting one of the free variables to 1 and the others to 0.
e. Cut out all of the pivot columns. The remaining columns are a basis for the kernel.
( -1/2, -3/2, 1, 0 ),  ( -1/2, -1/2, 0, 1 ).
Problem 10.3. Apply this algorithm to the matrix
A =
[ 0  1  2 ]
[ 2  2  2 ]
[ 2  0  2 ]
and check that Ax = 0 for each vector x in your resulting basis for the kernel.
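A quick way to check a kernel computation by machine is to ask a computer algebra system for a nullspace basis. The sketch below uses SymPy (my choice of tool, not something the text uses) on the example matrix above, as I have transcribed it.

from sympy import Matrix

# The example matrix from this section (as transcribed above).
A = Matrix([[2, 0, 1, 1],
            [1, 1, 2, 1],
            [3, -1, 0, 1]])

# SymPy returns a basis of the kernel, setting each free variable to 1 in turn.
basis = A.nullspace()
for v in basis:
    print(v.T)                          # expect (-1/2, -3/2, 1, 0) and (-1/2, -1/2, 0, 1)
    assert A * v == Matrix.zeros(3, 1)  # each basis vector really is killed by A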
Lemma 10.2. The algorithm works, giving a basis for the kernel of any matrix. Proof. Each vector in the kernel is obtained by setting arbitrary values for the free variables, and letting the pivots solve for the other variables. Let v1 be the vector in the kernel which has value 1 for the 1st free variable, and 0 for all other free variables. Similarly, make a vector v2 , v3 , . . . , vs for each free variablesuppose that there are s free variables. The kernel is a subspace, so each linear combination c1 v1 + c2 v2 + + cs vs lies in the kernel. This linear combination has value c1 for the rst free variable, c2 for the second, etc. (just looking at the rows of the free variables). Each
vector in the kernel has some values c1, c2, . . . , cs for the free variables. So each vector in the kernel is a unique linear combination of v1, v2, . . . , vs. Suppose we find a linear relation among v1, v2, . . . , vs, say c1 v1 + c2 v2 + ... + cs vs = 0. Look at the row in which v1 has a 1 and all of the other vectors have 0's: the linear relation gives c1 = 0 in that row. Similarly all of c1, c2, . . . , cs must vanish, so there is no linear relation among these vectors. Therefore the vectors v1, v2, . . . , vs form a basis for the kernel.
Finally, we need to see why these vectors v1, v2, . . . , vs are precisely the vectors which come out of our process above. First, look at our example. The reduced echelon form turns back into equations as
x1 + (1/2) x3 + (1/2) x4 = 0
x2 + (3/2) x3 + (1/2) x4 = 0,
i.e.
x1 = -(1/2) x3 - (1/2) x4
x2 = -(3/2) x3 - (1/2) x4.
All free variables line up on the right hand side, and we have changed the signs of their coefficients. Setting x3 = 1 and x4 = 0, go down the right hand side, killing the x4 entries, and putting x3 = 1 in each x3 entry, i.e. writing down just the entries from the x3 column:
v1 = ( -1/2, -3/2, 1, 0 ).
The general algorithm works in the same way: if we put all free variables on to the right hand side, and then set one free variable to 1 (turn it on) and the others to 0's (turn them off), we can picture this as turning on the column associated to that free variable. Each pivot solves for a pivot variable: the value of that pivot variable is the entry in the corresponding row of the turned on column.
Problem 10.4. Give an example of a square matrix whose kernel is not the kernel of its transpose.
Problem 10.5. Draw a picture of the kernel for each of A = 1 0 0 1 0 , B = 0 1 2 0 , C = 1 , D = 0 0 0 .
Corollary 10.3. The dimension of the kernel of a matrix is the number of pivotless columns after forward elimination. Remark 10.4. Another way to say it: the dimension of the kernel of a matrix A is the number of free variables in the equation Ax = 0.
10.2 Image
Denition 10.5. The image of a matrix is the set of vectors y of the form y = Ax for some vector x, written im A. Problem 10.12. Prove that the image of a matrix is the span of its columns.
For example, the image of
A =
[ 1  1  2  3 ]
[ 0  1  2  3 ]
[ 1  0  0  0 ]
is the span of its columns
( 1, 0, 1 ), ( 1, 1, 0 ), ( 2, 2, 0 ), ( 3, 3, 0 ).
Problem 10.13. Prove that the equation Ax = b has a solution x just when b lies in the image of A.
Problem 10.15. (a) Suppose that A is a 2 2 matrix which, when taking a vector x to the vector Ax, takes multiples of e1 to multiples of e1 . Show that A is upper triangular. (b) Similarly, if A is a 3 3 matrix which takes multiples of e1 to multiples of e1 , and takes linear combinations of e1 and e2 to other such linear combinations, then A is upper triangular. (c) Generalize this to Rn , and use it to show that the inverse of an invertible upper triangular matrix is upper triangular. Problem 10.16. Prove that if a matrix is taller than it is wide (a tall matrix), then some vector does not belong to its image.
D=
1 , F = 1
0 . 1
Problem 10.18. Prove that the kernel of A is the kernel of the matrix
[ A ]
[ A ]
(the matrix A stacked on top of itself).
Problem 10.19. If you know the kernel of a p × q matrix A, how do you find the dimension of the kernel of
B =
[ A  A  A ]
[ A  A  A ]
[ A  A  A ] ?
Problem 10.21. If A and B are two matrices, and B = CA for an invertible matrix C , prove that A and B have images of the same dimension.
Theorem 10.7. For any matrix A, dim ker A + dim im A = number of columns. Proof. The image of A is the span of the columns. By lemma 8.6 on page 74, each pivotless column is a linear combination of earlier pivot columns. So the pivot columns span the image. Pivot columns are linearly independent: a basis. Each pivotless column contributes (in our algorithm) to our basis for the kernel. Example 10.8. The matrix 1 A = 1 2 1 1 2 1 1 1 1 1 1
So columns 1, 3 and 4 of A (not of U ) are a basis for the image of A: 1 1 1 , , 1 1 1 . 1 1 2 The image has dimension 3 because there are 3 pivot columns. The kernel has dimension 1, because there is one pivotless column. The pivotless column is not a basis for the kernel. It just shows you the dimension of the kernel. (In this example, the pivotless column isnt even in the kernel.) Problem 10.22. Find the rank of 0 1 A= 0 0
2 2 1 2
2 0 2 2
2 0 2 2
and explain what this tells you about image and kernel.
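Theorem 10.7 is easy to sanity-check numerically: for any matrix, the rank (the dimension of the image) and the dimension of the kernel must add up to the number of columns. A minimal sketch with SymPy (my choice of tool, not the book's), on a randomly generated matrix:

from sympy import randMatrix

A = randMatrix(4, 6, min=-3, max=3, seed=0)   # a random 4 x 6 integer matrix

rank = A.rank()                               # dim im A
kernel_dim = len(A.nullspace())               # dim ker A

assert rank + kernel_dim == A.cols            # number of columns
print(rank, kernel_dim, A.cols)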
10.4 Summary
a. The kernel of a matrix is the set of vectors it kills. It is large just when linear equations Ax = b with one solution have lots of solutions (it measures plurality of solutions when they exist).
b. Our algorithm makes a basis for the kernel out of the pivotless columns.
c. The image of a matrix is the stuff that comes out of it: the vectors b for which you can solve Ax = b (it measures existence of solutions).
d. The pivot columns are a basis of the image.
Problem 10.32. Prove that the image of AB is contained in the image of A. Problem 10.33. Prove that the rank of AB is never more than the rank of A or of B . Problem 10.34. Prove that the rank of a sum of matrices is never more than the sum of the ranks.
Problem 10.35. Which of the following can change when you carry out forward elimination? a. image, b. kernel, c. dimension of image, d. dimension of kernel?
Problem 10.36. Prove that the rank of AB is no larger than the ranks of A and B .
Eigenvectors
Example 11.5. The matrix
A =
[ 2  0 ]
[ 0  3 ]
has characteristic polynomial
det (A - λI) = det [ 2-λ   0  ]
                   [  0   3-λ ]  =  (2 - λ)(3 - λ).
So the eigenvalues are λ = 2 and λ = 3.
Problem 11.2. Prove that the eigenvalues of an upper (or lower) triangular matrix are the diagonal entries. For example:
A =
[ 1  2  3 ]
[ 0  4  5 ]
[ 0  0  6 ]
has eigenvalues λ = 1, λ = 4, λ = 6.
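As a quick numerical sanity check (not part of the text's argument), NumPy confirms that the eigenvalues of the upper triangular example above are its diagonal entries:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])

print(np.linalg.eigvals(A))   # some ordering of 1, 4, 6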
Problem 11.3. Find a 2 2 matrix A whose eigenvalues are not the same as its diagonal entries.
Problem 11.4. Find 2 2 matrices A and B for which A + B has an eigenvalue which is not a sum of some eigenvalue of A with some eigenvalue of B .
Definition 11.6. The set of eigenvalues of a matrix is called its spectrum.
Example 11.7.
A =
[ 0  -1 ]
[ 1   0 ]
has det (A - λI) = λ² + 1, which has no roots, so there are no eigenvalues (among real numbers λ).
Problem 11.5. Why is the characteristic polynomial a polynomial in λ?
Problem 11.6. What is the highest order term of the variable λ in the characteristic polynomial of a matrix A?
(One can try to compute a characteristic polynomial by forward elimination on A - λI, but the pivots are then expressions in λ.) The point: at each step of elimination, the expressions are rational functions of λ, accumulating to become more complicated at each step. This is not any faster than the slow process of expanding out the determinant, which for the example at hand gives
det (A - λI) = 2 - 15λ + 8λ² - λ³.
Let's always use the slow process. There is actually a faster method to find eigenvalues (see section 24.2 on page 230) of large matrices, but it is slower on small matrices, and we won't ever want to work with large matrices.
Problem 11.8. Prove that a square matrix A and its transpose At have the same eigenvalues.
Problem 11.9. Prove that det (F⁻¹AF - λI) = det (A - λI) for any square matrix A, and any invertible matrix F. So the characteristic polynomial is unchanged by change of basis.
Problem 11.10. If all of the entries of a square matrix are positive, are its eigenvalues positive? Problem 11.11. Are the eigenvalues of AB equal to those of BA?
Problem 11.12. Give an example of 2 × 2 matrices A and B for which the eigenvalues of AB are not products of eigenvalues of A with those of B.
Problem 11.13. What are the eigenvalues of
A =
[ 1  1  1 ]
[ 1  1  1 ]
[ 1  1  1 ] ?
Problem 11.14. The multiplicity of an eigenvalue λj is the number of factors of λ - λj appearing in the characteristic polynomial. Suppose that the characteristic polynomial of some n × n matrix A splits into a product of linear factors. Prove that the determinant of A is the product of its eigenvalues (each taken with multiplicity), by setting λ = 0 in the characteristic polynomial.
Problem 11.15. From the previous exercise, if a 2 × 2 matrix A has eigenvalues 0 and 1, what is its rank?
Problem 11.16. Write out the characteristic polynomial of an n × n matrix A as
det (A - λI) = s0(A) - s1(A) λ + s2(A) λ² + ... + (-1)ⁿ sn(A) λⁿ.
a. Find sn(A).
b. Prove that s0(A) = det A.
c. Prove that sj(A) is a sum of products of precisely n - j entries of A. In particular, s_{n-1}(A) is a polynomial of degree 1 as a function of each entry of A.
d. Use this to prove that s_{n-1}(A) = A11 + A22 + ... + Ann. (This quantity A11 + A22 + ... + Ann is called the trace of A.)
e. Prove that sj(F⁻¹AF) = sj(A) for any invertible matrix F, so the coefficients of the characteristic polynomial are unchanged by change of basis.
f. Take a basis u1, u2, . . . , un for which the vectors u_{r+1}, u_{r+2}, . . . , un form a basis of the kernel, let F be the associated change of basis matrix, and look at F⁻¹AF. Prove that
F⁻¹AF =
[ P  Q ]
[ 0  0 ]
g. If A has rank r, prove that sk(A) = 0 for k < n - r.
h. Write down two 2 × 2 matrices of different ranks with the same characteristic polynomial.
11.2 Eigenvectors
To nd the eigenvectors of a matrix A: once you have the eigenvalues, pick each eigenvalue , and nd the kernel of A I . Example 11.8. The matrix A= has eigenvalues = 2 and = 3. Lets start with = 2: A I = 2 1 0 1 0 3 0 1 2 1 0 0 1 2 1 0 3
Our algorithm (from section 10.1) for nding the kernel yields a basis 1 1 for the = 2-eigenvectors. Problem 11.17. Do the same for = 3.
Problem 11.18. Find the eigenvectors and eigenvalues of
A =
[ 1  3  0 ]
[ 2  6  0 ]
[ 0  0  4 ]
Example 11.9. Let's put it all together. How do we calculate the eigenvectors of
A =
[ 3  2 ]
[ 0  1 ] ?
a. Find the eigenvalues: the characteristic polynomial is
det (A - λI) = (3 - λ)(1 - λ),
so the eigenvalues are λ = 3 and λ = 1.
b. Find the eigenvectors: for each eigenvalue λ, compute a basis for the kernel of A - λI.
A - 3I = [ 3-3  2 ; 0  1-3 ] = [ 0  2 ; 0  -2 ],
with kernel basis
( 1, 0 ).
The nonzero linear combinations of this basis are the eigenvectors with eigenvalue λ = 3. For λ = 1,
A - I = [ 3-1  2 ; 0  1-1 ] = [ 2  2 ; 0  0 ].
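The same recipe is easy to run by machine: for each eigenvalue λ, compute a basis of the kernel of A - λI. A rough SymPy sketch (my choice of tool) for the matrix of Example 11.9, as transcribed above:

from sympy import Matrix, eye

A = Matrix([[3, 2],
            [0, 1]])

for lam in A.eigenvals():                 # the eigenvalues 3 and 1
    basis = (A - lam * eye(2)).nullspace()
    print(lam, [tuple(v) for v in basis])
    for v in basis:
        assert A * v == lam * v           # really an eigenvector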
Problem 11.23. Prove that every eigenvector of any square matrix A is an eigenvector of A⁻¹, of A², of 3A and of A - 7I. How are the eigenvalues related?
Problem 11.24. Forward elimination messes up eigenvalues and eigenvectors. Back substitution messes them up further. Give the simplest examples you can.
Problem 11.25. What are the eigenvalues and eigenvectors of the permutation matrix of a transposition?
Problem 11.26. What are the eigenvalues and eigenvectors of a 2 × 2 strictly lower triangular matrix?
12 Bases of Eigenvectors
In this chapter, we try (and dont always succeed) to organize eigenvectors into bases.
12.1 Eigenspaces
Definition 12.1. The λ-eigenspace of a square matrix A is the set of vectors x for which (A - λI)x = 0 (i.e. the kernel of A - λI). The eigenvectors are precisely the nonzero vectors in the eigenspace. In particular, if λ is not an eigenvalue, then the λ-eigenspace is just the 0 vector.
Problem 12.1. Prove that for any value λ, the λ-eigenspace of any square matrix is a subspace.
Figure 12.1: An eigenspace with eigenvalue λ = 2: anything you draw in that subspace gets doubled.
Figure 12.2: A basis of eigenvectors of a matrix. Each vector starts off as a thickly drawn vector, and gets stretched into the thinly drawn vector. A negative stretching factor reverses the direction of the vector. We can recover the entire matrix A if we know the directions of the basis vectors that are stretched and how much the matrix stretches vectors in each of those directions.
each variable simply getting scaled by a factor. The next easiest are matrices that become diagonal when we change variables.
Theorem 12.2 (Decoupling Theorem). If u1, u2, . . . , un is a basis of Rn, and each of u1, u2, . . . , un is an eigenvector of a square matrix A, say Au1 = λ1 u1, Au2 = λ2 u2, . . . , Aun = λn un, then
F⁻¹AF =
[ λ1              ]
[     λ2          ]
[         ...     ]
[             λn  ]
where F = (u1 u2 . . . un) is the change of basis matrix of the basis u1, u2, . . . , un.
Definition 12.3. We say that the matrix F (or the basis u1, u2, . . . , un) diagonalizes the matrix A.
Remark 12.4. We call this the decoupling theorem, because the transformation taking x to Ax is usually very complicated, mixing up the variables x1, x2, . . . , xn in a tangled mess. But if we can somehow change the variables and make A into a diagonal matrix, then each of the new variables is just being stretched or squished by a factor λi, independently of any of the other variables, so the variables appear decoupled from one another.
Proof. F takes e's to u's, A scales the u's, and then F⁻¹ turns the scaled u's back into e's. So F⁻¹AF ej = λj ej, giving the j-th column of F⁻¹AF. So F⁻¹AF is the diagonal matrix with diagonal entries λ1, λ2, . . . , λn.
We save a lot of time if we notice that:
Theorem 12.5. Eigenvectors with different eigenvalues are linearly independent.
Remark 12.6. This saves us time because we don't have to check to see if the eigenvectors we come up with are linearly independent, since we generate a basis for each eigenspace, and there are no relations between eigenspaces.
Proof. Take a square matrix A. Pick some eigenvectors, say x1 with eigenvalue λ1, x2 with eigenvalue λ2, etc., up to some xp. Suppose that all of these eigenvalues λ1, λ2, . . . , λp are different from one another. If we found a linear relation c1 x1 = 0 involving just one vector x1, we would divide by c1 to see that x1 = 0. But x1 ≠ 0 (being an eigenvector), so there is no linear relation involving just one eigenvector. Let's suppose we found a linear relation involving just two eigenvectors, x1 and x2, like c1 x1 + c2 x2 = 0. We could just replace x1 by c1 x1 and x2 by c2 x2 to arrange a linear relation x1 + x2 = 0. Since x1 is an eigenvector with eigenvalue λ1, we know that (A - λ1 I) x1 = 0. Apply A - λ1 I to both sides of our relation to get (λ2 - λ1) x2 = 0. Since the eigenvalues are distinct, we can divide by λ2 - λ1 to get x2 = 0, again a contradiction. So there are no linear relations involving just two eigenvectors. Let's imagine a linear relation c1 x1 + c2 x2 + ... + cp xp = 0, involving any number of eigenvectors, and see why that leads us into a contradiction. If any of the terms are 0, just drop them, so we can assume that there are no 0 terms, i.e. that all coefficients c1, c2, . . . , cp are nonzero. So we can rescale, replacing x1 by c1 x1, etc., to arrange that our relation is now x1 + x2 + ... + xp = 0.
Apply A - λ1 I to both sides: the x1 term is killed, and each remaining xj is scaled by λj - λ1, yielding
a linear relation with fewer terms. Since λ1 ≠ λ2, the coefficient of x2 won't become 0 in the new linear relation unless it was already 0, so this new linear relation still has nonzero terms. In this way, each linear relation leads to a linear relation with fewer terms, until we get down to one or two terms, which we already saw can't happen. Therefore there are no linear relations among x1, x2, . . . , xp.
Problem 12.4. Diagonalize 1 A= 2 3 2 2 6 2 2 . 6
Problem 12.6. Prove that a square matrix is diagonalizable (i.e. diagonalized by some matrix) just when it has a basis of eigenvectors.
Remark 12.7. Following remark 9.7 on page 84, F diagonalizes A just when the change of coordinates y = F⁻¹x changes the matrix A into a diagonal matrix.
Problem 12.7. Give an example of a matrix which is not diagonalizable.
Problem 12.8. If A is diagonalized by F , say F 1 AF = diagonal, then prove that A2 is also diagonalized by F . Apply induction to prove that all powers of A are diagonalized by F . Problem 12.9. Use the result of the previous exercise to compute A100000 where 3 2 A= . 4 3
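The observation in problem 12.8 is also how one computes huge powers in practice: if F⁻¹AF = Λ is diagonal, then Aᴺ = F Λᴺ F⁻¹, and Λᴺ just raises each diagonal entry to the N-th power. A rough NumPy sketch, with a made-up diagonalizable matrix (not the one from problem 12.9):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # eigenvalues 1 and 3

lam, F = np.linalg.eig(A)               # columns of F are eigenvectors
N = 10

# A^N = F diag(lam^N) F^{-1}
A_N = F @ np.diag(lam**N) @ np.linalg.inv(F)
assert np.allclose(A_N, np.linalg.matrix_power(A, N))
print(A_N)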
Problem 12.13. Let A be the 5 × 5 matrix all of whose entries are 1.
a. Without any calculation, what is the kernel of A?
b. Use this to diagonalize A.
Problem 12.14. Let's investigate which 2 × 2 matrices are diagonalizable.
a. Prove that every 2 × 2 matrix A can be written uniquely as
A = [ p+q  r-s ]
    [ r+s  p-q ]
for some numbers p, q, r, s.
b. Prove that the characteristic polynomial of A is (p - λ)² + s² - q² - r².
c. Prove that A has two different eigenvalues just when q² + r² > s².
d. Prove that any 2 × 2 matrix with two different eigenvalues is diagonalizable.
e. Prove that any 2 × 2 matrix with only one eigenvalue is diagonalizable just when it is diagonal.
f. Prove that any 2 × 2 matrix with no eigenvalues is not diagonalizable.
12.3 Summary
Linear algebra has two problems: a. Solving linear equations Ax = b for the unknown x. This problem is truly linear. It has a solution x whenever b lies in the image, and the solution x is unique up to adding on vectors from the kernel.
b. Find eigenvectors and eigenvalues Ax = λx. This problem is nonlinear, in fact quadratic, since λ and x are multiplied by one another. The nonlinear part is finding the eigenvalues λ, which are the roots of the characteristic polynomial det (A - λI). There is an eigenspace of solutions x for each λ, and finding a basis of each eigenspace is a linear problem. If we get lucky (which doesn't always happen), then the eigenvectors might form a basis of Rn, diagonalizing A.

Table 12.1: Invertibility criteria (Strang's nutshell [15]). A is n × n. U is any matrix obtained from A by forward elimination.

       Invertible just when . . .
5.1    Gauss-Jordan on A yields 1.
5.2    U is invertible.
5.2    Pivots lie all the way down the diagonal.
5.2    U has no zero rows.
5.2    U has n pivots.
5.2    Ax = b has a solution x for each b.
5.2    Ax = b has exactly one solution x for each b.
5.2    Ax = b has exactly one solution x for some b.
5.2    Ax = 0 only for x = 0.
5.2    A has rank n.
7.4    Aᵗ is invertible.
7.4    det A ≠ 0.
9.2    The columns are linearly independent.
9.2    The columns form a basis.
9.2    The rows form a basis.
10.2   The kernel of A is just the 0 vector.
10.3   The image of A is all of Rn.
11.1   0 is not an eigenvalue of A.
Problem 12.15. Take each of the criteria in table 12.1, and describe an analogous criterion for showing that A is not invertible. For example, instead of det A ≠ 0, you would write det A = 0. Make sure that as many as possible of your criteria express the failure of invertibility in terms of the rank r of the matrix A. For example, instead of turning U has no zero rows into U has a zero row,
13 Inner Product
So far, we havent thought about distances or angles. The elegant algebraic way to describe these geometric notions is in terms of the inner product, which measures something like how strongly in agreement two vectors are.
Problem 13.1. Prove that ⟨A ej, ei⟩ = Aij. Recall that the transpose Aᵗ of a matrix A is the matrix with entries (Aᵗ)ij = Aji, i.e. with rows and columns switched.
Problem 13.3. Let P be a permutation matrix. Use the result of problem 13.1 to prove that P⁻¹ = Pᵗ.
Figure 13.1: The Pythagorean theorem. A right triangle with legs a and b has hypotenuse √(a² + b²). Rearrange the 4 triangles into 2 rectangles to find the area of all 4 triangles. Add the area of the small white square.
Definition 13.4. Vectors u and v are perpendicular if ⟨u, v⟩ = 0.
Definition 13.5. The length of a vector x in Rn is ‖x‖ = √⟨x, x⟩.
This agrees in the plane with the Pythagorean theorem: if x = (a, b) then we can draw x as a point of the plane, and the length along x is √(a² + b²).
Problem 13.4. Prove that for any vectors u and v, with u ≠ 0, the vector
v - (⟨v, u⟩ / ⟨u, u⟩) u
is perpendicular to u.
Problem 13.6. (Due to Ian Christie) a. What is wrong with the clock?
b. At what times of day are the minute and hour hands of a properly functioning clock a) perpendicular? b) parallel? (The answer isnt very pretty.)
Problem 13.7. Prove that ⟨Ax, y⟩ = ⟨x, Aᵗy⟩ for vectors x in Rq, y in Rp and A any p × q matrix.
Problem 13.9. Prove that an n × n matrix A is symmetric just when ⟨Ax, y⟩ = ⟨x, Ay⟩ for x and y any vectors in Rn. Clearly a symmetric matrix is square.
Problem 13.10. Prove that
a. The sum and difference of symmetric matrices is symmetric.
b. If A is a symmetric matrix, then 3A is also a symmetric matrix.
Problem 13.11. Give an example of a pair of symmetric 2 × 2 matrices A and B for which AB is not symmetric.
Problem 13.13. If A is symmetric, and F an invertible matrix, is F AF 1 symmetric? If not, can you give a 2 2 example? Problem 13.14. If A and B are symmetric, is AB + BA symmetric? Is AB BA symmetric?
2 0 1 1
0 , 1 1 , 1
2 0 0 1
0 , 2 1 0
1 2 1 2
Orthogonal matrices are important because they preserve inner products:
Problem 13.16. Prove that a matrix F is orthogonal just when ⟨Fx, Fy⟩ = ⟨x, y⟩ for x and y any vectors. Clearly any orthogonal matrix is square.
Problem 13.17. Prove that
a. The product of orthogonal matrices is orthogonal.
b. The inverse of an orthogonal matrix is orthogonal.
Problem 13.18. If F is orthogonal, and c is a real number, prove that cF is also orthogonal only when c = ±1.
Problem 13.19. Which diagonal matrices are orthogonal?
Problem 13.20. Prove that the matrices
P = [ cos θ  -sin θ ]
    [ sin θ   cos θ ]
are orthogonal.
Problem 13.21. Give an example of a pair of orthogonal 2 2 matrices A and B for which A + B is not orthogonal.
Problem 13.22. By expanding out the expression ⟨x + y, x + y⟩ using the properties of inner products, express the inner product ⟨x, y⟩ of two vectors in terms of their lengths. Use this to prove that a matrix A is orthogonal just when ‖Ax‖ = ‖x‖ for any vector x.
Vectors u1, u2, . . . , un are called orthonormal if
⟨ui, uj⟩ = 1 when i = j, and 0 when i ≠ j.
Example 13.10. The standard basis is orthonormal.
Example 13.11. The basis
u1 = ( √3/2, 1/2 ),   u2 = ( -1/2, √3/2 )
is orthonormal.
Write F = ( u1 u2 . . . un ), the matrix with the vectors u1, u2, . . . , un as its columns. Then Fᵗ F has entries
(Fᵗ F)ij = uiᵗ uj = ⟨ui, uj⟩.
So Fᵗ F = I just when ⟨ui, uj⟩ = 1 for i = j and 0 for i ≠ j. Done: F is orthogonal just when its columns are orthonormal.
orthonormal? Draw a picture of these two vectors. Problem 13.25. Prove that a square matrix is orthogonal just when its rows are orthonormal.
The formal definition: given any linearly independent vectors v1, v2, . . . , vp (as input), the output are orthonormal vectors u1, u2, . . . , up:
w1 = v1,
w2 = v2 - (⟨v2, w1⟩ / ⟨w1, w1⟩) w1,
w3 = v3 - (⟨v3, w1⟩ / ⟨w1, w1⟩) w1 - (⟨v3, w2⟩ / ⟨w2, w2⟩) w2,
...
wj = vj - Σ_{i<j} (⟨vj, wi⟩ / ⟨wi, wi⟩) wi,
uj = (1 / √⟨wj, wj⟩) wj.
Each wj is just vj with all parts pulled off that head in the directions of previous wi's (the directions we are already finished with). At the final step, each uj is just wj rescaled to unit length. We say that we are orthogonalizing the vectors v1, v2, . . . , vp.
Problem 13.26. Orthogonalize
1 2 v1 = 1 , v2 = 0 . 0 2
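Here is a rough Python implementation of the Gram-Schmidt process just described (my sketch, using NumPy). It subtracts projections onto the already-normalized vectors, which is equivalent to the formulas above, and it assumes the input vectors are linearly independent (compare problem 13.30). The input vectors below are made-up, not the ones from problem 13.26.

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in R^n."""
    orthonormal = []
    for v in vectors:
        w = np.array(v, dtype=float)
        # pull off the parts of v heading in the directions already finished with
        for u in orthonormal:
            w = w - np.dot(w, u) * u
        orthonormal.append(w / np.sqrt(np.dot(w, w)))   # rescale to unit length
    return orthonormal

u1, u2 = gram_schmidt([[1, 2, 2], [1, 0, 1]])
print(u1, u2)
assert abs(np.dot(u1, u2)) < 1e-12       # perpendicular
assert abs(np.dot(u1, u1) - 1) < 1e-12   # unit length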
Problem 13.30. Prove that if v1 , v2 , . . . , vp are linearly independent vectors, then each step of GramSchmidt makes sense (no dividing by zero), and the resulting u1 , u2 , . . . , up are an orthonormal basis for the span of v1 , v2 , . . . , vp .
Problem 13.31. Prove that any set of vectors, all of unit length, and perpendicular to one another, is contained in an orthonormal basis. Problem 13.32. If u and v are two vectors in Rn , and every vector w which is perpendicular to u is perpendicular to v , then v = au for some number a.
Problem 13.46. Orthogonalize 1 1 , . 2 0 Problem 13.47. Orthogonalize 0 1 . , 1 1 Problem 13.48. Orthogonalize 1 1 , . 2 1 Problem 13.49. Orthogonalize 0 1 , . 1 2 Problem 13.50. Orthogonalize 1 0 , . 1 2 Problem 13.51. Orthogonalize 1 2 , . 1 1 Problem 13.52. Orthogonalize 1 1 , . 1 0 Problem 13.53. Orthogonalize 1 2 , . 2 2
Problem 13.54. Orthogonalize 2 1 , . 2 2 Problem 13.55. Orthogonalize 0 1 . , 1 0 Problem 13.56. Orthogonalize 1 1 , . 1 0 Problem 13.57. Orthogonalize 1 2 , . 1 0 Problem 13.58. Orthogonalize 1 2 , . 1 1 Problem 13.59. Orthogonalize 0 1 , . 2 1 Problem 13.60. Orthogonalize 1 2 , . 1 1 Problem 13.61. Orthogonalize 2 1 , . 1 2
Problem 13.62. Orthogonalize 1 1 , . 2 1 Problem 13.63. Orthogonalize 2 2 . , 0 1 Problem 13.64. Orthogonalize 1 2 , . 1 1 Problem 13.65. Orthogonalize 1 0 , . 1 1 Problem 13.66. Orthogonalize 1 2 , . 1 0 Problem 13.67. Orthogonalize 2 0 , . 0 2 Problem 13.68. Orthogonalize 0 1 , . 1 1 Problem 13.69. Orthogonalize 1 2 , . 0 2
Problem 13.70. Orthogonalize 0 2 , . 2 1 Problem 13.71. Orthogonalize 2 0 . , 0 2 Problem 13.72. Orthogonalize 2 1 , . 2 1 Problem 13.73. Orthogonalize 0 1 , . 2 1 Problem 13.74. Orthogonalize 2 1 , . 0 1 Problem 13.75. Orthogonalize 1 0 , . 0 1 Problem 13.76. Orthogonalize 2 0 , . 1 1 Problem 13.77. Orthogonalize 2 2 , . 0 1
Problem 13.84. What happens to a basis when you carry out GramSchmidt, if it was already orthonormal to begin with?
First let's try t positive, so we can divide by t, and find 0 ≤ 2⟨Au, w⟩ + t H(w). Let t go to zero, to see that 0 ≤ ⟨Au, w⟩. Next try t negative, and divide by t and then let t go to zero, and see that 0 ≥ ⟨Au, w⟩. Therefore ⟨Au, w⟩ = 0. So every vector w perpendicular to u is also perpendicular to Au. By problem 13.32 on page 127, Au is a multiple of u, so u is an eigenvector.
Problem 14.1. If two eigenvectors of a symmetric matrix have different eigenvalues, prove that they are perpendicular.
Theorem 14.2 (Spectral Theorem). Each symmetric matrix A is diagonalized by an orthogonal matrix F. The columns of F form an orthonormal basis of eigenvectors. We say that F orthogonally diagonalizes A.
Proof. Start with a unit eigenvector u1, given by the minimum principle, say with eigenvalue λ. Take any orthonormal basis u1, u2, . . . , un that starts with this vector, and let F be the matrix with these vectors as columns. Replace A with Fᵗ AF. After replacement, A has e1 as eigenvector: A e1 = λ e1, so the first column of A is λ e1. Because A is symmetric, we see that
A = [ λ  0 ]
    [ 0  B ]
with B a smaller symmetric matrix. By induction on the size of matrix, we can orthogonally diagonalize B.
The previous exercise is vital for calculations: to orthogonally diagonalize, find all eigenvalues, and for each eigenvalue λ find an orthonormal basis u1, u2, . . . of eigenvectors of that eigenvalue λ. All of the eigenvectors of all of the other eigenvalues will automatically be perpendicular to u1, u2, . . . , so put together they make an orthonormal basis of Rn.
Problem 14.2. Find a matrix F which orthogonally diagonalizes the matrix
A = [ 7  6  ]
    [ 6  12 ],
Problem 14.3. Prove that a square matrix is orthogonally diagonalizable just when it is symmetric. Remark 14.3. An elegant (but longer) proof of the spectral theorem can be made along the following lines. Once we have used the minimum principle to nd one eigenvector u1 , we can then look among all unit length vectors x perpendicular
to u1, and see which of these vectors has smallest value for ⟨Ax, x⟩. Call that vector u2. Look among all unit vectors x which are perpendicular to both u1 and u2 for one which has the smallest value of ⟨Ax, x⟩, and call it u3, etc. This recipe will actually generate the eigenvectors for us, although it isn't easy to use either by hand or by computer.
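In floating point arithmetic, the standard tool for orthogonally diagonalizing a symmetric matrix is the symmetric eigenvalue routine. A minimal NumPy sketch (my choice of tool, with a made-up symmetric matrix, not one of the exercises):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])          # symmetric

lam, F = np.linalg.eigh(A)               # eigh is meant for symmetric matrices
# The columns of F form an orthonormal basis of eigenvectors.
assert np.allclose(F.T @ F, np.eye(3))           # F is orthogonal
assert np.allclose(F.T @ A @ F, np.diag(lam))    # F^t A F is diagonal
print(lam)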
Problem 14.5. Find a matrix F which orthogonally diagonalizes the matrix 4 2 0 A = 2 1 0 . 0 0 3 Problem 14.6. Find a matrix F which orthogonally diagonalizes the matrix 7 1 6 1 6 3 7 1 A = 1 6 6 3 1 1 5 3 3 3 Problem 14.7. Find a matrix F which orthogonally diagonalizes 7 24 25 0 25 A= 0 1 0 24 7 0 25 25 Problem 14.8. Let
A = [ 1  0  0 ]
    [ 0  2  0 ]
    [ 0  0  3 ]
What are all of the orthogonal matrices F for which Fᵗ AF is diagonal with entries increasing as we move down the diagonal?
Problem 14.9. If A is symmetric, prove that A2 has the same rank as A. Problem 14.10. If A is n n, prove that At A has no negative eigenvalues, and its eigenvalues are all positive just when A is invertible.
Symmetrizing
If we have a quadratic form in variables x1 and x2, we can write it more symmetrically; for example:
x1 x2 = (1/2) x1 x2 + (1/2) x2 x1.
Making a matrix
Pluck out the quadratic terms in the polynomial to make a matrix. For example:
a x1² + b x1 x2 + c x2² = a x1² + (b/2) x1 x2 + (b/2) x2 x1 + c x2²
becomes
A = [  a   b/2 ]
    [ b/2   c  ].
More generally, ij Aij xi xj becomes A = (Aij ). Because we symmetrized, the matrix is symmetric. Problem 14.12. Make matrices for a. x2 2 2 b. x2 1 + x2 3 2 c. x1 + 2 x1 x2 + 3 2 x2 x1 1 1 d. x2 + x x + 1 2 1 2 2 x2 x1
Diagonalizing
Diagonalize our matrix, by orthogonal change of variables. Then the same orthogonal change of variables will simplify our quadratic form, turning it into a sum of quadratic forms in one variable each. For example, take the quadratic form
23 x1² + 72 x1 x2 + 2 x2².
Symmetrize:
23 x1² + 36 x1 x2 + 36 x2 x1 + 2 x2².
We also let the reader check that if we take new variables y, defined by y = Fᵗ x, i.e. by x = F y, then the same quadratic form is
-25 y1² + 50 y2².
Theorem 14.5 (Decoupling Theorem). Any quadratic form in any number of variables becomes a sum of quadratic forms in one variable each, after a change of variables x = F y given by an orthogonal matrix F.
Remark 14.6. We will say that the quadratic form is diagonalized by the orthogonal matrix.
Proof. The problem comes from the mixed terms, like x1 x2. Symmetrize and write a symmetric matrix A out of the coefficients. Then the quadratic form is Σij Aij xi xj = ⟨Ax, x⟩. Diagonalize A to Λ = Fᵗ AF. Let y = Fᵗ x. Then x = F y, so
⟨Ax, x⟩ = ⟨A F y, F y⟩ = ⟨Fᵗ A F y, y⟩ = ⟨Λ y, y⟩
        = λ1 y1² + ... + λn yn².
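The proof is also a recipe one can run numerically: symmetrize the coefficients into a matrix A, orthogonally diagonalize it, and read off the decoupled form. A NumPy sketch for the made-up form q(x) = 4 x1² + 2 x1 x2 + 4 x2² (my example, not one of the exercises below):

import numpy as np

# q(x) = 4 x1^2 + 2 x1 x2 + 4 x2^2, symmetrized into A with off-diagonal 2/2 = 1.
A = np.array([[4.0, 1.0],
              [1.0, 4.0]])

lam, F = np.linalg.eigh(A)
print(lam)            # the decoupled form is lam[0]*y1^2 + lam[1]*y2^2

# Check at a sample point: q(x) computed directly equals the decoupled form at y = F^t x.
x = np.array([0.3, -1.7])
y = F.T @ x
q_direct = x @ A @ x
q_decoupled = lam @ (y**2)
assert np.isclose(q_direct, q_decoupled)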
Problem 14.19. Diagonalize (a) 7 x1 2 + 12 x1 x2 + 2 x2 2 (b) 2 x1 2 4 x1 x2 x2 2 (c) 6 x1 2 + 18 x1 x2 + 6 x2 2 (d) 3 x1 2 + 6 x1 x2 + 11 x2 2 (e) 9 x1 2 + 6 x1 x2 + x2 2 (f) 8 x1 2 6 x1 x2 Problem 14.20. Diagonalize (a) 8 x1 2 + 18 x1 x2 + 8 x2 2 (b) 7 x1 2 6 x1 x2 x2 2 (c) 11 x1 2 18 x1 x2 + 11 x2 2 (d) 9 x1 2 6 x1 x2 + x2 2 (e) 11 x1 2 12 x1 x2 + 6 x2 2 (f) 5 x1 2 + 12 x1 x2 + 10 x2 2 Problem 14.21. Diagonalize (a) 4 x1 2 + 12 x1 x2 + 9 x2 2 (b) 6 x1 2 6 x1 x2 2 x2 2 (c) 2 x1 x2 (d) 2 x1 2 + 8 x1 x2 + 2 x2 2 (e) 3 x1 2 6 x1 x2 + 11 x2 2 (f) 7 x1 2 + 18 x1 x2 + 7 x2 2 Problem 14.22. Diagonalize (a) 3 x1 2 4 x1 x2 (b) x1 2 + 6 x1 x2 + 7 x2 2 (c) 3 x1 2 + 12 x1 x2 + 8 x2 2 (d) 2 x1 2 2 x1 x2 2 x2 2 (e) 6 x1 x2 + 8 x2 2 (f) 3 x1 2 + 8 x1 x2 + 3 x2 2 Problem 14.23. Diagonalize (a) x1 2 + 2 x1 x2 + x2 2 (b) 7 x1 2 18 x1 x2 + 7 x2 2 (c) 6 x1 2 + 12 x1 x2 + 11 x2 2 (d) 2 x1 2 8 x1 x2 + 2 x2 2 (e) 5 x1 2 + 4 x1 x2 + 2 x2 2 (f) 2 x1 2 + 4 x1 x2 x2 2 Problem 14.24. Diagonalize (a) 4 x1 2 8 x1 x2 + 4 x2 2 (b) 11 x1 2 6 x1 x2 + 3 x2 2 (c) x1 2 6 x1 x2 + 9 x2 2 (d) 2 x1 2 + 6 x1 x2 + 10 x2 2 (e) 4 x1 2 + 4 x1 x2 + x2 2 (f) x1 2 + 12 x1 x2 + 6 x2 2 Problem 14.25. Diagonalize
Problem 14.26. Diagonalize (a) x1 2 + 4 x1 x2 + 2 x2 2 (b) 2 x1 2 4 x1 x2 + 5 x2 2 (c) 7 x1 2 + 6 x1 x2 x2 2 (d) 8 x1 2 18 x1 x2 + 8 x2 2 (e) 3 x1 2 + 2 x1 x2 + 3 x2 2 (f) 9 x1 2 12 x1 x2 + 4 x2 2 Problem 14.27. Diagonalize (a) 5 x1 2 + 8 x1 x2 + 5 x2 2 (b) 6 x1 2 + 12 x1 x2 + x2 2 (c) 3 x1 2 8 x1 x2 + 3 x2 2 (d) 2 x1 x2 (e) x1 2 + 4 x1 x2 + 4 x2 2 (f) 9 x1 2 18 x1 x2 + 9 x2 2 Problem 14.28. Diagonalize (a) 7 x1 2 + 6 x1 x2 x2 2 (b) 2 x1 2 2 x1 x2 + 2 x2 2 (c) 2 x1 2 + 2 x1 x2 2 x2 2 (d) 8 x1 2 + 6 x1 x2 (e) 5 x1 2 12 x1 x2 + 10 x2 2 (f) 6 x1 2 12 x1 x2 + x2 2 Problem 14.29. Diagonalize (a) 2 x1 2 + 2 x1 x2 + 2 x2 2 (b) 6 x1 x2 + 8 x2 2 (c) x1 2 + 8 x1 x2 + x2 2 (d) 5 x1 2 8 x1 x2 + 5 x2 2 (e) 6 x1 2 + 4 x1 x2 + 3 x2 2 (f) 6 x1 2 + 8 x1 x2 + 6 x2 2 Problem 14.30. Diagonalize (a) 4 x1 2 12 x1 x2 + 9 x2 2 (b) 3 x1 2 4 x1 x2 + 6 x2 2 (c) 2 x1 2 12 x1 x2 + 7 x2 2 (d) 5 x1 2 4 x1 x2 + 2 x2 2 (e) 10 x1 2 + 18 x1 x2 + 10 x2 2 (f) 2 x1 2 + 12 x1 x2 + 7 x2 2
Its eigenvalues are λ = 1 and λ = -4. So we can change variables (somehow) to get to
x1² - 4 x2² = 0.
This is just x1 = ±2 x2, a pair of lines intersecting at a point. Since the change of variables is linear, the original quadratic equation also cuts out a pair of lines intersecting at a point.
Example 14.8. Consider the equation
x1² + 4 x1 x2 + x2² = 1.
The eigenvalues (of the associated symmetric matrix) are λ = -1 and λ = 3. So after a linear change of variables, we get
-x1² + 3 x2² = 1.
(The right hand side is a constant, so doesn't change.)
An equation a x1² + b x2² = 1 with a and b of different signs is a hyperbola, while if a and b have the same signs then it is an ellipse. So our last example must be a hyperbola. Warning: until you diagonalize the associated matrix, and look at the eigenvalues, you can't easily see what shape a quadratic equation cuts out. You can't just look at whether the coefficients are positive, or anything obvious like that.
Problem 14.32. What more can you do to normalize a quadratic form if you allow arbitrary invertible matrices instead of orthogonal ones?
14.4 Positivity
Definition 14.9. A quadratic form Q(x) is positive definite if Q(x) > 0 except if x = 0. (Clearly if x = 0 then Q(x) = 0.) For example, Q(x) = ‖x‖² is a positive definite quadratic form on Rn, while Q(x) = x1² + x2² is positive definite on R². But it is not at all clear whether Q(x) = 6 x1² - 12 x1 x2 + 11 x2² is positive definite, because it has positive terms and negative ones. As in figure 14.2 on the facing page, we can also define positive semidefinite forms (Q(x) ≥ 0), negative definite forms (Q(x) < 0 for x ≠ 0), and indefinite forms (not positive semidefinite or negative semidefinite), but they are less important.
Lemma 14.10. A quadratic form Q(x) = ⟨x, Ax⟩ (with a symmetric matrix A) is positive definite just if all of the eigenvalues of A are positive.
Proof. Let λ1, λ2, . . . , λn be the eigenvalues, and change variables to y = F⁻¹x, so x = F y, to diagonalize the quadratic form:
Q = λ1 y1² + λ2 y2² + ... + λn yn².
Figure 14.2: Graphs of quadratic forms in the variables x[1] and x[2]; panel (c) shows an indefinite form.
Suppose that all of the eigenvalues are positive. Clearly this quantity is then positive for nonzero vectors y, because each term is positive or zero, and at least one of y1, y2, . . . , yn is not zero, so gives a positive term. On the other hand, if one of these λj is negative, then take y = ej, and you get Q ≤ 0 but x = F y = F ej ≠ 0.
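Lemma 14.10 gives a practical test: a quadratic form is positive definite exactly when all eigenvalues of its symmetric matrix are positive. A small NumPy sketch (my own check), applied to the form 6 x1² - 12 x1 x2 + 11 x2² from the discussion above, as I read it:

import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positive definiteness of a symmetric matrix via its eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

# Symmetrize 6 x1^2 - 12 x1 x2 + 11 x2^2: off-diagonal entries are -12/2 = -6.
A = np.array([[ 6.0, -6.0],
              [-6.0, 11.0]])
print(np.linalg.eigvalsh(A))      # both eigenvalues positive
print(is_positive_definite(A))    # True: this form is positive definite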
Any sequence of increasing real numbers, all of which are bounded from above by some large enough number, must converge to something. This fact is a property of real numbers which we cannot prove without giving an explicit and precise denition of the real numbers; see Spivak [14] for the complete story. We will just assume that this fact is true. Denition 14.12. A function f (x) of any number of variables x1 , x2 , . . . , xn (writing x for x1 . x= . . xn as a point of Rn ) is continuous if, in order to get f (y ) to stay as close to f (x) as you like, you have only to ensure that y is kept close enough to x. If two numbers are close, their sums, products and dierences are clearly close. The reader should try to prove: Lemma 14.13. The function f (x) = x1 is continuous. Constant functions are continuous. The sum, dierence and product of continuous functions is continuous. Corollary 14.14. Any polynomial function in any nite number of variables is continuous. Proof. Induction on the degree and number of terms of the polynomial. Denition 14.15. A ball in Rn is a set B consisting of all points closer than some distance to some chosen point (called the center of the ball). The distance is called the radius. A closed ball includes also the points of distance equal to the radius (an apple with the skin), while an open ball does not include any such points, only including points of distance less than the radius (an apple without the skin). Denition 14.16. A set S Rn is bounded if it lies in a ball (open or closed). Denition 14.17. A set S Rn is called closed if every point x of Rn not belonging to S can be surrounded by an open ball not belonging to S . Denition 14.18. A closed box is the set of points x = (x1 , . . . , xn ) for which each xj lies in some chosen interval, aj xj bj . An open box is the same but with aj < xj < bj . Lemma 14.19. A closed ball is a closed set, as is a closed box. Proof. Given a closed ball, say of radius r, take any point p not belonging to it, say of distance R from the center, and draw an open ball of radius R r about p. By the triangle inequality, no point in the open ball lies in the closed ball. Given a closed box, and a point p not belonging to it, there must be some coordinate of p which does not satisfy the inequalities dening the closed
box. For example, suppose that the box is cut out by inequalities including a1 x1 b1 , and p fails to satisfy these bounds because p1 > b1 . Then every point q closer to p than p1 b1 will still fail: q1 > b1 . So then a ball of radius p1 b1 around p will not overlap the closed box. Theorem 14.20. Every innite sequence of points in a closed, bounded set has a convergent subsequence. Proof. Suppose that the set is a box. Cover the box with a nite number of small closed boxes (perhaps overlapping). There are innitely many points x1 , x2 , . . . , and only nitely many of the small boxes, so there must be innitely many xj lying in the same small box. Similarly, subdivide that small box into much smaller closed boxes. Repeating, we nd a sequence of closed boxes, like Russian dolls, each contained entirely in the previous one, with innitely many xj in each. We get to choose how small the boxes are going to be at each step, so lets make them get much smaller at each step, with side lengths decreasing as rapidly as we like. Pick out one of these xi points, call it yj , from the j -th nested box, as the point with the smallest possible coordinates among all points of that box. The sequence of points y1 , y2 , . . . must converge, since all of the coordinates of the point yj are constrained by the box yj lies in, and each coordinate only increases with j . If we face a closed, bounded set S , which is not a closed box, then nd a closed box B containing it, and repeat the argument above. The problem is to ensure that the limit x of the sequence constructed belongs to the set S . Even if not, it certainly belongs to B . Since S is closed, if x does not belong to S ,then there must be an open ball around x not containing any points of S . But that open ball can not contain any of the points in the nested boxes, and therefore x cannot be their limit. Theorem 14.21. Every continuous function f on a closed, bounded set attains a maximum and a minimum. Proof. For the moment, lets suppose that our closed, bounded set is just a closed box. Suppose that f has no maximum. So the values of f can get larger and larger, but never peak. Let M be the smallest positive number so that f never exceeds M ; if there is no such number let M = . By denition, f gets as close to M as we like (which, if M = , means simply that f gets as large as we like), but never reaches M . Let x1 , x2 , x3 , . . . be any points of the closed bounded set on which f (xj ) approaches M . Taking a subsequence, we nd xj approaching a limit point x, and by continuity f (xj ) must approach f (x), so f (x) = M . So every continuous function on a closed bounded set has a maximum. If f is a continuous function on a closed bounded set, then f is too, and has a maximum, so f has a minimum.
15 Complex Vectors
The entire story so far can be retold with a cast of complex numbers instead of real numbers. Most of this is straightforward. But there turns out to be an important twist in the complex theory of the inner product. The minimum principle doesnt make any sense in the setting of complex numbers, and the spectral theorem as it was stated just isnt true any more for complex matrices. Moreover, the natural notion of inner product itself is quite dierent for complex vectorsthis new notion leads directly to the complex spectral theorem.
The number r is called the modulus of the complex number (written |z| if z is the complex number). The angle θ is called the argument of the complex number (written arg z if z is the complex number). The modulus is the distance from 0, or the length if we think of (x, y) as a vector.
Theorem 15.2 (de Moivre). Under multiplication of complex numbers, moduli multiply, while arguments add. Under division of complex numbers, moduli divide, while arguments subtract.
Proof. Let z and w be two complex numbers. Write them as z = r cos θ + i r sin θ and w = ρ cos φ + i ρ sin φ. Then calculate
zw = rρ [(cos θ cos φ - sin θ sin φ) + i (sin θ cos φ + cos θ sin φ)].
A trigonometric identity:
cos θ cos φ - sin θ sin φ = cos (θ + φ),
sin θ cos φ + cos θ sin φ = sin (θ + φ).
Division is similar.
Problem 15.1. Explain why every complex number has a square root.
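A two-line numerical check of de Moivre's theorem (my own illustration, using Python's cmath module): under multiplication, absolute values multiply and arguments add.

import cmath

z = 2 * cmath.exp(1j * 0.7)     # modulus 2, argument 0.7
w = 3 * cmath.exp(1j * 1.1)     # modulus 3, argument 1.1

zw = z * w
assert abs(abs(zw) - abs(z) * abs(w)) < 1e-12
assert abs(cmath.phase(zw) - (0.7 + 1.1)) < 1e-12   # 1.8 < pi, so no wraparound
print(abs(zw), cmath.phase(zw))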
Definition 15.3. The conjugate z̄ of a complex number z = x + iy is the number z̄ = x - iy.
Problem 15.2. If z is a complex number, prove that |z|² = z z̄.
We write C for the set of complex numbers, and Cn for the set of vectors z = (z1, z2, . . . , zn) with each of z1, z2, . . . , zn a complex number. We won't go through the effort of translating the theorems above into complex linear algebra, except to say that all of the results before chapter 13 (on inner products) are still true for complex matrices, with identical proofs.
Problem 15.5. The unit disk in the complex plane is the set of complex numbers of modulus less than 1. The unit circle is the set of complex numbers of modulus 1. Draw the unit circle. Pick z a nonzero complex number, and consider its integer powers zⁿ (n ranging over the integers). Prove that either infinitely many of these powers lie inside the unit disk, or all of them lie on the unit circle.
Problem 15.6. Let
zk = cos (2kπ/n) + i sin (2kπ/n).
Use de Moivre's theorem to show that zkⁿ = 1. These are the so-called n-th roots of 1. Why are they all different for k = 0, 1, . . . , n - 1? Draw the 3rd roots of 1 and (in another colour) the 4th roots of 1.
Problem 15.7. With pictures and words, explain what you know about a. z + z , b. z 2 if |z | > 1, c. z if |z | = 1, d. zw if |z | = |w| = 1.
as a complex matrix still only has one eigenvalue, and (up to rescaling) one eigenvector. Two linearly independent eigenvectors are just what we need to diagonalize a 2 × 2 matrix. Clearly we cannot diagonalize A. So complex numbers don't resolve all of the subtleties.
Problem 15.8. Find the (complex) eigenvalues and eigenvectors of
A = [ 0  -1 ]
    [ 1   0 ]
and write down a matrix F which diagonalizes A. Moral of the story: even a matrix like A, which has only real number entries, can have complex number eigenvalues, and complex eigenvectors.
Then |z|² = z z̄ is the squared length x² + y². This motivates:
Definition 15.6. The Hermitian inner product of two vectors z and w in Cn is the complex number
⟨z, w⟩ = z1 w̄1 + z2 w̄2 + ... + zn w̄n.
The curious bars on top of the w terms allow us to write ‖z‖ = √⟨z, z⟩, just as we would for real vectors. Warning: the Hermitian inner product ⟨z, w⟩ is a complex number, not the real number we had in inner products before.
Problem 15.10. Compute ⟨z, w⟩, ‖z‖ and ‖w‖ for
z = ( 1, i ),   w = ( i, 2 + 2i ).
Problem 15.11. Prove that
a. ⟨w, z⟩ is the conjugate of ⟨z, w⟩
b. ⟨cz, w⟩ = c ⟨z, w⟩
c. ⟨z + w, u⟩ = ⟨z, u⟩ + ⟨w, u⟩
d. ⟨z, z⟩ ≥ 0
e. ⟨z, z⟩ = 0 just when z = 0
for z, w and u any complex vectors in Cn and c any complex number.
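For numerical experiments, note that NumPy's vdot conjugates its first argument, so the inner product as defined here (bars on the w's) is vdot(w, z). A small sketch illustrating the properties of problem 15.11 (my own check, not part of the text):

import numpy as np

def herm(z, w):
    """Hermitian inner product <z, w> = sum of z_i * conjugate(w_i)."""
    return np.vdot(w, z)        # np.vdot conjugates its first argument

z = np.array([1 + 2j, -1j, 3.0])
w = np.array([2 - 1j, 4.0, 1 + 1j])
c = 0.5 - 2j

assert np.isclose(herm(w, z), np.conj(herm(z, w)))        # property (a)
assert np.isclose(herm(c * z, w), c * herm(z, w))         # property (b)
assert abs(herm(z, z).imag) < 1e-12 and herm(z, z).real > 0   # properties (d), (e)
print(herm(z, w), np.sqrt(herm(z, z).real))               # <z, w> and the length of z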
wp = vp - Σ_{q<p} ⟨vp, uq⟩ uq,
up = wp / ‖wp‖.
Problem 15.23. Apply the complex GramSchmidt process to nd a unitary basis for the basis 1 1 v1 = , v2 = . i 2
0 = ⟨Az, Az⟩ = ⟨z, A*Az⟩ = ⟨z, AA*z⟩ = ⟨A*z, A*z⟩. So A*z = 0.
Lemma 15.14. If A is normal, then every eigenvector z of A with eigenvalue λ is also an eigenvector of A*, but with eigenvalue λ̄.
Proof. Let B = A - λI. Then Bz = 0. Moreover, B is normal since A is. By the previous lemma, B*z = 0, so A*z = λ̄z.
Replace A by F*AF. After replacement A is still normal, and A e1 = λ e1; the first column of A is λ e1. So
A = [ λ  B ]
    [ 0  C ]
for some smaller matrices B and C. By lemma 15.14, A* e1 = λ̄ e1, so the first column of A* is λ̄ e1, and so B = 0. Moreover, C is also normal, so by induction we can unitarily diagonalize C, and therefore A.
Corollary 15.16. Self-adjoint, skew-adjoint and unitary matrices are unitarily diagonalizable.
Problem 15.27. Let
A = [ 7/2    i/2 ]
    [ -i/2   7/2 ]
a. Is A self-adjoint or skew-adjoint? b. Find the eigenvalues and eigenvectors of A. c. Find a unitary matrix F which diagonalizes A, and unitarily diagonalize A.
Proof. By lemma 15.17, if we choose a large enough disk containing z0 then p(z) has large modulus at each point z around the edge of that disk. Making the disk even larger if need be, we can ensure that the modulus all around the edge is larger than at some chosen point inside the disk. By theorem 14.21 on page 147, there is a point of the disk where p(z) has minimum modulus among all points of that disk. The minimum can't be on the edge. Moreover, the modulus stays large as we move past the edge. Thus any minimum modulus point in that large disk is a minimum modulus point among all points of the plane.
Lemma 15.19. The modulus |p(z)| of any nonconstant polynomial function reaches a minimum just where p(z) reaches zero.
Proof. Take any point z0. Suppose that p(z0) ≠ 0, and let's find a reason why z0 is not a minimum modulus point. Replace p(z) by p(z + z0) if needed, to arrange that z0 = 0. Write out p(z) = a0 + a1 z + a2 z² + ... + an zⁿ. It might happen that a1 = 0, and maybe a2 too. So write p(z) = a0 + ak z^k + ... + an zⁿ, writing down only the nonzero terms, in increasing order of their power of z. Clearly a0 ≠ 0 because p(0) ≠ 0. We can divide by a0 if we wish, which alters modulus only by a positive factor, so let's assume that a0 = 1. We can rotate the z variable, and rescale it, which rotates and scales each coefficient. Thereby arrange ak = -1, so p(z) = 1 - z^k + ... + an zⁿ. Calculate
|p(z)|² = p(z) · (conjugate of p(z)) = 1 - z^k - z̄^k + . . . ,
where the dots indicate terms involving more z and z̄ factors. Write z = r cos θ + i r sin θ. De Moivre's theorem gives
|p(z)|² = 1 - 2 r^k cos kθ + r^(k+1) (. . . ).
The error term (. . . ) is some (probably very complicated) polynomial in r with (complicated) coefficients involving cos θ and sin θ. We don't need to work it out. We only need to know that it is bounded for z near enough to 0, which is clear whatever the terms involved are. Take θ = 0, so z = r. For r > 0 sufficiently small, 2 - r(. . . ) > 0. Multiplying by -r^k,
-2 r^k + r^(k+1) (. . . ) < 0.
Therefore |p(z)|² gets even smaller at the point z = r than at z = 0.
Corollary 15.20. Every nonconstant complex polynomial has a root.
Theorem 15.21 (Fundamental Theorem of Algebra). Every nonconstant complex polynomial p(z) can be factored into linear factors. More specifically,
p(z) = c (z - z1)^d1 (z - z2)^d2 . . . (z - zk)^dk
where c is a constant, z1, z2, . . . , zk are the roots of p(z), and d1, d2, . . . , dk are positive integers, with sum d1 + d2 + ... + dk equal to the degree of p(z).
Proof. We have a root, say z1. Therefore p(z)/(z - z1) is a polynomial, and we apply induction.
Abstraction
16 Vector Spaces
The ideas of linear algebra apply more widely, in more abstract spaces than Rn .
16.1 Denition
Denition 16.1. A vector space V is a set (whose elements are called vectors ) equipped with two operations, addition (written +) and scaling (written ), so that a. Addition laws: a) u + v is in V b) (u + v ) + w = u + (v + w) c) u + v = v + u for any vectors u, v, w in V , b. Zero laws: a) There is a vector 0 in V so that 0 + v = v for any vector v in V . b) For each vector v in V , there is a vector w in V , for which v + w = 0. c. Scaling laws: a) av is in V b) 1 v = v c) a(bv ) = (ab)v d) (a + b)v = av + bv e) a(u + v ) = au + av for any real numbers a and b, and any vectors u and v in V . Because (u + v ) + w = u + (v + w), we never need parentheses in adding up vectors. Example 16.2. Rn is a vector space, with the usual addition and scaling. Example 16.3. The set V of all real-valued functions of a real variable is a vector space: we can add functions (f + g )(x) = f (x) + g (x), and scale functions: (c f )(x) = c f (x). This example is the main motivation for developing an abstract theory of vector spaces. Example 16.4. Take some region inside Rn , like a box, or a ball, or several boxes and balls glued together. Let V be the set of all real-valued functions of that region. Unlike Rn , which comes equipped with the standard basis, there is no standard basis of V . By this, we mean that there is no collection of 161
functions fi we know how to write down so that every function f is a unique linear combination of the fi. Even still, we can generalize a lot of ideas about linear algebra to various spaces like V instead of just Rn. Practically speaking, there are only two types of vector spaces that we ever encounter: Rn (and its subspaces) and the space V of real-valued functions defined on some region in Rn (and its subspaces).
Example 16.5. The set of p × q matrices is a vector space, with the usual matrix addition and scaling.
Problem 16.1. If V is a vector space, prove that
a. 0 · v = 0 for any vector v, and
b. a · 0 = 0 for any scalar a.
Problem 16.2. Let V be the set of real-valued polynomial functions of a real variable. Prove that V is a vector space, with the usual addition and scaling.
Problem 16.3. Prove that there is a unique vector w for which v + w = 0. (Let's always call that vector −v.) Prove also that −v = (−1)v.
We will write u − v for u + (−v) from now on. We define linear relations, linear independence, bases, subspaces, bases of subspaces, and dimension using exactly the same definitions as for Rn.
Remark 16.6. Thinking as much as possible in terms of abstract vector spaces saves a lot of hard work. We will see many reasons why, but the first is that every subspace of any vector space is itself a vector space.
Remark 16.13. A linear map between abstract vector spaces doesn't have an associated matrix; this idea only makes sense for maps T : Rq → Rp.
Example 16.14. Let U and V be two vector spaces. The set W of all linear maps T : U → V is a vector space: we add linear maps by (T1 + T2)(u) = T1(u) + T2(u), and scale by (cT)u = c T u.
Definition 16.15. The kernel of a linear map T : U → V is the set of vectors u in U for which T u = 0. The image is the set of vectors v in V of the form v = T u for some u in U .
Definition 16.16. A linear map T : U → V is an isomorphism if
a. T x = T y just when x = y (one-to-one) for any x and y in U , and
b. for any z in V , there is some x in U for which T x = z (onto).
Two vector spaces U and V are called isomorphic if there is an isomorphism between them. Being isomorphic means effectively being the same for purposes of linear algebra.
Problem 16.9. Prove that a linear map T : U → V is an isomorphism just when its kernel is 0, and its image is V .
Problem 16.10. Let V be a vector space. Prove that 1 : V → V is an isomorphism.
Problem 16.11. Prove that an isomorphism T : U → V has a unique inverse map T⁻¹ : V → U so that T⁻¹ T = 1 and T T⁻¹ = 1, and that T⁻¹ is linear.
Problem 16.12. Let V be the set of polynomials of degree at most 2, and map T : V → R3 by, for any polynomial p,
T p = (p(0), p(1), p(2)).
Prove that T is an isomorphism.
16.3 Subspaces
The definition of a subspace is identical to that for Rn.
Example 16.17. Let V be the set of real-valued functions of a real variable. The set P of continuous real-valued functions of a real variable is a subspace of V .
Example 16.18. Let V be the set of all infinite sequences of real numbers. We add a sequence x1, x2, x3, . . . to a sequence y1, y2, y3, . . . to make the sequence x1 + y1, x2 + y2, x3 + y3, . . . . We scale a sequence by scaling each entry. The set of convergent infinite sequences of real numbers is a subspace of V .
In these last two examples, we see that a large part of analysis is encoded into subspaces of infinite dimensional vector spaces. (We will define dimension shortly.)
Problem 16.14. Describe some subspaces of the space of all real-valued functions of a real variable.
Problem 16.16. Which of the following are subspaces of the vector space of all 3 × 3 matrices?
a. The invertible matrices.
b. The noninvertible matrices.
c. The matrices with positive entries.
d. The upper triangular matrices.
e. The symmetric matrices.
f. The orthogonal matrices.
Problem 16.17.
a. Let H be an n × n matrix. Let P be the set of all matrices A for which AH = HA. Prove that P is a subspace of the space V of all n × n matrices.
16.4 Bases
We define linear combinations, linear relations, linear independence, bases, the span of a set of vectors, eigenvalues, and eigenvectors identically.
Problem 16.18. Find bases for the following vector spaces:
a. The set of polynomial functions of degree 3 or less.
b. The set of 3 × 2 matrices.
c. The set of n × n upper triangular matrices.
d. The set of polynomial functions p(x) of degree 3 or less which vanish at the origin x = 0.
Remark 16.19. When working with an abstract vector space V , the role that has up to now been played by a change of basis matrix will henceforth be played by an isomorphism F : Rn → V . Equivalently, F e1, F e2, . . . , F en is a basis of V .
Example 16.20. Let V be the vector space of polynomials p(x) = a + bx + cx² of degree at most 2. Let F : R3 → V be the map
F (a, b, c) = a + bx + cx².
Clearly F is an isomorphism.
Definition 16.21. The dimension of a vector space V is n if there is an isomorphism F : Rn → V . If there is no such value of n, then we say that V has infinite dimension.
Remark 16.22. We can include the possibility that n = 0 by defining R0 to consist of just a single vector 0, a zero dimensional vector space.
Problem 16.19. Prove that the definition of dimension is well-defined, i.e. that there is either only one such value of n, or no such value of n.
Problem 16.20. Let V be the set of polynomials of degree at most p in n variables. Find the dimension of V . Problem 16.21. Prove that if linear maps satisfy P S = T and P is an isomorphism, then S and T have the same kernel, and isomorphic images.
Problem 16.22. Prove that if linear maps satisfy SP = T , and P is an isomorphism, then S and T have the same image and isomorphic kernels. Problem 16.23. Prove that dimension is invariant under isomorphism.
Problem 16.24. Prove that for any subspace Z of a finite dimensional vector space U , there is a basis for U
z1, z2, . . . , zp, u1, u2, . . . , uq
so that z1, z2, . . . , zp form a basis for Z .
Theorem 16.23. If v1, v2, . . . , vn is a basis for a vector space V , and w1, w2, . . . , wn are any vectors in a vector space W , then there is a unique linear map T : V → W so that T vi = wi.
Proof. If there were two such maps, say S and T , then S − T would vanish on v1, v2, . . . , vn, and therefore by linearity would vanish on any linear combination of v1, v2, . . . , vn, therefore on any vector, so S = T . To see that there is such a map, we know that each vector x in V can be written uniquely as x = x1 v1 + x2 v2 + ⋯ + xn vn. So let's define T x = x1 w1 + x2 w2 + ⋯ + xn wn. If we take two vectors, say x and y, and write them as linear combinations of basis vectors, say with
x = x1 v1 + x2 v2 + ⋯ + xn vn,
y = y1 v1 + y2 v2 + ⋯ + yn vn,
then x + y = (x1 + y1) v1 + ⋯ + (xn + yn) vn, so T (x + y) = (x1 + y1) w1 + ⋯ + (xn + yn) wn = T x + T y, and similarly T (a x) = a T x, so T is linear.
Definition 16.24. If T : U → V is a linear map, and W is a subspace of U , the restriction, written T |W : W → V , is the linear map defined by T |W (w) = T w for w in W , only allowing vectors from W to map through T .
Theorem 16.25. Let T : U → V be a linear transformation of finite dimensional vector spaces. Then
dim ker T + dim im T = dim U.
Proof. Problem 16.24 on the previous page shows that we can pick a basis z1, z2, . . . , zp, u1, u2, . . . , uq for U so that z1, z2, . . . , zp is a basis for ker T . Let w1 = T u1, w2 = T u2, . . . , wq = T uq. Every vector in im T can be written as y = T x for some vector x in U . But then x can be written in terms of the basis, say as
x = a1 z1 + a2 z2 + ⋯ + ap zp + b1 u1 + b2 u2 + ⋯ + bq uq.
So
y = T x = a1 T z1 + a2 T z2 + ⋯ + ap T zp + b1 T u1 + b2 T u2 + ⋯ + bq T uq = b1 w1 + b2 w2 + ⋯ + bq wq.
Therefore w1, w2, . . . , wq span im T . If there is a linear relation between w1, w2, . . . , wq, say 0 = c1 w1 + c2 w2 + ⋯ + cq wq, then 0 = T (c1 u1 + c2 u2 + ⋯ + cq uq). Therefore c1 u1 + c2 u2 + ⋯ + cq uq lies in ker T , and so can be written in terms of the basis z1, z2, . . . , zp of ker T , say as
c1 u1 + c2 u2 + ⋯ + cq uq = d1 z1 + d2 z2 + ⋯ + dp zp,
a linear relation among the vectors of our basis of U , which is impossible. So w1, w2, . . . , wq are linearly independent, and so form a basis of im T . Finally, we have a basis of U with p + q vectors in it, a basis of ker T with p vectors in it, and a basis of im T with q vectors in it, so dim ker T + dim im T = dim U .
Remark 16.26. In this theorem, we could also allow U , V , or both to have infinite dimension, as well as allowing kernel and image to have infinite dimension, with the understanding that ∞ + ∞ = ∞. However, in this book we will content ourselves with finite dimensional vector spaces.
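Theorem 16.25 is easy to test numerically for a map given by a matrix. A minimal sketch, assuming numpy and scipy are available; the random matrix here is an arbitrary stand-in for a map T : R6 → R4.

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(0)
    T = rng.integers(-3, 4, size=(4, 6)).astype(float)   # a linear map T : R^6 -> R^4

    dim_ker = null_space(T).shape[1]            # dimension of the kernel
    dim_im = np.linalg.matrix_rank(T)           # dimension of the image
    print(dim_ker, dim_im, dim_ker + dim_im)    # the sum is 6 = dim R^6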
16.5 Determinants
Definition 16.27. If T : V → V is a linear map taking a finite dimensional vector space to itself, define det T to be det T = det A, where F : Rn → V is an isomorphism, and A is the matrix associated to F⁻¹ T F : Rn → Rn.
Remark 16.28. There is no definition of determinant for a linear map of an infinite dimensional vector space, and there is no general theory to handle such things, although there are many important examples.
Remark 16.29. A map T : U → V between different vector spaces doesn't have a determinant.
Problem 16.25. Prove that the value of the determinant is independent of the choice of isomorphism F .
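The claim of problem 16.25 amounts to the fact that similar matrices have equal determinants, which is easy to check numerically. This is only an illustration, not a proof; the matrices below are arbitrary stand-ins.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))    # matrix of F^{-1} T F in one choice of isomorphism
    G = rng.standard_normal((3, 3))    # a change of basis (invertible with probability 1)
    B = np.linalg.inv(G) @ A @ G       # matrix of the same map in the new basis

    print(np.isclose(np.linalg.det(A), np.linalg.det(B)))   # True, up to rounding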
Problem 16.26. Let V be the vector space of polynomials of degree at most 2, and let T : V → V be the linear map T p(x) = 2 p(x − 1) (shifting a polynomial p(x) to 2 p(x − 1)). For example, T 1 = 2, T x = 2(x − 1), T x² = 2(x − 1)².
a. Prove that T is a linear map.
b. Prove that T is an isomorphism.
c. Find det T .
Problem 16.28. Let V be the vector space of all 2 × 2 matrices. Let A be a 2 × 2 matrix with two different eigenvalues, λ1 and λ2, and eigenvectors x1 and x2 corresponding to these eigenvalues. Consider the linear map T : V → V given by T B = AB (multiplying B on the left by A). What are the eigenvalues of T and what are the eigenvectors? (Warning: the eigenvectors are vectors from V , so they are matrices.) What is det T ?
Problem 16.30. Let V be the vector space of polynomials of degree at most 2, and let T : V → V be defined by T q(x) = q′(x). What is the characteristic polynomial of T ? What are the eigenspaces of T ? Is T diagonalizable?
Problem 16.31. (Due to Peter Lax [6].) Consider the problem of finding a polynomial p(x) with specified average values on each of a dozen intervals on the x-axis. (Suppose that the intervals don't overlap.) Does this problem have a solution? Does it have many solutions? (All you need is a naive notion of average value, but you can consult a calculus book, for example [14], for a precise definition.)
(a) For each polynomial p of degree n, let T p be the vector whose entries are the averages. Suppose that the number of intervals is at least n. Show that T p = 0 only if p = 0.
(b) Suppose that the number of intervals is no more than n. Show that we can solve T p = b for any given vector b.
Problem 16.32. How much of the nutshell (table 12.1 on page 114) can you translate into criteria for invertibility of a linear map T : U → V ? How much more if we assume that U and V are finite dimensional? How much more if we assume that U = V ?
A real vector space equipped with an inner product is called an inner product space. A linear map between inner product spaces is called orthogonal if it preserves inner products.
Theorem 16.31. Every inner product space of dimension n is carried by some orthogonal isomorphism to Rn with its usual inner product.
Proof. Use the Gram–Schmidt process to construct an orthonormal basis, using the same formulas we have used before, say u1, u2, . . . , un. Define a linear map F x = x1 u1 + ⋯ + xn un, for x in Rn. Clearly F is an orthogonal isomorphism.
Example 16.32. Take any symmetric n × n matrix A with positive eigenvalues, and let ⟨x, y⟩_A = ⟨Ax, y⟩ (with the usual inner product on Rn appearing on the right hand side). Then the expression ⟨x, y⟩_A is an inner product. Therefore by the theorem, we can find a change of variables taking it to the usual inner product.
Definition 16.33. A linear map T : V → V from an inner product space to itself is symmetric if ⟨T v, w⟩ = ⟨v, T w⟩ for any vectors v and w.
Theorem 16.34 (Spectral Theorem). Given a symmetric linear map T on a finite dimensional inner product space V , there is an orthogonal isomorphism F : Rn → V for which F⁻¹ T F is the linear map of a diagonal matrix.
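The Gram–Schmidt process used in the proof of theorem 16.31 is short to write out. A minimal sketch with numpy; the function name gram_schmidt and the sample vectors are made up for illustration. Passing a different inner argument (for instance ⟨x, y⟩_A = ⟨Ax, y⟩ as in example 16.32) orthonormalizes with respect to that inner product instead.

    import numpy as np

    def gram_schmidt(vectors, inner=lambda x, y: float(np.dot(x, y))):
        # Orthonormalize a list of linearly independent vectors with respect to `inner`.
        basis = []
        for v in vectors:
            w = np.asarray(v, dtype=float).copy()
            for u in basis:
                w -= inner(w, u) * u          # subtract the component along u
            w /= np.sqrt(inner(w, w))         # normalize
            basis.append(w)
        return basis

    vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    u1, u2, u3 = gram_schmidt(vs)
    print(np.round([np.dot(u1, u2), np.dot(u1, u3), np.dot(u2, u3)], 10))   # all 0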
Problem 16.34. Continuing the previous question, if the points z0, z1, z2, z3 are z0 = 1, z1 = −1, z2 = i, z3 = −i, prove that the map T : V → V given by T p(z) = p(−z) is unitary.
Problem 16.35. Continuing the previous two questions, unitarily diagonalize T .
Problem 16.36. State and prove a spectral theorem for normal complex linear maps T : V → V on a Hermitian inner product space, and define the terms adjoint, normal and unitary for complex linear maps V → V .
17 Fields
Instead of real or complex numbers, we can dream up wilder notions of numbers.
Definition 17.1. A field is a set F equipped with operations + and · so that
a. Addition laws
a) x + y is in F
b) (x + y) + z = x + (y + z)
c) x + y = y + x
for any x, y and z from F .
b. Zero laws
a) There is an element 0 of F for which x + 0 = x for any x from F .
b) For each x from F there is a y from F so that x + y = 0.
c. Multiplication laws
a) xy is in F
b) x(yz) = (xy)z
c) xy = yx
for any x, y and z in F .
d. Identity laws
a) There is an element 1 in F for which x1 = 1x = x for any x in F .
b) For each x ≠ 0 there is a y ≠ 0 for which xy = 1. (This y is called the reciprocal or inverse of x.)
c) 1 ≠ 0.
e. Distributive law
a) x(y + z) = xy + xz
for any x, y and z in F .
We will not ask the reader to check all of these laws in any of our examples, because there are just too many of them. We will only give some examples; for a proper introduction to fields, see Artin [1].
Example 17.2. Of course, the set of real numbers R is a field (with the usual addition and multiplication), as is the set C of complex numbers and the set Q of rational numbers. The set Z of integers is not a field, because the integer 2 has no integer reciprocal.
Example 17.3. Let F be the set of all rational functions p(x)/q(x), with p(x) and q(x) polynomials, and q(x) not the 0 polynomial. Clearly for any pair of rational functions, the sum
p1(x)/q1(x) + p2(x)/q2(x) = (p1(x) q2(x) + q1(x) p2(x)) / (q1(x) q2(x))
is also rational, as is the product, and the reciprocal.
Problem 17.1. Suppose that F is a field. Prove the uniqueness of 0, i.e. that there is only one element z in F which satisfies x + z = x for every element x.
Problem 17.3. Let x be an element of a field F . Prove the uniqueness of the element y for which x + y = 0. Henceforth, we write this y as −x.
Problem 17.4. Let x be an element of a field F . If x ≠ 0, prove the uniqueness of the reciprocal. Henceforth, we write the reciprocal of x as 1/x, and write x + (−y) as x − y.
Problem 17.5. Prove that −(−x) = x.
Example 17.5. Suppose that p is a positive integer. Let F be the set of numbers Fp = {0, 1, 2, . . . , p − 1}. Define addition and multiplication as usual for integers, but if the result is bigger than p − 1, then subtract multiples of p from the result until it lands in Fp, and let that be the definition of addition and multiplication. F2 is the field of Boolean numbers. We usually write x = y (mod p) to mean that x and y differ by a multiple of p. For example, if p = 7, we find
5 · 6 = 30 (mod 7)
= 30 − 28 (mod 7)
= 2 (mod 7).
This is arithmetic in F7. It turns out that Fp is a field for any prime number p.
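In code, arithmetic in Fp is just integer arithmetic followed by taking remainders modulo p. A tiny sketch in Python (not part of the text; p = 7 is the example above):

    p = 7
    add = lambda x, y: (x + y) % p
    mul = lambda x, y: (x * y) % p

    print(mul(5, 6))   # 2, since 30 = 4 * 7 + 2
    print(add(5, 6))   # 4, since 11 = 7 + 4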
Problem 17.6. Prove that Fp is not a field if p is not prime.
The only trick in seeing that Fp is a field is to see why there is a reciprocal. It can't be the usual reciprocal as a number. For example, if p = 7,
6 · 6 = 36 (mod 7)
= 36 − 35 (mod 7)
= 1 (mod 7),
so 6 is its own reciprocal in F7.
Plug in the equation before that:
36 = −3 · 612 + 4 · (2304 − 3 · 612)
= 4 · 2304 − 15 · 612
= 4 · 2304 − 15 · (12132 − 5 · 2304)
= −15 · 12132 + 79 · 2304.
We have it: gcd(a, b) = u a + v b, in our case 36 = −15 · 12132 + 79 · 2304.
What does this algorithm do? At each step downward, we are facing an equation like a − b q = r, so any number which divides into a and b must divide into r and b (the next a and b) and vice versa. The remainders r get smaller at each step, always smaller than either a or b. On the last line, b divides into a. Therefore b is the greatest common divisor of a and b on the last line, and so is the greatest common divisor of the original numbers. We express each remainder in terms of previous a and b numbers, so we can plug them in, cascading backwards until we express the greatest common divisor in terms of the original a and b. In the example, that gives (−15)(12132) + (79)(2304) = 36.
Let's compute a reciprocal modulo an integer. Let's compute 17⁻¹ modulo 1001. Take a = 1001, and b = 17.
1001 − 58 · 17 = 15
17 − 1 · 15 = 2
15 − 7 · 2 = 1
2 − 2 · 1 = 0.
Going backwards,
1 = 15 − 7 · 2
= 15 − 7 · (17 − 1 · 15)
= −7 · 17 + 8 · 15
= −7 · 17 + 8 · (1001 − 58 · 17)
= −471 · 17 + 8 · 1001.
So finally, (−471)(17) + (8)(1001) = 1. Modulo 1001, (−471)(17) = 1. So 17⁻¹ = −471 = 1001 − 471 = 530 (mod 1001).
This is how we can compute reciprocals in Fp: we take a = p, and b the number to reciprocate, and apply the process. If p is prime, the resulting greatest common divisor is 1, and so we get u p + v b = 1, and so v b = 1 (mod p), so v is the reciprocal of b.
Problem 17.7. Compute 15⁻¹ in F79.
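The whole procedure (the forward Euclidean algorithm together with the backwards substitution) is usually called the extended Euclidean algorithm, and is short to program. A sketch in Python, assuming nothing beyond the standard library; it reproduces the two computations above.

    def extended_gcd(a, b):
        # Return (g, u, v) with g = gcd(a, b) = u*a + v*b.
        if b == 0:
            return a, 1, 0
        g, u, v = extended_gcd(b, a % b)
        return g, v, u - (a // b) * v

    g, u, v = extended_gcd(12132, 2304)
    print(g, u * 12132 + v * 2304)      # 36 36

    g, u, v = extended_gcd(1001, 17)
    print(v % 1001)                     # 530, the reciprocal of 17 modulo 1001
    print((530 * 17) % 1001)            # 1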
17.2 Matrices
Matrices with entries from any field F are added, subtracted, and multiplied by the same rules. We can still carry out forward elimination, back substitution, calculate inverses, determinants, characteristic polynomials, eigenvectors and eigenvalues, using the same steps.
Problem 17.10. Let F be the Boolean numbers, and A the matrix
A =
[ 0 1 0 ]
[ 1 0 1 ]
[ 1 1 0 ],
thought of as having entries from F . Is A invertible? If so, find A⁻¹.
All of the ideas of linear algebra worked out for the real and complex numbers have obvious analogues over any field, except for the concept of inner product, which is much more sophisticated. From now on, we will only state and prove results for real vector spaces, but those results which do not require inner products (or orthogonal or unitary matrices) continue to hold with identical proofs over any field.
Problem 17.11. If A is a matrix whose entries are rational functions of a variable t, prove that the rank of A is constant in t, except for finitely many values of t.
Remark 18.2. There is no standard notation in the mathematical literature for A^(ij). We won't refer to A^(ij) again after this chapter. We can also expand down any column, or across any row:
Theorem 18.3.
det A = Σ_i (−1)^{i+j} A_{ij} det A^(ij), for any column j,
= Σ_j (−1)^{i+j} A_{ij} det A^(ij), for any row i,
where A is any square matrix. Finally, we can expand into a sum of permutations:
Theorem 18.4.
det A = Σ (−1)^N A_{i1 1} A_{i2 2} . . . A_{in n},    (18.1)
for any square matrix A, where the sum is over all permutations i1, i2, . . . , in of 1, 2, . . . , n, and N is the number of transpositions in some sequence of transpositions taking the permutation i1, i2, . . . , in back to 1, 2, . . . , n.
Remark 18.5. This permutation formula for the determinant is important for the theory. However, it is too slow as a method for computing determinants. For a 20 × 20 matrix, Gauss–Jordan elimination takes approximately 2489 multiplications and divisions, while the permutation formula takes approximately 46225138155356160000 multiplications, so about 10^16 times as long: too long for any supercomputer that will ever be built.
Remark 18.6. The number (−1)^N is called the sign of the permutation i1, i2, . . . , in. For example, the sign of 1, 3, 2 is (−1)¹ = −1. Keep in mind that a permutation might factor into transpositions in different ways, using different numbers of transpositions.
It is not obvious whether there could be two different ways to carry out a permutation in terms of transpositions, with two different values for (−1)^N (i.e. one with an odd number of transpositions, and the other an even number of transpositions). However, this will follow from the proof below.
Proof. First, let's forget about the minus signs. Run your finger down the first column, and you pick up A_{i1 1}, and multiply it by det A^(i1 1). This det A^(i1 1) is calculated similarly, except that row i1 and column 1 have been deleted, so none of the terms come from that row or column. Therefore (inductively) all of the terms look just right, except for the (very confusing) minus signs. In each term, the numbers i1, i2, . . . , in label deleted rows, in the order in which we delete them, and 1, 2, . . . , n label deleted columns. There is only one way to generate each term, given by the permutation i1, i2, . . . , in, since i1 labels which row we delete first, etc. So each term we have written above shows up just once, with a plus or minus sign. Let's fix the minus signs. The term A11 A22 . . . Ann occurs with a plus sign, because we start with a plus sign when we run our finger down the first column, so it is clear by induction. When we swap rows, we switch the sign of the whole determinant. Each term is different from any other (coming from a different permutation), and has just a plus or minus sign in front of it. Swapping rows can alter signs of terms, and changes the order in which the terms appear, but the same terms are still sitting there. So every term must switch sign when we swap any two rows. The sign in front of A12 A21 A33 A44 . . . Ann, for example, must be −, since it comes about from one swapping (of rows 1 and 2) applied to A11 A22 . . . Ann. The number N of transpositions needed to reorder the permutation i1, i2, . . . , in of rows into 1, 2, . . . , n is the number of minus signs in front of that term. (The minus signs cancel in pairs.)
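The permutation formula is slow but easy to program, which makes it a handy check on small examples. A sketch in Python (not from the text): the sign is computed here by counting inversions, which has the same parity as any sequence of transpositions, and the 3 × 3 matrix is an arbitrary example.

    from itertools import permutations
    import numpy as np

    def sign(perm):
        # (-1)^N, computed by counting inversions.
        n, inv = len(perm), 0
        for a in range(n):
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    inv += 1
        return -1 if inv % 2 else 1

    def det_by_permutations(A):
        n = len(A)
        return sum(sign(p) * np.prod([A[p[c], c] for c in range(n)])
                   for p in permutations(range(n)))

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
    print(det_by_permutations(A), np.linalg.det(A))   # both 18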
for any square matrix A, where the sum is over all permutations j1, j2, . . . , jn of 1, 2, . . . , n, and N is the number of transpositions in some sequence of transpositions taking the permutation j1, j2, . . . , jn back to 1, 2, . . . , n.
Proof. Take a term in the row permutation formula 18.1, say
(−1)^N A_{i1 1} A_{i2 2} . . . A_{in n}.
Scramble the factors back into order by first index, say
(−1)^N A_{1 j1} A_{2 j2} . . . A_{n jn}.
So j1, j2, . . . , jn is the inverse permutation of i1, i2, . . . , in. The inverse permutation can be brought about by reversing the transpositions that bring about the original permutation i1, i2, . . . , in. So it has the same number of transpositions N, and therefore the same sign.
The permutation formula also gives a proof that det(AB) = det A det B: compute
det(AB) = Σ (−1)^N (AB)_{i1 1} (AB)_{i2 2} ⋯ (AB)_{in n}
= Σ (−1)^N ( Σ_{k1} A_{i1 k1} B_{k1 1} ) ( Σ_{k2} A_{i2 k2} B_{k2 2} ) ⋯ ( Σ_{kn} A_{in kn} B_{kn n} )
= Σ_{k1, k2, . . . , kn} B_{k1 1} B_{k2 2} ⋯ B_{kn n} Σ (−1)^N A_{i1 k1} A_{i2 k2} ⋯ A_{in kn}
= Σ_{k1, k2, . . . , kn} B_{k1 1} B_{k2 2} ⋯ B_{kn n} det( A e_{k1}  A e_{k2}  . . .  A e_{kn} ).
If k1 = k2, then two columns in here are equal, so the resulting determinant is zero. Therefore this is really a sum over permutations k1, k2, . . . , kn. Reorder the columns:
= Σ B_{k1 1} ⋯ B_{kn n} (−1)^N det( A e1  . . .  A en )
= det A Σ (−1)^N B_{k1 1} ⋯ B_{kn n}
= det A det B.
Problem 18.4. Prove that a matrix is a permutation matrix just when it is a product of permutation matrices of transpositions.
Problem 18.5. For the reader who has read chapter 13: why are permutation matrices orthogonal? For which permutations is the associated permutation matrix symmetric?
Problem 18.6. Let's write n! (read as n factorial) for how many permutations there are of the numbers 1, 2, . . . , n. Prove that n! = n(n − 1)(n − 2) . . . (2)(1).
a. Prove that |det A| ≤ n! R^n.
b. Using caution with how you pick your pivots, by forward elimination and induction, prove that |det A| ≤ 2^{n(n−1)/2} R^n.
c. Give some evidence as to which is the better bound.
Recall that
det A = Σ_i (−1)^{i+j} A_{ij} det A^(ij), for any column j,
= Σ_j (−1)^{i+j} A_{ij} det A^(ij), for any row i,
where A is any square matrix.
Definition 18.8. The adjugate matrix adj A of a square matrix A is the matrix whose entries are
(adj A)_{ij} = (−1)^{j+i} det A^(ji).
Note the placement of i and j here: reversed from the formula for determinants. So clearly
det A = Σ_j A_{ij} (adj A)_{ji}
for any fixed index i. Some more notation: if A is a matrix, and b a vector, write A_{b,i} for the matrix obtained by replacing column i of A by b.
Theorem 18.9 (Cramer's Rule). If A is invertible, then the solution to Ax = b has entries
x_j = det A_{b,j} / det A.
Proof. Write A as columns:
A = ( a1  a2  . . .  an ).
Expand out:
det A_{b,j} = det ( a1  a2  . . .  b  . . .  an )
= det ( a1  a2  . . .  Ax  . . .  an )
= det ( a1  a2  . . .  (x1 a1 + x2 a2 + ⋯ + xn an)  . . .  an ),
with b, and then Ax, sitting in column j. But a1 already appears in the first column, so x1 a1 has no effect on the result. Similarly for x2 a2, etc., so
= det ( a1  a2  . . .  xj aj  . . .  an )
= xj det A.
Problem 18.8. If A and b have integer entries, and det A = ±1, prove that the solution x of Ax = b has integer entries.
Corollary 18.10. If a matrix A is invertible, then
A⁻¹ = (1 / det A) adj A.
Proof. We need to invert A, so to solve Ax = ej for each vector ej, and then put each solution x in column j of a matrix A⁻¹. By Cramer's rule, the solution of Ax = ej has entries
xi = det A_{ej,i} / det A.
Putting these together into the columns of a matrix, we find components
(A⁻¹)_{ij} = det A_{ej,i} / det A,
where A_{ej,i} is A with column i replaced by ej:
[ A11  A12  . . .  0  . . . ]
[ A21  A22  . . .  0  . . . ]
[  .    .          .        ]
[ Aj1  Aj2  . . .  1  . . . ]
[  .    .          .        ]
[ An1  An2  . . .  0  . . . ]
(the column of zeros with a single 1 in row j sits in column i). Expanding the determinant down that column gives det A_{ej,i} = (−1)^{j+i} det A^(ji) = (adj A)_{ij}, so (A⁻¹)_{ij} = (adj A)_{ij} / det A.
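Both the adjugate formula and Cramer's rule are easy to test numerically on small matrices. A sketch in Python with numpy; the function adjugate and the 2 × 2 example are made up for illustration.

    import numpy as np

    def adjugate(A):
        # (adj A)_{ij} = (-1)^{i+j} det A^(ji)  (delete row j and column i).
        n = A.shape[0]
        adj = np.empty_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
                adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return adj

    A = np.array([[2.0, 1.0], [5.0, 3.0]])      # det A = 1
    print(adjugate(A))                          # [[ 3, -1], [-5, 2]]
    print(np.allclose(np.linalg.inv(A), adjugate(A) / np.linalg.det(A)))   # True

    # Cramer's rule: replace column j of A by b, divide determinants.
    b = np.array([1.0, 2.0])
    x = np.array([np.linalg.det(np.column_stack([b if k == j else A[:, k] for k in range(2)]))
                  for j in range(2)]) / np.linalg.det(A)
    print(np.allclose(A @ x, b))                # True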
Remark 18.11. Cramer's rule shows us that the solution x of Ax = b is a smooth function of A and b, as long as A is invertible. (By smooth, we mean differentiable as many times as you like, with respect to all variables. In this case, the variables are the entries of A and b.) In particular, the entries of A⁻¹ are smooth functions of the entries of A.
Definition 18.12. We will say that two square matrices A and B commute if AB = BA. Similarly, we will say that two linear maps S : V → V and T : V → V of a vector space V commute if ST = TS.
Lemma 18.13 (Lax [6]). Suppose that P(x) and Q(x) are polynomials with n × n matrix coefficients:
P(x) = P0 + P1 x + P2 x² + ⋯ + Pp x^p
Q(x) = Q0 + Q1 x + Q2 x² + ⋯ + Qq x^q.
Their product PQ(x) = P(x) Q(x) is also a polynomial with matrix coefficients. If A is an n × n matrix, then write P(A) to mean
P(A) = P0 + P1 A + P2 A² + ⋯ + Pp A^p.
If A is an n × n matrix commuting with all of the coefficient matrices Q0, Q1, . . . , Qq of Q(x), then PQ(A) = P(A) Q(A).
Proof. Calculate
PQ(x) = P0 Q0 + (P1 Q0 + P0 Q1) x + ⋯ + Pp Qq x^{p+q} = Σ_{j,k} Pj Qk x^{j+k}.
So
PQ(A) = Σ_{j,k} Pj Qk A^{j+k} = Σ_{j,k} Pj A^j Qk A^k = P(A) Q(A),
since A (and hence each power A^j) commutes with each Qk.
Theorem 18.14 (Cayley–Hamilton). Every square matrix A satisfies p(A) = 0 where p(λ) = det(A − λ I) is the characteristic polynomial of A.
Remark 18.15. The proof works equally well over any field, not just the real or complex numbers.
Proof. Let P(λ) = adj(A − λ I) and Q(λ) = A − λ I. Then P(λ) Q(λ) = p(λ). Clearly A commutes with the coefficient matrices of Q(λ) (i.e. A commutes with A), so P(A) Q(A) = PQ(A) = p(A). But Q(A) = A − A = 0.
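The Cayley–Hamilton theorem can be checked numerically on a random example. A sketch with numpy (the random matrix is an arbitrary stand-in); note that numpy's poly gives the coefficients of det(λ I − A), which differs from p(λ) = det(A − λ I) only by a sign (−1)^n, so it is annihilated by A just the same.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.integers(-2, 3, size=(4, 4)).astype(float)

    coeffs = np.poly(A)     # monic characteristic polynomial det(lambda I - A)

    # Evaluate that polynomial at the matrix A itself.
    p_of_A = sum(c * np.linalg.matrix_power(A, k)
                 for k, c in zip(range(len(coeffs) - 1, -1, -1), coeffs))
    print(np.allclose(p_of_A, 0))   # True: A satisfies its characteristic polynomial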
Problem 18.11. If an invertible matrix A has integer entries, prove that A⁻¹ also has integer entries just when det A = ±1.
Problem 18.12. Without any calculation, what can you say about the inverse of the matrix
A =
[ 1   1 + x + 5x³   1 + 2x³        2 + 7x + x³    6 + 3x³       ]
[ 0   1             5 + 7x + 7x³   7 + 7x + 8x³   1 + 5x³       ]
[ 0   0             1              2 + 4x + 3x³   6x + 4x³      ]
[ 0   0             0              1              3 + 3x + 2x³  ]
[ 0   0             0              0              1             ] ?
19.1 Shears
Consider the matrix
A =
[ 1  c ]
[ 0  1 ],
for which
Ax = (x1 + c x2, x2).
We call the transformation taking x to y = Ax a shear. Consider the x1, x2 plane. Draw the x1 axis horizontally, and x2 vertically. Take a rectangle in the x1, x2-plane, with sides along the two axes. The rectangle gets mapped into a parallelogram in the y1, y2-plane.
Problem 19.1. Prove that the bottom of the rectangle stays put, but the top is shifted over by the amount c, while the height stays the same.
The parallelogram has the same area as the rectangle. So the shear preserves the area of the rectangle. If we slide the rectangle away from the origin, the parallelogram slides too in the same way. Consequently all rectangles with sides parallel to the axes have area preserved by any shear.
(Figure: cut the parallelogram, then slide the triangle over to the right side to recover the rectangle.)
19.2 Reflections
The map y = Ax with
A =
[ 0  1 ]
[ 1  0 ]
swaps the x1, x2 coordinates, so clearly takes rectangles to congruent rectangles, and so preserves their area.
for some size p × p of identity matrix. So the image of A lies inside Rp. Take any set X. Suppose that X has volume Vol(X). The set AX is AX = Y × Z for Y some set in Rp, and Z the single point 0 in R^{n−p}. Hence Vol(Z) = 0 and Vol(AX) = Vol(Y) Vol(Z) = 0.
(x/a)² + (y/b)² = 1,
from the well-known fact that a circle of unit radius, x² + y² = 1, has area π.
Equality holds just when one of the vectors is a multiple of the other.
Proof. If y = 0 then the result is obvious. If y ≠ 0, then
0 ≤ ‖ ‖y‖² x − ⟨x, y⟩ y ‖²
= ⟨ ‖y‖² x − ⟨x, y⟩ y, ‖y‖² x − ⟨x, y⟩ y ⟩
= ‖y‖⁴ ‖x‖² − 2 ‖y‖² ⟨x, y⟩² + ⟨x, y⟩² ‖y‖²
= ‖y‖² ( ‖x‖² ‖y‖² − ⟨x, y⟩² ).
Therefore ⟨x, y⟩² ≤ ‖x‖² ‖y‖². On the first line, you see that equality holds just when x is a multiple of y.
Lemma 20.2 (The Triangle Inequality). If x and y are vectors in Rn, then
‖x + y‖ ≤ ‖x‖ + ‖y‖.
Equality holds just when one of the vectors is a nonnegative multiple of the other.
Proof.
‖x + y‖² = ⟨x + y, x + y⟩
= ‖x‖² + 2 ⟨x, y⟩ + ‖y‖²
≤ ‖x‖² + 2 ‖x‖ ‖y‖ + ‖y‖²
= ( ‖x‖ + ‖y‖ )².
Equality requires ⟨x, y⟩ = ‖x‖ ‖y‖, and then by the previous lemma one of the vectors is a multiple of the other, say y = a x; then ⟨x, y⟩ = a ‖x‖² while ‖x‖ ‖y‖ = |a| ‖x‖², so a ≥ 0.
Definition 20.3. Define the line connecting two distinct points x and y in Rn to be the set of points of the form t x + (1 − t) y, and the line segment between x and y to be the set of such points with 0 ≤ t ≤ 1.
Problem 20.3. Recall that for any three points x, y, z in Rn,
‖x − z‖ + ‖z − y‖ ≥ ‖x − y‖.
Prove that when x and y are distinct, equality holds just when z lies on the line segment between x and y.
Problem 20.4. If x and y are distinct, and z = t x + (1 − t) y is on the line segment between x and y, prove that no other point of Rn has the same distances to x and y.
Definition 20.4. A map T : Rn → Rp is a rule associating to each point x of Rn a point T(x) of Rp. An isometry T of Rn is a map T : Rn → Rn, so that the distance between any two points x and y of Rn is the same as the distance between T(x) and T(y).
Theorem 20.5. The isometries of Rn are precisely the maps T(x) = A x + b where A is any orthogonal matrix, and b is any vector in Rn.
Proof. From the exercises,
T(t x + (1 − t) y) = t T(x) + (1 − t) T(y),
for x and y distinct, and t between 0 and 1, because T preserves distances, so preserves the triangle inequality, the lines, line segments, etc. Replace the map T(x) by T(x) − T(0) if needed (just shifting T(x) over) to ensure that T(0) = 0. Taking y = 0, and x ≠ 0, we get
T(t x) = t T(x),
for 0 ≤ t ≤ 1. For t > 1,
t T(x) = t T( (1/t) (t x) ) = t (1/t) T(t x) = T(t x).
To handle minus signs, set y = −x and t = 1/2:
0 = T(0) = T( (1/2) x + (1/2)(−x) ) = (1/2) T(x) + (1/2) T(−x).
So T(−x) = −T(x), and this ensures that T(t x) = t T(x) for all real values of t. Take any nonzero vector y, and any t strictly between 0 and 1:
T(x + y) = T( t (x/t) + (1 − t) (y/(1 − t)) )
= t T(x/t) + (1 − t) T(y/(1 − t))
= T(x) + T(y).
Set A to be the matrix whose columns are T(e1), T(e2), . . . , T(en). Then
T(x) = T( Σ_j x_j e_j ) = Σ_j x_j T(e_j) = Σ_j x_j A e_j = A x.
So T(x) = A x for a matrix A, and ‖A x‖ = ‖x‖, so A is orthogonal.
Problem 20.5. Which maps alter distance by a constant factor?
(where each 1 could be an identity matrix of any size). Call this a rotation in the x_k x_ℓ-plane if the sines and cosines appear in rows k and ℓ. More generally, picking any two perpendicular vectors of unit length, say u and v, and an angle θ, we can define an orthogonal matrix R by asking that R x = x for x perpendicular to u and v, and that R rotate the plane spanned by u and v by the angle θ:
R u = cos θ u + sin θ v
R v = −sin θ u + cos θ v.
Problem 20.7. The minus signs look funny. Check that this gives the expected matrix R if u = e1 and v = e2.
Problem 20.8. How can we be sure that such a matrix R exists?
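One way to convince yourself that such an R exists (a sketch, not the book's argument) is to write a candidate down explicitly: R = 1 + (cos θ − 1)(u uᵗ + v vᵗ) + sin θ (v uᵗ − u vᵗ) fixes everything perpendicular to u and v and rotates their plane as required. Checking this numerically with numpy (the function name and test vectors are made up):

    import numpy as np

    def plane_rotation(u, v, theta):
        # Rotate the plane spanned by orthonormal u, v by theta; fix the rest.
        u, v = np.asarray(u, float), np.asarray(v, float)
        R = np.eye(len(u))
        R += (np.cos(theta) - 1.0) * (np.outer(u, u) + np.outer(v, v))
        R += np.sin(theta) * (np.outer(v, u) - np.outer(u, v))
        return R

    u, v = np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])
    R = plane_rotation(u, v, np.pi / 3)
    print(np.allclose(R.T @ R, np.eye(4)))                              # R is orthogonal
    print(np.allclose(R @ u, np.cos(np.pi/3)*u + np.sin(np.pi/3)*v))    # R u = cos(theta) u + sin(theta) v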
Careful: if we swap the choice of which is u and which is v, we will rotate in the wrong direction. Another kind of orthogonal map is the map taking x to −x, which (for want of a better word) we can call a reversal.
Theorem 20.6. Every orthogonal matrix is a product of rotations in mutually perpendicular planes together with a reversal in some subspace perpendicular to all of those planes.
Remark 20.7. The reversal occurs precisely in the λ = −1 eigenspace of the orthogonal matrix.
Remark 20.8. There might be some vectors which are perpendicular to all of the planes and to the reversing subspace. Such vectors must be fixed: A x = x, and so form the λ = 1 eigenspace of the orthogonal matrix.
Proof. Call the matrix A. This matrix A is a real matrix, but we can think of it as a complex matrix, which just happens to have only real number entries. Since A is an orthogonal matrix, it is normal. By the complex spectral theorem we can find a unitary basis u1, u2, . . . , un in Cn of complex eigenvectors of A, say A u1 = λ1 u1, A u2 = λ2 u2, . . . , A un = λn un. These eigenvalues λ1, λ2, . . . , λn are complex numbers. What sort of complex numbers are they? As in problem 15.21 on page 154, since A is unitary, the eigenvalues of A are complex numbers of modulus 1: 1 = |λ1| = |λ2| = ⋯ = |λn|. Moreover, since A is a matrix of real numbers, taking complex conjugates of the equation A u1 = λ1 u1 gives A \overline{u1} = \overline{λ1} \overline{u1}. Because A is orthogonal, and therefore unitary, ⟨A u1, A \overline{u1}⟩ = ⟨u1, \overline{u1}⟩. Therefore
⟨u1, \overline{u1}⟩ = ⟨A u1, A \overline{u1}⟩ = ⟨λ1 u1, \overline{λ1} \overline{u1}⟩.
Pulling the scalars through the Hermitian inner product,
= λ1 λ1 ⟨u1, \overline{u1}⟩ = λ1² ⟨u1, \overline{u1}⟩.
Therefore either (1) λ1² = 1, or else (2) u1 is perpendicular to \overline{u1}. If (1), then λ1 = ±1, so we are looking at an eigenvector in the reversing subspace, or a fixed vector. The case (2) is trickier: write u1 = x1 + i y1, with x1 and y1 real vectors. Then calculate
⟨u1, \overline{u1}⟩ = ‖x1‖² − ‖y1‖² + 2 i ⟨x1, y1⟩.
Therefore in case (2), we find that x1 and y1 have the same length, and are perpendicular. We can scale x1 and y1 to both have length 1. Since |λ1| = 1, we can write λ1 = cos θ1 + i sin θ1. Expand out the equation A u1 = λ1 u1 into real and imaginary parts, and you find
A x1 = cos θ1 x1 − sin θ1 y1
A y1 = sin θ1 x1 + cos θ1 y1,
a rotation by an angle of θ1. The same results hold for u2, u3, . . . , un in place of u1.
Problem 20.9. Prove that the reversal of an even dimensional space R2n is a product of rotations, each rotating some plane by an angle of π. We can choose any planes we like, as long as they are mutually perpendicular and together span R2n.
Corollary 20.9. An orthogonal matrix is a product of rotations in mutually perpendicular planes just when it has determinant 1.
Definition 20.10. A rotation is an orthogonal matrix of unit determinant.
Problem 20.10. If A is orthogonal, prove that det A = ±1.
Theorem 20.13. Every orthogonal matrix is either a rotation (if it has determinant 1) or the product of a rotation with a reflection (if it has determinant −1).
Proof. By the same procedure as in corollary 20.9, we can arrange by induction without loss of generality that the first n − 1 columns of our orthogonal matrix A are e1, e2, e3, . . . , e_{n−1}. But the proof breaks down at the last column, because to get each column fixed up, the proof needs to have at least two rows. So the last column is ±e_n. Hence det A = ±1, and A = 1 just when det A = 1, and A is reflection in e_n just when det A = −1.
Problem 20.11. Prove that every unitary matrix is a rotation.
Problem 20.12. Use the spectral theorem to show that every n × n unitary matrix is a product of rotations in n mutually perpendicular planes.
Theorem 20.14 (Cartan–Dieudonné). Every n × n orthogonal matrix is a product of at most n reflections. The number of reflections is an even number if it is a rotation, and odd otherwise.
Proof. For a 1 × 1 matrix, the result is obvious. Let A be our n × n orthogonal matrix. Our matrix A is a product of reflections, say in vectors u1, u2, . . . , un, just when F A Fᵗ is a product of reflections in the vectors F u1, F u2, . . . , F un. So we may change orthonormal basis as we please. If u is a fixed vector, i.e. A u = u, then we can rescale u to have unit length, and change basis to get u = e1. Then by induction,
A =
[ 1  0 ]
[ 0  B ]
is a product of at most n − 1 reflections. If A doesn't fix any vector, then take any nonzero vector v. Consider the reflection R in the vector u = A v − v. A simple calculation: R v = A v and R A v = v. Therefore A R A v = A v, so that A R has a fixed vector A v. By induction A R is a product of n − 1 reflections, so A is a product of n reflections.
21 Orthogonal Projections
In chapter 20, we studied geometry of the inner product in Rn . In this chapter, we continue the study in abstract inner product spaces.
Theorem 21.3. Let W be a subspace of a finite dimensional inner product space V . Then every vector v in V can be written in precisely one way as v = x + y with x from W and y from W⊥. The vector x is called the orthogonal projection of v to W . The map P : V → W taking each vector to its orthogonal projection is linear.
Proof. First, let's show that there is at most one way to break up a vector v into a sum x + y. Suppose that v = x0 + y0 = x1 + y1, with x0 and x1 from W and y0 and y1 from W⊥. Then x0 − x1 = y1 − y0. The left hand side lies in W while the right hand side lies in W⊥. But the left hand side must be perpendicular to the right hand side, so perpendicular to itself, so 0. Therefore the right hand side must be 0 too, so x0 = x1 and y0 = y1: there is at most one way to break up each vector v. Next, let's show that there is a way. Take an orthonormal basis for V , say v1, v2, . . . , vn, for which v1, v2, . . . , vp form an orthonormal basis for W . Therefore v_{p+1}, v_{p+2}, . . . , vn lie in W⊥. Write each vector v in V as
v = (a1 v1 + a2 v2 + ⋯ + ap vp) + (a_{p+1} v_{p+1} + a_{p+2} v_{p+2} + ⋯ + an vn),
with the first sum in W and the second in W⊥.
Problem 21.2. Finish the proof by proving that orthogonal projection is a linear map.
Problem 21.3. Suppose that W is a subspace of a finite dimensional inner product space V . Prove that W = (W⊥)⊥.
Problem 21.4. Prove the Pythagorean theorem: if x and y are perpendicular vectors in an inner product space, then ‖x + y‖² = ‖x‖² + ‖y‖².
Problem 21.5. Prove that v = P v + Q v, where P is the orthogonal projection to a subspace W and Q is the orthogonal projection to W⊥.
Lemma 21.4. The orthogonal projection P v of a vector v to a subspace W is the closest point of W to the vector v.
Proof. Write v = x + y with x in W and y in W⊥. Let's see how close a vector w from W can get to v:
‖v − w‖² = ‖(x − w) + y‖² = ‖x − w‖² + ‖y‖²
by the Pythagorean theorem. As we vary w through W , this expression is clearly smallest when w = x.
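When W is given as the image of a matrix Q with linearly independent columns, the orthogonal projection can be computed as P = Q (QᵗQ)⁻¹ Qᵗ, a standard formula that is not derived in this chapter but is easy to check against the results above. A numerical sketch with numpy; the subspace and the vector are arbitrary examples.

    import numpy as np

    # Columns of Q: a basis of a subspace W inside R^4.
    Q = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])

    P = Q @ np.linalg.solve(Q.T @ Q, Q.T)   # projection onto W = image of Q

    v = np.array([1.0, 2.0, 3.0, 4.0])
    x = P @ v                # the closest point of W to v
    y = v - x                # lies in W-perp: perpendicular to every column of Q
    print(np.allclose(Q.T @ y, 0))                        # True
    print(np.allclose(P @ P, P), np.allclose(P, P.T))     # P^2 = P = P^t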
Problem 21.6. Let P be the orthogonal projection to a subspace W of a finite dimensional inner product space V . Prove Bessel's inequality: ‖P v‖ ≤ ‖v‖, and equality holds just when P v = v, which holds just when v lies in W .
Problem 21.7. Prove that a linear map P : V → V of a finite dimensional inner product space V is the orthogonal projection to some subspace if and only if P = P² = Pᵗ.
Problem 21.8. Find analogues of all of the results of this chapter for complex vectors in a finite dimensional Hermitian inner product space.
and V the subspace consisting of the vectors
x = (0, 0, x3),
since we can write any vector x uniquely as
x = (x1, x2, 0) + (0, 0, x3).
Problem 22.2. Give an example of two subspaces of R3 which are not complementary.
Theorem 22.6. U + V is a direct sum U ⊕ V just when U ∩ V consists of just the 0 vector.
Proof. If U + V is a direct sum, then we need to see that U ∩ V only contains the zero vector. If it contains some vector x, then we can write x uniquely as a sum x = y + z, but we can also write x = (1/2)x + (1/2)x or as x = (1/3)x + (2/3)x, as a sum of vectors from U and V . Therefore x = 0. On the other hand, if there is more than one way to write x = y + z = Y + Z for some vectors y and Y from U and z and Z from V , then 0 = (y − Y) + (z − Z), so Y − y = z − Z, a nonzero vector from U ∩ V .
Lemma 22.7. If U ⊕ V is a direct sum of subspaces of a vector space W , then the dimension of U ⊕ V is the sum of the dimensions of U and V . Moreover, putting together any basis of U with any basis of V gives a basis of U ⊕ V .
Proof. Pick a basis for U , say u1, u2, . . . , up, and a basis for V , say v1, v2, . . . , vq. Then consider the set of vectors given by throwing all of the u's and v's together. The u's and v's are linearly independent of one another, because any linear relation
0 = a1 u1 + a2 u2 + ⋯ + ap up + b1 v1 + b2 v2 + ⋯ + bq vq
would allow us to write
a1 u1 + a2 u2 + ⋯ + ap up = −(b1 v1 + b2 v2 + ⋯ + bq vq),
so that a vector from U (the left hand side) belongs to V (the right hand side), which is impossible unless that vector is zero, because U and V intersect only at 0. But that forces 0 = a1 u1 + a2 u2 + ⋯ + ap up. Since the u's are a basis, this forces all a's to be zero. The same for the b's, so it isn't a linear relation. Therefore the u's and v's put together give a basis for U ⊕ V .
We can easily extend these ideas to direct sums with many summands U1 ⊕ U2 ⊕ ⋯ ⊕ Uk.
Problem 22.3. Prove that if U ⊕ V = W , then any linear maps S : U → Rp and T : V → Rp determine a unique linear map Q : W → Rp, written Q = S ⊕ T , so that Q|U = S and Q|V = T .
22.2 Transversality
Lemma 22.9. If V is a finite dimensional vector space, containing two subspaces U and W , then
dim U + dim W = dim(U + W) + dim(U ∩ W).
Proof. Take any basis for U ∩ W . Then while you pick some more vectors from U to extend it to a basis of U , I will simultaneously pick some more vectors from W to extend it to a basis of W . Clearly we can throw our vectors together to get a basis of U + W . Count them up.
This lemma makes certain inequalities on dimensions obvious.
Lemma 22.10. If U and W are subspaces of an n dimensional vector space V , say of dimensions p and q, then
max{0, p + q − n} ≤ dim(U ∩ W) ≤ min{p, q},
max{p, q} ≤ dim(U + W) ≤ min{n, p + q}.
Proof. All inequalities but the first are obvious. The first follows from the last by applying lemma 22.9 on the previous page.
Problem 22.4. How few dimensions can the intersection of subspaces of dimensions 5 and 3 in R7 have? How many?
Definition 22.11. Two subspaces U and W of a finite dimensional vector space V are transverse if U + W = V .
Problem 22.5. How few dimensions can the intersection of transverse subspaces of dimensions 5 and 3 in R7 have? How many?
Problem 22.6. Must subspaces in direct sums be transverse?
22.3 Computations
In Rn, all abstract concepts of linear algebra become calculations.
Problem 22.7. Suppose that U and W are subspaces of Rn. Take a basis for U , and put it into the columns of a matrix, and call that matrix A. Take a basis for W , and put it into the columns of a matrix, and call that matrix B . How do you find a basis for U + W ? How do you see if U + W is a direct sum?
Proposition 22.12. Suppose that U and W are subspaces of Rn and that A and B are matrices whose columns give bases for U and W respectively. Apply the algorithm of chapter 10 to find a basis for the kernel of (A B), say the block vectors
(x1, y1), (x2, y2), . . . , (xs, ys),
where each xj collects the first entries (one for each column of A) and each yj the remaining entries.
Then the vectors A x1, A x2, . . . , A xs form a basis for the intersection of U and W .
Proof. For example, A x1 + B y1 = 0, so A x1 = −B y1 = B(−y1) lies in the image of A and of B. Therefore the vectors A x1, A x2, . . . , A xs lie in U ∩ W . Suppose that some vector v also lies in U ∩ W . Then v = A x = B(−y) for some vectors x and y. But then A x + B y = 0, so the block vector (x, y) lies in the kernel of (A B), and is therefore a linear combination
(x, y) = Σ_j a_j (x_j, y_j)
for some numbers a_j, so v = A x = Σ_j a_j A x_j. Therefore these A x_j span the intersection. Suppose they suffer some linear relation: 0 = Σ c_j A x_j. So 0 = A Σ c_j x_j. But the columns of A are linearly independent, so A is 1-1. Therefore 0 = Σ c_j x_j. At the same time,
0 = Σ c_j A x_j = Σ c_j B(−y_j) = −B Σ c_j y_j,
and since the columns of B are linearly independent, 0 = Σ c_j y_j. So 0 = Σ c_j (x_j, y_j), and since the block vectors (x_j, y_j) form a basis of the kernel, all of the c_j vanish. Therefore A x1, A x2, . . . , A xs are linearly independent, and so form a basis of U ∩ W .
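Proposition 22.12 translates directly into a few lines of numerical code. A sketch with numpy and scipy; the matrices A and B are arbitrary examples whose columns are bases of two subspaces of R4.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0],
                  [0.0, 0.0]])          # basis of U in the columns
    B = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [2.0, 0.0],
                  [0.0, 1.0]])          # basis of W in the columns

    K = null_space(np.hstack([A, B]))   # each column is a block vector (x, y) with Ax + By = 0
    X = K[:A.shape[1], :]               # the x-parts
    intersection_basis = A @ X          # columns span U intersect W
    print(np.round(intersection_basis, 6))   # one column, a multiple of (1, 1, 2, 0)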
23 Jordan Normal Form
The 2 × 2 matrix
A =
[ 0  1 ]
[ 0  0 ]
is the simplest possible example. Its only eigenvalue is λ = 0. As a real or complex matrix, it has only one eigenvector,
(1, 0),
up to rescaling. Not enough eigenvectors to form a basis of R2, so not enough to diagonalize. We will build this entire chapter from this simple example.
Problem 23.1. What does the map taking x to A x look like for this matrix A?
Let's write
Δ1 = (0),  Δ2 =
[ 0  1 ]
[ 0  0 ],  Δ3 =
[ 0  1  0 ]
[ 0  0  1 ]
[ 0  0  0 ],
and in general write Δn, or just Δ, for the square matrix with 1's just above the diagonal, and 0's everywhere else.
Problem 23.2. Prove that Δ e_j = e_{j−1}, except for Δ e_1 = 0.
So we can think of Δ as shifting the standard basis, like the proverbial lemmings stepping forward until e1 falls off the cliff.
A matrix of the form λ + Δ is called a Jordan block. Our goal in this chapter is to prove:
Theorem 23.2. Every square complex matrix A can be brought by a change of basis F to Jordan normal form
F⁻¹ A F =
[ λ1 + Δ                       ]
[          λ2 + Δ              ]
[                   ⋱          ]
[                      λN + Δ  ],
broken into Jordan blocks.
We will not give the simplest possible proof (which is probably the one given by Hartl [4]), but instead give an explicit algorithm for computing F , and then prove that the algorithm works.
Definition 23.3. If λ is an eigenvalue of a matrix A, a vector x is a generalized eigenvector of A with eigenvalue λ if (A − λ)^k x = 0 for some positive integer k. If k = 1 then x is an eigenvector in the usual sense.
Example 23.4.
A =
[ 0  1 ]
[ 0  0 ]
satisfies A² = 0, so every vector x in R2 is a generalized eigenvector of A, with eigenvalue 0. In the generalized sense, we have lots of eigenvectors.
Problem 23.3. Prove that every vector in Cn is a generalized eigenvector of Δ with eigenvalue 0. Then prove that for any number λ, every vector in Cn is a generalized eigenvector of λ + Δ with eigenvalue λ.
Problem 23.4. Prove that no nonzero vector can be a generalized eigenvector with two different eigenvalues.
Problem 23.5. Prove that nonzero generalized eigenvectors of a square matrix, with different eigenvalues, are linearly independent.
Definition 23.5. A string is a collection of linearly independent vectors of the form
x, (A − λ) x, (A − λ)² x, . . . , (A − λ)^k x,
each a generalized eigenvector with eigenvalue λ. We want to make our strings as long as possible, not contained in any longer string.
Example 23.6. For A = Δ, the vectors e_n, e_{n−1}, . . . , e_1 form a string with eigenvalue λ = 0.
Problem 23.6. For A = λ + Δ, show that e_n, e_{n−1}, . . . , e_1 is a string with eigenvalue λ.
Problem 23.7. Find strings of
A =
[ 2  0  0  0  0  0 ]
[ 0  2  1  0  0  0 ]
[ 0  0  2  0  0  0 ]
[ 0  0  0  3  1  0 ]
[ 0  0  0  0  3  1 ]
[ 0  0  0  0  0  3 ].
Problem 23.8. Prove that every nonzero generalized eigenvector x belongs to a string, and the string can be lengthened until the last entry of the string is an eigenvector (not generalized).
23.2 Algorithm
First we give the algorithm, and then an example, and then a proof that it works. To compute the strings that put a matrix A into Jordan normal form:
a. If A is already in Jordan normal form, then just look beside the diagonal of A to find the blocks, and read off the strings. So we can assume that A is not in Jordan normal form.
b. Find an eigenvalue λ of A. Replace A with A − λ. (So from now on A is not invertible.)
c. Apply forward elimination to (A 1). Call the result (U V).
d. Put the pivot columns of A into a matrix A′, and the pivot columns of U into a matrix U′.
e. Solve the system of equations U′X = U A′ for an unknown square matrix X, by back substitution. This matrix X has size r × r, where r is the rank of A (i.e. the number of pivot columns of A). The matrix X is smaller than A, because not all columns of A are pivot columns.
f. Apply the algorithm to the matrix X, to find its strings. (It may save you time to notice that X has the same eigenvalues as A, except perhaps 0.)
g. For each string of X, applying A′ to the string gives a string of A. For example, a string y1, y2, . . . becomes A′y1, A′y2, . . . .
h. From among the strings we have just constructed in our last step, take each vector z which starts a string with eigenvalue 0. Solve the equations U x = V z by back substitution, and add x to the start of this string.
i. We are still missing the strings of length 1 and zero eigenvalue, i.e. vectors from the kernel. Solve U x = 0 by back substitution, and take a basis x1, x2, . . . , xk of solutions.
j. Each of these vectors x1, x2, . . . , xk constitutes a string of length 1 and zero eigenvalue. Add into our list of strings enough of these vectors to produce a basis.
k. Put all of the strings into the columns of a single matrix F, each string listed in reverse order. Then F⁻¹AF is in Jordan normal form.
A computer can carry out the algorithm, using symbolic algebra software. Let's see an example, and then prove that this algorithm always works.
Example 23.7. Let
a. A is not already in Jordan normal form. However, we can see that A is built out of blocks: a 1 1 block, and a 3 3 block. Each block can be separately brought to Jordan normal form, so the strings will divide up into strings for each block. The 1 1 block has e1 as eigenvector (a string of length 1) with eigenvalue 4. So it suces to nd the Jordan normal form for 1 0 0 A = 0 3 0 . 1 1 3 (We will still call this matrix A, even after we make various changes to it, to avoid a mess of notation.) b. The eigenvalues of A are 1 and 3. Replace A by A 3, so 2 0 0 A= 0 0 0 . 1 1 0 c. Forward elimination applied to (A 1) yields 2 0 0 1 1 (U V ) = 0 0 1 2 0 0 0 0 so 2 U = 0 0 0 1 0 0 1 0 , V = 1 2 0 0
0 0 1 0 0 1
0 1 , 0 0 1 . 0
d. The first two columns of A are pivot columns. Cutting out nonpivot columns,
A′ =
[ −2  0 ]
[  0  0 ]
[  1  1 ],
U′ =
[ −2  0 ]
[  0  1 ]
[  0  0 ].
e. Solving U′X = U A′, we find
U′X − U A′ =
[ −2 X11 − 4   −2 X12 ]
[  X21          X22   ]
[  0            0     ].
Therefore
X =
[ −2  0 ]
[  0  0 ].
f. This matrix X is already in Jordan normal form. The strings of X are
λ = −2: e1
λ = 0: e2.
g. These give strings in A:
λ = −2: A′e1 = (−2, 0, 1)
λ = 0: A′e2 = (0, 0, 1).
h. One of the strings has zero eigenvalue, and starts with
z = (0, 0, 1).
Solve U x = V z for an unknown vector x = (x1, x2, x3), finding
U x − V z = (−2 x1, x2 − 1, 0),
so x1 = 0 and x2 = 1, for any number x3. We will pick x3 = 0 for simplicity. Add this vector x to the start of the string: e2, A e2 = e3. So the two strings for A are
λ = −2: (−2, 0, 1)
λ = 0: e2, e3.
i. We can see a basis already, so skip this step.
j. And skip this step too, for the same reason.
Returning to the original problem, we have to first shift back the eigenvalues by 3 to restore the matrix back to
A =
[ 1  0  0 ]
[ 0  3  0 ]
[ 1  1  3 ].
This shifts eigenvalues, but preserves strings:
λ = 1: (−2, 0, 1)
λ = 3: e2, e3.
Next we add back in the original block structure of A, so return to
A =
[ 4  0  0  0 ]
[ 0  1  0  0 ]
[ 0  0  3  0 ]
[ 0  1  1  3 ].
This requires us to relabel our vectors for the second block, giving strings:
λ = 4: e1
λ = 1: (0, −2, 0, 1)
λ = 3: e3, e4.
Writing each string down in reverse order, they become the columns of the matrix
F =
[ 1   0  0  0 ]
[ 0  −2  0  0 ]
[ 0   0  0  1 ]
[ 0   1  1  0 ].
This matrix F brings A to Jordan normal form:
F⁻¹ A F =
[ 4  0  0  0 ]
[ 0  1  0  0 ]
[ 0  0  3  1 ]
[ 0  0  0  3 ].
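Symbolic algebra software will carry out the whole computation. A sketch with sympy, checking the example above (sympy may list the Jordan blocks in a different order):

    import sympy as sp

    A = sp.Matrix([[4, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 3, 0],
                   [0, 1, 1, 3]])

    P, J = A.jordan_form()                     # A = P * J * P**-1
    print(J)                                   # blocks with eigenvalues 4, 1, and a 2x2 block for 3
    print(sp.simplify(P.inv() * A * P - J))    # the zero matrix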
Proposition 23.8. The algorithm takes any complex n × n matrix, and yields strings whose entries are a basis of Cn.
Proof. Clearly this is true for a 1 × 1 matrix. Let's imagine that A is n × n, and that we have proven this proposition for any smaller complex matrix. U = V A, so dropping pivotless columns gives U′ = V A′. Therefore U′X = U A′ just when A′X = A A′. This implies A′(X − λ) = (A − λ) A′, and so A′(X − λ)^k = (A − λ)^k A′ for any integer k. Since A′ has linearly independent columns, only 0 belongs to its kernel. Since the columns of A′ and of A both span the image of A, the images of A and A′ are the same. So mapping a vector y in Cr to the vector A′y in Cn makes a correspondence between vectors in Cr and vectors in the image of A. A vector y starts a string for X just when A′y starts a corresponding string for A (of the same length with the same eigenvalue). By induction, we can assume that the strings for X produce a basis of Cr. We write down the corresponding strings for A in step g. So we have a basis of strings for the image of A, not enough to make a basis of Cn. Let's make some more. First, let's think about strings of A of eigenvalue λ = 0. They look like x, Ax, A²x, . . . . Every string of A that we have constructed so far lies in the image of A. We can always lengthen the λ = 0-strings, because if the first vector in the string, say z, lies in the image, say z = Ax, then we can add x to the front of the string. After that first vector, which is not in the image, the rest of the string is in the image. This explains step h. But there could still be λ = 0-strings which have no vectors in the image. Such a λ = 0-string must have length 1, since any λ = 0-string x, Ax, . . . has Ax in the image. This explains step i. We can't build any more linearly independent λ = 0-strings, or lengthen any of the λ = 0-strings we have. We need to see that our strings form a basis of Cn. First we will check that all of the vectors in all of the strings are linearly independent, and then we will count them. Take any vectors x1, x2, . . . , xq, so that any two are either from different strings, or from different positions on the same string. Suppose that they satisfy
a linear relation. If none of these xj head up a λ = 0-string, then they live in the image, and are linearly independent by construction. There cannot be a linear relation between generalized eigenvectors with different eigenvalues, so we can assume that all of these xj live in λ = 0-strings. Any linear relation 0 = c1 x1 + c2 x2 + ⋯ + cp xp entails a linear relation 0 = c1 A x1 + c2 A x2 + ⋯ + cp A xp, pushing each vector down the string. None of these vectors can occur at the same position, so there is no such relation unless all of the A xj vanish. Therefore we can assume that all of the xj are in λ = 0-strings of length 1. But these are linearly independent by construction. We have to count how many vectors we have written down in all of the strings put together. A is n × n, and has rank r. The image of A has dimension r and the kernel has dimension n − r. In step f, we picked up strings containing r vectors. In step h, we added one vector to the front of each λ = 0-string. Such a string ends in the intersection of kernel and image. Suppose that the kernel and image have m dimensional intersection; there are m such strings. Therefore we have taken r vectors, and added m more. In step i, we added one more vector for each dimension of the kernel outside the image. This adds n − r − m more vectors, giving a total of r + m + (n − r − m) = n vectors.
Theorem 23.9. Every complex square matrix A can be brought to Jordan normal form
F⁻¹ A F =
[ λ1 + Δ                       ]
[          λ2 + Δ              ]
[                   ⋱          ]
[                      λN + Δ  ]
by the invertible complex matrix F obtained from the algorithm.
Proof. Take the strings and write each one down in reverse order into the columns of a matrix F . Each string now becomes e_i, e_{i−1}, e_{i−2}, . . . , just as in a Jordan block. Replace A by F⁻¹ A F , so we can assume that A e_i = λ_i e_i + e_{i−1}. This gives us the i-th column of A, the same as for the Jordan normal form.
Corollary 23.10. The same theorem is true for real square matrices, as long as all of the complex eigenvalues are real numbers.
Proof. Use the same proof.
Problem 23.9. For an n × n matrix A with entries in a field F, show that we can put A into Jordan normal form as P A P⁻¹, using a square matrix P with entries from F, just when the characteristic polynomial det(A − λ I) splits into n linear factors with coefficients from F.
is, as long as ε1 ≠ ε2, since these ε1 and ε2 are the eigenvalues of this matrix. It doesn't matter how small ε1 and ε2 are. The same idea clearly works for Δ of any size, and for λ + Δ of any size, and so for any matrix in Jordan normal form.
Theorem 23.12. Every complex square matrix can be approximated as closely as we like by a diagonalizable square matrix.
Remark 23.13. By approximated, we mean that we can make a new matrix with entries all as close as we wish to the entries of the original matrix.
Proof. Put your matrix into Jordan normal form, say F⁻¹ A F is in Jordan normal form, and then use the trick from the last example to make diagonalizable matrices B close to F⁻¹ A F . Then F B F⁻¹ is diagonalizable too, and is close to A.
Remark 23.14. Using the same proof, we can also approximate any real matrix arbitrarily closely by diagonalizable real matrices (i.e. diagonalized by real change of basis matrices), just when its eigenvalues are real.
Problem 23.12.
A =
[ 0  0  1 ]
[ 0  0  0 ]
[ 0  0  0 ]
Problem 23.13. Thinking about the fact that Δ has string e_n, e_{n−1}, . . . , e_1, what is the Jordan normal form of A = Δ_n²? (Don't try to find the matrix F bringing A to that form.)
Problem 23.15. Use the algorithm to compute the Jordan normal form of
A =
[ 0  0  1  0  0 ]
[ 0  0  0  1  0 ]
[ 0  0  0  0  1 ]
[ 0  0  0  0  0 ]
[ 0  0  0  0  0 ].
Problem 23.16. Without computation (and without finding the matrix F taking A to Jordan normal form), explain how you can see the Jordan normal form of
[ 1  10  100 ]
[ 0  20  200 ]
[ 0   0  300 ].
Problem 23.17. If a square complex matrix A satisfies a complex polynomial equation f(A) = 0, show that each eigenvalue of A must satisfy the same equation.
Problem 23.18. Prove that any reordering of Jordan blocks can occur by changing the choice of the matrix F we use to bring a matrix A to Jordan normal form.
Problem 23.19. Prove that every matrix is a product of two diagonalizable matrices.
Problem 23.20. Suppose that to each n × n matrix A we assign a number D(A), and that D(AB) = D(A) D(B).
(a) Prove that D(P⁻¹ A P) = D(A) for any invertible matrix P .
(b) Define a function f(x) by
f(x) = D( diag(x, 1, 1, . . . , 1) ).
Prove that f(x) is multiplicative; i.e. f(ab) = f(a) f(b) for any numbers a and b.
(c) Prove that on any diagonal matrix
A = diag(a1, a2, . . . , an)
we have D(A) = f(a1) f(a2) . . . f(an).
(d) Prove that on any diagonalizable matrix A, D(A) = f(λ1) f(λ2) . . . f(λn), where λ1, λ2, . . . , λn are the eigenvalues of A (counted with multiplicity).
(e) Use the previous exercise to show that D(A) = f(det(A)) for any matrix A.
(f) In this sense, det is the unique multiplicative quantity associated to a matrix, up to composing with a multiplicative function f . What are all continuous multiplicative functions f ? (Warning: it is a deep result that there are many discontinuous multiplicative functions.)
Problem 24.2. Use the Euclidean algorithm (subsection 17.5 on page 175) applied to polynomials instead of integers, to compute the greatest common divisor r(x) of a(x) = x^4 + 2 x^3 + 4 x^2 + 4 x + 4 and b(x) = x^5 + 2 x^3 + x^2 + 2. Find polynomials u(x) and v(x) so that u(x) a(x) + v(x) b(x) = r(x). Given any pair of polynomials a(x) and b(x), the Euclidean algorithm writes their greatest common divisor r(x) as r(x) = u(x) a(x) + v(x) b(x), a linear combination of a(x) and b(x). Similarly, if we have any number of polynomials, we can write the greatest common divisor of any pair of them as a linear combination. Pick two pairs of polynomials, and write the greatest common divisor of the greatest common divisors as a linear combination, etc. Keep going until you hit the greatest common divisor of the entire collection. We can unwind this process, to write the greatest common divisor of the entire collection of polynomials as a linear combination of the polynomials themselves.
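If you want to check such a computation by machine, here is a sketch of the extended Euclidean algorithm for polynomials in Python (the sympy library and its div function are assumed; the normalization of the output gcd is left to the reader):

    # Extended Euclid: returns (u, v, r) with u*a + v*b = r, a greatest common divisor of a and b.
    from sympy import symbols, div, expand, simplify

    x = symbols('x')

    def extended_gcd(a, b):
        u0, v0, u1, v1 = 1, 0, 0, 1
        while b != 0:
            q, rem = div(a, b, x)              # a = q*b + rem
            a, b = b, expand(rem)
            u0, u1 = u1, expand(u0 - q * u1)
            v0, v1 = v1, expand(v0 - q * v1)
        return u0, v0, a                        # gcd, not normalized to be monic

    a = x**4 + 2*x**3 + 4*x**2 + 4*x + 4
    b = x**5 + 2*x**3 + x**2 + 2
    u, v, r = extended_gcd(a, b)
    print(r)                                    # a common divisor of a and b
    print(simplify(u*a + v*b - r))              # 0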
Problem 24.3. For integers 2310, 990 and 1386 (instead of polynomials) express their greatest common divisor as an integer linear combination of them.
Generalized Eigenvectors
Lemma 24.1. Let T : V → V be a complex linear map on a finite dimensional vector space V of dimension n. For each vector v in V there is a polynomial q(x) of degree at most n for which q(T) v = 0, and for which the roots of q(x) are eigenvalues of T. Remark 24.2. We could just let q(x) be the characteristic polynomial of T, and employ the Cayley–Hamilton theorem (theorem 18.14 on page 187). But we will opt for a more elementary proof. Proof. Take a vector v in V, and consider the vectors v, T v, T^2 v, . . . , T^n v. There are n + 1 vectors in this list, so there must be a linear relation a_0 v + a_1 T v + ⋯ + a_n T^n v = 0. Let q(x) = a_0 + a_1 x + ⋯ + a_n x^n. Clearly q(T) v = 0. Rescale to get the leading coefficient (the highest nonzero a_j) to be 1. Suppose that q(λ) = 0. If λ is not an eigenvalue of T, then T − λ is invertible, and we can drop the x − λ factor from q(x) and still satisfy q(T) v = 0.
Theorem 24.3. Let T : V → V be a complex linear map on a finite dimensional vector space. Every vector in V can be written as a sum of generalized eigenvectors, in a unique way. In other words, V is the direct sum of the generalized eigenspaces of T.
Proof. We have seen in the previous chapter that generalized eigenvectors with different eigenvalues are linearly independent. We need to show that every vector v can be written as a sum of generalized eigenvectors. Pick any vector v in V. Suppose that T has eigenvalues λ_1, λ_2, . . . , λ_s, all distinct from one another. Pick q(x) as in lemma 24.1. By the fundamental theorem of algebra, we can write q(x) as a product of linear factors, each of the form x − λ_j. Let q_j(x) be the result of dividing out as many factors of x − λ_j as possible from q(x), say:
q_j(x) = q(x) / (x − λ_j)^{d_j}.
The various q_j(x) have no common divisors by construction. Therefore we can find polynomials u_j(x) for which Σ_j u_j(x) q_j(x) = 1. Let v_j = u_j(T) q_j(T) v. Clearly Σ_j v_j = v. These v_j are generalized eigenvectors:
(T − λ_j)^{d_j} v_j = (T − λ_j)^{d_j} q_j(T) u_j(T) v = q(T) u_j(T) v = u_j(T) q(T) v = 0.
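A small computational check of theorem 24.3 (a sketch, not from the text; the sympy library is assumed and the sample matrix is arbitrary): the generalized eigenspaces, computed as kernels of (T − λ)^n, have dimensions adding up to dim V.

    # Check that the generalized eigenspaces of a sample matrix fill up the whole space.
    from sympy import Matrix, eye

    T = Matrix([[2, 1, 0, 0],
                [0, 2, 0, 0],
                [0, 0, 3, 1],
                [0, 0, 0, 3]])
    n = T.rows

    total = 0
    for lam in T.eigenvals().keys():
        gen_eigenspace = ((T - lam * eye(n)) ** n).nullspace()   # generalized eigenvectors for lam
        total += len(gen_eigenspace)
    print(total == n)    # True: the dimensions add up to dim V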
Definition 24.6. The minimal polynomial of a square matrix A (or a linear map T : V → V) is the smallest degree polynomial m(x) = x^d + a_{d−1} x^{d−1} + ⋯ + a_0 (with complex coefficients) for which m(A) = 0 (or m(T) = 0). Lemma 24.7. There is a unique minimal polynomial for any linear map T : V → V on a finite dimensional vector space V. The minimal polynomial divides every other polynomial s(x) for which s(T) = 0.
Remark 24.8. The Cayley–Hamilton theorem (theorem 18.14 on page 187) coupled with this lemma ensures that the minimal polynomial divides the characteristic polynomial. Proof. For example, if T satisfies two polynomials, say 0 = T^3 + 3 T + 1 and 0 = 2 T^3 + 1, then we can rescale the second equation by 1/2 to get 0 = T^3 + 1/2, and then we have two equations which both start with T^3, so just take the difference: 0 = (T^3 + 3 T + 1) − (T^3 + 1/2). The point is that the T^3 terms wipe each other out, giving a new equation of lower degree. Keep going until you get the lowest degree possible nonzero polynomial. Rescale to get the leading coefficient to be 1. If s(x) is some other polynomial, and s(T) = 0, then divide m(x) into s(x), say s(x) = q(x) m(x) + r(x), with remainder r(x) of smaller degree than m(x). But then 0 = s(T) = q(T) m(T) + r(T) = r(T), so r(x) has smaller degree than m(x), and r(T) = 0. But m(x) is already the smallest degree possible without being 0. So r(x) = 0, and m(x) divides s(x). Problem 24.6. Prove that the minimal polynomial of the n × n nilpotent Jordan block is m(x) = x^n.
Problem 24.7. Prove that the minimal polynomial of an n × n Jordan block with eigenvalue λ is m(x) = (x − λ)^n.
Lemma 24.9. If A and B are square matrices with minimal polynomials m_A(x) and m_B(x), then the matrix
C = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}
has minimal polynomial m_C(x) the least common multiple of the polynomials m_A(x) and m_B(x). Proof. Calculate that
C^2 = \begin{pmatrix} A^2 & 0 \\ 0 & B^2 \end{pmatrix},
etc., so for any polynomial q(x),
q(C) = \begin{pmatrix} q(A) & 0 \\ 0 & q(B) \end{pmatrix}.
Let g(x) be the least common multiple of the polynomials m_A(x) and m_B(x). Then clearly g(C) = 0. So m_C(x) divides g(x). But m_C(C) = 0, so m_C(A) = 0. Therefore m_A(x) divides m_C(x). Similarly, m_B(x) divides m_C(x). So m_C(x) is the least common multiple.
Using the same proof:
Lemma 24.10. If a linear map T : V → V has invariant subspaces U and W so that V = U ⊕ W, then the minimal polynomial of T is the least common multiple of the minimal polynomials of T|_U and T|_W.
Lemma 24.11. The minimal polynomial m(x) of a complex linear map T : V → V on a finite dimensional vector space V is
m(x) = (x − λ_1)^{d_1} (x − λ_2)^{d_2} ⋯ (x − λ_s)^{d_s}
where λ_1, λ_2, . . . , λ_s are the eigenvalues of T and d_j is no larger than the dimension of the generalized eigenspace of λ_j. Remark 24.12. In fact d_j is the size of the largest Jordan block with eigenvalue λ_j in the Jordan normal form. Let's first prove the result using Jordan normal form. Proof. We can assume that T is already in Jordan normal form: the minimal polynomial is the least common multiple of the minimal polynomials of the blocks. Remark 24.13. Next, a proof which doesn't use Jordan normal form. Proof. We need only prove the result on each generalized eigenspace, since they form a direct sum. We can assume that V is a single generalized eigenspace, say with eigenvalue λ. The result holds for T just if it holds for T − λ, so we can assume that λ = 0 is our only eigenvalue. By lemma 24.1 on page 226, every vector v satisfies T^n v = 0. So the minimal polynomial must divide x^n. Corollary 24.14. A square matrix A (or linear map T : V → V) is diagonalizable just when it satisfies a polynomial equation s(A) = 0 (or s(T) = 0) with
s(x) = (x − λ_1)(x − λ_2) ⋯ (x − λ_s),
for some distinct numbers λ_1, λ_2, . . . , λ_s, which happens just when its minimal polynomial is a product of distinct linear factors.
Problem 24.9. Prove that the minimal polynomial of any 2 × 2 matrix A is m(λ) = λ^2 − (tr A) λ + det A (where tr A is the trace of A), unless A is a multiple of the identity matrix, say A = c for some number c, in which case m(λ) = λ − c. Problem 24.10. Use Jordan normal form to prove the Cayley–Hamilton theorem: every complex square matrix A satisfies p(A) = 0, where p(λ) = det(A − λ I) is the characteristic polynomial of A.
Problem 24.11. Prove that if A is a square matrix with real entries, then the minimal polynomial of A has real coefficients.
Problem 24.12. If A is a square matrix, and A^n = 1, prove that A is diagonalizable over the complex numbers. Give an example to show that A need not be diagonalizable over the real numbers.
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
Clearly B has n^2 rows. Apply forward elimination to B, and call the resulting matrix U. If one of the columns, let's say the column of A^3, is not a pivot column, then A^3 is a linear combination of lower powers of A, so therefore A^4 is too, etc. So as soon as you hit a pivotless column of B, all subsequent columns are pivotless. Therefore U has pivots straight down the diagonal, until you hit rows of zeros. Cut out all of the pivotless columns of U except the first pivotless column. Also cut out the zero rows. Then apply back substitution, turning U into
\begin{pmatrix} 1 & & & & a_0 \\ & 1 & & & a_1 \\ & & \ddots & & \vdots \\ & & & 1 & a_p \end{pmatrix}.
Then the minimal polynomial is m(x) = x^{p+1} − a_p x^p − ⋯ − a_1 x − a_0. To see that this works, notice that we have cut out all but the column of A^{p+1}, the smallest power of A that is a linear combination of lower powers. So the minimal polynomial has to express A^{p+1} as a linear combination of lower powers, i.e. solve the linear equations
a_0 I + a_1 A + ⋯ + a_p A^p = A^{p+1}.
These equations yield the matrix B with a_0, a_1, . . . , a_p as the unknowns, and we just apply elimination. On large matrices, this process is faster than finding the determinant. But it has the danger that small perturbations of the matrix entries alter the minimal polynomial drastically, so we can only apply this process when we know the matrix entries precisely. Problem 24.13. Find the minimal polynomial of
\begin{pmatrix} 0 & 1 & 2 \\ 2 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.
What are the eigenvalues?
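Here is a sketch of this elimination idea in code (the numpy library is assumed; least squares stands in for hand elimination, and, as the text warns, exact arithmetic would be safer in practice):

    # Minimal polynomial by stacking vec(I), vec(A), vec(A^2), ... as columns and finding
    # the first power of A that is a combination of the earlier ones.
    import numpy as np

    def minimal_polynomial_coeffs(A, tol=1e-9):
        n = A.shape[0]
        powers = [np.eye(n)]
        for _ in range(n):
            powers.append(powers[-1] @ A)
        cols = [P.flatten() for P in powers]          # vec(I), vec(A), vec(A^2), ...
        for p in range(1, n + 1):
            B = np.column_stack(cols[:p])
            target = cols[p]
            a = np.linalg.lstsq(B, target, rcond=None)[0]
            if np.linalg.norm(B @ a - target) < tol:
                # A^p = a[0] I + a[1] A + ... + a[p-1] A^(p-1), so
                # m(x) = x^p - a[p-1] x^(p-1) - ... - a[0]
                return np.concatenate([[1.0], -a[::-1]])
        return None

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 2.0]])
    print(minimal_polynomial_coeffs(A))   # coefficients of m(x) = x^2 - 4x + 4, highest degree first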
Definition 24.15. If T : V → W is a linear map, and U is a subspace of V, recall that the restriction T|_U : U → W is defined by T|_U u = T u for u in U. Definition 24.16. Suppose that T : V → V is a linear map from a vector space back to itself, and U is a subspace of V. We say that U is invariant under T if whenever u is in U, T u is also in U. A difficult result to prove by any other means: Corollary 24.17. If a linear map T : V → V on a finite dimensional vector space is diagonalizable, then its restriction to any invariant subspace is diagonalizable. Proof. The linear map satisfies the same polynomial equation, even after restricting to the subspace. Problem 24.14. Prove that a linear map T : V → V on a finite dimensional vector space is diagonalizable just when every subspace invariant under T has a complementary subspace invariant under T. Definition 24.18. A linear map N : V → V from a vector space back to itself is called nilpotent if there is some positive integer k for which N^k = 0. Clearly a linear map on a finite dimensional vector space is nilpotent just when its minimal polynomial is p(x) = x^k for some positive integer k. Corollary 24.19. A linear map N : V → V is nilpotent just when the restriction of N to any N-invariant subspace is nilpotent. Problem 24.15. Prove that the only nilpotent which is diagonalizable is 0. Problem 24.16. Give an example to show that the sum of two nilpotents might not be nilpotent. Definition 24.20. Two linear maps S : V → V and T : V → V commute when S T = T S. Lemma 24.21. The sum and difference of commuting nilpotents is nilpotent. Proof. If S and T are nilpotent linear maps V → V, say S^p = 0 and T^q = 0. Then take any number r ≥ p + q and expand out the sum (S + T)^r. Because S and T commute, every term can be written with all its S factors on the left, and all its T factors on the right, and there are r factors in all, so either at least p of the factors are S or at least q are T; hence each term vanishes. Lemma 24.22. If two linear maps S, T : V → V commute, then S preserves each generalized eigenspace of T, and vice versa.
Proof. If S T = T S, then clearly S T^2 = T^2 S, etc., so that S p(T) = p(T) S for any polynomial p(T). Suppose that T has an eigenvalue λ. If x is a generalized eigenvector, i.e. (T − λ)^p x = 0 for some p, then (T − λ)^p S x = S (T − λ)^p x = 0, so that S x is also a generalized eigenvector with the same eigenvalue.
Theorem 24.23. Take T : V → V any linear map from a finite dimensional vector space back to itself. T can be written in just one way as a sum T = D + N with D diagonalizable, N nilpotent, and all three of T, D and N commuting.
Example 24.24. For T a single Jordan block with eigenvalue λ, set D = λ (the diagonal part) and N = T − λ (the strictly upper triangular part).
Remark 24.25. If any two of T, D and N commute, and T = D + N, then it is easy to check that all three commute.
Proof. First, let's prove that D and N exist, and then prove they are unique. One proof that D and N exist, which doesn't require Jordan normal form: split up V into the direct sum of the generalized eigenspaces of T. It is enough to find some D and N on each one of these spaces. But on each eigenspace, say with eigenvalue λ, we can let D = λ and let N = T − λ. So existence of D and N is obvious. Another proof that D and N exist, which uses Jordan normal form: pick a basis in which the matrix of T is in Jordan normal form. Let's also call this matrix T (using the same letter T for the linear map and its associated matrix), so T is block diagonal with Jordan blocks λ_1 + Δ, λ_2 + Δ, . . . , λ_N + Δ down the diagonal. Let D be the diagonal part of T (the matrix with the same diagonal entries as T and zeros elsewhere), and N = T − D the strictly upper triangular part. This proves that D and N exist. Why are D and N uniquely determined? All generalized eigenspaces of T are D- and N-invariant. So we can restrict to a single generalized eigenspace of T, and need only show that D and N are uniquely determined there. If λ is the eigenvalue, then D − λ = (T − λ) − N is a difference of commuting nilpotents, so nilpotent by lemma 24.21 on page 232. Therefore D − λ is both nilpotent and diagonalizable, and so vanishes: D = λ and N = T − λ, uniquely determined.
Theorem 24.26. If T_0 : V → V and T_1 : V → V are commuting complex linear maps (i.e. T_0 T_1 = T_1 T_0) on a finite dimensional complex vector space V, then splitting each into its diagonalizable and nilpotent parts, T_0 = D_0 + N_0 and T_1 = D_1 + N_1, any two of the maps T_0, D_0, N_0, T_1, D_1, N_1 commute.
Proof. If x is a generalized eigenvector of T_0 (so that (T_0 − λ)^k x = 0 for some λ and integer k > 0), then T_1 x is also (because (T_0 − λ)^k T_1 x = 0 too, by pulling the T_1 across to the left). Since V is a direct sum of generalized eigenspaces of T_0, we can restrict to a generalized eigenspace of T_0 and prove the result there. So we can assume that T_0 = λ_0 + N_0, for some complex number λ_0. Switching the roles of T_0 and T_1, we can assume that T_1 = λ_1 + N_1. Clearly D_0 = λ_0 and D_1 = λ_1 commute with one another and with anything else. The commuting of T_0 and T_1 is equivalent to the commuting of N_0 and N_1.
Remark 24.27. All of the results of this chapter apply equally well to any linear map T : V → V on any finite dimensional vector space over any field, as long as the characteristic polynomial of T splits into a product of linear factors.
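A computational sketch of the decomposition T = D + N (this follows the Jordan-form existence proof above; the sympy library is assumed and the sample matrix is arbitrary):

    # Split T into commuting diagonalizable and nilpotent parts via the Jordan form.
    from sympy import Matrix, diag, zeros, simplify

    T = Matrix([[3, 1, 0],
                [0, 3, 0],
                [0, 0, 1]])

    F, J = T.jordan_form()                        # T = F * J * F**-1
    Dj = diag(*[J[i, i] for i in range(J.rows)])  # diagonal part of the Jordan form
    Nj = J - Dj                                   # strictly upper triangular part, nilpotent

    D = F * Dj * F.inv()
    N = F * Nj * F.inv()

    assert simplify(D + N - T) == zeros(3, 3)
    assert simplify(D * N - N * D) == zeros(3, 3)     # D and N commute
    assert simplify(N ** 3) == zeros(3, 3)            # N is nilpotent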
(using the same letter T for the linear map and its associated matrix).
Problem 24.18. If two linear maps S : V → V and T : V → V on a finite dimensional complex vector space commute, show that the eigenvalues of S T are products of eigenvalues of S with eigenvalues of T.
= F^{-1} T^2 F. By induction, (F^{-1} T F)^k = F^{-1} T^k F for k = 1, 2, 3, . . . . So for any polynomial function p(x), we see that p(F^{-1} T F) = F^{-1} p(T) F. Therefore the partial sums of the Taylor expansion converge on the left hand side just when they converge on the right, approaching the same value.
Remark 25.4. If a square matrix is in square blocks,
A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix},
then
f(A) = \begin{pmatrix} f(B) & 0 \\ 0 & f(C) \end{pmatrix}.
So we only need to work one block at a time.
Theorem 25.5. Let f(x) be an analytic function. If A is a single n × n Jordan block, A = λ + Δ (where Δ is the matrix with ones just above the diagonal and zeros elsewhere), and the series for f(x) converges near x = λ, then
f(A) = f(λ) + f′(λ) Δ + (f″(λ)/2) Δ^2 + (f‴(λ)/3!) Δ^3 + ⋯ + (f^{(n−1)}(λ)/(n−1)!) Δ^{n−1}.
Proof. Expand out the Taylor series, keeping in mind that Δ^n = 0.
Corollary 25.6. The value of f(A) does not depend on which Taylor series we use for f(x): we can expand f(x) about any point as long as the series converges on the spectrum of A. If we change the choice of the point to expand around, the resulting expression for f(A) determines the same matrix.
Proof. Entries will be given by the formulas above, which don't depend on the particular choice of Taylor series, only on the values of the function f(x) for x in the spectrum of A.
Corollary 25.7. Suppose that T : V → V is a linear map of a finite dimensional vector space. Split T into T = D + N, diagonalizable and nilpotent parts. Take f(x) an analytic function given by a Taylor series converging on the spectrum of T (which is the spectrum of D). Then
f(T) = f(D + N) = f(D) + f′(D) N + (f″(D)/2!) N^2 + (f‴(D)/3!) N^3 + ⋯ + (f^{(n−1)}(D)/(n−1)!) N^{n−1},
where n is the dimension of V. Remark 25.8. In particular, the result is independent of the choice of point about which we expand f(x) into a Taylor series, as long as the series converges on the spectrum of T. Example 25.9. Consider the matrix
A = \begin{pmatrix} λ & 1 \\ 0 & λ \end{pmatrix} = λ + Δ.
√(1 + Δ), log(1 + Δ), e^Δ.
Remark 25.10. If f(x) is analytic on the spectrum of a linear map T : V → V on a finite dimensional vector space, then we could actually define f(T) by using a different Taylor series for f(x) around each eigenvalue, but that would require a more sophisticated theory. (We could, for example, split up V into generalized eigenspaces for T, and compute out f(T) on each generalized eigenspace separately; this proves convergence.) However, we will never use such a complicated theory—we will only define f(T) when f(x) has a single Taylor series converging on the entire spectrum of T.
converges for all values of x. Therefore e^T is defined for any linear map T : V → V on any finite dimensional vector space. Problem 25.2. Recall that the trace of an n × n matrix A is tr A = A_{11} + A_{22} + ⋯ + A_{nn}. Prove that det e^A = e^{tr A}. Example 25.12. For any positive number r,
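A numerical illustration (not a proof) of the identity in problem 25.2, sketched in Python; the scipy.linalg.expm matrix exponential is assumed, and the random matrix is arbitrary:

    # Compare det(e^A) with e^(tr A) for a random matrix.
    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    lhs = np.linalg.det(expm(A))
    rhs = np.exp(np.trace(A))
    print(lhs, rhs, abs(lhs - rhs) < 1e-8 * abs(rhs))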
log(x + r) = log r + Σ_{k=1}^{∞} ((−1)^{k+1}/k) (x/r)^k.
For |x| < r, this series converges by the ratio test. (Clearly x can actually be real or complex.) If T : V → V is a linear map on a finite dimensional vector space, and if every eigenvalue of T has positive real part, then we can pick r larger than the largest eigenvalue of T. Then
log T = log r + Σ_{k=1}^{∞} ((−1)^{k+1}/k) ((T − r)/r)^k
converges. The value of this sum does not depend on the value of r, since the value of log x = log((x − r) + r) doesn't.
Remark 25.13. The same tricks work for complex linear maps. We won't ever be tempted to consider f(T) for f(x) anything other than a real-valued function of a real variable; the reader may be aware that there is a sophisticated theory of complex functions of a complex variable.
Problem 25.3. Find the Taylor series of f(x) = 1/x around the point x = r (as long as r ≠ 0). Prove that f(A) = A^{-1}, if all eigenvalues of A have positive real part.
Problem 25.4. If f(x) is the sum of a Taylor series converging on the spectrum of a matrix A, why are the entries of f(A) smooth functions of the entries of A? (A function is called smooth to mean that we can differentiate the function any number of times with respect to any of its variables, in any order.) Problem 25.5. For any complex number z = x + iy, prove that e^z converges to e^x (cos y + i sin y). Problem 25.6. Use the result of the previous exercise to prove that e^{log A} = A if all eigenvalues of A have positive real part. Problem 25.7. Use the results of the previous two exercises to prove that log e^A = A if all eigenvalues of A have imaginary part strictly between −π/2 and π/2. Lemma 25.14. If A and B are two n × n matrices, and AB = BA, then e^{A+B} = e^A e^B. Proof. We expand out the product e^A e^B and collect terms. (The process proceeds term by term exactly as it would if A and B were real numbers, because AB = BA.) Corollary 25.15. e^A is invertible for all square matrices A, and (e^A)^{-1} = e^{−A}. Proof. −A commutes with A, so e^A e^{−A} = e^0 = 1. Definition 25.16. A real matrix A is skew-symmetric if A^t = −A. A complex matrix A is skew-adjoint if A^* = −A. Corollary 25.17. If A is skew-symmetric/complex skew-adjoint then e^A is orthogonal/unitary. Proof. Term by term in the Taylor series, (e^A)^t = e^{A^t} = e^{−A} = (e^A)^{-1}, and similarly in the other cases.
Lemma 25.18. If two n n matrices A and B commute (AB = BA) and the eigenvalues of A and B have positive real part and the products of their eigenvalues also have positive real part, then log (AB ) = log A + log B .
Proof. The eigenvalues of AB will be products of eigenvalues of A and of B, as seen in section 24.3 on page 231. Again the result proceeds as it would for A and B numbers, term by term in the Taylor series. Corollary 25.19. If A is orthogonal/unitary and all eigenvalues of A have positive real part, then log A is skew-symmetric/complex skew-adjoint. Problem 25.8. What do corollaries 25.17 and 25.19 say about the matrix
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}?
Problem 25.9. What can you say about e^A for A symmetric? For A self-adjoint? If we slightly alter a matrix, we only slightly alter its spectrum. Theorem 25.20 (Continuity of the spectrum). Suppose that A is an n × n complex matrix, and pick some disks in the complex plane which together contain exactly k eigenvalues of A (counting each eigenvalue by its multiplicity). In order to ensure that a complex matrix B also has exactly k eigenvalues (also counted by multiplicity) in those same disks, it is sufficient to ensure that each entry of B is close enough to the corresponding entry of A. Proof. Eigenvalues are the roots of the characteristic polynomial det(A − λ I). If each entry of B is close enough to the corresponding entry of A, then each coefficient of the characteristic polynomial of B is close to the corresponding coefficient of the characteristic polynomial of A. The result follows by the argument principle (theorem 25.32 on page 243 in the appendix to this chapter).
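A numerical illustration of the theorem (a sketch; the numpy library is assumed, and pairing eigenvalues by sorting is only a rough matching that works for the generic matrix used here):

    # Perturb a matrix by a small amount and compare the (sorted) eigenvalues.
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    E = rng.standard_normal((5, 5))

    eig_A = np.sort_complex(np.linalg.eigvals(A))
    eig_B = np.sort_complex(np.linalg.eigvals(A + 1e-6 * E))
    print(np.max(np.abs(eig_A - eig_B)))   # small, on the order of the perturbation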
Remark 25.21. The eigenvalues vary as differentiable functions of the matrix entries as well, except where eigenvalues collide (i.e. at matrices for which two eigenvalues are equal), when there might not be any way to write the eigenvalues in terms of differentiable functions of matrix entries. In a suitable sense, the eigenvectors can also be made to depend differentiably on the matrix entries away from eigenvalue collisions. See Kato [5] for more information. Problem 25.10. Find the eigenvalues of
A = \begin{pmatrix} 0 & t \\ 1 & 0 \end{pmatrix}
as a function of t. What happens at t = 0? Problem 25.11. Prove that if an n × n complex matrix A has n distinct eigenvalues, then so does every complex matrix whose entries are close enough to the entries of A.
Corollary 25.22. If f(x) is an analytic function given by a Taylor series converging on the spectrum of a matrix A, then f(B) is defined by the same Taylor expansion as long as each entry of B is close enough to the corresponding entry of A. Problem 25.12. Prove that a complex square matrix A is invertible just when it has the form A = e^L for some square matrix L.
Lemma 25.23. If the Taylor series Σ_k a_k (x − x_0)^k converges for x near x_0, then the series Σ_k k a_k (x − x_0)^{k−1} also converges for x near x_0.
Proof. We can assume, just by replacing x by x − x_0, that x_0 = 0. Let's suppose that our Taylor series converges for −b < x < b. Then it must converge for x = b/r for any r > 1. So the terms must get small eventually, i.e.
a_k (b/r)^k → 0.
For large enough k,
|a_k| < (r/b)^k.
Pick any R > r. Then for |x| ≤ b/R,
|a_k x^k| ≤ |a_k| (b/R)^k < (r/b)^k (b/R)^k = (r/R)^k,
a geometric series of diminishing terms, so convergent. Similarly, the terms of Σ_k k a_k x^{k−1} are bounded by a constant multiple of k (r/R)^k, which converges by the comparison test.
Corollary 25.24. Under the same conditions, the differentiated series Σ_k k a_k (x − x_0)^{k−1} converges in the same domain.
Proof. The same trick works.
Lemma 25.25. If f(x) is the sum of a convergent Taylor series, then f′(x) is too. More specifically, if
f(x) = Σ_k a_k (x − x_0)^k,
then
f′(x) = Σ_k k a_k (x − x_0)^{k−1}.
Proof. Let
f_1(x) = Σ_k k a_k (x − x_0)^{k−1},
which we know converges in the same open interval where the Taylor series for f(x) converges. We have to show that f_1(x) = f′(x). Pick any points x + h and x where f(x) converges. Expand out:
(f(x + h) − f(x))/h − f_1(x) = Σ_k a_k [ ((x + h)^k − x^k)/h − k x^{k−1} ]
= Σ_k a_k [ Σ_{ℓ=1}^{k} \binom{k}{ℓ} x^{k−ℓ} h^{ℓ−1} − k x^{k−1} ]
= h Σ_k a_k Σ_{ℓ=2}^{k} \binom{k}{ℓ} x^{k−ℓ} h^{ℓ−2}
= h Σ_k a_k k (k − 1) Σ_{ℓ=0}^{k−2} \frac{1}{(ℓ+2)(ℓ+1)} \binom{k−2}{ℓ} x^{k−2−ℓ} h^{ℓ},
which are the terms of a convergent series. The expression
(f(x + h) − f(x))/h − f_1(x)
is governed by a convergent series multiplied by h. In particular the limit as h → 0 is 0. Corollary 25.26. Any function f(x) which is the sum of a convergent Taylor series in a disk has derivatives of all orders everywhere in the interior of that disk, given by formally differentiating the Taylor series of f(x). All of these tricks work equally well for complex functions of a complex variable, as long as they are sums of Taylor series.
A point z travelling around a circle winds around each point p inside, and doesn't wind around any point p outside.
As z travels around the circle, the angle from p to z increases by 2π if p is inside, but is periodic if p is outside.
( (p_0 − cos θ)/r, −(sin θ)/r ).
If p_0 > 1, then p_0 − cos θ > 0, so that after adding multiples of 2π, we must have the angle ψ contained inside the domain of the arcsin function:
ψ = arcsin( −sin θ / r ),
a continuous function of θ, with ψ(θ + 2π) = ψ(θ). This continuous function is uniquely determined up to adding integer multiples of 2π. On the other hand, suppose that p_0 < 1. Consider the angle α between Z = (p_0 cos θ, p_0 sin θ) and P = (1, 0). By the argument above
α = arcsin( p_0 sin θ / r ),
where
r = \sqrt{ (1 − p_0 cos θ)^2 + p_0^2 sin^2 θ }.
Rotating by θ takes P to z and Z to p. Therefore the angle of the ray from z to p is ψ = α(θ) + θ, a continuous function increasing by 2π every time θ increases by 2π. Since α is uniquely determined up to adding integer multiples of 2π, so is ψ. Corollary 25.29. Consider the complex polynomial function P(z) = (z − p_1)(z − p_2) . . . (z − p_n). Suppose that p_1 lies inside some disk, and all other roots p_2, p_3, . . . , p_n lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z) increases by 2π. Proof. The argument of a product is the sum of the arguments of the factors, so the argument of P(z) is the sum of the arguments of z − p_1, z − p_2, etc. Corollary 25.30. Consider the complex polynomial function P(z) = a (z − p_1)(z − p_2) . . . (z − p_n). Suppose that some roots p_1, p_2, . . . , p_k all lie inside some disk, and all other roots p_{k+1}, p_{k+2}, . . . , p_n lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z) increases by 2πk. Corollary 25.31. Consider two complex polynomial functions P(z) and Q(z). Suppose that P(z) has k roots lying inside some disk, and Q(z) has ℓ roots lying inside that same disk, and all other roots of P(z) and Q(z) lie outside that disk. (So no roots of P(z) or Q(z) lie on the boundary of the disk.) Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z)/Q(z) increases by 2π(k − ℓ). Theorem 25.32 (The argument principle). If P(z) is a polynomial, with k roots inside a particular disk, and no roots on the boundary of that disk, then every polynomial Q(z) of the same degree as P(z) and whose coefficients are sufficiently close to the coefficients of P(z) has exactly k roots inside the same disk, and no roots on the boundary.
Proof. To apply corollary 25.31 on the previous page, we have only to ensure that Q(z)/P(z) is not going to change in argument (or vanish) as we travel around the boundary of that disk. So we have only to ensure that while z stays on the boundary of the disk, Q(z)/P(z) lies in a particular half-plane, for example that Q(z)/P(z) is never a negative real number (or 0). So it is enough to ensure that |P(z) − Q(z)| < |P(z)| for z on the boundary of the disk. Let m be the minimum value of |P(z)| for z on the boundary of the disk. Suppose that the furthest point of our disk from the origin is some point z with |z| = R. Then if we write out Q(z) = P(z) + Σ_j c_j z^j, we only need to ensure that the coefficients c_0, c_1, . . . , c_n satisfy Σ_j |c_j| R^j < m, to be sure that Q(z) will have the same number of roots as P(z) in that disk.
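A numerical illustration of this consequence of the argument principle (a sketch; the numpy library is assumed, and the polynomial, perturbation, and disk radius are arbitrary choices):

    # Count roots of P and of a small perturbation Q inside the disk |z| < 1.6.
    import numpy as np

    P = np.array([1.0, 0.0, -1.0, 0.0, -6.0])              # z^4 - z^2 - 6 = (z^2 - 3)(z^2 + 2)
    Q = P + 1e-6 * np.array([0.0, 1.0, -2.0, 3.0, 1.0])    # nearby coefficients

    inside = lambda coeffs: np.sum(np.abs(np.roots(coeffs)) < 1.6)
    print(inside(P), inside(Q))    # the same count of roots inside the disk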
For any (real or complex) numbers x = (x_1, x_2, . . . , x_n) let
P_x(t) = (t − x_1)(t − x_2) . . . (t − x_n).
Clearly the roots of P_x(t) are precisely the entries of the vector x. Problem 26.1. Prove that
P_x(t) = t^n − s_1(x) t^{n−1} + s_2(x) t^{n−2} − ⋯ + (−1)^k s_k(x) t^{n−k} + ⋯ + (−1)^n s_n(x).
Definition 26.3. Let s(x) = (s_1(x), s_2(x), . . . , s_n(x)), so that s : R^n → R^n. (If we work with complex numbers, then s : C^n → C^n.) Lemma 26.4. The map s is onto, i.e. for each complex vector w in C^n there is a complex vector z in C^n so that s(z) = w. Proof. Pick any w in C^n. Let z_1, z_2, . . . , z_n be the complex roots of the polynomial
P(t) = t^n − w_1 t^{n−1} + w_2 t^{n−2} + ⋯ + (−1)^n w_n.
Such roots exist by the fundamental theorem of algebra (see theorem 15.21 on page 158). Clearly P_z(t) = P(t), since these polynomial functions have the same roots and same leading term.
Lemma 26.5. The entries of two vectors z and w are permutations of one another just when s(z) = s(w). Proof. The roots of P_z(t) and P_w(t) are the same numbers. Corollary 26.6. A function is symmetric just when it is a function of the elementary symmetric functions. Remark 26.7. This means that every symmetric function f : C^n → C has the form f(z) = h(s(z)), for a unique function h : C^n → C, and conversely if h is any function at all, then f(z) = h(s(z)) determines a symmetric function. Theorem 26.8. A symmetric function of some complex variables is continuous just when it is expressible as a continuous function of the elementary symmetric functions, and this expression is uniquely determined. Proof. If h(z) is continuous, clearly f(z) = h(s(z)) is. If f(z) is continuous and symmetric, then given any sequence w_1, w_2, . . . in C^n converging to a point w, we let z_1, z_2, . . . be a sequence in C^n for which s(z_j) = w_j, and z a point for which s(z) = w. The entries of z_j are the roots of the polynomial
P_{z_j}(t) = t^n − w_{j1} t^{n−1} + w_{j2} t^{n−2} − ⋯ + (−1)^n w_{jn}.
By the argument principle (theorem 25.32 on page 243), we can rearrange the entries of each of the various z_1, z_2, . . . vectors so that they converge to z. Therefore h(w_j) = f(z_j) converges to f(z) = h(w). If there are two expressions, f(z) = h_1(s(z)) and f(z) = h_2(s(z)), then because s is onto, h_1 = h_2.
If a = (a_1, a_2, . . . , a_n), write z^a to mean z_1^{a_1} z_2^{a_2} . . . z_n^{a_n}. Call a the weight of the monomial z^a. We will order weights by alphabetical order, for example so that (2, 1) > (1, 2). Define the weight of a polynomial to be the highest weight of any of its monomials. (The zero polynomial will not be assigned any weight.) The weight of a product of nonzero polynomials is the sum of the weights. The weight of a sum is at most the highest weight of any term. The weight of s_j(z) is (1, 1, . . . , 1, 0, 0, . . . , 0), with j ones.
Theorem 26.9. Every symmetric polynomial f has exactly one expression as a polynomial in the elementary symmetric polynomials. If f has real/rational/integer coefficients, then f is a real/rational/integer coefficient polynomial of the elementary symmetric polynomials.
Proof. For any monomial z^a, let
\overline{z^a} = Σ_p z^{p(a)},
a sum over all permutations p. Every symmetric polynomial, if it contains a monomial z^a, must also contain z^{p(a)}, for any permutation p. Hence every symmetric polynomial is a sum of \overline{z^a} polynomials. Consequently the weight a of a symmetric polynomial f must satisfy a_1 ≥ a_2 ≥ ⋯ ≥ a_n. We have only to write the \overline{z^a} in terms of the elementary symmetric functions, with integer coefficients. Let b_n = a_n, b_{n−1} = a_{n−1} − a_n, b_{n−2} = a_{n−2} − a_{n−1}, . . . , b_1 = a_1 − a_2. Then s(z)^b has leading monomial z^a, so \overline{z^a} − s(z)^b has lower weight. Apply induction on the weight.
Example 26.10. z_1^2 + z_2^2 = (z_1 + z_2)^2 − 2 z_1 z_2. To compute out these expressions: f(z) = z_1^2 + z_2^2 has weight (2, 0). The polynomials s_1(z) and s_2(z) have weights (1, 0) and (1, 1). So we subtract off the appropriate power of s_1(z) from f(z), and find f(z) − s_1(z)^2 = −2 z_1 z_2 = −2 s_2(z).
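One way to check such identities by machine (a sketch, not from the text; the sympy library is assumed):

    # Verify z1^2 + z2^2 = s1^2 - 2*s2 and z1^3 + z2^3 = s1^3 - 3*s1*s2 for two variables.
    from sympy import symbols, simplify

    z1, z2 = symbols('z1 z2')
    s1 = z1 + z2
    s2 = z1 * z2

    print(simplify(z1**2 + z2**2 - (s1**2 - 2*s2)))      # 0
    print(simplify(z1**3 + z2**3 - (s1**3 - 3*s1*s2)))   # 0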
Sums of powers
Define p_j(z) = z_1^j + z_2^j + ⋯ + z_n^j, the sums of powers.
Lemma 26.11. The sums of powers are related to the elementary symmetric functions by
0 = k s_k − p_1 s_{k−1} + p_2 s_{k−2} − ⋯ + (−1)^{k−1} p_{k−1} s_1 + (−1)^k p_k.
Proof. Let's write z^{(ℓ)} for z with the ℓ-th entry removed, so if z is a vector in C^n, then z^{(ℓ)} is a vector in C^{n−1}. Splitting the monomials of s_{k−j}(z) according to whether or not they involve z_ℓ,
p_j s_{k−j} = Σ_ℓ z_ℓ^j Σ_{i_1 < i_2 < ⋯ < i_{k−j}} z_{i_1} z_{i_2} ⋯ z_{i_{k−j}}
= Σ_ℓ z_ℓ^j s_{k−j}(z^{(ℓ)}) + Σ_ℓ z_ℓ^{j+1} s_{k−j−1}(z^{(ℓ)}).
Now take the alternating sum of these expressions over j = 1, 2, . . . , k − 1. The second piece of the j-th term is exactly the first piece of the (j+1)-st term, so the sum telescopes:
p_1 s_{k−1} − p_2 s_{k−2} + ⋯ + (−1)^{k−2} p_{k−1} s_1 = Σ_ℓ z_ℓ s_{k−1}(z^{(ℓ)}) + (−1)^{k−2} Σ_ℓ z_ℓ^k s_0(z^{(ℓ)})
= k s_k + (−1)^k p_k,
since each monomial of s_k arises from exactly k choices of ℓ in the first sum, and s_0 = 1 in the last. Rearranging gives the identity.
Proposition 26.12. Every symmetric polynomial is a polynomial in the sums of powers. If the coecients of the symmetric polynomial are real (or rational), then it is a real (or rational) polynomial function of the sums of powers. Every continuous symmetric function of complex variables is a continuous function of the sums of powers. Proof. We can solve recursively for the sums of powers in terms of the elementary symmetric functions and conversely. Remark 26.13. The standard reference on symmetric functions is [7].
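A short machine check of the identity in lemma 26.11 for three variables (a sketch; the sympy library is assumed):

    # Check 0 = k*s_k - p_1*s_{k-1} + p_2*s_{k-2} - ... + (-1)^k * p_k for n = 3, k = 1, 2, 3.
    from sympy import symbols, simplify

    z1, z2, z3 = symbols('z1 z2 z3')
    s = {0: 1, 1: z1 + z2 + z3, 2: z1*z2 + z1*z3 + z2*z3, 3: z1*z2*z3}
    p = {j: z1**j + z2**j + z3**j for j in range(1, 4)}

    for k in range(1, 4):
        expr = k * s[k] + sum((-1)**j * p[j] * s[k - j] for j in range(1, k + 1))
        print(k, simplify(expr))   # 0 each time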
A = \begin{pmatrix} z_1 & & \\ & \ddots & \\ & & z_n \end{pmatrix},
then prove that s_j(A) = s_j(z_1, z_2, . . . , z_n), the elementary symmetric functions of the eigenvalues.
Problem 26.3. Generalize the previous exercise to A diagonalizable. Theorem 26.17. Each continuous (or polynomial) invariant function of a complex matrix has exactly one expression as a continuous (or polynomial) function of the elementary symmetric functions of the eigenvalues. Each polynomial invariant function of a real matrix has exactly one expression as a polynomial function of the elementary symmetric functions of the eigenvalues. Remark 26.18. We can replace the elementary symmetric functions of the eigenvalues by the sums of powers of the eigenvalues. Proof. Every continuous invariant function f(A) determines a continuous function f(z) by setting
A = \begin{pmatrix} z_1 & & & \\ & z_2 & & \\ & & \ddots & \\ & & & z_n \end{pmatrix}.
Taking F any permutation matrix, invariance tells us that f(F A F^{-1}) = f(A). But f(F A F^{-1}) is given by applying the associated permutation to the entries of z. Therefore f(z) is a symmetric function. If f(A) is continuous (or polynomial) then f(z) is too. Therefore f(z) = h(s(z)), for some continuous (or polynomial) function h; so f(A) = h(s(A)) for diagonal matrices. By invariance, the same is true for diagonalizable matrices. If we work with complex matrices, then every matrix can be approximated arbitrarily closely by diagonalizable matrices (by theorem 23.12 on page 221). Therefore by continuity of h, the equation f(A) = h(s(A)) holds for all matrices A. For real matrices, the equation only holds for those matrices whose eigenvalues are real. However, for polynomials this is enough, since two polynomial functions equal on an open set must be equal everywhere. Remark 26.19. Consider the function f(A) = s_j(|λ_1|, |λ_2|, . . . , |λ_n|), where A has eigenvalues λ_1, λ_2, . . . , λ_n. This function is a continuous invariant of a real matrix A, and is not a polynomial in λ_1, λ_2, . . . , λ_n.
27 The Pfaffian
Skew-symmetric matrices have a surprising additional polynomial invariant, called the Pfaffian, but it is only invariant under rotations, and only exists for skew-symmetric matrices with an even number of rows and columns.
of A. If λ is a complex eigenvalue of A, with complex eigenvector z, scale z to have unit length, and then
λ = λ⟨z, z⟩ = ⟨λz, z⟩ = ⟨Az, z⟩ = −⟨z, Az⟩ = −⟨z, λz⟩ = −λ̄⟨z, z⟩ = −λ̄.
Therefore λ = −λ̄, i.e. λ has the form ia for some real number a. So there are two different possibilities: λ = 0 or λ = ia with a ≠ 0. If λ = 0, then z lies in the kernel, so if we write z = x + iy then both x and y lie in the kernel. In particular, we can write a real orthonormal basis for the kernel, and then x and y will be real linear combinations of those basis vectors, and therefore z will be a complex linear combination of those basis vectors. Let's take u_1, u_2, . . . , u_s to be a real orthonormal basis for the kernel of A, and clearly then the same vectors u_1, u_2, . . . , u_s form a complex unitary basis for the complex kernel of A. Next let's take care of the nonzero eigenvalues. If λ = ia is a nonzero eigenvalue, with unit length eigenvector z, then taking complex conjugates on the equation Az = λz = iaz, we find Az̄ = −iaz̄, so z̄ is another eigenvector with eigenvalue −ia. So they come in pairs. Since the eigenvalues ia and −ia are distinct, the eigenvectors z and z̄ must be perpendicular. So we can always make a new unitary basis of eigenvectors, throwing out any λ = −ia eigenvector and replacing it with z̄ if needed, to ensure that for each eigenvector z in our unitary basis of eigenvectors, z̄ also belongs to our unitary basis. Moreover, we have three equations: Az = iaz, ⟨z, z⟩ = 1, and ⟨z, z̄⟩ = 0. Write z = x + iy with x and y real vectors, and expand out all three equations in terms of x and y to find Ax = −ay, Ay = ax, ⟨x, x⟩ + ⟨y, y⟩ = 1, ⟨x, x⟩ − ⟨y, y⟩ = 0 and ⟨x, y⟩ = 0. So if we let X = √2 x and Y = √2 y, then X and Y are unit vectors, and AX = −aY and AY = aX. Now if we carry out this process for each eigenvalue λ = ia with a > 0, then we can write down vectors X_1, Y_1, X_2, Y_2, . . . , X_t, Y_t, one pair for each eigenvector from our unitary basis with a nonzero eigenvalue. These vectors are each unit length, and each X_i is perpendicular to each Y_i. We also have AX_i = −a_i Y_i and AY_i = a_i X_i. If z_i and z_j are two different eigenvectors from our original unitary basis of eigenvectors, and their eigenvalues are λ_i = ia_i and λ_j = ia_j with a_i, a_j > 0, then we want to see why X_i, Y_i, X_j and Y_j must be perpendicular. This follows immediately from z_i, z̄_i, z_j and z̄_j being perpendicular, by just expanding into real and imaginary parts. Similarly, we can see that u_1, u_2, . . . , u_s are
Clearly these vectors form a real orthonormal basis, so F is an orthogonal matrix. We want to arrange that F be a rotation matrix. Let's suppose that F is not a rotation. We can either change the sign of one of the vectors u_1, u_2, . . . , u_s (if there are any), or replace X_1 by −X_1, which switches the sign of a_1, to make F a rotation.
27.2 Partitions
A partition of the numbers 1, 2, . . . , 2n is a choice of division of these numbers into pairs. For example, we could choose to partition 1, 2, 3, 4, 5, 6 into {4, 1}, {2, 5}, {6, 3}. This is the same partition if we write the pairs down in a different order, like {2, 5}, {6, 3}, {4, 1}, or if we write the numbers inside each pair down in a different order, like {1, 4}, {5, 2}, {6, 3}. It isn't really important that the objects partitioned be numbers. Of course, you can't partition an odd number of objects into pairs. Each permutation p of the numbers 1, 2, 3, . . . , 2n has an associated partition {p(1), p(2)}, {p(3), p(4)}, . . . , {p(2n−1), p(2n)}. For example, the permutation 3, 1, 4, 6, 5, 2 has associated partition {3, 1}, {4, 6}, {5, 2}. Clearly two different permutations p and q could have the same associated partition, i.e. we could first transpose various of the pairs of the partition of p, keeping each pair in order, and then transpose entries within each pair, but not across different pairs. Consequently, there are n! 2^n different permutations with the same associated partition: n! ways to permute pairs, and 2^n ways to swap the order within each pair. When you permute a pair, like changing 3, 1, 4, 6, 5, 2 to 4, 6, 3, 1, 5, 2, this is the effect of a pair of transpositions (one to permute 3 and 4 and another to permute 5 and 6), so has no effect on signs. Therefore if two permutations have the same partition, the root cause of any difference in sign must be from transpositions inside each pair. For example, while it is complicated to find the signs of the permutations 3, 1, 4, 6, 5, 2 and of 4, 6, 1, 3, 5, 2, it is easy to see that these signs must be different.
On the other hand, we can write each partition in alphabetical order, like for example rewriting {4, 1}, {2, 5}, {6, 3} as {1, 4}, {2, 5}, {3, 6}, so that we put each pair in order, and then order the pairs among one another by their lowest elements. This in turn determines a permutation, called the natural permutation of the partition, given by putting the elements in that order; in our example this is the permutation 1, 4, 2, 5, 3, 6. We write the sign of a permutation p as sgn(p), and define the sign of a partition P to be the sign of its natural permutation. Watch out: if we start with a permutation p, like 6, 2, 4, 1, 3, 5, then the associated partition P is {6, 2}, {4, 1}, {3, 5}. This is the same partition as {1, 4}, {2, 6}, {3, 5} (just written in alphabetical order). The natural permutation q of P is therefore 1, 4, 2, 6, 3, 5, so the original permutation p is not the natural permutation of its associated partition. Problem 27.1. How many partitions are there of the numbers 1, 2, . . . , 2n?
Problem 27.2. Write down all of the partitions of (a) 1, 2; (b) 1, 2, 3, 4; (c) 1, 2, 3, 4, 5, 6.
then det A = a^2, so the entry a = A_{12} is a polynomial function of the entries of A, which squares to det A.
Example 27.3. A huge calculation shows that if A is a 4 × 4 skew-symmetric matrix, then
det A = (A_{12} A_{34} − A_{13} A_{24} + A_{14} A_{23})^2.
So A_{12} A_{34} − A_{13} A_{24} + A_{14} A_{23} is a polynomial function of the entries of A which squares to det A. Definition 27.4. For any 2n × 2n skew-symmetric matrix A, let
Pf A = (1/(n! 2^n)) Σ_p sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)},
where sgn(p) is the sign of the permutation p. Pf is called the Pfaffian. Remark 27.5. Don't ever try to use this horrible formula to compute a Pfaffian. We will find a better way soon. Lemma 27.6. For any 2n × 2n skew-symmetric matrix A,
Pf A = Σ_P sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)},
where the sum is over partitions P and the permutation p is the natural permutation of the partition P. In particular, Pf A is an integer coefficient polynomial of the entries of the matrix A. Proof. Each permutation p has an associated partition P. So we can write the Pfaffian as a sum
Pf A = (1/(n! 2^n)) Σ_P Σ_p sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)},
where the first sum is over all partitions P, and the second over all permutations p which have P as their associated partition. But if two permutations p and q both have the same associated partition {p(1), p(2)}, {p(3), p(4)}, . . . , {p(2n−1), p(2n)}, then p and q give the same pairs of indices in the expression A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)}. Perhaps some of the indices in these pairs might be reversed. For example, we might have partition P being {1, 5}, {2, 6}, {3, 4}, and permutations p being 1, 5, 2, 6, 3, 4 and q being 5, 1, 2, 6, 3, 4. The contribution to the sum coming from p is sgn(p) A_{15} A_{26} A_{34}, while that from q is sgn(q) A_{51} A_{26} A_{34}.
But then A_{51} = −A_{15}, a sign change which is perfectly offset by the sign sgn(q): each transposition within a pair changes the sign relative to sgn(p). So put together, we find that for any two permutations p and q with the same partition, their contributions are the same:
sgn(q) A_{q(1)q(2)} A_{q(3)q(4)} . . . A_{q(2n−1)q(2n)} = sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)}.
Therefore the n! 2^n permutations with associated partition P all contribute the same amount as the natural permutation of P.
n! 2^n Pf(BAB^t) = Σ_p sgn(p) (BAB^t)_{p(1)p(2)} (BAB^t)_{p(3)p(4)} . . . (BAB^t)_{p(2n−1)p(2n)}
= Σ_{i_1 i_2 . . . i_{2n}} Σ_p sgn(p) B_{p(1)i_1} A_{i_1 i_2} B_{p(2)i_2} . . . B_{p(2n−1)i_{2n−1}} A_{i_{2n−1} i_{2n}} B_{p(2n)i_{2n}}
= Σ_{i_1 i_2 . . . i_{2n}} A_{i_1 i_2} . . . A_{i_{2n−1} i_{2n}} det( Be_{i_1}  Be_{i_2}  . . .  Be_{i_{2n}} ).
If i_1 = i_2 (or any two of the indices agree) then two columns are equal inside the determinant, so those terms vanish, and we can write this as a sum over permutations:
= Σ_q A_{q(1)q(2)} . . . A_{q(2n−1)q(2n)} det( Be_{q(1)}  Be_{q(2)}  . . .  Be_{q(2n)} )
= Σ_q sgn(q) A_{q(1)q(2)} . . . A_{q(2n−1)q(2n)} det( Be_1  Be_2  . . .  Be_{2n} )
= n! 2^n det B Pf A.
Finally, to prove that Pf^2 A = det A, we just need to get A into skew-symmetric normal form via a rotation matrix B, and then Pf^2 A = Pf^2(BAB^t) = det(BAB^t) = det A.
How do you calculate the Pfaffian in practice? It is like the determinant, except that you start running your finger down the first column under the diagonal, and you write down −, +, −, +, . . . in front of each entry from the first column, and multiply each by the Pfaffian you get by crossing out that row and column, and symmetrically the corresponding column and row. So
Pf \begin{pmatrix} 0 & 2 & 1 & 3 \\ -2 & 0 & 8 & 4 \\ -1 & -8 & 0 & 5 \\ -3 & -4 & -5 & 0 \end{pmatrix} = −(−2) Pf \begin{pmatrix} 0 & 5 \\ -5 & 0 \end{pmatrix} + (−1) Pf \begin{pmatrix} 0 & 4 \\ -4 & 0 \end{pmatrix} − (−3) Pf \begin{pmatrix} 0 & 8 \\ -8 & 0 \end{pmatrix}
= (2)·5 + (−1)·4 + (3)·8. Let's prove that this works:
Lemma 27.8. If A is a skew-symmetric matrix with an even number of rows and columns, larger than 2 × 2, then
Pf A = −A_{21} Pf A^{[21]} + A_{31} Pf A^{[31]} − ⋯ = Σ_{j>1} (−1)^{j+1} A_{j1} Pf A^{[j1]},
where A^{[ij]} is the matrix A with rows i and j and columns i and j removed.
Proof. Let's define a polynomial P(A) in the entries of a skew-symmetric matrix A (with an even number of rows and columns) by setting P(A) = Pf A if A is 2 × 2, and setting
P(A) = Σ_{i>1} (−1)^{i+1} A_{i1} P(A^{[i1]})
for larger A. We need to show that P(A) = Pf A. Clearly P(A) = Pf A if A is in skew-symmetric normal form. Each term in Pf A corresponds to a partition, and each partition must put 1 into one of its pairs, say in a pair {1, i}. It then can't use 1 or i in any other pair. Clearly P(A) also has exactly one factor like A_{i1} in each term, and then no other factors get to have i or 1 as subscripts. Moreover, all terms in P(A) and in Pf A have a coefficient of 1 or −1. So it is clear that the terms of P(A) and of Pf A are the same, up to sign. We have to fix the signs. Suppose that we swap rows 2 and 3 of A and columns 2 and 3. Let's show that this changes the signs of P(A) and of Pf A. For Pf A, this is immediate from theorem 27.7 on page 256. Let Q be the permutation matrix of the
transposition of 2 and 3. (To be more precise, let Q_n be the n × n permutation matrix of the transposition of 2 and 3, for any value of n ≥ 3. But let's write all such matrices Q_n as Q.) Let B = QAQ^t, i.e. A with rows 2 and 3 and columns 2 and 3 swapped. So B_{ij} is just A_{ij} unless i or j is either 2 or 3. So
P(B) = −B_{21} P(B^{[21]}) + B_{31} P(B^{[31]}) − B_{41} P(B^{[41]}) + . . .
= −A_{31} P(A^{[31]}) + A_{21} P(A^{[21]}) − A_{41} P(QA^{[41]}Q^t) − ⋯
By induction, the sign changes in the last terms:
= +A_{21} P(A^{[21]}) − A_{31} P(A^{[31]}) + A_{41} P(A^{[41]}) − ⋯
= −P(A).
So swapping rows 2 and 3 changes a sign. In the same way, P(QAQ^t) = sgn(q) P(A), for Q the permutation matrix of any permutation q of the numbers 2, 3, . . . , 2n. If we start with A in skew-symmetric normal form, letting the numbers a_1, a_2, . . . , a_n in the skew-symmetric normal form be some abstract variables, then Pf A is just a single term of the Pf and of P, and these terms have the same sign. All of the terms of Pf are obtained by permuting indices in this term, i.e. as Pf(QAQ^t) for suitable permutation matrices Q. Indeed you just need to take Q the permutation matrix of the natural permutation of each partition. Therefore the signs of Pf and of P are the same for each term, so P = Pf. Problem 27.3. Prove that the odd degree elementary symmetric functions of the eigenvalues vanish on any skew-symmetric matrix. Let s_1(a), . . . , s_n(a) be the usual symmetric functions of some numbers a_1, a_2, . . . , a_n. For any vector a let t(a) be the vector
t(a) = \begin{pmatrix} s_1(a_1^2, a_2^2, . . . , a_n^2) \\ s_2(a_1^2, a_2^2, . . . , a_n^2) \\ \vdots \\ s_{n−1}(a_1^2, a_2^2, . . . , a_n^2) \\ a_1 a_2 . . . a_n \end{pmatrix}.
Lemma 27.9. Two complex vectors a and b in C^n satisfy t(a) = t(b) just when b can be obtained from a by permutation of entries and changing signs of an even number of entries. A function f(a) is invariant under permutations and even numbers of sign changes just when f(a) = h(t(a)) for some function h.
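The expansion of lemma 27.8 translates directly into a short recursive routine (a sketch, not from the text; the numpy library is assumed), which we can sanity-check against Pf(A)^2 = det A:

    # Recursive Pfaffian via the first-column expansion of lemma 27.8.
    import numpy as np

    def pfaffian(A):
        n = A.shape[0]
        if n == 2:
            return A[0, 1]
        total = 0.0
        for j in range(1, n):
            keep = [k for k in range(n) if k not in (0, j)]   # cross out rows/columns 0 and j
            sub = A[np.ix_(keep, keep)]
            total += (-1) ** (j + 1) * A[0, j] * pfaffian(sub)
        return total

    A = np.array([[ 0.0,  2.0,  1.0,  3.0],
                  [-2.0,  0.0,  8.0,  4.0],
                  [-1.0, -8.0,  0.0,  5.0],
                  [-3.0, -4.0, -5.0,  0.0]])
    print(pfaffian(A))                                       # 30.0 for this matrix
    print(np.isclose(pfaffian(A) ** 2, np.linalg.det(A)))    # True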
Proof. Clearly s_n(a_1^2, a_2^2, . . . , a_n^2) = (a_1 a_2 . . . a_n)^2 = t_n(a)^2. In particular, the symmetric functions of a_1^2, a_2^2, . . . , a_n^2 are all functions of t_1, t_2, . . . , t_n. Therefore
if we have two vectors a and b with t(a) = t(b), then a_1^2, a_2^2, . . . , a_n^2 are a permutation of b_1^2, b_2^2, . . . , b_n^2. So after permutation, a_1 = ±b_1, a_2 = ±b_2, . . . , a_n = ±b_n, equality up to some sign changes. Since we also know that t_n(a) = t_n(b), we must have a_1 a_2 . . . a_n = b_1 b_2 . . . b_n. If none of the a_i vanish, then a_1 a_2 . . . a_n = b_1 b_2 . . . b_n ensures that none of the b_i vanish either, and that the number of sign changes is even. It is possible that one of the a_i vanishes, in which case we can change its sign as we like to arrange that the number of sign changes is even.
Lemma 27.10. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = FAF^t, by some rotation matrix F, just when they have skew-symmetric normal forms
\begin{pmatrix} 0 & a_1 & & & & \\ -a_1 & 0 & & & & \\ & & 0 & a_2 & & \\ & & -a_2 & 0 & & \\ & & & & \ddots & \\ & & & & & \begin{smallmatrix} 0 & a_n \\ -a_n & 0 \end{smallmatrix} \end{pmatrix} and \begin{pmatrix} 0 & b_1 & & & & \\ -b_1 & 0 & & & & \\ & & 0 & b_2 & & \\ & & -b_2 & 0 & & \\ & & & & \ddots & \\ & & & & & \begin{smallmatrix} 0 & b_n \\ -b_n & 0 \end{smallmatrix} \end{pmatrix}
respectively, with t(a) = t(b). Proof. If we have a skew-symmetric normal form for a matrix A, with numbers a_1, a_2, . . . , a_n as above, then t_1(a), t_2(a), . . . , t_{n−1}(a) are the elementary symmetric functions of the squares of the eigenvalues, while t_n(a) = Pf A, so clearly t(a) depends only on the invariants of A under rotation. In particular, suppose that I find two different skew-symmetric normal forms, one with numbers a_1, a_2, . . . , a_n and one with numbers b_1, b_2, . . . , b_n. Then the numbers b_1, b_2, . . . , b_n must be given from the numbers a_1, a_2, . . . , a_n by permutation and switching of an even number of signs. In fact we can attain these changes by actual rotations as follows. For example, think about 4 × 4 matrices. The permutation matrix F of 3, 4, 1, 2 permutes the first two and second two basis vectors, and is a rotation because the number of transpositions is even. When we replace A by F A F^t, we
swap a_1 with a_2. Similarly, we can take the matrix F which reflects e_1 and e_3, changing the sign of a_1 and of a_2. So we can clearly carry out any permutations, and any even number of sign changes, on the numbers a_1, a_2, . . . , a_n.
Lemma 27.11. Any polynomial in a_1, a_2, . . . , a_n can be written in only one way as h(t(a)).
Proof. Recall that every complex number has a square root (a vector with half the argument and the square root of the modulus). Clearly 0 has only 0 as square root, while all other complex numbers z have two square roots, which we write as ±√z. Given any complex vector b, I can solve t(a) = b by first constructing a solution c to
s(c) = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n−1} \\ b_n^2 \end{pmatrix},
and then letting a_j = √c_j. Clearly t(a) = b unless t_n(a) has the wrong sign. If we change the sign of one of the a_j then we can fix this. So t : C^n → C^n is onto.
Theorem 27.12. Each polynomial invariant of a skew-symmetric matrix with even number of rows and columns can be expressed in exactly one way as a polynomial function of the even degree symmetric functions of the eigenvalues and the Pfaffian. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = FAF^t, by some rotation matrix F, just when their even degree symmetric functions and their Pfaffian agree.
Proof. If f(A) is a polynomial invariant under rotations, i.e. f(FAF^t) = f(A), then we can write f(A) = h(t(a)), with a_1, a_2, . . . , a_n the numbers in the skew-symmetric normal form of A, and h some function. Let's write the restriction of f to the normal form matrices as a polynomial f(a). We can split f into a sum of homogeneous polynomials of various degrees, so let's assume that f is already homogeneous of some degree. We can pick any monomial in f and sum it over permutations and over changes of signs of any even number of variables, and f will be a sum over such quantities. So we only have to consider each such quantity, i.e. assume that
f = Σ (±a_1)^{d_{p(1)}} (±a_2)^{d_{p(2)}} . . . (±a_n)^{d_{p(n)}}    (27.1)
where the sum is over all choices of any even number of minus signs and all permutations p of the degrees d1 , d2 , . . . , dn . If all degrees d1 , d2 , . . . , dn are even,
then f is an elementary symmetric function of a_1^2, a_2^2, . . . , a_n^2, so a polynomial in t_1(a), t_2(a), . . . , t_{n−1}(a). If all degrees are odd, then they are all positive, and we can divide out a factor of a_1 a_2 . . . a_n = t_n(a). So let's assume that at least one degree is even, say d_1, and that at least one degree is odd, say d_2. All terms in equation 27.1 that put a plus sign in front of a_1 and a_2 cancel those terms which put a minus sign in front of both a_1 and a_2. Similarly, terms putting a minus sign in front of a_1 and a plus sign in front of a_2 cancel those which do the opposite. So f = 0. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the symmetric functions of the squared eigenvalues. The characteristic polynomial of a 2n × 2n skew-symmetric matrix A in normal form is clearly
det(A − λ I) = (λ^2 + a_1^2)(λ^2 + a_2^2) . . . (λ^2 + a_n^2),
so that
s_{2j}(A) = s_j(a_1^2, a_2^2, . . . , a_n^2),  s_{2j−1}(A) = 0,
for any j = 1, . . . , n. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the even degree symmetric functions of the eigenvalues.
Factorizations
28.1 The Vector Space of Linear Maps Between Two Vector Spaces
If V and W are two vector spaces, then a linear map T : V → W is also often called a homomorphism of vector spaces, or a homomorphism for short, or a morphism to be even shorter. We won't use this terminology, but we will nevertheless write Hom(V, W) for the set of all linear maps T : V → W. Definition 28.1. A linear map is onto if every output w in W comes from some input: w = T v, some input v in V. A linear map is 1-to-1 if any two distinct vectors v_1 ≠ v_2 get mapped to distinct vectors T v_1 ≠ T v_2. Problem 28.1. Turn Hom(V, W) into a vector space.
Problem 28.2. Prove that a linear map is an isomorphism just when it is 1-to-1 and onto. Problem 28.3. Give the simplest example you can of a 1-to-1 linear map which is not onto. Problem 28.4. Give the simplest example you can of an onto linear map which is not 1-to-1. Problem 28.5. Prove that a linear map is 1-to-1 just when its kernel consists precisely in the zero vector.
Remark 28.2. If V and W are complex vector spaces, we will write Hom (V, W ) to mean the set of linear maps of complex vector spaces, etc. We will only prove results for real vector spaces, and the reader can imagine how to generalize them appropriately. 265
So we will identify (R^n)^* with the set of row matrices. We will write e^1, e^2, . . . , e^n for the obvious basis: e^i is the i-th row of the identity matrix. Problem 28.6. Why is V^* a vector space?
Problem 28.7. What is dim V^*? Remark 28.5. V and V^* have the same dimension, but we should think of them as quite different vector spaces. Lemma 28.6. Suppose that V is a vector space with basis v_1, v_2, . . . , v_n. There is a unique basis for V^*, called the basis dual to v_1, v_2, . . . , v_n, which we will write as v^1, v^2, . . . , v^n, so that
v^i(v_j) = 1 if i = j, and v^i(v_j) = 0 if i ≠ j.
Remark 28.7. The hard part is getting used to the notation: v^1, v^2, . . . , v^n are each a linear function taking vectors from V to numbers: v^1, v^2, . . . , v^n : V → R. Proof. For each fixed i, the equations above uniquely determine a linear function v^i, by theorem 16.23 on page 167, since we have defined the linear function on a basis. The functions v^1, v^2, . . . , v^n are linearly independent, because if they satisfy Σ_i a_i v^i = 0, then applying this linear function Σ_i a_i v^i to the
basis vector v_j we find a_j = 0, and this holds for each j, so all numbers a_1, a_2, . . . , a_n vanish. Finally if we have any linear function f on V, then we can set a_1 = f(v_1), a_2 = f(v_2), . . . , a_n = f(v_n), and find f(v) = Σ_j a_j v^j(v) for v = v_1 or v = v_2, etc., and therefore for v any linear combination of v_1, v_2, etc. Therefore f = Σ_j a_j v^j, and we see that these functions v^1, v^2, . . . , v^n span V^*. Problem 28.8. Find the dual basis v^1, v^2, v^3 to the basis
v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, v_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, v_3 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.
Problem 28.9. Prove that the dual basis v^1, v^2, . . . , v^n to any basis v_1, v_2, . . . , v_n of R^n satisfies
\begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} = F^{-1}, where F = ( v_1  v_2  . . .  v_n ).
Lemma 28.8. Let V be a finite dimensional vector space. V and V^{**} are isomorphic, by associating to each vector x from V the linear function f_x on V^* defined by f_x(ξ) = ξ(x). Remark 28.9. This lemma is very confusing, but very simple, and therefore very important. Proof. First, let's ask what V^{**} means. Its vectors are linear functions on V^*, by definition. Next, let's pick a vector x in V and construct a linear function on V^*. How? Take any covector ξ in V^*, and let's assign to it some number f(ξ). Since ξ is (by definition again) a linear function on V, ξ(x) is a number. Let's take the number f(ξ) = ξ(x). Let's call this function f = f_x. The rest of the proof is a series of exercises. Problem 28.10. Check that f_x is a linear function.
Problem 28.12. Check that T : V → V^{**} is one-to-one: i.e. if we pick two different vectors x and y in V, then f_x ≠ f_y.
Remark 28.10. Although V and V^{**} are identified as above, V and V^* cannot be identified in any natural manner, and should be thought of as different. Definition 28.11. If T : V → W is a linear map, write T^* : W^* → V^* for the linear map given by T^*(ξ)(v) = ξ(T v). Call T^* the transpose of T. Problem 28.13. Prove that T^* : W^* → V^* is a linear map. Problem 28.14. What does this notion of transpose have to do with the notion of transpose of matrices?
Definition 28.15. If x + W and y + W are translates, we add them by (x + W) + (y + W) = (x + y) + W. If s is a number, let s(x + W) = sx + W. Problem 28.16. Prove that addition and scaling of translates is well-defined, independent of the choice of x and y in a given translate.
Definition 28.16. The quotient space V/W of a vector space V by a subspace W is the set of all translates v + W of all vectors v in V. Example 28.17. Take V the plane, V = R^2, and W the vertical axis. The translates of W are the vertical lines in the plane. The quotient space V/W has the various vertical lines as its points. Each vertical line passes through the horizontal axis at a single point, uniquely determining the vertical line. So the translates are the sets
\begin{pmatrix} x \\ 0 \end{pmatrix} + W.
The quotient space V/W is just identified with the horizontal axis, by taking \begin{pmatrix} x \\ 0 \end{pmatrix} + W to x.
Lemma 28.18. The quotient space $V/W$ of a vector space by a subspace is a vector space. The map $T : V \to V/W$ given by the rule $Tx = x + W$ is an onto linear map.

Remark 28.19. The concept of quotient space can be circumvented by using some complicated matrices, as can everything in linear algebra, so that one never really needs to use abstract vector spaces. But that approach is far more complicated and confusing, because it involves a choice of basis, and there is usually no natural choice to make. It is always easiest to carry out linear algebra as abstractly as possible, descending into choices of basis at the latest possible stage.

Proof. One has to check that $(x + W) + (y + W) = (y + W) + (x + W)$, but this follows immediately from $x + y = y + x$. Similarly all of the laws of vector spaces hold. The 0 element of $V/W$ is the translate $0 + W$, i.e. $W$ itself. To check that $T$ is linear, consider scaling: $T(sx) = sx + W = s(x + W)$, and addition: $T(x + y) = x + y + W = (x + W) + (y + W)$.

Lemma 28.20. If $U$ and $W$ are subspaces of a vector space $V$, and $V = U \oplus W$ is a direct sum of subspaces, then the map $T : V \to V/W$ taking vectors $v$ to $v + W$ restricts to an isomorphism $T|_U : U \to V/W$.

Remark 28.21. So, while there is no natural complement to $W$, every choice of complement is naturally identified with the quotient space.
Proof. The kernel of $T|_U$ is clearly $U \cap W = 0$. To see that $T|_U$ is onto, take a vector $v + W$ in $V/W$. Because $V = U \oplus W$, we can write $v$ as a sum $v = u + w$ with $u$ from $U$ and $w$ from $W$. Therefore $v + W = u + W = T|_U u$ lies in the image of $T|_U$.

Theorem 28.22. If $V$ is a finite dimensional vector space and $W$ a subspace of $V$, then
$$\dim V/W = \dim V - \dim W.$$
Remark 28.26. For simplicity, mathematicians often write $V/(U \cap V) = (U + V)/U$, with the isomorphism above implicitly understood.

Proof. Define a linear map $\varphi : V \to (U + V)/U$ by $\varphi v = v + U$. This map clearly has kernel $U \cap V$, so as in the first isomorphism theorem it induces a monomorphism $\bar\varphi : V/(U \cap V) \to (U + V)/U$, given by $\bar\varphi(v + (U \cap V)) = \varphi v$. Take any vector in $(U + V)/U$, say $u + v + U$. Clearly $u + v + U = v + U$ (as a translate), so every vector in $(U + V)/U$ has the form $v + U$ for some vector $v$ from $V$. Therefore $v + U = \varphi v = \bar\varphi(v + (U \cap V))$, so $\bar\varphi$ is an isomorphism.

Theorem 28.27 (The Third Isomorphism Theorem). If $U$ is a subspace of $V$, which is itself a subspace of a vector space $W$, then there is an isomorphism
$$\bar\varphi : (W/U)/(V/U) \to W/V,$$
given by the rule $\bar\varphi((w + U) + V/U) = w + V$.

Remark 28.28. For simplicity, mathematicians often write $(W/U)/(V/U) = W/V$, with the isomorphism above implicitly understood.

Proof. Define a map $\varphi : W/U \to W/V$ by the rule $\varphi(w + U) = w + V$. Clearly $\varphi$ is an epimorphism, with $\ker\varphi = V/U$. Therefore by the first isomorphism theorem, $\varphi$ induces an isomorphism $\bar\varphi : (W/U)/(V/U) \to W/V$.
as a good description of where they lie. How do they arrange themselves around the mean? To keep things simple, let's subtract the mean from each of the vectors. So assume that the mean is $\mu = 0$, and we are asking how the vectors arrange themselves around the origin. Imagine that these vectors $v_1, v_2, \dots, v_N$ tend to lie along a particular line through the origin. Let's try to take an orthonormal basis of $\mathbb{R}^n$, say $u_1, u_2, \dots, u_n$, so that $u_1$ points along that line. How can we find the direction of that line? We look at the quantity $\langle v_k, x\rangle$. If the vectors lie nearly on a line through 0, then for $x$ on that line, $\langle v_k, x\rangle$ should be large positive or negative, while for $x$ perpendicular to that line, $\langle v_k, x\rangle$ should be nearly 0. If we square, we can make sure the large positive or negative becomes large positive, so we take the quantity
$$Q(x) = \langle v_1, x\rangle^2 + \langle v_2, x\rangle^2 + \dots + \langle v_N, x\rangle^2.$$
The spectral theorem guarantees that we can pick an orthonormal basis $u_1, u_2, \dots, u_n$ of eigenvectors of the symmetric matrix $A$ associated to $Q$. We will arrange the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ from largest to smallest. Because $Q(x) \ge 0$, we see that none of the eigenvalues are negative. Clearly $Q(x)$ grows fastest in the direction $x = u_1$.

Problem 29.1. Prove that the symmetric matrix $A$ associated to $Q(x)$ (the matrix for which $\langle Ax, x\rangle = Q(x)$) has entries
$$A_{ij} = \langle v_1, e_i\rangle\langle v_1, e_j\rangle + \langle v_2, e_i\rangle\langle v_2, e_j\rangle + \dots + \langle v_N, e_i\rangle\langle v_N, e_j\rangle.$$

If we rescale all of the vectors $v_1, v_2, \dots, v_N$ by the same nonzero scalar, then the resulting vectors tend to lie along the same lines or planes as the original vectors did. So it is convenient to replace $Q(x)$ by the quadratic polynomial function
$$Q(x) = \frac{\sum_k \langle v_k, x\rangle^2}{\sum_k \|v_k\|^2}.$$
This has associated symmetric matrix
$$A_{ij} = \frac{\sum_k \langle v_k, e_i\rangle\langle v_k, e_j\rangle}{\sum_k \|v_k\|^2},$$
which we will call the covariance matrix associated to the data.

Lemma 29.1. Given any set of nonzero vectors $v_1, v_2, \dots, v_N$ in $\mathbb{R}^n$, write them as the columns of a matrix $V$. Their covariance matrix
$$A = \frac{V V^t}{\sum_k \|v_k\|^2}$$
has an orthonormal basis of eigenvectors $u_1, u_2, \dots, u_n$ with eigenvalues $1 \ge \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$.

Remark 29.2. The square roots of the eigenvalues are the correlation coefficients, each indicating how much the data tends to lie in the direction of the associated eigenvector.

Proof. We have only to check that $\langle x, V V^t x\rangle = \sum_k \langle v_k, x\rangle^2$, an exercise for the reader, to see that $A$ is the covariance matrix. Eigenvalues of $A$ can't be negative, as mentioned already. For any vector $x$ of length 1, the Schwarz inequality (lemma 20.1 on page 193) says that
$$\sum_k \langle v_k, x\rangle^2 \le \sum_k \|v_k\|^2.$$
Therefore, by the minimum principle, eigenvalues of $A$ can't exceed 1. Our data lies mostly along a line through 0 just when $\lambda_1$ is large and the remaining eigenvalues $\lambda_2, \lambda_3, \dots, \lambda_n$ are much smaller. More generally, if we find that the first dozen or so eigenvalues are relatively large, and the rest are relatively much smaller, then our data must lie very close to a subspace of dimension a dozen or so. The data tends most strongly to lie along the $u_1$ direction; fluctuations about that direction are mostly in the $u_2$ direction, etc.
[Figure: (a) the data; (b) the same data, with lines indicating the directions of the eigenvectors. Vectors sticking out from the mean are drawn in those directions; their lengths give the correlation coefficients.]
Every vector $x$ can be written as $x = a_1 u_1 + a_2 u_2 + \dots + a_n u_n$, and the numbers $a_1, a_2, \dots, a_n$ are recovered from the formula $a_i = \langle x, u_i\rangle$. If the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_d$ are relatively much larger than the rest, we can say that our data live near the subspace spanned by $u_1, u_2, \dots, u_d$, and say that our data has $d$ effective dimensions. The numbers $a_1, a_2, \dots, a_d$ are called the principal components of a vector $x$. To store the data, instead of remembering all of the vectors $v_1, v_2, \dots, v_N$, we just keep track of the eigenvectors $u_1, u_2, \dots, u_d$, and of the principal components of the vectors $v_1, v_2, \dots, v_N$. In matrices, this means that instead of storing $V$, we store $F = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}$, and store the first $d$ rows of $F^t V$; let $W$ be the matrix of these rows. Coming out of storage, we can approximately recover the vectors $v_1, v_2, \dots, v_N$ as the columns of $F W$ (padding $W$ with rows of zeros so that the product makes sense). The matrix $F$ represents an orthogonal transformation putting the vectors $v_1, v_2, \dots, v_N$ nearly into the subspace spanned by $e_1, e_2, \dots, e_d$, and mostly along the $e_1$ direction, with fluctuations mostly along the $e_2$ direction, etc. So it is often useful to take a look at the columns of $W$ themselves, as a convenient picture of the data.
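Here is a minimal Python sketch of the procedure just described, with made-up random data (the names `V`, `F`, `W` follow the text; everything else is for illustration only):

```python
import numpy as np

# Data vectors are the columns of V; subtract the mean, form the covariance
# matrix, and keep only the leading eigenvector directions.
rng = np.random.default_rng(0)
n, N, d = 3, 200, 1
V = rng.normal(size=(n, N))
V[0] *= 10.0                          # make the data lie mostly along one direction
V -= V.mean(axis=1, keepdims=True)    # subtract the mean, as in the text

A = V @ V.T / np.sum(V * V)           # covariance matrix, divided by sum of |v_k|^2
lam, F = np.linalg.eigh(A)            # eigenvalues ascending, orthonormal eigenvectors
lam, F = lam[::-1], F[:, ::-1]        # reorder so lambda_1 >= ... >= lambda_n

print(np.round(lam, 3))               # 1 >= lambda_1 >= ... >= lambda_n >= 0
W = (F.T @ V)[:d]                     # principal components: first d rows of F^t V
V_approx = F[:, :d] @ W               # approximate recovery of the data
print(np.linalg.norm(V - V_approx) / np.linalg.norm(V))   # small relative error
```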
$$\Sigma = \begin{pmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{pmatrix}.$$
Proof. Suppose that $A$ is $p \times q$. Just as when we worked out principal components, we order the eigenvalues of $A^t A$ from largest to smallest. For each eigenvalue $\lambda_j$, let $\sigma_j = \sqrt{\lambda_j}$. (Since we saw that the eigenvalues $\lambda_j$ of $A^t A$ aren't negative, the square root makes sense.) Let $V$ be the matrix whose columns are an orthonormal basis of eigenvectors of $A^t A$, ordered by eigenvalue. Write
$$V = \begin{pmatrix} V_1 & V_2 \end{pmatrix}$$
with $V_1$ the eigenvectors with positive eigenvalues, and $V_2$ those with 0 eigenvalue. For each nonzero eigenvalue, define a vector
$$u_j = \frac{1}{\sigma_j} A v_j.$$
Suppose that there are $r$ nonzero eigenvalues. Let's check that these vectors $u_1, u_2, \dots, u_r$ are orthonormal:
$$\begin{aligned}
\langle u_i, u_j\rangle &= \left\langle \frac{1}{\sigma_i} A v_i, \frac{1}{\sigma_j} A v_j\right\rangle \\
&= \frac{1}{\sigma_i \sigma_j}\, \langle A v_i, A v_j\rangle \\
&= \frac{1}{\sigma_i \sigma_j}\, \langle v_i, A^t A v_j\rangle \\
&= \frac{1}{\sigma_i \sigma_j}\, \langle v_i, \lambda_j v_j\rangle \\
&= \begin{cases} \dfrac{\lambda_j}{\sigma_i \sigma_j} & \text{if } i = j, \\ 0 & \text{otherwise} \end{cases} \\
&= \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}
\end{aligned}$$
If there aren't enough vectors $u_1, u_2, \dots, u_r$ to make up a basis (i.e. if $r < p$), then just write down some more vectors to make up an orthonormal basis, say
Corollary 29.4. Any square matrix $A$ can be written as $A = KP$ (the Cartan decomposition, also called the polar decomposition), where $K$ is orthogonal and $P$ is symmetric and positive semidefinite.

Proof. Write $A = U \Sigma V^t$ and set $K = U V^t$ and $P = V \Sigma V^t$.
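A quick numerical check of this corollary, using NumPy's SVD (which returns $U$, the singular values, and $V^t$); the matrix below is an arbitrary example:

```python
import numpy as np

# Polar (Cartan) decomposition from the SVD: K = U V^t is orthogonal and
# P = V diag(sigma) V^t is symmetric positive semidefinite, with A = K P.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

U, sigma, Vt = np.linalg.svd(A)
K = U @ Vt
P = Vt.T @ np.diag(sigma) @ Vt

print(np.allclose(A, K @ P))                              # True: A = K P
print(np.allclose(K.T @ K, np.eye(2)))                    # True: K is orthogonal
print(np.allclose(P, P.T), np.linalg.eigvalsh(P).min() >= 0)  # symmetric, no negative eigenvalues
```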
30 Factorizations
Most theorems in linear algebra are obvious consequences of simple factorizations.
30.1 LU Factorization
Forward elimination is messy: we swap rows and add rows to lower rows. We want to put together all of the row swaps into one permutation matrix, and all of the row additions into one strictly lower triangular matrix.
Algorithm
To forward eliminate a matrix $A$ (let's say with $n$ rows), start by setting $p$ to be the permutation $1, 2, 3, \dots, n$ (the identity permutation), $L = 1$, $U = A$. To start with, no entries of $L$ are painted. Carry out forward elimination on $U$.
a. Each time you find a nonzero pivot in $U$, you paint a larger square box in the upper left corner of $L$. (The number of rows in this painted box is always the number of pivots in $U$.)
b. When you swap rows $k$ and $\ell$ of $U$, (a) swap entries $k$ and $\ell$ of the permutation $p$ and (b) swap rows $k$ and $\ell$ of $L$, but only swap unpainted entries which lie beneath painted ones.
c. If you add $s \cdot (\text{row } k)$ to row $\ell$ in $U$, then put $-s$ into column $k$, row $\ell$ of $L$.
The painted box in $L$ is always square, with number of rows and columns equal to the number of pivots drawn in $U$.

Remark 30.1. As each step begins, the pivot rows of $U$ with nonzero pivots in them are finished, and the entries inside the painted box and the entries on and above the diagonal of $L$ are finished.

Theorem 30.2. By following the algorithm above, every matrix $A$ can be written as $A = P^{-1} L U$, where $P$ is the permutation matrix of a permutation $p$, $L$ a strictly lower triangular matrix, and $U$ an upper triangular matrix.

Proof. Let's show that after each step, we always have $PA = LU$ and always have $L$ strictly lower triangular. For the first forward elimination step, we might have to swap rows. There is no painted box yet, so the algorithm says that
the row swap leaves all entries of $L$ alone. Let $Q$ be the permutation matrix of the required row swap, and $q$ the permutation. Our algorithm will pass from $p = 1$, $L = I$, $U = A$ to $p = q$, $L = I$, $U = QA$, and so $PA = LU$. Next, we might have to add some multiples of the first row of $U$ to lower rows. We carry this out by a strictly lower triangular matrix, say
$$S = \begin{pmatrix} 1 & 0 \\ s & I \end{pmatrix},$$
whose inverse
$$S^{-1} = \begin{pmatrix} 1 & 0 \\ -s & I \end{pmatrix}$$
subtracts the corresponding multiples of row 1 from lower rows. So $U$ becomes $U^{\text{new}} = SU$, while the permutation $p$ (and hence the matrix $P$) stays the same. The matrix $L$ becomes $L^{\text{new}} = S^{-1} L = S^{-1}$, strictly lower triangular, and $P^{\text{new}} A = L^{\text{new}} U^{\text{new}}$. Suppose that after some number of steps, we have reduced the upper left
is strictly lower triangular, and that $P$ is some permutation matrix. Finally, suppose that $PA = LU$. Our next step in forward elimination could be to swap rows $k$ and $\ell$ in $U$, and these we can assume are rows in the bottom of $U$, i.e. rows of $U_2$. Suppose that $Q$ is the permutation matrix of a transposition so that $QU_2$ is $U_2$ with the appropriate rows swapped. In particular, $Q^2 = I$ since $Q$ is the permutation matrix of a transposition. Let
$$P^{\text{new}} = \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix} P, \quad L^{\text{new}} = \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix} L \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix}, \quad U^{\text{new}} = \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix} U.$$
Then $P^{\text{new}} A = L^{\text{new}} U^{\text{new}}$, and $L^{\text{new}}$ is
strictly lower triangular. The upper left corner $L_0$ is the painted box. So $L^{\text{new}}$ is just $L$ with rows $k$ and $\ell$ swapped under the painted box. If we add $s \cdot (\text{row } k)$ of $U$ to row $\ell$, this means multiplying $U$ by a strictly lower triangular matrix, say $S$. Then $PA = LU$ implies that $PA = LS^{-1}\, SU$. But $LS^{-1}$ is just $L$ with $s \cdot (\text{column } \ell)$ subtracted from column $k$.

Problem 30.1. Find the LU-factorization of each of
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
Problem 30.2. Suppose that A is an invertible matrix. Prove that any two LU-factorizations of A which have the same permutation matrix P must be the same.
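For a numerical sanity check, here is a short sketch using SciPy rather than the painted-box bookkeeping above. SciPy's `lu` returns $A = PLU$ with $P$ a permutation matrix, $L$ lower triangular with unit diagonal, and $U$ upper triangular; equivalently $P^t A = LU$, which is the same kind of factorization as in theorem 30.2 with the permutation written on the other side. The matrix below is just an example whose first pivot is zero, so a row swap is needed.

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0],
              [2.0, 0.0, 3.0]])

P, L, U = lu(A)
print(np.allclose(A, P @ L @ U))     # True
print(np.allclose(P.T @ A, L @ U))   # True: a (PA = LU)-style factorization
print(L)                             # unit diagonal, multipliers below it
print(U)                             # the forward-eliminated (upper triangular) matrix
```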
Tensors
31 Quadratic Forms
Quadratic forms generalize the concept of inner product, and play a crucial role in modern physics.
$$b(x, y) = \sum_{ij} x_i y_j\, b(e_i, e_j) = \sum_{ij} x_i y_j A_{ij} = \langle x, Ay\rangle.$$
So every bilinear form on $\mathbb{R}^p$ has the form $b(x, y) = \langle x, Ay\rangle$ for a uniquely determined matrix $A$. Of course, we can add and scale bilinear forms in the obvious way to make more bilinear forms, and the bilinear forms on a fixed vector space $V$ form a vector space.

Lemma 31.5. Fix a basis $v_1, v_2, \dots, v_p$ of $V$. Given any collection of numbers $b_{ij}$ (with $i, j = 1, 2, \dots, p$), there is precisely one bilinear form $b$ with $b_{ij} = b(v_i, v_j)$. Thus the vector space of bilinear forms on $V$ is isomorphic to the vector space of $p \times p$ matrices $(b_{ij})$.
Proof. Given any bilinear form $b$ we can calculate out the numbers $b_{ij} = b(v_i, v_j)$. Conversely, given any numbers $b_{ij}$, and vectors $x = \sum_i x_i v_i$ and $y = \sum_j y_j v_j$, we can let
$$b(x, y) = \sum_{i,j} b_{ij} x_i y_j.$$
Clearly adding bilinear forms adds the associated numbers $b_{ij}$, and scaling bilinear forms scales those numbers.

Lemma 31.6. Let $B$ be the set of all bilinear forms on $V$. If $V$ has finite dimension, then $\dim B = (\dim V)^2$.

Proof. There are $(\dim V)^2$ numbers $b_{ij}$.
(d) $b(p(x), q(x)) = p(1) + q(1)$,
(e) $b(p(x), q(x)) = p(0)q(0) + p(1)q(1) + p(2)q(2)$,
is $b$ bilinear? Is $b$ degenerate?
$$Q(x) = \langle x, x\rangle = \sum_i x_i^2$$
is the quadratic form of the inner product on $\mathbb{R}^n$.

Example 31.10. Every quadratic form on $\mathbb{R}^n$ has the form
$$Q(x) = \sum_{ij} A_{ij} x_i x_j,$$
for some numbers $A_{ij} = A_{ji}$. We could make a symmetric matrix $A$ with those numbers as entries, so that $Q(x) = \langle x, Ax\rangle$. Of course, as in section 14.2 on page 138, the symmetric matrix $A$ is uniquely determined by the quadratic form $Q$ and uniquely determines $Q$.

Problem 31.8. What are the quadratic forms of the bilinear forms in problem 31.6 on the facing page?

Lemma 31.11. Every quadratic form $Q$ determines a symmetric bilinear form $b$ by
$$b(x, y) = \frac{1}{2}\left(Q(x + y) - Q(x) - Q(y)\right).$$
Moreover, $Q$ is the quadratic form of $b$.

Proof. There are various identities we have to check on $b$ to ensure that $b$ is a bilinear form. Each identity involves a finite number of vectors. Therefore it suffices to prove the result over a finite dimensional vector space $V$ (replacing $V$ by the span of the vectors involved in each identity). Be careful: the identities
have to hold for all vectors from $V$, but we can first pick the vectors from $V$, then replace $V$ by their span, and then check the identity. Since we can assume that $V$ is finite dimensional, we can take a basis for $V$ and therefore assume that $V = \mathbb{R}^n$. Therefore we can write $Q(x) = \langle x, Ax\rangle$, for a symmetric matrix $A$. Expanding out,
$$b(x, y) = \frac{1}{2}\left(\langle x + y, A(x + y)\rangle - \langle x, Ax\rangle - \langle y, Ay\rangle\right) = \langle x, Ay\rangle,$$
which is clearly bilinear.

Problem 31.9. The results of this chapter are still true over any field (although we won't try to make sense of being positive definite if our field is not $\mathbb{R}$), except for lemma 31.11. Find a counterexample to lemma 31.11 on the previous page over the field of Boolean numbers.

Theorem 31.12. The equation
$$b(x, y) = \frac{1}{2}\left(Q(x + y) - Q(x) - Q(y)\right)$$
gives an isomorphism between the vector space of symmetric bilinear forms b and the vector space of quadratic forms Q. The proof is obvious just looking at the equation: if you scale the left side, then you scale the right side, and vice versa, and similarly if you add bilinear forms on the left side, you add quadratic forms on the right side.
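Here is a small numerical illustration of lemma 31.11 and theorem 31.12. The quadratic form `Q` below is made up for the example; the polarization identity recovers its symmetric bilinear form `b`, and hence the symmetric matrix $A$ with $Q(x) = \langle x, Ax\rangle$.

```python
import numpy as np

def Q(x):
    x1, x2, x3 = x
    return x1**2 + 4*x1*x2 + 2*x2*x3 - x3**2   # an example quadratic form on R^3

def b(x, y):
    return 0.5 * (Q(x + y) - Q(x) - Q(y))      # b(x, y) = (Q(x+y) - Q(x) - Q(y)) / 2

e = np.eye(3)
A = np.array([[b(e[i], e[j]) for j in range(3)] for i in range(3)])
print(A)                       # symmetric: [[1, 2, 0], [2, 0, 1], [0, 1, -1]]

x = np.array([1.0, 2.0, 3.0])
print(Q(x), x @ A @ x)         # the two values agree (both 12.0)
```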
Theorem 31.13 (Sylvester's Law of Inertia). For any quadratic form $Q$ on an $n$-dimensional real vector space $V$, there is an invertible linear map $F : V \to \mathbb{R}^n$ so that
$$Q(x) = x_1^2 + x_2^2 + \dots + x_p^2 - x_{p+1}^2 - x_{p+2}^2 - \dots - x_{p+q}^2,$$
where
$$Fx = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$
We cannot by any linear change of variables alter the value of $p$ (the number of positive terms) or the value of $q$ (the number of negative terms).
Remark 31.14. Sylvester's Law of Inertia tells us what all quadratic forms look like, if we allow ourselves to change variables. The numbers $p$ and $q$ are the only invariants. The reader should keep in mind that in our study of the spectral theorem, we only allowed orthogonal changes of variable, so we got eigenvalues as invariants. But here we allow any linear change of variable; in particular we can rescale, so only the signs of the eigenvalues are invariant.

Proof. We could apply the spectral theorem (theorem 14.2 on page 136), but we will instead use elementary algebra. Take any basis for $V$, so that we can assume that $V = \mathbb{R}^n$, and that $Q(x) = \langle x, Ax\rangle$ for some symmetric matrix $A$. In other words, $Q(x) = A_{11} x_1^2 + A_{12} x_1 x_2 + \dots$. Suppose that $A_{11} \neq 0$. Let's collect together all terms containing $x_1$ and complete the square:
$$A_{11} x_1^2 + \sum_{j>1} A_{1j} x_1 x_j + \sum_{i>1} A_{i1} x_i x_1 = A_{11}\left(x_1 + \frac{1}{A_{11}}\sum_{j>1} A_{1j} x_j\right)^2 + \dots$$
Set
$$y_1 = x_1 + \frac{1}{A_{11}}\sum_{j>1} A_{1j} x_j.$$
Then
$$Q(x) = A_{11}\, y_1^2 + \dots,$$
where the ... involve only $x_2, x_3, \dots, x_n$. Changing variables to use $y_1$ in place of $x_1$ is an invertible linear change of variables, and gets rid of nondiagonal terms involving $x_1$. We can continue this process using $x_2$ instead of $x_1$, until we have used up all variables $x_i$ with $A_{ii} \neq 0$. So let's suppose that all diagonal terms of $A$ vanish. If there is some nondiagonal term of $A$ which doesn't vanish, say $A_{12}$, then make new variables $y_1 = x_1 + x_2$ and $y_2 = x_1 - x_2$, so $x_1 = \frac{1}{2}(y_1 + y_2)$ and $x_2 = \frac{1}{2}(y_1 - y_2)$. Then $x_1 x_2 = \frac{1}{4}\left(y_1^2 - y_2^2\right)$, turning the $A_{12} x_1 x_2$ term into two diagonal terms. So now we have killed off all nondiagonal terms, so we can assume that
$$Q(x) = A_{11} x_1^2 + A_{22} x_2^2 + \dots + A_{nn} x_n^2.$$
We can rescale $x_1$ by any nonzero constant $c$, which scales $A_{11}$ by $1/c^2$. Let's choose $c$ so that $c^2 = |A_{11}|$.

Problem 31.10. Apply this method to the quadratic form $Q(x) = x_1^2 + x_1 x_2 + x_2 x_1 + x_2 x_3 + x_3 x_2$.
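As a numerical cross-check of Sylvester's law, one can count signs of eigenvalues of the symmetric matrix of a quadratic form: rescaling each eigenvector direction turns each eigenvalue into $+1$ or $-1$, so the counts of positive and negative eigenvalues are the invariants $p$ and $q$. The sketch below applies this to the quadratic form of problem 31.10, whose symmetric matrix is written out in the code.

```python
import numpy as np

# Q(x) = x1^2 + x1 x2 + x2 x1 + x2 x3 + x3 x2 has this symmetric matrix.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

eigenvalues = np.linalg.eigvalsh(A)
p = int(np.sum(eigenvalues > 0))
q = int(np.sum(eigenvalues < 0))
print(eigenvalues, "p =", p, "q =", q)   # here p = 2, q = 1
```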
Next we have to show that the numbers p of positive terms and q of negative terms cannot be altered by any linear change of variables. We can assume (by using the linear change of variables we have just constructed) that V = Rn and that
$$Q(x) = x_1^2 + x_2^2 + \dots + x_p^2 - x_{p+1}^2 - x_{p+2}^2 - \dots - x_{p+q}^2.$$
We want to show that $p$ is the largest dimension of any subspace on which $Q$ is positive definite, and similarly that $q$ is the largest dimension of any subspace on which $Q$ is negative definite. Consider the subspace $V_+$ of vectors of the form
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_p \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Clearly $Q$ is positive definite on $V_+$. Similarly, $Q$ is negative definite on the subspace $V_-$ of vectors of the form
$$x = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{p+1} \\ \vdots \\ x_{p+q} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Suppose that we can find some subspace $W$ of $V$ of greater dimension than $p$, so that $Q$ is positive definite on $W$. Let $P_+$ be the orthogonal projection to $V_+$.
In other words, for any vector $x$ in $\mathbb{R}^n$, let
$$P_+ x = \begin{pmatrix} x_1 \\ \vdots \\ x_p \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Then $P_+|_W : W \to V_+$ is a linear map, and $\dim W > \dim V_+$, so
$$\dim W = \dim\ker P_+|_W + \dim\operatorname{im} P_+|_W \le \dim\ker P_+|_W + \dim V_+ < \dim\ker P_+|_W + \dim W.$$
So, subtracting $\dim W$ from both sides, $0 < \dim\ker P_+|_W$. Therefore there is a nonzero vector $x$ in $W$ for which $P_+ x = 0$, i.e.
$$x = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{p+1} \\ \vdots \\ x_{p+q} \\ x_{p+q+1} \\ \vdots \\ x_n \end{pmatrix}.$$
Clearly $Q(x) > 0$ since $x$ lies in $W$. But clearly
$$Q(x) = -x_{p+1}^2 - x_{p+2}^2 - \dots - x_{p+q}^2 \le 0,$$
a contradiction.
Remark 31.15. Much of the proof works over any field, as long as we can divide by 2, i.e. as long as $2 \neq 0$. However, there could be a problem when we try to rescale: even if $2 \neq 0$, we can only arrange
$$Q(x) = \lambda_1 x_1^2 + \lambda_2 x_2^2 + \dots + \lambda_n x_n^2,$$
where each $\lambda_i$ can be rescaled by any nonzero number of the form $c^2$. (There is no reasonable analogue of the numbers $p$ and $q$ in a general field.) In particular, since every complex number has a square root, the same theorem is true for complex quadratic forms, but in the stronger form that we can arrange $q = 0$, i.e. we can arrange
$$Q(x) = x_1^2 + x_2^2 + \dots + x_p^2.$$

Problem 31.11. Prove that a quadratic form on any real $n$-dimensional vector space is nondegenerate just when $p + q = n$, with $p$ and $q$ as in Sylvester's law of inertia.

Problem 31.12. For a complex quadratic form, prove that if we arrange our quadratic form to be
$$Q(x) = x_1^2 + x_2^2 + \dots + x_p^2,$$
then we are stuck with the resulting value of $p$, no matter what linear change of variables we employ.
lies in the null cone of the symmetric bilinear form $b(x, y) = x_1 y_1 - x_2 y_2$ for $x$ and $y$ in $\mathbb{R}^2$. Indeed the null cone of that symmetric bilinear form is $0 = b(x, x) = x_1^2 - x_2^2$, so it is the pair of lines $x_1 = x_2$ and $x_1 = -x_2$ in $\mathbb{R}^2$.
Problem 31.15. Prove that the kernel of a symmetric bilinear form lies in its null cone.
Problem 31.16. Find the null cone of the symmetric bilinear form $b(x, y) = x_1 y_1 - x_2 y_2$ for $x$ and $y$ in $\mathbb{R}^3$. What part of the null cone is the kernel?

Problem 31.17. Prove that the kernel is a subspace. For which symmetric bilinear forms is the null cone a subspace? For which symmetric bilinear forms is the kernel equal to the null cone?
Problem 31.20. Suppose that $b$ is a symmetric bilinear form on a finite dimensional vector space $V$.
(a) For each vector $x$ in $V$, define a linear map $\xi : V \to \mathbb{R}$ by $\xi(y) = b(x, y)$. Write this covector as $\xi = Tx$. Prove that the map $T : V \to V^*$ given by $x \mapsto Tx$ is linear.
(b) Prove that the kernel of $T$ is the kernel of $b$.
(c) Prove that $T$ is an isomorphism just when $b$ is nondegenerate.
(The moral of the story is that a nondegenerate symmetric bilinear form $b$ identifies the vector space $V$ with its dual space $V^*$ via the map $T$.)
(d) If $b$ is nondegenerate, prove that for each covector $\xi$, there is a unique vector $x$ in $V$ so that $b(x, y) = \xi(y)$ for every vector $y$ in $V$.
$P_{ijk} S_{jk}$.
Just as a matrix is a rectangle of numbers, a tensor with three indices is a box of numbers. For this chapter, all of our tensors will be tensors in $\mathbb{R}^n$, which means that the indices all run from 1 to $n$. For example, our vectors are literally in $\mathbb{R}^n$, while our matrices are $n \times n$, etc. The subject of tensors is almost trivial, since there is really nothing much we can say in any generality about them. There are two subtle points: upper versus lower indices and summation notation.
with indices down. We will call the elements of $\mathbb{R}^{n*}$ covectors. Finally, we write matrices as
$$A = \begin{pmatrix} A^1_1 & A^1_2 & \dots & A^1_q \\ A^2_1 & A^2_2 & \dots & A^2_q \\ \vdots & \vdots & \ddots & \vdots \\ A^p_1 & A^p_2 & \dots & A^p_q \end{pmatrix},$$
so $A^{\text{row}}_{\text{column}}$. In general, a tensor can have as many upper and lower indices as we need, and we will treat upper and lower indices as being different. For example, the components of a matrix look like $A^i_j$, never like $A_{ij}$ or $A^{ij}$, which would represent tensors of a different type.
Summation Notation
Following Einstein further, whenever we write an expression with some letter $j$ appearing once as an upper index and once as a lower index, like $A^i_j x^j$, this means $\sum_j A^i_j x^j$, i.e. a sum is implicitly understood over the repeated $j$ index. We will often refer to a vector $x$ as $x^i$. This isn't really fair, since it confuses a single component $x^i$ of a vector with the entire vector, but it is standard. Similarly, we write a matrix $A$ as $A^i_j$ and a tensor with 2 upper and 3 lower indices as $t^{ij}_{klm}$. The names of the indices have no significance and will usually change during the course of calculations.
32.2 Operations
What can we do with tensors? Very little. At first sight, they look complicated. But there are very few operations on tensors. We can
(1) Add tensors that have the same numbers of upper and of lower indices; for example, add $s^{ijk}_{lm}$ to $t^{ijk}_{lm}$ to get $s^{ijk}_{lm} + t^{ijk}_{lm}$. If the tensors are vectors or matrices, this is just adding in the usual way.
(2) Scale; for example, $3\, t^{ij}_{klm}$ means the obvious thing: triple each component of $t^{ij}_{klm}$.
(3) Swap indices of the same type; for example, take a tensor $t^i_{jk}$ and make the tensor $t^i_{kj}$. There is no nice notation for doing this.
(4) Take tensor product: just write down two tensors beside one another, with distinct indices; for example, the tensor product of $s^i_j$ and $t^{ij}$ is $s^i_j t^{kl}$. Note that we have to change the names on the indices of $t$ before we write it down, so that we don't use the same index names twice.
(5) Finally, contract: take any one upper index, and any one lower index, and set them equal and sum. For example, we can contract the $i$ and $k$ indices of a tensor $t^i_{jk}$ to produce $t^i_{ji}$. Note that $t^i_{ji}$ has only one free index $j$, since the summation convention tells us to sum over all possibilities for the $i$ index. So $t^i_{ji}$ is a covector.

In tensor calculus there are some additional operations on tensor quantities (various methods for differentiating and integrating), and these additional operations are essential to physical applications, but tensor calculus is not in the algebraic spirit of this book, so we will never consider any other operations than those listed above.

Example 32.2. If $x^i$ is a vector and $y_i$ a covector, then we can't add them, because the indices don't match. But we can take their tensor product $x^i y_j$, and then we can contract to get $x^i y_i$. This is of course just $y(x)$, thinking of every covector $y$ as a linear function on $\mathbb{R}^n$.

Example 32.3. If $x^i$ is a vector and $A^i_j$ is a matrix, then their tensor product is $A^i_j x^k$, and contracting gives $A^i_j x^j$, which is the vector $Ax$. So matrix multiplication is tensor product followed by contraction.
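For readers who like to experiment, NumPy's `einsum` implements exactly this summation convention: repeated indices are summed. The tensors below are random examples made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))      # a matrix A^i_j
B = rng.normal(size=(3, 3))      # a matrix B^i_j
x = rng.normal(size=3)           # a vector x^i
y = rng.normal(size=3)           # a covector y_i

print(np.allclose(np.einsum('ij,j->i', A, x), A @ x))     # A^i_j x^j = Ax
print(np.isclose(np.einsum('i,i->', x, y), x @ y))        # x^i y_i = y(x)
print(np.allclose(np.einsum('ik,kj->ij', A, B), A @ B))   # A^i_k B^k_j = AB

t = np.einsum('ij,kl->ijkl', A, B)                        # tensor product A^i_j B^k_l
print(np.allclose(np.einsum('ikkl->il', t), A @ B))       # contract j with k: recovers AB
```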
Example 32.4. If $A^i_j$ and $B^i_j$ are two matrices, then $A^i_k B^k_j$ is the matrix $AB$. Similarly, $A^k_j B^i_k$ is the matrix $BA$. These are the two possible contractions of the tensor product $A^i_j B^k_l$.
Example 32.6. It is standard in working with tensors to write the entries of the identity matrix not as $I^i_j$ but as $\delta^i_j$:
$$\delta^i_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
The trace of the identity matrix is $\delta^i_i = n$, since for each value of $i$ we add one. Tensor products of the identity matrix give many other tensors, like $\delta^i_j \delta^k_l$.
Problem 32.1. Take the tensor product of the identity matrix and a covector , and simplify all possible contractions.
Example 32.7. If a tensor has various lower indices, we can average over all permutations of them. This process is called symmetrizing over indices. For example, a tensor $t^i_{jk}$ can be symmetrized to a tensor $\frac{1}{2} t^i_{jk} + \frac{1}{2} t^i_{kj}$. Obviously we can also symmetrize over any two upper indices. But we can't symmetrize over an upper and a lower index. If we fix our attention on a pair of indices, we can also antisymmetrize over them, say taking a tensor $t_{jk}$ and producing $\frac{1}{2} t_{jk} - \frac{1}{2} t_{kj}$. A tensor is symmetric in some indices if it doesn't change when they are permuted, and is antisymmetric in those indices if it changes by the sign of the permutation when the indices are permuted. Again focusing on just two lower indices, we can split any tensor into a sum
$$t_{jk} = \underbrace{\frac{1}{2} t_{jk} + \frac{1}{2} t_{kj}}_{\text{symmetric}} + \underbrace{\frac{1}{2} t_{jk} - \frac{1}{2} t_{kj}}_{\text{antisymmetric}}$$
of a symmetric part and an antisymmetric part.

Problem 32.2. Suppose that a tensor $t_{ijk}$ is symmetric in $i$ and $j$, and antisymmetric in $j$ and $k$. Prove that $t_{ijk} = 0$. Of course, we write a tensor as 0 to mean that all of its components are 0.

Example 32.8. Let's look at some tensors with lots of indices. Working in $\mathbb{R}^3$, define a tensor by setting
$$\varepsilon_{ijk} = \begin{cases} 1 & \text{if } i, j, k \text{ is an even permutation of } 1, 2, 3, \\ -1 & \text{if } i, j, k \text{ is an odd permutation of } 1, 2, 3, \\ 0 & \text{if } i, j, k \text{ is not a permutation of } 1, 2, 3. \end{cases}$$
Of course, $i, j, k$ fails to be a permutation just when two or three of $i$, $j$ or $k$ are equal. For example, $\varepsilon_{123} = 1$, $\varepsilon_{221} = 0$, $\varepsilon_{321} = -1$, $\varepsilon_{222} = 0$.

Problem 32.3. Take three vectors $x$, $y$ and $z$ in $\mathbb{R}^3$, and calculate the contraction $\varepsilon_{ijk} x^i y^j z^k$.
Problem 32.4. Prove that every tensor $t_{ijk}$ which is antisymmetric in all lower indices is a constant multiple $t_{ijk} = c\, \varepsilon_{ijk}$.

Example 32.9. More generally, working in $\mathbb{R}^n$, we can define a tensor by
$$\varepsilon_{i_1 i_2 \dots i_n} = \begin{cases} 1 & \text{if } i_1, i_2, \dots, i_n \text{ is an even permutation of } 1, 2, \dots, n, \\ -1 & \text{if } i_1, i_2, \dots, i_n \text{ is an odd permutation of } 1, 2, \dots, n, \\ 0 & \text{if } i_1, i_2, \dots, i_n \text{ is not a permutation of } 1, 2, \dots, n. \end{cases}$$
for a matrix A.
In other words, $F$ is contracted against $F^{-1}$. So vectors transform as $(Fx)^i = F^i_j x^j$ (contract with $F$), and covectors as $(Fy)_i = y_j \left(F^{-1}\right)^j_i$ (contract with $F^{-1}$). We can contract any tensor with as many vectors and covectors as needed to form a number; in order to preserve these contractions, the tensor's upper indices must transform like vectors, and its lower indices like covectors, when we carry out $F$. For example,
$$(Ft)^{ij}_k = F^i_p F^j_q\, t^{pq}_r \left(F^{-1}\right)^r_k.$$
In other words, we contract one copy of $F$ with each upper index and contract one copy of $F^{-1}$ with each lower index. For example, let's see the invariance of contraction under $F$ in the simplest case of a matrix.
$$\begin{aligned}
(FA)^i_i &= F^i_j\, A^j_k \left(F^{-1}\right)^k_i \\
&= A^j_k \left(F^{-1}\right)^k_i F^i_j \\
&= A^j_k\, \delta^k_j \\
&= A^j_j,
\end{aligned}$$
since $A^j_k \delta^k_j$ has a sum over $j$ and $k$, but each term vanishes unless $j = k$, in which case we find $A^j_j$ being added.
Problem 32.5. Prove that both of the contractions of a tensor $t^i_{jk}$ are preserved by linear change of variables.
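A numerical sketch of the transformation rule, and of the invariance asked for in problem 32.5, can be done with `einsum`; the tensor and change of variables below are random examples.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
t = rng.normal(size=(n, n, n))    # a tensor t^i_{jk}, random for illustration
F = rng.normal(size=(n, n))       # an (almost surely invertible) change of variables
Finv = np.linalg.inv(F)

# (Ft)^i_{jk} = F^i_p t^p_{qr} (F^{-1})^q_j (F^{-1})^r_k
Ft = np.einsum('ip,pqr,qj,rk->ijk', F, t, Finv, Finv)

# The contraction t^i_{ik} is a covector; it should transform like one,
# i.e. (Ft)^i_{ik} = t^p_{pr} (F^{-1})^r_k.
lhs = np.einsum('iik->k', Ft)
rhs = np.einsum('ppr,rk->k', t, Finv)
print(np.allclose(lhs, rhs))      # True
```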
Problem 32.6. Find $F\varepsilon$. (Recall the tensor $\varepsilon$ from example 32.9 on the previous page.)

Remark 32.10. Note how abstract the subject is: it is rare that we would write down examples of tensors, with actual numbers in their entries. It is more common that we think about tensors as abstract algebraic gadgets for storing multivariate data from physics. Writing down examples, in this rarefied air, would only make the subject more confusing.
Tensor Invariants
The contraction of indices is invariant under $F$, as is tensor product. So given a tensor $t$, we can try to write down some invariant numbers out of it by taking any number of tensor products of that tensor with itself, and any number of contractions until there are no indices left. For example, a matrix $A^i_j$ has among its invariants the numbers
$$A^i_i, \quad A^i_j A^j_i, \quad A^i_j A^j_k A^k_i, \quad \dots$$
In the notation of earlier chapters, these numbers are $\operatorname{tr} A, \operatorname{tr} A^2, \operatorname{tr} A^3, \dots$, i.e. the functions we called $p_k(A)$ in example 26.16 on page 248. We already know from that chapter that every real-valued polynomial invariant of a matrix is a function of $p_1(A), p_2(A), \dots, p_n(A)$. More generally, all real-valued polynomial invariants of a tensor are polynomial functions of those obtained by taking some number of tensor products followed by some number of contractions. (The proof is very difficult; see Olver [8] and Procesi [10].)

Problem 32.7. Describe as many invariants of a tensor $t^{ij}_{kl}$ as you can.

It is difficult to decide how many of these invariants you need to write down before you can be sure that you have a complete set, in the sense that every invariant is a polynomial function of the ones you have written down. General theorems (again see Olver [8] and Procesi [10]) ensure that eventually you will produce a finite complete set of invariants. If a tensor has more lower indices than upper indices, then so does every tensor product of it with itself any number of times, and so does every contraction. Therefore there are no polynomial invariants of such tensors. Similarly, if there are more upper indices than lower indices then there are no polynomial invariants. For example, a vector has no polynomial invariants. For example, a quadratic form $Q_{ij} = Q_{ji}$ has no upper indices, so has no polynomial invariants. (This agrees with Sylvester's law of inertia (theorem 31.13 on page 288), which tells us that the only invariants of a quadratic form (over the real numbers) are the integers $p$ and $q$, which are not polynomials in the $Q_{ij}$ entries.)
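A quick sketch checking that these contractions are indeed the traces of powers of $A$ (the matrix below is a random example):

```python
import numpy as np

A = np.random.default_rng(3).normal(size=(4, 4))

p1 = np.einsum('ii->', A)               # A^i_i
p2 = np.einsum('ij,ji->', A, A)         # A^i_j A^j_i
p3 = np.einsum('ij,jk,ki->', A, A, A)   # A^i_j A^j_k A^k_i

print(np.isclose(p1, np.trace(A)))
print(np.isclose(p2, np.trace(A @ A)))
print(np.isclose(p3, np.trace(A @ A @ A)))
```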
tells us that in order to preserve lengths, the only linear changes of variable we can employ are those given by orthogonal matrices. If we had a tensor with both upper and lower indices, like $t^i_{jk}$, we can see that it transforms under a linear change of variables $y = Fx$ as
$$(Ft)^i_{jk} = F^i_p\, t^p_{qr} \left(F^{-1}\right)^q_j \left(F^{-1}\right)^r_k.$$
Let's define a new tensor by letting $s_{ijk} = t^i_{jk}$, just dropping the upper index. If
$F$ is orthogonal, i.e. $F^{-1} = F^t$, then $F^i_p = \left(F^{-1}\right)^p_i$. Therefore
$$\begin{aligned}
(Fs)_{ijk} &= \left(F^{-1}\right)^p_i s_{pqr} \left(F^{-1}\right)^q_j \left(F^{-1}\right)^r_k \\
&= F^i_p\, t^p_{qr} \left(F^{-1}\right)^q_j \left(F^{-1}\right)^r_k \\
&= (Ft)^i_{jk}.
\end{aligned}$$
We summarize this calculation:

Theorem 32.11. Dropping upper indices to become lower indices is an operation on tensors which is invariant under any orthogonal linear change of variable.

Note that this trick only works for orthogonal matrices $F$, i.e. orthogonal changes of variable.

Problem 32.8. Prove that doubling (i.e. the linear map $y = Fx = 2x$) acts on a vector $x$ by doubling it, and on a covector by scaling by $\frac{1}{2}$. (In general, this linear map $Fx = 2x$ acts on any tensor $t$ with $p$ upper and $q$ lower indices by scaling by $2^{p-q}$, i.e. $Ft = 2^{p-q} t$.) What happens if you first lower the index of a vector and then apply $F$? What happens if you apply $F$ and then lower the index?

Problem 32.9. Prove that contracting two lower indices with one another is an operation on tensors which is invariant under orthogonal linear change of variable, but not under rescaling of variables.

Engineers often prefer their own approach to tensors: only lower indices, and all of the usual operations. However, their approach makes rescalings, and other nonorthogonal transformations (like shears, for example), more difficult. There is a similar approach in relativistic physics to lowering indices: by contracting with a quadratic form.
33 Tensors
We give a mathematical definition of tensor, and show that it agrees with the more concrete definition of chapter 32. We will continue to use Einstein's summation convention throughout this chapter.
Definition 33.4. For any finite dimensional vector spaces $V_1, V_2, \dots, V_p$, let $V_1 \otimes V_2 \otimes \dots \otimes V_p$ (called the tensor product of the vector spaces $V_1, V_2, \dots, V_p$) be the set of all multilinear maps $t(\xi_1, \xi_2, \dots, \xi_p)$, where $\xi_1$ is a covector from $V_1^*$, $\xi_2$ is a covector from $V_2^*$, etc. Each such multilinear map $t$ is called a tensor.

Example 33.5. A tensor $t^{ij}$ in $\mathbb{R}^n$ following our old definition (from chapter 32) yields a tensor $t(\xi, \eta) = t^{ij} \xi_i \eta_j$ following this new definition. On the other hand, if $t(\xi, \eta)$ is a tensor in $\mathbb{R}^n \otimes \mathbb{R}^n$, then we can define a tensor following our old definition by letting $t^{ij} = t\left(e^i, e^j\right)$, where $e^1, e^2, \dots, e^n$ is the usual dual basis to the standard basis of $\mathbb{R}^n$.

Example 33.6. Let $V$ be a finite dimensional vector space. Recall that there is a natural isomorphism $V \to V^{**}$, given by sending any vector $v$ to the linear function $f_v$ on $V^*$ given by $f_v(\xi) = \xi(v)$. We will henceforth identify any vector $v$ with the function $f_v$; in other words we will from now on use the symbol $v$ itself instead of writing $f_v$, so that we think of a covector $\xi$ as a linear function $\xi(v)$ on vectors, and also think of a vector $v$ as a linear function on covectors $\xi$, by the bizarre definition $v(\xi) = \xi(v)$. In this way, a vector is the simplest type of tensor.

Definition 33.7. Let $V$ and $W$ be finite dimensional vector spaces, and take $v$ a vector in $V$ and $w$ a vector in $W$. Then write $v \otimes w$ for the multilinear map
$$v \otimes w(\xi, \eta) = \xi(v)\, \eta(w).$$
So $v \otimes w$ is a tensor in $V \otimes W$, called the tensor product of $v$ and $w$.

Definition 33.8. If $s$ is a tensor in $V_1 \otimes V_2 \otimes \dots \otimes V_p$, and $t$ is a tensor in $W_1 \otimes W_2 \otimes \dots \otimes W_q$, then let $s \otimes t$, called the tensor product of $s$ and $t$, be the tensor in $V_1 \otimes \dots \otimes V_p \otimes W_1 \otimes \dots \otimes W_q$ given by
$$s \otimes t\left(\xi_1, \dots, \xi_p, \eta_1, \dots, \eta_q\right) = s\left(\xi_1, \dots, \xi_p\right)\, t\left(\eta_1, \dots, \eta_q\right).$$

Definition 33.9. Similarly, we can define the tensor product of several tensors. For example, given finite dimensional vector spaces $U$, $V$ and $W$, and vectors $u$ from $U$, $v$ from $V$ and $w$ from $W$, let $u \otimes v \otimes w$ mean the multilinear map
$$u \otimes v \otimes w(\xi, \eta, \zeta) = \xi(u)\, \eta(v)\, \zeta(w),$$
etc.

Problem 33.1. Prove that
$$(av) \otimes w = a(v \otimes w) = v \otimes (aw),$$
$$(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w,$$
$$v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2.$$
Problem 33.2. Take $U$, $V$ and $W$ any finite dimensional vector spaces. Prove that $(u \otimes v) \otimes w = u \otimes (v \otimes w) = u \otimes v \otimes w$ for any three vectors $u$ from $U$, $v$ from $V$ and $w$ from $W$.

Theorem 33.10. If $V$ and $W$ are two finite dimensional vector spaces, with bases $v_1, v_2, \dots, v_p$ and $w_1, w_2, \dots, w_q$, then $V \otimes W$ has as a basis the vectors $v_i \otimes w_J$ for $i$ running over $1, 2, \dots, p$ and $J$ running over $1, 2, \dots, q$.

Proof. Take the dual bases $v^1, v^2, \dots, v^p$ and $w^1, w^2, \dots, w^q$. Every tensor $t$ from $V \otimes W$ has the form
$$t(\xi, \eta) = t\left(\xi_i v^i, \eta_J w^J\right) = \xi_i \eta_J\, t\left(v^i, w^J\right),$$
so let $t^{iJ} = t\left(v^i, w^J\right)$ to find
$$t(\xi, \eta) = t^{iJ} \xi_i \eta_J = t^{iJ}\, v_i \otimes w_J(\xi, \eta).$$
So the $v_i \otimes w_J$ span. Any linear relation between the $v_i \otimes w_J$, just reading these lines from bottom to top, would yield a vanishing multilinear map, so would have to satisfy $0 = t\left(v^k, w^L\right) = t^{kL}$, forcing all coefficients in the linear relation to vanish.

Remark 33.11. A similar theorem, with a similar proof, holds for any tensor products: take any finite dimensional vector spaces $V_1, V_2, \dots, V_p$, and pick any basis for $V_1$ and any basis for $V_2$, etc. Then taking one vector from each basis, and taking the tensor product of these vectors, we obtain a tensor in $V_1 \otimes V_2 \otimes \dots \otimes V_p$. These tensors, when we throw in all possible choices of basis vectors for all of those bases, yield a basis for $V_1 \otimes V_2 \otimes \dots \otimes V_p$, called the tensor product basis.

Problem 33.3. Let $V = \mathbb{R}^3$, $W = \mathbb{R}^2$ and let
$$x = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad y = \begin{pmatrix} 4 \\ 5 \end{pmatrix}.$$
What is $x \otimes y$ in terms of the standard basis vectors $e_i \otimes e_J$?
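A numerical view of problem 33.3: in the tensor product basis $e_i \otimes e_J$, the coefficient of $x \otimes y$ on $e_i \otimes e_J$ is $x^i y^J$, i.e. the outer product of the two coordinate vectors. (In the comments below, `(x)` stands for the tensor product symbol.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])

coefficients = np.outer(x, y)   # 3 x 2 array: entry (i, J) multiplies e_i (x) e_J
print(coefficients)

# A pure tensor v (x) w always gives a rank-one coefficient array like the one
# above, while e1 (x) e1 + e2 (x) e2 + e3 (x) e3 gives the 3 x 3 identity, of rank 3,
# which is one way to see (as discussed just below) that it cannot be pure.
print(np.linalg.matrix_rank(coefficients), np.linalg.matrix_rank(np.eye(3)))   # 1 and 3
```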
Definition 33.12. Tensors of the form $v \otimes w$ are called pure tensors.

Problem 33.4. Prove that every tensor in $V \otimes W$ can be written as a sum of pure tensors.
This tensor is not pure (which is certainly not obvious just looking at it). Let's see why. Any pure tensor $x \otimes y$ must be
$$\begin{aligned}
x \otimes y &= \left(x^1 e_1 + x^2 e_2 + x^3 e_3\right) \otimes \left(y^1 e_1 + y^2 e_2 + y^3 e_3\right) \\
&= x^1 y^1\, e_1 \otimes e_1 + x^2 y^1\, e_2 \otimes e_1 + x^3 y^1\, e_3 \otimes e_1 \\
&\quad + x^1 y^2\, e_1 \otimes e_2 + x^2 y^2\, e_2 \otimes e_2 + x^3 y^2\, e_3 \otimes e_2 \\
&\quad + x^1 y^3\, e_1 \otimes e_3 + x^2 y^3\, e_2 \otimes e_3 + x^3 y^3\, e_3 \otimes e_3.
\end{aligned}$$
If we were going to have $x \otimes y = e_1 \otimes e_1 + e_2 \otimes e_2 + e_3 \otimes e_3$, we would need $x^1 y^1 = 1$, $x^2 y^2 = 1$, $x^3 y^3 = 1$, but also $x^1 y^2 = 0$, so $x^1 = 0$ or $y^2 = 0$, contradicting $x^1 y^1 = x^2 y^2 = 1$.

Definition 33.14. The rank of a tensor is the minimum number of pure tensors that can appear when it is written as a sum of pure tensors.

Definition 33.15. If $U$, $V$ and $W$ are finite dimensional vector spaces and $b : U \times V \to W$ is a map for which $b(u, v)$ is linear in $u$ for any fixed $v$ and linear in $v$ for any fixed $u$, we say that $b$ is a bilinear map.

Theorem 33.16 (Universal Mapping Theorem). Every bilinear map $b : U \times V \to W$ induces a unique linear map $B : U \otimes V \to W$, by the rule $B(u \otimes v) = b(u, v)$. Sending $b$ to $B = Tb$ gives an isomorphism $T$ between the vector space $Z$ of all bilinear maps $f : U \times V \to W$ and the vector space $\operatorname{Hom}(U \otimes V, W)$.

Problem 33.5. Prove the universal mapping theorem:
Problem 33.10. Take two vector spaces $V$ and $W$ and define a vector space $V \boxtimes W$ to be the collection of all real-valued functions on $V \times W$ which are zero except at finitely many points. Careful: these functions don't have to be linear. Picking any vectors $v$ from $V$ and $w$ from $W$, let's write the function
$$f(x, y) = \begin{cases} 1, & \text{if } x = v \text{ and } y = w, \\ 0, & \text{otherwise,} \end{cases}$$
as $v \boxtimes w$. So clearly $V \boxtimes W$ is a vector space, whose elements are linear combinations of elements of the form $v \boxtimes w$. Let $Z$ be the subspace of $V \boxtimes W$ spanned by the vectors
$$(av) \boxtimes w - a(v \boxtimes w), \quad v \boxtimes (aw) - a(v \boxtimes w), \quad (v_1 + v_2) \boxtimes w - v_1 \boxtimes w - v_2 \boxtimes w, \quad v \boxtimes (w_1 + w_2) - v \boxtimes w_1 - v \boxtimes w_2,$$
for any vectors $v, v_1, v_2$ from $V$ and $w, w_1, w_2$ from $W$ and any number $a$.
a. Prove that if $V$ and $W$ both have positive and finite dimension, then $V \boxtimes W$ and $Z$ are infinite dimensional.
b. Write down a linear map $V \boxtimes W \to V \otimes W$.
c. Prove that your linear map has kernel containing $Z$.
It turns out that $(V \boxtimes W)/Z$ is isomorphic to $V \otimes W$. We could have defined $V \otimes W$ to be $(V \boxtimes W)/Z$, and this definition has many advantages for various generalizations of tensor products.

Remark 33.17. In the end, what we really care about is that tensors using our abstract definitions should turn out to have just the properties they had with the more concrete definition in terms of indices. So even if the abstract definition is hard to swallow, we will really only need to know that tensors have tensor products, contractions, sums and scaling, change according to the usual rules when we linearly change variables, and that when we tensor together bases, we obtain a basis for the tensor product. This is the spirit behind problem 33.10.
So it is clear where the lower indices come from when we pick a basis: they come from $V^*$.
Problem 33.11. We saw in chapter 32 that a matrix $A$ is written in indices as $A^i_j$. Describe an isomorphism between $\operatorname{Hom}(V, V)$ and $V \otimes V^*$.

Let's define contractions.

Theorem 33.18. Let $V$ and $W$ be finite dimensional vector spaces. There is a unique linear map
$$V \otimes V^* \otimes W \to W,$$
called the contraction map, that on pure tensors takes $v \otimes \xi \otimes w$ to $\xi(v)\, w$.

Remark 33.19. We can generalize this idea in the obvious way to any tensor product of any finite number of finite dimensional vector spaces: if one of the vector spaces is the dual of another one, then we can contract. For example, we can contract $V^* \otimes W \otimes V$ by a linear map which on pure tensors takes $\xi \otimes w \otimes v$ to $\xi(v)\, w$.

Proof. Pick a basis $v_1, v_2, \dots, v_p$ of $V$ and a basis $w_1, w_2, \dots, w_q$ of $W$. Define $T$ on the basis $v_i \otimes v^j \otimes w_K$ by
$$T\left(v_i \otimes v^j \otimes w_K\right) = \begin{cases} w_K & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$
By theorem 16.23 on page 167, there is a unique linear map $T : V \otimes V^* \otimes W \to W$ which has these values on these basis vectors. Writing any vector $v$ in $V$ as $v = a^i v_i$, any covector $\xi$ in $V^*$ as $\xi = b_i v^i$, and any vector $w$ in $W$ as $w = c^J w_J$, we find
$$T\left(v \otimes \xi \otimes w\right) = a^i b_i c^J w_J = \xi(v)\, w.$$
Therefore there is a linear map $T : V \otimes V^* \otimes W \to W$ that on pure tensors takes $v \otimes \xi \otimes w$ to $\xi(v)\, w$. Any other such map, say $S$, which agrees with $T$ on pure tensors, must agree on all linear combinations of pure tensors, so on all tensors.
Problem 33.12. Prove theorem 33.20, by imitating the proof of theorem 33.18.

Remark 33.21. In the same fashion, we can make a unique linear isomorphism reordering the factors in any tensor product of vector spaces. To be more specific, take any permutation $q$ of the numbers $1, 2, \dots, p$. Then (with basically the same proof) there is a unique linear isomorphism
$$V_1 \otimes V_2 \otimes \dots \otimes V_p \to V_{q(1)} \otimes V_{q(2)} \otimes \dots \otimes V_{q(p)}$$
which takes each pure tensor $v_1 \otimes v_2 \otimes \dots \otimes v_p$ to the pure tensor $v_{q(1)} \otimes v_{q(2)} \otimes \dots \otimes v_{q(p)}$.
33.5 Summary
We have now achieved our goal: we have defined tensors on an abstract finite dimensional vector space, and defined the operations of addition, scaling, tensor product, contraction and index swapping for tensors on an abstract vector space. All there is to know about tensors is that (1) they are sums of pure tensors $v \otimes w$, (2) the pure tensor $v \otimes w$ depends linearly on $v$ and linearly on $w$, and (3) the universal mapping property. Another way to think about the universal mapping property is that there are no identities satisfied by tensors other than those which are forced by (1) and (2); if there were, then we couldn't turn a bilinear map which didn't satisfy that identity into a linear map on tensors, i.e. we would contradict the universal mapping property. Roughly speaking, there is nothing else that you could know about tensors besides (1) and (2) and the fact that there is nothing else to know.
The inverse to this isomorphism is usually also written the same way, and we write the vector dual to a covector $\xi$ as $T\xi$. Naturally, we can define an inner product on $V^*$ by $\langle \xi, \eta\rangle = \langle T\xi, T\eta\rangle$.

Problem 33.16. Prove that this defines an inner product on $V^*$.

If $V$ and $W$ are finite dimensional inner product spaces, we then define an inner product on $V \otimes W$ by setting
$$\langle v_1 \otimes w_1, v_2 \otimes w_2\rangle = \langle v_1, v_2\rangle\, \langle w_1, w_2\rangle.$$
This expression only determines the inner product on pure tensors, but since the inner product is required to be bilinear and every tensor is a sum of pure tensors, we only need to know the inner product on pure tensors.

Problem 33.17. Prove that this defines an inner product on $V \otimes W$.

Let's write $V^{\otimes 2}$ to mean $V \otimes V$, etc. We refer to tensors in a vector space $V$ to mean elements of $V^{\otimes p} \otimes V^{*\otimes q}$ for some positive integers $p$ and $q$, i.e. tensor products of vectors and covectors. The elements of $V^{\otimes p}$ are called covariant tensors: they are sums of tensor products of vectors. The elements of $V^{*\otimes p}$ are called contravariant tensors: they are sums of tensor products of covectors.

Problem 33.18. Prove that an inner product yields a unique linear isomorphism
$$V^{\otimes p} \otimes V^{*\otimes q} \to V^{\otimes (p+q)},$$
so that
$$v_1 \otimes \dots \otimes v_p \otimes \xi_1 \otimes \dots \otimes \xi_q \mapsto v_1 \otimes \dots \otimes v_p \otimes T\xi_1 \otimes \dots \otimes T\xi_q.$$
This isomorphism raises indices. Similarly, we can define a map to lower indices.
33.7 Polarization
We will generalize the isomorphism between symmetric bilinear forms and quadratic forms to an isomorphism between symmetric tensors and polynomials.

Definition 33.22. Let $V$ be a finite dimensional vector space. If $t$ is a tensor in $V^{*\otimes p}$, i.e. a multilinear function $t(v_1, v_2, \dots, v_p)$ depending on $p$ vectors $v_1, v_2, \dots, v_p$ from $V$, then we can define the polarization of $t$ to be the function (also written traditionally with the same letter $t$)
$$t(v) = t(v, v, \dots, v).$$

Example 33.23. If $t$ is a covector, so a linear function $t(v)$ of a single vector $v$, then the polarization is the same linear function.

Example 33.24. In $\mathbb{R}^2$, if $t = e^1 \otimes e^2$, then the polarization is $t(x) = x_1 x_2$ for
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
in $\mathbb{R}^2$.
Example 33.25. The antisymmetric tensor $t = e^1 \otimes e^2 - e^2 \otimes e^1$ in $\mathbb{R}^2$ has polarization $t(x) = x_1 x_2 - x_2 x_1 = 0$, vanishing.

Definition 33.26. A function $f : V \to \mathbb{R}$ on a finite dimensional vector space is called a polynomial if there is a linear isomorphism $F : \mathbb{R}^n \to V$ for which $f(F(x))$ is a polynomial in the usual sense. Clearly the choice of linear isomorphism $F$ is irrelevant. A different choice would only alter the linear functions by linear combinations of one another, and therefore would alter the polynomial functions by substituting linear combinations of new variables in place of old variables. In particular, the degree of a polynomial function is well defined. A polynomial function $f : V \to \mathbb{R}$ is called homogeneous of degree $d$ if $f(\lambda x) = \lambda^d f(x)$ for any vector $x$ in $V$ and number $\lambda$. Clearly every polynomial function splits into a unique sum of homogeneous polynomial functions.

There are two natural notions of multiplying symmetric tensors, which simply differ by a factor. The first is
$$s \cdot t\,(x_1, x_2, \dots, x_{a+b}) = \sum_p s\left(x_{p(1)}, \dots, x_{p(a)}\right)\, t\left(x_{p(a+1)}, \dots, x_{p(a+b)}\right),$$
when $s$ has $a$ lower indices and $t$ has $b$, and the sum is over all permutations $p$ of the numbers $1, 2, \dots, a+b$. The second is
$$s \odot t\,(x_1, x_2, \dots, x_{a+b}) = \frac{1}{(a + b)!}\, s \cdot t\,(x_1, x_2, \dots, x_{a+b}).$$
Theorem 33.27. Polarization is a linear isomorphism taking symmetric contravariant tensors to polynomials, preserving degree, and taking products to products (using the second multiplication above).

Proof. ???
34 Exterior Forms
This chapter develops the definition and basic properties of exterior forms.
varies smoothly, so after rescaling the picture the integral hardly varies at all.) Just for simplicity, let's assume that $\int_S \omega$ is unchanged when we translate the surface $S$. Any two opposite sides of a box are translations of one another, but with opposite orientations. So they must have opposite signs for the flux. Therefore any small box has as much of our quantity entering as leaving. Approximating any region with small boxes, we must get total flux $\int_S \omega = 0$ when $S$ is the boundary of the region.

Pick two linearly independent vectors $u$ and $v$, and let $P$ be the parallelogram at the origin with sides $u$ and $v$. Pick any vector $w$ perpendicular to $u$ and $v$ and with $\det\begin{pmatrix} u & v & w \end{pmatrix} > 0$. Orient $P$ so that the outside of $P$ is the side in the direction of $w$. If we swap $u$ and $v$, then we change the orientation of the parallelogram $P$. Let's write $\omega(u, v)$ for $\int_P \omega$. Slicing the parallelogram into 3 equal pieces, say into 3 parallelograms with sides $u/3, v$, we see that $\omega(u/3, v) = \omega(u, v)/3$. In the same way, we can see that $\omega(\lambda u, v) = \lambda\, \omega(u, v)$ for any positive rational number $\lambda$ (dilate by the numerator, and cut into a number of pieces given by the denominator). Because $\omega(u, v)$ is a smooth function of $u$ and $v$, we see that $\omega(\lambda u, v) = \lambda\, \omega(u, v)$ for $\lambda > 0$. Similarly, $\omega(0, v) = 0$ since the parallelogram is flattened into a line. Moreover, $\omega(-u, v) = -\omega(u, v)$, since the parallelogram of $-u, v$ is the parallelogram of $u, v$ reflected, reversing its orientation. So $\omega(u, v)$ scales in $u$ and $v$. By reversing orientation, $\omega(v, u) = -\omega(u, v)$. A shear applied to the parallelogram will preserve the area, and after the shear we can cut and paste the parallelogram as in chapter 19. The integral must be preserved, by translation invariance, so $\omega(u + v, v) = \omega(u, v)$.

The hard part is to see why $\omega$ is linear as a function of $u$. This comes from the drawing

[Figure: a solid region with boundary made of parallelograms and triangles built from the vectors u, v, w, with faces labelled u, u + v, v, v + w and w.]

The integral over the boundary of this region must vanish. If we pick three vectors $u$, $v$ and $w$, which we draw as the standard basis vectors, then the region has boundary given by various parallelograms and triangles (each triangle being half a parallelogram), and the vanishing of the integral gives
$$0 = \omega(u, v) + \omega(v, w) + \tfrac{1}{2}\,\omega(w, u) - \tfrac{1}{2}\,\omega(w, u) - \omega(u + w, v).$$
Therefore $\omega(u, v) + \omega(v, w) = \omega(u + w, v)$, so that finally $\omega$ is a tensor. If you like indices, you can write $\omega$ as $\omega_{ij}$ with $\omega_{ji} = -\omega_{ij}$. Our argument is only slightly altered if we keep in mind that the integral should not really be exactly translation invariant, but only vary slightly
with small translations, and that the integral around the boundary of a small region should be small. We can still carry out the same argument, but throwing in error terms proportional to the area of the surface and the extent of translation, or to the volume of a region. We end up with $\omega$ being an exterior form whose coefficients are functions. If we imagine a flow contained inside a surface, we can similarly measure flux across a curve. We also need to be sensitive to orientation: which side of the boundary of a surface is the inside of the surface. Again the correct object to work with in order to have the correct sign sensitivity is an exterior form (whose coefficients are functions, not just numbers). Similar remarks hold in any number of dimensions. So exterior forms play a vital role because they are the objects we integrate. We can easily change variables when we integrate exterior forms.
34.2 Definition
Definition 34.1. A tensor $t$ in $V^{*\otimes p}$ is called a $p$-form if it is antisymmetric, i.e. $t(v_1, v_2, \dots, v_p)$ is antisymmetric as a function of the vectors $v_1, v_2, \dots, v_p$: for any permutation $q$,
$$t\left(v_{q(1)}, v_{q(2)}, \dots, v_{q(p)}\right) = (-1)^N t(v_1, v_2, \dots, v_p),$$
where $(-1)^N$ is the sign of the permutation $q$.

Example 34.2. The form $\omega$ in $\mathbb{R}^n$ given by
$$\omega(v_1, v_2, \dots, v_n) = \det\begin{pmatrix} v_1 & v_2 & \dots & v_n \end{pmatrix}$$
is called the volume form of $\mathbb{R}^n$ (because of its interpretation as an integrand: $\int_R \omega$ is the volume of any region $R$).

Example 34.3. A covector $\xi$ in $V^*$ is a 1-form, because there are no permutations you can carry out on $\xi(v)$.

Example 34.4. In $\mathbb{R}^n$ we traditionally write points as
$$x = \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix},$$
and write $dx^1$ for the covector given by the rule $dx^1(y) = y^1$ for any vector $y$ in $\mathbb{R}^n$. Then $\omega = dx^1 \otimes dx^2 - dx^2 \otimes dx^1$ is a 2-form:
$$\omega(u, v) = u^1 v^2 - u^2 v^1.$$
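A short numerical sketch of examples 34.2 and 34.4: the volume form is the determinant of the matrix of its vector arguments, and in $\mathbb{R}^2$ the 2-form of example 34.4 agrees with it. (The vectors below are made-up examples.)

```python
import numpy as np

def volume_form(*vectors):
    # det of the matrix whose columns are the given vectors
    return np.linalg.det(np.column_stack(vectors))

def omega(u, v):
    # the 2-form dx^1 (tensor) dx^2 - dx^2 (tensor) dx^1
    return u[0] * v[1] - u[1] * v[0]

u = np.array([1.0, 2.0])
v = np.array([3.0, 5.0])
print(omega(u, v), volume_form(u, v))   # both -1: in R^2 they agree
print(omega(v, u))                      # antisymmetry: the sign flips
```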
Problem 34.1. If $t$ is a 3-form, prove that (a) $t(x, y, y) = 0$ and (b) $t(x, y + 3x, z) = t(x, y, z)$ for any vectors $x, y, z$.
A Hints
. 1 0 0 0 0 0 0 1 317 3 1 1 2 0 1 1 1
. 1 0 0 0 0 1 0 0 3 2 1 1 0 1 1 1
. 1 0 0 0 0 1 0 0 3 2 1 0 0 1 1 0
1.2. 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1
319 . 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1
. 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1
0 4 4
2 1 3
1 1 3
1 2 4
320 Swap rows 1 and 2. 4 0 4 Add (row 1) to row 3. 4 0 0 Move the pivot . 4 0 0 Add 2(row 2) to row 3. 4 0 0 Move the pivot . 4 0 0 Move the pivot . 4 0 0 Move the pivot . 4 0 0 1 2 0 1 1 0 2 1 0 1 2 0 1 1 0 2 1 0 1 2 0 1 1 0 2 1 0 1 2 0 1 1 0 2 1 0 1 2 4 1 1 2 2 1 2 1 2 4 1 1 2 2 1 2 1 2 3 1 1 3 2 1 4
Back substitute:
1 Scale row 2 by 2 .
1 1 0
1
1 2
2 0
1 2
0 1 0
3 2 1 2
5 2 1 2
1 0 0
0 1 0
3 8 1 2
5 8 1 2
. 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 1 0 0 0
. 1 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0 0
. 1 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0
1.10. 1 2 3 3 5 8 2 4 6 6 1 7
Add 2(row 1) to row 2, 3(row 1) to row 3. 1 3 2 6 0 1 0 11 0 1 0 11 Move the pivot . 1 0 0 Add (row 2) to row 3. 3 1 1 2 0 0 6 11 11
1 0 0
3 1 0
2 0 0
6 11 0
. 1 0 0 3 1 0 2 0 0 6 11 0
1.14.
1 0 0 1.15. I 1.16. Forward eliminate: 1 1 0 0 Add (row 1) to row 2. 1 0 0 0 Move the pivot . 1 0 0 0 2 0 0 0
0 1 0
1 3 1 3 0
2 2 0 0
1 2 1 0
1 1 2 1
1 0 0 2
2 0 0 0
1 1 1 0
1 0 2 1
1 1 0 2
1 1 1 0
1 0 2 1
1 1 0 2
2 0 0 0
1 1 0 0
1 0 2 1
1 1 1 2
2 0 0 0
1 1 0 0
1 0 2 1
1 1 1 2
2 0 0 0
1 1 0 0
1 0 2 0
1 1 1
3 2
2 0 0 0
1 1 0 0
1 0 2 0
1 1 1 3 2
Scale row 3 by 1 2. 1 0 0 0 2 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1
Scale row 1 by 1. 1 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
2 5 1
3 7 1
4 11 4
5 12 3
1 0 0
2 1 1
3 1 1
4 3 4
5 2 3
. 1 0 0 2 1 1 3 1 1 4 3 4 5 2 3
1 0 0
2 1 0
3 1 0
4 3 1
5 2 1
. 1 0 0 2 1 0 3 1 0 4 3 1 5 2 1
Back substitute: Add 4(row 3) to row 1, 3(row 3) to row 1 2 3 0 1 1 0 0 0 Add 2(row 2) to row 1. 1 0 0 2. 0 0 1 1 1 1
0 1 0
1 1 0
0 0 1
3 1 1
329 x 1 = x 3 + 3 x 2 = x 3 1 x4 = 1
1 1 2 1
1 1 1 2
0 0 0 0
2 0 0 0
1 3 2
3 2 3 2
1
3 2 3 2 3 2
1
3 2 3 2 3 2
0 0 0 0
. 2 1 3 2
3 2 3 2
1
3 2 3 2 3 2
1
3 2 3 2 3 2
0 0 0
0 0 0
1
3 2
0 3
3 0
0 0 0 0
0 0 0
1 3 2 0 0
1
3 2
1
3 2
0 3
3 0
0 0 0
1
3 2
3 0
0 3
0 0 0
0 0 0
1 3 2 0 0
1
3 2
1
3 2
3 0
0 3
0 0 0
Back substitute:
1 Scale row 4 by 3 .
0 0 0
1 3 2 0 0
1
3 2
1
3 2
3 0
0 1
0 0 0
0 0 0
1 Scale row 3 by 3 .
1 3 2 0 0
1
3 2
0 0 0 1
3 0
0 0 0
0 0 0
1 3 2 0 0
1
3 2
0 0 0 1
1 0
0 0 0
0 0 0 Scale row 2 by 2 3. 2 0 0 0
0 0 0
1 1 0 0
0 0 1 0
0 0 0 1
0 0 0 0
Scale row 1 by 1 2. 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
x1 = 0 x2 = 0 x3 = 0 x4 = 0
1.20. You could try: a. One solution: $x_1 = 0$. b. No solutions: $x_1 = 1$, $x_2 = 1$, $x_1 + x_2 = 0$. c. Infinitely many solutions: $x_1 + x_2 = 0$.
1.22.
2 Matrices
2.2. (a) All coordinates of each vertex are $\pm 1$. (b) The vertices of a regular octahedron lie in the centers of the faces of a cube. (c) Try an equilateral triangle in the plane first. This should lead you to the points $(\pm 1, \pm 1, \pm 1)$ with an even number of minus signs.
2.3.
2.4. A= 1 1 0 , B = 0 0 . 1
2.9. Any matrix full of zeros with at least two columns.
2.13. $1 \cdot 8 + (-1) \cdot 1 + 2 \cdot 3 = 13$
2.14. $\begin{pmatrix} 14 & 20 \\ 20 & 29 \end{pmatrix}$
2.16. 0 AB = 0 0 4 4 0 AD = 0 0 AC = BC = 0 0 10 0 2 2 2 2 2 0 0 0 1 1 0 . 0 1 1 4 4
1 1
1 . 1
2.20. In an upper triangular matrix, $A_{ij} = 0$ if $i > j$. So nonzero terms have $i \le j$. The product: $(AB)_{ij} = \sum_k A_{ik} B_{kj}$. The factor $A_{ik}$ vanishes unless $i \le k$, and the second factor vanishes unless $k \le j$, so the whole sum consists of terms with $i \le k \le j$. The sum vanishes unless $i \le j$, hence the product is upper triangular. Moreover, the terms with $i = j$ must have $i \le k \le j$, so just the one $A_{ii} B_{ii}$ term survives.
2.22.
$$(c(AB))_{ij} = c(AB)_{ij} = c\sum_k A_{ik} B_{kj} = \sum_k (cA)_{ik} B_{kj} = ((cA)B)_{ij}.$$
Since $k$ and $\ell$ are just used to add up, we can change their names to anything we like. In particular, the resulting sums won't change if we rename $k$ to $\ell$ and $\ell$ to $k$. Moreover, we can carry out the sums in any order. (You still have to show that each side is defined just when the other is.)
2.24.
$$(A(B + C))_{ij} = \sum_k A_{ik}(B + C)_{kj} = \sum_k A_{ik} B_{kj} + \sum_k A_{ik} C_{kj}.$$
$$\sum_k I_{ik} A_{kj} = A_{ij},$$
because $I_{ik} = 1$ just when $k = i$. A hint for a different proof (without using $\Sigma$): if $A$ is $1 \times 1$, then the result is clear. Now suppose that we have proven the result already for all matrices of some size smaller than $p \times q$, but that we face a matrix $A$ which is $p \times q$. Then split $A$ into blocks, in any way you like, say as
$$A = \begin{pmatrix} P & Q \\ R & S \end{pmatrix},$$
and write out
$$IA = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}\begin{pmatrix} P & Q \\ R & S \end{pmatrix},$$
and calculate out the result, using the fact that since $P, Q, R$ and $S$ are smaller matrices, we can pretend that we have already checked the result for them.
1 e1 = 0 . 0
3.6. Row j . All rows except row j . 3.7. One proof: A11 A12 A21 A22 Ae1 = . . . . . . An 1 An 2
so running your ngers along the rows of A and column of e1 : A11 1 + A12 0 + + A1n 0 A21 1 + A22 0 + + A2n 0 = . . . An1 1 + An2 0 + + Ann 0 A11 A21 = . . . An 1 Another proof: e1 has entries (e1 )i = 1 if i = 1 and (e1 )i = 0 if i = 1. So Ae1 has entries (Ae1 )i = k Aik (e1 )k . Each term is zero except if k = 1, in which case it is Ai1 . But the entries of the rst column of A are A11 , A21 , . . . , An1 . 3.9. Write x = xj ej , and multiply both sides by A. 3.10. Use the fact that the columns of AB are A times columns of B . 3.11. You could try A = I3 3.15. 0 , B= I3 0 .
2 1
0 0
0 1
1 0
2 1
0 1
1 1
0 1
3.16. Mostly yes. You only have to try to figure out where $e_1$ goes to, and where $e_2$ goes to. The vector $e_1$ is close to her left eye. The vector $e_2$ is close to the top of her head, which is not really marked by anything, so harder to follow. But you can't figure out what matrix gives the straight line segment. Why?
3.19. $C = CI = C(AB) = (CA)B = IB = B$.
3.22. $B^{-1}A^{-1}(AB) = B^{-1}\left(A^{-1}A\right)B = B^{-1}IB = B^{-1}B = I$, and similarly multiplying out $(AB)\,B^{-1}A^{-1}$.
3.23. By definition, $AA^{-1} = A^{-1}A = I$. But these equations say exactly that $A$ is the inverse of $A^{-1}$.
3.24. Multiply both sides of the equation $Ax = 0$ by $A^{-1}$.
3.25. Multiply both sides of $AB = I$ by $A^{-1}$ to find $B = A^{-1}$. Therefore $BA = I$, and so $A = B^{-1}$.
3.27. If $x = y$ then clearly $Ax = Ay$. If $Ax = Ay$, then multiply both sides by $A^{-1}$.
3.28. $M^{-1} = \begin{pmatrix} A^{-1} & -A^{-1}BD^{-1} \\ 0 & D^{-1} \end{pmatrix}$
3.29. See gure A.1 on the next page. 3.31. 2, 3, 4, 1; 3, 4, 1, 2; and 4, 2, 3, 1. 3.33. pq is 2, 4, 3, 1. 3.35. Put the number 1 into the spot where the permutation wants it to go, by swapping 1 with whatever number sits in that spot. Then put the number 2 into its spot, etc., swapping two numbers at each step. 3.36. (a) 3,1,2 is (b) 4,3,2,1 is (c) 4,1,2,3 is 3.37. These transpositions allow us to shift any number we want over to the left or to the right, one step. Keep doing this until it lands in its place. So we can put whatever number we want to in the last place, and then proceed by induction.
Figure A.1: Images coming from some matrices, and from their inverses
3.39. P = e4 0 0 = 0 1 0 e2 0 1 0 0 0 0 0 0 0 1 e5 0 0 1 0 0 e3 e1 1 0 0 . 0 0
3.40. 2,3,1,4 3.42. Each column has to be a column of the identity matrix, since it has all 0s except for a single 1. But all of the columns have to be dierent, in order that no two columns have 1s in the same row. So they are dierent columns of the identity matrix. If some column of the identity matrix doesnt show up anywhere in our matrix, say the third column for example, then the third row has only 0s. So every column of the identity matrix shows up, precisely once, scrambled in some permutation.
3.45. By denition, column j of P is ep(j ) . So we permuted by p. For the rows, P A is A with row 1 moved to row p(1), etc. Take A = I : then P is 1 with row 1 moved to row p(1), etc. So row p(1) of P is e1 , etc. So row 1 of P is ep1 (1) , etc. 3.47. The rows of P are those of 1 swapped by p1 .
Rik Skj .
But this sum is all 0s unless we nd i k and k j , so we need i j . If i = j , then only the term k = i = j makes a contribution, which is Rii Sii = 1. Proof (3): Obvious for 1 1 matrices. Suppose that we have proven the result for all matrices of size smaller than n n. If R and S are n n, write R= 1 a 0 B , S= 1 p 0 Q
where a and p are columns, and B and Q are strictly lower triangular. Then RS = 1 a + Bp 0 BQ
which is strictly lower triangular because B and Q are strictly lower triangular of smaller size. 4.5. Suppose that we want to make a matrix S which is p q . Start with the identity matrix. Multiply it by the elementary matrix with S1q in row 1,
column q . We get one element into place. Keep going, rst getting things set up properly along the bottom row. 4.7.
4.9. If
$$D = \begin{pmatrix} t_1 & & & \\ & t_2 & & \\ & & \ddots & \\ & & & t_n \end{pmatrix},$$
then
$$D^{-1} = \begin{pmatrix} \frac{1}{t_1} & & & \\ & \frac{1}{t_2} & & \\ & & \ddots & \\ & & & \frac{1}{t_n} \end{pmatrix}.$$
If one of these $t_i$ is 0, then there can't be an inverse, because $De_i = 0$, so if $D$ has an inverse, then $D^{-1} D e_i = e_i$, but also $D^{-1} D e_i = D^{-1} 0 = 0$.
4.11.
$$\begin{pmatrix} ad & 0 & 0 \\ 0 & be & 0 \\ 0 & 0 & cf \end{pmatrix}$$
4.12. The original picture is
4.13. 1 3 2 4 x1 x2 = 7 8
1 0 0
0 0 . 1
4.15. S 101 adds 202(row 1) to row 3. 1 0 101 S = 0 1 202 0 4.23. Any number between 0 and 3. 4.24. a. 1 2 3 0 0 0 0 0 0
0 0 . 1
4 6 0
5 7 8
b. Impossible: can only use 1 pivot in each row, so at most 3 pivots binding up variables, so must have at least 2 left over free variables (or at least one free variable, if the last column is a column of constants). c. 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 4.27. There is at most one pivot in each row. There are more columns than rows. So there must be a pivotless column: a free variable for the equation Ax = 0.
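The argument in 4.27 can be illustrated computationally. The sketch below (our own example matrix, using sympy) exhibits a nonzero vector in the kernel of a matrix with more columns than rows.

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 4, 5],
            [0, 1, 2, 3, 4],
            [0, 0, 1, 2, 3]])        # 3 rows, 5 columns: at most 3 pivots

pivots = A.rref()[1]                 # indices of the pivot columns
null_basis = A.nullspace()           # one basis vector per pivotless column
assert len(null_basis) == A.cols - len(pivots)

x = null_basis[0]                    # a nonzero solution of A x = 0
assert A * x == Matrix([0, 0, 0])
```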
1 0 1
1 0 0
0 1 0
0 0 1
1 0 1
0 1 1
0 1 0
1 0 1
0 0 1
1 0 0
1 1 0
0 1 1
0 0 1
1 1 0
0 1 0
1 0 0
1 1 0
0 1 1
0 0 1
1 1 0
0 1 0
A 1
1 = 1 1
2 1 0
1 1 . 0
3 0 0
1 0 0
0 1 0
0 0 1
Add (row 1) to row 2, (row 1) to row 3. 1 2 3 1 0 0 3 1 0 2 3 1 Move the pivot . 1 0 0 Swap rows 2 and 3. Move the pivot . 1 0 0 2 2 0 3 3 3 1 1 1 2 0 2 3 3 3 1 1 1
0 1 0
0 0 1
0 1 0
0 0 1
1 0 0
2 2 0
3 3 3
1 1 1
0 0 1
0 1 0
0 0 1
0 1 0
0 0 1 3
0 1 0
1 1 1 3
0 1 0
Scale row 2 by 1 2. 1 0 0 2 1 0 0 0 1 0 0
1 3
1
1 2 1 3
0 1 2 0
0
1 2 1 3
1 1 2 0
0 = 0
1 3
0
1 2 1 3
1 1 . 2 0
A1 = 3 2 1 2
1 1 0
1 3 . 2
1 2
5.6. Yes, invertible.
5.7. No, not invertible.
5.8. If $A$ is invertible, $A^{-1}Ax = x = 0$. On the other hand, if $Ax = 0$ holds only for $x = 0$, then the same is true of $Ux = 0$, after Gauss–Jordan elimination. Any column of $U$ with no pivot gives a free variable, so there must be a pivot in each column, going straight down the diagonal, so $U$ is invertible.
5.9. Let's use theorem 5.9. Is there a solution $x$ to the equation $Ax = b$ for every choice of $b$? Yes: try $x = Bb$. So $A$ is invertible. Multiply $AB = I$ on both sides by $A^{-1}$ to get $B = A^{-1}$.
5.10. We have already seen that if $A$ and $B$ are both invertible, then $(AB)^{-1} = B^{-1}A^{-1}$. Suppose that $AB$ is invertible. Then $(AB)(AB)^{-1} = I$, so $A\left(B(AB)^{-1}\right) = I$, and therefore $B(AB)^{-1} = A^{-1}$. Multiply by $A$ on the right: $B(AB)^{-1}A = I$. So $B$ is invertible.
5.11. Yes.
5.12. Forward elimination:
1 2 3 1 2 1 0 1 15
The matrix is invertible, so the equations have a unique solution. 5.15. You could try 0 1 0 A = 0 2 1 1 0 0 which has forward elimination 1 0 0 0 2 0 0 1 1 2 ,
5.16. $Ux = Vb$ just when $VAx = Vb$, just when $Ax = b$. So to solve $Ax = b$ we need to solve $Ux = Vb$. The last two rows of $Ux = Vb$ give the conditions above. If those are satisfied, we can then drop those rows, and start solving the first two rows with the pivots.
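All of the inverse computations in this chapter's hints run Gauss–Jordan elimination on the block matrix $(A \;\; I)$ until the left block becomes the identity. Here is a minimal numpy sketch of that procedure (our own implementation and example matrix; it swaps in the largest available pivot, which is slightly stronger than only swapping when a pivot is zero).

```python
import numpy as np

def gauss_jordan_inverse(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                      # the block matrix (A | I)
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # pick the largest pivot
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is not invertible")
        M[[col, pivot]] = M[[pivot, col]]              # swap rows
        M[col] /= M[col, col]                          # scale the pivot row to 1
        for row in range(n):                           # kill the rest of the column
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, n:]                                    # right block is now A^{-1}

A = np.array([[1., 2., 3.],
              [1., 0., 1.],
              [2., 1., 1.]])
assert np.allclose(gauss_jordan_inverse(A) @ A, np.eye(3))
```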
6 The Determinant
6.1. If a = 0, use it as a pivot, and forward elimination yields a b bc . 0 d a Therefore if a = 0, then A is invertible just when d bc a = 0. Multiplying by a, we see that A is invertible just when ad bc = 0. What if a = 0? We try to swap. Forward elimination yields c 0 d b .
Invertibility (when a = 0) is just precisely both b and c not vanishing. But (when a = 0) ad bc = bc vanishes just when b or c does, so just when invertibility fails. 6.2. det a c b d = a det = ad bc. 6.3. +3 (3) 1 (1) = 10 6.4. +1 det 1 0 1 1 0 det 1 0 0 1 + 1 det 1 1 0 1 = 2 a c b d c det a c b d
Suppose that U is n n and assume that we have already checked that all smaller invertible upper triangular matrices. Split into blocks, in any manner at all A B U= 0 C with A and C square and upper triangular. You will have to nd a way to see that A and C are invertible. Once you do that, check that U 1 = A1 0 A1 BC 1 C 1 .
We see by induction that (1) $U^{-1}$ is upper triangular, (2) the diagonal entries of $U^{-1}$ are the reciprocals of the diagonal entries of $U$, and (3) we can compute the entries of $U^{-1}$ inductively in terms of the entries of $U$.
6.13. Swapping changes the sign of the determinant. But swapping doesn't change the matrix, so it doesn't change the sign of the determinant. Therefore the determinant is 0.
6.14. You can use
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
Add (row 1) to row 2, (row 1) to row 3. 5 5 2 0 2 0 0 1 6 Make a new pivot . 2 0 0 Swap rows 2 and 3 2 0 0 Make a new pivot . 2 0 0 5 1 0 5 6 2 5 1 0 5 6 2 5 0 1 5 2 6
So det A = (2)(1)(2): the minus sign because of one row swap. 7.2. 0 because the second row is a multiple of the rst. 7.3. From the fast formula, det A = 0 just when there is a pivot in each column. 7.4. GaussJordan elimination with one row swap yields 1 0 1 1 0 1 1 1 0 0 1 1 0 so 1 1 det 0 1 0 0 1 0 1 0 1 1 1 0 = 1 1 0 0 0 1
7.6. GaussJordan elimination with no row swaps yields 1 1 2 3 0 1 2 2 0 0 0 so 2 det 1 2 7.7. +0 det 0 2 1 1 0 det 2 2 0 1 + 2 det 2 0 0 1 = 4 1 1 1 1 0 =0 1
=0
7.10. Rescaling that row rescales the determinant. But rescaling that row doesnt change anything, so it must not change the determinant. The determinant must not change when scaled, so must be 0. 7.12. To see that det A = 12, we compute 0 2 1 1 2 3 3 5 2 Swap rows 1 and 2 3 0 3 Add (row 1) to row 3. 3 0 0 Make a new pivot . 3 0 0 Add 2(row 2) to row 3. 3 0 0 1 2 0 2 1 2 1 2 4 2 1 0 1 2 4 2 1 0 1 2 5 2 1 2
349 . 3 0 0 1 2 0 2 1 2
7.13. If L11 = 0, then we have a zero row, so det = 0, and the result is obviously true. If L11 = 0, then use it as a pivot to kill everything underneath it: L11 L22 0 0 L32 L33 .. 0 . L L 42 43 . . . . . . . . . . . . . . . 0 Ln2 Ln3 ... Ln(n1) Lnn Proceed by induction. 7.15. Here is one proof: the i-th row of At is obvious the transpose of t the i-th column of A: ei t At = (Aei ) . Writing out any vector x as x = t x1 e1 + x2 e2 + + xn en , and adding up, we see that xt At = (Ax) for any vector x. Apply this to a column of B , say x = Bei . ei t B t At = (Bei ) At = (ABei )
t t t
= ei t (AB ) . Here is another proof, using lots of indices: (AB )ij = (AB )ji =
k t
= =
= B t At
ij
7.16. It is the inverse permutation; see the exercises in subsection 3.3. 7.17. 4: expand down the third column.
7.18. For $k = 1$, $A^1 = A$, so obvious. By induction,
$$\det A^{k+1} = \det\left(A\,A^k\right) = (\det A)\,\det A^k = (\det A)(\det A)^k = (\det A)^{k+1}.$$
7.19. $\det A^{2222444466668888} = (\det A)^{2222444466668888} = (-1)^{2222444466668888} = 1$ (an even number of minus signs).
7.20. $AA^{-1} = I$ so $\det A \,\det A^{-1} = 1$.
7.21. a. By expanding down any column. b. By expanding across any row. c. By forward elimination, and then taking the product of the diagonal entries. (The fastest way for a big matrix.)
7.22. One, because the determinant of the coefficients is $1 \cdot 2 \cdot 3 = 6$.
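The identities used in 7.18 and 7.20 are easy to confirm numerically; the example matrix below is our own.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
d = np.linalg.det(A)

assert np.isclose(np.linalg.det(np.linalg.matrix_power(A, 5)), d ** 5)  # det(A^k) = (det A)^k
assert np.isclose(d * np.linalg.det(np.linalg.inv(A)), 1.0)             # det(A) det(A^{-1}) = 1
```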
8 Span
8.1. x1 + 2x2 + x3 = 1 x2 + x3 = 0 x1 + x2 = 2 8.3. Yes. 8.4. Lets call these vectors x1 , x2 , x3 . Make the matrix A = x1 Apply forward elimination: 1 A = 0 1 Add 1(row 1) to row 3. 1 0 0 1 0 1 1 1 1 1 0 0 1 1 0 x2 x3 .
If we add any vector y , we cant add another pivot, so every vector y is a linear combination of x1 , x2 , x3 . Therefore the span is all of R3 . 8.5. Yes. Forward eliminate: 0 2 1 1 1 1 1 0 0 2 1 1 Swap rows 1 and 2. 1 0 0 Move the pivot . 1 0 0 Add (row 2) to row 3. 1 0 0 Move the pivot . 1 0 0 1 2 0 1 1 2 0 1 2 1 2 0 1 1 2 0 1 2 1 2 2 1 1 1 0 1 1 1 2 2 1 1 1 0 1 1
There is no pivot in the nal column, so the nal column is a linear combination of earlier columns. 8.6. No. Forward eliminate: 2 2 4 0 0 1 0 2 0 1 0 0 Move the pivot . 2 0 0 Add row 2 to row 3. 2 0 0 Move the pivot . 2 0 0 Move the pivot . 2 0 0 2 1 0 4 0 0 0 2 2 2 1 0 4 0 0 0 2 2 2 1 0 4 0 0 0 2 2 2 1 1 4 0 0 0 2 0
There is a pivot in the nal column, so the nal column is linearly independent of earlier columns. 8.9. x1 + x2 + 2x3 = 0 8.11. You can rescale and add as many times as you need to, forming any linear combination. 8.12. No: it doesnt contain 0. 8.13. Yes 8.14. Obviously every vector in a subspace is a linear combination of vectors in the subspace: x = 1 x. So the subspace lies inside the span of its vectors. Conversely, every linear combination of vectors in a subspace belongs
to the subspace, so the span of the vectors in the subspace lies in the subspace. Therefore a subspace is its own span. 8.15. {0} and R . 8.16. a. Not always. Take the x and y axes in the (x, y ) plane. b. Yes. 8.17. No: it contains 1 x= 1 but doesnt contain 2x = 8.18. a. no b. yes c. yes d. no 8.20. The lines through 0. 2 . 2
9 Bases
9.2. Put the standard basis into the columns of a matrix, and you have the identity matrix. Look: there is a pivot in each column. 9.4. Try adding a vector to the set. If you cant then you are done: a basis. If you can, then keep going. If you end up with more than n vectors, then use theorem 9.4. 9.5. Put them into the columns of matrix A. You nd det A = 0, so these vectors are linearly dependent. 9.6. Put them into the columns of a matrix, and apply forward elimination to nd pivots: 2 0 1 3 . 2
A square matrix, a pivot in every column, so a basis. 9.10. Write A as columns A = u1 u2 ... un .
The equation Ax = 0 is the equation x1 u1 + x2 u2 + + xn un = 0, which imposes a linear relation. Therefore u1 , u2 , . . . , un are linearly independent just when Ax = 0 has x = 0 as its only solution.
354 9.11. F 1 1 = 0 0 0 1 0 1 1 1 , F AF = 0 0 1 0 0 2 0 1 0 . 2
9.12. 1 A = 0 0 1 F 1 = 0 0 0 1 0 2 1 0 1 0 0 , F = 0 0 0 4 1 2 , F AF 1 2 1 0 1 = 0 0 0 2 , 1 0 1 0 4 2 . 0
9.14. Expand out x = x1 e1 + x2 e2 + + xq eq to give Ax = x1 (Ae1 ) + x2 (Ae2 ) + + xq (Aeq ) . So Ax = 0 just when x1 , x2 , . . . , xq give a linear relation among the columns of A. 9.15. No 9.17. Yes 9.19. Let F = x1 G = y1 x2 y2 ... ... xn yn ,
and let A = G F 1 . But why is there only one such matrix? 9.21. The idea is that e2 e3 = (e1 e3 ) (e1 e2 ), etc. So consider the vectors e1 e2 , e1 e3 , . . . , e1 en1 . Clearly if i = 1, then ei ej is one of these vectors. But if i = 1, then ei ej = (1) (e1 ei ) + (e1 ej ) . So the vectors e1 e2 , e1 e3 , . . . , e1 en1 span the subspace. Clearly these vectors are linearly independent, because each one has a nonzero entry just at a spot where all of the others have a zero entry. Alternatively, to see linear independence, any linear relation among them: 0 = c2 (e1 e2 ) + c3 (e1 e3 ) + . . . cn (e1 en ) = (c2 + c3 + + cn ) e1 + c2 e2 + c3 e3 + + cn en determines a linear relation among the standard basis vectors, forcing 0 = c2 = c3 = = cn . The subspace is actually the set of vectors x for which x1 + x2 + + xn = 0.
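The construction in 9.19 (put the $x_j$ into the columns of $F$, the $y_j$ into the columns of $G$, and set $A = GF^{-1}$) can be tried on any concrete pair of bases. The vectors below are made up purely for illustration.

```python
import numpy as np

F = np.array([[1., 0.],          # columns are x1, x2 (a basis)
              [1., 1.]])
G = np.array([[2., 1.],          # columns are y1, y2
              [0., 3.]])
A = G @ np.linalg.inv(F)         # the matrix with A x_j = y_j

for j in range(2):
    assert np.allclose(A @ F[:, j], G[:, j])
```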
Depending on whether you swap row 1 with row 2 or with row 3, forward elimination yields 1 1 1 2 0 1 or 0 1 . 0 0 0 0
1 2 1
0 1 0
0 0 1
0 0 2
0 1 2
0 1 2
Pad with rows from the identity matrix, to get 1s down the diagonal. 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 2 1 0 0 1 2 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Pad with rows from the identity matrix, to get 1s down the diagonal. 1 0 0 0 0 Keep the pivotless columns. 1 0 0 0 0 10.8. The reduced echelon 1 0 0 0 form is 0 1 0 0 0 0 1 0 0 0 0 1 2 5 2 5 2 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
0 0 0 1
5 2 5 2
Pad with rows from the identity matrix, to get 1s down the diagonal. 1 0 0 0 2 0 5 0 0 1 2 0 0 1 0 5 2 0 0 0 1 1 0 0 0 0 1 Keep the pivotless columns. 2 5 2 5 2 1 1 10.9. 1 1 0 1 1 0
10.10.
xj ej xj Aej
=
j
so a linear combination (with coecients x1 , x2 , . . . , xq ) of the columns Aej of A. Therefore the vectors of the form y = Ax are precisely the linear combinations of the columns of A. 10.14. It is the horizontal plane in R3 , the xy -plane in xyz coordinates. 10.16. If A is tall, then At is short, so there are nonzero vectors c so that t A c = 0, i.e. ct A = 0. Since c = 0, there must be some entry ci = 0. If Ax = ei , we nd 0 = ct Ax = ct ei = ci a contradiction. So ei is not in the image. 10.17. 1, 1, 2, 1, 1, 0 10.19. The kernel of B consists in the vectors x v = y z for which Bv = 0. This is just asking for Ax + Ay + Az = 0, i.e. for A(x + y + z ) = 0. We can pick x and y arbitrarily, and pick an arbitrary vector w in the kernel of A, and set z = w (x + y ). In particular, B has kernel of dimension 2q + k where k is the dimension of the kernel of A. 10.20. Ax = 0 implies that Bx = CAx = 0. Conversely, Bx = 0 implies that Ax = C 1 Bx = 0. 10.21. Linear relations among vectors pass through C and through C 1 . 10.22. Compute echelon form: 0 1 0 0 2 2 1 2 2 0 2 2 2 0 2 2
So the rank is 3, the number of pivots. There is 1 pivotless column. The image is 3-dimensional, while the kernel is 1-dimensional. 10.23. You could try A= 1 0 0 , B= 0 0 1 0 , C= 0 0 1 1 . 0
The image of A is the span of the columns, so the span of 1 0 while the image of B is the span of its columns, so the span of 0 , 1 which are clearly not the same subspaces, but both one dimensional. 10.24. The rank of At is the number of linearly independent columns of t A (rows of A). Let U be the forward elimination of A. When we compute forward elimination, we add rows to other rows, and swap rows, so the rows of U are linear combinations of the rows of A, and vice versa. Thus the rank of At is the number of linearly independent rows of U . None of the zero rows of U count toward this number, while pivot rows are clearly linearly independent. Therefore the rank of At is the number of pivots, so equal to the rank of A. 10.31. If they both have solutions, then At y = 0 so y t A = 0, so y t Ax = 0. But y t Ax = y t b, so bt y = 0, a contradiction. On the other hand, if neither has a solution, then b is not in the image of A. So b is not a linear combination of the columns of A, and the matrix M = (A b) has rank 1 higher than the matrix A. Therefore the matrix Mt = At bt
also has rank one higher than At . So the dimension of the kernel of M t is one lower than the dimension of the kernel of At , and therefore there is a vector y in the kernel of At but not in the kernel of M t . 10.34. A A+B = 1 1 B
so you can use the previous exercise. 10.35. (a) only 10.36. The rank of AB is the dimension of the image. But (AB )x = A(Bx), so every vector in the image of AB lies in the image of A. The clever bit: on the other hand, the rank of AB is the same as the rank of AB t , as shown in problem 10.24 on page 97. But AB t = B t At , so the rank is at most the rank of B t , which is the rank of B .
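The inequality in 10.36 is easy to see on an example; the matrices below are our own choices.

```python
import numpy as np

rank = np.linalg.matrix_rank
A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [1., 1., 0.]])       # rank 2
B = np.array([[1., 1., 1.],
              [0., 0., 0.],
              [2., 2., 2.]])       # rank 1

assert rank(A @ B) <= min(rank(A), rank(B))   # here rank(AB) = 1
```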
Their eigenvalues are: for A, = 1; for B , = 1, for A + B , = 1 or = 3. 11.5. Suppose that A is an n n matrix. The determinant of a matrix A is a sum of terms, each one linear in any of the entries of A which appear in it. The characteristic polynomial det (A I ) therefore involves at most n terms with a in them, coming from the n diagonal entries of A , so a polynomial in of degree at most n. 11.6. (1)n n 11.7. A : 0, 5, B : 1, 3, C : 0, 1, D : 0, 0, 2 11.8. det A = det At for any square matrix A, so I t = I and det (A I ) = det (At I ) . 11.9. det F 1 AF I = det F 1 (A I ) F = det F 1 det (A I ) det (F ) = det (A I ) . 11.11. If A is invertible, then det (AB I ) = det A1 (AB I ) A = det (BA I ) . If B is invertible, the same trick works. But if neither A nor B is invertible, we have to work harder. Pick any number which is not an eigenvalue of A. Then
A I is invertible, so (A I ) B and B (A I ) have the same eigenvalues. Therefore det ((A I ) B I ) = det (B (A I ) I ) as polynomials in . Now we can plug in = 0. 11.13. 0,3 11.16. a. sn (A) = 1 b. s0 (A) = det (A 0) = det A c. The expression det A is a sum ofterms, each a product of precisely n entries of A, as we have seen. So det (A I ) is a sum of terms, each involving some A entries and some s, with a total of n factors in each term. d. If A is upper triangular, or lower triangular, then the result is obvious. But each term of sn1 (A) involves precisely one entry of A, hence linear in those entries. So sn1 (A) is a linear function of A, i.e. sn1 (A + B ) = sn1 (A) + sn1 (B ). We can write any matrix as a sum of lower triangular and an upper triangular. e. Follows immediately from problem 11.9. f. Clearly F 1 AF ej = F 1 Auj = 0 for j > r. So we get zeros just where we need them, to be able to write F 1 AF = P Q 0 . 0
P must have rank r, since there is nowhere else to put the r pivots, so P is invertible. Since A has rank r, so must F 1 AF . g. det (A I ) = det P I Q 0 I
nr
= det (P I ) () has no terms of degree less than n r in . h. You could try 0 1 0 A= , B= 0 0 0 11.17. The = 3-eigenvectors are multiples of x= 11.20. =1 0 1 0 . 1
0 . 0
11.21. = 2 =3 11.22. =1
3 2 1
1 1 0 1 0 1 2 2 1 0 3 2 1
=2
=3
12 Bases of Eigenvectors
12.1. The kernel of any matrix is a subspace.
12.2. $Ax = \lambda x$ so $ABx = BAx = B\lambda x = \lambda Bx$.
12.3. Depending on the order in which you write down your eigenvalues, you could get:
$$F = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \quad F^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \quad F^{-1} A F = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$
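The pattern in these hints, putting a basis of eigenvectors into the columns of $F$ so that $F^{-1}AF$ is diagonal, is exactly what numpy.linalg.eig produces. A quick check on a small matrix of our own (not the matrix of problem 12.3):

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
eigenvalues, F = np.linalg.eig(A)       # columns of F are eigenvectors
D = np.linalg.inv(F) @ A @ F            # diagonal, with the eigenvalues on the diagonal

assert np.allclose(D, np.diag(eigenvalues))
```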
12.4. Again, it depends on the order you choose to write the eigenvalues, and the order in which you write down basis vectors for each eigenspace. You could get:
= 3
= 2
=0
1 0 1 2 1 0 0 1 1
1 F = 0 1 1 F 1 = 1 1 3 F 1 AF = 0 0
2 1 0 2 1 2 0 2 0
0 1 1 2 1 1 0 0 0
=2
1 1 0 , 1 0 1
1
= 4
2 1 2 1
1 F = 0 1 1 1 F 1 = 2 1 2 1 F AF = 0 0
1 1 0 1
3 2
1 2 1 2
1 0 1 2 1
1 0 2 0
0 0 4
12.6. Suppose we have a matrix F for which 1 F 1 AF = 2 .. . n Call this diagonal matrix . Therefore AF = F . Lets check that the columns of F are eigenvectors. We need only see that F is just F with columns scaled by the diagonal entries of . 12.7. You could try A= because it has only one eigenvector, 1 0 0 0 1 0 .
x=
(up to rescaling), with eigenvalue = 0. Therefore there is no basis of eigenvectors. 12.9. You could get 1 1 1 2 1
= 1 =1
F = F 1 = F 1 AF =
1 1 2 2 1 0
1 2 1
1 2 0 1
3 4
=1
1 4
0 0 1 1 2 v1 = 1 2 1 0 v2 = 0 1 1 v3 = 1 0
1 4 1 2 1 4
1 2 1 F = 2 1 1 F 1 = 1 1 2
3
0 0 1 1 1
1 2
4 AF = 0 0
0 1 0
0 0
1 4
1 1 0 0 1 0
Numbers of people cant be negative. Clearly the vector must be a linear combination of eigenvectors, with a positive coecient for the = 1 eigenvector: h0 s0 = a1 v1 + a2 v2 + a3 v3 , d0 with a2 > 0. Over time, the numbers develop according to hn hn1 sn = A sn1 dn dn1 h0 = An s0 d0 = 3 4
n
a1 v1 + a2 v2 +
1 4
a 3 v3 .
Since the other eigenvalues are smaller than 1, their powers become very small, and their components in the resulting vector gradually decay away. Thereforethe result becomes every closer to 0 a 2 v2 = 0 , a2 everybody dead. So everyone dies, in an exponential decay of population. (It should be obvious, because we didnt allow any births in our model.) 12.15. See table A.1. Table A.1: Invertibility criteria. A is n n of rank r. U is any matrix obtained from A by forward elimination. 5.1 5.2 Invertible GaussJordan on A yields 1. U is invertible. Not invertible GaussJordan on A yields a matrix with n r zero rows. U has n r zero rows.
Invertible Pivots lie on the diagonal. U has no zero rows U has n pivots. Ax = b has a solution x for each b. Ax = b has exactly one solution x for each b. Ax = b has exactly one solution x for some b. Ax = 0 only for x = 0. A has rank n. At is invertible. det A = 0. The columns are linearly independent. The columns form a basis.
Not invertible Some pivot lies above the diagonal, and all pivots after it. U has n r zero rows. U has r < n pivots. Ax = b has no solution for some b, nr dimensions worth for other b. Ax = b has no solution for some b, many for other b. Ax = b has no solution for some b, many for other b. Ax = 0 for many x. A has rank r < n. At is not invertible. Every square block larger than r r has det = 0. The n r pivotless columns are linear combinations of the r pivot columns. Each of the n r pivotless columns is a linear combination of earlier pivot columns. One row is a linear combination of earlier rows. The kernel has positive dimension n r. The image has positive dimension r. The = 0 eigenspace has positive dimension n r.
9.2
The rows form a basis. The kernel of A is just the 0 vector. The image of A is all of Rn . 0 is not an eigenvalue of A.
13 Inner Product
13.1. $\langle Ae_j, e_i \rangle = \left\langle \sum_k A_{kj} e_k, \; e_i \right\rangle = A_{ij}$.
13.2. $x^t y = \sum_k \left(x^t\right)_k y_k = \sum_k x_k y_k$.
13.3. $P_{ij} = \langle Pe_i, e_j \rangle$.
13.4.
$$\left\langle v - \frac{\langle v, u\rangle}{\langle u, u\rangle}\, u, \; u \right\rangle = \langle v, u\rangle - \frac{\langle v, u\rangle}{\langle u, u\rangle}\, \langle u, u\rangle = \langle v, u\rangle - \langle v, u\rangle = 0.$$
13.5. (a) One: $x = 0$. (b) $2n$: one $x_i$ is $\pm 1$, all other $x_j$ are 0. (c) $2n + 16\binom{n}{4}$: one $x_i$ is $\pm 2$, all other $x_j$ are 0, or four of the $x_i$ are $\pm 1$ and all others are 0. (d) $2n + 8(n-2)\binom{n}{2} + 2^6 (n-5)\binom{n}{5} + 2^9 \binom{n}{9}$: a. one $\pm 3$, or b. two $\pm 2$'s and one $\pm 1$, or c. one $\pm 2$ and five $\pm 1$'s, or d. nine $\pm 1$'s.
13.6. The hour hand starts at angle $\pi/2$, and completes a revolution every 12 hours. So the hour hand is at an angle of $\theta = \frac{\pi}{2} - \frac{2\pi}{12}\, t$ after $t$ hours. The minute hand, if we measure time in hours, revolves every hour, so has angle $\varphi = \frac{\pi}{2} - 2\pi t$.
3 11 (2k 6k 11 ,
a. t = b. t =
any integer k .
A=
0 1
1 , B= 1
0 1
1 , AB = 2
1 1
2 . 3
13.12. For A symmetric. 13.19. Those which have 1 in each diagonal entry. 13.20. 1 0 0 1
P =
A=
0 1
1 , B = A. 0
13.22. x, y = 1 2 x+y
2
Preserve the right hand side, and you must preserve the left hand side. 13.25. The rows of A are orthonormal just when At is orthogonal, which 1 occurs just when A = (At ) , which occurs just when At = A1 , which occurs just when A is orthogonal. 13.26. See table A.2 on the following page and table A.3 on the next page. 13.27. u1 =
2 2 22 ,
u2 =
6 6 6 , 6 6 3
33 3 u3 = 3 ,
3 3
w 1 = v1
1 = 1 0
u1 =
1 w1 , w1
w1
1 1 = 1 (1)(1) + (1)(1) + (0)(0) 0 1 2 21 = 2 2 0 1 u2 = w2 w2 , w2 1 1 = 1 (1)(1) + (1)(1) + (2)(2) 2 1 6 6 1 = 6 6 1 6 3 Table A.3: Orthogonalizing vectors: rescaling
Done: orthonormal.
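The worked solutions from here through the end of chapter 13 all repeat the same two steps: subtract off the projections onto the vectors already found, then rescale to unit length. Here is a minimal Gram–Schmidt sketch in Python (our own code and example vectors, chosen only for illustration).

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal list spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w -= np.dot(w, u) * u            # w_j = v_j minus its projections
        length = np.linalg.norm(w)
        if length > 1e-12:                   # skip vectors that were dependent
            basis.append(w / length)         # u_j = w_j / |w_j|
    return basis

u1, u2 = gram_schmidt([[1., -1., 0.], [1., 0., -1.]])
assert np.isclose(np.dot(u1, u2), 0.0)       # orthogonal
assert np.isclose(np.linalg.norm(u1), 1.0)   # unit length
assert np.isclose(np.linalg.norm(u2), 1.0)
```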
13.29. u1 =
1 16 , 6 1 2 6
u2 =
1 13 . 3 1 3
Notice that v1 , v2 , v3 did not give a basis, so when we try to compute u3 , we run into trouble. 13.30. The only problem that can come up is division by zero. But that happens only when we divide by a length wj . If this length is 0, then wj is zero, so vj =
i
vj , ui ui .
But each ui is a linear combination of vectors v1 , v2 , . . . , vi , so this is a linear dependence. 13.32. If v is perpendicular to u, then set w = v , see that v is perpendicular to v , so v = 0. Otherwise, if v is not perpendicular to u, then w = v v,u u is u 2 perpendicular to u and v , and therefore perpendicular to any linear combination of u and v . In particular, since w is a linear combination of u and v , w must be perpendicular to w, so w = 0. So v= v, u u
2
u.
13.33.
w2 = v2 = 5 3 4 4
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
Done: orthonormal.
13.34.
w2 = v2 = 0 2 1 1
Done: orthonormal.
u1 = =
1 w1 , w1
w1 1 1 1
= u2 = =
13.35.
w2 = v2 = 1 2
1 2 1 2
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2 1 2 1 2
Done: orthonormal.
13.36.
w 2 = v2 = 1 1
1 5 2 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 1
= u2 = =
(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
1 2 2 (1 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5 1 5 2 5
13.37.
w2 = v2 = 0 2
4 5 2 5
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
4 5 2 5
4 2 2 ( 4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5
Done: orthonormal.
13.38.
w2 = v2 = 2 0 1 1
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
(1)(1) + (1)(1) 1 2 2 1 2 2
1 1
13.39.
w2 = v2 = 0 2 1 1
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
(1)(1) + (1)(1) 1 2 2 1 2 2
1 1
Done: orthonormal.
13.40.
w2 = v2 = 1 2 3 2
3 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
3 2 3 2
3 3 3 ( 3 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
13.41.
w2 = v2 = 1 1 1 0
u1 = =
1 w1 , w1 1
w1 0 1
(0)(0) + (1)(1) 0 1 1 w2 , w2 1 w2
= u2 = =
(1)(1) + (0)(0) 1 0
1 0
Done: orthonormal.
13.42.
w2 = v2 = 1 2 1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 1
= u2 = =
(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
(1)(1) + (2)(2) 1 5 5 2 5 5
1 2
13.43.
w2 = v2 = 1 1 0 1
u1 = =
1 w1 , w1 1
w1 1 0
= u2 = =
0 1
Done: orthonormal.
13.44.
w 2 = v2 = 0 1 2 5 4 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 1
= u2 = =
(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
2 4 4 ( 2 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5
2 5 4 5
13.45.
w2 = v2 = 1 0 1 2
1 2
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 2 1 2
1 1 1 ( 1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
Done: orthonormal.
13.46.
w2 = v2 = 1 0 4 5
2 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
4 5 2 5
4 2 2 ( 4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5
13.47.
w2 = v2 = 0 1
1 2 1 2
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
1 2 1 2
Done: orthonormal.
13.48.
w 2 = v2 = 1 1
2 5 1 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
2 1 1 (2 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5 2 5 1 5
13.49.
w2 = v2 = 1 2 1 0
u1 = =
1 w1 , w1 1
w1 0 1
(0)(0) + (1)(1) 0 1 1 w2 , w2 1 w2
= u2 = =
(1)(1) + (0)(0) 1 0
1 0
Done: orthonormal.
13.50.
w2 = v2 = 0 2 1 1
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
(1)(1) + (1)(1) 1 2 2 1 2 2
1 1
13.51.
w2 = v2 = 2 1
1 2 1 2
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2 1 2 1 2
Done: orthonormal.
13.52.
w2 = v2 = 1 0 1 2
1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 2 1 2
1 1 1 ( 1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
13.53.
w2 = v2 = 2 2
4 5 2 5
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
4 2 2 (4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5
4 5 2 5
Done: orthonormal.
13.54.
w2 = v2 = 1 2 1 2
1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 2
= u2 = =
(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 2 1 2
1 1 1 ( 1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
13.55.
w2 = v2 = 0 1 0 1
u1 = =
1 w1 , w1 1
w1 1 0
(1)(1) + (0)(0) 1 0 1 w2 , w2 1 w2
= u2 = =
(0)(0) + (1)(1) 0 1
0 1
Done: orthonormal.
13.56.
w2 = v2 = 1 0
1 2 1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2 1 2 1 2
13.57.
w2 = v2 = 2 0 1 1
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
Done: orthonormal.
13.58.
w2 = v2 = 2 1
1 2 1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
1 2 1 2
13.59.
w2 = v2 = 1 1 1 0
u1 = =
1 w1 , w1 1
w1 0 2
= u2 = =
1 0
Done: orthonormal.
13.60.
w2 = v2 = 2 1
1 2 1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
1 2 1 2
13.61.
w2 = v2 = 1 2
3 5 6 5
u1 = =
1 w1 , w1 1
w1 2 1
= u2 = =
(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
3 5 6 5
3 6 6 ( 3 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5
Done: orthonormal.
13.62.
w2 = v2 = 1 1 6 5 3 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
6 3 3 ( 6 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5
6 5 3 5
13.63.
w2 = v2 = 2 0
2 5 4 5
u1 = =
1 w1 , w1 1
w1 2 1
= u2 = =
(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
2 4 4 (2 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5
2 5 4 5
Done: orthonormal.
13.64.
w2 = v2 = 2 1
3 2 3 2
Done: orthonormal.
u1 = =
1 w1 , w1
w1 1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
3 3 3 (3 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
3 2 3 2
13.65.
w2 = v2 = 0 1
1 2 1 2
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
1 2 1 2
Done: orthonormal.
13.66.
w2 = v2 = 2 0 1 1
Done: orthonormal.
u1 = =
1 w1 , w1
w1 1 1 1
= u2 = =
13.67.
w2 = v2 = 0 2 0 2
u1 = =
1 w1 , w1 1
w1 2 0
= u2 = =
0 2
Done: orthonormal.
13.68.
w2 = v2 = 1 1 1 0
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 0 1
= u2 = =
13.69.
w2 = v2 = 2 2 0 2
u1 = =
1 w1 , w1 1
w1 1 0
= u2 = =
0 2
Done: orthonormal.
13.70.
w2 = v2 = 2 1 2 0
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 0 2
= u2 = =
2 0
13.71.
w2 = v2 = 2 0 2 0
u1 = =
1 w1 , w1 1
w1 0 2
= u2 = =
2 0
Done: orthonormal.
13.72.
w2 = v2 = 1 1 1 1
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 2
= u2 = =
(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1
(1)(1) + (1)(1) 1 2 2 1 2 2
1 1
13.73.
w2 = v2 = 1 1 1 0
u1 = =
1 w1 , w1 1
w1 0 2
= u2 = =
1 0
Done: orthonormal.
13.74.
w 2 = v2 = 1 1 0 1
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 0
(2)(2) + (0)(0) 1 0 1 w2 , w2 1 w2
= u2 = =
(0)(0) + (1)(1) 0 1
0 1
13.75.
w2 = v2 = 0 1 0 1
u1 = =
1 w1 , w1 1
w1 1 0
= u2 = =
Done: orthonormal.
13.76.
w 2 = v2 = 0 1 2 5 4 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 1
= u2 = =
(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
2 4 4 ( 2 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5
2 5 4 5
13.77.
w2 = v2 = 2 1 0 1
u1 = =
1 w1 , w1 1
w1 2 0
= u2 = =
0 1
Done: orthonormal.
13.78.
w2 = v2 = 1 1 1 1
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 2
= u2 = =
(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1
(1)(1) + (1)(1) 1 2 2 1 2 2
1 1
13.79.
w2 = v2 = 2 1 0 1
u1 = =
1 w1 , w1 1
w1 2 0
= u2 = =
0 1
Done: orthonormal.
13.80.
w2 = v2 = 0 1
2 5 1 5
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
2 1 1 (2 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5
2 5 1 5
13.81.
w2 = v2 = 1 0
4 5 2 5
u1 = =
1 w1 , w1 1
w1 1 2
= u2 = =
(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
4 2 2 (4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5
4 5 2 5
Done: orthonormal.
13.82.
w2 = v2 = 0 1
1 2 1 2
Done: orthonormal.
u1 = =
1 w1 , w1 1
w1 2 2
= u2 = =
(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2
1 2 1 2
w2 = v2 = 1 1 1 1
u1 = =
1 w1 , w1 1
w1 1 1
= u2 = =
Done: orthonormal.
14.2. = 16 =3
2 3 1 3 2
F = F 1 = F 1 AF =
2 13
3 13
16 0
0 3
Each eigenvector comes from a dierent eigenvalue, so they are already perpendicular you only have to rescale them to have unit length. 14.4. det (A I ) =2 5 = ( 5)
449 2 1
1 2
2 5 1 5 2 5 1 5
1 5 2 5 1 5 2 5
5 0
0 0
det (A I ) =3 8 2 + 15 = ( 3) ( 5) 0 =3 0 1
1
=0
2 1 0 2 1 0
1 5 2 5
2 5 1 5
0 0
2 5 1 5
0 1 0 0
1 F 1 = 5 2 5 3 0 F 1 AF = 0 0 0 0
0 0 5
=1
=2
2 1 1 , 0 1 0 1 12 2 1
1 12 2
1 3 1 3 1 3 1 2 1 3 1 6
1 6 1 6 2 6
F 1 =
1 12 3 1 6
1 3 2 6
1 F 1 AF = 0 0
0 1 0
0 0 2
14.7.
= 1
=1
4 3 0 1 3 0 4 1 , 0 0 1
F 1
5 4 5 = 0
3
0 1 0 0 1 0 0 1 0
3 5 4 5 3 5 4 5
0 0 0 1
5 1 F 1 AF = 0 0
14.8. Such an F must preserve the eigenspaces, so preserve the span of e1 , the span of e2 , and the span of e3 . Therefore 1 F = 1 . 1 14.11. You only change x1 x2 terms: a. x2 2 2 b. x2 1 + x2 3 2 c. x1 + 2 x1 x2 + 3 2 x2 x1 1 1 d. x2 + x x + 1 2 1 2 2 x2 x1 14.12. a. A= b. A= c. A= d. A= 1
1 2 1 2
0 0 0 0 1
3 2
0 1 0 1
3 2
14.31. Look at the eigenvalues of the symmetric matrix, to get started. a. ellipse b. hyperbola c. pair of lines d. hyperbola
e. line f. empty set 14.32. First, by orthogonal transformations, you can get you quadratic form to look like 2 2 1 x2 1 + 2 x2 + + n xn . Then you can get every eigenvalue j to be 0, 1 or 1, by rescaling the associated variable xj . Then you can permute the order of the variables. So you can get
2 2 x 2 1 + x2 + + xs ,
15 Complex Vectors
15.1. Take the complex number with half as much argument, and square root as much modulus. 15.10. z, w = 1(i) + i(2 2i) = 2 + i 15.16. Since A is self-adjoint, Az, z = z, Az . If we pick z an eigenvector, with eigenvalue , then the left side becomes Az, z = z, z , and the right side becomes z, z . z, Az =
2 . Since z, z = z = 0, we nd = 15.21. For an eigenvector z , with eigenvalue ,
z, z .
, u2 =
1+2 i 10 2i 10
a. A is self-adjoint. b. =3 =4 c. F = F AF =
1 2 i 2 1 2 (1 1 2 (1 1 2 i 2 1 2 (1 1 2 (1
+ i) i)
+ i) i)
3 4
16 Vector Spaces
16.1. a. 0 v = (0 + 0) v = 0 v + 0 v . Add the vector w for which 0 v + w = 0 to both sides. b. a 0 = a (0 + 0) = a 0 + a 0. Similar. 16.3. If there are two such vectors, w1 and w2 , then v + w1 = v + w2 = 0. Therefore w1 = 0 + w1 = (v + w2 ) + w1 = v + (w2 + w1 ) = v + (w1 + w2 ) = (v + w1 ) + w2 = 0 + w2 = w2 . 0 = (1 + (1)) v = 1 v + (1) v = v + (1) v so (1) v = v . 16.8. Ax = Sx and By = T y for any x in Rp and y in Rq . Therefore T Sx = BSx = BAx for any x in Rp . So (BA)ej = T Sej , and therefore BA has the required columns to be the associated matrix. 16.10.
Hints
b. Given any z in V , we nd z = 1x just for x = z . 16.11. Given z in V , we can nd some x in U so that T x = z . If there are two choices for this x, say z = T x = T y , then we know that x = y . Therefore x is uniquely determined. So let T 1 z = x. Clearly T 1 is uniquely determined, by the equation T 1 T = 1, and satises T T 1 = 1 too. Lets prove that T 1 is linear. Pick z and w in V . Then let T x = z and T y = w. We know that x and y are uniquely determined by these equations. Since T is linear, T x + T y = T (x + y ). This gives z + w = T (x + y ). So T 1 (z + w) = T 1 T (x + y ) =x+y = T 1 z + T 1 w. Similarly, T (ax) = aT x, and taking T 1 of both sides gives aT 1 z = T 1 az . So T 1 is linear. 16.12. Write p(x) = a + bx + cx2 . Then a Tp = a + b + c . a + 2 b + 4c To solve for a, b, c in terms of p(0), p(1), p(2), we solve 1 1 1 Apply forward elimination: 1 1 1 1 0 0 1 0 0 0 1 2 0 1 2 0 1 0 0 1 4 0 1 4 0 1 2 p(0) p(1) p(2) p(0) p(1) p(0) p(2) p(0) p(0) p(1) p(0) p(2) 2p(1) + p(0) a p(0) 1 b = p(1) . 4 c p(2)
1 2
Hints
455
Back substitute to nd: 1 1 p(0) p(1) + p(2) 2 2 b = p(1) p(0) c 3 1 = p(0) + 2p(1) p(2) 2 2 a = p(0). c= Therefore we can recover p = a+bx+cx2 completely from knowing p(0), p(1), p(2), so T is one-to-one and onto. 16.14. Some examples you might think of: The set of constant functions. The set of linear functions. The set of polynomial functions. The set of polynomial functions of degree at most d (for any xed d). The set of functions f (x) which vanish when x < 0. The set of functions f (x) for which there is some interval outside of which f (x) vanishes. The set of functions f (x) for which f (x)p(x) goes to zero as x gets large, for every polynomial p(x). The set of functions which vanish at the origin. 16.15. a. no b. no c. no d. yes e. no f. yes g. yes 16.16. a. no b. no c. no d. yes e. yes f. no 16.17. a. If AH = HA and BH = HB , then (A + B )H = H (A + B ), clearly. Similarly 0H = H 0 = 0, and (cA)H = H (cA). b. P is the set of diagonal 2 2 matrices. 16.18. You might take: a. 1, x, x2 , x3 b. 1 0 1 0 0 0 0 0 0 , 0 0 , 0 1 0 1 0 0 0 0 0 0 , 0 0 , 0 1 0 1 0 0 0 0 0 0 , 0 0 0
456
Hints
c. The set of matrices e(i, j ) for i j , where e(i, j ) has zeroes everywhere, except for a 1 at row i and column j . d. A polynomial p(x) = a + bx + cx2 + dx3 vanishes at the origin just when a = 0. A basis: x, x2 , x3 . 16.19. If F : Rp V and G : Rq V are isomorphisms, then A = G1 F : R Rq is an isomorphism. Hence A is a matrix with 0 kernel and image Rq . So A must have rank q . By theorem 10.7 on page 96, the dimension of the kernel plus the dimension of the image must be p. Therefore p = q . 16.21. The kernel of T is the set of x for which T x = 0. But T x = 0 implies Sx = P 1 T x = 0, and Sx = 0 implies T x = P Sx = 0. So the same kernel. The image of S is the set of vectors of the form Sx, and each is carried by P to a vector of the form P Sx = T x. Conversely P 1 carries the image of T to the image of S . Check that this is an isomorphism. 16.23. If T : U V is an isomorphism, and F : Rn U is a isomorphism, prove that T F : Rn V is also an isomorphism. 16.24. The same proof as for Rn ; see proposition 9.12 on page 87. 16.25. Take F and G two isomorphisms. Determinants of matrices multiply. Let A be the matrix associated to F 1 T F : Rn Rn and B the matrix associated to G1 T G : Rn Rn . Let C be the matrix associated to G1 F . Therefore CAC 1 = B .
p
16.26. a. T (p(x) + q (x)) = 2 p(x 1) + 2 q (x 1) = T p(x) + T q (x), and T (ap(x)) = 2a p(x 1) = a T p(x). b. If T p(x) = 0 then 2 p(x 1) = 0, so p(x 1) = 0 for any x, so p(x) = 0 for any x. Therefore T has kernel {0}. As for the image, if q (x) is any polynomial of degree at most 2, then let p(x) = 1 2 q (x + 1). Clearly T p(x) = q (x). So T is onto. c. To nd the determinant, we need an isomorphism. Let F : R3 V , a F b = a + bx + cx2 . c
Hints
457 Calculate the matrix A of F 1 T F by a F 1 T F b = F 1 T (a + bx + cx2 ) c = F 1 2 a + b(x 1) + c(x 1)2 = F 1 2a + 2b(x 1) + 2c x2 2x + 1 = F 1 (2a 2b + 2c) + (2b 4c) x + 2cx2 2a 2b + 2 c = 2b 4c 2c So the associated matrix is 2 A = 0 0 giving det T = det A = 8. 2 2 0 2 4 2
16.27. det T = 2n 16.28. det T = det A2 . The eigenvalues of T are the eigenvalues of A. The eigenvectors with eigenvalue j are spanned by xj 0 , 0 xj .
16.29. Let y1 and y2 be the eigenvectors of At with eigenvalues 1 and 2 respectively. Then the eigenvalues of T are those of A, with multiplicity 2, and i -eigenspace spanned by t yi 0 , . t 0 yi 16.30. The characteristic polynomial is p () = ( + 1) ( 1) . The 1-eigenspace is the space of polynomials q (x) for which q (x) = q (x), so q (x) = a + cx2 . This eigenspace is spanned by 1, x2 . The (1)-eigenspace is the space of polynomials q (x) for which q (x) = q (x), so q (x) = bx. The eigenspace is spanned by x. Indeed T is diagonalizable, and diagonalized by the isomorphism F e1 = 1, F e2 = x, F e3 = x2 , for which 1 F 1 T F = 1 . 1 16.31.
2
(a) A polynomial with average value 0 on some interval must take on the value 0 somewhere on that interval, being either 0 throughout the interval or positive somewhere and negative somewhere else. A polynomial in one variable cant have more zeros than its degree. (b) It is enough to assume that the number of intervals is n, since if it is smaller, we can just add some more intervals and specify some more choices for average values on those intervals. But then T p = 0 only for p = 0, so T is an isomorphism. 16.33. Clearly the expression is linear in p(z ) and conjugate linear in q (z ). Moreover, if p(z ), p(z ) = 0, then p(z ) has roots at z0 , z1 , z2 and z3 . But p(z ) has degree at most 3, so has at most 3 roots or else is everywhere 0.
17 Fields
17.1. If there were two, say z1 and z2 , then z1 + z2 = z1 , but z1 + z2 = z2 + z1 = z2 . 17.2. Same proof, but with instead of +. 17.6. If p = ab, then in Fp arithmetic ab = p (mod p) = 0 (mod p). If a has a reciprocal, say c, then ab = 0 (mod p) implies that b = cab (mod p) = c0 (mod p) = 0 (mod p). So b is a multiple of p, and p is a multiple of b, so p = b and a = 1. 17.7. You nd 21 as answer from the Euclidean algorithm. But you can add 79 any number of times to the answer, to get it to be between 0 and 78, since we are working modulo 79, so the nal answer is 21 + 79 = 58 17.8. x = 3 17.9. It is easy to check that Fp satises addition laws, zero laws, multiplication laws, and the distributive law: each one holds in the integers, and to see that it holds in Fp , we just keep track of multiples of p. For example, x + y in Fp is just addition up to a multiple of p, say x + y + ap, usual integer addition, some integer a. So (x + y ) + z in Fp is (x + y ) + z in the integers, up to a multiple of p, and so equals x + (y + z ) up to a multiple of p, etc. The tricky bit is the reciprocal law. Since p is prime, nothing divides into p except p and 1. Therefore for any integer a between 0 and p 1, the greatest common divisor gcd(a, p) is 1. The Euclidean algorithm computes out integers u and v so that ua + vp = 1, so that ua = 1 (mod p). Adding or subtracting enough multiples of p to u, we nd a reciprocal for a.
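The reciprocal computation described in 17.7 and 17.9 is the extended Euclidean algorithm. A sketch in Python (our own code; the example at the end inverts 2 modulo 79, which is not the number asked about in 17.7):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def reciprocal_mod(a, p):
    g, u, _ = extended_gcd(a, p)
    assert g == 1                     # a and p must be coprime
    return u % p                      # shift by multiples of p into 0, ..., p-1

assert (2 * reciprocal_mod(2, 79)) % 79 == 1    # 1/2 = 40 modulo 79
```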
459 1): 0 0 1 0 0 1 0 0 1 0 0 1 1 0 1
17.11. Carry out GaussJordan elimination thinking of the entries as living in the eld of rational functions. The result has rational functions as entries. For any value of t for which none of the denominators of the entries are zero, and none of the pivot entries are zero, the rank is just the number of pivots. There are nitely many entries, and the denominator of each entry is a polynomial, so has nitely many roots.
18.6. Each permutation chooses somewhere for the number 1 to go to, for which there are n choices, and then there are n 1 choices left for which it can put the number 2, etc. 18.12. A1 is upper triangular, with 1s down the diagonal, and its entries above the diagonal are polynomials with integer coecients, of degree at most 12.
21 Orthogonal Projections
21.1. Each element x of W yields a linear equation x, y = 0. The vectors y that satisfy all of these linear equations form precisely W . Clearly if we add vectors perpendicular to such a vector x, or scale them, they remain perpendicular to x. 21.2. If we have two vectors, say v1 and v2 in W , then we can write each one uniquely in the form v1 = x1 + y1 and v2 = x2 + y2 . But then
v1 + v2 = (x1 + x2 ) + (y1 + y2 ), writes v1 + v2 as a sum of a vector in W and a vector in W . Uniqueness of such a representation tells us that P (v1 + v2 ) = P v1 + P v2 . A similar argument works for rescaling vectors. 21.3. All vectors in W and in W are perpendicular. So W lies in the set of vectors perpendicular to W , i.e. in W . Conversely, take v any vector in W . Write v = x + y with x in W and y in W . 0 = v, y because v lies in W while y lies in W
= x + y, x + y = x, x + x, y + y, x + y, y = x, x + y, y = x
2
+ y
21.7. It is easy to show that the projection P to any subspace satises P = P 2 = P t . On the other hand, any linear map P with P = P 2 = P t has some image, say W , a subspace of V . Take x + y any vector in V , with x in W and y in W . We need to see that P x = x and P y = 0. Since W is the image of P , and x lies in W , we must have x = P z for some vector z in V . Therefore P x = P 2 z = P z = x, so P x = x. Next, take any vector v in V . Then P y, z = y, P t z = y, P z = 0, since y is perpendicular to the image of P , so to P z . Therefore P y = 0.
and V any other plane, for instance the plane consisting in the vectors
$$x = \begin{pmatrix} x_1 \\ 0 \\ x_3 \end{pmatrix}.$$
22.3. Any vector w in Rn splits uniquely into w = u + v for some u in U and v in V . So we can unambiguously dene a map Q by the equation Qw = Su + T v . It is easy to check that this map Q is linear. Conversely, given any map Q : Rn Rp , dene S = Q|U and T = Q|V . 22.7. The pivot columns of (A B ) form a basis for U + W . All columns are pivot columns just when U + W is a direct sum.
23.8. Clearly x by itself is a string of length 1. Let k be the length of the longest string starting from x. So the string is x, (A ) x, . . . , (A )
k1
x = y.
Lets try to take another step. The next step must be (A ) y. If that vanishes, we cant step. If it doesnt vanish, then we can step, as long as this next step is linearly independent of all of the vectors earlier in the string. If the next step isnt linearly independent, then we must have a relation: c0 x + c1 (A ) x + + ck (A ) x = 0. Multiply both sides with a big enough power of A to kill o all but the rst term. This power exists, because all of the vectors in the string are generalized eigenvectors, so killed by some power of A , and the further down the list, the smaller a power of A you need to do the job, since you already have some A sitting around. So this forces c0 = 0. Hit with the next smallest power of A to force c1 = 0, etc. There is no linear relation, and we can take this next step. The last step y must be an eigenvector, because (A ) y = 0. 23.10. Two ways:(1) There is an eigenvector for each eigenvalue. Pick one for each. Eigenvectors with dierent eigenvalues are linearly independent, so they form a basis. (2) Each Jordan block of size k has an eigenvalue of multiplicity k . So all blocks must be 1 1, and hence A is diagonal in Jordan normal form. 23.11. F = 23.12. 1 F = 0 0 0 1 1 0 0 1 1 , F AF = 0 0 0 1 0 0 0 0 0
1 2 1 2 1 2 1 2 k
, F 1 AF =
0 0
0 2
23.13. Two blocks, each at most n/2 long, and a zero block if needed to pad it out. 23.14. i 4 1 F = 4 i 4 1 4
1 2 i 4 i 4 1 4 i 4 1 4 1 2
0
i 4
i 4 , 0 i 4
i 0 F 1 AF = 0 0
1 i 0 0
0 0 i 0
0 0 1 i
0 1 0 0 0
0 0 1 . 0 0
U =A =
U X UA =
(Note how we can allow ourselves to just play with I and as if they were numbers. Dont write I3 or 5 , i.e. forget the subscripts.) So X = 2 , and X is 3 3. We leave the reader to check that X has strings: e2 e3 , e1
=0
=0
But A = 1 0
Take the start z of each 0-string, here z = e2 and z = e3 , and try to solve U x = V z . But U = 2 shifts labels back by 2. Therefore the strings of A are e4 , e2 e5 , e3 , e1
=0
23.19. Clearly it is enough to prove the result for a single Jordan block. Given an n n Jordan block + , let A be any diagonal matrix, A= a1 a2 .. . an with all of the aj distinct and nonzero. Compute out the matrix B = A( + ) to see that all diagonal entries of B are distinct and that B is upper triangular. Why is B diagonalizable? Then + = A1 B , a product of diagonalizable matrices.
466 x 2 x + 2x + 4x + 4x + 4
4 3 2
2x3 + 5x2 + 4x + 10
x4 + 2 x3 + 4 x2 + 4 x + 4 5 3 x4 2 x 2x2 5x
1 3 2 x + 2 x2 x + 4 5 2 1 3 +x + 5 2x + 4x 2 13 2 4 x
13 2
2x + 5 x +2
2
0 Clearly r(x) = x2 + 2 (up to scaling; we will always scale to get the leading term to be 1.) Solving backwards for r(x): r(x) = u(x)a(x) + v (x)b(x) with u(x) = v (x) = 2 13 1 , 2 5 x2 x + 3 . 2 x
2 13
24.3. Euclidean algorithm yields: 2310 2 990 = 330 990 3 330 = 0 and 1386 1 990 = 396 990 2 396 = 198 396 2 198 = 0. Therefore the greatest common divisors are 330 and 198 respectively. Apply Euclidean algorithm to these: 330 1 198 = 132 198 1 132 = 66 132 2 66 = 0.
Therefore the greatest common divisor of 2310, 990 and 1386 is 66. Turning these equations around, we can write 66 = 198 1 132 = 198 1 (330 1 198) = 2 198 1 330 = 2 (990 2 396) 1 (2310 2 990) = 4 990 4 396 1 2310 = 4 990 4 (1386 1 990) 1 2310 = 8 990 4 1386 1 2310. 24.6. Clearly n n = 0, since shifts each vector of the string en , en1 , . . . , e1 one step (and sends e1 to 0). Therefore the minimal polynomial of n must k divide xn . But for k < n, k n en = enk is not 0, so x is not the minimal polynomial. 24.7. If the minimal polynomial of + is m(x), then the minimal polynomial of must be m(x ). 24.8. m () = 2 5 2 24.10. Take A to have Jordan normal form, and you nd characteristic polynomial det (A I ) = (1 )
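The integer arithmetic in 24.3 can be double-checked with Python's standard library:

```python
from math import gcd

assert gcd(2310, 990) == 330
assert gcd(1386, 990) == 198
assert gcd(330, 198) == 66      # the greatest common divisor of all three numbers
```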
n1
(2 )
n2
. . . (N )
nN
with n1 the sum of the sizes of all Jordan blocks with eigenvalue 1 , etc. The characteristic polynomial is clearly divisible by the minimal polynomial. 24.11. Split the minimal polynomial m(x) into real and imaginary parts. Check that s(A) = 0 for s(x) either of these parts, a polynomial equation of the same or lower degree. The imaginary part has lower degree, so vanishes. 24.12. Let 2k 2k zk = cos + i sin . n n
n By deMoivres theorem, zk = 1. If we take k = 1, 2, . . . , n, these are the so-called n-th roots of 1, so that
z n 1 = (z z1 ) (z z2 ) . . . (z zn ) . Clearly each root of 1 is at a dierent angle. An = 1 implies that (A z1 ) (A z2 ) . . . (A zn ) = 0, so by corollary 24.14, A is diagonalizable. Over the real numbers, we can take A= 0 1 1 , 0
which satises A4 = 1, but has complex eigenvalues, so is not diagonalizable over the real numbers.
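The real matrix at the end of 24.12 is a quarter-turn rotation; with one consistent choice of signs, a quick numerical confirmation that $A^4 = I$ while the eigenvalues are $\pm i$:

```python
import numpy as np

A = np.array([[0., -1.],
              [1.,  0.]])                                     # rotation by a quarter turn
assert np.allclose(np.linalg.matrix_power(A, 4), np.eye(2))   # A^4 = I
print(np.linalg.eigvals(A))                                   # [0.+1.j  0.-1.j]
```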
1 0 0 0 1 0 0 0 1
0 2 0 1 1 0 1 2 1
2 2 0 1 3 0 1 2 1
2 6 0 3 5 0 3 6 1
yields
1 0 0 0 1 0 0 0 1
0 2 0 1 1 0 1 2 1
2 2 0 1 3 0 1 2 1
2 6 0 3 5 0 3 6 1
1 0 0 0 0 0 0 0 0
0 2 0 1 1 0 1 2 1
2 2 0 1 1 0 1 2 1
2 6 0 3 3 0 3 6 3
1 0 0 0 0 0 0 0 0
0 2 0 1 1 0 1 2 1
2 2 0 1 1 0 1 2 1
2 6 0 3 3 0 3 6 3
1 1 Add 1 2 (row 2) to row 4, 2 (row 2) to row 5, 2 (row 2) to row 7, (row 2) 1 to row 8, 2 (row 2) to row 9.
1 0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0
2 6 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0
2 6 0 0 0 0 0 0 0
Cutting out all of the pivotless columns after the rst one, and all of the zero rows, yields 1 0 2 0 2 2
Scale row 2 by 1 2. 1 0 0 1 2 1
The minimal polynomial is therefore 2 + 2 = ( + 1) ( 2) . The eigenvalues are = 1 and = 2. We cant see which eigenvalue has multiplicity 2 and which has multiplicity 1.
If we imagine 1 and 2 as variables, then D jumps (its top right corner changes from 0 to 1 suddenly) as 1 and 2 collide. 24.18. We have seen previously that S and T will preserve each others k generalized eigenspaces: if (S )k x = 0, then (S ) T x = 0. Therefore we can restrict to a generalized eigenspace of S , and then further restrict to a generalized eigenspace of T . So we can assume that S = 0 +N0 and T = 1 +N1 , with 0 and 1 complex numbers and N0 and N1 commuting nilpotent linear maps. But then ST = 0 1 + N where N = 0 N1 + 1 N0 + N0 N1 . Clearly large enough powers of N will vanish, because they will be sums of terms like
j+ k k+ j N1 . 0 1 N0
So N is nilpotent.
27 The Pfaffian
27.1. There are (2n)! permutations of 1, 2, . . . , 2n, and each partition is associated to n!2n dierent permutations, so there are (2n)!/ (n!2n ) dierent partitions of 1, 2, . . . , 2n. 27.2. In alphabetical order, the rst pair must always start with 1. Then we can choose any number to sit beside the 1, and the smallest number not choosen starts the second pair, etc.
(a) {1, 2}
(b) {1, 2}, {3, 4}
    {1, 3}, {2, 4}
    {1, 4}, {2, 3}
(c) {1, 2}, {3, 4}, {5, 6}
    {1, 2}, {3, 5}, {4, 6}
    {1, 2}, {3, 6}, {4, 5}
    {1, 3}, {2, 4}, {5, 6}
    {1, 3}, {2, 5}, {4, 6}
    {1, 3}, {2, 6}, {4, 5}
    {1, 4}, {2, 3}, {5, 6}
    {1, 4}, {2, 5}, {3, 6}
    {1, 4}, {2, 6}, {3, 5}
    {1, 5}, {2, 3}, {4, 6}
    {1, 5}, {2, 4}, {3, 6}
    {1, 5}, {2, 6}, {3, 4}
    {1, 6}, {2, 3}, {4, 5}
    {1, 6}, {2, 4}, {3, 5}
    {1, 6}, {2, 5}, {3, 4}
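The count in 27.1 and the listing recipe in 27.2 can both be checked by a short program (our own code); the counts 1, 3, 15 agree with the lists above.

```python
from math import factorial

def pairings(items):
    """Yield all partitions of `items` into pairs, in the order of 27.2:
    each new pair starts with the smallest unused element."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

for n in (1, 2, 3):
    count = sum(1 for _ in pairings(list(range(1, 2 * n + 1))))
    assert count == factorial(2 * n) // (factorial(n) * 2 ** n)
```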
and thus x and y are dierent vectors in Rn . So some entry of x is not equal to some entry of y , say xj = yj . Let be the function (z ) = zj , i.e. = ej . 28.16. If x + W is a translate, we might nd that we can write this translate two dierent ways, say as x + W but also as z + W . So x and z are equal up to adding a vector from W , i.e. x z lies in W . Then after scaling, clearly sx sz = s(x z ) also lies in W . So sx + W = sz + W , and therefore scaling is dened independent of any choices. A similar argument works for addition of translates.
30 Factorizations
30.1. p 1, 2 2, 1 2, 1 1 0 1 0 1 0 L 0 1 0 1 0 1 0 1 1 0 1 0 U 1 0 0 1 0 1
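The small factorizations tabulated in 30.1 can be reproduced numerically. The sketch below uses scipy on a matrix of our own choosing; note that scipy's convention is $A = PLU$, which may differ from the convention used in the text by inverting the permutation.

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0., 1.],
              [2., 3.]])              # needs one row swap
P, L, U = lu(A)                       # scipy returns A = P @ L @ U

assert np.allclose(P @ L @ U, A)
assert np.allclose(L, np.tril(L)) and np.allclose(np.diag(L), 1.0)   # unit lower triangular
assert np.allclose(U, np.triu(U))                                    # upper triangular
```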
31 Quadratic Forms
31.13. Try 0 Q(x) = x4 +x4 ++x4 2 n 1 2 2 2 if x = 0, otherwise .
x1 +x2 ++xn
474
Hints
(a) Suppose that = T x and = T y , i.e (z ) = b(x, z ) and (z ) = b(y, z ) for any vector z . Then (z ) + (z ) = b(x, z ) + b(y, z ) = b(x + y, z ), so T x + T y = + = T (x + y ). Similarly for scaling: aT x = T ax. (b) If T x = 0 then 0 = b(x, y ) for all vectors y from V , so x lies in the kernel of b, i.e. the kernel of T is the kernel of b. Since b is nondegenerate, the kernel of b consists precisely in the 0 vector. Therefore T is 1-to-1. (c) T : V V is a 1-to-1 linear map, and V and V have the same dimensions, say n. dim im T = dim ker T + dim im T = dim V = n, so T is onto, hence T is an isomorphism. (d) You have to take x = T 1 .
F 1
= =
= (F s)k .
33 Tensors
33.3. x y = 4 e1 e1 + 5 e1 e2 + 8 e2 e1 + 10 e2 e2 + 12 e3 e1 + 15 e3 e2 33.4. Picking a basis v1 , v2 , . . . , vp for V and w1 , w2 , . . . , wq for W , we have already seen that every tensor in V W has the form tiJ vi wJ , so is a sum of pure tensors. 33.11. Take any linear map T : V V and dene a tensor t in V V , which is thus a bilinear map t (, v ) for in V and v in V = V , by the rule t (, v ) = (T v ). (A.1)
Clearly if we scale T , then we scale t by the same amount. Similarly, if we add linear maps on the right side of equation A.1, then we add tensors on the left side. Therefore the mapping taking T to t is linear. If t = 0 then (T v ) = 0 for any vector v and covector . Thus T v = 0 for any vector v , and so T = 0. Therefore the map taking T to t is 1-to-1. Finally, we need to see that the map taking T to t is onto, the tricky part. But we can count dimensions for that: if dim V = n then dim Hom (V, V ) = n2 and dim V V = n.
List of Notation
||x||   The length of a vector   116
⟨x, y⟩   Inner product of two vectors   115
⟨z, w⟩   Hermitian inner product   142
0   Any matrix whose entries are all zeroes   20
1   identity map   153
1   identity permutation   31
A^{-1}   The inverse of a square matrix A   28
A[ij]   A with rows i and j and columns i and j removed   248
A*   Adjoint of a complex matrix or complex linear map A   143
A_{b,i}   The matrix obtained by replacing column i of A by b   175
adj A   adjugate   175
A^{(ij)}   The matrix obtained from cutting out row i and column j from A   51
arg z   Argument (angle) of a complex number   140
A^t   The transpose of a matrix A   61
C   The set of all complex numbers   140
C^n   The set of all complex vectors with n entries   140
det   The determinant of a square matrix   51
dim   Dimension of a subspace   84
e_i   i-th row of the identity matrix   256
e_i   The i-th standard basis vector (also the i-th column of the identity matrix)   25
Hom(V, W)   The set of linear maps T : V → W   255
I   The identity matrix   25
im A   The image of a matrix A   91
I_n   The n × n identity matrix   25
ker A   The kernel of a matrix A   87
p × q   Matrix with p rows and q columns   17
Pf   Pfaffian of a skew-symmetric matrix   245
R^n   The space of all vectors with n real number entries   17
Σ   Sum   22
T^t   Transpose of a linear map   258
T|_W   Restriction of a linear map to a subspace   158
U + V   Sum of two subspaces   197
U ⊕ V   Direct sum of two subspaces   197
V*   The dual space of a vector space V   256
v + W   Translate of a subspace   258
V/W   Quotient space   259
v*   Covector dual to the vector v   299
W⊥   Orthogonal complement   191
|z|   Modulus (length) of a complex number   140
Index
adjoint, 153
adjugate, 185
alphabetical order, see partition, alphabetical order, 254
analytic function, 235
antisymmetrize, 298
argument, 150
associated matrix, see matrix, associated
back substitution, 5
ball, 146: center, 146; closed, 146; open, 146; radius, 146
basis, 81, 86: dual, 266; orthonormal, 123; standard, 82; unitary, 154
Bessel, see inequality, Bessel
bilinear form, 285: degenerate, 286; nondegenerate, 286; positive definite, 287; symmetric, 287
bilinear map, 306
block, Jordan, see Jordan block
Boolean numbers, 174
bounded, 146
box, closed, 146
Cayley–Hamilton, see theorem, Cayley–Hamilton
Cayley–Hamilton theorem, see theorem, Cayley–Hamilton
center of ball, see ball, center
change of variables, 84
change of basis, see matrix, change of basis
characteristic polynomial, see polynomial, characteristic
circle, unit, 151
closed: ball, see ball, closed; box, see box, closed; set, 146
column permutation formula, see determinant, permutation formula, column
combination, linear, see linear combination
commuting, 187, 232
complement, see subspace, complementary
complementary subspace, see subspace, complementary
complex: linear map, see linear map, complex; vector space, see vector space, complex
complex number, 149: imaginary part, 149; real part, 149
complex plane, 149
component, 295
components, see principal components, 273
composition, 163
conjugate, 150
continuous, 146
convergence, 145: of points, 145
covector, 266, 296: dual, 309
de Moivre, see theorem, de Moivre
decoupling theorem, see theorem, decoupling
degenerate bilinear form, see bilinear form, degenerate
determinant, 51: permutation formula, column, 183; permutation formula, row, 182
diagonal of a matrix, 19
diagonalize, 110, 139: orthogonally, 136
dimension: effective, 275; of subspace, 88
direct sum, see subspace, direct sum
distance, 194
dot product, see inner product
echelon form, 18: equation, 9; reduced, 89
effective dimension, see dimension, effective
eigenspace, 109
eigenvalue, 101: complex, 151; multiplicity, 105
eigenvector, 101: complex, 151; generalized, 214
elimination, 3: forward, 4; Gauss–Jordan, 7
fast formula for the determinant, 61
Fibonacci, 27
field, 173
form: bilinear, see bilinear form; volume, 315
form, exterior, 315
forward, see elimination, forward
free variable, 8
fundamental theorem of algebra, 158
Gauss–Jordan, see elimination, Gauss–Jordan
generalized eigenvector, see eigenvector, generalized
Hermitian inner product, see inner product, Hermitian
homomorphism, 265
identity permutation, see permutation, identity
identity map, see linear map, identity
identity matrix, see matrix, identity
image, 95
independence, linear, see linear independence
inequality: Bessel, 203; Schwarz, 193; triangle, 193
inertia, law of, 288
inner product, 119, 170: Hermitian, 152, 171; space, 171, 201
intersection, 207
invariant subspace, see subspace, invariant
inverse, 173: of a matrix, see matrix, inverse; of permutation, see permutation, inverse
isometry, 195
isomorphism, 164
Jordan: block, 214; normal form, 214
kernel, 91
kill, 42, 91
length, 120
line, 194: segment, 194
linear: combination, 72; equation, 3; independence, 81; map, 163; map, complex, 170; map, identity, 163; relation, 81
map, 195: linear, see linear map
matrix, 17: addition, 20; associated, 163; change of basis, 83; diagonal entries, 19; identity, 25; inverse, 28, 43; multiplication, 22; normal, 155; permutation, 32; self-adjoint, 153; short, 42; skew-adjoint, 153, 238; skew-symmetric, 238; square, 17, 28; strictly lower triangular, 35; strictly upper triangular, 36; subtraction, 20; symmetric, 121; unitary, 154; upper triangular, 53
minimal polynomial, see polynomial, minimal
minimum principle, 135
modulus, 150
multilinear, 303
multiplicative function, 224
multiplicity, eigenvalue, see eigenvalue, multiplicity
negative definite, see quadratic form, negative definite
nilpotent, 232
nondegenerate bilinear form, see bilinear form, nondegenerate
normal form, Jordan, see Jordan normal form
normal matrix, see matrix, normal
open ball, see ball, open
orthogonal, 122
orthogonal complement, 201
orthogonally diagonalize, see diagonalize, orthogonally
orthonormal basis, see basis, orthonormal
partition, 253: alphabetical order, 254; associated to a permutation, 253
permutation, 30: identity, 31, 279; inverse, 31; matrix, see matrix, permutation; natural, of partition, 254; product, 31; sign, 182
permutation formula: column, see determinant, permutation formula, column; row, see determinant, permutation formula, row
perpendicular, 120
Pfaffian, 255
pivot, 4, 18: column, 74
plane, complex, see complex plane
polar coordinates, 149
polarization, 310
polynomial, 311: characteristic, 101; homogeneous, 311; minimal, 227
positive definite, see quadratic form, positive definite
positive semidefinite, see quadratic form, positive semidefinite
principal component, 275
principal components, 273
product: dot, see inner product; of permutations, see permutation, product, 31; scalar, see inner product; tensor, see tensor, product
product, inner, see inner product, 119
projection, 201
pure, tensor, see tensor, pure
Pythagorean theorem, see theorem, Pythagorean
quadratic form, 138: kernel, 292; negative definite, 144; positive definite, 144; positive semidefinite, 144
quotient space, 269
radius, of ball, see ball, radius
rank, 9: tensor, see tensor, rank
real, vector space, see vector space, real
reciprocal, 173
reduced echelon form, see echelon form, reduced
reflection, 198
relation, linear, see linear relation
restriction, 168, 232
reversal, 197
rotation, 196, 198
row, permutation formula, see determinant, permutation formula, row
row echelon form, 9
scalar product, see inner product
Schwarz inequality, see inequality, Schwarz
self-adjoint, see matrix, self-adjoint
shear, 189
short matrix, see matrix, short
sign, of a permutation, see permutation, sign
skew-adjoint matrix, see matrix, skew-adjoint
skew-symmetric normal form, 251
spectral theorem, see theorem, spectral
spectrum, 102
square, see matrix, square
standard basis, see basis, standard
string, 214
subspace, 77: complement, see subspace, complementary; complementary, 207; direct sum, 207; invariant, 232; sum, 207
substitution, back, see back substitution
sum, of subspaces, see subspace, sum
sum, direct, see subspace, direct sum
Sylvester, 288
symmetric, matrix, see matrix, symmetric
symmetrize, 298
tensor, 295, 304: antisymmetric, 298; contraction, 297; contravariant, 310; covariant, 310; product, 297; pure, 305; rank, 306; symmetric, 298
tensor product: basis, 305; of tensors, 304; of vector spaces, 304; of vectors, 304
theorem: Cayley–Hamilton, 187, 230; de Moivre, 150, 157; decoupling, 110; Pythagorean, 120, 202; spectral, 136
trace, 105, 237, 248, 298
translate, 268
transpose, 64, 268
transposition, 32
transverse, 210
triangle inequality, see inequality, triangle
unit circle, see circle, unit, 151
unitary: basis, see basis, unitary; matrix, see matrix, unitary
variable, free, see free variable
vector, 17, 161
vector space, 161: complex, 170; real, 170
weight of polynomial, 246