308 Notes
308 Notes
308 Notes
Contents
1 Linear Systems 3
1.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Linear Systems and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.1 Traffic flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.2 Data fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Euclidean space 17
2.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Geometry of vector operations . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 Ax = b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Spans and linear independence . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Matrices 27
3.1 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.2 What does T tell us? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Geometry of Linear Transformations . . . . . . . . . . . . . . . . . . . . . 31
3.2 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.1 The basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.2 Transpose of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.3 Powers of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Finding A1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1
4 Subspaces 45
4.1 Introduction to subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.1 Span and nullspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.2 Kernel and range of linear transformations . . . . . . . . . . . . . . . . . 49
4.2 Basis and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2 Finding a basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2.3 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Row and Column Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5 Determinants 61
5.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.1 A Formula for the Determinant . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.2 Applications of the determinant . . . . . . . . . . . . . . . . . . . . . . . . 65
2
Introduction
Linear algebra is ubiquitous with advanced applications of math. Here are some examples of
applications, that we will talk about more during the quarter.
Ch. 1 Scheduling and optimization: Lufthansa pilots, flight attendants, cleaning, maintenance,
schedules.
Ch. 2 Video Games and computer animation: quickly update a 3d scene by rotating and scaling
vectors.
Ch. 4 Computer vision: rectify points in stereo vision.
Ch. 6 Google PageRank : determine how to list search results.
1 Linear Systems
1.1 Systems of Linear Equations
Lets motivate linear algebra with an example.
Example 1.1. Xavier is two years older than Yang. Their ages sum to 6.
x+y =6
x y = 2.
There are three ways to think about this: substitution, doing algebra with equations, or
graphically. Many of you are probably used to solving this with substitution. For example, if
we solve the second equation for x, we find that
x = y + 2.
Now, we can plug this into the first equation to get
y+2+y =6
which means y = 2, and x = 4. There is nothing wrong with this method, but well soon be
trying to solve more complicated systems (with lots of variables and lots of equations), so we
might want to focus on a different method.
We could solve this equation in another way by doing a little equation algebra. For instance,
if we just add the first equation to the second, we get
x+y+xy =6+2
or
2x = 8
which immediately tells us that x = 4. We can then solve for y to see y = 2 as well. This is the
process we are going to formalize in this course.
Finally, another beneficial way to do this would be to graph the linear system. The solution
is the intersection of two lines. We know how to graph lines and can just sketch them to find
the solution. Well come back to this throughout the course.
3
What if we wanted to solve a harder question?
Example 1.2. Ling, an engineer, was just hired to orchestrate the upkeep of 7 bridges in the
San Francisco area. She has a budget of $700,000. Older bridges cost more per day to repair.
In fact, the nth oldest bridge costs n thousand dollars per day. They need more work too. She
plans to spend 2 days more on each bridge than the one built just before it. How many days
does she allocate to each bridge?
What if the same scheme was kept, but instead Ling had 100 bridges and $770,000,000
dollars? How many days should the first bridge get? How could you program a computer to
solve this? (answer 10 days)
What if Ling had M bridges and 5M (M + 1) thousand dollars? How many days should the
first bridge get? Can a computer even solve this question? (answer 76 days)
Well come back to this later. The course will be focused on studying these equations and
systems of these equations. Lets begin with the terminology.
Notation 1.3. Rather than x and y we might have a lot of variables. So we use x1 , x2 , . . . , xn
denote different variables. Other ways they might appear are as xi , yi , wi , zi . We also let
a1 , . . . , an R denote constants. Here ai R means ai is an element of the set of real numbers.
a1 x1 + a2 x2 + + an xn = b
Example 1.5.
2x1 + 3x2 + x3 = 6
is an example of a linear equation with 3 variables.
Definition 1.7. A solution to a linear equation is a set of numbers (s1 , s2 , . . . , sn ) that satisfies
the equation.
2x1 + 3x2 + x3 = 6,
4
we get
2+3+1=6
which shows that (1, 1, 1) is a solution to the equation.
Definition 1.9. The solution set to a linear equation is the set of all solutions to a given
equation.
Notice the double subscripts; aij means ith down and jth over when arranged in standard form.
Can you tell that x1 = 17? Later well introduce a quick way to solve this system.
Example 1.12.
2x1 + 3x2 + x3 = 6
x1 x2 x3 = 2
2x1 + x2 7x3 = 1
Definition 1.13. A solution to a linear system is a set of numbers (s1 , s2 , . . . , sn ) that simul-
taneously satisfy all of the equations in the linear system.
Definition 1.14. We say a linear system is consistent if it has at least one solution. We say
it is inconsistent if it has no solutions.
5
Keep in mind that there can be more than one solution to a linear system; for example the
system
2x1 + 3x2 = 6
4x1 + 6x2 = 12
has infinitely many solutions (because it is two multiples of the same equation).
We can also use the graph perspective to analyze solution sets.
In the example of two lines intersecting (solving a system of two equations with two variables),
we know the lines could either intersect in one point (meaning there is exactly one solution), or
intersect nowhere by being parallel (meaning there are no solutions), or be the same line (meaning
there are infinitely many solutions). This is our first indication of the following theorem, which
well prove soon.
Theorem 1.15. A linear system has no solutions, exactly one solution, or infinitely many
solutions.
Two questions: Why is linearity important here? And, how do we actually find these solu-
tions? We want to develop an algorithm to systematically find the solutions. First, lets start
with a few special types of systems.
Example 1.16. Solve the system
x1 + 2x2 + 2x3 = 2
x2 x3 = 1
x3 = 2.
This triangular system is something we can solve by back-substitution (i.e., start at the
bottom and substitute upwards to find all the variables). We start at the bottom, where x3 = 2,
and then plug that in to the second equation to find that x2 2 = 1, or x2 = 3, and then finally
plug these into the first equation to find x1 = 8. Therefore, we find the (unique) solution is
(8, 3, 2).
6
Definition 1.17. The above system is an example of a triangular system, where the linear
system has n equations and n variables, and the first nonzero term of the k-th equation is the
k-th variable.
We can see this in the above example, and it is true in general because back-substitution
will always give us that solution.
This system is not triangular (there are more variables than equations) but we can still try
to solve it with back substitution. The last equation tells us that x4 = 3, and if we try to plug
that in to the second equation, we see that x2 = x3 1 and that we have no information about
x3 !
Because there are no restrictions on x3 , that means it is a free variable and can take any
value. We assign it a free parameter (a letter representing any number we want), says x3 = s.
Then, x2 = s 1 and we can plug all of this into the first equation to find x1 = 2 + s. In other
words, the solutions are
(2 + s, 1 + s, s, 3)
where s R.
Definition 1.20. The above system is an example of a system in echelon form. We say a
system is in echelon form if
(a) Every variable is the leading variable (the first nonzero term) of at most one equation.
(b) The system is organized in a stair-step pattern, meaning the leading varibles move down
and to the right.
2x1 4x2 = 11
x1 x2 = 5.
Theorem 1.22. A system in echelon form always has at least one solution.
This is true for the same reason as the analogous fact about triangular systems; we can find
the solutions by back-substitution.
7
1.2 Linear Systems and Matrices
In the last section, we saw that if a system was in echelon form, we could solve it with back
substitution. In this section, well figure out how to convert systems into echelon form to easily
find solutions. To do that, were going to introduce matrices.
Definition 1.23. A matrix is a rectangular array of numbers.
Example 1.24.
1 2 3 1 2
and
3 4 1 4 7
are examples of matrices.
Notation 1.25. We will use both hard [] and soft () brackets for matrices. They both mean
the same thing.
Each linear system corresponds to two matrices.
Definition 1.26. The linear system
8
We can use the matrix to systematically find solutions to systems of equations. First, well
introduce echelon and reduced echelon forms, and then elementary operations.
10 7 20 4
A = 3 2 6 1
1 4 2 9
Definition 1.28. A matrix is in echelon form if
(a) the leading terms (the first nonzero entries of each row) move down and to the right in a
stair step pattern.
(b) any rows of zeros are grouped at the bottom of the matrix.
7
2 25
1 10
echelon(A) = 0 1 0 2
0 0 0 0
Definition 1.29. A matrix is in reduced echelon form if
(c) each leading term is the only nonzero term in its column
1 0 2 1
reduced echelon(A) = 0 1 0 2
0 0 0 0
Definition 1.30. Elementary operations are operations that we can do to a linear system that
dont change its solution set. There are three of them:
Now, we will outline an algorithm for converting a matrix into an equivalent matrix in reduced
echelon form, and see how that gives us the solutions to the corresponding linear system. We
will outline the process with an example.
9
1. Write the linear system as the corresponding augmented matrix.
In the example,
2 4 6 2
1 1 2 1 .
1 3 5 2
2. If necessary, interchange two rows to get a nonzero entry in the upper left corner. Multiply
the first row by a constant to make that entry a 1.
In the example, this means multiplying the first row by -1/2,
1 2 3 1
1 1 2 1 .
1 3 5 2
3. Add multiples of the first row to the other rows to zero out the rest of the column.
In the example, this corresponds to adding R1 to R2 and adding R1 to R2 ,
1 2 3 1
0 1 1 2 .
0 1 2 1
4. Move over one column and down one row and repeat.
In this example, we move to the entry in the second row and second column. Its already
a 1, so we proceed to step 3 to zero out the rest of the second column by adding 2R2 to the
first row adding R2 to R3 ,
1 0 1 3
0 1 1 2 .
0 0 1 1
Now, we move down and over again to the third row and third column. Its also a 1, so
we proceed to step 3 and zero out the rest of the column,
1 0 0 2
0 1 0 3 .
0 0 1 1
x1 = 2
x2 = 3
x3 = 1
so the solution is (2, 3, 1). Its always a good idea to check this!
10
This process will work for any system.
1R1
1 3 9 7
1 2 6 5
2 7 21 16
1R1 + R2
2R1 + R3
1 3 9 7
0 1 3 2
0 1 3 2
1R2
1 3 9 7
0 1 3 2
0 1 3 2
1R2 + R3
1 3 9 7
0 1 3 2
0 0 0 0
Reduced echelon form:
1 0 0 1
0 1 3 2
0 0 0 0
Example 1.33. No solutions.
10 3 5
3 1 3
5 2 4
1/10R1
3
21
1 10
3 1 3
5 2 4
3R1 + R2
5R1 + R3
3
21
1 10
1
0
10 29
0 12 13
2
11
10R2
3
21
1 10
0 1 45
0 12 13
2
1/2R2 + R3
3
21
1 10
0 1 45
0 0 16
1/16R3
3
21
1 10
0 1 45
0 0 1
Reduced echelon form:
1 0 0
0 1 0
0 0 1
Lets make a few observations about systems in reduced echelon form.
1. In reduced echelon form, if the last row looks like [0 . . . 0 | 1], that translates to the
equation 0 = 1, which is FALSE, so the system has NO SOLUTIONS.
2. If there is not a row like that, then either each column has a leading one, meaning there
is exactly one solution, or there is at least one column without a leading one, meaning it
is free and can take any value, so there are infinitely many solutions.
3. Combining the two statements above, weve actually proven Theorem 1.11, because these
are the only possibilities in reduced echelon form.
4. If there are more variables than equations (more columns than rows in the matrix), then
each column CANNOT have a leading one. This means the system has either no solutions
or infinitely many, but it cannot have exactly one.
Now we introduce special type of linear systems that ALWAYS have solutions.
Definition 1.34. A linear system is homogeneous if all of the constant terms are equal to 0,
i.e. it looks like
Note that these systems always have a solution, namely (0, 0, . . . 0), called the trivial so-
lution. There may be (infinitely many) other solutions, but we at least know the system is
consistent.
12
1.3
SKIP
1.4 Applications
1.4.1 Traffic flow
Arcata, on the northern coast of California, is a small college town with a central plaza (Figure
1). Figure 2 shows the streets surrounding and adjacent to the towns central plaza. As indicated
by the arrows, all streets in the vicinity of the plaza are oneway.
Figure 2 Traffic volumes around the Arcata plaza. Traffic flows north and south on G and H
streets, respectively, and east and west on 8th and 9th streets, respectively. The number of
cars flowing on and off the plaza during a typical 15-minute period on a Saturday morning is
also shown. Our goal is to find x1 , x2 , x3 , and x4 , the volume of traffic along each side of the
plaza. The four intersections are labeled A, B, C, and D. At each intersection, the number of
cars entering the intersection must equal the number leaving. For example, the number of cars
entering A is 100 + x1 and the number exiting is 20 + x2 . Since these must be equal, we end up
with the equation
A : 100 + x1 = 20 + x2
Applying the same reasoning to intersections B, C, and D, we arrive at three more equations,
B : x3 + 30 = x1 + 100
C : x2 + 25 = x3 + 95
D : x3 + 75 = x4 + 15.
Rewriting the equations in the usual form, we obtain the system:
13
To solve the system, we populate an augmented matrix and transform to echelon form.
1 1 0 0 80
1
0 0 1 70
0 1 1 0 70
0 0 1 1 60
1R1 + R2
1 1 0 0 80
0
1 0 1 10
0 1 1 0 70
0 0 1 1 60
1R2 + R3
1 1 0 0 80
0
1 0 1 10
0 0 1 1 60
0 0 1 1 60
1R3
1 1 0 0 80
0
1 0 1 10
0 0 1 1 60
0 0 1 1 60
1R3 + R4
1 1 0 0 80
0
1 0 1 10
0 0 1 1 60
0 0 0 0 0
Reduced echelon form:
1 0 0 1 70
0
1 0 1 10
0 0 1 1 60
0 0 0 0 0
The last coordinate x4 = s1 is free. Back substitution yields the general solution
x1 = 70 + s1 , x2 = 10 + s1 , x3 = 60 + s1 , x4 = s1 .
where s1 is a free parameter. A moments thought reveals why it makes sense that this system
has infinitely many solutions. There can be an arbitrary number of cars simply circling the plaza,
perhaps looking for a parking space. Note also that since each of x1 , x2 , x3 , and x4 must be
nonnegative, it follows that the parameter s1 70. The analysis performed here can be carried
over to much more complex traffic questions, or to other similar settings, such as computer
networks.
14
1.4.2 Data fitting
Given a set of ordered pairs we want to find a curve that goes through all of them.
Example 1.35. Find a quadratic through (0, 1), (1, 2) and (2, 7).
y = a1 x2 + a2 x + a3 .
So we have a system:
a3 = 1
a1 + a2 + a3 = 2
4a1 + 2a2 + a3 = 7
0 0 1 1
1 1 1 2
4 2 1 7
R1 R2
1 1 1 2
0 0 1 1
4 2 1 7
4R1 + R3
1 1 1 2
0 0 1 1
0 2 3 1
R2 R3
1 1 1 2
0 2 3 1
0 0 1 1
1/2R2
1 1 1 2
0 1 3 1
2 2
0 0 1 1
Reduced echelon form:
1 0 0 2
0 1 0 1
0 0 1 1
Example 1.36. Find the plane equation going through (1, 1, 0), (1, 2, 1) and (0, 0, 1).
a1 x1 + a2 x2 + a3 x3 b = 0.
1 1 0 1 0
1 2 1 1 0
0 0 1 1 0
15
1R1 + R2
1 1 0 1 0
0 1 1 0 0
0 0 1 1 0
Reduced echelon form:
1 0 0 2 0
0 1 0 1 0
0 0 1 1 0
a1 = 2b, a2 = b, a3 = b.
So we have infinitely many solutions. But what is one solution? Well, set b equal to anything
nonzero. For instance, if b = 1 then we have
2x1 x2 + x3 1 = 0
16
2 Euclidean space
2.1 Vectors
Definition 2.1. A vector is an ordered list of real numbers (called components) arranged in
a vertical column or in wedge brackets h i.
Example 2.2.
u1
u2 1
2 h1, 2, 3i
..
.
3
un
We need to know where vectors live.
Notation 2.3. We write a R to mean that a is a real number. Often we call a a scalar.
Vectors on the other hand live in higher-dimensional space. The set of all n-dimensional vectors
is denoted by Rn , here R is the set of all real numbers. We write ~u Rn .
u1 v1
u2 v2
Definition 2.4. Let ~u, ~v Rn . That is ~u = . and ~v = . .
.. ..
un vn
~u = ~v ui = vi for all 1 i n.
u1 + v1
u2 + v2
~u + ~v =
..
.
un + vn
u1 cu1
u2 cu2
c~u = c . = .
.. ..
un cun
3 3 11
Example 2.5. 2 + 2 2 = 4
0 0 6
Vectors have all the nice properties that we associate to scalars. Like c(~u~v )+w ~ uc~v .
~ = w+c~
See the book for a list and proof of all of these properties. There are other vector operations.
You might remember the dot product from multi-dimensional calculus.
Example 2.7.
3 3
2 2 = 9 + 4 + 1 = 14.
1 1
17
c1
.
Notation 2.8. The dot product allows us write linear equations compactly. Fix b. Call ~c = ..
cn
x1
.
and ~x = .. then
xn
c1 x1 + cn xn = b is the same as ~c ~x = b.
We are interested in something that looks similar to a dot product, but involves summing
scalars times vectors.
Definition 2.9. Given c1 , . . . , cm R a linear combination of vectors ~u1 , . . . , ~um is the vector
c1 ~u1 + cm ~um .
Notation 2.10. Given vectors ~u1 , . . . , ~um it is convenient to write the matrix with columns
corresponding to these vectors as [~u1 ~um ].
We wont elaborate on this much now, but this notation allows us to concisely describe a
linear combination.
b1 c1
. .
Fact 2.11. Fix ~b = .. . Call ~c = .. and A = [~u1 ~um ] with ~ui Rn . We will see
bm cn
later that a linear combination is the same as A~c, and that a system of equations is the same as
A~c = ~b.
General solutions can be written in terms of linear combinations of vectors. This is called
the vector form of the general solution.
x1 = 70 + s1 , x2 = 10 + s1 , x3 = 60 + s1 , x4 = s1 .
0 1
Well see that there are several other ways to represent solutions.
18
Application 2.13. Vector graphics such as PDF and CAD use vectors to make scalable
images. Rather than recording image pixel data, these record everything as either a line, circle,
ellipse, curve with start and end points. Rescaling is then multiplying by a constant. So vector
graphics can be scaled infinitely! Basically the file has 1000s of instructions that read like:
Points in a vector graphic are described by a vector describing their position from the
origin.
Connect the points together with a straight line (or curve) of a certain thickness
19
Example 2.14. This is stored as a vector graphic consisting of three lines. Each line is
stored as a vector and a base point for the vector. Suppose that the base of the arrow is at the
origin and has length 4.
Describe the vector and base point of the three lines in the arrow. (There are many choices
that could work.) Then, rescale it to be 4 times larger in size. What would make it point left
instead of right?
4
Solution: The line from base to tip is rooted at (0, 0) with the vector . The two points
0
1 1
of the arrows are the vectors , based at (4, 0). To rescale we multiply everything by
1 1
4. Multiplying everything by 1 would change the direction.
2.2 Span
Thinking of vectors as building blocks. The span is the set of all points we can build with our
vectors. Better yet, think of vectors like your favorite smoothie ingredients. Imagine you want
to blend them together to hit a certain nutrient profile (fat, protein, carbs, iron, calcium, . . .).
Studying the span asks what profiles we can make with the vectors and which ones we cannot.
We expand on this here with an example in R2 .
2.2.1 Introduction
x1
x2 n
Question 2.15. How many vectors, ~u1 , ~u2 , . . . , ~um does it take so that for any ~x = ... in R
xn
we can find c1 , c2 , . . . , cm R so that c1 ~u1 + c2 ~u2 + + cm ~um = ~x?
More succinctly, how many vectors do we need to express every point in Rn as a linear
combination?
2 1 1
Example 2.16. Lets look at R . Imagine we have vectors ~u = and ~v = and we want
2 0
to use them to get to the point (5, 6). With ~v we can adjust our horizontal position as we please.
3
So, we first use ~u to get to height 6. This is at 3~u = . Next, we need to shift back by 1 in
6
20
the x-coordinate. This requires ~v . So
5
3~u ~v = .
6
In this manner we can express any point in R2 using ~u and ~v . To see this, take (x, y) R2
and solve
x
c1 ~u + c2~v = .
y
This is equivalent to
c1 + c2 = x
2c1 = y.
Given a set of vectors in Rn we are interested in the set of all points we can make with linear
combinations. This is called the span.
Definition 2.18. The span of vectors ~u1 , . . . , ~um Rn is the set of all linear combinations
where c1 , . . . , cm R.
2.2.2 Properties
Whether or not a matrix has a solution tells us whether a vector belongs to the span.
Theorem 2.19. Let ~u1 , . . . , ~um Rn . Then v span{~u1 , . . . , ~um } if and only if the matrix
equation [~u1 ~um | ~v ] has at least one solution.
21
1 1 1
Example 2.21. Is span 1 , 0 , 0 = R3 ?
2 1 0
Call these vectors ~u1 , ~u2 , ~u3 . Given (a, b, c) R3 we want to find x1 , x2 , x3 so that
a
x1 ~u1 + x2 ~u2 + x3 ~u3 = b .
c
This gives matrix
1 1 1 a 1 0 0 b
1 0 0 b 0 1 0 2 b + c
2 1 0 c 0 0 1 a 3b + c
Yes!
1 3 2
Example 2.22. Is span 1 , 1 , 1 = R3 ?
1 1 0
Lets setup a matrix and reduce.
1 0 21 12 a + 32 b
1 3 2 a
1 1 1 b 0 1 12 1
2a 2b
1
1 1 0 c 0 0 0 a 2b + c
No! It is described by the plane a 2b + c = 0.
Theorem 2.23. If m < n then ~u1 , . . . , ~um Rn do not span Rn . If m n then maybe they
span Rn .
Proof: Idea for m < n: Row reduce to get to a row of all zeros. Keep track of the operations.
Augment, setting the all zero row equal to 1. Reverse the operations. The new augmented
vector is not in the span. Heres a worked out example from the book:
22
For m n. The vectors ei = (0, . . . , 1, 0, . . . , 0) (1 in the ith place) span Rn . But if was take
2n copies of just say e1 , this doesnt span Rn .
2.2.3 Ax = b
With spans we are interested in solving vector equations. It will be useful to have a compact
way to express these.
Definition 2.24. Let ~a1 , . . . , ~am Rn be vectors. Set A = [~a1 ~am ]. Let ~x = (x1 , . . . , xm )
with xi R. We have
A~x = x1 ~u1 + xm ~um .
Example 2.25. Kyle pays very close attention to what he eats post workout, and his macronu-
trient needs vary greatly from week to week. He typically blends a smoothie with yogurt, banana,
avocado and tofu. These have macro-nutrient ratios per 100g as
Y B A T
Protein 9 1 2 5
.
Fat 5 1 15 3
Carb 4 22 1 2
x1
x2
If A is the nutrition matrix, then ~x =
x3 is the recipe Kyle follows, and A~x is the resulting
x4
smoothie.
Note that A needs to have the same number of columns as ~x has rows.
0 0 1
Example 2.26. Let A = 1 1 1 . The system
4 2 1
1
A~x = 2
7
23
Definition 2.27. A set of vectors ~u1 , . . . , ~um Rn is linearly independent if the only solution
to
x1 ~u1 + + xm ~um = ~0
is the trivial solution ~x = ~0 (x1 = 0, x2 = 0, . . . , xm = 0). Otherwise the set is linearly
dependent. Another way to say this is that the homogeneous equation A~x = 0 has only
one solution.
1 1 1
Example 2.28. Is 0 , 1 , 1 linearly independent?
0 0 2
24
Say that (y, b, a, t) is the amount of each ingredient he uses. He reduces this matrix:
y b a t 745
9 1 2 5 1 0 0 1261 0
0 1 0 12 0 .
9 1 15 3
1261
2
0 0 1 13 0
4 22 1 2
Letting s = t, this has solution set
745 12 2
( s, s, s, s).
1261 1261 13
Set s = 1261 and we have
Theorem 2.31. If n < m, then ~u1 , . . . , ~um Rn are linearly dependent. If m n, then they
might be linearly independent.
Theorem 2.32. Let {~u1 , . . . , ~um } be a set of vectors in Rn . The set is linearly dependent if
and only if one of the vectors is in the span of the other vectors.
25
1 1 0 2
0 , , , 0 linearly dependent?
2 1
Example 2.33. Is the set of vectors
3 1 2
0
1 0 1 4
Answering this question corresponds to solving a homogeneous system, and to save time
we may omit the column of zeros in our augmented matrix. Converting the matrix to reduced
echelon form,
1 1 0 2 1 0 0 0
0 2 1 0 0 1 0 2
0 3 1 2 0 0 1 4 .
1 0 1 4 0 0 0 0
Letting x4 = t, the solution set to x1 ~u1 + x2 ~u2 + x3 ~u3 + x4 ~u4 = ~0 is (0, 2t, 4t, t). In particular,
this tells us that ~u1 is not in span{~u2 , ~u3 , ~u4 }. This example shows an important caveat of the
above theorem, that a set being linearly dependent only implies that some vector is in the span
of the others, but this need not be true for every vector.
Theorem 2.34. Let{~u1 , . . . , ~um } be a set of vectors in Rn . Then {~u1 , . . . , ~um } is linearly
independent if and only if the system [~u1 ~un : ~v ] has at most one solution for every ~v Rn .
So geometrically speaking, if its possible to reach a vector ~v with a linearly independent set
{~u1 , . . . , ~um }, then the displacement along ~u1 , . . . , ~um is always the same.
When m = n, we have a much stronger statement.
Theorem 2.35. Let {~u1 , . . . , ~un } be a set of vectors in Rn . The following are equivalent:
26
3 Matrices
Lets review the story so far. We have been interested in solving systems of linear equations
A~x = ~b where we try to solve for ~x in terms of A and ~v . We studied linear independence and
span to understand better when we have solutions. To do this we introduced a bit of matrix
algebraexplaining how a matrix acts on a vector. Here we look at this from a more geometric
viewpoint. Lets start with an example.
Example 3.1. You have a picture of a cat that is winking, and want it to cock its head to the
side in a cute way. How do you animate this?
Example 3.2. For an art project you are going to film the shadows of two statues over the
course of a 1 hour period. You want the statues to give each other a high-5 at the end of the
hour.
You need to calculate how the shadows move across the (say perfectly flat) pavement. The
sun at any given instant can be thought of as a function from R3 to R2 . We are interested in
characterizing properties of this function.
27
3.1 Linear Transformations
When we take an n m matrix A and multiply it by a vector in ~v Rm we get a linear
combination of the column vectors in A. This new vector, A~v , lives in Rn . So, we can think of
A as a function that takes vectors in Rm to vectors in Rn .
Example 3.4.
1 2 1 x1 2
A = 0 1 3 , ~x = x2 ,
~b = 0 .
1 2 5 x3 7
Fact 3.5. A nice general form of the above is if A = [~a1 ~am ] then
Definition 3.7. The domain of a function is what can be inputted in the function. The image
is the result of the function. We call the range the set of images. The codomain is the space
where images live.
Notation 3.8. We denote the domain and codomain of a function f by: Dom(f ), and Cod(f ).
Also, we let Range(f ) denote the range.
3.1.1 Basics
Definition 3.9. A function T : Rm Rn is a linear transformation if for all vectors ~u, ~v Rm
and scalars r we have
Fact 3.10. T (~0) = ~0. This is because of linearity: T (~u) = T (~u + 0) = T (~u) + T (~0).
28
x+y
x
Example 3.11. T = x is a linear transformation that takes R2 R3 , but
y
y
x xy
T = is not a linear transformation because it isnt linear.
y y
This may look daunting, but it is a property satisfied by matrices. That is for any n m
matrix A and vectors ~x, ~y Rn we have
1. A(~x + ~y ) = A~x + A~y
2. A(r~x) = r(A~x).
This will be our focus for this chapter. We record this fact as a theorem.
Theorem 3.12. Let A = [~a1 ~am ] be a matrix with ~ai Rn and define T (~x) = A~x. Then
T : Rm Rn is a linear transformation.
1 2
Example 3.13. Let A = 2 3 and define T (~x) = A(~x). Then T : R2 R3 is a linear
0 1
2 3 1
transformation. The domain is R and the range is R . Let ~u = . Then
3
1 2 7
A~u = 1 2 + 3 3 = 7 .
0 1 3
7 a
So 7 is the image of ~u under T . But is any vector b in the range of T ? That is, does
3 c
a
x
there exist ~x = 1 such that T (~x) = b ? Well this would mean
x2
c
" # x 2x x 2x a
1 2 x
1 2 1 2
2 3 1 = ?
2x1 + 3x2 = 2x1 + 3x2 = b .
0 1 x2
0 x2 x2 c
No, x2 = c, x1 = (b 3c)/2. So a = (b 3c)/2 2c, which means it isnt free.
Or, we could have row reduced this system and seen it was inconsistent for some values of
a
b .
c
1 2 a
0 1 27 a + 17 b
0 0 27 a 17 b + c
Theorem 3.14. Let A = [~a1 ~am ] be an n m matrix. and let T : Rm Rn with T (~x) = A~x
a linear transformation. Then,
~ Range(T ) if and only if T (~x) = Aw
1. w ~ is a consistent linear system.
2. Range(T ) = span{~a1 , . . . , ~am }.
29
3.1.2 What does T tell us?
Theorem 3.15. For any linear transformation T there is a matrix A so that T (~x) = A~x.
0
.
Proof: Let ~ei = ..1 with a 1 in the ith place. The matrix A = [T (~e1 ) T (~em )] is the matrix
..
.
we want. Any ~x Rm can be written as x1~e1 + xm~em . We then have
So, we are interested in properties of T . Here are a few things one can say about any function:
Definition 3.16. We say that T is one-to-one if T (~u) is unique for all ~u in the domain. We
say that T is onto if every ~v in the codomain is equal to T (~u) for some ~u in the domain.
Theorem 3.18. Let A = {~a1 ~an } and T : Rn Rn be given by T (~x) = A~x. The following
are equivalent:
1. A spans Rn .
2. A is linearly independent.
4. T is onto.
5. T is one-to-one.
30
3.1.3 Geometry of Linear Transformations
There are three basic transformations: Dilations, Rotations and Translations. Lets look at
these three examples in more depth.
Before we can describe a dilation, we need some notion of the size of a vector.
p
Definition 3.19. Given ~x Rn we let k~xk = x21 + x22 + + x2m .
Example 3.20. A dilation by a factor of r R+ is just the map T (~x) = r~x. Notice that
kr~xk = rk~xk so we
are indeed scaling by
r. Can
you think of the matrix attached to this map?
r 0 x1 rx1
It is Ar = . This is because Ar = . Notice that his works in any dimension!
0 r x2 rx2
Also, note that if we want to scale in two directions differently we multiply by the matrix
r 0
Ar,s = .
0 s
Some food for thought... what happens if r < 0? Well, we still scale, but something else
geometric also happens.
2 cos sin
Example 3.21. A rotation of every vector by in R is given by the matrix A = .
sin cos
Lets see why with the vector h1, 0i
1 cos ()
A = .
0 sin ()
x
If we think about a general vector then the trig is a little trickier. This is a CAMPS exercise.
y
If youre curious, here is the resulting vector:
x cos () sin () x x cos () y sin ()
A = = .
y sin () cos () y y cos () + x sin ()
Say we want to rotate in Rd about one of the axes x, y, or z by . We use the matrices
(respectively):
1 0 0 cos 0 sin cos sin 0
Ax, = 0 cos sin Ay, = 0 1 0 Az, = sin cos 0 .
0 sin cos sin 0 cos 0 0 1
31
More complicated rotations are possibly by multiplying these matrices together, but we wont
get into that now.
x
Example 3.22. Say we want to translate a vector, R2 , over 3 and up 2. The natural
y
x x+3
thing to try would be T = . But this isnt a linear transformation, because
y y+3
T (r~x) 6= rT (~x). It turns out that to make this work we need to think of R2 as a plane sitting
inside R3 .
1
Thinking of vectors as points, lets start with the vector ~e1 = . We would like to translate
0
it over 3 and up 2 to get
3
.
2
Weve seen that this isnt possible with a linear transformation from R2
R2 . The fix is to go
x
3 x
up a dimension and map the plane into Z via the function f = y . Note this isnt a
y
1
~ ~
linear transformation because f (0) 6= 0. Can you see where in R this takes R2 ? This gives
3
1
f (~e1 ) = 0 .
1
1 0 a
Now consider the matrix Aa,b = 0 1 b .
0 0 1
We have
x 1 0 a x x+a
Aa,b y = 0 1 b y = y + b
1 0 0 1 1 1
This works because this is whats called a shear. Think of pushing the sides of this slinky:
32
3.2 Matrix Algebra
3.2.1 The basics
Definition 3.23. Let c R. Let A = [~a1 ~am ] with ~ai Rn . We can write the row vectors of
A as ~ar1 , ~ar2 , . . . , ~arn . This lets us write A alternatively as
r
~a1
~ar2
A= ... .
~arm
Note that A is an n m matrix. Let B = [~b1 ~bk ] with ~bi Rm . Notice that B is an m k
matrix. We define addition, scalar multiplication and matrix multiplication:
1. A + B = [~a1 + ~b1 ~am + ~bm ] This only works if A and B have the same
2. cA = [c~a1 c~am ]
Fact 3.24. Say that A is n m and B is j k. We can only multiply A and B if m = j. The
resulting matrix is n k.
Example 3.26.
1 2 3
1 1 2 4 8 3
1 2 0 = .
1 0 1 0 0 3
1 2 0
Note that if we switched the matrices then the product would be undefined.
W (~x) = T (S(~x)).
then
W (~x) = A(B~x) = (AB)~x.
33
Example 3.29. Let T : R2 R2 be the linear transformation that rotates a vector ~x be . Let
cos sin
A = .
sin cos
T T : R2 R2
2. A(B + C) = AB + AC
3. (A + B)C = AC + BC
5. AI = A
6. IA = A
The one we choose presently concerns identity matrices.
Theorem 3.33. Let A be an n m matrix, then
In A = A = AIm .
34
Equally important are the ways in which matrix algebra differs from the algebra of the real
numbers. Its important to keep these rules in mind to prevent erroneous mistakes.
Theorem 3.34. Let A, B, and C be nonzero matrices.
(a) It is possible that AB 6= BA.
(b) AB = 0 does not imply that A = 0 or B = 0.
(c) AC = BC does not imply A = B or C = 0.
Example 3.35. Weve already seen an example of (a). For part (b) we consider the product
1 2 2 2 0 0
= .
2 4 1 1 0 0
For part (c) we observe
3 1 1 1 5 5 5 0 1 1
= = .
2 1 2 2 4 4 0 2 2 2
35
Example 3.41. Two neat applications of skew symmetric matrices to physics are as follows:
1. The cross product matrix B with the property that B~x = ~b ~x.
0 b3 b2
b3 0 b1
b2 b3 0
is skew-symmetric.
Ak = |A A{z
A} .
k terms
We can encode this graph with an adjacency matrix where we put a 1 for any two vertices
that have an edge between them.
A B D E C F G
A 0 1 0 0 1 0 0
B 1 0 1 1 0 0 0
D 0 1 0 0 0 0 0
Q=
E 0 1 0 0 0 0 0
C 1 0 0 0 0 1 1
F 0 0 0 0 1 0 0
G 0 0 0 0 1 0 0
A neat feature of matrix multiplication is that Qk counts exactly how many paths of length k
between two vertices there are. The way to see this is to realize that when we take the dot
product of say the B row with the B column we will get a +1 for each connected vertex. Thus,
36
the result is exactly how many ways one can get from B back to itself in 2 steps. This reasoning
continues for higher powers.
2 0 1 1 0 1 1 0 4 0 0 4 0 0 8 0 4 4 0 4 4
0 3 0 0 1 0 0 4 0 3 3 0 1 1 0 10 0 0 6 0 0
1 0 1 1 0 0 0 0 3 0 0 1 0 0 4 0 3 3 0 1 1
2 3 4
Q = 1 0 1 1 0 0 0 , Q = 0 3 0 0 1 0 0 , Q =
4 0 3 3 0 1 1
0 1 0 0 3 0 0 4 0 1 1 0 3 3 0 6 0 0 10 0 0
1 0 0 0 0 1 1 0 1 0 0 3 0 0 4 0 1 1 0 3 3
1 0 0 0 0 1 1 0 1 0 0 3 0 0 4 0 1 1 0 3 3
Bryan is surfing the web and randomly clicking on links. Another term for this is random
walk. He starts on page 1. What is the probability that Brian is at page 5 after 10 clicks?
37
We can describe how a random walk behaves after 1 click with the transition probability
matrix:
0.0 0.5 0.5 0.0 0.0 0.0
0.5 0.0 0.5 0.0 0.0 0.0
0.0 0.5 0.0 0.5 0.0 0.0
P = 0.0
0.0 0.0 0.0 0.5 0.5
0.0 0.0 0.5 0.5 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0
If we start at website #1 and ask where we can transition this is like computing P ~e1 . Now, to
figure out the next step this is like P (P ~e1 ). This continues so that the answer to our question
is look at the 5th entry of P 10~e1 .
0.076 0.151 0.225 0.22 0.22 0.108 .076
0.075 0.152 0.225 0.22 0.22 0.108 .151
10
0.076 0.148 0.223 0.225 0.217 0.111 10
.225
P =
, P ~e1 =
.
0.073 0.146 0.217 0.22 0.228 0.116 .22
0.07 0.149 0.225 0.228 0.22 0.108 .22
0.076 0.141 0.223 0.217 0.232 0.111 .108
This is .22. For a transition probability matrix P it holds that the matrix P = limk P k
describes the average amount of time a random walk spends at each site. Google actually uses
something very similar to this to rank pages. They meticulously index the web, put it into a
matrix, and the page they show first when you type in a query is the one that has the highest
probability in P . Well give more details about this in Chapter 6.
There are two matrix properties that are preserved when we take powers of matrices.
Theorem 3.45. If A is an n n diagonal matrix as above then Ak is also diagonal for all k 1.
In fact, k
a11 0 0
0 ak 0
22
Ak = . .. .
.. ..
.. . . .
0 0 aknn
Definition 3.46. Let A be an n n matrix. We say A is upper triangular if it has the form
a11 a12 a1n
0 a22 a2n
A= . .. .
.. ..
.. . . .
0 0 ann
38
We say A is lower triangular if it has the form
a11 0 0
a21 a22 0
A= . .
.. .. ..
.. . . .
an1 an2 ann
A matrix is triangular is it is either upper or lower triangular.
Theorem 3.47. Let A be an n n upper (lower) triangular matrix and k 1. Then Ak is also
upper (lower) triangular.
3.3 Inverses
In this section we tackle the problem of reversing a linear transformation, to the extent that
its possible. This process has countless applications, including cryptography and 3D transfor-
mations.
3.3.1 Basics
Example 3.48. Let T : R3 R3 be given by T (~x) = A~x, where
1 3 2
A = 2 7 2 .
2 6 5
When we solve T (~x) = ~y for some ~y R3 , we can alternatively interpret this as the inverse of
T for ~y , or T 1 (~y ). To do this we need to solve
x1 + 3x2 + 2x3 = y1
2x1 7x2 + 2x3 = y2
2x1 + 6x2 + 5x3 = y3
Transforming this system into reduced echelon form is equivalent to solving for the xi -s, giving
us the system
x1 = 47y1 + 3y2 20y3
x2 = 14y1 y2 + 6y3
x3 = 2y1 + y3
.
So if we set
47 3 20
B = 14 1 6 ,
2 0 1
then T 1 (~y ) = B~y . In particular, T 1 is also a linear transformation.
Definition 3.49. A linear transformation T : Rm Rn is invertible if T is one-to-one and
onto. When T is invertible, the inverse T 1 : Rn Rm is defined by
T 1 (~y ) = ~x if and only if T (~x) = ~y
39
Theorem 3.50. Let T : Rm Rn be a linear transformation. Then
Example 3.51. It is important to keep in mind that m = n is a necessary but not sufficient
requirement for T to be invertible. If T (~x) = A~x with
3 6
A= ,
4 8
then the columns of this matrix are linearly dependent, so T is neither one-to-one nor onto, and
therefore not invertible.
Since this holds for all ~x, it follows that AB = In , the identity matrix. This yields the corre-
sponding notion of invertible matrices.
Since
1 2 5 2 1 0
= ,
3 5 3 1 0 1
and
1 2 1 2 2 3 1 0 0
2 5 1 1 1 1 = 0 1 0 ,
1 2 0 1 0 1 0 0 1
both A and B are invertible.
Theorem 3.54. Let A be an invertible matrix with AB = I. Then BA = I, and this matrix
B is unique.
Definition 3.55. If A is invertible, then A1 is the inverse of A given by the unique matrix
such that AA1 = A1 A = I.
A square matrix that is not invertible is also called singular.
40
1
(a) A1 is invertible, with A1 =A
(b) AB is invertible, with (AB)1 = B 1 A1
(c) If AC = AD then C = D
(d) If AC = 0nm then C = 0nm
Example 3.57. Inverses are useful for cryptography. Suppose Chuck Norris has a message:
When I slice onions, onions cry.
He can encode this as numbers in a simple way. Letting 0 denote a space, and each other
number the ordinal of the letter a = 1, b = 2, . . ..
23, 8, 5, 14, 0, 9, 19, 12, 9, 3, 5, 0, 15, 14, 9, 15, 14, 19, 15, 14, 9, 15, 14, 19, 0, 3, 18, 25
Lets put it into a 4 7 matrix
23 8 5 14 0 9 19
12 9 3 5 0 15 14
B=
9 15 14 19 15 14 9
15 14 19 0 3 18 25
This could easily be broken though. So, what he does is scramble it with a 4 4 matrix A. Lets
say
0 1 0 1
3 3 3 2
A= 2 4 2 3
3 3 2 2
This has inverse
1 0 1 1
5 2 3 0
A1 =
0 1 0 1
6 2 3 0
So, instead of B, Chuck can send the message AB:
0 1 0 1 23 8 5 14 0 9 19 27 23 22 5 3 33 39
3 3 3 2 12 9 3 5 0 15 14 162 124 104 114 51 150 176
AB =
2 4
=
2 3 9 15 14 19 15 14 9 157 124 107 86 39 160 187
3 3 2 2 15 14 19 0 3 18 25 153 109 90 95 36 136 167
Jimmy (the deputy Ranger) decodes it via the formula
A1 (AB) = (A1 A)B = IB = B.
That is
1 0 1 1 27 23 22 5 3 33 39 23 8 5 14 0 9 19
5 2 3 0 162 124 104 114 51 150 176 12 9 3 5 0 15 14
157 124 107 86 39 160 187 = 9 15 14 19 15 14 9 .
0 1 0 1
6 2 3 0 153 109 90 95 36 136 167 15 14 19 0 3 18 25
41
History: This was invented by Lester Hill in 1929, along with a machine that decoded it,
the Hill Cipher:
3.3.2 Finding A1
If A = [~a1 ~an ] and B = [~b1 ~bn ] are matrices such that AB = In , then by breaking down
matrix multiplication column by column tells us.
We can then solve for each ~bi with the following augmented matrices
However, in solving each of these systems we will perform exactly the same row operations, so
it makes sense to combine it all into one process by setting up one larger augmented matrix:
Once weve transformed the left half to In , provided it is possible to do so, then the right hand
side will be transformed to B = A1 .
In short: We perform Gauss-Jordan elimination on [A : In ] to get [In : A1 ].
42
1 2 3
Example 3.58. Consider A = 0 1 5. To find A1 we will row-reduce [A : I3 ].
5 6 0
1 2 3 : 1 0 0 1 2 3 : 1 0 0
0 1 5 : 0 1 0 0 1 5 : 0 1 0
5 6 0 : 0 0 1 0 4 15 : 5 0 1
1 2 3 : 1 0 0
0
1 5 : 0 1 0
0 0 5 : 5 4 1
2 0 : 4 12 35
1 5
0
1 0 : 5 3 1
4 1
0 0 1 : 1 5 5
0 0 : 6 18 7
1 5 5
0
1 0 : 5 3 1
0 0 1 : 1 45 1
5
Our process terminates because it is no longer possible to transform the right half into I3 . This
is what will happen for any singular matrix.
Fact 3.60. In the 2 2 case we even have a nice formula for inverses. If
a b
A= ,
c d
(a) A is invertible.
(b) A~x = ~b has a uniue solution for all ~b, given by ~x = A1~b.
43
(c) A~x = ~0 has only the trivial solution.
Theorem 3.62 (The Big Theorem). Let A = [~a1 ~an ] be an n n matrix and T : Rn Rn
be given by T (~x) = A~x. Then the following are equivalent:
(d) T is onto.
(e) T is one-to-one.
(f) T is invertible.
(g) A is invertible.
44
4 Subspaces
We have studied linear systems of equations, matrices, and linear transformations. Now we are
interested in studying the subsets of Rn that these relate to. It is often the case that the span
of a matrix, or the range of a linear transformation is not all of the codomain.
1 2
Example 4.1. T (~x) = ~x has range 0 = 4x + y, or equivalently the line y = 4x.
4 8
2. If ~u, ~v S the ~u + ~v S.
x+y =6
xy =2
4
The solution set is not a subspace because ~0 is not a solution.
2
Example 4.4. Let
x+y =6
5x 5y = 30
6t
The solution set is . Notice that ~0 again is not in the solution set, so the solutions set is
t
not a subspace.
Step 2. If you can show that S is generated by a set of vectors, i.e. that is the span of some set of
vectors, then by the previous theorem S is a subspace.
Step 3. Show by hand that the second and third conditions of the subspace definition are or arent
met by S.
45
Example 4.6. S = {~0} and S = Rn are subspaces. We call them the trivial subspaces.
Example 4.7. Let ` be a line in Rn not containing the origin. The ` is not a subspace because
~0
/ `. However, any line L that contains ~0 is a subspace because it can be written as the span
of a single vector.
v1
v2 n
Example 4.8. Let S be the set of all vectors ... in R such that v1 + + vn = 0. Then
vn
~
S is a subspace because (1) 0 belongs to S, and one can check that the second two conditions
hold as well:
(2) ~u, ~v S implies that (u1 + v1 ) + (un + vn ) = 0 so ~u + ~v S.
(3) If ~u S then r(u1 + un ) = 0 so r~u S.
Matrices also have subspaces attached to them. For instance, homogeneous systems A~x = ~0
have solution sets that are subspaces.
Definition 4.10. The set of solutions to A~x = ~0 is called the null space of A. This is denoted
by null(A).
Example 4.11. We give three more intuitive examples of what the Span and null space might
describe:
1. Lets suppose that the matrix A represents a physical system. As an example, lets assume
our system is a rocket, and A is a matrix representing the directions we can go based on
our thrusters. So what do the null space and the span represent?
Well lets suppose we have a direction that were interested in. Is it in our column space?
If so, then we can move in that direction. The column space is the set of directions that
we can achieve based on our thrusters. Lets suppose that we have three thrusters equally
spaced around our rocket. If theyre all perfectly functional then we can move in any
direction. In this case our column space is the entire range. But what happens when a
thruster breaks? Now weve only got two thrusters. Our linear system will have changed
(the matrix A will be different), and our column space will be reduced.
46
Whats the null space? The null space are the set of thruster instructions that completely
waste fuel. Theyre the set of instructions where our thrusters will thrust, but the direction
will not be changed at all.
Suppose a space ship has thrusters which allow it to move in directions
1 0 0 2 1
~u1 = 1 , ~u2 = 0 , ~u3 = 1 , ~u4 = 3 ~u5 = 1 .
1 1 0 2 2
Suppose that each of these vectors has an antipodal thruster that can fire in the opposite
direction. So, it makes sense to apply a negative thrust in the direction. What thrust
amounts, t1 , t2 , t3 , t4 , t5 result in no motion?
We solve the system A~x = ~0 :
1 0 0 2 1 0
1 0 1 3 1 0
1 1 0 2 2 0
and obtain
1 0 0 2 1 0
0 1 0 4 1 0 .
0 0 1 1 2 0
2t4 t5
So, if we choose t4 and t5 , then any thrusting for the first three of the form 4t4 t5
t4 + 2t5
will result in no motion. Note that because the Span is all of R3 we can go anywhere with
our space ship.
2. Or A can represent measurements on investments. The range are all the measurements
achievable by a portfolio. The null space are all the investments that can be made that
wouldnt change the measurements at all.
47
R
An investment has 4 measurements attached to it where R is the average rate of
v
return, is the amount the stock has outperformed the index, is the correlation to the
index, and v is the riskiness or volatility of the stock. An investor can either buy an index
or short it. So, it is possible to have positive and negative linear combinations of the
following five investments:
3 1 14 1 5
1 0 2 1 1
Quascorp: 3; LundinWorx: 1; GruelInc: 11 ; Holtheroni: 1; Bruzzard 2 .
4 11 20 1 3
What stock blends will result in a completely neutral portfolio? We set up an augmented
matrix to solve for the nullspace:
3 1 14 1 5 0
1
0 2 1 1 0
3 1 11 1 2 0
4 11 20 1 3 0
3. Another example that we wont work out is room illumination. A math professor at
Indiana University once told Matt that the hardest math problem he couldnt solve was
buying lighting for his house (true story). Apparently it is really hard to get the light just
right in a room. The range of A represents the area of the room that can be illuminated.
The null space of A represents the power we can apply to lamps that dont change the
illumination in the room at all.
48
4.1.2 Kernel and range of linear transformations
We can define similar notions for linear transformations.
Both of these can be checked by using the linearity and scaling properties of linear transfor-
mations.
49
4.2.1 Introduction
Example 4.16. Bryan is very health conscious. He not only cares about macro nutrients, but
about the constitution of these macros. Here are the standard recommendations.
Calories 2000
Fat 65 g
Sodium 2,400 mg
Carbohydrate 300 g
Protein 50 g
Let
K
F
S R5
C
P
be the nutrition information of 100g of each food item. Notice that calories are determined by
the formula
K = 9F + 4C + 4P.
Thus, the space of possible nutrition vectors is the subspace W R5 described by all vectors
satisfying
0 = K + 9F + 4C + 4P.
Bryan also has a narrow palate. Suppose he only likes
whole milk, avocado, tahini, orange juice, pineapple and coconut milk.
Can he hit any nutrition goal with just these? What is the least amount of food he can eat
in order to be able to hit any nutrition goal he might have? We first gather the nutrition
information:
Nutrition Facts
AVOCADO
Serving Size 100 g
50
This gives a matrix of column vectors:
A T O M P C
179 638 44 41 41 232 K
15 54 0 1 1 24 F
7 115 1 44 15 2 S
9 21 10 5 6 1 C
2 17 1 3 2 3 P
Suppose also, that Bryan can have really targeted workouts so he depletes exactly the negative
of any of these nutrition vectors. The echelon form of the above is:
5176 8417
1 0 0 0 18807 12538
0 1 0 0 1741 1705
18807 6269
3690 8741
0 0 1 0
6269 12538
811 9483
6269 12538
0 0 0 1
0 0 0 0 0 0
Note that had we augmented with a generic vector we would have the echelon form:
1 638 44 41 41 232 1
179 179 179 179 179 179 K
0 1 55
8 109
24 109
24
17
2
179 5
96 F 32 K
3611 3379 2060 1791 449 8
0
0 1 4947 4947 1649 6596 F + 19788 K + 4947 S ,
0 0 0 1 6269 12538 12538 C + 41367
811 9483 4947
50152 F 4547
50152 K + 547
12538 S
0 0 0 0 0 0 4C + 9 F K + 4P
which checks out with the fact these vectors all lie in the subspace W .
The above tells us that Bryan can hit every nutritional goal in W . But can he cut out a
food item? Yes! These vectors are not linearly independent. Suppose he took out coconut milk
and pineapple. Then we would have {~a, ~t, ~o, m}
~ is a linearly independent system that still spans
W . This sets us up PERFECTLY to define a basis.
Lets define a basis.
Definition 4.17. A set B = {~u1 , . . . , ~um } is a basis for a subspace S if
1. span(B) = S.
2. B is linearly independent.
Theorem 4.18. If B is a basis for S then every element of S can be written as a unique linear
combination of vectors in B.
a~n
span{~a1 , . . . , ~am }. We denote this by rowspace(A).
51
Theorem 4.20. If A and B are equivalent (i.e. theres a way to go from one to the other by
row operations) then
rowspace(A) = rowspace(B).
So if we know that ~u1 , . . . , ~um span S and want to find a minimal set, it is enough to write
these as row vectors then go to echelon form. The remaining non-zero vectors will be linearly
independent, and still span S.
Example 4.21. The one in the book is fine:
52
Fact 4.22. If U = [~u1 ~um ] and V = [~v1 ~vm ] are equivalent matrices, then any linear
dependence among the ~ui still exists for the ~vi .
53
4.2.3 Dimension
Now that we have the notion of a basis, we can finally
make sense of dimension. For example, it
1
has always been tempting to say that span = R, because it basically looks like the real
0
line. However, this isnt true because it lives in R2 . We will see that this set has dimension
1, and so it looks like R, although it lives somewhere else...
Before defining a dimension we need to establish that every basis for a given subspace will
have the same size.
Theorem 4.24. If S is a subspace of Rn , then every basis of S has the same number of vectors.
Definition 4.25. Let S be subspace of Rn , then the dimension of S is the number of vectors
in any basis of S.
Note that {~0} and Rn have dimensions 0 and n, respectively. To see that Rn has dimension
n we can use the standard basis vectors ~e1 , . . . , ~en .
We are interested in three different tasks:
1. Determining the dimension of a subspace.
We do so by row reducing the corresponding matrix and counting the number of pivots.
6 1 1 7 1 0 1 0
4 0 4 1 0 1 5 0
1
2 9 0
0 0
0 1
.
1 3 16 1 0 0 0 0
2 4 22 3 0 0 0 0
2. If span(U) = S, then either U is already a basis, or vectors can be removed from U to form
a basis.
54
Now lets expand a set to form a basis. We go about this in a round-about way. Put simply,
we add too many vectors in, then remove them using techniques we already know.
55
Describe the nullspace of A and exhibit a basis. We row reduce and obtain
1 3 0 4 2 0 0
0 0 1 2 0 0 0
A 0
0 0 0 0 1 0
0 0 0 0 0 0 0
3 4 2
1 0 0
0
+ x4 0 + x5 0 .
x2
0 1 0
0 0 1
0 0 0
Thus,
3 4 2
1 0 0
0 0 0
nullspace(A) = span , ,
0 1 0 .
0 0 1
0 0 0
We often care about the dimension of the nullspace. This gets a special name.
Definition 4.30. The nullity of a matrix A is the dimension of the nullspace. We denote this
by nullity(A).
Knowing the dimension of a subspace helps us find the dimension.
Theorem 4.31. If S has dimension m and U is a set of m vectors that either span S or are
linearly independent, then U is a basis for S.
1
Example 4.32. Suppose that S is a subspace of dimension 2 and contains the vectors 0 and
2
0
1. Show that these two vectors form a basis for S.
0
To do this it suffices to check that these are linearly independent, which either visual inspec-
tion or row reducing quickly reveals.
56
Fact 4.33. If S1 and S2 are subspaces of Rn and S1 is a subset of S2 , then dim(S1 ) dim(S2 )
and dim(S1 ) = dim(S2 ) if and only if S1 = S2 .
Fact 4.34. If U = {~u1 , . . . , ~um } is a set of vectors in a subspace S of dimension k, then
1. If m < k, then U does not span S.
2. The columns of A corresponding to the pivot columns of B form a basis for col(A).
Example 4.37. Find a basis and the dimension of the row and column spaces for A.
1 3 1 0 3
A = 2 6 0 2 4
3 9 3 6 3
By the previous theorem, the nonzero rows of B form a basis for row(A), or
1 0
3 0
1 , 2 .
0 2
3 2
For the row space, we observe that the pivot columns of B are the first and third columns, so
the corresponding columns of A will be a basis for col(A), or
1 1
2 , 0 .
3 3
Both subspaces have bases consisting of two vectors, so they have dimension 2.
57
In the above example the row space and column space of our matrix, even though they were
subspaces of different Euclidean spaces, had the same dimension. It turns out this is always the
case.
Theorem 4.38. For any matrix A, the dimensions of the row space and the column space are
the same.
Proof: Given A, we use elementary row operations to find an echelon form B of A. The
dimension of the row space of A will correspond to the number of nonzero rows of B, while
the dimension of the column space will correspond to the number of pivot columns of B. We
observe that each nonzero row of B has exactly one leading variable, or pivot, and by the
stair-step pattern of B different nonzero rows will have different pivot columns. It follows that
With this relationship established we can make sense of the rank of a matrix.
Definition 4.39. The rank of a matrix A is the dimension of the row (or column) space of A,
denoted rank(A).
1 2 3 0 1
Example 4.40. What is the rank and nullity of 0 0 1 3 5 ?
0 0 0 0 0
Well, we see that there are two pivots, so the rank is 2. To determine the nullity we augment
with ~0 and observe that there are 3 free variables: x2 , x4 , x5 . Back substitution yields the
following solutions for the null space:
x3 = 3x4 5xt ,
x1 = 2x2 3x3 + x5 = 2x2 9x4 + 16x5 .
These are linearly independent, and thus form a basis for the nullspace. We deduce that the
nullity is 3.
In this example we saw that for our 3 5 matrix the rank was 2 and the nullity was 3. Notice
that 2 + 3 = 5. This holds in general:
rank(A) + nullity(A) = m.
58
This is true because in echelon form, each pivot of A corresponds to a vector in the basis
for column (or row) space of A. The remaining columns correspond to free variables, each of
which is a parameter for a basis vector of the solution set A~x = 0. So, these two objects sum to
m = #columns.
Example 4.42. Suppose that A is 3742 and T (~x) = A~x has a kernel with dimension 12. What
is the rank of A? Well, since the kernel of T and the nullspace of A have the same dimension,
we can use the rank-nullity theorem to deduce that rank(A) = 42 12 = 30.
Theorem 4.43 (The Big Theorem). Let A = [~a1 ~an ] be an n n matrix and T : Rn Rn
be given by T (~x) = A~x. Then the following are equivalent:
(d) T is onto.
(e) T is one-to-one.
(f) T is invertible.
(g) A is invertible.
(j) col(A) = Rn
(k) row(A) = Rn
(l) rank(A) = n
(m) nullity(A) = 0.
We end with one last example that ties a lot of this together.
[~u1 ~u2 ~e1 ~e2 ~e3 ~e4] = [ 0 1 1 0 0 0 ]      [ 1 0 2 0 0 1 ]
                            [ 0 0 0 1 0 0 ]  →   [ 0 1 1 0 0 0 ]
                            [ 1 2 0 0 1 0 ]      [ 0 0 0 1 0 0 ]
                            [ 1 2 0 0 0 1 ]      [ 0 0 0 0 1 1 ]

Looking at the pivots we see that {~u1, ~u2, ~e2, ~e3} is a basis for R4.
Next we need to find spanning vectors for the subspace Q. We can write x = y + z, so another way to represent a vector (x, y, z) in Q is as

(y + z, y, z) = y (1, 1, 0) + z (1, 0, 1).

This means that ~v1 = (1, 1, 0) and ~v2 = (1, 0, 1) form a spanning set for Q. We still need to find another
vector to make this a basis for R3 . A sure way to do this is to augment again with the standard
basis vectors and row reduce:
[~v1 ~v2 ~e1 ~e2 ~e3] = [ 1 1 1 0 0 ]      [ 1 0 0  1  0 ]
                        [ 1 0 0 1 0 ]  →   [ 0 1 0  0  1 ]
                        [ 0 1 0 0 1 ]      [ 0 0 1 −1 −1 ]

So {~v1, ~v2, ~e1} is a basis for R3.
Step 2: We observe that a linear transformation is completely determined by how it acts on a basis. So we can define a map T : R4 → R3 by specifying where it sends each of the basis vectors above.
5 Determinants
5.1 Determinants
The determinant is a number we associate to a square matrix A. We call it det(A), or sometimes
just |A| (caution: the determinant can be negative). It will tell us, among other things, whether or not A is invertible. Rather than saying how to compute it, let's first say how we would like det(A) to behave:
Theorem 5.1. Let A be an n × n matrix. The number det(A) has the following properties:
1. det A ≠ 0 if and only if A is invertible. (This puts us in the Big Theorem situation, with lots being true about the row and column vectors of A, as well as the induced linear transformation and bases.)
2. The volume of the image of the unit cube under the linear transformation T(~x) = A~x is | det(A)|.
3. det I = 1.
4. If B is obtained from A by switching a single row (or column) with another, then det B = −det A. For example,

| a b |     | c d |
| c d | = − | a b | .

5. The determinant is linear in each row (with the other rows held fixed). For example,

| a + a′  b + b′ |   | a b |   | a′ b′ |
| c       d      | = | c d | + | c  d  | .
6. If two rows (or columns) are the same then det A = 0. This follows from Property 4, since we could swap the two equal rows and obtain det A = −det A, and so det A = 0.
9. If B is an upper triangular matrix obtained from A by row reduction (using only row swaps and row replacements), then det A is the product of the diagonal entries of B times (−1)^(#row exchanges).
10. det(AB) = det A · det B.
11. det(A^{-1}) = 1 / det A.
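As a quick numerical illustration of properties 10 and 11 (this check is mine, not part of the original notes), one can compare determinants computed with NumPy on a random matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(3, 3)).astype(float)   # a random 3x3 matrix
    B = rng.integers(-3, 4, size=(3, 3)).astype(float)

    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))      # property 10
    if not np.isclose(np.linalg.det(A), 0):
        print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # property 11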
Note: we've already talked about the quick formula for the inverse of a 2 × 2 matrix: if A is the matrix

[ a11 a12 ]
[ a21 a22 ] ,

then

A^{-1} = 1/(a11 a22 − a12 a21) [  a22  −a12 ]
                               [ −a21   a11 ] .
So we've already seen the determinant of a 2 × 2 matrix come up, and we know that a 2 × 2 matrix A is invertible if and only if det(A) ≠ 0.
Example 5.4. Find the determinant of the matrix

A = [  2 4 ]
    [ −3 1 ] .

The determinant is (2)(1) − 4(−3) = 14.
Formula 5.5. Once we know the formula for a 2 × 2 determinant, we can compute determinants recursively. In order to do this, we first introduce some notation. If A is an n × n matrix, denote the ij-th entry by aij. Let Mij be the (n − 1) × (n − 1) matrix gotten by crossing out the i-th row and j-th column of A; Mij is called the ij-th minor of A. Finally, let Cij = (−1)^(i+j) det(Mij); Cij is called the ij-th cofactor of A.
Example 5.6. As an example, consider the matrix

A = [ 2 1 1 ]
    [ 3 2 4 ]
    [ 2 1 0 ] .

Then

a13 = 1,   M13 = [ 3 2 ]
                 [ 2 1 ] ,

and

C13 = (−1)^(1+3) det M13 = (−1)^4 (3(1) − (2)(2)) = −1.
We'll use these to compute the determinant of any matrix. Here's the formula: if A is the n × n matrix

A = [ a11 a12 . . . a1n ]
    [ a21 a22 . . . a2n ]
    [  .   .         .  ]
    [ an1 an2 . . . ann ] ,

then

det(A) = a11 C11 + a12 C12 + · · · + a1n C1n.
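Here is a small Python sketch (my own illustration, not from the notes) of this recursive cofactor expansion along the first row. It is fine for small matrices but takes roughly n! steps, so it is not how determinants are computed in practice:

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for row in A[1:]]   # cross out row 1 and column j+1
            total += (-1) ** j * A[0][j] * det_cofactor(minor)
        return total

    print(det_cofactor([[1, 2], [3, 4]]))   # 1*4 - 2*3 = -2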
Example 2. Find the determinant of the matrix

A = [ 2 1 1 ]
    [ 3 2 4 ]
    [ 2 1 0 ] .
The formula above tells us det(A) = a11 C11 + a12 C12 + a13 C13. We know

a11 = 2,  a12 = 1,  a13 = 1,

and

C11 = (−1)^(1+1) det M11 = det [ 2 4 ]  = 4,
                               [ 1 0 ]

C12 = (−1)^(1+2) det M12 = −det [ 3 4 ]  = 8,
                                [ 2 0 ]

C13 = (−1)^(1+3) det M13 = det [ 3 2 ]  = −1,
                               [ 2 1 ]

so

det(A) = 2(4) + 1(8) + 1(−1) = 15.
If you are curious, here is a worked out example for a 4 × 4 matrix.

Example 5.8. Find the determinant of the matrix

A = [ 1 1 2 1 ]
    [ 0 2 1 1 ]
    [ 0 3 2 4 ]
    [ 0 2 1 0 ] .
The formula tells us det(A) = a11 C11 + a12 C12 + a13 C13 + a14 C14 . We know
a11 = 1, a12 = 1, a13 = 2, a14 = 1,

and, by what we computed above (Example 2),

C11 = (−1)^(1+1) det M11 = det [ 2 1 1 ]
                               [ 3 2 4 ]  = 15.
                               [ 2 1 0 ]
We also need to find

C12 = (−1)^(1+2) det M12 = −det [ 0 1 1 ]
                                [ 0 2 4 ]
                                [ 0 1 0 ] ,

C13 = (−1)^(1+3) det M13 = det [ 0 2 1 ]
                               [ 0 3 4 ]
                               [ 0 2 0 ] ,

C14 = (−1)^(1+4) det M14 = −det [ 0 2 1 ]
                                [ 0 3 2 ]
                                [ 0 2 1 ] .
Let's do the first one together. Expanding det M12 along its first row,

det [ 0 1 1 ]
    [ 0 2 4 ]  =  0 · det [ 2 4 ]  − 1 · det [ 0 4 ]  + 1 · det [ 0 2 ]  =  0,
    [ 0 1 0 ]             [ 1 0 ]            [ 0 0 ]            [ 0 1 ]

so C12 = 0. Similarly, we can compute C13 = 0 and C14 = 0, and we find that

det(A) = 1(15) + 1(0) + 2(0) + 1(0) = 15.
A lot of the determinants in the above example turned out to be 0, and that's a reflection of the following fact: we can compute determinants by expanding along the first row (what we've been doing in the formula above, using the entries from the first row to compute the determinant), or we can expand along any row or column of our choice. This is summarized in the following formulas.
1. An expansion across row i: det(A) = ai1 Ci1 + ai2 Ci2 + + ain Cin
2. An expansion across column j: det(A) = a1j C1j + a2j C2j + + anj Cnj
Example 5.9. Let's revisit the earlier example and try applying part (2) of the formula to the 4 × 4 matrix above, with j = 1 (expanding along the first column). The formula tells us det(A) = 1 · C11 + 0 · C21 + 0 · C31 + 0 · C41. So we only need to find

C11 = (−1)^(1+1) det M11 = det [ 2 1 1 ]
                               [ 3 2 4 ]  = 15,
                               [ 2 1 0 ]

and det(A) = 1 · 15 = 15, as before.
Example 5.11.

det [ 2 0 1 ]              [ 1 1 2 ]
    [ 1 2 1 ]  =  (1/3) det [ 0 0 3 ]
    [ 0 1 0 ]              [ 1 2 4 ]
Fact 5.12. If T : Rn → Rn with T(~x) = A~x, then the volume of the image of the unit cube is | det A|.
Fact 5.13. You will see in multivariable calculus that even for non-linear coordinate changes the determinant tells you what happens to the volume of a cube. For example, the transformation from polar coordinates F(r, θ) = (r cos θ, r sin θ) induces the determinant

det [ cos θ   −r sin θ ]
    [ sin θ    r cos θ ]  =  r.

So the distortion is r. This means the amount of stretching increases as r increases. What this amounts to is an extra factor of r when we compute double integrals in polar coordinates:

∫∫ f(x, y) dx dy = ∫∫ f(r cos θ, r sin θ) r dr dθ.
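If you would like to verify the polar-coordinates computation symbolically, here is a short SymPy check (my addition, not from the notes):

    import sympy as sp

    r, theta = sp.symbols('r theta', positive=True)
    F = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])   # the polar coordinate map
    J = F.jacobian([r, theta])                              # matrix of partial derivatives
    print(sp.simplify(J.det()))                             # prints r, the stretching factor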
For the matrix above we know that all equations A~x = ~b with integer ~b have integer solutions. This follows from Cramer's rule, since each entry of the solution is a determinant of an integer matrix divided by det(A), and here det(A) = ±1.
6 Eigenvalues and Eigenvectors
The prefix eigen- is adopted from the German word eigen for own- or unique to, peculiar
to, or belonging to in the sense of idiosyncratic in relation to the originating matrix.
Example 6.1. Suppose you take Facebook and draw lines between all pairs of friends. What is a good way to measure how well-connected an individual is? It maybe isn't just how many friends someone has; it's also how important their friends are.
What if we assume that there are n people and each person has a certain importance score ci? To account for how important your friends are, let's say each person's score is proportional to the sum of their neighbors' scores. Another way to write this is as

ci = (1/λ) (sum of the neighbors' importance scores).

If ~c is the vector of everyone's importance scores, we can encode this with an equation involving the adjacency matrix A:

~c = (1/λ) A~c.
Thus, the importance scores can be found by solving A~c = λ~c for λ and ~c. For example, if we solve this for the two graphs below we get the vectors (a) (1/2, √2/2, 1/2) and (b) (.41, .58, .41, .52, .21).
One more example: hippos.
Example 6.2. Suppose that a large hippo population exists in a wildlife preserve. Hippos produce different average numbers of offspring at different times in their lives. In fact, suppose that young hippos produce no offspring, middle-aged hippos produce 2 on average, and old hippos produce 1 on average, while half of the young survive to middle age and a fifth of the middle-aged survive to old age. Does the population stabilize, go extinct, or grow exponentially? At what rate does this happen?
Let's start by putting this data into a matrix:

A = [ b1 b2 b3 ]   [  0   2   1 ]
    [ p1  0  0 ] = [ 1/2  0   0 ]
    [  0 p2  0 ]   [  0  1/5  0 ]
Notice that multiplying the age vector ~x by A gives the age distribution for the next generation:

A^0 ~x = (5, 10, 8),   A^1 ~x = (28, 5/2, 2),   A^2 ~x = (7, 14, 1/2),   A^3 ~x = (57/2, 7/2, 14/5).
It would be nice to find a simpler formula for A^k ~x. That way we could make predictions about the long term behavior of the hippo population.

Suppose we could solve the equation

A~u = λ~u.

If we could find three such vectors ~u1, ~u2, ~u3 (with corresponding scalars λ1, λ2, λ3) that form a basis, then we could write

~x = c1 ~u1 + c2 ~u2 + c3 ~u3.
Now, if we apply A to ~x a total of k times, we have

A^k ~x = A^k (c1 ~u1 + c2 ~u2 + c3 ~u3)
       = c1 A^k ~u1 + c2 A^k ~u2 + c3 A^k ~u3
       = c1 λ1^k ~u1 + c2 λ2^k ~u2 + c3 λ3^k ~u3.
This decomposes the matrix multiplication into a simple formula involving just scalars. What is eigen (peculiar) is that we can do this! We will say how soon, but for the hippo example we have

λ1 = −.95,  ~u1 = (1, −.53, .11),
λ2 = −.10,  ~u2 = (1, −4.9, 9.8),
λ3 = 1.04,  ~u3 = (1, .48, .09).
And, by solving a linear system, we get

(c1, c2, c3) = (−11.8, .8, 16).

This gives us a much simpler formula for A^k ~x:

A^k ~x = −11.8(−.95)^k (1, −.53, .11) + .8(−.10)^k (1, −4.9, 9.8) + 16(1.04)^k (1, .48, .09).
Notice that because |λ1|, |λ2| < 1, the first two terms go to zero as k gets large. So, for large k (note we multiply out 16~u3):

A^k ~x ≈ (1.04)^k (16, 7.68, 1.44).

This tells us the rate of growth, 1.04^k, as well as the stabilizing proportions. For example, 16/(16 + 7.68 + 1.44) ≈ .64 says that about 64% of the hippos are young in the 1000th generation.
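The hippo computation is easy to reproduce on a computer. Here is a NumPy sketch (my addition; the starting vector (5, 10, 8) and the signs of the eigenvalues are as reconstructed above):

    import numpy as np

    A = np.array([[0, 2, 1],
                  [1/2, 0, 0],
                  [0, 1/5, 0]])
    x0 = np.array([5, 10, 8])

    w, V = np.linalg.eig(A)
    print(np.round(w, 2))                  # roughly -0.95, -0.10, 1.04

    xk = np.linalg.matrix_power(A, 50) @ x0
    print(np.round(xk / xk.sum(), 2))      # long-run proportions: about 64% young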
The calculation for the hippo example was done on a computer. The one for the centrality
scores is at the end of this section.
In the above examples, for our square matrix A, we heavily relied on knowing scalars λ and nonzero vectors ~u such that A~u = λ~u. We wish to develop a method for finding these for any square matrix.
Definition 6.3. A nonzero vector ~u is an eigenvector of A if there exists some scalar λ such that A~u = λ~u. We then call λ an eigenvalue of A. Finally, the set of all eigenvectors corresponding to a given λ (together with ~0) is called the eigenspace of λ.

Fact 6.4. If ~u is an eigenvector of a matrix with eigenvalue λ, then so is any nonzero scalar multiple of ~u.
Example 6.5. If

A = [ 3 5 ]
    [ 4 2 ] ,

verify that ~u = (5, 4) is an eigenvector of A with corresponding eigenvalue λ = 7.

To verify this, we need to check that A~u = 7~u. But this is true:

A~u = A (5, 4) = (35, 28) = 7 (5, 4) = 7~u.
How do we find eigenvalues and eigenvectors? Note that the equation A~u = λ~u can be rewritten as

A~u − λ~u = ~0,   that is,   (A − λI)~u = ~0.

Rearranging the eigenvalue problem in this way, we're looking for values of λ such that the equation

(A − λI)~x = ~0

has a nonzero solution ~x = ~u. By the Big Theorem, this equation has a nonzero solution if and only if A − λI is a singular matrix, i.e. det(A − λI) = 0.

So we can find the eigenvalues of A by determining when det(A − λI) = 0; that is, we are solving for the zeros of a polynomial in λ. This polynomial, det(A − λI), is called the characteristic polynomial of A. So we find the eigenvalues of A by setting its characteristic polynomial equal to 0 and solving for λ.
Example 6.7. Find the eigenvalues of the matrix

A = [ 1 0 1 ]
    [ 0 1 2 ]
    [ 0 4 3 ] .

We solve det(A − λI) = 0:

det(A − λI) = det [ 1−λ  0    1  ]
                  [  0  1−λ   2  ]  =  (1 − λ) det [ 1−λ   2  ]  + 0 + 0
                  [  0   4   3−λ ]                 [  4   3−λ ]

            = (1 − λ)((1 − λ)(3 − λ) − 8)
            = (1 − λ)(λ² − 4λ − 5)
            = (1 − λ)(λ − 5)(λ + 1),

so the eigenvalues are λ = 1, 5, and −1.
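A symbolic check of this computation with SymPy (my addition; the matrix is the one from Example 6.7):

    import sympy as sp

    lam = sp.symbols('lambda')
    A = sp.Matrix([[1, 0, 1],
                   [0, 1, 2],
                   [0, 4, 3]])
    p = (A - lam * sp.eye(3)).det()        # the characteristic polynomial det(A - lambda*I)
    print(sp.factor(p))                    # factors as -(lambda - 5)*(lambda - 1)*(lambda + 1)
    print(sp.solve(sp.Eq(p, 0), lam))      # eigenvalues: [-1, 1, 5]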
Fact 6.8. The eigenspace of λ is then seen to be the set of all solutions to (A − λI)~x = ~0. It is therefore the nullspace of A − λI, and in particular a subspace of Rn.
Example 6.9. Find the eigenvalues and a basis for each eigenspace of the matrix

A = [ 3  3 ]
    [ 6 −4 ] .

We first want to find the eigenvalues of A, which we do by finding λ such that det(A − λI) = 0. We have

A − λI = [ 3  3 ]     [ 1 0 ]   [ 3−λ    3  ]
         [ 6 −4 ] − λ [ 0 1 ] = [  6   −4−λ ] .

Now we compute

det(A − λI) = (3 − λ)(−4 − λ) − 18 = λ² + λ − 30 = (λ − 5)(λ + 6),

so the eigenvalues are λ1 = 5 and λ2 = −6.
We can find the corresponding eigenvectors (and eigenspaces) by solving the equation (A − λI)~x = ~0. For λ1 = 5,

A − 5I = [ −2  3 ]
         [  6 −9 ]

and we solve the equation by reducing

[A − 5I | ~0] = [ −2  3  0 ]     [ 1 −3/2 0 ]
                [  6 −9  0 ]  →  [ 0   0  0 ] .

The solutions to this equation are the eigenspace corresponding to λ1 = 5, so the eigenspace is the set of vectors of the form

s (3/2, 1).
A basis for this eigenspace would be

{ (3/2, 1) }.

Note that although ~0 is in the eigenspace, eigenvectors are required to be nonzero, so any value s ≠ 0 gives an eigenvector corresponding to λ1 = 5.
For λ2 = −6 we reduce

[A + 6I | ~0] = [ 9 3 0 ]     [ 1 1/3 0 ]
                [ 6 2 0 ]  →  [ 0  0  0 ] ,

meaning the eigenspace corresponding to λ2 = −6 (the solutions to (A + 6I)~x = ~0) is the set of vectors of the form

s (−1/3, 1).

One basis for this eigenspace would be

{ (−1/3, 1) }.
Example 6.10. Find the eigenvalues and a basis for each eigenspace of

A = [  1 2  1 ]
    [  1 0 −1 ]
    [ −1 2  3 ] .

First we need to find the characteristic polynomial, det(A − λI). But

det(A − λI) = det [ 1−λ  2    1  ]
                  [  1  −λ   −1  ]  =  −λ³ + 4λ² − 4λ  =  −λ(λ − 2)².
                  [ −1   2   3−λ ]

This equals zero when λ is 0 or 2, so the eigenvalues are λ1 = 0 and λ2 = 2. To find the corresponding eigenspaces, we want to find all solutions to (A − λI)~x = ~0.
The solutions to this system (the eigenspace corresponding to λ2 = 2) are given by vectors of the form

s1 (1, 0, 1) + s2 (2, 1, 0),

so a basis for the eigenspace of λ2 = 2 is

{ (1, 0, 1), (2, 1, 0) }.
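With SymPy one can check the whole example at once (my addition; the signs in A are as reconstructed above). The method eigenvects returns each eigenvalue together with its multiplicity and a basis for its eigenspace:

    import sympy as sp

    A = sp.Matrix([[1, 2, 1],
                   [1, 0, -1],
                   [-1, 2, 3]])
    for val, mult, vecs in A.eigenvects():
        print(val, mult, [list(v) for v in vecs])
    # eigenvalue 0 with a 1-dimensional eigenspace,
    # eigenvalue 2 with multiplicity 2 and a 2-dimensional eigenspace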
As another example, the matrix

A = [ 2 0 0 0 ]
    [ 1 2 0 0 ]
    [ 0 1 3 0 ]
    [ 0 0 1 3 ]

is triangular, and so has eigenvalues 2 and 3, each with multiplicity 2.

Let's look at the eigenspace for λ = 2. This is the nullspace of

A − 2I = [ 0 0 0 0 ]     [ 1 0 0 0 ]
         [ 1 0 0 0 ]  →  [ 0 1 1 0 ]
         [ 0 1 1 0 ]     [ 0 0 1 1 ]
         [ 0 0 1 1 ]     [ 0 0 0 0 ] .

This has nullspace { (0, x4, −x4, x4) } = span{ (0, 1, −1, 1) }. So the nullspace has dimension 1, despite the fact that λ = 2 has multiplicity 2.
Another thing to be aware of when calculating eigenvalues and eigenspaces is that a matrix need not have solely real eigenvalues.
Example 6.14. Let

B = [ 0 −1 ]
    [ 1  0 ] ,

the matrix corresponding to a rotation by π/2 in R2. Thinking about it geometrically, rotating a nonzero vector by π/2 will never yield a scalar multiple of that vector. Therefore our intuition is that this (invertible) matrix should have no real eigenvectors.

Actually working it out, the characteristic polynomial of B is

det [ −λ −1 ]
    [  1 −λ ]  =  λ² + 1.

But then λ² + 1 = 0 implies that λ = ±√−1 = ±i.
So we see that matrices can have imaginary or complex eigenvalues. However, B is real-valued, so multiplying it by any ~x in Rn results in another vector in Rn. So any eigenspaces corresponding to these complex eigenvalues will not be subspaces of Rn.
6.1.5 Remarks about Eigenvalues
Things to keep in mind when finding eigenvalues and eigenvectors:
1. If A is an n × n matrix, the characteristic polynomial det(A − λI) should always be a degree n polynomial (you get n factors of λ from the n terms on the diagonal).
3. If A is a triangular matrix, the eigenvalues of A are just the entries on the diagonal.
4. When you're finding eigenvectors and reducing [A − λI | ~0], you should ALWAYS get free variables and rows of zeros, because we've chosen λ so that A − λI is a singular matrix. If you don't get free variables and rows of zeros, something went wrong in your calculation!
5. Because the eigenvalues are exactly the scalars λ such that A − λI is a singular matrix, we can say that A itself is a singular matrix if and only if 0 is an eigenvalue of A. But this means we can add something to the Big Theorem!
Theorem 6.15 (Big Theorem). Let S = {~a1, . . . , ~an} be a set of n vectors in Rn, and let A = [~a1 . . . ~an]. Let T : Rn → Rn be the linear transformation T(~x) = A~x. Then the following are equivalent:
are equivalent:
1. S spans Rn .
2. S is linearly independent.
4. T is onto.
5. T is one-to-one.
6. ker(T ) = {~0}.
7. range(T ) = Rn .
8. null(A) = {~0}.
9. col(A) = Rn .
10. row(A) = Rn .
11. nullity(A) = 0.
12. rank(A) = n.
14. det(A) ≠ 0.
15. 0 is not an eigenvalue of A.
Figure 3.2: Eigenvector Centrality Example.
Example 3.2. For the graph shown in Figure 3.2(a), the adjacency matrix is

A = [ 0 1 0 ]
    [ 1 0 1 ]
    [ 0 1 0 ] .      (3.10)
Based on Equation 3.9, we need to solve λCe = ACe, or

(A − λI)Ce = 0.      (3.11)

Assuming Ce = [u1 u2 u3]^T,

[ −λ  1  0 ] [ u1 ]   [ 0 ]
[  1 −λ  1 ] [ u2 ] = [ 0 ] .      (3.12)
[  0  1 −λ ] [ u3 ]   [ 0 ]
Since Ce ≠ [0 0 0]^T, the characteristic equation is

det(A − λI) = det [ −λ  1  0 ]
                  [  1 −λ  1 ]  =  0,      (3.13)
                  [  0  1 −λ ]

or equivalently,

(−λ)(λ² − 1) − 1(−λ) = 2λ − λ³ = λ(2 − λ²) = 0.      (3.14)

So the eigenvalues are (−√2, 0, +√2). We select the largest eigenvalue: √2.
We compute the corresponding eigenvector:

[ −√2   1    0  ] [ u1 ]   [ 0 ]
[   1  −√2   1  ] [ u2 ] = [ 0 ] .      (3.15)
[   0   1   −√2 ] [ u3 ]   [ 0 ]
Assuming the vector Ce has norm 1, its solution is

Ce = [ u1 ]   [  1/2  ]
     [ u2 ] = [ √2/2 ] ,      (3.16)
     [ u3 ]   [  1/2  ]

which denotes that node v2 is the most central node and nodes v1 and v3 have equal centrality values.
Example 3.3. For the graph shown in Figure 3.2(b), the adjacency matrix is as follows:

A = [ 0 1 0 1 0 ]
    [ 1 0 1 1 1 ]
    [ 0 1 0 1 0 ]
    [ 1 1 1 0 0 ]
    [ 0 1 0 0 0 ] .      (3.17)
The eigenvalues of A are (−1.74, −1.27, 0.00, +0.33, +2.68). For eigenvector centrality, the largest eigenvalue is selected: 2.68. The corresponding eigenvector is the eigenvector centrality vector and is

Ce = [ 0.4119 ]
     [ 0.5825 ]
     [ 0.4119 ] .      (3.18)
     [ 0.5237 ]
     [ 0.2169 ]
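These eigenvector-centrality computations are easy to reproduce numerically; here is a NumPy sketch (my addition, not from the notes) for the 5-node graph of Example 3.3:

    import numpy as np

    A = np.array([[0, 1, 0, 1, 0],
                  [1, 0, 1, 1, 1],
                  [0, 1, 0, 1, 0],
                  [1, 1, 1, 0, 0],
                  [0, 1, 0, 0, 0]], dtype=float)

    w, V = np.linalg.eig(A)
    i = np.argmax(w.real)                       # largest eigenvalue, about 2.68
    c = np.abs(V[:, i].real)                    # its eigenvector, taken with nonnegative entries
    print(np.round(c / np.linalg.norm(c), 4))   # approximately (0.4119, 0.5825, 0.4119, 0.5237, 0.2169)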
Here's one more example, with the Fuzzy Ball Graph:
Example 6.16.
This has eigenvector [1.0, 1.0, 1.0, 1.0, 1.0, 0.96, 0.95, 0.95, 0.95, 0.95, 0.54, 0.1]T . So we see that
five of the nodes are best connected and the rest have lower scores.
6.2
SKIP
We're used to seeing coordinates in the standard basis. For example, if {~e1, ~e2} are the standard basis vectors for R2, then any vector in R2 can be written as a linear combination of these in a unique way, namely

(a, b) = a (1, 0) + b (0, 1).

This tells us the coordinates of the vector (a, b) in the standard basis are just (a, b). This makes sense: we go a units in the direction of ~e1 (the x-direction) and b units in the direction of ~e2 (the y-direction) to get to the point (a, b).
Now, using other bases, we just want to extend this idea to other subspaces. For example,
if I have some plane in R3 , can we find some vector that acts like the x-axis and some vector
that acts like the y-axis? Well, if we have a basis for that plane, we will be able to impose a
coordinate system.
We are also interested in extending our ideas of coordinate systems beyond the standard
basis because some bases are much easier to work with than others. For example, if we have an
n × n matrix A and a basis B = {~u1, . . . , ~un} of Rn consisting of eigenvectors of A, then it is very easy to compute A~x for any vector ~x in Rn. Why? Well, suppose the eigenvectors {~u1, . . . , ~un} correspond to the eigenvalues λ1, . . . , λn. Then we know A~ui = λi ~ui. So, if we can find scalars c1, . . . , cn such that

~x = c1 ~u1 + · · · + cn ~un,

then

A~x = c1 A~u1 + · · · + cn A~un = c1 λ1 ~u1 + · · · + cn λn ~un.

This becomes very useful when we're trying to find higher powers of A; it says

A^k ~x = c1 A^k ~u1 + · · · + cn A^k ~un = c1 λ1^k ~u1 + · · · + cn λn^k ~un.

So, instead of multiplying ~x by A k times, we just have to multiply the eigenvectors by powers of scalars and add them up. This is much easier to do!
We can investigate some of these ideas further in the future, but for now, that's a bit of motivation for why we'd want to be able to find a basis and then change to a different basis.
Now well move on to the computational side of how to change to a new basis. We restrict our
attention to changing between bases of Rn .
6.3.2 Changing it
Notation 6.17. If B = {~u1, . . . , ~un} is a basis for Rn, and ~x is a vector in Rn, we write

~xB = (c1, . . . , cn)B

to mean that ~x is a linear combination of the basis vectors in B with coefficients given by c1, . . . , cn.
Example 6.18. Let B be the basis {(2, 1), (−1, 3)} for R2. If ~xB = (4, 5)B, find ~x.

The notation ~xB = (4, 5)B means that

~x = 4 (2, 1) + 5 (−1, 3) = (3, 19).
Example 6.20. Brian is all about smoothies. His favorite ingredients by far are banana, avocado and tofu:

            ~b   ~a   ~t
  Protein    1    2    5
  Fat        1   15    3
  Carb      22    1    2
These vectors form a basis for the space of macronutrients, R3. Brian likes these ingredients so much he would rather express macronutrients in terms of B = {~b, ~a, ~t} than in the standard basis. So instead of describing a smoothie with a macro-profile m~ = (m1, m2, m3), he would like to write m~B = (r1, r2, r3)B, where r1, r2 and r3 describe the amounts of banana, avocado and tofu he needs to put in his smoothie. Letting B = [~b ~a ~t], the equation m~ = r1~b + r2~a + r3~t is the same as

m~ = B m~B.
So, if his partner wants to go get fried oysters later, Brian can understand the macro composition of oysters, ~o = (13, 12, 9), in terms of his favorite foods by computing ~oB:

B^{-1} ~o = (1/745) (111, 68, 1932) ≈ (.1, .1, 2.6).

So if we take 10g of banana (i.e. .1 of a 100g serving), 10g of avocado and 260g of tofu, we end up with the same macros as fried oysters.
Recap. If B = {~u1, . . . , ~un} is a basis for Rn, and ~x is a vector in Rn, then we can find scalars c1, . . . , cn such that

~x = c1 ~u1 + · · · + cn ~un.

In our notation, this says

~xB = (c1, . . . , cn)B.

If U = [~u1 . . . ~un] is the matrix whose columns are the basis vectors, then we can rewrite the first equation as

~x = U ~xB.
In words, this is saying that multiplying the coordinate vector ~xB of ~x in the basis B by the
matrix U gives us ~x, the coordinate vector of ~x in the standard basis S. Hence, we call U the
change of basis matrix from B to the standard basis.
What if we're given a vector in the standard basis, and we want to find ~xB? Well, because B is a basis, the matrix U is invertible, so we can solve for ~xB by multiplying both sides of the equation above by U^{-1} to get

~xB = U^{-1} ~x.
For example, suppose B = {(3, 1), (5, 2)} and ~xB = (2, −1)B, and we want to find ~x. We can do this using the formula ~x = U~xB. We first set U = [~u1 ~u2]. Then

~x = U ~xB = [ 3 5 ] (2, −1) = (1, 0).
             [ 1 2 ]
Example 6.22. Let B be as in the previous computation. Find ~xB if ~x = (1, 1).

We do this problem using the formula ~xB = U^{-1}~x. We know

U = [ 3 5 ] ,    so    U^{-1} = [  2 −5 ]
    [ 1 2 ]                     [ −1  3 ] .

Then

~xB = U^{-1} ~x = [  2 −5 ] (1, 1) = (−3, 2)B.
                  [ −1  3 ]
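Numerically, converting between coordinates is just a matrix-vector solve. Here is a small NumPy sketch (my addition) reproducing the last computation:

    import numpy as np

    U = np.array([[3., 5.],
                  [1., 2.]])        # columns are the basis vectors of B
    x = np.array([1., 1.])

    x_B = np.linalg.solve(U, x)     # solves U x_B = x, i.e. x_B = U^{-1} x
    print(x_B)                      # [-3.  2.]
    print(U @ x_B)                  # back to [1. 1.]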
More generally, if B1 and B2 are two bases of Rn with change of basis matrices U and V (whose columns are the respective basis vectors), then since ~x = U~xB1 = V~xB2, we can convert coordinates directly:

~xB2 = V^{-1} U ~xB1,    ~xB1 = U^{-1} V ~xB2.
Example 6.23. Recall Brian just wants to think of macronutrients in terms of avocado, banana and tofu. His friend Adrian only thinks in terms of coconut milk, yogurt and pineapple. These have macro profiles:

               ~c   ~y   ~p                        ~b   ~a   ~t
     Protein    3    9    2            Protein      1    2    5
A:   Fat       24    5    1 ,     B:   Fat          1   15    3
     Carb       1    4    6            Carb        22    1    2

One can readily check that A = {~c, ~y, ~p} is a basis. We already saw how to compute fried oysters, ~o = (13, 12, 9), in Brian's basis, ~oB. Say that Brian calls up Adrian and asks him about the macros in his fried oyster meal. If Brian tells Adrian ~oB, how does Adrian translate to his basis A? That is, how do we relate ~oB and ~oA?
First, we compute some inverses:

B^{-1} = −(1/1490) [  27    1  −69 ]        A^{-1} = −(1/1027) [   26  −46    −1 ]
                   [  64 −108    2 ] ,                          [ −143   16    45 ]
                   [ −329   43   13 ]                           [   91   −3  −201 ] .

Notice that

A^{-1} ~o = ~oA    and    B ~oB = ~o.
It follows that

A^{-1} B ~oB = ~oA.

This is our conversion formula. If you're curious, we have

~oA ≈ (6.2, 9.8, 55.3).
6.4 Diagonalization
6.4.1 How to diagonalize
Suppose A is an n × n matrix with eigenvectors B = {~u1, . . . , ~un} that form a basis for Rn. Let λ1, . . . , λn be the corresponding eigenvalues. If ~x is any vector in Rn, because these eigenvectors
form a basis, we can write
~x = c1 ~u1 + · · · + cn ~un,

or

~xB = (c1, . . . , cn)B.
Because these are eigenvectors, if we multiply the first equation by A, we get

A~x = c1 λ1 ~u1 + · · · + cn λn ~un,

or

(A~x)B = (λ1 c1, . . . , λn cn)B.
But if D is the matrix

D = [ λ1  0  . . .  0 ]
    [  0 λ2         . ]
    [  .      . .   0 ]
    [  0  . . .  0 λn ] ,

we have just derived the formula

(A~x)B = (λ1 c1, . . . , λn cn)B = D (c1, . . . , cn)B = D ~xB.
If P = [~u1 . . . ~un] is the change of basis matrix for B, then ~xB = P^{-1}~x, and putting the pieces together gives

A = P D P^{-1}.

Definition 6.25. We say that an n × n matrix A is diagonalizable if we can find an invertible matrix P and a diagonal matrix D such that A = P D P^{-1}.
In the situation outlined above, we see that we can find P and D by finding a basis for Rn
consisting of eigenvectors {~u1 , . . . , ~un } of A with corresponding eigenvalues 1 , . . . , n . Then
P = [~u1 . . . ~un ] and D is the diagonal matrix with the eigenvalues on the diagonal.
Example 6.26. Find matrices P and D to diagonalize

A = [  3 1 ]
    [ −2 0 ] .

First, we want to find the eigenvalues and eigenvectors of A. A computation shows that the eigenvalues are λ1 = 1 and λ2 = 2, and an eigenvector corresponding to λ1 = 1 is the vector ~u1 = (1, −2). Similarly, an eigenvector corresponding to λ2 = 2 is the vector ~u2 = (1, −1). Then the vectors {~u1, ~u2} form a basis for R2, so we set

P = [  1  1 ]         D = [ 1 0 ]
    [ −2 −1 ]  and        [ 0 2 ] .

Then A = P D P^{-1}.
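A quick numerical check of the diagonalization (my addition; the signs in A are as reconstructed for this example):

    import numpy as np

    A = np.array([[3., 1.],
                  [-2., 0.]])
    w, P = np.linalg.eig(A)                          # eigenvalues and eigenvector columns
    D = np.diag(w)
    print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True: A = P D P^{-1}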
Remark. Not every matrix is diagonalizable. In order to do this computation, we need to find a basis for Rn consisting of eigenvectors of A, and we cannot always do this. We can if, for example, A has n distinct real eigenvalues, or, more generally, if for each eigenvalue the dimension of its eigenspace equals its multiplicity.
6.4.2 Applications
The main application of this is the following formula: if A = P D P^{-1}, then A^k = P D^k P^{-1}. For example, (P D P^{-1})² = P D P^{-1} P D P^{-1} = P D² P^{-1}.
Example 6.28 (Hippos). Recall that the hippo population model had a matrix:
A = [ b1 b2 b3 ]   [  0   2   1 ]
    [ p1  0  0 ] = [ 1/2  0   0 ]
    [  0 p2  0 ]   [  0  1/5  0 ]
This has eigenvalues and eigenvectors:

λ1 = −.95,  ~u1 = (1, −.53, .11),
λ2 = −.10,  ~u2 = (1, −4.9, 9.8),
λ3 = 1.04,  ~u3 = (1, .48, .09).
84
.95 0 0
Let P = [~u1 ~u2 ~u3 ] and D = 0 .10 0 . As we take larger and larger powers Ak
0 0 1.04
0 0 0
we get the matrix Dk starts to look like Dk 0 0 0 .
0 0 1.04k
One final example. Remember that e ≈ 2.718 is a ubiquitous number in math. This number mysteriously comes up all the time. For example,

e = lim_{n→∞} (LCM(1, 2, . . . , n))^{1/n},

and

e^{iπ} + 1 = 0.

Here are two from probability. If n people are at the dog park and the dogs run around, then eventually return to a random owner, the chance that no dog returns to its actual owner is about 1/e. Or, if you interview n candidates for a job, and must make the hiring decision immediately after each interview, you should interview n/e of the candidates and then hire the next best one to come along (called the secretary problem, wiki it).
Another way to define e^x, from Math 126, is as an infinite sum:

e^x = x^0/0! + x^1/1! + x^2/2! + · · · + x^k/k! + · · · .

We can define the exponential of a matrix in the same way:

e^A = A^0/0! + A^1/1! + A^2/2! + · · · + A^k/k! + · · · .
This is hard to compute in general; even computers struggle with it. But it becomes tractable for diagonal (and diagonalizable) matrices: if D is diagonal with diagonal entries λ1, . . . , λn, then e^D is simply the diagonal matrix with entries e^{λ1}, . . . , e^{λn}; and if A = P D P^{-1}, then e^A = P e^D P^{-1}.
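Here is a small check of this recipe (my addition), comparing P e^D P^{-1} against SciPy's general-purpose matrix exponential for an illustrative diagonalizable matrix:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[3., 1.],
                  [-2., 0.]])                        # diagonalizable, with eigenvalues 1 and 2
    w, P = np.linalg.eig(A)
    eA = P @ np.diag(np.exp(w)) @ np.linalg.inv(P)   # e^A = P e^D P^{-1}
    print(np.allclose(eA, expm(A)))                  # True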
Note you can also write things like e^{tA}, or 3^A = e^{log(3) A}. It is possible to check from the definition that d/dt e^{tA} = A e^{tA}. We can use this to solve systems of differential equations.
Example 6.31. Suppose that ~x(t) = (S(t), F(t)) is a function of t and represents people at different stages of having a facebook account. Let S(t) denote the number of people who are susceptible to making a facebook account but not yet on it, and let F(t) denote the number of people who have an account. We assume:
The fraction of the susceptible who make facebook accounts per unit time (say decades) is proportional to the number of people with accounts; b is the constant of proportionality.
A fixed fraction r of the facebook population deletes their account, but is still susceptible to making a new one.
Another proportion d vows to never use facebook again, with 0 ≤ d ≤ 1 − r.
The system of differential equations that models this phenomenon is

S′ = −bS + rF
F′ = bS − rF − dF.

Letting ~x(t) = (S(t), F(t)), the above looks like matrix multiplication. We can express it as

d/dt ~x(t) = [ −b     r   ] ~x(t).
             [  b  −r − d ]
Call the above matrix A, and suppose that ~x0 = (S(0), F(0)). Then the solution to the above is given by

~x(t) = e^{tA} ~x0.
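To actually produce numbers one needs values for b, r, d and the initial populations; here is an illustrative sketch (the parameter values below are hypothetical, not the ones used in the notes):

    import numpy as np
    from scipy.linalg import expm

    b, r, d = 1.0, 0.4, 0.15          # hypothetical rates
    A = np.array([[-b, r],
                  [b, -r - d]])
    x0 = np.array([1000., 10.])       # hypothetical S(0), F(0)

    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, np.round(expm(t * A) @ x0, 1))   # S(t), F(t) at a few times (in decades)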
This amounts to multiplying the second row of A with ~x0, setting the result equal to 0, and solving for t:

4(0.45 e^{1.52t} + 0.45 e^{.043t}) + 1(0.57 e^{1.52t} + 0.42 e^{.043t}) = 0.

A solver numerically estimates this at t = 1.31 decades.