Linear Algebra
2.1 Introduction
We discuss vectors, matrices, transposes, covariance, correlation, diagonal and inverse
matrices, orthogonality, subspaces and eigenanalysis. An alternative source for much
of this material is the excellent book by Strang [58].
Figure 2.1: Three time series, plotted against time t, having the covariance matrix C_1
and mean vector m_1 shown in the text. The top and bottom series have high covariance
but none of the other pairings do.
There is also a more compact notation for the same matrix
D = diag([4; 1; 6]) (2.17)
If a covariance matrix is diagonal it means that the covariances between variables are
zero, that is, the variables are all uncorrelated. Non-diagonal covariance matrices are
known as full covariance matrices. If V is a vector of variances V = [\sigma_1^2; \sigma_2^2; \sigma_3^2]^T then
the corresponding diagonal covariance matrix is V_d = diag(V).
where I is the identity matrix. The inverse of a diagonal matrix with entries d_{ii} is
another diagonal matrix with entries 1/d_{ii}. This satisfies the definition of an inverse,
e.g.
\begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{bmatrix} \begin{bmatrix} 1/4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/6 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}   (2.22)
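As a quick numerical check (a minimal numpy sketch, not part of the original text; the variable names are illustrative only), the inverse of a diagonal matrix can be formed by taking reciprocals of its diagonal entries:

    import numpy as np

    D = np.diag([4.0, 1.0, 6.0])              # the diagonal matrix of equation 2.17
    D_inv = np.diag(1.0 / np.diag(D))         # reciprocal of each diagonal entry
    print(np.allclose(D @ D_inv, np.eye(3)))  # True: D D^{-1} = I, as in equation 2.22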
For general matrices, calculating the inverse involves considerably more computation. Before
looking at the general case we first consider the problem of solving simultaneous
equations. These constitute relations between a set of input or independent variables
x_i and a set of output or dependent variables y_i. Each input-output pair constitutes
an observation. In the following example we consider just N = 3 observations and
p = 3 dimensions per observation
2w_1 + w_2 + w_3 = 5
4w_1 - 6w_2 = -2
-2w_1 + 7w_2 + 2w_3 = 9
which can be written in matrix form
\begin{bmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix}   (2.23)
or, more compactly, as
Xw = y (2.24)
This system of equations can be solved in a systematic way by subtracting multiples
of the first equation from the second and third equations and then subtracting multiples
of the second equation from the third. For example, subtracting twice the first
equation from the second and -1 times the first from the third gives
\begin{bmatrix} 2 & 1 & 1 \\ 0 & -8 & -2 \\ 0 & 8 & 3 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -12 \\ 14 \end{bmatrix}   (2.25)
Then, subtracting -1 times the second from the third gives
\begin{bmatrix} 2 & 1 & 1 \\ 0 & -8 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -12 \\ 2 \end{bmatrix}   (2.26)
This process is known as forward elimination. We can then substitute the value for
w_3 from the third equation into the second, and so on. This process is back-substitution.
The two processes are together known as Gaussian elimination. Following this through
for our example we get w = [1; 1; 2]^T.
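As a numerical check (a minimal numpy sketch, not from the text; the names X, y and w follow equation 2.24), a standard solver reproduces this result, internally using a pivoted form of Gaussian elimination:

    import numpy as np

    X = np.array([[2.0, 1.0, 1.0],
                  [4.0, -6.0, 0.0],
                  [-2.0, 7.0, 2.0]])
    y = np.array([5.0, -2.0, 9.0])

    w = np.linalg.solve(X, y)   # solves X w = y
    print(w)                    # [1. 1. 2.]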
When we come to invert a matrix (as opposed to solving a system of equations as in
the previous example) we start with the equation

A A^{-1} = I   (2.27)
and just write down all the entries in the A and I matrices in one big matrix

\begin{bmatrix} 2 & 1 & 1 & 1 & 0 & 0 \\ 4 & -6 & 0 & 0 & 1 & 0 \\ -2 & 7 & 2 & 0 & 0 & 1 \end{bmatrix}   (2.28)
We then perform forward elimination^1 until the part of the matrix corresponding to
A equals the identity matrix; the matrix on the right is then A^{-1} (this is because in
equation 2.27, if A becomes I then the left hand side is A^{-1} and the right hand side
must equal the left hand side). We get
\begin{bmatrix} 1 & 0 & 0 & 12/16 & -5/16 & -6/16 \\ 0 & 1 & 0 & 4/8 & -3/8 & -2/8 \\ 0 & 0 & 1 & -1 & 1 & 1 \end{bmatrix}   (2.29)
This process is known as the Gauss-Jordan method. For more details see Strang's
excellent book on Linear Algebra [58], from which this example was taken.
Inverses can be used to solve equations of the form Xw = y. This is achieved by
multiplying both sides by X^{-1}, giving

w = X^{-1} y   (2.30)
Hence,
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 12/16 & -5/16 & -6/16 \\ 4/8 & -3/8 & -2/8 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix}   (2.31)

which also gives w = [1; 1; 2]^T.
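The same numbers can be reproduced numerically (a minimal numpy sketch, not part of the text; np.linalg.inv uses LU factorisation rather than the hand Gauss-Jordan procedure, but returns the same inverse):

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [4.0, -6.0, 0.0],
                  [-2.0, 7.0, 2.0]])
    A_inv = np.linalg.inv(A)
    print(A_inv * 16)                           # 16 A^{-1}: rows [12 -5 -6], [8 -6 -4], [-16 16 16], matching equation 2.29
    print(A_inv @ np.array([5.0, -2.0, 9.0]))   # [1. 1. 2.], as in equation 2.31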
The inverse of a product of matrices is given by

(AB)^{-1} = B^{-1} A^{-1}   (2.32)
Only square matrices are invertible because, for y = Ax, if y and x are of different
dimension then we will not necessarily have a one-to-one mapping between them.
^1 We do not perform back-substitution but instead continue with forward elimination until we get
a diagonal matrix.
2.5 Orthogonality
The length of a d-element vector x is written as ||x|| where

||x||^2 = \sum_{i=1}^{d} x_i^2   (2.33)
        = x^T x
Two vectors x and y are orthogonal if they obey Pythagoras' relation, that is

||x||^2 + ||y||^2 = ||x - y||^2

Expanding the right hand side shows that this is equivalent to the condition x^T y = 0.

Figure 2.2: Two vectors x and y. These vectors will be orthogonal if they obey
Pythagoras' relation, i.e. that the sum of the squares of the sides equals the square of
the hypotenuse.
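A quick numerical illustration (a minimal numpy sketch; the two vectors are arbitrary choices, not from the text):

    import numpy as np

    x = np.array([1.0, 2.0, 2.0])
    y = np.array([2.0, 1.0, -2.0])
    print(x @ y)                                 # 0.0, so x and y are orthogonal
    print(x @ x + y @ y, (x - y) @ (x - y))      # both 18.0: Pythagoras' relation holds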
[Figure: two vectors a and b, with lengths ||a|| and ||b||, the length of their difference ||b - a||, and the angles \alpha, \beta and \delta between them.]
2.5.2 Projections
The projection of a vector b onto a vector a results in a projection vector p which is
the point on the line through a which is closest to the point b. Because p is a point on a
it can be written as a scalar multiple of a, p = w a. The line from b to p is then orthogonal
to a, so that a^T (b - p) = a^T (b - w a) = 0.

[Figure: the projection p of b onto a; the vector b - p is orthogonal to a.]
Re-arranging gives

w = \frac{a^T b}{a^T a}   (2.45)

and

p = \frac{a^T b}{a^T a} a   (2.46)
We refer to p as the projection vector and to w as the projection.
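A minimal numpy sketch of equations 2.45 and 2.46 (the example vectors are arbitrary, not from the text):

    import numpy as np

    a = np.array([1.0, 0.0, 0.0])
    b = np.array([2.0, 3.0, 4.0])
    w = (a @ b) / (a @ a)       # the projection, equation 2.45
    p = w * a                   # the projection vector, equation 2.46
    print(w, p)                 # 2.0 [2. 0. 0.]
    print(a @ (b - p))          # 0.0: the error b - p is orthogonal to a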
The set of vectors q_1, ..., q_k are orthogonal if

q_j^T q_k = \begin{cases} 0 & j \neq k \\ d_{jk} & j = k \end{cases}   (2.47)

If these vectors are placed in columns of the matrix Q then

Q^T Q = Q Q^T = D   (2.48)
2.5.4 Orthonormal Matrices
The set of vectors q_1, ..., q_k are orthonormal if

q_j^T q_k = \begin{cases} 1 & j = k \\ 0 & j \neq k \end{cases}   (2.49)

If these vectors are placed in columns of the matrix Q then

Q^T Q = Q Q^T = I   (2.50)

Hence, the transpose equals the inverse

Q^T = Q^{-1}   (2.51)
The vectors q_1, ..., q_k are said to provide an orthonormal basis. This means that any
vector can be written as a linear combination of the basis vectors. A trivial example
is the two-dimensional Cartesian coordinate system where q_1 = [1; 0]^T (the x-axis)
and q_2 = [0; 1]^T (the y-axis). More generally, to represent the vector x we can write

x = \tilde{x}_1 q_1 + \tilde{x}_2 q_2 + ... + \tilde{x}_d q_d   (2.52)

To find the appropriate coefficients \tilde{x}_k (the co-ordinates in the new basis), multiply
both sides by q_k^T. Due to the orthonormality property all terms on the right disappear
except one, leaving

\tilde{x}_k = q_k^T x   (2.53)

The new coordinates are the projections of the data onto the basis functions (cf.
equation 2.45; there is no denominator since q_k^T q_k = 1). In matrix form, equation 2.52
can be written as x = Q \tilde{x}, which therefore has the solution \tilde{x} = Q^{-1} x. But given
that Q^{-1} = Q^T we have

\tilde{x} = Q^T x   (2.54)
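For instance (a minimal numpy sketch; the 45-degree rotation basis and the vector x are arbitrary choices for illustration, not from the text):

    import numpy as np

    # Orthonormal basis vectors as the columns of Q (a 45-degree rotation of the Cartesian axes)
    s = 1.0 / np.sqrt(2.0)
    Q = np.array([[s, -s],
                  [s,  s]])
    x = np.array([2.0, 0.0])

    x_tilde = Q.T @ x            # co-ordinates in the new basis, equation 2.54
    print(x_tilde)               # [ 1.41421356 -1.41421356]
    print(Q @ x_tilde)           # [2. 0.]: reconstructs x, equation 2.52
    print(np.linalg.norm(x), np.linalg.norm(x_tilde))   # both 2.0: the length is preserved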
Transformation to an orthonormal basis preserves lengths. This is because^2

||\tilde{x}||^2 = ||Q^T x||^2   (2.55)
             = (Q^T x)^T (Q^T x)
             = x^T Q Q^T x
             = x^T x
             = ||x||^2
Similarly, inner products and therefore angles between vectors are preserved. That is

\tilde{x}^T \tilde{y} = (Q^T x)^T (Q^T y)   (2.56)
               = x^T Q Q^T y
               = x^T y

Therefore, transformation by an orthonormal matrix constitutes a rotation of the
co-ordinate system.
^2 Throughout this chapter we will make extensive use of the matrix identities (AB)^T = B^T A^T
and (AB)C = A(BC). We will also use (AB)^{-1} = B^{-1} A^{-1}.
2.6 Subspaces
A space is, for example, a set of real numbers. A subspace S is a set of points fxg
such that (i) if we take two vectors from S and add them we remain in S and (ii) if
we take a vector from S and multiply by a scalar we also remain in S (S is said to be
closed under addition and multiplication). An example is a 2-D plane in a 3-D space.
A subspace can be de ned by a basis.
2.7 Determinants
The determinant of a two-by-two matrix

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}   (2.57)

is given by

det(A) = ad - bc   (2.58)
The determinant of a three-by-three matrix
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}   (2.59)

is given by

det(A) = a det \begin{bmatrix} e & f \\ h & i \end{bmatrix} - b det \begin{bmatrix} d & f \\ g & i \end{bmatrix} + c det \begin{bmatrix} d & e \\ g & h \end{bmatrix}   (2.60)
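Equations 2.58 and 2.60 translate directly into code (a minimal Python sketch; the function names det2 and det3 are illustrative only):

    def det2(a, b, c, d):
        """Determinant of the 2-by-2 matrix [[a, b], [c, d]], equation 2.58."""
        return a * d - b * c

    def det3(m):
        """Determinant of a 3-by-3 matrix m (a list of rows) by cofactor expansion, equation 2.60."""
        (a, b, c), (d, e, f), (g, h, i) = m
        return (a * det2(e, f, h, i)
                - b * det2(d, f, g, i)
                + c * det2(d, e, g, h))

    print(det3([[2, 1, 1], [4, -6, 0], [-2, 7, 2]]))   # -16, for the matrix of equation 2.23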
Determinants are important because of their properties. In particular, if two rows of
a matrix are equal then the determinant is zero, e.g. if

A = \begin{bmatrix} a & b \\ a & b \end{bmatrix}   (2.61)

then

det(A) = ab - ba = 0   (2.62)
In this case the transformation from x = [x_1; x_2]^T to y = [y_1; y_2]^T given by

Ax = y   (2.63)

reduces two pieces of information (x_1 and x_2) to one piece of information

y = y_1 = y_2 = a x_1 + b x_2   (2.64)

In this case it is not possible to reconstruct x from y; the transformation is not
invertible, the matrix A does not have an inverse, and the value of the determinant
provides a test for this: if det(A) = 0 the matrix A is not invertible; it is singular.
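For example (a minimal numpy sketch, with the arbitrary choices a = 2 and b = 3):

    import numpy as np

    A = np.array([[2.0, 3.0],
                  [2.0, 3.0]])             # two equal rows
    print(np.linalg.det(A))                # 0.0, so A is singular
    x = np.array([1.0, 4.0])
    print(A @ x)                           # [14. 14.]: both outputs carry the same single piece of information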
2.8 Eigenanalysis
The square matrix A has eigenvalues \lambda and eigenvectors q if

A q = \lambda q   (2.66)

Therefore

(A - \lambda I) q = 0   (2.67)

To satisfy this equation either q = 0, which is uninteresting, or the matrix A - \lambda I
must reduce q to the null vector (a single point). For this to happen A - \lambda I must
be singular. Hence

det(A - \lambda I) = 0   (2.68)

Eigenanalysis therefore proceeds by (i) solving the above equation to find the eigenvalues
\lambda_i and then (ii) substituting them into equation 2.66 to find the eigenvectors.
For example, if

A = \begin{bmatrix} 4 & -5 \\ 2 & -3 \end{bmatrix}   (2.69)

then

det(A - \lambda I) = (4 - \lambda)(-3 - \lambda) - (-5)(2) = 0   (2.70)

which can be rearranged as

\lambda^2 - \lambda - 2 = 0   (2.71)
(\lambda + 1)(\lambda - 2) = 0

Hence the eigenvalues are \lambda = -1 and \lambda = 2. Substituting back into equation 2.66
gives an eigenvector q_1, for \lambda = -1, which is any multiple of [1; 1]^T. Similarly,
eigenvector q_2, for \lambda = 2, is any multiple of [5; 2]^T.
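This can be checked numerically (a minimal numpy sketch, not from the text; np.linalg.eig returns unit-length eigenvectors, so they appear as scaled versions of those above):

    import numpy as np

    A = np.array([[4.0, -5.0],
                  [2.0, -3.0]])
    evals, evecs = np.linalg.eig(A)
    print(np.sort(evals))                  # [-1.  2.]
    # Each column of evecs is a unit-length eigenvector; rescaling so the second
    # component equals 1 recovers multiples of [1; 1] and [5; 2].
    print(evecs / evecs[1, :])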
We now note that the determinant of a matrix is also equal to the product of its
eigenvalues

det(A) = \prod_k \lambda_k   (2.72)
We also define the Trace of a matrix as the sum of its diagonal elements

Tr(A) = \sum_k a_{kk}   (2.73)

and note that it is also equal to the sum of the eigenvalues

Tr(A) = \sum_k \lambda_k   (2.74)
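Both identities are easy to verify for the matrix of equation 2.69 (a minimal numpy sketch, continuing the example above):

    import numpy as np

    A = np.array([[4.0, -5.0],
                  [2.0, -3.0]])
    evals = np.linalg.eigvals(A)
    print(np.linalg.det(A), np.prod(evals))   # both approximately -2.0 (equation 2.72)
    print(np.trace(A), np.sum(evals))         # both 1.0 (equation 2.74)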
Eigenanalysis applies only to square matrices.
2.9 Gram-Schmidt
A general class of procedures for finding eigenvectors are the deflation methods, of
which QR-decomposition and Gram-Schmidt orthogonalization are examples.
In Gram-Schmidt, we are given a set of vectors, say a, b and c, and we wish to find a
set of corresponding orthonormal vectors which we'll call q_1, q_2 and q_3. To start with
we let

q_1 = \frac{a}{||a||}   (2.75)

We then compute b', which is the original vector b minus the projection vector (see
equation 2.46) of b onto q_1

b' = b - (q_1^T b) q_1   (2.76)

The second orthonormal vector is then a unit length version of b'

q_2 = \frac{b'}{||b'||}   (2.77)

Finally, the third orthonormal vector is given by

q_3 = \frac{c'}{||c'||}   (2.78)

where

c' = c - (q_1^T c) q_1 - (q_2^T c) q_2   (2.79)

In QR-decomposition the Q terms are given by the q_i and the R terms by projections
such as q_i^T c.
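A minimal Python sketch of this procedure (the three input vectors are arbitrary choices for illustration, not from the text):

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalise a list of vectors following equations 2.75-2.79."""
        q = []
        for v in vectors:
            v_prime = v - sum((qi @ v) * qi for qi in q)   # subtract projections onto earlier q's
            q.append(v_prime / np.linalg.norm(v_prime))    # normalise to unit length
        return np.array(q).T                               # orthonormal vectors as the columns of Q

    a = np.array([1.0, 1.0, 0.0])
    b = np.array([1.0, 0.0, 1.0])
    c = np.array([0.0, 1.0, 1.0])
    Q = gram_schmidt([a, b, c])
    print(np.round(Q.T @ Q, 10))   # the identity matrix: the columns of Q are orthonormal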
2.9.1 Diagonalization
If we put the eigenvectors into the columns of a matrix

Q = \begin{bmatrix} | & | &  & | \\ q_1 & q_2 & \cdots & q_d \\ | & | &  & | \end{bmatrix}   (2.80)
then, because Aq_k = \lambda_k q_k, we have

AQ = \begin{bmatrix} | & | &  & | \\ \lambda_1 q_1 & \lambda_2 q_2 & \cdots & \lambda_d q_d \\ | & | &  & | \end{bmatrix}   (2.81)
If we put the eigenvalues into the diagonal matrix \Lambda then the above matrix can also be
written as Q\Lambda. Therefore,

AQ = Q\Lambda   (2.82)

Pre-multiplying both sides by Q^{-1} gives

Q^{-1} A Q = \Lambda   (2.83)
This shows that any square matrix can be converted into a diagonal form (provided
it has distinct eigenvalues; see e.g. [58], p. 255). Sometimes there won't be d distinct
eigenvalues and sometimes they'll be complex.
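A minimal numpy sketch of equation 2.83, reusing the 2-by-2 matrix from equation 2.69 (illustrative only):

    import numpy as np

    A = np.array([[4.0, -5.0],
                  [2.0, -3.0]])
    evals, Q = np.linalg.eig(A)        # the columns of Q are the eigenvectors
    Lam = np.linalg.inv(Q) @ A @ Q     # Q^{-1} A Q, equation 2.83
    print(np.round(Lam, 10))           # a diagonal matrix with the eigenvalues on the diagonal
    print(evals)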
2.11.1 Ellipses
For 2-by-2 matrices, if A = I then the quadratic form f = x^T A x reduces to

f = x_1^2 + x_2^2   (2.94)

which is the equation of a circle with radius \sqrt{f}. If A = kI we have

f/k = x_1^2 + x_2^2   (2.95)

The radius is now \sqrt{f/k}. If A = diag([k_1; k_2]) we have

f = k_1 x_1^2 + k_2 x_2^2   (2.96)

which is the equation of an ellipse. For k_1 > k_2 the major axis has length \sqrt{f/k_2} and
the minor axis has length \sqrt{f/k_1}.
For a non-diagonal A we can diagonalise it using A = Q \Lambda Q^T. This gives

f = \lambda_1 \tilde{x}_1^2 + \lambda_2 \tilde{x}_2^2   (2.97)

where the ellipse now lives in a new co-ordinate system given by the rotation \tilde{x} = Q^T x.
The major and minor axes have lengths \sqrt{f/\lambda_2} and \sqrt{f/\lambda_1}.
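As a closing numerical illustration (a minimal numpy sketch; the symmetric matrix A and the level f = 1 are arbitrary choices, not from the text):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])        # a symmetric, non-diagonal matrix
    f = 1.0                           # the level set x^T A x = f
    lam, Q = np.linalg.eigh(A)        # eigenvalues in ascending order, orthonormal eigenvectors in Q
    print(np.sqrt(f / lam))           # lengths of the major and minor axes of the ellipse
    # In the rotated co-ordinates x_tilde = Q^T x the ellipse is lam[0]*x_tilde_1^2 + lam[1]*x_tilde_2^2 = f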