Lecture 3


Numerical

Linear Algebra
CSE 803 – Fall 2024, MSU

Xiaoming Liu

1
Slides credit: D Fouhey
This Week – Math
Two goals for the next two classes:
• Math with computers ≠ Math
• Practical math you need to know but
may not have been taught

3
This Week – Goal

• Not a “Linear algebra in two lectures” – that’s impossible.
• Some of this you should know!
• Aimed at reviving your knowledge and plugging
any gaps
• Aimed at giving you intuitions

4
Adding Numbers

• 1 + 1 = ?
• Suppose $x_i$ is normally distributed with mean $\mu$
  and standard deviation $\sigma$ for $1 \le i \le N$
• How is the average, $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$,
  distributed (qualitatively), in terms of variance?
• $\hat{\mu}$ has mean $\mu$ and standard deviation $\sigma / \sqrt{N}$

5
Let’s Make It Big

• Suppose I average 50M normally distributed
  numbers (mean: 31, standard deviation: 1)
• For instance: have predicted and actual depth
  for 200 480x640 images and want to know the
  average error (|predicted – actual|)

def average(xs):
    numerator = 0
    for x in xs:
        numerator += x
    return numerator / len(xs)
7
Let’s Make It Big

• What should happen qualitatively?
• Theory says that the average is distributed with
  mean 31 and standard deviation $\frac{1}{\sqrt{50\text{M}}} \approx 10^{-4}$
• What will happen?
• Reality: 17.47

8
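A minimal numpy sketch of the failure mode, not the exact experiment from the slide: once a float32 running sum gets large, adding a value like 31 no longer changes it, so a naive accumulation of 50M numbers stalls. The 2**30 threshold and the 1M-sample array below are illustrative choices.

import numpy as np

total = np.float32(2.0 ** 30)           # a large float32 running sum
print(total + np.float32(31) == total)  # True: 31 is below the spacing (128) at this magnitude

total64 = np.float64(2.0 ** 30)         # float64 spacing here is ~2e-7
print(total64 + 31 == total64)          # False: the float64 sum keeps growing correctly

xs = np.random.normal(31, 1, size=1_000_000).astype(np.float32)
print(xs.mean(dtype=np.float64))        # accumulate in float64 even for float32 data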
Trying it Out

Hmm.

9
What’s a Number?

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
1 0 1 1 1 0 0 1 185
128 + 32 + 16 + 8 + 1 = 185

10
Adding Two Numbers

2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
      1   0   1   1   1   0   0   1   = 185
      0   1   1   0   1   0   0   1   = 105
  1   0   0   1   0   0   0   1   0   =  34  (carry flag set)
“Integers” on a computer are integers modulo 2^k
11
Some Gotchas
32 + (3 / 4) × 40 = 32        Why?
32 + (3 × 40) / 4 = 62

Underflow (integer division):     No underflow:
32 + 3 / 4 × 40                   32 + 3 × 40 / 4
= 32 + 0 × 40                     = 32 + 120 / 4
= 32 + 0                          = 32 + 30
= 32                              = 62

Ok – you have to multiply before dividing
12
Some Gotchas
Should be: 9 × 40 / 10 = 36

math:  32 + (9 × 40) / 10 = 68
uint8: 32 + (9 × 40) / 10 = 42

Overflow:                 Why 104?
32 + 9 × 40 / 10          9 × 40 = 360
= 32 + 104 / 10           360 % 256 = 104
= 32 + 10
= 42

13
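A small numpy sketch of the uint8 behavior above (the values are the ones from the slide; recent numpy versions also emit an overflow warning on the wrap-around, which is a useful hint):

import numpy as np

a = np.uint8(32)
b = np.uint8(9) * np.uint8(40)       # 360 wraps to 360 % 256 = 104
print(b)                             # 104
print(a + b // np.uint8(10))         # 32 + 10 = 42, not the 68 we wanted

print(32 + (9 * 40) // 10)           # 68: plain Python ints don't overflow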
What’s a Number?

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
1 0 1 1 1 0 0 1 = 185
How can we do fractions?

2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2
1 0 1 1 1 0 . 0 1 = 46.25
46 + 0.25
14
Fixed-Point Arithmetic
2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2
1 0 1 1 1 0 . 0 1 = 46.25
What’s the largest number we can represent?
63.75 – Why?
How precisely can we measure at 63?
0.25
How precisely can we measure at 0?
0.25
Fine for many purposes but for science, seems silly
15
Floating Point
Sign (S)   Exponent (E)   Fraction (F)
   1        0 1 1 1          0 0 1
   1           7               1
  -1        2^(7-7) = 2^0 = 1  1 + 1/8 = 1.125

$(-1)^S \; 2^{E - \text{bias}} \; \left(1 + \frac{F}{2^3}\right)$

Bias: allows exponent to be negative. Note: fraction = significand = mantissa;
exponents of all ones or all zeros are special numbers
16
Floating Point
Sign: 1 (−1)    Exponent: 0111 (7 − bias = 0)

Fraction
0/8  000   −2^0 × 1.00  = −1
1/8  001   −2^0 × 1.125 = −1.125
2/8  010   −2^0 × 1.25  = −1.25
…
6/8  110   −2^0 × 1.75  = −1.75
7/8  111   −2^0 × 1.875 = −1.875
18
Floating Point
Sign: 1 (−1)    Exponent: 1001 (9 − bias = 2)

Fraction
0/8  000   −2^2 × 1.00  = −4
1/8  001   −2^2 × 1.125 = −4.5
2/8  010   −2^2 × 1.25  = −5
…
6/8  110   −2^2 × 1.75  = −7
7/8  111   −2^2 × 1.875 = −7.5
19
Floating Point
Sign  Exponent  Fraction
 1     0111      000   −2^0 × 1.00  = −1
 1     0111      001   −2^0 × 1.125 = −1.125

 1     1001      000   −2^2 × 1.00  = −4
 1     1001      001   −2^2 × 1.125 = −4.5

Gap between numbers is relative, not absolute


20
Revisiting Adding Numbers
Sign  Exponent  Fraction
 1     0110      000   −2^−1 × 1.00  = −0.5
 1     1001      000   −2^2  × 1.00  = −4

 1     1001      001   −2^2  × 1.125 = −4.5

Actual implementation is complex

21
Revisiting Adding Numbers
Sign  Exponent  Fraction
 1     0100      000   −2^−3 × 1.00 = −0.125
 1     1001      000   −2^2  × 1.00 = −4
                       true sum: −2^2 × 1.03125 = −4.125

 1     1001      000   −2^2 × 1.00  = −4      ?
 1     1001      001   −2^2 × 1.125 = −4.5    ?
22
Revisiting Adding Numbers
Sign  Exponent  Fraction
 1     0100      000   −2^−3 × 1.00 = −0.125
 1     1001      000   −2^2  × 1.00 = −4
                       true sum: −2^2 × 1.03125 = −4.125
                       rounds to: −2^2 × 1.00 = −4

For a and b, these can happen:
a + b = a        (a + b) − a ≠ b
23
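A quick check of both identities in numpy; the magnitudes are illustrative, chosen so that b falls below a's floating-point spacing:

import numpy as np

a = np.float32(4.0)
b = np.float32(0.125)
print(a + b)                 # 4.125: still exactly representable here

a = np.float32(2.0 ** 24)    # spacing between adjacent float32 values is now 2.0
b = np.float32(0.5)
print(a + b == a)            # True:  a + b = a
print((a + b) - a == b)      # False: (a + b) - a != b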
8-bit minifloat

24
https://en.wikipedia.org/wiki/Minifloat
Revisiting Adding Numbers
IEEE 754 Single Precision / Single
S | Exponent (8 bits) | Fraction (23 bits)
Max ≈ 2^127 ≈ 10^38,  ≈ 7 decimal digits

IEEE 754 Double Precision / Double
S | Exponent (11 bits) | Fraction (52 bits)
Max ≈ 2^1023 ≈ 10^308,  ≈ 15 decimal digits

25
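numpy will report these limits directly; a small sketch (printed values are approximate):

import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    # eps: gap between 1.0 and the next representable number; max: largest finite value
    print(dtype.__name__, info.eps, info.max)
# float32: eps ~1.2e-7 (~7 decimal digits),   max ~3.4e38
# float64: eps ~2.2e-16 (~15 decimal digits), max ~1.8e308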
Trying it Out

Roundoff error occurs: a + b = a, so the
numerator gets stuck while the denominator doesn’t.

26
Take-homes

• Computer numbers aren’t math numbers
• Overflow, accidental zeros, roundoff error, and
  basic equalities are almost certainly incorrect
  for some values
• Floating point defaults and numpy try to protect
  you
• Generally safe to use a double and use built-in
  functions in numpy (not necessarily others!)
• Spooky behavior = look for numerical issues

27
Vectors

[2,3] = 2 × [1,0] + 3 × [0,1]
      = 2 × e1 + 3 × e2

x = [2,3]    Can be an arbitrary # of dimensions
             (typically denoted R^n)

28
Vectors

$\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$,   $x_1 = 2$,  $x_2 = 3$

x = [2,3] Just an array!


Get in the habit of thinking of
them as columns.

29
Scaling Vectors

2x = [4,6] • Can scale vector by a scalar


• Scalar = single number
• Dimensions changed
x = [2,3] independently
• Changes magnitude / length,
does not change direction.

30
Adding Vectors
• Can add vectors
• Dimensions changed independently
• Order irrelevant
• Can change direction and magnitude
x = [2,3] x+y = [5,4]

y = [3,1]

31
Scaling and Adding

2x+y = [7,7]

Can do both at the same time

x = [2,3]

y = [3,1]

32
Measuring Length
Magnitude / length / (L2) norm of a vector:

$\|\mathbf{x}\| = \|\mathbf{x}\|_2 = \left( \sum_i x_i^2 \right)^{1/2}$

x = [2,3]    There are other norms; assume L2
y = [3,1]    unless told otherwise

$\|\mathbf{x}\|_2 = \sqrt{13}$    $\|\mathbf{y}\|_2 = \sqrt{10}$    Why?
33
Normalizing a Vector

x = [2,3]    Dividing by the norm gives
y = [3,1]    something on the unit sphere
             (all vectors with length 1)

$\hat{\mathbf{x}} = \mathbf{x} / \|\mathbf{x}\|_2$
$\hat{\mathbf{y}} = \mathbf{y} / \|\mathbf{y}\|_2$

34
Dot Products
$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = \mathbf{x}^T \mathbf{y}$

$\mathbf{x} \cdot \mathbf{y} = \cos(\theta)\,\|\mathbf{x}\|\,\|\mathbf{y}\|$

What happens with normalized / unit vectors?

35
Dot Products
$\mathbf{x} = [2,3]$      $\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$

What’s $\mathbf{x} \cdot \mathbf{e_1}$, $\mathbf{x} \cdot \mathbf{e_2}$?
Ans: $\mathbf{x} \cdot \mathbf{e_1} = 2$;  $\mathbf{x} \cdot \mathbf{e_2} = 3$
• Dot product is projection
• Amount of x that’s also pointing in direction of y
36
Dot Products
$\mathbf{x} = [2,3]$      $\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$

What’s $\mathbf{x} \cdot \mathbf{x}$?
Ans: $\mathbf{x} \cdot \mathbf{x} = \sum_i x_i x_i = \|\mathbf{x}\|_2^2$

37
Special Angles
$\begin{bmatrix}1\\0\end{bmatrix} \cdot \begin{bmatrix}0\\1\end{bmatrix} = 1*0 + 0*1 = 0$

Perpendicular / orthogonal vectors have dot product 0
irrespective of their magnitude
38
Special Angles
$\begin{bmatrix}x_1\\x_2\end{bmatrix} \cdot \begin{bmatrix}y_1\\y_2\end{bmatrix} = x_1 y_1 + x_2 y_2 = 0$

Perpendicular / orthogonal vectors have dot product 0
irrespective of their magnitude
39
Orthogonal Vectors

𝒙 = [2,3]
• Geometrically,
what’s the set of
vectors that are
orthogonal to x?
• A line [3,-2]

40
Orthogonal Vectors
• What’s the set of vectors that are orthogonal to x = [5,0,0]?
  • A plane/2D space of vectors/any vector [0, a, b]
• What’s the set of vectors that are orthogonal to x and y = [0,5,0]?
  • A line/1D space of vectors/any vector [0, 0, b]
• Ambiguity in sign and magnitude

41
Cross Product
• Set {𝒛: 𝒛 ⋅ 𝒙 = 0, 𝒛 ⋅ 𝒚 = 0} has an
𝒙 ambiguity in sign and magnitude
𝒚 • Cross product 𝒙×𝒚 is: (1)
orthogonal to x, y (2) has sign
given by right hand rule and (3)
𝒙×𝒚 has magnitude given by area of
parallelogram of x and y
• Important: if x and y are the same
direction or either is 0, then 𝒙×𝒚 =
𝟎.
• Only in 3D!
42
Image credit: Wikipedia.org
Operations You Should Know

• Scale (vector, scalar → vector)


• Add (vector, vector → vector)
• Magnitude (vector → scalar)
• Dot product (vector, vector → scalar)
• Dot products are projection / angles
• Cross product (vector, vector → vector)
• Vectors facing same direction have cross product 0
• You can never mix vectors of different sizes

43
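All of these map directly onto numpy; a minimal sketch using the x = [2,3], y = [3,1] vectors from earlier slides (the 3D vectors for the cross product are added just for illustration):

import numpy as np

x = np.array([2.0, 3.0])
y = np.array([3.0, 1.0])

print(2 * x)                      # scale: [4, 6]
print(x + y)                      # add: [5, 4]
print(np.linalg.norm(x))          # magnitude: sqrt(13)
print(x / np.linalg.norm(x))      # unit vector
print(np.dot(x, y))               # dot product: 2*3 + 3*1 = 9
print(np.cross([5.0, 0, 0], [0, 5.0, 0]))   # cross product (3D only): [0, 0, 25]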
Matrices
Horizontally concatenate n, m-dim column vectors
and you get an m×n matrix A (here 2×3):

$\mathbf{A} = [\mathbf{v}_1, \cdots, \mathbf{v}_n] = \begin{bmatrix} v_{1,1} & v_{2,1} & v_{3,1} \\ v_{1,2} & v_{2,2} & v_{3,2} \end{bmatrix}$

a (scalar): lowercase, undecorated
𝒂 (vector): lowercase, bold or arrow
𝑨 (matrix): uppercase, bold

44
Matrices
Transpose: flip rows / columns
$\begin{bmatrix} a \\ b \\ c \end{bmatrix}^T = [a\ \ b\ \ c]$    (3×1)ᵀ = 1×3

Vertically concatenate m, n-dim row vectors
and you get an m×n matrix A (here 2×3):

$\mathbf{A} = \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_m^T \end{bmatrix} = \begin{bmatrix} u_{1,1} & u_{1,2} & u_{1,3} \\ u_{2,1} & u_{2,2} & u_{2,3} \end{bmatrix}$
45
Matrix-Vector Product
$\mathbf{y}_{2\times1} = \mathbf{A}_{2\times3}\,\mathbf{x}_{3\times1}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{v_1} & \mathbf{v_2} & \mathbf{v_3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

$\mathbf{y} = x_1 \mathbf{v_1} + x_2 \mathbf{v_2} + x_3 \mathbf{v_3}$
Linear combination of columns of A

46
Matrix-Vector Product
$\mathbf{y}_{2\times1} = \mathbf{A}_{2\times3}\,\mathbf{x}_{3\times1}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{u_1}^T \\ \mathbf{u_2}^T \end{bmatrix} \mathbf{x}$

$y_1 = \mathbf{u_1}^T \mathbf{x}$,   $y_2 = \mathbf{u_2}^T \mathbf{x}$
Dot product between rows of A and x

47
Matrix Multiplication
Generally: $\mathbf{A}_{m\times n}$ and $\mathbf{B}_{n\times p}$ yield product $(\mathbf{AB})_{m\times p}$

$\mathbf{AB} = \begin{bmatrix} -\ \mathbf{a_1}^T\ - \\ \vdots \\ -\ \mathbf{a_m}^T\ - \end{bmatrix} \begin{bmatrix} | & & | \\ \mathbf{b_1} & \cdots & \mathbf{b_p} \\ | & & | \end{bmatrix}$

Yes – in A, I’m referring to the rows, and in B,
I’m referring to the columns

48
Matrix Multiplication
Generally: $\mathbf{A}_{m\times n}$ and $\mathbf{B}_{n\times p}$ yield product $(\mathbf{AB})_{m\times p}$

$\mathbf{AB} = \begin{bmatrix} \mathbf{a_1}^T\mathbf{b_1} & \cdots & \mathbf{a_1}^T\mathbf{b_p} \\ \vdots & \ddots & \vdots \\ \mathbf{a_m}^T\mathbf{b_1} & \cdots & \mathbf{a_m}^T\mathbf{b_p} \end{bmatrix}$

$(\mathbf{AB})_{ij} = \mathbf{a_i}^T \mathbf{b_j}$
49
Matrix Multiplication

• Dimensions must match
• Dimensions must match
• Dimensions must match
• (Yes, it’s associative): ABx = (A)(Bx) = (AB)x
• (No, it’s not commutative): ABx ≠ (BA)x ≠ (BxA)

50
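A small numpy sketch of these rules (the shapes are illustrative): @ is matrix multiplication, mismatched inner dimensions raise an error, and associativity holds while commutativity does not.

import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)
x = np.random.rand(4)

print((A @ B).shape)                          # (2, 4): inner dims (3) must match
print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True: associative
# B @ A would raise a ValueError: the dimensions don't match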
Operations They Don’t Teach
You Probably Saw Matrix Addition
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}$

What is this? FYI: e is a scalar

$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a+e & b+e \\ c+e & d+e \end{bmatrix}$

51
Broadcasting
If you want to be pedantic and proper, you expand
e by multiplying a matrix of 1s (denoted 1)
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \mathbf{1}_{2\times2}\,e = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & e \\ e & e \end{bmatrix}$
Many smart matrix libraries do this automatically.
This is the source of many bugs.

52
Broadcasting Example
Given: an n×2 matrix P and a 2D column vector v,
Want: n×2 difference matrix D

$\mathbf{P} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}$   $\mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}$   $\mathbf{D} = \begin{bmatrix} x_1 - a & y_1 - b \\ \vdots & \vdots \\ x_n - a & y_n - b \end{bmatrix}$

$\mathbf{P} - \mathbf{v}^T = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix} - \begin{bmatrix} a & b \\ \vdots & \vdots \\ a & b \end{bmatrix}$   (the second matrix is assumed / broadcast)
53
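A sketch of this exact pattern in numpy (names P, v, D match the slide; the data is made up). A (2,) vector broadcasts against an (n, 2) matrix row-wise, which is what we want here, but the same silent expansion is where broadcasting bugs come from.

import numpy as np

P = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])      # n x 2 points
v = np.array([1.0, 2.0])        # the (a, b) vector, stored with shape (2,)

D = P - v                       # v is broadcast across all n rows
print(D)                        # [[-1, -1], [1, 1], [3, 3]]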
Two Uses for Matrices

1. Storing things in a rectangular array (images,


maps)
• Typical operations: element-wise operations,
convolution (which we’ll cover next)
• Atypical operations: almost anything you learned in
a math linear algebra class
2. A linear operator that maps vectors to
another space (Ax)
• Typical/Atypical: reverse of above

54
Images as Matrices
Suppose someone hands you this matrix.
What’s wrong with it?

55
Contrast – Gamma curve
Typical way to change the contrast
is to apply a nonlinear correction:

new value = (pixel value)^γ

The quantity γ controls how much
contrast gets added

56
Contrast – Gamma curve
Now the darkest regions (10th pctile)
are much darker than the moderately
dark regions (50th pctile).

[plot: new 10th / 50th / 90th percentile values after the gamma curve]

57
Implementation
Python+Numpy (right way):
imNew = im**expFactor          # e.g., expFactor = 4
Python+Numpy (slow way – why?):
imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor

58
Numerical
Linear Algebra
CSE 803 – Fall 2024, MSU

Xiaoming Liu

59
Images as Matrices
Suppose someone hands you this matrix.
The contrast is wrong!

60
Results
Phew! Much Better.

61
Implementation
Python+Numpy (right way):
imNew = im**expFactor          # e.g., expFactor = 4
Python+Numpy (slow way – why?):
imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor

62
Element-wise Operations
Element-wise power – beware notation
$(\mathbf{A}^2)_{ij} = A_{ij}^2$

“Hadamard Product” / element-wise multiplication
$(\mathbf{A} \odot \mathbf{B})_{ij} = A_{ij} * B_{ij}$

Element-wise division
$(\mathbf{A} / \mathbf{B})_{ij} = \frac{A_{ij}}{B_{ij}}$
63
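In numpy these are simply **, *, and / on arrays of the same shape (a sketch; note A @ B would be the matrix product instead):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A ** 2)    # element-wise power
print(A * B)     # Hadamard / element-wise product
print(A / B)     # element-wise division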
Sums Across Axes
Suppose we have an N×2 matrix $\mathbf{A} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{bmatrix}$

N-D col. vec.:  $\Sigma(\mathbf{A}, 1) = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_N + y_N \end{bmatrix}$

2-D row vec.:  $\Sigma(\mathbf{A}, 0) = \left[ \sum_{i=1}^{N} x_i ,\; \sum_{i=1}^{N} y_i \right]$

Note – libraries distinguish between N-D column vector and Nx1 matrix.
64
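A quick numpy sketch of the two sums and of the N-D vector vs N×1 matrix distinction the note warns about (keepdims controls which one you get):

import numpy as np

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])      # N x 2

print(np.sum(A, axis=1))                  # shape (3,): N-D vector [11, 22, 33]
print(np.sum(A, axis=1, keepdims=True))   # shape (3, 1): N x 1 matrix
print(np.sum(A, axis=0))                  # shape (2,): [6, 60]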
Vectorizing Example

• Suppose I represent each image as a 128-dimensional vector
• I want to compute all the pairwise distances between
  {x1, …, xN} and {y1, …, yM} so I can find, for every xi,
  the nearest yj
• Identity: $\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y}$
• Or: $\|\mathbf{x} - \mathbf{y}\| = \left( \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y} \right)^{1/2}$

65
Vectorizing Example
$\mathbf{X} = \begin{bmatrix} -\ \mathbf{x}_1\ - \\ \vdots \\ -\ \mathbf{x}_N\ - \end{bmatrix}$   $\mathbf{Y} = \begin{bmatrix} -\ \mathbf{y}_1\ - \\ \vdots \\ -\ \mathbf{y}_M\ - \end{bmatrix}$   $\mathbf{Y}^T = \begin{bmatrix} | & & | \\ \mathbf{y}_1 & \cdots & \mathbf{y}_M \\ | & & | \end{bmatrix}$

Compute an N×1 vector of norms (can also do M×1):
$\Sigma(\mathbf{X}^2, 1) = \begin{bmatrix} \|\mathbf{x_1}\|^2 \\ \vdots \\ \|\mathbf{x_N}\|^2 \end{bmatrix}$

Compute an N×M matrix of dot products:
$(\mathbf{X}\mathbf{Y}^T)_{ij} = \mathbf{x}_i^T \mathbf{y}_j$

66
Vectorizing Example
$\mathbf{D} = \left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T \right)^{1/2}$

$\begin{bmatrix} \|\mathbf{x_1}\|^2 \\ \vdots \\ \|\mathbf{x_N}\|^2 \end{bmatrix} + \begin{bmatrix} \|\mathbf{y_1}\|^2 & \cdots & \|\mathbf{y_M}\|^2 \end{bmatrix} = \begin{bmatrix} \|\mathbf{x_1}\|^2 + \|\mathbf{y_1}\|^2 & \cdots & \|\mathbf{x_1}\|^2 + \|\mathbf{y_M}\|^2 \\ \vdots & \ddots & \vdots \\ \|\mathbf{x_N}\|^2 + \|\mathbf{y_1}\|^2 & \cdots & \|\mathbf{x_N}\|^2 + \|\mathbf{y_M}\|^2 \end{bmatrix}$   Why?

$\left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T \right)_{ij} = \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2$
67
Vectorizing Example
$\mathbf{D} = \left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T \right)^{1/2}$

$\mathbf{D}_{ij}^2 = \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2 - 2\mathbf{x}_i^T\mathbf{y}_j$

Numpy code:
XNorm = np.sum(X**2,axis=1,keepdims=True)
YNorm = np.sum(Y**2,axis=1,keepdims=True)
D = (XNorm+YNorm.T-2*np.dot(X,Y.T))**0.5
*May have to make sure this is at least 0
(sometimes roundoff issues happen)
68
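Putting the slide's snippet into a self-contained function (a sketch; the clamp to zero handles the roundoff issue the slide footnotes, and the final check against a direct norm is only there to verify the result):

import numpy as np

def pairwise_distances(X, Y):
    # X: N x d, Y: M x d  ->  D: N x M with D[i, j] = ||x_i - y_j||
    XNorm = np.sum(X**2, axis=1, keepdims=True)        # N x 1
    YNorm = np.sum(Y**2, axis=1, keepdims=True)        # M x 1
    D2 = XNorm + YNorm.T - 2 * np.dot(X, Y.T)          # broadcasts to N x M
    return np.sqrt(np.maximum(D2, 0))                  # clamp tiny negatives from roundoff

X, Y = np.random.rand(300, 128), np.random.rand(400, 128)
D = pairwise_distances(X, Y)
print(np.allclose(D[0, 0], np.linalg.norm(X[0] - Y[0])))   # True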
Does it Make a Difference?

Computing pairwise distances between 300 and


400 128-dimensional vectors
1. for x in X, for y in Y, using native python: 9s
2. for x in X, for y in Y, using numpy to compute
distance: 0.8s
3. vectorized: 0.0045s (~2000x faster than 1,
175x faster than 2)
Expressing things in primitives that are
optimized is usually faster

69
Linear Independence
A set of vectors is linearly independent if you can’t
write one as a linear combination of the others.
Suppose: $\mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$   $\mathbf{b} = \begin{bmatrix} 0 \\ 6 \\ 0 \end{bmatrix}$   $\mathbf{c} = \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}$

$\mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ 4 \end{bmatrix} = 2\mathbf{a}$    $\mathbf{y} = \begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix} = \frac{1}{2}\mathbf{a} - \frac{1}{3}\mathbf{b}$
• Is the set {a,b,c} linearly independent?
• Is the set {a,b,x} linearly independent?
• Max # of independent 3D vectors?
70
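One way to check these answers numerically is via the rank of the matrix whose columns are the vectors; a sketch using the a, b, c, x from this slide:

import numpy as np

a = np.array([0.0, 0.0, 2.0])
b = np.array([0.0, 6.0, 0.0])
c = np.array([5.0, 0.0, 0.0])
x = 2 * a

print(np.linalg.matrix_rank(np.column_stack([a, b, c])))   # 3: linearly independent
print(np.linalg.matrix_rank(np.column_stack([a, b, x])))   # 2: x = 2a is redundant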
Span
Span: all linear
combinations of a
set of vectors

Span({[0,2]}) = ?
The vertical line through the origin = $\{\lambda [0,1] : \lambda \in \mathbb{R}\}$
Is blue in {red}’s span?
71
Span
Span: all linear
combinations of a
set of vectors

Span({ , }) = ?

72
Span
Span: all linear
combinations of a
set of vectors

Span({ , }) = ?

73
Matrix-Vector Product
$\mathbf{A}\mathbf{x} = \begin{bmatrix} | & & | \\ \mathbf{c_1} & \cdots & \mathbf{c_n} \\ | & & | \end{bmatrix} \mathbf{x}$

Right-multiplying A by x mixes columns of A
according to the entries of x

• The output space of f(x) = Ax is constrained to
  be the span of the columns of A.
• Can’t output things you can’t construct out of
  your columns

74
An Intuition
$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c_1} & \mathbf{c_2} & \mathbf{c_n} \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

[diagram: knobs x1, x2, x3 feed the machine Ax, which produces outputs y1, y2, y3]
x – knobs on machine (e.g., fuel, brakes)
y – state of the world (e.g., where you are)
A – machine (e.g., your car)
75
Linear Independence
Suppose the columns of 3x3 matrix A are not
linearly independent (c1, αc1, c2 for instance)
$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c_1} & \alpha\mathbf{c_1} & \mathbf{c_2} \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

$\mathbf{y} = x_1 \mathbf{c_1} + \alpha x_2 \mathbf{c_1} + x_3 \mathbf{c_2}$
$\mathbf{y} = (x_1 + \alpha x_2)\,\mathbf{c_1} + x_3 \mathbf{c_2}$

76
Linear Independence Intuition
Knobs of x are redundant. Even if y has 3
outputs, you can only control it in two directions
$\mathbf{y} = (x_1 + \alpha x_2)\,\mathbf{c_1} + x_3 \mathbf{c_2}$

[diagram: three knobs x1, x2, x3, but the machine Ax can only move y in two directions]

77
Linear Independence
Recall: $\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\mathbf{c_1} + x_3 \mathbf{c_2}$

$\mathbf{y} = \mathbf{A} \begin{bmatrix} x_1 + \beta \\ x_2 - \beta/\alpha \\ x_3 \end{bmatrix} = \left( x_1 + \beta + \alpha\left(x_2 - \frac{\beta}{\alpha}\right) \right) \mathbf{c_1} + x_3 \mathbf{c_2}$

• Can write y an infinite number of ways by
  adding $\beta$ to $x_1$ and subtracting $\beta/\alpha$ from $x_2$
• Or, given a vector y there’s not a unique
  vector x s.t. y = Ax
• Not all y have a corresponding x s.t. y = Ax
78
Linear Independence
$\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\mathbf{c_1} + x_3 \mathbf{c_2}$

$\mathbf{y} = \mathbf{A} \begin{bmatrix} \beta \\ -\beta/\alpha \\ 0 \end{bmatrix} = \left( \beta - \alpha\frac{\beta}{\alpha} \right) \mathbf{c_1} + 0\,\mathbf{c_2} = \mathbf{0}$

• What else can we cancel out?
• An infinite number of non-zero vectors x can
  map to a zero-vector y
• Called the right null-space of A.

79
Rank

• Rank of an n×n matrix A – number of linearly
  independent columns (or rows) of A / the
  dimension of the span of the columns
• Matrices with full rank (n x n, rank n) behave
nicely: can be inverted, span the full output
space, are one-to-one.
• Matrices with full rank are machines where
every knob is useful and every output state can
be made by the machine

80
Inverses

• Given 𝒚 = 𝑨𝒙, y is a linear combination of


columns of A proportional to x. If A is full-rank,
we should be able to invert this mapping.
• Given some y (output) and A, what x (inputs)
produced it?
• x = A-1y

81
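A sketch of inverting the mapping in numpy; in practice np.linalg.solve is preferred over forming A⁻¹ explicitly (the 3×3 matrix below is random and therefore full rank with probability 1):

import numpy as np

A = np.random.rand(3, 3)           # full rank (almost surely)
x = np.array([1.0, 2.0, 3.0])
y = A @ x                          # forward: outputs from knob settings

x_rec = np.linalg.solve(A, y)      # recover the knobs; same as np.linalg.inv(A) @ y
print(np.allclose(x_rec, x))       # True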
Symmetric Matrices

• Symmetric: $\mathbf{A}^T = \mathbf{A}$, or $\mathbf{A}_{ij} = \mathbf{A}_{ji}$
  $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$
• Have lots of special properties

Any matrix of the form $\mathbf{A} = \mathbf{X}^T\mathbf{X}$ is symmetric.

Quick check:
$\mathbf{A}^T = (\mathbf{X}^T\mathbf{X})^T$
$\mathbf{A}^T = \mathbf{X}^T(\mathbf{X}^T)^T$
$\mathbf{A}^T = \mathbf{X}^T\mathbf{X} = \mathbf{A}$
82
Special Matrices – Rotations
$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$

• Rotation matrices $\mathbf{R}$ rotate vectors and do not
  change vector L2 norms ($\|\mathbf{R}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$)
• Every row/column is unit norm
• Every row is linearly independent
• Transpose is inverse: $\mathbf{R}\mathbf{R}^T = \mathbf{R}^T\mathbf{R} = \mathbf{I}$
• Determinant is 1 (otherwise it’s also a coordinate
  flip/reflection); eigenvalues have magnitude 1
83
Eigensystems

• An eigenvector $\mathbf{v}_i$ and eigenvalue $\lambda_i$ of a matrix $\mathbf{A}$
  satisfy $\mathbf{A}\mathbf{v}_i = \lambda_i \mathbf{v}_i$ ($\mathbf{A}\mathbf{v}_i$ is scaled by $\lambda_i$)
• Vectors and values are always paired and
  typically you assume $\|\mathbf{v}_i\|_2 = 1$
• Biggest eigenvalue of A gives bounds on how
  much $f(\mathbf{x}) = \mathbf{A}\mathbf{x}$ stretches a vector x.
• Hints of what people really mean:
  • “Largest eigenvector” = vector w/ largest value
  • Spectral just means there’s eigenvectors
84
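A small check of the definition with numpy (using a symmetric matrix so the eigenvalues are real; np.linalg.eigh is the variant for symmetric input):

import numpy as np

X = np.random.rand(4, 3)
A = X.T @ X                          # symmetric 3x3

vals, vecs = np.linalg.eigh(A)       # eigenvalues ascending, eigenvectors as columns
v, lam = vecs[:, -1], vals[-1]       # "largest eigenvector" = vector with largest eigenvalue
print(np.allclose(A @ v, lam * v))   # True: A v = lambda v
print(np.linalg.norm(v))             # 1.0: returned eigenvectors are unit norm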
Suppose I have points in a grid

85
Now I apply f(x) = Ax to these points
Pointy-end: Ax . Non-Pointy-End: x
86
$\mathbf{A} = \begin{bmatrix} 1.1 & 0 \\ 0 & 1.1 \end{bmatrix}$

Red box – unit square, Blue box – after f(x) = Ax.


What are the yellow lines and why?
87
$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$

Now I apply f(x) = Ax to these points


Pointy-end: Ax . Non-Pointy-End: x
88
$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$

Red box – unit square, Blue box – after f(x) = Ax.


What are the yellow lines and why?
89
$\mathbf{A} = \begin{bmatrix} \cos(t) & -\sin(t) \\ \sin(t) & \cos(t) \end{bmatrix}$

Red box – unit square, Blue box – after f(x) = Ax.


Can we draw any yellow lines?
90
Eigenvectors of Symmetric Matrices

• Always n mutually orthogonal eigenvectors
  with n (not necessarily distinct) eigenvalues
• For symmetric $\mathbf{A}$, the eigenvector with the largest
  eigenvalue maximizes $\frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$ (the smallest minimizes it)
• So for unit vectors (where $\mathbf{x}^T\mathbf{x} = 1$), that
  eigenvector maximizes $\mathbf{x}^T\mathbf{A}\mathbf{x}$
• A surprisingly large number of optimization
  problems rely on (max/min)imizing this
91
The Singular Value Decomposition
Can always write an m×n matrix A as: $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$

[diagram: A = U Σ, where Σ = diag(σ1, σ2, σ3) padded with zeros]

U: rotation – eigenvectors of $\mathbf{A}\mathbf{A}^T$
Σ: scale – square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$
92
The Singular Value Decomposition
Can always write an m×n matrix A as: $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$

U: rotation – eigenvectors of $\mathbf{A}\mathbf{A}^T$
Σ: scale – square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$
$\mathbf{V}^T$: rotation – eigenvectors of $\mathbf{A}^T\mathbf{A}$
93
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
  number of linearly independent vectors
• “Closest” matrix to A with a lower rank

[diagram: A = U diag(σ1, σ2, σ3) V^T]
94
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
  number of linearly independent vectors
• “Closest” matrix to A with a lower rank

[diagram: Â = U diag(σ1, σ2, 0) V^T – zero out the smallest singular value]
95
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation


• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
• Secretly behind basically many things you do
with matrices

96
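A sketch of both facts with numpy: the number of nonzero singular values equals the rank, and zeroing the smallest singular values gives the closest lower-rank matrix (the rank-2 construction below is only for illustration):

import numpy as np

A = np.outer([1.0, 2, 3], [4.0, 5, 6]) + np.outer([1.0, 0, 1], [0.0, 1, 0])   # rank 2
U, S, Vt = np.linalg.svd(A)
print(np.sum(S > 1e-10))                             # 2 nonzero singular values = rank

k = 1
A_lowrank = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]    # best rank-1 approximation to A
print(np.linalg.norm(A - A_lowrank))                 # small residual (= sigma_2 here)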
Solving Least-Squares
Start with two points $(x_i, y_i)$

$\mathbf{y} = \mathbf{A}\mathbf{v}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} m x_1 + b \\ m x_2 + b \end{bmatrix}$

We know how to solve this –
invert A and find v (i.e., the (m, b)
that fits the points)

97
Solving Least-Squares
Start with two points $(x_i, y_i)$

$\mathbf{y} = \mathbf{A}\mathbf{v}$,   $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$

$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2 = \left\| \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - \begin{bmatrix} m x_1 + b \\ m x_2 + b \end{bmatrix} \right\|_2^2 = \left( y_1 - (m x_1 + b) \right)^2 + \left( y_2 - (m x_2 + b) \right)^2$

The sum of squared differences between
the actual value of y and
what the model says y should be.
98
Solving Least-Squares
Suppose there are n > 2 points
$\mathbf{y} = \mathbf{A}\mathbf{v}$,   $\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$

Compute $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2$ again:

$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2$

99
Solving Least-Squares
Given y, A, and v with y = Av overdetermined
(A tall / more equations than unknowns),
we want to minimize $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2$, or find:

$\arg\min_{\mathbf{v}} \|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2$

(the value of v that makes the expression smallest)

Solution satisfies $(\mathbf{A}^T\mathbf{A})\,\mathbf{v}^* = \mathbf{A}^T\mathbf{y}$
or
$\mathbf{v}^* = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
100
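A sketch of the line fit in numpy; np.linalg.lstsq solves the same problem as the normal equations (the data here is synthetic noisy points around m = 2, b = 1):

import numpy as np

n = 50
x = np.linspace(0, 10, n)
y = 2 * x + 1 + 0.1 * np.random.randn(n)       # noisy line, true m = 2, b = 1

A = np.column_stack([x, np.ones(n)])           # each row is [x_i, 1]
v, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes ||y - A v||^2
m, b = v
print(m, b)                                    # close to 2 and 1

v_normal = np.linalg.solve(A.T @ A, A.T @ y)   # same answer via A^T A v = A^T y
print(np.allclose(v, v_normal))                # True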
When is Least-Squares Possible?
Given y, A, and v. Want y = Av

[diagram: y = A v with A square]
Want n outputs, have n knobs to fiddle with;
every knob is useful if A is full rank.

[diagram: y = A v with A tall]
A: rows (outputs) > columns (knobs). Thus can’t
get the precise output you want (not enough knobs).
So settle for the “closest” knob setting.
101
When is Least-Squares Possible?
Given y, A, and v. Want y = Av

[diagram: y = A v with A square]
Want n outputs, have n knobs to fiddle with;
every knob is useful if A is full rank.

[diagram: y = A v with A wide]
A: columns (knobs) > rows (outputs). Thus, any
output can be expressed in infinite ways.

102
Homogeneous Least-Squares
Given a set of unit vectors (aka directions) 𝒙𝟏 , … , 𝒙𝒏
and I want vector 𝒗 that is as orthogonal to all the 𝒙𝒊
as possible (for some definition of orthogonal)
Stack the $\mathbf{x}_i$ into A, compute Av:

$\mathbf{A}\mathbf{v} = \begin{bmatrix} -\ \mathbf{x}_1^T\ - \\ \vdots \\ -\ \mathbf{x}_n^T\ - \end{bmatrix} \mathbf{v} = \begin{bmatrix} \mathbf{x}_1^T\mathbf{v} \\ \vdots \\ \mathbf{x}_n^T\mathbf{v} \end{bmatrix}$   (each entry is 0 if orthogonal)

Compute $\|\mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n} \left( \mathbf{x}_i^T\mathbf{v} \right)^2$

Sum of how orthogonal v is to each x
103
Homogeneous Least-Squares

• A lot of times, given a matrix A we want to find
  the v that minimizes $\|\mathbf{A}\mathbf{v}\|_2$
• I.e., want $\mathbf{v}^* = \arg\min_{\mathbf{v}} \|\mathbf{A}\mathbf{v}\|_2^2$
• What’s a trivial solution?
• Set v = 0 → Av = 0
• Exclude this by forcing v to have unit norm

104
Homogeneous Least-Squares
Let’s look at $\|\mathbf{A}\mathbf{v}\|_2^2$:

$\|\mathbf{A}\mathbf{v}\|_2^2 = (\mathbf{A}\mathbf{v})^T(\mathbf{A}\mathbf{v})$   (rewrite as dot product)
$\|\mathbf{A}\mathbf{v}\|_2^2 = \mathbf{v}^T\mathbf{A}^T\mathbf{A}\mathbf{v} = \mathbf{v}^T(\mathbf{A}^T\mathbf{A})\mathbf{v}$   (distribute the transpose)

We want the vector minimizing this quadratic form.
Where have we seen this?

105
Homogeneous Least-Squares
Ubiquitous tool in vision:

$\arg\min_{\|\mathbf{v}\|_2 = 1} \|\mathbf{A}\mathbf{v}\|_2^2$

(1) “Smallest”* eigenvector of $\mathbf{A}^T\mathbf{A}$
(2) “Smallest” right singular vector of $\mathbf{A}$

For min → max, switch smallest → largest

*Note: $\mathbf{A}^T\mathbf{A}$ is positive semi-definite so it has all non-negative eigenvalues


106
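A sketch of the recipe with numpy: the minimizer over unit vectors is the right singular vector with the smallest singular value (equivalently, the eigenvector of A^T A with the smallest eigenvalue). The data directions below are made up so that the answer should be near [0, 0, ±1].

import numpy as np

A = np.random.randn(100, 3)                    # directions nearly in the xy-plane
A[:, 2] *= 0.01
A /= np.linalg.norm(A, axis=1, keepdims=True)  # make them unit vectors

U, S, Vt = np.linalg.svd(A)
v = Vt[-1]                         # right singular vector for the smallest singular value
print(v, np.linalg.norm(A @ v))    # v ~ [0, 0, ±1]; the residual ||Av|| is small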
Derivatives

Remember derivatives?

Derivative: rate at which a function f(x) changes


at a point as well as the direction that increases
the function

107
Given quadratic function f(x)
$f(x) = (x - 2)^2 + 5$

$f(x)$ is a function

$g(x) = f'(x)$,  aka  $g(x) = \frac{d}{dx} f(x)$

108
Given quadratic function f(x)
$f(x) = (x - 2)^2 + 5$

What’s special about x = 2?

$f(x)$ is minimized at 2
$g(x) = 0$ at 2

a = minimum of f → $g(a) = 0$

Reverse is not true

109
Rates of change
$f(x) = (x - 2)^2 + 5$

Suppose I want to
increase f(x) by
changing x:

Blue area: move left


Red area: move right

Derivative tells you


direction of ascent
and rate

110
What Calculus Should I Know

• Really need intuition


• Need chain rule
• Rest you should look up / use a computer
algebra system / use a cookbook
• Partial derivatives (and that’s it from
multivariable calculus)

111
Partial Derivatives

• Pretend other variables are constant, take a
  derivative. That’s it.
• Make our function a function of two variables:

$f(x) = (x - 2)^2 + 5$
$\frac{\partial}{\partial x} f(x) = 2(x - 2) * 1 = 2(x - 2)$

$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$    (pretend $(y+1)^2$ is constant → derivative = 0)
$\frac{\partial}{\partial x} f_2(x, y) = 2(x - 2)$
112
Zooming Out
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$
Dark = f(x,y) low
Bright = f(x,y) high

113
Taking a slice of
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

Slice of y = 0 is the function from before:
$f(x) = (x - 2)^2 + 5$
$\frac{\partial f}{\partial x}(x) = 2(x - 2)$

114
Taking a slice of
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

$\frac{\partial}{\partial x} f_2(x, y)$ is the rate of
change & direction in the x dimension

115
Zooming Out
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

$\frac{\partial}{\partial y} f_2(x, y)$ is $2(y + 1)$
and is the rate of change & direction
in the y dimension

116
Zooming Out
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

Gradient/Jacobian: making a vector of

$\nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]$

gives rate and direction of change.

Arrows point OUT of minimum / basin.

117
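A small sketch checking the gradient of this f2 numerically with finite differences, a handy habit whenever you derive gradients by hand; the probe point and step size h are arbitrary choices.

import numpy as np

def f2(x, y):
    return (x - 2) ** 2 + 5 + (y + 1) ** 2

def grad_f2(x, y):
    return np.array([2 * (x - 2), 2 * (y + 1)])   # the partials from the slides

x0, y0, h = 0.5, 3.0, 1e-6
numeric = np.array([(f2(x0 + h, y0) - f2(x0 - h, y0)) / (2 * h),
                    (f2(x0, y0 + h) - f2(x0, y0 - h)) / (2 * h)])
print(np.allclose(grad_f2(x0, y0), numeric))       # True
print(grad_f2(2.0, -1.0))                          # [0, 0]: gradient vanishes at the minimum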
What Should I Know?

• Gradients are simply partial derivatives per-dimension:
  if $\mathbf{x}$ in $f(\mathbf{x})$ has n dimensions, $\nabla f(\mathbf{x})$ has n dimensions
• Gradients point in direction of ascent and tell
  the rate of ascent
• If a is a minimum of $f(\mathbf{x})$ → $\nabla f(\mathbf{a}) = \mathbf{0}$
• Reverse is not true, especially in high-dimensional spaces

118
119
