Lecture 3


Numerical

Linear Algebra
CSE 803 – Fall 2024, MSU

Xiaoming Liu

1
Slides credit: D Fouhey
This Week – Math
Two goals for the next two classes:
• Math with computers ≠ Math
• Practical math you need to know but
may not have been taught

3
This Week – Goal

• Not a “Linear algebra in two lectures” – that’s impossible.
• Some of this you should know!
• Aimed at reviving your knowledge and plugging
any gaps
• Aimed at giving you intuitions

4
Adding Numbers

• 1 + 1 = ?
• Suppose $x_i$ is normally distributed with mean $\mu$
  and standard deviation $\sigma$ for $1 \le i \le N$
• How is the average, $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$,
  distributed (qualitatively), in terms of variance?
• $\hat{\mu}$ has mean $\mu$ and standard deviation $\sigma / \sqrt{N}$

5
Let’s Make It Big

• Suppose I average 50M normally distributed
  numbers (mean: 31, standard deviation: 1)
• For instance: have predicted and actual depth
  for 200 480x640 images and want to know the
  average error (|predicted – actual|)

def average(xs):
    numerator = 0
    for x in xs:
        numerator += x
    return numerator / len(xs)
7
Let’s Make It Big

• What should happen qualitatively?
• Theory says that the average is distributed with
  mean 31 and standard deviation $\frac{1}{\sqrt{50\text{M}}} \approx 10^{-4}$
• What will happen?
• Reality: 17.47

8
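A minimal numpy sketch of the failure mode, not the exact experiment from the slide: once a float32 running sum gets large, adding a value like 31 no longer changes it, so a naive accumulation of 50M numbers stalls. The 2**30 threshold and the 1M-sample array below are illustrative choices.

import numpy as np

total = np.float32(2.0 ** 30)           # a large float32 running sum
print(total + np.float32(31) == total)  # True: 31 is below the spacing (128) at this magnitude

total64 = np.float64(2.0 ** 30)         # float64 spacing here is ~2e-7
print(total64 + 31 == total64)          # False: the float64 sum keeps growing correctly

xs = np.random.normal(31, 1, size=1_000_000).astype(np.float32)
print(xs.mean(dtype=np.float64))        # accumulate in float64 even for float32 data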
Trying it Out

Hmm.

9
What’s a Number?

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
1 0 1 1 1 0 0 1 185
128 + 32 + 16 + 8 + 1 = 185

10
Adding Two Numbers

2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
      1   0   1   1   1   0   0   1   = 185
      0   1   1   0   1   0   0   1   = 105
  1   0   0   1   0   0   0   1   0   =  34  (carry flag set)
“Integers” on a computer are integers modulo 2^k
11
Some Gotchas
32 + (3 / 4) × 40 = 32        Why?
32 + (3 × 40) / 4 = 62

Underflow (integer division):     No underflow:
32 + 3 / 4 × 40                   32 + 3 × 40 / 4
= 32 + 0 × 40                     = 32 + 120 / 4
= 32 + 0                          = 32 + 30
= 32                              = 62

Ok – you have to multiply before dividing
12
Some Gotchas
Should be: 9 × 40 / 10 = 36

math:  32 + (9 × 40) / 10 = 68
uint8: 32 + (9 × 40) / 10 = 42

Overflow:                 Why 104?
32 + 9 × 40 / 10          9 × 40 = 360
= 32 + 104 / 10           360 % 256 = 104
= 32 + 10
= 42

13
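A small numpy sketch of the uint8 behavior above (the values are the ones from the slide; recent numpy versions also emit an overflow warning on the wrap-around, which is a useful hint):

import numpy as np

a = np.uint8(32)
b = np.uint8(9) * np.uint8(40)       # 360 wraps to 360 % 256 = 104
print(b)                             # 104
print(a + b // np.uint8(10))         # 32 + 10 = 42, not the 68 we wanted

print(32 + (9 * 40) // 10)           # 68: plain Python ints don't overflow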
What’s a Number?

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
1 0 1 1 1 0 0 1 = 185
How can we do fractions?

2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2
1 0 1 1 1 0 . 0 1 = 46.25
46 + 0.25
14
Fixed-Point Arithmetic
2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2
1 0 1 1 1 0 . 0 1 = 46.25
What’s the largest number we can represent?
63.75 – Why?
How precisely can we measure at 63?
0.25
How precisely can we measure at 0?
0.25
Fine for many purposes but for science, seems silly
15
Floating Point
Sign (S)   Exponent (E)   Fraction (F)
   1        0 1 1 1          0 0 1
   1           7               1
  -1        2^(7-7) = 2^0 = 1  1 + 1/8 = 1.125

$(-1)^S \; 2^{E - \text{bias}} \; \left(1 + \frac{F}{2^3}\right)$

Bias: allows exponent to be negative. Note: fraction = significand = mantissa;
exponents of all ones or all zeros are special numbers
16
Floating Point
Sign: 1 (−1)    Exponent: 0111 (7 − bias = 0)

Fraction
0/8  000   −2^0 × 1.00  = −1
1/8  001   −2^0 × 1.125 = −1.125
2/8  010   −2^0 × 1.25  = −1.25
…
6/8  110   −2^0 × 1.75  = −1.75
7/8  111   −2^0 × 1.875 = −1.875
18
Floating Point
Sign: 1 (−1)    Exponent: 1001 (9 − bias = 2)

Fraction
0/8  000   −2^2 × 1.00  = −4
1/8  001   −2^2 × 1.125 = −4.5
2/8  010   −2^2 × 1.25  = −5
…
6/8  110   −2^2 × 1.75  = −7
7/8  111   −2^2 × 1.875 = −7.5
19
Floating Point
Sign  Exponent  Fraction
 1     0111      000   −2^0 × 1.00  = −1
 1     0111      001   −2^0 × 1.125 = −1.125

 1     1001      000   −2^2 × 1.00  = −4
 1     1001      001   −2^2 × 1.125 = −4.5

Gap between numbers is relative, not absolute


20
Revisiting Adding Numbers
Sign  Exponent  Fraction
 1     0110      000   −2^−1 × 1.00  = −0.5
 1     1001      000   −2^2  × 1.00  = −4

 1     1001      001   −2^2  × 1.125 = −4.5

Actual implementation is complex

21
Revisiting Adding Numbers
Sign  Exponent  Fraction
 1     0100      000   −2^−3 × 1.00 = −0.125
 1     1001      000   −2^2  × 1.00 = −4
                       true sum: −2^2 × 1.03125 = −4.125

 1     1001      000   −2^2 × 1.00  = −4      ?
 1     1001      001   −2^2 × 1.125 = −4.5    ?
22
Revisiting Adding Numbers
Sign  Exponent  Fraction
 1     0100      000   −2^−3 × 1.00 = −0.125
 1     1001      000   −2^2  × 1.00 = −4
                       true sum: −2^2 × 1.03125 = −4.125
                       rounds to: −2^2 × 1.00 = −4

For a and b, these can happen:
a + b = a        (a + b) − a ≠ b
23
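A quick check of both identities in numpy; the magnitudes are illustrative, chosen so that b falls below a's floating-point spacing:

import numpy as np

a = np.float32(4.0)
b = np.float32(0.125)
print(a + b)                 # 4.125: still exactly representable here

a = np.float32(2.0 ** 24)    # spacing between adjacent float32 values is now 2.0
b = np.float32(0.5)
print(a + b == a)            # True:  a + b = a
print((a + b) - a == b)      # False: (a + b) - a != b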
8-bit minifloat

24
https://en.wikipedia.org/wiki/Minifloat
Revisiting Adding Numbers
IEEE 754 Single Precision / Single
S | Exponent (8 bits) | Fraction (23 bits)
Max ≈ 2^127 ≈ 10^38,  ≈ 7 decimal digits

IEEE 754 Double Precision / Double
S | Exponent (11 bits) | Fraction (52 bits)
Max ≈ 2^1023 ≈ 10^308,  ≈ 15 decimal digits

25
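numpy will report these limits directly; a small sketch (printed values are approximate):

import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    # eps: gap between 1.0 and the next representable number; max: largest finite value
    print(dtype.__name__, info.eps, info.max)
# float32: eps ~1.2e-7 (~7 decimal digits),   max ~3.4e38
# float64: eps ~2.2e-16 (~15 decimal digits), max ~1.8e308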
Trying it Out

Roundoff error occurs: a + b = a, so the
numerator gets stuck while the denominator doesn’t.

26
Take-homes

• Computer numbers aren’t math numbers
• Overflow, accidental zeros, roundoff error, and
  basic equalities are almost certainly incorrect
  for some values
• Floating point defaults and numpy try to protect
  you
• Generally safe to use a double and use built-in
  functions in numpy (not necessarily others!)
• Spooky behavior = look for numerical issues

27
Vectors

[2,3] = 2 × [1,0] + 3 × [0,1]
      = 2 × e1 + 3 × e2

x = [2,3]    Can be an arbitrary # of dimensions
             (typically denoted R^n)

28
Vectors

$\mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$,   $x_1 = 2$,  $x_2 = 3$

x = [2,3] Just an array!


Get in the habit of thinking of
them as columns.

29
Scaling Vectors

2x = [4,6] • Can scale vector by a scalar


• Scalar = single number
• Dimensions changed
x = [2,3] independently
• Changes magnitude / length,
does not change direction.

30
Adding Vectors
• Can add vectors
• Dimensions changed independently
• Order irrelevant
• Can change direction and magnitude
x = [2,3] x+y = [5,4]

y = [3,1]

31
Scaling and Adding

2x+y = [7,7]

Can do both at the same time

x = [2,3]

y = [3,1]

32
Measuring Length
Magnitude / length / (L2) norm of a vector:

$\|\mathbf{x}\| = \|\mathbf{x}\|_2 = \left( \sum_i x_i^2 \right)^{1/2}$

x = [2,3]    There are other norms; assume L2
y = [3,1]    unless told otherwise

$\|\mathbf{x}\|_2 = \sqrt{13}$    $\|\mathbf{y}\|_2 = \sqrt{10}$    Why?
33
Normalizing a Vector

x = [2,3]    Dividing by the norm gives
y = [3,1]    something on the unit sphere
             (all vectors with length 1)

$\hat{\mathbf{x}} = \mathbf{x} / \|\mathbf{x}\|_2$
$\hat{\mathbf{y}} = \mathbf{y} / \|\mathbf{y}\|_2$

34
Dot Products
$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i = \mathbf{x}^T \mathbf{y}$

$\mathbf{x} \cdot \mathbf{y} = \cos(\theta)\,\|\mathbf{x}\|\,\|\mathbf{y}\|$

What happens with normalized / unit vectors?

35
Dot Products
$\mathbf{x} = [2,3]$      $\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$

What’s $\mathbf{x} \cdot \mathbf{e_1}$, $\mathbf{x} \cdot \mathbf{e_2}$?
Ans: $\mathbf{x} \cdot \mathbf{e_1} = 2$;  $\mathbf{x} \cdot \mathbf{e_2} = 3$
• Dot product is projection
• Amount of x that’s also pointing in direction of y
36
Dot Products
$\mathbf{x} = [2,3]$      $\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$

What’s $\mathbf{x} \cdot \mathbf{x}$?
Ans: $\mathbf{x} \cdot \mathbf{x} = \sum_i x_i x_i = \|\mathbf{x}\|_2^2$

37
Special Angles
$\begin{bmatrix}1\\0\end{bmatrix} \cdot \begin{bmatrix}0\\1\end{bmatrix} = 1*0 + 0*1 = 0$

Perpendicular / orthogonal vectors have dot product 0
irrespective of their magnitude
38
Special Angles
$\begin{bmatrix}x_1\\x_2\end{bmatrix} \cdot \begin{bmatrix}y_1\\y_2\end{bmatrix} = x_1 y_1 + x_2 y_2 = 0$

Perpendicular / orthogonal vectors have dot product 0
irrespective of their magnitude
39
Orthogonal Vectors

𝒙 = [2,3]
• Geometrically,
what’s the set of
vectors that are
orthogonal to x?
• A line [3,-2]

40
Orthogonal Vectors
• What’s the set of vectors that are orthogonal to x = [5,0,0]?
  • A plane/2D space of vectors/any vector [0, a, b]
• What’s the set of vectors that are orthogonal to x and y = [0,5,0]?
  • A line/1D space of vectors/any vector [0, 0, b]
• Ambiguity in sign and magnitude

41
Cross Product
• Set {𝒛: 𝒛 ⋅ 𝒙 = 0, 𝒛 ⋅ 𝒚 = 0} has an
𝒙 ambiguity in sign and magnitude
𝒚 • Cross product 𝒙×𝒚 is: (1)
orthogonal to x, y (2) has sign
given by right hand rule and (3)
𝒙×𝒚 has magnitude given by area of
parallelogram of x and y
• Important: if x and y are the same
direction or either is 0, then 𝒙×𝒚 =
𝟎.
• Only in 3D!
42
Image credit: Wikipedia.org
Operations You Should Know

• Scale (vector, scalar → vector)


• Add (vector, vector → vector)
• Magnitude (vector → scalar)
• Dot product (vector, vector → scalar)
• Dot products are projection / angles
• Cross product (vector, vector → vector)
• Vectors facing same direction have cross product 0
• You can never mix vectors of different sizes

43
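All of these map directly onto numpy; a minimal sketch using the x = [2,3], y = [3,1] vectors from earlier slides (the 3D vectors for the cross product are added just for illustration):

import numpy as np

x = np.array([2.0, 3.0])
y = np.array([3.0, 1.0])

print(2 * x)                      # scale: [4, 6]
print(x + y)                      # add: [5, 4]
print(np.linalg.norm(x))          # magnitude: sqrt(13)
print(x / np.linalg.norm(x))      # unit vector
print(np.dot(x, y))               # dot product: 2*3 + 3*1 = 9
print(np.cross([5.0, 0, 0], [0, 5.0, 0]))   # cross product (3D only): [0, 0, 25]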
Matrices
Horizontally concatenate n, m-dim column vectors
and you get an m×n matrix A (here 2×3):

$\mathbf{A} = [\mathbf{v}_1, \cdots, \mathbf{v}_n] = \begin{bmatrix} v_{1,1} & v_{2,1} & v_{3,1} \\ v_{1,2} & v_{2,2} & v_{3,2} \end{bmatrix}$

a (scalar): lowercase, undecorated
𝒂 (vector): lowercase, bold or arrow
𝑨 (matrix): uppercase, bold

44
Matrices
Transpose: flip rows / columns
$\begin{bmatrix} a \\ b \\ c \end{bmatrix}^T = [a\ \ b\ \ c]$    (3×1)ᵀ = 1×3

Vertically concatenate m, n-dim row vectors
and you get an m×n matrix A (here 2×3):

$\mathbf{A} = \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_m^T \end{bmatrix} = \begin{bmatrix} u_{1,1} & u_{1,2} & u_{1,3} \\ u_{2,1} & u_{2,2} & u_{2,3} \end{bmatrix}$
45
Matrix-Vector Product
$\mathbf{y}_{2\times1} = \mathbf{A}_{2\times3}\,\mathbf{x}_{3\times1}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{v_1} & \mathbf{v_2} & \mathbf{v_3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

$\mathbf{y} = x_1 \mathbf{v_1} + x_2 \mathbf{v_2} + x_3 \mathbf{v_3}$
Linear combination of columns of A

46
Matrix-Vector Product
$\mathbf{y}_{2\times1} = \mathbf{A}_{2\times3}\,\mathbf{x}_{3\times1}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{u_1}^T \\ \mathbf{u_2}^T \end{bmatrix} \mathbf{x}$

$y_1 = \mathbf{u_1}^T \mathbf{x}$,   $y_2 = \mathbf{u_2}^T \mathbf{x}$
Dot product between rows of A and x

47
Matrix Multiplication
Generally: $\mathbf{A}_{m\times n}$ and $\mathbf{B}_{n\times p}$ yield product $(\mathbf{AB})_{m\times p}$

$\mathbf{AB} = \begin{bmatrix} -\ \mathbf{a_1}^T\ - \\ \vdots \\ -\ \mathbf{a_m}^T\ - \end{bmatrix} \begin{bmatrix} | & & | \\ \mathbf{b_1} & \cdots & \mathbf{b_p} \\ | & & | \end{bmatrix}$

Yes – in A, I’m referring to the rows, and in B,
I’m referring to the columns

48
Matrix Multiplication
Generally: $\mathbf{A}_{m\times n}$ and $\mathbf{B}_{n\times p}$ yield product $(\mathbf{AB})_{m\times p}$

$\mathbf{AB} = \begin{bmatrix} \mathbf{a_1}^T\mathbf{b_1} & \cdots & \mathbf{a_1}^T\mathbf{b_p} \\ \vdots & \ddots & \vdots \\ \mathbf{a_m}^T\mathbf{b_1} & \cdots & \mathbf{a_m}^T\mathbf{b_p} \end{bmatrix}$

$(\mathbf{AB})_{ij} = \mathbf{a_i}^T \mathbf{b_j}$
49
Matrix Multiplication

• Dimensions must match
• Dimensions must match
• Dimensions must match
• (Yes, it’s associative): ABx = (A)(Bx) = (AB)x
• (No, it’s not commutative): ABx ≠ (BA)x ≠ (BxA)

50
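A small numpy sketch of these rules (the shapes are illustrative): @ is matrix multiplication, mismatched inner dimensions raise an error, and associativity holds while commutativity does not.

import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)
x = np.random.rand(4)

print((A @ B).shape)                          # (2, 4): inner dims (3) must match
print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True: associative
# B @ A would raise a ValueError: the dimensions don't match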
Operations They Don’t Teach
You Probably Saw Matrix Addition
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}$

What is this? FYI: e is a scalar

$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a+e & b+e \\ c+e & d+e \end{bmatrix}$

51
Broadcasting
If you want to be pedantic and proper, you expand
e by multiplying a matrix of 1s (denoted 1)
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + e = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \mathbf{1}_{2\times2}\,e = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & e \\ e & e \end{bmatrix}$
Many smart matrix libraries do this automatically.
This is the source of many bugs.

52
Broadcasting Example
Given: an n×2 matrix P and a 2D column vector v,
Want: n×2 difference matrix D

$\mathbf{P} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}$   $\mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}$   $\mathbf{D} = \begin{bmatrix} x_1 - a & y_1 - b \\ \vdots & \vdots \\ x_n - a & y_n - b \end{bmatrix}$

$\mathbf{P} - \mathbf{v}^T = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix} - \begin{bmatrix} a & b \\ \vdots & \vdots \\ a & b \end{bmatrix}$   (the second matrix is assumed / broadcast)
53
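A sketch of this exact pattern in numpy (names P, v, D match the slide; the data is made up). A (2,) vector broadcasts against an (n, 2) matrix row-wise, which is what we want here, but the same silent expansion is where broadcasting bugs come from.

import numpy as np

P = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])      # n x 2 points
v = np.array([1.0, 2.0])        # the (a, b) vector, stored with shape (2,)

D = P - v                       # v is broadcast across all n rows
print(D)                        # [[-1, -1], [1, 1], [3, 3]]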
Two Uses for Matrices

1. Storing things in a rectangular array (images,


maps)
• Typical operations: element-wise operations,
convolution (which we’ll cover next)
• Atypical operations: almost anything you learned in
a math linear algebra class
2. A linear operator that maps vectors to
another space (Ax)
• Typical/Atypical: reverse of above

54
Images as Matrices
Suppose someone hands you this matrix.
What’s wrong with it?

55
Contrast – Gamma curve
Typical way to change the contrast
is to apply a nonlinear correction:

new value = (pixel value)^γ

The quantity γ controls how much
contrast gets added

56
Contrast – Gamma curve
Now the darkest regions (10th pctile)
are much darker than the moderately
dark regions (50th pctile).

[plot: new 10th / 50th / 90th percentile values after the gamma curve]

57
Implementation
Python+Numpy (right way):
imNew = im**expFactor          # e.g., expFactor = 4
Python+Numpy (slow way – why?):
imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor

58
Numerical
Linear Algebra
CSE 803 – Fall 2024, MSU

Xiaoming Liu

59
Images as Matrices
Suppose someone hands you this matrix.
The contrast is wrong!

60
Results
Phew! Much Better.

61
Implementation
Python+Numpy (right way):
imNew = im**expFactor          # e.g., expFactor = 4
Python+Numpy (slow way – why?):
imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor

62
Element-wise Operations
Element-wise power – beware notation
$(\mathbf{A}^2)_{ij} = A_{ij}^2$

“Hadamard Product” / element-wise multiplication
$(\mathbf{A} \odot \mathbf{B})_{ij} = A_{ij} * B_{ij}$

Element-wise division
$(\mathbf{A} / \mathbf{B})_{ij} = \frac{A_{ij}}{B_{ij}}$
63
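In numpy these are simply **, *, and / on arrays of the same shape (a sketch; note A @ B would be the matrix product instead):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A ** 2)    # element-wise power
print(A * B)     # Hadamard / element-wise product
print(A / B)     # element-wise division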
Sums Across Axes
Suppose we have an N×2 matrix $\mathbf{A} = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{bmatrix}$

N-D col. vec.:  $\Sigma(\mathbf{A}, 1) = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_N + y_N \end{bmatrix}$

2-D row vec.:  $\Sigma(\mathbf{A}, 0) = \left[ \sum_{i=1}^{N} x_i ,\; \sum_{i=1}^{N} y_i \right]$

Note – libraries distinguish between N-D column vector and Nx1 matrix.
64
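A quick numpy sketch of the two sums and of the N-D vector vs N×1 matrix distinction the note warns about (keepdims controls which one you get):

import numpy as np

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])      # N x 2

print(np.sum(A, axis=1))                  # shape (3,): N-D vector [11, 22, 33]
print(np.sum(A, axis=1, keepdims=True))   # shape (3, 1): N x 1 matrix
print(np.sum(A, axis=0))                  # shape (2,): [6, 60]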
Vectorizing Example

• Suppose I represent each image as a 128-dimensional vector
• I want to compute all the pairwise distances between
  {x1, …, xN} and {y1, …, yM} so I can find, for every xi,
  the nearest yj
• Identity: $\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y}$
• Or: $\|\mathbf{x} - \mathbf{y}\| = \left( \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\mathbf{x}^T\mathbf{y} \right)^{1/2}$

65
Vectorizing Example
$\mathbf{X} = \begin{bmatrix} -\ \mathbf{x}_1\ - \\ \vdots \\ -\ \mathbf{x}_N\ - \end{bmatrix}$   $\mathbf{Y} = \begin{bmatrix} -\ \mathbf{y}_1\ - \\ \vdots \\ -\ \mathbf{y}_M\ - \end{bmatrix}$   $\mathbf{Y}^T = \begin{bmatrix} | & & | \\ \mathbf{y}_1 & \cdots & \mathbf{y}_M \\ | & & | \end{bmatrix}$

Compute an N×1 vector of norms (can also do M×1):
$\Sigma(\mathbf{X}^2, 1) = \begin{bmatrix} \|\mathbf{x_1}\|^2 \\ \vdots \\ \|\mathbf{x_N}\|^2 \end{bmatrix}$

Compute an N×M matrix of dot products:
$(\mathbf{X}\mathbf{Y}^T)_{ij} = \mathbf{x}_i^T \mathbf{y}_j$

66
Vectorizing Example
$\mathbf{D} = \left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T \right)^{1/2}$

$\begin{bmatrix} \|\mathbf{x_1}\|^2 \\ \vdots \\ \|\mathbf{x_N}\|^2 \end{bmatrix} + \begin{bmatrix} \|\mathbf{y_1}\|^2 & \cdots & \|\mathbf{y_M}\|^2 \end{bmatrix} = \begin{bmatrix} \|\mathbf{x_1}\|^2 + \|\mathbf{y_1}\|^2 & \cdots & \|\mathbf{x_1}\|^2 + \|\mathbf{y_M}\|^2 \\ \vdots & \ddots & \vdots \\ \|\mathbf{x_N}\|^2 + \|\mathbf{y_1}\|^2 & \cdots & \|\mathbf{x_N}\|^2 + \|\mathbf{y_M}\|^2 \end{bmatrix}$   Why?

$\left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T \right)_{ij} = \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2$
67
Vectorizing Example
$\mathbf{D} = \left( \Sigma(\mathbf{X}^2, 1) + \Sigma(\mathbf{Y}^2, 1)^T - 2\mathbf{X}\mathbf{Y}^T \right)^{1/2}$

$\mathbf{D}_{ij}^2 = \|\mathbf{x}_i\|^2 + \|\mathbf{y}_j\|^2 - 2\mathbf{x}_i^T\mathbf{y}_j$

Numpy code:
XNorm = np.sum(X**2,axis=1,keepdims=True)
YNorm = np.sum(Y**2,axis=1,keepdims=True)
D = (XNorm+YNorm.T-2*np.dot(X,Y.T))**0.5
*May have to make sure this is at least 0
(sometimes roundoff issues happen)
68
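Putting the slide's snippet into a self-contained function (a sketch; the clamp to zero handles the roundoff issue the slide footnotes, and the final check against a direct norm is only there to verify the result):

import numpy as np

def pairwise_distances(X, Y):
    # X: N x d, Y: M x d  ->  D: N x M with D[i, j] = ||x_i - y_j||
    XNorm = np.sum(X**2, axis=1, keepdims=True)        # N x 1
    YNorm = np.sum(Y**2, axis=1, keepdims=True)        # M x 1
    D2 = XNorm + YNorm.T - 2 * np.dot(X, Y.T)          # broadcasts to N x M
    return np.sqrt(np.maximum(D2, 0))                  # clamp tiny negatives from roundoff

X, Y = np.random.rand(300, 128), np.random.rand(400, 128)
D = pairwise_distances(X, Y)
print(np.allclose(D[0, 0], np.linalg.norm(X[0] - Y[0])))   # True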
Does it Make a Difference?

Computing pairwise distances between 300 and


400 128-dimensional vectors
1. for x in X, for y in Y, using native python: 9s
2. for x in X, for y in Y, using numpy to compute
distance: 0.8s
3. vectorized: 0.0045s (~2000x faster than 1,
175x faster than 2)
Expressing things in primitives that are
optimized is usually faster

69
Linear Independence
A set of vectors is linearly independent if you can’t
write one as a linear combination of the others.
Suppose: $\mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$   $\mathbf{b} = \begin{bmatrix} 0 \\ 6 \\ 0 \end{bmatrix}$   $\mathbf{c} = \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}$

$\mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ 4 \end{bmatrix} = 2\mathbf{a}$    $\mathbf{y} = \begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix} = \frac{1}{2}\mathbf{a} - \frac{1}{3}\mathbf{b}$
• Is the set {a,b,c} linearly independent?
• Is the set {a,b,x} linearly independent?
• Max # of independent 3D vectors?
70
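One way to check these answers numerically is via the rank of the matrix whose columns are the vectors; a sketch using the a, b, c, x from this slide:

import numpy as np

a = np.array([0.0, 0.0, 2.0])
b = np.array([0.0, 6.0, 0.0])
c = np.array([5.0, 0.0, 0.0])
x = 2 * a

print(np.linalg.matrix_rank(np.column_stack([a, b, c])))   # 3: linearly independent
print(np.linalg.matrix_rank(np.column_stack([a, b, x])))   # 2: x = 2a is redundant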
Span
Span: all linear
combinations of a
set of vectors

Span({[0,2]}) = ?
The vertical line through the origin = $\{\lambda [0,1] : \lambda \in \mathbb{R}\}$
Is blue in {red}’s span?
71
Span
Span: all linear
combinations of a
set of vectors

Span({ , }) = ?

72
Span
Span: all linear
combinations of a
set of vectors

Span({ , }) = ?

73
Matrix-Vector Product
$\mathbf{A}\mathbf{x} = \begin{bmatrix} | & & | \\ \mathbf{c_1} & \cdots & \mathbf{c_n} \\ | & & | \end{bmatrix} \mathbf{x}$

Right-multiplying A by x mixes columns of A
according to the entries of x

• The output space of f(x) = Ax is constrained to
  be the span of the columns of A.
• Can’t output things you can’t construct out of
  your columns

74
An Intuition
$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c_1} & \mathbf{c_2} & \mathbf{c_n} \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

[diagram: knobs x1, x2, x3 feed the machine Ax, which produces outputs y1, y2, y3]
x – knobs on machine (e.g., fuel, brakes)
y – state of the world (e.g., where you are)
A – machine (e.g., your car)
75
Linear Independence
Suppose the columns of 3x3 matrix A are not
linearly independent (c1, αc1, c2 for instance)
$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} | & | & | \\ \mathbf{c_1} & \alpha\mathbf{c_1} & \mathbf{c_2} \\ | & | & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

$\mathbf{y} = x_1 \mathbf{c_1} + \alpha x_2 \mathbf{c_1} + x_3 \mathbf{c_2}$
$\mathbf{y} = (x_1 + \alpha x_2)\,\mathbf{c_1} + x_3 \mathbf{c_2}$

76
Linear Independence Intuition
Knobs of x are redundant. Even if y has 3
outputs, you can only control it in two directions
$\mathbf{y} = (x_1 + \alpha x_2)\,\mathbf{c_1} + x_3 \mathbf{c_2}$

[diagram: three knobs x1, x2, x3, but the machine Ax can only move y in two directions]

77
Linear Independence
Recall: $\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\mathbf{c_1} + x_3 \mathbf{c_2}$

$\mathbf{y} = \mathbf{A} \begin{bmatrix} x_1 + \beta \\ x_2 - \beta/\alpha \\ x_3 \end{bmatrix} = \left( x_1 + \beta + \alpha\left(x_2 - \frac{\beta}{\alpha}\right) \right) \mathbf{c_1} + x_3 \mathbf{c_2}$

• Can write y an infinite number of ways by
  adding $\beta$ to $x_1$ and subtracting $\beta/\alpha$ from $x_2$
• Or, given a vector y there’s not a unique
  vector x s.t. y = Ax
• Not all y have a corresponding x s.t. y = Ax
78
Linear Independence
$\mathbf{A}\mathbf{x} = (x_1 + \alpha x_2)\mathbf{c_1} + x_3 \mathbf{c_2}$

$\mathbf{y} = \mathbf{A} \begin{bmatrix} \beta \\ -\beta/\alpha \\ 0 \end{bmatrix} = \left( \beta - \alpha\frac{\beta}{\alpha} \right) \mathbf{c_1} + 0\,\mathbf{c_2} = \mathbf{0}$

• What else can we cancel out?
• An infinite number of non-zero vectors x can
  map to a zero-vector y
• Called the right null-space of A.

79
Rank

• Rank of an n×n matrix A – number of linearly
  independent columns (or rows) of A / the
  dimension of the span of the columns
• Matrices with full rank (n x n, rank n) behave
nicely: can be inverted, span the full output
space, are one-to-one.
• Matrices with full rank are machines where
every knob is useful and every output state can
be made by the machine

80
Inverses

• Given 𝒚 = 𝑨𝒙, y is a linear combination of


columns of A proportional to x. If A is full-rank,
we should be able to invert this mapping.
• Given some y (output) and A, what x (inputs)
produced it?
• x = A-1y

81
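A sketch of inverting the mapping in numpy; in practice np.linalg.solve is preferred over forming A⁻¹ explicitly (the 3×3 matrix below is random and therefore full rank with probability 1):

import numpy as np

A = np.random.rand(3, 3)           # full rank (almost surely)
x = np.array([1.0, 2.0, 3.0])
y = A @ x                          # forward: outputs from knob settings

x_rec = np.linalg.solve(A, y)      # recover the knobs; same as np.linalg.inv(A) @ y
print(np.allclose(x_rec, x))       # True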
Symmetric Matrices

• Symmetric: $\mathbf{A}^T = \mathbf{A}$, or $\mathbf{A}_{ij} = \mathbf{A}_{ji}$
  $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$
• Have lots of special properties

Any matrix of the form $\mathbf{A} = \mathbf{X}^T\mathbf{X}$ is symmetric.

Quick check:
$\mathbf{A}^T = (\mathbf{X}^T\mathbf{X})^T$
$\mathbf{A}^T = \mathbf{X}^T(\mathbf{X}^T)^T$
$\mathbf{A}^T = \mathbf{X}^T\mathbf{X} = \mathbf{A}$
82
Special Matrices – Rotations
$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$

• Rotation matrices $\mathbf{R}$ rotate vectors and do not
  change vector L2 norms ($\|\mathbf{R}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$)
• Every row/column is unit norm
• Every row is linearly independent
• Transpose is inverse: $\mathbf{R}\mathbf{R}^T = \mathbf{R}^T\mathbf{R} = \mathbf{I}$
• Determinant is 1 (otherwise it’s also a coordinate
  flip/reflection); eigenvalues have magnitude 1
83
Eigensystems

• An eigenvector $\mathbf{v}_i$ and eigenvalue $\lambda_i$ of a matrix $\mathbf{A}$
  satisfy $\mathbf{A}\mathbf{v}_i = \lambda_i \mathbf{v}_i$ ($\mathbf{A}\mathbf{v}_i$ is scaled by $\lambda_i$)
• Vectors and values are always paired and
  typically you assume $\|\mathbf{v}_i\|_2 = 1$
• Biggest eigenvalue of A gives bounds on how
  much $f(\mathbf{x}) = \mathbf{A}\mathbf{x}$ stretches a vector x.
• Hints of what people really mean:
  • “Largest eigenvector” = vector w/ largest value
  • Spectral just means there’s eigenvectors
84
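A small check of the definition with numpy (using a symmetric matrix so the eigenvalues are real; np.linalg.eigh is the variant for symmetric input):

import numpy as np

X = np.random.rand(4, 3)
A = X.T @ X                          # symmetric 3x3

vals, vecs = np.linalg.eigh(A)       # eigenvalues ascending, eigenvectors as columns
v, lam = vecs[:, -1], vals[-1]       # "largest eigenvector" = vector with largest eigenvalue
print(np.allclose(A @ v, lam * v))   # True: A v = lambda v
print(np.linalg.norm(v))             # 1.0: returned eigenvectors are unit norm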
Suppose I have points in a grid

85
Now I apply f(x) = Ax to these points
Pointy-end: Ax . Non-Pointy-End: x
86
$\mathbf{A} = \begin{bmatrix} 1.1 & 0 \\ 0 & 1.1 \end{bmatrix}$

Red box – unit square, Blue box – after f(x) = Ax.


What are the yellow lines and why?
87
$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$

Now I apply f(x) = Ax to these points


Pointy-end: Ax . Non-Pointy-End: x
88
$\mathbf{A} = \begin{bmatrix} 0.8 & 0 \\ 0 & 1.25 \end{bmatrix}$

Red box – unit square, Blue box – after f(x) = Ax.


What are the yellow lines and why?
89
$\mathbf{A} = \begin{bmatrix} \cos(t) & -\sin(t) \\ \sin(t) & \cos(t) \end{bmatrix}$

Red box – unit square, Blue box – after f(x) = Ax.


Can we draw any yellow lines?
90
Eigenvectors of Symmetric Matrices

• Always n mutually orthogonal eigenvectors
  with n (not necessarily distinct) eigenvalues
• For symmetric $\mathbf{A}$, the eigenvector with the largest
  eigenvalue maximizes $\frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$ (the smallest minimizes it)
• So for unit vectors (where $\mathbf{x}^T\mathbf{x} = 1$), that
  eigenvector maximizes $\mathbf{x}^T\mathbf{A}\mathbf{x}$
• A surprisingly large number of optimization
  problems rely on (max/min)imizing this
91
The Singular Value Decomposition
Can always write an m×n matrix A as: $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$

[diagram: A = U Σ, where Σ = diag(σ1, σ2, σ3) padded with zeros]

U: rotation – eigenvectors of $\mathbf{A}\mathbf{A}^T$
Σ: scale – square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$
92
The Singular Value Decomposition
Can always write an m×n matrix A as: $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$

U: rotation – eigenvectors of $\mathbf{A}\mathbf{A}^T$
Σ: scale – square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$
$\mathbf{V}^T$: rotation – eigenvectors of $\mathbf{A}^T\mathbf{A}$
93
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
  number of linearly independent vectors
• “Closest” matrix to A with a lower rank

[diagram: A = U diag(σ1, σ2, σ3) V^T]
94
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
  number of linearly independent vectors
• “Closest” matrix to A with a lower rank

[diagram: Â = U diag(σ1, σ2, 0) V^T – zero out the smallest singular value]
95
Singular Value Decomposition

• Every matrix is a rotation, scaling, and rotation


• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
• Secretly behind basically many things you do
with matrices

96
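A sketch of both facts with numpy: the number of nonzero singular values equals the rank, and zeroing the smallest singular values gives the closest lower-rank matrix (the rank-2 construction below is only for illustration):

import numpy as np

A = np.outer([1.0, 2, 3], [4.0, 5, 6]) + np.outer([1.0, 0, 1], [0.0, 1, 0])   # rank 2
U, S, Vt = np.linalg.svd(A)
print(np.sum(S > 1e-10))                             # 2 nonzero singular values = rank

k = 1
A_lowrank = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]    # best rank-1 approximation to A
print(np.linalg.norm(A - A_lowrank))                 # small residual (= sigma_2 here)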
Solving Least-Squares
Start with two points $(x_i, y_i)$

$\mathbf{y} = \mathbf{A}\mathbf{v}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} m x_1 + b \\ m x_2 + b \end{bmatrix}$

We know how to solve this –
invert A and find v (i.e., the (m, b)
that fits the points)

97
Solving Least-Squares
Start with two points $(x_i, y_i)$

$\mathbf{y} = \mathbf{A}\mathbf{v}$,   $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$

$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2 = \left\| \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - \begin{bmatrix} m x_1 + b \\ m x_2 + b \end{bmatrix} \right\|_2^2 = \left( y_1 - (m x_1 + b) \right)^2 + \left( y_2 - (m x_2 + b) \right)^2$

The sum of squared differences between
the actual value of y and
what the model says y should be.
98
Solving Least-Squares
Suppose there are n > 2 points
$\mathbf{y} = \mathbf{A}\mathbf{v}$,   $\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}$

Compute $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2$ again:

$\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2$

99
Solving Least-Squares
Given y, A, and v with y = Av overdetermined
(A tall / more equations than unknowns),
we want to minimize $\|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2$, or find:

$\arg\min_{\mathbf{v}} \|\mathbf{y} - \mathbf{A}\mathbf{v}\|_2^2$

(the value of v that makes the expression smallest)

Solution satisfies $(\mathbf{A}^T\mathbf{A})\,\mathbf{v}^* = \mathbf{A}^T\mathbf{y}$
or
$\mathbf{v}^* = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
100
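A sketch of the line fit in numpy; np.linalg.lstsq solves the same problem as the normal equations (the data here is synthetic noisy points around m = 2, b = 1):

import numpy as np

n = 50
x = np.linspace(0, 10, n)
y = 2 * x + 1 + 0.1 * np.random.randn(n)       # noisy line, true m = 2, b = 1

A = np.column_stack([x, np.ones(n)])           # each row is [x_i, 1]
v, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes ||y - A v||^2
m, b = v
print(m, b)                                    # close to 2 and 1

v_normal = np.linalg.solve(A.T @ A, A.T @ y)   # same answer via A^T A v = A^T y
print(np.allclose(v, v_normal))                # True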
When is Least-Squares Possible?
Given y, A, and v. Want y = Av

[diagram: y = A v with A square]
Want n outputs, have n knobs to fiddle with;
every knob is useful if A is full rank.

[diagram: y = A v with A tall]
A: rows (outputs) > columns (knobs). Thus can’t
get the precise output you want (not enough knobs).
So settle for the “closest” knob setting.
101
When is Least-Squares Possible?
Given y, A, and v. Want y = Av

[diagram: y = A v with A square]
Want n outputs, have n knobs to fiddle with;
every knob is useful if A is full rank.

[diagram: y = A v with A wide]
A: columns (knobs) > rows (outputs). Thus, any
output can be expressed in infinite ways.

102
Homogeneous Least-Squares
Given a set of unit vectors (aka directions) 𝒙𝟏 , … , 𝒙𝒏
and I want vector 𝒗 that is as orthogonal to all the 𝒙𝒊
as possible (for some definition of orthogonal)
Stack the $\mathbf{x}_i$ into A, compute Av:

$\mathbf{A}\mathbf{v} = \begin{bmatrix} -\ \mathbf{x}_1^T\ - \\ \vdots \\ -\ \mathbf{x}_n^T\ - \end{bmatrix} \mathbf{v} = \begin{bmatrix} \mathbf{x}_1^T\mathbf{v} \\ \vdots \\ \mathbf{x}_n^T\mathbf{v} \end{bmatrix}$   (each entry is 0 if orthogonal)

Compute $\|\mathbf{A}\mathbf{v}\|_2^2 = \sum_{i=1}^{n} \left( \mathbf{x}_i^T\mathbf{v} \right)^2$

Sum of how orthogonal v is to each x
103
Homogeneous Least-Squares

• A lot of times, given a matrix A we want to find
  the v that minimizes $\|\mathbf{A}\mathbf{v}\|_2$
• I.e., want $\mathbf{v}^* = \arg\min_{\mathbf{v}} \|\mathbf{A}\mathbf{v}\|_2^2$
• What’s a trivial solution?
• Set v = 0 → Av = 0
• Exclude this by forcing v to have unit norm

104
Homogeneous Least-Squares
Let’s look at $\|\mathbf{A}\mathbf{v}\|_2^2$:

$\|\mathbf{A}\mathbf{v}\|_2^2 = (\mathbf{A}\mathbf{v})^T(\mathbf{A}\mathbf{v})$   (rewrite as dot product)
$\|\mathbf{A}\mathbf{v}\|_2^2 = \mathbf{v}^T\mathbf{A}^T\mathbf{A}\mathbf{v} = \mathbf{v}^T(\mathbf{A}^T\mathbf{A})\mathbf{v}$   (distribute the transpose)

We want the vector minimizing this quadratic form.
Where have we seen this?

105
Homogeneous Least-Squares
Ubiquitous tool in vision:

$\arg\min_{\|\mathbf{v}\|_2 = 1} \|\mathbf{A}\mathbf{v}\|_2^2$

(1) “Smallest”* eigenvector of $\mathbf{A}^T\mathbf{A}$
(2) “Smallest” right singular vector of $\mathbf{A}$

For min → max, switch smallest → largest

*Note: $\mathbf{A}^T\mathbf{A}$ is positive semi-definite so it has all non-negative eigenvalues


106
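A sketch of the recipe with numpy: the minimizer over unit vectors is the right singular vector with the smallest singular value (equivalently, the eigenvector of A^T A with the smallest eigenvalue). The data directions below are made up so that the answer should be near [0, 0, ±1].

import numpy as np

A = np.random.randn(100, 3)                    # directions nearly in the xy-plane
A[:, 2] *= 0.01
A /= np.linalg.norm(A, axis=1, keepdims=True)  # make them unit vectors

U, S, Vt = np.linalg.svd(A)
v = Vt[-1]                         # right singular vector for the smallest singular value
print(v, np.linalg.norm(A @ v))    # v ~ [0, 0, ±1]; the residual ||Av|| is small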
Derivatives

Remember derivatives?

Derivative: rate at which a function f(x) changes


at a point as well as the direction that increases
the function

107
Given quadratic function f(x)
$f(x) = (x - 2)^2 + 5$

$f(x)$ is a function

$g(x) = f'(x)$,  aka  $g(x) = \frac{d}{dx} f(x)$

108
Given quadratic function f(x)
$f(x) = (x - 2)^2 + 5$

What’s special about x = 2?

$f(x)$ is minimized at 2
$g(x) = 0$ at 2

a = minimum of f → $g(a) = 0$

Reverse is not true

109
Rates of change
$f(x) = (x - 2)^2 + 5$

Suppose I want to
increase f(x) by
changing x:

Blue area: move left


Red area: move right

Derivative tells you


direction of ascent
and rate

110
What Calculus Should I Know

• Really need intuition


• Need chain rule
• Rest you should look up / use a computer
algebra system / use a cookbook
• Partial derivatives (and that’s it from
multivariable calculus)

111
Partial Derivatives

• Pretend other variables are constant, take a
  derivative. That’s it.
• Make our function a function of two variables:

$f(x) = (x - 2)^2 + 5$
$\frac{\partial}{\partial x} f(x) = 2(x - 2) * 1 = 2(x - 2)$

$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$    (pretend $(y+1)^2$ is constant → derivative = 0)
$\frac{\partial}{\partial x} f_2(x, y) = 2(x - 2)$
112
Zooming Out
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$
Dark = f(x,y) low
Bright = f(x,y) high

113
Taking a slice of
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

Slice of y = 0 is the function from before:
$f(x) = (x - 2)^2 + 5$
$\frac{\partial f}{\partial x}(x) = 2(x - 2)$

114
Taking a slice of
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

$\frac{\partial}{\partial x} f_2(x, y)$ is the rate of
change & direction in the x dimension

115
Zooming Out
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

$\frac{\partial}{\partial y} f_2(x, y)$ is $2(y + 1)$
and is the rate of change & direction
in the y dimension

116
Zooming Out
$f_2(x, y) = (x - 2)^2 + 5 + (y + 1)^2$

Gradient/Jacobian: making a vector of

$\nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]$

gives rate and direction of change.

Arrows point OUT of minimum / basin.

117
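A small sketch checking the gradient of this f2 numerically with finite differences, a handy habit whenever you derive gradients by hand; the probe point and step size h are arbitrary choices.

import numpy as np

def f2(x, y):
    return (x - 2) ** 2 + 5 + (y + 1) ** 2

def grad_f2(x, y):
    return np.array([2 * (x - 2), 2 * (y + 1)])   # the partials from the slides

x0, y0, h = 0.5, 3.0, 1e-6
numeric = np.array([(f2(x0 + h, y0) - f2(x0 - h, y0)) / (2 * h),
                    (f2(x0, y0 + h) - f2(x0, y0 - h)) / (2 * h)])
print(np.allclose(grad_f2(x0, y0), numeric))       # True
print(grad_f2(2.0, -1.0))                          # [0, 0]: gradient vanishes at the minimum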
What Should I Know?

• Gradients are simply partial derivatives per-dimension:
  if $\mathbf{x}$ in $f(\mathbf{x})$ has n dimensions, $\nabla f(\mathbf{x})$ has n dimensions
• Gradients point in direction of ascent and tell
  the rate of ascent
• If a is a minimum of $f(\mathbf{x})$ → $\nabla f(\mathbf{a}) = \mathbf{0}$
• Reverse is not true, especially in high-dimensional spaces

118
119
