Supervised Learning: Linear Methods (1/2) : Applied Multivariate Statistics - Spring 2012
Conditional Probability
Sample space
$P(C \mid T) = \frac{P(T \mid C)\,P(C)}{P(T)}$
Posterior: $P(C \mid T)$
Class-conditional probability: $P(T \mid C)$
Prior: $P(C)$
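A small numerical sketch of the formula above. The numbers are hypothetical, and reading T as an observed test result and C as a class is an assumption made only for this illustration. The denominator P(T) is expanded with the law of total probability.

```python
def posterior(prior, p_t_given_c, p_t_given_not_c):
    """Return P(C|T) from P(C), P(T|C) and P(T|not C)."""
    # P(T) = P(T|C) P(C) + P(T|not C) P(not C)
    p_t = p_t_given_c * prior + p_t_given_not_c * (1.0 - prior)
    return p_t_given_c * prior / p_t


# Hypothetical values, chosen only for illustration
p_c = 0.01              # prior P(C)
p_t_given_c = 0.9       # class-conditional probability P(T|C)
p_t_given_not_c = 0.05  # P(T|not C)

p = posterior(p_c, p_t_given_c, p_t_given_not_c)
print(round(p, 3))  # about 0.154
```

Even with a high P(T|C), the posterior stays small here because the prior is small.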
One approach to supervised learning
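$P(C \mid X) = \frac{P(C)\,P(X \mid C)}{P(X)} \propto P(C)\,P(X \mid C)$
Prior / prevalence $P(C)$: fraction of samples in that class
Class-conditional probability $P(X \mid C)$: find some estimate. Assume: $X \mid C \sim N(\mu_C, \Sigma_C)$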
Bayes rule:
Choose class where P(C|X) is maximal
(rule is “optimal” if all types of error are equally costly)
In practice: estimate $P(C)$, $\mu_C$, $\Sigma_C$
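The estimation step can be sketched in plain Python. This is only an illustration under assumptions: the function name `estimate_class_parameters` and the data are made up, and the covariance uses the unbiased estimator (division by n - 1), which the slides do not specify.

```python
from collections import defaultdict


def estimate_class_parameters(samples, labels):
    """Estimate prior, mean vector and covariance matrix for each class."""
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)

    n_total = len(samples)
    params = {}
    for c, xs in by_class.items():
        n, d = len(xs), len(xs[0])
        prior = n / n_total  # fraction of samples in that class
        mean = [sum(x[j] for x in xs) / n for j in range(d)]
        # Sample covariance, dividing by n - 1 (needs at least 2 samples)
        cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs) / (n - 1)
                for j in range(d)] for i in range(d)]
        params[c] = {"prior": prior, "mean": mean, "cov": cov}
    return params


# Tiny made-up data set: two features, two classes
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (6.0, 5.0), (8.0, 7.0)]
labels = ["a", "a", "a", "b", "b"]
params = estimate_class_parameters(samples, labels)
print(params["a"])
```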
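QDA: Doing the math…
$P(C \mid X) \propto P(C)\,P(X \mid C)$, with
$P(X \mid C) = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_C|}} \exp\left(-\frac{1}{2}(x-\mu_C)^T \Sigma_C^{-1} (x-\mu_C)\right)$
Use the fact: $\max P(C \mid X) \Leftrightarrow \max \log P(C \mid X)$
$\delta_c(x) = \log P(C \mid X) = \log P(C) + \log P(X \mid C) - \log P(X)$
$= \log P(C) - \frac{1}{2}\log|\Sigma_C| - \frac{1}{2}(x-\mu_C)^T \Sigma_C^{-1} (x-\mu_C) + c$
(the constant $c$ collects the terms that do not depend on the class)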
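The discriminant score can be evaluated directly. Below is a minimal sketch for two-dimensional data only, with the 2x2 inverse written out by hand. The function names and all parameter values are hypothetical, and the class-independent constant is dropped because it does not change which class wins.

```python
import math


def qda_score(x, prior, mean, cov):
    """delta_c(x) for a 2-D Gaussian class, without the class-independent constant."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix, written out explicitly
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * quad


def classify(x, classes):
    """Return the class label with the largest discriminant score."""
    return max(classes, key=lambda label: qda_score(x, *classes[label]))


# Hypothetical parameters: label -> (prior, mean, covariance)
classes = {
    0: (0.5, (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0))),
    1: (0.5, (4.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
}
print(classify((2.0, 0.0), classes))
```

In real use the parameters would be the estimates of P(C), mu_C and Sigma_C from the training data.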
Simplification
[Figure: two classes, labelled 0 and 1]
Classify to which class (assume equal prior)?
• Physical distance in space is equal
• Classify to class 0, since the Mahalanobis distance is smaller
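A short sketch of the comparison between physical (Euclidean) and Mahalanobis distance, for two-dimensional data. The point, class means and covariance matrices are hypothetical and are not read off the figure.

```python
import math


def euclidean(x, mean):
    """Ordinary (physical) distance between x and the class mean."""
    return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, mean)))


def mahalanobis(x, mean, cov):
    """Mahalanobis distance for 2-D data, using the explicit 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - mu)^T Sigma^{-1} (x - mu), expanded for the 2x2 case
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)


# Hypothetical setup: the new point lies halfway between the two class means
x = (2.0, 0.0)
mean0, cov0 = (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0))  # class 0: wide spread
mean1, cov1 = (4.0, 0.0), ((1.0, 0.0), (0.0, 1.0))  # class 1: narrow spread

print(euclidean(x, mean0), euclidean(x, mean1))                 # equal
print(mahalanobis(x, mean0, cov0), mahalanobis(x, mean1, cov1)) # class 0 is closer
```

The class with the larger spread in the direction of the point gets the smaller Mahalanobis distance, even though the Euclidean distances agree.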
LDA vs. QDA
LDA:
+ Only few parameters to estimate; accurate estimates
- Inflexible (linear decision boundary)
QDA:
- Many parameters to estimate; less accurate
+ More flexible (quadratic decision boundary)
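Why the LDA boundary is linear can be sketched by taking the QDA discriminant and assuming one shared covariance matrix, $\Sigma_C = \Sigma$ for every class. The symbol $\Sigma$ for the shared matrix is notation introduced here, not taken from the slides.

```latex
\begin{align*}
\delta_c(x) &= \log P(C) - \tfrac{1}{2}\log|\Sigma|
              - \tfrac{1}{2}(x-\mu_C)^T \Sigma^{-1} (x-\mu_C) + c \\
            &= \log P(C) + x^T \Sigma^{-1} \mu_C
              - \tfrac{1}{2}\mu_C^T \Sigma^{-1} \mu_C
              \underbrace{- \tfrac{1}{2} x^T \Sigma^{-1} x
              - \tfrac{1}{2}\log|\Sigma| + c}_{\text{same for every class}}
\end{align*}
```

The terms under the brace cancel when two classes are compared, so what remains is linear in $x$. With $d$ features and $K$ classes, the shared matrix needs $d(d+1)/2$ covariance parameters, while QDA needs $K \cdot d(d+1)/2$.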
Fisher’s Discriminant Analysis: Idea
Find direction(s) in which groups are separated best
[Figure: two projection directions, one with $J(w)$ large and one with $J(w)$ small; each panel is annotated with $D(U)$ and $Var(U)$]
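The extracted slide text does not define its symbols, so the following is only an assumed reading for two groups: $U = w^T X$ is the data projected onto direction $w$, the numerator measures how far apart the projected group means are (presumably what the figure marks as $D(U)$), and the denominator is the within-group variance of the projection (presumably $Var(U)$). The names $\mu_0$, $\mu_1$ and $\Sigma_W$ are introduced here. Under that reading, the usual form of Fisher's criterion is:

```latex
\[
J(w) = \frac{\left(w^T \mu_1 - w^T \mu_0\right)^2}{w^T \Sigma_W\, w},
\qquad
w^{*} \propto \Sigma_W^{-1} (\mu_1 - \mu_0)
\]
```

Here $\mu_0, \mu_1$ are the group means, $\Sigma_W$ is the within-group covariance matrix, and $w^{*}$ is the direction that maximizes $J(w)$.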
LDA and Linear Discriminants
Example: Classification of Iris flowers
Iris setosa
Iris versicolor
Iris virginica
Quality of classification
[Figure: panels labelled "Training" and "Test"]
Measures for prediction error
Error rate = 1 - sum(diagonal entries) / (number of samples) = 1 - 76/100 = 0.24
We expect that our classifier predicts 24% of new
observations incorrectly (this is just a rough estimate)
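The error-rate computation can be sketched as follows. The confusion matrix is hypothetical: its entries were chosen only so that 76 of 100 samples lie on the diagonal, matching the numbers above. It is not the matrix from the slides.

```python
def error_rate(confusion):
    """1 - (correctly classified samples) / (all samples)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 1 - correct / total


# Hypothetical confusion matrix (rows: true class, columns: predicted class)
confusion = [
    [30, 3, 2],
    [4, 25, 6],
    [3, 6, 21],
]
print(round(error_rate(confusion), 2))
```

For an honest estimate, the matrix should be built from test data that was not used to fit the classifier.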
Example: Digit recognition
R functions to know
lda