Review of Lecture 17

Occam's Razor
"The simplest model that fits the data is also the most plausible."
    complexity of h ↔ complexity of H
    unlikely event significant if it happens

Sampling bias
[Figure: P(x) of the training distribution versus the testing distribution]

Data snooping
[Figure: cumulative profit % over 500 days of trading, with snooping versus no snooping]
Lecture 18: Epilogue
Outline

Bayesian learning
Aggregation methods
Acknowledgments
semi-supervised learning, overfitting, Gaussian processes, distribution-free, collaborative filtering, deterministic noise, linear regression, VC dimension, nonlinear transformation, decision trees, data snooping, sampling bias, Q learning, SVM, learning curves, mixture of experts, neural networks, no free lunch, ensemble learning, types of learning, error measures, is learning feasible?, clustering, regularization, kernel methods, soft-order constraint, weight decay, Occam's razor, Boltzmann machines
The map

THEORY: VC, bias-variance, complexity, bayesian
TECHNIQUES:
    models: linear, neural networks, SVM, nearest neighbors, RBF, gaussian processes, SVD, graphical models
    methods: regularization, validation, aggregation, input processing
PARADIGMS: supervised, unsupervised, reinforcement, active, online
Bayesian learning
Probabilistic approach

P(D | h = f) decides which h (likelihood)

How about P(h = f | D)?

[Figure: the learning diagram - unknown target function f: X → Y plus noise, unknown input distribution P(x), data set D = (x1, y1), ..., (xN, yN), learning algorithm, hypothesis set H, final hypothesis g: X → Y with g(x) ≈ f(x)]
The prior

P(h = f | D) = P(D | h = f) P(h = f) / P(D) ∝ P(D | h = f) P(h = f)

P(h = f) is the prior
P(h = f | D) is the posterior
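To make the formula concrete, here is a minimal sketch (not from the lecture) with a finite hypothesis set: hypothetical coin-bias hypotheses, an assumed prior, and the posterior obtained by normalizing likelihood × prior.

```python
import numpy as np

# Hypothetical setup: H = three candidate coin biases, D = observed flips.
# Posterior: P(h = f | D) ∝ P(D | h = f) P(h = f), normalized by P(D).
H = np.array([0.3, 0.5, 0.7])        # finite hypothesis set (heads probability)
prior = np.array([0.25, 0.5, 0.25])  # assumed prior P(h = f)

D = [1, 1, 0, 1, 1]                  # observed flips (1 = heads)
heads = sum(D)
tails = len(D) - heads

likelihood = H**heads * (1 - H)**tails   # P(D | h = f)
posterior = likelihood * prior           # numerator of Bayes' rule
posterior /= posterior.sum()             # dividing by P(D) normalizes over H

for h, p in zip(H, posterior):
    print(f"P(h = {h} | D) = {p:.3f}")
```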
Example of a prior

Consider a perceptron: h is determined by w = w0, w1, ..., wd

A possible prior on w: each wi is independent, uniform over [-1, 1]

This determines a prior over h: P(h = f)

Given D, we can compute P(D | h = f)

Putting them together, we get P(h = f | D) ∝ P(h = f) P(D | h = f)
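A minimal Monte Carlo sketch of this recipe, under assumptions of my own: a toy data set, and a likelihood in which each label agrees with sign(w·x) with probability 1 − ε. Sampling w uniformly over [-1, 1]^(d+1) and weighting each sample by its likelihood approximates the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, M = 2, 20, 5000      # input dimension, data set size, prior samples

# Toy data set D (hypothetical); x0 = 1 is the bias coordinate.
X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
y = np.sign(X @ np.array([0.1, 0.8, -0.5]))

# Prior on w: each w_i independent, uniform over [-1, 1].
W = rng.uniform(-1, 1, (M, d + 1))

# Assumed noise model: each label agrees with sign(w.x) w.p. 1 - eps.
eps = 0.1
agree = (np.sign(W @ X.T) == y).sum(axis=1)
log_lik = agree * np.log(1 - eps) + (N - agree) * np.log(eps)

# Posterior weights over the sampled hypotheses: the prior is uniform,
# so the posterior is proportional to the likelihood alone.
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

print("posterior mean of w:", np.round(post @ W, 2))
```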
A prior is an assumption

Even the most "neutral" prior:

    x is unknown  ⟹  x is random, uniform
    [Figure: uniform P(x) over the range of x]

The true equivalent would be:

    x is unknown  ⟹  x is random, with distribution δ(x − a) for an unknown a
    [Figure: the delta function δ(x − a)]
If we knew the prior ...

... we could compute P(h = f | D) for every h ∈ H

⟹ we can derive E(h(x)) for every x
⟹ we can derive the error bar for every x
When is Bayesian learning justified?

1. The prior is valid
2. The prior is irrelevant
Aggregation methods
What is aggregation?

Combining different solutions h1, h2, ..., hT that were trained on D:

[Figure: several solutions combined into one final hypothesis]

Also known as ensemble learning and boosting
In a 2-layer model, all the units learn jointly:

[Figure: one learning algorithm fits all the units to the training data at once]

In aggregation, they learn independently:

[Figure: each unit is fit to the training data by its own learning algorithm, then the outputs are combined]
Blending:

[Figure: the training data fed to a learning algorithm producing a solution]
Decorrelation - boosting

Create h1, ..., ht, ... sequentially: make ht decorrelated with the previous h's:

[Figure: training data → learning algorithm → ht]

Emphasize points in D that were misclassified
Choose the weight of ht based on E_in(ht)
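One well-known instance of this recipe is AdaBoost, sketched below under assumptions of my own (decision stumps as the base learner, labels in {-1, +1}). The weight αt = ½ ln((1 − εt)/εt) and the reweighting factor e^(−αt y ht(x)) are AdaBoost's standard choices for "choose the weight of ht from its error" and "emphasize the misclassified points".

```python
import numpy as np

def train_stump(X, y, w):
    """Best weighted decision stump: (weighted error, (feature, threshold, sign))."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.sign(X[:, j] - thr + 1e-12)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best_params = err, (j, thr, s)
    return best_err, best_params

def stump_predict(params, X):
    j, thr, s = params
    return s * np.sign(X[:, j] - thr + 1e-12)

def adaboost(X, y, T):
    """y in {-1, +1}. Returns a list of (alpha_t, stump_t)."""
    N = len(y)
    w = np.full(N, 1.0 / N)                   # uniform emphasis to start
    models = []
    for _ in range(T):
        eps, params = train_stump(X, y, w)    # error of h_t under current emphasis
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # weight of h_t
        pred = stump_predict(params, X)
        w *= np.exp(-alpha * y * pred)        # emphasize points h_t got wrong
        w /= w.sum()
        models.append((alpha, params))
    return models

def predict(models, X):
    return np.sign(sum(a * stump_predict(p, X) for a, p in models))

# Usage (hypothetical data):
# models = adaboost(X_train, y_train, T=50)
# y_pred = predict(models, X_test)
```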
Blending h1, h2, ..., hT:

    g(x) = Σ_{t=1}^{T} αt ht(x)

Principled choice of the αt's: the pseudo-inverse (least-squares solution)

Some αt's can come out negative

Most valuable ht in the blend?
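A sketch of the pseudo-inverse choice of the αt's, with assumptions of my own: the trained ht's are callables, and their predictions on a held-out set (X_agg, y_agg are hypothetical names) are stacked as the columns of a matrix, so the least-squares αt's fall out of np.linalg.pinv.

```python
import numpy as np

def blend_weights(hs, X_agg, y_agg):
    """Least-squares alphas for g(x) = sum_t alpha_t * h_t(x).

    hs:    list of trained hypotheses (each callable on a batch of inputs)
    X_agg: inputs of a set held out from training the h_t's
    y_agg: targets of that held-out set
    """
    A = np.column_stack([h(X_agg) for h in hs])  # column t = h_t's predictions
    return np.linalg.pinv(A) @ y_agg             # pseudo-inverse solution

def blend_predict(hs, alphas, X):
    return np.column_stack([h(X) for h in hs]) @ alphas
```

Nothing in the least-squares solution forces a convex combination, which is why some αt's can come out negative: a hypothesis with a negative coefficient still contributes by correcting the other ht's.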
Acknowledgments
Course content

Professor Malik Magdon-Ismail (RPI)
Professor Hsuan-Tien Lin (NTU)
Course staff

Carlos Gonzalez (Head TA)
Ron Appel
Costis Sideris
Doris Xin
Caltech support

IST - Mathieu Desbrun and Mani Chandy
E&AS Division - Ares Rosakis and Melany Hunt
Provost's Office - Ed Stolper
Many others

Caltech TAs and staff members
Caltech alumni and the Alumni Association
Colleagues all over the world
Faiza A. Ibrahim