Vladimir Cherkassky IJCNN05
Predictive Learning
Vladimir Cherkassky
University of Minnesota
cherk001@umn.edu
Tutorial at IJCNN-05
July 31, 2005
Copyright © Vladimir Cherkassky
Outline
Motivation and Background
Standard Inductive Learning Formulation
Alternative Formulations
- non-inductive types of inference
- non-standard inductive formulations
Predictive models for interpretation
Conclusions
Motivation:
Importance of Problem Formulation
Traditional (Simplistic) View
‘Useful’ = ‘Predictive’
May lead to misconceptions:
Inductive models are completely data-driven
The goal is to design better algorithms
Motivation: philosophical
Karl Popper: Science starts from
problems, and not from observations
What to do vs how to do
Motivation
Another view of Predictive Learning
Background: historical
The problem of predictive learning
Given past data + reasonable assumptions
Estimate unknown dependency for future
predictions
Historical Development
Statistics (mathematical science)
Goal: model identification, density estimation
Neural Networks (empirical science)
Goal of learning: generalization, risk minimization
Statistical Learning (VC theory)
(natural science)
Goal of learning: generalization for distinct learning
problem formulations
Standard Inductive Learning
The learning machine observes samples (x, y) and returns an estimated response ŷ = f(x, w)
Two modes of inference: identification vs imitation
Risk: R(w) = ∫ Loss(y, f(x, w)) dP(x, y) → min over w ∈ Λ
[Diagram: a Generator of samples produces x, the System responds with y, and the Learning Machine observes (x, y) and outputs ŷ]
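Since the distribution P(x, y) is unknown, the risk integral is replaced in practice by the average loss over the n training samples (empirical risk minimization). The sketch below is my own minimal illustration, not code from the tutorial: a linear model f(x, w) = w[0] + w[1]*x is fitted to synthetic data by a crude grid search over w.

```python
import numpy as np

def f(x, w):
    # simple linear model f(x, w) = w0 + w1 * x
    return w[0] + w[1] * x

def empirical_risk(w, x, y):
    # squared-error loss averaged over the training sample,
    # standing in for the (unknowable) integral risk
    return np.mean((y - f(x, w)) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 30)  # true dependency: y = 2x + 1 + noise

# crude grid search over w, minimizing empirical risk
grid = np.linspace(-3, 3, 61)
best = min(((w0, w1) for w0 in grid for w1 in grid),
           key=lambda w: empirical_risk(w, x, y))
print(best)  # close to (1.0, 2.0)
```

A real learning machine would restrict the set of admissible functions and use a proper optimizer; the grid search is only a placeholder to keep the sketch self-contained.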
Two Learning Problems
Alternative Formulations
Re-examine assumptions behind
standard inductive learning
1. Finite training + large unknown test set
→ non-inductive inference (transduction, …)
2. Particular loss functions
→ new inductive formulations (application-driven)
3. Single model
→ multiple model estimation
1. Transduction
How to incorporate unlabeled test data
into the learning process
Estimating function at given points
Given: training data (Xi, yi), i = 1,…,n
and unlabeled test points Xn+j, j = 1,…,k
Estimate: class labels at these test points
Note: need to predict only at given test points
Xn+j, not for every possible input X
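The transductive setting can be made concrete with a deliberately simple stand-in (my illustration, not the tutorial's algorithm): labels are produced only at the k given test points, here by a 1-nearest-neighbor rule, rather than by first estimating a function over the whole input space. Actual transductive methods (e.g. transductive SVM) additionally choose the labels of all test points jointly.

```python
import numpy as np

def transduce_1nn(X_train, y_train, X_test):
    # assign each given test point the label of its nearest training point;
    # no function f(x) is estimated for inputs outside X_test
    labels = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        labels.append(y_train[int(np.argmin(d))])
    return np.array(labels)

X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, 0.2], [0.9, 0.8]])
print(transduce_1nn(X_train, y_train, X_test))  # [0 1]
```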
Transduction vs Induction
[Diagram: induction goes from training data (plus a priori knowledge / assumptions) to an estimated function; deduction goes from the estimated function to the predicted output; transduction goes directly from training data to predicted output]
Transduction based on size of margin
The problem: Find class label of test input X
Many potential applications
Prediction of molecular bioactivity for drug
discovery
Training data: ~1,909 samples; test: ~634 samples
Prediction accuracy:
Hierarchy of Types of Inference
Identification
Imitation
Transduction
Selection
.....
Implications: philosophical, human learning
2. Application-driven formulations
[Diagram: APPLICATION NEEDS and LEARNING THEORY jointly determine the set of Admissible Models]
Inductive Learning System (revised)
The learning machine observes samples (x, y) and returns an estimated response ŷ to minimize an application-specific Loss[f(x, w), y]
[Diagram: a Generator of samples produces x, the System responds with y, and the Learning Machine's output ŷ = f(x, w) is compared to y by the Loss[f(x, w), y] block]
Application: financial engineering
Asset management via daily trading:
non-standard learning formulation
[Diagram: input indicators x → PREDICTIVE MODEL y = f(x) → TRADING DECISION (Buy/sell/hold) → MARKET → GAIN/LOSS]
Example: timing of mutual funds
Background: buy-and-hold vs trading
Recent scandals in mutual fund industry
Daily trading scenario
[Diagram: daily trading timeline with a sell-or-buy and a buy-or-sell decision point]
Learning formulation for fund trading
Given
- Daily % price changes of a fund: qi = (pi − pi−1) / pi
- Time series of daily values of input variables xi
- Indicator decision function (1/0 ~ Buy/Sell): yi = f(xi, w)
Non-standard inductive formulation
Maximize total account value Q(w) = Σ f(xi, w) qi, summed over i = 1,…,n
Neither classification, nor regression
[Diagram: input indicators x → PREDICTIVE MODEL y = f(x) → TRADING DECISION (Buy/sell/hold) → MARKET → GAIN/LOSS]
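The trading objective can be sketched in a few lines (my toy implementation, with made-up data and a placeholder optimizer): daily % changes qi = (pi − pi−1)/pi are computed from a price series, a 0/1 decision f(xi, w) earns qi on Buy days, and learning maximizes the total gain Q(w) = Σ f(xi, w) qi rather than classification accuracy or squared error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250
# synthetic daily fund prices (random walk) and % changes per the formulation
p = 100.0 * np.cumprod(1.0 + 0.01 * rng.normal(size=n + 1))
q = (p[1:] - p[:-1]) / p[1:]
# toy input indicators: yesterday's % change plus an unrelated noise feature
X = np.column_stack([np.roll(q, 1), rng.normal(size=n)])

def total_gain(w):
    # indicator decision: 1 ~ Buy (earn q_i that day), 0 ~ Sell (earn nothing)
    decisions = (X @ w > 0).astype(float)
    return float(np.sum(decisions * q))

# crude random search over w, standing in for a real optimizer
candidates = rng.normal(size=(500, 2))
w_best = max(candidates, key=total_gain)
print(round(total_gain(w_best), 4))
```

The point of the sketch is the objective, not the optimizer: Q(w) is a sum gated by a 0/1 decision function, so neither a classification margin nor a regression residual is the right loss.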
3. Multiple Model Estimation
Single-model formulation
Estimate unknown
dependency
x → y
Multiple-model approach:
Available data can be
‘explained’ using
several models
Example data sets: Regression
[Figure: the same data fit by two regression models vs a single complex model]
Multiple Model Formulation
Available (training) data are generated by several
(unknown) regression models,
y = tm(x) + ξm, for x ∈ Xm
Goals of learning:
Partition available data (clustering, segmentation)
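The multiple-model formulation can be illustrated with an alternating scheme (my own sketch of the idea, not the tutorial's algorithm): data from two unknown linear models is partitioned by repeating (1) assign each sample to the line with the smallest residual, and (2) refit each line to its assigned samples. Like k-means, this only guarantees a local minimum of the objective, so practical use needs several restarts.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 100)
mask = rng.random(100) < 0.5
# two generating models: y = 3x + 0.5 and y = -2x + 2.5, plus noise
y = np.where(mask, 3.0 * x + 0.5, -2.0 * x + 2.5) + 0.05 * rng.normal(size=100)

def objective(coefs):
    # sum over samples of the squared residual to the best-fitting model
    preds = np.array([c[0] + c[1] * x for c in coefs])
    return np.min((y - preds) ** 2, axis=0).sum()

coefs = [np.array([0.0, 1.0]), np.array([1.0, -1.0])]  # [intercept, slope] guesses
initial = objective(coefs)
for _ in range(20):
    # (1) partition: assign each sample to the model with smaller residual
    preds = np.array([c[0] + c[1] * x for c in coefs])
    assign = np.argmin((y - preds) ** 2, axis=0)
    # (2) refit each model to its assigned samples by least squares
    for m in range(2):
        sel = assign == m
        if sel.sum() >= 2:
            A = np.column_stack([np.ones(sel.sum()), x[sel]])
            coefs[m] = np.linalg.lstsq(A, y[sel], rcond=None)[0]

print(objective(coefs) <= initial)  # True: alternating fits never increase the objective
```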
Experimental Results: Linear
[Figure: four panels (a)-(d), linear M1 and M2 estimates fitted to data on [0, 1]]
Experimental Results: Non-Linear
[Figure: four panels (a)-(d), non-linear M1 and M2 estimates fitted to data on [0, 1]]
Multiple Model Classification
Single-model approach → complex model
Multiple-model approach → two simple models
Procedure for MMC
Initialization: Available data = all training samples.
Step 1: Estimate the major model, i.e. apply robust classification to the available data.
Here, 'robustness' is wrt variations of the data generated by the minor model(s).
Step 2: Partition the available data (from one class) into two subsets.
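The loop structure of the procedure can be sketched as follows (my placeholder choices throughout: a plain least-squares linear classifier stands in for the robust classifier, and "explained by the major model" means correctly classified with margin above a threshold):

```python
import numpy as np

def fit_linear(X, y):
    # least-squares fit of a linear decision function; its sign is the class
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)[0]

def mmc_loop(X, y, n_models=2, margin=0.3):
    models, remaining = [], np.ones(len(X), dtype=bool)
    for _ in range(n_models):
        if remaining.sum() < 2:
            break
        # Step 1: estimate the major model on the data still available
        w = fit_linear(X[remaining], y[remaining])
        models.append(w)
        # Step 2: drop samples the major model explains; the rest stay
        scores = np.column_stack([np.ones(len(X)), X]) @ w
        explained = (np.sign(scores) == 2.0 * y - 1.0) & (np.abs(scores) > margin)
        remaining &= ~explained
    return models

# toy data: most samples follow one linear rule, a few follow another
X = np.array([[0.1, 1.0], [0.3, 0.8], [0.5, 1.2], [0.2, -1.0],
              [0.4, -0.9], [3.0, 1.0], [3.0, -1.0], [0.6, -1.1]])
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
models = mmc_loop(X, y)
print(len(models))
```

Only the control flow mirrors the two steps above; the actual MMC method relies on a robust classifier so that the minor model's samples do not distort the major one.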
Example of MMC: XOR data set
Training phase
Comparison for toy data set
[Figure: (a) MMC hyperplanes H(1) and H(2); (b) RBF-SVM decision boundary]
Comparison continued
[Figure: (c) SVM with polynomial kernel decision boundary]
Prediction accuracy:
Method   Error    (%SV)
RBF      0.058    25.5%
Poly     0.067    26.4%
MMC      0.055    14.5%
Summary for Multiple Model Estimation
Prediction and interpretation
Many applications are intrinsically difficult to formalize
Two practical goals of learning:
- prediction (objective loss function)
- interpretation, understanding (subjective)
Most algorithms developed for predictive
settings, but used for interpretation and
human decision making
Rationale: good predictive model ~ true model
Example: functional neuroimaging
Understanding fMRI image data:
- estimate ‘good’ Brain Activation Maps showing brain activity
(colored patches) in response to specific tasks
Measure of goodness: predictability, reproducibility
Predictive models for understanding
Always assume inductive formulation
What if transduction yields much better
prediction?
Fundamental problem (classical view):
- human reasoning ~ logic + induction
- transduction does not fit this paradigm
Goal of science: understanding, or performing/acting well?
Conclusions
Methodological shift:
think first about the problem formulation,
rather than learning algorithms
Importance of problem formulation
- for empirical comparisons
- the limits of predictive models
Philosophical impact of Vapnik’s new
types of (non-inductive) inference
References
VC-Theory: V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998