
New Formulations for

Predictive Learning
Vladimir Cherkassky
University of Minnesota
cherk001@umn.edu
Tutorial at IJCNN-05
July 31, 2005
Copyright © Vladimir Cherkassky
1
Outline
• Motivation and Background
• Standard Inductive Learning Formulation
• Alternative Formulations
  - non-inductive types of inference
  - non-standard inductive formulations
• Predictive models for interpretation
• Conclusions

2
Motivation:
Importance of Problem Formulation
• Traditional (simplistic) view:
  ‘Useful’ = ‘Predictive’
• May lead to misconceptions:
  - Inductive models are completely data-driven
  - The goal is to design better algorithms

3
Motivation: philosophical
• Karl Popper: science starts from problems, not from observations
• Confucius: learning without thought is useless; thought without learning is dangerous
• What to do vs. how to do it

4
Motivation
• Another view of predictive learning:
  importance of the problem formulation (vs. the algorithm)
• Just a few known formulations
• Thousands of algorithms

5
Background: historical
• The problem of predictive learning:
  given past data + reasonable assumptions,
  estimate an unknown dependency for future predictions
• Driven by applications (not theory)

6
Historical Development
• Statistics (mathematical science)
  Goal: model identification, density estimation
• Neural Networks (empirical science)
  Goal of learning: generalization, risk minimization
• Statistical Learning / VC theory (natural science)
  Goal of learning: generalization for distinct learning problem formulations

7
Standard Inductive Learning
• The learning machine observes samples (x, y) and returns an estimated response ŷ = f(x, w)
• Two modes of inference: identification vs. imitation
• Risk functional: R(w) = ∫ Loss(y, f(x, w)) dP(x, y) → min
[Diagram: Generator of samples → x → Learning Machine → ŷ; System → y]
8
Two Learning Problems
• Learning ~ estimating a mapping x → y (in the sense of risk minimization)
• Binary classification: estimating an indicator function (with 0/1 loss)
• Regression: estimating a real-valued function (with squared loss)
• Assumptions: i.i.d. data, a training/test split, a given loss function
9
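The two standard formulations above differ only in the loss function. A minimal sketch of the corresponding empirical risks (illustrative only; the function names are my own, not from the tutorial):

```python
import numpy as np

def classification_risk(y_true, y_pred):
    """Empirical risk under 0/1 loss: the misclassification rate."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def regression_risk(y_true, y_pred):
    """Empirical risk under squared loss: the mean squared error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

y = [1, 0, 1, 1]   # true responses
f = [1, 1, 1, 0]   # model outputs f(x_i, w)
print(classification_risk(y, f))  # 0.5
print(regression_risk(y, f))      # 0.5
```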
Contributions of VC-theory
• The goal of learning: system imitation vs. system identification
• Two factors responsible for generalization
• Keep-It-Direct principle (Vapnik, 1995): do not solve a problem of interest by solving a more general (harder) problem as an intermediate step
• Clear distinction between:
  - problem setting
  - solution approach (inductive principle)
  - learning algorithm

10
Alternative Formulations
• Re-examine the assumptions behind standard inductive learning:
  1. Finite training set + large unknown test set
     → non-inductive types of inference (transduction, ...)
  2. Particular loss functions
     → new inductive formulations (application-driven)
  3. Single model
     → multiple model estimation

11
1. Transduction
• How to incorporate unlabeled test data into the learning process
• Estimating a function at given points
  Given: training data (Xi, yi), i = 1, ..., n
  and unlabeled test points Xn+j, j = 1, ..., k
  Estimate: class labels at these test points
  Note: we need to predict only at the given test points Xn+j, not for every possible input X

12
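As an illustration of the transductive setting, here is a minimal sketch that estimates labels only at the given test points. It uses a 1-nearest-neighbor rule as a crude stand-in; it is not the margin-based SVM transduction discussed in the tutorial, and all names are hypothetical.

```python
import numpy as np

def transduce_1nn(X_train, y_train, X_test):
    """Label each of the given test points by its nearest labeled training point.

    No function is estimated over the whole input space -- only the given
    test points receive labels, as in the transductive formulation.
    """
    labels = []
    for x in X_test:
        dist = np.sum((X_train - x) ** 2, axis=1)  # squared distances to training points
        labels.append(y_train[np.argmin(dist)])
    return np.array(labels)

X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, 0.2], [0.9, 0.8]])
print(transduce_1nn(X_train, y_train, X_test))  # [0 1]
```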
Transduction vs Induction
[Diagram: a priori knowledge / assumptions → estimated function (induction); estimated function → predicted output (deduction); training data → predicted output directly (transduction)]
13
Transduction based on size of margin
The problem: find the class label of a test input X

14
Many potential applications
• Prediction of molecular bioactivity for drug discovery
  - Training data ~ 1,909 samples; test data ~ 634 samples
  - Input space ~ 139,351-dimensional
  - Prediction accuracy: SVM induction ~ 74.5%; transduction ~ 82.3%
Ref: J. Weston et al. (2003), KDD Cup 2001 data analysis: prediction of molecular bioactivity for drug design - binding to thrombin, Bioinformatics
15
Beyond Transduction: Selection
• The selection problem
  Given: training data (Xi, yi), i = 1, ..., n
  and unlabeled test points Xn+j, j = 1, ..., k
  Select: a subset of m test points with the highest probability of belonging to one class
  Note: selective inference needs only to select a subset of m test points, rather than assign class labels to all test points.

16
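The selection problem can be sketched as ranking the test points by a model's decision score and keeping the top m; this scoring setup is an illustrative assumption of mine, not a method from the tutorial:

```python
import numpy as np

def select_top_m(scores, m):
    """Return indices of the m test points with the highest decision scores.

    No class labels are assigned to the remaining test points -- only a
    subset is selected, as in the selective-inference formulation.
    """
    return np.argsort(scores)[::-1][:m]

# Hypothetical decision scores (e.g., distances to a separating hyperplane)
scores = np.array([0.2, 0.9, 0.5, -0.3])
print(select_top_m(scores, 2))  # indices of the two most confident points
```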
Hierarchy of Types of Inference
• Identification
• Imitation
• Transduction
• Selection
• ...
Implications: philosophical, human learning

17
2. Application-driven formulations
[Diagram: APPLICATION NEEDS (loss function; input, output, and other variables; training/test data) + admissible models → FORMAL PROBLEM STATEMENT → LEARNING THEORY]
18
Inductive Learning System (revised)
• The learning machine observes samples (x, y) and returns an estimated response ŷ to minimize an application-specific Loss[f(x, w), y]
[Diagram: Generator of samples → x → Learning Machine → ŷ; System → y; application-specific Loss[f(x, w), y]]

19
Application: financial engineering
• Asset management via daily trading: a non-standard learning formulation
[Diagram: input indicators x → PREDICTIVE MODEL y = f(x) → prediction y → TRADING DECISION → Buy/Sell/Hold → MARKET → GAIN/LOSS]

20
Example: timing of mutual funds
• Background: buy-and-hold vs. trading
• Recent scandals in the mutual fund industry
• Daily trading scenario:
[Diagram: Index or Fund ↔ Money Market (sell or buy / buy or sell), driven by a proprietary exchange strategy]


21
Example of Actual Trading
• Improved return + reduced risk/volatility

22
Learning formulation for fund trading
Given:
- daily % price changes of a fund, qi = (pi − pi−1) / pi
- a time series of daily values of input variables xi
- an indicator decision function (1/0 ~ Buy/Sell), yi = f(xi, w)
Objective: maximize the total return over an n-day period:
  Q(w) = Σ_{i=1,...,n} f(xi, w) qi

23
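The objective above can be evaluated directly once the decision function's outputs are known. A minimal sketch (function and variable names are hypothetical):

```python
import numpy as np

def total_return(decisions, q):
    """Q(w) = sum_i f(x_i, w) * q_i: the return captured on days the model is in the fund."""
    return float(np.sum(np.asarray(decisions) * np.asarray(q)))

q = [0.01, -0.02, 0.03]   # daily % price changes q_i
decisions = [1, 0, 1]     # f(x_i, w): 1 = Buy (hold the fund), 0 = Sell (money market)
print(total_return(decisions, q))  # captures day 1 and day 3 gains, skips the day 2 loss
```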
Non-standard inductive formulation
• Maximize the total account value Q(w) = Σ_{i=1,...,n} f(xi, w) qi
• Neither classification nor regression
[Diagram: input indicators x → PREDICTIVE MODEL y = f(x) → prediction y → TRADING DECISION → Buy/Sell/Hold → MARKET → GAIN/LOSS]

24
3. Multiple Model Estimation
• Single-model formulation: estimate the unknown dependency x → y
• Multiple-model approach: the available data can be ‘explained’ using several models

25
Example data sets: Regression
[Figure: (a) two regression models vs. (b) a single complex model]

26
Multiple Model Formulation
• The available (training) data are generated by several (unknown) regression models:
  y = t_m(x) + ξ_m,  x ∈ X_m
• Goals of learning:
  - partition the available data (clustering, segmentation)
  - estimate a model for each subset of the data (supervised learning)
• Assumption: the majority of the data samples can be explained (described) by a single model.

27
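A crude sketch of the multiple-model idea for two linear components: alternate between assigning each sample to the model that fits it better and refitting both models by least squares. This simple heuristic is my own illustrative assumption; it is not the SVM-based algorithm referenced in the tutorial.

```python
import numpy as np

def fit_two_lines(x, y, n_iter=20):
    """Fit two linear models y = a*x + b to mixed data by alternating
    assignment and least-squares refitting (assumes the toy data really
    does contain two linear components, so neither subset goes empty)."""
    A = np.vstack([x, np.ones_like(x)]).T   # design matrix; solution = [slope, intercept]
    assign = y > np.median(y)               # crude initial partition of the samples
    for _ in range(n_iter):
        w1 = np.linalg.lstsq(A[assign], y[assign], rcond=None)[0]
        w0 = np.linalg.lstsq(A[~assign], y[~assign], rcond=None)[0]
        # reassign each sample to the model with the smaller residual
        assign = np.abs(A @ w1 - y) < np.abs(A @ w0 - y)
    return w0, w1, assign

# Toy data: two parallel lines, y = x and y = x + 2
x = np.concatenate([np.linspace(0, 1, 10)] * 2)
y = np.concatenate([np.linspace(0, 1, 10), np.linspace(0, 1, 10) + 2.0])
w0, w1, assign = fit_two_lines(x, y)
print(w0, w1)  # approximately [1, 0] and [1, 2]
```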
Experimental Results: Linear
[Figure, panels (a)-(d): data on [0, 1] with fitted M1 and M2 linear model estimates]
28
Experimental Results: Non-Linear
[Figure, panels (a)-(d): data on [0, 1] with fitted M1 and M2 nonlinear model estimates]

29
Multiple Model Classification
• Single-model approach → one complex model
• Multiple-model approach → two simple models

30
Procedure for MMC
• Initialization: available data = all training samples.
• Step 1: estimate the major model, i.e., apply robust classification to the available data.
  (Here, ‘robustness’ is with respect to variations of data generated by the minor model(s).)
• Step 2: partition the available data (from one class) into two subsets.
• Step 3: remove the subset of data (from one class) classified by the major model from the available data.
• Iterate.

31
Example of MMC: XOR data set
• Training phase

32
Comparison for toy data set
[Figure: (a) MMC hyperplanes H(1) and H(2); (b) RBF-SVM decision boundary]

33
Comparison continued
[Figure (c): SVM with polynomial kernel]
• Prediction accuracy:
  Method   Error   (%SV)
  RBF      0.058   (25.5%)
  Poly     0.067   (26.4%)
  MMC      0.055   (14.5%)
34
Summary for Multiple Model Estimation
• Improvements are due to the novel problem formulation, not to sophisticated algorithms
• Practical learning algorithm based on (linear) SVM
• The resulting model has a hierarchical structure
• Advantages:
  - interpretation
  - no kernel selection

35
Prediction and interpretation
• Many applications are intrinsically difficult to formalize
• Two practical goals of learning:
  - prediction (objective loss function)
  - interpretation/understanding (subjective)
• Most algorithms are developed for predictive settings, but are used for interpretation and human decision making
• Rationale: a good predictive model ~ the true model

36
Example: functional neuroimaging
• Understanding fMRI image data:
  estimate ‘good’ brain activation maps showing brain activity (colored patches) in response to specific tasks
• Measure of goodness: predictability, reproducibility

37
Predictive models for understanding
• Always assume an inductive formulation
• What if transduction yields much better prediction?
• Fundamental problem (classical view):
  - human reasoning ~ logic + induction
  - transduction does not fit this paradigm
• Goal of science: understanding
• Goal of science: perform/act well
38
Conclusions
• Methodological shift: think first about the problem formulation, rather than about learning algorithms
• Importance of problem formulation:
  - for empirical comparisons
  - for the limits of predictive models
• Philosophical impact of Vapnik’s new types of (non-inductive) inference
39
References
• VC theory: V. Vapnik, Statistical Learning Theory, Wiley, NY
• Transduction: V. Vapnik (1998), Statistical Learning Theory, Wiley, + many recent papers
• Timing of mutual funds: E. Zitzewitz (2002), Who cares about shareholders? Arbitrage-proofing mutual funds, Journal of Law, Economics and Organization, 19 (2), pp. 245-280
• Multiple model estimation:
  - Y. Ma and V. Cherkassky (2003), Multiple model classification using SVM-based approach, in Proc. IJCNN
  - V. Cherkassky and Y. Ma (2005), Multiple model regression estimation, IEEE Trans. Neural Networks, 16 (4), pp. 785-798

40
