Machine Learning Foundations
Lecture 1: The Learning Problem

Hsuan-Tien Lin
htlin@csie.ntu.edu.tw
Department of Computer Science
& Information Engineering
National Taiwan University

The Learning Problem
Course Introduction

Course Design (1/2)

Machine Learning: a mixture of theoretical and practical tools
- theory oriented
  - derive everything deeply for solid understanding
  - less interesting to a general audience
- techniques oriented
  - skim over the flashiest techniques broadly for shiny coverage
  - too many techniques, hard to choose, hard to use properly

our approach: foundation oriented

Course Design (2/2)

Foundation Oriented ML Course
- mixture of philosophical illustrations, key theory, core techniques, usage in practice, and hopefully jokes :-)
- what every machine learning user should know
- story-like:
  - When Can Machines Learn? (illustrative + technical)
  - Why Can Machines Learn? (theoretical + illustrative)
  - How Can Machines Learn? (technical + practical)
  - How Can Machines Learn Better? (practical + theoretical)

allows students to learn future/untaught techniques or study deeper theory easily

Course History

NTU Version
- 15-17 weeks (2+ hours)
- highly praised, with English and blackboard teaching

Coursera Version
- 8 weeks of foundation (this course) + 7 weeks of techniques (coming course)
- Mandarin teaching to reach a wider audience in need
- slide-based teaching, improved with Coursera's quiz and homework mechanisms

goal: try making the Coursera version even better than the NTU version

Fun Time

Which of the following descriptions of this course is true?
1. the course will be taught in Taiwanese
2. the course will tell me the techniques that create the android Lieutenant Commander Data in Star Trek
3. the course will be 15 weeks long
4. the course will be story-like

Reference Answer: 4
1. no, my Taiwanese is unfortunately not good enough for teaching (yet)
2. no, although what we teach may serve as foundations of those (future) techniques
3. no, unless you choose to join the next course
4. yes, let's begin the story

Roadmap

1. When Can Machines Learn?
   - Course Introduction
   - What is Machine Learning
   - Applications of Machine Learning
   - Components of Machine Learning
   - Machine Learning and Other Fields
2. Why Can Machines Learn?
3. How Can Machines Learn?
4. How Can Machines Learn Better?

From Learning to Machine Learning

learning: acquiring skill with experience accumulated from observations
  observations -> learning -> skill

machine learning: acquiring skill with experience accumulated/computed from data
  data -> ML -> skill

What is skill?

A More Concrete Definition

skill <=> improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data
  data -> ML -> improved performance measure

An Application in Computational Finance
  stock data -> ML -> more investment gain

Why use machine learning?

Yet Another Application: Tree Recognition

- define trees and hand-program: difficult
- learn from data (observations) and recognize: a 3-year-old can do so
- an ML-based tree recognition system can be easier to build than a hand-programmed system

ML: an alternative route to build complicated systems

The Machine Learning Route

ML: an alternative route to build complicated systems

Some Use Scenarios
- when humans cannot program the system manually: navigating on Mars
- when humans cannot define the solution easily: speech/visual recognition
- when needing rapid decisions that humans cannot make: high-frequency trading
- when needing to be user-oriented on a massive scale: consumer-targeted marketing

Give a computer a fish, you feed it for a day;
teach it how to fish, you feed it for a lifetime. :-)

Key Essence of Machine Learning

  data -> ML -> improved performance measure

1. there exists some underlying pattern to be learned, so the performance measure can be improved
2. but there is no programmable (easy) definition, so ML is needed
3. somehow there is data about the pattern, so ML has some inputs to learn from

key essence: help decide whether to use ML

Fun Time

Which of the following is best suited for machine learning?
1. predicting whether the next cry of the baby girl happens at an even-numbered minute or not
2. determining whether a given graph contains a cycle
3. deciding whether to approve a credit card for some customer
4. guessing whether the earth will be destroyed by the misuse of nuclear power in the next ten years

Reference Answer: 3
1. no pattern
2. programmable definition
3. pattern: customer behavior; definition: not easily programmable; data: history of bank operation
4. arguably no (or not enough) data yet

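
The three conditions of the key essence can be read as a simple checklist. The sketch below is only an illustration under stated assumptions: the `Scenario` class and the `suits_ml` function are hypothetical names, and every yes/no value marked "assumed" is a guess made for the example, while the other values follow the reference answer above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    has_pattern: bool          # is there an underlying pattern to learn?
    easily_programmable: bool  # is there an easy programmable definition?
    has_data: bool             # is there data about the pattern?


def suits_ml(s: Scenario) -> bool:
    """Return True only when all three key-essence conditions hold."""
    return s.has_pattern and not s.easily_programmable and s.has_data


scenarios = [
    # "no pattern" is from the reference answer; the other two values are assumed
    Scenario("baby cry at even-numbered minute", False, False, True),
    # "programmable definition" is from the reference answer; the others are assumed
    Scenario("graph contains a cycle", True, True, True),
    # pattern, no easy definition, data: all three from the reference answer
    Scenario("credit card approval", True, False, True),
    # "no (or not enough) data yet" is from the reference answer; the others are assumed
    Scenario("earth destroyed by nuclear misuse", True, False, False),
]

suitable = [s.name for s in scenarios if suits_ml(s)]
print(suitable)
```

Only the credit card scenario passes all three checks, which matches the reference answer. A real decision is of course less clear-cut than three booleans.
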
Daily Needs: Food, Clothing, Housing, Transportation

  data -> ML -> skill

1. Food (Sadilek et al., 2013)
   - data: Twitter data (words + location)
   - skill: properly tell how likely a restaurant is to cause food poisoning
2. Clothing (Abu-Mostafa, 2012)
   - data: sales figures + client surveys
   - skill: give good fashion recommendations to clients
3. Housing (Tsanas and Xifara, 2012)
   - data: characteristics of buildings and their energy load
   - skill: closely predict the energy load of other buildings
4. Transportation (Stallkamp et al., 2012)
   - data: some traffic sign images and meanings
   - skill: recognize traffic signs accurately

ML is everywhere!

Education

  data -> ML -> skill

- data: students' records on quizzes on a Math tutoring system
- skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution

  $\text{answer correctly} \approx [\![\, \text{recent strength of student} > \text{difficulty of question} \,]\!]$

- give ML 9 million records from 3000 students
- ML determines (reverse-engineers) strength and difficulty automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010

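
To make the "strength > difficulty" rule concrete, here is a minimal sketch of how the two quantities might be reverse-engineered from quiz records. It is not the KDDCup 2010 system: the toy records, the names `strength`, `difficulty`, `predict`, and `fit`, and the mistake-driven update rule are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Toy quiz records (hypothetical): (student, question, answered_correctly)
records = [
    ("a", "q1", True),
    ("a", "q2", True),
    ("b", "q1", True),
    ("b", "q2", False),
]

strength = defaultdict(float)    # one number per student, starts at 0
difficulty = defaultdict(float)  # one number per question, starts at 0


def predict(student: str, question: str) -> bool:
    """The rule from the slide: answer correctly iff strength > difficulty."""
    return strength[student] > difficulty[question]


def fit(data, epochs: int = 10, step: float = 1.0) -> None:
    """Mistake-driven updates (an assumed, perceptron-like procedure)."""
    for _ in range(epochs):
        for student, question, correct in data:
            if predict(student, question) == correct:
                continue  # prediction already agrees with the record
            # Move strength and difficulty apart in the direction of the label.
            sign = 1.0 if correct else -1.0
            strength[student] += sign * step
            difficulty[question] -= sign * step


fit(records)
accuracy = sum(predict(s, q) == c for s, q, c in records) / len(records)
print(dict(strength), dict(difficulty), accuracy)
```

On these four records the procedure ends with student "a" stronger than student "b" and question "q1" easier than "q2", without anyone specifying those values by hand.
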
Entertainment: Recommender System (1/2)

  data -> ML -> skill

- data: how many users have rated some movies
- skill: predict how a user would rate an unrated movie

A Hot Problem
- competition held by Netflix in 2006
  - 100,480,507 ratings that 480,189 users gave to 17,770 movies
  - 10% improvement = 1 million dollar prize
- similar competition (movies -> songs) held by Yahoo! in KDDCup 2011
  - 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?

Entertainment: Recommender System (2/2)

[Figure: a viewer and a movie are each described by a list of factors. Viewer factors: likes comedy? likes action? prefers blockbusters? likes Tom Cruise? Movie factors: comedy content, action content, blockbuster? Tom Cruise in it? Match movie and viewer factors, then add contributions from each factor to get the predicted rating.]

- pattern: rating <- viewer/movie factors
- learning: known rating -> learned factors -> unknown rating prediction

key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011

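
The "match factors, then add contributions" idea can be sketched as a dot product between a viewer's factor list and a movie's factor list, with one gradient-style update on a known rating. This is a rough illustration only: the factor values, the learning rate, and the function names `predict_rating` and `sgd_step` are assumptions, not the method used in the KDDCup 2011 system.

```python
from typing import List, Tuple


def predict_rating(viewer: List[float], movie: List[float]) -> float:
    """Match each viewer factor with the movie factor and add the contributions."""
    return sum(v * m for v, m in zip(viewer, movie))


def sgd_step(viewer: List[float], movie: List[float], rating: float,
             lr: float = 0.1) -> Tuple[List[float], List[float]]:
    """One gradient step on the squared error of a single known rating."""
    err = rating - predict_rating(viewer, movie)
    new_viewer = [v + lr * err * m for v, m in zip(viewer, movie)]
    new_movie = [m + lr * err * v for v, m in zip(viewer, movie)]
    return new_viewer, new_movie


# Hypothetical starting factors, e.g. (comedy, action)
viewer = [1.0, 0.5]
movie = [2.0, 1.0]
known_rating = 4.0

before = predict_rating(viewer, movie)
viewer, movie = sgd_step(viewer, movie, known_rating)
after = predict_rating(viewer, movie)
print(before, after)
```

After the step, the prediction for the known rating moves closer to 4.0. Repeating such steps over many known ratings is one way the factors could be learned and then reused to predict ratings that are not yet known.
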
Fun Time

Which of the following fields cannot use machine learning?
1. Finance
2. Medicine
3. Law
4. none of the above

Reference Answer: 4
1. predict stock price from data
2. predict medicine effect from data
3. summarize legal documents from data
4. :-) Welcome to study this hot topic!

Components of Learning: Metaphor Using Credit Approval

Applicant Information
  age                  23 years
  gender               female
  annual salary        NTD 1,000,000
  years in residence   1 year
  years in job         0.5 year
  current debt         200,000

unknown pattern to be learned: approve credit card good for bank?

Formalize the Learning Problem

Basic Notations
- input: $\mathbf{x} \in \mathcal{X}$ (customer application)
- output: $y \in \mathcal{Y}$ (good/bad after approving credit card)
- unknown pattern to be learned $\Leftrightarrow$ target function: $f\colon \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
- data $\Leftrightarrow$ training examples: $\mathcal{D} = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$ (historical records in bank)
- hypothesis $\Leftrightarrow$ skill with hopefully good performance: $g\colon \mathcal{X} \to \mathcal{Y}$ (learned formula to be used)

  $\{(\mathbf{x}_n, y_n)\}$ from $f$ -> ML -> $g$

Learning Flow for Credit Approval

  unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
    -> training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
    -> learning algorithm $\mathcal{A}$
    -> final hypothesis $g \approx f$ (learned formula to be used)

- target $f$ unknown (i.e. no programmable definition)
- hypothesis $g$ hopefully $\approx f$, but possibly different from $f$ (perfection impossible when $f$ unknown)

What does $g$ look like?

The Learning Model

  training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$ -> learning algorithm $\mathcal{A}$ -> final hypothesis $g \approx f$
  hypothesis set $\mathcal{H}$ (set of candidate formulas) -> learning algorithm $\mathcal{A}$

- assume $g \in \mathcal{H} = \{h_k\}$, i.e. approving if
  - $h_1$: annual salary > NTD 800,000
  - $h_2$: debt > NTD 100,000 (really?)
  - $h_3$: year in job $\le$ 2 (really?)
- hypothesis set $\mathcal{H}$:
  - can contain good or bad hypotheses
  - up to $\mathcal{A}$ to pick the best one as $g$

learning model = $\mathcal{A}$ and $\mathcal{H}$

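
The roles of the training examples, the hypothesis set, and the learning algorithm can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `Applicant` record, four made-up training examples, and a learning algorithm `A` that simply picks the candidate with the fewest mistakes on the data; the lecture does not specify how the algorithm chooses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Applicant:
    annual_salary: float  # NTD
    debt: float           # NTD
    years_in_job: float


Hypothesis = Callable[[Applicant], bool]  # True means "approve"

# Hypothesis set H: the three candidate formulas from the slide.
H: Dict[str, Hypothesis] = {
    "h1": lambda x: x.annual_salary > 800_000,
    "h2": lambda x: x.debt > 100_000,
    "h3": lambda x: x.years_in_job <= 2,
}

# Training examples D: hypothetical (applicant, good_for_bank) records.
D: List[Tuple[Applicant, bool]] = [
    (Applicant(1_000_000, 200_000, 0.5), True),
    (Applicant(500_000, 50_000, 3), False),
    (Applicant(900_000, 0, 5), True),
    (Applicant(300_000, 150_000, 1), False),
]


def training_errors(h: Hypothesis, data: List[Tuple[Applicant, bool]]) -> int:
    """Count the examples on which h disagrees with the recorded outcome."""
    return sum(h(x) != y for x, y in data)


def A(data: List[Tuple[Applicant, bool]],
      hypotheses: Dict[str, Hypothesis]) -> Tuple[str, Hypothesis]:
    """Assumed learning algorithm: pick the hypothesis with the fewest errors."""
    best = min(hypotheses, key=lambda name: training_errors(hypotheses[name], data))
    return best, hypotheses[best]


name, g = A(D, H)
print(name, training_errors(g, D))
```

On this toy data the salary rule makes no mistakes while the other two make two each, so `A` returns it as `g`. With different data, a different member of `H` could win, which is the sense in which the model is the pair of `A` and `H`.
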
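Practical Definition of Machine Learning

  unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$
    -> training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
    -> learning algorithm $\mathcal{A}$ (choosing from hypothesis set $\mathcal{H}$)
    -> final hypothesis $g \approx f$

machine learning: use data to compute hypothesis $g$ that approximates target $f$
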
Fun Time

How to use the four sets below to form a learning problem for song recommendation?
  $S_1$ = [0, 100]
  $S_2$ = all possible (userid, songid) pairs
  $S_3$ = all formulas that multiply user factors & song factors, indexed by all possible combinations of such factors
  $S_4$ = 1,000,000 pairs of ((userid, songid), rating)

1. $S_1 = \mathcal{X}$, $S_2 = \mathcal{Y}$, $S_3 = \mathcal{H}$, $S_4 = \mathcal{D}$
2. $S_1 = \mathcal{Y}$, $S_2 = \mathcal{X}$, $S_3 = \mathcal{H}$, $S_4 = \mathcal{D}$
3. $S_1 = \mathcal{D}$, $S_2 = \mathcal{H}$, $S_3 = \mathcal{Y}$, $S_4 = \mathcal{X}$
4. $S_1 = \mathcal{X}$, $S_2 = \mathcal{D}$, $S_3 = \mathcal{Y}$, $S_4 = \mathcal{H}$

Reference Answer: 2
  $S_4$ -> $\mathcal{A}$ on $S_3$ -> ($g\colon S_2 \to S_1$)

Machine Learning and Data Mining

Machine Learning: use data to compute hypothesis $g$ that approximates target $f$
Data Mining: use (huge) data to find property that is interesting

- if interesting property is the same as hypothesis that approximates target: ML = DM (usually what KDDCup does)
- if interesting property is related to hypothesis that approximates target: DM can help ML, and vice versa (often, but not always)
- traditional DM also focuses on efficient computation in large databases

difficult to distinguish ML and DM in reality

Machine Learning and Artificial Intelligence

Machine Learning: use data to compute hypothesis $g$ that approximates target $f$
Artificial Intelligence: compute something that shows intelligent behavior

- $g \approx f$ is something that shows intelligent behavior: ML can realize AI, among other routes
- e.g. chess playing
  - traditional AI: game tree
  - ML for AI: learning from board data

ML is one possible route to realize AI

Machine Learning and Statistics

Machine Learning: use data to compute hypothesis $g$ that approximates target $f$
Statistics: use data to make inference about an unknown process

- $g$ is an inference outcome; $f$ is something unknown: statistics can be used to achieve ML
- traditional statistics also focuses on provable results with math assumptions, and cares less about computation

statistics: many useful tools for ML

Fun Time

Which of the following claims is not totally true?
1. machine learning is a route to realize artificial intelligence
2. machine learning, data mining and statistics all need data
3. data mining is just another name for machine learning
4. statistics can be used for data mining

Reference Answer: 3
While data mining and machine learning do share a huge overlap, they are arguably not equivalent because of the difference in focus.

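Summary

1. When Can Machines Learn?
   - Course Introduction: foundation oriented and story-like
   - What is Machine Learning: use data to approximate target
   - Applications of Machine Learning: almost everywhere
   - Components of Machine Learning: $\mathcal{A}$ takes $\mathcal{D}$ and $\mathcal{H}$ to get $g$
   - Machine Learning and Other Fields: related to DM, AI and Stats
   - next: a simple and yet useful learning model ($\mathcal{H}$ and $\mathcal{A}$)
2. Why Can Machines Learn?
3. How Can Machines Learn?
4. How Can Machines Learn Better?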
