2-Inductive Learning

Introduction to Machine Learning: CS 436/580L
Inductive (Supervised) Learning: Hypothesis Spaces
Instructor: Arti Ramesh
Binghamton University
Administrivia
• Homework 0 is available on myCourses
  – Worth 4/40 points
  – Due Sep 6th, Wednesday, 11:59 pm
  – Late penalty is 10% after the deadline
Probability Review Recap
• Mutual Exclusion
• Independence
• Conditional Independence
• Expectation
• Variance
Probability Review
The weather on a particular day can be sunny, cloudy, or rainy: sunny with probability 0.3, cloudy with probability 0.4, and rainy with probability 0.3. A concert is planned to be held in the city. If the weather is sunny, the concert will definitely be held; if the weather is cloudy or rainy, it will be held with probability 0.8 and 0.5, respectively. What is the probability that the concert will be held?
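A worked solution via the law of total probability:

    P(held) = P(held | sunny) P(sunny) + P(held | cloudy) P(cloudy) + P(held | rainy) P(rainy)
            = 1.0 × 0.3 + 0.8 × 0.4 + 0.5 × 0.3
            = 0.30 + 0.32 + 0.15
            = 0.77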
Probability Review
Let X denote the sum of two fair dice. What is the expectation of X?
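A worked solution using linearity of expectation: a single fair die has expectation (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, so the sum of two dice has E[X] = 3.5 + 3.5 = 7.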
Recap
• Different definitions of machine learning, and all are correct!!!
• Slight variations according to the type of learning
• Types of Learning
  – Supervised Learning
  – Unsupervised Learning
  – Reinforcement Learning
  – Semi-Supervised Learning
Types of Learning
• Supervised Learning
  – problem: the learner is required to learn a function which maps a vector into one of several classes by looking at several input-output examples of the function
  – standard formulation of the supervised learning task: classification
Types of Learning
• Unsupervised Learning
  – models a set of inputs: labeled examples are not available
  – standard formulation of the unsupervised learning task: clustering (a small sketch follows below)
• Semi-supervised Learning
  – combines both labeled and unlabeled examples to generate an appropriate function or classifier
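As a concrete illustration of clustering, a minimal 1-D k-means sketch; the algorithm choice and the toy data are my own assumptions, not part of the slides:

    import random

    def kmeans_1d(points, k=2, iters=20):
        """Minimal 1-D k-means: alternate nearest-centroid assignment and mean update."""
        centroids = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                # Assign each point to its nearest centroid.
                nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
                clusters[nearest].append(p)
            # Move each centroid to the mean of its assigned points.
            centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        return centroids

    print(kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.9]))  # two centroids near 1.0 and 5.1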
Types of Learning
• Reinforcement Learning
  – the algorithm learns a policy of how to act given an observation of the world
  – every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm (a toy sketch follows below)
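As a toy illustration of the action/feedback loop (a sketch of my own, not the course's example), a minimal epsilon-greedy two-armed bandit with invented payout probabilities:

    import random

    TRUE_PAYOUT = [0.3, 0.6]  # hypothetical reward probability of each arm (hidden from the learner)

    def pull(arm):
        # Environment feedback: reward 1 with the arm's payout probability, else 0.
        return 1 if random.random() < TRUE_PAYOUT[arm] else 0

    counts = [0, 0]      # times each arm was pulled
    values = [0.0, 0.0]  # running average reward per arm

    for t in range(1000):
        # Policy: mostly exploit the best-looking arm, occasionally explore at random.
        arm = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
        r = pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update

    print(values)  # estimates should approach [0.3, 0.6]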
Which type of learning is best?
• Determining the best move to make in a game
• Distinguishing between dog, cat, and horse pictures
• Elevator scheduling
• Agent in the field trying to defuse a bomb
• Speech analysis of telephone conversations (400 hours of annotation time for each hour of speech)
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three components:
  – Representation
  – Evaluation
  – Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• Etc.
Optimization
• Combinatorial optimization
  – E.g.: Greedy search
• Convex optimization
  – E.g.: Gradient descent (see the sketch below)
• Constrained optimization
  – E.g.: Linear programming
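For instance, a minimal sketch of gradient descent; the objective f(w) = (w - 3)^2 and the learning rate are invented for illustration:

    w = 0.0   # initial guess
    lr = 0.1  # learning rate (arbitrary choice for the example)
    for step in range(100):
        grad = 2 * (w - 3)  # derivative of (w - 3)^2
        w -= lr * grad      # step against the gradient
    print(w)  # converges toward the minimizer w = 3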
Supervised Learning
• Given: training examples (x, f(x)), for some unknown function f
• Find: an approximation to f (a minimal sketch follows the examples below)

Example Applications
• Credit risk assessment
  – x: properties of customer and proposed purchase
  – f(x): approve/reject purchase
• Disease diagnosis
  – x: properties of patient (symptoms, lab tests)
  – f(x): disease diagnosis, recommended therapy
• Face recognition
  – x: bitmap picture of person's face
  – f(x): person's name
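To make "find an approximation to f" concrete, a minimal sketch that memorizes the training pairs (x, f(x)) and predicts with the label of the nearest stored x; nearest-neighbor is my illustrative choice here, not a method the slides prescribe:

    def nearest_neighbor(train, x):
        """train: list of (vector, label) pairs; returns the label of the closest vector."""
        def dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        _, label = min(train, key=lambda pair: dist(pair[0], x))
        return label

    # Toy training set: (x, f(x)) pairs with invented values.
    train = [((5.1, 3.5), 0), ((6.1, 3.0), 1), ((7.2, 3.6), 2)]
    print(nearest_neighbor(train, (6.0, 3.1)))  # -> 1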
Appropriate Applications for Supervised Learning
• Situations where there is no human expert
  – x: bond graph for a new molecule
  – f(x): predicted binding strength to AIDS protease molecule
• Situations where humans can perform the task but can't describe how to do it
  – x: bitmap picture of handwritten character
  – f(x): ASCII code of character
• Situations where the desired f(x) is changing rapidly
  – x: description of stock prices and trades for the last 10 days
  – f(x): recommended stock transactions
• Situations where each user needs a customized f
  – x: incoming email message
  – f(x): importance score for presenting to user
Example: A dataset for supervised learning

Sepal Length  Sepal Width  Petal Length  Petal Width  Class
5.1           3.5          1.4           0.2          Iris-Setosa
6.1           3.0          4.6           1.4          Iris-Versicolor
7.2           3.6          6.1           2.5          Iris-Virginica
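A minimal sketch of fitting a classifier to this dataset, assuming scikit-learn (which bundles the full 150-example iris data) is available:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)        # rows of the four measurements above
    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))  # -> [0], i.e. Iris-Setosa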
Supervised Learning: Problem Formal Definition

[Figure: training points <x, y> drawn from a distribution P(x, y) feed a learning algorithm, which outputs a hypothesis f; the plot shows a region of X's labeled f(x) = 1, a region of 0's labeled f(x) = 0, and a test point labeled f(x) = ?]
A Learning Problem!

X1 X2 X3 X4 X5 X6 X7 X8 X9 | f(x)
X  0  X  0  X  0  0  X  X  |  1
X  0  X  X  X  0  X  0  0  |  1
X  X  X  0  X  X  0  0  0  |  1
0  X  0  X  0  X  0  X  X  |  0
0  0  X  X  X  0  0  X  X  |  0
0  X  X  X  0  0  0  X  X  |  0
0  X  X  0  X  0  X  X  0  |  ?

• x: a 9-dimensional vector
• f(x): a function or a program that takes the vector as input and outputs either a 0 or a 1
• Task: given the training examples, find a good approximation to f so that, if you see an unseen vector x in the future, you will be able to figure out the value of f(x)
Example of a learning problem
A simpler example for classification problem analysis!
How to find a good approximation to f?
• A possible/plausible technique:

[Diagram: an unknown function f: X → Y generates training examples/data (x, f(x)), which feed a learning algorithm equipped with a hypothesis space H; the algorithm outputs a good approximation h ≈ f]
Hypothesis Spaces
• You are assuming that the unknown function f could be any one of the 2^512 possible functions! (With 9 binary features there are 2^9 = 512 possible inputs, and each input can be mapped to 0 or 1.)
• It turns out that, of those 2^512 functions, 2^506 classify all points in the training data correctly: the six labeled examples fix six entries of the truth table, leaving 512 − 6 = 506 entries free.
Hypothesis Spaces
• 10,000 features
• Features are binary
• Output is binary
Number of boolean functions? With n binary features there are 2^n possible inputs, so 2^(2^n) distinct boolean functions; for n = 10,000 that is 2^(2^10000). A brute-force check for a tiny n is sketched below.
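A brute-force check of this counting argument for a tiny case (n = 2 features; the two training examples are invented):

    from itertools import product

    n = 2
    inputs = list(product([0, 1], repeat=n))          # 2^n = 4 possible inputs
    functions = list(product([0, 1], repeat=2 ** n))  # 2^(2^n) = 16 truth tables

    train = [((0, 0), 0), ((1, 1), 1)]  # two labeled training examples

    # A function is consistent if its truth-table entry matches every training label.
    consistent = [f for f in functions
                  if all(f[inputs.index(x)] == y for x, y in train)]
    print(len(functions), len(consistent))  # 16 4, i.e. 2^(4-2) functions remain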
Two Views of Learning
• Learning is the removal of uncertainty
• Learning requires guessing a good, small hypothesis class
• We could be wrong!
  – Our prior knowledge might be wrong!
  – Our guess for the hypothesis class could be wrong!
• The smaller the hypothesis class, the more likely we are to be wrong!
Strategies for Machine Learning
• Strategy 1: Develop languages for expressing prior knowledge: rule grammars and stochastic models
• Strategy 2: Develop flexible hypothesis spaces: nested collections of hypotheses – decision trees, rules, neural networks, …
• In either case:
  – Develop algorithms for finding a hypothesis that fits the data!
Terminology
• Training Example: An example of the form (x, f(x))
• Target function (target concept): The true function f
• Hypothesis: A proposed function h believed to be similar to f
• Concept: A boolean function. Examples for which f(x) = 1 are called positive examples or positive instances of the concept. Examples for which f(x) = 0 are called negative examples or negative instances of the concept.
• Classifier: A discrete-valued function. The possible values of f are called class labels: f ∈ {1, 2, …, K}
• Hypothesis Space: The space of hypotheses that can be output by a learning algorithm
Key Issues in Machine Learning
• What are good hypothesis spaces?
  – Which spaces are useful in practical applications, and why?
• What algorithms can work in these spaces?
  – Are there general design principles for machine learning algorithms?
• How can we optimize accuracy on future data points?
  – This is sometimes called the problem of overfitting
• How can we have confidence in the results?
  – How much training data is required to find an accurate hypothesis?
• Are some learning problems computationally intractable? (the computational question!)
• How can we formulate application problems as machine learning problems? (the engineering question!)
Steps in Supervised Learning
1. Determine the representation for (x, f(x)) and determine what x to use (Feature Engineering)
2. Gather a training set (not all data is kosher) (Data Cleaning)
3. Select a suitable evaluation method
4. Find a suitable learning algorithm among a plethora of available choices
   – Issues discussed on the previous slide
Feature Engineering is the Key
• Most effort in ML projects is constructing features
• Black art: intuition and creativity required
  – Understand the properties of the task at hand
  – Understand how the features interact with or limit the algorithm you are using
• ML is an iterative process
  – Try different types of features, experiment with each, and then decide which feature set/algorithm combination to use
A sample machine learning algorithm
• 2-way classification problem
  – +ve and –ve classes
• Representation: lines (Ax + By + C = 0)
  – Specifically:
    • if Ax + By + C > 0, then classify as "+ve"
    • else classify as "–ve"
• Evaluation: number of misclassified examples
• Optimization: an algorithm that searches for the three parameters A, B, and C (a minimal sketch follows below)
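A minimal sketch of this recipe, using random search as the optimizer (an assumed choice; any search over A, B, C would do) on four invented data points:

    import random

    # Toy labeled data: (x, y, label) with label +1 or -1; values invented for the example.
    data = [(1.0, 1.0, -1), (2.0, 1.5, -1), (4.0, 4.5, +1), (5.0, 4.0, +1)]

    def errors(A, B, C):
        # Evaluation: number of misclassified examples under the rule Ax + By + C > 0 => +1.
        return sum(1 for x, y, label in data
                   if (1 if A * x + B * y + C > 0 else -1) != label)

    # Optimization: random search over the three parameters A, B, C.
    best, best_err = (0.0, 0.0, 0.0), len(data)
    for _ in range(10000):
        cand = tuple(random.uniform(-5, 5) for _ in range(3))
        e = errors(*cand)
        if e < best_err:
            best, best_err = cand, e
    print(best, best_err)  # a separating line should reach 0 errors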
Toy Example

[Figure: scatter plot with Income on the x-axis and a line satisfying Ax + By + C > 0 on one side separating the classes; gridlines at 45K and 60K. Blue circles: good credit (low risk). Red circles: bad credit (high risk).]
Toy Example: More data

[Figure: scatter plot of x2 vs. Income with additional points. Blue circles: good credit (low risk). Red circles: bad credit (high risk).]