2-Inductive Learning

Introduction to Machine Learning: CS 436/580L
Inductive (Supervised) Learning: Hypothesis Spaces
Instructor: Arti Ramesh
Binghamton University
Administrivia  
• Homework 0 is available on myCourses
– Worth 4/40 points
– Due Sep 6th, Wednesday, 11:59 pm
– Late penalty is 10% after the deadline

Probability Review Recap

• Mutual Exclusion
• Independence
• Conditional Independence
• Expectation
• Variance

Probability Review

The weather on a particular day can be sunny, cloudy, or rainy: sunny with probability 0.3, cloudy with probability 0.4, and rainy with probability 0.3. A concert is planned to be held in the city. If the weather is sunny, the concert will be held with probability 1.0; if the weather is cloudy or rainy, it will be held with probability 0.8 and 0.5, respectively.
What is the probability that the concert will be held?
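
One way to work it out, via the law of total probability (conditioning on the weather):

$$P(\text{held}) = 0.3 \cdot 1.0 + 0.4 \cdot 0.8 + 0.3 \cdot 0.5 = 0.30 + 0.32 + 0.15 = 0.77$$
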
Probability Review

Let X denote the sum of two fair dice. What is the expectation of X?
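
By linearity of expectation, with D1 and D2 the two dice, each die has mean (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, so:

$$E[X] = E[D_1] + E[D_2] = 3.5 + 3.5 = 7$$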
   

Recap

• Different definitions of machine learning, and all are correct!!!
• Slight variations according to the type of learning
• Types of learning:
– Supervised Learning
– Unsupervised Learning
– Reinforcement Learning
– Semi-Supervised Learning

Types of Learning

• Supervised Learning
– Problem: the learner is required to learn a function that maps a vector into one of several classes by looking at several input-output examples of the function.
– Standard formulation of the supervised learning task: classification

Types of Learning

• Unsupervised Learning
– Models a set of inputs; labeled examples are not available
– Standard formulation of the unsupervised learning task: clustering
• Semi-Supervised Learning
– Combines both labeled and unlabeled examples to generate an appropriate function or classifier

Types of Learning

• Reinforcement Learning
– The algorithm learns a policy of how to act given an observation of the world
– Every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm

Which type of learning is best?

• Determining the best move to make in a game
• Distinguishing between pictures of dogs, cats, and horses
• Elevator scheduling
• An agent in the field trying to defuse a bomb
• Speech analysis of telephone conversations (400 hours of annotation time for each hour of speech)

ML in a Nutshell

• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three components:
– Representation
– Evaluation
– Optimization

Representation

• Decision trees
• Sets of rules / logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.

Evaluation

• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / utility
• Margin
• Entropy
• K-L divergence
• Etc.

Optimization

• Combinatorial optimization
– E.g., greedy search
• Convex optimization
– E.g., gradient descent
• Constrained optimization
– E.g., linear programming

Supervised Learning

• Given: training examples (x, f(x)) for some unknown function f
• Find: an approximation to f

Example Applications

• Credit risk assessment
– x: properties of the customer and proposed purchase
– f(x): approve/reject the purchase
• Disease diagnosis
– x: properties of the patient (symptoms, lab tests)
– f(x): disease diagnosis, recommended therapy
• Face recognition
– x: bitmap picture of a person's face
– f(x): the person's name

Appropriate Applications for Supervised Learning

• Situations where there is no human expert
– x: bond graph for a new molecule
– f(x): predicted binding strength to the AIDS protease molecule
• Situations where humans can perform the task but can't describe how they do it
– x: bitmap picture of a handwritten character
– f(x): ASCII code of the character
• Situations where the desired f(x) is changing rapidly
– x: description of stock prices and trades for the last 10 days
– f(x): recommended stock transactions
• Situations where each user needs a customized f
– x: incoming email message
– f(x): importance score for presenting the message to the user

Example: A dataset for supervised learning

Sepal Length   Sepal Width   Petal Length   Petal Width   Class
5.1            3.5           1.4            0.2           Iris-Setosa
6.1            3.0           4.6            1.4           Iris-Versicolor
7.2            3.6           6.1            2.5           Iris-Virginica

• The columns are called input variables, features, or attributes
• The type of flower {Iris-Setosa, Iris-Versicolor, Iris-Virginica} is called the target variable, output variable, or label
• A row in the table is called a training example
• The whole table is called the (training, validation, test, or evaluation) data set
• The problem of predicting the label is called classification
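
As a concrete illustration, this is the classic Iris dataset, which ships with scikit-learn; a minimal sketch for inspecting it, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris

# X: one row per training example, one column per feature
# y: integer-coded class labels (0 = setosa, 1 = versicolor, 2 = virginica)
X, y = load_iris(return_X_y=True)

print(X.shape)                    # (150, 4): 150 examples, 4 features
print(load_iris().target_names)   # ['setosa' 'versicolor' 'virginica']
```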

Supervised Learning: Formal Definition

[Figure: training samples drawn from P(x,y) feed a learning algorithm, which outputs f; a test point <x,y> drawn from the same P(x,y) is split so that f receives x and predicts ŷ, and a loss function compares ŷ with y to produce L(ŷ,y).]

• Training examples are drawn independently at random according to an unknown probability distribution P(x,y)
• The learning algorithm analyzes the examples and produces a classifier f or hypothesis h
• Given a new data point <x,y> drawn from P (independently and at random), the classifier is given x and predicts ŷ = f(x)
• The loss L(ŷ,y) is then measured
• Goal of the learning algorithm: find the f that minimizes the expected loss
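
To make the loss-minimization framing concrete, here is a minimal sketch that estimates the expected 0/1 loss of a fixed classifier on points sampled from a known toy distribution; the names (sample_point, f) and the distribution are illustrative, not from the slides:

```python
import random

def sample_point():
    """Toy stand-in for the unknown P(x,y): y is the majority bit of x."""
    x = [random.randint(0, 1) for _ in range(9)]
    y = 1 if sum(x) >= 5 else 0
    return x, y

def f(x):
    """Some hypothesis produced by a learning algorithm (here: guess from x[0])."""
    return x[0]

def zero_one_loss(y_hat, y):
    return 0 if y_hat == y else 1

# The expected loss E[L(f(x), y)] is approximated by an average over fresh samples.
n = 10_000
avg_loss = sum(zero_one_loss(f(x), y) for x, y in (sample_point() for _ in range(n))) / n
print(f"estimated expected 0/1 loss: {avg_loss:.3f}")
```
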
A learning problem!

X 0 X     X 0 X     X X X
0 X 0     X X 0     0 X X     f(x) = 1
0 X X     X 0 0     0 0 0

0 X 0     0 0 X     0 X X
X 0 X     X X 0     X 0 0     f(x) = 0
0 X X     0 X X     0 X X

0 X X
0 X 0     f(x) = ?
X X 0

A Learning Problem!

X1   X2   X3   X4   X5   X6   X7   X8   X9   f(x)
X    0    X    0    X    0    0    X    X    1
X    0    X    X    X    0    X    0    0    1
X    X    X    0    X    X    0    0    0    1
0    X    0    X    0    X    0    X    X    0
0    0    X    X    X    0    0    X    X    0
0    X    X    X    0    0    0    X    X    0
0    X    X    0    X    0    X    X    0    ?

• x: a 9-dimensional vector
• f(x): a function or a program that takes the vector as input and outputs either a 0 or a 1
• Task: given the training examples, find a good approximation to f, so that in the future, when you see an unseen vector x, you will be able to figure out the value of f(x)

Example of a learning problem

A simpler example for analysis: a classification problem.

Given data or examples, find the function f.

How to find a good approximation to f?

• A possible/plausible technique:

[Figure: the unknown function f: X → Y generates training examples/data (x, f(x)), which feed a learning algorithm; the learning algorithm also takes a hypothesis space H, the set of candidate functions (your assumptions about f), and outputs a good approximation h ≈ f.]

Hypothesis Spaces

You are assuming that the unknown function f could be any one of the 2^16 possible functions! It turns out that, of those 2^16 functions, 2^9 classify all points in the training data correctly!
(Why these numbers: the example presumably has 4 binary features, giving 2^4 = 16 distinct inputs and hence 2^16 boolean functions; the training data fixes f on 16 - 9 = 7 of those inputs, one per training example, leaving 2^9 functions consistent with it.)

Hypothesis Spaces

• 10,000 features
• Features are binary
• Output is binary

Number of boolean functions?
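
For reference, the counting argument: with n binary features there are 2^n possible input vectors, and a boolean function independently assigns an output of 0 or 1 to each of them, so the hypothesis space contains

$$2^{2^n} = 2^{2^{10000}}$$

functions; an astronomically large space.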

Hypothesis Spaces

You are assuming that the unknown function f could be any one of the 16 conjunctive rules! (Over 4 binary features, each feature is either included in the conjunction or not, giving 2^4 = 16 rules.)

Unfortunately, none of them classify the training data correctly.
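
A minimal sketch of what "none of them work" means operationally, assuming 4 binary features and made-up training data for illustration: enumerate every conjunctive rule and keep those consistent with all training examples.

```python
from itertools import combinations

# Toy training set: (input bits, label). Chosen so that, as on the slide,
# no conjunctive rule is consistent with the data.
train = [((1, 0, 0, 0), 1), ((0, 1, 0, 0), 1), ((1, 1, 0, 0), 0), ((0, 0, 0, 0), 0)]

def conjunction(indices):
    """Rule: f(x) = 1 iff every feature listed in `indices` is 1 (the empty rule always says 1)."""
    return lambda x: int(all(x[i] == 1 for i in indices))

n = 4
# All 2^n = 16 conjunctive rules: one per subset of the n features.
rules = [idx for k in range(n + 1) for idx in combinations(range(n), k)]

consistent = [idx for idx in rules
              if all(conjunction(idx)(x) == y for x, y in train)]
print(f"{len(consistent)} of {len(rules)} conjunctive rules fit the data: {consistent}")
```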

Hypothesis Spaces

m-of-n rules: at least m of the n chosen variables must be true.

[Figure: a table of the 32 m-of-n rules, with the single rule consistent with the training data marked "***".]

You are assuming that the unknown function f could be any one of the 32 m-of-n rules! Only one of them, the one marked by "***", works!
(Counting the rules: choose a nonempty subset of the 4 variables, say of size k, and a threshold m between 1 and k; that gives 1*4 + 2*6 + 3*4 + 4*1 = 32 rules.)

Two Views of Learning

• Learning is the removal of uncertainty
• Learning requires guessing a good, small hypothesis class
• We could be wrong!
– Our prior knowledge might be wrong!
– Our guess for the hypothesis class could be wrong!
• The smaller the hypothesis class, the more likely we are to be wrong!

Strategies for Machine Learning

• Strategy 1: Develop languages for expressing prior knowledge: rule grammars and stochastic models
• Strategy 2: Develop flexible hypothesis spaces: nested collections of hypotheses (decision trees, rules, neural networks, ...)
• In either case: develop algorithms for finding a hypothesis that fits the data!

Terminology

• Training example: an example of the form (x, f(x))
• Target function (target concept): the true function f
• Hypothesis: a proposed function h believed to be similar to f
• Concept: a boolean function. Examples for which f(x) = 1 are called positive examples or positive instances of the concept; examples for which f(x) = 0 are called negative examples or negative instances of the concept.
• Classifier: a discrete-valued function. The possible values f ∈ {1, 2, ..., K} are called class labels.
• Hypothesis space: the space of hypotheses that can be output by a learning algorithm

Key Issues in Machine Learning

• What are good hypothesis spaces?
– Which spaces are useful in practical applications, and why?
• What algorithms can work in these spaces?
– Are there general design principles for machine learning algorithms?
• How can we optimize accuracy on future data points?
– This is sometimes called the problem of overfitting
• How can we have confidence in the results?
– How much training data is required to find an accurate hypothesis?
• Are some learning problems computationally intractable? (the computational question!)
• How can we formulate application problems as machine learning problems? (the engineering question!)

Steps in Supervised Learning

1. Determine the representation for (x, f(x)) and determine which "x" to use (feature engineering)
2. Gather a training set, since not all data is kosher (data cleaning)
3. Select a suitable evaluation method
4. Find a suitable learning algorithm among a plethora of available choices
– Issues discussed on the previous slide

Feature Engineering is the Key

• Most of the effort in ML projects goes into constructing features
• Black art: intuition and creativity required
– Understand the properties of the task at hand
– Understand how the features interact with or limit the algorithm you are using
• ML is an iterative process
– Try different types of features, experiment with each, and then decide which feature set/algorithm combination to use

A sample machine learning algorithm

• 2-way classification problem
– +ve and -ve classes
• Representation: lines (Ax + By + C = 0)
– Specifically:
• If Ax + By + C > 0, then classify as "+ve"
• Else classify as "-ve"
• Evaluation: number of misclassified examples
• Optimization: an algorithm that searches for the three parameters A, B, and C

Toy Example

[Figure: scatter plot of age (x-axis: 20, 40, 60, 80) vs. income (y-axis: 15K, 30K, 45K, 60K) with a separating line; points on one side satisfy Ax + By + C > 0, points on the other satisfy Ax + By + C < 0.]

• Blue circles: good credit (low risk)
• Red circles: bad credit (high risk)
• Problem: fit a line that separates the two classes such that the error is minimized.

How do machine learners solve this problem?

• Try different lines until you find one that separates the data into the two classes
• A more plausible alternative (a sketch of this update loop follows below):
– Begin with a random line
– Repeat until there are no errors:
– For each point:
• If the current line says +ve and the point is -ve, then decrease A, B, and C
• If the current line says -ve and the point is +ve, then increase A, B, and C
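
One standard way to realize this increase/decrease rule is the perceptron update, where the adjustment to A, B, and C is scaled by the misclassified point's coordinates; a minimal sketch with made-up data (the dataset and names are illustrative, not from the slides):

```python
import random

# Toy data: (age, income, label), label +1 = good credit, -1 = bad credit.
points = [(25, 20_000, -1), (30, 18_000, -1), (45, 50_000, +1), (55, 48_000, +1)]

def predict(A, B, C, x, y):
    """Classify via the line: +1 if Ax + By + C > 0, else -1."""
    return 1 if A * x + B * y + C > 0 else -1

# Begin with a random line.
A, B, C = random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1)
lr = 1e-4  # step size; keeps updates modest for these feature magnitudes

# Repeat until no point is misclassified (the data must be linearly separable).
while any(predict(A, B, C, x, y) != label for x, y, label in points):
    for x, y, label in points:
        if predict(A, B, C, x, y) != label:
            # Misclassified: move A, B, C up for a +ve point, down for a -ve one,
            # scaled by the point's coordinates (the classic perceptron rule).
            A += lr * label * x
            B += lr * label * y
            C += lr * label

print(f"separating line: {A:.4f}*x + {B:.4f}*y + {C:.4f} = 0")
```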

Toy Example: More data

[Figure: the same age (x1) vs. income (x2) scatter plot, now with more data points.]

• Blue circles: good credit (low risk)
• Red circles: bad credit (high risk)
• Problem: fit a line that separates the two classes such that the error is minimized.

Learning = Representation + Evaluation + Optimization

• Combinations of just three elements:

Representation     Evaluation         Optimization
Instances          Accuracy           Greedy search
Hyperplanes        Precision/Recall   Branch & bound
Decision trees     Squared error      Gradient descent
Sets of rules      Likelihood         Quasi-Newton
Neural networks    Posterior prob.    Linear progr.
Graphical models   Margin             Quadratic progr.
Etc.               Etc.               Etc.
