ML Unit 2


MACHINE LEARNING

UNIT-II
Handling More than Two Classes

• Extending binary classification, scoring and class-probability estimation
to more than two classes.
• Two issues:
1. Evaluating multi-class performance.
2. Building multi-class models out of binary models.
Multi-Class Classification
• Classification tasks with more than two classes.
• Performance is summarised by a confusion matrix or contingency table.
• For k classes, the performance of a classifier is recorded in a k-by-k
contingency table.
Recall and Precision
• For each class, recall is the fraction of the instances of that class that
are predicted correctly (the diagonal entry of the contingency table divided
by its row total, with true classes as rows), and precision is the fraction
of instances predicted as that class that truly belong to it (the diagonal
entry divided by its column total).
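A minimal sketch of computing these per-class figures from a k-by-k
contingency table, assuming the convention that rows hold the true classes
and columns the predicted classes:

# Per-class recall and precision from a k-by-k contingency table.
# Assumed convention: rows = true classes, columns = predicted classes.
def per_class_recall_precision(table):
    k = len(table)
    recalls, precisions = [], []
    for i in range(k):
        tp = table[i][i]
        row_total = sum(table[i])                       # all instances of true class i
        col_total = sum(table[r][i] for r in range(k))  # all instances predicted as class i
        recalls.append(tp / row_total if row_total else 0.0)
        precisions.append(tp / col_total if col_total else 0.0)
    return recalls, precisions

# Example 3-class table (hypothetical counts):
table = [[15,  2,  3],
         [ 7, 15,  8],
         [ 2,  3, 45]]
print(per_class_recall_precision(table))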
• Many learners can be trained only on two classes (binary); to handle a
multi-class problem we build several binary models and combine them.
• Schemes/Methods (see the sketch below):
1. One vs Rest: train (n-1) models, e.g.
C1 vs the rest (C2, C3, ..., Cn),
C2 vs the rest (C1, C3, ..., Cn), and so on.
2. One vs One: train a binary model for every pair of classes.
If (C1,C2) = (C2,C1) (symmetric pairs), n(n-1)/2 models are needed;
if (C1,C2) ≠ (C2,C1) (asymmetric pairs), n(n-1) models are needed.
Output Code Matrix
One vs One

• Each class is assigned a code word over {+1, -1, 0}: +1 if it is the
positive class of a binary model, -1 if it is the negative class, and 0 if
it is not involved in that model.
For example, with classes C1, C2, C3 the columns for the pairs are
(C1,C2) = (+1,-1,0), (C1,C3) = (+1,0,-1) and (C2,C3) = (0,+1,-1),
giving the output code matrix

C1: +1 +1  0
C2: -1  0 +1
C3:  0 -1 -1

Symmetric (unordered pairs)        Asymmetric (ordered pairs)
(C1,C2) = (C2,C1)                  (C1,C2) ≠ (C2,C1)
(C1,C3) = (C3,C1)                  (C1,C3) ≠ (C3,C1)
(C2,C3) = (C3,C2)                  (C2,C3) ≠ (C3,C2)
In the asymmetric case each ordered pair gets its own column, with the signs
reversed for the reversed pair.
One vs Rest
For Ex: C1, C2, C3
• C1 +ve then C2, C3 are -ve.
• C2 +ve then C1, C3 are -ve.
• C3 +ve then C1, C2 are –ve.
• C1-C2-C3, C3-C1

C1 C1
C2 C2
C3 C3
To predict the class for a particular instance

Ex: the binary models output the word (-1,+1,-1).
• First note which scheme produced the code matrix: one vs one or one vs rest.
• With one vs rest, (-1,+1,-1) exactly matches the code word (row) of class C2,
so the instance belongs to class C2.
• If no exact match is found, find the nearest code word.
• Compute the distance between the new word and all class code words.
Calculate the Distance
• Compare the word bit by bit with each class word WC1, WC2, WC3:
a pair of opposite bits (+1 against -1) contributes 1 to the distance,
a pair in which one bit is 0 contributes 1/2,
and a pair of equal bits contributes 0.
• The instance is assigned to the class with the smallest distance
(see the sketch below).
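A minimal sketch of this nearest-code-word decoding. The per-bit distance
(1 - w*c)/2 is one way to encode the scoring above: equal non-zero bits cost
0, opposite bits cost 1, and a 0 entry costs 1/2:

# Nearest-code-word decoding for an output code matrix.
def decode(output_word, code_matrix, classes):
    def dist(w, c):
        return sum((1 - wi * ci) / 2 for wi, ci in zip(w, c))
    distances = {cls: dist(output_word, word)
                 for cls, word in zip(classes, code_matrix)}
    return min(distances, key=distances.get), distances

# One-vs-rest code matrix for classes C1, C2, C3 (rows are code words):
code_matrix = [(+1, -1, -1),   # C1
               (-1, +1, -1),   # C2
               (-1, -1, +1)]   # C3
print(decode((-1, +1, -1), code_matrix, ["C1", "C2", "C3"]))
# exact match with C2's code word, so the instance is predicted as C2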
REGRESSION
• In classification the label space is a discrete set of classes.
• In regression the label space is a real-valued target variable.
• A model mapping instances to real values is called a function estimator or
regressor.
• We switch from a low-resolution target variable to one with, in principle,
infinite resolution.
• Trying to match this target exactly on the training data easily leads to
overfitting.
Example: Line Fitting
• The value of y is estimated by a polynomial in x.
• If the degree of the polynomial is 1 we get a straight line; otherwise we
get a parabola or a higher-order curve.
• Keep the degree of the polynomial as low as possible.
• Overfitting is linked to the number of parameters: a polynomial of degree n
has n+1 parameters.
For example, y = ax + b (degree 1) has 2 parameters, and a degree-4
polynomial has 5 parameters.
• A grouping model that divides the x-axis into n segments and predicts a
constant in each segment requires 2n-1 parameters: n y-values and n-1
x-values where the segments meet. (See the sketch below.)
A concept of Overfitting
• A rule of thumb: the number of parameters estimated from the data must be
less than the number of data points.
• In classification the loss function is expressed in terms of margins.
• In regression the loss function is expressed in terms of residuals,
F(x) - F'(x), the difference between the true and the estimated value.
• Squaring the residuals makes positive and negative errors count equally,
but it also gives extra weight to outliers.
• If the number of parameters is small, we usually cannot decrease the
training loss to zero.
• If the number of parameters is large, we can, but the resulting model
depends strongly on the particular training sample.

• Bias-Variance Dilemma.
• The expected squared loss decomposes into a squared bias term and a
variance term:
E[(F(x) - F'(x))^2] = (F(x) - E[F'(x)])^2 + E[(F'(x) - E[F'(x)])^2]
= Bias^2 + Variance.
• (Dartboard illustration: the four combinations Low Bias/Low Variance,
Low Bias/High Variance, High Bias/Low Variance, High Bias/High Variance.)
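A small simulation, assuming NumPy, that estimates bias and variance
empirically: many training sets are drawn from the same underlying function,
a polynomial is fitted to each, and the spread of the predictions at one test
point is examined. The function, noise level and degrees used are illustrative
assumptions:

# Estimating bias^2 and variance of polynomial regressors by resampling.
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)
x_test = 0.25                                   # evaluate bias/variance here

for degree in (1, 5):
    preds = []
    for _ in range(200):                        # 200 independent training sets
        x = rng.uniform(0, 1, 10)
        y = true_f(x) + rng.normal(scale=0.3, size=x.size)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias2 = (preds.mean() - true_f(x_test)) ** 2
    variance = preds.var()
    print(degree, bias2, variance)   # low degree: more bias; high degree: more variance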


Unsupervised and Descriptive Learning
Descriptive learning: there is no separate training data set; the model is
learned from, and describes, the given data itself.
Predictive and Descriptive Clustering
• In predictive clustering the domain of the mapping is the entire instance
space.
• In descriptive clustering the domain of the mapping is a subset of the
instance space (the given data).
Subgroup Discovery
Association Rule Discovery
CONCEPT LEARNING

• Methods for learning logical expressions or concepts.
• In concept learning we only learn a description
for the positive class, and label everything that
doesn’t satisfy that description as negative.
Concept Learning
Hypothesis Space
• The space of possible concepts – usually called the
hypothesis space – is already fairly large.
• Let’s assume we have three different lengths: 3, 4
and 5 metres, while the other three features have
two values each.
• We then have 3·2·2·2 = 24 possible instances.
• If we treat the absence of a feature as an additional ‘value’, this gives a
total of 4 · 3 · 3 · 3 = 108 different conjunctive concepts.
• The space of extensions – sets of instances – is much larger: 2^24, which
is more than 16 million! (See the sketch below.)
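A small sketch that enumerates these counts for the dolphin example
(Length in {3, 4, 5} plus three two-valued features, as assumed above):

# Counting instances, conjunctive concepts and extensions.
from itertools import product

instances = list(product([3, 4, 5], [True, False], [True, False], [True, False]))
print(len(instances))        # 3*2*2*2 = 24 possible instances

# A conjunctive concept either fixes a feature to one value or leaves it out (None).
concepts = list(product([3, 4, 5, None], [True, False, None],
                        [True, False, None], [True, False, None]))
print(len(concepts))         # 4*3*3*3 = 108 conjunctive concepts
print(2 ** len(instances))   # 2^24 = 16,777,216 possible extensions (sets of instances)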
Least General Generalisation
(LGG)
• If we rule out all concepts that don’t cover at least one of the instances,
the hypothesis space is reduced to 32 conjunctive concepts.
• Insisting that any hypothesis cover all three
instances reduces this further to only four
concepts, the least general one of which is the
one found in the example – it is called their
Least General Generalisation (LGG)
Least General Generalisation(LGG)
• A hypothesis space forms a lattice: a partial
order in which each two elements have a least
upper bound (lub) and a greatest lower bound
(glb).
• The LGG of a set of instances is exactly the
least upper bound of the instances in that
lattice.
• If we instead choose one of the more general hypotheses, such as Gills = no
or Beak = yes, we risk over-generalisation.
• Negative examples are very useful to prevent over-generalisation.
Internal disjunction
• We can make our logical language slightly richer by allowing a restricted
form of disjunction called internal disjunction.
• Suppose we observe one dolphin that is 3 metres long and another one of
4 metres.
• Their lengths generalise to Length = [3,4], which logically means
Length = 3 ∨ Length = 4.
• Algorithm 4.3 details how we can calculate the
LGG of two conjunctions employing internal
disjunction. The function Combine-ID(vx , vy )
returns [vx , vy ] if vx and vy are constants, and
their union if vx or vy are already sets of
values: e.g., Combine-ID([3,4], [4,5])= [3, 4, 5].
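A minimal sketch of such an LGG computation, representing a conjunction as a
dict from features to sets of allowed values; the feature domains, the rule of
dropping a condition once it allows every value, and the example concepts are
assumptions of the sketch:

# LGG of two conjunctions with internal disjunction.
def combine_id(vx, vy):
    # Union of two internal disjunctions (values are kept as sets).
    return set(vx) | set(vy)

def lgg_conj_id(cx, cy, domains):
    lgg = {}
    for feature in cx.keys() & cy.keys():       # only shared conditions survive
        values = combine_id(cx[feature], cy[feature])
        if values != domains[feature]:          # drop conditions allowing every value
            lgg[feature] = values
    return lgg

domains = {"Length": {3, 4, 5}, "Gills": {"yes", "no"}, "Beak": {"yes", "no"}}
p1 = {"Length": {3}, "Gills": {"no"}, "Beak": {"yes"}}
p2 = {"Length": {4}, "Gills": {"no"}, "Beak": {"yes"}}
print(lgg_conj_id(p1, p2, domains))
# {'Length': {3, 4}, 'Gills': {'no'}, 'Beak': {'yes'}}  i.e. Length = [3,4], Gills = no, Beak = yes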
Paths Through The Hypothesis Space
• Every concept between the least general one and
one of the most general ones is also a possible
hypothesis, i.e., covers all the positives and none of
the negatives.
• The set of hypotheses that agree with the data is a convex set.
• Definition 4.1 (Version space).
A concept is complete if it covers all positive
examples.
A concept is consistent if it covers none of the
negative examples.
The version space is the set of all complete and
consistent concepts.
This set is convex and is fully defined by its least
and most general elements
• Moving upwards in the hypothesis space by
generalisation means that the numbers of
covered positives and negatives can stay the
same or increase, but never decrease.

• In other words, an upward path through the hypothesis space corresponds to
a coverage curve and hence to a ranking.
Closed concepts
• It is worthwhile to reflect on the fact that
concepts D and E occupy the same point in
coverage space
• A concept that includes all implicitly understood
conditions is called a closed concept.
• A closed concept is the LGG of all examples that it covers. For instance, D
and E both cover all positives and n5; the LGG of those six examples is
Gills = no ∧ Beak = yes, which is D.
• The closure of E is D, and D is also its own closure, hence the term ‘closed
concept’. This doesn’t mean that D and E are logically equivalent:
• on the contrary, since XD ⊂ XE – the extension of D is a proper subset of
the extension of E – there exist instances in X that are covered by E but
not by D.
Beyond Conjunctive Concepts
• A conjunctive normal form expression (CNF) is a
conjunction of disjunctions of literals, or
equivalently, a conjunction of clauses.
• A conjunctive concept is the special case of CNF in which each disjunction
consists of a single literal.
• We now consider an algorithm for learning Horn theories, where each clause
A → B is a Horn clause, i.e., A is a conjunction of literals and B is a
single literal.
• For ease of notation we use Boolean features, writing F for F = true and
¬F for F = false. In the example below we adapt the dolphin example to the
Boolean variables ManyTeeth (standing for Teeth = many), Gills, Short
(standing for Length = 3) and Beak.

• If a Horn theory doesn’t cover a positive, we need to drop all clauses that
violate the positive, where a clause A → B violates a positive if all
literals in the conjunction A are true in the example and B is false.
(See the sketch below.)
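A minimal sketch of this step; the representation (examples as dicts of
Boolean features, clauses as a body set plus a head) and the example theory
are assumptions of the sketch:

# Dropping the clauses of a Horn theory that a positive example violates.
def violates(clause, example):
    body, head = clause
    return all(example[f] for f in body) and not example[head]

def cover_positive(theory, positive):
    return [clause for clause in theory if not violates(clause, positive)]

theory = [({"ManyTeeth", "Gills", "Short"}, "Beak"),
          ({"ManyTeeth"}, "Beak")]
positive = {"ManyTeeth": True, "Gills": False, "Short": True, "Beak": False}
print(cover_positive(theory, positive))
# the first clause's body is not fully true in the positive, so it is kept;
# the second clause is violated (ManyTeeth true, Beak false) and is dropped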
• If the theory covers a negative, we need to find one or more clauses to add
to the theory in order to exclude the negative.
For example, suppose that our current hypothesis covers the negative
ManyTeeth ∧ Gills ∧ Short ∧ ¬Beak
To exclude it, we can add the following Horn clause to our theory:
ManyTeeth ∧ Gills ∧ Short → Beak
There are other clauses that can exclude this negative (e.g., ManyTeeth → Beak),
but this one is built directly from the negative: its body contains the
literals that are true in the negative and its head is a literal that is false.
• If our covered negative is
ManyTeeth ∧ Gills ∧ ¬Short ∧ ¬Beak
then we have a choice between the following two Horn clauses:
ManyTeeth ∧ Gills → Short
ManyTeeth ∧ Gills → Beak
(The sketch below generates these candidate clauses.)
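A minimal sketch of generating those candidate clauses from a covered
negative; the dict representation of examples is an assumption of the sketch:

# Candidate Horn clauses that exclude a covered negative example:
# body = the features that are true in the negative; one candidate clause
# for each feature that is false in the negative (that feature is the head).
def candidate_clauses(negative):
    body = {f for f, v in negative.items() if v}
    heads = [f for f, v in negative.items() if not v]
    return [(body, head) for head in heads]

negative = {"ManyTeeth": True, "Gills": True, "Short": False, "Beak": False}
for body, head in candidate_clauses(negative):
    print(" AND ".join(sorted(body)), "->", head)
# Gills AND ManyTeeth -> Short
# Gills AND ManyTeeth -> Beak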
• The Horn algorithm combines a number of
interesting new ideas.
• First, it is an active learning algorithm: rather than
learning from a provided data set, it constructs its
own training examples and asks the membership
oracle to label them.
• Secondly, the core of the algorithm is the list of
cleverly chosen negative examples, from which
the hypothesis is periodically rebuilt.
• The intersection step is crucial here: if the
algorithm just remembered negatives, the
hypothesis would consist of many specific clauses.
First-Order Logic
• The languages we have been using so far are
propositional:
• Each literal is a proposition, such as Gills = yes (standing for ‘the
dolphin has gills’), from which larger expressions are built using logical
connectives.
• First order predicate logic, or first-order logic for
short, generalises this by building more complex
literals from predicates and terms

• For example, a first-order literal could be
BodyPart(Dolphin42,PairOf(Gill)).
• Here, Dolphin42 and PairOf(Gill) are terms referring to
objects:
Dolphin42 is a constant, and PairOf(Gill) is a compound
term consisting of the function symbol PairOf and the
term Gill.
• BodyPart is a binary predicate forming a proposition
(something that can be true or false) out of two terms.
• This richer language brings with it a number of
advantages:
• we can use terms such as Dolphin42 to refer to
individual objects we’re interested in;
• The structure of objects can be explicitly described;
• we can introduce variables to refer to unspecified objects and quantify
over them.
• The first-order literal BodyPart(x,PairOf(Gill)) can
be used to refer to the set of all objects having a
pair of gills.
• The following expression applies universal
quantification to state that everything with a pair
of gills is a fish:
∀x : BodyPart(x,PairOf(Gill)) → Fish(x)
• Recall that LGG-Conj-ID(Length = [3,4], Length = [4,5]) returns
Length = [3,4,5].
• In order to generalise first-order literals we use
variables.
• Consider, for example, the two first-order literals
BodyPart(Dolphin42,PairOf(Gill)) and BodyPart(Human123,PairOf(Leg)):
their LGG is BodyPart(x,PairOf(y)), signifying the set of objects that have
a pair of some unspecified body part.
• There is a well-defined algorithm for computing
LGGs of first-order literals called anti-unification,
as it is the mathematical dual of the deductive
operation of unification
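A minimal sketch of anti-unification over simple terms, where terms are
written as constants (strings) or tuples (functor, arg1, ...); this encoding
is an assumption of the sketch:

# Anti-unification (LGG) of two first-order terms: mismatching subterms are
# replaced by a shared variable, reused for repeated mismatches.
def anti_unify(t1, t2, variables=None):
    if variables is None:
        variables = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # same functor and arity: anti-unify the arguments
        return (t1[0],) + tuple(anti_unify(a, b, variables)
                                for a, b in zip(t1[1:], t2[1:]))
    # different terms: introduce (or reuse) a variable for this pair of subterms
    return variables.setdefault((t1, t2), "X" + str(len(variables)))

d = ("BodyPart", "Dolphin42", ("PairOf", "Gill"))
h = ("BodyPart", "Human123", ("PairOf", "Leg"))
print(anti_unify(d, h))
# ('BodyPart', 'X0', ('PairOf', 'X1'))  i.e. BodyPart(x, PairOf(y))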
