Unit2_4
The concept can be viewed as describing some subset of objects or events defined over a larger
set (e.g., the subset of animals that constitute birds). Alternatively, each concept can be thought of
as a boolean-valued function defined over this larger set (e.g., a function defined over all animals,
whose value is true for birds and false for other animals).
Concept learning - Inferring a boolean-valued function from training examples of its input and output.
What hypothesis representation shall we provide to the learner in this case? Let us consider a
simple representation in which each hypothesis consists of a conjunction of constraints on the
instance attributes. In particular, let each hypothesis be a vector of six constraints, specifying the
values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. For each attribute,
the hypothesis will either:
indicate by "?" that any value is acceptable for this attribute,
specify a single required value (e.g., Warm) for the attribute, or
indicate by "Ø" that no value is acceptable.
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive
example (h(x) = 1). To illustrate, the hypothesis that Aldo enjoys his favorite sport only on cold
days with high humidity (independent of the values of the other attributes) is represented by
the expression:
(?, Cold, High, ?, ?, ?)
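As a minimal sketch of this representation, a hypothesis can be held as a tuple of six constraint strings and checked against an instance. The function name `satisfies` and the string encodings ("?" for any value, "Ø" for no value) are assumptions of this example, not part of the original notation:

```python
# Each hypothesis is a tuple of six constraints: "?" accepts any value,
# a specific string (e.g., "Cold") requires that value, and "Ø" accepts none.

def satisfies(instance, hypothesis):
    """Return True iff every constraint in the hypothesis accepts the
    corresponding attribute value of the instance (so h(x) = 1)."""
    # "Ø" never equals an attribute value, so it rejects every instance.
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))

h = ("?", "Cold", "High", "?", "?", "?")
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
print(satisfies(x, h))  # True: h classifies x as a positive example
```

Note that any hypothesis containing a "Ø" constraint classifies every instance as negative, a fact used in the hypothesis-space counting below.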
To summarize, the EnjoySport concept learning task requires learning the set of days for which EnjoySport = yes, describing this set by a conjunction of constraints over the instance attributes.
Inductive Learning Hypothesis - Concept learning relies on the inductive learning hypothesis.
Definition: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
The following points are to be considered:
•What are the implications?
•Is this reasonable?
•What (if any) are alternatives?
•What about concept drift (what if our views/tastes change over time)?
Concept Learning as search - Concept learning can be viewed as the task of searching through a large space of hypotheses with the goal of finding the hypothesis that best fits the training examples. For the given example, the instance space X contains exactly 3 × 2 × 2 × 2 × 2 × 2 = 96 distinct instances. A similar calculation shows that there are 5 × 4 × 4 × 4 × 4 × 4 = 5120 syntactically distinct hypotheses within H (each attribute admits its possible values plus "?" and "Ø"). Notice, however, that every hypothesis containing one or more "Ø" symbols represents the empty set of instances; that is, it classifies every instance as negative. Therefore, the number of semantically distinct hypotheses is only 1 + (4 × 3 × 3 × 3 × 3 × 3) = 973.
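The counting argument above can be reproduced in a few lines of arithmetic (a sketch; the attribute value counts are those of the EnjoySport task, with Sky taking 3 values and the other five attributes 2 each):

```python
# Instance space: one choice of value per attribute.
instances = 3 * 2 * 2 * 2 * 2 * 2
print(instances)  # 96

# Syntactically distinct hypotheses: each attribute allows its values
# plus the "?" and "Ø" constraints, i.e. (3+2) and (2+2) choices.
syntactic = 5 * 4 * 4 * 4 * 4 * 4
print(syntactic)  # 5120

# Semantically distinct hypotheses: every hypothesis containing "Ø"
# denotes the same empty concept, so they collapse to a single
# hypothesis; the rest use only "?" or an attribute value.
semantic = 1 + 4 * 3 * 3 * 3 * 3 * 3
print(semantic)  # 973
```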
Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk (written hj ≥g hk) if and only if every instance that satisfies hk also satisfies hj. Then hj is (strictly) more-general-than hk (written hj >g hk) if and only if (hj ≥g hk) and not (hk ≥g hj). Finally, we will sometimes find the inverse useful and will say that hj is more-specific-than hk when hk is more-general-than hj.
The hypothesis h2 is more general than h1 because every instance that satisfies h1 also satisfies h2. Similarly, h2 is more general than h3. Note that neither h1 nor h3 is more general than the other. The >g and ≥g relations are defined independently of the target concept, and ≥g defines a partial order over the hypothesis space H (the relation is reflexive, antisymmetric, and transitive).
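A small sketch of the ≥g relation for this conjunctive hypothesis language: it can be decided constraint by constraint, since "?" is more general than any value, and any value is more general than "Ø". The function names and h1/h2 values here are illustrative assumptions:

```python
def constraint_covers(c_general, c_specific):
    """True iff one constraint is at least as general as another:
    "?" covers everything, everything covers "Ø", and a specific
    value covers only itself."""
    return c_general == "?" or c_specific == "Ø" or c_general == c_specific

def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance satisfying hk also satisfies hj.
    For conjunctions of attribute constraints this holds iff each
    constraint of hj covers the corresponding constraint of hk."""
    return all(constraint_covers(cj, ck) for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False: the relation is not symmetric
```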
FIND-S Algorithm -
FIND-S is a concept learning algorithm that takes advantage of this partial order to efficiently organize the search for hypotheses that fit the training data.
How can we use the more-general-than partial ordering to organize the search for a hypothesis
consistent with the observed training examples?
One way is to begin with the most specific possible hypothesis in H, then generalize this hypothesis
each time it fails to cover an observed positive training example.
Ex. For the EnjoySport task,
the first step of FIND-S is to initialize h to the most specific hypothesis in H:
h ← (Ø, Ø, Ø, Ø, Ø, Ø)
Upon observing the first positive training example, none of the "Ø" constraints in h are satisfied, so each is replaced by the next more general constraint that fits the example, namely the attribute values of this training example:
h ← (Sunny, Warm, Normal, Strong, Warm, Same)
The second (positive) training example forces FIND-S to generalize h further, replacing by "?" each constraint that the example does not fit:
h ← (Sunny, Warm, ?, Strong, Warm, Same)
Upon encountering the third training example, in this case a negative example, the algorithm makes no change to h: h already correctly classifies this example as negative, so no revision is needed.
To complete our trace of FIND-S, the fourth (positive) example leads to a further generalization of h:
h ← (Sunny, Warm, ?, Strong, ?, ?)
The search moves from hypothesis to hypothesis, searching from the most specific to progressively
more general hypotheses along one chain of the partial ordering. At each stage, the hypothesis is
the most specific hypothesis consistent with the training examples observed up to this point; hence the name FIND-S.
FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the
positive training examples. Its final hypothesis will also be consistent with the negative examples
provided the correct target concept is contained in H, and provided the training examples are
correct.
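The trace above can be sketched as a short program. The four EnjoySport training examples are the standard ones from this presentation; the helper name `find_s` and the string encoding of constraints are assumptions of this sketch:

```python
# (Sky, AirTemp, Humidity, Wind, Water, Forecast) -> EnjoySport?
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

def find_s(examples):
    # Initialize h to the most specific hypothesis: all "Ø" constraints.
    h = ["Ø"] * 6
    for x, positive in examples:
        if not positive:
            continue  # FIND-S ignores negative examples entirely
        for i, value in enumerate(x):
            if h[i] == "Ø":
                h[i] = value  # first positive example fills in its values
            elif h[i] != value:
                h[i] = "?"    # generalize any constraint the example violates
    return tuple(h)

print(find_s(examples))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

Because negative examples are skipped, the output depends only on the positive examples, which is exactly why FIND-S cannot detect inconsistent training data (drawback 3 below).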
Drawbacks of FIND-S -
1. Has the learner converged to the correct target concept? Although FIND-S will find a
hypothesis consistent with the training data, it has no way to determine whether it has found
the only hypothesis in H consistent with the data (i.e., the correct target concept), or
whether there are many other consistent hypotheses as well. We would prefer a learning
algorithm that could determine whether it had converged and, if not, at least characterize
its uncertainty regarding the true identity of the target concept.
2. Why prefer the most specific hypothesis? It is unclear whether we should prefer this
hypothesis over, say, the most general, or some other hypothesis of intermediate generality.
3. Are the training examples consistent? We would prefer an algorithm that could at least detect when the training data is inconsistent and, preferably, accommodate such errors.
4. What if there are several maximally specific consistent hypotheses? In the hypothesis
language H for the EnjoySport task, there is always a unique, most specific hypothesis
consistent with any set of positive examples. FIND-S must be extended to allow it to
backtrack on its choices of how to generalize the hypothesis, to accommodate the
possibility that the target concept lies along a different branch of the partial ordering than
the branch it has selected.
Furthermore, we can define hypothesis spaces for which there is no maximally specific
consistent hypothesis, although this is more of a theoretical issue than a practical one.