5.3 Inductive Logic Programming

This document provides an overview of inductive logic programming (ILP). ILP combines machine learning and logic programming to learn logical hypotheses from examples and background knowledge expressed as logical rules. The goal of ILP is to find hypotheses that are consistent with the positive examples and inconsistent with the negative examples. ILP algorithms search the hypothesis space using generalization and specialization rules, pruning unpromising candidates, until a stopping criterion is met. Examples of generalization rules are inverse resolution and least general generalization, while θ-subsumption is a common specialization rule.

NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 5: Machine Learning enabled by prior Theories

Video 5.3 Inductive Logic Programming


Inductive Logic Programming (ILP)

= Machine Learning ∩ Logic Programming

= Learning with Logic


Logical reasoning: deduction

From rules to facts…

B ∪ T ⊢ E

Background B:                       Derived facts E:
mother(penelope,victoria).          parent(penelope,victoria).
mother(penelope,arthur).            parent(penelope,arthur).
father(christopher,victoria).      parent(christopher,victoria).
father(christopher,arthur).         parent(christopher,arthur).

Theory T:
parent(X,Y) :- father(X,Y).
parent(X,Y) :- mother(X,Y).
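As a sanity check, the deduction direction runs directly in any Prolog system (e.g. SWI-Prolog): loading B and T below makes every fact in E derivable by query.

% deduction.pl : background facts B plus theory T; E follows by deduction.
mother(penelope, victoria).
mother(penelope, arthur).
father(christopher, victoria).
father(christopher, arthur).

parent(X, Y) :- father(X, Y).
parent(X, Y) :- mother(X, Y).

% ?- parent(penelope, victoria).   % true
% ?- parent(P, arthur).            % P = christopher ; P = penelope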
Logical reasoning: induction

From facts to rules…

B ∪ E ⊢ T

Background B:                       Examples E:
mother(penelope,victoria).          parent(penelope,victoria).
mother(penelope,arthur).            parent(penelope,arthur).
father(christopher,victoria).      parent(christopher,victoria).
father(christopher,arthur).         parent(christopher,arthur).

Induced theory T:
parent(X,Y) :- father(X,Y).
parent(X,Y) :- mother(X,Y).
Inductive Logic Programming
The goal of ILP is to find hypotheses, expressed as logic programming clauses, from a set of
positive and negative examples in the presence of a domain theory, where examples and
theory are expressed in the same formalism.

Given:
• a set of positive examples E+ and negative examples E- expressed in an 'observation language' LE,
• a domain (background) theory B, expressed in the same formalism,
• a hypothesis language LH that delimits the clauses allowed in the hypothesis space H,
• a relation covers(e, H, B) which determines the classification of an example e with respect to H,
taking B into account (a minimal executable sketch follows below).

Find a hypothesis h ∈ H that covers
• all positive examples (completeness)
• no negative examples (consistency).
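One way to make covers(e, H, B) concrete is as an entailment test in SWI-Prolog: assert the clauses of B and H temporarily and check whether the example succeeds as a query. This is a minimal sketch, not part of any standard ILP system; the scratch module name is an illustrative assumption.

% covers(+Example, +HypothesisClauses, +BackgroundClauses):
% succeeds iff B ∪ H ⊢ Example (entailment approximated by Prolog proof).
covers(Example, H, B) :-
    append(B, H, Clauses),
    forall(member(C, Clauses), assertz(scratch:C)),
    (   catch(scratch:Example, _, fail)
    ->  Result = true
    ;   Result = false
    ),
    forall(member(C, Clauses), retract(scratch:C)),
    Result == true.

% ?- covers(parent(penelope,victoria),
%           [(parent(X,Y) :- mother(X,Y))],
%           [mother(penelope,victoria)]).
% true.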
Advantages of ILP

• Possibility to express examples, hypotheses and domain (background) theory in the same formalism.

• Handling of deduction and induction within the same framework.

• Convenient handling of structured data and multiple relations, as needed e.g. for descriptions of
chemical molecules. The train example below, spread over the two relations Train and Car,
illustrates this.

• Some ILP systems can also invent new predicates and add them to the domain theory.

Train  Car
t1     c11
t1     c12
t1     c13
t1     c14
t2     c21
…      …

Car  Length  Shape      Axes  Roof    …
c11  short   rectangle  2     none    …
c12  long    rectangle  3     none    …
c13  short   rectangle  2     peaked  …
c14  long    rectangle  2     none    …
c21  short   rectangle  2     flat    …
…    …       …          …     …       …
Specialization and Generalization Rules
A hypothesis G is more general than a hypothesis S if and only if G ⊧ S. S is also said to be more specific than G.

In search algorithms, the notions of generalization and specialization are incorporated using inductive and deductive
inference rules:

• A deductive inference rule r that maps a conjunction of clauses G onto a conjunction of clauses S such that G ⊧ S is
called a specialization rule.
- the typical ILP rule is θ-subsumption
- the search space is a lattice under θ-subsumption
- there exist a least upper bound (lub) and a greatest lower bound (glb) for every pair of clauses

• An inductive inference rule r that maps a conjunction of clauses S onto a conjunction of clauses G such that G ⊧ S is
called a generalization rule.
- the typical ILP rules are inverse resolution and inverse entailment
- bottom-up approaches establish the least general generalization (lgg) of the positive examples, as sketched below.

• Induction is viewed as the inverse of deduction.
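To make the lgg concrete, here is a minimal anti-unification sketch in Prolog for ground atoms, in the spirit of Plotkin's construction; the names lgg/3, lgg/5 and the map/3 bookkeeping term are illustrative, not a standard library API.

% lgg(+Atom1, +Atom2, -Generalization): anti-unify two ground atoms.
lgg(T1, T2, T) :- lgg(T1, T2, T, [], _).

lgg(T1, T2, T, S0, S) :-
    (   T1 == T2                          % identical subterms stay as-is
    ->  T = T1, S = S0
    ;   compound(T1), compound(T2),
        T1 =.. [F|As1], T2 =.. [F|As2],
        same_length(As1, As2)
    ->  lgg_args(As1, As2, As, S0, S),    % same functor: recurse on args
        T =.. [F|As]
    ;   memberchk(map(T1,T2,V), S0)       % reuse variable for a seen pair
    ->  T = V, S = S0
    ;   S = [map(T1,T2,T)|S0]             % new pair: fresh variable T
    ).

lgg_args([], [], [], S, S).
lgg_args([A|As], [B|Bs], [C|Cs], S0, S) :-
    lgg(A, B, C, S0, S1),
    lgg_args(As, Bs, Cs, S1, S).

% ?- lgg(parent(penelope,victoria), parent(christopher,victoria), G).
% G = parent(_X, victoria)    (the lgg of two positive examples)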


A Generic ILP Algorithm
Given the key ideas of ILP as search, a generic ILP algorithm can be defined as follows:

The algorithm:
• keeps track of a queue of candidate hypotheses QH
• repeatedly
1. deletes a hypothesis H from the queue
2. expands H using some generalization or specialization rules
3. adds the resulting hypotheses to the queue QH
4. prunes QH to discard unpromising hypotheses from further consideration
• continues until the stop-criterion is satisfied.
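The loop can be written as a short Prolog skeleton. This is a sketch only: refine/2 (implementing the rules R), good_enough/1 (the stop-criterion) and prune/2 are hypothetical hooks that each concrete system supplies; the FIFO queue shown corresponds to breadth-first search.

:- dynamic refine/2, good_enough/1.     % hooks to be supplied per system

% ilp_search(+QueueOfHypotheses, -Hypothesis)
ilp_search([H|_], H) :-
    good_enough(H), !.                  % stop-criterion satisfied
ilp_search([H|Rest], Result) :-
    findall(H1, refine(H, H1), New),    % expand H with rules from R
    append(Rest, New, Queue0),          % FIFO delete = breadth-first
    prune(Queue0, Queue),               % discard unpromising candidates
    ilp_search(Queue, Result).

prune(Q, Q).                            % default: keep everything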
ILP Algorithm – Elements
• Initialize the queue of hypotheses QH (e.g. QH = {true} in top-down search, or QH = Examples in bottom-up search)

• R denotes the set of inference rules applied, generalizations or specializations

• Delete influences the search strategy. One can realize a depth-first (Delete = LIFO), a breadth-first
(Delete = FIFO) or a best-first algorithm.

• Choose determines the inference rules to be applied on the hypothesis H

• Prune determines which candidate hypotheses are to be deleted from the queue. Delete and prune
together implement the search strategy. (e.g. for hill-climbing: only keep the best hypothesis)

• The stop-criterion states the conditions under which the algorithm stops, i.e. when the current set of
hypotheses is already good enough (e.g. when it contains a correct hypothesis).
Pruning the search space

Generalization and specialization form the basis for pruning the search space; this is
because:

• When B ∪ H ⊭ e, where e ∈ E+, B is the background theory and H is the hypothesis, then no
specialization H’ of H will imply the evidence either, so all of them can be pruned.

• When B ∪ H ∪ {e} ⊧ □, where e ∈ E-, B is the background theory and H is the hypothesis, then
all generalizations H’ of H will also be inconsistent with B ∪ E and can be pruned.
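With the covers/3 sketch from above, the two pruning tests can be phrased directly; prune_below/3 and prune_above/3 are illustrative names, not part of any standard system.

% Specializations of H can be pruned: H already misses a positive example.
prune_below(H, B, Pos) :-
    member(E, Pos), \+ covers(E, H, B), !.

% Generalizations of H can be pruned: H already covers a negative example.
prune_above(H, B, Neg) :-
    member(E, Neg), covers(E, H, B), !.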
Some further ILP terminology

An ILP algorithm has the goal to form correct hypotheses, in the sense that they satisfy the
following requirements:
• "Sufficiency" - the hypotheses explain all positive examples.
• "Necessity" - the positive examples are NOT explainable without the hypotheses.
• "Weak consistency" - the hypotheses do not contradict any element of the domain theory B.
• "Strong consistency" - the hypotheses, together with the domain theory, do not cover any negative example.
Generalization through Inverse Resolution
Resolution

father(X,Y) :- male(X), parent(X,Y)        parent(adam,kain)

father(adam,kain) :- male(adam)            male(adam)

father(adam,kain)

Inverse resolution

Target predicate: father(X,Y)
Domain knowledge: male(adam). male(kain). male(abdullah). male(muhammad). male(moses).
parent(adam,kain). parent(eve,kain). parent(abdullah,muhammad).
Example: father(adam,kain)

Read top-down, the tree above is ordinary resolution: the clause resolves with
parent(adam,kain) and then with male(adam) to derive the example. Inverse resolution runs
the same tree bottom-up: from the example father(adam,kain) and the facts male(adam) and
parent(adam,kain), it inverts the two resolution steps and constructs the generalized clause

father(X,Y) :- male(X), parent(X,Y)


Specialization through θ-subsumption
Clause G subsumes clause F if and only if G ⊧ F or, equivalently, G ⊆ F (viewing a clause as the set of its literals).

Example - propositional logic

H :- p,q  ⊧  H :- p,q,r, because {H, ¬p, ¬q} ⊆ {H, ¬p, ¬q, ¬r}

H:-p        H:-q        H:-r

H:-p,q      H:-p,r      H:-q,r

H:-p,q,r
θ-subsumption in predicate logic
G subsumes S if and only if there is a substitution θ such that Gθ ⊆ S (for atoms: Gθ = S),
e.g. p(X,Y,X) subsumes p(a,Y,a), and p(f(X),Y) subsumes p(f(a),Y).

P(X,Y,Z)

P(a,Y,Z) P(X,b,Z) P(X,Y,c)

P(a,b,Z) P(a,Y,c) P(X,b,c)

P(a,b,c)
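SWI-Prolog ships this one-sided matching test as the built-in subsumes_term/2, so the lattice above can be explored interactively:

% subsumes_term(+General, +Specific): true iff General θ-subsumes Specific
% without binding variables in Specific.
% ?- subsumes_term(p(X,Y,X), p(a,Y,a)).      % true
% ?- subsumes_term(p(f(X),Y), p(f(a),Y)).    % true
% ?- subsumes_term(p(a,b,Z), p(X,Y,c)).      % false: incomparable nodes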
Example
Positive examples:

bird(penguin), bird(eagle), bird(sparrow)

Background knowledge:

lays-eggs(penguin), lays-eggs(sparrow), lays-eggs(eagle),
flies(eagle), flies(sparrow),
has-talons(eagle).

Hypotheses to be learned have the form: bird(X) :- …
Bottom-up search, typically based on inverse resolution

bird(X) :- true.

bird(X) :- lays-eggs(X).     bird(X) :- flies(X).     bird(X) :- has-talons(X).

bird(X) :- lays-eggs(X), flies(X).     bird(X) :- lays-eggs(X), has-talons(X).     bird(X) :- flies(X), has-talons(X).

…

bird(X) :- lays-eggs(X), flies(X), has-talons(X).

bird(penguin).     bird(sparrow).     bird(eagle).

(inverse resolution moves upward through this lattice, from the examples towards more general clauses)
Top-down search, typically based on θ-subsumption

bird(X) :- true.

bird(X) :- lays-eggs(X).     bird(X) :- flies(X).     bird(X) :- has-talons(X).

bird(X) :- lays-eggs(X), flies(X).     bird(X) :- lays-eggs(X), has-talons(X).     bird(X) :- flies(X), has-talons(X).

…

bird(X) :- lays-eggs(X), flies(X), has-talons(X).

bird(penguin).     bird(sparrow).     bird(eagle).

{} (false)

(θ-subsumption moves downward through this lattice, from the most general clause towards more specific ones)
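A minimal generate-and-test version of the top-down search runs in SWI-Prolog. This is a sketch under stated assumptions: hyphens are replaced by underscores (hyphens are not legal in Prolog atoms), a hypothesis is simply the list of body predicates, and a hypothetical negative example bat (which flies but is no bird) is added so that the over-general clause bird(X) :- true is rejected.

% Background knowledge (underscored names).
lays_eggs(penguin).  lays_eggs(sparrow).  lays_eggs(eagle).
flies(eagle).        flies(sparrow).      flies(bat).
has_talons(eagle).

positive(penguin). positive(sparrow). positive(eagle).
negative(bat).                        % hypothetical negative example

% Body is a list of unary predicate names; E is covered if all hold for E.
body_covers(Body, E) :-
    forall(member(P, Body), (Goal =.. [P, E], call(Goal))).

% Enumerate bodies by increasing length (general to specific) and return
% the first complete and consistent one.
learn(Body) :-
    between(0, 3, N),
    length(Body, N),
    subseq([lays_eggs, flies, has_talons], Body),
    forall(positive(E), body_covers(Body, E)),      % completeness
    forall(negative(E), \+ body_covers(Body, E)).   % consistency

subseq([], []).
subseq([X|Xs], [X|Ys]) :- subseq(Xs, Ys).
subseq([_|Xs], Ys) :- subseq(Xs, Ys).

% ?- learn(Body).
% Body = [lays_eggs]          i.e. bird(X) :- lays_eggs(X).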
A sample of ILP systems

FOIL (Quinlan and Cameron-Jones, 1993): top-down hill-climbing search

PROGOL (Muggleton, 1995): top-down best-first search

Golem (Muggleton and Feng, 1992): bottom-up hill-climbing search

Marvin (Sammut and Banerji, 1986): bottom-up search


FOIL (Quinlan et al., 1993)
• Greedy top-down search where the background knowledge (B) is limited
to ground facts

• Each candidate hypothesis is scored using information gain:
• Let c1 be a specialization of c2
• Then WIG(c1) (weighted information gain) is

    WIG(c1) = p⊕ × ( log2( p1/(p1+n1) ) − log2( p2/(p2+n2) ) )

• where p⊕ is the number of possible bindings that make the clause cover
positive examples, pi is the number of positive examples covered by clause ci,
and ni is the number of negative examples covered by clause ci.
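As a sanity check, the score can be computed directly; wig/6 is an illustrative helper, not part of FOIL itself.

% WIG(c1) from p⊕ (positive bindings) and the example counts of c1 and c2.
wig(PBind, P1, N1, P2, N2, Gain) :-
    Gain is PBind * ( log(P1 / (P1 + N1)) / log(2)
                    - log(P2 / (P2 + N2)) / log(2) ).

% ?- wig(6, 6, 2, 6, 6, G).
% G = 3.509...   (the specialization kept all positives and shed negatives)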



PROGOL (Muggleton, 1995)

• Top-down best-first search

• Uses P − N − l as scoring function:
• P is the number of positive examples covered
• N is the number of negative examples covered
• l is the number of literals in the clause

• Preprocessing step: builds a bottom clause from a positive example and the
background knowledge to restrict the search space.
• Uses mode declarations to restrict the language
• Background knowledge is not limited to ground facts

• While doing branch-and-bound top-down search:
• Only literals appearing in the bottom clause are used to refine clauses.
• The learned clause is a generalization of this bottom clause.
• A depth bound can be set on variable chaining and theorem proving.



Two important application areas for ILP

• Natural Language Processing
• Part-of-speech tagging
• Semantic parsing

• Bioinformatics
• Predicting toxicology (carcinogens)
• Discovering rules governing the 3D topology of protein structures
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Thanks for your attention!

The next lecture 5.4 will be on the topic:

Reinforcement Learning
