0% found this document useful (0 votes)

78 views43 pages

Inductive Logic Programming: Anoop & Hector

Uploaded by

Annada Priyadarshinee

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

78 views43 pages

Inductive Logic Programming: Anoop & Hector

Uploaded by

Annada Priyadarshinee

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 43

Inductive Logic Programming

(for Dummies)

Anoop & Hector

Knowledge Discovery in DB (KDD)
“automatic extraction of novel, useful, and valid knowledge
from large sets of data”
Different Kinds of:
• Knowledge
– Rules
– Decision trees
– Cluster hierarchies
– Association rules
– Statistically unusual subgroups
• Data

EDAM Reading Group © 2004 2

Relational Analysis
Would it not be nice to have analysis methods and
data mining systems capable of directly working
with multiple relations as they are available in
relational database systems?

EDAM Reading Group © 2004 3

Single Table vs Relational DM
The same problems still remain.
But solutions?

More problems:
• Extending the key notations
• Efficiency concerns

EDAM Reading Group © 2004 4

Relational Data Mining
Data Mining : ML :: Relational Data Mining : ILP

Initially
Binary Classification

Now
Classification, Regression, Clustering,
Association Analysis
EDAM Reading Group © 2004 5
ILP?
• India Literacy Project?
• International Language Program?
• Individualized Learning Program?
• Instruction Level Parallelism?
• International Lithosphere Program?

EDAM Reading Group © 2004 6

ILP
Inductive Logic Programming:
• Is a sub-area of Machine Learning, that in turn
is part of Artificial Intelligence
• Uses contributions from Logic Programming
and Statistics
• Tries to automate the induction processes

EDAM Reading Group © 2004 7

Deductive Vs Inductive Reasoning
T U B  E (deduce)
mother(mary,vinni). parent(mary,vinni).
parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y). mother(mary,andre). parent(mary,andre).
father(carrey,vinni).
parent(carrey,vinni).
parent(carrey,andre).
father(carry,andre).

E U B  T (induce)

parent(mary,vinni). mother(mary,vinni).
parent(X,Y) :- mother(X,Y).
parent(mary,andre). mother(mary,andre). parent(X,Y) :- father(X,Y).
parent(carrey,vinni). father(carrey,vinni).
parent(carrey,andre).
father(carry,andre).

EDAM Reading Group © 2004 8

ILP: Objective
Given a dataset:
• Positive examples (E+) and optionally negative examples (E-)
• Additional knowledge about the problem/application domain (Background
Knowledge B)
• Set of constraints to make the learning process more efficient (C)

Goal of an ILP system is to find a set of hypothesis that:

• Explains (covers) the positive examples - Completeness
• Are consistent with the negative examples - Consistency

h ∈ H : ∀p ∈ P : covers(h, p) Λ ∀n ∈ N : ¬ covers(h, n).

EDAM Reading Group © 2004 9

DB vs. Logic Programming
DB Terminology LP Terminology
• Relation name p • Predicate symbol p
• Attribute of relation p • Argument of predicate p
• Tuple <a1,…,an> • Ground fact p(a1,…,an)
• Relation p • Predicate p
a set of tuples defined extensionally by a
set of ground facts
• Relation q • Predicate q
defined as a view defined intentionally by a
set of rules (clauses)

EDAM Reading Group © 2004 10

Relational Pattern
IF Customer(C1,Age1,Income1,TotSpent1,BigSpender1)
AND MarriedTo(C1,C2)
AND Customer(C2,Age2,Income2,TotSpent2,BigSpender2)
AND Income2≥10000
THEN BigSpender1 = Yes

big_spender(C1,Age1,Income1,TotSpent1) ←
married_to(C1,C2) ∧
customer(C2,Age2,Income2,TotSpent2,BigSpender2) ∧
Income2 ≥10000

EDAM Reading Group © 2004 11

A Generic ILP Algorithm
procedure ILP (Examples)
INITIALIZE (Theories, Examples)
repeat
T = SELECT (Theories, Examples)
{Ti}ni=1 = REFINE (T, Examples)
Theories = REDUCE (Theories U Ui =1Ti, Examples)
n

until STOPPINGCRITERION (Theories, Examples)

return (Theories)

EDAM Reading Group © 2004 12

Procedures for a Generic ILP Algo.
• INITIALIZE: initialize a set of theories
(e.g. Theories = {true} or Theories = Examples)
• SELECT: select the most promising candidate theory
• REFINE: apply refine operators that guarantee new theories
(specialization, generalization,…).
• REDUCE: discard unpromising theories
• STOPPINGCRITERION: determine whether the current set of theories
is already good enough
(e.g. when it contains a complete and consistent theory)
SELECT and REDUCE together implement the search strategy.
(e.g. hill-climbing: REDUCE = only keep the best theory.)

EDAM Reading Group © 2004 13

Search Algorithms
Search Methods
– Systematic Search
• Depth-first search
• Breadth-first search
• Best-first search
– Heuristic Search
• Hill-climbing
• Beam-search
Search Direction
– Top-down search: Generic to specific
– Bottom-up search: Specific to general
– Bi-directional search

EDAM Reading Group © 2004 14

Example

EDAM Reading Group © 2004 15

Search Space

EDAM Reading Group © 2004 16

Search Space

EDAM Reading Group © 2004 17

Search Space as Lattice
• Search space is a lattice under -subsumption
• There exists a lub and glb for every pair of
clauses
• lub is ‘least general generalization’
• Bottom-up approaches find the lgg of the
positive examples

EDAM Reading Group © 2004 18

Basics of ILP cont’d
• Bottom-up approach of finding clauses leads to long
clauses through lgg.
• Thus, prefer top-down approach since shorter and more
general clauses are learned
• Two ways of doing top-down search
– FOIL: greedy search using information gain to score
– PROGOL: branch-and-bound, using P-N-l to score, uses saturation
to restrict search space
• Usually, refinement operator is to
– Apply substitution
– Add literal to body of a clause

EDAM Reading Group © 2004 19

FOIL
• Greedy search, score each clause using information gain:
– Let c1 be a specialization of c2
– Then WIG(c1) (weighted information gain) is
WIG(c1,c 2 ) = p2⊕⊕ ( I (c1 ) − I (c 2 ))
p
I(c) = − log 2
p+n

– Where p is the number of possible bindings that make the clause
cover positive examples, p is the number of positive examples
€ covered and n is the number of negative examples covered.
– Background knowledge (B) is limited to ground facts.

EDAM Reading Group © 2004 20

PROGOL
• Branch and bound top-down search
• Uses P-N-l as scoring function:
– P is number of positive examples covered
– N is number of negative examples covered
– l is the number of literals in the clause
• Preprocessing step: build a bottom clause using a positive example and B to
restrict search space.
• Uses mode declarations to restrict language
• B not limited to ground facts
• While doing branch and bound top-down search:
– Only use literals appearing in bottom clause to refine clauses.
– Learned literal is a generalization of this bottom clause.
• Can set depth bound on variable chaining and theorem proving

EDAM Reading Group © 2004 21

Example of Bottom Clause
E + = { p(a), p(b)} modes : p(var)
E − = { p(c), p(d)} q (var, var)
B = {r(a,b),r(a,c),r(c,d), r (var, const )
q(X,Y ) ← r(Y, X )} r (var, var)
• Select a seed from positive examples, p(a), randomly or by order (first
uncovered positive example)
€ • Gather all relevant literals (by forward chaining add anything from B
that is allowable)
p(a) ← r(a,b),r(a,c),r(c,d),q(b,a),
q(c,a),q(d,c),r(a,b),r(a,c),r(c,d)
• Introduce variables as required by modes
p(A) ← r(A,b),r(A,c),r(C,d),q(B, A),
€ q(C, A),q(D,C),r(A,B),r(A,C),r(C,D)
EDAM Reading Group © 2004 22
Iterate to Learn Multiple Rules
• Select seed from positive examples to build
bottom clause.
• Get some rule “If A  B then P”. Now throw
away all positive examples that were covered by
this rule
• Repeat until there are no more positive examples.

First seed - + -
selected
+ + ++
- Second seed
+ -
+ + -- -+ + selected
First rule
- -
learned
EDAM Reading Group © 2004 23
From Last Time
• Why ILP is not just Decision Trees.
– Language is First-Order Logic
• Natural representation for multi-relational settings
• Thus, a natural representation for full databases
– Not restricted to the classification task.
– So then, what is ILP?

EDAM Reading Group © 2004 24

What is ILP?
(An obscene generalization)
• A way to search the space of First-Order
clauses.
– With restrictions of course
 -subsumption and search space ordering
– Refinement operators:
• Applying substitutions
• Adding literals
• Chaining variables

EDAM Reading Group © 2004 25

More From Last Time
• Evaluation of hypothesis requires finding
substitutions for each example.
– This requires a call to PROLOG for each
example
– For PROGOL only one substitution required
– For FOIL all substitutions are required (recall
the p in scoring function)

EDAM Reading Group © 2004 26

An Example: MIDOS
• A system for finding statistically-interesting
subgroups
– Given a population and its distribution over
some class
– Find subsets of the population for which the
distribution over this class is significantly
different (an example)
– Uses ILP-like search for definitions of
subgroups

EDAM Reading Group © 2004 27

MIDOS
• Example:
– A shop-club database
• customer(ID, Zip, Sex, Age, Club)
• order(C_ID, O_ID, S_ID, Pay_mode)
• store(S_ID, Location)
– Say want to find good_customer(ID)
– FOIL or PROGOL might find:
• good_customer(C) :- customer(C,_,female,_,_),
order(C,_,_,credit_card).

MIDOS
• Instead, let’s find a subset of customers that
contain an interesting amount of members
– Target: [member, non_member]
– Reference Distribution: <66.1%,33.9%> 1371 total
– customer(C,_,female,_,_), order(C,_,_,credit_card).
Distribution: <69.9%, 30.1%> 478 total 1.54 %%

MIDOS
• How-to:
– Build a clause in an ILP fashion
– Refinement operators:
• Apply a substitution (to a constant, for example)
• Add a literal (uses given foreign link relationships)
– Use a score for ‘statistical interest’
– Do a beam search

MIDOS
• ‘Statistically-interesting’ scoring
g 2

1− g
• ∑ ( p0 i − pi )
i=1..n

– g: relative size of subgroup

– p0i: relative frequency of value i in entire population
€
– pi: relative frequency of value i in subgroup

MIDOS
• Example
– 1378 total population, 478 female credit card
buyers
– 906 members in total population, 334 members
among female credit card buyers:
g 2

1− g
• ∑ ( p0 i − pi )
i=1..n

478
1378 • ⎡ 906 − 334
2
+ 472 − 144
2
⎤ = 0.00154
⎢⎣( 1378 478) ( 1378 )
478 ⎥⎦
1− 4781378
€
EDAM Reading Group © 2004 32
Scaling in Data Mining
• Scaling to large datasets
– Increasing number of training examples
• Scaling to size of the examples
– Increasing number of ground facts in
background knowledge

[Tang, Mooney, Melville, UT Austin, MRDM (SIGKDD) 2003.]

BETHE
• Addresses scaling to number of ground facts in
background knowledge
• Problem: PROGOL bottom-clause is too big
– This makes search space too big
• Solution: don’t build a bottom clause
– Use FOIL-like search, BUT
– Guide search using some seed example

Efficiency Issues
• Representational Aspects
• Search
• Evaluation
• Sharing computations
• Memory-wise scalability

Representational Aspects
• Example:
– Student(string sname, string major, string minor)
– Course(string cname, string prof, string cred)
– Enrolled(string sname, string cname)
• In a natural join of these tables there is a one-to-one
correspondance between join result and the Enrolled table
• Data mining tasks on the Enrolled table are really
propositional
• MRDM is overkill

Representational Aspects
• Three settings for data mining:
– Find patterns within individuals represented as tuples
(single table, propositional)
• eg. Which minor is chosen with what major
– Find patterns within individuals represented as sets of
tuples (each individual ‘induces’ a sub-database)
• Multiple tables, restricted to some individual
• eg. Student X taking course A, usually takes course B
– Find patters within whole database
• Mutliple tables
• eg. Course taken by student A are also taken by student B

Search
• Space restriction
– Bottom clauses as seen above
• Syntactical biases and typed logic
– Modes as seen above.
– Can add types to variables to further restrict language
• Search biases and pruning rules
– PROGOL’s bound (relies on anti-monotonicity of
coverage)
• Stochastic search

Evaluation
• Evaluating a clause: get some measure of
coverage
– Match each example to the clause:
• Run multiple logical queries.
– Query optimization methods from DB
community
• Rel. Algebra operator reordering
• BUT: queries for DB are set oriented (bottom-up),
queries in PROLOG find a single solution (top-
down).
EDAM Reading Group © 2004 39
Evaluation
• More options
– k-locality, given some bindings, literals in a clause
become independent:
• eg. ?- p(X,Y),q(Y,Z),r(Y,U).
• Given a binding for Y, proofs of q and r are independent
• So, find only one solution for q, if no solution found for r no
need to backtrack.
– Relax -subsumption using stochastic estimates
• Sample space of substitutions and decide on subsumption
based on this sample

Sharing Computations
• Materialization of features
• Propositionalization
• Pre-compute some statistics
– Joint distribution over attributes of a table
– Query selectivity
• Store proofs, reuse when evaluating new
clauses

Memory-wise scalability
• All full ILP systems work on memory
databases
– Exception: TILDE: learns multi-relational
decision trees
• The trick: make example loop the outer loop
• Current solution:
– Encode data compactly

References
• Dzeroski and Lavrac, Relational Data Mining,
Springer, 2001.
• David Page, ILP: Just Do It, ILP 2000.
• Tang, Mooney, Melville, Scaling Up ILP to Large
Examples: Results on Link Discovery for Counter
Terrorism, MRDM 2003.
• Blockeel, Sebag: Scalability and efficiency in multi-
relational data mining. SIGKDD Explorations 5(1) July
2003

2025 MMP AI-KRR Unit 3 Representing Knowledge Using Rules
No ratings yet
2025 MMP AI-KRR Unit 3 Representing Knowledge Using Rules
99 pages
Prolog LabSheet
100% (2)
Prolog LabSheet
34 pages
The Art of Prolog
100% (1)
The Art of Prolog
277 pages
Unification and Lifting
No ratings yet
Unification and Lifting
8 pages
Is Lab Cse Visem
No ratings yet
Is Lab Cse Visem
43 pages
7 InductiveLogicProgramming
No ratings yet
7 InductiveLogicProgramming
48 pages
Ch6: Knowledge Representation Using Rules
No ratings yet
Ch6: Knowledge Representation Using Rules
37 pages
How To Simplify Logical Reasoning Problems Book
No ratings yet
How To Simplify Logical Reasoning Problems Book
247 pages
S4hana Session P3
100% (1)
S4hana Session P3
53 pages
Full PHP ToDo List Report
No ratings yet
Full PHP ToDo List Report
8 pages
lectures 6-8
No ratings yet
lectures 6-8
150 pages
Coure Main
No ratings yet
Coure Main
323 pages
The Course Logic Programming 2G1121
No ratings yet
The Course Logic Programming 2G1121
387 pages
PDF of Artifical Intelligance Machine Learning Lab 2
No ratings yet
PDF of Artifical Intelligance Machine Learning Lab 2
250 pages
Prolog PDF
No ratings yet
Prolog PDF
83 pages
Prolog Lab Sheets
No ratings yet
Prolog Lab Sheets
36 pages
AWS solution architect question and answers
No ratings yet
AWS solution architect question and answers
154 pages
Logic, Programming, and Prolog (2nd Ed) - Ulf Nilsson, Jan Maluszynski
No ratings yet
Logic, Programming, and Prolog (2nd Ed) - Ulf Nilsson, Jan Maluszynski
294 pages
L08 - Learning and Logic
No ratings yet
L08 - Learning and Logic
19 pages
Internship Report
No ratings yet
Internship Report
12 pages
MDG Interview Questions
No ratings yet
MDG Interview Questions
43 pages
P - 14 - Datalog, Recursive SQL
No ratings yet
P - 14 - Datalog, Recursive SQL
51 pages
CS240A: Databases and Knowledge Bases
No ratings yet
CS240A: Databases and Knowledge Bases
20 pages
TIB Ebx 6.2.0 Installation
No ratings yet
TIB Ebx 6.2.0 Installation
78 pages
Query Execution
No ratings yet
Query Execution
87 pages
Logic Programming Notes
No ratings yet
Logic Programming Notes
83 pages
Logical Programming Languages
No ratings yet
Logical Programming Languages
6 pages
Unit 5 Notes
No ratings yet
Unit 5 Notes
98 pages
Concepts of ILP
No ratings yet
Concepts of ILP
15 pages
5.3-Inductive Logic Programming
No ratings yet
5.3-Inductive Logic Programming
22 pages
Average Difficulty
No ratings yet
Average Difficulty
17 pages
Referencia 3
No ratings yet
Referencia 3
79 pages
Datalog and Recursion
No ratings yet
Datalog and Recursion
144 pages
Lesson Plan BCA103 Database Management System
No ratings yet
Lesson Plan BCA103 Database Management System
2 pages
sminton
No ratings yet
sminton
86 pages
Developer and DBA Tips For Pro-Actively Optimizing SQL Apply Performance PDF
No ratings yet
Developer and DBA Tips For Pro-Actively Optimizing SQL Apply Performance PDF
16 pages
Referencia 8
No ratings yet
Referencia 8
51 pages
Module2 Notes (Srinivasulu M)
No ratings yet
Module2 Notes (Srinivasulu M)
33 pages
Chapter 6 MP
No ratings yet
Chapter 6 MP
30 pages
Electronic Voting System Software Requirements Specification
No ratings yet
Electronic Voting System Software Requirements Specification
14 pages
Ai Chapter 3
No ratings yet
Ai Chapter 3
22 pages
Prog3112 Second Quarter Exam
0% (1)
Prog3112 Second Quarter Exam
10 pages
Lingaya S Vidyapeeth, Faridabad
No ratings yet
Lingaya S Vidyapeeth, Faridabad
27 pages
Lecture Notes in Machine Learning
No ratings yet
Lecture Notes in Machine Learning
65 pages
A Learning Path Recommendation Model Based On A Multidimensional
No ratings yet
A Learning Path Recommendation Model Based On A Multidimensional
28 pages
Assignment 2 Study of DDL and DML Operations On Views - 191061060
No ratings yet
Assignment 2 Study of DDL and DML Operations On Views - 191061060
6 pages
Special Topics Mas
100% (2)
Special Topics Mas
11 pages
IA4T- 24-25 (1)
No ratings yet
IA4T- 24-25 (1)
51 pages
DFo Quiz
0% (1)
DFo Quiz
22 pages
Artificial Intelligence Logic - Programming.and - Prolog
No ratings yet
Artificial Intelligence Logic - Programming.and - Prolog
294 pages
Translation and Optimization of Logic Queries, The Algebraic Approach
No ratings yet
Translation and Optimization of Logic Queries, The Algebraic Approach
8 pages
Week 2 Dillard - S Analysis Guide PDF
No ratings yet
Week 2 Dillard - S Analysis Guide PDF
6 pages
Ak Project
No ratings yet
Ak Project
21 pages
Data Analysis and Validation - Case Study
No ratings yet
Data Analysis and Validation - Case Study
5 pages
19-datalog
No ratings yet
19-datalog
49 pages
Visual Basic 6 Library Functions
No ratings yet
Visual Basic 6 Library Functions
17 pages
Project 1 Info How Do I Construct A Frequency Distribution
No ratings yet
Project 1 Info How Do I Construct A Frequency Distribution
3 pages
Lab Sheets: Interim Syllabi By: Kashif Rizwan
No ratings yet
Lab Sheets: Interim Syllabi By: Kashif Rizwan
38 pages
Prolog 1
No ratings yet
Prolog 1
22 pages
Deductive Databases
No ratings yet
Deductive Databases
24 pages
Procedural Query Languages
No ratings yet
Procedural Query Languages
129 pages
Prolog Programming: Techniques of
No ratings yet
Prolog Programming: Techniques of
7 pages
2021 - Inductive Logic Programming at 30
No ratings yet
2021 - Inductive Logic Programming at 30
23 pages
T4 Pro 1
No ratings yet
T4 Pro 1
25 pages
8 Logic Programming
No ratings yet
8 Logic Programming
29 pages
New Microsoft PowerPoint Presentation
No ratings yet
New Microsoft PowerPoint Presentation
6 pages
ORACLE DBA Job Description
No ratings yet
ORACLE DBA Job Description
2 pages
BDA - Lecture 3
100% (1)
BDA - Lecture 3
17 pages
Cape Notes Unit 2 Module 1 Content 9
No ratings yet
Cape Notes Unit 2 Module 1 Content 9
21 pages
Knowledge in Learning: Fall 2005
No ratings yet
Knowledge in Learning: Fall 2005
28 pages
AI File
No ratings yet
AI File
29 pages
B351F (2021) L02 Business Programming
No ratings yet
B351F (2021) L02 Business Programming
9 pages
Database Security Attacks Threats and Challenges IJERTCONV5IS10008
No ratings yet
Database Security Attacks Threats and Challenges IJERTCONV5IS10008
4 pages
Module 5
No ratings yet
Module 5
24 pages
1525974395Hyndhav-Resume
No ratings yet
1525974395Hyndhav-Resume
5 pages
Prolog Notes-Complete
No ratings yet
Prolog Notes-Complete
31 pages
12-1-LogicProg
No ratings yet
12-1-LogicProg
26 pages
Ai Unit-4 Material
No ratings yet
Ai Unit-4 Material
14 pages
Top 20 Production Support Executive Questions and Answers
No ratings yet
Top 20 Production Support Executive Questions and Answers
4 pages
Az 303t00a Enu Powerpoint 00
0% (1)
Az 303t00a Enu Powerpoint 00
14 pages
adbms-unit2
No ratings yet
adbms-unit2
20 pages
UNIT 4 DWDM
No ratings yet
UNIT 4 DWDM
27 pages
Inductive_Logic_Programming_via_Differen
No ratings yet
Inductive_Logic_Programming_via_Differen
12 pages
The Chi-Square Test of Independence: Key Concepts
No ratings yet
The Chi-Square Test of Independence: Key Concepts
6 pages
UNIT 5
No ratings yet
UNIT 5
19 pages
.NET Generics 4.0 Beginner’s Guide
From Everand
.NET Generics 4.0 Beginner’s Guide
Sudipta Mukherjee
No ratings yet
Algebra
From Everand
Algebra
Larry C. Grove
5/5 (3)
EC Cryptography Tutorials - Herong's Tutorial Examples
From Everand
EC Cryptography Tutorials - Herong's Tutorial Examples
Herong Yang
No ratings yet
Python Programming: General-Purpose Libraries; NumPy,Pandas,Matplotlib,Seaborn,Requests,os & sys: Python, #2
From Everand
Python Programming: General-Purpose Libraries; NumPy,Pandas,Matplotlib,Seaborn,Requests,os & sys: Python, #2
e3
No ratings yet
High School Pre-Calculus Tutor
From Everand
High School Pre-Calculus Tutor
The Editors of REA
4/5 (1)