Ontology mapping:
a way out of
the medical tower of Babel?
Frank van Harmelen
Vrije Universiteit Amsterdam
The Netherlands
Before we start…
a talk on ontology mappings
is a difficult talk to give:
no consensus in the field
• on merits of the different approaches
• on classifying the different approaches
no one can speak with authority on
the solution
this is a personal view, with a sell-by date
other speakers will entirely disagree
(or disapprove)
Good overviews of the topic
Knowledge Web D2.2.3:
“State of the art on ontology alignment”
Ontology Mapping Survey
talk by Siyamed Seyhmus SINIR
ESWC'05 Tutorial on
Schema and Ontology Matching
by Pavel Shvaiko & Jérôme Euzenat
KER 2003 paper by Kalfoglou & Schorlemmer
These are all different & incompatible…
The Medical tower of Babel
MeSH
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptors
EMTREE
• Commercial (Elsevier), drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCI Cancer Ontology:
• 17.000 classes (about 1M definitions)
What are ontologies &
what are they used for
Figure: world, concept and language; without a shared understanding
there is conceptual and terminological confusion
• agree on a conceptualization
• make it explicit in some language
Actors: both humans and machines
Ontologies come in very
different kinds
From lightweight to heavyweight:
• Yahoo topic hierarchy
• Open Directory (400.000 general categories)
• Cyc, 300.000 axioms
From very specific to very general
• METAR code (weather conditions at air terminals)
• SNOMED (medical concepts)
• Cyc (common sense knowledge)
What’s inside an ontology?
terms + specialisation hierarchy
classes + class-hierarchy
instances
slots/values
inheritance (multiple? defaults?)
restrictions on slots (type, cardinality)
properties of slots (symm., trans., …)
relations between classes (disjoint, covers)
reasoning tasks: classification, subsumption
Increasing semantic “weight”
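To make the "classes + class-hierarchy" and "subsumption" rows concrete, here is a minimal sketch, assuming a toy hierarchy stored as a Python dict from each class to its direct superclasses (so multiple inheritance is allowed). The class names are hypothetical, and a real ontology language such as OWL needs a proper reasoner; this only follows asserted is-a links.

```python
# Toy class hierarchy (hypothetical): class -> set of direct superclasses.
# A class with several superclasses models multiple inheritance.
PARENTS: dict[str, set[str]] = {
    "Thing": set(),
    "BodyPart": {"Thing"},
    "BloodVessel": {"BodyPart"},
    "Artery": {"BloodVessel"},
    "Vein": {"BloodVessel"},
    "Aorta": {"Artery"},
    "Substance": {"Thing"},
    "Drug": {"Substance"},
    "Opioid": {"Drug"},
    "IllicitDrug": {"Drug"},
    "Heroin": {"Opioid", "IllicitDrug"},  # two direct superclasses
}


def ancestors(cls: str, parents: dict[str, set[str]]) -> set[str]:
    """Return every strict superclass of `cls`, following links transitively."""
    seen: set[str] = set()
    stack = list(parents.get(cls, ()))
    while stack:
        sup = stack.pop()
        if sup not in seen:
            seen.add(sup)
            stack.extend(parents.get(sup, ()))
    return seen


def subsumes(general: str, specific: str, parents: dict[str, set[str]]) -> bool:
    """True if `general` subsumes `specific` (every class subsumes itself)."""
    return general == specific or general in ancestors(specific, parents)


print(subsumes("Artery", "Aorta", PARENTS))  # True
print(subsumes("Vein", "Aorta", PARENTS))    # False
```

Heavier features from the list (restrictions on slots, disjointness, defaults) would sit on top of this bare hierarchy.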
In short
(for the duration of this talk)
Ontologies are not
definitive descriptions of
what exists in the world (= philosophy)
Ontologies are
models of the world
constructed
to facilitate communication
Yes, ontologies exist
(because we build them)
Ontology mapping is
old & inevitable
Ontology mapping is old
• db schema integration
• federated databases
Ontology mapping is inevitable
• ontology language is standardised,
• don't even try to standardise contents
Ontology mapping is
important
database integration,
heterogeneous database retrieval
(traditional)
catalog matching (e-commerce)
agent communication (theory only)
web service integration (urgent)
P2P information sharing (emerging)
personalisation (emerging)
Ontology mapping is
now urgent
Ontology mapping has acquired
new urgency
• physical and syntactic integration is more or less solved
(open world, web)
• automated mappings are now required (P2P)
• shift from off-line to run-time matching
Ontology mapping has new opportunities
• larger volumes of data
• richer schemas (relational vs. ontology)
• applications where partial mappings work
Different aspects
of ontology mapping
how to discover a mapping
how to represent a mapping
• subset/equal/disjoint/overlap/is-somehow-related-to
• logical/equational/category-theoretical
atomic/complex arguments,
confidence measure
how to use it
We only talk about “how to discover”
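The representation choices listed above can be pictured with a small data structure. The sketch assumes atomic arguments (one concept name per side), a relation drawn from the slide's subset/equal/disjoint/overlap/is-somehow-related-to list, and a confidence in [0, 1]; the class names, symbols and the `inverse` helper are our own illustration, not a standard exchange format.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    SUBSET = "⊆"
    SUPERSET = "⊇"
    EQUAL = "="
    DISJOINT = "⊥"
    OVERLAP = "overlap"
    RELATED = "is-somehow-related-to"


@dataclass(frozen=True)
class Correspondence:
    """One mapping element between two atomic concepts (hypothetical format)."""
    source: str
    target: str
    relation: Relation
    confidence: float = 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")

    def inverse(self) -> "Correspondence":
        """Swap source and target; only subset/superset change direction."""
        flipped = {Relation.SUBSET: Relation.SUPERSET,
                   Relation.SUPERSET: Relation.SUBSET}
        return Correspondence(self.target, self.source,
                              flipped.get(self.relation, self.relation),
                              self.confidence)


m = Correspondence("onto1:Aorta", "onto2:Artery", Relation.SUBSET, 0.9)
print(m.inverse())
```

Complex arguments (a class expression on one side) would require `source` and `target` to hold expressions rather than plain names.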
Many experimental systems:
(non-exhaustive!)
Prompt (Stanford SMI)
Anchor-Prompt (Stanford SMI)
Chimaera (Stanford KSL)
Rondo (Stanford U./ULeipzig)
MoA (ETRI)
Cupid (Microsoft Research)
Glue (UWashington)
FCA-merge (UKarlsruhe)
IF-Map
Artemis (UMilano)
T-tree (INRIA Rhone-Alpes)
S-MATCH (UTrento)
Coma (ULeipzig)
Buster (UBremen)
MULTIKAT (INRIA S.A.)
ASCO (INRIA S.A.)
OLA (INRIA R.A.)
Dogma's Methodology
ArtGen (Stanford U.)
Alimo (ITI-CERTH)
Bibster (UKarlsruhe)
QOM (UKarlsruhe)
KILT (INRIA LORRAINE)
Different approaches to
ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge
Linguistic &
structural mappings
normalisation
(case, blanks, digits, diacritics)
lemmatization, N-grams,
edit-distance, Hamming distance,
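The linguistic techniques above can be sketched with the standard library alone. This is a minimal illustration, assuming labels are plain strings; lemmatization is left out because it needs a language-specific dictionary or tool, and the function names and the trigram default are our own choices.

```python
import unicodedata


def normalise(label: str) -> str:
    """Lower-case, strip diacritics and digits, collapse blanks."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_digits = "".join(ch for ch in no_marks if not ch.isdigit())
    return " ".join(no_digits.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def ngrams(s: str, n: int = 3) -> set[str]:
    """Character n-grams of s, padded with a blank on each side."""
    padded = f" {s} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two n-gram sets (1.0 = identical sets)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


print(edit_distance(normalise("Aorta Thoracalis"), normalise("aorta thoracica")))
```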
distance = fraction of common parents
elements are similar if
their parents/children/siblings are similar
decreasing order of boredom
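For the two structural heuristics, a sketch under our own assumptions: each element's ancestors are given as a set (the slide's "fraction of common parents" is read here as an overlap ratio, whose complement can serve as a distance), and cross-ontology similarity is refined by one propagation step that blends a pair's own score with its parents' score. The weight `alpha` and all names are hypothetical.

```python
def common_parent_fraction(a: str, b: str, ancestors: dict[str, set[str]]) -> float:
    """Share of ancestors that a and b have in common (1.0 = identical ancestry)."""
    pa, pb = ancestors.get(a, set()), ancestors.get(b, set())
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def propagate(sim: dict[tuple[str, str], float],
              parent_a: dict[str, str],
              parent_b: dict[str, str],
              alpha: float = 0.5) -> dict[tuple[str, str], float]:
    """One round of 'similar parents make children similar'.

    Each pair's new score mixes its own score with the score of its parents'
    pair; pairs where either element has no parent keep their score.
    """
    updated = {}
    for (a, b), own in sim.items():
        pa, pb = parent_a.get(a), parent_b.get(b)
        if pa is None or pb is None:
            updated[(a, b)] = own
        else:
            updated[(a, b)] = alpha * own + (1 - alpha) * sim.get((pa, pb), 0.0)
    return updated


# Hypothetical lexical scores between two small hierarchies.
sim0 = {("aorta", "aorta_thoracica"): 0.4, ("artery", "arteria"): 0.8}
sim1 = propagate(sim0, {"aorta": "artery"}, {"aorta_thoracica": "arteria"})
print(sim1)
```

Repeating `propagate` until scores stop changing would spread similarity further through the hierarchy; a single round keeps the example easy to check by hand.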
Matching through
shared vocabulary
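Figure: a concept Q is approximated through the shared vocabulary
by a lower bound Low(Q), built as a union (⋃),
and an upper bound Up(Q), built as an intersection (⋂):
$\mathrm{Low}(Q) \subseteq Q \subseteq \mathrm{Up}(Q)$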
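A minimal sketch of the approximation idea, assuming (our simplification) that each concept is represented extensionally as a set of shared-vocabulary "cells", so that subsumption becomes plain set inclusion. Low(Q) is taken as the union of target concepts contained in Q and Up(Q) as the intersection of target concepts containing Q, in line with the ⋃/⋂ labels of the figure; the concept names and cells are hypothetical, and a real system would compute subsumption with a reasoner instead.

```python
def lower_approx(q: frozenset[str], target: dict[str, frozenset[str]]) -> frozenset[str]:
    """Union of all target concepts whose extension lies inside Q."""
    return frozenset().union(*(ext for ext in target.values() if ext <= q))


def upper_approx(q: frozenset[str], target: dict[str, frozenset[str]],
                 universe: frozenset[str]) -> frozenset[str]:
    """Intersection of all target concepts whose extension contains Q.

    With no such concept the bound is the whole universe.
    """
    return frozenset(universe).intersection(*(ext for ext in target.values() if q <= ext))


# Hypothetical shared-vocabulary cells and target-ontology concepts.
UNIVERSE = frozenset({"c1", "c2", "c3", "c4", "c5"})
TARGET = {
    "A": frozenset({"c1"}),
    "B": frozenset({"c2"}),
    "C": frozenset({"c1", "c2", "c3", "c4"}),
    "D": UNIVERSE,
    "E": frozenset({"c4"}),
}
Q = frozenset({"c1", "c2", "c3"})

low = lower_approx(Q, TARGET)
up = upper_approx(Q, TARGET, UNIVERSE)
print(sorted(low), sorted(up))
```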
Matching through
shared vocabulary
Used in mapping geospatial databases
from German land-registration authorities
(small)
Used in mapping bio-medical and
genetic thesauri
(large)
Matching through
shared instances
Used by Ichise et al. (IJCAI’03) to
successfully map parts of Yahoo to
parts of Google
Yahoo = 8402 classes, 45.000 instances
Google = 8343 classes, 82.000 instances
Only 6000 shared instances
70% - 80% accuracy obtained (!)
Conclusions from authors:
• semantics is needed to improve on this ceiling
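A minimal sketch of instance-based matching, assuming each class is given by the set of instance identifiers (e.g. URLs) it contains. It scores class pairs by the Jaccard overlap of their shared instances and keeps the best target above a threshold; this simple measure is our own choice for illustration and not the statistical method of Ichise et al. The category names and instances are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, with 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_by_instances(src: dict[str, set[str]], tgt: dict[str, set[str]],
                       threshold: float = 0.5) -> dict[str, tuple[str, float]]:
    """Map each source class to its best-overlapping target class.

    Only instances present in both catalogs carry evidence, so every class
    is first restricted to the shared instances.
    """
    shared = set().union(*src.values()) & set().union(*tgt.values())
    result = {}
    for s_cls, s_inst in src.items():
        best, best_score = None, 0.0
        for t_cls, t_inst in tgt.items():
            score = jaccard(s_inst & shared, t_inst & shared)
            if score > best_score:
                best, best_score = t_cls, score
        if best is not None and best_score >= threshold:
            result[s_cls] = (best, best_score)
    return result


yahoo_like = {"Arts/Music": {"u1", "u2", "u3", "x1"}, "Sports": {"u4", "u5"}}
google_like = {"Entertainment/Music": {"u1", "u2", "u3", "y1"}, "Recreation/Sports": {"u4", "y2"}}
result = match_by_instances(yahoo_like, google_like)
print(result)
```

Restricting to shared instances matters because, as the slide notes, only a small fraction of instances occur in both hierarchies.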
Matching using shared
background knowledge
Figure: ontology 1 and ontology 2, both linked to shared background knowledge
Ontology mapping
using background knowledge
Case study 1
PHILIPS
Work with:
• Zharko Aleksovski @ Philips
• Michel Klein @ VU
• KIK @ AMC
Overview of test data
Two terminologies from
intensive care domain
OLVG list
• List of reasons for ICU admission
AMC list
• List of reasons for ICU admission
DICE hierarchy
• Additional hierarchical knowledge describing
the reasons for ICU admission
OLVG list
developed by a clinician
3000 reasons for ICU admission
1390 used in first 24 hours of stay
• 3600 patients since 2000
based on ICD9 + additional material
List of problems for patient admission
Each reason for admission is described with
one label
• Labels consist of 1.8 words on average
• redundancy because of spelling mistakes
• implicit hierarchy (e.g. many fractures)
AMC list
List of 1460 problems for ICU admission
Each problem is described using
5 aspects from the DICE terminology
(DICE: 2500 concepts (5000 terms), 4500 links):
• Abnormality (size: 85)
• Action taken (size: 55)
• Body system (size: 13)
• Location (size: 1512)
• Cause (size: 255)
expressed in OWL
allows for subsumption & part-of reasoning
Why mapping
AMC list ↔ OLVG list?
allow easy entering of OLVG data
re-use of data in
• epidemiology
• quality of care assessment
• data-mining (patient prognosis)
Linguistic mapping:
Compare each pair of concepts
Use labels and synonyms of concepts
Heuristic method to discover
equivalence and subclass relations
Example: “Long brain tumor” is more specific than “Long tumor”
First round
• compare with complete DICE
• 313 suggested matches, around 70 % correct
Second round:
• only compare with “reasons for admission” subtree
• 209 suggested matches, around 90 % correct
⇒ High precision, low recall (“the easy cases”)
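The heuristic itself is not spelled out on the slide; the sketch below is one hypothetical reading of it. It assumes that two labels with the same set of words denote equivalent concepts, and that a label whose words strictly include another label's words (as in the “Long brain tumor” example) denotes a more specific concept. Synonyms are handled by trying every label pair, equivalence takes priority, and spelling correction and stemming are omitted.

```python
from typing import Optional


def tokens(label: str) -> frozenset[str]:
    """Lower-cased word set of a label (word order is ignored)."""
    return frozenset(label.lower().split())


def relate(labels_a: list[str], labels_b: list[str]) -> Optional[str]:
    """Relate concept A to concept B through their labels and synonyms.

    'equivalent' if some label pair has the same words; otherwise
    'more specific' if a label of A has all words of a label of B plus more,
    'more general' for the reverse, and None if no pair is related.
    """
    pairs = [(tokens(a), tokens(b)) for a in labels_a for b in labels_b]
    if any(ta == tb for ta, tb in pairs):
        return "equivalent"
    if any(ta > tb for ta, tb in pairs):
        return "more specific"
    if any(ta < tb for ta, tb in pairs):
        return "more general"
    return None


print(relate(["Long brain tumor"], ["Long tumor"]))  # more specific
```

Pairs such as HIV/AIDS share no words, so a purely lexical rule like this one misses them; this is consistent with the low recall reported above.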
Using background knowledge
Use properties of concepts
Use other ontologies to discover
relation between properties
Figure: the OLVG problem list is matched lexically against the given
DICE aspect taxonomies (Abnormality, Action, Body system, Location, Cause);
a semantic match within these taxonomies then yields an implicit
property match between the OLVG problem list and the DICE problem list
Semantic match example: “Aorta thoracalis dissection” – “Dissection of artery”
Taxonomy of body parts: Blood vessel is more general than Vein and Artery;
Artery is more general than Aorta
Lexical match: “Aorta thoracalis dissection” has location Aorta
Lexical match: “Dissection of artery” has location Artery
Reasoning: Artery is more general than Aorta, which implies a location match:
“Dissection of artery” has a more general location
Example: “Heroin intoxication” – “drugs overdose”
Cause taxonomy: Drugs is more general than Heroine (heroin)
Abnormality taxonomy: Intoxicatie (intoxication) is more general than Overdosis (overdose)
Lexical match (cause): “Heroin intoxication” has cause Heroine
Lexical match (abnormality): “Heroin intoxication” has abnormality Intoxicatie
Lexical match (cause): “Drugs overdosis” has cause Drugs
Lexical match (abnormality): “Drugs overdosis” has abnormality Overdosis
Cause match: “Heroin intoxication” has a more specific cause
Abnormality match: “Heroin intoxication” has a more general abnormality
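Both worked examples follow one pattern: once lexical matching has assigned DICE aspect values to two problems, each shared aspect is compared inside its taxonomy. A minimal sketch, assuming single-parent taxonomies stored as child-to-parent dictionaries and aspect values that were already found lexically; only the taxonomy fragments from the two examples are encoded, and the function names are hypothetical.

```python
# Taxonomy fragments from the two examples, child -> parent.
TAXONOMIES: dict[str, dict[str, str]] = {
    "location": {"Vein": "Blood vessel", "Artery": "Blood vessel", "Aorta": "Artery"},
    "cause": {"Heroine": "Drugs"},
    "abnormality": {"Overdosis": "Intoxicatie"},
}


def ancestors(term: str, parent: dict[str, str]) -> list[str]:
    """Walk up a single-parent taxonomy from `term` to its root."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def compare_term(a: str, b: str, parent: dict[str, str]) -> str:
    """How term a relates to term b within one aspect taxonomy."""
    if a == b:
        return "same"
    if b in ancestors(a, parent):
        return "more specific"
    if a in ancestors(b, parent):
        return "more general"
    return "unrelated"


def semantic_match(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Compare two problems on every aspect that both of them mention."""
    return {aspect: compare_term(a[aspect], b[aspect], TAXONOMIES.get(aspect, {}))
            for aspect in sorted(a.keys() & b.keys())}


olvg = {"cause": "Heroine", "abnormality": "Intoxicatie"}  # "Heroin intoxication"
dice = {"cause": "Drugs", "abnormality": "Overdosis"}      # "Drugs overdosis"
print(semantic_match(olvg, dice))
# {'abnormality': 'more general', 'cause': 'more specific'}
```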
Example results
• OLVG: Acute respiratory failure → DICE: Asthma cardiale
• OLVG: Aspergillus fumigatus → DICE: Aspergilloom
• OLVG: duodenum perforation → DICE: Gut perforation
• OLVG: HIV → DICE: AIDS
• OLVG: Aorta thoracalis dissectie type B → DICE: Dissection of artery
Matched aspects: abnormality; cause; abnormality, cause; cause; location, abnormality
Ontology mapping
using background knowledge
Case study 2
Work with Heiner Stuckenschmidt @ VU
Case Study:
1. Map GALEN & Tambis,
using UMLS as background knowledge
2. Select three topics with sufficient overlap:
• Substances
• Structures
• Processes
3. Define some partial & ad-hoc manual mappings
between individual concepts
4. Represent mappings in C-OWL
5. Use semantics of C-OWL
to verify and complete mappings
Case Study: verification & derivation
Figure: UMLS (medical terminology) is lexically mapped to GALEN (medical ontology)
and to Tambis (genetic ontology); verification & derivation in UMLS
yields a derived mapping between GALEN and Tambis
Ad hoc mappings: Substances (UMLS ↔ GALEN)
Notice: mappings high and low in the hierarchy, few in the middle
Ad hoc mappings: Substances (UMLS ↔ Tambis)
Notice different grain size: UMLS coarse, Tambis fine
Verification of mappings
Figure: UMLS:Chemicals = Tambis:Chemical and UMLS:enzyme = Tambis:enzyme,
checked against UMLS:Chemicals_viewed_structurally and
UMLS:Chemicals_viewed_functionally (⊥?)
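Verification can be sketched as a consistency check: follow each mapping up the background hierarchy and flag any external concept that ends up below two disjoint background classes. This is a set-based simplification rather than C-OWL's formal semantics, and the hierarchy fragment, the disjointness axiom (which the figure marks with a question mark) and the second Tambis:enzyme mapping are assumptions made for illustration.

```python
# Background (UMLS-like) is-a links; illustrative fragment, child -> parent.
PARENT = {
    "UMLS:enzyme": "UMLS:Chemicals_viewed_functionally",
    "UMLS:Chemicals_viewed_functionally": "UMLS:Chemicals",
    "UMLS:Chemicals_viewed_structurally": "UMLS:Chemicals",
}
# Assumed disjointness axiom (marked "⊥?" on the slide, i.e. open to question).
DISJOINT = [("UMLS:Chemicals_viewed_functionally", "UMLS:Chemicals_viewed_structurally")]


def superclasses(cls: str) -> set[str]:
    """cls together with all its background ancestors."""
    result = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        result.add(cls)
    return result


def conflicts(mappings: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Flag external concepts placed below two disjoint background classes.

    Each mapping is (external concept, relation, background class); only '='
    and '⊆' place the external concept below the background class.
    """
    below: dict[str, set[str]] = {}
    for ext, rel, bg in mappings:
        if rel in ("=", "⊆"):
            below.setdefault(ext, set()).update(superclasses(bg))
    return [(ext, c1, c2)
            for ext, classes in below.items()
            for c1, c2 in DISJOINT
            if c1 in classes and c2 in classes]


ok = [("Tambis:Chemical", "=", "UMLS:Chemicals"), ("Tambis:enzyme", "=", "UMLS:enzyme")]
# Hypothetical extra mapping that clashes with the assumed disjointness.
suspect = ok + [("Tambis:enzyme", "⊆", "UMLS:Chemicals_viewed_structurally")]
print(conflicts(ok))  # []
print(conflicts(suspect))
```

A flagged conflict only says that the mappings and the background cannot all hold at once; whether the mapping or the disjointness axiom is at fault remains a human decision.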
Deriving new mappings
Figure: Galen:ChemicalSubstance related to UMLS:substance, UMLS:Chemicals,
UMLS:OrganicChemical and UMLS:Phenomenon_or_process
through ⊇, =, ⊆ and ⊥ relations
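Derivation can be sketched as composing a manual mapping with relations that hold inside the background terminology, repeated until nothing new appears. The composition table is a set-based reading of ⊆, ⊇, = and ⊥ (disjointness), not C-OWL's formal semantics, and both the manual mapping and the background relations below are assumed for illustration from the labels in the figure.

```python
# Composition of set relations: (x r1 y) and (y r2 z) entail (x r z).
# Pairs missing from the table entail nothing definite.
COMPOSE = {
    ("=", "="): "=", ("=", "⊆"): "⊆", ("=", "⊇"): "⊇", ("=", "⊥"): "⊥",
    ("⊆", "="): "⊆", ("⊆", "⊆"): "⊆", ("⊆", "⊥"): "⊥",
    ("⊇", "="): "⊇", ("⊇", "⊇"): "⊇",
    ("⊥", "="): "⊥", ("⊥", "⊇"): "⊥",
}

Triple = tuple[str, str, str]


def derive(mappings: set[Triple], background: set[Triple]) -> set[Triple]:
    """Return the mappings entailed by composing with background relations."""
    known = set(mappings)
    changed = True
    while changed:  # repeat until a fixpoint is reached
        changed = False
        for x, r1, y in list(known):
            for y2, r2, z in background:
                r = COMPOSE.get((r1, r2))
                if y == y2 and r is not None and (x, r, z) not in known:
                    known.add((x, r, z))
                    changed = True
    return known - set(mappings)


manual = {("Galen:ChemicalSubstance", "=", "UMLS:Chemicals")}  # assumed
umls = {  # assumed background relations
    ("UMLS:Chemicals", "⊇", "UMLS:OrganicChemical"),
    ("UMLS:Chemicals", "⊆", "UMLS:substance"),
    ("UMLS:substance", "⊥", "UMLS:Phenomenon_or_process"),
}
for triple in sorted(derive(manual, umls)):
    print(triple)
```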
“Conclusions”
Ontology mapping is (still) hard & open
Many different approaches will be required:
• linguistic
• structural
• statistical
• semantic
• …
Currently no roadmap theory on
what's good for which problems
Challenges
roadmap theory
run-time matching
“good-enough” matches
large scale evaluation methodology
hybrid matchers (needs roadmap theory)