Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Olga De Troyer Claudia Bauzer Medeiros
Roland Billen Pierre Hallot
Alkis Simitsis Hans Van Mingroot (Eds.)
Advances in
Conceptual Modeling
Recent Developments and New Directions
Volume Editors
Olga De Troyer
Vrije Universiteit Brussel, Department of Computer Science, Brussel, Belgium
E-mail: olga.detroyer@vub.ac.be
Roland Billen
Pierre Hallot
Université de Liège, Geomatics Unit, Liège, Belgium
E-mail: {rbillen; p.hallot}@ulg.ac.be
Alkis Simitsis
Hewlett-Packard Laboratories, Palo Alto, CA, USA
E-mail: alkis@hp.com
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface to ER 2011 Workshops, Posters,
Demonstrations, and Industrial Papers
This book contains the proceedings of the workshops associated with the 30th
International Conference on Conceptual Modeling (ER 2011), as well as the pa-
pers associated with the Posters and Demonstrations Session, and the Industrial
Track of ER 2011.
As always, the aim of the workshops was to give researchers and participants
a forum to present and discuss cutting edge research in conceptual modeling, and
to pose some of the challenges that arise when applying conceptual modeling in
less traditional areas. The workshops deal with theories, techniques, languages,
methods, and tools for conceptual modeling and span a wide range of domains
including web information systems, geographical information systems, business
intelligence, software variability management, and ontologies. Some of the work-
shops were organized for the rst time but others have a longer tradition. Six
workshops were selected and organized after a call for workshop proposals:
We would also like to thank the main ER 2011 conference committees, particu-
larly the Conference Co-chairs, Esteban Zimányi and Jean-Luc Hainaut, the Con-
ference Program Co-chairs, Manfred Jeusfeld, Lois Delcambre, and Tok Wang
Ling, and the Webmaster, Boris Verhaegen, for their support and for putting
the program together.
Workshop Co-chairs
Olga De Troyer Vrije Universiteit Brussel, Belgium
Claudia Bauzer Medeiros University of Campinas, Brazil
WISM 2011
WISM 2011 was organized by the Econometric Institute, Erasmus University
Rotterdam, Netherlands; the Department of Computer Science, Delft University
of Technology, Netherlands; and the Department of Computer Science, Namur
University, Belgium.
Program Committee
Workshop Co-chairs
Flavius Frasincar Erasmus University Rotterdam,
The Netherlands
Geert-Jan Houben Delft University of Technology,
The Netherlands
Philippe Thiran Namur University, Belgium
Program Committee
Djamal Benslimane University of Lyon 1, France
Sven Casteleyn Polytechnic University of Valencia, Spain
Richard Chbeir Bourgogne University, France
Olga De Troyer Vrije Universiteit Brussel, Belgium
Roberto De Virgilio Università di Roma Tre, Italy
Oscar Díaz University of the Basque Country, Spain
Flavius Frasincar Erasmus University Rotterdam,
The Netherlands
Martin Gaedke Chemnitz University of Technology, Germany
Irene Garrigos Universidad de Alicante, Spain
Hyoil Han LeMoyne-Owen College, USA
Geert-Jan Houben Delft University of Technology,
The Netherlands
Zakaria Maamar Zayed University, UAE
Michael Mrissa University of Lyon 1, France
Moira Norrie ETH Zurich, Switzerland
External Reviewers
J.A. Aguilar
A. Bikakis
F. Valverde
MORE-BI 2011
Program Committee
Workshop Co-chairs
Ivan Jureta FNRS and Louvain School of Management,
University of Namur, Belgium
Stephane Faulkner Louvain School of Management, University of
Namur, Belgium
Esteban Zimányi Université Libre de Bruxelles, Belgium
Steering Committee
Ivan Jureta FNRS and Louvain School of Management,
University of Namur, Belgium
Stephane Faulkner Louvain School of Management,
University of Namur, Belgium
Esteban Zimányi Université Libre de Bruxelles, Belgium
Marie-Aude Aufaure École Centrale Paris, France
Carson Woo Sauder School of Business, Canada
Program Committee
Alberto Abelló Universitat Politècnica de Catalunya, Spain
Daniele Barone University of Toronto, Canada
Ladjel Bellatreche École Nationale Supérieure de Mécanique
et d'Aérotechnique, France
Variability@ER 2011
Program Committee
Workshop Co-chairs
Iris Reinhartz-Berger University of Haifa, Israel
Arnon Sturm Ben Gurion University of the Negev, Israel
Kim Mens Université catholique de Louvain, Belgium
Program Committee
Felix Bachmann SEI, USA
David Benavides University of Seville, Spain
Jan Bosch Intuit, Mountain View, USA
Paul Clements SEI, USA
Onto.Com 2011
Program Committee
Workshop Co-chairs
Giancarlo Guizzardi Federal University of Espirito Santo, Brazil
Oscar Pastor Polytechnic University of Valencia, Spain
Yair Wand University of British Columbia, Canada
Program Committee
Alessandro Artale Free University of Bolzano, Italy
Alex Borgida Rutgers University, USA
Andreas Opdahl University of Bergen, Norway
Bert Bredeweg University of Amsterdam, The Netherlands
Brian Henderson-Sellers University of Technology Sydney, Australia
Carson Woo University of British Columbia, Canada
Chris Partridge BORO Solutions, UK
Claudio Masolo Laboratory for Applied Ontology (ISTC-CNR),
Italy
Colin Atkinson University of Mannheim, Germany
David Embley Brigham Young University, USA
Dragan Gasevic Athabasca University, Canada
Fred Fonseca Penn State University, USA
Gerd Wagner Brandenburg University of Technology,
Germany
Giancarlo Guizzardi Federal University of Espirito Santo, Brazil
Heinrich Herre University of Leipzig, Germany
Heinrich Mayr University of Klagenfurt, Austria
Jean-Marie Favre University of Grenoble, France
Jeffrey Parsons Memorial University of Newfoundland, Canada
Joerg Evermann Memorial University of Newfoundland, Canada
John Mylopoulos University of Trento, Italy
Jose Palazzo M. de Oliveira Federal University of Rio Grande do Sul, Brazil
Leo Obrst MITRE Corporation, USA
Matthew West Information Junction, UK
Michael Rosemann University of Queensland, Australia
SeCoGIS 2011
Program Committee
Workshop Co-chairs
Roland Billen Université de Liège, Belgium
Esteban Zimányi Université Libre de Bruxelles, Belgium
Pierre Hallot Université de Liège, Belgium
Steering Committee
Claudia Bauzer Medeiros University of Campinas, Brazil
Michela Bertolotto University College Dublin, Ireland
Jean Brodeur Natural Resources Canada
Christophe Claramunt Naval Academy Research Institute, France
Christelle Vangenot Université de Genève, Switzerland
Esteban Zimányi Université Libre de Bruxelles, Belgium
Program Committee
Alia Abdelmoty Cardiff University, UK
Gennady Andrienko Fraunhofer Institute IAIS, Germany
Natalia Andrienko IAIS Fraunhofer, Germany
David Bennett The University of Iowa, USA
Michela Bertolotto University College Dublin, Ireland
Roland Billen Université de Liège, Belgium
Patrice Boursier University of La Rochelle, France
Jean Brodeur Natural Resources Canada
Bénédicte Bucher Institut Géographique National, France
Yvan Bédard Laval University, Canada
FP-UML 2011
Program Committee
Workshop Co-chairs
Guido Geerts University of Delaware, USA
Matti Rossi Aalto University, Finland
Steering Committee
Juan Trujillo University of Alicante, Spain
Il-Yeol Song Drexel University, USA
Program Committee
Doo-Hwan Bae EECS Dept. KAIST, South Korea
Michael Blaha OMT Associates Inc., USA
Cristina Cachero University of Alicante, Spain
Gill Dobbie University of Auckland, New Zealand
Dirk Draheim Inst. of Computer Science, Freie Univ. Berlin,
Germany
Eduardo Fernandez University of Castilla La Mancha, Spain
Frederik Gailly Vrije Universiteit Brussel, Belgium
Paolo Giorgini University of Trento, Italy
Jaime Gomez University of Alicante, Spain
Peter Green University of Queensland, Australia
Manfred Jeusfeld Tilburg University, The Netherlands
Ludwik Kuzniarz Blekinge Institute of Technology, Sweden
Jens Lechtenbörger University of Münster, Germany
Pericles Loucopoulos University of Manchester, UK
Kalle Lyytinen Case Western Reserve University, USA
Hui Ma Massey University, New Zealand
Antoni Olive Polytechnic University of Catalonia, Spain
Oscar Pastor Polytechnic University of Valencia, Spain
Witold Pedrycz University of Alberta, Canada
Mario Piattini University of Castilla La Mancha, Spain
Colette Rolland Universite de Paris, France
Manuel Serrano University of Castilla La Mancha, Spain
Keng Siau University of Nebraska-Lincoln, USA
Bernhard Thalheim Universitaet zu Kiel, Germany
A Min Tjoa Technical University of Vienna, Austria
Ambrosio Toval University of Murcia, Spain
Panos Vassiliadis University of Ioannina, Greece
Harry (Jiannan) Wang University of Delaware, USA
Program Committee
Co-chairs
Roland Billen Université de Liège, Belgium
Pierre Hallot Université de Liège, Belgium
Program Committee
Renata Araújo Universidade Federal do Estado do
Rio de Janeiro, Brazil
Michael Blaha OMT Associates, USA
Irene Garrigos University of Alicante, Spain
Pierre Geurts Université de Liège, Belgium
Sergio Lifschitz Pontifícia Universidade Católica do
Rio de Janeiro, Brazil
Jose-Norberto Mazón Universidad de Alicante, Spain
Sergio Luján Mora Universidad de Alicante, Spain
German Shegalov Oracle, USA
Alkis Simitsis HP Labs, USA
David Taniar Monash University, Australia
Industrial Track
Program Committee
Co-chairs
Alkis Simitsis HP Labs, USA
Hans Van Mingroot IBM, Belgium
Program Committee
Phil Bernstein Microsoft Research, USA
Umeshwar Dayal HP Labs, USA
Howard Ho IBM Research, USA
Neoklis Polyzotis UCSC, USA
Erik Proper Public Research Centre - Henri Tudor,
Luxembourg
Sabri Skhiri Euranova, Belgium
Jan Verelst University of Antwerp, Belgium
Table of Contents
Industrial Track
Preface to the Industrial Track . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Alkis Simitsis and Hans Van Mingroot
Business Intelligence
QBX: A CASE Tool for Data Mart Design . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Antonino Battaglia, Matteo Golfarelli, and Stefano Rizzi
Conceptual Modelling for Web Information Systems:
What Semantics Can Be Shared?
Simon McGinnes
1 Introduction
Now is an important moment in human development. Thanks to the internet, for the
first time in history we have the capacity to share data on a massive scale, to move
information in digital form more or less anywhere we want at the press of a button.
But with this new ability comes the need to think in new ways about information
exchange. Our conventional view of sharing information developed in the low
bandwidth world of conversation, storytelling, books and newspapers. Trying to apply
the same ideas to the internet may risk creating the potential for confusion. Recently,
efforts have been made to automate the sharing of information using technologies
such as the Semantic Web, microformats and web services. As we begin to use these
technologies it is important to be clear about what we mean when we talk about
sharing information and what kinds of information can feasibly be shared.
This position paper aims to clarify these questions in a useful way for the
developers and users of web-based information systems. The discussion is necessarily
rather philosophical, but it is made as practical as possible because sharing data is
inherently a practical issue. In fact, the sole reason for sharing data is to facilitate
action. Viewing shared data as passive information is to miss the point; the
significance of sharing data is in the potential it creates for action.
There is a greater need for data sharing within and particularly between organizations
than ever before. Historically, most organizations have used a portfolio of information
systems for different purposes, so corporate data has been locked into a number of
separate, mutually-incompatible data structures. Organizations need their systems to
work together, but sharing data between heterogeneous applications is rarely
straightforward. In an attempt to achieve integration, many organizations use
enterprise software applications, which address requirements across a range of
business processes. However, the adoption of enterprise software products tends to
lock organizations into a single supplier and prevents the selection of best-of-breed
solutions. Both approaches (the application portfolio approach and the adoption of
enterprise software) can be expensive, and neither is ideal.
The situation is exacerbated when it comes to sharing data between organizations,
or between organizations and individuals. There is at present no universal inter-
organizational equivalent to the enterprise software solution. Therefore organizations
have little choice but to make their mutually-incompatible applications work together.
Historically, two factors have presented a barrier to interoperation. One is physical
incompatibility: if two systems are physically disconnected then they cannot
communicate. The Internet and related technologies have largely solved that problem.
However, as soon as applications can physically exchange data then the need for
semantic compatibility becomes paramount. This problem is less easily solved.
For example, consider two information systems which need to interoperate: (a) a
sales order processing system containing data about product types, suppliers,
customers, orders and employees, and (b) an accounting system with data on
transactions, debtors, creditors and accounts. Although the two sets of concepts
describe the same real-world phenomena (people, organizations and business
transactions), the systems conceptualize the data very differently. The systems are
conceptually incompatible, even though they store data about the same things. The
incompatibility presents a barrier to data exchange between the applications. Unless a
programmer crafts a suitable custom interface between the two applications, which
translates between the two ways of conceptualizing the underlying business entities, it
will be difficult to share data between them; the applications are built around concepts
that do not map to one another in any straightforward way.
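To make this concrete, the following sketch uses hypothetical record types (the class and field names are invented for illustration) to show the kind of hand-written translation a programmer would have to craft between the two conceptualizations:

```python
from dataclasses import dataclass

# Concepts used by the sales order processing system.
@dataclass
class Customer:
    name: str
    address: str

@dataclass
class Order:
    customer: Customer
    total: float

# Concepts used by the accounting system.
@dataclass
class Debtor:
    legal_name: str

@dataclass
class Transaction:
    debtor: Debtor
    amount: float

def order_to_transaction(order: Order) -> Transaction:
    """Hand-written bridge: a programmer must decide that an Order
    corresponds to a Transaction and a Customer to a Debtor; nothing
    in the data itself encodes this correspondence."""
    return Transaction(debtor=Debtor(legal_name=order.customer.name),
                       amount=order.total)

if __name__ == "__main__":
    o = Order(Customer("Acme Ltd", "1 Main St"), total=99.0)
    print(order_to_transaction(o))
```

Every additional pair of concepts needs another such mapping, which is why integration of this kind rarely scales.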
Of course, the problem is not confined to this trivial example of sales and
accounting systems. It is rife, because most application software is structured around
idiosyncratic, domain-specific concepts. This seemingly-benign design practice,
which has long been the norm in information systems engineering, guarantees that
different programs will tend to be semantically incompatible. But, regardless of the
pros and cons of using ad hoc concepts in application design, we need to find ways of
integrating conceptually-incompatible systems. This is where technologies such as
ontologies and the Semantic Web offer some hope.
The paper is structured as follows. Section 2 explores from first principles the ways in
which data exchange between semantically-incompatible systems can and cannot be
achieved. Section 3 discusses ontologies, the Semantic Web and related technologies
in the light of this analysis. Section 4 outlines some possible changes to design
practice suggested by this analysis, which might facilitate semantic interoperability.
Section 5 concludes with a summary of findings and limitations, and suggests
directions for further work.
Two commonly-used terms are avoided in this paper, because their ambiguity
could contribute to confusion. The first is "information"; this is a term from everyday
language which has been co-opted in IS/IT with a variety of meanings. It can refer to
essentially the same thing as data, or to any digital signal, or to text, or to facts that
have particular significance. The second term is "semantics"; this term has been adopted
by particular academic communities, with the result that its meaning is rather blurred.
Because the meanings of both terms are central to the arguments in this paper, we
avoid using them altogether and will use other, less ambiguous terms as appropriate.
2 Shared Meanings
When talking about data and information systems it is important to distinguish clearly
between real-world things (non-information resources in linked data terminology),
the signs that represent them, and mental states which correspond to signs and
real-world things. For example, the term "IBM" is a sign, which corresponds to a
particular organization known as IBM (a real-world thing). An observer may have an
idea of the organization known as IBM, and this is a mental state. This three-way
relationship is sometimes encapsulated in the semiotic triangle [1].
Information systems store signs (bits, bytes, images, text, etc.) which represent
real-world things and mental states. When people communicate, they do so using
signs such as words and gestures. In all of these cases, signs are manipulated in the
hope of evoking mental states. This much is uncontroversial; in the following
discussion, we consider how mental states can be evoked through the use of signs by
people and machines. We are interested in knowing how software applications can
exchange data despite conceptual incompatibility. In order to understand that, we
need to understand how meaning is transmitted. Since humans often try to transmit
meaning to one another, it is helpful first to consider how this works.
(Figure 1. Two views of communication: (a) shared meanings; (b) meanings embedded in language.)
Neither view is to be taken literally. Language cannot literally carry meaning since it consists only of
signs: sounds and gestures. The medium in this case is not the message. Meaning
arises only as an experience in each observer when he or she hears the sounds and
observes the gestures. The mental states evoked by these perceptions are what we
experience as meaning (Figure 2). According to this view,
no meanings can be shared; experience and memory are private mental states of each
individual. Language, which consists of signs, flows between individuals and evokes
the experience of meaning separately in each individual.
(Figure 2. Communication through language: subjective meanings are evoked separately in each individual.)
However, this raises the question of how humans communicate at all, if there are
no shared meanings. The answer is that all meaning is subjective, yet we can assume
that much of our experience is similar. Humans communicate imperfectly and
misunderstanding is common. But we can proceed for much of the time as if we share
common meanings, because we are physiologically similar to one another and have
similar formative experiences. To give a trivial example, it is a reasonable working
assumption that we all mean the same thing by red. This assumption is in fact
incorrect, because evidence tells us that many people are colorblind and cannot
experience red in the same way as non-colorblind individuals. It is also likely that
individual variations in the perception of colors exist quite apart from colorblindness.
Nevertheless, it is more helpful to assume common meanings than to assume we have
no shared experience. The same argument can be extended to many concepts; while
there may be disagreement between individuals on the interpretation of particular
signals, it is more beneficial to attempt communication than not to.
Unlike computers, people can rely on having a common set of concepts, because
we all share the experience of living in the same world (roughly), are members of the
same species, and share elements of brain function. Hence our most fundamental
concepts (whether innate or learned) tend to be similar; most people would agree on
what a person is, for example. These ideas provide the context for thought and tell us
generally how to behave in relation to any given thing or situation that we encounter.
A related question is how each individual's subjective meanings are stored and what
form they take when consciously experienced. This is an area of academic debate, but
one broad trend in our understanding of meaning can be identified. Early theories of
cognition postulated that meaning stems from conceptual structures in the mind. In
psychology, the spreading activation model is an example of such a theory; nodes
representing distinct concepts are connected explicitly, reflecting associative links
between ideas [2]. Similar thinking in computer science gave rise to conceptual
graphs and the ideas of schemata and frames [3]. There have even been suggestions
that the unconscious mind performs computations using symbolic logic [4].
However, despite a great deal of looking, neuroscience has not found evidence for
conceptual structures in the mind. It is becoming apparent that meaning arises in a
rather different way. Rather than activating concepts, perceptions appear instead to
elicit the recall of prior experience in a holistic manner. When we observe a situation
or hear language we recall a complex of memories with associated emotional states.
To recall is to re-experience, and it is this conscious re-experiencing of prior
experiences which we know as meaning.
Recall occurs on the basis of similarity using perceptual feature-matching
processes, which operate on multiple levels. This allows us to recall memories
because of literal as well as abstract (analogical) similarities: one situation may be
quite unlike another in detail and yet we still are reminded, because of more abstract
similarities [5]. This suggests that, as a general principle, thinking about a concept
does not depend on definitions or on analytical thinking; it involves the retrieval of
prior experience on the basis of perceptual similarity. The definition of a concept
emerges and becomes crisp only when we try to define it consciously.
Although neuroscience has not found evidence for the existence of conceptual
structures, the results of studies (some of which use brain imaging) suggest that the
primate brain possesses hardwired semantic regions which process information
about particular subjects such as people, animals, tools, places and activities [6].
There is debate about the interpretation of these results, but the implication is that the
brain automatically segregates (on the basis of feature matching) cognitive processing
into certain broad categories. Categorization occurs unconsciously and the categories
appear to be innate, not learned. They correspond to concrete, everyday ideas rather
than abstractions. We can hypothesize that other concepts (more specialized ideas
like customer and account, for example) are learned, and become associated with the
corresponding basic-level innate categories.
Software applications, by contrast, are not capable of experiencing meaning in the way that a human does. The closest
equivalent to meaning for a software application is when it has been programmed to
act on specific types of data; we may then say (figuratively) that the application
understands that type of data. This figurative understanding consists of two
elements: (a) being programmed to accept data with a particular structure, and (b)
being programmed to deal with that type of data appropriately. However, we must be
clear that the ability to handle each specific type of data requires explicit
programming.
So, if two applications understand different types of data (and so are
conceptually incompatible) how can they exchange their own equivalent of meaning?
As before, we consider different models of communication. In Figure 3, view (a)
suggests that it is sufficient merely to exchange data. View (b) suggests that metadata
should also be included with the data; its function is to explain the structure and
purpose of the data, so that the receiving application can process it properly.
(Figure 3. Data exchange between two applications: (a) data only; (b) data plus metadata.)
View (a) is clearly insufficient since (to use our earlier example) the accounting
system will not recognize data about customers, orders and so on, and will therefore
be unable to do anything useful with them. Therefore some explanation of these
concepts is needed. However, view (b) is also incomplete, because any metadata must
be expressed in terms of concepts that the receiving application understands. The
accounting system will understand what to do with data about customers and orders
only if the metadata explains the concepts customer and order in terms of the
concepts account and transaction, or in terms of other constructs which the
accounting system understands. That would require the sending application to have
knowledge (e.g. of accounts and transactions) which it does not have.
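A minimal sketch of this limitation follows (the concept labels and message layout are invented for illustration): even when metadata names the concept, the receiving application can only act on concepts it has already been programmed to handle.

```python
# Handlers the receiving 'accounting' application was programmed with.
def handle_account(payload):
    return f"posted account {payload['id']}"

def handle_transaction(payload):
    return f"posted transaction of {payload['amount']}"

HANDLERS = {"account": handle_account, "transaction": handle_transaction}

def receive(message):
    """The metadata ('concept') tells us what the data is called, but
    meaningful processing still requires a pre-programmed handler."""
    concept = message["metadata"]["concept"]
    handler = HANDLERS.get(concept)
    if handler is None:
        return f"cannot process unknown concept '{concept}'"
    return handler(message["data"])

# A sales application sends a 'customer' record with explanatory metadata.
msg = {"metadata": {"concept": "customer"},
       "data": {"id": "C42", "name": "Acme Ltd"}}
print(receive(msg))  # -> cannot process unknown concept 'customer'
```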
For two conceptually-incompatible applications to exchange meanings, they must
both understand how to process particular types of data (Figure 4). This requires
that both be programmed with the ability to handle those types of data. For that to be
the case, both applications must share particular conceptual structures and must
contain program code which can deal with the corresponding data. But that would
mean that the applications would no longer be conceptually incompatible. Of course,
this is a contradiction. It means that the only way for conceptually-incompatible
applications to exchange data is by becoming conceptually compatible.
This analysis tells us that conceptually-incompatible applications can never
exchange data unless they become conceptually compatible first. It makes no
difference how complicated the data exchange method is or how much markup is
included. When data is exchanged it can be processed meaningfully only if the
sending and receiving applications are conceptually compatible and this situation can
be achieved only in advance of the exchange, through programming. In the next
section, we consider the implications of this finding for data exchange technologies.
(Figure 4. Data exchange mediated by a shared schema, ontology or conceptual model.)
If applications share the conceptual structures defined in a particular ontology, then exchange of data becomes easy, because the applications
are conceptually compatible. However, it requires that applications be designed from
the ground up to conform to the particular ontology, which is an expensive
proposition. Also, no standard ontology has emerged. Multiple competing ontologies
and microformats exist, each with its own peculiar take on the relevant domains, and
it is not particularly easy to use them in combination. If applications are built to match
a variety of different ontologies, the risk is that this lack of standardization will
perpetuate the present problem of ad hoc conceptual structures and legacy
information islands.
An alternative approach is to use ontologies as a basis for mapping the conceptual
models of applications to one another. Defining conceptual structures in more detail
does not in itself convey significance or create machine comprehension. But richer
definition of concepts (such as customer and account, in our example) might help link
them to terms in a common ontology. This would allow programs to read the data, but
it would not tell programs how to process the data appropriately, unless specific
programming were to be done for each concept or term in the ontology. That is also
an expensive proposition and probably less likely because ontologies contain many
thousands of terms, most of which are irrelevant to any particular application.
To summarize, the development of ontologies may assist in reconciling mutually-
incompatible conceptual structures and so allow applications to exchange data. But
ontologies are in effect better information plumbing; they cannot tell applications
how to process any given type of data. Each distinct ontology defines a set of
concepts which must be programmed into a software application before the
application can process the corresponding data meaningfully. The metadata (in RDF,
for example) is useful because it signifies the type of data, but this is useful only
inasmuch as the receiving application already understands how to deal with data of
that type.
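The following sketch, with made-up ontology URIs, illustrates the point: mapping application concepts to terms of a common ontology lets two applications recognise that their data refers to the same kind of thing, but the receiving application still needs code written for that ontology term before it can process the data.

```python
# Hypothetical mappings from local concepts to a shared ontology term.
SALES_APP_MAPPING = {"customer": "http://example.org/onto#Agent"}
ACCOUNTING_APP_MAPPING = {"debtor": "http://example.org/onto#Agent"}

# Behaviour the accounting application actually has, keyed by ontology term.
PROGRAMMED_BEHAVIOUR = {
    # "http://example.org/onto#Agent": handle_agent,  # only if someone wrote it
}

def can_process(local_concept, mapping):
    """True only if the concept's ontology term has programmed behaviour."""
    term = mapping.get(local_concept)
    return term in PROGRAMMED_BEHAVIOUR

# Both concepts resolve to the same term ("better plumbing") ...
assert SALES_APP_MAPPING["customer"] == ACCOUNTING_APP_MAPPING["debtor"]
# ... but without programming for that term, the data still cannot be used.
print(can_process("debtor", ACCOUNTING_APP_MAPPING))  # -> False
```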
Because of the fundamental limitation that applications must already share concepts
in order to exchange corresponding data, the use of ontologies does not substantially
improve our ability to link applications on a large scale or more rapidly. We are still
held back by the need to program into each application the methods that it must use to
handle data corresponding to each concept. There is no such thing as common sense
or general knowledge for a computer application which would tell it how to handle
specific types of data.
However, the discussion in Section 2.1 alluded to some results from neuroscience
which may be helpful in thinking about this problem. The results suggest that a small
number of categories are hardwired in the brain and perhaps innate. These correspond
to concepts that people seem to understand intuitively: other people, food, tools,
actions and so on. They are neither particularly generic nor particularly abstract but
instead couched at a basic or everyday level [9]. One hypothesis is that these common
basic-level concepts, together with other common experiences such as emotions and
sensations, allow us to think and communicate about more complex ideas.
Applying the same idea to software applications, we could envisage a simple set of
shared innate categories for software applications to adhere to, providing a basis for
at least some limited data exchange and interoperability, without the need for
programming. It is generally a straightforward matter to map an existing conceptual
model to a simple set of basic-level categories [10]. For example, data about people
could be tagged as such, as could data about places, documents, organizations, and so
on. Once an item of data had been tagged according to this set of simple basic-level
categories, certain default programming would be applicable. For example, places can
be represented as maps; documents can be downloaded and opened. Other possible
basic-level categories include systems, physical objects, conceptual objects and
categories [11].
Recall that the purchase order processing and accounting systems in our example
contained the concepts supplier and creditor. If both of these concepts were identified
with the innate category organization, then the two applications could exchange data
meaningfully, identifying data in the categories supplier and creditor as data about
organizations. The applications could then treat the data in a way deemed appropriate
for processing data about organizations. Similarly, the concepts purchase and
transaction could be identified with the innate category activity, allowing exchange
and treatment appropriate for activities, and so on. By providing a generic level of
programming to suit each shared basic-level category, at least some basic level of
default operation would be possible on exchanged data without the two applications
possessing complex shared conceptual schemas.
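As an illustration of this idea (the category names and default behaviours below are assumptions made for the sketch, not a fixed proposal), each application could tag its concepts with a small set of shared basic-level categories and fall back on generic, category-level handling:

```python
# A small, shared set of basic-level categories (assumed for illustration).
BASIC_CATEGORIES = {"person", "organization", "place", "document", "activity"}

# Each application tags its own concepts with a basic-level category.
SALES_APP_TAGS = {"supplier": "organization", "purchase": "activity"}
ACCOUNTING_APP_TAGS = {"creditor": "organization", "transaction": "activity"}

# Generic default behaviour per category, programmed once.
def show_contact_card(data):
    return f"contact card for {data.get('name', '?')}"

def show_timeline_entry(data):
    return f"timeline entry dated {data.get('date', '?')}"

DEFAULTS = {"organization": show_contact_card, "activity": show_timeline_entry}

def default_treatment(local_concept, tags, data):
    """Fall back on category-level behaviour when the receiving
    application has no code for the specific concept."""
    category = tags.get(local_concept)
    handler = DEFAULTS.get(category)
    return handler(data) if handler else "no default treatment available"

# Data tagged 'supplier' by the sender is treated as organization data.
print(default_treatment("supplier", SALES_APP_TAGS, {"name": "Acme Ltd"}))
```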
4 Conclusion
Today, each software application is structured around its own set of unique, ad hoc,
concepts. This design practice guarantees conceptual incompatibility. There is a need
to find alternative design approaches that will allow applications to work with
different types of data more flexibly and share data more readily. Yet it is important
to respect the uniqueness of each application's conceptual model; a one-size-fits-all
conceptual model cannot easily be imposed.
It will become increasingly important for applications to be able to share data
automatically, despite the problem of conceptual incompatibility. A way is needed of
imbuing applications with the equivalent of common sense or general knowledge,
which will allow them to offer a sensible response to new types of data. In ontology
research, it is hoped that this will be achieved through more rigorous and more
detailed definition of concepts, to ultimately enable a form of machine
comprehension. But it is unclear how machine comprehension will be produced by
more detailed definition of concepts. As for the less ambitious goal of using
ontologies to map between conceptual models, even if ontology use in information
systems were to become widespread, a critical mass of organizations would need to
adopt a particular ontology before the benefits of standardization could be realized.
Overall, there is a tendency to think in rather non-specific ways about how
ontologies and the Semantic Web might permit free exchange of data. Any exchange
of data is constrained by the problem of conceptual incompatibility, and this cannot
be overcome solely by the inclusion of more complex markup. It requires advance
programming so that applications are able to handle the types of data to be exchanged.
This cardinal rule constitutes a fundamental limit on conceptual interoperability and
can be stated thus: applications can meaningfully interoperate with respect to data of
a specific type only if they have been programmed in advance to handle data of that
type. When data containing particular concepts is exchanged, applications have to be
specifically programmed to handle the relevant concepts, regardless of what
mechanism is used to transfer the data or construct the programs, and irrespective of
what markup or metadata is included.
In conclusion, this work remains theoretical. However, the relatively limited
progress to date on the Semantic Web (in comparison with the worldwide web, for
example) may represent evidence of the inherent limitation discussed in this paper.
Research is needed into ways of genuinely allowing heterogeneous applications to
exchange data in the face of conceptual incompatibility. Whether or not ontologies are
involved, the idea of using a simple set of innate categories should be tested
because it may offer a more practicable approach than attempting to implement large
and unwieldy ontologies in software applications.
References
1. Liebenau, J., Backhouse, J.: Understanding Information: An Introduction. Macmillan,
Basingstoke (1990)
2. Crestani, F.: Application of Spreading Activation Techniques in Information Retrieval.
Artificial Intelligence Review 11, 453–482 (1997)
3. Sowa, J.F.: Conceptual Structures. Addison-Wesley, Reading (1984)
4. Modell, A.H.: Imagination and the Meaningful Brain. The MIT Press, Cambridge (2003)
5. Eysenck, M.W., Keane, M.: Cognitive Psychology: A Student's Handbook. Psychology
Press, UK (2005)
6. Mason, M.F., Banfield, J.F., Macrae, C.N.: Thinking About Actions: The Neural
Substrates of Person Knowledge. Cerebral Cortex 14, 209–214 (2004)
7. Kalfoglou, Y., Hu, B.: Issues with Evaluating and Using Publicly Available Ontologies
(2006)
8. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American 284, 28–37 (2001)
9. Pansky, A., Koriat, A.: The Basic-Level Convergence Effect in Memory Distortions.
Psychological Science 15, 52–59 (2004)
10. McGinnes, S.: Conceptual Modelling: A Psychological Perspective. Ph.D Thesis,
Department of Information Systems, London School of Economics, University of London
(2000)
11. McGinnes, S., Amos, J.: Accelerated Business Concept Modeling: Combining User
Interface Design with Object Modeling. In: Harmelen, M.V., Wilson, S. (eds.) Object
Modeling and User Interface Design: Designing Interactive Systems, pp. 3–36. Addison-
Wesley, Boston (2001)
A Goal-Oriented Approach for Optimizing
Non-functional Requirements in Web
Applications
J.A. Aguilar, I. Garrigós, and J.-N. Mazón
Lucentia-DLSI
University of Alicante, E-03080, San Vicente del Raspeig, Alicante, Spain
{ja.aguilar,igarrigos,jnmazon}@dlsi.ua.es
1 Introduction
Unlike traditional stand-alone software, the audience of Web applications is both
open and large. Therefore, users may have different goals and preferences and
stakeholders (in this context, stakeholders are individuals or organizations who
affect or are affected directly or indirectly by the development project in a posi-
tive or negative form [9]) should be able to cope with these heterogeneous needs
by means of an explicit requirements analysis stage in which functional and
non-functional requirements are considered [2].
Functional requirements (FRs) describe the system services, behavior or func-
tions, whereas non-functional requirements (NFRs), also known as quality re-
quirements, specify constraints on the application to be built or on the development
process [7]. An effective definition of requirements improves the quality of the
final product. Unfortunately, in most of the Web engineering approaches, a com-
plete analysis of requirements is performed considering only FRs, thus leaving
aside the NFRs until the implementation stage. We fully agree with the argument
in [3] that NFRs are a very important issue and must be considered from the
early stages of the development process.
2 Related Work
In our previous work [2], a systematic literature review has been conducted for
studying requirement engineering techniques in the development of Web applica-
tions. Our findings showed that most of the Web engineering approaches focus on
the analysis and design phases and do not give a comprehensive support to the
requirements phase. Furthermore, the NFRs are considered in an isolated form,
leaving them out of the analysis stage. In addition, we can also conclude that
the most used requirement analysis techniques are UML use cases and profiles. On
the other hand, with regard to approaches that consider NFRs from early stages
of the development process, in [8] the authors propose a metamodel for repre-
senting usability requirements for Web applications. Moreover, in [3] the authors
present the state of the art for NFRs in MDD (Model-Driven Development),
as well as an approach for an MDD process (outside the field of Web engineering).
Unfortunately, these works overlook how to maximize the NFRs.
To sum up, there have been many attempts to provide techniques and meth-
ods to deal with some aspects of the requirements engineering process for Web
applications. Nevertheless, there is still a need for solutions that considers NFRs
from beginning of the Web application development process, in order to assure
that they will be satised at the same time that the functional requirements are
met, improving the quality of the Web application perceived by users.
In multi-objective optimization, all decision vectors that are not dominated by any
other decision vector form the Pareto optimal set, while the corresponding objective
vectors are said to form the Pareto front. Our approach is described next by means
of a running example.
Step 5. The Objective Function. For each softgoal $j$, the corresponding ob-
jective function $F_j$ with respect to a decision vector $X_v$ is calculated by sum-
ming the contributions of all requirements to softgoal $j$, taking into account
the requirements configuration defined in $X_v$: $\forall j \in \{1, \ldots, M\}$, $\forall v$, $0 \le v < 2^N$,
$F_j(X_v) = \sum_{i=1}^{N} W_{ikj}$,
where $N$ is the number of requirements, $M$ is the number of softgoals, and $W_{ikj}$
is the contribution of requirement $i$, in the implementation status $k$ assigned to it
by $X_v$, to softgoal $j$.
Finally, the sum of all objective functions with respect to a decision vector $X_v$
is computed to obtain the overall fitness of the decision vector $X_v$: $\forall v$, $0 \le v < 2^N$,
$F(X_v) = \sum_{j=1}^{M} F_j(X_v)$,
where $N$ is the number of requirements and $M$ is the number of softgoals.
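As an illustration of Step 5 and of the Pareto-front computation, here is a minimal sketch in Python; the contribution weights and the two-requirement, two-softgoal setup are invented for the example and are not the values used in the paper.

```python
from itertools import product

# Hypothetical contribution weights W[i][j] of requirement i (if implemented)
# to softgoal j; in the approach these values come from the contribution links.
W = [[+1, -4],   # R1 contributes +1 to S1 and -4 to S2
     [-2, +4]]   # R2 contributes -2 to S1 and +4 to S2
N, M = len(W), len(W[0])

def objectives(x):
    """F_j(X_v): sum the contributions of the requirements implemented in x."""
    return tuple(sum(W[i][j] for i in range(N) if x[i]) for j in range(M))

def dominates(a, b):
    """a dominates b if it is at least as good on every softgoal and better on one."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))

vectors = list(product([0, 1], repeat=N))          # all 2^N configurations
scores = {x: objectives(x) for x in vectors}
pareto = [x for x in vectors
          if not any(dominates(scores[y], scores[x]) for y in vectors if y != x)]

for x in pareto:
    print("configuration", x, "objectives", scores[x], "overall fitness", sum(scores[x]))
```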
Table 2 shows all possible decision vectors (columns 2 to 6, all rows), in other
words, all possible requirements configurations, where 'I' represents the sta-
tus 'Implemented' and 'N' represents 'Not implemented'. The results of the
corresponding objective functions are shown in columns 7 to 12.
Table 2. The possible requirements to implement or not for the softgoal tradeoff
Configuration R1 R2 R3 R4 R5 F(S1) F(S2) F(S3) F(S4) F(S5) F(S6) Pareto front
X1 I I I I I -1 0 0 -1 0 0 No
X2 I I I I N -1 0 0 -1 0 -1 No
X3 I I I N I -1 0 0 1 1 -1 Yes
X4 I I I N N -1 0 0 1 1 -2 No
X5 I I N I I 1 -4 -1 -1 0 0 Yes
X6 I I N I N 1 -4 -1 -1 0 -1 No
X7 I I N N I 1 -4 -1 1 1 -1 Yes
X8 I I N N N 1 -4 -1 1 1 -2 No
X9 I N I I I -1 0 0 -2 0 2 Yes
X10 I N I I N -1 0 0 -2 0 1 No
X11 I N I N I -1 0 0 0 1 1 Yes
X12 I N I N N -1 0 0 0 1 0 No
X13 I N N I I 1 -4 -1 -2 0 2 Yes
X14 I N N I N 1 -4 -1 -2 0 1 No
X15 I N N N I 1 -4 -1 0 1 1 Yes
X16 I N N N N 1 -4 -1 0 1 0 No
X17 N I I I I -2 4 1 -1 -1 0 Yes
X18 N I I I N -2 4 1 -1 -1 -1 No
X19 N I I N I -2 4 1 1 0 -1 Yes
X20 N I I N N -2 4 1 1 0 -2 No
X21 N I N I I 0 0 0 -1 -1 0 No
X22 N I N I N 0 0 0 -1 -1 -1 No
X23 N I N N I 0 0 0 1 0 -1 Yes
X24 N I N N N 0 0 0 1 0 -2 No
X25 N N I I I -2 4 1 -2 -1 2 Yes
X26 N N I I N -2 4 1 -2 -1 1 No
X27 N N I N I -2 4 1 0 0 1 Yes
X28 N N I N N -2 4 1 0 0 0 No
X29 N N N I I 0 0 0 -2 -1 2 Yes
X30 N N N I N 0 0 0 -2 -1 1 No
X31 N N N N I 0 0 0 0 0 1 Yes
X32 N N N N N 0 0 0 0 0 0 No
The overall fitness for each decision vector is obtained by summing its objective values.
Finally, in the last column, we indicate whether the corresponding decision vector is in
the Pareto front; the rows marked 'Yes' form the Pareto front.
Step 6. Maximize the Softgoals While Still Satisfying the Goals. In this
step the stakeholder creates a list of softgoals sorted by priority (the softgoal
priorities were established in the list from Step 1 by the stakeholder) and a list
of goals that the Web application has to achieve. For this case, the softgoal
priority list is shown in Table 3.
Table 3. Softgoal priority list for achieving the goal 'Process of review of papers be
selected'
Order Softgoal
1 S4.- Privacy be maximized
2 S2.- Review process easier
3 S3.- Accurate review process
4 S1.- Be fair in review
5 S5.- Avoid possible conflicts of interest
6 S6.- Obtain more complete info
We could have computed the objective function only for those configurations that
satisfy the goals, but not doing it in this form allows us to select between different
configurations considering only the softgoal maximization, leaving aside the goals;
this gives the stakeholder a wider scope for the final implementation.
The next step consists in selecting, from the configurations that are in the Pareto
front and satisfy the goal, the ones that maximize the softgoals according to
the list from Table 3. To do this, it is necessary to check all the configurations
against the requirements model to select the configurations that allow the goal
to be achieved (in this case there are two paths, i.e., two means-ends links); these
are X3, X7, X17 and X25. Then, it is necessary to select the best option
according to the softgoals to maximize. For the softgoal S4, X3 and X7
are the configurations for which it is maximized and, for the softgoal S2,
they are X17 and X25.
For this running example, the configuration X3 is the best option because,
according to the priority list, S4 and S2 are the softgoals to prioritize.
The configurations X17 and X25 maximize S2; however, their contributions
to softgoal S4 (which is number one in the priority list) are
-1 and -2 (see Table 2). Furthermore, although the configuration X3
has a fitness of +1 for S4, the same as the configuration X7, the
configuration X3 has a fitness of 0 for S2 while X7 has a
fitness of -4 for S2, making X7 more penalized than X3 (see
Table 2); this indicates that optimizing security comes at a high cost
with respect to other softgoals (usability). The rest of the solutions in the Pareto
front are intermediate configurations that lead us to different tradeoffs.
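Continuing the earlier sketch, the priority-driven choice of Step 6 can be expressed as a lexicographic comparison over the softgoal priority list. The comparison rule itself is one possible reading of the selection described above; the objective values are taken from Table 2 for the goal-satisfying configurations X3, X7, X17 and X25, and the priority order from Table 3.

```python
# Objective values F(S1)..F(S6) of the goal-satisfying Pareto configurations,
# copied from Table 2.
candidates = {
    "X3":  {"S1": -1, "S2":  0, "S3":  0, "S4":  1, "S5":  1, "S6": -1},
    "X7":  {"S1":  1, "S2": -4, "S3": -1, "S4":  1, "S5":  1, "S6": -1},
    "X17": {"S1": -2, "S2":  4, "S3":  1, "S4": -1, "S5": -1, "S6":  0},
    "X25": {"S1": -2, "S2":  4, "S3":  1, "S4": -2, "S5": -1, "S6":  2},
}
# Stakeholder's softgoal priority list (Table 3), most important first.
priority = ["S4", "S2", "S3", "S1", "S5", "S6"]

# Lexicographic comparison: the highest-priority softgoal decides first,
# ties fall through to the next softgoal in the list.
best = max(candidates, key=lambda c: tuple(candidates[c][s] for s in priority))
print(best)  # -> X3
```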
Finally, the final requirements model (FRM) is the configuration X3 (see Ta-
ble 2). Therefore, the requirements R1.- Blind review process, R2.- Download
papers without authors' names, R3.- Normal review process and R5.- View
review process status must be implemented in order to maximize the softgoals
S4.- Privacy be maximized and S2.- Review process easier. In X3 only
R4 is not implemented. These requirements enable alternative paths (means-
ends links) to satisfy the goal.
As future work, we plan to integrate our goal-oriented approach for requirements
analysis and the Pareto algorithm into an MDD solution for the development of
Web applications, within the A-OOH approach.
References
1. Aguilar, J.A., Garrigos, I., Mazon, J.N., Trujillo, J.: An MDA Approach for Goal-
Oriented Requirement Analysis in Web Engineering. J. Univ. Comp. Sc. 16(17),
2475–2494 (2010)
2. Aguilar, J.A., Garrigos, I., Mazon, J.N., Trujillo, J.: Web Engineering Approaches
for Requirement Analysis- A Systematic Literature Review. In: 6th Web Informa-
tion Systems and Technologies (WEBIST), vol. 2, pp. 187–190. SciTePress Digital
Library, Valencia (2010)
3. Ameller, D., Gutierrez, F., Cabot, J.: Dealing with Non-Functional Requirements
in Model-Driven Development. In: 18th IEEE International Requirements Engi-
neering Conference (RE), pp. 189–198. IEEE, Los Alamitos (2010)
4. Bolchini, D., Paolini, P.: Goal-Driven Requirements Analysis for Hypermedia-
Intensive Web Applications. J. Req. Eng. 9(2), 85–103 (2004)
5. Escalona, M.J., Koch, N.: Requirements Engineering for Web Applications - A
Comparative Study. J. Web Eng. 2(3), 193–212 (2004)
6. Garrigos, I., Mazón, J.N., Trujillo, J.: A requirement analysis approach for using i*
in web engineering. In: Gaedke, M., Grossniklaus, M., Díaz, O. (eds.) ICWE 2009.
LNCS, vol. 5648, pp. 151–165. Springer, Heidelberg (2009)
7. Gupta, C., Singh, Y., Chauhan, D.S.: Dependency Based Process Model for Im-
pact Analysis: A Requirement Engineering Perspective. J. Comp. App. 6(6), 28–30
(2010)
8. Molina, F., Toval, A.: Integrating Usability Requirements that can be Evaluated in
Design Time into Model-Driven Engineering of Web Information Systems. J. Adv.
Eng. Softw. 40, 1306–1317 (2009)
9. Sommerville, I.: Software Engineering, 6th edn. Addison-Wesley, Reading (2001)
10. Szidarovszky, F., Gershon, M., Duckstein, L.: Techniques for Multiobjective Deci-
sion Making in Systems Management. Elsevier, Amsterdam (1986)
11. Yu, E.S.K.: Towards Modeling and Reasoning Support for Early-Phase Require-
ments Engineering. In: 3rd IEEE International Symposium on Requirements En-
gineering (RE), p. 226. IEEE, Washington, DC, USA (1997)
Yet Another BPEL Extension for User
Interactions
1 Introduction
A Web service is an autonomous software application that can be described,
published, discovered, and invoked across the Web [6] by using a set of XML
based standards such as UDDI, WSDL, and SOAP. Web services can be com-
bined together in order to fulll the user request that a single Web service cannot
satisfy [10]. This mechanism is known as Web service composition. A Web ser-
vice composition consists of several Web services orchestration or Web services
choreography processes that deal with a functional need which is unsatisfied by
a single Web service [10].
Several initiatives have been conducted to provide languages such as WS-
BPEL (Web Services Business Process Execution Language) [5] that allow the
description of Web service composition execution. WS-BPEL (BPEL for short)
is an XML based standard that provides a syntax to define the Web service com-
position behavior (control flow) and mechanisms that enable data manipulation
(data flow).
This language expresses the Web service composition process in a fully auto-
mated way. Users are not able to interact with the Web services until the end of
the process execution. For example, users are not able to provide the input to a
Web service at runtime, they are not able to cancel the process execution, or they
are not able to have some intermediary output from a Web service. However,
many Web service composition scenarios require user interactions [3]. These user
interactions can be classied into four types [8]:
Research supported by la Wallonie.
Data input interaction represents the fact that the user provides data to a
Web service at runtime. For example, a user provides a licence Number to a
car renting Web service;
Data output interaction represents the fact that a Web service composition
makes data available to the user. For example, the trip scheduling Web
service composition presents the car renting price to the user;
Data selection represents the fact that the user can select a data from a set
of data. For example, a user can make a choice between ight or driving as
a means of transportation;
Interaction event represents an indicator that a user interaction has been
carried out. For example, a cancellation is done by a user during the Web
service composition execution (by pressing a button for example).
The difference between the data interaction types (input, output and selection)
and the interaction event is that the data interaction types allow changing the
data flow of the composition, while an interaction event changes only the control
flow of the composition regardless of the data (e.g. the cancelling process is done
regardless of the data).
Unfortunately, the BPEL language does not support such types of user inter-
action. For this reason, BPEL meta-model needs to be extended to express user
interactions in a Web service composition. In addition, a user interface for the
composition needs to be developed in order to help the user to interact with the
Web services at runtime.
In this work, we propose a BPEL extension for the user interactions expres-
sion. This extension is called UI-BPEL (User Interaction Business Process Ex-
ecution Language). UI-BPEL supports the expression of the four types of the
user interactions explained above by introducing new BPEL elements: (1) new
BPEL activities (DataInputUI, DataOutputUI, DataSelectionUI) to express the
user data interactions; (2) a new type of BPEL event (InteractionEventUI) to
express the interaction event; (3) an extension of BPEL's Pick and Scope
activities that support the new InteractionEventUI. The main objective of this
extension is to allow the generation of a user interface for the Web service compo-
sition based on the described user interactions. This generation is performed by
transforming the UI-BPEL user interaction elements to specific user interface
components. For example, transforming a DataInputUI to a text box, trans-
forming a DataOutputUI to a label, and transforming a DataSelectionUI to a
combo box.
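As a rough sketch of this transformation step (the dictionary-based representation of UI-BPEL activities and the widget names are simplifications assumed here, not the actual UI-BPEL or UsiXML vocabulary), the mapping could look as follows:

```python
# Mapping from UI-BPEL user-interaction elements to abstract UI components.
UI_MAPPING = {
    "DataInputUI":     "text box",
    "DataOutputUI":    "label",
    "DataSelectionUI": "combo box",
}

def generate_abstract_ui(process_activities):
    """Walk the UI-BPEL activities and emit one abstract widget per
    user-interaction element; ordinary BPEL activities produce no widget."""
    widgets = []
    for activity in process_activities:
        widget = UI_MAPPING.get(activity["type"])
        if widget:
            widgets.append({"widget": widget, "variable": activity.get("variable")})
    return widgets

# Example: a fragment of the purchase order process.
activities = [
    {"type": "DataInputUI", "variable": "purchaseOrder"},
    {"type": "Invoke", "variable": None},                 # plain BPEL activity
    {"type": "DataSelectionUI", "variable": "shipper"},
    {"type": "DataOutputUI", "variable": "finalPrice"},
]
print(generate_abstract_ui(activities))
```

The concrete rendering of each abstract widget would then depend on the user context and platform, as described below.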
UI-BPEL is one of several efforts to extend BPEL in order to address the expres-
sion of user interactions in a Web service composition. An example of
such extensions is BPEL4People [3], which introduces user actors into a Web
service composition by defining a new type of BPEL activities to specify user
tasks. However, this extension focuses only on the user task and does not deal
with the design of a user interface for the Web service composition. Another
example of BPEL extensions that addresses the user interaction is BPEL4UI
(Business Process Execution Language for User Interface) [1]. BPEL4UI extends
the Partner Link part of BPEL in order to allow the definition of a binding
between BPEL activities and an existing user interface. This user interface is
developed separately from the composition instead of being generated.
Figure 1 shows the difference between our approach (UI-BPEL) and the BPEL4UI
approach. The top side of the figure depicts our approach that proposes to ex-
tend BPEL with user interactions so that a user interface for the Web service
composition can be generated. The generated user interface is described in two
abstraction levels: first, we propose to generate, from a UI-BPEL process, an
abstract user interface, which is independent of any interaction modality (e.g.
graphical input, vocal output) and computing platform (e.g. PC, smart phone)
[4]. We then generate a concrete user interface based on the user context and by
taking into account a set of user interface usability criteria [7]. This approach
is compliant with the existing user interface description languages (like UsiXML
[4]), which describe a user interface at dierent abstraction levels. The bottom
side of Figure 1 shows the BPEL4UI approach that proposes to extend the Part-
ner Link of BPEL in order to link the composition with an existing concrete user
interface (HTML/JavaScript). This approach does not allow the generation of
a user interface adapted to the user context (user preference, user environment,
and user platform) and the usability criteria (e.g., the interface should respect the
size of the device screen).
In the remainder of this paper, we focus on the description of the UI-BPEL.
In section 2, we show a Web service composition scenario that requires user
interactions. We present, in Section 3, an overview of the UI-BPEL meta-model
with an illustrative example. Next, in Section 4, we present an extension of the
Eclipse BPEL editor that supports the UI-BPEL. Finally, we conclude this paper
with a discussion about the generation of the user interface from the proposed
extension and our future works.
2 Scenario
In this section we introduce the scenario of the purchase order process, which
requires user interactions. A customer requests a purchase order by providing
the necessary data such as the item, and the customer address (user interaction:
data input interaction). Upon receiving a request from the customer, the initial
price of the order is calculated and a shipper is selected
simultaneously. The shipper selection needs to be made by the customer who
selects a shipper from a list of available shippers (user interaction: data selec-
tion). When the two activities are completed, the final price is calculated and
a purchase order needs to be presented to the customer (user interaction: data
output interaction). If the customer accepts his purchase order, she/he selects
a payment method either by cash or by credit card (user interaction: data se-
lection). There is a discount if the customer pays by cash. Next, a bill is shown
to the customer (user interaction: data output interaction). Finally, the customer
can cancel her/his purchase order request before receiving the receipt of the
payment (user interaction: interaction event).
Figure 2 illustrates the purchase order process expressed using the Eclipse BPEL
Designer graphical notation [2]. BPEL does not support any type of user
interaction required by the scenario. For example, BPEL cannot express the fact
that the customer needs data interaction to provide
the order. Moreover, BPEL cannot express the fact that the process can make
the purchase order available to the customer using data output interaction. In
addition, BPEL cannot describe the data interaction that allows the customer
to choose a shipper and a payment method. Finally, BPEL does not support the
fact that the customer has the ability to cancel the process before the payment
is received.
In the next section, we propose an extension of BPEL that deals with the user
interactions required by the purchase order process scenario. The user interaction
specification helps to generate a user interface for the purchase order process.
3 UI-BPEL
In this section, we present our BPEL extension (called UI-BPEL), which addresses
the issue of expressing user interaction in BPEL.
DataInputUI is a class that represents the data input activity. This activity
is similar to the BPEL Receive activity, since it suspends the composition
process execution while waiting for an input from the user. However, unlike
the BPEL Receive activity, the process waits for an event where data is
provided, instead of a message from another Web service. This type of event
will be explained afterwards.
DataOutputUI is a class that represents the data output activity. This
activity specifies which variable contains the data to be presented to the user.
DataSelectionUI is a class that represents the data selection activity. This
activity allows the user to select one value (or a subset of values) from the set of
values of a specific variable. The number of selectable values can be defined
by using the minCardinality property (at least how many values must be selected)
and/or the maxCardinality property (at most how many values may be
selected). Like DataInputUI, the DataSelectionUI
activity suspends the execution of the process until receiving an event of
selecting data, as we will explain afterwards.
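As a reading aid only, the following Python sketch mirrors the three activity classes just described; it is an illustrative rendering of the meta-model, not part of UI-BPEL itself (which is an XML-based BPEL extension), and all attribute names other than minCardinality and maxCardinality are assumptions.

```python
# Illustrative sketch only: a Python rendering of the UI-BPEL activity classes
# described above. Attribute names other than minCardinality/maxCardinality
# are assumptions, not the actual UI-BPEL meta-model.
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserInteractionActivity:
    name: str
    variable: str  # BPEL variable holding the data shown to / provided by the user


@dataclass
class DataInputUI(UserInteractionActivity):
    """Suspends the process until the user provides the data (cf. BPEL Receive)."""


@dataclass
class DataOutputUI(UserInteractionActivity):
    """Presents the content of `variable` to the user."""


@dataclass
class DataSelectionUI(UserInteractionActivity):
    """Suspends the process until the user selects value(s) from `variable`."""
    minCardinality: int = 1            # at least how many values must be selected
    maxCardinality: Optional[int] = 1  # at most how many values may be selected
```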
UI-BPEL proposes a new type of event to process the user interaction. These
events help to indicate when a user interaction is carried out, so that a specific
action can be launched. UI-BPEL defines the user interaction events as follows:
UI-BPEL also defines some useful new classes, for example:
Figure 4 depicts the purchase order process expressed using both the graphical
representation (extended Eclipse BPEL Designer notation [2]) and the
XML representation of UI-BPEL. The figure shows that UI-BPEL supports the
user interaction types required by the scenario. For example:
UI-BPEL expresses the fact that the customer needs data input interaction to
provide the order data by using the DataInputUI activity (Figure 4, line 4).
This activity launches the process execution when the input is provided;
UI-BPEL expresses the fact that the customer can select one shipper by using
DataSelectionUI (Figure 4, line 14). In order to process this data interaction,
the composed event Not(onShipperSelection, 5 min) is listened for, so that, if
no shipper is selected within 5 minutes, the process is cancelled (Figure 4,
line 18);
UI-BPEL expresses the fact that the customer can cancel the request for a
purchase order by using a ScopeUI with an OnCancelEvent (Figure 4, lines
5-11);
The DataOutputUI presenting an order and the DataSelectionUI Pay-
mentMethod can be gathered on the same user interface component, so that
the event Conjunction(OnDataOutput, OnDataSelection) will be raised.
This event notifies that the user has confirmed the order (OnDataOutput) and
unblocks the process that waits for a payment method selection (On-
DataSelection).
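The composed events used in these examples, Not(onShipperSelection, 5 min) and Conjunction(OnDataOutput, OnDataSelection), can be paraphrased as follows; this is a minimal sketch under an assumed semantics, not the event model defined by UI-BPEL.

```python
# Illustrative sketch of the composed events used in the example
# (Not(onShipperSelection, 5 min) and Conjunction(OnDataOutput, OnDataSelection)).
# This is an assumed semantics for illustration, not the UI-BPEL definition.
import threading


class Not:
    """Fires its handler if the wrapped event has not occurred within `timeout_s` seconds."""
    def __init__(self, timeout_s, on_timeout):
        self._timer = threading.Timer(timeout_s, on_timeout)
        self._timer.start()

    def event_occurred(self):
        self._timer.cancel()  # the user acted in time, cancel the timeout action


class Conjunction:
    """Fires its handler once both constituent events have occurred."""
    def __init__(self, on_both):
        self._seen = set()
        self._on_both = on_both

    def event_occurred(self, name):
        self._seen.add(name)
        if {"OnDataOutput", "OnDataSelection"} <= self._seen:
            self._on_both()


# Usage: cancel the purchase order process if no shipper is selected within 5 minutes.
guard = Not(timeout_s=5 * 60, on_timeout=lambda: print("cancel process"))
# ... when the user selects a shipper: guard.event_occurred()
```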
Step 2: a concrete user interface [7] is generated from the abstract user
interface. The concrete user interface is adapted to a specific user context (user
preference, user environment, and user platform). For example, for a visually
handicapped person, the output abstract component will be transformed to
vocal output. The concrete user interface also takes into account a set of
user interface usability criteria. For example, the interface should respect
the size of the device screen.
Our future work includes the development of two transformation methods for
the two steps of the user interface generation described above.
References
1. Daniel, F., Soi, S., Tranquillini, S., Casati, F., Heng, C., Yan, L.: From people to
services to UI: Distributed orchestration of user interfaces. In: Hull, R., Mendling,
J., Tai, S. (eds.) BPM 2010. LNCS, vol. 6336, pp. 310–326. Springer, Heidelberg
(2010)
2. Eclipse BPEL Project: Eclipse BPEL Designer (2011),
http://www.eclipse.org/bpel/
3. Kloppmann, M., Koenig, D., Leymann, F., Pfau, G., Rickayzen, A., von Riegen, C.,
Schmidt, P., Trickovic, I.: WS-BPEL Extension for People – BPEL4People. Joint White
Paper, IBM and SAP (2005)
4. Limbourg, Q., Vanderdonckt, J., Michotte, B., Bouillon, L., López-Jaquero, V.:
USIXML: A language supporting multi-path development of user interfaces. In:
Bastide, R., Palanque, P.A., Roth, J. (eds.) DSV-IS 2004 and EHCI 2004. LNCS,
vol. 3425, pp. 200–220. Springer, Heidelberg (2005)
5. OASIS: Web Services Business Process Execution Language (2007)
6. Rao, J., Su, X.: A survey of automated web service composition methods. In:
Cardoso, J., Sheth, A. (eds.) SWSWPC 2004. LNCS, vol. 3387, pp. 43–54. Springer,
Heidelberg (2005)
7. Seffah, A., Gulliksen, J., Desmarais, M.C. (eds.): Human-Centered Software Engi-
neering – Integrating Usability in the Software Development Lifecycle. HCI Series,
vol. 8, ch. 6, pp. 109–140. Springer, Heidelberg (2005)
8. Tofan, S., Pradais, A., Buraga, S.: A study regarding the abstract specification of
the user interface by using USIXML and UIML languages. Romanian Journal of
Human-Computer Interaction (RoCHI 2009), 31–34 (2009)
9. Wu, E., Diao, Y., Rizvi, S.: High-performance complex event processing over
streams. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 407–418 (2006)
10. Zeng, L., Benatallah, B., Ngu, A.H.H., Dumas, M., Kalagnanam, J., Chang,
H.: QoS-aware middleware for web services composition. IEEE Trans. Software
Eng. 30(5), 311–327 (2004)
Semantics-Enabled Web API Organization and
Recommendation
1 Introduction
Recently, research attention has been progressively shifting from the develop-
ment of Web applications from scratch to their composition starting from a
huge number of components independently developed by third parties [7]. These
components are made available through their APIs to serve the development
of situational applications. However, integration of Web APIs often means in-
tegration of UIs (consider, for instance, the wide use of Google Maps in Pro-
grammableWeb.com, an on-line registry of about 3000 Web APIs). UIs can gen-
erate/react to events that require synchronization with the other Web APIs in
the same application. Moreover, Web application development is hampered by
the semantic heterogeneity of Web API descriptions (in terms of I/O variables,
operations, events) and by their increasing number.
The development of Web applications from third-party Web APIs can be
shortened by providing a model that abstracts from the implementation aspects of
each Web API and supports their selection. Some efforts, such as WADL
(Web Application Description Language), are being developed for RESTful
services with the aim of being the counterpart of the WSDL standard for SOAP Web
services, and tools such as SWEET [9] guide providers in giving a structured
representation of their Web APIs (by using the hRESTS formalism) and in adding
semantics by referencing concepts in publicly available domain ontologies through
the MicroWSMO language [8]. This still does not avoid problems during Web
application development due to the huge number of available Web APIs. In this
paper we propose a framework to support easy Web application development.
The framework provides Web API organization based on automated matching
techniques.
2 Motivating Scenario
Let us consider Dante, a Web designer who works for an Italian multimedia and
entertainment publishing company and must quickly design a Web application
to allow users to find the company shops, in order to promote coming-soon
books/DVDs. This is an example of a situational application, targeted on the
company's specific requirements and potentially useful for a short period of time,
that is, as long as the sales promotion lasts. Dante's main tasks concern the
selection and combination of suitable Web APIs which already implement some of
the required functionalities, such as Web APIs to obtain information about company
shops or Web APIs to visualize on a map the location of the shops which have
been found. Dante proceeds step by step by selecting Web APIs and wiring them
to obtain the final Web application as quickly as possible. Dante specifies the
Web API he's looking for, for instance in terms of desired categories, operations
and inputs/outputs. A list of available API descriptions should be proposed,
ranked with respect to the degree of match with categories, operations and I/O
names specified by Dante. Less relevant APIs must be filtered out, to properly
reduce the search space. When Dante has already selected a Web API, other
APIs that could be coupled with the selected one should be proactively sug-
gested to him. The suggested APIs should be ranked with respect to the degree
of coupling with the selected one. Coupling can be evaluated on the basis of
correspondences between events and operations of Web API descriptions. Let us
suppose that Dante chooses a Web API that, given the book/DVD title, returns
the addresses of company shops where the book/DVD is available. Other Web
APIs that can be wired with the selected one, for example Web APIs that visu-
alize points on a map by specifying the addresses to display, can be suggested to
Dante. If Dante wants to substitute one of the selected Web APIs, the system
should suggest a list of candidate APIs as alternatives, ranked with respect to
their similarity with the API to be substituted. API similarity can be evaluated
on the basis of similarity between operation names, inputs and outputs of Web
API descriptions.
The framework we propose has been designed to support Dante in the de-
scribed motivating scenario. The framework is based on semantic annotation of
Web APIs provided through available tools (e.g., SWEET [9]) and is composed
of three main elements.
Semantics-enabled Web API model. A collection of Web API semantic descriptors
is extracted from semantically annotated Web APIs. Descriptors abstract
from underlying concrete implementations of the Web APIs. Each semantic
descriptor has a reference to the URI of the original API.
Web API registry model. Web API semantic descriptors are organized in a
semantics-enabled registry through semantic links established by applying
automated matching techniques.
Selection patterns. Semantic links are exploited for supporting: (i) proactive
suggestion of Web API descriptors ranked with respect to their similarity
with the Web application designer's requirements; (ii) interactive support
to the designer for the composition of the Web application, according to an
exploratory perspective.
descriptions: (i) inputs and outputs; (ii) operations, usually associated with but-
tons or links on the Web API graphical interface; (iii) events, to model users'
interactions with the Web API interface. Therefore, SWEET enables the se-
mantic annotation of the hRESTS structure: (i) APIs are classified with respect
to categories, which are taken from standard taxonomies available on the Web
(identified by a taxonomy URI); (ii) operation names, inputs and outputs, and
event outputs are annotated with concepts, extracted from domain ontologies
and identified by a concept URI. Domain ontologies are built by domain experts
and can be designed ad hoc for a particular application or can be made available
on the Web for general purposes. If a suitable ontology is not available, it must
be created first, for example using common editors (such as Protégé OWL). No
commitment is made on a particular ontology formalism. Annotation and classi-
fication of the Web APIs is performed according to the MicroWSMO [8] notation
extended with semantically annotated events.
From each semantically annotated Web API, the following semantic descriptor
SDi is extracted:
SDi = ⟨CATi, OPi, EVi⟩    (1)
where CATi is a set of categories, OPi is a set of operations, and EVi is a set
of events. Each operation opk ∈ OPi is described by the operation name opk,
the operation inputs IN(opk), and the operation outputs OUT(opk), which are
annotated with concepts taken from the domain ontologies. Each event evh ∈
EVi is described by the set of ontological concepts which annotate the event
outputs OUTev(evh) as well. An event of a Web API can be connected to an
operation of another Web API in a publish/subscribe-like mechanism. An event-
operation pair is represented in the model through an activation relationship,
which is equipped with the set of correspondences between event outputs, changed
when the event is raised, and inputs of operations triggered by the event occurrence
(see Figure 1). An activation relationship from an event evih of a descriptor SDi
to an operation opkj of another descriptor SDj is defined as follows:
of evi ∈ EVi and the inputs of opj ∈ OPj (see Table 1). Functional similarity and
coupling have been detailed in [2].
Let SD be the set of semantic descriptors, OP(SD) (resp., EV(SD)) the over-
all set of operations (resp., events) over all the semantic descriptors in SD, and Mc the
set of candidate correspondences between annotated event outputs and operation
inputs detected during the functional coupling evaluation. The Web API registry
is defined as a 4-tuple ⟨SD, SL, CL, Mc⟩, where SL ⊆ SD × SD × [0, 1] is the set of
similarity links between descriptors and CL ⊆ EV(SD) × OP(SD) × [0, 1] is the
set of coupling links between event/operation pairs.
where SDc is the selected descriptor and Condc is defined as catj ∈ CAT(SDi) ∧
CouplIO(..), with catj the category which the suggested descriptors must belong to.
Note that this implies the presence of a coupling link from SDc to SDi in the
Web API registry. A ranking ⪰c is defined over the set
SD of descriptors in the registry such that SDi ⪰c SDj if CouplIO(SDc, SDi) ≥
CouplIO(SDc, SDj). For a coupled descriptor that has been selected to be added
to the Web application, semantic correspondences among I/Os are suggested to
the designer. With reference to the motivating scenario, the completion pattern
is used to suggest to Dante Web API descriptors that can be coupled with the
Web API descriptor that, given the book/DVD title, returns the addresses of
company shops where the book/DVD is available, for instance, Web APIs that
visualize addresses on a map.
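To make the registry and the completion pattern concrete, the following sketch shows one possible in-memory representation and ranking; it simplifies the model (coupling degrees are aggregated per descriptor pair rather than per event/operation pair) and all concrete names and values are illustrative assumptions.

```python
# Simplified sketch of the semantics-enabled Web API registry and of the
# completion pattern ranking. Coupling/similarity degrees are assumed to be
# precomputed by the matching techniques of [2]; concrete values are made up.
from dataclasses import dataclass, field


@dataclass
class SemanticDescriptor:
    name: str
    categories: set                                  # CATi
    operations: set = field(default_factory=set)     # OPi (operation names)
    events: set = field(default_factory=set)         # EVi (event names)


# Registry = (SD, SL, CL, Mc): descriptors plus similarity and coupling links.
descriptors = {
    "FindShops": SemanticDescriptor("FindShops", {"shopping"}, {"find"}, {"selectedShop"}),
    "MapViewer": SemanticDescriptor("MapViewer", {"maps"}, {"showAddresses"}, set()),
}
similarity_links = {("FindShops", "ShopLocator"): 0.8}      # SL (descriptor pairs)
coupling_links = {("FindShops", "MapViewer"): 0.9,          # CL, here aggregated per
                  ("FindShops", "WeatherAPI"): 0.2}          # descriptor pair


def completion_candidates(selected, category=None):
    """Rank descriptors coupled with `selected`, optionally filtered by category."""
    ranked = []
    for (src, dst), degree in coupling_links.items():
        if src != selected:
            continue
        if category and dst in descriptors and category not in descriptors[dst].categories:
            continue
        ranked.append((dst, degree))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(completion_candidates("FindShops"))   # [('MapViewer', 0.9), ('WeatherAPI', 0.2)]
```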
Similarly, the Substitution Pattern, denoted with ⪰s, is formally defined as
follows:
In the upper part, the Web interface enables the designer to search for available Web APIs
by specifying their category or their name (Search By Category and Search
By Name). On the left, panels to browse the semantics-enabled Web API registry
are shown. The selected descriptor SDi is highlighted as a circle in the center of
the Similarity links panel (e.g., the FindShops descriptor in Figure 2); all the
descriptors SDj related to SDi through a similarity link are displayed as circles
around SDi; the dimension of each circle is proportional to the SimIO(SDi, SDj)
value. The selected descriptor SDi is also highlighted as a pentagon in the center
of the Coupling links panel; other descriptors SDj coupled with SDi are shown
as hexagons around the pentagon; the dimension of each hexagon is proportional to
the CouplIO(SDi, SDj) value (see, for example, the MapViewer descriptor). In
the canvas on the right, the Web interface enables the designer to drag Web API
descriptors and wire them to design the Web application. Each descriptor is rep-
resented as a rectangle containing the descriptor events (e.g., the selectedShop
event for the FindShops descriptor) and operations (e.g., the find operation
for the FindShops descriptor). By pushing the Self connections button, the
system suggests activation relationships among descriptors which present I/O
correspondences. By clicking on a connection in the canvas, the designer can
visualize the set of possible I/O correspondences, which he/she can set or reset.
The designer can also introduce his/her own activation relationships (Add
activations facility).
Preliminary experiments have been performed to validate the effectiveness of
the selection patterns. We used a 32-bit Windows machine with a 2.10 GHz
AMD Athlon X2 Dual-Core CPU, 1 MB L2 cache, and 4 GB of RAM memory.
[Figure: precision/recall curves for the Completion (left) and Substitution (right) selection patterns.]
References
1. Abiteboul, S., Greenshpan, O., Milo, T.: Modeling the Mashup Space. In: Proc. of
the Workshop on Web Information and Data Management, pp. 87–94 (2008)
2. Bianchini, D., Antonellis, V.D., Melchiori, M.: A Semantic Framework for Collab-
orative Enterprise Knowledge Mashup. In: D'Atri, A., Ferrara, M., George, J.F.,
Spagnoletti, P. (eds.) Information Technology and Innovation Trends in Organiza-
tions, pp. 117–124. Physica Verlag, Heidelberg (2011)
3. Bianchini, D., Antonellis, V.D., Melchiori, M.: Flexible Semantic-based Service
Matchmaking and Discovery. World Wide Web Journal 11(2), 227–251 (2008)
4. Daniel, F., Casati, F., Benatallah, B., Shan, M.: Hosted universal composition:
Models, languages and infrastructure in mashArt. In: Laender, A.H.F., Castano,
S., Dayal, U., Casati, F., de Oliveira, J.P.M. (eds.) ER 2009. LNCS, vol. 5829, pp.
428–443. Springer, Heidelberg (2009)
5. Gomadam, K., Ranabahu, A., Nagarajan, M., Sheth, A.P., Verma, K.: A Faceted
Classification Based Approach to Search and Rank Web APIs. In: ICWS, pp. 177–184 (2008)
6. Greenshpan, O., Milo, T., Polyzotis, N.: Autocompletion for Mashups. In: Proc. of
the 35th Int. Conference on Very Large DataBases (VLDB 2009), Lyon, France,
pp. 538–549 (2009)
7. Hoyer, V., Fischer, M.: Market overview of enterprise mashup tools. In: Bouguettaya,
A., Krueger, I., Margaria, T. (eds.) ICSOC 2008. LNCS, vol. 5364, pp. 708–721.
Springer, Heidelberg (2008)
8. Kopecky, J., Vitvar, T., Fensel, D.: hRESTS & MicroWSMO. Tech. rep., SOA4ALL
Project, Deliverable D3.4.3 (2009)
9. Maleshkova, M., Pedrinaci, C., Domingue, J.: Semantic annotation of Web APIs
with SWEET. In: Proc. of the 6th Workshop on Scripting and Development for
the Semantic Web (2010)
10. Ngu, A.H.H., Carlson, M.P., Sheng, Q.Z., Young Paik, H.: Semantic-based mashup
of composite applications. IEEE T. Services Computing 3(1), 2–15 (2010)
11. van Rijsbergen, C.J.: Information Retrieval. Butterworth (1979)
Preface to MORE-BI 2011
Business intelligence (BI) systems gather, store, and process data to turn it into informa-
tion that is meaningful and relevant for decision-making in businesses and organizations.
Successful engineering, use, and evolution of BI systems require a deep understanding
of the requirements of decision-making processes in organizations, of the kinds of infor-
mation used and produced in these processes, of the ways in which information can be
obtained through acquisition and reasoning on data, of the transformations and analyses
of that information, of how the necessary data can be acquired, stored, cleaned, how its
quality can be improved, and of how heterogeneous data can be used together.
The first International Workshop on Modeling and Reasoning for Business Intelli-
gence (MORE-BI 2011) was organized and collocated with the 30th International Con-
ference on Conceptual Modeling (ER 2011), held in Brussels, Belgium, to stimulate
discussions and contribute to the research on the concepts and relations relevant for the
various steps in the engineering of BI systems, the conceptual modeling of requirements
for BI systems, of the data used and produced by them, of the transformations and anal-
yses of data, and associated topics to which researchers and practitioners of conceptual
modeling can contribute, in the aim of constructing theoretically sound and practically
relevant models and reasoning facilities to support the engineering of BI systems.
The call for papers attracted 23 abstracts, of which 22 resulted in full submissions.
The submissions came from 64 authors from 14 countries and 4 continents. The pro-
gram committee consisting of 31 researchers conducted three reviews of each submis-
sion and selected 7 submissions for presentation and discussion at the workshop, an
acceptance rate of 32%.
We wish to thank all authors who have submitted their research to MORE-BI 2011.
We are grateful to our colleagues in the steering committee for helping us define the
topics and scope of the workshop, our colleagues in the program committee for the
time invested in carefully reviewing the submissions under a very tight schedule, the
participants who have helped make this an interesting event, and the local organizers
and workshop chairs of ER 2011.
We hope that you find the workshop program and presentations of interest to research
and practice of business intelligence, and that the workshop has allowed you to meet
colleagues and practitioners focusing on modeling and reasoning for business intelli-
gence. We look forward to receiving your submissions and to meeting you at the next edition
of the International Workshop on Modeling and Reasoning for Business Intelligence.
Formal Concept Analysis for Qualitative Data Analysis over Triple Stores
1 Introduction
Business Intelligence (BI) solutions provide different means like OLAP, data
mining or case-based reasoning to explore data. Standard BI means are usually
designed to work with numerical data, thus they provide a quantitative analy-
sis of the data (aka number crunching) based on mathematical statistics. In
fact, classical BI examples show accounting, finance, or some other calculation-
heavy subject [10]. To some extent, though arguably oversimplified, one can
understand BI as acting on lists or tables filled with numbers.
Compared to number crunching, Formal Concept Analysis (FCA) [3] provides
a complementary approach. The starting point of FCA are crosstables (called
formal contexts), where the rows stand for some objects, the columns for some
attributes, and the cells (intersections of rows and columns) carry the binary
information whether an attribute applies to an object (usually indicated by a
cross) or not. Based on this crosstable, the objects are clustered into meaningful
sets. These clusters form a hierarchy, which can be visually displayed, e.g., by
a so-called Hasse diagram. A short introduction to FCA, as needed for this
paper, is provided in the next section.
A general overview of the benefits of FCA in information science is provided
by Priss in [8]. Relevant for this paper are the relationships between FCA and
both Business Intelligence (BI) and Semantic Technologies (ST).
With respect to BI, FCA can, for example, be considered as a data mining tech-
nology, particularly for mining association rules [7]. More relevant to this paper
Fig. 1. Formal context of customers (left) and its concept lattice (right). [The context has the objects mary, john, erika, and max and the attributes female, male, makeUp, books, modelCars, and PCgames.]
For instance, the last row states that Max is male and that he is interested in
model cars, books, and PC games, but not in make-up items.
Given such a formal context, the first step for analyzing this context is usually
computing the formal concepts of this context, which are natural clusterings
of the data in the context. A formal concept is a pair consisting of an object set
A and an attribute set B such that the objects in A share the attributes in B,
and B consists of exactly those attributes that the objects in A have in common.
The object set A is called the extent, and the attribute set B is called the intent
of the formal concept (A, B).
Example 2. Consider the formal context given in Ex. 1. It has nine formal con-
cepts. One of them is ({max}, {male, modelCars, PCgames, books}), with the
extent {max} and the intent {male, modelCars, PCgames, books}. Note that
max is male and has exactly the interests in modelCars, PCgames, and books. On
the other hand, max is the only male person with these interests. Another (less
trivial) formal concept is ({john, max}, {male, modelCars, PCgames}). Both
john and max are male and have modelCars and PCgames as (common) inter-
ests, and they are indeed the only male persons with (at least) these interests.
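The definition of formal concepts translates directly into a small computation. The sketch below enumerates the concepts of the customer context; since the crosses for mary and erika are not fully spelled out in the text, their rows are a plausible reconstruction consistent with the concepts discussed (with this reconstruction the context indeed has nine formal concepts), and the enumeration is the naive one implied by the definition.

```python
# Naive enumeration of the formal concepts of the customer context.
# For each attribute subset B we compute its extent (objects having all of B)
# and the intent of that extent (attributes shared by those objects); each such
# (extent, intent) pair is a formal concept, and all concepts arise this way.
# The rows for mary and erika are a reconstruction, not given verbatim in the text.
from itertools import combinations

context = {  # object -> set of attributes (crosstable of Fig. 1, reconstructed)
    "mary":  {"female", "makeUp", "PCgames"},
    "john":  {"male", "modelCars", "PCgames"},
    "erika": {"female", "makeUp", "books"},
    "max":   {"male", "modelCars", "PCgames", "books"},
}
attributes = set().union(*context.values())

def extent(attrs):
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(objs):
    if not objs:
        return frozenset(attributes)
    return frozenset.intersection(*(frozenset(context[o]) for o in objs))

concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        ext = extent(set(attrs))
        concepts.add((ext, intent(ext)))

for ext, inte in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))   # prints the nine formal concepts
```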
Once all formal concepts of a context are obtained, one orders them w.r.t.
the inclusion of their extents (equivalently, inverse inclusion of their intents).
For example, the two formal concepts of the above given example are ordered
that way. This ordering gives a complete lattice (i.e., a hierarchy where any
two elements have – like in trees – a least upper bound and – unlike in trees – a
greatest lower bound), called the concept lattice of the context. A concept lattice
contains all information represented in a formal context, i.e., we can easily read
off the attributes, objects and the incidence relation of the underlying context.
Moreover, the concept lattice can be visualized, which makes it easier to see the
formal concepts of a context and the interrelations among them. Thus it helps to
understand the structure of the data in the formal context, and to query the
knowledge represented in the formal context.
The nodes of a concept lattice represent the formal concepts of the underlying
context. In order to improve readability of the lattice, we avoid writing down
the extent and intent of every single node. Instead, we label the nodes with
attribute and object names in such a way that every name appears only once
in the lattice. In this labelling, the intent of the formal concept corresponding
to a node can be determined by the attribute names that can be reached by
the ascending lines, and its extent can be determined by the object names that
can be reached by the descending lines. For instance, consider the concept lattice
in Figure 1 that results from the formal context in Example 1. The attribute
names are written in boxes with gray background and object names are written
in boxes with white background. The intent of the formal concept marked with
the attribute name books is {books}, since there is no other attribute name that
can be reached by an ascending line, and its extent is {max, erika}, since these
are the only two object names that can be reached by a descending line from it.
Similarly, the concept marked with the attribute names modelCars and male,
and the object name john, has the intent {modelCars, male, PCgames} and the
extent {john, max}.
FCA, as it has been described so far, can only deal with binary attributes.
For real data, the situation is usually different: attributes assign specific values
(which might be strings, numbers, etc.) to data. For example, RDF triples (s, p, o)
are exactly of this form: the attribute p – from now on we will use the RDF term
property instead – assigns the value o to the entity s. In FCA, a process called
conceptual scaling is used to deal with this issue.
Let a specific property be given with a set of possible values. A conceptual scale
is a specific context with the values of the property as formal objects. The choice
of the formal attributes of the scale is a question of the design of the scale: the
formal attributes are meaningful attributes to describe the values; they might be
different entities or they might even be the values of the property again. To
exemplify conceptual scaling, we reuse a toy example from [15], given by the
following table with the two many-valued properties sex and age. Note that
empty cells are possible as well.

           sex   age
  Adam     m     21
  Betty    f     50
  Chris          66
  Dora     f     88
  Eva      f     17
  Fred     m
  George   m     90
  Harry    m     50
Next, two conceptual scales for the properties sex and age and their line
diagrams are provided.
[Tables and line diagrams of the two conceptual scales: S1 for the property sex, with the values m and f as formal objects, and S2 for the property age, with the age values 17, 21, 50, 66, 88, 90 as formal objects and threshold attributes such as < 18, < 40, and > 65.]
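As a concrete illustration of conceptual scaling, the following sketch derives a binary (scaled) context from the many-valued property age of the toy example; the threshold attributes chosen here are an assumption for illustration and need not coincide exactly with scale S2.

```python
# Conceptual scaling of the many-valued property "age" from the toy example:
# each age value becomes a formal object of the scale, and the scale's binary
# attributes are threshold predicates. The thresholds are an illustrative
# assumption, not necessarily those of scale S2 in the paper.
ages = {"Adam": 21, "Betty": 50, "Chris": 66, "Dora": 88, "Eva": 17, "George": 90, "Harry": 50}
# Fred has no age value (empty cell) and therefore gets no cross in this scale.

scale_attributes = {
    "< 18":  lambda v: v < 18,
    "< 40":  lambda v: v < 40,
    ">= 65": lambda v: v >= 65,
    ">= 80": lambda v: v >= 80,
}

# The derived (scaled) context: person x scale attribute, with a cross wherever
# the predicate holds for that person's age value.
scaled_context = {
    person: {name for name, pred in scale_attributes.items() if pred(age)}
    for person, age in ages.items()
}

for person, attrs in scaled_context.items():
    print(f"{person:7s} {sorted(attrs)}")
# e.g. Eva -> ['< 18', '< 40'], Dora -> ['>= 65', '>= 80']
```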
Both points are important for ToscanaJ, which is discussed in the next section.
3 ToscanaJ
There is a variety of software for FCA available. Most of these tools support the cre-
ation of contexts from scratch and the subsequent computation and display of the
corresponding concept lattices. Contrasting this approach, Elba and ToscanaJ
are a suite of mature FCA tools which allow the user to query and navigate through
data in databases. They are intended to be a Conceptual Information System
(CIS). CISs are systems that store, process, and present information using
concept-oriented representations supporting tasks like data analysis, informa-
tion retrieval, or theory building in a human-centered way. Here, a CIS is an
FCA-based system used to analyze data stored in one table of an RDBMS.
Similar to other BI systems, in a CIS we have to distinguish between a design
phase and a run-time phase (aka usage phase), with appropriate roles attached
to the phases. In the design phase, a CIS engineer (an expert for the CIS)
together with a domain expert (who has limited knowledge of a CIS) develops
the CIS schema, i.e., those structures which will later on be used to access the
system. This schema consists of manually created conceptual scales. Developing
the scales is done with a CIS editor (Elba) and is usually a highly iterative process.
In the run-time phase, a CIS browser (ToscanaJ) allows a user to explore and
analyze the real data in the database with the CIS schema.
The original Elba/ToscanaJ suite has been developed to analyze data in a
relational table, i.e., a table in an RDBMS or an Excel file. We have extended the
suite in order to be able to access data in a triple store. This extended version of
the suite uses the Sesame framework4 for accessing a triple store and querying
the RDF data therein. It provides two ways of connecting to a triple store over
Sesame. One of them is over HTTP via Apache Tomcat5, the other one is over
the SAIL API6. Tomcat is an open source software implementation of the Java
4 See http://www.openrdf.org/doc/sesame2/system
5 See http://tomcat.apache.org/
6 See http://www.openrdf.org/doc/sesame2/system/ch05.html
4 Use Case
In order to evaluate our approach, we have used a dataset crawled from the
SAP Community Network (SCN). SCN contains a number of forums for SAP
users and experts to share knowledge, or get help on SAP topics and products.
The dataset we have used is taken from the forum Service-Oriented Architecture
(SOA), which contains 2600 threads and 10076 messages. The dataset is anno-
tated by the crawler using ontologies from the NEPOMUK project. The ontologies
used are described below, with short descriptions taken from the project website7.
NEPOMUK Information Element Ontology (NIE): The NIE Framework is
an attempt to provide unified vocabulary for describing native resources
available on the desktop.
NEPOMUK File Ontology (NFO): The NFO intends to provide vocabulary
to express information extracted from various sources. They include files,
pieces of software and remote hosts.
NEPOMUK Message Ontology (NMO): The NMO extends the NIE frame-
work into the domain of messages. Kinds of messages covered by NMO in-
clude Emails and instant messages.
NEPOMUK Contact Ontology (NCO): The NCO describes contact infor-
mation, common in many places on the desktop.
From these ontologies, our dataset uses the following classes as types:
nie#DataObject: A unit of data that is created, annotated and processed on
the user desktop. It represents a native structure the user works with. This
may be a le, a set of les or a part of a le.
nfo#RemoteDataObject: A file data object stored at a remote location.
nie#InformationElement: A unit of content the user works with. This is a
superclass for all interpretations of a DataObject.
nco#Contact: A Contact. A piece of data that can provide means to identify
or communicate with an entity.
nmo#Message: A message. Could be an email, an instant messaging message,
an SMS message, etc.
For analyzing the experience levels of the users of the SOA forum, we used the
Contact type above and created a scale based on the number of posts, number
of questions, and number of resolved questions provided in the data.
We have named users that have less than 50 posts newbie, users that have
more than 300 posts frequent, users that have more than 1000 posts pro,
users that have asked more than 310 questions curious, and people that have
resolved more than 230 questions problem solver. Note that this scale uses
different measures (number of posts, number of questions, number of answers).
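The thresholds just listed translate directly into scale attributes. The following sketch shows how a user's forum statistics map to the experience-level attributes; the input field names are assumptions, while the thresholds are the ones given above.

```python
# Mapping a forum user's statistics to the experience-level scale attributes
# described above (newbie, frequent, pro, curious, problem solver). The input
# field names are assumptions; the thresholds are the ones given in the text.
def experience_levels(posts, questions, resolved):
    levels = set()
    if posts < 50:
        levels.add("newbie")
    if posts > 300:
        levels.add("frequent")
    if posts > 1000:
        levels.add("pro")        # every pro is also frequent, as in the lattice
    if questions > 310:
        levels.add("curious")
    if resolved > 230:
        levels.add("problem solver")
    return levels


print(experience_levels(posts=1200, questions=15, resolved=400))
# {'frequent', 'pro', 'problem solver'}
```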
The concept lattice in Figure 2 shows the number of users with the mentioned ex-
perience levels. The diagram clearly displays the sub-/super-concept relationships
between the experience levels, which is one of the main distinguishing features
of visualizing data using concept lattices. E.g., we can read from the lattice that
7 http://www.semanticdesktop.org/ontologies
Fig. 2. Diagram of the scale based on number of posts, questions, and resolved questions. Fig. 3. Diagram of the scale based on number of points.
curious and professional users are both also frequent users, whereas problem
solvers and newbies are not.
Next, for analyzing experience levels based on the number-of-points infor-
mation in our dataset, we created another scale. This time, as labels we took the
contributor types that are officially defined by SCN as bronze, silver, gold and
platinum contributors, which have more than 250, 500, 1500 and 2500 points,
respectively. The concept lattice of this scale is shown in Figure 3. This scale is
a so-called ordinal scale, which means that the formal concepts are ordered as
a chain. This is also easily seen in the concept lattice of this scale. Obviously, a
user that has more than 2500 points also has more than 1500 points, and so on.
The above displayed concept lattices are separately informative about the
properties of forum users, i.e., the first one about experience levels based on
number of posts, questions and resolved questions, and the second one about
number of points. One of the most powerful techniques of FCA is to combine
such lattices to give a combined view of several lattices together, which
is called a nested line diagram. In its simplest form, a nested line diagram is a
concept lattice whose concepts are themselves also concept lattices. Nested line
diagrams allow the user to select a concept and zoom into it to see the lattice
nested in that concept. Figure 4 shows the nested line diagram of the diagrams
in Figures 2 and 3. Note that the outer diagram is actually the one in Fig-
ure 3. The four bigger circles correspond to the four types of contributors in
that figure. The inner diagrams are the diagram in Figure 2. Figure 5 shows an
excerpt of the nested diagram that corresponds to the node gold contributor,
and Figure 6 shows the inner diagram of this node. Note that the number of
users corresponding to the different levels of experience in this diagram differs from
that of the diagram in Figure 2. The reason is that we have now zoomed into the node
gold contributor, so the information in the inner diagram is restricted to the
gold contributors only. For instance, as seen in this diagram, there are no new-
bies that are gold contributors, which is quite natural. On the other hand, 79 of
the gold contributors are pro users. In ToscanaJ, and thus in our extension of it
to triple stores, one can nest an arbitrary number of diagrams and can browse
nested diagrams easily by zooming in and out.
Fig. 4. Nesting the above two scales. Fig. 6. Inner diagram of Fig. 5.
References
1. Becker, P., Hereth, J., Stumme, G.: ToscanaJ: An open source tool for quali-
tative data analysis. In: Duquenne, V., Ganter, B., Liquiere, M., Nguifo, E.M.,
Stumme, G. (eds.) Advances in Formal Concept Analysis for Knowledge Discovery
in Databases, FCAKDD 2002 (2002)
2. Dau, F., Klinger, J.: From Formal Concept Analysis to Contextual Logic. In:
Ganter, B., Stumme, G., Wille, R. (eds.) Formal Concept Analysis. LNCS (LNAI),
vol. 3626, pp. 81–100. Springer, Heidelberg (2005)
3. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations.
Springer, Berlin (1999)
4. Hereth, J.: Formale Begriffsanalyse und Data Warehousing. Master's thesis, TU
Darmstadt, Germany (2000)
5. Hereth, J.: Relational Scaling and Databases. In: Priss, U., Corbett, D., Angelova,
G. (eds.) ICCS 2002. LNCS (LNAI), vol. 2393, pp. 62–76. Springer, Heidelberg
(2002)
6. Hereth, J., Stumme, G., Wille, R., Wille, U.: Conceptual knowledge discovery – a
human-centered approach. Journal of Applied Artificial Intelligence (AAI) 17(3),
281–301 (2003)
7. Lakhal, L., Stumme, G.: Efficient Mining of Association Rules Based on Formal
Concept Analysis. In: Ganter, B., Stumme, G., Wille, R. (eds.) Formal Concept
Analysis. LNCS (LNAI), vol. 3626, pp. 180–195. Springer, Heidelberg (2005)
8. Priss, U.: Formal concept analysis in information science. Annual Review of Infor-
mation Science and Technology 40 (2005)
9. Roth-Hintz, M., Mieth, M., Wetter, T., Strahringer, S., Groh, B., Wille, R.: In-
vestigating SNOMED by formal concept analysis. In: Proceedings of ACM SIGMOD
Workshop on Research Issues in Data Mining and Knowledge Discovery (2000)
10. Scheps, S.: Business Intelligence for Dummies. John Wiley and Sons Ltd., Chich-
ester (2008)
11. Sertkaya, B.: OntoComP: A Protégé Plugin for Completing OWL Ontologies. In:
Aroyo, L., Traverso, P. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 898–902. Springer,
Heidelberg (2009)
12. Stumme, G.: Conceptual on-line analytical processing. In: Tanaka, K., Ghande-
harizadeh, S., Kambayashi, Y. (eds.) Information Organization and Databases,
ch. 14. Kluwer, Boston (2000)
13. Stumme, G., Wille, R., Wille, U.: Conceptual knowledge discovery in databases us-
ing formal concept analysis methods. In: Zytkow, J.M., Quafafou, M. (eds.) PKDD
1998. LNCS (LNAI), vol. 1510, pp. 450–458. Springer, Heidelberg (1998)
14. Vogt, F., Wille, R.: TOSCANA – a graphical tool for analyzing and exploring data. In:
Tamassia, R., Tollis, I.G. (eds.) GD 1995. LNCS, vol. 1027, pp. 226–233. Springer,
Heidelberg (1996)
15. Wolff, K.E.: A first course in formal concept analysis. In: Faulbaum, F. (ed.) Pro-
ceedings of Advances in Statistical Software, vol. 4, pp. 429–438 (1993)
Semantic Cockpit: An Ontology-Driven,
Interactive Business Intelligence Tool for
Comparative Data Analysis
1 Introduction
Comparative analysis of data gathered about past business activities is one of the
critical initial steps by a business analyst in her or his business intelligence task of
understanding and evaluating a business within its environmental context. Data
warehouse (DWH) technology for collecting and processing relevant data and ac-
companying On-Line Analytical Processing (OLAP) tools support the business
analyst in analyzing and comparing data aggregated over various dimensions (such
as time, location, product group). Dashboards and cockpits have been developed
* SemCockpit is a collaborative research project funded by the Austrian Ministry of
Transport, Innovation, and Technology in the program FIT-IT Semantic Systems and
Services. The project started in March 2011 and will end in August 2013.
recently as front-end tools for data warehouses that allow the business analyst to
predefine and interactively select analysis reports and to build alert mechanisms
into user interfaces. These tools assist him, through different kinds of graphs, to
interactively inspect aggregated measures (e.g., average number of prescriptions
per patient) related to a group of entities of interest (e.g., general practitioners)
and to compare these against corresponding measures of one or multiple peer (or
comparison) groups, potentially also through a normalized score, visualized by a
gauge indicating the result of this comparison. Thereby, he identifies opportuni-
ties and problems, explores possible causes, or discovers interesting phenomena
(relationships between data). His comparative analysis may lead immediately to
insights that can be used for strategy formulation or implementation, or may
trigger further analysis by dedicated data mining tools, whereby the insights
gained so far help him to guide the formulation of the right analysis questions.
Unintelligent cockpits offer only limited support to the business analyst for
his comparative analysis. With important meta knowledge unrepresented or rep-
resented in a form not processable by automatic reasoning, he is limited in the
analyses that can be formulated, and his success entirely depends on his domain
knowledge as a business expert. Some simple examples: The analysis "compare
the prescription habits of general practitioners in tourist regions to other prac-
titioners in relation to the percentage of prescribing generic drugs vs. innovator
drug X" cannot be formulated if domain knowledge about pharmacology is not
represented in machine-processable form. New concepts, even ones as simple as gen-
eral practitioners in tourist regions, are not easy to formulate, if expressible at
all. Judging score comparisons of flu medications across months requires the
background knowledge that sales of flu medications are typically 30% higher in
winter months. Business analysts have to deal with a lot of data and commonly
have insufficient personal experience to guide themselves during the process of
information selection and interpretation [9]. It is up to the business analyst to
discriminate between usual phenomena and interesting situations that may give
rise to further action or need further analysis; moreover, similar results re-appear
all the time, overloading analysis results with already explained usual phenom-
ena such that he may unnecessarily repeat further analysis steps.
The Semantic Cockpit assists and guides the business analyst in defining
analysis tasks, discriminating between usual phenomena and novel interesting
situations to be followed up (to avoid him drowning in information), and rec-
ommending actions or indicating further analysis steps that should be applied to
follow up interesting situations. The Semantic Cockpit is an intelligent partner of
the business analyst due to reasoning about various kinds of knowledge, explic-
itly represented by machine-processable ontologies, such as: organization-internal
knowledge (e.g., definitions of concepts corresponding to business terms such
as weekend and evening visits), organization-external domain knowledge (e.g.,
about medications or illnesses), the semantics of measures and measure values
(i.e., about which possible groups of interest a measure may be defined upon
and which particular group of interest a measure value describes), the semantics
of scores (i.e., what measure about a group of interest is scored against which
group of comparison along what dimension), knowledge about insights gained
from previous analyses (e.g., the typical percentage of higher sales of flu medications
in winter months), and knowledge about how to act upon a strikingly low or high
Fig. 1. Conventional Comparative Data Analysis (left) vs. Semantic Cockpit: Setting & Components (right)
Fig. 2. Semantic Cockpit Process (simplified, without Scores and Judgement Rules)
We will now describe step-by-step how the business analyst interacts with the
Semantic Cockpit and how the Semantic Cockpit exploits its knowledge to in-
telligently support the analyst, using our sample analysis task.
First, the business analyst opens the Cockpit Design View (see Fig. 3), which
is partitioned into five parts: the Measure Definition Board, the Score Definition
Board, the Judgement Rule Board, the Domain Ontology Context View, and the
Measure and Score Ontology Context View.
[Fig. 3: Cockpit Design View (screenshot), showing the Domain Ontology Context View for the Medication domain (drug, druggroup) and the Measure & Score Ontology Context View (applicable measure schemata).]
In our case, the business analyst wishes to define a general measure percentage-
OfCheapAntiDepressionDrugs for psychotropic drugs that have anti depression as
their main indication, for all locations, all prescribers, and all times. To define this
scope, he moves down in the Measure Definition Board in the Drug dimension
from the top-most multi-dimensional point of all dimensions to (All-Location, All-
Prescriber, Psychotropic, All-Time) and adds the qualification main indication = anti
depression upon selecting the role main-indication from the properties presented for
drug group Psychotropic (which corresponds to psychoactive-drug in the external
drug ontology) in the Ontology Context View. For future use, he may define a
new concept AntiDepressionPsychotropic for this qualification, which will be added
and correctly placed into the domain ontology by the Ontology Manager (if that
concept had been defined previously, the business analyst would have seen it in
the Ontology Context View for possible selection).
After having defined the scope of interest for the measure, the business an-
alyst specifies the measurement instructions. He checks the Measure and Score
Ontology Context View for measures already defined for the scope of interest.
In our simple case this view contains the measure nrOfPrescriptions. He drags this
measure into the Measurement Instructions section of the Measure Definition
Board and further specializes the measure by adding the qualification LowPricedDrug
from the Domain Ontology. To calculate the percentage, he drags the measure
nrOfPrescriptions once more into the Measure Definition Board and indicates that
the value of the main measure is the ratio of both submeasures.
Next, the business analyst defines in the Score Definition Board a score to be
used for comparing (based on the defined measure) a specific group of interest
(in the scope of the measure) against some specific group of comparison. In
our simple example, the business analyst selects the predefined scoring function
percentageDifference and scales the score value to [0, 100].
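To make the measure and score definitions tangible, the sketch below computes the ratio measure and a percentage-difference score scaled to [0, 100]; the exact scoring formula used by SemCockpit is not given here, so the one below is an assumed, plausible definition.

```python
# Illustrative sketch of the measure percentageOfCheapAntiDepressionDrugs
# (ratio of two nrOfPrescriptions submeasures) and of a percentageDifference
# score scaled to [0, 100]. The exact scoring formula used by SemCockpit is
# not given in the text; this is an assumed, plausible definition.
def percentage_of_cheap_drugs(nr_low_priced_prescriptions, nr_prescriptions):
    """Main measure: ratio of the two submeasures."""
    return nr_low_priced_prescriptions / nr_prescriptions


def percentage_difference_score(measure_of_interest, measure_of_comparison):
    """Score the group of interest against the comparison group, scaled to [0, 100]."""
    diff = (measure_of_interest - measure_of_comparison) / measure_of_comparison * 100
    return max(0.0, min(100.0, diff))


interest = percentage_of_cheap_drugs(420, 1000)     # e.g. general practitioners in Upper Austria
comparison = percentage_of_cheap_drugs(350, 1000)   # e.g. all prescribers
print(percentage_difference_score(interest, comparison))   # 20.0
```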
The Judgement Rule Board will be explained later.
The business analyst then moves to the Cockpit Data Analysis View (see Fig. 4),
which is partitioned into four parts: the Control Board, the Score Board, the Mea-
sure Board, and the Domain Ontology Context View, to apply the measure percent-
ageOfCheapAntiDepressionDrugs to a particular group of interest within the scope
of the measure. In the Control Board the business analyst can narrow down
the group of interest as required for his specific analysis. For our analysis task
the business analyst moves in the location dimension from All-Locations to
UpperAustria and in the doctor dimension from All-Doctors to GenPractitioner, i.e., to the
multi-dimensional point (UpperAustria, GenPractitioner, Psychotropic, All-Time), and selects
the additional qualification medium-to-large-city from the domain ontology about lo-
cations using the Ontology Context View (not shown expanded in Fig. 4, but as
in Fig. 3).
Fig. 4. Cockpit Data Analysis View (simplified; Score Board, Measure Board and Domain Ontology Context View unexpanded)
5 Related Work
The Semantic Cockpit is based on multidimensional modeling at the conceptual
level (an issue extensively discussed in the literature, with the Dimensional Fact
Model [3] being our main reference) enriched by the explicit representation of
and reasoning with additional knowledge, such as domain knowledge represented
in ontologies, semantics of derived measures and scores, and previous insights
represented as judgement rules.
Combining ontologies and multidimensional modeling has been discussed from
different perspectives and with different applications in mind: Nebot et al. [5] pro-
pose Multidimensional Integrated Ontologies as a basis for the multidimensional
analysis of Semantic Web data. Romero and Abelló [8] introduce an approach
to derive a multidimensional conceptual model starting from domain knowledge
represented in ontologies. Khouri and Bellatreche [4] introduce a methodology for
designing data warehouses from ontology-based operational databases. Niemi
and Niinimäki [6] discuss how ontologies can be used for reasoning about sum-
marizability in OLAP. Sell et al. [9] focus on using ontologies as a means to
represent data in business terms in order to simplify data analysis and the cus-
tomization of BI applications. Baader and Sattler [1] and Calvanese et al. [2] extend
Description-Logics-based ontologies with aggregation functions.
Conceptual modeling and reasoning with (derived) measures and scores has
not received much attention so far; however, it is related to the work of Pardillo
et al. [7], which extends OCL for defining OLAP queries as part of conceptual
multidimensional models.
The interplay of rules and ontologies was the subject of the REWERSE
project (http://rewerse.net) and is the subject of the ongoing ONTORULE
project (http://ontorule-project.eu/). The representation of and reasoning
over multi-dimensional situation-condition-action rules representing interesting
relationships between compared data, as required to represent judgement
knowledge in the Semantic Cockpit, has not been researched so far.
6 Conclusion
In this paper we outlined the components and usage of the Semantic Cockpit,
which is developed in an ongoing cooperative research project (see the footnote
in Section 1).
We implemented a first proof-of-concept prototype that we use together with
our partners from the health insurance industry to study detailed requirements and
performance issues and to further revise and extend the architecture and basic
approach presented in this paper.
In the remaining phases of the project, until 2013, we will investigate and im-
plement language constructs and reasoning techniques for measure & score on-
tologies and judgement rules, together with their mapping to the database level.
We will investigate the complexity and performance of the different reasoning
tasks, the costs of integrating external ontologies, problems with the quality of
external ontologies, and the possibility to further automate design and analysis
tasks.
We expect that the Semantic Cockpit leads to higher-quality analysis re-
sults and significant cost savings. We will validate the expected quantitative and
qualitative improvements by field studies carried out with end users from the health
insurance industry.
References
1. Baader, F., Sattler, U.: Description logics with aggregates and concrete domains.
Inf. Syst. 28(8), 979–1004 (2003)
2. Calvanese, D., Kharlamov, E., Nutt, W., Thorne, C.: Aggregate queries over ontolo-
gies. In: ONISW 2008: Proceedings of the 2nd International Workshop on Ontologies
and Information Systems for the Semantic Web, pp. 97–104. ACM, New York (2008)
3. Golfarelli, M., Maio, D., Rizzi, S.: The dimensional fact model: A conceptual model
for data warehouses. Int. J. Cooperative Inf. Syst. 7(2-3), 215–247 (1998)
4. Khouri, S., Bellatreche, L.: A methodology and tool for conceptual designing a data
warehouse from ontology-based sources. In: Song, I.-Y., Ordonez, C. (eds.) DOLAP,
pp. 19–24. ACM, New York (2010)
5. Nebot, V., Llavori, R.B., Pérez-Martínez, J.M., Aramburu, M.J., Pedersen, T.B.:
Multidimensional integrated ontologies: A framework for designing semantic data
warehouses. J. Data Semantics 13, 1–36 (2009)
6. Niemi, T., Niinimäki, M.: Ontologies and summarizability in OLAP. In: SAC, pp.
1349–1353 (2010)
7. Pardillo, J., Mazón, J.-N., Trujillo, J.: Extending OCL for OLAP querying on conceptual
multidimensional models of data warehouses. Inf. Sci. 180(5), 584–601 (2010)
8. Romero, O., Abelló, A.: A framework for multidimensional design of data warehouses
from ontologies. Data Knowl. Eng. 69(11), 1138–1157 (2010)
9. Sell, D., da Silva, D.C., Beppler, F.D., Napoli, M., Ghisi, F.B., dos Santos Pacheco,
R.C., Todesco, J.L.: SBI: a semantic framework to support business intelligence. In:
Duke, A., Hepp, M., Bontcheva, K., Vilain, M.B. (eds.) OBI. ACM International
Conference Proceeding Series, vol. 308, p. 11. ACM, New York (2008)
A Model-Driven Approach for Enforcing
Summarizability in Multidimensional Modeling
1 Introduction
2 Related Work
class and Date, Product, and Customer via Dimension classes. Facts are
dimension levels, which allow one to analyze measures at a specific level of detail, are
specified by Base classes such as Day and Week. Associations (represented
by the stereotype Rolls-UpTo) between pairs of Base classes form dimension
hierarchies. Importantly, Rolls-UpTo associations between Base classes as well
as associations between Fact and Dimension classes are annotated with UML
multiplicities. E.g., the multiplicities for the association between Country and
Region in Fig. 1 indicate that every region belongs to exactly one country (1
in role r at the Country end), whereas there are countries (such as Andorra,
Monaco, etc.) without associated regions (0..* in role d at the Region end).
Finally, UML generalization relationships between Base classes can be used to
represent optional dimension levels within a hierarchy.
the data that will populate the DW, which may involve considerable preprocessing
effort. In particular, ETL processes become more complex, as summariz-
ability checks must be incorporated and executed for every update. In addition,
as the data transformations produce artificial data values, data analysis becomes
more complex.
In [6,7] the authors present a classification of different kinds of complex di-
mension hierarchies, and they define the MultiDimER model for the conceptual
design of complex MD models based on an extension of the well-known Entity-
Relationship (ER) model. The idea is that this classification guides developers
to properly capture at a conceptual level the precise semantics of different kinds
of hierarchies without being limited by current data analysis tools. Furthermore,
the authors discuss how to map these conceptual hierarchies to the relational
model (enabling implementation in commercial tools). Unfortunately, the map-
ping between the conceptual and the logical level is described informally. In
addition, the described mapping is tool-dependent and it may vary depending
on the scenario.
Fig. 2. Newly created dimension for eliminating non-strictness: the Book sales example. (a) Excerpt of the source model; (b) excerpt of the target model.
both facts, queries involving Sales for Authors are still possible via drill-across
operations. Moreover, to support summarizable queries, the designer may want
to manually add additional measures, e.g., each author's share per book, to the
new fact schema. The target model is shown in Fig. 2(b).
The nonStrictBases relation is shown in Fig. 3(a). It checks whether there is a non-strict Rolls-UpTo association between two Base classes (b1 and b2) in the source model (a many-valued multiplicity in the role r) in order to create the appropriate elements in the target model and so obtain a representation without non-strict associations. The two cases concerning the treatment of non-strict associations explained above are captured by a condition that must hold to execute this relation (see the when clause of Fig. 3(a)). This condition checks whether Base class b2 plays role r in some strict association in the source model. On the one hand, if this condition does not hold, then the QVT relation is not launched, and no new Rolls-UpTo association is created in the target model, since the two Base classes are related via a many-to-many relationship by default. On the other hand, if the condition is satisfied, then b2 does not play role r in any strict association. In this case, a new Fact class f is created which is associated with two new Dimension classes, d1 and d2. Specifically, d1 corresponds to the Dimension class related to the Base classes b1 and b2 of the source model, while d2 is a new Dimension class whose defining Base class is b2p (a copy of the Base class b2 that causes the non-strictness in the source model).
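To make the two cases concrete, the following sketch shows how they could be implemented over a toy in-memory representation of the MD model. This is illustrative Python under our own hypothetical data structures, not the authors' QVT implementation; names such as RollsUpTo and dimension_of are assumptions made for the example.

# Illustrative sketch of the two cases handled by nonStrictBases
# (hypothetical data structures; the paper expresses this as a QVT relation).

class RollsUpTo:
    def __init__(self, child, parent, strict):
        self.child, self.parent, self.strict = child, parent, strict

def transform_non_strict(model):
    """model: dict with 'rollups' (list of RollsUpTo), 'facts' (list of dicts) and
    'dimension_of' (Base class name -> Dimension name)."""
    target = {'rollups': [], 'facts': list(model['facts']),
              'dimension_of': dict(model['dimension_of'])}
    for r in model['rollups']:
        if r.strict:
            target['rollups'].append(r)        # strict associations are copied as-is
            continue
        # Case 1: b2 also plays role r in some strict association (the when clause
        # fails), so the non-strict association is simply dropped.
        if any(s.strict and s.parent == r.parent for s in model['rollups']):
            continue
        # Case 2: otherwise a new bridge Fact is created, related to the original
        # Dimension and to a new Dimension whose defining Base is a copy of b2.
        copy_base = r.parent + 'Copy'
        target['dimension_of'][copy_base] = copy_base + 'Dim'
        target['facts'].append({'name': r.child + '_' + r.parent,
                                'dimensions': [model['dimension_of'][r.child],
                                               copy_base + 'Dim']})
    return target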
After the execution of the nonStrictBases relation, an excerpt of the target model related to the Date dimension is shown in Fig. 4(a). Here, the non-strict Rolls-UpTo association between Month and Week has disappeared, since in this way both Base classes are already many-to-many related, and there is no need for the creation of a new Fact class. The remaining parts of the source model that do not present non-strictness problems are copied directly to the target model. It is worth noting that all the possible strict Rolls-UpTo associations between Base classes are assumed to be explicitly modeled in the source model and they will
Fig. 3. QVT relations: (a) nonStrictBases; (b) rollUpIncompleteBases
also appear in the target model. In this way, the analysis capabilities of the source model are not negatively affected. If users would nevertheless like to navigate between aggregate values by means of a non-strict relationship, then a new type of association could be added to the conceptual model; such navigational arcs, however, would need special support from the implementing OLAP tools and are beyond the scope of this paper.
With respect to the soundness of the nonStrictBases relation shown in Fig. 3(a), we focus on the non-strict association of the source model, say association s between b1 and b2, which is represented differently in the target model. We consider two cases. First, the QVT relation just removes the non-strict association. In this case, the when clause of the relation ensures that b2 is alternatively reachable via strict Rolls-UpTo associations. Hence, in the target model both Base classes still belong to a common dimension hierarchy, where all pairs of Base classes are related via many-to-many relationships by default as explained above, and there is no loss of information when removing the non-strict association. Second, the QVT relation creates a new Fact class to represent the many-to-many relationship. In this case, the associations between instances of b1 and b2 of the source model are simply stored as instances of the new Fact class in the target model. Hence, the information that can be represented under the source model can also be represented under the target model.
In line with the general analysis concerning optional properties for conceptual design in [1], we argue that multiplicities of 0 should be avoided in understandable conceptual models whenever possible. Instead, we advocate applying generalization for conceptual MD modeling as well. To eliminate roll-up incompleteness, the transformation is based on one QVT relation that replaces all occurrences of roll-up incomplete Rolls-UpTo associations in the source model with generalization constructs. In this case, the optional Base class (which has multiplicity 0 at the role r) should be associated with a suitable sub-class in a generalization
between Base classes: one to reflect instances with the optional property and the other one to reflect instances without that property.
The corresponding QVT relation (rollUpIncompleteBases) is shown in Fig. 3(b). It checks for roll-up incompleteness in the Rolls-UpTo association between Base classes in the source model. Specifically, if a 0 multiplicity is detected in the role r of a Rolls-UpTo association s between two Base classes b1 (e.g., Product in Fig. 1) and b2 (e.g., Category in Fig. 1), then the relation enforces the creation of new elements in the target model as follows: two new Base classes b1p and b2p that correspond to the source Base classes b1 and b2, respectively. In addition, two new Base sub-classes of b1p, namely b3 and b4, are created via new generalization relationships g1 and g2. Here, b3 reflects the instances of its super-class b1p that are associated with some instance of the optional Base class b2p, and b4 reflects the remaining instances of b1p. Furthermore, the roll-up incomplete association s between b1 and b2 is replaced with a roll-up complete association r between b3 and b2p.
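Analogously, a compact sketch of the generalization-based rewriting performed by rollUpIncompleteBases could look as follows (again hypothetical Python over the same toy structures; the W/WO suffixes mirror the naming of Fig. 4(b) but are otherwise our own convention).

def transform_roll_up_incomplete(rollup, target):
    """rollup: a roll-up incomplete association (0 multiplicity at role r) between
    Base classes b1 (child) and b2 (parent); target: dict collecting new elements."""
    b1p, b2p = rollup.child + 'P', rollup.parent + 'P'     # copies of b1 and b2
    b3 = rollup.child + 'W' + rollup.parent                # instances related to some b2p instance
    b4 = rollup.child + 'WO' + rollup.parent               # the remaining instances of b1p
    target.setdefault('bases', []).extend([b1p, b2p, b3, b4])
    target.setdefault('generalizations', []).extend([(b3, b1p), (b4, b1p)])  # g1 and g2
    target.setdefault('rollups', []).append((b3, b2p))     # complete association replacing s
    return target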
After the execution of the rollUpIncompleteBases relation, an excerpt of the target model related to the Product dimension is shown in Fig. 4(b). Here, two new Base classes, ProductWCategory and ProductWOCategory, are created to reflect those products that belong to some category and those that do not, respectively. Again, those parts of the source model that do not present roll-up incompleteness problems are copied directly to the target model.
With respect to the soundness of the rollUpIncompleteBases relation, we focus on the roll-up incomplete association of the source model, say association s between b1 in the role d and b2 in the role r, which is represented differently in the target model. First, b1 and b2 are still present in the target model (as b1p and b2p). Moreover, if an instance of b1 is not related to any instance of b2
References
1. Bodart, F., Patel, A., Sim, M., Weber, R.: Should optional properties be used in conceptual modelling? A theory and three empirical tests. Info. Sys. Research 12(4), 384–405 (2001)
2. Lechtenbörger, J., Vossen, G.: Multidimensional normal forms for data warehouse design. Inf. Syst. 28(5), 415–434 (2003)
3. Lehner, W., Albrecht, J., Wedekind, H.: Normal forms for multidimensional databases. In: Rafanelli, M., Jarke, M. (eds.) SSDBM, pp. 63–72. IEEE Computer Society, Los Alamitos (1998)
4. Lenz, H.J., Shoshani, A.: Summarizability in OLAP and statistical data bases. In: Ioannidis, Y.E., Hansen, D.M. (eds.) SSDBM, pp. 132–143. IEEE Computer Society, Los Alamitos (1997)
5. Luján-Mora, S., Trujillo, J., Song, I.Y.: A UML profile for multidimensional modeling in data warehouses. Data Knowl. Eng. 59(3), 725–769 (2006)
6. Malinowski, E., Zimányi, E.: Hierarchies in a multidimensional model: From conceptual modeling to logical representation. Data Knowl. Eng. 59(2), 348–377 (2006)
7. Malinowski, E., Zimányi, E.: Advanced data warehouse design: From conventional to spatial and temporal applications. Springer, Heidelberg (2008)
8. Mazón, J.N., Lechtenbörger, J., Trujillo, J.: Solving summarizability problems in fact-dimension relationships for multidimensional models. In: Song, I.Y., Abelló, A. (eds.) DOLAP, pp. 57–64. ACM, New York (2008)
9. Mazón, J.N., Lechtenbörger, J., Trujillo, J.: A survey on summarizability issues in multidimensional modeling. Data Knowl. Eng. 68(12), 1452–1469 (2009)
10. Pedersen, T.B., Jensen, C.S., Dyreson, C.E.: Extending practical pre-aggregation in on-line analytical processing. In: VLDB, pp. 663–674 (1999)
11. Pedersen, T.B., Jensen, C.S., Dyreson, C.E.: A foundation for capturing and querying complex multidimensional data. Inf. Syst. 26(5), 383–423 (2001)
12. Rafanelli, M., Shoshani, A.: STORM: A statistical object representation model. In: Michalewicz, Z. (ed.) SSDBM 1990. LNCS, vol. 420, pp. 14–29. Springer, Heidelberg (1990)
13. Rizzi, S., Abelló, A., Lechtenbörger, J., Trujillo, J.: Research in data warehouse modeling and design: dead or alive? In: Song, I.Y., Vassiliadis, P. (eds.) DOLAP, pp. 3–10. ACM, New York (2006)
Repairing Dimension Hierarchies under Inconsistent
Reclassification
1 Introduction
Data Warehouses (DWs) integrate data from different sources, also keeping their his-
tory for analysis and decision support [1]. DWs represent data according to dimensions
and facts. The former are modeled as hierarchies of sets of elements (called dimension
instances), where each element belongs to a category from a hierarchy, or lattice of
categories (called a dimension schema). Figure 1(a) shows the dimension schema of a
Phone Traffic DW designed for an online Chilean phone call company, with dimensions
Time and Phone (complete example in [2]). Figure 1(b) shows a dimension instance for
the Phone schema. Here, TCH (Talcahuano), TEM (Temuco) and CCP (Concepcion) are
elements of the category City, and IX and VIII are elements of Region. The facts stored in
the Phone Traffic DW correspond to the number of incoming and outgoing calls of a
phone number at a given date. The fact table is shown in Figure 1(c).
To guarantee summarizability [3,4], a dimension must satisfy some constraints. First, it must be strict, that is, every element of a category should reach (i.e., roll up to) no more than one element in each ancestor category (for example, the dimension instance in Figure 1(b) is strict). Second, it must be covering, meaning that every member of a dimension level rolls up to some element in another dimension level. Strict and
¹ To simplify the presentation and without loss of generality, we assume that categories do not have attributes, and schemas have a unique bottom category.
Fig. 2. Non-strict dimension instance (category names are omitted) (D); repairs of the dimension (dashed edges were inserted to restore strictness) (D1–D3)
dimension satisfying the constraints. In particular, from all possible repairs we are in-
terested in finding the minimal one. Intuitively, a minimal repair of a dimension is a
new dimension that satisfies a given set of constraints, and is obtained by applying a
minimum number of insertions and deletions to the original rollup relations. Although
techniques to compute repairs with respect to a set of constraints are well-known in the
field of relational databases [14], they cannot be applied in a DW setting, since it is
not trivial to represent strictness or covering constraints using relational constraints like
functional dependencies [8].
Defining a minimal repair requires a notion of distance between dimensions: Let D = (M, <_D) and D′ = (M, <_D′) be two dimensions over the same schema S = (C, ↗). The distance between D and D′ is defined as dist(D, D′) = |(<_D ∪ <_D′) \ (<_D ∩ <_D′)|, i.e., the cardinality of the symmetric difference between the two roll-up relations [8]. Based on this, the definition of repair is as follows [8]: Given a dimension D = (M, <) over a schema S = (C, ↗), and a set of constraints Σ = Σ_s(S) ∪ Σ_c(S), a repair of D with respect to Σ is a dimension D′ = (M′, <′) over S, such that D′ satisfies Σ and M′ = M. A minimal repair of D is a repair D′ such that dist(D, D′) is minimal among all the repairs of D.
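For illustration, the distance and the choice of a minimal repair can be sketched as follows (Python; representing each roll-up relation simply as a set of (child, parent) edges is our own simplification, not the paper's implementation).

def dist(rollup_d, rollup_dp):
    # Cardinality of the symmetric difference between two roll-up relations,
    # each given as a set of (child, parent) edges.
    return len(rollup_d.symmetric_difference(rollup_dp))

def minimal_repair(d, candidate_repairs):
    # Among candidate repairs (edge sets over the same members), pick one that
    # minimizes the distance to the original dimension d.
    return min(candidate_repairs, key=lambda dp: dist(d, dp))

# As in Fig. 2, D1 is obtained from D by deleting (CCP, VIII) and inserting (CCP, IX),
# so dist(D, D1) = 2 (the remaining edges are placeholders for the rest of the instance).
D  = {('CCP', 'VIII'), ('TCH', 'VIII'), ('TEM', 'IX')}
D1 = {('CCP', 'IX'),  ('TCH', 'VIII'), ('TEM', 'IX')}
assert dist(D, D1) == 2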
Minimal repairs according to [8] may result in dimensions where the reclassification is undone. If we assume that a reclassification always represents information that is certain, this semantics is not appropriate. For example, the minimal repairs (as defined in [8]) for dimension D in Figure 2 are dimensions D1, D2 and D3. All of them were obtained from D by performing insertions and/or deletions of edges. For example, D1 is generated by deleting edge (CCP, VIII) and inserting (CCP, IX). The distances between the repairs and D are: (a) dist(D, D1) = |{(CCP, IX), (CCP, VIII)}| = 2; (b) dist(D, D2) = |{(N3, 41), (N3, 45)}| = 2; (c) dist(D, D3) = |{(N3, TEM), (N3, CCP)}| = 2. Note that dimension D2 has undone the reclassification, and should not be considered an appropriate repair.
In light of the above, we next study the problem of finding repairs that do not undo the reclassification. We denote these repairs r-repairs.
3.1 r-Repairs
Given a dimension D, a set R of reclassify operations is called valid if the following holds: for every Reclassify(D, ci, cj, a, b) ∈ R there is no Reclassify(D, ci, cj, a, c) ∈ R with c ≠ b. In what follows we consider only valid reclassifications.
Note that the distance between a dimension D and an r-repair D′ is bounded by the size of D. For the dimension in Figure 1(b) and the set R = {Reclassify(D, Number, AreaCode, N3, 45)}, r-repair(D, R) contains dimensions D1 and D3 in Figure 2.
Fig. 3. Heuristics
Fig. 4. The dimension schema, instance, and r-repair for the algorithm
Algorithm search_repair() (Figure 5) first verifies whether an update operation leaves the dimension instance non-strict. For this, the list list_inconsistent_paths containing the non-strict paths is produced (line 2). If the dimension is non-strict, the algorithm restores consistency of every non-strict path on the list. Let Reclassify(D, C1, C2, eu, pu) be the update operation that affects categories in cat_list_paths and leaves the dimension non-strict. The algorithm obtains, for every cat_bottom element in the inconsistent list, the number of paths reaching new_CL (the new parent of eu in CL) and old_CL (the old parent in CL) (lines 5 to 10). If these numbers are equal, it means that there are only two alternative paths for the corresponding cat_bottom element, and the algorithm tries to keep new_CL as the ancestor of these paths in the CL (lines 12 to 21). If not, it means that there are more paths reaching old_CL, and the algorithm tries to update the edge reaching new_CL, since this produces fewer changes (lines 25 to 26). If this is not possible, the algorithm assigns new_CL to all the paths reaching old_CL (lines 28 to 31).
As an illustration, consider the reclassification Reclassify(D, C, G, c1, g2) applied over the dimension in Figure 4(b). The reclassification affects the bottom element a1 and therefore list_inconsistent_paths contains the paths: [1]: a1 → e1 → f1 → d1; [2]: a1 → c1 → g2 → h2 → d2; [3]: a1 → b1 → i1 → d1. The old and new parents in CL for a1 are old_CL = d1 and new_CL = d2, and the numbers of paths reaching d1 and d2 are, respectively, 2 and 1. Since there are more paths reaching the old parent in CL, the algorithm tries to keep d1 as the ancestor in D for all the conflicting paths. This operation is possible given that element h2 does not have any child other than g2, and the update is not performed over h2 (validations performed by the function check_change_to_new_CL); thus, the algorithm deletes edge (h2, d2) and inserts (h2, d1) (function change_new_CL), producing the repair shown in Figure 4(c).
Proposition 1. Given a dimension D over a schema S, and a set of reclassify operations R of size 1: (a) Algorithm search_repair() terminates in a finite number of steps; (b) Algorithm search_repair() finds an r-repair for dimension D.
search_repair()
Structure paths {String element, String category, next, below};
paths list_inconsistent_paths = NULL;
String new_CL, old_CL, e_u, p_u, parent_child_CL, child_CL1, child_CL2;
Int cost = 0, cont_same_elements;
1: if check_consistency() = 0 then
2:   list_inconsistent_paths = non_strict_paths();
3:   while (list_inconsistent_paths.below ≠ NULL) do
4:     i = 0;
5:     cont_same_elements = find_number_paths(list_inconsistent_paths(i));
6:     new_CL = find_new_parent_CL(list_inconsistent_paths(i), e_u);
7:     old_CL = find_old_parent_CL(list_inconsistent_paths(i), new_CL);
8:     {the parents in the CL before and after the update}
9:     cont_1 = number_paths_reaching_element(list_inconsistent_paths(i), new_CL);
10:    cont_2 = number_paths_reaching_element(list_inconsistent_paths(i), old_CL);
11:    if (cont_1 = cont_2) then
12:      {same # of paths reaching the old and new parent in CL; try to keep the new parent}
13:      child_CL1 = find_child_CL(list_inconsistent_paths(i), old_CL);
14:      child_CL2 = find_child_CL(list_inconsistent_paths(i), new_CL);
15:      {captures the element in the category that reaches the old (new) parent in CL}
16:      if (check_change_to_new_CL(child_CL1) = 1) then
17:        {it is possible to change to the new parent in CL}
18:        cost = cost + change_new_CL(list_inconsistent_paths(i), child_CL1, new_CL);
19:      else
20:        cost = cost + change_old_CL(list_inconsistent_paths(i), child_CL2, old_CL);
21:      end if
22:    else
23:      {# of paths reaching the old parent in CL is greater than # reaching the new parent; try to keep the old parent (second heuristic)}
24:      child_CL2 = find_child_CL(list_inconsistent_paths(i), new_CL);
25:      if (check_change_to_old_CL(child_CL2) = 1) then
26:        cost = cost + change_old_CL(list_inconsistent_paths(i), child_CL2, old_CL);
27:      else
28:        for j = 1 to cont_2 do
29:          child_CL1 = find_child_CL(list_inconsistent_paths(i), old_CL);
30:          cost = cost + change_new_CL(list_inconsistent_paths(i), child_CL1, new_CL);
31:        end for
32:      end if
33:    end if
34:    i = i + cont_same_elements;
35:    move(list_inconsistent_paths, i);
36:  end while
37: end if
[4], which is crucial for OLAP operations. Pedersen et al. [12] presented a first approach to this problem, transforming non-strict into strict dimensions by means of the insertion of artificial elements. Caniupan et al. present a logic programming approach to repair dimensions that are inconsistent with respect to a set of constraints [8,19]. Although important to gain insight into the problem of repairing inconsistent dimensions, and containing some interesting theoretical results, from a practical point of view the approach presented in [8,19] would be computationally expensive in real-world cases. Besides, DW administrators and developers are not acquainted with logic programs. Moreover, for the specific case of reclassification, the work in [8,19] only deals with repairs that may undo the update. On the contrary, the r-repairs we present in this paper do not undo the reclassification. Finally, the minimal repairs obtained in [8,19] could lead to rollup functions that do not make sense in the real world (e.g., relating dimension members that are not actually related in any way). Following a different approach, we propose efficient algorithms that lead to consistent dimensions (although not necessarily minimal with respect to the distance function), and where undesired solutions can be prevented.
We have shown that, in general, finding r-repairs for dimension instances is NP-complete. However, we also showed that, in practice, computing r-repairs can be done in polynomial time when the set of updates contains only one reclassification and the dimension schema has at most one conflicting level. We have explored algorithms to compute r-repairs for this class of dimension schemas, and discussed their computational complexity, which is in the worst case of order O(n · m · k), where the key term is n, the number of elements in the bottom level affected by the inconsistencies. We would like to remark that in the algorithms presented in this paper, for the sake of generality, we did not include the possibility of preventing rollups that could make no sense in practice. However, it is straightforward to enhance the search_repair algorithm to consider only repairs that are acceptable to the user. At least two approaches can be followed here: to prioritize the rollup functions (as, for example, proposed in [20]), or even to define some rollups to be fixed (and therefore, not allowed to be changed). Of course, in the latter case, a minimal r-repair may not exist. We leave this discussion as future work, as well as the experimentation of the algorithms in real-world data warehouses.
Acknowledgements. This project was partially funded by FONDECYT, Chile, grant number 11070186. Part of this research was done during a visit of Alejandro Vaisman to the Universidad del Bío-Bío in 2010. Currently, Mónica Caniupán is funded by DIUBB 110115 2/R. A. Vaisman has been partially funded by LACCIR project LACR-FJR-R1210LAC004.
References
1. Chaudhuri, S., Dayal, U.: An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26, 65–74 (1997)
2. Bertossi, L., Bravo, L., Caniupan, M.: Consistent query answering in data warehouses. In: AMW (2009)
3. Hurtado, C., Gutierrez, C., Mendelzon, A.: Capturing Summarizability with Integrity Constraints in OLAP. ACM Transactions on Database Systems 30, 854–886 (2005)
4. Lenz, H., Shoshani, A.: Summarizability in OLAP and Statistical Data Bases. In: SSDBM, pp. 132–143 (1997)
5. Rafanelli, M., Shoshani, A.: STORM: a Statistical Object Representation Model. In: Michalewicz, Z. (ed.) SSDBM 1990. LNCS, vol. 420, pp. 14–29. Springer, Heidelberg (1990)
6. Hurtado, C., Mendelzon, A., Vaisman, A.: Maintaining Data Cubes under Dimension Updates. In: ICDE, pp. 346–355 (1999)
7. Hurtado, C., Mendelzon, A., Vaisman, A.: Updating OLAP Dimensions. In: DOLAP, pp. 60–66 (1999)
8. Caniupan, M., Bravo, L., Hurtado, C.: A logic programming approach for repairing inconsistent dimensions in data warehouses. Submitted to Data and Knowledge Engineering (2010)
9. Dodge, G., Gorman, T.: Essential Oracle8i Data Warehousing: Designing, Building, and Managing Oracle Data Warehouses (with Website). John Wiley & Sons, Inc., Chichester (2000)
10. Kimball, R., Ross, M.: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. John Wiley & Sons, Inc., Chichester (2002)
11. Hurtado, C., Mendelzon, A.: Reasoning about summarizability in heterogeneous multidimensional schemas. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, p. 375. Springer, Heidelberg (2000)
12. Pedersen, T., Jensen, C., Dyreson, C.: Extending Practical Pre-Aggregation in On-Line Analytical Processing. In: VLDB, pp. 663–674 (1999)
13. Vaisman, A.: Updates, View Maintenance and Materialized Views in Multidimensional Databases. PhD thesis, Universidad de Buenos Aires (2001)
14. Bertossi, L.: Consistent query answering in databases. ACM SIGMOD Record 35, 68–76 (2006)
15. Zhuge, Y., Garcia-Molina, H., Wiener, J.L.: Multiple View Consistency for Data Warehousing. In: ICDE, pp. 289–300 (1997)
16. Gupta, H., Mumick, I.S.: Selection of views to materialize under a maintenance cost constraint. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 453–470. Springer, Heidelberg (1998)
17. Schlesinger, L., Lehner, W.: Extending Data Warehouses by Semiconsistent Views. In: DMDW, pp. 43–51 (2002)
18. Letz, C., Henn, E.T., Vossen, G.: Consistency in Data Warehouse Dimensions. In: IDEAS, pp. 224–232 (2002)
19. Bravo, L., Caniupan, M., Hurtado, C.: Logic programs for repairing inconsistent dimensions in data warehouses. In: AMW (2010)
20. Espil, M.M., Vaisman, A., Terribile, L.: Revising data cubes with exceptions: a rule-based perspective. In: DMDW, pp. 72–81 (2002)
GrHyMM: A Graph-Oriented Hybrid
Multidimensional Model
Dipartimento di Informatica
Università degli Studi di Bari Aldo Moro
Via Orabona 4, 70125 Bari, Italy
{francescoditria,lefons,tangorra}@di.uniba.it
Abstract. The main methodologies for data warehouse design are based on two approaches that are opposite to, and alternatives of, each other. The first, based on the data-driven approach, aims to produce a conceptual schema mainly through a reengineering process of the data sources, while minimizing the involvement of end users. The other is based on the requirement-driven approach and aims to produce a conceptual schema only on the basis of requirements expressed by end users. As each of these approaches has valuable advantages, the need has emerged to adopt a hybrid methodology that combines the best features of the two. We introduce a conceptual model that is based on a graph-oriented representation of the data sources. The core of the proposed hybrid methodology is an automatic process of reengineering of the data sources that produces the conceptual schema using a set of requirement-derived constraints.
1 Introduction
The actual lack of a standard process for data warehouse design has led to the
definition of several methodologies. This is especially true in the conceptual design,
where opposite approaches can be adopted based on the requirement-driven and data-
driven methodologies [1]. The requirement-driven approach, also known as demand-
driven or goal-oriented methodology, aims to define multidimensional schemas using
business goals resulting from the decision makers needs. The data sources are
considered later, when the Extraction, Transformation, and Loading (ETL) phase is
designed [2]. In this feeding plan, the multidimensional concepts (such as facts,
dimensions, and measures) have to be mapped on the data sources in order to define
the procedures to populate the data warehouse by cleaned data. At this point, it may
happen that the designer discovers that the needed data are not currently available at
the sources. On the contrary, the data-driven approach, also known as supply-driven
methodology, aims to define multidimensional schemas on the basis of a remodelling
of the data sources. This process is individually executed by the designer who
minimizes the involvement of end users and, consequently, goes towards a possible
failure of their expectations. To overcome these limits, in the last years the necessity
to define hybrid methodologies arose [3].
2 Methodological Framework
Our framework for data warehouse design is depicted in Figure 1 and describes the
activities that must be performed by designers, along with the produced artifacts.
2. Source analysis. In this step, the different schemas of the data sources must be analyzed and then reconciled, in order to obtain a global and integrated schema. This artifact, too, must be given as input to the conceptual design process. The constraints derived from the requirement analysis are used to perform a reengineering of the source schema.
3. Conceptual design. This step is based on GrHyMM, the graph-oriented
multidimensional model. The sub-steps to be performed in this phase are: (3a)
identifying facts present in the source schema on the basis of the constraints; (3b)
building an attribute tree for each fact; (3c) remodelling the tree on the basis of the
constraints; and (3d) verifying whether all the remodelled trees agree with the
workload (this is the so-called validation process).
4. Logical design. In this step, the conceptual schema is transformed into a logical
one with reference to the data model of the target database system.
5. Physical design. The logical schema implementation ends the design process with
defining the physical database properties based on the specific features provided
by the database system, such as indexing, partitioning, and so on.
Here, we address only steps 1 to 3, as steps 4 and 5 strongly depend on the target systems. As an example, ROLAP and MOLAP systems can be utilized in step 4.
3 Requirement Analysis
In the i* framework, user requirements, alias business goals, are exploded into a more
detailed hierarchy of nested goals: (a) strategic goals, or high-level objectives to be
reached by the organization; (b) decision goals, to answer how strategic goals can be
satisfied; and (c) information goals, to define which information is needed for
decision making. To do this, the designer must produce a model describing the
relationships among the main actors of the organization, along their own interests.
This model is the so-called strategic dependency model and aims to outline how the
data warehouse helps decision makers to achieve business goals. As in this context
the i* framework applies to data warehousing, the data warehouse itself has to be an
actor of the system. Each actor in a strategic dependency model is further detailed in a
strategic rationale model that shows the specific tasks the actor has to perform in
order to achieve a given goal.
In Figure 2, there is reported an example of strategic rationale model showing the
strategic goals and the tasks of the decision maker and data warehouse actors.
Fig. 2. Strategic rationale model: the Decision maker has the goal "Increase the profit" and the tasks "Analyze the profit of the products selling in the last year" and "Analyze the shipment costs per client from 2000 to now"; the Data warehouse provides information about sales (product, category; day, month, year; store, region; amount) and about shipments (client; location; day, month, year; cost)
In our methodology, the designer translates the tasks of the decision makers into a workload, while the goals and the tasks of the data warehouse are transformed into a set of constraints that must be taken into account in the subsequent data remodelling phase. The workload contains a set of queries derived from user requirements and helps the designer identify the information the users are interested in. In a few words, it includes the typical queries that will be executed by decision makers in the analytical processing (OLAP) step.
The grammar for a high-level representation of the queries of the workload follows. Notice that constant is any number or string, and identifier a user-defined name corresponding to a valid variable (the name of a table or column, for example).
Example 1. With reference to the decision makers' tasks shown in Figure 2, we have to manually translate them into high-level queries. First, the task "Analyze the profit per products sold in 2010" corresponds to the SQL-like statement select product, sum(amount) from sale join product where year = 2010 group by product_id; and to the grammar-based statement sum(sale[product; year = 2010].amount);. On the other hand, the second task corresponds to the query select client, sum(cost) from shipment join client where year ≥ 2000 and year ≤ 2011 group by client_id; and to the statement sum(shipment[client; year ≥ 2000 and year ≤ 2011].cost);.
The workload will be used in the final step of the conceptual design in order to perform the validation process. If all the queries of the workload can be effectively executed over the schema, then such a schema is assumed to be validated and the designer can safely translate it into the corresponding logical schema. Otherwise, the conceptual design process must be revised.
For each resource needed by the decision makers, the data warehouse must provide adequate information by achieving its own goals. Moreover, a goal must have measures, that is, resources to be used in order to provide the information required for decision making. Therefore, a fact is generated in order to allow the data warehouse
to achieve its own goal. Finally, for each measure, a context of analysis must be provided by a task of the data warehouse. So, starting from the measures and dimensions that emerge from business goals, some constraints can be identified that the designer must necessarily consider. The constraints will be used in the main step of the conceptual design in order to perform the remodeling process in a supervised way.
The grammar for a high-level representation of constraints follows.
Example 2. The constraint sales[product, category; day, month, year; store, region].[amount]; states that we must have a fact, namely sales, provided with the amount measure. This fact has three dimensions. The first dimension has two levels: product (the first dimensional level of the hierarchy) and category (the second one). The second dimension has levels day, month, and year, while the third dimension has levels store and region.
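For illustration only, such a constraint could be represented in memory as follows (a Python sketch; the class and field names are ours and are not part of the GrHyMM grammar).

from dataclasses import dataclass, field
from typing import List

@dataclass
class Constraint:
    fact: str
    dimensions: List[List[str]]        # each inner list is one hierarchy, from the first level upwards
    measures: List[str] = field(default_factory=list)

# sales[product, category; day, month, year; store, region].[amount];
sales_constraint = Constraint(
    fact='sales',
    dimensions=[['product', 'category'],
                ['day', 'month', 'year'],
                ['store', 'region']],
    measures=['amount'],
)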
4 Sources Analysis
The aim of this activity is to produce a global and reconciled schema coming from the integration of the heterogeneous data sources. In this phase, it is very important to align the different schemas [8] using a unique model. A methodology to accomplish this is addressed in [9], where a supermodel is proposed, provided with a set of meta-constructs that can be mapped to the constructs of specific models. The transformation between schemas is performed via Datalog rules.
A class of problems derives from inconsistencies among schemas, such as cardinality constraints. As an example, in one university database a professor can hold one or many courses, but in another a professor can hold zero or two courses. Whereas there is agreement on the meaning of the professor entity, there is disagreement about how many courses a professor may hold.
Inconsistencies can be treated using an ontological approach [10]. So, starting from a logical schema, an abstraction process has to be performed to derive a conceptual schema, such as an Entity Relationship (ER) schema. Then, all the inconsistencies in the conceptual schemas must be solved by defining a common and shared ontology. Indeed, an ER schema is used to represent locally true concepts, that is, concepts that are true in a given context. On the other hand, an ontology is used to represent necessarily true concepts, that is, concepts that are always true [11].
Relative to the previous example, if a professor may hold one or many courses in one database and a professor may hold zero or two courses in another database, then this inconsistency can be reconciled in an ontology stating that a professor, in general, may hold zero or many courses.
5 Conceptual Design
This section describes how to build and remodel an attribute tree to represent a fact identified from user requirements. In detail, the design consists of four steps to be performed in an automatic way: the identification of facts is done by marking possible fact tables in the data sources and by matching them with the constraints derived from user requirements; the building of an attribute tree is defined by a set of formal assumptions; the remodeling of the tree is defined by a novel algorithm; and, at last, the validation is based on a set of non-ambiguous rules represented using first-order logic [6].
The main difficulty in this step is to correctly map the multidimensional concepts to the integrated schema. For example, in a relational schema, the designer must face the problem of identifying which table can be considered as a fact. To solve this problem, several methodologies have been defined to automatically identify multidimensional concepts in order to correctly map them onto the data sources.
For example, in [12] the user requirements can be entirely derived by defining a preliminary workload. On the basis of this assumption, the authors propose an algorithm able to automatically create a graph (whose nodes are the tables of the data sources and whose edges are the joins between tables) that aims to identify whether each table can be considered as a fact table or a dimension table. They state that a correct labelling of all nodes generates a valid multidimensional schema. The labels are assigned by examining the role played by tables and attributes in the SQL queries included in the preliminary workload.
In our model, we mainly deal with the correct identification of facts, as these are the starting point to build the initial attribute tree. According to some heuristics reported in [13], we suggest that a table in the data source is a candidate fact provided that: (a) it is a very frequently updated relation; (b) it has numeric attributes; (c) it does not have its own primary key; (d) its primary key is composed of two or more foreign keys; and (e) it represents a many-to-many relationship among relations (a rough check of these heuristics is sketched below).
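A rough rendering of heuristics (b)-(e) over assumed dictionary-based table metadata follows (illustrative Python; the update-frequency statistics needed for heuristic (a) are omitted).

def is_candidate_fact(table):
    """table: dict with 'columns' (name -> type), 'primary_key' and 'foreign_keys'
    (lists of column names). Informal check of heuristics (b)-(e)."""
    numeric_types = {'int', 'float', 'decimal', 'numeric'}
    has_numeric = any(t in numeric_types for t in table['columns'].values())      # (b)
    pk, fks = set(table['primary_key']), set(table['foreign_keys'])
    pk_of_fks = len(pk) >= 2 and pk <= fks       # (c)+(d): no own key, PK made of two or more FKs
    return has_numeric and pk_of_fks             # (e) follows from such a bridging primary key

sale = {'columns': {'prodId': 'int', 'orderId': 'int', 'quantity': 'int'},
        'primary_key': ['prodId', 'orderId'], 'foreign_keys': ['prodId', 'orderId']}
assert is_candidate_fact(sale)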
We consider the facts involved in the constraints coming from the requirement analysis. Given a fact F1 in a constraint, we choose a candidate fact F2 in the integrated schema such that F2 corresponds to F1. We assume no inconsistencies exist (neither syntactic nor semantic) between user requirements and data sources, though in many cases user requirements do not agree with the concepts in the data sources.
Given a fact, we now show how to build the initial attribute tree. Of course, there can be as many trees as there are facts.
Assumption 1. Let R(X1, …, Xn) be a relation, and let G = (N, E) be an attribute tree. We assume Xi ∈ N, for i = 1, …, n.
We also assume that G = (N, E) is the tree obtained from R by invoking the algorithm tree(R). In particular, G = (N, E) is the tree constructed by considering the relation R as a starting point.
Assumption 2. Let R(X1, …, Xn) be a relation, and let G = (N, E) be a tree. We assume (Xi, Xj) ∈ E if Xi is the primary key of R and i ≠ j.
On the basis of Assumption 2, the edge (Xi, Xj) states that the non-trivial (for i ≠ j) functional dependency Xi → Xj holds on R (in this case, established by a primary key constraint).
It is worth noting that the primary key of a relation can be composed of more than one attribute. In this case, the node representing the primary key is a composite node.
Example 3. Let Student(code, university, name, address) be a relation, whose primary
key is composed of code and university. Then, naming the primary key by the relation
name, we have that:
On the basis of Assumption 3, an edge can also indicate the functional dependency established by a foreign key constraint. So, the next Assumption 4 describes how to build a tree starting from a relation having r foreign keys. Accordingly, we assume that the algorithm tree is able to navigate among relations via foreign key constraints. To this end, Assumption 5 describes how to build a tree when a many-to-many relationship is encountered in the navigation process.
Assumption 4. Let R1(X11, …, X1h), R2(X21, …, X2k), …, Rr(Xr1, …, Xrq) be r relations, and let T(Z1, …, Zp) be a relation where r ≤ p. If
Xiji is the primary key of the relation Ri, for i = 1, 2, …, r, and
Zt1, …, Ztr ∈ T, such that
GrHyMM: A Graph-Oriented Hybrid Multidimensional Model 93
Assumption 5. Let R(X1, …, Xh), S(Y1, …, Yk), and T(Z1, …, Zp) be three relations, and let G = (N, E) be a tree. If
Example 4. Let us consider the schema of Figure 3(a). The algorithm tree starts from Sale and reaches Cataloging. The primary key of the Cataloging relation is composed of foreign keys. In fact, this relation is an intermediate relation that establishes a many-to-many relationship between Product and Catalog. So, the edge between Product and Cataloging is represented by a dot-headed arrow, as depicted in Figure 3(b), to indicate that many occurrences of the Cataloging relation correspond to a given prodId instance.
In conclusion, we note that, sometimes, a node can have multiple parents. This happens when an attribute is referenced by more than one table. In this case, the tree becomes a graph or a semi-tree, though there is always a root node representing the fact.
In what follows, given an attribute A, we denote with A the node representing the corresponding attribute. Moreover, we indicate with A → B the edge existing from the A node to the B node. At last, we denote with (A, B) :- C the fact that the attribute C can be computed using A and B, that is, C is a derived measure. As an example, (price, quantity) :- amount means that there exists an expression to compute amount
Fig. 3. (a) Source schema with relations Catalog(catId, catDesc), Product(prodId, prodDesc, stock), Order(orderId, date), Cataloging(catId, prodId), and Sale(prodId, orderId, quantity); (b) the corresponding attribute tree
using price and quantity. The basic operations on a tree are: (a) add A → B, or adding an edge from A to B; (b) remove A → B, or removing the edge from A to B; and (c) prune A, or removing the A node together with all its children.
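These operations can be phrased, for illustration only, over a small parent-pointer structure (a Python sketch under our own naming; the paper manipulates attribute trees at the conceptual level).

class AttributeTree:
    def __init__(self, root):
        self.root = root
        self.parent = {root: None}            # node -> parent node

    def add(self, a, b):                      # add A -> B: B becomes a child of A
        self.parent[b] = a

    def remove(self, a, b):                   # remove A -> B
        if self.parent.get(b) == a:
            self.parent[b] = None

    def prune(self, a):                       # remove A together with all of its descendants
        victims, changed = {a}, True
        while changed:                        # collect descendants transitively
            changed = False
            for node, par in list(self.parent.items()):
                if par in victims and node not in victims:
                    victims.add(node)
                    changed = True
        for node in victims:
            self.parent.pop(node, None)

    def change_parent(self, b, new_parent):   # delete the old edge of B and attach B to the new parent
        self.parent[b] = new_parent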
Let us consider a tree T and a constraint coming from the user requirements. Informally, we create as many root children as there are measures in the constraint. Moreover, we add a root child for each first dimensional level in the constraint. In a recursive way, the other dimensional levels are added as children nodes of their own predecessor levels. In general, when we add a new node B to a parent node A, the node B can be created ex novo or can already be present in the tree. In the latter case, the edge between B and the old parent of B must be deleted and a new edge between B and the new parent A must be created; this is the so-called change parent operation.
Algorithm 1 performs the attribute tree remodelling.
Example 5. On the tree of Figure 4(a), given the constraint shipments[client; location; day, month, year].[cost]; we perform the following operations. First, since cost can be computed as the sum of transport_cost and delivery_cost, we delete both the transport_cost and delivery_cost nodes, and we add the cost node as a root child (lines 6-8). Second, the client, location, and day nodes also become three distinct root children via a change parent operation (lines 15-18). Third, in the constraint, the only hierarchy to be built is formed by day, month, and year. So, a new node month becomes the child of day and a new node year becomes the child of month (line 24). All the other nodes, such as carrier and order, are pruned (lines 30-31). The resulting tree is shown in Figure 4(b).
Fig. 4. (a) shipments attribute tree; (b) remodelled shipments attribute tree
In the validation process [14], we have several attribute trees (whose collection forms a conceptual schema) and a workload composed of high-level queries. A query is assumed to be validated if there exists at least one attribute tree such that the following conditions hold: (a) the fact is the root of the tree; (b) the measures are children nodes of the root; (c) for each level in the aggregation pattern, there exists a path from the root to a node X, where X is a non-leaf node representing the level; and (d) for each attribute in the selection clause, there exists a path from the root to a node Y, where Y is a leaf node representing the attribute.
If all the queries are validated, then each attribute tree can be considered as a cube, where the root is the fact, non-leaf nodes are aggregation levels, and leaf nodes are descriptive attributes belonging to a level. So, the conceptual design ends and the designer can transform the conceptual schema into a logical one. On the other hand, if a query cannot be validated, then the designer has to modify the tree appropriately. For example, if an attribute of the selection clause is not in the tree, then the designer can decide to add a further node. A deeper discussion of the validation process can be found in [6], along with the methodology to execute this task in an automatic way.
Example 6. Let us consider the following query: sum(shipments[client; year ≥ 2000 and year ≤ 2011].cost);. Such a query is validated on the tree of Figure 4(b) because shipments is the root of the tree, cost is a child node of the root, client is a non-leaf node reachable from the root (that is, client is a correct aggregation level belonging to the first dimension), and year is a leaf node reachable from the root.
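A simplified validation check in the spirit of conditions (a)-(d) might look as follows (hypothetical Python; the query fields and the explicit set of leaf nodes are assumptions made for the example, and the tree is any object exposing a root and a node-to-parent dictionary such as the sketch given earlier).

def reaches_root(tree, node):
    # True if node is connected to the root by following parent pointers upwards.
    while node is not None:
        if node == tree.root:
            return True
        node = tree.parent.get(node)
    return False

def validate(tree, query, leaves):
    """query: dict with 'fact', 'measures', 'levels' (aggregation pattern) and
    'selection' (attributes of the selection clause); leaves: set of leaf nodes."""
    ok_fact = query['fact'] == tree.root                                                   # (a)
    ok_measures = all(tree.parent.get(m) == tree.root for m in query['measures'])          # (b)
    ok_levels = all(reaches_root(tree, l) and l not in leaves for l in query['levels'])    # (c)
    ok_selection = all(reaches_root(tree, a) and a in leaves for a in query['selection'])  # (d)
    return ok_fact and ok_measures and ok_levels and ok_selection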
The validation process ends the conceptual design. Two similar approaches can be found in [15, 16]. They differ from our approach in that they produce conceptual schemas by reconciling user requirements with data sources. In this way, a possible lack of compliance with the users' requirements can arise, since some requirements may have been disregarded in order to preserve data integrity. On the other hand, we start from the data sources and use user requirements as constraints to remodel the source schema in an automatic way, preserving both business needs and data integrity.
6 Conclusions
In this paper, we have presented a multidimensional model for data warehouse conceptual design. This model can be adopted in a hybrid methodology for data warehouse design. In fact, the model uses a graph-oriented representation of an integrated relational schema. So, the data remodelling process is expressed according to traditional operations on a tree and is based on a set of constraints derived from the requirement analysis, along with a preliminary workload used to validate the conceptual schema. The main contribution of this model is that all the steps to be performed in the conceptual design can be automated, thanks to the high degree of formalization given by i* schemas and to the precise and non-ambiguous rules to be followed in the remodeling activity.
We are now working on the development of a logic program that considers such rules for the construction and the remodeling of the attribute trees, and for the validation of the resulting conceptual schema.
References
1. Ballard, C., Herreman, D., Schau, D., Bell, R., Kim, E., Valencic, A.: Data Modeling Techniques for Data Warehousing. IBM Redbooks (1998)
2. Kimball, R.: The Data Warehouse Lifecycle Toolkit, 2nd edn. Practical Techniques for Building Data Warehouse and Business Intelligence Systems. John Wiley & Sons, Chichester (2008)
3. Romero, O., Abelló, A.: A Survey of Multidimensional Modeling Methodologies. International Journal of Data Warehousing and Mining 5(2), 1–23 (2009)
4. Golfarelli, M., Maio, D., Rizzi, S.: The Dimensional Fact Model: a Conceptual Model for Data Warehouses. International Journal of Cooperative Information Systems 7, 215–247 (1998)
5. dell'Aquila, C., Di Tria, F., Lefons, E., Tangorra, F.: Dimensional Fact Model Extension via Predicate Calculus. In: 24th International Symposium on Computer and Information Sciences, Cyprus, pp. 211–217 (2009)
6. dell'Aquila, C., Di Tria, F., Lefons, E., Tangorra, F.: Logic Programming for Data Warehouse Conceptual Schema Validation. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2010. LNCS, vol. 6263, pp. 1–12. Springer, Heidelberg (2010)
7. Mazón, J.N., Trujillo, J., Serrano, M., Piattini, M.: Designing Data Warehouses: from Business Requirement Analysis to Multidimensional Modeling. In: Cox, K., Dubois, E., Pigneur, Y., Bleistein, S.J., Verner, J., Davis, A.M., Wieringa, R. (eds.) Requirements Engineering for Business Need and IT Alignment. University of New South Wales Press (2005)
8. Rahm, E., Bernstein, P.: A Survey of Approaches to Automatic Schema Matching. The International Journal on Very Large Data Bases 10(4), 334–350 (2001)
9. Atzeni, P., Cappellari, P., Torlone, R., Bernstein, P., Gianforme, G.: Model-Independent Schema Translation. The International Journal on Very Large Data Bases 17(6), 1347–1370 (2008)
10. Romero, O., Abelló, A.: Automating Multidimensional Design from Ontologies. In: Proceedings of the ACM 10th International Workshop on Data Warehousing and OLAP, Lisbon, Portugal, pp. 1–8 (2007)
11. Spyns, P., Meersman, R., Jarrar, M.: Data Modelling versus Ontology Engineering. ACM SIGMOD Record 31(4), 12–17 (2002)
12. Romero, O., Abelló, A.: Multidimensional Design by Examples. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2006. LNCS, vol. 4081, pp. 85–94. Springer, Heidelberg (2006)
13. Carmè, A., Mazón, J.N., Rizzi, S.: A Model-Driven Heuristic Approach for Detecting Multidimensional Facts in Relational Data Sources. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2010. LNCS, vol. 6263, pp. 13–24. Springer, Heidelberg (2010)
14. Golfarelli, M., Rizzi, S.: A Methodological Framework for Data Warehouse Design. In: 1st ACM International Workshop on Data Warehousing and OLAP, Washington D.C., USA, pp. 3–9 (1998)
15. Giorgini, P., Rizzi, S., Garzetti, M.: GRAnD: A Goal-oriented Approach to Requirement Analysis. Decision Support Systems 45(1), 4–21 (2008)
16. Mazón, J.N., Trujillo, J., Lechtenbörger, J.: Reconciling Requirement-driven Data Warehouses with Data Sources via Multidimensional Normal Forms. Data Knowl. Eng. 63(3), 725–751 (2007)
Ontologies and Functional Dependencies for
Data Integration and Reconciliation
1 Introduction
In the past decades, Enterprise and Information Integration (EII) became an established business, in which commercial and academic tools integrating data from various sources exist. They provide uniform and transparent access to data. The spectacular development of this business is largely due to companies requiring the ability to access data located over the Internet and within their Intranets [8,12]. The inputs of the integration problem are a set of distributed, heterogeneous, autonomous sources, where each one has its own schema and population. Its output is a unified description of the source schemas via an integrated schema and mapping rules allowing access to the data sources. The construction of a data integration system is a hard task due to the following main points: (a) the large number of data sources that are candidates for integration, (b) the lack of explicit semantics of the sources, (c) the heterogeneity of the sources, and (d) the autonomy of the sources. (a) The explosion of data sources: the number of data sources involved in the integration process is increasing. The amount of information generated in
the world increases by 30% every year and this rate is bound to accelerate [8], especially in domains such as E-commerce, engineering, etc. Integrating these mountains of data requires automatic solutions. (b) The lack of explicit semantics of the sources: the semantics of data sources is usually not explicit. Most sources participating in the integration process were designed to satisfy day-to-day applications and not to be integrated in the future. Often, the small amount of semantics contained in their conceptual models is lost, since only their logical models are implemented and used by applications. The presence of a conceptual model allows designers to express the application requirements and domain knowledge in a form intelligible to a user. Thus, the absence of a conceptual model, or of any other semantic representation, in the final databases makes their interpretation and understanding complicated, even for designers who have good knowledge of the application domain. (c) The heterogeneity of data sources impacts both structure and semantics. Structural heterogeneity exists because data sources may have different structures and/or different formats to store their data. The autonomy of the sources increases heterogeneity significantly. Indeed, the data sources are designed independently by various designers with different application objectives. Semantic heterogeneity presents a major issue in developing integration systems [11]. It is due to different interpretations of real-world objects, generating several categories of conflicts (naming conflicts, scaling conflicts, confounding conflicts and representation conflicts [10]). (d) Autonomy of sources: most sources involved in the data integration are fully autonomous in choosing their schemas.
To deal with the semantic problem and ensure an automatic data integration, an important number of research studies propose the use of ontologies, and several integration systems were proposed around this hypothesis. We can cite COIN [10], Observer [14], OntoDaWa [3], etc. In [3], we claim that if the semantics of each source participating in the integration process is explicit (in an a priori way), the integration process becomes automatic. This means the used ontology exists before the creation of the sources. This assumption is reasonable, since several domain ontologies exist in various areas: medicine, engineering, travel, etc. The explicitation of the different concepts used in a database leads to the concept of ontology-based databases (OBDB). Several academic and industrial systems offer solutions to manage this type of database (e.g., Jena, Sesame, Oracle, IBM SOR).
To deal with the data reconciliation issue, we figure out that most existing integration systems (with a mediator architecture) suppose that there is a common single identifier for each common concept between the sources. This assumption facilitates the reconciliation of query results, but it violates the sources' autonomy. Other research efforts use entity reconciliation methods [4,17] based on entity matching and data fusion. These approaches are efficient for linguistic and information retrieval applications [8], but they may suffer in the context of sensitive applications such as banking, healthcare, travel, engineering, etc. The similarity between conceptual models and ontologies and the recent work focusing on the definition of functional dependencies (FDs) on ontologies [16]
2 Background
In this section, we present concepts and definitions related to ontologies and ontology-based databases (OBDB) to facilitate the understanding of our proposal. An OBDB is a database usually composed of four parts [7], since it stores both data and the ontology describing their sense. Parts 1 and 2 are the traditional parts available in all DBMSs, namely the data part, which contains instance data, and the meta-base part, which contains the system catalog. The ontology part (3) allows the representation of ontologies in the database. The meta-schema part (4) records the ontology model in a reflexive meta-model. For the ontology part, the meta-schema part plays the same role as the one played by the meta-base in traditional databases. By means of naming conventions, the meta-base part also represents the logical model of the content and its link with the ontology, thus
R is the starting point of all paths in the left part and the right part, so that a FD expresses relationships among properties of a single instance of the class R.
After the FD formalisation, we propose to extend the initial ontology model proposed in Section 2 as follows: O : <C, P, Sub, Applic, FD>, where FD is a binary relationship FD : C → (2^P × 2^P × … × 2^P, 2^P) which associates to each class c of C the set of the functional dependencies (LP, RP) of which the class c is the root (fd c : LP → RP).
Note that the reconciliation of query results in a mediator architecture leads to four possible cases: (1) manual reconciliation based on the designers' experience and deep knowledge of the data sources, which is practically impossible in real life, where a large number of sources is involved; (2) only sources having common identifiers are taken into consideration to process queries; in this case, the mediator may propagate the query to the sources having the common identifiers, a solution that compromises the quality of the returned results; (3) query results are merged, where some instances overlap, which may cause errors; (4) overlapping instances may be discarded using probabilistic reconciliation.
The presence of FDs may help and facilitate data reconciliation, especially when no common identifier is used by the various sources. To illustrate this point, let us consider the following example.
Example 1. Let S1, S2 and S3 be three sources containing the same relation Customer, with different properties, as follows: S1.Customer(id(PK), name, address, phoneNumber), S2.Customer(id(PK), name, phoneNumber) and S3.Customer(phoneNumber(PK), name, address). On this table, the following FDs are defined: fd1: Customer: id → name, fd2: Customer: id → address, fd3: Customer: id → phoneNumber, fd4: Customer: phoneNumber → name, fd5: Customer: phoneNumber → address.
This table has two candidate keys: id and phoneNumber. Suppose that the mediator schema contains a Customer relation with the following properties: id(PK), name, address. Suppose the following query: list the names and addresses of all customers. The mediator decomposes this query over the three sources. Without FDs, we cannot reconcile all sources, since the source S3 has a different identifier. By using fd4: phoneNumber → name and fd5: phoneNumber → address, we notice that the attribute phoneNumber is a common candidate key among the three sources. Therefore, a reconciliation of the results coming from these three sources becomes possible.
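The way the FDs of Example 1 yield a common reconciliation key can be sketched as follows (illustrative Python; keeping FDs as simple pairs and sources as property lists is our own simplification, and transitivity of FDs is ignored for brevity).

def reconciliation_keys(query_props, sources, fds):
    # Properties that (i) appear in every source and (ii) functionally determine
    # all projected properties of the query.
    shared = set.intersection(*(set(p) for p in sources.values()))
    keys = []
    for cand in shared:
        determined = {rhs for lhs, rhs in fds if lhs == cand}
        if set(query_props) - {cand} <= determined:
            keys.append(cand)
    return keys

fds = [('id', 'name'), ('id', 'address'), ('id', 'phoneNumber'),   # fd1-fd3
       ('phoneNumber', 'name'), ('phoneNumber', 'address')]        # fd4, fd5
sources = {'S1': ['id', 'name', 'address', 'phoneNumber'],
           'S2': ['id', 'name', 'phoneNumber'],
           'S3': ['phoneNumber', 'name', 'address']}

# "List names and addresses of all customers": phoneNumber is the only shared
# property determining both name and address, so it acts as the reconciliation key.
print(reconciliation_keys(['name', 'address'], sources, fds))   # ['phoneNumber']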
The schema Sch, the source schemas S and the mapping M are initially empty
(Sch(c) = ∅ for all c ∈ C, S = ∅ and M(c) = ∅ for all c ∈ C).
Integrating a new source. The mediator ontology is determined only once, whereas
the schema Sch, the source schemas S and the mapping M are updated after
each integration of a new source. The integration of a new source Si :<
Oi, Ii, Schi, Popi > is run in the following steps:
1. We update the source schemas S by adding the schema of the new source
(S = S ∪ Si :< OLi, SchLi >).
2. In the ontology OLi :< CLi, PLi, SubLi, ApplicLi >, we keep only the classes
and properties existing in the mediator ontology (OLi = O ∩ Oi).
3. We import the schemas of the OLi classes from Schi (for all c ∈ CLi, SchLi(c) =
Schi(c)).
4. We update the schema of the mediator ontology classes Sch by adding the
properties valuated in Si to the schema of their classes (Sch(c) = Sch(c) ∪ {p ∈
SchLi(c)}).
5. We update the mapping of the mediator classes by adding the mappings between
the classes of the mediator ontology O and the classes of the new source
ontology OLi (for all c ∈ CLi, M(c) = M(c) ∪ Si.c) (see the sketch below).
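As a rough illustration of these five update steps, the sketch below keeps the mediator state in plain Python dictionaries; the function and field names (integrate_source, "classes", "schema") are our own and not part of the proposal.

```python
# A minimal sketch of steps 1-5, assuming the mediator state is kept in
# plain Python dictionaries; all names are illustrative.
def integrate_source(S, Sch, M, mediator_classes, source):
    """source: dict with 'name', 'classes' (local ontology classes) and
    'schema' mapping each class to the set of properties valuated in the source."""
    # Step 1: add the local ontology/schema of the new source to S
    S[source["name"]] = source
    # Step 2: keep only classes also present in the mediator ontology (OL_i = O ∩ O_i)
    local_classes = set(source["classes"]) & set(mediator_classes)
    for c in local_classes:
        # Step 3: import the local schema of class c
        local_schema = set(source["schema"].get(c, ()))
        # Step 4: extend the mediator schema of c with the properties valuated in the source
        Sch.setdefault(c, set()).update(local_schema)
        # Step 5: record the mapping between the mediator class and the source class
        M.setdefault(c, set()).add((source["name"], c))
    return S, Sch, M
```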
The query engine performs the following tasks for a given user query Qi. Let
Proj be the set of projected properties used in Qi. Among this set, two types of
FDs may be distinguished: (1) direct FDs that already exist in the mediator ontology,
where a property from Proj appears in the right part of an FD, and (2) generated FDs,
obtained using an algorithm similar to that of [15]. The presence of both types of dependencies
allows us to generate the reconciliation key. The query engine then identifies
the relevant sources and rewrites the global query defined on the mediator ontology
into local queries defined on the sources, each of which is sent to the relevant
source. Finally, the reconciliator merges the results using the reconciliation key
and sends the final result to the user interface.
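The following hedged sketch illustrates one way such a reconciliation key could be derived from the FDs attached to a class; it ignores the transitive closure of the dependencies for brevity, and its names and data are illustrative rather than a reproduction of the algorithm of [15].

```python
# Find a reconciliation key: a property valuated in every relevant source
# that (directly) determines all projected properties.
def reconciliation_key(proj, fds, source_schemas):
    """fds: set of (lhs, rhs) pairs over single properties, e.g. ('phoneNumber', 'name').
    source_schemas: one set of valuated properties per relevant source."""
    for k in {lhs for lhs, _ in fds}:
        determined = {rhs for lhs, rhs in fds if lhs == k} | {k}
        if proj <= determined and all(k in schema for schema in source_schemas):
            return k
    return None

fds = {("id", "name"), ("id", "address"), ("id", "phoneNumber"),
       ("phoneNumber", "name"), ("phoneNumber", "address")}
schemas = [{"id", "name", "address", "phoneNumber"},
           {"id", "name", "phoneNumber"},
           {"phoneNumber", "name", "address"}]
print(reconciliation_key({"name", "address"}, fds, schemas))  # -> phoneNumber
```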
that our system efficiently executes queries involving a small set of classes (fewer
joins), e.g. Q14(x) :- UndergraduateStudent(x), but for queries involving a large
number of classes, e.g. Q9(x) :- Student(x), Faculty(y), Course(z), advisor(x, y),
takesCourse(x, z), takesCourse(y, z), the response time is quite high, although still
reasonable. An interesting issue arising from this result is to separate the query response
time into two parts: mediator processing time (including finding the functional
dependencies that hold in the query, deriving the reconciliation key and reconciling
the results) and local query evaluation.
In the same direction as the previous experiment, we conducted another one
by considering a low-cost and a high-cost query (Q7 and Q9) and varying the
number of instances of the 10 sources used. Figure 3 shows the obtained results. An
interesting result comes from the query considered as low-cost in the
first experiment: when the number of instances increases, the join operation
becomes costly. This shows that the query response time
depends heavily on the sources and their ability to process queries, and not
on the mediator.
6 Conclusion
The need for developing semantic integration systems increases with the evolution
of domain ontologies in various domains such as engineering, medicine,
etc. The presence of ontologies contributes largely to solving the heterogeneity of
sources. Some current integration systems suppose that the manipulated sources
have similar keys to ensure data integration, which violates the autonomy of the
sources. Others use statistical techniques to reconcile data; in sensitive
domains, such techniques cannot be used. In this paper, we proposed a complete
ontology-based integration method that covers the most important phases of the system
integration life cycle. The presence of an ontology contributes both to reducing heterogeneity
and to offering mechanisms for data reconciliation, since it is enriched
with functional dependencies defined on each ontology class. Our approach is evaluated
using the dataset of the Lehigh University Benchmark. The obtained results
show its efficiency and feasibility.
Two main issues arising from our preliminary work should be explored: (i)
conducting a large-scale evaluation to measure the real efficiency of our system
and (ii) defining metrics to measure the quality of our integration system.
References
1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases (1995)
2. Bellatreche, L., Ait Ameur, Y., Chakroun, C.: A design methodology of ontology based database applications. Logic Journal of the IGPL, 1–18 (2010)
3. Bellatreche, L., Xuan, D.N., Pierra, G., Dehainsala, H.: Contribution of ontology-based data modeling to automatic integration of electronic catalogues within engineering databases. Computers in Industry 57(8-9), 711–724 (2006)
4. Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys 41(1), 1–41 (2008)
5. Calbimonte, J.P., Porto, F., Maria Keet, C.: Functional dependencies in OWL ABox. In: Brazilian Symposium on Databases (SBBD), pp. 16–30 (2009)
6. Calvanese, D., Giacomo, G., Lenzerini, M.: Identification constraints and functional dependencies in description logics. In: Proc. of IJCAI, pp. 155–160 (2001)
7. Dehainsala, H., Pierra, G., Bellatreche, L.: OntoDB: An ontology-based database for data intensive applications. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 497–508. Springer, Heidelberg (2007)
8. Dong, X.L., Naumann, F.: Data fusion – resolving data conflicts for integration. PVLDB 2(2), 1654–1655 (2009)
9. Fan, W.: Dependencies revisited for improving data quality. In: PODS, pp. 159–170 (2008)
10. Goh, C.H., Bressan, S., Madnick, S.E., Siegel, M.D.: Context interchange: New features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems 17(3), 270–293 (1999)
11. Hakimpour, F., Geppert, A.: Global Schema Generation Using Formal Ontologies. In: Spaccapietra, S., March, S.T., Kambayashi, Y. (eds.) ER 2002. LNCS, vol. 2503, pp. 307–321. Springer, Heidelberg (2002)
12. Halevy, A.Y., Ashish, N., Bitton, D., Carey, M.J., Draper, D., Pollock, J., Rosenthal, A., Sikka, V.: Enterprise information integration: successes, challenges and controversies. In: SIGMOD, pp. 778–787 (2005)
13. Hong, J., Liu, W., Bell, D.A., Bai, Q.: Answering queries using views in the presence of functional dependencies. In: Jackson, M., Nelson, D., Stirk, S. (eds.) BNCOD 2005. LNCS, vol. 3567, pp. 70–81. Springer, Heidelberg (2005)
14. Mena, E., Kashyap, V., Sheth, A.P., Illarramendi, A.: OBSERVER: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. In: CoopIS, pp. 14–25 (1996)
15. Mohania, M.K., Radha Krishna, P., Pavan Kumar, K.V.N.N., Karlapalem, K., Vincent, M.W.: Functional dependency driven auxiliary relation selection for materialized views maintenance. In: COMAD (2005)
16. Romero, O., Calvanese, D., Abelló, A., Rodríguez-Muro, M.: Discovering functional dependencies for multidimensional design. In: ACM 12th Int. Workshop on Data Warehousing and OLAP (2009)
17. Saïs, F., Pernelle, N., Rousset, M.C.: Combining a logical and a numerical method for data reconciliation. Journal of Data Semantics 12, 66–94 (2009)
18. Toman, D., Weddell, G.E.: On keys and functional dependencies as first-class citizens in description logics. J. of Automated Reasoning 40(2-3), 117–132 (2008)
A Comprehensive Framework on
Multidimensional Modeling
1 Introduction
Developing a data warehousing system is never an easy job and raises some
interesting challenges. One of these challenges is modeling multidimensionality.
OLAP tools are conceived to exploit the data warehouse for analysis
tasks based on the multidimensional (MD) paradigm and, therefore, the data
warehouse must be structured according to the MD model. A lot of effort has
been devoted to MD modeling, and several models and design methods have
been developed and presented in the literature. Consequently, we can nowadays
design an MD conceptual schema, create it physically and later exploit it through
the model's algebra or calculus (implemented in the exploitation tools).
MD modeling was first introduced by Kimball in [9]. Kimball's approach was
well received by industry and also introduced the first method to derive
the data warehouse logical schema. Similar to traditional information systems
modeling, Kimball's method is requirement-driven: it starts by eliciting the business
requirements of an organization and, through a step-by-step guide, we are able to
derive the MD schema. Only at the end of the process are the data sources considered,
to map data from sources to target.
In short, Kimball's approach follows a traditional modeling approach (i.e.,
from requirements), but it set down the principles of MD modeling. MD modeling
is radically opposite to OLTP systems modeling: the data warehouse conceptual
schema is directly derived from the organization's operational sources and provides
a single, detailed, integrated and homogenized view of the business domain.
Consequently, the data warehouse can be thought of as a strategic view of the
organization's data and, for this reason, and unlike most information systems that
are designed from scratch, the organization's data sources must be considered as
first-class citizens in the data warehouse design process. This major additional
requirement has such interesting consequences that it gave rise to
a new research topic, and up to now several MD modeling methods have been
introduced in the literature. With the perspective of time, we may now highlight
those features that drew the attention of the community. The evolution of the
modeling methods introduced in the literature pivots on a crucial aspect: the
dichotomy of requirements versus data sources (and how to deal with it).
In this paper we discuss what current approaches provide and what their
major flaws are. Our contribution lies in a comprehensive framework that does not
focus on how these approaches work but on what they provide. Importantly, note
that by no means do we aim at comparing current approaches; rather, we provide a
comprehensive, general picture of MD modeling and identify what is (yet) missing
in this area. The criteria used for this analysis can be summarized as follows: the
role played by end-user requirements and data sources in each method, the degree
of automation achieved and the quality of the output produced (i.e., which
MD concepts and features they really consider). The use of these criteria is
justified by the conclusions drawn from a previous, exhaustive analysis of current
design methods that can be found in [17].
The paper is structured as follows. Section 2 briefly summarizes our previous
research on MD design and highlights how this area has evolved with time. Next,
Section 3 provides a detailed, comprehensive discussion of what can be achieved
by using current approaches in real projects. We wrap up the discussion by pointing
out the main flaws that still need to be addressed in order to better support
data warehouse design.
2 Multidimensional Modeling
In this section we introduce the background of MD modeling. Our objective
here is to provide an insightful view of how this area has evolved with time. The
interested reader is referred to [17] for details.
Shortly after Kimball introduced his ad hoc modeling method for data warehouses [10],
some other methods were presented in the literature (e.g., [4,6,2,7,12]).
Like Kimball's method, these methods were originally regarded as step-by-step
guides to be followed by a data warehouse expert who starts by gathering the end-user
requirements. However, unlike Kimball's work, they give more and more
relevance to the data sources. Involving the data sources in these approaches
means that it is compulsory to have well-documented data sources (e.g., with
up-to-date conceptual schemas) at the expert's disposal, but it also entails two
main benefits: on the one hand, the user may not know all the potential analyses
contained in the data sources, and by analyzing them we may find unexpected potential
analyses of interest to the user; on the other hand, we can ensure that the
data warehouse can be populated with data available within the organization.
As said, to carry out these approaches manually it is compulsory to have well-documented
data sources, but in a real organization, the data sources'
documentation may be incomplete, incorrect or may not even exist [6] and,
in any case, it would be rather difficult for a non-expert designer to follow these
guidelines. Indeed, when automating this process it is essential not to depend on
the expert's ability to properly apply the chosen method, and to avoid the tedious
and time-consuming task (even unfeasible when working over large databases)
of analyzing the data sources.
In order to solve these problems, several new methods automating the design
task were introduced in the literature [14,21,8]. These approaches work directly
over relational database logical schemas. Thus, although they are restricted to a
specific technology, they work with up-to-date data that can be queried and managed by
computers. They also argue that restricting themselves to relational technology makes sense,
since it is nowadays the most widely used technology for operational databases.
Regarding the process carried out, these methods follow a data-driven process focusing
on a thorough analysis of the data sources to derive the data warehouse
schema in a reengineering process. This process consists of techniques and design
patterns that must be applied over the data source schemas to identify data
likely to be analyzed from an MD perspective.
Nevertheless, a requirement analysis phase is crucial to meet the user's needs
and expectations [3,5,22,15,11,1]. Otherwise, the users may find themselves frustrated
because they are not able to analyze data of their interest, entailing
the failure of the whole system. Today, it is assumed that the ideal scenario
to derive the data warehouse conceptual schema embraces a hybrid approach
(i.e., a combined data-driven and requirement-driven approach) [22]. Then, the
resulting MD schema will satisfy the end-user requirements and, simultaneously, will
have been conciliated with the data sources.
According to [22], MD modeling methods may be classified within a demand-driven,
a supply-driven or a hybrid framework:
- Demand-driven approaches (DDAs) derive the MD schema from the end-user requirements and only later conciliate it with the data sources.
- Supply-driven approaches (SDAs) lead the design task from a thorough analysis of the data sources.
- Hybrid approaches combine both paradigms, typically in sequential supply-driven and demand-driven stages.
Each paradigm has its own advantages and disadvantages. Carrying out an exhaustive
search for dimensional concepts among all the concepts of the domain
(as SDAs do) has a main benefit with regard to those approaches that derive
the schema from requirements and later conciliate it with the data sources
(i.e., DDAs): in many real scenarios, the user may not be aware of all the potential
analyses contained in the data sources and may, therefore, overlook relevant
knowledge. Demand-driven and current hybrid approaches do not consider this
and assume that requirements are exhaustive. Thus, knowledge derived from
the sources that is not depicted in the requirements is not considered and is discarded.
As a counterpart, SDAs tend to generate too many results (since they overlook
the MD requirements, they must apply their design patterns all over the data
sources) and mislead the user with non-relevant information. Furthermore, DDAs
(or demand-driven stages within a hybrid approach) are not automated, whereas
supply-driven stages tend to facilitate their automation. The main reason is
that demand-driven stages would require formalizing the end-user requirements
(i.e., translating them into a language understandable by computers). Unfortunately,
most current methods handle requirements stated in languages (such as
natural language) lacking the required degree of formalization. Thus, matching
requirements over the data sources must be performed manually. However, the
time-consuming nature of this task can render it unfeasible when large databases
are used.
In general, most approaches do not automate the process and just present a
set of steps (i.e., a guideline) to be followed by an expert in order to derive the
MD schema. Mainly, these methods introduce different patterns or heuristics to
discover concepts likely to play an MD role, and to carry out these approaches
manually it is compulsory to have well-documented data sources at the expert's
disposal. This prerequisite is not easy to fulfill in many real organizations and,
in order to solve this problem, current automatable methods work directly over
relational databases (i.e., on up-to-date data). To our knowledge, only three
exceptions to this rule exist [20,19,13], which automate the process from ER
schemas (the first one) and ontologies (the other two). Consequently, all these
methods (or stages within hybrid approaches) follow a supply-driven paradigm
and thus rely on a thorough analysis of the sources.
All in all, DDAs assume that requirements are exhaustive, whereas SDAs rely
on discovering as much MD knowledge as possible. As a consequence, SDAs
generate too many results. Furthermore, current automatable methods follow an
SDA, whereas current DDAs overlook process automation, since they tend to
work with requirements at a high level of abstraction. Finally, all current hybrid
approaches follow a sequential approach with two well-differentiated steps: the
supply-driven and the demand-driven stages. Each of these stages, however,
suffers from the same drawbacks as pure SDAs or DDAs do.
make them hardly comparable. For example, some approaches claim to fully automate
the design task, but they do so by overlooking the end-user requirements
in a fully SDA fashion (thus making the user responsible for manually filtering the
results obtained according to his/her needs). Similarly, exhaustive DDAs claim
to derive high-quality outputs, but they completely overlook task automation.
For this reason, every approach fits a narrow range of scenarios and
does not provide an integrated solution for every real-world case, which, in turn,
forces data warehouse designers to come up with ad hoc solutions for each
project. For example, we cannot follow the same approach in a scenario where
the end-user requirements are clear and well known and in a scenario in which
the end-user requirements are not evident or cannot be easily elicited (e.g., this
may happen when the users are not aware of the analysis capabilities of their
own sources). Clearly, this is the major flaw of current approaches, which do not
suit the wide range of real projects a designer may meet. Interestingly,
it has already been pointed out [18] that, given a specific design scenario, the
necessity to provide requirements beforehand is smoothed by the availability of
semantically rich data sources. Lacking that, requirements gain relevance for
extracting the MD knowledge from the sources. In both cases, we can still achieve
an acceptable degree of automation and output quality, as discussed later on.
as a rhombus labeled with the first initial of each author plus the year of the
bibliographical item it represents. Furthermore, for the sake of understandability,
we provide the projection of each rhombus onto the three planes (green points for
the XZ plane projections, blue points for the XY plane and red points for the
YZ plane). Each approach is placed in the 3D space according to the conclusions
extracted from [17]. The first conclusion is that the duality requirements / data
sources is clearly shown in both figures, as SDAs and DDAs are placed on opposite
axes (to better appreciate it, check the plane projections of each point).
Requirements Specificity: DDAs integrate the end-user requirements, which
lead the whole process by exploiting the knowledge they capture. Therefore,
the quality and expressiveness of the input requirements must be high. On
the contrary, at the beginning of the process, SDAs do not need the end-user
requirements to work. Indeed, the results provided by SDAs are eventually shaped
by the end-user needs a posteriori. In this latter step, the end-users just state
their requirements by choosing their concepts of interest among the
results provided. Therefore, users could even state them on the fly
with regard to the output presented to them by SDAs.
Data Source Expressiveness: SDAs lead the process from a thorough analysis
of the data sources and, in general, they ask for high-quality inputs capturing
the data sources (i.e., relational schema, ER diagram, domain ontology, etc.). In
the case of inputs at the conceptual level, a mapping between the conceptual schema
and the sources, as well as means to access the data sources at the instance level,
are also required. Regarding DDAs, the quality of the inputs is not that relevant,
given that the requirements supply the semantics not captured in the sources.
Thus, they could even handle badly formed data sources (e.g., denormalized
sources).
Automation: The first figure shows that the degree of automation achieved is, in the
general case, medium or low. Only 6 approaches automate the design task up
to a fair degree. Regarding DDAs, new techniques to automatically manipulate
requirements during the design phase are needed. In this sense, [16] sets a basis
in this direction but it can only deal with relational sources. Conversely, SDAs
achieve an interesting degree of automation, but most of them turn out not to be
useful in practice due to the large set of assumptions made, for example about the kind
of sources (normally, only relational sources are allowed, which is clearly unsatisfactory
nowadays, when unstructured data and the Web are relevant sources
of knowledge) or additional ones (such as relational schemas in, at least, 3NF).
Furthermore, filtering techniques based on objective evidence are a must. SDAs
tend to generate too many results. Consequently, they unnecessarily overwhelm
users with blindly generated combinations whose meaning has not been analyzed
in advance. Eventually, they put the burden of (manually) analyzing and filtering
the results onto the designer's shoulders, but the time-consuming nature of
this task can render it unfeasible when large data sources are considered. To our
knowledge, only [19] filters the results obtained prior to showing them to the
user. Furthermore, all these facts directly affect the computational complexity
of SDAs.
Quality Output: Both SDAs and DDAs are able to extract valuable knowledge
from the requirements / data sources but only a few of them deal with concepts
Several methods for supporting the data warehouse modeling task have been
provided. However, they suffer from some significant drawbacks, which need to
be addressed. In short, DDAs assume that requirements are exhaustive (and
therefore do not consider that the data sources may contain alternative interesting
evidence for analysis), whereas SDAs (i.e., those leading the design task from a
thorough analysis of the data sources) rely on discovering as much MD knowledge
as possible from the data sources. As a consequence, SDAs generate too
many results, which misleads the user. Furthermore, automation of the design task
is essential in this scenario, as it removes the dependency on an expert's ability
to properly apply the chosen method, and the need to analyze the data sources,
which is a tedious and time-consuming task (and can be unfeasible when working
with large databases). In this sense, current automatable methods follow an
SDA, whereas current DDAs overlook process automation, since they tend
to work with requirements at a high level of abstraction. Indeed, this scenario
is repeated for the SDA and DDA stages within current hybrid approaches,
which suffer from the same drawbacks as pure DDA and SDA approaches.
Consequently, previous experience in this field has shown that the data
warehouse MD conceptual schema must be derived from a truly hybrid approach,
i.e., by considering both the end-user requirements and the data sources as first-class
citizens. Currently, several methods (i.e., detailed design approaches) and
dissertations (i.e., high-level discussions highlighting the necessities in each real
scenario) for supporting the data warehouse design task have been introduced in
the literature, but none of them provides an integrated and automated solution
embracing both aspects. On the one hand, dissertations about how the design
task must be adapted to every real-world scenario provide an insightful idea of
how to proceed in each case. However, they fail to provide detailed algorithms
to undertake this task (thus, ad hoc solutions are needed). On the other hand,
the detailed methods introduced tend to focus on a narrow range of scenarios.
For example, today it is assumed that the approach to follow in a scenario where
the end-user requirements are clear and well known is completely different from
that in which the end-user requirements are not evident or cannot be easily
elicited (for example, this may happen when the users are not aware of the
analysis capabilities of their own sources). Similarly, the necessity to provide
requirements beforehand is smoothed by the availability of semantically rich
data sources; lacking that, requirements gain relevance for extracting the MD
knowledge from the sources. Indeed, a combined and comprehensive framework
to decide, according to the inputs provided in each scenario, which is the best
approach to follow is missing. This framework should be built considering that:
- If the end-user requirements are well known beforehand, we can benefit from
the knowledge captured in the data sources, but we should guide the design
task according to the requirements; consequently, we will be able to
work with and handle semantically poorer data sources. In other words, provided with
high-quality end-user requirements, we can guide the process and overcome
the fact of disposing of semantically poor data sources.
- As a counterpart, in a scenario in which the available data sources are semantically
richer, the approach should be guided by a thorough analysis of the
data sources, which will eventually be properly adapted to shape the output
result and meet the end-user requirements. In this context, disposing of high-quality
data sources, we can overcome the lack of very expressive
end-user requirements.
References
1. Annoni, E., Ravat, F., Teste, O., Zurfluh, G.: Towards multidimensional requirement design. In: DaWaK 2006. LNCS, vol. 4081, pp. 75–84. Springer, Heidelberg (2006)
2. Böhnlein, M., vom Ende, A.U.: Deriving Initial Data Warehouse Structures from the Conceptual Data Models of the Underlying Operational Information Systems. In: Proc. of 2nd Int. Wksp on Data Warehousing and OLAP, pp. 15–21. ACM, New York (1999)
3. Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., Paraboschi, S.: Designing Data Marts for Data Warehouses. ACM Trans. Softw. Eng. Methodol. 10(4), 452–483 (2001)
4. Cabibbo, L., Torlone, R.: A Logical Approach to Multidimensional Databases. In: Schek, H.-J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 183–197. Springer, Heidelberg (1998)
5. Giorgini, P., Rizzi, S., Garzetti, M.: Goal-oriented Requirement Analysis for Data Warehouse Design. In: Proc. of 8th Int. Wksp on Data Warehousing and OLAP, pp. 47–56. ACM Press, New York (2005)
6. Golfarelli, M., Maio, D., Rizzi, S.: The Dimensional Fact Model: A Conceptual Model for Data Warehouses. Int. Journal of Cooperative Information Systems 7(2-3), 215–247 (1998)
7. Hüsemann, B., Lechtenbörger, J., Vossen, G.: Conceptual Data Warehouse Modeling. In: Proc. of 2nd Int. Wksp on Design and Management of Data Warehouses, p. 6. CEUR-WS.org (2000)
8. Jensen, M.R., Holmgren, T., Pedersen, T.B.: Discovering Multidimensional Structure in Relational Data. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2004. LNCS, vol. 3181, pp. 138–148. Springer, Heidelberg (2004)
9. Kimball, R.: The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons, Inc., Chichester (1996)
10. Kimball, R., Reeves, L., Thornthwaite, W., Ross, M.: The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses. John Wiley & Sons, Inc., Chichester (1998)
11. Mazón, J.-N., Trujillo, J., Lechtenbörger, J.: Reconciling Requirement-Driven Data Warehouses with Data Sources Via Multidimensional Normal Forms. Data & Knowledge Engineering 23(3), 725–751 (2007)
12. Moody, D., Kortink, M.: From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design. In: Proc. of 2nd Int. Wksp on Design and Management of Data Warehouses. CEUR-WS.org (2000)
13. Nebot, V., Llavori, R.B., Pérez-Martínez, J.M., Aramburu, M.J., Pedersen, T.B.: Multidimensional integrated ontologies: A framework for designing semantic data warehouses. J. Data Semantics 13, 1–36 (2009)
14. Phipps, C., Davis, K.C.: Automating Data Warehouse Conceptual Schema Design and Evaluation. In: Proc. of 4th Int. Wksp on Design and Management of Data Warehouses, vol. 58, pp. 23–32. CEUR-WS.org (2002)
15. Prat, N., Akoka, J., Comyn-Wattiau, I.: A UML-based Data Warehouse Design Method. Decision Support Systems 42(3), 1449–1473 (2006)
16. Romero, O., Abelló, A.: Automatic Validation of Requirements to Support Multidimensional Design. Data & Knowledge Engineering 69(9), 917–942 (2010)
17. Romero, O., Abelló, A.: A Survey of Multidimensional Modeling Methodologies. Int. J. of Data Warehousing and Mining 5(2), 1–23 (2009)
18. Romero, O.: Automating the Multidimensional Design of Data Warehouses. Ph.D. thesis, Universitat Politècnica de Catalunya, Barcelona, Spain (2010), http://www.tdx.cat/handle/10803/6670
19. Romero, O., Abelló, A.: A Framework for Multidimensional Design of Data Warehouses from Ontologies. Data & Knowledge Engineering 69(11), 1138–1157 (2010)
20. Song, I., Khare, R., Dai, B.: SAMSTAR: A Semi-Automated Lexical Method for Generating STAR Schemas from an ER Diagram. In: Proc. of the 10th Int. Wksp on Data Warehousing and OLAP, pp. 9–16. ACM, New York (2007)
21. Vrdoljak, B., Banek, M., Rizzi, S.: Designing Web Warehouses from XML Schemas. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2003. LNCS, vol. 2737, pp. 89–98. Springer, Heidelberg (2003)
22. Winter, R., Strauch, B.: A Method for Demand-Driven Information Requirements Analysis in DW Projects. In: Proc. of 36th Annual Hawaii Int. Conf. on System Sciences, pp. 231–239. IEEE, Los Alamitos (2003)
Preface to Variability@ER11
As software requirements constantly increase in size and complexity, the need for
methods, formalisms, techniques, tools and languages for managing and evolving
software artifacts becomes crucial. One way to manage variability when dealing with a
rapidly growing variety of software products is through developing and maintaining
families of software products rather than individual products. Variability management
is concerned with controlling the versions and the possible variants of software
systems. Variability management gained a special interest in various software-related
areas in different phases of the software development lifecycle. These areas include
conceptual modeling, product line engineering, feature analysis, software reuse,
configuration management, generative programming and programming language
design. In the context of conceptual modeling, the terminology of variability
management has been investigated, yielding ontologies, modeling languages, and
classification frameworks. In the areas of software product line engineering and
feature analysis, methods for developing core assets and efficiently using them in
particular contexts have been introduced. In the software reuse and configuration
management fields, different mechanisms for reusing software artifacts and managing
software versions have been proposed, including adoption, specialization, controlled
extension, parameterization, configuration, generation, template instantiation, analogy
construction, assembly, and so on. Finally, generative programming deals with
developing programs that synthesize or generate other programs and programming
language design provides techniques for expressing and exploiting commonality of
source code artifacts, but also for specifying the allowed or potential variability,
whether it is static or dynamic.
The purpose of this workshop is to promote the theme of variability management
from all or part of these different perspectives, identifying possible points of synergy,
common problems and solutions, and visions for the future of the area. The workshop
accepted 4 papers dealing with variability management related issues:
1. Mohammed Eldammagh and Olga De Troyer. Feature Modeling Tools:
Evaluation and Lessons learned.
2. Ateeq Khan, Gunter Saake, Christian Kaestner and Veit Koeppen. Service
Variability Patterns.
3. Angela Lozano. An Overview of Techniques for Detecting Software
Variability Concepts in Source Code.
4. Jaap Kabbedijk and Slinger Jansen. Variability in Multi-tenant
Environments: Architectural Design Patterns from Industry.
The workshop also had an invited talk, given by Timo K. Käkölä, entitled "ISO
Initiatives on Software Product Line Engineering: Vision and Current Status."
For more information about the workshop, please visit our web site at
http://www.domainengineering.org/Variability@ER11/
Timo K. Käkölä
1 Introduction
During the past years, several variability modeling techniques as well as tools have
been developed to model variability during domain analysis [1]. Example modeling
techniques are: COVAMOF [2], OVM (Orthogonal Variability Modeling) [1], VSL
(Variability Specification Language) [3] and FAM (Feature Assembly Modeling) [4].
Notwithstanding the wide range of possible approaches to support variability
modeling, feature modeling remains the most commonly used technique for
identifying and capturing variability requirements and dependencies between product
characteristics. Also, most variability modeling tools support feature modeling [6].
Industrial software product lines are characterized by a rapid growth of the number of
variation points, associated variants and dependencies. Previous research points out that
there is a lack of adequate tools supporting this increase in variability information
[5,6,16]. Moreover, in the case of large-scale software systems, usability and scalability
issues quickly grow and become a major source of frustration [7].
Because of the importance of adequate tool support, we want to investigate how
the quality of feature modeling tools can be improved. As a first step in this research,
the quality of existing tools will be investigated in order to gain insight in the aspects
that influence the quality. Later on, we can then try to improve on these aspects. In a
first study performed in this context, we have concentrated on evaluating the quality
of existing tools from a usability point of view. Scalability issues will be investigated
in later work. In this paper, we present the results of this first study. For this study, we
selected 9 feature modeling tools [8] using a graphical user interface. These tools are
evaluated against the criteria Usability, Safety, and Functional Usability Features.
The rest of the paper is organized as follows. We start by identifying the quality
criteria that we will use (Section 2). Next, we describe the methodology of our quality
evaluation in Section 3. Whereas Section 4 concerns the setup of the evaluation,
Section 5 concentrates on the results. Section 6 discusses the differences in quality
compliance of the tools and highlights the most important lessons learned. Next,
Section 7 mentions the related work. Finally, Section 8 recaps the main ideas and
presents further research.
2 Quality Criteria
For our quality evaluation of tools, we use the system quality in use model, as
defined by ISO/IEC 9126, ISO 9241-11 and the ISO/IEC 25000 series of standards.
This model offers a broad framework of quality requirements. It consists of three
characteristics: usability, flexibility and safety, which are in turn subdivided into
sub-characteristics. Accordingly, usability can be measured by the degree to which a
specified group of users conducts certain tasks with effectiveness, efficiency and
satisfaction. Whereas effectiveness is defined in terms of accuracy and
completeness, efficiency is defined in terms of the resources expended in relation to
effectiveness. Satisfaction describes the extent to which users are satisfied. Similarly,
flexibility deals with the context within which the tool operates. Safety includes all
the potential negative effects resulting from incomplete or incorrect output [9].
For this first study, we decided to focus on usability and safety. We omit
flexibility, as in this study the context of use will be fixed. However, some other
aspects of flexibility will be measured by considering functional usability features
(FUFs), which are usability requirements with a major impact on the functionality.
According to Juristo [10], these are important to consider because of their high
functional implications. Since a relationship between usability and functional
requirements has been shown, functional requirements should be taken into
consideration when they have an impact on usability attributes.
3 Methodology
The purpose of our study is to measure the quality of existing feature modeling tools
using the criteria identified in Section 2. Please note that it is in no way our purpose
to rank or pass judgment on the tools evaluated. The purpose of the study is to use
the information collected to draw some lessons learned, and to use these to make
recommendations on how to improve the quality of feature modeling tools in general.
Firstly, tools were selected on their ability to support feature modeling. Secondly, we
selected tools with an interactive graphical user interface (GUI), in which the features,
feature types, feature groups and feature dependencies are visualized. We decided to
concentrate on tools with a GUI, as the use of a GUI has many advantages [11],
especially in the communication with non-technical domain experts.
Table 1. Tools
This study evaluates whether and to what extent the nine tools support the
selected quality criteria, i.e. Usability and Safety, and whether the FUFs are present.
Usability
Efficiency: The expended resources referred to in ISO 9241-11 are defined in our
experiment as follows. On the one hand, task completion time (TCT) is measured
as the time needed to model the features, feature groups and feature
dependencies. On the other hand, the user's effort (e) needed to complete the
task is measured as well. In order to calculate effort, we rely on [12]. According to
this work, effort equals e = mc + mk + mic, where mc is the number of
mouse clicks, mk the number of keyboard strokes and mic the number of mouse
pixels traversed by the user (the mouse trajectory). Note that the time and effort needed to arrange the model according to
individual preferences (the esthetic aspect) were excluded from the results.
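As an illustration only, a minimal sketch of this effort computation over a hypothetical event log follows; the event field names are assumptions, not the logging tooling actually used in the experiment.

```python
# Compute user effort e = mc + mk + mic from a list of logged UI events.
def effort(events):
    """events: list of dicts such as {'type': 'mouse_click'} or
    {'type': 'mouse_move', 'pixels': 42}."""
    mc = sum(1 for ev in events if ev["type"] == "mouse_click")   # mouse clicks
    mk = sum(1 for ev in events if ev["type"] == "key_stroke")    # keyboard strokes
    mic = sum(ev.get("pixels", 0)                                  # mouse trajectory in pixels
              for ev in events if ev["type"] == "mouse_move")
    return mc + mk + mic
```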
Satisfaction: In order to evaluate the degree to which users are satisfied, we base
ourselves on the studies of [9] and [13]. Hence, participants were asked to rate the
following three statements: (Q1) "You like very much to use the tool", (Q2) "You
find the tool very easy to use" and (Q3) "You strongly recommend the tool to a
friend". A 5-point Likert scale was used for this (1 = strongly disagree to 5 =
strongly agree).
Note that we do not measure effectiveness. In this study the accuracy and
completeness with which users achieve specified goals are not considered, as all
selected tools support the creation of an accurate feature model.
Safety. We rely on the work of [14] to measure all the potential negative effects
resulting from incomplete or incorrect output [9]. The author draws a parallel
between inconsistent product configurations (output) and the inability of the tools to
provide automatic redundancy, anomaly and inconsistency checks, defined as follows:
Redundancy: Refers to information that is modeled in multiple ways. Redundancy
can decrease the maintainability of the model (negative effect), while on the other
hand it can increase the readability and understandability of the feature model
(positive effect). Hence, redundancy is considered a light issue.
Anomalies: Refers to the modeling of senseless information. As a consequence,
potential configurations are lost, although these configurations should be
possible. This is considered a medium issue.
Inconsistency: Refers to contradictory information within a model, i.e. information
that conflicts with some other information in the model. Since inconsistent
product configurations can be derived from such a model, this is a severe issue.
Complementary to these three checks, a fourth measure of Safety has been added:
Invalid semantics: Since all the given tools support a feature model based on a tree
structure, the feature model should be modeled with respect to certain semantic rules
inherent to this tree structure, e.g., each child has only one parent.
In our study, the tools are tested on their compliance with these four safety checks
using a rating scale from 0 to 3 (0 = not supported to 3 = fully supported).
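For instance, the invalid-semantics rule mentioned above (each child has only one parent) could be checked along the following lines; representing the feature model as a parent-to-children mapping is an assumption of this sketch, not how the evaluated tools work internally.

```python
# Flag features that have more than one parent, which violates the tree structure.
def invalid_semantics(model):
    """model: dict mapping a feature name to the list of its child feature names."""
    parent_of, issues = {}, []
    for parent, children in model.items():
        for child in children:
            if child in parent_of and parent_of[child] != parent:
                issues.append(f"{child} has two parents: {parent_of[child]} and {parent}")
            parent_of.setdefault(child, parent)
    return issues

print(invalid_semantics({"Payment": ["CreditCard"], "Notification": ["CreditCard", "SMS"]}))
# ['CreditCard has two parents: Payment and Notification']
```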
5 Results
We discuss the results of the evaluation following the evaluation criteria:
Usability
Efficiency: One-way analyses of variance (ANOVAs) were conducted to determine
whether the tools differ significantly from each other in terms of task completion
time and effort spent. The independent variable consists of the different tools, while
the dependent variables are task completion time (TCT) and effort. Both dependent
variables were measured for each of the sub-processes. The results of these ANOVAs
indicate that there is a significant difference among the tools.
Additionally, Tukey's HSD [35] range test was used to identify homogeneous
subsets of means that are not significantly different from each other. It allows us to
group tools together and draw parallels between non-literally similar tools. Equally, it
enables us to search for a justification of why a certain group of tools scores better in
efficiency than another group. The latter is further explored in Section 6.
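A hedged sketch of this kind of analysis using SciPy and statsmodels is shown below; the task-completion-time samples are invented for illustration and do not reproduce the study's data.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical TCT samples (seconds) per tool.
tct = {"T1": [41.0, 39.5, 44.2], "T2": [30.1, 28.7, 31.0], "T3": [52.3, 49.8, 55.1]}

f_stat, p_value = f_oneway(*tct.values())      # one-way ANOVA across the tools
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

values = np.concatenate([np.asarray(v) for v in tct.values()])
groups = np.repeat(list(tct.keys()), [len(v) for v in tct.values()])
print(pairwise_tukeyhsd(values, groups, alpha=0.05))  # pairwise Tukey HSD comparisons
```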
Table 2 depicts the TCT and effort required to model one feature, one feature group
and one feature constraint. The groups are arranged in a sequential manner, with G1
being the most efficient and G4 the least. Table 2 suggests that the tools differ in
the support they offer for each of the three sub-processes. Unfortunately, no tool scores well
(G1) in all three sub-processes.
Satisfaction: None of the tools was given the maximum rating by all
participants. Nevertheless, RequiLine scores best on satisfaction. Subsequently,
the satisfaction ratings of Feature Modeling Tool, CVM Tool, Feature Model DSL
and Pure::Variants are close to each other. The worst results are obtained for
XFeature, CaptainFeature, FeatureIDE and MOSKitt.
From the participants' comments, it is clear that most of the tools failed to
satisfy them. The reasons brought up are mainly: (a) the inability to adapt to the
participant's modeling skills, i.e. a lack of support for user expertise; (b) the unfitness to
adapt to the participant's preferences; (c) the non-availability of functionalities like
copy/paste, shortcuts and undo/redo; (d) requiring more steps than needed; (e)
restrictions / lack of flexibility, e.g. features like parent and child are displayed and
locked into levels on top of each other; and (f) bugs.
Table 2. Efficiency
Safety
The tools were investigated on their compliance with the checks for redundancy,
anomalies, inconsistency, and invalid semantics. Table 3 shows the results.
Table 3. Safety checks (0 = not supported, 3 = fully supported)

Tool  Redundancy  Anomalies  Inconsistency  Invalid semantics
T7    3           3          3              3
T2    0           0          3              3
T4    0           0          3              3
T9    0           0          0              3
T3    0           0          0              3
T8    0           0          0              3
T5    0           0          0              1
T1    0           0          0              0
T6    0           0          0              0

Functional Usability Features (FUFs) support

Tool  Feedback  Undo  Field Validation  User Expertise  Shortcuts  Reuse information  Help
T2    1         3     3                 0               1          2                  3
T7    1         3     3                 0               1          2                  0
T9    0         3     3                 0               1          0                  3
T4    1         2     3                 0               1          2                  0
T8    1         3     3                 0               1          0                  0
T6    0         3     0                 0               1          2                  0
T1    0         0     3                 0               0          0                  3
T5    0         3     1                 0               1          0                  0
T3    1         0     3                 0               0          0                  0
By using a context menu: e.g., by right-clicking on a feature, a menu pops up to edit
it or to create a sub-feature / feature group (CVM Tool uses this technique to create a
feature group, and FeatureIDE uses it to create a feature and a feature group).
Specifying information by using an external window and/or by syntax is the most
effort- and time-consuming. The use of syntax also requires good knowledge of that
syntax to avoid errors.
The best results in efficiency were obtained when input can be done through a context
menu. It owes its success to the fact that it operates directly on the design
surface, where the diagram is located. Although a toolbox is commonly used in
modeling tools, its use turns out to be very counterproductive because of the
trajectory of going back and forth between the toolbox and the design surface.
Secondly, with relevance to efficiency, we encountered three different ways of
editing. Editing was done by either (a) a properties menu, (b) a context menu or (c)
single and double mouse clicks. Editing by mouse clicks appears to be the least
effort- and time-consuming, whereas editing by a properties menu should be avoided.
Thirdly, unnecessary steps should be omitted. The cumbersome way of modeling
a feature group in 3 separate steps (as done in most tools) aptly illustrates this.
Since a feature group logically consists of one parent and at least two children,
modeling the feature group can easily be reduced to one single step.
And last but not least, the use of defaults has a negative effect on efficiency,
e.g., a default feature type, default position (i.e. the position of the feature on the design
surface after its creation), and default feature size (i.e. the size of its symbol). A
default carries the disadvantage that it may need to be adjusted afterwards, which
requires twice as much effort and time. Therefore, defaults must be used with care.
With respect to safety, any kind of inconsistency, anomaly and invalid
semantics should be avoided. With respect to the redundancy check, the user should
have the option to allow it, since in some cases redundancy can add readability and
understandability to the model.
Concerning the Functional Usability Features (FUFs), it is indisputable that these
should be supported. The comments of the participants in the post-questionnaire unveil a
significant influence of the FUFs on satisfaction.
To summarize, we can give the following advice:
- Any action like input, editing, or arranging features that takes place outside of the
design surface should be avoided as much as possible. The use of a context menu
meets these needs best. Another modeling technique worth exploring is
one that visualizes features, feature groups and feature constraints on the design
surface (with small icons) and lets the user create them with a single mouse
click (cf. CmapTool [34]). Although such a technique looks promising, it should be
evaluated to verify whether it indeed results in better efficiency.
- At the same time, input and editing should take as few steps as possible. Shortcuts
and some kind of single- and double-click method are advisable.
- In general, defaults should only be used if in most cases the user does not need to
change them. In the case of the default position, we advise positioning created features
either under the parent or at the position of the mouse. As to the default size, we
propose auto-fitting the size of the feature to its name.
- Moreover, the safety checks (except the redundancy check) as well as the Functional
Usability Features have to be provided.
- Furthermore, allowing users to configure the tool according to their personal
expertise and preferences is also considered important (e.g., defining their own
shortcuts and (de)activating safety checks).
7 Related Work
To the best of our knowledge, no previous work has evaluated these specific quality
requirements for tools supporting variability modeling.
In [6] the authors presented a systematic functionality review of 19 tools that
support the domain analysis process. However, the review only included functional
requirements. [16] provides an evaluation report of three tools on the importance of
tool support for functionality, usability and performance within software product
lines. The tools were reviewed entirely on the basis of the authors' own opinion, without
a scientific experiment. An industry survey of 4 tools was conducted in [17], where
usability had been categorized as a technical criterion. The literature review in [8]
highlights the need for inconsistency, redundancy and anomaly checks.
References
1. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering: Foundations, Principles, and Techniques. Springer, New York (2005)
2. Sinnema, M., Deelstra, S., Nijhuis, J., Bosch, J.: COVAMOF: A Framework for Modeling Variability in Software Product Families. In: Nord, R.L. (ed.) SPLC 2004. LNCS, vol. 3154, pp. 197–213. Springer, Heidelberg (2004)
3. Becker, M.: Towards a General Model of Variability in Product Families. In: Proceedings of the 1st Workshop on Software Variability Management, Netherlands (2003)
4. Abo Zaid, L., Kleinermann, F., De Troyer, O.: Feature Assembly Framework: Towards Scalable and Reusable Feature Models. In: Fifth International Workshop VaMoS (2011)
5. Chen, L., Babar, M.A.: A Systematic Review of Evaluation of Variability Management Approaches in Software Product Lines. Information and Software Technology 53, 344–362 (2011)
6. Lisboa, L.B., Garcia, V.C., Almeida, E.S., Meira, S.L., Lucrédio, D., Fortes, R.P.: A Systematic Review on Domain Analysis Tools. Information and Software Technology 52, 1–13 (2010)
1 Introduction
Service-Oriented Computing (SOC) is a paradigm for creating information systems that
provides flexibility, interoperability, cost effectiveness, and higher quality characteristics [1].
The trend of service usage is increasing in enterprises to support their processes.
However, even in the flexible world of services, variability is paramount at all layers.
Variability is the ability of a system to extend functionality, and to modify, customise or
configure the system [2]. We do not want to provide the same service to all consumers but
need to provide customised variants. Consumers want to fine-tune services according
to their needs and get a unique behaviour, which is tailored (personalised) to their
requirements. Fine-tuning depends on the available features of the services, where a feature
is a domain abstraction used to describe commonalities and differences [3].
However, variability approaches in SOC are ad hoc. Many solutions exist; however,
each one is tailored and aimed at a specific problem or at a specific layer. Some approaches
use simple mechanisms for variability, such as if-else structures implementing
variability in services. Others try to prevent the bloated result of putting all
variability into one service (which also violates the service principle that each service
should be an atomic unit performing a specific task) with various strategies, such as
frameworks [4, 5, 6] and language-based approaches [7, 8]. A single and perfect-for-all
solution for variability does not exist. Such a solution is also unrealistic, due to
very different requirements and technologies at different layers. Still, we believe that
there are common patterns, and developers should not need to rule out inefficient solutions
and reinvent better solutions again and again.
We contribute a catalogue of common variability patterns, designed to help developers
choose a technique for specific variability needs. We survey the literature and
abstract from recurring problems and individual implementation strategies and layers.
We summarise our results in six common patterns for variability in the SOC domain (in
general, many patterns are even transferable to other domains). The patterns are general
enough to describe the problem and the solution strategy, including its trade-offs and
different implementation strategies at different SOC layers, but also concrete enough to
guide a specific implementation. To help developers decide on the appropriate solution
to a variability problem at hand, we discuss trade-offs, limitations and possible
combinations of the different patterns. To aid understanding, we discuss example scenarios
for each pattern together with their consequences.
Solution: We can store consumer-specific settings and parameters as configuration files,
e.g. as XML files, or store them in a database for each consumer. Parameter storage is not
even necessary; a possible variation is passing all required parameters every time a
consumer accesses the SaaS application. There are two types of data associated with this
pattern: one is configuration-specific data (values configured by consumers for different
options) and the other is application-specific data for each consumer (containing the database,
values, and users). Configuration data is usually small and updated less often than
application-specific data. For general needs or requirements, the configuration data for each
consumer can be stored as key-value pairs, e.g. consumer id and configuration values (for
the user-interface favourite colour, selected endpoint, or fields to display).
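As an illustration, consumer-specific configuration data might be kept as key-value pairs along the following lines; the consumer ids, keys and endpoint URLs are purely hypothetical.

```python
# Per-consumer configuration stored as key-value pairs.
CONSUMER_CONFIG = {
    "consumer_A": {"ui.colour": "blue",
                   "payment.endpoint": "https://pay.example.org/eu",
                   "fields": ["name", "score"]},
    "consumer_B": {"ui.colour": "green",
                   "payment.endpoint": "https://pay.example.org/us",
                   "fields": ["name", "score", "ranking"]},
}

def get_setting(consumer_id, key, default=None):
    """Look up a consumer-specific setting, falling back to a provider-wide default."""
    return CONSUMER_CONFIG.get(consumer_id, {}).get(key, default)

print(get_setting("consumer_A", "ui.colour"))          # 'blue'
print(get_setting("consumer_C", "ui.colour", "grey"))  # unknown consumer -> default
```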
Consequences: This pattern provides an easy approach to achieving variability from the
same source code by storing and accessing consumer-specific behaviour based on parameters.
Services are selected based on attribute values. Such an approach is simple to
program and does not require a lot of expertise. The pattern provides flexibility, but a
consumer can choose only from the provided set of options. Management becomes an issue in larger
scenarios if parameter conditions are scattered throughout the code.
[Figures: consumers A and B accessing a SaaS service that acts as a consumer of an underlying service; in one diagram the behaviour is selected via a parameter, in the other via a routing rule such as if (gid = foreign) prefer credit card, else 'local'.]
using logical operators. Logical operators can also be a source of variability, e.g. some
consumers may use simple operators while others prefer or require more flexible rules for
their business logic. We can also use this pattern to handle exceptions. This pattern is similar
to the facade or proxy pattern discussed in [9].
Example: In our sports system, members pay club membership fees. For payments, different
options or routings of services are possible, e.g. local members pay using credit
card, bank transfer or both, while foreign members can only pay using credit card.
Solution: Different implementation solutions exist for routing. These approaches
range from simple if-else statements to complex Aspect-Oriented Programming (AOP)
based approaches [12]. Message interception can also be used for routing: a message is
intercepted and analysed to add user-specific behaviour. Different techniques are used
to intercept messages. Rahman et al. [12] use an AOP-based approach to apply business
rules to the intercepted message in the SOA domain.
A web service request is intercepted, rules are applied to the request, and the result
is then forwarded. Routing can be done by analysing the SOAP header or the SOAP body (which may
carry extra data for routing), and the request is routed accordingly.
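A minimal sketch of such rule-based routing for the payment example might look as follows; the rule table, request fields and service names are assumptions made for illustration, not a concrete framework API.

```python
# The intercepted request carries a group id (gid); a rule table decides
# which payment services a member may use.
ROUTING_RULES = {
    "foreign": ["credit_card"],
    "local": ["credit_card", "bank_transfer"],
}

def route_payment(request, services):
    """services: dict mapping a service name to a callable that handles the payment."""
    allowed = ROUTING_RULES.get(request.get("gid", "local"), [])
    preferred = request.get("preferred")
    chosen = preferred if preferred in allowed else allowed[0]
    return services[chosen](request)

services = {"credit_card": lambda r: "paid by credit card",
            "bank_transfer": lambda r: "paid by bank transfer"}
print(route_payment({"gid": "foreign", "preferred": "bank_transfer"}, services))
# foreign members are routed to credit card payment
```

Keeping the rules in a separate table (rather than scattered if-else statements) is what makes them easy to change at runtime, as discussed next.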
Consequences: Routing allows consumers to use the application in the way that suits their
requirements. It also allows separating business logic from the service implementation (for
easy modification of rules at runtime). It is also easy to change routing rules, and only a
few changes are necessary. Consumers influence the application behaviour by changing
rules.
Adding new business rules or logical operators may add unnecessary loops or
inconsistencies to an application. Validation rules or validators are applied
before adding a branching rule [13,14]. Higher complexity of the involved services may lead
to inconsistency in the application due to the rules. Algorithms for validation [15] can also be
used to find inconsistent or contradictory rules. Scalability is also an issue for complex
applications: routing rules may grow in size and their management becomes difficult.
The routing pattern may also introduce a single point of failure or decrease performance.
hides the functionality. We can also use this pattern to support legacy systems without
major modification of the existing code, exposing their functionality as a ser-
vice [1, 17, 18]. A consumer may want to expose her existing systems as a service
for other consumers while restricting access to some private business logic.
Example: An example from our sports system is offering email and SMS message ser-
vices (wrapped together as a notify match composite service) to send reminders about
changes in the match schedule to members and players.
Solution: We can use different solutions, e.g. an intermediate service, middleware
solutions, or tools for variability. To expose legacy systems as services, different tech-
niques are possible, e.g. service annotations in Java. An intermediate service acts as an
interface between incompatible services and contains the logic required to overcome the
mismatch.
Using SAP Process Integration (SAP PI) as a middleware, different service imple-
mentations, workflows, or client interfaces can be used to provide variability. We can use
different types of adapters to solve interface mismatches or to connect different systems.
When a request from the consumer side is sent to SAP PI, different service implementa-
tions, business rules, and interfaces can be selected based on the request. We can also use
the middleware for synchronous-asynchronous communication, in which results are stored
at the middleware and delivered to the consumers based on their requests. Either the consumer or the
provider (not necessarily the service owner; it could be a third-party provider) is respon-
sible for variability in this pattern.
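A minimal sketch of the wrapping idea for the notify-match example is shown below; the class and method names are illustrative assumptions, not a prescribed interface:

    /** Illustrative sketch of the wrapper pattern: a composite "notify match"
     *  service hides the e-mail and SMS services behind a single interface. */
    public class NotifyMatchService {

        interface Notifier { void send(String memberId, String message); }

        private final Notifier emailService;
        private final Notifier smsService;

        public NotifyMatchService(Notifier emailService, Notifier smsService) {
            this.emailService = emailService;
            this.smsService = smsService;
        }

        /** Single entry point exposed to consumers; the underlying services stay hidden. */
        public void notifyScheduleChange(String memberId, String newSchedule) {
            String message = "Match schedule changed: " + newSchedule;
            emailService.send(memberId, message);
            smsService.send(memberId, message);
        }
    }

Adding or removing an underlying notifier only changes the wrapper, not the consumer-facing interface, which is exactly the benefit discussed next.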
Consequences: Using this pattern, we can offer different variability solutions. Service wrap-
ping hides the complexity of the scenario from the consumer and simplifies the commu-
nication between consumer and composite service (consumers do not care about dif-
ferent interfaces or the number of underlying services). Addition or removal of a service
becomes easy for the consumer (comparable to including/excluding a component in
component engineering). Services are reused and become compatible without changing
their implementation details by using service wrapping.
Composite services increase the complexity of the system. Adding services from
other providers may affect non-functional properties. Service wrapping increases the
number of services (composites, adapters, or fine-grained services, depending on the scenario)
offered by the provider, and management of such a system becomes complex.
[Figure: Wrapper Pattern — a service acting as a wrapper between consumers A and B and the SaaS application]
Fig. 4. Variant Pattern (consumers A, B, and C access the SaaS application, which offers Variant1 and Variant2 of a service)
Example: In our sports system, different user-interface variants are used to display
match scores (suppose, as in Figure 4, that text commentary is displayed in Variant1 for
consumer B and online video streaming in Variant2, offered from the provider side, for
consumer C, while consumer A sees the score only).
Solution: The variant pattern is a general pattern and is used in various scenarios. Con-
sumers choose from a set of variants and use options to configure them, e.g. for a unique
look and feel, for workflows, or for viewing/hiding data fields in the interface. These variants
are typically generated from the same source code at the provider side. We can use gen-
erators, inheritance, polymorphism, or product-line approaches to generate variants of a
service at design time [3, 20, 21, 9]. In [3], we discuss how different variants can be
offered from the same code base based on a feature set, and the benefits achieved using
variability. WSDL files can also be tailored and used to represent different variants.
The consumer-specific options are stored as configuration files.
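To illustrate the selection step only (not the generation of variants), the following hedged sketch picks one pre-built variant per consumer from stored configuration; the variant names and the score-display classes are assumptions based on the example above:

    import java.util.Map;

    /** Illustrative sketch of the variant pattern: a fixed set of provider-built
     *  variants, selected per consumer from stored configuration. */
    public class ScoreDisplayFactory {

        interface ScoreDisplay { String render(String match); }

        static class ScoreOnly implements ScoreDisplay {        // default
            public String render(String match) { return "Score for " + match; }
        }
        static class TextCommentary implements ScoreDisplay {   // Variant1
            public String render(String match) { return "Commentary for " + match; }
        }
        static class VideoStream implements ScoreDisplay {      // Variant2
            public String render(String match) { return "Video stream for " + match; }
        }

        // Consumer-specific choice, e.g. read from a configuration file or database.
        private final Map<String, String> variantPerConsumer;

        public ScoreDisplayFactory(Map<String, String> variantPerConsumer) {
            this.variantPerConsumer = variantPerConsumer;
        }

        public ScoreDisplay forConsumer(String consumerId) {
            switch (variantPerConsumer.getOrDefault(consumerId, "default")) {
                case "variant1": return new TextCommentary();
                case "variant2": return new VideoStream();
                default:         return new ScoreOnly();
            }
        }
    }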
Consequences: This pattern allows providers to offer optimal solutions in the form of variants. In-
dustry best practices help consumers to choose the right options and result in higher quality.
This pattern does not give full flexibility to consumers: developers provide variants
in advance, and consumers can only choose from the given set. Managing different vari-
ants of a service increases the complexity. Additional information is needed to decide
which variant of a service is useful or compatible. Complex scenarios need a flexible
platform or architecture that allows handling of different variants (challenges men-
tioned in [3]).
Motivation: Sometimes, consumers have specific requirements which are not fulfilled
by the above-mentioned patterns. For instance, consumers may want to upload their own im-
plementation of a service or replace part of a process to meet their specific requirements.
Therefore, providers offer extension points in a SaaS application.
Application: This pattern requires pre-planning. Service providers prepare the variabil-
ity as extension points at design time. Consumers share the same code base and provide
behaviour at those extension points at runtime. Other consumers access the service
without any change. It is similar to the strategy design pattern [9], frameworks, or call-
backs (which can use inheritance methods at design time). The consumer modifies the applica-
tion behaviour by uploading implementations, rules, or by fine-tuning services (changing
service endpoints). Extension points allow consumers to add consumer-specific imple-
mentations or business logic to the system at runtime, as shown in Figure 5.
Example: In our sports system, a consumer configures an extension point for alternative
scoring services from different providers using the web service endpoint binding method.
Solution: In SOC, service interfaces (WSDL files), service implementations, service
bindings, and ports (endpoints) act as extension points in the architecture [22, 23, 24].
Consumers change these extension points for variability. We can use physical sepa-
ration of instances or virtualisation as solutions for this pattern. A provider allocates
dedicated hardware or a virtual instance so that consumer-specific code executes sepa-
rately. In case of malicious code or failure, only the tenant-specific instance or virtual
image will be affected instead of the whole system. The consumer can perform modifi-
cations of the service binding in the WSDL. Endpoint modification is a method to modify the
service address in a WSDL or in a composite service, e.g. adding an endpoint service
as an alternative in a web service binding. Endpoint modification can be done at run-
time.
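A minimal sketch of runtime endpoint modification is shown below. In a JAX-WS setting this would typically be done via the standard BindingProvider.ENDPOINT_ADDRESS_PROPERTY on the generated port; the self-contained registry here is an illustrative stand-in, and all names are assumptions:

    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch of the extension point pattern realised through
     *  endpoint modification: the default service address can be overridden
     *  per consumer at runtime. */
    public class ScoringEndpointRegistry {

        private static final URI DEFAULT_ENDPOINT = URI.create("https://saas.example.org/scoring");

        private final Map<String, URI> consumerEndpoints = new HashMap<>();

        /** A consumer binds an alternative scoring service at runtime. */
        public void bindAlternative(String consumerId, URI alternativeEndpoint) {
            consumerEndpoints.put(consumerId, alternativeEndpoint);
        }

        /** Resolution point used by the composite service before each call. */
        public URI resolve(String consumerId) {
            return consumerEndpoints.getOrDefault(consumerId, DEFAULT_ENDPOINT);
        }
    }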
Consequences: Extension points offer flexibility to the consumer and allow customisa-
tion of application behaviour. There are, however, some potential risks in offering flexibility
through extension points. In a workflow, allowing a consumer to add activities may
introduce loops into the application, consuming resources or even resulting in never-
ending loops. Another problem lies in allowing
a consumer to insert her own code, which may lead to failure of the whole system or
instance, e.g. in case of malicious code or virus uploading. Once variability is realised
by consumers, the system must check the modifications (extension points) and test
scenarios for correctness of the system, e.g. for resource consumption or effects on the
whole process (availability, time constraints for responses, etc.).
Solution: Service providers offer a separate instance for a consumer to keep the solu-
tion simpler, although it may introduce services with similar code and functionality.
The consumer introduces her own implementation and exposes it as a service, or modifies
the provided solution. In such a case, every consumer gets an independent customised
service instance. We can use this pattern at the process or database layer as well, where a
consumer adds or develops her own process in the SaaS application. In such cases, at the
database layer, a consumer uses a separate database instance to accommodate new database
relations and different business requirements.
Consequences: SOC benefits are achieved in this pattern, although for some parts the
application service provider (ASP [25]) model is used, in which each consumer shares
the infrastructure facilities (shifting infrastructure and management tasks to providers)
but has a separate service instance. Legacy systems or other applications can be shifted to
SOC easily using this pattern. This pattern allows full flexibility, and consumers can
modify or customise their respective services freely.
From the service provider's perspective, this pattern does not scale. It is expensive in terms
of costs for a large number of consumers and service instances. Hardware costs also
increase in such cases due to the separate instances. Code replication increases the effort for
management and decreases productivity. Software updates or new versions of the software
must be applied to each instance manually and individually. Due to these main problems,
it is often not advisable to use this pattern, and it is sometimes considered an anti-pattern.
We contributed six variability patterns for SOC that can guide developers in solving
different variability problems in practice. We discuss trade-offs according to several
evaluation criteria to help decide on the right solution strategy for a problem at hand.
Our pattern catalogue helps to reuse solution strategies in a manageable way.
In future work, we plan to extend our pattern catalogue into a framework that con-
tains decision criteria to choose and manage variability in SOC with specific imple-
mentation techniques. We will also evaluate our pattern catalogue further in practice to
compare performance where more than one pattern can be used at the same time.
References
[1] Papazoglou, M.P., van den Heuvel, W.J.: Service oriented architectures: approaches, tech-
nologies and research issues. The VLDB Journal 16(3), 389–415 (2007)
[2] Svahnberg, M., van Gurp, J., Bosch, J.: A taxonomy of variability realization techniques.
Software - Practice and Experience 35(8), 705–754 (2005)
[3] Apel, S., Kastner, C., Lengauer, C.: Research challenges in the tension between features
and services. In: ICSE Workshop Proceedings SDSOA, pp. 53–58. ACM, NY (2008)
[4] Camara, J., Canal, C., Cubo, J., Murillo, J.M.: An Aspect-Oriented Adaptation Framework
for Dynamic Component Evolution. Electr. Notes Theor. Comput. Sci. 189, 21–34 (2007)
[5] Guo, C.J., Sun, W., Huang, Y., Wang, Z.H., Gao, B.: A framework for native multi-tenancy
application development and management. In: The 9th IEEE International Conference on
E-Commerce Technology, pp. 551–558 (2007)
[6] Kongdenfha, W., Saint-Paul, R., Benatallah, B., Casati, F.: An aspect-oriented framework
for service adaptation. In: Dan, A., Lamersdorf, W. (eds.) ICSOC 2006. LNCS, vol. 4294,
pp. 15–26. Springer, Heidelberg (2006)
[7] Charfi, A., Mezini, M.: AO4BPEL: An aspect-oriented extension to BPEL. WWW 10(3),
309–344 (2007)
[8] Zur Muehlen, M., Indulska, M.: Modeling languages for business processes and business
rules: A representational analysis. Information Systems 35, 379–390 (2010)
[9] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable
Object-Oriented Software. Addison Wesley, Reading (1995)
[10] Khan, A., Kastner, C., Koppen, V., Saake, G.: Service variability patterns in SOC.
Technical Report 05, School of Computer Science, University of Magdeburg, Magde-
burg, Germany (May 2011), http://wwwiti.cs.uni-magdeburg.de/iti_db/
publikationen/ps/auto/KKKS11.pdf
[11] Topaloglu, N.Y., Capilla, R.: Modeling the Variability of Web Services from a Pattern Point
of View. In: Zhang, L.J. (ed.) ECOWS 2004. LNCS, vol. 3250, pp. 128–138. Springer,
Heidelberg (2004)
[12] ur Rahman, S.S., Khan, A., Saake, G.: Rulespect: Language-Independent Rule-Based AOP
Model for Adaptable Context-Sensitive Web Services. In: 36th Conference on Current
Trends in Theory and Practice of Computer Science (Student Research Forum), vol. II, pp.
87–99. Institute of Computer Science AS CR, Prague (2010)
[13] Chong, F.T., Carraro, G.: Architecture strategies for catching the long tail, Microsoft
Corporation (April 2006), http://msdn.microsoft.com/en-us/library/
aa479069.aspx (last accessed June 24, 2011)
[14] Carraro, G., Chong, F.T.: Software as a service (SaaS): An enterprise perspective, Microsoft
Corporation (October 2006), http://msdn.microsoft.com/en-us/library/
aa905332.aspx (last accessed June 24, 2011)
[15] Bianculli, D., Ghezzi, C.: Towards a methodology for lifelong validation of service com-
positions. In: Proceedings of the 2nd International Workshop on Systems Development in
SOA Environments, SDSOA, pp. 7–12. ACM, New York (2008)
[16] Mugge, H., Rho, T., Speicher, D., Bihler, P., Cremers, A.B.: Programming for Context-
based Adaptability: Lessons learned about OOP, SOA, and AOP. In: KiVS 2007 - Kommu-
nikation in Verteilten Systemen, vol. 15. ITG/GI-Fachtagung (2007)
[17] Yu, Q., Liu, X., Bouguettaya, A., Medjahed, B.: Deploying and managing web services:
issues, solutions, and directions. The VLDB Journal 17(3), 537–572 (2006)
[18] Mughrabi, H.: Applying SOA to an ecommerce system, Master thesis (2007), http://
www2.imm.dtu.dk/pubdb/p.php?5496 (last accessed May 5, 2011)
[19] Aalst, W., Hofstede, A., Kiepuszewski, B., Barros, A.P.: Workflow patterns. Distributed and
Parallel Databases 14(1), 5–51 (2003)
[20] Papazoglou, M.P., Kratz, B.: Web services technology in support of business transactions.
Service Oriented Computing and Applications 1(1), 51–63 (2007)
[21] Pohl, C., Rummler, A., et al.: Survey of existing implementation techniques with respect
to their support for the requirements identified in M3.2, AMPLE (Aspect-Oriented, Model-
Driven, Product Line Engineering), Specific Targeted Research Project: IST-33710 (July
2007)
[22] Jiang, J., Ruokonen, A., Systa, T.: Pattern-based variability management in web service
development. In: ECOWS 2005: Proceedings of the Third European Conference on Web
Services, p. 83. IEEE Computer Society, Washington, DC, USA (2005)
[23] Moser, O., Rosenberg, F., Dustdar, S.: Non-intrusive monitoring and service adaptation for
WS-BPEL. In: WWW, pp. 815–824. ACM, New York (2008)
[24] Erradi, A., Maheshwari, P., Tosic, V.: Policy-Driven Middleware for Self-adaptation of Web
Services Compositions. In: van Steen, M., Henning, M. (eds.) Middleware 2006. LNCS,
vol. 4290, pp. 62–80. Springer, Heidelberg (2006)
[25] Lacity, M.C., Hirschheim, R.A.: Information Systems Outsourcing: Myths, Metaphors, and
Realities. John Wiley & Sons, Inc., Chichester (1993)
An Overview of Techniques for Detecting
Software Variability Concepts in Source Code
Angela Lozano
Abstract. There are two good reasons for wanting to detect variability
concepts in source code: migrating to a product-line development for an
existing product, and restructuring a product-line architecture degraded
by evolution. Although detecting variability in source code is a com-
mon step for the successful adoption of variability-oriented development,
there exists no compilation nor comparison of approaches available to
attain this task. This paper presents a survey of approaches to detect
variability concepts in source code. The survey is organized around vari-
ability concepts. For each variability concept there is a list of proposed
approaches, and a comparison of these approaches by the investment re-
quired (required input), the return obtained (quality of their output),
and the technique used. We conclude with a discussion of open issues in
the area (variability concepts whose detection has been disregarded, and
cost-benefit relation of the approaches).
1 Introduction
Today's companies face the challenge of creating customized and yet affordable
products. Therefore, a pervasive goal in industry is maximizing the reuse of com-
mon features across products without compromising the tailored nature expected
from the products. One way of achieving this goal is to delay customization de-
cisions to a late stage in the production process, which can be attained through
software. For instance, a whole range of products can be achieved through large-scale
production of the same hardware, and a software customization of each type of
product.
The capability of building tailored products by customization is called vari-
ability. Software variability can be achieved through combination and configuration
of generic features.
Typical examples of variability can be found in embedded software and soft-
ware families. Embedded software facilitates the customization of single pur-
pose machines (i.e. those that are not computers) such as cars, mobile phones,
airplanes, medical equipment, televisions, etc. by reusing their hardware while
varying their software to obtain different products. Software families are groups
of applications that come from the configuration and combination of generic
features. Both embedded software and software families refer to groups of appli-
cations related by common functionality. However, variability can also be found
in single applications when they delay design decisions to late stages in order
to react to different environments, e.g. mobile applications, games, fault-tolerant
systems, etc.
(Angela Lozano is funded as a post-doc researcher on an FNRS-FRFC project. This
work is supported by the ICT Impulse Program of ISRIB and by the Inter-university
Attraction Poles (IAP) Program of BELSPO.)
Motivations for Detecting Variability: When companies realize that slight
modifications of a product could enlarge their range of clients, they could mi-
grate to a product-line development. Detecting variability opportunities in the
current product is the first step to assess which changes have a higher return
in terms of potential clients. This return assessment is crucial for the success of
such migration because the reduction on development cost may not cover the
increase in maintenance cost if the variation introduced is unnecessary.
Once a product-line architecture is in place, it will degrade over time. Degra-
dation of the architecture is a reality of software development [7]. In particular,
product-lines have well-known evolution issues that can degrade their architec-
ture [3,12]. Therefore, at some point it might become necessary to restructure
the product-line; and the first step is reconstructing its architecture from the
source code.
Both previous scenarios start from the source code of the (software) system
as source of information. Source code mining is a reverse engineering technique
that extracts high-level concepts from the source code of a system. In order to
mine for variability concepts, we need to clearly dene these high-level concepts.
Although mining for variability in source code is a common step towards the
successful adoption of variability-oriented development, there exists no compi-
lation nor comparison of existing approaches available to attain this task. The
goal of our paper is two-fold. First, we identify high-level concepts targeted by
variability mining approaches and describe the intuition behind them. Second,
we use these concepts to classify the approaches that detect variability as they
uncover the intuition and assumptions behind the technique used.
2 Variability Concepts
Variability dependencies are constraints between features that establish the valid
products of a feature diagram. In terms of source code, variability dependen-
cies (requires/excludes) are related with control ow relations between features.
Nevertheless, other types of feature overlaps have been found by analyzing im-
plementations of features regardless of being variable or mandatory.
Czarnecki et al. [5] define Probabilistic Feature Models for feature diagrams,
organized using the probability of having a feature given the presence of another
feature. The approach is based on counting the legal configurations of a feature
diagram. The legal configurations are used to determine a set of legal samples
(of legal configurations), which are in turn used to calculate the frequency of co-
existence of all possible combinations of features. These frequencies are then used
to obtain conditional probabilities, that is, the probability of requiring a feature
given the presence in the application of another feature. Conditional probabilities
are used to structure the feature diagram and to decide when a variable feature
is optional, alternative, mandatory, inclusive or exclusive. Finally, a Bayesian
Network with a Directed Acyclic Graph of the features, and the conditional
probability table is used to learn variability dependencies. The Bayesian Net-
work is capable of detecting require relations (probability(feature1 | feature2))
and exclude relations (probability(feature1 | feature2)) among features. A dis-
advantage of this approach is that it does not analyze source code but it requires
an intermediate approach to detect the Directed Acyclic Graph of features in
the domain and their mapping to several products.
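As a hedged illustration (using a common formulation of such probabilistic feature models rather than the authors' exact definitions), the hard dependencies can be read off the learned conditional probabilities as follows:

    feature2 requires feature1   iff   P(feature1 | feature2) = 1
    feature1 excludes feature2   iff   P(feature1 | feature2) = 0

Probabilities strictly between 0 and 1 would then correspond to soft, statistical dependencies rather than hard constraints.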
Parra et al. [19] propose an analysis of feature constraints based on the as-
sumption that variable features are implemented with aspects. The approach
detects that one feature requires a second feature when the pointcut that de-
fines the variation point for the first feature references source code elements
referred to by the aspect that defines the second feature. The approach can de-
tect when one feature excludes a second feature, when the pointcuts (that define
the variation points) for both features refer to the same source code elements.
This analysis is used to detect the correct order (if it exists) to compose the
variants represented by aspects in the source code. The disadvantage of this
approach is that it presupposes an aspect-oriented implementation of variability.
Egyed [6] assumes that calling a similar set of source code entities when ex-
ecuting different features implies that one feature is a sub-feature of the other.
Similarly, Antkiewicz et al. [2] consider that a sub-feature is essential if it is com-
mon to all applications (of the same domain) analyzed. This approach locates
a feature by identifying similar patterns in entities that may be related to the
Variation points capture the functionality areas in which products of the same
product family differ. Mining for variation points aims at detecting source code
entities that allow diverging functionality across different products of the same
domain. Although there exists literature describing how to implement variabil-
ity [13,1,4,22,17], we could only find one approach to detect variation points and
variants. The lack of approaches may be due to the wide variety of possibilities
to translate a conceptual variation point (i.e. a delayed decision) to the imple-
mentation of a variation point, as well as to the difficulty to trace this translation
[20,14].
Thummalapenta and Xie [23] analyze applications that extend the same frame-
work. They calculate metrics that describe the amount and type of extensions
per class and method of the framework. These metrics make it possible to classify
the methods and classes of the framework into variation points (hotspots/hooks)
and coldspots/templates. The authors analyzed the level of variability of the
frameworks by counting the percentage of classes and methods identified as
variation points. The disadvantage of this approach is that it requires a high
variety of applications from the domain to give reliable results.
A variant is an option available for a variable feature. Mining for variants
implies assigning a high level concept to the values (or objects) used in the
variation points to decide when to change the implementation of a variable
feature. This means that mining for variants requires detecting variation points
control. The approach is called grow and prune because it aims at letting prod-
ucts of the domain evolve (grow), and then merge as mandatory features (prune)
the successful source code entities of these products. For that reason, they pro-
posed metrics to evaluate the success of a source code entity. A source code
entity is defined as successful if its implementation contained a large amount of
code, a low number of decisions, and a low frequency of changes, and was highly used
and cloned. Nevertheless, the approach is incapable of offering further assistance
for restructuring the product line.
Mende et al. [18] use clone detection to measure the level of commonalities
across directories, files and functions of different products of the same domain.
The idea is to assess to what extent a product (p1) can be merged into a basic
product (p2), and to what extent the functionality of the product to be merged
(p1) is included in the other product (p2). Depending on the number of identical
functions (similarity = 1) and of similar functions (similarity between 0 and 1)
of the product p2 into the product p1, it is possible to say if the correspondence
is identical or potential, and if such correspondence is located in one function
or not. Therefore, the proposed analysis helps to detect potential mandatory
feature parts in an existing product line (i.e., product-specific functions identical
across several products), as well as variability points that may require some
restructuring to better separate mandatory and variable features (i.e., product-
specific functions similar across several products).
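A minimal sketch of the kind of classification such a similarity analysis supports is given below; the threshold structure follows the description above, but the category names are assumptions for illustration only:

    /** Illustrative classification of function correspondences based on a clone
     *  similarity score in [0, 1], in the spirit of the analysis described above. */
    public class CorrespondenceClassifier {

        enum Correspondence { IDENTICAL, POTENTIAL, NONE }

        static Correspondence classify(double similarity) {
            if (similarity == 1.0) {
                return Correspondence.IDENTICAL;  // candidate mandatory feature part
            }
            if (similarity > 0.0) {
                return Correspondence.POTENTIAL;  // candidate variation point needing restructuring
            }
            return Correspondence.NONE;
        }
    }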
Frenzel et al. [9] extract the model of a product-line architecture based on
several products of the same domain. To extract the model they use reflexion
models, which give them the static components, their interfaces, their depen-
dencies, and their grouping as layers and sub-layers in the system. The model
is then compared with the implementation of the products by checking whether
different products have corresponding components (based on clone similarity).
Clone detection is also used to transfer common implementation areas to the
common design. The latter two approaches are merged and further developed
in [15]. However, given that these approaches do not mine for features, they
are unable to infer the feature diagram and its correspondence with the inferred
architecture.
8 Conclusions
This paper compiles techniques to propose refactorings from variable to manda-
tory features, and to improve the comprehension of variable products by dis-
covering the decomposition of features in a domain, the hidden links among
implementations of variable features, the source code entities in charge of the
variation, and valid products of a domain. Although we found several approaches
to mine for most of the variability concepts, the approaches can be improved in
several ways. Some of the techniques have demanding requirements. For instance,
reconstructing feature diagrams requires an initial domain model [24] or an entity
that implements the feature to analyze [2], while detecting excludes/requires
dependencies may require an aspect-oriented implementation of variable fea-
tures [19] or an initial feature diagram and its mapping to several applications
[5]. Other techniques have restricted automated support. For example, recon-
structing feature diagrams may require manual post-processing [24]. Finally, the
usefulness of the output of some techniques is limited when addressing architectural
degradation of product-line design from single products. For instance, knowing
the configurations of a product line and the variables involved [21] does not pro-
vide any hints on how to restructure it or on how to map these implementation
details to domain concepts, which limits the usage of the approach to deal with
architectural degradation due to evolution.
Mining for variation points is the area with the highest potential because there
are several papers that describe how variability should be implemented. Mining
for variants also has potential given that it is a neglected area, and that it is com-
plementary to the detection of variation points. Nevertheless, detecting variation
points is not enough to detect variants because each variation point needs to be
linked to a variable feature. However, the assignment of a variation point to a
variable feature could be technically challenging because the majority of order
interactions are due to control flow dependencies [16]. Another area open for
future work is extending the approaches to mine for feature diagrams, and for
variable and mandatory features, to analyze the flexibility of single products in
order to support the migration towards product-line development.
References
1. Anastasopoulos, M., Gacek, C.: Implementing product line variabilities. In: SSR
2001: Proc. of the 2001 Symposium on Software Reusability, pp. 109–117. ACM,
New York (2001)
2. Antkiewicz, M., Bartolomei, T.T., Czarnecki, K.: Fast extraction of high-quality
framework-specific models from application code. Autom. Softw. Eng. 16(1), 101–
144 (2009)
3. Bosch, J., Florijn, G., Greefhorst, D., Kuusela, J., Obbink, J.H., Pohl, K.: Variabil-
ity issues in software product lines. In: Revised Papers from the 4th Intl Workshop
on Software Product-Family Engineering, PFE 2001, pp. 13–21. Springer, Heidel-
berg (2002)
4. Brown, T.J., Spence, I., Kilpatrick, P., Crookes, D.: Adaptable components for soft-
ware product line engineering. In: Chastek, G.J. (ed.) SPLC 2002. LNCS, vol. 2379,
pp. 154–175. Springer, Heidelberg (2002)
5. Czarnecki, K., She, S., Wasowski, A.: Sample spaces and feature models: There
and back again. In: SPLC 2008: Proc. of the 2008 12th Intl Software Product Line
Conference, pp. 22–31. IEEE Computer Society, Washington, DC, USA (2008)
6. Egyed, A.: A scenario-driven approach to traceability. In: ICSE 2001: Proc. of
the 23rd Intl Conference on Software Engineering, pp. 123–132. IEEE Computer
Society, Washington, DC, USA (2001)
7. Eick, S.G., Graves, T.L., Karr, A.F., Marron, J.S., Mockus, A.: Does code decay?
Assessing the evidence from change management data. IEEE Trans. Softw. Eng. 27,
1–12 (2001)
8. Faust, D., Verhoef, C.: Software product line migration and deployment. Software:
Practice and Experience 33(10), 933–955 (2003)
9. Frenzel, P., Koschke, R., Breu, A.P.J., Angstmann, K.: Extending the reflexion
method for consolidating software variants into product lines. In: WCRE 2007:
Proc. of the 14th Working Conference on Reverse Engineering, pp. 160–169. IEEE
Computer Society, Washington, DC, USA (2007)
10. Hummel, O., Janjic, W., Atkinson, C.: Proposing software design recommendations
based on component interface intersecting. In: Proc. of the 2nd Intl Workshop on
Recommendation Systems for Software Engineering, RSSE 2010, pp. 64–68. ACM,
New York (2010)
11. Jaring, M.: Variability Engineering as an Integral Part of the Software Product
Family Development Process. PhD thesis, Rijksuniversiteit Groningen (2005)
12. Johansson, E., Höst, M.: Tracking degradation in software product lines through
measurement of design rule violations. In: Proc. of the 14th Intl Conference on
Software Engineering and Knowledge Engineering, SEKE 2002, pp. 249–254. ACM,
New York (2002)
13. Keepence, B., Mannion, M.: Using patterns to model variability in product families.
IEEE Softw. 16, 102–108 (1999)
14. Kim, S.D., Her, J.S., Chang, S.H.: A theoretical foundation of variability in
component-based development. Inf. Softw. Technol. 47, 663–673 (2005)
15. Koschke, R., Frenzel, P., Breu, A.P., Angstmann, K.: Extending the reflexion
method for consolidating software variants into product lines. Software Quality
Control 17, 331–366 (2009)
16. Lai, A., Murphy, G.C.: The structure of features in Java code: An exploratory
investigation. In: Ossher, H., Tarr, P., Murphy, G. (eds.) Workshop on Multi-
Dimensional Separation of Concerns (OOPSLA 1999) (November 1999)
17. Maccari, A., Heie, A.: Managing infinite variability in mobile terminal software:
Research articles. Softw. Pract. Exper. 35(6), 513–537 (2005)
18. Mende, T., Beckwermert, F., Koschke, R., Meier, G.: Supporting the grow-and-
prune model in software product lines evolution using clone detection. In: Proc. of
the 2008 12th European Conference on Software Maintenance and Reengineering,
CSMR 2008, pp. 163–172. IEEE Computer Society, Washington, DC, USA (2008)
19. Parra, C., Cleve, A., Blanc, X., Duchien, L.: Feature-based composition of software
architectures. In: Babar, M.A., Gorton, I. (eds.) ECSA 2010. LNCS, vol. 6285, pp.
230–245. Springer, Heidelberg (2010)
20. Salicki, S., Farcet, N.: Expression and usage of the variability in the software prod-
uct lines. In: Revised Papers from the 4th Intl Workshop on Software Product-
Family Engineering, PFE 2001, pp. 304–318. Springer, London (2002)
21. Snelting, G.: Reengineering of configurations based on mathematical concept anal-
ysis. ACM Trans. Softw. Eng. Methodol. 5(2), 146–189 (1996)
22. Svahnberg, M., van Gurp, J., Bosch, J.: A taxonomy of variability realization tech-
niques: Research articles. Softw. Pract. Exper. 35, 705–754 (2005)
23. Thummalapenta, S., Xie, T.: SpotWeb: detecting framework hotspots via mining
open source repositories on the web. In: Proc. of the 2008 Intl Working Conference
on Mining Software Repositories, MSR 2008, pp. 109–112. ACM, New York (2008)
24. Yang, Y., Peng, X., Zhao, W.: Domain feature model recovery from multiple appli-
cations using data access semantics and formal concept analysis. In: WCRE 2009:
Proc. of the 2009 16th Working Conference on Reverse Engineering, pp. 215–224.
IEEE Computer Society, Washington, DC, USA (2009)
Variability in Multi-tenant Environments:
Architectural Design Patterns from Industry
J. Kabbedijk and S. Jansen
Utrecht University
Department of Information and Computing Sciences
Princetonplein 5, 3584CC, Utrecht, Netherlands
{j.kabbedijk,s.jansen}@cs.uu.nl
1 Introduction
Increasingly, product software vendors want to offer their product as a service
to their customers [1]. This principle is referred to in literature as Software as a
Service (SaaS) [2]. Turning software into a service from a vendor's point of view
means separating the possession and ownership of software from its use. Software
is still maintained and deployed by the vendor, but used by the customer. The
problem of moving a software product from different on-premises locations to one
central location is the fact that it becomes really difficult to comply with specific
customer wishes. In order to serve different customers' wishes, variability in
a software product is needed to offer specific functionality. By making use of
variability in a software product, it is possible to supply software functionality
as optional modules that can be added to the product at runtime. Applying
this principle can overcome many current limitations concerning software use,
deployment, maintenance and evolution in a SaaS context [3]. It also reduces
support costs, as only a single instance of the software has to be maintained [4].
Besides complying with specific customer requirements, a software vendor should
be able to offer a service to a large number of customers, each with their own
specific requirements.
2 Research Method
In this research, the variability realization techniques (VRTs) currently described
in literature and in place at large SaaS providers are observed. In order to do this,
a thorough literature study has been performed, in which the keyword combinations
'variability, SaaS' and 'variability, multi-tenancy' were used in Google
Scholar¹. The VRTs discussed in the papers having the aforementioned
keywords in their title or abstract were collected, and the patterns among those VRTs
were put in a pattern database. A total of 27 papers was collected this way.
Besides the literature study, two independent case studies were performed at
large ERP providers who recently launched their enterprise resource planning
(ERP) software as a service through the Internet (referred to as ErpCompA and
ErpCompB from here on). ErpCompA has a turnover of around 250 million
euros and around 20,000 users using their online product, while ErpCompB has
a turnover of around 50 million euros and around 10,000 users. The case studies
were performed using the case study research approach by Yin [9].
The VRTs are presented as architectural design patterns and created based
on the Design Science principles of Hevner [10], in which a constant design
cycle consisting of the construction and evaluation of the VRTs takes place.
The initial model was constructed using an exploratory focus group (EFG) [11],
consisting of participants from academia and the case companies, and a
systematic literature review [12]. The focus group has been carefully selected
and all participants have experience in the area of variable multi-tenant SaaS
environments. Additional validation of the VRTs was done by conducting interviews
with software architects within the two case companies [13].
¹ Google Scholar (www.scholar.google.com) indexes and searches almost all academic
publishers and repositories worldwide.
Multi-tenancy
Multi-tenancy can be defined as the ability to let different tenants share the
same hardware resources, by offering them one shared application and database
instance, while allowing them to configure the application to fit their needs as
if it runs on a dedicated environment [5]. A tenant refers to an organization or
part of an organization with their own specific requirements, renting the software
product. We define different levels of multi-tenancy:
Data Model Multi-tenancy: All tenants share the same database. All
data is typically provided with a tenant-specific GUID in order to keep
all data separate (a minimal data-access sketch follows after this list). Even
better is native support for multi-tenancy in the database management system [14].
Application Multi-tenancy: Besides sharing the same database, all ten-
ants also share the same instance of the software product. In practice, this
could also mean a couple of duplications of the same instance, coupled to-
gether with a tenant load balancer [8].
Full Multi-tenancy: All tenants share the same database and software
instances. They can also have their own variant of the product, based on
their tenant requirements. This level of multi-tenancy adds variability to the
software product.
The items above are sorted in ascending order of implementation complexity.
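The following hedged sketch illustrates the GUID-scoped data access mentioned for data model multi-tenancy; in a relational setting this would typically be a WHERE tenant_id = ? clause, and the in-memory filter and all names below are illustrative assumptions:

    import java.util.List;
    import java.util.stream.Collectors;

    /** Illustrative sketch of data model multi-tenancy: every record carries a
     *  tenant GUID and all queries are scoped by it. */
    public class TenantScopedRepository {

        record Invoice(String tenantId, String invoiceId, double amount) {}

        private final List<Invoice> table;   // stands in for a shared database table

        public TenantScopedRepository(List<Invoice> table) { this.table = table; }

        /** Returns only the rows belonging to the given tenant. */
        public List<Invoice> findByTenant(String tenantId) {
            return table.stream()
                    .filter(row -> row.tenantId().equals(tenantId))
                    .collect(Collectors.toList());
        }
    }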
Variability
The concept of variability comes from the car industry, in which different com-
binations of, for example, chassis, engine and color were defined as different vari-
ants. In software, the concept was first introduced in the area of software product
lines [15], in which variability is defined as the ability of a software system or
artefact to be efficiently extended, changed, customized or configured for use in
a particular context [16]. Within the area of software product lines, software is
developed by the software vendor and then shipped to the customer to be run
on-premises. This means variants have to be compiled before product shipping.
Within the area of Software-as-a-Service, software is still developed by the soft-
ware vendor, but the product is served to all customers through the internet from
one central place [3,8]. In principle, all variants can be composed the moment
customers ask for some specific functionality, so at run-time.
We identify two different types of variability within multi-tenant SaaS
deployments:
Segment Variability: Product variability based on the segment a tenant is
part of. Examples of such variability issues are different standard currencies
or tax rules per country, or a different layout for SMEs and sole proprietor-
ships.
Tenant-oriented Variability: Product variability based on the specific
requirements of a tenant. Examples of such variability issues are different
background colors or specific functionality.
Three levels of variability can be distinguished:
Low: Look and Feel: Changes only influencing the visual representation
of the product. These changes only occur in the presentation tier (tier-based
architecture [17]) or view element (MVC-based architecture [18]). Examples
include different background colors or different element sorting in lists.
Medium: Feature: Changes influencing the logic tier in a tier-based archi-
tecture or the model or controller element in an MVC-based architecture.
Examples include changes in workflow or the addition of specific func-
tionality.
High: Full: Variability of this level can influence multiple tiers at the same
time and can be specific. Examples of this level of variability include the
ability for tenants to run their own program code.
The scope of this research is focussed on runtime tenant-oriented low and medium
variability in multi-tenant SaaS deployments.
Design Patterns
The concept of patterns was first introduced by Christopher Alexander in his
book about the architecture of towns [19]. This concept was quickly picked up
in the software engineering world and led to the famous Gang of Four pattern
book by Gamma et al. [20]. This book describes several patterns that are still
used today and does this in a way that inspired a lot of subsequent pattern au-
thors. The definition of a pattern used in this paper originates from the Pattern-
Oriented Software Architecture series [21,22,23,24,25] and reads: 'A pattern for
software architecture describes a particular recurring design problem that arises
in specific design contexts, and presents a well-proven generic scheme for its solu-
tion. The solution scheme is specified by describing its constituent components,
their responsibilities and relationships, and the ways in which they collaborate.'
Patterns are not artificially created artifacts, but evolve from best practices
and experiences. The patterns described in this paper result from several case
studies and discussions with experienced software architects. All patterns have
proven to be a suitable solution for the problems described in Section 5, since they
are applied in successful SaaS products at our case companies. Also, all patterns
are described in a language- and platform-independent way, so the solutions can be
applied in various situations in the Software-as-a-Service domain. More information on
future research concerning the proposed patterns can be found in Section 6.
4 User-Variability Trade-off
The best solution for deploying a software product from a software vendor's per-
spective depends on the level of resources shared and the level of variability needed
to keep all users satisfied. In Figure 1, four deployment solutions are introduced
that are considered best practices in the specific situations shown. In this section,
the need for multi-tenant deployment models is explained.
[Fig. 1. Deployment trade-off model — quadrants: Custom Software Solution, Standard Multi-tenant Solution, SPL Solution, and Configurable Multi-tenant Solution; vertical axis: need for resource sharing (IaaS, PaaS); a = Business Growth, b = Customer Requirements Growth]
By using the model shown in Figure 1, software vendors can determine the
best-suited software deployment option. On the horizontal axis, the need for
variability in a software product is depicted, and the number of customers is
shown on the vertical axis. For a small software vendor who does not have
a lot of customers with specific wishes, a standard custom software solution
is sufficient. The more customers software vendors get (business growth), the
higher the need for a standard multi-tenant solution because of the advantages
in maintenance. When the amount of specific customer wishes grows, software
vendors can choose the software product line (SPL) approach to create variants
for all customers having specific requirements. This solution can lead to a lot of
extra maintenance issues as the number of customers grows. In case of a large
number of customers having specific requirements, a configurable multi-tenant
solution is the best solution for software vendors, keeping an eye on performance
and maintenance.
5 Variability Patterns
In this section, three patterns are described that were observed in the case stud-
ies that were conducted. The patterns are designed based on patterns observed
within the case companies' software products, extended by patterns already doc-
umented in literature [20,16]. All patterns will be explained by a UML diagram,
together with descriptive topics proposed by Buschmann et al. [24] and Gamma
et al. [20].
[Fig. 2. UML diagram — a settings component offering storeSetting(userID, setting) and retrieveSetting(userID, setting)]
Intent - To give the tenant the ability to indicate and save his preferences on
the representation of data shown.
Motivation - In a multi-tenant solution it is important to give tenants the feel-
ing they can customize the product the way they want it. This customizability
is most relevant in parts of the product where data is presented to the tenant.
Solution - In this variability pattern (cf. Figure 2), the representation of data is
performed at the client side. Tenants can, for example, choose how they want to sort
or filter their data, while the data queries do not have to be adapted. The only
change needed to a software product is the introduction of tenant-specific rep-
resentation settings. In this table, all preferred font colors, sizes and sort options
can be stored in order to retrieve this information on other occasions to display
the data again, according to the tenant's wishes.
Explanation - As can be seen in the UML representation of the pattern in Fig-
ure 2, the DataRepresentation class can manipulate the appearance of all data
by making use of a FunctionalComponent capable of sorting, filtering, etcetera. All
settings are later stored by a DataComponent in a specific UserSettings table.
Settings can later be retrieved by the same DataComponent, to be used again
by the DataRepresentation class and FunctionalModule.
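A hedged sketch of this interplay is shown below. The class names follow the UML description above, but the method signatures are slightly extended for illustration and the in-memory map stands in for the UserSettings table:

    import java.util.HashMap;
    import java.util.Map;

    /** Illustrative sketch of the data representation pattern: tenant-specific
     *  display preferences are stored and re-applied when data is rendered. */
    public class DataComponent {

        // (userID, setting name) -> setting value, standing in for the UserSettings table
        private final Map<String, String> userSettings = new HashMap<>();

        public void storeSetting(String userId, String setting, String value) {
            userSettings.put(userId + ":" + setting, value);
        }

        public String retrieveSetting(String userId, String setting, String defaultValue) {
            return userSettings.getOrDefault(userId + ":" + setting, defaultValue);
        }
    }

    class DataRepresentation {
        private final DataComponent dataComponent;

        DataRepresentation(DataComponent dataComponent) { this.dataComponent = dataComponent; }

        /** Renders data according to the tenant's stored sort preference. */
        String render(String userId, String data) {
            String sortOrder = dataComponent.retrieveSetting(userId, "sortOrder", "ascending");
            return "[" + sortOrder + "] " + data;   // actual sorting/filtering omitted
        }
    }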
Consequences - By implementing this pattern, one extra table has to be imple-
mented. Nothing changes in the way data selection queries have to be formatted.
Representation of all data has to be formatted in a default way, except if a ten-
ant changes this default way and stores his own preferences.
Example - In a bookkeeping program, a tenant can, for example, decide which
columns he wants to display and how he wants to order them. By clicking the
columns he wants to display, his preferences are saved in the database. When
the tenant uses the product again later, his preferences are fetched from the
database and applied to his data.
This section describes a pattern to create dynamic menus, based on the modules
associated with a tenant.
[Fig. 3. UML diagram — Button (image, description, link, mandatoryModule = moduleID) links to Module (moduleID, functionA() ... functionN())]
Intent - To provide a custom menu to all tenants, only containing links to the
functionality relevant to the tenant.
Motivation - Since all tenants have specific requirements for a software product,
they may all use different sets of functionality. Displaying all possible function-
ality in the menu would decrease the user experience of tenants, so menus have
to display only the functionality that is relevant to the tenant.
Solution - The pattern proposed here (cf. Figure 3) creates a menu out of different
buttons based on the modules associated with the tenant. Every time a tenant
displays the menu, the menu is built dynamically based on the modules he has
selected or bought.
Explanation - The Menu class aggregates and displays different buttons, each con-
taining a link to a specific module and the prerequisite for displaying this link
(mandatoryModule). The selection of buttons is done based on the results of
the ModuleChecker. This class checks whether an entry is available in the User-
Modules table, containing both the ID of the tenant (user) and the mandatory
module. If an entry is present, the Menu aggregates and displays the button
corresponding to this module.
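A minimal sketch of this check-and-build cycle is given below; the class names follow the UML description, while the data structures are illustrative stand-ins for the database tables:

    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    /** Illustrative sketch of the module-based menu pattern. */
    class ModuleChecker {
        private final Set<String> userModules;   // entries of the form "userId:moduleId"

        ModuleChecker(Set<String> userModules) { this.userModules = userModules; }

        boolean hasModule(String userId, String moduleId) {
            return userModules.contains(userId + ":" + moduleId);
        }
    }

    record Button(String description, String link, String mandatoryModule) {}

    class Menu {
        private final ModuleChecker checker;
        private final List<Button> allButtons;

        Menu(ModuleChecker checker, List<Button> allButtons) {
            this.checker = checker;
            this.allButtons = allButtons;
        }

        /** Only buttons whose mandatory module is licensed to the tenant are shown. */
        List<Button> buttonsFor(String userId) {
            return allButtons.stream()
                    .filter(b -> checker.hasModule(userId, b.mandatoryModule()))
                    .collect(Collectors.toList());
        }
    }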
Consequences - To be able to use this pattern, an extra table containing user
IDs and the modules available to each user has to be implemented. Also, the
extra class ModuleChecker has to be implemented. All buttons need a notion
of a mandatory module that can be checked by the ModuleChecker to verify if
a tenant wants or can have a link to the specific functionality.
Example - In a large bookkeeping product containing several modules that can
be bought by a tenant, the menus presented to the tenant can be dynamically
composed based on the tenant's license.
[Fig. 4. UML diagram — FunctionalComponent calls BusinessComponent and uses ComponentChecker, which checks the UserModules table (userID, moduleID); PreComponent and PostComponent implement the optional tenant-specific extensions]
Intent - To provide the possibility for tenants to have custom functionality just
before or after an event.
Motivation - In business-oriented software, workflows often differ per tenant.
To let the software product fit the tenant's business processes best, extra actions
could be made available to tenants before or after an event is called.
Solution - The pattern introduced here (cf. Figure 4) makes use of a component
capable of calling other components before and after the update of data. The tenant-
specific modules are listed in a separate table, similar to the pattern described
in Section 5.2.
Explanation - Before the FunctionalComponent calls the BusinessComponent
in order to perform an update, the ComponentChecker is used to check in the
UserModules table whether the tenant wants, and is allowed, to have an extra component
executed before the update is performed. After this, the BusinessComponent is called
and the update is performed. The DataComponent takes care of the writing of
data to a specific data table. After this, the ComponentChecker again checks the
UserModules table and a possible PostComponent is called.
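The following hedged sketch shows the pre/post hook structure; the class names follow the UML description, and the maps are illustrative stand-ins for the UserModules table and component registry:

    import java.util.Map;
    import java.util.Optional;

    /** Illustrative sketch of the pre/post component pattern. */
    class FunctionalComponent {

        interface ExtensionComponent { void execute(String tenantId, String payload); }

        private final Map<String, ExtensionComponent> preComponents;   // tenantId -> component
        private final Map<String, ExtensionComponent> postComponents;

        FunctionalComponent(Map<String, ExtensionComponent> preComponents,
                            Map<String, ExtensionComponent> postComponents) {
            this.preComponents = preComponents;
            this.postComponents = postComponents;
        }

        void update(String tenantId, String payload) {
            // Pre-component, if the tenant has registered one.
            Optional.ofNullable(preComponents.get(tenantId))
                    .ifPresent(c -> c.execute(tenantId, payload));

            businessUpdate(tenantId, payload);    // the mandatory business logic

            // Post-component, if the tenant has registered one.
            Optional.ofNullable(postComponents.get(tenantId))
                    .ifPresent(c -> c.execute(tenantId, payload));
        }

        private void businessUpdate(String tenantId, String payload) {
            System.out.println("Updating data for " + tenantId + ": " + payload);
        }
    }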
Consequences - Extra optional components have to be available in the software
system in order to be able to implement this pattern. The amount and type of
components available depend on the tenants' requirements.
References
1. Ma, D.: The Business Model of Software-As-A-Service. In: IEEE International
Conference on Services Computing, SCC 2007, pp. 701–702. IEEE, Los Alamitos
(2007)
2. Gold, N., Mohan, A., Knight, C., Munro, M.: Understanding service-oriented soft-
ware. IEEE Software 21(2), 71–77 (2005)
3. Turner, M., Budgen, D., Brereton, P.: Turning software into a service. Com-
puter 36(10), 38–44 (2003)
4. Dubey, A., Wagle, D.: Delivering software as a service. The McKinsey Quarterly 6,
112 (2007)
5. Bezemer, C., Zaidman, A.: Multi-tenant SaaS applications: maintenance dream or
nightmare? In: Proceedings of the International Workshop on Principles of Software
Evolution (IWPSE), pp. 88–92. ACM, New York (2010)
6. Bezemer, C., Zaidman, A., Platzbeecker, B., Hurkmans, T., Hart, A.: Enabling
multi-tenancy: An industrial experience report. In: 26th IEEE Int. Conf. on Soft-
ware Maintenance, ICSM (2010)
7. Guo, C., Sun, W., Huang, Y., Wang, Z., Gao, B.: A framework for native multi-
tenancy application development and management. In: The 9th IEEE International
Conference on E-Commerce Technology, pp. 551–558 (2007)
8. Kwok, T., Nguyen, T., Lam, L.: A software as a service with multi-tenancy support
for an electronic contract management application. In: IEEE International Confer-
ence on Services Computing, SCC 2008, vol. 2, pp. 179–186. IEEE, Los Alamitos
(2008)
9. Yin, R.: Case study research: Design and methods. Sage Publications, Inc., Thou-
sand Oaks (2009)
10. Hevner, A.R., March, S., Park, J., Ram, S.: Design science in information systems
research. MIS Quarterly 28(1), 75–105 (2004)
11. Tremblay, M.C., Hevner, A.R., Berndt, D.J.: The Use of Focus Groups in Design
Science Research. Design Research in Information Systems 22, 121–143 (2010)
12. Cooper, H.: Synthesizing research: A guide for literature reviews (1998)
13. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research
in software engineering. Empirical Software Engineering 14(2), 131–164 (2009)
14. Schiller, O., Schiller, B., Brodt, A., Mitschang, B.: Native support of multi-tenancy
in RDBMS for software as a service. In: Proceedings of the 14th International
Conference on Extending Database Technology, pp. 117–128. ACM, New York
(2011)
15. Pohl, K., Bockle, G., van der Linden, F.: Software product line engineering: founda-
tions, principles, and techniques. Springer-Verlag New York Inc., Secaucus (2005)
16. Svahnberg, M., van Gurp, J., Bosch, J.: A taxonomy of variability realization tech-
niques. Software: Practice and Experience 35(8), 705–754 (2005)
17. Eckerson, W.: Three Tier Client/Server Architectures: Achieving Scalability, Per-
formance, and Efficiency in Client/Server Applications. Open Information Sys-
tems 3(20), 46–50 (1995)
18. Krasner, G., Pope, S.: A description of the model-view-controller user inter-
face paradigm in the smalltalk-80 system. Journal of Object Oriented Program-
ming 1(3), 26–49 (1988)
19. Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I., Angel,
S.: A pattern language. Oxford Univ. Pr., Oxford (1977)
20. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design patterns: elements of
reusable object-oriented software. Addison-Wesley, Reading (1995)
21. Buschmann, F.: Pattern-Oriented Software Architecture: A System of Patterns,
vol. 1. John Wiley & Sons, Chichester (1996)
22. Schmidt, D.: Pattern-Oriented Software Architecture: Patterns for Concurrent and
Networked Objects, vol. 2. Wiley, Chichester (2000)
23. Kircher, M., Jain, P.: Pattern-Oriented Software Architecture: Patterns for Re-
source Management, vol. 3 (2004)
24. Buschmann, F., Henney, K., Schmidt, D.: Pattern-Oriented Software Architecture:
Pattern Language for Distributed Computing, vol. 4. Wiley, Chichester (2007)
25. Buschmann, F.: Pattern-Oriented Software Architecture: On patterns and Pattern
Languages, vol. 5 (2007)
Preface to Onto.Com 2011
This volume collects articles presented at the first edition of the International
Workshop on Ontologies and Conceptual Modeling (Onto.Com 2011). This workshop
was organized as an activity of the Special Interest Group on Ontologies and
Conceptual Modeling of the International Association of Ontologies and Applications
(IAOA). It was held in the context of the 30th International Conference on Conceptual
Modeling (ER 2011), in Brussels, Belgium. Moreover, the workshop was designed
with the main goal of discussing the role played by formal ontology, philosophical
logics, cognitive sciences and linguistics, as well as empirical studies in the
development of theoretical foundations and engineering tools for conceptual
modeling.
For this edition, we have received 18 submissions from Belgium, Canada, France,
Germany, Italy, New Zealand, Russia, South Africa, Spain, Tunisia, United Kingdom,
and the United States. These proposals were carefully reviewed by the members of
our international program committee. After this process, 7 articles were chosen for
presentation at the workshop. In the sequel, we elaborate on these selected
submissions.
In the paper entitled Experimental Evaluation of an ontology-driven enterprise
modeling language, Frederik Gailly and Geert Poels discuss an experiment to
evaluate the use of an enterprise modeling language which was developed using the
Resource Event Agent (REA) enterprise ontology and the Unified Foundational
ontology (UFO). The effect of using the ontology-driven modeling language is
analyzed using a well-known method evaluation model which contains both actual
and perception-based variables for measuring the efficiency and effectiveness of the
used method.
In Levels for Conceptual Modeling, Claudio Masolo proposes a (non-exclusive)
alternative to taxonomic structuring based on subtyping relations in conceptual
modeling. The author's proposal relies on two relations: factual existential
dependence and extensional atemporal parthood. On the basis of these relations, the
author elaborates on a strategy to stratify object types in different levels, and to
manage inheritance in a manner that addresses some classical difficulties in the
modeling of this notion (e.g. attribute overriding, attribute hiding, or dynamic and
multiple classifications and specialization).
In "Principled Pragmatism: A Guide to the Adaptation of Ideas from Philosophical Disciplines to Conceptual Modeling", David W. Embley, Stephen W. Liddle, and Deryle W. Lonsdale discuss the synergism among the traditional disciplines of ontology, epistemology, logic, and linguistics and their potential for enhancing the discipline of conceptual modeling. The authors argue that application objectives, rather than philosophical tenets, should guide the adaptation of ideas from these disciplines to the area of conceptual modeling.
In "Ontology Usage Schemes: A Working Proposal for the Ontological Foundation of Language Use", Frank Loebe proposes three theses regarding the relations between formal semantics, ontological semantics, and representation systems. Based on these theses, the author outlines and illustrates a proposal for establishing usage-specific and ontology-based semantic schemes. Moreover, the author establishes a relation
between these findings and works regarding the specific case of conceptual modeling
languages. Finally, he discusses potential applications of these proposals, including
semantics-preserving translations and the re-engineering of representations.
In "Gene Ontology based automated annotation: why it isn't working", Matthijs van der Kroon and Ana M. Levin propose an analysis of the current practice of ontology-based annotation in genome sequence applications. In particular, they analyze the current approaches based on the Gene Ontology (GO) and elaborate on the use of ontological categories to reflect on the pitfalls of these approaches.
In "Formal Ontologies, Exemplars, Prototypes", Marcello Frixione and Antonio Lieto discuss a fundamental notion in Knowledge Representation and Ontology Specification, namely, the notion of Concept. In the paper, they discuss problematic aspects of this notion in cognitive science, arguing that "concept" is an overloaded term, referring to different sorts of cognitive phenomena. Moreover, they sketch some proposals for concept representation in formal ontologies which take advantage of suggestions coming from cognitive science and psychological research.
Finally, in the paper entitled "Unintended Consequences of Class-based Ontological Commitment", Roman Lukyanenko and Jeffrey Parsons elaborate on a rather controversial thesis, namely, that what appears to be a clear advantage of domain-specific ontologies, i.e., the explicit representation of domain semantics, may in fact impede domain understanding and result in domain information loss. Moreover, the paper discusses what the authors claim to be unintended consequences of class-based ontological commitment and advocates instead for the adoption of an instance- and property-based ontological foundation for semantic interoperability support.
We would like to thank the authors who considered Onto.Com a forum for the presentation of their high-quality work. Moreover, we thank our program committee members for their invaluable contribution with timely and professional reviews. Additionally, we are grateful for the support received from the IAOA (International Association for Ontologies and Applications). Finally, we would like to thank the ER 2011 workshop chairs and organization committee for giving us the opportunity to organize the workshop in this fruitful scientific environment.
Experimental Evaluation of an Ontology-Driven Enterprise Modeling Language
Frederik Gailly and Geert Poels
Faculty of Economic, Political and Social Sciences and Solvay Business School, Vrije Universiteit Brussel
Frederik.Gailly@vub.ac.be
Faculty of Economics and Business Administration, Ghent University
Geert.Poels@ugent.be
1 Introduction
The enterprise modeling language used in this experiment is based on the Resource
Event Agent enterprise ontology and the Unified Foundational Ontology. The REA-
EO focuses on the creation of value in an enterprise and specifies that a business
process or transaction consists of Economic Events that transfer or consume
Economic Resources and Economic Agents that participate in these Economic Events.
Additionally, the ontology differentiates between different layers that deal with (i) the actual past and present in enterprise reality (i.e., what occurred or is occurring), (ii) the known future (i.e., what will occur), and (iii) the allowable states of the unknown future (i.e., what should occur). UFO is a core ontology developed by Guizzardi [7] that represents a synthesis of a selection of core ontologies and is tailored towards applications in conceptual modeling. During the creation of the REA enterprise modeling language, UFO was used for the ontological analysis of the REA-EO concepts, relations, and axioms.
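As a rough, illustrative sketch only (every class, attribute, and instance name below is our own assumption and not part of the REA profile or the experimental material), the operational core of the REA-EO just described can be written down as a few Python data structures:

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Illustrative encoding of the REA operational layer: Economic Events transfer
    # or consume Economic Resources, and Economic Agents participate in the events.
    @dataclass
    class EconomicResource:
        name: str

    @dataclass
    class EconomicAgent:
        name: str

    @dataclass
    class EconomicEvent:
        name: str
        affected_resources: List[EconomicResource] = field(default_factory=list)  # inflow/outflow
        inside_agent: Optional[EconomicAgent] = None   # agent accountable for the event
        outside_agent: Optional[EconomicAgent] = None  # agent participating in the event

    # A minimal instance: the company (inside agent) sells a training to a customer.
    sale = EconomicEvent(
        name="sale",
        affected_resources=[EconomicResource("training")],
        inside_agent=EconomicAgent("company"),
        outside_agent=EconomicAgent("customer"),
    )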
The development of the REA enterprise modeling language is described in [8] and results in the creation of a UML profile (i.e. the REA profile) that turns the ontologies into a usable modeling language. The ontological evaluation of the REA-EO using UFO supports the development of the REA profile in two ways. Firstly, it allows the stereotypes of the enterprise modeling language to be defined as specializations of the stereotypes of the OntoUML profile, a general-purpose modeling language that is core-ontology-driven because it extends the UML class diagram metamodel with stereotypes that have their origin in UFO [10]. Secondly, the ontological analysis also influences the development of the structuring rules for the enterprise modeling language because it identifies the domain-independent axioms, the domain-specific specializations of general axioms, and the domain-specific extensions that need to be translated into structuring rules. In the profile, the domain-independent axioms are inherited from the OntoUML profile and the domain-specific axioms are implemented using OCL.
The REA-EO has been empirically evaluated for business modeling in previous research projects. These experiments have shown that being able to consult the REA enterprise ontology as a reference model has a significant impact on the pragmatic quality of the developed models [11]. Moreover, model users perceive diagrams with REA pattern occurrences as easier to interpret than diagrams without [12]. The main difference between this research and the previous work is that in this experiment the REA ontology is not used as a reference model but instead is implemented as a domain-specific modeling language with accompanying structuring rules that are based on the ontology axioms, which have their origin in both a domain ontology and a core ontology.
effectiveness of using the modeling method and is defined as the degree to which a person believes that a particular method will be effective in achieving its intended objectives.
Following the work of Topi and Ramesh [14], the research model in Figure 1 also contains the variables that need to be kept constant or controlled in the experiment because they could influence the performance and perception-based variables, and we only want to measure the influence of the used method. Firstly, the characteristics of the modeler (e.g. domain knowledge, personal characteristics, modeling language knowledge) should be controlled. This will be done by randomly assigning the participants to a specific treatment (cf. infra). Secondly, the task that should be performed is kept constant by making sure that all modelers develop a business model for the same business case. Finally, the notation of the modeling language is also kept constant by using the UML class diagram notation in both methods.
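Such a random assignment of participants to the two treatments could be produced as simply as in the following sketch (participant names, the group size, and the seed are made up for illustration):

    import random

    participants = [f"student_{i}" for i in range(1, 41)]  # hypothetical participant list
    random.seed(2011)             # fixed seed so the assignment can be reproduced
    random.shuffle(participants)
    half = len(participants) // 2
    group_a = participants[:half]  # REA ontology as textual reference model
    group_b = participants[half:]  # REA UML profile treatment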
The specification of the experimental hypotheses is based on two different foundational theories and results in two different sets of hypotheses. On the one hand, based on the contiguity principle of the multimedia learning theory of Mayer [15], we believe that when using an integrated tool that implements both the language and the rules, instead of a reference model with an accompanying textual description of the axioms, the actual and perceived efficiency will be higher:
Ha: The efficiency of creating a business model is significantly higher when the model is created using the REA profile than when it is created using standard UML class diagrams with REA as a reference framework.
Hb: The perceived ease of use while creating a business model is significantly higher when the model is created using the REA profile than when it is created using standard UML class diagrams with REA as a reference framework.
On the other hand, we also believe that the integration of the core ontology axioms and domain-specific axioms in the domain-specific modeling language forces the modeler to take into account some general-purpose and domain-specific structuring rules which actually make it easier to create a high-quality model. This effect will be intensified by providing the group that uses the REA ontology as a reference model with only the domain-specific axioms and not the domain-independent axioms. Consequently, the second set of hypotheses expects that modelers who use the profile will develop models with a higher domain-specific quality and a higher integration quality, which should also result in a higher perceived usefulness:
Hc: The domain-specific quality of a business model is significantly higher when the model is created using the REA profile than when it is created using standard UML class diagrams with REA as a reference framework.
Hd: The integration quality of a business model is significantly higher when the model is created using the REA profile than when it is created using standard UML class diagrams with REA as a reference framework.
He: The perceived usefulness of creating a business model is significantly higher when the model is created using the REA profile than when it is modeled using standard UML class diagrams with REA as a reference framework.
4 Experimental Design
The potential users of an enterprise modeling language like REA are business people who want to model the value creation of business processes. Consequently, the participants of the experiment are students in a Master in Business Engineering program who have followed a business modeling course that covers topics such as conceptual data modeling (using ER modeling), business process modeling (using BPMN), and domain-specific modeling using profiles.
The dependent variable is operationalized by dividing the participants into two groups which will receive different treatments. Group A will get a one-hour introduction to REA enterprise modeling in which the REA model is used as a reference model and the relevant axioms are described in text. Group B will also learn the REA enterprise modeling language, but will learn it by means of the REA UML profile. It is important to note that for both groups only a limited version of the REA enterprise modeling language will be taught, because the complete modeling language is too complex to be taught in a limited time frame. As a consequence, we decided to exclude the policy layer of the REA ontology and to only take into account a limited set of structuring rules. The domain-specific structuring rules that will be taught to all participants are presented in Table 1. The domain-independent structuring rules that are only taught to group B and influence the integration quality of the model are presented in Table 2.
Table 1. Domain-specific structuring rules

Rule 1. Every Economic Agent must be related to an Economic Event via an accountability or participation relationship.
Rule 2. Instances of Economic Event must affect at least one instance of an Economic Resource by means of an inflow or outflow relation.
Rule 3. Instances of Economic Event must be related to at least one instance of an Economic Agent that is accountable for the event (i.e. inside agent) and to at least one instance of an Economic Agent that participates in the event (i.e. outside agent).
Rule 4. Instances of Commitment must affect at least one instance of an Economic Resource or Economic Resource Type by means of, respectively, a reserve or specify relationship.
Rule 5. Instances of Commitment must be related to at least one instance of an Economic Agent that is accountable for the event (i.e. inside agent) and to at least one instance of an Economic Agent that participates in the event (i.e. outside agent).
Rule 6. A Commitment that is connected to an Economic Resource or Economic Resource Type via a reserve-inflow or specify-inflow relation must be connected via an inflow-fulfill relation to an Economic Event that is connected to an inflow relation. A Commitment that is connected to an Economic Resource or Economic Resource Type via a reserve-outflow or specify-outflow relation must be connected via an outflow-fulfill relation to an Economic Event that is connected to an outflow relation.
Table 2. Domain-independent structuring rules

Rule 7. Instances of REA-EO Economic Resource and Economic Agent must always, directly or indirectly, be an instance of a Substance Sortal.
Rule 8. Economic Resources and Economic Agents cannot include in their general collection more than one Substance Sortal.
Rule 9. Economic Resources and Economic Agents cannot be specialized in an entity representing a Kind.
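To give an impression of how such structuring rules can be operationalized (in the profile itself they are OCL constraints; the Python rendering and the dictionary-based model encoding below are only our own sketch), rules 1 and 3 of Table 1 could be checked over a candidate model as follows:

    # Sketch only: checks rules 1 and 3 of Table 1 over a toy model encoded as dicts;
    # the encoding is an assumption, not the UML/OCL representation of the REA profile.
    model = {
        "agents": ["company", "customer", "partner"],
        "events": [
            {"name": "sale", "resources": ["training"],
             "inside_agent": "company", "outside_agent": "customer"},
        ],
    }

    def violations_rule_1(model):
        """Economic Agents not related to any Economic Event via an
        accountability or participation relationship (rule 1)."""
        related = {a for e in model["events"]
                   for a in (e["inside_agent"], e["outside_agent"])}
        return [a for a in model["agents"] if a not in related]

    def violations_rule_3(model):
        """Economic Events lacking an accountable (inside) or a participating
        (outside) Economic Agent (rule 3)."""
        return [e["name"] for e in model["events"]
                if not (e["inside_agent"] and e["outside_agent"])]

    print(violations_rule_1(model))  # ['partner']: not yet linked to any event
    print(violations_rule_3(model))  # []: the sale event satisfies rule 3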
After receiving the treatment, the two groups will receive the same business case, which is based on an existing business that sells trainings to its customers. The business case contains, on the one hand, the description of a business process, covering both the planning and the actual execution of the training, which can be modeled using the REA profile or using the REA ontology as reference. Additionally, the business case contains some information that is not within the scope of REA but is also relevant. The modeling of this information is given to the participants and should be integrated in the model. For instance, the business case indicates that every available training is provided by a partner organization and that a training is a special kind of product for the company. Moreover, the company makes a distinction between a training and a planned training.
Based on the case description, the students have to perform two tasks. First, they have to develop a BPMN model for the business process. The creation of this model has two goals: it forces the modeler to understand the process, and it will be used to control for the domain knowledge and modeling experience of the students. Next, they have to develop a business model, for which group A must use UML class diagrams and group B the REA UML profile. Figures 2 and 3 show, respectively, the BPMN model and the REA UML profile business model for the business case. The constructs in bold in Figure 3 represent the part of the business model that is given to the participants.
PEOU1. I found the procedure for applying the method complex and difficult to follow.
PEOU2. Overall, I found the method difficult to use.
PEOU3. I found the method easy to learn.
PEOU4. I found it difficult to apply the method to the business case.
PEOU5. I found the rules of the method clear and easy to understand.
PEOU6. I am not confident that I am now competent to apply this method in practice.
PU1. I believe that the method reduces the effort required to create a business model.
PU2. I believe that this method has improved my overall performance during the development of the business model.
PU3. This method makes it easier for users to create business models.
PU4. Overall, I found the method to be useful for the development of a business model.
PU5. I believe that this method allows me to create business models more quickly.
PU6. Overall, I think this method does not provide an effective solution to the problem of representing business models.
References
1. Wand, Y., Weber, R.: An Ontological Model of an Information System. IEEE Transactions on Software Engineering 16, 1282–1292 (1990)
2. Wand, Y., Storey, V.C., Weber, R.: An Ontological Analysis of the Relationship Construct in Conceptual Modeling. ACM Transactions on Database Systems 24 (1999)
3. Evermann, J., Wand, Y.: Ontology based object-oriented domain modelling: fundamental concepts. Requirements Engineering 10, 146–160 (2005)
4. Tairas, R., Mernik, M., Gray, J.: Using Ontologies in the Domain Analysis of Domain-Specific Languages. In: Chaudron, M.R.V. (ed.) MODELS 2008. LNCS, vol. 5421, pp. 332–342. Springer, Heidelberg (2009)
5. Gruninger, M., Lee, J.: Ontology Applications and Design: Introduction. Communications of the ACM 45, 39–41 (2002)
6. Henderson-Sellers, B.: Bridging metamodels and ontologies in software engineering. Journal of Systems and Software 84 (2011)
7. Guizzardi, G.: Ontological Foundations for Structural Conceptual Models. Ph.D. thesis (cum laude), Telematica Instituut / University of Twente, Enschede (2005)
8. Gailly, F., Geerts, G., Poels, G.: Ontology-driven development of a domain-specific modeling language: the case of an enterprise modeling language. FEB working paper series, Ghent University (2011)
9. Geerts, G., McCarthy, W.E.: An Accounting Object Infrastructure for Knowledge-Based Enterprise Models. IEEE Intelligent Systems and Their Applications 14, 89–94 (1999)
10. Benevides, A.B., Guizzardi, G.: A Model-Based Tool for Conceptual Modeling and Domain Ontology Engineering in OntoUML. In: Filipe, J., Cordeiro, J. (eds.) Enterprise Information Systems. LNBIP, vol. 24, pp. 528–538. Springer, Heidelberg (2009)
11. Poels, G., Maes, A., Gailly, F., Paemeleire, R.: The pragmatic quality of Resources-Events-Agents diagrams: an experimental evaluation. Information Systems Journal 21, 63–89 (2011)
12. Poels, G.: Understanding Business Domain Models: The Effect of Recognizing Resource-Event-Agent Conceptual Modeling Structures. Journal of Database Management 21 (2011)
13. Moody, D.: The Method Evaluation Model: A Theoretical Model for Validating Information Systems Design Methods. In: 11th European Conference on Information Systems, ECIS 2003 (2003)
14. Topi, H., Ramesh, V.: Human Factors Research on Data Modeling: A Review of Prior Research, An Extended Framework and Future Research Directions. Journal of Database Management 13, 3–19 (2002)
15. Moreno, R., Mayer, R.E.: Cognitive Principles of Multimedia Learning: The Role of Modality and Contiguity. Journal of Educational Psychology 91, 358–368 (1999)
16. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 319–340 (1989)
17. Maes, A., Poels, G.: Evaluating quality of conceptual modelling scripts based on user perceptions. Data & Knowledge Engineering 63, 701–724 (2007)
Levels for Conceptual Modeling
Claudio Masolo
a subtype nor a supertype of both Company and Person. "[W]e have the paradoxical situation that, from the extensional point of view, roles are supertypes statically, while dynamically they are subtypes" ([17], p.90). While keeping the same domain, this problem can be managed by adding new object types, e.g. Private Customer (subtype of both Person and Customer) and Corporate Customer (subtype of both Company and Customer) [7], or by introducing dynamic and multiple classification and specialization (see [17] for a review). Alternatively, more permissive or multiplicative approaches extend the domain with new entities. Steimann [17] separates natural types (e.g. Person) from role types (e.g. Customer). Roles are adjunct instances linked by a played-by relation to their players (the persons or companies in the case of customers). The object and its roles form an aggregate and "the dynamic picking up of a role corresponds to the creation of a new instance of the corresponding role type and its integration in a compound, and dropping a role means releasing the role instance from the unit and destroy it" ([17], p.91). In object-oriented database management systems, by distinguishing specialization, an abstract concept, from inheritance, a mechanism that implements specialization, [1] systematically multiplies the instances in the presence of a subtype relation. If P is a subtype of Q, then the creation of an object p of type P produces the creation of an object q of type Q plus a link between them that allows p to inherit attributes from q. "An object then is implemented by multiple instances which represent its many faceted nature. Those instances are linked together through aggregation links in a specialization relation" ([1], p.561). The attributes are locally defined and stored, but additional ones can be inherited via the links between the instances. From a more foundational perspective, multiplicative approaches have been investigated to solve the counting problem [9]. For instance, to count the Alitalia passengers (during 2010), one cannot just count the persons that flew Alitalia (during 2010). By adding qua-entities [12], (sum of) relational tropes [7], or role-holders [13], entities that inhere in (a sort of existential specific dependence), but are different from, the players (see Section 1 for more details), the counting problem is solved. In philosophy, multiplicativism is often considered also in the case of statues, organisms, tables, etc. (see [14] for a review and [3] for a recent defense). Interestingly, qua-entities have originally been introduced in this context [6]. As in the case of roles, statues and amounts of matter have different properties (in particular causal properties) and different persistence conditions. The amount of matter that constitutes a specific statue can change through time. Or, an amount of matter can constitute some statue only during a part of its life, when it is statue-shaped. Therefore, some authors assume that statues are constituted by (a sort of existential dependence), but different from, amounts of matter.
Taxonomies are undeniably an important conceptual tool to organize object types according to the set-theoretical inclusion between their extensions. But it is not the only one. This paper proposes a complementary structuring mechanism founded on a specific kind of existential dependence called grounding. This mechanism makes it possible to account for both roles and material objects, with a flexible management of inheritance that helps to avoid isa overloading and misuse.
Let us assume that statues can change their material support through time while maintaining their shape, i.e. the shape, not the material support, is essential for statues. It follows that Statue is not a subtype of Amount Of Matter. One can represent being a statue as a binary predicate with a temporal argument, where Statue_t x stands for "at the time t, the amount of matter x is a statue (is statue-shaped)" (d1).4 According to (d1), being a statue becomes a sort of (relational) role played by amounts of matter. Counting seems unproblematic: the statues present at t are the amounts of matter that are statue-shaped at t. However, problems arise by considering a non-atomic time, e.g. the whole 2010. A statue could change its material support during 2010, i.e. we could have two amounts of matter that are statue-shaped during 2010 but only one single statue. On the other side, if the same amount of matter, at a given time, is the support of two different statues, then we have one amount of matter but two statues. This sounds wrong because one usually excludes co-location of statues. Different are the cases of artifacts, intended as (material) objects with an assigned (by the creator) functionality [4], and roles, where, for example, at a given time the same person can be the customer of different companies or a multiple-customer of the same company (see [12] for more details). The strategy of multiplying predicates, e.g. one specialization of Statue for each statue, runs into the problem of expressing what is the exact property that identifies the amounts of matter that, for instance, are David at different times.
d1  Statue_t x ≜ AmountOfMatter x ∧ ∃y(x HasShape_t y ∧ StatueShape y)
A multiplicative approach helps in managing these problems. In the literature, the nature and the relations among different kinds of entities are discussed. Four-dimensionalism (see [15]) accepts spatio-temporal worms. A statue, say david, and the amount of matter m that constitutes david only during a part of its life are two overlapping but different worms: some temporal slices of david are not part of m. Problems can arise when david and m coincide (share the same slices) during their whole lives. Some approaches refuse spatio-temporal coincidence. Other approaches support a modal distinction founded on slices spreading across possible worlds: david and m are different world-spatio-temporal worms because david can exist without coinciding with m (and vice versa).
In a three-dimensionalist perspective, multiplicative positions (see [18]) assume that statues (generically) existentially depend on, more precisely they are constituted by, amounts of matter without overlapping with them. In particular, Fine [6] analyzes constitution on the basis of the notion of qua-entity. If an object a, the basis, instantiates a property P, the gloss, then there exists an additional entity, a-qua-P, that is a sort of amalgam of a and P.5 The entity a-qua-P, e.g. m-qua-s-shaped (m-qua-having-shape-s), exists at every time at which a instantiates P, it is uniquely determined by a and P, and it can inherit (not necessarily
4 To avoid reference to shapes one can consider StatueShaped_t x where x is an object.
5 Qua-entities seem similar to states of affairs as defined in [2].
6 Differently from classical temporal slices (see the definition in [15]), qua-entities persist through time when the basis instantiates the gloss during a whole interval.
7 Here customer-of is a relation defined on persons and companies. Qua-entities are then identified by a person and a property like being a customer of company A. DBs often add customer codes that, however, in general, are keys to identify persons, not customers. This is due to the fact that DBs do not refer to persons; they just manage clusters of attributes (e.g. Name, Date Of Birth, etc.) that do not always identify persons. Customer codes could be conceptually necessary when the same person can have different customer roles inside the same company according to, for instance, his/her rights or obligations. In this case, the way qua-entities are identified is different because there is a third argument in customer-of.
8 In this view customers coincide with single qua-entities, a limit case of mereological sum, that have the form person-qua-customer-of-A. This explains why multiplicativist models of roles often consider only qua-entities and not general sums.
9 Some authors claim that roles are necessarily based on anti-rigid properties. I will not address this topic here.
10 It is not clear to me whether unity criteria that involve diachronic constraints are part of the glosses.
a sum of four legs and one top, and the gloss is a structural property reducible to some spatial relations holding between the legs and the top. In this case there are two unity criteria: a synchronic one that establishes how the legs and the top must be structured, and a diachronic one that establishes the allowed substitutions of legs and tops through time.
Despite the differences previously discussed, I think that a unified view on (structured and unstructured) material objects and roles is possible. In the end, all these kinds of objects have an intensional dimension: to be identified, they rely on intensional rules.
12 I will focus here only on objects present at some time.
13 Sums need to be carefully managed because not all the summands necessarily exist at every time at which the sum exists.
i.e. grounding does not hold between objects belonging to the same leaf type. Together with (a5), it avoids grounding loops. (a4) and (a5) are basic requirements for structuring (leaf) types in levels that also assure the maximality (with respect to parthood) of the grounds.14
After all these technical details, I will now introduce three grounding relations useful to organize types in levels. The formal definitions characterize the semantics of these grounding relations but, once understood, they can be used as conceptual modeling primitives. In this sense, according to the following quote, they can be seen as an abstraction, simplification, and hiding of the previous analysis: "The theoretical notions which are required for suitable characterizations of domain conceptualizations are of a complex nature. This puts emphasis on the need for appropriate computational support for hiding as much as possible this inherent complexity from conceptual modeling practitioners." ([8], p.9)
T1 is (directly) specifically grounded on T2 (a6), noted T1 ⇒ T2, if every T1-object is (directly) grounded on a single T2-object during its whole life, e.g. Customer ⇒ Person. It is often motivated by emergent properties. Customer is no more modeled as a subtype of Person. Customer is now a rigid type, i.e. a customer is necessarily a customer, with specific attributes. I think this is quite a simplifying CM technique. Furthermore, the temporal extension of a customer is included in the one of the person (a different object) that grounds him, i.e., to exist, a customer requires a grounding person while persons do not require customers. We will see how (some of) the attributes of Person can be inherited by Customer and vice versa.
T1 is (directly) generically grounded on T2 (a7), noted T1 ⇝ T2, if every T1-object is (directly) grounded on some, but not necessarily the same, T2-object, e.g. Statue ⇝ AmountOfMatter. It is often motivated by different persistence conditions.15 Note that the proposed framework does not commit to a specific ontological theory of persistence. One can quantify on both statues and amounts of matter without including in the domain temporal slices, qua-entities, states of affairs, events, or tropes. Indeed, without being forced to, the modeler can, through axioms that link statues and amounts of matter, make explicit the underlying theory of persistence (in addition to the unity criteria).
T is (directly and generically) compositionally grounded on T1, ..., Tm if every T-object is (directly) grounded on some, but not necessarily the same, mereological sum of T1-, ..., Tm-objects. It is often motivated by structural relations among T1-, ..., Tm-objects. I distinguish definite compositional grounding (a8),16 noted
14 In general, levels are not necessarily linear and they can be conceived as collections of objects that obey the same laws of nature, have common identity criteria, or persistence conditions. These are interesting points for CM that deserve future work.
15 Customers are not completely determined by persons, nor statues by amounts of matter. Grounding does not necessarily imply reduction; it differs from the determination used to explain supervenience, e.g. "The mental is dependent on the physical, or the physical determines the mental, roughly in the sense that the mental nature of a thing is entirely fixed by its physical nature" ([10], p.11).
16 In (a8) and (a9), ¬Ti(x + y) is a shortcut for ∀s(sSM{x, y} → ¬Ti s) ∧ ∃s(sSM{x, y}).
T ⇝ (n1)T1; ...; (nm)Tm, e.g. Table ⇝ Surface; (4)Leg,17 i.e. when a table exists it is grounded on exactly one surface and four legs, from (at least) indefinite compositional grounding (a9), noted T1 ⇝ (≥n)T2, e.g. Organism ⇝ (≥2)Cell, i.e. organisms are grounded on at least two cells even though the exact number of grounding cells can vary in time.18 To count the grounding objects one must rely on clear principles that identify unitary objects. For example, I would exclude Statue ⇝ (≥2)AmountOfMatter and Statue ⇝ (2)AmountOfMatter. Here I just assume a mereological principle, i.e. the grounding Ti-objects do not overlap and their sums are not of type Ti (see (a8) and (a9)).19
a6  T1 x → ∃y(T2 y ∧ ∀t(E_t x → y ≻_t x))   (specific direct grounding)
a7  T1 x → ∀t(E_t x → ∃y(T2 y ∧ y ≻_t x))   (generic direct grounding)
a8  T x → ∀t(E_t x → ∃y11 ... ∃y1n1 ... ∃ym1 ... ∃ymnm ∃s(
      E_t y11 ∧ ... ∧ E_t ymnm ∧ sSM{y11, ..., ymnm} ∧ s ≻_t x ∧
      T1 y11 ∧ ... ∧ T1 y1n1 ∧ ¬y11Oy12 ∧ ... ∧ ¬y1,n1−1Oy1n1 ∧ ¬T1(y11 + y12) ∧ ... ∧
      ... ∧
      Tm ym1 ∧ ... ∧ Tm ymnm ∧ ¬ym1Oym2 ∧ ... ∧ ¬Tm(ym1 + ym2) ∧ ...))
a9  T1 x → ∀t(E_t x → ∃s(s ≻_t x ∧ ∀z(zPs → ∃u(uPz ∧ T2 u)) ∧
      ∃y1 ... ∃yn(E_t y1 ∧ ... ∧ E_t yn ∧ y1Ps ∧ ... ∧ ynPs ∧ T2 y1 ∧ ... ∧ T2 yn ∧
      ¬y1Oy2 ∧ ... ∧ ¬yn−1Oyn ∧ ¬T2(y1 + y2) ∧ ¬T2(y1 + y3) ∧ ...)))
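Read operationally, (a6) and (a7) are checkable constraints over a temporally indexed model. The following sketch is only our own illustration (the dictionary encoding of "exists at t" and "is directly grounded on at t" is an assumption); it tests specific grounding of Customer on Person and generic grounding of Statue on AmountOfMatter over a two-snapshot history in which the statue changes its material support:

    # Toy history: time -> {object: (type, object it is directly grounded on)}.
    history = {
        1: {"c1": ("Customer", "p1"), "p1": ("Person", None),
            "david": ("Statue", "m1"), "m1": ("AmountOfMatter", None)},
        2: {"c1": ("Customer", "p1"), "p1": ("Person", None),
            "david": ("Statue", "m2"), "m2": ("AmountOfMatter", None)},
    }

    def grounded_on(obj, ground_type, specific):
        """True if obj is grounded, at every time it exists, on an object of
        ground_type; with specific=True the grounding object must additionally
        be the same during obj's whole life (a6), otherwise it may vary (a7)."""
        grounds = set()
        for snapshot in history.values():
            if obj in snapshot:
                g = snapshot[obj][1]
                if g is None or snapshot.get(g, (None, None))[0] != ground_type:
                    return False
                grounds.add(g)
        return len(grounds) == 1 if specific else len(grounds) >= 1

    print(grounded_on("c1", "Person", specific=True))             # True
    print(grounded_on("david", "AmountOfMatter", specific=True))  # False: support changes
    print(grounded_on("david", "AmountOfMatter", specific=False)) # True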
Generic (or specific) grounding relations can be easily combined. For example, Kitchen ⇝ Table; Oven; (≥2)Chair. To mix specific and generic (compositional) grounding, one just needs to introduce more elaborate definitions. E.g., Car ⇛ Chassis; Engine; (4)Wheel; (≥1)WindscreenWiper (⇛ is heterogeneous grounding) stands for "cars specifically depend on a chassis and generically depend on an engine, four wheels, and at least one windscreen wiper". Methodologically, one can start from the fundamental types, types that are not grounded,20 and then, according to the grounding relations, progressively arrange the other (leaf) types in layers. Figure 1 depicts a simple example (with only one fundamental type, namely AmountOfMatter) that shows the weakness of the notion of level: types can be grounded on types that have a different distance from the fundamental level, as in the case of Exhibition.
Inheritance. All the types involved in grounding relations are rigid and disjoint
from the ones on which they are grounded. Customers, statues, and tables are
17 I write Table ⇝ Surface; (4)Leg instead of Table ⇝ (1)Surface; (4)Leg. This is consistent with the fact that T1 ⇝ T2 is equivalent to T1 ⇝ (1)T2, i.e. generic compositional grounding is an extension of generic grounding.
18 "At most" indefinite compositional grounding adds cardinality constraints (for example, FootballTeam ⇝ (11...22)FootballPlayer). Moreover, indefinite compositional grounding can also be used to introduce levels of granularity, even though additional constraints are necessary (see [11] for a preliminary discussion).
19 Specific compositional grounding can be defined starting from the corresponding generic case by considering the form in (a6) instead of the one in (a7).
20 The existence of a (unique) fundamental level is debated in philosophy. However, in applicative terms, I don't see any drawback in accepting fundamental types.
[Fig. 1. Grounding structure of the example: Exhibition grounded on Statue and Table; Table compositionally grounded on (1) Surface and (4) Leg; Statue, Surface, and Leg grounded on the fundamental type AmountOfMatter.]
such during their whole life. Grounding and subtyping are separate relations, therefore the problems due to isa overloading trivially disappear. As a drawback, we lose the power of the inheritance mechanism. However, Baker [3] observes that constitution (a specific grounding) provides a unity, it allows the constituted entity to inherit (to derivatively have) some properties from the constituting one and vice versa.21 E.g. amounts of matter (persons) can inherit the style (right to vote for student representatives) from the statues (students) they ground. On the basis of these observations, following [1], the inheritance of attributes of grounded types must be controlled. By default, T1 ⇒ T2 or T1 ⇝ T2 implies that all the attributes of T2 are inherited by T1. T1[A11, ..., A1n] ⇒ T2[A21, ..., A2m] means that only the T2 attributes A21, ..., A2m are exported to T1, while the T1 attributes A11, ..., A1n are exported to T2. Similarly in the case of generic grounding. Statue[Style] ⇝ AmountOfMatter means that Statue inherits all the attributes of AmountOfMatter, while the latter type inherits only the attribute Style from Statue. In this way, attribute hiding can be trivially modeled. Attribute overriding can be approached by systematically overriding the attributes of the grounding type or by localizing all the attributes as in [1]. The case of compositional dependence is interesting. Some attributes of the grounded object can be obtained from a composition of the attributes of the grounds. For example, the weight of tables is the sum of the weights of the grounding legs and surfaces. If necessary, these rules can be explicitly added as constraints. Alternatively, one can add dependences among the values of attributes.
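A minimal sketch of this controlled, bidirectional attribute export along a grounding link (our own rendering, not the notation of [1]; attribute names and values are invented) could look as follows:

    # Default: the grounded type (Statue) imports all attributes of its ground
    # (AmountOfMatter), while the ground only imports the explicitly exported
    # attributes of the grounded type (here: Style), as in Statue[Style]
    # generically grounded on AmountOfMatter.
    amount_attrs = {"Weight": 120, "Material": "marble"}   # AmountOfMatter
    statue_attrs = {"Style": "renaissance"}                # Statue
    exported_to_ground = {"Style"}

    statue_view = {**amount_attrs, **statue_attrs}         # inherits Weight, Material
    amount_view = {**amount_attrs,
                   **{k: v for k, v in statue_attrs.items() if k in exported_to_ground}}

    # Compositional grounding: an attribute of the grounded object derived from
    # its grounds, e.g. the weight of a table as the sum of its parts' weights.
    part_weights = {"surface": 30, "leg1": 5, "leg2": 5, "leg3": 5, "leg4": 5}
    table_weight = sum(part_weights.values())

    print(statue_view, amount_view, table_weight)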
Grounding and Subtyping. It is trivial to prove that if T1 ⊑ T2 22 and T2 ⇒ T3 then T1 ⇒ T3. Vice versa, from T1 ⊑ T2 and T1 ⇒ T3, T2 ⇒ T3 does not follow. Moreover, from T1 ⇒ T2 and T2 ⊑ T3 it follows that T1 ⇒ T3, but one loses the information about the specific subtype on which T1 is grounded. A parsimonious approach considers only maximally informative grounding relations T1 ⇒ T2: T1 is maximal with respect to subtyping, while T2 is minimal. This criterion (together with the fact that only direct grounding relations are considered) makes it possible to clarify the nature of abstract types like MaterialObject. Let us assume Leg ⊑ MaterialObject, Surface ⊑ MaterialObject, and Table ⊑ MaterialObject, and compare the model that considers all the grounding relations in Figure 1
21 However, high-level properties are not always reducible to properties of substrata.
22 ⊑ represents the subtyping relation. The following results hold also for generic dependence. Here I do not consider the composition of grounding relations.
with the one with only MaterialObject ⇝ AmountOfMatter. Given the same taxonomical information, only the first model makes explicit that MaterialObject is an abstract and multi-level type.
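The propagation of grounding across subtyping described above can be computed as a small closure. The sketch below (our own encoding of the example types) derives which types end up grounded on AmountOfMatter when only the abstract statement about MaterialObject is given:

    # Sketch: if T1 is a subtype of T2 and T2 is grounded on T3, then T1 is
    # grounded on T3 (at the price of losing more specific grounding information).
    subtype = {("Leg", "MaterialObject"), ("Surface", "MaterialObject"),
               ("Table", "MaterialObject")}
    grounding = {("MaterialObject", "AmountOfMatter")}

    def closure(subtype, grounding):
        derived = set(grounding)
        changed = True
        while changed:
            changed = False
            for (t1, t2) in subtype:
                for (g1, g2) in list(derived):
                    if g1 == t2 and (t1, g2) not in derived:
                        derived.add((t1, g2))
                        changed = True
        return derived

    # Contains (Leg, AmountOfMatter), (Surface, AmountOfMatter), and
    # (Table, AmountOfMatter) besides the original statement, but says nothing
    # about Table being grounded on Surface and Leg, as Figure 1 does.
    print(closure(subtype, grounding))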
References
1. Al-Jadir, L., Michel, L.: If we refuse the inheritance... In: Bench-Capon, T.J.M., Soda, G., Tjoa, A.M. (eds.) DEXA 1999. LNCS, vol. 1677, pp. 569–572. Springer, Heidelberg (1999)
2. Armstrong, D.M.: A World of States of Affairs. Cambridge University Press, Cambridge (1997)
3. Baker, L.R.: The Metaphysics of Everyday Life. Cambridge University Press, Cambridge (2007)
4. Borgo, S., Vieu, L.: Artefacts in formal ontology. In: Meijers, A. (ed.) Handbook of Philosophy of Technology and Engineering Sciences, pp. 273–308. Elsevier, Amsterdam (2009)
5. Correia, F.: Existential Dependence and Cognate Notions. Ph.D. thesis, University of Geneva (2002)
6. Fine, K.: Acts, events and things. In: Sixth International Wittgenstein Symposium, Kirchberg-Wechsel, Austria, pp. 97–105 (1982)
7. Guizzardi, G.: Ontological Foundations for Structural Conceptual Models. Ph.D. thesis, University of Twente (2005)
8. Guizzardi, G., Halpin, T.: Ontological foundations for conceptual modeling. Applied Ontology 3(1-2), 1–12 (2008)
9. Gupta, A.: The Logic of Common Nouns. PhD thesis, Yale University (1980)
10. Kim, J.: Mind in a Physical World. MIT Press, Cambridge (2000)
11. Masolo, C.: Understanding ontological levels. In: Lin, F., Sattler, U. (eds.) Proceedings of the Twelfth International Conference on the Principles of Knowledge Representation and Reasoning (KR 2010), pp. 258–268. AAAI Press, Menlo Park (2010)
12. Masolo, C., Vieu, L., Bottazzi, E., Catenacci, C., Ferrario, R., Gangemi, A., Guarino, N.: Social roles and their descriptions. In: Dubois, D., Welty, C., Williams, M.A. (eds.) Proceedings of the Ninth International Conference on the Principles of Knowledge Representation and Reasoning (KR 2004), pp. 267–277 (2004)
13. Mizoguchi, R., Sunagawa, E., Kozaki, K., Kitamura, Y.: A model of roles within an ontology development tool: Hozo. Applied Ontology 2(2), 159–179 (2007)
14. Rea, M. (ed.): Material Constitution. Rowman and Littlefield Publishers (1996)
15. Sider, T.: Four-Dimensionalism. Clarendon Press, Oxford (2001)
16. Simons, P.: Parts: a Study in Ontology. Clarendon Press, Oxford (1987)
17. Steimann, F.: On the representation of roles in object-oriented and conceptual modelling. Data and Knowledge Engineering 35, 83–106 (2000)
18. Vieu, L., Borgo, S., Masolo, C.: Artefacts and roles: Modelling strategies in a multiplicative ontology. In: Eschenbach, C., Gruninger, M. (eds.) Proceedings of the Fifth International Conference on Formal Ontology and Information Systems (FOIS 2008), pp. 121–134. IOS Press, Amsterdam (2008)
Principled Pragmatism: A Guide to the
Adaptation of Ideas from Philosophical
Disciplines to Conceptual Modeling
1 Introduction
ideas from these areas (Section 2.2). Appropriate selectivity requires tempering by two crucial, overarching considerations: the adaptation must be principled, and it must be pragmatic. From our perspective, forcing purist views for adaptation may overly complicate the conceptual-modeling application, in opposition to Einstein's sufficiency-with-simplicity adage. To make our views concrete, we present several case-study examples to show how the principle of practical pragmatism has served and can further serve as a guide to adapting ideas to conceptual modeling (Section 3). Then, as we conclude (Section 4), we generalize and assert that the principled pragmatism we advocate is potentially more far-reaching in its implications than just the case-study examples we use for illustration. It provides a vision and perspective for adapting philosophical disciplines to conceptual-modeling applications. Further, it answers in part the question about the relationship among Ontology as an Artifact, Ontology as a Philosophical Discipline, Conceptual Modeling, and Metamodeling.
[Figure: conceptual-model graph of a car-ads ontology relating Car to Mileage, Price, Color, Make, Year, ModelTrim, Engine, Transmission, and Feature, annotated with participation constraints such as 0:1, 0:*, and 1:*.]
Price
internal representation: Integer
external representation: \$[1-9]\d{0,2},?\d{3} | \d?\d [Gg]rand | ...
context keywords: price|asking|obo|neg(\.|otiable)| ...
...
LessThan(p1: Price, p2: Price) returns (Boolean)
context keywords: (less than | < | under | ...)\s*{p2} | ...
...
Make
...
external representation: CarMake.lexicon
...
Fig. 2. Linguistic Grounding of Price and Make (partial)
Our use of logic and reasoning is likely not controversial, although we must be careful not to adopt so much that our system becomes undecidable or intractable. Our use of linguistics is intentionally simplistic: our lexical grounding computationally enables the conveyance of linguistic meaning without the necessity of the traditional depth of natural-language processing systems.
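The lexical grounding of Fig. 2 can be read as executable recognizers. The following sketch (with simplified regular expressions and a made-up advertisement; it is not the authors' extraction-ontology implementation) shows how the external representation and context keywords of Price might be applied to a car ad:

    import re

    # Simplified versions of the Price expressions from Fig. 2.
    price_external = re.compile(r"\$[1-9]\d{0,2},?\d{3}|\d?\d\s+[Gg]rand")
    price_context  = re.compile(r"price|asking|obo|neg(\.|otiable)", re.IGNORECASE)

    ad = "2002 Honda Accord, low mileage, asking $7,500 obo."

    candidates = [m.group() for m in price_external.finditer(ad)]
    # Keep a candidate only if a context keyword occurs in the same ad text.
    prices = [c for c in candidates if price_context.search(ad)]
    print(prices)  # ['$7,500']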
[Figure: OSM model of a surveillance controller with states Ready, Active, and Delay; transitions triggered by @init, @reset, @detection, @delay end, and @user abort; actions such as init/reset detectors and alarms, activate alarm, start delay, emergency notify, and record a new detection event; plus an object-model fragment relating Detector, Detector ID, detection events, and Timestamp.]
Valuable local information is often available on the web, but encoded in a foreign language that non-local users do not understand. Hence the epistemological and linguistic problem: can we create a system to allow a user to query in language L1 for facts in a web page written in language L2? We propose a suite of multilingual extraction ontologies as a solution to this problem [13]. We ground extraction ontologies in each language of interest, and we map both the
data and the metadata via the language-specific extraction ontologies through a central, language-agnostic ontology, reifying our ontological commitment to cross-linguistic information extraction in a particular domain. Our world model can thus be language-neutral and grounded in semantic and lexical equivalencies to a practical, operational degree [14]. This allows new languages to be added by only having to provide one language-specific ontology and its mapping to the language-agnostic ontology.
Mappings at several linguistic levels are required to assure multilingual functionality. Structural mappings (à la mappings for schema integration) associate related concepts across languages. Data-instance mappings mediate content of a largely terminological nature: scalar units (i.e. fixed-scale measurements such as weight, volume, speed, and quantity); lexicons (i.e. words, phrases, and other lexical denotations); transliteration (i.e. rendering of proper names across orthographic systems); and currency conversions (i.e. mapping temporally varying indexes like monetary exchange rates). Most of this type of information is primarily referential in nature, with some element of pragmatics since commonly adopted standards and norms are implied, and these vary across cultural and geopolitical realms (e.g. use of the metric vs. imperial measurement systems). One other type of mapping, called commentary mappings, is added to the ontologies: these document cultural mismatches that may not be apparent to the monolingual information seeker, for example tipping practices in restaurants in the target culture.
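As a toy illustration of data-instance mappings (the unit factor, lexicon entries, and concept identifiers below are our own assumptions, not the mapping formalism of [13]), a scalar-unit conversion and a small lexicon mapping into a language-agnostic concept could be combined as follows:

    # Scalar-unit mapping: imperial miles to metric kilometres.
    def miles_to_km(miles: float) -> float:
        return round(miles * 1.609344, 1)

    # Lexicon mapping: language-specific surface forms to a language-agnostic concept.
    lexicon = {
        "Kilometerstand": "mileage",   # German
        "kilométrage": "mileage",      # French
        "mileage": "mileage",          # English
    }

    source_fact = {"attribute": "Kilometerstand", "value": 30000, "unit": "km"}
    target_fact = {"concept": lexicon[source_fact["attribute"]],
                   "value": source_fact["value"],   # already metric, no conversion
                   "unit": "km"}
    print(target_fact, miles_to_km(62.0))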
Fig. 5. Text Snippet from Page 419 of the Ely Family History
4 Concluding Remarks
We have argued in this position paper that the relationship among Ontology as an Artifact, Ontology as a Philosophical Discipline, Conceptual Modeling, and Metamodeling is synergistic, but should be tempered with principled pragmatism. For conceptual-modeling applications the goal is practical serviceability, not philosophical enrichment. Thus, conceptual-model researchers should draw ideas and seek guidance from these disciplines to enhance conceptual-modeling applications, but should not become distracted from computationally practical solutions by insisting that philosophical tenets should prevail.
To show how principled pragmatism works for building conceptual-modeling applications, we presented several case-study examples. Tempered by Einstein's sufficiency-with-simplicity adage, each case study illustrates how to directly leverage ideas from philosophical disciplines. Further, these case studies show that the requirement for applications to have computationally tractable solutions itself leads to principled pragmatism. Discipline principles can spark
References
1. Smith, B.: Ontology. In: Floridi, L. (ed.) Blackwell Guide to the Philosophy of Computing and Information, pp. 155–166. Blackwell, Oxford (2003)
2. Xu, L., Embley, D.W.: A composite approach to automating direct and indirect schema mappings. Information Systems 31(8), 697–732 (2006)
3. Embley, D.W., Campbell, D.M., Jiang, Y.S., Liddle, S.W., Lonsdale, D.W., Ng, Y.-K., Smith, R.D.: Conceptual-model-based data extraction from multiple-record web pages. Data & Knowledge Engineering 31(3), 227–251 (1999)
4. Embley, D.W., Liddle, S.W., Lonsdale, D.W.: Conceptual modeling foundations for a web of knowledge. In: Embley, D.W., Thalheim, B. (eds.) Handbook of Conceptual Modeling: Theory, Practice, and Research Challenges, ch. 15, pp. 477–516. Springer, Heidelberg (2011)
5. Aristotle: Metaphysics. Oxford University Press, New York, about 350BC (1993 translation)
6. Embley, D.W., Kurtz, B.D., Woodfield, S.N.: Object-oriented Systems Analysis: A Model-Driven Approach. Prentice-Hall, Englewood Cliffs (1992)
7. Embley, D.W., Liddle, S.W., Pastor, O.: Conceptual-Model Programming: A Manifesto, ch. 1, pp. 3–16. Springer, Heidelberg (2011)
8. Bunge, M.A.: Treatise on Basic Philosophy: Ontology II: A World of Systems, vol. 4. Reidel, Boston (1979)
9. Knowledge discovery and dissemination program, http://www.iarpa.gov/-solicitations_kdd.html/
10. ACM-L-2010 Workshop, http://www.cs.uta.fi/conferences/acm-l-2010/
11. Liddle, S.W., Embley, D.W.: A common core for active conceptual modeling for learning from surprises. In: Chen, P.P., Wong, L.Y. (eds.) ACM-L 2006. LNCS, vol. 4512, pp. 47–56. Springer, Heidelberg (2007)
12. Clyde, S.W., Embley, D.W., Liddle, S.W., Woodfield, S.N.: OSM-Logic: A Fact-Oriented, Time-Dependent Formalization of Object-oriented Systems Modeling (2012) (submitted for publication, manuscript), www.deg.byu.edu/papers/
13. Embley, D.W., Liddle, S.W., Lonsdale, D.W., Tijerino, Y.: Multilingual ontologies for cross-language information extraction and semantic search. In: De Troyer, O., et al. (eds.) ER 2011 Workshops. LNCS, vol. 6999. Springer, Heidelberg (2011)
14. Nirenburg, S., Raskin, V., Onyshkevych, B.: Apologiae ontologiae. In: Proceedings of the 1995 AAAI Spring Symposium: Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity, Menlo Park, California, pp. 95–107 (1995)
15. Plato: Theaetetus. BiblioBazaar, LLC, Charleston, South Carolina, about 360BC (translated by Benjamin Jowett)
Ontological Usage Schemes
A Working Proposal for the Ontological Foundation of
Language Use
Frank Loebe
1 Introduction
The proposals in this article, and the considerations that led to them, arose and have since been pursued in the context of a long-term ontology development enterprise, building the top-level ontology General Formal Ontology (GFO)1 [9]. Top-level ontologies naturally lend themselves to finding applications in numerous areas. This poses a challenge as regards the representation of the ontology, i.e., its provision in appropriate formats / languages (of primarily formal and semi-formal, and to some extent informal, kind) for those different areas and for applications regarding different purposes. Due to this we are interested in translations among representations that provably preserve the semantics of the ontology.
Motivated by the ultimate goal of meaning-preserving translations, we are working towards formally defining the semantics of languages on the basis of
1 http://www.onto-med.de/ontologies/gfo
This basic distinction has already been defended in some detail by others, e.g. [8, sect. 2.1], and ourselves [11]. One supportive argument is that encodings of conceptual notions by mathematical constructs occur and lead to problems where mathematical properties no longer correspond to conceptual interrelations. Further, iterated translations among formalisms are problematic if encodings produce anomalies that aggregate over repeated translations. Nevertheless, the distinction between formal and ontological semantics is worth being stressed. At least, this is our observation regarding many proponents with a formal / logical background. For instance, major, seemingly accepted definitions of notions like ontology-based semantics and equivalence [2,13] and of semantics-preserving translations (like those between UML and OWL in the Ontology Definition Metamodel) purely rest on the formal semantics of the languages under study. For conceptual modeling and ontology research related with conceptual modeling, the awareness of the difference may be greater, cf. [8, sect. 2.1].
It should be stressed again that multiple OUS are perfectly reasonable for any single language L. Moreover, L′ and the base ontology are parameters in Def. 1, allowing for different languages and base ontologies. For our purposes, we have equipped FOL syntax with a direct ontological semantics [11, sect. 3.2] and thus use FOL (as L′) in conjunction with the methodology mentioned in sect. 2.3 for ontology representation, including the ontology of categories and relations (outlined in [11, sect. 4]). The schematic aspect of ontological usage schemes becomes manifest in generic definitions for certain sets of identifiers, like NC, NR, and NI in Table 1. This is similar to usual definitions of formal semantics and primarily useful for the ontological foundation of language elements.
It is important, on the one hand, that the translations must be specified such that they formally capture the intended meaning of representations in L. On the other hand, they can be defined with great freedom concerning the (semi-)formal semantics of L. This is contrary to existing formal accounts of ontology-based semantics, like [2]. However, due to the encoding of conceptual notions into L's syntax and, thereby, semantics (see sect. 2.1), the translation of
logics (DLs) [1].2 A detailed explanation of it does not fit here, but for readers aware of DL the scheme is designed to strongly correspond to an ontological reading of the standard translation between DL and FOL [1, sect. 4.2], i.e., DL concepts are understood as categories of (ontological) individuals, DL roles as binary relations among (ontological) individuals. In terms of Def. 1, this proximity between L and L′ is no prerequisite. For more intuitions on the ontology of categories and relations CR, the reader is referred to [11,10]. To complete the
illustration with a very simple translation of a DL representation, the ontological image of the DL statement lion ⊑ animal is the deductive closure of the theory CR ∪ {lion :: Cat, lion ⊑ Ind, animal :: Cat, animal ⊑ Ind, ∀x. x :: lion → x :: animal}.
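For readers who want to see what the standard translation underlying this scheme looks like, the following sketch implements it for a tiny ALC fragment (the tuple-based concept encoding and the output syntax are our own simplification of [1, sect. 4.2]):

    # Standard DL-to-FOL translation for a tiny ALC fragment. Concepts are encoded
    # as nested tuples: ("atom", A), ("and", C, D), ("exists", R, C).
    def st(concept, var="x", depth=0):
        kind = concept[0]
        if kind == "atom":
            return f"{concept[1]}({var})"
        if kind == "and":
            return f"({st(concept[1], var, depth)} & {st(concept[2], var, depth)})"
        if kind == "exists":
            fresh = f"y{depth}"
            return (f"exists {fresh}.({concept[1]}({var},{fresh})"
                    f" & {st(concept[2], fresh, depth + 1)})")
        raise ValueError(f"unknown constructor: {kind}")

    def subsumption(c, d):
        """Translate the DL axiom C ⊑ D into a universally quantified implication."""
        return f"forall x.({st(c)} -> {st(d)})"

    print(subsumption(("atom", "lion"), ("atom", "animal")))
    # forall x.(lion(x) -> animal(x))
    print(subsumption(("atom", "lion"), ("exists", "eats", ("atom", "animal"))))
    # forall x.(lion(x) -> exists y0.(eats(x,y0) & animal(y0)))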
4 Discussion
4.1 Applications and Benefits
Ontological semantics in general and ontological usage schemes (OUSes) in particular allow for at least the following applications. Firstly, the formal-logical, ontological theory resulting from applying an OUS to a particular representation R allows for reasoning over the conceptual contents of R. Such theories also form the basis of notions of conceptual equivalence. Notably, logical equivalence within those theories is a first candidate, cf. [2], but we are convinced that further refinements are required. Secondly, with notion(s) of conceptual equivalence, (seemingly) non-standard translations become justifiable. In particular, these play a role when translating from less expressive to more expressive languages. Expressiveness here refers to the distinctions available in the abstract syntax. For instance, consider the case of DLs and UML. (Almost all) DLs do not provide immediate means to express relationships with an arity greater than two, which leads to encoding proposals like [12]. Accordingly, catch-all language-level3 translations like the ones from UML to OWL and vice versa in the Ontology Definition Metamodel must fail in a number of cases, and circular translation chains lead to non-equivalent representations in the same language.
Thirdly, based on an OUS one may derive constraints / rules for using a language in accordance with that OUS. For instance, the disjointness of relations and purely non-relational categories suggests that a DL concept encoding an n-ary relation should be declared disjoint with every DL concept capturing a non-relational category. Fourthly, OUSes should be useful for re-engineering purposes, because frequently the ontological semantics of a representation is augmented / elaborated in greater detail. Consider the example of modeling purchase (processes), initially encoded as a binary relation / association between a buyer and a commodity (in DL and UML), then expanded to include the time of the purchase (requiring a concept in DL and, e.g., an association class in UML). If the connection to the notion of purchase is explicated in both OUS (for DL and UML), the relationships between the two versions should be at least easier to grasp in terms of the OUS and might support (semi-)automatic data migration.
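The purchase example can be made concrete as a small schema-evolution sketch (our own encoding; the class and field names are invented): the binary purchase relation is re-engineered into a reified purchase entity once the time of the purchase has to be represented, and the shared ontological notion of purchase guides the data migration:

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    # Version 1: purchase as a plain binary relation between buyer and commodity.
    purchases_v1 = [("alice", "bicycle")]

    # Version 2: purchase reified as an entity of its own (a DL concept or a UML
    # association class), so that further arguments such as the time can be attached.
    @dataclass
    class Purchase:
        buyer: str
        commodity: str
        time: Optional[date] = None  # unknown for migrated legacy facts

    purchases_v2 = [Purchase("alice", "bicycle", date(2011, 10, 31))]

    # Migration of the old binary facts into the reified representation.
    migrated = [Purchase(buyer, commodity) for (buyer, commodity) in purchases_v1]
    print(purchases_v2 + migrated)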
2 ALC is derived from the phrase "attribute logic with complement".
3 As opposed to usage-level.
The overall aim herein is to unite the formal and conceptual viewpoints on language semantics beneath a common roof, but without reducing one to the other. We advocate the general theses that specific uses of languages should be accompanied by explicating the ontological semantics of the (usage of) language constructs, in addition to (possibly) having a formal or semi-formal semantics assigned to the language already. This presumes that ontological and formal semantics are different. To the best of our knowledge, these positions are not well accepted in formal language communities (if discussed at all), whereas they seem to be acknowledged at least by some researchers in conceptual modeling. After arguing for these views, the approach of ontological usage schemes (OUSes) is outlined and briefly illustrated. It is based on formalization in logical languages and research in top-level ontology, and thus far contributes a translation approach to precisely define ontological semantics for languages.
4 Only recently, a translation of a UML representation of Bunge's ontology into OWL was presented by J. Evermann [4].
5 Notably, [6, p. 32] restricts this to only when using UML for conceptual modeling of a domain, as distinguished from software modeling.
Future work will comprise the definition of further OUSes for languages that are relevant to our ontology development projects. In combination with appropriate notions of (conceptual) equivalence, which we are currently developing, these OUSes should allow us to offer provably conceptually equivalent versions of contents represented in multiple languages. Regarding the rather large collection of works on ontological semantics for UML / in conceptual modeling, it appears worthwhile to elaborate mutual relationships in greater detail. Remembering the distinction between conceptual models and software models in [6], it appears interesting to study the relation and information flow between conceptual and software models by means of OUSes (or an extension of these). Not least, this might become one kind of evaluation in practice.
References
1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.): The Description Logic Handbook. Cambridge University Press, Cambridge (2003)
2. Ciocoiu, M., Nau, D.S.: Ontology-based semantics. In: Cohn, A.G., Giunchiglia, F., Selman, B. (eds.) Proc. of KR 2000, pp. 539–546. Morgan Kaufmann Publishers, San Francisco (2000)
3. Diaz, M. (ed.): Petri Nets: Fundamental Models, Verification and Applications. ISTE, London (2009)
4. Evermann, J.: A UML and OWL description of Bunge's upper-level ontology model. Software Syst. Model 8(2), 235–249 (2009)
5. Evermann, J., Wand, Y.: Ontology based object-oriented domain modelling: Fundamental concepts. Requir. Eng. 10(2), 146–160 (2005)
6. Evermann, J., Wand, Y.: Toward formalizing domain modeling semantics in language syntax. IEEE T. Software Eng. 31(1), 21–37 (2005)
7. Evermann, J.M.: Using Design Languages for Conceptual Modeling: The UML Case. PhD Thesis, University of British Columbia, Vancouver, Canada (2003)
8. Guizzardi, G.: Ontological Foundations for Structural Conceptual Models. CTIT PhD Series No. 05-74, Telematica Instituut, Enschede, The Netherlands (2005)
9. Herre, H.: General Formal Ontology (GFO): A foundational ontology for conceptual modelling. In: Poli, R., Healy, M., Kameas, A. (eds.) Theory and Applications of Ontology: Computer Applications, ch. 14, pp. 297–345. Springer, Berlin (2010)
10. Loebe, F.: Abstract vs. social roles: Towards a general theoretical account of roles. Appl. Ontology 2(2), 127–158 (2007)
11. Loebe, F., Herre, H.: Formal semantics and ontologies: Towards an ontological account of formal semantics. In: Eschenbach, C., Grüninger, M. (eds.) Proc. of FOIS 2008, pp. 49–62. IOS Press, Amsterdam (2008)
12. Noy, N., Rector, A.: Defining N-ary relations on the Semantic Web. W3C Working Group Note, World Wide Web Consortium (W3C) (2006), http://www.w3.org/TR/swbp-n-aryRelations/
13. Schorlemmer, M., Kalfoglou, Y.: Institutionalising ontology-based semantic integration. Appl. Ontology 3(3), 131–150 (2008)
14. W3C: OWL 2 Web Ontology Language Document Overview. W3C Recommendation, World Wide Web Consortium (W3C) (2009)
15. Wand, Y., Storey, V.C., Weber, R.: An ontological analysis of the relationship construct in conceptual modeling. ACM T. Database Syst. 24(4), 494–528 (1999)
Gene Ontology Based Automated Annotation:
Why It Isn't Working
Centro de Investigación en Métodos de Producción de Software (PROS), Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
1 Introduction
The sequencing of the Human Genome in the year 2000 by Craig Venter and Francis Collins [1] came with tremendous promises. Its effects are most probably not yet fully apparent, and science still struggles to turn this huge accomplishment into knowledge artifacts. Scientists now broadly agree that reading the sequence of DNA was the relatively easy part of genome analysis; figuring out what the sequence actually means is the real challenge. Following these insights, a new scientific line of research was opened as the marriage of informatics and biology: bioinformatics. Here bioinformaticians try to combine rigorous yet computationally powerful informatics with ambiguous and fuzzy biology. As Venter's and Collins' efforts start to bear fruit and technology rapidly advances, more and more sequencing experiments are being performed world-wide, generating large amounts of data and leading to the following question: how do we manage the data chaos?
Current solutions are often based on ontologies, most notably the Gene Ontology (GO). Literally translated from ancient Greek, ontos means "of that which is" and -logia "science, study". The science of Ontology (uppercase O) is diverse and dates back to the early Greeks, where it referred to the analytic philosophy of determining what categories of being are fundamental and in what sense items in those categories can be said to be. In modern times, an ontology (lowercase o) is considered many things. Gruber is credited for introducing
Adding to this, it allows us to reason over these structured knowledge bases with the purpose of deducing new knowledge. In this short description we identify two different applications: knowledge management and knowledge deduction. An ontology can have a varying level of rigor; a lower-level, and as such more ambiguous, ontology will be fine for delivering the promise of maintaining knowledge bases, but unable to allow for the automated reasoning necessary to deduce new knowledge. A higher level of rigor will allow for both accurate knowledge management and deduction of new knowledge, at the cost of increased complexity.
When the Gene Ontology was conceived in 2000 [8], it came with the promise of enabling a conceptual unification of biology by providing a dynamic, controlled vocabulary. Adding to this, it was hoped that the common vocabulary would result in the ability to query and retrieve genes and proteins based on their shared biology, thus deducing new knowledge. Later research has shown that the Gene Ontology in reality often lacks the rigor to allow for this high level of ontology application. The early discussion was started by Smith [9] and Kumar [10], stating that "It is unclear what kinds of reasoning are permissible on the basis of GO's hierarchies" and "No procedures are offered by which GO can be validated". We now proceed to discuss the main points of their work.
Occurrents, on the other hand, are never said to exist in full for a single instant of time. Rather, they unfold during successive phases, as for example a viral infection unfolds itself over time. (Biological) processes are usually characterized by passing through different states: where the nucleus is part of the cell, mitosis is a part of the cellular process.
The continuant/occurrent opposition corresponds in the first place to the distinction between substances (objects, things) and processes. GO's cellular component ontology is in our terms an ontology of substance universals; its molecular function and biological process ontologies are ontologies of function and process universals. But functions, too, are from the perspective of philosophical ontology continuants. For if an object has a given function, which means a token function for a given interval of time, then this token function is present in full at every instant in this interval. It does not unfold itself in phases in the manner of an occurrent. If, however, the token function gets exercised, then the token process that results does indeed unfold itself in this manner. Each function thus gives rise, when it is exercised, to processes or activities of characteristic types.
4 Conclusions
Our goal in this work is to start a discussion on the application of the Gene Ontology to gene expression annotation, uncovering its flaws and proposing an alternative route. Capturing biological understanding is a huge challenge we face, and the question is not whether we want to do it, but how we do it. Specifications of conceptualizations are everywhere around us, and thus, according to Gruber, all of these correspond to the concept of an ontology. It is when we try to structure knowledge into a computationally sound structure that not everything can qualify as an ontology anymore. Rigorous methods are needed for a semantically sound ontology to emerge. The practice of conceptual modeling has long been around in Information System development and has proven very suitable for capturing domain knowledge in a controlled manner, both syntactically and semantically sound, ultimately even leading to the automated generation and validation of computer software. If we take the liberty of considering the biological system of life as analogous to an Information System, it is hard to miss the grand opportunities such a perspective provides.
References
1. Venter, J., et al.: The Sequence of the Human Genome. Science 291(5507), 1304–1351 (2000)
2. Gruber, T.: Principle for the Design of Ontologies Used for Knowledge Sharing. In: Poli, R. (ed.) Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic Publishers, Dordrecht (1993)
3. González-Díaz, H., Muíño, L., Anadón, A., et al.: MISS-Prot: web server for self/non-self discrimination of protein residue networks in parasites; theory and experiments in Fasciola peptides and Anisakis allergens. Molecular Biosystems (2011) [Epub ahead of print]
4. Hsu, C., Chen, C., Liu, B.: WildSpan: mining structured motifs from protein sequences. Algorithms in Molecular Biology 6(1), 6 (2011)
5. Tirrell, R., Evani, U., Berman, A., et al.: An ontology-neutral framework for enrichment analysis. In: American Medical Informatics Association Annual Symposium, vol. 1(1), pp. 797–801 (2010)
6. Jung, J., Yi, G., Sukno, S., et al.: PoGO: Prediction of Gene Ontology terms for fungal proteins. BMC Bioinformatics 11(215) (2010)
7. Khatri, P., Draghici, S.: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21(18), 3587–3595 (2005)
8. Ashburner, M., Ball, C.A., Blake, J.A.: Gene Ontology: tool for the unification of biology. Nature Genetics 25(1), 25–30 (2000)
9. Smith, B., Williams, J., Schulze-Kremer, S.: The Ontology of the Gene Ontology. In: American Medical Informatics Association Annual Symposium Proceedings, vol. 1(1), pp. 609–613 (2003)
10. Kumar, A., Smith, B.: Controlled vocabularies in bioinformatics: a case study in the Gene Ontology. Drug Discovery Today: BIOSILICO 2(6), 246–252 (2004)
11. Egaña Aranguren, M., Bechhofer, S., Lord, P., et al.: Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL. BMC Bioinformatics 8(57) (2007)
12. Pastor, O., Molina, J.C.: Model-Driven Architecture in Practice: A Software Production Environment Based on Conceptual Modeling. Springer, Heidelberg (2007)
13. Collins, F.S.: The Language of Life: DNA and the Revolution in Personalized Medicine. Profile Books Ltd. (2010)
14. Paton, N.W., Khan, S.A., Hayes, A., et al.: Conceptual modeling of genomic information. Bioinformatics 16(6), 548–557 (2000)
15. Pastor, O., Levin, A.M., Celma, M., et al.: Model Driven-Based Engineering Applied to the Interpretation of the Human Genome. In: Kaschek, R., Delcambre, L. (eds.) The Evolution of Conceptual Modeling. Springer, Heidelberg (2010)
16. Pastor, O., van der Kroon, M., Levin, A.M., et al.: A Conceptual Modeling Approach to Improve Human Genome Understanding. In: Embley, D.W., Thalheim, B. (eds.) Handbook of Conceptual Modeling. Springer, Heidelberg (2011)
17. Warmer, J., Kleppe, A.: Object Constraint Language: Getting Your Models Ready for MDA, 2nd edn. Addison-Wesley Longman Publishing Co., Boston (2011)
Formal Ontologies, Exemplars, Prototypes
1 Introduction
Computational representation of concepts is a central problem for the development of ontologies and for knowledge engineering¹. Concept representation is a multidisciplinary topic of research that involves such different disciplines as Artificial Intelligence, Philosophy, Cognitive Psychology and, more generally, Cognitive Science. However, the notion of concept itself turns out to be highly disputed and problematic. In our opinion, one of the causes of this state of affairs is that the very notion of concept is in some sense heterogeneous and encompasses different cognitive phenomena. This results in a strain between conflicting requirements, such as, for example, compositionality on the one side and the need to represent prototypical
¹ It could be objected that ontologies have to do with the representation of the world, and not with the representation of our concepts. This is surely true, but, insofar as we are (also) interested in our commonsense ontologies (i.e., in the representation of the world from the standpoint of our everyday experience, as contrasted, for example, with a scientific representation of the world), then, in our opinion, we cannot ignore the problem of how ordinary concepts are structured.
information on the other. This has several consequences for the practice of knowledge engineering and for the technology of formal ontologies.
In this paper we propose an analysis of this situation. The paper is organised as follows. In Section 2 we point out some differences between the way concepts are conceived in philosophy and in psychology. In Section 3 we argue that AI research in some way shows traces of the contradictions identified in Section 2. In particular, the requirement of a compositional, logical-style semantics conflicts with the need to represent concepts in terms of typical traits that allow for exceptions. In Section 4 we review some attempts to resolve this conflict in the field of knowledge representation, with particular attention to description logics. It is our opinion that a mature methodology for knowledge representation and knowledge engineering should take advantage both of the empirical results of cognitive psychology that concern human abilities and of philosophical analyses. In this spirit, in Section 5 we identify some possible suggestions coming from different aspects of cognitive research: the distinction between two different types of reasoning processes, developed within the context of the so-called dual process accounts of reasoning; the proposal to keep prototypical effects separate from the compositional representation of concepts; and the possibility of developing hybrid, prototype- and exemplar-based representations of concepts.
incompatible with prototypical effects. But such approaches pose various theoretical and practical difficulties, and many unsolved problems remain.
In this section we overview some recent proposals for extending concept-oriented KRs, and in particular DLs, in order to represent non-classical concepts.
Recently, different methods and techniques have been adopted to represent non-classical concepts within computational ontologies. They are based on extensions of DLs and of standard ontology languages such as OWL. The different proposals that have been advanced can be grouped into three main classes: a) fuzzy approaches, b) probabilistic and Bayesian approaches, and c) approaches based on non-monotonic formalisms.
a) Regarding the integration of fuzzy logics into DLs and ontology-oriented formalisms, see for example [10-11]. Stoilos et al. [12] propose a fuzzy extension of OWL, f-OWL, able to capture imprecise and vague knowledge, and a fuzzy reasoning engine that lets f-OWL reason about such knowledge. In [13] a fuzzy extension of OWL 2 is proposed for representing vague information in semantic web languages. However, it is well known [14] that approaches to prototypical effects based on fuzzy logic encounter some difficulty with compositionality.
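The compositionality difficulty can be made concrete with the classic "pet fish" case discussed in [14]. The sketch below is a toy Python illustration with made-up membership degrees; it is not tied to any particular fuzzy DL or to the systems cited above.

# Fuzzy membership degrees (illustrative values only).
pet = {"guppy": 0.3, "dog": 0.9, "shark": 0.05}
fish = {"guppy": 0.9, "dog": 0.0, "shark": 1.0}

def fuzzy_and(mu_a, mu_b):
    """Goedel t-norm: membership degree in the intersection A AND B."""
    return min(mu_a, mu_b)

# Membership of 'guppy' in the composed concept PET-AND-FISH.
print(fuzzy_and(pet["guppy"], fish["guppy"]))  # 0.3
# Intuitively a guppy is a highly typical pet fish, yet the composed degree
# cannot exceed its (low) degree of being a pet -- the problem with
# compositionality pointed out in [14].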
b) The literature also offers several probabilistic generalizations of web ontology languages. Many of these approaches, as pointed out in [15], focus on combining the OWL language with probabilistic formalisms based on Bayesian networks. In particular, in [16] a probabilistic generalization of OWL is proposed, called Bayes-OWL, which is based on standard Bayesian networks. Bayes-OWL provides a set of rules and procedures for the direct translation of an OWL ontology into a Bayesian network. A problem here could be the translation from one form of semantics (OWL-based) to another.
c) In the field of non-monotonic extensions of DLs, in [17] an extension of the ALCF system based on Reiter's default logic is proposed. The same authors, however, point out both the semantic and computational difficulties of this integration and, for this reason, propose a restricted semantics for open default theories, in which default rules are only applied to individuals explicitly represented in the knowledge base. In [18] an extension of DLs with circumscription is proposed. One of the motivating applications of circumscription is indeed to express prototypical properties with exceptions, and this is done by introducing abnormality predicates, whose extension is minimized. A different approach, investigated in [19], is based on the use of OWL 2 annotation properties (APs) in order to represent vague or prototypical information. The limit of this approach is that APs are not taken into account by the reasoner and therefore have no effect on the inferential behaviour of the system [13].
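The role of abnormality predicates can be illustrated with a deliberately simplified sketch (plain Python with toy individuals; it is not an actual default-logic or circumscriptive DL reasoner): a default rule is applied to every member of a concept that is not known to be abnormal, in the spirit of the approaches of [17-18].

# Known facts in a toy knowledge base (hypothetical individuals).
birds = {"tweety", "opus"}
abnormal = {"opus"}          # e.g. opus is known to be a penguin

def flies(x, birds, abnormal):
    """Default rule 'birds fly', applied only to individuals not known
    to be abnormal -- a crude rendering of minimising the abnormality
    predicate."""
    return x in birds and x not in abnormal

print(flies("tweety", birds, abnormal))  # True  (the default applies)
print(flies("opus", birds, abnormal))    # False (the exception blocks it)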
monotonic machine reasoning for the Semantic Web) can perhaps be adopted for local uses only or for specific applications, because it is unsafe on the web. The question about which logics must be used in the Semantic Web (or, at least, to which degree, and in which cases, certain logics could be useful) is still open.
The empirical results from cognitive psychology show that most common-sense concepts cannot be characterised in terms of necessary/sufficient conditions. Classical, monotonic DLs seem to capture the compositional aspects of conceptual knowledge, but are inadequate for representing prototypical knowledge. A non-classical alternative, a general DL able to represent concepts in prototypical terms, has not yet emerged.
As a possible way out, we sketch a tentative proposal that is based on some suggestions coming from cognitive science. Some recent trends in psychological research favour the hypothesis that reasoning is not a unitary cognitive phenomenon. At the same time, empirical data on concepts seem to suggest that prototypical effects could stem from different representation mechanisms. In this spirit, we identify some hints that, in our opinion, could be useful for the development of artificial representation systems, namely: (i) the distinction between two different types of reasoning processes, which has been developed within the context of the so-called dual process accounts of reasoning (Sect. 5.1 below); (ii) the proposal to keep prototypical effects separate from the compositional representation of concepts (Sect. 5.2); and (iii) the possibility of developing hybrid, prototype- and exemplar-based representations of concepts (Sect. 5.3).
Cognitive research about concepts seems to suggest that concept representation does not constitute a unitary phenomenon from the cognitive point of view. In this perspective, a possible solution should be inspired by the experimental results of empirical psychology, in particular by the so-called dual process theories of reasoning and rationality [21-22]. In such theories, the existence of two different types of cognitive systems is assumed. The systems of the first type (type 1) are phylogenetically older, unconscious, automatic, associative, parallel and fast. The systems of type 2 are more recent, conscious, sequential and slow, and are based on explicit rule following. In our opinion, there are good prima facie reasons to believe that, in human subjects, classification, a monotonic form of reasoning which is defined on semantic networks² and which is typical of DL systems, is a type 2 task (it is a difficult, slow, sequential task). On the contrary, exceptions play an important role in processes such as non-monotonic categorisation and inheritance, which are more likely to be type 1 tasks: they are fast, automatic, usually do not require particular conscious effort, and so on.
Therefore, a reasonable hypothesis is that a concept representation system should include different modules: a monotonic module of type 2, involved in classification
² Classification is a (deductive) reasoning process in which superclass/subclass (i.e., ISA) relations are inferred from implicit information encoded in a KB. Categorization is an inferential process through which a specific entity is assigned as an instance to a certain class. In non-monotonic categorization, class assignment is a non-deductive inferential process based on typicality.
³ Various attempts to reconcile compositionality and typicality effects have been proposed within the field of psychology ([23-25]). However, when psychologists face the problem of compositionality, they usually take into account a more restricted phenomenon than philosophers do. They try to explain how concepts can be combined in order to form complex conceptual representations. But compositionality is a more general matter: what is needed is an account of how the meaning of any complex representation (including propositional representations) depends in a systematic way on the meaning of its components and on its syntactic structure. This should allow one to account, among other things, for the inferential relationships (typically, of logical consequence) that exist between propositional representations. From this point of view, psychological proposals are much more limited.
We are in no way committed to such an account of semantic content. (In any case, the philosophical problem of the nature of the intentional content of representations is largely irrelevant to our present purposes.)
ii. Fodor claims that concepts are compositional, and that prototypical representations, not being compositional, cannot be concepts. We do not take a position on which part of the system we propose must be considered truly conceptual. Rather, in our opinion the notion of concept is spurious from the cognitive point of view. Both the compositional and the prototypical components contribute to the conceptual behaviour of the system (i.e., they have some role in those abilities that we usually describe in terms of possession of concepts).
iii. According to Fodor, the majority of concepts are atomic. In particular, he claims that almost all concepts that correspond to lexical entries have no structure. We maintain that many lexical concepts, even though not definable in the terms of the classical theory, should exhibit some form of structure, and that such structure can be represented, for example, by means of a DL taxonomy.
Within the field of psychology, different positions and theories on the nature of concepts are available. Usually, they are grouped into three main classes, namely prototype views, exemplar views and theory-theories (see e.g. [23, 28]). All of them are assumed to account for (some aspects of) prototypical effects in conceptualisation, effects which were first identified by Eleanor Rosch in her seminal works [29].
According to the prototype view, knowledge about categories is stored in terms of prototypes, i.e. in terms of some representation of the best instances of the category. For example, the concept CAT should coincide with a representation of a prototypical cat. In the simpler versions of this approach, prototypes are represented as (possibly weighted) lists of features.
According to the exemplar view, a given category is mentally represented as a set of specific exemplars explicitly stored in memory: the mental representation of the concept CAT is the set of the representations of (some of) the cats we encountered during our lifetime.
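The two views can be contrasted with a small computational sketch. The feature vectors and the exponential similarity measure below are illustrative assumptions, loosely in the spirit of exemplar models rather than a faithful implementation of any published model: a prototype score compares a new item with the averaged representation of a category, while an exemplar score sums its similarity to every stored instance.

import math

def similarity(a, b):
    """Exponential similarity over binary feature vectors."""
    distance = sum(x != y for x, y in zip(a, b))
    return math.exp(-distance)

# Features: (has_fur, meows, barks) -- purely illustrative.
cat_exemplars = [(1, 1, 0), (1, 1, 0), (1, 0, 0)]
dog_exemplars = [(1, 0, 1), (1, 0, 1)]

def prototype(exemplars):
    """Prototype as the (rounded) feature-wise average of the exemplars."""
    return tuple(round(sum(f) / len(exemplars)) for f in zip(*exemplars))

def prototype_score(item, exemplars):
    return similarity(item, prototype(exemplars))

def exemplar_score(item, exemplars):
    return sum(similarity(item, e) for e in exemplars)

new_item = (1, 1, 0)
print("prototype:", prototype_score(new_item, cat_exemplars),
      prototype_score(new_item, dog_exemplars))
print("exemplar: ", exemplar_score(new_item, cat_exemplars),
      exemplar_score(new_item, dog_exemplars))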
Theory-theory approaches adopt some form of holistic point of view about concepts. According to some versions of the theory-theories, concepts are analogous to theoretical terms in a scientific theory. For example, the concept CAT is individuated by the role it plays in our mental theory of zoology. In other versions of the approach, concepts themselves are identified with micro-theories of some sort. For example, the concept CAT should be identified with a mentally represented micro-theory about cats.
These approaches turn out not to be mutually exclusive. Rather, they seem to succeed in explaining different classes of cognitive phenomena, and many researchers hold that all of them are needed to explain psychological data. In this perspective, we propose to integrate some of them in computational representations of concepts. More precisely, we try to combine a prototypical and an exemplar-based representation in order to account for category representation and prototypical effects (for a similar, hybrid prototypical and exemplar-based proposal, see [30]). We do not take into consideration the theory-theory approach, since it is in some sense more vaguely
References
1. Dell'Anna, A., Frixione, M.: On the advantage (if any) and disadvantage of the conceptual/nonconceptual distinction for cognitive science. Minds & Machines 20, 29–45 (2010)
2. Frixione, M., Lieto, A.: The computational representation of concepts in formal ontologies: Some general considerations. In: Proc. KEOD 2010, Int. Conf. on Knowledge Engineering and Ontology Development, Valencia, Spain, October 25-28 (2010)
3. Frixione, M., Lieto, A.: Representing concepts in artificial systems: a clash of requirements. In: Proc. HCP 2011, pp. 75–82 (2011)
4. Fodor, J.: The present status of the innateness controversy. In: Fodor, J., Representations (1981)
5. Minsky, M.: A framework for representing knowledge. In: Winston, P. (ed.) The Psychology of Computer Vision (1975); also in Brachman & Levesque (2005)
6. Brachman, R., Schmolze, J.G.: An overview of the KL-ONE knowledge representation system. Cognitive Science 9, 171–216 (1985)
7. Brachman, R., Levesque, H. (eds.): Readings in Knowledge Representation. Morgan Kaufmann, Los Altos (1985)
8. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.: The Description Logic Handbook: Theory, Implementations and Applications. Cambridge University Press, Cambridge (2003)
9. Brachman, R.: I lied about the trees. The AI Magazine 3(6), 80–95 (1985)
10. Gao, M., Liu, C.: Extending OWL by fuzzy Description Logic. In: Proc. 17th IEEE Int. Conf. on Tools with Artificial Intelligence (ICTAI 2005), pp. 562–567. IEEE Computer Society, Los Alamitos (2005)
11. Calegari, S., Ciucci, D.: Fuzzy Ontology, Fuzzy Description Logics and Fuzzy-OWL. In: Masulli, F., Mitra, S., Pasi, G. (eds.) WILF 2007. LNCS (LNAI), vol. 4578, pp. 118–126. Springer, Heidelberg (2007)
12. Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J.Z., Horrocks, I.: Fuzzy OWL: Uncertainty and the Semantic Web. In: Proc. Workshop on OWL: Experience and Directions (OWLED 2005). CEUR Workshop Proceedings, vol. 188 (2005)
13. Bobillo, F., Straccia, U.: An OWL Ontology for Fuzzy OWL 2. In: Rauch, J., Raś, Z.W., Berka, P., Elomaa, T. (eds.) ISMIS 2009. LNCS, vol. 5722, pp. 151–160. Springer, Heidelberg (2009)
14. Osherson, D.N., Smith, E.E.: On the adequacy of prototype theory as a theory of concepts. Cognition 11, 237–262 (1981)
15. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the Semantic Web. Journal of Web Semantics 6, 291–308 (2008)
16. Ding, Z., Peng, Y., Pan, R.: BayesOWL: Uncertainty modeling in Semantic Web ontologies. In: Ma, Z. (ed.) Soft Computing in Ontologies and Semantic Web. Studies in Fuzziness and Soft Computing, vol. 204. Springer, Heidelberg (2006)
17. Baader, F., Hollunder, B.: Embedding defaults into terminological knowledge representation formalisms. J. Autom. Reasoning 14(1), 149–180 (1995)
18. Bonatti, P.A., Lutz, C., Wolter, F.: Description logics with circumscription. In: Proc. of KR 2006, pp. 400–410 (2006)
19. Klinov, P., Parsia, B.: Optimization and evaluation of reasoning in probabilistic description logic: Towards a systematic approach. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 213–228. Springer, Heidelberg (2008)
20. Hayes, P.: Dialogue on rdf-logic. Why must the web be monotonic? (W3C) (2001), http://lists.w3.org/Archives/public/www-rdf-logic/2001Jul/0067.html
21. Stanovich, K.E., West, R.: Individual Differences in Reasoning: Implications for the Rationality Debate? The Behavioural and Brain Sciences 23(5), 645–665 (2000)
22. Evans, J.S.B.T., Frankish, K. (eds.): In Two Minds: Dual Processes and Beyond. Oxford UP, New York (2008)
23. Murphy, G.L.: The Big Book of Concepts. The MIT Press, Cambridge (2002)
24. Margolis, E., Laurence, S. (eds.): Concepts: Core Readings. The MIT Press, Cambridge (1999)
25. Laurence, S., Margolis, E.: Review. Concepts: where cognitive science went wrong. British Journal for the Philosophy of Science 50(3), 487–491 (1999)
26. Fodor, J.: Psychosemantics. The MIT Press/A Bradford Book, Cambridge, MA (1987)
27. Fodor, J.: Concepts: Where Cognitive Science Went Wrong. Oxford University Press, Oxford (1998)
28. Machery, E.: Doing without Concepts. Oxford University Press, Oxford (2009)
29. Rosch, E.: Principles of categorization. In: Rosch, E., Lloyd, B. (eds.) Cognition and Categorization, pp. 27–48. Lawrence Erlbaum, Hillsdale (1978)
30. Gagliardi, F.: A Prototype-Exemplars Hybrid Cognitive Model of Phenomenon of Typicality in Categorization: A Case Study in Biological Classification. In: Proc. 30th Annual Conf. of the Cognitive Science Society, Austin, TX, pp. 1176–1181 (2008)
31. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd edn. Kaufmann, San Francisco (2005)
32. Gagliardi, F.: The Need of an Interdisciplinary Approach based on Computational Modelling in the Study of Categorization. In: Proc. of ICCM 2009, pp. 492–493 (2009)
Unintended Consequences of Class-Based Ontological
Commitment
1 Introduction
Specific domain ontologies can offer a number of advantages in integrating and managing data sources. Ontology is a philosophical study aimed at describing what exists in a systematic way [1]. In a particular area of concern, a domain-specific ontology prescribes possible constructs, rules and relationships. Thus, many consider domain-specific ontologies as surrogates for the semantics of a domain [2]. Recently, increased attention to domain ontologies has been fueled by the need to manage a growing body of heterogeneous data sets, especially on the web [2-4]. Yet, what appears to be a clear advantage of domain-specific ontologies (the explicit representation of domain semantics) may in fact impede domain understanding and distort the originally intended reality. This paper examines unintended consequences of class-based ontological commitment and advocates instead an instance-and-property ontological foundation that avoids the negative effects of class-based ontologies and supports semantic interoperability in a broader sense.
Initially part of philosophy, ontological studies have been embraced by information systems and computer science research based on the pragmatic goal of improving communication between humans and machines. This focus suggests a potential application of ontologies to the Semantic Web, which aims to move beyond syntactic matches and support semantic interoperability [3, 5]. To achieve semantic interoperability, users and machines need to agree on common constructs and rules with
which to understand and reason about domains. In the search for this common blueprint
of knowledge, domain-specific ontologies appear to offer an advantage: they contain a
structured snapshot of reality usually carefully engineered by domain experts. Thus,
ontological commitment implies an agreement between all data producers and users to
share common understanding of some domain of interest by adopting a formal set of
constructs, relationships and rules [6-7]. For example, when working within the natural
history domain, one may use the ETHAN (Evolutionary Trees and Natural History)
ontology [8-9]. ETHAN draws upon a combination of Linnaean taxonomy and
phylogenetic information to construct a set of classes and properties that form a
hierarchical tree of life. This ontology can be used to discover, integrate, store and
analyze ecological information for a variety of purposes [8].
While ontologies have the potential to facilitate semantically-enhanced information
transfer, we argue that the prevailing practice of imposing ontological structure on a
domain can impede domain understanding and result in information loss. In
particular, many ontologies have adopted an RDF-compliant (Resource Description Framework) class structure [see 4], in which individual information resource instances are recorded as members of a priori defined classes. Classes (which are similar to categories, entity sets, or kinds) are typically modelled as sets of attributes/properties that ideally support inferences beyond the properties required to establish class membership [see 10]. Thus, class-based ontological commitment is a
requirement to express an ontology in terms of classes of phenomena and the
relationships among them. We further argue that class-based ontological commitment
causes loss of potentially valuable properties of the phenomena, thereby impeding
domain understanding and semantic interoperability. In contrast, we suggest an
instance-based ontological model that does not require classifying individual
resources. Instead, each autonomous unit of information (instance) can be described
in terms of attributes and the resulting attribute sets can be used to create classes of
interest based on ad-hoc utility. This approach is based on the instance-based data
model proposed by Parsons and Wand in the context of database design [see 11].
The remainder of the paper is organized as follows. The next section provides a review of relevant ontological literature. Then we discuss the theoretical foundation of the new approach and offer a case study to illustrate specific deficiencies of class-based ontologies. The paper concludes with a discussion of the instance-based ontological foundation and an outlook on the future.
2 Ontological Research in IS
The premise that ontologies can streamline knowledge engineering and improve
information transfer is being actively explored in the information systems and computer
science literature [12]. Much of the research focuses on domain-specific ontologies,
which typically act as content theories and describe classes, properties and relations for
a particular domain [see 6, 13]. In that sense, "[o]ntologies are often equated with taxonomic hierarchies of classes" [14, p. 172]. Many class-based libraries of domain-specific ontologies have been created on the Internet (e.g. the Protégé Ontology Library, http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library; DAML, http://www.daml.org/ontologies/). The relative success of ontologies in a number of
would prefer to use as few categories as possible. From the cognitive load point of view, the best categories are the most inclusive ones [32-33]. Yet, little useful information can be inferred by classifying objects into general categories such as "object" or "living being", except that they indicate some existence. As a result, humans need to consider categories at finer levels of specificity. As categories become conceptually smaller and include fewer individuals, they gain inferential power. This can be explained using probability theory [e.g. 33]. Since there are fewer birds than there are objects, a certain bird property has a greater chance of occurring in the smaller category (e.g. birds) than in the more inclusive category (e.g. objects). Once an individual is identified as an instance of a category, it is easier (with greater probability) to infer additional (unobserved) properties for a smaller category [30]. Yet, since there are more smaller categories (e.g. duck) than inclusive ones (e.g. object), maintaining them requires additional resources.
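The probabilistic argument can be made concrete with hypothetical counts (the numbers below are invented for illustration): conditioning on a small category such as bird makes a bird-typical property far more predictable than conditioning on the all-inclusive category object.

# Hypothetical counts in a small world of observed things.
n_objects = 1000          # everything observed
n_birds = 50              # the birds among them
n_with_feathers = 48      # essentially all feathered things are birds

p_feathers_given_object = n_with_feathers / n_objects   # 0.048
p_feathers_given_bird = n_with_feathers / n_birds        # 0.96

print(p_feathers_given_object, p_feathers_given_bird)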
In the case of both humans and machines, classification means that certain properties that are not of immediate interest are ignored. Herein lies a fundamental difference between human and computerized representations of reality. When humans classify, they focus on relevant features but remain aware of other ones (e.g., they can recall other features of an instance that are not associated with a particular category of interest). Recent research shows that humans can hold a massive amount of detail for a long time without instructions to maintain any specific information [34]. This information can be stored subconsciously and invoked in response to certain triggers. While humans use a certain classification of the world, they remain aware of other potential alternatives or accept them with new experiences. For example, William Blake's poem "The Tyger" depicts a vivid picture of a dangerous predator. Survival instincts cause humans to categorize large wild animals as dangerous. Yet, this does not preclude us from considering "Aunt Jennifer's Tigers" by Adrienne Rich as symbols of freedom and chivalry and wishing we could be prancing alongside them. Thus, in the domain of poetry, the same class of tigers can have different properties of interest in the mind of a poetry reader. As humans engage with their environment, they continuously revise and adjust their class definitions. In contrast, strict adherence to a predefined class structure means only individuals possessing a certain set of properties will be permitted to be classified as members of a particular class. Thus, if an ontology that defines tiger as dangerous needs to integrate an instance that is characterized as meek, it may either reject the classification, or ignore the meek property and assume the dangerous one in the process of data integration. It can also lead to a lower probability of class membership in those approaches that treat class boundaries as fuzzy.
Proposition. Domain understanding is necessarily reduced every time a class is used
to conceptually represent instances.
The assumption of the class-based approach to data modeling is that individual instances are always members of some (usually one) a priori defined class. Yet, since many classes can be used to represent the same phenomena, it is unclear which class is better. Choosing the wrong one means that the information stored will be deficient with respect to perceived reality. Parsons and Wand proposed cognitive guidelines for choosing classes that could be used for reasoning about classification, and their violation "indicates something is lost from a cognitive point of view"
[30, p. 69; emphasis added]. Extending this idea further, we claim that using classes to represent instances will always fail to fully capture reality, no matter how good the chosen class is. Simply put, no class is good enough to fully represent an individual, and whenever an attempt is made to do so, properties not defined by the class are neglected or lost.
Proposition: An instance can never be fully represented by a class.
Any object has a potentially large number of features, and no one class can capture them all. The same object can belong to many classes, which means that individual objects exist independently of any given classification. The relationship between individuals and classes is well depicted in Bunge's ontology [20, 23]. According to Bunge, the world is made of distinct things, and things are characterized by their properties. Humans perceive properties indirectly as attributes, and classes can be formed by grouping properties together.
Each instance is different from all others in some ways. As Bunge puts it, there are no two identical entities [20]. Even the atoms and particles of the same element are in some sense different, because no two things can occupy the same state at the same time [20]. In fact, on the basis of individual differences it is possible to construct as many new classes as there are differences.
Typically, however, classes are formed based on the commonality of instances, not their differences. Nevertheless, this does not preclude humans from considering, if necessary, properties that are not part of the classification schema. For example, when professors think about their own students, each student retains a plethora of individual features. Some students may require more attention than others. The distribution of attention for each student may also change over time. In contrast, a university domain ontology typically defines a Student class using the same set of properties. Thus, while humans are capable of reasoning about both similarity and difference, class-based ontologies emphasize commonalities only. An important element of reality (individual differences) is consistently neglected. Rigid classification-based ontologies routinely miss potentially valuable instance-specific information.
Finally, human categorization is a dynamic process, during the course of which class schemas may change. As shown by the discussion of the poetry domain, humans can uncover new features of familiar classes with additional experience [35]. Yet, as emphasized by Uschold and Gruninger, semantic interoperability demands standardization [6]. This requires ontologies to have persistent and stable schemas. Thus, Uschold and Gruninger introduce the concept of reusability, defined as "the formal encoding of important entities, attributes, processes and their inter-relationships in the domain of interest" [6, p. 3]. Thus, storing instances in terms of predefined classes makes it difficult to dynamically include new properties into class definitions.
The implications of class-based ontologies are illustrated next using a simple case
study.
For simplicity, the proposed ontology consists of four classes: traveler, agent, trip and supervisor, each with a set of properties. For example, the class definition of traveler means that any individual member of that class will be expected to have the corresponding properties customer id, name, passport no, nationality, date of birth, email, and password. It is possible that some values of the properties will be missing, but in order to qualify as a traveler, a certain number of known values must be present. Similarly, the class travel agent contains those properties that best describe agency employees who assist customers in booking and managing trips (agent id, name, supervisor id, date hired, email).
Each class has its own unique relations to other classes. For example, a traveler can request information about or take a trip. At the same time, an agent assists with trip booking and maintains trips. Travelers may not have direct relationships with the agency's supervisors, while each agent reports to a particular supervisor. In each case a class defines the set of intuitive and lawful relations with other constructs. The presented ontology can be used to guide distributed application development and automatically integrate data from multiple local agencies into a global one.
[Figure: the example travel ontology. Traveler (customer ID, name, passport no, nationality, DOB, email, password) takes Trip (trip ID, trip cost, description) and books it through Agent (agent ID, name, supervisor ID, date hired, email), who creates and manages trips and reports to Supervisor (supervisor ID, name, password, date hired).]
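A class-based reading of this schema can be sketched as follows (Python, with the attribute names taken from the case study; the dietary-preference attribute is a hypothetical addition for illustration). The point is simply that the fixed property set leaves no room for instance-specific information.

from dataclasses import dataclass
from datetime import date

@dataclass
class Traveler:
    """Class-based record: every member must fit this fixed property set."""
    customer_id: int
    name: str
    passport_no: str
    nationality: str
    date_of_birth: date
    email: str
    password: str

# An instance carrying extra, instance-specific information (say, a dietary
# preference collected by one local agency) has nowhere to put it: the
# attribute is not part of the class definition, so it is dropped or the
# record is rejected during integration.
t = Traveler(42, "A. Example", "X1234567", "CA", date(1980, 5, 1),
             "a@example.org", "secret")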
already know something about this visitor, which may be of potential business use: we may be interested in his/her IP address, click stream patterns [39], viewing time [40], or comments he/she left on a website. Once several attributes are recorded, the system can match them with pre-existing sets of identifying attributes for a phenomenon (such as a class of interest), and either infer a class or seek additional attributes that could also be automatically deduced from those previously supplied. The final attribute set can potentially be matched to a class (e.g. customer), or instances can be integrated without classifying them. Doing so avoids the inherent data quality deficiencies of class-based models. By shifting the focus from classification to the identification of instances and properties, fuller and semantically richer information about domains can be collected without imposing a particular view and biasing the results.
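By contrast, an instance-and-property store of the kind advocated here might look like the following sketch (plain Python with hypothetical attribute names): instances are bags of attribute/value pairs, and classes are formed ad hoc as queries over them.

# Instance-based storage: each instance is a bag of attribute/value pairs;
# no class has to be chosen up front.
instances = [
    {"ip": "203.0.113.7", "viewing_time_s": 42, "clicked": ["home", "trips"]},
    {"ip": "198.51.100.2", "comment": "great site", "viewing_time_s": 5},
]

# Classes become ad-hoc queries over attribute sets, formed when a use
# arises (e.g. 'engaged visitor'), rather than fixed schema elements.
def members(instances, required_attrs, predicate=lambda i: True):
    return [i for i in instances
            if required_attrs.issubset(i) and predicate(i)]

engaged = members(instances, {"viewing_time_s"},
                  lambda i: i["viewing_time_s"] > 30)
print(engaged)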
The instance-based approach can also address the problem of representing temporally changing information. For example, an individual's properties may change over time. Class membership thus becomes more dynamic and evolves with the changing properties.
Instance-and-property ontologies can accommodate the growing semantic diversity
of the web. Discretionary data input is growing and increasingly large numbers of
websites generate content from direct user input. This is the premise behind the
practice of crowdsourcing, the volunteer participation of regular users in purpose-
driven projects online [41]. One type of crowdsourcing is citizen science. Online
citizen science projects, such as eBird (ebird.org) or iSpot (ispot.org.uk), attempt to
capture valuable insights of regular people to be used in academic research. It is
clearly difficult to a priori anticipate what kind of information non-experts can
provide, and creating unnecessary constraints can undermine potential gains from
such projects. Moreover, amateur observers are often unable to provide information in
compliance with the constructs and relations of a given ontology, especially if it is
based on scientific taxonomy (e.g. ETHAN). Being able to give voice to every citizen
of the Internet and easily manage that data is an emerging frontier of the web of the
future.
References
1. Lacey, A.R.: A dictionary of philosophy. Routledge, New York (1996)
2. Burton-Jones, A., Storey, V.C., Sugumaran, V., Ahluwalia, P.: A semiotic metrics suite for assessing the quality of ontologies. Data & Knowledge Engineering 55, 84–102 (2005)
3. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American Magazine, 28–37 (2001)
4. Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., Horrocks, I.: The Semantic Web: the roles of XML and RDF. IEEE Internet Computing 4, 63–73 (2000)
5. Guizzardi, G.: Theoretical foundations and engineering tools for building ontologies as reference conceptual models. In: Semantic Web, pp. 3–10 (2010)
6. Uschold, M., Gruninger, M.: Ontologies: principles, methods, and applications. Knowledge Engineering Review 11, 93–155 (1996)
7. Gruber, T.R.: A translation approach to portable ontology specifications. Knowl. Acquis. 5, 199–220 (1993)
8. Parafiynyk, A., Parr, C., Sachs, J., Finin, T.: Proceedings of the Workshop on Semantic e-Science. In: AAAI 2007 (2007)
9. Parr, C., Sachs, J., Parafiynyk, A., Wang, T., Espinosa, R., Finin, T.: ETHAN: the Evolutionary Trees and Natural History Ontology. Computer Science and Electrical Engineering, University of Maryland, Baltimore County (2006)
10. Parsons, J., Wand, Y.: Using cognitive principles to guide classification in information systems modeling. MIS Quart. 32, 839–868 (2008)
11. Parsons, J., Wand, Y.: Emancipating instances from the tyranny of classes in information modeling. ACM Transactions on Database Systems 25, 228–268 (2000)
12. Chandrasekaran, B., Josephson, J.R., Benjamins, V.R.: What Are Ontologies, and Why Do We Need Them? IEEE Intelligent Systems 14, 20–26 (1999)
13. Evermann, J., Wand, Y.: Ontology based object-oriented domain modelling: fundamental concepts. Requir. Eng. 10, 146–160 (2005)
14. Hansen, P.K., Mabogunje, A., Eris, O., Leifer, L.: The product development process ontology – creating a learning research community. In: Culley, S. (ed.) Design Management: Process and Information Issues. Professional Engineering for The Institution of Mechanical Engineers, pp. 171–186 (2001)
15. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L.J., Eilbeck, K., Ireland, A., Mungall, C.J., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.-A., Scheuermann, R.H., Shah, N., Whetzel, P.L., Lewis, S.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotech. 25, 1251–1255 (2007)
16. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art. Knowl. Eng. Rev. 18, 1–31 (2003)
17. Burton-Jones, A., Wand, Y., Weber, R.: Guidelines for Empirical Evaluations of Conceptual Modeling Grammars. J. Assoc. Inf. Syst. 10, 495–532 (2009)
18. Orme, A.M., Haining, Y., Etzkorn, L.H.: Indicating ontology data quality, stability, and completeness throughout ontology evolution. Journal of Software Maintenance & Evolution: Research & Practice 19, 49–75 (2007)
19. Beck, T., Morgan, H., Blake, A., Wells, S., Hancock, J.M., Mallon, A.-M.: Practical application of ontologies to annotate and analyse large scale raw mouse phenotype data. BMC Bioinformatics 10, 19 (2009)
20. Bunge, M.A.: The furniture of the world. Reidel, Dordrecht (1977)
21. Sowa, J.F.: Knowledge representation: logical, philosophical, and computational foundations. Brooks/Cole, Pacific Grove (2000)
22. Parsons, J., Wand, Y.: Using objects for systems analysis. Commun. ACM 40, 104–110 (1997)
23. Wand, Y., Weber, R.: An Ontological Model of an Information System. IEEE T. Software Eng. 16, 1282–1292 (1990)
24. Wand, Y., Wang, R.Y.: Anchoring data quality dimensions in ontological foundations. Commun. ACM 39, 86–95 (1996)
25. Wand, Y., Monarchi, D.E., Parsons, J., Woo, C.C.: Theoretical Foundations for Conceptual Modeling in Information-Systems Development. Decis. Support Syst. 15, 285–304 (1995)
26. Parsons, J., Wand, Y.: Emancipating Instances from the Tyranny of Classes in Information Modeling. ACM Transactions on Database Systems 25, 228–268 (2000)
27. Wand, Y., Weber, R.: On the ontological expressiveness of information systems analysis and design grammars. Inform. Syst. J. 3, 217–237 (1993)
28. Wand, Y., Weber, R.: An Ontological Evaluation of Systems Analysis and Design Methods. Information System Concepts: An In-Depth Analysis 357, 79–107 (1989)
29. Raven, P.H., Berlin, B., Breedlove, D.E.: The Origins of Taxonomy. Science 174, 1210–1213 (1971)
30. Parsons, J., Wand, Y.: Choosing classes in conceptual modeling. Commun. ACM 40, 63–69 (1997)
31. Markman, E.M.: Categorization and Naming in Children: Problems of Induction. MIT Press, Cambridge (1991)
32. Murphy, G.L.: Cue validity and levels of categorization. Psychological Bulletin 91, 174–177 (1982)
33. Corter, J., Gluck, M.: Explaining basic categories: Feature predictability and information. Psychological Bulletin 111, 291–303 (1992)
34. Brady, T.F., Konkle, T., Alvarez, G.A., Oliva, A.: Visual long-term memory has a massive storage capacity for object details. PNAS Proceedings of the National Academy of Sciences of the United States of America 105, 14325–14329 (2008)
35. Anderson, J.R.: The Adaptive Nature of Human Categorization. Psychol. Rev. 98, 409–429 (1991)
36. Vukmirovic, M., Szymczak, M., Ganzha, M., Paprzycki, M.: Utilizing ontologies in an agent-based airline ticket auctioning system. In: 28th International Conference on Information Technology Interfaces, Croatia, pp. 385–390 (2006)
37. Chang, C., Miyoung, C., Eui-young, K., Pankoo, K.: Travel Ontology for Recommendation System based on Semantic Web. In: The 8th International Conference on Advanced Communication Technology, ICACT 2006, vol. 1, pp. 624–627 (2006)
38. Gong, H., Guo, J., Yu, Z., Zhang, Y., Xue, Z.: Research on the Building and Reasoning of Travel Ontology. In: Proceedings of the 2008 International Symposium on Intelligent Information Technology Application Workshops, pp. 94–97. IEEE Computer Society, Los Alamitos (2008)
39. Park, J., Chung, H.: Consumers' travel website transferring behaviour: analysis using clickstream data – time, frequency, and spending. Service Industries Journal 29, 1451–1463 (2009)
40. Parsons, J., Ralph, P., Gallagher, K.: Using Viewing Time to Infer User Preference in Recommender Systems. In: Proceedings of the AAAI Workshop on Semantic Web Personalization held in conjunction with the 9th National Conference on Artificial Intelligence (AAAI 2004), pp. 52–63 (2004)
41. Doan, A., Ramakrishnan, R., Halevy, A.Y.: Crowdsourcing systems on the World-Wide Web. Commun. ACM 54, 86–96 (2011)
Preface to SeCoGIS 2011
This volume contains the papers presented at SeCoGIS 2011, the Fifth International Workshop on Semantic and Conceptual Issues in GIS, held on the 1st of November in Brussels, Belgium.
Current information technologies have increased the production, collection, and diffusion of geographical and temporal data, thus favoring the design and development of geographic information systems (GIS) and, more generally speaking, spatio-temporal information systems (STIS). Nowadays, GISs are emerging as a common information infrastructure which penetrates more and more aspects of our society. This has given rise to new methodological and data engineering challenges in order to accommodate new user requirements for new applications. Conceptual and semantic modeling are ideal candidates to contribute to the development of the next generation of GIS solutions. They allow eliciting and capturing user requirements as well as the semantics of a wide range of applications.
The SeCoGIS workshop brings together researchers, developers, users, and practitioners carrying out research and development in geographic information systems. The aim is to stimulate discussions on the integration of conceptual modeling and semantics into current geographic information systems, and how this will benefit the end users. The workshop provides a forum for original research contributions and practical experiences of conceptual modeling and semantic web technologies for GIS, fostering interdisciplinary discussions in all aspects of these two fields, and will highlight future trends in this area. The workshop is organized in a way that highly stimulates interaction amongst the participants.
This edition of the workshop attracted papers from 15 different countries distributed all over the world: Brazil, Argentina, Chile, USA, Canada, France, Italy, Spain, Germany, Belgium, The Netherlands, Austria, Macedonia FYROM, Morocco, and New Zealand. We received 19 papers, from which the Program Committee selected 6, making an acceptance rate of 32 percent. The accepted papers cover a wide range of issues, in particular consistency and integration, refined spatial relationships, and conceptual model transformation rules.
We hope that you find the program and presentations beneficial and enjoyable and that during the workshop you had many opportunities to meet colleagues and practitioners. We would like to express our gratitude to the program committee members and the external referees for their hard work in reviewing papers, the authors for submitting their papers, and the ER 2011 organizing committee for all their support.
Abstract. Mobile devices and location based services enable digital and real worlds to be integrated within our daily lives. The handling of natural language dialogue raises several research challenges, including the ability to direct the user's attention to a particular feature in the field of view through the use of suitable descriptions. To mimic natural language, these referring expressions should use attributes which include factors of the building's appearance and descriptions of its location with reference to the observer or other known buildings in view. This research focuses on one particular positional case used in describing features in a field of view, that of the "opposite" spatial relation, and discusses how this referring expression may be generated by modelling the view from the observer's location to the surrounding features.
1 Introduction
Increasingly, digital and real worlds are becoming integrated within our daily lives, with mobile devices and location based services being among the tools that enable this to happen. One of the drawbacks has been that graphical interfaces distract the user from their environment, and alternative interaction experiences are being researched. Augmented Reality [1, 2] is one such innovation, whereby digital information is superimposed onto real world views. Speech interfaces are another solution, whereby information may be retrieved using voice commands and speech prompts [3, 4]. The work presented here discusses how speech interfaces may reference items in the current view using natural language terms, with particular focus on the use of the spatial preposition 'opposite', as in 'where's the house opposite the bakery?'.
People use language to describe and share experiences about space and the objects which occupy it [5]. These object descriptions are known in natural language research as referring expressions and are used, for example, to draw someone's attention to a particular building in a cityscape [6]. Typically, the descriptions include a number of physical attributes relating to the feature, such as its position relative to the observer or other surrounding objects, so that the listener may identify the intended target. The research area has particular relevance for the future of speech based interfaces for Location Based Services (LBS), in both the generation of phrases to direct the user's
2 Background
The range of LBS applications has diversified from basic navigational support into areas of social networking and virtual city guides [4, 7]. User interfaces have tended to have a graphical focus, but mobile speech recognition tools are improving to the point where devices in the near future may incorporate a speech based mode, allowing users to operate discreetly in a hands-free and eyes-free way, without the need to re-focus attention away from their environment [8]. Earlier research has shown that users are able to carry out multiple tasks more easily if they are using a number of sensory modalities [9].
System usability is very important for the success of LBS applications, and great efforts are made to reduce the seam between the application and the user by closely modelling the user's viewpoint [10]. The ultimate goal would be for an LBS to pass the spatial Turing test [11], whereby its instructions are indistinguishable from those generated by a human. Steps towards this goal require that the LBS filter and translate digital information into appropriate forms which match the user's frame of reference [12, 13].
The position of any feature in the urban landscape can be described relative to the observer or to a secondary reference object. Relations are rarely described in metric space (e.g. 123.7 m at 54 degrees) but instead usually refer to topological space [14, 15] or projective space [16]. For example, a paddling pool may be described as being 'inside the park' using a topological relation, or 'in front of the swings' using a projective relation. Equally, a house may be described as 'on your left' by referencing the view experienced by an observer at a given location and orientation.
The topological relations between static features are permanent, which in urban areas may include containment within a region (e.g. in a park), a topographic feature (e.g. on a hill, on a slope, or in a valley), or adjacency to a linear feature (e.g. road, river, rail). In contrast, projective relations are ternary comparisons between the primary object, a reference object, and the observer [17]. This means that they are dynamic, as the user's viewpoint is considered in each relation; therefore an ability to model which Features of Interest (FOI) are in view is required to ensure only visible items are referenced.
Most Geographic Information Systems (GIS) offer the functionality to carry out visibility modelling [18], with a catalogue of research including siting radio masts [19], locating the most scenic or most hidden routes [20], landscape planning [21], acting as a weapon surrogate in military exercises [22], and examining spatial openness in built environments [23].
Studies in the urban landscape have tended to be based on isovists [24], using in particular Benedikt's [25] interpretation and definitions. Essentially, isovists describe the space which is visible from a vantage point, considering the form of the built environment through the use of architectural plans which denote the building footprints and positions. However, this model ignores building height, the topography of the land surface, and the continuation of lines of sight beyond the first intersection with a building footprint. Isovists therefore depict the region which, when traversed from the vantage point, offers a continuous view of the target, and they disregard more distant features.
Recently, 3D isovists [26] and visual exposure models [27, 28] using DSMs built from LiDAR sources have been introduced for urban visibility modelling. These DSMs include building and topographical form and may be used to determine how much of a feature can be viewed from the surrounding space, enabling the creation of surfaces that show in which direction an observer would need to move to view the target more, or less, clearly. These techniques can be used to find visual corridors or visual ridges, and form a useful basis for considering feature visibility in the context of LBS. The urban visual exposure model calculates the vertical extents visible for each building cell of the DSM, by calculating the lowest visible point on the façade from the intersection of foreground objects, as shown in Fig. 1.
From this, the visible façade area and the percentage of a feature on the skyline may be deduced, along with other metrics. Once the model is able to determine which features are visible, it is possible to relate these to construct the positional part of a referring expression. Translating the positional information into the user's frame of reference requires an egocentric projective spatial model, as discussed in the next section.
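The lowest-visible-point calculation described above can be approximated by a simple tangent-angle scan along the sightline. The sketch below is a minimal illustration under assumed inputs (a 1D profile of DSM surface heights sampled between observer and target, and a façade base at height 0); it is not the authors' implementation.

def lowest_visible_facade_height(observer_height, profile, target_distance):
    """Lowest height on the target facade visible from the observer.

    observer_height -- observer eye height (same vertical datum as the DSM)
    profile         -- iterable of (distance, surface_height) DSM samples
                       strictly between observer and facade (assumed input)
    target_distance -- horizontal distance from observer to the facade
    """
    # Gradient of the sightline grazing each intermediate obstruction.
    gradients = [(h - observer_height) / d
                 for d, h in profile if 0 < d < target_distance]
    if not gradients:
        return 0.0  # unobstructed: the facade is visible down to its base
    # The steepest grazing sightline hides everything on the facade below it.
    return max(0.0, observer_height + max(gradients) * target_distance)

Subtracting this height from the building top for each DSM cell then yields the visible façade area mentioned in the text.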
Fig. 2. A combined model of space using relative and intrinsic frames of reference
space [31]. Hence the interrelated visibility of three related physical entities is required, such as the library, park, and street of the previous example. For this reason visual exposure modelling is required to report which objects may be used in the relation. Two cases are examined here: first, where the common entity is represented as a one-dimensional feature, such as a road or river, and second, where it is represented as a two-dimensional region, such as a park.
The relation conveyed in the phrase 'the library is opposite the park' can be broken down into 'the library is left of the road' and 'the park is right of the road' from a viewpoint on the road. However, this does not signify that the two objects are opposite each other unless both are perceived to occur at similar positions along the road. Consider Fig. 3, which shows a situation where a number of buildings surround a park and the observer is located on a road at Point 1. From this viewpoint the observer is able to see Buildings A, B, C, and the upper part of D above hedges, but not Buildings E, F, and G, which are out of view and therefore not included in any referring expressions. When the observer faces Building C, it is valid to report that both Buildings C and B are 'opposite the park'. Here the term 'with reference to the road' is left out but inferred, and the phrase is equivalent to 'Building C is on the other side of the road from the park'. However, it would not be appropriate to define Building A as 'opposite the park', as the two features do not share a similar location along the linear road feature, even though the phrase 'opposite side of the road' is still true.
The segments of the road labelled A1–A2 and B1–B2 indicate the start and end of each entity along the road's length, computed at the first and last intersections of a line perpendicular to the direction of travel with the entity. To satisfy the 'opposite' condition the following must apply (a check of these conditions is sketched after the list):
the observer must be able to view both entities from a single point (e.g., C and the Park);
the entities must occur at overlapping sections of the linear entity (e.g., C1–C2 and P1–P2);
the observer must also be able to view the common linear entity in the overlap region (e.g., the road);
the entities must occupy opposing left/right ('beside') space when viewed from the overlapping region.
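A minimal sketch of these four tests follows. The interval end-points (e.g. C1–C2, P1–P2 in Fig. 3) and the visibility predicate are assumed to be supplied by the visual exposure model; the attribute and function names are illustrative, not taken from the paper.

def is_opposite(entity, reference, road, observer, visible):
    """Check the four 'opposite' conditions for a linear common feature.

    entity, reference -- objects with .interval = (start, end) along the road
                         and .side ('left' or 'right') as seen from the road
                         (hypothetical attributes for this sketch)
    road              -- identifier of the common linear feature
    observer          -- the current viewpoint
    visible(observer, thing) -- visibility test provided by the visual
                         exposure model (assumed available)
    """
    # 1. Both entities must be visible from the single observation point.
    if not (visible(observer, entity) and visible(observer, reference)):
        return False
    # 2. Their extents along the road must overlap.
    (a1, a2), (b1, b2) = entity.interval, reference.interval
    lo, hi = max(a1, b1), min(a2, b2)
    if hi - lo <= 0:
        return False
    # 3. The common linear entity must itself be visible in the overlap region.
    if not visible(observer, (road, lo, hi)):
        return False
    # 4. The entities must lie on opposite (left/right) sides of the road.
    return entity.side != reference.side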
Table 1. Calculating the Opposite relation for Entities and the Park from Observer Point 1
Following these rules, Table 1 may be generated, indicating that B and C are opposite the park when viewed from Point 1. Although part of D is visible above the hedges, the section of roadway between the building and the park is out of view behind bushes, rendering the use of 'opposite' less appropriate for assisting the user's visual search from the current location. However, the term could be used if instructions considered the case as the user approaches, such as 'when you get to Point 2 you'll see D opposite the park'.
When multiple features satisfy the opposite relation, further consideration is necessary to establish which would form the most suitable candidate. So from Point 2, Buildings E, F, and G come into view and may be considered as candidates for describing the location of Building D. A function is required to establish which of these is most suitable, requiring knowledge of the candidates and consideration of the overlap extent. In this case the overlap between D and E is minimal, and although F has a larger overlap it would still make more sense to describe D as 'opposite the Park', as these features share the greatest overlap and the Park is a very recognisable feature.
As a further example, when the observer is at Point 2 looking for Building G, the park can no longer be considered opposite; however, either Building E or F could be used. Factors including the saliency of the building, its visibility, its distance from the target, and the number of items between each should be considered. Assuming the visibility of both E and F were high (clear views), then E would form the most logical choice, as it is the first item viewed from the roadside and most readily identifiable. However, if F were a visually prominent landmark, such as a church, then it would take precedence despite being further from the target, as its saliency allows it to form a more useful descriptor.
Saliency is a measure of the prominence of a feature in its neighbourhood, and there are methods to quantify such distinctiveness [32, 33]. Typically, factors including visual appearance and semantic interest are considered by comparing items in the neighbourhood to establish the most easily recognisable and rare features. Saliency is of particular importance in choosing candidates for forming referring expressions: describing a target building as 'opposite a tree' may be logically true, but it is worthless if the entire street is filled with trees and all houses are opposite a tree. Therefore, when constructing a referring expression, the number of other entities which share a similar relation needs to be considered, to minimise the confusion caused by the statement.
Fuzzy classes may be used to establish the most attractive entity in an 'opposite' relation, by considering all alternatives and awarding class memberships between 0 and 1 according to a number of factors. The weighting between factors may be adjusted according to the current task; for example, car drivers may favour the number of items between, as scanning opportunities are more limited while driving, whereas pedestrians may favour saliency, as they have more freedom to view the surroundings and wish to locate the most prominent landmarks in a wider field of view.
Most suitable entity = f(V, S, N, D, O), where (a weighted-scoring sketch follows the list):
V – visibility (degree of visibility of all items from a single observation point)
S – saliency (prominence; minimise confusability)
N – number of items between (measure of separation by entity count)
D – distance apart (close items preferred)
O – degree of overlap
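As a rough illustration of how such a weighted membership score could be combined, the sketch below assumes each factor has already been normalised to [0, 1] and applies task-dependent weights; the weight values are invented for the example and are not taken from the paper.

def candidate_score(v, s, n, d, o, weights):
    """Combine the five factors into a single suitability score in [0, 1].

    v, s, n, d, o -- factor memberships already scaled to [0, 1]
                     (visibility, saliency, few-items-between, closeness, overlap)
    weights       -- dict of factor weights, e.g. tuned per travel mode
    """
    total = sum(weights.values())
    return (weights["V"] * v + weights["S"] * s + weights["N"] * n
            + weights["D"] * d + weights["O"] * o) / total

# Illustrative task-dependent weightings (assumed values, for the sketch only):
driver_weights = {"V": 0.3, "S": 0.1, "N": 0.3, "D": 0.2, "O": 0.1}
pedestrian_weights = {"V": 0.2, "S": 0.35, "N": 0.1, "D": 0.15, "O": 0.2}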
A slightly modified set of rules is necessary when considering two-dimensional common features, as discussed next.
For linear common features, the entity overlaps used to identify whether features are opposite were determined by considering the first and last intersections with each entity of a line perpendicular to the linear feature (as shown in Fig. 3, e.g., A1–A2). In cases where the common feature is a region, an alternative rule is required to determine overlap, and consequently opposition.
When the observer occupies a space inside the common region, for example standing in a square, then two features may be described as opposite one another by considering the observer as a central point, with one feature occupying the 'in front' space and one the 'behind' space. As an example, if the observer looks towards feature A as shown in Fig. 4(i), then B can be classed as in the opposite direction, according to the in front/behind relation outlined in Fig. 2. However, if the observer is outside of the common region, then the relations may be calculated according to a division of space based on the Orientated Minimum Bounding Box (OMBB), as shown in Fig. 4(ii). In this case the OMBB is drawn around the common region and the centre point
determined. Lines are extrapolated from the centre point to the corner and edge midway points of the bounding box, creating 8 triangular zones. For any two entities to be considered opposite each other with respect to the square, they must occupy zones whose numbers sum to ten, according to the numbering system shown. Therefore, no matter where the observer is located, the relation of A, B1, and the square would be classed as 'opposite', assuming all entities were visible. Entities occupying a neighbouring zone are also considered to be opposite, so that A, B2 and A, B3 would also be described as sharing an 'opposite' relation with respect to the square. This works for all common region shapes (e.g. lakes), as the algorithm is based on the orientated bounding box.
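The zone test itself reduces to arithmetic on the zone numbers. The sketch below assumes the paper's numbering of the 8 OMBB zones, in which diametrically opposed zones sum to ten; the zone-assignment step is omitted, and the 'neighbouring zone also counts' relaxation is encoded here as a tolerance of one on the sum, which is only one possible reading of the text.

def opposite_across_region(zone_a, zone_b, opposite_sum=10, allow_neighbour=True):
    """Test the 'opposite' relation across a common region (observer outside it).

    zone_a, zone_b  -- zone numbers of the two entities in the 8-zone division
                       of the region's Orientated Minimum Bounding Box,
                       numbered as in the paper's figure (not reproduced here)
    allow_neighbour -- also accept a zone adjacent to the exactly opposite one
    """
    difference = abs(zone_a + zone_b - opposite_sum)
    return difference == 0 or (allow_neighbour and difference <= 1)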
The concepts outlined so far have not included a measure of the appropriateness of including the term with respect to the observer's viewing distance and the object sizes; instead they have looked to determine the most suitable candidate for the relation. However, at greater viewing distances, and for smaller items, it may be harder for the observer to judge when two entities share an 'opposite' relation, and it may be necessary to restrict the inclusion of the term in a referring expression according to the percentage of the field of view occupied by the items. This approach accommodates entity scale, such that a house may be described as 'opposite a bakery' only from close range, while a park may be described as 'opposite the hills' from greater distances.
Additionally, when a referring expression is used in a more general description of a region, and not to identify a particular target, consideration must be given to the ordering of features in the relation. Jackendoff [5] makes the observation that not all relations are symmetrical, and that 'the house is next to the bike' makes less sense than 'the bike is next to the house'. When considering the 'opposite' relation, it makes more sense to use the most salient feature as the reference object, such as 'turn right after you see a hut opposite a lake', whereby the viewer's attention should be more naturally drawn to the lake, and the decision point confirmed once the hut has been located.
References
1. Narzt, W., et al.: Augmented reality navigation systems. Universal Access in the Information Society (2006)
2. Hollerer, T., et al.: Exploring MARS: Developing indoor and outdoor user interfaces to a mobile augmented reality system. Computers and Graphics (Pergamon) 23(6), 779–785 (1999)
3. Goose, S., Sudarsky, S., Zhang, X., Navab, N.: Speech-Enabled Augmented Reality Supporting Mobile Industrial Maintenance. In: PERVASIVE Computing (2004)
4. Bartie, P.J., Mackaness, W.A.: Development of a speech-based augmented reality system to support exploration of cityscape. Transactions in GIS 10(1), 63–86 (2006)
5. Jackendoff, R.: Languages of the Mind. MIT Press, Cambridge (1992)
6. Dale, R., Reiter, E.: Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science 19(2), 233–263 (1995)
7. Espinoza, F., et al.: GeoNotes: Social and Navigational Aspects of Location-Based Information Systems. In: Abowd, G.D., Brumitt, B., Shafer, S. (eds.) UbiComp 2001. LNCS, vol. 2201, pp. 2–17. Springer, Heidelberg (2001)
8. Francioni, J.M., Jackson, J.A., Albright, L.: The sounds of parallel programs. IEEE (2002)
9. Allport, D., Antonis, B., Reynolds, P.: On the Division of Attention: a Disproof of the Single Channel Hypothesis. Quarterly Journal of Experimental Psychology 24, 225–235 (1972)
10. Ishii, H., Kobayashi, M., Arita, K.: Iterative design of seamless collaboration media. Communications of the ACM 37(8), 83–97 (1994)
11. Winter, S., Wu, Y.: The spatial Turing test. In: Colloquium for Andrew U. Frank's 60th Birthday. Geoinfo Series, Department for Geoinformation and Cartography, Technical University Vienna, Vienna (2008)
12. Meng, L.: Ego centres of mobile users and egocentric map design. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile Services, pp. 87–105. Springer, Berlin (2005)
13. Reichenbacher, T.: Adaptive egocentric maps for mobile users. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile Services, pp. 143–162. Springer, Berlin (2005)
14. Egenhofer, M.J., Herring, J.: A mathematical framework for the definition of topological relationships (1990)
15. Clementini, E., Di Felice, P.: A comparison of methods for representing topological relationships. Information Sciences-Applications 3(3), 149–178 (1995)
16. Clementini, E., Billen, R.: Modeling and computing ternary projective relations between regions. IEEE Transactions on Knowledge and Data Engineering 18, 799–814 (2006)
17. Hernández, D.: Relative representation of spatial knowledge: The 2-D case. In: Mark, D.M., Frank, A.U. (eds.) Cognitive and Linguistic Aspects of Geographic Space, pp. 373–385. Kluwer Academic Publishers, Netherlands (1991)
18. De Smith, M.J., Goodchild, M.F., Longley, P.: Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools. Troubador Publishing (2007)
19. De Floriani, L., Marzano, P., Puppo, E.: Line-of-sight communication on terrain models. International Journal of Geographical Information Systems 8(4), 329–342 (1994)
20. Stucky, J.L.D.: On applying viewshed analysis for determining least-cost paths on Digital Elevation Models. International Journal of Geographical Information Science 12(8), 891–905 (1998)
21. Fisher, P.F.: Extending the applicability of viewsheds in landscape planning. Photogrammetric Engineering and Remote Sensing 62(11), 1297–1302 (1996)
22. Baer, W., et al.: Advances in Terrain Augmented Geometric Pairing Algorithms for Operational Test. In: ITEA Modelling and Simulation Workshop, Las Cruces, NM (2005)
23. Fisher-Gewirtzman, D., Wagner, I.A.: Spatial openness as a practical metric for evaluating built-up environments. Environment and Planning B: Planning and Design 30(1), 37–49 (2003)
24. Tandy, C.R.V.: The isovist method of landscape. In: Symposium: Methods of Landscape Analysis. Landscape Research Group, London (1967)
25. Benedikt, M.L.: To take hold of space: isovists and isovist fields. Environment and Planning B 6(1), 47–65 (1979)
26. Morello, E., Ratti, C.: A digital image of the city: 3D isovists in Lynch's urban analysis. Environment and Planning B 36, 837–853 (2009)
27. Llobera, M.: Extending GIS-based visual analysis: the concept of visualscapes. International Journal of Geographical Information Science 17(1), 25–48 (2003)
28. Bartie, P.J., et al.: Advancing Visibility Modelling Algorithms for Urban Environments. Computers, Environment and Urban Systems (2010), doi:10.1016/j.compenvurbsys.2010.06.002
29. Billen, R., Clementini, E.: Projective relations in a 3D environment. In: Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (eds.) GIScience 2006. LNCS, vol. 4197, pp. 18–32. Springer, Heidelberg (2006)
30. Bartie, P., et al.: A Model for Egocentric Projective Spatial Reasoning based on Visual Exposure of Features of Interest (forthcoming)
31. Merriam-Webster: Opposite (2010)
32. Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks. In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, pp. 243–259. Springer, Heidelberg (2002)
33. Elias, B.: Determination of landmarks and reliability criteria for landmarks (2003)
Cognitive Adequacy of Topological Consistency
Measures
1 Introduction
A dataset is consistent if it satisfies a set of integrity constraints. These integrity constraints define valid states of the data and are usually expressed in a language that also defines the data schema (logical representation). Consistency measures provide an indication of how much a dataset satisfies a set of integrity constraints. They are useful to compare datasets and to define strategies for data cleaning and integration. Traditionally, consistency in datasets has been a binary property: the dataset is either consistent or not. At most, consistency measures count the number of elements in a dataset that violate integrity constraints, but the concept of being partially consistent does not exist. Spatial information raises new issues regarding the degree of consistency because the comparison of spatial data requires additional operators beyond the classical comparison operators (=, >, <, ≥, ≤, ≠). Geometries are typically related by topological or other spatial relations, upon which different semantic constraints may be defined.
This work was partially funded by Fondecyt 1080138, Conicyt-Chile, by Ministerio de Ciencia e Innovación (PGE and FEDER), refs. TIN2009-14560-C03-02 and TIN2010-21246-C02-01, and by Xunta de Galicia (Fondos FEDER), ref. 2010/17.
2 Related Work
Related work addresses similarity measures of topological relations. Similarity measures are useful to compare the topological relation between geometries stored in a dataset with respect to an expected topological relation as expressed by an integrity constraint. We distinguish qualitative from quantitative approaches to comparing topological relations. A qualitative representation of topological relations uses a symbolic representation of spatial relations, such as the topological relations defined by Egenhofer and Franzosa [3] or by Randell et al. [8]. Under this representation, a similarity measure compares topological relations by the semantic distance between relations defined in a conceptual neighborhood graph [7]. The disadvantage of comparing topological relations from a qualitative perspective is that it does not make a distinction between particular geometries. For example, it does not distinguish between two pairs of geometries, both disjoint, but where in one case the geometries are very close and in the other case the geometries are far apart. Moreover, in most cases when semantic distance is used, all edges in the conceptual graph have the same weight in the determination of the semantic distance.
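For readers unfamiliar with semantic distance over a conceptual neighborhood graph, the sketch below computes it as a shortest-path length. The edge set shown is an illustrative approximation of the usual region-region neighborhood graph and is an assumption of this sketch; it may differ in detail from the graphs used in [7].

from collections import deque

# Illustrative conceptual neighborhood graph over the eight region-region
# relations (edge set is an assumption for this sketch, not taken from [7]).
NEIGHBOURS = {
    "disjoint": ["meet"],
    "meet": ["disjoint", "overlap"],
    "overlap": ["meet", "covers", "coveredBy", "equal"],
    "covers": ["overlap", "equal", "contains"],
    "coveredBy": ["overlap", "equal", "inside"],
    "equal": ["overlap", "covers", "coveredBy"],
    "contains": ["covers"],
    "inside": ["coveredBy"],
}

def semantic_distance(r1, r2):
    """Shortest-path distance between two relations in the neighborhood graph."""
    seen, queue = {r1}, deque([(r1, 0)])
    while queue:
        relation, dist = queue.popleft()
        if relation == r2:
            return dist
        for nxt in NEIGHBOURS[relation]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("relations are not connected in the graph")

With unit edge weights, semantic_distance("disjoint", "overlap") returns 2, which is exactly the equal-weighting behaviour criticized in the paragraph above.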
A quantitative representation of topological relations is given in [1] by the distance and angle between the centroids of the objects. Using this representation, similarity between topological relations is defined as the inverse of the difference between representations. Another study [4] defines ten quantitative measures that characterize topological relations based on metric properties, such as length, area, and distance. The combination of these measures gives an indication of the topological relations and their associated terms in natural language (such as 'going through' and 'goes up to'). The problem with using the previous measures for evaluating the degree of inconsistency is that although datasets handle geometries of objects, constraints are expressed by qualitative topological relations, and therefore only a symbolic representation of the expected topological relations exists.
In the spatial context, only the work in [9] introduces measures to compare the consistency of different datasets. In that work, given an expected topological relation between any two objects with particular semantics, a violation degree measure quantifies how different the topological relation between the objects is from the expected relation expressed by a topological constraint. While this previous work provides an evaluation with respect to semantic distance, it evaluates neither the cognitive adequacy of the measures nor the impact of the relative size of objects on the quantification of inconsistency.
A CTD has the general form
∀ x̄1, x̄2, g1, g2 (P(x̄1, g1) ∧ R(x̄2, g2) ∧ Ψ → T(g′1, g′2))
where g′1 is either g1 itself, the result of a unary geometric operator applied to g1, or the result of a binary geometric operator applied to g1 and a constant d, each returning a geometry. Geometry g′2 is defined in the same way as g′1, with g1 replaced by g2. Also, Ψ is an optional formula in conjunctive normal form (CNF) defined recursively by:
Using the previous definitions and considering the predicates county(idc, ids, g) and state(ids, g), a CTD could be 'a county must be within the state to which it belongs':
∀ idc, ids, g1, g2 (county(idc, ids, g1) ∧ state(ids, g2) → within(g1, g2))
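A direct way to check such a CTD over stored geometries is sketched below. It assumes geometry objects that expose a within predicate (e.g. shapely geometries); it only counts violating county tuples, i.e. the binary notion of consistency discussed above, without yet grading the violation.

def ctd_violations(counties, states):
    """Return the ids of counties whose geometry is not within their state's geometry.

    counties -- iterable of (idc, ids, geometry) tuples
    states   -- dict mapping ids -> state geometry
    """
    violating = []
    for idc, ids, county_geom in counties:
        state_geom = states.get(ids)
        # A missing state or a county not within its state violates the CTD.
        if state_geom is None or not county_geom.within(state_geom):
            violating.append(idc)
    return violating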
Fig. 1. Comparison of the violation degree: (a) example of Section 1 and (b) example of Section 2
factor of the violation degree (H1 and H2). Figure 1(a) shows a question of this section. The task is to choose whether one of the two figures shows a larger (>), equal (=), or smaller (<) violation degree of an expected topological relation among geometries A and B. Specifically, we check that the larger the value of the parameter that defines the magnitude of conflicts in our measures, the larger the perceived violation.
The questions in this section check the four parameters considered in our measures (i.e., external distance, internal distance, crossing length, and overlapping size) as well as the influence of the touching length between geometries. The questions also represent a balanced selection of different topological relations and types of geometries (mainly surfaces and curves, but also points).
Section 2 includes 14 questions similar to those in Section 1. The difference is that the figures now include black geometries that represent the context where the violation occurs. Figure 1(b) shows a question of Section 2. This section is designed to test the influence of the context, that is, the influence on the perceived violation of the size of the two geometries in conflict with respect to the other geometries in the dataset (H3).
Section 3 shows each figure of the questions individually. Fourteen of them depict small blue and yellow geometries with large context geometries, and fourteen depict large blue and yellow geometries with small context geometries. For each figure, the subjects were asked to provide a value for the degree of violation between 0 and 100. The example given to the subjects (see Figure 2) does not violate the expected topological relation and, consequently, the assigned violation degree value is 0. We decided to use this example with value 0 to avoid any influence on the value given by subjects in case of a violation. This section is designed to validate our measures by evaluating whether or not the violation degrees computed by our measures are in concordance with the violation degrees given by the subjects. If there is a high correlation between the scores provided by the subjects and the values obtained with our measures, we can conclude that our measures are valid and that they reflect the opinions of the subjects.
We decided not to check all combinations of topological relations between geometries versus an expected topological relation because this would require 64 questions in the test. We assume that if a parameter (e.g. the external distance between geometries) is perceived and used to decide a degree of violation between geometries that must touch but are disjoint, then it will also be perceived and used to decide the degree of violation between two geometries that must overlap but are disjoint. Similarly, we assume that if a parameter is perceived and used for geometries of a particular dimension (e.g., surfaces), then it will be as well for geometries of another dimension (e.g., curves or points). Furthermore, we did not include figures in the test where the expected relation is Equal.
Fig. 3. Comparison of the violation degree: (a) a surface within another surface (b) a point within another point
Table 1 shows the raw data obtained in the different questions of Section 1. In this table, Expected refers to the expected topological relation as expressed by a CTD, Actual refers to the real relation between the geometries in the question, Geometries refers to the type of geometries involved in the relation, Parameter refers to the parameter evaluated by the test, '+' refers to the percentage of answers that perceive a positive impact of the parameter on the violation degree, '=' refers to the percentage of answers that do not perceive any effect of the parameter on the violation degree, and '−' refers to the percentage of answers that report an inverse influence of the parameter on the violation degree.
One can see that there are always around 10% of subjects who answered a different option than the one we expected. When some of the subjects were asked about the reasons for their answers, many of them said it was a mistake. We realize now that it was difficult for the subjects to keep the level of concentration needed to mark < or > correctly on all the questions. The results prove beyond any doubt that the parameters external distance, overlapping size, and crossing length are consistently used by the subjects to evaluate the violation degree (hypothesis H1). We performed Student's t-test to evaluate the significance of our results and found that the average percentage of correct answers for the parameter external distance was significant at the 99% level. In the same way, the results showed that for the parameters crossing length and overlapping size the average percentage we obtained is significant at the 95% level.
Results regarding the parameter internal distance are more difficult to analyze. For questions 2 and 23 there was a large percentage of subjects (around 40%) who did not answer as we expected. Figure 3(a) shows question number 2. When asked why they answered this way, the subjects said that the geometries were 'more disjoint' when the internal distance between them was larger. We believe that this was due to a misunderstanding of the topological relation Disjoint.
Question 7 was another case where many subjects gave unexpected answers due to a misunderstanding of the topological relation Overlaps. When asked why they considered that both figures have the same degree of violation, many subjects answered that in both cases there was no violation, because when a geometry is within another geometry they also overlap each other.
After eliminating questions 2, 7, and 23, where there was some misinterpretation, Student's t-test shows that the average percentage is significant at the 85% level. This means that the internal distance parameter affects the perception of consistency by subjects. However, further analysis must be performed in the future to better understand why the internal distance is considered important in some cases and not in others.
Finally, as H2 states, the touching length is not a useful parameter to evaluate the degree of violation of a topological constraint. Only question 11 shows a higher percentage of subjects who considered the impact of touching length on the violation degree important. However, as can be seen in Figure 3(b), this question was the only one where the geometries in each figure were not exactly the same. Thus, there may be factors other than touching length involved in the subjects' answers.
The results for Section 2 indicate that 35%, 35%, and 30% of the subjects considered that the size of the geometries in conflict had a positive, equal, or negative impact on the violation degree, respectively. These results do not support our hypothesis H3, but they also do not support the alternative hypothesis that the relative size has no impact or a negative impact on the violation degree. Therefore, we cannot draw any conclusion about the influence of the context on the evaluation of the violation degree. These results are in concordance with the results obtained in Section 3.
Finally, for each question in Section 3, we computed the average score given by the 60 subjects and the value of our measure. Then, we computed the Pearson correlation between both series of values. The correlation coefficient equals 0.54. Given that this is a very small value, we excluded the relative weight from the computation of our measures and obtained a correlation coefficient of 0.84. This result supports the conclusion that we drew from Section 2. We can conclude that the relative size of the geometries is not considered to be important by the subjects, or at least that the subjects consider the magnitude of the conflicts more important than the relative size of the geometries with respect to other objects in the dataset.
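The correlation reported here can be reproduced with a small helper. The sketch below computes the Pearson coefficient between the two series (the subjects' average scores and the computed measure values); the variable names in the final comment are placeholders for the study's data.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# e.g. pearson(average_subject_scores, computed_violation_degrees)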
5 Conclusions
We obtained two types of conclusions from this work: some related to the definition of the measures and others related to the methodology for evaluating these measures. Overall, it is clear that the use of parameters such as external distance and overlapping size allows us to discriminate situations that a semantic distance approach to comparing topological relations would otherwise overlook. Unless we consider the particularities of the geometries, a pair of geometries holding the same topological relation will always have the same degree of violation with respect to a different expected topological relation.
The results of the empirical evaluation indicate that the parameters that define our measures agree with the human perception of the violation degree. The only one that was not fully confirmed was the internal distance, which requires further evaluation. Contrary to our expectation, the results also indicate that the relative size of the geometries in conflict with respect to other geometries in the dataset has less impact on the evaluation of the violation degree than we expected. This is confirmed by the increase in the correlation of the scores given by the users in Section 3 of the test when we eliminated the effect of the relative size of geometries from the measures.
We confirmed that the design of the test is critical. There are two basic problems that need to be solved for future empirical evaluations: the difficulty of the task and the knowledge the subjects need about topological relations.
Regarding the difficulty of the task, the questions in the test require a high level of concentration. This explains the high number of mistakes we found in the questions of Section 1. On the other hand, in Section 3 the task was easier because only a score was requested. However, the subjects complained about the difficulty, and many of them moved back and forth changing the scores while answering the questions.
The problem of the knowledge about the topological relations is harder to solve. Explaining the meaning of the topological relations before the test does not guarantee that the subjects use these definitions instead of their own interpretations. For instance, some of the subjects considered that two surfaces that overlap also touch, or that two surfaces, one within the other, also overlap. The only way to avoid this problem is to train the subjects in the meaning of the topological relations. However, it may be difficult to do this without instructing them in our view of the parameters that define the measure of the violation degree. Probably the safest way to tackle this problem is to select the figures and their relations very carefully, to prevent subjects from misunderstanding the topological relations.
References
1. Berretti, S., Del Bimbo, A., Vicario, E.: The computational aspect of retrieval by spatial arrangement. In: Intl. Conference on Pattern Recognition (2000)
2. Bravo, L., Rodríguez, M.A.: Semantic integrity constraints for spatial databases. In: Proc. of the 3rd Alberto Mendelzon Intl. Workshop on Foundations of Data Management, Arequipa, Peru, vol. 450 (2009)
3. Egenhofer, M., Franzosa, R.: Point Set Topological Relations. IJGIS 5, 161–174 (1991)
4. Egenhofer, M., Shariff, A.: Metric details for natural-language spatial relations. ACM Transactions on Information Systems 16(4), 295–321 (1998)
5. Hadzilacos, T., Tryfona, N.: A Model for Expressing Topological Integrity Constraints in Geographic Databases. In: Frank, A.U., Formentini, U., Campari, I. (eds.) GIS 1992. LNCS, vol. 639, pp. 252–268. Springer, Heidelberg (1992)
6. OpenGIS: OpenGIS Simple Features Specification for SQL. Tech. rep., Open GIS Consortium (1999)
7. Papadias, D., Mamoulis, N., Delis, V.: Algorithms for querying spatial structure. In: VLDB Conference, pp. 546–557 (1998)
8. Randell, D., Cui, Z., Cohn, A.: A spatial logic based on regions and connection. In: Nebel, B., Rich, C., Swarthout, W. (eds.) Principles of Knowledge Representation and Reasoning, pp. 165–176. Morgan Kaufmann, San Francisco (1992)
9. Rodríguez, M.A., Brisaboa, N.R., Meza, J., Luaces, M.R.: Measuring consistency with respect to topological dependency constraints. In: 18th ACM SIGSPATIAL Intl. Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, San Jose, CA, USA, pp. 182–191 (2010)
The Neighborhood Configuration Model: A Framework to Distinguish Topological Relationships between Complex Volumes
1 Introduction
systems, and qualitative spatial reasoning. From a database and GIS perspective, their development has been motivated by the need for formally defined topological predicates as filter conditions for spatial selections and spatial joins in spatial query languages, and as a support for spatial data retrieval and analysis tasks.
The central conceptual approach, upon which almost all publications in this field have been based, is the 9-intersection model (9IM) [1]. This model checks the nine intersections of the boundary, interior, and exterior of a spatial object with the respective components of another spatial object for the topologically invariant criterion of non-emptiness. Extensions have been proposed to obtain more fine-grained topological predicates. However, the main focus has been on spatial objects in 2D space; the study of topological relationships between spatial objects in 3D space has been rare. An available strategy is to apply 9IM based models and to investigate the total number of relationships that can occur in reality between spatial objects in 3D space. However, the third dimension introduces more complicated topological situations between 3D spatial objects. When directly applying 9IM based models to 3D complex spatial objects like volumes, they suffer from a major problem, which we call the high granularity problem. That is, the 9IM considers the interior, exterior, and boundary point sets as the basic elements for empty and non-empty intersection tests, which ignores the fact that the interior, exterior, and boundary of a spatial object are themselves complex spatial object parts and may have multiple components. Thus, the interaction between any pair of basic elements from two spatial objects can be complex, and empty or non-empty intersection results may not be enough to describe such interactions. For example, the boundary of a volume object is a closed surface object, which may have multiple components. Thus, the interaction between the boundaries of two volume objects is equivalent to the interaction between two complex surface objects, which can touch at a point, meet at a face, cross each other, or have touch, meet, and cross interactions coexisting on one or more components. Since 9IM based models do not have the capability to handle these interaction details for their basic elements, a large number of topological relationships between complex volumes are not distinguished.
In this paper, we propose a new framework based on point set theory and point set topology to model topological relationships between two complex volume objects. We overcome the problems raised for the 9IM by investigating the interactions between two volumes at a much lower granularity. Instead of checking the intersections of the interior, boundary, and exterior of two volumes, we investigate the interaction of the two volumes within the neighborhood of any point in the Euclidean space where the two volumes rest, which we call the neighborhood configuration. We explore all possible neighborhood configurations and define corresponding Boolean neighborhood configuration flags. By evaluating and composing the values of the neighborhood configuration flags for a given scenario with two complex volumes, we can obtain a binary encoding of the topological relationship between the two volumes. We show that our model yields more relationships, and thus a more fine-grained characterization of topological relationships between two complex volumes, compared to the 9IM.
Fig. 1. The 9-intersection matrix, whose nine entries are tested for emptiness:
( ∂A ∩ ∂B   ∂A ∩ B°   ∂A ∩ B⁻ )
( A° ∩ ∂B   A° ∩ B°   A° ∩ B⁻ )
( A⁻ ∩ ∂B   A⁻ ∩ B°   A⁻ ∩ B⁻ )
2 Related Work
A special emphasis of spatial research has been put on the exploration of topological relationships (for example, overlap, inside, disjoint, meet) between spatial objects. An important approach for characterizing them rests on the so-called 9-intersection model, which employs point set theory and point set topology [1]. The model is based on the nine possible intersections of the boundary (∂A), interior (A°), and exterior (A⁻) of a spatial object A with the corresponding components of another object B. Each intersection is tested with regard to the topologically invariant criteria of emptiness and non-emptiness. The topological relationship between two spatial objects A and B can be expressed by evaluating the matrix in Figure 1. A total of 2^9 = 512 different configurations are possible, of which only a certain subset makes sense depending on the combination of spatial data types considered. Several extensions based on the 9IM exist. Examples are the dimensionality extensions in [2,3], the Voronoi-based extensions in [4], and the extensions to complex spatial objects in [5].
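In practice the nine emptiness tests can be read off the DE-9IM string exposed by common geometry libraries. The sketch below uses the shapely library (an external assumption, not part of this paper) to build the Boolean 3x3 matrix of non-empty intersections; note that the DE-9IM row/column order is interior, boundary, exterior, which differs from the boundary-first listing in the text above.

from shapely.geometry import Polygon  # assumed dependency for this sketch

def nine_intersection_matrix(a, b):
    """3x3 Boolean matrix of non-empty intersections between two geometries."""
    # shapely's relate() returns the DE-9IM string, e.g. '212101212';
    # the character 'F' marks an empty intersection.
    de9im = a.relate(b)
    flags = [c != "F" for c in de9im]
    return [flags[0:3], flags[3:6], flags[6:9]]

# Two overlapping squares as a small usage example.
a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])
print(nine_intersection_matrix(a, b))   # an 'overlap' configuration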
A topic that has been only partially explored in a formal way at the abstract level is that of topological relationships between simple 3D spatial objects. In [6], the author applies the 9-intersection model to simply-connected 3D spatial objects, that is, simple 3D lines, simple surfaces (no holes), and simple volumes (no cavities), in order to determine their topological relationships. A total of 8 topological relationships are distinguished between two simple volumes. Zlatanova has also investigated the possible 3D topological relationships in [7] by developing a complete set of negative conditions. The 9-intersection model for 3D can be extended with the dimensionality being considered. In [8], the values of the matrix elements are extended: besides the symbols for empty and non-empty intersections, a wildcard value is used to indicate an omitted specification for the resulting set at this position, and the numbers (0, 1, 2, 3) refer to the dimensionality of the resulting set. Therefore, unlike the 1:1 mapping between matrices and topological relationships in the 9-intersection model, a matrix that contains a wildcard value represents a class of topological relationships. As a result, the topological relationships are clustered and manageable for the user. However, these 9IM based models suffer from the
aforementioned high granularity problem due to the use of interior, exterior, and boundary sets as basic elements.
The above models have in common that they apply set theory to identify topological relationships; thus, they can be categorized as point-set based topological relationship models. There are also other approaches that do not employ point-set theory. The approach in [9] investigates topological predicates between cell complexes, which are structures from algebraic topology. It turns out that, due to limitations of cell complexes, the topological relationships between them are only a subset of the derived point-set based topological relationships. The topological relationships between 3D spatial objects that consist of a series of cell complexes can be described by the combination of relationships between those cells [10]. The Dimensional Model (DM) [11] is a model that is independent of the 9-intersection model. It defines dimensional elements on a spatial object, and all the dimensional elements contribute to the final result. Three levels of detail for the topological relationships are developed for different application purposes. The Dimensional Model can distinguish some cases, especially meet cases, that the 9-intersection model cannot identify. However, since it leaves the abstract topological space where only point sets are used, it is not clear how the dimensional elements can be constructed.
3.1 Overview
In this paper, we are interested in complex volumes that may contain cavities or multiple components. A formal definition of complex volume objects can be found in [12], which models volumes as special infinite point sets in the three-dimensional Euclidean space. Our approach is also based on point set theory and point set topology. The basic idea is to evaluate the values of a set of Boolean neighborhood configuration flags to determine the topological relationship between two volumes. Each neighborhood configuration flag indicates the existence or non-existence of a characteristic neighborhood configuration of the points in a given scenario. The neighborhood configuration of a point describes the ownerships of the points that are near the reference point. If the existence of a neighborhood configuration is detected, then the corresponding neighborhood configuration flag is set to true. For example, for a scenario that involves two volumes A and B, if there exists a point p whose neighboring points all belong to both A and B, then the corresponding neighborhood configuration flag exist_nei_in_overlap (see Definition 1(1)) is set to true. Later, this neighborhood
Definition 1. Let N(p) denote the neighborhood of a point p. For two volumes A and B, four basic neighborhood predicates record which kinds of points occur in N(p):
(1) ∃ x ∈ N(p): x ∈ A ∧ x ∈ B
(2) ∃ x ∈ N(p): x ∈ A ∧ x ∉ B
(3) ∃ x ∈ N(p): x ∉ A ∧ x ∈ B
(4) ∃ x ∈ N(p): x ∉ A ∧ x ∉ B
[Figure: example neighborhood configurations, panels (1)-(8) and further panels, of a point p with respect to two volumes A and B.]
this flag (exist_nei_contain_op1_op2) further indicates the existence of a 'meeting on face' topological relationship between two volume objects A and B.
Thus, we can use a 15 bit binary array to encode the topological relationship between A and B. The definition of the topological relationship encoding TRE is given as:
TRE(A, B) = (FV(A, B, 0), FV(A, B, 1), ..., FV(A, B, 14))
In the scenario of Figure 3a, nine flags evaluate to true, which are exist_nei_in_overlap, exist_nei_in_op1, exist_nei_in_op2, exist_nei_in_ext, exist_nei_contain_overlap_op1, exist_nei_contain_overlap_op2, exist_nei_contain_op1_ext, exist_nei_contain_op2_ext, and exist_nei_contain_op1_op2_overlap_ext. To demonstrate the validity of the encoding, we have marked the corresponding points pi that validate the true value of flag F[i]. In Figure 3b, three additional flags, which are exist_nei_contain_op1_op2, exist_nei_contain_op1_op2_ext, and exist_nei_contain_op1_op2_overlap, become true due to the points p8, p11, and p14, respectively. Therefore, Figure 3 presents three different topological relationships that can be distinguished by our encoding: overlap, encoded by 111111001100001; overlap with meet on face, encoded by 111111011110011; and overlap with meet on edge, encoded by 111111001110001.
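The encoding itself is simply a concatenation of the 15 flag values. A minimal sketch, with the flag-index order F[0]..F[14] taken as given by the paper (the exact index assignment is not reproduced here), is:

def topological_relationship_encoding(flags):
    """Concatenate the 15 Boolean neighborhood-configuration flags into a bit string.

    flags -- sequence of 15 Booleans ordered as F[0]..F[14] in the paper.
    """
    if len(flags) != 15:
        raise ValueError("expected exactly 15 neighborhood configuration flags")
    return "".join("1" if f else "0" for f in flags)

# e.g. the overlap scenario of Fig. 3a would yield '111111001100001'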
As a result, we obtain a total of 2^15 = 32768 possible topological relationship encoding values, which implies a total of 32768 possible topological relationships between two volume objects. However, not all encoding values represent valid topological relationships. We call a topological relationship encoding valid for two volumes if, and only if, it can be derived from a real world scenario that involves two volume objects. For example, there does not exist a real world scenario with the topological relationship encoding 000000000000000. Thus, the open questions are now (i) which are the valid topological relationship encodings for two complex volumes, (ii) which are the valid topological relationship encodings for two simple volumes, and (iii) what their numbers are. However, due to space limitations, our goal in this paper is only to propose the basic modeling strategy; thus, we leave these questions open.
[Figure content: for each of the eight relationships, the corresponding 9IM matrix and NCM encoding, e.g. 011100001100000.]
Fig. 4. The 8 topological relationships between two volumes that can be distinguished
with both 9IM and NCM
[Figure content: the four cases share the identical all-non-empty 9IM matrix but have the distinct NCM encodings 111111011110011, 111111011110001, 111111101101001, and 111111111111001.]
Fig. 5. The 4 topological relationships that can be distinguished with NCM but not
9IM
References
1. Egenhofer, M.J., Herring, J.: A Mathematical Framework for the Definition of Topological Relationships. In: Int. Symp. on Spatial Data Handling, pp. 803–813 (1990)
2. Clementini, E., Felice, P.D., Oosterom, P.: A Small Set of Formal Topological Relationships Suitable for End-user Interaction. In: 3rd Int. Symp. on Advances in Spatial Databases, pp. 277–295 (1993)
3. McKenney, M., Pauly, A., Praing, R., Schneider, M.: Dimension-refined Topological Predicates. In: 13th ACM Symp. on Geographic Information Systems (ACM GIS), pp. 240–249 (2005)
4. Chen, J., Li, C., Li, Z., Gold, C.: A Voronoi-based 9-intersection Model for Spatial Relations. International Journal of Geographical Information Science 15(3), 201–220 (2001)
5. Schneider, M., Behr, T.: Topological Relationships between Complex Spatial Objects. ACM Trans. on Database Systems (TODS) 31(1), 39–81 (2006)
6. Egenhofer, M.J.: Topological Relations in 3D. Technical report (1995)
7. Zlatanova, S.: On 3D Topological Relationships. In: 11th Int. Conf. on Database and Expert Systems Applications (DEXA), p. 913 (2000)
8. Borrmann, A., van Treeck, C., Rank, E.: Towards a 3D Spatial Query Language for Building Information Models. In: Proceedings of the Joint International Conference for Computing and Decision Making in Civil and Building Engineering (2006)
9. Pigot, S.: Topological Models for 3D Spatial Information Systems. In: International Conference on Computer Assisted Cartography (Auto-Carto), pp. 368–392 (1991)
10. Guo, W., Zhan, P., Chen, J.: Topological Data Modeling for 3D GIS. Int. Archives of Photogrammetry and Remote Sensing 32(4), 657–661 (1998)
11. Billen, R., Zlatanova, S., Mathonet, P., Boniver, F.: The Dimensional Model: a Framework To Distinguish Spatial Relationships. In: Int. Symp. on Advances in Spatial Databases, pp. 285–298 (2002)
12. Schneider, M., Weinrich, B.E.: An Abstract Model of Three-Dimensional Spatial Data Types. In: 12th ACM Symp. on Geographic Information Systems (ACM GIS), pp. 67–72 (2004)
Reasoning with Complements
Max J. Egenhofer
1 Introduction
Associations between and aggregations of spatial objects are often specified by integrity constraints to ensure consistent semantics. The consideration of spatial relations as part of such integrity constraints offers a focus that is innate to the modeling and analysis of geospatial semantics. Such spatial relations may be applied at the instance level (e.g., Orono is inside the State of Maine) or in an ontology or a conceptual schema at the class level (e.g., islands are surrounded by waterbodies). The spatial relations used are typically qualitative spatial relations, which abstract away the myriad of quantitative detail while capturing comprehensively some of the most critical traits of how individuals are related to each other. This paper focuses on the consistency of such spatial integrity specifications, particularly for the case when two constraints, c1 and c2, are applied between three classes A, B, C (i.e., A c1 B and B c2 C), as such scenarios give rise to an implied consistency constraint c3 that holds between A and C.
The mere specification of a spatial relation at the class level is typically insufficient, as it does not quantify the relation. Therefore, associations are often supplemented with their cardinalities, specifying the relations as n:m (with n and m typically taking non-negative integer values). For a relation specified between classes A and B, such cardinality specifications yield 'multiple' (each instance of class A may be
related to 0, 1, or many instances of class B), 'singular' (each instance of class A has exactly one instance of class B), or 'optional' (each instance of class A may be related to one instance of class B) associations and aggregations. The use of such annotations has become standard practice in conceptual schema design for spatial applications [1,2,9,10] to model spatial properties at the class level, but without extensive quantification it often still lacks the ability to capture the intended semantics fully. A comprehensive classification of constraints on relations has identified a set of 17 abstract class relations [7,8], which apply to associations, specifying concisely how a specific binary relation (e.g., a topological relation) must be distributed among the participating objects in order to yield a valid representation. For instance, an island cannot exist without a surrounding water body, which requires that each island has a surroundedBy relation with respect to a waterbody (captured by a left-total specification). Such spatial integrity constraints may be nested. For instance, the additional constraint that any lake on an island must be inside a single island (a constraint that is left-total and injective) implies that there is also the topological relation surroundsWithoutContact between the lake on the island and the waterbody in which the island is located. This paper addresses how to determine correctly not only the implied spatial relation, but also the implied abstract class relation.
The remainder of the paper is structured as follows: Section 2 briefly summarizes the 17 abstract class relations. Section 3 introduces the specification of the complement constraint, and Section 4 incorporates the complement constraint into the derivation of compositions. The paper closes with conclusions in Section 5.
[Figure content: a table in which each row is one of the 17 abstract class relations (RD, LT, RT, LD.RD, LD.LT, LD.RT, RD.LT, RD.RT, LT.RT, LT.RT-all, LD.RD.LT, LD.RD.RT, LD.LT.RT, RD.LT.RT, LD.RD.LT.RT, some, and LD) and each column one of the six base constraints, with each cell marking the base constraint as required or negated.]
Fig. 1. Specification of the 17 abstract class relations [7] in terms of six base constraints: 'x' marks a required base constraint, while '–' stands for a negated constraint. For instance, LD fully develops to R_LD(A,B) := R_Left-D(A,B) ∧ ¬R_Right-D(A,B) ∧ ¬R_all-1(A,B) ∧ ¬R_all-2(A,B)
The relations to which the abstract class relations are applied are the set of eight topological relations between two regions (Figure 2), which derives from the 4-intersection [6]. The disjunction of all eight relations is referred to as the universal topological relation.
Fig. 2. The eight topological relations between two regions in ℝ² with the relations' labels
Fig. 3. (a) The complete bigraph formed by the LT-RT-all relation between all instances of
class A and class B, and (b) the constraint that any disc-like region embedded in ℝ² must be
related by exactly one of the eight topological region-region relations to another disc-like
region B
Associations typically specify an entity's relation only with respect to its host.
For example, building (inside ∨ coveredBy)_RD.LT landParcel captures each
building's relation with the one land parcel that it is built on (excluding such special
cases as a building being partially built on a neighboring lot or a building's footprint
extending over an entire land parcel). The integrity constraint RD.LT also captures
that each building must have the relation with respect to a land parcel (because the
constraint is left-total), that there are also land parcels that have no building on them
(because the constraint is not right-total), and that a land parcel may have multiple
buildings on it (because the constraint is not left-definite). It does not include,
however, a specification of the relation and the integrity constraint with all land
parcels other than the building's host. Such a relation exists, however, between each
building and each land parcel other than the building's host. For instance, if building
a1 is coveredBy landParcel b1, and b1 meets landParcel b2, then building a1 and
landParcel b2 would either be disjoint or they would meet. On the other hand, if
building a2 is inside landParcel b1, then a2 would be disjoint from b2, without an
option that a2 and b2 meet.
The complement of a topological integrity constraint captures the topological
relations that must hold between all instances of the related classes other than the
hosts. As a topological integrity constraint, the complement has a topological
component and an integrity constraint. Its topological component follows from the
relations' compositions [4]. For building (inside ∨ coveredBy)_RD.LT landParcel the
topological complement is building (disjoint ∨ meet) landParcel. Since it applies to all
buildings, the complement constraint must be left-total. Only if all buildings were
inside their respective host parcels would the topological complement be
unambiguously disjoint.
With respect to the land parcels, the integrity constraint is ambiguous as well, as
three different scenarios may occur: (1) all buildings are on one land parcel, and there
exists a second land parcel (without a building on it); (2) all buildings are on one land
parcel, and there exist at least another two land parcels (each without a building); and
(3) buildings are on more than one land parcel. In the first case the complement of
RD.LT is RD.LT (Figure 4a), while in the second case RD.LT's complement is the
mere left-total constraint LT (Figure 4b). For the third case the complement of RD.LT
is left-total and right-total, yielding the constraint LT.RT. Thus the constraint building
(disjoint ∨ meet)_(LT.RT ∨ RD.LT ∨ LT) landParcel is the complement of the constraint
(inside ∨ coveredBy)_RD.LT (Figure 4c).
Fig. 4. Complement constraints for RD.LT: (a) LT, (b) RD.LT, and (c) LT.RT
In order to derive the complement for each of the 17 abstract class relations, the
bigraph of each constraint was represented by an ordered pair of strings of the
vertices' degrees [string1; string2], where string1 stands for the source and string2 for the
target. The sequence of the degrees within a string is immaterial. For instance, the two
graphs in Figure 4b are captured by the two strings [1,1; 2,0] and [1,1; 0,2]. These two
graphs are isomorphic, as captured by the same source strings and same target strings.
For any of the 17 constraints, its complete bigraph serves as the base for determining
its complement. The number of vertices in the source determines the degree of each
vertex in the complete bigraph's target, while the number of vertices in the target
determines the degree of each vertex in the complete bigraph's source. The
complement of a constraint c, denoted by c̄, is then the difference between the
degree strings of its complete bigraph and the constraint's degree string. For example,
the degree string of the constraint depicted in Figure 4a is [1,1; 2,0,0], and the degree
string of its complete bigraph is [3,3; 2,2,2]. The complement of [1,1; 2,0,0] is then
[3-1,3-1; 2-2,2-0,2-0], that is [2,2; 0,2,2].
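This degree-string computation can be pictured with a small sketch; the class and method names below are our own illustration, not code from the paper.

import java.util.Arrays;

/** Illustrative sketch: computing the complement of a constraint bigraph
 *  from its degree strings, as described above. */
public class DegreeStringComplement {

    /** Degrees of the complete bigraph: every source vertex connects to all
     *  target vertices and vice versa. */
    static int[][] completeBigraph(int sources, int targets) {
        int[] src = new int[sources];
        int[] tgt = new int[targets];
        Arrays.fill(src, targets);   // each source vertex has degree |target|
        Arrays.fill(tgt, sources);   // each target vertex has degree |source|
        return new int[][] { src, tgt };
    }

    /** Complement = degree string of the complete bigraph minus the
     *  constraint's degree string, component by component. */
    static int[][] complement(int[] srcDegrees, int[] tgtDegrees) {
        int[][] complete = completeBigraph(srcDegrees.length, tgtDegrees.length);
        int[] src = new int[srcDegrees.length];
        int[] tgt = new int[tgtDegrees.length];
        for (int i = 0; i < src.length; i++) src[i] = complete[0][i] - srcDegrees[i];
        for (int j = 0; j < tgt.length; j++) tgt[j] = complete[1][j] - tgtDegrees[j];
        return new int[][] { src, tgt };
    }

    public static void main(String[] args) {
        // The example from the text: [1,1; 2,0,0] over 2 source and 3 target vertices.
        int[][] c = complement(new int[] {1, 1}, new int[] {2, 0, 0});
        System.out.println(Arrays.toString(c[0]) + "; " + Arrays.toString(c[1]));
        // prints [2, 2]; [0, 2, 2]
    }
}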
The complements of all 17 abstract class relations (Figure 5) were derived from all
respective bigraphs up to cardinality 5. No additional complements were found
beyond cardinality 4. A proof of completeness is left for future work.
c → c̄
LD.RD → LT.RT
LD → LT.RT
RD → LT.RT
some → LT.RT
LD.RD.LT → LD.RD.LT, LD.LT, LT.RT
LD.RD.RT → LD.RD.RT, RD.RT, LT.RT
LD.RT → LD.RT, RT, LT.RT
RD.LT → RD.LT, LT, LT.RT
LD.LT → LD.RD.LT, LD.LT, LT.RT
RD.RT → LD.RD.RT, RD.RT, LT.RT
LT → RD.LT, LT, LT.RT
RT → LD.RT, RT, LT.RT
LD.RD.LT.RT → LD.RD.LT.RT, LT.RT
LD.LT.RT → LD.LT.RT, LT.RT
RD.LT.RT → RD.LT.RT, LT.RT
LT.RT → all class relations except LT.RT-all
LT.RT-all → empty constraint
empty constraint → LT.RT-all
Fig. 5. The complements c̄ of the 17 abstract class relations c (plus, for completeness, the
empty constraint)
Four constraints (LD.RD, LD, RD, and some) have a unique, non-trivial
complement. LT.RT-all (which forms the complete bigraph) has the empty graph as
its complement. At the other end of the spectrum is the complement of LT.RT, which is
the universal constraint relation except for LT.RT-all. The remaining eleven
constraints' complements are each of cardinality 3, always including LT.RT.
Accounting fully for the complement relations turns the calculation of the
composition of two topological integrity constraints into a multi-step process. Two
topological integrity constraints t1_c1 and t2_c2 yield the composition of the actual
relations (Eqn. 3a) as well as the three compositions involving their complements
t̄1_c̄1 and t̄2_c̄2 (Eqns. 3b-d).

t3_c3 = (t1 ; t2)_(c1 ; c2) = t1_c1 ; t2_c2    (3a)
t4_c4 = (t1 ; t̄2)_(c1 ; c̄2) = t1_c1 ; t̄2_c̄2    (3b)
compositions remain inconclusive (Eqns. 5a and 5b). For the time being, only
conclusive complement-composition inferences will be used. The role of inconclusive
inferences will be explored in the future.

⊤_(LT.RT ∨ LT.RT-all) = (disjoint ; (⊤ \ contains))_(LT.RT ; (LD.RD.LT.RT ∨ LT.RT))    (5a)
⊤_(RD ∨ some ∨ RD.RT ∨ RT) = (disjoint ; (⊤ \ contains))_(some ; (LD.RD.LT.RT ∨ LT.RT))    (5b)
disjoint_LT.RT = (disjoint ; contains)_(LT.RT ; LD.RD.LT.RT)    (5c)
The next step is the attempt to augment a base inference (e.g., Eqn. 3a) with the result
of a conclusive complement composition (e.g., Eqn. 5c). The topological integrity
constraint t3_c3 is the base, and class relation c3 may be augmented if any of the
three compositions with the complements results in the topological relation t3 as well
(i.e., t3 = t4 or t3 = t5 or t3 = t6). In such cases, the augmented class relation is then the
result of combining the corresponding class relation with c3 (Eqn. 6).

c3 ⊕ ci   if t3 = ti,  4 ≤ i ≤ 6    (6)
For the composition of meet_some with contains_LD.RD.LT.RT, the composition of the
complement of meet_some with contains_LD.RD.LT.RT (Eqn. 5c) has the same resulting
topological relation disjoint; therefore, disjoint_some may be augmented
by disjoint_LT.RT. Since LT.RT is the complement of the class relation some (Figure 5),
the augmentation of these two class relations (some ⊕ LT.RT) yields LT.RT-all for
the topological relation disjoint (Eqn. 7).

disjoint_LT.RT-all = meet_some ; contains_LD.RD.LT.RT    (7)
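The multi-step procedure can be pictured with the sketch below; it is illustrative only, and Constraint, compose, and the placeholder lookup methods are hypothetical stand-ins for the composition table of [4], the class-relation composition of [7,8], and the augmentation of Figure 6.

/** Illustrative sketch of the multi-step composition and augmentation
 *  described above (Eqns. 3a-d and 6); not the paper's artifacts. */
final class ConstraintComposer {

    record Constraint(String topo, String cls) {}

    // Composition of two topological integrity constraints (cf. Eqn. 3a).
    static Constraint compose(Constraint a, Constraint b) {
        return new Constraint(composeTopo(a.topo(), b.topo()),
                              composeClass(a.cls(), b.cls()));
    }

    // Augment the base result t3_c3 with a complement composition ti_ci
    // whenever both yield the same topological relation (cf. Eqn. 6).
    static Constraint augmentIfSameTopo(Constraint base, Constraint complementComposition) {
        if (base.topo().equals(complementComposition.topo())) {
            return new Constraint(base.topo(),
                                  augmentClass(base.cls(), complementComposition.cls()));
        }
        return base;
    }

    // Placeholders for the composition tables and the augmentation (least upper
    // bound in Figure 6); the constants merely reproduce the worked example
    // meet_some ; contains_LD.RD.LT.RT.
    static String composeTopo(String t1, String t2)  { return "disjoint"; }
    static String composeClass(String c1, String c2) { return "some"; }
    static String augmentClass(String c1, String c2) { return "LT.RT-all"; }
}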
cardinalities (#) of A and B are the same, while R_RD.LT.RT requires that
#(A) > #(B). Mäs [7] determined cardinality constraints for seven class relations
(5, 6, 9, 10, 13, 14, 15), whose 49 combinations create 24 cardinality conflicts.
Definiteness-totality conflict: Transitions from a relation that is a-definite.b-total
(a, b ∈ {left, right}) cannot yield a relation that contains a-definite.b-total, because
the definite property cannot be retained when adding an edge.
Definiteness increase: The addition of a relation to a class relation that is not a-
definite (a ∈ {left, right}) cannot turn that relation into an a-definite class relation.
Totality decrease: The addition of a relation to an a-total class relation
(a ∈ {left, right}) cannot turn it into a class relation that lacks the a-total property.
Complete-graph conflict: LT.RT-all is a complete bigraph, so that the addition of
another edge is impossible.
Out of the 289 transitions, only 80 are feasible, among them nine transitions between a
class relation and itself (LD.RD, LD, RD, some, RD.LT, LD.LT, LT, RT, LT.RT). Such
idempotent transitions have no effect on the augmentation of a derived base relation.
Among the remaining 71 transitions are 44 atomic transitions (i.e., they cannot be
obtained from successive combinations of other transitions) and 27 that can be obtained
by two or more successive applications of an atomic transition. The feasibility of all 44
transitions was confirmed by constructing for each transition two corresponding bigraphs
that differ only by the addition of one edge. All feasible transitions are comprehensively
captured by the transition graph (Figure 6), a directed conceptual neighborhood graph [5]
in the form of a partially ordered set, with LD.RD as the bottom element and LT.RT-all
as the top element. The augmentation of a class relation with another one is then the least
upper bound of these two relations within the transition poset (i.e., the least element of the
poset that lies above both class relations participating in the augmentation). This property
also shows that augmentation is commutative, as c ⊕ ci equals ci ⊕ c.
Fig. 6. The transition graph of the 17 class relations when adding an edge to a relation
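A least-upper-bound computation over such a transition poset could be sketched as follows; the class and the successor map are our own illustration (the actual edges are those of Figure 6) and are not part of the paper.

import java.util.*;

/** Illustrative sketch: augmentation as the least upper bound in the transition
 *  poset of Figure 6. The poset edges must be supplied by the caller. */
final class TransitionPoset {

    private final Map<String, Set<String>> successors; // direct transitions (adding one edge)

    TransitionPoset(Map<String, Set<String>> successors) { this.successors = successors; }

    /** All class relations reachable from c, including c itself (its upper set). */
    private Set<String> upperSet(String c) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(c));
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            if (seen.add(cur)) todo.addAll(successors.getOrDefault(cur, Set.of()));
        }
        return seen;
    }

    /** Augmentation c1 (+) c2 = least element of the common upper set. */
    String augment(String c1, String c2) {
        Set<String> common = upperSet(c1);
        common.retainAll(upperSet(c2));
        // the least upper bound is the common element whose upper set contains all others
        return common.stream()
                .filter(x -> upperSet(x).containsAll(common))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no least upper bound"));
    }
}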
5 Conclusions
In order to guarantee consistency in the specification of spatial integrity constraints,
not only the soundness of compositions must be considered, but also the
compositions with the constraints' complements. For Mäs's 17 abstract class relations
[7] we derived their complements, and developed two methods of augmenting spatial
integrity constraints. The first method, which applies when composition and
complement composition are complementary, leads to the complete graph. For the
second method, which applies to any pair of the 17 abstract class relations, we developed
the transition poset, a type of conceptual neighborhood graph that captures transitions
between class relations when adding an additional edge to the bigraph of a constraint.
References
1. Belussi, A., Negri, M., Pelagatti, G.: An ISO TC 211 Conformant Approach to Model Spatial Integrity Constraints in the Conceptual Design of Geographical Databases. In: Roddick, J., Benjamins, V.R., Si-Saïd Cherfi, S., Chiang, R., Claramunt, C., Elmasri, R.A., Grandi, F., Han, H., Hepp, M., Lytras, M.D., Mišić, V.B., Poels, G., Song, I.-Y., Trujillo, J., Vangenot, C. (eds.) ER Workshops 2006. LNCS, vol. 4231, pp. 100–109. Springer, Heidelberg (2006)
2. Borges, K., Laender, A., Davis, C.: Spatial Data Integrity Constraints in Object Oriented Geographic Data Modeling. In: Bauzer Medeiros, C. (ed.) 7th International Symposium on Advances in Geographic Information Systems, pp. 1–6. ACM, New York (1999)
3. Donnelly, M., Bittner, T.: Spatial Relations Between Classes of Individuals. In: Cohn, A.G., Mark, D.M. (eds.) COSIT 2005. LNCS, vol. 3693, pp. 182–199. Springer, Heidelberg (2005)
4. Egenhofer, M.: Deriving the Composition of Binary Topological Relations. Journal of Visual Languages and Computing 5(2), 133–149 (1994)
5. Egenhofer, M.: The Family of Conceptual Neighborhood Graphs for Region-Region Relations. In: Fabrikant, S.I., Reichenbacher, T., van Kreveld, M., Schlieder, C. (eds.) GIScience 2010. LNCS, vol. 6292, pp. 42–55. Springer, Heidelberg (2010)
6. Egenhofer, M., Franzosa, R.: Point-Set Topological Relations. International Journal of Geographical Information Systems 5(2), 161–174 (1991)
7. Mäs, S.: Reasoning on Spatial Semantic Integrity Constraints. In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS, vol. 4736, pp. 285–302. Springer, Heidelberg (2007)
8. Mäs, S.: Reasoning on Spatial Relations between Entity Classes. In: Cova, T.J., Miller, H.J., Beard, K., Frank, A.U., Goodchild, M.F. (eds.) GIScience 2008. LNCS, vol. 5266, pp. 234–248. Springer, Heidelberg (2008)
9. Pelagatti, G., Negri, M., Belussi, A., Migliorini, S.: From the Conceptual Design of Spatial Constraints to their Implementation in Real Systems. In: Agrawal, D., Aref, W., Lu, C.-T., Mokbel, M., Scheuermann, P., Shahabi, C. (eds.) 17th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, pp. 448–451. ACM, New York (2009)
10. Tryfona, N., Hadzilacos, T.: Logical Data Modeling of Spatio-Temporal Applications: Definitions and a Model. In: Eaglestone, B., Desai, B., Shao, J. (eds.) International Database Engineering and Applications Symposium, pp. 14–23. IEEE Computer Society, Los Alamitos (1998)
Towards Modeling Dynamic Behavior with
Integrated Qualitative Spatial Relations
1 Introduction
Situation Awareness in Dynamic Spatial Systems. Situation awareness
and geographic information systems (GIS) are gaining increasing importance in
dynamic spatial systems such as road traffic management (RTM). The main goal
is to support human operators in assessing current situations and, particularly, in
predicting possible future ones in order to take appropriate actions pro-actively.
The underlying data describing real-world entities (e.g., a tunnel) and their spatial
relations (e.g., inside, near), which together define relevant situations (e.g., a
traffic jam inside and near the boundary of a tunnel), are often highly dynamic
and vague. As a consequence, reliable numerical values are hard to obtain, which
makes qualitative modeling approaches better suited than quantitative ones [17].
Dynamic Behavior in Qualitative Spatial Calculi. Recently, ontology-
driven situation awareness techniques [1],[6] and qualitative approaches to mod-
eling the dynamic behavior of spatial systems [3] have emerged as a basis for
predicting critical situations from relations between objects. Such relations are
expressed by employing multiple relation calculi, each of them focusing on a cer-
tain aspect, such as topology [8], [20], size [13], or distance [15]. These calculi
This work has been funded by the Austrian Federal Ministry of Transport, Innovation
and Technology (BMVIT) under grant FIT-IT 829598.
2 Related Work
In this section, we discuss related work on modeling the dynamic behavior of
spatial systems with qualitative spatial reasoning approaches, focusing on those
approaches in the domain of GIS. In this discussion, we follow the common on-
tological distinction (cf. the SNAP/SPAN approach [14]) often applied in GIS
[12], [23] between the states of a system describing relations between entities
from a snapshot point-of-view, and the evolution between these states in terms
of occurrents, such as events and actions. Causal relations between states and
occurrents [12] comprise (i) qualification constraints defining preconditions for
states (i.e., states enable or disable other states, e.g., being smaller enables being
a part) and for occurrents (i.e., states allow or prevent occurrents, e.g.,
having very close boundaries enables becoming externally connected), whereas
(ii) frame constraints define effects of occurrents (i.e., occurrents cause other
occurrents, e.g., motion causes two objects becoming disrelated). In this paper,
we focus on qualification constraints for states and occurrents, since these are
the primary source of inter-calculi dependencies.
Many qualitative spatial reasoning approaches (e.g., [7], [10], [11], [21]) provide
or utilize a single qualitative spatial calculus modeling a particular aspect and
naturally encode qualification constraints in CNGs (i.e., each relation is a
qualification constraint for its neighboring relations). A slightly broader view is
applied in GIS [8], informally discussing states, in particular the size of objects,
as qualification constraints for relations. The same constraint is used in a modeling
framework for dynamic spatial systems [4] as a qualification constraint on
the transitions between relations. Arbitrary qualification constraints spanning
multiple qualitative spatial calculi are explicitly supported in Bhatt's approach
to modeling the dynamic behavior of spatial systems [3] in the form of so-called
axioms of interaction. However, this modeling approach lacks a taxonomy of
states and constraints. As a consequence, both must be provided by users of this
modeling framework, instead of being integrated within its ontology.
Focusing on the integration of multiple calculi, Gerevini and Renz [13] discuss
interdependencies between the Region Connection Calculus (RCC) and their
Point Algebra for describing size relations. These interdependencies describe
qualification constraints for states (i.e., relations) of one calculus in terms of
states of the other. For example, a relation TPP (tangential proper part) of
RCC entails a size relation < (i.e., the contained entity must be smaller than
the containing one). Using the same calculi (RCC and size), Klippel et al. [18]
investigated the impact of different size relationships on the relation transitions
in RCC induced by motion events, and the cognitive adequacy of these changes.
Since the interdependencies between topological and size relations are rather
obvious, however, providing a formal integration model has not been their focus.
Clementini et al. [5] present several algorithms for combining distance and
orientation relations from a compositional point of view (e.g., these algorithms
compute the composition of distance relations, given a known orientation
relation). In contrast, we focus on interpreting relations of a particular calculus as
qualification constraints for relations and/or transitions in other calculi.
In summary, existing works lack a model of space and of spatial primitive
pairs, preventing consistent integration of multiple calculi with major evolution
causes (motion, scaling, orientation, shape, cf. [8]). In the next section, we discuss
such a model along three spatial calculi modeling important aspects like topology
(RCC, [20]), distance of boundaries [15], and size [13].
particular aspect of the real world, some of their relations implicitly model other
aspects as well (i.e., these relations restrict the relations that can hold and the
transitions that can occur in another calculus). For instance, a topological non-
tangential proper part relation (NTPP) between two objects does not only define
that a particular object is contained in another one, but also implicitly defines
that the contained object must be smaller than the containing one [13]. Additionally,
real-world evolution abstracted to transitions in one calculus might be
modeled in more detail in another calculus. For example, a topological transition
from being disconnected (DC) to being externally connected (EC) in RCC is modeled
from a distance viewpoint [15] with a sequence of relations and transitions,
comprising transitions from being very far (VF) over far (F) and close (C) to being
very close (VC). We make such assumptions explicit by combining existing
calculi with qualification constraints modeling inter-calculi dependencies.
In order to define such qualification constraints in a consistent manner and
account for a plethora of different special cases, a mapping between relations
and the underlying spatial primitives, including their numerical representation,
is needed. For example, let us consider relations describing the spatial distance
between object boundaries. Since the boundary of an object implicitly defines
its size and center, the options concerning the distance between the boundaries
of two objects can only be narrowed by taking into account information about
their topological relationship, relative size, and distance of their centers: If one
object is known to be a proper part of the other one, a rather small object being
located at the center of a large object is regarded to be very far from the large
object's boundaries, whereas the same object with a large distance to the center
would result in the boundaries being considered to be very close. The boundaries
of two nearly equally-sized objects would be considered very close as well.
As the basis for determining the above-sketched variety of special cases making
up inter-calculi dependencies, we build upon Galton's approach [11] to deriving a
two-dimensional image of relations from the CNG of RCC, since this approach
covers the full space of possible region pairs. In such a two-dimensional image,
the topological relations between two spheres are encoded using the radii r1 and
r2 of the spheres along the x-axis (x = r1/(r1 + r2)) and the distance d between
their centers on the y-axis (y = d/(2(r1 + r2))). The relations DC (disconnected), EC
(externally connected), PO (partly overlapping), TPP (tangential proper part) and
its inverse TPPi, NTPP (non-tangential proper part) and its inverse NTPPi, as well
as EQ (equals) are defined in terms of these two measures in (1).
DC:   0.5 < y < 1            EC:    y = 0.5
PO:   |0.5 - x| < y < 0.5    EQ:    x = 0.5 ∧ y = 0
TPP:  0 < y = 0.5 - x        TPPi:  0 < y = x - 0.5
NTPP: y < 0.5 - x            NTPPi: y < x - 0.5        (1)
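A small illustration of how (1) can be evaluated programmatically is given below; it is our sketch under the normalization above (x = r1/(r1 + r2), y = d/(2(r1 + r2))), not code from the paper, and it uses exact comparisons for brevity where a practical implementation would need a small tolerance for the equality cases.

/** Illustrative sketch: classifying the RCC-8 relation of two spheres from
 *  their radii r1, r2 and center distance d, following the normalized
 *  coordinates of Fig. 1. */
final class Rcc8FromGeometry {

    static String classify(double r1, double r2, double d) {
        double x = r1 / (r1 + r2);
        double y = d / (2 * (r1 + r2));

        if (x == 0.5 && y == 0)    return "EQ";
        if (y > 0.5)               return "DC";
        if (y == 0.5)              return "EC";
        if (y > Math.abs(0.5 - x)) return "PO";
        if (y == 0.5 - x && y > 0) return "TPP";   // r1 < r2, boundaries touch internally
        if (y == x - 0.5 && y > 0) return "TPPi";  // r1 > r2, boundaries touch internally
        if (y < 0.5 - x)           return "NTPP";
        return "NTPPi";                            // remaining case: y < x - 0.5
    }

    public static void main(String[] args) {
        System.out.println(classify(1.0, 1.0, 3.0)); // DC: centers farther apart than r1 + r2
        System.out.println(classify(1.0, 3.0, 1.0)); // NTPP: small sphere well inside the large one
    }
}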
The resulting image of possible relations in RCC between intervals in ℝ, circular
regions in ℝ², and spheres in ℝ³ is depicted in Fig. 1.² Besides reflecting
² This is different from Galton [11], since we normalize both the x- and y-axis metric
with the sum of the radii to obtain a symmetric image.
Fig. 1. Combined image of topology, distance of boundaries, and size (cf. [11])
the CNG of RCC (neighboring relations in the CNG are neighboring regions,
lines, or points), this image encodes two interesting aspects of evolution: (i) the
implications of motion and scaling, and (ii) the dominance relationship between
relations (e.g., EQ being a point indicates that it can hold for a time instant,
whereas those relations denoted by regions must hold for a time interval).
Considering the impact of evolution, a point in this figure denoting a particular
relation of RCC moves along the x-axis when one of the spheres changes its
size with respect to the size of the other sphere (i.e., due to scaling), whereas
its movement along the y-axis is caused by motion of the centers (which in turn
is either due to motion of an entire sphere, or scaling). For example, consider
the black dot labelled Holds(rcc(o, o′), NTPPi, s) in the sector NTPPi, denoting
that o contains o′: in terms of size and distance, this dot means that o is
approximately twice the size of o′ (cf. 0.67 on the x-axis), and that their centers
are near each other. If o shrinks, the black dot moves along the x-axis to the left,
until o can no longer fully contain o′, leading to an overlap relation (represented
by the dot moving from NTPPi into PO). During this scaling, at a single time
instant the boundary of o touches the boundary of o′, represented by the dot
passing the line labeled TPPi (i.e., o′ is a tangential proper part of o). If o shrinks
even further, it will eventually be contained in o′ (i.e., the dot will move into
TPP). Now consider that the centers of o and o′ coincide (i.e., their distance is
zero): the same scaling event will then traverse EQ instead of TPPi, PO, and TPP.
We now define such a space representing relations as points for each of the
employed positional relation calculi (distance of boundaries and size). To begin
with, we discuss the integration of size, since the x-axis in Fig. 1 already expresses
size relationships in terms of the ratio of interval radii r1/(r1 + r2). The mapping
to a qualitative size calculus is straightforward: a ratio below 0.5 corresponds to
smaller (<), above 0.5 to larger (>), and one of exactly 0.5 to equal size (=).
Less obvious is the integration of the distance between boundaries. As a starting
point, we informally define that two objects are very close whenever the
boundaries meet, which is the case along the lines labeled EC, TPP, and TPPi,
as well as at the point EQ. To both sides of these lines and around the point
of topological equality, we define a region where the boundaries are still very
close to each other (e.g., 10% off in distance and size as used in Fig. 1). Since
we must consistently encode the CNG (represented by the sequence VF-F-C-VC),
to each side of VC a region C must follow, which itself neighbors regions F.
Finally, regions VF are positioned at the outermost and innermost sectors of the
image, neighboring only regions F. Considering PO in conjunction with VC, it
becomes obvious why our metrics are normalized. Let o be much larger than o′
(r1 ≫ r2) and o overlap with o′: their boundaries certainly should be regarded to
be very close to each other, since in comparison to the size of o the distance
between their boundaries is quite small (analogous assumptions hold for r1 ≪ r2).
This means that our image should be symmetric with respect to size equality
(x = 0.5), which cannot be achieved using an unnormalized metric. In (2) below,
we define the distance relation VC with respect to x = r1/(r1 + r2) and
y = d/(2(r1 + r2)). With analogous formalizations, C, F, and VF can be defined.
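As an illustration of this idea (and explicitly not the paper's definition (2)), a band of assumed width 10% around the lines EC, TPP, and TPPi and the point EQ could be encoded as follows; the class name, the helper, and the Euclidean check around EQ are our assumptions.

/** Illustrative sketch: treating the boundary distance as "very close" (VC)
 *  whenever the normalized point (x, y) lies within a band of width EPS around
 *  the lines EC (y = 0.5), TPP (y = 0.5 - x), TPPi (y = x - 0.5), or around the
 *  point EQ (x = 0.5, y = 0). */
final class BoundaryDistance {

    private static final double EPS = 0.10; // assumed 10% band, cf. Fig. 1

    static boolean veryClose(double x, double y) {
        boolean nearEC   = Math.abs(y - 0.5) < EPS;
        boolean nearTPP  = Math.abs(y - (0.5 - x)) < EPS;
        boolean nearTPPi = Math.abs(y - (x - 0.5)) < EPS;
        boolean nearEQ   = Math.hypot(x - 0.5, y) < EPS;
        return nearEC || nearTPP || nearTPPi || nearEQ;
    }
}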
1. Traffic jam grows. In the beginning, the area of road works (1: )
is much larger than the traffic jam. Since the traffic jam grows, it thereafter
extends beyond the area of road works, causing transitions in topology, distance,
and size. At the same time it remains externally connected to the on-ramp
(2: ).
2. Accident occurs. Next, an accident occurs at the end of the traffic jam,
further reducing traffic flow. Since the traffic jam (3: ) is still growing,
it soon completely contains the accident. In contrast, the accident and the area
of road works are both stationary, resulting in no evolution between them (4: ).
3. Ambulance drives towards accident. Finally, an ambulance drives towards
the nearly equally sized accident (5: ), indicated by the arrow
pointing downwards along the horizontal center and ending at EC. On its way
to the accident, the ambulance enters the much larger traffic jam (6: ).
Thus, their boundaries are considered to become very far from each other even
though the ambulance is within the traffic jam.
the intersection of the boundary of o and the interior of o′ is not empty, whereas
the intersection of the interior of o and the boundary of o′, as well as the
intersection of their boundaries, are empty. In terms of our model, for such a relation
the distance between the centroids of o and o′ must be smaller than the difference
between their radii (d < r2 - r1); hence the following must hold true:
d/(2(r1 + r2)) < 0.5 - r1/(r1 + r2) (i.e., inside is NTPP of RCC). In order
to integrate additional spatial aspects not representable with the spatial
primitives employed above (e.g., orientation of entities towards each other), a
generalization (e.g., in terms of higher-dimensional images) of the presented
abstraction in terms of radii and center distance of spatial primitives is still
necessary (e.g., considering orientation vectors). Likewise, in order to support the
multitude of different spatial primitives found especially in GIS (e.g., regions,
lines, points, as well as fuzzy approaches with broad boundaries), going beyond
the intervals, regions, and spheres utilized above, metrics for comparing spatial
primitives of different sorts must be defined (e.g., a line passing a region [7]).
Encoding of the Ontology with Semantic Web Standards. Since current
Semantic Web standards, in particular OWL 2, formalize ontologies using a
decidable fragment of first-order logic, an interesting further direction is to define
a mapping of the ontology excerpt expressed in terms of the Situation Calculus
into the concepts of OWL 2. For this, one can build on prior work in terms of
description logic rules [19] integrating rules and OWL. As a result, an integration
with Semantic-Web-based GIS would be an interesting option.
References
1. Baumgartner, N., Gottesheim, W., Mitsch, S., Retschitzegger, W., Schwinger, W.: BeAware! – situation awareness, the ontology-driven way. International Journal of Data and Knowledge Engineering 69(11), 1181–1193 (2010)
2. Baumgartner, N., Gottesheim, W., Mitsch, S., Retschitzegger, W., Schwinger, W.: Situation Prediction Nets – Playing the Token Game for Ontology-Driven Situation Awareness. In: Parsons, J., Saeki, M., Shoval, P., Woo, C., Wand, Y. (eds.) ER 2010. LNCS, vol. 6412, pp. 202–218. Springer, Heidelberg (2010)
3. Bhatt, M., Loke, S.: Modelling Dynamic Spatial Systems in the Situation Calculus. Spatial Cognition and Computation 8, 86–130 (2008)
4. Bhatt, M., Rahayu, W., Sterling, G.: Qualitative Simulation: Towards a Situation Calculus based Unifying Semantics for Space, Time and Actions. In: Proc. of the Conf. on Spatial Information Theory, Ellicottville, NY, USA (2005)
5. Clementini, E., Felice, P.D., Hernández, D.: Qualitative Representation of Positional Information. Artificial Intelligence 95(2), 317–356 (1997)
6. Cohn, A.G., Renz, J.: Qualitative Spatial Representation and Reasoning. In: Handbook of Knowledge Representation, pp. 551–596. Elsevier, Amsterdam (2008)
7. Egenhofer, M.: A Reference System for Topological Relations between Compound Spatial Objects. In: Proc. of the 3rd Intl. Workshop on Semantic and Conceptual Issues in GIS, Gramado, Brazil, pp. 307–316. Springer, Heidelberg (2009)
8. Egenhofer, M.: The Family of Conceptual Neighborhood Graphs for Region-Region Relations. In: Proc. of the 6th Intl. Conf. on Geographic Information Science, Zurich, Switzerland, pp. 42–55. Springer, Heidelberg (2010)
9. Freksa, C.: Conceptual neighborhood and its role in temporal and spatial reasoning. In: Proc. of the IMACS International Workshop on Decision Support Systems and Qualitative Reasoning, pp. 181–187 (1991)
10. Galton, A.: Towards a Qualitative Theory of Movement. In: Proc. of the Intl. Conf. on Spatial Information Theory: A Theoretical Basis for GIS. Springer, Heidelberg (1995)
11. Galton, A.: Continuous Motion in Discrete Space. In: Proc. of the 7th Intl. Conf. on Principles of Knowledge Representation and Reasoning, Breckenridge, CO, USA, pp. 26–37. Morgan Kaufmann, San Francisco (2000)
12. Galton, A., Worboys, M.: Processes and Events in Dynamic Geo-Networks. In: Rodríguez, M.A., Cruz, I., Levashkin, S., Egenhofer, M.J. (eds.) GeoS 2005. LNCS, vol. 3799, pp. 45–59. Springer, Heidelberg (2005)
13. Gerevini, A., Nebel, B.: Qualitative spatio-temporal reasoning with RCC-8 and Allen's interval calculus: Computational complexity. In: Proc. of the 15th European Conf. on Artificial Intelligence, Lyon, France, pp. 312–316. IOS Press, Amsterdam (2002)
14. Grenon, P., Smith, B.: SNAP and SPAN: Towards Dynamic Spatial Ontology. Spatial Cognition & Computation: An Interdisciplinary Journal 4(1), 69–104 (2004)
15. Hernández, D., Clementini, E., Felice, P.D.: Qualitative Distances. In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995. LNCS, vol. 988, pp. 45–57. Springer, Heidelberg (1995)
16. Hu, Y., Levesque, H.J.: Planning with Loops: Some New Results. In: Proc. of the ICAPS Workshop on Generalized Planning: Macros, Loops, Domain Control, Thessaloniki, Greece (2009)
17. Ibrahim, Z.M., Tawfik, A.Y.: An Abstract Theory and Ontology of Motion Based on the Regions Connection Calculus. In: Proc. of the 7th Intl. Symp. on Abstraction, Reformulation, and Approximation, Whistler, Canada, pp. 230–242. Springer, Heidelberg (2007)
18. Klippel, A., Worboys, M., Duckham, M.: Conceptual Neighborhood Blindness – On the Cognitive Adequacy of Gradual Topological Changes. In: Proc. of the Workshop on Talking about and Perceiving Moving Objects: Exploring the Bridge between Natural Language, Perception and Formal Ontologies of Space, Bremen, Germany. Springer, Heidelberg (2006)
19. Krötzsch, M., Rudolph, S., Hitzler, P.: Description Logic Rules. In: Proc. of the 18th European Conf. on Artificial Intelligence, pp. 80–84. IOS Press, Amsterdam (2008)
20. Randell, D.A., Cui, Z., Cohn, A.G.: A Spatial Logic based on Regions and Connection. In: Proc. of the 3rd Intl. Conf. on Knowledge Representation and Reasoning. Morgan Kaufmann, San Francisco (1992)
21. Reis, R., Egenhofer, M., Matos, J.: Conceptual Neighborhoods of Topological Relations between Lines. In: Ruas, A., Gold, C. (eds.) Proc. of the 13th Intl. Symp. on Spatial Data Handling (2008)
22. Reiter, R.: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press, Cambridge (2001)
23. Worboys, M., Hornsby, K.: From Objects to Events: GEM, the Geospatial Event Model. In: Egenhofer, M.J., Freksa, C., Miller, H.J. (eds.) GIScience 2004. LNCS, vol. 3234, pp. 327–343. Springer, Heidelberg (2004)
Transforming Conceptual Spatiotemporal Model into
Object Model with Semantic Keeping
1 Introduction
Our work is part of a generic approach proposed to deal efficiently with urban data
within a GIS. Currently, the major shortcomings of a GIS lie in the spatiotemporal
data design and in the incomplete coverage of the temporal dimension and multi-
representation of data. For that, we propose a GIS based on the MADS conceptual
model (Modeling of Application Data with Spatio-temporal features) [1], [2] and on
an ODBMS (Object Database Management System) db4o (DataBase For Objects) [3]
for storing and manipulating data.
MADS is a conceptual model dedicated to spatiotemporal data modeling [1]. It is
based on the Entity-Association (EA) formalism. However, to take advantage of the
semantics offered by the MADS model and to achieve its coupling with a GIS, we
transform it into an object model. For that, a key step is the creation of
structures (generic classes) simulating the MADS concepts in the object language.
This step is followed by the definition of mapping rules (algorithms) that
allow the translation of specific conceptual schemas defined by MADS into the
corresponding object language.
We emphasize that, in our approach, we are specifically interested in the use of
an ODBMS. ODBMSs store objects in object databases, which are more flexible than
relational databases. In these ODBMSs, data objects are treated the same way as in the
programming language and access to data is by reference. With these models, we
can store and access any type of data without using (or extending) specific
manipulation languages like SQL. We believe that this gives advantages in the
formulation and understanding of queries (indeed, the way in which the data will be
handled is, in our case, planned by the conceptual schemas). Among the ODBMSs we
used db4o [3]. It is a free ODBMS based on the Java programming language that
provides facilities for persisting any data type (even complex ones) in a uniform,
automatic, and transparent way. Db4o uses simple mechanisms for manipulating data
based on methods of the Java language. On the other hand, the basic concepts offered
by the programming language (on which db4o is based) are less expressive than
those of MADS. Therefore, semantic-preservation verification methods are automatically
implemented, at different levels, during the transformation from the conceptual to the
physical model (programming language) to ensure the preservation of the semantics
imposed by the designer when creating conceptual MADS schemas.
In other words, we propose that spatiotemporal applications be designed
according to MADS schemas whose formal specifications later undergo a series of
systematic transformations towards operational object programs. The script generation
and the object database creation are viewed as sequences of these transformations.
In the rest of this paper, we will present in Section 2 a background on conceptual
spatiotemporal models. Then, in Section 3, we will present the general procedure for
implementing our MADS-based design in an object paradigm. We will detail
the transition rules to express the thematic, spatial, and temporal characteristics of
MADS schemas in an object language. We will conclude this paper and give some
perspectives of our work in the last section.
2 Background
Our state of the art focuses on spatiotemporal data design models since they are, for our
approach, the most important part, the one that governs the processing of information. For
this reason, we seek a semantically rich model that allows a good modeling of
spatiotemporal data and applications. The implementation of this model and the
adjustment of semantics during the implementation is the part of our approach that we
will detail later in this article. A conceptual data model (especially for urban
spatiotemporal data) must be able to offer users a wealth of expression to meet their
diverse needs and also enable them to implement readable and easy-to-understand data
schemas. In order to properly model spatiotemporal data, some requirements, besides
those related to traditional data modeling, must be taken into account during the
design phase. In fact, the conceptual model should allow modeling of spatial and
temporal dimensions (modeling the objects and events that occur in a field of study,
along with their interactions) as well as the temporal evolution of objects (history
tracking) [2], [4].
Several spatiotemporal modeling approaches that take into account these
requirements have already been proposed [5]. Most of them are based on modeling
techniques like EA and Object-Oriented (OO) used for traditional data modeling.
do not already exist. This first phase is to be done once and for all. The Java classes
created are called basic classes, and they will be the same regardless of the
considered case study.
Transformation of the conceptual schema: While the first phase deals with the
creation of basic classes, the second deals with the application schemas and
consists in transforming each MADS conceptual schema into an equivalent Java
program. This is achieved through the exploitation of the basic Java classes created
in the first phase. In this second phase, for each conceptual schema modeling
a specific application, a corresponding structure (Java program) is automatically
created. The program resulting from this transformation is created using the rules
of model transformation (described in the next paragraphs).
The orthogonality of the concepts and of the spatial and temporal dimensions of the
MADS model allows us to conduct separate studies of transformation for these
concepts and dimensions. Thus we begin by transforming the general concepts of
MADS and then we present the transformation of spatial and temporal dimensions.
becomes an instance of a new class declared inside the main class in order to simulate
this attribute. Java allows creating classes within enclosing classes. These inner
classes have a reference to the enclosing class and access to all its members
(fields, methods, other inner classes). In addition, they cannot be declared outside the
enclosing class (see the transformation of roofing into an inner class in Table 1).
MADS allows an attribute to take more than one value and provides several types
of collections to support this. MADS modeling of such a multi-valued attribute is
achieved by assigning a maximum cardinality equal to n. The collections defined by
MADS are list, set, and bag [2]. Java also offers the use of collections, with the
Collection, Set, List, and TreeList classes. Thus, we propose to establish a
correspondence between the MADS concepts collection, set, bag, and list and the Java
basic classes Collection, HashSet, ArrayList, and TreeList, respectively.
Subsequently, a MADS multi-valued attribute is translated into Java by an attribute
having an equivalent collection type and the same name (see the transformation of the
attribute owner in Table 1). While the maximum cardinality of an attribute
indicates whether the attribute is multi-valued or mono-valued, the minimum
cardinality specifies whether it is optional or mandatory. When the minimum
cardinality is equal to zero, the attribute is optional; otherwise it is mandatory. In
our proposal, the optional (mono- or multi-valued) attributes of MADS are treated the
same way as multi-valued attributes and are mapped to Java collections. The only
difference is that, in this second case, additional constraints must be enforced. In fact,
we must verify their existence when accessing the data. If the attribute is
optional and mono-valued, then another check is required (a method created and called
automatically) to ensure, before accessing the value of the attribute, that the attribute is
instantiated (see the transformation of the attribute otherName in Table 1).
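To illustrate these rules, the following hand-written sketch shows what such a generated class could look like for a Building with a composite attribute roofing, a multi-valued attribute owner, and an optional mono-valued attribute otherName (the attribute names follow the example of Table 1; the sub-attributes of roofing and the exact generated code are our assumptions).

import java.util.ArrayList;
import java.util.List;

/** Sketch of the kind of Java class the transformation rules described above
 *  would generate; not the tool's actual output. */
public class Building {

    /** Composite attribute "roofing" mapped to an inner class. */
    public class Roofing {
        String material;
        int    yearBuilt;
    }
    Roofing roofing = new Roofing();

    /** Multi-valued attribute "owner" (cardinality 1..n) mapped to a collection. */
    List<String> owner = new ArrayList<>();

    /** Optional mono-valued attribute "otherName" (cardinality 0..1), also kept
     *  in a collection so that its absence can be checked before access. */
    private final List<String> otherName = new ArrayList<>();

    boolean hasOtherName() {                 // automatically generated check
        return !otherName.isEmpty();
    }

    String getOtherName() {
        if (!hasOtherName()) throw new IllegalStateException("otherName is not instantiated");
        return otherName.get(0);
    }

    void setOtherName(String value) {
        otherName.clear();
        otherName.add(value);
    }
}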
chosen to make this correspondence since our target database is an object database
and hence access to objects is done "by reference"). If the association contains
attributes, then the transformation rules are the same as those of the Class concept.
In addition to the attributes defined in the association, we add new attributes (or lists
of attributes) to reference the objects of the classes involved in this association.
For the classes participating in the association, we also add references to the class
that simulates the association. The minimum and maximum cardinalities of each role
linking a class to the association specify the data structure we are going to use to store
the references (to the instances of the class denoting the association). This structure is
chosen in accordance with the rules used to transform an attribute with a minimum
and a maximum cardinality. In other words, for a class participating in an association,
this association can be simulated by an attribute with a cardinality equal to the
cardinality of the role that links the class to the association.
Table 2 shows the transformation of the association Contains linking Plot and
Building. The cardinalities of the roles indicate that a plot may contain many buildings
and a building is included in one plot. That gives us the Java classes Plot,
Building, and Contains. Building contains an attribute of type "Contains". Plot
contains n references (grouped in a collection of type ArrayList) to objects of class
"Contains". "Contains" contains two references, to Plot and Building.
The "generalization relationship" is one of the fundamental concepts of object
technology. It allows creation of specific classes that inherit attributes and methods of
other classes. The concept of generalization between classes and associations of
MADS is directly translated into the mechanism of inheritance between classes in
Java. Table 2 presents an example of the transformation of this relation.
MADS offers spatial data types (Point, Line, etc.) [2] to denote the shape of objects.
These spatial types can be assigned to classes, attributes, and associations.
In Java, we took advantage of the spatial library JTS [13], which takes into account
and enforces the specifications defined by the OGC (Open Geospatial Consortium) [14].
The spatial types of MADS are more detailed than those of JTS but, for the moment,
we decided to make a direct correspondence between the spatial types of MADS and
the JTS classes. Indeed, there are in MADS some types that have no equivalent in JTS,
for example "SimpleGeo" and "OrientedLine". In our transformation (Fig. 1) we have
not created classes for these types but matched them with the closest JTS classes.
(The creation of an exact equivalent of all MADS spatial types is feasible; it would be
done by adding some new classes to JTS and is one of our perspectives.)
Fig. 1. Correspondences between the spatial types of MADS and JTS classes
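For illustration, such a correspondence could be kept as a simple lookup table; the MADS type names and the chosen JTS classes below are examples (the authoritative mapping is the one given in Fig. 1), and the com.vividsolutions.jts package corresponds to the JTS distribution cited in [13].

import java.util.Map;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryCollection;
import com.vividsolutions.jts.geom.LineString;
import com.vividsolutions.jts.geom.Point;
import com.vividsolutions.jts.geom.Polygon;

/** Illustrative sketch of a MADS-to-JTS type mapping (entries are examples,
 *  not the complete correspondence of Fig. 1). */
final class MadsToJts {
    static final Map<String, Class<? extends Geometry>> MAPPING = Map.of(
            "Point",        Point.class,
            "Line",         LineString.class,
            "OrientedLine", LineString.class,       // no exact JTS equivalent, closest match
            "Surface",      Polygon.class,
            "SimpleGeo",    Geometry.class,         // no exact JTS equivalent, closest match
            "ComplexGeo",   GeometryCollection.class);
}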
MADS temporal types are used to denote precise dates (instants, intervals, etc.), the
lifecycle of objects, or the temporal evolution of spatial and thematic attributes.
These temporal types can be assigned to classes, attributes, and associations. They
are organized in a temporal hierarchy (see [2]) that has no direct equivalent in the core
classes of Java. Indeed, no existing Java time library can take into account the
general complex and life cycle types of MADS. This is why we have created this
structure and developed generic classes to simulate all the MADS temporal
types. However, if powerful enough datatypes were introduced into the Java language,
it would be easy to use them.
The classes we have created are semantically equivalent to and have the same names
as the MADS temporal types ("Instant", "Interval", "SimpleTime", etc.). Once these
classes are created, the second step is to transform into Java the MADS conceptual
schemas that use these temporal types. This transformation depends on whether the
temporality is assigned to the class, to the association, or to the attributes.
Temporality for Classes and Associations: By assigning a temporal type to a class
or an association, we keep track of the lifecycle of the instances of this class or
association.
Tracking the lifecycle of an instance means indicating its scheduling, creation, deletion, or
deactivation date. Each object can take, at a given date, one of the following four
statuses: planned, active, suspended, or destroyed. The active period of an object is the
temporal element associated with the active status in its life cycle [10].
The transformation of the temporality of a class or association (Table 4) is similar
to the transformation of a complex multi-valued attribute named "LifeCycle"
having as sub-attributes (see the sketch below):
an attribute "status", which takes one of the values planned, active, suspended, or
destroyed;
a temporal attribute of the same type as the class (or association).
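A minimal illustration of such a generated lifecycle structure follows; the class and field names (TemporalClass, LifeCycleEntry, validFrom) are ours, and a long value merely stands in for the MADS temporal types discussed above.

import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the lifecycle tracking described above. */
class TemporalClass {

    enum Status { PLANNED, ACTIVE, SUSPENDED, DESTROYED }

    /** One entry of the multi-valued "LifeCycle" attribute. */
    class LifeCycleEntry {
        Status status;
        long   validFrom;   // stand-in for a MADS temporal type (e.g., Instant)

        LifeCycleEntry(Status status, long validFrom) {
            this.status = status;
            this.validFrom = validFrom;
        }
    }

    final List<LifeCycleEntry> lifeCycle = new ArrayList<>();

    void changeStatus(Status status, long when) {
        lifeCycle.add(new LifeCycleEntry(status, when));
    }
}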
The associations between two temporal classes can have synchronization constraints (the
active period of the first class is, for example, before, after, during, or starts with that of
the second). A synchronization constraint is transformed into a static method that validates
the constraint in the Java class simulating the association.
4 Conclusions
In this paper, we have presented our method for implementing MADS into Java. In
fact, the structural concepts and the spatial and temporal dimensions of MADS are
presented and implemented in structures directly manipulated by db4o. Our
References
1. Parent, C., Spaccapietra, S., Zimányi, E., et al.: Modeling Spatial Data in the MADS Conceptual Model. In: Spatial Data Handling 1998 Conference Proceedings, Vancouver, BC, Canada, pp. 138–150 (1998)
2. Parent, C., Spaccapietra, S., Zimányi, E.: Conceptual Modeling for Traditional and Spatio-Temporal Applications: the MADS Approach. Springer, New York (2006)
3. db4o, http://www.db4o.com
4. Zimányi, E., Minout, M.: Preserving Semantics When Transforming Conceptual Spatio-temporal Schemas. In: Chung, S., Herrero, P. (eds.) OTM-WS 2005. LNCS, vol. 3762, pp. 1037–1046. Springer, Heidelberg (2005)
5. Pelekis, N., Theodoulidis, B., Kopanakis, I., Theodoridis, Y.: Literature review of spatio-temporal database models. Knowledge Engineering Review 19(3), 235–274 (2004)
6. Price, R.J., Tryfona, N., Jensen, C.S.: Extended Spatio-Temporal UML: Motivations, Requirements and Constructs. Journal on Database Management, Special Issue on UML 11(4), 14–27 (2000)
7. Bédard, Y.: Visual Modeling of Spatial Databases: Towards Spatial Extensions and UML. Geomatica 53(2), 169–186 (1999)
8. Fowler, M., Scott, K.: UML Distilled – Applying the Standard Object Modeling Language. Addison-Wesley, Reading (1998); ISBN 0-201-65783-X
9. Moisuc, B., Gensel, J., Davoine, P.A.: Designing adaptive spatio-temporal information systems for natural hazard risks with ASTIS. In: Carswell, J.D., Tezuka, T. (eds.) W2GIS 2006. LNCS, vol. 4295, pp. 146–157. Springer, Heidelberg (2006)
10. Minout, M.: Modélisation des Aspects Temporels dans les Bases de Données Spatiales. Thèse de doctorat, Université Libre de Bruxelles (2007)
11. Souleymane, T., De Sède-Marceau, M.H., Parent, C.: COBALT: a design tool for geographic and temporal data application. In: Proceedings of the 6th AGILE Conference (2003)
12. Zaki, C., Zekri, E., Servières, M., Moreau, G., Hégron, G.: Urban Spatiotemporal Data Modeling: Application to the Study of Pedestrian Walkways. In: Intelligent Spatial Decision Analysis (ISDA 2010), Inner Harbor, Baltimore, Maryland, USA (2010)
13. JTS, http://www.vividsolutions.com/jts/JTSHome.htm
14. OGC, http://www.opengeospatial.org/standards/sfa
Preface to FP-UML 2011
The Unified Modeling Language (UML) has been widely accepted as the standard
object-oriented language for modeling various aspects of software and information
systems. The UML is an extensible language in the sense that it provides mechanisms
to introduce new elements for specific domains; these include business modeling,
database applications, data warehouses, software development processes, and web
applications. Also, UML provides different diagrams for modeling different aspects
of a software system. However, in most cases, not all of them need to be applied.
Further, UML has grown more complex over the years, and new approaches are
needed to effectively deal with these complexities. In general, we need heuristics and
design guidelines that drive the effective use of UML in systems modeling and
development.
The Seventh International Workshop on Foundations and Practices of UML (FP-
UML'11) will be a sequel to the successful BP-UML'05 - FP-UML'10 workshops held
in conjunction with the ER'05 - ER'10 conferences, respectively. The FP-UML
workshops are a premier forum for researchers and practitioners around the world to
exchange ideas on the best practices for using UML in modeling and systems
development. For FP-UML'11, we received papers from nine countries: Poland,
Bosnia and Herzegovina, France, Germany, Israel, Mexico, Spain, Tunisia, and
United States. While the papers addressed a wide range of issues, the dominant topic
was model-driven architectures, including the various challenges related to
transformations and the complexities resulting from multi-model specifications.
The Program Committee selected three papers to include in the program. The first
paper by Brdjanin and Maric shows how associations in class diagrams can be
generated from activity diagrams. The second paper by Reinhartz-Berger and Tsoury
compares two core asset modeling methods, Cardinality-Based Feature Modeling and
Application-Based Domain Modeling, and discusses their benefits and limitations in
terms of specification and utilization capabilities. Finally, the third paper by Marth
and Ren introduces the Actor-eUML model for concurrent programming and
formalizes the mapping between actors in the Actor model and Executable UML
agents by unifying the semantics of actor behavior and the hierarchical state machine
semantics of Executable UML agents.
We thank the authors for submitting their papers, the program committee members
for their hard work in reviewing papers, and the ER 2011 organizing committee for all
their support.
1 Introduction
The UML activity diagram (AD) is a widely accepted business modeling notation
[1]. Several papers [2,3,4,5,6] take AD as the basis for (automated) conceptual
data modeling, but with modest achievements in building the class diagram (CD)
which represents the conceptual model. Emphasizing the insufficiently explored
semantic capacity of AD for automated conceptual model design [5], these attempts
have mainly resulted in (automated) generation of respective classes for
extracted business objects [2,3,4,5,6] and business process participants [4,5,6], as
well as a limited set of participant-object associations [4,5,6].
This paper considers the semantic capacity of object flows and action nodes in
AD for automated generation of class associations in the target CD representing
the initial conceptual database model (CDM). We performed an extensive analysis
related to: (i) the nature of action nodes regarding the number of different
types of input and output objects, and (ii) the weight of object flows. Based
on its results, the formal transformation rules for the generation of object-object
associations are defined and the experimental results of the corresponding ATL
implementation, applied to a generic business model, are provided.
The rest of the paper is structured as follows. The second section introduces
preliminary assumptions and definitions that will be used throughout the paper.
In the third section we present the analysis of the semantic capacity of AD and
define the formal transformation rules. The ATL implementation is partly provided
in the fourth section. The fifth section presents an illustrative example. Finally,
the sixth section concludes the paper.
2 Preliminaries
2.1 Detailed Activity Diagram
We assume that each business process in the given business system is modeled by
a corresponding detailed AD (DAD) which represents activities at least at the complete
level (the related UML metamodel excerpt from the UML superstructure [7] is
shown in Fig. 1 (up)), i.e.
each step (action, activity, etc.) in the realization of the given business pro-
cess is represented by a corresponding action node and will be shortly re-
ferred to as activity in the rest of the paper;
each activity is performed by some business process participant (it can be
external, e.g. buyer, customer, etc., or internal, e.g. worker, working group,
organization unit, etc.) represented by a corresponding activity partition,
usually called swimlane. In the rest of the paper, a business process partici-
pant is shortly referred to as participant ;
each activity may have a number of inputs and/or outputs represented by
object nodes. In the rest of the paper they are referred to as input objects
and output objects, respectively;
objects and activities are connected with object flows. An object flow is
a kind of activity edge which is directed from an input object toward the
corresponding activity (input object flow) or from an activity toward the
corresponding output object (output object flow); and
each object flow has a weight attribute, whose value is by default one. The
object flow weight represents the minimum number of tokens that must traverse
the edge at the same time. We assume that a constant weight represents
not the minimum, but the exact number of objects required for the activity if
they are input objects, or the exact number of created objects in the activity
if they are output objects. An unlimited weight (*) is used if the number of
input/output objects is not constant.
Fig. 1. UML metamodel excerpt used for representation of DAD (up) and CDM (down)
Fig. 3. Classification of activities: (a) SISO, (b) MISO, (c) SIMO, and (d) MIMO
SISO Activities. SISO cases and the corresponding transformation rules are
illustrated in Fig. 4.
Regardless of whether the input object (IO) is existing (EIO) or generated
(GIO), each output object (OO) depends on as many IOs as is the weight of
the given input object ow. In this way, if the weight of the input object ow
equals 1, then the OO depends on exactly one IO and the multiplicity of the
respective source association end is exactly 1. If the input weight equals *,
then the OO depends on many IOs and the multiplicity of the respective source
association end is *. If the input weight is literal n which is greater than one,
then the OO depends on exactly n IOs (like personal data containing data about
two cities, where the rst city represents the place of birth, while the second city
represents the place of residence). In that case we have exactly n associations,
where each association has the source end multiplicity equal to 1.
If the input object(s) is/are existing object(s), i.e. EIO, then the target end
multiplicity (which corresponds to the OO) of each association always equals
* and doesn't depend on the weight of the output object flow, because even
in cases when the output weight is exactly 1, with time, the same EIO may
be used in the creation of many OOs.
If the input object(s) is/are generated object(s), i.e. GIO, then the target
end multiplicity depends only on the weight of the output object flow. If the
output weight is exactly 1, then exactly one OO depends on the input GIO(s)
and the target end multiplicity should be exactly 1. Otherwise, the target end
multiplicity should be *, because more than one OO depends on the input GIO(s).
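To make the SISO rules above concrete, here is a small Python sketch of how they could be encoded; this is our illustration, not the authors' ATL implementation, and the function names, the string encoding of weights, and the EIO/GIO labels are our own choices.

```python
# Our illustration (not the authors' ATL code) of the SISO rules above.
# A flow weight is "1", a literal integer such as "2", or "*" (unlimited).

def source_end_multiplicity(input_weight):
    """Multiplicity at the association end of the input object (IO)."""
    return "*" if input_weight == "*" else "1"

def target_end_multiplicity(input_kind, output_weight):
    """Multiplicity at the association end of the output object (OO)."""
    if input_kind == "EIO":
        return "*"                      # an EIO may take part in many OOs over time
    return "1" if output_weight == "1" else "*"   # GIO case

def siso_associations(input_kind, input_weight, output_weight):
    """Generated associations as (source_mult, target_mult) pairs; a literal
    input weight n > 1 yields n parallel associations."""
    count = 1 if input_weight in ("1", "*") else int(input_weight)
    pair = (source_end_multiplicity(input_weight),
            target_end_multiplicity(input_kind, output_weight))
    return [pair] * count

# A GIO with input weight "2" and output weight "1" gives two 1..1 associations:
print(siso_associations("GIO", "2", "1"))   # [('1', '1'), ('1', '1')]
```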
MISO Activities. Each MISO activity has input objects of more than one
type and it creates output objects of exactly one type. Since all input objects
are required for the start of the activity, we assume that the output object(s)
depend(s) on all these input objects, i.e. the output object(s) is/are directly related
to each of the input objects. For example, an activity which results in creating
an invoice takes data about sold goods and shipping details as well. This implies
that the SISO transformation rule T_OO^siso should be independently applied to each
input object(s)–output object(s) pair of a MISO activity.
$$T_{OO}^{miso} : DAD(P, A, O, F) \to CM(E, R), \qquad R_{OO}^{miso}(a) = T_{OO}^{miso}\big(M(a)\big)$$

$$R_{OO}^{miso}(a) \stackrel{def}{=} \bigcup_{1 \le k \le m} R_{OO}^{(k)}(a), \qquad R_{OO}^{(k)}(a) = T_{OO}^{siso}\big(\langle io_k, if_k, a, of, oo \rangle\big)$$
The T_OO^miso transformation rule is a general rule relevant for all single-output
activities, since a SISO activity is just a special case of a MISO activity (m = 1).
SIMO Activities. A SIMO activity has input objects of exactly one type and
creates output objects of more than one different type. Since the given activity
is to result in the creation of all output objects, we assume that each output
object is directly related to the given input object(s). This implies that the SISO
transformation rule T_OO^siso should be independently applied to each input
object(s)–output object(s) pair of a SIMO activity.
Rule 3. (Object-object associations for SIMO activities) Let a ∈ A be
a SIMO activity having input object(s) io ∈ O_G ∪ O_X and creating n ∈ N
different types of output objects oo_1, oo_2, ..., oo_n ∈ O_O, where if ∈ F_I and
of_1, of_2, ..., of_n ∈ F_O represent the corresponding input and output object flows,
respectively. Let M(a) = {⟨io, if, a, of_k, oo_k⟩, 1 ≤ k ≤ n} be the set of SISO tuples
for the given activity a ∈ A. Transformation rule T_OO^simo maps the M(a) set
into the R_OO^simo(a) set of corresponding associations for the given activity:
$$T_{OO}^{simo} : DAD(P, A, O, F) \to CM(E, R), \qquad R_{OO}^{simo}(a) = T_{OO}^{simo}\big(M(a)\big)$$

$$R_{OO}^{simo}(a) \stackrel{def}{=} \bigcup_{1 \le k \le n} R_{OO}^{(k)}(a), \qquad R_{OO}^{(k)}(a) = T_{OO}^{siso}\big(\langle io, if, a, of_k, oo_k \rangle\big)$$
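Both rules thus share the same shape: build the set M(a) of SISO tuples and take the union of the per-tuple results of the SISO rule. The following Python sketch shows that decomposition only; siso_rule is a hypothetical stand-in for T_OO^siso, and all names are ours.

```python
# Our sketch of the decomposition shared by T_OO^miso and T_OO^simo: build the
# set M(a) of SISO tuples and take the union of the per-tuple results.
# siso_rule is a hypothetical placeholder for the SISO transformation rule.

def siso_rule(tpl):
    io, in_flow, activity, out_flow, oo = tpl
    return {f"association {io} -- {oo} (from activity {activity})"}

def miso_tuples(inputs, activity, out_flow, oo):
    # one SISO tuple per input object/flow pair, single output
    return [(io, in_flow, activity, out_flow, oo) for io, in_flow in inputs]

def simo_tuples(io, in_flow, activity, outputs):
    # single input, one SISO tuple per output object/flow pair
    return [(io, in_flow, activity, of, oo) for of, oo in outputs]

def apply_rule(tuples):
    result = set()
    for t in tuples:           # union over 1 <= k <= m (MISO) or 1 <= k <= n (SIMO)
        result |= siso_rule(t)
    return result

# The invoice example: sold goods and shipping details are both inputs.
print(apply_rule(miso_tuples([("SoldGoods", "if1"), ("ShippingDetails", "if2")],
                             "CreateInvoice", "of", "Invoice")))
```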
4 Implementation
We use ATL¹ [9] to implement the formal transformation rules in the Topcased
environment [10]. Due to space limitations, the implementation is only partly
presented.
¹ ATLAS Transformation Language.
5 Illustrative Example
The implemented generator has been applied to the generic DAD given in Fig. 5
(up). There are two participants (P1 and P2) in the given process. SIMO activity
A, performed by P1, has one existing input object of the E1 type. Each execution
of this activity results in the creation of two objects of the GA1 type and one
object of the GA2 type. Both GA1 objects constitute the generated input objects
in MISO activity B (performed by P2), which also has two existing input objects
Fig. 5. Generic DAD (up) and corresponding automatically generated CDM (down)
6 Conclusion
This paper has considered the semantic capacity of object flows and action nodes
in AD for the automated generation of associations in the CD representing the CDM.
We have performed an analysis related to: (i) the nature of action nodes based
on the number of different types of input and output objects, and (ii) the weight
of object flows. By introducing the classification of activities into SISO, MISO,
SIMO and MIMO activities and making a distinction between existing and generated
input objects, we have defined formal transformation rules for the generation
of object-object associations. We have also provided some experimental results
of the corresponding ATL implementation applied to a generic business model.
In comparison with the few existing approaches, preliminary experimental results
imply that the proposed transformation rules significantly increase the already
identified semantic capacity of AD for automated CDM design, since the existing
approaches propose the creation of associations for activities having both input and
output objects, but don't propose any explicit rule for the automated generation of
association cardinalities.
Further research will focus on the full identification of the semantic capacity of
AD for automated CDM design, the formal definition of transformation rules (particularly
for participant-object associations) and the evaluation on real business
models.
References
1. Ko, R., Lee, S., Lee, E.: Business process management (BPM) standards: A survey.
Business Process Management Journal 15(5), 744–791 (2009)
2. Garcia Molina, J., Jose Ortin, M., Moros, B., Nicolas, J., Toval, A.: Towards
use case and conceptual models through business modeling. In: Laender, A.H.F.,
Liddle, S.W., Storey, V.C. (eds.) ER 2000. LNCS, vol. 1920, pp. 281–294. Springer,
Heidelberg (2000)
3. Suarez, E., Delgado, M., Vidal, E.: Transformation of a process business model to
domain model. In: Proc. of WCE 2008, IAENG 2008, pp. 165–169 (2008)
4. Brdjanin, D., Maric, S.: An example of use-case-driven conceptual design of relational
database. In: Proc. of Eurocon 2007, pp. 538–545. IEEE, Los Alamitos (2007)
5. Brdjanin, D., Maric, S., Gunjic, D.: ADBdesign: An approach to automated initial
conceptual database design based on business activity diagrams. In: Catania, B.,
Ivanovic, M., Thalheim, B. (eds.) ADBIS 2010. LNCS, vol. 6295, pp. 117–131.
Springer, Heidelberg (2010)
6. Brdjanin, D., Maric, S.: Towards the initial conceptual database model through
the UML metamodel transformations. In: Proc. of Eurocon 2011, pp. 1–4. IEEE,
Los Alamitos (2011)
7. OMG: Unified Modeling Language: Superstructure, v2.2. OMG (2009)
8. OMG: Unified Modeling Language: Infrastructure, v2.2. OMG (2009)
9. Jouault, F., Allilaire, F., Bezivin, J., Kurtev, I.: ATL: A model transformation
tool. Science of Computer Programming 72(1-2), 31–39 (2008)
10. TOPCASED Project: Toolkit in OPen-source for Critical Application & SystEms
Development, v3.2.0, http://www.topcased.org
Specification and Utilization of Core Assets:
Feature-Oriented vs. UML-Based Methods
Abstract. Core assets are reusable artifacts built to be used in different software
products in the same family. As such, core assets need to capture both
commonality that exists and variability that is allowed in the product family
(line). These assets are later utilized for guiding the creation of particular valid
products in the family. Feature-oriented and UML-based methods have been
proposed for modeling core assets. In this work, we suggest a framework for
analyzing and evaluating core assets modeling methods. We use this framework
for comparing two specific methods: feature-oriented CBFM and UML-based
ADOM. We found similar performance in modifying core assets in the two
methods and some interesting differences in core assets utilization.
1 Introduction
Core assets are reusable artifacts built to be used in more than one product in the
family (line) [1]. They are mainly utilized for creating particular product artifacts that
satisfy specific requirements of software products (applications). Two commonly
used ways to specify and model core assets are through feature-oriented [3, 5] and
UML-based methods [4, 10]. While feature-oriented methods support specifying core
assets as sets of characteristics relevant to some stakeholders and the relationships and
dependencies among them, UML-based methods extend the UML 2 metamodel or, more
commonly, suggest profiles for handling core asset specification in different diagram
types. In order to examine the capabilities of these kinds of modeling methods and
their differences, we identified four main utilization and specification activities,
namely (1) guidance, or reuse, which provides aids for creating product artifacts from
core assets; (2) product enhancement, as exists while adding application-specific
elements to satisfy the requirements in hand; (3) product validation with respect to the
corresponding domain knowledge as specified in the core assets; and (4) core asset
modification, which handles introducing changes or elaborations to existing core
assets. All these activities refer to commonality and variability aspects of the given
family of software products [11], [12]. Commonality mainly specifies the mandatory
elements or features of the product line, i.e., the elements or the features that identify
the product family and all products that belong to that family must include them.
However, commonality may refer also to optional elements which may add some
value to the product when selected, but not all the products that belong to the family
will include them. It can also refer to dependencies among (optional) elements that
specify valid configurations. Variability [8] is usually specified in terms of variation
points, which identify locations at which variable parts may occur, variants, which
realize possible ways to create particular product artifacts at certain variation points,
and rules for realizing variability (e.g., in the form of open and closed variation
points, binding times, and selection of variants in certain variation points).
We examined how advanced information systems students, who studied a domain
engineering course, performed the four aforementioned utilization and specification
activities, separately referring to commonality and variability aspects. For this
purpose, we used two particular methods: Cardinality-Based Feature Modeling
(CBFM) [3], which is a feature-oriented method, and Application-based DOmain
Modeling (ADOM), which is defined through a UML 2 profile [10]. We chose these
two particular methods due to their extended expressiveness in their categories [9].
We found that modification of core assets in the two methods results in quite
similar performance. Regarding utilization, UML-based ADOM outperformed
feature-oriented CBFM in commonality aspects of both guidance and product
validation, while CBFM outperformed ADOM in product enhancement and
variability aspects of product validation.
The remainder of this paper is organized as follows. Section 2 introduces our
framework for analyzing the specification and utilization aids of core assets modeling
methods, while Section 3 briefly introduces CBFM and ADOM. Section 4 describes
the evaluation we have performed, discussing the research questions, settings, results,
and threats to validity. Finally, Section 5 concludes and refers to future research
directions.
The above framework is used in the paper to examine core assets specification and
utilization in two methods, which are presented next.
A possible utilization of the above core asset is for creating a specific brokers'
application, which has a single peripheral, called All-in-One that consists of a fax
machine, a printer, and a scanner. The first step is choosing the mandatory and
relevant optional paths in the feature diagram of the core asset. The only mandatory
path in the VOF domain is that to Physical Location of Peripheral, but several
relevant optional paths exist, e.g., the paths to Model, to Manufacturer, and so on.
Application-specific features can be added in this domain only under the Peripheral
feature group. Finally, in the third step of product validation, the product artifact, i.e.,
the brokers' application, is validated with respect to the core asset of the VOF domain.
In particular, this step includes checking that the cardinality constraints are satisfied.
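As a rough illustration of this kind of check (our sketch, not part of either tool; the feature names and data layout are invented and do not reproduce the VOF core asset), a cardinality validation could look like this in Python:

```python
# Our sketch (invented feature names, not the VOF core asset): validate the
# number of selected child features against each group's <min..max> cardinality.

core_asset = {
    "Peripheral":   {"min": 1, "max": None,   # None stands for unbounded (*)
                     "children": {"FaxMachine", "Printer", "Scanner"}},
    "Notification": {"min": 0, "max": 1,
                     "children": {"Email", "SMS"}},
}

def validate(configuration):
    """configuration: mapping group name -> set of selected child features."""
    errors = []
    for group, spec in core_asset.items():
        selected = set(configuration.get(group, set()))
        unknown = selected - spec["children"]
        if unknown:
            errors.append(f"{group}: unknown features {sorted(unknown)}")
        n = len(selected)
        too_many = spec["max"] is not None and n > spec["max"]
        if n < spec["min"] or too_many:
            upper = spec["max"] if spec["max"] is not None else "*"
            errors.append(f"{group}: {n} selected, allowed {spec['min']}..{upper}")
    return errors

# The brokers' All-in-One peripheral selects three children of Peripheral:
print(validate({"Peripheral": {"FaxMachine", "Printer", "Scanner"}}))   # []
```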
Fig. 2. Partial peripheral models in (a) CBFM and (b) ADOM (class diagram)
¹ For clarity purposes, four commonly used multiplicity groups are defined on top of this
stereotype: optional many, where min=0 and max=∞; optional single, where min=0 and
max=1; mandatory many, where min=1 and max=∞; and mandatory single, where
min=max=1. Nevertheless, any multiplicity interval constraint can be specified using the
general stereotype multiplicity min=m1 max=m2.
well as the relevant variants, are first selected and adapted to the application in hand.
The adaptation starts with providing these elements with specific names that best fit
the given application, besides the domain names that appear as stereotypes in the
application model. Note that the same specific (application) element may be
stereotyped by several core asset elements to denote that it plays multiple roles in the
domain. All-in-One, for example, will be simultaneously stereotyped as Fax Machine,
Printer, and Scanner. Nevertheless, when selecting particular variants in a certain
variation point, the name of the variation point does not appear as a stereotype. Thus,
All-in-One will not be stereotyped as Peripheral, but just as Fax Machine, Printer, and
Scanner. Then, the application model is enhanced by adding application specific
elements, which can be completely new elements or elements that are created from
(open) variation points without reusing particular variants. Accordingly, completely
new elements are visualized as elements without stereotypes, while variation point-
derived elements are decorated with the variation point names as their stereotypes. In
any case, these elements cannot violate the domain constraints, which are validated
using the application stereotypes as anchors to the domain.
Note that there is a very significant difference between the utilization of core assets
in CBFM and their utilization in ADOM. While CBFM reuses core assets mainly by
configuring them to the particular needs of a given application, ADOM enables
specialization of core assets and addition of details in the model level (denoted M1 in
the MOF framework [7]). However, the usage of multiple diagram types in ADOM
may raise consistency and comprehension difficulties that need to be examined. The
next section describes a preliminary experiment in this direction.
on the models and to create models of specific product artifacts according to given
requirements. We did not refer to core assets creation in our study, due to the nature
of our subjects and the complication of such task that mainly originates from the
diversity of the required sources. Instead, we contented ourselves with modification tasks for
examining core assets specification. Nevertheless, while creating the particular
product artifacts, the students were implicitly asked to reuse portions of the core
assets and to add application-specific elements. They were also asked to list
requirements that cannot be satisfied in the given domain, since they somehow
violate the core assets specification. Accordingly, we phrased the following four
research questions: (1) The specifications of which method are more modifiable (i.e.,
easy to be modified) and to what extent? (2) The specifications of which method are
more guidable and to what extent? (3) The specifications of which method help
enhance particular product artifacts and to what extent? (4) The specifications of
which method help create valid product artifacts and to what extent?
Due to the difficulty of carrying out such experiments with real developers in industrial
settings [6], the subjects of our study were 18 students who took a seminar course on
domain engineering during the winter semester of the academic year 2010-2011. All
the subjects had previous knowledge in systems modeling and specification, as well
as initial experience in industrial projects. During the course, the students studied
various domain engineering techniques, focusing on CBFM and ADOM and their
ability to specify core assets and utilize them for creating valid product artifacts. The
study took place towards the end of the course as a class assignment, which was worth up
to 10 points of the students' final course grade.
The students were divided into four similarly capable groups of 4-5 students each,
according to their knowledge in the studied paradigms (feature-oriented vs. UML-
based), previous grades, and degrees (bachelor vs. master students). Each group got a
CBFM or ADOM model of the entire VOF domain and a dictionary of terms in the
domain. The ADOM model included a use case diagram and a class diagram, while
the CBFM model included two related feature diagrams (using feature model
references). Two groups got modification tasks, while the two other groups got
utilization tasks (see the exact details on the experiment setting in Table 1).
As noted, the modification tasks mainly required extensions to the core asset and
not inventing new parts from scratch. An example of a modification task in the study
is:
For checking devices two strategies are available: distance-based and attribute-
based. In the distance-based strategy the application locates the nearest available
devices for the task in hand ... In the attribute-based strategy, the employee needs to
supply values to different relevant parameters of the device, such as the paper size
and the print quality for printers and the paper size and the image quality for fax
machines. Each application in the VOF domain must support the distance-based strategy,
but may support both strategies.
The utilization tasks required creating valid portions of a brokers' application in the
domain, according to a predefined list of requirements, and listing the parts of the
requirements that cannot be satisfied with the given core assets. An example of
requirements in these tasks is:
While operating a peripheral in the brokers' application, notifications are sent to both
employees (via emails) and log files. Furthermore, when performing a task on a
device, three main checks are performed: accessibility, feasibility, and profitability.
All tasks referred to both commonality and variability aspects, and three experts
checked that these questions could be answered via the two models separately. After the
experiment took place, we conducted interviews with the students about their answers
in order to understand what led them to answer as they did, what their difficulties
were, and how they reached their conclusions.
The scores in the core assets modification category, of both commonality and
variability-related issues, were very similar in the two modeling methods, with slight
advantages in favor of CBFM. Nevertheless, some sources of difficulty in performing this
task were identified in the students' interviews. First, while CBFM promotes
² This separation is irrelevant for product enhancement, which deals with application-specific
additions.
Regarding product enhancement, students utilizing the ADOM model believed that
they were not allowed to add elements which are not specified in the core asset of the
domain. Thus, they pointed to the corresponding requirements as violating the
domain constraints rather than treating them as application-specific additions. This result may be
attributed to the relatively rich notation of UML and ADOM with respect to CBFM:
the students mainly relied on the VOF specification as expressed in the ADOM model
and tended not to extend it with new requirements.
In both methods, the main source of difficulties in questions that refer to
commonality-related issues of product validation was comprehension of dependencies
between elements. In particular, traceability of OCL constraints was found to be a very
difficult task. Since ADOM enables (through UML) visual specification of some OCL
constraints with {or} and {xor} constructs, the students utilizing ADOM succeeded a
little more in this category.
Finally, only one task referred to variability-related issues in product validation.
The violation in this task referred to a dependency between two variants. None of the
students who utilized ADOM found this violation, while half of the students who
utilized CBFM found it. We believe that the difference in this case is due to the relatively
crowded specification in ADOM compared to CBFM: for finding the violation, the
class diagram, which included the variation point, the possible variants, and the
exhibited attributes, had to be consulted. The corresponding specification in CBFM
was much simpler and involved a hierarchical structure of features. However, since
only one task referred to this aspect, we cannot state any more general conclusions.
The main threats to validity are, of course, the small number of subjects (18 overall),
the nature of the subjects (students rather than experienced developers and domain
engineers), and the relatively simplified tasks and models. In order to overcome these
threats we took the following actions. First, the students had to fill in pre-questionnaires
that summarized their level of knowledge and skills. They were trained throughout the
course with the modeling methods and reached a high level of familiarity with these
methods. Moreover, we used additional information regarding the subjects, such as
their grades, degrees, and previous knowledge, in order to divide them into similarly
capable groups. Since we could not perform a statistical analysis due to the low
number of subjects, we conducted interviews with the subjects in order to get more
insights into the achieved results. We further motivated the students to produce
good-quality outcomes by awarding them up to 10 points toward their final grades according to
their performance in the assignment. Second, we carefully chose the domain and
specified the corresponding models so that they would refer to different domain
engineering-related challenges. Three experts checked the models and their equivalent
expressiveness before the experiment. However, we aimed at keeping the models
simple, yet realistic, so that the subjects would be able to respond in reasonable time.
Third, we conducted a comparative analysis between existing feature-oriented and
UML-based methods. The methods used in the experiment were selected based on
this analysis (the main reasons for this choice are mentioned in [9]).
References
1. Bachmann, F., Clements, P.C.: Variability in Software Product Lines. Technical Report
CMU/SEI-2005-TR-012 (2005),
http://www.sei.cmu.edu/library/abstracts/reports/05tr012.cfm
2. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. Addison-
Wesley, Reading (2002)
3. Czarnecki, K., Kim, C.H.P.: Cardinality based feature modeling and constraints: A
progress report. In: Proceedings of the OOPSLA Workshop on Software Factories (2005)
4. Halmans, G., Pohl, K., Sikora, E.: Documenting Application-Specific Adaptations in
Software Product Line Engineering. In: Bellahsène, Z., Léonard, M. (eds.) CAiSE 2008.
LNCS, vol. 5074, pp. 109–123. Springer, Heidelberg (2008)
5. Kang, K., Cohen, S., Hess, J., Novak, W., Peterson, A.: Feature-Oriented Domain Analysis
(FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21, Software Engineering
Institute, Carnegie Mellon University (1990)
6. Kitchenham, B.A., Lawrence, S., Lesley, P., Pickard, M., Jones, P.W., Hoaglin, D.C.,
Emam, K.E.: Preliminary Guidelines for Empirical Research. IEEE Transactions on
Software Engineering 28(8), 721–734 (2002)
7. OMG: Meta Object Facility (MOF) Specification version 2.4,
http://www.omg.org/spec/MOF/2.4/Beta2/PDF/
8. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering: Foundations,
Principles, and Techniques. Springer, New York (2005)
9. Reinhartz-Berger, I., Tsoury, A.: Experimenting with the Comprehension of Feature-
Oriented and UML-Based Core Assets. In: Halpin, T., Nurcan, S., Krogstie, J., Soffer, P.,
Proper, E., Schmidt, R., Bider, I. (eds.) BPMDS 2011 and EMMSAD 2011. LNBIP,
vol. 81, pp. 468–482. Springer, Heidelberg (2011)
10. Reinhartz-Berger, I., Sturm, A.: Utilizing Domain Models for Application Design and
Validation. Information and Software Technology 51(8), 1275–1289 (2009)
11. Sinnema, M., Deelstraa, S.: Classifying Variability Modeling Techniques. Information and
Software Technology 49(7), 717–739 (2007)
12. Svahnberg, M., Van Gurp, J., Bosch, J.: A Taxonomy of Variability Realization
Techniques. Software Practice & Experience 35(8), 705–754 (2005)
Actor-eUML for Concurrent Programming
1 Introduction
Multi-core processors have entered the computing mainstream, and many-core
processors with 100+ cores are predicted within this decade. The increasing
hardware parallelism and the absence of a clear software strategy for exploiting
this parallelism have convinced leading computer scientists that many practicing
software engineers cannot effectively program state-of-the-art processors [8]. We
believe that a basis for simplifying parallel programming exists in established
software technology, including the Actor model [1] and Executable UML. The
advent of multi-core processors has galvanized interest in the Actor model, as
the Actor model has a sound formal foundation and provides an intuitive parallel
programming model. To leverage the Actor model, software systems should be
specified in a language that provides first-class support for the Actor model and
exposes its rather abstract treatment of parallelism. A leading parallel research
program has advocated dividing the software stack into a productivity layer
and an efficiency layer [7]. Parallel concerns are addressed in the efficiency layer
by expert parallel programmers, and the productivity layer enables mainstream
programmers to develop applications while being shielded from the parallel hard-
ware platform. This separation of application concerns (productivity layer) and
platform concerns (efficiency layer) is the foundation of Executable UML.
Fortunately, the Actor model and Executable UML are readily unified. In this
paper, we introduce the Actor-eUML model and formalize the mapping between
actors in the Actor model and agents (active objects) in Executable UML by uni-
fying the semantics of actor behavior and the hierarchical state machine (HSM)
semantics of Executable UML agents. Simply stated, an Executable UML agent
is an actor whose behavior is specified as an HSM. To facilitate the definition of
unified semantics for Actor-eUML, we simplify the UML treatment of concur-
rency and extend the Actor model to enable a set of actor behaviors to specify
the HSM for an Executable UML active class. Section 2 presents an overview
of the Actor-eUML model. Section 3 presents the operational semantics of the
Actor-eUML model. Section 4 concludes the paper.
Events are processed serially and to completion. Although there can be massive
parallelism among agents, processing within each agent is strictly sequential.
The Actor-eUML model retains the essence of the pure Actor model while
adding capabilities that simplify actor programming and enable HSM behavior
to be expressed directly and conveniently. As illustrated in Fig. 2, the actor
interface to its abstract mailbox has been extended with two internal queues:
a working queue (queue_w) for internal messages used while processing a single
external message, and a defer queue (queue_d) used to defer messages based on
the current state in an HSM. The external queue (queue_e) that receives messages
sent to an actor has been retained. When a message is dispatched from queue_e,
the message is moved to queue_w and then dispatched for processing by the next
behavior. As the behavior executes, messages can be added to queue_w. When the
behavior completes, the message at the head of queue_w is dispatched to the next
behavior. If queue_w is empty when a behavior completes, the next message
is dispatched from queue_e. A message dispatched from queue_w can be deferred
to queue_d. The messages in queue_d are moved to queue_w to revisit them.
The additional internal queues enable a single actor to completely process a
client message without creating and communicating with auxiliary actors and
also facilitate the expression of HSM behavior. Each state in a HSM is mapped
to an actor behavior, and a signal is delegated from the current state to its
parent state by adding the signal to queuew and then selecting the behavior for
its parent state as the next behavior. A sequence of start, entry, and exit actions
is executed during a state transition by adding specialized messages to queuew
and then selecting the behavior of the next state in the sequence. A signal that is
deferred in the current state is added to queued . Deferred messages are revisited
after a state transition by moving the messages from queued to queuew .
3 Actor-eUML Semantics
The Actor-eUML model defines the following actor primitives, where B is a
behavior, M is a message, and V is a list of parameter values. The call B(V)
returns a closure, i.e., a function together with a referencing environment that
binds the non-local names in the function to the corresponding variables in
scope at the time the closure is created. The closure returned by the
call B(V) captures the variables in V, and the closure expects to be passed M as
a parameter when called subsequently.
actor-new(B, V): create a new actor with initial behavior B(V).
actor-next(A_i, B, V): select B(V) as the next behavior for actor A_i.
actor-send(A_i, M): send M to the tail of queue_e for actor A_i.
actor-push(A_i, M): push M at the head of queue_w for actor A_i.
actor-push-defer(A_i, M): push M at the head of queue_d for actor A_i.
actor-move-defer(A_i): move all messages in queue_d to queue_w for actor A_i.
The {actor-new, actor-next, actor-send} primitives are inherited from the
Actor model, and the {actor-push, actor-push-defer, actor-move-defer}
primitives are extensions to the Actor model introduced by the Actor-eUML
model. When transforming a PIM to a PSI, a model compiler for an Actor-
eUML implementation translates the HSM associated with each active class to a
target programming language in which the actor primitives have been embedded.
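The mailbox discipline and the non-creation primitives can be illustrated with a small, purely sequential Python sketch; this is our simplification for exposition, not the Actor-eUML runtime, and the method names only paraphrase the primitives above.

```python
from collections import deque

class Actor:
    """Sequential sketch of the Actor-eUML mailbox (queue_e, queue_w, queue_d)
    and some primitives; our simplification, not the Actor-eUML runtime."""

    def __init__(self, behavior):
        self.queue_e = deque()    # external messages sent to the actor
        self.queue_w = deque()    # internal working queue
        self.queue_d = deque()    # deferred messages
        self.behavior = behavior  # closure that processes one message

    def send(self, msg):          # cf. actor-send
        self.queue_e.append(msg)

    def push(self, msg):          # cf. actor-push
        self.queue_w.appendleft(msg)

    def push_defer(self, msg):    # cf. actor-push-defer
        self.queue_d.appendleft(msg)

    def move_defer(self):         # cf. actor-move-defer
        while self.queue_d:
            self.queue_w.append(self.queue_d.popleft())

    def become(self, behavior):   # cf. actor-next
        self.behavior = behavior

    def step(self):
        """Dispatch one message: queue_w has priority; queue_e is consulted
        only when queue_w is empty (serial, run-to-completion processing)."""
        if self.queue_w:
            msg = self.queue_w.popleft()
        elif self.queue_e:
            msg = self.queue_e.popleft()
        else:
            return False
        self.behavior(self, msg)
        return True

# Toy behavior: defer "later" messages until a (simulated) state change.
state = {"ready": False}

def behavior(actor, msg):
    if msg == "later" and not state["ready"]:
        actor.push_defer(msg)
    else:
        print("processed:", msg)

a = Actor(behavior)
a.send("later"); a.send("now")
while a.step():
    pass                  # prints: processed: now
state["ready"] = True     # e.g. a state transition happened
a.move_defer()            # revisit the deferred message
while a.step():
    pass                  # prints: processed: later
```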
At any point in a computation, an actor is either quiescent or actively processing
a message. The term actor_4(A_i, Q, C, M) denotes a quiescent actor, where
A_i uniquely identifies the actor, Q is the queue for the actor, C is the closure used
to process the next message dispatched by the actor, and M is the local memory
for the actor. The queue Q is a 3-tuple ⟨Q_e, Q_w, Q_d⟩, where Q_e is the external
queue where messages sent to the actor are received, Q_w is the work queue used
when processing a message M dispatched by the actor, and Q_d is used to queue
messages deferred after dispatch. At points in a computation, a component of
Q can be empty, which is denoted by Q_∅. The term actor_5(A_i, Q, C_∅, M, ⟨E, S⟩)
denotes an active actor and extends the actor_4 term to represent a computation
in which statement list S is executing in environment E. An active actor has a
null C, denoted by C_∅.
A transition relation between actor configurations is used to define the Actor-eUML
operational semantics, as in [2]. A configuration in an actor computation
consists of actor_4, actor_5, and send terms. The send(A_i, M) term denotes a
message M sent to actor A_i that is in transit and not yet received. The specification
of structural operational semantics for Actor-eUML uses rewrite rules to
define computation as a sequence of transitions among actor configurations.
Rules (1) and (2) define the semantics of message receipt. A message M sent
to actor A_i can be received and appended to the external queue for actor A_i
when the actor is quiescent (1) or active (2). The send term is consumed and
eliminated by the rewrite. The message receipt rules illustrate several properties
explicit in the Actor model. An actor message is an asynchronous, reliable, point-
to-point communication between two actors. The semantics of message receipt
are independent of the message sender. Each message that is sent is ultimately
received, although there is no guarantee of the order in which messages are
received. A message cannot be broadcast and is received by exactly one actor.
Rules (3) and (4) define the semantics of message dispatch. In rule (3), the
quiescent actor A_i with non-empty Q_e and empty Q_w initiates the dispatch of
the message M at the head of Q_e by moving M to Q_w. In rule (4), the quiescent
actor A_i completes message dispatch from Q_w and becomes an active actor by
calling the closure C to process M in the initial environment E_c associated with
C. The message dispatch rules enforce the serial, run-to-completion processing of
messages and the demand-driven relationship between Q_e and Q_w in the Actor-eUML
model. An actor cannot process multiple messages concurrently, and a
message is dispatched from Q_e only when Q_w is empty.
Rules (5), (6), and (7) define the semantics of the actor-new, actor-next, and
actor-send primitives, respectively. In rule (5), the active actor A_i executes the
actor-new primitive to augment the configuration with an actor_4 term that
denotes a new actor A_n with empty Q, uninitialized local memory (M_∅), and
initial behavior closure C = B(V). In rule (6), the active actor A_i executes the
actor-next primitive to select its next behavior closure C = B(V) and becomes
a quiescent actor. In rule (7), the active actor A_i executes the actor-send primitive
to send the message M to actor A_j, where both i = j and i ≠ j are
well-defined.
Rules (8), (9), and (10) define the semantics of the actor-push,
actor-push-defer, and actor-move-defer primitives, respectively.

$$\mathit{actor}_5(A_i, \langle Q_e, Q_w, Q_d\rangle, C_\emptyset, M, \langle E, \text{actor-push}(A_i, M); S\rangle) \longrightarrow \mathit{actor}_5(A_i, \langle Q_e, M{:}Q_w, Q_d\rangle, C_\emptyset, M, \langle E, S\rangle) \quad (8)$$
References
1. Agha, G.: Actors: A Model of Concurrent Computation in Distributed Systems.
MIT Press, Cambridge (1986)
2. Agha, G., Mason, I.A., Smith, S.F., Talcott, C.L.: A Foundation for Actor Computation.
Journal of Functional Programming, 1–72 (1997)
3. Harel, D.: Statecharts: A Visual Formalism for Complex Systems. Science of Computer
Programming 8(3), 231–274 (1987)
4. Mellor, S.J., Balcer, M.J.: Executable UML: A Foundation for Model-Driven Architecture.
Addison-Wesley, Reading (2002)
5. Mellor, S.J., Kendall, S., Uhl, A., Weise, D.: MDA Distilled. Addison-Wesley, Reading
(2004)
6. Milicev, D.: Model-Driven Development with Executable UML. Wiley, Chichester
(2009)
7. Patterson, D., et al.: A View of the Parallel Computing Landscape. Communications
of the ACM 52(10), 56–67 (2009)
8. Patterson, D.: The Trouble with Multi-Core. IEEE Spectrum 47(7), 28–32 (2010)
9. Raistrick, C., Francis, P., Wright, J., Carter, C., Wilkie, I.: Model Driven Architecture
with Executable UML. Cambridge University Press, Cambridge (2004)
10. Tomlinson, C., Singh, V.: Inheritance and Synchronization with Enabled Sets.
SIGPLAN Notices 24(10), 103–112 (1989)
11. Object Management Group: UML Superstructure Specification, Version 2.1.2,
http://www.omg.org/docs/formal/07-11-02.pdf
12. Object Management Group: Semantics of a Foundational Subset for Executable
UML Models (fUML), Version 1.0, http://www.omg.org/spec/FUML
Preface to the Posters and Demonstrations
This volume also contains the papers presented at the Posters and Demonstrations
session of the 30th International Conference on Conceptual Modeling, held from
October 31 to November 3, 2011, in Brussels.
The committee decided to accept 9 papers for this session. We hope that
you find the contributions beneficial and enjoyable and that during the session
you had many opportunities to meet colleagues and practitioners. We would
like to express our gratitude to the program committee members for their work
in reviewing papers, the authors for submitting their papers, and the ER 2011
organizing committee for all their support.
David Aguilera, Raúl García-Ranea, Cristina Gómez, and Antoni Olivé
1 Introduction
Names play a very important role in the understandability of a conceptual
schema. Many authors agree that choosing good names for schema elements makes
conceptual schemas easier to understand for requirements engineers, conceptual
modelers, system developers and users [5,6].
Choosing good names is one of the most complicated activities related to
conceptual modeling [8, p.46]. There have been several proposals of naming
guidelines for some conceptual schema elements in the literature [3,7] but, as
far as we know, few CASE tools support this activity. One example is [2], which
checks that the capitalization of some elements is correct, e.g., that class names
start with a capital letter.
In this demonstration, we present an Eclipse plugin that adds naming valida-
tion capabilities to the UML2Tools framework. This plugin can assist modelers
during the naming validation process of named elements in UML, following the
complete naming guidelines presented in [1].
because it is complete: for each kind of element to which a modeler may give a
name in UML, it provides a guideline on how to name it. As an example, two
guidelines are summarized in the following:
G1_f The name of an entity type should be a noun phrase whose head is a countable
noun in singular form. The name should be written in the Pascal case.
G1_s If N is the name of an entity type, then the following sentence must be grammatically
well-formed and semantically meaningful:
An instance of this entity type is [a|an] lower¹(N)
G2_f The name A should be a verb phrase in third-person singular number, in the
Camel case.
G2_s The following sentence must be grammatically well-formed and semantically
meaningful:
[A|An] lower(E) lower(withOrNeg²(A)) [, or it may be unknown].
where the last optional fragment is included only if min is equal to zero.
As stated in [1], a name given by a conceptual modeler complies with the guideline
if: a) it has the corresponding grammatical form G_f, and b) the sentence
generated from the pattern sentence G_s and the given name is grammatically
well-formed and semantically meaningful.
Fig. 1. Example of a conceptual schema with some names violating their guidelines
¹ lower(N) is a function that gives N in lower case, using blanks as delimiters.
² withOrNeg(A) extends A with the insertion of the negative form of the verb of A.
Figure 1 shows an example where some names, which are highlighted, violate
their naming guidelines. The next section describes how to use the developed
Eclipse plugin to detect those errors.
Fig. 2. Screenshot of Eclipse showing those names that violate their guidelines
the whole schema. If one or more names are incorrect, the errors are shown in a
new Eclipse view.
The second functionality introduces schema verbalization. Our tool generates
a PDF file containing the pattern sentences defined in the naming guidelines
and the names of the selected elements. Some aspects of the resulting document
can be configured by using the configuration window shown in Fig. 3. In order
to generate the document, the modeler has to click on Verbalize → Verbalize.
Figure 4 shows the verbalization of the schema of Fig. 1 after correcting the
errors previously detected. Then, the modeler or a domain expert may check
the document and detect whether the generated sentences are grammatically
well-formed and semantically meaningful.
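To give a rough idea of both functionalities, the sketch below checks the grammatical-form part of guideline G1_f and generates the G1_s pattern sentence for verbalization. It is our illustration, not the plugin's code: the regular expressions, check_entity_type, and the example name are ours, and the linguistic checks (countable noun, singular head) are omitted.

```python
import re

# Our sketch, not the plugin's implementation: check the grammatical-form part
# of a guideline and build the pattern sentence used for verbalization.

PASCAL = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")               # Pascal case, e.g. SalesOrder
CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")  # Camel case (for G2_f)

def lower(name):
    """lower(N): the name in lower case, with blanks as delimiters."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", name).lower()

def check_entity_type(name):
    """Form check for G1_f plus the G1_s pattern sentence for manual review."""
    form_ok = bool(PASCAL.match(name))
    article = "an" if lower(name)[0] in "aeiou" else "a"
    sentence = f"An instance of this entity type is {article} {lower(name)}"
    return form_ok, sentence

print(check_entity_type("SalesOrder"))
# (True, 'An instance of this entity type is a sales order')
```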
Acknowledgements. Our thanks to the people in the GMC research group. This
work has been partly supported by the Ministerio de Ciencia y Tecnología under
the TIN2008-00444 project, Grupo Consolidado, and by BarcelonaTech Universitat
Politècnica de Catalunya, under the FPI-UPC program.
References
1. Aguilera, D., Gómez, C., Olivé, A.: A complete set of guidelines for naming UML
conceptual schema elements (submitted for publication, 2011)
2. ArgoUML: ArgoUML, http://argouml.tigris.org
3. Chen, P.: English sentence structure and entity-relationship diagrams. Inf. Sci.
29(2-3), 127–149 (1983)
4. Clayberg, E., Rubel, D.: Eclipse Plug-ins. Addison-Wesley, Reading (2008)
5. Deissenboeck, F., Pizka, M.: Concise and consistent naming. Softw. Qual. Control
14, 261–282 (2006)
1 Introduction
The vast majority of existing keyword search techniques over structured data relies
heavily on an a-priori creation of an index on the contents of the database. At run time,
the index is used to locate in the data instance the appearance of the keywords in the
query provided by the user, and then associate them by finding possible join paths. This
approach makes the existing solutions inapplicable in all the situations where the con-
struction of such an index is not possible. Examples include databases on the hidden
web, or behind wrappers in integration systems. To cope with this issue we have devel-
oped KEYRY (from KEYword to queRY) [2], a system that converts keyword queries
into structured queries expressed in the native language of the source, i.e., SQL. The
system is based only on the semantics provided by the source itself (i.e., the schema),
auxiliary semantic information that is freely available (i.e., public ontologies and thesauri),
a Hidden Markov Model (HMM) and an adapted notion of authority [4]. Using
this information, it builds a ranked list of possible interpretations of the keyword query
in terms of the underlying data source schema structures. In practice, the tool computes
only the top-k most prominent answers and not the whole answer space. One of the
features of KEYRY is that the order and the proximity of the keywords in the queries
play a central role in determining the k most prominent interpretations.
This work was partially supported by the project "Searching for a needle in mountains of data",
http://www.dbgroup.unimo.it/keymantic.
Fig. 1. Functional overview of KEYRY: a Keyword Matcher (with the HMM Matcher), a
Query Generator (with Path Selector and Query Builder), a Metadata Repository, and a
Wrapper (Metadata Extractor and Query Manager) over the data source
Apart from the obvious use case of querying hidden web sources, KEYRY finds two
additional important applications. First, it allows keyword queries to be posed on
information integration systems that traditionally support only structured queries. Second,
it can be used as a database exploration tool. In particular, the user may pose some
keyword query, and the system will generate its possible interpretations on the given
database. By browsing these interpretations, i.e., the generated SQL queries, the user
may obtain information on the way the data is stored in the database.
2 KEYRY at a Glance
Interpreting a keyword query boils down to the discovery of a bipartite assignment of
keywords to database structures. In some previous work we have studied the problem
from a combinatorial perspective [1] by employing the Hungarian algorithm [3].
The tool we demonstrate here is instead based on a Hidden Markov Model. A detailed
description of the methodology can be found in our respective research paper [2]. A
graphic illustration of the different components of the tool can be found in Figure 1.
The first step when a keyword query has been issued is to associate each keyword
to some database structure. That structure may be a table name, a class, an attribute, or
an actual value. Each assignment of the keywords to database structures is a configura-
tion. The computation of the different configurations is done by the Keyword Matcher
component. A configuration describes possible meanings of the keywords in the query.
Given the lack of access to the instance data that we assume, finding the configurations
and subsequently the interpretations of a keyword query and selecting the k most
prominent ones is a challenging task. Consider for instance the query movie Avatar actors over
a relational database collecting information about movies. The query may be intended
to find actors acting in the movie Avatar which means that the keywords movie and
actors could be respectively mapped into the tables movie and actor while Avatar
could be a value of the attribute title of the table movie. Different semantics are
also possible, e.g. Avatar could be an actor or a character, or something else.
The Keyword Matcher is the main component of the system. It implements a Hidden
Markov Model (HMM), where the observations represent the keywords and the states
the data source elements. The main advantages of this representation are that the HMM
takes into account both the likelihood of single keyword–data source element associations,
and the likelihood of the match of the whole keyword sequence in the query
to the data source structure. In this way, the assignment of a keyword to a data source
element may increase or decrease the likelihood that another keyword corresponds to a
data source element. In order to define a HMM, the values of a number of parameters
need to be specified. This is usually done using a training algorithm that, after many
iterations, converges to a good solution for the parameter values. Therefore, finding a
suitable dataset for training the HMM is a critical aspect for the effective use of HMM-
based approaches in real environments. In our scenario, the parameters are initialized
by exploiting the semantic information collected in the Metadata Repository (explained
below), providing that way keyword search capabilities even without any training data,
as explained in more details in our paper [2].
To obtain more accurate results, the HMM can be trained, i.e. the domain-independent
start-up parameters may be specialized for a specific data source thanks
to the users' feedback. We train KEYRY with a semi-supervised approach that exploits
both the feedback (when available) implicitly provided by the users on the system
results as supervised data and the query log as unsupervised data. By applying the
est probability of generating the observation sequence are computed. If we consider
the user keyword query as an observation sequence, the algorithm retrieves the state
sequences (i.e., sequences of HMM states representing data source terms) that most
likely represent the intended meaning of the user query.
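As an illustration of how a keyword sequence is decoded against schema elements, the following Python sketch runs plain (top-1) Viterbi over a toy HMM; KEYRY itself uses the List Viterbi algorithm to obtain the top-k sequences, and every state, probability, and keyword below is invented for the example.

```python
import math

# Our toy example of decoding a keyword sequence into schema elements with
# plain (top-1) Viterbi; KEYRY uses List Viterbi to obtain the top-k sequences.
# Every state, keyword, and probability below is invented.

states = ["movie(table)", "actor(table)", "movie.title(value)"]
start = {"movie(table)": 0.5, "actor(table)": 0.4, "movie.title(value)": 0.1}
trans = {s: {t: 1.0 / len(states) for t in states} for s in states}
emit = {  # P(keyword | schema element), e.g. derived from lexical similarity
    "movie(table)":       {"movie": 0.8, "actors": 0.1, "Avatar": 0.1},
    "actor(table)":       {"movie": 0.1, "actors": 0.8, "Avatar": 0.1},
    "movie.title(value)": {"movie": 0.1, "actors": 0.1, "Avatar": 0.8},
}

def viterbi(keywords):
    # V[t][s] = (best log-probability of reaching s at step t, best path)
    V = [{s: (math.log(start[s]) + math.log(emit[s][keywords[0]]), [s])
          for s in states}]
    for kw in keywords[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][p][0] + math.log(trans[p][s]) + math.log(emit[s][kw]),
                 V[-1][p][1])
                for p in states)
            row[s] = (prob, path + [s])
        V.append(row)
    return max(V[-1].values())[1]

print(viterbi(["movie", "Avatar", "actors"]))
# ['movie(table)', 'movie.title(value)', 'actor(table)']
```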
The HMM Matcher can present the computed top-k configurations to the user who,
in turn, selects the one that best represents the intended meaning of his/her query. This
allows the tool to reduce the number of queries to be generated (the ones related to the
configuration selected) and to train the HMM parameters.
Given a configuration, the different ways in which the data structures onto which the
keywords have been mapped can be connected need to be found (e.g., by discovering join
paths). A configuration alongside the join paths describes some possible semantics of the
whole keyword query, and can be expressed as a query in the native query language
of the source. Such queries are referred to as interpretations. As an example, the
configuration mentioned above, in which the keywords movie and actor are mapped onto
the tables movie and actor, respectively, and the keyword Avatar onto the title of
the table movie, may represent the actors that acted in the movie Avatar, or the movies
in which actors of the movie Avatar acted, etc., depending on the selected path and the
tables involved. The path computation is the main task of the Query Generator module.
Different strategies have been used in the literature to select the most prominent one,
or to provide an internal ranking based on different criteria, such as the length of the join
paths. KEYRY uses two criteria: one is based on the shortest path and the other uses
the HITS algorithm [4] to classify the relevance of the data structures involved in a path.
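A minimal sketch of the shortest-path criterion is shown below: a breadth-first search over a schema graph whose edges stand for joinable tables. The graph is a toy movie schema of our own, not one of the demonstration datasets.

```python
from collections import deque

# Our toy schema graph (not the demonstration datasets); edges stand for
# foreign-key joins. The shortest path gives the preferred join path.

schema_graph = {
    "movie":       ["movie_actor", "genre"],
    "actor":       ["movie_actor"],
    "movie_actor": ["movie", "actor"],
    "genre":       ["movie"],
}

def shortest_join_path(source, target):
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in schema_graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Connect the tables onto which "movie" and "actors" were mapped:
print(shortest_join_path("movie", "actor"))   # ['movie', 'movie_actor', 'actor']
```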
The tasks of generating the various configurations and subsequently the different
interpretations are supported by a set of auxiliary components such as the Metadata
Repository that is responsible for the maintenance of the metadata of the data source
structures alongside previously executed user queries. KEYRY also has a set of Wrappers
for managing the heterogeneity of the data sources. Wrappers are in charge of
extracting metadata from data sources and formulating the queries generated by
KEYRY in the native source languages.
3 Demonstration Highlights
In this demonstration we intend to illustrate the use of KEYRY and communicate
to the audience a number of important messages. The demonstration will consist
of a number of different application scenarios such as querying the IMDB database
(www.imdb.com), the DBLP collection (dblp.uni-trier.de) and the Mondial database
(http://www.dbis.informatik.uni-goettingen.de/Mondial). We will first show the behav-
ior of KEYRY without any training. We will explain how the metadata of the sources
is incorporated into our tool and how heuristic rules allow the computation of the main
HMM parameters. We will run a number of keyword queries against the above sources
and explain the results. These queries are carefully selected to highlight the way the tool
deals with the different possible mappings of the keywords to the database structures.
The participants will also have the ability to run their own queries. Furthermore, we will
consider the parameters obtained by different amounts of training and we will compare
the results to understand how the amount of training affects the final result.
The important goals and messages we would like to communicate to the partici-
pants through the demo are the following. First, we will demonstrate that keyword-based
search is possible even without prior access to the data instance, and is preferable to
formulating complex queries, which requires skilled users who know structured query
languages and how/where the data is represented in the data source. Second, we will show
that using a HMM is a successful approach in generating SQL queries that are good
approximations of the intended meaning of the keyword queries provided by the user.
Third, we will illustrate how previously posed queries are used to train the search en-
gine. In particular, we will show that the implicit feedback provided by the user se-
lecting an answer among the top-k returned by the system can be used for supervised
training. We will also demonstrate that, even in the absence of explicit user feedback,
the results computed by the tool may still be of high enough quality. We will finally
demonstrate that each user query may be associated with several possible interpretations,
which can be used to reveal the underlying database structure.
References
1. Bergamaschi, S., Domnori, E., Guerra, F., Lado, R.T., Velegrakis, Y.: Keyword search over
relational databases: a metadata approach. In: Sellis, T.K., Miller, R.J., Kementsietsidis, A.,
Velegrakis, Y. (eds.) SIGMOD Conference, pp. 565–576. ACM, New York (2011)
2. Bergamaschi, S., Guerra, F., Rota, S., Velegrakis, Y.: A Hidden Markov Model Approach to
Keyword-based Search over Relational Databases. In: De Troyer, O., et al. (eds.) ER 2011
Workshops. LNCS, vol. 6999, pp. 328–331. Springer, Heidelberg (2011)
3. Bourgeois, F., Lassalle, J.-C.: An extension of the Munkres algorithm for the assignment problem
to rectangular matrices. Communications of the ACM 14(12), 802–804 (1971)
4. Li, L., Shang, Y., Shi, H., Zhang, W.: Performance evaluation of HITS-based algorithms. In:
Hamza, M.H. (ed.) Communications, Internet, and Information Technology, pp. 171–176.
IASTED/ACTA Press (2002)
5. Seshadri, N., Sundberg, C.-E.: List Viterbi decoding algorithms with applications. IEEE
Transactions on Communications 42(2/3/4), 313–323 (1994)
VirtualEMF: A Model Virtualization Tool
1 Introduction
Complex systems are usually described by means of a large number of interre-
lated models, each representing a given aspect of the system at a certain ab-
straction level. Often, the system view a user needs does not correspond to a
single model, but is a cross-domain view in which the necessary information is
scattered in several models. This integrated view is provided by means of
model composition which is, in its simplest form, a modeling process that com-
bines two or more input (contributing) models into a single output (composed)
model. Model composition can be very challenging, due to the heterogeneous
nature of models and the complex relationships that can exist between them.
Composition has been extensively studied from various perspectives: its formal
semantics [2], composition languages [3], or also targeting different families of
models (UML [1], Statecharts [4], database models [5], ...). A commonality of
the vast majority of approaches is the fact that the composed model is generated
by copying/cloning information from its contributing models, which poses some
important limitations in terms of synchronization (updates in the composed
model are not propagated to the base ones, or the other way round), creation
time (copying many elements is time consuming, and composition must be re-executed
every time contributing models are modified), and memory usage (data
duplication can be a serious bottleneck when composing large models).
In this demo we present VirtualEMF: a model composition tool that allows
overcoming these limitations by applying the concept of virtual models, i.e.,
models that do not hold concrete data (as opposed to concrete models), but
that access and manipulate the original contributing data contained in other
models. The tool was built on top of Eclipse/EMF¹.
¹ Eclipse Modeling Framework: http://www.eclipse.org/modeling/emf/
Fig. 1. Model virtualization and the involved artefacts: a virtual composition model that
reads/writes the contributing models Ma and Mb through a correspondence model holding
their inter-model relationships, and that conforms to the composition metamodel
Fig. 1 introduces the main idea of model virtualization and the involved artefacts.
Tools (editors, analysis and transformation tools, ...) see and use the virtual
model as a normal model. The virtual model delegates modeling operations
to its set of contributing models, locating the referenced element(s), and translating
them into virtual elements to be used by the tool. Contributing elements are
composed at runtime, on an on-demand basis.
Contributing elements (and their properties) can be composed/translated into
virtual elements in different manners. Some may be filtered; others, simply reproduced.
Another possibility is when contributing elements are related to each
other and the virtual element is a combination of them (e.g. as in merge or
override relationships). A correspondence model links contributing elements and
identifies which translation rule should be used for composing each element.
A virtual composed model conforms to the same composition metamodel a
concrete composed model would. This composition metamodel states the core
concepts that can populate the composed model.
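The delegation idea can be sketched independently of EMF. In the following Python fragment (our illustration; VirtualEMF itself is built on Eclipse/EMF), a virtual element holds no data of its own and resolves each property access against the contributing models through a correspondence table; all class and model names are invented.

```python
# Our illustration of the delegation idea (VirtualEMF itself is built on EMF):
# a virtual element holds no data and resolves property accesses on demand
# against the contributing models through a correspondence table.

class VirtualElement:
    def __init__(self, virtual_model, element_ids):
        self._vm = virtual_model
        self._ids = element_ids           # contributing elements it stands for

    def get(self, prop):
        # resolved at access time from the contributing models (no copying)
        for model_id, elem_id in self._ids:
            value = self._vm.contributing[model_id][elem_id].get(prop)
            if value is not None:
                return value
        return None

    def set(self, prop, value):
        # updates go straight back to the owning contributing model
        model_id, elem_id = self._ids[0]
        self._vm.contributing[model_id][elem_id][prop] = value

class VirtualModel:
    def __init__(self, contributing, correspondence):
        self.contributing = contributing      # model_id -> {elem_id -> properties}
        self.correspondence = correspondence  # virtual id -> merged elements

    def element(self, vid):
        return VirtualElement(self, self.correspondence[vid])

uml = {"Client": {"name": "Client"}}
db = {"T_CLIENT": {"table": "T_CLIENT"}}
vm = VirtualModel({"uml": uml, "db": db},
                  {"Client": [("uml", "Client"), ("db", "T_CLIENT")]})
e = vm.element("Client")
print(e.get("name"), e.get("table"))   # Client T_CLIENT  -- nothing was copied
e.set("name", "Customer")              # the change lands in the UML model
print(uml["Client"]["name"])           # Customer
```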
VMab.virtualmodel
compositionMetamodel = {\MMab.ecore}
contributingMetamodels = {\MMa.ecore, \MMb.ecore}
contributingModels = {\Ma.xmi, \Mb.xmi}
correspondenceModel = {\MatoMb.amw}
Fig. 3. A virtual model (composition of a UML class model with a relational database
model, where the latter derives from the former and virtual associations are used to
display traceability links between them) handled in two different EMF tools: Sample
Ecore Editor (left) and MoDisco Model Browser (right)
4 Conclusion
Model virtualization is a powerful mechanism that provides a more efficient
model composition process, while maintaining perfect synchronization between
composition resources. This demo presents VirtualEMF³, our model virtualization
tool. The tool is extensible and supports different types of virtual links
and new semantics for them. As further work we intend to explore new types
of inter-model relationships, and to use state-of-the-art matching techniques to
automatically identify relationships and generate the correspondence model. Sev-
eral experiments have been conducted to prove the scalability of our solution.
References
1. Anwar, A., Ebersold, S., Coulette, B., Nassar, M., Kriouile, A.: A Rule-Driven Approach for composing Viewpoint-oriented Models. Journal of Object Technology 9(2), 89–114 (2010)
2. Herrmann, C., Krahn, H., Rumpe, B., Schindler, M., Völkel, S.: An Algebraic View on the Semantics of Model Composition. In: Akehurst, D.H., Vogel, R., Paige, R.F. (eds.) ECMDA-FA. LNCS, vol. 4530, pp. 99–113. Springer, Heidelberg (2007)
3. Kolovos, D., Paige, R., Polack, F.: Merging Models with the Epsilon Merging Language (EML). In: Wang, J., Whittle, J., Harel, D., Reggio, G. (eds.) MoDELS 2006. LNCS, vol. 4199, pp. 215–229. Springer, Heidelberg (2006)
4. Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., Zave, P.: Matching and Merging of Statecharts Specifications. In: ICSE 2007, pp. 54–64. IEEE Computer Society, Los Alamitos (2007)
5. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. The VLDB Journal 10, 334–350 (2001)

3 VirtualEMF website: www.emn.fr/z-info/atlanmod/index.php/VirtualEMF
Towards a Model-Driven Framework for Web
Usage Warehouse Development
Paul Hernández, Octavio Glorio, Irene Garrigós, and Jose-Norberto Mazón
Web usage analysis is the process of finding out what users are looking for on
the Internet. This information is extremely valuable for understanding how a user
walks through a website, thus supporting the decision-making process.
Commercial tools for Web usage data analysis have some drawbacks: (i) significant
limitations in performing advanced analytical tasks, (ii) little help in understanding
the navigational patterns of users, (iii) inability to integrate and correlate information
from different sources, and (iv) unawareness of the conceptual schema of the
application. For example, one of the best-known analysis tools is Google Analytics
(http://www.google.com/analytics), which has emerged as a major solution for Web
traffic analysis, but it has limited drill-down capability and no way of storing data
efficiently. Worse still, the user does not own the data; Google does.
There are several approaches [3,4] that define a multidimensional schema in order
to analyze Web usage by using Web log data. With these approaches, once the data
is structured, it is possible to use OLAP or data mining techniques to analyze the
content of the Web logs, tackling the aforementioned problems. However, there is a
lack of agreement about a methodological approach for detecting the most appropriate
facts and dimensions: some approaches let the analysts decide the required
multidimensional elements, while others derive these elements from a specific Web
log format. Therefore, the main problem is that the multidimensional elements are
informally chosen according to a specific format, so the resulting multidimensional
schema depends on the considered Web log format.
This work has been partially supported by the following projects: SERENIDAD
(PEII-11-0327-7035) from Castilla-La Mancha Ministry, and MESOLAP (TIN2010-
14860) from the Spanish Ministry of Education and Science.
In our first scenario (Web usage warehouse within model-driven Web engineering),
several conceptual models have to be defined when designing a website (navigation
model, user model, data model, etc.). However, none of these models is intended to
represent and understand Web usage. Therefore, multidimensional concepts (facts,
dimensions, hierarchies, etc.) should be identified within the conceptual models of a
given application in order to build a Web usage warehouse in an integrated and
structured manner. Because the conceptual models of websites may be unavailable or
out of date, in our second scenario (Web usage warehouse from Web log data) a Web
usage warehouse is developed without requiring these conceptual models, but using
Web log files. To this aim, a Web log metamodel is defined which contains the
elements and the semantics that allow building a conceptual model from Web log
files; this model represents, in a static way, the interaction between raw data
elements (e.g., the client remote address) and usage concepts (e.g., session, user).
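As an illustration of the kind of mapping such a metamodel captures (the class and field names below are ours, not the authors' metamodel), the following Java sketch parses Common Log Format entries and groups hits from the same remote address into sessions using an inactivity threshold:

import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.regex.*;

// Illustrative only: maps raw Web log fields (remote address, timestamp, request)
// onto usage concepts (user, session) via a simple inactivity-based sessionization.
public class WebLogSessions {
    record Hit(String remoteAddr, long epochSeconds, String request) {}
    record Session(String user, List<Hit> hits) {}

    // Minimal Common Log Format: host ident authuser [timestamp] "request" status bytes
    private static final Pattern LINE =
        Pattern.compile("^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]+)\" \\d+ \\S+$");
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static List<Session> sessionize(List<String> lines, long maxGapSeconds) {
        Map<String, Session> open = new LinkedHashMap<>();
        List<Session> closed = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) continue;                        // skip malformed entries
            long t = OffsetDateTime.parse(m.group(2), TS).toEpochSecond();
            Hit hit = new Hit(m.group(1), t, m.group(3));
            Session s = open.get(hit.remoteAddr());
            if (s == null || t - lastHit(s) > maxGapSeconds) { // new session after inactivity
                if (s != null) closed.add(s);
                s = new Session(hit.remoteAddr(), new ArrayList<>());
                open.put(hit.remoteAddr(), s);
            }
            s.hits().add(hit);
        }
        closed.addAll(open.values());
        return closed;
    }

    private static long lastHit(Session s) {
        return s.hits().get(s.hits().size() - 1).epochSeconds();
    }

    public static void main(String[] args) {
        List<Session> sessions = sessionize(List.of(
            "10.0.0.1 - - [01/Jan/2011:10:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 1024",
            "10.0.0.1 - - [01/Jan/2011:10:05:00 +0000] \"GET /catalog HTTP/1.1\" 200 2048"),
            30 * 60);
        sessions.forEach(s -> System.out.println(s.user() + ": " + s.hits().size() + " hits"));
    }
}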
References
1. Ceri, S., Fraternali, P., Bongio, A.: Web modeling language (WebML): a modeling language for designing web sites. Computer Networks 33(1-6), 137–157 (2000)
2. Garrigós, I.: A-OOH: Extending web application design with dynamic personalization (2008)
3. Joshi, K.P., Joshi, A., Yesha, Y.: On using a warehouse to analyze web logs. Distributed and Parallel Databases 13(2), 161–180 (2003)
4. Lopes, C.T., David, G.: Higher education web information system usage analysis with a data webhouse. In: Gavrilova, M.L., Gervasi, O., Kumar, V., Tan, C.J.K., Taniar, D., Laganà, A., Mun, Y., Choo, H. (eds.) ICCSA 2006. LNCS, vol. 3983, pp. 78–87. Springer, Heidelberg (2006)
CRESCO: Construction of Evidence Repositories
for Managing Standards Compliance
1 Introduction
Safety-critical systems are typically subject to safety certification based on
recognized safety standards as a way to ensure that these systems do not pose undue
risks to people, property, or the environment. A key prerequisite for demonstrating
compliance with safety standards is collecting structured evidence in support of
safety claims. Standards are often written in natural language and are open to
subjective interpretation. This makes it important to develop a precise and explicit
interpretation of the evidence requirements of a given standard. In previous work
[4,3], we have proposed conceptual modeling for formalizing the evidence
requirements of safety standards. This approach, on the one hand, helps develop a
shared understanding of the standards and, on the other hand, provides a basis for
the automation of various evidence collection and management tasks.
In this paper, we describe CRESCO, a flexible tool infrastructure for creating
repositories to store, query, and manipulate standards compliance evidence.
Additionally, CRESCO generates a web-based user interface for interacting with
these repositories. Our work was prompted by a need observed during our
collaboration with companies requiring IEC 61508 compliance. In particular, we
observed that little infrastructure support has been developed to date for managing
safety evidence based on a specific standard. This issue has
also been noted in the literature as an important gap in the safety certification
process [2,5]. While CRESCO is general and can be used in conjunction with
different standards, we ground our discussion in this paper on IEC 61508, which is
a key standard for safety certification of programmable electronic systems.
In the rest of this paper, we describe the key components of CRESCO. For the
actual demonstration, we will follow, and expand where necessary, our presentation
in this paper. Specifically, the demo will begin with motivational material similar to
this paper's introduction, augmented with snippets of the conceptual model in [4].
We then describe the overall architecture of the tool, as shown in Figure 1. In the
next step, we will use a combination of pre-recorded and live demonstrations to
illustrate the main functions of the tool, discussed in Section 2. Finally, as we
outline in Section 3, our demonstration will provide information about the tool's
implementation based on our existing documentation [1], and give details about
availability.
2 Tool Overview
Users can interact with CRESCO in two roles: administrator and general user.
The administrator is responsible for importing the conceptual model into CRESCO,
running the transformations, and setting up and starting the web server. Once the
server is started, the general users, typically experts from the supplier company or
certification body, can add, view, and manipulate the data in the database. In this
section we provide an overview of the main components of CRESCO, as shown in
Figure 1(a).
Fig. 1. (a) The CRESCO components: a model-to-model transformation maps a UML class diagram (conforming to the UML metamodel) to an ORM model (conforming to the ORM metamodel), a model-to-text transformation generates the user interface, server-side scripts, and persistence layer, and a consistency checker verifies the stored data. (b) Processing of a user request: the browser sends a request to the web server, which passes it through the view layer, the server-side scripts of the logic layer, and the persistence layer, and returns a dynamic HTML page built from the result set.
The generation of the database and the user interface code involves two steps: a
model-to-model (M2M) transformation and a model-to-text (M2T) transforma-
tion. The M2M transformation takes as input a conceptual model of a standard
in the form of a UML class diagram [6]. This model can be created in any UML
tool and then imported into CRESCO. An in-depth description of the conceptual
model is beyond the scope of this demonstration paper; further details are available
in [4]. The M2M transformation makes use of the UML meta-model [6] and a
meta-model for an object-relational mapping (ORM) that we have created (see [1]).
This ORM meta-model enables the storage, in a relational database, of objects that
have been created based on a UML class diagram. The ORM meta-model includes a
database schema (with tables, columns, foreign keys, etc.) and object-oriented
concepts, mainly generalization. The M2M transformation iterates over the
conceptual model and transforms it into a model that corresponds to the ORM
meta-model.
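The following Java fragment sketches the spirit of such a model-to-model step (it is not CRESCO's implementation; the metamodel classes are simplified stand-ins): each UML class becomes a table, each attribute a column, and a generalization is recorded as a foreign key to the parent table.

import java.util.*;

// Simplified stand-ins for the UML and ORM metamodel elements handled by the M2M step.
public class UmlToOrm {
    record UmlAttribute(String name, String type) {}
    record UmlClass(String name, List<UmlAttribute> attributes, String superClass) {}
    record Column(String name, String sqlType, String referencesTable) {}
    record Table(String name, List<Column> columns) {}

    static List<Table> transform(List<UmlClass> model) {
        List<Table> tables = new ArrayList<>();
        for (UmlClass c : model) {
            List<Column> cols = new ArrayList<>();
            cols.add(new Column("id", "BIGINT", null));                       // surrogate key
            for (UmlAttribute a : c.attributes()) {
                cols.add(new Column(a.name(), mapType(a.type()), null));
            }
            if (c.superClass() != null) {                                     // generalization ->
                cols.add(new Column("parent_id", "BIGINT", c.superClass()));  // FK to parent table
            }
            tables.add(new Table(c.name(), cols));
        }
        return tables;
    }

    static String mapType(String umlType) {
        return switch (umlType) {
            case "String" -> "VARCHAR(255)";
            case "Integer" -> "INTEGER";
            case "Boolean" -> "BOOLEAN";
            default -> "VARCHAR(255)";
        };
    }

    public static void main(String[] args) {
        List<UmlClass> conceptual = List.of(
            new UmlClass("Agent", List.of(new UmlAttribute("role", "String")), null),
            new UmlClass("Organization", List.of(new UmlAttribute("name", "String")), "Agent"));
        transform(conceptual).forEach(t -> System.out.println(t.name() + " -> " + t.columns()));
    }
}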
The user interface is generated from the ORM model created during the M2M
transformation. The M2T transformation iterates over the elements of the ORM
model and generates the database implementation as well as all the code for
accessing and updating the database via a web interface. The generated code is a
combination of server-side Java code and HTML (see Section 3). Figure 1(b) shows
how the user interaction is processed via the generated code.
Fig. 2. CRESCO User Interface
Figure 2 shows the generated user interface. The left-hand pane lists all the tables
that have been generated, and the right-hand pane is used to manipulate the rows of
a selected table; the New button adds a new row to the selected table. Figure 2
shows the table for the concept of Agent, an entity that carries out an activity
required during system development. An Activity is a unit of behavior with specific
input and output Artifacts. Each activity utilizes certain Techniques to arrive at its
desired output and requires a certain kind of Competence from the agents
performing it. The agent itself can be either an individual or an organization and is
identified by the type of role it plays. In CRESCO, one can: (1) create instances of
concepts such as Agent, Activity, and Artifact, (2) fill out their attributes, and (3)
establish the links between the concept instances. For illustration, we show in
Figure 2 the addition of an agent record into the Agent table.
CRESCO deliberately does not enforce the conceptual model's constraints at
data-entry time, as doing so would impose an unnecessary order on how the evidence
items have to be entered. While our choice allows more freedom for the user when
adding entries to the database, it also calls for the implementation of a consistency
checker to verify that the data in the database is in accordance with the constraints
defined in the UML class diagram. For example, an Activity must have at least one
Agent who is responsible for carrying out this activity (defined in the
Agentcarriesoutactivity table shown in Figure 2). Such constraints are checked by
CRESCO's consistency checker, and any violations are highlighted to the user for
further investigation.
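A minimal sketch of this kind of check, in plain Java over in-memory stand-ins rather than CRESCO's generated database code (the table contents and names are our own illustration), could look as follows:

import java.util.*;

// Illustrative consistency check: every Activity must be linked to at least one Agent.
// The data structures below are simplified stand-ins for the generated tables.
public class ConsistencyChecker {
    public static void main(String[] args) {
        List<String> activities = List.of("HazardAnalysis", "CodeReview", "UnitTesting");
        // rows of the Agent-carries-out-Activity link table: activity -> responsible agent
        Map<String, String> carriesOut = Map.of(
            "HazardAnalysis", "SafetyEngineer",
            "CodeReview", "Developer");

        List<String> violations = activities.stream()
            .filter(a -> !carriesOut.containsKey(a))      // no responsible agent recorded
            .toList();

        if (violations.isEmpty()) {
            System.out.println("All activities have a responsible agent.");
        } else {
            // Violations are reported to the user rather than rejected outright,
            // since evidence may legitimately be entered in any order.
            violations.forEach(a -> System.out.println("Missing agent for activity: " + a));
        }
    }
}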
References
1. Knutsen, T.: Construction of information repositories for managing standards compliance evidence. Master's thesis, University of Oslo (2011), http://vefur.simula.no/~rpanesar/cresco/knutsen.pdf
2. Lewis, R.: Safety case development as an information modelling problem. In: Safety-Critical Systems: Problems, Process and Practice, pp. 183–193. Springer, Heidelberg (2009)
3. Panesar-Walawege, R.K., Sabetzadeh, M., Briand, L.: Using UML profiles for sector-specific tailoring of safety evidence information. In: De Troyer, O., et al. (eds.) ER 2011 Workshops. LNCS, vol. 6999. Springer, Heidelberg (2011)
4. Panesar-Walawege, R.K., Sabetzadeh, M., Briand, L., Coq, T.: Characterizing the chain of evidence for software safety cases: A conceptual model based on the IEC 61508 standard. In: ICST 2010, pp. 335–344 (2010)
5. Redmill, F.: Installing IEC 61508 and supporting its users – nine necessities. In: 5th Australian Workshop on Safety Critical Systems and Software (2000)
6. UML 2.0 Superstructure Specification (August 2005)
Modeling Approach for Business Networks with
an Integration and Business Perspective
1 Introduction
Nowadays, enterprises participate in value chains of suppliers and customers and
compete in business networks rather than in isolation. To remain competitive, e.g.,
by quickly implementing new process variants or proactively identifying flaws within
existing processes, enterprises need visibility into their business network, the relevant
applications, and the processes. Business Network Management (BNM) is the
capability of managing intra- and inter-enterprise networks.
3 Conclusions
In this paper, we presented the design decisions behind using BPMN for a network
domain, BNM. We discussed the requirements for a model in this domain and showed
that a network model can be based on BPMN 2.0 while extending it with
network-specific entities and attributes. The specific extensions to BPMN 2.0 are
discussed in detail in [2].
References
1. Owen, M., Stuecka, R.: BPMN und die Modellierung von Geschäftsprozessen. Whitepaper, Telelogic (2006)
2. Ritter, D., Ackermann, J., Bhatt, A., Hoffmann, F.O.: Building a Business Graph System and Network Integration Model based on BPMN. In: 3rd International Workshop on BPMN, Luzern (accepted, 2011)
3. Specification of Business Process Modeling Notation version 2.0 (BPMN 2.0), http://www.omg.org/spec/BPMN/2.0/PDF
4. Service Component Architecture (SCA), http://www.osoa.org/display/Main/Service+Component+Architecture+Home
5. Unified Modeling Language (UML), http://www.uml.org/
Mosto: Generating SPARQL Executable
Mappings between Ontologies
1 Introduction
2 Mosto
In this section, we present Mosto, our tool to perform the data translation task
between two OWL ontologies by means of SPARQL executable mappings. Per-
forming this task in our tool comprises four steps, namely: 1) Selecting source
and target ontologies; 2) Specifying restrictions and correspondences; 3) Gen-
erating SPARQL executable mappings; and 4) Executing SPARQL executable
mappings. These steps are described in the rest of the section.
The first step deals with the selection of the source and target ontologies to be
integrated. In our demo scenario (cf. Figure 1), we integrate DBpedia ontology v3.2
with DBpedia ontology v3.6, which are shown in a tree-based notation in which
classes, data properties, and object properties are represented by circles, squares,
and pentagons, respectively. Note that subclasses are represented between brackets,
e.g., the fact that dbp:Artist is a subclass of dbp:Person is represented as
dbp:Artist [dbp:Person]. In addition, the domain of a property is represented by
nesting the property in a class, and the range is represented between < and >, e.g.,
the domain of dbp:director is dbp:Film and its range is dbp:Person. After selecting
the ontologies, Mosto extracts a number of implicit restrictions, which are restrictions
due to the modelling language of the source and target ontologies, in our case the
OWL ontology language.
In the second step, the user specifies explicit restrictions in the source and target
ontologies, and correspondences between them. Explicit restrictions are necessary to
adapt existing ontologies to the requirements of a specific scenario.
(Figure excerpt: correspondences V6–V9 relate dbp:Film [dbp:Work] and its properties dbp:starring <dbp:Person>, dbp:imdbId <xsd:string>, and dbp:director <dbp:Person> in the source and target ontologies.)
Resulting SPARQL executable mappings:

M1 // Correspondence V1
CONSTRUCT {
  ?p rdf:type dbp:Person .
} WHERE {
  ?p rdf:type dbp:Person .
}

M2 // Correspondence V4
CONSTRUCT {
  ?w rdf:type dbp:Award .
  ?a rdf:type dbp:Person .
  ?a rdf:type dbp:Artist .
  ?a rdf:type dbp:Actor .
  ?a dbp:academyAward ?w .
} WHERE {
  ?a dbp:academyawards ?w .
  ?a rdf:type dbp:Person .
  ?a rdf:type dbp:Artist .
  ?a rdf:type dbp:Actor .
}

M3 // Correspondence V8
CONSTRUCT {
  ?w rdf:type dbp:Work .
  ?p rdf:type dbp:Person .
  ?w dbp:starring ?p .
} WHERE {
  ?w dbp:starring ?p .
  ?w rdf:type dbp:Work .
  ?w rdf:type dbp:Film .
  ?p rdf:type dbp:Person .
}
Finally, in the fourth step, Mosto is able to perform the data translation task
by executing the previously generated SPARQL executable mappings over the
source ontology to produce instances of the target ontology. Note that, thanks
to our SPARQL executable mappings, we are able to automatically translate the
data from a previous version of an ontology to a new version.
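For readers who want to reproduce this step outside Mosto, the following hedged Java sketch uses Apache Jena (our choice for illustration; Mosto's internals may differ) to execute a CONSTRUCT mapping such as M3 over a source model and collect the translated triples in a target model. The input file name and the dbp: namespace URI are assumptions.

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

// Illustrative data-translation step: execute a SPARQL executable mapping (a CONSTRUCT
// query) over the source ontology instances and add the result to the target model.
public class ExecuteMapping {
    public static void main(String[] args) {
        Model source = RDFDataMgr.loadModel("dbpedia32-instances.ttl"); // assumed input file
        Model target = ModelFactory.createDefaultModel();

        String m3 =
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
            "PREFIX dbp: <http://dbpedia.org/ontology/> " +
            "CONSTRUCT { ?w rdf:type dbp:Work . ?p rdf:type dbp:Person . ?w dbp:starring ?p . } " +
            "WHERE { ?w dbp:starring ?p . ?w rdf:type dbp:Work . " +
            "        ?w rdf:type dbp:Film . ?p rdf:type dbp:Person . }";

        Query query = QueryFactory.create(m3);
        try (QueryExecution qe = QueryExecutionFactory.create(query, source)) {
            target.add(qe.execConstruct());   // translated instances for the target ontology
        }
        RDFDataMgr.write(System.out, target, Lang.TURTLE);
    }
}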
3 The Demo
In this demo, ER attendees will have the opportunity to use Mosto to test the
automatic generation of SPARQL executable mappings using our demo scenario,
which integrates different versions of the DBpedia ontology. We will show how the
addition or removal of correspondences and restrictions affects the resulting
executable mappings. Furthermore, we will perform the data translation task using
these mappings, and check whether the resulting target data are as expected. The
expected evidence in our demo scenario is the following: 1) the time to generate
executable mappings is less than one second; 2) Mosto facilitates the specification of
restrictions and correspondences in complex scenarios; and 3) the resulting target
data are coherent with the expected results.
References
1. Alexe, B., Chiticariu, L., Miller, R.J., Tan, W.C.: Muse: Mapping understanding and design by example. In: ICDE, pp. 10–19 (2008)
2. Bernstein, P.A., Melnik, S.: Model management 2.0: manipulating richer mappings. In: SIGMOD, pp. 1–12 (2007)
3. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. J. Web Sem. (2009)
4. Euzenat, J., Shvaiko, P.: Ontology matching. Springer, Heidelberg (2007)
5. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
6. Haas, L.M., Hernández, M.A., Ho, H., Popa, L., Roth, M.: Clio grows up: from research prototype to industrial tool. In: SIGMOD, pp. 805–810 (2005)
7. Motik, B., Horrocks, I., Sattler, U.: Bridging the gap between OWL and relational databases. J. Web Sem. 7(2), 74–89 (2009)
8. Petropoulos, M., Deutsch, A., Papakonstantinou, Y., Katsis, Y.: Exporting and interactively querying web service-accessed sources: The CLIDE system. ACM Trans. Database Syst. 32(4) (2007)
9. Raffio, A., Braga, D., Ceri, S., Papotti, P., Hernández, M.A.: Clip: a tool for mapping hierarchical schemas. In: SIGMOD, pp. 1271–1274 (2008)
10. Ressler, J., Dean, M., Benson, E., Dorner, E., Morris, C.: Application of ontology translation. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 830–842. Springer, Heidelberg (2007)
11. Rivero, C.R., Hernández, I., Ruiz, D., Corchuelo, R.: Generating SPARQL executable mappings to integrate ontologies. In: Jeusfeld, M., et al. (eds.) ER 2011. LNCS, vol. 6998, pp. 118–131. Springer, Heidelberg (2011)
12. Shadbolt, N., Berners-Lee, T., Hall, W.: The semantic web revisited. IEEE Int. Sys. 21(3), 96–101 (2006)
The CSTL Processor: A Tool for Automated Conceptual
Schema Testing
1 Introduction
Two fundamental quality properties of conceptual schemas are correctness (i.e. all the
defined knowledge is true for the domain) and completeness (i.e. all the relevant
knowledge is defined in the conceptual schema) [1]. The validation of these properties
is still an open challenge in conceptual modeling [2].
In [3], we proposed a novel environment for testing conceptual schemas. The main
purpose of conceptual schema testing is the validation of conceptual schemas according
to stakeholders' needs and expectations. Conceptual schemas can be tested if (1) the
conceptual schema is specified in an executable form, and (2) a representative set of
concrete scenarios is formalized as test cases.
In this paper, we present the CSTL Processor [4], a tool that supports the execution
of test sets written in the Conceptual Schema Testing Language (CSTL) [3]. The
CSTL Processor makes the proposed testing environment feasible in practice.
This tool may be used in different application contexts in which UML/OCL
conceptual schemas may be tested. The CSTL Processor supports test-last validation
(in which correctness and completeness are checked by testing after the schema
definition) or test-first development of conceptual schemas (in which the elicitation
and definition are driven by a set of test cases). Our testing environment proposes the
use of automated tests [5], which include assertions about the expected results that
may be automatically checked. This is an essential feature in order to allow regression
testing [5].
In the next section, we present the CSTL Processor and its components. In
Section 3, we reference example applications that will be illustrated in the
demonstration, as well as complementary case studies, tutorials, and documentation.
The test processor implements the execution of the test cases and consists of the
presentation manager, the test manager and the test interpreter.
The test manager stores the CSTL programs in order to make it possible to execute
the test set at any time. When the conceptual modeler requests the execution of the
test programs, the test manager requests the test interpreter to execute them. The test
manager also keeps track of the test results and maintains test statistics.
The test interpreter reads and executes CSTL programs. For each test case, the
interpreter sets up the common fixture (if any), executes the test case's statements,
and computes a verdict. The interpreter invokes the services of the information
processor to build the IB states required by each test case and to check the
specified assertions.
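As a rough analogy of that fixture/verdict workflow (plain Java, not CSTL syntax, and abstracting away the information processor and IB states), a test case can be modelled as a fixture plus an assertion, with the interpreter computing one verdict per case:

import java.util.*;
import java.util.function.*;

// Rough analogy of a test interpreter: set up the common fixture, evaluate the
// assertion against the resulting state, and compute a pass/fail verdict per test case.
public class MiniTestInterpreter {
    record TestCase(String name, Consumer<Map<String, Object>> fixture,
                    Predicate<Map<String, Object>> assertion) {}

    static Map<String, String> run(List<TestCase> tests) {
        Map<String, String> verdicts = new LinkedHashMap<>();
        for (TestCase t : tests) {
            Map<String, Object> state = new HashMap<>();   // stand-in for the IB state
            t.fixture().accept(state);
            verdicts.put(t.name(), t.assertion().test(state) ? "PASS" : "FAIL");
        }
        return verdicts;
    }

    public static void main(String[] args) {
        TestCase orderHasCustomer = new TestCase(
            "orderHasCustomer",
            state -> state.put("order:1/customer", "customer:7"),
            state -> state.containsKey("order:1/customer"));
        run(List.of(orderHasCustomer)).forEach((n, v) -> System.out.println(n + ": " + v));
    }
}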
The presentation manager provides means for writing CSTL test programs and for
displaying the results of their execution. Built-in editors are provided for the
definition of the conceptual schema, its methods, and the test programs. Moreover,
after each execution of the test set, the verdicts of the test programs are displayed.
Figure 2 shows a screenshot of the verdicts presentation screen. The test processor
indicates the line numbers where test cases have failed and gives an explanation of
the failure in natural language.
The preprocessor initializes the coverage database, which maintains the set of
covered elements for each test adequacy criterion. The test interpreter communicates
information about the test execution to the adequacy criteria analyzer, which is
responsible for updating the coverage database according to the defined criteria.
After the execution of all test programs, the adequacy criteria analyzer queries the
coverage database in order to obtain the sets of covered and uncovered elements for
each criterion and computes statistical information about the coverage results.
References
1. Lindland, O.I., Sindre, G., Sølvberg, A.: Understanding Quality in Conceptual Modeling. IEEE Software 11(2), 42–49 (1994)
2. Olivé, A.: Conceptual Modeling of Information Systems. Springer, Berlin (2007)
3. Tort, A., Olivé, A.: An approach to testing conceptual schemas. Data Knowl. Eng. 69(6), 598–618 (2010)
4. Tort, A.: The CSTL Processor project website, http://www.essi.upc.edu/~atort/cstlprocessor
5. Janzen, D., Saiedian, H.: Test-Driven Development: Concepts, Taxonomy, and Future Direction. Computer 38(9), 43–50 (2005)
6. Gogolla, M., Bohling, J., Richters, M.: Validating UML and OCL Models in USE by Automatic Snapshot Generation. Software & Systems Modeling 4(4), 386–398 (2005)
7. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Quarterly, 75–105 (2004)
8. Tort, A.: Testing the osCommerce Conceptual Schema by Using CSTL. Research Report UPC (2009), http://hdl.handle.net/2117/6289
9. Tort, A.: Development of the conceptual schema of a bowling game system by applying TDCM. Research Report UPC (2011), http://hdl.handle.net/2117/11196
10. Tort, A.: Development of the conceptual schema of the osTicket system by applying TDCM. Research Report UPC (2011), http://hdl.handle.net/2117/12369
A Tool for Filtering Large Conceptual Schemas
1 Introduction
The conceptual schemas of many real-world information systems are too large to
be easily managed or understood. There are many information system development
activities in which people need to obtain a piece of the knowledge contained in a
conceptual schema. For example, a conceptual modeler needs to check with a domain
expert that the knowledge is correct, a database designer needs to implement that
knowledge in a relational database, a software tester needs to write tests checking
that the knowledge has been correctly implemented in the system components, or a
member of the maintenance team needs to change that knowledge. Currently, there
is a lack of computer support to make conceptual schemas usable for the goal of
knowledge extraction.
Information filtering [3] is a rapidly evolving field that handles large information
flows. The aim of information filtering is to expose users only to information that is
relevant to them. We present an interactive tool in which the user specifies one or
more concepts of interest and the tool automatically provides a (smaller) subset of
the knowledge contained in the conceptual schema that is likely to be relevant. The
user may then start another interaction with different concepts, until she has
obtained all the knowledge of interest. We presented the theoretical background
behind this tool in [4,5].
The first step consists in preparing the information required to filter the large
schema according to the specific needs of the user. Basically, the user focuses on a
set of entity types she is interested in, and our method surrounds them with
additional related knowledge from the large schema. Therefore, it is mandatory for
the user to select a non-empty initial focus set of entity types of interest.
During the second step, our method computes the metrics required to
automatically select the most interesting entity types to extend the knowledge
selected in the focus set of the first step. The main goal of these metrics is to
discover those entity types that are relevant in the schema but also close (in terms
of structural distance over the schema) to the entity types of the focus set. We
presented a detailed definition of such metrics in [4].
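To give a feel for this kind of scoring (the formula below is our own simplification, not the metric defined in [4]; the schema fragment and importance values are made up), entity types can be ranked by combining an importance value with their breadth-first distance from the focus set:

import java.util.*;

// Illustrative ranking of entity types: combine a (precomputed) importance score with
// closeness to the focus set, measured as shortest-path distance over the schema graph.
public class SchemaFilterSketch {
    static Map<String, Integer> distances(Map<String, List<String>> schemaGraph, Set<String> focus) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(focus);
        focus.forEach(f -> dist.put(f, 0));
        while (!queue.isEmpty()) {                              // breadth-first search from the focus set
            String e = queue.poll();
            for (String n : schemaGraph.getOrDefault(e, List.of())) {
                if (!dist.containsKey(n)) { dist.put(n, dist.get(e) + 1); queue.add(n); }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "Patient", List.of("SubjectOfActAppointment", "Person"),
            "SubjectOfActAppointment", List.of("Patient", "ActAppointment"),
            "ActAppointment", List.of("SubjectOfActAppointment", "Act"),
            "Act", List.of("ActAppointment"),
            "Person", List.of("Patient"));
        Map<String, Double> importance = Map.of(
            "Act", 0.9, "Person", 0.8, "SubjectOfActAppointment", 0.3);
        Set<String> focus = Set.of("Patient", "ActAppointment");

        Map<String, Integer> dist = distances(graph, focus);
        Map<String, Double> score = new TreeMap<>();
        for (String e : graph.keySet()) {
            if (focus.contains(e)) continue;
            double imp = importance.getOrDefault(e, 0.1);
            int d = dist.getOrDefault(e, Integer.MAX_VALUE / 2);
            score.put(e, imp / (1 + d));                        // important and close -> high score
        }
        score.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(en -> System.out.printf("%s -> %.2f%n", en.getKey(), en.getValue()));
    }
}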
Finally, the last step receives the set of most interesting entity types selected
in the previous step and puts it together with the entity types of the focus set
in order to create a filtered conceptual schema with the entity types of both
sets. The main goal of this step consists in filtering information from the original
schema involving entity types in the filtered schema. To achieve this goal,
the method explores the relationships and generalizations/specializations in the
original schema that are defined between those entity types and includes them
in the filtered schema to obtain a connected schema.
Figure 2 depicts the main components that participate in a user request to our
filtering tool. The user writes in the search field the names of the entity types she
is interested in. Our web client automatically suggests names while the user is
writing, to simplify that task and to help her discover additional entity types. In
the example of Fig. 2 the user focuses on the entity types ActAppointment and
Patient. Once the request is complete, our web client processes the focus set and
uses the filtering API of the web service through a SOAP call. The web service
analyses the request and constructs the related filtered conceptual schema following
the filtering process described in the previous section.
Figure 3 shows the components of the response. The reduced schema produced
by our web service is an XMI file containing 8 entity types. In order to increase the
understandability of the schema, we make use of an external service
(http://yuml.me) to transform the filtered schema from a textual representation into
a graphical one. As a result, the user can rapidly comprehend from the schema of
Fig. 3 that SubjectOfActAppointment connects the entity types ActAppointment
and Patient, which means that in the HL7 V3 schemas a patient is the subject of a
medical appointment. Subsequently, the user can start the cycle again with a new
request if required.
4 Summary
We have presented a tool that assists users in dealing with large conceptual
schemas: it allows them to focus on a set of entity types of interest and automatically
obtains a reduced view of the schema in connection with that focus. Our
implementation as a web service provides interoperability and simplifies the
interaction with users. A preliminary version of the filtering tool can be found at
http://gemece.lsi.upc.edu/filter.
Our immediate plans include improving the tool by adding a more dynamic view
of the filtered schema instead of the static image obtained from the present external
service. To this end, we are introducing more interactive features such as the
selection of schema elements and new filtering interactions starting from that
selection.
References
1. Beeler, G.W.: HL7 version 3 – an object-oriented methodology for collaborative standards development. International Journal of Medical Informatics 48(1-3), 151–161 (1998)
2. Gogolla, M., Büttner, F., Richters, M.: USE: A UML-based specification environment for validating UML and OCL. Science of Computer Programming (2007)
3. Hanani, U., Shapira, B., Shoval, P.: Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction 11(3), 203–259 (2001)
4. Villegas, A., Olivé, A.: A method for filtering large conceptual schemas. In: Parsons, J., Saeki, M., Shoval, P., Woo, C., Wand, Y. (eds.) ER 2010. LNCS, vol. 6412, pp. 247–260. Springer, Heidelberg (2010)
5. Villegas, A., Olivé, A., Vilalta, J.: Improving the usability of HL7 information models by automatic filtering. In: IEEE 6th World Congress on Services, pp. 16–23 (2010)
Preface to the Industrial Track
The aim of the ER'11 Industrial Track was to serve as a forum for high-quality pre-
sentations on innovative commercial software, systems, and services for all facets of
conceptual modeling methodologies and technologies as described in the list of topics
of the ER 2011 conference. We strongly believe that bringing together researchers and
practitioners is important for the progress and success of research on conceptual mod-
eling. We do hope that this track will become stronger year by year and will serve as an
excellent opportunity to discuss current practices and modern and future market trends
and needs.
The 2011 edition formed two interesting sessions on advanced and novel conceptual
model applications.
The first session included three papers on business intelligence applications. The first
two papers present tools that assist the data warehouse designer. QBX is a CASE tool that
facilitates data mart design and deployment. TARGIT BI Suite assists the designer to
add associations between measures and dimensions to a traditional multidimensional
cube model and facilitates a process where users are able to ask questions to a busi-
ness intelligence system without the constraints of a traditional system. The third paper
presents a tool that implements an entity resolution method for topic-centered expert
identification based on bottom-up mining of online sources.
The second session included three papers on emerging industrial applications of con-
ceptual modeling. The first paper presents a model-driven solution toward the provision
of secure messaging capabilities to the financial services industry through the stan-
dardization of message flows between industry players. The second paper presents the
underlying scientific theories, methodology, and software technology to meet the re-
quirements of high quality technical documentation. The third paper presents a real
case of using business semantics management for integrating and publishing research
information on an innovation information portal.
We hope that you will enjoy the industrial track proceedings and find useful infor-
mation and motives to extend your research to new horizons. We would like to express
our gratitude to all authors who submitted papers and talk proposals, the members of
the program committee for their help and efforts in organizing this track, and the ER
2011 organizing committee and ER steering committee for all their support.
Abstract. QBX is a CASE tool for data mart design resulting from a
close collaboration between academia and industry. It supports designers
during conceptual design, logical design, and deployment of ROLAP data
marts in the form of star/snowflake schemata, and it can also be used
by business users to interactively explore project-related knowledge at
different levels of abstraction. We will demonstrate QBX functionalities
focusing on both forward and reverse engineering scenarios.
The continuous market evolution and the increasing competition among companies
push organizations to improve their ability to foresee customer demand and create
new business opportunities. In this direction, data warehouses have become an
essential element for strategic analyses. However, data warehouse systems are
characterized by a long and expensive development process that hardly meets the
ambitious requirements of today's market. This is one of the main causes behind the
low penetration of data warehouse systems in small and medium-sized firms, and
even behind the failure of whole projects [4].
A data warehouse is built incrementally by designing and implementing one data
mart at a time, so one of the directions for increasing the efficiency of the data
warehouse development process is to automate the design of single data marts.
Several techniques for automating some phases of data mart design have been
proposed in the literature (e.g., [2] for conceptual design, [5] for logical design,
[3] for physical design, and [6] for designing the ETL process), and some research
prototypes of CASE tools have been developed (e.g., [1]). On the other hand,
commercial tools such as Oracle Warehouse Builder are oriented to a single platform
and should be considered design wizards rather than CASE tools.
In this paper we introduce QBX, a CASE tool resulting from a close collaboration
between academia and industry; in particular, industrial partners took care of the
executive design and implementation of the tool. QBX includes two separate
components: QB-Xpose (read "cube-expose") and QB-Xplore (read "cube-explore").
The first is used by designers for conceptual design, logical design, and deployment
of ROLAP data marts in the form of star/snowflake schemata; the design process is
further streamlined by letting QBX read from and write to the metadata of existing
OLAP platforms.
2 Architecture
As already mentioned, QBX includes two integrated software tools.
QB-Xpose gives designers effective support by automating the conceptual and
logical design of data marts, and by making deployment on ROLAP platforms
easier. It was implemented on top of Eclipse [7], an open source project providing
an extensible development platform. The QB-Xpose components were developed in
accordance with the Eclipse plug-in contract using three complementary frameworks:
the Eclipse Modeling Framework (EMF), the Graphical Editing Framework (GEF),
and the Graphical Modeling Framework (GMF). Figure 1 shows the main
components of QB-Xpose and their dependencies. Three components implement the
model-view-controller design pattern: QBXtool.Model uses the EMF and is
responsible for managing the QBX model, while QBXtool.Edit and QBXtool.Diagram
use the EMF and the GMF and play the roles of the controller and the viewer,
respectively. QBXtool.Logical supports logical design, while QBXtool.Mondrian and
QBXtool.Microstrategy manage the conversion of the QBX meta-model to/from the
Mondrian and Microstrategy meta-models.
QB-Xplore enables business users to interactively browse and annotate the
project documentation, both at business and technical levels. QB-Xplore is a
web application implemented using the Google Web Toolkit. The underlying
model was built using the EMF to achieve higher efficiency when exchanging
information with QB-Xpose.
3 Functional Overview
From a methodological point of view, QBX supports both classical scenarios of
data mart design [8]:
– Demand-driven approach, where designers draw their data mart starting from user requirements, possibly by composing existing hierarchies;
– Supply-driven approach, where designers acquire metadata from an OLAP engine and translate them into a DFM conceptual schema. This scenario is sketched in Figure 3.
Fig. 2. Overall architecture of QBX: QB-Xpose manages the project (conceptual schema, logical schema, and documentation), QB-Xplore gives business users access to it through a web browser, and metadata can be imported from an existing data mart.
– Logical design preferences. To enable a finer tuning of logical schemata, designers can express a set of preferences about logical design, including, e.g., how to deal with degenerate dimensions, shared hierarchies, and cross-dimensional attributes.
– Data volume. QBX enables designers to specify the data volume for a data mart, in terms of expected cardinalities for both facts and attributes. This is done manually in a demand-driven scenario and automatically in a supply-driven scenario. When a data volume has been specified, designers can ask QBX to optimize ROLAP schemata with respect to their storage space (a rough estimate of this kind is sketched below).
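As a back-of-the-envelope illustration of how such cardinalities can feed a storage estimate (this is our own simplification, not QBX's optimizer or its cost model; all figures are made up), a star schema's size can be approximated from the fact cardinality, the fact row width, and the dimension tables:

import java.util.Map;

// Back-of-the-envelope storage estimate for a star schema given expected cardinalities.
public class StarSchemaSizeEstimate {
    public static void main(String[] args) {
        long factRows = 10_000_000L;                          // expected fact cardinality
        int factRowBytes = 4 * 3 /* dimension keys */ + 8 * 2 /* measures */;

        // dimension name -> {cardinality, average row width in bytes}
        Map<String, long[]> dimensions = Map.of(
            "Customer", new long[]{200_000, 120},
            "Product",  new long[]{50_000, 80},
            "Date",     new long[]{3_650, 40});

        long total = factRows * factRowBytes;
        for (var d : dimensions.entrySet()) {
            total += d.getValue()[0] * d.getValue()[1];
        }
        System.out.printf("Estimated storage: %.1f MB%n", total / (1024.0 * 1024.0));
    }
}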
4 Demonstration Scenarios
The demonstration will focus on both forward and reverse engineering scenarios.
In the forward engineering scenario, we will adopt a demand-driven approach
to interactively draw a conceptual schema (Figure 4), showing how QB-Xpose
automatically checks for hierarchy consistency in the presence of advanced constructs
of the DFM. Then, we will let QB-Xpose generate alternative logical schemata
using different design preferences, and critically compare the results (Figure 5).
Finally, we will let QB-Xpose create comprehensive documentation for the project.
In the reverse engineering scenario, the relational schema and metadata of
an existing data mart will be imported from the Mondrian engine, and the
effectiveness of their translation to the DFM will be discussed.
References
1. Golfarelli, M., Rizzi, S.: WAND: A CASE tool for data warehouse design. In: Proc. ICDE, pp. 7–9 (2001)
2. Golfarelli, M., Rizzi, S.: Data warehouse design: Modern principles and methodologies. McGraw-Hill, New York (2009)
3. Golfarelli, M., Rizzi, S., Saltarelli, E.: Index selection for data warehousing. In: Proc. DMDW, pp. 33–42 (2002)
4. Ramamurthy, K., Sen, A., Sinha, A.P.: An empirical investigation of the key determinants of data warehouse adoption. Decision Support Systems 44(4), 817–841 (2008)
5. Theodoratos, D., Sellis, T.: Designing data warehouses. Data & Knowledge Engineering 31(3), 279–301 (1999)
6. Vassiliadis, P., Simitsis, A., Georgantas, P., Terrovitis, M., Skiadopoulos, S.: A generic and customizable framework for the design of ETL scenarios. Information Systems 30(7), 492–525 (2005)
7. Vv. Aa.: Eclipse, http://www.eclipse.org/platform/ (2011)
8. Winter, R., Strauch, B.: A method for demand-driven information requirements analysis in data warehousing projects. In: Proc. HICSS, pp. 1359–1365 (2003)
The Meta-Morphing Model Used in TARGIT BI Suite
Abstract. This paper presents the meta-morphing model and its practical
application in an industry-strength business intelligence solution. The meta-morphing
model adds associations between measures and dimensions to a traditional
multidimensional cube model, and thus facilitates a process where users are able to
ask questions of a business intelligence (BI) system without the constraints of a
traditional system. In addition, the model learns the user's presentation preferences
and thereby reduces the number of interactions needed to present the answer. The
nature of meta-morphing means that users can ask questions that are incomplete
and thereby experience the system as a more intuitive platform than the state of the art.
1 Introduction
According to the leading industry analyst Gartner, ease of use has surpassed
functionality for the first time as the dominant business intelligence platform buying
criterion [5]. This change represents a shift from prioritizing the IT department's
need to standardize to prioritizing the ability of casual users to conduct analysis
and reporting.
Ease of use, in the decision processes that managers and employees go through,
has been the focal point in the development of TARGIT BI Suite since its early
version in 1995. However, unlike other solutions that pursue the same objective,
TARGIT has methodically applied the CALM philosophy [1], which seeks to create
synergy between humans and computers as opposed to using computers simply as a
tool to create efficiency. In the CALM philosophy, the entire organization is divided
into multiple observe-orient-decide-act (OODA) loops, and computing is applied to
make users cycle these loops as fast as possible, i.e., with as few interactions as
possible. The patented meta-morphing [4], described in the following section, allows
users to analyze data by stating their intent, and thus helps users cycle OODA loops
with few interactions.
The meta-morphing model, shown in Figure 1, facilitates the following four steps:
1. A question is parsed into a set of one or more measures or dimensions.
2. If the question is incomplete (meaning that it has only dimension(s) or only
measure(s)), then an association with the most relevant measure (if the question had
only a dimension) or dimension (if the question had only a measure) is created.
3. A query based on the associated measures and dimensions is executed on the cube.
4. The preferred presentation of the returned data is selected, based either on the
user's preferences (earlier behavior) or, if no previous experience exists, on an expert
system that determines the presentation from the size and shape of the returned
data. The returned dataset is displayed to the user with the identified presentation
properties.
Example: A user submits the question "I would like to see customers".
Step 1 parses all words in the sentence representing the question (Figure 1(b)),
and these words are subsequently matched against the meta-data of the data
warehouse (Figure 1(a)). If a word in the sentence is not matched in the meta-data,
it is simply thrown away. The output from Step 1 is a set of dimensions and/or
measures; if the set is empty, the meta-morphing process is simply terminated. In
our example, the only word that remains from this parsing is "customers".
Step 2 compensates for the case where the user's question contains only measures
or only dimensions. In a traditional system, asking incomplete questions like "I would
like to see customers" or "Show me revenue" would at best return a list of customers
(the members of the customer dimension) or the sum of revenue for all data in the
data warehouse. By creating an association between measures and dimensions
(Figure 1(c)), the system learns the individual user's behavior based on what he
clicks, e.g., if he
clicks to select revenue and customers at the same time, then an association between
revenue and customers will be created. Therefore, the answer to both questions will be
a list of customers with their respective revenue. Associations are also created while
the user loads any analysis, meaning that he does not need to actively pose a question
including the association. This means that the user will simply feel that he is receiving
information in the way he is used to. In the event that a user has never seen or asked
for a given relationship, the meta-morphing process will look at which dimensions or
measures the user uses most often, and then create an association with the one that
has the highest Used Count (see Figure 1(c)). The output of Step 2 is a combination
of measures and dimensions.
Step 3 is, given the output of Step 2, a trivial query against the data warehouse,
retrieving revenue for each member of the customer dimension. The output of this
step is a dataset, together with the dimensions and measures used to produce it.
Step 4 relies on presentation preferences that are recorded for each user based on
how they normally view an answer for a given dimension/measure combination, e.g.,
customer/revenue is usually displayed as a pie chart (see Figure 1(d)), profit per
country is usually displayed as a bar chart, etc. Given the experience with the user's
preferences, which are collected whenever the user sees or formats a piece of
information, the dataset received from Step 3 is formatted and displayed to the user.
In the event that no preferences have been collected for the user, an expert system
inspects the dataset and chooses the best presentation object (pie, bar, geographic
map, table, etc.); this expert system is based on input from a TARGIT BI domain
expert. In our example, the returned revenue for each customer will be presented as
a pie chart based on the data in Figure 1(d).
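As a compact, hedged sketch of those four steps (plain Java with made-up metadata, association counts, and presentation preferences; it is not TARGIT's patented implementation), the flow could be mimicked as follows:

import java.util.*;

// Simplified walk through the four meta-morphing steps: parse the question against the
// cube metadata, complete it with the most-used association, query, and pick a presentation.
public class MetaMorphingSketch {
    static final Set<String> DIMENSIONS = Set.of("customers", "country", "period");
    static final Set<String> MEASURES = Set.of("revenue", "profit");
    // association "used counts" learned from earlier user behaviour
    static final Map<String, Map<String, Integer>> USED = Map.of(
        "customers", Map.of("revenue", 12, "profit", 3));
    // presentation preferences per measure/dimension pair
    static final Map<String, String> PRESENTATION = Map.of("revenue|customers", "pie chart");

    public static void main(String[] args) {
        String question = "I would like to see customers";

        // Step 1: keep only words that match the cube metadata.
        List<String> dims = new ArrayList<>(), meas = new ArrayList<>();
        for (String w : question.toLowerCase().split("\\W+")) {
            if (DIMENSIONS.contains(w)) dims.add(w);
            if (MEASURES.contains(w)) meas.add(w);
        }

        // Step 2: complete an incomplete question with the most frequently used association.
        if (meas.isEmpty() && !dims.isEmpty()) {
            meas.add(USED.getOrDefault(dims.get(0), Map.of("revenue", 0)).entrySet().stream()
                .max(Map.Entry.comparingByValue()).orElseThrow().getKey());
        }

        // Step 3: the cube query would be executed here (omitted in this sketch).
        // Step 4: choose the presentation from the user's preferences, else a default.
        String presentation = PRESENTATION.getOrDefault(meas.get(0) + "|" + dims.get(0), "table");
        System.out.println("Show " + meas + " by " + dims + " as a " + presentation);
    }
}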
Using the meta-morphing model, users are able to pose incomplete questions that
still return relevant answers, while at the same time saving a number of interactions
in formatting the returned dataset, since the preferences are already known to the
system. In other words, the user guides the system with his intent, and the computer
provides him with the best-fitting output based on his individual preferences.
Another interesting aspect of meta-morphing is that it allows the user to ask
questions in human language as opposed to a database query language, which gives
a much more natural, intuitive feel to an application that exploits the process. In
particular, with regard to speech recognition, the parsing of questions against
meta-data in Step 1 means that recognition is enhanced simply because far fewer
words are in the vocabulary, as opposed to a complete language. The combination
of meta-morphing and speech is also a patented process [3].
The TARGIT BI Suite is recognized by analysts as one of the leading global
business intelligence platforms, with more than 286,000 users worldwide. Although no
specific usability study has been conducted, the TARGIT BI Suite has been found
by leading industry analysts to have a unique strength in its ease of use, achieved by
reducing the number of interactions that a user needs to perform in order to make
decisions [5]. The so-called "few clicks" approach has been demonstrated to allow users
to easily interpret the displayed datasets using automatically generated explanations [2].
In the TARGIT BI Suite, the meta-morphing process is integrated such that users
have the option of activating it dynamically in three different scenarios: guided ana-
lytics called Intelligent Analysis, a quick drag-drop function called TARGIT This, and
finally, an analytical link to all dashboards and reports known as Hyper-Related OLAP.
Intelligent Analysis allows users to compose sentences similar to our example in the
previous section by clicking on meta-data in a semi-structured environment. Once a
question is posed, the process using the meta-morphing model can be activated.
TARGIT This is a drop area onto which either a dimension or a measure can be
dropped; upon release of the dropped item, the process using the meta-morphing model
will commence with the question "I would like to analyze [measure or dimension]".
Hyper-Related OLAP is perhaps where the most powerful results are achieved by the
meta-morphing process. Hyper-Related OLAP allows the user to click any figure in
the TARGIT BI Suite in order to analyze it. Since any figure presented on a business
intelligence platform is a measure surrounded by dimensions (either as grouping or
criteria), the process using the meta-morphing model can be activated by a single click
on any figure, with the question "I would like to analyze [measure]". This gives the user
a starting point for analyzing any figure whenever he sees something he wants to inves-
tigate further. This functionality significantly reduces the time and interactions needed
from whenever a problem is observed to when an analysis can be conducted in order to
reach a decision with subsequent action. In other words, Hyper-Related OLAP directly
assists the users in cycling their individual OODA loops with fewer interactions.
4 Conclusion
This paper presented the meta-morphing model and showed its practical application
in an industry-strength business intelligence solution. It was specifically demonstrated
how the meta-morphing model allows users to freely pose questions in human
language, including in speech, and subsequently receive a presentation of the answer
in accordance with their preferences. It was demonstrated how the meta-morphing
model can contribute to greater ease of use by reducing the number of interactions
needed in data analysis. Moreover, it was demonstrated how meta-morphing can
reduce the time and interactions for users cycling an observe-orient-decide-act loop.
Acknowledgments. This work was supported by TARGIT A/S, Daisy Innovation and
the European Regional Development Fund.
References
1. Middelfart, M.: CALM: Computer Aided Leadership & Management. iUniverse (2005)
2. Middelfart, M., Pedersen, T.B.: Using Sentinel Technology in the TARGIT BI Suite. PVLDB 3(2), 1629–1632 (2010)
3. Middelfart, M.: Presentation of data using meta-morphing. United States Patent 7,779,018 (Issued August 17, 2010)
4. Middelfart, M.: Method and user interface for making a presentation of data using meta-morphing. United States Patent 7,783,628 (Issued August 24, 2010)
5. Sallam, R.L., Richardson, J., Hagerty, J., Hostmann, B.: Magic Quadrant for Business Intelligence Platforms, www.gartner.com/technology/media-products/reprints/oracle/article180/article180.html (April 28, 2011)
The Demonstration
Based on experience with the user's preferences for presenting the measure Revenue,
a complete analysis (with the measure Revenue displayed over a set of dimensions:
Customer Country, Item, and Period, including their presentation preferences:
map, pie chart, and bar chart) is presented (Figure 3).
Scenario 2: We drag the dimension Customer and drop it on the TARGIT This
drop area in the TARGIT BI Suite (Figure 4).
Based on experience with the user's preferences, the Customer dimension is presented
with the measure Revenue, and given the user's presentation preferences the output is
presented as a table (Figure 5). The system automatically adds a descending sort to
the table and an option for selecting on the time dimension (the criteria option above
the table).
Tool Support for Technology Scouting Using
Online Sources
1 Introduction
Many firms are nowadays looking for opportunities to adopt and implement a
formal, structured and focused approach for the identification and acquisition of new
technology, and to develop technology-based product and service innovations. This is
usually referred to as technology scouting and is understood as an organised approach
for identifying technological needs, gaps and opportunities, and then finding solutions
outside the official borders of the enterprise. It is very often applied when: 1) a
technical problem needs to be solved quickly due to some change in the competitive
landscape; 2) an organisation is looking for opportunities to move into a new market
with limited involvement of internal resources; or 3) specific new skills need to be
acquired without increasing internal resource overhead.
The role of the technology scout will therefore include searching for oppor-
tunities and leads within a certain technological domain, evaluating leads, and
creating the link between a lead and company strategy. A technology scout needs
to utilize an extensive and varied network of contacts and resources, and stay
on top of emerging trends and technologies.
With the rise of the social web and the advent of linked data initiatives a
growing amount of data is becoming publicly available: people communicate on
social networks, blogs and discussion forums, research events publish their online
programmes including article abstracts and authors, websites of technological
conferences and exhibitions advertise new products and technologies, govern-
ments and commercial organisations publish data, and the linked open data
cloud keeps on growing. An enormous potential exists for exploiting this data
by combining it and extracting intelligence from it for the purpose of technology
scouting. However, there are currently no adequate tools available to support the
work of the technology scouts, and although a large pool of data is available on the
web, it is often gathered manually: a time-intensive, tedious and error-prone process,
since the data is not centralised, is available in different formats, can be outdated or
contradictory, etc.
In this paper, we describe a prototype of a software tool for topic-centric
expert identification from online sources. We present the rationale and the con-
crete approach applied in the realisation of the tool. The main novelty of our
approach lies in the fact that it is based on the bottom-up, application-driven
and incremental creation of the repository, as opposed to the more commonly
considered top-down approach.
2 Rationale
Identifying experts is far from a new topic, and it has gradually been gaining
interest in recent years [1,2,3,4]. However, there are certain shortcomings associated
with existing approaches, e.g., a lack of focus on realistic applications, limitation to a
single source, targeting too large a scale, poor resolution and accuracy, high
information redundancy, etc.
The problem of finding experts is very often approached top-down with respect
to both application and scale, meaning that a certain method or technology is often
developed without a clear idea of the potential application in mind, and by default
coverage comes before accuracy. Many works on the topic focus mainly on the
elaboration and application of advanced information retrieval, machine learning and
reasoning techniques (see [5,6,7,8,9,10]), while considering rather synthetic
applications such as research collaboration or enterprise expert search ([11,12,13]).
Some other recent contributions [14,15] in the area of integrating web data consider
a set of challenging research topics such as data cleansing, profiling, and entity
resolution, but suffer from the same lack of focus on concrete and realistic
applications.
In terms of scale, the problem of finding experts is usually tackled by mining
vast online data sources such as Wikipedia and CiteSeer (see Jung et al. [2]), in
order to gather a sufficiently large data repository containing information about
persons, articles, social links, etc. This has the advantage of achieving high
coverage of the experts active in a certain domain and relatively complete expert
profiles in space and time. Unfortunately, such large data collections contain
a substantial proportion of noisy data (contradictions, duplicates, ...) and the
achieved degree of accuracy cannot be estimated in a reliable way. Accuracy is
most commonly measured by precision and recall. Precision is the ratio of true
positives, i.e. true experts, among all found expert candidates, while recall is
the fraction of true experts found among the total number of true experts in a
given domain. However, determining the total number of true experts in a given
domain is not feasible. Furthermore, for applications such as technology
scouting, the current activity and location of certain experts is more important
than their career evolution over an extensive time span. In general, for
applications delivering data for business decision-making purposes, the
reliability of the information at the current time period is crucial.
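The accuracy measures mentioned above can be written in their standard form, with TP denoting the true experts among the returned candidates, FP the returned candidates who are not experts, and FN the true experts that were missed:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}
```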
3 Approach
The types of expert-related information identified as relevant to technology
scouting applications are as follows: 1) actors in the field: leading researchers
and experts, as well as companies, research institutes and universities; 2)
technology-related publications: scientific and popularised publications,
presentations and keynotes, press releases and technology blogs; 3) research
activities: past and ongoing research projects and collaborations, organisation
of and participation in research events, etc. Relationships between the different
data entries also need to be identified and made explicit, e.g. formal and
informal expert networks, identified through joint publications, joint
organisation of events, social networks such as LinkedIn and Twitter, and
professional and educational history.
Following the reasoning in the foregoing section, we approach the problem
of identifying the main actors in a certain technological domain bottom-up, by
first identifying online sources to mine, targeted to the application domain in
question. These serve as seeds for the further incremental growth of our expert
repository. The main rationale behind the seed approach is that different expert
communities use different communication channels as their primary means of
communicating and disseminating knowledge, and thus different types of sources
are relevant for finding experts on different topics.
Thus, if we want to identify currently active researchers in a particular domain,
for example software engineering, we can assume such experts regularly
publish at top-quality international conferences, such as the International
Conference on Software Engineering (http://icse-conferences.org/).
Such conferences nowadays have a dedicated website which details their program,
i.e. the set of accepted papers that will be presented, together with their
authors and titles, often grouped in sessions. We consider such a website as the
initial seed for our approach: we extract information about the topics presented,
e.g. the titles and abstracts (if available) of presented papers and the session
in which they are presented, and about who is presenting, e.g. author names,
affiliation, e-mail addresses, etc.
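As an illustration of this seed-harvesting step (not the authors' actual implementation), the sketch below scrapes a hypothetical conference programme page; the URL and the CSS class names are assumptions that would have to be adapted to each concrete website.

```python
# Hypothetical seed harvester: extracts sessions, paper titles and author names
# from a conference programme page. The selectors below are invented for the
# example; every conference website requires its own tuned version.
import requests
from bs4 import BeautifulSoup

def harvest_programme(url):
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    seed = []
    for session in soup.select("div.session"):            # assumed page structure
        session_title = session.select_one("h2").get_text(strip=True)
        for paper in session.select("div.paper"):
            seed.append({
                "session": session_title,
                "title": paper.select_one(".title").get_text(strip=True),
                "authors": [a.get_text(strip=True)
                            for a in paper.select(".authors a")],
            })
    return seed
```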
A drawback of this approach, in which the extraction front end is tightly aligned
with the topic of interest, is that a dedicated extractor needs to be developed
each time a new seed source is considered for mining. This imposes certain
limitations on the level of automation that can be achieved.
The initial set of experts is rather limited: one author typically publishes
only one paper at a particular conference, probably collaborates with a broader
set of people than the co-authors of this particular paper, and is potentially
interested in more topics and domains than the ones addressed in this paper. To
extend the gathered information, we consider additional sources, using the
extracted information as a seed. We search for every author and co-author on
Google Scholar (http://scholar.google.com) and the DBLP website
(http://www.informatik.uni-trier.de/ley/db/) to identify additional published
material, resulting in a broader set of co-authors and a broader set of topics
of interest. Although the completeness of the information improves, the level of
accuracy decreases as more noise occurs: different authors might share a name,
one author's name might be spelled differently (e.g. "Tourwe, T." versus "Tom
Tourwe") in different papers, different affiliations might occur for a seemingly
unique author, etc.
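A sketch of this enrichment step is given below. It assumes the present-day public DBLP search API (https://dblp.org/search/publ/api), which stands in here for whatever extraction mechanism was actually used, and whose JSON layout may change over time.

```python
# Query DBLP for an author's publications and collect titles, years and
# co-authors. The JSON layout (result -> hits -> hit -> info) reflects the
# current DBLP search API.
import requests

def dblp_publications(author_name, max_hits=50):
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": author_name, "format": "json", "h": max_hits},
        timeout=30,
    )
    hits = resp.json().get("result", {}).get("hits", {}).get("hit", [])
    pubs = []
    for hit in hits:
        info = hit.get("info", {})
        authors = info.get("authors", {}).get("author", [])
        if isinstance(authors, dict):   # single-author entries arrive as a dict
            authors = [authors]
        pubs.append({
            "title": info.get("title"),
            "year": info.get("year"),
            "coauthors": [a.get("text") if isinstance(a, dict) else a
                          for a in authors],
        })
    return pubs
```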
In order to clean the data, we again consider additional sources, such as
LinkedIn or ScientificCommons (http://en.scientificcommons.org/).
People use LinkedIn to list past and present work experience, their educational
background, a summary of their interests and activities, etc. In addition,
information about people's professional networks is available. We exploit all
this information to merge and disambiguate the extracted data, e.g. we use work
experience data to merge authors with different affiliations, and we separate
authors with the same name by matching the list of LinkedIn connections with the
list of co-authors.
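The merging and disambiguation heuristics can be summarised by a sketch of the following kind; the profile structures (co-author lists, LinkedIn connections, work experience) are hypothetical stand-ins for the data gathered in the previous steps.

```python
# Two author records are considered the same expert when enough of the
# co-authors re-appear among the LinkedIn connections, or when one of the
# extracted affiliations matches a past or present employer.
def name_overlap(names_a, names_b):
    a, b = {n.lower() for n in names_a}, {n.lower() for n in names_b}
    return len(a & b) / len(a | b) if a | b else 0.0

def same_person(candidate, linkedin_profile, threshold=0.2):
    if name_overlap(candidate["coauthors"],
                    linkedin_profile["connections"]) >= threshold:
        return True
    employers = {e.lower() for e in linkedin_profile["work_experience"]}
    return any(aff.lower() in employers for aff in candidate["affiliations"])
```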
The raw data contains a simple enumeration of keywords reflecting to some
extent the topics addressed by a single author. These are obtained by
processing the extracted textual information for each expert (e.g. article title
and abstract, session title, affiliation, etc.) in two steps:
Part-of-speech tagging: The different words in the total text collection for
each author are annotated with their specific part of speech using the Stanford
part-of-speech tagger ([16]). In addition to recognising the part of speech, the
tagger also determines whether a noun is plural, whether a verb is conjugated,
etc.
Keyword retention: The part-of-speech-annotated text is subsequently
reduced to a set of keywords by removing all words tagged as articles,
prepositions, verbs, and adverbs. In practice, only the nouns and the
adjectives are retained, and the final keyword set is formed according to the
following simple algorithm (a code sketch follows the list):
1. adjective-noun(s) keywords: a sequence of an adjective followed by a
noun or by a sequence of adjacent nouns is considered as one compound
keyword, e.g. "supervised learning";
2. multiple-noun keywords: a sequence of adjacent nouns is considered as
one compound keyword, e.g. "mixture model";
3. single-noun keywords: each of the remaining nouns forms a keyword on
its own.
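A minimal sketch of the retention step is shown below. It uses NLTK's default tagger as a stand-in for the Stanford tagger used in the paper, and keeps adjective-noun and noun-noun compounds as well as single nouns.

```python
# Keyword retention over part-of-speech-tagged text. Requires the NLTK data
# packages "punkt" and "averaged_perceptron_tagger" to be downloaded.
import nltk

def retain_keywords(text):
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    keywords, phrase, has_noun = [], [], False
    for word, tag in tagged:
        if tag.startswith("NN"):                   # noun: extend the current compound
            phrase.append(word)
            has_noun = True
        elif tag.startswith("JJ") and not phrase:  # adjective may open a compound
            phrase, has_noun = [word], False
        else:                                      # anything else closes the compound
            if has_noun:
                keywords.append(" ".join(phrase).lower())
            phrase, has_noun = [], False
    if has_noun:
        keywords.append(" ".join(phrase).lower())
    return keywords

# Example: retain_keywords("A mixture model for finding experts in conference data")
# yields compound keywords such as "mixture model" and single nouns such as "experts".
```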
The above process is rather crude and generates an extensive set of keywords,
without any relationship between them and without guarantee that all of the
retained keywords are relevant to the topic of interest. To attain a more accurate
and more concise annotation of expert profiles with interrelated domain-specific
keywords, a conceptual model of the domain of interest, such as a taxonomy or
an ontology, can be used. In our software engineering example, this would allow
us to derive that the keywords "user stories" and "prioritisation" are both
related to the "requirements" concept. This in turn would allow clustering
different authors at different levels of abstraction, e.g. around higher-level
research domains, such as requirements engineering, or around specific research
topics, such as agile requirements elicitation. Note that after such a formal
classification, the data can potentially be cleaned even further: one author with
two seemingly unrelated sets of topics of interest should potentially be split
into two authors, and two separate authors with perfectly aligned topics that
were not unified in a previous cleaning step could be merged now that this
additional information has become available.
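Purely as an illustration of this clustering idea, the sketch below maps extracted keywords onto higher-level concepts through a hand-made taxonomy fragment; in a real deployment the mapping would come from a domain taxonomy or ontology as discussed here.

```python
# Group authors around higher-level concepts via a (tiny, invented) keyword-to-
# concept table. In practice the TAXONOMY mapping would be derived from a
# domain ontology or from sources such as the Wikipedia category structure.
from collections import defaultdict

TAXONOMY = {
    "user stories": "requirements engineering",
    "prioritisation": "requirements engineering",
    "agile requirements elicitation": "requirements engineering",
    "mutation testing": "software testing",
    "test generation": "software testing",
}

def cluster_by_concept(author_keywords):
    """author_keywords: dict mapping an author name to a set of keywords."""
    clusters = defaultdict(set)
    for author, keywords in author_keywords.items():
        for kw in keywords:
            concept = TAXONOMY.get(kw.lower())
            if concept:
                clusters[concept].add(author)
    return dict(clusters)
```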
Ontologies for many different domains are being developed, e.g. an exhaustive
ontology for the software engineering domain has been developed in [17]. An
alternative approach is to derive a taxonomy using additional sources, such as
the Wikipedia category structure or the Freebase entity graph.
An additional advantage of constructing a formal semantic model of the
domain of interest is that it enables the use of automatic text annotation tools
for analysing extensive expert-related textual information, in order to enrich
the expert profiles with domain-specific keywords that are not explicitly present
in the original documents.
4 Demo
The tool platform proposed above is currently in an initial development stage.
A concrete use case scenario will be prepared for the demonstration session in
order to illustrate different functionalities, e.g. keyword generation for user
profiling, particular disambiguation features, domain modelling through taxonomy
building, etc.
5 Conclusion
A software tool which serves as a proof-of-concept for topic-centred expert
identification based on bottom-up mining of online sources has been presented.
The realised approach is still work in progress and, obviously, the gathered data
covers only part of the data needed for technology scouting purposes. We intend
to further extend and complement it with information obtained from research
project databases (e.g. the FP7 database), online patent repositories (e.g. the
US Patent Office, Google Patents) or technological roadmaps.
References
1. Balog, K., de Rijke, M.: Finding similar experts. In: Proceedings of the 30th Annual
International ACM SIGIR Conference on Research and Development in Information
Retrieval, pp. 821–822. ACM, New York (2007)
2. Jung, H., Lee, M., Kang, I., Lee, S., Sung, W.: Finding topic-centric identified
experts based on full text analysis. In: 2nd International ExpertFinder Workshop
at the 6th International Semantic Web Conference ISWC 2007 (2007)
3. Zhang, J., Tang, J., Li, J.: Expert finding in a social network. In: Advances in
Databases: Concepts, Systems and Applications, pp. 1066–1069 (2010)
4. Stankovic, M., Jovanovic, J., Laublet, P.: Linked Data Metrics for Flexible Expert
Search on the Open Web. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B.,
Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011. LNCS, vol. 6644, pp.
108–123. Springer, Heidelberg (2011)
5. Boley, H., Paschke, A.: Expert querying and redirection with rule responder. In:
2nd International ExpertFinder Workshop at the 6th International Semantic Web
Conference ISWC 2007 (2007)
6. Fang, H., Zhai, C.X.: Probabilistic models for expert finding. In: Advances in
Information Retrieval, pp. 418–430 (2007)
7. Zhang, J., Tang, J., Liu, L., Li, J.: A mixture model for expert finding. In: Washio,
T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI),
vol. 5012, pp. 466–478. Springer, Heidelberg (2008)
8. Balog, K., Azzopardi, L., de Rijke, M.: A language modeling framework for expert
finding. Information Processing & Management 45, 1–19 (2009)
9. Hofmann, K., Balog, K., Bogers, T., de Rijke, M.: Contextual factors for finding
similar experts. Journal of the American Society for Information Science and
Technology 61, 994–1014 (2010)
10. Tung, Y., Tseng, S., Weng, J., Lee, T., Liao, A., Tsai, W.: A rule-based CBR
approach for expert finding and problem diagnosis. Expert Systems with Applications
37, 2427–2438 (2010)
11. Sriharee, N., Punnarut, R.: Constructing Semantic Campus for Academic Collaboration.
In: 2nd International ExpertFinder Workshop at the 6th International
Semantic Web Conference ISWC 2007, pp. 23–32 (2007)
12. Pavlov, M., Ichise, R.: Finding experts by link prediction in co-authorship networks.
In: 2nd International ExpertFinder Workshop at the 6th International Semantic
Web Conference ISWC 2007, pp. 42–55 (2007)
13. Jung, H., Lee, M., Sung, W., Park, D.: Semantic Web-Based Services for Supporting
Voluntary Collaboration among Researchers Using an Information Dissemination
Platform. Data Science Journal 6, 241–249 (2007)
14. Böhm, C., Naumann, F., et al.: Profiling linked open data with ProLOD. In: Proceedings
of the 26th IEEE International Conference on Data Engineering, ICDE
2010, Workshops, pp. 175–178 (2010)
15. Pu, K., Hassanzadeh, O., Drake, R., Miller, R.: Online annotation of text streams
with structured entities. In: Proceedings of the 19th ACM International Conference
on Information and Knowledge Management, CIKM 2010, pp. 29–38 (2010)
16. Toutanova, K., Klein, D., Manning, C.: Enriching the knowledge sources used in
a maximum entropy part-of-speech tagger. In: Proceedings of the Joint SIGDAT
Conference on Empirical Methods in Natural Language Processing and Very Large
Corpora, EMNLP/VLC 2000, pp. 63–70 (2000)
17. Wongthongtham, P., Chang, E., Dillon, T., Sommerville, I.: Development of a
Software Engineering Ontology for Multi-site Software Development. IEEE Transactions
on Knowledge and Data Engineering 21, 1205–1217 (2009)
Governance Issues on Heavy Models in an
Industrial Context
1 Introduction
SWIFT is the leader in banking communication and message transmission.
One of its main missions is the management of the communication standards
ISO-15022 and ISO-20022, which are used between banks in order to exchange
messages. These standards provide the definition of message payloads (i.e. which
data fields can or must be included in which communication flow).
One of the difficulties of managing a worldwide business standard is the continuous
need to evolve the standard to cater for new business requirements. From
a model management point of view, this creates a lot of new definitions that then
have to be organized properly.
In 2009, SWIFT Standards undertook a major strategic study aimed at defining
a 5-year roadmap for the evolution of standards capabilities. It identified a set
of priorities: (i) the management of content and its reuse, (ii) the ability to
support specialized standards and market practices, and (iii) the management
of changes.
Very recently, SWIFT and its customers have reiterated the role of ISO-20022
as a mechanism to facilitate industry integration at the business level.
In order to realize this vision, the need for a common and machine-readable
definition has been established. This definition comprises the business processes,
the data dictionary, the message definitions, the market practices and the mapping
rules. All of these definitions would be managed in a controlled manner over time
(versioning). It should be possible for industry players to customize the
definitions in order to fit their needs (e.g. local market practices, bilateral
agreements), as well as to represent their own formats in this environment (using
the expressive power of the ISO 20022 global dictionary).
A thorough model-based approach has been set up in order to enable the
proper governance of the whole model and the interoperability across the actors
working on the model.
3 Solutions
The foundational element of the solution is the implementation of a meta-model
above the current standard model. This meta-model allows managing and
simplifying the use of the standard model: instances of the meta-model are the
models of the standard, and instances of the models are the message definitions.
Both are defined in an XMI file.
The data can then be stored in a centralized way in order to promote reusability.
Model-based data storage as provided by CDO (the Eclipse Connected Data Objects
model repository) is ideal in this situation.
4 Implementation
5 Illustration
The View model is used for grouping elements within a high-level view. Originally
it was designed to address two important challenges of the ISO 20022
Repository: (1) finding relevant content, as relevance may rely on several
criteria of different kinds (project, business domain, publication, etc.), and (2)
the need to define finer-grained scopes, since the dictionary size will increase
dramatically in the coming years. The view model is therefore a mechanism for
providing a view on a specific set of message components. In addition, it offers
information describing when this view was extracted. The root element of the
model shown in Figure 1 is the ViewSet, which is nothing more than a set
of views. Basically, a View has a name, a description and a list of message
components. A view does not contain components; it references them. A view can
be checked out from the repository. In such a case, the editor creates a local
copy of each element. A change request can then be associated with this view.
That is why the change model is also linked with the view model. In case of
modifications, when the view is checked in, the user can use EMF Compare and
its 3-way comparison to evaluate the changes and merge the different versions.
The mechanism of the Publication View is based on the same concept. A publication
is the mechanism by which the base standard is published; it uses the same
concepts and adds additional elements: the previous and next Publication View.
These attributes enable the publications to be linked together.
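A minimal sketch of this view model, assuming nothing about SWIFT's actual EMF/Ecore definitions, could look as follows: a ViewSet groups Views, a View references message components rather than containing them, and a Publication View links to its previous and next publication.

```python
# Illustrative data model only; names and fields mirror the description above,
# not the real ISO 20022 meta-model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class View:
    name: str
    description: str
    component_ids: List[str] = field(default_factory=list)  # references, not copies

@dataclass
class PublicationView(View):
    previous: Optional["PublicationView"] = None
    next: Optional["PublicationView"] = None

@dataclass
class ViewSet:
    views: List[View] = field(default_factory=list)
```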
Fig. 1. The view model is used for checking out content, but also for publishing a new
base standard
Fig. 2. The semantic trace between the business and logical layers. This trace is used
for defining the impact of an element.
6 Conclusion
High Quality Technical Documentation for Large Industrial Plants
1 Introduction
Engineering contractors design and build very large and complex installations such
as nuclear power plants, airports, oil refineries, etc. The whole plant is described
and specified in detail by technical documentation. This documentation is of equal
importance as the actual plant and should also meet high quality requirements.
This application domain is characterized by the increasing size and complexity of the
plants, compliance with increasing legal and other external regulation, and the increasing
need to apply and control rules for governance. This increases the overall complexity
of the documentation and the number of rules (business processes) to be followed in
an exponential way. The risks, responsibilities, liabilities and the financial exposure
increase similarly. The elements of this domain are:
coordination of the production) that produces the documentation and iii) the supporting
IT system, are closely related engineering artifacts. Example: the engineering of a
bicycle factory is directly derived from the engineering of a bicycle. A supporting IT
system for the bicycle factory is also an engineering artifact that is directly derived from
the way a bicycle factory operates. This is common practice and successful in
manufacturing. However, for document-driven enterprises this is not (yet) applied in a
proper way due to a lack of applicable science (enterprise ontology) and methodology
(DEMO). In this domain of engineering contractors, the produced technical
documentation also controls the production and maintenance of the engineering
contractor's artifact (refinery, energy plant, etc.).
For each new project these three artifacts have to be designed in advance and
represented in high-quality DEMO conceptual enterprise models. Enterprises are
complex entities, recursively composed of aggregated sub-enterprises (for example:
multinational, national, departmental, sub-departmental). Recursive modeling into
finer detail delivers finer-detailed enterprise models. These aggregated conceptual
models of unlimited size and complexity simultaneously specify:
The construction and the operation of the enterprise, including all communication
and coordination;
The detailed production for each actor (employee) in the enterprise; and
The IT system that monitors and controls the enterprise operation.
This may seem very elaborate but these models have a high degree of reusability and
the overall modeling effort is small.
The core of the software technology is a software engine, the DEMO processor.
Fig. 1. The DEMO processor: DMOL XML models are kept in a model repository; parsing and
model building produce the four DEMO aspect models, which the processor executes, supporting
model editing and rendering, simulation results for model validation, and communication and
coordination with actors 1..n.
The DEMO processor constructs and executes DEMO models, enables model
simulation, model validation and incremental model improvement during
development (fig. 1). The development process starts with the modeling stage for an
enterprise (nested in enterprises etc.) that delivers the 4 graphical DEMO aspect
models. Then the models are translated (not programmed) on the DEMO processor in
a 1:1 process. The DEMO processor executes the enterprise model dynamically and
delivers simulation results for model validation. The execution involves all
Fig. 2. The DEMO processor executing aggregated enterprise models: the model repository of
DMOL XML models and a production database of production instances feed parsing, model
building and model aggregation; the aggregated DEMO models drive communication and
coordination with the actors of Enterprise A (actors 1..n) and Enterprise B (actors 1..m), with
an MIS monitoring production.
Publishing Open Data and Services for the Flemish
Research Information Space
Abstract. The Flemish public administration aims to integrate and publish all
research information on a portal. Information is currently stored according to
the CERIF standard, modeled in (E)ER and aimed at extensibility. Solutions
exist to easily publish data from databases in RDF, but ontologies need to be
constructed to render those data meaningful. In order to publish their data, the
public administration and the other stakeholders first need to agree on a shared
understanding of what exactly is captured and stored in that format. In this paper,
we show how the use of the Business Semantics Management method and tool
contributed to achieving that aim.
As we will explain in Section 2, integrating all information and reducing the
administrative burden faces some problems for which appropriate data governance
methods and tools are needed. Such a method and tool are presented in Section 3,
and we end this paper with a conclusion in Section 4.
Fig. 1. The CERIF entity cfProject and its relationship with the entity cfProject_Classification
(linked by the two identifiers of the linked entities). A CERIF relationship is always semantically
enriched by a time-stamped classification reference. The classification record is maintained
in a separate entity (cfClassification) and allows for multilingual features. Additionally,
each classification record or instance requires an assignment to a classification scheme
(cfClassificationSchemeIdentifier).
FRIS semantic community). The software is currently deployed at EWI for managing
the business semantics of CERIF terms. A term (here Project) can be defined using
one or more attributes such as definitions, examples, fact types, rule sets, categorization
schemas (partly shown in a taxonomy), and finally milestones for the lifecycle.
Project in this case is a subtype of Thing and has two subtypes: large academic
project and small industrial project. Regarding governance: the top-right corner
indicates which member of the community (here Pieter De Leenheer) carries the role of
steward, who is ultimately accountable for this term. The status candidate indicates
that the term is not yet fully articulated: in this case, Project is only 37.5%
articulated. This percentage is automatically calculated based on the articulation tasks
that have to be performed according to the business semantics management methodology.
Tasks are related to defining attributes and are distributed among stakeholders and
orchestrated using workflows.
Fig. 3. Screenshot of Collibra's BSG supporting the semantic reconciliation process of the
BSM methodology by providing domain experts with means to enter simple key facts in natural
language, natural-language definitions of facts and of the terms in those facts, as well as
constraints.
Applying BSM results in a community-driven (e.g. representing the different
classifications and models mentioned earlier), iteratively developed, shared and
agreed-upon conceptual model in SBVR. This model is then automatically converted
into a CERIF-based ER model and into RDFS/OWL for Web publishing. Fig. 4 shows part
of the OWL generated from the concept depicted in the previous figure. In this figure,
we see that Project is a Class and that all instances of that class are also instances
of entities with at least one value for the property ProjectHasTitle, one of the rules
expressed in SBVR in Fig. 3 (see the general rule sets).
Fig. 4. Screenshot of the OWL around Project generated by BSG. In this picture, we see that
Project is a Class and all instances of that class are also instances of entities with at least
one value for the property ProjectHasTitle.
Fig. 5. Part of the mapping file generated by the D2R server: it maps the table CFProj to the
generated CFPROJ RDFS class. It uses the primary key to generate a unique ID, and the class
definition label is taken from the table's name.
Even though classes and properties are generated and populated with instances,
these RDF triples are not semantic, as they stem from one particular information
system's database schema. That RDFS is then aligned with the RDFS/OWL classes and
properties generated from the BSM ontology. The commitments described in the
previous section are used as a guideline to create this alignment. Fig. 6 below
shows the changes (highlighted) made to the generated mapping file using the
ontology. The ontology can then be used to access the data.
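A sketch of such an alignment, using rdflib and invented namespace URIs (one for the D2R-generated vocabulary, one for the ontology exported from BSG), is shown below; ProjectHasTitle is the property from Fig. 4, while CFPROJ_cfTitle is a hypothetical D2R-generated property name.

```python
# Declare the D2R-generated class and property equivalent to / subsumed by the
# terms of the ontology exported from BSG. The namespace URIs are illustrative.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

D2R = Namespace("http://localhost:2020/vocab/resource/")   # D2R-generated vocabulary
BSG = Namespace("http://www.example.org/fris/ontology#")   # ontology exported from BSG

g = Graph()
g.add((D2R.CFPROJ, OWL.equivalentClass, BSG.Project))
g.add((D2R.CFPROJ_cfTitle, RDFS.subPropertyOf, BSG.ProjectHasTitle))
g.serialize(destination="alignment.ttl", format="turtle")
```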
4 http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
Fig. 6. Modified mapping file with the ontology exported from BSG. An extra namespace (for
the exported ontology) is added and the generated classes and properties are appropriately
annotated with that ontology.
Author Index
Abelló, Alberto 108
Aguilar, Jose Alfonso 14
Aguilera, David 323
Ait-Ameur, Yamine 98
Bach Pedersen, Torben 364
Bakhtouchi, Abdelghani 98
Bartie, Phil 231
Battaglia, Antonino 358
Bellatreche, Ladjel 98
Bergamaschi, Sonia 328
Bhatt, Ankur 343
Bianchini, Devis 34
Billen, Roland 230, 322
Bontemps, Yves 377
Boukhebouze, Mohamed 24
Brdjanin, Drazen 292
Briand, Lionel 338
Brisaboa, Nieves R. 241
Cabot, Jordi 332
Caniupan, Monica 75
Chen, Tao 251
Christiaens, Stijn 389
Clasen, Caue 332
Clementini, Eliseo 231
Corchuelo, Rafael 345
Dau, Frithjof 45
De Antonellis, Valeria 34
Debruyne, Christophe 389
de Hemptinne, Gregoire 377
Delbaere, Marc 377
De Leenheer, Pieter 389
de Oliveira, Jose Palazzo Moreira 2
De Troyer, Olga 120
Di Tria, Francesco 86
Egenhofer, Max J. 261
El Dammagh, Mohammed 120
Embley, David W. 183
Erbin, Lim 24
Faulkner, Stephane 44
Frasincar, Flavius 1
Frixione, Marcello 210
Gailly, Frederik 163
García-Ranea, Raul 323
Garrigos, Irene 14, 336
Geerts, Guido L. 291
Glorio, Octavio 336
Golfarelli, Matteo 358
Gómez, Cristina 323
Guerra, Francesco 328
Guizzardi, Giancarlo 161
Hallot, Pierre 230, 322
Hernandez, Inma 345
Hernandez, Paul 336
Houben, Geert-Jan 1
Jansen, Slinger 151
Jouault, Frederic 332
Jureta, Ivan J. 44
Kabbedijk, Jaap 151
Käkölä, Timo K. 119
Kästner, Christian 130
Khan, Ateeq 130
Kingham, Simon 231
Köppen, Veit 130
Lechtenborger, Jens 65
Lefons, Ezio 86
Levin, Ana M. 203
Liddle, Stephen W. 183
Lieto, Antonio 210
Linner, Konrad 55
Loebe, Frank 193
Lonsdale, Deryle W. 183
Lopes, Giseli Rabello 2
Lozano, Angela 141
Luaces, Miguel R. 241
Lukyanenko, Roman 220
Maric, Slavko 292
Marth, Kevin 312
Masolo, Claudio 173
Mazon, Jose-Norberto 14, 65, 336
McGinnes, Simon 4
Melchiori, Michele 34
Mens, Kim 118