Fundamental Approaches
to Software Engineering
21st International Conference, FASE 2018
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2018
Thessaloniki, Greece, April 14–20, 2018, Proceedings
Lecture Notes in Computer Science 10802
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, UK
Takeo Kanade, USA
Josef Kittler, UK
Jon M. Kleinberg, USA
Friedemann Mattern, Switzerland
John C. Mitchell, USA
Moni Naor, Israel
C. Pandu Rangan, India
Bernhard Steffen, Germany
Demetri Terzopoulos, USA
Doug Tygar, USA
Gerhard Weikum, Germany
Alessandra Russo · Andy Schürr (Eds.)
Fundamental Approaches
to Software Engineering
21st International Conference, FASE 2018
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2018
Thessaloniki, Greece, April 14–20, 2018
Proceedings
Editors
Alessandra Russo, Imperial College London, London, UK
Andy Schürr, TU Darmstadt, Darmstadt, Germany
© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ETAPS Foreword
Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017
in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am
happy to announce that this is the first ETAPS with gold open access proceedings. This
means that all papers are accessible by anyone for free.
ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory
and Practice of Software. ETAPS is an annual federated conference established in
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST.
Each conference has its own Program Committee (PC) and its own Steering Com-
mittee. The conferences cover various aspects of software systems, ranging from
theoretical computer science and foundations to programming language developments,
analysis tools, formal approaches to software engineering, and security. Organizing
these conferences in a coherent, highly synchronized conference program facilitates
participation in an exciting event, offering attendees the possibility to meet many
researchers working in different directions in the field, and to easily attend talks of
different conferences. Before and after the main conference, numerous satellite work-
shops take place and attract many researchers from all over the globe.
ETAPS 2018 received 479 submissions in total, 144 of which were accepted,
yielding an overall acceptance rate of 30%. I thank all the authors for their interest in
ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their
contributions, and in particular the PC (co-)chairs for their hard work in running this
entire intensive process. Last but not least, my congratulations to all authors of the
accepted papers!
ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google
Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave
(AT&T Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and
(ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited
tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on
modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on
hardware verification. My sincere thanks to all these speakers for their inspiring and
interesting talks!
ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the
Department of Informatics of the Aristotle University of Thessaloniki. The university
was founded in 1925 and currently has around 75,000 students; it is the largest uni-
versity in Greece. ETAPS 2018 was further supported by the following associations
and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer
Science), EAPLS (European Association for Programming Languages and Systems),
and EASST (European Association of Software Science and Technology). The local
organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos,
Preface
This book contains the proceedings of FASE 2018, the 21st International Conference
on Fundamental Approaches to Software Engineering, held in Thessaloniki, Greece, in
April 2018, as part of the annual European Joint Conferences on Theory and Practice of
Software (ETAPS 2018).
As usual for FASE, the contributions combine the development of conceptual and
methodological advances with their formal foundations, tool support, and evaluation on
realistic or pragmatic cases. As a result, the volume contains regular research papers
that cover a wide range of topics, such as program and system analysis, model
transformations, configuration and synthesis, graph modeling and transformation,
software product lines, test selection, as well as learning and inference. We hope that
the community will find this volume engaging and worth reading.
The contributions included have been carefully selected. For the third time, FASE
used a double-blind review process, as the past two years’ experiments were considered
valuable by authors and worth the additional effort of anonymizing the papers. We
received 77 abstract submissions from 24 different countries, from which 63 full-paper
submissions materialized. All papers were reviewed by three experts in the field, and
after intense discussion, only 19 were accepted, giving an acceptance rate of 30%.
We thank the ETAPS 2018 general chair Panagiotis Katsaros, the ETAPS orga-
nizers, Ioannis Stamelos, Lefteris Angelis, and George Rahonis, the ETAPS publicity
chairs, Ezio Bartocci and Simon Bliudze, as well as the ETAPS SC chair, Joost-Pieter
Katoen, for their support during the whole process. We thank all the authors for their
hard work and willingness to contribute. Last but not least, we thank all the Program
Committee members and external reviewers, who invested time and effort in the
selection process to ensure the scientific quality of the program.
Program Committee
Ruth Breu Universität Innsbruck, Austria
Yuanfang Cai Drexel University, USA
Sagar Chaki Carnegie Mellon University, USA
Hana Chockler King’s College London, UK
Ewen Denney NASA Ames, USA
Stefania Gnesi ISTI-CNR, Italy
Dilian Gurov Royal Institute of Technology (KTH), Sweden
Zhenjiang Hu National Institute for Informatics, Japan
Reiner Hähnle Darmstadt University of Technology, Germany
Valerie Issarny Inria, France
Einar Broch Johnsen University of Oslo, Norway
Gerti Kappel Vienna University of Technology, Austria
Ekkart Kindler Technical University of Denmark, Denmark
Kim Mens Université catholique de Louvain, Belgium
Fernando Orejas Universitat Politècnica de Catalunya, Spain
Fabrizio Pastore University of Luxembourg, Luxembourg
Arend Rensink Universiteit Twente, The Netherlands
Leila Ribeiro Universidade Federal do Rio Grande do Sul, Brazil
Julia Rubin The University of British Columbia, USA
Bernhard Rumpe RWTH Aachen, Germany
Alessandra Russo Imperial College London, UK
Rick Salay University of Toronto, Canada
Ina Schaefer Technische Universität Braunschweig, Germany
Andy Schürr Darmstadt University of Technology, Germany
Marjan Sirjani Reykjavik University, Iceland
Wil Van der Aalst RWTH Aachen, Germany
Daniel Varro Budapest University of Technology and Economics,
Hungary
Virginie Wiels ONERA/DTIM, France
Yingfei Xiong Peking University, China
Didar Zowghi University of Technology Sydney, Australia
A Formal Framework for Incremental
Model Slicing
1 Introduction
Program slicing as introduced by Weiser [1] is a technique which determines
those parts of a program (the slice) which may affect the values of a set of
(user-)selected variables at a specific point (the slicing criterion). Since the sem-
inal work of Weiser, which calculates a slice by utilizing static data and control
flow analysis and which primarily focuses on assisting developers in debugging,
a plethora of program slicing techniques addressing a broad range of use cases
have been proposed [2].
With the advent of Model-Driven Engineering (MDE) [3], models rather than
source code play the role of primary software development artifacts. Similar use
cases as known from program slicing must be supported for model slicing [4–6]. In
addition to classical use cases adopted from the field of program understanding,
model slicing is often motivated by scalability issues when working with very
large models [7,8], which has often been mentioned as one of the biggest obstacles
in applying MDE in practice [9,10]. Modeling frameworks such as the Eclipse
Modeling Framework (EMF) and widely-used model management tools do not
scale beyond a few tens of thousands of model elements [11], while large-scale
industrial models are considerably larger [12]. As a consequence, such models
cannot even be edited in standard model editors. Thus, the extraction of editable
submodels from a larger model is the only viable solution to support an efficient
yet independent editing of huge monolithic models [8]. Further example scenarios
in which model slices may be constructed for the sake of efficiency include model
checkers, test suite generators, etc., in order to reduce runtimes and memory
consumption.
Slice criteria are often modified during software development tasks. This
leads to corresponding slice updates (also called slice adaptations in [8]). During
a debugging session, e.g., the slicing criterion might need to be modified in order
to closer inspect different debugging hypotheses. The independent editing of
submodels is another example of this. Here, a slice created for an initial slicing
criterion can turn out to be inappropriate, most typically because additional
model elements are desired or because the slice is still too large. These slice
update scenarios have in common that the original slicing criterion is modified
and that the existing slice must be updated w.r.t. the new slicing criterion.
Model slicing is faced with two challenging requirements which do not exist or
which are of minor importance for traditional program slicers. First, the increas-
ing importance and prevalence of domain-specific modeling languages (DSMLs)
as well as a considerable number of different use cases lead to a huge number of
different concrete slicers, examples will be presented in Sect. 2. Thus, methods
for developing model slicers should abstract from a slicer’s concrete behavior
(and thus from concrete modeling languages) as far as possible. Ideally, model
slicers should be generic in the sense that the behavior of a slicer is adapt-
able with moderate configuration effort [7]. Second, rather than creating a new
slice from scratch for a modified slicing criterion, slices must often be updated
incrementally. This is indispensable for all use cases where slices are edited by
developers since otherwise these slice edits would be blindly overwritten [8]. In
addition, incremental slice updating is a desirable feature when it is more effi-
cient than creating the slice from scratch. To date, both requirements have been
insufficiently addressed in the literature.
In this paper, we present a fundamental methodology for developing model
slicers which abstract from the behavior of a concrete slicer and which support
incremental model slicing. To be independent of a concrete DSML and use cases,
we restrict ourselves to static slicing in order to support both executable and
non-executable models. We make the following contributions:
2 Motivating Example
In this section we introduce a running example to illustrate two use cases of
model slicing and to motivate incremental slice updates.
Figure 1 shows an excerpt of the system model of the Barbados Car Crash
Crisis Management System (bCMS) [13]. It describes the operations of a police
and a fire department in case of a crisis situation.
Fig. 1. Excerpt of the system model of the bCMS case study [13].
The system is modeled from different viewpoints. The class diagram mod-
els the key entities and their relationships from a static point of view. A
police station coordinator (PS coordinator) and a fire station coordinator (FS
coordinator) are responsible for coordinating and synchronizing the activities
on the police and fire station during a crisis. The interaction of both coordinators
is managed by the respective system classes PSC System and FSC System which
contain several operations for, e.g., establishing the communication between the
coordinators and exchanging crisis details. The state machine diagram models
the dynamic view of the class PSC System, i.e., its runtime behavior, for send-
ing and receiving authorization credentials and crisis details to and from a FSC
System. Initially, the PSC System is in the state Idle. The establishment of the
Model Slicing. Model slicers are used to find parts of interest in a given model
M. These parts of M are specified by a slicing criterion, which is basically a set
of model elements or, more formally, a submodel C of M. A slicer extends C
with further model elements of M according to the purpose of the slicer.
We illustrate this with two use cases. Use case A is known as backward slicing
in state-based models [4]. Given a set of states C in a statechart M as slicing
criterion, the slicer determines all model elements which may have an effect
on states in C. For instance, using S.1.0.1 (see the gray state in Fig. 1) as slicing
criterion, the slicer recursively determines all incoming transitions and their
sources, e.g., the transition with the event sendPScoordinatorCredentials and
its source state S.1.0.0, until an initial state is reached.
The complete backward slice is indicated by the blue elements in the lower
part of Fig. 1. The example shows that our general notion of a slicing criterion
may be restricted by concrete model slicers. In this use case, the slicing criterion
must not be an arbitrary submodel of a given larger model, but a very specific
one, i.e., a set of states.
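To make use case A concrete, here is a minimal Python sketch of backward slicing over a plain state-machine representation. The data structures and the event name on the transition leaving Idle are illustrative assumptions of ours; the paper's slicer is instead configured by transformation rules (see Sect. 5).

```python
# Minimal sketch of use case A (backward slicing in state-based models).
# The State/Transition encoding is ours; the event "establishCommunication"
# on the transition leaving Idle is a hypothetical name.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    source: str   # name of the source state
    target: str   # name of the target state
    event: str    # triggering event (corresponds to an operation)

@dataclass
class StateMachine:
    initial: str
    transitions: list[Transition] = field(default_factory=list)

def backward_slice(sm: StateMachine, criterion: set[str]):
    """Collect all states and transitions that may affect the states in
    `criterion` by following incoming transitions back to the initial state."""
    states, trans = set(criterion), set()
    worklist = list(criterion)
    while worklist:
        s = worklist.pop()
        for t in sm.transitions:
            if t.target == s and t not in trans:
                trans.add(t)
                if t.source not in states:
                    states.add(t.source)
                    worklist.append(t.source)
    return states, trans

# Mirroring Fig. 1: criterion {"S.1.0.1"} pulls in S.1.0.0, Idle, and the
# transitions connecting them.
sm = StateMachine(initial="Idle", transitions=[
    Transition("Idle", "S.1.0.0", "establishCommunication"),
    Transition("S.1.0.0", "S.1.0.1", "sendPScoordinatorCredentials"),
])
print(backward_slice(sm, {"S.1.0.1"}))
```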
Use case B is the extraction of editable models as presented in [8]. Here,
the slicing criterion C is given by a set of requested model elements of M . The
purpose of this slicer is to find a submodel which is editable and which includes
all requested model elements. For example, if we use the blue elements in the
lower part of Fig. 1 as slicing criterion, the model slice also contains the orange
elements in the upper part of Fig. 1, namely three operations, because the events
of transitions in a statechart represent operations in the class diagram, and the
class containing these operations.
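Use case B can be sketched in the same style: the requested elements are closed under the references that the modeling language needs for well-formedness. The `requires` relation below is a hypothetical stand-in for those language rules (transition event requires the operation, which requires its containing class).

```python
# Minimal sketch of use case B (extraction of editable submodels). The element
# identifiers and the `requires` relation are hypothetical stand-ins for the
# well-formedness rules of the modeling language.
def editable_slice(requested: set, requires: dict) -> set:
    """Close the requested elements under `requires` so that the resulting
    submodel has no dangling references and thus remains editable."""
    result, worklist = set(requested), list(requested)
    while worklist:
        e = worklist.pop()
        for dep in requires.get(e, ()):
            if dep not in result:
                result.add(dep)
                worklist.append(dep)
    return result

requires = {
    "transition:sendPScoordinatorCredentials": {"operation:sendPScoordinatorCredentials"},
    "operation:sendPScoordinatorCredentials": {"class:PSC System"},
}
print(editable_slice({"transition:sendPScoordinatorCredentials"}, requires))
```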
3 Formal Framework
We have seen in the motivating example that model slicers can differ consider-
ably in their intended purpose. The formal framework we present in the following
defines the fundamental concepts for model slicing and slice updates. This frame-
work uses graph-based models and model modifications [14]. It serves as a
guideline for defining model slicers that support incremental slice updates.
Example 1 (Typed model graph). The left-hand side of Fig. 2 shows the model
graph of an excerpt from the model depicted in Fig. 1. (In the following, we
usually omit the adjective "attributed".) The model graph is
typed over the meta-model depicted on the right-hand side of Fig. 2. It shows a
simplified excerpt of the UML meta-model. Every node (and edge) of the model
graph is mapped onto a node or edge of the type graph by the graph morphism
$\mathit{type}: M \to MM$.
Typed models and morphisms as defined above form the category $\mathbf{AGraphs}_{ATG}$
in [15]. This category has various useful properties: since it is an adhesive HLR
category with a class M of injective graph morphisms with isomorphic data mapping,
it has pushouts and pullbacks where at least one morphism is in M. These
constructions can be considered as generalized union and intersection of models,
defined component-wise on nodes and edges such that they are structure-compatible.
They are used to define the formal framework.
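For intuition, and assuming for simplicity that the M-morphisms involved are inclusions, the component-wise reading of these constructions is (our notation, not taken from [15]):

\[
(G_1 \cap G_2)_V = (G_1)_V \cap (G_2)_V,\qquad (G_1 \cap G_2)_E = (G_1)_E \cap (G_2)_E \quad \text{(pullback as intersection)}
\]
\[
(G_1 \cup G_2)_V = (G_1)_V \cup (G_2)_V,\qquad (G_1 \cup G_2)_E = (G_1)_E \cup (G_2)_E \quad \text{(pushout as union)}
\]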
All model modifications are concatenated, yielding the direct model modification
$S_1 \xleftarrow{\,e_1 \circ c_1\,} C_s \xrightarrow{\,e_2 \circ c_2\,} S_2$, called the slice update construction (see also Fig. 6).
satisfied, model slicing through slice-creating edit scripts indeed behaves according
to Definition 4, i.e., a slice $S = \mathrm{Slice}(M, C \to M)$ is obtained by applying
$\Delta_{\emptyset\Rightarrow S}$ to the empty model: the resulting slice S is a submodel of M and a
supermodel of C. As we will see in Sect. 5, the behavior of a concrete model slicer and
thus its intended purpose is configured by the transformation rule set R.
4.4 Implementation
The framework instantiation has been implemented using a set of standard MDE
technologies on top of the widely used Eclipse Modeling Framework (EMF),
which employs an object-oriented implementation of graph-based models in
which nodes and edges are represented as objects and references, respectively.
Edit scripts are calculated using the model differencing framework SiLift [21],
which uses EMF Compare [22] in order to determine the corresponding elements
in a pair of models being compared with each other. A matching determined by
EMF Compare fulfills the requirements presented in Sect. 4.1 since EMF Com-
pare (a) delivers 1:1-correspondences between elements, thus yielding an injective
mapping, and (b) implicitly matches edges if their respective source and target
nodes are matched and if they have the same type (because EMF does not sup-
port parallel edges of the same type in general), thus yielding an edge-preserving
mapping. Finally, transformation rules are implemented using the model trans-
formation language and framework Henshin [23,24] which is based on graph
transformation concepts.
Fig. 7. Subset of the creation rules for configuring a state-based model slicer
single step. To support the incremental updating of slices, for each creation
rule an inverse deletion rule is included in the overall set of transformation rules.
Parts of the resulting model-creating edit script using these rules are shown in
Fig. 8. For example, rule application p3 creates the state Idle in the top-level
region of the state machine PSCSystem, together with an incoming transition
having the initial state of the state machine, created by rule application p2, as
source state. Thus, p3 depends on p2 since the initial state must be created first.
Similar dependency relationships arise for the creation of other states which are
created together with an incoming transition.
The effect of this configuration on the behavior of the model slicer is as follows
(illustrated here for the creation of a new slice): If state S.1.0.1 is selected as
slicing criterion, as in our motivating example, rule application p7 is included
in the slice-creating edit script since it creates that state. Implicitly, all rule
applications on which p7 transitively depends, i.e., all rule applications p1
to p6, are also included in the slice-creating edit script. Consequently, applying
the slice-creating edit script to an empty model yields a slice that is a submodel
of the state machine of Fig. 1 and contains a transition path from its initial state
to state S.1.0.1, according to the desired behavior of the slicer.
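The inclusion of dependent rule applications amounts to a transitive closure over the dependency relation of the model-creating edit script. The sketch below illustrates this; the `creators` and `depends_on` dictionaries are abridged, hypothetical excerpts of the running example, whereas in the implementation this information is provided by the differencing and transformation tooling.

```python
# Sketch of deriving a slice-creating edit script by transitive-dependency
# closure. Rule-application names follow the running example (abridged);
# the dependency relation itself would come from the tooling.
def slice_creating_script(creators: dict, depends_on: dict, criterion: set) -> set:
    """Return the rule applications that create the slicing-criterion elements,
    closed under the dependency relation."""
    selected = {p for p, element in creators.items() if element in criterion}
    worklist = list(selected)
    while worklist:
        p = worklist.pop()
        for q in depends_on.get(p, ()):
            if q not in selected:
                selected.add(q)
                worklist.append(q)
    return selected

creators = {"p2": "initial state", "p3": "Idle", "p7": "S.1.0.1"}   # excerpt only
depends_on = {"p3": {"p2"}, "p4": {"p3"}, "p5": {"p4"},
              "p6": {"p5"}, "p7": {"p6"}}                            # hypothetical chain
print(sorted(slice_creating_script(creators, depends_on, {"S.1.0.1"})))
# -> ['p2', 'p3', 'p4', 'p5', 'p6', 'p7']
```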
A current limitation of our solution is that, for each state s of the slicing
criterion, only a single transition path from the initial state to state s is sliced.
This path is determined non-deterministically from the set of all possible paths
from the initial state to state s. To overcome this limitation, rule schemes com-
prising a kernel rule and a set of multi-rules (see, e.g., [26,27]) would have to
be supported by our approach. Then, a rule scheme for creating a state with an
arbitrary number of incoming transitions could be included in the configuration
of our slicer, which in turn leads to the desired effect during model slicing. We
leave such a support for rule schemes for future work.
6 Related Work
A large number of model slicers has been developed. Most of them work only
with one specific type of models, notably state machines [4] and other types of
behavioral models such as MATLAB/Simulink block diagrams [5]. Other sup-
ported model types include UML class diagrams [31], architectural models [32] or
system models defined using the SysML modeling language [33]. None of these
approaches can be transferred to other (domain-specific) modeling languages,
and they do not abstract from concrete slicing specifications.
The only well-known, more generally usable technique that is adaptable to
a given modeling language and slicing specification is Kompren [7]. In contrast
to our formal framework, however, Kompren does not abstract from the con-
crete model modification approach and implementation technologies. It offers
a domain-specific language based on the Kermeta model transformation lan-
guage [34] to specify the behavior of a model slicer, and a generator which gen-
erates a fully functioning model slicer from such a specification. When Kompren
is used in the so-called active mode, slices are incrementally updated when the
input model changes, according to the principle of incremental model transfor-
mation [35]. In our approach, slices are incrementally updated when the slicing
criterion is modified. As long as only endogenous model transformations are used
for constructing slices, Kompren could easily be extended to become an
instantiation of our formal framework.
Incremental slicing has also been addressed in [36], however, using a notion
of incrementality which fundamentally differs from ours. The technique has been
developed in the context of testing model-based delta-oriented software product
lines [37]. Rather than incrementally updating an existing slice, the approach
incrementally processes the product space of a product line, where each “product”
is specified by a state machine model. As in software regression testing, the goal
is to obtain retest information by utilizing differences between state machine
slices obtained from different products.
In a broader sense, related work can be found in the area of model splitting
and model decomposition. The technique presented in [38] aims at splitting a
model into submodels according to linguistic heuristics and using information
retrieval techniques. The model decomposition approach presented in [39] consid-
ers models as graphs and first determines strongly connected graph components
from which the space of possible decompositions is derived in a second step.
Both approaches are different from ours in that they produce a partitioning of
an input model instead of a single slice. None of them supports the incremental
updating of a model partitioning.
7 Conclusion
We presented a formal framework for defining model slicers that support incre-
mental slice updates based on a general concept of model modifications. Incre-
mental slice updates were shown to be equivalent to non-incremental ones. Fur-
thermore, we presented a framework instantiation based on the concept of edit
References
1. Weiser, M.: Program slicing. In: Proceedings of ICSE 1981. IEEE Press (1981)
2. Xu, B., Qian, J., Zhang, X., Wu, Z., Chen, L.: A brief survey of program slicing.
ACM SIGSOFT Softw. Eng. Notes 30(2), 1–36 (2005)
3. Brambilla, M., Cabot, J., Wimmer, M.: Model-driven software engineering in prac-
tice. Synth. Lect. Softw. Eng. 1(1), 1–182 (2012)
4. Androutsopoulos, K., Clark, D., Harman, M., Krinke, J., Tratt, L.: State-based
model slicing: A survey. ACM Comput. Surv. 45(4), 36 (2013). https://doi.org/
10.1145/2501654.2501667. Article 53
5. Gerlitz, T., Kowalewski, S.: Flow sensitive slicing for matlab/simulink models. In:
Proceedings of WICSA 2016. IEEE (2016)
6. Samuel, P., Mall, R.: A novel test case design technique using dynamic slicing of
UML sequence diagrams. e-Informatica 2(1), 71–92 (2008)
7. Blouin, A., Combemale, B., Baudry, B., Beaudoux, O.: Kompren: modeling and
generating model slicers. SoSyM 14(1), 321–337 (2015)
8. Pietsch, C., Ohrndorf, M., Kelter, U., Kehrer, T.: Incrementally slicing editable
submodels. In: Proceedings of ASE 2017. IEEE Press (2017)
9. Baker, P., Loh, S., Weil, F.: Model-driven engineering in a large industrial context—
Motorola case study. In: Briand, L., Williams, C. (eds.) MODELS 2005. LNCS,
vol. 3713, pp. 476–491. Springer, Heidelberg (2005). https://doi.org/10.1007/
11557432_36
10. Hutchinson, J., Whittle, J., Rouncefield, M., Kristoffersen, S.: Empirical assessment
of MDE in industry. In: Proceedings of ICSE 2011. IEEE (2011)
11. Kolovos, D.S., Paige, R.F., Polack, F.A.C.: The grand challenge of scalability for
model driven engineering. In: Chaudron, M.R.V. (ed.) MODELS 2008. LNCS, vol.
5421, pp. 48–53. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-
01648-6_5
12. Kolovos, D.S., Rose, L.M., Matragkas, N., Paige, R.F., Guerra, E., Cuadrado, J.S.,
De Lara, J., Ráth, I., Varró, D., Tisi, M., et al.: A research roadmap towards
achieving scalability in model driven engineering. In: Proceedings of BigMDE @
STAF 2013. ACM (2013)
13. Capozucca, A., Cheng, B., Guelfi, N., Istoan, P.: OO-SPL modelling of the focused
case study. In: Proceedings of CMA @ MoDELS 2011 (2011)
14. Taentzer, G., Ermel, C., Langer, P., Wimmer, M.: Conflict detection for model
versioning based on graph modifications. In: Ehrig, H., Rensink, A., Rozenberg, G.,
Schürr, A. (eds.) ICGT 2010. LNCS, vol. 6372, pp. 171–186. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-15928-2_12
15. Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Fundamentals of Algebraic
Graph Transformation. Springer, Heidelberg (2006). https://doi.org/10.1007/3-
540-31188-2
16. Habel, A., Pennemann, K.: Correctness of high-level transformation systems rela-
tive to nested conditions. Math. Struct. Comput. Sci. 19(2), 245–296 (2009)
17. Kehrer, T., Kelter, U., Taentzer, G.: Consistency-preserving edit scripts in model
versioning. In: Proceedings of ASE 2013. IEEE (2013)
18. Kolovos, D.S., Di Ruscio, D., Pierantonio, A., Paige, R.F.: Different models for
model matching: an analysis of approaches to support model differencing. In: Pro-
ceedings of CVSM @ ICSE 2009. IEEE (2009)
19. Kehrer, T., Kelter, U., Pietsch, P., Schmidt, M.: Adaptability of model comparison
tools. In: Proceedings of ASE 2011. ACM (2012)
20. Kehrer, T., Kelter, U., Taentzer, G.: A rule-based approach to the semantic lifting
of model differences in the context of model versioning. In: Proceedings of ASE
2011. IEEE (2011)
21. Kehrer, T., Kelter, U., Ohrndorf, M., Sollbach, T.: Understanding model evolution
through semantically lifting model differences with SiLift. In: Proceedings of ICSM
2012. IEEE Computer Society (2012)
22. Brun, C., Pierantonio, A.: Model differences in the eclipse modeling framework.
UPGRADE Eur. J. Inform. Prof. 9(2), 29–34 (2008)
23. Arendt, T., Biermann, E., Jurack, S., Krause, C., Taentzer, G.: Henshin: advanced
concepts and tools for in-place EMF model transformations. In: Petriu, D.C.,
Rouquette, N., Haugen, Ø. (eds.) MODELS 2010. LNCS, vol. 6394, pp. 121–135.
Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16145-2_9
24. Strüber, D., Born, K., Gill, K.D., Groner, R., Kehrer, T., Ohrndorf, M., Tichy,
M.: Henshin: a usability-focused framework for EMF model transformation devel-
opment. In: de Lara, J., Plump, D. (eds.) ICGT 2017. LNCS, vol. 10373, pp.
196–208. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61470-0_12
25. Taentzer, G., Kehrer, T., Pietsch, C., Kelter, U.: Accompanying website for this
paper (2017). http://pi.informatik.uni-siegen.de/projects/SiLift/fase2018/
26. Rozenberg, G. (ed.): Handbook of Graph Grammars and Computing by Graph
Transformation. Foundations, vol. I. World Scientific Publishing Co., Inc., River
Edge (1997)
27. Biermann, E., Ermel, C., Taentzer, G.: Lifting parallel graph transformation con-
cepts to model transformation based on the eclipse modeling framework. Electron.
Commun. EASST 26 (2010)
28. Kehrer, T., Taentzer, G., Rindt, M., Kelter, U.: Automatically deriving the spec-
ification of model editing operations from meta-models. In: Van Gorp, P.,
Engels, G. (eds.) ICMT 2016. LNCS, vol. 9765, pp. 173–188. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-42064-6_12
29. Rindt, M., Kehrer, T., Kelter, U.: Automatic generation of consistency-preserving
edit operations for MDE tools. In: Proceedings of Demos @ MoDELS 2014. CEUR
Workshop Proceedings, vol. 1255 (2014)
30. Kehrer, T., Rindt, M., Pietsch, P., Kelter, U.: Generating edit operations for pro-
filed UML models. In: Proceedings ME @ MoDELS 2013. CEUR Workshop Pro-
ceedings, vol. 1090 (2013)
31. Kagdi, H., Maletic, J.I., Sutton, A.: Context-free slicing of UML class models. In:
Proceedings of ICSM 2005. IEEE (2005)
32. Lallchandani, J.T., Mall, R.: A dynamic slicing technique for UML architectural
models. IEEE Trans. Softw. Eng. 37(6), 737–771 (2011)
33. Nejati, S., Sabetzadeh, M., Falessi, D., Briand, L., Coq, T.: A SysML-based app-
roach to traceability management and design slicing in support of safety certifica-
tion: framework, tool support, and case studies. Inf. Softw. Technol. 54(6), 569–590
(2012)
34. Jézéquel, J.-M., Barais, O., Fleurey, F.: Model driven language engineering with
Kermeta. In: Fernandes, J.M., Lämmel, R., Visser, J., Saraiva, J. (eds.) GTTSE
2009. LNCS, vol. 6491, pp. 201–221. Springer, Heidelberg (2011). https://doi.org/
10.1007/978-3-642-18023-1_5
35. Etzlstorfer, J., Kusel, A., Kapsammer, E., Langer, P., Retschitzegger, W., Schoen-
boeck, J., Schwinger, W., Wimmer, M.: A survey on incremental model trans-
formation approaches. In: Pierantonio, A., Schätz, B. (eds.) Proceedings of the
Workshop on Models and Evolution. CEUR Workshop Proceedings, vol. 1090, pp.
4–13 (2013)
36. Lity, S., Morbach, T., Thüm, T., Schaefer, I.: Applying incremental model slicing
to product-line regression testing. In: Kapitsaki, G.M., Santana de Almeida, E.
(eds.) ICSR 2016. LNCS, vol. 9679, pp. 3–19. Springer, Cham (2016). https://doi.
org/10.1007/978-3-319-35122-3_1
37. Schaefer, I., Bettini, L., Bono, V., Damiani, F., Tanzarella, N.: Delta-oriented pro-
gramming of software product lines. In: Bosch, J., Lee, J. (eds.) SPLC 2010. LNCS,
vol. 6287, pp. 77–91. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-
642-15579-6_6
38. Struber, D., Rubin, J., Taentzer, G., Chechik, M.: Splitting models using infor-
mation retrieval and model crawling techniques. In: Gnesi, S., Rensink, A. (eds.)
FASE 2014. LNCS, vol. 8411, pp. 47–62. Springer, Heidelberg (2014). https://doi.
org/10.1007/978-3-642-54804-8_4
39. Ma, Q., Kelsen, P., Glodt, C.: A generic model decomposition technique and its
application to the eclipse modeling framework. SoSyM 14(2), 921–952 (2015)
Multiple Model Synchronization
with Multiary Delta Lenses
1 Introduction
Modelling normally results in a set of inter-related models presenting different
views of the system. If one of the models changes and their joint consistency
is violated, the related models should also be changed to restore consistency.
This task is obviously of paramount importance for MDE, but its theoretical
underpinning is inherently difficult and reliable practical solutions are rare. There
are working solutions for file synchronization in systems like Git, but they are
not applicable in the UML/EMF world of diagrammatic models. For the latter,
much work has been done for the binary case (synchronizing two models) by the
bidirectional transformation community (bx) [15], specifically, in the framework
of so called delta lenses [3], but the multiary case (the number of models to be
synchronized is n ≥ 2) has gained much less attention—cf. the energetic call to the
community in a recent paper by Stevens [16].
The context underlying bx is model transformation, in which one model in
the pair is considered as a transform of the other even though updates are prop-
agated in both directions (so called round-tripping). Once we go beyond n = 2,
we at once switch to a more general context of models inter-relations beyond
model-to-model transformations. Such situations have been studied in the con-
text of multiview system consistency, but rarely in the context of an accurate
formal basis for update propagation. The present paper can be seen as an adap-
tation of the (delta) lens-based update propagation framework for the multiview
2 Example
We will consider a simple example motivating our framework. Many formal con-
structs below will be illustrated with the example (or its fragments) and referred
to as Running example.
an agency investigating traffic problems, which maintains its own data on com-
muting between addresses (see schema M3 ) computable by an obvious relational
join over M1 and M2 . In addition, the agency supervises consistency of the two
sources and requires that if they both know a person p and a company c, then
they must agree on the employment record (p, c): it is either stored by both or
by neither of the sources. For this synchronization, it is assumed that persons
and companies are globally identified by their names. Thus, a triple of data sets
(we will say models) A1 , A2 , A3 , instantiating the respective metamodels, can
be either consistent (if the constraints described above are satisfied) or inconsis-
tent (if they aren’t). In the latter case, we normally want to change some or all
models to restore consistency. We will call a collection of models to be kept in
sync a multimodel.
To talk about constraints for multimodels, we need an accurate notation.
If $A$ is a model instantiating metamodel $M$ and $X$ is a class in $M$, we write
$X^A$ for the set of objects instantiating $X$ in $A$. Similarly, if $r: X_1 \leftrightarrow X_2$ is
an association in $M$, we write $r^A$ for the corresponding binary relation over
$X_1^A \times X_2^A$. For example, Fig. 2 presents a simple model $A_1$ instantiating $M_1$ with
$\mathsf{Person}^{A_1} = \{p_1, p_1'\}$, $\mathsf{Company}^{A_1} = \{c_1\}$, $\mathsf{empl\text{-}er}^{A_1} = \{(p_1, c_1)\}$, and similarly
for attributes ($\mathsf{lives}^{A_1}$ and also $\mathsf{name}^{A_1}$ are assumed to be functions, and $\mathsf{Addr}$ is
the (model-independent) set of all possible addresses). The triple $(A_1, A_2, A_3)$ is a (state of a)
multimodel over the multimetamodel (M1 , M2 , M3 ), and we say it is consistent if
the two constraints specified below are satisfied. Constraint (C1) specifies mutual
consistency of models A1 and A2 in the sense described above; constraint (C2)
specifies consistency between the agency’s view of data and the two data sources:
where $^{-1}$ refers to the inverse relations and $\bowtie$ denotes relational join (composition);
using subsetting rather than equality in (C2) assumes that there are other
data sources the agency can use. Note that constraint (C1) inter-relates two
component models of the multimodel, while (C2) involves all three components
and forces synchronization to be 3-ary.
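A minimal sketch of checking (C1) and (C2) over plain Python sets is given below; relation and attribute names are simplified stand-ins for the metamodels of Fig. 1, and the checking code itself is ours, not part of the paper's formalism.

```python
# Minimal consistency check for (C1) and (C2); names are simplified stand-ins
# for the metamodels of Fig. 1.
def check_c1(empl_er, empl_ee, persons1, persons2, companies1, companies2):
    """(C1): if both sources know person p and company c, they must agree on
    the employment record (p, c)."""
    for p in persons1 & persons2:
        for c in companies1 & companies2:
            if ((p, c) in empl_er) != ((c, p) in empl_ee):
                return False
    return True

def check_c2(lives, empl, located, commutes):
    """(C2): every commute derivable by joining the sources must be recorded
    in A3 (subset inclusion only, since the agency may use other sources)."""
    derived = {(lives[p], located[c]) for (p, c) in empl
               if p in lives and c in located}
    return derived <= commutes

# Running example: Mary's IBM employment is only known to A2, and the (a1, a15)
# commute is not recorded in A3, so both checks fail.
print(check_c1(empl_er=set(), empl_ee={("IBM", "Mary")},
               persons1={"Mary", "John"}, persons2={"Mary"},
               companies1={"IBM"}, companies2={"IBM", "Google"}))
print(check_c2(lives={"Mary": "a1"}, empl={("Mary", "IBM")},
               located={"IBM": "a15"}, commutes=set()))
```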
It is easy to see that multimodel $A_{1,2,3}$ in Fig. 2 is "two-times" inconsistent:
(C1) is violated as both $A_1$ and $A_2$ know Mary and IBM, and $(\mathsf{IBM}, \mathsf{Mary}) \in \mathsf{empl\text{-}ee}^{A_2}$
but $(\mathsf{Mary}, \mathsf{IBM}) \notin \mathsf{empl\text{-}er}^{A_1}$; (C2) is violated as $A_1$ and $A_2$
show a commuting pair (a1, a15) not recorded in $A_3$. We will discuss consistency
restoration in the next subsection, but first we need to discuss an important
part of the multimodel – traceability or correspondence mappings – held
implicit so far.
from IBM to Google, which will restore (C1) as A1 does not know Google. Simi-
larly, we can delete John’s record from A1 and then Mary’s employment with IBM
in A2 would not violate (C1). As the number of constraints and the elements they
involve increase, the number of consistency restoration variants grows fast.
The range of possibilities can be essentially decreased if we take into account the
history of creating inconsistency and consider not only an inconsistent state A† but
update u: A → A† that created it (assuming that A is consistent). For example,
suppose that initially model A1 contained record (Mary, IBM) (and A3 contained
(a1, a15)-commute), and the inconsistency appears after Mary’s employment with
IBM was deleted in A1 . Then it’s reasonable to restore consistency by deleting this
employment record in A2 too; we say that deletion was propagated from A1 to A2
(where we assume that initially A3 contained the commute (a1, a15)). If the incon-
sistency appears after adding (IBM, Mary)-employment to A2 , then it’s reasonable
to restore consistency by adding such a record to A1 . Although propagating dele-
tions/additions to deletions/additions is typical, there are non-monotonic cases
too. Let us assume that Mary and John are spouses (they live at the same address),
and that IBM follows an exotic policy prohibiting spouses to work together. Then
we can interpret addition of (IBM, Mary)-record to A2 as swapping of the family
member working for IBM, and then (John, IBM) is to be deleted from A1 .
Now let’s consider how updates to and from model A3 may be propagated.
As mentioned above, traceability/correspondence links play a crucial role here.
If additions to A1 or A2 or both create a new commute, the latter has to be
added to A3 (together with its corr-links) due to constraint (C2). In contrast, if
a new commute is added to A3 , we change nothing in A1,2 as (C2) only requires
inclusion. If a commute is deleted from A3 , and it is traced to a correspond-
ing employment in empl-erA1 ∪ empl-eeA2 , then this employment is deleted. (Of
course, there are other ways to remove a commute derivable over A1 and A2 .)
Finally, if a commute-generating employment in empl-erA1 ∪empl-eeA2 is deleted,
the respective commute in A3 is deleted too. Clearly, many of the propagation
policies above although formally correct, may contradict the real world changes
and hence should be corrected, but this is a common problem of a majority of
automatic synchronization approaches, which have to make guesses in order to
resolve non-determinism inherent in consistency restoration.
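A minimal sketch of such a history-aware policy for constraint (C1) follows; the Delta record and the propagation function are our own simplification and are not the paper's delta-lens machinery.

```python
# History-aware propagation policy for (C1): deletions propagate as deletions,
# additions as additions, restricted to pairs relevant for the other source.
from dataclasses import dataclass, field

@dataclass
class Delta:
    added: set = field(default_factory=set)     # (person, company) pairs added
    deleted: set = field(default_factory=set)   # (person, company) pairs deleted

def propagate_c1(delta_on_a1: Delta, persons2: set, companies2: set) -> Delta:
    """Propagate an update on A1 to A2: only pairs whose person and company
    are also known to A2 matter for (C1)."""
    relevant = lambda p, c: p in persons2 and c in companies2
    return Delta(
        added={(p, c) for (p, c) in delta_on_a1.added if relevant(p, c)},
        deleted={(p, c) for (p, c) in delta_on_a1.deleted if relevant(p, c)},
    )

# Deleting Mary's IBM employment in A1 is propagated as a deletion in A2.
u1 = Delta(deleted={("Mary", "IBM")})
print(propagate_c1(u1, persons2={"Mary"}, companies2={"IBM", "Google"}))
```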
her record yet. Deletion (IBM, Mary) from A2 seems to be a different event
unless there are strong causal dependencies between moving to downtown and
working for IBM. Thus, an update policy that would keep A2 unchanged but
amend addition of Mary to A1 with further automatic adding her employment
for IBM (as per model A2 ) seems reasonable. This means that updates can be
reflectively propagated (we also say self-propagated).
Of course, self-propagation does not necessarily mean non-propagation to
other directions. Consider the following case: model A1 initially only contains
(John, IBM) record and is consistent with A2 shown in Fig. 2. Then record (Mary,
Google) was added to A1 , which thus became inconsistent with A2 . To restore
consistency, (Mary, Google) is to be added to A2 (the update is propagated
from A1 to A2 ) and (Mary, IBM) is to be added to A1 as discussed above (i.e.,
addition of (Mary, Google) is amended or self-propagated).
A general schema of update propagation including reflection is shown in
Fig. 3. We begin with a consistent multimodel $(A_1...A_n, R)$,¹ one member of
which is updated via $u_i: A_i \to A_i'$. The propagation operation, based on a priori
defined propagation policies as sketched above, produces:
To distinguish given data from those produced by the operation, the former
are shown with framed nodes and solid lines in Fig. 3 while the latter are non-
framed and dashed. Below we introduce an algebraic model encompassing several
operations and algebraic laws formally modelling situations considered so far.
¹ Here we first abbreviate $(A_1, \ldots, A_n)$ by $(A_1...A_n)$, and then write $(A_1...A_n, R)$ for
$((A_1...A_n), R)$. We will apply this style in other similar cases, and write, e.g., $i \in 1...n$
for $i \in \{1, ..., n\}$ (this will also be written as $i \leq n$).
Basically, a model space is a category, whose nodes are called model states or just
models, and arrows are (directed) deltas or updates. For an arrow $u: A \to A'$,
we treat $A$ as the state of the model before update $u$, $A'$ as the state after the
update, and $u$ as an update specification. Structurally, it is a specification of
correspondences between $A$ and $A'$. Operationally, it is an edit sequence (edit
log) that changed $A$ to $A'$. The formalism does not prescribe what updates are,
but assumes that they form a category, i.e., there may be different updates from
state $A$ to state $A'$; updates are composable; and idle updates $\mathrm{id}_A: A \to A$ (doing
nothing) are the units of the composition.
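As a minimal illustration of this update-as-arrow view, the sketch below records an update's source state, target state, and edit log, together with sequential composition and identity updates; all names are ours.

```python
# Updates as arrows of a category: they compose when consecutive, and every
# model state has an identity (idle) update acting as the unit of composition.
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    src: str           # model state before the update
    tgt: str           # model state after the update
    log: tuple = ()    # edit log (sequence of elementary edits)

    def then(self, other: "Update") -> "Update":
        """Sequential composition of consecutive updates."""
        assert self.tgt == other.src, "updates must be consecutive"
        return Update(self.src, other.tgt, self.log + other.log)

def identity(model: str) -> Update:
    return Update(model, model, ())

u = Update("A", "A'", ("add Mary",))
v = Update("A'", "A''", ("delete John",))
assert identity("A").then(u) == u and u.then(identity("A'")) == u   # unit laws
print(u.then(v))
```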
In addition, we require every model space $\mathbf{A}$ to be endowed with a family
$(\mathbf{K}_A)_{A\in\mathbf{A}^\bullet}$ of binary relations $\mathbf{K}_A \subset \mathbf{A}(\_, A) \times \mathbf{A}(A, \_)$ indexed by objects
of $\mathbf{A}$ and specifying non-conflicting or compatible consecutive updates. Intuitively,
an update $u$ into $A$ is compatible with an update $u'$ from $A$ if $u'$ does not
revert/undo anything done by $u$, e.g., it does not delete/create objects created/deleted
by $u$, or re-modify attributes modified by $u$ (see [14] for a detailed discussion).
Formally, we only require $(u, \mathrm{id}_A)\in\mathbf{K}_A$ and $(\mathrm{id}_A, u')\in\mathbf{K}_A$ for all
$A \in \mathbf{A}^\bullet$, $u\in\mathbf{A}(\_, A)$ and $u'\in\mathbf{A}(A, \_)$.
In the sequel, we will work with families of model spaces indexed by a finite
set I, whose elements can be seen as space names. To simplify notation, we
will assume that I = {1, . . . , n} although ordering will not play any role in our
framework. Given a tuple of model spaces A1 , . . . , An , we will refer to objects
and arrows of the product category A1 × · · · × An as model tuples and update
tuples or, sometimes, as discrete multimodels/multiupdates.
Note that any corr $R$ uniquely defines a multimodel via the corr's boundary
function $\partial$. We will also need to identify the set of all corrs for some fixed $A \in \mathbf{A}_i^\bullet$
for a given $i$: $\mathbf{A}_i(A, \_) \stackrel{\mathrm{def}}{=} \{R \in \mathbf{A} \mid \partial_i R = A\}$.
The Running example of Sect. 2 gives rise to a 3-ary multimodel space. For
i ≤ 3, space Ai consists of all models instantiating metamodel Mi in Fig. 1
and their updates. To get a consistent multimodel $(A_1 A_2 A_3, R)$ from that one
shown in Fig. 2, we can add to $A_1$ an empl-er-link connecting Mary to IBM,
add to $A_3$ a commute with from = a1 and to = a15, and form a corr-set
$R = \{(p_1, p_2), (c_1, c_2)\}$ (all other corr-links are derivable from this data).
Stability says that lenses do nothing voluntarily. Reflect1 says that amendment
works towards "completion" rather than "undoing", and Reflect2-3 are idempotency
conditions to ensure that completion is indeed achieved.
Definition 6 (Invertibility). A wb lens is called (weakly) invertible if it
satisfies the following law for any $i$, update $u_i: A_i \to A_i'$ and $R \in \mathbf{A}_i(A_i, \_)$:
(Invert)$_i$: for all $j \neq i$, $\mathrm{ppg}^{R}_{ij}(\mathrm{ppg}^{R}_{ji}(\mathrm{ppg}^{R}_{ij}(u_i))) = \mathrm{ppg}^{R}_{ij}(u_i)$.
This law deals with "round-tripping": operation $\mathrm{ppg}^{R}_{ji}$ applied to update
$u_j = \mathrm{ppg}^{R}_{ij}(u_i)$ results in an update $\hat{u}_i$ equivalent to $u_i$ in the sense that
$\mathrm{ppg}^{R}_{ij}(\hat{u}_i) = \mathrm{ppg}^{R}_{ij}(u_i)$ (see [3] for a motivating discussion).
Example 1 (Identity Lens $(n\mathbf{A})$). Let $\mathbf{A}$ be an arbitrary model space. It generates
an $n$-ary lens $(n\mathbf{A})$ as follows: the carrier has $n$ identical model spaces,
$\mathbf{A}_i = \mathbf{A}$ for all $i \in \{1, .., n\}$, the set of corrs is $\mathbf{A}^\bullet$, and the boundary functions are
identities. All updates are propagated to themselves (hence the name of $(n\mathbf{A})$).
Obviously, $(n\mathbf{A})$ is a wb, invertible lens, non-reflective at all its feet.
access to relational storage would amend application data, and thus we have
a consistent corr R2 as shown. Step 4: lens b1 maps update v1 (see above
in Step 2) backward to $u_1'$ that adds (Mary, IBM) to $B_1$ so that $B_1$ includes
both (Mary, Google) and (Mary, IBM), and a respective consistent corr $R_1$ is
provided. There is no amendment for $v_1$ for the same reason as in Step 3.
Thus, all five models in the bottom line of Fig. 4 (A3 is not shown) are
mutually consistent and all show that Mary is employed by IBM and Google.
Synchronization is restored, and we can consider the entire scenario as propagation
of $u_1$ to $u_2'$ and its amendment with $u_1^{@}$ so that finally we have a consistent
corr $(R_1, R', R_2)$ interrelating $B_1$, $A_3$, $B_2$. Amendment $u_1^{@}$ is compatible
with $u_1$ as nothing is undone and condition $(u_1, u_1^{@}) \in \mathbf{K}_{B_1}$ holds; the other two
equations required by Reflect2-3 for the pair $(u_1, u_1^{@})$ also hold. For our simple
projection views, these conditions will hold for other updates too, and we have
a well-behaved propagation from B1 to B2 (and trivially to A3 ). Similarly, we
have a wb propagation from B2 to B1 and A3 . Propagation from A3 to B1,2 is
non-reflective and done in two steps: first lens k works, then lenses bi work as
described above (and updates produced by k are bi -closed). Thus, we have built
a wb ternary lens synchronizing spaces B1 , B2 and A3 by joining lenses b1 and
b2 to the central lens k .
The inset diagram presents the scenario above as a transition system and shows
that Steps 3 and 4 can go concurrently. It is the non-trivial amendment created
in Step 2 that causes the necessity of Step 4; otherwise Step 3 would finish
consistency restoration (with Step 4 being an idle transition). On the other hand, if
update $v_2$ in Fig. 4 would not be closed for lens $b_2$, we'd have yet another
concurrent step complicating the scenario. Fortunately for our example with simple
projective views, Step 4 is simple and provides a non-conflicting amendment, but
the case of more complex views beyond the constant-complement class needs care
and investigation. Below we specify a simple situation of lens composition with
reflection a priori excluded, and leave more complex cases for future work.
Likewise we write $\partial^{b_i}_x$ with $x \in \{A, B\}$ for the boundary functions of lenses $b_i$.
The above configuration gives rise to the following $n$-ary lens. The carrier is
the tuple of model spaces $\mathbf{B}_1...\mathbf{B}_n$, and corrs are tuples $(R, R_1...R_n)$ with $R \in \mathbf{k}$
and $R_i \in \mathbf{b}_i$, such that $\partial^{k}_i R = \partial^{b_i}_A R_i$ for all $i \in 1..n$. Moreover, we define
$\partial_i(R, R_1...R_n) \stackrel{\mathrm{def}}{=} \partial^{b_i}_B R_i$ (see Fig. 5). Operations are defined as compositions of
consecutive lens executions as described below (we will use the dot notation for
operation application and write $x.\mathrm{op}$ for $\mathrm{op}(x)$, where $x$ is an argument).
Given a model tuple $(B_1...B_n) \in \mathbf{B}_1 \times ... \times \mathbf{B}_n$, a corr $(R, R_1...R_n)$, and an
update $v_i: B_i \to B_i'$ in $\mathbf{B}_i$, we define, first for $j \neq i$, the propagated updates by
composing the executions of $b_i$, $k$, and $b_j$. The amendments to $u_i = v_i.(b_i.\mathrm{ppg}^{R_i}_{BA})$
produced by $k$, and to $u_j = u_i.(k.\mathrm{ppg}^{R}_{ij})$ produced by $b_j$, are identities due to
the Junction conditions. This allows us to set corrs properly and finish propagation
with the three steps above:
$v_i.\mathrm{ppg}^{(R,R_1...R_n)}_i \stackrel{\mathrm{def}}{=} (R', R_1'...R_n')$, where $R' = u_i.(k.\mathrm{ppg}^{R}_i)$, $R_j' = u_j.(b_j.\mathrm{ppg}^{R_j}_A)$
for $j \neq i$, and $R_i' = v_i.(b_i.\mathrm{ppg}^{R_i}_B)$. We thus have a lens denoted by $k(b_1, \ldots, b_n)$.
Proof. Laws Stability and Reflect1 for the composed lens are straightforward.
Reflect2-3 also follow immediately, since the first step of the above propagation
procedure already enjoys idempotency by Reflect2-3 for bi .
Proof. An n-ary span of a-lenses bi (all of them interpreted as symmetric lenses
bi as explained above) is a construct equivalent to the star-composition of Def-
inition 4.1.3, in which lens k = (nB) (cf. Example 1) and peripheral lenses are
lenses bi . The junction condition is satisfied as all base updates are bi -closed for
all i by Lemma 1, and also trivially closed for any identity lens. The theorem
thus follows from Theorem 1. Note that a corr in $(\Sigma_{i=1}^{n}\, b_i)$ is nothing but a
single model $B \in \mathbf{B}^\bullet$ with boundaries being the respective $\mathrm{get}_i$-images.
The theorem shows that combining a-lenses in this way yields an n-ary sym-
metric lens, whose properties can automatically be inferred from the binary
a-lenses.
Running example. Figure 6 shows a metamodel M + obtained by merging the
three metamodels M1,2,3 from Fig. 1 without loss and duplication of information.
In addition, for persons and companies, the identifiers of model spaces, in which
a given person or company occurs, can be traced back via attribute “spaces”
(Commute-objects are known to appear in space A3 and hence do not need such
an attribute). As shown in [10], any consistent multimodel (A1 ...An , R) can be
merged into a comprehensive model A+ instantiating M + . Let B be the space
of such models together with their comprehensive updates $u^+: A^+ \to A'^+$.
Fig. 6. Merged metamodel
For a given $i \leq 3$, we can define the following a-lens $b_i = (\mathbf{A}_i, \mathbf{B}, \mathrm{get}_i, \mathrm{put}_i)$:
$\mathrm{get}_i$ takes an update $u^+$ as above and outputs its restriction to the model
containing only objects recorded in space $\mathbf{A}_i$. Operation $\mathrm{put}_i$ takes an update
$v_i: A_i \to A_i'$ and first propagates it in all directions as discussed in Sect. 2, then
merges these propagated local updates into a comprehensive $\mathbf{B}$-update between
comprehensive models. This yields a span of a-lenses that implements the same
synchronization behaviour as the symmetric lens discussed in Sect. 2.
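A minimal sketch of the get_i direction only: it restricts a merged model to the objects recorded in one space via the "spaces" attribute mentioned above. The data layout is our own simplification, and put_i (propagate locally, then re-merge) is omitted.

```python
# get_i of the a-lens b_i: project the merged model onto space A_i using the
# "spaces" attribute; Commute objects belong to A_3 by convention.
def get_i(merged_model: list, space_id: int) -> list:
    return [obj for obj in merged_model
            if space_id in obj.get("spaces", {3})]

merged = [
    {"type": "Person", "name": "Mary", "spaces": {1, 2}},
    {"type": "Company", "name": "IBM", "spaces": {2}},
    {"type": "Commute", "from": "a1", "to": "a15"},   # lives only in A_3
]
print(get_i(merged, 2))   # Mary and IBM
print(get_i(merged, 3))   # the commute
```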
From lenses to spans. There is also a backward transformation of (symmetric)
lenses to spans of a-lenses. Let $\ell = (\mathbf{A}, \mathrm{ppg})$ be a wb lens. It gives rise to the
following span of wb a-lenses $\ell_i = (\partial_i(\mathbf{A}), \mathbf{B}, \mathrm{get}_i, \mathrm{put}_i)$, where space $\mathbf{B}$ is built
from consistent multimodels and their updates, and functors $\mathrm{get}_i: \mathbf{B} \to \mathbf{A}_i$ are
projection functors. Given $B = (A_1...A_n, R)$ and an update $u_i: A_i \to A_i'$, let
$\mathrm{put}^{B}_{ib}(u_i) \stackrel{\mathrm{def}}{=} (u_1, .., u_{i-1}, (u_i; u_i^{@}), u_{i+1}, .., u_n)\colon (A_1...A_n, R) \to (A_1'...A_n', R')$
where $u_j = \mathrm{ppg}^{R}_{ij}(u_i)$ (for all $j$) and $R' \stackrel{\mathrm{def}}{=} \mathrm{ppg}^{R}_{i}(u_i)$. Finally,
$\mathrm{put}^{B}_{iv}(v_i) \stackrel{\mathrm{def}}{=} \mathrm{ppg}^{R}_{ii}(u_i)$. Validity of Stability, Reflect0-2, PutGet directly follows from the above
definitions.
An open question is whether the span-to-lens transformation in Theorem 2
and the lens-to-span transformation described above are mutually inverse. The
results for the binary case in [8] show that this is only the case modulo cer-
tain equivalence relations. These equivalences may be different for our reflective
multiary lenses, and we leave this important question for future research.
5 Related Work
For state-based lenses, the work closest in spirit is Stevens’ paper [16]. Her and
our goals are similar, but the technical realisations are different even besides
the state- vs. delta-based opposition. Stevens works with restorers, which take
References
1. Diskin, Z., König, H., Lawford, M.: Multiple model synchronization with multiary
delta lenses. Technical report. McMaster Centre for Software Certification,
McSCert-2017-10-01, McMaster University (2017). http://www.mcscert.ca/
projects/mcscert/wp-content/uploads/2017/10/Multiple-Model-Synchronization-
with-Multiary-Delta-Lenses-ZD.pdf
2. Diskin, Z., Xiong, Y., Czarnecki, K.: From state- to delta-based bidirectional model
transformations: the asymmetric case. J. Object Technol. 10(6), 1–25 (2011)
3. Diskin, Z., Xiong, Y., Czarnecki, K., Ehrig, H., Hermann, F., Orejas, F.: From
state- to delta-based bidirectional model transformations: the symmetric case. In:
Whittle, J., Clark, T., Kühne, T. (eds.) MODELS 2011. LNCS, vol. 6981, pp. 304–
318. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24485-8_22
4. Foster, J.N., Greenwald, M.B., Moore, J.T., Pierce, B.C., Schmitt, A.: Combi-
nators for bi-directional tree transformations: a linguistic approach to the view
update problem. In: Palsberg, J., Abadi, M. (eds.) Proceedings of the 32nd ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL
2005, 12–14 January 2005, Long Beach, California, USA, pp. 233–246. ACM (2005).
https://doi.org/10.1145/1040305.1040325
5. Hermann, F., Ehrig, H., Orejas, F., Czarnecki, K., Diskin, Z., Xiong, Y.: Cor-
rectness of model synchronization based on triple graph grammars. In: Whittle,
J., Clark, T., Kühne, T. (eds.) MODELS 2011. LNCS, vol. 6981, pp. 668–682.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24485-8_49
6. Hofmann, M., Pierce, B.C., Wagner, D.: Symmetric lenses. In: Ball, T., Sagiv, M.
(eds.) Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages, POPL 2011, 26–28 January 2011, Austin, TX, USA,
pp. 371–384. ACM (2011). https://doi.org/10.1145/1926385.1926428
www.dbooks.org
Multiple Model Synchronization with Multiary Delta Lenses 37
7. Hofmann, M., Pierce, B.C., Wagner, D.: Edit lenses. In: Field, J., Hicks, M. (eds.)
Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Pro-
gramming Languages, POPL 2012, 22–28 January 2012, Philadelphia, Pennsylvania,
USA, pp. 495–508. ACM (2012). https://doi.org/10.1145/2103656.2103715
8. Johnson, M., Rosebrugh, R.D.: Symmetric delta lenses and spans of asymmetric
delta lenses. J. Object Technol. 16(1), 2:1–2:32 (2017). https://doi.org/10.5381/
jot.2017.16.1.a2
9. Johnson, M., Rosebrugh, R.D., Wood, R.J.: Lenses, fibrations and universal trans-
lations. Math. Struct. Comput. Sci. 22(1), 25–42 (2012). https://doi.org/10.1017/
S0960129511000442
10. König, H., Diskin, Z.: Efficient consistency checking of interrelated models. In:
Anjorin, A., Espinoza, H. (eds.) ECMFA 2017. LNCS, vol. 10376, pp. 161–178.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61482-3_10
11. Königs, A., Schürr, A.: MDI: a rule-based multi-document and tool integration app-
roach. Softw. Syst. Model. 5(4), 349–368 (2006). https://doi.org/10.1007/s10270-
006-0016-x
12. Macedo, N., Cunha, A., Pacheco, H.: Towards a framework for multidirectional
model transformations. In: Proceedings of the Workshops of the EDBT/ICDT
2014 Joint Conference (EDBT/ICDT 2014), 28 March 2014, Athens, Greece, pp.
71–74 (2014). http://ceur-ws.org/Vol-1133/paper-11.pdf
13. Mu, S.-C., Hu, Z., Takeichi, M.: An algebraic approach to bi-directional updating.
In: Chin, W.-N. (ed.) APLAS 2004. LNCS, vol. 3302, pp. 2–20. Springer, Heidelberg
(2004). https://doi.org/10.1007/978-3-540-30477-7_2
14. Orejas, F., Boronat, A., Ehrig, H., Hermann, F., Schölzel, H.: On propagation-
based concurrent model synchronization. ECEASST 57, 1–19 (2013). http://
journal.ub.tu-berlin.de/eceasst/article/view/871
15. Stevens, P.: Bidirectional model transformations in QVT: semantic issues and open
questions. Softw. Syst. Model. 9(1), 7–20 (2010)
16. Stevens, P.: Bidirectional transformations in the large. In: 20th ACM/IEEE Inter-
national Conference on Model Driven Engineering Languages and Systems, MOD-
ELS 2017, 17–22 September 2017, Austin, TX, USA, pp. 1–11 (2017). https://doi.
org/10.1109/MODELS.2017.8
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Controlling the Attack Surface
of Object-Oriented Refactorings
1 Introduction
The essential activity in designing object-oriented programs is to identify class
candidates and to assign responsibility (i.e., data and operations) to them. An
appropriate solution to this Class-Responsibility-Assignment (CRA) problem, on
the one hand, intuitively reflects the problem domain and, on the other hand,
exhibits acceptable quality measures [4]. In this context, refactoring has become
a key technique for agile software development: productive program-evolution
phases are interleaved with behavior-preserving code transformations for updat-
ing CRA decisions, to proactively maintain, or even improve, code-quality met-
rics [13,29]. Each refactoring pursues a trade-off between two major, and gen-
erally contradicting, objectives: (1) maximizing code-quality metrics, including
fine-grained coupling/cohesion measures as well as coarse-grained anti-pattern
avoidance, and (2) minimizing the number of changes to preserve the initial pro-
gram design as much as possible [8]. Manually searching for refactorings that sufficiently
meet both objectives becomes impracticable even for medium-size programs,
as it requires finding optimal sequences of interdependent code transformations
with complex constraints [10]. The very large search space and multiple
competing objectives make the underlying optimization problem well-suited for
search-based optimization [15] for which various semi-automated approaches for
recommending refactorings have been recently proposed [18,27,28,30,34].
The validity of proposed refactorings is mostly concerned with purely func-
tional behavior preservation [24], whereas their impact on extra-functional prop-
erties like program security has received little attention so far [22]. However,
applying elaborated information-flow metrics for identifying security-preserving
refactorings is computationally too expensive in practice [36]. As an alterna-
tive, we consider attack-surface metrics as a sufficiently reliable, yet easy-to-
compute indicator for preservation of program security [20,41]. The attack surface
of a program comprises all conventional ways for users/attackers to enter the software
(e.g., invoking API methods or inheriting from super-classes), so that
an unnecessarily large surface increases the danger of exploitable vulnerabilities.
Hence, the goal of a secure program design should be to grant least privileges to
class members to reduce the extent to which data and operations are exposed
to the world [41]. In Java-like languages, accessibility constraints by means of
modifiers public, private and protected provide a built-in low-level mecha-
nism for controlling and restricting information flow within and across classes,
sub-classes and packages [38]. Accessibility constraints introduce compile-time
security barriers protecting trusted system code from untrusted mobile code [19].
As a downside, restricted accessibility privileges naturally obstruct possibilities
for refactorings, as CRA updates (e.g., moving members [34]) may either be
rejected by those constraints, or may require relaxing accessibility privileges,
thus increasing the attack surface [35].
In this paper, we present a search-based technique to find optimal sequences
of refactorings for object-oriented Java-like programs, by explicitly taking acces-
sibility constraints into account. To this end, we do not propose novel refac-
toring operations, but rather apply established ones and control their impact
on attack-surface metrics. We focus on MoveMethod refactorings which have
been proven effective for improving CRA metrics [34], in combination with
operations for on-demand strengthening and relaxing of accessibility declara-
tions [38]. As objectives, we consider (O1) elimination of design flaws, partic-
ularly, (O1a) optimization of object-oriented coupling/cohesion metrics [5,6]
and (O1b) avoidance of anti-patterns, namely The Blob, (O2) preservation
of original program design (i.e., minimizing the number of change operations),
and (O3) attack-surface minimization. Our model-based tool implementation,
called GOBLIN, represents individuals (i.e., intermediate refactoring results) as
program-model instances complying with an EMF meta-model for Java-like pro-
grams [33]. Hence, instead of regenerating source code after every single refactor-
ing step, we apply and evaluate sequences of refactoring operations, specified as
model-transformation rules in Henshin [2], on the program model. To this end,
1 https://github.com/Echtzeitsysteme/goblin.
class- and even package-boundaries. The Blob and other design flaws are widely
considered harmful with respect to software quality in general and program main-
tainability in particular [7]. For instance, assume a developer extends MailApp
by (1) adding further classes SecureMailApp and RsaAdapter for encrypting and
signing messages, and by (2) extending class Contact with public RSA key han-
dling: method findKey() searches for public RSA keys of contacts by repeatedly
calling method findKeyFromServer() with the URL of available key servers. This
program evolution further degrades the already flawed design of MailApp, as class
SecureMailApp may be considered as a second instance of The Blob anti-pattern:
method encryptMessage() of class SecureMailApp intensively calls method find-
Key() in class Contact. This example illustrates a well-known dilemma of agile
program development in an object-oriented world: Class-Responsibility Assign-
ment decisions may become unbalanced over time, due to unforeseen changes
crosscutting the initial program design [31]. As a result, a majority of object-
oriented design flaws like The Blob anti-pattern is mainly caused by low cohe-
sion/high coupling ratios within/among classes and their members [5,6].
Refactoring of Object-Oriented Programs. Object-oriented refactorings
constitute an emerging and widely used counter-measure against design
flaws [13]. Refactorings are systematic, semantics-preserving program trans-
formations for continuously improving code-quality measures of evolving source
code. For instance, the MoveMethod refactoring is frequently used to update
CRA decisions after program changes, by moving method implementations
between classes [34]. Applied to our example, a developer may (manually) con-
duct two refactorings, R1 and R2, to counteract the aforementioned design
flaws:
(R1) move method plainToHtml() from class MailApp to class Message, and
(R2) move method encryptMessage() from class SecureMailApp to class Contact.
However, for programs of realistic size and complexity, tool support
for (semi-)automated program refactoring becomes indispensable.
The major challenges in finding effective sequences of object-oriented refactoring
operations consist in detecting flawed program parts to be refactored, as well as
in recommending program transformations to be applied to those parts to obtain an
improved, yet behaviorally equivalent, program design. The complicated nature
of the underlying optimization problem stems from several phenomena.
– Very large search-space due to the combinatorial explosion resulting
from the many possible sequences of (potentially interdependent) refactoring-
operation applications.
– Multiple objectives including various (inherently contradicting) refactoring
goals (e.g., O1−O3).
– Many invalid solutions due to (generally very complicated) constraints to
be imposed for ensuring behavior preservation.
Further research especially on the last phenomenon is required to understand
to what extent a refactoring actually alters (in a potentially critical way) the
side. The rule takes a source class srcClass, a target class trgClass and a method
signature methodSig as parameters, deletes the containment arrow between the source
class and the signature (red arrow annotated with --) and creates a new containment
arrow from the target class (green arrow annotated with ++), but only if such
an arrow does not already exist before rule application. The latter (pre-)condition is
expressed by a forbidden (crossed-out) arrow. For a comprehensive list of all necessary
pre-conditions (or, pre-constraints), we refer to [38].
Accessibility Post-constraints. Besides pre-constraints, refactoring operations
must satisfy further post-constraints, evaluated after rule application, to yield
correct results, especially concerning accessibility constraints
as declared in the original program (i.e., member accesses like method calls in
the original program must be preserved after refactoring [24]). As an example,
a (simplified) post-constraint for the MoveMethod rule is shown on the right
of Fig. 3 using OCL-like notation. Members refers to the collection of all class
members in the program. The post-constraint utilizes helper-function reqAcc(m)
to compute the required access modifier of class member m and checks whether
the declared accessibility of m is at least as generous as required (based on the
canonical ordering private < default < protected < public) [38].
For instance, if refactoring R2 is applied to MailApp, method encryptMes-
sage() violates this post-constraint, as the call from sendMessage() from another
package requires accessibility public, whereas the declared accessibility is
protected. Instead of immediately rejecting refactorings like R2, we introduce
an accessibility-repair operation of the form m.accessibility := reqAcc(m) for each
member violating the post-constraint, which therefore enlarges the attack surface.
However, this repair is not always possible, as relaxations may
lead to incorrect refactorings altering the original program semantics (e.g., due
to method overriding/overloading [38]). In contrast, refactoring R1 (i.e., mov-
ing plainToHtml() to class Message) satisfies the post-constraint as the required
accessibility of plainToHtml() becomes private, whereas its declared accessibil-
ity is public. In those cases, we may also apply the operation m.accessibility :=
reqAcc(m), now leading to a reduction of the attack surface. Different strategies
for attack-surface reduction will be investigated in Sect. 4.
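To make the post-constraint check and the repair operation more concrete, the following plain-Java sketch (our own illustration, not code from GOBLIN or [38]) encodes the canonical accessibility ordering and the rule m.accessibility := reqAcc(m); the required accessibilities are assumed to be precomputed from the accesses in the original program.

```java
import java.util.Map;

// Accessibility levels in the canonical order private < default < protected < public;
// ordinal() reflects this order.
enum Accessibility { PRIVATE, DEFAULT, PROTECTED, PUBLIC }

class AccessibilityRepair {

    // Simplified post-constraint for one member: the declared accessibility must be
    // at least as generous as the required one (reqAcc(m)).
    static boolean satisfiesPostConstraint(Accessibility declared, Accessibility required) {
        return declared.ordinal() >= required.ordinal();
    }

    // Repair operation m.accessibility := reqAcc(m). If the required level is higher
    // than the declared one, the modifier is relaxed (enlarging the attack surface);
    // if it is lower, the modifier is strengthened (shrinking the attack surface).
    static void repair(Map<String, Accessibility> declared, Map<String, Accessibility> required) {
        declared.replaceAll((member, acc) -> required.getOrDefault(member, acc));
    }
}
```

Whether the strengthening case is applied eagerly or only at certain points corresponds to the reduction strategies investigated in Sect. 4.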
this paper [9]. Consequently, good CRA decisions exhibit low values for both
COU and LCOM5. Hence, refactorings R1 and R2 both improve values of
COU (i.e., by eliminating inter-class call -arrows) and LCOM5 (i.e., by moving
methods into classes where they are called).
Anti-patterns. Concerning (O1b), we limit our considerations to occurrences
of The Blob anti-pattern for convenience. We employ the detection-approach of
Peldszus et al. [33] and consider as objective to minimize the number of The Blob
instances (denoted #BLOB). For instance, for the original MailApp program
(white parts in Fig. 1), we have #BLOB = 1, while for the extended version
(white and gray parts), we have #BLOB = 2. Refactoring R1 may help to
remove the first occurrence and R2 potentially removes the second one.
Changes. Concerning (O2), real-life studies show that, to be accepted by users,
refactoring recommendations must avoid too large a deviation from the original
design [8]. Here, we consider the number of MoveMethod refactorings (denoted
#REF) to be performed in a recommendation, as a further objective to be
minimized. For example, solely applying R1 results in #REF = 1, whereas a
sequence of R1 followed by R2 most likely imposes more design changes (i.e.,
#REF = 2). In contrast, accessibility-repair operations do not affect the value
#REF, but rather impact objective (O3).
Attack Surface. Concerning (O3), the guidelines for secure object-oriented
programming encourage developers to grant as few access privileges as possible
to any accessible program element to minimize the attack surface [19]. In our
program model, the attack-surface metric (denoted AS) is measured as
$$AS = \sum_{m \in Members} \omega(m.accessibility) \qquad (1)$$
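A minimal sketch of how this metric can be computed, reusing the Accessibility enum from the earlier sketch; the concrete weights below are invented placeholders, since the actual weighting function ω is defined by the paper's metric.

```java
import java.util.List;
import java.util.Map;

class AttackSurfaceMetric {

    // Hypothetical weights for ω(accessibility); illustration only.
    static final Map<Accessibility, Double> OMEGA = Map.of(
            Accessibility.PRIVATE, 0.0,
            Accessibility.DEFAULT, 0.25,
            Accessibility.PROTECTED, 0.5,
            Accessibility.PUBLIC, 1.0);

    // AS = sum over all members m of ω(m.accessibility), cf. Eq. (1).
    static double attackSurface(List<Accessibility> memberAccessibilities) {
        return memberAccessibilities.stream().mapToDouble(OMEGA::get).sum();
    }
}
```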
4 Experimental Evaluation
We now present experimental evaluation results gained from applying GOB-
LIN to a collection of Java programs. First, to investigate the impact of attack-
surface reduction on the resulting refactoring recommendations, we consider the
following reduction strategies, differing in when to perform attack-surface reduc-
tion during search-space exploration (where step means a refactoring step):
– Strategy 1: A priori reduction. Before the first and after the last step.
– Strategy 2: A posteriori reduction. Only after the last step.
– Strategy 3: Continuous reduction. After every refactoring step.
We are interested in the impact of each strategy on the trade-off between attack-
surface metrics and design-quality metrics (i.e., do the recommended refactoring
sequences tend to optimize the attack-surface aspect or the program design
more?). We quantify attack-surface impact (ASI) and design impact (DI) of a
refactoring recommendation rr as follows:
$$ASI(rr) = \frac{AS(rr) - AS(orig)}{AS(orig)} \qquad (2)$$
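As a quick illustration with invented numbers: if AS(orig) = 250 and AS(rr) = 248, then ASI(rr) = (248 − 250)/250 = −0.008; negative ASI values therefore indicate that a recommendation shrinks the attack surface, while positive values indicate that it grows it.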
[Figure: plots of ASI and DI values per subject program and reduction strategy (#1S1–#8S3, aggregated as #2–#8), on a scale from −0.008 to 0.012, with a marker for minimal impact; panel titles include (e) ASI for Large Programs and (f) DI for Large Programs.]
4.2 Discussion
5 Related Work
Automating Design-Flaw Detection and Refactorings. Marinescu proposes
a metric-based design-flaw detection approach similar to that of Peldszus et al.
[33], which is used in our work. However, neither work deals with the elimination
of detected flaws [21]. In contrast, the DECOR framework also includes
recommendations for eliminating anti-patterns which, unlike in our approach,
remain rather atomic and local. More related to our
approach, Fokaefs et al. [12] and Tsantalis et al. [40] consider (semi-)automatic
refactorings to eliminate anti-patterns like The Blob in the tool JDeodorant.
Nevertheless, they focus on optimizing a single objective and do not consider
multiple, especially extra-functional, aspects like security metrics as in our approach.
Multi-objective Search-Based Refactorings. O’Keeffe and Ó Cinnéide use
search-based refactorings in their tool CODe-Imp [28] including various stan-
dard refactoring operations and different quality metrics as objectives [27]. Seng
et al. consider a search-based setting, where, similar to our approach, compound
refactoring recommendations comprise atomic MoveMethod operations. Harman
and Tratt also investigate a Pareto-front of refactoring recommendations includ-
ing various design objectives [16], and more recently, Ouni et al. conducted a
large-scale real-world study on multi-objective search-based refactoring recom-
mendations [30]. However, none of these approaches investigates the impact of
refactorings on security-relevant metrics as in our approach.
Security-Aware Refactorings. Steimann and Thies were the first to pro-
pose a comprehensive set of accessibility constraints for refactorings covering
full Java [38]. Although their constraints are formally founded, they do not
consider software metrics to quantify the attack surface impact of (sequences
of) refactorings. Alshammari et al. propose an extensive catalogue of software
metrics for evaluating the impact of refactorings on program security of object-
oriented programs [1]. Similarly, Maruyama and Omori propose a technique [22]
and tool [23] for checking if a refactoring operation raises security issues. How-
ever, all these approaches are concerned with security and accessibility con-
straints of specific refactorings, but they do not investigate those aspects in a
multi-objective program optimization setting. The problem of measuring attack
surfaces serving as a metric for evaluating secure object-oriented programming
policies has been investigated by Zoller and Schmolitzky [41] and Manadhata
and Wing [20], respectively. Nevertheless, those and similar metrics have not
yet been utilized as an optimization objective for program refactoring. Finally,
Ghaith and Ó Cinnéide consider a catalogue of security-relevant metrics to rec-
ommend refactorings using CODe-Imp, but they also consider security as single
objective [14].
6 Conclusion
We presented a search-based approach to recommend sequences of refactor-
ings for object-oriented Java-like programs by taking the attack surface as
additional optimization objective into account. Our model-based methodology,
implemented in the tool GOBLIN, utilizes the MOMoT framework including
the genetic algorithm NSGA-III for search-space exploration. Our experimental
results, gained from applying GOBLIN to real-world Java programs, provide us
with detailed insights into the impact of attack-surface metrics on fitness values
of refactorings and the resulting trade-off with competing design-quality objec-
tives. As future work, we plan to incorporate additional domain knowledge
about critical code parts to further control security-aware refactorings.
Acknowledgements. This work was partially funded by the Hessian LOEWE ini-
tiative within the Software-Factory 4.0 project as well as by the German Research
Foundation (DFG) in the Priority Programme SPP 1593: Design For Future - Man-
aged Software Evolution (LO 2198/2-1, JU 2734/2-1).
References
1. Alshammari, B., Fidge, C., Corney, D.: Assessing the impact of refactoring on
security-critical object-oriented designs. In: Proceedings of APSEC, pp. 186–195
(2010)
2. Arendt, T., Biermann, E., Jurack, S., Krause, C., Taentzer, G.: Henshin: advanced
concepts and tools for in-place EMF model transformations. In: Petriu, D.C.,
Rouquette, N., Haugen, Ø. (eds.) MODELS 2010. LNCS, vol. 6394, pp. 121–135.
Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16145-2 9
3. Bendis, B.M.: Secret Invasion, vol. 1-8. Marvel, New York (2009)
4. Bowman, M., Briand, L.C., Labiche, Y.: Solving the class responsibility assignment
problem in object-oriented analysis with multi-objective genetic algorithms. IEEE
Trans. Softw. Eng. 36(6), 817–837 (2010)
5. Briand, L.C., Daly, J.W., Wust, J.K.: A unified framework for coupling measure-
ment in object-oriented systems. IEEE Trans. Softw. Eng. 25(1), 91–121 (1999)
6. Briand, L.C., Daly, J.W., Wüst, J.: A unified framework for cohesion measurement
in object-oriented systems. Empir. Softw. Eng. 3(1), 65–117 (1998)
7. Brown, W.J., Malveau, R.C., McCormick III, H.W., Mowbray, T.J.: AntiPat-
terns: Refactoring Software, Architectures, and Projects in Crisis. Wiley, New York
(1998)
8. Candela, I., Bavota, G., Russo, B., Oliveto, R.: Using cohesion and coupling for
software remodularization: is it enough? ACM Trans. Softw. Eng. Methodol. 25(3),
24:1–24:28 (2016)
9. Chidamber, S., Kemerer, C.: A metrics suite for object oriented design. IEEE
Trans. Softw. Eng. 20(6), 476–493 (1994)
10. Van Eetvelde, N., Janssens, D.: Extending graph rewriting for refactoring. In:
Ehrig, H., Engels, G., Parisi-Presicce, F., Rozenberg, G. (eds.) ICGT 2004. LNCS,
vol. 3256, pp. 399–415. Springer, Heidelberg (2004). https://doi.org/10.1007/978-
3-540-30203-2 28
11. Fleck, M., Troya, J., Wimmer, M.: Search-based model transformations with
MOMoT. In: Van Gorp, P., Engels, G. (eds.) ICMT 2016. LNCS, vol. 9765, pp.
79–87. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-42064-6 6
12. Fokaefs, M., Tsantalis, N., Stroulia, E., Chatzigeorgiou, A.: JDeodorant: identifi-
cation and application of extract class refactorings. In: Proceedings of ICSE, pp.
1037–1039 (2011)
13. Fowler, R.: Refactoring: Improving the Design of Existing Code. Addison-Wesley,
Reading (2000)
14. Ghaith, S., Ó Cinnéide, M.: Improving software security using search-based refac-
toring. In: Fraser, G., Teixeira de Souza, J. (eds.) SSBSE 2012. LNCS, vol. 7515, pp.
121–135. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33119-
0 10
15. Harman, M., Mansouri, S.A., Zhang, Y.: Search based software engineering: a
comprehensive analysis and review of trends techniques and applications (2009)
16. Harman, M., Tratt, L.: Pareto optimal search based refactoring at the design level.
In: Proceedings of GECCO, pp. 1106–1113. ACM (2007)
17. Henderson-Sellers, B.: Object-Oriented Metrics: Measures of Complexity. Prentice-
Hall Inc., Upper Saddle River (1996)
18. Kessentini, M., Sahraoui, H., Boukadoum, M., Wimmer, M.: Search-based design
defects detection by example. In: Giannakopoulou, D., Orejas, F. (eds.) FASE
2011. LNCS, vol. 6603, pp. 401–415. Springer, Heidelberg (2011). https://doi.org/
10.1007/978-3-642-19811-3 28
19. Long, F., Mohindra, D., Seacord, R.C., Sutherland, D.F., Svoboda, D.: The CERT
Oracle Secure Coding Standard for Java. Addison-Wesley Professional, Boston
(2011)
20. Manadhata, P.K., Wing, J.M.: An attack surface metric. IEEE Trans. Softw. Eng.
37(3), 371–386 (2011)
21. Marinescu, R.: Detection strategies: metrics-based rules for detecting design flaws,
pp. 350–359. IEEE (2004)
22. Maruyama, K., Omori, T.: Security-aware refactoring alerting its impact on code
vulnerabilities. In: APSEC, pp. 445–451. IEEE (2008)
23. Maruyama, K., Omori, T.: A security-aware refactoring tool for Java programs.
In: Proceedings of WRT, pp. 22–28. ACM (2011)
24. Mens, T., Demeyer, S., Janssens, D.: Formalising behaviour preserving program
transformations. In: Corradini, A., Ehrig, H., Kreowski, H.-J., Rozenberg, G. (eds.)
ICGT 2002. LNCS, vol. 2505, pp. 286–301. Springer, Heidelberg (2002). https://
doi.org/10.1007/3-540-45832-8 22
25. Mens, T., Taentzer, G., Runge, O.: Analysing refactoring dependencies using graph
transformation. SOSYM 6(3), 269–285 (2007)
26. Mens, T., Van Eetvelde, N., Demeyer, S., Janssens, D.: Formalizing refactorings
with graph transformations. J. Softw. Evol. Process 17(4), 247–276 (2005)
27. Moghadam, I.H., Ó Cinnéide, M.: Code-Imp: a tool for automated search-based
refactoring. In: Proceedings of WRT, pp. 41–44. ACM (2011)
28. O’Keeffe, M., Ó Cinnéide, M.: Search-based refactoring: an empirical study. J.
Softw. Maint. Evol. Res. Pract. 20(5), 345–364 (2008)
29. Opdyke, W.: Refactoring Object-Oriented Frameworks. Ph.D. thesis, University of
Illinois (1992)
30. Ouni, A., Kessentini, M., Sahraoui, H.A., Inoue, K., Deb, K.: Multi-criteria code
refactoring using search-based software engineering: an industrial case study. ACM
Trans. Softw. Eng. Methodol. 25(3), 23:1–23:53 (2016)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Effective Analysis of Attack Trees:
A Model-Driven Approach
Abstract. Attack trees (ATs) are a popular formalism for security anal-
ysis, and numerous variations and tools have been developed around
them. These were mostly developed independently, and offer little inter-
operability or ability to combine various AT features.
We present ATTop, a software bridging tool that enables automated
analysis of ATs using a model-driven engineering approach. ATTop ful-
fills two purposes: 1. It facilitates interoperation between several AT
analysis methodologies and resulting tools (e.g., ATE, ATCalc, ADTool
2.0), 2. it can perform a comprehensive analysis of attack trees by trans-
lating them into timed automata and analyzing them using the popular
model checker Uppaal, and translating the analysis results back to the
original ATs. Technically, our approach uses various metamodels to pro-
vide a unified description of AT variants. Based on these metamodels,
we perform model transformations that make it possible to apply various analysis
methods to an AT and trace the results back to the AT domain. We illus-
trate our approach on the basis of a case study from the AT literature.
1 Introduction
Formal methods are often employed to support software engineers in particularly
complex tasks: model-based testing, type checking and extended static checking
are typical examples that help in developing better software faster. This paper is
about the reverse direction: showing how software engineering can assist formal
methods in developing complex analysis tools.
More specifically, we reap the benefits of model-driven engineering (MDE)
to design and build a tool for analyzing attack trees (ATs). ATs [25,31] are
a popular formalism for security analysis, allowing convenient modeling and
analysis of complex attack scenarios. ATs have become part of various system
engineering frameworks, such as UMLsec [16] and SysMLsec [27].
Attack trees come in a large number of variations, employing different secu-
rity attributes (e.g., attack time, costs, resources, etc.) as well as modeling con-
structs (e.g., sequential vs. parallel execution of scenarios). Each of these vari-
ations comes with its own tooling; examples include ADTool [12], ATCalc [2],
and Attack Tree Evaluator [5]. This “jungle of attack trees” seriously hampers
the applicability of ATs, since it is impossible or very difficult to combine dif-
ferent features and tooling. This paper addresses these challenges and presents
ATTop1, a software tool that overarches existing tooling in the AT domain.
In particular, the main features of ATTop are (see Fig. 1):
1. A unified input format that encompasses the known AT features. We have
collected these features in one comprehensive metamodel. Following MDE
best practices, this metamodel is extensible to easily accommodate future
needs.
2. Systematic model transformations. Many AT analysis methods are based on
converting the AT into a mathematical model that can be analyzed with exist-
ing formal techniques, such as timed automata [11,23], Bayesian networks
[13], Petri nets [8], etc. An important contribution of our work is to make
these translations more systematic, and therefore more extensible, maintain-
able, reusable, and less error-prone.
To do so, we again refer to the concepts of MDE and deploy model transfor-
mations. We deploy two categories here: so-called horizontal transformations
achieve interoperability between existing tools. Vertical transformations inter-
pret a model via a set of semantic rules to produce a mathematical model to
be analyzed with formal methods.
3. Bringing the results back to the original domain. When a mathematical model
is analyzed, the analysis result is computed in terms of the mathematical
model, and not in terms of the original AT. For example, if AT analysis is
done via model checking, a trace in the underlying model (i.e., transition
system) can be produced to show that, say, the cheapest attack costs $100.
What security practitioners need, however, is a path or attack vector in the
original AT. This interpretation in terms of the original model is achieved by
a vertical model transformation in the inverse direction, from the results as
obtained in the analysis model back into the AT domain.
These features make ATTop a software bridging tool, acting as a bridge
between existing AT languages, and between ATs and formal languages.
Our Contributions. The contributions of this paper include:
– a full-fledged tool based on MDE, which allows for high maintainability and
extensibility;
– a unified input format, enabling interoperability between different AT
dialects;
– systematic use of model transformations, which increases reusability while
reducing error likelihood;
– a complete cycle from AT to formal model and back, allowing domain experts
to profit from formal methods without requiring specific knowledge.
Overview of Our Approach. Figure 1 depicts the general workflow of our
approach. It shows how ATTop acts as a bridge between different languages and
1 Available at https://github.com/utwente-fmt/attop.
[Fig. 1 diagram: ATE (binary AT format), ATCalc (its own format), and ADTool 2.0 (AT specified by adtree.xsd) are connected to ATTop via horizontal transformations; vertical transformations produce timed automata and an Uppaal query for the Uppaal tool, and translate the resulting trace back.]
Fig. 1. Overview of our approach, showing the contributions of the paper in the gray
rectangle. Here ATE, ATCalc, ADTool 2.0 are different attack tree analysis tools, each
with its own input format. ATTop allows these tools to be interoperable (horizontal
model transformations, see Sect. 4.1). ATTop also provides a much more comprehensive
AT analysis by automatic translation of attack trees into timed automata and using
Uppaal as the back-end analysis tool (vertical transformations, see Sect. 4.2).
2 Background
2.1 Attack Trees in the Security Domain
[Fig. 2 diagram: the tree combines the steps access home network, exploit software vulnerability in IoT device (cost = 10 US$, duration = 1 hour), run malicious script (cost = 100 US$, duration = 0.5 hour), get credentials (cost = 100 US$, duration = 10 hours), gain access to private networks, find LAN access port (cost = 10 US$, duration = 1 hour), spoof MAC address (cost = 50 US$, duration = 0.5 hour), find WLAN (cost = 10 US$, duration = 5 hours), and break WPA keys (cost = 100 US$, duration = 2 hours); the darker steps of the cheapest attack carry the start–end times 0–5, 0–2, 0–10, 10–11, and 11–11.5.]
Fig. 2. Attack tree modeling the compromise of an IoT device. Leaves are equipped
with the cost and time required to execute the corresponding step. The parts of the tree
attacked in the cheapest successful attack are indicated by a darker color, with start
and end times for the steps in this cheapest attack denoted in red (times correspond
to the scenario in Fig. 11). (Color figure online)
[Diagram: a transformation definition maps the elements of a source metamodel to the elements of a target metamodel; a transformation engine executes the definition, taking an input model that is an instance of the source metamodel and producing an output model that is an instance of the target metamodel.]
Structure Metamodel. The structure model, depicted in Fig. 4 on the left, repre-
sents the structure of the attack tree. Its main class AttackTree contains a set of
one or more Nodes, as indicated by the containment arrow between AttackTree
and Node. One of these nodes is designated as the root of the tree, denoted by
the root reference. Each Node is equipped with an id, used as a reference during
transformation processes. Furthermore, each node has a (possibly empty) list of
its parents and children, which allows to easily traverse the AT. A node may
have a connector, i.e., a gate such as AND, OR, SAND (sequential-AND), etc.
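The following plain-Java sketch mirrors this structure part of the metamodel for illustration (it is not the EMF-generated code, and the connector is simplified to an enum):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative plain-Java mirror of the ATMM structure part (not the EMF-generated code).
enum Connector { AND, OR, SAND /* sequential AND */ }

class Node {
    String id;                               // used as a reference during transformations
    List<Node> parents = new ArrayList<>();  // possibly empty
    List<Node> children = new ArrayList<>(); // possibly empty
    Connector connector;                     // optional gate; null for leaf nodes
}

class AttackTree {
    List<Node> nodes = new ArrayList<>();    // one or more nodes
    Node root;                               // designated root of the tree
}
```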
[Fig. 4 diagram: on the left, the Structure part with AttackTree (root, 1..* nodes), Node (id : String, parents 0..*, children 0..*, connector 0..1) and Connector subtypes such as AND and OR; on the right, the Values part with Attribute (name : String) attached to a Node, its Value (e.g., RealValue with value : Double), a value Type (e.g., RealType), a Domain grouping attributes, and a Purpose such as TimePurpose (timeType : TimeType with MINIMAL/MAXIMAL).]
Values Metamodel. The Values metamodel (Fig. 4, right side) describes how
values are attributed to nodes (arrow from Attribute on the right to Node on the
left). Each Attribute contains exactly one Value, which can be of various (basic
or complex) types: For example, RealValue is a type of Value that contains real
(Double) numbers. A Domain groups all those attributes that have the same
Purpose. By separating the purpose of attributes from their data type, we can
use basic data types (integer, boolean, real number) for different purposes: For
example, a real number (RealType) can be used in a Domain named “Maximum
Duration”, where the purpose is a TimePurpose with timeType = MAXIMAL.
A RealType number could also be used in a different Domain, say “Likelihood
of attack” with the purpose to represent a probability (ProbabilityPurpose, not
shown in the diagram). Thanks to the flexibility of this construct, the set of
available domains is easily extensible.
1 context ATMM!AttackTree {
2 constraint OneAndOnlyOneChildWithoutParents {
3 check : ATMM!Node.allInstances.select(n|n.parents.size() == 0).size() = 1
4 and self.root = ATMM!Node.allInstances.select(n|n.parents.size() == 0).first()
5 }
6 }
Listing 1. Constraint specifying that the root node is the only node in an ATMM AT
with no parents.
[Fig. 5 diagram: Query with subtypes ReachabilityQuery, ProbabilityQuery, ExpectedValueQuery (domain : Domain) and OptimalQuery (domain : Domain, goal : OptimizationGoal); a Query owns Constraints, each with a RelationalOperator (GREATER, SMALLER, EQUAL), a Domain and a Value; OptimizationGoal is MAXIMUM or MINIMUM.]
Fig. 5. The query metamodel. The types ‘Domain’ and ‘Value’ refer to the classes of
the ATMM metamodel (Fig. 4).
The main component of the query metamodel is the element named Query.
A query can be one of the following:
– Reachability, i.e., Is it feasible to reach the top node of an attack tree? Sup-
ported by every tool.
– Probability, i.e., What is the probability that a successful attack occurs? Sup-
ported by every tool.
– ExpectedValue, i.e., What is the expected (average) value of a given quantity
over all possible attacks? Supported by ATTop.
– Optimality, i.e., Which is the attack that is optimal w.r.t. a given attribute
(e.g., time or cost)? Supported by ATE, ADTool 2.0, ATTop.
Furthermore, a query can be framed by combining one of the above query types
with a set of Constraints over the AT attributes. A Constraint is made of a
RelationalOperator, a Value and its Domain. For example, the constraint “within
10 days” is expressed with the SMALLER RelationalOperator, a Value of 10, and
the Domain of “Maximum Duration”.
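As an illustration of how such a query could be assembled, the sketch below uses plain Java classes that merely mirror the query metamodel (they are not ATTop's EMF API); it frames the optimality query "cheapest attack within 10 days":

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java mirror of the query metamodel, for illustration only.
enum RelationalOperator { GREATER, SMALLER, EQUAL }
enum OptimizationGoal { MAXIMUM, MINIMUM }

class Constraint {
    RelationalOperator operator;
    String domain;   // e.g. "Maximum Duration"; a Domain object in the real metamodel
    double value;
    Constraint(RelationalOperator operator, String domain, double value) {
        this.operator = operator; this.domain = domain; this.value = value;
    }
}

class OptimalQuery {
    String domain;                   // attribute to optimize, e.g. "Cost"
    OptimizationGoal goal;
    List<Constraint> constraints = new ArrayList<>();
}

class QueryExample {
    public static void main(String[] args) {
        // "Which attack is cheapest, provided it takes less than 10 days?"
        OptimalQuery q = new OptimalQuery();
        q.domain = "Cost";
        q.goal = OptimizationGoal.MINIMUM;
        q.constraints.add(new Constraint(RelationalOperator.SMALLER, "Maximum Duration", 10));
    }
}
```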
[Fig. 6 diagram: Task (name : String) with executor and executable references, a startTime [1] and an optional endTime [0..1], both of type Time (value : Float), grouped as tasks.]
Fig. 6. The Scenario metamodel from [29]. In the context of ATs, all instances of this
metamodel will have only one Executor, the Attacker; Executables represent attack steps
(i.e. Nodes from the AT), while a Scenario is known as an attack vector.
4 Model Transformations
Example 2. ATE Transformation. The Attack Tree Evaluator [5] tool can only
process binary trees. Using a simple transformation, we can transform any
instance of the ATMM into a binary tree. A simplified version of this trans-
formation, written in ETL, is given in Listing 2. This transformation is based
on a recursive method that traverses the tree. For every node with more than
two children, it nests all but the first child under a new node until no more than
two children remain.
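Listing 2 itself (written in ETL) is not reproduced here; as a rough illustration of the same idea in Java, using the Node class sketched earlier, the binarization can be written as a recursive traversal (assuming the fresh intermediate node reuses the parent's connector):

```java
import java.util.ArrayList;
import java.util.List;

class Binarize {
    // Rewrites the tree so that no node has more than two children: for a node
    // with more than two children, all but the first child are nested under a
    // fresh intermediate node, and the rewriting continues recursively.
    static void binarize(Node node) {
        if (node.children.size() > 2) {
            Node fresh = new Node();
            fresh.id = node.id + "_bin";          // hypothetical naming scheme
            fresh.connector = node.connector;
            List<Node> rest = new ArrayList<>(node.children.subList(1, node.children.size()));
            for (Node child : rest) {
                node.children.remove(child);
                child.parents.remove(node);
                fresh.children.add(child);
                child.parents.add(fresh);
            }
            fresh.parents.add(node);
            node.children.add(fresh);
        }
        for (Node child : node.children) {
            binarize(child);
        }
    }
}
```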
Thus far we have described the transformations to and from dedicated tools for
attack trees. In this section we introduce a vertical transformation which we use
in ATTop to translate attack trees into the more general-purpose formalism of
timed automata (TA). Specifically, we provide model transformations to TAs
that can be analyzed by the Uppaal tool to obtain the wide range of qualitative
and quantitative properties supported by the query metamodel.
Our transformation targets the Uppaal metamodel described in [29]. It
transforms each element of the attack tree (i.e., each gate and basic attack step)
into a timed automaton. These automata communicate via signals and together
describe the behavior of the entire tree. For example, Fig. 8 shows the timed
automaton obtained by transforming an attack step with a deterministic time
to execute of 5 units.
Fig. 8. Example of a timed automaton modeling a basic attack step with a fixed time to execute of 5 units. [Diagram: locations Init, Active, and Completed; Init moves to Active on activate[id]?, and Active moves to Completed on complete[id]!, with the clock conditions x <= 5 and x >= 5 enforcing the 5-unit execution time.]
Depending on the features of the model and the desired property to be analyzed, the output of the transformation can be analyzed by different extensions of Uppaal. For example, Uppaal CORA supports the analysis of cost-optimal queries, such as “What is the lowest cost an attacker needs to incur in order to complete an attack”, while Uppaal-SMC supports statistical model checking, allowing the analysis of models with stochastic times and probabilistic attack steps with queries such as “What is the probability that an attacker successfully completes an attack within one hour”. The advantages of Uppaal CORA’s exact results come at the cost of state-space explosion, which limits the applicability of this approach for larger problems. On the other hand, the speed and scalability of the simulation-based Uppaal-SMC are countered by approximated results and the unavailability of (counter-)example traces.
support only a single query (e.g., ATE [5] only supports Pareto curves of cost
vs. probability), in which case no transformation is performed but ATTop only
allows that single query as input.
The Uppaal tool is an example of a tool supporting many different queries.
After transforming the AT to a timed automaton (cf. Sect. 4.2), we transform
the query into the textual formula supported by Uppaal. The basic form of
this formula is determined by the query type (e.g., a ReachabilityQuery will be
translated as “E<> toplevel.completed”, which asks for the existence of a trace
that reaches the top level event), while constraints add additional terms limiting
the permitted behavior of the model. By using an Uppaal-specific metamodel
for its query language linked to the TA metamodel, our transformation can easily
refer to the TA elements that correspond to converted AT elements.
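For illustration, the sketch below shows how such a translation step might look in Java; only the reachability template ("E<> toplevel.completed") is taken from the text above, and the handling of the other query types and of constraints is deliberately left abstract:

```java
// Illustrative only: maps a query object to Uppaal's textual query language.
class ReachabilityQuery { }

class QueryToUppaal {
    static String translate(Object query) {
        if (query instanceof ReachabilityQuery) {
            // Existence of a trace that reaches the top-level event.
            return "E<> toplevel.completed";
        }
        // Probability, expected-value, and optimality queries map to the query
        // forms of Uppaal-SMC and Uppaal CORA; constraints are added as further
        // terms limiting the permitted behavior of the model (omitted here).
        throw new UnsupportedOperationException("omitted in this sketch");
    }
}
```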
5 Tool Support
We have developed the tool ATTop to enable users to easily use the transfor-
mations described in this paper, without requiring knowledge of the underly-
ing techniques or formalisms. ATTop automatically selects which transforma-
tions to apply based on the available inputs and desired outputs. For exam-
ple, if the user provides an ADTool input and requests an Uppaal output,
6 Case Study
As a case study we use the example anno-
tated attack tree given in Fig. 2. We apply
ATTop to automatically compute several
qualitative and quantitative security met-
rics. Specifically, we apply a horizontal
transformation to convert the model from
the ATCalc format to that accepted by
ADTool 2.0, and a vertical transformation
to analyze the model using Uppaal.
We specify the AT in the Galileo format as accepted by ATCalc. Analysis with ATCalc yields a graph of the probability of a successful attack over time, as shown in Fig. 10. Next, we would like to determine the minimal cost of a successful attack, which ATCalc cannot provide. Therefore, we use ATTop to transform the AT to the ADTool 2.0 format, and use ADTool 2.0 to compute the minimal cost (yielding $270).
Fig. 10. ATCalc plot showing probability of successful attack over time.
Next, we perform a more comprehensive timing analysis using the vertical
transformation described in Sect. 4.2. We use ATTop to transform the AT to a
timed automaton that can be analyzed using the Uppaal tool. We also transform
a query (OptimalityQuery asking for minimal time) to the corresponding Uppaal
query. Combining these, we obtain a trace for the fastest successful attack, which
ATTop transforms into a scenario in terms of the AT as described in Sect. 4.3.
[Fig. 11 diagram: the steps find_WLAN, break_WPA_keys, get_credentials, exploit_sw_vulnerability, and run_malicious_script, laid out along their execution order.]
Fig. 11. Scenario of fastest attack as computed by Uppaal. The executed steps and their start–end times are also shown in Fig. 2.
The resulting scenario is shown in Fig. 11. Running the whole process, including
the transformations and the analysis with Uppaal, took 6.5 s on an Intel® Core™ i7 CPU 860 at 2.80 GHz running Ubuntu 16.04 LTS.
7 Conclusions
Acknowledgments. This research was partially funded by STW and ProRail under
the project ArRangeer (grant 12238), STW, TNO-ESI, Océ and PANalytical under
the project SUMBAT (13859), STW project SEQUOIA (15474), NWO projects BEAT
(612001303) and SamSam (628.005.015), and EU project SUCCESS (102112).
References
1. Andrade, E.C., Alves, M., Matos, R., Silva, B., Maciel, P.: OpenMADS: an open
source tool for modeling and analysis of distributed systems. In: Bitsch, F., Guio-
chet, J., Kaâniche, M. (eds.) SAFECOMP 2013. LNCS, vol. 8153, pp. 277–284.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40793-2 25
2. Arnold, F., Belinfante, A., Van der Berg, F., Guck, D., Stoelinga, M.: DFTCalc: a
tool for efficient fault tree analysis. In: Bitsch, F., Guiochet, J., Kaâniche, M. (eds.)
SAFECOMP 2013. LNCS, vol. 8153, pp. 293–301. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-40793-2 27
3. Arnold, F., Guck, D., Kumar, R., Stoelinga, M.: Sequential and parallel attack
tree modelling. In: Koornneef, F., van Gulijk, C. (eds.) SAFECOMP 2015. LNCS,
vol. 9338, pp. 291–299. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-
24249-1 25
4. Aslanyan, Z., Nielson, F., Parker, D.: Quantitative verification and synthesis of
attack-defence scenarios. In: Computer Security Foundations (CSF), pp. 105–119
(2016). https://doi.org/10.1109/CSF.2016.15
5. Aslanyan, Z.: Attack Tree Evaluator, developed for EU project TREsPASS, Tech-
nical University of Denmark. https://vimeo.com/145070436
6. Bistarelli, S., Fioravanti, F., Peretti, P., Santini, F.: Evaluation of complex security
scenarios using defense trees and economic indexes. J. Exp. Theor. Artif. Intell.
24(2), 161–192 (2012). https://doi.org/10.1080/13623079.2011.587206
7. Byres, E.J., Franz, M., Miller, D.: The use of attack trees in assessing vulnerabili-
ties in SCADA systems. In: Proceedings of Infrastructure Survivability Workshop.
IEEE (2004)
8. Dalton, G.C.I., Mills, R.F., Colombi, J.M., Raines, R.A.: Analyzing attack trees
using generalized stochastic petri nets. In: 2006 IEEE Information Assurance Work-
shop, pp. 116–123, June 2006. https://doi.org/10.1109/IAW.2006.1652085
9. Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A Storm is coming: a mod-
ern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017.
LNCS, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/
978-3-319-63390-9 31
10. Fraile, M., Ford, M., Gadyatskaya, O., Kumar, R., Stoelinga, M., Trujillo-Rasua,
R.: Using attack-defense trees to analyze threats and countermeasures in an ATM:
a case study. In: Horkoff, J., Jeusfeld, M.A., Persson, A. (eds.) PoEM 2016. LNBIP,
vol. 267, pp. 326–334. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
48393-1 24
11. Gadyatskaya, O., Hansen, R.R., Larsen, K.G., Legay, A., Olesen, M.C., Poulsen,
D.B.: Modelling attack-defense trees using timed automata. In: Fränzle, M.,
Markey, N. (eds.) FORMATS 2016. LNCS, vol. 9884, pp. 35–50. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-44878-7 3
12. Gadyatskaya, O., Jhawar, R., Kordy, P., Lounis, K., Mauw, S., Trujillo-Rasua,
R.: Attack trees for practical security assessment: ranking of attack scenarios with
ADTool 2.0. In: Agha, G., Van Houdt, B. (eds.) QEST 2016. LNCS, vol. 9826, pp.
159–162. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43425-4 10
13. Gribaudo, M., Iacono, M., Marrone, S.: Exploiting Bayesian networks for the anal-
ysis of combined attack trees. In: Proceedings of PASM. ENTCS, vol. 310, pp.
91–111 (2015). https://doi.org/10.1016/j.entcs.2014.12.014
72 R. Kumar et al.
14. Hendriks, M., Verhoef, M.: Timed automata based analysis of embedded system
architectures. In: Proceedings of 20th International Conference on Parallel and
Distributed Processing (IPDPS), p. 179. IEEE (2006). https://doi.org/10.1109/
IPDPS.2006.1639422
15. Hermanns, H., Krämer, J., Krčál, J., Stoelinga, M.: The value of attack-defence
diagrams. In: Piessens, F., Viganò, L. (eds.) POST 2016. LNCS, vol. 9635, pp.
163–185. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49635-0_9
16. Jürjens, J.: UMLsec: extending UML for secure systems development. In: Jézéquel,
J.-M., Hussmann, H., Cook, S. (eds.) UML 2002. LNCS, vol. 2460, pp. 412–425.
Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45800-X 32
17. Kolovos, D., Rose, L., Garcı́a-Domńguez, A., Paige, R.: The Epsilon Book (2016).
http://www.eclipse.org/epsilon/doc/book
18. Kordy, B., Mauw, S., Radomirović, S., Schweitzer, P.: Foundations of attack–
defense trees. In: Degano, P., Etalle, S., Guttman, J. (eds.) FAST 2010. LNCS,
vol. 6561, pp. 80–95. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-
642-19751-2 6
19. Kordy, B., Mauw, S., Schweitzer, P.: Quantitative questions on attack–defense
trees. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp.
49–64. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37682-5 5
20. Kordy, B., Piètre-Cambacédès, L., Schweitzer, P.: DAG-based attack and defense
modeling: don’t miss the forest for the attack trees. Comput. Sci. Rev. 13–14,
1–38 (2014). https://doi.org/10.1016/j.cosrev.2014.07.001
21. Kumar, R., Stoelinga, M.: Quantitative security and safety analysis with attack-
fault trees. In: Proceedings of IEEE 18th International Symposium on High Assur-
ance Systems Engineering (HASE), pp. 25–32, January 2017. https://doi.org/10.
1109/HASE.2017.12
22. Kumar, R., Guck, D., Stoelinga, M.: Time dependent analysis with dynamic
counter measure trees. In: Proceedings of 13th Workshop on Quantitative Aspects
of Programming Languages (QAPL) (2015). http://arxiv.org/abs/1510.00050
23. Kumar, R., Ruijters, E., Stoelinga, M.: Quantitative attack tree analysis via priced
timed automata. In: Sankaranarayanan, S., Vicario, E. (eds.) FORMATS 2015.
LNCS, vol. 9268, pp. 156–171. Springer, Cham (2015). https://doi.org/10.1007/
978-3-319-22975-1 11
24. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic
real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS,
vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-
3-642-22110-1 47
25. Mauw, S., Oostdijk, M.: Foundations of attack trees. In: Won, D.H., Kim, S. (eds.)
ICISC 2005. LNCS, vol. 3935, pp. 186–198. Springer, Heidelberg (2006). https://
doi.org/10.1007/11734727 17
26. Mead, N.: SQUARE Process (2013). https://buildsecurityin.us-cert.gov/articles/
best-practices/requirements-engineering/square-process
27. Roudier, Y., Apvrille, L.: SysML-Sec: a model driven approach for designing safe
and secure systems. In: Proceedings of 3rd International Conference on Model-
Driven Engineering and Software Development (MODELSWARD), pp. 655–664
(2015)
28. Ruijters, E., Schivo, S., Stoelinga, M.I.A., Rensink, A.: Uniform analysis of fault
trees through model transformations. In: Proceedings of IEEE 63rd Annual Reli-
ability and Maintainability Symposium (RAMS), January 2017. https://doi.org/
10.1109/RAM.2017.7889759
29. Schivo, S., Yildiz, B.M., Ruijters, E., Gerking, C., Kumar, R., Dziwok, S., Rensink,
A., Stoelinga, M.: How to efficiently build a front-end tool for UPPAAL: a model-
driven approach. In: Larsen, K.G., Sokolsky, O., Wang, J. (eds.) SETTA 2017.
LNCS, vol. 10606, pp. 319–336. Springer, Cham (2017). https://doi.org/10.1007/
978-3-319-69483-2 19
30. Schmidt, D.C.: Guest editor’s introduction: model-driven engineering. Computer
39(2), 25–31 (2006). https://doi.org/10.1109/MC.2006.58
31. Schneier, B.: Attack trees. Dr. Dobb’s J. 24(12), 21–29 (1999)
32. da Silva, A.R.: Model-driven engineering: a survey supported by the unified con-
ceptual model. Comput. Lang. Syst. Struct. 43, 139–155 (2015). https://doi.org/
10.1016/j.cl.2015.06.001
33. Sprinkle, J., Rumpe, B., Vangheluwe, H., Karsai, G.: Chapter 3: Metamodelling.
In: Giese, H., Karsai, G., Lee, E., Rumpe, B., Schätz, B. (eds.) MBEERTS 2007.
LNCS, vol. 6100, pp. 57–76. Springer, Heidelberg (2010). https://doi.org/10.1007/
978-3-642-16277-0 3
34. Stahl, T., Voelter, M., Czarnecki, K.: Model-Driven Software Development: Tech-
nology, Engineering, Management. Wiley, Chichester (2006)
35. Steinberg, D., Budinsky, F., Paternostro, M., Merks, E.: EMF: Eclipse Modeling
Framework 2.0, 2nd edn. Addison-Wesley Professional, Reading (2009)
36. Steiner, M., Liggesmeyer, P.: Qualitative and quantitative analysis of CFTs taking
security causes into account. In: Koornneef, F., van Gulijk, C. (eds.) SAFECOMP
2015. LNCS, vol. 9338, pp. 109–120. Springer, Cham (2015). https://doi.org/10.
1007/978-3-319-24249-1 10
37. Völter, M., Stahl, T., Bettin, J., Haase, A., Helsen, S.: Model-Driven Software
Development: Technology, Engineering, Management. Wiley, Chichester (2006)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Distributed Program and System
Analysis
ROLA: A New Distributed Transaction
Protocol and Its Formal Analysis
1 Introduction
2 Preliminaries
When a client initiates a get all operation, for each i ∈ I the client
will first request the latest version vector stored on the server for i. It will then
look at the metadata in the version vector returned by the server, iterating over
each item in the metadata set. If it finds an item in the metadata that has a
later timestamp than the ts_v in the returned vector, this means the value for i
is out of date. The client can then request the RA-consistent version of i.
$$[l] : t(\vec{x}) \longrightarrow t'(\vec{x}, \vec{y}) \;\; \text{if} \;\; cond(\vec{x}) \;\; \text{with probability} \;\; \vec{y} := \pi(\vec{x})$$
2 The coordinator, or client, is the partition executing the transaction.
Algorithm 1. ROLA
Server-side Data Structures
1: versions: list of versions ⟨item, value, timestamp ts_v, metadata md⟩
2: latestCommit[i]: last committed timestamp for item i
3: seq[ts]: local sequence number mapped to timestamp ts
4: sqn: local sequence counter
Server-side Methods
get same as in RAMP-Fast
Coordinator-side Methods
put all, get all same as in RAMP-Fast
the passed-in timestamp ts_prev, then the version is deemed prepared. The par-
tition keeps a record of this locally by incrementing a local sequence counter
and mapping the received version’s timestamp ts_v to the current value of the
sequence counter. Finally, the partition returns an ack to the client. If ts_prev
www.dbooks.org
ROLA: A New Distributed Transaction Protocol and Its Formal Analysis 83
does not match the timestamp of the last version in versions with the same
item, then this latest timestamp is simply returned to the coordinator.
If the coordinator receives an ack from prepare update, it immediately
commits the version with the generated timestamp tstx . If the returned value is
instead a timestamp, the transaction is aborted.
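A small Java-style sketch of this decision logic (purely illustrative; the authoritative specification is the Maude model presented below, and the names used here are our own):

```java
// Illustrative sketch of the coordinator's reaction to a prepare reply.
record PrepareReply(int txnId, boolean ack) { }

class Coordinator {
    void onPrepareReply(PrepareReply reply, long tsTx) {
        if (reply.ack()) {
            commit(reply.txnId(), tsTx);   // commit with the generated timestamp ts_tx
        } else {
            abort(reply.txnId());          // a newer version exists: abort the transaction
        }
    }
    void commit(int txnId, long tsTx) { /* send commit messages to the involved partitions */ }
    void abort(int txnId)             { /* move the transaction to the aborted set */ }
}
```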
Data Types. A version is a timestamped version of a data item (or key) and is
modeled as a 4-tuple version(key, value, timestamp, metadata). A timestamp
is modeled as a pair ts(addr , sqn) consisting of a partition’s identifier addr and
a local sequence number sqn. Metadata are modeled as a set of keys, denoting,
for each key, the other keys that are written in the same transaction.
The sort OperationList represents lists of read and write operations as terms
such as (x := read k1) (y := read k2) write(k1, x + y), where LocalVar
denotes the “local variable” that stores the value of the key read by the operation,
and Expression is an expression involving the transaction’s local variables:
op write : Key Expression -> Operation [ctor] .
op _:=read_ : LocalVar Key -> Operation [ctor] .
pr LIST{Operation} * (sort List{Operation} to OperationList) .
The datastore attribute represents the partition’s local database as a list of ver-
sions for each key stored at the partition. The attribute latestCommit maps to
each key the timestamp of its last committed version. tsSqn maps each version’s
timestamp to a local sequence number sqn. The attributes gotTxns, executing,
committed and aborted denote the transaction(s) which are, respectively, wait-
ing to be executed, currently executing, committed, and aborted.
The attribute votes stores the votes in the two-phase commit. The remaining
attributes denote the partitions from which the executing partition is awaiting
votes, committed acks, first-round get replies, and second-round get replies.
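Read as a record, the attributes listed above roughly amount to the following illustrative C++ mirror. This is a sketch only: the value types are simplified placeholders, and the names of the last four fields are ours, since the text only paraphrases those attributes.

#include <map>
#include <optional>
#include <set>
#include <string>
#include <tuple>
#include <vector>

using Key = std::string;
using TxnId = std::string;
using PartitionId = std::string;

struct Timestamp {
  PartitionId addr; long sqn = 0;
  bool operator<(const Timestamp& o) const { return std::tie(addr, sqn) < std::tie(o.addr, o.sqn); }
};
struct Version { Key key; std::string value; Timestamp ts; std::set<Key> metadata; };

struct PartitionState {
  std::map<Key, std::vector<Version>> datastore;   // list of versions per key stored here
  std::map<Key, Timestamp> latestCommit;           // last committed version per key
  std::map<Timestamp, long> tsSqn;                 // version timestamp -> local sequence number
  std::vector<TxnId> gotTxns;                      // transactions waiting to be executed
  std::optional<TxnId> executing;                  // currently executing transaction, if any
  std::vector<TxnId> committed, aborted;           // finished transactions
  std::map<TxnId, bool> votes;                     // two-phase-commit votes
  std::map<TxnId, std::set<PartitionId>> voteSites, commitSites,
                                         firstGetSites, secondGetSites;  // awaited replies
};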
The following shows an initial state (with some parts replaced by ‘...’) with
two partitions, p1 and p2, that are coordinators for, respectively, transactions
t1, and t2 and t3. p1 stores the data items x and z, and p2 stores y. Transaction
t1 is the read-only transaction (xl := read x) (yl := read y), transaction
t2 is a write-only transaction write(y, 3) write(z, 8), while transaction t3
is a read-write transaction on data item x. The states also include a buffer of
messages in transit and the global clock value, and a table which assigns to each
data item the site storing the item. Initially, the value of each item is [0]; the
version’s timestamp is empty (eptTS), and metadata is an empty set.
This section formalizes the dynamic behaviors of ROLA using rewrite rules, referring to the corresponding lines in Algorithm 1. We only show 2 of the 15 rewrite rules in our model, and refer to the report [14] for further details.³
Receiving prepare Messages (lines 5–10). When a partition receives a prepare message for a read-write transaction, the partition first determines whether the timestamp of the last version (VERSION) in its local version list VS matches the incoming timestamp TS' (which is the timestamp of the version read by the transaction). If so, the incoming version is added to the local store, the map tsSqn is updated, and a positive reply (true) to the prepare message is sent (“return ack” in our pseudo-code); otherwise, a negative reply (false, or “return latest” in the pseudo-code) is sent. Depending on whether the sender PID' of the prepare message happens to be PID itself, the reply is equipped with a local message delay ld or a remote message delay rd, both of which are sampled probabilistically from distributions with different parameters:⁴
crl [receive-prepare-rw] :
    {T, PID <- prepare(TID, version(K, V, TS, MD), TS', PID')}
    < PID : Partition | datastore: VS, sqn: SQN, tsSqn: TSSQN, AS' >
  =>
    if VERSION == eptVersion or tstamp(VERSION) == TS'
    then < PID : Partition | datastore: (VS version(K,V,TS,MD)), sqn: SQN',
                             tsSqn: insert(TS,SQN',TSSQN), AS' >
         [if PID == PID' then ld else rd fi,
          PID' <- prepare-reply(TID, true, PID)]
    else < PID : Partition | datastore: VS, sqn: SQN, tsSqn: TSSQN, AS' >
         [if PID == PID' then ld else rd fi,
          PID' <- prepare-reply(TID, false, PID)] fi
  if SQN' := SQN + 1 /\ VERSION := latestPrepared(K,VS) .
rl [receive-prepare-reply-false-executing] :
    {T, PID <- prepare-reply(TID, false, PID')}
    < PID : Partition | executing: < TID : Txn | AS >, aborted: TXNS,
                        voteSites: VSTS addrs(TID, (PID' , PIDS)), AS' >
  =>
    < PID : Partition | executing: noTxn,
                        aborted: (TXNS ;; < TID : Txn | AS >),
                        voteSites: VSTS addrs(TID, PIDS), AS' > .
³ We do not give variable declarations, but follow the convention that variables are written in (all) capital letters.
⁴ The variable AS' denotes the “remaining” attributes in the object.
which stores crucial information about each transaction. The log is a list of records record(tid, issueTime, finishTime, reads, writes, committed), with tid the transaction's ID, issueTime its issue time, finishTime its commit/abort time, reads the versions read, writes the versions written, and committed a flag that is true if the transaction committed.
We modify our model by updating the Monitor when needed. For example, when the coordinator has received all committed messages, the monitor records the commit time (T) for that transaction and sets the “committed” flag to true⁵:
crl [receive-committed] :
    {T, PID <- committed(TID, PID')}
    < M : Monitor | log: (LOG record(TID, T', T'', RS, WS, false) LOG') >
    < PID : Partition | executing: < TID : Txn | AS >,
                        committed: TXNS, commitSites: CMTS, AS' >
  =>
    if CMTS'[TID] == empty  --- all "committed" received
    then < M : Monitor | log: (LOG record(TID, T', T, RS, WS, true) LOG') >
         < PID : Partition | executing: noTxn, commitSites: CMTS',
                             committed: (TXNS ;; < TID : Txn | AS >), AS' >
    else < M : Monitor | log: (LOG record(TID, T', T'', RS, WS, false) LOG') >
         < PID : Partition | executing: < TID : Txn | AS >,
                             committed: TXNS, commitSites: CMTS', AS' > fi
  if CMTS' := remove(TID, PID', CMTS) .
⁵ The additions to the original rule are written in italics.
The function fracRead checks whether there are fractured reads in the execution log. There is a fractured read if a transaction TID2 reads X and Y, a transaction TID1 writes X and Y, TID2 reads the version TSX of X written by TID1, but reads a version TSY' of Y that is older than the version TSY of Y written by TID1 (TSY' < TSY). Since the transactions in the log are ordered according to start time, TID2 could appear before or after TID1 in the log. We spell out the case where TID1 comes before TID2:
op fracRead : Record -> Bool .
ceq fracRead(LOG ;
      record(TID1,T1,T1',RS1, (version(X,VX,TSX,MDX), version(Y,VY,TSY,MDY)),true) ; LOG' ;
      record(TID2,T2,T2',(version(X,VX,TSX,MDX), version(Y,VY',TSY',MDY')), WS2,true) ; LOG'')
    = true if TSY' < TSY .
ceq fracRead(LOG ; record(TID2, ...) ; LOG' ; record(TID1, ...) ; LOG'') = true if TSY' < TSY .
eq fracRead(LOG) = false [owise] .
No Lost Updates. We analyze the PLU property by searching for a final state in
which the monitor shows that an update was lost:
search [1] initConfig =>! C:Config < M:Address : Monitor | log: LOG:Record >
such that lu(LOG) .
The function lu, described in [14], checks whether there are lost updates in LOG.
We have performed our analysis with 4 different initial states, with up to 8
transactions, 2 data items and 4 partitions, without finding a violation of RA
or PLU. We have also model checked the causal consistency (CC) property with
the same initial states, and found a counterexample showing that ROLA does
not satisfy CC. (This might imply that our initial states are large enough so
that violations of RA or PLU could have been found by model checking.) Each
analysis command took about 30 seconds to execute on a 2.9 GHz Intel 4-Core
i7-3520M CPU with 3.7 GB memory.
vector timestamp). We only plot the results under uniform key access distribu-
tion, which are consistent with the results using Zipfian distributions.
The plots in Fig. 2 show the average transaction latency as a function of the
same parameters as the plots for throughput. Again, we see that ROLA out-
performs Walter in all settings. In particular, this difference is quite large for
write-heavy workloads; the reason is that Walter incurs more and more overhead
for providing causality, which requires background propagation to advance the
vector timestamp. The latency tends to converge under read-heavy workloads (because reads in both ROLA and Walter can commit locally without certification), but ROLA still has noticeably lower latency than Walter.
7 Related Work
Maude and PVeStA have been used to model and analyze the correctness and
performance of a number of distributed data stores: the Cassandra key-value
store [12,15], different versions of RAMP [10,13], and Google’s Megastore [7,8].
In contrast to these papers, our paper uses formal methods to develop and
validate an entirely new design, ROLA, for a new consistency model.
Concerning formal methods for distributed data stores, engineers at Amazon
have used TLA+ and its model checker TLC to model and analyze the correct-
ness of key parts of Amazon’s celebrated cloud computing infrastructure [17].
In contrast to our work, they only use formal methods for correctness analysis;
indeed, one of their complaints is that they cannot use their formal method for
performance estimation. The designers of the TAPIR transaction protocol for
distributed storage systems have also specified and model checked correctness
(but not performance) properties of their design using TLA+ [22].
8 Conclusions
We have presented the formal design and analysis of ROLA, a distributed trans-
action protocol that supports a new consistency model not present in the survey
by Cerone et al. [4]. Using formal modeling and both standard and statistical
model checking analyses we have: (i) validated ROLA’s RA and PLU consis-
tency requirements; and (ii) analyzed its performance requirements, showing
that ROLA outperforms Walter in all performance measures.
This work has shown, to the best of our knowledge for the first time, that the design and validation of a new distributed transaction protocol can be achieved relatively quickly, before its implementation, by the use of formal methods. Our next planned step is to implement ROLA, evaluate it experimentally, and compare the experimental results with those of the formal analysis. In previous work
on existing systems such as Cassandra [9] and RAMP [3], the performance esti-
mates obtained by formal analysis and those obtained by experimenting with
the real system were basically in agreement with each other [10,12]. This con-
firmed the useful predictive power of the formal analyses. Our future research
will investigate the existence of a similar agreement for ROLA.
References
1. Agha, G.A., Meseguer, J., Sen, K.: PMaude: rewrite-based specification language
for probabilistic object systems. Electr. Notes Theor. Comput. Sci. 153(2), 213–239
(2006)
2. AlTurki, M., Meseguer, J.: PVeStA: a parallel statistical model checking and
quantitative analysis tool. In: Corradini, A., Klin, B., Cîrstea, C. (eds.) CALCO
2011. LNCS, vol. 6859, pp. 386–392. Springer, Heidelberg (2011). https://doi.org/
10.1007/978-3-642-22944-2 28
3. Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Scalable atomic
visibility with RAMP transactions. ACM Trans. Database Syst. 41(3), 15:1–15:45
(2016)
4. Cerone, A., Bernardi, G., Gotsman, A.: A framework for transactional consistency
models with atomic visibility. In: CONCUR. Schloss Dagstuhl - Leibniz-Zentrum
fuer Informatik (2015)
5. Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., Meseguer, J., Talcott,
C.: All About Maude - A High-Performance Logical Framework: How to Spec-
ify, Program, and Verify Systems in Rewriting Logic. LNCS, vol. 4350. Springer,
Heidelberg (2007). https://doi.org/10.1007/978-3-540-71999-1
6. Eckhardt, J., Mühlbauer, T., Meseguer, J., Wirsing, M.: Statistical model checking
for composite actor systems. In: Martí-Oliet, N., Palomino, M. (eds.) WADT 2012.
LNCS, vol. 7841, pp. 143–160. Springer, Heidelberg (2013). https://doi.org/10.
1007/978-3-642-37635-1 9
7. Grov, J., Ölveczky, P.C.: Formal modeling and analysis of Google’s Megastore
in Real-Time Maude. In: Iida, S., Meseguer, J., Ogata, K. (eds.) Specification,
Algebra, and Software. LNCS, vol. 8373, pp. 494–519. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-642-54624-2 25
8. Grov, J., Ölveczky, P.C.: Increasing consistency in multi-site data stores:
Megastore-CGC and its formal analysis. In: Giannakopoulou, D., Salaün, G. (eds.)
SEFM 2014. LNCS, vol. 8702, pp. 159–174. Springer, Cham (2014). https://doi.
org/10.1007/978-3-319-10431-7 12
9. Hewitt, E.: Cassandra: The Definitive Guide. O’Reilly Media, Sebastopol (2010)
10. Liu, S., Ölveczky, P.C., Ganhotra, J., Gupta, I., Meseguer, J.: Exploring design
alternatives for RAMP transactions through statistical model checking. In: Duan,
Z., Ong, L. (eds.) ICFEM 2017. LNCS, vol. 10610, pp. 298–314. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-68690-5 18
11. Liu, S., Ölveczky, P.C., Wang, Q., Meseguer, J.: Formal modeling and analysis
of the Walter transactional data store. In: Proceedings of WRLA 2018. LNCS.
Springer (2018, to appear). https://sites.google.com/site/siliunobi/walter
12. Liu, S., Ganhotra, J., Rahman, M., Nguyen, S., Gupta, I., Meseguer, J.: Quanti-
tative analysis of consistency in NoSQL key-value stores. Leibniz Trans. Embed.
Syst. 4(1), 03:1–03:26 (2017)
13. Liu, S., Ölveczky, P.C., Rahman, M.R., Ganhotra, J., Gupta, I., Meseguer, J.:
Formal modeling and analysis of RAMP transaction systems. In: SAC 2016. ACM
(2016)
14. Liu, S., Ölveczky, P.C., Santhanam, K., Wang, Q., Gupta, I., Meseguer, J.: ROLA:
a new distributed transaction protocol and its formal analysis (2017). https://sites.
google.com/site/fase18submission/tech-report
15. Liu, S., Rahman, M.R., Skeirik, S., Gupta, I., Meseguer, J.: Formal modeling and
analysis of Cassandra in Maude. In: Merz, S., Pang, J. (eds.) ICFEM 2014. LNCS,
vol. 8829, pp. 332–347. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-
11737-9 22
16. Meseguer, J.: Conditional rewriting logic as a unified model of concurrency. Theor.
Comput. Sci. 96(1), 73–155 (1992)
17. Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuff, M.:
How Amazon Web Services uses formal methods. Commun. ACM 58(4), 66–73
(2015)
18. Sen, K., Viswanathan, M., Agha, G.: On statistical model checking of stochastic
systems. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp.
266–280. Springer, Heidelberg (2005). https://doi.org/10.1007/11513988 26
19. Sen, K., Viswanathan, M., Agha, G.A.: VESTA: a statistical model-checker and
analyzer for probabilistic systems. In: QEST 2005. IEEE Computer Society (2005)
20. Sovran, Y., Power, R., Aguilera, M.K., Li, J.: Transactional storage for geo-
replicated systems. In: SOSP 2011. ACM (2011)
21. Younes, H.L.S., Simmons, R.G.: Statistical probabilistic model checking with a
focus on time-bounded properties. Inf. Comput. 204(9), 1368–1409 (2006)
22. Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., Ports, D.R.K.: Building
consistent transactions with inconsistent replication. In: Proceedings of Symposium
on Operating Systems Principles, SOSP 2015. ACM (2015)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
A Process Network Model for Reactive
Streaming Software with Deterministic
Task Parallelism
petro.poplavko@siemens.com
³ Université Grenoble Alpes (UGA), VERIMAG, Grenoble, France
Saddek.Bensalem@univ-grenoble-alpes.fr
⁴ Deimos Space, Madrid, Spain
pedro.palomo@deimos-space.com
⁵ Information Technology Institute, Centre of Research and Technology, Thessaloniki, Greece
The research leading to these results has received funding from the European Space
Agency project MoSaTT-CMP, Contract No. 4000111814/14/NL/MH.
© The Author(s) 2018
A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 94–110, 2018.
https://doi.org/10.1007/978-3-319-89363-1_6
1 Introduction
The proliferation of multi-cores in timing-critical embedded systems requires a
programming paradigm that addresses the challenge of ensuring predictable tim-
ing. Two prominent paradigms and a variety of associated languages are widely
used today. For streaming signal processing, synchronous dataflow languages [18]
allow writing programs in the form of directed graphs with nodes for their func-
tions and arcs for the data flows between functions. Such programs can exploit
concurrency when they are deployed to multi-cores [15], while their functions
can be statically scheduled [17] to ensure a predictable timing behavior.
On the other hand, the reactive-control synchronous languages [12] are used
for reactive systems (e.g., flight control systems) expected to react to stimuli
from the environment within strict time bounds. The synchronicity abstraction
eliminates the non-determinism from the interleaving of concurrent behaviors.
The synchronous languages lack appropriate concepts for task parallelism
and timing-predictable scheduling on multiprocessors, whereas the streaming
models do not support reactive behavior. The Fixed Priority Process Network
(FPPN) model of computation has been proposed as a trade-off between stream-
ing and reactive control processing, for task parallel programs. In FPPNs, task
invocations depend on a combination of periodic data availability (similar to
streaming models) and sporadic control events. Static scheduling methods for
FPPNs [20] have demonstrated a predictable timing on multi-cores. A first imple-
mentation of the model [22] in an executable formal specification language called
BIP (Behavior, Interaction, Priority) exists, more specifically in its real-time
dialect [3] extended to tasks [10]. In [21], the FPPN scheduling was studied by
taking into account resource interference; an approach for incrementally plug-
ging online schedulers for HW/SW resource sharing (e.g., for QoS management)
was proposed.
This article presents the first comprehensive FPPN semantics definition, at
two levels: semantics for sequential execution, which ensures functional deter-
minism, and a real-time semantics for concurrent task execution while adhering
to the constraints of the former semantics. Our definition is related to a new
model transformation framework, which enables programming at a high level by
embedding FPPNs into the architecture description, and allows an incremental
refinement in terms of task interactions and scheduling¹. Our approach is demon-
strated with a real spacecraft on-board application ported onto the European
Space Agency’s quad-core Next Generation Microprocessor (NGMP).
2 Related Work
Design frameworks for embedded applications, like Ptolemy II [6] and
PeaCE [11], allow designing systems through refining high-level models. They
are based on various models of computation (MoC), but we focus mainly on
those that support task scheduling with timing constraints. Dataflow MoCs that
¹ The framework is online at [2].
stem from the Kahn Process Networks [16] have been adapted for the timing
constraints of signal processing applications, and design frameworks like CompSoC [13] have been introduced; these MoCs do not support reactive behavior and sporadic tasks, unlike the FPPN MoC, which can be seen as an extension in that direction. DOL Critical [10] ensures predictable timing, but its functional behav-
ior depends on scheduling. Another timing-aware reactive MoC that does not
guarantee functional determinism is DMPL [4]. The Prelude design frame-
work [5] specifies applications in a synchronous reactive MoC, but due to its
expressive power it is hard to derive scheduling analyses, unless restricting its
semantics. Last but not least, though the reactive process networks (RPN) [8]
do not support scheduling with timing constraints, they lay an important foun-
dation for combining the streaming and reactive control behaviors. In the FPPN
semantics we reuse an important principle of RPN semantics, namely, perform-
ing the maximal execution run of a dataflow network in response to a control
event.
i.e., a functional priority either follows the direction of dataflow or the opposite.
Given a (p1 , p2 ) ∈ FP, p1 is said to have a higher priority than p2 .
The FPPN in Fig. 2 represents an imaginary data processing application,
where the “X” sporadic process generates values, “Square” calculates the square
of the received value and the “Y” periodic process serves as sink for the squared
value. A sporadic event (command from the environment) invokes “X”, which is
annotated by its minimal inter-arrival time. The periodic processes are annotated
by their periods. The two types of non-blocking channels are also illustrated. The
FIFO (or mailbox) has a semantics of a queue. The blackboard remembers the
last written value that can be read multiple times. The arc depicted above the
channels indicates the functional priority relation FP. Additionally, the external
input/output channels are shown. In this example, the dataflow in the channels
goes in the opposite direction of the functional priority order. Note that, by analogy
to the scheduling priorities, a convenient method to define priority is to assign
a unique priority index to every process: the smaller the index, the higher the priority. This method is demonstrated in Fig. 2. In this case, the minimal required
FP relation would be defined by joining each pair of communicating processes
by an arc going from the higher-priority process to the lower-priority one.
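As a small illustration of this convention, the minimal FP relation can be derived mechanically from the priority indices of the communicating processes. The following is our own sketch, not part of the FPPN formalization; Process, minimalFP and the input list of communicating pairs are hypothetical names.

#include <set>
#include <utility>
#include <vector>

struct Process { int id; int priorityIndex; };   // smaller index = higher functional priority

// One arc per pair of communicating processes, oriented from the
// higher-priority process to the lower-priority one.
std::set<std::pair<int, int>> minimalFP(
    const std::vector<std::pair<Process, Process>>& communicating) {
  std::set<std::pair<int, int>> fp;
  for (const auto& [a, b] : communicating)
    fp.insert(a.priorityIndex < b.priorityIndex ? std::make_pair(a.id, b.id)
                                                : std::make_pair(b.id, a.id));
  return fp;
}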
Let us denote by Var the set of all variables. For a variable x or an ordered
set (vector) X of variables we denote by D(x) (resp. D(X)) its domain (or vector
of domains), i.e., the set(s) of values that the variable(s) may take. Valuations of
variables X are shown as X⁰, X¹, . . ., or simply as X, dropping the superscript.
Each variable is assumed to have a unique initial valuation. From the software
point of view, this means that all variables are initialized by a default value.
Var includes all process state variables Xp and the channel state variables
γc . The current valuation of a state variable is often referred to simply as state.
The time stamps in the execution are non-decreasing, and denote the time
until the next time stamp, at which the following actions occur. In the example,
at time 0 we read sample [1] from I1 and we compute its square. Then we write
to channel c1. At time 100, we read from c1 and write the sample [2] to O1.
A process models a subroutine with a set of locations (code line numbers),
variables (data) and operators that define a guard on variables (‘if’ condition),
the action (operator body) and the transfer of control to the next location.
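Under our own, hypothetical encoding, this view of a process could be sketched as follows; State, Operator and ProcessSketch are illustrative names only, not part of the formal definition.

#include <functional>
#include <map>
#include <string>
#include <vector>

using State = std::map<std::string, int>;            // valuation of the process variables

struct Operator {
  std::function<bool(const State&)> guard;            // 'if' condition on the variables
  std::function<void(State&)>       action;           // operator body
  int next;                                           // location receiving control afterwards
};

struct ProcessSketch {
  int location = 0;                                    // current code line number
  std::vector<std::vector<Operator>> ops;              // operators available at each location

  // Execute one enabled operator at the current location, if any.
  bool step(State& s) {
    for (const Operator& op : ops[location])
      if (op.guard(s)) { op.action(s); location = op.next; return true; }
    return false;
  }
};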
a given sequence (t1, P1), (t2, P2), . . ., where t1 < t2 < . . . are time stamps and Pi is the multiset of processes invoked at time ti. For convenience, we associate each ‘invoked process’ p in Pi with a respective invocation event, ep. The execution trace has the form:
Trace(PN) = w(t1) ◦ α1 ◦ w(t2) ◦ α2 . . .
where αi is a concatenation of job executions of processes in Pi, included in an order such that if p1 → p2 then the job(s) of p1 execute earlier than those of p2.
cases, the relative execution order of these subsets of jobs is dictated by zero-
delay semantics, whereby the jobs are executed in the invocation order and the
simultaneously invoked jobs follow the functional priority order. In this way, we
ensure deterministic updates in both cases: (i) for the states of processes by
excluding auto-concurrency, and (ii) for the data shared between the processes
by excluding data races on the channels. The precedence constraints for (i) are
satisfied by construction, because BIP components for processes never start a
new job execution until the previous job of the same process has finished. For the
precedence constraints in (ii), an appropriate component is generated for each
pair of communicating processes and plugged incrementally into the network of
BIP components.
Figure 4 shows such a component generated for a given pair of processes “A”
and “B”, assuming (A, B) ∈ FP. We saw in Fig. 3 that the evolution of a job
execution goes through three steps: ‘invoke’, ‘start’ and ‘finish’. The component
handles the three steps of both processes in almost symmetrical way, except in
the method that determines whether the job is ready to start: if two jobs are
simultaneously invoked, then first the job of process “A” gets ready and then,
after it has executed, the job of “B” becomes ready. The “Functional Priority”
Fig. 4. Imposing precedence order between “A”, “B” (“A” has higher functional priority)
² Queues are implemented by a circular buffer with the following operations:
– Allocate() picks an available (statically allocated) cell and gives a reference to it
– Push() pushes the last allocated cell into the tail
– Pull() undoes the push
– Pop() retrieves the data from the head of the queue.
³ Thanks to ‘init α’ and ‘advance α’, the queue tail always contains the next anticipated job, which is conservatively marked as non-active until the ‘Invoke α’ transition.
amended the attributes of TASTE functions to reflect the priority index of pro-
cesses and the parameters of FPPN channels, such as the capacity of FIFO channels.
The resulting model can be compiled and simulated in TASTE.
The second and final refinement step is scheduling. To schedule on multi-
cores while respecting the real-time semantics of FPPN, this step is preceded by a transformation from the TASTE architectural model into a BIP FPPN model. The
transformation process implements the FPPN-to-BIP ‘compilation’ sketched in
the previous section, and we believe it could be formalized by a set of trans-
formation rules. For example, as illustrated in Fig. 6, one of the rules could say
that if there are two tasks τ1 and τ2 related by the FP relation then their respective
BIP components B1 and B2 are connected (via ‘Start’ and ‘Finish’ ports) to a
functional priority component.
The scheduling is done offline, by first deriving a task graph from the archi-
tectural model, taking into account the periods, functional priorities and WCET
of processes. The task graph represents a maximal set of jobs invoked in a hyper-
period and their precedence constraints; it defines the invocation and the dead-
line of jobs relative to the hyperperiod start time. The task graph derivation
algorithm is detailed in [20].
Definition 5 (Task Graph). A directed acyclic graph TG(J, E) whose nodes J = {Ji} are jobs defined by tuples Ji = (pi, ki, Ai, Di, Wi), where pi is the job's process, ki is the job's invocation count, Ai ∈ Q≥0 is the invocation time, Di ∈ Q+ is the absolute deadline and Wi ∈ Q+ is the WCET. The k-th job of process p is denoted by p[k]. The edges E represent the precedence constraints.
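For readers who prefer code to tuples, Definition 5 transcribes directly into a data structure; the following is our own illustrative sketch (field names are ours, and the rational time values are approximated by doubles).

#include <utility>
#include <vector>

// One job J_i = (p_i, k_i, A_i, D_i, W_i) of the task graph.
struct Job {
  int process;       // p_i: the job's process
  int invocation;    // k_i: the job's invocation count, i.e. the job p[k]
  double arrival;    // A_i: invocation time within the hyperperiod
  double deadline;   // D_i: absolute deadline
  double wcet;       // W_i: worst-case execution time
};

// TG(J, E): nodes are jobs, edges are precedence constraints (indices into jobs).
struct TaskGraph {
  std::vector<Job> jobs;
  std::vector<std::pair<int, int>> precedences;
};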
The task graph is given as input to a static scheduler. The schedule obtained
from the static scheduler is translated into parameters for the online-scheduler
(cf. Fig. 6), which, on top of the functional priority components, further constrains the job execution order and timing, with the purpose of ensuring dead-
line satisfaction. The joint application/scheduler BIP model is called System
Model. This model is eventually compiled and linked with the BIP-RTE, which
ensures correct BIP semantics of all components online [23].
scheduler. Since there are four discrete transitions per job execution and 31 jobs per hyperperiod, 31 × 4 = 124 discrete transitions are executed by the BIP RTE per hyperperiod. The P20 activities were mapped to Core 0, whereas the jobs of tasks (P1, P2, P3, P4) were mapped to Core 1 and Core 2. P1 stands for the Data Input Dispatcher, P2 for the Control FM, P3 for the Control Output and P4 for the Guidance Navigation task. Right after 10 consecutive jobs of P1, P2, P3, the job of P4 is executed. The job of P4 is delayed due to the 450 ms invocation offset and its lowest functional priority. Since P3 and P4 do not communicate via the channels, in our framework (P3, P4) ∉ FP, so they can execute in parallel, which was actually programmed in our static schedule. Due to a system load of more than 100%, this was necessary for deadline satisfaction.
8 Conclusion
We presented the formal semantics of the FPPN model, at two levels: zero-delay
semantics with precedence constraints on the job execution order to ensure func-
tional determinism, and real-time semantics for scheduling. The semantics was
implemented by a model transformational framework. Our approach was val-
idated through a spacecraft on-board application running on a multi-core. In
future work we consider it important to improve the efficiency of code gener-
ation, formal proofs of equivalence of the scheduling constraints (like the task
graph) and the generated BIP model. The offline and online schedulers need to
be enhanced to a wider spectrum of online policies and a better awareness of
resource interference.
References
1. GR-CPCI-LEON4-N2X: Quad-core LEON4 next generation microprocessor eval-
uation board. http://www.gaisler.com/index.php/products/boards/gr-cpci-leon4-
n2x
2. Multicore code generation for time-critical applications. http://www-verimag.
imag.fr/Multicore-Time-Critical-Code,470.html
3. Abdellatif, T., Combaz, J., Sifakis, J.: Model-based implementation of real-time
applications. In: EMSOFT 2010 (2010)
4. Chaki, S., Kyle, D.: DMPL: programming and verifying distributed mixed-synchrony
and mixed-critical software. Technical report, Carnegie Mellon University (2016).
http://www.andrew.cmu.edu/user/schaki/misc/dmpl-extended.pdf
5. Cordovilla, M., Boniol, F., Forget, J., Noulard, E., Pagetti, C.: Developing critical
embedded systems on multicore architectures: the Prelude-SchedMCore toolset.
In: RTNS (2011)
6. Eker, J., Janneck, J.W., Lee, E.A., Liu, J., Liu, X., Ludvig, J., Neuendorffer, S.,
Sachs, S., Xiong, Y.: Taming heterogeneity - the Ptolemy approach. Proc. IEEE
91(1), 127–144 (2003)
7. Feiler, P., Gluch, D., Hudak, J.: The architecture analysis & design language
(AADL): an introduction. Technical report CMU/SEI-2006-TN-011, Software
Engineering Institute, Carnegie Mellon University, Pittsburgh, PA (2006). http://
resources.sei.cmu.edu/library/asset-view.cfm?AssetID=7879
8. Geilen, M., Basten, T.: Reactive process networks. In: EMSOFT 2004, pp. 137–146.
ACM (2004)
9. Ghamarian, A.H.: Timing analysis of synchronous dataflow graphs. Ph.D. thesis,
Eindhoven University of Technology (2008)
10. Giannopoulou, G., Poplavko, P., Socci, D., Huang, P., Stoimenov, N., Bourgos, P.,
Thiele, L., Bozga, M., Bensalem, S., Girbal, S., Faugere, M., Soulat, R., Dinechin,
B.D.d.: DOL-BIP-Critical: a tool chain for rigorous design and implementation of
mixed-criticality multi-core systems. Technical report (2016)
11. Ha, S., Kim, S., Lee, C., Yi, Y., Kwon, S., Joo, Y.P.: PeaCE: a hardware-software
codesign environment for multimedia embedded systems. ACM Trans. Des. Autom.
Electron. Syst. 12(3), 24:1–24:25 (2008)
12. Halbwachs, N.: Synchronous Programming of Reactive Systems. Springer, Berlin
(2010). https://doi.org/10.1007/978-1-4757-2231-4
13. Hansson, A., Goossens, K., Bekooij, M., Huisken, J.: CoMPSoC: a template for
composable and predictable multi-processor system on chips. ACM Trans. Des.
Autom. Electron. Syst. (TODAES) 14(1), 2 (2009)
14. Hugues, J., Zalila, B., Pautet, L., Kordon, F.: From the prototype to the final
embedded system using the Ocarina AADL tool suite. ACM Trans. Embed. Com-
put. Syst. 7(4), 42:1–42:25 (2008)
15. Johnston, W.M., Hanna, J.R.P., Millar, R.J.: Advances in dataflow programming
languages. ACM Comput. Surv. 36(1), 1–34 (2004)
16. Kahn, G.: The semantics of a simple language for parallel programming. In:
Rosenfeld, J.L. (ed.) Information Processing 1974: Proceedings of the IFIP
Congress, pp. 471–475. North-Holland, New York (1974)
17. Lee, E.A., Messerschmitt, D.G.: Static scheduling of synchronous data flow pro-
grams for digital signal processing. IEEE Trans. Comput. C–36(1), 24–35 (1987)
18. Lee, E.A., Messerschmitt, D.G.: Synchronous data flow. Proc. IEEE 75(9), 1235–
1245 (1987)
19. Perrotin, M., Conquet, E., Delange, J., Schiele, A., Tsiodras, T.: TASTE: a real-
time software engineering tool-chain overview, status, and future. In: Ober, I.,
Ober, I. (eds.) SDL 2011. LNCS, vol. 7083, pp. 26–37. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-25264-8 4
20. Poplavko, P., Socci, D., Bourgos, P., Bensalem, S., Bozga, M.: Models for deter-
ministic execution of real-time multiprocessor applications. In: DATE 2015, pp.
1665–1670. IEEE, March 2015
21. Poplavko, P., Kahil, R., Socci, D., Bensalem, S., Bozga, M.: Mixed-critical systems
design with coarse-grained multi-core interference. In: Margaria, T., Steffen, B.
(eds.) ISoLA 2016. LNCS, vol. 9952, pp. 605–621. Springer, Cham (2016). https://
doi.org/10.1007/978-3-319-47166-2 42
22. Socci, D., Poplavko, P., Bensalem, S., Bozga, M.: A timed-automata based mid-
dleware for time-critical multicore applications. In: SEUS 2015, pp. 1–8. IEEE
(2015)
23. Triki, A., Combaz, J., Bensalem, S., Sifakis, J.: Model-based implementation of
parallel real-time systems. In: Cortellessa, V., Varró, D. (eds.) FASE 2013. LNCS,
vol. 7793, pp. 235–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-
3-642-37057-1 18
24. Waez, M.T.B., Dingel, J., Rudie, K.: A survey of timed automata for the develop-
ment of real-time systems. Comput. Sci. Rev. 9, 1–26 (2013)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Distributed Graph Queries for Runtime
Monitoring of Cyber-Physical Systems
1 Introduction
A smart and safe cyber-physical system (CPS) [23,30,36] heavily depends on
intelligent data processing carried out over a heterogeneous computation plat-
form to provide autonomous behavior with complex interactions with an envi-
ronment which is rarely known in advance. Such complexity frequently makes design-time verification infeasible in practice; thus CPSs need to rely on runtime verification (RV) techniques to ensure safe operation by monitoring.
Traditionally, RV techniques have evolved from formal methods [24,26],
which provide a high level of precision, but offer a low-level specification lan-
guage (with simple atomic predicates to capture information about the system)
which hinders their use in everyday engineering practice. Recent RV approaches
[17] started to exploit rule-based approaches over a richer information model.
© The Author(s) 2018
A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 111–128, 2018.
https://doi.org/10.1007/978-3-319-89363-1_7
temporal behavior [11], our current work is restricted to (structural) safety prop-
erties where the violation of a property is expressible by graph queries.
These queries will be evaluated over a runtime model which reflects the cur-
rent state of the monitored system, e.g. data received from different sensors, the
services allocated to computing units, or the health information of computing
infrastructure. In accordance with the models@run.time paradigm [8,38], the runtime model gets updated with observable changes of the real system, either periodically with a certain frequency or in an event-driven way upon certain triggers.
Runtime monitor programs are deployed to a distributed heterogeneous com-
putation platform, which may include various types of computing units ranging
from ultra-low-power microcontroller units, through smart devices to high-end
cloud-based servers. These computation units primarily process the data pro-
vided by sensors and they are able to perform edge- or cloud-based computations
based on the acquired information. The monitoring programs are deployed and
executed on them exactly as the primary services of the system, thus resource
restrictions (CPU, memory) need to be respected during allocation.
Runtime monitors are synthesized by transforming high-level query specifi-
cations into deployable, platform dependent source code for each computation
unit used as part of a monitoring service. The synthesis includes a query opti-
mization step and a code generation step to produce platform-dependent C++
source code ready to be compiled into an executable for the platform. Due to
space restrictions, this component of our framework is not detailed in this paper.
Our system-level monitoring framework is hierarchical and distributed. Mon-
itors may observe the local runtime model of their own computing unit, and
they can collect information from runtime models of different devices, hence pro-
viding a distributed monitoring architecture. Moreover, one monitor may rely on
information computed by other monitors, thus yielding a hierarchical network.
Many industrial modeling tools used for engineering CPS [3,31,47] build on the
concepts of domain-specific (modeling) languages (DSLs) where a domain is typ-
ically defined by a metamodel and a set of well-formedness constraints. A meta-
model captures the main concepts in a domain as classes with attributes, their
relations as references, and specifies the basic structure of graph models.
A metamodel can be formalized as a vocabulary Σ = {C1, . . . , Cn1, A1, . . . , An2, R1, . . . , Rn3} with a unary predicate symbol Ci for each class, a binary predicate symbol Aj for each attribute, and a binary predicate symbol Rk for each relation.
Example 1. Figure 2 shows a metamodel for the CPS demonstrator with Comput-
ing Units (identified on the network by a hostID attribute) which host Domain Ele-
ments and communicate with other Computing Units. A Domain Element is either a
Train or Railroad Element where the latter is either a Turnout or a Segment. A Train is
situated on a Railroad Element which is connected to at most two other Railroad Ele-
ments. Furthermore, a Turnout refers to Railroad Elements connecting to its straight
and divergent exits. A Train also knows its speed.
Objects, their attributes, and links between them constitute a runtime model
[8,38] of the underlying system in operation. Changes to the system and its
environment are reflected in the runtime model (in an event-driven or time-
triggered way) and operations executed on the runtime model (e.g. setting values
of controllable attributes or relations between objects) are reflected in the system
itself (e.g. by executing scripts or calling services). We assume that this runtime
model is self-descriptive in the sense that it contains information about the
computation platform and the allocation of services to platform elements, which
is a key enabler for self-adaptive systems [10,44].
A runtime model M = ⟨Dom_M, I_M⟩ can be formalized as a 2-valued logic structure over Σ, where the domain Dom_M consists of a finite set of objects Obj_M together with the set Data_M of (built-in) data values (integers, strings, etc.). I_M is a 2-valued interpretation of the predicate symbols in Σ defined as follows:
which can abstract from the actual communication semantics (e.g. asynchronous
messages vs. broadcast messages) by (1) evaluating predicates locally at a com-
puting unit with (2) a 3-valued truth evaluation having a third 1/2 value in
case of uncertainty. Each computing unit maintains a set of facts described by
atomic predicates in its local knowledge base wrt. the objects with attributes it
hosts, and references between local objects. Additionally, each computing unit
incorporates predicates describing outgoing references for each object it hosts.
The 3-valued truth evaluation of a predicate P (v1 , . . . , vn ) on a computing
unit cu is denoted by [[P (v1 , . . . , vn )]]@cu. The DRM of the system is constituted
by the truth evaluation of all predicates on all computing units. For the current
paper, we assume the single source of truth principle, i.e. each model element is
always faithfully observed and controlled by its host computing unit, thus the
local truth evaluation of the corresponding predicate P is always 1 or 0. However,
3-valued evaluation could be extended to handle such local uncertainties.
Example 2. Figure 3 shows a DRM snapshot for the CPS demonstrator (bot-
tom part of Fig. 1). Computing units BBB1–BBB3 manage different parts of the
system, e.g. BBB1 hosts objects s1, s2, tu1 and tr2 and the links between them.
We illustrate the local knowledge bases of computing units.
Since computing unit BBB1 hosts train tr2, thus [[Train(tr2)]]@BBB1 = 1.
However, according to computing unit BBB2, [[Train(tr2)]]@BBB2 = 1/2, as there is no train tr2 hosted on BBB2, but it may exist on a different unit.
Similarly, [[ConnectedTo(s1, s7)]]@BBB1 = 1, as BBB1 is the host of s1, the
source of the reference. This means BBB1 knows that there is a (directed) reference
of type connectedTo from s1 to s7. However, the knowledge base on BBB3 may have
uncertain information about this link, thus [[ConnectedTo(s1, s7)]]@BBB3 = 1/2,
i.e. there may be a corresponding link from s1 to s7, but it cannot be deduced using
exclusively the predicates evaluated at BBB3.
of CPSs to provide scalable queries over large system models. The current paper
aims to reuse this declarative graph query language for runtime verification pur-
poses, which is a novel idea. The main benefit is that safety properties can be
captured on a high level of abstraction over the runtime model, which eases the
definition and comprehension of safety monitors for engineers. Moreover, this
specification is free from any platform-specific or deployment details.
The expressiveness of the VQL language converges to first-order logic with
transitive closure, thus it provides a rich language for capturing a variety of com-
plex structural conditions and dependencies. Technically, a graph query captures the erroneous case: when evaluating the query over a runtime model, any match (result) of the query highlights a violation of the safety property at runtime.
Syntax. Formally, a graph pattern (or query) is a first-order logic (FOL) formula ϕ(v1, . . . , vn) over variables [42]. A graph pattern ϕ can be inductively constructed (see Table 1) by using atomic predicates of runtime models C(v), A(v1, v2), R(v1, v2) with C, A, R ∈ Σ, equality between variables v1 = v2, FOL connectives ∨, ∧, quantifiers ∃, ∀, and positive (call) or negative (neg) pattern calls.
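For illustration, in its simplest reading the Train locations query used later in the evaluation (Sect. 5) could be captured as the pattern
ϕ(t, s) = Train(t) ∧ On(t, s),
whose matches bind every train t to the railroad element s it is currently on. This concrete formula is our own example over the metamodel of Example 1, not a definition taken from the benchmark queries.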
– If any links of its local runtime model point to a fragment stored at a neigh-
boring computing unit, or if a subpattern call is initiated, the corresponding query R(v1, v2), call(ϕ) or neg(ϕ) needs to be evaluated at all neighbors cu_i.
– Such calls to distributed monitors are carried out by sending asynchronous
messages to each other; thus graph queries are evaluated in a distributed way along the computing platform. First, the requester cu_r sends a message of the form “[[ϕ(v1, . . . , vn)]]@cu_p = ?”. The provider cu_p needs to send back a reply which contains further information about its internal state or previous monitoring results, namely all potential matches known by cu_p, i.e. all bindings [[ϕ(o1, . . . , on)]]@cu_p ≥ 1/2 (where we abbreviated the binding v_i → o_i into the predicate as a notational shortcut).
– Matches of predicates sent as a reply to a computing unit can be cached (a small cache sketch follows this list).
– Messages may get delayed due to network traffic, and they are considered to be lost by the requester if no reply arrives within a deadline. Such a case introduces uncertainty in the truth evaluation of predicates, i.e. the requester cu_r stores [[ϕ]]@cu_p = 1/2 in its cache if the reply of the provider cu_p is lost.
– After acquiring truth values of predicates from its neighbors, a computing
unit needs to decide on a single truth value for each predicate evaluated
along different variable bindings. This local decision will be detailed below.
– At the end of the query cycle, each computing unit resets its cache to remove
information acquired within the last cycle.
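A minimal requester-side cache that follows the timeout and end-of-cycle rules above could look like this; it is our own sketch, RemoteCallCache and its members are hypothetical names, and a remote call is identified here simply by a serialized string.

#include <map>
#include <optional>
#include <string>

enum class Truth { Unknown, False, True };   // 1/2, 0 and 1

struct RemoteCallCache {
  std::map<std::string, Truth> entries;      // key: e.g. "[[phi(o1,...,on)]]@cu_p"

  // Record the provider's reply; if no reply arrived before the deadline,
  // the requester stores 1/2 for that call.
  void onReply(const std::string& call, std::optional<Truth> reply) {
    entries[call] = reply.value_or(Truth::Unknown);
  }

  // Reset at the end of every query cycle, dropping everything acquired in it.
  void endOfCycle() { entries.clear(); }
};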
references of type On leading from objects tr2 and tr1 to objects stored in BBB1
and BBB2, respectively (m6 and m7). As the answer, each computing unit sends
back facts stating outgoing references from the objects (m8 and m9).
The next message (m10) asks for outgoing references of type ConnectedTo
from object s2. To send a reply, first BBB1 asks BBB2 to ensure that a reference
from s2 to s3 exists, since s3 is hosted by BBB2 (m11). This check adds tolerance
against lost messages during model update. After BBB1 receives the answer from
BBB2 (m12), it replies to BBB3 with all facts maintained on this node.
– If a match is obtained exclusively from the local runtime model of cu, then it
is a certain match, formally [[ϕ(o1 , . . . , on )]]@cu = 1.
– If a match is sent as a reply by multiple neighboring computing units cu_i (with cu_i ∈ nbr(cu)), then we take the most certain result at cu, formally, [[ϕ(o1, . . . , on)]]@cu := max{[[ϕ(o1, . . . , on)]]@cu_i | cu_i ∈ nbr(cu)}.
– Otherwise, tuple o1 , . . . , on is surely not a match: [[ϕ(o1 , . . . , on )]]@cu = 0.
Note that the second case uses max{} to take the maximum of 3-valued logic values wrt. information ordering (which is different from the numerical maximum used in Table 1). Information ordering is a partial order ({1/2, 0, 1}, ⊑) with 1/2 ⊑ 0 and 1/2 ⊑ 1. It is worth pointing out that this distributed truth evaluation is also in line with Sobociński's 3-valued logic axioms [33].
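The fusion step itself can be pictured with a small sketch (ours, not the generated monitor code); it encodes the three truth values and returns the maximum wrt. information ordering, relying on the fact that replies only ever carry potential (1/2) or certain (1) matches.

#include <vector>

enum class Truth { Unknown, False, True };   // 1/2, 0 and 1

// max wrt. information ordering: 1/2 is below both 0 and 1.
Truth fuse(Truth a, Truth b) {
  if (a == Truth::Unknown) return b;   // any definite value dominates 1/2
  if (b == Truth::Unknown) return a;
  return a;                            // replies carry only 1/2 or 1, so here a == b
}

// Fuse all replies received from neighbouring computing units for one candidate match.
Truth fuseReplies(const std::vector<Truth>& replies) {
  Truth acc = Truth::Unknown;
  for (Truth r : replies) acc = fuse(acc, r);
  return acc;
}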
to deploy query services to computing units with a limited amount of memory and to prevent memory overflow due to the many messages sent over the network.
A graph query is evaluated according to a search plan [43], which is a list of
predicates ordered in a way that matches of predicates can be found efficiently.
During query evaluation, free variables of the predicates are bound to a value
following the search plan. The evaluation terminates when all matches in the
model are found. An in-depth discussion of query optimization is out of scope
for this paper, but Sect. 5 will provide an initial investigation.
Semantic Guarantees and Limitations. Our construction ensures that (1) the
execution will surely terminate upon reaching the end of the query time win-
dow, potentially yielding uncertain matches, (2) each local model serves as a
single source of truth which cannot be overridden by calls to other computing
units, and (3) matches obtained from multiple computing units will be fused by
preserving information ordering. The over- and under-approximation properties
of 3-valued logic show that the truth values fused this way will provide a sound
result (Theorem 1 in [42]). Despite the lack of total consistency, our approach
still has safety guarantees by detecting all potentially unsafe situations.
There are also several assumptions and limitations of our approach. We use
asynchronous communication without broadcast messages. We only assumed
faults of communication links, but not the failures of computing units. We also
excluded the case when computing units maliciously send false information.
Instead of refreshing local caches in each cycle, the runtime model could incorpo-
rate information aging, which may enable handling other sources of uncertainty
(which is currently limited to consequences of message loss). Finally, in case of
longer cycles, the runtime model may no longer provide up-to-date information
at query evaluation time.
5 Evaluation
We conducted measurements to evaluate and address two research questions:
Q1: How does distributed graph query execution perform compared to executing
the queries on a single computing unit?
Q2: Is query evaluation performance affected by alternative allocation of model
objects to host computing units?
– Train locations: gets all trains and the segments on which trains are located.
– Close trains: this pattern is the one introduced in Fig. 4.
– Derailment: detects a train approaching a turnout that is set to the other direction (causing the train to run off the track).
– End of siding: detects trains approaching an end of the track.
Since the original runtime model of the CPS demonstrator has only a total of
49 objects, we scaled up the model by replicating the original elements (except
for the computing units). This way we obtained models with 49–43006 objects
and 114–109015 links, having similar structural properties as the original one.
of even a small execution platform with only 6 computing units could suppress the
communication overhead between units in case of several distributed queries, which
is certainly a promising outcome.
6 Related Work
Runtime Verification Approaches. For continuously evolving and dynamic CPSs,
an upfront design-time formal analysis needs to incorporate and check the robust-
ness of component behavior in a wide range of contexts and families of config-
urations, which is a very complex challenge. Thus consistent system behavior
is frequently ensured by runtime verification (RV) [24], which checks (poten-
tially incomplete) execution traces against formal specifications by synthesizing
verified runtime monitors from provenly correct design models [21,26].
Recent advances in RV (such as MOP [25] or LogFire [17]) promote capturing specifications by rich logics over quantified and parameterized events (e.g. quanti-
fied event automata [4] and their extensions [12]). Moreover, Havelund proposed
to check such specifications on-the-fly by exploiting rule-based systems based on
the RETE algorithm [17]. However, this technique only incorporates low-level
events, while changes of an underlying data model are not considered as events.
¹ See Appendix A for details under http://bit.ly/2op3tdy.
Distributed Graph Queries. Highly efficient techniques for local-search-based [9] and incremental model queries [40] were developed as part of the VIATRA framework, which mainly builds on RETE networks as its baseline technology. In [34], a
distributed incremental graph query layer deployed over a cloud infrastructure
with numerous optimizations was developed. Distributed graph query evaluation
techniques were reported in [22,27,32], but none of these techniques considered
an execution environment with resource-constrained computation units.
Runtime Models. The models@run.time paradigm [8] serves as the concep-
tual basis for the Kevoree framework [28] (developed within the HEADS FP7
project). Other recent distributed, data-driven solutions include the Global Data
Plane [48] and executable metamodels at runtime [44]. However, these frame-
works currently offer very limited support for efficiently evaluating queries over
a distributed runtime platform, which is the main focus of our current work.
7 Conclusions
In this paper, we proposed a runtime verification technique for smart and safe
CPSs by using a high-level graph query language to capture safety properties for
runtime monitoring and runtime models as a rich knowledge representation to
capture the current state of the running system. A distributed query evaluation
technique was introduced where none of the computing units has a global view
of the complete system. The approach was implemented and evaluated on the
physical system of the MoDeS3 CPS demonstrator. Our first results show that it scales for medium-size runtime models, and that the actual deployment of the query components to the underlying platform has a significant impact on execution time.
In the future, we plan to investigate how to characterize effective search plans
and allocations in the context of distributed queries used for runtime monitoring.
References
1. Abril, M., et al.: An assessment of railway capacity. Transp. Res. Part E Logist.
Transp. Rev. 44(5), 774–806 (2008)
2. Alippi, C., et al.: Model-free fault detection and isolation in large-scale cyber-
physical systems. IEEE Trans. Emerg. Top. Comput. Intell. 1(1), 61–71 (2017)
3. AUTOSAR Tool Platform: Artop. https://www.artop.org/
4. Barringer, H., Falcone, Y., Havelund, K., Reger, G., Rydeheard, D.: Quantified
event automata: towards expressive and efficient runtime monitors. In: Gian-
nakopoulou, D., Méry, D. (eds.) FM 2012. LNCS, vol. 7436, pp. 68–84. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-32759-9 9
5. Bauer, A., Falcone, Y.: Decentralised LTL monitoring. Formal Methods Syst. Des.
48(1–2), 46–93 (2016)
6. Bauer, A., Leucker, M., Schallhart, C.: Runtime verification for LTL and TLTL.
ACM Trans. Softw. Eng. Methodol. 20(4), 14 (2011)
7. Bergmann, G., Ujhelyi, Z., Ráth, I., Varró, D.: A graph query language for EMF
models. In: Cabot, J., Visser, E. (eds.) ICMT 2011. LNCS, vol. 6707, pp. 167–182.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21732-6 12
8. Blair, G.S., et al.: Models@run.time. IEEE Comput. 42(10), 22–27 (2009)
9. Búr, M., Ujhelyi, Z., Horváth, Á., Varró, D.: Local search-based pattern matching
features in EMF-IncQuery. In: Parisi-Presicce, F., Westfechtel, B. (eds.) ICGT
2015. LNCS, vol. 9151, pp. 275–282. Springer, Cham (2015). https://doi.org/10.
1007/978-3-319-21145-9 18
10. Cheng, B.H.C., et al.: Using models at runtime to address assurance for self-
adaptive systems. In: Bencomo, N., France, R., Cheng, B.H.C., Aßmann, U. (eds.)
Models@run.time. LNCS, vol. 8378, pp. 101–136. Springer, Cham (2014). https://
doi.org/10.1007/978-3-319-08915-7 4
11. Dávid, I., Ráth, I., Varró, D.: Foundations for streaming model transformations by
complex event processing. Softw. Syst. Model. 17, 1–28 (2016). https://doi.org/
10.1007/s10270-016-0533-1
12. Decker, N., Leucker, M., Thoma, D.: Monitoring modulo theories. Int. J. Softw.
Tools Technol. Transf. 18(2), 205–225 (2015)
13. Desai, A., Seshia, S.A., Qadeer, S., Broman, D., Eidson, J.C.: Approximate syn-
chrony: an abstraction for distributed almost-synchronous systems. In: Kroening,
D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9207, pp. 429–448. Springer,
Cham (2015). https://doi.org/10.1007/978-3-319-21668-3 25
14. Emery, D.: Headways on high speed lines. In: 9th World Congress on Railway
Research, pp. 22–26 (2011)
15. Gönczy, L., et al.: MDD-based design, configuration, and monitoring of resilient
cyber-physical systems. Trustworthy Cyber-Physical Systems Engineering (2016)
16. Google: Protocol buffers. https://github.com/google/protobuf
17. Havelund, K.: Rule-based runtime verification revisited. Int. J. Softw. Tools Tech-
nol. Transf. 17(2), 143–170 (2015)
18. Hewitt, C., et al.: A universal modular ACTOR formalism for artificial intelligence.
In: International Joint Conference on Artificial Intelligence, pp. 235–245 (1973)
19. Horányi, G., Micskei, Z., Majzik, I.: Scenario-based automated evaluation of test
traces of autonomous systems. In: DECS workshop at SAFECOMP (2013)
20. Iqbal, M.Z., et al.: Applying UML/MARTE on industrial projects: challenges,
experiences, and guidelines. Softw. Syst. Model. 14(4), 1367–1385 (2015)
21. Joshi, Y., et al.: Runtime verification of LTL on lossy traces. In: Proceedings of
the Symposium on Applied Computing - SAC 2017, pp. 1379–1386. ACM Press
(2017)
22. Krause, C., Tichy, M., Giese, H.: Implementing graph transformations in the bulk
synchronous parallel model. In: Gnesi, S., Rensink, A. (eds.) FASE 2014. LNCS,
vol. 8411, pp. 325–339. Springer, Heidelberg (2014). https://doi.org/10.1007/978-
3-642-54804-8 23
23. Krupitzer, C., et al.: A survey on engineering approaches for self-adaptive systems.
Perv. Mob. Comput. 17, 184–206 (2015)
24. Leucker, M., Schallhart, C.: A brief account of runtime verification. J. Log. Algebr.
Program. 78(5), 293–303 (2009)
25. Meredith, P.O., et al.: An overview of the MOP runtime verification framework.
Int. J. Softw. Tools Technol. Transf. 14(3), 249–289 (2012)
26. Mitsch, S., Platzer, A.: ModelPlex: verified runtime validation of verified cyber-
physical system models. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS,
vol. 8734, pp. 199–214. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-
11164-3 17
27. Mitschke, R., Erdweg, S., Köhler, M., Mezini, M., Salvaneschi, G.: i3QL: Language-
integrated live data views. ACM SIGPLAN Not. 49(10), 417–432 (2014)
28. Morin, B., et al.: Kevoree Modeling Framework (KMF): efficient modeling tech-
niques for runtime use. University of Luxembourg, Technical report (2014)
29. Mostafa, M., Bonakdarpour, B.: Decentralized runtime verification of LTL spec-
ifications in distributed systems. In: 2015 IEEE International Parallel and Dis-
tributed Processing Symposium, pp. 494–503, May 2015
30. Nielsen, C.B., et al.: Systems of systems engineering: Basic concepts, model-based
techniques, and research directions. ACM Comput. Surv. 48(2), 18 (2015)
31. No Magic: MagicDraw. https://www.nomagic.com/products/magicdraw
32. Peters, M., Brink, C., Sachweh, S., Zündorf, A.: Scaling parallel rule-based reason-
ing. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A.
(eds.) ESWC 2014. LNCS, vol. 8465, pp. 270–285. Springer, Cham (2014). https://
doi.org/10.1007/978-3-319-07443-6 19
33. Sobociński, B.: Axiomatization of a Partial System of Three-Value Calculus of
Propositions. Institute of Applied Logic (1952)
34. Szárnyas, G., Izsó, B., Ráth, I., Harmath, D., Bergmann, G., Varró, D.: IncQuery-
D: a distributed incremental model query framework in the cloud. In: Dingel,
J., Schulte, W., Ramos, I., Abrahão, S., Insfran, E. (eds.) MODELS 2014. LNCS,
vol. 8767, pp. 653–669. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-
11653-2 40
35. Szárnyas, G., et al.: The Train Benchmark: cross-technology performance evalu-
ation of continuous model queries. Softw. Syst. Model., 1–29 (2017). https://doi.
org/10.1007/s10270-016-0571-8
36. Sztipanovits, J., et al.: Toward a science of cyber-physical system integration. Proc.
IEEE 100(1), 29–44 (2012)
128 M. Búr et al.
37. Sztipanovits, J., Bapty, T., Neema, S., Howard, L., Jackson, E.: OpenMETA: a
model- and component-based design tool chain for cyber-physical systems. In:
Bensalem, S., Lakhneck, Y., Legay, A. (eds.) ETAPS 2014. LNCS, vol. 8415, pp.
235–248. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54848-
2 16
38. Szvetits, M., Zdun, U.: Systematic literature review of the objectives, techniques,
kinds, and architectures of models at runtime. Softw. Syst. Model. 15(1), 31–69
(2013)
39. The Eclipse Project: Eclipse Modeling Framework. http://www.eclipse.org/emf
40. Ujhelyi, Z., et al.: EMF-IncQuery: an integrated development environment for live
model queries. Sci. Comput. Program. 98, 80–99 (2015)
41. Varró, D., et al.: Road to a reactive and incremental model transformation plat-
form: three generations of the VIATRA framework. Softw. Syst. Model 15(3),
609–629 (2016)
42. Varró, D., Semeráth, O., Szárnyas, G., Horváth, Á.: Towards the automated gen-
eration of consistent, diverse, scalable and realistic graph models. In: Heckel, R.,
Taentzer, G. (eds.) Graph Transformation, Specifications, and Nets. LNCS, vol.
10800, pp. 285–312. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
75396-6 16
43. Varró, G., et al.: An algorithm for generating model-sensitive search plans for
pattern matching on EMF models. Softw. Syst. Model 14(2), 597–621 (2015)
44. Vogel, T., Giese, H.: Model-driven engineering of self-adaptive software with
EUREMA. ACM Trans. Auton. Adapt. Syst. 8(4), 18 (2014)
45. Vörös, A., et al.: MoDeS3: model-based demonstrator for smart and safe cyber-
physical systems. In: NASA Formal Methods Symposium (2018, accepted)
46. Warren, D.S.: Memoing for logic programs. Commun. ACM 35(3), 93–111 (1992)
47. Yakindu Statechart Tools: Yakindu. http://statecharts.org/
48. Zhang, B., et al.: The cloud is not enough: saving IoT from the cloud. In: 7th
USENIX Workshop on Hot Topics in Cloud Computing (2015)
49. Zheng, X., et al.: Efficient and scalable runtime monitoring for cyber-physical sys-
tem. IEEE Syst. J. PP, 1–12 (2016)
EventHandler-Based Analysis Framework
for Web Apps Using Dynamically
Collected States
1 Introduction
Web applications (apps) written in HTML, CSS, and JavaScript have become
prevalent, and JavaScript is now the 7th most popular programming language [22].
Because web apps can run on any platform and device that provides a browser,
they are widely used. The overall structure of a web app is specified in HTML,
which is represented as a tree structure via Document Object Model (DOM) APIs.
CSS describes visual effects such as colors, positions, and animation of the app's
contents, and JavaScript handles events triggered by user interaction. JavaScript
code can change the state of the web app
by interoperating with HTML and CSS, load other JavaScript code dynamically,
and access device-specific features via APIs provided by underlying platforms.
JavaScript is the de facto standard language for web programming these days.
To help developers build high-quality web apps, researchers have studied various
analysis techniques and the software industry has developed in-house static
analyzers. Static analyzers such as SAFE [12,15], TAJS [2,10], and WALA [19]
analyze JavaScript web apps without concretely executing them, while dynamic
analyzers such as Jalangi [20] utilize concrete values obtained by actually executing
the apps. Thus, static analysis results aim to cover all possible execution flows
but often contain infeasible execution flows, whereas dynamic analysis results
contain only real execution flows but often fail to cover the abundant execution
flows. These different analysis results serve different purposes: sound static
analysis results are critical for verifying the absence of bugs, and complete dynamic
analysis results are useful for detecting genuine bugs. To enhance the quality of
their own software, IT companies develop in-house static analyzers such as Infer
from Facebook [4] and Tricorder from Google [18].
However, statically analyzing web apps in a sound and scalable manner is
extremely challenging. In particular, because JavaScript, the language that controls
the behavior of web apps, is highly dynamic, purely static analysis has various
limitations. Since JavaScript can generate code to execute from string literals
during evaluation, such code is not available to static analyzers before run time.
In addition, dynamically adding and deleting object properties, and treating
property names as values, make static analysis difficult [17]. Moreover, since
execution flows triggered by user events are abundant, statically analyzing them
often degrades analysis performance [16].
Among the many challenges in statically analyzing JavaScript web apps, this
paper focuses on the analysis of event-driven execution flows. Most existing
JavaScript static analyzers focus on analyzing web apps at loading time, and
they over-approximate event-driven execution flows to remain sound. In order
to consider all possible event sequences soundly, they abstract the event-driven
semantics so that any event can happen in any order. Such a sound event modeling
contains many infeasible event sequences, which lead to unnecessary computation
and imprecise analysis results. Thus, state-of-the-art JavaScript static analyzers
often fail to analyze event flows in web apps.
In this paper, we propose a novel EventHandler-based (EH-based) static analysis
for web apps using dynamically collected state information. First, we present a
new analysis unit, an EH. While traditional static analyzers perform whole-program
analysis covering all possible execution flows, the EH-based analysis aims to
analyze partial execution flows triggered by user events more precisely. In other
words, unlike the whole-program analysis, which starts analyzing from a single
entry point of a given program, the EH-based analysis considers each event
function call triggered by a user event as an entry point. Because the EH-based
analysis enables a subset of the entire execution flows to be analyzed at a time,
it can analyze fewer infeasible execution flows than the whole-program analysis
Fig. 1. (a) A conservative modeling of event control flows (b) Modeling in TAJS [9]
if a user clicks the target A, a new event handler becomes registered, which
makes two handlers executable. Second, changes in the DOM state of a web app
also change the set of executable event handlers for an event. For instance, an
event target may be removed from the document via DOM API calls, which makes
the detached event target inaccessible to users. Also, events may not be captured
depending on their capturing/bubbling options and CSS style settings such as
visibility or display. In addition, it is common practice to manipulate CSS
styles like the following:
– HTMLElement.style.opacity = 0;
– HTMLElement.style.zIndex = n;
to hide an element such as a button under another element, making it inaccessible
to users. These various features affect the event sequences that users can trigger
and the event handlers that are executed accordingly.
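To make these effects concrete, the following JavaScript sketch (with hypothetical element ids and handler names, not taken from any particular subject app) shows how run-time DOM and CSS changes alter the set of event handlers a user can actually trigger:

  // Two buttons whose click handlers are registered by the top-level code.
  var openBtn = document.getElementById("open");    // hypothetical ids
  var closeBtn = document.getElementById("close");

  openBtn.addEventListener("click", function onOpen() {
    // Registering a handler at run time: a third handler becomes executable.
    document.getElementById("popup")
            .addEventListener("click", function onPopup() { /* ... */ });
  });

  // Detaching an element: users can no longer trigger its handler.
  closeBtn.parentNode.removeChild(closeBtn);

  // Hiding an element under another one has the same effect on reachability.
  openBtn.style.opacity = 0;
  openBtn.style.zIndex = -1;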
top-level code” node, they analyze code initiated by event handlers in any order
and any number of times, as denoted by the “trigger all event handlers” node.
According to this modeling of event control flows, all possible event sequences
that occur after loading the top-level code are soundly analyzed. Note that even
though whole-program analyzers use this sound event modeling, the analyzers
themselves may not be sound because of other features like dynamic code gener-
ation. However, because registered event handlers may be removed during eval-
uation and they may even be inaccessible due to some CSS styles as discussed
in Sect. 2.1, the event modeling in Fig. 1(a) may contain too many infeasible
event sequences that are impossible in concrete executions. Analysis with lots of
infeasible event sequences involves unnecessary computation that wastes anal-
ysis time, and often results in imprecise analysis results. Such a conservative
modeling of event control flows indeed reports many false positives [16].
To reduce the amount of infeasible event sequences to analyze, TAJS uses
a refined modeling of event control flows as shown in Fig. 1(b). Among various
event handlers, this modeling distinguishes “load event handlers” and analyzes
them before all the other event handlers. While this modeling is technically
unsound because non-load events may precede load events [15], most web apps
satisfy this modeling in practice. Moreover, because load event handlers often
initialize top-level variables, the event modeling in Fig. 1(a) often produces false
positives by analyzing non-load event functions before load event functions ini-
tialize top-level variables. On the contrary, the TAJS modeling reduces such
false positives by analyzing load event handlers before non-load event handlers.
Although the TAJS modeling distinguishes a load event, the over-approximation
of the other event handler calls still brings analysis precision and scalability
issues.
To alleviate the analysis precision and scalability problems due to event modeling,
we propose the EHA framework, which aims to analyze a subset of execution flows
within a limited time budget to detect bugs in partial execution flows rather
than to analyze all execution flows. EHA has two key ingredients to achieve this
goal. First, it slices the entire execution flows by using each event handler as an
individual entry point, which amounts to considering a given web app as a collection
of smaller web apps. This slicing brings the effect of breaking the loop structures
in existing event modelings shown in Fig. 1. Second, in order to analyze sliced
event control flows in various contexts, EHA constructs an initial abstract heap
of each entry point that contains necessary information to analyze a given event
control flow by abstracting dynamically collected states. More specifically, EHA
takes two components—a dynamic event generator and a static analyzer—and
collects concrete values of non-local variables of event functions via the dynamic
event generator, and abstracts the collected values using the static analyzer.
Let us compare static, dynamic, and EH -based analyses with an example. We
assume that a top-level code registers three event handlers: l, a, and b where l
denotes a load event handler, which precedes the others and runs once. In addi-
tion, a and b simulate a pop-up and its close button, respectively. Thus, we can
represent the possible event sequences as a regular expression: l(ab)*a?. For a given
event sequence lababa, Fig. 2 represents the event flows analyzed by each analy-
sis technique. A conservative static analysis contains infeasible event sequences
like the ones starting with a or b, whereas a dynamic analysis covers only short
prefixes out of infinitely many flows. The EH -based analysis slices the web app
into three handler units: l, a, and b. Hence, there is no loop in the event model-
ing; each handler considers every prefix of the given event sequence that ends with
itself. For example, the handler a considers la, laba, and lababa as possible event
sequences. Moreover, instead of abstracting the evaluation result of each sequence
separately and merging the abstractions, it first merges the evaluation results of
the sequences just before the handler a—l, lab, and labab—and uses their
abstraction as the initial heap for analyzing a, which covers more event flows.
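A minimal registration sketch for this example (hypothetical element ids; the concrete apps are not shown here) would look as follows, so that only event sequences of the form l(ab)*a? can occur:

  // l: the load event handler, which precedes the others and runs once.
  window.addEventListener("load", function l() {
    document.getElementById("popup").style.display = "none";
  });
  // a: opens the pop-up; afterwards only the close button makes sense.
  document.getElementById("open").addEventListener("click", function a() {
    document.getElementById("popup").style.display = "block";
  });
  // b: closes the pop-up, after which a can be triggered again.
  document.getElementById("close").addEventListener("click", function b() {
    document.getElementById("popup").style.display = "none";
  });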
Fig. 2. Event flows analyzed by (a) static, (b) dynamic, and (c) EH -based analyses.
3 Technical Details
This section discusses the EHA framework, which consists of five phases as
shown in Fig. 3. Boxes denote modules and ellipses denote data. EHA takes three
inputs: a web app (Web App) to analyze, and two modules to use as its
components—a dynamic event sequence generator (Event Generator) and a static
analyzer (Static Analyzer). During the first instrumentation phase, Instrumentor
inserts code that dynamically collects states into the input web app. Then, during
the execution phase, the Instrumented Web App runs on a browser, producing
Collected States. One of the input modules, Event Generator, repeatedly receives
states of the running web app and sends user events to it during this phase.
phase. In the third unit building phase, Unit Web App Builder constructs a small
Unit Web App for each event handler from Collected States. After the other input
module, Static Analyzer, analyzes the set of Unit Web Apps in the static analysis
phase, Alarm Aggregator summarizes the resulting set of Bug Reports and generates
a Final Bug Report for the original input Web App in the final alarm aggregation
phase. We now describe each phase in more detail.
Instrumentation Phase. The first phase instruments a given web app so that the
instrumented web app can record dynamically collected states during execution.
Figure 4 presents the instrumentation rules for the most important cases where
the unary operator ⊕ is either ++ or --. For presentation brevity, we abuse the
notation and write x to denote the string representation of a variable name x.
The Inst function converts necessary JavaScript language constructs to others
that perform dynamic logging. For example, for each function declaration of f,
Inst inserts four statements before the function body and one statement after
the function body to keep track of the non-local variables of the function f.
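Figure 4 is not reproduced in this text, but the following before/after sketch conveys the flavor of the instrumentation; the helper names record_entry, record_nonlocals, and record_exit are invented for illustration and are not the framework's actual API:

  // Stub logging helpers (invented names, for illustration only).
  function record_entry(name, args) { console.log("enter", name); }
  function record_nonlocals(name, env) { console.log("non-locals of", name, env); }
  function record_exit(name) { console.log("exit", name); }

  var counter = 1;                       // a non-local variable of f

  // Original declaration: function f(x) { return counter + x; }
  // Instrumented (sketch): logging statements wrap the original body.
  function f(x) {
    record_entry("f", arguments);
    record_nonlocals("f", { counter: counter });
    try {
      return counter + x;                // original body
    } finally {
      record_exit("f");
    }
  }

  f(2);                                  // logs entry, non-locals, and exit; returns 3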
Execution Phase. The execution phase runs an instrumented web app on a
browser using events generated by Event Generator. Because EHA is parameter-
ized by the input Event Generator, it may be an automated testing tool or manual
effort. The following definitions formally specify the concepts used in the
execution phase and the rest of this section:
Execution σ ∈ S*    State s ∈ S = P × H    ProgramPoint p ∈ P
Heap h ∈ H = A → O    Address @x ∈ A    Object o ∈ O = F → V
Field x ∈ F    Value v ∈ V = Vb ∪ A    PrimitiveValue Vb
An execution of a web app σ is a sequence of states that are results of evaluation
of the web app code. We omit how states change according to the evaluation of
different language constructs, but focus on which states are collected during exe-
cution. A state s is a pair of a program point p denoting the source location of the
code being evaluated and a heap h denoting a memory status. A heap is a map
from addresses to objects. An address is a unique identifier assigned whenever an
object is created, and an object is a map from fields to values. A field is an object
property name and a value is either a primitive value or an address that denotes
an object. For presentation brevity, we abuse Object to represent Environment as
well, which is a map from variables to values. Then, EHA collects states at event
callback entries during execution:
Collected States(σ) = {s | s ∈ σ s.t. s is at an event callback entry},
i.e., states whose program points are function entries and whose call stack depth is 1.
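As a purely illustrative example, a single collected state could be serialized roughly as follows; the record layout is an assumption and not the implementation's actual format:

  // One collected state s = (p, h): the program point of an event callback entry
  // (call stack depth 1) together with the reachable part of the heap.
  var collectedState = {
    programPoint: "game.js:120:1",                 // hypothetical source location
    eventType: "click",
    eventTarget: "/html/body/div[0]/button[1]",    // position of the target in the DOM tree
    heap: {
      "@1": { score: 0, level: 1 },                // address -> object (field -> value)
      "@2": { owner: "@1" }                        // values are primitives or addresses
    }
  };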
Unit Building Phase. As shown in Fig. 3, this phase constructs a set of sliced
unit web apps using dynamically collected states. More specifically, it divides
the collected states into EH units, and then for each EH unit u, it constructs
an initial summary ŝ^u_I that contains merged values of the non-local variables
from the states in u. As discussed in Sect. 2.1, an event handler consists of three
components: an event target, an event type, and a callback function. Thus, we
design an EH unit u with an abstract event target φ, an event type τ , and a
program point p:
u ∈ U = AbsEventTarget × EventType × P
φ ∈ AbsEventTarget = DOMTreePosition ∪ A
τ ∈ EventType
While we use the same concrete event types and program points for EHs, we
abstract concrete event targets to maintain a modest number of event targets. We
assume the static analyzer expresses analysis results as summaries. A summary
ŝ is a map from a pair of a program point and a context to an abstract heap:
ŝ ∈ Ŝ = P × Context → Ĥ    c ∈ Context
where Context is parameterized by an input static analyzer of EHA.
For each dynamically collected state s = (p, h) with an event target o and
an event type τ both contained in h, Unit Web App Builder calculates an EH unit
u as follows:
u = αs(s) = (αo(o), τ, p)
where αo(o) = DOMTreePosition(o) if o is attached on the DOM, and αo(o) = o otherwise.
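One plausible way to realize DOMTreePosition—and thus αo—is to record the target's path of child indices from the document root when it is attached, and to keep the concrete object otherwise; the helper below is an illustrative sketch, not the paper's implementation:

  // Abstract an event target: DOM tree position if attached, the object itself otherwise.
  function abstractEventTarget(target) {
    if (!document.documentElement.contains(target)) {
      return target;                     // detached: keep the concrete target
    }
    var path = [];
    for (var node = target; node.parentNode; node = node.parentNode) {
      path.unshift(Array.prototype.indexOf.call(node.parentNode.childNodes, node));
    }
    return path.join("/");               // e.g. "0/1/3": position in the DOM tree
  }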
The initial summary ŝ^u_I maps all pairs of program points and contexts to the heap
bottom ⊥H, denoting no information, except that it maps the pair of the global entry
program point and the empty context to the initial abstract heap
h^u_init = ⊔_i αh(hi), where si ∈ Collected States ∧ αs(si) = u ∧ si = (pi, hi). The
initial abstract heap for a unit u is thus the join of the abstraction results of all
heaps in the collected states that are mapped to the same u. The heap abstraction αh
and the abstract heap join ⊔ are parameterized by the input static analyzer.
Static Analysis Phase. Now, the static analysis phase analyzes each sliced unit
web app one by one, and detects any bugs in it. Let us call the static analyzer
that EHA takes as its input SA. Without loss of generality, let us assume that SA
performs a whole-program analysis to compute the analysis result ŝ_final with the
initial summary ŝ_I by computing the least fixpoint of a semantics transfer
function F̂: ŝ_final = leastFix λŝ.(ŝ_I ⊔ F̂(ŝ)), and then reports alarms for possible
bugs in it. We call an instance of EHA that takes SA as its input static analyzer
EHASA. Then, for each EH unit u, EHASA performs an EH-based analysis to
compute its analysis result ŝ^u_final with the initial summary ŝ^u_I constructed
during the unit building phase by computing the least fixpoint of the same semantics
transfer function F̂: ŝ^u_final = leastFix λŝ.(ŝ^u_I ⊔ F̂(ŝ)). It also reports alarms
for possible bugs in each unit u.
Alarm Aggregation Phase. The final phase combines all bug reports from sliced
unit web apps and constructs a final bug report. Because source locations of bugs
in a bug report from a unit web app are different from those in an original input
web app, Alarm Aggregator resolves such differences. Since a single source location
in the original web app may appear multiple times in differently sliced unit web
apps, Alarm Aggregator also merges bug reports for the same source locations.
4 Implementation
This section describes how our prototype implements the concrete data
representation and each module shown in dark boxes in Fig. 3.
Instrumentor. The main idea of the Instrumentor is similar to that of Jalangi [20],
a JavaScript dynamic analysis framework, and we implemented the rules
(partially) shown in Fig. 4. An instrumented web app collects states during exe-
cution by stringifying them and writing them to files. Dynamically collected infor-
mation may be ordinary JavaScript values or built-in objects of JavaScript engines
or browsers, which are often implemented in non-JavaScript, native languages.
Because such built-in values are inaccessible from JavaScript code, we omit their
values in the collected states. On the contrary, ordinary JavaScript values are
stringified in JSON format. A primitive value is stringified by JSON.stringify and
stored in ValueMap. An object value is stored in two places—the object itself in
StorageMap and its pointer identifier in ValueMap—and its property values are also
recursively stringified and stored in StorageMap. The stringified document, ValueMap, and
StorageMap are written to files at the end of execution, and Unit Web App Builder con-
verts them to states in the unit building phase.
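A rough sketch of such a serialization scheme is shown below; the map names mirror the description above, but their exact shapes (and the pointer-id format) are assumptions:

  var ValueMap = {};     // name -> stringified primitive, or pointer identifier
  var StorageMap = {};   // pointer identifier -> stringified property values
  var nextId = 0;

  function record(name, value) {
    if (value === null || typeof value !== "object") {
      ValueMap[name] = JSON.stringify(value);      // primitive values
    } else {
      var id = "@" + (nextId++);                   // pointer identifier
      ValueMap[name] = id;
      StorageMap[id] = {};
      for (var prop in value) {                    // recurse into property values
        record(prop, value[prop]);
        StorageMap[id][prop] = ValueMap[prop];
      }
    }
  }
  // Functions and built-in objects are omitted, as described above.

  record("score", 42);
  record("player", { name: "p1", hp: 100 });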
Unit Web App Builder. In our prototype implementation, the unit web app
builder parses the collected states in JSON format and constructs a unit web
app as multiple HTML files and one JavaScript file. A single JavaScript file
contains all the information needed to build an initial abstract heap, as shown in
Fig. 5. It contains modeling code for built-in objects at the top, declares objects
recorded in StorageMap and initializes their properties, and then declares and
initializes non-local variables, which together are all the information needed to
build an initial abstract heap. At the bottom, the handler function is called.
Starting from these three variables—handler, target, and arguments—we
can fill in the contents of a unit web app using the collected states. For each
variable, we get its value from the collected states and construct corresponding
JavaScript code. When the value of a variable is a primitive value, we create a
corresponding code fragment as a string literal. For an object value, we get the
value from StorageMap using its pointer id and repeat the process for its property
values. For a function object value, we repeat the process for its non-local variables.
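Putting these pieces together, a generated unit web app's JavaScript file might look roughly like the sketch below; the concrete values and the event target are fabricated for illustration:

  // 1. Modeling code for built-in objects would appear at the top (omitted here).

  // 2. Objects recorded in StorageMap are declared and their properties initialized.
  var obj1 = {};
  obj1.score = 42;
  obj1.level = 3;

  // 3. Non-local variables of the handler, restored from the collected state.
  var state = obj1;

  // 4. The three starting variables: the handler, its event target, and its arguments.
  var handler = function (ev) { state.score += 1; };
  var target = document.body;            // hypothetical event target
  var args = [{ type: "click" }];

  // 5. At the bottom of the file, the handler function is called.
  handler.apply(target, args);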
Alarm Aggregator. The alarm aggregator maintains a mapping between different
source locations and eliminates duplicated alarms. It should map between loca-
tions in the original web app and in sliced unit web apps. Our implementation
keeps track of corresponding AST nodes in different web apps, and utilizes the
information for mapping locations. It identifies duplicated alarms by string com-
parison of their bug messages and locations after mapping the source locations.
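The following sketch illustrates this mapping and de-duplication step; the record shapes and field names are assumptions made for the example:

  // Map each alarm from a unit web app back to its original source location and
  // drop duplicates by string comparison of message plus mapped location.
  function aggregateAlarms(unitReports, locationMap) {
    var seen = {};
    var finalReport = [];
    unitReports.forEach(function (report) {
      report.alarms.forEach(function (alarm) {
        var origLoc = locationMap[alarm.location] || alarm.location;
        var key = alarm.message + "@" + origLoc;
        if (!seen[key]) {
          seen[key] = true;
          finalReport.push({ message: alarm.message, location: origLoc });
        }
      });
    });
    return finalReport;
  }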
5 Experimental Evaluation
In this section, we evaluate EHAman SAFE, an instantiation of EHA with manual
event generation and SAFE [12], to answer the following research questions, under
the assumption that as many dynamic events as possible are provided:
– RQ1. Full Coverage: How many event flows does the EH -based analysis
cover compared with the whole-program analysis?
– RQ2. Precision: How precise is the EH -based analysis compared with the
whole-program analysis?
– RQ3. Scalability: What is the execution time of each phase in the analyses?
– RQ4. Partial Coverage: How many event flows does the EH -based analysis
cover for timeout analyses?
We studied 8 open-source game web apps [8], which were used in the evaluation
of SAFE. They have various buttons and show event-dependent behaviors. The
first two columns of Table 1 show the names and lines of code of the apps,
respectively. The first four apps do not use any JavaScript libraries, and the
remaining apps use the jQuery library version 2.0.3. They are all cross-platform
apps that can run on Chrome, Chrome-extension, and Tizen environments.
To perform experiments, we instantiated EHA with two inputs. As an
Event Generator input, we chose manual event generation by one undergraduate
researcher who was not familiar with EHA. He was instructed to explore the
behaviors of the web apps as much as possible, and he could check the number of
functions called during execution as guidance. In order to make the execution
environments simple enough to reproduce multiple times, we collected dynamic
states from a browser without any cached data. As a Static Analyzer input, we use
SAFE because it can analyze the largest number of JavaScript web apps among
existing analyzers via the state-of-the-art DOM tree abstraction [14,15] and it
supports a bug detector [16]. We ran the apps with Chrome on a 2.9 GHz quad-core
Intel Core i7 with 16 GB memory in the execution phase. The other phases were
conducted on Ubuntu 16.04.1 with an Intel Core i7 and 32 GB memory.
Answer to RQ1. For the analysis coverage, we measured the numbers of analyzed
functions and true positives by SAFE and EHAman SAFE . Because SAFE could not
analyze 4 apps that use jQuery within the timeout of 72 h, we considered only
the other apps for SAFE.
Table 1 summarizes the result of analyzed functions. The 3rd to the 5th
columns show the numbers of registered event handler functions analyzed by
both, SAFE only, and EHAman SAFE only, respectively. Similarly, the 6th to the
8th columns show the numbers of functions analyzed by both, SAFE only, and
EHAman SAFE only, respectively. When we compare only the registered event handler
functions among all the analyzed functions, EHAman SAFE outperforms SAFE. Even
though SAFE was designed to be sound, it missed some behaviors. Our investi-
gation showed that the unsoundness was due to incomplete DOM
modeling. For the numbers of analyzed functions, the analyses covered more than
75% of the functions in common. EHAman SAFE analyzed more functions for the first
3 subjects than SAFE due to missing event registrations caused by incomplete
DOM modeling in SAFE. On the other hand, SAFE analyzed more functions for
the 4th subject because EHAman SAFE missed flows during the execution phase. We
studied the analysis result of the 4th subject in more detail, and found flows that
resume previously suspended execution by using cached data in a localStorage
object. EHAman SAFE could not analyze these flows because the collected states do
not contain the cached data, while SAFE could use a sound modeling of localStorage.
Lastly, EHAman SAFE
did not miss any true positives that SAFE detected, and EHAman SAFE could detect
four more true positives in common functions as shown in Table 2, which implies
that EHAman SAFE analyzed execution flows in those functions that SAFE missed.
We explain Table 2 in more detail in the next answer.
Answer to RQ2. To compare the analysis precision, we measured the numbers
of false positives (FPs) in alarm reports by SAFE and EHAman SAFE . Note that
true positives (TPs) may not be considered as “bugs” by app developers. For
example, while SAFE reports a warning when the undefined value is implicitly
converted to a number because it is a well-known error-prone pattern, it may be
an intentional behavior of a developer. Thus, TPs denote alarms that are reproducible
in concrete executions, while FPs denote alarms that cannot be reproduced in any
feasible execution. As for RQ1, we compare the analysis precision for the four
apps that do not use jQuery.
Tables 2 and 3 categorize alarms into three categories: alarms reported by both
SAFE and EHAman SAFE , alarms in functions commonly analyzed by both, and alarms
in functions that are analyzed by only one. Table 2 shows numbers of TPs and
FPs for each app, and Table 3 further categorizes alarms in terms of their causes.
Out of 21 common alarms, 6 are TPs and 15 are FPs. Among 15 common FPs,
14 are due to absence of DOM modeling and 1 is due to the unsupported getter
and setter semantics. For the functions commonly analyzed by both, they may
report different alarms because they are based on different abstract heaps. We
observed that 40 FPs from SAFE are due to the over-approximated event system
modeling. In particular, the FPs in the 01 and 03 apps arise because top-level
variables are initialized only when non-load event handler functions are called,
which implies that the event modeling of Fig. 1(b) would have a similar imprecision
problem. On the contrary, EHAman SAFE reported only 16 FPs, mostly (10 FPs)
due to the absence of DOM modeling. The remaining six FPs—three from object
joins and three from the handler unit abstraction—are due to the inherent
imprecision of static analysis, which merges multiple values and thereby loses
precision. Finally, for the functions analyzed by only one analyzer, all the reported
alarms are FPs, due to the absence of DOM modeling and to properties omitted in
the EHAman SAFE implementation. In short, EHAman SAFE could partially analyze
more subjects than SAFE, and it improved the analysis precision by finding four
additional TPs and fewer FPs for commonly analyzed functions. In particular, its
handler unit abstraction produced three FPs, considerably fewer than the 40 FPs
from the over-approximated event modeling in SAFE, without missing any TPs.
Answer to RQ3. To compare the analysis scalability, we measured the execution
time of each phase for both analyzers, as summarized in Table 4.
Table 4. Execution time (seconds) of each phase for SAFE and EHAman SAFE

      SAFE                              EHAman SAFE
Id    Total    Top-Level  Event Loop    Execution: Total / #Call / Ave.   Unit build   Static analysis: Total / #EH / #TO / Ave.
01    375.7    8.9        366.8         465.41 / 682 / 0.68               10.0         33038.4 / 130 / 9 / 96.6
02    282.0    8.2        273.8         252.86 / 135 / 1.87               6.0          6379.7 / 33 / 0 / 70.4
03    850.2    15.5       834.7         82.70 / 168 / 0.49                2.0          7894.1 / 43 / 3 / 68.8
04    1276.6   325.3      951.3         302.36 / 589 / 0.51               2.1          16223.9 / 95 / 7 / 54.2
For SAFE, we measured the time taken for analyses of the entire code, the top-
level code, and the event loops: Total = Top-Level + Event Loop. For the four
subjects that do not use any JavaScript libraries, the total analysis took at most
1276.6 s, of which 951.3 s were spent analyzing event loops. While SAFE finished
analyzing the top-level code of the other subjects, which use jQuery, in 137.3 s at
the maximum, it could not finish analyzing their entire code within the time budget
of 72 h (259,200 s).
For EHAman SAFE, because the maximum execution times of the instrumentation
phase and the alarm aggregation phase are 10.3 s and 4.9 s, respectively—much
smaller than those of the other phases—the table shows only the other phases. For
the execution phase, we present the overhead of collecting states:
EHAman SAFE (Execution Phase): Total = #Call × Ave.
The 6th column presents the numbers of event handler function calls that Event
Generator executed; each event handler function pauses for 3.24 s on average.
In order to understand the performance overhead due to the instrumentation,
we measured its slowdown by replacing all the instrumented helper functions
with a function with an empty body. On the SunSpider benchmark, Jalangi
showed a 30x slowdown and EHAman SAFE showed a 178x slowdown on average.
We observed that collecting the non-local variables of each function incurs
significant performance overhead, and more function calls incur more overhead.
www.dbooks.org
EventHandler -Based Analysis Framework for Web Apps 143
The unit building phase takes time to generate unit web app code. Our
investigation showed that the time heavily depends on the size of collected data.
For the static analysis phase, we measured the analysis time of the unit web apps,
excluding timeouts (TO):
EHAman SAFE (Static Analysis Phase): Total = (#EH − #TO) × Ave. + 1200 × #TO
We analyzed each unit web app with a timeout of 1200 s. While the 02 app
has no timeouts, the 07 app has 87 timeouts out of 94 unit web apps. On average,
the analysis of 38% (25/66) of the unit web apps timed out. Note that even for
the first four apps, for which SAFE finished its analysis, EHAman SAFE had some
timeouts. We conjecture that SAFE finished its analysis quickly because it missed
some flows due to unsupported DOM modeling. By contrast, because EHAman SAFE
analyzes more flows using dynamically collected data, it had several timeouts.
Answer to RQ4. To see how many event flows EHAman SAFE covers with a limited
time budget, let us consider the four apps that SAFE could not finish within 72 h
(Tables 1 and 4). EHAman SAFE finished 19% (42/225) of the units within the
timeout of 1200 s as shown in Table 4, and the average analysis time excluding
timeouts was 76.0 s. Because this implies that web apps have event flows that can
be analyzed in about 76 s, it may be worthwhile to analyze such simple event flows
quickly first to find bugs in them. Starting with these 42 units, EHAman SAFE covered
78 functions as shown in Table 1. While SAFE could not provide any bug reports
for the four apps using jQuery, EHAman SAFE reported 6 alarms from the analyzed
functions.
6 Related Work
Researchers have studied event dependencies to analyze event flows more pre-
cisely. Madsen et al. [13] proposed event-based call graphs, which extend tra-
ditional call graphs with behaviors of event handlers such as registration and
trigger of events. While they do not consider analysis of DOM state changes and
event capturing/bubbling behaviors, EHA addresses them by utilizing dynami-
cally collected states. Sung et al. [21] introduced DOM event dependency and
exploited it to test JavaScript web apps. Their tool improved the efficiency of
event testing but it has not yet been applied for static analysis of event loops.
Taking advantage of both static analysis and dynamic analysis is not a new
idea [5]. For JavaScript analysis, researchers have tried to analyze dynamic features
of JavaScript [7] and DOM values of web apps [23,24] precisely. Alimadadi
et al. [1] proposed a DOM-sensitive change impact analysis for JavaScript web apps.
JavaScript Blended Analysis Framework (JSBAF) [26] collects dynamic traces of
a given app and specializes dynamic features of JavaScript, such as eval calls and
reflective property accesses, utilizing the collected traces. JSBAF analyzes each trace
separately and combines the results, but EHA abstracts the collected states on
each EH first and then analyzes the units to get generalized contexts. Finally, Ko
et al. [11] proposed a tunable static analysis framework that utilizes a light-weight
pre-analysis. Similarly, our work builds an approximation of selected executions by
constructing an initial abstract heap utilizing dynamic information, which enables
analyzing complex event flows, albeit only partially.
Acknowledgment. The research leading to these results has received funding from
National Research Foundation of Korea (NRF) (Grants NRF-2017R1A2B3012020 and
2017M3C4A7068177).
References
1. Alimadadi, S., Mesbah, A., Pattabiraman, K.: Hybrid DOM-sensitive change
impact analysis for JavaScript. In: ECOOP 2015 (2015)
2. Andreasen, E., Møller, A.: Determinacy in static analysis for jQuery. In: OOPSLA
2014 (2014)
3. Andreasen, E.S., Møller, A., Nielsen, B.B.: Systematic approaches for increasing
soundness and precision of static analyzers. In: SOAP 2017 (2017)
4. Calcagno, C., et al.: Moving fast with software verification. In: Havelund, K.,
Holzmann, G., Joshi, R. (eds.) NFM 2015. LNCS, vol. 9058, pp. 3–11. Springer,
Cham (2015). https://doi.org/10.1007/978-3-319-17524-9 1
5. Ernst, M.D.: Static and dynamic analysis: synergy and duality. In: PASTE 2004
(2004)
6. Grech, N., Fourtounis, G., Francalanza, A., Smaragdakis, Y.: Heaps don’t lie: coun-
tering unsoundness with heap snapshots. In: OOPSLA 2017 (2017)
7. Guarnieri, S., Livshits, B.: GATEKEEPER: mostly static enforcement of security
and reliability policies for JavaScript code. In: SSYM 2009 (2009)
8. Intel: HTML5 web apps (2017). https://01.org/html5webapps/webapps
9. Jensen, S.H., Madsen, M., Møller, A.: Modeling the HTML DOM and browser API
in static analysis of JavaScript web applications. In: ESEC/FSE 2011 (2011)
10. Jensen, S.H., Møller, A., Thiemann, P.: Type analysis for JavaScript. In: Palsberg,
J., Su, Z. (eds.) SAS 2009. LNCS, vol. 5673, pp. 238–255. Springer, Heidelberg
(2009). https://doi.org/10.1007/978-3-642-03237-0 17
11. Ko, Y., Lee, H., Dolby, J., Ryu, S.: Practically tunable static analysis framework
for large-scale JavaScript applications. In: ASE 2015 (2015)
12. Lee, H., Won, S., Jin, J., Cho, J., Ryu, S.: SAFE: formal specification and imple-
mentation of a scalable analysis framework for ECMAScript. In: FOOL 2012 (2012)
13. Madsen, M., Tip, F., Lhoták, O.: Static analysis of event-driven Node.js JavaScript
applications. In: OOPSLA 2015 (2015)
14. Park, C., Ryu, S.: Scalable and precise static analysis of JavaScript applications
via loop-sensitivity. In: ECOOP 2015 (2015)
15. Park, C., Won, S., Jin, J., Ryu, S.: Static analysis of JavaScript web applications
in the wild via practical DOM modeling. In: ASE 2015 (2015)
16. Park, J., Lim, I., Ryu, S.: Battles with false positives in static analysis of JavaScript
web applications in the wild. In: ICSE-SEIP 2016 (2016)
17. Richards, G., Lebresne, S., Burg, B., Vitek, J.: An analysis of the dynamic behavior
of JavaScript programs. In: PLDI 2010 (2010)
18. Sadowski, C., Van Gogh, J., Jaspan, C., Söderberg, E., Winter, C.: Tricorder:
building a program analysis ecosystem. In: ICSE 2015 (2015)
19. Schäfer, M., Sridharan, M., Dolby, J., Tip, F.: Dynamic determinacy analysis. In:
PLDI 2013 (2013)
20. Sen, K., Kalasapur, S., Brutch, T., Gibbs, S.: Jalangi: a selective record-replay and
dynamic analysis framework for JavaScript. In: ESEC/FSE 2013 (2013)
21. Sung, C., Kusano, M., Sinha, N., Wang, C.: Static DOM event dependency analysis
for testing web applications. In: FSE 2016 (2016)
22. TIOBE: TIOBE Index for September 2017. http://www.tiobe.com/tiobe-index
23. Tripp, O., Ferrara, P., Pistoia, M.: Hybrid security analysis of web JavaScript code
via dynamic partial evaluation. In: ISSTA 2014 (2014)
24. Tripp, O., Weisman, O.: Hybrid analysis for JavaScript security assessment. In:
ESEC/FSE 2011 (2011)
25. Wang, Y., Zhang, H., Rountev, A.: On the unsoundness of static analysis for
android GUIs. In: SOAP 2016 (2016)
26. Wei, S., Ryder, B.G.: Practical blended taint analysis for JavaScript. In: ISSTA
2013 (2013)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Software Design and Verification
Hierarchical Specification and Verification
of Architectural Design Patterns
Diego Marmsoler
1 Introduction
Architectural design patterns capture architectural design experience and pro-
vide abstract solutions to recurring architectural design problems. They are an
important concept in software engineering and regarded as one of the major
tools to support an architect in the conceptualization and analysis of software
systems [1]. The importance of patterns has resulted in a panoply of pattern descrip-
tions in the literature [1–3]. They usually consist of a description of some key archi-
tectural constraints imposed by the pattern, such as involved data types, types
of components, and assertions about the activation/deactivation of components
as well as connections between component ports. These descriptions are usually
highly informal and the claim that they indeed solve a certain design problem
remains unverified. As a consequence, an architect cannot fully rely on a pat-
tern’s specification to solve a design problem faced during the development of a
2 Background
In the following, we provide some background on which our work is built.
(Figure: an example configuration trace with configurations k0, k1, and k2, showing active components c1, c2, and c3 together with the values exchanged on their input and output ports.)
The specification of interfaces proceeds then in two steps: First, ports are spec-
ified by providing a set of ports P and a corresponding mapping tp : P → S to
specify which types of data may be exchanged through each port. Then, a set
of interfaces (CP , IP , OP ) is specified by declaring input ports IP ⊆ P , output
ports OP ⊆ P , and a set of configuration parameters CP ⊆ P . Thereby, config-
uration parameters are a way to parametrize components of a certain type and
they can be thought of as ports with a predefined value which is fixed for each
component.
Interfaces can then be specified using so-called configuration diagrams con-
sisting of a graphical depiction of the involved interfaces (see Sect. 3.6 for exam-
ples). Thereby, each interface consists of two parts: a name followed by a list of
configuration parameters (enclosed between ‘⟨’ and ‘⟩’). Input and output ports
are represented by empty and filled circles, respectively.
Ports. Two port types are specified over these data types by the specification
given in Fig. 3c: a type sb, which allows exchanging subscriptions to a specific
event, and a type nt, which allows exchanging messages associated with any event.
(Specification of the publisher–subscriber activation and connection constraints over variables s, s′ : Subscriber, p : Publisher, m : msg, E : ℘(evt), and e : evt, given as behavior assertions (3) and (4) relating the publisher's and the subscribers' sb and nt ports.)
message. Equation (4), on the other hand, requires a subscriber’s input port nt
to be connected to the corresponding output port of the publisher, whenever the
latter sends a message for which the subscriber is subscribed.
Data Types. Blackboard architectures usually work with problems and solutions
for them. Figure 5b provides a specification of the corresponding data types.
We denote by PROB the set of all problems and by SOL the set of all solutions.
Complex problems consist of subproblems which can be complex themselves. To
solve a problem, its subproblems have to be solved first. Therefore, we assume the
existence of a subproblem relation ≺ ⊆ PROB×PROB. For complex problems, the
details of the relation may not be known in advance. Indeed, one of the benefits of
a blackboard architecture is that a problem can be solved even without knowing
the exact nature of this relation in advance. However, the subproblem relation
has to be well-founded (Eq. (5)) for a problem to be solvable. In particular,
we do not allow for cycles in the transitive closure of ≺. While there may be
different approaches to solve a problem (i.e., several ways to split a problem
into subproblems), we assume, without loss of generality, that the final solution
for a problem is always unique. Thus, we assume the existence of a function
solve : PROB → SOL which assigns the correct solution to each problem. Note,
however, that it is not known in advance how to compute this function; indeed,
computing it is one of the reasons for using this pattern.
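Equation (5) itself is not reproduced in this excerpt; a standard LaTeX rendering of the required well-foundedness of ≺ (a sketch, not necessarily the paper's exact formulation) is:

  % Well-foundedness of the subproblem relation (sketch of Eq. (5)):
  % every non-empty set of problems has a \prec-minimal element, which in
  % particular excludes cycles in the transitive closure of \prec.
  \forall S \subseteq \mathrm{PROB}.\;
    S \neq \emptyset \longrightarrow
    \exists m \in S.\; \neg\, \exists p \in S.\; p \prec m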
(Fig. 5: specification of the blackboard pattern's data types, ports, and interfaces: the subproblem relation ≺ : PROB × PROB, the function solve : PROB → SOL, the port types rp : PROB × ℘(PROB), cs : PROB × SOL, and prob : PROB, and the interfaces KS (a Subscriber with ports op, cs, rp, and ns) and BB (a Publisher with ports op, cs, rp, and ns).)
p : PROB    P : ℘(PROB)    p′ : PROB    s′ : SOL

(p′, s′) ∈ ns −→ ♦((p′, s′) ∈ cs)    (6)
(p, P) ∈ rp −→ ∀p′ ∈ P : ♦(p′ ∈ op)    (7)
p′ ∈ op −→ (p′ ∈ op) W ((p′, solve(p′)) ∈ cs)    (8)
receive currently open problems and solutions for all currently solved problems,
and two output ports rp and ns to communicate required subproblems and new
solutions. Thereby, port rp is specified to be an instance of a subscriber's nt port
and port cs to be an instance of a subscriber's sb port, respectively.
Component Types. A blackboard provides the current state towards solving the
original problem and forwards problems and solutions from knowledge sources.
Figure 6 provides a specification of the blackboard’s behavior in terms of three
behavior assertions:
– If a solution s to a subproblem p is received on its input port ns, then it is
eventually provided at its output port cs (Eq. 6).
– If, on its input port rp, it gets notified that solutions for some subproblems
P are required in order to solve a certain problem p, these problems are
eventually provided at its output port op (Eq. (7)).
– A problem p is provided at its output port op as long as it is not solved
(Eq. (8)).
Note that the last assertion (Eq. (8)) is formulated using a weak until operator,
which is defined as follows: γ′ W γ ≝ (□ γ′) ∨ (γ′ U γ).
A knowledge source receives open problems via op and provides solutions for
other problems via cs. It might contribute to the solution of the original problem
by solving currently open subproblems. Figure 7 provides a specification of the
knowledge source's behavior in terms of four behavior assertions:
– If a knowledge source (able to solve a problem pp) requires some subprob-
lems P to be solved in order to solve pp and it gets solutions for all these
subproblems p on its input port cs, then it eventually solves pp and provides
the solution on its output port ns (Eq. (9)).
– To solve a problem pp, a knowledge source requires solutions only for smaller
problems p ∈ P (Eq. (10)).
– A knowledge source will eventually communicate its ability to solve an open
problem pp via its output port rp (Eq. (11)).
– A knowledge source does not unsubscribe from receiving solutions for sub-
problems it required until it indeed received these solutions (Eq. (12)).
ks = KS⟨pp⟩    p : PROB    P : ℘(PROB)    p′ : PROB

∀(pp, P) ∈ rp : (∀p′ ∈ P : ♦((p′, solve(p′)) ∈ cs)) −→ ♦((pp, solve(pp)) ∈ ns)    (9)
∀(pp, P) ∈ rp : ∀p′ ∈ P : p′ ≺ pp    (10)
pp ∈ op −→ ♦(∃P : (pp, P) ∈ rp)    (11)
sub ks P ∈ rp −→ (¬∃P′ : p ∈ P′ ∧ unsub ks P′ ∈ rp) W ((p, solve(p)) ∈ cs)    (12)
(Specification of activation and connection constraints for the blackboard pattern over ks, ks′ : KS⟨pp⟩ and bb : BB: a knowledge source able to solve an open problem pp stays activated until it provides the corresponding solution on ns, and ports ks.op and bb.op as well as bb.ns and ks.ns are connected under the corresponding activation conditions.)
∃!c : ( c ) . (16)
an equivalent result as Eq. (16) for free. Moreover, we can use the additional
assertions imposed by the specification to come up with another property for the
publisher subscriber pattern, which guarantees that a subscriber indeed receives
all the messages for which it is subscribed:
c ∧ sub c E ∈ c.sb −→
((e, m) ∈ p.nt ∧ e ∈ E −→ (e, m) ∈ c.nt) W (unsub c E′ ∈ c.sb ∧ e ∈ E′) .    (17)
Note that the proof of the above property is based on Eq. (16) inherited from the
singleton pattern. Indeed, the hierarchical nature of FACTum allows for reuse
of verification results from instantiated patterns.
Blackboard. Again, the properties verified for singletons (Eq. (16)) as well as
the properties verified for publisher subscriber architectures (Eq. (17)) are inher-
ited for the blackboard specification. In the following, we use these properties
to verify another property for blackboard architectures: a blackboard pattern
guarantees that if, for each open (sub-)problem, there eventually exists a knowledge
source which is able to solve the corresponding problem:
∀p′ ∈ bb.op : ♦ ks⟨p′⟩ ,    (18)
then it is guaranteed that the architecture will eventually solve an overall prob-
lem, even if no single knowledge source is able to solve the problem on its own:
p′ ∈ bb.rp −→ ♦((p′, solve(p′)) ∈ bb.cs) .    (19)
5 Related Work
6 Conclusion
In this paper, we presented a novel approach for the specification and ver-
ification of architectural design patterns. To this end, we provide a methodology
and corresponding specification techniques for the specification of patterns in
terms of configuration traces. Then, we describe an algorithm to map a given
specification to a corresponding Isabelle/HOL theory and show soundness of
the algorithm. Our approach can be used to formally specify patterns in a hier-
archical way. Using the algorithm, the specification can then be mapped to a
corresponding Isabelle/HOL theory where the pattern can be verified using a
pre-existing calculus. This is demonstrated by specifying and verifying versions
of three architecture patterns: the singleton, the publisher subscriber, and the
blackboard. Thereby, patterns were specified hierarchically, and verification results
for lower-level patterns were reused for the verification of higher-level patterns.
The proposed approach addresses the challenges for pattern verification iden-
tified in the introduction as follows:
References
1. Taylor, R.N., Medvidovic, N., Dashofy, E.M.: Software Architecture: Foundations,
Theory, and Practice. Wiley Publishing, Chichester (2009)
2. Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-
Oriented Software Architecture: A System of Patterns. Wiley, West Sussex (1996)
3. Shaw, M., Garlan, D.: Software Architecture: Perspectives on an Emerging Disci-
pline, vol. 1. Prentice Hall, Englewood Cliffs (1996)
4. Wiedijk, F. (ed.): The Seventeen Provers of the World. LNCS (LNAI), vol. 3600.
Springer, Heidelberg (2006). https://doi.org/10.1007/11542384
5. Marmsoler, D., Gleirscher, M.: On activation, connection, and behavior in dynamic
architectures. Sci. Ann. Comput. Sci. 26(2), 187–248 (2016)
6. Marmsoler, D., Gleirscher, M.: Specifying properties of dynamic architectures using
configuration traces. In: Sampaio, A., Wang, F. (eds.) ICTAC 2016. LNCS, vol.
9965, pp. 235–254. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
46750-4 14
7. Marmsoler, D.: Dynamic architectures. Archive of Formal Proofs, pp. 1–65. Formal
proof development, July 2017
8. Marmsoler, D.: Towards a calculus for dynamic architectures. In: Hung, D., Kapur,
D. (eds.) ICTAC 2017. LNCS, vol. 10580. Springer, Cham (2017). https://doi.org/
10.1007/978-3-319-67729-3 6
9. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL: A Proof Assistant
for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://
doi.org/10.1007/3-540-45949-9
10. Gordon, M.J., Milner, A.J., Wadsworth, C.P.: Edinburgh LCF: A Mechanised Logic
of Computation. LNCS, vol. 78. Springer, Heidelberg (1979). https://doi.org/10.
1007/3-540-09724-4
11. Berghofer, S., Wenzel, M.: Inductive datatypes in HOL — lessons learned in formal-
logic engineering. In: Bertot, Y., Dowek, G., Théry, L., Hirschowitz, A., Paulin,
C. (eds.) TPHOLs 1999. LNCS, vol. 1690, pp. 19–36. Springer, Heidelberg (1999).
https://doi.org/10.1007/3-540-48256-3 3
12. Wenzel, M.: Type classes and overloading in higher-order logic. In: Gunter, E.L.,
Felty, A. (eds.) TPHOLs 1997. LNCS, vol. 1275, pp. 307–322. Springer, Heidelberg
(1997). https://doi.org/10.1007/BFb0028402
13. Wenzel, M.: Isabelle/Isar - a generic framework for human-readable proof docu-
ments. In: From Insight to Proof - Festschrift in Honour of Andrzej Trybulec vol.
10, no. 23, pp. 277–298 (2007)
14. Ballarin, C.: Locales and locale expressions in Isabelle/Isar. In: Berardi, S., Coppo,
M., Damiani, F. (eds.) TYPES 2003. LNCS, vol. 3085, pp. 34–50. Springer,
Heidelberg (2004). https://doi.org/10.1007/978-3-540-24849-1 3
15. Broy, M.: A logical basis for component-oriented software and systems engineering.
Comput. J. 53(10), 1758–1782 (2010)
16. Broy, M.: A model of dynamic systems. In: Bensalem, S., Lakhneck, Y., Legay,
A. (eds.) ETAPS 2014. LNCS, vol. 8415, pp. 39–53. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-642-54848-2 3
17. Marmsoler, D.: On the semantics of temporal specifications of component-behavior
for dynamic architectures. In: Eleventh International Symposium on Theoretical
Aspects of Software Engineering. Springer (2017)
18. Broy, M.: Algebraic specification of reactive systems. In: Wirsing, M., Nivat, M.
(eds.) AMAST 1996. LNCS, vol. 1101, pp. 487–503. Springer, Heidelberg (1996).
https://doi.org/10.1007/BFb0014335
19. Wirsing, M.: Algebraic specification. In: van Leeuwen, J. (ed.) Handbook of The-
oretical Computer Science, pp. 675–788. MIT Press, Cambridge (1990)
20. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems.
Springer, New York (1992). https://doi.org/10.1007/978-1-4612-0931-7
21. Wenzel, M., et al.: The Isabelle/Isar reference manual (2004)
22. Marmsoler, D.: Isabelle/HOL theories for the singleton, publisher subscriber, and
blackboard pattern. http://www.marmsoler.com/docs/FASE18
23. Allen, R.J.: A formal approach to software architecture. Technical report, DTIC
Document (1997)
24. Attie, P., Baranov, E., Bliudze, S., Jaber, M., Sifakis, J.: A general framework for
architecture composability. Form. Asp. Comput. 28(2), 207–231 (2016)
25. Mavridou, A., Baranov, E., Bliudze, S., Sifakis, J.: Architecture diagrams: a graph-
ical language for architecture style specification. In: Bartoletti, M., Henrio, L.,
Knight, S., Vieira, H.T. (eds.) Proceedings of the 9th Interaction and Concurrency
Experience. ICE 2016, Heraklion, 8–9 June 2016. EPTCS, vol. 223, pp. 83–97
(2016)
26. Mavridou, A., Baranov, E., Bliudze, S., Sifakis, J.: Configuration logics: mod-
elling architecture styles. In: Braga, C., Ölveczky, P.C. (eds.) FACS 2015. LNCS,
vol. 9539, pp. 256–274. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
28934-2 14
27. Kim, J.S., Garlan, D.: Analyzing architectural styles with alloy. In: Proceedings
of the ISSTA 2006 Workshop on Role of Software Architecture for Testing and
Analysis, pp. 70–80. ACM (2006)
28. Jackson, D.: Alloy: a lightweight object modelling notation. ACM Trans. Softw.
Eng. Methodol. (TOSEM) 11(2), 256–290 (2002)
29. Garlan, D.: Formal modeling and analysis of software architecture: components,
connectors, and events. In: Bernardo, M., Inverardi, P. (eds.) SFM 2003. LNCS,
vol. 2804, pp. 1–24. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-
540-39800-4 1
30. Wong, S., Sun, J., Warren, I., Sun, J.: A scalable approach to multi-style architec-
tural modeling and verification. In: Engineering of Complex Computer Systems,
pp. 25–34. IEEE (2008)
31. Zhang, J., Liu, Y., Sun, J., Dong, J.S., Sun, J.: Model checking software architec-
ture design. In: High-Assurance Systems Engineering, pp. 193–200. IEEE (2012)
32. Marmsoler, D., Degenhardt, S.: Verifying patterns of dynamic architectures using
model checking. In: Proceedings of the International Workshop on Formal Engi-
neering approaches to Software Components and Architectures, FESCA@ETAPS
2017, Uppsala, Sweden, 22 April 2017, pp. 16–30 (2017)
33. Wirsing, M., Eckhardt, J., Mühlbauer, T., Meseguer, J.: Design and analysis of
cloud-based architectures with KLAIM and Maude. In: Durán, F. (ed.) WRLA
2012. LNCS, vol. 7571, pp. 54–82. Springer, Heidelberg (2012). https://doi.org/10.
1007/978-3-642-34005-5 4
34. Fensel, D., Schnogge, A.: Using KIV to specify and verify architectures of
knowledge-based systems. In: Automated Software Engineering, pp. 71–80,
November 1997
35. Li, Y., Sun, M.: Modeling and analysis of component connectors in Coq. In:
Fiadeiro, J.L., Liu, Z., Xue, J. (eds.) FACS 2013. LNCS, vol. 8348, pp. 273–290.
Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07602-7 17
36. Arbab, F.: Reo: a channel-based coordination model for component composition.
Math. Struct. Comput. Sci. 14(03), 329–366 (2004)
37. Marmsoler, D.: Towards a theory of architectural styles. In: Proceedings of the
22nd ACM SIGSOFT International Symposium on Foundations of Software Engi-
neering - FSE 2014, pp. 823–825. ACM Press (2014)
38. Steinberg, D., Budinsky, F., Merks, E., Paternostro, M.: EMF: Eclipse Modeling
Framework. Pearson Education, London (2008)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Supporting Verification-Driven
Incremental Distributed Design
of Components
1 Introduction
Software is usually not a monolithic product: it is typically composed of multiple components that interact with each other to provide the desired functionality. Components themselves can be complex, requiring their own decomposition into sub-components. Hence, system design must follow a systematic approach,
based on a recursive decomposition strategy that yields a modular structure.
A good decomposition and a careful specification should allow components and
sub-components to be developed in isolation by different development teams,
delegated to third parties [32], or reused off-the-shelf.
In this context, guaranteeing correctness of the system under development
becomes particularly challenging because of the intrinsic tension between two
main requirements. On the one hand, to handle complexity, we need to enable
development of sub-components where only a partial view of the system is avail-
able [28]. On the other hand, we must ensure that independently developed and
verified (sub-)components can be composed to guarantee global correctness of
© The Author(s) 2018
A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 169–188, 2018.
https://doi.org/10.1007/978-3-319-89363-1_10
The p&d running example. The p&d system supports furniture purchase and delivery. It uses
two existing web services, which implement furniture-sale and delivery, as well as a component
that implements the user interface. These are modeled by the labeled transition systems shown
in Fig. 1a-1c. The p&d component under design is responsible for interaction with these com-
ponents, which form its execution environment. The overall system must ensure satisfaction of
the properties informally described in Fig. 1d.
(Fig. 1a–c: labeled transition systems of (a) the furniture-sale service, (b) the shipping service, and (c) the user component; diagrams omitted.)
P1: ship and product info are provided only if a request has been received.
P2: when user requests are processed, offers are considered only after users received information about the desired product.
P3: the furniture service is activated only if the user has decided to purchase.
P4: when a user request is cancelled by the p&d system, no user ack precedes the cancellation.
(Fig. 2: overview of the FIDDle approach — (1) component design with realizability, well-formedness, and model checks; (2) distributed sub-component development by design, synthesis, or reuse of existing sub-components, with well-formedness and substitutability checks; (3) integration into the final component; diagram omitted.)
2 Overview
FIDDle is a verification-driven environment supporting incremental and dis-
tributed component development. A high-level view of FIDDle is shown in Fig. 2.
FIDDle allows incrementally developing a component through a set of develop-
ment phases in which the human insight and experience are exploited (rounded
boxes labeled with a designer icon or a recycle symbol, to indicate design or reuse,
(Fig. 3e–g: (e) A sub-component for black-box state 2. (f) Another sub-component for black-box state 2. (g) Integration of the sub-component of Fig. 3e and the component of Fig. 3d. Diagrams omitted.)
product and shipping requests are performed. Finally, pre- and post-conditions
of the black-box state 5 specify that infoRcvd has occurred after the user request
and before entering the state, and both the product and the shipping requests
are cancelled when leaving the state. This model is checked using the provided
tools; since it passes all the checks, it can be used in the next phase of the
development.
The design team may choose to refine the component or distribute the devel-
opment of unspecified sub-components (represented by black box states) to other
(internal or external) development teams. In both cases, the sub-component can
be designed by only considering the contract of the corresponding black-box
state. Each team can develop the assigned sub-component or reuse existing com-
ponents.
Sub-component Development. This phase is identified in Fig. 2 with the
symbol 2 . Each team can design the assigned sub-component using any avail-
able technique, including manual design (left side), reusing of existing sub-
components (right side) or synthesizing new ones from the provided specifi-
cations (center). The only constraints are (1) given the stated pre-condition,
the sub-component has to satisfy its post-condition, and (2) the sub-component
should operate in the same environment as the overall partially specified compo-
nent. Sub-component development can itself be an iterative process, but neither
the model of the environment nor the overall properties of the system can be
changed during this process. Otherwise, the resulting sub-component cannot be
automatically integrated into the overall system.
In the p&d example, development of the sub-component for the black-box
state 2 is delegated to an external contractor. Candidate sub-components are
shown in Fig. 3e–f. In the former case, the component requests shipping info
details and waits until the shipping service provides the shipment cost and time.
Then it queries the furniture-sale service to obtain the product info. In the latter
case, the shipping and the furniture services are queried, but the sub-component
does not wait for an answer from the furniture-sale. Since these candidates are
fully defined, the well-formedness check is not needed. Yet, the substitutability
checking confirms that of these, only the sub-component in Fig. 3e satisfies the
post-condition in Fig. 3b.
Sub-component Integration. This phase is identified in Fig. 2 with the sym-
bol 3 . FIDDle guarantees that if each sub-component is developed correctly
w.r.t. the contract of the corresponding black-box state, the component obtained
by integrating the sub-components is also correct. In the p&d example, the sub-
component in Fig. 3e passes the substitutability check and can be a valid imple-
mentation of the black-box state 2 in Fig. 3d. Their integration is shown in
Fig. 3g.
3 Preliminaries
The model of the environment and the properties of interest are expressed using
Labelled Transition Systems and Fluent Linear Time Temporal Logic.
Model of the Environment. Let Act be the universal set of observable events and let Actτ = Act ∪ {τ}, where τ denotes an unobservable local event. A Labeled Transition System (LTS) [20] is a tuple A = ⟨Q, q0, αA, Δ⟩, where Q is the set of states, q0 ∈ Q is the initial state, αA ⊆ Act is a finite set of events, and Δ ⊆ Q × (αA ∪ {τ}) × Q is the transition relation. The parallel composition operation ∥ is defined as usual (see for example [14]).
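To make the definition concrete, the following Python sketch represents an LTS and one common (CSP/FSP-style) reading of the usual parallel composition; the names (LTS, parallel, TAU) and the toy transition systems at the end are our own illustrations, not part of FIDDle or LTSA.

from dataclasses import dataclass

TAU = "tau"  # the unobservable local event

@dataclass
class LTS:
    states: set          # Q
    q0: object           # initial state
    alphabet: frozenset  # finite set of observable events (tau excluded)
    delta: set           # transitions as (source, event, target) triples

def parallel(a, b):
    """Synchronous product: shared observable events synchronise,
    all other events (including tau) interleave."""
    shared = a.alphabet & b.alphabet
    q0 = (a.q0, b.q0)
    states, delta, todo = {q0}, set(), [q0]
    while todo:
        p, q = todo.pop()
        succs = [((t, q), e) for (s, e, t) in a.delta if s == p and e not in shared]
        succs += [((p, t), e) for (s, e, t) in b.delta if s == q and e not in shared]
        succs += [((t1, t2), e) for (s1, e, t1) in a.delta if s1 == p and e in shared
                                for (s2, f, t2) in b.delta if s2 == q and f == e]
        for target, e in succs:
            delta.add(((p, q), e, target))
            if target not in states:
                states.add(target)
                todo.append(target)
    return LTS(states, q0, a.alphabet | b.alphabet, delta)

# Tiny illustrative components (not the ones of Fig. 1):
user = LTS({1, 2}, 1, frozenset({"userReq", "respOk"}),
           {(1, "userReq", 2), (2, "respOk", 1)})
env = LTS({1, 2}, 1, frozenset({"userReq"}), {(1, "userReq", 2)})
print(len(parallel(user, env).delta))  # 2: userReq synchronises, respOk interleaves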
Properties. A fluent [33] Fl is a tuple ⟨IFl, TFl, InitFl⟩, where IFl ⊂ Act, TFl ⊂ Act, IFl ∩ TFl = ∅ and InitFl ∈ {true, false}. A fluent may be true or false. A fluent is true if it has been initialized by an event i ∈ IFl at an earlier time point (or if it was initially true, that is, InitFl = true) and has not yet been terminated by another event t ∈ TFl; otherwise, it is false. For example, consider the LTS in Fig. 1c and the fluent F_ReqPend = ⟨{userReq}, {respOk, reqCanc}, false⟩. F_ReqPend holds in a trace of the LTS from the moment at which userReq occurs and until a transition labeled with respOk or reqCanc is fired. In the following, we use the notation F_Event to indicate a fluent that is true when the event with label event occurs.
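A minimal Python sketch of this fluent semantics, under the simplifying convention (ours) that the fluent is evaluated right after the event at each trace position, and over a purely illustrative trace:

def fluent_holds(trace, i, init_events, term_events, initially):
    """True iff the fluent <I, T, Init> holds after the i-th event of the trace:
    it was initially true and not yet terminated, or it was initiated at some
    position j <= i and not terminated since."""
    value = initially
    for event in trace[: i + 1]:
        if event in init_events:
            value = True
        elif event in term_events:
            value = False
    return value

# F_ReqPend from the example above
trace = ["userReq", "prodInfoReq", "respOk", "userReq"]
print([fluent_holds(trace, i, {"userReq"}, {"respOk", "reqCanc"}, False)
       for i in range(len(trace))])  # [True, True, False, True]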
An FLTL formula is obtained by composing fluents with standard LTL operators: ◯ (next), ♦ (eventually), □ (always), U (until) and W (weak until). For example, FLTL encodings of the properties P1, P2, P3 and P4 are shown in Fig. 3a.
Satisfaction of FLTL formulae can be evaluated over finite and infinite traces, by first constructing an FLTL interpretation of the (finite or infinite) trace and then evaluating the FLTL formulae over this interpretation. The FLTL interpretation of a finite trace is obtained by slightly changing the interpretation of infinite traces. The evaluation of an FLTL formula on a finite trace then follows the standard interpretation of LTL operators over finite traces (see [13]). In the following, we assume that Definitions 5 and 4 (available in the Appendix) are used to evaluate whether an FLTL formula is satisfied on finite and infinite traces, respectively.
This section introduces a novel formalism for modeling and refining components.
We define the notion of a partial LTS and then extend it with pre- and post-
conditions.
Partial LTS. A partial LTS is an LTS where some states are “regular” and others are “black-box”. Black-box states model portions of the component whose behavior still has to be specified. Each black-box state is augmented with an interface that specifies the universe of events that can occur in the black-box. A Partial LTS (PLTS) is a structure P = ⟨A, R, B, σ⟩, where: A = ⟨Q, q0, αA, Δ⟩ is an LTS; Q is the set of states, s.t. Q = R ∪ B and R ∩ B = ∅; R is the set of regular states; B is the set of black-box states; σ : B → 2^αA is the interface.
An LTS is a PLTS where the set of black-box states is empty. The PLTS in
Fig. 3d is defined over the regular states 1 and 3, and the black-box states 2,
obtained from R. The contracts of black-box states 4 and 5 are the same as
those in Fig. 3b.
5 Verification Algorithms
In this section, we describe the algorithms for the analysis of partial components,
which we have implemented on top of LTSA [25].
Checking Realizability. Realizability of a property φ is checked via the following procedure. Let E be the environment of the partial component C, and C^B be the LTS resulting from removing all black-box states and their incoming and outgoing transitions from C. Check C^B ∥ E |= φ. If φ is not satisfied, the component is not realizable: no matter how the black-box states are specified, there will be a behavior of the system that does not satisfy φ. Otherwise, compute C ∥ E (as specified in Definition 1) and model-check it against ¬φ. If the property ¬φ is satisfied, the component is not realizable. Indeed, all the behaviors of C ∥ E satisfy ¬φ, i.e., there is no behavior that the component can exhibit to satisfy φ. Otherwise, the component may be realizable. For example, the realizability checker shows that it is possible to realize a component refining the one shown in Fig. 3c while satisfying property P2. Specifically, it returns a trace that ensures that after a userReq event, the offer is provided to the user (the event offerRcvd) only if the furniture service has confirmed the availability of the requested product (the event infoRcvd).
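A compact sketch of this two-check procedure is given below; remove_blackboxes, compose (Definition 1), model_check, and negate are hypothetical placeholders for the LTSA-based machinery the approach relies on, not actual APIs.

def check_realizability(C, E, phi, *, remove_blackboxes, compose, model_check, negate):
    """Realizability check as described above (sketch)."""
    C_B = remove_blackboxes(C)                  # drop black-box states and their transitions
    if not model_check(compose(C_B, E), phi):
        return "not realizable"                 # phi fails no matter how the black boxes are refined
    if model_check(compose(C, E), negate(phi)):
        return "not realizable"                 # every behavior of C || E violates phi
    return "may be realizable"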
post(bi) is transformed into an equivalent LTS, called LTS_bi, using the procedure in [37]. Since LTS_bi has traces of the form π·{end}^ω, it has a state s with an end-labelled self-loop. This self-loop is removed, and s is considered as the final state of LTS_bi. All other end-labeled transitions are replaced by τ-transitions. Each automaton LTS_bi contains all the traces that do not violate the corresponding post-condition.
(2) Integrate the LTSs of all the black-box states bi ≠ b. For every black-box state bi ≠ b, eliminate bi and add LTS_bi to C by replacing every incoming transition of bi with a transition whose destination is the initial state of LTS_bi, and every outgoing transition of bi with a transition whose source is the final state of LTS_bi. This step creates an LTS which encodes all the traces of the component that do not violate any post-conditions of its black-box states.
(3) Integrate the LTS of the black-box state b. Integrate LTS_b into C together with two additional states, q1 and q2, calling the resulting model C′. Replace every incoming transition of b by a transition with destination q1. Replace every outgoing transition of b by a transition whose source is the final state of LTS_b. Add a transition labeled with τ from q1 to the initial state of LTS_b. Add a self-loop labeled with an event end to q2. Add a τ-transition from q1 to q2. The obtained LTS C′ encodes all the valid traces of the system. When a valid trace reaches the black-box state b, C′ can enter state q2 from which only the end-labelled self-loop is available.
(4) Verify. Recall that the precondition pre(b) of b is defined over finite traces, i.e., those that reach the initial state of the sub-component to be substituted for b. To use standard verification procedures, we transform pre(b) into an equivalent formula, pre(b)′, over infinite traces. This transformation, specified in [13], ensures that every trace of the form π·{end}^ω satisfies pre(b)′ iff π satisfies pre(b). By construction in step 3 above, C′ ∥ E has a valid trace of this form which is generated when C′ ∥ E reaches the initial state of the LTS LTS_b associated with the black-box state b of C. To check the pre-condition, we verify whether C′ ∥ E |= pre(b)′ using traditional model checking.
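Assuming the transformations of steps (1)–(3) and a standard model checker are available as helpers, the whole check for the pre-condition of a black-box state b can be sketched as follows; every function passed in, as well as the precondition/postcondition attribute names, is a hypothetical stand-in for the corresponding step, not an actual FIDDle API.

def check_precondition(C, E, b, blackboxes, *, fltl_to_lts, replace_blackbox,
                       add_end_stage, lift_to_infinite, compose, model_check):
    """Well-formedness check of pre(b): sketch of steps (1)-(4) above."""
    # Steps (1)-(2): replace every other black-box state bi by the LTS of its
    # post-condition (built with the procedure of [37], end self-loop removed).
    for bi in blackboxes:
        if bi is not b:
            C = replace_blackbox(C, bi, fltl_to_lts(bi.postcondition))
    # Step (3): route b through fresh states q1/q2 with an end-labelled
    # self-loop, obtaining C'.
    C_prime = add_end_stage(C, b)
    # Step (4): lift pre(b) to infinite traces and model check C' || E.
    return model_check(compose(C_prime, E), lift_to_infinite(b.precondition))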
In the p&d example, if we remove the clause F_InfoRcvd from the post-condition of the black-box state 2, the p&d component is not well-formed since the pre-condition of state 4 is violated. The counterexample shows a trace that reaches the black-box state 4 in which an event userReq is not followed by infoRcvd. Adding F_InfoRcvd to the post-condition of state 2 solves the problem.
If we model check the partial design of Fig. 3d and assume that the black-box state 2 is not associated with any post-condition, the model checker returns the counterexample ⟨userReq, τ, offerRcvd⟩ for property P2, since the sub-component that will replace the black-
box state 2 is not forced to ask to book the furniture service. Adding the post-
condition in Fig. 3b solves the problem.
Theorem 4. The model checking procedure returns true iff every valid trace of C ∥ E satisfies φ.
6 Evaluation
We aim to answer two questions: RQ.1: How effective is FIDDle w.r.t. support-
ing an iterative, distributed development of correct components? (Sect. 6.1) and
RQ.2: How scalable is the automated part of the proposed approach? (Sect. 6.2).
#EnvStates   E1: (Tw)/(Tm) for #CompStates =                 E2: (Ts)/(Tm) for #CompStates =
             10    50    100   250   500   750   1000       10     50    100   250   500   750   1000
10           1.45  1.26  1.51  1.29  1.42  1.43  1.31       2.20   4.37  2.18  1.50  2.19  1.62  1.62
100          1.15  1.25  1.50  1.08  0.88  1.02  2.33       3.51   4.66  3.61  2.80  3.18  1.96  2.73
1000         1.39  1.23  0.60  1.44  4.90  1.00  2.83       13.98  8.12  3.84  2.64  2.83  2.91  2.00
7 Related Work
8 Conclusion
References
1. Alur, R., Henzinger, T.A.: Reactive modules. Form. Methods Syst. Des. 15(1), 7–48 (1999)
2. Alur, R., Moarref, S., Topcu, U.: Pattern-Based Refinement of Assume-Guarantee
Specifications in Reactive Synthesis. In: Baier, C., Tinelli, C. (eds.) TACAS 2015.
LNCS, vol. 9035, pp. 501–516. Springer, Heidelberg (2015). https://doi.org/10.
1007/978-3-662-46681-0 49
3. Alur, R., Moarref, S., Topcu, U.: Compositional synthesis of reactive controllers
for multi-agent systems. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS,
vol. 9780, pp. 251–269. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
41540-6 14
4. Alur, R., Yannakakis, M.: Model checking of hierarchical state machines. ACM
SIGSOFT Softw. Eng. Notes 23(6), 175–188 (1998)
5. Amalfitano, D., Fasolino, A.R., Tramontana, P.: Reverse engineering finite state
machines from rich internet applications. In: Proceedings of the 15th Working
Conference on Reverse Engineering, pp. 69–73 (2008)
6. Bensalem, S., Bozga, M., Krichen, M., Tripakis, S.: Testing conformance of real-
time applications by automatic generation of observer. In: Proceedings of RV,
Electronic Notes in Theoretical Computer Science, pp. 23–43 (2004)
7. Bernasconi, A., Menghi, C., Spoletini, P., Zuck, L.D., Ghezzi, C.: From model
checking to a temporal proof for partial models. In: Cimatti, A., Sirjani, M. (eds.)
SEFM 2017. LNCS, vol. 10469, pp. 54–69. Springer, Cham (2017). https://doi.
org/10.1007/978-3-319-66197-1 4
8. Bruns, G., Godefroid, P.: Model checking partial state spaces with 3-valued tem-
poral logics. In: Halbwachs, N., Peled, D. (eds.) CAV 1999. LNCS, vol. 1633, pp.
274–287. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48683-6 25
9. Chaki, S., Clarke, E.M., Sharygina, N., Sinha, N.: Verification of evolving software
via component substitutability analysis. Formal Methods Softw. Des. 32(3), 235–
266 (2008)
10. Chechik, M., Devereux, B., Easterbrook, S., Gurfinkel, A.: Multi-valued symbolic
model-checking. ACM Trans. Softw. Eng. Methodol. 12(4), 371–408 (2003)
11. Ciolek, D., Braberman, V.A., D’Ippolito, N., Uchitel, S.: Technical Report:
Directed Controller Synthesis of Discrete Event Systems. CoRR, abs/1605.09772
(2016)
12. Cobleigh, J.M., Giannakopoulou, D., Păsăreanu, C.S.: Learning assumptions for
compositional verification. In: Garavel, H., Hatcliff, J. (eds.) TACAS 2003. LNCS,
vol. 2619, pp. 331–346. Springer, Heidelberg (2003). https://doi.org/10.1007/3-
540-36577-X 24
13. De Giacomo, G., De Masellis, R., Montali, M.: Reasoning on LTL on finite traces:
insensitivity to infiniteness. In: Proceedings of AAAI, pp. 1027–1033 (2014)
14. D’Ippolito, N., Braberman, V., Piterman, N., Uchitel, S.: Synthesising non-anomalous event-based controllers for liveness goals. ACM Trans. Softw. Eng. Methodol. 22, 9 (2013)
15. D’Ippolito, N., Braberman, V., Piterman, N., Uchitel, S.: Controllability in partial
and uncertain environments. In: Proceedings of ACSD, pp. 52–61. IEEE (2014)
16. Dwyer, M.B., Avrunin, G.S., Corbett, J.C.: Property specification patterns for
finite-state verification. In: Proceedings of FMSP, pp. 7–15. ACM (1998)
17. Giannakopoulou, D., Pasareanu, C.S., Barringer, H.: Assumption generation for
software component verification. In: Proceedings of ASE, pp. 3–12. IEEE (2002)
18. Giannakopoulou, D., Păsăreanu, C.S., Barringer, H.: Component verification with
automatically generated assumptions. J. Autom. Softw. Eng. 12(3), 297–320 (2005)
19. Jones, C.B.: Tentative steps toward a development method for interfering pro-
grams. ACM Trans. Program. Lang. Syst. 5(4), 596–619 (1983)
20. Keller, R.M.: Formal verification of parallel programs. Commun. ACM 19(7), 371–
384 (1976)
21. Larsen, K.G., Thomsen, B.: A modal process logic. In: Proceedings of LICS, pp.
203–210. IEEE (1988)
22. Levy, L.S.: Taming the Tiger: Software Engineering and Software Economics.
Springer Books on Professional Computing Series. Springer-Verlag, New York
(1987). https://doi.org/10.1007/978-1-4612-4718-0
23. Li, W., Dworkin, L., Seshia, S.A.: Mining assumptions for synthesis. In: Proceed-
ings of ACM/IEEE MEMPCODE, pp. 43–50 (2011)
24. Lorenzoli, D., Mariani, L., Pezzè, M.: Automatic generation of software behavioral
models. In: Proceedings of ICSE, pp. 501–510 (2008)
25. Magee, J., Kramer, J.: State Models and Java Programs. Wiley, New York (1999)
26. Menghi, C., Spoletini, P., Ghezzi, C.: Dealing with incompleteness in automata-
based model checking. In: Fitzgerald, J., Heitmeyer, C., Gnesi, S., Philippou, A.
(eds.) FM 2016. LNCS, vol. 9995, pp. 531–550. Springer, Cham (2016). https://
doi.org/10.1007/978-3-319-48989-6 32
27. Menghi, C., Spoletini, P., Ghezzi, C.: Integrating goal model analysis with iterative
design. In: Grünbacher, P., Perini, A. (eds.) REFSQ 2017. LNCS, vol. 10153, pp.
112–128. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54045-0 9
28. Nivoit, J.-B.: Issues in strategic management of large-scale software product line
development. Master’s thesis, MIT, USA (2013)
29. Pistore, M., Barbon, F., Bertoli, P., Shaparau, D., Traverso, P.: Planning and
monitoring web service composition. In: Bussler, C., Fensel, D. (eds.) AIMSA 2004.
LNCS (LNAI), vol. 3192, pp. 106–115. Springer, Heidelberg (2004). https://doi.
org/10.1007/978-3-540-30106-6 11
30. Pnueli, A.: In transition from global to modular temporal reasoning about pro-
grams. In: Apt, K.R. (ed.) Logics and Models of Concurrent Systems. NATO ASI
Series, pp. 123–144. Springer-Verlag, New York Inc (1985). https://doi.org/10.
1007/978-3-642-82453-1 5
31. Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: Proceedings of
POPL, pp. 179–190. ACM (1989)
32. Pretschner, A., Broy, M., Kruger, I.H., Stauner, T.: Software engineering for auto-
motive systems: a roadmap. In: Proceedings of FOSE, pp. 55–71. IEEE Computer
Society (2007)
33. Sandewall, E.: Features and Fluents (Vol. 1): The Representation of Knowledge
about Dynamical Systems. Oxford University Press Inc, New York (1995)
34. Sibay, G.E., Uchitel, S., Braberman, V., Kramer, J.: Distribution of modal tran-
sition systems. In: Giannakopoulou, D., Méry, D. (eds.) FM 2012. LNCS, vol.
7436, pp. 403–417. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-
642-32759-9 33
35. Software Measurement Services Ltd. “small project”, “medium-size project”, and
“large project”: What do these terms mean? (2004). http://www.totalmetrics.com/
function-points-downloads/Function-Point-Scale-Project-Size.pdf
36. Solar-Lezama, A.: Program synthesis by sketching. Ph.D. thesis. University of Cal-
ifornia, Berkeley (2008)
37. Uchitel, S., Brunet, G., Chechik, M.: Synthesis of partial behavior models from
properties and scenarios. IEEE Trans. Softw. Eng. 35(3), 384–406 (2009)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Summarizing Software API Usage
Examples Using Clustering Techniques
1 Introduction
Third-party libraries and frameworks are an integral part of current software
systems. Access to the functionality of a library is typically offered by its API,
which may consist of numerous classes and methods. However, as noted by mul-
tiple studies [24,30], APIs often lack proper examples and documentation and,
in general, sufficient explanation on how to be used. Thus, developers often
use general-purpose or specialized code search engines (CSEs), and Question-
Answering (QA) communities, such as Stack Overflow, in order to find possible
API usages. However, the search process in these services can be time consuming
© The Author(s) 2018
A. Russo and A. Schürr (Eds.): FASE 2018, LNCS 10802, pp. 189–206, 2018.
https://doi.org/10.1007/978-3-319-89363-1_11
[13], while the source code snippets provided in web sites and QA communities
might be difficult to recognise, ambiguous, or incomplete [28,29].
As a result, several researchers have studied the problem of API usage min-
ing, which can be described as automatically identifying a set of patterns that
characterize how an API is typically used from a corpus of client code [11]. There
are two main types of API mining methods. First are methods that return API
call sequences, using techniques such as frequent sequence mining [31–33], clus-
tering [25,31,33], and probabilistic modeling [9]. Though interesting, API call
sequences do not always describe important information like method arguments
and control flow, and their output cannot be directly included in one’s code.
A second class of approaches automatically produces source code snippets
which, compared to API call sequences, provide more information to the devel-
oper, and are more similar to human-written examples. Methods for mining
snippets, however, tend to rely on detailed semantic analysis, including program
slicing [5,13–15] and symbolic execution [5], which can make them more difficult
to deploy to new languages. Furthermore, certain approaches do not use any
clustering techniques, thus resulting in a redundant and non-diverse set of API source code snippets [20], which is not representative as it only uses a few API
methods as noted by Fowkes and Sutton [9]. On the other hand, approaches
that do use clustering techniques are usually limited to their choice of clustering
algorithms [34] and/or use feature sets that are language-specific [13–15].
In this paper, we propose CLAMS (Clustering for API Mining of Snip-
pets), an approach for mining API usage examples that lies between snippet
and sequence mining methods, which ensures lower complexity and thus could
apply more readily to other languages. The basic idea is to cluster a large set
of usage examples based on their API calls, generate summarized versions for
the top snippets of each cluster, and then select the most representative snippet
from each cluster, using a tree edit distance metric on the ASTs. This results in a
diverse set of examples in the form of concise and readable source code snippets.
Our method is entirely data-driven, requiring only syntactic information from
the source code, and so could be easily applied to other programming languages.
We evaluate CLAMS on a set of popular libraries, where we illustrate how its
results are more diverse in terms of API methods than those of other approaches,
and assess to what extent the snippets match human-written examples.
2 Related Work
Several studies have pointed out the importance of API documentation in the
form of examples when investigating API usability [18,22] and API adoption in
cases of highly evolving APIs [16]. Different approaches have thus been presented
to find or create such examples; from systems that search for examples on web
pages [28], to ones that mine such examples from client code located in source
code repositories [5], or even from video tutorials [23]. Mining examples from
client source code has been a typical approach for Source Code-Based Recom-
mendation Systems (SCoReS ) [19]. Such methods are distinguished according
to their output which can be either source code snippets or API call sequences.
One of the first systems to mine API usage patterns is MAPO [32] which employs
frequent sequence mining [10] to identify common usage patterns. Although the
latest version of the system outputs the API call sequences along with their asso-
ciated snippets [33], it is still more of a sequence-based approach, as it presents
the code of the client method without performing any summarization, while it
also does not consider the structure of the source code snippets.
Wang et al. [31] argue that MAPO outputs a large number of usage patterns,
many of which are redundant. The authors therefore define scalability, succinct-
ness and high-coverage as the required characteristics of an API miner and
construct UP-Miner, a system that mines probabilistic graphs of API method
calls and extracts more useful patterns than MAPO. However, the presentation
of such graphs can be overwhelming when compared to ranked lists.
Recently, Fowkes and Sutton [9] proposed a method for mining API usage
patterns called PAM, which uses probabilistic machine learning to mine a less
redundant and more representative set of patterns than MAPO or UP-Miner.
This paper also introduced an automated evaluation framework, using handwrit-
ten library usage examples from Github, which we adapt in the present work.
A typical snippet mining system is eXoaDocs [13–15] that employs slicing tech-
niques to summarize snippets retrieved from online sources into useful documen-
tation examples, which are further organized using clustering techniques. How-
ever, clustering is performed using semantic feature vectors approximated by
the Deckard tool [12], and such features are not straightforward to extract
for different programming languages. Furthermore, eXoaDocs only targets usage
examples of single API methods, as its feature vectors do not include information
for mining frequent patterns with multiple API method calls.
APIMiner [20] introduces a summarization algorithm that uses slicing to
preserve only the API-relevant statements of the source code. Further work by
the same authors [4] incorporates association rule techniques, and employs an
improved version of the summarization algorithm, with the aim of resolving
variable types and adding descriptive comments. Yet the system does not cluster
similar examples, while most examples show the usage of a single API method.
Even when slicing is employed in the aforementioned systems, the examples
often contain extraneous statements (i.e. statements that could be removed as
they are not related to the API), as noted by Buse and Weimer [5]. Hence,
the authors introduce a system that synthesizes representative and well-typed
usage examples using path-sensitive data flow analysis, clustering, and pattern
abstraction. The snippets are complete and abstract, including abstract naming
and helpful code, such as try/catch statements. However, the sophistication of
their program analysis makes the system more complex [31], and increases the
required effort for applying it to new programming languages.
Allamanis and Sutton [1] present a system for mining syntactic idioms, which
are syntactic patterns that recur frequently and are closely related to snippets,
and thus many of their mined patterns are API snippets. That method is lan-
guage agnostic, as it relies only on ASTs, but uses a sophisticated statistical
method based on Bayesian probabilistic grammars, which limits its scalability.
Although the aforementioned approaches can be effective in certain scenarios,
they also have several drawbacks. First, most systems output API call sequences
or other representations (e.g. call graphs), which may not be as helpful as snip-
pets, both in terms of understanding and from a reuse perspective (e.g. adapting
an example to fit one’s own code). Several of the systems that output snippets
do not group them into clusters and thus they do not provide a diverse set of
usage examples, and even when clustering is employed, the set of features may
not allow extending the approaches in other programming languages. Finally,
certain systems do not provide concise and readable snippets as their source
code summarization capabilities are limited.
In this work, we present a novel API usage mining system, CLAMS, to over-
come the above limitations. CLAMS employs clustering to group similar snippets
and the output examples are subsequently improved using a summarization algo-
rithm. The algorithm performs heuristic transformations, such as variable type
resolution and replacement of literals, while it also removes non-API statements,
in order to output concise and readable snippets. Finally, the snippets are ranked
in descending order of support and given along with comprehensive comments.
3 Methodology
The architecture of the system is shown in Fig. 1. The input for each library is a
set of Client Files and the API of the library. The API Call Extractor generates
a list of API call sequences from each method. The Clustering Preprocessor
computes a distance matrix of the sequences, which is used by the Clustering
Engine to cluster them. After that, the top (most representative) sequences from
each cluster are selected (Clustering Postprocessor ). The source code and the
ASTs (from the AST Extractor ) of these top snippets are given to the Snippet
Generator that generates a summarized snippet for each of them. Finally, the
Snippet Selector selects a single snippet from each cluster, and the output is
given by the Ranker that ranks the examples in descending order of support.
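Read as code, the pipeline of Fig. 1 boils down to the following Python sketch; each helper passed as an argument is a placeholder for the corresponding component named in the text (extractor, clustering engine, summarizer, selector, ranker), not an actual CLAMS API.

def mine_usage_examples(client_files, api, *, extract_sequences, distance_matrix,
                        cluster, top_snippets, summarize, select_medoid, rank):
    """End-to-end sketch of the pipeline."""
    sequences = extract_sequences(client_files, api)       # API Call Extractor
    clusters = cluster(distance_matrix(sequences))          # Clustering Preprocessor + Engine
    examples = []
    for c in clusters:                                       # Clustering Postprocessor
        candidates = [summarize(s) for s in top_snippets(c, n=5)]  # Snippet Generator
        examples.append(select_medoid(candidates))           # Snippet Selector
    return rank(examples)                                    # Ranker (descending support)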
The Preprocessing Module receives as input the client source code files and
extracts their ASTs and their API call sequences. The AST Extractor employs
srcML [8] to convert source code to an XML AST format, while the API Call
Extractor extracts the API call sequences using the extractor provided by Fowkes
and Sutton [9] which uses the Eclipse JDT parser to extract method calls using
depth-first AST traversal.
Fig. 2. The sample client code on the left side contains the same API calls as the
client code on the right side, which are encircled in both snippets.
LCSdist(S1, S2) = 1 − 2 · |LCS(S1, S2)| / (|S1| + |S2|)    (1)
where |S1 | and |S2 | are the lengths of S1 and S2 , and |LCS (S1 , S2 )| is the length
of their LCS. Given the distance matrix, the Clustering Engine explores the k-
medoids algorithm which is based on the implementation provided by Bauckhage
[3], and the hierarchical version of DBSCAN, known as HDBSCAN [7], which
makes use of the implementation provided by McInnes et al. [17].
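Equation (1) and the distance matrix handed to the clustering engine can be computed as in the following sketch; the example sequences are purely illustrative.

def lcs_length(s1, s2):
    """Length of the longest common subsequence (dynamic programming)."""
    table = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, a in enumerate(s1, 1):
        for j, b in enumerate(s2, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if a == b else max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]

def lcs_distance(s1, s2):
    """Distance of Eq. (1)."""
    if not s1 and not s2:
        return 0.0
    return 1.0 - 2.0 * lcs_length(s1, s2) / (len(s1) + len(s2))

seqs = [["Paging.<init>", "Status.getUser", "Status.getText"],
        ["Status.getUser", "Status.getText"]]
matrix = [[lcs_distance(a, b) for b in seqs] for a in seqs]
print(matrix[0][1])  # 0.2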
The next step is to retrieve the source code associated with the most rep-
resentative sequence of each cluster (Clustering Postprocessor ). Given, however,
that each cluster may contain several snippets that are identical with respect to
their sequences, we select multiple snippets for each cluster, thereby retaining source code structure information, which will be useful for selecting a single
snippet (see Sect. 3.5). Our analysis showed that selecting all possible snippets did not further improve the results; we therefore select n snippets per cluster and set n to 5 in our experiments, as higher values did not affect the results.
The Snippet Generator generates a summarized version for the top snippets.
Our summarization method, a static, flow-insensitive, intra-procedural slicing
approach, is presented in Fig. 3. The input (Fig. 3, top left) is the snippet source
code, the list of its invoked API calls and a set of variables defined in its outer
scope (encircled and highlighted in bold respectively).
At first, any comments are removed and literals are replaced by their srcML
type, i.e. string, char, number or boolean (Step 1 ). In Step 2, the algorithm
creates two lists, one for API and one for non-API statements (highlighted in
bold), based on whether an API method is invoked or not in each statement. Any
control flow statements that include API statements in their code block are also
retained (e.g. the else statement in Fig. 3). In Step 3, the algorithm creates a list
with all the variables that reside in the local scope of the snippet (highlighted
in bold). This is followed by the removal of all non-API statements (Step 4 ), by
traversing the AST in reverse (bottom-up) order.
In Step 5, the list of declared variables is filtered, and only those used in
the summarized tree are retained (highlighted in bold). Moreover, the algorithm
creates a list with all the variables that are declared in API statements and used
only in non-API statements (encircled). In Step 6, the algorithm adds declara-
tions (encircled) for the variables retrieved in Step 5. Furthermore, descriptive
comments of the form “Do something with variable” (highlighted in bold) are
added for the variables that are declared in API statements and used in non-API
statements (retrieved also in Step 5). Finally, the algorithm adds “Do something”
comments in any empty blocks (highlighted in italics).
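A highly simplified, flat version of Steps 2–6 is sketched below. Each statement is assumed to be pre-annotated with the information the real algorithm reads off the srcML AST (whether it invokes the API, which variables it defines and uses); nested control-flow blocks and type resolution, which the real algorithm handles, are deliberately ignored here, so this is only an approximation of the described procedure.

def summarize(statements, outer_scope):
    """Flat sketch of Steps 2-6; each statement is a dict
    {'code': str, 'api': bool, 'defines': set, 'uses': set}."""
    kept = [s for s in statements if s['api']]                                 # Steps 2 and 4
    local = {v for s in statements for v in s['defines']} - set(outer_scope)   # Step 3
    declared_in_kept = {v for s in kept for v in s['defines']}
    used_in_kept = {v for s in kept for v in s['uses']}
    missing_decls = (used_in_kept & local) - declared_in_kept                  # Step 5
    dangling = {v for v in declared_in_kept                                    # Step 5
                if v not in used_in_kept
                and any(v in s['uses'] for s in statements if not s['api'])}
    out = [f"SomeType {v};  // declaration added (type resolution omitted)"
           for v in sorted(missing_decls)]                                     # Step 6
    out += [s['code'] for s in kept]
    out += [f"// Do something with {v}" for v in sorted(dangling)]             # Step 6
    return out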
Finally, note that our approach is considerably simpler than static, syntax-preserving
slicing. E.g., static slicing would not remove any of the statements inside the
else block, as the call to the getFromUser API method is assigned to a variable
(userName), which is then used in the assignment of user. Our approach, on the
other hand, performs a single pass over the AST, thus ensuring lower complexity,
which in its turn reduces the overall complexity of our system.
The next step is to select a single snippet for each cluster. Given that the selected
snippet has to be the most representative of the cluster, we select the one that
is most similar to the other top snippets. The score between any two snippets is
defined as the tree edit distance between their ASTs, computed using the AP-
TED algorithm [21]. Given this metric, we create a matrix for each cluster, which
contains the distance between any two top snippets of the cluster. Finally, we
select the snippet with the minimum sum of distances in each cluster’s matrix.
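The selection step thus amounts to picking the medoid of each cluster's top snippets under tree edit distance; in the sketch below, tree_edit_distance stands in for the AP-TED implementation used by the system.

def select_representative(snippet_asts, tree_edit_distance):
    """Return the index of the snippet whose AST minimises the summed
    tree edit distance to the other top snippets of the cluster."""
    def total(i):
        return sum(tree_edit_distance(snippet_asts[i], other)
                   for j, other in enumerate(snippet_asts) if j != i)
    return min(range(len(snippet_asts)), key=total)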
3.6 Ranker
We rank the snippets according to the support of their API call sequences, as
in [9]. Specifically, if the API call sequence of a snippet is a subsequence of the sequence of a file in the repository, then we say that the file supports the snippet.
For example, the snippet with API call sequence [twitter4j.Status.getUser, twit-
ter4j.Status.getText], is supported by a file with sequence [twitter4j.Paging.<init>,
4 Evaluation
4.1 Evaluation Framework
We evaluate CLAMS on the APIs (all public methods) of 6 popular Java libraries,
which were selected as they are popular (based on their GitHub stars and forks),
cover various domains, and have handwritten examples to compare our snippets
with. The libraries are shown in Table 1, along with certain statistics concerning
the lines of code of their examples’ directories (Example LOC) and the lines of
code considered from GitHub as using their API methods (Client LOC).
We focus our evaluation on the 4 research questions of Fig. 4. RQ1 and RQ2
refer to summarization and clustering respectively and will be evaluated with
respect to handwritten examples. For RQ3 we assess the API coverage achieved
by CLAMS versus the ones achieved by the API mining systems MAPO [32,33]
and UP-Miner [31]. RQ4 will determine whether the extra information of source
code snippets when compared to API call sequences is useful to developers.
RQ1: How much more concise, readable, and precise with respect to handwritten
examples are the snippets after summarization?
RQ2: Do more powerful clustering techniques, that cluster similar rather than identi-
cal sequences, lead to snippets that more closely match handwritten examples?
RQ3: Does our tool mine more diverse patterns than other existing approaches?
RQ4: Do snippets match handwritten examples more than API call sequences?
where Ts and Te are the sets of tokens of the snippet s and of the example e,
respectively. Finally, if no example has exactly the same API calls as the snippet
(i.e. Es = ∅), then snippet precision is set to zero. Given the snippet precision,
we also define the average snippet precision for n snippets s1 , s2 , . . . , sn as:
AvgPrec(n) = (1/n) · Σ_{i=1..n} Prec(si)    (3)
This metric is useful for evaluating our system which outputs ordered results, as
it allows us to illustrate and draw conclusions for precision at different levels.
We also define coverage at k as the number of unique API methods contained
in the top k snippets. This metric has already been defined in a similar manner by
Fowkes and Sutton [9], who claim that a list of patterns with identical methods
would be redundant, non-diverse, and thus not representative of the target API.
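Both metrics are straightforward to compute; the sketch below assumes the per-snippet precisions of Eq. (2) are already available, and the sample sequences are illustrative only.

def avg_precision(precisions):
    """Average snippet precision of Eq. (3) over the top-n snippets."""
    return sum(precisions) / len(precisions) if precisions else 0.0

def coverage_at_k(snippet_sequences, k):
    """Number of distinct API methods appearing in the top-k snippets."""
    return len({m for seq in snippet_sequences[:k] for m in seq})

top = [["Status.getUser", "Status.getText"], ["TwitterFactory.getInstance"]]
print(coverage_at_k(top, 2))  # 3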
Finally, we measure additional information provided in source code snippets
when compared with API call sequences. For each snippet we extract its snippet-tokens Ts, as defined in (2), and its sequence-tokens T′s, which are extracted from the underlying API call sequence of the snippet, where each token is the name of an API method. Based on these sets, we define the additional info metric as:

AdditInfo = (1/m) · Σ_{i=1..m} max_{e∈E_si} |T_si ∩ Te| / max_{e∈E_si} |T′_si ∩ Te|    (5)
Fig. 5. Figures of (a) the average readability, and (b) the average PLOCs of the snip-
pets, for each library, with (NaiveSum) and without (NaiveNoSum) summarization.
RQ3: Does our tool mine more diverse patterns than other exist-
ing approaches? For this research question, we compare the diversity of the
examples of CLAMS to that of two API mining approaches, MAPO [32,33] and
UP-Miner [31], which were deemed most similar to our approach from a mining
perspective (as it also works at sequence level).2 We measure diversity using
the coverage at k. Figure 7a depicts the coverage in API methods for each app-
roach and each library, while Fig. 7b shows the average number of API methods
covered at top k, using the top 100 examples of each approach.
2 Comparing with other tools was also hard, as most are unavailable, such as, e.g., the eXoaDocs web app (http://exoa.postech.ac.kr/) or the APIMiner website (http://java.labsoft.dcc.ufmg.br/apimineride/resources/docs/reference/).
Fig. 7. Graphs of the coverage in API methods achieved by CLAMS, MAPO, and UP-
Miner, (a) for each project, and (b) on average, at top k, using the top 100 examples.
The coverage by MAPO and UP-Miner is quite low, which is expected since
both tools perform frequent sequence mining, thus generating several redundant
patterns, a limitation noted also by Fowkes and Sutton [9]. On the other hand,
our system integrates clustering techniques to reduce redundancy which is fur-
ther eliminated by the fact that we select a single snippet from each cluster
(Snippet Selector). Finally, the average coverage trend (Fig. 7b) indicates that
our tool mines more diverse sequences than the other two tools, regardless of
the number of examples.
Fig. 10. Top 5 usage examples mined by (a) CLAMS, (b) MAPO, and (c) UP-Miner.
The API methods for the examples of our system are highlighted.
Interestingly, the snippet ranked second by CLAMS has not been matched to
any handwritten example, although it has high support in the dataset. In fact,
there is no example for the setOauthConsumer method of Twitter4J, which is one
of its most popular methods. This illustrates how CLAMS can also extract snip-
pets beyond those of the examples directory, which are valuable to developers.
5 Threats to Validity
The main threats to validity of our approach involve the choice of the evaluation
metrics and the lack of comparison with snippet-based approaches. Concerning
the metrics, snippet API coverage is typical when comparing API usage mining
approaches. On the other hand, the choice of metrics for measuring snippet
quality is indeed a subjective criterion. To address this threat, we have employed
three metrics, for the conciseness (PLOCs), readability, and quality (similarity to
real examples). Our evaluation indicates that CLAMS is effective on all of these
axes. In addition, as these metrics are applied on snippets, computing them
for sequence-based systems such as MAPO and UP-Miner was not possible.
Finally, to evaluate whether CLAMS can be practically useful when developing
software, we plan to conduct a developer survey. To this end, we have already
performed a preliminary study on a team of 5 Java developers of Hotels.com, the
results of which were encouraging. More details about the study can be found
at https://mast-group.github.io/clams/user-survey/ (omitted here due to space
limitations).
Concerning the comparison with current approaches, we chose to compare
CLAMS against sequence-based approaches (MAPO and UP-Miner), as the min-
ing methodology is actually performed at sequence level. Nevertheless, compar-
ing with snippet-based approaches would also be useful, not only as a proof of
concept but also because it would allow us to comparatively evaluate CLAMS
with regard to the snippet quality metrics mentioned in the previous paragraph.
However, such a comparison was troublesome, as most current tools (including
e.g., eXoaDocs, APIMiner, etc.) are currently unavailable (see RQ3 of Sect. 4.2).
We note, however, that this comparison is an important point for future work, and we have also made our code and findings available online (https://mast-group.github.io/clams/) to assist future researchers who may face similar challenges.
6 Conclusion
In this paper we have proposed a novel approach for mining API usage examples
in the form of source code snippets, from client code. Our system uses clustering
techniques, as well as a summarization algorithm to mine useful, concise, and
readable snippets. Our evaluation shows that snippet clustering leads to better
precision versus coverage rate, while the summarization algorithm effectively
increases the readability and decreases the size of the snippets. Finally, our tool
offers diverse snippets that match handwritten examples better than sequences.
In future work, we plan to extend the approach used to retrieve the top mined
sequences from each cluster. We could use a two-stage clustering approach where,
after clustering the API call sequences, we could further cluster the snippets of
the formed clusters, using a tree edit distance metric. This would allow retrieving
snippets that use the same API call sequence, but differ in their structure.
References
1. Allamanis, M., Sutton, C.: Mining idioms from source code. In: Proceedings of
the 22nd ACM SIGSOFT International Symposium on Foundations of Software
Engineering, FSE 2014, pp. 472–483. ACM, New York (2014)
2. Artistic Style 3.0: http://astyle.sourceforge.net/. Accessed Jan 2018
3. Bauckhage, C.: Numpy/scipy Recipes for Data Science: k-Medoids Clustering.
Technical report. University of Bonn (2015)
4. Borges, H.S., Valente, M.T.: Mining usage patterns for the Android API. PeerJ
Comput. Sci. 1, e12 (2015)
5. Buse, R.P.L., Weimer, W.: Synthesizing API usage examples. In: Proceedings of the
34th International Conference on Software Engineering, ICSE 2012, pp. 782–792.
IEEE Press, Piscataway (2012)
6. Buse, R.P., Weimer, W.R.: A metric for software readability. In: Proceedings of
the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008,
pp. 121–130. ACM, New York (2008)
7. Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on
hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G.
(eds.) PAKDD 2013. LNCS (LNAI), vol. 7819, pp. 160–172. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-3-642-37456-2 14
8. Collard, M.L., Decker, M.J., Maletic, J.I.: srcML: an infrastructure for the explo-
ration, analysis, and manipulation of source code: a tool demonstration. In: Pro-
ceedings of the 2013 IEEE International Conference on Software Maintenance,
ICSM 2013, pp. 516–519. IEEE Computer Society, Washington, DC (2013)
9. Fowkes, J., Sutton, C.: Parameter-free probabilistic API mining across GitHub. In:
Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations
of Software Engineering, FSE 2016, pp. 254–265. ACM, New York (2016)
10. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, vol. 3, pp.
1–38. Morgan Kaufmann Publishers Inc., San Francisco (2011)
11. Ishag, M.I.M., Park, H.W., Li, D., Ryu, K.H.: Highlighting current issues in API
usage mining to enhance software reusability. In: Proceedings of the 15th Inter-
national Conference on Software Engineering, Parallel and Distributed Systems,
SEPADS 2016, pp. 200–205. WSEAS (2016)
12. Jiang, L., Misherghi, G., Su, Z., Glondu, S.: DECKARD: scalable and accurate
tree-based detection of code clones. In: Proceedings of the 29th International Con-
ference on Software Engineering, ICSE 2007, pp. 96–105. IEEE Computer Society,
Washington, DC (2007)
13. Kim, J., Lee, S., Hwang, S.W., Kim, S.: Adding examples into Java documents.
In: Proceedings of the 2009 IEEE/ACM International Conference on Automated
Software Engineering, ASE 2009, pp. 540–544. IEEE, Washington, DC (2009)
14. Kim, J., Lee, S., Hwang, S.W., Kim, S.: Towards an intelligent code search engine.
In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence,
AAAI 2010, pp. 1358–1363. AAAI Press (2010)
15. Kim, J., Lee, S., Hwang, S.W., Kim, S.: Enriching documents with examples: a
corpus mining approach. ACM Trans. Inf. Syst. 31(1), 1:1–1:27 (2013)
16. McDonnell, T., Ray, B., Kim, M.: An empirical study of API stability and adoption
in the android ecosystem. In: Proceedings of the 2013 IEEE International Confer-
ence on Software Maintenance, ICSM 2013, pp. 70–79. IEEE Computer Society,
Washington, DC (2013)
17. McInnes, L., Healy, J., Astels, S.: HDBSCAN: hierarchical density based clustering.
J. Open Source Softw. 2(11), 205 (2017)
18. McLellan, S.G., Roesler, A.W., Tempest, J.T., Spinuzzi, C.I.: Building more usable
APIs. IEEE Softw. 15(3), 78–86 (1998)
19. Mens, K., Lozano, A.: Source code-based recommendation systems. In: Robillard,
M.P., Maalej, W., Walker, R.J., Zimmermann, T. (eds.) Recomm. Syst. Softw.
Eng., pp. 93–130. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-
45135-5 5
20. Montandon, J.E., Borges, H., Felix, D., Valente, M.T.: Documenting APIs with
examples: lessons learned with the APIMiner platform. In: Proceedings of the 20th
Working Conference on Reverse Engineering, WCRE 2013, pp. 401–408 (2013)
21. Pawlik, M., Augsten, N.: Tree edit distance: robust and memory-efficient. Inf. Syst.
56(C), 157–173 (2016)
22. Piccioni, M., Furia, C.A., Meyer, B.: An empirical study of API usability. In: Pro-
ceedings of the 7th ACM/IEEE International Symposium on Empirical Software
Engineering and Measurement, ESEM 2013, pp. 5–14 (2013)
23. Ponzanelli, L., Bavota, G., Mocci, A., Penta, M.D., Oliveto, R., Russo, B., Haiduc,
S., Lanza, M.: CodeTube: extracting relevant fragments from software development
video tutorials. In: Proceedings of the 38th International Conference on Software
Engineering Companion, ICSE-C 2016, pp. 645–648 (2016)
24. Robillard, M.P.: What makes APIs hard to learn? answers from developers. IEEE
Softw. 26(6), 27–34 (2009)
25. Saied, M.A., Benomar, O., Abdeen, H., Sahraoui, H.: Mining multi-level API usage
patterns. In: 2015 IEEE 22nd International Conference on Software Analysis, Evo-
lution, and Reengineering (SANER), pp. 23–32 (2015)
26. Sillito, J., Maurer, F., Nasehi, S.M., Burns, C.: What makes a good code example?:
a study of programming Q&A in stackoverflow. In: Proceedings of the 2012 IEEE
International Conference on Software Maintenance, ICSM 2012, pp. 25–34. IEEE
Computer Society, Washington, DC (2012)
27. Source Code Readability Metric. http://www.arrestedcomputing.com/readability.
Accessed Jan 2018
28. Stylos, J., Faulring, A., Yang, Z., Myers, B.A.: Improving API documentation using
API usage information. In: Proceedings of the 2009 IEEE Symposium on Visual
Languages and Human-Centric Computing, VLHCC 2009, pp. 119–126 (2009)
29. Subramanian, S., Inozemtseva, L., Holmes, R.: Live API documentation. In: Pro-
ceedings of the 36th International Conference on Software Engineering, ICSE 2014,
pp. 643–652. ACM, New York (2014)
30. Uddin, G., Robillard, M.P.: How API documentation fails. IEEE Softw. 32(4),
68–75 (2015)
31. Wang, J., Dang, Y., Zhang, H., Chen, K., Xie, T., Zhang, D.: Mining succinct and
high-coverage API usage patterns from source code. In: Proceedings of the 10th
Working Conference on Mining Software Repositories, MSR 2013, pp. 319–328.
IEEE Press, Piscataway (2013)
32. Xie, T., Pei, J.: MAPO: Mining API usages from open source repositories. In:
Proceedings of the 2006 International Workshop on Mining Software Repositories,
MSR 2006, pp. 54–57. ACM, New York (2006)
33. Zhong, H., Xie, T., Zhang, L., Pei, J., Mei, H.: MAPO: mining and recommending
API usage patterns. In: Drossopoulou, S. (ed.) ECOOP 2009. LNCS, vol. 5653, pp.
318–343. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03013-
0 15
34. Zhu, Z., Zou, Y., Xie, B., Jin, Y., Lin, Z., Zhang, L.: Mining API usage examples
from test code. In: Proceedings of the 2014 IEEE International Conference on
Software Maintenance and Evolution, ICSME 2014, pp. 301–310. IEEE Computer
Society, Washington, DC (2014)
Fast Computation of Arbitrary Control
Dependencies
1 Introduction
Context. Control dependence is a fundamental notion in software engineering and analysis (e.g. [6,12,13,21,22,27]). It reflects structural relationships between program statements and is used intensively in many software analysis techniques and tools, such as compilers, verification tools, test generators, program transformation tools, simulators and debuggers. Along with data dependence, it is one of the key notions used in program slicing [25,27], a program transformation technique that decomposes a given program into a simpler one, called a program slice.
In 2011, Danicic et al. [11] proposed an elegant generalization of the notions of closure under non-termination insensitive (weak) and non-termination sensitive (strong) control dependence. They introduced the notions of weak and strong control-closure, which can be defined on any directed graph rather than only on control flow graphs. They proved that weak and strong control-closures subsume the closures under all forms of control dependence previously known in the literature. In the present paper, we are interested in the non-termination insensitive form, i.e. weak control-closure.
The Coq, Why3 and OCaml implementations are all available in [17].
Outline. We present our motivation and a running example in Sect. 2. Then,
we recall the definitions of some important concepts introduced by [11] in Sect. 3
and state two important lemmas in Sect. 4. Next, we describe Danicic’s algorithm
in Sect. 5 and our algorithm along with a sketch of the proof of its correctness
in Sect. 6. Experiments are presented in Sect. 7. Finally, Sect. 8 presents some
related work and concludes.
observable from u but not from v, then u must be added to V to build the weak control-closure. Figure 2a shows our example graph G0, each node being annotated with its set of observables in V0.
(u0, u1) is an edge such that u0 is reachable from V0, u1 can reach V0 and u3 is an observable vertex from u0 in V0 but not from u1. u0 is thus a node to be added to the weak control-closure. Likewise, from the edges (u2, u3) and (u4, u3), we can deduce that u2 and u4 belong to the closure. However, we have seen that u6 belongs to the closure, but it is not possible to apply the same reasoning to (u6, u0), (u6, u4) or (u6, u5). We need another technique. As Lemma 3 will establish, the technique is actually iterative. We can add to the initial set V0 the nodes that we have already detected and apply our technique to this new set. The vertices detected this way will also be in the closure of the initial set V0. The observable sets w.r.t. V0' = V0 ∪ {u0, u2, u4} are shown in Fig. 2b. This time, both edges (u6, u4) and (u6, u0) allow us to add u6 to the closure.
Applying the technique again with the augmented set V0'' = V0' ∪ {u6} (cf. Fig. 2c) does not reveal new vertices. This means that all the nodes have already been found. We obtain the same set as before for the weak control-closure of V0, i.e. {u0, u1, u2, u3, u4, u6}.
3 Basic Concepts
This section introduces basic definitions and properties needed to define the
notion of weak control-closure. They have been formalized in Coq [17], including
in particular Property 3 whose proof in [11] was inaccurate.
From now on, let G = (V, E) denote a directed graph, and V' a subset of V. We define a path in G in the usual way. We write u →* v if there exists a path from u to v. Let RG(V') = {v ∈ V | ∃u ∈ V', u →* v} be the set of nodes reachable from V'. In our example (cf. Fig. 1), u6, u0, u1, u3 is a (4-node) path in G0, u1 is a trivial one-node path in G0 from u1 to itself, and RG0(V0) = V0.
Remark 1. Definition 1 and the following ones are slightly different from [11],
where a V -path must contain at least two vertices and there is no constraint
on its first vertex, which can be in V or not. Our definitions lead to the same
notion of weak control-closure.
Example. Since in particular u2 is not V0-weakly committing and is reachable from V0, V0 is not weakly control-closed in G0. The empty set, singletons and the set of all nodes of G0 are trivially weakly control-closed. Less trivial weakly control-closed sets include {u0, u1}, {u4, u5, u6} and {u0, u1, u2, u3, u4, u6}.
Property 1. If u ∈ V, then u ∉ WDG(V) (by Definitions 1, 4).
Example. In G0 , u2 is reachable from V0 and is V0 -weakly deciding. This gives
another proof that V0 is not weakly control-closed.
Property 2. ∀ V1 , V2 ⊆ V, V1 ⊆ V2 =⇒ WDG (V1 ) ⊆ V2 ∪ WDG (V2 )
We can prove that adding to a given set V the V -weakly deciding nodes
that are reachable from V gives a weakly control-closed set in G. This set is the
smallest superset of V weakly control-closed in G.
4 Main Lemmas
This section gives two lemmas used to justify both Danicic’s algorithm and ours.
on all the successively computed sets. Since each set is a strict superset of the
previous one, this iterative procedure terminates because graph G is finite.
Before stating the second lemma, we introduce a key concept. It is called Θ
in [11]. We use the name “observable” as in [26].
The concept of observable set was illustrated in Fig. 2 (cf. Sect. 2).
Proof. We need to exhibit two V-paths from u ending in V that share no vertex except u. We take the V-path from u to u' as the first one, and a V-path connecting u to V through v as the second one (we construct it by prepending u to the smallest prefix of the path from v ending in V which is a V-path). If these V-paths intersected at a node y different from u, we would have a V-path from v to u' by concatenating the paths from v to y and from y to u', which is contradictory.
Example. In G0 , obsG0 (u0 , V0 ) = {u1 , u3 } and obsG0 (u1 , V0 ) = {u1 } (cf. Fig. 2a).
Since u1 is a child of u0 , we can apply Lemma 4, and deduce that u0 is V0 -weakly
deciding. obsG0 (u5 , V0 ) = {u1 , u3 } and obsG0 (u6 , V0 ) = {u1 , u3 }. We cannot
apply Lemma 4 to u5 , and for good reason, since u5 is not V0 -weakly deciding.
But we cannot apply Lemma 4 to u6 either, since u6 and all its children u0 , u4 and
u5 have observable sets {u1 , u3 } w.r.t. V0 , while u6 is V0 -weakly deciding. This
shows that with Lemma 4, we have a sufficient condition, but not a necessary
one, for proving that a vertex is weakly deciding.
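The observable sets manipulated in the example above can be computed mechanically. The formal definition is not reproduced in this excerpt, so the following Python sketch relies on our reading of the V-path intuition used in the proof above: obs_G(u, V) collects the vertices of V that u can reach along a path meeting V only at its endpoint.

```python
def obs(graph, u, v_set):
    """Assumed reading of obs_G(u, V): vertices of v_set reachable from u along
    a path that contains no vertex of v_set other than its endpoint."""
    if u in v_set:
        return {u}                      # e.g. obs(u1, V0) = {u1} in the example
    out, seen, stack = set(), {u}, [u]
    while stack:
        x = stack.pop()
        for y in graph.get(x, []):
            if y in v_set:
                out.add(y)              # stop: the path may not go beyond a V-vertex
            elif y not in seen:
                seen.add(y)
                stack.append(y)
    return out
```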
Example. Let us apply Algorithm 1 to our running example G0 (cf. Fig. 1).
Initially, W0 = V0 = {u1 , u3 }.
5 Danicic’s Algorithm
We present here the algorithm described in [11]. This algorithm and a proof of its
correctness have been formalized in Coq [17]. The algorithm is nearly completely
justified by the following lemma (Lemma 5, equivalent to [11, Lemma 60]).
We first need to introduce a new concept, which captures edges that are of
particular interest when searching for weakly deciding vertices. This concept is
taken from [11], where it was not given a name. We call such edges critical edges.
Definition 7 (Critical edge). An edge (u, v) in G is called V -critical if:
(1) | obsG (u, V )| ≥ 2;
(2) | obsG (v, V )| = 1;
(3) u is reachable from V in G.
Example. In G0 , (u0 , u1 ), (u2 , u3 ) and (u4 , u3 ) are the V0 -critical edges.
Lemma 5. If V is not weakly control-closed in G, then there exists a V -
critical edge (u, v) in G. Moreover, if (u, v) is such a V -critical edge, then
u ∈ WDG (V ) ∩ RG (V ), therefore u ∈ WCCG (V ).
Proof. Let x be a vertex in WDG(V) reachable from V. There exists a V-path π from x ending in some x' ∈ V. It follows that |obsG(x, V)| ≥ 2 and |obsG(x', V)| = 1. Let u be the last vertex on π with at least two observable nodes in V and v its successor on π. Then (u, v) is a V-critical edge.
Assume there exists a V-critical edge (u, v). Since |obsG(u, V)| ≥ 2 and |obsG(v, V)| = 1, u ∉ V, v can reach V and there exists u' in obsG(u, V) but not in obsG(v, V). By Lemma 4, u ∈ WDG(V) and thus u ∈ WCCG(V).
Remark 3. We can see in the proof above that we do not need the exact values
2 and 1. We just need strictly more observable vertices for u than for v and at
least one observable for v, to satisfy the hypotheses of Lemma 4.
As described in Sect. 4, we can build an iterative algorithm constructing the
weak control-closure of V by searching for critical edges on the intermediate sets
built successively. This is the idea of Danicic’s algorithm shown as Algorithm 1.
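A compact Python sketch (ours, not the Coq-verified version) of this iterative scheme follows. It repeatedly searches for W-critical edges in the sense of Definition 7 and, as in the optimized run replayed below, adds the sources of all of them at once; the observable sets are computed under the same assumed reading as in the sketch of Sect. 4.

```python
def weak_control_closure(graph, v0):
    """Iteratively add the sources of all W-critical edges (Definition 7)
    until none remain; the returned set is the candidate weak control-closure."""
    def obs(u, w):                       # assumed reading of obs_G(u, W)
        if u in w:
            return {u}
        out, seen, stack = set(), {u}, [u]
        while stack:
            x = stack.pop()
            for y in graph.get(x, []):
                if y in w:
                    out.add(y)
                elif y not in seen:
                    seen.add(y)
                    stack.append(y)
        return out

    def reachable(srcs):                 # R_G(srcs)
        seen, stack = set(srcs), list(srcs)
        while stack:
            x = stack.pop()
            for y in graph.get(x, []):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    w = set(v0)
    while True:
        reach = reachable(w)
        # Sources of W-critical edges: |obs(u, W)| >= 2, a child v with
        # |obs(v, W)| = 1, and u reachable from W.
        srcs = {u for u in graph
                if u in reach and len(obs(u, w)) >= 2
                and any(len(obs(v, w)) == 1 for v in graph.get(u, []))}
        if not srcs:
            return w
        w |= srcs
```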
Example. We can replay Algorithm 1 using the first optimization. This run cor-
responds to the steps shown in Fig. 2. Initially, W0 = V0 = {u1 , u3 }.
1. (u0 , u1 ), (u2 , u3 ), (u4 , u3 ) are W0 -critical edges. Set W1 = {u0 , u1 , u2 , u3 , u4 }.
2. (u6 , u0 ) is a W1 -critical edge. Set W2 = {u0 , u1 , u2 , u3 , u4 , u6 }.
3. There is no W2 -critical edge in G0 .
The optimized version computes the weak control-closure of V0 in G0 in only 2
iterations instead of 4. This run also demonstrates that the algorithm is neces-
sarily iterative: even when considering all V0 -critical edges in the first step, u6
is not detected before the second step.
Example. Let us use our running example (cf. Fig. 1) to illustrate the algorithm. The successive steps are represented in Fig. 3. In the different figures, nodes in W already processed (that is, in W \ L) are represented using a solid double circle, while nodes in W not yet processed (that is, still in worklist L) are represented using a dashed double circle. A label uj next to a node ui indicates the label currently recorded for ui.
[Fig. 3: (a) after propagation of u1; (b) after propagation of u3; (c) after propagation of u0; (d) after propagation of u2; (e) after propagation of u4; (f) after propagation of u6]
– propagate takes a vertex and propagates backwards a label over its prede-
cessors. It returns a set of candidate V -weakly-deciding nodes.
– main calls propagate on a node of the closure not yet processed, gets can-
didate V -weakly deciding nodes, calls confirm to keep only true V -weakly
deciding nodes, adds them to the closure and updates their labels, and loops
until no more V -weakly deciding nodes are found.
Proof of the Optimized Algorithm. We opted for Why3 instead of Coq for
this proof to take advantage of Why3’s automation. Indeed, most of the goals
could be discharged in less than a minute using Alt-Ergo, CVC4, Z3 and E.
Some of them still needed to be proved manually in Coq, resulting in 330 lines
of Coq proof. The Why3 development [17] focuses on the proof of the algorithm,
not on the concepts presented in Sects. 3 and 4. Most of the concepts are proved; one of them is assumed in Why3 but was previously proved in Coq. Due to lack of space, we detail here only the main invariants necessary to prove main (cf. Algorithm 4). The proofs of I1, I2, I3, I4 are rather simple, while those of I5 and I6 are more complex.
I1 states that each node in W has itself as a label. It is true initially for all
nodes in V and is preserved by the updates.
I2 states that all labels are in W . This is true initially since all labels are in
V . The preservation is verified, since all updates are realized using labels in W .
I3 states that labels in L have not been already propagated. Given a node y
in L, y is the only node whose label is y. It is true initially since every vertex in
V has itself as a label. After an update, the new nodes obey the same rule, so
I3 is preserved.
I4 states that if label z is associated with a node y then there exists a path
between y and z. Initially, there exist trivial paths from each node in V to itself.
When obs is updated, there exists a W -path, thus in particular a path.
I5 states that W remains between V and V ∪WDG (V ) during the execution
of the algorithm. The first part V ⊆ W is easy to prove, because it is true
initially and W is growing. For the second part, we need to prove that after the
filtering, Δ ⊆ WDG (V ). For that, we will prove that Δ ⊆ WDG (W ) thanks
to Lemma 3. Let v be a node in Δ. Since Δ ⊆ C, we know that v ∈ W and
u ∈ obsG(v, W). Moreover, we have confirm(G, obs, v, u) = true, i.e. v has a child v′ such that v′ ∈ obs, hence v′ can reach W by I4, and obs[v′] = u, hence u ∈ obsG(v′, W). We can apply Lemma 4 and deduce that v ∈ WDG(W).
I6 is the most complicated invariant. I6 states that if there is a path between
two vertices y and z that does not intersect W , and z has a label already pro-
cessed, then y and z have the same label. Let us give a sketch of the proof
of preservation of I6 after an iteration of the main loop. Let us denote by obs′ the map at the end of the iteration. Let y, z, z′ ∈ V such that there is a (W ∪ Δ)-disjoint path from y to z, obs′[z] = z′ and z′ ∈ (L \ {u}) ∪ Δ. Let us show that obs′[y] = z′. First, observe that neither y nor z can be in Δ, otherwise z′ would be in Δ, which would be contradictory. We examine four cases depending on whether there is a W-path from z to u (H1) and a W-path from y to u (H2).
– H1 ∧ H2: Both z and y were given the label u during the last iteration, thus obs′[z] = obs′[y] = u as expected.
– H1 ∧ ¬H2: This case is impossible, since there is a (W ∪ Δ)-disjoint path from y to z.
– ¬H1 ∧ ¬H2: Both z and y have the same label as before the iteration. We can therefore conclude by I6 at the beginning of the iteration.
– ¬H1 ∧ H2: This is the only complicated case. We show that it is contradictory. For that, we introduce v1 as the last vertex on the (W ∪ Δ)-disjoint
7 Experiments
We have implemented Danicic's algorithm (additionally improved by the two optimizations proposed in Remark 4) and ours in OCaml [17] using the
[Figure: execution time in seconds of Danicic's algorithm vs. our algorithm]
References
1. Why3, a tool for deductive program verification, GNU LGPL 2.1, development
version, January 2018. http://why3.lri.fr
2. Amtoft, T.: Slicing for modern program structures: a theory for eliminating irrel-
evant loops. Inf. Process. Lett. 106(2), 45–51 (2008)
3. Amtoft, T., Androutsopoulos, K., Clark, D.: Correctness of slicing finite state
machines. Technical report RN/13/22. University College London, December 2013
4. Amtoft, T., Banerjee, A.: A theory of slicing for probabilistic control flow graphs.
In: Jacobs, B., Löding, C. (eds.) FoSSaCS 2016. LNCS, vol. 9634, pp. 180–196.
Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49630-5 11
5. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-662-07964-5
6. Bilardi, G., Pingali, K.: Generalized dominance and control dependence. In: PLDI,
pp. 291–300. ACM (1996)
7. Blazy, S., Maroneze, A., Pichardie, D.: Verified validation of program slicing. In:
CPP 2015, pp. 109–117 (2015)
8. Buchsbaum, A.L., Georgiadis, L., Kaplan, H., Rogers, A., Tarjan, R.E., Westbrook,
J.: Linear-time algorithms for dominators and other path-evaluation problems.
SIAM J. Comput. 38(4), 1533–1573 (2008)
9. Conchon, S., Filliâtre, J., Signoles, J.: Designing a generic graph library using ML
functors. In: Morazán, M.T. (ed.) Trends in Functional Programming, vol. 8, pp.
124–140. Intellect, Bristol (2007)
10. Cooper, K.D., Harvey, T.J., Kennedy, K.: A simple, fast dominance algorithm.
Softw. Pract. Exp. 4(1–10), 1–8 (2001)
11. Danicic, S., Barraclough, R.W., Harman, M., Howroyd, J., Kiss, Á., Laurence,
M.R.: A unifying theory of control dependence and its application to arbitrary
program structures. Theor. Comput. Sci. 412(49), 6809–6842 (2011)
12. Denning, D.E., Denning, P.J.: Certification of programs for secure information
flow. Commun. ACM 20(7), 504–513 (1977)
13. Ferrante, J., Ottenstein, K.J., Warren, J.D.: The program dependence graph and
its use in optimization. ACM Trans. Program. Lang. Syst. 9(3), 319–349 (1987)
14. Filliâtre, J.-C., Paskevich, A.: Why3 — where programs meet provers. In: Felleisen,
M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 125–128. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6 8
15. Georgiadis, L., Tarjan, R.E.: Dominator tree certification and divergent spanning
trees. ACM Trans. Algorithms 12(1), 11:1–11:42 (2016)
16. Georgiadis, L., Tarjan, R.E., Werneck, R.F.F.: Finding dominators in practice. J.
Graph Algorithms Appl. 10(1), 69–94 (2006)
17. Léchenet, J.-C.: Formalization of weak control dependence (2018). http://perso.ecp.fr/~lechenetjc/control/
18. Léchenet, J.-C., Kosmatov, N., Le Gall, P.: Cut branches before looking for bugs:
sound verification on relaxed slices. In: Stevens, P., Wasowski,
A. (eds.) FASE
2016. LNCS, vol. 9633, pp. 179–196. Springer, Heidelberg (2016). https://doi.org/
10.1007/978-3-662-49665-7 11
19. Lengauer, T., Tarjan, R.E.: A fast algorithm for finding dominators in a flowgraph.
ACM Trans. Program. Lang. Syst. 1(1), 121–141 (1979)
20. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–
115 (2009)
21. Ottenstein, K.J., Ottenstein, L.M.: The program dependence graph in a soft-
ware development environment. In: The First ACM SIGSOFT/SIGPLAN Software
Engineering Symposium on Practical Software Development Environments (SDE
1984), pp. 177–184. ACM Press (1984)
22. Podgurski, A., Clarke, L.A.: A formal model of program dependences and its impli-
cations for software testing, debugging, and maintenance. IEEE Trans. Softw. Eng.
16(9), 965–979 (1990)
23. Ranganath, V.P., Amtoft, T., Banerjee, A., Hatcliff, J., Dwyer, M.B.: A new foun-
dation for control dependence and slicing for modern program structures. ACM
Trans. Program. Lang. Syst. 29(5) (2007). Article No. 27
24. The Coq Development Team: The Coq proof assistant, v8.6 (2017). http://coq.inria.fr/
25. Tip, F.: A survey of program slicing techniques. J. Prog. Lang. 3(3), 121–189 (1995)
26. Wasserrab, D.: From formal semantics to verified slicing: a modular framework
with applications in language based security. Ph.D. thesis, Karlsruhe Inst. of Techn.
(2011)
27. Weiser, M.: Program slicing. In: ICSE 1981 (1981)
Specification and Program Testing
Iterative Generation of Diverse Models
for Testing Specifications of DSL Tools
1 Introduction
2 Preliminaries
Core modeling concepts and testing challenges of DSL tools will be illustrated in
the context of Yakindu Statecharts [46], which is an industrial DSL for developing
reactive, event-driven systems, and supports validation and code generation.
[[C(v)]]^M_Z := I_M(C)(Z(v))
[[R(v1, v2)]]^M_Z := I_M(R)(Z(v1), Z(v2))
[[v1 = v2]]^M_Z := Z(v1) = Z(v2)
[[ϕ1 ∧ ϕ2]]^M_Z := [[ϕ1]]^M_Z ∧ [[ϕ2]]^M_Z
[[ϕ1 ∨ ϕ2]]^M_Z := [[ϕ1]]^M_Z ∨ [[ϕ2]]^M_Z
[[¬ϕ]]^M_Z := ¬[[ϕ]]^M_Z
[[∀v : ϕ]]^M_Z := ⋀_{x ∈ Obj_M} [[ϕ]]^M_{Z, v↦x}
[[∃v : ϕ]]^M_Z := ⋁_{x ∈ Obj_M} [[ϕ]]^M_{Z, v↦x}
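To illustrate these evaluation rules, here is a small Python sketch (ours; the encoding of predicates and models is an illustrative assumption, not the paper's tooling) that evaluates such predicates over a model given by its class and reference interpretations.

```python
def evaluate(phi, model, Z):
    """[[phi]]^M_Z following the rules above. model = (objects, classes, refs):
    classes maps a class name to a set of objects, refs maps a reference name
    to a set of (source, target) pairs; Z binds variables to objects."""
    objects, classes, refs = model
    kind = phi[0]
    if kind == "class":                              # C(v)
        return Z[phi[2]] in classes.get(phi[1], set())
    if kind == "ref":                                # R(v1, v2)
        return (Z[phi[2]], Z[phi[3]]) in refs.get(phi[1], set())
    if kind == "eq":                                 # v1 = v2
        return Z[phi[1]] == Z[phi[2]]
    if kind == "and":
        return evaluate(phi[1], model, Z) and evaluate(phi[2], model, Z)
    if kind == "or":
        return evaluate(phi[1], model, Z) or evaluate(phi[2], model, Z)
    if kind == "not":
        return not evaluate(phi[1], model, Z)
    if kind == "exists":                             # ∃v: phi
        return any(evaluate(phi[2], model, {**Z, phi[1]: x}) for x in objects)
    if kind == "forall":                             # ∀v: phi
        return all(evaluate(phi[2], model, {**Z, phi[1]: x}) for x in objects)
    raise ValueError(kind)
```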
the DSL tool. Now a test model M detects a fault if there is a variable binding Z where the two evaluations differ, i.e. [[ϕe]]^M_Z ≠ [[ϕf]]^M_Z.
Example 2. Two WF constraints checked by the Yakindu environment can be
captured by graph predicates as follows:
– ϕ : incomingToEntry(E) := ∃T : Entry(E) ∧ target(T, E)
– φ : noOutgoingFromEntry(E) := Entry(E) ∧ ¬(∃T : source(T, E))
According to our fault model, we can derive two mutants for incomingToEntry
as predicates ϕf1 := Entry(E) and ϕf2 := ∃T : target(T, E).
Constraints ϕ and φ are satisfied in models M1 and M2 as the corresponding graph predicates have no matches, thus [[ϕ]]^M1_Z = 0 and [[φ]]^M1_Z = 0. As test models, both M1 and M2 are able to detect the same omission fault for ϕf1, as [[ϕf1]]^M1 = 1 (with E → e1 and E → e2), and similarly for ϕf2 (with s1 and s3). However, M3 is unable to kill mutant ϕf1 (ϕ has a match E → e3 which remains a match of ϕf1), but it is able to detect the others.
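A minimal sketch (ours) of the mutant-killing check on a toy statechart model of our own making (only loosely inspired by M1; the object and reference names below are stand-ins, not the paper's figures):

```python
# Toy model: one entry e, two states s1 and s2, transitions t1: e->s1, t2: s1->s2.
objects     = {"e", "s1", "s2", "t1", "t2"}
entries     = {"e"}
transitions = {"t1", "t2"}
target      = {("t1", "s1"), ("t2", "s2")}     # target(T, E) pairs

def match_set(pred):
    """Objects E for which the unary graph predicate pred(E) holds."""
    return {e for e in objects if pred(e)}

# incomingToEntry and its two omission mutants from Example 2.
phi    = lambda e: e in entries and any((t, e) in target for t in transitions)
phi_f1 = lambda e: e in entries
phi_f2 = lambda e: any((t, e) in target for t in transitions)

# The model kills a mutant if its match set differs from that of the original.
print(match_set(phi))      # set()
print(match_set(phi_f1))   # {'e'}          -> mutant phi_f1 killed
print(match_set(phi_f2))   # {'s1', 's2'}   -> mutant phi_f2 killed
```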
Example 3. We illustrate the concept of graph shapes for model M1 . For range
0, objects are mapped to class names as neighborhood descriptors:
– nbh 0 (e) = {Entry, PseudoState, Vertex}
– nbh 0 (t1) = nbh 0 (t2) = nbh 0 (t3) = {Transition}
– nbh 0 (s1) = nbh 0 (s2) = {State, RegularState, Vertex}
For range 1, objects with different incoming or outgoing types are further split,
e.g. the neighborhood of t1 is different from that of t2 and t3 as it is connected
to an Entry along a source reference, while the sources of t2 and t3 are States.
[Table: neighborhood classes at range 1 — nbh1(o): e1–e2 ↦ n1; e3 ↦ n5; s1–s6 ↦ n3; s7 ↦ n7; t1, t4, t11 ↦ n2; t2, t3, t5–t8, t10 ↦ n4; t9 ↦ n6]
The range of this internal diversity metric d^int_i(M) is [0..1], and a model M with d^int_1(M) = 1 (and |M| ≥ |MM|) guarantees full metamodel coverage [45], i.e. it surely contains all elements of the metamodel as types. As such, it is an appropriate diversity metric for a model in the sense of [43]. Furthermore, given a specific range i, the number of potential neighborhood shapes within that range is finite, but it grows superexponentially. Therefore, for a small range i, one can derive a model Mj with d^int_i(Mj) = 1, but for larger models Mk (with |Mk| > |Mj|) we will likely have d^int_i(Mj) ≥ d^int_i(Mk). However, due to the rapid growth of the number of shapes for increasing range i, for most practical cases, d^int_i(Mj) will converge to 1 if Mj is sufficiently diverse.
External model diversity enables the comparison of two models. One can show that this metric is a (pseudo-)distance in the mathematical sense [2], and thus it can serve as a diversity metric for a model generator in accordance with [43].
The coverage of a model set is not normalised, but its value grows monotonically for any range i when new models are added. This corresponds to our expectation that adding a new test case to a test suite should increase its coverage.
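The following Python sketch (ours) makes these metrics concrete under one simplifying assumption about the shape construction: the range-i descriptor of a node refines its range-(i−1) descriptor with the typed, directed links to its neighbours' range-(i−1) descriptors. Internal diversity is then the number of distinct shapes divided by the number of objects, and the coverage of a model set is the number of distinct shapes occurring in any of its models.

```python
def shapes(nodes, types, edges, i):
    """Neighborhood descriptors after i refinement rounds (assumed construction).
    types: node -> iterable of class names; edges: (source, reference, target)."""
    nbh = {n: tuple(sorted(types[n])) for n in nodes}          # range 0: class names
    for _ in range(i):
        nbh = {n: (nbh[n],
                   tuple(sorted((r, "out", nbh[t]) for (s, r, t) in edges if s == n)),
                   tuple(sorted((r, "in", nbh[s]) for (s, r, t) in edges if t == n)))
               for n in nodes}
    return nbh

def internal_diversity(nodes, types, edges, i):
    """d_int_i(M): number of distinct shape classes over the number of objects."""
    return len(set(shapes(nodes, types, edges, i).values())) / len(nodes)

def model_set_coverage(models, i):
    """Coverage of a model set: distinct shapes over all its models; it can only
    grow as models are added. Each model is a (nodes, types, edges) triple."""
    return len({d for (n, t, e) in models for d in shapes(n, t, e, i).values()})
```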
5 Evaluation
In this section, we provide an empirical evaluation of our diversity metrics and
model generation technique to address the following research questions:
RQ1: How effective is our technique in creating diverse models for testing?
RQ2: How effective is our technique in creating diverse test suites?
RQ3: Is there correlation between diversity metrics and mutation score?
Target Domain. In order to answer those questions, we executed model gen-
eration campaigns on a DSL extracted from Yakindu Statecharts (as proposed
in [35]). We used the partial metamodel describing the state hierarchy and tran-
sitions of statecharts (illustrated in Fig. 1, containing 12 classes and 6 refer-
ences). Additionally, we formalized 10 WF constraints regulating the transitions
as graph predicates, based on the built-in validation of Yakindu.
For mutation testing, we used constraint omission and negation omission operators (CO and NO) to inject an error into the original WF constraints in every possible way, which yielded 51 mutants from the original 10 constraints (though some mutants may never have matches).
of the constraints for each instance model, and a model kills a mutant if there
is a difference in the match set of the two constraints. The mutation score for a
test suite (i.e. a set of models) is the total number of mutants killed that way.
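As a small illustration (ours, with hypothetical mutant identifiers), the mutation score of a model set can be computed from a per-model record of killed mutants:

```python
def mutation_score(kills_per_model):
    """kills_per_model: one set of killed mutant identifiers per model (a mutant
    is killed when original and mutated constraints have different match sets
    on that model). The score is the number of mutants killed by at least one model."""
    return len(set().union(*kills_per_model)) if kills_per_model else 0

print(mutation_score([{"m1", "m3"}, {"m3", "m7"}]))   # 3
```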
Compared Approaches. Our test input models were taken from three different
sources. First, we generated models with our iterative approach using a graph
solver (GS) with different neighborhoods for ranges r = 1 to r = 3.
Next, we generated models for the same DSL using Alloy [39], a well-known
SAT-based relational model finder. For representing EMF metamodels we used
traditional encoding techniques [8,32]. To enforce model diversity, Alloy was
configured with three different setups for symmetry breaking predicates: s = 0,
s = 10 and s = 20 (default value). For greater values the tool produced the same
set of models. We used the latest 4.2 build for Alloy with the default Sat4j [20]
as back-end solver. All other configuration options were set to default.
Finally, we included 1250 manually created statechart models in our anal-
ysis (marked by Human). The models were created by students as solutions
for similar (but not identical) statechart modeling homework assignments [43]
representing real models which were not prepared for testing purposes.
Measurement Setup. To address RQ1–RQ3, we created a two-step measure-
ment setup. In Step I. a set of instance models is generated with all GS and
Alloy configurations. Each tool in each configuration generated a sequence of
30 instance models produced by subsequent solver calls, and each sequence is
repeated 20 times (so 1800 models are generated for both GS and Alloy). In
case of Alloy, we prevented deterministic runs of the solver to enable statistical analysis. The model generators were set up to create metamodel-compliant instances that respect the structural constraints of Subsect. 2.1 but ignore the WF constraints. The target model size was set to 30 objects as Alloy did not scale with increasing size (the scalability and the details of the back-end solver are reported in [33]). The size of Human models ranges from 50 to 200 objects.
In Step II, we evaluated the mutation score for all the models (and for the entire sequence) by comparing results for the mutant and original predicates, and recorded which mutants were killed by each model. We also calculated our diversity metrics for a neighborhood range where no more equivalence classes are produced by shapes (which turned out to be r = 7 in our case study). We calculated the internal diversity of each model, the external diversity (distance) between pairs of models in each model sequence, and the coverage of each model sequence.
RQ1: Measurement Results and Analysis. Figure 6a shows the distribution
of the number of mutants killed by at least one model from a model sequence (left
box plot), and the distribution of internal diversity (right box plot). For killing
mutants, GS was the best performer (regardless of the r range): most models
found 36–41 mutants out of 51. On the other hand, Alloy performance varied
based on the value of symmetry: for s = 0, most models found 9–15 mutants
(with a large number of positive outliers that found several errors). For s = 10,
the average is increased over 20, but the number of positive outliers simulta-
neously dropped. Finally, in default settings (s = 20) Alloy generated similar
models, and found only a low number of mutants. We also measured the effi-
ciency of killing mutants by Human, which was between GS and Alloy. None
of the instance models could find more than 41 mutants, which suggests that the remaining mutants cannot be detected at all by metamodel-compliant instances.
The right side of Fig. 6a presents the internal diversity of models, measured as shape nodes/graph nodes (for fixpoint range 7). The results are similar: the diversity was high with low variance for GS, with slight differences between ranges.
In case of Alloy, the diversity is similarly affected by the symmetry value:
s = 0 produced low average diversity, but a high number of positive outliers.
With s = 10, the average diversity increased with decreasing number of positive
outliers. And finally, with the default s = 20 value the average diversity was low.
The internal diversity of Human models is between GS and Alloy.
[Figure: (a) Mutation score for model sequence – number of mutants killed (vertical axis) against number of models (horizontal axis) for GS (r = 1, 2, 3) and Alloy (s = 0, 10, 20 (default)); (b) Model set coverage – number of shape nodes (ranges r0–r5) against number of models, for Alloy (s = 0, 10, 20) and the Graph Solver (r = 1)]
Figure 6b illustrates the average distance between all model pairs generated
in the same sequence (vertical axis) for range 7. The distribution of external
diversity also shows similar characteristics as Fig. 6a: GS provided high diversity
for all ranges (56 out of the maximum 60), while the diversity between models
generated by Alloy varied based on the symmetry value.
As a summary, our model generation technique consistently outperformed Alloy w.r.t. both the diversity metrics and the mutation score for individual models.
RQ2: Measurement Results and Analysis. Figure 7a shows the number
of killed mutants (vertical axis) by an increasing set of models (with 1 to 30
elements; horizontal axis) generated by GS or Alloy. The diagram shows the
median of 20 generation runs to exclude the outliers. GS found a large number of mutants with the first model, and the number of killed mutants (36–37) increased to 41 by the 17th model, after which no further mutants were found. Again, our measurement showed little difference between ranges r = 1, 2 and 3. For Alloy, the result highly depends on the symmetry value: for s = 0 it found a large number of mutants, but the value saturated early. Next, for s = 10, the first model found significantly fewer mutants; the number increased rapidly for the first 5 models, but altogether fewer mutants were killed than for s = 0. Finally, the default configuration (s = 20) found the lowest number of mutants.
In Fig. 7b, the average coverage of the model sets is calculated (vertical axis) for increasing model sets (horizontal axis). The neighborhood shapes are calculated for r = 0 to 5, after which no significant difference is shown. Again, configurations of symmetry breaking predicates resulted in different characteristics for Alloy. However, the number of shape nodes investigated by the test set was significantly higher in case of GS (791 vs. 200 equivalence classes) regardless of the range, and it was monotonically increasing as new models were added.
Altogether, both mutation score and equivalence class coverage of a model
sequence was much better for our model generator approach compared to Alloy.
[Figure: diversity (vertical axis) of generated models; legend includes Alloy;s=0 and Alloy;s=10]
Our initial investigation suggests that a high internal diversity provides a good mutation score; thus our metrics can potentially be good predictors in a testing context, but we cannot generalize this to full statistical correlation.
Threats to Validity and Limitations. We evaluated more than 4850 test
inputs in our measurement, but all models were taken from a single domain
of Yakindu statecharts with a dedicated set of WF constraints. However, our
model generation approach did not use any special property of the metamodel
or the WF constraints, thus we believe that similar results would be obtained for
other domains. For mutation operations, we checked only omission of predicates,
as extra constraints could easily yield infeasible predicates due to inconsistency
with the metamodel, thus further reducing the number of mutants that can be
killed. Finally, although we detected a strong correlation between diversity and
mutation score with our test cases, this result cannot be generalized to statistical
causality, because the generated models were not random samples taken from
the universe of models. Thus additional investigations are needed to justify this
correlation, and we only state that if a model is generated by either GS or Alloy,
a higher diversity means a higher mutation score with high probability.
6 Related Work
Diverse model generation plays a key role in testing model transformations, code generators and complete development environments [25]. Mutation-based
approaches [1,11,22] take existing models and make random changes on them
by applying mutation rules. A similar random model generator is used for exper-
imentation purposes in [3]. Other automated techniques [7,12] generate models
that only conform to the metamodel. While these techniques scale well for larger
models, there is no guarantee that the mutated models are well-formed.
scale well with increasing model size. While Alloy has been used as a model gen-
erator for numerous testing scenarios of DSL tools and model transformations
[6,8,35,36,42], our measurements strongly indicate that it is not a justified choice
as (1) Alloy is very sensitive to configurations of symmetry breaking predicates
and (2) the diversity and mutation score of generated models is problematic.
References
1. Aranega, V., Mottu, J.-M., Etien, A., Degueule, T., Baudry, B., Dekeyser, J.-L.:
Towards an automation of the mutation analysis dedicated to model transforma-
tion. Softw. Test. Verif. Reliab. 25(5–7), 653–683 (2015)
2. Arkhangel’Skii, A., Fedorchuk, V.: General Topology I: Basic Concepts and Constructions. Dimension Theory, vol. 17. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-61265-7
3. Batot, E., Sahraoui, H.: A generic framework for model-set selection for the unifi-
cation of testing and learning MDE tasks. In: MODELS, pp. 374–384 (2016)
4. Baudry, B., Dinh-Trong, T., Mottu, J.-M., Simmonds, D., France, R., Ghosh, S.,
Fleurey, F., Le Traon, Y.: Model transformation testing challenges. In: Integration
of Model Driven Development and Model Driven Testing (2006)
5. Baudry, B., Monperrus, M., Mony, C., Chauvel, F., Fleurey, F., Clarke, S.: Diver-
sify: ecology-inspired software evolution for diversity emergence. In: Software Main-
tenance, Reengineering and Reverse Engineering, pp. 395–398 (2014)
6. Bordbar, B., Anastasakis, K.: UML2ALLOY: a tool for lightweight modeling of
discrete event systems. In: IADIS AC, pp. 209–216 (2005)
7. Brottier, E., Fleurey, F., Steel, J., Baudry, B., Le Traon, Y.: Metamodel-based test
generation for model transformations: an algorithm and a tool. In: 17th Interna-
tional Symposium on Software Reliability Engineering, pp. 85–94 (2006)
8. Büttner, F., Egea, M., Cabot, J., Gogolla, M.: Verification of ATL transformations
using transformation models and model finders. In: Aoki, T., Taguchi, K. (eds.)
ICFEM 2012. LNCS, vol. 7635, pp. 198–213. Springer, Heidelberg (2012). https://
doi.org/10.1007/978-3-642-34281-3 16
9. Cabot, J., Clarisó, R., Riera, D.: UMLtoCSP: a tool for the formal verification of
UML/OCL models using constraint programming. In: ASE, pp. 547–548 (2007)
10. Cabot, J., Clariso, R., Riera, D.: Verification of UML/OCL class diagrams using
constraint programming. In: ICSTW, pp. 73–80 (2008)
11. Darabos, A., Pataricza, A., Varró, D.: Towards testing the implementation of graph
transformations. In: GTVMT, ENTCS. Elsevier (2006)
12. Ehrig, K., Küster, J.M., Taentzer, G.: Generating instance models from meta mod-
els. Softw. Syst. Model. 8(4), 479–500 (2009)
13. Fleurey, F., Baudry, B., Muller, P.-A., Le Traon, Y.: Towards dependable model
transformations: qualifying input test data. SoSyM, 8 (2007)
14. González, C.A., Cabot, J.: Test data generation for model transformations com-
bining partition and constraint analysis. In: Di Ruscio, D., Varró, D. (eds.) ICMT
2014. LNCS, vol. 8568, pp. 25–41. Springer, Cham (2014). https://doi.org/10.1007/
978-3-319-08789-4 3
15. Guerra, E., Soeken, M.: Specification-driven model transformation testing. Softw.
Syst. Model. 14(2), 623–644 (2015)
16. Jackson, D.: Alloy: a lightweight object modelling notation. ACM Trans. Softw.
Eng. Methodol. 11(2), 256–290 (2002)
17. Jackson, E.K., Simko, G., Sztipanovits, J.: Diversely enumerating system-level
architectures. In: International Conference on Embedded Software, p. 11 (2013)
18. Jia, Y., Harman, M.: An analysis and survey of the development of mutation
testing. IEEE Trans. Softw. Eng. 37(5), 649–678 (2011)
19. Kang, E., Jackson, E., Schulte, W.: An approach for effective design space explo-
ration. In: Calinescu, R., Jackson, E. (eds.) Monterey Workshop 2010. LNCS, vol.
6662, pp. 33–54. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-
21292-5 3
20. Le Berre, D., Parrain, A.: The sat4j library. J. Satisf. Boolean Model. Comput. 7,
59–64 (2010)
21. Micskei, Z., Szatmári, Z., Oláh, J., Majzik, I.: A concept for testing robustness
and safety of the context-aware behaviour of autonomous systems. In: Jezic, G.,
Kusek, M., Nguyen, N.-T., Howlett, R.J., Jain, L.C. (eds.) KES-AMSTA 2012.
LNCS (LNAI), vol. 7327, pp. 504–513. Springer, Heidelberg (2012). https://doi.
org/10.1007/978-3-642-30947-2 55
22. Mottu, J.-M., Baudry, B., Le Traon, Y.: Mutation analysis testing for model trans-
formations. In: Rensink, A., Warmer, J. (eds.) ECMDA-FA 2006. LNCS, vol. 4066,
pp. 376–390. Springer, Heidelberg (2006). https://doi.org/10.1007/11787044 28
23. Mottu, J.-M., Simula, S.S., Cadavid, J., Baudry, B.: Discovering model transfor-
mation pre-conditions using automatically generated test models. In: ISSRE, pp.
88–99. IEEE, November 2015
24. The Object Management Group.: Object Constraint Language, v2.0, May 2006
25. Ratiu, D., Voelter, M.: Automated testing of DSL implementations: experiences
from building mbeddr. In: AST@ICSE 2016, pp. 15–21 (2016)
26. Reid, S.C.: An empirical analysis of equivalence partitioning, boundary value anal-
ysis and random testing. In: Software Metrics Symposium, pp. 64–73 (1997)
27. Rensink, A.: Isomorphism checking in GROOVE. ECEASST 1 (2006)
28. Rensink, A., Distefano, D.: Abstract graph transformation. Electron. Notes Theor.
Comput. Sci. 157(1), 39–59 (2006)
29. Reps, T.W., Sagiv, M., Wilhelm, R.: Static program analysis via 3-valued logic.
In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 15–30. Springer,
Heidelberg (2004). https://doi.org/10.1007/978-3-540-27813-9 2
30. Salay, R., Famelis, M., Chechik, M.: Language independent refinement using partial
modeling. In: de Lara, J., Zisman, A. (eds.) FASE 2012. LNCS, vol. 7212, pp. 224–
239. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28872-2 16
31. Schonbock, J., Kappel, G., Wimmer, M., Kusel, A., Retschitzegger, W., Schwinger,
W.: TETRABox - a generic white-box testing framework for model transforma-
tions. In: APSEC, pp. 75–82. IEEE, December 2013
32. Semeráth, O., Barta, Á., Horváth, Á., Szatmári, Z., Varró, D.: Formal validation of
domain-specific languages with derived features and well-formedness constraints.
Softw. Syst. Model. 16(2), 357–392 (2017)
33. Semeráth, O., Nagy, A.S., Varró, D.: A graph solver for the automated generation
of consistent domain-specific models. In: 40th International Conference on Software
Engineering (ICSE 2018), Gothenburg, Sweden. ACM (2018)
34. Semeráth, O., Varró, D.: Graph constraint evaluation over partial models by con-
straint rewriting. In: Guerra, E., van den Brand, M. (eds.) ICMT 2017. LNCS,
vol. 10374, pp. 138–154. Springer, Cham (2017). https://doi.org/10.1007/978-3-
319-61473-1 10
35. Semeráth, O., Vörös, A., Varró, D.: Iterative and incremental model generation
by logic solvers. In: Stevens, P., Wasowski,
A. (eds.) FASE 2016. LNCS, vol.
9633, pp. 87–103. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-
49665-7 6
36. Sen, S., Baudry, B., Mottu, J.-M.: Automatic model generation strategies for model
transformation testing. In: Paige, R.F. (ed.) ICMT 2009. LNCS, vol. 5563, pp. 148–
164. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02408-5 11
37. The Eclipse Project.: Eclipse Modeling Framework. https://www.eclipse.org/
modeling/emf/
38. The Eclipse Project.: EMF DiffMerge. http://wiki.eclipse.org/EMF DiffMerge
39. Torlak, E., Jackson, D.: Kodkod: a relational model finder. In: Grumberg, O.,
Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 632–647. Springer, Heidelberg
(2007). https://doi.org/10.1007/978-3-540-71209-1 49
40. Torrini, P., Heckel, R., Ráth, I.: Stochastic simulation of graph transformation
systems. In: Rosenblum, D.S., Taentzer, G. (eds.) FASE 2010. LNCS, vol. 6013, pp.
154–157. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12029-
9 11
41. Ujhelyi, Z., Bergmann, G., Hegedüs, Á., Horváth, Á., Izsó, B., Ráth, I., Szatmári,
Z., Varró, D.: EMF-IncQuery: an integrated development environment for live
model queries. Sci. Comput. Program. 98, 80–99 (2015)
42. Vallecillo, A., Gogolla, M., Burgueño, L., Wimmer, M., Hamann, L.: Formal spec-
ification and testing of model transformations. In: Bernardo, M., Cortellessa, V.,
Pierantonio, A. (eds.) SFM 2012. LNCS, vol. 7320, pp. 399–437. Springer, Heidel-
berg (2012). https://doi.org/10.1007/978-3-642-30982-3 11
43. Varró, D., Semeráth, O., Szárnyas, G., Horváth, Á.: Towards the automated gen-
eration of consistent, diverse, scalable and realistic graph models. In: Heckel, R.,
Taentzer, G. (eds.) Graph Transformation, Specifications, and Nets. LNCS, vol.
10800, pp. 285–312. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
75396-6 16
44. Viatra Solver Project (2018). https://github.com/viatra/VIATRA-Generator
45. Wang, J., Kim, S.-K., Carrington, D.: Verifying metamodel coverage of model
transformations. In: Software Engineering Conference, p. 10 (2006)
46. Yakindu Statechart Tools.: Yakindu. http://statecharts.org/
Optimising Spectrum Based Fault
Localisation for Single Fault Programs
Using Specifications
1 Introduction
Faulty software is estimated to cost 60 billion dollars to the US economy per
year [1] and has been single-handedly responsible for major newsworthy catas-
trophes1 . This problem is exacerbated by the fact that debugging (defined as
the process of finding and rectifying a fault) is complex and time consuming –
estimated to consume 50–60% of the time a programmer spends in the main-
tenance and development cycle [2]. Consequently, the development of effective
and efficient methods for software fault localisation has the potential to greatly
reduce costs, wasted programmer time and the possibility of catastrophe.
In this paper, we advance the state of the art in lightweight fault localisation
by building on research in spectrum-based fault localisation (sbfl). sbfl is one
This research was supported by the Innovate UK project 113099 SECT-AIR.
1 https://www.newscientist.com/gallery/software-faults/.
2 Preliminaries
In this section we formally present the preliminaries for understanding our fault
localisation approach. In particular, we describe probands, proband models, and
sbfl.
2.1 Probands
Following the terminology in Steimann et al. [22], a proband is a faulty program
together with its test suite, and can be used for evaluating the performance of fault localisation techniques.
In this section we define proband models, which are the principal formal objects
used in sbfl. Informally, a proband model is a mathematical abstraction of a
proband. We assume the existence of a given proband in which the uuts have
already been identified for the faulty program and appropriately labeled C1, . . . ,
Cn, and assume a total of n uuts. We begin as follows.
– for all 0 < i ≤ n, c^k_i = 1 if the i-th uut is covered by the test case associated with t_k, and 0 otherwise.
– c^k_{n+1} = 1 if the test case associated with t_k fails and 0 if it passes.
We also call a set of coverage vectors T the fault localisation data or a dataset.
Intuitively, each coverage vector can be thought of as a mathematical abstraction
of an associated test case which describes which uuts were executed/covered in
that test case. We also use the following additional notation. If the last argument
of a coverage vector in T is the number k it is denoted tk where k uniquely
identifies a coverage vector in T and the corresponding test case in the associated
test suite. In general, for each tk ∈ T, cki is a coverage variable and gives the
value of the i-th argument in tk . If ckn+1 = 1, then tk is called a failing coverage
vector, and passing otherwise. The set of failing coverage vectors/the event of an
error is denoted E (such that the set of passing vectors is then E). Element ckn+1
is also denoted ek (as it describes whether the error occurred). For convenience,
we may represent the set of coverage vectors T with a coverage matrix, where for
all 0 < i n and tk ∈ T the cell intersecting the i-th column and k-th row is cki
and represents whether the i-th uut was covered in the test case corresponding
to tk . The cell intersecting the last column and k-th row is ek and represents
whether tk is a failing or passing test case. Fig. 2 is an example coverage matrix.
In practice, given a program and an input vector, one can extract coverage
information from an associated test case using established tools2 .
Example 4. For the test suite given in Example 2 we can devise a set of coverage vectors T = {t1, . . . , t10} in which t1 = ⟨1, 0, 1, 1, 0, 1, 1⟩, t2 = ⟨1, 0, 0, 1, 1, 1, 2⟩, t3 = ⟨1, 0, 0, 1, 0, 1, 3⟩, t4 = ⟨1, 1, 0, 0, 0, 0, 4⟩, t5 = ⟨1, 1, 0, 0, 0, 0, 5⟩, t6 = ⟨1, 0, 0, 0, 1, 0, 6⟩, t7 = ⟨1, 0, 0, 1, 1, 0, 7⟩, t8 = ⟨1, 0, 0, 0, 0, 0, 8⟩, t9 = ⟨1, 1, 0, 0, 1, 0, 9⟩, and t10 = ⟨1, 1, 1, 0, 0, 0, 10⟩. Here, coverage vector tk is associated with the k-th input vector described in the list in Example 2. To illustrate how input and coverage vectors relate, we observe that t10 is associated with a test case with input vector ⟨0, 1, 2⟩ which executes the statements labeled C1, C2 and C3, does not execute the statements labeled C4 and C5, and does not result in error. Consequently c^10_1 = c^10_2 = c^10_3 = 1, c^10_4 = c^10_5 = e^10 = 0, and k = 10, such that t10 = ⟨1, 1, 1, 0, 0, 0, 10⟩ (by the definition of coverage vectors). The coverage matrix representing T is given in Fig. 2.
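For concreteness, a short Python sketch (ours) encoding the coverage vectors of Example 4 and splitting them into failing and passing vectors:

```python
# Coverage vectors of Example 4: five coverage entries c1..c5, the error entry e,
# and the identifier k.
T = [
    (1, 0, 1, 1, 0, 1, 1), (1, 0, 0, 1, 1, 1, 2), (1, 0, 0, 1, 0, 1, 3),
    (1, 1, 0, 0, 0, 0, 4), (1, 1, 0, 0, 0, 0, 5), (1, 0, 0, 0, 1, 0, 6),
    (1, 0, 0, 1, 1, 0, 7), (1, 0, 0, 0, 0, 0, 8), (1, 1, 0, 0, 1, 0, 9),
    (1, 1, 1, 0, 0, 0, 10),
]
failing = [t for t in T if t[-2] == 1]   # E = {t1, t2, t3}
passing = [t for t in T if t[-2] == 0]
print(len(failing), len(passing))        # 3 7
```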
We often use the notation PM_T to denote the program model PM associated with T. The final component C_|PM| is also denoted E (denoting the event of the error). Each member of a program model is called a program component or event, and if c^k_i = 1 we say Ci occurred in tk, that tk covers Ci, and say that Ci is faulty just in case its corresponding uut is faulty. Following the definition above, each component Ci is the set of vectors in which Ci is covered, and components obey set-theoretic relationships. For instance, for all components Ci, Cj ∈ PM, we have ∀tk ∈ Cj. c^k_i = 1 just in case Cj ⊆ Ci. In general, we assume that E contains at least one coverage vector and each coverage vector covers at least one component. Members of E and Ē are called failing and passing vectors, respectively.
2 For C programs Gcov can be used, available at http://www.gcovr.com.
Example 6. For the proband model of the running example ⟨PM, T⟩ (where PM = ⟨C1, . . . , C5, E⟩ and T is represented by the coverage matrix in Fig. 2), the spectra for C1, . . . , C5, and E are ⟨3, 0, 7, 0⟩, ⟨0, 3, 3, 4⟩, ⟨1, 2, 2, 5⟩, ⟨3, 0, 1, 6⟩, ⟨1, 2, 3, 4⟩, and ⟨3, 0, 0, 7⟩, respectively.
Following Naish et al. [7], we define a suspiciousness measure as follows.
Under the assumption that there is a single fault in the program, Naish
et al. argue that a measure must have this property to be optimal [7]. Informally,
the first condition demands that uuts covered by all failing test cases are more sus-
picious than anything else. The rationale here is that if there is only one faulty uut
in the program, then it must be executed by all failing test cases (otherwise there
would be some failing test case which executes no fault – which is impossible given
it is assumed that all errors are caused by the execution of some faulty uut) [7,30].
The second demands that of two uuts covered by all failing test cases, the one which
is executed by fewer passing test cases is more suspicious.
An example of a single fault optimal measure is the Naish-I measure w(Ci) = a^i_ef − a^i_ep / (a^i_ep + a^i_np + 1) [31]. A framework that optimises any given sbfl measure to being single fault optimal was first given by Naish [31]. For any suspiciousness measure w scaled from 0 to 1, we can construct the single fault optimised version of w (written Opt_w) as follows (here, we use the equivalent formulation of Landsberg et al. [4]): Opt_w(Ci) = a^i_np + 2 if a^i_ef = |E|, and w(Ci) otherwise.
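The following Python sketch (ours) puts the pieces together on the data of Example 4: it computes each component's spectrum ⟨a_ef, a_nf, a_ep, a_np⟩, scores it with Naish-I, and ranks the components; a generic single fault optimisation wrapper in the sense of Opt_w is included for a measure scaled from 0 to 1.

```python
T = [
    (1, 0, 1, 1, 0, 1, 1), (1, 0, 0, 1, 1, 1, 2), (1, 0, 0, 1, 0, 1, 3),
    (1, 1, 0, 0, 0, 0, 4), (1, 1, 0, 0, 0, 0, 5), (1, 0, 0, 0, 1, 0, 6),
    (1, 0, 0, 1, 1, 0, 7), (1, 0, 0, 0, 0, 0, 8), (1, 1, 0, 0, 1, 0, 9),
    (1, 1, 1, 0, 0, 0, 10),
]
n = 5                                 # uuts C1..C5; T[k][n] is the error entry

def spectrum(i):
    """<a_ef, a_nf, a_ep, a_np> for component Ci (1-based index)."""
    ef = sum(1 for t in T if t[i - 1] == 1 and t[n] == 1)
    nf = sum(1 for t in T if t[i - 1] == 0 and t[n] == 1)
    ep = sum(1 for t in T if t[i - 1] == 1 and t[n] == 0)
    np_ = sum(1 for t in T if t[i - 1] == 0 and t[n] == 0)
    return ef, nf, ep, np_

def naish1(ef, nf, ep, np_):
    """Naish-I: a_ef - a_ep / (a_ep + a_np + 1); stated to be single fault optimal."""
    return ef - ep / (ep + np_ + 1)

def single_fault_optimised(w, total_failing):
    """Opt_w for a measure w scaled from 0 to 1: components covered by every
    failing vector score a_np + 2; all others keep their w score."""
    return lambda ef, nf, ep, np_: np_ + 2 if ef == total_failing else w(ef, nf, ep, np_)

ranking = sorted(range(1, n + 1), key=lambda i: naish1(*spectrum(i)), reverse=True)
print(["C%d" % i for i in ranking])   # decreasing suspiciousness
```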
We now describe the established sbfl algorithm [4,7–12]. The method produces a list of program component indices ordered by suspiciousness, as a function of a set of coverage vectors T (taken from a proband model ⟨PM, T⟩) and a suspiciousness measure w. As the algorithm is simple, we describe it informally in three stages, as follows. First, the program spectrum for each program component is constructed as a function of T. Second, the indices of program components are ordered in a suspiciousness list in decreasing order of suspiciousness. Third, the suspiciousness list is returned to the user, who will inspect each uut corresponding to each index in the suspiciousness list.
In this section, we identify a new property for the optimality of a given dataset T
for use in fault localisation. Throughout we make two assumptions: firstly, that a single fault optimal measure w is being used and, secondly, that there is a single fault in the given faulty program (henceforth our two assumptions). Let ⟨PM, T⟩ be a given sample proband model; then we have the following:
If this condition holds, then we say the dataset T (and its associated test
suite) satisfies this property of single fault optimality. Informally, the condition
demands that if a uut is covered by all failing test cases in the sample test suite
then it is covered by all failing test cases in the population. If our two assumptions
hold, we argue it is a desirable that a test suite satisfies this property. This
is because the fault is assumed to be covered by all failing test cases in the
population (similar to the rationale of Naish et al. [7]), and as uuts executed
by all failing test cases in the sample are investigated first when a single fault
optimal measure is being used, it is desirable that uuts not covered by all failing
test cases in the population are less suspicious in order to guarantee the fault
is found earlier. An additional desirable feature of knowing one’s data satisfies
this property, is that we do not have to add any more failing test cases to a test
suite, given it is then impossible to improve fault localization effectiveness by
adding more failing test cases under our two assumptions.
4 Algorithm
In this section we present an algorithm which outputs single fault optimal data
for a given faulty program. We assume several preconditions for our algorithm.
– For the given faulty program, at least one uut is executed by all failing
test cases (for C programs this could be a variable initialization in the main
function).
– The population proband model is available (but as we shall see in the next
section, practical implementations will not require this).
– We also assume that E is a mutable set, and shall make use of a choose(X) subroutine which non-deterministically returns a singleton set containing a member of X (if one exists; otherwise it returns the empty set).
Finally, we informally observe that the maximum size of the E returned is the
number of uuts. In this case E is input to the algorithm with a failing vector that
covers all components, and choose always returns a failing vector that covers 1
fewer uuts than the failing vector covering the fewest uuts already in E (noting
that we assume at least one component will always be covered). The minimum
is one. In this case E is input to the algorithm with a failing vector which covers
some components and the post-condition is already fulfilled. In general, E can
potentially be much smaller than E ∗ .
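Algorithm 1 itself is not reproduced in this excerpt; the Python sketch below is our reconstruction of the greedy scheme described above, under the stated preconditions: starting from a seed failing vector, keep choosing a population vector that leaves uncovered some uut currently covered by every vector in E, until no such vector exists.

```python
def covered_by_all(E, n):
    """Indices of uuts covered by every failing coverage vector in E."""
    return {i for i in range(n) if all(t[i] == 1 for t in E)}

def single_fault_optimal_data(population, seed, n):
    """Greedy reconstruction (ours): population is E*, the set of failing
    coverage vectors (first n entries are coverage bits); seed is an initial
    failing vector. On exit, every uut covered by all vectors of the result
    is covered by all vectors of the population."""
    E = [seed]
    while True:
        always = covered_by_all(E, n)
        candidate = next((t for t in population
                          if any(t[i] == 0 for i in always)), None)
        if candidate is None:
            return E
        E.append(candidate)   # strictly shrinks 'always', so this terminates
```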
5 Implementation
We now discuss our implementation of the algorithm. In practice, we can leverage
model checkers to compute members of E ∗ (the population set of failing vectors)
on the fly, where computing E ∗ as a pre-condition would usually be intractable.
This can be done by appeal to an SMT solving subroutine, which we describe as follows. Given a formal model of some code Fcode, a formal specification φ, a set of Booleans {C1, . . . , Cn} which are true just in case the corresponding uut is executed in a given execution, and a set E ⊆ E∗, we can use an SMT solver to return a
satisfying assignment by calling SMT(Fcode ∧ ¬φ ∧ ⋁_{i : ∀tk ∈ E, c^k_i = 1} (Ci = 0)), and then extracting a coverage vector from that assignment. A subroutine which
returns this coverage vector (or the empty set if one does not exist) can act
as a substitute for the choose subroutine in Algorithm 1, and the generation
of a static object E ∗ is no longer required as an input to the algorithm. Our
implementation of this is called sfo (single fault optimal data generation tool).
We now discuss extensions of sfo. It is known that adding passing executions helps in sbfl [4,5,7–12]; thus, to develop a more effective fault localisation procedure, we developed a second implementation sfo p (sfo with passing traces) that runs sfo and then adds passing test cases. To do this, after running sfo we call an SMT solver 20 times to find up to 20 new passing executions; on each call, if the vector found has new coverage properties (it does not cover exactly the same uuts as some passing vector already computed), it is added to the set of passing vectors.
Our implementations of sfo and sfo p are integrated into a branch of the
model checker cbmc [32]. Our branch of the tool is available for download at the
URL given in footnote 3. Along with generating fault localisation data, our
implementations rank uuts by degree of suspiciousness according to the Naish-I
measure and report this fault localisation data to the user.
6 Experimentation
In this section we provide details of evaluation results for the use of sfo and sfo p
in fault localisation. The purpose of the experiment is to demonstrate that imple-
mentations of Algorithm 1 can be used to facilitate efficient and effective fault
localisation in practice on small programs (≤2.5kloc). We think generation
of fault localisation information in a few seconds (≤2) is sufficient to demon-
strate practical efficiency, and ranking the fault in the top handful of the most
suspicious lines of code (≤5) on average is sufficient to demonstrate practical
effectiveness. In the remainder of this section we present our experimental setup
(where we describe our scoring system and benchmarks), and our results.
6.1 Setup
For the purposes of comparison, we tested the fault localisation potential of sfo
and sfo p against a method named 1f , which performs sbfl when only a single
failing test case is generated by cbmc (and thus all uuts covered by the test
case are equally suspicious). We used the following scoring method to evaluate
the effectiveness of each of the methods for each benchmark. We envisage an
engineer who is inspecting each loc in descending order of suspiciousness using
a given strategy (inspecting lines that appear earlier in the code first in the case
of ties). We rank alternative techniques by the number of non-faulty loc that
are investigated until the engineer finds a fault. Finally, we report the average of
these scores for the benchmarks to give us an overall measure of fault localisation
effectiveness.
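A small sketch of this scoring method, assuming each line of code carries a suspiciousness score and the faulty lines of the benchmark are known (all names here are illustrative, not part of our tool):

def wasted_effort(suspiciousness, faulty_lines):
    """Count non-faulty loc inspected before the engineer reaches a fault.

    suspiciousness -- {line_number: score}, higher means more suspicious
    faulty_lines   -- set of line numbers known to be faulty
    """
    # inspect lines in descending suspiciousness; earlier lines first on ties
    order = sorted(suspiciousness, key=lambda ln: (-suspiciousness[ln], ln))
    inspected = 0
    for line in order:
        if line in faulty_lines:
            return inspected
        inspected += 1
    return inspected

def average_score(per_benchmark_scores):
    """The overall effectiveness measure is the mean score over all benchmarks."""
    return sum(per_benchmark_scores) / len(per_benchmark_scores)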
3 https://github.com/theyoucheng/cbmc.
4 http://sir.unl.edu/portal/index.php.
5 Benchmarks can be accessed at https://sv-comp.sosy-lab.org/2018/.
6 For our experiment we activated assertion statement P5a and fault 32c.
We now discuss the results of the three techniques 1f , sfo and sfo p . On
average, 1f located a fault after investigating 17.23 lines of code (4.09% of the
program on average). The results here are perhaps better than expected. We
observed that the single failing test case consistently returned good fault locali-
sation potential given the use of slicing by the technique.
We now discuss sfo. On average, sfo located a fault after investigating 16
lines of code (3.8% of the program on average). Thus, the improvement over 1f
is very small. When only one failing test case was available for sfo (i.e. |E| = 1)
we emphasise that the SMT solver could not find any other failing traces which
covered different parts of the program. In such cases, sfo performed the same
as 1f (as expected). However, when there was more than one failing test case
available (i.e. |E| > 1), sfo always made a small improvement. Accordingly, for
benchmarks 1, 2, 3, 5, 9, and 12 the improvements in terms of fewer loc examined
are 2, 6, 3, 1, 2, and 3, respectively. An improvement in benchmarks where sfo
generated more than one test case is to be expected, given there was always a
fault covered by all failing test cases in each program (even in programs with
multiple faults), thus taking advantage of the property of single fault optimal
data. Finally, we conjecture that on programs with more failing test cases avail-
able in the population, and on longer faulty programs, that this improvement
will be larger.
We now discuss sfo p . On average, sfo p located a fault after investigating
4.08 loc (0.97% of each program on average). Thus, the improvement over
the other techniques is quite large (four times as effective as 1f ). Moreover, this
effectiveness came at very little expense to runtime – sfo p had an average runtime
of 1.06 s, which is comparable to the runtime of 1f of 0.78 s. This is despite
the fact that sfo p generated over 7 executions on average. We consequently
conclude that implementations of Algorithm 1 can be used to facilitate efficient
and effective fault localisation in practice on small programs.
7 Related Work
The techniques discussed in this paper improve the quality of data usable for
sbfl. We divide the research in this field into the following areas; many of these
methods can potentially be combined with our technique.
Test Suite Expansion. One approach to improving test suites is to add more
test cases which satisfy a given criterion. A prominent criterion is that the test
suite has sufficient program coverage, where studies suggest that test suites with
high coverage improve fault localisation [15–17,20]. Other ways to improve test
suites for sbfl are as follows. Li et al. generate test suites for sbfl, considering
the ratio of failing to passing test cases to be more important than their number [35]. Zhang
et al. consider cloning failed test cases to improve sbfl [13]. Perez et al. develop a
metric for diagnosing whether a test suite is of sufficient quality for sbfl to take
place [14]. Li et al. consider weighing different test cases differently [36]. Aside
from coverage criteria, methods have been studied which generate test cases
with a minimal distance from a given failed test case [18]. Baudry et al. use
a bacteriological approach in order to generate test suites that simultaneously
facilitate both testing and fault localisation [19]. Concolic execution methods
have been developed to add test cases to a test suite based on their similarity to
an initial failing run [20].
Prominent approaches which leverage model checkers for fault localisation
are as follows. Groce [33] uses integer linear programming to find a passing test
case most similar to a failing one and then compares the difference. Schuppan and
Biere [37] generate short counterexamples for use in fault localisation, where
a short counterexample will usually mean fewer uuts for the user to inspect.
Griesmayer [38] and Birch et al. [39] use model checkers to find failing execu-
tions and then look for whether a given number of changes to values of variables
can be made to make the counterexample disappear. Gopinath et al. [40] com-
pute minimal unsatisfiable cores in a given failing test case, where statements in
the core will be given a higher suspiciousness level in the spectra ranking. Addi-
tionally, when generating a new test, they generate an input whose test case is
most similar to the initial run in terms of its coverage of the statements. Fey
et al. [41] use SAT solvers to localise faults on hardware with LTL specifications.
In general, experimental scale is limited to a small number of programs in these
studies, and we think our experimental component provides an improvement in
terms of experimental scale (13 programs).
8 Conclusion
In this paper, we have presented a method to generate single fault optimal data
for use with sbfl. Experimental results on our implementation sfo p , which inte-
grates single fault optimal data along with passing test cases, demonstrate that
small optimized fault localisation data can be generated efficiently in practice
(1.06 s on average), and that subsequent fault localization can be performed effec-
tively using this data (investigating 4.06 loc until a fault is found). We envisage
that implementations of the algorithm can be used in two different scenarios.
In the first, the test suite generated can be used in standalone fault localisa-
tion, providing a small and low cost test suite useful for repeating iterations of
simultaneous testing and fault localisation during program development. In the
second, the data generated can be added to any pre-existing data associated
with a test suite, which may be useful at the final testing stage where we may
wish to optimise single fault localisation.
Future work involves finding larger benchmarks on which to use our implementation,
developing further properties, and developing methods for use with programs with
multiple faults. We would also like to combine our technique with existing test
suite generation algorithms in order to investigate how much test suites can be
additionally improved for the purposes of fault localisation.
References
1. Zhivich, M., Cunningham, R.K.: The real cost of software errors. IEEE Secur. Priv.
7(2), 87–90 (2009)
2. Collofello, J.S., Woodfield, S.N.: Evaluating the effectiveness of reliability-
assurance techniques. J. Syst. Softw. 9(3), 745–770 (1989)
3. Wong, W.E., Gao, R., Li, Y., Abreu, R., Wotawa, F.: A survey on software fault
localization. IEEE Trans. Softw. Eng. 42(8), 707–740 (2016)
4. Landsberg, D., Chockler, H., Kroening, D., Lewis, M.: Evaluation of measures for
statistical fault localisation and an optimising scheme. In: Egyed, A., Schaefer,
I. (eds.) FASE 2015. LNCS, vol. 9033, pp. 115–129. Springer, Heidelberg (2015).
https://doi.org/10.1007/978-3-662-46675-9_8
5. Landsberg, D., Chockler, H., Kroening, D.: Probabilistic fault localisation. In:
Bloem, R., Arbel, E. (eds.) HVC 2016. LNCS, vol. 10028, pp. 65–81. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-49052-6_5
6. Landsberg, D.: Methods and measures for statistical fault localisation. Ph.D. thesis,
University of Oxford (2016)
7. Naish, L., Lee, H.J., Ramamohanarao, K.: A model for spectra-based software
diagnosis. ACM Trans. Softw. Eng. Methodol. 20(3), 1–11 (2011)
8. Lucia, L., Lo, D., Jiang, L., Thung, F., Budi, A.: Extended comprehensive study of
association measures for fault localization. J. Softw. Evol. Process 26(2), 172–219
(2014)
9. Wong, W.E., Debroy, V., Gao, R., Li, Y.: The DStar method for effective software
fault localization. IEEE Trans. Reliab. 63(1), 290–308 (2014)
10. Wong, W.E., Debroy, V., Choi, B.: A family of code coverage-based heuristics for
effective fault localization. JSS 83(2), 188–208 (2010)
11. Yoo, S.: Evolving human competitive spectra-based fault localisation techniques.
In: Fraser, G., Teixeira de Souza, J. (eds.) SSBSE 2012. LNCS, vol. 7515, pp. 244–
258. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33119-0_18
12. Kim, J., Park, J., Lee, E.: A new hybrid algorithm for software fault localization.
In: IMCOM, pp. 50:1–50:8. ACM (2015)
13. Zhang, L., Yan, L., Zhang, Z., Zhang, J., Chan, W.K., Zheng, Z.: A theoretical
analysis on cloning the failed test cases to improve spectrum-based fault localiza-
tion. JSS 129, 35–57 (2017)
14. Perez, A., Abreu, R., van Deursen, A.: A test-suite diagnosability metric for
spectrum-based fault localization approaches. In: ICSE (2017)
15. Jiang, B., Chan, W.K., Tse, T.H.: On practical adequate test suites for integrated
test case prioritization and fault localization. In: International Conference on Qual-
ity Software, pp. 21–30 (2011)
16. Santelices, R., Jones, J.A., Yu, Y., Harrold, M.J.: Lightweight fault-localization
using multiple coverage types. In: ICSE, pp. 56–66 (2009)
17. Feldt, R., Poulding, S., Clark, D., Yoo, S.: Test set diameter: quantifying the
diversity of sets of test cases. CoRR, abs/1506.03482 (2015)
18. Jin, W., Orso, A.: F3: fault localization for field failures. In: ISSTA, pp. 213–223
(2013)
19. Baudry, B., Fleurey, F., Le Traon, Y.: Improving test suites for efficient fault
localization. In: ICSE, pp. 82–91. ACM (2006)
20. Artzi, S., Dolby, J., Tip, F., Pistoia, M.: Directed test generation for effective fault
localization. In: ISSTA, pp. 49–60 (2010)
21. Perez, A., Abreu, R., D’Amorim, M.: Prevalence of single-fault fixes and its impact
on fault localization. In: 2017 ICST, pp. 12–22 (2017)
22. Steimann, F., Frenkel, M., Abreu, R.: Threats to the validity and value of empirical
assessments of the accuracy of coverage-based fault locators. In: ISSTA, pp. 314–
324. ACM (2013)
23. Groce, A.: Error explanation with distance metrics. In: Jensen, K., Podelski, A.
(eds.) TACAS 2004. LNCS, vol. 2988, pp. 108–122. Springer, Heidelberg (2004).
https://doi.org/10.1007/978-3-540-24730-2_8
24. Steimann, F., Frenkel, M.: Improving coverage-based localization of multiple faults
using algorithms from integer linear programming. In: ISSRE, pp. 121–130, 27–30
November 2012
25. Abreu, R., Zoeteweij, P., van Gemund, A.J.: An evaluation of similarity coefficients
for software fault localization. In: PRDC, pp. 39–46 (2006)
26. DiGiuseppe, N., Jones, J.A.: On the influence of multiple faults on coverage-based
fault localization. In: ISSTA, pp. 210–220. ACM (2011)
27. Jones, J.A., Harrold, M.J., Stasko, J.: Visualization of test information to assist
fault localization. In: Proceedings of the 24th International Conference on Software
Engineering, ICSE 2002, pp. 467–477. ACM (2002)
28. Wong, W.E., Qi, Y.: Effective program debugging based on execution slices and
inter-block data dependency. JSS 79(7), 891–903 (2006)
29. Liblit, B., Naik, M., Zheng, A.X., Aiken, A., Jordan, M.I.: Scalable statistical bug
isolation. SIGPLAN Not. 40(6), 15–26 (2005)
30. Naish, L., Lee, H.J.: Duals in spectral fault localization. In: Australian Conference
on Software Engineering (ASWEC), pp. 51–59. IEEE (2013)
31. Naish, L., Lee, H.J., Ramamohanarao, K.: Spectral debugging: how much better
can we do? In: ACSC, pp. 99–106 (2012)
32. Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In:
Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp. 168–176.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24730-2_15
33. Groce, A.: Error explanation and fault localization with distance metrics. Ph.D.
thesis, Carnegie Mellon University (2005)
34. CBMC. http://www.cprover.org/cbmc/
35. Li, N., Wang, R., Tian, Y., Zheng, W.: An effective strategy to build up a balanced
test suite for spectrum-based fault localization. Math. Probl. Eng. 2016, 13 (2016)
36. Li, Y., Liu, C.: Effective fault localization using weighted test cases. J. Soft. 9, 08
(2014)
37. Schuppan, V., Biere, A.: Shortest counterexamples for symbolic model checking of
LTL with past. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol.
3440, pp. 493–509. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31980-1_32
38. Griesmayer, A., Staber, S., Bloem, R.: Fault localization using a model checker.
Softw. Test. Verif. Reliab. 20(2), 149–173 (2010)
39. Birch, G., Fischer, B., Poppleton, M.: Fast test suite-driven model-based fault
localisation with application to pinpointing defects in student programs. Soft. Syst.
Model. (2017)
40. Gopinath, D., Zaeem, R.N., Khurshid, S.: Improving the effectiveness of
spectra-based fault localization using specifications. In: Proceedings of the 27th
IEEE/ACM ASE, pp. 40–49 (2012)
41. Fey, G., Staber, S., Bloem, R., Drechsler, R.: Automatic fault localization for prop-
erty checking. CAD 27(6), 1138–1149 (2008)
42. Vidacs, L., Beszedes, A., Tengeri, D., Siket, I., Gyimothy, T.: Test suite reduction
for fault detection and localization. In: CSMR-WCRE, pp. 204–213, February 2014
43. Xuan, J., Monperrus, M.: Test case purification for improving fault localization.
In: FSE, FSE 2014, pp. 52–63. ACM (2014)
44. Alves, E., Gligoric, M., Jagannath, V., d’Amorim, M.: Fault-localization using
dynamic slicing and change impact analysis. In: ASE, pp. 520–523 (2011)
45. Xiaolin, J., Jiang, S., Chen, X., Wang, X., Zhang, Y., Cao, H.: HSFal: effective
fault localization using hybrid spectrum of full slices and execution slices. J. Syst.
Softw. 90, 3–17 (2014)
TCM: Test Case Mutation to Improve
Crash Detection in Android
1 Introduction
As of April 2016, there were over 2.6 billion smartphone users worldwide, and
this number is expected to grow [1]. Over the last decade there has been an
increasing focus on mobile application testing in top testing conferences and
journals [2]. Android applications have the largest share of the mobile application
market, with 82.8% of all mobile applications designed for Android [1].
Therefore, we focus on Android GUI Testing in this paper.
The main idea of TCM is to mutate existing test cases to produce richer
test cases in order to increase the number of detected crashes. We first iden-
tify typical crash patterns that exist in Android applications. Then, we develop
mutation operators based on these crash patterns. Typically mutation operators
are applied to the source code of applications. However, in our work we apply
them to test cases.
Typical crash patterns in Android are Unhandled Exceptions, External
Errors, Resource Unavailability, Semantic Errors, and Network-Based Crashes
[3]. We describe one case study for each crash pattern. We define six novel muta-
tion operators (Loop-Stressing, Pause-Resume, Change Text, Toggle Contextual
State, Remove Delays, and Faster Swipe) and relate them to these five crash
patterns.
We implement TCM on top of AndroFrame [4], a fully automated Android GUI
testing tool. We give an overview of TCM in Fig. 1. First, we generate a test suite
for the Application Under Test (AUT) using AndroFrame. AndroFrame obtains an
AUT Model, which is represented as an Extended Labeled Transition System (ELTS).
We then minimize the Generated Test Suite using the AUT Model in order to reduce
test execution costs (Test Suite Minimization). We apply Test Case Mutation (TCM)
on the Minimized Test Suite and obtain a Mutated Test Suite. We use AndroFrame
to execute the Mutated Test Suite and collect Test Results.
We state our contributions as follows:
Fig. 1. TCM overview
2 Background
In this section, we first describe the basics of the Android GUI to facilitate the
understanding of our paper.
Android GUI is based on activities, events, and crashes. An activity is a
container for a set of GUI components. These GUI components can be seen on the
Android screen. Each GUI component has properties that describe boundaries
of the component in pixels (x1 , y1 , x2 , y2 ) and how the user can interact with
the component (enabled, clickable, longclickable, scrollable, password ). Each GUI
component also has a type property from which we can understand whether
the component accepts text input. A GUI component accepts text input if its
password property is true or its type is EditText.
The Android system and the user can interact with GUI components using
events. We divide events into two categories: system events and GUI events
(actions). We show the list of GUI actions that we use in Table 1, which covers
more actions than are typically used in the literature. Note that GUI actions
in Table 1 are possible inputs from the user, whereas system events are not.
We group actions into three categories: non-contextual, contextual, and special.
Non-contextual actions correspond to actions that are triggered by user gestures.
Click and longclick take two parameters, x and y coordinates to click on. Text
takes three parameters, x and y for coordinates and string to describe what to
write. Swipe takes five parameters. The first four parameters describe the start-
ing and the ending coordinates. The fifth parameter is used to adjust the speed
of the swipe. Menu and back actions have no parameters; they simply click
the menu and back buttons of the mobile device, respectively. Contextual actions
correspond to the user changing the contextual state of the AUT. Contextual
state is the concatenation of the global attributes of the mobile device (internet
connectivity, bluetooth, location, planemode, sleeping). The connectivity action
adjusts the internet connectivity of the mobile device (adjusts wifi or mobile
data according to which is available for the mobile device). Bluetooth, location,
and planemode are straightforward. The doze action taps the power button of
the mobile device and puts the device to sleep or wakes it. We use the doze
action to pause and resume the AUT. Our only special action is reinitialize,
which reinstalls and starts an AUT. System events are system generated events,
e.g. battery level, receiving SMS, clock/timer.
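For concreteness, a lightweight encoding of these actions and of the delayed transitions that make up a test case could look as follows; the field order mirrors the (v_i, v_{i+1}, z_i, d_i) notation used for the mutation operators below, and the concrete values are invented for illustration.

# One delayed transition of a test case, written as in the formulas below: (v_i, v_{i+1}, z_i, d_i).
# A GUI action z_i is a (kind, params) pair; the parameter layout follows the text above.
click = ("click", (120, 480))                 # x, y
text = ("text", (60, 200, "hello"))           # x, y, string to type
swipe = ("swipe", (0, 800, 0, 200, 1.0))      # x1, y1, x2, y2, speed
back = ("back", ())
doze = ("doze", ("off",))                     # toggles sleep via the power button

# A test case t is a list of such delayed transitions between GUI states.
t = [
    ("v1", "v1", ("reinitialize", ()), 10),
    ("v1", "v2", click, 1),
    ("v2", "v1", back, 1),
]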
In this section, we first describe typical crash patterns for Android applications
based on related work in the literature [3]. We give a list of the crash patterns
in Table 2 and describe them below.
C4. Semantic Errors. An AUT may crash if it fails to handle certain inputs
given by the user. For example, the AUT may crash instead of generating a warning
if some textbox is left empty or contains unexpected text.
Here n is the length of test case t and t_i = (v_i, v_{i+1}, z_i, d). We pick d = 1 to avoid
a double-click, which may be programmed as a separate action from a single click.
We pick m = 9. We have two motivations for choosing m = 9. First, in our case
studies, we did not encounter a crash when m < 9. Second, although we detect
the same crash when m > 9, we want to keep m as small as possible to keep
test cases small. Loop-stressing may lead to an unhandled exception (C1) due to
stressing third party libraries by invoking them repeatedly. Loop-stressing
may also lead to an external error (C2) if it stresses another application until it
crashes.
M3. Change Text (δ CT ). We assume that existing test cases contain well-
behaving text inputs to explore the AUT as much as possible. To increase the
number of detected crashes, we modify the contents of the texts.
t′ = δCT(t) first picks one random abnormal text manipulation operation and
applies it to a random textentry action of the existing test case t. Abnormal text
manipulation operations can be emptytext, dottext, and longtext, where empty-
text deletes the text, dottext enters a single dot character, and longtext enters a
random string of length >200.
Let z_i^ct denote a random abnormal text manipulation action, where z_i is a
text action, and let d_i^ct denote the new delay required to completely execute z_i^ct.
We define t′ = δCT(t) on test cases as follows:

δCT(t) = t                                          if t contains no textentry action
δCT(t) = t_{1...i−1} · t_i^ct · t_{i+1...n}         otherwise        (3)
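A sketch of δCT over such a tuple representation of test cases (a transition is (v_i, v_{i+1}, z_i, d_i) and an action z_i is a (kind, params) pair); the helper names and the way the new delay is handled are our own simplifications.

import random
import string

def delta_ct(t):
    """delta_CT: apply one random abnormal text manipulation to one random textentry action."""
    text_positions = [i for i, (_, _, (kind, _), _) in enumerate(t) if kind == "text"]
    if not text_positions:
        return list(t)                        # no textentry action: t is returned unchanged
    i = random.choice(text_positions)
    v_from, v_to, (_, (x, y, _old)), d = t[i]
    abnormal = random.choice([
        "",                                                       # emptytext: delete the text
        ".",                                                      # dottext: a single dot
        "".join(random.choices(string.ascii_letters, k=201)),     # longtext: length > 200
    ])
    # in the real operator the delay would be recomputed (d_i^ct) so the new text can be typed
    mutated = (v_from, v_to, ("text", (x, y, abnormal)), d)
    return t[:i] + [mutated] + t[i + 1:]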
M4. Toggle Contextual State (δ TCS ). Existing test suites typically lack con-
textual actions where the condition of the contextual state is crucial to generate
the crash. Therefore, we introduce contextual state toggling with t′ = δTCS(t),
which is defined as follows:

δTCS(t) = t_1 · t_1^tcs · t_2 · t_2^tcs · . . . · t_n · t_n^tcs   (4)

where n is the length of test case t and t_i^tcs is a contextual action transition
(v_{i+1}, v_{i+1}, z^tcs, d). z^tcs corresponds to a random contextual toggle action. We
pick d = 10 s for each contextual action since Android may take a long time
before it stabilizes after the change of contextual state. Toggling the contextual
states of the AUT may result in an external error (C2), or a network-based crash
if the connection failures are not handled correctly (C5).
M5. Remove Delays (δRD). t′ = δRD(t) takes a test case t and sets all of its
delays to 0. When reproduced, the events of t′ will be in the same order as in t,
but sent to the AUT at the earliest possible time.

δRD(t) = (v_1, v_2, z_1, 0) · (v_2, v_3, z_2, 0) · . . . · (v_n, v_{n+1}, z_n, 0)   (5)
If the AUT is communicating with another application, removing delays may
cause the requests to crash the other application. If this case is not handled
in the AUT, the AUT crashes due to external errors (C2). If the AUT’s back-
ground process is affected by the GUI actions, removing delays may cause the
background process to crash due to resource unavailability (C3). If the GUI
actions trigger network requests, having no delays may cause a network-based
crash (C5).
M6. Faster Swipe (δFS). t′ = δFS(t) increases the speed of all swipe actions
of a test case t. Let z_i^fs denote a faster version of z_i, where z_i is a swipe action.
Then, we define δFS on test cases with at least one swipe action as follows:

δFS(t) = t_1^fs · t_2^fs · . . . · t_n^fs   (6)

where n is the length of test case t and

t_i^fs = (v_i, v_{i+1}, z_i, d_i)        if z_i is not a swipe
t_i^fs = (v_i, v_{i+1}, z_i^fs, d_i)     otherwise
If the information presented by the AUT is downloaded from a network or
another application, swiping too fast may cause a network-based crash (C5) due
to the network being unable to provide the necessary data or an external error
(C2). If the AUT is a game, swiping too fast may cause the AUT to throw an
unhandled exception (C1).
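For comparison, δRD and δFS are straightforward on the same tuple representation; the swipe speed-up factor below is an assumption, since the definition only requires the swipe to become faster.

def delta_rd(t):
    """delta_RD: set every delay to 0, keeping the event order of t."""
    return [(v_from, v_to, action, 0) for (v_from, v_to, action, _) in t]

def delta_fs(t, speedup=4.0):
    """delta_FS: make every swipe action faster; other transitions are kept unchanged."""
    mutated = []
    for (v_from, v_to, (kind, params), d) in t:
        if kind == "swipe":
            x1, y1, x2, y2, speed = params
            # z_i^fs: same gesture with a higher speed (the factor is illustrative)
            mutated.append((v_from, v_to, ("swipe", (x1, y1, x2, y2, speed * speedup)), d))
        else:
            mutated.append((v_from, v_to, (kind, params), d))
    return mutated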
TS′ ← ∅
for t ∈ {t : t ∈ TS ∧ t does not crash} do                       ▷ iterate over non-crashing test cases
    if covM(TS′ ∪ {t}) > covM(TS′) then                          ▷ take only test cases that increase coverage
        t′ ← argmin_i t_{1...i} s.t. covM(TS′ ∪ {t_{1...i}}) = covM(TS′ ∪ {t})   ▷ shorten the test case
        TS′ ← TS′ ∪ {t′}                                         ▷ add the shortened test case to the Minimized Test Suite
    end if
end for
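A direct Python transcription of this minimization loop, under the assumption that cov returns the set of AUT-model edges covered by a set of test cases and crashes flags crashing test cases (both are placeholders for what AndroFrame reports):

def minimize(test_suite, cov, crashes):
    """Greedy, coverage-preserving minimization of a generated test suite.

    test_suite -- list of test cases (each a list of transitions)
    cov        -- cov(list_of_test_cases) -> set of covered AUT-model edges
    crashes    -- crashes(test_case) -> True if the test case ends in a crash
    """
    minimized = []
    for t in test_suite:
        if crashes(t):
            continue                                  # only non-crashing test cases are kept
        if len(cov(minimized + [t])) > len(cov(minimized)):
            # shorten t to the smallest prefix that contributes the same coverage
            for i in range(1, len(t) + 1):
                prefix = t[:i]
                if cov(minimized + [prefix]) == cov(minimized + [t]):
                    minimized.append(prefix)
                    break
    return minimized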
TS″ ← ∅
x ← 0
repeat
    t ← random t ∈ TS′                          ▷ pick a random test case
    δ ← random δ ∈ Δ s.t. δ(t) ≠ t              ▷ pick a mutation operator that changes the test case
    t′ ← δ(t)                                   ▷ apply the mutation operator to the test case
    TS″ ← TS″ ∪ {t′}                            ▷ add the mutated test case to the New Test Suite
    x ← x + Σ_{(vs,ve,z,d)∈t′} d                ▷ accumulate the total delay
until x > X                                     ▷ repeat until the total delay exceeds the given timeout
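And a sketch of the mutation loop itself, where total_delay sums the d components of a test case, operators stands for the set Δ of the six operators above, and budget_seconds plays the role of X (all names illustrative):

import random

def total_delay(t):
    return sum(d for (_, _, _, d) in t)

def mutate_suite(minimized_suite, operators, budget_seconds):
    """Keep mutating random test cases until the summed delays exceed the time budget."""
    mutated_suite = []
    x = 0.0
    while x <= budget_seconds:
        t = random.choice(minimized_suite)
        delta = random.choice(operators)
        t_mut = delta(t)
        if t_mut == t:
            continue                                  # this operator does not change t; retry
        mutated_suite.append(t_mut)
        x += total_delay(t_mut)
    return mutated_suite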
Fig. 2. (a) Test Cases, (b) AUT Model, (c) Mutated Test Cases
5 Motivating Example
Figures 2a and b show a test suite and an AUT model, respectively. We generate
this test suite and the AUT model by executing AndroFrame for one minute on
an example AUT. We execute AndroFrame for just one minute, because that is
enough to generate test cases for this example. We limit the maximum number
of transitions per test case to five to keep the test cases small in this motivating
example for ease of presentation. The test suite has four test cases; A, B, C, and
D. Each row of test cases describes a delayed transition. The click action has
coordinates, but we abstract this information for the sake of simplicity.
Among the four test cases reported by AndroFrame, we take only the non-
crashing test cases, A and D. In our example, we include D since it increases
the edge coverage and we exclude A since all of A’s transitions are also D’s
transitions, i.e. A is subsumed by D. Then, we attempt to minimize test case
D without reducing the edge coverage. In our example, we don’t remove any
transitions from D because all transitions in D contribute to the edge coverage.
We then generate mutated test cases by randomly applying mutation operators
to D one by one until we reach one minute timeout. Figure 2c shows an example
mutated test suite. Test case Mutated 1 takes D and exercises the back button
for multiple times to stress the loop at state v1. Test case Mutated 2 clicks the
hardware power button twice (doze off, doze on) between each transition. This
operation pauses and resumes the AUT in our test devices. We then execute all
mutated test cases on the AUT. Our example AUT in fact crashes when the loop
on v1 is reexecuted more than eight times and also crashes when the AUT is
paused in state v2 . When executed, our mutated test cases reveal these crashes
both at their ninth transition, doubling the number of detected crashes.
6 Evaluation
In this section, we evaluate TCM via experiments and case studies. We show
that, through experiments, we improve crash detection. We then show, with
case studies, how we detect crash patterns.
6.1 Experiments
We selected 100 AUTs (excluding the case studies described later) from F-Droid
benchmarks [7] for experiments. To evaluate the improvement in crash detection,
we first execute AndroFrame, Sapienz, PUMA, Monkey, and A3 E for 20 min each
on these applications with no mutations enabled on test cases. Then we execute
TCM with 10 min for AndroFrame to generate test cases and 10 min to mutate
the generated test cases and replay them to detect more crashes. AndroFrame
requires the maximum length of a test case as a parameter. We used its default
parameter, 80 transitions maximum per test case.
Figure 3 shows the number of total distinct crashes detected by each tool
across time. Whenever a crash occurs, the Android system logs the resulting
stack trace. We say that two crashes are distinct if stack traces of these crashes
are different.
Our results show that AndroFrame detects more crashes than any other tool
from very early on. TCM detects the same number of crashes with AndroFrame
for the first 10 min (600 s). During that time, AndroFrame detects 15 crashes. In
the last 10 min, TCM detects 14 more crashes whereas AndroFrame detects only
3 more crashes. As a result TCM detects 29 crashes in total whereas AndroFrame
detects 18 crashes in total. As a last note, all other tools including AndroFrame
seem to stabilize after 20 min whereas TCM finds many crashes near timeout.
This shows us that TCM may find even more crashes when timeout is longer.
Fig. 4. (a) Execution of test case t; (b) execution of test case t′ = δCT(t)
Overall, TCM finds 14 more crashes than AndroFrame and 17 more crashes than
Sapienz, the best among other tools.
We also investigate how much each mutation operator contributes to the
number of detected crashes. Our observations reveal that M1 (δLS ) detects one
crash, M2 (δPR ) detects four crashes, M3 (δCT ) detects two crashes, M4 (δTCS )
detects two crashes, M5 (δRD ) detects four crashes, and M6 (δFS ) detects one
crash. These crashes add up to 14, which is the number of crashes detected by
TCM in the last 10 min. This result shows that while all mutation operators
contribute to the crash detection, M2 and M5 have the largest contribution.
We present and explain one crash that is found only by TCM in Fig. 4.
Figure 4a shows an instance where AndroFrame generates and executes a test
case t on the Yahtzee application. Note that t does not lead to a crash, but only
a warning message. Figure 4b shows the instance where TCM mutates t and
executes the mutated test case t′. When t′ is executed, the application crashes
and terminates. We note that this crash was not found by any other tool. Mao
et al. [8] also report that Sapienz and Dynodroid did not find any crashes in this
application.
(Case study figure panels: (d) Semantic Error (C4) example, (e) Network-Based Crash (C5) example.)
7 Discussion
Although TCM is conceptually applicable to different GUI platforms, e.g. iOS
or a desktop computer, there are three key challenges. First, our crash patterns
are not guaranteed to exist or be observable in different platforms. Second, our
mutation operators may not be applicable to those platforms, e.g. swipe may
not be available as a gesture. Third, either an AUT model may be impossible to
obtain or a replayable test case may be impossible to generate in those platforms.
When all these challenges are addressed, we believe TCM should be applicable
to not just Android, but other platforms as well.
TCM mutates test cases after they are generated. We could apply mutated
inputs immediately during test generation. However, this requires us to alter the
test generation process which may not be possible if a third party test generation
tool is used. Our approach is conceptually applicable to any test generation tool
without altering the test generation tool.
We use an edge coverage criterion to minimize a given test suite. Because
of this, the original test suite potentially covers more paths than the minimized
test suite and therefore explores the same edge in different contexts. Without
minimization, the test cases in the test suite are too many and too large to generate
enough mutations to observe crashes within the given timeout. Therefore, we argue that
by minimizing the test suite we improve the crash detection performance of TCM
at the cost of the test suite’s completeness in terms of a higher coverage criterion
than edge coverage.
Although TCM detects crashes, it does not detect all possible bug patterns.
Qin et al. [9] thoroughly classify all bugs in Android. According to this classi-
fication, there are two types of bugs in Android, Bohrbugs and Mandelbugs. A
Bohrbug is a bug whose reachability and propagation are simple. A Mandelbug
is a bug whose reachability and propagation are complicated. Qin et al. further
categorize Mandelbugs as Aging Related Bugs (ARB) and Non-Aging Related
Mandelbugs (NAM). Qin et al. also define five subtypes for NAM and six sub-
types for ARB. TCM detects only the first two subtypes of NAM, TIM and SEQ.
TIM and SEQ are the only kinds of bugs which are triggered by user inputs. If
a bug is TIM, the error is caused by the timing of inputs. If a bug is SEQ, the
error is caused by the sequencing of inputs.
We note two key points on the crash patterns of TCM. First, testing tools
we compare TCM with only detect SEQ bugs. TCM introduces the detection
of TIM bugs in addition to SEQ bugs. Second, Azim et al. [3] further divide
SEQ and TIM bugs into six crash patterns. We base our crash patterns on these
six patterns.
8 Related Work
Test Case Mutation (TCM) differs from the well-known Mutation Testing (MT)
[12] where mutations are inserted in the source code of an AUT to measure the
quality of existing test cases. In TCM, by contrast, we modify existing test cases
to increase the number of detected crashes. Oliveira et al. [13] were the first to
suggest using Mutation Testing (MT) for GUIs. Deng et al. [14] define several
source code level mutation operators for Android applications to measure the
quality of existing test suites.
The concept of Test Case Mutation is not new. In Android GUI Testing,
Sapienz [8] and EvoDroid [15] are Android testing tools that use evolution-
ary algorithms, and therefore mutation operators. Sapienz shuffles the orders of
the events, whereas EvoDroid mutates the test case in two ways: (1) EvoDroid
transforms text inputs and (2) EvoDroid either injects, swaps, or removes events.
TCM not only mutates text inputs, but also introduces five more novel mutation
operators. Furthermore, Sapienz and EvoDroid use their mutation operators
for both exploration and crash detection whereas we specialize TCM’s muta-
tion operators for crash detection only. In Standard GUI Testing, MuCRASH
[16] uses test case mutation via defining special mutation operators on test
cases, where the operators are defined at the source code level. They use TCM
for crash reproduction, whereas ours is the first work that uses TCM to dis-
cover new crashes. Directed Test Suite Augmentation (DTSA) introduced by
Xu et al. in 2010 [17] also mutates existing test cases but for the goal of achiev-
ing a target branch coverage.
We implement TCM on AndroFrame [4]. AndroFrame is one of the state-of-
the-art Android GUI Testing tools. AndroFrame finds more crashes than other
available alternatives in the literature such as A3 E and Sapienz. These tools
generate replayable test cases as well. They provide the necessary utilities to
replay their generated test cases. We could mutate these test cases, but most of
our mutations would not be applicable, for two reasons. First, A3 E and Sapienz do
not learn a model from which we can extract looping actions. Second, A3 E
and Sapienz do not support contextual state toggling. Implementing all of our
mutations on top of these tools is possible, but requires a significant amount of
engineering effort. Therefore we implement TCM on top of AndroFrame.
Other black-box testing tools in the literature include A3 E [18], SwiftHand
[6], PUMA [19], DynoDroid [20], Sapienz [8], EvoDroid [15], CrashScope [5] and
MobiGUITAR [21]. From these applications, only EvoDroid, CrashScope, and
MobiGUITAR are publicly unavailable.
Monkey is a simple random generation-based fuzz tester for Android. Mon-
key detects the largest number of crashes among other black-box testing tools.
Generation-based fuzz testing is a popular approach in Android GUI Testing,
which basically generates random or unexpected inputs. Fuzzing could be com-
pletely random as in Monkey, or more intelligent by detecting relevant events
as in Dynodroid [20]. TCM can be viewed as a mutation-based fuzz testing
tool, where we modify existing test cases rather than generating test cases from
scratch. TCM can be implemented on top of Monkey or DynoDroid to improve
crash detection of these tools.
Baek and Bae [22] define a comparison criterion for Android GUI states.
AndroFrame uses the maximum comparison level described in this work, which
makes our models as fine-grained as possible for black-box testing.
9 Conclusion
In this study, we developed a novel test case mutation technique that allows us
to increase detection of crashes in Android applications. We defined six muta-
tion operators for GUI test cases and relate them to commonly occurring crash
patterns in Android applications. We obtained test cases through a state-of-the-
art Android GUI testing tool, called AndroFrame. We showed with several case
studies that our mutation operators are able to uncover new crashes.
As a future work, we plan to study a broader set of GUI actions, such as
rotation and doubleclick. We will improve our mutation algorithm by sampling
mutation operators from a probability distribution based on crash rates rather
than a uniform distribution. We will determine optimal timings for executing
the test generator and TCM, rather than dividing the available time into two
equal halves. We will further investigate Android crash patterns.
References
1. Piejko, P.: 16 mobile market statistics you should know in 2016 (2016). https://
deviceatlas.com/blog/16-mobile-market-statistics-you-should-know-2016
2. Zein, S., Salleh, N., Grundy, J.: A systematic mapping study of mobile application
testing techniques. J. Syst. Softw. 117, 334–356 (2016)
3. Azim, T., Neamtiu, I., Marvel, L.M.: Towards self-healing smartphone software via
automated patching. In: 29th ACM/IEEE International Conference on Automated
Software Engineering (ASE), pp. 623–628 (2014)
4. Koroglu, Y., Sen, A., Muslu, O., Mete, Y., Ulker, C., Tanriverdi, T., Donmez, Y.:
QBE: QLearning-based exploration of android applications. In: IEEE International
Conference on Software Testing, Verification and Validation (ICST) (2018)
5. Moran, K., Vásquez, M.L., Bernal-Cárdenas, C., Vendome, C., Poshyvanyk, D.:
Automatically discovering, reporting and reproducing android application crashes.
In: IEEE International Conference on Software Testing, Verification and Validation
(ICST), pp. 33–44 (2016)
6. Choi, W., Necula, G., Sen, K.: Guided GUI testing of android apps with minimal
restart and approximate learning. In: ACM SIGPLAN International Conference on
Object Oriented Programming Systems Languages and Applications (OOPSLA),
pp. 623–640 (2013)
7. Gultnieks, C.: F-Droid Benchmarks (2010). https://f-droid.org/
8. Mao, K., Harman, M., Jia, Y.: Sapienz: multi-objective automated testing for
android applications. In: 25th International Symposium on Software Testing and
Analysis (ISSTA), pp. 94–105 (2016)
9. Qin, F., Zheng, Z., Li, X., Qiao, Y., Trivedi, K.S.: An empirical investigation of
fault triggers in android operating system. In: IEEE 22nd Pacific Rim International
Symposium on Dependable Computing (PRDC), pp. 135–144 (2017)
10. Zeller, A.: Yesterday, my program worked. Today, it does not. Why? In: 7th Euro-
pean Software Engineering Conference Held Jointly with the 7th ACM SIGSOFT
International Symposium on Foundations of Software Engineering (ESEC/FSE-7),
pp. 253–267 (1999)
11. Carino, S., Andrews, J.H.: Dynamically testing GUIs using ant colony optimiza-
tion. In: 30th IEEE/ACM International Conference on Automated Software Engi-
neering (ASE), pp. 135–148 (2015)
12. Ammann, P., Offutt, J.: Introduction to Software Testing, 1st edn. Cambridge
University Press, Cambridge (2008)
13. Oliveira, R.A.P., Algroth, E., Gao, Z., Memon, A.: Definition and evaluation of
mutation operators for GUI-level mutation analysis. In: IEEE Eighth International
Conference on Software Testing, Verification and Validation Workshops (ICSTW),
pp. 1–10 (2015)
14. Deng, L., Offutt, J., Ammann, P., Mirzaei, N.: Mutation operators for testing
android apps. Inf. Softw. Technol. 81(C), 154–168 (2017)
15. Mahmood, R., Mirzaei, N., Malek, S.: EvoDroid: segmented evolutionary testing of
android apps. In: 22nd ACM SIGSOFT International Symposium on Foundations
of Software Engineering (FSE), pp. 599–609 (2014)
16. Xuan, J., Xie, X., Monperrus, M.: Crash reproduction via test case mutation:
let existing test cases help. In: 10th Joint Meeting on Foundations of Software
Engineering (ESEC/FSE), pp. 910–913 (2015)
17. Xu, Z., Kim, Y., Kim, M., Rothermel, G., Cohen, M.B.: Directed test suite aug-
mentation: techniques and tradeoffs. In: 18th ACM SIGSOFT International Sym-
posium on Foundations of Software Engineering (FSE), pp. 257–266 (2010)
18. Azim, T., Neamtiu, I.: Targeted and depth-first exploration for systematic test-
ing of android apps. In: ACM SIGPLAN International Conference on Object Ori-
ented Programming Systems Languages and Applications (OOPSLA), pp. 641–660
(2013)
19. Hao, S., Liu, B., Nath, S., Halfond, W.G., Govindan, R.: PUMA: programmable UI-
automation for large-scale dynamic analysis of mobile apps. In: 12th Annual Inter-
national Conference on Mobile Systems, Applications, and Services (MobiSys), pp.
204–217 (2014)
20. Machiry, A., Tahiliani, R., Naik, M.: Dynodroid: an input generation system
for android apps. In: 9th Joint Meeting on Foundations of Software Engineering
(ESEC/FSE), pp. 224–234 (2013)
21. Amalfitano, D., Fasolino, A.R., Tramontana, P., Ta, B.D., Memon, A.M.: MobiGU-
ITAR: automated model-based testing of mobile apps. IEEE Softw. 32(5), 53–59
(2015)
22. Baek, Y.M., Bae, D.H.: Automated model-based android GUI testing using multi-
level GUI comparison criteria. In: 31st IEEE/ACM International Conference on
Automated Software Engineering (ASE), pp. 238–249 (2016)
CRETE: A Versatile Binary-Level
Concolic Testing Framework
1 Introduction
Symbolic execution [1] has become an increasingly important technique for auto-
mated software analysis, e.g., generating test cases, finding bugs, and detecting
security vulnerabilities [2–11]. There have been many recent approaches to sym-
bolic execution [12–22]. Generally speaking, these approaches can be classified
into two categories: online symbolic execution (e.g., BitBlaze [4], klee [5], and
s2 e [6]), and concolic execution (a.k.a., offline symbolic execution, e.g., CUTE [2],
DART [3], and SAGE [7]). Online symbolic execution closely couples Symbolic
Execution Engines (see) with the System Under Test (sut) and explores all
possible execution paths of the sut online at once. Concolic execution, on the
other hand, decouples the see from the sut through traces: it concretely runs a
single execution path of the sut and then symbolically executes that trace.
Both online and offline symbolic execution face new challenges, as computer
software is experiencing explosive growth in both complexity and diversity,
ushered in by the proliferation of cloud computing, mobile computing, and the
Internet of Things. Two major challenges are: (1) the sut involves many
types of software for different hardware platforms and (2) the sut involves many
components distributed on different machines and as a whole the sut cannot fit
in any see. In this paper, we focus on how to extend concolic execution to sat-
isfy the needs for analyzing emerging software systems. There are two major
observations behind our efforts on extending concolic execution:
– The decoupled architecture of concolic execution provides flexibility in
integrating new trace-capture frontends for emerging platforms.
– The trace-based nature of concolic testing offers opportunities for selectively
capturing and synthesizing reduced system-level traces for scalable analysis.
We present crete, a versatile binary-level concolic testing framework, which
features an open and highly extensible architecture allowing easy integration of
concrete execution frontends and symbolic execution backends. crete’s exten-
sibility is rooted in its modular design, where concrete and symbolic execution are
loosely coupled only through standardized execution traces and test cases. The
standardized execution traces are llvm-based, self-contained, and composable,
providing succinct and sufficient information for see to reproduce the concrete
executions. The crete framework is composed of:
– A crete tracing plugin, which is embedded in the concrete execution
environment, captures binary-level execution traces of the sut, and stores the
traces in a standardized trace format.
– A crete manager, which archives the captured execution traces and test
cases, schedules concrete and symbolic execution, and implements policies for
selecting the traces and test cases to be analyzed and explored next.
– A crete replayer, which is embedded in the symbolic execution environ-
ment, performs concolic execution on captured traces for test case generation.
We have implemented the crete framework on top of qemu [23] and klee,
particularly the tracing plugin for qemu, the replayer for klee, and the man-
ager that coordinates qemu and klee to exchange runtime traces and test cases
and manages the policies for prioritizing runtime traces and test cases. To val-
idate crete extensibility, we have also implemented a tracing plugin for the
8051 emulator [24]. The trace-based architecture of crete has enabled us to
integrate such tracing frontends seamlessly. To demonstrate its effectiveness and
capability, we evaluated crete on GNU Coreutils programs and TianoCore
utility programs for UEFI BIOS, and compared with klee and angr, which are
two state-of-the-art open-source symbolic executors for automated program analysis
at source-level and binary-level.
The crete framework makes several key contributions:
– Versatile concolic testing. crete provides an open and highly extensible
architecture allowing easy integration of different concrete and symbolic exe-
cution environments, which communicate with each other only by exchanging
standardized execution traces and test cases.
2 Related Work
DART [3] and CUTE [2] are both early representative work on concolic testing.
They operate on the source code level. crete further extends concolic testing
and targets closed-source binary programs. SAGE [7] is a Microsoft internal con-
colic testing tool that particularly targets X86 binaries on Windows. crete
is platform agnostic: as long as a trace from concrete execution can be converted
into the llvm-based trace format, it can be analyzed to generate test cases.
klee [5] is a source-level symbolic executor built on the llvm infrastruc-
ture [25] and is capable of generating high-coverage test cases for C programs.
crete adopts klee as its see, and extends it to perform concolic execution
on standardized binary-level traces. s2 e [6] provides a framework for develop-
ing tools for analyzing closed-source software programs. It augments a Virtual
Machine (vm) with a see and path analyzers. It features a tight coupling of
concrete and symbolic execution. crete takes a loosely coupled approach to
the interaction of concrete and symbolic execution. crete captures complete
execution traces of the sut online and conducts whole trace symbolic analysis
off-line.
BitBlaze [4] is an early representative work on binary analysis for computer
security. It and its follow-up work Mayhem [8] and MergePoint [12] focus on
optimizing the close coupling of concrete and symbolic execution to improve the
effectiveness in detecting exploitable software bugs. crete has a different focus
on providing an open architecture for binary-level concolic testing that enables
flexible integration of various concrete and symbolic execution environments.
angr [14] is an extensible Python framework for binary analysis using
VEX [26] as an intermediate representation (IR). It implemented a number of
3 Overview
During the design of the crete framework for binary-level concolic testing, we
have identified the following design goals:
– Binary-level In-vivo Analysis. It should require only the binary of the sut
and perform analysis in its real execution environment.
– Extensibility. It should allow easy integration of concrete execution fron-
tends and see backends.
– High Coverage. It should achieve coverage that is not significantly lower
than the coverage attainable by source-level analysis.
– Minimal Changes to Existing Testing Processes. It should simply pro-
vide additional test cases that can be plugged into existing testing processes
without major changes to the testing processes.
This online tracing and offline test generation process is iterative: it repeats until
all generated test cases are issued or time bounds are reached. We extend this
process to satisfy our design goals as follows.
4 Design
In this section, we present the design of crete with a vm as the concrete exe-
cution environment. The reason for selecting a vm is that it allows complete
access to the whole system for tracing runtime execution states and is generally
accessible as mature open-source projects.
curve to utilize crete are minimal. It makes virtually no difference for users to
set up the testing environment for the tbp in a crete-instrumented vm rather than
in a vanilla vm. The configuration file is an interface for users to configure parameters
for testing a tbp, especially for specifying the number and size of symbolic command-
line inputs and symbolic files for test case generation.
the standardized trace. A main function is also added, making the trace a self-
contained llvm module. The main function first invokes crete helper functions
to initialize hardware states, then it calls into the first basic block llvm function.
Before it calls into the second basic block llvm function, the main function
invokes crete helper functions to update hardware states. For example, before
calling asm_BB_3, it calls the function sync_state to update register r1 and memory
location 0x5678, which are the side effects brought about by BB_2.
crete Tracer captures the initial state of CPU by capturing a copy of the
CPU state before the first interested basic block is executed. The initial CPU
state is normally a set of register values. As shown in Fig. 2, the initial CPU
state is captured before instruction (1). Naïvely, the initial memory state could
be captured in the same way; however, the typical size of memory makes it
impractical to dump entirely. To minimize the trace size, crete Tracer only
captures the parts of memory that are accessed by the captured read instructions,
such as instructions (1) and (9). The memory touched by the captured write
instructions, such as instructions (3) and (11), can be ignored because the state of this
part of the memory has been included in the write instructions and has been
captured. As a result, crete Tracer monitors every memory read instruction
that is of interest, capturing memory as needed on-the-fly. In the example above,
there are two memory read instructions. crete Tracer monitors both of them,
but only keeps the memory state taken from instruction (1) as a part of the
initial state of memory, because instruction (1) and (9) access the same address.
The side effects of hardware states are captured by monitoring uncaptured
write instructions of hardware states. In the example in Fig. 2, instructions (5)
and (6) write CPU registers which cause side effects to the CPU state. crete
Tracer monitors those instructions and keeps the updated register values as part
of the runtime trace. As register r1 is updated twice by two instructions, only
the last update is kept in the runtime trace. Similarly, crete Tracer captures
the side effect of memory at address 0x5678 by monitoring instruction (7).
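The on-the-fly capture of the initial memory state and of side effects can be pictured with the following sketch; it only illustrates the bookkeeping (first-read values and last writes), not the actual qemu-ir instrumentation.

class TraceCapture:
    """Keep just enough state to replay a trace: first-read values and last writes."""

    def __init__(self, initial_registers):
        self.initial_cpu = dict(initial_registers)   # CPU snapshot taken before the first traced block
        self.initial_memory = {}                      # address -> value observed at its first read
        self.side_effects = {}                        # location (register or address) -> last written value

    def on_read(self, address, value):
        # only the first read of an address contributes to the initial memory state;
        # later reads of the same address, like instruction (9), are ignored
        if address not in self.initial_memory and address not in self.side_effects:
            self.initial_memory[address] = value

    def on_write(self, location, value):
        # writes are side effects; only the last value per location is kept, so two
        # successive writes to register r1 leave a single entry
        self.side_effects[location] = value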
Fig. 3. Execution tree of the example trace from Fig. 2: (a) for concrete execution, (b)
for symbolic execution, and (c) for concolic execution.
block, if both of the paths are feasible given the collected constraints so far on
the symbolic values, the see in crete only keeps the execution state of the
path that was taken by the original concrete execution in the vm by adding the
corresponding constraints of this branch instruction, while generating a test case
for the other path by resolving constraints with the negated branch condition.
This generated test case can lead the tbp to a different execution path later
during the concrete execution in the vm.
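Conceptually, the branch handling during symbolic replay can be sketched with z3 as follows; path_condition collects the constraints of the concretely taken path, and the negated branch is solved to obtain a test case for the unexplored side. This is a simplification of what the klee-based see in crete actually does.

from z3 import Solver, And, Not, sat

def handle_branch(path_condition, branch_cond, taken_in_vm, symbolic_inputs):
    """Follow the concretely taken path; emit a test case for the other side if feasible."""
    followed = branch_cond if taken_in_vm else Not(branch_cond)
    negated = Not(branch_cond) if taken_in_vm else branch_cond

    s = Solver()
    s.add(And(path_condition + [negated]))
    new_test_case = None
    if s.check() == sat:
        m = s.model()
        # concrete input values that drive the tbp down the unexplored path
        new_test_case = {str(v): m.evaluate(v, model_completion=True) for v in symbolic_inputs}

    # the replay itself keeps only the path taken by the concrete execution in the vm
    path_condition.append(followed)
    return new_test_case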
crete detects bugs and runtime vulnerabilities in two ways. First, all the native
checks embedded in see are checked during the symbolic replay over the trace
captured from concrete execution. If there is a violation to a check, a bug report
is generated and associated with the test case that is used in the vm to generate
this trace. Second, since crete does not change the native testing process and
simply provides additional test cases that can be applied in the native process,
all the bugs and vulnerability checks that are used in the native process are
effective in detecting bugs and vulnerabilities that can be triggered by the crete
generated test cases. For instance, Valgrind [26] can be utilized to detect memory
related bugs and vulnerabilities along the paths explored by crete test cases.
5 Implementation
To demonstrate the practicality of crete, we have implemented its complete
workflow with qemu [23] as the frontend and klee [5] as the backend. To
demonstrate the extensibility of crete, we have also developed a tracing
plug-in for the 8051 emulator, which readily replaces qemu.
crete Tracer for qemu: To give crete the best chance of supporting the various
guest platforms supported by qemu, crete Tracer captures basic blocks
in the qemu-ir format. To convert captured basic blocks into standardized
6 Evaluation
In this section, we present the evaluation results of crete from its application
to GNU Coreutils [38] and the TianoCore utility programs for UEFI BIOS [39].
These evaluations demonstrate that crete generates test cases that are as
effective in achieving high code coverage as those of state-of-the-art tools for
automated test case generation, and that it can detect serious, deeply embedded bugs.
Table 1. Comparison of overall and median coverage by klee, angr, and crete on
Coreutils.
1 http://klee.github.io/docs/coreutils-experiments/.
with given resources, such as time and memory. This is why klee can achieve
great code coverage, such as line coverage over 90%, on more programs than
crete, as shown in Table 2. klee has to maintain execution states for all
paths being explored at once, a limitation that becomes more severe as the
program size grows. Moreover, klee analyzes programs within its own virtual
environment, which uses simplified models of the real execution environment.
These models sometimes benefit klee by reducing the complexity of the tbp,
and sometimes hurt it by introducing environment inaccuracies. This is why
crete gradually caught up overall, as shown in Table 2. Specifically, crete
achieves higher line coverage on 33 programs, lower coverage on 31 programs,
and the same coverage on the other 23 programs. Figure 4(a) shows the coverage
differences of crete over klee on all 87 Coreutils programs. Note that our
coverage results for klee differ from those in klee's paper. As discussed and
reported in previous works [12,41], the coverage differences are mainly due to
major code changes in klee, an architecture change from 32-bit to 64-bit, and
whether manual system call failures are introduced.
angr shares klee's limitation of having to maintain multiple execution states
and to provide models of the execution environment, while it shares crete's
disadvantage of having no access to semantic information. Moreover, angr
provides environment models at the machine level to support various platforms,
which is more challenging than klee's models. In addition, we found and
reported several crashes of angr during this evaluation, which also affects
angr's results. This is why angr performs worse than both klee and crete in
this experiment. Figure 4(b) shows the coverage differences of crete over angr
on all 87 Coreutils programs. While crete outperformed angr on the majority
of the programs, there is one program, printf, on which angr achieved over 40%
better line coverage than crete, as shown in the leftmost column of Fig. 4(b).
We found that the reason is that printf uses many string routines from libc to
parse inputs, and angr provides effective models for those string routines.
Similarly, klee works much better on printf than crete.
Fig. 4. Line coverage difference on Coreutils by crete over klee and angr: positive
values mean crete is better, and negative values mean crete is worse.
Fig. 5. Coverage improvement over seed test case by crete on GNU Coreutils
Fig. 6. Coverage improvement over seed test case by crete on TianoCore utilities
References
1. King, J.C.: Symbolic execution and program testing. Commun. ACM 19, 385–394
(1976)
2. Sen, K., Marinov, D., Agha, G.: Cute: a concolic unit testing engine for C. In:
Proceedings of the 10th European Software Engineering Conference (2005)
3. Godefroid, P., Klarlund, N., Sen, K.: Dart: directed automated random testing. In:
Proceedings of the ACM SIGPLAN Conference on Programming Language Design
and Implementation (PLDI 2005) (2005)
4. Song, D., et al.: BitBlaze: a new approach to computer security via binary analysis.
In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-89862-7_1
5. Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation
of high-coverage tests for complex systems programs. In: Proceedings of the 8th
USENIX Conference on Operating Systems Design and Implementation (OSDI
2008) (2008)
6. Chipounov, V., Kuznetsov, V., Candea, G.: The s2e platform: design, implemen-
tation, and applications. ACM Trans. Comput. Syst. 30, 1–49 (2012)
7. Godefroid, P., Levin, M.Y., Molnar, D.: Sage: whitebox fuzzing for security testing.
Commun. ACM 10, 1–20 (2012)
8. Cha, S.K., Avgerinos, T., Rebert, A., Brumley, D.: Unleashing Mayhem on binary
code. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy (2012)
9. Cadar, C., Sen, K.: Symbolic execution for software testing: three decades later.
Commun. ACM 56, 82–90 (2013)
10. Kuznetsov, V., Kinder, J., Bucur, S., Candea, G.: Efficient state merging in sym-
bolic execution. In: PLDI 2012 (2012)
11. Marinescu, P.D., Cadar, C.: Make test-zesti: a symbolic execution solution for
improving regression testing. In: Proceedings of the 34th International Conference
on Software Engineering (ICSE 2012) (2012)
12. Avgerinos, T., Rebert, A., Cha, S.K., Brumley, D.: Enhancing symbolic execution
with veritesting. In: ICSE 2014 (2014)
13. Avgerinos, T., Cha, S.K., Rebert, A., Schwartz, E.J., Woo, M., Brumley, D.: Auto-
matic exploit generation. Commun. ACM 57, 74–84 (2014)
14. Shoshitaishvili, Y., Wang, R., Salls, C., Stephens, N., Polino, M., Dutcher, A.,
et al.: SOK: (state of) the art of war: offensive techniques in binary analysis. In:
IEEE Symposium on Security and Privacy (2016)
15. Stephens, N., Grosen, J., Salls, C., et al.: Driller: augmenting fuzzing through selec-
tive symbolic execution. In: Proceedings of the Network and Distributed System
Security Symposium (2016)
16. Redini, N., Machiry, A., Das, D., Fratantonio, Y., Bianchi, A., Gustafson, E.,
Shoshitaishvili, Y., Kruegel, C., Vigna, G.: Bootstomp: on the security of boot-
loaders in mobile devices. In: 26th USENIX Security Symposium (2017)
17. Palikareva, H., Kuchta, T., Cadar, C.: Shadow of a doubt: testing for divergences
between software versions. In: ICSE 2016 (2016)
18. Palikareva, H., Cadar, C.: Multi-solver support in symbolic execution. In: Shary-
gina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 53–68. Springer, Heidel-
berg (2013). https://doi.org/10.1007/978-3-642-39799-8_3
19. Bucur, S., Kinder, J., Candea, G.: Prototyping symbolic execution engines for
interpreted languages. In: Proceedings of the 19th International Conference on
Architectural Support for Programming Languages and Operating Systems (2014)
20. Kasikci, B., Zamfir, C., Candea, G.: Automated classification of data races under
both strong and weak memory models. ACM Trans. Program. Lang. Syst. 37, 1–44
(2015)
21. Ramos, D.A., Engler, D.: Under-constrained symbolic execution: correctness check-
ing for real code. In: Proceedings of the 24th USENIX Conference on Security
Symposium (2015)
22. Zheng, H., Li, D., Liang, B., Zeng, X., Zheng, W., Deng, Y., Lam, W., Yang, W.,
Xie, T.: Automated test input generation for android: towards getting there in an
industrial case. In: Proceedings of the 39th International Conference on Software
Engineering: Software Engineering in Practice Track (2017)
23. Bellard, F.: QEMU, a fast and portable dynamic translator. In: Proceedings of the
Annual Conference on USENIX Annual Technical Conference (2005)
24. Kasolik, M.: 8051 emulator. http://emu51.sourceforge.net/
25. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program
analysis & transformation. In: Proceedings of the International Symposium on
Code Generation and Optimization: Feedback-directed and Runtime Optimization
(2004)
26. Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary
instrumentation. In: PLDI 2007 (2007)
27. Godefroid, P.: Random testing for security: blackbox vs. whitebox fuzzing. In:
Proceedings of the 2nd International Workshop on Random Testing (2007)
Family-Based Software Development
Abstract Family-Based Model Checking Using Modal Featured Transition Systems: Preservation of CTL*
Aleksandar S. Dimovski
1 Introduction
Variational systems appear in many application areas and for many reasons.
Efficient methods to achieve customization, such as Software Product Line Engi-
neering (SPLE) [8], use features (configuration options) to control presence and
absence of the variable functionality [1]. Family members, called variants of a
variational system, are specified in terms of features selected for that particular
variant. The reuse of code common to multiple variants is maximized. The SPLE
method is particularly popular in the embedded and critical system domains
(e.g. cars, phones). In these domains, rigorous verification and analysis are very
important. Among the methods included in current practice, model checking [2]
properties, we can verify any such property on the concrete variability model
(which is given as an FTS) by verifying it on an abstract MFTS. Any
model checking problem on modal transition systems (resp., MFTSs) can be
reduced to two traditional model checking problems on standard transition
systems (resp., FTSs). The overall technique relies on partitioning and abstracting
concrete FTSs until we obtain models with so little variability (or no variability
at all) that it is feasible to complete their model checking in a brute-force
fashion using standard single-system model checkers. Experiments show that,
compared to family-based model checking, the proposed technique achieves
performance gains.
2 Background
In this section, we present the background used in later developments.
Modal Featured Transition Systems. Let F = {A1 , . . . , An } be a finite set
of Boolean variables representing the features available in a variational system.
A specific subset of features, k ⊆ F, known as configuration, specifies a variant
(valid product) of a variational system. We assume that only a subset K ⊆ 2F of
configurations are valid. An alternative representation of configurations is based
upon propositional formulae. Each configuration k ∈ K can be represented by a
formula: k(A1) ∧ . . . ∧ k(An), where k(Ai) = Ai if Ai ∈ k, and k(Ai) = ¬Ai if
Ai ∉ k, for 1 ≤ i ≤ n. We will use both representations interchangeably.
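As a small illustration of the two representations, the following Python snippet (a didactic sketch, not tied to any tool) converts a configuration given as a set of selected features into the corresponding propositional formula.

F = ["A1", "A2", "A3"]                    # available features (example)

def as_formula(k):
    # Render k ⊆ F as the conjunction k(A1) ∧ ... ∧ k(An).
    return " ∧ ".join(a if a in k else "¬" + a for a in F)

print(as_formula({"A1", "A3"}))           # prints: A1 ∧ ¬A2 ∧ A3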
We recall the basic definition of a transition system (TS) and a modal tran-
sition system (MTS) that we will use to describe behaviors of single-systems.
Definition 1. A transition system (TS) is a tuple T = (S, Act, trans, I, AP, L),
where S is a set of states; Act is a set of actions; trans ⊆ S ×Act×S is a transi-
tion relation; I ⊆ S is a set of initial states; AP is a set of atomic propositions;
and L : S → 2AP is a labelling function specifying which propositions hold in a
state. We write s1 −λ→ s2 whenever (s1, λ, s2) ∈ trans.
An execution (behaviour) of a TS T is an infinite sequence ρ = s0 λ1 s1 λ2 . . .
with s0 ∈ I such that si −λi+1→ si+1 for all i ≥ 0. The semantics of the TS T,
denoted as [[T]]_TS, is the set of its executions.
MTSs [26] are a generalization of transition systems that allows describ-
ing not just a sum of all behaviors of a system but also an over- and under-
approximation of the system’s behaviors. An MTS is a TS equipped with two
transition relations: must and may. The former (must) is used to specify the
required behavior, while the latter (may) specifies the allowed behavior of a
system.
Definition 2. A modal transition system (MTS) is represented by a tuple M =
(S, Act, trans^may, trans^must, I, AP, L), where trans^may ⊆ S × Act × S describes
the may transitions of M; trans^must ⊆ S × Act × S describes the must transitions
of M, such that trans^must ⊆ trans^may.
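These two definitions can be rendered directly as data structures; the following Python sketch is purely didactic and only enforces the inclusion requirement of Definition 2.

from dataclasses import dataclass, field

@dataclass
class TS:
    S: set        # states
    Act: set      # actions
    trans: set    # transition relation, a subset of S × Act × S
    I: set        # initial states
    AP: set       # atomic propositions
    L: dict       # labelling function: state -> subset of AP

@dataclass
class MTS(TS):
    # 'trans' plays the role of the may transitions; must ⊆ may is required.
    trans_must: set = field(default_factory=set)

    def __post_init__(self):
        assert self.trans_must <= self.trans, "must transitions must also be may"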
[Figs. 1 and 2 (transition diagrams of the VendingMachine FTS and of its basic variant) are not recoverable from the text extraction; transitions between states 1–8 are labelled by actions such as pay, change, open, take, free, cancel, serveSoda and serveTea, guarded by feature expressions.]
serving tea; Soda (s, in green), for serving soda, which is a mandatory feature
present in all variants; CancelPurchase (c, in brown), for canceling a purchase
after a coin is entered; and FreeDrinks (f , in blue) for offering free drinks. Each
transition is labeled by an action followed by a feature expression. For instance,
the transition 1 −free/f→ 3 is included in variants where the feature f is enabled.
By combining various features, a number of variants of this VendingMa-
chine can be obtained. Recall that v and s are mandatory features. The set
of valid configurations is thus: KVM = {{v, s}, {v, s, t}, {v, s, c}, {v, s, t, c}, {v,
s, f}, {v, s, t, f}, {v, s, c, f}, {v, s, t, c, f}}. Figure 2 shows the basic version
of VendingMachine that only serves soda, which is described by the
configuration {v, s} (or, as a formula, v ∧ s ∧ ¬t ∧ ¬c ∧ ¬f), that is, the projection
π{v,s}(VendingMachine). It takes a coin, returns change, serves soda, and opens a
compartment so that the customer can take the soda, before closing it again.
Figure 3 shows an MTS. Must transitions are denoted by solid lines, while
may transitions by dashed lines.
CTL* Properties. Computation Tree Logic (CTL*) [2] is an expressive temporal
logic for specifying system properties, which subsumes both the CTL and LTL
logics. CTL* state formulae Φ are generated by the following grammar:
Φ ::= true | a ∈ AP | ¬a | Φ1 ∧ Φ2 | ∀φ | ∃φ,    φ ::= Φ | φ1 ∧ φ2 | ○φ | φ1 U φ2
where φ ranges over CTL* path formulae. Note that the CTL* state formulae Φ
are given in negation normal form (¬ is applied only to atomic propositions).
Given Φ ∈ CTL*, we consider ¬Φ to be the equivalent CTL* formula given in
negation normal form. Other derived temporal operators (path formulae) can be
defined as syntactic sugar, for instance: ♦φ = true U φ (φ holds eventually) and
∃□φ = ¬∀♦¬φ (φ always holds). ∀CTL* and ∃CTL* are the subsets
of CTL* where the only allowed path quantifiers are ∀ and ∃, respectively.
We formalise the semantics of CTL* over a TS T. We write [[T]]^s_TS for the
set of executions that start in state s; ρ[i] = si to denote the i-th state of the
execution ρ; and ρ^i = si λi+1 si+1 . . . for the suffix of ρ starting from its i-th state.
(4) ρ |= Φ iff ρ[0] |= Φ,
(5) ρ |= φ1 ∧ φ2 iff ρ |= φ1 and ρ |= φ2; ρ |= ○φ iff ρ^1 |= φ; ρ |= (φ1 U φ2) iff
∃i ≥ 0. ρ^i |= φ2 ∧ (∀0 ≤ j ≤ i−1. ρ^j |= φ1)
From now on, we implicitly assume this adapted definition when interpreting
CTL* formulae over MTSs and MFTSs.
3 Abstraction of FTSs
We now introduce variability abstractions which preserve full CTL* as well as
its universal and existential fragments. They simplify the configuration space of
an FTS by reducing the number of configurations and by manipulating the presence
conditions of transitions. We start by working with Galois connections1 between
Boolean complete lattices of feature expressions, and then induce a notion of
abstraction of FTSs. We define two classes of abstractions. We use the standard
conservative abstractions [14,15] as an instrument to eliminate variability from
the FTS in an over-approximating way, i.e., by adding more executions. We use
the dual abstractions, which also eliminate variability, but by under-approximating
the given FTS, i.e., by dropping executions.
Domains. The Boolean complete lattice of feature expressions (propositional
formulae over F) is: (FeatExp(F)/≡ , |=, ∨, ∧, true, false, ¬). The elements of the
domain FeatExp(F)/≡ are equivalence classes of propositional formulae ψ ∈
FeatExp(F) obtained by quotienting by the semantic equivalence ≡. The order-
ing |= is the standard entailment between propositional formulae, whereas
the least upper bound and the greatest lower bound are just logical disjunction
and conjunction, respectively. Finally, the constant false is the least element,
true is the greatest element, and negation is the complement operator.
Conservative Abstractions. The join abstraction, αjoin , merges the control-
flow of all variants, obtaining a single variant that includes all executions occur-
ring in any variant. The information about which transitions are associated with
which variants is lost. Each feature expression ψ is replaced with true if there
exists at least one configuration from K that satisfies ψ. The new abstract set of
features is empty: αjoin(F) = ∅, and the abstract set of valid configurations is a
singleton: αjoin(K) = {true} whenever K ≠ ∅. The abstraction and concretization func-
tions between FeatExp(F) and FeatExp(∅), forming a Galois connection [14,15],
are defined as:
αjoin(ψ) = true if ∃k ∈ K. k |= ψ, and false otherwise;    γjoin(ψ′) = true if ψ′ is true, and ⋁_{k ∈ 2^F\K} k if ψ′ is false.
The feature ignore abstraction, α_A^fignore, ignores a single feature A ∈ F by
replacing its literals l_A (that is, A and ¬A) with true in feature expressions.
The abstraction and concretization functions between FeatExp(F) and
FeatExp(α_A^fignore(F)), forming a Galois connection, are defined as:
α_A^fignore(ψ) = ψ[l_A ↦ true]    γ_A^fignore(ψ′) = (ψ′ ∧ A) ∨ (ψ′ ∧ ¬A)
where ψ and ψ′ need to be in negation normal form before substitution.
Dual Abstractions. Suppose that ⟨FeatExp(F)/≡, |=⟩ and ⟨FeatExp(α(F))/≡, |=⟩ are
Boolean complete lattices, and that (α, γ) is a Galois connection between
⟨FeatExp(F)/≡, |=⟩ and ⟨FeatExp(α(F))/≡, |=⟩. We define [9]: α̃ = ¬ ◦ α ◦ ¬ and
γ̃ = ¬ ◦ γ ◦ ¬, so that (γ̃, α̃) forms a Galois connection between
⟨FeatExp(α(F))/≡, |=⟩ and ⟨FeatExp(F)/≡, |=⟩. The obtained Galois connections
(α̃, γ̃) are called dual (under-approximating) abstractions of (α, γ).
The dual join abstraction, α̃join, merges the control-flow of all variants,
obtaining a single variant that includes only those executions that occur in all
variants. Each feature expression ψ is replaced with true if all configurations from
K satisfy ψ. The abstraction and concretization functions between FeatExp(F)
and FeatExp(∅), forming a Galois connection, are defined as α̃join = ¬ ◦ αjoin ◦ ¬
and γ̃join = ¬ ◦ γjoin ◦ ¬, that is:
α̃join(ψ) = true if ∀k ∈ K. k |= ψ, and false otherwise;    γ̃join(ψ′) = ⋀_{k ∈ 2^F\K} (¬k) if ψ′ is true, and false if ψ′ is false.
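As an illustration, the sketch below implements αjoin and its dual α̃join in Python over an explicitly enumerated set of valid configurations, modelling a feature expression as a predicate over configurations; a realistic implementation would work on propositional formulae, e.g. via BDDs or a SAT solver.

def alpha_join(psi, K):
    # Over-approximation: ψ becomes true iff SOME valid configuration satisfies it.
    return any(psi(k) for k in K)

def alpha_join_dual(psi, K):
    # Under-approximation: ψ becomes true iff ALL valid configurations satisfy it.
    return all(psi(k) for k in K)

# Example with the VendingMachine: ψ = "feature f is enabled".
K_VM = [{"v", "s"}, {"v", "s", "f"}]
psi = lambda k: "f" in k
print(alpha_join(psi, K_VM), alpha_join_dual(psi, K_VM))   # True False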
The dual feature ignore abstraction, α̃_A^fignore, introduces an under-approximation
by ignoring the feature A ∈ F, such that the literals of A (that is, A and ¬A)
are replaced with false in feature expressions (given in negation normal form).
The abstraction and concretization functions between FeatExp(F) and
FeatExp(α̃_A^fignore(F)), forming a Galois connection, are defined as
α̃_A^fignore = ¬ ◦ α_A^fignore ◦ ¬ and γ̃_A^fignore = ¬ ◦ γ_A^fignore ◦ ¬, that is:
α̃_A^fignore(ψ) = ψ[l_A ↦ false]    γ̃_A^fignore(ψ′) = (ψ′ ∨ ¬A) ∧ (ψ′ ∨ A)
where ψ and ψ′ are in negation normal form.
Abstract MFTS and Preservation of CTL*. Given a Galois connection
(α, γ) defined on the level of feature expressions, we now define the abstraction
of an FTS as an MFTS with two transition relations: one (may) preserving
universal properties, and the other (must) preserving existential properties. The may
transitions describe behaviour that is possible but need not be realized in the
variants of the family, whereas the must transitions describe behaviour that has
to be present in every variant of the family.
Definition 7. Given the FTS F = (S, Act, trans, I, AP, L, F, K, δ), we define the
MFTS α(F) = (S, Act, trans^may, trans^must, I, AP, L, α(F), α(K), δ^may, δ^must) to
be its abstraction, where δ^may(t) = α(δ(t)), δ^must(t) = α̃(δ(t)), trans^may = {t ∈
trans | δ^may(t) ≠ false}, and trans^must = {t ∈ trans | δ^must(t) ≠ false}.
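A direct reading of Definition 7 is sketched below in Python, reusing the predicate-based presence conditions and the two abstraction functions from the previous sketch; this is an illustration of the construction, not the implementation used in Sect. 4.

def abstract_fts(trans, delta, K, alpha, alpha_dual):
    # Keep a transition as "may" ("must") if its abstracted (dually abstracted)
    # presence condition is not false.
    trans_may, trans_must = set(), set()
    delta_may, delta_must = {}, {}
    for t in trans:
        delta_may[t] = alpha(delta[t], K)         # δ^may(t) = α(δ(t))
        delta_must[t] = alpha_dual(delta[t], K)   # δ^must(t) = α̃(δ(t))
        if delta_may[t]:
            trans_may.add(t)
        if delta_must[t]:
            trans_must.add(t)
    return trans_may, trans_must, delta_may, delta_must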
Note that the degree of reduction is determined by the choice of abstraction and
may hence be arbitrarily large. In the extreme case of the join abstraction, we obtain
an abstract model with no variability in it, that is αjoin (F) is an ordinary MTS.
Example 3. Recall the FTS VendingMachine of Fig. 1 with the set of valid
configurations KVM (see Example 1). Figure 3 shows αjoin (VendingMachine),
where the allowed (may) part of the behavior includes the transitions that are
associated with the optional features c, f , t in VendingMachine, whereas the
required (must) part includes the transitions associated with the mandatory
features v and s. Note that αjoin (VendingMachine) is an ordinary MTS with
no variability. The MFTS α_{t,f}^fignore(π_[[v∧s]](VendingMachine)) is shown in [12,
Appendix B], see Fig. 8. It has the singleton set of features F = {c} and limited
variability K = {c, ¬c}, where the mandatory features v and s are enabled.
From the MFTS (resp., MTS) MF, we define two FTSs (resp., TSs) MF^may
and MF^must representing the may- and must-components of MF, i.e. its may
and must transitions, respectively. Thus, we have [[MF^may]]_FTS = [[MF]]^may_MFTS
and [[MF^must]]_FTS = [[MF]]^must_MFTS.
We now show that the abstraction of an FTS is sound with respect to CTL*.
First, we show two helper lemmas stating that: for any variant k ∈ K that can
execute a behaviour, there exists an abstract variant k′ ∈ α(K) that executes the
same may-behaviour; and for any abstract variant k′ ∈ α(K) that can execute a
must-behaviour, there exists a variant k ∈ K that executes the same behaviour2.
Lemma 1
(i) Let k ∈ K and k |= ψ. Then there exists k′ ∈ α(K) such that k′ |= α(ψ).
(ii) Let k′ ∈ α(K) and k′ |= α̃(ψ). Then there exists k ∈ K such that k |= ψ.
Lemma 2
(i) Let k ∈ K and ρ ∈ [[π_k(F)]]_TS ⊆ [[F]]_FTS. Then there exists k′ ∈ α(K) such
that ρ ∈ [[π_{k′}(α(F))]]^may_MTS ⊆ [[α(F)]]^may_MFTS is a may-execution in α(F).
(ii) Let k′ ∈ α(K) and ρ ∈ [[π_{k′}(α(F))]]^must_MTS ⊆ [[α(F)]]^must_MFTS be a must-execution
in α(F). Then there exists k ∈ K such that ρ ∈ [[π_k(F)]]_TS ⊆ [[F]]_FTS.
As a result, every ∀CTL* (resp., ∃CTL*) property that holds for the may- (resp.,
must-) component of α(F) also holds for F. Moreover, the MFTS α(F)
preserves full CTL*.
Theorem 1 (Preservation results). For any FTS F and Galois connection (α, γ), we have:
2 Proofs of all lemmas and theorems in this section can be found in [12, Appendix A].
[Fig. 3 (the MTS αjoin(VendingMachine); must transitions drawn with solid lines, may transitions with dashed lines) is not recoverable from the text extraction.]
Let Φ be a CTL* formula which is neither in ∀CTL* nor in ∃CTL*, and let MF
be an MFTS. We verify MF |= Φ by checking Φ on the two FTSs MF^may and
MF^must, and then combining the obtained results as specified below.
4 Implementation
We now describe an implementation of our abstraction-based approach for CTL
model checking of variational systems in the context of the state-of-the-art
NuSMV model checker [3]. Since it is difficult to use FTSs to directly model
very large variational systems, we use a high-level modelling language, called
fNuSMV. Then, we show how to implement projection and variability abstrac-
tions as syntactic transformations of fNuSMV models.
A High-Level Modelling Language. fNuSMV is a feature-oriented extension
of the input language of NuSMV, which was introduced by Plath and Ryan
[28] and subsequently improved by Classen [4]. A NuSMV model consists of a
set of variable declarations and a set of assignments. The variable declarations
define the state space and the assignments define the transition relation of the
finite state machine described by the given model. For each variable, there are
assignments that define its initial value and its value in the next state, which
is given as a function of the variable values in the present state. Modules can
be used to encapsulate and factor out recurring submodels. Consider a basic
NuSMV model shown in Fig. 4a. It consists of a single variable x which is
initialized to 0 and does not change its value. The property (marked by the
keyword SPEC) is “∀♦(x ≥ k)”, where k is a meta-variable that can be replaced
with various natural numbers. For this model, the property holds when k = 0.
In all other cases (for k > 0), a counterexample is reported where x stays 0.
The fNuSMV language [28] is based on superimposition. Features are mod-
elled as self-contained textual units using a new FEATURE construct added to
the NuSMV language. A feature describes the changes to be made to the given
basic NuSMV model. It can introduce new variables into the system (in a section
marked by the keyword INTRODUCE), override the definition of existing variables
in the basic model and change the values of those variables when they are read (in
a section marked by the keyword CHANGE). For example, Fig. 4b shows a FEATURE
construct, called A, which changes the basic model in Fig. 4a. In particular, the
feature A defines a new variable nA initialized to 0. The basic system is changed
in such a way that when the condition “nA = 0” holds then in the next state
the basic system’s variable x is incremented by 1 and in this case (when x is
incremented) nA is set to 1. Otherwise, the basic system is not changed.
Classen [4] shows that fNuSMV and FTS are expressively equivalent. He
[4] also proposes a way of composing fNuSMV features with the basic model
to create a single model in pure NuSMV which describes all valid variants.
The information about the variability and features in the composed model is
recorded in the states. This is a slight deviation from the encoding in FTSs,
where this information is part of the transition relation. However, this encoding
has the advantage of being implementable in NuSMV without drastic changes.
In the composed model each feature becomes a Boolean state variable, which is
non-deterministically initialised and whose value never changes. Thus, the initial
states of the composed model include all possible feature combinations. Every
change performed by a feature is guarded by the corresponding feature variable.
For example, the composition of the basic model and the feature A given
in Figs. 4a and b results in the model shown in Fig. 4c. First, a module, called
features , containing all features (in this case, the single one A) is added to the
system. To each feature (e.g. A) corresponds one variable in this module (e.g.
f A). The main module contains a variable named f of type features , so that
all feature variables can be referenced in it (e.g. f.f A). In the next state, the
variable x is incremented by 1 when the feature A is enabled (f A is TRUE ) and
nA is 0. Otherwise (TRUE: can be read as else:), x is not changed. Also, nA is
set to 1 when A is enabled and x is incremented by 1. The property ∀♦(x ≥ 0)
holds for both variants, i.e., when A is enabled and when A is disabled (f A is FALSE).
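The effect of this encoding can be mimicked with a small Python sketch of the composed transition function of Fig. 4c; the state layout is a simplification and not the actual NuSMV semantics, but it shows how the non-deterministically chosen feature variable guards every change contributed by A.

def step(state):
    # One transition of the composed model: the feature variable fA never
    # changes, and it guards the update that feature A contributes to x and nA.
    x, nA, fA = state["x"], state["nA"], state["fA"]
    if fA and nA == 0:
        return {"x": x + 1, "nA": 1, "fA": fA}    # feature A fires exactly once
    return {"x": x, "nA": nA, "fA": fA}           # otherwise behave like the base model

# Both variants (A enabled / disabled) keep x >= 0, so ∀♦(x ≥ 0) holds for both.
for fA in (True, False):
    s = {"x": 0, "nA": 0, "fA": fA}
    for _ in range(3):
        s = step(s)
    print(fA, s)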
The functions α_m and α̃_m copy all basic boolean expressions other than
feature expressions, and recursively call themselves on all sub-expressions of
compound expressions. For αjoin(M)^may, we have a single Boolean variable rnd which
is non-deterministically initialized. Then, α_m(ψ) = rnd if α(ψ) = true. We
have: α([[M]])^may = [[α(M)^may]] and α([[M]])^must = [[α(M)^must]]. For example,
given the composed model M in Fig. 4c, the abstractions αjoin(M)^may and
αjoin(M)^must are shown in Figs. 5 and 6, respectively. Note that αjoin(f.fA) =
5 Evaluation
We now evaluate our abstraction-based verification technique. First, we show
how variability abstractions can turn a previously infeasible analysis of a
variability model into a feasible one. Second, we show that, instead of verifying CTL*
properties using the family-based version of NuSMV [7], we can use variabil-
ity abstraction to obtain an abstract variability model (with a low number of
variants) that can be subsequently model checked using the standard version of
NuSMV.
All experiments were executed on a 64-bit Intel Core i7-4600U CPU run-
ning at 2.10 GHz with 8 GB memory. The implementation, benchmarks, and all
results obtained from our experiments are available from: https://aleksdimovski.
github.io/abstract-ctl.html. For each experiment, we report the time needed to
perform the verification task in seconds. The BDD model checker NuSMV is
run with the parameter -df -dynamic, which ensures that the BDD package
reorders the variables during verification in case the BDD size grows beyond a
certain threshold.
Synthetic Example. As an experiment, we have tested limits of family-based
model checking with extended NuSMV and “brute-force” single-system model
checking with standard NuSMV (where all variants are verified one by one).
We have gradually added variability to the variational model in Fig. 4. This was
done by adding optional features which increase the basic model’s variable x
by the number corresponding to the given feature. For example, the CHANGE
section for the second feature B is: IF (nB = 0) THEN IMPOSE next(x) := x +
2; next(nB) := next(x) = x + 2?1:nB, and the domain of x is 0..3.
We check the assertion ∀♦(x ≥ 0). For |F| = 25 (for which there are |K| = 2^25 variants,
and the state space has 2^32 states) the family-based NuSMV takes around 77 min to
verify the assertion, whereas for |F| = 26 it has not finished the task within two
hours. The analysis time to check the assertion using “brute force” with standard
NuSMV ascends to almost three years for |F| = 25. On the other hand, if we
apply the variability abstraction αjoin , we are able to verify the same assertion by
only one call to standard NuSMV on the abstracted model in 2.54 s for |F| = 25
and in 2.99 s for |F| = 26.
Elevator. The Elevator, designed by Plath and Ryan [28], contains about
300 LOC and 9 independent features: Antiprunk, Empty, Exec, OpenIfIdle,
Overload, Park, QuickClose, Shuttle, and TTFull, thus yielding 2^9 = 512
variants. The elevator serves a number of floors (which is five in our case) such
that there is a single platform button on each floor which calls the elevator.
The elevator will always serve all requests in its current direction before it stops
and changes direction. When serving a floor, the elevator door opens and closes
again. The size of the Elevator model is 2^28 states. On the other hand, the
sizes of αjoin(Elevator)^may and αjoin(Elevator)^must are 2^20 and 2^19 states,
respectively.
We consider five properties. The ∀CTL* property "Φ1 = ∀□(floor = 2 ∧
liftBut5.pressed ∧ direction = up ⇒ ∀[direction = up U floor = 5])"
states that, when the elevator is on the second floor with direction up and
the button five is pressed, the elevator will go up until the fifth floor
is reached. This property is violated by variants for which Overload (the
elevator refuses to close its doors when it is overloaded) is enabled.
Given sufficient knowledge of the system and the property, we can tailor
model checking algorithms for verifying FTSs against LTL [5]. This approach
has been extended [4,7] to enable verification of CTL properties using a family-based
version of NuSMV. In order to make this family-based approach more scalable,
the works [15,21] propose applying conservative variability abstractions on FTSs
for deriving abstract family-based model checking of LTL. An automatic abstraction
refinement procedure for family-based model checking is then proposed in
[19]. The application of variability abstractions for verifying real-time variational
systems is described in [18]. The works [11,13] present an approach for family-based
software model checking of #ifdef-based (second-order) program families
using symbolic game semantics models [10].
To conclude, we have proposed conservative (over-approximating) and dual
(under-approximating) variability abstractions to derive abstract family-based
model checking that preserves full CTL*. The evaluation confirms that interesting
properties can be efficiently verified in this way. In this work, we assume that a
suitable abstraction is manually generated before verification. If we want to make
the whole verification procedure automatic, we need to develop an abstraction
and refinement framework for CTL* properties similar to the one in [19], which
is designed for LTL.
References
1. Apel, S., Batory, D.S., Kästner, C., Saake, G.: Feature-Oriented Software Product
Lines - Concepts and Implementation. Springer, Heidelberg (2013). https://doi.
org/10.1007/978-3-642-37521-7
2. Baier, C., Katoen, J.: Principles of Model Checking. MIT Press, Cambridge,
London (2008)
3. Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M.,
Sebastiani, R., Tacchella, A.: NuSMV 2: an opensource tool for symbolic model
checking. In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp.
359–364. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45657-0_29
4. Classen, A.: CTL model checking for software product lines in NuSMV. Technical
report, P-CS-TR SPLMC-00000002, University of Namur, pp. 1–17 (2011)
5. Classen, A., Cordy, M., Heymans, P., Legay, A., Schobbens, P.: Model checking
software product lines with SNIP. STTT 14(5), 589–612 (2012). https://doi.org/
10.1007/s10009-012-0234-1
6. Classen, A., Cordy, M., Schobbens, P., Heymans, P., Legay, A., Raskin, J.: Featured
transition systems: foundations for verifying variability-intensive systems and their
application to LTL model checking. IEEE Trans. Softw. Eng. 39(8), 1069–1089
(2013). http://doi.ieeecomputersociety.org/10.1109/TSE.2012.86
7. Classen, A., Heymans, P., Schobbens, P.Y., Legay, A.: Symbolic model checking
of software product lines. In: Proceedings of the 33rd International Conference on
Software Engineering, ICSE 2011, pp. 321–330. ACM (2011). http://doi.acm.org/
10.1145/1985793.1985838
8. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns.
Addison-Wesley, Boston (2001)
9. Cousot, P.: Partial completeness of abstract fixpoint checking. In: Choueiry, B.Y.,
Walsh, T. (eds.) SARA 2000. LNCS (LNAI), vol. 1864, pp. 1–25. Springer,
Heidelberg (2000). https://doi.org/10.1007/3-540-44914-0_1
10. Dimovski, A.S.: Program verification using symbolic game semantics. Theor. Com-
put. Sci. 560, 364–379 (2014). https://doi.org/10.1016/j.tcs.2014.01.016
11. Dimovski, A.S.: Symbolic game semantics for model checking program families. In:
Bošnački, D., Wijs, A. (eds.) SPIN 2016. LNCS, vol. 9641, pp. 19–37. Springer,
Cham (2016). https://doi.org/10.1007/978-3-319-32582-8_2
12. Dimovski, A.S.: Abstract family-based model checking using modal featured tran-
sition systems: preservation of CTL (extended version). CoRR abs/1802.04970
(2018). http://arxiv.org/abs/1802.04970
13. Dimovski, A.S.: Verifying annotated program families using symbolic game seman-
tics. Theor. Comput. Sci. 706, 35–53 (2018). https://doi.org/10.1016/j.tcs.2017.
09.029
14. Dimovski, A.S., Al-Sibahi, A.S., Brabrand, C., Wasowski,
A.: Family-based model
checking without a family-based model checker. In: Fischer, B., Geldenhuys, J.
(eds.) SPIN 2015. LNCS, vol. 9232, pp. 282–299. Springer, Cham (2015). https://
doi.org/10.1007/978-3-319-23404-5_18
15. Dimovski, A.S., Al-Sibahi, A.S., Brabrand, C., Wasowski, A.: Efficient family-based
model checking via variability abstractions. STTT 19(5), 585–603 (2017). https://
doi.org/10.1007/s10009-016-0425-2
16. Dimovski, A.S., Brabrand, C., Wasowski, A.: Variability abstractions: trading
precision for speed in family-based analyses. In: 29th European Conference
on Object-Oriented Programming, ECOOP 2015. LIPIcs, vol. 37, pp. 247–270.
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2015). https://doi.org/10.
4230/LIPIcs.ECOOP.2015.247
17. Dimovski, A.S., Brabrand, C., Wasowski, A.: Finding suitable variability
abstractions for family-based analysis. In: Fitzgerald, J., Heitmeyer, C., Gnesi,
S., Philippou, A. (eds.) FM 2016. LNCS, vol. 9995, pp. 217–234. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-48989-6_14
18. Dimovski, A.S., Wasowski,
A.: From transition systems to variability models and
from lifted model checking back to UPPAAL. In: Aceto, L., Bacci, G., Bacci, G.,
Ingólfsdóttir, A., Legay, A., Mardare, R. (eds.) Models, Algorithms, Logics and
Tools. LNCS, vol. 10460, pp. 249–268. Springer, Cham (2017). https://doi.org/10.
1007/978-3-319-63121-9_13
19. Dimovski, A.S., Wasowski,
A.: Variability-specific abstraction refinement for
family-based model checking. In: Huisman, M., Rubin, J. (eds.) FASE 2017. LNCS,
vol. 10202, pp. 406–423. Springer, Heidelberg (2017). https://doi.org/10.1007/978-
3-662-54494-5_24
20. Gazzillo, P., Grimm, R.: SuperC: parsing all of C by taming the preprocessor. In:
Vitek, J., Lin, H., Tip, F. (eds.) ACM SIGPLAN Conference on Programming
Language Design and Implementation, PLDI 2012, Beijing, China, 11–16 June
2012. pp. 323–334. ACM (2012). http://doi.acm.org/10.1145/2254064.2254103
21. Holzmann, G.J.: The SPIN Model Checker - Primer and Reference Manual.
Addison-Wesley, Boston (2004)
22. Iosif-Lazar, A.F., Al-Sibahi, A.S., Dimovski, A.S., Savolainen, J.E., Sierszecki, K.,
Wasowski, A.: Experiences from designing and validating a software modernization
transformation (E). In: 30th IEEE/ACM International Conference on Automated
Software Engineering, ASE 2015. pp. 597–607 (2015). https://doi.org/10.1109/
ASE.2015.84
23. Iosif-Lazar, A.F., Melo, J., Dimovski, A.S., Brabrand, C., Wasowski, A.: Effective
analysis of C programs by rewriting variability. Program. J. 1(1), 1 (2017). https://
doi.org/10.22152/programming-journal.org/2017/1/1
24. Kästner, C., Apel, S., Thüm, T., Saake, G.: Type checking annotation-based
product lines. ACM Trans. Softw. Eng. Methodol. 21(3), 14:1–14:39 (2012).
http://doi.acm.org/10.1145/2211616.2211617
25. Kästner, C., Giarrusso, P.G., Rendel, T., Erdweg, S., Ostermann, K., Berger, T.:
Variability-aware parsing in the presence of lexical macros and conditional compi-
lation. In: Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-
Oriented Programming, Systems, Languages, and Applications, OOPSLA 2011,
pp. 805–824 (2011). http://doi.acm.org/10.1145/2048066.2048128
26. Larsen, K.G., Thomsen, B.: A modal process logic. In: Proceedings of the Third
Annual Symposium on Logic in Computer Science (LICS 1988), pp. 203–210. IEEE
Computer Society (1988). https://doi.org/10.1109/LICS.1988.5119
27. Midtgaard, J., Dimovski, A.S., Brabrand, C., Wasowski, A.: Systematic derivation
of correct variability-aware program analyses. Sci. Comput. Program. 105, 145–170
(2015). https://doi.org/10.1016/j.scico.2015.04.005
28. Plath, M., Ryan, M.: Feature integration using a feature construct. Sci. Comput.
Program. 41(1), 53–84 (2001). https://doi.org/10.1016/S0167-6423(00)00018-6
29. von Rhein, A., Thüm, T., Schaefer, I., Liebig, J., Apel, S.: Variability encod-
ing: from compile-time to load-time variability. J. Log. Algebr. Methods Program.
85(1), 125–145 (2016). https://doi.org/10.1016/j.jlamp.2015.06.007
FPH: Efficient Non-commutativity
Analysis of Feature-Based Systems
1 Introduction
Feature-oriented software development (FOSD) [3] is a promising approach for
developing a collection of similar software products from a shared set of software
assets. In this approach, each feature encapsulates a certain unit of functionality
of a product; features are developed and tested independently and then inte-
grated with each other; developed features are then combined in a prescribed
manner to produce the desired set of products. A well-recognized issue in FOSD
is that it is prone to creating feature interactions [2,13,22,28]: cases where inte-
grating multiple features alters the behavior of one or several of them. Not all
interactions are desirable. E.g., the Night Shift feature of the recent iPhone did
not allow the Battery Saver to be enabled (and the interaction was not fixed
for over 2 months, potentially affecting millions of iPhone users). More critically,
in 2010, Toyota had to recall hundreds of thousands of Prius cars due to an
interaction between the regenerative braking system and the hydraulic braking
system that caused 62 crashes and 12 injuries.
Existing approaches for identifying feature interactions either require an
explicit order in which the features are to be composed [6,8,18,19,26] or assume
presence of a “150%” representation which uses an implicit feature order [12,15].
Yet they do not provide guidance on how to define this order, or how to deter-
mine a relative order of a newly-developed feature w.r.t. existing ones.
A classical approach to feature non-commutativity detection, defined by
Plath and Ryan [25], can be used to help build a composition order. The authors
defined non-commutativity as "the presence of a property, the value of which is
different depending on the order of the composition of the features" and proposed
a model-checking approach that allows checking available properties under
different composition orders. E.g., consider the Elevator System [14,25] consisting
of five features: Empty – to clear the cabin buttons when the elevator is empty;
ExecutiveFloor – to override the value of the variable stop to give priority to the
executive floor (not stopping in the middle); TwoThirdsFull – to override the
value of stop not allowing people to get into the elevator when it is two-thirds
full; Overloaded – to disallow closing of the elevator doors while it is overloaded;
and Weight – to allow the elevator to calculate the weight of the people inside
the cabin. Features TwoThirdsFull and ExecutiveFloor are not commutative
(e.g., a property “the elevator does not stop at other floors when there is a
call from the executive floor” changes value under different composition orders),
whereby Empty and Weight are. Thus, an order between Empty and Weight is
not required, whereas the user needs to determine which of TwoThirdsFull or
ExecutiveFloor should get priority. Thus, feature non-commutativity guarantees
a feature interaction, whereas feature commutativity means that order of compo-
sition does not matter. Both of these outcomes can effectively complement other
feature interaction approaches.
In this paper, we aim to make commutativity analysis practical and appli-
cable to a broad range of modern feature-based systems, so that it can be used
as “the first line of defense” before running other feature interaction detections.
There are three main issues we need to tackle. First of all, to prove that fea-
tures commute requires checking their composition against all properties, and
capturing the complete behavior of features in the form of formal specifications
is an infeasible task. Thus, we aim to make our approach property-independent.
Second, we need to make commutativity analysis scalable and avoid rechecking
the entire system every time a single feature is modified or a new one is added.
Finally, we need to support analysis of systems expressed in modern program-
ming languages such as Java.
In [25], features execute “atomically” in a state-machine representation of the
system, i.e., they make all state changes in one step. However, when systems are
represented in conventional programming languages like Java, feature execution
may take several steps; furthermore, such features are composed sequentially,
using superimposition [5]. Examining properties defined by researchers studying
such systems [6], we note that they do not refer to intermediate states within
the feature execution, but only to states before or after running the feature,
effectively treating features as atomic. In this paper, we use this notion of atom-
icity to formalize commutativity. The foundation of our technique is the separa-
tion between feature behavior and feature composition and efficiently checking
whether different feature compositions orders leave the system in the same inter-
nal state. Otherwise, a property distinguishing between the orders can be found,
and thus they do not commute. We call the technique and the accompanying
tool Mr. Feature Potato Head (FPH ), named after the kids’ toy which can be
composed from interchangeable parts.
In this paper, we show that FPH can perform commutativity analysis in
an efficient and precise manner. It performs a modular checking of pairs of fea-
tures [17], which makes the analysis very scalable: when a feature is modified, the
analysis can focus only on the interactions related to that feature, without need-
ing to consider the entire family. That is, once the initial analysis is completed,
a partial order between the features of the given system can be created and used
for detecting other types of interactions. Any feature added in the future will be
checked against all other features for non-commutativity-related interactions to
define its order among the rest of the features, but the existing order would not
be affected. In this paper, we only focus on the non-commutativity analysis and
consider interaction resolution as being out of scope.
Contributions. This paper makes the following contributions: (1) It defines
commutativity for features expressed in imperative programming languages and
composed via superimposition. (2) It proposes a novel modular representation
for features that distinguishes between feature composition and behavior. (3)
It defines and implements a modular specification-free feature commutativity
analysis that focuses on pairs of features rather than on complete products or
product families. (4) It instantiates this analysis on features expressed in Java.
(5) It shows that the implemented analysis is effective for detecting instances of
non-commutativity as well as proving their absence. (6) It evaluates the efficiency
and scalability of the approach.
The rest of the paper is organized as follows. We provide the necessary back-
ground, fix the notation and define the notion of commutativity in Sect. 2. In
Sect. 3, we describe our iterative tool-supported methodology for detecting fea-
ture non-commutativity for systems expressed in Java. We evaluate the effective-
ness and scalability of our approach in Sect. 4, compare our approach to related
work in Sect. 5 and conclude in Sect. 6.1
2 Preliminaries
In this section, we present the basic concepts and definitions and define the
notion of commutativity used throughout this paper.
1 The complete replication package including the tool binary, case studies used in our experiments and proofs of selected theorems is available at https://github.com/FeaturePotatoHead/FPH.
Then we say that two features commute if they preserve the valuation of
properties of the form G(inBase =⇒ φ), where φ is a propositional formula defined
over any system state variables. That is, they do not commute if there is at least
one state of the base system which changes depending on the order in which the
features are composed. For example, the property "the elevator does not stop at
other floors when there is a call from the executive floor", used in Sect. 1 to
identify non-commutativity between features TwoThirdsFull and ExecutiveFloor, is
G(inBase =⇒ ¬(isExecutiveFloorCalling ∧ stopped ∧ floor ≠ executiveFloor)).
3 Methodology
Our goal is to provide a scalable technique for
determining whether features commute by
establishing whether the two different com-
position orders leave the system in the same
internal state. The workflow of FPH is shown
below. The first step of FPH is to transform
each feature from an FST into an FPH repre-
sentation consisting of a set of fragments. The
base is transformed in the same way as the
individual features. Each fragment is further
split into feature behavior and feature composition – see Sect. 3.1. Afterwards,
we check for non-compositionality. If there are no feature fragments that
have a shared location of composition, i.e., whose feature composition components
are the same, then the features commute. Otherwise, we check the pairs of feature
fragments for behavior preservation, i.e., whether, when the two features are composed
in the same location, the previous behavior is still present and can be executed. If
this check succeeds, we perform the shared variables check – see Sect. 3.2. A sketch
of this pairwise pipeline is given below.
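The sketch below compresses the pipeline into Python; the fragment representation and the three check functions are placeholders for the components described in Sects. 3.1 and 3.2, not FPH's actual implementation.

def may_not_commute(feature1, feature2, same_location, preserves_behavior,
                    shares_variables):
    # Two features commute unless some pair of their fragments is composed at
    # the same location and either breaks behavior preservation or touches
    # shared variables.
    for frag1 in feature1.fragments:
        for frag2 in feature2.fragments:
            if not same_location(frag1, frag2):
                continue                 # no shared composition location
            if not preserves_behavior(frag1, frag2):
                return True              # previous behavior lost: interaction
            if shares_variables(frag1, frag2):
                return True              # conservative answer: may not commute
    return False                         # features commute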
It creates a new Terminal Node to be added to the FST for each feature fragment
in the given feature. The name, type and body attributes of the node are filled
using the corresponding fields in the feature behavior component of the fragment.
Then, starting from the root node, for every node in the location path of the
feature composition component, if the node does not exist in the FST, it is
added; otherwise, the next node of the path is examined. The information about
bp and vars is already contained in the body of the Terminal Node and is no
longer considered as a separate field. E.g., joining the ExecutiveFloor feature
that we previously separated yields the FST in Fig. 2, as expected.
Theorem 1. Let n be the number of features in a system. For every feature F
which can be represented as (fb, fc), Join and Separate are inverses of each
other, i.e., Join(Separate(F)) = F and Separate(Join(fb, fc)) = (fb, fc).
Complexity. Let |F| be the number of features in the system and let M be the
largest number of fragments that each feature can have. For a pair of feature
fragments, checking shared location and checking behavior preservation are both
done in constant time, so the overall complexity of these steps is O((|F| × M)^2).
In the worst case, all features affect the same set of methods and thus the shared
variables check should be run on all of them.2 Yet, all fragments in a feature are
non-overlapping, and thus the number of these checks is at most |F|^2 × M.
2 But this does not happen often – see Sect. 4.
The time to perform a shared variable check, which we denote by SV , can vary
depending on an implementation and can be as expensive as PSPACE-hard.
Thus, the overall complexity of non-commutativity detection is O((|F| × M)^2 +
SV × |F|^2 × M).
4 Evaluation
different composition orders but handle only state machines. SPLVerifier [6] rep-
resents state of the art in verification of feature-based systems expressed in Java,
but it is not designed to do non-commutativity analysis. In the absence of alter-
native tools, we adapted SPLVerifier to the task of finding non-commutativity
violations to be able to compare with FPH.
We conducted two experiments to evaluate FPH and to answer our research
questions. For the first, we ran SPLVerifier on the first six systems (all properties
that came with them satisfied the pattern in Sect. 2 and thus were appropriate
for commutativity detection) presented in Table 1 to identify non-commutativity
interactions. Since SPLVerifier is designed to check products against a set of spec-
ifications, we have to define what a commutativity check means in this context.
For a pair of features, SPLVerifier would detect a commutativity violation if,
upon composing these features in different orders, the provided property pro-
duces different values. During this check, SPLVerifier considers composition of all
other features of the system in all possible orders and thus can identify two-way,
three-way, etc. feature interactions, if applicable. We measured the time taken
by SPLVerifier and the number of interactions found.
For the second experiment, we checked all 29 systems using FPH to identify
non-commutativity interactions. We measured the number of feature pairs that
required checking for shared variables, the time the analysis took and the preci-
sion of FPH in finding interactions. We were unable to establish ground truth for
non-commutativity analysis in cases where FPH required the shared variables
check due to our tool’s reliance on Soot’s unsound call graph construction [7].
Thus, we measured precision of our analysis by manually analyzing the valid-
ity of every interaction found by FPH. We also calculated SPLVerifier’s relative
recall, i.e., the ratio of non-commutativity-related interactions detected by FPH
that were also detected by SPLVerifier. We did not encounter any interactions
that were detected by SPLVerifier but not by FPH.
When the shared variables check is not necessary, our technique is sound.
In such cases, if we inform the user that two features are commutative, they
certainly are, and there is no need to define an order between them. As shown
below, soundness was affected only for a small number of feature pairs. Moreover,
advances in static analysis techniques may improve our results for those cases in
the future. Our experiments were performed on a virtual machine with 2 GB RAM,
hosted on a dual-core Intel Core i5 machine running at 1.3 GHz.
Results. Columns 6–10 of Table 1 summarize results of our experiments, includ-
ing, for the first six examples, SPLVerifier’s precision and (relative) recall. “SV
pairs” capture the number of feature pairs for which the shared variables check
was required. A dash in the precision columns means that the measurement was
not meaningful since no interactions were detected. E.g., SPLVerifier does not
detect any non-commutativity interactions for Email, and FPH does not find
any non-commutativity interactions for EPL. FPH found a number of instances
of non-commutativity such as the one between ExecutiveFloor and TwoThirds-
Full in the Elevator System. Only one SV check was required (while checking
Empty and Weight features). Without our technique, the user would need to
provide order between the five features of the Elevator System, that is, specify 20
(5 × 4) ordering constraints. FPH allows us to conclude that ExecutiveFloor and
TwoThirdsFull do not commute, that Empty and Weight likely commute but this
is not guaranteed, and that all other pairs of features do commute. Thus, only
two feature pairs required further analysis by the user.
The Minepump system did not require the shared variable check at all and
thus FPH analysis for it is sound, and all three of the found interactions were
manually confirmed to be “real” (thus, precision is 1). ChatSystem/Weiss has
nine features which would imply needing to define the order between 72 (9 × 8)
feature pairs. Four non-commutativity cases were found, all using the shared
variables check, but only three were confirmed as “real” via a manual inspection
(thus, precision is 0.75). We conclude that FPH is effective in discovering non-
commutativity violations and proving their absence (RQ1).
We now turn to studying the accuracy of FPH w.r.t. finding non-
commutativity violations (RQ2). From Table 1, we observe that for the Elevator
System, both FPH and SPLVerifier correctly detect a non-commutativity inter-
action. For the Minepump system, SPLVerifier only finds two out of the three
interactions found by FPH (relative recall = 0.67). For the Email system, AJS-
tats, ZipMe, and GPL the specifications available in SPLVerifier do not allow
detecting any of the non-commutativity interactions found by FPH (relative
recall = 0).
GPL was a problematic case for FPH, affecting its precision. The graph algo-
rithms in this example take a set of vertices and create and maintain an internal
data structure (e.g., to calculate the vertices involved in the shortest path or in
a strongly connected component). With this data structure, our analysis found
a number of possible shared variables and incorrectly deemed several features as
non-commutative. E.g., the algorithms to find cycles or the shortest path between
two nodes access the same set of vertices but change different fields and thus are
commutative. One way of avoiding such false positives would be to implement
a field-sensitive alias analysis. While more precise, it would be significantly slower
than our current shared variables analysis.

Fig. 6. (a) Number of FPH varsAnalysis calls per system; (b) Time spent by
FPH varsAnalysis per system; (c) Percentage of non-commutativity checks where BP
or SV analyses were applied last. (Color figure online)
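At its core, a shared-variables check looks for locations that one feature writes while the other reads or writes them. The sketch below is our own, deliberately field-insensitive rendering of that idea (FPH itself relies on Soot's alias information, which is not modeled here); it reproduces the GPL-style false positive just discussed, where two algorithms touch the same vertex objects but different fields.

import java.util.*;

class SharedVariablesCheck {
  // Read and write sets of a feature fragment, tracked per object (field-insensitive).
  record Fragment(Set<String> reads, Set<String> writes) { }

  // Two fragments may fail to commute if one writes a location the other reads or writes.
  static boolean mayConflict(Fragment a, Fragment b) {
    return intersects(a.writes(), b.writes())
        || intersects(a.writes(), b.reads())
        || intersects(b.writes(), a.reads());
  }

  static boolean intersects(Set<String> s1, Set<String> s2) {
    return s1.stream().anyMatch(s2::contains);
  }

  public static void main(String[] args) {
    // Both algorithms read and update the shared vertex set; a field-insensitive view
    // cannot see that they change different fields and therefore reports a conflict.
    Fragment cycleCheck   = new Fragment(Set.of("vertices"), Set.of("vertices"));
    Fragment shortestPath = new Fragment(Set.of("vertices"), Set.of("vertices"));
    System.out.println(mayConflict(cycleCheck, shortestPath)); // true -- a false positive
  }
}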
For the remaining systems, either FPH’s reported interactions were “real”,
or, in cases where it returned some false positives (ChatSystemBurke, ChatSys-
temWeiss, and TankWar), it had to do with the precision of the alias analysis.
Thus, given SPLVerifier’s set of properties, FPH always exhibited the same or
better precision and recall than SPLVerifier. Moreover, for all but three of the
remaining systems, FPH exhibited perfect precision. We thus conclude that FPH
is very accurate (RQ2).
We now turn to the efficiency of our analysis (RQ3). The time it took to
separate features into behavior and composition was usually under 5 s. The out-
lier was BerkeleyDB, which took about a minute, due to the number of features
and especially fragments (BerkeleyDB has 2667 fragments whereas Violet has
912 and the other systems have at most 229). In general, the time taken by
FPH’s commutativity check was highly influenced by the number of calls to
FPH varsAnalysis. Figure 6a shows the number of calls to FPH varsAnalysis as
the number of features increases. E.g., BerkeleyDB has 98 features and required
only one call to FPH varsAnalysis, while AJStats has 19 features and required
136 of these calls. Having more features does not necessarily imply needing more of
these checks. E.g., Violet and BerkeleyDB required fewer checks than AJStats,
TankWar, and GPL, and yet they have more features.
Figure 6b shows the overall time spent by FPH varsAnalysis per system being
analyzed. NotepadQuark and Violet took more time (resp., 1192 sec. and 1270
sec.) than GPL (1084 sec.) since these systems have calls to Java GUI libraries
(awt and swing), thus resulting in a larger call graph than for GPL. A simi-
lar situation occurred during checking TankWar (1790 sec.) and AJStats (1418
sec.). It took FPH under 200 s in most cases and less than 35 min in the worst
case to analyze non-commutativity (see Fig. 6b). FPH was efficient because
FPH varsAnalysis was required for a relatively small fraction of pairs of fea-
ture fragments. We plot this information in Fig. 6c. For each analyzed system, it
shows the percentage of feature fragments for which behavior preservation (BP)
or shared variables (SV) was the last check conducted by FPH (out of the pos-
sible 100%). We omit the systems for which these checks were required for less
than 1% of feature pairs. The figure shows that the calls to FPH varsAnalysis
(to compute SV, in blue) were not required for over 96% of feature pairs.
To check for non-commutativity violations, SPLVerifier needs to check all
possible products, which is infeasible in practice. We therefore set the timeout to one
hour during which SPLVerifier was able to check 110 products for Elevator, 57 for
Email, 151 for Minepump, 3542 for GPL, 2278 for AJStats and 1269 for ZipMe.
For each of these systems, a different check is required for every specification,
thus the same product is checked more than once if more than one specification
exists. Even though GPL, AJStats and ZipMe are larger systems with more fea-
tures, they have fewer properties associated with them and therefore we were
able to check more products within one hour. Thus, to answer RQ3, FPH was
much more efficient than SPLVerifier in performing non-commutativity analysis.
SPLVerifier was only able to analyze products containing the base system and
at most three features before reaching a timeout. Moreover, FPH can guaran-
tee commutativity, while SPLVerifier cannot, since its analysis is limited to the
properties it is given.
Our experiments also allow us to conclude that our technique is highly scal-
able (RQ4). E.g., the percentage of calls to FPH varsAnalysis is shown to be
small and increases only slightly with an increase in the number of fragments (see
Fig. 6a and b).
Threats to Validity. Our results may not generalize to other feature-based sys-
tems expressed in Java. We believe we have mitigated this threat by running our
tool on examples provided by FeatureHouse. They include a variety of systems
of different sizes which we consider to be representative of typical Java feature-
based systems. As mentioned earlier, our use of SPLVerifier was not as intended
by its designers. We also had no ground truth when the shared variable check
was required. For those few cases, we calculated SPLVerifier’s relative instead of
actual recall.
5 Related Work
In this section, we survey related work on modular feature definitions, feature
interaction detection and commutativity-related feature interactions.
Modular Feature Definitions. A number of approaches to modular feature
definitions have been proposed. E.g., the composition language in [8] includes
states in which the feature is to be composed (similar to our fg.location) and the
feature behavior (similar to our fb.body). Other work [4,9,10] uses superimposi-
tion of FSTs to obtain the composed system. In [14,25], new variables are added
or existing ones are changed with particular kinds of compositions (either execut-
ing a new behavior when a particular variable is read, or adding a check before a
particular variable is set). These approaches treat the feature behavior together
with its composition specification. Instead, our approach automatically separates
the feature definition into its behavioral and composition parts, enabling a more
scalable and efficient analysis.
Feature Interaction Detection. Calder et al. [13] survey approaches for ana-
lyzing feature interactions. Interactions occur because the behavior of one fea-
ture is being affected by others, e.g., by adding non-deterministic choices that
result in conflicting states, by adding infinite loops that affect termination, or
by affecting some assertions that are satisfied by the feature on its own. Check-
ing these properties as well as those discussed in more recent work [8,15,18,19]
requires building the entire SPL. Additionally, all these approaches consider state
machine representations which are not available for most SPLs, and extracting
them from code is non-trivial. SPLLift [12] is a family-based static analysis tool
not directly intended to find interactions. Any change in a feature would require
building the family-based representation again, whereas we conduct modular
checks between features. Spek [26] is a product-based approach that analyzes
whether the different products satisfy provided feature specifications. It does
not check whether the features commute.
Non-commutativity-Related Feature Interactions. [5,8] also looked at
detecting non-commutativity-related feature interactions. [5] presents a feature
algebra and shows why composition (by superimposition) is, in general, not com-
mutative. [8] analyzes feature commutativity by checking for bisimulation, and
the result of the composition is a state machine representing the product. Neither
work reports on a tool or applies to systems expressed in Java.
Aspect-Oriented Approaches. Storzer et al. [27] present a tool prototype
for detecting precedence-related interactions in AspectJ. Technically, this app-
roach is very similar to ours: it (a) detects which advice is activated at the same
place; (b) checks whether the proceed keyword and exceptions are present; and
(c) analyzes read and written variables. Yet, the focus is on aspects, and often
many aspects are required to implement a single feature [23]. This implies that
for m features with an average of n aspects each, the analysis in [27] needs to
make O((m · n)²) checks, while our approach requires O(m²) checks. There-
fore, the approach in [27] might be significantly slower than FPH. [1] analyzes
References
1. Aksit, M., Rensink, A., Staijen, T.: A graph-transformation-based simulation app-
roach for analysing aspect interference on shared join points. In: Proceedings of
AOSD 2009, pp. 39–50 (2009)
2. Apel, S., Atlee, J., Baresi, L., Zave, P.: Feature interactions: the next generation
(Dagstuhl Seminar 14281). Dagstuhl Rep. 4(7), 1–24 (2014)
3. Apel, S., Kästner, C.: An overview of feature-oriented software development. J.
Object Technol. 8(5), 49–84 (2009)
4. Apel, S., Kastner, C., Lengauer, C.: FeatureHouse: language-independent, auto-
mated software composition. In: Proceedings of ICSE 2009, pp. 221–231 (2009)
5. Apel, S., Lengauer, C., Möller, B., Kästner, C.: An algebra for features and feature
composition. In: Proceedings of AMAST 2008, pp. 36–50 (2008)
6. Apel, S., Von Rhein, A., Wendler, P., Groslinger, A., Beyer, D.: Strategies for
product-line verification: case studies and experiments. In: Proceedings of ICSE
2013 (2013)
7. Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y.,
Octeau, D., McDaniel, P.: Flowdroid: precise context, flow, field, object-sensitive
and lifecycle-aware taint analysis for android apps. ACM SIGPLAN Not. 49(6),
259–269 (2014)
8. Atlee, J., Beidu, S., Fahrenberg, U., Legay, A.: Merging features in featured tran-
sition systems. In: Proceedings of MoDeVVa@MODELS 2015, pp. 38–43 (2015)
9. Batory, D., Sarvela, J., Rauschmayer, A.: Scaling step-wise refinement. IEEE TSE
30(6), 355–371 (2004)
10. Beidu, S., Atlee, J., Shaker, P.: Incremental and commutative composition of state-
machine models of features. In: Proceedings of MiSE@ICSE 2015, pp. 13–18 (2015)
11. Berger, T., Lettner, D., Rubin, J., Grünbacher, P., Silva, A., Becker, M., Chechik,
M., Czarnecki, K.: What is a feature?: A qualitative study of features in industrial
software product lines. In: Proceedings of SPLC 2015, pp. 16–25 (2015)
12. Bodden, E., Tolêdo, T., Ribeiro, M., Brabrand, C., Borba, P., Mezini, M.: SPLLift:
Statically analyzing software product lines in minutes instead of years. In: Proceed-
ings of PLDI 2013, pp. 355–364 (2013)
13. Calder, M., Kolberg, M., Magill, E., Reiff-Marganiec, S.: Feature interaction: A
critical review and considered forecast. Comput. Netw. 41(1), 115–141 (2003)
14. Classen, A., Cordy, M., Heymans, P., Legay, A., Schobbens, P.-Y.: Formal seman-
tics, modular specification, and symbolic verification of product-line behaviour.
Sci. Comput. Program. 80, 416–439 (2014)
15. Cordy, M., Classen, A., Schobbens, P.-Y., Heymans, P., Legay, A.: Managing evo-
lution in software product lines: a model-checking perspective. In: Proceedings of
VaMoS 2002, pp. 183–191 (2012)
16. Disenfeld, C., Katz, S.: A closer look at aspect interference and cooperation. In:
Proceedings of AOSD 2012, pp. 107–118. ACM (2012)
17. Fantechi, A., Gnesi, S., Semini, L.: Optimizing feature interaction detection. In:
Petrucci, L., Seceleanu, C., Cavalcanti, A. (eds.) FMICS-AVoCS 2017. LNCS, vol.
10471, pp. 201–216. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-
67113-0 13
18. Guelev, D., Ryan, M., Schobbens, P.-Y.: Model-checking the preservation of tem-
poral properties upon feature integration. STTT 9(1), 53–62 (2007)
19. Jayaraman, P., Whittle, J., Elkhodary, A.M., Gomaa, H.: Model composition in
product lines and feature interaction detection using critical pair analysis. In:
Engels, G., Opdyke, B., Schmidt, D.C., Weil, F. (eds.) MODELS 2007. LNCS,
vol. 4735, pp. 151–165. Springer, Heidelberg (2007). https://doi.org/10.1007/978-
3-540-75209-7 11
20. Krishnamurthi, S., Fisler, K., Greenberg, M.: Verifying aspect advice modularly.
In: ACM SIGSOFT SEN, vol. 29, pp. 137–146. ACM (2004)
21. Lam, P., Bodden, E., Lhoták, O., Hendren, L.: The Soot framework for Java pro-
gram analysis: a retrospective. In: Proceedings of CETUS 2011, vol. 15, p. 35
(2011)
22. Liu, J., Batory, D., Nedunuri, S.: Modeling interactions in feature oriented software
designs. In: Proceedings of ICFI 2005 (2005)
23. Lopez-Herrejon, R.E., Batory, D., Cook, W.: Evaluating support for features
in advanced modularization technologies. In: Black, A.P. (ed.) ECOOP 2005.
LNCS, vol. 3586, pp. 169–194. Springer, Heidelberg (2005). https://doi.org/10.
1007/11531142 8
24. Nipkow, T., Von Oheimb, D.: Javalight is type-safe - definitely. In: Proceedings of
PLDI 1998, pp. 161–170. ACM (1998)
25. Plath, M., Ryan, M.: Feature integration using a feature construct. Sci. Comput.
Program. 41(1), 53–84 (2001)
26. Scholz, W., Thüm, T., Apel, S., Lengauer, C.: Automatic detection of feature inter-
actions using the java modeling language: an experience report. In: Proceedings of
SPLC 2011, p. 7 (2011)
27. Storzer, M., Forster, F.: Detecting precedence-related advice interference. In: Pro-
ceedings of ASE 2006, pp. 317–322, September 2006
28. Zave, P.: Feature interactions and formal specifications in telecommunications.
IEEE Comput. 26(8), 20–29 (1993)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
Taming Multi-Variability of Software
Product Line Transformations
1 Introduction
Software product line engineering [1] enables systematic reuse of software arti-
facts through the explicit management of variability. Representing a software
product line (SPL) in terms of functionality increments called features, and map-
ping these features to development artifacts such as domain models and code
allows one to generate custom-tailored products on demand, by retrieving the corre-
sponding artifacts for a given feature selection. Companies such as Bosch, Boe-
ing, and Philips use SPLs to deliver tailor-made products to their customers [2].
Despite these benefits, a growing amount of variability leads to combinatorial
explosions of the product space and, consequently, to severe challenges. Notably,
this applies to software engineering tasks such as refactorings [3], refinements [4],
and evolution steps [5], which, to support systematic management, are often
expressed as model transformations. When applying a given model transforma-
tion to an SPL, a key challenge is to avoid enumerating and considering all possible
products individually. To this end, Salay et al. [6] have proposed an algorithm
that “lifts” regular transformation rules to a whole product line. The algorithm
Fig. 1. Overview
2 Running Example
In this section, we introduce SPLs and variability-based model transformation by
example, and motivate and explain our contribution in the light of this example.
Software Product Lines. An SPL represents a collection of models that are
similar to, yet different from, each other. Figure 2 shows a washing machine controller
SPL in an annotative representation, comprising an annotated domain model
and a feature model. The feature model [23] specifies a root feature Wash with
three optional children Heat, Delay, and Dry, where Heat and Delay are mutually
exclusive. The domain model is a statechart diagram specifying the behavior
of the controller SPL based on states Locking, Waiting, Washing, Drying, and
UnLocking with transitions between them. Presence conditions, shown in gray
labels, denote the condition under which an annotated element is present. These
conditions are used to specify variations in the execution behavior.
Concrete products can be obtained from configurations, in which each
optional feature is set to either true or false. A product arises by removing
those elements whose presence condition evaluates to false in the given con-
figuration. For instance, selecting Delay and deselecting Heat and Dry yields
the product shown on the right of Fig. 2. The SPL has six configurations and
products in total, since Wash is non-optional and Delay excludes Heat.

Fig. 2. Washing machine controller product line and product (adapted from [6]).
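A minimal sketch of this derivation step follows (our own code; the element names and presence conditions are guesses based on the description of Fig. 2, and the Element/PresenceCondition types are hypothetical): a product keeps exactly those elements whose presence condition evaluates to true under the chosen configuration.

import java.util.*;
import java.util.function.Predicate;

class ProductDerivation {
  // A presence condition is a predicate over a configuration (feature name -> true/false).
  interface PresenceCondition extends Predicate<Map<String, Boolean>> { }

  record Element(String name, PresenceCondition pc) { }

  // Keep exactly those elements whose presence condition holds in the configuration.
  static List<String> derive(List<Element> domainModel, Map<String, Boolean> config) {
    return domainModel.stream().filter(e -> e.pc().test(config)).map(Element::name).toList();
  }

  public static void main(String[] args) {
    PresenceCondition always = cfg -> true;
    PresenceCondition delay  = cfg -> cfg.getOrDefault("Delay", false);
    PresenceCondition dry    = cfg -> cfg.getOrDefault("Dry", false);

    List<Element> washingMachine = List.of(
        new Element("Locking", always),
        new Element("Waiting", delay),   // assumed present only if Delay is selected
        new Element("Washing", always),
        new Element("Drying", dry),      // assumed present only if Dry is selected
        new Element("UnLocking", always));

    // Selecting Delay, deselecting Heat and Dry (cf. the product on the right of Fig. 2).
    Map<String, Boolean> config = Map.of("Wash", true, "Delay", true, "Heat", false, "Dry", false);
    System.out.println(derive(washingMachine, config)); // [Locking, Waiting, Washing, UnLocking]
  }
}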
Variability-Based (VB) Model Transformation. In complex model trans-
formation scenarios, developers often create rules that are similar, but different
to each other. As an example, consider two rules foldEntryActions and foldExi-
tActions (Fig. 3), called A and B in short. These rules express a “fold” refactoring
for statechart diagrams: if a state has two incoming or outgoing transitions with
the same action, these actions are to be replaced by an entry or exit action of
the state. The rules have a left- and a right-hand side (LHS, RHS). The LHS
specifies a pattern to be matched to an input graph, and the difference between
the LHS and the RHS specifies a change to be performed for each match, such as
removing transition actions and adding entry and exit actions.
Rules A and B are simple; however, in a realistic transformation system, the
number of required rules can grow exponentially with the number of variation
points in the rules. To avoid combinatorial explosion, a set of variability-intensive
rules can be encoded into a single representation using a VB rule [12,18]. A VB
rule consists of an LHS, an RHS, a feature model specifying a set of interrelated
features, and presence conditions annotating LHS and RHS elements with a
condition under which they are present. Individual “flat” rules are obtained via
configuration, i.e., binding each feature to either true or false. In the VB rule
A + B, the feature model specifies a root feature refactor with alternative child
features foldEntry and foldExit. Since exactly one child feature has to be active
at a time, two possible configurations exist. The two rules arising from these
configurations are isomorphic to rules A and B.
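The count of two configurations can be checked mechanically; the following tiny sketch (our own, with the feature-model constraint of the VB rule written out by hand) enumerates the bindings of foldEntry and foldExit and keeps those satisfying the root-plus-alternative constraint.

class FoldRuleConfigs {
  // Feature model of the VB rule A + B: mandatory root 'refactor' with
  // alternative (exactly-one) children foldEntry and foldExit.
  static boolean valid(boolean refactor, boolean foldEntry, boolean foldExit) {
    return refactor && (foldEntry ^ foldExit);
  }

  public static void main(String[] args) {
    int count = 0;
    for (boolean fe : new boolean[] { false, true })
      for (boolean fx : new boolean[] { false, true })
        if (valid(true, fe, fx)) {
          count++;
          System.out.println("foldEntry=" + fe + ", foldExit=" + fx);
        }
    System.out.println(count + " configurations, i.e., two flat rules isomorphic to A and B");
  }
}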
Problem Statement. Model transformations such as foldActions are usually
designed for applications to a concrete software product, represented by a single
Fig. 3. Two rules and their encoding into a variability-based rule (adapted from [24]).
3 Background
We now introduce the necessary prerequisites of our methodology, starting with
the double-pushout approach to algebraic graph transformation [19]. As the
underlying structure, we assume the category of graphs with graph morphisms
(referred to as morphisms from here), although all considerations are likely com-
patible with additional graph features such as typing and attributes.
TransFF(P, ř) = { Pi ⇒rc,mc Qi | Pi ∈ Flat(P), rc ∈ Flat(ř), match mc : Lc → Pi }
In the example, there are two rules and six products; however, only for two
products—the ones arising from configurations with Delay = true and Heat
= false—does a match, and therefore a rule application, exist, as we saw in the
earlier description of the example. TransFF(P, ř) comprises the resulting two
rule applications.
To reuse matches to the domain model for the products, we introduce the
rerouting of a morphism from its codomain onto another graph G′. We omit
naming the codomain and G′ explicitly where they are clear from the context.

Definition 8 (Rerouted morphism). Let an inclusion i : G′ → G, a morphism
m : L → G with an epi-mono-factorization (e, m′), and a morphism j :
m[L] → G′ be given, s.t. m′ = i ◦ j. The rerouted morphism reroute(m, G′) : L → G′
arises by composition: reroute(m, G′) = j ◦ e.
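Operationally, and under our own simplifying assumptions (matches are plain name-to-element maps and a product is identified with the set of elements it retains; none of this is the paper's code), rerouting a match m onto a product amounts to checking that the image m[L] survives in the product and, if so, reusing the same assignment with the smaller codomain, i.e., forming j ◦ e from Definition 8.

import java.util.*;

class Rerouting {
  // A match maps each rule node (by name) to an element of the domain model;
  // a product is represented by the set of domain-model elements it retains.
  static Optional<Map<String, String>> reroute(Map<String, String> match, Set<String> productElements) {
    // The rerouted morphism exists iff the image m[L] lies inside the product.
    if (!productElements.containsAll(match.values())) return Optional.empty();
    // Same assignment, now read as a morphism into the product (j composed with e in Definition 8).
    return Optional.of(Map.copyOf(match));
  }

  public static void main(String[] args) {
    Map<String, String> match = Map.of("x1", "Washing", "x2", "Locking");
    System.out.println(reroute(match, Set.of("Locking", "Waiting", "Washing", "UnLocking"))); // present
    System.out.println(reroute(match, Set.of("Locking", "UnLocking"))); // Optional.empty
  }
}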
The lifting algorithm of [6] takes a single rule and applies it to a domain model
and its presence conditions in such a way as if the rule had been applied to each
product individually. The considered rule in our case is a flat rule with a match
to the domain model.
Note that we cannot compare the set of staged applications directly to the
set of flattened applications, since the former does not live on the product level.
We can, however, compare the sets of products obtained from both sets of
applications, which turn out to be the same, thus showing the correctness of our approach.
Proof. Since both sets are defined over the same set of matches of flat rules, the
proof follows straight from the definition of lifting.
5 Algorithm
We present an algorithm for implementing the staged application of a VB rule ř
to a product line P. Following the overview in Sect. 2 and the treatment in
Sect. 4, the main idea is to proceed in three steps: First, we match the base rule
of ř to the domain model, ignoring presence conditions. Second, we consider
individual rules as far as necessary to obtain matches to the domain model.
Third, based on the matches, we perform the actual rule application by using
the lifting algorithm from [6] in a black-box manner.

Algorithm 1. Staged application.
Input: Product line P, VB rule ř
Output: Transformed product line P
1  BMatches := findMatches(ModelP, r0);
2  foreach m ∈ BMatches do
3      Φpc := { pc ∈ pcspre };
4      if ΦP ∧ Φpc is SAT then
5          foreach c ∈ configs(ř) do
6              flatRule := ř.removeAll(e | c ⊭ pce);
7              Matches := findMatches(ModelP, flatRule, m);
8              lift(P, flatRule, Matches);
9          end
10     end
11 end
Algorithm 1 shows the computation in more detail. In line 1, ř’s base rule r0
is matched to the domain model ModelP , leading to a set of base matches. If
this set is empty, we have reached the first exit criterion and can stop directly.
Otherwise, given a match m, in line 2, we check if at least one product Pi exists
that m can be rerouted onto (Definition 8). To this end, in lines 3–4, we use a SAT
solver to check if there is a valid configuration of P's feature model for which all
presence conditions of the matched elements hold.
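The satisfiability test on lines 3–4 can be pictured with a brute-force stand-in for the SAT solver (our own illustration using the washing machine example; Algorithm 1 itself delegates this check to a SAT solver over the product line's actual feature-model and presence-condition formulas): a base match is kept only if some valid configuration satisfies every presence condition collected from the matched elements.

import java.util.*;
import java.util.function.Predicate;

class BaseMatchFilter {
  // Is there a configuration that is valid w.r.t. the feature model (phiFM)
  // and satisfies every presence condition of the matched elements (phiPC)?
  static boolean satisfiable(List<String> features,
                             Predicate<Map<String, Boolean>> phiFM,
                             List<Predicate<Map<String, Boolean>>> phiPC) {
    int n = features.size();
    for (int bits = 0; bits < (1 << n); bits++) {
      Map<String, Boolean> cfg = new HashMap<>();
      for (int i = 0; i < n; i++) cfg.put(features.get(i), (bits & (1 << i)) != 0);
      if (phiFM.test(cfg) && phiPC.stream().allMatch(pc -> pc.test(cfg))) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> features = List.of("Wash", "Heat", "Delay", "Dry");
    // Washing machine feature model: Wash is mandatory, Heat and Delay are mutually exclusive.
    Predicate<Map<String, Boolean>> phiFM =
        cfg -> cfg.get("Wash") && !(cfg.get("Heat") && cfg.get("Delay"));
    // A base match touching elements annotated with both Delay and Heat can never be rerouted:
    System.out.println(satisfiable(features, phiFM,
        List.of(cfg -> cfg.get("Delay"), cfg -> cfg.get("Heat")))); // false
    // A match touching only Delay-annotated elements can:
    System.out.println(satisfiable(features, phiFM,
        List.of(cfg -> cfg.get("Delay")))); // true
  }
}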
6 Evaluation
To evaluate our technique, we implemented it for Henshin [27,28], a graph-based
model transformation language, and applied it to a transformation scenario with
product lines and transformation variability. The goal of our evaluation was to
study if our technique indeed produces the expected performance benefits.
Setup. The transformation is concerned with the detection of applied editing
operations during model differencing [29]. This setting is particularly interesting
for a performance evaluation: Since differencing is a routine software develop-
ment task, low latency of the used tools is a prerequisite for developer effective-
ness. The rule set, called UmlRecog, is tailored to the detection of UML edit
operations. Each rule detects a specific edit operation, such as “move method to
superclass”, based on a pair of model versions and a low-level difference trace.
UmlRecog comprises 1404 rules, which, as shown in Table 2, fall into three main cat-
egories: Create/Set, Change/Move, and Delete/Unset. To study the effect of our
technique on performance, an encoding of the rules into VB rules was required.
We obtained this encoding using RuleMerger [18], a tool for generating VB rules
from classic ones based on clustering and clone detection [30]. We obtained 504
VB rules, each representing between 1 and 71 classic rules. UmlRecog is
publicly available as part of a benchmark transformation set [31].
We applied this transformation to the 6 UML-based product lines specified
in Table 3. The product lines came from diverse sources and include manually
designed ones (1–2), and reverse-engineered ones from open-source projects (3–
6). Each product line was available as a UML model annotated with presence
conditions over a feature model. To produce the model version pairs used by
UmlRecog, we automatically simulated development steps by nondeterministi-
cally applying rules from a set of edit rules to the product lines, using the lifting
algorithm to account for presence conditions during the simulated editing step.
Table 4. Execution times (in seconds) of the lifting and the staged approach.
          | Lift  Stage Factor | Lift  Stage Factor | Lift  Stage Factor | Lift  Stage Factor
InCar     | 2.13  0.52  4.1    | 0.23  0.12  1.9    | 7.28  0.86  8.5    | 9.66  1.49  6.5
E2E       | 1.99  0.82  2.4    | 0.35  0.32  1.1    | 7.28  0.95  7.7    | 9.62  2.12  4.5
JSSE      | 2.00  0.51  3.9    | 0.24  0.16  1.5    | 8.40  3.08  2.7    | 10.61 3.79  2.8
Notepad   | 2.05  0.66  3.1    | 0.26  0.14  1.9    | 7.01  1.64  4.3    | 9.38  2.47  3.8
Mobile    | 2.00  0.55  3.7    | 0.24  0.13  1.9    | 8.28  1.62  5.1    | 10.55 2.26  4.7
Lampiro   | 2.05  0.64  3.2    | 0.26  0.15  1.7    | 8.25  2.58  3.2    | 10.55 3.29  3.2
than the rule variability, which implies a high performance penalty when enu-
merating products. Since we currently do not support advanced transformation
features, e.g., negative application conditions and amalgamation, we used vari-
ants of the flat and the VB rules without these concepts. We used a Ubuntu 17.04
system (Oracle JDK 1.8, Intel Core i5-6200U, 8 GB RAM) for all experiments.
Results. Table 4 gives an overview of the results of our experiments. The total
execution times for our technique were between 1.5 and 3.3 s, compared to 9.4
and 10.6 s for lifting, yielding a speedup by factors between 2.8 and 6.5. For both
techniques, all execution times are in the same order of magnitude across product
lines. A possible explanation is that the number of applicable rules was small:
if the vast majority of rules can be discarded early in the matching process, the
execution time is largely independent of the number of rules.
The greatest speedups were observed for the Change/Move category, in which
rule variability was the greatest as well, indicated by the ratio between rules
and VB rules in Table 2. This observation is in line with our rationale of reusing
shared matches between rules. With respect to the number of products, no clear
scalability trend is apparent, which suggests that lifting already suffices
for controlling product-line variability. Still, based on the overall results, the
hypothesis that our technique improves performance in situations with signifi-
cant product-line and transformation variability can be confirmed.
Threats to Validity. Regarding external validity, we only considered a limited
set of scenarios, based on six product lines and one large-scale transformation.
We aim to apply our technique to a broader class of cases in the future. The
version pairs were obtained in a synthetic process, arguably one that produces
pessimistic cases. Our treatment so far is also limited to a particular transfor-
mation paradigm, AGT, and one variability paradigm, the annotative one. Still,
AGT and annotative variability are the underlying paradigms of many state-
of-the-art tools. Finally, we did not consider the advanced AGT concepts of
negative application conditions and amalgamation in our evaluation; extending
our technique accordingly is left as future work.
7 Related Work
During an SPL’s lifecycle, not only the domain model, but also the feature
model evolves [32,33]. To support the combined transformation of domain and
feature models, Taentzer et al. [25] propose a unifying formal framework which
generalizes Salay et al.’s notion of lifting [6], yet in a different direction than us:
focusing on combined changes, this approach is not geared for internal variability
of rules; similar rules are considered separately. Both works could be combined
using a rule concept with separate feature models for rule and SPL variability.
Beyond transformations of SPLs, transformations have been used to imple-
ment SPLs. Feature-oriented development [34] supports the implementation of
features as additive changes to a base product. Delta-oriented programming [35]
adds flexibility to this approach: changes are specified using deltas that sup-
port deletions and modifications as well. Impact analysis in an evolving SPL can
Acknowledgement. We thank Rick Salay and the anonymous reviewers for their con-
structive feedback. This work was supported by the Deutsche Forschungsgemeinschaft
(DFG), project SecVolution@Run-time, no. 221328183.
References
1. Pohl, K., Boeckle, G., van der Linden, F.: Software Product Line Engineering:
Foundations, Principles, and Techniques. Springer, Heidelberg (2005). https://doi.
org/10.1007/3-540-28901-1
2. Apel, S., Batory, D., Kästner, C., Saake, G.: Feature-Oriented Software Product
Lines: Concepts and Implementation. Springer, Heidelberg (2013). https://doi.org/
10.1007/978-3-642-37521-7
3. Schulze, S., Thüm, T., Kuhlemann, M., Saake, G.: Variant-preserving refactoring
in feature-oriented software product lines. In: VaMoS, pp. 73–81 (2012)
4. Borba, P., Teixeira, L., Gheyi, R.: A theory of software product line refinement.
Theor. Comput. Sci. 455, 2–30 (2012)
5. Lity, S., Kowal, M., Schaefer, I.: Higher-order delta modeling for software product
line evolution. In: FOSD, pp. 39–48 (2016)
6. Salay, R., Famelis, M., Rubin, J., Sandro, A.D., Chechik, M.: Lifting model trans-
formations to product lines. In: ICSE, pp. 117–128 (2014)
7. Kolovos, D.S., Rose, L.M., Matragkas, N., Paige, R.F., Guerra, E., Cuadrado, J.S.,
De Lara, J., Ráth, I., Varró, D., Tisi, M., et al.: A research roadmap towards
achieving scalability in model driven engineering. In: BigMDE, p. 2. ACM (2013)
8. Sijtema, M.: Introducing variability rules in ATL for managing variability in MDE-
based product lines. In: MtATL 2010, pp. 39–49 (2010)
9. Anjorin, A., Saller, K., Lochau, M., Schürr, A.: Modularizing triple graph gram-
mars using rule refinement. In: Gnesi, S., Rensink, A. (eds.) FASE 2014. LNCS,
vol. 8411, pp. 340–354. Springer, Heidelberg (2014). https://doi.org/10.1007/978-
3-642-54804-8 24
10. Hussein, J., Moreau, L., et al.: A template-based graph transformation system for
the PROV data model. In: GCM (2016)
11. Strüber, D.: Model-driven engineering in the large: refactoring techniques for mod-
els and model transformation systems. Ph.D. dissertation, Philipps-Universität
Marburg (2016)
12. Strüber, D., Rubin, J., Arendt, T., Chechik, M., Taentzer, G., Plöger, J.:
Variability-based model transformation: formal foundation and application. Formal
Aspects Comput. 30, 133–162 (2017)
13. Strüber, D., Rubin, J., Chechik, M., Taentzer, G.: A variability-based approach
to reusable and efficient model transformations. In: Egyed, A., Schaefer, I. (eds.)
FASE 2015. LNCS, vol. 9033, pp. 283–298. Springer, Heidelberg (2015). https://
doi.org/10.1007/978-3-662-46675-9 19
14. Kästner, C., Apel, S., Trujillo, S., Kuhlemann, M., Batory, D.: Language-
independent safe decomposition of legacy applications into features, vol. 2. Techni-
cal report, School of Computer Science, University of Magdeburg, Germany (2008)
15. Di Sandro, A., Salay, R., Famelis, M., Kokaly, S., Chechik, M.: MMINT: a graphical
tool for interactive model management. In: P&D@ MoDELS, pp. 16–19 (2015)
16. Strüber, D., Schulz, S.: A tool environment for managing families of model trans-
formation rules. In: Echahed, R., Minas, M. (eds.) ICGT 2016. LNCS, vol. 9761,
pp. 89–101. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40530-8 6
17. Rubin, J., Chechik, M.: Combining related products into product lines. In: de
Lara, J., Zisman, A. (eds.) FASE 2012. LNCS, vol. 7212, pp. 285–300. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-28872-2 20
18. Strüber, D., Rubin, J., Arendt, T., Chechik, M., Taentzer, G., Plöger, J.: RuleMerger:
automatic construction of variability-based model transformation rules.
In: Stevens, P., Wąsowski, A. (eds.) FASE 2016. LNCS, vol. 9633, pp. 122–140.
Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49665-7 8
19. Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Fundamentals of Algebraic
Graph Transformation. MTCSAES. Springer, Heidelberg (2006). https://doi.org/
10.1007/3-540-31188-2
20. Czarnecki, K., Helsen, S.: Feature-based survey of model transformation
approaches. IBM Syst. J. 45(3), 621–645 (2006)
21. Richa, E., Borde, E., Pautet, L.: Translation of ATL to AGT and application
to a code generator for Simulink. SoSyM, 1–24 (2017). https://link.springer.com/
article/10.1007/s10270-017-0607-8
22. Kästner, C., Apel, S., Kuhlemann, M.: Granularity in software product lines. In:
ICSE, pp. 311–320 (2008)
23. Kang, K.C., Cohen, S.G., Hess, J.A., Novak, W.E., Peterson, A.S.: Feature-oriented
domain analysis (FODA) feasibility study. Technical report, Software Engineering
Inst., Carnegie-Mellon Univ., Pittsburgh, PA (1990)
24. Chechik, M., Famelis, M., Salay, R., Strüber, D.: Perspectives of model transfor-
mation reuse. In: Ábrahám, E., Huisman, M. (eds.) IFM 2016. LNCS, vol. 9681,
pp. 28–44. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-33693-0 3
25. Taentzer, G., Salay, R., Strüber, D., Chechik, M.: Transformations of software
product lines: a generalizing framework based on category theory. In: MODELS,
pp. 101–111. IEEE (2017)
26. Gomes, C.P., Kautz, H., Sabharwal, A., Selman, B.: Satisfiability solvers. In: Foun-
dations of Artificial Intelligence, vol. 3, pp. 89–134 (2008)
27. Arendt, T., Biermann, E., Jurack, S., Krause, C., Taentzer, G.: Henshin: advanced
concepts and tools for in-place EMF model transformations. In: Petriu, D.C.,
Rouquette, N., Haugen, Ø. (eds.) MODELS 2010. LNCS, vol. 6394, pp. 121–135.
Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16145-2 9
28. Strüber, D., Born, K., Gill, K.D., Groner, R., Kehrer, T., Ohrndorf, M., Tichy,
M.: Henshin: a usability-focused framework for EMF model transformation devel-
opment. In: de Lara, J., Plump, D. (eds.) ICGT 2017. LNCS, vol. 10373, pp.
196–208. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61470-0 12
29. Kehrer, T., Kelter, U., Taentzer, G.: A rule-based approach to the semantic lifting
of model differences in the context of model versioning. In: ASE, pp. 163–172.
IEEE Computer Society (2011)
30. Strüber, D., Plöger, J., Acreţoaie, V.: Clone detection for graph-based model trans-
formation languages. In: Van Gorp, P., Engels, G. (eds.) ICMT 2016. LNCS, vol.
9765, pp. 191–206. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
42064-6 13
31. Strüber, D., Kehrer, T., Arendt, T., Pietsch, C., Reuling, D.: Scalability of model
transformations: position paper and benchmark set. In: Workshop on Scalable
Model Driven Engineering, pp. 21–30 (2016)
32. Thüm, T., Batory, D., Kästner, C.: Reasoning about edits to feature models. In:
ICSE, pp. 254–264 (2009)
33. Bürdek, J., Kehrer, T., Lochau, M., Reuling, D., Kelter, U., Schürr, A.: Reason-
ing about product-line evolution using complex feature model differences. Autom.
Softw. Eng. 23, 687–733 (2015)
34. Trujillo, S., Batory, D., Diaz, O.: Feature oriented model driven development: a
case study for portlets. In: ICSE, pp. 44–53. IEEE Computer Society (2007)
35. Schaefer, I., Bettini, L., Bono, V., Damiani, F., Tanzarella, N.: Delta-oriented pro-
gramming of software product lines. In: Bosch, J., Lee, J. (eds.) SPLC 2010. LNCS,
vol. 6287, pp. 77–91. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-
642-15579-6 6
36. Damiani, F., Hähnle, R., Kamburjan, E., Lienhardt, M.: A unified and formal
programming model for deltas and traits. In: Huisman, M., Rubin, J. (eds.) FASE
2017. LNCS, vol. 10202, pp. 424–441. Springer, Heidelberg (2017). https://doi.org/
10.1007/978-3-662-54494-5 25
37. He, X., Hu, Z., Liu, Y.: Towards variability management in bidirectional model
transformation. In: COMPSAC, vol. 1, pp. 224–233. IEEE (2017)
38. Biermann, E., Ermel, C., Taentzer, G.: Lifting parallel graph transformation con-
cepts to model transformation based on the eclipse modeling framework. Electron.
Commun. EASST 26 (2010)
39. Rensink, A.: Compositionality in graph transformation. In: Abramsky, S., Gavoille,
C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS,
vol. 6199, pp. 309–320. Springer, Heidelberg (2010). https://doi.org/10.1007/978-
3-642-14162-1 26
40. Ghamarian, A.H., Rensink, A.: Generalised compositionality in graph transforma-
tion. In: Ehrig, H., Engels, G., Kreowski, H.-J., Rozenberg, G. (eds.) ICGT 2012.
LNCS, vol. 7562, pp. 234–248. Springer, Heidelberg (2012). https://doi.org/10.
1007/978-3-642-33654-6 16
41. Perrouin, G., Amrani, M., Acher, M., Combemale, B., Legay, A., Schobbens, P.-Y.:
Featured model types: towards systematic reuse in modelling language engineering.
In: MiSE, pp. 1–7. IEEE (2016)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.