Requirements Engineering
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Roel Wieringa Anne Persson (Eds.)
Requirements Engineering:
Foundation for
Software Quality
Volume Editors
Roel Wieringa
University of Twente
Enschede, The Netherlands
E-mail: r.j.wieringa@utwente.nl
Anne Persson
University of Skövde
Skövde, Sweden
E-mail: anne.persson@his.se
ISSN 0302-9743
ISBN-10 3-642-14191-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-14191-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180
Preface
This volume compiles the papers accepted for presentation at the 16th Working Conference
on Requirements Engineering: Foundation for Software Quality (REFSQ 2010),
held in Essen, Germany, from June 30 to July 2, 2010.
Since 1994, when the first REFSQ took place, requirements engineering (RE) has
never ceased to be a dominant factor influencing the quality of software, systems and
services. Initially started as a workshop, the REFSQ working conference series has
now established itself as one of the leading international forums to discuss RE in its
(many) relations to quality. It seeks reports of novel ideas and techniques that enhance
the quality of RE products and processes, as well as reflections on current research
and industrial RE practices. One of the most appreciated characteristics of REFSQ is
that of being a highly interactive and structured event. REFSQ 2010 was no exception
to this tradition.
In all, we received a healthy 57 submissions. After all submissions had been carefully
assessed by three independent reviewers and discussed electronically, the Program
Committee met and finally selected 15 top-quality full papers (13 research papers and
2 experience reports) and 7 short papers, resulting in an acceptance rate of 38%.
The work presented at REFSQ 2010 continues to have a strong anchoring in practice,
with empirical investigations spanning a wide range of application domains.
As in previous years, these proceedings serve as a record of REFSQ 2010, but also
present an excellent snapshot of the state of the art of research and practice in RE. As
such, we believe that they are of interest to the whole RE community, from students
embarking on their PhD to experienced practitioners interested in emerging knowl-
edge, techniques and methods. At the time of writing, REFSQ 2010 has not taken
place yet. All readers who are interested in an account of the discussions that took
place during the conference should consult the post-conference summary that we
intend to publish as usual in the ACM SIGSOFT Software Engineering Notes.
REFSQ is essentially a collaborative effort. First of all, we thank Klaus Pohl for his
work as General Chair of the conference. We also extend our gratitude to Ernst Sikora
and Mikael Berndtsson, who served REFSQ 2010 very well as Organization Chair and
Publications Chair, respectively, and to Andreas Gehlert for serving equally well as
Workshop and Poster Chair.
As the Program Chairs of REFSQ 2010, we deeply thank the members of the Pro-
gram Committee and the additional referees for their careful and timely reviews. We
particularly thank those who have actively participated in the Program Committee
meeting and those who have volunteered to act as shepherds to help finalize promis-
ing papers.
General Chair
Klaus Pohl University of Duisburg-Essen, Germany
Organizing Chair
Ernst Sikora University of Duisburg-Essen, Germany
Publications Chair
Mikael Berndtsson University of Skövde, Sweden
Program Committee
Ian Alexander Scenarioplus, UK
Aybüke Aurum University of New South Wales, Australia
Daniel M. Berry University of Waterloo, Canada
Jürgen Börstler University of Umeå, Sweden
Sjaak Brinkkemper Utrecht University, The Netherlands
David Callele University of Saskatchewan, Canada
Alan Davis University of Colorado at Colorado Springs, USA
Eric Dubois CRP Henri Tudor, Luxembourg
Jörg Dörr Fraunhofer-IESE, Germany
Christof Ebert Vector, Germany
Anthony Finkelstein University College London, UK
Xavier Franch Universitat Politècnica de Catalunya, Spain
Samuel Fricker University of Zurich and Fuchs-Informatik AG,
Switzerland
Vincenzo Gervasi Università di Pisa, Italy
Martin Glinz University of Zurich, Switzerland
Tony Gorschek Blekinge Institute of Technology, Sweden
Olly Gotel Independent Researcher, New York City, USA
Paul Grünbacher University of Linz, Austria
Peter Haumer IBM Rational, USA
Patrick Heymans University of Namur, Belgium
External Reviewers
Willem Bekkers Daniel Kerkow
Andreas Classen Dewi Mairiza
Alexander Delater Anshuman Saxena
Oscar Dieste Kevin Vlaanderen
Arash Golnam Inge van de Weerd
Florian Graf Richard Berntsson Svensson
Wiebe Hordijk Robert Heinrich
Jennifer Horkoff Rumyana Proynova
Cedric Jeanneret Sebastian Barney
Isabel John Sira Vegas
Table of Contents
Keynote
Keynote Talk: Piecing Together the Requirements Jigsaw-Puzzle . . . . . . . 1
Ian Alexander
Product Families I
Requirements Value Chains: Stakeholder Management and
Requirements Engineering in Software Ecosystems . . . . . . . . . . . . . . . . . . . 60
Samuel Fricker
Requirements Patterns
Towards a Framework for Specifying Software Robustness Requirements
Based on Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Ali Shahrokni and Robert Feldt
Product Families II
Towards Multi-view Feature-Based Configuration . . . . . . . . . . . . . . . . . . . . 106
Arnaud Hubaux, Patrick Heymans, Pierre-Yves Schobbens, and
Dirk Deridder
Natural Language
A Domain Ontology Building Process for Guiding Requirements
Elicitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Inah Omoronyia, Guttorm Sindre, Tor Stålhane, Stefan Biffl,
Thomas Moser, and Wikan Sunindyo
Security Requirements
On the Role of Ambiguity in RE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Vincenzo Gervasi and Didar Zowghi
Poster
How Do Software Architects Consider Non-Functional Requirements:
A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
David Ameller and Xavier Franch
Ian Alexander
Scenarioplus Ltd., UK
iany@scenarioplus.org.uk
Software developers have been made to write requirements for their projects since the
1960s. Researchers have investigated every imaginable technique. But requirements
are still not being put together well. Something is going wrong.
One reason is that while different schools of research advocate powerful methods
(goal modeling, scenario analysis, rationale modeling and more), industry still
believes that requirements are stand-alone imperative statements. The mismatch be-
tween the wealth of techniques known to researchers and the impoverished lists of
shall-statements used in industry is striking.
The solution cannot be found by devising yet more elaborate techniques, yet more
complex puzzle-pieces. Even the existing ones are scarcely used in industry. Instead,
we need to work out how to assemble the set of available puzzle-pieces (existing
ways of discovering and documenting requirements) into simple, practical industrial
methods.
Another reason is that existing textbooks, and perhaps requirements education and
training too, largely assume that projects are all alike, developing stand-alone
software from scratch. But projects are constrained by contracts, fashion, standards
and not least by existing systems. The problems they must solve, and the techniques
they need to use, vary enormously. Pure and simple green-field development is the
exception.
This talk suggests:
- what the pieces of the requirements jigsaw-puzzle are, for example scenario
analysis and goal modelling;
- how, in general, they can be fitted together, for example as sequences of activities
and by traceability;
- how, more specifically, projects of different types can re-assemble the pieces to
solve their own puzzles, for example by tailoring imposed templates, or developing
processes appropriate to their domain and situation.
There are numerous answers to each of these questions. Perhaps the real message is
that there is not one requirements engineering, but many.
1 Introduction
Self-adaptation is emerging as a design strategy to mitigate maintenance costs in sys-
tems where factors such as complexity, mission-criticality or remoteness make off-
line adaptation impractical. Self-adaptation offers a means to respond to changes in a
system's environment by sensing contextual or environmental change at run-time and
adapting the behaviour of the system accordingly. In this paper we refer to self-
adaptive systems as dynamically adaptive systems (DASs) to reflect their ability to
adapt autonomously to changing context at run-time.
Dynamically adaptive systems have now been deployed in a number of problem
domains [1] yet remain challenging to develop because there is typically a significant
degree of uncertainty about the environments in which they operate. Indeed, this un-
certainty is the primary reason why a DAS must be able to self-adapt: to continue to
operate in a range of contexts with different requirements or requirements trade-offs.
DASs remain challenging to develop, despite advances made at different levels in the
software abstraction hierarchy and by communities as diverse as AI and networking.
At the architectural level [2], for example, compositional adaptation [3] promotes the
re-use of DAS components: structural elements of the system can be combined and
recombined at run-time using adaptive middleware (e.g., [4]).
Despite such advances, and the seminal work of Fickas and Feather [5] in require-
ments monitoring, the RE community has only recently started to address the
challenges of dynamic adaptation. Our own recent contribution [6, 7, 8] has been to
investigate the use of goal-based techniques for reasoning about the requirements for
DASs. In [7], we advocated the use of claims from the NFR framework [9] in i* [10]
strategic rationale models to enhance the traceability of DAS requirements.
In this paper, we go a step further and argue for the utility of claims in DAS goal
models as markers for uncertainty. This research result has emerged as an unexpected
side-effect of our work on claims for tracing. Design rationale in a DAS is not always
founded on good evidence, but sometimes on supposition about how the system will
behave in different, hard-to-predict contexts. Thus claims can serve not only as design
rationale but also as proxies for the analysts' understanding. It is crucial that the conse-
quences of decisions based on assumptions that may subsequently prove to be false
are understood, even if there is insufficient data to validate the assumptions them-
selves. We propose that a validation scenario should be defined to evaluate the effect
of a claim turning out to be false.
The primary contribution of the paper is a simple means for reasoning about hier-
archies of claims to understand how uncertainty propagates throughout these hierar-
chies. We show how this knowledge can be leveraged to improve the robustness of a
DAS specification using validation scenarios while minimizing the number of valida-
tion scenarios that need to be defined and evaluated. We demonstrate our approach
using a case study drawn from a sensor grid that was deployed on the River Ribble in
the Northwest of England. This sensor grid has acted as a preliminary evaluation for
our use of claim reasoning in DASs.
The rest of the paper is structured as follows. In the next section, section 2, we
introduce what we mean by claim reasoning and in section 3 we explain how an exist-
ing DAS modeling process may be adapted to include claims. We then use a case
study to illustrate claim reasoning about a DAS in section 4, and conclude with a brief
survey of related work (section 5) and final conclusions (section 6).
2 Claim Reasoning
In [7] we augmented i* models used to model DASs with claims, a concept borrowed
from the NFR toolkit [9]. As is now well-known within RE, i* supports reasoning
about systems in terms of agents, dependencies, goals and softgoals. We showed how
claims can be used to record requirements traceability information by explicitly re-
cording the rationale behind decisions, in cases where the contribution links assigned
to softgoals for different solution alternatives in i* strategic rationale (SR) models
don't reveal an obvious choice. We argued that this enhances the tracing information
in a way that is particularly important for a system that can adapt at run-time to
changing context.
As described above, claims capture the rationale for selecting one alternative de-
sign over another. As an example and before considering the effect of claims with
respect to self-adaptive behaviour, consider the fragment of a simple SR model of a
robot vacuum cleaner for domestic apartments depicted in Fig 1. The vacuum cleaner
has a goal to clean the apartment (clean apartment) and two softgoals: to avoid
causing a danger to people within the house (avoid tripping hazard) and to be eco-
nomical to run (minimize energy costs). The vacuum cleaner can satisfy the clean
apartment goal by two different strategies that have been identified. It can clean at
night or when the apartment is empty. These two strategies are represented by two
alternative tasks related to the goal using means-end relationships. The choice of best
strategy is unclear because at this early stage of the analysis, it is hard to discriminate
between the extent to which each solution strategy satisfices the softgoals. The
balance of -ve and +ve effects on satisficement of the softgoals appears to be ap-
proximately the same for both, but to different softgoals. This is depicted by the con-
tribution links labeled help and hurt. However, the choice is resolved using a claim,
which has the effect of asserting that there is no tripping hazard. The claim thus
breaks the hurt contribution link between the task clean at night and the softgoal
avoid tripping hazard. The break-ing claim nullifies the contribution link to which it
is attached. In this case it nullifies the negative impact that night cleaning was pre-
sumed to have on tripping hazard avoidance. In turn, this has the effect of promoting
the night cleaning strategy over the empty apartment cleaning strategy since it now
appears to better satisfice the two softgoals. The inverse of a break-ing claim is a
make-ing claim, which lends additional credence to a contribution link, implying the
importance of the satisfaction of a softgoal with a helps link or the unacceptability of
failing to satisfy a softgoal with a hurts link.1 Note that claims speak of the impor-
tance of the effects captured by the contribution link, not the magnitude of the effect,
which can be captured using fine-grained contribution links.
The no tripping hazard claim is sufficient to select the night cleaning strategy, but
only if there is sufficient confidence in the claim that no hazard is offered. However,
the analyst may have greater or lesser confidence in a claim, so claim confidence
spans a range from axiomatic claims in which full confidence is held, to claims that
are mere assumptions. At the assumption end of the claim confidence range, a claim
is essentially a conjecture about a Rumsfeldian "known unknown" [11] and thus
serves as a marker of something about which uncertainty exists.
If a claim is wrong, the performance of the system may be unsatisfactory, or the
system may exhibit harmful emergent behaviour, or even fail completely. Ideally, the
claims should be validated before the system is deployed. In a DAS, this may be very
hard to do, however. Since the world in which a DAS operates is imperfectly under-
stood, at least some conjectural claims are likely to be impossible to validate at
design-time with complete assurance.
Given this fundamental limitation on claim validation, the behaviour of a system
should be evaluated in cases where claims turn out to be false. To do this, a validation
scenario should be designed for each claim to help establish the effects of claim falsi-
fication, such as whether it causes undesirable emergent behaviour. We do not define
the form of a validation scenario; it may be a test case or some form of static reason-
ing. However, developing and executing validation scenarios for each claim can be
expensive. A validation scenario should be developed for every possible combination
of broken and unbroken claims. Hence, the number of validation scenarios T is
T = 2^n - 1, where n represents the number of claims that make or break a softgoal
contribution link. One of the three target systems (explained below) of the GridStix system
1 Note that there are several other types of contribution and claim link besides those presented here.
to see is derived from the claim vacuum has warning light. The claims family sleeps
at night and vacuum is easy to see together derive the bottom-level claim no tripping
hazard, which is the claim underpinning the selection of the night cleaning strategy
for the vacuum cleaner.
Falsity of any claim will propagate down the claim refinement model to the
bottom-level claim. If the family sleeps at night claim is shown to be untrue or only
partially true, the claim that there is no tripping hazard is upheld by the OR-ed claim
vacuum is easy to see, provided the latter claim is sound. If confidence in all the
claims in a claim refinement model was low, a validation scenario would have to be
produced for every combination of true and false claims. However, some of
the claims in the model may be axiomatic and the logic of claim derivations may give
confidence in the bottom-level claims even where they are in part derived from non-
axiomatic claims. To exploit this, a binary classification of claims, as axiomatic or
conjectural, might be used:
Unbreakable claims are axiomatic and can be removed from consideration. Vac-
uum has warning light is such a claim.
Uncertain claims are conjectural in that it is considered possible that the claim
could prove false at run-time. The claim that the family sleeps at night, and by impli-
cation would not encounter the vacuum cleaner, is conjectural because it cannot be
assumed to be generally true. In contrast to the vacuum is easy to see claim, there is a
significant possibility of this claim being proven false, since even a normally sound
sleeper might awake and visit the bathroom while the vacuum cleaner is operating.
Using this classification a validation scenario should be developed for every bot-
tom-level claim that resolves to uncertain, but need not be developed for one that
resolves to unbreakable. Thus, for the robot vacuum cleaner, a validation scenario is
needed to examine the consequences of the vacuum cleaner operating at night if the
family was not asleep.
Unbreakable and uncertain represent the two extremes of the claim confidence
spectrum. There may be claims that could be falsified but for which it is believed to
be improbable that the system will encounter a situation where the claims are broken.
We propose that such claims be classified as Qualified. However, a qualified claim
should be re-classified as uncertain if, despite the low probability of its being falsi-
fied, the consequences of falsification would be serious. Note that a claim with a
high probability of falsification would already be classified uncertain, even if the
consequences were considered minor.
The claim vacuum is easy to see might be classified as a qualified claim. Even a
vacuum cleaner whose visibility is enhanced by a warning light may not be easily
noticeable to people with visual impairment or people who are sleep-walking. How-
ever, sleep-walking is unusual and it is unlikely that robot vacuum cleaners would be
recommended for people for whom they were obviously a hazard.
To propagate claim values (unbreakable, qualified, uncertain), the following rules
apply (a small illustrative sketch is given after the rules):
Where a claim is derived from a single claim (e.g. vacuum is easy to see is derived
directly from vacuum has warning light), the derived claim inherits the value of the
upper-level claim; the upper-level claim makes or breaks the derived claim. Hence,
vacuum is easy to see assumes the value unbreakable directly from vacuum has
warning light.
Where a claim is derived from two or more AND-ed claims, it inherits the value of
the weakest of the claims i.e. the claim least certain to be true. Hence if an un-
breakable and an uncertain claim are AND-ed to derive a bottom-level claim, the
bottom-level claim will be uncertain.
Where a claim is derived from two or more OR-ed claims, it inherits the value of
the strongest of the claims. Hence no tripping hazard assumes the value unbreak-
able from vacuum is easy to see, despite family sleeps at night being uncertain. In
the example, this has the effect of validating selection of the night-time cleaning
strategy in Fig 1.
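To make these rules concrete, here is a minimal sketch that applies them to the vacuum-cleaner claims. The three-level ordering, the function names and the Python encoding are our own illustrative assumptions; they are not part of LoREM or the NFR framework.

```python
# Illustrative sketch, not part of LoREM or the NFR framework: propagate claim
# confidence values through a claim refinement model using the rules above.
# Ordering assumption: unbreakable < qualified < uncertain (least to most uncertain).

CONFIDENCE_ORDER = ["unbreakable", "qualified", "uncertain"]

def propagate_single(parent):
    """Rule 1: a claim derived from a single claim inherits that claim's value."""
    return parent

def propagate_and(parents):
    """Rule 2: AND-ed claims propagate the weakest (least certain) value."""
    return max(parents, key=CONFIDENCE_ORDER.index)

def propagate_or(parents):
    """Rule 3: OR-ed claims propagate the strongest (most certain) value."""
    return min(parents, key=CONFIDENCE_ORDER.index)

# The vacuum-cleaner example from the text:
vacuum_has_warning_light = "unbreakable"
family_sleeps_at_night = "uncertain"

vacuum_is_easy_to_see = propagate_single(vacuum_has_warning_light)   # "unbreakable"
no_tripping_hazard = propagate_or([family_sleeps_at_night,
                                   vacuum_is_easy_to_see])           # "unbreakable"

print(no_tripping_hazard)  # -> unbreakable, matching the OR rule above
```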
The classification of claims cannot be easily formalized or automated. It requires hu-
man judgment by the analyst and stakeholders. However, if claims are classified
thoughtfully and propagated through the claim refinement model, the result can be
considered to be a form of threat modeling [12]. The exercise generates validation
scenarios for the goal models but prunes back the set of all possible validation scenar-
ios to include only claims that are judged plausible or high risk. In the next sections,
we describe how claims can be used to extend LoREM, a previously-reported
approach for modeling DAS requirements [6].
4 Case Study
To illustrate the use of claims for validation scenario identification in LoREM, we
present a conceptually simple but real DAS and work through the model analysis for a
single target system.
GridStix [15] is a system deployed on the River Ribble in North West England that
performs flood monitoring and prediction. It is a sensor network with smart nodes
capable of sensing the state of the river, processing the data and communicating it
across the network. The hardware available includes sensors that can measure depth
and flow rate, and wireless communication modules for the Wi-Fi and Bluetooth
standards, all supplied with power by batteries and solar panels. The system software
uses the GridKit middleware system [4], which provides the system's self-adaptive
capabilities using component substitution.
The flow rate and river depth data is used by a point prediction model which
predicts the likelihood of the river flooding using data from the local node and data
cascaded from nodes further upstream. The more upstream nodes from which data is
available, the more accurate is the prediction. GridStix acts as a lightweight Grid,
capable of distributing tasks. This is important because some tasks, such as execution
of the point prediction models, can be parallelized and are best distributed among the
resource-constrained nodes. However, distributing the processing comes at the cost of
increased energy consumption and this is one of the factors that affect satisficement
of the energy efficiency softgoal mentioned previously. Distribution may be effected
by communicating between nodes using either the IEEE 802.11b (referred to as Wi-Fi
in the rest of this paper) or Bluetooth communication standards. Bluetooth has lower
power consumption, but shorter range. Bluetooth-based communication is hence
thought to be less resilient to node failure than Wi-Fi-based communication. The
choice of spanning tree algorithm can also affect resilience. A fewest-hop (FH) algo-
rithm is considered better able to tolerate node failure than the shortest-path (SP) al-
gorithm. However, data transmission via SP typically requires less power due to the
smaller overall distance that the data must be transmitted.
The GridKit middleware can configure itself dynamically to support all the varia-
tions implied above: local node or distributed processing, Wi-Fi or Bluetooth com-
munications and different network topologies, as well as others.
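The variation points listed above can be summarised as a small configuration record. The sketch below is our own illustrative encoding, not GridKit's actual configuration interface.

```python
from dataclasses import dataclass

# Illustrative sketch (not GridKit's actual API): the GridStix variation points
# described above, captured as a simple configuration record.
@dataclass
class GridStixConfig:
    processing: str     # "local" or "distributed"
    network: str        # "Wi-Fi" (longer range, more power) or "Bluetooth" (less power)
    spanning_tree: str  # "SP" (shortest path, less power) or "FH" (fewest hops, more resilient)

# A configuration that favours fault tolerance over energy efficiency,
# for example when node failure becomes likely:
resilient = GridStixConfig(processing="distributed", network="Wi-Fi", spanning_tree="FH")
print(resilient)
```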
The environment in which GridStix operates is volatile, as the River Ribble drains a
large upland area that is subject to high rainfall. The river is therefore liable to flood-
ing with consequent risk to property and livestock among the communities sited on
the flood plain. Stochastic models provide only an imperfect understanding of the
river's behaviour. Moreover, events upstream, such as construction sites or vegetation
changes, may alter its behaviour over time. There is therefore significant uncertainty
associated with developing an autonomous sensor network for deployment on the
river.
the river is about to flood. When this happens, GridStix itself is at risk of damage
from node submersion and from water-borne debris.
GridStix is conceptualised as comprising three target systems, S1, S2 and S3 tai-
lored to domains D1, D2 and D3 respectively, as shown in Fig 3.
The LoREM models for the system have previously been published in [6]. Here,
we will focus on only one of the target systems to illustrate the use of claim reason-
ing. The SD model identified a single overall goal Predict flooding, and three soft-
goals: Fault tolerance, Energy efficiency and Prediction accuracy. The SR model for
S3 (Flood) is depicted in Fig 4.
Flood prediction in S3 is operationalised by the task Provide Point Prediction
which can be further decomposed into the (sub)goals: Measure Depth, Calculate
Flow Rate and the task Communicate Data. The Communicate Data task depends on
the Transmit Data and Organize Network subgoals being achieved. Satisfaction of the
Measure Depth and Calculate Flow Rate goals produces Depth and Flow Rate re-
sources respectively. These are used elsewhere in LoREM models which we do not
consider here.
The Calculate Flow Rate, Organize Network and Transmit Data goals all have
several alternative satisfaction strategies, represented by tasks connected to the re-
spective goals with means-end links. The three softgoals are depicted on the right of
Fig 4, and the impact of selecting individual tasks on the satisficement of each of the
softgoals is represented by the contribution links with values help or hurt.
The tasks represent alternative solution strategies and those selected to satisfy
goals in S3 are coloured white in Fig 4. Attached to some of the contribution links in
Fig 4 are three claims that make or break the softgoal contributions. The claim
refinement model for S3 is depicted in Fig 5.
The three claims shown in Fig 4 appear at the bottom of Fig 5, connected to the
claims from which they are derived by contribution links. They show, for example,
that Wi-Fi was selected over Bluetooth to satisfy the Transmit Data goal because
Bluetooth was considered too risky in terms of Fault Tolerance. Examining Fig 5
allows the basis for this claim to be established: that Bluetooth is less resilient than
Wi-Fi and that, given the river is about to flood in S3, there is a significant risk of
node failure. Bluetooth is considered less resilient than Wi-Fi because of its poorer
range, which reduces the number of nodes an individual node may communicate with,
so increasing the likelihood of a single node failure hampering communication
throughout the network.
Fig. 6. GridStix Claim Refinement Model for S3 (Flood) Annotated with Claim Classifications
There are three bottom-level claims in Figs. 4 and 5, so covering every combination of
claims would require seven validation scenarios. Each claim was analysed and classi-
fied according to the scheme presented in Section 2. The results are shown in Fig. 6.
Two claims were considered uncertain. The claims SP is less resilient than FH and
Bluetooth is less resilient than Wi-Fi were both considered uncertain because the ana-
lyst was not certain whether the theoretically more resilient option in each case would
actually prove demonstrably more resilient in the field. Note that the Monitorable tag
(M) denotes a claim that could be directly monitored by the DAS at run-time. We
return to claim monitoring in the conclusions.
Propagating the unbreakable, qualified, and uncertain values through the tree in Fig
6 shows that two of the three bottom-level claims, SP too risky for S3 and Bluetooth
too risky for S3, could be impacted by an uncertain claim and thus be uncertain
themselves. The consequence of this analysis is that three validation scenarios are
necessary. Recall that the purpose of the validation scenarios is to investigate the con-
sequences of false claims, not to validate the claims themselves. The three identified
validation scenarios contrast with the seven that would be necessary if all the claims
were assumed to be falsifiable. The three scenarios are summarized in Table 1.
Scenario | Single-Node not accurate enough for S3 | SP too risky for S3 | Bluetooth too risky for S3
1        | True                                   | True                | False
2        | True                                   | False               | True
3        | True                                   | False               | False
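The rows of Table 1 follow mechanically from the classification. The sketch below (an illustration, not tooling used in the study) enumerates every combination in which at least one of the two uncertain bottom-level claims is falsified, reproducing the T = 2^n - 1 count from Section 2.

```python
from itertools import product

# Illustrative sketch: enumerate the validation scenarios for S3. Assumption:
# only bottom-level claims classified "uncertain" can be falsified, while
# "Single-Node not accurate enough for S3" is treated as always holding.
uncertain_claims = ["SP too risky for S3", "Bluetooth too risky for S3"]

scenarios = []
for truth in product([True, False], repeat=len(uncertain_claims)):
    if all(truth):            # skip the case where no claim is falsified
        continue
    scenarios.append(dict(zip(uncertain_claims, truth)))

# T = 2^n - 1 for n uncertain claims: here 2^2 - 1 = 3, as in Table 1.
for number, scenario in enumerate(scenarios, start=1):
    print(number, scenario)
```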
The S1 target system stands out as having the largest number of bottom-level
claims (five), which reflects the complexity of the trade-offs in available solution
strategies for this domain. Table 3 shows the numbers of validation scenarios that
would be needed in each target system for all claims, qualified and uncertain claims,
and uncertain claims only, respectively. This shows that using our strategy, the
number of validation scenarios for S1, even though it has five bottom-level claims,
could be restricted to three for uncertain claims or seven if further assurance was felt
necessary by including qualified claims.
Thus, devising validation scenarios for all combinations of uncertain claims would
require nine scenarios, and for the qualified and the uncertain claims would require
seventeen scenarios. Devising scenarios for all combinations of claims in the GridStix
system would require forty-five validation scenarios.
The GridStix system, as presented here, is only modestly complex from a model-
ling perspective, with only three softgoals and three target systems. Many DASs will
be significantly more complex. The effort involved in evaluating the consequences of
poorly-informed softgoal trade-offs in a DAS increases rapidly with the number of
potentially-conflicting NFRs involved. The technique described here has the potential
to focus and limit this effort by the use of claims to explicitly record the rationale for
solution strategy selection and by explicitly considering the soundness of those
claims. An undesirable side-effect of using claims is that they add to the problem of
complexity management in i* diagrams [21]. This is partially mitigated by the fact
that we have designed claims to work with LoREM. Using LoREM, an SR model for
a DAS is partitioned, with a separate SR model for each target system.
5 Related Work
There is currently much interest in software engineering for DASs [16]. Much of this
work has been in the design of software architectures that enable flexible adaptations
[2]. Much research in RE for self-adaptation has focused on run-time monitoring of
requirements conformance [5, 17], which is crucial if a DAS is to detect when and
how to adapt at run-time. More recently, attention has turned to requirements model-
ing for DASs, and a number of authors (e.g. [8, 18]) report on the use of goals for
modeling requirements for DASs. Goal models are well suited to exploring the
alternative solution strategies that are possible when the environment changes. Here,
adaptation is seen as the means to maintain goal satisficement, while goal modeling
notations such as KAOS [19] and i* [10] support reasoning about goals and softgoals.
A key challenge posed by DASs for RE is uncertainty about their environments
and a number of modeling approaches for handling uncertainty have been proposed.
Letier and van Lamsweerde propose a formal means to reason about partial goal satis-
faction. Cheng et al. [8] use KAOS's obstacle analysis to reason about uncertainty,
utilizing a small set of mitigation strategies that include directly framing the uncer-
tainty using the RELAX requirements language [20]. In this paper, we propose aug-
menting the i* models used by the LoREM approach to DAS modeling [6] with the
claim construct adapted from the NFR framework [9]. We argue that by using claims
as the rationale for selecting between alternative solution strategies, they can also
serve as explicit markers for uncertainty where rationale is induced from assumed
properties of the environment or the DAS's own behaviour.
There are two important differences between claims and the belief construct that is
built into i*. The first is that an i* belief represents a condition that an actor holds to
be true. In our use of claims, the claim may also represent a condition that the analyst
holds to be true. The second difference is that a belief attaches to a softgoal while a
claim attaches to a softgoal's contribution link. Hence, a claim is able to provide the
explicit rationale for selecting a particular solution strategy.
6 Conclusions
Claims attached to i* softgoal contribution links can be used to provide the rationales
for selecting from several alternative solution strategies. Used this way, claims can be
useful for tracing in DAS goal models [7]. Moreover, as we argue in this paper,
claims may also be used as markers of uncertainty. The utility of claims may extend
beyond DASs, but we focus on DASs because there is often significant uncertainty
about a DAS's environment. Uncertainty may even extend to the DAS's own,
emergent behaviour, if (e.g.) adaptation results in unexpected configurations of run-
time-substitutable components.
Not all claims represent uncertainty, however. The confidence level in a claim will
generally fall somewhere on a spectrum from axiomatic to pure conjecture. Conjec-
tural claims represent uncertainty: assumptions that cannot be validated at design-
time. Conjectural claims may therefore be falsified at run-time, possibly leading to a
variety of undesirable effects. Accepting that such claims can't be easily validated at
design-time, we should instead evaluate how the system will behave if a claim proves
to be false by developing a validation scenario. A validation scenario subsumes a test
case that may be developed for some combination of false claims, but also allows for
static evaluation if the claims are hard to simulate in a test harness.
Validation scenarios may be costly to evaluate so the approach we advocate is
designed to carefully select only those claims that have a significant probability of
being false and those with a low probability of being false but whose falsification
would be serious. To do this we advocate classifying claims as unbreakable, qualified
or uncertain, and then propagating claim values through a claim refinement model.
As future work, we are developing the means to monitor claims at run-time, using
the techniques of requirements monitoring [5]. Data collected about claim soundness
may be used for subsequent off-line corrective maintenance. However, if the goal
models can be treated as run-time entities where they can be consulted by the running
system, the DAS may adapt by dynamically selecting alternative solutions when a
claim is falsified. Such a system introduces new adaptive capabilities but also further
References
1. Cheng, B., de Lemos, R., Giese, H., Inverardi, P., Magee, J.: Software engineering for self
adaptive systems. In: Dagstuhl Seminar Proceedings (2009)
2. Kramer, J., Magee, J.: Self-managed systems: an architectural challenge. In: FOSE 2007:
2007 Future of Software Engineering, pp. 259–268. IEEE Computer Society, Los
Alamitos (2007)
3. McKinley, P., Sadjadi, S., Kasten, E., Cheng, B.: Composing adaptive software.
Computer 37(7), 56–64 (2004)
4. Grace, P., Coulson, G., Blair, G., Mathy, L., Duce, D., Cooper, C., Yeung, W., Cai, W.:
Gridkit: pluggable overlay networks for grid computing. In: Symposium on Distributed
Objects and Applications (DOA), Cyprus (2004)
5. Fickas, S., Feather, M.: Requirements monitoring in dynamic environments. In: Second IEEE
International Symposium on Requirements Engineering (RE 1995), York, UK (1995)
6. Goldsby, H., Sawyer, P., Bencomo, N., Cheng, B., Hughes, D.: Goal-Based modelling of
Dynamically Adaptive System requirements. In: ECBS 2008: Proceedings of the 15th
IEEE International Conference on Engineering of Computer-Based Systems, Belfast, UK
(2008)
7. Welsh, K., Sawyer, P.: Requirements tracing to support change in Dynamically Adaptive
Systems. In: Glinz, M., Heymans, P. (eds.) REFSQ 2009. LNCS, vol. 5512, pp. 59–73.
Springer, Heidelberg (2009)
8. Cheng, H., Sawyer, P., Bencomo, N., Whittle, J.: A goal-based modelling approach to
develop requirements of an adaptive system with environmental uncertainty. In: MODELS
2009: Proceedings of IEEE 12th International Conference on Model Driven Engineering
Languages and Systems, Colorado, USA (2009)
9. Chung, L., Nixon, B.A., Yu, E., Mylopoulos, J.: Non-Functional Requirements in
Software Engineering. Springer International Series in Software Engineering 5 (1999)
10. Yu, E.: Towards modeling and reasoning support for early-phase requirements engineering.
In: RE 1997: Proceedings of the 3rd IEEE International Symposium on Requirements En-
gineering (RE 1997), Washington DC, USA (1997)
11. Department of Defence: DoD News Briefing - Secretary Rumsfeld and Gen. Myers,
http://www.defense.gov/transcripts/transcript.aspx?transcriptid=2636
12. Schneier, B.: Attack Trees - Modeling security threats. Dr. Dobb's Journal (1999)
13. Berry, D., Cheng, B., Zhang, J.: The four levels of requirements engineering for and in
dynamic adaptive systems. In: 11th International Workshop on Requirements Engineering
Foundation for Software Quality, REFSQ (2005)
14. Jackson, M.: Problem frames: analyzing and structuring software development problems.
Addison-Wesley Longman, Amsterdam (2000)
15. Hughes, D., Greenwood, P., Coulson, G., Blair, G., Pappenberger, F., Smith, P., Beven,
K.: Gridstix: Supporting Flood prediction using embedded hardware and next generation
grid middleware. In: 4th International Workshop on Mobile Distributed Computing (MDC
2006), Niagara Falls, USA (2006)
16. Cheng, B., et al.: Software engineering for self-adaptive systems: A research road map. In:
Cheng, B., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering
for Self-Adaptive Systems. Dagstuhl Seminar Proceedings, vol. 08031 (2008)
17. Robinson, W.: A requirements monitoring framework for enterprise systems. Require-
ments Engineering 11(1), 17–41 (2006)
18. Lapouchnian, A., Liaskos, S., Mylopoulos, J., Yu, Y.: Towards requirements-driven auto-
nomic systems design. In: DEAS 2005: Proceedings of the 2005 Workshop on Design and
Evolution of Autonomic Application Software (DEAS), St. Louis, MO, USA (2005)
19. van Lamsweerde, A.: Requirements Engineering: From System Goals to UML Models to
Software Specifications. John Wiley & Sons, Chichester (2009)
20. Whittle, J., Sawyer, P., Bencomo, N., Cheng, B., Bruel, J.-M.: RELAX: Incorporating
Uncertainty into the Specification of Self-Adaptive Systems. In: Proc. 17th IEEE
International Conference on Requirements Engineering (RE 2009), Atlanta, Georgia (Au-
gust 2009)
21. Moody, D., Heymans, P., Matulevicius, R.: Improving the Effectiveness of Visual
Representations in Requirements Engineering: An Evaluation of the i* Visual Notation. In:
Proc. 17th IEEE International Conference on Requirements Engineering (RE 2009),
Atlanta, Georgia (August 2009)
Use of Personal Values in Requirements Engineering
A Research Preview
1 Introduction
It is widely known from practice that the requirements engineering (RE) process is
heavily influenced by soft issues such as politics or personal values of stakeholders,
but there is very little guidance on how to deal with these issues [1], [2]. Goal models
such as i* [3] include goals, softgoals and actor dependencies, but give little
guidance on how to elicit these intentional elements, and don't call for a deeper analy-
sis of these elements. Therefore they often capture only quite apparent economic or
operational goals. Scenario-oriented approaches typically incorporate guidelines from
human-computer interaction to focus on user tasks or use cases and to include early
prototyping [4], but they do not capture information about user motivation.
We believe it is important to reveal the fundamental issues behind goals and task
performance and to incorporate them into current RE approaches. Therefore, we pro-
pose to study personal values and their relationship to software requirements. We
chose personal values because they are an important motivation factor, which remains
stable independent of context [5]. We expect that the effect of personal values on
requirements will be especially pronounced in the health care domain where effective
patient treatment is the focus of the work of physicians and nurses.
2 Personal Values
Motivation has always been an important research topic in psychology: what makes
an individual behave in a certain way? The widely accepted human value theory intro-
duces the concept of personal values as a major behaviour determinant for the indi-
vidual. Most contemporary publications on value theory build on the work of social
psychologist Shalom Schwartz [6], who validated his theory using extensive empirical
studies. Note that there are behaviour influences other than personal values, such as
economic values or emotions. In our research we focus on personal values, sub-
sequently called values.
Schwartz defines values as "desirable, trans-situational goals, varying in impor-
tance, that serve as guiding principles in people's lives" [7]. The connotation of
"goal" in this definition is slightly different from the one typically used in RE litera-
ture: personal values like social recognition and free choice are seldom modelled
among stakeholders' goals. While goal-oriented RE concentrates on stakeholder goals
limited to a single purpose, values function on a much higher level. They are deeply
ingrained in culture and the individuals acquire them during the socialization process.
Individuals generally behave in a way which helps them achieve these values. Excep-
tions from this rule arise in situations where other behavioural determinants are pre-
dominant, such as biological needs or ideological prescriptions. In early studies,
Schwartz discovered ten common values exhibited to differing extents by all partici-
pants. Table 1 lists these values and short explanations for each. Since then, extensive
research has shown that these ten values occur independently of race, nationality,
social and cultural background. This is an important conclusion of value theory: dif-
ferent populations don't strive for fundamentally different values, but there is a set of
values common to all of us.
Despite sharing the same values, different individuals act differently in similar
situations. The reason is that they place different importance on each value. In a situa-
tion where there is a conflict between values (e.g. donating money to a charity aids
benevolence and universalism, but spending the same money on one's hobby helps
achieve hedonism), an individual would choose the option consistent with the values
he or she deems more important. So while everyone believes in the desirability of the
same values, each individual has a personal ranking of the values.
Values are important in RE because of the way they shape the individual's interac-
tions with software. They are a criterion for evaluation: the desirability of a behav-
iour option increases monotonically with the degree to which performing it helps the
individual achieve the (high-ranked) values. The evaluation process may be deliberate
or subconscious. But regardless of the awareness of the actual process, the individual
is usually aware of its outcome. He or she can articulate it as a statement of the kind
"I like X" or "I don't like Y". Such judgments of behaviour options (and also judg-
ments of any other entities) are known in psychology as attitudes.
An attitude is defined as "a psychological tendency that is expressed by evaluating
a particular entity with some degree of favour or disfavour" [8]. Unlike the univer-
sally applicable values, an attitude always is about some target entity. It always im-
plies an evaluation, which may use one or more dimensions (like/dislike, good/bad),
but invariably results in an aggregated assessment, positive or negative.
Attitudes are formed through a judgment process. Since values are important crite-
ria for our judgment, attitudes are at least partly based on values. Thus, when informa-
tion on values isn't available, information on attitudes can be used as an indicator for
the ranking of values [9]. This has important implications for empirical research. As
attitudes are much more salient than values, their self-reporting proves easier than
self-reporting on values. Thus, researchers interested in values can apply instruments
based on attitudes, which are more reliable, and then use the results to draw conclu-
sions about the subjects values (see e.g. [10]). But the correlation can also be used in
the opposite direction: once the connection between value rankings and attitudes to-
ward specific targets is supported by empirical evidence, knowledge of an individ-
ual's value ranking can be used (given some conditions described in [11]) to predict
his or her attitude toward these targets.
The first step of our approach is the elicitation of values. Existing instruments for
value elicitation might appear too intrusive in requirements engineering practice,
because in this situation users might be reluctant to answer direct questions about
their personality. As stated in section 2, some attitudes strongly correlate with values.
As part of the first method, we plan to develop a new attitude-based questionnaire
which gives us information about a user's values. We want to explore how acceptance
of the original Schwartz questionnaire compares to that of our instrument.
When the values are known, the requirements analyst can use our second method to
predict the user's attitudes towards different tasks. The seemingly superfluous round-
trip from attitudes to values and from that to other attitudes is caused by the fact that
directly questioning a user about his or her attitudes towards the hundreds of software-
supported tasks involves too much effort. Moreover, the attitudes towards tasks are
situation dependent and likely to change as soon as the context changes. On the other
hand, values are an integral part of the user's personality and unlikely to change [13].
They allow us to predict how the user's attitudes will change after a context change.
For our second method, we plan to identify correlations between values and atti-
tudes towards tasks, at least for the medical domain. But even when a catalogue of
empirically founded statements about value-attitude correlations isn't available, a
requirements analyst with basic knowledge of value theory can use the information on
the values to reason about expected attitudes. For example, if the value elicitation
shows that the user is primarily motivated by the value stimulation, which is mediated
through novelty, then it is reasonable to assume that whenever faced with a task like
record patient temperature, the user would prefer to input the data using a trendy
electronic device instead of scribbling on paper.
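As a rough illustration of how such reasoning might be supported, the sketch below looks up an expected task attitude from a user's elicited value ranking. The catalogue, the function and the "security" entry are our own assumptions; only the stimulation example comes from the text.

```python
# Illustrative sketch (assumed, not an instrument from the paper): look up an
# expected task attitude from a user's elicited value ranking.
value_attitude_catalogue = {
    ("stimulation", "record patient temperature"):
        "prefers a novel electronic input device over paper",
    ("security", "record patient temperature"):
        "prefers the familiar paper form",        # invented counter-example
}

def expected_attitude(value_ranking, task):
    """value_ranking: Schwartz values ordered from most to least important."""
    dominant_value = value_ranking[0]
    return value_attitude_catalogue.get(
        (dominant_value, task),
        "no catalogued correlation; fall back to analyst judgment")

print(expected_attitude(["stimulation", "hedonism"], "record patient temperature"))
```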
In the third step, the analyst uses the information about attitudes and our third
method to enrich the existing requirements. Our approach does not include the identifi-
cation of tasks and goals; they have to be elicited using classical methods. But knowing
the user's attitude towards the tasks allows deeper insight into the requirements. It can
be especially useful for uncovering new requirements which weren't verbalised by the
user: if a physician has a negative attitude towards tasks involving typing, possibly
associated with a value of achievement or power, we can expect him or her to take
notes on patient symptoms on paper and delegate the data input to a nurse. This means
that he or she needs a system which gives nurses writing access to patient records. Of
course, such an inferred requirement cannot be simply included in the specification
without discussing it with the relevant stakeholders. But the merit of the value-based
approach in this example is that it has revealed the existence of an issue which could
be easily overlooked in a traditional, process-oriented task description.
4 Related Work
Judging by its name, value-based software engineering [14] seems related to our research.
However, so far it typically focuses on economic value, not on personal values. Only
recently have personal values been addressed in RE [2]. That publication also dis-
cusses motivation, but it uses a much broader definition of the term "value", namely
that any concept which influences human behaviour is a value. Furthermore, it considers
other soft issues such as emotions or motivations. Our understanding of value is
roughly equivalent to their notion of a "motivation". So while our research has a simi-
lar focus, that publication remains on a much more general level.
Psychology provides plenty of literature on personal values. We name some of the
main sources in this publication. Psychology also offers many studies on the link
between work patterns and personal behavioural determinants like values, beliefs etc.
Some of these studies focus on health care professionals, such as [12], [15]. They
provide valuable insights into the motivation of clinicians, but don't link it to their
software use or software requirements. Another line of studies concentrates on pro-
fessionals' attitudes towards computers in general [16], but we aren't aware of any
results which try to establish a link between attitudes and software requirements.
References
1. Robertson, S., Robertson, J.: Mastering the Requirements Process, 2nd edn. Addison-Wesley,
Harlow (2006)
2. Thew, S., Sutcliffe, A.: Investigating the Role of Soft Issues in the RE Process. In:
Proceedings of the 2008 16th IEEE International Requirements Engineering Conference.
IEEE Computer Society, Los Alamitos (2008)
3. Yu, E.S.K.: From E-R to A-R - Modeling strategic actor relationships for business
process reengineering. International Journal of Cooperative Information System 4,
125–144 (1995)
4. Lauesen, S.: Software Requirements: Styles and Techniques. Pearson Education, London
(2001)
5. Wetter, T., Paech, B.: What if business process is the wrong metaphor? Exploring the
potential of Value Based Requirements Engineering for clinical software. In: Accepted at
MedInfo 2010, CapeTown (2010)
6. Schwartz, S., Bilsky, W.: Toward a theory of the universal content and structure of values:
Extensions and cross-cultural replications. Journal of Personality and Social Psychol-
ogy 58, 878–891 (1990)
7. Schwartz, S., Melech, G., Lehmann, A., Burgess, S., Harris, M., Owens, V.: Extending the
Cross-Cultural Validity of the Theory of Basic Human Values with a Different Method of
Measurement. Journal of Cross-Cultural Psychology 32, 519–542 (2001)
8. Eagly, A., Chaiken, S.: The psychology of attitudes. Harcourt Brace Jovanovich College
Publishers Fort Worth, TX (1993)
9. Bohner, G., Schwarz, N.: Attitudes, persuasion, and behavior. In: Tesser, A. (ed.) Black-
well handbook of social psychology: Intrapersonal processes, vol. 3, pp. 413–435. Black-
well, Malden (2001)
10. Inglehart, R.: Modernization and Postmodernization: Cultural, Economic, and Political
Change in 43 Societies. Princeton University Press, Princeton (1997)
11. Ajzen, I., Fishbein, M.: The influence of attitudes on behavior. In: Albarracin, D., Johnson,
B., Zanna, M. (eds.) The handbook of attitudes, vol. 173, p. 221. Lawrence Erlbaum,
Mahwah (2005)
12. Larsson, J., Holmstrom, I., Rosenqvist, U.: Professional artist, good Samaritan, servant and
co-ordinator: four ways of understanding the anaesthetists work. Acta Anaesthesiol
Scand 47, 787–793 (2003)
13. Rokeach, M.: The nature of human values. Jossey-Bass, San Francisco (1973)
14. Biffl, S., Aurum, A., Boehm, B., Erdogmus, H., Grünbacher, P.: Value-based software
engineering. Springer, New York (2006)
15. Timmons, S., Tanner, J.: Operating theatre nurses: Emotional labour and the hostess role.
International Journal of Nursing Practice 11, 85–91 (2005)
16. van Braak, J.P., Goeman, K.: Differences between general computer attitudes and
perceived computer attributes: development and validation of a scale. Psychological
reports 92 (2003)
17. Venkatesh, V., Bala, H.: Technology acceptance model 3 and a research agenda on
interventions. Decision Sciences 39, 273 (2008)
Requirements and Systems Architecture Interaction in a
Prototypical Project: Emerging Results
1 Introduction
A recent laboratory study of ours [3] investigated the impact of an exist-
ing software architecture (SA) on Requirements Engineering (RE), where we identi-
fied four types of RE-SA interaction effects along with their quantitative profile: (i)
constraint (25%), if the existing SA makes a requirement solution approach less (or
in-) feasible; (ii) enabler (30%), if the existing SA makes a solution approach (more)
feasible because of the current architectural configuration; (iii) influence (6%), if the
architectural effect altered a requirements decision without affecting the feasibility of
its solution approaches; and (iv) no effect (39%), if the architecture has no known
effect on a requirements solution.
In this paper, we present emerging results of a replicated case study on a large-
scale prototypical rail project (RailCab) being conducted in Germany. While we
continue the main investigation of [3], here we also investigate the impact of the af-
fected requirements decisions on downstream development processes and the resul-
tant system. The case study, which is still ongoing, involves the investigation of the
history of requirements and architecting decisions in five major components of Rail-
Cab (e.g., drive and brake, suspension/tilt and active guidance). For the emerging
results reported in this paper, data for one of these components (Energy Management)
was collected from project documents and extensive interviews with the RailCab
developers and planners. The results of this study have implications for: project
Study Context: The RailCab project has been in development for approximately ten years at the University of Paderborn in Germany. The goal of the project is to develop a comprehensive concept for a future railway system. RailCab is considered a mechatronic system, i.e., it requires interdisciplinary expertise in the mechanical, electrical, and software engineering fields. The key feature of RailCab is that it is an autonomous, self-optimizing system and thus does not require any human operator. The RailCab consists of five major components: Drive and Brake, Energy Management, Active Guidance, Tilt and Suspension, and Motor and Track Design. For this short paper, we examine the Energy Management component only. The primary purpose of this component is to ensure that each of the RailCab subsystems' energy demands are fulfilled. Additionally, the component is responsible for recharging the energy sources as the train operates. Other features include heat and voltage monitoring of battery arrangements for safety purposes, using batteries as the main power supply for driving if track energy is not available, and adjusting energy levels at runtime based on differing priority levels of subsystems requesting energy. The component involves a mix of hardware and software elements; the software is executed on an on-board computer and controls the various functions of the component listed above1.
Participants, Data Collection, Data Analysis: In this study, eight senior-level developers and researchers (with over five years of experience and domain expertise) were interviewed extensively over a span of four months, on a bi-weekly basis, for approximately one hour per interview session. Each developer is primarily responsible for his/her own major component. Additionally, the developers provided project documents and validated emergent findings.
There are numerous qualitative data sources in this project: minutes, theses and reports, research papers, presentation slides, design documents, prototypes, and other project documents. The other primary data source is the interviews with the RailCab developers. Over ten hours of audio recordings resulted in over 80 transcribed pages of text. The interviews focused on: (i) domain understanding, (ii) extracting the requirements decisions and high-level architecture for the Energy Management component, and (iii) determining the RE-SA interaction.
Qualitative coding [6] was used to analyse the project documents and interview data. For example, if a segment of text describes a current requirement for the Active Guidance module (one of the components of the RailCab system) and how this requirement is affected by a previous architectural decision made for the Energy Management module, then it is tagged as such. The coded text can then be counted to create frequency figures for the various categories (i.e., requirements decisions), which form the basis of the study results.
3 Emerging Results
3.1 Architectural Impact on RE Decision-Making (Q1)
In the Energy Management component, a total of 30 requirements decisions were ex-
tracted from the project documents and interviews with the RailCab staff. A significant
1 Because of space constraints, we are not able to provide comprehensive information regarding the technical SA details of the RailCab. For more information, readers are referred to: http://www.sfb614.de/en/sfb614/subprojects/project-area-d/subproject-d2/
portion of these decisions was affected in some way by the evolving architecture.
Here, we describe the characteristics of these RE-SA interactions.
Referring to Table 1, overall, 13 out of the 30 decisions (43%) were affected by previous architectural decisions. Conversely, 17 decisions (57%) were not affected. Out of the 13 affected decisions, 8 were of the type constrained (27%) and 6 were of the type enabled (20%), an almost even split between these two types. These figures are similar to what we observed in our previous study [3], where 30% of the effect types were found to be of type enabled and 25% were of type constrained.
The effect type influenced dropped from 6% in our previous study [3] to no observations in the current component study. No new types of RE-SA interaction effects were observed in the component study. The overall affected:not_affected results (43%:57%) are similar to those in our earlier study in the banking domain (59%:41%). We now probe into the characteristics of the different types of affected decisions.
Constrained. The 8 out of 30 (27%) constrained decisions can be considered substantive. Of these, two decisions were core, or essential, for the operation of the RailCab: the availability of driving the shuttle vehicle even in the case of a track power failure, and the provision of a dedicated energy converter and supply for the hydraulic unit. Four of the eight constrained decisions can be classified as consequential decisions, i.e., decisions that emerged as a consequence of other decisions made in the same subsystem, or that were triggered by feedback from other implementation-based development activities. The remaining two decisions were triggered by design oversights made previously in other components that were not discovered until the implementation and testing phases of development.
Enabled. The 6 out of 30 (20%) enabled decisions can also be considered substantive. Of these, 4 decisions led to core requirements for the energy subsystem. One decision was consequential and emerged during the more detailed construction and implementation phases. Finally, one decision was a mixed enabled/constrained decision; it originated both as a core feature of the RailCab and as a fix to a previous design oversight in the motor and track topology design.
Neutral. It is also significant that 17 out of 30 (57%) decisions were not affected by
previous architectural decisions. Basically, these decisions were made during the
early phases of planning, which spanned approximately 2-3 years, and remained sta-
ble for the entire duration of the development process. Furthermore, these decisions
and their subsequent solutions were largely dictated by the system domain and did not
offer many alternative solution strategies.
In every case of a constrained decision, the result was increased construction (i.e., hardware assembly and software coding) time and effort. The development activity second most severely affected by constrained decisions was testing, which was affected in 7 out of the 8 cases; this involved the creation of new test cases as well as new testing procedures. The third most affected activity was systems architecting, where additional effort was spent in 5 out of the 8 cases due to a constrained decision. Other activities (e.g., requirements prioritization, costing, and elicitation) were also affected, but in only 1 or 2 cases.
For the enabled cases, the impact on other activities is difficult to discern since there are no counter-cases to compare against; the benefit could only really be observed at RE decision-making time, whereas later, in implementation, design, and testing, these decisions were not reported to differ from the not-affected decisions.
The impact of the constrained decisions on system properties (i.e., non-functional attributes), in the context of this single-component study, was less noticeable. In 5 of the 8 cases, the developers reported a slight degradation of system quality. The attributes most affected were the physical space of the shuttle (in 3 cases) and the software modifiability of the system (in 2 cases); the only other attribute affected (in 1 case) was energy efficiency. However, in all of these cases, the system properties did not deviate significantly from what was originally intended or desired. As in the previous subsection, it was difficult to discern any positive benefit in the enabled cases because they followed a similar implementation path to the not-affected cases.
These results seem to fit the characteristics of a prototypical project: for a constrained decision, developers could not simply upgrade or replace hardware components because of the cost involved; instead, they had to spend time and effort finding alternative solutions that still provided near-desired levels of system quality. In a production environment, the reaction to constrained decisions may very well be different, depending on budgetary and other factors.
4 Implications
There are a number of implications of the findings. We discuss one example:
Tighter SA-RE integration across different subsystems: With almost 50% of the RE decisions being affected by an architecture (see Table 1), and many (50%) of the affected decisions originating outside of the Energy Management component, it is strongly encouraged that the SA and RE processes be more tightly integrated [4] to provide insight into the technical feasibility of the elicited requirements in terms of constraints and enablers from non-local subsystems.
In RailCab, RE and SA decisions are predominantly made jointly within a single subsystem, and no distinction is made between SA and RE roles. For example, the early decisions for the motor and track topology subsystems led to constraints in the energy management system which the planners knew about but deferred until later. Requirements and design were highly intertwined in the motor and track subsystem, yet during this early planning phase the focus was almost entirely on the motor and track subsystem; high-level requirements were elicited for the energy subsystem, but no detailed RE or SA work was done at that time. After the motor and track design phases were near completion, the energy subsystem's detailed RE and SA phases commenced. However, it was then determined that previously known constraints would be more difficult to plan and implement because of tradeoffs introduced in design decisions from the motor and track subsystem. Thus, one lesson learnt is that, during the design phase, corresponding detailed RE and SA work should also have been carried out in the energy subsystem to handle alignment issues.
5 Related Work
From [3], we summarize below related work on the role of an SA in RE. In 1994,
Jackson [2] gave four key reasons why RE and SA are best treated as interweaving
processes. In 1995, El-Emam and Madhavji [1] found four factors for RE success in
information systems that deal with architecture and/or the system (one of which is
relevant for this study): the adequacy of diagnosis of the existing system (which in-
cludes SA). In 2001, Nuseibeh [4] described the twin-peaks model, which captures
the iterative relationship between RE and SA. An important aspect of this model is
that SA can, and should, feed back into the RE process.
6 Conclusion
We describe the impact an existing systems architecture has on requirements decisions, determined through a case study on a rail project (RailCab); this work is an extension of an initial exploratory study [3] that was conducted in a laboratory setting. The case study involved the analysis of approximately 10 years' worth of project documents and extensive interviews with RailCab staff, with a focus on one of the five major RailCab system components (Energy Management). In a nutshell, we found 30 requirements decisions where:
13 (43%) were affected by a previous architectural decision;
8 of these 13 decisions were constrained by the existing architecture;
6 of these decisions were enabled by the existing architecture.
Furthermore, for the identified affected requirements decisions, we qualitatively determined their impact on other development activities and on properties of the resultant system. Despite these being emergent findings, the early evidence suggests that an existing architecture does have a serious impact on requirements decisions in the Requirements Engineering process.
References
1. El Emam, K., Madhavji, N.H.: Measuring the Success of RE Processes. In: Proc. of the 2nd IEEE Int. Symp. on RE, York, England, March 1995, pp. 204–211 (1995)
2. Jackson, M.: The Role of Architecture in RE. In: Proc. of the 1st Int. Conf. on RE, p. 241 (1994)
3. Miller, J., Ferrari, R., Madhavji, N.H.: Architectural Effects on Requirements Decisions: An Exploratory Study. In: 7th Working IEEE/IFIP Conf. on SA, Vancouver, Canada, pp. 231–240 (2008)
4. Nuseibeh, B.: Weaving Together Requirements and Architectures. IEEE Comp. 34(3), 115 (2001)
5. Ramesh, B., Jarke, M.: Toward Reference Models for Requirements Traceability. IEEE Transactions on Software Engineering 27(1), 58–93 (2001)
6. Runeson, P., Höst, M.: Guidelines for conducting and reporting case study research in software engineering. Journal of Emp. Soft. Eng. 14(2), 131–164 (2009)
Videos vs. Use Cases: Can Videos Capture More
Requirements under Time Pressure?
are most valuable during elicitation and validation activities. During these activities it
is often hard to reach customers and stakeholders. In addition, customers and stake-
holders have limited time, get impatient, and even misunderstand abstract require-
ments. In order to validate requirements, a concrete representation of requirements
(e.g. a prototype) is needed. Videos as a means of documenting requirements promise to support stakeholder interaction in a more efficient way, because they are more concrete and easier for stakeholders to understand. Yet, in the literature it remains unclear whether there are any advantages over textual requirements documents. Empirical insights about the costs and benefits of videos are needed.
In this work we investigate whether videos can replace textual requirements representations. We give no guidance for creating good videos; this remains future work. Instead, we compare ad-hoc videos with use cases as a widely used textual representation of requirements. Firstly, we compare the efficiency of creating videos and textual requirements descriptions by subjecting the analysts to time pressure. Secondly, we investigate the effectiveness, i.e. whether customers can distinguish valid from invalid requirements when they see them represented as use cases or in videos. The ability to recognize requirements quickly in a communication medium is a prerequisite for using that medium successfully during requirements analysis. In contrast to the expectations of others, the results of our experiment indicate that our subjects were able to capture more requirements with videos than with use cases in the same time period.
This paper is organized as follows: In Sect. 2 we discuss possible situations where
videos can be used in RE and describe the context of our investigations more closely.
In Sect. 3 we give an overview of related work. In Sect. 4 we describe the design of
our experiment based on the Goal-Question-Metric paradigm. Our results are pre-
sented in Sect. 5 and their validity is discussed in Sect. 6. Sect. 7 gives our conclu-
sions and discusses questions for future research in video-based RE.
Videos in elicitation meetings: Analysts gain a better understanding of the system under construction while planning potential use scenarios for videos and by enacting them.
During the elicitation meeting, stakeholders can provide direct feedback on video
scenes. However, very little is known about the system at this stage. The videos that
can be created for elicitation meetings are based on rather abstract and visionary re-
quirements, often derived from marketing [4] (i.e. from the vision statement). Hence,
there is a high risk of creating irrelevant video scenes. Because of this risk, videos
should not be too expensive, but focus on showing typical scenarios and contexts of
usage for discussion with the stakeholders.
Videos for validation and negotiation: These videos are created based on the require-
ments from elicitation meetings. Requirements engineers have identified and inter-
viewed stakeholders and interpreted their requirements in order to add concrete user
goals to the project vision. During validation and negotiation, visualization of re-
quirements in videos makes it easier for stakeholders to recognize their own require-
ments, and identify missing details or misinterpretations. Confirming recognized
requirements and correcting or adding missing ones is equally important in this phase.
Videos in design and construction: Videos portray the assumed application of a planned product. They contain assumptions about environmental conditions, the type of users envisioned, and their anticipated interaction with the system. This information is
useful for system developers. There are approaches to enhance videos with UML
models, which allow using videos as requirements models directly [2, 3]. In this work,
we do not focus on videos in design and construction, but on videos in elicitation,
validation, and negotiation meetings (video opportunities I and II in Fig. 1).
3 Related Work
Karlsson et al. illustrate the situation in market-driven projects [4]. Accordingly, in such projects the developers invent the requirements. Since future customers of market-driven products may not be available for elicitation, their feedback during validation
will be especially important for project success. Simple techniques are needed that
allow stakeholders to participate.
In this section we describe related work dealing with scenarios, requirements visu-
alization, and videos as requirements representation techniques.
Scenario-based approaches. In order to support capturing, communicating and un-
derstanding requirements in projects, scenarios have been proposed by several re-
searchers [1, 6, 7]. They allow analyzing and documenting functional requirements
with a focus on the intended human-machine interaction.
Antón and Potts show how to create scenarios systematically from goals [1]. They show that once a concrete scenario is captured, other scenarios can easily be found.
There are several ways to document scenarios. One classic way is to create use cases that describe abstract scenarios [6]. The lack of a concrete scenario representation is often observed as a weakness, because it prevents stakeholders from understanding the specification. Therefore, Mannio and Nikula [7] describe the combination of prototyping with use cases and scenarios. They show the application of the method in a simple case study. In the first phase of their method, use cases are created, and in one of the later phases a prototype is constructed from multimedia objects like pictures, animations, and sounds. The prototype is used to elicit the stakeholders' requirements. The prototyping leads to a more focused session.
A similar approach to support the creation of scenarios was presented by Maiden et
al. [8, 9]: The ART-SCENE tool enables organizations to generate scenarios auto-
matically from use cases and permits guided scenario walkthroughs. Zachos et al.
propose to enrich such scenarios with multimedia content [10]. This facilitates recog-
nizing and discovering new requirements. The evaluation of this approach is promis-
ing, but additional evaluation is still needed to understand the value of videos in RE.
Visualization of Requirements. Truong et al. describe storyboards as a technique to
visualize requirements [22]. Storyboards are able to graphically depict narratives
together with context. Williams et al. argue that a visual representation of require-
ments is important [11]. This visual representation makes RE more playful and enjoy-
able, thus contributing to better stakeholder satisfaction and more effective software
development. They recommend using requirements in comic book style, because this
allows combining visualizations with text. Williams et al. give no empirical evalua-
tion if typical developers or stakeholders are able to create good-enough comic style
requirements specifications. In addition, drawing comics may be time-consuming.
Videos in RE. Apart from comics, videos have been proposed as a good technique for
adding visual representations to scenarios [2, 3, 12]. Broll et al. describe the use of
videos during the RE (analysis, negotiation, validation, documentation) of two pro-
jects [12]. Their approach starts by deriving concrete contexts from general goals.
Concrete scenarios are defined for each context of use. Based on these scenarios,
videos are created. In parallel, the scenarios are analyzed. Both the video material and
the analysis results are used to negotiate requirements in focus groups (small groups
of important stakeholders). Broll et al. do not provide quantitative data about the ef-
fectiveness of their approach, but share qualitative lessons learned. They conclude
that amateur videos created with household equipment are sufficient for RE purposes. Based on their experience, they expect video production to be expensive due to the time-consuming recording of video and audio material. Therefore, they recommend considering videos in RE as an option, but keeping the cost minimal. We agree that videos of sufficient quality can be created by a development team.
Brügge, Creighton et al. present a sophisticated high-end technique for video analysis of scenarios [2, 3]. Their approach starts with the creation of scenarios. Based on
these scenarios, videos are created and refined to professional scenario movies. The
Software Cinema tool allows enriching videos with UML models. This combination
is a formal representation of requirements, useful for subsequent phases of the soft-
ware development process (i.e. design). They found it feasible to negotiate require-
ments based on videos. However, they did not discuss whether videos are superior to
textual representations of scenarios or not.
Compared to related work, this paper contributes by presenting empirical results from the comparison of videos and text-based scenarios. In contrast to the expectations of others, our results suggest that videos can be produced faster and with less effort than use cases during requirements analysis.
4 Experiment Design
Experiments in software engineering are difficult to design. There is a tension: Real
situations are difficult and expensive to reproduce and compare. Very simple effects
may be easier to observe and compare, but have little significance for practical appli-
cations. Zelkowitz et al. present several levels of empirical validation [13]. Anecdotal
evidence is easy to capture, but insufficient to derive conclusive results. Rigid ex-
periments might enable us to apply statistical methods, but controlling all threats to
validity is hardly possible in any non-trivial set-up. In order to improve RE, a careful
balance is needed.
Our experiment is designed to cover a few relevant aspects of working with videos, while simplifying all others. Thus, it combines non-trivial real effects with the attempt to
control threats to validity. The Goal-Question-Metric paradigm (GQM, see [14])
provides guidance for metrication in process improvement. It proposes a goal-oriented
approach to selecting and defining metrics. Due to the real-world constraints (effort,
limited comparability / repeatability), GQM is often applied to study phenomena in a
rigid and disciplined way, without claiming statistical significance or generalizability.
We had eight student volunteers, all of whom had some experience in writing and reading use cases, but no or very limited experience using videos. None had applied videos to
requirements before. Students had a computer science background and were in their
second to fourth year. Two researchers acted as customers. Each of them adopted two
tasks ("project requirements") and explained them to some of the students (see be-
low). The first task was about navigating within the university building in order to
find a person or office (person finder). The second task was about an airport check-in
with the ability to assign waiting tickets to all boarding pass holders - and priority
checking of late passengers who might otherwise miss their planes (adaptive check-
in). Both customers were encouraged to invent more details and requirements about
these visions. None of the subjects had been involved in this research before. The customers did not know what the experiment was about before the evaluation started.
We use a short time slice for requirements elicitation. Use cases vs. ad-hoc videos
are used to document elicited requirements, and to present them to the customers for
validation. Counting recognized requirements is made possible by using lists of explicit requirements as a reference. In a pre-study, we examined the feasibility of that concept
[15]. Based on lessons learned in that pre-study, we made adjustments and refine-
ments for the current experiment design. This new design makes best use of our avail-
able subjects in the above-mentioned context. We are aware of the threats due to stu-
dent subjects and the limited number of repetitions, but consider those limitations
acceptable [16] (see discussion in Sect. 6). We consider our experiment scenarios
appropriate to represent the kind of real-world situations we want to study. They are
relevant for evaluating the potential of videos in RE.
GQM starts by looking at goals for improvement and at measurement goals. We stated the goals of our investigation and used a number of cognitive tools and techniques to
refine them into questions, hypotheses, and finally metrics that were designed into the
experiment. At the same time, we took systematic precautions to limit and reduce
threats to validity. Other researchers are invited to use our considerations and design
rationale as a starting point to replicate or extend our experiment. A replication in
industry would be expensive, but particularly welcome. Our questionnaire and ex-
periment kit are available for replications of our study.
Goal of investigation:
Investigate effectiveness and efficiency of creating ad-hoc videos under time pres-
sure for validation of early requirements compared to use cases
Goal 1: Analyze effectiveness and efficiency of use cases
Goal 2: Compare effectiveness and efficiency of videos with respect to use cases
Goal 3: Analyze subjective preferences of videos with respect to use cases
For each goal, a number of characterizing facets were specified. According to GQM
[14] and our own experience in applying it in industry [17], this explicit facet classifi-
cation helps to focus measurement and to avoid ambiguities.
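For illustration, a goal with its facets can be written down in the usual GQM template form; the following sketch is our own rendering of Goal 2, with facet wording inferred from the surrounding text rather than taken from the paper's Table 1:

# Our own rendering of Goal 2 in GQM facet form; the concrete wording is
# inferred from the text, not copied from the paper's Table 1.
goal_2 = {
    "analyze": "ad-hoc videos",
    "for_the_purpose_of": "comparison with use cases",
    "with_respect_to": "effectiveness and efficiency",
    "from_the_viewpoint_of": "the customer",
    "in_the_context_of": "a student experiment under time pressure",
}
print("Analyze {analyze} for the purpose of {for_the_purpose_of} "
      "with respect to {with_respect_to} from the viewpoint of "
      "{from_the_viewpoint_of} in the context of {in_the_context_of}.".format(**goal_2))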
In the pre-study, we had analyzed both customer and developer perspectives and
what they recognized. They represent requirements analysis and design&create tasks.
In this paper, only the customer is defined to be the reference for recognizing re-
quirements. Requirements that are not perceived by the customer cannot be confirmed
or corrected during validation. Therefore, the customer perspective is adopted for
comparing effectiveness and efficiency for goals 1 and 2 in Table 1. When personal preferences are solicited for goal 3, however, we focus on the requirements engineers' perspective: videos only deserve further investigation if requirements engineers accept them and consider them useful. Similar considerations are stimulated by the other facets. For example, the purpose of comparing things (goal 2) requires a reference for that comparison; we planned to use the use cases analyzed in goal 1 for that purpose.
In order to reach the goals, a number of questions need to be answered; this is how our research questions relate to the above-mentioned goals of our investigation. According to GQM, goals are further refined into questions and hypotheses. Abstraction sheets [18] may be used to guide the refinement and preparation process. They force researchers to make decisions on the details of their research questions.
We stated our expectations as hypotheses before seeing actual results. Since most measurements in real-world settings do not provide statistically significant results, it is even more important for the interpretation of results to define what we mean by "similar", "more", and "remarkably more", respectively.
Goal 3, asking for the subjective preferences of our subjects, was evaluated using a
questionnaire. Basically we asked whether our subjects preferred videos or use cases
for documentation under time pressure.
Based on explicit questions and hypotheses, metrics can be selected or defined. GQM
is often applied to measuring symptoms of process improvement in real-world envi-
ronments [17]. In those cases, metrics should be integrated into existing practices; this reduces the measurement effort and mitigates the risk of distorting the measured object through the measurement.
Our experiment is designed to reflect non-trivial real-world aspects of validation
under time pressure, and to accommodate measurement at the same time. In order to
fully exploit the available resources (participants, time, effort), experiment tasks were
carefully designed. Measurement is supported by forms and a questionnaire for goal
3: subjective preference. When GQM is applied consistently, metrics results directly
feed back into evaluation and interpretation. The overall setup is shown in Table 2.
Table 2. Setup of experiment using eight subjects (a-h) and two customers (A, W)

                        Tasks / projects
                person finder               adaptive check-in
Customer A   config. 1: a,b use cases       config. 2: e,f videos
                        c,d videos                      g,h use cases
Customer W   config. 3: e,f use cases       config. 4: a,b videos
                        g,h videos                      c,d use cases
Two researchers (A, W) acted as customers. Each phase contained two configura-
tions, one for each customer. A configuration is characterized by the same task, the
same customer, and two pairs of subjects working independently, one applying use
cases, the other applying videos. In the second phase, a different task was presented
by a different customer and the pairs of subjects exchanged techniques. Each configu-
ration followed the same schedule:
10 min. Customer explains task to all subjects of that configuration together
(e.g., a, b, c, d). They may ask clarification questions.
30 min Pairs of subjects conceive and produce use cases vs. videos.
In parallel, customers write down a list of those requirements that
were explicitly mentioned during the 10 minute slot of explanation
and questions.
10 min. Pairs clean up their results. They rewrite text and use cases, down-
load video clips from the cameras, rename files etc.
Afterwards Each customer evaluates use cases and videos with respect to the
reference list of explicit requirements (see above). They check and
count all recognized requirements, and count false or additional re-
quirements raised. They use a form to report those counts.
Fig. 2 shows three excerpts from different videos produced during the experiment. All
teams used hand-drawn paper mockups, combined them with available hardware like
existing info terminals or furniture in the university building, or mobile phones. By
interacting with new, pretended functionality, subjects illustrated the envisioned sys-
tem. Most groups enacted scenarios like a passenger in a hurry who benefits from an
advanced ticket system (Fig. 2, center).
In the pre-study, we wanted to explore the feasibility of using video for fast require-
ment documentation. With respect to Table 2, the pre-study resembled one configura-
tion, but with four people interpreting the results of that one configuration.
We wanted to repeat the experiment in order to substantiate our observations. We were able to add four configurations. Given our research questions, we needed to investigate the validation capabilities of videos vs. use cases in more detail. It was not sufficient to recognize intended requirements in the use cases or videos; we also wanted to classify the represented requirements based on Kano's categories [19].
Design inspired by factorial variation can be used in software engineering in order
to exploit the scarce resource of appropriate subjects better, and to avoid threats asso-
ciated with dedicated control groups: All eight subjects carried out two tasks in two
subsequent phases. We varied tasks, customers, and media in a systematic way in
order to improve control of potential influence factors. This design reduces undesired
learning effects and biases for a particular technique. Variation of techniques,
customers, and tasks was used to minimize threats to validity (see Sect. 6).
Pairing subjects has a number of advantages: (a) The ability to communicate in a
pair improves the chance to derive ideas and reflect on them. (b) Two people can do
more work than individuals: write more use cases and make videos of each other,
which amplifies the visible impact of both techniques. (c) Different personalities
and their influence should be averaged out in pairs as opposed to individuals. We
considered those aspects more relevant than a higher number of configurations con-
taining only individuals instead of pairs.
5 Results
The counts of requirements recognized, and of additional requirements identified on
basic/excitement level are indicated in Table 3. The customer in a configuration pro-
vided the reference for requirements explicitly stated during the first 10 minute slot
of explanations and questions. All recognized and represented requirements were
marked in that list by their respective customer based on an audio recording of the
explanation session.
Table 3 presents all use case pairs in the top part, and their corresponding video
pairs in the lower part. The right-hand columns provide the average of additionally
raised requirements and the average percentage of recognized requirements. As additional requirements we count new basic or excitement requirements that were raised and were confirmed or corrected by the customer. Both types (corrected, confirmed) are
requirements that otherwise would have been forgotten. The sum of requirements
confirmed and corrected is given below the category (basic, excitement).
Table 3. Results of experiment: counts and percentages (average over all configs.)

                                                Config 1  Config 2  Config 3  Config 4  Avg. absolute  Avg. % of total
customer  total reference on explicit req. list    15        17        31        16
use case  recognized performance reqs.             10         7        20         9                         57%
          basic reqs. confirmed                     1         1         1         0         0.8
          basic reqs. corrected                     0         1         3         1         1.3
          confirmed+corrected basic reqs.           1         2         4         1         2.0
          excitem. confirmed                        0         2         2         1         1.3
          potential exc. corrected                  1         1         2         1         1.3
          confirmed+corrected excitement reqs.      1         3         4         2         2.5
video     recognized performance reqs.             14        12        20        11                         74%
          basic reqs. confirmed                     2         5         0         1         2.0
          basic reqs. corrected                     0         1         1         1         0.8
          confirmed+corrected basic reqs.           2         6         1         2         2.8
          excitem. confirmed                        0         3         1         0         1.0
          potential exc. corrected                  0         0         3         0         0.8
          confirmed+corrected excitement reqs.      0         3         4         0         1.8
(The video results correspond to the use case results in the same column.)
Subjective preferences (goal 3) were captured directly with a questionnaire; it would have been much more difficult, artificial, and error-prone to distil satisfaction and preference data from "objective observations".
We asked for potential preferences between use cases and videos:
Subjects considered videos more appropriate for getting an overview. Videos appeared less ambiguous and better suited to illustrate functions. Use cases were preferred for documenting exceptional behaviour and alternative steps. Pre- and postconditions were mentioned as advantages of use cases, together with a finer level of detail.
Under time pressure, 7 of the 8 subjects would prefer videos for documenting requirements, for various reasons: better description of requirements (6), better coverage of usability aspects (6), more functional requirements per time (3), or generally more requirements per time (2).
Without time pressure, still 5 of the 8 subjects would prefer videos for documenting requirements; only 2 would prefer use cases.
When GQM is used with explicit expectations, results can be easily compared to the
above-mentioned hypotheses. The most promising expectations and the respective actual results are briefly commented on below:
"Customers will recognize a similar amount of performance requirements in both
techniques [estimate: +/-10% (#req(use case) - #req(use case)/10 < #req(video) <
#req(use case) + #req(use case)/10]."
o Customers recognized 57 % of their requirements in use cases and 74% in
videos.
o Although this difference is far higher than the 10% expected, our small
number of configurations (4) limits statistical power and significance.
o Since there was no configuration in which use cases performed better than
videos, the four configurations support the feasibility of video-based re-
quirements validation.
Customers will be able to identify more errors and problems concerning basic
requirements in videos than in use cases [estimate: >50% more]
o Use cases led to an average of 2.0 additional basic requirements being
confirmed or corrected. In comparison, videos raised 2.8.
o Therefore, videos led to 40 % more additional basic requirements than
use cases. Our expectation of more than 50% is not fulfilled.
o Nevertheless, the experiment has confirmed the tendency that videos raise more basic requirements than use cases.
For early requirements on an innovative product, customers will be stimulated to
identify more excitement requirements (correct or false) in videos than in use
cases when both are built under comparable time pressure [estimate: 1 or 2 ex-
citement requirements with use cases, at least 3 with videos].
o Use cases stimulated an average of 2.5 excitement requirements; videos performed slightly worse, at an average of 1.8.
o Our expectation is not supported by the observations and counts. Use cases scored higher in two of the configurations; in the other two configurations, videos and use cases raised the same number.
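As a cross-check, the reported averages (57% vs. 74% recognized requirements, 2.0 vs. 2.8 additional basic requirements) can be recomputed from the per-configuration counts in Table 3; the following Python sketch is our own illustration of that arithmetic:

# Per-configuration counts copied from Table 3 (configs 1-4).
total_reqs = [15, 17, 31, 16]
uc_recognized = [10, 7, 20, 9]
vid_recognized = [14, 12, 20, 11]
uc_basic = [1, 2, 4, 1]     # confirmed+corrected basic reqs., use case pairs
vid_basic = [2, 6, 1, 2]    # confirmed+corrected basic reqs., video pairs

def avg(values):
    values = list(values)
    return sum(values) / len(values)

uc_pct = avg(r / t for r, t in zip(uc_recognized, total_reqs))    # ~0.57
vid_pct = avg(r / t for r, t in zip(vid_recognized, total_reqs))  # ~0.74
uc_basic_avg = avg(uc_basic)    # 2.0
vid_basic_avg = avg(vid_basic)  # 2.75, reported as 2.8 after rounding
increase = vid_basic_avg / uc_basic_avg - 1  # 0.375; the reported 40% uses the rounded 2.8

print(f"recognized: use cases {uc_pct:.0%}, videos {vid_pct:.0%}")
print(f"additional basic reqs.: use cases {uc_basic_avg:.1f}, videos {vid_basic_avg:.1f} (+{increase:.0%})")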
6 Discussion of Validity
Wohlin emphasizes the necessity to consider threats to validity during experiment
planning [20]. In Sect. 4, the design of our experiment referred to several threats to validity and provided the rationale for how we tried to avoid them. Nevertheless, several
threats remain and should be considered when our results are used.
Our "treatment" is the application of either use cases or videos to the representation
of requirements. We discuss four types of validity in the order of descending priority
that Wohlin considers appropriate for applied research:
Internal validity (do we really measure a causal relationship between videos and
requirements in validation?): We paired subjects randomly under the constraint that
each pair included one student of computer science and one of mathematics. We took
care to build equally strong pairs. The cross-design shown in Table 2 was inspired by
Basili et al. [21] in order to compensate for previous knowledge. Then each pair used
the other technique to compensate for learning during the experiment. Volunteers are
said to be a threat to validity since they tend to be more motivated and curious than
regular employees. This may be true, but our target population may be close enough
to our subjects to reduce that threat (see external validity).
There is a threat to validity we consider more severe: When customers evaluate re-
sults (use cases and videos) for "recognized requirements" and additional findings,
their judgment should be as objective and repeatable as possible. We took great care
to handle this threat: A customer audio-recorded the 10 minute explanation session
and derived the list of 15-31 requirements that were explicitly raised during the
explanation or questions. When customers evaluated results, they used this list as
a checklist, which makes the evaluation process more repeatable. Obviously, the
granularity of what was considered "one" requirement is very difficult to define and might cause fierce discussions among any two requirements experts. Our attempt to cope
with this threat is using "configurations" in which two pairs (one use case, one video)
operate under the same conditions, no matter what those conditions might be in detail:
same customer, same granularity, same task, participated in same 10-minute session
with same questions asked. By using four configurations, we try to compensate for
random influences in a given situation.
External validity (are the findings valid for the intended target population?): Stu-
dents and volunteers are usually regarded as poor representatives of industry employees [16]. However, our work tries to support the upcoming generation of requirements
engineers who are familiar with video-equipped mobile phones and multimedia hand-
helds. As explained in [15], two new developments encouraged us to explore ad-hoc
video for requirements validation in the first place: (1) the advent of inexpensive, ubiq-
uitous recording devices and (2) a generation of requirements engineers that have
grown up using mobile phones and PDAs voluntarily in their private life. Today's (high
school and) university students might represent that generation even better than current
industry employees who learned to practice RE with DFDs, Doors etc. All of our sub-
jects had completed at least one lecture that included a substantial portion (8h lecture
time) of "traditional" RE. We consider our students valid representatives of upcoming
requirements engineers in practice - which they may actually become in a year or two.
Construct validity (did we model the real phenomena in an appropriate way?): A
major threat to construct validity is a poor understanding of the questions and con-
cepts examined. This can lead to misguided experiment design. By following GQM
carefully, we were forced to specify our goals, questions, and derived metrics in de-
tail. For example, specifying the purpose and perspective as goal facets usually helps clarify aspects that are otherwise neglected as "trivial". The GQM technique
guided us from the goal of investigation down to the form used by customers to mark
"recognized explicit requirements", and additional findings in the "explicit reqs. list".
Conclusion validity (What is the statistical power of our findings?): Conclusion
validity is considered lowest priority for applied research by Wohlin et al. [20] - large
numbers of subjects and statistical significance are very difficult to get in a real or
realistic setup. GQM is a technique optimized for exploring effects in practice, not so
much for proving them statistically [17]. Nevertheless, even in our small sample of eight mini-projects (4 tasks x 2 pairs), some differences are large enough to reach statistical significance: e.g., the number of recognized explicit requirements is higher with videos than with use cases (statistically significant at alpha=10% in a two-sided paired t-test). Although the statistical power is not very high (p=0.86), an effect that even reaches statistical significance is the exception rather than the rule in GQM.
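For readers who wish to replicate the significance check, the following sketch applies a two-sided paired t-test to the per-configuration recognition counts from Table 3 (our own illustration; the authors' exact calculation may have used a different pairing or the percentages instead of the counts):

from scipy import stats

# Explicit requirements recognized per configuration (from Table 3).
use_cases = [10, 7, 20, 9]
videos = [14, 12, 20, 11]

# Two-sided paired t-test over the four configurations.
result = stats.ttest_rel(videos, use_cases)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")  # p is below 0.10 for these counts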
References
1. Antón, A.I., Potts, C.: The use of goals to surface requirements for evolving systems. In: ICSE 1998: Proceedings of the 20th International Conference on Software Engineering, Leipzig, Germany, pp. 157–166. IEEE Computer Society, Los Alamitos (1998)
2. Creighton, O., Ott, M., Bruegge, B.: Software Cinema - Video-based Requirements Engineering. In: RE 2006: Proceedings of the 14th IEEE International Requirements Engineering Conference, Minneapolis, Minnesota, USA, pp. 106–115. IEEE Computer Society, Los Alamitos (2006)
3. Bruegge, B., Creighton, O., Reiß, M., Stangl, H.: Applying a Video-based Requirements Engineering Technique to an Airport Scenario. In: MERE 2008: Proceedings of the 2008 Third International Workshop on Multimedia and Enjoyable Requirements Engineering, Barcelona, Catalunya, Spain, pp. 9–11. IEEE Computer Society, Los Alamitos (2008)
4. Karlsson, L., Dahlstedt, Å.G., Natt och Dag, J., Regnell, B., Persson, A.: Challenges in Market-Driven Requirements Engineering - an Industrial Interview Study. In: Proceedings of the Eighth International Workshop on Requirements Engineering: Foundation for Software Quality, Essen, Germany (2002)
5. Fischer, G.: Symmetry of Ignorance, Social Creativity, and Meta-Design. In: Creativity and Cognition 3 - Intersections and Collaborations: Art, Music, Technology and Science, pp. 116–123 (1999)
6. Cockburn, A.: Writing Effective Use Cases. Addison-Wesley Professional, Reading (January 2000)
7. Mannio, M., Nikula, U.: Requirements Elicitation Using a Combination of Prototypes and Scenarios. In: WER, pp. 283–296 (2001)
8. Zachos, K., Maiden, N., Tosar, A.: Rich-Media Scenarios for Discovering Requirements. IEEE Software 22, 89–97 (2005)
9. Seyff, N., Maiden, N., Karlsen, K., Lockerbie, J., Grünbacher, P., Graf, F., Ncube, C.: Exploring how to use scenarios to discover requirements. Requir. Eng. 14(2), 91–111 (2009)
10. Zachos, K., Maiden, N.: ART-SCENE: Enhancing Scenario Walkthroughs With Multi-Media Scenarios. In: Proceedings of the Requirements Engineering Conference (2004)
11. Williams, A.M., Alspaugh, T.A.: Articulating Software Requirements Comic Book Style. In: MERE 2008: Proceedings of the 2008 Third International Workshop on Multimedia and Enjoyable Requirements Engineering, Barcelona, Catalunya, Spain, pp. 4–8. IEEE Computer Society, Los Alamitos (2008)
12. Broll, G., Hußmann, H., Rukzio, E., Wimmer, R.: Using Video Clips to Support Requirements Elicitation in Focus Groups - An Experience Report. In: 2nd International Workshop on Multimedia Requirements Engineering (MeRE 2007), Conference on Software Engineering (SE 2007), Hamburg, Germany (2007)
13. Zelkowitz, M.V., Wallace, D.R.: Experimental validation in software engineering. Information & Software Technology 39(11), 735–743 (1997)
14. Basili, V.R., Caldiera, G., Rombach, H.D.: The Goal Question Metric Approach. In: Encyclopedia of Software Engineering, pp. 646–661. Wiley, Chichester (1994)
15. Schneider, K.: Anforderungsklärung mit Videoclips. In: Proceedings of Software Engineering 2010, Paderborn, Germany (2010)
16. Carver, J., Jaccheri, L., Morasca, S., Shull, F.: Issues in Using Students in Empirical Studies in Software Engineering Education. In: METRICS 2003: Proceedings of the 9th International Symposium on Software Metrics, Sydney, Australia. IEEE Computer Society, Los Alamitos (2003)
17. Schneider, K., Gantner, T.: Zwei Anwendungen von GQM: ähnlich, aber doch nicht gleich. Metrikon (2003)
18. van Solingen, R., Berghout, E.: The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development. McGraw-Hill Publishing Company, New York (1999)
19. Kano, N.: Attractive Quality and Must-be Quality. Journal of the Japanese Society for Quality Control, 39–48 (1984)
20. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C.: Experimentation in Software Engineering: An Introduction, 1st edn. Springer, Heidelberg (1999)
21. Basili, V.R., Green, S., Laitenberger, O., Lanubile, F., Shull, F., Sørumgård, S., Zelkowitz, M.V.: The Empirical Investigation of Perspective-Based Reading. Int. Journal of Empirical Software Engineering 1(2), 133–164 (1996)
22. Truong, K.N., Hayes, G.R., Abowd, G.D.: Storyboarding: An Empirical Determination of Best Practices and Effective Guidelines. In: DIS 2006: Proceedings of the 6th Conference on Designing Interactive Systems, Pennsylvania, USA, pp. 12–21. ACM, New York (2006)
Supporting the Consistent Specification of Scenarios
across Multiple Abstraction Levels
1 Introduction
Scenario-based requirements engineering (RE) is a well-proven approach for the elicitation, documentation, and validation of requirements. In the development of complex software-intensive systems in the embedded systems domain, scenarios have to be defined at different levels of abstraction (see e.g. [1]). We call scenarios that specify the required interactions of a system with its external actors "system level scenarios" and scenarios that additionally define required interactions between the system components "component level scenarios". For brevity, we use the terms "system scenarios" and "component scenarios" in this paper.
Requirements for embedded systems in safety-oriented domains, such as avionics, automotive, or the medical domain, must satisfy strict quality criteria. Therefore, when
developing system and component scenarios for such systems, a rigorous development
approach is needed. Such an approach must support the specification of scenarios at
the system and component level and ensure the consistency between system and com-
ponent scenarios. Existing scenario-based approaches, however, do not provide this
kind of support.
In this paper, we outline our approach for developing scenarios for software-intensive systems at multiple abstraction levels. This approach includes methodical support for defining scenarios at the system level, for defining component scenarios based on the system scenarios and an initial system architecture, as well as for detecting inconsistencies between system scenarios and component scenarios. Our consistency
check reveals, for instance, whether the component scenarios are complete and neces-
sary with respect to the system scenarios. To automate the consistency check, we
specify system and component scenarios using a subset of the message sequence
charts (MSC) language [2]. The consistency check is based on a transformation of the
specified MSCs to interface automata [3] and the computation of differences between
the automata or, respectively, the regular languages associated with the automata. We
have implemented the consistency check in a prototypical tool and evaluated our
approach by applying it to a (simplified) adaptive cruise control (ACC) system.
The paper is structured as follows: In the remainder of this section, we provide a
detailed motivation for our approach. Section 2 outlines the foundations of specifying
use cases and scenarios using message sequence charts. Section 3 briefly describes
our approach for specifying system and component scenarios. Section 4 provides an
overview of our technique for detecting inconsistencies between system and compo-
nent scenarios. Section 5 summarises the results of the evaluation of our approach.
Section 6 presents related work. Section 7 concludes the paper and provides a brief
outlook on future work.
Abstraction levels are used to separate different concerns in systems engineering such
as the concerns of a system engineer and the concerns of component developers. We
illustrate the specification of requirements at multiple levels of abstraction by means
of an interior light system of a vehicle. At the system level, the requirements for the
interior light system are defined from an external viewpoint. At this level, the system
is regarded as a black box, i.e. only the externally visible system properties are con-
sidered without defining or assuming a specific internal structure of the system. For
instance, the following requirement is defined at the system level:
R2 (Interior light system): The driver shall be able to switch on the ambient light.
The level of detail of requirement R2 is typically sufficient for communicating about
the requirements with system users. However, for developing and testing the system,
detailed technical requirements are needed. To define these detailed technical
requirements, the system is decomposed into a set of architectural components and
the requirements for the individual components and the interactions between the com-
ponents are defined based on the system requirements. For instance, the following
requirements are defined based on requirement R2 (after an initial, coarse-grained
system architecture has been defined for the interior light system):
R2.1 (Door control unit): If the driver operates the Ambient button, the door
control unit shall send the message LIGHT_AMB_ON to the roof control unit.
R2.2 (Roof control unit): If the roof control unit receives the message
LIGHT_AMB_ON, it shall set the output DIG_IO_AMB to high.
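To make the intended decomposition concrete, the following minimal Python sketch (our own illustration, not part of the specification) shows how R2.1 and R2.2 together realise R2:

# Illustrative sketch of R2.1 and R2.2 (our own code, not part of the specification).

class RoofControlUnit:
    def __init__(self):
        self.outputs = {"DIG_IO_AMB": "low"}

    def receive(self, message):
        # R2.2: on LIGHT_AMB_ON, set the output DIG_IO_AMB to high.
        if message == "LIGHT_AMB_ON":
            self.outputs["DIG_IO_AMB"] = "high"

class DoorControlUnit:
    def __init__(self, roof_control_unit):
        self.roof_control_unit = roof_control_unit

    def operate_ambient_button(self):
        # R2.1: when the driver operates the Ambient button, send LIGHT_AMB_ON.
        self.roof_control_unit.receive("LIGHT_AMB_ON")

roof = RoofControlUnit()
door = DoorControlUnit(roof)
door.operate_ambient_button()                # the driver presses the Ambient button
assert roof.outputs["DIG_IO_AMB"] == "high"  # the system-level requirement R2 is met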
goals are typically grouped into use cases (see e.g. [6]). Scenarios can be documented
using various formats such as natural language, structured templates, or diagrams.
To facilitate automated verification, we use message sequence charts (see [2]) for specifying and grouping scenarios, both at the system and at the component level. We
decided to use message sequence charts since they are commonly known in practice
and offer a standardised exchange format. The specification of scenarios using mes-
sage sequence charts is outlined in this section. The formalisation of message se-
quence charts employed in our approach is outlined in Section 4.
The message sequence charts language defines basic message sequence charts
(BMSCs) and high-level message sequence charts (HMSCs). The essential elements
of a BMSC are instances and messages (see Fig. 1). The essential elements of a
HMSC are nodes and flow lines. A node can refer to a BMSC or another HMSC. A
flow line defines the sequential concatenation of two nodes. Therein, the sequential
order of the nodes may contain iterations and alternatives. Formal definitions of the
(abstract) syntax of BMSCs and HMSCs are given, for instance, in [7]. The graphical
notation of BMSCs and HMSCs used in this paper is shown in Fig. 1.
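As a rough illustration of these elements, the following Python sketch models BMSCs and HMSCs as simple data types (our own simplification; it is not the formalisation of [7]):

from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class Message:
    sender: str     # sending instance
    receiver: str   # receiving instance
    label: str      # message type, e.g. "LIGHT_AMB_ON"

@dataclass
class BMSC:
    name: str
    instances: List[str]      # the instances of the chart
    messages: List[Message]   # messages in their sequential order

@dataclass
class HMSC:
    name: str
    nodes: List[Union[BMSC, "HMSC"]]  # each node refers to a BMSC or another HMSC
    # Flow lines as (from_node, to_node) index pairs; cycles model iterations,
    # branching models alternatives.
    flow: List[Tuple[int, int]] = field(default_factory=list)

# Example: a use case composed of a main scenario and an alternative scenario.
main = BMSC("Main scenario", ["Driver", "System"],
            [Message("Driver", "System", "Switch on ambient light")])
alt = BMSC("Alternative scenario", ["Driver", "System"], [])
use_case = HMSC("Switch on ambient light", nodes=[main, alt], flow=[(0, 1)])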
Fig. 1. Graphical notation of BMSCs (left-hand side) and HMSCs (right-hand side)
2.2 Specifying Use Cases and Scenarios Using Message Sequence Charts
Fig. 2 shows the documentation and composition of scenarios using message se-
quence charts as opposed to the documentation and grouping of scenarios by means of
use case templates (see e.g. [6]). In our approach, we use message sequence charts in
order to reduce the ambiguity caused by documenting scenarios and their composition
using natural language. We use BMSCs for documenting atomic scenarios (or sce-
nario fragments) and HMSCs for scenario composition. Therein, a HMSC can com-
pose several scenario fragments into a larger scenario, several scenarios into a use
case, or several use cases into a larger use case (Fig. 2, right-hand side). By using
HMSCs, relationships between use cases such as include and extend as well as
the sequential execution of use cases can be defined. A similar approach based on
UML activity diagrams and UML sequence diagrams is described in [8]. Note that
other information such as use case goals or pre- and post-conditions still needs to be
documented in the use case template.
Fig. 2. Templates (left) vs. message sequence charts (right) for documenting scenarios
Fig. 3. Overview of the approach: (1) specification of system scenarios (required input: system goals), (2) specification of component scenarios (required input: coarse-grained system architecture), and (3) comparison of the scenarios across abstraction levels (input: project-specific consistency rules; main output: differences between the scenarios)
environment required to satisfy the associated goal. For each system goal, the relevant
system scenarios satisfying this goal should be documented.
When specifying the system scenarios, the requirements engineers need to ensure
that a black box view of the system is maintained. In other words, the specified
system scenarios should only define the external interactions of the system and no
internal interactions since defining the internal interactions is the concern of lower
abstraction levels. Furthermore, system scenarios should be defined at a logical level,
i.e. independently of a specific implementation technology such as a specific interface
design or a specific system architecture. Fig. 4 shows a simple system scenario for the
interior light system example introduced in Section 1.1.
Fig. 4. System scenario "Switch on ambient light"
In early phases of use case development, the focus should be placed on the main or
normal course of interactions that is performed to satisfy the use case goal(s). In later
stages, alternative and exception scenarios should be added and related to the main
scenario. However, in order to maintain the black box view, alternative and exception
scenarios should be defined only if they are required independently of the internal
structure of the system and the technology used for realising the system.
[Figure: scenario "Switch on ambient light" with messages LIGHT_AMB_ON and Activate]
Note that one system scenario may be detailed by several component scenarios
which define different possibilities for realising the system scenario through different
system-internal interactions.
[Figure: scenario "Switch on ambient light" with messages LIGHT_AMB_ON and IO_FAILURE]
[Figure: HMSC structure of a system-level use case (BMSC 1 as main scenario, BMSC 2 as alternative scenario) related by a <<refines>> link to the HMSC structure of a component-level use case (BMSC 1a, BMSC 2a, BMSC 1/2b, BMSC *)]
The two input MSC specifications represent a pair of scenarios (or use cases) related to each other by a
refines link. Each MSC specification may comprise several HMSCs and BMSCs.
The algorithm is applied to each such pair of MSC specifications individually.
The architecture that is provided as input defines the decomposition of the system into
a set of components. It hence interrelates the instances defined in the two MSC
specifications.
In Step 3.1 (see Fig. 8), the scenarios are normalised in order to ensure that identi-
cal message labels have identical meanings. Step 3.2 transforms the normalised MSCs
into interface automata. It results in a system-level automaton PH (Step 3.2a) and a
component-level automaton PL (Step 3.2b). Step 3.3 computes the differences
between PH and PL. It results in two automata PH-L and PL-H.
The individual steps are explained in more detail in Subsections 4.1 to 4.3. In addi-
tion, we outline the analysis of the computed difference automata in Section 4.4.
[Fig. 8 (excerpt): Step 3.1 normalises the scenarios; Step 3.2a computes the system-level automaton PH from the normalised system scenarios; Step 3.2b computes the component-level automaton PL from the normalised component scenarios]
To compute the differences between the system and component scenarios, the mes-
sage sequence charts documenting the scenarios must match some input criteria. We
assume that each message sent or received by an instance has a label that uniquely
identifies its message type. Furthermore, we assume that messages of a specific type
sent to the environment or received from the environment are labelled identically at
the system level and the component level.
The information about which components the system is decomposed into is taken from the architecture model. For the system scenarios, we assume that the instance representing the
system is named consistently with the representation of the system in the architecture
model. For the component scenarios, we assume that the instances representing sys-
tem components are named consistently with the representations of the components in
the architecture model.
To ensure that the above assumptions hold, a normalisation step is performed in
which the instance names and message labels are checked and adapted, if necessary.
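As an illustration only, a minimal sketch of what such a normalisation step could look like, assuming the architecture model is available as simple alias dictionaries; the identifiers used here (Event, normalise, DoorCtrl, DoorControlUnit) are hypothetical and not part of the authors' tooling.

# Illustrative sketch of the normalisation step (Step 3.1); not the authors' implementation.
from dataclasses import dataclass

@dataclass
class Event:
    instance: str   # name of the sending/receiving instance
    kind: str       # "send", "receive" or "internal"
    label: str      # message type label

def normalise(events, instance_aliases, label_aliases):
    """Rename instances and message labels to canonical names so that identical
    labels have identical meanings at the system and the component level."""
    return [Event(instance=instance_aliases.get(e.instance, e.instance),
                  kind=e.kind,
                  label=label_aliases.get(e.label, e.label))
            for e in events]

# Hypothetical example: align a component scenario with the architecture model.
events = [Event("DoorCtrl", "receive", "LIGHT_AMB_ON"),
          Event("DoorCtrl", "send", "activate")]
instance_aliases = {"DoorCtrl": "DoorControlUnit"}   # name used in the architecture model
label_aliases = {"activate": "Activate"}             # canonical message label
print(normalise(events, instance_aliases, label_aliases))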
Interface automata offer several advantages that make them particularly suitable for our approach. For instance, the set
of actions of an interface automaton is partitioned into a set of input actions, a set of
output actions, and a set of internal actions. This corresponds to the event types de-
fined for MSCs (receive, send, internal). Furthermore, interface automata do not en-
force that each input action is enabled in every state. This is also true for the MSCs in
our approach since we assume that, after reaching a specific location of an instance
line, only the specified events are allowed to occur.
We briefly outline the transformation of the scenarios into interface automata:
1. Construction of instance automata: In this step, an interface automaton is com-
puted for the system (system level) as well as for each component (component
level). For this purpose, first each BMSC is transformed into a set of partial
automata based on the algorithm described e.g. in [10]. Subsequently, the partial
automata are concatenated by inserting ε-transitions (i.e. invisible transitions) as
defined by the HMSC edges. Environment instances are disregarded in this step.
2. Elimination of ε-transitions and non-determinism: In this step, non-deterministic
transitions and the ε-transitions inserted during concatenation are eliminated in
the interface automata that were constructed in the first step. For performing
this step, standard algorithms for automata such as those described in [11] can
be used. The results of this step are the system-level automaton PH and a set of
component-level automata.
3. Composition of the instance automata: In this step, the automata derived from
the component scenarios are composed into a single component-level automaton
PL by means of the composition operator defined for interface automata
(see [3]).
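As a rough illustration of step 2, the following sketch performs the textbook ε-closure and subset construction described in [11] on a simplified transition structure (a dictionary mapping (state, action) to successor sets, with None standing for an ε-transition). It deliberately ignores the input/output/internal action distinction of interface automata and is not the authors' implementation.

# Sketch of step 2: eliminating epsilon-transitions and non-determinism via the
# standard epsilon-closure / subset construction (cf. Hopcroft et al.).

def eps_closure(states, delta):
    """All states reachable from `states` via epsilon-transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, None), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def determinise(initial, delta, alphabet):
    """Subset construction; returns the initial subset state and the DFA transitions."""
    start = eps_closure({initial}, delta)
    visited = {start}
    todo = [start]
    dfa = {}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            targets = set()
            for s in subset:
                targets |= delta.get((s, a), set())
            if not targets:
                continue
            succ = eps_closure(targets, delta)
            dfa[(subset, a)] = succ
            if succ not in visited:
                visited.add(succ)
                todo.append(succ)
    return start, dfa

# Tiny example: epsilon-transition from state 0 to 1, action "Activate" from 1 to 2.
delta = {(0, None): {1}, (1, "Activate"): {2}}
print(determinise(0, delta, {"Activate"}))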
The comparison of the automata shall reveal differences between the traces of
the automata with regard to the externally observable system behaviour (similar to
weak trace equivalence; see [12]). For this purpose, the traces consisting only of
input and output actions of the interface automata PH and PL need to be compared
with each other. The set of traces of the automaton PH is called the language of PH
and denoted as LH. The set of traces (of input and output actions) of PL is called the
language of PL and denoted as LL. To compare the two languages, two differences
must be computed:
LH-L = LH \ LL = LH ∩ ∁(LL) and
LL-H = LL \ LH = LL ∩ ∁(LH)
Hence, for computing the desired differences, the intersection and the complement
operator for automata must be applied [11]. Since the complement operator requires a
deterministic finite automaton as input, PH and PL must be transformed into determi-
nistic automata. Furthermore, the internal actions defined in PL must be substituted by
ε-transitions. Due to space limitations, we omit the details here.
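To make the two differences concrete, the sketch below renders the complement and intersection operators of [11] for complete deterministic automata over a shared alphabet of external actions (complementation by swapping accepting states, intersection by a product construction). This is a textbook rendering under those assumptions, not the tool used in the evaluation.

# Sketch: computing L(d1) minus L(d2) on complete DFAs over a shared alphabet.
# A DFA is a tuple (states, alphabet, delta, start, accepting) with delta[(state, action)] = state.
from itertools import product

def complement(dfa):
    states, alphabet, delta, start, accepting = dfa
    return (states, alphabet, delta, start, states - accepting)

def intersection(d1, d2):
    s1, alphabet, t1, q1, a1 = d1
    s2, _, t2, q2, a2 = d2
    states = set(product(s1, s2))
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)]) for (p, q) in states for a in alphabet}
    accepting = {(p, q) for (p, q) in states if p in a1 and q in a2}
    return (states, alphabet, delta, (q1, q2), accepting)

def difference(d1, d2):
    """Automaton accepting the difference of the languages: intersect d1 with the complement of d2."""
    return intersection(d1, complement(d2))

# Tiny example over the alphabet {"a"}: d1 accepts odd-length words, d2 accepts all words.
alph = {"a"}
d1 = ({0, 1}, alph, {(0, "a"): 1, (1, "a"): 0}, 0, {1})
d2 = ({0}, alph, {(0, "a"): 0}, 0, {0})
diff = difference(d2, d1)   # accepts exactly the even-length words

# In the approach, the two directions would then be computed as
#   difference(P_H, P_L)   (traces only in the system-level scenarios) and
#   difference(P_L, P_H)   (traces only in the component-level scenarios).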
The requirements engineers can interpret the resulting automata in the following way:
LH-L = ∅ and LL-H = ∅: In this case, the externally observable traces of both
automata are identical, i.e. the scenarios at the system level and the component level are
consistent with each other.
LL-H ≠ ∅: In this case, the component scenarios contain traces that are not defined
at the system layer. The requirements engineers have to analyse these in order to
determine which of them are valid or desired and which ones are not. Traces con-
tained in LL-H that are considered valid may indicate missing system scenarios.
However, the project-specific consistency rules may also allow such traces under
certain conditions.
LH-L ≠ ∅: In this case, the system scenarios contain traces that are not defined at
the component level. The requirements engineers have to analyse these traces in
order to determine which of these traces are valid. Traces contained in LH-L that are
considered valid indicate missing component scenarios.
For supporting the interpretation and analysis of the computed differences, we gener-
ate graphical representations of the difference automata using the environment
described in [13]. The analysis results are used to drive the further development and
consolidation of the system and component scenarios.
[Figure content: the ACC System with use cases UC 4 "Deactivate ACC", UC 5 "Drive with activated ACC", UC 5.1 "Drive at set speed", UC 5.2 "Detect a vehicle ahead", UC 5.3 "Follow a vehicle ahead" and UC 6 "Adjust set speed", with <<extends>> relationships between UC 5 and its sub-use cases; actors: Driver, Vehicle, Engine, Brakes]
Fig. 9. Excerpt of the use case diagram defined for the ACC system
[Figure content: component-level interactions among Headway Control, Drive Control, Engine Control and Brake Control (messages: actual speed, decelerate, break)]
For the identified use cases, scenarios were specified by means of message se-
quence charts. The specification activity resulted in eleven HMSCs and nineteen
BMSCs at the system level and eleven HMSCs and twenty-five BMSCs at the com-
ponent level. The consistency checking was performed using a prototypical imple-
mentation of the algorithm described in Section 4. Based on the computed differ-
ences, the system scenarios and component scenarios were consolidated to remove
inconsistencies. Fig. 10 depicts an exemplary system-level use case defined for the
ACC system, the corresponding component-level use case as well as the automata PH
and PL computed for these use cases. The difference LH-L for the use case depicted in
Fig. 10 is empty which means that the component-level use case realises all scenarios
defined by the system-level use case. The automaton representing the difference LL-H
is shown in Fig. 11. The sequences of actions which lead from the start state to the
final state of this automaton are included in the component-level use case but are not
defined by the system-level use case and thus may indicate inconsistencies. The
evaluation demonstrated the importance of an automated, tool-supported consistency
check between system and component scenarios. The simplified ACC system used in
the evaluation was already too complex to detect all inconsistencies between system
and component scenarios manually. The prototypical tool revealed a large number of
inconsistencies in the initial use cases and thus contributed significantly to improving
the consistency of the use cases across the two abstraction levels. However, since the
evaluation has been performed in an academic environment, further investigations
concerning the applicability and usefulness of the approach in an industrial environ-
ment are needed.
[Fig. 11: difference automaton for LL-H, with actions such as Speedrequest!, Speedresponse?, Attention: vehicle ahead!, Decelerate! and Break!]
Scalability or performance problems did not occur during the consistency check of
the ACC scenarios. For instance, the computation of the difference automaton de-
picted in Fig. 11 took approximately 100 milliseconds. Still, for very complex use
cases (such as a composition of all use cases defined for a system into a single use
case), a more efficient implementation might be needed.
6 Related Work
Although scenario-based approaches exist that support, in principle, the development
of scenarios at different abstraction levels such as FRED [15] and Play-in/Play-out
[16], these approaches do not offer the required rigor for safety-oriented development.
The checking of the consistency of the scenarios across abstraction levels is not sup-
ported by these approaches. Play-in/Play-out merely supports checking whether a set
of universal scenarios realise a defined existential scenario which is not sufficient
for proving cross-level consistency.
Approaches that support the formal verification of scenarios suffer from other defi-
ciencies limiting their applicability within our approach. Existing techniques that, in
principle, support the verification of MSCs across different abstraction levels provide
insufficient support for HMSCs or do not support HMSCs at all. In [9], severe restric-
tions are imposed by requiring identical HMSC structures at the system level and the
component level. The goal of temporal-logic model checking is typically to reveal a
single counter example. In contrast, our approach computes extensive differences
58 E. Sikora, M. Daun, and K. Pohl
between the scenarios defined at two abstraction levels. Furthermore, to apply tempo-
ral-logic model checking, use cases must be encoded using temporal logic which
limits the applicability of such an approach in practice.
Furthermore, our approach can be regarded as a further step towards a methodical
support for the transition between requirements and design as it facilitates the for-
mally consistent specification of black-box scenarios and design-level scenarios. The
approach thus complements less formal approaches such as [18] which aim at sup-
porting the communication between requirements engineers and architects.
7 Conclusion
The approach presented in this paper closes a gap in the existing, scenario-based re-
quirements engineering methods. It supports the development of scenarios at different
abstraction levels and therein facilitates cross-level consistency checking. Cross-level
consistency is important, for instance, for constructing safety proofs and to avoid
requirements defects which lead to significant rework during system integration.
The approach employs the message sequence charts (MSC) language as a formal,
visual specification language for scenarios and use cases. Individual scenarios are
specified as basic message sequence charts (BMSCs). High-level message sequence
charts (HMSCs) interrelate several BMSCs and allow for iterations and alternatives.
The consistency check offered by our approach aims at detecting differences in the
traces of externally observable events specified at the system level and those specified
at the component level. The approach thus reveals, for instance, whether the traces of
a component-level MSC are complete and necessary with respect to a system-level
MSC. The approach is not based on simple, syntactic correspondences but rather
employs a transformation of MSCs into interface automata. This makes the approach
robust against changes at the syntactic level such as restructuring an MSC.
We have demonstrated the feasibility of our approach by applying it to the specifi-
cation and consistency checking of requirements for an adaptive cruise control sys-
tem. Our approach has proven useful for supporting the specification of the scenarios
at the system and component level. We hence consider objectives O1 and O2 defined
in Section 1.3 to be met. The consistency of the scenarios was checked using a proto-
typical tool. Thereby, a large number of inconsistencies could be resolved which were
difficult or even impossible to detect manually. We hence consider objective O3 (see
Section 1.3) to be met. The approach can be applied in settings where consistency
across different abstraction levels must be enforced and the use of formal specifica-
tion and verification methods is accepted. A detailed evaluation of the applicability of
our approach in industrial settings is ongoing work.
Acknowledgements. This paper was partly funded by the German Federal Ministry
of Education and Research (BMBF) through the project Software Platform Embed-
ded Systems (SPES 2020), grant no. 01 IS 08045. We thank Nelufar Ulfat-Bunyadi
for the rigorous proof-reading of the paper.
Supporting the Consistent Specification of Scenarios 59
References
[1] Gorschek, T., Wohlin, C.: Requirements Abstraction Model. Requirements Engineering Journal (REJ) 11, 79–101 (2006)
[2] International Telecommunication Union: Recommendation Z.120 – Message Sequence Charts, MSC (2004)
[3] De Alfaro, L., Henzinger, T.A.: Interface Automata. In: Proc. of the ACM SIGSOFT Symp. on the Foundations of Software Engineering, pp. 109–120 (2001)
[4] RTCA: DO-178B Software Considerations in Airborne Systems and Equipment Certification (1992)
[5] Potts, C.: Using Schematic Scenarios to Understand User Needs. In: Proc. of the ACM Symposium on Designing Interactive Systems: Processes, Practices, Methods and Techniques (DIS 1995), pp. 247–266. ACM, New York (1995)
[6] Pohl, K.: Requirements Engineering – Foundations, Principles, Techniques. Springer, Heidelberg (to appear 2010)
[7] Peled, D.: Specification and Verification using Message Sequence Charts. Electr. Notes Theor. Comp. Sci. 65(7), 51–64 (2002)
[8] Whittle, J., Schumann, J.: Generating Statechart Designs from Scenarios. In: Proc. of the Intl. Conference on Software Engineering, pp. 314–323 (2000)
[9] Khendek, F., Bourduas, S., Vincent, D.: Stepwise Design with Message Sequence Charts. In: Proc. of the IFIP TC6/WG6.1, 21st Intl. Conference on Formal Techniques for Networked and Distributed Systems, pp. 19–34. Kluwer, Dordrecht (2001)
[10] Krüger, I., Grosu, R., Scholz, P., Broy, M.: From MSCs to Statecharts. In: Proc. of the IFIP WG10.3/WG10.5, Intl. Workshop on Distributed and Parallel Embedded Systems, pp. 61–71. Kluwer, Dordrecht (1999)
[11] Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, 3rd edn. Addison-Wesley, Reading (2006)
[12] Milner, R.: Communicating and Mobile Systems – The Pi-Calculus. Cambridge University Press, Cambridge (1999)
[13] Gansner, E., North, S.: An Open Graph Visualization System and its Applications to Software Engineering. Software – Practice and Experience (1999)
[14] Robert Bosch GmbH: ACC Adaptive Cruise Control. The Bosch Yellow Jackets (2003), http://www.christiani-tvet.com/
[15] Regnell, B., Davidson, A.: From Requirements to Design with Use Cases. In: Proc. 3rd Intl. Workshop on Requirements Engineering: Foundation for Software Quality, Barcelona (1997)
[16] Harel, D., Marelly, R.: Come, Let's Play: Scenario-Based Programming Using LSCs and the Play-Engine. Springer, Heidelberg (2003)
[17] Ohlhoff, C.: Consistent Refinement of Sequence Diagrams in the UML 2.0. Christian-Albrechts-Universität, Kiel (2006)
[18] Fricker, S., Gorschek, T., Byman, C., Schmidle, A.: Handshaking with Implementation Proposals: Negotiating Requirements Understanding. IEEE Software 27(2), 72–80 (2010)
Requirements Value Chains: Stakeholder Management
and Requirements Engineering in Software Ecosystems
Samuel Fricker
1 Introduction
Much requirements engineering research has focused on engineering requirements of
a system with few easily accessible stakeholders [1]. This model of stakeholder in-
volvement is adequate for many bespoke situations, but is too simplistic for market-
driven software development [2, 3], where collaboration among stakeholders is a
central concern [4]. Here, a possibly large number of anonymous and distributed users
of current product versions provide feedback from product use and state new
requirements. Developers state requirements that emerge from attempts to balance
customer satisfaction and development feasibility. Marketing and management
departments define development scope. Other roles pursue further objectives.
Stakeholders and their relationships are a central concern of software ecosystems,
where stakeholder collaboration is organized in two value chains [5]. The require-
ments value chain applies to inception, elaboration, and planning of software, starting
with business and application ideas and ending with an agreed detailed set of re-
quirements for implementation. It is concerned with discovering, communicating, and
matching interests of direct and indirect stakeholders with functionality and quality
properties of the software to be developed [6]. Stakeholders need to be known and
differentiated [7, 8] and conflicting perspectives and goals resolved [9, 10].
Examples of requirements value chains have been published [15, 16, 18, 19, 25],
mostly characterized ad-hoc. Figure 1 shows an interpretation of one of them using
the notation introduced in [16]. It describes an institutionalized requirements value
chain that was developed in a large-scale requirements engineering process develop-
ment effort [15]. Process development defined the social network and requirements
engineering artifacts by identifying roles (circles in Figure 1), their responsibilities
with respect to the company's product and technology portfolio, and documentation
to capture agreements between the roles (arrows in Figure 1). Process development
left open the concrete interests to be pursued and aligned.
Analysis of the requirements value chain in Figure 1 raises questions regarding ef-
ficiency and completeness of the developed process. Software usability, if a concern
for the company's products, requires alignment of user interests with the software
team leader's (SW-TL) intentions over four specifications. The effort needed could be
significantly reduced and the quality of the alignment improved by introducing more
direct collaboration with selected users. The requirements value chain excludes rela-
tionships to production and suppliers, which could connect development to the supply
chain and would allow considering requirements that affect product cost, hence a
substantial part of the economic success. Company resources, finally, are considered
mere implementation resources. No link points from product management to devel-
opment that would allow gathering innovative ideas [20].
3 Research Issues
Conceptualizing software product stakeholders as a requirements value chain can
bring transparency into how the software ecosystem affects product inception.
Research in this area can provide the foundations for reasoning about the efficiency
and effectiveness of requirements engineering strategy and of innovation.
Requirements value chain analysis can allow understanding the power of given stake-
holders, process performance, and ecosystem maturity. Social network theory [27]
provides concepts and models for determining stakeholder power and influence and
for evaluating structural strengths and weaknesses of the stakeholder network. Group
theory [26] gives insights into group effectiveness and the development of specialized
roles. Negotiation theory [21] provides decision-making knowledge.
Stakeholders need to be managed in the software ecosystem to evolve a value
chain. Partners need to be identified, relationships established, stakeholder interac-
tions monitored, and changes in the requirements value chain controlled. Partner identifi-
cation may be addressed by directories or by recommendation systems. Established
social networks provide such capabilities, but have not been used for such purposes in
requirements engineering yet. Groups, besides negotiation tactic selection [25], can
also play a role in partner and peer identification. Group management addresses
group lifecycle, performance, and partnering with other groups. Relationships need to
be established to allow partners to start negotiations. Factors like trust and distance
affect the quality of such relationships. Value chain management, as opposed to passive
emergence of a chain, involves proactive value chain composition, structuring, and
change to provide value and perspectives to its members and to ensure sustainability
of the software ecosystem.
Information spread in the requirements value chain needs to be managed. The
choice of interest elicitation, expectation setting, and decision documentation ap-
proaches can have effects on the transparency and performance of the value chain.
Computer-supported collaborative work [28], traceability, and audit trails can con-
tribute to understanding effective information sharing and management. Social
network technologies may give unprecedented support for requirements engineering.
Requirements value chain structure can affect innovation, requirements engineer-
ing performance and software success. Negotiations permit local alignment of
interest, but may not be effective for global alignment. Distance, feed-forward and
feedback affect the overall alignment of stakeholder interests and intentions in the
value chain and the motivation of stakeholders to collaborate. A new management
role may be needed, responsible for value chain structure and policies, for guiding
stakeholder behavior, and for controlling progress and success of interest alignment.
References
1. Cheng, B., Atlee, J.: Research Directions in Requirements Engineering. In: Future of Software Engineering (FOSE 2007). IEEE Computer Society, Los Alamitos (2007)
2. Regnell, B., Brinkkemper, S.: Market-Driven Requirements Engineering for Software Products. In: Aurum, A., Wohlin, C. (eds.) Engineering and Managing Software Requirements, pp. 287–308. Springer, Heidelberg (2005)
3. Ebert, C.: Software Product Management. Crosstalk 22, 15–19 (2009)
4. Karlsson, L., Dahlstedt, Å., Regnell, B., Natt och Dag, J., Persson, A.: Requirements engineering challenges in market-driven software development – An interview study with practitioners. Information and Software Technology 49, 588–604 (2007)
5. Messerschmitt, D., Szyperski, C.: Software Ecosystem: Understanding an Indispensable Technology and Industry. The MIT Press, London (2003)
6. Yu, E.: Towards Modelling and Reasoning Support for Early-Phase Requirements Engineering. In: IEEE Intl. Symp. on Requirements Engineering, Annapolis, MD, USA (1997)
7. Alexander, I., Robertson, S.: Understanding Project Sociology by Modeling Stakeholders. IEEE Software 21, 23–27 (2004)
8. Kotonya, G., Sommerville, I.: Requirements Engineering with Viewpoints. Software Engineering Journal 11, 5–18 (1996)
9. van Lamsweerde, A., Darimont, R., Letier, E.: Managing Conflicts in Goal-Driven Requirements Engineering. IEEE Transactions on Software Engineering 24, 908–926 (1998)
10. Easterbrook, S., Nuseibeh, B.: Using ViewPoints for Inconsistency Management. Software Engineering Journal 11, 31–43 (1996)
11. Jansen, S., Brinkkemper, S., Finkelstein, A.: Providing Transparency in the Business of Software: A Modeling Technique for Software Supply Networks. Virtual Enterprises and Collaborative Networks (2007)
12. Lauesen, S.: COTS Tenders and Integration Requirements. Requirements Engineering 11, 111–122 (2006)
13. Rayport, J., Sviokla, J.: Exploiting the Virtual Value Chain. Harvard Business Review 73, 75–85 (1995)
14. Gordijn, J., Yu, E., van der Raadt, B.: e-Service Design Using i* and e3value Modeling. IEEE Software 23, 26–33 (2006)
15. Paech, B., Dörr, J., Koehler, M.: Improving Requirements Engineering Communication in Multiproject Environments. IEEE Software 22, 40–47 (2005)
16. Fricker, S.: Specification and Analysis of Requirements Negotiation Strategy in Software Ecosystems. In: Intl. Workshop on Software Ecosystems, Falls Church, VA, USA (2009)
17. Müller, D., Herbst, J., Hammori, M., Reichert, M.: IT Support for Release Management Processes in the Automotive Industry. In: Dustdar, S., Fiadeiro, J.L., Sheth, A.P. (eds.) BPM 2006. LNCS, vol. 4102, pp. 368–377. Springer, Heidelberg (2006)
18. Damian, D., Zowghi, D.: RE Challenges in Multi-Site Software Development Organisations. Requirements Engineering 8, 149–160 (2003)
19. Damian, D.: Stakeholders in Global Requirements Engineering: Lessons Learned from Practice. IEEE Software 24, 21–27 (2007)
20. Gorschek, T., Fricker, S., Palm, K., Kunsman, S.: A Lightweight Innovation Process for Software-Intensive Product Development. IEEE Software (2010)
21. Thompson, L.: The Mind and Heart of the Negotiator. Prentice-Hall, Englewood Cliffs (2004)
22. Bergman, M., King, J.L., Lyytinen, K.: Large-Scale Requirements Analysis Revisited: The Need for Understanding the Political Ecology of Requirements Engineering. Requirements Engineering 7, 152–171 (2002)
23. Grünbacher, P., Seyff, N.: Requirements Negotiation. In: Aurum, A., Wohlin, C. (eds.) Engineering and Managing Software Requirements, pp. 143–162. Springer, Heidelberg (2005)
24. Fricker, S., Gorschek, T., Byman, C., Schmidle, A.: Handshaking with Implementation Proposals: Negotiating Requirements Understanding. IEEE Software 27, 72–80 (2010)
25. Fricker, S., Grünbacher, P.: Negotiation Constellations – Method Selection Framework for Requirements Negotiation. In: Working Conference on Requirements Engineering: Foundation for Software Quality, Montpellier, France (2008)
26. Johnson, D., Johnson, F.: Joining Together: Group Theory and Group Skills. Pearson, London (2009)
27. Wasserman, S., Faust, K.: Social Network Analysis. Cambridge University Press, Cambridge (2009)
28. Gross, T., Koch, M.: Computer-Supported Cooperative Work. Oldenbourg (2007)
Binary Priority List for Prioritizing Software
Requirements
1 Introduction
Product software companies produce packaged software products aimed at a specific
market [1]. In product software companies, the number of requirements from the
market typically exceeds the number of features that can be implemented in one
release due to limited resources. Requirements prioritization is aimed at responding
to this challenge. It is defined as an activity during which the most important re-
quirements for the system (or release) should be identified [2]. According to the
reference framework for software product management, in which key processes,
actors, and relations between them are modeled, it is the first step in the release
planning process, the process through which software is made available to, and
obtained by, its users [3]. The main actor in prioritization is the product manager,
but other stakeholders (development, sales & marketing, customers, partners etc.)
may influence it as well [3].
The figure shows a list of requirements containing three example requirements (R1,
R2 and R3) and a number of sub-lists (L1, L2, L3 and L4) containing more require-
ments. In accordance with a binary tree structure (cf. [17] and [6]) requirements that
are further up are more important than requirements further down. Therefore, R1's
priority is lower than that of R2 but higher than that of R3. Subsequently, all require-
ments listed in the sub-lists L1 and L3 have higher and all requirements in L2 and L4
have a lower priority than their root requirement, R2 and R3 respectively.
The steps of applying the technique are (cf. [5], [6] and [8]):
1. Pile all requirements that have been collected from various sources.
2. Take one element from the pile, and use it as the root requirement.
We developed a simple spreadsheet tool based on Microsoft Excel (see Figure 2).
A macro guides the user through the prioritization process by asking him to compare
different requirements and to decide which one is more important than the other.
other. The list structure is saved in an implicit form in a hidden spreadsheet, which
allows saving it and changing it at a later time without having to run the whole priori-
tization again.
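A minimal sketch of the comparison-driven insertion that such a macro performs; the class and function names below are ours, and the comparison is canned for the example, whereas the tool would ask the product manager ("Is req. X more important than req. Y?").

# Sketch of a binary-priority-list insertion: each new requirement is compared
# pairwise against the requirements already placed, following the binary tree.

class Node:
    def __init__(self, requirement):
        self.requirement = requirement
        self.higher = None   # sub-list with higher priority
        self.lower = None    # sub-list with lower priority

def insert(root, requirement, is_more_important):
    """Place `requirement` by repeated pairwise comparison against tree nodes."""
    if root is None:
        return Node(requirement)
    if is_more_important(requirement, root.requirement):
        root.higher = insert(root.higher, requirement, is_more_important)
    else:
        root.lower = insert(root.lower, requirement, is_more_important)
    return root

def prioritized(root):
    """In-order traversal yields the requirements from highest to lowest priority."""
    if root is None:
        return []
    return prioritized(root.higher) + [root.requirement] + prioritized(root.lower)

# Example with hypothetical, canned judgements instead of interactive answers.
rank = {"R1": 2, "R2": 1, "R3": 3}
more_important = lambda x, y: rank[x] < rank[y]
root = None
for r in ["R1", "R2", "R3"]:
    root = insert(root, r, more_important)
print(prioritized(root))   # ['R2', 'R1', 'R3']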
Due to its simple nature we expect BPL to be especially useful in environments
where no formal prioritization techniques have yet been used to assist the product
manager in prioritizing his criteria. This type of environment is most likely to be
found in small product software companies that have recently grown in terms of
requirements to be processed.
4 Case Studies
4.1 Approach
The case studies were divided into three phases: (1) a prioritization with BPL, (2) a
prioritization with Wiegers's technique, and (3) an evaluation of the two techniques.
To reduce the number of confounding factors as proposed by Wohlin and Wesslen
[16], the same form of input, namely Excel spreadsheets, was used.
We conducted case studies in two small product software companies in which we
compared BPL with Wiegers's technique in terms of the following three factors:
1. Time consumption: indicates the time necessary to prioritize a certain number of
requirements.
2. Ease of use: describes how easy it is to use the examined prioritization technique
assessed by the respective product manager.
3. Subjective reliability of results: indicates how reliable the result of the prioritiza-
tion technique is in the opinion of an experienced product manager and thus how
applicable the technique is to the respective company.
The ultimate goal was to show that BPL can be applied by product managers in small
product software companies to systematize their requirement prioritization practices. In
order to give an additional indication of the technique's relative prioritization process
quality, we compared it with Wiegers's approach, a commonly used prioritization tech-
nique that is applied to similar numbers of requirements as BPL (cf. [7], [18] and [19]).
The first case study took place at Edmond Document Solutions (below referred to as
Edmond), a small Dutch product software company. Edmond is a specialist in providing
document processing solutions to document intensive organizations. The company em-
ploys 15 people, of whom six are involved in software development. The software devel-
opment process is based on Scrum (cf. [20]), an agile software development method.
The current release planning process takes place in two stages. High-level require-
ments are discussed once a year among the product manager, the operations manager
and the sales director. The selection and order of requirements is defined in an infor-
mal discussion between the three. To gain a good understanding of the market, they
visit exhibitions, read journals and communicate with major customers. The product
manager estimates the required resources and makes sure the needed resources do not
exceed the resources available.
In the second stage of the release planning process, the product manager derives
low-level requirements from the high-level requirements defined in the first stage.
To manage requirements together with tasks, bugs, improvements and impediments
he uses JIRA (cf. [21]), a bug and issue tracking software. Prioritization of low level
requirements takes place by comparing them in pairs with each other. In this process
no formal technique is used. Subsequently, he assigns the requirements to Scrum
sprints, periods of four weeks where developers work on a certain number of
requirements.
The second case study took place at Credit Tools (below referred to as CT), a small
Dutch product software company. CT produces encashment management software.
Five out of the company's 25 employees are software developers. In addition, there
are two outsourced software developers. The company's development method is
Rapid Application Development (cf. [22]).
Requirements are generated from customer requests, from information acquired
through consultants and from ideas generated by the company's owners. JIRA is used
to collect requirements and to communicate with the outsourcing developers. There is
no formal process of requirements prioritization and not all requirements are system-
atically noted down.
4.3 Results
Table 1. Evaluation of BPL and Wiegers's technique in the two case studies

                         BPL                   Wiegers
                         Edmond     CT         Edmond      CT
Ease of use              8/10       8/10       7/10        4/10
Subjective reliability   7/10       7/10       4/10        5/10
Time consumption         30 min     20 min     120 min     50-60 min
The product managers of both companies had a quite positive impression of BPL,
which is reflected by their rating of the technique in terms of ease of use and reliability
(see Table 1). One indicated that the ten most highly prioritized requirements
corresponded exactly to his own manual prioritization. Lower priority requirements,
however, partly differed from it. He supposed that this could be caused by accidentally
giving wrong answers while going through the prioritization process and proposed to
improve the user interface by including a possibility to correct a wrong choice.
The second product manager indicated that the informal approach to prioritizing
requirements that they had used so far has many similarities with BPL. Therefore, he
considered it as quite suitable for his company.
To compare the results of the two techniques, Table 2 shows the ten requirements with
the highest priority according to both techniques.
Table 2. The ten most highly prioritized requirements according to both techniques
              Edmond               CT
Priority      BPL      Wiegers     BPL      Wiegers
1             5        54          18       19
2             2        23          35       43
3             28       21          25       11
4             27       61          14       18
5             3        63          24       29
6             6        12          16       42
7             23       28          11       3
8             12       45          42       22
9             16       50          33       24
10            7        53          2        13
Avg. Diff.             14.75                8.54
Figure 2 (Edmond case) and Figure 3 (CT case) show the priorities of all require-
ments assigned by Wiegers's technique plotted over the priorities assigned by BPL,
which is also represented by the straight line. The more the two graphs in each
figure differ, the bigger the difference between the priorities assigned by the two
techniques.
[Figures (Edmond case and CT case): for each requirement, the priority assigned by Wiegers's technique is plotted over the priority assigned by BPL; both axes range over the priority positions]
In general, in both case studies, the results from both techniques differed strongly
from each other. Interestingly, however, in the second case study, the results of the
two techniques are close to each other for the requirements with the lowest priority.
The average difference of the priorities based on the two techniques was 14.75 in the
first case study and 8.54 in the second case study. The maximal difference between
BPL and Wiegers's technique was 56 in the first case and 31 in the second. Both
product managers rate Wiegers's technique rather low in terms of reliability
(see Table 1). However, the product manager of Edmond noted that a better calibra-
tion might have resulted in an improvement. In their rating of ease of use of
Wiegers's technique, the two product managers differ considerably. The product
manager of CT mentioned that he found it difficult to estimate values for relative risk
and penalty.
4.4 Discussion
The two techniques compared in these case studies differ in the way the prioritization
criterion is articulated. In contrast to Wiegers's technique, BPL does not make the
underlying inputs for the prioritization explicit. The results are rather based on the
spontaneous intuition of the person applying it. However, ideally the product manager
bases his considerations during the BPL prioritization on the same inputs, namely
benefit, penalty, costs and risk as in Wiegers's technique, or even considers other
important factors, such as attractiveness for development.
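For reference, a sketch of the calculation commonly published for Wiegers's technique [7], in which relative benefit and penalty form a value score that is divided by weighted cost and risk; the weights shown here are illustrative, and the normalisation of the ratings across all requirements (part of Wiegers's original scheme) is omitted for brevity, so the exact calibration may differ from the one used in the case studies.

# Illustrative sketch of the Wiegers-style priority calculation (cf. [7]); example weights.

def wiegers_priority(benefit, penalty, cost, risk,
                     w_benefit=2.0, w_penalty=1.0, w_cost=1.0, w_risk=0.5):
    """Inputs are relative ratings for one requirement (e.g. on a 1-9 scale);
    Wiegers additionally normalises them across all requirements, omitted here."""
    value = w_benefit * benefit + w_penalty * penalty
    return value / (w_cost * cost + w_risk * risk)

# A beneficial, cheap, low-risk requirement scores higher than a costly, risky one.
print(wiegers_priority(benefit=9, penalty=5, cost=2, risk=1))   # 9.2
print(wiegers_priority(benefit=9, penalty=5, cost=8, risk=7))   # 2.0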
In order to make sure that this happens, the question asked in the prioritization dia-
logue of the BPL tool should be formulated accordingly. During the case study the
question was "Is req. X more important than req. Y?". Now, we would suggest formu-
lating it as "Do you want to implement req. X before req. Y?" instead. This formula-
tion more explicitly suggests considering other factors than just importance. However,
we still recommend avoiding prioritization of requirements that differ considerably in
terms of costs to be implemented.
The difference in how explicitly the two techniques require the product manager to
express the factors that determine his considerations also explains the different time
consumption of the two techniques. The prioritization with BPL consequently only
takes one quarter (Edmond case) to one third (CT case) of the time of Wiegers's
technique.
BPL was perceived as the easier of the two techniques. This can also be related to its
simple structure. The user basically only has to compare two requirements at a time
and can apply his own set of criteria.
The strong difference between the two techniques' prioritizations suggests that the
result of a prioritization session depends heavily on the respective technique used.
The accordance of the two techniques' results for requirements with low priority is
actually the only point where both techniques correspond considerably with each
other. It might be explained by the fact that the product manager considered the last
four requirements as so unimportant that he assigned very low scores to three of the
determining factors of Wiegers's technique.
Both product managers considered the result of BPL considerably more reliable
than that of Wiegers's technique. This may be attributed to the fact that they could
directly apply their own comparison criteria. As a consequence, the result always
stays relatively close to his intuition. However, Wiegers's technique might become
more reliable when it is fine-tuned to the circumstances of the environment it is
applied in by changing the weights of the four input factors. To test this, we would
suggest repeating the prioritization with a small number of requirements and subse-
quently adjust the weights in such a way that the prioritization result corresponds to
the manual one.
Altogether, the results from the two case studies suggest that BPL is an appropriate
technique to prioritize a few dozen requirements as they typically occur in small
product software companies although we cannot generalize from the case studies due
to the limited number of replications. In such an environment, the technique's overall
prioritization process quality, considering the quality of the process itself and the
quality of its results, seems to be higher than that of Wiegers's technique. BPL could
help small software product companies without a formal prioritization process to
systematize it. The best results are expected when requirements are compared that are
similar in terms of development costs. The technique was used by one single person.
It remains open whether it is also suitable for a group of people performing a prioritization
as e.g. in the situation mentioned in the Edmond case study where three people are in
charge of prioritizing the high-level requirements.
However, the case studies also revealed some limitations of the technique. First of
all, BPL does not consider dependencies between requirements. Instead, the user has
to keep them in mind while prioritizing or refine the prioritization list afterwards, as
suggested in the requirements selection phase of release planning in the reference
framework for software product management [3]. In addi-
tion, due to the simple structure there might be a tendency to base the prioritization
just on one single criterion, such as importance, rather than considering other factors such
as costs, penalty and risk.
In terms of scalability, the first case study revealed that a pair-wise comparison of
68 requirements can already be quite tiring and lead to mistakes in terms of the com-
parison. Balancing the binary tree and incorporating other BST optimization tech-
niques [11] could reduce the number of comparisons necessary. However, we expect
BPL not to be practicable for much more than 100 requirements.
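As a rough, back-of-the-envelope estimate (ours, not a measurement from the case studies) of what balancing would gain for the 68 requirements of the first case:

import math

def worst_case_comparisons(n):
    # Degenerate, list-like tree: the i-th requirement is compared with all i-1 placed before it.
    return n * (n - 1) // 2

def balanced_comparisons(n):
    # Balanced tree: the i-th insertion needs about ceil(log2(i)) comparisons.
    return sum(math.ceil(math.log2(i)) for i in range(2, n + 1))

print(worst_case_comparisons(68))   # 2278
print(balanced_comparisons(68))     # 349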
References
1. Xu, L., Brinkkemper, S.: Concepts of product software. European Journal of Information Systems 16(5), 531–541 (2007)
2. Berander, P., Khan, K.A., Lehtola, L.: Towards a Research Framework on Requirements Prioritization. In: Proceedings of the Sixth Conference on Software Engineering Research and Practise in Sweden, pp. 39–48 (2006)
3. van de Weerd, I., Brinkkemper, S., Nieuwenhuis, R., Versendaal, J., Bijlsma, L.: Towards a reference framework for software product management. In: Proceedings of the 14th International Requirements Engineering Conference, pp. 312–315 (2006)
4. Augustine, S.: Managing Agile Projects. Prentice Hall, New Jersey (2005)
5. Racheva, Z., Daneva, M., Buglione, L.: Supporting the Dynamic Reprioritization of Requirements in Agile Development of Software Products. In: Proceedings of the Second International Workshop on Software Product Management 2008, Barcelona, pp. 49–58 (2008)
6. Karlsson, J., Wohlin, C., Regnell, B.: An evaluation of methods for prioritizing software requirements. Information and Software Technology 39(14-15), 939–947 (1997)
7. Wiegers, K.: First things first: prioritizing requirements. Software Development 7(9), 48–53 (1999)
8. Ahl, V.: An Experimental Comparison of Five Prioritization Methods. Master's Thesis, Department of Systems and Software Engineering, Blekinge Institute of Technology, Ronneby (2005)
9. Karlsson, J., Ryan, K.: A cost-value approach for prioritizing requirements. IEEE Software 14(5), 67–74 (1997)
10. Knuth, D.: The Art of Computer Programming, vol. 3. Addison-Wesley, Reading (1997)
11. Knuth, D.: Optimum binary search trees. Acta Informatica 1(1), 14–25 (1971)
12. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9), 509–517 (1975)
13. Bell, J., Gupta, G.: An evaluation of self-adjusting binary search tree techniques. Software: Practice and Experience 23(4), 369–382 (1993)
14. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design science in information systems research. Management Information Systems Quarterly 28(1), 75–106 (2004)
15. Yin, R.K.: Case study research: Design and methods. Sage, Thousand Oaks (2009)
16. Wohlin, C., Wesslen, A.: Experimentation in software engineering: an introduction. Kluwer, Norwell (2000)
17. Smith, J.D.: Design and Analysis of Algorithms. PWS-KENT, Boston (1989)
18. Young, R.R.: Recommended requirements gathering practices. CrossTalk, pp. 9–12 (April 2002)
19. Herrmann, A., Daneva, M.: Requirements Prioritization Based on Benefit and Cost Prediction: An Agenda for Future Research. In: Proceedings of the 16th IEEE International Requirements Engineering Conference, pp. 125–134 (2008)
20. Schwaber, K., Beedle, M.: Agile software development with Scrum. Prentice Hall, Upper Saddle River (2001)
21. JIRA – Bug tracking, issue tracking and project management software, http://www.atlassian.com/software/jira/
22. McConnell, S.: Rapid Development: Taming Wild Software Schedules, 1st edn. Microsoft Press, Redmond (1996)
Towards a Framework for Specifying Software
Robustness Requirements Based on Patterns
1 Introduction
With software becoming more commonplace in society and with continued in-
vestments in finding better ways to produce it, the maturity of both customers'
requirements and our ability to fulfill them increases. Often, this leads to an in-
creased focus on non-functional requirements and quality characteristics, such as
performance, design and usability. But there is also less tolerance for faults
and failures; by becoming more reliant on software, our society also increasingly
requires it to be reliable and robust. Robustness as a software quality attribute
(QA) is defined as [1]: "The degree to which a system or component can function
correctly in the presence of invalid inputs or stressful environmental conditions."
The industrial project which is referred to in this paper is the development
of a robust embedded software platform for running internal and third party
telematics services [2]. The platform and services in this context need to be
In this section, the framework ROAST for eliciting RR and aligning specification
and testing of RR is briefly described. ROAST is based on identifying patterns
for the specification of robustness at different abstraction levels. As mentioned ear-
lier, robustness is not a strictly defined term and can refer to both low-level
(interface and input validation, failure handling) and high-level (service degra-
dation, availability, reliability and dependability) requirement types.
There are three main ideas behind the method: (a) specification levels, (b)
requirement patterns, and (c) alignment from requirements to testing. The first
two parts are briefly described in this paper and the alignment from requirements
to testing will be discussed in future publications.
Like many NFRs, RR are often summative in nature. This means that they
specify general attributes of the developed system and not specific attributes for
specific, local situations. For example, while a functional requirement (FR) for
a certain feature of a telematics system ("system should support being updated
with applications during runtime") can be judged by considering if that specific
feature is present or not, a RR ("system should be stable at all times, it cannot
shut down because of erroneous inputs or components") requires testing of a large
number of different system executions. So while a FR talks about one specific
situation, or a definite sub-set of situations, a RR summarizes aspects of the
expected system behavior for a multitude of situations.
To make RRs testable they need to be refined into specific behaviors that
should happen (positive) or should never happen (negative). Early in the development
of a software system, users or developers may not be able to provide all details
needed to pinpoint such a behavior (or non-behavior). However, it would be a
mistake not to capture more general RRs. Our method thus describes different
information items in a full specification of a robustness behavior and describes
different levels in detailing them. This is similar to the Performance Refinement
and Evolution Model (PREM) as described by [6, 7], but specific to robustness
instead of performance requirements. The different levels can be used to judge
the maturity of specifying a requirement or as a specific goal to strive for.
Since RR are often summative, i.e. valid for multiple different system situ-
ations, they are also more likely, than specific functional requirements, to be
similar for different systems. We can exploit this similarity to make our method
both more effective (help achieve a higher quality) and efficient (help lower costs).
By creating a library of common specification patterns for robustness, industrial
practitioners can start their work from that library. Thus they need not develop
the requirements from scratch and can use the pattern to guide the writing of a
specific requirement. This can both increase quality and decrease time in devel-
oping the requirements. Our approach and the pattern template we use are based
on the requirements patterns for embedded systems developed by Konrad and
Cheng, which are in turn based on the design patterns book [8, 9].
The verification of different robustness behaviors should be aligned with the
RR. Based on the level of the requirement and the pattern the requirement is based
on, different verification methods are applicable and relevant. We make these
links explicit in order to simplify the verification and testing process. Note that
the verification method that is relevant for a certain pattern at a certain level
may not actually be a testing pattern. For pattern levels that are not quantifiable
or testable, the verification method may be a checklist or similar technique.
Figure 1 gives an overview of the method we propose, and shows both the
levels, robustness areas with patterns and attached verification methods. In the
following we describe each part in more detail.
N    Pattern                                                                Category
1    Specified response to out-of-range and invalid inputs                  IS
2    Specified response to timeout and latency                              IS
3    Specified response to input with unexpected timing                     IS
4    High input frequency                                                   IS
5    Lost events                                                            IS
6    High output frequency                                                  IS
7    Input before or during startup, after or during shut down              IS
8    Error recovery delays                                                  IS
9    Graceful degradation                                                   M
10   All modes and modules reachable                                        M
11   Run-time memory access in presence of other modules and services       ES
12   Processor access in presence of other modules and services             ES
13   Persistent memory access in presence of other modules and services     ES
14   Network access in presence of other modules and services               ES
The patterns presented in this section are partly elicited by studying ear-
lier requirement documents from similar projects and partly through expertise
provided by the participants in the project who are mainly experienced people
in the field of requirement engineering. Earlier academic work presented above
helped us complete and reformulate already identified patterns.
3 Conclusion
The state of the art and practice concerning robustness requirements and testing
is rather immature compared to that of other quality attributes. The proposed
framework, ROAST, is a unified framework for how to interpret robustness and
specify and verify robustness requirements.
ROAST follows a requirement as it often evolves from a high-level requirement
to a set of verifiable and concrete requirements. Therefore, ROAST consists of
different levels of specification that follow the most typical requirement specifi-
cation phases practiced in the industry. As presented in ROAST, the requirements
engineering process tends to start from high-level requirements and break them
down into more specific and measurable ones. Therefore, ROAST can be incor-
porated into the activities of most companies with minimal change to the rest
of the process. The commonality often seen between robustness requirements
in different projects is captured in patterns. For different patterns and levels,
different verification methods will be more or less useful.
Initial evaluation of ROAST has been carried out in an industrial setting.
Preliminary results are promising and show that the resulting requirements are
more complete and more likely to be verifiable. Further evaluation is underway.
References
1. IEEE Computer Society: IEEE standard glossary of software engineering terminology. IEEE, Tech. Rep. Std. 610.12-1990 (1990)
2. Shahrokni, A., Feldt, R., Petterson, F., Back, A.: Robustness verification challenges in automotive telematics software. In: SEKE, pp. 460–465 (2009)
3. Jaffe, M., Leveson, N.: Completeness, robustness, and safety in real-time software requirements specification. In: Proceedings of the 11th International Conference on Software Engineering, pp. 302–311. ACM, New York (1989)
4. Lutz, R.R.: Targeting safety-related errors during software requirements analysis. Journal of Systems and Software 34(3), 223–230 (1996)
5. Neumann, P.: The computer-related risk of the year: weak links and correlated events. In: Proceedings of the Sixth Annual Conference on Computer Assurance, COMPASS 1991, Systems Integrity, Software Safety and Process Security, pp. 5–8 (1991)
6. Ho, C.-W., Johnson, M., Williams, L., Maximilien, E.M.: On agile performance requirements specification and testing. In: Agile Conference 2006, pp. 46–52. IEEE, Los Alamitos (2006)
7. Ho, C.-W.: Performance requirements improvement with an evolutionary model. PhD in Software Engineering, North Carolina State University (2008)
8. Konrad, S., Cheng, B.H.C.: Requirements patterns for embedded systems. In: Proceedings of the IEEE Joint International Conference on Requirements Engineering (RE 2002), Essen, Germany (September 2002)
9. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns. Addison-Wesley, Boston (January 1995)
A Metamodel for Software Requirement Patterns
1 Introduction
Reuse is a fundamental activity in all software development related processes. Of
course, requirements engineering is not an exception to this rule [1]. The reuse of
software requirements may help requirement engineers to elicit, validate and docu-
ment software requirements and as a consequence, obtain software requirement speci-
fications of better quality both in contents and syntax [2].
There are many approaches to reuse. Among them, patterns hold a prominent posi-
tion. According to their most classical definition, "each pattern describes a problem
which occurs over and over again, and then describes the core of the solution to that
problem, in such a way that it can be used a million times over, without ever doing it
the same way twice" [3]. Software engineers have adopted the notion of pattern in
several contexts, most notably related to software design (e.g., software design and
architectural patterns), but also in other development phases, both earlier and later.
This work has been partially supported by the Spanish project TIN2007-64753.
We are interested in the use of patterns for the software analysis stage, namely Soft-
ware Requirement Patterns (SRP).
As [4] shows, there are not many proposals for SRP in the literature; in fact, their
exhaustive review lists just 4 catalogues out of 131, compared to 47 design catalogues
and 39 architecture catalogues. Our own literature review has found some more ap-
proaches, but this imbalance remains. The existing approaches differ in criteria like
the scope of the approach, the formalism used to write the patterns, the intended main
use of patterns and the existence of an explicit metamodel. Table 1 shows the classifi-
cation of these approaches with respect to the mentioned criteria. In the last row we
describe our own method as general-purpose, representing patterns in natural language,
aiming at writing software requirements specifications (SRS) and metamodel-based.
Regarding the two approaches that propose a metamodel, [8] focuses on the reuse of semi-
formal models (e.g., UML class diagrams and sequence diagrams), thus the kinds of
concepts managed are quite different. Concerning [6], their focus is on variability
modeling for handling the different relationships that requirements may have. From
this point of view, it is a very powerful approach, but other aspects that we will tackle
here, like the existence of different forms that a pattern may take, or multiple classifi-
cation criteria, are not present in their metamodel.
The idea of using SRP for reusing knowledge acquired during this stage arose from
the work of the CITI department of the Centre de Recherche Publique Henri Tudor
(CRPHT) on helping SMEs with no background in requirements engineering to handle
requirements analysis activities and to design SRS in order to conduct call-for-tender
processes for selecting Off-The-Shelf (OTS) solutions [14]. More than 40 projects ran
successfully following the CITI methodology, but the only reuse technique applied
was starting a new project by editing the most similar requirement book.
This technique showed its weaknesses, especially in relation to the mobility of
IT experts and consultants. It became necessary to provide better means to capitalize
requirements in a high-level manner by creating reusable artifacts like patterns, supporting
the consultants' need to create new SRS.
As a response to this need, we built an SRP catalogue with 29 patterns. The patterns
were all about non-functional requirements, since this type of requirement is the least
sensitive to changes in the problem domain. The research method used to build this
catalogue and the underlying metamodel was based on: the study of SRS from 7
real call-for-tender projects conducted by CITI; the knowledge of experts (IT consultants,
CITI facilitators and UPC researchers); and the requirements engineering literature,
especially on requirement patterns. We then undertook a first validation in two real
projects. In this paper we focus on the metamodel, that is, the structure of our proposed
SRPs and their classification to facilitate the selection of patterns. The PABRE process
for applying SRP in the context of CITI and the validation of our current SRP catalogue
have been described in [15]; therefore, neither the process nor the catalogue's content
are part of the objectives of this paper.
Figure 2 shows the metamodel for SRP. It represents the metaclasses for the basic
concepts that appear in the example above and others introduced later. We may observe
that the concept represented by a Requirement Pattern may take different Pattern
Forms. Each form is applicable in a particular context, i.e. it is the most appropriate
form to achieve the pattern's goal in a particular type of software project. In the
example of Fig. 1, the second form is more adequate if the types of alerts that the
client wants in the system will be the same for all types of failures; otherwise, the first
form must be applied. Applying an SRP, then, means choosing and applying the most
suitable form.
In turn, each form has a Fixed Part that characterizes it and is always applied
if the form is selected, together with zero or more Extended Parts that are optional
and help customize the SRP in the particular project. In general, extended parts
must conform to some Constraint represented by means of a formula over some predefined
operators (e.g., for declaring multiplicities or dependencies among extended
parts, such as excludes and requires). For instance, in the example we may see that the first
form allows repeated application of its single extended part, whilst the second form
allows at most one application of each of its extended parts (since in this form it makes
no sense to state the types of alerts and failures more than once).
Both fixed and extended parts are atomic Pattern Items that cannot be further decomposed.
Each pattern item contains a template with the text that finally appears in
the SRS when applied. In this text, some variable information in the form of Parameters
may (and usually does) appear. Parameters establish their Metric, possibly a
correctness condition inv, and may also be related to other parameters (belonging to
other patterns) that must have the same value; an example is the parameter
failures, which also appears in some form of another SRP in the catalogue, namely the
pattern Recovery Procedures.
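To make the structure described above more tangible, here is a minimal sketch of these metaclasses as Python dataclasses. The class and attribute names mirror the concepts of the metamodel (Requirement Pattern, Pattern Form, fixed and extended parts, parameters), but the code itself, including the apply_pattern helper, is our illustrative assumption rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Parameter:
    name: str
    metric: str                      # e.g. "set of failure types"
    inv: Optional[str] = None        # optional correctness condition
    same_value_as: List["Parameter"] = field(default_factory=list)

@dataclass
class PatternItem:                   # atomic: a fixed or extended part
    template: str                    # text that ends up in the SRS
    parameters: List[Parameter] = field(default_factory=list)

@dataclass
class ExtendedPart(PatternItem):
    constraint: Optional[str] = None # e.g. "multiplicity <= 1", "requires X"

@dataclass
class PatternForm:
    name: str
    context: str                     # when this form is the most appropriate
    fixed_part: PatternItem
    extended_parts: List[ExtendedPart] = field(default_factory=list)

@dataclass
class RequirementPattern:
    goal: str
    forms: List[PatternForm] = field(default_factory=list)

def apply_pattern(pattern: RequirementPattern, form_name: str,
                  chosen_extensions: List[int]) -> List[str]:
    """Applying an SRP = choosing one form and emitting its fixed part plus
    the selected extended parts (constraint checking is assumed elsewhere)."""
    form = next(f for f in pattern.forms if f.name == form_name)
    texts = [form.fixed_part.template]
    texts += [form.extended_parts[i].template for i in chosen_extensions]
    return texts
```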
SRPs are not isolated units of knowledge; rather, there are several types of relationships
among them. For instance, Withall structures his SRP catalogue using a
more detailed proposal of relationships, which may be purely structural, like has,
uses and is-a, or carry a semantic meaning, like displays and is across [12].
Even generic (unlabelled) relationships are used. A thorough analysis of the SRS
written by CITI shows that relationships may appear at three different levels:
Pattern Relationship. The most general relationship, which involves all the forms
and all the forms' parts of the related patterns.
Form Relationship. A relationship at the level of forms involves all the parts of
the related forms.
Part Relationship. The relationship only applies to the two related parts.
In any case, if A is related to B and A is applied in the current project, the need to
apply or avoid B must be explicitly addressed. The types of relationships are
not predetermined in the metamodel, in order to keep it flexible. The superclass Relationship
includes an attribute to classify each relationship.
A fundamental issue when considering patterns as part of a catalogue is the need to
classify them according to some criteria that support their search. In fact, it is important
to observe that different contexts (organizations, projects, standards, etc.) may, and
usually do, define or require different classification schemas. History shows that trying
to impose a particular classification schema does not work; therefore, we decouple
SRPs and Classifiers, as shown in the metamodel. The catalogue is thus considered
flat, and the Classification Schemas just impose different structuring schemas on top
of it. Classifiers are organized into a hierarchy, and SRP are in fact bound to
Basic Classifiers, whilst Compound Classifiers just impose this hierarchical structure.
The use of aggregation avoids cycles without further integrity constraints. Last, a
derived class Root is introduced as a facilitation mechanism.
The metamodel shows that an SRP may be bound to several classification schemas,
and even to more than one classifier in a single classification schema (since no further
restrictions are declared). Also note that we do not impose unnecessary constraints
that could make the catalogue rigid. For instance, a classification
schema may not cover all existing SRP (i.e., some SRP may not be classified).
Although this situation could be seen as a kind of incompleteness, it in fact
allows having dedicated classification schemas for particular categories of patterns,
e.g. a performance classification schema or a classification schema just for the non-technical
criteria [16], and then compounding them into a multi-source
global classification schema. We also remark that the PABRE method [15] benefits
from the existence of multiple classification schemas, since nothing prevents changing
from one schema to another during catalogue browsing.
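As a rough illustration of how classification is decoupled from the patterns themselves, the following sketch models classification schemas built from compound and basic classifiers on top of a flat catalogue. All names and the example schema are ours and only hint at one possible realization of the metamodel.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BasicClassifier:
    name: str
    patterns: List[str] = field(default_factory=list)     # names of bound SRPs

@dataclass
class CompoundClassifier:
    name: str
    children: List[object] = field(default_factory=list)  # Basic or Compound

@dataclass
class ClassificationSchema:
    name: str
    root: CompoundClassifier                               # the derived Root

def patterns_under(classifier) -> List[str]:
    """Collect all SRPs reachable from a classifier; the aggregation
    hierarchy guarantees there are no cycles to worry about."""
    if isinstance(classifier, BasicClassifier):
        return list(classifier.patterns)
    result: List[str] = []
    for child in classifier.children:
        result += patterns_under(child)
    return result

# A pattern may appear in several schemas, and a schema need not cover
# the whole (flat) catalogue.
performance = ClassificationSchema(
    "Performance",
    CompoundClassifier("Root", [BasicClassifier("Response time",
                                                ["Failure Alerts"])]))
print(patterns_under(performance.root))   # ['Failure Alerts']
```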
References
1. Lam, W., McDermid, J.A., Vickers, A.J.: Ten Steps Towards Systematic Requirements
Reuse. REJ 2(2) (1997)
2. Robertson, S., Robertson, J.: Mastering the Requirements Process, 2nd edn. Addison-
Wesley, Reading (2006)
3. Alexander, C.: The Timeless Way of Building. Oxford University Press, New York (1979)
4. Henninger, S., Corrêa, V.: Software Pattern Communities: Current Practices and Challenges.
In: PLoP 2007 (2007)
5. Durán, A., Bernárdez, B., Ruiz, A., Toro, M.: A Requirements Elicitation Approach Based
in Templates and Patterns. In: WER 1999 (1999)
6. Moros, B., Vicente, C., Toval, A.: Metamodeling Variability to Enable Requirements Re-
use. In: EMMSAD 2008 (2008)
7. Robertson, S.: Requirements Patterns Via Events/Use Cases. In: PLoP 1996 (1996)
8. López, O., Laguna, M.A., García, F.J.: Metamodeling for Requirements Reuse. In: WER
2002 (2002)
9. Konrad, S., Cheng, B.H.C.: Requirements Patterns for Embedded Systems. In: RE 2002 (2002)
10. Matheson, D., Ray, I., Ray, I., Houmb, S.H.: Building Security Requirement Patterns for
Increased Effectiveness Early in the Development Process. In: SREIS 2005 (2005)
11. Mahfouz, A., Barroca, L., Laney, R.C., Nuseibeh, B.: Patterns for Service-Oriented Infor-
mation Exchange Requirements. In: PLoP 2006 (2006)
12. Withall, S.: Software Requirement Patterns. Microsoft Press, Redmond (2007)
13. Yang, J., Liu, L.: Modelling Requirements Patterns with a Goal and PF Integrated Analysis
Approach. In: COMPSAC 2008 (2008)
14. Krystkowiak, M., Bucciarelli, B.: COTS Selection for SMEs: a Report on a Case Study
and on a Supporting Tool. In: RECOTS 2003 (2003)
15. Renault, S., Méndez, O., Franch, X., Quer, C.: A Pattern-based Method for building Requirements
Documents in Call-for-tender Processes. IJCSA 6(5) (2009)
16. Carvallo, J.P., Franch, X., Quer, C.: Managing Non-Technical Requirements in COTS
Components Selection. In: RE 2006 (2006)
17. Méndez, O., Franch, X., Quer, C.: Requirements Patterns for COTS Systems. In: ICCBSS
2008 (2008)
Validation of the Effectiveness of an
Optimized EPMcreate as an Aid for
Creative Requirements Elicitation
1 Introduction
Many have observed the importance of creativity in requirements engineering, e.g.,
[1,2,3]. Many techniques, e.g., brainstorming [4], Six Thinking Hats [5], and the Cre-
ative Pause Technique [6], have been developed to help people be more creative. Some
of these techniques have been applied to requirements engineering [7,2], and some of
these techniques have also been subjected to experimental validation of their effective-
ness [7,8]. A fuller discussion of these techniques can be found elsewhere [9].
This paper investigates a variant of the creativity enhancement technique (CET),
EPMcreate (EPM Creative Requirements Engineering [A] TEchnique) [9,10], that is
based on the Elementary Pragmatic Model (EPM) [11] and on a general-purpose CET
developed to increase individual creativity [12]. The feasibility of applying EPMcreate
to idea generation in requirements elicitation was established by experiments on two
computer-based system (CBS) development projects with very different characteristics.
Each experiment compared the requirements idea generation of two analysis teams, one
using EPMcreate and the other using brainstorming [9]. Because EPMcreate was a new
CET being applied to requirements elicitation, it had been necessary to define both the
input of a requirements elicitation session with EPMcreate and the process. The main
inputs of such a session are:
the problem statement or any other information useful for the CBS to be developed,
and
an understanding of the viewpoints of different stakeholders of the CBS, as defined
by EPM, i.e., based on a systematic enumeration of all possible combinations of
the stakeholders' viewpoints.
The definition of the process describes the steps and the activities to be performed at
each step.
Effectiveness was chosen as the first research question: Is EPMcreate at least as
effective as brainstorming? Brainstorming was chosen as the basis for a comparative
measure of effectiveness, because: (1) it is well known [1,13]; and (2) there are at least
two studies of its application in requirements elicitation [7,14], one experimental and
the other anecdotal.
The results of the first experiments confirmed that, in at least the situations of the
experiments, EPMcreate:
1. can be used by analysts, both junior and senior, requiring only minimal training and
2. produces more ideas and, in particular, more innovative ideas than does brainstorm-
ing.
Another investigation [10] compared the quality of the ideas produced by the two treat-
ments in these same experiments and concluded that EPMcreate produced more ideas
related to content and service requirements than did brainstorming.
The first experiments exposed a number of issues to be explored in the future. These
include:
Are there optimizations of EPMcreate, which involve fewer steps than EPM-
create, that are at least as effective as EPMcreate in helping to generate ideas
for requirements for CBSs?
Since an optimization of EPMcreate requires fewer steps than the full EPMcreate, if the
optimization is only at least as effective as the full EPMcreate, the optimization is still
an improvement.
The purpose of this paper is to take up this question. This paper describes one opti-
mization of EPMcreate and demonstrates its effectiveness as a creativity enhancement
technique (CET). It reports on a controlled experiment that compares the optimization
with both the original EPMcreate and brainstorming when they are used to help elicit
requirements for an improved version of a Web site.
In the rest of this paper, Section 2 describes the EPMcreate technique, including
the optimization. Section 3 describes the experiment, including its hypotheses and its
steps. Section 4 gives the results of the experiment, analyzes them and determines if the
hypotheses are supported. Section 5 discusses limitations of the results, and Section 6
concludes the paper.
3.1 Hypotheses
The hypotheses to be tested with the experiment were:
H1. The POEPMcreate is more effective than the full 16-step EPMcreate in helping to
generate requirement ideas.
H2. The full 16-step EPMcreate is more effective than brainstorming in helping to gen-
erate requirement ideas.
Note that H1 is stronger than needed since all we require is that, as an optimization, the
POEPMcreate is at least as effective as the full EPMcreate. Therefore, if H1 were not
supported, it would be acceptable if its corresponding null hypothesis, that there is no
difference in the effectiveness of the two CETs, were supported. As it turned out, H1
is supported. However, it is always nice when an optimization proves to be better than
required.
We considered the hypothesis H2 because the original experiments [9] addressing
this same hypothesis did not get generalizable results, although the results were sig-
nificant for the CBSs and subjects studied in the experiments. The generalization is
addressed by testing the effectiveness of EPMcreate to help generate requirement ideas
for a different CBS with different subjects.
The raw number of ideas generated was used because one of the CETs evaluated, brain-
storming, encourages quantity over quality in its first step.
The basis for evaluating the quality of an idea is the notion that a creative idea is both
new and useful [8]. Therefore, as suggested by Mich et al. [9], the quality of an idea
was evaluated by classifying it into one of 4 rankings:
1. new and realizable
2. new and not realizable
3. not new and not realizable
4. not new and realizable
with 1 being the highest ranking and 4 being the lowest ranking.
An idea is considered new if the idea is not already implemented in the current
Web site. Realizable includes several notions: (1) useful for at least one stakeholder,
(2) technically implementable, and (3) socially and legally implementable, thus also
excluding privacy invading ideas.
This ranking embodies three independent assumptions:
that a new idea is better than a not new idea,
that a realizable idea is better than a not realizable idea, and
that, among the not new ideas, a not realizable one is more creative, since it is more
outside the box.1
To evaluate the quality of the ideas, each of two domain experts, namely the first
two authors of this paper, independently classified each idea into one of 4 rankings.
In order to reduce the chances that the authors' desired results might affect the quality
evaluation, we merged the requirement ideas generated by the 6 groups into one file.
We then sorted the ideas alphabetically to produce the list of ideas to be evaluated. With
the merged and sorted list, it was impossible for any evaluator to see which group, with
its known CET, generated any idea being evaluated. After each evaluator had assigned
a ranking to each idea, the rankings were copied to the original idea files, in order to be
able to evaluate the quality of the requirement ideas of each group separately.
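The blinding procedure described in this paragraph is easy to mechanize. The sketch below is only an illustration of that procedure: the idea texts, the data layout and the evaluate function are placeholders, while the 1-4 scale is the ranking defined earlier.

```python
import random

# Idea lists per group, keyed by group id (hypothetical data layout).
ideas_by_group = {
    1: ["alert admin by email on failure", "export report as PDF"],
    2: ["show calendar of events"],
    # ... groups 3-6
}

# 1. Merge the groups' ideas and sort them alphabetically so that an
#    evaluator cannot tell which group (and hence which CET) produced an idea.
merged = sorted({idea for ideas in ideas_by_group.values() for idea in ideas})

# 2. Each evaluator independently assigns a ranking from 1 to 4
#    (1 = new and realizable ... 4 = not new and realizable).
def evaluate(ideas):
    return {idea: random.randint(1, 4) for idea in ideas}  # placeholder judgment

rankings_eval1 = evaluate(merged)
rankings_eval2 = evaluate(merged)

# 3. Copy the rankings back to the per-group lists so that quality can be
#    compared per group and per CET.
per_group_rankings = {
    g: [(idea, rankings_eval1[idea], rankings_eval2[idea]) for idea in ideas]
    for g, ideas in ideas_by_group.items()
}
print(per_group_rankings[1])
```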
Step 2: 30 minutes for each subject to take a creativity assessment test, the modified
Williams test described in Section 3.4.
Step 3: 10 minutes for us to deliver to each group an explanation about the experiment
and the CET it was to use:
To the EPMcreate and POEPMcreate groups, the explanation was basically the
last two paragraphs of Section 2.2 of this paper. The full-EPMcreate groups
were given the full list of steps, and the POEPMcreate groups were given only
Steps 1, 2, 4, and 8.
To the brainstorming groups, the explanation emphasized that the main goal of
a brainstorming session is to generate as many ideas as possible. Our recom-
mendations and requests were:
1. Don't judge; be open to all ideas, and consider them carefully and respectfully
in an unprejudiced manner.
2. Encourage the unusual with no limits placed on your imaginations.
3. The more ideas you generate, the better.
4. Improve on the ideas of others; no one is the exclusive owner of any idea.
5. Try to produce as many ideas as possible; don't evaluate any idea and
don't inhibit anyone from participating.
Each group was told that it had only 120 minutes for its session and that it
could finish its session earlier if its members agreed that no other aspects could be
discussed. Then, each group was given a short training session, with practice,
about the CET it was to use.
Step 4: 120 minutes for each group to carry out its requirements elicitation session
using the group's CET: Each group consisted of 4 subjects and was provided with
two laptops: one to access the Web site that the group was to improve, and the other
to write the requirement ideas generated by the group.
Each group, except one, used the full 120 minutes for requirements elicitation.
Group 5, a brainstorming group, finished 25 minutes early, claiming that its mem-
bers could not generate any more new ideas, but it was not the group that generated
the fewest ideas.
Steps 1 and 2 were to be done in one 50-minute meeting and privately with each
respondent to our advertisement for subjects, and Steps 3 and 4 were to be done in
sessions attended by one or more groups.
some professional work experience, and 8 did not. Eleven subjects had English as a
native language, and 13 did not. All the subjects were familiar with brainstorming, but
none had heard about any form of EPMcreate. Therefore, one might expect the groups
using brainstorming to have an advantage; as it turned out, any such advantage proved
to be of no help.
Six groups were created for the experiment: Groups 1 and 2 used POEPMcreate as
their CET, Groups 3 and 4 used the full 16-step EPMcreate as their CET, and Groups 5
and 6 used brainstorming as their CET.
In order to create homogeneous groups in the experiment with equivalent spreads of
CS knowledge, English fluency, work experience, and native creativity, we used data we
had gathered about each subject in Steps 1 and 2. The data that we used were the number
of CS courses the subject had taken, the subject's native language, whether the subject
had worked professionally, and the results of the subject's taking an adult version of Frank
Williams's Creativity Assessment Packet [16], hereinafter called the Williams test.
As in past experiments [9,17], the Williams test was administered to each subject to
measure his or her native creativity. The subjects' test scores were to be used to ensure
that any observed differences in the numbers of ideas were not due to differences in the
native creativity of the subjects. In order to avoid having to interpret specific scores,
we used the subjects' Williams test scores as one of the factors to consider in forming
knowledge-, skill-, experience-, and native-creativity-balanced groups.
The properties of the groups are shown in Table 1. Note that the average Williams
test scores for the 6 groups were in the small range from 70.25 to 71.6 out of a possible
100. We did not consider gender or age in creating the groups because it would have
been very difficult to balance these factors while balancing the other factors. Moreover,
we did not believe that these factors are relevant; and even if they are, they are probably
less relevant than the ones we did consider. Note that the table shown is not exactly the
original table calculated during the formation of the groups; it is the final table produced
taking into account the two alternates who replaced the two originally assigned subjects
who did not show up. Fortunately, the alternates did not change the balance of the groups.
After assigning the subjects to each of the 6 groups, we drew lots for the groups
to determine which groups were going to use each of the three CETs. The first two
columns of Table 1 show the resulting assignment of groups to CETs.
Each group contained the same number of subjects and participated in its session for
the same period of time, so that the resources for all the groups would be the same. We
tried to schedule all the groups into one session, but could not find a single time slot that
all could attend. So we allowed each group to choose a time slot that was convenient
for all of its members. So, while the session times for the groups differ, all times were
at the subjects' convenience. Thus, we believe that differences in the subjects' moods
and energy levels that might arise from differences in session times were minimized.
Finally, all groups generated requirement ideas for the same Web site. Thus, only the
numbers and quality of the creative ideas generated need to be compared in order to
compare the effectiveness of the three CETs.
[Figure: bar chart of the total number of requirement ideas generated by each of the six groups (two POEPMcreate, two EPMcreate and two brainstorming groups); x-axis: Creativity Enhancement Technique.]
[Figure: two bar charts of the quality of the ideas generated by each group, classified as New and Realizable, New and Not Realizable, Not New and Not Realizable, or Not New and Realizable, for the two POEPMcreate, two EPMcreate and two brainstorming groups.]
parts of the space and undervisiting other parts of the space. We suspect that POEPM-
create is more effective than EPMcreate because it gives a way to visit the entire space
in fewer steps, with fewer mind shifts between the steps.
In spite of the small number of data points, which might argue against significance, the
data about both the quantity and the quality of the ideas generated do produce results
that, based on a two-sample T-test for unequal variances, are statistically significant
at various levels ranging from 0.05 to 0.12. The statistical results are corroborated by
the observations and opinions of the subjects. These results thus indicate that POEPM-
create helps to generate more and better requirement ideas than EPMcreate does and
that EPMcreate helps to generate more and better requirement ideas than brainstorm-
ing does. We therefore conclude that Hypotheses H1 and H2 are supported and that the
experiment should be replicated.
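For readers wishing to reproduce this kind of analysis, a two-sample T-test for unequal variances is Welch's t-test, available in SciPy. The values below are illustrative, not the experiment's data, and the one-sided treatment of the p-values is our assumption given the directional hypotheses.

```python
from scipy import stats

# Illustrative idea counts per group (two groups per CET), NOT the paper's data.
poepmcreate = [80, 75]
epmcreate = [65, 60]
brainstorming = [40, 35]

# Two-sample t-test for unequal variances (Welch's t-test).
t_h1, p_h1 = stats.ttest_ind(poepmcreate, epmcreate, equal_var=False)
t_h2, p_h2 = stats.ttest_ind(epmcreate, brainstorming, equal_var=False)

# Halve the two-sided p-values because H1 and H2 are directional ("more effective").
print("H1 (POEPMcreate > EPMcreate):   p =", p_h1 / 2)
print("H2 (EPMcreate > brainstorming): p =", p_h2 / 2)
```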
The results, despite being somewhat statistically significant, do suffer from the small
numbers of subjects, groups, and thus data points involved in the experiment. It was
difficult to convince students in our School of Computer Science to be subjects. Never-
theless, the results are so promising that we are planning to conduct more experiments.
We will advertise for subjects in the whole university and will use smaller groups. These
differences will give us a chance to see if the results are independent of the subjects
major field and of the size of the groups.
Regardless, the small number of data points raises the threat of a so-called Type I
error: accepting a non-null hypothesis, i.e., making a positive claim, when it should
be rejected. Even if the data yield statistically significant results, the small number of
data points increases the probability that the positive observations were random false
positives. The only remedy for this threat is to have more data points or to replicate the
experiment, which we are already planning to do.
Construct validity is the extent to which the experiment and its various measures test
and measure what they claim to test and measure. Certainly, the groups were trying
to be creative in their idea generation. Counting of raw ideas is correct, because as
mentioned, at least one of the CETs compared has as a principal goal the generation of
as many ideas as possible. The method to evaluate the quality of an idea, determining
its novelty and its usefulness, is based squarely on an accepted definition of creativity,
that it generates novel and useful ideas.
The shakiest measure used in the experiment is the Williams test of native creativity.
With any psychometric test, such as the Williams test and the standard IQ tests, there
is always the question of whether the test measures what its designers say it measures.
The seminal paper describing the test discusses this issue [16], and the test seems to be
accepted in the academic psychometric testing field [18]. The original test was designed
for testing children, and the test seems to be used in U.S. schools to identify gifted
and talented students [19]. We modified the test to be for adults attending a university
or working [9,17]. Each of the authors has examined the test and has determined for
him- or herself that the test does examine at least something related to creativity if not
native creativity itself. Finally, the same modified-for-adults Williams test, in Italian
and English versions, has been used in all of our past experiments about CETs and will
be used in all of our future experiments about CETs. Therefore, even if the test does not
measure native creativity exactly or fully, the same error is made in all our experiments
so that the results of all of these experiments should be comparable.
Internal validity is whether one can conclude the causal relationship that is being
tested by the experiment. In this case, we are claiming that the differences in CETs
caused the observed differences in the quantity and quality of the requirement ideas
generated by use of the CETs. We know from being in the room with the groups that
each group was actively using its assigned CET while it was generating its ideas. We
carefully assigned subjects to the groups so that the groups were balanced in all personal
factors, especially native creativity, that we thought might influence the subjects'
abilities to generate requirement ideas. Therefore, we believe that the only factor that
can account for the differences in the number of ideas is the CET being used by the
groups. The opinions volunteered by the subjects during the sessions corroborate this
belief.
External validity is whether the results can be generalized to other cases, with differ-
ent kinds of subjects, with different kinds of CBS. Certainly the small number of data
points stands in the way of generalization.
One threat to external validity is the use of students as subjects instead of require-
ments elicitation or software development professionals. However, our student subjects
had all studied at least a few courses in computer science and software engineering.
Moreover, each group had at least one subject with professional experience in comput-
ing. One could argue that the subjects were equivalent to young professionals, each at
an early stage in his or her career [20].
Another threat to external validity is the particular choice of the types of stakeholders
whose viewpoints were used by EPMcreate and POEPMcreate sessions. Would other
choices, e.g., of teachers, work as well?
Yet another threat to external validity is the single Web site as the CBS for which to
generate requirement ideas. Would a different Web site or even a different kind of CBS
inhibit the effectiveness of any CET?
In any case, our plans for future experiments, to use different kinds of subjects, dif-
ferent sized groups, different stakeholder viewpoints, and different CBSs for which to
generate requirement ideas, address these threats to external validity.
These threats limit the strength of the conclusion of support for the hypotheses and
dictate the necessity to replicate the experiments.
6 Conclusions
This paper has described an experiment to compare the effectiveness of three CETs,
EPMcreate, POEPMcreate, and brainstorming. The experiment tested two hypothe-
ses that say that POEPMcreate is more effective in helping to generate new require-
ment ideas than EPMcreate, which is in turn more effective in helping to generate new
requirement ideas than brainstorming. The data from the experiment support both hy-
potheses, albeit not with uniformly high significance, due to the low number of subjects
participating in the experiment. However, the support is strong enough that it is worth
conducting more experiments to test these hypotheses, with more subjects and different
CBSs about which to generate requirement ideas. Should you want to conduct these
experiments, please avail yourself of the experimental materials we used [21].
It is necessary also to compare POEPMcreate with CETs other than EPMcreate and
brainstorming. We also suggest evaluating the effectiveness of other optimizations of
EPMcreate and of other orderings of the steps of EPMcreate, POEPMcreate, and the
other optimizations.
Mich, Berry, and Alzetta [17] have compared the effectiveness of EPMcreate applied
by individuals to the effectiveness of EPMcreate applied by groups. It will be interesting
to do a similar comparison for POEPMcreate and other optimizations that prove to be
effective.
Finally, recall that Hypothesis H1 was stronger than needed. All that was required
to satisfy us is that POEPMcreate be at least as effective as EPMcreate. While the
support for H1 is not as strong as desired, the support for a logical union of H1 and its
null hypothesis would be stronger. As an optimization, POEPMcreate is easier to apply
and easier to teach than EPMcreate. POEPMcreate's fewer steps mean that either it
requires less time to use or there is more time in each step for idea generation.
POEPMcreate's fewer steps also mean that less time is wasted shifting the users' mental
focus.
Acknowledgments
The authors thank William Berry for his advice on matters of statistical significance.
They thank the referees and shepherds for their comments; in particular, they thank
Sam Fricker for his persistence and his conciseness-improving suggestions. Victoria
Sakhnini's and Luisa Mich's work was supported in part by a Cheriton School of
Computer Science addendum to the same Canadian NSERC-Scotia Bank Industrial
Research Chair that is supporting Daniel Berry. Daniel Berry's work was supported
in part by Canadian NSERC grant NSERC-RGPIN227055-00 and by a Canadian
NSERC-Scotia Bank Industrial Research Chair NSERC-IRCPJ365473-05.
References
1. Gause, D., Weinberg, G.: Exploring Requirements: Quality Before Design. Dorset House,
New York (1989)
2. Maiden, N., Gizikis, A., Robertson, S.: Provoking creativity: Imagine what your require-
ments could be like. IEEE Software 21, 68-75 (2004)
3. Nguyen, L., Shanks, G.: A framework for understanding creativity in requirements engineer-
ing. J. Information & Software Technology 51, 655-662 (2009)
4. Osborn, A.: Applied Imagination. Charles Scribner's Sons, New York (1953)
5. de Bono, E.: Six Thinking Hats. Viking, London (1985)
6. de Bono, E.: Serious Creativity: Using the Power of Lateral Thinking to Create New Ideas.
Harper Collins, London (1993)
7. Aurum, A., Martin, E.: Requirements elicitation using solo brainstorming. In: Proc. 3rd Australian
Conf. on Requirements Engineering, pp. 29-37. Deakin University, Australia (1998)
8. Jones, S., Lynch, P., Maiden, N., Lindstaedt, S.: Use and influence of creative ideas and
requirements for a work-integrated learning system. In: Proc. 16th IEEE International Re-
quirements Engineering Conference, RE 2008, pp. 289-294. IEEE Computer Society, Los
Alamitos (2008)
9. Mich, L., Anesi, C., Berry, D.M.: Applying a pragmatics-based creativity-fostering technique
to requirements elicitation. Requirements Engineering J. 10, 262-274 (2005)
10. Mich, L., Berry, D.M., Franch, M.: Classifying web-application requirement ideas gener-
ated using creativity fostering techniques according to a quality model for web applications.
In: Proc. 12th Int. Workshop Requirements Engineering: Foundation for Software Quality,
REFSQ 2006 (2006)
11. Lefons, E., Pazienza, M.T., Silvestri, A., Tangorra, F., Corfiati, L., De Giacomo, P.: An al-
gebraic model for systems of psychically interacting subjects. In: Dubuisson, O. (ed.) Proc.
IFAC Workshop Information & Systems, Compiègne, France, pp. 155-163 (1977)
12. De Giacomo, P.: Mente e Creatività: Il Modello Pragmatico Elementare Quale Strumento
per Sviluppare la Creatività in Campo Medico, Psicologico e Manageriale. Franco Angeli,
Milano, Italy (1995) (in Italian)
13. Leffingwell, D., Widrig, D.: Managing Software Requirements: a Unified Approach, 5th edn.
Addison-Wesley, Boston (1999)
14. Telem, M.: Information requirements specification I & II: Brainstorming collective decision-making
approach. Information Processing & Management 24, 549-557, 559-566 (1988)
15. Administrator: Sir John A MacDonald High School Web Site (Viewed November 16-20,
2009), http://sja.ednet.ns.ca/index.html
16. Williams, F., Taylor, C.W.: Instructional media and creativity. In: Proc. 6th Utah Creativity
Research Conf., New York, NY, USA. Wiley, Chichester (1966)
17. Mich, L., Berry, D.M., Alzetta, A.: Individual and end-user application of the epmcreate
creativity enhancement technique to website requirements elicitation. Technical report,
School of Computer Science, University of Waterloo (2009),
http://se.uwaterloo.ca/~dberry/FTP_SITE/tech.reports/MichBerryAlzetta.pdf
18. Dow, G.: Creativity Test: Creativity Assessment Packet (Williams, 1980), R546 Instructional
Strategies for Thinking, Collaboration, and Motivation, AKA: Best of Bonk on the Web
(BOBWEB). Technical report, Indiana University (Viewed March 7, 2010)
19. West Side School District: Gifted and Talented Program. Technical report, West Side Public
Schools, Higden, AR, USA (Viewed March 7, 2010)
20. Berander, P.: Using students as subjects in requirements prioritization. In: Proceedings of the
International Symposium on Empirical Software Engineering (ISESE 2004), pp. 167-176.
IEEE Computer Society, Los Alamitos (2004)
21. Sakhnini, V., Berry, D., Mich, L.: Materials for Comparing POEPMcreate, EPMcreate,
and Brainstorming. Technical report, School of Computer Science, University of Waterloo
(Viewed March 7, 2010),
http://se.uwaterloo.ca/~dberry/FTP_SITE/software.distribution/EPMcreateExperimentMaterials/
Towards Multi-view Feature-Based Configuration
1 Introduction
Two challenges that FBC techniques fail to address in a satisfactory way are (1) tailoring
the configuration environment according to the stakeholder's profile (knowledge,
role, preferences, ...) and (2) managing the complexity resulting from the size of the FD.
In this paper, we outline a solution strategy to address those two challenges. We do
so by extending FDs with multiple views that can be used to automatically build FD
visualisations. A view is a streamlined representation of an FD that has been tailored
for a specific stakeholder, task or, more generally, a combination of such elements,
which we call a concern. Views facilitate configuration in that they focus only on those
parts of the FD that are relevant for a given concern. Using multiple views is thus a
way to achieve separation of concerns (SoC) in FDs. SoC helps make FD-related
tasks less complex by letting stakeholders concentrate on the parts that are relevant to
them while hiding the others. Further tailoring is supported through
the selection of three alternative visualisations: (1) greyed out, (2) pruned and (3)
collapsed.
In the rest of this paper, we elaborate on these ideas. Section 2 introduces FDs.
A motivating example is given in Section 3. Section 4 presents our basic strategy for
constructing views.
2 Feature Diagram
Schobbens et al. [3] defined a generic formal semantics for a wide range of FD dialects.
We only recall the basic concepts here. In essence, an FD d is a hierarchy of features (typically
a tree) topped by a root feature. Each feature has a cardinality i..j attached to
it, where i (resp. j) is the minimum (resp. maximum) number of children (i.e. features
at the level below) required in a product (aka configuration). For convenience, common
cardinalities are denoted by Boolean operators, as shown in Table 1. Additional
constraints that crosscut the tree can also be added and are defined, without loss of
generality, as a conjunction of Boolean formulae. The semantics of an FD is its set of
products. The full syntax and semantics as well as benefits, limitations and applications
of FDs are extensively discussed elsewhere [3,9].
FBC tools use FDs to pilot the configuration of customisable products. These tools
usually render FDs in an explorer-view style [10,4], as in the upper part of Table 1.
The tick boxes in front of features are used to capture decisions, i.e. whether the
features are selected or not. We now illustrate this more concretely with a motivating
example.
Table 1 (the upper part, showing the concrete explorer-view style syntax for a feature f decomposed into subfeatures g and h, is not reproduced here):
Boolean operator:  and    or     xor    optional   non-standard
Cardinality:       n..n   1..n   1..1   0..1       i..j
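As a small illustration of these concepts, the sketch below represents a feature tree whose nodes carry i..j cardinalities and checks a candidate product against them. The feature names anticipate the CFDP example of the next section; the class and helper names are our own.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Feature:
    name: str
    card_min: int = 0              # i in the cardinality i..j
    card_max: int = 0              # j in the cardinality i..j
    children: List["Feature"] = field(default_factory=list)

def or_node(name: str, children: List[Feature]) -> Feature:
    """or-decomposition: cardinality 1..n, where n is the number of children."""
    return Feature(name, 1, len(children), children)

# Tiny, simplified excerpt of the CFDP FD (the real diagram has 80 features).
cfdp = or_node("CFDP", [Feature("Send"), Feature("Receive"),
                        Feature("Reboot"), Feature("Extended"), Feature("PUS")])

def respects_cardinalities(f: Feature, selected: Set[str]) -> bool:
    """Every selected feature must select between card_min and card_max
    of its children; crosscutting constraints are not modelled here."""
    if f.name not in selected:
        return True
    chosen = sum(1 for c in f.children if c.name in selected)
    if f.children and not (f.card_min <= chosen <= f.card_max):
        return False
    return all(respects_cardinalities(c, selected) for c in f.children)

print(respects_cardinalities(cfdp, {"CFDP", "Receive", "Reboot"}))  # True
print(respects_cardinalities(cfdp, {"CFDP"}))                       # False
```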
3 Motivating Example
Spacebel is a Belgian software company developing software for the aerospace industry.
We collaborate with Spacebel on the development of a SPL for flight-grade libraries
implementing the CCSDS File Delivery Protocol (CFDP) [8]. CFDP is a file transfer
protocol designed for space requirements, such as long transmission delays and specific
hardware characterised by stringent resource limitations. Spacebel built a SPL of CFDP
libraries, where each library can be tailored to the needs of a specific space mission.
The FD of the CFDP SPL counts 80 features, has a maximal depth of four and contains
ten additional constraints. A simplified excerpt of this FD appears in the upper part
of Fig. 1 (see footnote 1). The principal features provide the capability to send (Send) and receive
(Receive) files. The Extended feature allows a device to send and receive packets via
other devices. The Reboot feature allows the protocol to resume transfers safely after
a sudden system reboot. PUS stands for Packet Utilisation Standard, part of the ESA
standard for transport of telemetry and telecommand data (TMTC). The PUS feature
implements the CFDP-related services of this standard.
[Figure 1 is not reproduced here; its upper part shows an excerpt of the CFDP FD with features such as CFDP, Receive (R), Reboot (O), Reboot Entity and Reboot PUS, and its lower part shows the greyed, pruned and collapsed visualisations.]
Fig. 1. FD of the CFDP with three alternative visualisations for the view of the TMTC integrator
1 An online version designed with SPLOT, an open source web-based FBC tool, is available at
http://www.splot-research.org/.
CFDP typically handles four different stakeholder profiles. Spacebel decides which
features are mature enough for the mission, while leaving as much variability as possi-
ble. The system engineer makes initial high-level choices and passes the task of refining
these choices on to the network integrator and the TMTC integrator who handle the
technical aspects of the CFDP. The configuration options of interest for each of these
profiles are thus different and limited in scope.
A major problem is that access rights to these configuration options are currently
informally defined, and FDs offer no way to specify them. In the absence of clear access
specifications, a simplistic policy has been implemented: all profiles have access to all
configuration options. A reported consequence is that sometimes the system engineer does
not have sufficient knowledge to fully understand low-level options and make decisions.
The results were incorrect settings, e.g., inappropriate CPU consumption or excessive
use of memory for a given hardware platform. Similarly, the integrators were not aware of
general decisions and could make choices inconsistent with the mission's goals.
The changing context also demands flexible definitions of access policies. For in-
stance, there can be variations in the access rights (e.g., the integrators are granted
access to more features) or stakeholder profiles (e.g. a dedicated File System integrator
might be needed in some projects).
This situation provided the initial motivation for the solution outlined in this paper.
However, as we will see, the solution is applicable to a wider variety of problems than
the sole definition of configuration access rights. Its ambition is to extend FDs with
support for multiple perspectives.
two areas (red and orange) respectively contain the technical features that should be
accessible to the TMTC and network integrators.
View coverage. An important property to be guaranteed by a FBC system is that all
configuration questions be eventually answered [7], i.e. that a decision be made for each
feature of the FD. A sufficient condition is to check that all the features in the FD are in
the views of V. The FD of Figure 1 fulfils that condition. But this is not necessary, since
some decisions can usually be deduced from others.
A necessary and sufficient condition can be defined using the notion of propositional
definability [11]. We need to ensure that the decisions on the features that do not appear
in any view can be inferred from (are propositionally defined by) the decisions made
on the features that are part of some view. This can be achieved by translating the FD
into an equivalent propositional formula and applying the algorithm described in [11].
Features that do not belong to any view and that do not satisfy the above condition will
have to be added to existing views, or new views will have to be created to configure
them.
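A brute-force way to check this necessary and sufficient condition, workable only for small FDs, is to enumerate the valid configurations and test whether the decision on each feature outside the views is determined by the decisions visible in the views. The sketch below is our illustration of that idea, not the algorithm of [11]; it assumes the set of valid products has already been computed (for instance with a cardinality check like the one sketched earlier).

```python
from itertools import groupby
from typing import Iterable, List, Set

def undetermined_features(products: Iterable[Set[str]],
                          view_features: Set[str],
                          all_features: Set[str]) -> List[str]:
    """Return the features outside the views whose value is NOT always
    determined by the decisions on the view features, i.e. the features
    that still need to be added to some view."""
    hidden = all_features - view_features
    result = set()
    visible = lambda p: tuple(sorted(p & view_features))
    # Group the valid products by the decisions made on the view features.
    for _, group in groupby(sorted(products, key=visible), key=visible):
        group = list(group)
        for f in hidden:
            if len({f in p for p in group}) > 1:   # same visible choices, different value
                result.add(f)
    return sorted(result)

# Toy example: C is always selected together with A, so a view containing
# only A and B still determines C; D remains undetermined.
products = [{"root", "A", "C"}, {"root", "A", "C", "D"}, {"root", "B"}]
print(undetermined_features(products, {"root", "A", "B"},
                            {"root", "A", "B", "C", "D"}))   # ['D']
```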
View interactions. Another important property of FBC is that it should always lead to
valid configurations [7]. In our case, doing the configuration through multiple views is
not a problem per se. This is because, although stakeholders only have partial views, the
FBC system knows the whole FD and is thus capable of propagating the choices made
in one view to the others. However, problems can arise when the selection of a feature
in one view depends on the selection of another feature in another view. If overriding
of decisions across views is not allowed, then we must introduce some form of conflict
resolution mechanisms. This is a complex issue for which various strategies can be
elaborated. One is to introduce priorities on views [12]. Another one is to constrain the
order in which views are configured [8].
Visualisation. Views are abstract entities. To be effectively used during FBC, they need
to be made concrete, i.e. visual. We call a visual representation of a view a visualisa-
tion. The goal of a visualisation is to strike a balance between (1) showing only features
that belong to a concern and (2) including features that are not in the the concern but
that allow the user to make informed decisions. For instance, the PUS copy feature is
in the view of the TMTC integrator, but its parent feature PUS is not: How will that in-
fluence the decision making process? To tackle this problem, we see three visualisation
alternatives with different levels of details (see lower part of Figure 1).
The greyed visualisation is a mere copy of the original FD in which the features
that do not belong to the view are greyed out (e.g. P , S, SF and SA). Greyed out
features are only displayed but cannot be manually selected/deselected. In the pruned
visualisation, features that are not in the view are pruned (e.g. S, SF and SA) un-
less they appear on a path between a feature in the view and the root, in which case
they are greyed out (e.g. P ). Pruning can have an impact on cardinalities. As shown
in Figure 1, the cardinality of CFDP is 0..1 whereas it is 1..5 (or-decomposition)
in the FD. It has to be recomputed to ensure the consistency of the FD. In the col-
lapsed visualisation, all the features that do not belong to the view are pruned. A feature
in the view whose parent or ancestors are pruned is connected to the closest ancestor
that is still in the view. If no ancestor is in the view, the feature is directly connected to
the root (e.g. P C and P R). Similarly, cardinalities have to be recomputed for
consistency reasons.
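The pruned visualisation, for example, can be computed with a simple tree walk: keep the features of the view, grey out features that merely lie on a path from a view feature to the root, and drop everything else. The sketch below (reusing the hypothetical Feature class from the earlier snippet) only conveys the idea; in particular it does not recompute cardinalities, which, as noted above, a real implementation must do.

```python
from typing import Optional, Set

def prune(feature, view: Set[str]) -> Optional[tuple]:
    """Return (name, greyed, children) for the pruned visualisation of a
    view, or None if the whole subtree disappears."""
    kept_children = [c for c in (prune(ch, view) for ch in feature.children) if c]
    in_view = feature.name in view
    if not in_view and not kept_children:
        return None                  # neither in the view nor an ancestor of a view feature
    greyed = not in_view             # kept only as a path to the root
    return (feature.name, greyed, kept_children)

# Example with the CFDP excerpt defined earlier and a TMTC-like view:
# print(prune(cfdp, {"CFDP", "PUS", "Reboot"}))
```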
5 Conclusion
In this paper, we have outlined an approach to specify views on feature diagrams in or-
der to facilitate feature-based configuration, one of the main techniques to define prod-
uct requirements in software product lines. Three alternative visualisations were pro-
posed, each offering different levels of detail. This work was motivated by an ongoing
collaboration with the developers of Spacebel, a Belgian software company developing
software for the aerospace industry. A preliminary evaluation with the developers of an
open source web-based meeting management system is also in progress [13].
Several items of future work can be envisioned. First, a more thorough evaluation should
be carried out. Second, we will have to address the problem of conflicting configuration
decisions across views. Third, the formalisation needs to be refined and the implementation
pursued. Currently, we only have standalone algorithms implementing our
transformations. The rest of our approach needs to be developed, integrated into a feature
modelling and configuration environment, and properly validated.
Acknowledgements
This work is sponsored by the Interuniversity Attraction Poles Programme of the Bel-
gian State, Belgian Science Policy, under the MoVES project.
References
1. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering: Foundations,
Principles and Techniques. Springer, Heidelberg (July 2005)
2. Kang, K., Cohen, S., Hess, J., Novak, W., Peterson, S.: Feature-Oriented Domain Analy-
sis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21, SEI, Carnegie Mellon
University (November 1990)
3. Schobbens, P.Y., Heymans, P., Trigaux, J.C., Bontemps, Y.: Feature Diagrams: A Survey and
A Formal Semantics. In: RE 2006, pp. 139-148 (September 2006)
4. Mendonca, M.: Efficient Reasoning Techniques for Large Scale Feature Models. PhD thesis,
University of Waterloo (2009)
5. Czarnecki, K., Helsen, S., Eisenecker, U.W.: Formalizing cardinality-based feature models
and their specialization. Software Process: Improvement and Practice 10(1), 7-29 (2005)
6. Czarnecki, K., Helsen, S., Eisenecker, U.W.: Staged configuration through specialization
and multi-level configuration of feature models. Software Process: Improvement and Practice
10(2), 143-169 (2005)
7. Classen, A., Hubaux, A., Heymans, P.: A formal semantics for multi-level staged configura-
tion. In: VaMoS 2009. University of Duisburg-Essen (January 2009)
8. Hubaux, A., Classen, A., Heymans, P.: Formal modelling of feature configuration workflow.
In: SPLC 2009, San Francisco, CA, USA (2009)
112 A. Hubaux et al.
9. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models 20 years
later: A literature review. IST (2010) (preprint)
10. pure-systems GmbH: Variant management with pure: variants. Technical White Paper
(2006),
http://www.pure-systems.com/fileadmin/downloads/
pv-whitepaper-en-04.pdf
11. Lang, J., Marquis, P.: On propositional definability. Artificial Intelligence 172(8-9), 991-1017 (2008)
12. Zhao, H., Zhang, W., Mei, H.: Multi-view based customization of feature models. Journal of
Frontiers of Computer Science and Technology 2(3), 260-273 (2008)
13. Hubaux, A., Heymans, P., Schobbens, P.Y.: Supporting multiple perspectives in feature-based
configuration: Foundations. Technical Report P-CS-TR MPFD-000001, PReCISE Research
Centre, Univ. of Namur (2010),
http://www.fundp.ac.be/pdf/publications/69578.pdf
Evaluation of a Method for Proactively Managing
the Evolving Scope of a Software Product Line
1 Introduction
Product Line (PL) Engineering is a software development approach that aims at ex-
ploiting commonalities and predicted variabilities among software products that
strongly overlap in terms of functionality [1,2]. According to Knauber and Succi [3],
PLs are already intended to capture the evolution of software products and to last for a
fairly long time. In this context, one of the most important aspects to consider is the
ability of PLs themselves to evolve and change [3]. However, Savolainen and
Kuusela [4] emphasize that any given design can only handle a limited number of
different kinds of changes and, therefore, it is crucial to predict what kind of changes
will be required during the lifespan of a PL.
Current PL engineering methods [1,2,5] address pre-planned and more straightfor-
ward proactive changes across products or different versions of products. They do
not support the prediction of the not-so-straightforward future changes in products
and features, which are often triggered by change requests from inside or outside the
organization (such as a change due to the decision of a technology provider to
2 Method Overview
PLEvo-Scoping consists of four steps to be carried out by the PL scoping team, which
is generally composed of people with the following roles [8]: scoping expert, PL
manager, and domain expert, the latter with either the technical or the market point of
view.
The first step is Preparation for Volatility Analysis, which establishes the basis for
the volatility analysis and is made up of the following activities:
Activity 1: Establish the timeframe that restricts the current volatility analysis, and
Activity 2: Identify/update the types of system components that are generally in-
volved in the assembly of the planned PL products.
The second step is called Environment Change Anticipation and has the purpose of
identifying and characterizing facts that may take place in the PL's environment
within the pre-established timeframe, and that may allow or require adaptations in the
PL. This step comprises the following activities:
Activity 3: Identify the actors that play a role in the PL's environment and who
give rise to or realize facts that may affect the PL,
Activity 4: Identify and characterize facts that may be caused or realized by the
identified actors and have the potential for changing the PL's environment,
Activity 5: Verify the perspective of new actors playing a part in the PL's environment
within the volatility analysis timeframe and characterize how these actors
may change such an environment, and
Activity 6: Classify the previously characterized facts according to their relevance,
in order to decide whether and when their impact in terms of adaptation needs
should be analyzed.
The next step is called Change Impact Analysis. Its purpose is to analyze the impact
of the identified facts on the PL and consists of:
Activity 7: Identify the adaptation needs that may be allowed or required in the PL
as a consequence of the previously identified facts,
Activity 8: Characterize the adaptation needs by identifying the PL features to be
affected by them, and by estimating their business impact, technical penalty, and
technical risk, and
Activity 9: Classify the adaptation needs according to relevance, in order to decide
whether and when the inclusion of an adaptation need should be planned.
Once the most relevant adaptation needs have been selected, it is time for PL Evolu-
tion Planning. The idea is to establish when and how relevant adaptation needs are
expected to be introduced into the PL, and prepare it for accommodating the adapta-
tion needs beforehand. The activities that make up this step are:
Activity 10: Determine when and in which products relevant adaptations are ex-
pected to be introduced, which gives rise to the PL Evolution Map,
Activity 11: Analyze the alternative solutions for dealing with relevant adaptation
needs, in terms of effort, cost, effectiveness, and strategic alignment,
Activity 12: Select the best alternatives for dealing with the adaptation needs, and
Activity 13: Revise the PL Evolution Map in order to adjust it to the alternative
solutions selected, if necessary.
4 Quasi-experiment
4.1 Definition
The goal of this quasi-experiment was to characterize the adequacy and feasibility of
PLEvo-Scoping, by collecting the perception of the quasi-experiment participants as
well as some quantitative measures. These empirical data are expected to support PL
organizations in making the decision to try out the method, which will give us the
opportunity to perform further empirical studies, such as a case study in a software
company. In addition, feedback provided by the quasi-experiment participants should
be used to improve the method. According to the template proposed in the
4.2 Planning
From the goal definition, two propositions were defined: P1) PLEvo-Scoping is ade-
quate, and P2) PLEvo-Scoping is feasible. It was established for this quasi-experiment
that PLEvo-Scoping would be considered adequate (P1) if its immediate benefits (in
terms of changes in the outputs of the scoping process and information made available
to guide further activities and decisions) were considered to support PL evolution, ac-
cording to the judgment of the quasi-experiment participants; and feasible (P2) if the
obtained benefits were worth the effort required for applying the method, which would
also be judged by the quasi-experiment participants. In both cases, the quasi-experiment
participants should be provided with the results of the scoping process and support their
judgment with both qualitative and quantitative information.
By using the Goal/Question/Metric method [12], the quasi-experiment propositions
were broken down into questions and metrics. Each metric was defined in terms of
meaning, type of measure, measure scale, source, and collecting procedure [13]. The
acceptance criteria were also defined during the planning of the quasi-experiment and
will be presented in Subsection 4.4 (Data Analysis).
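A metric definition of the kind just described can be captured in a small record structure. In the sketch below, the field names simply mirror the attributes listed in the text (meaning, type of measure, measure scale, source, collecting procedure), and the example instance for proposition P1 is purely illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Metric:
    name: str
    meaning: str
    measure_type: str         # e.g. "subjective" or "objective"
    scale: str                # e.g. "ordinal 1-5", "count"
    source: str               # who or what provides the value
    collecting_procedure: str

@dataclass
class Question:
    text: str
    metrics: List[Metric]

@dataclass
class Proposition:            # e.g. P1: "PLEvo-Scoping is adequate"
    statement: str
    questions: List[Question]

# Illustrative breakdown of proposition P1 (not the study's actual GQM tree).
p1 = Proposition(
    "PLEvo-Scoping is adequate",
    [Question("Do the immediate benefits support PL evolution?",
              [Metric("perceived_benefit",
                      "participants' judgment of the benefit obtained",
                      "subjective", "ordinal 1-5", "questionnaire",
                      "filled in after the scoping sessions")])])
```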
This quasi-experiment was designed to be performed in two days by two scoping
teams in charge of separately scoping the AAL platform as a PL. Group 1 applied
treatment 1, which consisted of first using PuLSE-Eco to conduct the scoping process,
already taking into consideration the evolution concern, and then applying PLEvo-
Scoping; while Group 2 applied treatment 2, which consisted of interweaving the activi-
ties of PuLSE-Eco and PLEvo-Scoping. The purpose of treatment 1 was to allow a clear
distinction between the results before and after applying PLEvo-Scoping, while the
purpose of treatment 2 was to avoid the confounding factor (present in treatment 1) of
providing the scoping team with extra time to think about PL scope evolution after
applying PuLSE-Eco. This confounding factor would put in doubt whether the obtained
benefits were a result of this extra time or a result of the application of PLEvo-Scoping.
In addition, the application of PLEvo-Scoping in the two treatments aimed at strength-
ening the validity of the results through corroboration. Two days were allocated to each
treatment due to time restrictions.
Each group consisted of three people: one PL manager and two domain experts,
one with the technical point of view and the other one with the market point of view.
All participants were selected by the leader of the AAL research program, taking into
account their profiles and involvement in AAL research projects. Two experts on the
respective two methods (PuLSE-Eco and PLEvo-Scoping) were allocated to guide the
pertinent part of the scoping process in both treatments. Neither the PuLSE-Eco nor
the PLEvo-Scoping expert was involved in the AAL research program, so their role in
the scoping process was comparable to that of an external consultant.
As PL scoping teams are generally not composed of many people, as all necessary
roles were represented, and as the PL scoping process was to take just a couple of days
[8], this design was realistic compared to industrial settings.
The threats to the validity of this quasi-experiment have been analyzed based on
the set of threats to validity provided in [14]. We have addressed most validity threats:
Fishing: Subjective classifications and measures were only provided by the partici-
pants of the quasi-experiment; two types of open questions were included; two
people with no special expectations in the quasi-experiment results were involved
in its data analysis.
Reliability of measures: The quasi-experiment participants defined subjective
measures and/or provided values for them based on objective measures; the in-
struments were revised by three people with different profiles (one M.Sc. student,
one PL professional, and one expert in empirical studies).
Mono-method bias: Both quantitative and qualitative measures were used; meas-
ures were cross-checked whenever possible; one group's contributions were con-
firmed by the other group.
Interaction of selection and treatment: One representative of each expected role
was allocated to each group and all participants answered a profile questionnaire to
check whether they were really appropriate representatives of the roles they were
expected to have.
A discussion of further threats (reliability of treatment implementation, diffusion or
imitation of treatments, hypothesis guessing, compensatory rivalry, and resentful
demoralization) can be found in [13]. As is common in (quasi-)experiments, some
validity threats had to be accepted:
Low statistical power: The number of subjects in this quasi-experiment made it
impossible to perform any statistical analysis.
Random heterogeneity of subjects: As the allocation of people to the treatments
was based on convenience, the two groups might not have had similar knowledge
and backgrounds.
Selection: Participants were selected by the research program leader according to the
expected profiles; the PLEvo-Scoping expert, who was not a Fraunhofer employee at
that time, had had previous contact with two of the participants; the PuLSE-Eco
expert was from the same organization as the quasi-experiment participants.
We decided to deal with the lack of statistical tests as proposed by Yin [15], who
claims that an effective way of strengthening the results of empirical studies when no
statistical test is applicable is to perform further studies. Table 1 and its related comments
and interpretation (see Subsection 4.5, first paragraph) show that the two groups were
indeed comparable, and that people with the same role in the different groups had
similar profiles (knowledge and background). Therefore, the threat of Random
heterogeneity of subjects did not appear to be real. Concerning the threat of Selection,
a question was added to the profile questionnaire asking about the participants'
motivation for taking part in the scoping process of the AAL platform. In addition, the
scoping activities were conducted as they would have been conducted with any exter-
nal customer. Ultimately, this threat did not appear to be real either, because some of
the worst evaluations were made by one of the two quasi-experiment participants who
had previous contact with the PLEvo-Scoping expert. Regarding the PuLSE-Eco
expert, it should be noted that PuLSE-Eco was not the object of study of this quasi-
experiment and any possible bias would have affected both treatments.
4.3 Operation
The quasi-experiment took place in the form of one two-day workshop for each
treatment. The first part of each workshop was dedicated to the presentation of the
application domain, the quasi-experiment's task, as well as relevant information for
assuring a common understanding of the PL to be scoped. After that, each member of
the scoping team completed the profile questionnaire.
Due to time restrictions, the PLEvo-Scoping expert suggested that the groups divide
the tasks of some activities according to the participants' roles. Group 1 used this approach
when identifying and characterizing facts (activity 4, part of the step Environment
Change Anticipation), when identifying adaptation needs (activity 7, part of the step
Change Impact Analysis), and when performing the step PL Evolution Plan as a whole.
Group 2 decided to perform all activities as a group.
Group 2 received extra training and extra time to improve their lists of facts and
adaptation needs because the initial number of these was very low (15 and 9, respec-
tively). As scoping the AAL platform PL was a real problem, the goal of this quasi-
experiment from the viewpoint of the leader of the AAL research program was to get
the highest number and the best quality of results possible from each group. During
the analysis of the impact of the adaptation needs on the set of PL features (part of
activity 8, in the Change Impact Analysis step), the method expert asked Group 1 to
define a criterion to distinguish unstable features from stable ones, based on the
number of adaptation needs causing changes in the PL features. Group 1 defined that
features affected by at least five adaptation needs would be considered unstable. The
method expert asked Group 2 to adopt the same criterion.
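A minimal sketch of that criterion is shown below; the feature names and adaptation-need identifiers are hypothetical.

```python
# Stability criterion agreed on by both groups: a feature is unstable if it is
# affected by at least five adaptation needs. Feature names are hypothetical.
UNSTABLE_THRESHOLD = 5

def classify_features(impacts):
    """impacts maps each PL feature to the adaptation needs that change it."""
    return {feature: ("unstable" if len(needs) >= UNSTABLE_THRESHOLD else "stable")
            for feature, needs in impacts.items()}

print(classify_features({
    "fall detection": ["AN1", "AN2", "AN3", "AN4", "AN5"],
    "medication reminder": ["AN2"],
}))
```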
Another remark concerning the quasi-experiment execution is related to the activi-
ties of analyzing the alternative solutions for dealing with relevant adaptation needs
and selecting the best alternatives, which are part of the PL Evolution Plan step (see
Section 2, activities 11 and 12). Due to time restrictions, only the most appropriate
alternative solution for each adaptation need was analyzed.
Table 1. Profiles of the quasi-experiment participants

                                              Group 1          Group 2
Profile Item                                TE   ME   PLM    TE   ME   PLM
Experience in AAL                            3    4    4     2.5   3    4
Knowledge of the AAL platform                5    2    2      5    3    2
Experience in PL Scoping                     2    2    1      1    2    4
Technical knowledge in the AAL context       4    3    4      3    4    4
Market knowledge in the AAL context          2    4    4      3    4    4
Capability of providing an overview of
  the AAL PL and its goals                   4    3    2      3    4
Motivation                                   4    4    4      4    4    5

TE: domain expert with technical viewpoint; ME: domain expert with market viewpoint;
PLM: product line manager.
Table 2 presents a quantitative overview of the scoping process results. The values
in parentheses represent the number of facts or adaptation needs that had been given
as examples during the extra training and were confirmed by Group 2.
(column Support 1 - ST1) was "The relationship between actors' goals and facts is not
so clear. Actors, facts, and adaptation needs would have provided enough support." The
missing (necessary, but not required or provided) information related to Perception of
Support 3 (column Support 3 - ST3) was "The basic alternative solutions have to be
tailored to the concrete situation, which makes the activity difficult." Therefore, the
values of # Annoying Information 1 for Group 1 and # Missing Information 3 for Group
2 were both 1, while the remaining quantitative metrics related to missing and annoying
information had the value 0 (see conditions 2 and 3 in Table 5).
From Table 3, one can calculate, by converting the original ordinal values obtained
from the questionnaires into numeric values and applying the arithmetic mean, the
values of Perception of Support 1 (3.5 for Group 1 and 4 for Group 2), Perception of
Support 2 (4.33 for both Group 1 and Group 2), and Perception of Support 3 (4.67 for
Group 1 and 3.67 for Group 2). All of them correspond to the ordinal value of either
"Mostly necessary and sufficient" (4) or "Necessary and Sufficient" (5).
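A minimal sketch of this conversion is shown below, assuming only the two scale labels quoted in the text; the individual answers are hypothetical, chosen so that the mean reproduces Group 1's Perception of Support 3 value.

```python
# Convert ordinal answers to numbers and take the arithmetic mean. Only the two
# scale labels quoted in the text are listed; the answers below are hypothetical.
ORDINAL_TO_NUMERIC = {
    "Mostly necessary and sufficient": 4,
    "Necessary and sufficient": 5,
}

def perception_mean(answers):
    values = [ORDINAL_TO_NUMERIC[a] for a in answers]
    return sum(values) / len(values)

# Two answers of 5 and one of 4 yield 4.67, the value reported for Group 1's
# Perception of Support 3.
print(round(perception_mean(["Necessary and sufficient",
                             "Necessary and sufficient",
                             "Mostly necessary and sufficient"]), 2))
```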
Table 4 shows the values obtained for the quantitative metrics. Adaptation needs
and technical progress items were not confirmed by the other scoping team if consid-
ered irrelevant or outside the scope of the AAL platform PL. Technical progress items
are facts related to technology evolution that the PL organization wants to pursue in
order to get innovative features to the market as soon as possible.
A few days after the workshop, the quasi-experiment participants were provided
with a report on the respective results and asked to provide their perception of PLEvo-
Scoping's adequacy using a 5-point ordinal scale. The ordinal values were converted
into numeric ones, and the arithmetic mean was applied. The values obtained for the
metric Perception of Adequacy were 4.5 (High) according to Group 1, and 4.33
(Medium to High) according to Group 2. No participant selected the value Low (1)
or Low to Medium (2) for the adequacy of the method.
From the defined acceptance criterion (see Table 5) and the above data analysis,
proposition P1 (PLEvo-Scoping is adequate) was accepted in the context of both
groups. While Group 1's results even satisfied the non-obligatory condition (condition
5), the results of Group 2 did not satisfy it.
Data Analysis related to Feasibility. The effort for applying both approaches (PuLSE-
Eco and PLEvo-Scoping) is described in Table 6, which neither takes into consideration
the learning effort nor the scoping experts' effort. The learning effort related to PLEvo-
Scoping was 2.48 person-hours for Group 1 and 2.68 person-hours for Group 2. The
scoping experts' effort represents no loss of information for the quasi-experiment,
because the PLEvo-Scoping expert herself did not carry out any activity. The values in
parentheses in Table 6 refer to the extra time Group 2 used to improve their lists of facts
and adaptation needs, after additional training by the PLEvo-Scoping expert. The PL
manager's effort (column PLM) is also presented separately from the total effort,
because the PL manager from Group 2 managed to take part in more activities of the
scoping process than the minimum previously established.
The values obtained for the metric General Perception of Difficulty were 3.27 for Group 1 and 2.83 for Group 2, which represent the ordinal
value Neither Difficult nor Easy.
Furthermore, the quasi-experiment participants were asked to give their perception of
the feasibility of PLEvo-Scoping using a 5-point ordinal scale, after being provided with
the effort metrics and the method results. The ordinal values were converted into nu-
meric ones, and the arithmetic mean was applied. The values obtained for the metric
Perception of Feasibility were 4 (Medium to High) according to Group 1, and 3.67
(Medium to High) according to Group 2. No participant selected the value Low (1)
or Low to Medium (2) for the method's feasibility.
Taking into consideration the acceptance criteria defined in Table 7 and the
reported data analysis, proposition P2 (PLEvo-Scoping is feasible) was accepted in
the scope of both scoping teams.
From the values presented in Table 1, we concluded that each participant had the
competencies required to perform the role he/she had been assigned to by the AAL
research program leader (technical knowledge on the part of the domain expert with
the technical viewpoint, market knowledge on the part of the domain expert with the
market viewpoint, and the capability of providing an overview of the AAL PL on the
part of the PL managers). The values related to Motivation were very similar. While
Group 1 had higher values for Experience in AAL, Group 2 had higher values for
Knowledge of the AAL platform. Therefore, overall, their domain knowledge can be
considered similar as well. The main difference between the two groups is related to
Experience in PL Scoping, because the PL manager of Group 2 had already partici-
pated in a scoping process. As the time spent by the PL managers in the workshops
was limited, as PLEvo-Scoping was a new method, and as Group 2 applied the
interwoven approach, we do not believe this experience had much influence.
Consequently, we considered the two groups to be comparable.
The initially low number of facts and adaptation needs identified by Group 2 (see
Subsection 4.3 and Table 2), and the subsequent extra training and complementary
activities to improve it, may have been caused by Group 2's choice of performing all
activities as a group, not allowing any parallelism. Another factor that may have
influenced Group 2's performance is the interweaving of activities from both methods,
because the group had to switch between scoping and evolution activities. Despite the
differences in Tables 2 and 4, which are justified above,
the two approaches for integrating the method into an existing PL scoping process
showed similar general results: The method could be applied in just one day, and it
was considered adequate, feasible, and neither difficult nor easy to apply.
Concerning the annoying information that was pointed out by a member of Group
1 (see Subsection 4.4 - Data Analysis related to Adequacy), the identification of ac-
tors' goals is optional and therefore cannot be considered an annoying request. Fur-
thermore, this information was very useful when Group 2 had identified only 15 facts
and 9 adaptation needs and needed some help. The method expert used the actors'
goals that had been identified by the group to derive some possible examples of facts
and adaptation needs; some of these were considered relevant by the group and ac-
cepted. The missing information that was pointed out by a member of Group 2 is
really missing information and cannot be added easily. The best way to address it is to
build an experience base of alternative solutions for the PL organization.
Contrary to our expectations, the interweaving of activities in treatment 2 did not
provide any benefit. We expected that some activities of PuLSE-Eco (especially, the
activity Assess Domains) would provide insights into some PLEvo-Scoping activities
and vice versa, but the workshop format (two consecutive days) and time pressure did
not allow the quasi-experiment participants to really benefit from previous activities.
The interaction between the two methods must be investigated further.
With regard to the difficulty of applying PLEvo-Scoping, the value for General
Perception of Difficulty (Neither Difficult nor Easy) is acceptable due to the inherent
difficulty of some activities. Furthermore, this result might have been influenced
negatively by the short time available for training.
In order to address the feedback obtained from the quasi-experiment participants, we
have extended the PLEvo-Scoping activities of characterizing adaptation needs (Section 2,
activity 8) and analyzing alternative solutions (Section 2, activity 11) to include recording
the rationales behind the assignment of values to the attributes. We plan to provide
tool support for applying PLEvo-Scoping, which should make the registered and
derived relationships between adaptation needs more explicit for the scoping team when
elaborating the PL Evolution Map (Section 2, activity 10). Concerning the problem of
missing alternative solutions, we want to continue to investigate alternative solutions for
dealing with adaptation needs together with PL architects and, at the same time, start to
collect real occurrences in an experience base of alternative solutions. However, we do
not believe it is possible to compile a list of adaptation needs that usually apply to all
kinds of products or development, because adaptation needs are expected to be applica-
tion-specific and change over time. PLEvo-Scoping provides a list of 35 generic facts
instead, so that the scoping team can reflect on whether and how they may apply to the
application domain, with those facts then leading to the adaptation needs. Moreover, the
difficulty of estimating the business impact and the technical risk of an adaptation need
cannot be addressed completely, because it is inherent to the problem. Depending on the
experience the scoping team has in the domain and with the required technologies, this
difficulty is higher or lower.
5 Conclusion
PLEvo-Scoping is a method for supporting PL scoping teams in systematically
reasoning about the driving forces of evolution in a certain domain, especially
reasoning about who is behind these forces and how their decisions, needs, or
achievements may affect the PL infrastructure. Our method allows the PL scoping
team to proactively identify and prioritize the adaptation needs that will probably be
required in the PL infrastructure and decide about how to deal with them.
The contribution of this paper is the characterization of the adequacy and feasibil-
ity of PLEvo-Scoping in practice, that is, according to professionals in charge of
scoping an AAL PL. A quasi-experiment was performed to obtain feedback from PL
practitioners on how to improve the method and to provide first empirical data on the
usage of PLEvo-Scoping, so that other PL organizations can decide on whether to try
out the method or not. The method could be applied in just one day, and overall, the
quasi-experiment participants perceived it as being adequate and feasible. Those re-
sults were really positive, taking into consideration that predicting the future is hard,
the quasi-experiment participants applied the method for the first time, the learning
effort had to be minimal, and there was no specific tool support.
Although PLEvo-Scoping was applied for just one day, we recommend two to
three days for its application, in order to give the PL scoping team enough time to
understand the method's underlying concepts and carry out its activities without so
much time pressure. This recommendation addresses two of the comments provided
by the quasi-experiment participants. In addition, PLEvo-Scoping can also be applied
interactively, where the method expert would guide the PL scoping team in perform-
ing their activities. The intervention of the PLEvo-Scoping expert in this quasi-
experiment was kept to a minimum in order not to affect the results.
We think that performing a quasi-experiment is a good means for providing em-
pirical evidence on Product Line engineering technologies, because, compared to case
studies and experiments, its intermediate degree of control makes it easier to have PL
professionals as subjects while still allowing manipulation of variables and compari-
son of treatments on some level. In this way, it is possible to convince practitioners as
to the applicability of a method in real settings and, at the same time, to provide
researchers with scientific evidence on the value of the method.
We intend to perform further empirical studies in order not only to corroborate the
results reported in this paper, but also to further analyze the interaction between
PLEvo-Scoping and the scoping approach.
References
1. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. Addison-
Wesley, Reading (2001)
2. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering: Foundations,
Principles, and Techniques. Springer, Heidelberg (2005)
3. Knauber, P., Succi, G.: Perspectives on Software Product Lines. Software Engineering
Notes 27(2), 40–45 (2002)
4. Savolainen, J., Kuusela, J.: Volatility Analysis Framework for Product Lines. In: Proc.
SSR 2001, Toronto, pp. 133–141 (2001)
5. Bayer, J., Flege, O., Knauber, P., et al.: PuLSE: A Methodology to Develop Software
Product Lines. In: Proc. SSR 1999, Los Angeles, pp. 122–131 (1999)
6. Bengtsson, P., Lassing, N., Bosch, J., van Vliet, H.: Analyzing Software Architectures for
Modifiability. TR HK-R-RES00/11-SE, University of Karlskrona/Ronneby, Ronneby
(2000)
7. Schmid, K.: Planning Software Reuse – A Disciplined Scoping Approach for Software
Product Lines. PhD Thesis in Experimental Software Engineering. Fraunhofer IRB (2003)
8. John, I., Knodel, J., Lehner, T., et al.: A Practical Guide to Product Line Scoping. In: Proc.
SPLC 2006, Baltimore, pp. 3–12 (2006)
9. Villela, K., Dörr, J., Gross, A.: Proactively Managing the Evolution of Embedded System
Requirements. In: Proc. RE 2008, Barcelona, pp. 13–22 (2008)
10. John, I., Villela, K., Gross, A.: AAL Platform Product Line Scoping – Results and
Recommendations. TR 074.09/E, Fraunhofer IESE, Kaiserslautern (2009) (available upon
request)
11. Zelkowitz, M., Wallace, D.: Experimental Models for Validating Technology. IEEE
Computer 31(5), 23–31 (1998)
12. Basili, V., Caldiera, G., Rombach, H.: Goal Question Metric Paradigm. Encyclopedia of
Software Engineering 1, 528–532 (1994)
13. Villela, K., John, I.: Usage of PLEvo-Scoping in the Ambient Assisted Living Domain: A
Quasi-Experiment Package. TR 093.09/E, Fraunhofer IESE, Kaiserslautern (2010)
14. Wohlin, C., Runeson, P., Höst, M., et al.: Experimentation in Software Engineering: An
Introduction. Kluwer Academic Publishers, Norwell (2000)
15. Yin, R.: Case Study Research: Design and Methods, 3rd edn. Sage Publications, Thousand
Oaks (2003)
Challenges in Aligning Requirements Engineering and
Verification in a Large-Scale Industrial Context
1 Introduction
Are we sure that the tests performed are based on requirements and not on technical
specifications supplied by developers? Are we sure that the test coverage is adequate?
In order to assure that customer requirements are realized as intended, these questions
must be asked and answered. However, this is not an easy task, since requirements
tend to change over time [13], and in many cases the requirement specifications are
not updated during the development of a product, making it hard to use them as a solid
basis for creating, e.g., test cases [7, 15]. In small systems with just a few requirements
it could still be possible to handle the changes manually, but it gets extremely hard in
2 Related Work
In [17], the authors presented the findings of discussions with test managers and
engineers in software development organizations regarding the difficulties of integrating
independent test agencies into software development practices. The company where we
performed our interviews does not commonly use independent test agencies; however,
it has separate requirements, development, and testing units. Therefore, it would be
interesting to compare the results of having an independent test agency with those of
having an independent test unit within the company under study.
Findings related to change management emphasize the importance of synchroniza-
tion between the development and test with respect to modifications of functionality
[17]. The results of our study confirm these findings. One of the most recurrent chal-
lenges identified in our study is that requirements are not being updated on time.
Findings related to people interactions and communication stress the need for
communication between development and test organizations. If testers do not know who
wrote or modified the code, they do not know whom to talk to when potential faults
are detected. On the other hand, it could be difficult for developers to inform testers
about upcoming functionality changes if they don't know whose test cases will be
affected [17]. Our study confirms these results as well. Most of the interviewees suggest
that alignment could be greatly improved if requirements and testing people interacted
more with each other.
Several surveys on requirements related challenges are present in the literature:
problems in the requirements engineering process [9], requirements modeling [6],
quality requirements [7], requirements prioritization [1], and requirements interde-
pendencies [8]. Among these, Karlsson et al. [1] have results similar to ours, i.e., tool
integration is difficult and writing quality requirements is a challenge.
Most of the studies above do not focus on the alignment between the requirements
and the verification processes. Research in connecting requirements and testing has
been performed by several authors, for instance Uusitalo et al. [4], Post et al. [3], and
Damian and Chisan [10]. Uusitalo et al. [4] have conducted a series of interviews in
order to investigate best practices in linking requirements and testing. Among the best
practices, the authors mention early tester involvement in requirements activities. They
conclude by suggesting to strengthen the links between requirements engineers
and testers, since it is difficult to implement traceability between them; a conclusion
supported by this study (see Section 4.7).
The importance of linking requirements and verification is also stressed by Post et
al. [3]. They describe a case study showing that formalizing requirements in scenarios
makes it easier to trace them to test sets. Damian and Chisan [10] present a case study
where they introduce a new requirements engineering process in a software company.
Among the practices in the process, they include traceability links between require-
ments and testing, cross-functional teams, and testing according to requirements.
They show that an effective requirements engineering process has a positive influence
on several other processes, including the testing process.
The case studies above [3, 4, 10] are performed in a medium scale requirements
engineering context [11], while our study is performed in a large/very large scale
context and includes many aspects of aligning requirements and verification.
3 Research Approach
The approach used in this study is qualitative. Qualitative research consists of an
application of various methods of collecting information, mainly interviews and focus
groups. This type of research is exploratory [16]. Participants are asked to respond to
general questions, and the interviewers explore their responses to identify and define
people's perceptions and opinions about the topic being discussed. As the study was
meant to be deep and exploratory, interviews were the best tool since surveys are not
exploratory in nature. The interviews were semi-structured to allow in-depth, explora-
tory freedom to investigate non-premeditated aspects.
In this study, we interviewed 11 professionals in a large software development
company in Sweden, based on the research question: What are the current challenges
in aligning the requirements and the verification processes?
The viewpoint taken in this research is from a process perspective. The researchers
involved do not work directly with artifacts, but with processes and have expertise in
fields like requirements, testing, quality, and measurement.
Based on our pre-understanding of the processes involved in aligning requirements
and verification, a conceptual model has been designed (see Figure 1). This model
was used as a guide during the interviews. In this model, we consider three dimen-
sions of requirements and test artifacts, connected through work processes. One is the
Abstraction level dimension, from general goals down to source code, which is similar
both for the requirements and the testing side. Test artifacts are used to verify the
code, but also for verifying the requirements. The arrows are relationships that can be
either explicit or implicit, and either bi- or uni-directional. Then, we have the
Time dimension, in which the processes, the products, and the projects change and
evolve. This has an effect on the artifacts. There is also the dimension of Product
lines, which addresses variability, especially applicable when the development is
based on a product line engineering approach [2].
Case Context. Our results are based on empirical data collected through interviews at
a large anonymous company, which is using a product-line approach. The company is
developing embedded systems for a global market, and has more than 5000 employ-
ees. A typical project in this company lasts about 2-years, involves circa 800-1000
men per year, and has around 14000 requirements and 200000 test cases. The tool
DOORS is used for requirements management, and the tool Quality Center for test
management. Further information about the company is not disclosed for confidential-
ity reasons.
The interviews were conducted between May and October 2009.
In this study, challenges and problems, as well as current good practices and im-
provement suggestions regarding alignment between the requirements and verifica-
tion processes have been identified through interviews with software engineering
practitioners. The results from 11 interviews are included in this paper. Employees
with different roles have been interviewed: quality management related roles (quality
manager and quality control leader), requirements related roles (requirements process
manager, requirements architect and requirements coordinator), developer and testing
related roles (test leader, tester). The research was conducted in several steps:
1. Definition of interview guide;
2. Interview planning and execution;
Step 2. Eleven professionals were interviewed; each interview lasted for about one
and a half hours. All interviews were recorded in audio format and notes were taken. A
semi-structured interview strategy [16] has been used in all interviews, where the
interview guide acted as a checklist to make sure that all important topics were
covered. 2-3 interviewers interviewed one interviewee. One of the interviewers led
the interview, while the others followed the interview guide, took notes, and asked
additional questions. The selection of the interviewees has been made based on
recommendations by requirements managers, test managers, and the interviewees
themselves. (At the end of each interview we asked the interviewees if they could
recommend a person or a role in a company whom we could interview in order to get
alignment related information).
Step 3. Interviews were transcribed into text in order to facilitate the analysis. The
transcriptions were then divided into text sections containing 1-2 sentences. All the
1 The complete version of the interview guide and coding guide are available at: http://serg.cs.lth.se/research/experiment_packages/interview_study_on_requirements_verification_alignment/
text sections have been numbered in order to keep the order of the sentences. The size
of the transcriptions ranged from 4000 words to about 9000 words per interview.
Step 4. As suggested by C.B. Seaman [12], codes (keywords) were assigned to the
transcription sections in order to be able to extract all the sections related to a spe-
cific topic. However, the definition of the coding scheme turned out to be a non-trivial
task. We started by making an initial list of possible codes, which included codes
related to our research questions, alignment methods, quality requirements [14] and
software development process activities. In order to extend and tailor this initial list of
codes to our interview context, we decided to perform exploratory coding [16], which
included six researchers analyzing several interview transcriptions individually and
assigning suitable codes to the text sections.
The result of exploratory coding was a list with 169 codes. In the next stage, we
reviewed the codes resulting from the exploratory coding, grouped them into several
categories at different abstraction levels and developed a coding guide. The coding
guide is a document containing the list of codes and detailed instructions of how to
code a transcription. In order to validate the coding guide, seven researchers used it to
code the same interview transcription (let's call it X) individually, and then had a
meeting to discuss differences in coding and possible improvements of the coding
guide. Kappa inter-rater agreement [18] has been used as a metric to evaluate im-
provement in homogeneity of coding by different researchers. Consequently, the
coding guide was updated and the interview transcription (X) was coded again using
the updated version of the coding guide to make sure that the differences between
different coders were minimized. The coding guide included codes at three abstrac-
tion levels: high, medium, and low (see Table 2). The high-level codes were based on
research questions. The medium-level codes included different categories relevant to
our research, and the low-level codes were the coders' interpretation of the transcription
sections. A summary of the codes is presented in Table 2.
Table 2. Overview of the codes assigned to transcription sections (see footnote 1 for a complete
list of codes)
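Returning to the inter-rater agreement used in Step 4 above, the sketch below shows how a pairwise (Cohen's) kappa could be computed for two coders; the labels are invented, and the study itself involved seven coders and the reliability measure described in [18].

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Pairwise kappa for two coders assigning one code per text section."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both coders used a single identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two researchers to six transcription sections.
coder1 = ["challenge", "practice", "challenge", "challenge", "other", "practice"]
coder2 = ["challenge", "practice", "practice", "challenge", "other", "practice"]
print(round(cohen_kappa(coder1, coder2), 2))  # -> 0.74
```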
Step 6. Coded interview transcriptions were merged into one file, making it possible
to group transcription sections according to codes.
Step 7. The identified transcription sections of each group were analyzed by two
researchers. In order to identify alignment challenges, the researchers studied all the
transcription sections coded as challenges with the goal of extracting challenges from
the information provided by the interviewees. Some challenges were similar and therefore
could be reformulated or merged together, while others were kept apart as they were
different.
Step 8. The results of the analysis were validated by feedback from the organization
where the interviews have been conducted.
A discussion of possible threats to validity will help us to qualify the results and high-
light some of the issues associated with our study. As suggested by P. Runeson and
M. Höst [5], we have analyzed the construct validity, external validity, and reliability.
Internal validity is concerned with threats to conclusions about cause and effect rela-
tionships, which is not an objective of this study. A detailed list of possible threats is
presented in [16].
Threats to Construct Validity. The construct validity is the degree to which the
variables are accurately measured by the measurement instruments used in the study
[5]. The main construct validity threat in this study regards the design of the meas-
urement instrument: are the questions formulated so that the interviews answer our
research questions? Our main measurement instrument is the interview guide (see
Section 3.1, Step 1), which includes the questions to be asked. Two researchers have
constructed it by analysing the research questions and creating sub-questions. Five
researchers have reviewed it to check for completeness and consistency; therefore we
believe that the interview guide is accurate. The other measurement instrument is the
coding guide. As described in Section 3.1, Step 4, this instrument has been validated
by seven researchers in order to make sure that the result of the coding activity had
minimal individual variations.
The questions in the interview guide were tailored on the fly to the interviewees,
since the professionals participating in the interviews had different roles and different
backgrounds. Our study is qualitative; the goal is not to quantify answers of the same
type, but rather to explore the different activities in the company, which could best be
done by deeply investigating the role of each interviewee.
Another potential threat in this study is that different interviewees may interpret the
term "alignment" differently. For this reason, the conceptual model (see Figure 1) has
been shown to the subjects during the interviews, in order to present our definition of
alignment between requirements and verification.
Reliability. Reliability issues concern to what extent the data and the analysis are
dependent on the researchers. Hypothetically, if another researcher later conducts
the same study, the results should be the same. In this study, all findings have been
derived by at least two researchers and then reviewed by at least three other researchers.
Therefore, this threat has been reduced.
In our study, the investigation procedures are systematic and well documented (see
Section 3). The interview guide, the researchers' view (the conceptual model), and
the coding scheme were reviewed independently by seven researchers with different
background.
The presented observations reflect the views of the participants. The interviews
have been recorded and transcribed. The transcriptions could contain errors due to
misinterpretation, mishearing, inaccurate punctuation or mistyped words. In order to
minimize these threats, the transcriber was also present at the interviews. More-
over, the transcriptions were sent to the interviewees so that they could correct possi-
ble misinterpretation of their answers.
One factor affecting the reliability of the data collected can be the fact that the in-
terviews capture the subjective opinion of each interviewee. However, we interviewed
11 professionals, which we believe is a sufficient number to capture the general view
of the company. Influence among the subjects could not be controlled and we could
only trust the answers received. The choice of the subjects in the company might not
give a representative picture of the company; however, the subjects had different roles
and we tried to cover diverse roles within the company.
Regarding the coding activity, it is a classification of pieces of text, which are
taken out of context; hence there is a risk of misinterpretation. This risk was mini-
mized by checking the whole context of the text while doing data analysis.
To summarize, we believe that the validity threats of our results are under control,
although the results should not be generalized to all organizations.
This section summarizes the alignment problems and challenges related to the com-
pany's organizational structure and processes.
The requirements and verification processes are separate processes and are not
aligned. Furthermore, processes can use different standards of documentation,
which negatively influences the hand-over between different parts of the organization.
Moreover, some parts of the company follow a documented development process
while other parts do not.
Frequent process changes negatively influence alignment. It takes time for
people to learn and use a new process. Sometimes, people are reluctant to use a
process knowing that it will change soon. Also, some good practices could be lost
due to the process changes.
Distance in time between the development of requirements and test artifacts can
create alignment problems. Requirements can be approved without having test
cases associated with them. This can result in having non-testable requirements.
In a large company, gaps in communication across different organizational units
often occur, especially at the high level. Furthermore, as stated by an employee, "it
is hard to find who is accountable for things because depending on who you ask
you get very different answers." Therefore, this could affect the alignment, espe-
cially at the high abstraction level of the requirements and verification processes.
Implementation of process improvements is time consuming, especially when the
improvements involve several units. Several issues related to management can
affect the alignment, e.g., decisions are not documented, lessons learnt are not
always collected, and processes depend on individual commitment.
Summarizing the challenges, the requirements and the verification processes are not
aligned and are distant in time. There are also communication problems across differ-
ent organizational units and the decisions are not documented, therefore it is hard to
know who is accountable for a decision. The organizational structure and the processes,
as well as changes to these, influence the alignment. One reason could be
that the company is very large and many organizational units are involved, and not
every unit follows the documented process or the documentation standard.
This subsection presents a list of issues that are related to people, their skills and
communication with each other.
Software tools play a crucial role in maintaining alignment between different artifacts.
The following are several tool-related issues.
The lack of appropriate tools influences the alignment. It is very important to have
reliable and easy-to-use requirements and verification tools. If a tool is difficult to
use or unreliable, people are not willing to use it. Having a good
requirements management tool, which includes not only information about
requirements, but also the flow of requirements, is crucial for testers. Otherwise,
testers try to get this kind of information from other sources, for instance the
developers. Tools for managing quality requirements are needed, otherwise there is
a risk that quality requirements are not implemented and/or tested.
It is important to keep the requirements database updated. If requirements are not
up to date, testers will test according to old requirements and will find issues
that are not really failures but valid features.
If there is no tool to collect customer needs, it is difficult to keep them aligned with
requirements, hence with test cases as well. This leads to misalignment be-
tween customer needs and requirements, and consequently affects customer satis-
faction with the final product.
When requirements and testing artifacts are stored in different tools, there
is a need for good interfaces between these tools and access for all interested
parties. Otherwise, it becomes very difficult to maintain alignment, especially
when there are many-to-many relationships between requirements and test cases.
If the mapping between requirements and test cases is not presented in a clear way,
it could contain too much redundant information, and therefore it could be difficult
for requirements people and testers to use it.
Most of the interviewees noted the lack of adequate software tools that would
allow them to handle requirements and verification and to measure the alignment between
them. Furthermore, the interfaces of the tools and tool integration are not always good.
The consequence of this is that people become reluctant to use the tools and do not
update the information stored in them. This greatly affects the alignment.
This subsection presents a list of issues that are related to the requirements process.
Requirements sometimes are not given enough attention and consideration by other
organizational units, such as development and testing units. According to an employee,
"Developers do not always review the requirements, and discover requirements
that cannot be implemented during development, even when having agreed
on the requirements beforehand." This could be due to the lack of involvement of
developers and testers in requirements reviews.
Not having a good way of managing customers' needs makes it more difficult to
define requirements, especially requirements at a high abstraction level.
Requirements engineers do not think about testability of requirements. Therefore,
requirements could turn out to be non-testable.
Dealing with quality requirements is a difficult task. Quality requirements tend to
be badly structured or vague. Furthermore, it is difficult to assign quality require-
ments to different development groups for implementation, since several groups
are usually involved in implementing a quality requirement, and none wants to take
full responsibility for it.
It is difficult to maintain alignment in organizations working with a large set of
requirements, when the number of requirements reaches tens of thousands or more.
Furthermore, in organizations that are using a product line engineering [2]
approach, maintaining alignment between domain and application requirements
and test cases could be a challenge.
As we can see, there are numerous challenges related to the requirements process, which
affect alignment. Most of the interviewees stress the importance of updating require-
ments as soon as changes occur, and finding adequate ways of defining and managing
quality requirements. These two are the most recurrent challenges related to the
requirements process.
The following issues are related to the testing process.
Sometimes testers lack clear directions on how to proceed with testing, especially
when testing high-level requirements such as roadmaps. It is difficult
to test that products adhere to roadmaps, since such testing takes a long time
and is costly; usually short loops are preferred.
When several organizational units are involved in testing, the cooperation between
them is crucial. This is particularly relevant for companies with a
product line engineering approach, since different organizational units may be
performing domain and application testing, and the faults detected in applications
should be removed from the domain as well.
The following are the challenges related to traceability between requirements and
testing artifacts.
There is a lack of links between requirements and test cases. Some test cases are
very complex; therefore it is difficult to trace them back to requirements.
If traceability between requirements and test cases is not maintained, testers keep
testing requirements that have been removed. The reasons for lack of traceability
stored becomes obsolete and not useful. Traceability is also a challenge, and its
importance is corroborated by other studies [3, 14]. Communication and cooperation
across different units within the company is also a major challenge, confirming the
results in [1, 17]. As a consequence of the challenges, the company has decided to
improve its development process.
Our results can inspire other practitioners in their alignment improvement efforts
since they can learn from this case what can be the most salient challenges in manag-
ing large quantities of requirements and test information in natural language.
Researchers can also learn from this study since they can focus their research on
existing challenges of potentially general interest.
We are extending this study to other companies of different sizes and domains. This
will further enhance the general picture of alignment issues.
References
1. Karlsson, L., Dahlstedt, Å.G., Regnell, B., Natt och Dag, J., Persson, A.: Requirements
Engineering Challenges in Market-Driven Software Development – An Interview Study
with Practitioners. Information and Software Technology 49(6), 588–604 (2007)
2. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering: Founda-
tions, Principles and Techniques. Springer, Heidelberg (2005)
3. Post, H., Sinz, C., Merz, F., Gorges, T., Kropf, T.: Linking Functional Requirements and
Software Verification. In: 17th IEEE International Requirements Engineering Conference,
pp. 295–302. IEEE Computer Society, Atlanta (2009)
4. Uusitalo, E.J., Komssi, M., Kauppinen, M., Davis, A.M.: Linking Requirements and
Testing in Practice. In: 16th IEEE International Requirements Engineering Conference,
pp. 265–270. IEEE Computer Society, Barcelona (2008)
5. Runeson, P., Höst, M.: Guidelines for Conducting and Reporting Case Study Research in
Software Engineering. Empirical Software Engineering 14(2), 131–164 (2009)
6. Lubars, M., Potts, C., Richter, C.: A Review of the State of the Practice in Requirements
Modeling. In: 1st IEEE International Symposium on Requirements Engineering, pp. 2–14.
IEEE Computer Society, San Diego (1993)
7. Berntsson Svensson, R., Gorschek, T., Regnell, B.: Quality Requirements in Practice: An
Interview Study in Requirements Engineering for Embedded Systems. In: Glinz, M.,
Heymans, P. (eds.) REFSQ 2009. LNCS, vol. 5512, pp. 218–232. Springer, Heidelberg
(2009)
8. Carlshamre, P., Sandahl, K., Lindvall, M., Regnell, B., Natt Och Dag, J.: An Industrial
Survey of Requirements Interdependencies in Software Product Release Planning. In: 5th
IEEE International Symposium on Requirements Engineering, pp. 84–91. IEEE Computer
Society, Toronto (2001)
9. Chatzoglou, P.D.: Factors Affecting Completion of the Requirements Capture Stage of
Projects with Different Characteristics. Information and Software Technology 39(9),
627–640 (1997)
10. Damian, D., Chisan, J.: An Empirical Study of the Complex Relationships between Re-
quirements Engineering Processes and Other Processes That Lead to Payoffs in Productiv-
ity, Quality and Risk Management. IEEE Transactions on Software Engineering 32(7),
433–453 (2006)
11. Regnell, B., Berntsson Svensson, R., Wnuk, K.: Can We Beat the Complexity of Very
Large-Scale Requirements Engineering? In: Paech, B., Rolland, C. (eds.) REFSQ 2008.
LNCS, vol. 5025, pp. 123–128. Springer, Heidelberg (2008)
12. Seaman, C.B.: Qualitative Methods. In: Shull, F., Singer, J., Sjøberg, D.I.K. (eds.) Guide
to Advanced Empirical Software Engineering, ch. 2. Springer, Heidelberg (2008)
13. Nurmuliani, N., Zowghi, D., Fowell, S.: Analysis of Requirements Volatility during
Software Development Life Cycle. In: 2004 Australian Software Engineering Conference,
p. 28. IEEE Computer Society, Washington (2004)
14. ISO/IEC 9126: Software and System Engineering – Product quality – Part 1: Quality
model (1999-2002)
15. Fricker, S., Gorschek, T., Byman, C., Schmidle, A.: Handshaking: Negotiate to Provoke
the Right Understanding of Requirements. IEEE Software (2009)
16. Robson, C.: Real World Research, 2nd edn. Blackwell, Malden (2002)
17. Jones, J.A., Grechanik, M., Van der Hoek, A.: Enabling and Enhancing Collaborations
between Software Development Organizations and Independent Test Agencies. In:
Cooperative and Human Aspects of Software Engineering (CHASE), Vancouver (2009)
18. Lombard, M., Snyder-Duch, J., Campanella Bracken, C.: Content Analysis in Mass Com-
munication - Assessment and Reporting of Intercoder Reliability. Human Communication
Research 28(4), 587–604 (2002)
On the Perception of Software Quality
Requirements during the Project Lifecycle
1 Introduction
Software quality requirements are a key concern throughout the software
lifecycle. Requirements research is increasingly focused on supporting systems
beyond the initial design phase, captured by Finkelstein's term "reflective require-
ments" [6]. Quality requirements are usually defined in terms of global properties
for a software system, such as reliability, usability and maintainability;
we think of them as describing the how, rather than the what. In this sense
functionality can also be considered a quality, insofar as it describes how well
a given artifact implements a particular function (such as security). The impor-
tance of quality requirements lies in their inter-system comparability. Because of
their global nature, quality requirements are hard to build into a design and are
often treated post facto in terms of metrics that are applied to the final product.
If requirements are important throughout the life-cycle (and we believe strongly
that they are), a better understanding of requirements after the initial release is
important. Are requirements discussed post-release? One way of answering this
question is to examine current practices using a standardized requirements tax-
onomy. In particular, we are interested in finding out whether there is any notice-
able pattern in how software project participants conceive of quality requirements.
Our study is conducted from the perspective of project participants (e.g., develop-
ers, bug reporters, users). We use a set of eight open-source software (OSS) prod-
ucts to test two specific questions about software quality requirements. The first
is whether software quality requirements are of more interest as a project ages, as
predicted in Lehman's Seventh Law, that "the quality of systems will appear to
be declining unless they are rigorously maintained and adapted to environmental
changes" [15, p. 21]. Our second question is whether quality is of similar concern
among different projects. That is, is a quality such as Usability as important to
one project's participants as it is to another?
To assess these questions, we need to define what we mean by software quality
requirements. Our position is that requirements for software quality can be
conceived as a set of labels assigned to the conversations of project participants.
These conversations take the form of mailing list discussions, bug reports, and
commit logs. Consider two developers in an OSS project who are concerned
about the software's performance. To capture this quality requirement, we look
for indicators, which we call signifiers, which manifest the concern. We then label
the conversations with the appropriate software quality, using text analysis. Our
qualities are derived from a standard taxonomy: the ISO 9126-1 software quality
model [9]. The signifiers are keywords that are associated with a particular
quality. For example, we label a bug report mentioning the slow response time
of a media player with the Efficiency quality.
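As a rough sketch of the labelling idea (the actual extraction toolset is described in Section 3), a simple case-insensitive keyword match against per-quality signifier lists might look as follows; the lists here are abbreviated from Table 2 and the message text is invented.

```python
# Abbreviated signifier lists (see Table 2); matching is plain substring search,
# a simplification of the text analysis described in the paper.
SIGNIFIERS = {
    "Efficiency": ["efficient", "efficiency", "time behaviour"],
    "Usability": ["usability", "usable", "learnability", "operability"],
    "Reliability": ["reliability", "reliable", "fault tolerance", "recoverability"],
}

def label_message(text):
    """Return the qualities whose signifiers occur in the message text."""
    lowered = text.lower()
    return {quality for quality, words in SIGNIFIERS.items()
            if any(word in lowered for word in words)}

print(label_message("The media player is barely usable when the playlist grows."))
# -> {'Usability'}
```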
We discuss related approaches in Section 2. Section 3 describes how we derive
these signifiers and how we built our corpora and toolset for extracting the
signifiers. We then present our observations and a discussion about significance
in Section 4. Finally, we examine some threats to our approach and discuss future
work.
2 Related Work
Part of our effort with this project is to understand the qualitative and inten-
tional aspects of requirements in software evolution, a notion we first discussed
in [11]. That idea is derived from work on narratives of software systems shown
in academic work like [1].
3 Methodology
Our corpora are from a selection of eight Gnome projects, listed in Table 1.
Gnome is an OSS project that provides a unified desktop environment for Linux
and its cousins. Gnome is both a project and an ecosystem: while there are
co-ordinated releases, each project operates somewhat independently. In 2002,
Koch and Schneider [14] listed 52 developers as being responsible for 80% of the
Gnome source code. In our study, the number of contributors is likely higher,
since it is easier to participate via email (e.g., feature requests) or bug reports.
For example, in Nautilus, there were approximately 2,000 people active on the
mailing list, whereas there were 312 committers to the source repository.1
1 Generated using Libresoft, tools.libresoft.es
Table 1. Selected Gnome ecosystem products (ksloc = thousand source lines of code)
The projects used in this paper were selected to represent a variety of lifespans
and codebase sizes (generated with [21]). All projects were written in C/C++,
save for one in Python (Deskbar). For each project we created a corpus from that
project's mailing list, subversion logs, and the bug comments, as of November
2008. From the corpus, we extracted messages, that is, the origin, date, and
text (e.g., the content of the bug comment), and placed this information into
a MySQL database. A message consists of a single bug report, a single email
message, or a single commit. If a discussion takes place via email, each indi-
vidual message about that subject is treated separately. Our dataset consists of
over nine hundred thousand such messages, across all eight projects. We do not
extract information on the mood of a message: i.e., we cannot tell whether this
message expressed a positive attitude towards the requirement in question (e.g.,
This menu is unusable). Furthermore, we are not linking these messages to
the implementation in code; we have no way of telling to what extent the code
met the requirement beyond participant comments.
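A minimal sketch of the message store, using sqlite3 in place of the MySQL database mentioned above; the table and column names are assumptions based on the fields listed (origin, date, text).

```python
import sqlite3

# In-memory stand-in for the MySQL message database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE messages (
                    project TEXT,   -- e.g. 'Nautilus'
                    source  TEXT,   -- 'mail', 'bug', or 'commit'
                    origin  TEXT,   -- sender, reporter, or committer
                    date    TEXT,
                    body    TEXT)""")
conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
             ("Nautilus", "bug", "reporter@example.org", "2008-11-01",
              "The file list is painfully slow to refresh."))
# One row per bug report, email message, or commit, as described in the text.
count, = conn.execute("SELECT COUNT(*) FROM messages").fetchone()
print(count)
```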
Table 2. Qualities and quality signifiers – Wordnet version (WN). Bold text indicates
the word appears in ISO/IEC 9126.

Quality          Signifiers
Maintainability  testability changeability analyzability stability maintainability
                 maintain maintainable modularity modifiability understandability
Functionality    security compliance accuracy interoperability suitability
                 functional practicality functionality
Portability      conformance adaptability replaceability installability portable
                 movableness movability portability
Efficiency       resource behaviour time behaviour efficient efficiency
Usability        operability understandability learnability useable usable
                 serviceable usefulness utility useableness usableness
                 serviceableness serviceability usability
Reliability      fault tolerance recoverability maturity reliable dependable
                 responsibleness responsibility reliableness reliability
                 dependableness dependability
In semiotics, Peirce drew a distinction between signifier, signified, and sign [2].
In this work, we make use of signifiers (words like "usability" and "usable") to
capture the occurrence in our corpora of the signified, in this example the
concept Usability. We extract our signified concept words from the ISO 9126
quality model [9], which describes six high-level quality requirements (listed in
Table 2). There is some debate about the significance and importance of the
terms in this model. However, it is an international standard and thus provides
"an internationally accepted terminology for software quality" [3, p. 58], which is
sufficient for the purposes of this research.
We want to preserve domain-independence, such that we can use the same
set of signifiers on different projects. This is why we eschew more conventional
text-mining techniques that generate keyword vectors from a training set.
We generate the initial signifiers from Wordnet [12], an English-language lexi-
cal database that contains semantic relations between words, including meronymy
and synonymy. We extract signifiers using Wordnet's synsets, hypernyms, and
related forms (stems), and related components using the two-level hierarchy in
ISO 9126. When we account for spelling variations, we associate this wordlist with
a top-level quality, and use that to find unique events. This gives us a repeatable
procedure for each signified quality. We call this initial set of signifiers WN.
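The derivation of the WN sets can be illustrated with a small sketch based on NLTK's Wordnet interface. The seed terms and the exact expansion rules shown here are assumptions; the paper only states that synsets, hypernyms, and related forms (stems) were used.

# Illustrative sketch of deriving a WN-style signifier set from Wordnet via NLTK.
# Requires: nltk and the 'wordnet' corpus (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def wordnet_signifiers(seed_terms):
    """Collect lemma names from synsets, their hypernyms, and derivationally
    related forms of the given seed terms (e.g. an ISO 9126 quality name)."""
    signifiers = set()
    for term in seed_terms:
        for synset in wn.synsets(term):
            lemmas = list(synset.lemmas())
            for hyper in synset.hypernyms():
                lemmas.extend(hyper.lemmas())
            for lemma in lemmas:
                signifiers.add(lemma.name().replace("_", " ").lower())
                for related in lemma.derivationally_related_forms():
                    signifiers.add(related.name().replace("_", " ").lower())
    return signifiers

print(sorted(wordnet_signifiers(["usability", "usable"])))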
Expanding the signifiers. The members of the set of signifiers will have a big
effect on the number of events returned. For example, the term "user friendly" is
one most would agree is relevant to discussion of usability. However, this term
does not appear in Wordnet. To see what effect an expanded list of signifiers
would have, we generated a second set (henceforth ext), by expanding WN with
more software-specific signifiers. The ext signifier sets are shown in Table 3.
To construct our expanded sets, we first used Boehm's 1976 software quality
model [4], and classified his eleven "ilities" into their respective ISO 9126 qualities.
We did the same for the quality model produced by McCall et al. [17]. Finally,
we analyzed two mailing lists from the KDE project to enhance the specificity
of the sets. Like Gnome, KDE is an open-source desktop suite for Linux, and
likely uses comparable terminology. We selected KDE-Usability, which focuses
on usability discussions for KDE as a whole; and KDE-Konqueror, a list about a
long-lived web browser project. For each high-level quality in ISO 9126, we first
searched for our existing (WN) signifiers; we then randomly sampled twenty-
five mail messages that were relevant to that quality, and selected co-occurring
terms relevant to that quality. For example, we add the term "performance" to
the synonyms for Efficiency, since this term occurs in most mail messages that
discuss efficiency.
There are many other possible sources for quality signifiers, but for compar-
ative purposes with the Wordnet lists, we felt these sources were sufficient.
We discuss the differences the two sets create in Section 4.
Table 3. Qualities and quality signifiers, extended version (ext). Each quality consists
of its WN terms (Table 2) in addition to the ones listed.

Quality          Signifiers
Maintainability  WN + interdependent dependency encapsulation decentralized modular
Functionality    WN + compliant exploit certificate secured buffer overflow policy
                 malicious trustworthy vulnerable vulnerability accurate secure
                 vulnerability correctness accuracy
Portability      WN + specification migration standardized l10n localization i18n
                 internationalization documentation interoperability transferability
Efficiency       WN + performance profiled optimize sluggish factor penalty slower
                 faster slow fast optimization
Usability        WN + gui accessibility menu configure convention standard feature
                 focus ui mouse icons ugly dialog guidelines click default human
                 convention friendly user screen interface flexibility
Reliability      WN + resilience integrity stability stable crash bug fails
                 redundancy error failure
Once we constructed our sets of signifiers, we applied them to the message cor-
pora (the mailing lists, bug trackers, and repositories) to create a table of events.
An event is any message (row) in the corpus table which contains at least one
term in the signifier set. A message can contain signifiers for different qualities,
and can thus generate as many as six events (e.g., a message about maintain-
ability and reliability). However, multiple signifiers for the same quality only
generate a single event for that quality. We produced a set of events (e.g., a
subversion commit message), along with the associated time and project. We
group events by week for scalability reasons. Note that each email message in
a thread constitutes a single event. This means that it is possible that a single
mention of a signifier in the original message might be replied to multiple times.
We assume these replies are on-topic and related to the original concern.
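A simplified sketch of this event-extraction step is shown below. The signifier sets are abbreviated, the matching is plain word matching, and all names are illustrative assumptions; the point is only to show that a message yields at most one event per quality, no matter how many of that quality's signifiers it contains.

import re
from collections import namedtuple

# Abbreviated example signifier sets; the full WN/ext lists are in Tables 2 and 3.
SIGNIFIERS = {
    "Efficiency": {"efficiency", "efficient", "performance", "slow", "sluggish"},
    "Usability": {"usability", "usable", "user friendly", "gui"},
}

Event = namedtuple("Event", "quality project week message_id")

def events_for_message(message_id, project, week, text):
    """Yield at most one event per quality, regardless of how many of that
    quality's signifiers appear in the message."""
    lowered = text.lower()
    for quality, terms in SIGNIFIERS.items():
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms):
            yield Event(quality, project, week, message_id)

msg = "The file chooser is slow and the GUI feels sluggish."
print(list(events_for_message(42, "nautilus", "2004-W10", msg)))
# -> one Efficiency event and one Usability event, never two Efficiency events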
We normalize the extracted event counts to remove the effect of changes in mail-
ing list volume or commit log activity (some projects are much more active). The
calculation takes each signifier's event count for that period, and divides it by the
overall number of messages in the same period. We also remove low-volume periods
from consideration, because a week in which only one message appeared, and that
message contained a signifier, would present as a 100% match. From this dataset we
conducted our observations and statistical tests. Table 4 illustrates some of the
sample events we dealt with, and our subsequent mapping to software quality
requirements.
Table 4. Sample events and their mapping to software quality requirements

"...By upgrading to a newer version of GNOME you could receive bug fixes and new
functionality." (Quality: None)

"There should be a feature added that allows you to keep the current functionality
for those on workstations (automatic hot-sync) and then another option that allows
you to manually initiate." (Quality: Functionality)

"Steps to reproduce the crash: 1. Can't reproduce with accuracy. Seemingly
random. ...." (Quality: Reliability, Functionality)

"How do we go disabling ekiga's dependency on these functions, so that people who
aren't using linux can build the program without having to resort to open heart
surgery on the code?" (Quality: Maintainability)

"U_() is equivalent of _() but returns Unicode (UTF-8) string. Update your
xml-i18n-tools from CVS (recent version understands U_()), update Swedish
translation and close the bug back." (Quality: Portability)

"On some thought, centering dialogs on the panel seems like it's probably right,
assuming we keep the dialog on the screen, which should happen with latest
metacity." (Quality: Usability)

"These calls are just a waste of time for client and server, and the Nautilus online
storage view is slowed down by this wastefulness." (Quality: Efficiency)
encountered some mail messages from individuals whose email signature included
the words "Usability Engineer". If the body of the message wasn't obviously
about usability, we coded this as a false-positive. Our error test was to randomly
select messages from the corpora and code them as relevant or irrelevant. We
assessed 100 events per quality, for each set of signifiers (ext and WN). Table 5
presents the results of this test. False-positives averaged 21% and 20% of events,
for ext and WN respectively (i.e., precision was 79% and 80%).
Recall, or completeness, is defined as the number of relevant events retrieved
divided by the total number of relevant events. Superficially we could describe
our recall as 100%, since the query engine returns all matches we asked for,
but true recall should be calculated using all events that had that quality as
a topic. To assess this, we randomly sampled our corpora and classified each
event into either a quality (Usability, Reliability, etc.) or None. For extended
signifier lists, we had an overall recall of 51%, and a poor 6% recall for the
Wordnet signifiers. We therefore dispensed with the Wordnet signifiers. This
is a very subjective process. For example, we classified a third of the events
as None; however, arguably any discussion of software could be related, albeit
tangentially, to an ISO9126 quality. We think a better understanding of this issue
is more properly suited to a qualitative study, in which project-specic quality
models can be best established.
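For reference, the standard definitions used above (not specific to this study) are:

    precision = (retrieved events that are relevant) / (all retrieved events)
    recall    = (retrieved events that are relevant) / (all relevant events in the corpus)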
Table 6. Selected summary statistics, normalized. Examples from Nautilus and Evo-
lution for all qualities using extended signifiers.
of one version, and the release of the next (major) version. Finally, we explored
qualitative explanations for patterns in the data.
Using project lifespan. We examined whether, over a project's complete
lifespan, there was a correlation with quality event occurrences. Recall that we
define quality events as occurrences of a quality signifier in a message in the
corpora. We performed a linear regression analysis and generated correlation
coefficients for all eight projects and six qualities. Figure 2 is an example of our
analysis. It is a scatterplot of quality events vs. time for the Usability quality in
Evolution. For example, in 2000/2001, there is a cluster around the 300 mark,
using the extended (ext) set of signifiers. Note that the y-axis is in units of
(events/volume * 1000) for readability reasons.
The straight line is a linear regression. The dashed vertical lines represent
Gnome project milestones, with which the release dates of the projects we study
are synchronized. Release numbers are listed next to the dashed lines.
[Fig. 2: scatterplot of normalized Usability quality event occurrences per year in
Evolution (y-axis: events/volume × 1000, 0 to 400), with dashed vertical lines marking
the Gnome 1.0 through 2.24 release milestones.]
Due to space constraints, Table 6 lists only Nautilus and Evolution as products,
together with the r² correlation value (coefficient of determination) and slope (trend)
values for each quality within that project. r² varies between 0 and 1, with a
value of 1 indicating perfect correlation. The sign of the slope value indicates
direction of the trend. A negative slope would imply a decreasing number of
occurrences as the project ages. Table 7 does a similar analysis for all products
and the Usability and Efficiency (performance) qualities.
The results are inconclusive. In all cases the correlation coefficient indicating
the explanatory power of our linear regression model is quite low, well below the
0.9 threshold used in, for example, [18]. There does not seem to be any reason to
move to non-linear regression models based on the data analysis we performed.
We conclude that our extended list of signifiers does not provide any evidence
of a relationship between discussions of software quality requirements and time.
In other words, either the occurrences of our signifiers are random, or there is
a pattern, and our signifier lists are not adequately capturing it. The former
conclusion seems more likely based on our inspection of the data.
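The per-project, per-quality trend analysis can be pictured with the following sketch. It uses scipy.stats.linregress on invented weekly counts; neither the data nor the use of scipy comes from the paper, so treat it as an assumed illustration only.

# Sketch of a per-quality trend analysis on weekly normalized event counts.
from scipy import stats

# weeks since project start, and made-up normalized Usability event counts
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
normalized_counts = [0.012, 0.020, 0.015, 0.018, 0.011, 0.022, 0.017, 0.019]

result = stats.linregress(weeks, normalized_counts)
r_squared = result.rvalue ** 2   # coefficient of determination
print(f"slope={result.slope:.5f}, r^2={r_squared:.3f}")
# A positive slope would suggest growing attention to the quality over the
# project's lifespan; r^2 close to 1 would indicate a strong linear relationship.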
Using release windows. It is possible that the event occurrences are more
strongly correlated with time periods prior to a major release, that is, that there
is some cyclical or autocorrelated pattern in the data. We defined a release win-
dow as the period from immediately after a release to just before the next release.
We investigated whether there was a higher degree of correlation between the
number of quality events and release age, for selected projects and keywords. Was
this release window correlation better than the one we found for project lifespan
as a whole? For space reasons we do not include these results, but there was
no improvement in correlation. There is no relationship between an approaching
release date and an increasing interest in software quality requirements.
[Figure: actual (non-normalized) quality event occurrences per year (y-axis:
occurrences, 0 to 50), with dashed vertical lines marking the Gnome 1.2 through
2.12 release milestones.]
For our second approach, we used the actual signifier event counts, and tar-
geted Reliability events for Nautilus. In November 2000, 50 events occur. In-
specting the events, one can see that a number have to do with bug testing
the second preview release that was released a few days prior. For example, one
event mentions ways to verify reliability requirements using hourly builds: "As
a result, you may encounter a number of bugs that have already been fixed. So,
if you plan to submit bug reports, it's especially important to have a correct
installation!". Secondly, in early 2004 there is a point with 29 events just prior
to the release of Gnome 2.6. Discussion centers around the proper treatment
of file types that respects reliability requirements. It is not clear whether these
discussions are in response to the external pressure of the deadline or are just
part of a general, if heated, discussion.
These investigations show that there is value to examining the historical record
of a project in detail, beyond quantitative analysis. While some events are clearly
responding to external pressures such as release deadlines, other events are often
prompted by something as simple as participant interest, which seems to be
central to the OSS development model.
Table 8. Quality per project. Numbers indicate normalized occurrences per week.
for instance, what is the relationship between product reliability and product
functionality?
The challenge for researchers is to align software quality models, at the high
level, with the product-specic requirements models developers and community
participants work with, even if these models are implicit. One reason discussions
of quality requirements were difficult to identify is that, without explicit models,
these requirements are not properly considered or are applied haphazardly. We
need to establish a mapping between the platonic ideal and the reality on the
ground. This will allow us to compare maintenance strategies for product quality
requirements across domains, to see whether strategies in, for example, Gnome,
can be translated to KDE, Apple, or Windows software.
References
1. Anton, A.I., Potts, C.: Functional paleontology: system evolution as the user sees it.
In: International Conference on Software Engineering, Toronto, Canada, pp. 421–430
(2001)
1 Introduction
Requirements engineering (RE) is a central part of software development, but despite
this fact, actual knowledge of the RE process is lacking [1]. A large part of RE re-
search concentrates on methods or techniques supporting a single activity instead of
promoting an integrated view of RE and lacks reports on the connections between
various good practices [1, 2].
Several best RE practices have been identified that contribute to software project
success [1]. However, having the best RE practices in place may not be enough. A
best RE practice may include various techniques. The proper selection and tailoring of
2 Related Work
It is important first to find out the strengths and weaknesses of individual practices in
order to understand how these practices can work together. In this section, the previ-
ously reported characteristics of specification templates, collaborative workshops, and
peer reviews are presented.
project team and improve document quality. The reason for these advantages is the
early involvement of the team in documentation work. RaPiD7 speeds up the docu-
ment creation process in terms of calendar time.
RaPiD7 is similar to a more widespread method called Joint Application Develop-
ment (JAD), which was developed at IBM in 1977 [16]. JAD enables technical and
business specialists to learn about each other's domain knowledge, improves commu-
nication among interested parties, facilitates consensus management, and increases
user acceptance of specifications [17].
One challenge these two techniques present is the common time required for work-
shop meetings. For example, a JAD procedure typically lasts for three to five days
[18] and stakeholders have difficulty allocating common time. In fact, it was neces-
sary to adapt the JAD technique in some organizations because the staff were some-
times unable or unwilling to commit to full-time participation in JAD workshops [19].
Cockburn [20] proposes a more informal process for a collaborative workshop, in
which people work in a full group when there is a need to align or brainstorm and use
the rest of their time in pairs or alone. Cockburn explains that a group is able to brain-
storm and reach a consensus effectively, but when the group is split, more text is
produced.
Peer reviews are a core practice of requirements quality control [2, 21]. A peer review
consists of someone other than the author of a document examining it in order to
discover defects and identify improvement opportunities [22]. Eventually, a software
development organization may need to acquire deeper knowledge from the types,
formalities, and feasibilities of peer reviews. In particular, mature software develop-
ment organizations are advised to develop capability to determine what types of peer
reviews are conducted and to tailor the peer reviews in the organizations software
projects [21]. Wiegers has defined several review techniques and types, both formal
and informal [23], which are listed below using his definitions.
The most formal technique of a review, inspection, has several characteristics that
distinguish it from other review techniques. For example, a trained moderator leads
meetings and co-operates with a trained team. The moderator defines the goals, col-
lects quality data, and distributes results using a reporting process.
A team review is slightly more informal and imprecise than an inspection. Team
reviews concentrate more on detecting defects than preventing them. A team review
may be chosen if no trained inspection leaders are available.
In a walkthrough, the author of the document explains it to colleagues and asks for
their feedback. This review type is generally informal and does not involve data col-
lection and reporting. However, the process steps and the role of each participant may
be clearly defined.
In a passaround, the author of the document sends it to several colleagues and
gathers their feedback. The passaround technique is useful, for instance, for obtaining
ideas and corrections for a new project plan.
In a peer deskcheck, only one checker examines the document. While this review
technique requires the smallest amount of resources, it is only appropriate for prod-
ucts that do not have very high quality expectations or are not to be reused.
In an ad hoc review, the author of the program presents a problematic part of the
design to a fellow worker and asks for help. Although quite informal, this review type
is useful for short and tricky cases.
A team can identify the strengths and weaknesses of the review types [23]. The
purpose of this is to select the proper review type for each case with regard to
the organizational culture, time constraints, and business objectives. In particular, the
team is advised to select the least expensive review type that fulfills the objectives of
the review [22].
3 Research Design
The goal of this study is to present lessons learned from the integration of specifica-
tion templates, collaborative workshops, and peer reviews. The study was conducted
using an action research approach in five Finnish companies. The data were collected
from the case study companies during a period of ten years (1999 to 2009) and
analyzed iteratively in three phases.
In order to gain a deep understanding of the three practices and their integration, we
applied an action research approach. This research method was selected for two rea-
sons: it has a unique ability to link research to practice, and as a qualitative method, it
is also effective for explaining what is happening in a company [24]. The action re-
search approach allows researchers to address complex real-life matters and study
selected issues in detail [25]. Additionally, an industry-as-laboratory research
approach, where researchers identify problems through close involvement with indus-
trial projects and create and evaluate practices addressing the problem, is suggested in
[26]. This lets researchers emphasize what people actually do or can do in practice,
rather than what is possible in principle.
To access insider and historical data, as well as to engage practitioners in research,
we also applied the insider action research approach [27]. In the insider action
research approach, some of the researchers are internal members of practitioner or-
ganizations. As internal members of the organization, practitioner-researchers have
the opportunity to collect data that are richer than what they would collect as external
researchers. Gummesson [28] points out that a lot of information is stored in the
minds of practitioners, who have often undergone central and dramatic changes.
Therefore, Gummesson urges practitioners to act as researchers and reflect on what
they had learned retrospectively.
Our research was conducted in five Finnish companies, which are introduced in
Table 1. Three of the companies were of medium size, one was small, and one was
large. Companies A, C, and E are internationally known and have a significant global
market share in their fields. These three companies focus mainly on solutions devel-
oped for a large number of customers. Company B provides pension insurance
A B C D E
Specification templates X X X X X
Collaborative Workshops X X
Peer Reviews X X X X X
Our study was based on the following question: what are the lessons learned from
the integration of these three practices? Applying the industry-as-laboratory re-
search approach [26], we divided the question into two more specific questions as
follows: what are the problems faced with the use of the three practices? and what
kind of approach supports the integrated use of the three practices? Figure 1
illustrates the three main phases of the study: 1) identification of problems faced in
software projects and development of the integration approach, in Company A, 2)
retrospective analysis of the problems faced in the five companies and refinement
of the integration approach, and 3) validation of findings and refinement of the inte-
gration approach, in Company B.
Phase 1 was performed in Company A between the years 2003 and 2006. Per-
ceived problems of eleven software projects were first identified, and the related im-
provement ideas were collected and analyzed. Subsequently, a preliminary approach
to the integration of the three RE practices was developed and piloted iteratively.
The goal of Phase 2 was to compare the preliminary results gained from Company
A during Phase 1 with the experiences from the other four companies (B, C, D, and
E). In this phase, retrospective analysis was used to examine previously collected
data. The data had been collected in three ways. First, four of the authors had worked
in one or two of the case study companies (A, B, D, and E), participating in software
development projects and requirements process improvement work. Second, the
authors conducted two research projects with the case study companies during 1999-
2005. Within these research projects, data from Companies A, C, and E was gathered.
Third, the authors interviewed a person who had been in charge of specification
templates and peer reviews in Company A ten years ago. Based on the analysis, the
authors refined the findings related to the problems faced with the use of the three
practices and reflected on the approach to integration.
Phase 3 was conducted with Company B from 2006 to 2009. The company's goal
was to develop a consistent yet tailorable set of RE practices that could be applied
company-wide. This was accomplished by a series of 12 workshops and 19 meetings,
where the findings of the previous phases were built on. The result of these activities
was an approach that enables an organization to tailor and integrate specification
templates, collaborative workshops, and peer reviews into a coherent entity. The
approach was piloted during its development in 9 software projects. In addition, the
company organized two training sessions, in which 28 project managers, requirements
specialists, and group leaders participated. After the training, we asked for partici-
pants' comments on how suitable they perceived the approach to be for the types of
projects they typically participate in. We used both a feedback form and group discus-
sion to collect the data. In this phase, the collected data were analyzed and the
findings were clustered into the previous findings iteratively. The findings were vali-
dated and new findings were merged with them. The final findings are described as
the lessons learned. These lessons are described in the following sections.
Table 3 summarizes the data collection activities performed in the case study
companies. The results of this study are based on the data collected through observa-
tions, formal semi-structured interviews, informal conversations, the analysis of
requirements specification templates, the analysis of requirements documents, and
questionnaires.
A B C D E
Observation X X X X X
Interviews X X X X
Informal conversations X X X X X
Analysis of specification templates X X X X X
Analysis of requirements documents X X X X X
Questionnaires X X X
We apply the explanations of Yin [29] concerning construct and external validity. In
our study, a threat to construct validity is the possibility that we were not able to
correctly collect and evaluate the problems related to the use of the three practices and
the benefits and success factors from applying the integration approach. As a result,
the inferences we present as lessons learned might not represent reality in the companies. The threat
to external validity is the possibility that lessons we have learned cannot be general-
ized to other software development organizations.
To reduce the threat to construct validity of the study, we used multiple sources
of evidence and triangulation. We used a number of information sources and data
collection techniques. We applied triangulation of data sources and data collection
techniques by utilizing interviews, informal conversations, participant observation,
and document analysis. In addition, the study covers a long period of time, which im-
proves the construct validity of our findings, as it was possible to analyze and validate
the findings at different times. Finally, key informants from two companies reviewed
our findings several times.
To reduce the threat to external validity of the research results, the study involved
five separate case study organizations of different characteristics, such as size,
solutions, and business environments. The integration approach was developed and
piloted in two companies that have very different types of software development and
business environments and solutions.
meeting was different than they had expected or wanted. In another company, reviews
were part of the development process and the participants typically perceived the
reviews as little more than a rubber stamp at the end of the software development
procedure. On the other hand, collaborative workshops were not a defined practice in
the case study companies. Undefined workshops were applied in one company and
this was identified as a reason for the frustration of the participants. In particular, the
goals of the workshops were not communicated and this meant that the participants
had expectations of the course of action and outcomes of workshops that differed
from those of the facilitators.
A setup workshop is a collaborative and facilitated workshop used for planning and
communicating how to utilize the three practices in the creation of requirements
specification in software projects. The setup workshop was originally developed and
piloted in Company A, and later refined and piloted in Company B. The setup work-
shop seems to be a key component for integrating the three practices. The current
version of the setup workshop is presented in the following and illustrated in Figure 2.
Fig. 2. Using a setup workshop for planning the integrated deployment of the three RE
practices
Although specification templates, collaborative workshops, and peer reviews have all
been recommended, they are typically treated as independent RE practices in the
literature. Our findings indicate that the independence of the practices leads to several
problems in practice. The use of specification templates often leads to individual
specifying work, resulting in relatively long documents. Peer reviews are typically
performed too late and reviewers are not motivated to contribute.
The integration of the three practices was identified as a rational way to reduce
such problems. As each of the three practices has strengths and weaknesses that partly
overlap, using the practices in a tailored and intertwined way helps the team to reduce
the negative influences of their weaknesses. As a means to perform integration, our
findings suggest using the setup workshop for planning the workflow of specifying
requirements, identifying project-specific ways to collaborate, and selecting and tai-
loring the appropriate types of practices. The use of a setup workshop can improve the
applicability of specification templates and promote early information sharing
between stakeholders. Proper planning and communication of the goals of each col-
laborative workshop and peer review should reduce false expectations and frustration
on the part of participants.
The proposed integration approach has limitations and presents challenges when it is
adopted in a software organization. Software developers may not easily adopt the
proposed integration approach, if they already oppose meetings, documentation, and
peer reviews. In addition, the tailoring of the RE practices, as a key element of the
integration, requires more RE skills than the use of standardized practices. A software
organization needs to consider whether they have the necessary skills or willingness
to acquire them. Furthermore, the integration approach will increase the variability of
the used RE practices in the software organization. The work practices and require-
ment specifications in different software projects will become less comparable. Con-
sequently, a process owner and management may find it more difficult to observe the
progress and quality of software projects.
The significance of our findings remains to be confirmed in future studies. The role of
certain authors as active participants in the companies may have affected the construct
validity of the results. It should be noted that the development of the integration
approach partly occurred as everyday work in two companies and was not solely
organized to support the research purpose. Furthermore, we were able to apply the
integration approach only in two out of the five companies. This weakens the external
validity of the findings. New studies in several organizations are needed to evaluate
whether the integration approach really addresses the problems faced with the use of
the three RE practices.
While this paper presented the three RE practices as integrated, in the future it would
also be worth studying how to integrate larger sets of RE practices that are adopted
for use in software companies. Indeed, evaluating the usefulness of the setup work-
shop for planning and tailoring the entire RE process of a software project appears to
be a promising idea.
References
1. Hofmann, H.F., Lehner, F.: Requirements Engineering as a Success Factor in Software
Projects. IEEE Software 18(4), 58–66 (2001)
2. Katasonov, A., Sakkinen, M.: Requirements Quality Control: a Unifying Framework.
Requirements Engineering 11(1), 42–57 (2006)
3. Tsumaki, T., Tamai, T.: Framework for Matching Requirements Elicitation Techniques to
Project Characteristics. Software Process Improvement and Practice 11(5), 505–519
(2006)
4. Fitzgerald, B., Russo, N.L., O'Kane, T.: Software Development Method Tailoring at
Motorola. Communications of the ACM 46(4), 64–70 (2003)
5. Uusitalo, E.J., Komssi, M., Kauppinen, M., Davis, A.M.: Linking Requirements and Test-
ing in Practice. In: Proceedings of the 16th IEEE International Requirements Engineering
Conference, pp. 265–270. IEEE CS Press, Barcelona (2008)
6. IEEE Recommended Practice for Software Requirements Specifications (IEEE Std-830),
pp. 207–244 (1998)
7. Robertson, S., Robertson, J.: Mastering the Requirements Process, 2nd edn. Addison-
Wesley, Boston (2006)
8. Sommerville, I., Sawyer, P.: Requirements Engineering: A Good Practice Guide. Wiley,
Chichester (1997)
9. Davis, A.M.: Just Enough Requirements Management: Where Software Development
Meets Marketing. Dorset House Publishing, New York (2005)
10. Wiegers, K.E.: Software Requirements, 2nd edn. Microsoft Press, Redmond (2003)
11. Brockmann, R.J.: Where Has the Template Tradition in Computer Documentation Led Us?
In: Proceedings of the 2nd Annual International Conference on Systems Documentation,
pp. 16–18. ACM, Seattle (1983)
12. Kaner, C., Bach, J., Pettichord, B.: Lessons Learned from Software Testing: A Context-
Driven Approach. Wiley, New York (2002)
13. Power, N., Moynihan, T.: A Theory of Requirements Documentation Situated in Practice.
In: Proceedings of the 21st Annual International Conference on Documentation, pp. 86–92.
ACM, San Francisco (2003)
14. Gottesdiener, E.: Requirements by Collaboration: Getting It Right the First Time. IEEE
Software 20(2), 52–55 (2003)
15. Kylmäkoski, R.: Efficient Authoring of Software Documentation Using RaPiD7. In:
Proceedings of the 25th International Conference on Software Engineering, pp. 255–261.
IEEE Computer Society Press, Portland (2003)
16. Carmel, E., Whitaker, R.D., George, J.F.: PD and Joint Application Design: a Transatlantic
Comparison. Communications of the ACM 36(6), 40–48 (1993)
17. Purvis, R., Sambamurthy, V.: An Examination of Designer and User Perceptions of JAD
and the Traditional IS Design Methodology. Information & Management 32(3), 123–135
(1997)
18. Wood, J., Silver, D.: Joint Application Development, 2nd edn. Wiley, New York (1995)
19. Davidson, E.J.: Joint Application Design (JAD) in Practice. The Journal of Systems and
Software 45(3), 215–223 (1999)
20. Cockburn, A.: Writing Effective Use Cases. Addison-Wesley, Upper Saddle River (2001)
21. Chrissis, M.B., Konrad, M., Shrum, S.: CMMI: Guidelines for Process Integration and
Product Improvement. Addison-Wesley, Boston (2003)
22. Wiegers, K.E.: Peer Reviews in Software: A Practical Guide. Addison-Wesley, Massachusetts
(2001)
23. Wiegers, K.: When Two Eyes Aren't Enough. Software Development 9(10), 58–61 (2001)
24. Avison, D., Lau, F., Myers, M.D., Nielsen, P.A.: Action Research. Communications of the
ACM 42(1), 94–97 (1999)
25. Avison, D.E., Baskerville, R., Myers, M.D.: Controlling Action Research Projects.
Information Technology & People 14(1), 28–45 (2001)
26. Potts, C.: Software-Engineering Research Revisited. IEEE Software 10(5), 19–28 (1993)
27. Coghlan, D.: Insider Action Research Projects: Implications for Practising Managers.
Management Learning 32(1), 49–60 (2001)
28. Gummesson, E.: Qualitative Methods in Management Research, 2nd edn. Sage Publica-
tions Inc., Thousand Oaks (2000)
29. Yin, R.K.: Case Study Research: Design and Methods, 3rd edn. Sage Publications Inc.,
Thousand Oaks (2003)
A Case Study on Tool-Supported
Multi-level Requirements Management
in Complex Product Families
1 Introduction
This basic idea of product line orientation well suits the situation in auto-
motive industry, with its huge product ranges and its extensive variability. The
same applies to many other industrial domains of software-intensive systems.
However, when applying traditional product line methods and techniques to a
highly complex product family, such as that of a global automotive manufacturer,
the engineer is faced with a dilemma: managing everything as a single, gargan-
tuan product line is virtually impossible owing to its enormous complexity; but
when dividing the range of available products into several smaller independent
product lines, systematic reuse and strategic variability management across these
portions, two of the key benefits of product line orientation, are lost. It is the
purpose of the multi-level approach, as presented in [3,4], to avoid this strict al-
ternative by offering a compromise between a single global and several smaller,
independent product lines. With this technique, it is possible to split up a huge
product line into smaller, independent sublines but still strategically steer
their commonalities and variabilities on a global level.
In this paper we present the results of an automotive case study of tool-
supported multi-level requirements management and we will discuss the experi-
ences and the lessons learned from it and show how they motivated refinements
and extensions to the existing approach and tool. After a brief overview of the
multi-level approach in the coming section and a short description of the applied
tool-support in Section 3, we describe the background, scope and the quantita-
tive results of the case study in Section 4. In the main part of this article we
will then describe in detail the experiences from the case study and the resulting
refinements and extensions to the approach and its tool-support (Section 5). The
last section finishes with a summary and several concluding remarks.
The basic intention of the multi-level approach is to allow for strategic planning
and to manage development across several smaller, independent product lines,
without introducing a large, rigid product line infrastructure on the global level.
To achieve this, we turn to the development artifacts of two or more independent,
lower-level product lines and initially assume that these artifacts are defined
and evolved independently for each lower-level product line. To now allow for a
coordination on the global level, we introduce an additional artifact of the same
type which has the sole purpose of making proposals for the content of the lower-
level artifacts, thus serving as a template for them; the individual proposals in
this template may or may not be adopted within the lower-level artifacts. The
template artifact on the global level is called a reference artifact, whereas the
lower-level artifacts are called referring artifacts; whenever a lower-level artifact
diverges from a proposal in its reference artifact, we speak of a deviation.
In addition to this, the proposals in the reference artifact are marked as op-
tional or obligatory, which allows deviations in the referring artifacts to be
recognised as legitimate deviations (deviations from an optional proposal) or
illegitimate deviations (deviations from an obligatory proposal). If
Fig. 1. The tool for multi-level Doors modules. The lower screen shot shows the feed-
back of the conformance-check algorithm for each object: its conformance, differences
to its reference object (if any), review status, and recent changes.
specifications, defined and managed within the commercial tool Rational Doors.
To denote this special instantiation of the multi-level approach we speak of multi-
level Doors modules, instead of multi-level requirements artifacts, following the
terminology in Doors where a single requirement specification container is called
a module. The tool support for this specialized multi-level technique was imple-
mented as an extension to Doors. It has been applied in the N-Lighten case study
as mentioned above.
The reason for building on an existing commercial tool was that this tool is
widely used in the automotive industry and it provides a very flexible extension
mechanism in the form of the Doors Extension Language (DXL), which actually
constitutes a complete programming language.
A screen shot of Doors running the tool for multi-level requirements manage-
ment is shown in Figure 1. The upper window shows a reference module which is
referred to by the module in the lower window, the specification presented in the
upper window thus serves as a template for the one in the lower window. As can
be seen in the upper window, the deviation permission attributes all go in one
single column, i.e. a single Doors custom attribute was created for them. The
attribute lists the values of all deviation permission attributes that differ from
the default value. In the lower window, on the other hand, it can be seen that
the feedback of the algorithm checking the conformance of the referring module
is also presented in a dedicated column/attribute, called Conformance. The con-
formance state for each object can be found here. Conformance violations are
highlighted by a special background color.
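The conformance feedback described here can be pictured with a purely illustrative sketch. The real tool is implemented as a DXL extension inside Doors; the classes, attribute names, and the deliberately simple deviation rule below are assumptions, not the tool's actual logic.

# Illustrative Python sketch: a referring object conforms unless it deviates from
# its reference object in a way the deviation permissions forbid. All names here
# are assumptions; the real implementation is a DXL extension inside Doors.
from dataclasses import dataclass, field

@dataclass
class ReferenceObject:
    text: str
    # kinds of deviation permitted at the reference level, e.g. {"textual"}
    permitted_deviations: set = field(default_factory=set)

@dataclass
class ReferringObject:
    text: str
    reference: ReferenceObject

def conformance(obj: ReferringObject) -> str:
    """Return 'conformant', 'legitimate deviation', or 'ILLEGITIMATE deviation'."""
    if obj.text == obj.reference.text:
        return "conformant"
    if "textual" in obj.reference.permitted_deviations:
        return "legitimate deviation"
    return "ILLEGITIMATE deviation"

base = ReferenceObject("The system shall lock the doors above 10 km/h.")
variant = ReferringObject("The system shall lock the doors above 7 km/h.", base)
print(conformance(variant))  # ILLEGITIMATE deviation (no textual change permitted)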
In other implementations of the multi-level approach mentioned in [4], the
reference link from a referring model to a reference model is dened inside the
referring model. In contrast, in the implementation presented here the links re-
ferring to reference modules are kept in one or several separate Doors modules,
called synchronization module(s), solely dedicated to the purpose of defining
what modules are used as reference modules and what other modules are refer-
ring modules that have to conform to them. This way, it is not necessary to put
special objects into referring modules.
Further details on the tool will be given in Section 5 where the extensions
resulting from the case study will be described.
[Figure: the Common Base module is used as reference module for the Platform A
and Platform B modules, which were originally created from it by copy & paste.]
This initial setting perfectly matches the intention of subscoping and the
multi-level approach: the overall product line, comprised of platform A and
platform B cars, is split up in two sublines, namely platform A and platform
B, and each subline is enhanced independently; the multi-level management
now allows the commonalities and dierences between these two sublines to be
tracked and strategically coordinated without the necessity of introducing a rigid
product line organization. As one of the most important intentions and bene-
fits of the multi-level approach, the variability between platform A and platform
B no longer appears as variation points within the platform A and platform
B specifications, which substantially reduces the complexity of these individual
specifications.
In order to set up a multi-level hierarchy with all three specications, an
auxiliary synchronization module (as described in Section 3) which provided the
tool prototype with the necessary meta-information had to be created. This is
very straightforward and mainly defines the hierarchy of reference and referring
Doors modules involved, in this case the base module as a single reference
module and the platform A and platform B modules as referring modules. Then,
the reference links between the requirements in the platform A and platform
B modules and their corresponding reference requirements in the base module
had to be established and defined in Doors in a form suitable for the tool
prototype. This was a more daunting task, as is discussed below. Together, these
preparations allowed the two referring specifications to be managed according
to the multi-level approach, for example to define deviation permissions in the
base module, to reveal illegitimate deviations in the platform A and platform
B modules or to propagate changes, e.g. a newly added object, from one of the
referring modules to the base module.
The platform A module contained 1,351 objects whereas the platform B
module contained 3,501. This alone shows the remarkable difference in sys-
tem complexity between a low-end and a medium-class vehicle, not to mention
luxury-class models. Table 1 presents some statistics of the case study that fur-
ther characterizes the specification modules involved. Out of such a statistic,
several interesting facts become immediately obvious, an important benefit of
the multi-level concept. For example, the number of non-referring objects, i.e.
objects without a reference object, adequately measures how much additional
information is introduced in a subline, compared to the base module. In order
The ratio cov_R(A) = n_A^ref / n_R, where n_A^ref is the number of objects in the
referring module A that have a reference object and n_R is the number of objects in
the reference module R, is called the R-coverage of A. Analogously, the ratio of
objects in A without a reference object to all objects in A, inn_R(A) = n_A^nonref / n_A,
is called the R-innovation of A.
² Please note that this definition assumes that the advanced concepts of split and
merge, as introduced later in this article, are not allowed; however, even if split and
merge occurs in an artifact this definition is usually sufficiently accurate.
Doors Modules
Base Platform A Platform B
Objects, thereof ... 1,908 1,351 3,501
- without reference object 590 43.7% 1,714 49.0%
- with reference object 761 56.3% 1,787 51.0%
Coverage 0.399 39.9% 0.937 93.7%
Innovation 0.437 43.7% 0.490 49.0%
Deviations:
- Refinement 33 2.4% 107 3.1%
- Reduction 24 1.8% 18 0.5%
- Move 5 0.4% 36 1.0%
- Reorder 2 0.1%
- Textual Changes 220 16.3% 241 6.9%
- Merge 2 0.1%
- Split 65 4.8% 16 0.5%
links based on an appropriate similarity measure and assist the user in re-
viewing and adapting them by way of an appropriate on-screen presentation.
While in the case study some ad-hoc scripting was used to mechanize some of
the link creation, the design of a more sophisticated linking assistant would
be relatively straightforward; experience in the model transformation field,
where a similar issue exists, could be applied. It is realistic to assume that
the linking problem could be reduced quite substantially in this way.
3. However, even without such a linking assistant, the task of manually es-
tablishing a substantial number of reference links is absolutely manageable,
as the case study clearly showed. And this was the case, even though the
persons defining the reference links did not know the content of the three
specifications before starting work on this case study.
Therefore this difficulty does not actually represent a critical problem for this
approach.
Split and Merge. One of the most important observations from the case study
was that a splitting and merging of objects in a referring module is of great prac-
tical relevance, leading to referring objects with more than one reference object
and vice versa. Let us first investigate the precise meaning of split and merge in our
context. When a Doors object has more than one reference object, this means
that two separate objects from the reference level were semantically merged into
a single object in the referring module. Since this in itself always represents a
change in the lower-level artifact with respect to the reference artifact, we can
perceive this as a new form of deviation called merge:
Definition. When a referring element has more than one reference ele-
ment, the referring element is assumed to comprise the semantic meaning
of all reference elements. This form of deviation is called a merge.
Similarly, several referring objects may point to one and the same object as
their reference object. The usual practical motivation for this is that a single
object's semantic meaning is distributed within the referring module among
several distinct objects. Again, this necessarily constitutes a deviation in itself:
Definition. When several referring elements have the same reference
element, the referring elements are assumed to jointly comprise the se-
mantic meaning of the reference element. This form of deviation is called
a split.
These two novel forms of deviation now complement the types of deviation for-
mally defined in Section 4.3 of [4].
Having precisely defined split and merge, we can now turn to the technical
details of the concept. Most importantly, the fact that splitting and merging
is now allowed has an impact on the precise definition of the other deviations.
For example, if an object o with name "abc" has two reference objects oA with
name "abc" and oB with name "xyz", how do we decide if a change in o's name
has occurred? As a general rule, we use a logical disjunction of the old definition
evaluated separately for each referring object (in case of a split) or each reference
object (in case of a merge). In the given example this means that we assume
that o's name has changed if its name is different from oA's name or its name
is different from oB's name. In the above example this would be true. With this
general semi-formal rule, the logical constraints from our earlier publications can
be straightforwardly applied to split and merge as well (cf. Table 2 in [4]).
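The disjunction rule just described can be stated as a one-line sketch; the object structure and attribute names are assumed, not taken from the tool.

# Sketch of the disjunction rule: with merge allowed, a referring object counts
# as "name changed" if its name differs from ANY of its reference objects' names.
def name_changed(referring_name, reference_names):
    """Logical disjunction of the original check over all reference objects."""
    return any(referring_name != ref_name for ref_name in reference_names)

# The example from the text: o is named "abc", its references oA ("abc") and oB ("xyz").
print(name_changed("abc", ["abc", "xyz"]))  # True: o's name differs from oB's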
Review Status. This was introduced to make the multi-level approach manage-
able for very large referring modules, which are typically reviewed or reworked
over such a long period of time (e.g. weeks or even months) that new changes in
the reference module occur during the review. To provide support for such use
cases, each reference relation has two status attributes attached: ReviewStatus
and NewChanges. The first attribute has the default value "not reviewed" and
can be manually set to "reviewed", for example after a review of the conformance
analysis and a possible adaptation of the synchronization settings. The date of
this setting is displayed in the attribute (last review date). If the correspond-
ing reference object has been changed since the date of the review, then (after
synchronization) the attribute ReviewStatus is set to "need to revisit", and the
attribute NewChanges contains a delta description of what has been changed. As
previously, the user can then inspect these changes and reset the review status
to reviewed once more.
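A rough sketch of this bookkeeping is given below; the attribute names follow the text, while the synchronization logic and class structure are assumptions.

# Illustrative sketch of the ReviewStatus / NewChanges bookkeeping described above.
from datetime import date

class ReferenceRelation:
    def __init__(self):
        self.review_status = "not reviewed"   # default value
        self.last_review = None
        self.new_changes = ""                 # delta description of recent changes

    def mark_reviewed(self):
        self.review_status = "reviewed"
        self.last_review = date.today()
        self.new_changes = ""

    def synchronize(self, reference_changed_on, delta):
        """If the reference object changed after the last review, flag the
        relation so the user revisits it."""
        if self.last_review and reference_changed_on > self.last_review:
            self.review_status = "need to revisit"
            self.new_changes = delta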
In addition to these rather conceptual observations, there are also several more
technical aspects to note:
atomization [7], this partial solution proved sufficient for practical use. However,
in full-fledged tool support, a more sophisticated comparison mechanism for the
most common types of OLE object, probably diagrams and tables imported from
Microsoft Word or Excel, is desirable.
Visibility of Reference Links. Since the tool DOORS does support filtering
on the objects of a requirements container but not on the (incoming or outgoing)
links displayed in a module, it becomes confusing for the user to see which links
leaving or entering his module are reference links connected to the multi-level
system and which links are outside this system. While this is a general DOORS
issue, we found that the Multi-Level approach should at least not aggravate it.
Therefore, the multi-level tool support allows the user to hide all reference
links in an additional special attribute and only display them as required.
Conformance and Review Summary. Initially, the tool was designed only
to present the conformance state inside the referring module (as a dedicated
Doors attribute of each referring object). While this is the most obvious place
for it and suitable in many use cases, it has turned out that it is often very
convenient to have an overview of all conformance states in all referring
modules at a glance. Therefore, a functionality was implemented in the tool to
show a special view summarizing the conformance states in all its referring mod-
ules: for each object in the reference module this view displays the conformance
state within the reference module and the review status for its referring object
in each referring module, each in a separate column (cf. Figure 3).
Fig. 3. A reference module showing a conformance summary for several referring modules
links (as discussed above), but for any two particular objects it can usually be
decided easily and definitely if one of the two should serve as the reference object
of the other. This was the case, even though the structure of the base module
had been changed substantially in the subline modules; even if an object was
put in a different section, it was still possible to clearly decide which reference
object it should receive.
This suggests that a reference link is not an artificial concept, introduced
solely to make the multi-level approach work, but rather it captures an actual
conceptual relation which naturally occurs in real-world use cases.
Adequate Usability. The overall usability of the tool for multi-level Doors
modules proved to be quite reasonable. Even though it was not specifically de-
signed for usability in the first place, it was still quite convenient for performing the
typical actions, such as spotting deviations, editing deviation permissions, finding
out whether deviations are illegitimate, or propagating deviations up and down.
6 Conclusion
This article presented the so-called N-Lighten case study which was conducted on
industrial specications and by engineers who were not involved in the
development of the multi-level approach. This means that the approach had
to be taught to them beforehand, putting the understandability and feasibility
of the concepts as well as the related tool support to the test.
Initially, the tool described in Section 3 was mainly intended as a research
prototype for experimentation purposes. Thanks to the above study, some inter-
nal case studies and the resulting refinements and extensions, part of which were
described in Section 5, we are convinced that both the concepts as well as the
tool support are now ready for application in industrial development projects.
The tool is publicly accessible on a website, together with documentation and
a tutorial [8].
References
1. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns.
Addison-Wesley, Reading (2002)
2. Pohl, K., Böckle, G., van der Linden, F.: Software Product Line Engineering:
Foundations, Principles and Techniques. Springer, Heidelberg (2005)
3. Reiser, M.-O., Weber, M.: Managing highly complex product families with
multi-level feature trees. In: Proceedings of the 14th IEEE International Require-
ments Engineering Conference (RE 2006), pp. 146–155. IEEE Computer Society,
Los Alamitos (2006)
4. Reiser, M.-O., Weber, M.: Multi-level feature trees: a pragmatic approach
to managing highly complex product families. Requirements Engineering 12(2),
57–75 (2007)
5. Reiser, M.-O.: Core concepts of the Compositional Variability Management frame-
work (CVM). Technische Universität Berlin, Technical Report, no. 2009-16 (2009)
6. Chappell, D.: Understanding ActiveX and OLE. Strategic Technology Series.
Microsoft Press, Redmond (August 1996)
7. Pohl, K.: Requirements Engineering - Grundlagen, Prinzipien, Techniken. dpunkt.verlag
(2007)
8. DOORS Multi-Level Tool Web-Site (2010), www.mule-re.org
A Domain Ontology Building Process
for Guiding Requirements Elicitation
1 Introduction
The RE process [17] starts out with a specification which is informal, opaque, and
dominated by personal views, while the goal is to have a specification which is for-
mal, complete, and reflects the stakeholders' common view. The use of an ontology
can help to tackle these challenges: from opaque to complete because an ontology can
encode knowledge about the domain, thus ensuring that important requirements are
not forgotten, and from personal to common view because an ontology defines a stan-
dard terminology for the domain, which mitigates misunderstandings about terms. If
the ontology is defined in a formal language, it will also help regarding the formality
dimension. There has been an increasing interest in using ontologies to aid the RE
process.
Ontologies are "specifications of a conceptualization" [4] in a certain domain. An on-
tology seeks to represent basic primitives for modeling a domain of knowledge or
discourse. These primitives are typically concepts, attributes, and relations among
concept instances. The represented primitives also include information about their
meaning and constraints on their logically consistent application [5]. A domain ontol-
ogy for guiding requirements elicitation depicts the representation of knowledge that
spans the interactions between environmental and software concepts. It can be seen as
a model of the environment, assumptions, and collaborating agents, within which a
specified system is expected to work. From a requirements elicitation viewpoint,
domain ontologies are used to guide the analyst on domain concepts that are appropri-
ate for stating system requirements.
There are a number of research approaches to elicit and analyze domain requirements based on existing domain ontologies. For example, Lee and Zhao [13] used a domain ontology and a requirements meta-model to elicit and define textual requirements. Shibaoka et al. [18] proposed GOORE, an approach to goal-oriented and ontology-driven requirements elicitation. GOORE represents the knowledge of a specific domain as an ontology and uses this ontology for goal-oriented requirements analysis [12]. A shortcoming of these approaches is the need for a pre-existing ontology; to our knowledge there is no suitable, at least semi-automated method for building this ontology for requirements elicitation in the first place. In industrial settings, the task of building domain ontologies from scratch can be daunting, mostly due to the size of the technical standard documents that need to be interpreted by domain experts and the wide range of domain concepts that will be the input to such ontologies. The domain ontology building task can therefore benefit greatly from tool support.
This paper explores the challenge of building a domain ontology that is sufficient for guided requirements elicitation. Firstly, we investigate an approach for building domain ontologies from existing technical standards with which the specified requirements need to be compliant. Our investigation is based on a set of heuristics used for extracting semantic graphs from textual technical standards to generate compatible baseline domain ontologies. Secondly, we present an evaluation of the feasibility of our approach and provide insights into the challenges of semi-automatically building domain ontologies from natural language texts. The remainder of this paper is structured as follows: Section 2 presents related work and motivates the research issues; Section 3 discusses the characteristics of a suitable ontology for requirements elicitation and proposes an approach for achieving such ontologies. Section 4 presents the evaluation of our approach and a discussion of lessons learned during this research. Finally, Section 5 concludes the paper and presents some ideas for further work.
Although Kof's and Flores' methods are based on analysing natural language texts to extract an ontology that can subsequently be used for requirements elicitation, no analysis has been done on the suitability or usefulness of the resulting ontology for this purpose. The association of concepts in a domain ontology can be described by its taxonomy or by the use of axioms. The taxonomy is a hierarchical system of concepts, while axioms are rules, principles, or constraints guarding the relations amongst concepts. Furthermore, the level of granularity to which the axioms are specified is highly influenced by their intended use within the ontology [6]. From the viewpoint of using domain ontologies for requirements elicitation, axioms specify the extent to which such an ontology can be useful, i.e., the categories of questions to which the ontology can provide answers.
Again, existing related work lacks insight into the potential challenges of extracting a domain ontology from a textual source. For instance, such text might not sufficiently describe the domain of concern or might contain terms that are not unique to the domain being described. Furthermore, an ontology derived from analyzed text can contain concepts and relations that are unique to the domain of interest but do not contribute to the requirements elicitation process. In such a case, the analyzed text contains valid domain terms that do not necessarily contribute to a useful domain ontology. In this research, we argue that each of these challenges needs to be investigated with respect to how it can be mitigated.
To address the research issues identified in closely related work, this paper discusses the semantic features of suitable domain ontologies for requirements elicitation and proposes a process for systematic and efficient domain ontology building. We then evaluate the feasibility of our approach based on a real-world industrial use case, by analyzing text from technical standards.
guiding the analyst in determining the relevance of a prescribed trace inference from the ontology to the system being specified.
Qualified identification of relations: Ontologies used to support computers in reasoning will normally identify relations by the use of a single so-called interesting or performative verb. These are verbs whose action is accomplished merely by saying them. Performative verbs such as "requires", "sends", or "requests" explicitly convey the kind of act being performed by a concept by virtue of an involved relation. But for an ontology intended for guided requirements elicitation, the semantic implications of such performative verbs are normally described by an adjoining qualifier such as an adjective or conjunction. Thus, it is more insightful to name the relation between agent and message using the identifier "periodically sends" rather than only using "sends". In this research we explore the use of performative verbs in combination with their qualifiers to semantically identify relations between concepts.
Temporal and spatial expressions: For using domain ontologies for requirements elicitation, we need insight into the temporal and spatial implications of the relations that exist between concepts. For example, assume A, B and C are concepts in a domain and the description "A requires B during C" is a feature used to characterize the domain. For semantic insight during requirements elicitation, it is important that the explicit relation that exists between A and B as well as the temporal relation that A has with C are captured by the ontology and made obvious to the analyst (a minimal representation of such qualified relations is sketched below).
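To make the intended representation concrete, the following minimal sketch shows one way such qualified relations could be stored. It is an illustration only, not the authors' data model; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QualifiedRelation:
    """A relation between two concepts, keeping the performative verb
    together with its qualifier and any temporal/spatial context."""
    source: str                              # e.g. "A" or "agent"
    verb: str                                # performative verb, e.g. "requires", "sends"
    target: str                              # e.g. "B" or "message"
    qualifier: Optional[str] = None          # e.g. "periodically"
    temporal_context: Optional[str] = None   # e.g. "during C"

    def identifier(self) -> str:
        # yields "periodically sends" rather than just "sends"
        return f"{self.qualifier} {self.verb}" if self.qualifier else self.verb

# "A requires B during C"
r = QualifiedRelation(source="A", verb="requires", target="B",
                      temporal_context="during C")
print(r.identifier(), "->", r.target, "|", r.temporal_context)
```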
Building domain ontologies with the above semantic features is challenging, as it requires domain experts to describe and document their knowledge about the domain, including the meaning of concepts and implied relations, in a detailed manner, which is time-consuming. In this research, we explore a rule-based approach that uses NLP techniques to evaluate the possibility of automatically capturing an initial or baseline domain ontology from existing text.
The basis of this approach is that, given a pre-processed textual document and a set of predefined heuristics based on NLP, it is possible to extract ontology concepts and relations that are semantically meaningful for requirements elicitation. The pre-processing of the document is normally a manual process and ensures that the text from which concepts and relations are to be extracted is suitable for sentence-based analysis. This includes the removal of symbols or formatting from the text that would otherwise alter the meaning of extracted concepts or relations. It is worth mentioning that the more thoroughly a document is pre-processed, the more effective the domain ontologies that can be extracted from the natural language text. For large documents, however, such detailed pre-processing is difficult, as it requires more effort from the domain experts. This challenge necessitates a (semi-)automated pre-processing approach to help improve the resulting ontology. In the first instance, the rule-based ontology extraction investigates two automated document pre-processing mechanisms, known as bracket trailing and bridged-term completion. Subsequently, Subject-Predicate-Object extraction, association mining and concept clustering are executed on the pre-processed text.
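As an illustration of the Subject-Predicate-Object step, the sketch below extracts rough SPO triples from a dependency parse with spaCy. It is not the authors' implementation: the model name, the chosen dependency labels, and the fallback to prepositional objects are assumptions about a typical setup, and bracket trailing, bridged-term completion, association mining and concept clustering are not shown.

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_spo(sentence: str):
    """Extract rough (subject, predicate, object) triples from one sentence."""
    triples = []
    for token in nlp(sentence):
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "attr", "dative")]
        # Also accept objects of an attached preposition (e.g. "sends ... to the driver")
        for prep in (c for c in token.children if c.dep_ == "prep"):
            objects += [c for c in prep.children if c.dep_ == "pobj"]
        for s in subjects:
            for o in objects:
                triples.append((s.lemma_, token.lemma_, o.lemma_))
    return triples

# A real pipeline would additionally merge noun chunks into multi-word concepts.
print(extract_spo("The controller periodically sends status information to the driver."))
```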
The manual generation of the requirements elicitation ontology involved two experienced participants. The first, who acted as the analyst, had only a vague understanding of ACC but was knowledgeable about requirements elicitation processes, while the second participant had a much deeper insight into the ACC domain and acted as the domain expert. Both participants knew how concepts and relations amongst concepts can be used to generate an ontology.
Paragraphs from two representative sections of the ISO 15622 ACC systems technical standard [7] were presented to both participants (see figure 2, labeled case 1 and 2).
Our selection criterion was a document section that was representative and at the same time provided insights independent of the initial manual pre-processing. This is because the level of manual document pre-processing carried out by a third party on the selected text can influence the quality of the domain ontologies generated using the rule-based approach. Both participants were asked to extract concepts and generate relations among the concepts from the text. To understand the implications of our approach, we fed the same natural language texts into an implemented prototype of the automated rule-based ontology approach.
Table 1. Number of concepts and relations extracted from the sample text

                   Concepts               Relations
                   Assumed   Explicit     Assumed   Explicit   Parent-Child
  Rule-based          7         30           16        15          21
  Analyst             0         26            0        25           0
  Domain expert       3         18            8        13           6
Table 1 shows the number of concepts and relations extracted from the sample text. Explicit concepts/relations are directly inferred from the text, while assumed concepts/relations are inferred using reasoning based on concept clustering in the rule-based approach, or based on the understanding and knowledge of the analyst and the domain expert, respectively. For each of the categories, a higher number of concepts and relations were identified using the rule-based approach than by the domain expert or the analyst. Insight from the participants showed that since the analyst had a limited understanding of the domain, concept/relation extraction was strictly based on his/her understanding of the sample text. The domain expert, on the other hand, relied more on his/her general understanding of the domain to assimilate the meaning and implication of each concept/relation in the sample text. Both participants used one hour to analyze and document a domain ontology based on the sample text. The automated rule-based approach carried out all required steps apart from the initial manual document pre-processing step. Based on the above sample text, this result is an initial indicator that the automated rule-based approach can help reduce the effort of generating a requirements elicitation ontology and at the same time achieve a greater coverage of domain concepts. On the other hand, it is also important to understand whether
the additional concepts and relations extracted by the rule-based approach are valid, semantically meaningful, and necessary, and not simply an over specification.
Using the ontologies extracted by the rule-based approach and manually by the analyst and domain expert, this study focuses on getting insights into over specification (extracting concepts and relations that are not necessary) and under specification (missing concepts and relations that are otherwise necessary) by our rule-based approach, and into whether the extracted concepts and relations were semantically intuitive for guided requirements elicitation. We discuss each of these factors and point out how they can possibly be mitigated.
"information about: (1) ranging to forward vehicles" (see figure 2, case 1) results in the over specification of the concepts "forward vehicle" and "forward vehicles". Apart from the understanding that one concept is singular while the other is plural, the two concepts point to the same semantic meaning. Thus, not much additional insight is obtained by modeling the two as separate concepts. A case of relational over specification using the rule-based approach is demonstrated in the modeling of the relation between the "driver", "clutch pedal" and "clutch" concepts (see the shaded section of figure 3a, marked X). In this case, only the relation "depresses" between "driver" and "clutch pedal" is meaningful, while the relation "depressing" between "driver" and "clutch" is not required, given that the former relation implies the latter.
The two cases of over specification pointed out above can be eliminated by stemming/lemmatizing concept terms to their root words or by more rigorous concept and relational clustering methods. In the first case, the term "vehicles" can be stemmed to its root form "vehicle", while in the second case, the relations "depresses" and "depressing" can be merged into a single relation between "driver" and "clutch pedal" or "clutch". On the other hand, modeling generic terms such as "control", "use" and "forward" as concepts in the rule-based approach can be considered an over specification for domain ontologies suitable for requirements elicitation. Such generic terms are more suitably defined in general ontologies such as WordNet³ and can hence be filtered out from specific domain ontologies.
³ http://wordnet.princeton.edu/
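A minimal sketch of this normalization step is shown below, assuming NLTK's WordNet lemmatizer is available; the helper names are hypothetical and the authors may have used a different stemmer or a more elaborate clustering method.

```python
from nltk.stem import WordNetLemmatizer
# Requires: import nltk; nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

def canonical_concept(term: str) -> str:
    """Reduce a (possibly multi-word) concept term to a canonical noun form."""
    return " ".join(lemmatizer.lemmatize(w.lower()) for w in term.split())

def canonical_relation(verb: str) -> str:
    """Reduce a relation identifier to the verb's root form."""
    return " ".join(lemmatizer.lemmatize(w.lower(), pos="v") for w in verb.split())

print(canonical_concept("forward vehicles"))   # -> "forward vehicle"
print(canonical_relation("depresses"))         # -> "depress"
print(canonical_relation("depressing"))        # -> "depress"
```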
Ontology under specification: Initial insight into under specification when using the rule-based approach can be gained by checking whether all semantically meaningful concepts and relations captured by the analyst and the domain expert can be directly or indirectly inferred from the domain ontology generated by the rule-based approach. 24 of the 26 concepts captured by the analyst were also captured by the rule-based approach. As further discussed below, the two remaining concepts, "motion of vehicle" and "transition to ACC-stand-by", are considered wrongly modeled or less meaningful. 20 of the 21 concepts captured by the domain expert were also captured by the rule-based approach. The remaining concept, "ACC state", is the concept captured by the domain expert which was not represented in the rule-based approach. A follow-up of this finding with the domain expert suggested that "ACC state" was informed by his/her understanding that, although ACC-stand-by and ACC-active were only mentioned in the text, these two concepts are the possible states of an ACC system. Hence the concept "ACC state" was captured as the parent of "ACC-stand-by" and "ACC-active" by the domain expert. On the other hand, the rule-based approach could only have conceptualized "ACC state" if the sample document (case 2, figure 2) had been rewritten as "but remain in the ACC-active state or transition to ACC-stand-by state". In that case, both "ACC-stand-by state" and "ACC-active state" would be identified and represented by the rule-based approach as concepts, while "ACC state" would be represented as a parent concept.
Three relations present in the ontology created by the domain expert were not captured by the rule-based approach: "vehicle has driver", "vehicle has brakes" and "ACC system same as Adaptive Cruise Control". In all three cases, the sample text was not sufficient and did not contain references suggesting such relations among concepts; it was hence impossible for an automated approach to infer them. Overall, insights from this study demonstrate that the challenge of under specification in the rule-based approach can be reduced by rigorous manual pre-processing of the text document, by providing sufficient text for the rule-based analysis, or by domain experts manually adding the missing concepts and relations.
Semantically intuitive and meaningful ontology: In the ontology generated by the analyst, the concept "motion of vehicle" is wrongly modeled. The reason is that in the sample text, the concept "information" is related precisely to the motion of the subject vehicle and not to every vehicle. Similarly, the concept "transition to ACC-stand-by" confounds the already modeled relation "transition to" that exists between the concepts "system" and "ACC-stand-by". The concepts "actuators" and "longitudinal control strategy" are modeled using the relations "carry out" and "carrying out" in the domain ontologies generated by the analyst and by the rule-based approach, respectively. From a linguistic viewpoint, however, the expression "actuators carry out longitudinal control strategy" is a complete, self-defining phrase, while the expression "actuators carrying out longitudinal control strategy" suggests the need for an additional supporting phrase. In this example, the relation between "actuators" and "longitudinal control strategy" defined by the analyst is semantically more intuitive than the relation generated by the rule-based approach. Such linguistic issues in the definition of relations in the rule-based approach can possibly be reduced if, during the predicate extraction phase of the SPO analysis, the root form of the verb gerund or present participle (VBG) is used.
The outcome of the study also suggests that domain ontologies originating from the rule-based approach and from the domain expert should complement each other. This is because concepts and relations are sometimes better modeled by the rule-based approach than in the domain ontology created by the domain expert, and vice versa. A core observation from comparing the domain ontology created by the rule-based approach with the one created by the domain expert is that relations between concepts can sometimes be represented in a more concise but semantically equivalent and meaningful way. For example, as shown in figure 3a, the relation between the concepts "Automatic brake maneuver" and "clutch pedal" is identified via an intermediate concept "use", with the relational identifiers <temporally infers> and "can be continued during" between them. In the domain ontology created by the domain expert (figure 3c), a more concise relational identifier, "can use", links the two concepts "Automatic brake maneuver" and "clutch pedal". On the other hand, the rule-based approach also demonstrates cases where the use of concepts as intermediaries provides more insight into the relational semantics. For instance, the ontology created by the domain expert (figure 3c) relates the two concepts "controller" and "driver" via the relation "informs". While this is a semantically valid relation, the token that is transmitted from controller to driver is not an explicit characteristic of the relation. In the rule-based ontology, in contrast, the "controller" and "driver" concepts are related via an intermediate concept "status information", with <spatially infers> and a "sends" relational identifier between them. In this case, the conceptualization of "status information" provides more detail on the token that is transmitted from the controller to the driver.
The core lesson learned in this research is that domain ontologies for supporting requirements elicitation can be achieved by extracting knowledge from technical documents. The domain ontology manually generated by an analyst has been shown to be more prone to errors in identifying concepts and relations than the automatically generated ontology. This is understandable, since analysts usually have little knowledge of the ontology's domain. The ontologies created by the rule-based approach and by the domain expert can be used to complement each other. Thus, a viable technique for building requirements elicitation domain ontologies is to generate a baseline ontology from the technical documents using the rule-based approach and then have it verified and refined by domain experts.
Furthermore, manual document pre-processing before carrying out the sentence-based NLP analysis that extracts concepts and relations is critical but, in non-trivial cases, difficult to achieve. This is because the generated ontology is highly dependent on the quality and format of the source text. Our general experience is that domain standard texts tend to conform to good linguistic style and in some cases use controlled language subsets. This is normally not the case when the source of the text consists of informal documents such as emails, interview transcripts and web pages. The successful application of the rule-based ontology generation approach has so far been validated for a domain standard text only, and hence it might not be a valid approach for informal text sources. Automated document pre-processing mechanisms such as bracket trailing and bridged-term completion, where possibly ambiguous terms are brought to the notice of the domain expert, are viable options for reducing the manual pre-processing effort.
Bridged-term completion can sometimes raise false alerts. For instance, the sentence "Safety communication and standard communication shall be independent" will alert the domain expert to a possible concept term ambiguity, even though "safety communication" and "standard communication" are both completely defined concepts. As part of our future work, we plan to investigate a machine learning approach to reduce such false positives. Bracket trailing relies on the assumption that the supplementary material in a bracketed text commonly provides more information on the particular single sentence. This assumption does not hold for writing styles where a bracketed text provides supplementary material that references multiple sentences.
As with most text analysis techniques, 100% precision/recall is difficult to achieve, although a high precision/recall rate for the rule-based approach can be inferred for the text used in this initial study. Firstly, a high precision is inferred from the analysis of the sample text for over specification: two cases of concept over specification were found out of 37 concepts (95% concept precision), and two relations were over specified out of 51 relations (96% relational precision). Secondly, a high recall is inferred from the analysis of the sample text for under specification: 20 of the 21 concepts captured by the domain expert were also captured by the rule-based approach (95% concept recall), while three relations present in the domain expert's ontology were not captured by the rule-based approach (94% relational recall). Given that this is a preliminary study using a relatively small subset of technical standard text, more studies will be required to generalize this outcome to a much larger subset.
This preliminary study also reveals a scalability concern. Using the rule-based approach, a small snippet of domain standard text can produce large ontology models (figure 3a). An initial insight from applying our approach to a larger text suggests that at the early stage, the size of the ontology grows roughly linearly as text from different sections of the domain standard is analysed. As the volume of analysed text increases, a peak is reached at which no new concepts are introduced by simply adding text from new sections of the document.
References
1. Choi, F.Y.Y.: Advances in domain independent linear text segmentation. In: Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference. Morgan Kaufmann Publishers Inc., Seattle (2000)
2. Falbo, R.d.A., Guizzardi, G., Duarte, K.C.: An ontological approach to domain engineering. In: Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering (2002)
3. Flores, J.J.G.: Semantic Filtering of Textual Requirements Descriptions. In: Natural Language Processing and Information Systems, pp. 474–483 (2004)
4. Gruber, T.R.: A translation approach to portable ontology specifications. Knowl. Acquis. 5(2), 199–220 (1993)
5. Gruber, T.R.: Ontology. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems. Springer, Heidelberg (2008)
6. Ikeda, M., Seta, K., Mizoguchi, R.: Task ontology makes it easier to use authoring tools. In: Proceedings of the 15th International Joint Conference on Artificial Intelligence (1997)
7. ISO standard: Transport information and control systems – Adaptive Cruise Control Systems – Performance requirements and test procedures. ISO 15622 (2002)
8. Kitamura, M., et al.: A Supporting Tool for Requirements Elicitation Using a Domain Ontology. In: Proceedings of Software and Data Technologies (2009)
9. Kof, L.: Scenarios: Identifying Missing Objects and Actions by Means of Computational Linguistics. In: Proceedings of RE 2007 (2007)
10. Kof, L.: An Application of Natural Language Processing to Domain Modelling – Two Case Studies. International Journal on Computer Systems Science Engineering 20, 37–52 (2005)
11. Kof, L.: Translation of Textual Specifications to Automata by Means of Discourse Context Modeling. In: Glinz, M., Heymans, P. (eds.) REFSQ 2009. LNCS, vol. 5512, pp. 197–211. Springer, Heidelberg (2009)
12. Kof, L.: Using Application Domain Ontology to Construct an Initial System Model. In: IASTED International Conference on Software Engineering (2004)
13. Lee, Y., Zhao, W.: An Ontology-Based Approach for Domain Requirements Elicitation and Analysis. In: Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences (2006)
14. Lennard, J.: But I Digress: The Exploitation of Parentheses in English Printed Verse. Clarendon Press, Oxford (1991)
15. Liddy, E.D.: Natural Language Processing. In: Encyclopedia of Library and Information Science, 2nd edn. Marcel Dekker, Inc., New York (2001)
16. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
17. Pohl, K.: The three dimensions of requirements engineering: a framework and its applications. Inf. Syst. 19(3) (1994)
18. Shibaoka, M., Kaiya, H., Saeki, M.: GOORE: Goal-Oriented and Ontology Driven Requirements Elicitation Method. In: Hainaut, J.-L., Rundensteiner, E.A., Kirchberg, M., Bertolotto, M., Brochhausen, M., Chen, Y.-P.P., Cherfi, S.S.-S., Doerr, M., Han, H., Hartmann, S., Parsons, J., Poels, G., Rolland, C., Trujillo, J., Yu, E., Zimányi, E. (eds.) ER Workshops 2007. LNCS, vol. 4802, pp. 225–234. Springer, Heidelberg (2007)
19. Sowa, J.F.: Conceptual structures: information processing in mind and machine. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (1994)
Tackling Semi-automatic Trace Recovery for Large Specifications
1 Introduction
Currently, there are no regulations that require the automobile industry to maintain explicit traceability through their development cycle. But this is going to change for safety-critical components with the upcoming ISO 26262 [14]. Current specifications do not contain complete trace sets but will require them to be available when they are reused. Creating traceability manually is quite arduous, as the specifications are large. Konrad and Gall [16] report that manually tracing 4,000 user requirements is quite a challenge. The US DoD reportedly spends four percent of life cycle costs on requirements traceability [22].
Other researchers proposed semi-automatic solutions that successfully apply information retrieval algorithms in order to create and maintain traceability after the fact [1,6,13,19,20,25]. However, existing research publications focused on small, English specifications (< 1,000 elements), while the specifications in the German automobile industry are larger and often written in German. The size and language pose a problem for the semi-automatic methods. We propose and describe optimizations to decrease the impact of these obstacles. The next section describes the problem and current solutions in more detail.
2 The Problem
In the automotive industry, three main abstraction layers exist: vehicle, systems, and components. For each of these layers, requirement and test specifications exist. Specifying a car easily requires around 100 system and several hundred component specifications. The classification of Regnell et al. [24] places many specifications into the category large-scale RE (order of magnitude of 1,000 requirements). Some specifications contain up to 50,000 individual elements (approximately 2,000 pages) and therefore fall into the category very large-scale RE (order of magnitude of 10,000 requirements). The sum of all specifications for a system easily reaches this category.
Two main kinds of traceability are important: traces within or between requirement specifications and traces between requirement and test specifications. Their creation and maintenance is quite a challenge. Semi-automatic methods using information retrieval (IR) algorithms promise to reduce this effort. However, these methods are not ready for industrial use as they do not reduce the manual effort enough. We outlined a number of challenges [17] for traceability in our context. Two of these directly affect the use of semi-automatic methods: the size and language of specifications. The optimizations described in this paper (Sect. 5) mainly address the size of specifications.
Research on the use of IR methods for the recovery of traceability has focused on three IR models: the Vector Space Model (VSM) [1,6,13], the Probabilistic Model [6], and Latent Semantic Indexing (LSI) [9,13,19,20,25]. The datasets used for validation are comparably small and mainly in English. Winkler's [25] dataset "AB" is reasonably large. It contains hundreds of elements on two abstraction layers, but only an unspecified subset is used. As De Lucia et al. [9] point out and we could confirm [18], with the size of the dataset, the number of candidate traces grows fast and the ratio between actual links and false positives deteriorates rapidly.
The language is another problem we face: Braschler and Ripplinger [5] showed for classic IR that the German language needs different preprocessing than, for example, English. We found this also to be true for the search for traceability [18]. German grammar is more complex than English grammar. This leads to a larger number of word forms that have to be dealt with. German also allows nearly unrestricted compounding of words. Compounding is widely used for technical terms. For example, "Dachbedieneinheit" (overhead control unit) is compounded of three words. The English equivalent consists of three individual words. These language properties have to be dealt with and worsen the precision of the results.
For a better grasp of the problem, we analyzed a subset of the German requirements specifications of Daimler AG. The analyzed 70 system and 106 component specifications contained a total of 62,116 different words. The system specifications have an average length of 441 requirements (15 words each), the component specifications of 848 requirements (16 words each). This illustrates that the real-world specifications are larger than the previously published datasets. The statements about size are based on user-generated meta-data which tags specification elements to be of a certain kind, e.g. requirement or information (see column object type in Fig. 1). The quality of this meta-data varied.
Table 1. Composition of the datasets OLC and LSC (number of elements and reference traces)

                                                        OLC      LSC
  System specification (SysS)        requirements     2,095      109
                                     headings         1,166       65
                                     other                30       83
  Component specification 1 (CS 1)   requirements                  61
                                     headings                     915
                                     other                      1,181
  Component specification 2 (CS 2)   requirements                 756
                                     headings                     301
                                     other                        241
  Test specification (TS)            test cases                    18
                                     test steps                    52
                                     other                         27
  Reference traceability set
  SysS – SysS                                          1,109       67
  SysS – CS 1                                                      14
  SysS – CS 2                                                       6
  SysS – Test                                                      27
3 Information Retrieval
The goal of the information retrieval (IR) algorithms is to retrieve the desired data (in our case correct traces) without retrieving unwanted data. The search for results is based on a four-step process: preprocessing, application of the algorithms, creation of a candidate trace list, and its inspection by a human analyst. We focus on the vector space model (VSM) as this is the IR model our research tool is based on. We chose the VSM because its alternative, the probabilistic model, currently does not show an overall superiority [3, p. 34]. In the preprocessing stage, the documents' raw texts (e.g. requirements) are tokenized. Common further steps are the removal of stop words (words not helpful for the retrieval) and stemming. Due to the large number of compounds, the decomposition of such words is beneficial for German texts [18]. The result of the preprocessing stage is a list of index terms {t_1, ..., t_n}.
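A minimal sketch of such a preprocessing stage for German text is shown below, assuming NLTK's tokenizer, stopword list and Snowball stemmer; compound decomposition, which we found beneficial, is only indicated in a comment because no standard decompounder ships with NLTK. This is an illustration, not the preprocessing of our research tool.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

# Requires: nltk.download("punkt"); nltk.download("stopwords")
german_stopwords = set(stopwords.words("german"))
stemmer = SnowballStemmer("german")

def preprocess(text: str) -> list[str]:
    """Tokenize, drop stopwords and punctuation, and stem.
    A full pipeline for German would additionally decompose compounds
    such as 'Dachbedieneinheit' into their constituent words."""
    tokens = nltk.word_tokenize(text, language="german")
    return [stemmer.stem(t.lower()) for t in tokens
            if t.isalnum() and t.lower() not in german_stopwords]

print(preprocess("Die Dachbedieneinheit sendet periodisch den Status."))
```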
In the VSM, documents (e.g. individual requirements) are interpreted as vectors. Each index term t_i represents a dimension. Each document d_j is transformed into a vector d_j = (freq_{t_1,j}, ..., freq_{t_n,j}), where freq_{t_i,j} is the number of occurrences of index term t_i in d_j. All document vectors build the so-called term-document matrix A_{ij} = (freq_{t_i,j}). This matrix can be processed differently depending on the algorithm, resulting in a matrix B_{ij} = (w_{i,j}) with term weights instead of frequencies.
The similarity of two documents can be calculated as the cosine between the two document vectors in B_{ij}. The assumption is that the more similar the documents (the smaller the angle), the more likely a trace exists. Creating the candidate trace list therefore consists of calculating the similarities and ordering the results.
For practical reasons, the candidate trace list only contains document pairs that have a similarity larger than the so-called cutoff point. The quality of the list can be determined using two measures: recall and precision. Recall measures how well the correct results (traces of the reference trace set) are retrieved. Precision measures the proportion of correct traces in the candidate list. We use generalized forms for multiple queries, as, for example, De Lucia et al. [7] do as well:
forms for multiple queries like for example De Lucia et al. [7] do as well:
|correct tracesi found tracesi |
Recall = i (1)
i |correct tracesi |
|correct tracesi found tracesi |
Precsion = i (2)
i |found tracesi |
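Read as set operations, Eqs. (1) and (2) translate directly into code. The sketch below assumes each query's reference and candidate traces are given as Python sets; the identifiers are illustrative.

```python
def generalized_recall_precision(correct: list[set], found: list[set]) -> tuple[float, float]:
    """Generalized recall and precision over multiple queries (Eqs. 1 and 2).
    correct[i] / found[i] are the reference and candidate trace sets of query i."""
    hits = sum(len(c & f) for c, f in zip(correct, found))
    recall = hits / sum(len(c) for c in correct)
    precision = hits / sum(len(f) for f in found)
    return recall, precision

# Two queries with 3+2 reference traces and 4+3 candidates, of which 2+2 are correct
correct = [{"r1-r7", "r1-r9", "r1-r12"}, {"r2-r5", "r2-r8"}]
found   = [{"r1-r7", "r1-r9", "r1-r20", "r1-r21"}, {"r2-r5", "r2-r8", "r2-r30"}]
print(generalized_recall_precision(correct, found))  # -> (0.8, 0.571...)
```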
Hayes et al. [12] argue that recall > 60% is acceptable, > 70% good and > 80% excellent. For precision, > 20% is acceptable, > 30% good and > 50% excellent. Like De Lucia et al. [8], we share this assessment. This scale demonstrates that a complete semi-automatically retrieved trace set is currently unlikely. The already mentioned semi-automatic solutions (e.g. [8,12,25]) typically reach good to excellent recall with acceptable to good precision for small specifications.
The tf/idf algorithm computes the weights w_{i,j} in the term-document matrix based on two factors: the term frequency factor tf (Eq. 3) and the inverse document frequency factor idf (Eq. 4). tf_{i,j} indicates how well t_i characterizes the document; idf_i represents the importance of t_i in the whole document corpus. Here, k is the index of the term with maximum occurrence in d_j, N is the number of documents, and n_i is the number of documents containing index term t_i.

    tf_{i,j} = freq_{i,j} / freq_{t_k,j},   with freq_{t_k,j} = max_i freq_{t_i,j}     (3)

    idf_i = log(N / n_i)     (4)

The final weight for index term t_i in document d_j is calculated as follows:

    w_{i,j} = tf_{i,j} · idf_i     (5)
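The following numpy sketch illustrates the weighting of Eqs. (3)-(5) and the subsequent cosine-based candidate trace list with a cutoff point. It is a simplified illustration under these equations only, not the implementation of our research tool; the function names and the toy matrix are assumptions.

```python
import numpy as np

def tfidf_weights(A: np.ndarray) -> np.ndarray:
    """A[i, j] = raw frequency of index term t_i in document d_j."""
    tf = A / A.max(axis=0, keepdims=True)    # Eq. 3: normalize by the most frequent term per document
    n_i = np.count_nonzero(A, axis=1)        # number of documents containing term t_i
    idf = np.log(A.shape[1] / n_i)           # Eq. 4
    return tf * idf[:, np.newaxis]           # Eq. 5: w_ij = tf_ij * idf_i

def candidate_traces(B: np.ndarray, cutoff: float):
    """Cosine similarity between all document pairs; keep pairs above the cutoff point."""
    norms = np.linalg.norm(B, axis=0)
    sim = (B.T @ B) / np.outer(norms, norms)
    pairs = [(j, q, sim[j, q]) for j in range(sim.shape[0])
             for q in range(j + 1, sim.shape[0]) if sim[j, q] >= cutoff]
    return sorted(pairs, key=lambda p: -p[2])

A = np.array([[2, 0, 1],     # toy term-document matrix: 3 terms, 3 documents
              [1, 1, 0],
              [0, 3, 1]], dtype=float)
print(candidate_traces(tfidf_weights(A), cutoff=0.1))
```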
4 The TraceTool
The research tool we use is called TraceTool and is described in more detail in [17]. The TraceTool is able to access live data in Doors databases. It implements the two IR algorithms tf/idf and LSI [3]. Using those algorithms, a list of candidate traces is created. In this paper, we only employ the tf/idf algorithm. To give the user a more intuitive similarity measure, the raw cosine similarities are transformed into the interval [0%, 100%]. This measure is called trust level.
In order to evaluate the proposed optimizations (Sect. 5), we introduced a functionality to automatically change and measure different configurations. A so-called Measurement Run Set defines which optimizations to activate and which parameters to set. It executes the processing and records measures like precision and recall at different cutoff-point levels. Thus, a large number of configurations can be tested automatically. This is important as (pre-)processing one configuration of the larger datasets takes between a quarter of an hour and 6 hours, depending on the algorithm and optimizations. For example, evaluating the influence of the weight of one optimization (Sect. 5) in the interval [0,10] (step size 0.1) on dataset OLC (Sect. 2.1) using tf/idf with all other parameters fixed takes approximately a day on our fastest machine. Only the thesaurus, which changes the similarity calculation (Sect. 5.2), extends the processing time significantly.
5 Investigated Optimizations
We focused our optimizations on the tf/idf algorithm. The main reason is that the correlation between changes to the algorithm (e.g. to w_{i,j}) and changes in the results (the resulting similarities) is more comprehensible than with LSI, which processes B_{ij} heavily. Furthermore, LSI could make use of the extended tf/idf by using the differently weighted term-document matrix. We investigated different kinds of optimizations: first, filters to discard candidate traces based on document text or meta-data; secondly, modifications of the weighting of the tf/idf algorithm (Eq. 5) that exploit knowledge about individual index terms.
Signals with Value Assignment (VA). Hayes et al. [11] found that incorporating phrases, that is, index terms consisting of more than one word, can improve retrieval results. We found signals with value assignment that seem to be similar to phrases: for example, requirements might contain "LoBM_FLT = 0", which differs from "LoBM_FLT = 1" although the signal name is the same. The identification of such index terms is done during preprocessing. From there on, the index terms are treated like normal index terms.
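One conceivable way to fuse such signal-value pairs into single index terms during preprocessing is a simple regular expression. The pattern below is an assumption for illustration only and does not reflect the actual signal naming conventions of the analyzed specifications.

```python
import re

# Hypothetical pattern: a signal name containing an underscore, '=', and a value,
# e.g. "LoBM_FLT = 0"; the exact naming convention is an assumption.
SIGNAL_ASSIGNMENT = re.compile(r"\b([A-Za-z][A-Za-z0-9]*_[A-Za-z0-9_]+)\s*=\s*(\w+)\b")

def mark_signal_assignments(text: str) -> str:
    """Fuse 'SIGNAL = VALUE' into a single index term during preprocessing,
    so that 'LoBM_FLT=0' and 'LoBM_FLT=1' stay distinguishable."""
    return SIGNAL_ASSIGNMENT.sub(lambda m: f"{m.group(1)}={m.group(2)}", text)

print(mark_signal_assignments("Set LoBM_FLT = 0 if the clamp is open."))
# -> "Set LoBM_FLT=0 if the clamp is open."
```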
Word Classes. It is evident that not all words are created equal: some carry more semantics than others. Stopwords, for example, are not seen as helpful for the retrieval task and are therefore discarded. We propose two additional word classes besides stopwords and normal words: weak words (mostly as commonly used in requirements engineering, e.g. [10]: "oft" (often), "gering" (small)) and domain words (e.g. "Temperaturregler" (thermostat)). The signals (Sect. 5.1) could be seen as a special form of domain words. We assume that weak words carry less meaning than normal words (ww < 1) and domain words carry more (dw > 1). Therefore, we created a list of domain words (D(dw), 9,648 words) and a list of weak words (D(ww), 1,039 words) out of the extensive word list we extracted during our initial analysis. We change the term weights w_{i,j} as seen in Eq. 6 using the following factor x_i, where ww and dw are parameters to be set:

    x_i = ww,  if t_i ∈ weak words
    x_i = dw,  if t_i ∈ domain words
    x_i = 1,   otherwise
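Since Eq. 6 is not reproduced here, the sketch below assumes that the factors x_i simply scale the corresponding rows of the weighted term-document matrix; the parameter values and term lists are illustrative, not the values used in our experiments.

```python
import numpy as np

def word_class_factors(index_terms, weak_words, domain_words, ww=0.5, dw=2.0):
    """x_i = ww for weak words, dw for domain words, 1 otherwise."""
    return np.array([ww if t in weak_words else dw if t in domain_words else 1.0
                     for t in index_terms])

def apply_word_classes(B, index_terms, weak_words, domain_words, ww=0.5, dw=2.0):
    """Scale each term's row of the term-document matrix B by its factor x_i."""
    x = word_class_factors(index_terms, weak_words, domain_words, ww, dw)
    return B * x[:, np.newaxis]

index_terms = ["oft", "temperaturregler", "status"]
B = np.ones((3, 2))   # toy weighted term-document matrix
print(apply_word_classes(B, index_terms, weak_words={"oft"},
                         domain_words={"temperaturregler"}))
```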
Thesaurus (Th). Hayes et al. [13] report improved retrieval results when a thesaurus is included in the similarity calculation of the IR algorithms. Encouraged by these results, we wanted to test the inclusion of a thesaurus. As in their solution, we built a thesaurus T with entries of the form ⟨t_k, t_l, α_kl⟩, using our initial word list as basis. α_kl is the similarity coefficient for the two terms t_k and t_l. The created thesaurus contains 286 entries. It is applied as an extension of the basic cosine similarity in the similarity calculation, as in Eq. 7:
    sim(d_j, q) = [ Σ_{i=1..t} w_{i,j} · w_{i,q} + Σ_{⟨t_k,t_l,α_kl⟩ ∈ T} α_kl · (w_{k,j} · w_{l,q} + w_{l,j} · w_{k,q}) ]
                  / [ sqrt(Σ_{i=1..t} w_{i,j}²) · sqrt(Σ_{i=1..t} w_{i,q}²) ]     (7)
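A direct transcription of Eq. 7 for two weight vectors is sketched below, assuming the thesaurus is given as a list of (k, l, alpha) index triples; this is an illustration, not the TraceTool's implementation.

```python
import numpy as np

def thesaurus_similarity(d, q, thesaurus):
    """Cosine similarity extended by thesaurus entries (Eq. 7).
    d, q: weight vectors of the two documents;
    thesaurus: list of (k, l, alpha) with term indices k, l and coefficient alpha."""
    extra = sum(alpha * (d[k] * q[l] + d[l] * q[k]) for k, l, alpha in thesaurus)
    return (d @ q + extra) / (np.linalg.norm(d) * np.linalg.norm(q))

d = np.array([1.0, 0.0, 2.0])
q = np.array([0.0, 1.5, 0.0])
# terms 0 and 1 are synonyms with coefficient 0.8
print(thesaurus_similarity(d, q, [(0, 1, 0.8)]))
```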
6 Results
Our prior work [18] indicates better recall when stemming and decomposition are activated for German specifications. Therefore, all measurements were done with those two steps activated. Additionally, two filters were activated: the first filter removes candidate traces between sibling elements, i.e. elements that share the same parent (e.g. a heading). The second filter removes candidate traces where one end of the trace is a direct parent of the other end; for example, a trace between the heading and the requirement in Fig. 1 would be removed. The two filters are based on the knowledge that these relationships are already documented by the document structure. Both filters have shown an increase in precision with a minimal loss of recall [18]. These filters are only effective when searching within one specification. When a dataset contained more than one specification, we created pairwise subsets.
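A sketch of these two structure-based filters is shown below, assuming each element carries an identifier and a reference to its parent heading; the data layout is hypothetical and serves only to illustrate the filtering logic.

```python
def structural_filter(candidates, parent_of):
    """Drop candidate traces between siblings or between a parent and its child.
    candidates: iterable of (a, b, similarity); parent_of: element id -> parent id."""
    kept = []
    for a, b, sim in candidates:
        if parent_of.get(a) == parent_of.get(b):            # siblings under the same heading
            continue
        if parent_of.get(a) == b or parent_of.get(b) == a:  # direct parent-child pair
            continue
        kept.append((a, b, sim))
    return kept

parent_of = {"req1": "head1", "req2": "head1", "req3": "head2"}
candidates = [("req1", "req2", 0.9), ("req1", "head1", 0.7), ("req1", "req3", 0.6)]
print(structural_filter(candidates, parent_of))   # only ("req1", "req3", 0.6) survives
```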
To find the best possible results for each optimization we used our Measurement Run Sets (Sect. 4) to iterate through many possible parameter values (where available). We opted for 0.1 as step size for the parameters. The search space was [0,10]. It was extended to 20 when the results indicated that higher parameter values would produce better results. Our search space produced a large number of results. The best result was picked by the following schema:

– Take the results with the 15% best recall values for low-confidence traces (5–25% cutoff point; < 5% was not measured).
– Rank the selected results according to the slope of the regression line and take the result with the slowest decline.
– When the last selection step resulted in more than one possible result, we looked at the precision in the same way.
This selection scheme is based on a couple of assumptions: First, the most important task of the TraceTool is to recover traces, hence the focus on recall. Secondly, the curve with the best maximum recall might deteriorate more than curves with slightly less maximum recall. We believe that a good recall over all cutoff points (slow deterioration) is preferable over partially good results (e.g. results that are only good at the cutoff point that yields the maximum recall). This way, an analyst can expect similar recall at all cutoff points. As we had experienced unacceptable precision values in our datasets before, we opted for precision as the last factor only. When applying our selection schema, we checked whether promising results were discarded by the first selection step. When such a result existed, it was manually added. This was the case for the weak word optimization in dataset LSC in its subset SysS–CS 1.
For the filtering approaches (Sect. 5.3, 5.4), we adapted the filters to the datasets. Additional dataset-specific values were included. The filter for redundant texts originally contained 10 predefined texts but was extended to up to 35 entries.
Fig. 2. Results for dataset OLC, traces within the system specification: (a) recall, (b) precision, and (c) number of candidate traces over the cutoff point range 5–100%, comparing tf/idf with the optimizations D(ww = 3), DSW(x = 1.4) + VA(x = 1.5), frt, and feh.
Due to the limited space, only the results of dataset OLC are presented as graphs; Table 2 displays all results. Fig. 2(a) shows the recall over the cutoff point range 5%–100% (low to high confidence). Fig. 2(b) shows the accompanying precision curves. Fig. 2(c) depicts the number of candidate traces. The results are reported with the best parameters selected according to our schema. For better readability, only results are included that differ visibly from the original tf/idf or from one of the other curves. Table 3 displays the results of dataset LSC with its subsets. As its system specification contains no signals with value assignment, the corresponding optimization is not applicable.
7 Discussion
There is only one interpretation: the results are not adequate for use in industry. However, there is light: with most of our datasets, we were able to retrieve most of the reference traces. The remaining problem is the precision. Although we reach 10% precision in some sets, this is not enough when compared with the huge number of candidate traces that have to be analyzed. These results are due to the large size of the datasets in terms of semi-automatic trace retrieval. The precision in the smaller subsets (see e.g. SysS–TS in Tab. 3) is considerably better. For even larger datasets, which are not uncommon in industry, even worse precision results are expected. Although the results show mostly little impact on precision and recall in absolute terms, these changes become important through the size of our datasets: the numbers of candidate traces needing inspection show large differences. If we take dataset LSC with SysS–CS 1, we see an increase of the average precision of only 0.09 between tf/idf and the filtering of redundant texts (frt). In absolute terms, however, the difference is more obvious: 561 fewer traces (12.8%) need inspection per cutoff point, without loss of recall.
The different optimizations performed unequally. Dynamic signal weighting (DSW) either affected the recall mildly positively or not at all. It always led to fewer candidate traces. This might be due to the fact that signals are an important factor for the existence of traces. The additional consideration of signals with value assignment (DSW+VA) did not increase the recall but improved the precision further. However, the optimal weight parameter varies quite a bit.
The handling of domain vocabulary in the form of weak and domain words was not very successful. Although weak words (D(ww)) helped to increase the recall in most datasets, the precision worsened. Only in case SysS–SysS of dataset LSC was the precision improved. We expected the parameter for weak words to be < 1, as weak words should be semantically less important than other words. However, this was not always the case. Similar unexpected behavior of weak words was observed when the quality of automotive specifications was rated by requirements engineers: the higher the proportion of weak words, the better the perceived quality [26]. Domain words (D(dw)) improved the recall most of the time while always increasing the number of candidate traces. The combination of both word classes did not perform well, as especially the precision suffers.
The use of a thesaurus improved or maintained the recall. The precision was reduced in all but one set. The synonym normalization (SN), a simpler form of the thesaurus (Th), performed slightly worse on the recall side. However, the precision was not as badly influenced as with the thesaurus. For large datasets, SN seems to be the better choice as it is faster.
The filtering of redundant texts (frt) is especially helpful for the search for traces within one document, or more specifically within one document template. This might be due to the fact that different templates use different redundant texts. It should be noted that, except in dataset OLC, the removal of redundant texts did not diminish the recall but always improved the precision. The extension that removes (semantically) empty headings (feh) can improve the precision further. As with frt, the recall is reduced in dataset OLC only, while the precision is doubled.
8 Related Work
Research into optimizations for semi-automatic methods for recovering traces has led to different approaches and results.
Multiple researchers have studied finding traces between artifacts without common language. McMillan et al. [21] propose identifying traces by taking an indirection: when two documents are related to the same part of code, it is assumed that the documents are also related to each other. Asuncion and Taylor [2] propose deriving candidate traces by monitoring the way the user interacts with artifacts. They reason that when a user concurrently or sequentially modifies artifacts, these might be related. Their approach originates in the e-Science domain but should be transferable to model-based specifications.
Contrary to model-based development with code generation, model-based specifications are not very common in the automobile domain. Should this change, the proposals of De Lucia et al. [9] and Cleland-Huang et al. [6] promise to retrieve traces between requirements and models. Kof [15] uses natural language processing to build message sequence charts and automata from textual scenarios. He thereby provides means to automatically formalize parts of natural language specifications. Such transformations allow tracing the different representations.
For later phases of development, Marcus and Maletic [20] show that traces into code can be retrieved. Ratanotayanon et al. [23] support tracing between feature descriptions and code and are able to automatically update traces on code changes. Winkler [25] proposes to extend the IR approaches in order to reuse the analyst's decisions about candidate traces even when an artifact changes. Although his approaches are beneficial when good decisions were made, they also preserve bad decisions.
9 Conclusions
Finding traces in some sets is like looking for a needle in a haystack. For example, finding 6 traces between the LSC system specification and CS 2 in about 1,500 elements is not that easy. Unsurprisingly, sets with traces into component specifications have bad precision results. One of the reasons might be that just a very small part of the component specifications is relevant for the system. Obviously, only traces to the relevant parts should be retrieved. Therefore, we tested removing the irrelevant parts. The results improved, as expected, enormously: for example for SysS–CS 1, the recall went up to 28.57% and the precision to 4.27%. The precision of the new results for the original tf/idf is nearly ten times as good as the previously achieved 0.48%. As our goal is to work with unaltered specifications, we propose extending the filtering mechanisms or the user interaction in a way that facilitates finding traces between subparts of specifications.
The use of dynamic signal weighting always reduced the number of candidate traces without loss of, or even with gain in, recall. If applicable, the consideration of signals with value assignment also has positive effects. The introduction of the two word classes weak words and domain words with additional weights generally improved the recall, but often at the cost of reduced precision. The same is true for the use of a thesaurus or the faster form of synonym normalization.
It is evident to us that the current results are not good enough for an industrial application. However, we also see that we were able to reduce the number of candidate traces needing inspection. The filtering approaches based on redundant texts, empty headings, and meta-data generally improved the precision. Often, it is possible to remove false positive candidate traces without any loss of recall. As this depends on the dataset, our decision to apply the filters as late as possible in the processing chain seems to be correct. Hence the analyst can decide at analysis time which filters to activate.
References
1. Antoniol, G., Canfora, G., Casazza, G., De Lucia, A.: Identifying the Starting Impact Set of a Maintenance Request: a Case Study. In: Proceedings of the Fourth European Conference on Software Maintenance and Reengineering, pp. 227–230 (2000)
2. Asuncion, H.U., Taylor, R.N.: Capturing Custom Link Semantics among Heterogeneous Artifacts and Tools. In: ICSE Workshop on Traceability in Emerging Forms of Software Engineering (2009)
3. Baeza-Yates, R., Ribeiro-Neto, B.A.: Modern Information Retrieval, reprint edn. Pearson Addison-Wesley (2006)
4. Boutkova, E.: Variantendokumentation in Lastenheften: State-of-the-Practice (Variant Documentation in Requirement Specifications). In: Systems Engineering Infrastructure Conference (November 2009)
5. Braschler, M., Ripplinger, B.: How Effective is Stemming and Decompounding for German Text Retrieval? Information Retrieval 7(3-4), 291–316 (2004)
6. Cleland-Huang, J., Settimi, R., Duan, C., Zou, X.: Utilizing Supporting Evidence to Improve Dynamic Requirements Traceability. In: 13th IEEE International Conference on Requirements Engineering, pp. 135–144. IEEE CS, Los Alamitos (2005)
7. De Lucia, A., Fasano, F., Oliveto, R., Tortora, G.: Enhancing an Artefact Management System with Traceability Recovery Features. In: 20th IEEE International Conference on Software Maintenance, pp. 306–315. IEEE CS, Los Alamitos (2004)
8. De Lucia, A., Fasano, F., Oliveto, R., Tortora, G.: Can Information Retrieval Techniques Effectively Support Traceability Link Recovery? In: 14th IEEE International Conference on Program Comprehension, pp. 307–316 (2006)
9. De Lucia, A., Fasano, F., Oliveto, R., Tortora, G.: Recovering Traceability Links in Software Artifact Management Systems Using Information Retrieval Methods. ACM Transactions on Software Engineering and Methodology 16(4), 13 (2007)
10. Dreher, M.: Konstruktive und analytische Methoden zur Qualitätssicherung von Anforderungen in der Softwareentwicklung (Constructive and Analytical Methods for Quality Assurance of Requirements in SW Development). Diplomarbeit, Stuttgart Media University (2004)
11. Hayes, J.H., Dekhtyar, A., Osborne, J.: Improving Requirements Tracing via Information Retrieval. In: 11th IEEE International Requirements Engineering Conference, pp. 138–147 (2003)
12. Hayes, J.H., Dekhtyar, A.: Humans in the Traceability Loop: Can't Live with 'em, Can't Live without 'em. In: Proceedings of the 3rd International Workshop on Traceability in Emerging Forms of Software Engineering, pp. 20–23. ACM, New York (2005)
13. Hayes, J.H., Dekhtyar, A., Sundaram, S.K.: Advancing Candidate Link Generation for Requirements Tracing: The Study of Methods. IEEE Transactions on Software Engineering 32(1), 4–19 (2006)
14. ISO/DIS 26262: Road Vehicles – Functional Safety. ISO (2009)
15. Kof, L.: Translation of Textual Specifications to Automata by Means of Discourse Context Modeling. In: Glinz, M., Heymans, P. (eds.) REFSQ 2009. LNCS, vol. 5512, pp. 197–211. Springer, Heidelberg (2009)
16. Konrad, S., Gall, M.: Requirements Engineering in the Development of Large-Scale Systems. In: 16th IEEE International Conference on Requirements Engineering, pp. 217–222 (2008)
17. Leuser, J.: Challenges for Semi-automatic Trace Recovery in the Automotive Domain. In: ICSE Workshop on Traceability in Emerging Forms of Software Engineering, pp. 31–35 (May 2009)
18. Leuser, J.: Herausforderungen für halbautomatische Traceability-Erkennung (Challenges for Semi-automatic Trace Recovery). In: Systems Engineering Infrastructure Conference (November 2009)
19. Lormans, M., van Deursen, A.: Can LSI Help Reconstructing Requirements Traceability in Design and Test? In: Proceedings of the Conference on Software Maintenance and Reengineering, pp. 47–56. IEEE CS, Los Alamitos (2006)
20. Marcus, A., Maletic, J.I.: Recovering Documentation-to-Source-Code Traceability Links Using Latent Semantic Indexing. In: Proceedings of the 25th International Conference on Software Engineering, pp. 125–135 (2003)
21. McMillan, C., Poshyvanyk, D., Revelle, M.: Combining Textual and Structural Analysis of Software Artifacts for Traceability Link Recovery. In: ICSE Workshop on Traceability in Emerging Forms of Software Engineering, pp. 41–48 (May 2009)
22. Powers, T., Stubbs, C.: A Study on Current Practices of Requirements Traceability in Systems Development. Master's thesis, Naval Postgraduate School, Monterey, CA (1993)
23. Ratanotayanon, S., Sim, S.E., Raycraft, D.J.: Cross-Artifact Traceability Using Lightweight Links. In: ICSE Workshop on Traceability in Emerging Forms of Software Engineering, pp. 57–64 (May 2009)
24. Regnell, B., Svensson, R.B., Wnuk, K.: Can We Beat the Complexity of Very Large-Scale Requirements Engineering? In: Paech, B., Rolland, C. (eds.) REFSQ 2008. LNCS, vol. 5025, pp. 123–128. Springer, Heidelberg (2008)
25. Winkler, S.: Trace Retrieval for Evolving Artifacts. In: ICSE Workshop on Traceability in Emerging Forms of Software Engineering, pp. 49–56 (May 2009)
26. Yakoubi, R.: Empirische Bewertung von Qualitätsindikatoren für Anforderungsdokumente (Empirical Assessment of Quality Indicators for Requirement Specifications). Diplomarbeit, Ulm University (2009)
Ambiguity Detection: Towards a Tool Explaining
Ambiguity Sources
correction is. Thus, the imprecision of requirements should be detected early in the development process.
Ambiguity (i.e., the possibility to interpret one phrase in several ways) is one of the problems occurring in natural language texts. An empirical study by Kamsties et al. [3] has shown that "ambiguities are misinterpreted more often than other types of defects; ambiguities, if noticed, require immediate clarification". In order to systematize typical ambiguous phrases, Berry et al. introduced the Ambiguity Handbook [1]. A tool that detects the ambiguities listed in the handbook surely contributes to the early detection of problematic passages in requirements documents. According to Kiyavitskaya et al. [4], a tool for ambiguity detection should not only detect ambiguous sentences, but also explain, for every detected sentence, what is potentially ambiguous in it. Such a tool is presented in this paper.
Contribution: The tool described in this paper is a major step towards an ambiguity detection tool satisfying the requirements stated by Kiyavitskaya et al.: it is able not only to detect ambiguities but also to explain ambiguity sources. When detecting ambiguities, it basically relies on a grep-like technique, which makes it highly reliable, applicable to different languages, and independent of error-prone natural language parsing. For every detected ambiguity the tool provides an explanation of why the detection result represents a potential problem. Furthermore, due to its web-based presentation and a lightweight linguistic engine on the server side, the tool is fast and highly portable, which makes it applicable in real projects. Therefore, it can bring about considerable time and cost savings while at the same time enabling higher quality, as it simplifies the early detection of potentially critical errors.
Outline: The remainder of the paper is organized as follows: First, Section 2 sketches
part-of-speech tagging, the computational linguistics technique used in our tool. Then,
Section 3 introduces the types of ambiguities that can be detected by our tool. These
types include ambiguity classes introduced in the Ambiguity Handbook and ambiguity
classes derived from writing rules used internally at Siemens. Section 4 presents the
tool itself, especially the technique used to detect ambiguities and the presentation of
the detected ambiguities to the tool user. Section 5 provides the results of the tool eval-
uation. Finally, Sections 6 and 7 present an overview of related work and the summary
of the paper, respectively.
Part-of-speech (POS) tagging marks every word of a given sentence with one of the predefined parts of speech (noun, adjective, ...) by assigning a POS tag. For example, the words of the sentence "Failure of any other physical unit puts the program into degraded mode" are marked in the following way (word/tag/lemma): Failure/NN/failure, of/IN/of, any/DT/any, other/JJ/other, physical/JJ/physical, unit/NN/unit, puts/VBZ/put, the/DT/the, program/NN/program, into/IN/into, degraded/VBN/degrade, mode/NN/mode. Here, NN means a noun, IN a preposition, DT a determiner, JJ an adjective, VBZ a verb in third person singular present, and VBN a past participle; the latter appears, for instance, in the passive-voice pattern, where we will look for the verb "be" in front of the participle. The following tags are the most important ones in the context of the presented work: (1) any tag starting with VB, identifying different verb forms; (2) the tag VBN, identifying verbs in past participle form ("been", "done"); (3) any tag starting with NN, identifying different noun forms; (4) the tag JJ, identifying adjectives; and (5) the tag RB, identifying adverbs. Complete information on tag meanings can be found in the official tagset specifications [5,6]. Tagging technology is rather mature: taggers with a precision of about 96% are available, such as the TreeTagger [7,8] used in the presented work. The applied tagger (TreeTagger) provides support for English, German, and further languages, and thus allows us to extend the presented work to truly multilingual ambiguity detection.
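As an illustration of such tag assignments, the following minimal sketch uses NLTK's Penn Treebank tagger as a freely available stand-in for the TreeTagger employed by the tool (the resource name may differ between NLTK versions):

```python
# Minimal POS-tagging sketch; NLTK's Penn Treebank tagger is used here
# as a stand-in for TreeTagger, which the tool described above employs.
import nltk

# Fetch the tagger model once (a quiet no-op if it is already installed;
# the resource name may differ in newer NLTK releases).
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Failure of any other physical unit puts the program into degraded mode"
tagged = nltk.pos_tag(sentence.split())  # list of (word, tag) pairs

for word, tag in tagged:
    print(f"{word}\t{tag}")
```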
Independently of the ambiguity type, we can apply ambiguity detection on the same four levels, namely lexical, syntactic, semantic, and pragmatic. These are the levels traditionally used in natural language processing, cf. [9]. Analysis tasks and result types for every kind of analysis are sketched in Table 1. A survey performed in our previous work [10] shows that only lexical and syntactic analyses are currently possible for fully-fledged English. If the grammar used in the text can be restricted to a certain subset of English, semantic analysis becomes possible, too. Attempto Controlled English [11] is an example of such a restricted language together with a processing tool for it. Pragmatic analysis is not possible yet. In order to keep our tool applicable to different documents, written without any grammatical restrictions, as well as to make the tool efficient, we focus on lexical and lightweight syntactic analysis based on part-of-speech tagging. The applied analysis rules are presented below in Section 4.
The patterns for ambiguity detection have been extracted from the Ambiguity Handbook and Siemens-internal guidelines for requirements writing. The Ambiguity Handbook lists a total of 39 types of ambiguity, vagueness, or generality. Some of these patterns were not integrated into our tool: 4 out of the 39 types were isolated cases, i.e., examples of ambiguous expressions without an explicit statement of which linguistic patterns can be used to identify the ambiguity. "Mean water level" is an example of such an expression: to make it unambiguous, it is necessary to define precisely how the mean is determined, but we cannot generalize this expression to an ambiguity detection pattern. Seven further ambiguity types are on the semantic or pragmatic level and are not amenable to state-of-the-art computational linguistics. Elliptic ambiguities like "Perot knows a richer man than Trump" provide an example of such a high-level ambiguity. Lastly, ambiguities in formalisms (also counted as one of the 39 ambiguity types) were not included in our tool, as we aim at the analysis of requirements written in natural language.
The remaining 27 patterns were integrated into our tool. In addition, all 20 patterns from the Siemens guidelines could be integrated, as they can all be easily detected on the lexical or syntactic level. Nine of the 20 Siemens patterns are already covered by 8 patterns
from the Ambiguity Handbook, so a total of 27 + 20 − 9 = 38 patterns were included in the tool. The detection patterns that were finally implemented in the tool are presented in Table 2. To make the table compact, we merged similar patterns into a single table line, so there is no 1:1 correspondence between table lines and patterns. Here, it is important to emphasize that many of the ambiguity detection patterns implemented in our tool represent semantic or pragmatic ambiguities, although we perform solely lexical and syntactic analysis. This is also easy to see in Table 2.

Table 2. Ambiguity patterns with source and level of detection. Sources: AH = Ambiguity Handbook, S = Siemens
occurrence of the verb "to be", followed by the past participle, but no further verbs are allowed to occur between "be" and the participle. Such a word sequence can be matched by the regular expression presented in Table 5.
Adjectives and adverbs: The list of vague adjectives and adverbs in the Ambiguity Handbook is incomplete, as it contains "etc." When analyzing the manual evaluations (cf. Section 5), we came to the conclusion that many more adjectives and adverbs are perceived as ambiguous than those listed in the Ambiguity Handbook. So, we decided to trade some precision for recall and to mark every adjective and every adverb as a potential ambiguity. Marking adjectives as a potential ambiguity source is also in line with the statement by Rupp [12] that any adjective in comparative form ("better", "faster") can result in misinterpretations.
Tables 3–5 clearly show that most ambiguities result from single ambiguous words (not from word combinations) and can thus be detected on the lexical level. To apply the tool to German documents, we use the same regular expressions, with the only difference that the keywords are translated and the regular expression for passive detection is altered to fit German grammar.
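To make the grep-like detection style concrete, the following sketch combines a keyword check with a tag-based regular expression for passive voice; the keyword list and the expression are simplified stand-ins for the actual patterns of Tables 3–5, and NLTK's tagger again stands in for TreeTagger:

```python
# Simplified sketch of lexical (keyword) and syntactic (tag-pattern)
# ambiguity detection; the pattern and word list are illustrative
# stand-ins, not the exact contents of Tables 3-5.
import re
import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)

# Illustrative subset of the weak words listed in Table 3.
WEAK_WORDS = {"usually", "normally", "actually", "fast", "this", "etc"}

# Passive voice (simplified): a form of "to be" followed by a past
# participle (VBN), with no further verb in between.
PASSIVE_RE = re.compile(
    r"\b(?:is|are|was|were|be|been|being)/VB\w*"
    r"(?:\s+\w+/(?!VB)\w+)*\s+\w+/VBN\b")

def analyse(sentence):
    tagged = nltk.pos_tag(sentence.split())
    tagged_text = " ".join(f"{word}/{tag}" for word, tag in tagged)
    findings = []
    if PASSIVE_RE.search(tagged_text):
        findings.append("passive voice: the acting entity is left implicit")
    for word, tag in tagged:
        if word.lower().strip(".,") in WEAK_WORDS:
            findings.append(f"weak word '{word}'")
        if tag in ("JJ", "JJR", "JJS", "RB"):
            findings.append(f"potentially vague adjective/adverb '{word}'")
    return findings

print(analyse("Maintenance is usually performed on sundays."))
```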
The tool marks every detected ambiguity occurrence red, orange, or blue, depending on the severity of the detected ambiguity. This marking idea is similar to the errors and warnings produced by most compilers: an ambiguity is marked red if it definitely represents a problem, and orange or blue if, depending on the context, it can be potentially unambiguous, too (cf. Table 3).
Table 3 (keyword | explanation | example)
until, during, through, after, at | These expressions do not specify the outside behaviour. | Maintenance shall be performed on sundays vs. only on sundays.
could, should, might | These expressions are not concise. | The system should avoid errors.
usually, normally | Unnecessary speculation | The system should not display errors normally.
actually | Requirements shall avoid possibilities | Actually, this requirement is important.
100 percent, all errors | Wishful thinking | The system must be 100 percent secure.
he, she, it | Potentially unclear reference. | The system uses encryption. It must be reusable.
(,) | Unclear brackets | The system shall use HTML (DOC) documents.
/ | Unclear slashes | The System shall use HTML/DOC documents.
tbd, etc | These expressions denote that something is missing | The system shall support HTML, DOC etc.
fast | Vague non-functional requirement | The system shall be fast.
this | Potentially unclear reference. | This is very important.
Table 4. Regular expressions used to detect ambiguities resulting from word combinations
The difference between blue and orange is based on our experiments with the tool: patterns that get blue markings are more likely to result in false positives. For every sentence that contains a detected ambiguity, the tool places a pictogram next to the sentence. If the user clicks on the pictogram, he/she gets an explanation for every marking in the sentence under analysis. Additionally, the tool user gets a short explanation by placing the mouse pointer over the marked text. Figure 1 shows an example of the presentation of the detected ambiguities to the user.
Fig. 1. Sample tool output, applied to the text used for evaluation (cf. Section 5)
5 Evaluation
In order to evaluate the tool, we created a reference data set consisting of approximately
50 German and 50 English sentences. The sentences were not specially crafted for the
tool evaluation, but taken at random from real requirements documents. Table 6 shows
an excerpt from the reference data set.
We asked 11 subjects to mark ambiguities in the data set. We had subjects from
different backgrounds:
by the tool, and S = T ∩ E. In the most simple form, we calculated recall and precision as Precision = |S| / |T| and Recall = |S| / |E|. Additionally, we used the criticality values to calculate weighted recall. We defined Recall_weighted = weight(S) / weight(E), where weight(A) = Σ_{a∈A} criticality(a). Weighted precision makes no sense, as we would have to mix unrelated criticality values coming from different sources.
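The following minimal sketch (with made-up sets T, E, and criticality values rather than the study's data) restates these definitions executably:

```python
# Hypothetical tool findings (T), human markings (E), and criticalities;
# the values are made up purely to illustrate the definitions above.
T = {"r1", "r2", "r4", "r6"}           # passages marked by the tool
E = {"r1", "r2", "r3", "r6", "r7"}     # passages marked by the evaluators
criticality = {"r1": 3, "r2": 1, "r3": 2, "r6": 3, "r7": 1}

S = T & E                              # S = T ∩ E

precision = len(S) / len(T)
recall = len(S) / len(E)

def weight(A):
    """weight(A) = sum of criticality(a) over all a in A."""
    return sum(criticality[a] for a in A)

recall_weighted = weight(S) / weight(E)

print(f"precision={precision:.2f}, recall={recall:.2f}, "
      f"weighted recall={recall_weighted:.2f}")
```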
When calculating recall and precision according to the above definitions, we observed two interesting phenomena:
1. There exist text passages that were marked as problematic, although these markings represent no ambiguities but are purely stylistic. Furthermore, they contradict explicitly stated best practices by Siemens or the suggestions of the Ambiguity Handbook. For example, one of our subjects marked "shall" as ambiguous, although the use of "shall" is not ambiguous at all and is even an explicitly stated best practice by Siemens. On the other hand, another subject marked every requirement that was in indicative mood, which is more a matter of taste than a real ambiguity. In the following definitions we will refer to such ambiguities as BP− (falsely marked ambiguities that are not ambiguities according to best practices). Here, it is important to emphasize that no ambiguity was absorbed into BP− simply because it was not contained in the Ambiguity Handbook or the Siemens guidelines. Marked ambiguities were absorbed into BP− only if they were purely stylistic and in contradiction with best writing practices.
2. There exist ambiguities that were missed by every subject (thus, these ambiguities were not in E), but which are still genuine ambiguities in the sense of the Ambiguity Handbook or the Siemens guidelines. For example, many occurrences of passive voice were missed by the subjects. In the following definitions we will refer to such ambiguities as BP+ (genuine ambiguities according to best practices, which were not found by our subjects).
In order to attenuate the influence of the human evaluators' performance on the tool evaluation, we evaluated the tool not only with the original set of marked ambiguities (E), but also with E ∪ BP+, E \ BP−, and (E ∪ BP+) \ BP− as reference sets. As the sets BP+
Language | Reference set    | Precision (%) | Recall, simple (%) | Recall, weighted (%)
English  | E                | 47            | 55                 | 64
English  | E ∪ BP+          | 95            | 71                 | 78
English  | E \ BP−          | 95            | 75                 | 77
English  | (E ∪ BP+) \ BP−  | 95            | 86                 | 86
German   | E                | 34            | 53                 | 52
German   | E ∪ BP+          | 97            | 76                 | 74
German   | E \ BP−          | 97            | 69                 | 70
German   | (E ∪ BP+) \ BP−  | 97            | 86                 | 86
6 Related Work
Lightweight text processing techniques (techniques not involving natural language pars-
ing) are very popular in requirements analysis, as they are easy to implement and, never-
theless, can provide valuable information about document content. Such techniques can
be used, for example, to identify application specific concepts. Approaches by Goldin
and Berry [14], Maarek and Berry [15], and Sawyer et al. [16] provide good examples of
concept extraction techniques: they analyze occurrences of different terms and, based on occurrence frequency, extract application-specific terms from requirements documents. The focus of these approaches differs from ours, though: we do not perform any concept extraction but focus exclusively on ambiguity detection.
Ambiguity detection approaches are closer to the presented tool and should be ana-
lyzed more thoroughly. Apart from the approaches by Berry et al. [1] and Kiyavitskaya
et al. [4], used as the basis for the presented tool, ambiguity detection approaches were
introduced by Fabbrini et al. [17], Kamsties et al. [18], and Chantree et al. [19]. The
approach by Fabbrini et al. introduces a list of weak words and evaluates requirements
documents on the basis of weak word presence. Weak word detection is already in-
cluded in our tool, and adding further weak words to the detection engine is just a
matter of extending the weak words database. The ambiguity types classified by Kam-
sties et al. became a part of the Ambiguity Handbook later, so our tool already covers
most ambiguities presented there. The approach by Chantree et al. deals exclusively
with the coordination ambiguity. Our tool, although not specially designed to detect co-
ordination ambiguity, is able to detect coordination ambiguity, too, and, in addition to
that a lot more other types of ambiguities.
The tool presented in this paper has one important advantage when compared to
other existing ambiguity detection approaches: it can not only detect ambiguities, but also explain the rationale for each detected ambiguity. Thus, apart from pure ambiguity detection, the presented tool can also be used to educate requirements analysts.
7 Summary
Acknowledgments
We want to thank the participants of our empirical evaluation and other people who
helped to improve this paper: Bernhard Bauer, Naoufel Boulila, Andreas Budde, Roland
Eckl, Dominik Grusemann, Christian Leuxner, Klaus Lochmann, Asa MacWilliams, Daria Malaguti, Birgit Penzenstadler, and Carmen Seyfried.
References
1. Berry, D.M., Kamsties, E., Krieger, M.M.: From contract drafting to software specification:
Linguistic sources of ambiguity (2003),
http://se.uwaterloo.ca/dberry/handbook/ambiguityHandbook.pdf
(accessed 27.12.2009)
2. Mich, L., Franch, M., Novi Inverardi, P.: Market research on requirements analysis using linguistic tools. Requirements Engineering 9, 40–56 (2004)
3. Kamsties, E., Knethen, A.V., Philipps, J., Schätz, B.: An empirical investigation of the defect detection capabilities of requirements specification languages. In: Proceedings of the Sixth CAiSE/IFIP8.1 International Workshop on Evaluation of Modelling Methods in Systems Analysis and Design (EMMSAD 2001), pp. 125–136 (2001)
4. Kiyavitskaya, N., Zeni, N., Mich, L., Berry, D.M.: Requirements for tools for ambiguity identification and measurement in natural language requirements specifications. Requir. Eng. 13, 207–239 (2008)
5. Santorini, B.: Part-of-speech tagging guidelines for the Penn Treebank Project. Technical
report, Department of Computer and Information Science, University of Pennsylvania (3rd
revision, 2nd printing) (1990)
6. Schiller, A., Teufel, S., Stockert, C., Thielen, C.: Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Institut für maschinelle Sprachverarbeitung, Stuttgart (1999)
7. Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the International Conference on New Methods in Language Processing, pp. 44–49 (1994)
8. Schmid, H.: Improvements in part-of-speech tagging with an application to German. In: Proceedings of the ACL SIGDAT-Workshop, pp. 47–50 (1995)
9. Russell, S., Norvig, P.: Communicating, perceiving, and acting. In: Artificial Intelligence: A
Modern Approach. Prentice-Hall, Englewood Cliffs (1995)
10. Kof, L.: On the identification of goals in stakeholders' dialogs. In: Paech, B., Martell, C. (eds.) Monterey Workshop 2007. LNCS, vol. 5320, pp. 161–181. Springer, Heidelberg
(2008)
11. Fuchs, N.E., Schwertel, U., Schwitter, R.: Attempto Controlled English (ACE) language
manual, version 3.0. Technical Report 99.03, Department of Computer Science, University
of Zurich (1999)
12. Rupp, C.: Requirements-Engineering und -Management. Professionelle, iterative Anforderungsanalyse für die Praxis, 2nd edn. Hanser Verlag (2002), ISBN 3-446-21960-9
13. Clark, S., Curran, J.R.: Parsing the WSJ using CCG and log-linear models. In: ACL 2004:
Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics,
Morristown, NJ, USA, p. 103. Association for Computational Linguistics (2004)
14. Goldin, L., Berry, D.M.: AbstFinder, a prototype natural language text abstraction finder for
use in requirements elicitation. Automated Software Eng. 4, 375–412 (1997)
15. Maarek, Y.S., Berry, D.M.: The use of lexical affinities in requirements extraction. In:
Proceedings of the 5th International Workshop on Software Specification and Design,
pp. 196–202. ACM Press, New York (1989)
16. Sawyer, P., Rayson, P., Cosh, K.: Shallow knowledge as an aid to deep understanding in early
phase requirements engineering. IEEE Trans. Softw. Eng. 31, 969–981 (2005)
17. Fabbrini, F., Fusani, M., Gnesi, S., Lami, G.: The linguistic approach to the natural language
requirements quality: benefit of the use of an automatic tool. In: 26th Annual NASA Goddard
Software Engineering Workshop, Greenbelt, Maryland, pp. 97–105. IEEE Computer Society,
Los Alamitos (2001)
18. Kamsties, E., Berry, D.M., Paech, B.: Detecting ambiguities in requirements documents
using inspections. In: Workshop on Inspections in Software Engineering, Paris, France,
pp. 68–80 (2001)
19. Chantree, F., Nuseibeh, B., de Roeck, A., Willis, A.: Identifying nocuous ambiguities in
natural language requirements. In: RE 2006: Proceedings of the 14th IEEE International
Requirements Engineering Conference (RE 2006), Washington, DC, USA, pp. 56–65. IEEE
Computer Society, Los Alamitos (2006)
Ambiguity in Natural Language Software
Requirements: A Case Study
1 Introduction
Fabbrini et al. [10,9] present a tool to analyze the quality of natural language requirements written in English. The tool Alpino [5] can be used to automatically parse texts written in Dutch and to find the most probable interpretation.
3 Research Method
To answer the research question "What is the effect of ambiguity of software requirements on project success?" we studied a real-life project that failed. First we established the level of ambiguity in the requirements specification. Next we established whether ambiguity was a significant cause of the reported issues.
were performed on this sample. Table 1 lists the different requirement categories; "% of total" is the number of requirements in the category divided by the total number of requirements.
2 Alpino version 14888 for the x86 Linux platform was used in this experiment.
4 Results
4.1 Measured Ambiguity
4.1.1 Review Panel Interpretations
The review panel of three persons qualified 36 requirements as unambiguous; 41 requirements had two different interpretations, and 25 requirements had three different interpretations. Figure 1 shows for each requirement category how many requirements were ambiguous and how many were unambiguous.
3 Many requirements are formulated in the problem domain, explicitly leaving the choice of solution to the contractor. In some cases the test team prefers a solution different from the one that is implemented, while it is apparent that the requirement is interpreted in the same way as by the contractor. These issues are not considered to be caused by ambiguity.
Fig. 3. x-axis: requirement categories (see Table 1); y-axis: average number of parse trees per requirement.
discovered. Table 3 lists in how many requirement statements the most common ambiguity types were found. Note that the number of ambiguities within a single requirement statement is not counted. Figure 2 shows for each requirement category how many requirements were ambiguous and how many were unambiguous.
To make our reasoning process transparent, we illustrate it in Table 5. The presented issue was selected because it raised some discussion and gives good insight into our analysis. Determining whether the implementation satisfied the requirements was sometimes straightforward and sometimes hard to ascertain. The issue and requirements were in Dutch, so there is some risk that ambiguity is lost in translation. Also, the issue description and annotations are too large to be presented here.
4 Calculations over the set of requirements that could be parsed.
5 Evaluation
Table 6 shows that the studied sample of 102 requirements revealed a large number of ambiguous requirements. Alpino considered all but one requirement to be ambiguous. The systematic review considered 83 out of 102 requirements to be ambiguous. The review panel considered 66 of the 102 requirements to be ambiguous. That the review panel has a single reading for an ambiguous statement corresponds with the notion of innocuous ambiguity [7].
Only one of the forty inspected issues was caused by ambiguity in the requirements. This issue was not a costly one. From our study we cannot conclude that ambiguous requirements caused the failure of this project.
Table 6. Ambiguity of requirements according to review panel, systematic review, and Alpino
Although the requirements specification from project X was highly ambiguous, most of the examined issues could not be attributed to ambiguity. Project X used workshops involving the customer to clarify the requirements. Yet even workshops and discussion don't guarantee a correct interpretation. Given the complexity of this project, the many issues, and the lack of contact between test team and development team, we were surprised by this finding. It is in line with the observation of [11] that the biggest danger is unconscious disambiguation: the software engineer interprets an ambiguous requirement differently than the customer intended, but is unaware of this. In this project the contractor was aware of the high level of ambiguity from the start.
What is the cost of ambiguous requirements?
The issue that was caused by ambiguity concerns performance, potentially a costly issue. However, the architecture of the application was set up to process vast amounts of data in reasonable time. Getting to a reasonable performance took roughly 550 hours5. The issue was considered resolved; however, from the issue annotations it was apparent that much was, and is, unclear about the real-time scenarios (how much data in which time slots) and what performance is considered acceptable.
The project suffered a major budget overrun of 30,000 hours. The 550 hours for the issue caused by ambiguity are a limited part of this. The workshops to clarify ambiguity were included in the budget (180 hours). The project data does not show what part of the budget overrun was caused by ambiguity.
6 Threats to Validity
6.1 Validity of the Tests to Determine Ambiguous Requirements
Is the sample set of requirements representative?
Throughout the research project we have read and interpreted the complete set of requirements intensively. The requirement sample was characteristic of the whole set of requirements. For the different types of requirements, the requirement statements follow a similar pattern and use similar words. We found no requirements that were more specific than the ones studied in our sample. Since the different requirement types are in different requirement categories, we consider our sample to be representative of the complete set of requirements.
5 According to the project manager one software engineer worked on the performance optimiza-
tion for three months.
6.1.1.1 The Interpretations by the Review Panel. An advantage of this test is that ambiguity that does not lead to misinterpretations will not be reported. However, the reviewers might have an invalid interpretation or an interpretation different from that of the actual project team or customer. This test is thus not just an indicator of ambiguity; it also says something about the interpretation process of the reviewers. The final threat to validity is that the review panel may be under the impression that they have different interpretations while they actually share the same interpretation (false positive). This last threat could have been avoided by making the interpreters formalize their interpretation as described in [13].
6.1.1.2 The Systematic Review. Detecting ambiguities is a hard task for humans. The reviewers were not trained linguists, and unconscious disambiguation makes it easy to miss ambiguity types. We expect that the number of false negatives is rather high. The ambiguities found complied with the taxonomy of [4], and the test protocol ensured that the found ambiguities were analyzed well. We expect that the number of false positives is rather low.
6.1.1.3 Alpino Scan. Alpino was used to get objective measures for ambiguity. When a requirement has at least two parse trees, the requirement has structural or lexical ambiguity. As described in [7], many of these ambiguities have a single reading by humans and are innocuous. Alpino features a maximum-entropy-based disambiguation component to select the best parse for a given sentence. From our discussion with the Alpino research group it became clear that there is no clear threshold that can be used to automatically determine which of the parse trees is a plausible interpretation. Such a threshold would have enabled us to automatically distinguish between nocuous and innocuous ambiguity. Also, to date Alpino has no feature to report the different types of ambiguity. The parse trees of Alpino are used to detect false negatives.
A first read of all blocking and urgent issues revealed five issues that were likely to have been caused by ambiguity. Indeed, all five of these issues sparked a lot of discussion. Initially we extended this set of five with a random sample of 20 issues. When our first analysis revealed that only one issue was caused by ambiguity, another 15 issues were randomly selected and analyzed. The analysis showed that the new set of 15 issues had causes similar to those of the initial set of 20 randomly selected issues. This strengthens our belief that the sample was representative.
improvements and comments were not specified by the requirements, the contractor felt that the improvements and comments were reasonable and, without much discussion, qualified each issue as a bug. Most issues had been resolved at the time of the research. Our analysis would classify these 9 issues as "new feature request". This analysis of 13 issues revealed no false negatives.
7 Other Observations
Is the number of words an indicator for ambiguity?
Five randomly selected requirements were rewritten with the help of the rules described in Berry et al. [4]. The selected requirements had been found to be ambiguous both by the review panel and by the systematic review. Furthermore, the average word count of these requirements was 25. Rewriting these five requirements took about half a day; however, this did not include a check that the new description expressed the intention of the customer. The rewritten requirements were reviewed again using the same protocol as specified in subsection 3.2.2.
This test found that four of the five requirements were now unambiguous. The fifth requirement was still ambiguous, caused by a vague word. Disambiguating it would have required mandating a specific solution, limiting the design space more than the customer required.
The rewritten requirements contained more words than the original requirements: the average word count was 46 for the rewritten requirements, which contained more and shorter sentences. The review panel members mentioned that they had no problems in comprehending the rewritten requirements. This shows that length is not the most important indicator of ambiguity.
8 Conclusion
In this research we studied the effect of a highly ambiguous requirements document on project success. The studied project was the development of a complex system that took about 21 man-years to develop and was canceled after an independent test team found over 100 blocking issues. The perception of the contractor was that many of these issues were caused by ambiguity in the requirements. Independent tests by humans showed that 91% of the requirements were ambiguous. An automated test revealed that 92% of the requirements were ambiguous. A root cause analysis of 40 of the main issues showed that only one of the examined issues was caused by ambiguous requirements. This issue was resolved and explains 2% of the budget overrun.
In this project, ambiguous requirements were not the main cause of the issues found by the external test team and cannot explain the failure of the project. Both the independent test team and the third-party development team found ways to cope with the high level of ambiguity.
We can only speculate about the reason why the project was canceled. As a fixed-price project, it wasn't because of the budget overrun. Studying the bug reports, we saw that the number of open defects and newly found defects remained high throughout the acceptance test. This resulted in a loss of confidence in the product by the customer. A possible explanation for most of the issues is that, due to schedule pressure, not enough care was given to implementation details. Also, as Brooks already observed, adding people to a project that is already late is usually not effective.
9 Future Work
References
1. Abran, A., Moore, J.W., Bourque, P., Dupuis, R.: SWEBOK: Guide to the software engineer-
ing Body of Knowledge. IEEE Computer Society, Los Alamitos (2004)
2. Alexander, I.F., Stevens, R.: Writing better requirements. Addison-Wesley, Reading (2002)
3. Berry, D.M.: Ambiguity in Natural Language Requirements Documents. In: Paech, B.,
Martell, C. (eds.) Monterey Workshop 2007. LNCS, vol. 5320, pp. 1–7. Springer, Heidel-
berg (2008)
4. Berry, D.M., Kamsties, E., Krieger, M.M.: From contract drafting to software specification:
Linguistic sources of ambiguity. Univ. of Waterloo Technical Report (2003)
5. Bouma, G., Van Noord, G., Malouf, R.: Alpino: Wide-coverage computational analysis of
Dutch. In: Computational Linguistics in the Netherlands 2000. Selected Papers from the
11th CLIN Meeting (2001)
6. Campbell, M.J., Swinscow, T.D.V.: Statistics at Square One. John Wiley & Sons, Chichester
(2002)
7. Chantree, F., Nuseibeh, B., de Roeck, A., Willis, A.: Identifying nocuous ambiguities in
natural language requirements. In: 14th IEEE International Conference Requirements Engi-
neering, pp. 59–68 (2006)
8. Davis, A., et al.: Identifying and measuring quality in a software requirements specification. In: Proceedings of the First International Software Metrics Symposium, pp. 141–152 (1993)
9. Fabbrini, F., Fusani, M., Gervasi, V., Gnesi, S., Ruggieri, S.: Achieving quality in natural
language requirements. In: Proceedings of the 11th International Software Quality Week
(1998)
10. Fabbrini, F., Fusani, M., Gnesi, S., Lami, G.: An automatic quality evaluation for natural lan-
guage requirements. In: Proceedings of the Seventh International Workshop on Requirements
Engineering: Foundation for Software Quality REFSQ, vol. 1, pp. 45 (2001)
11. Gause, D.C.: User DRIVEN design - The luxury that has become a necessity. In: A Workshop
in Full Life-Cycle Requirements Management. ICRE (2000)
12. Hull, E., Jackson, K., Dick, J.: Requirements engineering. Springer, Heidelberg (2005)
13. Kamsties, E.: Surfacing ambiguity in natural language requirements. PhD thesis, Fachbereich
Informatik, Universität Kaiserslautern, Kaiserslautern, Germany (2001)
14. Kamsties, E., Berry, D.M., Paech, B.: Detecting ambiguities in requirements documents us-
ing inspections. In: Proceedings of the first Workshop on Inspection in Software Engineering
(WISE 2001), pp. 68–80 (2001)
15. Lauesen, S.: Software requirements: styles and techniques. Addison-Wesley, Reading (2002)
16. Mullery, G.: The perfect requirement myth. Requirements Engineering 1(2), 132–134 (1996)
17. Nuseibeh, B., Easterbrook, S.: Requirements engineering: a roadmap. In: ICSE 2000: Pro-
ceedings of the Conference on The Future of Software Engineering, pp. 35–46. ACM, New
York (2000)
18. Robertson, S., Robertson, J.: Mastering the requirements process. Addison-Wesley Profes-
sional, Reading (2006)
19. Schneider, G.M., Martin, J., Tsai, W.T.: An experimental study of fault detection in user
requirements documents. ACM Transactions on Software Engineering and Methodology
(TOSEM) 1(2), 188–204 (1992)
20. Schramm, W.: How communication works, p. 51. Mass Media & Society (1997)
On the Role of Ambiguity in RE
1 Introduction
how, when and by whom it is introduced in SRS, and on what the effects of its various forms are. Armed with this understanding, we discover that ambiguity is not necessarily a defect and can in fact play an important positive role, both in the requirements as a document and in the requirements elicitation process.
Fig. 1. (labels: Symbols, Reality, The Real World, System, User, Month, Notification, Correctness)
that is of interest (even when we have such a thing, e.g. in formal languages), but the interpretation placed on it by a cognitive agent or interpreter.
Figure 1 shows the various transformations which can lead to multiple meanings. The denotation of the semantics of the requirements is then what drives the implementation, whose purpose is to build a computer-based system which will interact with its environment in such a way that the original intent is satisfied.
3 Sources of Ambiguity
Ambiguity is essentially a linguistic phenomenon, thus it is appropriate to analyze its sources according to the usual paradigm of lexicon, syntax, and semantics (we omit pragmatics due to space constraints). We briefly outline the main issues here, without delving into all the details.
Lexical level. Ambiguity in the lexicon typically occurs when the same term is used to denote different things. This can be an inherent feature of the language being used (for example, homonyms in NL), or happen even in more formal languages due to missing or imprecise designations. In fact, even in formal languages such designations are invariably rooted in the informal real world, and all stakeholders must a priori agree on their meaning (thus establishing a common base of reference). In Figure 1, terms appearing in the requirement, such as "user" or "month", are just lexical tokens. They can correspond to different designations, e.g. "month" could mean a 30-day period, or a 31-day period, or until the same-numbered day in the next month, etc.1 Without a more precise designation, the term "month" is seriously ambiguous: for example, which date is "a month from today"?
Syntactic level. Ambiguity on the syntactic level is easier to define. It stems from the existence of multiple parse trees for a sentence; to each possible parse tree a different meaning is attached, hence the ambiguity. In Figure 1, multiple possible parse trees exist for our sample requirement. In fact, the sentence could be parsed as "The system shall delete the user and (send the notification within a month)" or as "The system shall (delete the user and send the notification) within a month", where the parentheses have been used to indicate the two critically different parsings.
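As an illustration only (a toy grammar written for this example, not a tool used by the authors), the two parses of the coordinated verb phrase can be reproduced mechanically:

```python
# Toy grammar reproducing the two readings of the sample requirement's
# verb phrase; the grammar is an illustrative construction, not part of
# the paper's approach.
import nltk

grammar = nltk.CFG.fromstring("""
  S    -> VP
  VP   -> VP CONJ VP | VP PP | V NP
  PP   -> P NP
  NP   -> Det N
  V    -> 'delete' | 'send'
  CONJ -> 'and'
  P    -> 'within'
  Det  -> 'the' | 'a'
  N    -> 'user' | 'notification' | 'month'
""")

tokens = "delete the user and send the notification within a month".split()

# One tree attaches "within a month" to the whole coordination,
# the other only to "send the notification".
for tree in nltk.ChartParser(grammar).parse(tokens):
    print(tree)
```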
Semantic level. Semantic ambiguity occurs when the source text is uniquely determined in both lexicon and syntax, and still multiple meanings can be assigned to the sentence. In this case, the ambiguity lies not in the source, but in the function assigning meaning to the source, labeled in Figure 1 as the semantics mapping function. In Figure 1, even if we have precise designations for "month", "system", "user", etc., and even if we are told which of the two syntactic interpretations to take, we could still have doubts about the intended semantics. For example, "shall send a notification" means the system will attempt to do it, but how? Is it sufficient to print out a form and hope that someone will put it in an envelope and mail it? What if the notification is sent, but not delivered? Is there some sort of
1 The Bahá'í calendar, for example, has 19 months of 19 days each, plus 4 intercalary days (5 in leap years) which are not part of any month.
Abstraction is the omission of some details (or, more properly, of some information content). Ambiguity can be used as a means of abstraction, in that the omitted detail is the information needed to discriminate between multiple semantics in order to identify the "right" one (in the eye of the requirement author). Abstraction is generally considered a desirable quality in requirements, up to a point, in that it avoids overspecification and simplifies the requirements, keeping them manageable and allowing stakeholders to focus on the important parts.
Table 1. Recognition of an ambiguity by writer and reader (recognized / unrecognized), yielding the four cases (a)–(d) discussed below.
faced with fuzzy cases, in which we suspect there is an ambiguity but cannot
determine for certain.
When the ambiguity is recognized by the writer (cases (a) and (b) in Table 1), we can assume that it is intentional: the writer uses ambiguity as a means of abstracting away unnecessary details, signifying that all possible meanings are equally acceptable to her as correct implementations of the requirements. In our example the clause "within a month" could be intentionally ambiguous, meaning that the writer (e.g., the customer) is not interested in the exact limit, as long as there is a fixed term and the term is approximately a month. In case (a), the reader (e.g., the implementor) also recognizes the ambiguity and is free to choose, among all possible implementations that satisfy the requirement in any of its possible ambiguous meanings, the one that best suits him: for example, a simple limit=today()+30; in the code will suffice. In case (b), the reader may not realize that the writer has given him the freedom to implement a vague notion of month, and might implement a full calendar, taking into account leap years and different month lengths, possibly synchronizing with time servers on the Internet to give precise-to-the-second months, etc. The resulting implementation will be correct, but unnecessarily complex. The design space for the solution has been restricted without reason, and maybe opportunities for improving the quality of the implementation in other areas (e.g., robustness or maintainability) have been lost.
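For illustration only (hypothetical code, echoing the limit = today() + 30 example above), the two implementation choices might look like this:

```python
# Two hypothetical readings of "within a month": the simple fixed term
# the writer had in mind vs. an unnecessarily precise calendar month.
from datetime import date, timedelta
import calendar

def deadline_simple(start: date) -> date:
    """Unmarked reading: any fixed term of roughly a month will do."""
    return start + timedelta(days=30)

def deadline_calendar(start: date) -> date:
    """Over-engineered reading: same-numbered day in the next month,
    clamped to that month's length (leap years included)."""
    year = start.year + start.month // 12
    month = start.month % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

today = date(2010, 1, 30)
print(deadline_simple(today))    # 2010-03-01
print(deadline_calendar(today))  # 2010-02-28
```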
If the ambiguity is not recognized by the writer (cases (c) and (d) in Table 1), we can assume it is not intentional: in a sense, it has crept in against the writer's intention. Hence, only one of the possible meanings is correct, whereas the others are incorrect. The implementation can still be correct, but only by chance (because, among the possible interpretations, the correct one was chosen). Moreover, when multiple readers are involved, as is the case in every real-life project, the chance of every reader taking up the same correct interpretation is slim: this type of ambiguity will probably lead to a wrong implementation, or to a correct implementation which is tested against the wrong set of test cases, or to a correct implementation which is tested correctly but then erroneously documented in the users' manuals according to a wrong interpretation, etc.
The critical issue becomes: how can one be certain whether a given instance of ambiguity is intentional or not? The answer lies in a generalized concept of markedness. In linguistics, a normal, default form (the one which would be used more naturally) is considered unmarked, whereas a non-standard form is considered marked (the more unnatural the form, the more marked it is considered). For example, instead of a more natural (and unmarked) "within a month", a requirement could be written as "within a period of approximately one month". This second form occurs less naturally, hence is more marked, and thus provides evidence that the ambiguity was intended by the writer (one could also say that it dispels ambiguity by explicitly stating vagueness).
NL does not offer a specific way of distinguishing intentional ambiguity from unintentional ambiguity, but conventions could be established to that effect. Notice that using more contrived forms (e.g., adding "approximately") does constitute a case
5 Conclusions
1 Introduction
Software systems are now widely used for applications including financial services, industrial management, and medical information management. Therefore, software for critical applications must now comply with the relevant legislation. Sensitive system information must not be open to unauthorised access, processing, and disclosure by legitimate users and/or external attackers. This situation makes security one of the key components involved in ensuring privacy [1]. Information security and data privacy laws are in general complex and ambiguous by nature, and in particular relatively new and still evolving [2, 10].
Such laws often evolve to meet the demands of a volatile world. Several factors, such as the introduction of new restrictions, regulatory mandates to increase security, privacy, and quality of service, technology evolution, and new threats and harms, are commonly responsible for the amendment of legislation. Amended legislation forces an organization to review its internal policies and to adopt the changes in its software systems. In particular, legally relevant requirements (security and privacy in our case) should be adapted to avoid the corresponding risks. Therefore, research should be devoted to the development of techniques that
systematically extract and manage requirements from laws and regulations in order to support requirements compliance with such laws and regulations. We believe handling evolution at the requirements level is critical in order to meet the needs of the stakeholders and constraints such as legal requirements, so that changes can be traced further through the life cycle. Due to the above, the elicitation of legally compliant requirements is a challenging task.
This paper, an extension of our previous work [9], discusses the need for a framework that allows the elicitation and management of security and privacy requirements from relevant laws and regulations. It briefly presents the foundations of a novel framework that assists in eliciting security and privacy requirements from relevant legislation and supports adopting changes in the system requirements as the laws and regulations evolve. Our contribution addresses the current research problem of handling the evolution of laws and regulations and their alignment with requirements.
(Framework overview figure; recoverable labels: goal, actor, task; security and privacy constraints; traceability; Elicit Requirements – model security & privacy dependencies, elicit security & privacy requirements, refinement, actor model, requirements; Analyse Requirements – identify attacker intentions & attacks, security attack scenario, estimate risk level, risk, identify countermeasures, refine requirements, detailed/refined requirements.)
Activity 1: Model Evolving Regulation. The first step in this activity is to identify and refine the goals from the privacy legislation by analysing why the regulation, and specific sections of the regulation, were introduced to support the specific context. We follow a basic legal taxonomy proposed by Hohfeld [11] to identify the terms of privacy legislation. The taxonomy is based on legal rights and classifies them into several elementary concepts, including privilege, claim, power, immunity, duty, no-right, liability, and disability. The next step involves the identification of the relevant actors, their performed tasks, and the required resources in the system environment to support the goals. Legal rights are concerned with the actions that the actors are allowed or permitted to perform [10, 11]. The rights should focus on consent, enforcement, notice, awareness, and participation, relating to the privacy taxonomy [1]. We use activity and purpose patterns [10] along with a subset of the Secure Tropos language to support these steps [4]. The final step involves adapting the privacy artefacts to the evolution of the legislation. We consider the privacy artefacts identified previously to support the analysis of the requirements change, and we structure our analysis along the three ways in which legal text evolves [7]: addition of a new clause, modification of an existing clause, and deletion of a clause.
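As an illustration only (hypothetical data structures, not the framework's actual tooling), the privacy artefacts and the three kinds of legal-text change described above might be represented as follows:

```python
# Illustrative data structures (hypothetical) for privacy artefacts and
# the three kinds of legal-text evolution named above.
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Right(Enum):
    # Hohfeld's elementary legal concepts used in Activity 1.
    PRIVILEGE = "privilege"
    CLAIM = "claim"
    POWER = "power"
    IMMUNITY = "immunity"
    DUTY = "duty"
    NO_RIGHT = "no-right"
    LIABILITY = "liability"
    DISABILITY = "disability"

class ChangeKind(Enum):
    ADDITION = "addition of a new clause"
    MODIFICATION = "modification of an existing clause"
    DELETION = "deletion of a clause"

@dataclass
class Clause:
    clause_id: str
    text: str
    goals: List[str] = field(default_factory=list)   # refined legal goals
    actors: List[str] = field(default_factory=list)  # e.g. controller, customer
    rights: List[Right] = field(default_factory=list)

@dataclass
class LegalChange:
    kind: ChangeKind
    clause: Clause  # the added, modified, or deleted clause

    def affected_artefacts(self) -> dict:
        """Privacy artefacts to re-examine when the legislation evolves."""
        return {"goals": self.clause.goals,
                "actors": self.clause.actors,
                "rights": [r.value for r in self.clause.rights]}

art17 = Clause(
    clause_id="95/46/EC, Art. 17",
    text="the controller must implement appropriate technical and "
         "organizational measures to protect personal data",
    goals=["personal data protection", "security in processing"],
    actors=["controller"],
    rights=[Right.LIABILITY],
)
print(LegalChange(ChangeKind.MODIFICATION, art17).affected_artefacts())
```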
Activity 2: Map Terminology. During this activity legal terms are mapped to the
terms used for security and privacy requirements. In particular, the legal artefacts
identified from the previous activity are systematically mapped to the security arte-
facts. An initial step is to identify and refine the security goals. Security goals are
identified by analysing the business and initial user requirements of the system envi-
ronment, and by following the privacy taxonomy [1]. The main focus is to ensure
critical security properties such as confidentiality, integrity, availability, authenticity,
and non-repudiation as well as the privacy goals from the previous activity within the
overall system environment. Once the goals are identified, the next step is to map the
actors from the legal concepts to the security concepts by following both security and
privacy goals. Finally, we need to map the privacy and security constraints for goal satisfaction by following the goals, actors, and tasks.
Activity 3: Elicit Requirements. During the first step of this activity, we model the security and privacy dependencies through the Secure Tropos actor model [4], by following the identified actors, goals, tasks, and constraints. This allows us to establish the compliance link from the legal concepts to the security concepts. Finally, security and legal requirements are identified by elaborating both security and privacy constraints, and traceability from legal concepts to security concepts is attained through the identified artefacts, in particular by following the relevant goals, tasks, and actors.
Activity 4: Analyse Requirements. This final activity refines the initial requirements by following risk and evolution techniques. Security threats and privacy harms that obstruct the relevant goals and influence the relevant non-compliance issues are identified and analysed. To support the analysis, we combine goal-driven risk management (GSRM) [8] with Security Attack Scenarios (SAS) [5]. The activity starts by identifying the attackers' intentions and attacks. This allows us to identify the potential resources of the system that might be attacked. In our framework, we model the goals of an attacker, the attacks, and the possible resources of the system that might be attacked with an extended set of attack links [5]. The next step of the activity is to estimate the risk level based on the analysis techniques of GSRM, so that risks are categorised as high,
medium, and low by focusing on the risk likelihood and impact. Once the risks are estimated, it is important to identify the countermeasures to prevent the potential attacks and non-compliance issues. Finally, the initial requirements are refined (if needed) to accommodate provisions for countering attacks that cannot be prevented with the existing set of requirements.
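The concrete estimation rules of GSRM are not reproduced in this paper; the following sketch merely illustrates the kind of likelihood/impact categorisation into high, medium, and low described above (the matrix values are illustrative):

```python
# Illustrative likelihood x impact categorisation into high/medium/low
# risk levels; the concrete GSRM estimation rules are not reproduced here.
LEVELS = ("low", "medium", "high")

# Illustrative matrix: RISK_MATRIX[likelihood][impact]
RISK_MATRIX = {
    "low":    {"low": "low",    "medium": "low",    "high": "medium"},
    "medium": {"low": "low",    "medium": "medium", "high": "high"},
    "high":   {"low": "medium", "medium": "high",   "high": "high"},
}

def risk_level(likelihood: str, impact: str) -> str:
    if likelihood not in LEVELS or impact not in LEVELS:
        raise ValueError("likelihood and impact must be low/medium/high")
    return RISK_MATRIX[likelihood][impact]

# In the example in Section 3, impacts are high once an attack succeeds,
# so both medium and high likelihoods yield a high risk level:
print(risk_level("medium", "high"))  # high
print(risk_level("high", "high"))    # high
```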
3 Example
The presented example briefly illustrates the applicability of our framework in a specific application context: a German bank offers its customers the use of a smart card (EC card) for payments. We have chosen relevant privacy regulations by considering the EU Directive 95/46/EC [6] and the German Federal Data Protection Act (FDPA) [3], which are related to this context. In the text below, normative phrases (such as "must", "shall") and conditional phrases (such as "and", "or") are in bold; a subject of an action is underlined; an action is italicized; an object is in bold and underlined; a measurement parameter is in bold, italicized, and underlined.
Directive 95/46/EC, Article 17 (partial), Security of processing (partial)
1. Member States shall provide that the controller must implement appropriate techni-
cal and organizational measures to protect personal data against accidental or unlaw-
ful destruction or accidental loss, alteration, unauthorized disclosure or access, in par-
ticular where the processing involves the transmission of data over a network, and
against all other unlawful forms of processing. Having regard to the state of the art
and the cost of their implementation, such measures shall ensure a level of security
appropriate to the risks represented by the processing and the nature of the data to be
protected.
German Federal Data Protection Act, Annex (partial)
1. To prevent unauthorised persons from gaining access to data processing systems
with which personal data are processed or used (access control).
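As a small illustration of the annotation convention described above (the phrase lists are illustrative and far from a complete rule set), normative and conditional phrases can be marked up automatically:

```python
# Sketch of the markup convention used for the legal excerpts above:
# normative phrases (must, shall) and conditional phrases (and, or) are
# tagged; the phrase lists are illustrative, not exhaustive.
import re

NORMATIVE = r"\b(must|shall)\b"
CONDITIONAL = r"\b(and|or)\b"

def annotate(text: str) -> str:
    text = re.sub(NORMATIVE, r"[NORM:\1]", text, flags=re.IGNORECASE)
    text = re.sub(CONDITIONAL, r"[COND:\1]", text, flags=re.IGNORECASE)
    return text

excerpt = ("Member States shall provide that the controller must implement "
           "appropriate technical and organizational measures to protect "
           "personal data against accidental or unlawful destruction.")
print(annotate(excerpt))
```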
Activity 1: Model Evolving Regulation. The goal of 95/46/EC is to ensure personal data protection, which is refined into security in processing and supported by appropriate technical and organisational measures in Article 17. The FDPA supports the goal of 95/46/EC by including high-level requirements such as access control in its annex. The customer and the application providers are the two main actors. Customer data is the main resource; it contains personally identifiable information such as the customer name and sensitive information such as card and account details. The resource is shared for common tasks such as collecting customer data and updating the account balance. Among the identified legal rights is that the providers have the liability to take appropriate measures to ensure privacy protection and to protect against any accidental and unlawful activities. To simplify the illustration of our framework, at this stage we have not considered any evolution of legal texts, but we consider it during the analysis activity below.
Activity 2: Map Terminology. The security goal for the application context is already covered by the legal goals. Therefore, we directly refine the goals to support the security properties. For example, access control is refined into identification and adequate authorisation. Goals such as data integrity and secure communication, as well as tasks like providing customised reports about the balance, are necessary for this
context. To map actors, for simplicity, we consider high-level actors such as the bank and the card issuer and assume their roles support the security constraints. The security constraints supported by the actors are: only legitimate customer, keep communication secure, transfer minimum data, and preserve anonymity. Finally, security and privacy constraints are mapped to align with the goals: for example, the provider's liability to take appropriate technical measures is a privacy constraint, while "only legitimate customer" and "keep communication secure" are security constraints supporting goals like access control and secure communication.
Activity 3: Elicit Requirements. Once the security and privacy constraints are analysed, this activity initially models their dependencies and then elicits the relevant requirements, such as: (i) the customer shall be identified and authenticated before being allowed to perform any transaction through the card; (ii) the bank shall only provide the minimum of required data to the retailer that supports the business purpose.
Activity 4: Analyse Requirements. Finally, the elicited requirements are analysed based on the security threats, privacy harms, and legislation evolution. We consider data retention from Directive 2006/24/EC [6] as an evolution, adding new constraints from the legislation to the application context.
Article 6 partial (Periods of retention)
Member States shall ensure that the categories of data specified in Article 5 are re-
tained for periods of not less than six months and not more than two years from the
date of the communication.
The amendment of the legal text introduces the bank's liability to retain the customer data for a certain period of time. At this stage, we need to identify the attacker's intentions and the attacks behind the non-compliance issues in the environment. Among the several attacker goals, we consider here obtain sensitive data, pursued by external attackers through unauthorised access to the system or eavesdropping, and by internal attackers through misuse. Furthermore, the amendment of the legislation also supports the attacker's goal: the longer data is retained, the higher the likelihood of accidental disclosure, data theft, and other illegal activities. The impact of these factors is commonly high once the attacker successfully performs an attack. Therefore, for simplicity, we consider the risk level to be high for both high and medium likelihoods of the risk factors. Finally, existing requirements are refined, for example that the data shall be categorised in such a manner that certain sensitive data is not transferred even to trusted business partners, and new requirements are elicited, such as: The system shall retain the categorised customer data for the minimum amount of time needed to support the business purpose and to meet legal compliance, so as to ensure the security and privacy goals.
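The simplified risk rule used above (once an attack succeeds the impact is taken as high, so both high and medium likelihoods yield a high risk level) can be written down as a small rule; this is only a sketch of that reading, not a normative risk model.

```python
# Sketch of the simplified risk rule: once an attack succeeds the impact is
# assumed high, so both high and medium likelihoods yield a high risk level.
def risk_level(likelihood: str, impact: str = "high") -> str:
    """Return a coarse risk level for one risk factor (illustrative only)."""
    if impact == "high" and likelihood in ("high", "medium"):
        return "high"
    if likelihood == "low":
        return "low"
    return "medium"


assert risk_level("medium") == "high"
assert risk_level("high") == "high"
```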
4 Related Work
Mouratidis et al. [4] presented Secure Tropos for eliciting security requirements in terms of security constraints, and the approach of Islam [8] extended it with security attack scenarios, in which possible attackers, their attacks, and system resources are modelled. Islam [8] also proposed a goal-based software development risk management model (GSRM) to assess and manage risks from the RE phase. Antón et al. [1]
introduce two classes of privacy-related software requirements: privacy protection goals, such as integrity and security, and privacy harms, based on vulnerabilities relating to information monitoring, aggregation, storage, transfer, collection, and personalization. Breaux et al. [10] consider activity, purpose, and rule sets to
extract rights, obligations, and constraints from legal texts. Ghanavati et al. [7] use the User Requirements Notation, based on the Goal-oriented Requirement Language, in a requirements management framework that models hospital business processes and privacy legislation in terms of goals, tasks, actors, and responsibilities. Siena et al. [2] build on Hohfeld's legal taxonomy and map legal rights onto the i* goal modelling language to extract legal compliance requirements. In [8], we use Secure Tropos to model regulation, based on Hohfeld's legal taxonomy, in order to extract requirements that comply with legislation.
As a foundation for our work we use Secure Tropos, GSRM, activity and purpose patterns, and rule sets. Our framework contributes by enabling the analysis of privacy regulations beyond only the permitted and required actions, and by facilitating the consideration of non-compliance issues and risk management from the early stages of the development process. Furthermore, it supports adapting security and privacy requirements to changes in legislation.
5 Conclusion
Security and privacy practices are important for software that manages sensitive in-
formation and for stakeholders when selecting software or service providers to serve
their business needs. Therefore, organisations responsible for managing sensitive data cannot escape the obligation to implement the requirements established by privacy
regulations and changes therein. This paper advances the current state of the art by
contributing the foundations of a framework that aligns security and privacy require-
ments with relevant legislation.
References
[1] Antón, A., Earp, J., Reese, A.: Analyzing website privacy requirements using a privacy goal taxonomy. In: Proc. of the IEEE Joint International Conference on RE, pp. 23–31
(2002)
[2] Siena, A., Mylopoulos, J., Susi, A.: Towards a framework for law-compliant software
requirements. In: Proc. of the 31st International Conference on Software Engineering
(ICSE 2009), Vancouver, Canada (2009)
[3] Bundesdatenschutzgesetz - Federal Data Protection Act (as of November 15, 2006)
[4] Mouratidis, H., Giorgini, P.: Secure Tropos: A Security-Oriented Extension of the Tro-
pos Methodology. International Journal of Software Engineering and Knowledge Engi-
neering. World Scientific Publishing Company
[5] Mouratidis, H., Giorgini, P.: Security Attack Testing (SAT) - testing the security of in-
formation systems at design time. Inf. Syst. 32(8), 1166–1183 (2007)
[6] Information society, Summary of legislation, European Commission
Towards a Framework to Elicit and Manage Security and Privacy Requirements 261
[7] Ghanavati, S., Amyot, D., Peyton, L.: A Requirements Management Framework for Pri-
vacy Compliance. In: Workshop on Requirements Engineering (WER 2007), Toronto,
Canada (2007)
[8] Islam, S.: Software development risk management model: a goal driven approach. In:
Proceedings of the Doctoral Symposium for ESEC/FSE on Doctoral Symposium, Am-
sterdam, The Netherlands (2009)
[9] Islam, S., Mouratidis, H., Jürjens, J.: A Framework to Support Alignment of Secure
Software Engineering with Legal Regulations. Journal of Software and Systems Model-
ing (SoSyM) Theme Section NFPinDSML (to appear 2010), doi:10.1007/s10270-010-
0154-z
[10] Breaux, T.D., Antón, A.I.: Analyzing Regulatory Rules for Privacy and Security Requirements. IEEE Transactions on Software Engineering 34(1) (January-February 2008)
[11] Hohfeld, W.N.: Fundamental Legal Conceptions as Applied in Judicial Reasoning. Yale
Law Journal 23(1) (1913)
Visualizing Cyber Attacks with Misuse Case Maps
Keywords: security, requirements elicitation, misuse case, use case map, mis-
use case map.
1 Introduction
Much effort in the security area focuses on surveillance and fire-fighting, which are undoubtedly crucial aspects of the "cops and robbers" game of security. A complementary approach is prevention by design. Instead of detecting and mitigating attacks, prevention by design strives to eliminate security vulnerabilities in the early phases of software development. Vulnerabilities can take many shapes. Sometimes a well-known, long-used mechanism is misused in an unexpected way; at other times, an obscure part of the software system is exposed and exploited by an attacker. In order to eliminate vulnerabilities early during software development, it is essential to understand the attackers' perspectives and their ways of working, as pointed out by many authors (e.g., [1-3]).
Much research has been performed on modelling the technical aspect of complex
attacks, targeting security experts and security-focused software developers. Our
premise is that secure software development may benefit from involving a wider
group of stakeholders, such as domain experts (who know the subject and usage
worlds of the proposed software system) and regular software developers who have
no special security training. Ideally, this would happen during the requirements phase
of software development, which is a common ground for domain experts, software
experts and security experts to meet. Also, clarifying security issues already during
the requirements phase results in a security-conscious design that saves much trouble (money, effort, time, reputation, etc.) later on.
There are already several techniques and methods available for dealing with secu-
rity requirements in the early software development phases. But there is no technique or method that addresses security requirements in relation to the design of secure architectures. In practice, however, the two cannot be completely separated: possible threats depend on the chosen architecture; the choice of architecture might depend on which threats are considered the most dangerous; different architecture choices offer different mitigation strategies; etc. Our idea is that domain experts,
regular software developers and others should be allowed and encouraged to reason
about security concerns in an architectural context. For this purpose, suitable repre-
sentations are needed that combine user, designer and security perspectives on the
proposed software system, so that all stakeholders can understand the issues and con-
tribute their ideas and background knowledge to the security discussions.
Hence, the purpose of this paper is to introduce a new attack modelling technique
that combines an attacker's behavioural view of the proposed software system with an
architectural view. The technique is intended to be useful for a variety of stake-
holders. The technique is called misuse case maps (MUCM). It is inspired by use case
maps [4, 5], into which it introduces anti-behaviours. The technique is illustrated
through a multi-stage intrusion from the literature [6]. Results from a preliminary
evaluation are also reported.
The rest of the paper is organized as follows. We present related work in Sec. 2.
Misuse case maps are introduced in Sec. 3. Misuse case maps are applied to an exam-
ple from the literature in Sec. 4. A preliminary evaluation is presented in Sec. 5. Fi-
nally, we conclude and point out future directions for our work in Sec. 6.
2 Related Work
There are already many techniques and methods available that focus on the elicitation and analysis of security requirements during early RE. Attack trees [7] and threat trees [8] are trees with a high-level attack (or threat) at the root, which is then decomposed through AND/OR branches. Secure i* [9] is an extension of the i* modelling language, where malicious actors and their goals are modelled with inverted variants of the usual icons. Abuse frames [10] extend problem frames with anti-requirements that might be held by potential attackers. Abuse cases [11], misuse cases [12], and
security use cases [13] are security-oriented variants of regular use cases. Abuse and
misuse cases represent behaviours that potential attackers want to perform using the
software system, whereas security use cases represent countermeasures intended to
avoid or repel these attacks. The difference between abuse and misuse cases is that the
latter show use and misuse in the same picture, whereas abuse cases are drawn in
separate diagrams. We will return to misuse cases in Sec. 2.4.
There are also techniques and methods that attempt to cover later development
phases. Secure Tropos [14] extends the Tropos method with security-related con-
cepts, whereas KAOS has been extended with anti-goals [15]. The CORAS project
[16] combined misuse cases with UML-based techniques into a comprehensive
method for secure software development. Other security-focused extensions of
UML include UMLsec [17] and SecureUML [18]. Languages for secure business
process modelling have also been proposed based on BPMN [19] and UML activity
diagrams [20]. Security patterns describe recommended designs for security [21],
and the formal specification language Z has been used to specify security-critical
systems [22, 23].
Despite the many techniques and methods available for dealing with security in the
early phases of software development, there is so far no technique or method that
links security requirements and architecture. There is, however, a technique that links
software functionality in general with architecture, which we now present.
The use case map (UCM) notation [4,5,24,25] was introduced by Buhr and his team at
Carleton University in 1992. It quickly gained popularity. UCMs have been used in
both research and industry, in particular in the telecommunications sector. The notation is part of the User Requirements Notation (URN) standardized by the International Telecommunication Union (ITU).
UCMs provide a combined overview of a software system's architecture and its behaviour by drawing usage scenario paths (a.k.a. use cases) as lines across boxes that
represent architectural run-time components. The boxes can be nested to indicate hi-
erarchies of components. The scenario paths are connected to the components they
run across by responsibilities drawn as crosses. Fig. 1 illustrates and explains the ba-
sic UCM notation. This UCM shows multiple scenarios as multiple paths across the
architecture components.
Fig. 2 shows how a UCM binds responsibilities, paths, and components together.
In this simple example, a user (Alice) attempts to call another user (Bob) through
some network of agents. Each user has an agent responsible for managing subscribed
telephony features such as Originating Call Screening (OCS). Alice first sends a con-
nection request (req) to the network through her agent. This request causes the called
agent to verify (vrfy) whether the called party is idle or busy (conditions are shown between square brackets). If Bob is idle, there will be some status update (upd) and a ring signal will be activated on Bob's side (ring). Otherwise, a message stating that Bob is not
available will be prepared (mb) and sent back to Alice (msg). [4] The example also
shows how sections of scenario paths can be split and joined to indicate alternative or
parallel paths.
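To make the binding between paths, responsibilities and components concrete, the Alice-to-Bob scenario above could be written down roughly as an ordered list of (responsibility, component) pairs, as in the following sketch; the pairing and the component names are our own illustrative reading of the scenario, not part of the UCM or URN standards.

```python
# Rough encoding of the Alice-to-Bob UCM scenario as an ordered path of
# (responsibility, component) pairs. Component names are illustrative guesses.
connect_request_path = [
    ("req",  "Alice's agent"),   # Alice sends a connection request
    ("vrfy", "Bob's agent"),     # called agent checks whether Bob is idle or busy
    # [Bob idle] branch:
    ("upd",  "Bob's agent"),     # status update
    ("ring", "Bob"),             # ring signal on Bob's side
    # [Bob busy] alternative branch (not taken here):
    # ("mb",  "Bob's agent"),    # prepare "not available" message
    # ("msg", "Alice"),          # send message back to Alice
]

for responsibility, component in connect_request_path:
    print(f"{responsibility:>4} is bound to {component}")
```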
In this manner, UCMs offer high-level views for software and systems develop-
ment [4, 24]. They combine an architectural overview with behavioural detail and
thus facilitate discovery of problems within collections of scenarios or use cases.
UCMs can also serve as a means of synchronization among the scenarios/use cases, checking them for completeness, correctness, consistency, ambiguity, or consistent abstraction levels. UCMs provide several additional notations for visualizing more complex behaviours and more refined relationships between scenarios and architecture
components. We do not present all of them here. The UCM notation also offers vari-
ants that use only two of the three core components (scenario paths, architecture com-
ponents and responsibilities) [4, 25]. In particular, paths and components can be used without responsibilities for presenting very high-level overviews and for "napkin-type" sketching of ideas.
Beyond these contributions, we have not found any direct considerations of the se-
curity perspective in UCMs. But there are two contributions that address safety in a
UCM context. We review them here because security and safety requirements are
both examples of anti-functional requirements, i.e., requirements that state what the
software system should not do. Hence they are similar to functional requirements in
that they are both concerned with the software system's behaviour, but they have
opposite modalities. Wu and Kelly [29] present an approach to derive safety require-
ments using UCMs. The approach aims to provide assurance on the integrity of
requirements elicitation and formulation. First, they formulate the problem context; this is followed by analysis of deviations, assessment of risks, choice of mitigations, and formulation of safety requirements. The initial set of requirements is refined iteratively while a software system architecture is also developed incrementally. The authors conclude (1) that UCMs are effective for capturing the existing architectural context (structure and specific operational modes) besides the intended behaviour, and (2) that the explicit architectural references extend the scope of the deviation analysis compared with an analysis over functions or use cases alone. In [30], Wu and Kelly
extend their approach into a negative scenario framework (along with a mitigation
action model), which has a wider theoretical background and is more general than the
proposal in [29]. The UCM no longer plays the central role, and the approach to iden-
tifying deviations is less specific. Although their framework targeted safety-critical
systems, the authors suggest that it is applicable for other systems as well, such as
security- and performance-critical ones.
Despite the interest in combining UCMs with anti-functional requirements, no rep-
resentation technique has so far been proposed that provides a combined overview of
the attackers' and the architects' views of a proposed software system. However, there
is a technique that shows how to extend and combine representations of wanted soft-
ware system behaviour, as covered by UCMs, with an attacker's attempts to cause
harm, which we now present.
Misuse cases (MUC) [12] extend use cases (UC) for security purposes with misusers,
misuse cases and mitigation use cases, as well as the new relations threatens and
mitigates. They represent security issues by expressing the point of view of an
attacker [31]. Whereas regular UCs describe functional requirements of the software system, MUCs thereby represent anti-functional requirements, i.e., behaviours
the software should prohibit. They thus encourage focus on security early during
software development by facilitating discussion between different stakeholder groups,
such as domain experts, software developers and security experts. MUCs have also
been investigated for safety [32-34] and other system dependability threats [35] and
compared with other similar techniques like FMEA, Attack Trees and Common Crite-
ria [33, 36, 37]. MUCs can be represented in two ways, either diagrammatically or textually. Diagrammatically, MUC symbols invert the graphical notation used for regular UC symbols, and UC and MUC symbols can be combined in the same diagram. Textually, both a lightweight and an extensive template are offered [12, 34].
3 Misuse Case Maps
The previous section shows that there are many techniques and methods available for dealing with security requirements in the early software development phases, both during RE and in the transition to later phases. But there is no technique or method that addresses security requirements in relation to the design of secure architectures. Yet it is well known that requirements and architecture can rarely be considered in complete isolation. On the contrary, architecture is essential for security in several ways. The
types of architecture components suggest typical weaknesses and attack types for the
component (e.g., a router can be scanned for open ports). The specifics of architecture
components suggest specific weaknesses (e.g., a particular router model is likely to
have a particular standard password). The path each function takes through the soft-
ware architecture suggests which general and specific weaknesses a user of that func-
tion might try to exploit. Furthermore, when weaknesses have been identified, archi-
tectural considerations are equally important for mitigating the threats. To alleviate
these and other problems, there is a need for a security requirements technique that
combines an attack-oriented view of the proposed software with an architectural view.
MUCMs extend regular UCMs for security purposes with exploit paths in much the
same way that MUCs extend regular UCs with misuse cases. As in regular UCMs, the
exploit paths in a MUCM are drawn across nested boxes that represent hierarchically-
organized architecture components. In addition to regular responsibilities, the inter-
section of an exploit path and a component can constitute a vulnerability, which is a
behavioural or structural weakness in a system. A component can be a vulnerability
too. A threat combines one or more weaknesses to achieve a malicious intent. Vulner-
abilities can be mitigated, where a mitigation counters a threat and translates to a se-
curity requirement. Regular scenario paths and exploit paths can be combined in the same MUCM, just as a MUC diagram can also show UCs.
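As a rough illustration of these concepts (and only that; the MUCM notation itself is graphical), the building blocks could be captured in a small data model along the following lines, with all class and field names being hypothetical.

```python
# Hypothetical sketch of the MUCM building blocks described above.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vulnerability:
    description: str                  # behavioural or structural weakness
    location: str                     # responsibility or component it sits on
    mitigation: Optional[str] = None  # a mitigation translates to a security requirement


@dataclass
class ExploitPath:
    stage: int                        # order within a multi-stage intrusion
    description: str
    crossed_components: list = field(default_factory=list)
    exploited: list = field(default_factory=list)  # Vulnerability objects used
    caused_damage: bool = False       # ends in a lightning symbol if True
```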
3.3 Notation
The MUCM notation is based on the regular UCM notation, just like the MUC nota-
tion is based on the UC notation. The MUC notation uses inversion of use-case sym-
bols to distinguish wanted from unwanted behaviour. This is not easy to do for use
case maps, where the start and end points of regular scenario paths are already shown
with filled icons and where the paths themselves are drawn as solid lines. Instead,
we have explored using colours and different icon shapes to distinguish between
wanted and unwanted behaviours in MUCMs.
The leftmost column of Fig. 4 shows the basic MUCM symbols. An exploit path
starts with a triangle and ends in a bar (as in UCMs) if no damage happened. Other-
wise the path ends in a lightning symbol. Exploit paths can be numbered to show the
order of stages in a complex intrusion, where the stages will mostly be causally re-
lated, in the sense that each of them builds on the results of previous ones. A vulner-
able responsibility (e.g., an authentication point) or component (e.g., an unpatched
server) is indicated by a closed curve, and a mitigation of the resulting threat (e.g.,
secure password generation, routines for patching servers) is shown by shading the
interior of the closed curve. Responsibilities can be left out whenever they are not relevant from the intrusion's point of view. With these basic symbols, MUCMs offer a notation that is close to the simplified UCM notation suggested for very high-level overviews and "napkin-type" sketching of ideas. We expect this to be the most prominent use of MUCMs in practice.
Yet at this early stage it is worth exploring more detailed notation alternatives too.
For example, the rightmost column of Fig. 4 shows how an hourglass can be used to indicate that an exploit path must wait for a regular scenario path to reach a certain point. The example in Fig. 3 used this notation to show how an attacker, who has secured access to a Citrix server at an earlier intrusion stage, installs a keylogger on the server in order to snatch the administrator's password. The hourglass indicates that the attacker has to wait for an administrator to log in before the keylogger can snatch the password.
Fig. 4. The proposed MUCM notation, with the basic symbols shown on the left and further
tentative extensions on the right
Get, put and remove arrows can be used to show how an exploit path interacts with
a component. An example involving the get arrow is when the attacker accesses the
password hash files. An example of a put arrow is when the attacker installs a sniffer
program on one of the servers. An example using the remove arrow is when the at-
tacker deletes his/her traces from a system. We will see in the complex example that not all information about the case is available. Similarly, when (re)creating an intrusion, some parts of it may be unclear at first. Question marks can be used as reminders about such unclear issues.
Labels can be attached to symbols. For example, a label at the start of a UCM path
might indicate the role of the actor if it affects a connected exploit path; the end of an exploit path might be labelled with the result of the exploit if it is not clear from the path alone; and get, put or remove arrows can be labelled with the types of
data or software that are accessed. These arrows are part of the regular UCM notation
as well, where they have a slightly different meaning. Hence, their interpretation de-
pends on the context. Further work should consider using distinct symbols, such as a
wave arrow, to differentiate the notations.
As in regular UCMs, the granularity of the intrusion representation can be changed by combining or exploding steps. Consider a case where the attacker downloads a file of password hashes from one machine, cracks them on his/her own computer and proceeds to log into another machine with a cracked password. This could be shown as individual steps or as a composite step: a MUCM stub that hides the cracking process and leads the exploit path from the first machine, which holds the hashes, to the one logged into with the cracked password.
As already explained, we expect the basic MUCM symbols to be the ones most
used in practice, and we expect the notation we suggest here to evolve further. In this
paper, however, we will stay with the symbols we have used in the preliminary
evaluation to be presented in Sec. 5.
The bank intrusion is a multi-stage intrusion presented in [6, Chapter 7]. We suggest
following the intrusion on the MUCM in Sec. 4.2 while reading through the intrusion
steps.
First, the intruder found an interesting bank by browsing a web site that listed organizations and their assigned IP ranges. Next, he probed for further details about the IP addresses of the bank and found a server that was running Citrix MetaFrame (remote access software). He then scanned other networked computers for the remote access port to Citrix terminal services (port 1494). The attacker knew he might be able to enter the server with no password, as the default Citrix setting is that no password is required. He searched every file on the computer for the word "password" to find the clear-text password for the bank's firewall. The attacker then tried to connect to routers and found one with a default password. He added a firewall rule allowing incoming connections to port 1723 (VPN).
After successfully authenticating to the VPN service, the attacker's computer was
assigned an IP address on the internal network, which was flat, with all systems on a
single segment. He discovered a confidential report written by security consultants
containing a list of network vulnerabilities. He also found operation manuals for the bank's IBM AS/400 server on a Windows domain server. The default password worked for the AS/400.
Fig. 5. A misuse case map for the bank intrusion. The solid red line depicts the attacker's footprint, whereas the dashed black line shows the regular user's activities.
The intruder installed a keylogger on the Citrix server, waited
for an administrator to log in and snarfed the administrator's password. He now had
access to training manuals for critical AS/400 applications, giving him the ability to
perform any activity a teller could. He also found that the database of the Department
of Motor Vehicles was accessible from the bank's site. He accessed the Primary Domain Controller (which authenticates login requests to the domain) and added to the administrator's startup folder a disguised script that extracted password hashes from a protected part of the system registry. He then waited for a domain administrator to log in so that the script would be triggered and the password hashes written to a hidden file. He then cracked the appropriate password. The most sensitive parts of the bank's operations could now be accessed (generating wire transfers, moving funds, etc.). A
manual he had already found described how to complete a wire transfer form.
The attacker was a white-hat hacker who claimed not to have harmed the bank or its customers as a result of the intrusion.
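For illustration only, the opening stages of this intrusion could be recorded as staged exploit-path entries in the spirit of the MUCM concepts above; the stage boundaries, component names and field names below are our own reading of the narrative, not taken from [6] or from Fig. 5.

```python
# The opening stages of the bank intrusion as staged exploit-path records.
# Stage boundaries and field names are our own reading of the narrative.
stages = [
    {
        "stage": 1,
        "description": "Locate the bank and a reachable Citrix MetaFrame server",
        "crossed_components": ["public web site listing IP ranges", "Citrix server"],
        "vulnerabilities": ["default Citrix setting: no password required"],
    },
    {
        "stage": 2,
        "description": "Recover the firewall password and open VPN access (port 1723)",
        "crossed_components": ["Citrix server", "firewall", "router"],
        "vulnerabilities": ["clear-text firewall password stored in a file",
                            "router left with its default password"],
    },
    {
        "stage": 3,
        "description": "Enter the flat internal network over the VPN",
        "crossed_components": ["VPN service", "internal network segment"],
        "vulnerabilities": ["single flat network segment"],
    },
]

for s in stages:
    print(f"stage {s['stage']}: {s['description']}")
```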
4.2 MUCM
Fig. 5 shows a MUCM for the bank intrusion. Some details were omitted, either be-
cause they were not given in the original text (e.g., how the access to some of the
components was secured) or because the details were intuitive and would only over-
load the map (e.g., to access the internal computers, the attacker always went through
the VPN).
5 Preliminary Evaluation
For a preliminary evaluation of MUCMs and the MUCM notation, we sent a written evaluation sheet to more than 20 colleagues and other contacts. We received 12 responses. All respondents had MSc or PhD degrees, except one MSc student, who did, however, have professional experience as a system administrator. All the degrees were in computing. Six of the respondents were working in academia, four in industry, and two in both academia and industry.
The evaluation sheet had three sections. The first section explained the aim and the
required conditions of the experiment. There was no time limit, but the respondents
were asked to perform the evaluation without interruption. The second section gave
an introduction to UCM and MUCM. The third section included a copy of the textual
description of the bank intrusion from [6], along with the corresponding MUCM (Fig.
5). The third section also comprised three sets of questions, regarding (a) the background of the participants, (b) the participants' understanding of the case, and (c) the user acceptance of the technique. The user acceptance questions were inspired by the Technology Acceptance Model (TAM) [38], reflecting the three TAM variables (perceived usefulness, perceived ease of use, and intention to use) with two items each, giving
six items (or questions) in all. At the end of the sheet, open comments were invited. In
addition, the respondents were asked how much time they spent on the evaluation
sheet and which aids they relied on when answering the questions about understand-
ing of the case (either the textual description, the misuse case map or memory).
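As a small illustration of how the TAM part of the sheet can be scored, the sketch below averages the two items of each TAM variable for one respondent; the response values and the function name are made up for the example.

```python
# Sketch of scoring the TAM part of the sheet: two items per variable
# (PU, PEOU, ITU) on a 1-5 scale, averaged per respondent.
def tam_scores(responses: dict) -> dict:
    """Average the two items of each TAM variable for one respondent."""
    return {var: sum(items) / len(items) for var, items in responses.items()}


example_respondent = {
    "PU":   [4, 4],   # perceived usefulness items (made-up values)
    "PEOU": [3, 4],   # perceived ease of use items
    "ITU":  [4, 5],   # intention to use items
}
print(tam_scores(example_respondent))   # {'PU': 4.0, 'PEOU': 3.5, 'ITU': 4.5}
```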
The participants spent between 20 and 60 minutes on the task (37 minutes on aver-
age). We split the responses into the following four groups, depending on which aids they reported having relied on when answering the questions about understanding of the case. The four groups were TD (9 valid responses relying on the textual descrip-
tion), MEM (6 valid responses relying on memory), MUCM (6 valid responses rely-
ing on the misuse case map) and NON-M (4 valid responses not using the misuse case
map). Because most respondents reported relying on more than one aid, the groups
overlap considerably. Nevertheless, a comparison of the responses gives a useful first
indication of the strengths and weaknesses of MUCM as an aid for understanding a
complex intrusion scenario.
Table 1 summarizes the responses according to group. The TD and MUCM groups
spent the most time on the task, and the MEM and NON-M groups the least. With regard to
background experience, the groups were quite similar, with average scores between
3.6 and 3.9 on a scale from 1 to 5 (with 5 being highest). The TD and MUCM groups
reported slightly lower experience on average than the MEM and NON-M groups,
suggesting that less experienced respondents relied more on external aids. This may
explain in part why they also spent more time. The MUCM group had the highest aver-
age percentage (77%) of correct answers to the questions about understanding,
whereas the NON-M group had the lowest one (68%). Although time may have
played a part, we take this to be an indication that MUCM may indeed be a beneficial
aid for understanding complex intrusions. The TD group also had a high average
score on understanding, and the MEM group a lower one.
Table 1. Responses to the evaluation sheet, grouped according to the aids used when answer-
ing the questions about understanding of the case
On average, the four groups rated the perceived usefulness (PU) of MUCMs simi-
larly, from 3.8 to 3.9 (again on a scale from 1 to 5 with 5 being highest). The average
ratings on perceived ease of use (PEOU), however, were considerably higher for
MUCM (3.3) than for NON-M (2.5), suggesting that perceived ease of use played an
important role in the respondents' decisions whether to use the MUCM or not when
answering questions about the case. The TD group also had a high average rating on PEOU, and MEM a low one. In general, the scores on PU were higher than those for PEOU,
indicating that the somewhat elaborate MUCM notation used in the evaluation was
perceived as complex. On intention to use (ITU), however, the average ratings were
nearly identical between the groups (from 3.8 to 4.0). Surprisingly, intention to use
misuse case maps in the future was highest for the NON-M group, which had not used
MUCMs at all. Because the evaluation was preliminary, we do not address validity
issues here.
We received written and oral comments on the following issues: the notation was
hard to understand although it could become easier to read with time; the map con-
tained too much detail; the map contained too little detail; UML sequence diagrams
may be a better alternative; the component concept is unclear because it mixes physical
and logical entities; and MUCMs are good for analysis but maybe not for communica-
tion. We plan to address these comments in the further development of the technique.
6 Conclusion and Future Work
The paper has introduced a new attack modeling technique, misuse case maps
(MUCM), that combines an attacker's behavioral view of the proposed software sys-
tem with an architectural view. The purpose of misuse case maps is to offer a repre-
sentation technique with the potential to include a wider group of stakeholders, such
as domain experts and regular software developers, in security considerations already
during the earliest development phases. The technique and its notation were illustrated
through a multi-stage bank intrusion described in the literature. Results from a pre-
liminary evaluation were also reported, indicating that MUCM may indeed be a bene-
ficial aid for understanding complex intrusions.
Of course, the preliminary evaluation is severely limited. It used only a small ex-
ample, which precluded statistical analysis. The evaluation was not controlled, and
the subjects were colleagues and other contacts who might have been positively bi-
ased towards our proposal. Hence, further empirical evaluations are clearly needed,
for example investigating different complex intrusion scenarios, such as those in
[39, Chapter 3]. They should involve more subjects working under more controlled
conditions.
Future work on MUCMs should address issues such as how to avoid overly com-
plex, spaghetti-like maps, how to best communicate intrusions to domain experts and
regular software developers, and how to involve them in the process of exploring and mitigating vulnerabilities. Further evaluations and practical studies will use MUCMs in
increasingly realistic settings. We intend to combine MUCM with other attack model-
ling and security analysis techniques. We also plan to provide practical guidelines to
establish a security requirements method and provide tool support for it. The method
should perhaps be further extended to consider anti-functional requirements in gen-
eral, addressing safety requirements in particular in addition to security. Important
questions to address will be when and how to apply the security and safety experts'
knowledge, and how to manage the different types of information generated by the cooperation between customers, domain experts and developers.
References
1. Barnum, S., Sethi, A.: Attack Patterns as a Knowledge Resource for Building Secure
Software. In: OMG Software Assurance Workshop (2007)
2. Koziol, J., et al.: The shellcoder's handbook: discovering and exploiting security holes.
John Wiley & Sons, Chichester (2004)
3. Hoglund, G., McGraw, G.: Exploiting Software: How to Break Code. Addison-Wesley,
Boston (2004)
4. Amyot, D.: Use Case Maps Quick Tutorial (1999),
http://www.usecasemaps.org/pub/UCMtutorial/UCMtutorial.pdf
5. Buhr, R., Casselman, R.: Use case maps for object-oriented systems. Prentice-Hall, Inc.,
Upper Saddle River (1995)
6. Mitnick, K.D., Simon, W.L.: The art of intrusion: the real stories behind the exploits of
hackers, intruders & deceivers. Wiley, Chichester (2005)
7. Schneier, B.: Secrets & lies: digital security in a networked world. John Wiley & Sons,
Chichester (2000)
8. Amoroso, E.G.: Fundamentals of computer security technology. Prentice-Hall, Inc., Upper
Saddle River (1994)
9. Liu, L., Yu, E., Mylopoulos, J.: Security and privacy requirements analysis within a social
setting. In: Proc. RE 2003, vol. 3, pp. 151–161 (2003)
10. Lin, L., et al.: Using abuse frames to bound the scope of security problems (2004)
11. McDermott, J., Fox, C.: Using abuse case models for security requirements analysis (1999)
12. Sindre, G., Opdahl, A.L.: Eliciting security requirements with misuse cases. Requirements
Engineering 10(1), 34–44 (2005)
13. Firesmith, D.J.: Security use cases. Technology 2(3) (2003)
14. Giorgini, P., et al.: Modeling security requirements through ownership, permission and
delegation. In: Proc. of RE, vol. 5, pp. 167–176 (2005)
15. Van Lamsweerde, A., et al.: From system goals to intruder anti-goals: attack generation
and resolution for security requirements engineering. In: Requirements Engineering for
High Assurance Systems (RHAS 2003), vol. 2003, p. 49 (2003)
16. Dimitrakos, T., et al.: Integrating model-based security risk management into eBusiness
systems development: The CORAS approach. In: Monteiro, J.L., Swatman, P.M.C.,
Tavares, L.V. (eds.) Proc. 2nd Conference on E-Commerce, E-Business, E-Government
(I3E 2002), pp. 159–175. Kluwer, Lisbon (2002)
17. Jürjens, J.: UMLsec: Extending UML for secure systems development. In: Jézéquel, J.-M., Hussmann, H., Cook, S. (eds.) UML 2002. LNCS, vol. 2460, pp. 412–425. Springer, Heidelberg (2002)
18. Lodderstedt, T., et al.: SecureUML: A UML-based modeling language for model-driven
security. In: Jézéquel, J.-M., Hussmann, H., Cook, S., et al. (eds.) UML 2002. LNCS, vol. 2460, pp. 426–441. Springer, Heidelberg (2002)
19. Rodriguez, A., Fernandez-Medina, E., Piattini, M.: Towards an integration of security re-
quirements into business process modeling. In: Proc. of WOSIS, vol. 5, pp. 287–297
(2005)
20. Rodriguez, A., Fernandez-Medina, E., Piattini, M.: Capturing Security Requirements in
Business Processes Through a UML 2.0 Activity Diagrams Profile. In: Roddick, J.,
Benjamins, V.R., Si-said Cherfi, S., Chiang, R., Claramunt, C., Elmasri, R.A., Grandi, F.,
Han, H., Hepp, M., Lytras, M.D., Mišić, V.B., Poels, G., Song, I.-Y., Trujillo, J., Vangenot, C. (eds.) ER Workshops 2006. LNCS, vol. 4231, pp. 32–42. Springer,
Heidelberg (2006)
21. Schumacher, M., et al.: Security Patterns: Integrating Security and Systems Engineering.
Wiley, Chichester (2005)
22. Boswell, A.: Specification and validation of a security policy model. IEEE Transactions on
Software Engineering 21(2), 63–68 (1995)
23. Hall, A., Chapman, R.: Correctness by construction: Developing a commercial secure
system. IEEE Software, 18–25 (2002)
24. Buhr, R.J.A.: Use case maps for attributing behaviour to system architecture. In: 4th
International Workshop of Parallel and Distributed Real-Time Systems (1996)
25. Buhr, R.J.A.: Use case maps as architectural entities for complex systems. IEEE Transac-
tions on Software Engineering 24(12), 1131–1155 (1998)
26. Woodside, M., Petriu, D., Siddiqui, K.: Performance-related completions for software
specifications. In: 24th International Conference on Software Engineering (2002)
27. Liu, X., Peyton, L., Kuziemsky, C.: A Requirement Engineering Framework for Electronic
Data Sharing of Health Care Data Between Organizations. In: MCETECH (2009)
28. Mussbacher, G., Amyot, D., Weiss, M.: Visualizing Early Aspects with Use Case Maps.
In: Rashid, A., Aksit, M. (eds.) Transactions on AOSD III. LNCS, vol. 4620, pp. 105–143.
Springer, Heidelberg (2007)
29. Wu, W., Kelly, T.P.: Deriving safety requirements as part of system architecture defini-
tion. In: Proceedings of the 24th International System Safety Conference, Albuquerque
(2006)
30. Wu, W., Kelly, T.: Managing Architectural Design Decisions for Safety-Critical Software
Systems. In: Hofmeister, C., Crnković, I., Reussner, R. (eds.) QoSA 2006. LNCS, vol. 4214, pp. 59–77. Springer, Heidelberg (2006)
31. Alexander, I.: Misuse cases: Use cases with hostile intent. IEEE Software 20(1), 58–66
(2003)
32. Sindre, G.: A look at misuse cases for safety concerns. International Federation for Infor-
mation Processing Publications - IFIP, vol. 244, p. 252 (2007)
33. Stålhane, T., Sindre, G.: A comparison of two approaches to safety analysis based on use cases. In: Parent, C., Schewe, K.-D., Storey, V.C., Thalheim, B. (eds.) ER 2007. LNCS, vol. 4801, pp. 423–437. Springer, Heidelberg (2007)
34. Stålhane, T., Sindre, G.: Safety Hazard Identification by Misuse Cases: Experimental Comparison of Text and Diagrams. In: Czarnecki, K., Ober, I., Bruel, J.-M., Uhl, A., Völter, M. (eds.) MODELS 2008. LNCS, vol. 5301, pp. 721–735. Springer, Heidelberg
(2008)
35. Sindre, G., Opdahl, A.L.: Misuse Cases for Identifying System Dependability Threats.
Journal of Information Privacy and Security 4(2), 3–22 (2008)
36. Diallo, M.H., et al.: A comparative evaluation of three approaches to specifying security
requirements. In: Proc. REFSQ 2006, Luxembourg (2006)
37. Opdahl, A.L., Sindre, G.: Experimental comparison of attack trees and misuse cases for
security threat identification. Information and Software Technology 51(5), 916–932 (2009)
38. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of informa-
tion technology. MIS Quarterly 13(3), 319–340 (1989)
39. Lindqvist, U., Cheung, S., Valdez, R.: Correlated Attack Modeling, CAM (2003)
How Do Software Architects Consider Non-Functional
Requirements: A Survey
1 The Survey
The survey was developed following an iterative methodology, with each iteration reviewed by IT experts and researchers in the area. For the implementation we chose LimeSurvey, an open-source project for developing surveys.
For the dissemination of the survey we used two strategies: on the one hand, personal contact with software architects and, on the other hand, advertisements in IT communities hosted on popular sites such as LinkedIn and Facebook. We contacted more than 10 software architects and advertised in the International Association of Software Architects (IASA) group. The survey ran during 2009.
The survey had questions about software development. Specifically, we asked about the architectural styles used, the types of applications developed, the technological platforms used in them, and about Non-Functional Requirements (NFRs).
In this work we show the results about NFRs and their relationship to the architectural style used, the type of application developed, and the technological platform used.
Fig. 1. Importance of nine types of NFRs (maintainability, reusability, efficiency, reliability, usability, portability, cost, standards compliance, organizational) as rated by respondents, from None to Critical (including No answer).
2 The Results
We had 60 responses to the survey. The main results of this survey about NFRs may be summarized as follows:
Respondents answered about the importance of NFRs in their habitual software development practices: while 96% of respondents consider NFRs (73% at the same level as functional requirements), only 57% use NFRs to make architectural and technological decisions.
Respondents rated nine types of NFRs with respect to their importance to their projects, as shown in Fig. 1. Requirements such as maintainability, reusability, efficiency, reliability, and usability tend to be more important to architects than portability, cost, standards compliance, and organizational NFRs.
80% of respondents declared that the development tools they use are not well suited for analysing compliance with the specified NFRs, whilst 70% would like to have such support. For us, this is a clear indicator that there is an unsatisfied need in the software industry.
Other results (e.g., some relations between NFRs and the architectural styles used) were also found when analyzing the gathered data.
3 Conclusions
This survey can be seen as an instrument to show the differences in software development practices between research and industry. In particular, we show the impact of NFRs on software development practices.
Our position is that one way to obtain empirical evidence about the current state of software architecture usage in IT companies and organizations is to ask the actors involved.
Author Index