Rajesh Vasa - PHD Thesis
Rajesh Vasa
2010
Abstract
In this thesis we address the problem of identifying where, in successful
software systems, maintenance effort tends to be devoted. By examin-
ing a larger data set of open source systems we show that maintenance
effort is, in general, spent on addition of new classes. Interestingly, ef-
forts to base new code on stable classes will make those classes less
stable as they need to be modified to meet the needs of the new clients.
This thesis advances the state of the art in terms of our understanding
of how evolving software systems grow and change. We propose an
innovative method to better understand growth dynamics in evolving
software systems. Rather than relying on the commonly used method
of analysing aggregate system size growth over time, we analyze how
the probability distributions of a range of software metrics change over
time. Using this approach we find that the process of evolution typically
drives the popular classes within a software system to gain additional
clients over time and the increase in popularity makes these classes
change-prone.
Dedicated to all my teachers
Acknowledgements
Finally, I would like to thank my family for their loving forbearance dur-
ing the long period it has taken me to conduct the research and write
up this thesis.
Declaration
I declare that this thesis contains no material that has been accepted for
the award of any other degree or diploma and to the best of my knowl-
edge contains no material previously published or written by another
person except where due reference is made in the text of this thesis.
Publications Arising from this
Thesis
7. R. Vasa, M. Lumpe, P. Branch, and O. Nierstrasz. Comparative
Analysis of Evolving Software Systems using the Gini Coefficient.
In Proceedings of the 25th IEEE International Conference on Soft-
ware Maintenance (ICSM ’09), 2009.
The early articles helped lay the foundation and scope the work pre-
sented in this thesis. Specifically, the QAOOSE’03 and ISESE’05 ar-
ticles (papers 1 and 2) showed that software metrics typically exhibit
highly skewed distributions that retain their shape over time and that
architectural changes can be detected by analyzing these changing dis-
tributions. The article published at SC’2007 (paper 3) expanded on the
ISESE’05 article (paper 2) and presented a mathematical model to de-
scribe the evolution process and also put forward the thresholds as well
as a technique to detect substantial changes between releases. These
papers helped establish and refine the input data selection method
(Chapter 3), validate the approach that we take for extracting metrics
(Chapter 4), and develop the modelling approach that we eventually
used to detect substantial changes between releases (Chapter 5).
More recent work (in particular, ICSM’07 and ICSM’09 articles and the
EVOL’07 article – papers 4, 5 and 7) contributed to the content pre-
sented in Chapters 5 and 6 of this thesis which address the primary
research questions. The article in ASWEC’10 (paper 8) showed that the
key analysis approach advocated in this thesis can also be used to un-
derstand how properties are used in Java software. The IEEE Software
article in 2009 (paper 6) presented a method for reasoning about soft-
ware architecture and the findings from this thesis influenced some of
the arguments with respect to the long term stability of software archi-
tecture. The implications that we derived from all of the various papers
are expanded upon in Chapter 7.
Contents
1 Introduction
2 Software Evolution
2.1 Evolution
3.7 Summary
4.6 Summary
5 Growth Dynamics
7 Implications
8 Conclusions
References
List of Figures
5.5 Spring evolution profiles showing the upper and lower boundaries
on the relative frequency distributions for Number of Branches,
In-Degree Count, Number of Methods and Out-Degree Count. All metric
values during the entire evolution of 5 years fall within the
boundaries shown. The y-axis in all the charts shows the percentage
of classes (similar to a histogram).
5.7 Box plot of Gini coefficients across all selected Java systems.
List of Tables
4.2 Direct count metrics computed for both classes and interfaces.
Chapter 1
Introduction
Research in the field of software evolution aims to bridge the gap in our
understanding of how software changes by undertaking rigorous stud-
ies of how a software system has evolved. Over the past few decades,
work in this field has identified generalizations that are summarized
in the laws of software evolution [174, 175] and has identified general
facets of evolution [198], put forward techniques for visualising evolu-
tion and change [59,65,95,163], collated and analyzed software metric
data in order to understand the inherent nature of change [27], as-
sembled methods for identifying change prone components [109, 281]
as well as advised on expected statistical properties in evolving software
systems [21, 270]. Although earlier work on how software evolves fo-
cused on large commercial systems [17,146,147,175,283], recent stud-
ies have investigated open source software systems [41, 100, 101, 192,
239,305]. This work has been enriched by more recent studies into how
object oriented software systems evolve [64, 71, 193, 269, 270].
Given that change is inherent within an active and used software sys-
tem, the key to a successful software evolution approach lies not only in
anticipating new requirements and adapting a system accordingly [87],
but also in understanding the nature and the dynamics of change, es-
pecially as this has an influence on the type of decisions the devel-
opers make. Changes over time lead to software that is progressively
harder to maintain if no corrective action is taken [168]. Compound-
ing this, these changes are often time consuming to reverse even with
tool support. Tools such as version control systems can revert back to a
previous state, but they cannot bring back the cognitive state in the de-
veloper’s mind. Developers can often identify and note local or smaller
changes, but this task is much more challenging when changes tend
to have global or systemic impact. Further, the longer-term evolution-
ary trends are often not easily visible due to a lack of easy to interpret
summary measures that can be used to understand the patterns of
change.
models that are much more descriptive and can be used to warn de-
velopers of significant variations in the development effort or highlight
decisions that may be unusual within the historical context of a project.
pret the software metrics distributions more effectively and can identify
if evolutionary pressures are causing centralisation of complexity and
functionality into a small set of classes.
Thirdly, we find that the popularity of a class is not a function of its size
or complexity, and that evolution typically drives these popular classes
to gain additional users over time. Interestingly, we did not find a con-
sistent and strong trend for measures of class size and complexity. That
is, large and complex classes do not get bigger and more complex purely
due to the process of evolution; rather, there are other contributing
factors that determine which classes gain complexity and volume.
The Appendix collates the data tables and provides an overview of the
files on the companion DVD for this thesis which has the raw metric
data extracted from software systems under investigation.
Chapter 2
Software Evolution
How does software change over time? What constitutes normal change?
Can we detect patterns of change that are abnormal and might be in-
dicative of some fundamental issue in the way software is developed?
These are the types of questions that research in the field of software
evolution aims to answer, and our thesis makes a contribution towards
this end. Over the last few decades research in this field has contributed
qualitative laws [174] and insights into the nature and dynamics of this
evolutionary process at various levels of granularity [41, 59, 65, 71, 85,
96, 100, 118, 127, 162, 171, 188, 194, 200, 284, 289, 290, 292–294,
304]. In this chapter we present the background literature relevant for
this thesis and provide motivation for our research goals.
2.1 Evolution
Evolution describes a process of change that has been observed over a
surprisingly wide range of natural and man-made entities. It spans
significant temporal and spatial scales from seconds to epochs and from
microscopic organisms to the electricity grids that power continents.
The term evolution was originally popularised within the context of
biology and captures the “process of change in the properties of popu-
lations of organisms or groups of such populations, over the course of
generations” [84]. Biological evolution postulates that organisms have
Within the context of software, the term evolution has been used since
the 1960s to characterise growth dynamics. For example, work by Halpern
[112] has shown how programming systems have evolved, and Fry et
al. [82] studied how database management systems evolve. The term
in relation to how a software system changes started to appear in work
done by Couch [57]. Building on this foundation, Lehman [174], in his
seminal work, argued that E-type software (application software used
in the real world), by its very use, provides evolutionary pressures
that drive change. This argument was supported by the observation
that stakeholder requirements continually change, and in order to stay
useful, a software system must be adapted to ensure ongoing
satisfaction of the stakeholders. Unlike biological evolution, which
applies to a population of organisms, the term software evolution is
used within the context of an individual software system. Similar to
biological evolution, the process of evolution in software is directed
and feedback-driven to ensure the software system is continuously
adapted to satisfy the user’s requirements. However, a key distinction
is that in software evolution, there is no random variation occurring
within the software system (see Figure 2.1) and the term “evolution” in
the context of software implies directed adaptation.
[Figure 2.1. Diagram labels: “Directed Adaptation (based on
feedback/external pressures)”, “Software System”.]
Within the context of this thesis, software evolution implies the measur-
able changes between releases made to the software as it is maintained
and enhanced over its life time. Software, the unit of change, includes
the executable as well as the source code. Our definition is a minor
adaptation to the one proposed by Lehman [174], and reinforces the dis-
tinction between maintenance and evolution. It also explicitly focuses
on the outcome of the maintenance activity and changes that can be
measured from the software system using static analysis. That is, we
focus only on the set of changes that can be detected without executing
the software system, and without direct analysis of artifacts external to
the software system, for example, product documentation. Our study
focuses on the outcome from changes that are possible due to the fol-
Release based studies are able to provide an insight into evolution from
a post-release maintenance perspective. That is, we can observe the
evolution of the releases of a software system that the stakeholders are
likely to deploy and use. The assumption made by these studies is that
Change based studies, on the other hand, view evolution as the ag-
gregate outcome of a number of individual changes over the entire life
cycle [25]. That is, they primarily analyze information generated during
the development of a release. Due to the nature of information that they
focus on, change based studies tend to provide an insight into the pro-
cess of evolution that is comparatively more developer centric. Although
change based studies can also be used to determine changes from the
end-user perspective, additional information about releases that have
been deployed for customers to use has to be taken into consideration
during analysis.
[59, 100, 101, 103, 127, 193, 204, 217, 239, 254, 277, 304, 310] focused
on a few popular and large software systems (for example, the Linux
operating system or the Eclipse IDE).
Given the small number of systems that are typically investigated in re-
lease based evolution studies, there is a need for a comparatively larger
longitudinal release based software evolution study to confirm that findings
of previous studies still hold, to increase the generalizability of the find-
ings, and to improve the strength of the conclusions. Even though pre-
vious release based studies [59, 100, 101, 103, 127, 193, 204, 217, 239,
254, 277, 304, 310] have investigated a range of different software sys-
tems, a general limitation is that there has been no single study that
has attempted to analyze a significant set of software systems. Our
work fills this gap and involves a release based study of forty software
systems comprising 1057 releases. The focus on a comparatively larger
set of software systems adds to the existing body of knowledge since
our results have greater statistical strength than those of studies that
investigated only a few software systems. Our data set selection criteria and
the method used to extract information are discussed in Chapter 3 and
Chapter 4, respectively.
tended into eight laws (See Table 2.1) [175]. These laws are based on
a number of observations of size and complexity growth in a large and
long lived software system. Lehman and his colleagues in their ini-
tial work discovered [168] and refined [171, 175] the laws of evolution
(which provide a broad description of what to expect), in part, from di-
rect observations of system size growth (measured as number of mod-
ules) as well as by analysing the magnitude of changes to the modules.
The initial set of five laws was based on the study of evolution of one
large mainframe software system. These five laws were later refined,
extended and supported by a series of case studies by Lehman and his
colleagues [171, 283, 284].
The Laws of Software Evolution (Table 2.1) state that, regardless of
domain, size, or complexity, real-world software systems evolve as they are
continually adapted, grow in size, become more complex, and require
additional resources to preserve and simplify their structure. In other
words, the laws suggest that as software systems evolve they become
increasingly harder to modify unless explicit steps are taken to improve
maintainability [175].
The laws capture the key drivers and characteristics of software evo-
lution, are tightly interrelated, and capture both the change as well as
the context within which this change takes place [168, 170, 175]. The
first law (Continuing Change) summarises the observation that soft-
ware will undergo regular and ongoing changes during its life-time in
order to stay useful to the users. These changes are driven by external
pressures, causing growth in the software system (captured as Con-
tinuing Growth by the sixth law) and in general, this increase in size
also causes a corresponding increase in the complexity of the software
structure (captured by the second law as Increasing Complexity). Inter-
estingly, the process of evolution is triggered when the user perceives a
decrease in quality (captured as Declining Quality in the seventh law).
Additionally, the laws also state that the changes take place within an
environment that forces stability and a rate of change that permits the
and Chapter 6) rather than observe the overall growth trend in cer-
tain measures which is the technique employed by many earlier stud-
ies [39, 41, 59, 85, 100, 101, 103, 119, 120, 127, 153, 171, 175, 192, 200,
217, 239, 277, 306, 310] (discussed further in the next section – Sec-
tion 2.4). As a consequence our study can offer a different insight into
the laws, as well as the dynamics of change within open source software
systems.
[Figure: growth rate illustration; panels plot size against time.]
\[ S_i = \frac{E}{S_{i-1}^{2}} + S_{i-1} \tag{2.4.1} \]
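To illustrate the recurrence in Equation 2.4.1, the following minimal sketch iterates it in Python, reading the equation as S_i = S_{i-1} + E / S_{i-1}^2. The function name, the starting size and the value chosen for E are hypothetical and are used only to show how the per-release increment shrinks as the system grows.

```python
def inverse_square_growth(initial_size, effort, releases):
    """Iterate S_i = S_{i-1} + effort / S_{i-1}**2 for a number of releases.

    initial_size: size of the first release (must be positive).
    effort:       the constant E in Equation 2.4.1 (assumed fixed here).
    releases:     how many further releases to project.
    Returns the list of projected sizes, starting with initial_size.
    """
    if initial_size <= 0:
        raise ValueError("initial_size must be positive")
    sizes = [float(initial_size)]
    for _ in range(releases):
        previous = sizes[-1]
        # The increment is inversely proportional to the square of the size.
        sizes.append(previous + effort / previous ** 2)
    return sizes


def increments(sizes):
    """Size added by each release, to make the slowing growth visible."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


# Hypothetical values chosen for illustration only.
projected = inverse_square_growth(initial_size=100, effort=1_000_000, releases=5)
added_per_release = increments(projected)
```

With a positive E the sizes keep increasing while each increment is smaller than the one before, which is the sub-linear behaviour discussed in the surrounding text.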
Beyond the work by Turski [283, 284], the sub-linear growth rate ob-
servation is also supported by a number of different case studies [41,
59, 85, 171, 175, 188, 192, 200, 217] that built models based on regres-
sion techniques. The increasing availability and acceptance of Open
Source Software Systems has allowed researchers to undertake com-
paratively larger studies in order to understand growth as well as other
aspects of evolution [39, 100, 103, 127, 217, 306]. Interestingly, it is
these studies that initially provided a range of conflicting results:
some studies [17, 129, 153] found that growth typically tends to be
sub-linear supporting the appropriateness of Lehman’s laws, but oth-
ers [101, 119, 120, 127, 153, 239] have observed linear as well as super-
linear growth rates suggesting that the growth expectations implied by
Lehman’s laws of evolution are not universal.
Godfrey and his colleagues [101] were among the first to question the
validity of Lehman’s laws in the context of Open Source Software Sys-
tems. In their study they observed growth to be super-linear in certain
sub-systems of Linux (specifically the driver sub-system in their study),
suggesting that the increasing complexity and sub-linear growth rate
expectation of Lehman’s laws do not universally hold. This observation
of super-linearity was later confirmed by Succi et al. [264], González-
Barahona et al. [105, 106] and more recently by Israeli et al. [127].
Web server and gcc compiler all showed only linear growth. Robles et
al. [239] analyzed 18 different open source software systems and found
sub-linear and linear growth rates to be the dominant trend, with
only two systems (Linux and KDE) fitting a super-linear growth trend.
Mens et al. [192] in a study of the evolution of the Eclipse IDE observed
super-linear growth in the number of plug-ins, while the core platform
exhibited a linear growth rate. Koch [152], in an extensive change based
study of over 4000 different software systems mined from Sourceforge (a
popular open source software repository), found linear and sub-linear
growth rates to be common, while only a few systems exhibited
super-linear growth rates. Although Koch et al. undertook a change based
study by analysing the source code control logs, they reconstructed size
measures in order to analyze the growth rates. More recently, Thomas
et al. [277] investigated the rate of growth in Linux kernel and found a
linear growth rate.
Researchers [101, 127, 153, 192, 200] that have observed the super-
linear growth rate argue that the underlying structure and organisa-
tion of a system has an impact on the evolutionary growth potential
and that modular architectures can support super-linear growth rates.
They suggest that these modular architectures can support an increas-
ing number of developers, allowing them to make contributions in par-
allel without a corresponding amplification of the communication over-
head [153].
nential growth in the number of plug-ins for the Eclipse platform [192]
is similar to that of the driver sub-system in Linux and shows that cer-
tain architectural styles can allow the overall software systems to grow
at super-linear rates, suggesting limitations to Lehman’s laws.
Studies that found super-linear growth rates [101, 119, 120, 127, 153,
239] show that it is possible to manage the increase in volumetric com-
plexity and the consequent structural complexity. The implication of
these studies is that certain architectural choices made early in the
life cycle can have an impact on the growth rate, and a certain level of
structural complexity can be sustained without a corresponding invest-
ment of development effort (in contrast to the expectations of the laws
of software evolution).
An early study that has provided some data about the distribution of
growth is the one undertaken by Gall et al. [85] that suggests that differ-
ent modules grow at different rates. This observation is also confirmed
by Barry et al. [17]. Although these studies highlight that growth rates
can differ across modules, they do not discuss in depth how the growth
is distributed and what the impact of this distribution is on the over-
all evolution of the software that they study. More recently, Israeli et
al. [127] investigated the Linux kernel and identified that average com-
plexity is decreasing. The interesting aspect of the study by Israeli et al.
was that they note that the reduction of the average complexity was a re-
sult of developers adding more functions with lower relative complexity.
However, all of these studies have focused on individual systems and on
a small set of metrics, and hence there is a gap in our understanding
[Figure 2.3: Size (Number of Classes) plotted against Age (Days), with
three quadratic fits. All versions (Super-Linear):
y = 0.0001x² + 0.1072x + 213.29, R² = 0.97821. Versions 1.1.x to 1.6.x
(Sub-Linear): y = −0.0009x² + 3.4153x − 2516.7, R² = 0.9968.
Version 1.0.x (Sub-Linear): y = −0.0002x² + 0.3909x + 174.69,
R² = 0.9659.]
Segmented Growth
The common theme in studies of growth [101, 127, 152, 153, 192, 217,
239] is that they focus on the growth over the entire evolution history
and as a consequence attach only a single growth rate to software sys-
tems. That is, they tend to classify the size growth of a system to be
one of the following: sub-linear, linear, or super-linear. However, when
a more fine-grained analysis was performed, software systems under-
going evolution have been shown to exhibit a segmented and uneven
growth pattern. That is, the software system can grow at different rates
at different time periods [6,120,123,175,256,305], and also that some
modules can grow much faster than others [17, 85]. This segmented
growth pattern is illustrated in Figure 2.3. The data in the figure is
from one of the software systems that we analyse in our study and
highlights the need for analyzing growth from different perspectives.
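The idea of attaching a growth label to each segment rather than to the whole history can be sketched as follows. This is a minimal illustration, not the procedure used in this thesis: it fits a least-squares quadratic to (age, size) observations and uses the sign of the squared term to label a segment. The function names, the tolerance and the data are all hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def quadratic_curvature(ages, sizes):
    """Squared-term coefficient of a least-squares fit y = a*u^2 + b*u + c.

    Ages are rescaled to u in [0, 1] before fitting, so the coefficient is
    expressed per (segment length)^2. Needs at least three distinct ages.
    """
    start, span = min(ages), max(ages) - min(ages)
    us = [(age - start) / span for age in ages]
    n = len(us)
    s1 = sum(us)
    s2 = sum(u ** 2 for u in us)
    s3 = sum(u ** 3 for u in us)
    s4 = sum(u ** 4 for u in us)
    t0 = sum(sizes)
    t1 = sum(u * y for u, y in zip(us, sizes))
    t2 = sum(u * u * y for u, y in zip(us, sizes))
    # Normal equations solved for the squared term with Cramer's rule.
    system = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    numerator = [[t2, s3, s2], [t1, s2, s1], [t0, s1, n]]
    return _det3(numerator) / _det3(system)


def classify_growth(ages, sizes, tolerance=0.01):
    """Label a segment as sub-linear, linear or super-linear.

    The tolerance (a hypothetical choice) is relative to the size range.
    """
    curvature = quadratic_curvature(ages, sizes)
    scale = (max(sizes) - min(sizes)) or 1.0
    if abs(curvature) <= tolerance * scale:
        return "linear"
    return "super-linear" if curvature > 0 else "sub-linear"


# Hypothetical (age in days, number of classes) observations.
early = ([0, 100, 200, 300, 400], [170, 205, 235, 260, 280])
later = ([400, 500, 600, 700, 800], [280, 300, 325, 355, 390])
whole = (early[0] + later[0][1:], early[1] + later[1][1:])

labels = {
    "early releases": classify_growth(*early),
    "later releases": classify_growth(*later),
    "whole history": classify_growth(*whole),
}
```

In this constructed data the early segment is labelled sub-linear and the later one super-linear, while a single fit over the whole history reports linear growth, which is the kind of detail a single growth rate can hide.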
Summary
a drag on the rate of growth [101]. Another aspect is that the studies
identifying linear and super-linear growth rates show that the
parameters considered in simple growth models (like Turski’s [283, 284], or
models developed using regression techniques) are not sufficient when
attempting to model and understand growth. That is, though there is
some relationship between complexity and growth, there may be other
aspects that influence the growth of a software system.
In our study we aim to close these gaps in our understanding and
observe evolution from a different perspective. Rather than use global
size growth models to infer support for laws and understand the na-
ture of evolution, we study software evolution by observing how the
distribution of size and complexity changes over time. More specifically,
we undertake a longitudinal study that constructs probability density
functions of different metrics collected from the software systems, and
builds a descriptive model of evolution by observing how these metric
distributions change over time.
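As a rough illustration of observing a metric distribution over time, the sketch below builds the relative frequency distribution of one metric for two releases and measures how far apart the two distributions are. The use of total variation distance as the comparison is an assumption made for this example and is not the analysis developed in this thesis; the metric values and names are hypothetical.

```python
from collections import Counter


def relative_frequency(values):
    """Map each observed metric value to the fraction of classes having it."""
    if not values:
        raise ValueError("at least one metric value is required")
    counts = Counter(values)
    total = len(values)
    return {value: counts[value] / total for value in sorted(counts)}


def distribution_shift(dist_a, dist_b):
    """Total variation distance between two relative frequency distributions.

    0.0 means the distributions are identical, 1.0 means they do not overlap.
    """
    keys = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)


# Hypothetical "number of methods" values, one per class, for two releases.
release_1 = [1, 1, 2, 4]
release_2 = [1, 2, 2, 4]

shift = distribution_shift(
    relative_frequency(release_1),
    relative_frequency(release_2),
)
```

Computing the shift between every pair of consecutive releases gives a simple series that stays near zero while the distribution retains its shape and rises when it does not.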
Detecting Change
The second approach to detecting change analyses the actual class (or a
program file) at two different points in time in order to determine if it has
changed. This approach, referred to as origin analysis in the literature
[102,282] is comparatively more complex, but provides a more accurate
reflection of the changes since there is no reliance on reconstructing
change information by analysing an external transaction log. Detecting
change between two releases of a class has been an area of study for
many years [242,282] and different methods to assist in the detection of
the change have been developed [5,22,72,80,135,143,155,243,294]. In
this thesis, we detect changes by analysing classes in two consecutive
releases as it enables us to focus on changes to the functional aspects
of a program. A more detailed discussion of these techniques, their
strengths and limitations, as well as our own method for detecting the
change is presented in Chapter 6.
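A simplified view of comparing classes across two consecutive releases is sketched below. It assumes that each release has already been reduced to a mapping from class name to a tuple of metric values, and that classes are matched by name; both are assumptions for illustration and not the detection method described in Chapter 6. Matching by name alone does not recognise renamed or moved classes, and a metric tuple can miss edits that leave every metric unchanged.

```python
def detect_changes(previous, current):
    """Classify classes as added, removed, modified or unchanged.

    previous, current: dict mapping a class name to a tuple of metric
    values extracted from that release (a hypothetical fingerprint).
    """
    previous_names = set(previous)
    current_names = set(current)
    common = previous_names & current_names
    return {
        "added": sorted(current_names - previous_names),
        "removed": sorted(previous_names - current_names),
        "modified": sorted(n for n in common if previous[n] != current[n]),
        "unchanged": sorted(n for n in common if previous[n] == current[n]),
    }


# Hypothetical fingerprints: (number of methods, number of branches).
release_a = {
    "org.example.Parser": (12, 40),
    "org.example.Lexer": (8, 25),
    "org.example.Legacy": (3, 2),
}
release_b = {
    "org.example.Parser": (14, 47),
    "org.example.Lexer": (8, 25),
    "org.example.Cache": (5, 6),
}

changes = detect_changes(release_a, release_b)
```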
Dimensions of Change
added, removed and modified [85, 217]. Periodicity measures the
regularity at which a system or an abstraction is modified. For instance,
this measure is required to determine how often a file is modified over
the evolution history. The expectation is that if a large part of the code
base is modified frequently, then the software is highly volatile and may
require corrective action. Finally, the measure of dispersion aims to
identify if there is a consistent pattern to the change. This measure
is motivated by the assumption that consistency allows managers to
anticipate how much of the code base may change in the next version
and hence can allocate resources appropriately. The dispersion mea-
sure can be applied to determine the consistency of the size of change,
as well as the consistency in the frequency of change. In this thesis,
we study change against these three dimensions. A discussion of our
approach to compute change against these three dimensions is presented
in Chapter 6.
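The periodicity and dispersion measures described above can be illustrated with a small sketch. The definitions used here are assumptions chosen for simplicity: periodicity is taken as the fraction of releases in which an abstraction was modified, and dispersion as the coefficient of variation of the change sizes. The measures actually used in this thesis are those defined in Chapter 6.

```python
from statistics import mean, pstdev


def periodicity(modified_flags):
    """Fraction of releases in which the abstraction was modified.

    modified_flags: one boolean per release transition.
    """
    if not modified_flags:
        raise ValueError("at least one release transition is required")
    return sum(modified_flags) / len(modified_flags)


def dispersion(change_sizes):
    """Coefficient of variation of change sizes (0.0 means fully consistent).

    change_sizes: one non-negative number per release transition, for
    example the count of classes modified in that transition.
    """
    if not change_sizes:
        raise ValueError("at least one change size is required")
    average = mean(change_sizes)
    if average == 0:
        return 0.0
    return pstdev(change_sizes) / average


# Hypothetical history of one file across four release transitions.
file_modified = [True, False, True, True]
classes_changed_per_release = [5, 15]

how_often = periodicity(file_modified)
how_consistent = dispersion(classes_changed_per_release)
```

A low dispersion value suggests that the amount of change from one release to the next is predictable, which is the property the text associates with easier resource planning.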
a larger scale study can help improve the strength of this expectation
and can provide a more robust model that captures the relationship
between complexity and change.
Another arc in the study of change has been the area of understand-
ing co-change, where the expectation is that certain groups of classes
or modules change together [85] because developers tend to group and
construct related features together. This expectation is supported by
research undertaken by Hassan and Holt who analyzed many Open
Source projects and concluded that historical co-change is a better
predictor of change propagation [113]. This observation is also
supported by Zimmermann et al. [319, 320], whose work led to the development
of tools [315] that can guide developers to consider a group of classes
when they modify one class. These studies into co-change have been
instrumental in developing tools that can help developers consider the
impact of a change by exposing a wider ripple impact than is visible to
Summary
Studies of change have a general theme. They have primarily used logs
generated by the tools used in software development [138], and the typi-
cal focus has been mostly on establishing attributes within classes that
make them change-prone [20, 28, 31, 63, 130, 177, 213, 214, 261, 266,
314, 318]. Also, these studies have arrived at their conclusions
using a relatively small number of software systems, and the emphasis has
been on understanding fine-grained changes during the construction
of a release rather than post-release.
• How does the profile and shape of this distribution change as soft-
ware systems evolve?
• What is the likelihood that a class will change from a given version
to the next?
The next chapter (Chapter 3) presents the input data set, our selection
criteria and the aspects that we investigate. Chapter 4 (Measuring Java
Software) follows with a discussion of the metric extraction process and
defines the metrics we collect from the Java software systems. The key
findings of this thesis are presented in Chapter 5 (Growth Dynamics)
and Chapter 6 (Change Dynamics). We discuss the implications arising
from our findings in Chapter 7 and conclude in Chapter 8.
Chapter 3
Data Selection Methodology
The software artifacts from the release history (specifically binaries and
source files) offer a direct evolutionary view into the size, structure and
History           Description
Release History   Source code, binaries, release notes, and release
                  documentation
Revision History  Version control logs, issue/defect records, modification
                  history of documentation, Wiki logs
Project History   Messages (email, instant message logs), project
                  documentation (plans, methodology, process)

Table 3.1: The different types of histories that typically provide input
data for studies into software evolution.
Research work that focuses on analysing the release history studies the
actual outcome of changes, in most cases the source code over time.
The Laws of Software Evolution as defined by Lehman [167, 168, 174]
were built predominantly from the analysis of release histories.
Researchers that have focused on this area have been able to identify
typical patterns of evolution and change [27], construct statistical models
of growth and change [21,127,132,152,193,264,269,270,283,310], de-
velop methods to identify change prone components [109,149,253,281],
and have proposed methods to visualise evolution [1, 65, 79, 95, 163,
164,298]. An interesting contribution from studies of release history is
that a modular architectural style can allow for rates of growth beyond
the sub-linear growth rate expected by the Laws of Software Evolu-
tion [100, 200]. This insight provides some level of empirical support
for the recommendation from software engineering to build software as
loosely coupled modules [218, 302].
Although the version control logs, change logs, and defect logs are in-
herently unreliable due to the human dimension, they still offer a valu-
able source of information when interpreted qualitatively as well as for
providing a high-level indication of change patterns. For example, re-
searchers that have analyzed version control logs [231, 232, 312, 316,
319,321] developed techniques to identify co-evolution, that is, artifacts
that tend to change together.
In this thesis, we use the release history as our primary source of data.
Our approach involves collecting metric data by processing compiled
binaries (Java class files, JAR and WAR archives). We consider every
release of the software system in order to build an evolution history.
The analysis then uses the extracted metric data as the input. A com-
prehensive discussion of our technique, and the actual measures are
presented in the next chapter (Chapter 4). Though our focus is on the
release history, we also make use of the revision and project history in
order to gain a deeper insight and better understanding of any abnor-
mal change events. For instance, if the size of the code base has dou-
bled between two consecutive releases within a short time frame, ad-
ditional project documentation and messages on the discussion board
often provide an insight into the rationale and motivations within the
team that cannot be directly ascertained from an analysis of the bina-
ries or the source code alone. This approach of considering multiple
sources of information in studies of evolution is also suggested to be
effective by Robles et al. [240] as it will provide a more comprehensive
picture of the underlying dynamics than can be obtained by purely re-
lying on a single source of information.
Open Source Software (OSS) is, in general, software which is free, and
distributed along with the source code at no cost with licensing models
that conform to the Open Source Definition (OSD), as articulated by the
Open Source Initiative 1 (see Table 3.2). Typically the term “free” carries
multiple meanings and in this context, it implies that the software is:
(i) free of cost, (ii) free to alter, (iii) free to distribute, and (iv) free to use
the software as one wishes. In contrast, commercial software is in most
cases sold at a cost, with restrictions on how and where it can be used,
and often without access to the source code. Though, some commercial
1 Open Source Initiative http://www.opensource.org.
Criteria                       Description
Free Redistribution            Redistribution of the program, in source code or
                               other form, must be allowed without a fee.
Source Code                    The source code for the program must be available
                               at no charge, or for a small fee to cover the cost
                               of distribution and media. Intermediate forms such
                               as the output of a preprocessor or translator are
                               not allowed. Deliberately obfuscated source code
                               is not allowed.
Distribution of Modifications  Distribution of modified software must be allowed
                               without discrimination, and on the same terms as
                               the original program.
License                        The license must allow modifications and derived
                               works, and be technology neutral. It must not
                               restrict other software, and must not depend on
                               the program being part of a particular software
                               distribution.
Integrity                      The license may require derived and modified works
                               to carry a different name or version number from
                               the original software program.
No Discrimination              The license must not restrict the program to a
                               specific field of endeavour, and must not
                               discriminate against any person or group of
                               persons.
Table 3.2: The criteria that define an Open Source Software system.
systems are distributed free of cost, the licensing models often restrict
alteration and how they can be used and distributed.
Projects that develop and distribute Open Source Software have over
the past two decades championed a (radical) paradigm shift in legal
aspects, social norms, knowledge dissemination and collaborative de-
velopment [200]. One of the most compelling aspects of Open Source
Software projects is that they are predominantly based on voluntary
contributions from software developers without organisational support
in a traditional sense [202]. The typical open source model pushes for
operation and decision making that allows concurrent input of diver-
gent agendas, competing priorities, and differs from the more closed,
centralised models of development [83, 215, 234]. These open source
projects have over time evolved tools and techniques by experimenting
with a range of ideas on how best to organise and motivate software de-
velopment efforts, even when developers are geographically dispersed
and not provided any monetary compensation for their efforts. In these
projects, methods and tools that have not added sufficient value were
rejected, while approaches that have consistently provided additional
value were embraced [215]. In a sense, this model of software
development has provided an ongoing validation of collaboration techniques
that tend to work, are light-weight and provide the maximum return on
invested effort [160, 207, 215, 233].
Open Source Software projects due to their very nature often select li-
censes that do not place any restriction on the use of the software as
well as the information and knowledge that is generated during devel-
opment [176, 262]. The use of these open licenses has opened up a
rich data set of information that can be analyzed to understand how
developers tend to build such software, how they collaborate, share in-
formation and distribute the outcome of their efforts. Further, the lack
of restrictions on analysis and reporting of the findings has motivated
an interest in open source software for evolution research, including
this work (see No Discrimination in Table 3.2). An advantage of
focusing on Open Source Software projects is that the findings from
research into these projects provide additional insight into the
effectiveness and value of the development methods, and help identify
typical and unusual evolution patterns. Given their increasing adoption
in commercial projects [200, 202, 207, 262], an understanding of how these
open source software systems evolve is also of value to stakeholders
outside of the Open Source community.
ing, a web site to host and distribute various releases, a Wiki to create
and share documents, as well as a defect/issue tracking tool.
1. Sourceforge - http://www.sourceforge.com
The selection criteria that each project must satisfy are as follows:
1. The system must be developed for the Java virtual machine. Source
code and compiled binaries are available for each release.
4. The system has been in active development and use for at least
36 months.
5. The system comprises at least 100 types (i.e., classes and
interfaces) in all releases under study.
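The criteria that survive in the list above can be expressed as a simple filter. The following is a minimal sketch only: the `Release` record, its field names, and the approximation of 36 months as 36 × 30 days measured between the first and last release are all assumptions made for illustration, and only criteria 4 and 5 are checked.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    """Hypothetical record describing one release of a candidate system."""
    released_on: date
    type_count: int  # number of classes plus interfaces in this release


# Assumption: 36 months is approximated with 30-day months.
MIN_HISTORY_DAYS = 36 * 30
MIN_TYPES = 100


def meets_visible_criteria(releases):
    """Check criteria 4 and 5 only: length of history and minimum size."""
    if not releases:
        return False
    dates = [r.released_on for r in releases]
    history_days = (max(dates) - min(dates)).days
    old_enough = history_days >= MIN_HISTORY_DAYS
    # Criterion 5 must hold in every release under study, not just the last.
    big_enough = all(r.type_count >= MIN_TYPES for r in releases)
    return old_enough and big_enough
```

A system whose earliest release has fewer than 100 types would be rejected by this filter even if later releases are much larger, which mirrors the "in all releases" wording of criterion 5.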
Java is currently a popular language with widespread use in both open
source and commercial projects. This popularity and usage have
resulted in a large amount of software developed using the Java
programming language. Despite its popularity and use in a variety of domains,
there are only a few studies that exclusively study release histories of
Java software systems [21, 193, 208, 270, 319]. Further, these studies
The size and skill of development teams, though helpful, was a criterion
that was removed after an initial pass at selecting systems, mainly
because it was not possible to obtain this information accurately. In some
of our projects, the software used to host the source control repositories
changed during the evolutionary history of the project, and many projects
choose to archive older contribution logs at regular intervals, removing
access to this data. These aspects limited our ability to determine the
number of active and contributing developers to a project, specifically
during the early part of its evolution. Another facet that could not be
accurately determined was the level of contribution from different
developers. That is, we were not able to identify reliably whether some
developers contribute more code than others. Further, some project members
contributed artwork and documentation, or organised and conducted
meetings, while some focused on testing. Those who made these non-code
contributions were often not visible as active contributors on the source
code repository. Another interesting finding during our investigation was
that developers who have not contributed any material code for many years
are still shown as members of the project. These limitations, including an
observation that suggests that a small subset of developers is
responsible for a large amount of the changes and additions to the source
code in open source software, have been noted by Capiluppi et al. [39].
tems, 1057 unique versions and approximately 55000 classes (in total
over all the various systems and releases). Our data comprises three
broad types of software systems: (a) Applications, (b) Frameworks, and
(c) Libraries. In our selection, we aimed to select a similar number of
systems for each of the types.
Table 3.3: Systems investigated - Rel. shows the total number of dis-
tinct releases analyzed. Age is shown in Weeks since the first release.
Size is a measure of the number of classes in the last version under
analysis.
[Figure: diagram; recoverable labels: "Data (Configuration, Images, Files
etc.)" and "Java Virtual Machine"]
We analyse the binary bundle in our study and it typically contains the
following items:
• A set of Java archives (JAR or WAR files) that form the core software
system
• Configuration files
• Release documentation
In our study, we collect metrics from the core software system and ig-
nore third-party libraries (see Figure 3.1).
Using Binaries
We extract the measures for each class by processing the compiled Java
bytecode instructions generated by the compiler (details are explained
in Chapter 4). This method allows us to avoid running a (sometimes
quite complex) build process for each release under investigation since
we only analyze code that has actually been compiled.
This practice of leaving old code has been noted by researchers in the
field of code clone detection who observed the tendency of developers
to copy a block of code, modify it, and leave the old code still in the
repository [5, 135, 155, 157]. Godfrey et al. [100] in their study of Linux
kernel evolution noted that depending on the configuration setting in
the build script (Makefile), it is possible that only 15% of the Linux
source files are part of the final build. The use of only a small set of
source for a release is common in software that can be built for multiple
environments. For instance, Linux is an operating system designed to
run on a large range of hardware platforms. When building the operat-
ing system for a specific hardware configuration, many modules are not
needed and hence not included during the build process using settings
provided in the Makefile. Hence, when using source code files as input
into a study of evolution, ideally, the build scripts have to be parsed to
determine the set of files for a specific configuration, and then the
evolution of the system for this specific configuration has to be analysed.
Many previous studies [41, 100, 105, 120, 153, 217, 239, 256] that use
release histories do not explicitly indicate if the build scripts have been
pre-processed adequately to ensure that the correct set of source files
is used as the input.
In our study, we use compiled releases (Java classes packaged inside JAR
files) to construct our release history and so our input data has already
gone through the build process, reducing the chance of encountering
code that is no longer in active use. This approach allows us to fo-
cus on the set of classes that have been deemed fit for release by the
development team.
volume to the size measure and have the potential to distort the evolu-
tionary patterns and may indicate a faster rate of growth than would
be possible if only the contributions of the core team are considered.
Including the external libraries also has the potential to distort mea-
sures of complexity and may indicate that a project is far more complex
than it really is. For example, if a project makes use of two complex
libraries for visualization and signal processing, the structural and
algorithmic complexity of these libraries will be considered to be part of
the actual project under investigation and the core project will show far
more complexity than what needs to be considered by the developers.
We also noticed that many projects rely on the same set of libraries and
frameworks. For example, the Apache Java libraries are extensively
used for String, Math, Image, and XML processing. Though, there are
The approach we take for detecting third party libraries in order to re-
move them from our measures is explained in the next chapter (Chap-
ter 4).
3.7 Summary
Research into software evolution relies on historical information. There
are three types of histories that can be used to understand the evolution
of a system: (a) Release history, (b) Revision history, or (c) Project
history. Our research effort studies release histories of forty Java software
systems. We investigate Open Source Software Systems due to their
non-restrictive licensing. Further, unlike previous studies that worked
with source code files we use compiled binaries and also actively ignore
contributions from third-party libraries.
In the next chapter, we present our approach for collecting metrics from
Java software, the description of the metrics collected, and how we
model the information to enable our analysis.
Chapter 4
Chapter 4. Measuring Evolving Software
The classes that we measure are compiled Java classes and our metric
extraction approach is discussed in Section 4.5.2. The class depen-
dency graph captures the dependencies between the various classes in
the software system. The graph is constructed by analysing all classes
in the system and our method is discussed in further detail in Sec-
tion 4.5.4 and Section 4.5.5. We consider the metrics that are described
in this chapter as direct metrics since we compute the value by a direct
count of either a class or a graph, rather than by combining different
types of measures. That is, the domain used by the metric function
contains only one variable.
What is complexity?
1. The size of the system: more parts require a need to organise them
in order to properly comprehend,
5. The level of design and order, where a better designed system lends
itself to be understood easily. Specifically, a system that has de-
tectable and well known patterns will tend to improve maintain-
ability.
Measuring Complexity
There are two broad types of complexity that can be measured: Vol-
umetric and Structural [77]. Volumetric complexity is measured by
counting the number and variety of abstractions, whereas the
inter-connections between these abstractions are used to derive structural
complexity measures [147] which provide an insight into the structure and
organisation of a software system [45, 46, 122, 250, 265, 277].
In our study, we collect complexity metrics for each class from both per-
spectives. Specifically, we measure the internal structural complexity of
a class as well as the coupling for a class (the specific metrics and their
definitions are described in the sections that follow). Furthermore, we
use the term “complexity” to imply structural complexity rather than
volumetric complexity.
[Figure: release history model; recoverable elements: Release History
(Name: String); Version (RSN: int, Release Date: Date); Class Metric
(Name: String, Package: String, Metric Data: Set, is Interface: boolean);
an "Implemented Interfaces" association; multiplicities 1, 1..* and 0..*]
The key limitation to the use of RSN arises when attempting to compare
aspects like growth rates in different software systems [17, 217, 277]
since the time interval between releases in different software systems
cannot be assumed to be the same constant value. Furthermore, since
the time interval between releases does not correspond to a more in-
tuitive measure of real elapsed time, models that use RSN have to be
carefully interpreted.
[Figure: four scatter plot panels; x-axis: Release Sequence Number]
four software systems from our data set. If developers released
software at regular intervals, the scatter plots (cf. Figure 4.2) would show
substantially less variability. Further, given the variability in the data,
we are also unable to derive a generalizable and sufficiently strong
linear relationship between RSN and “Days between Consecutive Releases”,
which is necessary for the RSN measure to be considered an interval scale
measure [260]. Although the intervals are erratic, in approximately 70%
of the releases (across our entire data set) the gap between consecutive
releases is less than 90 days (see Figure 4.3). This observation suggests
that there exists some pressure on the development team that compels them
to release software at reasonable intervals, potentially to ensure ongoing
community support.
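The interval analysis described above can be sketched in a few lines. This is an illustrative sketch rather than the tooling used in the study: the function names are hypothetical, and the release dates in the usage note are invented for the example.

```python
from datetime import date


def days_between_releases(release_dates):
    """Return the gaps, in days, between consecutive releases.

    The dates are sorted first, so the input order does not matter.
    """
    ordered = sorted(release_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]


def share_within(gaps, limit_days=90):
    """Fraction of gaps that are strictly shorter than limit_days."""
    if not gaps:
        return 0.0
    return sum(1 for gap in gaps if gap < limit_days) / len(gaps)
```

For the made-up dates 1 January, 1 February, 1 June and 15 July 2006, `days_between_releases` returns gaps of 31, 120 and 44 days, so two of the three gaps fall under the 90-day threshold.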
[Figure: cumulative distribution plot; y-axis: Cumulative Percentage (of
Releases)]
The measure of calendar time is a more flexible measure than RSN be-
cause it directly maps to the more intuitive “elapsed time” with constant
interval between units. Additionally, this measure of time is also
recommended as more appropriate and effective in studies of evolution by
many researchers [17,127,188,217,277]. Although calendar time is the
Figure 4.4: Age is calculated in terms of the days elapsed since first
release.
[Figure: process diagram; recoverable stage labels: Dependency Graph
Construction, Graph Metric Extraction, Inheritance Metric Extraction]
Figure 4.5: The metric extraction process for each release of a software
system
JAR files were tagged as potential external libraries based on the pack-
age names of classes inside the JAR file. We found that using the
package names was an effective method to detect potential external li-
braries because Java developers tend to follow the recommended stan-
dard package naming convention and embed the name of the project, or-
ganisation or team in the package name [235]. For example, all classes
developed by the Hibernate project have a package name that starts
with org.hibernate. We used this common naming convention to cluster
package names manually (after a simple sort) and then identified
potential third-party JAR files. Once a potentially distinct set of
packages was identified, we used a Google search to check whether a
distinct project with its own source code repository, matching the
identified package signature, was available on the web.
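The clustering in the study was performed manually, but the idea of grouping classes by package prefix can be illustrated with a short sketch. The function names, the default prefix depth of two segments, and the `core_prefixes` argument (the prefixes an analyst has attributed to the core project) are assumptions introduced here for illustration only.

```python
from collections import Counter


def package_prefix(class_name, depth=2):
    """Return the first `depth` package segments of a fully qualified name."""
    segments = class_name.split(".")[:-1]  # drop the simple class name
    return ".".join(segments[:depth])


def prefix_histogram(class_names, depth=2):
    """Count classes per package prefix, as an aid to manual review."""
    return Counter(package_prefix(name, depth) for name in class_names)


def is_potential_external(class_names, core_prefixes, depth=2):
    """Tag a JAR as a potential external library.

    A JAR is tagged when none of its classes sit under a prefix that
    the analyst has attributed to the core project.
    """
    prefixes = set(prefix_histogram(class_names, depth))
    return prefixes.isdisjoint(core_prefixes)
```

With `core_prefixes={"org.hibernate"}`, a JAR containing only classes such as `org.apache.commons.lang.StringUtils` would be tagged for review, whereas a JAR containing `org.hibernate.Session` would not. The tag is only a starting point; the check against an independent project repository remains a manual step.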
In the final stage of this step, we determined the release date for the
version from the JAR files that have been determined to be part of the
core software system. All Java archives contain a Manifest file that is
created as part of the JAR construction process. We use the creation
date timestamp of this Manifest file to determine the release date for
the Version. Where a version contains multiple JAR files, we apply
the maximum function and take the latest date to represent the release
date for the entire version. This was needed since certain projects tend
to construct the JAR files for their distribution over multiple days rather
than build them all on a single date.
Once the release date for a version was established, we ordered all
versions by release date and computed the Release Sequence Number (RSN)
for each version. We started the RSN numbers at 1 for the oldest version
and incremented it by 1 for each subsequent version.
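The release date and RSN steps can be sketched as follows. This is a hedged illustration, not the tooling used in the study: it assumes that the timestamp stored in the archive for the `META-INF/MANIFEST.MF` entry is an acceptable stand-in for the Manifest creation date, and `make_jar` is a hypothetical helper that only exists to build a tiny in-memory archive for demonstration.

```python
import io
import zipfile
from datetime import datetime

MANIFEST = "META-INF/MANIFEST.MF"


def manifest_timestamp(jar_bytes):
    """Read the timestamp stored for the manifest entry of one JAR.

    A JAR is a ZIP archive, so the standard zipfile module can read it.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return datetime(*jar.getinfo(MANIFEST).date_time)


def release_date(jars):
    """Latest manifest timestamp across all core JARs of one version."""
    return max(manifest_timestamp(jar) for jar in jars)


def assign_rsn(version_dates):
    """Map each version label to its RSN, starting at 1 for the oldest."""
    ordered = sorted(version_dates, key=version_dates.get)
    return {label: rsn for rsn, label in enumerate(ordered, start=1)}


def make_jar(stamp):
    """Hypothetical helper: build a minimal in-memory JAR for a demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        entry = zipfile.ZipInfo(MANIFEST, date_time=stamp)
        jar.writestr(entry, "Manifest-Version: 1.0\n")
    return buffer.getvalue()
```

Given two JARs stamped 1 March 2007 and 3 March 2007, `release_date` returns the later of the two, matching the rule of taking the maximum over all JAR files in a version.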
After the class files are extracted from the JAR file, we process each
class file using ASM, a Java bytecode manipulation framework [9], in
order to extract information from the compiled Java class (Table 4.1
highlights the information that is available to be extracted from a
compiled Java class). In this step we compute direct measures such as
the Number of Fields for a class and extract additional information
such as the fully qualified class name (i.e., the class name including
the package name; an example of a fully qualified class name is
java.lang.String, where java.lang is the package name).
Table 4.1: Structure of a compiled Java Class. Items that end with an
* indicate a cardinality of zero or more [180].
The Java compiler reads class and interface definitions, written in the
Java programming language [108], and compiles them into class files
[131] that can be executed by the Java Virtual Machine (JVM) [180].
A compiled Java class, in contrast to natively compiled programs (for
example, a C/C++ application), retains all of the structural informa-
tion from the program code and almost all of the symbols from the
source code [184, 268]. More specifically, the compiled class contains
all information about the fields (including their types), method bodies
represented as a sequence of Java bytecode instructions, and general
information about the class (see Table 4.1 for an overview). Although
a range of different Java compilers is available, the class files that
are generated by these compilers must adhere to the JVM specification
[180] and hence all of the data that we process to extract the
metrics matches a common specification.
Each Java bytecode instruction that is generated by the compiler
consists of an opcode specifying the operation to be performed, followed
Though the compiled Java class is close to the source code, there are
some differences:
• A compiled class describes only one Java class, or one Java
interface. This constraint extends to inner classes as well, and each
inner class is compiled into a separate class file. When a source
code file contains a main class with multiple inner classes, the
Java specification requires the compiler to generate distinct
compiled class files for each of the inner classes as well as one for
the parent class. Furthermore, the compiled parent class will
contain references to the inner classes. Similarly, the compiled inner
classes also have a reference to either the enclosing parent class,
or the enclosing parent method (if an inner class is declared within
the scope of a method).
• The type name of the compiled class is fully qualified. That is,
the package name is included. However, in the source code the
package name is stated as a separate statement.
We process each compiled Java class file and extract two types of met-
rics: direct count metrics and modifier flags. Table 4.2 shows the list of
count metrics that we extract by processing field and method interface
information for each class, Table 4.3 shows the list of count metrics
that are computed by processing the bytecode instructions present in
method bodies, and Table 4.4 shows the flags that we extract for each
class. In this thesis, we treat all the count metrics as measures of size
as they reflect size of a class from different perspectives. However, we
consider the Number of Branches (NOB) measure as a complexity met-
ric that captures the internal structure of a class. The NOB measure is
equivalent to the widely used Weighted Method Count (WMC) metric [46]
with Cyclomatic Complexity [190] used as the weight. The WMC (and
hence our formulation of NOB) is accepted within the literature
as a measure of structural complexity [116, 165].
Along with the metrics shown in Tables 4.2, 4.3, and 4.4 we also capture
the fully qualified name of each class, its fully-qualified super class
name as well as all method names (including full signature capturing
the return type), field names (including the type) and the fully-qualified
name of all other classes that a class depends upon.
Table 4.2: Direct count metrics computed for both classes and inter-
faces.
Abbv.  Name                        Description
CBC    Try-Catch Block Count       Number of try-catch blocks
THC    Throw Count                 Number of throw statements
ICC    Inner Class Count           Number of inner classes (counted recursively)
MCC    Method Call Count           Number of method calls
MCI    Internal Method Call Count  Number of internal method calls (that is, methods defined in the same class)
MCE    External Method Call Count  Number of times methods defined outside of the class are invoked
LVC    Local Variable Count        Number of local variables defined across all methods in the class
OOC    Instance Of Check Count     Number of times the instanceof operator is used
CAC    Check Cast Count            Number of times a cast is checked for
TCC    Type Construction Count     Number of times a new object is created
CLC    Constant Load Count         Number of constants loaded from a local variable
PLC    Primitive Load Count        Number of times a primitive is loaded from a local variable
PSC    Primitive Store Count       Number of times a primitive is stored into a local variable
ALC    Array Load Count            Number of arrays loaded from a local variable
ASC    Array Store Count           Number of arrays stored into a local variable
FLC    Field Load Count            Number of times an object or primitive is loaded from a field
FSC    Field Store Count           Number of times an object or primitive is stored into a field
LIC    Load Count                  Total number of load operations (is a sum of PLC, ALC, FLC, and CLC)
SIC    Store Count                 Number of store operations (is a sum of PSC, ASC, and FSC)
IOC    Increment Operation Count   Number of times the increment operation is used
ZIC    Zero Operand Instr. Count   Number of bytecode instructions that have no operands
ITC    Instruction Count           Number of bytecode instructions
NOB    Branch Count                Number of branch instructions (counts all conditional branches including the cases inside a switch statement as well as for and while loops)
GTC    Goto Count                  Number of times a goto instruction is used (this is generated when the source code contains loop constructs and is generally paired with a branch instruction)
The Java programming language provides developers the option for cre-
ating two different types of abstractions: a class, and an interface [108].
Within the context of our study, the key distinction between these two
abstractions is that interfaces do not contain any method bodies and
hence the metrics defined in Table 4.3 are not available for compiled
Java interfaces which do not contain bytecode instructions in the method
section of a class file. However, all of the other information (see Ta-
ble 4.1) is available and therefore used to compute the metrics defined
in Table 4.2 and Table 4.4. We are also able to extract dependency infor-
mation from an interface, that is, other Java classes that an interface
depends upon (discussed in Section 4.5.4). In the rest of this thesis, to
improve readability, we use the term class to indicate a compiled Java
class, and it may be either a Java interface or a Java class. We use
the terms Java Interface and Java Class where these abstractions are
treated separately in our analysis.
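[Figure: example class dependency graph; recoverable node labels: B, C, Q,
R, S, T, U, V, Math, Date; legend entries: "Classes in set N" and "Classes
in set K"]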
Once the dependency graph has been constructed, we can analyze each
node n ∈ N in the graph as well as the set of directed links l ∈ L for
each node within the graph G_T and measure the In-Degree Count l_{in}(n),
as well as the Out-Degree Count l_{out}(n). More precisely,
[Figure: example class diagram; recoverable labels: <<interface>>
java.util.List, java.util.ArrayList, ClassM, ClassN, ClassO, ClassP,
ClassQ]
l^{e}_{out}(n) = |\{(n, n_{ext}) \wedge n_{ext} \in K \setminus N\}|   (4.5.3)
l^{i}_{out}(n) = |\{(n, n_{int}) \wedge n_{int} \in N \wedge n \neq n_{int}\}|   (4.5.4)
l_{out}(n) = l^{e}_{out}(n) + l^{i}_{out}(n)   (4.5.5)
The dependency metrics collected for each class and the abbreviation
used for the metrics are presented in Table 4.5. While determining
the dependencies between classes we ignore all dependency links into
java.lang.Object since all objects in Java inherit from this class. By
ignoring this default link we are able to determine if there are classes
that do not have any outgoing links to other objects, that is, Out-Degree
Count can be zero for some classes. Furthermore, having a potential
zero value for the dependency metrics simplifies the statistical analysis
that we undertake in our study (discussed in further detail in Chapter 5
and Chapter 6).
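The out-degree definitions in Equations 4.5.3 to 4.5.5 can be illustrated with a small sketch. The dictionary-based graph representation and the function names are assumptions made for this example. The in-degree function is also an assumption, since its formal definition does not appear in the text above: here it counts only links that originate from other classes in the core set N.

```python
IGNORED = {"java.lang.Object"}  # default link that every class has


def out_degrees(dependencies, core):
    """Compute internal, external and total out-degree per core class.

    dependencies: class name -> set of class names it depends on.
    core: the set N of classes that belong to the core system.
    Returns: class name -> (internal, external, total).
    """
    result = {}
    for node in core:
        # Drop the default link and any self-reference (n != n_int).
        targets = set(dependencies.get(node, ())) - IGNORED - {node}
        internal = len(targets & core)   # links that stay inside N
        external = len(targets - core)   # links into K \ N
        result[node] = (internal, external, internal + external)
    return result


def in_degrees(dependencies, core):
    """Count, for each core class, the other core classes that use it."""
    counts = {node: 0 for node in core}
    for source in core:
        for target in set(dependencies.get(source, ())) - {source}:
            if target in counts:
                counts[target] += 1
    return counts
```

Because links into `java.lang.Object` are removed before counting, a class that depends on nothing else receives an out-degree of zero, as described in the text.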
[Figure: class diagram; recoverable labels: java.lang.Object, ClassA,
ClassE]
The final step in our metric extraction process focuses on measuring
inheritance. The set of inheritance measures that we compute is listed
and explained in Table 4.6. We illustrate how our inheritance metrics
are computed by using an example class diagram (see Figure 4.8
showing both the figure and the metrics computed). Since we do not
process classes external to the core software system, the inheritance
measures that we compute may not include the complete set of ances-
tors for any given class in a software system. For example, consider a
class ReportView that extends the class javax.swing.JFrame which
is part of the Java framework. We compute the inheritance metric Depth
4.6 Summary
The evolution of a software system can be studied in terms of how var-
ious properties as reflected by software metrics change over time. We
build a release history model by analysing the compiled class files. Our
release history model captures meta-data and 58 different metrics at a
class level. We also build a class dependency graph for each release in
the evolution history.
The data selection and metric extraction method that we use ensures
that we study non-trivial software allowing us to extend our findings
to other comparable software systems built in Java. We also analyse
compiled binaries that have already gone through the build process
improving the accuracy of our measures. Further, as discussed in the
previous chapter, we focus on contributions from the core development
team ignoring third party libraries ensuring that the metrics that we
collect are a better reflection of the development effort.
Chapter 5
Growth Dynamics
Chapter 5. Growth Dynamics
• How does the profile and shape of this distribution change as soft-
ware systems evolve?
used to answer questions like “are the rich getting richer?”. Our ap-
proach allows us not only to observe changes in software systems ef-
ficiently, but also to assess project risks and monitor the development
process itself. We apply the Gini coefficient to 10 different metrics and
show that many metrics not only display consistently high Gini values,
but that these values are remarkably consistent as a project evolves
over time. Further, this measure is bounded (between a value of 0 and
1) and when observed over time it can directly inform us if develop-
ers tend to centralise functionality and complexity over time or if they
disperse it.
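As an illustration of the measure, the Gini coefficient of a set of metric values can be computed as sketched below. The exact formulation used in this thesis is not shown in this passage, so the rank-based formula here is an assumption: it is one standard way to compute the coefficient for a finite sample of non-negative values, and the function name is hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal).

    Uses the rank-based form over the sorted sample x_1 <= ... <= x_n:
        G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    """
    data = sorted(values)
    n = len(data)
    total = sum(data)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum(rank * x for rank, x in enumerate(data, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

If every class has the same number of methods the function returns 0. If a single class out of n holds all of the methods it returns (n - 1)/n, which approaches 1 as the number of classes grows.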
The raw data used in this study is available as data files on the DVD
attached to this thesis. Appendix E describes the various data and
statistical analysis log files related to this chapter.
[Figure: histogram with cumulative distribution; x-axis: Number of
Methods; y-axes: Percent and Cumulative Percentage]
The sheer volume of metric data available from any object-oriented
software system can make it difficult to understand the nature of
software systems and how they have evolved [75]. A common approach
[77, 117, 165, 182] to reducing the complexity of the analysis is to
apply some form of simple statistical summarisation such as the
[Figure: line plot; x-axis: Release Sequence Number; y-axis: Median]
224, 226, 251, 274, 295, 296] summarise data using simple descriptive
statistical measures.
bution parameters as they summarise the data and can gain an insight
into the evolution by observing changes to the distribution parameters
over time.
Though fitting distributions has been shown to have merit for modeling
networks [210] and for inferring how these networks have been created,
software evolution is better modelled by analysing the evolution history,
as this reduces the number of assumptions one has to make. Rather
than attempting to infer the generative process from a single release of
a software system, we can gain more insight into the evolutionary
pressures by analysing the changing metric distribution over time. In our
work, we take this approach and study the metric distributions as they
change over time in order to gain a better understanding of the underlying
evolutionary processes.
Though there has been progress over the last decade in this field, there
is still no widely accepted distribution that consistently and reliably
captures software metric data. More importantly, we are not required
to fit a given software metric to a particular distribution in order
to interpret it. What is needed is a set of measures that reliably and
consistently summarize properties of the distribution, allowing effective
inferences to be made about the evolution of a software system.
insight into how the system has been constructed, and how to maintain
it [66]. The allocation of some attribute within a population, and how
that allocation changes over time, has been studied comprehensively
by economists who are interested in the distribution of wealth and how
this changes [311]; we use the same approach in our analysis.
In 1912, the Italian statistician Corrado Gini proposed the Gini coef-
ficient, a single numeric value between 0 and 1, to measure the in-
equality in the distribution of income or wealth in a given population
(cp. [91, 229]). A low Gini coefficient indicates a relatively equal wealth
distribution in a given population, with 0 denoting a perfectly equal
wealth distribution (i.e., everybody has the same wealth). A high Gini
coefficient, on the other hand, signifies a very uneven distribution of
wealth, with a value of 1 signalling perfect inequality in which one in-
dividual possesses all of the wealth in a given population. The Gini
Coefficient is a widely used social and economic indicator to ascertain
an individual’s ability to meet financial obligations or to correlate and
compare per-capita GDPs [286].
We can adopt this technique and consider software metrics data as in-
come or wealth distributions. Each metric that we collect for a given
property, say the number of methods defined by all classes in an object-
oriented system, is summarized as a Gini coefficient, whose value in-
forms us about the degree of concentration of functionality within a
given system.
\[ L(F(x)) = \frac{\int_{-\infty}^{x} t\, f(t)\, dt}{\int_{-\infty}^{\infty} t\, f(t)\, dt} \tag{5.2.1} \]
More formally, if the Lorenz curve is L(Y), then the Gini Coefficient is
defined as:
\[ \mathit{GiniCoefficient} = 1 - 2 \int_{0}^{1} L(Y)\, dY \tag{5.2.2} \]
\[ G = \frac{1}{n} \left( n + 1 - 2 \left( \frac{\sum_{i=1}^{n} (n + 1 - i)\, x_i}{\sum_{i=1}^{n} x_i} \right) \right) \tag{5.2.3} \]
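As an illustration, Equation 5.2.3 can be evaluated directly on the metric values collected for one release. The sketch below is not the tooling used in this study; the function name `gini` is hypothetical, and it assumes the values x_i are sorted in non-decreasing order before the rank weights (n + 1 - i) are applied, which is the usual convention for this form of the formula.

```python
def gini(values):
    """Gini coefficient of a list of non-negative metric values (Equation 5.2.3).

    Assumption: x_1 <= x_2 <= ... <= x_n, so the values are sorted first.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    # The smallest value (i = 1) receives weight n, the largest receives weight 1.
    weighted = sum((n + 1 - i) * x for i, x in enumerate(xs, start=1))
    return (n + 1 - 2 * weighted / total) / n


if __name__ == "__main__":
    # Illustrative values only: four classes with identical method counts,
    # then four classes where one class defines every method.
    print(gini([5, 5, 5, 5]))
    print(gini([0, 0, 0, 10]))
```

Note that with this discrete form the maximum attainable value for n classes is (n - 1)/n, which approaches 1 as the number of classes grows.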
In the first version of the Spring Framework, the Gini Coefficient value
for In-Degree Count is 0.62. Values of Gini Coefficient substantially
greater than 0 are indicative of a skewed distribution, where a small set
of classes are very popular (since In-Degree Count is the wealth in our
case). Furthermore, in Spring Framework, the Gini Coefficient value
gradually increases over time from 0.62 to 0.71 over a period of 4 years
of evolution. The trend shows that over time, a small set of classes are
gaining popularity. This type of trend analysis is used by economists to
answer the question “are the rich getting richer?”.
A consistent finding by other researchers [21, 55, 223, 270, 299] study-
ing software metric distributions has been that this data is positively
skewed with long-tails. Can we confirm this finding in our own data?
Further, will this shape assumption hold if metric data was observed
over time? We undertook this step in order to provide additional strength
to the current expectation that metric data is highly skewed.
\[ \mathit{MovementSkewness} = \frac{1}{n} \sum_{i=1}^{n} \frac{(x_i - \mu)^3}{\sigma^3} \tag{5.3.1} \]
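A minimal sketch of Equation 5.3.1 follows. It assumes that μ is the arithmetic mean and σ the population standard deviation (divisor n) of the metric values for one release; the text does not state which form of σ was used, and the function name `skewness` is hypothetical.

```python
import math


def skewness(values):
    """Skewness per Equation 5.3.1: mean of the cubed standardised deviations.

    Assumption: sigma is the population standard deviation (divisor n).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((x - mu) / sigma) ** 3 for x in values) / n


if __name__ == "__main__":
    # Illustrative values only: a symmetric sample and a long right tail.
    print(skewness([1, 2, 3, 4, 5]))
    print(skewness([1, 1, 1, 1, 10]))
```

A value above zero indicates a longer right tail, which is the positively skewed shape discussed in this section.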
In our analysis, we tested the metric data for each release over the
entire evolution history to ensure that the data did not have a Gaussian
distribution by using the Shapiro-Wilk goodness-of-fit test for
normality [279] at a significance level of 0.05. The expectation is that the
test will show that the metric data is not normally distributed.
Additionally, to confirm that the distribution can be considered skewed, we
computed the descriptive measure of movement skewness (See Equation
We compute the Gini Coefficient for each of the selected metrics (see
Table 5.1) using the formula in Equation 5.2.3. There were, however,
some minor adjustments made to the calculation after taking into con-
sideration certain Java language features. When we process code for
metric extraction, we treat both Java classes and Java interfaces as
abstractions from which we can collect metrics (see Chapter 4). However,
interfaces in Java cannot include load or store actions, branching,
method invocations, or type constructions. As a result,
interfaces were excluded from these counts, but were included in
the Out-Degree Count, In-Degree Count, Number of Methods, and Public
Method Count measures. While interfaces in Java may include constant
field declarations [108], it was decided to also exclude them from the
Number of Attributes measure in order to focus more directly on field
usage within individual classes.
Similarly, if the Gini Coefficient values of any system (for a selected met-
ric) are within a very narrow boundary over the entire evolution history,
then it is an indicator of organisation stability at the system level. For
example, consider a software system which has increased in size by
300%, but the Gini Coefficient for Number of Branches is between 0.78
and 0.81 over an evolution of 4 years. This minimal fluctuation can be
seen as an indicator of stability in how developers organise structurally
complex classes. However, if Gini Coefficient values change substan-
tially over the evolution period across different systems then it is an
indication that evolutionary pressures do play a role in how developers
organise the solutions.
We analyse the trends in Gini Coefficient values over time to answer one
of our research questions: do developers create more complex and large
classes over time? If we consistently observe that the values of the Gini
Coefficients increase over time, this is a strong indicator that developers
do tend to centralise functionality into a few complex and large
abstractions as software evolves. If, on the other hand, the Gini Coefficients
decrease over time, this suggests that there are pressures that compel
development teams to reorganise the responsibilities more evenly. A
third alternative is that developers have a certain set of habitual
preferences and that software evolution does not impact on the underlying
distribution significantly; that is, the Gini Coefficients do not change
substantially over time. If the Gini Coefficients consistently do not show
Metric  LIC   SIC   NOB   IDC   ODC   NOM   PMC   NOA   FOC   TCC
LIC     –
SIC     0.93  –
NOB     0.80  0.81  –
IDC     0.18  0.24  0.15  –
ODC     0.86  0.82  0.77  0.08  –
NOM     0.71  0.66  0.44  0.26  0.63  –
PMC     0.21  0.20  0.09  0.27  0.18  0.75  –
NOA     0.70  0.79  0.48  0.23  0.50  0.53  0.16  –
FOC     0.96  0.87  0.80  0.09  0.85  0.62  0.16  0.67  –
TCC     0.78  0.78  0.53  0.11  0.68  0.56  0.16  0.82  0.84  –
5.4 Observations
In this section we present our observations within the context of the
key research questions. The findings are summarised in Section 5.4.8
and discussed in Section 5.5.
• There exists a strong positive correlation (i.e., > 0.8 [265]) between
some different measures consistently across our entire data set.
The high correlation coefficient values can be seen in the skewed
histogram charts in Figure 5.4 for most metrics. Across all systems,
except for In-Degree Count and Public Method Count, the correlation
coefficient values are in general very high.
• Except for In-Degree Count, in 75% of the releases all other mea-
sures show moderate to high positive correlation (i.e. > 0.6) be-
tween different measures.
large and complex classes need not directly service a large number of
other classes.
An example typical of the metric data in our data set is illustrated in
Figure 5.1, which shows the relative frequency distributions for the metrics
Number of Methods and Fan-Out Count for release 2.5.3 of the Spring
framework (a popular Java/J2EE light-weight application container).
In both cases the distributions are significantly skewed. However, the
shapes of the distributions are different. This pattern is recurring
and common: though the distributions are non-Gaussian and
positively skewed with fat tails, they are different for different systems
and metrics. A complete list of all descriptive statistics and the results
from our tests for normality is presented in Appendix E.
The metric data distributions are bounded within a fairly narrow range.
Figure 5.5 presents the boundaries of the
histograms based on the minimum and maximum values of Number of
Branches, In-Degree Count, Number of Methods and Out-Degree Count
attained across all versions of the Spring Framework. The figures show
that the relative frequency distributions of these measures have a distinct
profile that is bounded in a small range. Notably, this
stability is observable over an evolution period of 5 years.
[Figure 5.5 plots: four histogram panels. y-axis: % Classes; x-axis: metric value per class, with an open-ended final bin (24+ or 29+).]
Figure 5.5: Spring evolution profiles showing the upper and lower
boundaries on the relative frequency distributions for Number of
Branches, In-Degree Count, Number of Methods and Out-Degree Count.
All metric values during the entire evolution of 5 years fall within the
boundaries shown. The y-axis in all the charts shows the percentage
of classes (similar to a histogram).
There are, however, some exceptions to this rule that coincide with
structural shifts from one major release to another. For instance, in
Hibernate, one of the systems in our study, we noticed that the profile of many
distributions shifted significantly twice during its evolutionary
history. Upon closer examination we found that the profile shifted to a
new bounded range when the team moved from one major version to
[Figure 5.6 plot: x-axis: In-Degree Count (0 to 24+); y-axis: % Classes (0% to 35%); series: Hibernate 1.x, Hibernate 2.x, Hibernate 3.x.]
Figure 5.6: The distinct change in shape of the profile for Hibernate
framework between the three major releases. Major releases were ap-
proximately 2 years apart.
The typical range for the Gini coefficient independent of the metric or
system under consideration is between 0.47 and 0.75, with a mean value
Though the relatively high value for the Gini coefficient was not surprising
(given that metric distributions are known to be skewed), the narrow
range for the IQR confirms the visual observation that metric
distributions do not change substantially over time.
[Figure 5.7 plot: y-axis: Gini (0 to 1); x-axis: metrics IDC, PMC, NOF, TCC, ODC, NOB, NOM, SIC, LIC, FOC.]
Figure 5.7: Box plot of Gini coefficients across all selected Java sys-
tems.
An analysis of the distributions showed that very high Gini values (typ-
ically those over 0.85) were consistently observed in only two metrics:
NOB and TCC. We also noticed a few systems with high values of Num-
The systems where these high values were consistently found were:
Checkstyle, Hibernate (version 3.0b1 and higher), PMD, Groovy,
ProGuard, FreeMarker, JabRef (version 1.4.0 and higher), and
JasperReports. In these systems we observed persistent occurrences of Gini
values for Number of Branches greater than 0.83, with a value of 0.91 for
CheckStyle (version 2.1.0). But why should this be the case? We
discovered, upon further inspection, that all of these systems contained
machine-generated code, which yields an extremely uneven functionality
distribution. Seven of the eight systems used compiler-compilers to
generate parsers, which centralise the semantic rules into a few classes
that tend to be large and have a very high Weighted Method Count. The
other system that we picked up with large Gini values was ProGuard,
with a TCC value over 0.9. This was caused by a few mapping classes
that were generated using a code generation template.
Two other systems that also produced fairly high Gini values were Xerces2
and Xalan. In both of these systems, the Gini coefficient for NOB is
between 0.75 and 0.81 over the entire evolution history. These high
values resulted from hand-written parsers that produced functionality
distribution profiles closer to those of systems that contained
machine-generated code. These were the only instances in which we observed such high
values for Number of Branches or Type Construction Count without the
presence of machine-generated code.
We analyzed the trend in Gini coefficients over the entire evolution his-
tory in order to determine if developers tend to centralise functionality
and complexity by observing the relationship between the Gini coeffi-
cient and the Age.
The In-Degree Count, a measure of popularity, was the only one that
consistently showed a positive trend. In 20 systems the ρ value was
greater than 0.8 and in 11 systems the value was in the range 0.55 to 0.8.
Three systems showed a weak relationship (ρ values of 0.08, 0.22 and 0.33),
while only 3 systems (Struts, Tapestry and Ant) consistently showed a
strong negative trend (ρ < −0.6).
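The ρ values above relate the Gini coefficient of a metric to the age of the system. As a hedged sketch of how such a value could be obtained, the code below computes Spearman's rank correlation as the Pearson correlation of average ranks (so that ties are handled). The function names are hypothetical, the Release Sequence Number is used here as a stand-in for age, and the Gini series in the example is illustrative rather than taken from our data set.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the tie group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(xs, ys):
    """Spearman's rank correlation coefficient between xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    rsn = [1, 2, 3, 4, 5]                      # release sequence numbers
    gini_idc = [0.62, 0.64, 0.63, 0.68, 0.71]  # illustrative values only
    print(spearman_rho(rsn, gini_idc))
```

Because only ranks are compared, the coefficient captures whether the Gini value tends to rise or fall with age without assuming that the relationship is linear.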
The aim of our study into the nature of growth in software was to
understand if evolution causes growth to be evenly distributed among the
various classes in the software system, or if certain classes tend to
gain more complexity and size. In order to answer this question, we
studied how the metric data distribution changes over time.
[Figure: Spearman's correlation coefficient (−1 to 1) by metric: IDC, PMC, NOF, TCC, ODC, NOB, NOM, SIC, LIC, FOC.]
5.5 Discussion
In this section, we discuss how developers choose to maintain software
systems by interpreting the observations presented in the previous sec-
tion. Specifically, we argue that though the different metrics selected
are strongly correlated, the inconsistency of the strength between dif-
ferent systems warrants the use of different metrics in order to under-
stand the maintenance of a software system effectively. Furthermore,
we discuss the finding of the consistent increase in the Gini Coefficient
value of the In-Degree Count and argue that this observation provides
the empirical support for the preferential attachment model that has
previously been postulated as one of the likely models of growth. We
also present an explanation for the observation of highly stable and
strongly bounded range for the Gini Coefficients and show that the de-
cisions that developers make with respect to how they organise a solu-
tion are highly consistent during the development life cycle. Addition-
ally, we present our interpretation for why developers prefer to construct
and maintain solutions with highly skewed distributions where much
of the functionality is centralised into a few god-like classes. We end
the section with a discussion on the value and applicability of the Gini
Coefficient for identifying releases with significant changes as well as
for detecting the presence of machine generated code.
[Figure: y-axis: Spearman's Correlation Coefficient.]
Moreover, we found repeatedly that the Gini coefficients for metrics that
may be considered correlated diverge independently of the underlying
metric data relationship. Though Load Instruction Count and Store
Instruction Count are strongly correlated (often well over 0.90), the
summary measures derived from them still need to be analyzed separately. We
observed in some instances that the Load Instruction Count Gini
Coefficients changed independently of the Store Instruction Count Gini
Coefficient values. For example, between releases 3.0 and 3.1 of
CheckStyle the Load Instruction Count Gini Coefficient changes from 0.870 to
0.804 while the Store Instruction Count Gini Coefficient only changes from
0.831 to 0.827.
as existing classes gain more size and complexity, that is, they become
relatively wealthier.
When developers add a set of new classes, these fit within the overall
probability distribution profile, as indicated by the minimal change in
the Gini coefficient. Although our observations indicate that the
long-term trend for most Gini Coefficient values cannot be reliably predicted,
there is a high probability that the distributions of size and complexity
in any two consecutive versions are very similar.
We analyzed the trend in the Gini values to answer one of our research
questions – “do large and complex classes get bigger over time?”. If large
and complex classes gain additional volume in terms of code, then they
will grow at a slightly faster rate than the rest of the code base causing
the Gini Coefficient values to increase as software evolves. Our obser-
vations showed a consistent trend in the Gini Coefficient value of IDC,
but none of the other metrics had a sufficiently generalizable trend.
In some systems there was an increase, while in others there was a
decrease in the Gini Coefficient values over time.
Our findings show that popular classes tend to gain additional de-
pendents indicating that, in general, software systems are built incre-
Many researchers who study software have noted that the In-Degree
Count metric distributions follow a power-law [21,54,55,223,287,299],
and have hypothesised that this distribution arises due to preferential
attachment. That is, they have inferred the potential growth
mechanism from the observed outcome. However, although preferential
attachment can cause a power-law distribution, it is not the only model
that can give rise to this category of skewed distributions [48,145]. This
possibility implies that the hypothesis that a preferential attachment model
generates skewed In-Degree Count distributions was not fully validated
empirically. Our observations show that, in general, there is an upward
trend in the Gini coefficient for IDC, providing empirical support for the
preferential attachment growth model as the most likely explanation for
the observed highly skewed distributions for the In-Degree Count metric.
sented in the next chapter). The major rework has caused the Gini co-
efficient values for In-Degree Count to reduce substantially. In essence,
in both of these systems the popular classes were not able to establish
themselves to gain additional popularity.
The Ant Build system was unique in its behaviour as the Gini Coefficient
value for In-Degree Count decreased gradually over time. Furthermore,
there was no evidence of any substantial rework in this system. The
popular classes slowly lost popularity because of the
inherent architecture of Ant, specifically how new features were added.
In Ant, new features were typically added into the core functionality via
a plug-in architectural style. Each plug-in contained a set of new classes
that provided the needed functionality. Although the plug-ins rely on
some of the existing core classes, the dependency binding into the core
classes was achieved via external configuration files rather than
statically in Java code. Since our metric extractor is unable to
process external configuration files to determine dependency
information, we were not able to fully identify these dependencies. When the
dependency data was checked manually for four consecutive versions
(1.6.4, 1.6.5, 1.7.0 and 1.7.1), we observed a weak positive correlation
trend.
Unlike In-Degree Count, in the other metrics there was no strong gen-
eralizable trend in the Gini Coefficient suggesting that growth models
other than preferential attachment may be at work.
When developers are making decisions, they weigh the benefits of using
a large number of simple abstractions against the risk of using only a
few, but complex ones in their solution design. Our findings indicate
that developers favor the latter. In particular, we learned (see Figure 5.7)
that the Gini coefficients of most metrics across all investigated systems
assume bounded values. These values mark a significant inequality
between the richest and the poorest. For example, in version 2.5.3 of the
Spring framework, approximately 10% of the classes possess 80% of the Fan-Out
Count wealth (see Figure 5.3).
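A statement of the form "10% of the classes possess 80% of the wealth" is a single point read off the Lorenz curve from the wealthy end. The sketch below shows one way such a share could be computed; the function name `top_share` is hypothetical, the rounding of the group size (ceiling) is an assumption, and the sample values are illustrative rather than Spring data.

```python
import math


def top_share(values, fraction):
    """Share of the total metric value held by the top `fraction` of classes."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    xs = sorted(values, reverse=True)
    total = sum(xs)
    if not xs or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    # Assumption: round the group size up so at least one class is included.
    k = max(1, math.ceil(fraction * len(xs)))
    return sum(xs[:k]) / total


if __name__ == "__main__":
    # Illustrative values only: ten classes, one of which dominates.
    fan_out = [40, 2, 1, 1, 1, 1, 1, 1, 1, 1]
    print(top_share(fan_out, 0.10))
```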
The existence of both lower and upper bounds for Gini Coefficients in
software systems can be viewed as a result of a trade-off developers use
throughout development. Decisions on how to allocate responsibilities
to classes within a system are a product of the application domain, past
experience, preferences, and cultural pressures within the development
team. The Gini Coefficient is a potential mechanism for summarising
the outcome of these choices.
The stability of the Gini coefficient indicates that developers rarely make
modifications to systems that result in a significant reallocation of func-
tionality within the system. Managers can, therefore, use this knowl-
edge to define project-specific triggers both to help detect substantive
shifts in the code base and to ask directed questions about the reasons
for their occurrences. For example, in our study we were able to detect
a major change in the Type Construction Count of ProGuard, a Java byte
code obfuscator, between release 3.8 and 4.0 (cf. Table 5.4). A detailed
analysis disclosed that the developers had added a set of large
auxiliary classes to centralise the obfuscation mapping of instructions, a
change so significant as to warrant an appropriate notification to both
The true benefit of the Gini coefficient is its ability to capture precisely
and summarize changes in both the degree of concentration and the
The median for Type Construction Count, for example, stays firm at 1 for
340 weeks of development, suggesting a fairly routine evolution of the
code base over a period of 6.5 years. The Gini coefficient for Type
Construction Count, on the other hand, moves from 0.776 to 0.897, indicating
that the arrival of new classes results in a less equitable concentration
of object instantiations. In the case of ProGuard, the changes to the
system occur at the upper end of the Type Construction Count measure.
While the median for Type Construction Count remains the same, the
changing Gini coefficient correctly reflects the occurrence of regular
fluctuations as well as a sudden architectural change at RSN 19
(corresponding to Version id. 3.8, see Figure 5.12). Between RSN 19 and 20,
the developers of this software system modified the strategy used for
mapping the actual code to obfuscated code, which can be considered
a substantial architectural change since it adjusts how this software
system implements its core functionality.
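The contrast between the median and the Gini coefficient can be reproduced on a small example. The sketch below is illustrative only: the metric values are invented, `gini` follows Equation 5.2.3 under the assumption that values are sorted in non-decreasing order, and `flag_shifts` with its threshold is a hypothetical project-specific trigger rather than a rule used in this study.

```python
import statistics


def gini(values):
    """Gini coefficient per Equation 5.2.3 (values sorted in ascending order)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    weighted = sum((n + 1 - i) * x for i, x in enumerate(xs, start=1))
    return (n + 1 - 2 * weighted / total) / n


def flag_shifts(gini_by_release, threshold):
    """Indices of releases whose Gini value moved by more than `threshold`
    relative to the preceding release (threshold is an assumed parameter)."""
    return [
        i
        for i in range(1, len(gini_by_release))
        if abs(gini_by_release[i] - gini_by_release[i - 1]) > threshold
    ]


if __name__ == "__main__":
    # Illustrative Type Construction Count values before and after two
    # classes are added, one small and one very large.
    before = [0, 1, 1, 1, 2, 3]
    after = [0, 1, 1, 1, 1, 2, 3, 40]
    print(statistics.median(before), statistics.median(after))  # both are 1
    print(gini(before), gini(after))  # the Gini coefficient rises markedly
    print(flag_shifts([0.78, 0.78, 0.79, 0.90], threshold=0.05))
```

The median is unaffected because the change occurs at the upper end of the distribution, whereas the Gini coefficient takes every class into account.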
abstractions that they need to focus on is well within the limits of their
working memory. If this decision making method is applied for each fea-
ture, over time, the asymmetric distribution observed in various metrics
is reinforced.
The Gini Coefficient is a single bounded value, and hence offers an
easy-to-use trigger for an investigation that would reveal potential causal
mechanisms. One alternative to using the Gini Coefficient would be to
identify and observe the movement in outliers using a range of outlier
detection techniques [24]. Another alternative is to use a combination
of arithmetic mean, median, skewness and kurtosis to qualitatively
deduce the nature of the distribution. Though these techniques may offer
some insight, they do not digest the information taking into
consideration the entire population size, and they are also limited since there is no
easy baseline to compare against. Furthermore, the outlier detection
method will highlight a set of classes rather than present information
that permits direct comparisons between versions without additional
analysis.
In the first system (tax calculation software for large corporations with 5
releases) that we audited applying the above rules, we noticed that the
Gini coefficients for NOB, NOA and TCC were consistently over 0.95 (in all
5 versions available for investigation). Additional investigation showed
that this system contained a large number of stub classes that were
generated from code templates. Interestingly, the stub class generator
was a script developed specifically for this purpose and, even though
these stub classes were used extensively, their generation technique was
not specifically documented.
Using the Gini Coefficient in both cases identified key aspects of the
(see Figure 5.13 showing the Number of Branches Gini Coefficient values
for JabRef). JabRef is a tool that helps manage bibliography databases,
specifically BibTeX files. Interestingly, the developers initially used a
hand-written parser and in RSN 6 introduced a machine-generated
parser. The introduction of the machine-generated parser for BibTeX
files caused the Number of Branches Gini Coefficient value to increase
from 0.74 to approximately 0.91 (see Figure 5.13).
High Gini Coefficient values indicate that these systems have a small
set of very large and complex classes (wealthy in terms of size and
complexity). As noted earlier, human developers rarely write code in which
Gini Coefficients for specific measures go past 0.80, as they will find it
hard to write and maintain methods with very high algorithmic
complexity [74,263]. The ability to detect and flag machine-generated
code is valuable since it signals the possible need for additional
expertise in order to maintain or enhance an existing code base and to meet
strategic objectives.
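The 0.80 level mentioned above suggests a simple screening step. The sketch below is a heuristic trigger only, with hypothetical names and illustrative values: it lists the metrics whose Gini coefficient exceeds a configurable threshold so that the corresponding classes can be inspected manually. A flagged metric is a prompt for investigation, not proof that machine-generated code is present.

```python
def flag_high_gini(gini_by_metric, threshold=0.80):
    """Return the metrics whose Gini coefficient exceeds `threshold`.

    `gini_by_metric` maps a metric abbreviation to its Gini value for one
    release. The default of 0.80 mirrors the level discussed in the text.
    """
    return sorted(
        metric for metric, value in gini_by_metric.items() if value > threshold
    )


if __name__ == "__main__":
    # Illustrative values only.
    release = {"NOB": 0.91, "IDC": 0.62, "TCC": 0.78, "NOM": 0.55}
    print(flag_high_gini(release))
```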
5.6 Summary
Evolving software tends to exhibit growth. In this chapter our focus was
on how maintenance effort is distributed by analyzing class size and
complexity metric distributions. The data we collected for our study
showed that size and complexity are unevenly distributed, confirming
similar observations made by other researchers. However, these skewed
distributions cause standard statistical summary measures such as
mean and standard deviation to provide misleading information mak-
ing their application for comparative analysis challenging. In order to
overcome this limitation, we analyzed the data using the Gini coefficient, a
higher-order statistic widely used in economics to study the distribution
of wealth.
The results from our study showed that metric distributions have a
similar shape across a range of different systems. These distributions
are stable over long periods of time, with only occasional and abrupt spikes,
indicating that significant changes are rare. A general pattern is that once
developers commit to a specific solution approach, the organisational
preferences are stable as the software system evolves.
Chapter 6
Change Dynamics
Chapter 6. Change Dynamics
• What is the likelihood that a class will change from a given version
to the next?
The raw data used in this study is available as data files on the DVD
attached to this thesis. Appendix F describes the various data and
statistical analysis log files related to this chapter.
ing it with another syntax tree. In contrast to the String matching
approach, the Abstract Syntax Tree (AST) based matching is programming
language aware and is able to identify the types of changes more
precisely. Though this approach is comparatively more complex, it has
increasingly been adopted by contemporary IDEs like Eclipse [73] to
perform an intelligent diff allowing developers to compare two versions
of a class. The AST based matching approach has also been enhanced
to better detect structural changes in object oriented programs by
combining it with call graph analysis [157], and program behaviour
analysis [7].
often make error correction difficult (since an error may be fixed in
one block of code, yet remain in all of the clones), and also inflate the
size of the code, making maintenance harder [236].
3. The set of interfaces that this particular class implements (we use
fully qualified names)
5. The name, signature and modifiers of the methods (the return type
as well as parameter names and types are considered as part of
the method signature)
6. The set of classes that this class depends on (defined as $l_{out}(n)$ in
Chapter 4).
Additionally, there also exist the following disjoint sets in a given
version v:
• The set of classes that are deleted in the next release, $D_v^f$,
• The set of unchanged classes in the next release, $U_v^f$, and
• The set of classes that are modified in the next release, $C_v^f$.
The set of changed classes $C_v$ and the set of unchanged classes $U_v$ are
determined by comparing the classes in two releases using the change
detection technique presented in Section 6.1.2.
The set of deleted classes $D_v^f$ in the next release is determined by
comparing any given version with the future version (if available) and
checking if the fully qualified class name exists in both versions. The set of
newly added classes $A_v$ is computed by comparing any given version
with the previous version (if available) and checking if the fully
qualified class name exists in that version.
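The set definitions above can be sketched with ordinary set operations. In the sketch below each version is represented as a mapping from fully qualified class name to a fingerprint; the fingerprint is a hypothetical stand-in for the outcome of the change detection technique of Section 6.1.2 (two classes are treated as unchanged when their fingerprints are equal), and the function names are likewise illustrative.

```python
def classify_forward(current, following):
    """Split the classes of `current` into (unchanged, changed, deleted)
    relative to the next release `following`.

    Both arguments map a fully qualified class name to a fingerprint
    (an assumed summary of the properties compared during change detection).
    """
    deleted = {name for name in current if name not in following}
    changed = {
        name for name in current
        if name in following and current[name] != following[name]
    }
    unchanged = {
        name for name in current
        if name in following and current[name] == following[name]
    }
    return unchanged, changed, deleted


def added_classes(previous, current):
    """Classes whose fully qualified name is absent from the previous release."""
    return set(current) - set(previous)


if __name__ == "__main__":
    # Illustrative class names and fingerprints only.
    v1 = {"app.Parser": "f1", "app.Lexer": "f2", "app.Old": "f3"}
    v2 = {"app.Parser": "f1", "app.Lexer": "f9", "app.New": "f4"}
    print(classify_forward(v1, v2))
    print(added_classes(v1, v2))
```

The three forward-looking sets are disjoint and together cover every class in the current version, which is the property the ratios in the next section rely on.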
We measure the size, consistency and frequency of change for each class
in our release history (that is, every class in every version). The ratio-
nale for our choices is presented in the next section within the context
of the observations.
Once change has been detected and the magnitude assessed, we con-
sider two attributes in order to understand the factors that impact on
the probability of change: the popularity of a class, and the complexity
of a class. A popular class is one that is used by a large number of other
classes. In any development environment, it seems wise to make new
classes depend on stable, reliable parts of the system, rather than on
those that are constantly changing. As a consequence, it seems logical
that new code should depend on existing, proven parts of the system.
If this is the case, popularity should make a class stable. The other
aspect that is likely to have a close relationship with change is the
complexity of a class. A complex part is more likely to have defects
[20, 130, 266], and hence to be changed as part of a corrective
modification. However, the
counter argument is that a complex part should resist change because
developers will avoid these classes to minimize the chance of introduc-
ing new defects.
The set of popular classes is $N_v^p$, where $N_v^p \subseteq N_v$. Any given
version $v$ of the system has the following disjoint sets of popular classes:
• The set of unchanged popular classes $U_v^p$,
• The set of popular classes that have changed $C_v^p$, and
• The set of popular classes that are newly added in this version $A_v^p$
6.2 Observations
In this section we present the analysis method as well as the observa-
tions within the context of the key research questions. We start with an
What is the likelihood that a class will change from a given version to
the next? Does this probability change over time? Is it project-specific?
$u_v^f$: ratio of classes that are unchanged
$c_v^f$: ratio of classes that are changed
$d_v^f$: ratio of classes that are removed
$$c_v^f = \frac{|C_v^f|}{|N_v|} \qquad (6.2.2)$$
$$d_v^f = \frac{|D_v^f|}{|N_v|} \qquad (6.2.3)$$
and,
$$u_v^f + c_v^f + d_v^f = 1 \qquad (6.2.4)$$
$$u_v = \frac{|U_v|}{|N_v|} \qquad (6.2.5)$$
$$c_v = \frac{|C_v|}{|N_v|} \qquad (6.2.6)$$
$$a_v = \frac{|A_v|}{|N_v|} \qquad (6.2.7)$$
and,
$$u_v + c_v + a_v = 1 \qquad (6.2.8)$$
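As a small worked example of these ratios, the sketch below computes the proportions for one hypothetical version. The helper name `proportions` and the class counts are assumptions chosen for illustration; the only property taken from the text is that the sets are disjoint and together make up $N_v$, so the ratios sum to 1.

```python
def proportions(*disjoint_sets):
    """Return each set's share of the combined class count."""
    total = sum(len(s) for s in disjoint_sets)
    if total == 0:
        raise ValueError("version contains no classes")
    return tuple(len(s) / total for s in disjoint_sets)


# Hypothetical version with 20 classes, identified here by integers.
unchanged = set(range(0, 15))
changed = set(range(15, 19))
deleted = {19}

u_f, c_f, d_f = proportions(unchanged, changed, deleted)
print(u_f, c_f, d_f)

# The ordering u > c > d is what Equation 6.2.9 tests for each version.
property_holds = u_f > c_f > d_f
print(property_holds)
```

The same helper applies to the look-back ratios by passing the unchanged, changed and newly added sets instead.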
In our input data set, we determined that for any given version v, the
following property (Equation 6.2.9) holds in 80% of the versions:
$$u_v^f > c_v^f > d_v^f \qquad (6.2.9)$$
and the following property (Equation 6.2.10) holds for 75% of versions:
When we look ahead one version, on average across all systems that we
studied, we observe that 75% of the classes are unchanged, 20% are
modified and 5% are removed. When we look back one version to detect
new classes, on average we note that 70% of the classes are unchanged,
22% are modified and around 8% are new classes. Figure 6.2 highlights
[Figure: % Classes (y-axis, 0% to 100%) plotted against Release Sequence Number (x-axis).]
[Figure: % Classes (y-axis, 0% to 100%) plotted per release, with axis ticks running from 2 to 47.]
This hypothesis was tested using the Kruskal-Wallis test [279], which
checks equality of population medians among groups. The population
was divided into two groups based on whether the property under
investigation holds or not. If age does not have a significant impact, then
the population median should be similar for both groups.
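To make the test concrete, the following is a minimal pure-Python sketch of the Kruskal-Wallis H statistic for two groups, with the usual tie correction and a chi-square approximation on one degree of freedom for the p-value. The ages below are invented for illustration and are not taken from the data set; the function names are likewise assumptions.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    ordered = sorted(values)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(ordered)
    ranks = [first_position[v] + (counts[v] - 1) / 2 for v in values]
    return ranks, counts


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) using the chi-square approximation with 1 degree of freedom."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks, counts = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])
    h = (12 / (n * (n + 1))
         * (rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b))
         - 3 * (n + 1))
    tie_correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if tie_correction == 0:
        raise ValueError("all observations are identical")
    h = max(h / tie_correction, 0.0)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


# Hypothetical system ages (days) for versions where a property holds or is violated.
age_when_holds = [900, 1200, 1500, 2100, 2600]
age_when_violated = [100, 250, 400, 800, 1000]

h_stat, p_value = kruskal_wallis_two_groups(age_when_holds, age_when_violated)
print(h_stat, p_value)
```

A small p-value indicates that the two groups are unlikely to share the same median age, which is the sense in which age is said to have an impact.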
[Figure: two panels showing Age (Days) (y-axis, 0 to 3,000) for the groups N and Y.]
The Kruskal-Wallis test shows that system maturity has an impact, but
it does not answer a related question: does the probability that the
stated properties hold increase with age? That is, as software matures,
does the probability of stability increase from the 80% indicated for the
first property? In order to answer this specific question, we constructed
a logistic model [279] to predict the probability that the change
properties (Equations 6.2.9 and 6.2.10) will hold.
[Figure: two panels showing Probability (y-axis, 0.7 to 0.95).]
The logistic model was used here rather than the OLS regression model
since it is specifically designed to handle a response variable with
dichotomous outcomes (that is, one of two possibilities). In our
model we used Age as the explanatory (independent) variable, while the
response (dependent) variable was whether the change property holds (0 was
used to represent where the property was violated, and 1 was used to
represent where the property holds). The parameters of the model were
estimated using the Maximum Likelihood Estimate (MLE) method and
the probability that the property holds was calculated from the logistic
models for each value of age.
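The modelling step can be sketched as follows. This is an illustrative single-predictor logistic regression fitted by maximum likelihood with Newton-Raphson iterations; it is not the statistical package used in the study. The observations are invented, and Age is expressed in years rather than days purely to keep the toy numbers well scaled.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Maximum likelihood fit of P(y = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the (negated) Hessian, X^T W X.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the parameter update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical observations: system age in years and whether the property held (1) or not (0).
ages_years = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]
holds = [0, 1, 0, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(ages_years, holds)
odds_ratio_per_year = math.exp(b1)
print(b0, b1, odds_ratio_per_year)
print(sigmoid(b0 + b1 * 1.0), sigmoid(b0 + b1 * 5.0))
```

The exponential of the fitted slope is the factor by which the odds change for a one unit increase in the explanatory variable, which is the form in which the results are reported in the text.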
The models that we constructed for both properties showed that Age
was a statistically significant predictor of how strongly the change
properties hold. For every one unit increase in Age (that is, one day), the
odds that the change property holds increase by a factor of 1.0008
for Change Property 6.2.9, and by a factor of 1.006 for Change Property
6.2.10. The predicted probabilities are presented graphically in Figure 6.4
and show that as software systems mature, the proportion of code that
is modified reduces.
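Per-day odds factors are hard to read at face value, so the short sketch below compounds them over a year. The arithmetic follows directly from the definition of an odds ratio; treating 80% as the starting probability is an assumption made only to give the example a baseline.

```python
def compounded_odds_factor(per_day_factor, days):
    """Odds multiply by the per-day factor for each additional day of age."""
    return per_day_factor ** days


def shift_probability(probability, odds_factor):
    """Apply an odds multiplier to a probability and convert back."""
    odds = probability / (1.0 - probability) * odds_factor
    return odds / (1.0 + odds)


factor_first = compounded_odds_factor(1.0008, 365)   # Change Property 6.2.9
factor_second = compounded_odds_factor(1.006, 365)   # Change Property 6.2.10

# Assumed baseline: the property holds with probability 0.80 today.
after_one_year = shift_probability(0.80, factor_first)
print(factor_first, factor_second, after_one_year)
```

Under these assumptions the first factor compounds to roughly a one-third increase in the odds over a year, which moves an 80% baseline up by a few percentage points.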
The key finding is that as software evolves, though classes resist change,
there is a certain proportion that are still modified. Furthermore, our
observations suggest that in general, the number of modified classes
is greater than the number of new or deleted classes. The main
application of this observation is in effort estimation: if x new classes
are anticipated to be added in the next version, our change properties
suggest that at least x classes are likely to be changed. Though our
findings do not imply that the new classes cause the change, the
consistency of this observation in our data set suggests that it can be
applied as a heuristic during the planning stage.
The probabilities that the change properties will hold were derived by
using the entire data set of releases, ignoring individual projects.
However, when we broke the data down by project, we noticed that in 7
systems these properties (Equations 6.2.9 and 6.2.10) are violated
significantly more often than expected. For instance, the Ant build system
violates the second property (Equation 6.2.10) approximately 45% of the
time, well over the 25% violation rate that was identified for the second
property across all releases. Interestingly, apart from Ant, in all other
systems the probability that both properties hold is at least 60%. Though
there was no common and consistent driver across all of these systems, the
higher than normal change pattern can be explained by certain
architectural choices (Ant has a plug-in architecture, hence new classes
can be added with minimal changes to existing code) and decisions made
regarding development method (details are presented in Section 6.3).
For the analysis, we measured the number of times a class has been
modified since birth across the entire data set (approximately 55300
classes across all projects). In our data, on average 45% of the classes
were never modified once created. However, when the data was an-
alyzed by project, the range of classes that were never modified was
between 22% and 63%. For instance, in Saxon only 22% of the classes
were unchanged, while at the other end in Jung and ActiveMQ 63%
of the classes were unchanged after birth. Considering only classes
that have been modified at some point in their life cycle and counting
the number of times that they have been changed, we can observe that
approximately 60% of the classes are modified fewer than 4 times (see
Figure 6.5). The range is, however, fairly broad. In Jung, for example,
90% of modified classes are changed fewer than 3 times.
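The two summaries used in this paragraph, the share of classes never modified after birth and the cumulative distribution of modification counts among the rest, can be sketched as below. The function name and the per-class counts are hypothetical.

```python
from collections import Counter


def modification_profile(modification_counts):
    """Summarise how often classes were modified after birth.

    `modification_counts` maps a class name to the number of releases
    in which that class was detected as modified.
    """
    if not modification_counts:
        raise ValueError("no classes supplied")
    counts = list(modification_counts.values())
    never_modified = sum(1 for c in counts if c == 0) / len(counts)

    modified = [c for c in counts if c > 0]
    histogram = Counter(modified)
    cumulative = {}
    running = 0
    for count in sorted(histogram):
        running += histogram[count]
        cumulative[count] = running / len(modified)
    return never_modified, cumulative


# Hypothetical modification counts for six classes.
history = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2, "F": 5}
never_modified, cumulative = modification_profile(history)
print(never_modified, cumulative)
```

Reading the cumulative mapping at a given count answers questions of the form "what share of modified classes changed at most this many times".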
[Figure 6.5: % Classes (Cumulative) (y-axis, 0% to 100%) against Modification Count (x-axis, 1 to 25) for Axis, Azureus, Castor, Checkstyle, Findbugs, Groovy, Hibernate, Jung, Spring, Struts, Webwork, Wicket, Saxon, Xerces, Xalan, ActiveBPEL and iText.]
[Figure: percentage scale (y-axis, 0.0% to 100.0%) against values 0 to 34 (x-axis) for Axis, Saxon, Xerces2, Xalan, PMD, Wicket, Jung, ActiveBPEL and iText.]
Measuring only the number of metrics that change provides a size change
measure that is bounded (since the total number of metrics that can
change is fixed, in our study at 54 metrics). The bounded nature of
We found that there is a broad range for the amount of change across
the systems, but the overall shape of the distribution is similar. The cu-
mulative distribution curve is fairly linear for 90% of the classes across
multiple systems and only 10% of the modified classes have more than
23 measures modified. When we investigated these observations by
project, 3 systems (Axis, iText and Saxon) had a higher level of volatility
than others, while one system (Jung) showed comparatively low volatil-
ity since most classes underwent very small modifications.
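The bounded size-of-change measure discussed here, the number of metrics whose value differs between two versions of a class, can be sketched as follows. The three metric names are placeholders chosen for the example (the study uses 54 metrics), and the function name is an assumption.

```python
def changed_metric_count(before, after):
    """Count the metrics whose value differs between two versions of a class."""
    if before.keys() != after.keys():
        raise ValueError("both versions must report the same metrics")
    return sum(1 for name in before if before[name] != after[name])


# Hypothetical metric vectors for one class in two consecutive releases.
release_n = {"method_count": 12, "branch_count": 30, "field_count": 4}
release_n_plus_1 = {"method_count": 13, "branch_count": 34, "field_count": 4}

distance = changed_metric_count(release_n, release_n_plus_1)
upper_bound = len(release_n)
print(distance, upper_bound)
```

The result can never exceed the number of metrics collected, which is what makes the measure bounded and comparable across systems.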
The reader may notice (in Figure 6.6) that a few classes do not indi-
cate a change in any of the measures at all. This is possible since the
change detection technique that we apply also takes into consideration
method signatures, field names and additional metadata. Hence, minor
changes in names are detected as a change, but the underlying mea-
sures need not change. However, our analysis has revealed that this is
only rarely the case and over all versions analyzed, at most 3% of mod-
ified classes have a distance of zero, indicating that our metric based
approach is effective at detecting the majority of changes. Further, the
zero-distance changes were caused by minor modifications to interfaces,
$$p_v = \frac{|N_v^p|}{|N_v|} \qquad (6.2.11)$$
$$c_v^p = \frac{|C_v^p|}{|C_v|} \qquad (6.2.12)$$
[Figure: % Types (y-axis, 0% to 50%) against RSN (x-axis) for the series All, Modified and New.]
$$c_v^p > p_v \qquad (6.2.13)$$
Interestingly, even if we include all of the data, that is, we consider
releases where fewer than 30 classes have been modified, the above
property holds approximately 75% of the time.
Although this property holds across all releases, are individual
systems different? Is this property violated more often in some systems?
In order to answer this question, we investigated each system. We
observed that in 3 systems (Proguard, Tapestry and Acegi), this property
holds only 50% of the time. These systems appear to be outliers (all
other systems in our data set had at least an 80% probability).
Interestingly, in 17 systems, the property holds 100% of the time. That is,
in these systems, popular classes are always over-represented among the
modified classes.
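The check behind Equation 6.2.13 compares two proportions for a single version: the share of popular classes among all classes and the share of popular classes among the modified ones. The sketch below illustrates this. The In-Degree threshold used to label a class as popular is a parameter of the example and an assumption, as are the class names and counts.

```python
def popularity_shares(in_degree, changed, popular_threshold):
    """Return (p_v, c_v^p) for one version.

    `in_degree` maps every class in the version to its In-Degree Count;
    `changed` is the set of classes detected as modified.
    """
    if not in_degree or not changed:
        raise ValueError("need at least one class and one modified class")
    popular = {name for name, degree in in_degree.items()
               if degree >= popular_threshold}
    p_v = len(popular) / len(in_degree)
    c_p_v = len(popular & changed) / len(changed)
    return p_v, c_p_v


# Hypothetical version with eight classes, three of them modified.
in_degree = {"A": 9, "B": 6, "C": 1, "D": 0, "E": 2, "F": 0, "G": 1, "H": 7}
changed = {"A", "B", "C"}

p_v, c_p_v = popularity_shares(in_degree, changed, popular_threshold=5)
property_holds = c_p_v > p_v
print(p_v, c_p_v, property_holds)
```

Note that the property can hold even when popular classes are a minority of the modified set; it only requires that they appear there more often than their overall share would suggest.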
In Figure 6.7 we see a chart that illustrates our observation in the
Spring Framework and shows the proportion of popular classes over the
entire evolutionary history. We can see that the proportion of modified
popular classes is consistently greater than the proportion of popular
classes. Developers ideally would approach a class with high In-Degree
Count with care due to the associated risk of impacting other classes
that rely on the services provided. However, our observations show that
typically, classes with a higher In-Degree Count tend to be modified more
than those with lower In-Degree Count.
In the previous chapter we showed that In-Degree Count and size are not
correlated (Spearman’s rank correlation coefficient across all systems
was between −0.2 and +0.2). This weak correlation implies that
popular classes are being modified, but not because they have more
code.
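For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation of the ranks of the two variables. The sketch below computes it in plain Python for a handful of invented classes; the In-Degree and size values are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical classes: In-Degree Count and a size measure.
in_degree = [0, 1, 2, 3, 4, 5]
size = [120, 40, 300, 80, 60, 150]
rho = spearman(in_degree, size)
print(rho)
```

A coefficient this close to zero means that ranking classes by popularity says very little about how they rank by size.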
$$a_v^p = \frac{|A_v^p|}{|A_v|} \qquad (6.2.14)$$
In Equation 6.2.14, $A_v^p$ is the set of new classes that are popular, and
$A_v$ is the set of new classes in version $v$.
$$p_v > a_v^p \qquad (6.2.15)$$
And the following property (Equation 6.2.16) holds for 85% of the
releases in our data set:
$$c_v^p > p_v > a_v^p \qquad (6.2.16)$$
We observed that the In-Degree profile of the new classes is very differ-
ent from the profile of existing code. New classes tend to start with a
substantially lower In-Degree Count, and as they are modified over time
move towards the overall trend. If we compare the proportion of popular
new classes with that of all classes, we can see (Figure 6.7) that this
proportion is consistently lower than the norm. New classes, therefore,
tend to start out with relatively lower popularity and some of these gain
dependents over time. The observation that in general new classes are
less popular than existing classes is also supported by our finding from
the Gini coefficient analysis in the previous chapter. We stated that the
In-Degree Count Gini coefficient in general increases as systems mature,
which also shows that new classes do not have the same distribution
profile in terms of popularity as the existing classes; if they did, the
Gini coefficient would not change.
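The last step of this argument can be checked numerically. The sketch below uses a standard rank-based formula for the Gini coefficient (the exact formulation used in the previous chapter is not reproduced here, so this is an assumption) and invented In-Degree values. Adding classes that copy the existing profile leaves the coefficient unchanged, whereas adding low-popularity classes raises it.

```python
def gini(values):
    """Gini coefficient of non-negative values, using the sorted-rank formula."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical In-Degree Counts of the existing classes.
existing = [1, 2, 2, 3, 8]

same_profile = existing + existing       # new classes mirror the old profile
low_popularity = existing + [0, 0, 1]    # new classes start with few dependents

print(gini(existing), gini(same_profile), gini(low_popularity))
```

This toy example leaves out the second effect described in the text, namely that the new classes also add to the In-Degree Count of the existing classes they use.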
The aim of our study into the nature of change in software was to inform
developers where they can expect change, the likelihood of a change,
and the magnitude of these modifications allowing them to take proac-
tive steps.
6.3 Discussion
In the previous section, we summarized our observations. In this sec-
tion, we discuss how our findings relate to the maintenance activities
within the development life-cycle and offer an interpretation for the re-
sults.
Our analysis of the relationship between new classes and changed classes
as highlighted by Equation 6.2.10 shows that the set of changed classes
is in general larger than the set of new classes. This finding can be
used during the planning phase for a new version, specifically in
improving the effort estimation. For example, if the initial analysis and
design for a new version suggests that there is a need for 25 new classes,
The first example is the Ant build system, a project that exhibited the
greatest number of violations (in our data set) of the two change
properties (Equations 6.2.9 and 6.2.10). Our analysis shows that most of
these violations are primarily driven by the plug-in architecture style
used in this software system. Ant is build automation software
similar to make [258]. This software was conceptualised and built around
a plug-in based architecture where new build automation tasks can be
developed and added with minimal changes to the existing code. Most
of the enhancements to Ant came as new plug-ins that provided new
build automation tasks. The core of the system is only modified if a large
number of plug-ins are replicating similar functionality. This plug-in
architecture allows developers to create and add new classes without
requiring many changes in the existing code base. This modular archi-
tectural style and the domain of the application in effect meant that in
most new releases, there was a greater proportion of new classes com-
pared to modified classes (violating the change properties at a greater
rate than expected).
The second example is Struts, another project that also had a large
number of violations for the change properties (nearly 50% of the re-
leases). An investigation into the release notes and source code of Struts
shows that changes can be attributed to the developers rewriting large
parts of the codebase twice. The first complete re-write used a small
portion of the code from the previous versions. The second re-write
abandoned much of the code base and started on top of another
open-source project. This level of fluctuation and continuous change
effectively meant that developers did not have sufficient time to increase
the maturity of the project, as it was consistently undergoing substantive
changes.
Although our approach does not reveal the actual amount of code that
has changed, it provides us with a broad indicator. One of the systems
that has resisted changes was Wicket, a web application development
framework. Interestingly, the rate of modification in Wicket was close to
many other systems. However, the observed distribution of these mod-
ifications suggests that most of the changes were minor fixes. At the
This observed popularity of modified classes does not square well, how-
ever, with Martin’s Stable Dependencies Principle [189] which states
that: “The dependencies between components in a design should be in
the direction of the stability of the components. A component should
only depend upon components that are more stable than it is.” On the
surface, the principle appears sound: to improve the overall stability of
our system, we should make new things depend on stable and mature
components. Unfortunately, our new interpretation of Lehman’s Law of
Continuing Change suggests that the very fact of depending on a stable
component will make it less stable.
Our analysis indicates that there is a very strong probability that pop-
ular classes are less stable. However, in our data we encountered three
outliers — JabRef, Proguard and Acegi where popular classes were just
as likely to be modified as non-popular classes. Despite an exploratory
analysis into each of these systems, we were not able to identify a com-
mon driver that can explain our observation. This indicates that our
finding of popular classes being less stable has a strong probability of
occurring, but we cannot yet generalise it into a universal principle with
well defined constraints.
2 Oxford American Dictionary, 2005
We have seen that the distributions of size and complexity remain
stable, hence it cannot be that growth mainly occurs in existing classes,
but rather in the creation of new classes. But where does this growth
occur: do developers build on top of the existing code base, or do they
tend to build features as components that are added into a software
system?
Since new classes have lower than average In-Degree Count, it seems
clear that growth is on top of existing classes. It is highly unusual for a
new class to have high In-Degree Count, so there must be little growth
below classes of the existing system. This finding is also supported by
the Gini coefficient analysis that shows that, in general, the Gini value
of In-Degree Count tends to increase as software matures, suggesting
that as new classes are added they tend to make use of existing classes
hence increasing their popularity. Further, since open-source projects
are known to be developed in an incremental and iterative fashion, our
observations are consistent with the notion that these systems are built
bottom-up, rather than top-down.
We have observed that classes that are modified tend to have a higher
Branch Count than the ones that remain unchanged. Why is this the
case?
Research into defect-prone classes [31, 213, 318] suggests that com-
plex abstractions will tend to have more defects. Our observation that
complex classes attract a higher proportion of modification is consis-
tent with the fact that complex classes tend to have more defects and,
therefore, are likely to undergo more modifications as the defects are
corrected. Our observations are incomplete, however, since we have
not analysed the defect data that will allow us to state that changes to
complex classes are principally concerned with correcting defects. Fur-
thermore, it is reported that corrective changes account for only 21% of
changes [26], so we cannot conclude that defects are the main reason
for change.
A class that changes is likely to have higher In-Degree Count and higher
Branch Count. Simply put, complex classes have a higher probability of
change. Furthermore, this probability increases with the complexity of
the class.
In our data set, three systems stand out as clear outliers: Saxon XML
processor, Axis web services library and iText PDF generation library.
All of these systems exhibited above average change both in terms of
the frequency as well as the magnitude. But, why should this be the
case? Is there a common shared trait?
It turns out that all three systems have one common attribute — they
were all built to satisfy well defined industry standards. Saxon and
Axis implement W3C (World Wide Web Consortium) standards for XML
and Web services, respectively. The iText library attempts to satisfy
the Adobe PDF specification. Saxon and iText are currently supported
by commercial (for-profit) organisations, while the Apache consortium
handles the Axis project (Apache is a non-profit organisation that man-
ages a large number of open source projects, including the popular
Apache Web server).
The architecture guide provided with Axis [11] states that most modules
are not functionally separated and attempt to provide a diverse range
of functionality. This tight coupling is noted by the developers as a
known concern that needs to be addressed in later releases [11]. The
developers of Axis, interestingly, do suggest a potential solution that can
address the coupling issue in the architecture document [11], but these
recommendations have not been implemented (at time of analysis). The
decisions that the iText team made with respect to compiling are similar
to choices made for Axis and Saxon as inferred from the change logs and
release notes [128].
Another aspect common in iText and Saxon was the choice made by the
developers to build their own collections data structures (such as a List,
Map and Queue) that underwent many changes when they were ini-
tially introduced into the code base. Comments provided in the source
code suggest that the motivation for creating their own collections data
structure was to improve performance. It is possible that at the time
of development the developers did not have access to reliable and ma-
ture high-performance collections classes such as those that exist freely
now like GNU Trove [280] and the Google collections framework [107].
Though reliable high-performance collections libraries now exist, the
developers have left their original data structures and algorithms
intact.
Similar to our work, Capiluppi et al. have analyzed the release history
of a number of Open Source Systems [35, 39–41] with an intention of
understanding if developers will undertake anti-regressive work, i.e.,
make changes to reduce the complexity of a certain module and hence
improve maintainability. Their studies focused mainly on the macro level,
in particular on relative changes in the code size and on complexity at a
module level [41] as well as the influence of the number of developers on
the release frequency [39]. Their conclusions were that in Open Source
Projects, relatively little effort is invested in anti-regressive work. In our
study we found that developers minimize the frequency and magnitude
of change, and the abstractions that do change tend to be popular or
complex. Interestingly, if developers spend little effort on anti-regressive
work, then it implies that the typical modification is to create new fea-
tures or correct defects.
A recent study into change by Gîrba et al. showed that classes that have
changed in the past are also those most likely to change in the future.
They also showed that these classes were the minority [92]. In related
and earlier work, Gîrba et al. have tested the hypothesis that classes
that have changed in the past are likely to change in the future [94].
The reliability of this measure of “yesterday’s weather” seems to vary
according to the “climate” of a software project. Gîrba et al. have also
studied the relationship between change and developers [95].
Additionally, Gîrba et al. in their studies tag a class as having changed if
methods were added or removed. They also identify change at the method
level by observing a change in the number of statements and the
cyclomatic complexity. In our study, we used a larger set of properties to
detect change and hence are able to identify more fine-grained changes.
Rather than modeling and understanding the nature of change, their
goal was to understand which developers are most knowledgeable about
different parts of an evolving system.
Another arc in the study of change has been the area of understand-
ing co-change, where the assumption is made that certain groups of
classes or modules change together [85] because related features are
grouped together. This assumption was supported by research under-
taken by Hassan and Holt who analyzed many Open Source projects
and concluded that historical co-change is a better predictor of change
propagation [113]. This observation is also supported by Zimmermann et
al. [319,320], which led to the development of tools [315] that can guide
developers to consider a group of classes when they modify one class.
Our study, however, aims at understanding the statistical properties of
post-release change and the inherent properties of a class that might lead
it to change. The properties that we identified can improve project
plans and help during iteration retrospectives, rather than being able
to directly guide developers about potential co-changing classes.
Furthermore, Gall et al., Hassan et al., and Zimmermann et al. relied on the
revision history in their studies, and identify change during the
development phase. We, however, investigated the release history and hence
are able to provide a post-release change perspective.
More recent work by Aversano et al. [10] and Di Penta et al. [68] based
on empirical investigation of evolution in 3 Java software systems (both
studies used the same data set) arrived at similar conclusions suggest-
ing that classes that participate in design patterns tend to be change-
prone. The findings from the study by Bieman et al., Aversano et al.
and Di Penta et al. complement our own observations. Our study,
however, is more specific, in that we show that popular and complex classes
tend to be change-prone, rather than being restricted purely to classes that
participate in design patterns or inheritance hierarchies. Additionally,
in contrast to the studies by Aversano et al., Bieman et al. and Di Penta
et al. that focused on a few software systems, we have undertaken a
larger-scale longitudinal study.
In a recent study, Capiluppi et al. [36, 37] investigated the value of
stability in predicting reuse of a software module (identified as a set of
files within a source code folder) by studying evolution in 4 long-lived
open source software systems. Capiluppi et al. argue that highly
stable modules are good candidates to be turned into independent reusable
modules. Capiluppi et al. measured stability by using Martin’s
Instability Metric [189], which uses the coupling of a module (similar to the
In-Degree and Out-Degree Count in our work) as the basis for
calculating the stability. However, as discussed in Section 6.3.3, Martin’s
instability metric does not measure actual stability of an abstraction
over time and hence cannot be considered to be a direct measure of
actual change. We also did not find any rigorous longitudinal studies
that have shown the relationship between Martin’s instability metric
and actual changes that take place within a module. In contrast, our
work shows that new dependants place additional pressure on a
class, increasing the probability of change to satisfy new clients.
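For reference, Martin's instability metric is commonly defined as I = Ce / (Ca + Ce), where Ca is the afferent (incoming) coupling and Ce the efferent (outgoing) coupling of a module. The sketch below computes it from a single snapshot, which illustrates the point made above: the metric uses no release history at all. The coupling counts are invented, and the handling of an isolated module is a choice made for the example.

```python
def instability(afferent, efferent):
    """Martin's instability I = Ce / (Ca + Ce), computed from one snapshot.

    `afferent` counts incoming dependencies (comparable to In-Degree Count)
    and `efferent` counts outgoing dependencies (comparable to Out-Degree
    Count). Returns a value in [0, 1]; 0 is labelled maximally stable.
    """
    if afferent < 0 or efferent < 0:
        raise ValueError("coupling counts must be non-negative")
    if afferent + efferent == 0:
        raise ValueError("metric is undefined for an isolated module")
    return efferent / (afferent + efferent)


# Hypothetical module that many others depend on.
popular_module = instability(afferent=8, efferent=2)
# Hypothetical module that mostly depends on others.
leaf_module = instability(afferent=1, efferent=9)
print(popular_module, leaf_module)
```

By this definition a heavily used module is labelled stable, whereas the observations in this chapter indicate that such popular classes are the ones more likely to be modified.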
A key distinction between our own study and that by Capiluppi et al. [36,37]
is the level of abstraction used in the study. We focus on changes at the
class level, while Capiluppi et al. study modules. Although the level of
abstraction used in our study is different, a module can be treated as
a set of related classes. Therefore, if a set of classes within a module
change, we can consider the module to have changed as well, and
hence our findings are likely to be similar at the module level.
6.5 Limitations
In order to place our study in context, we highlight some of the known
limitations of our approach in addition to our findings.
The change detection method used in this work may miss a small set
of classes if the change is not measurable in the metrics under con-
sideration. The accuracy can be improved by adding further metrics
and additional aspects like a comparison of the call graph. The value
of this additional complexity remains as a topic for consideration in our
future work. Furthermore, our distance measure compares the initial
version of a class to the final version. This method will miss edits where
a class is modified and returned to its original shape, as seen by
our change detection technique. This limitation can be addressed by
computing a distance measure incrementally. As a consequence, the
analysis approach will need to be adjusted to take into consideration
the way the metric information is being collected.
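The difference between the two ways of computing the distance can be illustrated with a class that is modified and later reverted. The sketch below is a hypothetical illustration: it counts changed metrics between versions, and the metric names and values are invented.

```python
def metric_distance(before, after):
    """Number of metrics whose value differs between two versions of a class."""
    return sum(1 for name in before if before[name] != after.get(name))


def end_to_end_distance(history):
    """Compare only the initial and final versions of the class."""
    return metric_distance(history[0], history[-1])


def incremental_distance(history):
    """Accumulate the distance across each pair of consecutive versions."""
    return sum(metric_distance(a, b) for a, b in zip(history, history[1:]))


# Hypothetical class: a method is added in the second release and removed in the third.
history = [
    {"method_count": 5, "branch_count": 12},
    {"method_count": 6, "branch_count": 12},
    {"method_count": 5, "branch_count": 12},
]

print(end_to_end_distance(history), incremental_distance(history))
```

The end-to-end comparison reports no change for this class, while the incremental version records both edits, at the cost of having to store and compare every intermediate release.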
We have also not investigated in depth why changing classes have higher
than normal In-Degree Count. We speculate that the introduction of
new clients creates the need for adaptations. Other possibilities are
that (i) new clients introduce new requirements, but that suggests new
growth in existing classes, which we did not consistently find, or (ii) new
clients exercise existing classes in new ways, thus uncovering previously
unknown defects. Further work is needed to discover which, if any, of
these hypotheses is valid.
6.6 Summary
Change in software systems is unavoidable as they are adapted to meet
the changing needs of the users. Our study shows that when we look
at the latest version of a given system, around a third (or more) of the
classes are unchanged in their lifetime. Of the modified classes, very
few are changed multiple times, and the magnitude of change is small,
suggesting an inherent resistance to change. These findings show that
maintenance effort, which is considered to be a substantial proportion
of the development effort (post initial versions), is spent on adding new
classes. In the absence of a substantial architectural shift or a rewrite
of the system, much of the code base resists change. Furthermore,
efforts to base new code on stable classes will inevitably make those
classes less stable as they need to be modified to meet the needs of the
new clients.
In the next chapter, we discuss the implications arising from our obser-
vations. Specifically, we consider how our findings relate to the Laws
of Software Evolution, and how our work can help improve software
development practices.
Chapter 7
Implications
ically, we find support for the first law Continuing Change, the third law
Self Regulation, the fifth law Conservation of Familiarity, and the sixth law
Continuing Growth. However, our analysis was not able to provide sufficient
evidence to show support for the other laws.
Change
Our observations show that in all cases the metric distributions
continuously changed, providing support for the first law of software
evolution, Continuing Change. The only consistent facet in our data was that
in most cases, the change as measured by the difference in Gini
Coefficient values was small. This small level of change indicates that a
certain level of stability is common and potentially desirable for
evolution to proceed, since development teams are most productive with
stable abstractions or those that are changing at a rate that allows
developers to learn them. The rate of change must be in synchronization
with the development team's ability to learn, adapt and effectively
utilize these abstractions. If the rate of change is too slow or zero, then it
would imply that the system may slowly deteriorate as it no longer meets
changing/maturing user needs. Conversely, if many abstractions
(classes) change too quickly, the effort spent on learning the changes
will be substantially greater than the effort spent on adding new
features.
Complexity
ity, that is, because there is a larger number of abstractions that
developers have to deal with over time. However, when we consider the
distribution of size and complexity metrics, we notice that the
distributions of all measures fall within a tight boundary (see
Chapter 5). In any given software system, though there is variation, the
overall consistency in the shape suggests that developers either
explicitly or implicitly take corrective action in order to maintain the
overall distribution of size as well as complexity. The only material
increase in complexity at the system level is a consequence not of the
complexity of a certain set of classes, but of the size of the system as
a whole and the increase in the absolute number of complex classes.
The third law, Self Regulation, states that evolution is regulated by
feedback [166]. We did not collect direct quantitative data related to
feedback. However, the bounded nature of Gini Coefficients across
multiple software systems provides indirect support for this law.
Feedback can be caused by different drivers. For instance, developers
will get feedback from a set of classes that are too large or complex,
especially if these classes are defect prone. Feedback can also be driven
by team dynamics as well as the cultural bias (strong team habits) that
influences how software is constructed. Hence, it is likely that
multiple sources of feedback are at work in order to regulate the
What is the set of causal factors behind this observed phenomenon? We do
not yet have a direct answer, but the consistency of the observations
gives rise to the speculation that developer decisions are driven by
some form of cognitive preference that makes developers choose solutions
with certain typical profiles. It appears that there exists an
acceptable range of complexity for programmers, and that this range is
domain-independent, since the Gini coefficients across all analyzed
systems fall within a tightly bounded interval. The range also appears
to be independent of the Java language semantics and of the development
environment. The Java programming language does not in any way limit
developers' choice between design alternatives, nor does it force them
to construct software with the observed distribution profiles, and hence the
Given the emergent nature of the overall statistical properties and the
consistency across multiple software systems, there cannot be a finite
and deterministic set of causes that drives developers to organise
solutions within a constrained design space. Rather, it is likely an
interaction between many choices that gives rise to this typical
organisational pattern.
Familiarity
Growth
The sixth law (Continuing Growth) states that “evolving software must
be continually enhanced to maintain user satisfaction”. Our observa-
tions (cf. Chapter 6) show that software systems are built incrementally
bottom-up and that there is a small proportion of code that constitutes
new classes in every release. Though, new classes need not always
ciently broad general principles and rules that can be applied across a
range of different software systems. Identifying more fine-grained
recommendations is an aspect that we intend to tackle in future work.
The key argument in our study is that when dealing with skewed
distributions, it is hard to derive sensible conclusions from
descriptive statistical summary measures such as the mean and median.
Hence, we recommend that software metric tool developers present the
distribution histogram (relative and cumulative), the Lorenz curve and
the Gini Coefficient values. When this information is shown along with
the typical range for the Gini Coefficient values, tool users can reason
about the nature of their software more effectively. Furthermore, rather
than removing the widely used summary measures (such as mean and
standard deviation), the tools should display a warning about the
limitations of these measures. This approach will serve to inform and
educate the users, allowing them to interpret the data better.
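To make the recommendation concrete, the following is a minimal sketch of
how a metric tool might compute the Gini Coefficient and the points of
the Lorenz curve for one metric in one release. It uses the standard
rank-weighted formula over the sorted values; the class and method names
and the sample values are hypothetical, and the code is not the
implementation used in our study.

```java
import java.util.Arrays;

/** Sketch: Gini coefficient and Lorenz curve for the values of one metric. */
final class MetricDistributionSketch {

    /**
     * Gini coefficient of non-negative values, using
     * G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n over sorted values,
     * with ranks i starting at 1. Returns 0 for empty or all-zero input.
     */
    static double gini(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double total = 0.0;
        double rankWeighted = 0.0;
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            rankWeighted += (i + 1) * sorted[i];
        }
        if (n == 0 || total == 0.0) {
            return 0.0;
        }
        return (2.0 * rankWeighted) / (n * total) - (n + 1.0) / n;
    }

    /**
     * Lorenz curve points: element k is the share of the metric total held
     * by the k classes with the lowest values (element 0 is always 0).
     */
    static double[] lorenz(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double total = 0.0;
        for (double v : sorted) {
            total += v;
        }
        double[] curve = new double[sorted.length + 1];
        double running = 0.0;
        for (int i = 0; i < sorted.length; i++) {
            running += sorted[i];
            curve[i + 1] = (total == 0.0) ? 0.0 : running / total;
        }
        return curve;
    }

    public static void main(String[] args) {
        // Made-up Number of Methods values for ten classes (skewed on purpose).
        double[] nom = {1, 2, 2, 3, 3, 4, 5, 8, 13, 46};
        System.out.println("Mean  = " + Arrays.stream(nom).average().orElse(0.0));
        System.out.println("Gini  = " + gini(nom));
        System.out.println("Lorenz = " + Arrays.toString(lorenz(nom)));
    }
}
```

Reporting the Gini value next to the mean gives the user a direct
indication of how much the mean is influenced by a few large classes.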
Agile methodologies [23, 49, 189, 301] recommend that a software system
be designed to satisfy the requirements at hand rather than potential
future needs. Interestingly, our observations suggest that developers,
in general, inherently follow this approach (at the class level). That
is, a class is given the set of responsibilities that it needs to
satisfy its current dependents. We also found that developers do not
create a class that is very popular from the outset; rather, classes
slowly gain additional dependents over time (Equation 6.2.16).
7.3 Summary
Change in software systems is unavoidable as they are maintained to
meet the changing needs of their users. Based on the observations in our
study of software evolution, we found consistent support for the
applicability and validity of the following laws of software evolution:
the first law (Continuing Change), the third law (Self Regulation), the
fifth law (Conservation of Familiarity), and the sixth law (Continuing
Growth). However, our analysis did not provide sufficient evidence to
support the other laws.
A number of implications arise from our findings. In particular, we
discussed how managers can monitor changes and trigger a deeper
investigation to explain abnormal changes, as well as use the properties
and thresholds identified to reflect on the development process. We also
recommended that managers use the change properties outlined in
Chapter 6 during planning and estimation, and we presented the
implications for software design. Specifically, we argue that reusable
components should be designed to be flexible, since our findings suggest
that these components are change-prone.
Chapter 8
Conclusions
8.1 Contributions
The key contributions in this research effort are:
Our findings are inconsistent with some of the key literature on
object-oriented design, which suggests that these systems should be
designed to ensure that responsibilities are well balanced among the classes
In our work we show that Java software developers tend to create and
evolve software with these skewed distributions. However, we do not yet
know the key drivers for this behaviour, or whether the structure arises
from the limitations of human memory (which would make it a broader
property). This is an area of future work arising from our study, in
which we seek to establish why these skewed distributions arise.
Additionally, we seek to conduct experiments to verify whether it is
possible to construct and evolve a software system with more equitable
metric distributions.
Appendix A
The meta data captured for each software system (see Table 3.3) in our
data set is summarised in Table A.1. The release history as well as the
meta data used in our study is released as the “Helix Software Evolution
Data Set”, and is available online at:
http://www.ict.swin.edu.au/research/projects/helix/.
Item          Description
Name          Full name of the software system.
Short name    Short name (one word) of the software system. This name is
              used in the reports.
Type          One of the following values: Application, Library, or
              Framework.
Is GUI        A flag that is turned on if the system provides a graphical
              user interface for all, or some part, of the functionality.
License       The open source license that this software system is
              distributed under.
Commercial    A flag that is turned on if the project is sponsored by a
              commercial organisation.
Project URL   The URL of the project’s primary website.
Issue Log     A URI. It can be either a pointer to a local file, or a URL
              to the issue log.
Appendix B
The raw metric data files for all of the metrics defined in Chapter 4 are
located in the directory data/rawmetrics on the DVD. The metrics for
each software system (cf. Table 3.3) are in a separate data file.
A sample of the data that is available in the raw metric file for each
system is shown below. We list the Number of Methods (NOM) metric for 6
releases of 2 classes in the Apache Ant system. As can be seen, the
second class did not exist in the first 4 releases and hence has empty
values for the metric in those releases.
org/apache/ant/AntClassLoader,NOM,8,13,17,43,43,46
org/apache/ant/AntTypeDefinition,NOM,,,,,20,20
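A reader who wishes to process these files could parse each line along
the following lines. This is a sketch only: it assumes, based on the
description above, that each line holds a class name, a metric name and
then one comma-separated value per release, with an empty field where
the class did not exist. The class and field names are hypothetical.

```java
import java.util.OptionalInt;

/** Sketch: one parsed line of a raw metric file (class, metric, value per release). */
final class RawMetricLineSketch {
    final String className;
    final String metric;
    final OptionalInt[] valuesPerRelease; // empty where the class did not exist

    private RawMetricLineSketch(String className, String metric, OptionalInt[] values) {
        this.className = className;
        this.metric = metric;
        this.valuesPerRelease = values;
    }

    static RawMetricLineSketch parse(String line) {
        // A limit of -1 keeps trailing empty fields, so a class that is absent
        // from the latest releases still yields one slot per release.
        String[] fields = line.split(",", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected class name and metric name: " + line);
        }
        OptionalInt[] values = new OptionalInt[fields.length - 2];
        for (int i = 2; i < fields.length; i++) {
            String field = fields[i].trim();
            values[i - 2] = field.isEmpty()
                    ? OptionalInt.empty()
                    : OptionalInt.of(Integer.parseInt(field));
        }
        return new RawMetricLineSketch(fields[0], fields[1], values);
    }

    public static void main(String[] args) {
        RawMetricLineSketch row =
                parse("org/apache/ant/AntClassLoader,NOM,8,13,17,43,43,46");
        System.out.println(row.className + " " + row.metric
                + " has values for " + row.valuesPerRelease.length + " releases");
    }
}
```

Using an explicit empty value, rather than zero, keeps "class not
present" distinct from "class present with no methods".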
Appendix C
Table C.1 shows the list of metrics where the Java Virtual Machine
opcodes are used to compute the value.
Appendix D
Chapter D. Metric Extraction Illustration
/**
 * Total counts for a subset of metrics are included in the header.
 * LIC (Load Count) = 15, SIC (Store Count) = 2, NOM (Number of Methods) = 2
 * NOB (Number of Branches) = 6, TCC (Type Construction Count) = 1
 * MCC (Method Call Count) = 3, THC (Throw Count) = 1, EXC (Exception Count) = 1
 * ODC (Out Degree Count) = 4 (String, Exception, PrintStream, System.out)
 * NOF (Number of Fields) = 3, IOC (Increment Operation Count) = 3
 * LVC (Local Variable Count) = 3
 */
public class MetricCountExample
{
    private boolean bField; // un-initialised field

    // 2 LOADS: this, constant integer
    private int index = 5; // 1 STORE

    // 2 LOADS: this, constant string
    private String str = "Hello World!"; // 1 STORE

    // Default Constructor
    // 1 LOAD (this), 1 METHOD CALL (constructor of super class java.lang.Object)

    public void methodX(String msg) throws Exception
    {
        int a = 5; // 1 LOAD constant of 5, 1 STORE into local variable a
        methodY(); // 1 INTERNAL method call, 1 LOAD: this object

        // BRANCH count = 2, one for the if and another for &&
        // 4 LOADS: a, this, index, and bField
        if ((a > index) && (bField == false)) return;

        // BRANCH count = 2, default is not counted, 1 per each case
        switch (index) // 2 LOADS: this, index
        {
            case 0: a++; // INCREMENT operation
            case 1: a--; // INCREMENT operation with -1
            default: a--; // INCREMENT operation with -1
        }

        // BRANCH COUNT = 1, TYPE Construction Count = 1 (new Exception)
        // THROW Count = 1, 2 LOADS: args array, constant value 3
        if (msg == null) throw new Exception("Something odd");
    }

    public void methodY()
    {
        // 2 LOAD constants, 1 STORE, 1 INCREMENT OPERATION, 1 BRANCH (for loop condition)
        for (int j = 0; j < 3; j++)
        {
            // 3 LOADS: this, System.out and str
            System.out.println(str); // 1 EXTERNAL method call
        } // 1 GOTO count (to continue loop)
    }
}
Appendix E
The data referred to in the study of the distribution of the growth (Chap-
ter 5) is described in Table E.1, and located in the directory data/growth
on the DVD. The data is also available online at:
http://www.ict.swin.edu.au/personal/rvasa/thesis/data
The raw metric data is provided as a comma separated values (CSV) file,
and the first line of the CSV file contains the header. A detailed output
of the statistical analysis undertaken is provided as log files generated
directly from Stata (statistical analysis software) [259].
Appendix F
Chapter F. Change Dynamics Data Files
References
[160] K. Lakhani and E. Von Hippel. How Open Source Software Works:
Free User-To-User Assistance. Research Policy, 32(6):923–943,
2003.
[171] M. Lehman and J. Ramil. Rules and Tools for Software Evolu-
tion Planning and Management. Annals of Software Engineering,
11(1):15–44, 2001.
[207] J. Nash. Directions for Open Source Software Over the Next
Decade. Futures, 42(4):427–433, 2009.
[304] J. Wu. Open Source Software Evolution and Its Dynamics. PhD
thesis, University of Waterloo, 2006.
[305] J. Wu, R. Holt, and A. Hassan. Empirical Evidence for SOC Dy-
namics in Software Evolution. In IEEE International Conference
on Software Maintenance (ICSM 2007), pages 244–254, 2007.
[311] K. Xu. How Has the Literature on Gini’s Index Evolved in the
Past 80 Years? Department of Economics, Dalhousie University,
Halifax, Nova Scotia, Dec. 2004.