Metrics and Laws of Software Evolution - The Nineties View: Fig. 1 OS/360 Growth Trend by RSN
M M Lehman, J F Ramil, P D Wernick
Department of Computing
Imperial College of Science, Technology and Medicine
London SW7 2BZ
tel: +44 (0)171 594 8214
fax: +44 (0)171 594 8215
e-mail: {mml,jcf1,pdw1}@doc.ic.ac.uk
URL: http://www-dse.doc.ic.ac.uk/~mml/feast1/

D E Perry
Bell Laboratories, Murray Hill, NJ 07974
+1 908 582 2529
dep@research.bell-labs.com

W M Turski
Institute of Informatics
Warsaw University
Warsaw 02-097
+48 22 658 3522
wmt@mimuw.edu.pl
Fig. 3 FW growth trend by rsn (Logica FW; x-axis: RSN)

4.3 Incremental growth

Figures 4 and 5 show the incremental growth per release of OS/360 and FW respectively over the releases rsn1 to rsn21 for each system. The horizontal line indicates the average growth per release over this range. For FW the plot includes all the data in table 2. For OS/360 the final five releases for which data is available are omitted from the plot since they reflect the transition (in growth trend terms) from OS/370 to VS1 and VS2.

Fig. 5 Incremental growth per release of FW over rsn2 to rsn21 (Logica FW; y-axis: incremental growth in modules; x-axis: RSN; horizontal line: average increment)

As pointed out previously [leh74,78], the cyclic effect reflected by the peaks and troughs in the incremental growth plots may indicate the presence of feedback driven and controlled growth. Thus, influences tending to increase system functionality, that is growth towards the peaks, may have their source in positive feedback. The declines may reflect size stabilisation and other negative feedback effects. An example of such feedback is the evolutionary pressure that arises when clients and users express a need for enhancements to existing capability or system extension. But as implementation of such changes proceeds, the size and complexity of the system increase, leading to declining comprehensibility, increasing error rates, increasing resistance to change or the impact of budgetary constraints. These lead to a decrease of the resources available for growth, for example, as the resource demand for fault fixing and complexity reduction increases [leh85].

If sufficiently mature [hum95], the process will be directed in its evolution and growth patterns by data reflecting such needs. That is, the data or its derivatives will be used to adjust process objectives (immediate and/or long term) and process parameters. It will be used to drive, constrain, and in general, manage the process. Positive feedback drives growth while negative influences force a period of consolidation (correction and restructuring). An example of the consequences of excessive positive feedback may be provided by the final 7 releases, rsn20 to rsn26, of OS/360 (figure 1). A hypothesis that explains the system's apparently unstable behaviour over these releases is that it was a consequence of excessive growth, in response to market demand, in going from rsn19 to rsn20.

This brief analysis suggests that the FW data supports, in part at least, the third and fifth laws of software evolution as originally inferred from the OS/360 study. Analysis of the long term growth trend of FW in the next subsection suggests, however, that the wording of laws III and V as in table 1 must be modified.

4.4 The Inverse Square model (IS)

This section presents two models of FW growth. The first of these, illustrated in figure 6, is obtained from the data set of table 2 using a least squares linear (LSL) fit. The models focus on the general trend and largely ignore the ripple. Detailed analysis of the latter is beyond the scope of this paper.

Fig. 6 Least squares linear fit to FW over rsn1 to rsn21 (Logica FW and least squares linear fit (LSL); y-axis: size in modules; x-axis: RSN)

After investigating other possibilities Turski developed an alternative, inverse square, model (IS) represented by the nonlinear discrete-time dynamical recursion (1) [tur96]. In this model $s_i$ is the actual size of the system at rsn $i$, $\hat{s}_i$ is its fitted or predicted size, $n$ is the total number of releases in the data set and $E$ is a model parameter.

$$\hat{s}_1 = s_1 \tag{1a}$$

$$\hat{s}_i = \hat{s}_{i-1} + \frac{E}{(\hat{s}_{i-1})^2} \qquad \{i = 2, \ldots, n\} \tag{1b}$$

The parameter $E$ is the average of the individual $E_i$, calculated from either (2) or (3).

$$E_i = (s_i - s_{i-1})\, s_{i-1}^2 \qquad \{i = 2, \ldots, n\} \tag{2}$$

$$E_i = \frac{s_i - s_1}{\sum_{k=1}^{i-1} 1/s_k^2} \qquad \{i = 2, \ldots, n\} \tag{3}$$

Algorithm (2) uses only the two most recent data points in computing $E_i$. With (3) all data up to rsn $i$ are considered. In either case the average of the resultant set of $E_i$ gives an estimated value for $E$. A third approach (LSIS) computes $E$ from the entirety of the data using a least squares criterion and is illustrated in figure 7.

Fig. 7 Least squares inverse square fit to FW over rsn1 to rsn21 (Logica FW and inverse square fit (IS); y-axis: size in modules; x-axis: RSN)

The conceptual implications guiding the selection of one of the three alternative algorithms for computing $E$ are subtle and are not discussed further here. They yield slightly different values for $E$ but, in the context of this study, they do not produce significantly different behavioural patterns. Nor do they change the conclusions to be drawn. Finally, the observant reader will notice the apparent outliers rsn20 and rsn21. No comment can be made at this time about the significance of these or their possible implication.
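The estimation of E and the recursion of expression (1) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation (the paper reports a MATLAB implementation): the function names are invented here, and the `sizes` list is a hypothetical series made up for the example, not the FW data of table 2.

```python
def estimate_e_pairwise(sizes):
    """Per-release estimates E_i as in expression (2): (s_i - s_{i-1}) * s_{i-1}^2."""
    return [(sizes[i] - sizes[i - 1]) * sizes[i - 1] ** 2
            for i in range(1, len(sizes))]


def estimate_e_cumulative(sizes):
    """Per-release estimates E_i as in expression (3):
    (s_i - s_1) / sum_{k=1}^{i-1} (1 / s_k^2)."""
    estimates = []
    inv_sq_sum = 0.0
    for i in range(1, len(sizes)):
        inv_sq_sum += 1.0 / sizes[i - 1] ** 2  # running sum over s_1 .. s_{i-1}
        estimates.append((sizes[i] - sizes[0]) / inv_sq_sum)
    return estimates


def fit_is(first_size, e, n):
    """Fitted sizes from recursion (1): start at s_1, then add E / (previous fit)^2."""
    fitted = [float(first_size)]
    for _ in range(1, n):
        prev = fitted[-1]
        fitted.append(prev + e / prev ** 2)
    return fitted


def mean(values):
    return sum(values) / len(values)


# HYPOTHETICAL module counts for five releases (assumption, not the FW data).
sizes = [100.0, 140.0, 165.0, 185.0, 200.0]

e_from_2 = mean(estimate_e_pairwise(sizes))    # E as the average of (2)
e_from_3 = mean(estimate_e_cumulative(sizes))  # E as the average of (3)
fitted = fit_is(sizes[0], e_from_3, len(sizes))

print("E from (2):", e_from_2)
print("E from (3):", e_from_3)
print("fitted sizes:", fitted)
```

For a series generated exactly by recursion (1), both estimators return the generating value of E for every release, which gives a simple sanity check on the two formulas.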
For the trend models estimated from the full set of 21 data points, statistical measures of the closeness of fit of the LSL and IS models do not differ significantly. Comparative assessment is, therefore, difficult on the basis of currently available data. This may be due to the fact that the damping in the IS trend is not strong. Moreover, neither model addresses the ripple. The deviations from smooth growth that the latter represents could, of course, simply be noise, the compounded impact of many, continuing, localised, often short term management and implementation decisions, in which case it would not affect the assessment. The FEAST hypothesis suggested that, in part at least, the ripple is an indicator of the presence of feedback-controlled mechanisms that regulate the long term growth trend. The ongoing white box modelling activity in FEAST/1 represents a first step in the attempts to resolve this issue, to permit refinement of the models and a more precise assessment of the degree to which they, their derivatives or different models reflect the reality of the processes studied; and the degree to which they may be generalised.

Including the ripple will assist in comparative assessment of the model. It has, however, been pointed out already [tur96] that the phenomenology of the situation suggests several reasons for preferring the IS model:
• The IS inverse square property can be interpreted as reflecting the complexity growth of a software system over a sequence of releases. Such growth is due, in part, to increases in the complexity of the application, for example, as features not included in the original system definition, and often orthogonal to it, are added. Moreover, the process of evolution adds change upon change upon change with, in general, little attention paid to the resultant complexity growth [leh85]. It is this phenomenon that is captured by the second law (table 1).
• As a one parameter model IS is also compatible with the fourth law of software evolution, with the parameter E reflecting the constant effort that the law identifies [leh78].
• IS also satisfies the Principle of Parsimony [cox66].
• No system can grow forever. The linear growth model is thus incompatible with reason and common experience.

4.5 Further consideration of the Inverse Square model

The list of reasons for favouring IS over the LSL includes the observation that the single parameter E of the former may be interpreted as a constant effort parameter as predicted by the fourth law. Estimation of E from the available data strengthens that argument. Such estimation produces a value that, as shown below, remains relatively constant as FW evolves. That is, the single parameter of the model may be interpreted as the constant effort or work output identified in the fourth law as being required to take the system from one release to the next. The principal questions raised by this interpretation, questions not satisfactorily answered, relate to the interpretation of E and the units in which it is measured. Does E relate to the input effort required to achieve release by release system evolution or to the output achieved from the process measured by some measure of increase in system quality and power? To answer the first question requires further investigation and additional data. As to units, $s_i$ is a dimensionless count. Hence E is dimensionless. But despite these unsolved questions it is concluded that, on the basis of currently available data, the above remarks together provide some justification for preferring the IS model. It appears to reflect reality more closely.

The full implications of one further indicator of the superiority of IS over LSL must now be considered. When modelling large data sets, the first part is often used to estimate model parameters and the second to then evaluate the model's "predictive" capability [ger93]. With the small size of the data set available from FW, this might not appear to be a fruitful path to follow. Turski [tur96] did, however, investigate this question, asking: "How many points beginning with rsn1 have to be considered in order to get an appropriately low error of fit, an acceptable predictive capability?" In terms of the FEAST hypothesis this question is equivalent to asking: How fast is the FW dynamics established? An answer for FW is suggested by the plot of figure 8.

Figure 8 plots a set of mean absolute error of fit values ($\mathrm{mae}_j$ {j = 2, ..., 21}, where j indicates the number of points from rsn1 used to compute E, see Appendix). The values of $\mathrm{mae}_2$ and $\mathrm{mae}_3$ are relatively large. As j is increased $\mathrm{mae}_j$ converges rapidly and reaches a relatively steady value by j equal to 6 (parameter E computed from the first six releases only). Thereafter $\mathrm{mae}_j$ {j = 6, ..., 21} has a mean of 74.6 with a standard deviation of 2.8. The $\mathrm{mae}_6$ value is only 4.7% of the system size at rsn6, 3.2% of its size at rsn19 and 2.8% of its size at rsn21. This behaviour is counter-intuitive in several ways. Possible interpretations and implications are summarised below. Overall, it does, however, appear to indicate the strength of the system dynamics. This phenomenon supports the observation made by one of the authors many years ago with regard to OS/360 evolution that "Rather than the managers managing the (evolving software) system, the system manages the managers." It must, of course, be understood that the reference here is to long term evolution, not to the specifics of individual decisions, often localised in time, system space and implementation space.
• Figure 8 based on the IS model suggests that the FW growth trend is established over some six of the releases included in the study. In accordance with the FEAST hypothesis, it is assumed that the dynamics arises from the characteristics of the software, the organisations developing, marketing and using the software, the communications between them and the
controls that are exercised. In any event figure 8 supports the hypothesis that the E-type systems evolution process develops strong dynamics.
• The mae for IS of 74.6 modules with standard deviation of 2.8 over the stable range is very close to the calculated average incremental growth of about 86.1 modules over all data points (fig. 5). This raises the question whether there is some relationship between the variance of the ripple (which is a significant source of error for the trend fit) and the mean incremental growth. Establishing a correlation would lead to a concept of safe growth rate limits. Establishing either would provide strong conceptual support for the incremental or evolutionary release strategy [gil88]. The entire question remains to be investigated.
• Note that the mae of LSL over the stable range is, at 86 modules, even closer to the average incremental growth of 86.1 modules than is that of IS. The implications of this, for example for the evaluation of the relative value of the two models, require more investigation.
• The IS plot in Figs. 8 and 9 stabilises much more rapidly than does the LSL plot. Moreover, if IS and LSL are estimated by using only rsn1 and rsn2, the former outperforms the latter by an order of magnitude. Thus while there are still unanswered questions, figures 8 and 9 appear to support the earlier conclusion that IS is to be preferred over LSL. That they provide further support for the FEAST hypothesis and the laws of software evolution does not require further emphasis.

The results presented above are based on the examination of the FW system, investigation of OS/360 not having yet been reopened. Continued investigation of these and other systems is clearly required.

Fig. 8 Mean absolute error of fit to FW over all releases as function of number of points used to estimate IS model (Logica FW; y-axis: mean absolute error over all releases, in modules; x-axis: number of data points used to estimate E)

Fig. 9 Mean absolute error of fit to FW over all releases as function of number of points used to estimate LSL model (squares) superimposed on that of IS model (circles) (Logica FW; y-axis: mean absolute error over all releases, in modules; x-axis: number of data points used to estimate models)

4.6 Impact of the study on the laws of software evolution

More work is clearly required for firm conclusions to be reached in regard to the many issues raised above. It is nevertheless considered appropriate to indicate in table 3 the extent to which the investigators feel encouraged to see the present results as being compatible with, or even supporting, the laws of software evolution. The weight of evidence suggests that, despite the 20 year gap and the significant difference between IBM and Logica systems and their development and operational environments, there are strong similarities in the phenomenology of their evolutionary growth. It is believed that the results of the studies to date will, with some modification, extend to E-type systems in general. The FEAST/1 project will, it is hoped, receive sufficient data from the evolution processes of a variety of systems to establish confidence in a set of conclusions that are valid in some stated domain or, of course, to demonstrate that they cannot be generalised.

5 Final remarks

The results achieved so far by applying this method in the FEAST/1 project are encouraging. Additional data on a wide spectrum of software systems to be received from various industrial collaborators should, if consistent, permit generalisation of both the conclusions reached and the measurement and analysis techniques being employed. The present paper describes the black box approach that has revealed aspects of FW evolution and of its evolution dynamics, and has provided material for interpretation and for the formulation of explanatory hypotheses. A white box modelling approach is simultaneously seeking to model the structure of other industrial software processes and to simulate their behaviour including their feedback control
loops. These investigations are being further backed up through the development of a multi-agent model. It is hoped that this work will confirm the laws of software evolution [leh96d], perhaps in modified versions, which now include the FEAST hypothesis, and put them on firmer foundations. If successful over a range of systems, the investigation will provide a base for a plausible theory of software process and software evolution. The alternative, that the results of the investigation demonstrate that the laws and the hypothesis are not of general relevance though satisfied for particular instances of E-type systems and their evolution processes, cannot, at this stage, be dismissed.

Table 3 The laws of software evolution in the light of the preliminary FW analysis (footnote 7)
Footnote 7: It is hoped to obtain more data that will provide evidence, one way or the other.

The FEAST/1 study has already made visible progress in illustrating how measurement concepts can be applied to the study of software evolution. It has successfully extended the 1970s techniques by applying more rigour [law82] to mastery of the observed phenomena. The specific results derived are of considerable interest, both in themselves and from a wider perspective. The long term significance of this paper is, however, more likely to be in the approach and techniques it presents. Being able to detect, measure and control feedback phenomena and their impact is believed to be key to major advances in software process management and execution.

In view of the fact that this paper will be presented at the Metrics '97 symposium it is appropriate to comment on its focus on the FEAST hypothesis, the related FEAST/1 project and the absence of references to other relevant metrics work [fen96,ieee94,kit82,96,vot95]. FEAST/1 is believed to exemplify an original metrics based approach to the study of the software process and software evolution. This approach has been consistently followed from the first primitive study of OS/360 in the late sixties and seventies [leh69,85] to the current investigation. The study was triggered by a general observation: the universal and persistent problems accompanying software development and maintenance, i.e. software evolution. Following recognition of the problem as appropriate for research investigation [leh69,85] and receipt of appropriate data, first from OS/360 and more recently from Logica FW [leh96d], patterns and regularities in their evolution were revealed and modelled. Interpretation of the models led, in turn, to the generation
of hypotheses (e.g. the laws and FEAST) to interpret them. These successive steps led to an iterative investigation that is now yielding further data (historical and/or obtained by experimentation and measurement) to support, refute or modify and then to extend and generalise the emerging theoretical base and framework. Such results must, of course, be continually validated or adjusted by observation of and experimentation in actual industrial processes. Thus the more general relevance of the paper to the metrics community is in its approach which may be compared with those more widely adopted.

Apart from any theoretical advance that this study will provide, it should, if successful, lead to the development of methods and tools for process management, release planning and process improvement. This will shape the direction of software metrics, software process modelling and process improvement in the years to come. If feedback phenomena in E-type evolution processes shape and constrain the software process significantly, mastery and command of those phenomena will open up important new prospects. Moreover, the software process is a special case of business processes in general [leh97]. The approach applied and the conclusions reached should find much wider application. It is believed that FEAST/1 is a study which, if successful, will eventually lead to a theory and to a technology which together can trigger major advances in the software and other business processes and their improvement.

6 Acknowledgements

We are grateful to Logica plc for providing access to the FW data and in particular to Joe Halberstadt for his collaboration. Sincere appreciation is also due to Profs. Berc Rustem and Vic Stenning, co-Principal Investigators on the FEAST/1 project, for their many contributions to the investigation and to Dr. Emma McCoy of the ICSTM Mathematics Department for her help with statistical aspects of this investigation. We also acknowledge the constructive contributions of participants in the three open FEAST workshops in 1994/5. Last but not least our thanks to the anonymous referees for their careful reading and constructive comments. Since October 1996 the work reported here has been supported under EPSRC grants numbers GR/K86008 and GR/L07437.

7 References

Papers identified by a * in the reference listing are reprinted in [leh85].

[abd91] Abdel-Hamid T and Madnick SE, Software Project Dynamics - An Integrated Approach, Prentice-Hall, Englewood Cliffs, 1991, 264 pps.
[alb79] Albrecht AJ, Measuring Application Development Productivity, Proc. Guide/Share: IBM Application Development Symposium, Monterey, CA, 1979, pp. 83 - 92
[bec94] Becker RS, Hall B, and Rustem E, Robust Optimal Control of Stochastic Nonlinear Economic Systems, J. of Economic Dynamics and Control, n. 18, 1994, pp. 125 - 148
[bel72] Belady LA and Lehman MM, An Introduction to Growth Dynamics, Proc. Conf. on Statistical Comp. Perf. Evaluation, Brown Univ. 1971, Academic Press, 1972, W Freiberger (ed.), pp. 503 - 511
[bel78]* id., Characteristics of Large Systems, Proc. Conf. Research Directions in Software Technology, (P. Wegner ed.), Sponsored by Tri-Services Committee of the DOD, Brown U. Providence, RI, Oct. 1878, MIT Press, 1979, pp. 106 - 142
[box70] Box GP and Jenkins GM, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, 1970, 553 pps.
[cox66] Cox DR and Lewis PAW, The Statistical Analysis of Series of Events, Methuen, London, 1966
[dav84] Davis OL and Goldsmith PL, Statistical Methods in Research and Production, 4th. ed., Longman, London, 1984, 478 pps.
[fea94,5] Preprints of the three FEAST Workshops, Lehman MM (ed.), Dept. of Comp., ICSTM, 1994/5
[fen96] Fenton NE and Pfleeger SL, Software Metrics - A Rigorous and Practical Approach, 2nd ed., PWS Publ. Co., London, 1997, 638 pps.
[for61] Forrester JW, Industrial Dynamics, Productivity Press, Cambridge, MA, 1961
[for70] Forrester JW, Understanding the Counter Intuitive Behaviour of Social Systems, in Systems Behaviour, ed. by Open Systems Group, 3rd. Ed., pp. 270-287, Paul Chapman Publishing Co. and The Open University, London, 1972
[ger93] Gershenfeld NA and Weigend AS, The Future of Time Series: Learning and Understanding, in Time Series Prediction: Forecasting the Future and Understanding the Past, Gershenfeld NA and Weigend AS (eds.), SFI Studies in the Sciences of Complexity, Proc. Vol. XV, Addison-Wesley, 1993, pp. 1-70
[gil88] Gilb T, Principles of Software Engineering Management, Addison Wesley, 1988
[göd31] Gödel K, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I, Monatshefte für Mathematik und Physik 38, 1931, pp. 173-198. English translation, On Formally Undecidable Propositions, Gödel K, Basic Books, New York, 1962
[hum95] Humphrey WS, A Discipline for Software Engineering, SEI Series in Software Engineering, Addison-Wesley, Reading, MA, 1995, 789 pps.
[ieee94] Measurement Based Process Improvement, sp. iss. IEEE Softw., IEEE Comp. Soc. v. 11, n. 4, July 1994
[kem93] Kemerer CF, Reliability of Function Point Measurement: A Field Experiment, CACM v. 36, n. 2, Feb. 1993, pp. 85 - 97
[kit82] Kitchenham B, System Evolution Dynamics of VME/B, ICL Tech. J., May 1982, pp. 42 - 57
[kit96] id., Software Metrics: Measurement for Software Process Improvement, NCC Blackwell, 1996, 241 pps.
[law82] Lawrence MJ, An Examination of Evolution Dynamics, Proc. 6th. Int. Conf. On Softw. Eng., Tokyo, Japan, 13 - 16 Sept. 1982, IEEE Comp. Soc. Ord. N. 422, IEEE Cat n. 81CH1795-4, pp. 188-196
[leh69]* Lehman MM, The Programming Process, IBM Res. Rep. RC 2722, IBM Res. Centre, Yorktown Heights, NY 10594, Sept. 1969
[leh74]* id., Programs, Cities, Students, Limits to Growth?, Inaugural Lecture, May 1974, Publ. in Imp. Col of Sc. Tech. Inaug. Lect. Ser., vol 9, 1970, 1974, pp. 211 - 229. Also in Programming Methodology, (Gries D, ed.), Springer Verlag, 1978, pp. 42 - 62
[leh77] Lehman MM and Patterson J, Preliminary CCSS System Analysis Using Techniques of Evolution Dynamics, Working Papers, First Software Life Cycle Management Workshop, Airlie VA, 1977, publ. by ISRAD/AIRMICS, Comp. Sys. Com., US Army, Fort Belvoir, VA, Dec. 1997, pp. 324 - 332
[leh78]* id., Laws of Program Evolution - Rules and Tools for Programming Management, Proc. Infotech State of the Art Conf., Why Software Projects Fail, Apr. 1978, pp. 11/1 - 11/25
[leh80a]* id., On Understanding Laws, Evolution and Conservation in the Large Program Life Cycle, J. of Sys. and Software, v. 1, n. 3, 1980, pp. 213 - 221
[leh80b]* id., Programs, Life Cycles and Laws of Software Evolution, Proc. IEEE Spec. Iss. on Softw. Eng., v. 68, n. 9, Sept. 1980, pp. 1060 - 1076
[leh85]* Lehman MM and Belady LA, Program Evolution - Processes of Software Change, Academic Press, London, 1985, 538 pps.
[leh89] Lehman MM, Uncertainty in Computer Application and its Control Through the Engineering of Software, J. of Software Maintenance: Research and Practice, v. 1, n. 1, Sept. 1989, pp. 3 - 27
[leh94] id., Feedback in the Software Evolution Process, Keynote Addr., CSR Eleventh Annual Workshop on Softw. Evolution: Models and Metrics, Dublin, 7-9th Sept. 1994, in Information and Softw. Tech., sp. Iss. on Softw. Maintenance, v. 38, n. 11, Elsevier, 1996, pp. 681 - 686
[leh96a] id., Process Improvement - The Way Forward, Invited Keynote Address, Proc. Brazilian Softw. Eng. Conf., SBES'96, Universidade Federal de Sao Carlos, Brazil, 1996, pp. 23 - 35
[leh96b] Lehman MM, Perry DE and Turski WM, Why is it so Hard to Find Feedback Control in Software Processes?, Inv. Pres., Proc. 19th Australasian Comp. Sc. Conf., Melbourne, Austr., Jan. 31 - Feb. 2, 1996
[leh96c] Lehman MM and Stenning V, FEAST/1: Case for Support, ICSTM - DoC EPSRC Proposal, March 1996
[leh96d] Lehman MM, Laws of Software Evolution Revisited, Position Paper, EWSPT96, Oct. 1996, LNCS 1149, Springer Verlag, 1997, pp. 108 - 124
[leh97] id., Process Modelling - Where Next?, ICSE 9 Most Influential Paper Award, Proc. ICSE 19, Boston, MA, 20 - 22 May 1997, pp. 549 - 552
[mad96] Madachy RJ, System Dynamics Modelling of an Inspection Process, 18th Int. Conf. On Softw. Eng., Berlin, 25-29 March 1996, IEEE Comp. Soc. Ord. N. PR07246, pp. 376-386
[mat92] MATLAB High-performance Numeric Computation and Visualisation Software - Reference Guide, The MathWorks, Inc., Natick, MA, 1992
[mca95] McCabe FG and Clark KL, Programming in April: An Agent Process Interaction Language, in Intelligent Agents, Springer Verlag, 1995
[mcg93] McGowan CL and Bohner SA, Model Based Process Assessments, Proc. 15th Int. Conf. on Softw. Eng., Baltimore, MD, 17 - 21 May 1993, IEEE Comp. Soc. ord. n. 3700-02, pp. 202-211
[mea72] Meadows DH et al, Limits to Growth, Signet, 1972
[tur96] Turski W, Reference Model for Smooth Growth of Software Systems, IEEE Trans. on Softw. Eng., vol. 22, n. 8, 1996
[vot95] Votta LG and Zajac ML, A Design Process Improvement Case Study Using Process Waiver Data, Proc. ESEC '95, Sitges, Barcelona, Spain, 25 - 28 Sept. 1995
[wae94] Waeselynck H and Pfahl D, System Dynamics Applied to the Modelling of Software Projects, in Software - Concepts and Tools, v. 15, Springer-Verlag, Berlin, 1994, pp. 162 - 174
[wil51] Wilkes M V, Wheeler D J and Gill S, The Preparation of Programs for an Electronic Digital Computer, Addison Wesley Press Inc., 1951, 167 pps.

Appendix

This appendix indicates how the values of the mean absolute error of fit (mae), as in section 4.5 figures 8 and 9, have been computed. It also records the equations used to compute the least squares linear (LSL) and inverse square (IS) models.

As explained in section 4.5, for each of the models LSL and IS, a set of $\mathrm{mae}_j$ values {j = 2, ..., 21} was computed to determine the effect on the error of fit of the number of points used in the estimation. The average error over the entire data set for each such number of points was then taken as a measure of the goodness of fit for that model. Each set of $\mathrm{mae}_j$ was calculated from the expression:

$$\mathrm{mae}_j = \frac{1}{n} \sum_{k=1}^{n} \left| s_k - \hat{s}_{k,j} \right| \tag{A.1}$$

where throughout the appendix:
$n$ (= 21 for the FW set) is the number of data points used in calculating mae;
$j$ {j = 2, ..., 21} represents the number of data points being used to estimate the LSL and IS models, respectively;
$s_k$ is the actual system size for the release with sequence number rsn $k$ (table 2);
$\hat{s}_{k,j}$ represents the fitted system size for rsn $k$, with sub-index $j$ indicating that the corresponding model (either LSL or IS) has been computed using the first $j$ points of the data set only.

Similarly, $\hat{s}_k$ is used below to represent the fitted system size based on LSL whose parameters have not, or IS whose parameter has not, necessarily been adjusted to minimise the error of fit in some sense.
Computation of $\hat{s}_{k,j}$

Least Squares Linear Model. In this case $\hat{s}_{k,j}$ is expressed as follows:

$$\hat{s}_{k,j} = a_j k + b_j \qquad \{k = 1, \ldots, n\} \tag{A.2}$$

The parameters $a_j$ and $b_j$ are computed for each value of $j$ using a least squares linear regression, as provided by most statistical packages and spreadsheets, to minimise $\sum_{k=1}^{j} (s_k - \hat{s}_k)^2$.

Inverse Square Model. For this model, each value of $\hat{s}_{k,j}$ is computed recursively from $s_1$ [tur96]:

$$\hat{s}_{1,j} = s_1 \tag{A.3a}$$

$$\hat{s}_{k,j} = \hat{s}_{k-1,j} + \frac{E_j}{(\hat{s}_{k-1,j})^2} \qquad \{k = 2, \ldots, n\} \tag{A.3b}$$

where

$$E_j = \frac{1}{j-1} \sum_{i=2}^{j} E_i \qquad \{j = 2, \ldots, n\} \tag{A.4}$$

$E_i$ in expression A.4 will have been computed either from expression 2 or 3 (section 4.4). $E_j$ may also be computed using the least squares criterion and is then the value of $E$ which minimises the error of fit, $d_j$, over the first $j$ points, expressed as:

$$\min_E (d_j) = \min_E \left( \sum_{k=1}^{j} (s_k - \hat{s}_k)^2 \right) \tag{A.5}$$

where $\min_E(\cdot)$ indicates the minimum value of $(\cdot)$ over the entire range of the parameter $E$. Expression 1 (section 4.4) shows how $\hat{s}_k$ may be computed.

For the FW data choosing one or other of these approaches has only minimal effect on the results. The choice has no significant impact on the interpretation of these results or on the conclusions reached.

Figures 7, 8 and 9 (IS plot) in section 4.5 are based on expression A.5. This has been implemented under MATLAB [mat92] and is available on the Web at http://www-dse.doc.ic.ac.uk/~mml/feast1/.
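As an illustration of expressions A.1 to A.4, the following Python sketch computes the mae curve for both models as a function of the number of points j used in the estimation. It is an assumption-laden sketch rather than the authors' MATLAB code: the function names are invented here, the `sizes` series is hypothetical rather than the FW data of table 2, and E_j is taken as the average of the E_i of expression 2 (via A.4), not from the least squares criterion A.5 on which the published figures are based.

```python
def lsl_fitted(sizes, j):
    """LSL model (A.2): least squares line through the first j points,
    evaluated at every release k = 1..n."""
    ks = range(1, j + 1)
    k_mean = sum(ks) / j
    s_mean = sum(sizes[:j]) / j
    sxx = sum((k - k_mean) ** 2 for k in ks)
    sxy = sum((k - k_mean) * (s - s_mean) for k, s in zip(ks, sizes))
    a = sxy / sxx            # slope a_j
    b = s_mean - a * k_mean  # intercept b_j
    return [a * k + b for k in range(1, len(sizes) + 1)]


def is_fitted(sizes, j):
    """IS model (A.3): E_j is the mean of E_i (expression 2) over i = 2..j (A.4),
    then the recursion is run from s_1 over all n releases."""
    e_i = [(sizes[i] - sizes[i - 1]) * sizes[i - 1] ** 2 for i in range(1, j)]
    e_j = sum(e_i) / (j - 1)
    fitted = [float(sizes[0])]
    for _ in range(1, len(sizes)):
        prev = fitted[-1]
        fitted.append(prev + e_j / prev ** 2)
    return fitted


def mae(sizes, fitted):
    """Mean absolute error of fit over all n releases (A.1)."""
    return sum(abs(s - f) for s, f in zip(sizes, fitted)) / len(sizes)


def mae_curve(sizes, model):
    """mae_j for j = 2..n, where j is the number of leading points used to fit."""
    return {j: mae(sizes, model(sizes, j)) for j in range(2, len(sizes) + 1)}


# HYPOTHETICAL module counts for eight releases (assumption, not the FW data).
sizes = [100.0, 140.0, 165.0, 185.0, 200.0, 214.0, 226.0, 236.0]

for name, model in (("LSL", lsl_fitted), ("IS", is_fitted)):
    for j, value in mae_curve(sizes, model).items():
        print(name, j, round(value, 2))
```

Plotting the two curves against j gives the kind of comparison shown in figures 8 and 9; substituting a numerical minimiser for the averaging step in `is_fitted` would bring the sketch closer to expression A.5.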
mml568[papers]
18/8/97