Software Re-Engineering: Objectives
28. Software Re-engineering
Objectives
Contents
28.1 Source code translation
28.2 Reverse engineering
28.3 Program structure improvement
28.4 Program modularisation
28.5 Data re-engineering
In Chapters 26 and 27, I introduced legacy systems and different strategies for
software evolution. Legacy systems are old software systems which are essential for
business process support. Companies rely on these systems, so they must keep
them in operation. Software evolution strategies include maintenance, replacement,
architectural evolution and, the topic of this chapter, software re-engineering.
Software re-engineering is concerned with re-implementing legacy systems
to make them more maintainable. Re-engineering may involve re-documenting the
system, organising and restructuring the system, translating the system to a more
modern programming language and modifying and updating the structure and values
of the system’s data. The functionality of the software is not changed and,
normally, the system architecture also remains the same.
From a technical perspective, software re-engineering may appear to be a
second-class solution to the problems of system evolution. The software
architecture is not updated, so distributing centralised systems is difficult. It is not
usually possible to radically change the system programming language, so old
systems cannot be converted to object-oriented programming languages such as Java
or C++. Inherent limitations in the system are maintained because the software
functionality is unchanged.
However, from a business point of view, software re-engineering may be the
only viable way to ensure that legacy systems can continue in service. It may be
too expensive and too risky to adopt any other approach to system evolution. To
understand the reasons for this, we must make a rough assessment of the legacy
system problem.
The amount of code in legacy systems is immense. In 1990, it was
estimated (Ulrich, 1990) that there were 120 billion lines of source code in
existence. The majority of these systems were written either in COBOL, a
programming language best suited to business data processing, or in FORTRAN,
a language for scientific and mathematical programming. These
languages have limited program structuring facilities and, in the case of
FORTRAN, very limited support for data structuring.
Although many of these programs have now been replaced, most of them are
probably still in service. Meanwhile, since 1990, there has been a huge increase in
computer use for business process support. Therefore, I estimate that there must now
be roughly 250 billion lines of source code in existence which must be maintained.
Most of this is not written in object-oriented languages and much of it still runs on
mainframe computers.
There are so many systems in existence that complete replacement or radical
restructuring is financially unthinkable for most organisations. Maintenance of old
systems is increasingly expensive, and re-engineering these systems can extend their
useful lifetime. As discussed in Chapter 26, re-engineering a system is cost-effective
when it has a high business value but is expensive to maintain. Re-engineering
improves the system structure, creates new system documentation and
makes the system easier to understand.
Re-engineering a software system has two key advantages over more radical
approaches to system evolution:
1. Reduced risk There is a high risk in re-developing software that is essential
for an organisation. Errors may be made in the system specification, there
may be development problems, and so on.
2. Reduced cost The cost of re-engineering is significantly less than the cost of
developing new software. Ulrich (Ulrich, 1990) quotes an example of a
commercial system where the re-implementation costs were estimated at $50
million. The system was successfully re-engineered for $12 million. If these
figures are typical, re-engineering is about 4 times cheaper than re-writing.
[Figure 28.1: Forward engineering and re-engineering]
The term re-engineering is also associated with business process re-
engineering (Hammer, 1990). Business process re-engineering is concerned with re-
designing business processes to reduce the number of redundant activities and
improve process efficiency. It is usually reliant on the introduction or the
enhancement of computer-based support for the process. Process re-engineering is
often a driver for software evolution as legacy systems may incorporate implicit
dependencies on the existing processes. These have to be discovered and removed
before process re-engineering is possible. Therefore, the need for software re-
engineering may emerge in a company when it becomes clear that the scale of the
changes required by the business process re-engineering cannot be accommodated
through normal program maintenance.
The critical distinction between re-engineering and new software development
is the starting point for the development. Rather than starting with a written
specification, re-engineering uses the old system as the specification for the new system. Chikofsky
and Cross (Chikofsky and Cross, 1990) call conventional development forward
engineering to distinguish it from software re-engineering. This distinction is
illustrated in Figure 28.1. Forward engineering starts with a system specification
and involves the design and implementation of a new system. Re-engineering starts
with an existing system and the development process for the replacement is based
on understanding and transformation of the original system.
Figure 28.2 illustrates a possible re-engineering process. The input to the
process is a legacy program and the output is a structured, modularised version of
the same program. At the same time as program re-engineering, the data for the
system may also be re-engineered. The activities in this re-engineering process are:
1. Source code translation The program is converted from an old programming
language to a more modern version of the same language or to a different
language.
2. Reverse engineering The program is analysed and information extracted from
it which helps to document its organisation and functionality.
3. Program structure improvement The control structure of the program is
analysed and modified to make it easier to read and understand.
4. Program modularisation Related parts of the program are grouped together
and, where appropriate, redundancy is removed. In some cases, this stage
may involve architectural transformation as discussed in Chapter 27.
5. Data re-engineering The data processed by the program is changed to reflect
program changes.
[Figure 28.2: The re-engineering process]
Program re-engineering does not necessarily require all of the steps in Figure
28.2. Source code translation may not be needed if the programming language used
to develop the system is still supported by the compiler supplier. If the
re-engineering relies completely on automated tools, then recovering documentation
through reverse engineering may be unnecessary. Data re-engineering is only
required if the data structures in the program change during system re-engineering.
However, software re-engineering always involves some program re-structuring.
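The conditions above can be read as a small decision procedure. The Python sketch below is a hypothetical illustration: the `LegacySystem` fields and the `plan_stages` function are assumptions introduced here, and, because the text gives no rule for when modularisation can be skipped, that choice is left to an explicit flag.

```python
from dataclasses import dataclass

# Stage names follow the five activities of the re-engineering process.
SOURCE_TRANSLATION = "source code translation"
REVERSE_ENGINEERING = "reverse engineering"
STRUCTURE_IMPROVEMENT = "program structure improvement"
MODULARISATION = "program modularisation"
DATA_REENGINEERING = "data re-engineering"


@dataclass
class LegacySystem:
    """Hypothetical facts about a legacy system; the field names are illustrative."""
    language_still_supported: bool  # compiler supplier still supports the language
    fully_automated_tools: bool     # re-engineering relies completely on automated tools
    data_structures_change: bool    # program data structures change during re-engineering
    wants_modularisation: bool = True  # assumption: the text gives no rule for skipping this


def plan_stages(system: LegacySystem) -> list:
    """Return the re-engineering activities to carry out, in process order."""
    stages = []
    if not system.language_still_supported:
        stages.append(SOURCE_TRANSLATION)
    if not system.fully_automated_tools:
        stages.append(REVERSE_ENGINEERING)
    # Re-engineering always involves some program restructuring.
    stages.append(STRUCTURE_IMPROVEMENT)
    if system.wants_modularisation:
        stages.append(MODULARISATION)
    if system.data_structures_change:
        stages.append(DATA_REENGINEERING)
    return stages


print(plan_stages(LegacySystem(language_still_supported=True,
                               fully_automated_tools=False,
                               data_structures_change=False)))
# ['reverse engineering', 'program structure improvement', 'program modularisation']
```

Structure improvement is the only unconditional stage, mirroring the statement that re-engineering always involves some restructuring.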
The costs of re-engineering obviously depend on the extent of the work that
is carried out. There is a spectrum of possible approaches to re-engineering as
shown in Figure 28.3. Costs increase from left to right so that source code
translation is the cheapest option and re-engineering as part of architectural
migration is the most expensive.
Apart from the extent of the re-engineering, the principal factors that affect
re-engineering costs are:
1. The quality of the software to be re-engineered. The lower the quality of the
software and its associated documentation (if any), the higher the re-
engineering costs.
2. The tool support available for re-engineering. It is not normally cost-
effective to re-engineer a software system unless you can use CASE tools to
automate most of the program changes.
3. The extent of data conversion required. If re-engineering requires large
volumes of data to be converted, this significantly increases the process cost.
4. The availability of expert staff. If the staff responsible for maintaining the
system cannot be involved in the re-engineering process, this will increase
the costs. System re-engineers will have to spend a great deal of time
understanding the system.
[Figure 28.3: Re-engineering approaches]
The main disadvantage of software re-engineering is that there are practical
limits to the extent that a system can be improved by re-engineering. It isn’t
possible, for example, to convert a system written using a function-oriented approach
to an object-oriented system. Major architectural changes or radical re-organisation of
the system data management cannot be carried out automatically, so they involve high
additional costs. Although re-engineering can improve maintainability, the
re-engineered system will probably not be as maintainable as a new system developed
using modern software engineering methods.
4. Lack of software support The suppliers of the language compiler may have
gone out of business or may discontinue support for their product.
[Figure 28.4: The program translation process]
Figure 28.4 illustrates the process of source code translation. There may be no need
to understand the operation of the software in detail or to modify the system
architecture. The analysis involved can focus on programming language
considerations such as the equivalence of program control constructs.
Source code translation is only economically realistic if an automated
translator is available to do the bulk of the translation. This may be a specially
written program, a bought-in tool to convert from one language to another or a
pattern matching system. In the latter case, a set of instructions describing how to make
the translation from one representation to another has to be written. Parameterised
patterns in the source language are defined and associated with equivalent patterns in
the target language.
In many cases, completely automatic translation is impossible. Constructs
in the source language may have no direct equivalent in the target language. There
may be embedded conditional compilation instructions in the source code which are
not supported in the target language. In these circumstances, you have to make
changes manually to tune and improve the generated system.
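As a rough illustration of the pattern-matching approach, the Python sketch below pairs parameterised source patterns with target templates and marks any statement that has no equivalent pattern for manual translation. The COBOL-style statements, the Python-like target syntax and the `PATTERNS` table are assumptions for illustration; they are not taken from any real translation tool.

```python
import re

# Hypothetical pattern table: each parameterised source pattern (named groups)
# is paired with a target template that uses the same parameter names.
IDENT = r"[A-Z][A-Z0-9-]*"
OPERAND = rf"(?:{IDENT}|\d+)"
PATTERNS = [
    (re.compile(rf"MOVE (?P<src>{OPERAND}) TO (?P<dst>{IDENT})\."), "{dst} = {src}"),
    (re.compile(rf"ADD (?P<src>{OPERAND}) TO (?P<dst>{IDENT})\."), "{dst} += {src}"),
    (re.compile(rf"SUBTRACT (?P<src>{OPERAND}) FROM (?P<dst>{IDENT})\."), "{dst} -= {src}"),
]


def to_target_name(token: str) -> str:
    """COBOL-style names use hyphens; the target language needs underscores."""
    return token.lower().replace("-", "_")


def translate(source_lines):
    """Translate each statement via the pattern table.

    Returns the translated lines and the statements that need manual work.
    """
    output, manual = [], []
    for line in source_lines:
        stmt = line.strip()
        for pattern, template in PATTERNS:
            match = pattern.fullmatch(stmt)
            if match:
                params = {k: to_target_name(v) for k, v in match.groupdict().items()}
                output.append(template.format(**params))
                break
        else:
            # No equivalent construct: keep the original as a comment for an engineer.
            output.append(f"# MANUAL: {stmt}")
            manual.append(stmt)
    return output, manual


code, todo = translate(["MOVE 0 TO TOTAL-PAY.", "ADD BASIC-PAY TO TOTAL-PAY.", "PERFORM PRINT-SLIP."])
print("\n".join(code))
# total_pay = 0
# total_pay += basic_pay
# # MANUAL: PERFORM PRINT-SLIP.
```

Keeping untranslated statements in place as comments, rather than dropping them, leaves a visible work list for the manual tuning stage.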
1. The design and specification of an existing system may be reverse engineered
so that they can serve as an input to the requirements specification for that
program’s replacement.
2. Alternatively, the design and specification may be reverse engineered so that
they are available to help program maintenance. With this additional
information, it may not be necessary to re-engineer the system source code.
[Figure 28.5: The reverse engineering process]
The reverse engineering process is illustrated in Figure 28.5. The process starts
with an analysis phase. During this phase, the system is analysed using automated
tools to discover its structure. In itself, this is not enough to re-create the system
design. Engineers then work with the system source code and its structural model.
They add to this model information that they have collected by understanding the
system. This information is maintained as a directed graph that is linked to the
program source code.
Information store browsers are used to compare the graph structure and the
code and to annotate the graph with extra information. Documents of various types
such as program and data structure diagrams and traceability matrices can be
generated from the directed graph. Traceability matrices show where entities in the
system are defined and referenced. The process of document generation is iterative,
as the design information is used to further refine the information held in the
system repository.
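A minimal sketch of such an information store, assuming a much simpler model than a real reverse engineering tool: the hypothetical `DesignGraph` class below records directed edges from program units to the data entities they define or reference, keeps the source line of each edge as the link back to the code, lets engineers annotate nodes, and generates a plain-text traceability matrix.

```python
from collections import defaultdict


class DesignGraph:
    """Toy design-information store (an illustrative assumption, not a real tool).

    Edges run from program units to the data entities they define or reference;
    each edge keeps the source line it was recovered from, so the graph stays
    linked to the code. Engineers can attach free-text annotations to any node.
    """

    RELATIONS = {"defines": "D", "references": "R"}

    def __init__(self):
        self.edges = defaultdict(list)   # unit -> [(relation, entity, line)]
        self.notes = defaultdict(list)   # node -> manual annotations

    def add_edge(self, unit, relation, entity, line):
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[unit].append((relation, entity, line))

    def annotate(self, node, text):
        self.notes[node].append(text)

    def locations(self, entity):
        """Source lines where an entity is defined or referenced."""
        return sorted(line for rels in self.edges.values()
                      for _, e, line in rels if e == entity)

    def traceability_matrix(self):
        """Rows are entities, columns are units: 'D' defined, 'R' referenced."""
        units = sorted(self.edges)
        entities = sorted({e for rels in self.edges.values() for _, e, _ in rels})
        matrix = {e: {u: "" for u in units} for e in entities}
        for unit, rels in self.edges.items():
            for relation, entity, _ in rels:
                mark = self.RELATIONS[relation]
                if mark not in matrix[entity][unit]:
                    matrix[entity][unit] += mark
        return units, matrix

    def render(self):
        """Plain-text traceability matrix, one row per entity."""
        units, matrix = self.traceability_matrix()
        width = max([len("entity")] + [len(e) for e in matrix])
        rows = ["entity".ljust(width) + " | " + " | ".join(units)]
        for entity, row in matrix.items():
            cells = " | ".join(row[u].center(len(u)) for u in units)
            rows.append(entity.ljust(width) + " | " + cells)
        return "\n".join(rows)


graph = DesignGraph()
graph.add_edge("PAYROLL", "defines", "TAX-RATE", 12)
graph.add_edge("PAYROLL", "references", "TAX-RATE", 30)
graph.add_edge("REPORT", "references", "TAX-RATE", 7)
graph.add_edge("REPORT", "references", "EMP-REC", 9)
graph.annotate("TAX-RATE", "engineer's note: value appears to be hard-coded")
print(graph.render())
```

Because the matrix is regenerated from the graph on demand, each round of annotation and refinement can produce an updated document, in line with the iterative process described above.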
Tools for program understanding may be used to support the reverse
engineering process. These usually present different system views and allow easy
navigation through the source code. For example, they allow users to select a data
definition then move through the code to where that data item is used. Examples of
such program browsers are discussed by Cleveland (Cleveland, 1989), Oman and
Cook (Oman and Cook, 1990) and Ning et al. (Ning, Engberts et al., 1994).
After the system design documentation has been generated, further
information may be added to the information store to help re-create the system
specification. This usually involves further manual annotation to the system
structure. The specification cannot be deduced automatically from the system model.
loop
   -- The Get statement finds values for the given variables from the system’s
   -- environment.
   Get (Time_on, Time_off, Time, Setting, Temp, Switch) ;
   case Switch is
      when On =>
         if Heating_status = off then
            Switch_heating ; Heating_status := on ;
         end if ;
      when Off =>
         if Heating_status = on then
            Switch_heating ; Heating_status := off ;
         end if ;
      when Controlled =>
         if Time >= Time_on and Time <= Time_off then
            if Temp > Setting and Heating_status = on then
               Switch_heating ; Heating_status := off ;
            elsif Temp < Setting and Heating_status = off then
               Switch_heating ; Heating_status := on ;
            end if ;
         end if ;
   end case ;
end loop ;
[Figure 28.7: A structured control program]
as part of the program re-structuring process. Figure 28.8 shows how a conditional
statement including ‘not’ logic may be made more understandable.
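Applying De Morgan's laws to the complex condition gives `A <= B or (C >= D and E > F)`. Because a hand-simplified condition is easy to get wrong, a brute-force comparison is a cheap safety check. The Python sketch below (function names are illustrative) compares the two forms for every assignment of small integer values.

```python
from itertools import product


def complex_condition(a, b, c, d, e, f):
    # if not (A > B and (C < D or not (E > F))) ...
    return not (a > b and (c < d or not (e > f)))


def simplified_condition(a, b, c, d, e, f):
    # De Morgan: not (X and Y) == not X or not Y, and
    # not (C < D or not (E > F)) == C >= D and E > F.
    return a <= b or (c >= d and e > f)


def equivalent_on(values):
    """Compare both conditions for every assignment of the six variables."""
    return all(complex_condition(*args) == simplified_condition(*args)
               for args in product(values, repeat=6))


print(equivalent_on(range(3)))  # True
```

Each comparison relates only one pair of variables, so the values 0, 1 and 2 already produce the less-than, equal and greater-than cases for every pair; for ordered values the check therefore covers every combination of outcomes.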
Bohm and Jacopini (Bohm and Jacopini, 1966) proved that any program may
be rewritten in terms of simple if-then-else conditionals and while loops and that
unconditional goto statements were not required. This theorem is the basis for
automatic program re-structuring. Figure 28.9 shows the stages in the automatic
re-structuring of a program. It is first converted to a directed graph, then a structured
equivalent program, without goto statements, is generated.
The directed graph that is generated is a program flow graph which shows
how control moves through the program. Simplification and transformation
techniques can be applied to this graph without changing its semantics. These detect
and remove unreachable parts of the code. Once simplification has been completed,
a new program is generated. While loops and simple conditional statements are
substituted for goto-based control. This program may be in the original language or
in a different language (e.g. FORTRAN may be converted to C).
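The Python sketch below illustrates both ideas under simplifying assumptions: a goto program is represented as a dictionary of labelled blocks (the nodes of the flow graph), unreachable blocks are removed by a depth-first walk from the entry block, and the remaining program is executed with a single `while` loop and `if`/`else` tests, using a program-counter variable `pc` in place of jumps. This dispatcher form reflects the idea behind the Bohm and Jacopini result; it is not how restructuring tools produce readable code, since they additionally recognise loop and conditional patterns in the graph. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Statement = Callable[[State], None]
# A terminator ends each block: ("goto", label), ("if", test, then_label, else_label)
# or ("halt",). Blocks are the nodes of the program flow graph.
Block = Tuple[List[Statement], tuple]


def successors(terminator: tuple) -> List[str]:
    """Flow-graph edges leaving a block."""
    if terminator[0] == "goto":
        return [terminator[1]]
    if terminator[0] == "if":
        return [terminator[2], terminator[3]]
    return []  # halt


def remove_unreachable(blocks: Dict[str, Block], entry: str) -> Dict[str, Block]:
    """Depth-first walk from the entry block; keep only the blocks it reaches."""
    seen, stack = set(), [entry]
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            stack.extend(successors(blocks[label][1]))
    return {label: block for label, block in blocks.items() if label in seen}


def run_structured(blocks: Dict[str, Block], entry: str, state: State) -> State:
    """Goto-free execution: one while loop and if/else, with pc naming the next block."""
    pc = entry
    while pc is not None:
        statements, terminator = blocks[pc]
        for statement in statements:
            statement(state)
        if terminator[0] == "goto":
            pc = terminator[1]
        elif terminator[0] == "if":
            pc = terminator[2] if terminator[1](state) else terminator[3]
        else:
            pc = None
    return state


def assign(var: str, expr: Callable[[State], int]) -> Statement:
    """Build a statement 'var := expr'."""
    def statement(state: State) -> None:
        state[var] = expr(state)
    return statement


# Sum 1..n written with labels and gotos; DEAD is never jumped to.
program: Dict[str, Block] = {
    "START": ([assign("total", lambda s: 0), assign("i", lambda s: 1)], ("goto", "TEST")),
    "TEST": ([], ("if", lambda s: s["i"] <= s["n"], "BODY", "DONE")),
    "BODY": ([assign("total", lambda s: s["total"] + s["i"]),
              assign("i", lambda s: s["i"] + 1)], ("goto", "TEST")),
    "DEAD": ([assign("total", lambda s: -1)], ("goto", "DONE")),
    "DONE": ([], ("halt",)),
}

pruned = remove_unreachable(program, "START")
print(sorted(pruned))                                      # ['BODY', 'DONE', 'START', 'TEST']
print(run_structured(pruned, "START", {"n": 4})["total"])  # 10
```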
Problems with automatic program re-structuring include:
1. Loss of comments If the program has in-line comments, these are invariably
lost as part of the re-structuring process.
2. Loss of documentation Similarly, the correspondence between external
program documentation and the program is also lost. In many cases,
however, both the comments and the documentation of a program are out-of-
-- Complex condition
if not (A > B and (C < D or not (E > F))) ...
-- Simplified condition
if A <= B or (C >= D and E > F) ...
[Figure 28.8: Condition simplification]
[Figure 28.9: Automated program restructuring]
had to run 23 separate copies of the system. They therefore decided to re-engineer
the system and its associated data.
3. Architectural evolution If a centralised system is migrated to a distributed
architecture, it is essential that the core of that architecture should be a data
management system that can be accessed from remote clients. This may
require a large data re-engineering effort to move data from separate files into
the server database management system. The move to a distributed program
architecture may be initiated when an organisation decides to move from
file-based data management to a database management system.
As with program re-engineering, there is a spectrum of approaches to data
re-engineering which reflect the reasons why data re-engineering may be required.
These are shown in Figure 28.10.

Figure 28.10 Approaches to data re-engineering
• Data cleanup: The data records and values are analysed to improve their
quality. Duplicates are removed, redundant information is deleted and a
consistent format is applied to all records. This should not normally require
any associated program changes.
• Data extension: In this case, the data and associated programs are re-engineered
to remove limits on the data processing. This may require changes to programs
to increase field lengths, modify upper limits on the tables, etc. The data itself
may then have to be rewritten and cleaned up to reflect the program changes.
• Data migration: In this case, data is moved into the control of a modern
database management system. The data may be stored in separate files or may
be managed by an older type of DBMS. This situation is illustrated in Figure 28.11.
Rickets et al. (Rickets, DelMonaco et al., 1993) describe some of the
problems with data which can arise in legacy systems made up of several
cooperating programs:
1. Data naming problems Names may be cryptic and difficult to understand.
Different names (synonyms) may be given to the same logical entity in
different programs in the system. The same name may be used in different
programs to mean different things.
2. Field length problems This is a problem when field lengths in records are
explicitly assigned in the program. The same item may be assigned different
lengths in different programs, or the field length may be too short to
represent current data. To work around short fields, other fields may have been
re-used in some cases, so the usage of a named data field across the programs in a
system is inconsistent.
3. Record organisation problems Records representing the same entity may be
organised differently in different programs. This is a problem in languages
like COBOL where the physical organisation of records is set by the
programmer and reflected in files. It is not a problem in languages like C++
or Java where the physical organisation of a record is the compiler’s
responsibility.
4. Hard-coded literals Literal (absolute) values, such as tax rates, are included
directly in the program rather than referenced using some symbolic name.
5. No data dictionary There may be no data dictionary defining the names used,
their representation and their use.
[Figure 28.11: Data migration]
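Several of these problems can be detected mechanically once the record layouts have been extracted from each program. The Python sketch below uses hypothetical layouts and an analyst-supplied synonym table to group field definitions into a rudimentary data dictionary, then reports synonyms and inconsistent field lengths. The program names, field names and lengths are invented for illustration.

```python
from collections import defaultdict

# Hypothetical record layouts recovered from three programs: field -> length.
LAYOUTS = {
    "PAYROLL": {"CUST-NAME": 20, "POST-CODE": 7},
    "BILLING": {"CNAME": 30, "POST-CODE": 8},
    "REPORTS": {"CUST-NAME": 20},
}
# Analyst-supplied synonym table: local name -> canonical dictionary name.
SYNONYMS = {"CNAME": "CUST-NAME"}


def build_dictionary(layouts, synonyms):
    """canonical name -> {program: (local name, field length)}"""
    dictionary = defaultdict(dict)
    for program, fields in layouts.items():
        for local, length in fields.items():
            dictionary[synonyms.get(local, local)][program] = (local, length)
    return dict(dictionary)


def synonyms_in_use(dictionary):
    """Logical items that different programs refer to by different names."""
    result = {}
    for name, uses in dictionary.items():
        names = sorted({local for local, _ in uses.values()})
        if len(names) > 1:
            result[name] = names
    return result


def length_conflicts(dictionary):
    """Logical items whose field length differs between programs."""
    return sorted(name for name, uses in dictionary.items()
                  if len({length for _, length in uses.values()}) > 1)


dictionary = build_dictionary(LAYOUTS, SYNONYMS)
print(synonyms_in_use(dictionary))   # {'CUST-NAME': ['CNAME', 'CUST-NAME']}
print(length_conflicts(dictionary))  # ['CUST-NAME', 'POST-CODE']
```

The synonym table itself still has to come from human analysis; the code only makes the consequences of each naming decision visible.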
As well as inconsistent data definitions, data values may also be stored in an
inconsistent way. After the data definitions have been re-engineered, the data values
must also be converted to conform to the new structure. Rickets et al. also describe
some possible data value inconsistencies. These are shown in Figure 28.12.
Detailed analysis of the programs that use the data is essential before data re-
engineering. This analysis should be aimed at discovering the function of identifiers
in the program, finding the literal values which should be replaced with named
constants, discovering embedded data validation rules and data representation
conversions. Tools such as cross-reference analysers and pattern matchers may be
used to help with this analysis. A set of tables should be created which show where
data items are referenced and the changes to be made to each of these references.
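A cross-reference analyser of this kind can be sketched in a few lines of Python. This is an illustration only: the tokenising regular expression and the keyword list are simplifying assumptions (a real COBOL analyser would need the full language grammar), and `cross_reference` and `constant_candidates` are hypothetical names. The resulting tables show where each data item and each literal is referenced, and which literals might be replaced by named constants.

```python
import re
from collections import defaultdict

# Numbers are tried first; names may contain hyphens, as in COBOL.
TOKEN = re.compile(r"(?P<num>\b\d+(?:\.\d+)?\b)|(?P<name>\b[A-Za-z][A-Za-z0-9_-]*\b)")
# Illustrative subset of language keywords to leave out of the cross-reference.
KEYWORDS = {"IF", "THEN", "ELSE", "END-IF", "COMPUTE", "MOVE", "TO"}


def cross_reference(lines):
    """Return two tables: identifier -> line numbers and literal -> line numbers."""
    names, literals = defaultdict(set), defaultdict(set)
    for lineno, text in enumerate(lines, start=1):
        for match in TOKEN.finditer(text):
            if match.group("num"):
                literals[match.group("num")].add(lineno)
            elif match.group("name").upper() not in KEYWORDS:
                names[match.group("name")].add(lineno)
    return ({k: sorted(v) for k, v in names.items()},
            {k: sorted(v) for k, v in literals.items()})


def constant_candidates(literals, trivial=("0", "1")):
    """Non-trivial literals are candidates for replacement by named constants."""
    return {value: where for value, where in literals.items() if value not in trivial}


program = [
    "COMPUTE TAX = GROSS-PAY * 0.25",
    "IF GROSS-PAY > 40000 THEN",
    "    COMPUTE TAX = TAX + 500",
    "END-IF",
]
names, literals = cross_reference(program)
print(names)                          # {'TAX': [1, 3], 'GROSS-PAY': [1, 2]}
print(constant_candidates(literals))  # {'0.25': [1], '40000': [2], '500': [3]}
```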
[Figure 28.12: Data value inconsistencies]
Figure 28.13 illustrates the process of data re-engineering, assuming that
data definitions are modified, literal values named, data formats re-organised and the
data values converted. The change summary tables hold details of all the changes to
be made, so they are used at all stages of the data re-engineering process.
In Stage 1 of this process, the data definitions in the program are modified to
improve understandability. The data itself is not affected by these modifications. It
is possible to automate this process to some extent, either by using pattern matching
systems such as awk (Aho, Kernighan et al., 1988) to find and replace definitions, or by
developing XML descriptions of the data (St Laurent and Cerami, 1999) and using these
to drive data conversion tools. However, some manual work is almost always
necessary to complete the process. The data re-engineering process may stop at this
stage if the goal is simply to improve the understandability of the data structure
definitions in a program. If, however, there are data value problems as discussed
above, Stage 2 of the process may then be entered.
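Stage 1 renaming can be sketched as a find-and-replace driven by the change summary table, in the spirit of the awk approach mentioned above but written in Python. The change table and data names below are hypothetical; the key point is that only whole data names are replaced, since COBOL names may contain hyphens.

```python
import re

# Hypothetical change summary table for Stage 1: old data name -> clearer name.
CHANGES = {"CNAME": "CUSTOMER-NAME", "AMT": "INVOICE-AMOUNT"}


def rename_definitions(lines, changes):
    """Replace whole data names only (so AMT changes but AMT-DUE does not)."""
    alternation = "|".join(sorted(map(re.escape, changes), key=len, reverse=True))
    # A name is delimited by any character that cannot appear inside a COBOL name.
    pattern = re.compile(rf"(?<![A-Za-z0-9-])(?:{alternation})(?![A-Za-z0-9-])")
    return [pattern.sub(lambda m: changes[m.group(0)], line) for line in lines]


source = [
    "       05 CNAME   PIC X(20).",
    "       05 AMT     PIC 9(6)V99.",
    "       05 AMT-DUE PIC 9(6)V99.",
]
for line in rename_definitions(source, CHANGES):
    print(line)
```

Only the data definitions change here; the stored data is untouched, which is why the process can stop after this stage.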
If an organisation decides to continue to Stage 2 of the process, it is then
committed to Stage 3, data conversion. This is usually a very expensive process.
Programs have to be written which embed knowledge of the old and the new
organisation. These process the old data and output the converted information.
Again, pattern matching systems may be used to implement this conversion.
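A minimal sketch of such a conversion program in Python, assuming a hypothetical fixed-width old record layout and a hypothetical new record type. The conversion embeds knowledge of both organisations: it slices fields out of the old layout, converts a balance held in pence to pounds, expands two-digit years using an assumed pivot year, and sets aside records that cannot be converted for manual clean-up.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical old fixed-width record layout: (start, end) column positions.
OLD_LAYOUT = {
    "number": (0, 5),    # customer number
    "name": (5, 20),     # name, padded with spaces
    "pence": (20, 28),   # balance in pence, zero padded
    "paid": (28, 34),    # last payment date as YYMMDD
}
CENTURY_PIVOT = 50  # assumption: YY below 50 means 20YY, otherwise 19YY


@dataclass
class Customer:
    """Hypothetical new record organisation."""
    number: int
    name: str
    balance: Decimal  # in pounds
    last_paid: date


def field(record: str, name: str) -> str:
    start, end = OLD_LAYOUT[name]
    return record[start:end]


def convert(record: str) -> Customer:
    """Read one record in the old organisation and build it in the new one."""
    yymmdd = field(record, "paid")
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    year = 2000 + yy if yy < CENTURY_PIVOT else 1900 + yy
    return Customer(
        number=int(field(record, "number")),
        name=field(record, "name").rstrip(),
        balance=Decimal(int(field(record, "pence"))) / 100,
        last_paid=date(year, mm, dd),  # raises ValueError for impossible dates
    )


def convert_all(records):
    """Convert every record; set aside those that cannot be converted."""
    converted, rejected = [], []
    for record in records:
        try:
            converted.append(convert(record))
        except ValueError:
            rejected.append(record)  # left for manual data clean-up
    return converted, rejected


old_records = [
    "00042" + "SMITH J".ljust(15) + "00012345" + "981231",
    "00043" + "JONES A".ljust(15) + "00000100" + "050230",  # 30 February
]
good, bad = convert_all(old_records)
print(good[0])
print(len(bad))  # 1
```

Using `Decimal` rather than floating point keeps converted monetary values exact.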
[Figure 28.13: The data re-engineering process]
KEY POINTS
• The objective of system re-engineering is to improve the system structure and
make it easier to understand. The cost of future system maintenance should
therefore be reduced.
• The re-engineering process includes activities such as source code translation,
reverse engineering, program structure improvement, program modularisation
and data re-engineering.
• Source code translation is the automatic conversion of a program written in one
programming language to another language. It may be necessary when the
original programming language is obsolete.
• Reverse engineering is the process of deriving a system’s design and
specification from its source code. Tools such as program browsers may be
used to assist this process.
• Program structure improvement involves replacing unstructured control
constructs such as gotos with while loops and simple conditional statements.
This can be done automatically.
• Program modularisation involves re-organising the source code of a program to
group related items together. This makes them easier to understand and
change.
• Data re-engineering may be necessary because of inconsistent data
management by the programs in a legacy system. The objective of data re-
engineering may be to re-engineer all programs to use a common database.
• The costs of data re-engineering are significantly increased if existing data has
to be converted to some new format.
FURTHER READING
‘Examining Data Quality’. This special section includes a number of papers which
discuss data quality issues and the impact of poor data quality. (G. K. Tayi and D.
P. Ballou, Comm. ACM, 41 (2), Feb. 1998).
Software Re-engineering This is an IEEE tutorial that includes most of the
important papers on re-engineering which were published before 1992. Many of the
papers referenced in this chapter are re-printed in it. (R. S. Arnold, IEEE Press,
1994).
‘DoD Legacy Systems: Reverse Engineering Data Requirements’ This is a good
description of the practical problems which arise with legacy systems. The paper
focuses on data re-engineering where systems managing similar but incompatible
data were combined. Other papers in this special issue on reverse engineering are
also relevant. (P. Aiken, A. Muntz, R. Richards, Comm. ACM, 37 (5), May
1994).
EXERCISES
28.1 Under what circumstances do you think that software should be scrapped and
re-written rather than re-engineered?
28.2 Compare the control constructs (loops and conditionals) in any two
programming languages which you know. Write a short description of how
to translate the control constructs in one language to the equivalent
constructs in the other.
28.3 Translate the unstructured routine shown in Figure 28.14 into its structured
equivalent and work out what it is supposed to do.
28.4 Write a set of guidelines that may be used to help find modules in an
unstructured program.
28.5 Suggest meaningful names for the variables used in the program shown in
Figure 28.14 and construct data dictionary entries for these names.
28.6 What problems might arise when converting data from one type of database
management system to another (e.g. hierarchical to relational or relational to
object-oriented)?
28.7 Explain why it is impossible to recover a system specification by
automatically analysing system source code.
28.8 Using examples, describe the problems with data degradation which may have
routine BS (K, T, S, L)
      B := 1
NXT:  if S >= B goto CON
      L := -1
      goto STP
CON:  L := INTEGER (B / S)
      L := INTEGER ((B+S) / 2)
      if T (L) = K then return
      if T (L) > K then goto GRT
      B := L+1
      goto NXT
GRT:  S := L-1
      goto NXT
STP:  end
[Figure 28.14: An unstructured program]