
Automated Regression Test Generation Using Record & Replay Approach:
A Case Study on Train Control Management System
Adnan Čaušević, Mälardalen University, Västerås, Sweden, adnan.causevic@mdh.se
Rikard Land, Maximatecc AB, Västerås, Sweden, rikard.land@maximatecc.com
Ola Sellin, Bombardier Transportation, Västerås, Sweden, ola.sellin@se.transport.bombardier.com
Abstract: Organizations tend to limit their investment in test automation because they lack information on how often tests are actually reused and when automated tests will really pay off. However, to perform efficient regression testing of software systems, a development team is expected to have a certain level of test automation infrastructure in place, where at a minimum the test execution is scripted and automated.

In this paper we propose the use of a record & replay approach to observe the functional usage of a component under test while it is being invoked as part of the whole system or only a certain portion of it. Afterwards, executable tests are automatically derived, containing both test inputs and a test verdict, allowing their later use in regression testing. With as little effort as one manual test execution, developers are provided with automated tests, minimizing any concerns about the investment in automation. A case study from Bombardier Transportation is provided, showing how the proposed approach substantially reduced the test effort needed when performing regression testing of the train control management system for the Stockholm C30 metro train.

I. INTRODUCTION

With every new release of a software system, a dedicated testing activity should be performed in order to verify that changes to the code, due to fault fixes or the introduction of new features, did not corrupt any of the existing functionality. This activity is commonly referred to as regression testing. Because of its repetitive nature, re-verifying over and over that the same functionality still behaves correctly, the efficiency of regression testing relies heavily on the level of automation an organization possesses. Depending on the application domain and the maturity level of a specific organization, this can vary substantially.

Test automation in industry is often perceived as a means to perform automated execution of the system under test (SUT) by unattended running of a set of test scripts. The execution of a single test script usually involves bringing the SUT into a certain state by invoking it with a predefined set of input values. A verdict, or decision on whether a test has passed or not, also known as a test oracle, could be automated as well. However, this might require an additional investment in automation depending on the testability of the SUT. Testability defines how observable the states of the SUT are when performing testing and, for some organizations, manual inspection of log files could be the only way to reach a verdict on a test.

The goal of test automation is to progress toward automating all of its elements: creation of test scripts, test execution, generation of test inputs and definition of a test oracle. If some of these elements are not automated, the cost of performing regression testing could increase substantially, as additional manual effort would be required with every new software release. On the other hand, it is rather challenging for companies to change how things are currently done, and especially to make a significant jump in their own test automation capabilities. Large-scale organizations are not eager to change things, particularly if doing so introduces additional effort for their employees. This is why any non-intrusive approach is very welcome for consideration.

In this paper we propose the use of a record & replay approach to observe the functional usage of a component under test (CUT) while it is executed as part of the whole system or only a certain portion of it. The information sampled during the recording phase is sufficient to automatically generate test scripts, containing both test inputs and a test oracle, and to perform their automated unattended execution. The approach had a positive impact and was seen as an enabler of more efficient regression testing of the Train Control Management System (TCMS) for the Stockholm C30 metro train, developed at the Bombardier Transportation site in Västerås, Sweden.

The paper is organized as follows. Section II provides background information on the specifics of the TCMS development environment, including its simulation technology. Section III discusses related work, followed by section IV, which provides details regarding the record & replay approach for automated test generation. The component test process is outlined in section V, while the case study design, execution and results are presented in section VI. Section VII discusses possible future directions, followed by section VIII, which highlights conclusions on the work presented in this paper.
Fig. 1. Bombardier's Mitrac - Train Control Management System (TCMS)¹

Fig. 2. An example of an FBD (IEC 61131-3) program structure

II. BACKGROUND

The general perception of the intended usage of software has changed over the last decades. Often, the purpose of software is perceived as a medium that helps us create spreadsheets, edit graphical content or simply entertains us in the form of games or multimedia. However, people may not be fully aware of the extent to which software plays a critical role in our lives. Today, software is used to control the brakes of a car, the speed of a train or the altitude of an airplane. The main reason for this change in software usage has been the increased need for more complex functionality in products usually perceived as physical, or hardware-oriented. Nowadays companies use general purpose or custom-made hardware modules from suppliers, on top of which their own in-house developed software solution (usually referred to as embedded software) often controls the main functionality of the final product. Programmable logic controllers (PLCs) are examples of such systems.

A typical PLC system consists of a processor, memory, and a communication bus. Its main advantage is built-in support for input and output (I/O) communication. Having sensor values as inputs and actuator systems as outputs, PLCs are ideal for monitoring and supervising control systems. A program running on a PLC is almost always executed in a cyclic loop with three distinct phases: (i) reading all inputs from an input vector, (ii) executing the program computation without any interrupts, and (iii) writing updated values to the output vector.

One example of a PLC product is Bombardier Transportation's Mitrac¹ Train Control and Management System (TCMS) depicted in figure 1. TCMS is a high-capacity infrastructure backbone built upon open standard IP technology that allows easy integration of all control and communication functions on board the train. It is considered the center of the distributed system that controls the train. Examples of functions controlled by TCMS include collecting line voltage, controlling the train engines, opening and closing the train doors, and uploading diagnostic data. TCMS is developed following the IEC 61131-3 standard [1] for programming PLC controllers. In particular, Function Block Diagrams (FBD) are used to implement most of the functionality of TCMS, while the Structured Text (ST) programming language is used for the development of user-specific libraries.

¹ http://www.bombardier.com/en/transportation/products-services/propulsion-controls/products/train-control-and-management-system.html

FBD is a graphical programming language that is well suited to developers with a background in control engineering. The behavior of a program in FBD is described by connecting input signals to a network of functions and function blocks which compute the resulting outputs. An example of an FBD program structure is shown in figure 2. It is important to distinguish the main difference in behavior between functions (for example the AND and OR elements in fig. 2) and function blocks (for example the TP, TON and RS elements in fig. 2) in an FBD program. Functions always compute the same output for a given input vector. Function blocks, however, have internal states, causing the output to differ even if the input vector is the same. This is because the outputs of function blocks depend on the ordering within the sequence of input vectors provided to them, but also on when (in time) a specific input vector is provided to the system and for how long (duration) it had that value.

Fig. 3. Maximatecc Simtecc Simulation Technology

Fig. 4. General Architecture of the TCMS [4]

This behavior of FBD programs emphasizes the need for a structured approach to testing them. In particular, re-testing requires well-defined test steps that clearly indicate the sequence of values and the timing information needed to successfully observe proper functioning of the underlying component under test (CUT). This is a challenging task, considering that FBD components are part of a bigger system running on dedicated PLC hardware. To avoid these problems, simulation techniques are extensively used in software testing of embedded products [2]. Simulation enables execution of embedded software on a general purpose PC, enabling the use of existing debugging and test automation solutions. In addition, it also helps improve the testability of these systems by exposing internal input and output signals.
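The cyclic execution model and the difference between functions and function blocks described in this section can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `and_gate`, `TonTimer` and `run_scan_cycles` are hypothetical names, and the timer is a simplified on-delay block that counts scan cycles instead of measuring time as a real IEC 61131-3 TON block does.

```python
from dataclasses import dataclass
from typing import List, Tuple


def and_gate(a: bool, b: bool) -> bool:
    """Function: stateless, the same inputs always give the same output."""
    return a and b


@dataclass
class TonTimer:
    """Simplified on-delay function block (assumption: delay counted in cycles).

    The output becomes True once the input has been True for
    `preset_cycles` consecutive scan cycles; a False input resets it.
    """
    preset_cycles: int
    elapsed: int = 0  # internal state kept between scan cycles

    def step(self, in_signal: bool) -> bool:
        if in_signal:
            self.elapsed = min(self.elapsed + 1, self.preset_cycles)
        else:
            self.elapsed = 0
        return self.elapsed >= self.preset_cycles


def run_scan_cycles(input_vectors: List[Tuple[bool, bool]],
                    timer: TonTimer) -> List[bool]:
    """Run one scan cycle per input vector and collect the output."""
    outputs = []
    for enable, request in input_vectors:             # (i) read all inputs
        q = timer.step(and_gate(enable, request))     # (ii) execute program
        outputs.append(q)                             # (iii) write outputs
    return outputs


# The same input vector (True, True) yields different outputs over time,
# because the function block remembers how long its input has been set.
print(run_scan_cycles([(True, True)] * 3, TonTimer(preset_cycles=3)))
# -> [False, False, True]
```

This is why a test for such a component has to fix not only the input values but also their order and timing.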
In the following subsections we briefly explain how a simulation environment was created at Bombardier Transportation as part of their SoftTCMS framework by utilizing the SimTecc simulation technology by Maximatecc.

A. Maximatecc's SimTecc

The technical concept of the SimTecc simulation environment, formerly known as CCSimTech [3], is based on a software implementation of device drivers. Examples are reading and writing of inputs and outputs, accessing the internal memory of the device, and communication via a controller area network (CAN) bus or local area network (LAN). The embedded software is developed to use a well-defined interface, the so-called Hardware Abstraction Layer (HAL), which itself delivers all data to and from the hardware. Figure 3 visualizes this concept. In the soft environment the drivers are exchanged, and the new drivers do not access real hardware but instead communicate with a virtual software layer. The embedded software (application) is exactly the same as on the target system, using exactly the same interface. However, the soft drivers now enable the entire software to run on a standard PC. These virtual drivers are the core of Maximatecc's simulation environment SimTecc. For the scope of this paper, we focus on one specific driver called SimIO.

The SimIO driver serves to simulate hardware IO, often used by embedded applications to communicate with peripheral units such as motors, controls, switches etc. Maximatecc's implementation of simulated IO uses shared memory created by a dynamic-link library (DLL) file. This memory contains information about the IO signal, such as its name and value, and any other node in the system (that has instantiated the same DLL file) can access this value. This way it is possible to create a new IO signal (e.g. analogue, digital, pulse-width modulation (PWM), or any other signal of a pulse nature) that can be read or set by other processes.

B. Bombardier's SoftTCMS

The Bombardier Transportation site in Västerås, Sweden, delivers software for Vehicle Control Units (VCU) and Intelligent Display Units (IDU) to be used on board the train. The intended purpose of this software, once it is loaded into the VCUs, is to communicate with various train subsystems, among others: brakes, doors, air control units, lights, etc. A general architecture of TCMS is depicted in figure 4. Palmieri et al. [4] present a more high-level description of the various TCMS elements and their intended usage.

As previously mentioned, TCMS is considered a Programmable Logic Controller (PLC) unit and as such it is developed following the IEC 61131-3 standard, specifically in the FBD language. In order to simulate VCUs and execute the corresponding TCMS software loaded on top of them, Bombardier invested in creating several applications for the purpose of replicating VCU behavior on a general purpose PC. This allowed execution of the same TCMS software in both the real and the simulated environment. The simulated counterpart was named SoftTCMS.

The MITRAC CC tool is used for development but also for compilation of the code for the real platform. This is where the processes for the real and the simulated compilation diverge. The simulated counterpart modifies the code to support its compilation for the PC platform (specifically the Microsoft Windows environment) but also adds various glue code to it. Among other things, this enables use of the SimTecc simulation environment, specifically the SimIO driver. As a result, the testability of TCMS in the simulated environment increases significantly, enabling efficient debugging and testing processes.

For debugging activities, Bombardier previously developed a tool called DCUTerm. What is interesting about this tool is that it was created prior to the SoftTCMS environment and its main purpose was to communicate with and debug the real hardware VCUs. Now, the same tool is used for communication with the simulated environment without any modifications.

In order to perform unit, integration and system level testing of the TCMS software, a dedicated tool was created, called Mitrac Desktop Component Tester (MDCT). Its intended usage is to visualize the FBD execution flow and allow manual interaction with TCMS on a signal level.
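The driver exchange behind this simulated environment, described in subsection A, can be pictured with a small Python sketch. It is a loose illustration only and does not reflect the actual SimTecc or SimIO interfaces: `IoDriver`, `SimIoDriver`, `door_application` and all signal names are hypothetical, and a plain dictionary stands in for the shared memory that holds named signal values.

```python
from typing import Dict, Protocol


class IoDriver(Protocol):
    """Hypothetical hardware abstraction layer seen by the application."""

    def read(self, name: str) -> int: ...

    def write(self, name: str, value: int) -> None: ...


class SimIoDriver:
    """Soft driver: named signals live in a dict instead of real hardware.

    Assumption: the dict plays the role of the shared memory, so a test
    tool holding the same object can read or set any signal by name.
    """

    def __init__(self) -> None:
        self.signals: Dict[str, int] = {}

    def read(self, name: str) -> int:
        return self.signals.get(name, 0)

    def write(self, name: str, value: int) -> None:
        self.signals[name] = value


def door_application(io: IoDriver) -> None:
    """Application code: unchanged whether `io` is a real or a soft driver."""
    may_open = io.read("door_button") and not io.read("train_moving")
    io.write("door_open_cmd", int(bool(may_open)))


# A test tool sets inputs through the soft driver and observes the output.
sim = SimIoDriver()
sim.write("door_button", 1)
door_application(sim)
print(sim.read("door_open_cmd"))
```

Because the application only depends on the abstraction layer, the same code can run against real drivers on the target and against soft drivers on a PC, which is what exposes the signals to tools on the PC side.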
III. RELATED WORK

IEEE defines regression testing as "selective retesting of a system or component to verify that modifications have not caused unintended effects and that the system or component still complies with its specified requirements" [5]. Manual regression testing on a subsystem level is a very tedious and time-consuming task when requirements are specified from a user or system level perspective [6].

Record & replay is a well-known approach to software testing that has been researched and is in use. Its most common usage scenario in software testing is automated graphical user interface (GUI) testing [7]. Reproducing software execution for debugging purposes is another area where record and replay has already been investigated [8].

In [9] Bauer et al. demonstrate an approach that supports a complete chain from a requirements document to a statistical test report with a very high degree of automation. The initial part requires some human involvement in order to convert requirements into a precise specification and to develop the PROVEtech TA test runner files. They were able to provide fully automated test case generation, test execution, and test evaluation based on a system usage model. The model has been applied to a real mirror control unit of a car door for reliability estimations, where messages were broadcast via a CAN bus. The presented work describes a practical way of using rigorous methods to achieve a high degree of automation.

In order to provide better understanding and easier debugging of multi-threaded distributed Java applications, Konuru et al. have developed a replay of the recorded application execution using logical thread schedules and logical intervals [10]. The described approach is based on an existing system called DejaVu that provides deterministic replay of multi-threaded Java programs on a single Java Virtual Machine (JVM). The framework described in [10], however, is developed to support distributed Java applications running on multiple JVMs. The approach is intended to be independent of the underlying operating system and not to require any modifications to the user application in order to enable replay.

In [11] Netzer et al. present an approach that enables tracing and replaying the order in which messages in a program are executed while debugging it. In this way the program is forced to reproduce messages in their original order. The approach focuses only on the messages that are part of the trace to be analyzed, using a built-in run-time decision mechanism. It introduces a certain level of adaptivity through continuous decision making about what to trace. The approach reduces trace size and enables debugging with substantially lower execution time overhead.

Ronsse et al. [12] describe a system called RecPlay that enables record/replay of shared memory applications containing no data races (a program error caused by missing synchronization operations). They record the order of all synchronization operations using Lamport timestamps [13]. During the replay phase, the file with timestamps is consulted in order to track synchronization operations. In this way, the execution of a synchronization operation is delayed until all synchronization operations with a smaller timestamp have been executed. This mechanism ensures that all synchronization operations are executed in an order that guarantees a faithful re-execution if no data races are present.

The intended purpose of most of the previously presented academic contributions is either to perform regression testing on a GUI level by recording mouse movements and keyboard input, or to replicate the exact behavior of the system in order to localize a fault, a process known as debugging. Although the principle of recording and replaying is the same, the main difference in our approach is that we expect the system to function correctly, for a given set of requirements, when the recording is performed. Afterwards, when replicating the system execution, we do so not for the purpose of finding faults but to confirm that the system still functions correctly for the same set of requirements on a new system release.

IV. RECORD & REPLAY

In this section we present a high-level overview of the record & replay approach used for automated generation of regression tests, as depicted in figure 5. The overall idea is that when a specific subsystem or component is being functionally tested, either in isolation or when used by a complete system, we record the interactions with that component. This way, once we have a new version of the component, we can replay those interactions in isolation and in an automated way, and thus validate whether any changes introduced to the component corrupt previously working functionality. To elaborate on this approach, three subsections follow, each discussing a distinct phase of the approach, namely: (i) the recording phase, (ii) the replaying phase, and (iii) the test verdict (oracle) phase.

Fig. 5. Overview of the record & replay approach (figure labels: Application layer, Subsystem, Recorder, Replay, Hardware layer)

A. Recording

In order for the component under test (CUT) to be correctly observed during its execution, all input signals that could provoke a change in the behavior of the CUT, and all output signals that could change as a result of the behavior of the CUT, must be monitored. In most cases this refers to the signals directly connected to the component itself. However, there are cases where the internal state of the CUT could be changed by provoking another component via a specific signal and
making an indirect change through shared memory or a similar concept. In such cases, those specific signals must be monitored as well.

Monitoring, or sampling, of the selected signals has to be done relative to a global time-related or cycle-related unique signal. This could be, for example, a global clock or a cycle counter in the system. Precision of the sampling is of course very important, but too precise a recording may result in an overflow of unnecessary information, making it rather difficult to generate tests efficiently. If a system is designed to be executed at millisecond precision, there is no need to sample the recording at the microsecond level.

Another important aspect of the recording phase, which also contributes to more efficient test generation, is selective data sampling. This means that we do not save all values of the monitored signals throughout their execution lifespan, but only the values at the times when there was a change in the signals being monitored. The change could be that one or more signals at the input of the CUT have a new value, and/or one or more signals at the output of the CUT have a new value. As soon as this is detected, a new row in the recording structure needs to be populated.

The structure of the recording is presented in Table I. In addition to the sampled inputs, outputs and the unique signal values, each row contains information called Oracle and Invocation. Oracle is a Boolean flag indicating whether a specific row in the recording structure should be used for a verdict; it is set whenever there is a change on any of the monitored outputs. Invocation is also a Boolean flag, indicating whether a specific row in the recording structure should be used for provoking the CUT during the replaying phase; it is set whenever there is a change on any of the monitored inputs.

TABLE I
RECORDING STRUCTURE

Unique Signal   Sampled Inputs                                        Sampled Outputs                                          Oracle         Invocation
Value 1         {{In 1,Value}, {In 2,Value}, ..., {In M,Value}}       {{Out 1,Value}, {Out 2,Value}, ..., {Out K,Value}}       {True/False}   {True/False}
Value 2         {{In 1,Value}, {In 2,Value}, ..., {In M,Value}}       {{Out 1,Value}, {Out 2,Value}, ..., {Out K,Value}}       {True/False}   {True/False}
...
Value N         {{In 1,Value}, {In 2,Value}, ..., {In M,Value}}       {{Out 1,Value}, {Out 2,Value}, ..., {Out K,Value}}       {True/False}   {True/False}

B. Replaying

The replaying phase, as its name suggests, serves the purpose of exercising the CUT in the same manner as was done during the recording phase. When replaying, only a subset of the sampled recording structure is used. Essentially, every recorded sample that has the Invocation flag set to True is used for replaying. The idea is to provoke the CUT by setting the signals to specific values at specific points in time. To achieve that, it is important to keep track of the relative offset of the unique signal used as a timing reference. For example, during the recording phase, the first row in the recording structure could have been observed when the unique signal had a value of 3000. This value can represent the number of cycles passed since the start of the execution. When replaying, we do not have to wait for the system to reach 3000 cycles in order to start invoking our CUT. We can provoke the CUT immediately after it is initialized, but every subsequent invocation of the CUT has to happen relative to that number.

What is specific about the replaying phase is that while we are provoking the CUT, by automated replaying of previously recorded signal values, we are also performing a new recording of the CUT in parallel. This new recording is done in exactly the same way as in the first phase, since the recording itself cannot distinguish whether the CUT is provoked by a user, by another component or by a previously recorded sample. By doing so we create a new structure of recorded samples that can be analyzed and compared with the original recording, thus enabling the creation of an automated test oracle.

C. Test Oracle

Once we have recorded structures from both the recording phase and the replaying phase, we can devise a test verdict. This is done by comparing the two structures as described in the following steps:
1) Both structures are filtered such that only rows which have the Oracle flag set to True are left for the analysis (as per table I).
2) Values of the unique signal are compared, keeping in mind that the relative offset is important rather than the actual value.
3) Values of each and every input and output signal are compared between the structures.
4) If any of the above comparisons does not hold, the test is reported as failed. Otherwise, the test passes.

For systems of a timing-dependent and stateful nature it is very important to preserve the order of the input signals which are invoking a CUT. In addition to the exact sequence of input vectors, it is also important to maintain the same timing offset from a single reference point, such that input signals are not only provided in a specific order but also at a specific time. As a result, it is rather difficult to create a flexible test oracle. Essentially, any time-related deviation when observing outputs may result in a false positive test case: a failing test case which does not actually reveal any fault in the system; the signal was just set late. For that reason it is important to consider introducing a tolerance factor when performing automated test oracle generation. This factor could
be based on the unique signal values, such that there is a tolerance in when specific output values are observed. For example, with microsecond precision of the unique signal, we could state that our system tolerates deviations of ±10 microseconds when observing changes to the output signals.

V. COMPONENT TESTING PROCESS

Before presenting the case study design, how it was executed and the results we collected in the following section, we first need to elaborate on how the current component testing process is performed at Bombardier Transportation.

TCMS is considered a safety-critical product used in the railway domain, and thus its development, testing and safety assessment obligations are regulated by the EN50128 standard [14]. Among other requirements, EN50128 mandates functional testing of the software on the lowest (unit) level. In the case of Bombardier, a unit is a function block diagram (FBD) implementation in the IEC 61131-3 standard which is not composed of other FBDs (unless they belong to the library of already tested, reusable FBDs). What is important to point out here is that unit level testing, or component level testing as it is referred to internally at Bombardier, is performed by the developers themselves. This should by no means be a surprise, considering that in almost all programming environments unit level testing is in general a responsibility of the developer. However, the lack of a unit testing framework for function block diagrams makes it rather difficult, if not impossible, to have any sort of automation support. This is why the component testing process is currently devised as follows:
1) Based on the list of provided requirements, a developer creates a Software Component Test Specification document.
2) Based on the same list of provided requirements, an implementation in the FBD language is created by a developer.
3) Functional testing of the component is performed manually by following the Software Component Test Specification document and using the MDCT tool (presented in figure 6).
4) Results of functional testing are documented in an Excel sheet and also in a new document called Software Component Test Record.

Fig. 6. Mitrac Desktop Component Tester - MDCT tool visualizing FBD execution flow and allowing developers to manually override a signal value when performing functional testing.

The main reason why so much attention is given to various documents throughout the development of a component is the safety auditing process. This process is mandated for safety-critical products like TCMS. The main problem, however, is that these documents cannot be used or re-used for any other purpose. This becomes especially apparent when a component needs to be updated due to a change request. In such a situation, the following development process is in place:
1) Based on the change request, an update is made to the Software Component Test Specification document.
2) The actual change is implemented in the FBD code.
3) Complete functional testing of the component is performed manually by following the Software Component Test Specification document and using the MDCT tool.
4) New results of functional testing are documented in an Excel sheet and also in a new Software Component Test Record document.

This means that any single update to any existing component requires a full re-testing activity, which at the moment can only be done manually. However, just because additional manual effort is spent on each update, this does not imply that it has to be immediately automated. Investing in automation is very much dictated by the number of re-runs that could potentially occur. For this purpose we investigated the ratio of component updates for the existing C30 project, currently being developed, and compared it with the finalized VZI300 Zefiro (high speed train) project. Results are presented in table II.

TABLE II
OCCURRENCE OF UPDATES ON A COMPONENT LEVEL

Updates   C30 Project (in %)   VZI300 Zefiro Project (in %)
0         57.46                65.71
1         28.95                0.00
2         4.98                 18.22
3         5.88                 8.87
≥4        2.63                 7.20

At first it may seem that the figures provided do not really motivate any effort in automation. However, Bombardier Transportation always tries to reuse components from previous projects, leading to a situation where quite a number of components do not have any change requests all the way up to their usage on the real train. But those that do have updates are the ones specific to a train project, and in most cases they have a higher complexity (in terms of number of inputs, outputs, function blocks, etc.). Higher complexity naturally leads to a higher effort when performing manual testing.
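To make concrete how a single manually executed test could stand in for this repeated manual effort, the three phases of section IV (recording with selective sampling, replaying relative to the unique signal, and a test oracle with a tolerance factor) can be sketched in Python. This is a minimal sketch under stated assumptions, not the tool used in the case study: `Row`, `record`, `replay`, `verdict` and `make_cut` are hypothetical names, the CUT is modelled as a function called once per cycle, and the unique signal is assumed to be a cycle counter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signals = Dict[str, int]


@dataclass
class Row:
    tick: int          # value of the unique signal (assumed cycle counter)
    inputs: Signals    # sampled inputs
    outputs: Signals   # sampled outputs
    oracle: bool       # True if a monitored output changed in this row
    invocation: bool   # True if a monitored input changed in this row


def record(cut_step: Callable[[Signals], Signals],
           stimuli: Dict[int, Signals], start_tick: int, n_ticks: int) -> List[Row]:
    """Run the CUT and store a row only when an input or output changes."""
    rows: List[Row] = []
    inputs: Signals = {}
    prev_in = prev_out = None
    for tick in range(start_tick, start_tick + n_ticks):
        inputs = {**inputs, **stimuli.get(tick, {})}
        outputs = cut_step(inputs)
        in_changed, out_changed = inputs != prev_in, outputs != prev_out
        if in_changed or out_changed:  # selective data sampling
            rows.append(Row(tick, dict(inputs), dict(outputs), out_changed, in_changed))
        prev_in, prev_out = inputs, outputs
    return rows


def replay(cut_step: Callable[[Signals], Signals], recording: List[Row],
           start_tick: int = 0, extra_ticks: int = 0) -> List[Row]:
    """Re-apply Invocation rows at the same relative offsets, recording anew."""
    base = recording[0].tick
    stimuli = {start_tick + r.tick - base: r.inputs for r in recording if r.invocation}
    n_ticks = recording[-1].tick - base + 1 + extra_ticks
    return record(cut_step, stimuli, start_tick, n_ticks)


def verdict(original: List[Row], replayed: List[Row], tolerance: int = 0) -> bool:
    """Compare Oracle rows by relative offset (within tolerance) and by values."""
    a = [r for r in original if r.oracle]
    b = [r for r in replayed if r.oracle]
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        offset_x = x.tick - original[0].tick
        offset_y = y.tick - replayed[0].tick
        if abs(offset_x - offset_y) > tolerance:
            return False
        if x.inputs != y.inputs or x.outputs != y.outputs:
            return False
    return True


def make_cut(delay: int) -> Callable[[Signals], Signals]:
    """Hypothetical stateful CUT: 'out' is 1 once 'req' was 1 for `delay` cycles."""
    state = {"count": 0}

    def step(inputs: Signals) -> Signals:
        state["count"] = state["count"] + 1 if inputs.get("req", 0) else 0
        return {"out": int(state["count"] >= delay)}

    return step


# One "manual" run starting at cycle 3000, then replays on two new releases.
original = record(make_cut(3), {3000: {"req": 1}, 3010: {"req": 0}}, 3000, 20)
print(verdict(original, replay(make_cut(3), original)))               # True
print(verdict(original, replay(make_cut(4), original)))               # False
print(verdict(original, replay(make_cut(4), original), tolerance=1))  # True
```

The last two lines show the effect of the tolerance factor: a release that sets the output one cycle late fails a strict comparison but passes once a deviation of one cycle is tolerated.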
VI. C ASE S TUDY

In order to quantify potential benefits that could be gained
by using the record & replay approach for automated test
generation, a case study was performed on the train control
and management system (TCMS) product for the Stockholm
C30 metro train developed by Bombardier Transportation site
in Västerås, Sweden.
A. Case Study Design

The goal of this case study is to evaluate the applicability of the record & replay approach on a real industrial system that had actual change requests, each of which required developers to perform regression testing. The focus is on: (i) estimating the effort that could be saved when performing regression testing, and (ii) evaluating the correctness of the proof-of-concept implementation. To achieve these goals, the case study was designed around the following steps:

1) Implementation of the proof-of-concept tool supporting the record & replay approach for a given industrial system.
2) Selection of the component under test (test object).
3) Recording of the tests for the initial release of the CUT, based on the Software Component Test Specification.
4) Replaying recorded tests on every subsequent release of the CUT.
5) In case some tests fail at a specific release, a new recording is done for that specific test based on the Software Component Test Specification for that release.

Details on each of the above steps are presented in the following subsections.

B. TestRecorder implementation

To practically support monitoring, recording and replaying of signals in the TCMS environment, a dedicated tool was created. The TestRecorder interface is depicted in Figure 7. The inputs and outputs areas are drag-and-drop enabled, allowing developers to easily decide which signals to select from the MDCT tool for monitoring.

Fig. 7. TestRecorder tool - A proof-of-concept implementation for the record & replay approach used within the case study.

The Open Test button opens previously saved tests, either for replaying or for a new recording in case the same signals can be used again. The Record and Replay buttons are implemented according to the description of the record & replay approach in Section IV. The Save as Test button saves the recorded test for later use. Below the buttons, an information panel displays the result of a test replay execution as a pass or a fail message.

C. Test Object Selection

Selection of the test object for the case study is based on the following criteria:
• number of change requests it had,
• number of input and output signals it is designed to handle, and
• number and diversity of function blocks it contains.

The component selected for the case study is part of the Traction Brake sub-level function group. It is named DBC LV MchSup and its purpose is:

To supervise the master controller handle of the train and calculate the traction/brake reference from the master controller handle unit. Additionally, it prioritizes the inputs from the master controller unit and validates the status of a master controller. Finally, it generates events for deviations of micro switches from the master controller and for master controller handle cutouts.

When it comes to the change requests, this component had in total 6 versions² internally released for testing. The first release was introduced on the 12th of September, 2014 and the last modification was done on the 23rd of August, 2015.

The complexity of the selected CUT, expressed in terms of the number of signals and function blocks, is listed here:
• 16 input signals - 14 Boolean and 2 integer types
• 9 output signals - 6 Boolean and 3 integer types
• 14 parameters (constants) - 1 Boolean, 1 time and 12 integer types
• 14 function blocks (stateful elements in FBD)
• 38 functions (stateless elements in FBD)

When developers at Bombardier were asked internally to what extent this component would be a good representative, they stated that it is of higher complexity than an average FBD diagram.

² This was at the moment of publication submission for a review process. Development on the C30 TCMS project was not completed at that time.
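As an illustration only, the five-step procedure from the Case Study Design subsection can be sketched in a few lines of Python. This is not the TestRecorder implementation: a release of the CUT is modelled here as a plain stateless function from input signals to output signals, and the names Recording, record, replay and run_case_study are hypothetical. The sketch also assumes that a re-recording reuses the same stimuli, whereas in the case study a new recording follows the test specification of the failing release.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, int]
# Hypothetical stand-in for one release of the CUT (stateless, no timing).
Component = Callable[[Signals], Signals]


@dataclass
class Recording:
    """One recorded test: each step pairs sampled inputs with observed outputs."""
    steps: List[Tuple[Signals, Signals]]


def record(cut: Component, stimuli: List[Signals]) -> Recording:
    # Apply every stimulus once and store what the CUT answered.
    return Recording([(dict(s), cut(s)) for s in stimuli])


def replay(cut: Component, rec: Recording) -> bool:
    # The recorded outputs act as the oracle: pass only if all are reproduced.
    return all(cut(inputs) == outputs for inputs, outputs in rec.steps)


def run_case_study(releases: List[Component], stimuli: List[Signals]) -> List[str]:
    rec = record(releases[0], stimuli)          # step 3: initial recording
    verdicts = ["Initial Recording"]
    for cut in releases[1:]:                    # step 4: replay on each release
        if replay(cut, rec):
            verdicts.append("Pass")
        else:                                   # step 5: re-record on failure
            rec = record(cut, stimuli)
            verdicts.append("Fail / New Recording")
    return verdicts


if __name__ == "__main__":
    old = lambda s: {"ref": s["handle"]}        # toy behaviour of early releases
    new = lambda s: {"ref": s["handle"] * 100}  # toy behaviour after a change request
    print(run_case_study([old, old, new, new], [{"handle": 10}, {"handle": -5}]))
```

The verdict strings mirror the labels used for the test results, so one such list corresponds to one row of a results table. A faithful model of a PLC component would additionally need internal state and timing, which this sketch deliberately leaves out.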
D. Case Study Execution

After selection of the CUT, the case study was executed by first determining which previous internal releases of SoftTCMS contained changes to the selected CUT. Each release was individually compiled and verified to be usable by the Mitrac Desktop Component Tester (MDCT) tool.

The very first release in which the DBC LV MchSup CUT was introduced was selected as the starting point for the recording. Based on the Software Component Test Specification, six individual tests (Test 1 to Test 6) were recorded by setting specific signals to the stated values with the MDCT tool. Essentially, we were replicating the exact manual effort spent by developers.

Subsequently, for every following internal release that contained updates to the selected CUT, the recorded tests were replayed. For these activities, only machine effort was used. Figure 8 displays the trace logging done by the DCUTerm tool when a previously manually recorded test is automatically replayed. In case a specific test was failing on a given release, a new manual recording was done based on the Software Component Test Specification for that release and for that test. For every other test, which was passing, no additional actions were needed.

Fig. 8. Trace logging result when manually recorded Test 7 is replayed.

Since the Software Component Test Specification for the last internal release contained one additional test for the DBC LV MchSup CUT, it had to be manually recorded for the first time. However, there were no subsequent releases where this test could be replayed.

Results of the case study execution process are outlined and discussed in the following subsection.

E. Results

Table III displays the test execution results for the selected CUT. It contains seven tests spanning six releases. However, Test 7 was only introduced in the last release and no subsequent regression testing was done with it. Every test that was recorded for the first time has the label Initial Recording. A failing test required a new recording and thus such situations are labeled as Fail / New Recording. A passing test is labeled as Pass.

Looking at the occasions where regression testing had to be performed, we have in total 30 regression testing activities (all items except Initial Recording). Out of those 30 regression tests, 24 pass when automatically replayed using the proposed record & replay approach and the TestRecorder tool.

This means that 80% of the regression testing activities could have been performed completely automatically, without any manual effort, if the record & replay approach had been used from the very beginning of the component development.

It is also interesting to discuss the tests which were failing and the reasons for that. Test 6 was failing in Version 2 due to an incorrect implementation in the previous release. This of course required a new recording to be performed based on the new test specification. Afterwards, the new recording was passing on all subsequent releases. The Version 3 release had the most failing tests: Test 1, Test 2 and Test 5. The reason for that is a change request regarding the component interface. Several input and output signals were named slightly differently in the Version 3 release. This is an interesting situation with respect to regression testing, as the reason for the change is not really functionality driven. In such a case there is no need to do a completely new recording, as it is easier to just change the names of the signals in the recorded test script file. After re-running the updated test recordings on Version 3, all tests were passing. In addition, the tests were also passing on all the subsequent releases.

In the Version 4 release, the selected CUT had a change request regarding its functionality, where the input range for the master control handle was no longer between -100 and 100, but instead between -10000 and 10000 to support increased precision. Interestingly, only two previously recorded tests, Test 3 and Test 4, were failing because of this change. Once again, a new recording of those tests on the Version 4 release was sufficient to keep that functionality automatically re-tested in all the subsequent releases.

F. Limitations

In order to fully understand the meaning of the presented results it is important to discuss the limitations of the study presented in this paper.

External Validity. The component under test used within this case study was developed in the FBD 61131-3 programming language as part of a train control management system, which limits to some extent its external validity. However, no specifics of the FBD programming language influenced how the record & replay approach is used within the study. As long as a component under test is observable in terms of input and output signals or variables, the general principle of the record & replay approach will apply.

Internal Validity. One of the criteria for the selection of the component under test was the number of change requests it had. Considering that the effect of the record & replay approach increases with the number of change requests the CUT has, this criterion could be considered a bias in the study. However, the main reason for having this criterion in the first place was to highlight the process when using the record & replay approach rather than its benefits. In the worst case scenario the record & replay approach would not save any manual effort, but it would not introduce any overhead either, as it is designed to be a non-intrusive approach.

TABLE III
TEST RESULTS

        Version 1          Version 2             Version 3             Version 4             Version 5  Version 6
Test 1  Initial Recording  Pass                  Fail / New Recording  Pass                  Pass       Pass
Test 2  Initial Recording  Pass                  Fail / New Recording  Pass                  Pass       Pass
Test 3  Initial Recording  Pass                  Pass                  Fail / New Recording  Pass       Pass
Test 4  Initial Recording  Pass                  Pass                  Fail / New Recording  Pass       Pass
Test 5  Initial Recording  Pass                  Fail / New Recording  Pass                  Pass       Pass
Test 6  Initial Recording  Fail / New Recording  Pass                  Pass                  Pass       Pass
Test 7  -                  -                     -                     -                     -          Initial Recording
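The 80% figure reported in the Results subsection can be recomputed by counting the verdicts of Table III. The short Python sketch below does this under stated assumptions: the rows for Test 2 and Test 5 are reconstructed from the text, which reports both as failing only in Version 3, and the names TABLE_III and regression_summary are hypothetical.

```python
# Verdicts for Versions 2..6 ("P" = Pass, "F" = Fail / New Recording).
# Version 1 is omitted: an Initial Recording is not a regression activity.
# Rows for Test 2 and Test 5 are reconstructed from the text (assumption).
TABLE_III = {
    "Test 1": "PFPPP",
    "Test 2": "PFPPP",
    "Test 3": "PPFPP",
    "Test 4": "PPFPP",
    "Test 5": "PFPPP",
    "Test 6": "FPPPP",
    # Test 7 was first recorded in Version 6, so it has no replays.
}


def regression_summary(table):
    """Return (passed, total) over all replayed test/version pairs."""
    verdicts = "".join(table.values())
    return verdicts.count("P"), len(verdicts)


passed, total = regression_summary(TABLE_III)
print(f"{passed}/{total} replays passed = {passed / total:.0%}")
# prints "24/30 replays passed = 80%"
```

Each failing cell corresponds to one manual re-recording, so the remaining 20% is the share of regression activities that would still have needed manual effort.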
Construct Validity. When measuring the saved effort while using the record & replay approach, we assume that the same amount of manual effort was spent each time regression testing of the CUT was performed. In practice this may not be true, because developers performing regression testing are to some extent already familiar with the system, so re-running the tests could be done quicker.

Conclusion Validity. In the presented case study, only one CUT was used for the evaluation of the record & replay approach. However, the selected CUT does represent a complex component with a critical function, determining the traction/brake reference based on the position of the driver’s handle. More components could have been selected for evaluation of the record & replay approach, but the authors do not see how that would specifically increase confidence in the applicability of the presented approach.

VII. FUTURE WORK

With the record & replay approach, presented in this paper and applied within the case study, developers at Bombardier have gained the possibility to replace the manual effort in the regression testing phase with machine effort. Since PLC systems are time-aware, the replaying phase took the same amount of time as the manual recording phase did. This means that several time delays in setting various signals are not really a consequence of the system behavior but rather the result of a manual invocation of the system. To emphasize this, Figure 9 shows how a full execution of Test 7 could be replayed almost 5 times faster simply by removing the manual delays introduced when the system was invoked.

Fig. 9. Trace logging result when Test 7 is executed without unnecessary manual delays.

In addition, parts of a PLC component, or even complete PLC components, may not have any timing related implementations. They could be constructed using basic functions (logic operators, comparison blocks, etc.), also known as stateless elements. This creates a unique opportunity to “speed up” test execution, since any timing information is again present purely because a user is manually invoking the system. However, in order to achieve such a reduction in the machine effort needed to execute a test, one needs information on the dependency between inputs and outputs of a PLC component, especially information regarding the usage of stateful elements while propagating from the input to the output. This information could be collected by analyzing the source code of the underlying PLC component and creating a dependency matrix between the inputs and the outputs of the given component.

Even in the case of PLC components containing timing information, since the complete system is simulated, there is an opportunity to increase the speed of simulation. This can additionally save machine effort and provide test result feedback quicker with respect to real time. In the case of the SimTecc simulation environment, a TimeSync module supports such a possibility.

One additional opportunity to save machine effort, resulting in quicker feedback on the regression test results, is to introduce parallel execution of tests. Again, based on the analysis of the dependency matrix between inputs and outputs, several test recordings could be woven into one parallel replay of the PLC component execution. This can lead to a significant reduction of machine execution effort.

The current implementation of the record & replay approach is done as a stand-alone tool for evaluation purposes. Bombardier’s intention is to have it integrated within the Mitrac Desktop Component Tester tool, where most of the above mentioned advancements could be easily integrated and made available for engineers to use.

Although the presented case study and proof-of-concept implementation rely on specifics of Bombardier’s TCMS, there is a possibility to generalize this approach for usage in other types of systems and even in other domains. What we plan to investigate further is how other customers of the SimTecc simulation environment could benefit from the record & replay approach, especially in situations where optimization has to be done during replaying but the underlying source code is not available to support analysis.

VIII. CONCLUSION

Test automation is often perceived as an expensive but inevitable investment for various software industry domains. Even the simple task of automating test execution in an unattended manner could be considered a significant testing infrastructure investment. The additional effort needed for writing test scripts and analyzing test execution results makes the decision on test automation investments even harder to make.

Our solution based on the record & replay approach addresses this problem by reusing the manual effort performed during the functional testing of a component. By recording interactions with a specific component for a certain time duration, we are able to automatically replicate the same interactions by replaying the execution flow. This, when done on a newer version of a component, enables an efficient regression testing process.

The record & replay approach does not represent a novelty in itself. It has been an established method for increasing automation for many years now. However, to the best of our knowledge, there are currently no proposals to utilize the record & replay approach for regression testing on a unit or component level. Most recording efforts on the unit level address the debugging aspect of the development process, while recording for the purpose of regression testing focuses on the GUI level. In the same way as mouse movements are recorded for GUI testing, we can say that here we are recording signal movements.

Besides enabling efficient regression testing, the record & replay approach provides an easy entry for companies into the area of test automation because it is a non-intrusive method, something which is highly appreciated in the industrial context. Even if the method would not help in a specific case, for example when every regression testing instance would require a new recording, it would not introduce any additional manual effort for its usage either.

Once the method does show itself useful, it could provide a benefit in terms of saving manual developer effort. In the case study presented in this paper, the estimated saving for regression testing is 80%. In addition, the recording structure of each test contains sufficient information to almost completely generate the Software Component Test Record documentation needed by the safety assessors.

Regression testing and regression tests do represent valuable assets during the development of any type of product, especially safety critical ones. Having mechanisms to re-assure correct behavior helps increase the speed of the development process by providing a safety net around existing functionality.

ACKNOWLEDGMENT

The conducted research was supported by The Knowledge Foundation (KKS) through the AGENTS project (Automated Generation of Tests for Simulated Software Systems). The authors would like to thank Maximatecc AB and Bombardier Transportation for successfully supporting a research-industry co-production.

REFERENCES

[1] International Electrotechnical Commission, “IEC international standard 61131-3,” Programmable Controllers, 2014.
[2] J. Teich, “Hardware/software codesign: The past, the present, and predicting the future,” Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1411–1430, 2012.
[3] A. Möller and P. Åberg, “A simulation technology for CAN-based systems,” CAN Newsletter, no. 4, Dec. 2004.
[4] M. Palmieri, A. Cicchetti, and A. Öberg, “Cutting time-to-market by adopting automated regression testing in a simulated environment,” in The 26th IFIP International Conference on Testing Software and Systems, M. G. Merayo and E. M. de Oca, Eds. Springer, Sep. 2014, pp. 129–144.
[5] “IEEE standard computer dictionary. A compilation of IEEE standard computer glossaries,” IEEE Std 610, p. 170, 1991.
[6] E. Engström and P. Runeson, “A qualitative survey of regression testing practices,” in Product-Focused Software Process Improvement, ser. Lecture Notes in Computer Science, M. Ali Babar, M. Vierimaa, and M. Oivo, Eds. Springer Berlin Heidelberg, 2010, vol. 6156, pp. 3–16.
[7] A. M. Memon, “GUI testing: Pitfalls and process,” Computer, vol. 35, no. 8, pp. 87–88, 2002.
[8] M. Ronsse, K. De Bosschere, M. Christiaens, J. C. de Kergommeaux, and D. Kranzlmüller, “Record/replay for nondeterministic program executions,” Commun. ACM, vol. 46, no. 9, pp. 62–67, Sep. 2003.
[9] T. Bauer, F. Bohr, D. Landmann, T. Beletski, R. Eschbach, and J. Poore, “From requirements to statistical testing of embedded systems,” in Proceedings of the 4th International Workshop on Software Engineering for Automotive Systems, ser. SEAS ’07. IEEE Computer Society, 2007, pp. 3–.
[10] R. Konuru, H. Srinivasan, and J.-D. Choi, “Deterministic replay of distributed Java applications,” in Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, 2000, pp. 219–227.
[11] R. Netzer and B. Miller, “Optimal tracing and replay for debugging message-passing parallel programs,” The Journal of Supercomputing, vol. 8, no. 4, pp. 371–388, 1995.
[12] M. Ronsse and K. De Bosschere, “RecPlay: A fully integrated practical record/replay system,” ACM Transactions on Computer Systems, vol. 17, no. 2, pp. 133–152, 1999.
[13] L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Communications of the ACM, vol. 21, no. 7, pp. 558–565, 1978.
[14] CENELEC, “50128: Railway application–communications, signaling and processing systems–software for railway control and protection systems,” Standard Report, 2001.
