4575
Abstract—Organizations tend to limit their investment in test automation due to the lack of information on actual test reuse and on when automated tests will really pay off. However, to perform efficient regression testing of software systems, a development team is expected to possess a certain level of test automation infrastructure, where at a minimum test execution is scripted and automated. In this paper we propose using a record & replay approach to observe the functional usage of a component under test while it is being invoked as part of the whole system or of only a certain portion of it. Afterwards, executable tests are automatically derived, containing both test inputs and test verdicts, allowing their later use as part of regression testing. With as little effort as one manual test execution, developers are provided with automated tests, minimizing concerns about the investment in automation. A case study from Bombardier Transportation is provided, showing how the proposed approach substantially reduced the test effort needed when performing regression testing of the train control management system for the Stockholm C30 metro train.

I. INTRODUCTION

With every new release of a software system, a dedicated testing activity should be performed to verify that changes to the code, due to fault fixes or the introduction of new features, did not corrupt any of the existing functionality. This activity is referred to as regression testing. Because of its repetitive nature, re-verifying over and over that the same functionality still behaves correctly, the efficiency of regression testing relies heavily on the level of automation an organization possesses. Depending on the application domain and the maturity level of a specific organization, this level can vary substantially.

Test automation in industry is often perceived as a means to perform automated execution of the system under test (SUT) by unattended running of a set of test scripts. Executing a single test script usually involves bringing the SUT into a certain state by invoking it with a predefined set of input values. The verdict, i.e., the decision whether a test has passed or not, is produced by a test oracle, which could be automated as well. However, this might require an additional investment in automation, depending on the testability of the SUT. Testability defines how observable the states of the SUT are when performing testing, and for some organizations manual inspection of log files could be the only way to reach a verdict on a test.

The goal of test automation is to automate all of its elements: creation of test scripts, test execution, generation of test inputs, and definition of a test oracle. If some of these elements are not automated, the cost of performing regression testing could increase substantially, as it would require additional manual effort with every new software release. On the other hand, it is rather challenging for companies to change how things are currently done, and especially to make a significant jump in their own test automation capabilities. Large-scale organizations are not eager to change things easily, particularly if the change introduces additional effort for their employees. This is why any non-intrusive approach is very welcome.

In this paper we propose using a record & replay approach to observe the functional usage of a component under test (CUT) while it is executed as part of the whole system or of only a certain portion of it. The information sampled during the recording phase is sufficient to automatically generate test scripts, containing both test inputs and a test oracle, and to execute them automatically and unattended. The approach had a positive impact and was seen as an enabler of more efficient regression testing of the Train Control Management System (TCMS) for the Stockholm C30 metro train, developed by the Bombardier Transportation site in Västerås, Sweden.

The paper is organized as follows. Section II provides background information on the specifics of the TCMS development environment, including its simulation technology. Section III discusses related work, followed by Section IV, which details the record & replay approach for automated test generation. The component test process is outlined in Section V, while the case study design, execution, and results are presented in Section VI. Section VII discusses possible future directions, followed by Section VIII, which highlights the conclusions of the work presented in this paper.
II. BACKGROUND

The general perception of the intended usage of software has changed over the last decades. Software is often perceived as a medium that helps us create spreadsheets, edit graphical content, or simply entertains us through games or multimedia. However, people may not be fully aware of the extent to which software plays a critical role in our lives. Today, software is used to control the brakes of a car, the speed of a train, or the altitude of an airplane. The main reason for this change has been the increased need for more complex functionality in products usually perceived as physical, or hardware-oriented. Nowadays, companies use general-purpose or custom-made hardware modules from suppliers, on top of which their own in-house developed software (usually referred to as embedded software) controls the main functionality of the final product. Programmable logic controllers (PLCs) are examples of such systems.

A typical PLC system consists of a processor, memory, and a communication bus. Its main advantage is built-in support for input and output (I/O) communication. With sensor values as inputs and actuator systems as outputs, PLCs are ideal for monitoring and supervising control systems. A program running on a PLC is almost always executed in a cyclic loop with three distinct phases: (i) reading all inputs from an input vector, (ii) executing the program computation without any interrupts, and (iii) writing updated values to the output vector.

One example of a PLC product is Bombardier Transportation’s Mitrac¹ Train Control and Management System (TCMS), depicted in Figure 1. TCMS is a high-capacity infrastructure backbone built upon open-standard IP technology that allows easy integration of all control and communication functions on board the train. It is considered the center of the distributed system that controls the train. Examples of functions controlled by TCMS include collecting line voltage, controlling the train engines, opening and closing the train doors, and uploading diagnostic data. TCMS is developed following the IEC 61131-3 standard [1] for programming PLC controllers. In particular, Function Block Diagrams (FBD) are used to implement most of the functionality of TCMS, while the Structured Text (ST) programming language is used to develop user-specific libraries.

Fig. 1. Bombardier’s Mitrac - Train Control Management System (TCMS)¹

¹ http://www.bombardier.com/en/transportation/products-services/propulsion-controls/products/train-control-and-management-system.html

FBD is a graphical programming language well suited to developers with a background in control engineering. The behavior of an FBD program is described by connecting input signals to a network of functions and function blocks that computes the resulting outputs. An example of an FBD program structure is shown in Figure 2. It is important to distinguish between the behavior of functions (for example, the AND and OR elements in Fig. 2) and function blocks (for example, the TP, TON, and RS elements in Fig. 2) in an FBD program. Functions always compute the same output for a given input vector. Function blocks, however, have internal state, causing their output to differ even when the input vector is the same. This is because the outputs of function blocks depend not only on the order of the sequence of input vectors provided to them, but also on when (in time) a specific input vector is provided and for how long (duration) it holds that value.

Fig. 2. An example of an FBD (IEC 61131-3) program structure

This behavior of FBD programs emphasizes the need for a structured approach to testing them. In particular, re-testing requires well-defined test steps that clearly indicate the sequence of values and the timing information needed to observe the proper functioning of the underlying component under test (CUT). This is a challenging task, considering that FBD components are part of a bigger system running on dedicated PLC hardware. To avoid these problems, simulation techniques are extensively used in software testing of embedded products [2]. Simulation enables execution of embedded software on a general-purpose PC, enabling utiliza-
Fig. 3. Maximatecc Simtecc Simulation Technology
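To illustrate the distinction drawn in Section II between stateless functions and stateful function blocks, together with the cyclic read, compute, write loop of a PLC program, the following minimal Python sketch may help. The names `and_function`, `OnDelayTimer`, and `run_scan_cycles`, the 10 ms cycle time, and the simplified on-delay semantics (the output becomes True once the input has been continuously True for at least the preset time) are assumptions loosely modeled on the IEC 61131-3 TON block, not its normative definition.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def and_function(a: bool, b: bool) -> bool:
    """Stateless function: the same inputs always give the same output."""
    return a and b


@dataclass
class OnDelayTimer:
    """Hypothetical, simplified TON-like function block: Q becomes True once
    IN has been continuously True for at least `preset_ms` milliseconds."""
    preset_ms: int
    elapsed_ms: int = 0  # internal state carried between scan cycles

    def step(self, in_signal: bool, cycle_ms: int) -> bool:
        if in_signal:
            self.elapsed_ms = min(self.elapsed_ms + cycle_ms, self.preset_ms)
        else:
            self.elapsed_ms = 0  # a False input resets the timer
        return in_signal and self.elapsed_ms >= self.preset_ms


def run_scan_cycles(input_vectors: Iterable[Tuple[bool, bool]],
                    cycle_ms: int = 10,
                    preset_ms: int = 30) -> List[Tuple[bool, bool]]:
    """Hypothetical PLC loop executed with a fixed cycle time."""
    timer = OnDelayTimer(preset_ms)
    output_vectors = []
    for a, b in input_vectors:            # (i) read the input vector
        gate = and_function(a, b)         # (ii) compute: stateless function...
        q = timer.step(gate, cycle_ms)    #      ...feeding a stateful block
        output_vectors.append((gate, q))  # (iii) write the output vector
    return output_vectors


# Same input vector on every cycle, yet the timer output changes over time.
print(run_scan_cycles([(True, True)] * 4))
# [(True, False), (True, False), (True, True), (True, True)]
```

Although the input vector (True, True) is identical on every cycle, the timer output only switches after the third cycle, so reproducing the behavior of a function block requires replaying both the order and the timing of its inputs.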
system or component to verify that modifications have not
[Figure: several Subsystem blocks and a Recorder]
making an indirect change through shared memory or a similar concept. In such cases, those specific signals must be monitored as well.

Monitoring, or sampling, of the selected signals has to be done relative to a unique global time-related or cycle-related signal. This could be, for example, a global clock or a cycle counter in the system. The precision of the sampling is of course very important, but overly precise recording may result in an overflow of unnecessary information, making it rather difficult to generate tests efficiently. If a system is designed to execute at millisecond precision, there is no need to sample the recording at the microsecond level.

Another important aspect of the recording phase, which also contributes to more efficient test generation, is selective data sampling. This means that we do not save all values of the monitored signals throughout their execution lifespan, but only the values at the moments when a change occurs in the monitored signals. The change could be that one or more signals at the input of the CUT have a new value, and/or one or more signals at the output of the CUT have a new value. As soon as this is detected, a new row in the recording structure is populated.

The structure of the recording is presented in Table I. In addition to the sampled inputs, outputs, and unique signal values, each row contains two fields called Oracle and Invocation. Oracle is a Boolean flag indicating whether a specific row in the recording structure should be used for a verdict; it is set whenever there is a change on any of the monitored outputs. Invocation is also a Boolean flag, but it indicates whether a specific row should be used for provoking the CUT during the replaying phase; it is set whenever there is a change on any of the monitored inputs.

B. Replaying

The replaying phase, as its name suggests, serves the purpose of exercising the CUT in the same manner as during the recording phase. When replaying, only a subset of the sampled recording structure is used: every recorded sample that has the Invocation flag set to True. The idea is to provoke the CUT by setting the signals to specific values at specific points in time. To achieve this, it is important to keep track of the relative offset of the unique signal used as a timing reference. For example, during the recording phase, the first row in the recording structure could have been observed when the unique signal had a value of 3000. This value can represent the number of cycles passed since the start of execution. When replaying, we do not have to wait for the system to reach 3000 cycles before invoking the CUT. We can provoke the CUT immediately after it is initialized, but every subsequent invocation of the CUT has to happen relative to that number.

What is specific about the replaying phase is that, while we provoke the CUT by automatically replaying previously recorded signal values, we also perform a new recording of the CUT in parallel. This new recording is done exactly as in the first phase, since the recording itself cannot distinguish whether the CUT is provoked by a user, by another component, or by a previously recorded sample. By doing so, we create a new structure of recorded samples that we can analyze and compare with the original recording, thus enabling the creation of an automated test oracle.

C. Test Oracle

Once we have the recorded structures from both the recording phase and the replaying phase, we can devise a test verdict. This is done by comparing the two structures in the following steps:
1) Both structures are filtered such that only rows with the Oracle flag set to True are kept for the analysis (as per Table I).
2) Values of the unique signals are compared, keeping in mind that the relative offset matters rather than the actual value.
3) Values of each and every input and output signal are compared between the structures.
4) If any of the above comparisons does not hold, the test is reported as failed. Otherwise, the test passes.

For systems with timing-dependent and stateful behavior, it is very important to preserve the order of the input signals that invoke a CUT. In addition to the exact sequence of input vectors, it is also important to maintain the same timing offset from a single reference point, such that input signals are provided not only in a specific order but also at specific times. As a result, it is rather difficult to create a flexible test oracle. Essentially, any time-related deviation when observing outputs may result in a false positive test case: a failing test case that does not actually reveal any fault in the system, where the signal was simply set late. For that reason, it is important to consider introducing a tolerance factor when performing automated test oracle generation. This factor could be based on the unique signal values, such that there is a tolerance on when specific output values are observed. For example, if the unique signal has microsecond precision, we could state that our system tolerates deviations of ±10 microseconds when observing changes to the output signals.

V. COMPONENT TESTING PROCESS

Before presenting the case study design, its execution, and the results we collected in the following section, we first elaborate on how the current component testing process is performed at Bombardier Transportation.

TCMS is considered a safety-critical product used in the railway domain, and thus its development, testing, and safety assessment obligations are regulated by the EN 50128 standard [14]. Among other requirements, EN 50128 mandates functional testing of the software at the lowest (unit) level. In the case of Bombardier, a unit is a function block diagram (FBD) implementation in the IEC 61131-3 standard that is not composed of other FBDs (unless they belong to the library of already tested, reusable FBDs). However, it is important to point out that unit-level testing, or component-level testing as it is referred to internally at Bombardier, is performed by the developers themselves. This should by no means be a surprise, considering that in almost all programming environments unit-level testing is in general the responsibility of a developer. However, the lack of a unit testing framework for function block diagrams makes it rather difficult, if not impossible, to have any sort of automation support. This is why the component testing process is currently devised as follows:
1) Based on the list of provided requirements, the developer creates a Software Component Test Specification document.
2) Based on the same list of provided requirements, an implementation in an FBD language is conducted by a developer.
3) Functional testing of the component is performed manually by following the Software Component Test Specification document and using the MDCT tool (presented in Figure 6).
4) The results of functional testing are documented in an Excel sheet and also in a new document called Software Component Test Record.

Fig. 6. Mitrac Desktop Component Tester - MDCT tool visualizing FBD execution flow and allowing developers to manually override a signal value when performing functional testing.

The main reason why so much attention is given to various documents throughout the development of a component is the safety auditing process. This process is mandated for safety-critical products like TCMS. However, the main problem is that these documents cannot be used or reused for any other purpose. This is especially evident when a component needs to be updated due to a change request. In such a situation, the following development process is in place:
1) Based on the change request, an update is made to the Software Component Test Specification document.
2) The actual change is implemented in the FBD code.
3) Complete functional testing of the component is performed manually by following the Software Component Test Specification document and using the MDCT tool.
4) New results of functional testing are documented in an Excel sheet and also in a new Software Component Test Record document.

This means that any single update to any existing component requires a full re-testing activity, which at the moment can only be done manually. However, the fact that additional manual effort is spent on each update does not imply that it should immediately be automated. Investing in automation is very much dictated by the number of re-runs that could potentially occur. For this purpose, we investigated the ratio of component updates in the C30 project, currently under development, and compared it with the finalized VZI300 Zefiro (high-speed train) project. The results are presented in Table II.

TABLE II
OCCURRENCE OF UPDATES ON A COMPONENT LEVEL
Updates | C30 Project (%) | VZI300 Zefiro Project (%)
0 | 57.46 | 65.71
1 | 28.95 | 0.00
2 | 4.98 | 18.22
3 | 5.88 | 8.87
≥4 | 2.63 | 7.2

At first, the figures provided may seem not to motivate any effort in automation. However, Bombardier Transportation always tries to reuse components from previous projects, which leads to quite a number of components not having any change requests all the way up to their use on the real train. But the components that do have updates are the ones specific to a train project, and in most cases they have higher complexity (in terms of the number of inputs, outputs, function blocks, etc.). Higher complexity naturally leads to higher effort when performing manual testing.
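The recording, replaying, and test oracle steps described in Section IV can be summarized in a minimal Python sketch. It assumes that the monitored signals of the CUT are available as a stream of (unique signal value, inputs, outputs) snapshots; the `Row` fields mirror the description of Table I, while the function names, the snapshot format, the signal names `handle` and `brake`, and a tolerance expressed in units of the unique signal are hypothetical illustrations, not the tool used at Bombardier. The sketch also assumes that the first snapshot counts as both an input and an output change, and that a differing number of Oracle rows is a failure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Signals = Dict[str, int]


@dataclass
class Row:
    """One row of the recording structure (fields modeled on Table I)."""
    tick: int          # value of the unique signal (global clock or cycle counter)
    inputs: Signals
    outputs: Signals
    oracle: bool       # set when any monitored output changed
    invocation: bool   # set when any monitored input changed


def record(snapshots: Iterable[Tuple[int, Signals, Signals]]) -> List[Row]:
    """Selective sampling: store a row only when an input or output changes."""
    rows: List[Row] = []
    prev_in, prev_out = None, None
    for tick, inputs, outputs in snapshots:
        in_changed, out_changed = inputs != prev_in, outputs != prev_out
        if in_changed or out_changed:
            rows.append(Row(tick, dict(inputs), dict(outputs),
                            oracle=out_changed, invocation=in_changed))
        prev_in, prev_out = inputs, outputs
    return rows


def replay_schedule(rows: List[Row]) -> List[Tuple[int, Signals]]:
    """Invocation rows only, timed relative to the first recorded row."""
    if not rows:
        return []
    start = rows[0].tick
    return [(r.tick - start, r.inputs) for r in rows if r.invocation]


def verdict(recorded: List[Row], replayed: List[Row], tolerance: int = 0) -> bool:
    """Compare two recordings following steps 1-4 of Section IV.C."""
    # 1) keep only the rows flagged Oracle
    a = [r for r in recorded if r.oracle]
    b = [r for r in replayed if r.oracle]
    if len(a) != len(b):
        return False
    if not a:
        return True
    base_a, base_b = recorded[0].tick, replayed[0].tick
    for ra, rb in zip(a, b):
        # 2) relative offsets matter, not absolute values; allow a tolerance
        if abs((ra.tick - base_a) - (rb.tick - base_b)) > tolerance:
            return False
        # 3) every input and output signal must match
        if ra.inputs != rb.inputs or ra.outputs != rb.outputs:
            return False
    return True  # 4) no comparison failed, so the test passes


# Hypothetical signals: driver's handle position in, brake reference out.
original = record([
    (3000, {"handle": 0}, {"brake": 0}),
    (3010, {"handle": 0}, {"brake": 0}),   # no change: not stored
    (3050, {"handle": 5}, {"brake": 0}),   # input change: Invocation
    (3060, {"handle": 5}, {"brake": 40}),  # output change: Oracle
])
# New recording taken while replaying; the output change arrives 2 ticks late.
replayed = record([
    (0, {"handle": 0}, {"brake": 0}),
    (50, {"handle": 5}, {"brake": 0}),
    (62, {"handle": 5}, {"brake": 40}),
])
print(replay_schedule(original))                 # [(0, {'handle': 0}), (50, {'handle': 5})]
print(verdict(original, replayed))               # False
print(verdict(original, replayed, tolerance=5))  # True
```

With a tolerance of zero, the late output change makes the test fail, which is the kind of false positive discussed in Section IV.C; allowing a deviation of up to 5 ticks makes the same replay pass while a wrong output value would still fail.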
first place was to highlight the process when using the record & replay approach rather than its benefits. In the worst-case scenario, the record & replay approach would not save any manual effort, but neither would it introduce any overhead, as it is designed to be a non-intrusive approach.

Construct Validity. When measuring the effort saved by the record & replay approach, we assume that the same amount of manual effort would have been spent when performing regression testing of the CUT. In practice this may not be true, because developers performing regression testing are to some extent already familiar with the system, and re-running the tests could be done more quickly.

Conclusion Validity. In the presented case study, only one CUT was used to evaluate the record & replay approach. However, the selected CUT does represent a complex component with a critical function, determining the traction/brake reference based on the position of the driver’s handle. More components could have been selected for evaluating the record & replay approach, but the authors do not see how that would specifically increase confidence in the applicability of the presented approach.

VII. FUTURE WORK

With the record & replay approach presented in this paper and applied within the case study, developers at Bombardier have gained the possibility to replace manual effort in the regression testing phase with machine effort. Since PLC systems are time-aware, the replaying phase took the same amount of time as the manual recording phase. This means that several time delays in setting various signals are not really a consequence of system behavior but rather the result of manual invocation of the system. To emphasize this, Figure 9 shows how a full execution of Test 7 could be replayed almost 5 times faster by simply removing the manual delays introduced when the system was invoked.

Fig. 9. Trace logging result when Test 7 is executed without unnecessary manual delays.

In addition, parts of a PLC component, or even complete PLC components, may not have any timing-related implementation. They could be constructed using basic functions (logic operators, comparison blocks, etc.), also known as stateless elements. This creates a unique opportunity to "speed up" test execution, since any timing information is, again, purely the result of a user manually invoking the system. However, to achieve such a reduction in the machine effort needed to execute a test, one needs information on the dependency between the inputs and outputs of a PLC component, especially information regarding the usage of stateful elements along the propagation from the inputs to the outputs. This information could be collected by analyzing the source code of the underlying PLC component and creating a dependency matrix between the inputs and the outputs of a given component.

Even in the case of PLC components containing timing information, since the complete system is simulated, there is an opportunity to increase the speed of simulation. This can additionally save machine effort and provide test result feedback faster than real time. In the case of the SimTecc simulation environment, a TimeSync module supports such a possibility.

One additional opportunity to save machine effort, resulting in quicker feedback on the regression test results, is to introduce parallel execution of tests. Again, based on the analysis of the dependency matrix between inputs and outputs, several test recordings could be combined into one parallel replay of the PLC component execution. This can lead to a significant reduction of machine execution effort.

The current implementation of the record & replay approach is a stand-alone tool for evaluation purposes. Bombardier intends to integrate it within the Mitrac Desktop Component Tester tool, where most of the above-mentioned advancements could easily be integrated and made available for engineers to use.

Although the presented case study and proof-of-concept implementation rely on specifics of Bombardier’s TCMS, there is a possibility to generalize this approach for use in other types of systems and even in other domains. What we plan to investigate further is how other customers of the SimTecc simulation environment could benefit from the record & replay approach, especially in situations where optimization has to be done during replaying but the underlying source code is not available to support the analysis.

VIII. CONCLUSION

Test automation is often perceived as an expensive but inevitable investment in various software industry domains. Even the simple task of automating test execution in an unattended manner could be considered a significant testing infrastructure investment. The additional effort needed for writing test scripts and analyzing test execution results makes the decision on test automation investments even harder.

Our solution based on the record & replay approach addresses this problem by reusing the manual effort performed during the functional testing of a component. By recording interactions with a specific component over a certain time duration, we are able to automatically replicate the same interactions by replaying the execution flow. When done on a newer version of the component, this enables an efficient regression testing process.

The record & replay approach is not a novelty in itself. It has been an established method for increasing automation for many years. However, to the best of our knowledge, there are currently no proposals to use the record & replay approach for regression testing at the unit or component level. Most recording efforts at the unit level address the debugging aspect of the development process, while recording for the purpose of regression testing focuses on the GUI level. In the same way as mouse movements are recorded for GUI testing, here we are recording signal movements.

Besides enabling efficient regression testing, the record & replay approach provides companies with an easy entry into the area of test automation, since it is a non-intrusive method, something highly appreciated in the industrial context. Even if the method does not help in a specific case, for example when every regression testing instance requires a new recording, it will not introduce any additional manual effort for its usage.

Once the method proves itself useful, it can provide benefits in terms of saving developers' manual effort. In the case study presented in this paper, the estimated saving for regression testing is 80%. In addition, the recording structure of each test contains sufficient information to almost completely generate the Software Component Test Record documentation needed by the safety assessors.

Regression testing and regression tests represent valuable assets during the development of any type of product, especially safety-critical ones. Having mechanisms to reassure correct behavior helps increase the speed of the development process by providing a safety net around existing functionality.

ACKNOWLEDGMENT

The conducted research was supported by The Knowledge Foundation (KKS) through the AGENTS project (Automated Generation of Tests for Simulated Software Systems). The authors would like to thank Maximatecc AB and Bombardier Transportation for successfully supporting a research-industry co-production.

REFERENCES

[1] International Electrotechnical Commission, "IEC international standard 61131-3," Programmable Controllers, 2014.
[2] J. Teich, "Hardware/software codesign: The past, the present, and predicting the future," Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1411–1430, 2012.
[3] A. Möller and P. Åberg, "A simulation technology for CAN-based systems," CAN Newsletter, no. 4, 12 2004.
[4] M. Palmieri, A. Cicchetti, and A. Öberg, "Cutting time-to-market by adopting automated regression testing in a simulated environment," in The 26th IFIP International Conference on Testing Software and Systems, M. G. Merayo and E. M. de Oca, Eds. Springer, 9 2014, pp. 129–144.
[5] "IEEE standard computer dictionary. A compilation of IEEE standard computer glossaries," IEEE Std 610, p. 170, 1991.
[6] E. Engström and P. Runeson, "A qualitative survey of regression testing practices," in Product-Focused Software Process Improvement, ser. Lecture Notes in Computer Science, M. Ali Babar, M. Vierimaa, and M. Oivo, Eds. Springer Berlin Heidelberg, 2010, vol. 6156, pp. 3–16.
[7] A. M. Memon, "GUI testing: Pitfalls and process," Computer, vol. 35, no. 8, pp. 87–88, 2002.
[8] M. Ronsse, K. De Bosschere, M. Christiaens, J. C. de Kergommeaux, and D. Kranzlmüller, "Record/replay for nondeterministic program executions," Commun. ACM, vol. 46, no. 9, pp. 62–67, 9 2003.
[9] T. Bauer, F. Bohr, D. Landmann, T. Beletski, R. Eschbach, and J. Poore, "From requirements to statistical testing of embedded systems," in Proceedings of the 4th International Workshop on Software Engineering for Automotive Systems, ser. SEAS ’07. IEEE Computer Society, 2007, pp. 3–.
[10] R. Konuru, H. Srinivasan, and J.-D. Choi, "Deterministic replay of distributed Java applications," in Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, 2000, pp. 219–227.
[11] R. Netzer and B. Miller, "Optimal tracing and replay for debugging message-passing parallel programs," The Journal of Supercomputing, vol. 8, no. 4, pp. 371–388, 1995.
[12] M. Ronsse and K. De Bosschere, "RecPlay: A fully integrated practical record/replay system," ACM Transactions on Computer Systems, vol. 17, no. 2, pp. 133–152, 1999.
[13] L. Lamport and R. S. Gains, "Time, clocks, and the ordering of events in a distributed system," Communications of the ACM, vol. 21, no. 7, pp. 558–565, 1978.
[14] CENELEC, "50128: Railway applications – Communication, signalling and processing systems – Software for railway control and protection systems," Standard Report, 2001.