Jiong
2009 IEEE International Conference on Scalable Computing and Communications; The Eighth International Conference on Embedded Computing
Multi-core and VMM based Non-Interference Test Method of Embedded System Software
Figure 1 The Model of NITM

(2) Target System Status Data Gathering

Accurate data gathering is the precondition of all NITM analysis: all the analyses can be carried out only if all the needed history status data have been recorded. The following runtime status data should be considered: executed instructions; operands; interrupts and the corresponding acknowledgements; system bus status; and time stamps.

What kind of data should be recorded depends on the particular requirements of each test. Among these data, the time stamp is the most basic and the most important, because all performance and/or function analysis is in fact based on time information.

(3) Dynamical Analysis

The main function of dynamical analysis is to explain the runtime status data correctly, with help from both the target software structure produced by the static analysis tools and the test template based on the target system. The analysis results include: each computing operation and its corresponding time; all kinds of statistical information; and the traces of variables.

The implementation of the NITM model in [6] cannot test a processor faster than an 8086 running at 5 MHz. To extend the capability of NITM, we now upgrade the NITM model to a Multi-core based NITM (MC-NITM), illustrated in Figure 2. MC-NITM exploits a multi-core processor to test software oriented to a single processor, including the embedded software that we are most interested in testing. Because several cores can be used to test software running on one core, a multi-core based test host may have sufficient computing capability even when the target core runs at a very high frequency.

In [6], the test platform consists of data gathering hardware and analysis software. The hardware part is connected to the target system directly but exerts no control over, and causes no interference to, the target system while gathering the status data. In MC-NITM, we use Core n-1, which shares the L2 cache with Core n, to achieve this goal, while the other cores are used to analyse the static and dynamic data needed in NITM.

These data are analyzed to produce results such as: coverage ratio analysis of each subroutine, each module and the whole system; performance analysis of each subroutine, each interrupt routine and even each single statement; and a target system status history trace with the pattern < time, event >, where an event could be an interrupt request, a branch switch or a variable access. When the target is a multi-core processor running at 2 GHz or higher, the influence of the large L2 cache should be considered [7][8][9][10]. Considering all these concerns, the best future experimental target is Intel's new Larrabee chip, and we should consider how to test real-time embedded system software as discussed in [11][12]. For now, we run the experiments on a PC with a Quad-core Q9400 processor and 2 GB of DDR RAM, and the target software is just a set of assembly programs.

To build an acceptable embedded system software test environment, we use virtualization technology to set up a Xen based embedded system test environment, as shown in Figure 2. We modified the well-known open source virtualization software Xen into a special VMM that runs on an Intel quad-core machine but manages only one core for itself, leaving the other 3 cores to set up the MC-NITM based embedded system test environment: 1 core for the target and 2 cores for the host testing software. With this environment, we tested some real embedded system applications with acceptable results.
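As a rough illustration of how a < time, event > trace could be reduced to the per-routine coverage ratio and performance figures described in this section, the Python sketch below takes a time-ordered list of (time, address) events for executed instructions, per-routine address ranges, and the instruction addresses found by static analysis. The function names, the input layout and the sample data are hypothetical; this is not the MC-NITM implementation, and it ignores interrupts and module-level aggregation.

```python
from collections import defaultdict

# Hypothetical input layout (not from the paper):
#   trace        - time-ordered list of (time, address) for executed instructions
#   routines     - {name: (lo, hi)} half-open address range of each routine
#   static_addrs - instruction start addresses found by the static analysis tools


def routine_of(addr, routines):
    """Return the name of the routine whose range contains addr, or None."""
    for name, (lo, hi) in routines.items():
        if lo <= addr < hi:
            return name
    return None


def coverage_and_time(trace, routines, static_addrs):
    executed = {addr for _, addr in trace}
    coverage = {}
    for name, (lo, hi) in routines.items():
        static = [a for a in static_addrs if lo <= a < hi]
        hit = sum(1 for a in static if a in executed)
        coverage[name] = hit / len(static) if static else 0.0
    busy = defaultdict(int)
    # The time until the next event is charged to the routine of the current
    # event; the last event has no successor, so its duration is not counted.
    for (t0, a0), (t1, _) in zip(trace, trace[1:]):
        busy[routine_of(a0, routines)] += t1 - t0
    return coverage, dict(busy)


# Made-up sample data, for illustration only.
routines = {"main": (100, 110), "isr": (200, 210)}
static_addrs = [100, 102, 105, 200, 203]
trace = [(0, 100), (2, 105), (5, 200), (9, 203)]
cov, busy = coverage_and_time(trace, routines, static_addrs)
print(cov)   # main: 2 of 3 instructions executed, isr: 2 of 2
print(busy)  # {'main': 5, 'isr': 4}
```

Charging each interval to the earlier event keeps the sketch independent of instruction timings; a real tool would also need the < time, event > records for interrupts to split routine time correctly.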
Figure 2 Xen based MC-NITM

5 Chaos of Instruction Pre-fetch

An unavoidable problem of MC-NITM is that the target core has an instruction pre-fetch mechanism, which is common among modern processors. This feature often leads to an awkward scenario: an instruction looks as if it has been executed by the processor, but this only means that it has been loaded into the processor's cache, and a wrong judgment in such a case could lead to severe failure of MC-NITM.

This problem can be addressed with the techniques named "Data Flow Based Cache Prediction" [4] and "predicting data cache behavior" [5]. For example, the Intel 8086 processor has a 6-byte instruction queue, while most assembly instructions of 8086 are […]. If there are any branches among the 6 instructions, some of them will be skipped, especially the latter ones [6]. Simply comparing the gathered status data cannot reveal the real execution sequence, so filtering out the missed instructions correctly is the key problem of MC-NITM.

To solve this problem, there are two methods: forward prediction and backward retrospect. Here are some definitions:

Definition 1 Invalid Pre-fetch Instruction (IPI): an instruction that is loaded but not executed.

Definition 2 Valid Pre-fetch Instruction (VPI): an instruction that is loaded and executed.

The behavior of a code section with IPIs differs from that of one without IPIs. In fact, there are at least three types of differences: time, space and semantics, so there are three basic methods to filter IPIs, one based on each of them. The following table is a classic case in which the code segment is an 8086 assembly program running on the target core in real mode; the corresponding assembler is MASM 6.11.

Line No. | Addr | Ins     | Ins Length (bytes) | Time Stamp | Cycles Per Ins | Bin Code
1        | 100  | JMP 105 | 2                  | T1         | 2              | EB 03
2        | 102  | MOV     | 3                  | T2         | 4              | B8 00 00
Table 1 Classic Embedded System Target Program

Method (1) Considering the semantic details of this segment, instruction 2 will never be executed, so it must be an IPI. This judgment is based on forward prediction, so the key point of this method is static analysis of the target program's source code. If instruction 1 were a conditional branch, however, semantic analysis would not provide any help, and this method obviously could not solve the problem.
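Method (1) can be pictured as a reachability analysis over the static program: an instruction that cannot be reached from the entry point is never executed, so if it appears in the pre-fetch trace it must be an IPI. The Python sketch below is a minimal illustration under stated assumptions, not the authors' tool: the `Instr` record and the `never_executed` function are hypothetical, and only `JMP` is treated as an instruction that never falls through. It uses the two rows of Table 1 and also shows why a conditional branch defeats the method.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    addr: int                     # start address (offset in the code segment)
    length: int                   # length in bytes
    mnemonic: str                 # e.g. "JMP", "JZ", "MOV"
    target: Optional[int] = None  # branch target of a control-transfer instruction


# Assumption: JMP is the only mnemonic that never falls through in this sketch.
# A fuller tool would also treat RET, IRET, HLT and similar instructions this way.
NO_FALL_THROUGH = {"JMP"}


def never_executed(program, entry):
    """Forward prediction: start addresses unreachable from `entry`.

    Any such instruction that shows up in the pre-fetch trace must be an IPI.
    """
    by_addr = {ins.addr: ins for ins in program}
    reached = set()
    todo = deque([entry])
    while todo:
        addr = todo.popleft()
        if addr in reached or addr not in by_addr:
            continue  # already visited, or outside the analysed segment
        reached.add(addr)
        ins = by_addr[addr]
        if ins.target is not None:
            todo.append(ins.target)             # branch taken
        if ins.mnemonic not in NO_FALL_THROUGH:
            todo.append(ins.addr + ins.length)  # fall through
    return sorted(set(by_addr) - reached)


# Rows 1 and 2 of Table 1 (instruction 3 at address 105 is not listed there).
table1 = [Instr(100, 2, "JMP", target=105), Instr(102, 3, "MOV")]
print(never_executed(table1, entry=100))  # [102]: instruction 2 is an IPI

# Hypothetical variant: instruction 1 as a conditional branch "JZ 105".
table1_cond = [Instr(100, 2, "JZ", target=105), Instr(102, 3, "MOV")]
print(never_executed(table1_cond, entry=100))  # []: nothing can be proven
```

Because every instruction must be examined, this style of check costs more than the backward methods, which only look closer when the trace shows an exception.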
Method (2) As for the space condition in this case, since the instruction queue is 6 bytes long, the object code from address 100 to 105 is all loaded. The address sequence should be 100, 102, 104 and 106, but instruction 3 is reloaded after instructions 1 and 2, so the real address sequence is 100, 102, 104 and 105. Obviously, an […] loading process. By rewinding the trace data, it is easy to find this event. This analysis does not consider semantic details; what it mainly needs is an instruction data queue.
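A minimal sketch of Method (2), assuming the bus trace yields the sequence of fetch addresses and that each instruction in the static map is flagged as a control transfer or not (for example from its opcode byte; EB is the 8086 short JMP in Table 1). It replays the fetch addresses, treats any step other than +2 as a queue flush, and rewinds over the bytes loaded since the previous flush. The names `QueuedInstr`, `ipis_by_space` and `FETCH_STEP` are hypothetical, and the +2 step is an assumption for word-wide fetches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedInstr:
    addr: int        # start address of the instruction
    length: int      # length in bytes
    is_branch: bool  # control transfer? (assumed decoded from the opcode, e.g. EB)


FETCH_STEP = 2  # assumption: each bus fetch brings in one 16-bit word


def ipis_by_space(program, fetches):
    """Backward retrospect on the fetch-address trace (hypothetical sketch).

    A fetch address that does not follow the previous one by FETCH_STEP is
    taken as a queue flush; the instructions the branch jumped over inside the
    rewound queue window are reported as IPIs.
    """
    ordered = sorted(program, key=lambda ins: ins.addr)
    ipis = set()
    window_start = fetches[0]
    for prev, cur in zip(fetches, fetches[1:]):
        if cur == prev + FETCH_STEP:
            continue  # sequential pre-fetch, no exception
        loaded_end = prev + FETCH_STEP  # first address not yet in the queue
        window = [i for i in ordered if window_start <= i.addr < loaded_end]
        # If several branches share the window, assume the first one was taken.
        branch = next((i for i in window if i.is_branch), None)
        if branch is None:
            raise ValueError(f"unexplained fetch discontinuity {prev} -> {cur}")
        after_branch = branch.addr + branch.length
        # Forward jump inside the window: skip up to the target.
        # Backward jump, or a target beyond the window: skip the rest.
        skip_end = cur if after_branch <= cur < loaded_end else loaded_end
        ipis.update(i.addr for i in window if after_branch <= i.addr < skip_end)
        window_start = cur
    return sorted(ipis)


# Rows 1 and 2 of Table 1 and the fetch sequence given in the text.
table1 = [QueuedInstr(100, 2, True), QueuedInstr(102, 3, False)]
print(ipis_by_space(table1, [100, 102, 104, 105]))  # [102]
print(ipis_by_space(table1, [100, 102, 104, 106]))  # []
```

The check only does real work when the fetch sequence breaks, which matches the low cost attributed to the backward methods; the "first branch was taken" rule is where semantics or timing would be needed to resolve ambiguity.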
Method (3) As for the time condition, it is also simple to detect time duration exceptions. For example, suppose all the instructions are executed in their loaded order, i.e. all the instructions are VPIs. Then the time between instructions 1 and 3 is 6 clocks. In fact, however, instruction 2 is not executed, so the real time difference is less than 6 clocks (T3 - T1 < 6). Therefore the supposition must be wrong: instruction 2 is not a VPI.
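Method (3) reduces to an inequality check under the all-VPI supposition: consecutive observed start times must be at least the sum of the cycles of the instructions loaded between them. In the sketch below, `suspects_by_time`, `cycles` and `observed` are hypothetical names keyed by instruction number, and the sample clock values 0 and 2 are made up only to satisfy T3 - T1 < 6; they are not measured data.

```python
def suspects_by_time(cycles, observed):
    """Backward retrospect on time stamps (hypothetical Method (3) sketch).

    cycles:   {instruction number: clock cycles}, in loaded order
              (the 'Cycles Per Ins' column of Table 1)
    observed: {instruction number: clock at which it was seen to start}
    """
    suspects = []
    marks = sorted(observed)
    for a, b in zip(marks, marks[1:]):
        # All-VPI supposition: instructions a .. b-1 run back to back.
        expected = sum(cycles[k] for k in range(a, b))
        elapsed = observed[b] - observed[a]
        if elapsed < expected:
            # Supposition fails: something between a and b was not executed.
            suspects.extend(range(a + 1, b))
    return suspects


cycles = {1: 2, 2: 4}  # Table 1: JMP takes 2 clocks, MOV takes 4
print(suspects_by_time(cycles, {1: 0, 3: 2}))  # [2]: instruction 2 is not a VPI
print(suspects_by_time(cycles, {1: 0, 3: 6}))  # []: consistent with all-VPI
```

Only shortfalls are flagged; an elapsed time longer than expected (wait states, cache or bus contention) is not treated as an IPI. When more than one instruction lies between two observations, the check only says that some of them did not run.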
In this case, all three methods work well in finding the IPI, so the choice in a real test project depends on the computing cost of each method. For this code segment, the cost of methods (2) and (3) is much lower and the corresponding algorithms are rather simple; on the contrary, the cost of method (1) is much higher. After comparing many test results on large-scale target software, we found that the time and/or space costs differ greatly depending on which method is selected. The essential difference among these methods can be summarized as follows: method (1) is based on forward prediction, so all the data must be checked; methods (2) and (3) are based on backward retrospect, so further analysis is needed only when an exception occurs. Table 2 shows the comparison of these methods.

In complex cases, none of these methods alone works well or produces accurate results, so all of them must be applied comprehensively. In real test projects, all the analysis results of MC-NITM have been gratifying.

Method | Speed  | Memory | Algorithm
1      | Slow   | High   | Complex, Forward
2      | Faster | Low    | Simple, Backward
3      | Faster | Low    | Simple, Backward
Common ground of all three methods: compare the static features and the dynamic trace data to find exceptions (semantics, time and space).
Table 2 Differences of three methods.

6 Conclusions

MC-NITM is an upgrade of NITM [6], especially for embedded systems. It can reflect the real running status of the target system and can test a rather fast processor. Although it does have some influence on the target software, its test analysis results are much more accurate than those of traditional test methods. Furthermore, NITM could be adapted to all kinds of target systems with different architectures; this paper has presented a potential implementation model based on an x86 multi-core processor.

References:
[1] Jin Chao. CodeTest: The Use of Embedded Software Real-Time Test and Analysis Tools in the Development of Embedded System. In: Shen Xubang, He Limin. 2001 International Conference on Embedded Systems. Beijing: Beijing University of Aeronautics and Astronautics Press, 2001: 290-292.
[2] Wang Pu, Zhang Zhenjian, Wang Yuxi. A Research on Coverage-Based Software Testing Technique in the Real-Time Embedded Computer System. Computer Engineering and Design, 1998, 19(6): 45-49.
[3] Li Qian, Mei Lin, Ling Hui, et al. EASTT: An Embedded Application Software Test System. Computer Engineering and Science, 2002, 24(2): 66-69.
[4] Wolf F, Ernst R. Data Flow Based Cache Prediction Using Local Simulation. In: IEEE International High-Level Validation and Test Workshop (HLDVT'00). Berkeley, California, 2000: 155-160.
[5] Ferdinand C, Wilhelm R. On Predicting Data Cache Behavior for Real-Time Systems. In: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES'98), Montreal, Canada, June 1998. Lecture Notes in Computer Science, vol. 1474, 1998: 16-30.
[6] Zhang J, Jin H H, Shang L H. Non-Interference Test Method of Embedded System Software. Journal of Beijing University of Aeronautics and Astronautics, 2004.7.
[7] Alexandra Fedorova. Operating System Scheduling for Chip Multithreaded Processors. Harvard University, Sept. 2006.
[8] C. Jung, D. Lim, J. Lee, S. Han. Adaptive Execution Techniques for SMT Multiprocessor Architectures. In: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005: 236-246.
[9] S. Kim, D. Chandra, Y. Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. In: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2004: 111-122.
[10] D. Chandra, F. Guo, S. Kim, Y. Solihin. Predicting Inter-Thread Cache Contention on a Multi-Processor Architecture. In: Proceedings of the 12th International Symposium on High Performance Computer Architecture, 2005: 340-351.
[11] B. B. Brandenburg, J. M. Calandrino, J. H. Anderson. On the Scalability of Real-Time Scheduling Algorithms on Multicore Platforms: A Case Study. In: Real-Time Systems Symposium, Nov. 30 - Dec. 3, 2008: 157-169.
[12] Jun Yan, Wei Zhang. WCET Analysis for Multi-Core Processors with Shared L2 Instruction Caches. In: IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS '08), 22-24 April 2008: 80-89.