Obtaining Hard Real-Time Performance and Rich Linux Features in A Compounded Real-Time Operating System by A Partitioning Hypervisor
VEE ’20, March 17, 2020, Lausanne, Switzerland Chung-Fan Yang and Yasushi Shinjo
RTOSs to be compatible with typical GPOSs. These challenges mainly aim to increase the preemptible regions in OS kernels. For example, a popular GPOS, Linux, is frequently modified as an RTOS. Linux has rich features and supports the standard POSIX interface. It has a large number of existing applications and libraries. It is also a familiar programming environment for developers. By turning Linux into an RTOS, these existing resources can reduce the development cost of real-time applications. Therefore, researchers and developers are working on real-time extensions of Linux. However, each of these approaches has limitations.

The first representative approach is to extend Linux. Early examples of real-time extensions to Linux are RT-Linux [60], Resource kernel for Linux [40], and Time-Sensitive Linux [19]. Currently, the most widely used extension is PREEMPT_RT [53]. This is a large set of patches that convert the non-preemptible parts of Linux into preemptible ones and add real-time capability to Linux. However, it is difficult to prove that this set of patches can achieve hard real-time performance. The existing studies only claim that the performance of the PREEMPT_RT patched Linux is statically bounded, but the permissiveness of this bound is unknown [46]. Moreover, this set of patches is maintained out of the upstream kernel source tree. In Linux 4.9, which contains about 25,000,000 lines of code, the set of patches contains about 14,000 lines of changes to the vanilla kernel. It is difficult to track down every single section of the large and evolving Linux kernel, rewrite it, and evaluate its real-time performance and preemptibility. This makes the effort of developing and maintaining this set of patches extremely high.

Another representative approach to real-time Linux is running a coscheduled RTOS with Linux and hijacking the interrupt requests (IRQ) from Linux by using interrupt-dispatching layers including microkernels [3, 16, 18, 21–24, 32, 59]. These methods achieve hard real-time performance but limit access to rich Linux features. For example, developers have to split their applications into real-time processes running in the RTOS and non-real-time processes running in Linux. Because real-time processes cannot use features of Linux directly, they must use a special inter-process communication mechanism with the non-real-time processes. Developers must use special toolchains for building applications of real-time processes. It is not trivial to port the evolving Linux kernel to an interrupt-dispatching layer. Thus, these methods run into the same problems as typical RTOSs, which make development tedious.

To address these limitations in usability and maintainability, we propose a new approach to a compounded real-time operating system (cRTOS). This system runs both a rich GPOS and a swift RTOS (sRTOS) side by side under a hypervisor and creates a normal realm and a real-time realm. The GPOS requires no modifications and patching. Real-time applications in the real-time realm not only gain hard real-time performance from the RTOS but can also use the rich features of the GPOS through the same application binary interface (ABI) as the GPOS. Developers can write and test their applications with toolchains they are familiar with, e.g. gcc.

The contributions of this paper are the following:
1. We show the design and implementation of a cRTOS that provides both hard real-time performance and a Linux-compatible execution environment without modifications to Linux.
2. We provide the same efficient development process of rich and hard real-time applications as that of Linux without modifications to Linux. It also allows fast prototyping.

This paper is organized as follows. We describe the design of our cRTOS in Section 2 and its implementation in Sections 3 and 4. In Section 5, we present the evaluation and experimental results, including real-time performance and Linux-compatibility on system calls. In Section 6 we compare our work with various previous studies. We conclude the paper in Section 7.

2 System design
We design a cRTOS that fulfills the following requirements:
• Short and bounded jitter:
  It has fully preemptive hard real-time performance.
• Good maintainability for kernel developers:
  It works without patching Linux, only by adding plugins, e.g. kernel modules.
• Good usability for application developers:
  It allows access to the rich features of Linux.

It is not trivial to achieve the first and second requirements by modifying existing GPOSs because they are designed for obtaining high throughput for common (non-real-time) applications. Many previous approaches tried to modify Linux, which has architectures for high throughput, e.g. the interrupt handling architecture with the top and bottom half. Modifying such a throughput-oriented architecture faced a maintenance problem, as discussed in Section 1. Therefore, we designed our cRTOS to have an isolated environment for real-time applications to achieve the first requirement, as well as a common environment for the second and third requirements.

2.1 Normal and real-time realms
Our proposed cRTOS runs a GPOS and one or more instances of sRTOSs in parallel using a hypervisor on a multi-core platform, as shown in Figure 1. The hypervisor provides strong isolation of hardware resources. It also provides inter-VM collaboration facilities, such as inter-VM shared memory, messaging and event notification.
[Figure 1 diagram labels: kernel modules, rich GPOS kernel, swift RTOS kernel, RT hypervisor, remote system calls, local system calls.]
Figure 1. The system architecture of our cRTOS and the internals of a rich real-time process.
In Figure 1, we show two realms. One is the normal realm, which runs a GPOS and provides the rich features. It executes non-real-time (non-RT) processes. This normal realm occupies most of the hardware resources including processor cores and non-real-time I/O devices. Each core has a programmable interrupt controller (PIC) with a timer.

The other is the real-time realm, which runs an sRTOS and acts as a subsystem of the GPOS. The sRTOS kernel is specially designed for real-time processes and provides a fully preemptible hard real-time environment. It also has a shorter and more consistent interrupt handling mechanism than typical GPOSs. This real-time realm has dedicated processor cores and real-time devices.

These two realms collaborate via inter-VM messaging queues and a large shared memory area. The GPOS loads a helper kernel module dynamically for this inter-VM collaboration.

2.2 Developing rich real-time applications
Developers can create real-time processes from the normal realm. A real-time process is visible in both the normal and real-time realms. We call such a process a rich real-time process, as shown in Figure 1.

A single rich real-time process has multiple threads in the real-time realm. The scheduling and synchronization of these threads are managed by the sRTOS kernel in the real-time realm with the real-time scheduler. This provides a fully preemptible environment and a short interrupt handling path to the rich real-time processes and prevents priority inversion among threads.

Each rich real-time process has a shadow process running in the normal realm, which is used to access the resources and features in the normal realm. Each thread of a rich real-time process is mapped to a thread of the shadow process. This avoids head-of-line blocking because multiple non-real-time threads do not interfere with one another when accessing the resources in the normal realm.

In our cRTOS, a developer develops a rich real-time application as follows.

1. Choose a rich GPOS and sRTOS.
   Typically, Linux is chosen as a rich GPOS. The sRTOS should provide real-time performance through standard APIs of the GPOS, such as the POSIX thread and real-time extension [51]. If the sRTOS lacks support for multiple address spaces and other features, the developer needs to add them for reusing the rich toolchains of the GPOS. We will discuss this in Section 3.3. The developer also has to add system call handlers, which will be shown in Section 4.
2. Design a rich real-time process.
   It is composed of real-time and non-real-time threads. The real-time threads access real-time devices and timers and contain critical algorithms. The non-real-time threads use rich features of the GPOS, e.g. X Window. They use lockless algorithms over shared variables to avoid blocking real-time threads by non-real-time threads.
3. Develop and test the application in the GPOS.
   The developer can use the standard API and rich toolchains of the GPOS. The application uses a device driver of the GPOS in this (prototype) phase.
4. Find device drivers of real-time devices.
   The device drivers of the sRTOS of the real-time realm can be reused as is. If no drivers are available, the developer has to write them.
5. Run the executable in the real-time realm.
   The developer confirms the real-time performance of the rich real-time process.
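
Step 2 above asks real-time and non-real-time threads to exchange data through lockless algorithms over shared variables. The following is a minimal sketch of one such algorithm, a single-producer single-consumer ring buffer built on C11 atomics. It is an illustration only and is not taken from our implementation: the names `spsc_ring`, `ring_push`, `ring_pop`, and the capacity `RING_SLOTS` are hypothetical, and it assumes exactly one real-time producer thread and one non-real-time consumer thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u /* hypothetical capacity; must be a power of two */

struct spsc_ring {
    _Atomic uint32_t head; /* next slot to write; modified only by the producer */
    _Atomic uint32_t tail; /* next slot to read; modified only by the consumer */
    int32_t slots[RING_SLOTS];
};

/* Real-time side: never blocks. Returns false (sample dropped) if the ring is full. */
static bool ring_push(struct spsc_ring *r, int32_t sample)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1u)] = sample;
    /* Release: the slot contents become visible before the new head value. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Non-real-time side: never blocks. Returns false if the ring is empty. */
static bool ring_pop(struct spsc_ring *r, int32_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1u)];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because neither function waits on a lock, a stalled non-real-time consumer can at worst cause the producer to drop samples; it cannot delay the real-time thread.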
[Figure 3 diagram labels: execve(); (8) Read of ELF sections; (9) Reply ELF sections; (10) Jump to the program entry point.]
Figure 3. Executing a real-time process in the real-time realm from the normal realm

to the server. The first argument is the file name of the ELF executable in the normal realm. The server creates a seed process with a new system call rexec() of Nuttx. This seed process executes the pre-defined code in the Nuttx kernel and calls the execve() system call.

The execve() system call is mainly implemented by the user-level code that performs the following.

1. Communicates with the shadow process and reads the headers of the ELF executable.
2. Allocates the required memory pages for the .text, .data, and .bss segments from the shared memory area.
3. Communicates with the shadow process and reads the sections of the ELF executable into these memory pages.
4. Allocates memory pages for a new user stack from the shared memory area.
5. Maps the memory pages to the address space of the seed process.
6. Communicates with the shadow process and requests it to map the memory pages in the shared memory area to the same addresses of the seed process by the mmap() system call.
7. Stores the arguments and environment variables onto the user stack.
8. Stores the ELF auxiliary vectors onto the user stack.
9. Sets the stack pointer and jumps to the entry point of the executable.

The shadow process shares the same address space as the new rich real-time process. Figure 4 shows an example of the mapping between a rich real-time process and its corresponding shadow process. This is essential for the remote system calls, and its details will be described in Section 4.

Figure 4. Memory layout of a rich real-time process and its shadow process.

Executables of Linux are classified into two types: statically linked and dynamically linked executables. Real-time applications often prefer statically linked executables because they have no run-time overheads. Typical real-time applications of Nuttx are also statically linked executables. In our cRTOS, the execve() system call can handle statically linked executables.

In contrast, it is convenient for rich real-time application developers to use dynamically linked executables because most Linux executables nowadays are dynamically linked ones. A dynamically linked executable requires a dynamic loader to load related dynamic shared objects (DSOs) during startup and at run-time. Most dynamically linked executables in Linux use the common dynamic loader, called ld.so [29].

To reuse this dynamic loader, we implemented the mmap() system call as a dual system call. Ld.so itself is a statically linked program. Therefore, it can be loaded as a statically linked executable with execve(). Ld.so loads the dynamically linked libraries with mmap() and jumps to the entry point of the dynamically linked executable. We will describe the implementation of the system call in the next section.

4 Handling system calls
4.1 Real-time system calls
In the real-time realm, the following system calls (and related APIs) are handled by the Nuttx kernel in a real-time way:
• Task-related:
  clone(), fork(), sched_setparam(), etc.
• Time-related:
  clock_nanosleep(), gettime(), etc.
• IO-related:
  open(), close(), read(), write(), ioctl() for real-time I/O devices and pseudo devices.
• IPC-related:
  socket() (both Unix domain sockets and TCP/IP), pipe(), select(), and poll(), etc. for communication with other real-time processes.

When we run Linux executables in Nuttx, we have to solve the following problems:
1. The numbering of system calls of Nuttx is different from that of Linux.
2. The flag values of various system calls are different. For example, the flag "O_RDONLY" of open() is 0x0 in Linux and 0x1 in Nuttx. Conforming to POSIX means only source level compatibility.

To solve these problems, we have implemented a translation table for the system call numbering and multiple stubs for all the system calls with different flag values. We did not modify the system call numbering and flag values of Nuttx.

A rich real-time process can use the POSIX Thread API to create real-time and non-real-time threads. In Linux, these are implemented mainly in Glibc at the user level with a small number of non-standard system calls of Linux. We have added the following non-standard system calls of Linux to Nuttx.

• clone() (including fork() and vfork())
  This is used for creating a kernel-level thread. Because multi-threading of both Linux and Nuttx is based on the 1:1 model [31], we create a single Nuttx thread for each invocation of clone(). In addition, we create a Linux thread in the shadow process for remote system calls.
• futex() (fast user-space locking)
  This is used in Glibc to reduce the overhead of the condition variable and mutex of Glibc.
• arch_prctl() (set architecture-specific thread state)
  This system call can set the FS register of x86-64 with an argument. This is used for implementing thread local storage (TLS) with gcc support on the x86-64 platform [6]. For example, Glibc utilizes TLS of gcc to implement the per-thread errno. Furthermore, TLS is used in Glibc to implement thread specific data with pthread_getspecific() [50].
• set_tid_address() (set thread ID address)
  This is used to implement pthread_join(), pthread_exit() and pthread_create(). When a thread calls pthread_join(), the thread is put to wait for a futex associated with the thread ID address that is set in pthread_create(). The futex is released when the other thread calls pthread_exit().

4.2 Remote system calls
For non-critical tasks, a rich real-time process can use the following remote system calls of Linux:

• IO-related:
  open(), close(), read(), write(), and ioctl() for non-real-time I/O devices and filesystems mounted in the normal realm.
• IPC-related:
  Sockets (both Unix domain sockets and TCP/IP) for communicating with non-real-time processes.

These IO and IPC system calls give a real-time process access to rich features. For example, a rich real-time process can open a DSO file in the normal realm and read it into the memory of the real-time realm to use additional libraries. A rich real-time process can use Unix domain sockets and connect to the X window server in the normal realm.

We have implemented remote system calls using shared memory and virtio queues of Jailhouse as follows.

In the real-time realm:
1. A user thread of a rich real-time process issues a system call, switching to its kernel thread.
2. The kernel thread gets the arguments, makes a request message from these arguments and the priority of the thread, and puts the message into the virtio queue to Linux.
3. The kernel thread notifies Linux with an inter-processor interrupt (IPI).
4. The kernel thread sleeps and yields the CPU to the next runnable thread.

In the normal realm:
5. The Linux kernel handles the IPI. The interrupt handler gets the request message and puts it into the memory of the corresponding shadow process. The interrupt handler wakes up the corresponding thread in the shadow process.
6. The corresponding thread extracts the request message and issues a system call to Linux.
7. The Linux kernel performs the system call.
8. The corresponding thread makes a reply message with the return value of the system call and goes back to the Linux kernel.
9. The Linux kernel puts the reply message into the virtio queue to Nuttx. The Linux kernel checks the current thread priority in the real-time realm, which is published on a shared page by the Nuttx kernel. The Linux kernel notifies the Nuttx kernel with an IPI if the priority of the returning thread is higher than the current one.

In the real-time realm:
10. The Nuttx kernel gets the reply messages from the queue and wakes up and schedules the corresponding sleeping kernel thread that issued the remote system call in the following circumstances:
    • On receiving an IPI from Linux.
    • During a context switch. The Nuttx kernel polls the virtio queue for available messages.
11. The kernel thread gets the return value from the reply message and returns it to the user thread.

In this implementation, we have chosen to use IPIs for better response with real-time performance. We took extra care when sending the IPIs to the real-time realm to prevent unwanted priority inversion. We also chose to use the user-level
code in the shadow process for simplicity. Similar implementations are found in device emulation of Qemu/KVM [28, 44].

We also have to multiplex system call handling for individual opened files. For example, if a file is opened in the real-time realm, the system call read() should be handled locally. If a file is opened in the normal realm, the system call read() should be handled as a remote system call.

To solve this problem, the current implementation uses a simple method for real-time performance. We split the address space of file descriptors into two spaces, the real-time space and the normal space, with a limit. If a file descriptor is smaller than the limit, the kernel deals with it as a file descriptor of the normal realm. Otherwise, the kernel deals with it as one of the real-time realm. The kernel deals with file descriptors 0 to 2 as those in the normal realm, and a rich real-time process can use stdin, stdout, and stderr with non-real-time threads.

4.3 Dual system calls
Because both a rich real-time process and its shadow process have to keep the same memory layout and terminate together, some system calls are executed in both the real-time and normal realms. These include mmap(), munmap(), exit(), fork(), and vfork().

Because the parent process and the child process in fork() and vfork() have the same memory layout, a single shadow process cannot map the memory of both processes. Therefore, the shadow process also forks itself in the normal realm.

5 Evaluation
In this section, we evaluate our cRTOS. First, we show the real-time performance of our cRTOS compared with the representative existing approaches, the PREEMPT_RT patched Linux and Xenomai 3. We measure the timing accuracy and interrupt latency. Next, we show the performance of the real-time and remote system calls. Finally, we discuss the compatibility of our cRTOS with Linux.

5.1 Real-time performance

5.1.1 Experimental setup
To evaluate our cRTOS, we used an x86-64 platform with an Intel Xeon 2630 v4 processor with 10 cores and 20 MB of the last-level cache (LLC) and 32 GB of RAM. We disabled the hyper-threading of the processor and the power management of operating systems to ensure accuracy of measurement. The normal realm ran vanilla or PREEMPT_RT patched Linux with version 4.9.84 in Ubuntu 16.04. The real-time realm ran our ported Nuttx version 7.3. The hypervisor was Jailhouse version 0.9.1.

We ran real-time benchmark programs in the following five different systems, which are all Linux-compatible environments.
• Vanilla Linux on bare-metal machine
• PREEMPT_RT patched Linux on bare-metal machine
• Xenomai 3 and I-Pipe patched Linux on bare-metal machine
• Our cRTOS with vanilla Linux on Jailhouse
• Our cRTOS with PREEMPT_RT Linux on Jailhouse

The first three served as baselines to compare the performance of our cRTOS. In our cRTOS, we allocated a single core and one half of the LLC to the sRTOS, and the rest of the cores and LLC to Linux. In the other systems, we allocated an exclusive core to a real-time benchmark program and one half of the LLC to the core. The rest of the cores and LLC were shared by other non-real-time programs.

To add loads to the system, we used stress-ng [26] with various built-in stressors, including the stream [34] stressor. In each experiment, we concurrently ran 10 stressors in the normal realm in our cRTOS. In other systems, we ran the stressors in Linux.

5.1.2 Timing accuracy
We evaluated the timing accuracy of our cRTOS using cyclictest [52] with various loads. Cyclictest measures the time between when a thread sets a timer and when the timer expires. Using this program, we measured the accuracy of the response to the timer interrupts. All the systems used TSC timers. We ran the same dynamically linked cyclictest executable of Linux, compiled from the source code of the main repository, in all the experiments except Xenomai. Because the cyclictest of the main repository did not work in Xenomai, we used the cyclictest port by Xenomai. Each test ran cyclictest with the SCHED_FIFO scheduling policy, with the highest priority, and produced 100,000 samples.

Figure 5 shows the results without and with load. The results of vanilla Linux in Figure 5 (d) and (e) have been cropped for better visualization. This figure contains modified box plots, which depict the first and third quartiles of latency with maximum and minimum values. Figure 5 (a) shows the results without load. While threads running in vanilla Linux could achieve a very short response time to timer interrupts, the variation in response time was high. In contrast, threads running in the real-time realm of our cRTOS had not only shorter latency, but also higher stability and predictability.

Figures 5 (b) to (e) show the results with load. Compared with Figure 5 (a), the jitter of the real-time process was increased. These results show that our cRTOS could deliver hard real-time performance with about 4 µs jitter and well bounded maximum latency of 4 µs, while the others could not. In other words, with the same application executable, our cRTOS could provide better real-time performance than the representative Linux-compatible real-time environments. In addition, we noticed that the performance of our cRTOS was slightly better when using PREEMPT_RT patched Linux
[Figure 5: box plots of latency (µs, axis from 0 to 60) for each environment under test, in five panels: (a) Without load, (b) Loaded with cpu, (c) Loaded with cache, (d) Loaded with vm, (e) Loaded with stream. Environments: L = Vanilla Linux, P = PREEMPT_RT, X = Xenomai 3, LN = Proposed (Vanilla Linux + Nuttx), PN = Proposed (PREEMPT_RT + Nuttx).]
Figure 5. The latency jitter measured with cyclictest in Linux-compatible real-time environments under various loads.
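
The kind of measurement behind Figure 5 can be sketched as follows. As described in Section 5.1.2, cyclictest measures the time between when a thread sets a timer and when the timer expires. The sketch below is not the cyclictest source; it assumes a POSIX system with `clock_nanosleep()` and `CLOCK_MONOTONIC`, and the helper names `ts_to_ns`, `ts_add_ns`, `wakeup_latency_ns`, and `jitter_ns` are hypothetical. It does not set the SCHED_FIFO policy used in the experiments, and it takes jitter to be the spread between the largest and smallest sample.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t ts_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

static struct timespec ts_add_ns(struct timespec ts, int64_t ns)
{
    int64_t total = ts_to_ns(ts) + ns;

    ts.tv_sec = (time_t)(total / NSEC_PER_SEC);
    ts.tv_nsec = (long)(total % NSEC_PER_SEC);
    return ts;
}

/* One sample: sleep until an absolute deadline and return how late we woke up. */
static int64_t wakeup_latency_ns(int64_t interval_ns)
{
    struct timespec now, deadline;
    int rc;

    clock_gettime(CLOCK_MONOTONIC, &now);
    deadline = ts_add_ns(now, interval_ns);
    do {
        /* clock_nanosleep() returns the error number directly, not via errno. */
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);
    clock_gettime(CLOCK_MONOTONIC, &now);
    return ts_to_ns(now) - ts_to_ns(deadline);
}

/* Spread between the largest and smallest latency sample. */
static int64_t jitter_ns(const int64_t *samples, int n)
{
    if (n <= 0)
        return 0;
    int64_t min = samples[0], max = samples[0];
    for (int i = 1; i < n; i++) {
        if (samples[i] < min) min = samples[i];
        if (samples[i] > max) max = samples[i];
    }
    return max - min;
}
```

Using an absolute deadline (`TIMER_ABSTIME`) keeps the time spent between reading the clock and entering the sleep out of the measured latency.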
[Figure 7: box plots of latency (µs, axis from 0 to 60) for each environment under test, in five panels: (a) Without load, (b) Loaded with cpu, (c) Loaded with cache, (d) Loaded with vm, (e) Loaded with stream. Environments: L = Vanilla Linux, P = PREEMPT_RT, X = Xenomai 3, LN = Proposed (Vanilla Linux + Nuttx), PN = Proposed (PREEMPT_RT + Nuttx).]
Figure 7. The interrupt latency and jitter of a serial port device in Linux-compatible real-time environments under various loads.
Table 1. The maximum latency of various system calls. Measured by Lmbench in microseconds.

Environment              getpid   read    write   open and close
PREEMPT_RT native        0.306    0.406   0.338   2.23
Xenomai 3                0.456    1.14    1.07    4.16
Real-time system call    0.059    0.088   0.083   0.445
Remote system call       —        27.7    27.0    56.3

Table 2. Analysis of system call coverage. A=Available, N/A=Not Available, O=Obsolete in Linux.

                                                  Real-time realm
Classification      Importance    Linux 3.19     A      N/A    O
Indispensable       100%          224            224    0      0
Important           10% - 100%    33             22     11     0
Low important       0% - 10%      44             38     6      0
Not used            0%            18             4      6      8
New in Linux 4.9    —                            8      5      0
Total                             319            295    29     8
The results are shown in Table 1. We noticed that for real-time system calls, our cRTOS has smaller latency than Linux and Xenomai 3. This is because the real-time realm has a simpler system call handler and does not require switching between user and privileged levels. In general, we achieved a 5 times speedup with respect to PREEMPT_RT patched Linux and Xenomai 3.

For remote system calls, the latency was large. It took 27 µs for a single system call. This was because of the overhead of IPIs, memory copy, cache contention, and scheduling in both realms. We conclude that to achieve good real-time performance, developers should provide real-time drivers for real-time devices. Developers should use remote system calls for rich user interface, access to non-real-time devices, and initialization, e.g. dynamic loading.

5.3 Compatibility with Linux
The compatibility with Linux depends on the sRTOS running in the real-time realm. To evaluate the compatibility of our cRTOS using Nuttx as the sRTOS, we performed the following:

1. We statically analyzed the coverage of system calls and pseudo files in our cRTOS and compared it with vanilla Linux.
2. We executed X Window applications.

5.3.1 System calls and pseudo files
We have shown the development process of our target real-time applications in Section 2.2. In this process, developers develop new real-time applications by using local real-time system calls and remote system calls according to their needs. Although our cRTOS does not support all the system calls of Linux, the missing ones cause no problem in this development process. In the following, we evaluate our cRTOS’s compatibility with Linux to run existing executables, including ones for PREEMPT_RT patched Linux.

Linux version 4.9 has 332 system calls for the x86-64 architecture. Among the 332 system calls, 8 are obsolete, and the remaining 324 are properly implemented in the Linux kernel. The real-time realm supports 295 of the 324 system calls, and 29 system calls are not supported.

We evaluate the effect of these missing system calls by following the methodology in the paper [56]. They analyzed Ubuntu and Debian packages statically in 2.9 million installations, and found that not all APIs are equally important. They define a metric, API importance, for a given API as the probability that an installation includes at least one application requiring the given API. Using this metric, they classified 319 system calls of Linux 3.19 into four classes: indispensable, important, low important, and not used. Table 2 shows the results.
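
To make the API importance metric concrete, the following sketch computes it for one API over a set of installations and maps the result to the four classes. It is an illustration under stated assumptions, not the tooling of [56]: the data layout (`struct installation`, `app_requires_api`) and the function names are hypothetical, the class boundaries follow the Importance column of Table 2, and we assume that a value of exactly 10% counts as important.

```c
#include <stdbool.h>
#include <stddef.h>

enum api_class { NOT_USED, LOW_IMPORTANT, IMPORTANT, INDISPENSABLE };

struct installation {
    const int *apps; /* indices of the applications installed here */
    size_t n_apps;
};

/* Count installations that include at least one application requiring the API. */
static size_t installs_requiring(const struct installation *inst, size_t n_inst,
                                 const bool *app_requires_api)
{
    size_t hits = 0;

    for (size_t i = 0; i < n_inst; i++) {
        for (size_t j = 0; j < inst[i].n_apps; j++) {
            if (app_requires_api[inst[i].apps[j]]) {
                hits++;
                break; /* one requiring application is enough */
            }
        }
    }
    return hits;
}

/* API importance: fraction of installations that need the API. */
static double api_importance(size_t hits, size_t n_inst)
{
    return n_inst ? (double)hits / (double)n_inst : 0.0;
}

/* Classes as in Table 2; integer arithmetic avoids rounding at the boundaries. */
static enum api_class classify(size_t hits, size_t n_inst)
{
    if (hits == 0)
        return NOT_USED;
    if (hits == n_inst)
        return INDISPENSABLE;
    if (hits * 10 >= n_inst) /* assumption: exactly 10% counts as important */
        return IMPORTANT;
    return LOW_IMPORTANT;
}
```

For example, an API required by a single application that appears in 1 of 20 installations has importance 0.05 and falls into the low important class.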
[2] ASIX Electronics Corporationa. 2015. MCS9900 PCIe to Multi I/O (4S, [19] Ashvin Goel, Luca Abeni, Charles Krasic, Jim Snow, and Jonathan
2S+1P) Controller Datasheet. ASIX Electronics Corporationa. Walpole. 2002. Supporting Time-sensitive Applications on a Com-
[3] Antonio Barbalace, Adriano Luchetta, Gabriele Manduchi, Michele modity OS. In Proceedings of the 5th Symposium on Operating Systems
Moro, Anton Soppelsa, and Cesare Taliercio. 2008. Performance Com- Design and Implementation (OSDI ’02). 165–180. https://doi.org/10.
parison of VxWorks, Linux, RTAI, and Xenomai in a Hard Real-Time 1145/844128.844144
Application. IEEE Transactions on Nuclear Science 55, 1 (Feb. 2008), [20] Sriram Govindan, Arjun R. Nath, Amitayu Das, Bhuvan Urgaonkar,
435–439. https://doi.org/10.1109/TNS.2007.905231 and Anand Sivasubramaniam. 2007. Xen and Co.: Communication-
[4] John M. Calandrino, Hennadiy Leontyev, Aaron Block, UmaMah- aware CPU Scheduling for Consolidated Xen-based Hosting Platforms.
eswari C. Devi, and James H. Anderson. 2006. LITMUSRT : A Testbed In Proceedings of the 3rd International Conference on Virtual Execution
for Empirically Comparing Real-Time Multiprocessor Schedulers. In Environments (VEE ’07). 126–136. https://doi.org/10.1145/1254810.
Proceedings of 2006 27th IEEE International Real-Time Systems Sympo- 1254828
sium (RTSS’06). 111–126. https://doi.org/10.1109/RTSS.2006.27 [21] Hermann Härtig, Robert Baumgartl, Martin Borriss, Claude-Joachim
[5] Chok Leong Chai. 2010. [RTAI] "XIO: fatal IO error 1..." when using Hamann, Micheal Hohmuth, Frank Mehnert, Lars Reuther, Sebastian
Liunx server. http://mail.rtai.org/pipermail/rtai/2010-June/023338. Schönberg, and Jean Wolter. 1998. DROPS: OS Support for Distributed
html Multimedia Applications. In Proceedings of the 8th ACM SIGOPS Eu-
[6] Ulrich Drepper. 2002. ELF Handling for Thread-Local Storage. Technical ropean Workshop on Support for Composing Distributed Applications.
Report. 203–209. https://doi.org/10.1145/319195.319226
[7] Hideki Eiraku, Yasushi Shinjo, Calton Pu, Younggyun Koh, and Kazuhiko Kato. 2009. Fast Networking with Socket-outsourcing in Hosted Virtual Machine Environments. In Proceedings of the 2009 ACM Symposium on Applied Computing (SAC ’09). 310–317. https://doi.org/10.1145/1529282.1529350
[8] Heiko Falk, Sebastian Altmeyer, Peter Hellinckx, Björn Lisper, Wolfgang Puffitsch, Christine Rochange, Martin Schoeberl, Rasmus Bo Sørensen, Peter Wägemann, and Simon Wegener. 2016. TACLeBench: A benchmark collection to support worst-case execution time research. In Proceedings of the 16th International Workshop on Worst-Case Execution Time Analysis (WCET ’16). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2:1–2:10.
[9] The Linux Foundation. 2015. Linux Standard Base Core Specification for X86-64. http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-AMD64/LSB-Core-AMD64.pdf
[10] Sahan Gamage, Ramana Rao Kompella, Dongyan Xu, and Ardalan Kangarlou. 2013. Protocol Responsibility Offloading to Improve TCP Throughput in Virtualized Environments. ACM Transactions on Computer Systems 31, 3, Article 7 (Aug. 2013), 34 pages. https://doi.org/10.1145/2491463
[11] Oscar F. Garcia, Yasushi Shinjo, and Calton Pu. 2018. Achieving Consistent Real-Time Latency at Scale in a Commodity Virtual Machine Environment Through Socket Outsourcing-Based Network Stacks. IEEE Access 6 (2018), 69961–69977.
[12] Marisol García-Valls, Tommaso Cucinotta, and Chenyang Lu. 2014. Challenges in real-time virtualization and predictable cloud computing. Journal of Systems Architecture 60, 9 (2014), 726–740. https://doi.org/10.1016/j.sysarc.2014.07.004
[13] GCC team. 2019. GCC, the GNU Compiler Collection. https://gcc.gnu.org/
[14] GCC team. 2019. The GNU C Library (glibc). https://www.gnu.org/software/libc/
[15] Balazs Gerofi, Masamichi Takagi, Yutaka Ishikawa, Rolf Riesen, Evan Powers, and Robert W Wisniewski. 2015. Exploring the design space of combining Linux with lightweight kernels for extreme scale computing. In Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers. 5–12.
[16] Philippe Gerum. 2004. Xenomai–Implementing a RTOS emulation framework on GNU/Linux. Technical Report.
[17] Philippe Gerum. 2014. [Xenomai] POSIX application running under xenomai – what do wrapped functions do? https://xenomai.org/pipermail/xenomai/2014-June/031120.html
[18] Sourav Ghosh and Ragunathan Raj Rajkumar. 2002. Resource management of the OS network subsystem. In Proceedings 5th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing. ISORC 2002. 271–279. https://doi.org/10.1109/ISORC.2002.1003728
[22] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Sebastian Schönberg, and Jean Wolter. 1997. The Performance of µ-kernel-based Systems. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles (SOSP ’97). 66–77. https://doi.org/10.1145/268998.266660
[23] Gernot Heiser and Kevin Elphinstone. 2016. L4 Microkernels: The Lessons from 20 Years of Research and Deployment. ACM Transactions on Computer Systems 34, 1, Article 1 (April 2016), 29 pages. https://doi.org/10.1145/2893177
[24] Gernot Heiser and Ben Leslie. 2010. The OKL4 Microvisor: Convergence Point of Microkernels and Hypervisors. In Proceedings of the First ACM Asia-pacific Workshop on Workshop on Systems (APSys ’10). 19–24. https://doi.org/10.1145/1851276.1851282
[25] Intel. 2018. Intel® 64 and IA-32 Architectures Software Developer’s Manual.
[26] Colin King. 2018. Stress-ng: a tool to load and stress a computer system. http://kernel.ubuntu.com/~cking/stress-ng
[27] Jan Kiszka and Rik van Riel. 2016. Two approaches to real-time virtualization - Jailhouse and KVM. In Proceedings of the Linux Foundation Real-Time Summit. https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
[28] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. 2007. KVM: the Linux Virtual Machine Monitor. In Proceedings of the 2007 Ottawa Linux Symposium (OLS ’07).
[29] Linux man-pages. 2019. ld.so, ld-linux.so - dynamic linker/loader. http://man7.org/linux/man-pages/man8/ld.so.8.html
[30] C. L. Liu and James W. Layland. 1973. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. J. ACM 20, 1 (Jan. 1973), 46–61. https://doi.org/10.1145/321738.321743
[31] Robert Love. 2010. Linux Kernel Development (3rd ed.). Addison-Wesley Professional.
[32] Paolo Mantegazza, EL Dozio, and Steve Papacharalambous. 2000. RTAI: Real time application interface. Linux Journal 2000, 72es (2000), 10.
[33] Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell. 2014. System V Application Binary Interface AMD64 Architecture Processor Supplement, Draft Version 0.99.7. Technical Report. https://www.uclibc.org/docs/psABI-x86_64.pdf
[34] John D McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. IEEE computer society technical committee on computer architecture (TCCA) newsletter (1995), 19–25.
[35] Larry McVoy and Carl Staelin. 1996. Lmbench: Portable Tools for Performance Analysis. In Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (ATEC ’96). 23–23. http://dl.acm.org/citation.cfm?id=1268299.1268322
[36] Jun Nakajima, Qian Lin, Sheng Yang, Min Zhu, Shang Gao, Mingyuan Xia, Peijie Yu, Yaozu Dong, Zhengwei Qi, Kai Chen, and Haibing Guan. 2011. Optimizing Virtual Machines Using Hybrid Virtualization. In
Proceedings of the 2011 ACM Symposium on Applied Computing (SAC ’11). 573–578. https://doi.org/10.1145/1982185.1982308
[37] Nick. 2013. lxardoscope. https://sourceforge.net/projects/lxardoscope/
[38] Audun Nordal, Åge Kvalnes, and Dag Johansen. 2012. Paravirtualizing TCP. In Proceedings of the 6th International Workshop on Virtualization Technologies in Distributed Computing Date (VTDC ’12). 3–10. https://doi.org/10.1145/2287056.2287060
[39] Nuttx. 2019. NuttX Real-Time Operating System. http://www.nuttx.org
[40] Shuichi Oikawa and Raj Rajkumar. 1999. Portable RK: a portable resource kernel for guaranteed and enforced timing behavior. In Proceedings of the 5th IEEE Real-Time Technology and Applications Symposium. 111–120. https://doi.org/10.1109/RTTAS.1999.777666
[41] Pierre Olivier, Daniel Chiba, Stefan Lankes, Changwoo Min, and Binoy Ravindran. 2019. A Binary-compatible Unikernel. In Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE ’19). 59–73. https://doi.org/10.1145/3313808.3313817
[42] Gabriele Paoloni. 2010. How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures. Intel Corporation.
[43] Qemu. 2016. Device Specification for Inter-VM shared memory device. Technical Report.
[44] Qumranet Inc. 2006. KVM: Kernel-based Virtualization Driver. Qumranet.
[45] Ralf Ramsauer, Jan Kiszka, Daniel Lohmann, and Wolfgang Mauerer. 2017. Look Mum, no VM Exits! (Almost). In Proceedings of Workshop on Operating Systems Platforms for Embedded Real-Time Applications. 13–18.
[46] Federico Reghenzani, Giuseppe Massari, and William Fornaciari. 2019. The Real-Time Linux Kernel: A Survey on PREEMPT_RT. Comput. Surveys 52, 1, Article 18 (Feb. 2019), 36 pages. https://doi.org/10.1145/3297714
[47] Yi Ren, Ling Liu, Qi Zhang, Qingbo Wu, Jianbo Guan, Jinzhu Kong, Huadong Dai, and Lisong Shao. 2016. Shared-Memory Optimizations for Inter-Virtual-Machine Communication. Comput. Surveys 48, 4, Article 49 (Feb. 2016), 42 pages. https://doi.org/10.1145/2847562
[48] Rusty Russell. 2008. Virtio: Towards a De-facto Standard for Virtual I/O Devices. ACM SIGOPS Operating Systems Review 42, 5 (July 2008), 95–103. https://doi.org/10.1145/1400097.1400108
[49] Livio Soares and Michael Stumm. 2010. FlexSC: Flexible System Call Scheduling with Exception-less System Calls. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI ’10). 33–46.
[50] Sun Microsystems, Inc. 2004. Linker and Libraries Guide.
[51] The IEEE and The Open Group. 2017. IEEE 1003.1-2017 - IEEE Standard for Information Technology–Portable Operating System Interface (POSIX(R)) Base Specifications, Issue 7. IEEE.
[52] The Linux Foundation. 2018. Cyclictest. https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start
[53] The Linux Foundation. 2019. realtime:start [Linux Foundation Wiki]. https://wiki.linuxfoundation.org/realtime/start
[54] The Linux Foundation. 2019. Zephyr Project. https://www.zephyrproject.org/
[55] Tool Interface Standard (TIS). 1995. Executable and Linking Format (ELF) Specification Version 1.2. Technical Report.
[56] Chia-Che Tsai, Bhushan Jain, Nafees Ahmed Abdul, and Donald E. Porter. 2016. A Study of Modern Linux API Usage and Compatibility: What to Support when You’re Supporting. In Proceedings of the 11th European Conference on Computer Systems (EuroSys ’16). 16:1–16:16. https://doi.org/10.1145/2901318.2901341
[57] Sisu Xi, Justin Wilson, Chenyang Lu, and Christopher Gill. 2011. RT-Xen: Towards Real-time Hypervisor Scheduling in Xen. In Proceedings of the 9th ACM International Conference on Embedded Software (EMSOFT ’11). 39–48. https://doi.org/10.1145/2038642.2038651
[58] Sisu Xi, Meng Xu, Chenyang Lu, Linh Thi Xuan Phan, Christopher Gill, Oleg Sokolsky, and Insup Lee. 2014. Real-time multi-core virtual machine scheduling in Xen. In 2014 International Conference on Embedded Software (EMSOFT). 1–10. https://doi.org/10.1145/2656045.2656061
[59] Karim Yaghmour. 2001. Adaptive domain environment for operating systems. Opersys Inc.
[60] Victor Yodaiken. 1999. The RTLinux manifesto. In Proceedings of The 5th Linux Expo.
[61] Pei Zhang. 2017. See what happened with real time KVM when building real time cloud. In Proceedings of the Linux Foundation LinuxCon China. https://www.slideshare.net/LCChina