Obtaining hard real-time performance and rich Linux features in a compounded real-time operating system by a partitioning hypervisor

Chung-Fan Yang, University of Tsukuba, Department of Computer Science, Tsukuba, Ibaraki, Japan, sonicyang@softlab.cs.tsukuba.ac.jp
Yasushi Shinjo, University of Tsukuba, Department of Computer Science, Tsukuba, Ibaraki, Japan, yas@cs.tsukuba.ac.jp

Abstract
In this study, we describe obtaining hard real-time performance and rich Linux features together in a compounded real-time operating system (cRTOS). This system creates two realms with a partitioning hypervisor: a normal realm of Linux and a hard real-time realm of a swift RTOS (sRTOS). A rich real-time process running in the real-time realm can use not only the hard real-time performance of the RTOS but also the rich features of Linux through remote system calls. Unlike existing approaches to real-time Linux, such as the PREEMPT_RT patch and interrupt-dispatching layers, this approach requires no modifications to Linux.
We implemented the cRTOS by running Nuttx, a POSIX-compliant RTOS, as the sRTOS and Jailhouse as the partitioning hypervisor. We ported base Nuttx to the x86-64 architecture and added support for multiple address spaces with an MMU. This allows developers of rich real-time applications to use the same toolchains and executables as Linux, which reduces the cost and complexity of developing real-time applications.
We measured the timing accuracy and interrupt latency of the proposed cRTOS and of other existing systems, the PREEMPT_RT patched Linux and Xenomai 3. The experimental results show that the proposed cRTOS could deliver hard real-time performance with about 4 µs jitter and a well-bounded maximum latency, while the others could not. The experimental results also show that the proposed cRTOS with a real-time device yielded the best interrupt response in both latency and jitter. The RTOS could execute complex Linux executables with graphical user interfaces through the X window system.

CCS Concepts • Computer systems organization → Real-time operating systems; • Software and its engineering → Virtual machines; Operating systems; Embedded software; Real-time systems software;

Keywords Real-time operating system, Linux kernel, binary compatibility, virtualization, operating systems

ACM Reference Format:
Chung-Fan Yang and Yasushi Shinjo. 2020. Obtaining hard real-time performance and rich Linux features in a compounded real-time operating system by a partitioning hypervisor. In 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE ’20), March 17, 2020, Lausanne, Switzerland. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3381052.3381323

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
VEE ’20, March 17, 2020, Lausanne, Switzerland
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7554-2/20/03. . . $15.00
https://doi.org/10.1145/3381052.3381323

1 Introduction
Real-time operating systems (RTOSs) are typically used in embedded systems to support time-critical applications [30]. As embedded systems evolve with the development of the Internet of Things (IoT) and edge computing, RTOSs also need to evolve. Users and applications now demand rich features from RTOSs along with hard real-time performance. These include graphical user interfaces and complete TCP/IP networking, which are commonly available in general-purpose operating systems (GPOSs).
Adding these rich features to an RTOS is difficult and unattractive work, because adding code often harms real-time performance and is often considered reinventing the wheel. In addition, developing real-time applications is typically tedious, because the programming environments and toolchains for RTOSs are very different from those of widely adopted GPOSs. For example, well-known programming environments, e.g. the Portable Operating System Interface (POSIX) [51], libraries, e.g. the GNU C library [14], and toolchains, e.g. gcc [13], are not well supported by many off-the-shelf RTOSs. This makes the development of real-time applications inefficient and more expensive.
This has motivated much research on converting GPOSs to fulfill hard real-time requirements and on extending RTOSs to be compatible with typical GPOSs. These efforts mainly aim to increase the preemptible regions in OS kernels. For example, a popular GPOS, Linux, is frequently modified into an RTOS. Linux has rich features and supports the standard POSIX interface. It has a large number of existing applications and libraries. It is also a familiar programming environment for developers. By turning Linux into an RTOS, these existing resources can reduce the development cost of real-time applications. Therefore, researchers and developers are working on real-time extensions of Linux. However, each of these approaches has limitations.
The first representative approach is to extend Linux. Early examples of real-time extensions to Linux are RT-Linux [60], Resource kernel for Linux [40], and Time-Sensitive Linux [19]. Currently, the most widely used extension is PREEMPT_RT [53]. This is a large set of patches that converts the non-preemptible parts of Linux into preemptible ones and adds real-time capability to Linux. However, it is difficult to prove that this set of patches can achieve hard real-time performance. The existing studies only claim that the performance of the PREEMPT_RT patched Linux is statically bounded, but how loose this bound is remains unknown [46]. Moreover, this set of patches is maintained outside the upstream kernel source tree. In Linux 4.9, which contains about 25,000,000 lines of code, the set of patches contains about 14,000 lines of changes to the vanilla kernel. It is difficult to track down every single section of the large and evolving Linux kernel, rewrite it, and evaluate its real-time performance and preemptibility. This makes the effort of developing and maintaining this set of patches extremely high.
Another representative approach to real-time Linux is running a coscheduled RTOS with Linux and hijacking the interrupt requests (IRQs) from Linux by using interrupt-dispatching layers, including microkernels [3, 16, 18, 21–24, 32, 59]. These methods achieve hard real-time performance but limit access to rich Linux features. For example, developers have to split their applications into real-time processes running in the RTOS and non-real-time processes running in Linux. Because real-time processes cannot use features of Linux directly, they must use a special inter-process communication mechanism with the non-real-time processes. Developers must use special toolchains for building the applications of real-time processes. It is also not trivial to port the evolving Linux kernel to an interrupt-dispatching layer. Thus, these approaches suffer from the same problems as typical RTOSs, which make development tedious.
To address these limitations in usability and maintainability, we propose a new approach: a compounded real-time operating system (cRTOS). This system runs both a rich GPOS and a swift RTOS (sRTOS) on a hypervisor and creates a normal realm and a real-time realm. The GPOS requires no modifications or patching. Real-time applications in the real-time realm can not only gain hard real-time performance from the RTOS but can also use the rich features of the GPOS through the same application binary interface (ABI) as the GPOS. Developers can write and test their applications with toolchains they are familiar with, e.g. gcc.
The contributions of this paper are the following:
1. We show the design and implementation of a cRTOS that provides both hard real-time performance and a Linux-compatible execution environment without modifications to Linux.
2. We provide the same efficient development process for rich and hard real-time applications as that of Linux, without modifications to Linux. It also allows fast prototyping.
This paper is organized as follows. We describe the design of our cRTOS in Section 2 and its implementation in Sections 3 and 4. In Section 5, we present the evaluation and experimental results, including real-time performance and Linux compatibility of system calls. In Section 6, we compare our work with various previous studies. We conclude the paper in Section 7.

2 System design
We design a cRTOS that fulfills the following requirements:
• Short and bounded jitter:
It has fully preemptive hard real-time performance.
• Good maintainability for kernel developers:
It works without patching Linux, only by adding plugins, e.g. kernel modules.
• Good usability for application developers:
It allows access to the rich features of Linux.
It is not trivial to achieve the first and second requirements by modifying existing GPOSs, because they are designed to obtain high throughput for common (non-real-time) applications. Many previous approaches tried to modify Linux, whose architectures are designed for high throughput, e.g. the interrupt handling architecture with top and bottom halves. Modifying such a throughput-oriented architecture causes a maintenance problem, as discussed in Section 1. Therefore, we designed our cRTOS with an isolated environment for real-time applications to achieve the first requirement, as well as a common environment for the second and third requirements.

2.1 Normal and real-time realms
Our proposed cRTOS runs a GPOS and one or more instances of sRTOSs in parallel using a hypervisor on a multi-core platform, as shown in Figure 1. The hypervisor provides strong isolation of hardware resources. It also provides inter-VM collaboration facilities, such as inter-VM shared memory, messaging, and event notification.


Figure 1. The system architecture of our cRTOS and the internals of a rich real-time process. [Diagram: the normal realm runs the rich GPOS (Linux) kernel with kernel modules, non-RT processes, and a shadow process on non-RT cores with non-RT devices; the real-time realm runs the swift RTOS (Nuttx) kernel with a rich RT process (RT and non-RT threads, local and remote system calls) on an RT core with RT devices; both realms run on the RT hypervisor, and each core has a PIC (programmable interrupt controller).]

In Figure 1, we show two realms. One is the normal realm, which runs a GPOS and provides the rich features. It executes non-real-time (non-RT) processes. This normal realm occupies most of the hardware resources, including processor cores and non-real-time I/O devices. Each core has a programmable interrupt controller (PIC) with a timer.
The other is the real-time realm, which runs an sRTOS and acts as a subsystem of the GPOS. The sRTOS kernel is specially designed for real-time processes and provides a fully preemptible hard real-time environment. It also has a shorter and more consistent interrupt handling mechanism than typical GPOSs. This real-time realm has dedicated processor cores and real-time devices.
These two realms collaborate via inter-VM messaging queues and a large shared memory area. The GPOS dynamically loads a helper kernel module for this inter-VM collaboration.

2.2 Developing rich real-time applications
Developers can create real-time processes from the normal realm. A real-time process is visible in both the normal and real-time realms. We call such a process a rich real-time process, as shown in Figure 1.
A single rich real-time process has multiple threads in the real-time realm. The scheduling and synchronization of these threads are managed by the sRTOS kernel in the real-time realm with the real-time scheduler. This provides a fully preemptible environment and a short interrupt handling path to the rich real-time processes, and it prevents priority inversion among threads.
Each rich real-time process has a shadow process running in the normal realm, which is used to access the resources and features in the normal realm. Each thread of a rich real-time process is mapped to a thread of the shadow process. This avoids head-of-line blocking, because multiple non-real-time threads do not interfere with one another when accessing the resources in the normal realm.
In our cRTOS, a developer develops a rich real-time application as follows.
1. Choose a rich GPOS and sRTOS.
Typically, Linux is chosen as the rich GPOS. The sRTOS should provide real-time performance through standard APIs of the GPOS, such as the POSIX thread and real-time extensions [51]. If the sRTOS lacks support for multiple address spaces and other features, the developer needs to add them in order to reuse the rich toolchains of the GPOS. We will discuss this in Section 3.3. The developer also has to add system call handlers, which will be shown in Section 4.
2. Design a rich real-time process.
It is composed of real-time and non-real-time threads. The real-time threads access real-time devices and timers and contain critical algorithms. The non-real-time threads use rich features of the GPOS, e.g. X Window. They use lockless algorithms over shared variables so that non-real-time threads never block real-time threads.
3. Develop and test the application in the GPOS.
The developer can use the standard API and rich toolchains of the GPOS. The application uses a device driver of the GPOS in this (prototype) phase.
4. Find device drivers for the real-time devices.
The device drivers of the sRTOS in the real-time realm can be reused as is. If no drivers are available, the developer has to write them.
5. Run the executable in the real-time realm.
The developer confirms the real-time performance of the rich real-time process.


2.3 System calls
The OS kernel of the real-time realm provides only a portion of the system calls locally. When a rich real-time process issues a system call, the system call handler executes it according to its pre-defined type. There are three types of system calls:
• Real-time system calls:
These are related to scheduling and synchronization and are executed in the real-time realm.
• Remote system calls (RSCs):
These are executed in the normal realm.
• Dual system calls:
These are executed in both the normal and real-time realms. For example, mmap() and exit() are executed in both realms.

2.4 Comparisons with existing approaches
This design of our cRTOS has advantages over the two representative approaches to real-time Linux described in Section 1. Because our design utilizes a partitioning hypervisor, modifications to the Linux kernel are not necessary. This enables a developer to use the newest vanilla kernel with new features.
Both the approach using the PREEMPT_RT patch and our cRTOS provide the same application binary interface (ABI) as vanilla Linux and allow reuse of existing binary executables. However, our cRTOS includes a real-time kernel and a real-time interrupt handling path for real-time processes. This makes our cRTOS outperform the approach using the PREEMPT_RT patch in hard real-time tests of interrupt handling and preemptive scheduling.
The approach using interrupt-dispatching layers also uses a real-time kernel. However, this real-time kernel does not provide Linux APIs, and its ABI is not compatible with vanilla Linux. We cannot use existing binaries as-is, and we need special APIs to issue system calls to the Linux kernel. Furthermore, in our cRTOS, the real-time and non-real-time threads in a rich real-time process can be synchronized by the standard APIs of the GPOS, i.e. the POSIX thread API, with the priority-inheritance protocol. Our cRTOS outperforms this approach in usability. Moreover, our cRTOS gives better real-time performance under load because of better resource isolation and hardware usage for interrupt handling. We will show this in Section 5.

3 Implementation
In Section 2, we showed the design of our cRTOS. Based on this design, we implemented our cRTOS using Jailhouse as the partitioning hypervisor, Linux as the GPOS, and Nuttx as the sRTOS.

3.1 Jailhouse
We have chosen Jailhouse [45] as the partitioning hypervisor because it provides the following features:
• Strong isolation of hardware resources and interrupts.
• Real-time performance.
• No VM exits under normal execution.
• Assignment of PCI Express (PCI-e) devices and direct routing of the related interrupts to the CPU cores.
Jailhouse does not perform dynamic core scheduling. This lowers resource utilization but has an advantage for realizing hard real-time performance [12, 27, 57, 61].
Jailhouse utilizes the typical virtualization extensions of modern CPUs and creates isolated virtual machines called cells. Jailhouse provides shared memory (IVSHMEM), messaging queues based on virtio for inter-guest communication, and virtual PCI devices [43, 47, 48]. We implemented the shared memory facility by using IVSHMEM.
In our cRTOS, the GPOS, Linux, boots first. Next, Jailhouse is activated from Linux. Jailhouse creates two cells and allocates hardware resources to these two cells. The GPOS (the booted Linux) is moved into one cell. Finally, the sRTOS, Nuttx, boots in the other cell. Figure 2 shows the memory map of Linux and Nuttx. Linux uses most of the physical memory. The memory of Nuttx consists of a boot loader, a shared memory area, the kernel area, and virtio areas.

3.2 Nuttx x86-64
We use Nuttx [39] as the sRTOS because it provides the following features:
• POSIX-compatible multithread API and real-time thread scheduling.
• Real-time interrupt handling.
• Pseudo file systems, such as /dev and /proc.
• A TCP/IP networking stack.
While Nuttx is a simple RTOS targeting embedded systems, it supports a wide range of CPUs, including ARM, MIPS, RISC-V, Zilog, and 8086. However, it did not support x86-64 at the time of writing. Because we would like to run it together with Linux x86-64, we first ported it to x86-64 long mode [25].
Nuttx consists of architecture-dependent and architecture-independent parts. We modified only the architecture-dependent part for x86-64 and added support for
1. time stamp counter (TSC) timers with nanosecond resolution, and
2. the local advanced programmable interrupt controller (LAPIC) of x86-64.
We run this Nuttx x86-64 port as a guest in Jailhouse. Our current port does not support multi-core real-time scheduling. To utilize multiple cores, a developer can create multiple real-time realms running Nuttx kernels.


Figure 2. Memory map of our cRTOS. [Diagram: host memory is divided into memory reserved for the rich GPOS (Linux), whose layout is the same as the vanilla rich GPOS; memory reserved for the hypervisor; and memory reserved for the sRTOS (Nuttx), which contains the bootloader, the IVSHMEM area, the Nuttx kernel, Virt-io 1, and Virt-io 2.]

3.2.1 Real-time device drivers
We can reuse the existing device drivers for Nuttx in the real-time realm. For example, we reused the upstream 16550 serial device driver without modifications.
Nuttx provides a C library and a real-time version of the Linux kernel API, e.g. work queues and spinlocks. Real-time application developers can easily write their device drivers with this library and Linux-like API.
In our cRTOS, a rich real-time process can directly access the real-time devices that are attached to the real-time realm by Jailhouse. This can provide hard real-time performance.

3.2.2 Virtual Ethernet compatibility
In our cRTOS, we need a user-level inter-realm communication mechanism. For example, we need it to start rich real-time processes in the real-time realm from the normal realm, as described in Section 3.4. We decided to reuse the existing TCP/IP stack of Nuttx for this purpose.
To reuse the existing TCP/IP stack of Nuttx, we implemented a paravirtual Ethernet driver for Nuttx. Its frontend runs in the real-time realm, uses the virtio mechanism of Jailhouse, and communicates with the backend driver in the normal realm. We used the backend driver for Linux that is provided by Jailhouse.

3.3 Using ELF binaries in the real-time realm
Base Nuttx uses binary large objects (BLOBs) as the executable format. This is quite different from Linux and prevents use of the standard toolchains of Linux.
In our cRTOS, we use Linux executables as the executable format in the real-time realm. We use the executable and linkable format (ELF) [55] as the common executable format of the normal and real-time realms, because ELF is the most widely used executable format in Linux.
To use ELF as the executable format in the real-time realm, we added the following to base Nuttx:
• Address space compatibility.
• Application binary interface (ABI) compatibility.

3.3.1 Address space compatibility
In base Nuttx, a real-time application includes both its own code and the kernel code. A real-time application comprises threads that share the same flat address space with the kernel. Base Nuttx does not support memory mapping and multiple address spaces with a memory management unit (MMU).
In our cRTOS, we changed this model to be similar to Linux. A real-time application includes only its own code. The kernel code is loaded at boot time.
Next, we implemented memory mapping and multiple address spaces with the MMU. A single kernel runs multiple real-time processes. Each real-time process has its own address space, similar to that in Linux. We implemented mmap(), munmap(), clone()/fork(), and execve() in the Nuttx x86-64 port.
While we use the MMU, we did not implement virtual memory with paging, for the sake of hard real-time performance. In other words, before a real-time process runs, all of its pages are present in memory and locked. Pages are physically copied when the system calls mmap(), fork(), and execve() are executed. If a user process performs mmap() for a file, the kernel allocates memory pages and immediately reads the contents of the file into the pages. This makes these system calls slower but suitable for real-time execution.

3.3.2 Application binary interface (ABI)
The ABI of base Nuttx depends on the compiler used, and a system call ABI for x86-64 is missing. In our cRTOS, we made the Nuttx kernel use the same ABI as Linux. Our Nuttx x86-64 can execute regular Linux executables that are built with gcc, linked with Glibc, and issue the system calls of Linux. The arguments of function calls are passed in the registers rdi, rsi, rdx, rcx, r8, and r9 of the x86-64 CPU; any further arguments are pushed on the stack [33]. System calls are issued via the syscall instruction, and their arguments are passed in the registers rdi, rsi, rdx, r10, r8, and r9 [33]. Our Nuttx x86-64 configures the floating-point unit (FPU) in the same way as Linux. It also puts ELF auxiliary vectors¹ on the stack when a program starts execution [9].
System calls are executed by Nuttx, Linux, or both. We will describe this in Section 4.
¹ In Linux, both Glibc and the dynamic loader, ld.so, read these values to set up the runtime environment.

3.4 Executing rich real-time processes in the real-time realm
To start the first real-time process, the corresponding shadow process first starts in the normal realm. This shadow process takes the arguments of the execve() system call and creates a rich real-time process in the real-time realm, as in a distributed system. After the first real-time process is created, more real-time processes can be created in this manner or by an existing real-time process following the typical Unix fork() and execve() convention.
Figure 3 shows the communication protocol between them. The shadow process in the normal realm establishes a TCP connection to the server. It marshals the arguments of the execve() system call and sends them via the TCP connection to the server. The first argument is the file name of the ELF executable in the normal realm. The server creates a seed process with a new Nuttx system call, rexec(). This seed process executes pre-defined code in the Nuttx kernel and calls the execve() system call.

Figure 3. Executing a real-time process in the real-time realm from the normal realm. [Diagram: message sequence among the shadow process in the normal realm, the server in the RT realm, and the seed process in the RT realm: (1) TCP connect; (2) invoke remote execve(); (3) start a seed process; (4) send execve() arguments; execve(); (6) read of ELF headers; (7) reply ELF headers; (8) read of ELF sections; (9) reply ELF sections; (10) jump to the program entry point.]

The execve() system call is mainly implemented by user-level code that performs the following steps:
1. Communicates with the shadow process and reads the headers of the ELF executable.
2. Allocates the required memory pages for the .text, .data, and .bss segments from the shared memory area.
3. Communicates with the shadow process and reads the sections of the ELF executable into these memory pages.
4. Allocates memory pages for a new user stack from the shared memory area.
5. Maps the memory pages into the address space of the seed process.
6. Communicates with the shadow process and requests it to map the memory pages in the shared memory area at the same addresses as in the seed process, using the mmap() system call.
7. Stores the arguments and environment variables onto the user stack.
8. Stores the ELF auxiliary vectors onto the user stack.
9. Sets the stack pointer and jumps to the entry point of the executable.
The shadow process shares the same address space as the new rich real-time process. Figure 4 shows an example of the mapping between a rich real-time process and its corresponding shadow process. This is essential for remote system calls, whose details will be described in Section 4.

Figure 4. Memory layout of a rich real-time process and its shadow process. [Diagram: the user stack and the .text, .data, .bss, and heap sections of the RT process and of its shadow process are backed by the same shared memory area; address labels 0x0, 0x400000, 0xF800000, and 0x40000000.]

Executables of Linux are classified into two types: statically linked and dynamically linked executables. Real-time applications often prefer statically linked executables because they have no run-time linking overheads. Typical real-time applications of Nuttx are also statically linked executables. In our cRTOS, the execve() system call can handle statically linked executables.
In contrast, it is convenient for rich real-time application developers to use dynamically linked executables, because most Linux executables nowadays are dynamically linked. A dynamically linked executable requires a dynamic loader to load related dynamic shared objects (DSOs) at startup and at run time. Most dynamically linked executables in Linux use the common dynamic loader, called ld.so [29].
To reuse this dynamic loader, we implemented the mmap() system call as a dual system call. Ld.so itself is a statically linked program. Therefore, it can be loaded as a statically linked executable with execve(). Ld.so loads the dynamically linked libraries with mmap() and jumps to the entry point of the dynamically linked executable. We will describe the implementation of the system call in the next section.

4 Handling system calls
4.1 Real-time system calls
In the real-time realm, the following system calls (and related APIs) are handled by the Nuttx kernel in a real-time way:
• Task-related:
clone(), fork(), sched_setparam(), etc.
• Time-related:
clock_nanosleep(), gettime(), etc.
• IO-related:
open(), close(), read(), write(), ioctl() for real-time I/O devices and pseudo devices.
• IPC-related:
socket() (both Unix domain sockets and TCP/IP), pipe(), select(), and poll(), etc. for communication with other real-time processes.
When we run Linux executables in Nuttx, we have to solve the following problems:
1. The numbering of system calls of Nuttx is different from that of Linux.


2. The flag values of various system calls are different. For example, the flag O_RDONLY of open() is 0x0 in Linux and 0x1 in Nuttx. Conforming to POSIX guarantees only source-level compatibility.
To solve these problems, we implemented a translation table for the system call numbers and multiple stubs for all the system calls with different flag values. We did not modify the system call numbering and flag values of Nuttx.
A rich real-time process can use the POSIX Thread API to create real-time and non-real-time threads. In Linux, these are implemented mainly in Glibc at the user level with a small number of non-standard Linux system calls. We have added the following non-standard Linux system calls to Nuttx.
• clone() (including fork() and vfork())
This is used for creating a kernel-level thread. Because multi-threading in both Linux and Nuttx is based on the 1:1 model [31], we create a single Nuttx thread for each invocation of clone(). In addition, we create a Linux thread in the shadow process for remote system calls.
• futex() (fast user-space locking)
This is used in Glibc to reduce the overhead of the condition variables and mutexes of Glibc.
• arch_prctl() (set architecture-specific thread state)
This system call can set the base of the FS segment register of x86-64 to an argument value. This is used for implementing thread-local storage (TLS) with gcc support on the x86-64 platform [6]. For example, Glibc utilizes the TLS of gcc to implement the per-thread errno. Furthermore, TLS is used in Glibc to implement thread-specific data with pthread_getspecific() [50].
• set_tid_address() (set thread ID address)
This is used to implement pthread_join(), pthread_exit(), and pthread_create(). When a thread calls pthread_join(), it waits on a futex associated with the thread ID address that is set in pthread_create(). The futex is released when the joined thread calls pthread_exit().

4.2 Remote system calls
For non-critical tasks, a rich real-time process can use the following remote system calls of Linux:
• IO-related:
open(), close(), read(), write(), and ioctl() for non-real-time I/O devices and filesystems mounted in the normal realm.
• IPC-related:
Sockets (both Unix domain sockets and TCP/IP) for
These IO and IPC system calls give a real-time process access to rich features. For example, a rich real-time process can open a DSO file in the normal realm and read it into the memory of the real-time realm to use additional libraries. A rich real-time process can also use Unix domain sockets to connect to the X window server in the normal realm.
We implemented remote system calls using shared memory and the virtio queues of Jailhouse as follows.
In the real-time realm:
1. A user thread of a rich real-time process issues a system call, switching to its kernel thread.
2. The kernel thread gets the arguments, makes a request message from these arguments and the priority of the thread, and puts the message into the virtio queue to Linux.
3. The kernel thread notifies Linux with an inter-processor interrupt (IPI).
4. The kernel thread sleeps and yields the CPU to the next runnable thread.
In the normal realm:
5. The Linux kernel handles the IPI. The interrupt handler gets the request message and puts it into the memory of the corresponding shadow process. The interrupt handler then wakes up the corresponding thread in the shadow process.
6. The corresponding thread extracts the request message and issues a system call to Linux.
7. The Linux kernel performs the system call.
8. The corresponding thread makes a reply message with the return value of the system call and goes back to the Linux kernel.
9. The Linux kernel puts the reply message into the virtio queue to Nuttx. The Linux kernel checks the current thread priority in the real-time realm, which is published on a shared page by the Nuttx kernel. The Linux kernel notifies the Nuttx kernel with an IPI if the priority of the returning thread is higher than the current one.
In the real-time realm:
10. The Nuttx kernel gets the reply messages from the queue and wakes up and schedules the sleeping kernel thread that issued the remote system call, in the following circumstances:
• When receiving an IPI from Linux.
• During a context switch, when the Nuttx kernel polls the virtio queue for available messages.
11. The kernel thread gets the return value from the reply message and returns it to the user thread.
In this implementation, we have chosen to use IPIs for better response with real-time performance. We took extra care when sending the IPIs to the real-time realm to prevent un-
communicating with non-real-time processes. wanted priority inversion. We also chose to use the user-level

VEE ’20, March 17, 2020, Lausanne, Switzerland Chung-Fan Yang and Yasushi Shinjo

code in the shadow process for simplicity. Similar implementations are found in the device emulation of Qemu/KVM [28, 44].
We also have to multiplex system call handling for individual open files. For example, if a file is opened in the real-time realm, the system call read() should be handled locally. If a file is opened in the normal realm, read() should be handled as a remote system call.
To solve this problem, the current implementation uses a simple method that preserves real-time performance. We split the file descriptor space into two ranges, the normal range and the real-time range, separated by a limit. If a file descriptor is smaller than the limit, the kernel treats it as a file descriptor of the normal realm; otherwise, the kernel treats it as one of the real-time realm. The kernel treats file descriptors 0 to 2 as belonging to the normal realm, so a rich real-time process can use stdin, stdout, and stderr in non-real-time threads.

4.3 Dual system calls
Because a rich real-time process and its shadow process have to keep the same memory layout and terminate together, some system calls are executed in both the real-time and normal realms. These include mmap(), munmap(), exit(), fork(), and vfork().
Because the parent and child processes of fork() and vfork() have the same memory layout, a single shadow process cannot map the memory of both processes. Therefore, the shadow process also forks itself in the normal realm.

5 Evaluation
In this section, we evaluate our cRTOS. First, we compare the real-time performance of our cRTOS with that of representative existing approaches, PREEMPT_RT patched Linux and Xenomai 3, by measuring timing accuracy and interrupt latency. Next, we show the performance of real-time and remote system calls. Finally, we discuss the compatibility of our cRTOS with Linux.

5.1 Real-time performance

5.1.1 Experimental setup
To evaluate our cRTOS, we used an x86-64 platform with an Intel Xeon 2630 v4 processor with 10 cores, 20 MB of last-level cache (LLC), and 32 GB of RAM. We disabled hyper-threading on the processor and power management in the operating systems to ensure accurate measurements. The normal realm ran vanilla or PREEMPT_RT patched Linux version 4.9.84 on Ubuntu 16.04. The real-time realm ran our port of Nuttx version 7.3. The hypervisor was Jailhouse version 0.9.1.
We ran real-time benchmark programs on the following five systems, all of which are Linux-compatible environments.
• Vanilla Linux on a bare-metal machine
• PREEMPT_RT patched Linux on a bare-metal machine
• Xenomai 3 and I-Pipe patched Linux on a bare-metal machine
• Our cRTOS with vanilla Linux on Jailhouse
• Our cRTOS with PREEMPT_RT Linux on Jailhouse
The first three served as baselines against which we compared the performance of our cRTOS. In our cRTOS, we allocated a single core and one half of the LLC to the sRTOS, and the rest of the cores and LLC to Linux. In the other systems, we allocated an exclusive core, together with one half of the LLC, to the real-time benchmark program. The rest of the cores and LLC were shared by other non-real-time programs.
To load the system, we used stress-ng [26] with various built-in stressors, including the stream [34] stressor. In each experiment, we concurrently ran 10 stressors: in the normal realm for our cRTOS, and in Linux for the other systems.

5.1.2 Timing accuracy
We evaluated the timing accuracy of our cRTOS using cyclictest [52] under various loads. Cyclictest measures the difference between the time at which a timer is set to expire and the time at which the waiting thread actually wakes up. Using this program, we measured the accuracy of the response to timer interrupts. All the systems used TSC timers. We ran the same dynamically linked Linux cyclictest executable, compiled from the source code of the main repository, in all the experiments except Xenomai. Because the cyclictest of the main repository did not work in Xenomai, we used the Xenomai port of cyclictest. Each test ran cyclictest with the SCHED_FIFO scheduling policy and the highest priority, and produced 100,000 samples.
Figure 5 shows the results with and without load. The results of vanilla Linux in Figure 5 (d) and (e) have been cropped for better visualization. The figure uses modified box plots, which depict the first and third quartiles of latency together with the maximum and minimum values. Figure 5 (a) shows the results without load. While threads running in vanilla Linux could achieve a very short response time to timer interrupts, the variation in response time was high. In contrast, threads running in the real-time realm of our cRTOS had not only shorter latency but also higher stability and predictability.
Figures 5 (b) to (e) show the results with load. Compared with Figure 5 (a), the jitter of the real-time process increased. These results show that our cRTOS could deliver hard real-time performance with about 4 µs jitter and a well-bounded maximum latency of 4 µs, while the others could not. In other words, with the same application executable, our cRTOS could provide better real-time performance than the representative Linux-compatible real-time environments.
In addition, we noticed that the performance of our cRTOS was slightly better when using PREEMPT_RT patched Linux

Obtaining hard RT performance and rich Linux features in a cRTOS VEE ’20, March 17, 2020, Lausanne, Switzerland

[Figure 5 plot: panels (a) Without load, (b) Loaded with cpu, (c) Loaded with cache, (d) Loaded with vm, (e) Loaded with stream; y-axis: Latency (µs), 0 to 60; x-axis: environments under test, L = Vanilla Linux, P = PREEMPT_RT, X = Xenomai 3, LN = Proposed (Vanilla Linux + Nuttx), PN = Proposed (PREEMPT_RT + Nuttx).]

Figure 5. The latency jitter measured with cyclictest in Linux-compatible real-time environments under various loads.

as the GPOS of the normal realm. However, the difference was not large. Therefore, developers can also use vanilla Linux as a rich GPOS.

5.1.3 Interrupt latency
To compare interrupt latency, we designed the following experiment with a serial port device², as shown in Figure 6. A signal generator produced a 1-kHz square wave with a 50% duty cycle on the clear-to-send (CTS) pin of the serial port device. The real-time process used the system call ioctl() to wait for a change of CTS and then set the ready-to-send (RTS) pin. We measured the delay and jitter between the change of CTS and the corresponding change of RTS using an oscilloscope with 20 ns resolution. This included the latency and jitter of the hardware, interrupt handling, and scheduling of the system. Each test consisted of 1000 samples and ran with the SCHED_FIFO scheduling policy and the highest priority. In PREEMPT_RT patched Linux, we pinned the device IRQ handler and the IRQ thread to the same CPU core as the process of the test program.

[Figure 6 diagram: signal generator, serial device, system under test, and oscilloscope.]

Figure 6. Measuring interrupt latency.

² Typical x86-64 platforms lack general purpose input/output (GPIO) with interrupts. We utilized the modem status interrupt and the ability to sense and set hardware flow control lines, and mimicked GPIOs with interrupts. We used a PCI-e serial port device with an ASIX MCS9900 PCI-e to serial controller [2], because Jailhouse does not support the legacy interrupts of the COM ports of x86 PCs.

The results are shown in Figure 7 with the same type of modified box plots as in Section 5.1.2. Vanilla Linux produced a single measurement overflow in Figure 7 (c). Overall, our cRTOS yielded the best performance in both latency and jitter among the Linux-compatible environments.

5.1.4 Overhead of interrupt handling
We measured the basic overhead of interrupt handling of our cRTOS using TACLeBench [8]. This benchmark consists of 57 small CPU-bound programs that do not invoke any system calls during execution. While a benchmark program is running, the kernel handles interrupts from timer devices. We ran these benchmark programs in PREEMPT_RT patched Linux, Xenomai 3 and our cRTOS with PREEMPT_RT Linux. Each test consisted of 100 samples and ran with the SCHED_FIFO scheduling policy and the highest priority. We measured the execution time of the main() function of each program using the x86 time stamp counter with the method proposed in [42]. We observed no significant difference between PREEMPT_RT patched Linux, Xenomai 3 and our cRTOS.

5.2 System call latency
In this section, we evaluate the latency of real-time and remote system calls. We measured the latency using the “latency of system call” (lat_syscall) micro-benchmark in lmbench [35]. For this experiment, we used PREEMPT_RT patched Linux as the GPOS of our cRTOS. We pinned the IPI handler and the IRQ thread to the same CPU core as the shadow process. We tested the system calls getpid(), read(), write(), and a combination of open() and close(). The filesystem-related system calls were performed on pseudo devices, namely /dev/zero for read() and open(), and /dev/null for write(). We collected the data of 100 tests for each system call. Each test consisted of 10 warm-up iterations and 100 measurement iterations. We took the maximum of these test results as the latency of each system call. For comparison, we also conducted this experiment on PREEMPT_RT patched Linux and Xenomai 3 without Jailhouse. The virtual dynamic shared object (VDSO) feature of Linux was disabled in all tests so that every system call was invoked with the x86-64 syscall instruction. This forced the execution of the actual system call handlers.
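The measurement procedure above (warm-up iterations, timed iterations, and the maximum over repeated tests) can be sketched in C. This is a minimal illustration, not lmbench's lat_syscall: it assumes Linux with glibc, times the loop with clock_gettime(CLOCK_MONOTONIC) instead of lmbench's own timer, and calls getpid through syscall(SYS_getpid) so that each iteration really enters the kernel, which is the effect the text obtains by disabling the VDSO. The function names max_getpid_latency_us and one_getpid_test_us are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Parameters from the text: 10 warm-up and 100 measured iterations
 * per test, and 100 tests per system call. */
#define WARMUP_ITERS  10
#define MEASURE_ITERS 100
#define NUM_TESTS     100

/* Current monotonic time in microseconds. */
static double now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* One test: warm up, then return the mean latency of one raw getpid
 * system call over MEASURE_ITERS iterations. syscall(SYS_getpid)
 * always traps into the kernel, bypassing any libc-level shortcut. */
static double one_getpid_test_us(void) {
    for (int i = 0; i < WARMUP_ITERS; i++)
        (void)syscall(SYS_getpid);
    double start = now_us();
    for (int i = 0; i < MEASURE_ITERS; i++)
        (void)syscall(SYS_getpid);
    return (now_us() - start) / MEASURE_ITERS;
}

/* Run NUM_TESTS tests and keep the worst (maximum) per-call latency,
 * matching how one value per system call is reported. */
double max_getpid_latency_us(void) {
    double worst = 0.0;
    for (int t = 0; t < NUM_TESTS; t++) {
        double lat = one_getpid_test_us();
        if (lat > worst)
            worst = lat;
    }
    /* A zero reading would mean the clock did not advance at all. */
    assert(worst > 0.0);
    return worst;
}
```

Calling max_getpid_latency_us() from a thread pinned to one core with the SCHED_FIFO policy approximates the setup described above; the same loop shape applies to read() on /dev/zero, write() on /dev/null, and an open()/close() pair.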


[Figure 7 plot: panels (a) Without load, (b) Loaded with cpu, (c) Loaded with cache, (d) Loaded with vm, (e) Loaded with stream; y-axis: Latency (µs), 0 to 60; x-axis: environments under test, L = Vanilla Linux, P = PREEMPT_RT, X = Xenomai 3, LN = Proposed (Vanilla Linux + Nuttx), PN = Proposed (PREEMPT_RT + Nuttx).]

Figure 7. The interrupt latency and jitter of a serial port device in Linux-compatible real-time environments under various loads.
Table 1. The maximum latency of various system calls, measured by lmbench, in microseconds.

Environment              getpid   read    write   open and close
PREEMPT_RT native        0.306    0.406   0.338   2.23
Xenomai 3                0.456    1.14    1.07    4.16
Real-time system call    0.059    0.088   0.083   0.445
Remote system call       —        27.7    27.0    56.3

Table 2. Analysis of system call coverage. A = Available, N/A = Not Available, O = Obsolete in Linux.

                                                 Real-time realm
Classification     Importance    Linux 3.19     A      N/A    O
Indispensable      100%          224            224    0      0
Important          10% - 100%    33             22     11     0
Low important      0% - 10%      44             38     6      0
Not used           0%            18             4      6      8
New in Linux 4.9                 —              8      5      0
Total                            319            295    29     8
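The speedups of the real-time system calls follow directly from Table 1 by dividing each baseline latency by the corresponding real-time system call latency (getpid, read, write, and open and close, in that order):

```latex
\begin{aligned}
\text{PREEMPT\_RT native:}\quad & \tfrac{0.306}{0.059} \approx 5.2,\;\; \tfrac{0.406}{0.088} \approx 4.6,\;\; \tfrac{0.338}{0.083} \approx 4.1,\;\; \tfrac{2.23}{0.445} \approx 5.0 \\
\text{Xenomai 3:}\quad & \tfrac{0.456}{0.059} \approx 7.7,\;\; \tfrac{1.14}{0.088} \approx 13.0,\;\; \tfrac{1.07}{0.083} \approx 12.9,\;\; \tfrac{4.16}{0.445} \approx 9.3
\end{aligned}
```

The real-time system calls are therefore roughly 4 to 5 times faster than native PREEMPT_RT system calls and roughly 8 to 13 times faster than those of Xenomai 3. By the same division, a remote read() (27.7 µs) costs about 315 times as much as a real-time read() (0.088 µs).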
The results are shown in Table 1. For real-time system calls, our cRTOS has lower latency than Linux and Xenomai 3. This is because the real-time realm has a simpler system call handler and does not require switching between user and privileged levels. In general, real-time system calls were about 4 to 5 times faster than those of PREEMPT_RT patched Linux and about 8 to 13 times faster than those of Xenomai 3.
For remote system calls, the latency was large: about 27 µs for a single read() or write(), and 56 µs for an open() and close() pair. This was due to the overhead of IPIs, memory copying, cache contention, and scheduling in both realms. We conclude that to achieve good real-time performance, developers should provide real-time drivers for real-time devices, and should use remote system calls for rich user interfaces, access to non-real-time devices, and initialization, such as dynamic loading.

5.3 Compatibility with Linux
Compatibility with Linux depends on the sRTOS running in the real-time realm. To evaluate the compatibility of our cRTOS using Nuttx as the sRTOS, we did the following:

1. We statically analyzed the coverage of system calls and pseudo files in our cRTOS and compared it with vanilla Linux.
2. We executed X Window applications.

5.3.1 System calls and pseudo files
We have shown the development process of our target real-time applications in Section 2.2. In this process, developers build new real-time applications using local real-time system calls and remote system calls according to their needs. Although our cRTOS does not support all the system calls of Linux, the missing ones cause no problem in this development process. In the following, we evaluate the compatibility of our cRTOS with Linux for running existing executables, including those for PREEMPT_RT patched Linux.
Linux version 4.9 has 332 system calls for the x86-64 architecture. Among them, 8 are obsolete, and the remaining 324 are properly implemented in the Linux kernel. The real-time realm supports 295 of these 324 system calls, and 29 system calls are not supported.
We evaluate the effect of these missing system calls by following the methodology of [56]. Its authors statically analyzed Ubuntu and Debian packages in 2.9 million installations and found that not all APIs are equally important. They define a metric, API importance, for a given API as the probability that an installation includes at least one application requiring that API. Using this metric, they classified 319 system calls of Linux 3.19 into four classes: indispensable, important, low important, and not used. Table 2 shows the results.
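The API importance metric of [56] can be illustrated with a small C sketch. The data encoding is a hypothetical simplification: packages and installations are bitmasks of at most 32 entries, and every installation counts equally. The function names api_importance and classify_api are illustrative, not taken from [56].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy encoding (an assumption of this sketch, limited to 32 entries):
 * bit a of pkg_apis[p] is set if package p uses API a, and
 * bit p of installs[i] is set if installation i contains package p. */
typedef uint32_t api_set;
typedef uint32_t pkg_set;

enum api_class { NOT_USED, LOW_IMPORTANT, IMPORTANT, INDISPENSABLE };

/* API importance: the fraction of installations that include at least
 * one package requiring the API (each installation weighted equally). */
double api_importance(unsigned api, const api_set *pkg_apis, size_t n_pkgs,
                      const pkg_set *installs, size_t n_installs) {
    assert(api < 32 && n_pkgs <= 32 && n_installs > 0);
    /* Collect the packages that require this API as a package bitmask. */
    pkg_set users = 0;
    for (size_t p = 0; p < n_pkgs; p++)
        if (pkg_apis[p] & ((api_set)1 << api))
            users |= (pkg_set)1 << p;
    /* Count installations containing at least one of those packages. */
    size_t hits = 0;
    for (size_t i = 0; i < n_installs; i++)
        if (installs[i] & users)
            hits++;
    return (double)hits / (double)n_installs;
}

/* Class boundaries as in Table 2; treating exactly 10% as "important"
 * is an assumption, since the table's ranges share that endpoint. */
enum api_class classify_api(double importance) {
    if (importance >= 1.0)  return INDISPENSABLE;
    if (importance >= 0.10) return IMPORTANT;
    if (importance > 0.0)   return LOW_IMPORTANT;
    return NOT_USED;
}
```

In [56], the installations are real ones (2.9 million of them) rather than toy bitmasks; the bitmask form only keeps the sketch short while preserving the "at least one application requiring the API" definition.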


We confirmed that the real-time realm supports all the indispensable system calls, as shown in Table 2. The real-time realm does not support 11 important and 6 low important system calls.
We analyzed these 11 important system calls in detail. They fall into the following categories.
1. 5 system calls for Linux kernel extension:
init_module(), finit_module(), setns(), perf_event_open() and prlimit64()
2. 1 system call for debugging: ptrace()
3. 5 Linux-specific system calls:
vhangup(), pivot_root(), unshare(), signalfd4() and process_vm_readv()
If an existing application uses these system calls, it does not run in the real-time realm. As discussed in Section 2.2, our target real-time application developers can run debuggers in the normal realm in the prototype phase, and do not use debuggers in the real-time realm in the production phase. Note that developers can still use these system calls and existing executables in the normal realm in a non-real-time way.
The current implementation of system calls has several limitations. First, our cRTOS currently cannot handle cross-realm signals. While a real-time process can send signals to other processes in the real-time realm, it cannot send signals to processes in the normal realm, and vice versa. Second, the kernel of the real-time realm cannot interpret some arguments of vectored Linux system calls, such as ioctl() and fcntl().
The paper [56] also evaluates pseudo files using the same API importance metric as for system calls. It is very hard for a Linux-compatible system to implement a large number of pseudo files. The paper shows that the API importance distribution of pseudo files under /dev and /proc has a small head and a long tail.
The real-time realm, using Nuttx as the sRTOS, supports the important pseudo files whose importance values are greater than 5%. Rich real-time processes can use /dev/null, /dev/zero, /dev/urandom and /dev/random in a real-time way³.

³ Our Nuttx port uses the x86 rdrand instruction to generate random numbers for /dev/random.

The current implementation has a limitation regarding pseudo files. If an application opens a pseudo file that does not exist in the real-time realm, the process opens the file in the normal realm. This can cause problems. For example, if an application opens /proc/cpuinfo, it correctly gets the CPU model, but it cannot get the correct number of online CPUs. Furthermore, accessing a pseudo file in the normal realm can violate hard real-time constraints. In such a case, real-time application developers should modify their applications or write device drivers for the pseudo files in the real-time realm.
Remote system calls of our cRTOS are similar to the system call wrappers of Xenomai 3. Through this mechanism, Xenomai 3 allows a real-time process to call into the other, non-real-time kernel at the cost of hard real-time performance [17]. Our cRTOS has two advantages over Xenomai 3 in terms of Linux compatibility and usability. First, our cRTOS achieves higher Linux compatibility by providing a larger number of system calls in the real-time realm. Second, our cRTOS supports the full GNU C Library, while Xenomai 3 uses a custom C library with limited functions.

5.3.2 X window applications
It is not trivial for existing hard real-time environments, including Xenomai, to run X window applications [5, 16]. We tested various X window applications of Ubuntu 16.04 in the real-time realm. The following applications worked properly.
• Basic X utilities: xeyes, xclock, glxinfo, and glxgears
• Image rendering: feh, ImageMagick and Ristretto
• PDF viewers: Xpdf, MuPDF, GSview and Zathura
• Editors: GVim and LeafPad
• Terminal emulator: xterm with dash
However, not all X window applications were supported in the real-time realm. For example, Emacs did not work because cross-realm signals are not supported.
We also tested a real-time program, lxardoscope [37]. This is an open-source oscilloscope program that gathers data from a serial device and plots the data as a waveform, as shown in Figure 8. This program could run in the real-time realm of our cRTOS. It received data via a serial device, which was directly attached to the real-time realm with a real-time driver.

Figure 8. Screenshot of lxardoscope, an open-source oscilloscope program, running in the real-time realm.
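The file-descriptor split of Section 4.2 and the pseudo-file fallback of Section 5.3.1 together decide where a file operation is served. The following C sketch illustrates that routing decision under stated assumptions: FD_LIMIT is a hypothetical value, since the text does not give the actual limit; the list of pseudo files is the one named in Section 5.3.1; and only pseudo-file paths are routed, because the text does not describe how other paths are assigned to a realm. The names realm_of_fd and realm_for_open are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical boundary between the two file-descriptor ranges; the
 * actual limit used by the cRTOS is not stated in the text. */
#define FD_LIMIT 1024

enum realm { REALM_NORMAL, REALM_REALTIME };

/* Pseudo files the real-time realm serves in a real-time way. */
static const char *const rt_pseudo_files[] = {
    "/dev/null", "/dev/zero", "/dev/urandom", "/dev/random",
};

/* Descriptors below FD_LIMIT (including 0-2 for stdin, stdout, and
 * stderr) belong to the normal realm and become remote system calls;
 * the rest are handled locally in the real-time realm. */
enum realm realm_of_fd(int fd) {
    assert(fd >= 0);
    return fd < FD_LIMIT ? REALM_NORMAL : REALM_REALTIME;
}

/* open() routing for pseudo files: use the real-time realm when it
 * provides the file, otherwise fall back to the normal realm (the
 * behavior that makes /proc/cpuinfo come from Linux). */
enum realm realm_for_open(const char *path) {
    size_t n = sizeof rt_pseudo_files / sizeof rt_pseudo_files[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(path, rt_pseudo_files[i]) == 0)
            return REALM_REALTIME;
    return REALM_NORMAL;
}
```

A fixed numeric boundary reduces the per-call routing check to a single comparison, which fits the text's emphasis on a simple method for real-time performance; the trade-off is that each realm owns a separate, fixed part of the descriptor space.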


6 Related Work

We have described two current representative approaches to real-time Linux in Section 1 and Section 2.4, and we have compared the performance and Linux compatibility of our cRTOS with those of these approaches in Section 5. In this section, we discuss other related work.
RTLinux [60], Resource kernel for Linux [40] and Time Sensitive Linux (TSL) [19] are early attempts, prior to PREEMPT_RT, to turn Linux into an RTOS. These methods required modifications to the vanilla Linux kernel. In addition, their programming environments are not as rich as that of a normal GPOS.
LITMUS^RT adds various types of real-time schedulers to Linux [4]. Its goal is to provide a scheduler testing platform for real-time systems in academic research. It is not actively supported, and using it for real-world problems is difficult.
Similar techniques have been applied to hypervisors to obtain real-time performance. These include real-time KVM and RT-Xen [27, 57, 58, 61]. They also require patching the vanilla Linux kernel. Real-time KVM does not provide hard real-time performance. While RT-Xen provides guaranteed hard real-time performance, a complicated compositional schedulability analysis (CSA) [20] is required to configure the system, and RT-Xen does not support fixed-priority scheduling. In this study, we proposed a cRTOS that overcomes these shortcomings.
Outsourcing and similar techniques delegate network and other I/O operations from a guest kernel to a host OS [7, 10, 36, 38]. Most of them focus on improving I/O throughput by omitting message copying and shortening message processing paths, and they require modifications to Linux. The work [11] provides only soft real-time performance.
McKernel is a co-kernel approach [15] that delivers a simple POSIX programming environment for high performance computing (HPC) applications. A shadow process in our cRTOS is similar to a proxy process in McKernel, and we applied this idea to real-time applications. We extended the idea beyond API compatibility to ABI compatibility, which allows reusing the rich toolchains of Linux. In addition, we allow direct hardware access without relying on Linux system calls, which is important in real-time applications.
The work [41] demonstrated that it is possible to build a unikernel with an ABI identical to that of Linux. Running on a Type-II hypervisor, this unikernel can yield the same level of performance as Linux for a single application. In this study, we extended an existing RTOS to support the Linux ABI. Our cRTOS yields better real-time performance than Linux and supports running multiple processes. We use a Type-I partitioning hypervisor instead of a Type-II hypervisor to ensure real-time performance.
The implementation of remote system calls in our cRTOS is similar to that of FlexSC [49]. In FlexSC, user threads put requests into memory shared with the kernel, and kernel threads execute the requests asynchronously without exceptions. This architecture improves throughput, especially on multi-core processors. Our implementation uses shared memory and message queues between the normal realm and the real-time realm. We use inter-processor interrupts (IPIs) as the inter-realm notification mechanism to preserve the real-time property while avoiding priority inversion.

7 Conclusion
In this study, we achieved hard real-time performance and rich Linux features together in a compounded real-time operating system (cRTOS). This system creates two realms with a partitioning hypervisor: a normal realm of Linux and a real-time realm of a swift RTOS. A rich real-time process running in the real-time realm can use not only the hard real-time performance of the swift RTOS but also the rich features of Linux through remote system calls. Unlike existing approaches to real-time Linux, including the PREEMPT_RT patch and interrupt-dispatching layers, our cRTOS requires no modifications to Linux and provides hard real-time performance.
We have implemented our cRTOS by running Nuttx, a POSIX-compliant RTOS, as the swift RTOS and Jailhouse as the partitioning hypervisor. We ported base Nuttx to the x86-64 architecture and added support for multiple virtual address spaces with the MMU. This allows developers of rich real-time applications to use the same toolchains and executables as Linux, which reduces the cost and complexity of developing real-time applications while strongly guaranteeing real-time performance.
We performed experiments and measured the timing accuracy and interrupt latency of our cRTOS and of existing systems, PREEMPT_RT patched Linux and Xenomai 3. The experimental results show that our cRTOS could deliver hard real-time performance with about 4 µs jitter and a well-bounded maximum latency, while the others could not. The results also show that our cRTOS with a real-time device yielded the best interrupt response in both latency and jitter. Finally, we demonstrated that our cRTOS could execute complex Linux executables with graphical user interfaces through the X window system.
In the future, we would like to add debugging support to the real-time realm. We are also interested in running other RTOSs, such as FreeRTOS [1] and Zephyr [54], in the real-time realm.

Acknowledgment
This work was partially supported by JSPS KAKENHI Grant Numbers 25540022 and 16K12410.

References
[1] Amazon Web Services. 2019. The FreeRTOS™ Kernel. https://www.freertos.org/


[2] ASIX Electronics Corporation. 2015. MCS9900 PCIe to Multi I/O (4S, 2S+1P) Controller Datasheet. ASIX Electronics Corporation.
[3] Antonio Barbalace, Adriano Luchetta, Gabriele Manduchi, Michele Moro, Anton Soppelsa, and Cesare Taliercio. 2008. Performance Comparison of VxWorks, Linux, RTAI, and Xenomai in a Hard Real-Time Application. IEEE Transactions on Nuclear Science 55, 1 (Feb. 2008), 435–439. https://doi.org/10.1109/TNS.2007.905231
[4] John M. Calandrino, Hennadiy Leontyev, Aaron Block, UmaMaheswari C. Devi, and James H. Anderson. 2006. LITMUS^RT: A Testbed for Empirically Comparing Real-Time Multiprocessor Schedulers. In Proceedings of 2006 27th IEEE International Real-Time Systems Symposium (RTSS’06). 111–126. https://doi.org/10.1109/RTSS.2006.27
[5] Chok Leong Chai. 2010. [RTAI] "XIO: fatal IO error 1..." when using Liunx server. http://mail.rtai.org/pipermail/rtai/2010-June/023338.html
[6] Ulrich Drepper. 2002. ELF Handling for Thread-Local Storage. Technical Report.
[7] Hideki Eiraku, Yasushi Shinjo, Calton Pu, Younggyun Koh, and Kazuhiko Kato. 2009. Fast Networking with Socket-outsourcing in Hosted Virtual Machine Environments. In Proceedings of the 2009 ACM Symposium on Applied Computing (SAC ’09). 310–317. https://doi.org/10.1145/1529282.1529350
[8] Heiko Falk, Sebastian Altmeyer, Peter Hellinckx, Björn Lisper, Wolfgang Puffitsch, Christine Rochange, Martin Schoeberl, Rasmus Bo Sørensen, Peter Wägemann, and Simon Wegener. 2016. TACLeBench: A benchmark collection to support worst-case execution time research. In Proceedings of the 16th International Workshop on Worst-Case Execution Time Analysis (WCET ’16). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2:1–2:10.
[9] The Linux Foundation. 2015. Linux Standard Base Core Specification for X86-64. http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-AMD64/LSB-Core-AMD64.pdf
[10] Sahan Gamage, Ramana Rao Kompella, Dongyan Xu, and Ardalan Kangarlou. 2013. Protocol Responsibility Offloading to Improve TCP Throughput in Virtualized Environments. ACM Transactions on Computer Systems 31, 3, Article 7 (Aug. 2013), 34 pages. https://doi.org/10.1145/2491463
[11] Oscar F. Garcia, Yasushi Shinjo, and Calton Pu. 2018. Achieving Consistent Real-Time Latency at Scale in a Commodity Virtual Machine Environment Through Socket Outsourcing-Based Network Stacks. IEEE Access 6 (2018), 69961–69977.
[12] Marisol García-Valls, Tommaso Cucinotta, and Chenyang Lu. 2014. Challenges in real-time virtualization and predictable cloud computing. Journal of Systems Architecture 60, 9 (2014), 726–740. https://doi.org/10.1016/j.sysarc.2014.07.004
[13] GCC team. 2019. GCC, the GNU Compiler Collection. https://gcc.gnu.org/
[14] GCC team. 2019. The GNU C Library (glibc). https://www.gnu.org/software/libc/
[15] Balazs Gerofi, Masamichi Takagi, Yutaka Ishikawa, Rolf Riesen, Evan Powers, and Robert W Wisniewski. 2015. Exploring the design space of combining Linux with lightweight kernels for extreme scale computing. In Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers. 5–12.
[16] Philippe Gerum. 2004. Xenomai–Implementing a RTOS emulation framework on GNU/Linux. Technical Report.
[17] Philippe Gerum. 2014. [Xenomai] POSIX application running under xenomai – what do wrapped functions do? https://xenomai.org/pipermail/xenomai/2014-June/031120.html
[18] Sourav Ghosh and Ragunathan Raj Rajkumar. 2002. Resource management of the OS network subsystem. In Proceedings 5th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing. ISORC 2002. 271–279. https://doi.org/10.1109/ISORC.2002.1003728
[19] Ashvin Goel, Luca Abeni, Charles Krasic, Jim Snow, and Jonathan Walpole. 2002. Supporting Time-sensitive Applications on a Commodity OS. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI ’02). 165–180. https://doi.org/10.1145/844128.844144
[20] Sriram Govindan, Arjun R. Nath, Amitayu Das, Bhuvan Urgaonkar, and Anand Sivasubramaniam. 2007. Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms. In Proceedings of the 3rd International Conference on Virtual Execution Environments (VEE ’07). 126–136. https://doi.org/10.1145/1254810.1254828
[21] Hermann Härtig, Robert Baumgartl, Martin Borriss, Claude-Joachim Hamann, Micheal Hohmuth, Frank Mehnert, Lars Reuther, Sebastian Schönberg, and Jean Wolter. 1998. DROPS: OS Support for Distributed Multimedia Applications. In Proceedings of the 8th ACM SIGOPS European Workshop on Support for Composing Distributed Applications. 203–209. https://doi.org/10.1145/319195.319226
[22] Hermann Härtig, Michael Hohmuth, Jochen Liedtke, Sebastian Schönberg, and Jean Wolter. 1997. The Performance of µ-kernel-based Systems. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles (SOSP ’97). 66–77. https://doi.org/10.1145/268998.266660
[23] Gernot Heiser and Kevin Elphinstone. 2016. L4 Microkernels: The Lessons from 20 Years of Research and Deployment. ACM Transactions on Computer Systems 34, 1, Article 1 (April 2016), 29 pages. https://doi.org/10.1145/2893177
[24] Gernot Heiser and Ben Leslie. 2010. The OKL4 Microvisor: Convergence Point of Microkernels and Hypervisors. In Proceedings of the First ACM Asia-pacific Workshop on Workshop on Systems (APSys ’10). 19–24. https://doi.org/10.1145/1851276.1851282
[25] Intel. 2018. Intel® 64 and IA-32 Architectures Software Developer’s Manual.
[26] Colin King. 2018. Stress-ng: a tool to load and stress a computer system. http://kernel.ubuntu.com/~cking/stress-ng
[27] Jan Kiszka and Rik van Riel. 2016. Two approaches to real-time virtualization - Jailhouse and KVM. In Proceedings of the Linux Foundation Real-Time Summit. https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
[28] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. 2007. KVM: the Linux Virtual Machine Monitor. In Proceedings of the 2007 Ottawa Linux Symposium (OLS ’07).
[29] Linux man-pages. 2019. ld.so, ld-linux.so - dynamic linker/loader. http://man7.org/linux/man-pages/man8/ld.so.8.html
[30] C. L. Liu and James W. Layland. 1973. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. J. ACM 20, 1 (Jan. 1973), 46–61. https://doi.org/10.1145/321738.321743
[31] Robert Love. 2010. Linux Kernel Development (3rd ed.). Addison-Wesley Professional.
[32] Paolo Mantegazza, EL Dozio, and Steve Papacharalambous. 2000. RTAI: Real time application interface. Linux Journal 2000, 72es (2000), 10.
[33] Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell. 2014. System V Application Binary Interface AMD64 Architecture Processor Supplement, Draft Version 0.99.7. Technical Report. https://www.uclibc.org/docs/psABI-x86_64.pdf
[34] John D McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. IEEE computer society technical committee on computer architecture (TCCA) newsletter (1995), 19–25.
[35] Larry McVoy and Carl Staelin. 1996. Lmbench: Portable Tools for Performance Analysis. In Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (ATEC ’96). 23–23. http://dl.acm.org/citation.cfm?id=1268299.1268322
[36] Jun Nakajima, Qian Lin, Sheng Yang, Min Zhu, Shang Gao, Mingyuan Xia, Peijie Yu, Yaozu Dong, Zhengwei Qi, Kai Chen, and Haibing Guan. 2011. Optimizing Virtual Machines Using Hybrid Virtualization. In

VEE ’20, March 17, 2020, Lausanne, Switzerland Chung-Fan Yang and Yasushi Shinjo

Proceedings of the 2011 ACM Symposium on Applied Computing (SAC ’11). 573–578. https://doi.org/10.1145/1982185.1982308
[37] Nick. 2013. lxardoscope. https://sourceforge.net/projects/lxardoscope/
[38] Audun Nordal, Åge Kvalnes, and Dag Johansen. 2012. Paravirtualizing TCP. In Proceedings of the 6th International Workshop on Virtualization Technologies in Distributed Computing Date (VTDC ’12). 3–10. https://doi.org/10.1145/2287056.2287060
[39] NuttX. 2019. NuttX Real-Time Operating System. http://www.nuttx.org
[40] Shuichi Oikawa and Raj Rajkumar. 1999. Portable RK: a portable resource kernel for guaranteed and enforced timing behavior. In Proceedings of the 5th IEEE Real-Time Technology and Applications Symposium. 111–120. https://doi.org/10.1109/RTTAS.1999.777666
[41] Pierre Olivier, Daniel Chiba, Stefan Lankes, Changwoo Min, and Binoy Ravindran. 2019. A Binary-compatible Unikernel. In Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE ’19). 59–73. https://doi.org/10.1145/3313808.3313817
[42] Gabriele Paoloni. 2010. How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures. Intel Corporation.
[43] QEMU. 2016. Device Specification for Inter-VM shared memory device. Technical Report.
[44] Qumranet Inc. 2006. KVM: Kernel-based Virtualization Driver. Qumranet.
[45] Ralf Ramsauer, Jan Kiszka, Daniel Lohmann, and Wolfgang Mauerer. 2017. Look Mum, no VM Exits! (Almost). In Proceedings of Workshop on Operating Systems Platforms for Embedded Real-Time Applications. 13–18.
[46] Federico Reghenzani, Giuseppe Massari, and William Fornaciari. 2019. The Real-Time Linux Kernel: A Survey on PREEMPT_RT. Comput. Surveys 52, 1, Article 18 (Feb. 2019), 36 pages. https://doi.org/10.1145/3297714
[47] Yi Ren, Ling Liu, Qi Zhang, Qingbo Wu, Jianbo Guan, Jinzhu Kong, Huadong Dai, and Lisong Shao. 2016. Shared-Memory Optimizations for Inter-Virtual-Machine Communication. Comput. Surveys 48, 4, Article 49 (Feb. 2016), 42 pages. https://doi.org/10.1145/2847562
[48] Rusty Russell. 2008. Virtio: Towards a De-facto Standard for Virtual I/O Devices. ACM SIGOPS Operating Systems Review 42, 5 (July 2008), 95–103. https://doi.org/10.1145/1400097.1400108
[49] Livio Soares and Michael Stumm. 2010. FlexSC: Flexible System Call Scheduling with Exception-less System Calls. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI ’10). 33–46.
[50] Sun Microsystems, Inc. 2004. Linker and Libraries Guide.
[51] The IEEE and The Open Group. 2017. IEEE 1003.1-2017 - IEEE Standard for Information Technology–Portable Operating System Interface (POSIX(R)) Base Specifications, Issue 7. IEEE.
[52] The Linux Foundation. 2018. Cyclictest. https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start
[53] The Linux Foundation. 2019. realtime:start [Linux Foundation Wiki]. https://wiki.linuxfoundation.org/realtime/start
[54] The Linux Foundation. 2019. Zephyr Project. https://www.zephyrproject.org/
[55] Tool Interface Standard (TIS). 1995. Executable and Linking Format (ELF) Specification Version 1.2. Technical Report.
[56] Chia-Che Tsai, Bhushan Jain, Nafees Ahmed Abdul, and Donald E. Porter. 2016. A Study of Modern Linux API Usage and Compatibility: What to Support when You’re Supporting. In Proceedings of the 11th European Conference on Computer Systems (EuroSys ’16). 16:1–16:16. https://doi.org/10.1145/2901318.2901341
[57] Sisu Xi, Justin Wilson, Chenyang Lu, and Christopher Gill. 2011. RT-Xen: Towards Real-time Hypervisor Scheduling in Xen. In Proceedings of the 9th ACM International Conference on Embedded Software (EMSOFT ’11). 39–48. https://doi.org/10.1145/2038642.2038651
[58] Sisu Xi, Meng Xu, Chenyang Lu, Linh Thi Xuan Phan, Christopher Gill, Oleg Sokolsky, and Insup Lee. 2014. Real-time multi-core virtual machine scheduling in Xen. In 2014 International Conference on Embedded Software (EMSOFT). 1–10. https://doi.org/10.1145/2656045.2656061
[59] Karim Yaghmour. 2001. Adaptive domain environment for operating systems. Opersys Inc.
[60] Victor Yodaiken. 1999. The RTLinux manifesto. In Proceedings of The 5th Linux Expo.
[61] Pei Zhang. 2017. See what happened with real time KVM when building real time cloud. In Proceedings of the Linux Foundation LinuxCon China. https://www.slideshare.net/LCChina

