QNX Kernel Bench Method
Table of Contents
Introduction
Benchmark Methodology
Kernel Entry
Context Switching
Message Passing
Pulses
Synchronization
Mutexes
Semaphores
Timers
Signals
Threads
Message Queues
Summary
QNX Neutrino RTOS: Kernel Benchmark Methodology QNX Software Systems
Introduction
This document describes the outline and methodology of the QNX® Neutrino® kernel benchmarks suite,
the results of which are available on a per-platform basis in the “QNX Neutrino Realtime OS: Kernel
Benchmark Results” documents. It is recommended that this methodology overview be read in
conjunction with those results to provide context as to what each individual test is measuring, the rationale
behind each group of tests, and their relevance to real-world application performance. The following
sections include an overview of the principles and operations involved in each test group, and details on
any specific considerations and implementations.
Benchmark Methodology
The methodology for this benchmark series contains some assumptions regarding overhead. The average
cost of a single instance of an operation is calculated from the total elapsed time required to perform a
large number of iterations of each such operation. The number of iterations required to obtain a
representative value is scaled to the clock speed of the host processor, typically on the order of millions.
Averaging the benchmark in this fashion removes the need for a fine-granularity time-stamp and reduces
variance. (For example, a single iteration may be skewed by an external or clock interrupt.)
The overhead of the empty loop itself is either insignificant or can be measured and eliminated. The
average cost is considered a more appropriate measure than the worst-case time, as that metric can be
adversely affected by such random factors as unrelated system activity and interrupt latencies, which are
dependent on particular hardware peripherals and installed drivers. Elements that may legitimately affect
the average performance of an operation, such as alternate kernel code paths or thread rescheduling and
context switching, are accounted for with variants of each benchmark test designed to explicitly and
consistently evoke those situations.
Kernel Entry
The QNX Neutrino realtime operating system architecture comprises a microkernel, the process manager,
and extended services provided by user-level managers. The microkernel provides such core facilities as
message passing, thread scheduling, timers, synchronization objects, and signals, whereas the process
manager builds on the kernel facilities to provide additional process-level semantics, memory
management, and pathname management. Optional extended services available include the file system,
TCP/IP network protocols, and message queues.
The kernel is entered through a trap or software interrupt from a wrapper routine in the C library
corresponding to each kernel primitive. Parameters and results are passed directly in registers or on the
stack, and the kernel executes largely in the context of the calling thread, making it a very efficient
interface. The process manager executes as a set of schedulable threads in its own process context, and
parameters and results are exchanged using message passing. Extended facilities provided by external
servers, which include named semaphores and message queues, require both message passing and two full
context switches, which is why they incur the most overhead.
As a result, the relative performance of the various primitive operations profiled below is dependent on
which layer implements that facility. This factor should also be considered when interpreting some
external UNIX-derived benchmarks, which, for example, may use the getppid() call to illustrate the
“system call overhead.” Under QNX Neutrino this is an overly pessimistic measurement, as this particular
routine is implemented by the process manager rather than by the microkernel. Since this is a low-
bandwidth call by applications, it is not necessary to optimize this by implementing it within the kernel;
rather the bias is towards keeping the microkernel small.
Kernel call
This test measures the overhead from a kernel call by timing the simplest possible facility provided by the
microkernel; this is the ClockId() call, which returns a per-thread clock identifier for use in subsequent
ClockTime() calls.
Benchmark Loop
ClockId(0, 0);
Benchmark Loop
getppid();
Context Switching
Context switch operations are frequent in a realtime microkernel operating system. In such an
environment, hardware interrupts, signals, message passing, and the manipulation of synchronization
objects can trigger thread rescheduling, and non-core functionality is performed outside of the kernel by
the process manager or external server processes. An efficient context switching implementation is
essential to overall system performance.
Yield (self)
This test measures the overhead of a high-priority thread yielding the processor using the POSIX
sched_yield() call. If threads of the same priority are ready to run, the calling thread is placed at the
end of the ready queue for that priority, and a context switch occurs. In this case, as there are no other
eligible threads, the running thread continues without a context switch.
Initialization
param.sched_priority = sched_get_priority_max(SCHED_FIFO);
sched_setscheduler(0, SCHED_FIFO, &param);
Benchmark Loop
sched_yield();
Yield (two threads)
This test measures the cost of two threads of the same priority that alternately yield control
of the processor and perform context switches between them. In a multiprocessor (SMP) environment, the
“CPU affinity” of each thread is set to bind them to the same processor, ensuring that this test measures
only the cost of context switching, without any inter-processor synchronization overhead. A full process
context switch is typically more expensive than a thread context switch, as the kernel virtual memory
subsystem has more overhead in switching the address space. (For example, a TLB flush is required on
MMU architectures with untagged TLBs, such as the x86.)
Benchmark Loop (each thread)
sched_yield();
Message Passing
Message-passing facilities are core primitives provided by the microkernel. Every QNX Neutrino
application, including file systems, TCP/IP networking, and device drivers, is implemented as a team of
cooperating threads and processes using the send/receive/reply messaging interface. Inter-process
communication (IPC) occurs at specified transitions within the system, rather than asynchronously. The
synchronous message-passing model facilitates robust client-server design, whereby application systems
can be designed as a number of modular server processes handling client requests.
Messages
Message passing is inherently a blocking operation that synchronizes the execution of the sending thread;
the act of sending the data also causes the sender to be blocked and the receiver to be scheduled for
execution. This happens without requiring any explicit work by the microkernel to determine which thread
or process to run next. Execution and data move directly from one context to another. Using message
passing, a client sends a request to a server and becomes blocked. The server receives the messages in
priority order from clients, processes them, and replies when it can satisfy a request. At this point the
client is unblocked and continues.
Variant    Send bytes    Reply bytes
No data    0             0
Initialization
chid = ChannelCreate(0);
coid = ConnectAttach(0, getppid(), chid, _NTO_SIDE_CHANNEL, 0);
Pulses
A pulse is a non-blocking, unidirectional message with a small data payload (four bytes). The pulse may
be queued in the kernel if there is no blocked reader and it cannot be delivered immediately. Pulses are
commonly used as a notification mechanism within interrupt handlers or for per-process timer expiry. For
example, a pulse may be used as the associated event for InterruptAttachEvent() or
timer_create(). Pulses may also be used by a server to notify clients that an event of interest has
occurred, such as in select() processing.
Initialization
chid = ChannelCreate(0);
coid = ConnectAttach(0, getpid(), chid, _NTO_SIDE_CHANNEL, 0);
Benchmark Loop
Initialization
chid = ChannelCreate(0);
coid = ConnectAttach(0, getpid(), chid, _NTO_SIDE_CHANNEL, 0);
Initialization
chid = ChannelCreate(0);
coid = ConnectAttach(0, getppid(), chid, _NTO_SIDE_CHANNEL, 0);
Synchronization
QNX Neutrino supports a full complement of POSIX 1003.1 thread synchronization primitives, including
mutexes, semaphores, and condition variables. These facilities allow multiple threads, of the same process
or of different processes, to protect critical sections of code or to coordinate access to a shared resource or
memory area.
Mutexes
A mutex is an appropriate simple synchronization object for use in permitting only a single thread at a
time into a critical section of code. For example, when updating a shared doubly linked list, if multiple
threads were allowed to enqueue/dequeue elements in an uncontrolled manner, the link pointers could
become inconsistent and corrupt the list. By controlling access to the list with a mutex, only a single
thread is able to modify it at any one time, and any other thread attempting to do so will block until
the mutex is unlocked. A mutex is a more elegant, fine-grained, and multiprocessor-safe mechanism for
performing such synchronization than disabling interrupts or thread switching. An efficient mutex
implementation is thus important to the performance of all multithreaded applications. In fact, many
internal services of QNX Neutrino, such as the file system, network protocol stacks, and device drivers, as
well as the implementation of thread safety where required within the C library, rely heavily on mutexes
to provide such synchronization.
The mutex is uncontested in this test because no other thread is attempting to acquire a lock. Therefore, on most
platforms, it will be unnecessary to enter the kernel, as the mutex can be safely manipulated using native
processor support (with instructions designed for atomic memory modification, such as the x86 cmpxchg
or the lwarx/stwcx sequence of the PowerPC). On processors without this functionality, the mutex
operations must either enter the kernel or emulate a suitable synchronization primitive.
Initialization
pthread_mutex_init(&mutex, NULL);
Benchmark Loop
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
Trylock (unavailable)
This test measures the time taken to attempt to lock an already locked mutex. The mutex is created with
default attributes. As this is an unsuccessful non-blocking probe of the mutex state, it is again unnecessary
to enter the kernel or to reschedule any threads.
Initialization
pthread_mutex_init(&mutex, NULL);
pthread_mutex_lock(&mutex);
Benchmark Loop
pthread_mutex_trylock(&mutex);
Initialization
pthread_mutexattr_init(&attr);
pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
pthread_mutexattr_setprioceiling(&attr, getprio(0));
pthread_mutex_init(&mutex, &attr);
Benchmark Loop
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
Initialization
pthread_mutex_init(&mutex1, NULL);
pthread_mutex_init(&mutex2, NULL);
pthread_mutex_init(&mutex3, NULL);
pthread_mutex_init(&mutex4, NULL);
Benchmark Loop (thread 1)
pthread_mutex_lock(&mutex4);
pthread_mutex_lock(&mutex1);
pthread_mutex_lock(&mutex3);
pthread_mutex_unlock(&mutex1);
pthread_mutex_unlock(&mutex4);
pthread_mutex_lock(&mutex2);
pthread_mutex_lock(&mutex4);
pthread_mutex_unlock(&mutex2);
pthread_mutex_unlock(&mutex3);
Benchmark Loop (thread 2)
pthread_mutex_lock(&mutex1);
pthread_mutex_lock(&mutex3);
pthread_mutex_lock(&mutex2);
pthread_mutex_unlock(&mutex3);
pthread_mutex_unlock(&mutex1);
pthread_mutex_lock(&mutex4);
pthread_mutex_lock(&mutex1);
pthread_mutex_unlock(&mutex4);
pthread_mutex_unlock(&mutex2);
Semaphores
A semaphore is a flexible synchronization object that has a non-negative integer count and a set of
blocked threads associated with it. A semaphore can be used to control access to a pool of resources or to
indicate the occurrence of events as in a producer/consumer paradigm. Semaphores are explicitly defined
to work between processes and with signals, which makes them a common method of inter-process or
signal-handler synchronization in portable code. It is typical to use a mutex to synchronize between
threads in the same process, and to use a semaphore to synchronize between different processes. Unlike
the QNX Neutrino implementation of a mutex, which optimizes the uncontested cases, a semaphore
operation always results in kernel entry.
Initialization
sem_init(&sem, 0, 0);
Benchmark Loop
sem_trywait(&sem);
Initialization
sem_init(&sem, 0, 0);
Benchmark Loop
sem_post(&sem);
sem_wait(&sem);
Initialization
sem_init(&sem, 0, 0);
Benchmark Loop (thread 1)
sem_wait(&sem);
Benchmark Loop (thread 2)
sem_post(&sem);
Initialization
Benchmark Loop (thread 1)
sem_wait(sem);
Benchmark Loop (thread 2)
sem_post(sem);
Timers
QNX Neutrino implements POSIX 1003.1 clock and timer facilities, including per-process notification of
timer expiry via realtime signals (and other event types, including native pulses). Timers are primitives
provided directly by the microkernel, and hence may be manipulated very efficiently.
Initialization
SIGEV_SIGNAL_INIT(&event, SIGALRM);
Benchmark Loop
Initialization
Benchmark Loop
Signals
Signals are used by the operating system to report synchronous exceptions (such as SIGSEGV, SIGBUS, or
SIGFPE), and by application processes as asynchronous event notifications (via a sigevent structure or
an explicit kill()). QNX Neutrino implements the Realtime Extensions to the POSIX 1003.1 signal
facilities. These extensions include the SA_SIGINFO attribute, the association of an application-defined
value with each signal, the ability to queue pending signals, and the prioritized receipt of signals within the
defined realtime range.
Initialization
sa.sa_handler = emptyhandler;
sigfillset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGUSR1, &sa, NULL);
Benchmark Loop
kill(pid, SIGUSR1);
Initialization
sa.sa_handler = emptyhandler;
sigfillset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGUSR1, &sa, NULL);
sigfillset(&iset); sigdelset(&iset, SIGUSR1);
Threads
A thread is a single flow of control within a process, where a process is an address space in which one or
more threads execute. Threads provide a framework for certain classes of application, in particular
client/server models, to leverage concurrency and exploit any underlying SMP parallelism. The QNX
Neutrino microkernel directly supports POSIX threads as schedulable entities, although the process
manager is used to provide additional process-level semantics.
Initialization
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
Benchmark Loop
Initialization
pthread_attr_setstacklazy(&attr, PTHREAD_STACK_NOTLAZY);
Initialization
pthread_attr_setstackaddr(&attr, userstack);
Message Queues
POSIX defines a set of message-passing facilities known as message queues, which allow for the transfer
of arbitrary data between cooperating processes. Under QNX Neutrino, message queues are not a core
kernel primitive, but are implemented through an optional user-level manager called mqueue. Although
aspects of the interface appear to be asynchronous, it is built on top of QNX Neutrino native synchronous
message passing, using the mqueue server to broker the transaction and perform store-and-forward
buffering. Message queues thus offer lower absolute performance than direct message passing, but are
useful in specific situations. For example, they provide flexibility with their buffering and asynchronous
notification facilities, and are a portable IPC mechanism for application code that must run under multiple
operating systems.
Receive (unavailable)
This test measures the time taken to perform a non-blocking read attempt from an empty message queue.
Although no user data is transferred, this operation involves context switches and message passing with
the mqueue server to determine the queue state.
Initialization
Benchmark Loop
Initialization
Benchmark Loop
In this test, the sending and receiving processes alternately block
and unblock each other as the messages are transferred through the message queue. This ensures the
empty message queue will have a blocked reader, enabling an optimization of relaying the messages rather
than buffering them within mqueue. This test requires two sets of context switches and message passing
with the mqueue server.
Initialization
Initialization
Benchmark Loop
Summary
The methodology described in this document outlines the rationale behind the kernel benchmark tests
prepared by QNX Software Systems. Actual benchmark results for the tests described in the above
sections are available on a per-platform basis in the “QNX Neutrino Realtime OS: Kernel Benchmark
Results” documents. Please contact your local QNX sales representative for more information.
About QNX Software Systems
QNX Software Systems is the leading global provider of innovative embedded technologies, including
middleware, development tools, and operating systems. The component-based architectures of the QNX®
Neutrino® RTOS, QNX Momentics® Tool Suite, and QNX Aviage® middleware family together provide the
industry’s most reliable and scalable framework for building high-performance embedded systems. Global
leaders such as Cisco, Daimler, General Electric, Lockheed Martin, and Siemens depend on QNX technology
for vehicle telematics and infotainment systems, industrial robotics, network routers, medical instruments,
security and defense systems, and other mission- or life-critical applications. The company is headquartered in
Ottawa, Canada, and distributes products in over 100 countries worldwide.
www.qnx.com
© 2003 QNX Software Systems GmbH & Co. KG. All rights reserved. QNX, Momentics, Neutrino, Aviage, Photon and Photon microGUI are trademarks of QNX
Software Systems GmbH & Co. KG, which are registered trademarks and/or used in certain jurisdictions, and are used under license by QNX Software Systems
Co. All other trademarks belong to their respective owners.