Modular System Programming in Minix-3
Modular System Programming in Minix-3
; LO G I N : A P R I L 2 0 0 6 M O D U L A R SY ST E M P R O G R A M M I N G I N M I N I X 3 19
Operating System Reliability
In our view, the only way to improve operating system reliability is to get
rid of the model of the operating system as one gigantic program running
in kernel mode, with every line of code capable of compromising or bring-
ing down the system. Nearly all the operating system functionality, and
especially all the device drivers, have to be moved to user-mode processes,
leaving only a tiny microkernel running in kernel mode. Moving the entire
operating system to a single user-mode process as in L4Linux [4] makes
rebooting the operating system after a crash faster, but does not address
the fundamental problem of every line of code being critical. What is
required is splitting the core of the operating system functionality—includ-
ing the file system, process management, and graphics—into multiple
processes, putting each device driver in a separate process, and very tightly
controlling what each component can do. Only with such an architecture
do we have a chance to improve system reliability.
The reasons that such a modular, multiserver design is better than a mono-
lithic one are threefold. First, by moving most of the code from kernel
mode to user mode, we are not reducing the number of bugs but we are
reducing the power of each bug to cause damage. Bugs in user-mode
processes have much less opportunity to trash critical kernel data struc-
tures and cannot touch hardware devices they have no business touching.
The crash of a user-mode process is rarely fatal, whereas a crash of the ker-
nel always is. By moving most of the code out of the kernel, we are moving
most of the bugs out as well.
Second, by breaking the operating system into many processes, each in its
own address space, we greatly restrict the propagation of faults. A bug in
the audio driver may turn the sound off, but it cannot wipe out the file
system by accident. In a monolithic system, in contrast, bugs in any func-
tion can destroy code and data structures in unrelated and much more crit-
ical functions.
Third, by constructing the system as a collection of user-mode processes,
the functionality of each module can be clearly determined, making the
entire system much easier to understand and simpler to implement. In
addition, the operating system’s maintainability will improve, because the
modules can be maintained independently from each other, as long as
interfaces and shared data structures are respected.
While this article does not focus on security directly, it is important to
mention that operating system reliability and security are closely related.
Security has usually been designed with the model of the multi-user sys-
tem in mind, not a single-user system where that user will run hostile
code. However, many security problems are caused by malicious code
injected by viruses and worms exploiting bugs such as buffer overruns. By
moving most of the code out of the kernel, exploits of operating system
components are rendered far less powerful. Overwriting the audio driver’s
stack may allow the intruder to cause the computer to make weird noises,
but it does not compromise system security, since the audio driver does
not have superuser privileges. Thus, while we will not discuss security
much hereafter, our design has great potential to improve security as well.
The observation that microkernels are good for reliability is not new. In the
1980s and 1990s numerous microkernels were constructed, including L4
[5], Mach [6], V [7], Chorus [8], and Amoeba [9]. None of these succeed-
ed in displacing monolithic operating systems with microkernel-based
ones, but we have learned a lot since then and the time is right to try
20 ; LO G I N : VO L . 3 1 , N O. 2
again. Even Microsoft understands this. The next version of Windows
(Vista) will feature many user-mode drivers, and Microsoft’s Singularity
research project is also based on a microkernel.
F I G . 1 . S K E TC H O F T H E L AY E R E D A R C H I T E C T U R E O F M I N I X 3
; LO G I N : A P R I L 2 0 0 6 M O D U L A R SY ST E M P R O G R A M M I N G I N M I N I X 3 21
addition, there is a nonblocking event notification mechanism. Pending
notifications are stored in a compact bitmap that is statically declared as
part of the process table. This message-passing scheme eliminates all ker-
nel buffer management and kernel buffer overruns, as well as many dead-
locks.
The next level up contains the device drivers, one per major device. Each
driver is a user process protected by the MMU the same way ordinary user
processes are protected. They are special only in the sense that they are
allowed to make a small number of kernel calls to obtain kernel services.
Typical kernel calls are writing a set of values to hardware I/O ports or
requesting that data be copied to or from a user process. A bitmap in the
kernel’s process table controls which calls each driver (and server) can
make. Also, the kernel knows which I/O ports the driver is allowed to use,
and copying is possible only with explicit permission.
The operating system interface is formed by a set of servers. The main ones
are the file server, the process manager, and the reincarnation server. User
processes make POSIX system calls by sending a message to one of these
servers, which then carries out the call. The reincarnation server is espe-
cially interesting, since it is the parent process of all the servers and driv-
ers. It is different from init, which is the root of ordinary user processes, as
it manages and guards the operating system. If a server or driver crashes or
otherwise exits, it becomes a zombie until the reincarnation server collects
it, at which time the reincarnation server looks in its tables to determine
what to do. The usual action is to create a new driver or server and to
inform the other processes that it is doing so.
Finally, we have the ordinary user processes, which have the ability to send
fixed-length messages to some of the servers requesting service, but basi-
cally have no other power. While message passing is used under the hood,
the system libraries offer the normal POSIX API to the programmer.
22 ; LO G I N : VO L . 3 1 , N O. 2
Third, when object-oriented programming was introduced, many program-
mers balked at the idea, since they could no longer count on reading or
tweaking data structures internal to other objects, a previously common
practice in the name of efficiency. For example, when Java was introduced
to C programmers, many of them saw it as a straitjacket, since they could
no longer freely manipulate pointers. Nevertheless, object-oriented pro-
gramming is now common and has led to better-quality code.
; LO G I N : A P R I L 2 0 0 6 M O D U L A R SY ST E M P R O G R A M M I N G I N M I N I X 3 23
Operating System Development in User Space
Now let us look at some other aspects of the MINIX 3 programming
model. While there are some restrictions, as pointed out above, we believe
that programming in a multiserver operating system environment has
many benefits and may lead to higher productivity and better code quality.
Short development cycle. The huge difference between a monolithic and
a multiserver operating system immediately becomes clear when looking
at the development cycle of operating system components. System pro-
gramming on a monolithic system generally involves editing, compiling,
rebuilding the kernel, and rebooting to test the new component. A subse-
quent crash will require another reboot, and tedious, low-level debugging
usually follows, frequently without even a core dump. In contrast, the
development cycle on a multiserver system like MINIX 3 is much shorter.
Typically, the steps are limited to editing, compiling, testing, and debug-
ging. We will elaborate on these steps below.
Normal programming model. Because drivers and servers are just ordinary
user processes, they can use any libraries that are needed. In some cases,
even POSIX system calls can be used. The ability to do these things can be
contrasted with the more rigid environment available to programmers writ-
ing code for monolithic kernels. In essence, working in user mode makes
programming easier.
No system downtime. The required reboots for a monolithic operating
system effectively kick off all users, meaning that a separate development
system is to be preferred. In MINIX 3, no reboots are required to test new
components, so other users are not affected. Furthermore, bugs or other
problems are isolated in the new components and cannot affect the entire
system, because the new component is run as an independent process in a
restricted execution environment. Problems thus cannot propagate as in a
monolithic system.
Easy debugging. Debugging a device driver in a monolithic kernel is a real
challenge. Often the system just halts and the programmer does not have a
clue what went wrong. Using a simulator or emulator usually is of no use
because typically the device being driven is something new and not sup-
ported by the simulator or emulator. In contrast, in the MINIX 3 model, a
device driver is just a user process, so if it crashes, it leaves behind a core
dump that can be inspected using all the normal debugging tools. In addi-
tion, the output of all printf() statements in drivers and servers automatical-
ly goes to a log server, which writes it to a file. After a failed run with the
new driver, the programmer can examine the log to see what the driver
was doing just before it died.
Low barrier to entry. Because writing drivers and servers is much easier
than in conventional systems, researchers and others can try out new ones
easily. Ease of experimentation can advance the field by allowing people
with good ideas but little experience in kernel programming to try out
their ideas and build prototypes they would not be able to construct with
monolithic kernels. Although the hardest part of writing a new device
driver may be understanding the actual hardware, other operating system
components can be easy to realize. For example, the case study at the end
of this article illustrates how semaphore functionality can be added to
MINIX 3.
High productivity. Because operating system development in user space is
easier, the programmer can get the job done faster. Also, since no lengthy
24 ; LO G I N : VO L . 3 1 , N O. 2
system build is needed once the bug has been removed, time is saved.
Finally, since the system need not be rebooted after a driver crash, as soon
as the programmer has inspected the core dump and the log and has
updated the code, it is possible to test the new driver without a reboot.
With a monolithic kernel, two reboots are often needed: one to restart the
system after the crash and one to boot the newly built kernel.
Good accountability. When a driver or server crashes, it is completely
obvious which one it is (because its parent, the reincarnation server,
knows which process exited). As a consequence, it is much easier than in
monolithic systems to pin down whose fault the crash was and possibly
who is legally liable for the damage done. Holding the producers of com-
mercial software liable for their errors, in precisely the same way as the
producers of tires, medicines, and other products are held accountable,
may improve software quality.
Great flexibility. Our modular model offers great flexibility and makes sys-
tem administration much easier. Since operating system modules are just
processes, it is relatively easy to replace one. It becomes easier to configure
the operating system by mixing and matching modules. Furthermore, if a
device driver needs to be patched, this can usually be done on the fly,
without loss of service or downtime. Module substitution is much harder
in monolithic kernels and often requires a reboot. Finally, maintenance
also becomes easier, because all modules are small, independent, and well
understood.
; LO G I N : A P R I L 2 0 0 6 M O D U L A R SY ST E M P R O G R A M M I N G I N M I N I X 3 25
void semaphore_server( ) {
message m;
int result;
/* Initialize the semaphore server. */
initialize( );
/* Main loop of server. Get work and process it. */
while(TRUE) {
/* Block and wait until a request message arrives. */
ipc_receive(&m);
/* Caller is now blocked. Dispatch based on message type. */
switch(m.m_type) {
case UP: result = do_up(&m); break;
case DOWN: result = do_down(&m); break;
default: result = EINVAL;
}
/* Send the reply, unless the caller must be blocked. */
if (result != EDONTREPLY) {
m.m_type = result;
ipc_reply(m.m_source, &m);
}
}
}
All servers and drivers have a similar main loop. The function initialize() is called
once before entering the main loop, but is not shown here. The handler functions
do_up() and do_down() are given in Fig. 3.
26 ; LO G I N : VO L . 3 1 , N O. 2
int do_down(message *m_ptr) {
/* Resource available. Decrement semaphore and reply. */
if (s > 0) {
s = s – 1; /* take a resource */
return(OK); /* let the caller continue */
}
/* Resource taken. Enqueue and block the caller. */
enqueue(m_ptr->m_source); /* add process to queue */
return(EDONTREPLY); /* do not reply in order to block the caller */
}
F I G . 3 . up A N D down O P E R A T I O N S
OF THE SEMAPHORE SERVER
The functions enqueue(), dequeue(), and queue_size() do list management and
are not shown.
Conclusion
MINIX 3 is a new, fully modular operating system designed to be highly
reliable. Like other innovations, our quest for reliability imposes certain
restrictions upon the execution environment, but the multiserver environ-
ment of MINIX 3 makes life much easier for the OS programmer. The
development cycle is shorter, system downtime is no longer required, the
programming interface is more POSIX-like, and testing and debugging
become easier. Programmer productivity is likely to increase, and code
quality might improve because of better accountability. The system admin-
istrator also benefits, since MINIX 3 improves configurability and main-
tainability of the operating system. Finally, we have illustrated the mes-
sage-driven programming model of MINIX 3 with the construction of a
simple semaphore server and discussed how its development benefits from
the modularity of MINIX 3. Interested readers can download MINIX 3
(including all the source code) from http://www.minix3.org. Over 50,000
people have already downloaded it; try it yourself.
REFERENCES
[1] T.J. Ostrand and E.J. Weyuker, “The Distribution of Faults in a Large
Industrial Software System,” Proceedings of the SIGSOFT International
Symposium on Software Testing and Analysis, ACM, 2002, pp. 55–64.
; LO G I N : A P R I L 2 0 0 6 M O D U L A R SY ST E M P R O G R A M M I N G I N M I N I X 3 27
[2] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, “An Empirical
Study of Operating System Errors,” Proceedings of the 18th ACM Symposium
on Operating System Principles, 2001, pp. 73–88.
[3] M.M. Swift, M. Annamalai, B.N. Bershad, and H.M. Levy, “Recovering
Device Drivers,” Proceedings of the 6th Symposium on Operating System
Design and Implementation, 2004, pp. 1–15.
[4] H. Härtig, M. Hohmuth, J. Liedtke, S. Schönberg, and J. Wolter, “The
Performance of µ-Kernel–Based Systems,” Proceedings of the 16th
Symposium on Operating System Principles, 1997, pp. 66–77.
[5] J. Liedtke, “On µ-Kernel Construction,” Proceedings of the 15th ACM
Symposium on Operating System Principles, 1995, pp. 237–250.
[6] M. Accetta, R. Baron, D. Golub, R. Rashid, A. Tevanian, and M. Young,
“Mach: A New Kernel Foundation for UNIX Development,” Proceedings of
the USENIX 1986 Summer Conference, 1986, pp. 93–112.
[7] D.R. Cheriton, “The V Kernel: A Software Base for Distributed
Systems,” IEEE Software, vol. 1, no. 2, 1984, pp. 19–42.
[8] A. Bricker, M. Gien, M. Guillemont, J. Lipkis, D. Orr, and M. Rozier,
“A New Look at Microkernel–Based UNIX Operating Systems: Lessons in
Performance and Compatibility,” Proceedings of the EurOpen Spring 1991
Conference, 1991, pp. 13–32.
[9] S. Mullender, G. Van Rossum, A.S. Tanenbaum, R. Van Renesse, and
H. Van Staveren, “Amoeba: A Distributed Operating System for the 1990s,”
IEEE Computer Magazine, vol. 23, no. 5, 1990, pp. 44–54.
[10] E.W. Dijkstra, “Goto Statement Considered Harmful,” Communications
of the ACM, vol. 11, no. 3, 1968, pp. 147–148.
28 ; LO G I N : VO L . 3 1 , N O. 2