FreeBSD Debugging
John H. Baldwin
Yahoo!, Inc.
Atlanta, GA 30327
jhb@FreeBSD.org, http://people.FreeBSD.org/~jhb
Just like every other piece of software, the FreeBSD kernel has bugs. Debugging a kernel is a bit different from debugging a userland program as there is nothing underneath the kernel to provide debugging facilities such as ptrace() or procfs. This paper will give a brief overview of some of the tools available for investigating bugs in the FreeBSD kernel. It will cover the in-kernel debugger DDB and the external debugger kgdb which is used to perform post-mortem analysis on kernel crash dumps.

The Kernel Debugging chapter of the FreeBSD Developer’s Handbook [4] already covers several details such as entering DDB, configuring a system to save kernel crash dumps, and invoking kgdb on a crash dump. This paper will not cover these topics. Instead, it will demonstrate some ways to use FreeBSD’s kernel debugging tools to investigate bugs.
2 Kernel Crash Messages
fault was the result of a NULL pointer dereference in the net.inet.tcp.pcblist sysctl handler. It was caused by a race condition where a struct tcpcb was freed in one thread while another thread was in the sysctl handler.

The first key piece of data is the fault virtual address. It is the invalid memory address that caused the fault. In this case the fault address is indicative of a NULL pointer dereference since its value is very small. The instruction pointer indicates the program counter value where the fault occurred. This can be used with either gdb(1) or addr2line(1) to determine the corresponding source line. The current process lists the command name and PID of the process that was executing when the fault occurred.

3 Live Debugging with DDB

Another debugging tool provided by the FreeBSD kernel is the in-kernel debugger DDB. DDB is an interactive debugger that allows the user to execute specific commands to inspect various details of the running kernel. It is able to resolve global symbols to addresses and control execution via breakpoints and single stepping. It is also extensible since new commands may be added at compile time. Details about several of the commonly used DDB commands may be found in the ddb(4) manpage [2].

system. The listing includes a summary of the state of each thread including any lock the thread is blocked on or a wait channel on which the thread is sleeping. More specific details about individual processes may be obtained via the show proc command. This command accepts a single argument that is either a direct pointer to a struct proc or a process ID (PID). Similarly, the show thread command provides details about an individual thread and accepts either a direct pointer to a struct thread or a thread ID (TID). Figure 3 shows a truncated list of processes and threads in various states. Figure 4 shows more detailed information about the first process in the list and one of its threads.

A very important part of a thread’s state is the stack trace. A stack trace provides a bit of history of where the thread has been in the past. It can also help explain how a thread arrived at its current state. DDB provides a trace command to obtain the stack trace of a single thread. With no arguments it will provide a trace of the current thread. If an argument is specified, then it may be either a TID or a PID. If the argument is a PID, then the first thread from the indicated process will be used. Figure 5 shows the stack trace for the thread blocked on the def lock. The trace indicates that the thread attempted to acquire the lock in the aptly named mtx_deadlock function.

3.2 Investigating Deadlocks
db> ps
pid ppid pgrp uid state wmesg wchan cmd
954 0 0 0 LL (threaded) crash2
100144 L *abc 0xffffff0001288dc0 [crash2: 3]
100143 L *jkl 0xffffff0001288c80 [crash2: 2]
100142 L *ghi 0xffffff0001288be0 [crash2: 1]
100055 L *def 0xffffff0001288d20 [crash2: 0]
812 0 0 0 SL - 0xffffffff80673a20 [nfsiod 0]
771 769 771 26840 Ss+ ttyin 0xffffff00011b9810 tcsh
769 767 767 26840 S select 0xffffff00018ca0d0 sshd
767 705 767 0 Ss sbwait 0xffffff00016ed94c sshd
...
10 0 0 0 RL (threaded) idle
100005 Run CPU 0 [idle: cpu0]
100004 Run CPU 1 [idle: cpu1]
100003 Run CPU 2 [idle: cpu2]
100002 Run CPU 3 [idle: cpu3]
db> tr 100055
Tracing pid 954 tid 100055 td 0xffffff00013869c0
sched_switch() at sched_switch+0x15d
mi_switch() at mi_switch+0x215
turnstile_wait() at turnstile_wait+0x24c
_mtx_lock_sleep() at _mtx_lock_sleep+0xe0
_mtx_lock_flags() at _mtx_lock_flags+0x7a
mtx_deadlock() at mtx_deadlock+0xb4
crash_thread() at crash_thread+0x138
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffae23ed30, rbp = 0 ---
In this case, thread 100142 owns the def lock and thread 100055 is waiting for it. Note that the turnstile information actually includes the lock owners as well as the waiters for a given lock. Also, from Figure 3 one can see that the thread information includes the turnstile that a thread is currently blocked on. From this, it is apparent that one can build a dependency graph among a group of threads: a thread that is blocked on a turnstile is waiting for the owner of the lock associated with that turnstile.

DDB provides another command, show lockchain, that displays this dependency chain. It walks the thread dependencies via turnstiles until it finds a thread that is not blocked on a turnstile. If it encounters a deadlock, it will stay stuck in the cycle until the user enters ’q’ at DDB’s --More-- prompt. The show lockchain command takes an optional argument specifying the starting thread as either a pointer to a struct thread or a TID. Figure 7 shows the dependency graph for thread 100055, which is clearly stuck in a deadlock with the other threads from the same process.

A limitation of show lockchain is that it only handles dependencies for locking primitives that use turnstiles, such as mutexes. Other locking primitives such as sx locks use sleepqueues to hold threads waiting for locks. DDB includes a show sleepchain command which displays a dependency graph for threads blocked on sx locks and lockmgr locks. Figure 8 shows the dependency graph

Each DDB command is bound to a function. The <ddb/ddb.h> header provides helper macros to declare a command function and add it to a command table. The DB_COMMAND macro creates a top-level command including the function prototype. See Figure 9 for an example of a simple “foo” command. Note that there is no explicit function prototype and that the function body immediately follows the macro. To add a “show” command, use DB_SHOW_COMMAND instead of DB_COMMAND.

The command function takes four arguments which provide the command’s parameters. The addr argument specifies the address for the command to operate on. It may be either the user-supplied address or the dot address as described in ddb(4) [2]. The have_addr argument is a boolean that is true if the user supplied an explicit address. The count argument indicates the count of operations to be performed. If the user did not specify one, then count is set to -1. Finally, the modif argument is a string that contains the command modifiers without the leading slash. If no modifiers were specified, then modif will be an empty string.

3.3.2 I/O for DDB Commands

DDB command functions are executed in an alternative environment from the rest of the kernel. One of the primary differences is that DDB uses its own I/O subsystem. DDB commands do not accept direct input from
db> show lock def
class: sleep mutex
name: def
flags: {DEF}
state: {OWNED, CONTESTED}
owner: 0xffffff000155c680 (tid 100142, pid 954, "crash2: 1")
db> show turnstile def
Lock: 0xffffffffae3c6fc0 - (sleep mutex) def
Lock Owner: 0xffffff000155c680 (tid 100142, pid 954, "crash2: 1")
Shared Waiters:
empty
Exclusive Waiters:
0xffffff00013869c0 (tid 100055, pid 954, "crash2: 0")
Pending Threads:
empty
db> ps
pid ppid pgrp uid state wmesg wchan cmd
811 0 0 0 SL (threaded) crash2
100139 D fee 0xffffffffae3a9180 [crash2: 3]
100138 D four 0xffffffffae3a9140 [crash2: 2]
100137 D fo 0xffffffffae3a9240 [crash2: 1]
100136 D two 0xffffffffae3a90c0 [crash2: 0]
...
db> show lock fee
class: lockmgr
name: fee
lock type: fee
state: EXCL (count 1) 0xffffff00013079c0 (tid 100136, pid 811, "crash2: 0")
waiters: 1
db> show sleepchain 100139
thread 100139 (pid 811, crash2: 3) blocked on lk "fee" EXCL (count 1)
thread 100136 (pid 811, crash2: 0) blocked on sx "two" XLOCK
thread 100137 (pid 811, crash2: 1) blocked on lk "fo" EXCL (count 1)
thread 100138 (pid 811, crash2: 2) blocked on sx "four" XLOCK
thread 100139 (pid 811, crash2: 3) blocked on lk "fee" EXCL (count 1)
...
	if (have_addr)
		foop = (struct foo *)addr;
	else
		foop = &default_foo;
	if (count == -1)
		count = 1;	/* Default count. */
	for (i = 0; i < count; i++)
		do_something(foop);
}
the user. Instead, the input comes from the command line when the command is invoked. Commands do output various messages to the console, and DDB provides its own API for console output.

The primary routine in DDB’s I/O API is db_printf. This function takes the same arguments as printf(9) and supports all of the same output formats. This includes the extended formats %b and %D. DDB command functions should use db_printf for all console output.

An additional detail of DDB’s I/O subsystem that DDB commands may need to handle is the pager. DDB’s output includes a builtin pager which will interrupt the output with a --More-- prompt periodically. If a command does not wish to have any of its output interrupted, it may disable the pager entirely by calling db_disable_pager. The panic command does this, for example. A DDB command that produces a lot of output (for example, one that iterates over a list) should honor a request by the user to abort the current command at the pager prompt. If the user aborts a command, then the global variable db_pager_quit will be set to true. Thus, DDB command functions simply need to check the state of db_pager_quit periodically and gracefully exit when it is non-zero. Figure 10 contains a sample “show foos” command which walks a list of struct foo objects displaying information about each object. It supports a “v” flag to enable more verbose output.

3.3.3 Using DDB to Map Addresses to Symbols

Another useful debugging tool DDB provides is the ability to use its symbol tables to map addresses to symbolic names. This can be very useful for looking up the name of a function for a function pointer. This is especially true when working with facilities that work on lists of function pointers such as taskqueue tasks, callouts, or SYSINITs. Note that these routines can be used outside of DDB. However, doing so may result in races with loading kernel modules, so care should be taken.

The db_search_symbol function is used to map a specific address to a symbol. It accepts an address as its first argument, a strategy as its second argument, and a pointer to a db_expr_t variable as its third argument. The strategy argument can be either DB_STGY_PROC to only match functions or DB_STGY_ANY to match any symbol. The third argument cannot be NULL as db_search_symbol assumes it always points to valid storage. Upon successful completion, the function returns a pointer to a symbol. It also stores the offset of the address relative to the symbol in the variable pointed to by the third argument. If no appropriate symbol was found, then db_search_symbol returns C_DB_SYM_NULL.
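The lookup that db_search_symbol performs can be modeled as a nearest-preceding-symbol search. The userland C sketch below is only an illustration, not DDB’s implementation: the table, struct sym_entry, the STGY_PROC/STGY_ANY enum, and search_symbol() are hypothetical stand-ins for DDB’s symbol tables, DB_STGY_PROC/DB_STGY_ANY, and db_search_symbol. It assumes a match is the highest-addressed qualifying symbol at or below the address, and a NULL return plays the role of C_DB_SYM_NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogues of DB_STGY_PROC and DB_STGY_ANY. */
enum sym_strategy { STGY_PROC, STGY_ANY };

/* Hypothetical symbol table entry; not DDB's internal representation. */
struct sym_entry {
	const char	*name;
	uintptr_t	 value;		/* Start address of the symbol. */
	int		 is_func;	/* Nonzero for functions. */
};

/* Made-up addresses; a real table comes from the kernel's symbols. */
static const struct sym_entry sym_table[] = {
	{ "mtx_deadlock", 0x1000, 1 },
	{ "crash_thread", 0x1200, 1 },
	{ "default_foo",  0x1400, 0 },
	{ "fork_exit",    0x1600, 1 },
};

/*
 * Return the highest-addressed symbol at or below addr that matches the
 * strategy and store addr's offset from that symbol in *offp.  Returns
 * NULL (the analogue of C_DB_SYM_NULL) if no symbol qualifies.  As with
 * db_search_symbol, offp must point to valid storage.
 */
static const struct sym_entry *
search_symbol(uintptr_t addr, enum sym_strategy stgy, uintptr_t *offp)
{
	const struct sym_entry *best = NULL;
	size_t i;

	assert(offp != NULL);
	for (i = 0; i < sizeof(sym_table) / sizeof(sym_table[0]); i++) {
		const struct sym_entry *s = &sym_table[i];

		/* STGY_PROC only considers functions. */
		if (stgy == STGY_PROC && !s->is_func)
			continue;
		if (s->value <= addr &&
		    (best == NULL || s->value > best->value))
			best = s;
	}
	if (best != NULL)
		*offp = addr - best->value;
	return (best);
}
```

With the made-up table above, searching 0x12b4 with STGY_PROC yields crash_thread at offset 0xb4, which a caller could print as crash_thread+0xb4, the same name+offset form that appears in DDB stack traces. Searching 0x1410 skips the data symbol default_foo under STGY_PROC but matches it under STGY_ANY.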
DB_SHOW_COMMAND(foos, db_show_foos_cmd)
{
	struct foo *foop;
	int verbose;
	sym = db_search_symbol((vm_offset_t)(*sipp)->func,
	    DB_STGY_PROC, &offset);
	db_symbol_values(sym, &name, NULL);
	if (name != NULL)
		printf("   %s(%p)... ", name, (*sipp)->udata);
	else
#endif
		printf("   %p(%p)... ", (*sipp)->func,
		    (*sipp)->udata);
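The command-function conventions described above (the addr, have_addr, count, and modif parameters, -1 for a missing count, and polling db_pager_quit inside a long loop) can be exercised outside the kernel with a small mock. This is a hedged userland sketch, not kernel code: mock_expr_t, mock_pager_quit, foo_list, the fields of struct foo, and mock_show_foos() are all hypothetical, output goes through printf rather than db_printf, and a real command would be declared with DB_SHOW_COMMAND as in Figure 10. Unlike Figure 9, which defaults a missing count to 1, this mock assumes -1 means “walk the whole list”.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for db_expr_t and db_pager_quit. */
typedef intptr_t mock_expr_t;
static bool mock_pager_quit;

/* Hypothetical object type and list head walked by the command. */
struct foo {
	int		 id;
	struct foo	*next;
};
static struct foo *foo_list;

/*
 * Mock "show foos" body taking the same four parameters a DDB command
 * function receives.  Returns how many objects were displayed so the
 * behavior can be checked in userland.
 */
static int
mock_show_foos(mock_expr_t addr, bool have_addr, mock_expr_t count,
    const char *modif)
{
	struct foo *foop;
	bool verbose;
	int shown = 0;

	/* Modifiers arrive without the leading slash, e.g. "v". */
	verbose = strchr(modif, 'v') != NULL;

	/* Use the explicit address if given, else the default list. */
	foop = have_addr ? (struct foo *)addr : foo_list;

	/* A count of -1 means none was given; walk the whole list. */
	for (; foop != NULL && (count == -1 || shown < count);
	    foop = foop->next) {
		/* Stop early if the user quit at the pager prompt. */
		if (mock_pager_quit)
			break;
		if (verbose)
			printf("foo %p: id %d, next %p\n", (void *)foop,
			    foop->id, (void *)foop->next);
		else
			printf("foo %d\n", foop->id);
		shown++;
	}
	return (shown);
}
```

Setting mock_pager_quit before the call makes the walk return immediately without printing anything, mirroring how a real command should stop once the user quits at the --More-- prompt.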
and threads. Thus, to switch to a thread with a specific TID or PID one has to examine the thread list from info threads to map a TID or PID to a gdb thread ID.

To alleviate this inconvenience, kgdb provides proc and tid commands. The proc command accepts a PID and switches to the thread context of the first thread for the specified process. The tid command accepts a TID and switches to the corresponding thread. Note that the proc command does not work with remote debugging.

4.2 Debugging Kernel Modules

Kernel modules (also called “klds”) are separate object files that can be loaded into the kernel’s address space at runtime. Each kernel module contains its own symbols that are separate from the kernel’s symbols. DDB uses a merged symbol table that is updated by the kernel linker when modules are loaded and unloaded. The kgdb debugger, on the other hand, has to explicitly load symbols for each kernel module from an appropriate symbol file.

An arbitrary symbol file can be loaded in kgdb using the add-symbol-file command. This command requires the relocated addresses of each section as command arguments. Doing this by hand is a bit tedious. It involves extracting the base address of the kernel module from the kernel (e.g. using kldstat(8)) and the relative addresses of each section from the kernel module (e.g. using objdump(1)). The relocated address of each section is then computed by adding its relative address to the base address of the module. Thankfully, there are ways to automate this process.

4.2.1 kgdb KLD Support

Recent versions of kgdb provide integrated support for managing kernel modules. First, the add-kld command can be used to manually load the symbols for a single module. Second, kgdb uses gdb’s support for shared libraries to automatically load symbols for modules. Note that both of these features only work for a kernel with debug symbols.

The add-kld command accepts as its sole argument a pathname of a kernel module and loads the symbols for that module. The path can be either an absolute path or a relative path. If it is a relative path, then kgdb will look for the module in several directories: the current working directory, the directory of the current kernel executable, and each directory in the target kernel’s module path. If a kernel module is found, then its filename is matched to one of the target kernel’s loaded modules. The base address for the loaded module is read from the target kernel and used to relocate the section addresses in the kernel module symbol file. Basically, add-kld is a wrapper around the gdb command add-symbol-file that does all the math internally. As with add-symbol-file, the only way to unload symbols added via add-kld is to clear all symbols via the file or symbol-file commands.

For more automated handling of kernel modules, kgdb hooks into gdb’s shared library support and treats kernel modules as shared libraries. As a result, the standard commands for manipulating shared libraries in gdb such as info sharedlibrary, sharedlibrary, and nosharedlibrary can be used to manage kernel module symbols. In addition, sections from kernel modules loaded via the shared library mechanism are listed in the info files output. Figure 12 shows the kernel modules loaded on my laptop.

To locate the corresponding file for a kernel module, kgdb will first use the absolute path stored in the kernel image for 8.0 and later. Note that you can use set solib-absolute-prefix to force a prefix for the absolute paths. If the absolute path is not present (or the corresponding file is not present), then kgdb will first search for the file in paths set via set solib-search-path. If that fails, then kgdb will search the same set of paths as the add-kld command.

Using this facility, symbols for kernel modules are automatically loaded when a vmcore file is used as the target. When debugging a remote target, on the other hand, symbols for kernel modules are not automatically loaded when attaching to the target. However, invoking the info sharedlibrary command will cause kgdb to query the list of kernel modules from the remote kernel. Afterward, the sharedlibrary command can be used to load symbols for the modules.

4.2.2 Using asf(8)

For older versions of kgdb, the asf(8) [10] tool can be used to automate the loading of kld symbols. Specifically, asf(8) searches for kernel modules corresponding to a set of loaded modules and then generates a text file containing add-symbol-file commands to load the symbols for each module. Note that by default, asf(8) expects to parse output from kldstat(8) on its standard input to obtain the list of kernel modules. However, the -M and -N options can be used to make asf(8) read the list of kernel modules directly from a vmcore, similar to kgdb. Also, asf(8) assumes that it is invoked from a kernel build directory. If you wish it to load symbols from the modules in the installed location, you will need to use the -s flag and specify an explicit kernel module path. Once asf(8) has generated a gdb command file, the symbols can be loaded by using the source command from kgdb to execute the commands in the generated file. Figure 13 shows the command file generated by asf(8) for the modules loaded on my laptop. Note that the addresses of the various named sections in the command for iwi_bss.ko match the addresses in the info files output from Figure 12.

4.3 Extending kgdb via Scripts

Similar to DDB, kgdb can be extended by adding new commands. Rather than requiring a recompile of the kernel, new commands can be added on the fly using gdb’s scripting language. GDB scripts are evaluated at runtime and are not pre-compiled. On the one hand, this provides several benefits. For example, the physical layout of structures is not hardcoded into the scripts when writing them. Instead, gdb uses symbols from the kernel and modules to compute the offsets of member names as well as the addresses of global symbols. Also, gdb does not evaluate statements that are not executed. Thus, one can use members of structures that are not always present (e.g. when a new member is added) by using conditional execution. The downside is that gdb scripts require a kernel built with debug symbols for all but the simplest tasks. The gdb info documentation covers the basics of scripts, or user-defined commands, but there are several quirks that are worth mentioning.

First, while gdb scripts do support control flow via while loops and if-then-else statements, there are a few limitations. For
> sudo kgdb -q
Reading symbols from /boot/kernel/iwi_bss.ko...
Reading symbols from /boot/kernel/iwi_bss.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/iwi_bss.ko
Reading symbols from /boot/kernel/logo_saver.ko...
Reading symbols from /boot/kernel/logo_saver.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/logo_saver.ko
...
(kgdb) info sharedlibrary
From To Syms Read Shared Object Library
0xc3e8e5a0 0xc3e8e63b Yes /boot/kernel/iwi_bss.ko
0xc41037a0 0xc4103c28 Yes /boot/kernel/logo_saver.ko
(kgdb) info files
Symbols from "/boot/kernel/kernel".
kernel core dump file:
‘/dev/mem’, file type FreeBSD kernel vmcore.
Local exec file:
‘/boot/kernel/kernel’, file type elf32-i386-freebsd.
Entry point: 0xc04513c0
...
0xc3e8e5a0 - 0xc3e8e63b is .text in /boot/kernel/iwi_bss.ko
0xc3e8e63b - 0xc3e8e724 is .rodata in /boot/kernel/iwi_bss.ko
0xc3e8f000 - 0xc3ebdb04 is .data in /boot/kernel/iwi_bss.ko
0xc3ebdb04 - 0xc3ebdb7c is .dynamic in /boot/kernel/iwi_bss.ko
0xc3ebdb7c - 0xc3ebdb88 is .got in /boot/kernel/iwi_bss.ko
0xc3ebdb88 - 0xc3ebdb8c is .bss in /boot/kernel/iwi_bss.ko
...
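The section relocation that add-kld and asf(8) automate is plain addition: each relocated address is the module’s base address plus the section’s relative address. The C sketch below formats an add-symbol-file command from such values. It is a hedged illustration, not the code of asf(8) or kgdb: struct section, format_add_symbol_file(), the module path, and the example base and offsets are hypothetical. It assumes the first section is .text, which gdb’s add-symbol-file takes as its positional address, with the remaining sections passed as -s NAME ADDRESS pairs; the set of sections asf(8) actually emits may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A section name and its address relative to the start of the module. */
struct section {
	const char	*name;
	uint32_t	 rel_addr;
};

/*
 * Format an add-symbol-file command for a module loaded at base, where
 * each relocated address is base + rel_addr.  secs[0] must be .text,
 * which becomes the positional address; later sections are passed as
 * "-s NAME ADDR".  Like snprintf, returns the length of the full
 * command (or -1 on error) even if buf was too small to hold it.
 */
static int
format_add_symbol_file(char *buf, size_t len, const char *path,
    uint32_t base, const struct section *secs, size_t nsecs)
{
	size_t i, room, used;
	int n;

	assert(nsecs > 0 && strcmp(secs[0].name, ".text") == 0);
	n = snprintf(buf, len, "add-symbol-file %s 0x%" PRIx32, path,
	    base + secs[0].rel_addr);
	if (n < 0)
		return (-1);
	used = (size_t)n;
	for (i = 1; i < nsecs; i++) {
		/* Keep measuring but stop writing once buf is full. */
		room = used < len ? len - used : 0;
		n = snprintf(room > 0 ? buf + used : NULL, room,
		    " -s %s 0x%" PRIx32, secs[i].name,
		    base + secs[i].rel_addr);
		if (n < 0)
			return (-1);
		used += (size_t)n;
	}
	return ((int)used);
}
```

For a hypothetical module /boot/kernel/foo.ko at base 0x10000000 with .text, .data, and .bss at relative addresses 0x400, 0x2000, and 0x3000, it produces add-symbol-file /boot/kernel/foo.ko 0x10000400 -s .data 0x10002000 -s .bss 0x10003000. Returning the full length even when the buffer is too small lets a caller detect truncation the same way it would with snprintf.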
closely as well as the code around the crash point is often sufficient to determine the cause of the bug.

Another crash that can be a secondary effect is a crash due to exhausting the space in the “kmem” virtual memory map. The “kmem” virtual memory map is used to provide virtual address space for memory allocated via malloc(9) or uma(9) in the kernel. On architectures with a direct map such as amd64, “kmem” is only used for allocations larger than a page. On other architectures “kmem” is used for all allocations. If the amount of virtual address space in the “kmem” map is exhausted, then the kernel will crash. This can sometimes be the result of resource exhaustion. For example, if kern.ipc.nmbclusters is set to a high value and an m_getcl(M_WAIT) invocation causes the “kmem” map to be exhausted before the nmbclusters limit is reached, then the kernel will panic.

Sometimes the “bug” can actually be faulty hardware. For example, a pointer might have a bit error. This can result in a page fault for a NULL pointer. One way to verify whether a crash on an x86 machine was the result of a hardware error is to check the system event log. This can usually be examined from the BIOS setup. For systems with a BMC, the ipmitool [11] utility can be used to examine the system event log at runtime. Lack of a corresponding entry in the system event log doesn’t necessarily disprove a hardware failure, but if an entry is present it can confirm failing hardware as the panic’s cause.

6.2 Kernel Hangs

Kernel hangs tend to require a bit more sleuthing. One reason for this is that it can sometimes take a bit of investigating to figure out the true extent of the hang. Here are a few things to try to start the investigation of a hang.

First, check for resource starvation. For example, check for messages on the console about the kern.maxfiles or maxproc limits being exceeded. Sometimes a machine that
is overloaded will appear to be hung because it is unable to fork a new process for a remote login, for example. Log in to the box on the console if possible and check for other resource exhaustion issues using commands like netstat(1) and vmstat(8).

The next step is generally to break into DDB. The ps command in DDB can give a very useful overview of the system. For example, if all of the CPUs are idle, then there may be a deadlock. The ps command can be used to look for suspect threads which can then be investigated further. On the other hand, if all of the CPUs are busy, then that may indicate a livelock condition (or an overloaded box).

If the hang’s cause is still unknown, then the panic command can be used from DDB to explicitly panic the machine. If the machine is configured for crashdumps, then it will write out a crash dump. After the machine has rebooted, the crashdump can be used to examine the hang further. For example, if logging into the box to run netstat was not possible, then netstat can be run against the crashdump.

7 Conclusion

The FreeBSD kernel has bugs just like any other piece of software. To aid in the investigation and fixing of bugs, FreeBSD provides several kernel debugging tools. Some of the tools are services within the kernel itself, such as DDB. Other tools are outside of the kernel, such as kgdb. As with other tools, skilled use is obtained from practice and a bit of trial and error.

The show lock DDB command was added in FreeBSD 6.1. The show proc, show thread, show turnstile, show lockchain, and show sleepchain commands were added in FreeBSD 6.2.

Several recent changes to kgdb will first appear in FreeBSD 6.4 and 7.1. These include the integrated kernel module support as well as the ’tid’ command supporting remote targets. Also, while the ’proc’ command has been present since 6.0, the ’tid’ command first appeared in 7.0.

There are several existing sets of kgdb command files containing various user-defined commands. Some scripts are present in the FreeBSD source tree under the src/tools/debugscripts directory. The scripts at http://www.FreeBSD.org/~jhb/gdb include user-defined commands that provide similar functionality to many DDB commands such as ps, lockchain, and sleepchain.

References

[1] The GNU Project Debugger, http://www.gnu.org/software/gdb
[2] DDB, FreeBSD Kernel Interfaces Manual, http://www.FreeBSD.org/cgi/man.cgi
[3] kgdb, FreeBSD General Commands Manual, http://www.FreeBSD.org/cgi/man.cgi
[4] Kernel Debugging, FreeBSD Developers’ Handbook, http://www.FreeBSD.org/doc/en/books/developers-handbook