2014 06 Kernel Programming
2014 06 Kernel Programming
Alessandro Barenghi
Linux
A bit of history
The Linux kernel development started back in 1991
The first release was developed to have a working, simple OS,
no strings attached
In 25yrs, the codebase has grown from 140k LOC to 14M LOC
At the moment, the most used monolithic kernel around
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Macrokernel
Macrokernel
Key areas
The Linux kernel is logically split in 6 master areas
System management : bootup, shutdown, syscall interfaces
Process management : scheduling, inner locks and mutexes,
synchronization primitives
Memory management : Memory allocator, page handler,
virtual memory mapper
Storage management : file access primitives, virtual filesystem
management, logic filesystem management and disk handling
Networking management : network syscalls, socket bufffer
handling, protocol and filtering handling, network drivers
User Interaction management : character devices, security
management, process tracing management and HI devices
management
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Module structure
What’s in a module
A kernel module is a binary blob, which can be linked at
insertion time with the whole kernel
Think of it as a sort of a “strange” static library
The linking is performed only against kernel symbols: no libc
around here...
Particular care should be exercised before calling kernel
symbols prefixed by a double underscore, as they represent
lower level functions
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Module structure
A simple module
Contents
A module is constituted of one (or more) C files, containing a
collection of functions
Two functions are mandatory
init module performs all the initializations of the resources at
insertion time
cleanup module performs the pre-removal cleanup
All the variables declared in the global scope of the module
are actually residing in kernel memory
The stack of the module is shared with all the others kernel
functions (i.e. the kernel has a single stack) and it’s rather
small
Dynamic memory allocation encouraged for large variables as
they would clutter the stack
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
A simple module
Building
In order to build a module,you just need the usual gcc
compiler
To specify that a kernel module object must be built, the
obj-m target is used in the Makefile
You will need at least the Linux kernel header files to compile
a modulea
If you are planning to do heavy modifications, the full kernel
source tree may come in handy
a
available as a handy package under almost every linux distribution
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
A simple module
Module Management
Once a module has been successfully built, you can check
informations about it via the modinfo command
Module insertion is performed via the insmod command,
while removal is done via rmmod
You can obtain a list of the inserted modules via the lsmod
command
It is pretty obvious that only root can insert and remove
kernel modules
The kernel ring buffer (where log messages appear) can be
accessed via the dmesg command
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
A simple module
A simple module
Parameter passing
It is possible to pass parameters to a module at insertion time
The parameter parsing is done according to the call to the
module param primitive
The module param primitive accepts the name of the
parameter, the type and the permission for changing it, if it
will be exposed via sysfs
It is possible also to pass arrays as parameters via the
module param array function
The module param array behaves in a similar way to the
argc - argv mechanism in userspace programs
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
kmalloc
The most simple way to get dynamic memory in kernelspace is
the use of the kmalloc primitive
The primitive directly calls the get free pages function
appropriately, so space is available only in page sized chunks
There is an upper limit for the maximum size of a kmalloc :
portable code should not use more than 128kB per shot
The kmalloc primitive can be invoked with different flags to
steer the behaviour of the memory allocator, in particular
GFP KERNEL is the default behaviour flag, may block and put
to sleep the current process
GFP ATOMIC is specifies that the current process should not be
put to sleep and can claim up to the last page available
kfree frees the memory claimed with kmalloc
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
vmalloc
If you are not in need of physically contiguous memory, you
may use the vmalloc primitive
The vmalloc calls the page handler at a higher level resulting
in an allocation of an arbitrarily large amount of memory
Since the call depth is greater than kmalloc , vmalloc is
obviously less performing that kmalloc
As before, you can (and must)free the memory via vfree
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Concurrency handling
Concurrency issues
As we now know, the Linux kernel is one large monolith as far
as the running code goes with the same address space
accessible for all the modules
Once upon a long time ago, when the systems had a single
processor and the kernel structure was simpler, only one task
would have been executed at once in kernelspace
Still, hardware interrupts could get in the way of atomic
operations being performed
Then multiprocessor system started being supported back in
1996, starting to cause the first, serious concurrency issues
The whole thing got a lot worse when the whole kernel
became preemptible with the 2.6 series (around mid 2002 with
2.5.37)
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Concurrency issues
Solutions available
As the concurrency issues are pretty serious, the kernel offers
native facilities to prevent problems
Fully atomic variables are available
Semaphore-structures were implemented since a long time ago
Spinlocks represent the main difference between userspace and
kernelspace concurrency handling mechanisms (used most of
the time)
Read-Copy-Update mechanisms are available to provide
advanced and performant concurrency handling (especially
useful for NetFilter)
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Concurrency issues
Atomic Variables
In case the resource which may be shared among different
kernel parts is a simple integer
In this case, it is possible to avoid complex concurrency
handling structures via the use of atomic variables
The atomic [set|add|inc|dec|sub] provide the means to
atomically perform that operation on the integer value
Operations on atomic variables are usually extremely fast, as
they are compiled as single assembly instructions if the
architecture allows so
A companion primitive set is the atomic * and test group
which check if the operation was correctly performed
afterwards and are useful to implement election mechanisms
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Concurrency issues
Spinlocks
Spinlocks are mutual exclusion primitives akin to common
mutexes
The main difference is that a spinlock will never be put to
sleep until it gains access to the resource
Spinlocks are structures of spinlock t type (defined in
spinlock.h )
Different locking and unlocking functions are available
spin lock and spin unlock are the garden variety spinlock
spin lock irqsave and spin unlock irqsave will mask
hardware interrupts and restore the IV state after the lock has
been resolved
spin lock bh and spin unlock bh only mask software
interrupts
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Overview
In synchronization mechanisms, a key issue is preventing
deadlocks: a deadlock is a state of the computation where the
access to the resources is prevented due to a circular
dependence in the access
If a mechanism warrants that every entity will access a
protected region, it is called lock-free
In case the access will necessarily happen within a bounded
number of steps, it is also defined as wait-free
Lock-freedom warrants that a system will not hang,
wait-freedom that noone will starve (i.e. that access to a
resource is possible in a bounded amount of time)
Only a few wait free algorithms are known in literature: we
will tackle circular buffers and read-copy-update mechanisms
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Circular buffers
Overview
Circular buffers are a memorisation structure which can be
accessed in a lockless, wait-free fashion
The key idea is that a memory buffer is thought of as circular
instead of the common linear form
This implies that writing beyond the end of the buffer starts
writing back from the beginning
The most common implementation involves two cursors, one
pointing to the beginning of the valid data, the other to the
end
Key element: can be implemented even without atomic
variables
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Circular buffers
Typical actions
Only one reader or writer is admitted to the structure; the
structure is lock free as no possible deadlocks can happen
Reader: the reader accesses the buffer reading the pointers
first. Once the boundaries are known, the read access will be
safe.
Writer: the writer reads the boundaries, performs the writing
action and finally updates the end pointer.
Deletion from the buffer is managed moving forward the start
pointer (no explicit need to blank the memory cells)
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Circular buffers
6 7 8 9 A B 5 FULL
START
END
EMPTY
START
Circular buffers
Circular buffers
Read-Copy-Update
Overview
Fully wait-free reads (with multiple readers) and wait-free
write (one writer only) is achievable via Read-Copy-Update
constructs
RCUs are a relatively recent (2006) strategy to avoid update
conflicts on a shared variable
They are now implemented in both the Linux kernel and as a
user space available library liburcu and their use is advised
whenever a variable is shared among many readers, while
being updated by a few writers
The key idea is to decouple the writing phase from the
removal of the old data, avoiding syncronization issues
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
rcu_assign_pointer(gptr, p)
->b=? ->b=2 ->b=2
->c=? ->c=3 ->c=3
initialization
kmalloc()
2 Versions
1 Version
1 Version
1 Version
Readers?
Readers?
Readers?
A A A A
free()
B B B
C C C C
synchronize_rcu()
rcu_read_lock()
rcu_read_unlock()
call_rcu()
Reclaimer
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Devices
In order to expose a unified interface for communication with
the hardware, the kernel exposes devices
Following the UNIX philosophy, the devices are seen in
userspace as simple files
It is possible to either expose a real device via a
block/character interface (e.g. /dev/sda)
Or to build a mockup device which may be useful (/dev/zero)
A simpler alternative, if there is only the need to communicate
between userspace and kernelspace is the debug filesystem
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
A real device
A real character device needs to implement all the possible
operations which can be performed on it
Moreover, it is required to handle the number of stakeholders
which are actually using the device to avoid improper removal
of the module
The devices are accessible from the userspace via a peculiar
filesystem entry, which does not have any actual space
reserved on disk known as device node
Real devices are split into :
Character devices: minimum unit for access : single character
(one byte), usually unbuffered
Block devices: minimum unit for access : a block of data (a
contiguous chunk in the kB size range), usually buffered
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Device implementation
We will see the implementation of a mockup character devicea
A character device needs to implement at least four key
primitives : open,read,write and release
It also needs to take into account whether someone is using
the device in order to prevent premature module removal
The transferral of the data from kernel to user address space
is managed by the put user primitive
a
Block devices go the same way, just with more functionalities
Kernel Structure Kernel Modules vs Processes A Simple Module Concurrency handling Synchronize without starving Character D
Node setup
A device node can be created via the mknod utility and needs
three parameters
The type of the device (block or character device)
The major number, i.e. a unique, kernel assigned, identifier for
the device
The minor number, a sub-index handled by the module
answering for that device in kernelspace
A list of all the devices exported by the kernel is available via
/proc/devices
It is possible also to avoid static devices via the udev
filesystem, which is automatically populated by the kernela
a
say, the partitions of a hard disk