Operating Systems Project: Device Drivers: Jordi Garcia and Yolanda Becerra
September 2012
1. Introduction
The main aim of this project is to study the internal functions of an operating system in
depth. You will learn how to modify basic data structures of an OS and improve its
functionalities.
In this second project, a generic Linux distribution with a 2.6 kernel will be used, and several kernel modules that add new functionality will be implemented.
When you start your PC in the laboratory, you must boot Ubuntu and the image
labeled “proso” as usual.
In [1] you can find all the documentation about Linux Kernel Modules (LKM). They
basically allow parts of the kernel to be added or modified dynamically while Linux is running, without having to recompile or relink it as you had to do in the previous project.
You will thus learn another way of modifying system code.
Obviously, some functions have restricted access and some functions cannot be inserted into the kernel in this way. The most common changes made to a system in this way are to device drivers. However, making them requires being a privileged user, since not everybody is allowed to change a system. The printer driver is a typical example. Imagine a laptop that is used at home and at work; it will have several printer drivers installed on it. However, people rarely have the same printer at home as they do at work. Therefore, even though the drivers are always installed, the physical devices (the printers) will not always be available.
1
This document was drawn up with the support of professors on previous courses: Julita Corbalán, Juan José Costa, Marisa Gil, Jordi Guitart, Amador Millan, Gemma Reig, Silvia LLorente, Pablo Chacín and Rubén González.
Specifically, in this project you will add a monitoring mechanism for some Linux system calls. This monitoring will be added dynamically by using a module, without the need to recompile the Linux kernel. Once the calls are monitored, a new device will be added to allow users to access the statistics they wish to consult. It will therefore be necessary to create a driver for this device, which will be implemented as another module, again to avoid recompiling the kernel.
Below is a summary of the essential concepts and of the basic code for creating modules, devices and drivers.
2. Preliminary concepts
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
MODULE_LICENSE("GPL");
/*
 * Initialize the module.
 */
static int __init Mymodule_init(void)
{
    /* Initialization code */
    printk(KERN_DEBUG "Mymodule successfully loaded\n");
    /* Return 0 if everything is OK and < 0 in case of error */
    return 0;
}
/*
 * Unload the module.
 */
static void __exit Mymodule_exit(void)
{
    /* Finalization code */
}
module_init(Mymodule_init);
module_exit(Mymodule_exit);
The optional keywords __init and __exit tell the kernel that these functions are only used while the module is being initialized or removed, respectively.
The routines registered with the macros module_init and module_exit are executed automatically when the module is loaded and unloaded, respectively. Always use both: a module that does not register an exit routine with module_exit cannot be unloaded.
The example below defines, in the module's source code, an integer parameter (the PID) whose value can be set when the module is loaded:
#include <linux/moduleparam.h>
...
int pid = 1;
module_param (pid, int, 0);
MODULE_PARM_DESC (pid, "Process ID to monitor (default 1)");
...
MODULE_AUTHOR("Joe Bloggs <joe.bloggs@somewhere>");
MODULE_LICENSE ("GPL");
MODULE_DESCRIPTION("ProSO driver");
...
obj-m += mymodule.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
2
sysfs is a file system, generally mounted at /sys, that the kernel uses to export information about devices, modules, etc. to user space. You can find further information in Chapter 2 of Linux Device Drivers, listed in the bibliography.
Running make with this Makefile produces an ELF file named mymodule.ko (ko = kernel object).
A module can only be unloaded when no one is using it. To know whether or not it is in use, the kernel maintains a reference counter that must be kept up to date. For instance, every function in a module that can be called from other modules must increment this counter when it is called and decrement it when it returns. To maintain this counter, the programmer can use the following functions:
int try_module_get(struct module *module);
void module_put(struct module *module);
These counters can be checked in the file /proc/modules (lsmod shows the same information). If the counter is not 0, the module cannot be unloaded. Therefore, it is important to keep the number of gets and puts balanced.
these dependencies to be expressed in the file /lib/modules/<kernel version>/modules.dep (for instance, /lib/modules/2.6.27-proso/modules.dep). For example, if module moduleA requires module moduleB, this can be expressed as:
Note that the paths in this file must be absolute paths to the modules' object files.
The modprobe command then loads a module together with the modules it depends on. The module is named without the .ko extension:
#modprobe moduleA
2.2. Devices
A device is a real or virtual peripheral that users can use to perform input/output
operations or to interact with the OS kernel.
2.2.2. How to install a device driver in the system
There are two possible mechanisms:
• Statically, by recompiling the whole kernel, including the new driver routines.
• Dynamically, by using system calls or tools that make it possible to add object files to the running kernel of the OS (for example, a module). You can see how a module is compiled and installed in Sections 2.1.3 and 2.1.4.
struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
    ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
    ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void __user *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area) (struct file *, unsigned long, unsigned long,
                                        unsigned long, unsigned long);
};
You will use the open, release (which corresponds to close), read and ioctl operations. The open function makes the device available to the program and the release function ends access to it. Both return 0 if everything is correct and < 0 in case of error. Besides the struct file of the opened device, the read function receives the following arguments: the user's buffer where the characters read are stored, the number of bytes to read (size), and an in/out offset parameter that holds the position of the read/write pointer before the read and must be updated with the position after it. The call returns the number of bytes read, 0 if the end of the file has been reached, or < 0 if an error has occurred. The ioctl function returns 0 if everything is correct or < 0 if an error has occurred.
The first field in the file_operations structure, named owner, is used when the driver is installed as a module. If it is initialized with the macro THIS_MODULE, the kernel automatically maintains the reference counter of the driver's module (as explained in Section 2.1.6), so it is not necessary to call the functions try_module_get and module_put explicitly3.
The structures struct inode and struct file are also defined in this header file, <linux/fs.h>.
To make the code easier to read, the fields that the driver needs can be initialized by name, as shown below:
struct file_operations mymod_fops = {
    .owner   = THIS_MODULE,
    .read    = mymod_read,
    .ioctl   = mymod_ioctl,
    .open    = mymod_open,
    .release = mymod_release,
};
This designated-initializer syntax is standard C since C99; older kernel code uses the equivalent GNU extension (owner: THIS_MODULE). Fields that are not named are initialized to NULL. With a compiler that supports neither form, all the fields must be listed in declaration order, using NULL for the operations that the driver does not implement.
Finally, note that only the operations that the driver actually implements need to be initialized.
3
Keep in mind that a driver is considered in use from the moment the device is opened until it is released.
<linux/types.h>.
Furthermore, when a new device is added to the system, it is also necessary to specify
the major and minor of the corresponding device driver that manages it. Thus, each
time an operation is performed on a device, the system uses this identifier to find the
operations of its driver.
The identifiers are reserved with the following function:
int register_chrdev_region(dev_t first, unsigned int count, char *name);
The arguments are the first identifier of the region to be reserved (first), which must be generated beforehand from a major and a minor using the MKDEV macro; the number of identifiers to be reserved (count); and the name of the device (name), which will be shown in /proc/devices. A negative return value means that an error has occurred.
This function reserves count identifiers, all of which have the same major (from the
parameter first) and consecutive minors (starting with the minor from the
parameter first)4.
To release the driver’s identifiers and allow them to be used in the future, the
unregister_chrdev_region function can be used.
void unregister_chrdev_region(dev_t first, unsigned int count);
The arguments are the region’s first identifier (first) and the number of identifiers in
the region (count).
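To see how the identifiers of a region relate to each other, the sketch below models the 2.6 kernel's internal dev_t layout in user space, following include/linux/kdev_t.h (20 bits of minor, the remaining 12 bits for the major). The K_ names, region_id and major 250 are hypothetical; in the module itself you would use MKDEV, MAJOR and MINOR from the kernel headers, for example register_chrdev_region(MKDEV(250, 0), 1, "proso"), where "proso" is also just an illustrative name.

```c
#include <stdint.h>

/* User-space model of the kernel's internal dev_t (2.6, kdev_t.h).
 * Hypothetical K_ prefix so that nothing clashes with system headers. */
typedef uint32_t k_dev_t;

#define K_MINORBITS 20
#define K_MINORMASK ((1U << K_MINORBITS) - 1)
#define K_MAJOR(dev) ((unsigned int)((dev) >> K_MINORBITS))
#define K_MINOR(dev) ((unsigned int)((dev) & K_MINORMASK))
#define K_MKDEV(ma, mi) (((k_dev_t)(ma) << K_MINORBITS) | (k_dev_t)(mi))

/* Identifier number i (0 <= i < count) of a region starting at `first`:
 * same major as `first`, minor = MINOR(first) + i. */
static k_dev_t region_id(k_dev_t first, unsigned int i)
{
    return K_MKDEV(K_MAJOR(first), K_MINOR(first) + i);
}
```

Because the minor occupies the low bits, identifier i of the region is simply first + i, as long as the minor stays within its 20 bits.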
Once the identifiers have been reserved, they must be associated with the driver's specific operations. To do so, a structure of type struct cdev is used. This structure is defined in the header file <linux/cdev.h>. First, it is necessary to define a new structure:
It is then necessary to reserve the memory space for the structure using the following function5:
struct cdev *cdev_alloc(void);
4
It is also possible to let the system assign all the identifiers for a driver, without passing the first identifier in the range, but this requires using the alloc_chrdev_region function instead of register_chrdev_region. For further details about this alternative function, see Chapter 3 of Linux Device Drivers, cited in the bibliography.
Two of its fields must then be initialized: the owner field, which the system uses to keep a count of references to the structure and which must be initialized with the macro THIS_MODULE; and the ops field, which must point to the file_operations structure that contains the driver's specific operations.
Finally, this structure must be added to the character devices registered in the system using the following function:
int cdev_add(struct cdev *dev, dev_t num, unsigned int count);
The parameters of this function are: the structure that contains the operations of the driver (dev), the first identifier of the region (num) and the number of consecutive identifiers of the region to be associated with these operations (count). This function returns a negative value if an error occurs. Until this function has been executed successfully, the driver is not visible to the system and, therefore, its functions cannot be used.
When the driver is no longer in use, its cdev structure must be removed from the
system:
void cdev_del(struct cdev *dev);
5
If the variable of type struct cdev is defined statically rather than as a pointer, the function cdev_init must be used instead of cdev_alloc. You will find the definition of this function in Linux Device Drivers, cited in the bibliography.
6
In versions of Linux prior to 2.6, the major was used to identify the device driver and the minor was only used internally by the driver to distinguish between the different devices that it could manage. From version 2.6 onward, both numbers (major and minor) are needed to identify the operations associated with a device. However, /proc/devices still only shows the major of the driver.
allows different drivers to have the same major, a major that is not currently assigned can be selected to obtain a new major-minor combination. Then any minor can be used, with the confidence that the combination is not already in use.
There is an option that frees the programmer from the task of selecting these numbers: the system can be told to reserve a range of driver identifiers dynamically (which implicitly selects the major and the minors of the region7). Note that in this case the driver identifiers can change each time the driver is installed, which must be taken into account when the devices are added.
2.2.7. How the major and the minor are recognized inside the driver
The following macros extract them from a dev_t value:
int MAJOR(dev_t dev);
int MINOR(dev_t dev);
The value of the parameter dev is taken from the inode (its i_rdev field); a pointer to the inode is one of the parameters of the driver's open, release and ioctl operations in the file_operations structure shown above.
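From user space, the same pair can be checked on the device file itself. The sketch below assumes glibc's major() and minor() macros from <sys/sysmacros.h> and reads st_rdev with stat(2); device_numbers is a hypothetical helper name. Inside the driver, the 2.6 kernel also offers imajor(inode) and iminor(inode) as shorthands for MAJOR(inode->i_rdev) and MINOR(inode->i_rdev).

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() in glibc */

/* Stores the major and minor of a character device file in *maj and *min.
 * Returns 0 on success, -1 if the file does not exist or is not a
 * character device. */
static int device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) < 0 || !S_ISCHR(st.st_mode))
        return -1;
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

For example, on Linux /dev/null is conventionally the character device with major 1 and minor 3, so the same check can be run on the project's device file after it has been created.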
The arguments are: the file, which identifies the file that will be used as the device; the type (a 'c' to create a character device); and the major and the minor, which are integers that identify the device in the system (see the explanation of majors and minors above for further details; any minor can be used to begin with).
To see the various devices already existing in the system, check the file
/proc/devices, in which the available devices (major and registered names) are
grouped by device type.
Once the device file is created, the functionalities (i.e. which operations the system
allows for the device) of this “file” must be defined. This is done by using the device
driver.
7
For further details, see Chapter 3 of Linux Device Drivers, cited in the bibliography.
Of these system calls, the only one that really depends on the peripheral is ioctl, which lets users perform device-specific operations selected by its last two arguments (a request code and an argument).
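To illustrate how the request code and the argument combine, the sketch below issues a real ioctl from user space. Since the project's device and its request constants are not available here, it uses the standard FIONREAD request on a pipe, which asks how many bytes are waiting to be read; bytes_pending is a hypothetical helper name. A call on the statistics device would have the same shape, ioctl(fd, REQUEST, arg), with the constants the project defines.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns the number of bytes waiting to be read on fd, obtained with
 * ioctl(fd, FIONREAD, &n); returns -1 if the driver rejects the request. */
static int bytes_pending(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```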
2.3. Linux
2.3.1. How to find the Linux source code
The directory /usr/src/linux contains the system’s source code. The headers
related to the system version are in /usr/src/linux/include. The various
routines and structures used by Linux to manage processes can be found in Chapter 3
of [2]: for_each_process, find_task_by_pid, etc. (see http://lxr.linux.no).
2.3.2. Symbols
By symbols, we mean variable names and routine names. Symbols from an object
file can be consulted using the nm command.
Another kind of symbol is defined using #define, such as the macro current, which returns a pointer (struct task_struct *) to the control data of the running process. This structure is known as the PCB (Process Control Block); see http://lxr.linux.no/source/include/asm-generic/current.h.
This symbol table is created at compile time, since both the name of each symbol and its address are needed. To export a symbol, the macro EXPORT_SYMBOL(symbol_name) must be used and the kernel recompiled (an example can be seen at http://lxr.linux.no). As modules are part of the kernel, they can also export symbols using this macro.
2.3.5. What must internal system routines return? Who receives this
information?
The Linux convention states that a negative value (< 0) is returned when an error occurs; otherwise, a non-negative value (>= 0) is returned. The error code is the absolute value of the returned value, and its meaning can be found in the header file <sys/errno.h>.
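The convention can be sketched in user space as follows. k_set_window is a hypothetical routine written in the kernel style (it returns -EINVAL on error), and user_wrapper imitates, in simplified form, what the C library does with a system call result before handing it to the program: it returns -1 and stores the absolute value of the code in errno.

```c
#include <errno.h>

/* Hypothetical kernel-style routine: a non-negative value on success,
 * -ERRNO on error. */
static long k_set_window(int size)
{
    if (size <= 0)
        return -EINVAL;
    return size;
}

/* Simplified model of the C library wrapper: negative results become
 * -1 with errno set to the absolute value of the code. */
static long user_wrapper(long ret)
{
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```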
Usually, there is a single address translation function for the system code and a separate one for each running process. This mechanism guarantees system security: user applications can change neither system data nor each other's data, because they cannot access other address spaces.
Likewise, the memory access mechanism depends on the execution mode. If it is
necessary to access the user address space (to pass parameters for system calls, for
example) when running in system mode, special instructions will be needed to tell the
processor that the user address space must be used, even if the system mode is on.
Remember that you will have to check in each case whether the information can be copied between the user and the system space, as you did in Project 1.
unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);
copy_to_user returns the number of bytes that could not be copied, so 0 means success.
Note that printk has a special feature: the first characters of the string are interpreted as the priority of the message to be written. The format of this information is:
printk ("<N>Goodbye cruel world\n");
printk (KERN_EMERG "Goodbye cruel world\n");
where N is a number between 0 and 7. Depending on the priority level, the message
appears in a different place: the computer console, a log file (for example,
/var/log/messages or /var/log/kern.log), etc. The log file name depends
on the system configuration (/etc/syslog.conf).
Macros such as KERN_EMERG, which define the different priorities, can be found in the file <linux/kernel.h>. If the priority value is lower than console_loglevel (lower values mean higher priority), the message is printed on the console. If syslogd and klogd are running, the message is also written to the log file, regardless of whether or not it is printed on the console.
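The user-space model below shows how a "<N>" prefix selects the level and how that level is compared with console_loglevel. It is a sketch of the rule described above, not the kernel's printk code; printk_level and goes_to_console are hypothetical names, and DEFAULT_MESSAGE_LEVEL = 4 (KERN_WARNING) is assumed as the level of messages without a prefix.

```c
#define DEFAULT_MESSAGE_LEVEL 4   /* assumption: KERN_WARNING */

/* Splits a printk-style string "<N>text" into its level N (0..7) and the
 * text. Strings without a valid prefix get DEFAULT_MESSAGE_LEVEL. */
static int printk_level(const char *s, const char **text)
{
    if (s[0] == '<' && s[1] >= '0' && s[1] <= '7' && s[2] == '>') {
        *text = s + 3;
        return s[1] - '0';
    }
    *text = s;
    return DEFAULT_MESSAGE_LEVEL;
}

/* A message reaches the console only if its level is lower than
 * console_loglevel. */
static int goes_to_console(int level, int console_loglevel)
{
    return level < console_loglevel;
}
```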
All these kernel messages are kept on a structure called the “kernel ring buffer”, which
can be accessed using the dmesg command. The size of this buffer is limited, so old
messages are removed to make room for new ones (a kind of circular queue). More
information can be found in the appendix of this document or by using the man
command:
man dmesg
man syslogd
man syslog.conf
3. Description of work
In this project, you have to modify the system to collect usage statistics. Current operating systems keep statistics about themselves in different ways, so that problems can be identified easily and the appropriate actions taken. In our particular case, the task is to find out the system's mean response time, focusing on system calls.
To do so, the entry point of each system call must be modified by introducing new code (instrumentation). Information will be kept for the following calls: open, write, clone, close and lseek. For each of them, the following information is needed:
• Number of times the call is initiated
• Number of times it ends correctly
• Number of times it ends with an error
• Total time spent executing the call
By adding this instrumentation to each system call, the system may be a little slower.
You are therefore to implement this instrumentation dynamically so that it can be
enabled or disabled.
To do this, the Linux system call table will be intercepted, and each function to be measured will be replaced by a local function. This local function will measure the time that the original function takes to execute, and it will have the same interface as the corresponding system call (you can find this interface in the Linux source code).
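The interception steps can be modelled in user space with an array of function pointers standing in for the system call table. In the sketch below, fake_sys_close, call_table, close_local and close_stats are hypothetical names, the original call is a stub, and time is taken with clock_gettime instead of the project's rdtsc-based routine. close_local plays the role of the local function: it has the same interface as the original, counts the entry, calls the saved original (like sys_open_old in the text) and classifies the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

typedef long (*call_t)(long);

struct call_stats {
    unsigned long entries, exits_ok, exits_error;
    unsigned long long total_ns;
};

/* Stand-in for the original system call (hypothetical). */
static long fake_sys_close(long fd)
{
    return fd >= 0 ? 0 : -EBADF;
}

static call_t call_table[1] = { fake_sys_close };  /* models the syscall table */
static call_t old_close;                          /* saved original entry */
static struct call_stats close_stats;

static unsigned long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL
           + (unsigned long long)ts.tv_nsec;
}

/* Local function: same interface as the original call. */
static long close_local(long fd)
{
    unsigned long long start = now_ns();       /* 1. mark the beginning */
    long ret;

    close_stats.entries++;
    ret = old_close(fd);                       /* 2. execute the original */
    close_stats.total_ns += now_ns() - start;  /* 3. running time */
    if (ret < 0)
        close_stats.exits_error++;
    else
        close_stats.exits_ok++;
    return ret;                                /* same result as the original */
}

static void intercept(void)
{
    old_close = call_table[0];
    call_table[0] = close_local;
}

static void restore(void)
{
    call_table[0] = old_close;
}
```

After intercept(), every call through call_table[0] updates close_stats; after restore(), calls go straight to the original again, which is what unloading Module 1 must achieve for the real table.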
To sum up, you will have to implement two modules:
• Module 1 to intercept system calls and measure the time spent.
• Module 2 to access system statistics.
Module 1 will intercept the system call table (that is, modify its entries): it will insert the instrumentation functions when the module is loaded into the system (enabling instrumentation) and remove them when the module is unloaded.
The instrumentation functions will be in the module, and they will update the system call counters of the current process. To measure times, you can use the mechanism explained in the section How to measure time.
In the system call table, the original functions of the calls to be monitored will be replaced by the routines you create. These monitoring routines must (see Figure 3):
1. Mark the beginning of the system call
2. Execute the original system call
3. Calculate the total running time and obtain the call result
Figure 3: running program → system call → your routine is executed (time check) → original system call → system call return
a symbol will be obtained that references the system call table (see
http://lxr.linux.no/).
From this time on, the variable sys_open_old will contain a pointer to the original
open system call.
ssize_t sys_write(unsigned int fd, const char __user *buf, size_t count);
Figure 4. Sharing of two pages by the kernel stack and the thread_union (labels: current_thread_info(), current)
The task_struct contains the process information, such as the open files and a pointer
to the thread_info.
The thread_info is the structure that shares the memory space with the kernel stack. It
contains the thread’s execution state and a pointer to the task_struct. A definition can
be seen at http://lxr.linux.no/#linux+v2.6.34.1/arch/x86/include/asm/thread_info.h. A
definition of the thread_union (the union of the thread_info and the kernel stack) is at
http://lxr.linux.no/#linux+v2.6.34.1/include/linux/sched.h#L1939. Also see Figure 4.
The macro current must be used to obtain the address of the task_struct. The
routine current_thread_info() must be used to obtain the base address of the
thread_info.
As the statistics require little space, they will be stored just above the structure
thread_info so that they can easily be associated with the process. To do so, a new
structure called my_thread must be created. It will contain the structure thread_info
and the statistics for the process (see Figure 5).
Figure 5. Where statistics are stored
For each process, it must be determined whether or not its statistics have been initialized. Notice that, to create a new process, the kernel reuses data structures previously associated with a dead process, so this area may still contain the statistics of that previous process. To detect this, an additional field can be defined in the data structure to store the PID of the process whose data is currently stored. If this PID does not match the PID of the current process, the statistics have not been initialized; they must then be initialized and the PID updated.
proso_rdtsc(eax, edx);
return ((unsigned long long) edx << 32) + eax;
}
Each process must have its own counters for the system calls, so they will have to be
reset to zero for each new process.
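The PID check can be sketched as below. The layout is a user-space model: in the module, the first member would be the real struct thread_info, the area would be obtained with (struct my_thread *) current_thread_info(), and the PID to compare would be current->pid. struct t_stats, NUM_CALLS and check_stats are hypothetical names, and the placeholder array only reserves some space in place of thread_info.

```c
#include <string.h>
#include <sys/types.h>

#define NUM_CALLS 5   /* open, write, clone, close, lseek */

struct t_stats {                      /* hypothetical per-call counters */
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

/* Model of the per-process area: thread_info first, statistics after it. */
struct my_thread {
    char thread_info_placeholder[64]; /* struct thread_info in the module */
    pid_t pid;                        /* owner of the statistics below */
    struct t_stats stats[NUM_CALLS];
};

/* Resets the statistics if they belong to a previous (dead) process. */
static void check_stats(struct my_thread *t, pid_t current_pid)
{
    if (t->pid != current_pid) {
        memset(t->stats, 0, sizeof(t->stats));
        t->pid = current_pid;
    }
}
```

Each instrumentation routine would call check_stats before touching the counters, so a reused area is cleaned the first time the new process makes a monitored call.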
As a final requirement, you must check that everything works properly. Thus, when the module is removed from the system, it must print on the screen the statistics of the process whose PID was passed as an argument when the module was loaded.
The module must not be unloaded while any process is executing one of the intercepted calls (you can use try_module_get and module_put, as explained in Section 2.1.6).
3.1.8. Tests
To check that the module works properly, a set of tests must be run. Below is a skeleton of the tests:
• Begin the test. Print a start message.
• Print the PID of the current process and block the process until a key is pressed.
• (Load the module with the PID of the test.)
• Press a key and continue with the test.
• Check that all the system calls have been monitored.
• Print an end message to finish the test. Block the process until a key is pressed.
• (Unload the module, which prints the process's statistics.) Check that the results correspond to the calls made by the test.
• End the test.
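The monitored part of the test can be written as a helper that issues each call at least once. This sketch assumes that fork() reaches the kernel through the clone system call, as glibc implements it on Linux; exercise_monitored_calls and the file path are hypothetical. It also makes one close fail on purpose so that the error counter is exercised. A complete test would call it from main between the two key presses, printing getpid() first and waiting with getchar().

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Issues open, write, lseek, close and (through fork) clone.
 * Returns 0 if all the expected successes happened, -1 otherwise. */
static int exercise_monitored_calls(const char *path)
{
    pid_t child;
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "proso\n", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    (void)close(-1);            /* expected to fail with EBADF */

    child = fork();             /* glibc issues the clone system call */
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);               /* child: nothing else to do */
    return waitpid(child, NULL, 0) == child ? 0 : -1;
}
```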
The driver in Module 2 must implement the following operations on the new device:
• open. The device can only be opened by one process at a time, and only by the user root (uid==0).
• read. A read on this device returns a structure to user space (in buffer) with information about the currently selected system call for the process currently being monitored. Users must allocate the structure before executing the read system call. The number of bytes read will be the minimum of the size parameter s and sizeof(struct t_info).
• ioctl. Users will be able to modify the device's settings with this call (selected process, selected system call, etc.).
• release. This call ends the use of the device, so that another process can open it.
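These rules can be modelled in user space to check the logic before writing the real driver. In this sketch dev_open, dev_read, dev_release and device_busy are hypothetical names; memcpy stands in for copy_to_user (which, in the module, returns the number of bytes not copied, a case to be turned into -EFAULT), and the choice of -EPERM and -EBUSY as error codes is an assumption. In the module, the uid would come from the credentials of the current process, and the busy flag should be updated atomically, since two processes may call open at the same time.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

static int device_busy;   /* 1 while some process has the device open */

static int dev_open(uid_t uid)
{
    if (uid != 0)
        return -EPERM;    /* only root */
    if (device_busy)
        return -EBUSY;    /* one process at a time */
    device_busy = 1;
    return 0;
}

/* Copies min(s, sizeof(struct t_info)) bytes to the user buffer. */
static ssize_t dev_read(const struct t_info *stats, void *buf, size_t s)
{
    size_t n = s < sizeof(*stats) ? s : sizeof(*stats);

    memcpy(buf, stats, n);   /* copy_to_user(buf, stats, n) in the module */
    return (ssize_t)n;
}

static int dev_release(void)
{
    device_busy = 0;
    return 0;
}
```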
By default, the read system call obtains the statistics of the open system call for the
process that opened the device. The structure that will be returned to the user is of the
type shown below:
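struct t_info {
    int num_entries;                /* times the call was initiated */
    int num_exits_ok;               /* times it ended correctly */
    int num_exits_error;            /* times it ended with an error */
    unsigned long long total_time;  /* total time spent in the call */
};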
In order to control the behavior of this new device using the ioctl call, the following
parameters (the values in brackets are constant values) must be defined:
The ioctl system call will return 0 if everything worked properly, and < 0 (the corresponding error code) if an error occurred.
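From the user side, reading the statistics could look like the sketch below. read_t_info is a hypothetical helper, and treating a short read as an error (EIO) is an assumption; the ioctl request constants that select the process and the call are the ones the project defines and are not reproduced here. The test uses a pipe in place of the device.

```c
#include <errno.h>
#include <unistd.h>

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

/* Reads one struct t_info from fd (the opened statistics device in the
 * project). Returns 0 on success, -1 on error or short read. */
static int read_t_info(int fd, struct t_info *info)
{
    ssize_t n = read(fd, info, sizeof(*info));

    if (n < 0)
        return -1;
    if ((size_t)n != sizeof(*info)) {
        errno = EIO;   /* assumption: a short read is reported as EIO */
        return -1;
    }
    return 0;
}
```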
4. Dynamic monitoring
The aim of this stage of the project is to make the instrumentation mechanisms more dynamic than those explained so far. To do so, the modules created must be modified.
4.1. Changes in Module 1
The monitoring of system calls is to be dynamically activated/deactivated. Thus, the
new behavior will be as follows:
• By default, all five system calls will be monitored, as before.
• Two new functions will be added to make it possible to activate/deactivate the
monitoring of system calls. Module 2 will access these two new functions.
• The addresses of the system calls should be kept in a table (penalties will be imposed if such a table is not implemented).
• Users must be able to specify easily which system call to activate or deactivate, for example by using constants.
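One possible shape for the table of addresses is sketched below, again in user space. The PROSO_* constants, monitor_entry, monitor_init, activate_monitor and deactivate_monitor are hypothetical names, and the stubs stand in for a real system call and its instrumentation routine. In Module 1 the table rows would hold the original system call addresses and the local routines, and the two functions would be exported with EXPORT_SYMBOL so that Module 2 can call them.

```c
/* Hypothetical constants that identify the monitored calls. */
enum { PROSO_OPEN, PROSO_WRITE, PROSO_CLONE, PROSO_CLOSE, PROSO_LSEEK,
       PROSO_NUM_CALLS };

typedef long (*call_t)(long);

struct monitor_entry {
    call_t original;   /* address saved from the table */
    call_t local;      /* instrumentation routine */
};

static call_t sys_table[PROSO_NUM_CALLS];   /* models the syscall table */
static struct monitor_entry monitor[PROSO_NUM_CALLS];

/* Stand-ins for a real system call and its wrapper (hypothetical). */
static long stub_original(long arg) { return arg; }
static long stub_local(long arg) { return stub_original(arg); }

static int activate_monitor(int call)
{
    if (call < 0 || call >= PROSO_NUM_CALLS)
        return -1;
    sys_table[call] = monitor[call].local;
    return 0;
}

static int deactivate_monitor(int call)
{
    if (call < 0 || call >= PROSO_NUM_CALLS)
        return -1;
    sys_table[call] = monitor[call].original;
    return 0;
}

static void monitor_init(void)
{
    for (int c = 0; c < PROSO_NUM_CALLS; c++) {
        sys_table[c] = stub_original;        /* pretend kernel entry */
        monitor[c].original = sys_table[c];  /* keep the original address */
        monitor[c].local = stub_local;
        activate_monitor(c);                 /* default: all five monitored */
    }
}
```

By default monitor_init leaves all five calls monitored; deactivating one restores its original address without touching the others.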
5. Deliverables
You should deliver all of the source files (including Makefiles) you have created and the
test suite you used to test the modules. Additionally, you must submit a README file
describing your test suite and the instructions to execute it.
6. References
3. Edit the Makefile to modify the variable EXTRAVERSION, to set the image
name:
# vi Makefile
...
EXTRAVERSION=-proso
...
8. Install the kernel image (vmlinuz-2.6.XXX-proso) and the symbols (System.map-2.6.XXX-proso).
# make install
9. Generate a boot file with the required modules (otherwise, the system will not
boot).
# mkinitramfs -o /boot/initrd.img-2.6.XXX-proso 2.6.XXX-proso
10. Modify GRUB's boot file, /boot/grub/menu.lst, to add the new image.
# vi /boot/grub/menu.lst
11. Modify the following fields to point to the new image and the new initrd file:
title
kernel
initrd