Concurrency: An Introduction
Thus far, we have seen the development of the basic abstractions that the
OS performs. We have seen how to take a single physical CPU and turn
it into multiple virtual CPUs, thus enabling the illusion of multiple pro-
grams running at the same time. We have also seen how to create the
illusion of a large, private virtual memory for each process; this abstrac-
tion of the address space enables each program to behave as if it has its
own memory when indeed the OS is secretly multiplexing address spaces
across physical memory (and sometimes, disk).
In this note, we introduce a new abstraction for a single running pro-
cess: that of a thread. Instead of our classic view of a single point of
execution within a program (i.e., a single PC where instructions are be-
ing fetched from and executed), a multi-threaded program has more than
one point of execution (i.e., multiple PCs, each of which is being fetched
and executed from). Perhaps another way to think of this is that each
thread is very much like a separate process, except for one difference:
they share the same address space and thus can access the same data.
The state of a single thread is thus very similar to that of a process.
It has a program counter (PC) that tracks where the program is fetch-
ing instructions from. Each thread has its own private set of registers it
uses for computation; thus, if there are two threads that are running on
a single processor, when switching from running one (T1) to running the
other (T2), a context switch must take place. The context switch between
threads is quite similar to the context switch between processes, as the
register state of T1 must be saved and the register state of T2 restored
before running T2. With processes, we saved state to a process control
block (PCB); now, we’ll need one or more thread control blocks (TCBs)
to store the state of each thread of a process. There is one major difference,
though, in the context switch we perform between threads as compared
to processes: the address space remains the same (i.e., there is no need to
switch which page table we are using).
One other major difference between threads and processes concerns
the stack. In our simple model of the address space of a classic
(single-threaded) process, there is a single stack. In a multi-threaded
process, however, each thread runs independently and needs its own place
for local variables, parameters, and return addresses; thus, the address
space contains one stack per thread.
[Figure 26.1: single-threaded vs. multi-threaded address spaces. Both contain
the program code segment (where instructions live) and, below it, the heap
segment (malloc'd data and dynamic data structures, growing positively into
the free space); in the multi-threaded case the free space also holds one
stack per thread, e.g., Stack (2) for the second thread.]
As it turns out, there are at least two major reasons you should use
threads. The first is simple: parallelism. Imagine you are writing a pro-
gram that performs operations on very large arrays, for example, adding
two large arrays together, or incrementing the value of each element in
the array by some amount. If you are running on just a single proces-
sor, the task is straightforward: just perform each operation and be done.
However, if you are executing the program on a system with multiple
processors, you have the potential of speeding up this process consider-
ably by using the processors to each perform a portion of the work. The
task of transforming your standard single-threaded program into a pro-
gram that does this sort of work on multiple CPUs is called paralleliza-
tion, and using a thread per CPU to do this work is a natural and typical
way to make programs run faster on modern hardware.
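To make this concrete, here is a minimal sketch (not code from this book)
that splits the work of incrementing a large array between two threads, one
per half of the array; the array size and the helper names (add_one, range_t)
are made up for the example.

#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define N 10000000            // array size (an arbitrary example value)

static int array[N];          // the large array we want to update

typedef struct {
    int start;                // first index this thread handles
    int end;                  // one past the last index it handles
} range_t;

// increment every element in the slice of the array described by arg
void *add_one(void *arg) {
    range_t *r = (range_t *) arg;
    for (int i = r->start; i < r->end; i++)
        array[i] += 1;
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t p1, p2;
    range_t first  = { 0, N / 2 };     // thread 1 handles the first half
    range_t second = { N / 2, N };     // thread 2 handles the second half

    // one thread per half; on a multiprocessor the two halves can be
    // incremented at the same time
    int rc = pthread_create(&p1, NULL, add_one, &first);
    assert(rc == 0);
    rc = pthread_create(&p2, NULL, add_one, &second);
    assert(rc == 0);

    pthread_join(p1, NULL);
    pthread_join(p2, NULL);

    printf("done: array[0] = %d, array[%d] = %d\n",
           array[0], N - 1, array[N - 1]);
    return 0;
}

Because the two halves are disjoint, the threads never touch the same element
and thus need no synchronization; compile with a pthread-aware command line
such as gcc -Wall -pthread.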
The second reason is a bit more subtle: to avoid blocking program
progress due to slow I/O. Imagine that you are writing a program that
performs different types of I/O: either waiting to send or receive a mes-
sage, for an explicit disk I/O to complete, or even (implicitly) for a page
fault to finish. Instead of waiting, your program may wish to do some-
thing else, including utilizing the CPU to perform computation, or even
issuing further I/O requests. Using threads is a natural way to avoid
getting stuck; while one thread in your program waits (i.e., is blocked
waiting for I/O), the CPU scheduler can switch to other threads, which
are ready to run and do something useful. Threading enables overlap of
I/O with other activities within a single program, much like multipro-
gramming did for processes across programs; as a result, many modern
server-based applications (web servers, database management systems,
and the like) make use of threads in their implementations.
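As a rough sketch of this overlap idea (again, not code from this book), the
program below dedicates one thread to a blocking read of a file while the main
thread keeps computing; the file name input.dat and the byte-counting and
summing loops are arbitrary stand-ins for whatever I/O and computation a real
program would perform.

#include <pthread.h>
#include <stdio.h>

// worker thread: performs a (potentially slow) blocking read of a file,
// counting how many bytes it contains; the thread blocks inside fread()
// whenever the data is not yet available
void *read_file(void *arg) {
    FILE *f = fopen((char *) arg, "r");
    if (f == NULL)
        return NULL;
    char buf[4096];
    size_t n, total = 0;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        total += n;
    fclose(f);
    printf("reader: read %zu bytes\n", total);
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t reader;
    // hand the (hypothetical) file name to the reader thread
    pthread_create(&reader, NULL, read_file, "input.dat");

    // while the reader waits on the disk, the main thread keeps the CPU busy
    long long sum = 0;
    for (long long i = 0; i < 100000000; i++)
        sum += i;
    printf("main: computed sum = %lld\n", sum);

    pthread_join(reader, NULL);
    return 0;
}

A real server would of course do something more interesting than summing
integers, but the structure is the same: the blocked thread costs no CPU time,
so the scheduler can keep the other thread doing useful work.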
Of course, in either of the cases mentioned above, you could use multi-
ple processes instead of threads. However, threads share an address space
and thus make it easy to share data, and hence are a natural choice when
constructing these types of programs. Processes are a more sound choice
for logically separate tasks where little sharing of data structures in mem-
ory is needed.
As a simple example, imagine a program that creates two threads, each of
which does some independent work, in this case printing "A" or "B"; the code
is shown in Figure 26.2.
#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include "common.h"
#include "common_threads.h"

// each thread simply prints out its argument ("A" or "B")
void *mythread(void *arg) {
    printf("%s\n", (char *) arg);
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t p1, p2;
    printf("main: begin\n");
    Pthread_create(&p1, NULL, mythread, "A");
    Pthread_create(&p2, NULL, mythread, "B");
    // join waits for the threads to finish
    Pthread_join(p1, NULL);
    Pthread_join(p2, NULL);
    printf("main: end\n");
    return 0;
}
Figure 26.2: Simple Thread Creation Code (t0.c)
After creating the two threads (let’s call them T1 and T2), the main
thread calls pthread join(), which waits for a particular thread to
complete. It does so twice, thus ensuring T1 and T2 will run and com-
plete before finally allowing the main thread to run again; when it does,
it will print “main: end” and exit. Overall, three threads were employed
during this run: the main thread, T1, and T2.
Let us examine the possible execution ordering of this little program.
In the execution diagram (Figure 26.3), time increases in the downwards
direction, and each column shows when a different thread (the main one,
Thread 1, or Thread 2) is running.
Note, however, that this ordering is not the only possible ordering. In
fact, given a sequence of instructions, there are quite a few, depending on
which thread the scheduler decides to run at a given point. For example,
once a thread is created, it may run immediately, which would lead to the
execution shown in Figure 26.4.
We also could even see “B” printed before “A”, if, say, the scheduler
decided to run Thread 2 first even though Thread 1 was created earlier;
there is no reason to assume that a thread that is created first will run first.
Figure 26.5 shows this final execution ordering, with Thread 2
getting to strut its stuff before Thread 1.
As you might be able to see, one way to think about thread creation
is that it is a bit like making a function call; however, instead of first ex-
ecuting the function and then returning to the caller, the system instead
creates a new thread of execution for the routine that is being called, and
it runs independently of the caller, perhaps before returning from the cre-
ate, but perhaps much later. What runs next is determined by the OS
scheduler, and although the scheduler likely implements some sensible
algorithm, it is hard to know what will run at any given moment in time.
As you also might be able to tell from this example, threads make life
complicated: it is already hard to tell what will run when! Computers are
hard enough to understand without concurrency. Unfortunately, with
concurrency, it simply gets worse. Much worse.
To see why, consider a slightly different example, in which the two threads
no longer just print a message but instead update a shared global variable
called counter, each adding one to it ten million times.
#include <stdio.h>
#include <pthread.h>
#include "common.h"
#include "common_threads.h"

// the shared global counter, updated by both threads
static volatile int counter = 0;

// mythread()
//
// Simply adds 1 to counter repeatedly, in a loop
// No, this is not how you would add 10,000,000 to
// a counter, but it shows the problem nicely.
//
void *mythread(void *arg) {
    printf("%s: begin\n", (char *) arg);
    int i;
    for (i = 0; i < 1e7; i++) {
        counter = counter + 1;
    }
    printf("%s: done\n", (char *) arg);
    return NULL;
}

// main()
//
// Just launches two threads (pthread_create)
// and then waits for them (pthread_join)
//
int main(int argc, char *argv[]) {
    pthread_t p1, p2;
    printf("main: begin (counter = %d)\n", counter);
    Pthread_create(&p1, NULL, mythread, "A");
    Pthread_create(&p2, NULL, mythread, "B");

    // join waits for the threads to finish
    Pthread_join(p1, NULL);
    Pthread_join(p2, NULL);
    printf("main: done with both (counter = %d)\n", counter);
    return 0;
}
Because each of the two threads increments the counter 10,000,000 times, we
would expect the final value to be 20,000,000. Here is what happens when we
run the program:
prompt> ./main
main: begin (counter = 0)
A: begin
B: begin
A: done
B: done
main: done with both (counter = 19345221)
Let’s try it one more time, just to see if we’ve gone crazy. After all,
aren’t computers supposed to produce deterministic results, as you have
been taught?! Perhaps your professors have been lying to you? (gasp)
prompt> ./main
main: begin (counter = 0)
A: begin
B: begin
A: done
B: done
main: done with both (counter = 19221041)
Not only is each run wrong, but each run also yields a different result! A big
question remains: why does this happen?
To understand why, we must look at the code the compiler generates for the
update of counter. The single C statement counter = counter + 1 becomes three
instructions: one to load the value of counter from memory (say, at address
0x8049a1c) into a register (%eax), one to add 1 to the register, and one to
store the register's value back into memory. Figure 26.7 traces what happens
when a timer interrupt strikes in the middle of this sequence: Thread 1 loads
the value 50 and increments its register to 51, but before it can store the
result, the OS switches to Thread 2, which loads, increments, and stores,
setting counter to 51. When Thread 1 later resumes, it stores its own 51, so
counter ends up at 51 instead of 52; one of the two increments is lost.
                                                         (after instruction)
OS             Thread 1                  Thread 2                  PC    %eax  counter
               before critical section                            100   0     50
               mov 8049a1c,%eax                                    105   50    50
               add $0x1,%eax                                       108   51    50
interrupt
  save T1
  restore T2                                                       100   0     50
                                         mov 8049a1c,%eax          105   50    50
                                         add $0x1,%eax             108   51    50
                                         mov %eax,8049a1c          113   51    51
interrupt
  save T2
  restore T1                                                       108   51    51
               mov %eax,8049a1c                                    113   51    51
Figure 26.7: The Problem: Up Close and Personal
TIP: USE ATOMIC OPERATIONS
Atomic operations are one of the most powerful underlying techniques
in building computer systems, from the computer architecture, to concur-
rent code (what we are studying here), to file systems (which we’ll study
soon enough), database management systems, and even distributed sys-
tems [L+93].
The idea behind making a series of actions atomic is simply expressed
with the phrase “all or nothing”; it should either appear as if all of the ac-
tions you wish to group together occurred, or that none of them occurred,
with no in-between state visible. Sometimes, the grouping of many ac-
tions into a single atomic action is called a transaction, an idea devel-
oped in great detail in the world of databases and transaction processing
[GR92].
In our theme of exploring concurrency, we’ll be using synchronization
primitives to turn short sequences of instructions into atomic blocks of
execution, but the idea of atomicity is much bigger than that, as we will
see. For example, file systems use techniques such as journaling or copy-
on-write in order to atomically transition their on-disk state, critical for
operating correctly in the face of system failures. If that doesn’t make
sense, don’t worry — it will, in some future chapter.
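As a small preview of such a synchronization primitive (this is a sketch, not
the solution code developed in later chapters), the shared-counter program can
be repaired by surrounding the update with a pthread mutex, which ensures that
the load, add, and store of counter = counter + 1 complete without
interference from the other thread:

#include <pthread.h>
#include <stdio.h>

static volatile int counter = 0;                         // the shared counter
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; // protects counter

// same loop as before, but each update is wrapped in a lock/unlock pair,
// so the load-add-store sequence appears atomic to the other thread
void *mythread(void *arg) {
    (void) arg;   // unused
    for (int i = 0; i < 10000000; i++) {
        pthread_mutex_lock(&lock);
        counter = counter + 1;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t p1, p2;
    pthread_create(&p1, NULL, mythread, NULL);
    pthread_create(&p2, NULL, mythread, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    printf("final counter = %d (expected 20000000)\n", counter);
    return 0;
}

With the lock in place, every run ends with counter equal to 20,000,000; how
locks and other synchronization primitives work is the subject of the chapters
to come.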
References
[D65] “Solution of a problem in concurrent programming control” by E. W. Dijkstra. Commu-
nications of the ACM, 8(9):569, September 1965. Pointed to as the first paper of Dijkstra’s where
he outlines the mutual exclusion problem and a solution. The solution, however, is not widely used;
advanced hardware and OS support is needed, as we will see in the coming chapters.
[D68] “Cooperating sequential processes” by Edsger W. Dijkstra. 1968. Available at this site:
http://www.cs.utexas.edu/users/EWD/ewd01xx/EWD123.PDF. Dijkstra has an amaz-
ing number of his old papers, notes, and thoughts recorded (for posterity) on this website at the last
place he worked, the University of Texas. Much of his foundational work, however, was done years
earlier while he was at the Technische Hogeschool Eindhoven (THE), including this famous paper on
“cooperating sequential processes”, which basically outlines all of the thinking that has to go into writ-
ing multi-threaded programs. Dijkstra discovered much of this while working on an operating system
named after his school: the “THE” operating system (said “T”, “H”, “E”, and not like the word “the”).
[GR92] “Transaction Processing: Concepts and Techniques” by Jim Gray and Andreas Reuter.
Morgan Kaufmann, September 1992. This book is the bible of transaction processing, written by one
of the legends of the field, Jim Gray. It is, for this reason, also considered Jim Gray’s “brain dump”,
in which he wrote down everything he knows about how database management systems work. Sadly,
Gray passed away tragically a few years back, and many of us lost a friend and great mentor, including
the co-authors of said book, who were lucky enough to interact with Gray during their graduate school
years.
[L+93] “Atomic Transactions” by Nancy Lynch, Michael Merritt, William Weihl, Alan Fekete.
Morgan Kaufmann, August 1993. A nice text on some of the theory and practice of atomic transac-
tions for distributed systems. Perhaps a bit formal for some, but lots of good material is found herein.
[NM92] “What Are Race Conditions? Some Issues and Formalizations” by Robert H. B. Netzer
and Barton P. Miller. ACM Letters on Programming Languages and Systems, Volume 1:1,
March 1992. An excellent discussion of the different types of races found in concurrent programs. In
this chapter (and the next few), we focus on data races, but later we will broaden to discuss general
races as well.
[SR05] “Advanced Programming in the UNIX Environment” by W. Richard Stevens and Stephen
A. Rago. Addison-Wesley, 2005. As we’ve said many times, buy this book, and read it, in little
chunks, preferably before going to bed. This way, you will actually fall asleep more quickly; more im-
portantly, you learn a little more about how to become a serious UNIX programmer.
Homework (Simulation)
This program, x86.py, allows you to see how different thread inter-
leavings either cause or avoid race conditions. See the README for de-
tails on how the program works, then answer the questions below.
Questions
1. Let’s examine a simple program, “loop.s”. First, just read and understand
   it. Then, run it with these arguments (./x86.py -p loop.s -t 1 -i 100
   -R dx). This specifies a single thread, an interrupt every 100
   instructions, and tracing of register %dx. What will %dx be during the
   run? Use the -c flag to check your answers; the answers, on the left, show
   the value of the register (or memory value) after the instruction on the
   right has run.

2. Same code, different flags: (./x86.py -p loop.s -t 2 -i 100 -a dx=3,dx=3
   -R dx). This specifies two threads, and initializes each %dx to 3. What
   values will %dx see? Run with -c to check. Does the presence of multiple
   threads affect your calculations? Is there a race in this code?

3. Run this: ./x86.py -p loop.s -t 2 -i 3 -r -a dx=3,dx=3 -R dx
   This makes the interrupt interval small/random; use different seeds (-s)
   to see different interleavings. Does the interrupt frequency change
   anything?

4. Now, a different program, looping-race-nolock.s, which accesses a shared
   variable located at address 2000; we’ll call this variable value. Run it
   with a single thread to confirm your understanding:
   ./x86.py -p looping-race-nolock.s -t 1 -M 2000
   What is value (i.e., at memory address 2000) throughout the run? Use -c to
   check.

5. Run with multiple iterations/threads:
   ./x86.py -p looping-race-nolock.s -t 2 -a bx=3 -M 2000
   Why does each thread loop three times? What is the final value of value?

6. Run with random interrupt intervals:
   ./x86.py -p looping-race-nolock.s -t 2 -M 2000 -i 4 -r -s 0
   then with different seeds (-s 1, -s 2, etc.). Can you tell by looking at
   the thread interleaving what the final value of value will be? Does the
   timing of the interrupt matter? Where can it safely occur? Where not? In
   other words, where is the critical section exactly?