Chapter 7
Synchronization Examples
We assume that the pool consists of n buffers, each capable of holding one item.
The mutex binary semaphore provides mutual exclusion for accesses to the
buffer pool and is initialized to the value 1. The empty and full semaphores
count the number of empty and full buffers. The semaphore empty is initialized
to the value n; the semaphore full is initialized to the value 0.
The code for the producer process is shown in Figure 7.1, and the code
for the consumer process is shown in Figure 7.2. Note the symmetry between
the producer and the consumer. We can interpret this code as the producer
producing full buffers for the consumer or as the consumer producing empty
buffers for the producer.
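Figures 7.1 and 7.2 are not reproduced in this excerpt, so the following sketch
shows the structure of the two processes just described, using the classic
wait() and signal() operations; the names next_produced and next_consumed
are illustrative.

/* shared data */
int n;
semaphore mutex = 1;
semaphore empty = n;
semaphore full = 0;

/* the structure of the producer process */
while (true) {
   /* produce an item in next_produced */
   wait(empty);
   wait(mutex);
   /* add next_produced to the buffer */
   signal(mutex);
   signal(full);
}

/* the structure of the consumer process */
while (true) {
   wait(full);
   wait(mutex);
   /* remove an item from the buffer to next_consumed */
   signal(mutex);
   signal(empty);
   /* consume the item in next_consumed */
}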
Suppose that a database is to be shared among several concurrent processes,
some of which may want only to read the database, whereas others may want
to update (that is, read and write) it. We distinguish
between these two types of processes by referring to the former as readers and
to the latter as writers. Obviously, if two readers access the shared data
simultaneously, no adverse effects will result. However, if a writer and some
other process (either a reader or a writer) access the database simultaneously,
chaos may ensue.
To ensure that these difficulties do not arise, we require that the writers
have exclusive access to the shared database while writing to the database. This
synchronization problem is referred to as the readers–writers problem. Since it
was originally stated, it has been used to test nearly every new synchronization
primitive.
The readers–writers problem has several variations, all involving priorities.
The simplest one, referred to as the first readers–writers problem, requires
that no reader be kept waiting unless a writer has already obtained permission
to use the shared object. In other words, no reader should wait for other readers
to finish simply because a writer is waiting. The second readers–writers
problem requires that, once a writer is ready, that writer perform its write as
soon as possible. In other words, if a writer is waiting to access the object, no
new readers may start reading.
A solution to either problem may result in starvation. In the first case,
writers may starve; in the second case, readers may starve. For this reason,
other variants of the problem have been proposed. Next, we present a solution
to the first readers–writers problem. See the bibliographical notes at the end
of the chapter for references describing starvation-free solutions to the second
readers–writers problem.
In the solution to the first readers–writers problem, the reader processes
share the following data structures:
semaphore rw_mutex = 1;
semaphore mutex = 1;
int read_count = 0;
The binary semaphores mutex and rw_mutex are initialized to 1; the integer
read_count is initialized to 0. The semaphore rw_mutex is common to both
reader and writer processes. The mutex semaphore is used to ensure mutual
exclusion when the variable read_count is updated.

while (true) {
   wait(rw_mutex);
      . . .
   /* writing is performed */
      . . .
   signal(rw_mutex);
}

Figure 7.3  The structure of a writer process.
The read_count variable keeps track of how many processes are currently
reading the object. The semaphore rw_mutex functions as a mutual-exclusion
semaphore for the writers. It is also used by the first or last reader that enters
or exits the critical section. It is not used by readers that enter or exit while
other readers are in their critical sections.
The code for a writer process is shown in Figure 7.3; the code for a reader
process is shown in Figure 7.4. Note that, if a writer is in the critical section
and n readers are waiting, then one reader is queued on rw_mutex, and n − 1
readers are queued on mutex. Also observe that, when a writer executes
signal(rw_mutex), we may resume the execution of either the waiting readers
or a single waiting writer. The selection is made by the scheduler.
The readers–writers problem and its solutions have been generalized to
provide reader–writer locks on some systems. Acquiring a reader–writer lock
requires specifying the mode of the lock: either read or write access. When a
process wishes only to read shared data, it requests the reader–writer lock in
read mode. A process wishing to modify the shared data must request the lock
in write mode. Multiple processes are permitted to acquire a reader–writer
lock concurrently in read mode, but only one process may acquire the lock in
write mode, as exclusive access is required for writers.
while (true) {
   wait(mutex);
   read_count++;
   if (read_count == 1)
      wait(rw_mutex);
   signal(mutex);
      . . .
   /* reading is performed */
      . . .
   wait(mutex);
   read_count--;
   if (read_count == 0)
      signal(rw_mutex);
   signal(mutex);
}

Figure 7.4  The structure of a reader process.
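Several APIs provide such locks. As a concrete sketch of the read and write
modes described above (an assumption, since the text does not name a
particular implementation), a POSIX Pthreads reader–writer lock can be
acquired in either mode:

#include <pthread.h>

pthread_rwlock_t rw_lock = PTHREAD_RWLOCK_INITIALIZER;

void reader(void) {
   pthread_rwlock_rdlock(&rw_lock);  /* acquire the lock in read mode */
   /* reading is performed */
   pthread_rwlock_unlock(&rw_lock);
}

void writer(void) {
   pthread_rwlock_wrlock(&rw_lock);  /* acquire the lock in write mode */
   /* writing is performed */
   pthread_rwlock_unlock(&rw_lock);
}

Multiple threads may hold rw_lock in read mode simultaneously, whereas
pthread_rwlock_wrlock() grants exclusive access to a single writer.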
Our next example is a deadlock-free solution to the dining-philosophers
problem that uses a monitor. The monitor records the state of each philosopher
in a shared array, declared as

enum {THINKING, HUNGRY, EATING} state[5];

Philosopher i can set the variable state[i] = EATING only if her two neighbors
are not eating: (state[(i+4) % 5] != EATING) and (state[(i+1) % 5] != EATING).
We also need to declare
condition self[5];
This allows philosopher i to delay herself when she is hungry but is unable to
obtain the chopsticks she needs.
We are now in a position to describe our solution to the dining-philosophers
problem. The distribution of the chopsticks is controlled by
the monitor DiningPhilosophers, whose definition is shown in Figure 7.7.
Each philosopher, before starting to eat, must invoke the operation pickup().
This act may result in the suspension of the philosopher process. After the
successful completion of the operation, the philosopher may eat. Following
this, the philosopher invokes the putdown() operation. Thus, philosopher i
must invoke the operations pickup() and putdown() in the following
sequence:
DiningPhilosophers.pickup(i);
...
eat
...
DiningPhilosophers.putdown(i);
It is easy to show that this solution ensures that no two neighbors are
eating simultaneously and that no deadlocks will occur. As we already noted,
however, it is possible for a philosopher to starve to death. We do not present
a solution to this problem but rather leave it as an exercise for you.
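The pickup() operation of the DiningPhilosophers monitor, shown here for
reference (following the standard monitor solution), marks philosopher i as
HUNGRY and suspends her on the condition variable self[i] if she cannot eat
immediately:

void pickup(int i) {
   state[i] = HUNGRY;
   test(i);
   if (state[i] != EATING)
      self[i].wait();
}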
void putdown(int i) {
state[i] = THINKING;
test((i + 4) % 5);
test((i + 1) % 5);
}
void test(int i) {
if ((state[(i + 4) % 5] != EATING) &&
(state[i] == HUNGRY) &&
(state[(i + 1) % 5] != EATING)) {
state[i] = EATING;
self[i].signal();
}
}
initialization_code() {
for (int i = 0; i < 5; i++)
state[i] = THINKING;
}
}
In the Linux kernel, the simplest synchronization technique is the atomic
integer, represented by the data type atomic_t; all math operations using
atomic integers are performed without interruption. To illustrate, consider a
program with the following declarations:

atomic_t counter;
int value;
The following code illustrates the effect of performing various atomic
operations:

Atomic Operation                     Effect
atomic_set(&counter, 5);             counter = 5
atomic_add(10, &counter);            counter = counter + 10
atomic_sub(4, &counter);             counter = counter - 4
atomic_inc(&counter);                counter = counter + 1
value = atomic_read(&counter);       value = 12
In the Linux kernel, both spinlocks and mutex locks are nonrecursive,
which means that if a thread has acquired one of these locks, it cannot acquire
the same lock a second time without first releasing the lock. Otherwise, the
second attempt at acquiring the lock will block.
Linux uses an interesting approach to disable and enable kernel
preemption. It provides two simple system calls— preempt_disable()
and preempt_enable()— for disabling and enabling kernel preemption. The
kernel is not preemptible, however, if a task running in the kernel is holding
a lock. To enforce this rule, each task in the system has a thread-info
structure containing a counter, preempt_count, to indicate the number of
locks being held by the task. When a lock is acquired, preempt_count is
incremented. It is decremented when a lock is released. If the value of
preempt_count for the task currently running in the kernel is greater than 0,
it is not safe to preempt the kernel, as this task currently holds a lock. If the
count is 0, the kernel can safely be interrupted (assuming there are no
outstanding calls to preempt_disable()).
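As a minimal sketch of this mechanism (kernel code is assumed; the function
name update_per_cpu_data is hypothetical):

#include <linux/preempt.h>

void update_per_cpu_data(void)
{
   preempt_disable();   /* increments preempt_count */

   /* manipulate per-CPU data; this task will not be preempted here */

   preempt_enable();    /* decrements preempt_count */
}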
Spinlocks— along with enabling and disabling kernel preemption— are
used in the kernel only when a lock (or disabling kernel preemption) is held
for a short duration. When a lock must be held for a longer period, semaphores
or mutex locks are appropriate for use.
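For instance, a short critical section might be protected with a spinlock as
follows (a sketch of kernel code; my_lock and brief_update are illustrative
names):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void brief_update(void)
{
   spin_lock(&my_lock);    /* spins if another CPU holds the lock */
   /* short critical section */
   spin_unlock(&my_lock);
}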
The mutex is acquired and released with the pthread_mutex_lock() and
pthread_mutex_unlock() functions. If the mutex lock is unavailable when
pthread_mutex_lock() is invoked, the calling thread is blocked until the
owner invokes pthread_mutex_unlock(). The following code illustrates
protecting a critical section with mutex locks:

#include <pthread.h>

pthread_mutex_t mutex;

/* create and initialize the mutex lock */
pthread_mutex_init(&mutex, NULL);

/* acquire the mutex lock */
pthread_mutex_lock(&mutex);

/* critical section */

/* release the mutex lock */
pthread_mutex_unlock(&mutex);
All mutex functions return a value of 0 with correct operation; if an error occurs,
these functions return a nonzero error code.
#include <semaphore.h>
sem_t *sem;

/* Create the semaphore and initialize it to 1 */
sem = sem_open("SEM", O_CREAT, 0666, 1);

In this instance, we are naming the semaphore SEM. The O_CREAT flag indicates
that the semaphore will be created if it does not already exist. Additionally, the
semaphore has read and write access for other processes (via the parameter
0666) and is initialized to 1.
The advantage of named semaphores is that multiple unrelated processes
can easily use a common semaphore as a synchronization mechanism by
simply referring to the semaphore’s name. In the example above, once the
semaphore SEM has been created, subsequent calls to sem open() (with the
same parameters) by other processes return a descriptor to the existing
semaphore.
In Section 6.6, we described the classic wait() and signal() semaphore
operations. POSIX declares these operations sem_wait() and sem_post(),
respectively. The following code sample illustrates protecting a critical section
using the named semaphore created above:

/* acquire the semaphore */
sem_wait(sem);

/* critical section */

/* release the semaphore */
sem_post(sem);
POSIX also provides unnamed semaphores, which are created and initialized
with the sem_init() function:

#include <semaphore.h>
sem_t sem;

/* Create the semaphore and initialize it to 1 */
sem_init(&sem, 0, 1);

In this example, by passing the flag 0, we are indicating that this semaphore can
be shared only by threads belonging to the process that created the semaphore.
(If we supplied a nonzero value, we could allow the semaphore to be shared
between separate processes by placing it in a region of shared memory.) In
addition, we initialize the semaphore to the value 1.
POSIX unnamed semaphores use the same sem_wait() and sem_post()
operations as named semaphores. The following code sample illustrates
protecting a critical section using the unnamed semaphore created above:

/* acquire the semaphore */
sem_wait(&sem);

/* critical section */

/* release the semaphore */
sem_post(&sem);
The mutex lock associated with the condition variable must be locked
before the pthread_cond_wait() function is called, since it is used to protect
the data in the conditional clause from a possible race condition. Once this
lock is acquired, the thread can check the condition. If the condition is not
true, the thread then invokes pthread_cond_wait(), passing the mutex lock
and the condition variable as parameters. Calling pthread_cond_wait()
releases the mutex lock, thereby allowing another thread to access the shared
data and possibly update its value so that the condition clause evaluates to
true. (To protect against program errors, it is important to place the
conditional clause within a loop so that the condition is rechecked after being
signaled.)
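Following this pattern, a thread that waits until the condition a == b holds
might look like this (assuming mutex and cond_var have been initialized with
pthread_mutex_init() and pthread_cond_init()):

pthread_mutex_lock(&mutex);

while (a != b)
   pthread_cond_wait(&cond_var, &mutex);

pthread_mutex_unlock(&mutex);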
A thread that modifies the shared data can invoke the pthread_cond_signal()
function, thereby signaling one thread waiting on the condition variable. This
is illustrated below:

pthread_mutex_lock(&mutex);
a = b;
pthread_cond_signal(&cond_var);
pthread_mutex_unlock(&mutex);
It is important to note that the call to pthread_cond_signal() does not
release the mutex lock. It is the subsequent call to pthread_mutex_unlock()
that releases the mutex. Once the mutex lock is released, the signaled thread
becomes the owner of the mutex lock and returns control from the call to
pthread_cond_wait().
We provide several programming problems and projects at the end of this
chapter that use Pthreads mutex locks and condition variables, as well as POSIX
semaphores.
The examples that follow use a Java BoundedBuffer class whose insert()
and remove() methods appear in Figure 7.11. Its constructor initializes the
item count, the in and out indexes, and the buffer itself:

public BoundedBuffer() {
count = 0;
in = 0;
out = 0;
buffer = (E[]) new Object[BUFFER_SIZE];
}
When a thread invokes a synchronized method, it must own the lock for the
object; if the lock is owned by another thread, the caller blocks and is placed
in the entry set for the object's lock. When the lock is released, the JVM
arbitrarily selects a thread from this set to be the owner of the lock. (When we
say “arbitrarily,” we mean that the specification does not require that threads in
this set be organized in any particular order. However, in practice, most virtual
machines order threads in the entry set according to a FIFO policy.) Figure 7.10
illustrates how the entry set operates.
In addition to having a lock, every object also has associated with it a wait
set consisting of a set of threads. This wait set is initially empty. When a thread
enters a synchronized method, it owns the lock for the object. However, this
thread may determine that it is unable to continue because a certain condition
has not been met.
Figure 7.10  Entry set for a lock.

BLOCK SYNCHRONIZATION
The amount of time between when a lock is acquired and when it is released
is defined as the scope of the lock. A synchronized method that has only
a small percentage of its code manipulating shared data may yield a scope
that is too large. In such an instance, it may be better to synchronize only
the block of code that manipulates shared data than to synchronize the entire
method. Such a design results in a smaller lock scope. Thus, in addition to
declaring synchronized methods, Java also allows block synchronization,
as illustrated below. Only the code that accesses the critical section requires
ownership of the lock for the this object.
public void someMethod() {
/* non-critical section */
synchronized(this) {
/* critical section */
}
/* remainder section */
}
This will happen, for example, if the producer calls the
insert() method and the buffer is full. The thread then releases the lock
and waits until the condition that will allow it to continue is met.
When a thread calls the wait() method, the following happens:

1. The thread releases the lock for the object.

2. The state of the thread is set to blocked.

3. The thread is placed in the wait set for the object.
Consider the example in Figure 7.11. If the producer calls the insert()
method and sees that the buffer is full, it calls the wait() method. This call
releases the lock, blocks the producer, and puts the producer in the wait set for
the object. Because the producer has released the lock, the consumer ultimately
enters the remove() method, where it frees space in the buffer for the producer.
Figure 7.12 illustrates the entry and wait sets for a lock. (Note that although
wait() can throw an InterruptedException, we choose to ignore it for code
clarity and simplicity.)
How does the consumer thread signal that the producer may now proceed?
Ordinarily, when a thread exits a synchronized method, the departing thread
releases only the lock associated with the object, possibly removing a thread
from the entry set and giving it ownership of the lock. However, at the end of
the insert() and remove() methods, we have a call to the method notify().
The call to notify():
1. Picks an arbitrary thread T from the list of threads in the wait set.

2. Moves T from the wait set to the entry set.

3. Sets the state of T from blocked to runnable.
/* Producers call this method */
public synchronized void insert(E item) {
while (count == BUFFER_SIZE) {
try {
wait();
}
catch (InterruptedException ie) { }
}
buffer[in] = item;
in = (in + 1) % BUFFER_SIZE;
count++;
notify();
}
/* Consumers call this method */
public synchronized E remove() {
E item;
while (count == 0) {
try {
wait();
}
catch (InterruptedException ie) { }
}
item = buffer[out];
out = (out + 1) % BUFFER_SIZE;
count--;
notify();
return item;
}
Figure 7.11 insert() and remove() methods using wait() and notify().
T is now eligible to compete for the lock with the other threads. Once T has
regained control of the lock, it returns from calling wait(), where it may
check the value of count again. (Again, the selection of an arbitrary thread
is according to the Java specification; in practice, most Java virtual machines
order threads in the wait set according to a FIFO policy.)
Figure 7.12  Entry and wait sets for a lock.

The following sequence, assuming that the buffer is full and the lock for the
object is available, illustrates how the methods in Figure 7.11 and the entry
and wait sets interact:
• The producer calls the insert() method, sees that the lock is available,
and enters the method. Once in the method, the producer determines that
the buffer is full and calls wait(). The call to wait() releases the lock for
the object, sets the state of the producer to blocked, and puts the producer
in the wait set for the object.
• The consumer ultimately calls and enters the remove() method, as the
lock for the object is now available. The consumer removes an item from
the buffer and calls notify(). Note that the consumer still owns the lock
for the object.
• The call to notify() removes the producer from the wait set for the object,
moves the producer to the entry set, and sets the producer’s state to
runnable.
• The consumer exits the remove() method. Exiting this method releases the
lock for the object.
• The producer tries to reacquire the lock and is successful. It resumes
execution from the call to wait(). The producer tests the while loop,
determines that room is available in the buffer, and proceeds with the
remainder of the insert() method.

If no thread is in the wait set for the object, the call to notify() is ignored.
When the producer exits the insert() method, it releases the lock for the
object.
The ReentrantLock class in the Java API provides mutual exclusion similar
to that of a synchronized statement. A thread acquires the lock by invoking
its lock() method and releases it, typically in a finally clause, by invoking
unlock():

Lock key = new ReentrantLock();

key.lock();
try {
/* critical section */
}
finally {
key.unlock();
}
7.4.3 Semaphores
The Java API also provides a counting semaphore, as described in Section 6.6.
The constructor for the semaphore appears as
Semaphore(int value);
where value specifies the initial value of the semaphore (a negative value
is allowed). The acquire() method throws an InterruptedException if the
acquiring thread is interrupted. The following example illustrates using a
semaphore for mutual exclusion:
Semaphore sem = new Semaphore(1);
try {
sem.acquire();
/* critical section */
}
catch (InterruptedException ie) { }
finally {
sem.release();
}
Notice that we place the call to release() in the finally clause to ensure that
the semaphore is released.
Java condition variables, created from a ReentrantLock with its
newCondition() method, follow a similar acquire-and-release discipline. The
doWork() method below (a fragment of a larger example in which five threads
take turns, assuming a ReentrantLock named lock, an array condVars of its
condition variables, and a shared turn variable) illustrates await() and
signal():

public void doWork(int threadNumber) {
lock.lock();
try {
/**
* If it’s not my turn, then wait
* until I’m signaled.
*/
if (threadNumber != turn)
condVars[threadNumber].await();
/**
* Do some work for awhile ...
*/
/**
* Now signal to the next thread.
*/
turn = (turn + 1) % 5;
condVars[turn].signal();
}
catch (InterruptedException ie) { }
finally {
lock.unlock();
}
}
7.5.1 Transactional Memory

Consider a function update() that must be executed atomically because it
modifies shared data. One option is to use mutual-exclusion techniques, in
which a lock is acquired before the shared data are modified and released
afterward:

void update()
{
   acquire();

   /* modify shared data */

   release();
}
However, using synchronization mechanisms such as mutex locks and
semaphores involves many potential problems, including deadlock.
Additionally, as the number of threads increases, traditional locking doesn’t
scale as well, because the level of contention among threads for lock ownership
becomes very high.
As an alternative to traditional locking methods, new features that take
advantage of transactional memory can be added to a programming language.
In our example, suppose we add the construct atomic{S}, which ensures that
the operations in S execute as a transaction. This allows us to rewrite the
update() function as follows:
void update()
{
atomic {
/* modify shared data */
}
}
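Although atomic{S} as written is a hypothetical language construct, the idea
can be sketched with an existing compiler extension; for example, GCC's
experimental transactional-memory support (enabled with the -fgnu-tm flag)
provides a similar block form:

void update(void)
{
   __transaction_atomic {
      /* modify shared data */
   }
}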
The advantage of using such a mechanism rather than locks is that the
transactional memory system— not the developer— is responsible for
guaranteeing atomicity. Additionally, because no locks are involved, deadlock
is not possible. Furthermore, a transactional memory system can identify
which statements in atomic blocks can be executed concurrently, such as
concurrent read access to a shared variable. It is, of course, possible for a
programmer to identify these situations and use reader– writer locks, but the
task becomes increasingly difficult as the number of threads within an
application grows.
Transactional memory can be implemented in either software or hardware.
Software transactional memory (STM), as the name suggests, implements
transactional memory exclusively in software— no special hardware is needed.
STM works by inserting instrumentation code inside transaction blocks. The
code is inserted by a compiler and manages each transaction by examining
where statements may run concurrently and where specific low-level locking is
required. Hardware transactional memory (HTM) uses hardware cache
hierarchies and cache coherency protocols to manage and resolve conflicts
involving shared data residing in separate processors’ caches. HTM requires
no special code instrumentation and thus has less overhead than STM.
However, HTM does require that existing cache hierarchies and cache
coherency protocols be modified to support transactional memory.
Transactional memory has existed for several years without widespread
implementation. However, the growth of multicore systems and the
associated emphasis on concurrent and parallel programming have prompted
a significant amount of research in this area on the part of both academics and
commercial software and hardware vendors.
7.5.2 OpenMP
In Section 4.5.2, we provided an overview of OpenMP and its support of parallel
programming in a shared-memory environment. Recall that OpenMP includes
a set of compiler directives and an API. Any code following the compiler
directive #pragma omp parallel is identified as a parallel region and is
performed by a number of threads equal to the number of processing cores in
the system. The advantage of OpenMP (and similar tools) is that thread creation
and management are handled by the OpenMP library and are not the
responsibility of application developers.
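As a brief refresher on the directive (a minimal sketch; compile with an
OpenMP-aware compiler, for example gcc -fopenmp):

#include <omp.h>
#include <stdio.h>

int main(void)
{
   /* each thread in the team executes the parallel region once */
   #pragma omp parallel
   {
      printf("I am a parallel region.\n");
   }

   return 0;
}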
Along with its #pragma omp parallel compiler directive, OpenMP pro-
vides the compiler directive #pragma omp critical, which specifies the code
region following the directive as a critical section in which only one thread may
be active at a time. In this way, OpenMP provides support for ensuring that
threads do not generate race conditions.
As an example of the use of the critical-section compiler directive, first
assume that the shared variable counter can be modified in the update()
function as follows:
void update(int value)
{
counter += value;
}
If the update() function can be part of— or invoked from— a parallel region,
a race condition is possible on the variable counter.
The critical-section compiler directive can be used to remedy this race
condition and is coded as follows:
void update(int value)
{
#pragma omp critical
{
counter += value;
}
}
The critical-section compiler directive behaves much like a binary semaphore
or mutex lock, ensuring that only one thread at a time is active in the critical
section. If a thread attempts to enter a critical section when another thread is
currently active in that section (that is, owns the section), the calling thread is
blocked until the owner thread exits. If multiple critical sections must be used,
each critical section can be assigned a separate name, and a rule can specify
that no more than one thread may be active in a critical section of the same
name simultaneously.
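For example (a sketch; the counter and section names are illustrative), a name
is attached directly to the critical directive:

int counter1 = 0;
int counter2 = 0;

void update_both(int value)
{
   /* threads may be active in the two differently named
      sections at the same time */
   #pragma omp critical(counter1_lock)
   {
      counter1 += value;
   }

   #pragma omp critical(counter2_lock)
   {
      counter2 += value;
   }
}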
An advantage of using the critical-section compiler directive in OpenMP is
that it is generally considered easier to use than standard mutex locks.
However, a disadvantage is that application developers must still identify
possible race conditions and adequately protect shared data using the compiler
directive. Additionally, because the critical-section compiler directive behaves
much like a mutex lock, deadlock is still possible when two or more critical
sections are identified.
7.6 Summary