Unit-1 Concurrent and Parallel Programming Syllabus: Concurrent Versus Sequential Programming. Concurrent Programming Constructs
-A concurrent system is one where a computation can advance without waiting for all other
computations to complete.
-Concurrency deals with managing the access to shared state from different threads.
-Concurrency means that multiple processes or threads are making progress at the same
time. Even when the CPU executes only one thread at a time, threads can be switched in and
out as required, so no thread necessarily runs to completion before another is scheduled; in
this sense all the threads are executing concurrently.
For example, concurrent processes can be executed on one core by interleaving the execution
steps of each process via time-sharing slices: only one process runs at a time, and if it does
not complete during its time slice, it is paused, another process begins or resumes, and then
later the original process is resumed. In this way, multiple processes are part-way through
execution at a single instant, but only one process is being executed at that instant.
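To make this concrete, here is a minimal Java sketch (class and thread names are illustrative) in which two threads interleave their steps under the scheduler rather than running to completion one after the other:

public class InterleavingDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 3; i++) {
                System.out.println(Thread.currentThread().getName() + " step " + i);
                Thread.yield(); // hint to the scheduler that it may switch threads
            }
        };
        Thread t1 = new Thread(task, "T1");
        Thread t2 = new Thread(task, "T2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}

Depending on the scheduler, the "T1 step i" and "T2 step i" lines typically appear interleaved, showing both threads part-way through execution at once.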
Levels of Concurrency
Low-Level Concurrency
At this level of concurrency there is explicit use of atomic operations. Such concurrency is
rarely usable for application building, as it is very error-prone and difficult to debug; even
Python does not expose this kind of concurrency directly.
Mid-Level Concurrency
At this level there is no use of explicit atomic operations; explicit locks are used instead.
Python and other programming languages support this kind of concurrency, and it is the level
most application programmers use.
High-Level Concurrency
At this level, neither explicit atomic operations nor explicit locks are used. Python has the
concurrent.futures module to support this kind of concurrency.
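The same idea exists in Java's java.util.concurrent package; as a hedged sketch (pool size and task body are illustrative), an ExecutorService lets the framework manage the threads so that no explicit locks or atomics appear in application code:

import java.util.concurrent.*;

public class HighLevelDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> result = pool.submit(() -> 6 * 7); // task runs on a pool thread
        System.out.println(result.get());                  // blocks until the result (42) is ready
        pool.shutdown();
    }
}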
Statement concurrency: This means that several statements of a program are executed
concurrently on some physical parallel computer architecture. Usually, one considers a
conventional (sequential - non-concurrent) programming language, such as Fortran or C.
Certain optimizing compilers can automatically determine which statements are independent
of one another and can be executed concurrently without changing the result of the program
execution. They can also take the existing parallel hardware architecture into account (on
which the program is supposed to be executed) in order to generate code for this hardware
architecture which provides for concurrent statement execution. Note: The programmer is
normally not much involved in this kind of concurrency, since it is handled automatically by
the optimizing compiler.
Unit (or sub-program) concurrency: Here one talks about concurrent "units", which are
defined by the programming language and are intended to be executed concurrently.
Programming languages with concurrent "units" allow the programmer to design programs
which include a number of concurrent activities and the concurrency structure of a program
depends very much on the application. A typical application area where concurrent "units"
are very useful is simulation of process control or transportation systems, where the simulated
objects evolve in real time concurrently. This level of concurrency is also called logical
concurrency. In contrast to physical concurrency, it does not imply that the logically
concurrent "units" are really executed in physical concurrency. Often they are executed in an
interleaved manner (time-sharing, see below). Note: Instead of "unit", one often uses the
words process, thread, task, or simply concurrent statements. Note: This kind of concurrency
must be understood by the programmer who uses a programming language that supports
concurrent threads (e.g. Java). In this course, we are mainly interested in this kind of
concurrency.
Program (or task) concurrency: Often several programs are executed concurrently (in an
interleaved manner) on a computer within a time-sharing operating system. The CPU of the
computer is shared among the programs by allocating the CPU to each program for a short
time slice at a time. The allocation is done by a scheduler (see Sebesta's book). The same
technique of time sharing is usually also used for realizing concurrent "units" within a given
program.
CATEGORIES OF CONCURRENCY
• There are two distinct categories of concurrent unit control, physical
concurrency and logical concurrency.
• Physical concurrency happens when several program units from the same program
literally execute simultaneously on more than one processor.
• Logical concurrency, on the other hand, happens when the execution of several
programs takes place in an interleaved fashion on a single processor.
For a program or concurrent system to be correct, it must satisfy certain properties.
Properties related to the termination of the system are as follows −
Correctness property
The correctness property means that the program or the system must provide the desired
correct answer. To keep it simple, we can say that the system must map the starting program
state to final state correctly.
Safety property
The safety property means that the program or the system must remain in a “good” or “safe”
state and never does anything “bad”.
Liveness property
This property means that a program or system must "make progress": it eventually reaches
some desirable state.
Actors of Concurrent Systems
One common property of a concurrent system is that it can contain multiple processes and
threads, which run at the same time to make progress on their own tasks. These processes
and threads are called the actors of the concurrent system.
Resources of Concurrent Systems
The actors must utilize the resources such as memory, disk, printer etc. in order to perform
their tasks.
Rules of Concurrent Systems
Every concurrent system must possess a set of rules to define the kind of tasks to be
performed by the actors and the timing for each. The tasks could be acquiring locks, sharing
memory, modifying state, etc.
While implementing concurrent systems, the programmer must take into consideration the
following two important issues, which can be the barriers of concurrent systems −
Sharing of data
An important issue while implementing the concurrent systems is the sharing of data among
multiple threads or processes. Actually, the programmer must ensure that locks protect the
shared data so that all the accesses to it are serialized and only one thread or process can
access the shared data at a time. When multiple threads or processes all try to access the
same shared data, all but one of them are blocked and remain idle. In other words, while a
lock is in force we can effectively use only one process or thread at a time. There can be
some simple solutions to remove the above-mentioned barriers −
The simplest solution is not to share any mutable data. In this case we need not use explicit
locking, and the concurrency barrier due to mutable data is avoided.
Many times, concurrent processes need to access the same data at the same time. Another
solution, instead of using explicit locks, is to use a data structure that supports concurrent
access. For example, we can use Python's queue module, which provides thread-safe queues.
We can also use the multiprocessing.JoinableQueue class for multiprocessing-based
concurrency.
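In Java, the analogous approach is a thread-safe queue from java.util.concurrent; a minimal sketch (class and variable names are illustrative):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SharedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // The queue is internally synchronized, so no explicit lock is needed.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread writer = new Thread(() -> queue.offer("hello"));
        writer.start();
        System.out.println(queue.take()); // blocks until an element is available
        writer.join();
    }
}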
Sometimes the data structure that we are using, say a concurrent queue, is not suitable; in
that case we can pass immutable data without locking it.
Mutable Data Transfer
In continuation of the above solution, if it is required to pass mutable rather than immutable
data, then we can pass mutable data that is exposed as read-only.
Use of I/O Resources
Another important issue in implementing concurrent systems is the use of I/O resources by
threads or processes. The problem arises when one thread or process uses I/O for a long time
while the others sit idle. This kind of barrier appears when working with an I/O-heavy
application; requesting pages from a web browser is one example. If the rate at which the
data is requested is slower than the rate at which it is consumed, then we have an I/O barrier
in our concurrent system.
Concurrent program: multiple threads of execution run simultaneously through your
program; multiple PCs (program counters) are active, one for each thread.
Sequential program: a single thread of execution weaves its way through your program; a
single PC (program counter) identifies the current instruction being executed.
1) Interleaving:
Interleaving is a process or methodology to make a system more efficient, fast and reliable by
arranging data in a noncontiguous manner. There are many uses for interleaving at the system
level, including:
• Storage: As hard disks and other storage devices are used to store user and system
data, there is always a need to arrange the stored data in an appropriate way.
• Error Correction: Errors in data communication and memory can be corrected
through interleaving.
• Multi-Dimensional Data Structures
1. Two-Way Interleaving: Two memory blocks are accessed at same level for reading
and writing operations. The chance for overlapping exists.
2. Four-Way Interleaving: Four memory blocks are accessed at the same time.
3. Error-Correction Interleaving: Errors in communication systems tend to occur in
bursts rather than singly. Interleaving spreads these errors out so that specific
algorithms can correct them.
Latency is one disadvantage of interleaving: rearranging the data takes time, and hiding the
structure of error bursts is not always efficient.
2) Mutual Exclusion:
A mutual exclusion (mutex) is a program object that prevents simultaneous access to a shared
resource. This concept is used in concurrent programming with a critical section, a piece of
code in which processes or threads access a shared resource. Only one thread can own the
mutex at a time, so a mutex with a unique name is created when a program starts. When a
thread holds a resource, it locks the mutex to prevent other threads from accessing the
resource concurrently. Upon releasing the resource, the thread unlocks the mutex.
Mutex comes into the picture when two threads work on the same data at the same time. It
acts as a lock and is the most basic synchronization tool. When a thread tries to acquire a
mutex, it gains the mutex if it is available, otherwise the thread is set to sleep condition.
Mutual exclusion reduces latency and busy-waits using queuing and context switches. Mutex
can be enforced at both the hardware and software levels.
Disabling interrupts for the smallest number of instructions is the best way to enforce mutex
at the kernel level and prevent the corruption of shared data structures. If multiple processors
share the same memory, a flag is set to enable and disable the resource acquisition based on
availability. The busy-wait mechanism enforces mutex in the software areas. This is
furnished with algorithms such as Dekker's algorithm, the black-white bakery algorithm,
Szymanski's algorithm, Peterson's algorithm and Lamport's bakery algorithm.
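As one illustration of such a busy-wait algorithm, here is a hedged Java sketch of Peterson's algorithm for two threads (field names are illustrative; all shared fields are declared volatile so that the Java memory model preserves the ordering the algorithm depends on):

public class Peterson {
    static volatile boolean flag0 = false, flag1 = false; // intent to enter
    static volatile int turn = 0;                         // whose turn it is to wait

    static void lock(int me) { // me is 0 or 1
        if (me == 0) { flag0 = true; turn = 1; while (flag1 && turn == 1) { /* busy-wait */ } }
        else         { flag1 = true; turn = 0; while (flag0 && turn == 0) { /* busy-wait */ } }
    }

    static void unlock(int me) {
        if (me == 0) flag0 = false; else flag1 = false;
    }
}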
Mutually exclusive readers and read/write mutex class codes can be defined for an efficient
implementation of mutex.
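A minimal Java sketch of a mutex in practice (java.util.concurrent.locks.ReentrantLock plays the mutex role; class and variable names are illustrative):

import java.util.concurrent.locks.ReentrantLock;

public class MutexDemo {
    private static final ReentrantLock mutex = new ReentrantLock();
    private static int counter = 0; // shared data protected by the mutex

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                mutex.lock();       // acquire: blocks if another thread holds the mutex
                try {
                    counter++;      // critical section
                } finally {
                    mutex.unlock(); // release, even if an exception occurs
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(counter); // always 20000 with the mutex in place
    }
}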
3) Safety and Liveness:
Any property of all executions of a concurrent program can be formulated in terms of safety
and liveness.
Safety: A safety property asserts that nothing "bad" happens throughout execution.
Liveness: A liveness property asserts that something "good" eventually does happen.
For example, the property that a program always produces the correct answer can be
formulated using one safety property and one liveness property. The safety property is that
the program never terminates with the wrong answer: terminating with the wrong answer is
the "bad thing". The liveness property is that the program eventually does terminate:
termination is the "good thing". We might also desire that a program generate answers in a
timely manner. This is also a safety property, where the "bad thing" is that the clock (a
hidden variable) has a certain value and the program counter (also a hidden variable) has not
yet reached the statement that generates the answer.
The key attribute of safety properties is that once the proscribed "bad thing" happens, no
subsequent execution can cause the safety property to hold. On the other hand, the key
attribute of liveness properties is that no partial execution is irremediable: it always remains
possible for the "good thing" to occur during subsequent execution.
4) Semaphores:
Semaphores are integer variables that are used to solve the critical section problem by using
two atomic operations, wait and signal that are used for process synchronization.
The definitions of wait and signal are as follows:
1. Wait
wait(S) {
    while (S <= 0)
        ;        // busy-wait until S becomes positive
    S--;         // claim one unit of the resource
}
2. Signal
signal(S) {
    S++;         // release one unit of the resource
}
Types of Semaphores
There are two main types of semaphores i.e. counting semaphores and binary semaphores.
Details about these are given as follows:
1. Counting Semaphores
These are integer-valued semaphores with an unrestricted value domain. They are
used to coordinate resource access, where the semaphore count is the number of
available resources. If resources are added, the semaphore count is automatically
incremented, and if resources are removed, the count is decremented.
2. Binary Semaphores
The binary semaphores are like counting semaphores, but their value is restricted to 0
and 1. The wait operation succeeds only when the semaphore's value is 1, and the
signal operation succeeds only when it is 0. Binary semaphores are sometimes easier
to implement than counting semaphores.
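A hedged Java sketch of both kinds (java.util.concurrent.Semaphore; the permit count of 3 and the thread names are illustrative):

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    // Counting semaphore: 3 permits model 3 available resource instances.
    // A binary semaphore would simply be new Semaphore(1).
    private static final Semaphore resources = new Semaphore(3);

    public static void main(String[] args) {
        Runnable worker = () -> {
            try {
                resources.acquire();       // wait(): blocks when no permit is left
                try {
                    System.out.println(Thread.currentThread().getName() + " got a resource");
                    Thread.sleep(100);     // simulate using the resource
                } finally {
                    resources.release();   // signal(): return the permit
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 5; i++) new Thread(worker, "worker-" + i).start();
    }
}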
Advantages of Semaphores
Some of the advantages of semaphores are as follows:
1. Semaphores allow only one process into the critical section. They follow the mutual
exclusion principle strictly and are much more efficient than some other methods of
synchronization.
2. There is no resource wastage because of busy waiting in semaphores as processor
time is not wasted unnecessarily to check if a condition is fulfilled to allow a process
to access the critical section.
3. Semaphores are implemented in the machine independent code of the microkernel. So
they are machine independent.
Disadvantages of Semaphores
Some of the disadvantages of semaphores are as follows:
1. Semaphores are complicated so the wait and signal operations must be implemented
in the correct order to prevent deadlocks.
2. Semaphores are impractical for large-scale use, as their use leads to loss of
modularity. This happens because the wait and signal operations prevent the creation
of a structured layout for the system.
3. Semaphores may lead to a priority inversion where low priority processes may access
the critical section first and high priority processes later.
5) Monitors:
Monitors and semaphores are used for process synchronization and allow processes to access
the shared resources using mutual exclusion. However, monitors and semaphores contain
many differences. Details about both of these are given as follows:
Monitors are a synchronization construct that were created to overcome the problems caused
by semaphores such as timing errors.
Monitors are abstract data types and contain shared data variables and procedures. The shared
data variables cannot be directly accessed by a process and procedures are required to allow a
single process to access the shared data variables at a time.
This is demonstrated as follows:
monitor monitorName
{
    // shared data variables

    procedure P1(....) { .... }
    procedure P2(....) { .... }
    ...
    procedure Pn(....) { .... }

    initialization code(....) { .... }
}
Only one process can be active in a monitor at a time. Other processes that need to access the
shared variables in a monitor have to line up in a queue and are only provided access when
the previous process releases the shared variables.
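A hedged Java sketch of the same idea (class and member names are illustrative): the shared variable is private, and every procedure that touches it is synchronized, so at most one thread is active inside the monitor at a time:

public class CounterMonitor {
    private int count = 0; // shared data, not directly accessible from outside

    public synchronized void increment() { count++; }      // one thread at a time
    public synchronized int getCount()   { return count; } // reads are serialized too
}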
6) Channel (programming):
A channel is a construct for inter-thread or inter-process communication by message
passing; the basic operations are creating a channel, sending to a channel, and receiving
from a channel.
libthread channels
The multithreading library libthread, which was first created for the operating system Plan 9,
offers inter-thread communication based on fixed-size channels.
OCaml events
The OCaml event module offers typed channels for synchronization. When the module's send
and receive functions are called, they create corresponding send and receive events which can
be synchronized.
Examples
XMOS XC
The XMOS programming language XC provides a primitive type "chan" and two operators
"<:" and ":>" for sending and receiving data from a channel.
In this example, two hardware threads are started on the XMOS, running the two lines in the
"par" block. The first line transmits the number 42 through the channel while the second
waits until it is received and sets the value of x. The XC language also allows asynchronous
receiving on channels through a select statement.
chan c;
int x;
par {
    c <: 42;   // send the value 42 on channel c
    c :> x;    // receive from channel c into x
}
Go
This snippet of Go code performs similarly to the XC code. First the channel c is created,
then a goroutine is spawned which sends 42 through the channel. When the number is put
into the channel, x is set to 42. Go allows channels to buffer contents, as well as non-blocking
receiving through the use of a select block.
c := make(chan int)       // create an unbuffered channel of int
go func() { c <- 42 }()   // a goroutine sends 42 on the channel
x := <-c                  // receive blocks until the value arrives
Applications
In addition to their fundamental use for interprocess communication, channels can be used as
a primitive to implement various other concurrent programming constructs which can be
realized as streams. For example, channels can be used to construct futures and promises,
where a future is a one-element channel, and a promise is a process that sends to the channel,
fulfilling the future. Similarly, iterators can be constructed directly from channels.
7) Message Passing:
Process communication is the mechanism provided by the operating system that allows
processes to communicate with each other. This communication could involve a process
letting another process know that some event has occurred or transferring of data from one
process to another. One of the models of process communication is the message passing
model.
Message passing model allows multiple processes to read and write data to the message
queue without being connected to each other. Messages are stored on the queue until their
recipient retrieves them. Message queues are quite useful for inter process communication
and are used by most operating systems.
In the message passing model, both processes P1 and P2 can access the message queue to
store and retrieve data.
Advantages of Message Passing Model
Some of the advantages of message passing model are given as follows:
1. The message passing model is much easier to implement than the shared memory
model.
2. It is easier to build parallel hardware using message passing model as it is quite
tolerant of higher communication latencies.
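A hedged Java sketch of the model (java.util.concurrent.BlockingQueue stands in for the message queue; names are illustrative):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePassingDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>();
        // P1 stores a message on the queue and continues independently.
        Thread p1 = new Thread(() -> messageQueue.offer("some event occurred"));
        // P2 retrieves the message whenever it is ready; the two processes
        // are never connected to each other directly.
        Thread p2 = new Thread(() -> {
            try {
                System.out.println("P2 received: " + messageQueue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        p1.start(); p2.start();
        p1.join();  p2.join();
    }
}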
Race condition
A simple example of a race condition is a light switch. In some homes there are multiple light
switches connected to a common ceiling light. When these types of circuits are used, the
switch position becomes irrelevant. If the light is on, moving either switch from its current
position turns the light off. Similarly, if the light is off, then moving either switch from its
current position turns the light on. With that in mind, imagine what might happen if two
people tried to turn on the light using two different switches at exactly the same time.
One instruction might cancel the other or the two actions might trip the circuit breaker.
Suppose for a moment that two processes need to perform a bit flip at a specific memory
location. Under normal circumstances, Process 1 performs its bit flip, changing the memory
value from 0 to 1, and Process 2 then performs its bit flip, changing the memory value from
1 to 0. If a race condition occurs and the two processes overlap, both may read the memory
value 0 before either writes, so the bit ends with the value 1 when it should be 0. This occurs
because Process 2 is unaware that Process 1 is performing a simultaneous bit flip.
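The overlap can be reproduced with a hedged Java sketch (names are illustrative): the flip is a read-modify-write that is not atomic, so two concurrent flips may both read 0 and both write 1:

public class RaceDemo {
    static int bit = 0; // shared memory location

    static void flip() { bit = 1 - bit; } // read, compute, write: not atomic

    public static void main(String[] args) throws InterruptedException {
        Thread p1 = new Thread(RaceDemo::flip);
        Thread p2 = new Thread(RaceDemo::flip);
        p1.start(); p2.start();
        p1.join();  p2.join();
        // Two flips should restore 0, but if both threads read bit as 0
        // before either writes, the final value is 1.
        System.out.println(bit);
    }
}

Because the window is tiny, the wrong answer appears only occasionally, which is exactly what makes race conditions hard to debug.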
When a program that is designed to handle tasks in a specific sequence is asked to perform
two or more operations simultaneously, an attacker can take advantage of the time gap
between when the service is initiated and when a security control takes effect in order to
create a deadlock or thread block situation. With deadlock, two or more threads must wait
for a lock in a circular chain. This defect can cause the entire software system to halt
because such locks can never be acquired if the chain is circular. Thread block can also
dramatically impact application performance. In this type of concurrency defect, one thread
calls a long-running operation while holding a lock and preventing the progress of other
threads.
Race conditions occasionally occur in logic gates when certain inputs come into conflict.
Because the gate output state takes a finite, nonzero amount of time to react to any change in
input states, sensitive circuits or devices following the gate may be fooled by the state of the
output, and thereby not operate properly.
Synchronization
While a number of techniques are available for synchronization, only a few methods are used
by developers on a regular basis. The techniques used are also to some extent determined by
the programming environment.
The scope of synchronization is broad. Proper synchronization orders the updates to data and
provides an expected outcome. As shown in Figure 1.3 below, shared data d can be accessed
by threads Ti and Tj at times ti, tj, tk, tl, where ti ≠ tj ≠ tk ≠ tl, and proper synchronization
maintains the order of updates to d at these instants, treating the state of d as a
synchronization function of time.
This synchronization function, s , represents the behavior of a synchronized construct with
respect to the execution time of a thread.
Figure 1.3. Shared Data Synchronization, Where Data d Is Protected by a
Synchronization Operation
Figure 1.4 below represents, in generic form, how synchronization operations are performed
in an actual multi-threaded implementation, and demonstrates the flow of threads.
When m >= 1, the creation timing for the initial threads T1 through Tm might not be the
same. After block Bi as well as Bj, the number of threads could be different, which means m
is not necessarily equal to n and n is not necessarily equal to p. For all operational
environments, the values of m, n, and p are at least 1.
Figure 1.4. Operational Flow of Threads for an Application
Synchronization primitives
(i) Semaphores
A semaphore, as introduced by Dijkstra, supports two atomic operations, P and V, where the
semaphore value sem is initialized with the value 0 or 1 before the parallel processes get
started. In Dijkstra's representation, T referred to processes; threads are used here instead, to
be more precise and to remain consistent about the differences between threads and
processes.
The P operation blocks the calling thread if the value remains 0, whereas the V operation,
independent of the P operation, signals a blocked thread to allow it to resume operation.
These P and V operations are "indivisible actions": each executes atomically. A positive
value of sem represents the number of threads that can proceed without blocking, and a
negative value refers to the number of blocked threads.
When the sem value is zero, no thread is waiting; if a thread then needs to decrement it, that
thread is blocked and placed in a waiting list. When the value of sem is restricted to only 0
and 1, the semaphore is a binary semaphore.
To use a semaphore, think of it as a counter that supports two atomic operations.
Implementations of semaphores vary. From a usability perspective, two kinds of semaphores
exist: strong and weak. These differ in how they order the success of individual calls on P.
A strong semaphore maintains a First-Come-First-Served (FCFS) model and guarantees that
threads calling P are served in order, avoiding starvation. A weak semaphore does not
provide any guarantee of service to a particular thread, so a thread might starve. For
example, POSIX semaphores can reach starvation status and are implemented differently
from what Dijkstra defined, so they are considered weak semaphores (Reek 2002).
According to Dijkstra, the mutual exclusion of parallel threads using the P and V atomic
operations can be represented as follows (a minimal Java sketch, with acquire and release
standing in for P and V; class and variable names are illustrative):
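import java.util.concurrent.Semaphore;

public class DijkstraMutex {
    static final Semaphore sem = new Semaphore(1); // sem initialized to 1
    static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable t = () -> {
            try {
                sem.acquire();      // P(sem): blocks while the value is 0
                try {
                    shared++;       // critical section
                } finally {
                    sem.release();  // V(sem): signals a blocked thread, if any
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(t), t2 = new Thread(t);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(shared); // always 2
    }
}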
Semaphores are largely of historical interest. They are the unstructured "goto" statements
of multi-threaded programming. Most programming environments provide higher-level
structured synchronization primitives.
However, like the goto, semaphores are occasionally the best primitive to use. A typical use
of a semaphore is protecting a shared resource of which at most n instances are allowed to
exist simultaneously. The semaphore starts out with value n. A thread that needs to acquire an
instance of the resource performs operation P. It releases the resource using operation V.
Let's examine how semaphores might be used for the producer-consumer problem and
whether the problem can be resolved using semaphores or not. Producer-consumer is a
classic synchronization problem, also known as the bounded-buffer problem.
Here a producer function generates data to be placed in a shared buffer and a consumer
function receives the data out of the buffer and operates on it, where both producer and
consumer functions execute concurrently.
Pseudo-code using a semaphore for the producer-consumer problem is shown in Figure 1.5
below.
Here neither the producer nor the consumer maintains any order. If the producer could run
indefinitely ahead of the consumer, the system would require an infinite buffer capacity, and
that is not possible.
That is why the buffer size needs to be bounded to handle this type of scenario and to make
sure that if the producer gets ahead of the consumer, the time allocated to the producer is
restricted. The synchronization problem can be removed by adding one more semaphore to
the previous solution shown in Figure 1.5 above.
Adding the semaphore maintains the boundary of the buffer, as shown in Figure 1.6 below,
where sEmpty and sFull retain the constraints of buffer capacity for the operating threads.
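Since Figure 1.6 itself is not reproduced here, the following hedged Java sketch conveys the same structure (the names sEmpty and sFull come from the text; the buffer capacity and item type are illustrative):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

public class BoundedBuffer {
    static final int CAPACITY = 8;
    static final Queue<Integer> buffer = new ArrayDeque<>();
    static final Semaphore sEmpty = new Semaphore(CAPACITY); // free slots
    static final Semaphore sFull  = new Semaphore(0);        // filled slots
    static final Semaphore mutex  = new Semaphore(1);        // protects the buffer itself

    static void produce(int item) throws InterruptedException {
        sEmpty.acquire();          // wait for a free slot; bounds the producer
        mutex.acquire();
        buffer.add(item);
        mutex.release();
        sFull.release();           // announce a newly filled slot
    }

    static int consume() throws InterruptedException {
        sFull.acquire();           // wait for a filled slot
        mutex.acquire();
        int item = buffer.remove();
        mutex.release();
        sEmpty.release();          // announce a newly freed slot
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            try { produce(42); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }).start();
        System.out.println(consume()); // prints 42
    }
}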
(ii) Locks
Locks are similar to semaphores, except that a lock is held by at most one thread at any
instant. Two basic atomic operations are performed on a lock:
* acquire(): atomically waits for the lock state to become unlocked, then sets the lock state
to locked.
* release(): atomically changes the lock state from locked to unlocked.
At most one thread acquires the lock. A thread has to acquire a lock before using a shared
resource; otherwise it waits until the lock becomes available.
When one thread wants to access shared data, it first acquires the lock, exclusively performs
operations on the shared data and later releases the lock for other threads to use.
The level of granularity can be either coarse or fine depending on the type of shared data
that needs to be protected from threads. The coarse granular locks have higher lock
contention than finer granular ones.
To remove issues with lock granularity, most of the processors support the Compare and
Swap (CAS) operation, which provides a way to implement lock-free synchronization. The
atomic CAS operations guarantee that the shared data remains synchronized among threads.
If you require the use of locks, it is recommended that you use the lock inside a critical
section with a single entry and a single exit, as shown in Figure 1.7 below.
From an implementation perspective, it is always safer to use explicit locks rather than
relying on implicit locks. In general, a lock must not be held for a long period of time.
Explicit locks are defined by the developer, whereas implicit locks come from the underlying
framework in use, such as a database engine that provides locks to maintain data consistency.
In the producer-consumer problem, if the consumer wants to consume shared data before the
producer produces it, the consumer must wait. To use plain locks for the producer-consumer
problem, the consumer must loop until the data is ready. The reason for looping is that a
lock does not support any wait operation, whereas condition variables do, as the sketch after
this paragraph shows.
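A hedged Java sketch of the condition-variable version (java.util.concurrent.locks.Condition; names are illustrative): the consumer waits on the condition instead of spinning, and the producer signals when the data is ready:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionDemo {
    static final ReentrantLock lock = new ReentrantLock();
    static final Condition dataReady = lock.newCondition();
    static Integer data = null; // shared one-item slot

    public static void main(String[] args) throws InterruptedException {
        Thread consumer = new Thread(() -> {
            lock.lock();
            try {
                while (data == null)   // re-check: guards against spurious wakeups
                    dataReady.await(); // releases the lock while waiting
                System.out.println("consumed " + data);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        consumer.start();

        lock.lock();
        try {
            data = 42;          // producer publishes the data
            dataReady.signal(); // wake the waiting consumer
        } finally {
            lock.unlock();
        }
        consumer.join();
    }
}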
(i) Lock Types
An application can have different types of locks according to the constructs required to
accomplish the task. You must avoid mixing lock types within a given task. For this reason,
special attention is required when using any third party library.
If your application has a third-party dependency for a resource R, and the third party uses
lock type L for R, then whenever you need a lock mechanism for R you must use lock type L
rather than any other lock type. The following sections cover these locks and define their
purposes.
1) Mutexes. The mutex is the simplest lock an implementation can use. Some texts use
the mutex as the basis to describe locks in general. The release of a mutex does not
depend on the release() operation only. A timer attribute can be added with a mutex.
If the timer expires before a release operation, the mutex releases the code block or shared
memory to other threads. A try-finally clause can be used to make sure that the mutex gets
released when an exception occurs. The use of a timer or try-finally clause helps to prevent a
deadlock scenario.
2) Recursive Locks. Recursive locks are locks that may be repeatedly acquired by the
thread that currently owns the lock without causing the thread to deadlock. No other
thread may acquire a recursive lock until the owner releases it once for each time the
owner acquired it.
Thus when using a recursive lock, be sure to balance acquire operations with release
operations. The best way to do this is to lexically balance the operations around single-entry
single-exit blocks, as was shown for ordinary locks.
The recursive lock is most useful inside a recursive function. In general, recursive locks are
slower than non-recursive locks. An example of recursive lock use is shown in Figure 1.8
below.
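Since Figure 1.8 is not reproduced here, a hedged Java sketch shows the idea (ReentrantLock is Java's recursive lock; the function and names are illustrative):

import java.util.concurrent.locks.ReentrantLock;

public class RecursiveLockDemo {
    static final ReentrantLock lock = new ReentrantLock(); // reentrant = recursive

    // A recursive function that re-acquires the lock it already owns
    // without deadlocking on itself.
    static int countdown(int n) {
        lock.lock();              // nested acquire just raises the hold count
        try {
            return (n == 0) ? 0 : countdown(n - 1);
        } finally {
            lock.unlock();        // each acquire is balanced by one release
        }
    }

    public static void main(String[] args) {
        System.out.println(countdown(5)); // prints 0; hold count ends at zero
    }
}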
Figure 1.9. Use of a Condition Variable for the Producer-Consumer Problem

Monitors
For structured synchronization, a higher level construct is introduced for simplifying the use
of condition variables and locks, known as a monitor. The purpose of the monitor is to
simplify the complexity of primitive synchronization operations and remove the
implementation details from application developers.
The compiler for the language that supports monitors automatically inserts lock operations at
the beginning and the end of each synchronization-aware routine.
Most recent programming languages do not support monitors explicitly; rather, they expose
lock and unlock operations to developers. The Java language supports explicit monitor
objects along with synchronized blocks inside a method. In Java, the monitor is maintained
by the "synchronized" construct, as in the following minimal sketch (class and field names
are illustrative):
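public class SynchronizedSketch {
    private final Object mon = new Object(); // any Java object can serve as a monitor
    private boolean ready = false;           // shared state guarded by the monitor

    void awaitReady() throws InterruptedException {
        synchronized (mon) {
            while (!ready)
                mon.wait();      // releases the monitor while waiting
        }
    }

    void setReady() {
        synchronized (mon) {
            ready = true;
            mon.notifyAll();     // wakes threads waiting on this monitor
        }
    }
}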
Here the "condition" primitives are provided by the wait(), notify(), and notifyAll() methods.
Do not confuse this with the Monitor object in the Java SDK, though: the Java Monitor
object is used to perform resource management in Java Management Extensions (JMX).
Similarly, the Monitor object in C# is used as a lock construct.