Concurrency in Python Tutorial
Audience
This tutorial will be useful for graduates, postgraduates, and research students who either
have an interest in this subject or have this subject as a part of their curriculum. The
reader can be a beginner or an advanced learner.
Prerequisites
The reader must have basic knowledge of operating-system concepts such as
concurrency, multiprocessing, threads, and processes. They should also be familiar
with the basic terminology of operating systems and with Python programming concepts.
All the content and graphics published in this e-book are the property of Tutorials Point (I)
Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish
any contents or a part of contents of this e-book in any manner without written consent
of the publisher.
We strive to update the contents of our website and tutorials as timely and as precisely as
possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our
website or its contents including this tutorial. If you discover any errors on our website or
in this tutorial, please notify us at contact@tutorialspoint.com
1. Concurrency in Python – Introduction
In this chapter, we will understand the concept of concurrency in Python and learn about
the different threads and processes.
What is Concurrency?
In simple words, concurrency is the occurrence of two or more events at the same time.
Concurrency is a natural phenomenon because many events occur simultaneously at any
given time.
A thread is the smallest unit of execution within a process. Every thread has:

● A program counter, which holds the address of the next executable instruction
● A stack
● A set of registers
● A unique id
Multithreading, on the other hand, is the ability of a CPU to execute multiple
threads concurrently, with the operating system switching between them. The main
idea of multithreading is to
achieve parallelism by dividing a process into multiple threads. The concept of
multithreading can be understood with the help of the following example.
Example
Suppose we are running a process in which we open MS Word to type content into it.
One thread will be assigned to open MS Word and another thread will be required to
type content in it. And now, if we want to edit the existing content, then yet
another thread will be required to do the editing task, and so on.
A process can have only one thread, called the primary thread, or multiple threads,
each having its own set of registers, program counter and stack. The following
diagram shows the difference:
Multiprocessing, on the other hand, is the use of two or more CPU units within a
single computer system. Our primary goal is to get the full potential from our
hardware. To achieve this, we need to utilize all the CPU cores available in our
computer system. Multiprocessing is the best approach to do so.
Python is one of the most popular programming languages. The following are some of
the reasons that make it suitable for concurrent applications:
Syntactic sugar
Syntactic sugar is syntax within a programming language that is designed to make things
easier to read or to express. It makes the language “sweeter” for human use: things can
be expressed more clearly, more concisely, or in an alternative style based on preference.
Python comes with Magic methods, which can be defined to act on objects. These Magic
methods are used as syntactic sugar and bound to more easy-to-understand keywords.
Large Community
Python language has witnessed a massive adoption rate amongst data scientists and
mathematicians, working in the field of AI, machine learning, deep learning and
quantitative analysis.
However, there are some libraries and alternative implementations of Python, such
as NumPy, Jython and IronPython, that work without any interaction with the GIL.
2. Concurrency in Python – Concurrency vs Parallelism
Both concurrency and parallelism are used in relation to multithreaded programs,
but there is a lot of confusion about the similarity and difference between them.
The big question in this regard: is concurrency parallelism or not? Although both
terms appear quite similar, the answer to the above question is NO; concurrency
and parallelism are not the same. Now, if they are not the same, then what is the
basic difference between them?
In simple terms, concurrency deals with managing access to shared state from
different threads, while parallelism deals with utilizing multiple CPUs or CPU
cores to improve the performance of the hardware.
Concurrency in Detail
Concurrency is when two tasks overlap in execution. It could be a situation where an
application is progressing on more than one task at the same time. We can understand it
diagrammatically; multiple tasks are making progress at the same time, as follows:
Levels of Concurrency
In this section, we will discuss the three important levels of concurrency in terms of
programming:
Low-Level Concurrency
At this level of concurrency, there is explicit use of atomic operations. We
cannot use such concurrency for application building, as it is very error-prone
and difficult to debug. Python itself does not support this kind of concurrency.
Mid-Level Concurrency
At this level, there is no use of explicit atomic operations, but there is use of
explicit locks. Python and other programming languages support this kind of
concurrency, and it is the level most application programmers use.
High-Level Concurrency
At this level, neither explicit atomic operations nor explicit locks are used.
Python provides the concurrent.futures module to support this kind of concurrency.
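For instance, consider this minimal sketch with concurrent.futures (the worker cube is just an illustrative function): the executor handles all the locking and thread management internally, so no atomic operations or explicit locks appear in our code.

```python
from concurrent.futures import ThreadPoolExecutor

def cube(n):
    return n ** 3

# The executor creates, schedules and joins the worker threads for us.
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(cube, [1, 2, 3, 4]))

print(results)  # [1, 8, 27, 64]
```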
Correctness property
The correctness property means that the program or the system must provide the
desired correct answer. To keep it simple, we can say that the system must map the
starting program state to the final state correctly.
Safety property
The safety property means that the program or the system must remain in a “good” or
“safe” state and never does anything “bad”.
Liveness property
This property means that a program or system must “make progress” and would
eventually reach some desirable state.
Sharing of data
An important issue while implementing concurrent systems is the sharing of data
among multiple threads or processes. The programmer must ensure that locks protect
the shared data so that all accesses to it are serialized and only one thread or
process can access the shared data at a time. When multiple threads or processes
all try to access the same shared data, all but one of them will be blocked and
will remain idle. In other words, only one process or thread can proceed at a time
while the lock is in force. There can be some simple solutions to remove the
above-mentioned barriers:
The following Python script is for requesting a web page and getting the time our network
took to get the requested page:
import urllib.request
import time

ts = time.time()
req = urllib.request.urlopen('http://www.tutorialspoint.com')
pageHtml = req.read()
te = time.time()
print("Page Fetching Time : {} Seconds".format(te - ts))
After executing the above script, we can get the page fetching time as shown below.
Output
Page Fetching Time: 1.0991398811340332 Seconds
We can see that the time to fetch the page is more than one second. Now, if we
wanted to fetch thousands of different web pages this way, you can imagine how
much time our network would take.
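To see why this matters, the following sketch simulates the fetches with time.sleep (a stand-in for real network delay, so no actual requests are made; the URLs and timings are illustrative) and compares a sequential loop with one thread per page:

```python
import threading
import time

def fetch_page(url):
    time.sleep(0.05)  # stand-in for the network wait of a real request

urls = ["http://www.tutorialspoint.com/page%d" % i for i in range(5)]

# Sequential: the total time is roughly the sum of all the waits.
start = time.time()
for url in urls:
    fetch_page(url)
sequential = time.time() - start

# Concurrent: the waits overlap, so the total is roughly one wait.
start = time.time()
threads = [threading.Thread(target=fetch_page, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.time() - start

print("Sequential: %.2f s, Concurrent: %.2f s" % (sequential, concurrent))
```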
What is Parallelism?
Parallelism may be defined as the art of splitting tasks into subtasks that can be
processed simultaneously. It differs from concurrency, discussed above, in which
two or more events merely overlap in time. We can understand it diagrammatically;
a task is broken into a number of subtasks that can be processed in parallel, as
follows:
To get more idea about the distinction between concurrency and parallelism, consider the
following points:
Necessity of Parallelism
We can achieve parallelism by distributing the subtasks among different cores of a
single CPU or among multiple computers connected within a network.
If we talk about a real-life example of parallelism, the graphics card of our
computer is the example that highlights the true power of parallel processing,
because it has hundreds of individual processing cores that work independently
and can execute at the same time. Due to this reason, we are able to run high-end
applications and games as well.
Single-core processors
Single-core processors are capable of executing one thread at any given time.
These processors use context switching to store all the necessary information
for a thread at a specific time and then restore that information later. The
context switching mechanism lets us make progress on a number of threads within a
given second, so it looks as if the system is working on multiple things at once.
Single-core processors come with many advantages. These processors require less power
and there is no complex communication protocol between multiple cores. On the other
hand, the speed of single-core processors is limited and it is not suitable for larger
applications.
Multi-core processors
Multi-core processors have multiple independent processing units also called cores.
Such processors do not need a context-switching mechanism, as each core contains
everything it needs to execute a sequence of stored instructions.
Fetch-Decode-Execute Cycle
The cores of multi-core processors follow a cycle for executing. This cycle is called the
Fetch-Decode-Execute cycle. It involves the following steps:
Fetch
This is the first step of cycle, which involves the fetching of instructions from the program
memory.
Decode
The fetched instruction is converted to a series of signals that will trigger
other parts of the CPU.
Execute
It is the final step in which the fetched and the decoded instructions would be executed.
The result of execution will be stored in a CPU register.
One advantage here is that execution in multi-core processors is faster than in
single-core processors, making them suitable for larger applications. On the
other hand, the complex communication protocol between multiple cores is an
issue, and multiple cores require more power than single-core processors.
3. Concurrency in Python – System & Memory Architecture
There are different system and memory architecture styles that need to be
considered while designing a program or concurrent system. This is very necessary
because one system & memory style may be suitable for one task but error-prone
for another.
Advantages of SISD
The advantages of SISD architecture are as follows:
Disadvantages of SISD
The disadvantages of SISD architecture are as follows:
The best example of SIMD is graphics cards. These cards have hundreds of
individual processing units. If we talk about the computational difference between
SISD and SIMD, then for adding the arrays [5, 15, 20] and [15, 25, 10], an SISD
architecture would have to perform three different add operations. On the other
hand, with an SIMD architecture, we can add them in a single add operation.
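The same contrast can be sketched in plain Python: the loop mirrors SISD (one scalar addition per instruction), while the single list expression mirrors the SIMD idea of one operation over the whole vector. In practice, libraries such as NumPy dispatch such vector expressions to real SIMD hardware instructions.

```python
a = [5, 15, 20]
b = [15, 25, 10]

# SISD style: three separate scalar additions, one element at a time.
result_sisd = []
for i in range(len(a)):
    result_sisd.append(a[i] + b[i])

# SIMD style (conceptually): one vector operation over all the elements.
result_simd = [x + y for x, y in zip(a, b)]

print(result_sisd)  # [20, 40, 30]
print(result_simd)  # [20, 40, 30]
```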
Advantages of SIMD
The advantages of SIMD architecture are as follows:
The same operation on multiple elements can be performed using one instruction only.
Disadvantages of SIMD
The disadvantages of SIMD architecture are as follows:
A normal multiprocessor uses the MIMD architecture. These architectures are basically
used in a number of application areas such as computer-aided design/computer-aided
manufacturing, simulation, modeling, communication switches, etc.
When all the processors have equal access to all the peripheral devices, the system is
called a symmetric multiprocessor. When only one or a few processors can access the
peripheral devices, the system is called an asymmetric multiprocessor.
4. Concurrency in Python – Threads
In general, we know that a thread is a very thin twisted string, usually of cotton
or silk fabric, used for sewing clothes and the like. The same term, thread, is
also used in the world of computer programming. Now, how do we relate the thread
used for sewing clothes to the thread used in computer programming? The roles
performed by the two are similar. In clothes, thread holds the cloth together; in
computer programming, a thread holds a program together and allows it to execute
sequential actions or many actions at once.
A thread is the smallest unit of execution in an operating system. It is not in
itself a program but runs within a program. In other words, threads are not
independent of one another and share the code section, data section, etc. with
other threads. They are also known as lightweight processes.
States of Thread
To understand the functionality of threads in depth, we need to learn about the lifecycle
of the threads or the different thread states. Typically, a thread can exist in five distinct
states. The different states are shown below:
New Thread
A new thread begins its life cycle in the new state. However, at this stage, it has not yet
started and it has not been allocated any resources. We can say that it is just an instance
of an object.
Runnable
As the newly born thread is started, the thread becomes runnable i.e. waiting to run. In
this state, it has all the resources but still task scheduler have not scheduled it to run.
Running
In this state, the thread makes progress and executes the task, which has been chosen
by task scheduler to run. Now, the thread can go to either the dead state or the non-
runnable/ waiting state.
Non-running/waiting
In this state, the thread is paused because it is either waiting for the response of some
I/O request or waiting for the completion of the execution of other thread.
Dead
A runnable thread enters the terminated state when it completes its task or otherwise
terminates.
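The lifecycle above can be observed with the threading module's is_alive() method; in the sketch below, the worker simply sleeps to keep the thread alive for a moment (the function name and timings are illustrative):

```python
import threading
import time

def worker():
    time.sleep(0.1)  # non-running/waiting: sleeping on a timer

t = threading.Thread(target=worker)  # new: just an object, no resources yet
print(t.is_alive())                  # False: not yet started

t.start()                            # runnable, then running
print(t.is_alive())                  # True: started and not yet finished

t.join()                             # wait for the thread to terminate
print(t.is_alive())                  # False: dead
```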
Types of Thread
In this section, we will see the different types of thread. The types are described below:
User Level Threads
In this case, the thread management kernel is not aware of the existence of
threads. The thread library contains code for creating and destroying threads,
for passing messages and data between threads, for scheduling thread execution
and for saving and restoring thread contexts. The application starts with a
single thread. Examples of user-level threads are:

● Java threads
● POSIX threads
Kernel Level Threads
In this case, the kernel does the thread management. There is no thread
management code in the application area. Kernel threads are supported directly by
the operating system. Any application can be programmed to be multithreaded. All
of the threads within an application are supported within a single process.
The kernel maintains context information for the process as a whole and for
individual threads within the process. Scheduling by the kernel is done on a
thread basis. The kernel performs thread creation, scheduling and management in
kernel space. Kernel threads are generally slower to create and manage than user
threads. Examples of kernel-level threads are Windows and Solaris.
● The kernel can simultaneously schedule multiple threads from the same process
on multiple processors.
● If one thread in a process is blocked, the Kernel can schedule another thread of the
same process.
● Transfer of control from one thread to another within the same process requires a
mode switch to the Kernel.
● Thread state: It contains the information related to the state (Running,
Runnable, Non-Running, Dead) of the thread.

● Program Counter (PC): It points to the current program instruction of the
thread.

● Register set: It contains the thread’s register values used for computations.

● Stack Pointer: It points to the thread’s stack in the process, which contains
the local variables under the thread’s scope.

● Pointer to PCB: It contains a pointer to the process that created the thread.
The following table shows the comparison between process and thread:

6. Process: In multiple processes, each process operates independently of the
others. Thread: One thread can read, write or change another thread's data.

7. Process: A change in the parent process does not affect the child processes.
Thread: A change in the main thread may affect the behavior of the other threads
of that process.
Concept of Multithreading
As we have discussed earlier, multithreading is the ability of a CPU to execute
multiple threads concurrently. The main idea of multithreading is to achieve
parallelism by dividing a process into multiple threads. In simpler terms, we can
say that multithreading is a way of achieving multitasking by using the concept
of threads.
The concept of multithreading can be understood with the help of the following example.
Example
Suppose we are running a process, say opening MS Word to write something. In such
a process, one thread will be assigned to open MS Word and another thread will be
required to write. Now, if we want to edit something, then yet another thread
will be required to do the editing task, and so on.
The following diagram helps us understand how multiple threads exist in memory:
We can see in the above diagram that more than one thread can exist within one process
where every thread contains its own register set and local variables. Other than that, all
the threads in a process share global variables.
Pros of Multithreading
Let us now see a few advantages of multithreading. The advantages are as follows:
Sharing of data: There is no requirement of extra space for each thread because
threads within a program can share same data.
Cons of Multithreading
Let us now see a few disadvantages of multithreading. The disadvantages are as follows:
Issue of security: As we know, all the threads within a program share the same
data, hence there is always an issue of security, because any unknown thread can
change the data.

Lead to deadlock state: Multithreading can expose the program to the potential
risk of attaining a deadlock state.
5. Concurrency in Python – Implementation of Threads

Python provides two modules for implementing threads in a program:

● <_thread> module
● <threading> module
The main difference between these two modules is that <_thread> module treats a
thread as a function whereas, the <threading> module treats every thread as an object
and implements it in an object oriented way. Moreover, the <_thread> module is
effective in low level threading and has fewer capabilities than the <threading> module.
<_thread> module
In earlier versions of Python, we had the <thread> module, but it was considered
"deprecated" for quite a long time, and users were encouraged to use the
<threading> module instead. Therefore, in Python 3 the module "thread" is not
available anymore; it has been renamed to "<_thread>" for backward compatibility.
To generate a new thread with the help of the <_thread> module, we need to call
its start_new_thread method. The working of this method can be understood with
the help of the following syntax:

_thread.start_new_thread ( function, args[, kwargs] )

Here −

● function − the function the thread will run
● args − a tuple of arguments to pass to the function
● kwargs − an optional dictionary of keyword arguments

If we want to call the function without passing an argument, then we need to use
an empty tuple of arguments in args.
This method call returns immediately, the child thread starts, and calls function with the
passed list, if any, of args. The thread terminates as and when the function returns.
Example
Following is an example for generating new thread by using the <_thread> module. We
are using the start_new_thread() method here.
import _thread
import time

# Define a function for the threads to run
def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print("%s: %s" % (threadName, time.ctime(time.time())))

try:
    _thread.start_new_thread( print_time, ("Thread-1", 2, ) )
    _thread.start_new_thread( print_time, ("Thread-2", 4, ) )
except:
    print ("Error: unable to start thread")

# Keep the main thread alive so the child threads can finish
while 1:
    pass
Output
The following output will help us understand the generation of new threads with the help
of the <_thread> module.
<threading> module
The <threading> module implements threading in an object-oriented way and treats
every thread as an object. Therefore, it provides much more powerful, high-level
support for threads than the <_thread> module. This module has been included with
Python since version 2.4.
threading.enumerate() – This method returns a list of all thread objects that are
currently active.
For implementing threading, the <threading> module has the Thread class which
provides the following methods:
o start() − The start() method starts a thread by calling the run method.
Step 1: In this step, we need to define a new subclass of the Thread class.

Step 2: In this step, we need to override the __init__(self [,args]) method to
add additional arguments.

Step 3: In this step, we need to override the run(self [,args]) method to
implement what the thread should do when started.

Now, after creating the new Thread subclass, we can create an instance of it and
then start a new thread by invoking start(), which in turn calls the run()
method.
Example
Consider this example to learn how to generate a new thread by using the <threading>
module.
import threading
import time

class myThread(threading.Thread):
    def __init__(self, name, delay):
        threading.Thread.__init__(self)
        self.name = name
        self.delay = delay
    def run(self):
        print ("Starting " + self.name)
        time.sleep(self.delay)
        print ("Exiting " + self.name)

thread1 = myThread("Thread-1", 1)
thread2 = myThread("Thread-2", 2)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print ("Exiting Main Thread")
Output
Now, consider the following output:

Starting Thread-1
Starting Thread-2
The following Python program, with the help of the start(), sleep() and join()
methods, will show how a thread enters the running, waiting and dead states
respectively.
Step 1: Import the necessary modules.

import threading
import time

Step 2: Define a function, which will be called by the thread. It prints a
message when the thread enters the running state.

def thread_states():
    print("Thread entered in running state")
Step 3: We are using the sleep() method of the time module to make our thread
wait for, say, 2 seconds.
time.sleep(2)
Step 4: Now, we create a thread named T1, whose target is the function defined
above.
T1 = threading.Thread(target=thread_states)
Step 5: Now, with the help of the start() function we can start our thread. It will produce
the message, which has been set by us while defining the function.
T1.start()
Thread entered in running state
Step 6: Now, at last, we can finish the thread with the join() method, which
waits until the thread completes its execution and enters the dead state.
T1.join()
import threading
import time
import random
def Thread_execution(i):
    print("Execution of Thread {} started\n".format(i))
    sleepTime = random.randint(1, 4)
    time.sleep(sleepTime)
    print("Execution of Thread {} finished".format(i))
for i in range(4):
    thread = threading.Thread(target=Thread_execution, args=(i,))
    thread.start()
    print("Active Threads:", threading.enumerate())
Output
Execution of Thread 0 started
Active Threads:
[<_MainThread(MainThread, started 6040)>,
<HistorySavingThread(IPythonHistorySavingThread, started 5968)>,
<Thread(Thread-3576, started 3932)>]
Execution of Thread 1 started
Active Threads:
[<_MainThread(MainThread, started 6040)>,
<HistorySavingThread(IPythonHistorySavingThread, started 5968)>,
<Thread(Thread-3576, started 3932)>, <Thread(Thread-
import threading
import time
def nondaemonThread():
    print("starting my thread")
    time.sleep(8)
    print("ending my thread")
def daemonThread():
    while True:
        print("Hello")
        time.sleep(2)
if __name__ == '__main__':
    nondaemonThread = threading.Thread(target=nondaemonThread)
    daemonThread = threading.Thread(target=daemonThread)
    daemonThread.daemon = True
    daemonThread.start()
    nondaemonThread.start()
In the above code, there are two functions, namely nondaemonThread() and
daemonThread(). The first function prints its state and sleeps for 8 seconds,
while the daemonThread() function prints Hello every 2 seconds indefinitely. We
can understand the difference between nondaemon and daemon threads with the help
of the following output:
Output
Hello
starting my thread
Hello
Hello
Hello
Hello
ending my thread
Hello
Hello
Hello
Hello
Hello
6. Concurrency in Python – Synchronizing Threads
Thread synchronization may be defined as a method with the help of which we can
be assured that two or more concurrent threads are not simultaneously accessing
the program segment known as the critical section. A critical section is the part
of the program where a shared resource is accessed. Hence, we can say that
synchronization is the process of making sure that two or more threads do not
interfere with each other by accessing the resources at the same time. The
diagram below shows four threads trying to access the critical section of a
program at the same time.
To make it clearer, suppose two or more threads try to add an object to a list at
the same time. This act cannot lead to a successful end, because either it will
drop one or all of the objects, or it will completely corrupt the state of the
list. Here, the role of synchronization is to ensure that only one thread at a
time can access the list.
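The list scenario above can be sketched as follows: a threading.Lock serializes the appends so that exactly one thread mutates the list at a time. (In CPython, the GIL already makes a single append atomic, so the lock here mainly makes the intent explicit; the counts are illustrative.)

```python
import threading

items = []
lock = threading.Lock()

def add_items(count):
    for i in range(count):
        with lock:          # only one thread at a time enters this block
            items.append(i)

threads = [threading.Thread(target=add_items, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(items))  # 4000: no appends were lost
```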
The classical issues caused by the lack of synchronization are:

● Deadlock
● Race condition

Race condition
This is one of the major issues in concurrent programming. Concurrent access to
shared resources can lead to a race condition. A race condition may be defined as
the occurrence of a condition when two or more threads can access shared data and
try to change its value at the same time. Due to this, the values of variables
may be unpredictable and vary depending on the timing of the context switches of
the processes.
Example
Consider this example to understand the concept of race condition:
Step 1: Import the threading module:

import threading
Step 2: Now, define a global variable, say x, along with its value as 0:
x = 0
Step 3: Now, we need to define the increment_global() function, which will
increment the global variable x by 1:

def increment_global():
    global x
    x += 1
Step 4: In this step, we will define the taskofThread() function, which will call
the increment_global() function a specified number of times; for our example,
50000 times:

def taskofThread():
    for _ in range(50000):
        increment_global()
Step 5: Now, define the main() function in which threads t1 and t2 are created.
Both are started with the start() function, and we wait until they finish their
jobs with the join() function.

def main():
    global x
    x = 0
    t1 = threading.Thread(target=taskofThread)
    t2 = threading.Thread(target=taskofThread)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
Step 6: Now, we need to specify how many iterations of the main() function we
want to run. Here, we are calling it 5 times:

if __name__ == "__main__":
    for i in range(5):
        main()
        print("x = {1} after Iteration {0}".format(i, x))
In the output shown below, we can see the effect of the race condition: the value
of x after each iteration is expected to be 100000, but there is a lot of
variation in the actual values. This is due to the concurrent access of the
threads to the shared global variable x.
Output
x = 100000 after Iteration 0
x = 54034 after Iteration 1
x = 80230 after Iteration 2
x = 93602 after Iteration 3
x = 93289 after Iteration 4
acquire() method
This method is used to acquire, i.e., blocking a lock. A lock can be blocking or non-blocking
depending upon the following true or false value:
With value set to True: If the acquire() method is invoked with True, which is
the default argument, then the thread execution is blocked until the lock is
unlocked.
With the value set to False: If the acquire() method is invoked with False,
which is not the default argument, then the call does not block; it acquires the
lock and returns True if the lock is free, or returns False immediately if the
lock is already held.
release() method
This method is used to release a lock. Following are a few important tasks related to this
method:
If a lock is locked, then the release() method unlocks it. Its job is to allow
exactly one thread to proceed if more than one thread is blocked and waiting for
the lock to become unlocked.
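The two methods can be seen together in a small sketch; note how a non-blocking acquire(False) on an already-held lock returns False immediately instead of waiting:

```python
import threading

lock = threading.Lock()

held = lock.acquire(True)    # blocking acquire; returns True once the lock is held
print(held)                  # True

retry = lock.acquire(False)  # non-blocking; the lock is already held, so: False
print(retry)                 # False

lock.release()               # unlock, allowing one waiting thread to proceed
print(lock.acquire(False))   # True: the lock was free again
```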
Now, we can rewrite the above program with the Lock class and its methods to
avoid the race condition. We need to define the taskofThread() method with a lock
argument and then use the acquire() and release() methods to lock and unlock the
lock around the shared update.
Example
The following is an example Python program to understand the concept of locks
for dealing with the race condition:

import threading

x = 0

def increment_global():
    global x
    x += 1

def taskofThread(lock):
    for _ in range(50000):
        lock.acquire()
        increment_global()
        lock.release()

def main():
    global x
    x = 0
    lock = threading.Lock()
    t1 = threading.Thread(target=taskofThread, args=(lock,))
    t2 = threading.Thread(target=taskofThread, args=(lock,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
if __name__ == "__main__":
    for i in range(5):
        main()
        print("x = {1} after Iteration {0}".format(i, x))
The following output shows that the effect of the race condition is eliminated,
as the value of x after each and every iteration is now 100000, which is as per
the expectation of this program.
Output
x = 100000 after Iteration 0
x = 100000 after Iteration 1
x = 100000 after Iteration 2
x = 100000 after Iteration 3
x = 100000 after Iteration 4
Edsger Dijkstra originally introduced the dining philosophers problem, one of the
famous illustrations of one of the biggest problems of concurrent systems, called
deadlock.
In this problem, there are five famous philosophers sitting at a round table,
eating some food from their bowls. There are five forks that can be used by the
five philosophers to eat their food. However, each philosopher needs two forks at
the same time to eat their food.
Now, there are two main conditions for the philosophers. First, each of the philosophers
can be either in eating or in thinking state and second, they must first obtain both the
forks, i.e., left and right. The issue arises when each of the five philosophers manages to
pick the left fork at the same time. Now they all are waiting for the right fork to be free
but they will never relinquish their fork until they have eaten their food and the right fork
would never be available. Hence, there would be a deadlock state at the dinner table.
Example
The following Python program will help us find a solution to the dining philosopher
problem:
import threading
import random
import time

class DiningPhilosopher(threading.Thread):
    running = True

    def __init__(self, name, Leftfork, Rightfork):
        threading.Thread.__init__(self)
        self.name = name
        self.Leftfork = Leftfork
        self.Rightfork = Rightfork

    def run(self):
        while(self.running):
            time.sleep(random.uniform(3, 13))
            print ('%s is hungry.' % self.name)
            self.dine()

    def dine(self):
        fork1, fork2 = self.Leftfork, self.Rightfork
        while self.running:
            fork1.acquire(True)
            locked = fork2.acquire(False)
            if locked: break
            fork1.release()
            print ('%s swaps forks' % self.name)
            fork1, fork2 = fork2, fork1
        else:
            return
        self.dining()
        fork2.release()
        fork1.release()

    def dining(self):
        print ('%s starts eating ' % self.name)
        time.sleep(random.uniform(1, 10))
        print ('%s finishes eating and now thinking.' % self.name)

def Dining_Philosophers():
    forks = [threading.Lock() for n in range(5)]
    philosopherNames = ('1st', '2nd', '3rd', '4th', '5th')
    philosophers = [DiningPhilosopher(philosopherNames[i],
                    forks[i % 5], forks[(i + 1) % 5]) for i in range(5)]
    random.seed()
    DiningPhilosopher.running = True
    for p in philosophers: p.start()
    time.sleep(30)
    DiningPhilosopher.running = False
    print (" It is finishing.")

Dining_Philosophers()
The above program uses the concept of greedy and generous philosophers. The program also uses the acquire() and release() methods of the Lock class of the threading module. We can see the solution in the following output:
Output
4th is hungry.
4th starts eating
1st is hungry.
1st starts eating
2nd is hungry.
5th is hungry.
3rd is hungry.
1st finishes eating and now thinking.3rd swaps forks
It is finishing.
7. Concurrency in Python – Threads Intercommunication
In real life, if a team of people is working on a common task, then there should be communication between them to finish the task properly. The same analogy applies to threads. In programming, to reduce the idle time of the processor, we create multiple threads and assign different sub-tasks to each thread. Hence, there must be a communication facility, and the threads should interact with each other to finish the job in a synchronized manner.
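As a concrete illustration of such intercommunication, two threads can exchange work items through a shared queue.Queue. The producer/consumer names below are illustrative; the sentinel value None signals the end of the stream:

```python
import threading
import queue

channel = queue.Queue()
results = []

def producer():
    # Send three messages, then a sentinel telling the consumer to stop.
    for i in range(3):
        channel.put("message-%d" % i)
    channel.put(None)

def consumer():
    while True:
        item = channel.get()
        if item is None:   # sentinel: producer has finished
            break
        results.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)   # ['message-0', 'message-1', 'message-2']
```

Because queue.Queue handles its own locking, neither thread needs an explicit Lock; this is the pattern the Queues section below builds on.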
Sets
For using the set data structure in a thread-safe manner, we need to extend the set class to implement our own locking mechanism.
Example
Here is a Python example of extending the class:
from threading import Lock

class extend_class(set):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(extend_class, self).__init__(*args, **kwargs)

    def add(self, elem):
        self._lock.acquire()
        try:
            super(extend_class, self).add(elem)
        finally:
            self._lock.release()

    def delete(self, elem):
        self._lock.acquire()
        try:
            # set has no delete(); remove() provides the equivalent behavior
            super(extend_class, self).remove(elem)
        finally:
            self._lock.release()
In the above example, a class named extend_class has been defined, which inherits from the Python set class. A lock object is created within the constructor of this class. The two methods, add() and delete(), are thread-safe: each relies on the superclass functionality, with one key exception, namely that the call is wrapped in the lock.
Decorator
Another key method for thread-safe communication is the use of decorators.
Example
Consider a Python example that shows how to use decorators:
from threading import Lock

def lock_decorator(method):
    def new_deco_method(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return new_deco_method

class Decorator_class(set):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(Decorator_class, self).__init__(*args, **kwargs)

    @lock_decorator
    def add(self, elem):
        return super(Decorator_class, self).add(elem)

    @lock_decorator
    def delete(self, elem):
        # set has no delete(); remove() provides the equivalent behavior
        return super(Decorator_class, self).remove(elem)
In the above example, a decorator named lock_decorator has been defined. It wraps a method so that the method body runs while holding the lock created in the constructor of Decorator_class. The two decorated methods, add() and delete(), are thereby made thread-safe; they both rely on the superclass functionality, with one key exception, namely that every call happens under the lock.
Lists
The list data structure is a thread-safe, quick and easy structure for temporary, in-memory storage. In CPython, the GIL protects against concurrent access to it. As we came to know, the list itself is thread-safe, but what about the data lying in it? Actually, the list's data is not protected. For example, L.append(x) is not guaranteed to produce the expected result if another thread is modifying the list at the same time. Although append() is an atomic, thread-safe operation, another thread modifying the list's data in concurrent fashion can still produce the side effects of race conditions in the output.
To resolve this kind of issue and safely modify the data, we must implement a proper locking mechanism, which ensures that multiple threads cannot run into race conditions. To implement a proper locking mechanism, we can extend the class as we did in the previous examples.
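Following the same pattern as the set example, a list can be wrapped with a lock. The sketch below (the names LockedList and locked_append are illustrative) serializes the append so four racing threads cannot corrupt the structure:

```python
import threading

class LockedList(list):
    def __init__(self, *args, **kwargs):
        self._lock = threading.Lock()
        super(LockedList, self).__init__(*args, **kwargs)

    def locked_append(self, item):
        # Hold the lock for the whole operation, not just for
        # the single underlying append() call.
        with self._lock:
            super(LockedList, self).append(item)

shared = LockedList()

def worker():
    for i in range(1000):
        shared.locked_append(i)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(shared))   # 4000
```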
The following operations are atomic in CPython (here, L, L1, L2 are lists; D, D1, D2 are dicts; x, y are objects; i, j are ints):
L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()
Queues
If the list’s data is not protected, we might have to face the consequences: we may get or delete the wrong data item because of race conditions. That is why it is recommended to use the queue data structure instead. A real-world example of a queue is a single-lane one-way road, where the vehicle that enters first exits first. More real-world examples are the queues at ticket windows and bus stops.
Queues are thread-safe data structures by default, and we need not worry about implementing a complex locking mechanism. Python provides the queue module for using different types of queues in our applications.
Types of Queues
In this section, we will learn about the different types of queues. Python provides three types of queues in the queue module:
Normal queue (FIFO, First In First Out)
LIFO (Last In First Out) queue
Priority queue
FIFO queue
In Python, a FIFO queue can be implemented with a single thread as well as with multiple threads.
Example
Following is a Python program for implementation of FIFO queue with single thread:
import queue

q = queue.Queue()
for i in range(8):
    q.put("item-" + str(i))
while not q.empty():
    print(q.get(), end=" ")
Output
item-0 item-1 item-2 item-3 item-4 item-5 item-6 item-7
The output shows that the above program uses a single thread to illustrate that the elements are removed from the queue in the same order in which they were inserted.
Example
Following is a Python program for implementation of FIFO queue with multiple threads:
import threading
import queue
import random
import time

def myqueue(q):
    while True:
        try:
            # A non-blocking get avoids a thread blocking forever when
            # another thread empties the queue first.
            item = q.get(block=False)
        except queue.Empty:
            break
        print("{} removed {} from the queue".format(threading.current_thread(), item))
        q.task_done()
        time.sleep(2)

q = queue.Queue()
for i in range(5):
    q.put(i)

threads = []
for i in range(4):
    thread = threading.Thread(target=myqueue, args=(q,))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Output
<Thread(Thread-3654, started 5044)> removed 0 from the queue
<Thread(Thread-3655, started 3144)> removed 1 from the queue
<Thread(Thread-3656, started 6996)> removed 2 from the queue
<Thread(Thread-3657, started 2672)> removed 3 from the queue
<Thread(Thread-3654, started 5044)> removed 4 from the queue
LIFO queue
Example
Following is a Python program for implementation of the LIFO queue with a single thread:
import queue

q = queue.LifoQueue()
for i in range(8):
    q.put("item-" + str(i))
while not q.empty():
    print(q.get(), end=" ")
Output
item-7 item-6 item-5 item-4 item-3 item-2 item-1 item-0
The output shows that the above program uses a single thread to illustrate that the elements are removed from the queue in the opposite order of their insertion.
Example
Following is a Python program for implementation of LIFO queue with multiple threads:
import threading
import queue
import random
import time

def myqueue(q):
    while True:
        try:
            item = q.get(block=False)
        except queue.Empty:
            break
        print("{} removed {} from the queue".format(threading.current_thread(), item))
        q.task_done()
        time.sleep(2)

q = queue.LifoQueue()
for i in range(5):
    q.put(i)

threads = []
for i in range(4):
    thread = threading.Thread(target=myqueue, args=(q,))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Output
<Thread(Thread-3882, started 4928)> removed 4 from the queue
<Thread(Thread-3883, started 4364)> removed 3 from the queue
<Thread(Thread-3884, started 6908)> removed 2 from the queue
<Thread(Thread-3885, started 3584)> removed 1 from the queue
<Thread(Thread-3882, started 4928)> removed 0 from the queue
Priority queue
In FIFO and LIFO queues, the order of the items is related to the order of insertion. However, there are many cases when the priority is more important than the order of insertion. Let us consider a real-world example. Suppose the security staff at an airport is checking people of different categories. People in the VVIP and airline-staff categories, customs officers, and so on may be checked on a priority basis instead of on the basis of arrival, as is done for the common passengers.
Another important use case for a priority queue is a task scheduler. One common design is to serve the most urgent task in the queue on a priority basis. This data structure can be used to pick up items from the queue based on their priority value.
Example
Consider the following Python program for implementation of Priority queue with single
thread:
import queue as Q

p_queue = Q.PriorityQueue()
p_queue.put((2, 'Urgent'))
p_queue.put((1, 'Most Urgent'))
p_queue.put((10, 'Nothing important'))
p_queue.put((5, 'Important'))
while not p_queue.empty():
    item = p_queue.get()
    print('%s - %s' % item)
Output
1 - Most Urgent
2 - Urgent
5 - Important
10 - Nothing important
In the above output, we can see that the queue has stored the items based on priority – a lower value indicates a higher priority.
Example
The following Python program helps in the implementation of priority queue with multiple
threads:
import threading
import queue
import random
import time

def myqueue(q):
    while True:
        try:
            item = q.get(block=False)
        except queue.Empty:
            break
        print("{} removed {} from the queue".format(threading.current_thread(), item))
        q.task_done()
        time.sleep(1)

q = queue.PriorityQueue()
for i in range(5):
    q.put(i)
for i in range(5):
    q.put(i)

threads = []
for i in range(2):
    thread = threading.Thread(target=myqueue, args=(q,))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Output
<Thread(Thread-4939, started 2420)> removed 0 from the queue
<Thread(Thread-4940, started 3284)> removed 0 from the queue
<Thread(Thread-4939, started 2420)> removed 1 from the queue
<Thread(Thread-4940, started 3284)> removed 1 from the queue
<Thread(Thread-4939, started 2420)> removed 2 from the queue
<Thread(Thread-4940, started 3284)> removed 2 from the queue
<Thread(Thread-4939, started 2420)> removed 3 from the queue
8. Concurrency in Python – Testing Thread Applications
In this chapter, we will learn about testing of thread applications. We will also learn the
importance of testing.
Why to Test?
Before we dive into the discussion about the importance of testing, we need to know what testing is. In general terms, testing is a technique of finding out how well something is working. On the other hand, specifically for computer programs or software, testing is the technique of assessing the functionality of a software program.
Satisfaction of customers
The most important part of any business is the satisfaction of its customers. By providing bug-free, good-quality software, companies can achieve customer satisfaction.
User experience
Another very important part of any business is the experience of the users of the product. Only testing can ensure that the end user finds the product simple and easy to use.
What to Test?
It is always recommended to have appropriate knowledge of what is to be tested. In this section, we will first understand the prime motive of the tester while testing any software.
Focusing only on code coverage, i.e., how many lines of code our test suite hits while testing, should be avoided. It is because, while testing, focusing only on the number of lines of code adds no real value to our system. Some bugs may remain and show up at a later stage, even after deployment.
We need to focus on testing the functionality of the code rather than the code coverage.
We need to test the most important parts of the code first and then move towards the less important parts of the code. It will definitely save time.
The tester must have a multitude of different tests that can push the software up to its limits.
Testing techniques for concurrent software programs focus extensively on selecting interleavings that expose potentially harmful patterns like race conditions, deadlocks, and violations of atomicity. Following are two approaches for testing concurrent software programs:
Systematic exploration
This approach aims to explore the space of interleavings as broadly as possible. Some such approaches adopt a brute-force technique, while others adopt a partial-order reduction technique or a heuristic technique to explore the space of interleavings.
Property-driven
Property-driven approaches rely on the observation that concurrency faults are more likely to occur under interleavings that expose specific properties, such as suspicious memory-access patterns. Different property-driven approaches target different faults, like race conditions, deadlocks, and violations of atomicity, each depending on one or another specific property.
Testing Strategies
Test Strategy is also known as test approach. The strategy defines how testing would be
carried out. Test approach has two techniques:
Proactive
An approach in which the test design process is initiated as early as possible in order to
find and fix the defects before the build is created.
Reactive
An approach in which the testing does not start until the completion of the development
process.
Before applying any test strategy or approach to a Python program, we must have a basic idea about the kinds of errors a software program may have. The errors are as follows:
Syntactical errors
During program development, there can be many small errors. The errors are mostly due to typing mistakes, for example, a missing colon or the wrong spelling of a keyword. Such errors are due to mistakes in the program syntax and not in the logic. Hence, these errors are called syntactical errors.
Semantic errors
The semantic errors are also called logical errors. If there is a logical or semantic error in a software program, then the statement will compile and run correctly, but it will not give the desired output because the logic is not correct.
Unit Testing
This is one of the most widely used strategies for testing Python programs. It is used for testing units or components of the code. By units or components, we mean classes or functions of the code. Unit testing simplifies the testing of large programming systems by testing "small" units. With the help of the above concept, unit testing may be defined as a method where individual units of source code are tested to determine if they return the desired output.
In our subsequent sections, we will learn about the different Python modules for unit
testing.
unittest module
The very first module for unit testing is the unittest module. It is inspired by JUnit and is included in the Python standard library by default. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework.
Test fixture
It is used to set up a test so that it can be run before starting the test and torn down after the test finishes. It may involve the creation of temporary databases, directories, etc. needed before starting the test.
Test case
The test case checks whether a required response is coming from a specific set of inputs or not. The unittest module includes a base class named TestCase, which can be used to create new test cases. It includes two default methods:
setUp() - a hook method for setting up the test fixture before exercising it. This is called before calling each implemented test method.
tearDown() - a hook method for deconstructing the test fixture after a test method has been run.
Test suite
It is a collection of test cases, test suites, or both.
Test runner
It controls the running of the test cases or suites and provides the outcome to the user. It may use a GUI or a simple text interface for providing the outcome.
Example
The following Python program uses the unittest module to test a module named Fibonacci. The program helps in calculating the Fibonacci series of a number. In this example, we have created a class named Fibo_Test to define the test cases by using different methods. These methods are inherited from unittest.TestCase. We are using the two default methods – setUp() and tearDown(). We also define the testfibocal method. The name of a test method must start with the word test. In the final block, unittest.main() provides a command-line interface to the test script.
import unittest

def fibonacci(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

class Fibo_Test(unittest.TestCase):
    def setUp(self):
        print("This is run before our tests would be executed")

    def tearDown(self):
        print("This is run after the completion of execution of our tests")

    def testfibocal(self):
        self.assertEqual(fibonacci(0), 0)
        self.assertEqual(fibonacci(1), 1)
        self.assertEqual(fibonacci(5), 5)
        self.assertEqual(fibonacci(10), 55)
        self.assertEqual(fibonacci(20), 6765)

if __name__ == "__main__":
    unittest.main()
When run from the command line, the above script produces an output that looks like this:
Output
This is run before our tests would be executed
This is run after the completion of execution of our tests
.
----------------------------------------------------------------------
Ran 1 test in 0.006s
OK
Now, to make it clearer, we are changing our code which helped in defining the Fibonacci
module.
def fibonacci(n):   # original definition
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

def fibonacci(n):   # changed definition
    a, b = 1, 1
    for i in range(n):
        a, b = b, a + b
    return a
Now, after running the script with the changed code, we will get the following output:
----------------------------------------------------------------------
Ran 1 test in 0.007s
FAILED (failures=1)
The above output shows that the module has failed to give the desired output.
doctest module
The doctest module also helps in unit testing. It also comes prepackaged with Python. It is easier to use than the unittest module; the unittest module is more suitable for complex tests. For using the doctest module, we need to import it. The docstring of the corresponding function must contain an interactive Python session along with its outputs.
If everything is fine in our code, then there will be no output from the doctest module; otherwise, it will provide the output.
Example
The following Python example uses the doctest module to test a module named Fibonacci, which helps in calculating the Fibonacci series of a number.
import doctest

def fibonacci(n):
    """
    Calculates the Fibonacci number

    >>> fibonacci(0)
    0
    >>> fibonacci(1)
    1
    >>> fibonacci(10)
    55
    >>> fibonacci(20)
    6765
    >>>
    """
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    doctest.testmod()
We can see that the docstring of the function named fibonacci contains an interactive Python session along with the outputs. If our code is fine, then there is no output from the doctest module. But to see how it works, we can run it with the -v option.
4 tests in __main__.fibonacci
4 tests in 2 items.
4 passed and 0 failed.
Test passed.
Now, we will change the code that helped in defining the Fibonacci module.
def fibonacci(n):   # original definition
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

def fibonacci(n):   # changed definition
    a, b = 1, 1
    for i in range(n):
        a, b = b, a + b
    return a
After running the script with the changed code, even without the -v option, we will get the output shown below.
Output
(base) D:\ProgramData>python dock_test.py
**********************************************************************
File "unitg.py", line 6, in __main__.fibonacci
Failed example:
fibonacci(0)
Expected:
0
Got:
1
**********************************************************************
File "unitg.py", line 10, in __main__.fibonacci
Failed example:
fibonacci(10)
Expected:
55
Got:
89
**********************************************************************
File "unitg.py", line 12, in __main__.fibonacci
Failed example:
fibonacci(20)
Expected:
6765
Got:
10946
**********************************************************************
1 items had failures:
3 of 4 in __main__.fibonacci
***Test Failed*** 3 failures.
We can see in the above output that three tests have failed.
9. Concurrency in Python – Debugging Thread Applications
In this chapter, we will learn how to debug thread applications. We will also learn the
importance of debugging.
What is Debugging?
In computer programming, debugging is the process of finding and removing bugs, errors, and abnormalities from a computer program. This process starts as soon as the code is written and continues in successive stages as the code is combined with other units of programming to form a software product. Debugging is part of the software testing process and is an integral part of the entire software development life cycle.
Python Debugger
The Python debugger, or pdb, is part of the Python standard library. It is a good fallback tool for tracking down hard-to-find bugs and allows us to fix faulty code quickly and reliably. The two most important tasks of the pdb debugger are checking the values of variables at runtime and stepping through the code, setting breakpoints along the way. A pdb session can be started by inserting the following lines into a script:
import pdb
pdb.set_trace()
Following are some of the most useful pdb commands:
h(help) - display the list of available commands
d(down) - move the current frame one level down in the stack trace
u(up) - move the current frame one level up in the stack trace
b(break) - set a breakpoint
cl(clear) - clear a breakpoint
l(list) - list the source code around the current line
n(next) - continue execution until the next line in the current function
c(continue) - continue execution until a breakpoint is encountered
s(step) - execute the current line and stop at the first possible occasion
r(return) - continue execution until the current function returns
For example, invoking the help command inside a pdb session looks as follows:
import pdb
pdb.set_trace()
--Call--
>d:\programdata\lib\site-packages\ipython\core\displayhook.py(247)__call__()
-> def __call__(self, result=None):
(Pdb) h
Example
While working with the Python debugger, we can set a breakpoint anywhere in the script by using the following lines:
import pdb
pdb.set_trace()
After setting the breakpoint, we can run the script normally. The script will execute up to a certain point; that is, up to the line where the breakpoint has been set. Consider the following example, where we will use the above-mentioned lines at various places in the script:
import pdb
a = "aaa"
pdb.set_trace()
b = "bbb"
c = "ccc"
final = a + b + c
print (final)
When the above script is run, it will execute the program up to a = "aaa"; we can check this in the following output.
Output
--Return--
> <ipython-input-7-8a7d1b5cc854>(3)<module>()->None
-> pdb.set_trace()
(Pdb) p a
'aaa'
(Pdb) p b
*** NameError: name 'b' is not defined
(Pdb) p c
*** NameError: name 'c' is not defined
After using the command p (print) in pdb, this script only prints 'aaa'. This is followed by an error, because the breakpoint was set right after a = "aaa", before b and c are assigned.
Similarly, we can run the script by changing the breakpoints and see the difference in the
output:
import pdb
a = "aaa"
b = "bbb"
c = "ccc"
pdb.set_trace()
final = a + b + c
print (final)
Output
--Return--
> <ipython-input-9-a59ef5caf723>(5)<module>()->None
-> pdb.set_trace()
(Pdb) p a
'aaa'
(Pdb) p b
'bbb'
(Pdb) p c
'ccc'
(Pdb) p final
*** NameError: name 'final' is not defined
(Pdb) exit
In the following script, we are setting the breakpoint just before the last line of the program:
import pdb
a = "aaa"
b = "bbb"
c = "ccc"
final = a + b + c
pdb.set_trace()
print (final)
Output
--Return--
> <ipython-input-11-8019b029997d>(6)<module>()->None
-> pdb.set_trace()
(Pdb) p a
'aaa'
(Pdb) p b
'bbb'
(Pdb) p c
'ccc'
(Pdb) p final
'aaabbbccc'
(Pdb)
10. Concurrency in Python – Benchmarking & Profiling
In this chapter, we will learn how benchmarking and profiling help in addressing
performance issues.
Suppose we have written a code and it is giving the desired result, but what if we want to run this code a bit faster because the needs have changed? In this case, we need to find out which parts of our code are slowing down the entire program. In this case, benchmarking and profiling can be useful.
What is Benchmarking?
Benchmarking aims at evaluating something by comparison with a standard. However, the question that arises here is what benchmarking is and why we need it in the case of software programming. Benchmarking the code means measuring how fast the code is executing and where the bottleneck is. One major reason for benchmarking is that it helps optimize the code.
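For micro-benchmarks, the standard-library timeit module can also time a statement over many repetitions, which averages out measurement noise. A minimal sketch (the two statements being compared are illustrative):

```python
import timeit

# Time two ways of building a list of squares; each statement is
# executed 10,000 times and the total elapsed time is returned.
loop_time = timeit.timeit("[i * i for i in range(100)]", number=10000)
map_time = timeit.timeit("list(map(lambda i: i * i, range(100)))", number=10000)

print("list comprehension:", loop_time)
print("map + lambda:      ", map_time)
```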
Example
In the following Python script, we are importing the timeit module, which measures the time taken to execute two functions – functionA and functionB:
import timeit
import time

def functionA():
    print("Function A starts the execution:")
    print("Function A completes the execution:")

def functionB():
    print("Function B starts the execution")
    print("Function B completes the execution")

start_time = timeit.default_timer()
functionA()
print(timeit.default_timer() - start_time)

start_time = timeit.default_timer()
functionB()
print(timeit.default_timer() - start_time)
After running the above script, we will get the execution time of both the functions as
shown below.
Output
Function A starts the execution:
Function A completes the execution:
0.0014599495514175942
Function B starts the execution
Function B completes the execution
0.0017024724827479076
Example
import random
import time

def timer_func(func):
    def function_timer(*args, **kwargs):
        start = time.time()
        value = func(*args, **kwargs)
        end = time.time()
        runtime = end - start
        msg = "{func} took {time} seconds to complete its execution."
        print(msg.format(func=func.__name__, time=runtime))
        return value
    return function_timer

@timer_func
def Myfunction():
    for x in range(5):
        sleep_time = random.choice(range(1, 3))
        time.sleep(sleep_time)

if __name__ == '__main__':
    Myfunction()
The above Python script imports the random and time modules. We have created the timer_func() decorator function, which has the function_timer() function inside it. The nested function grabs the time before calling the passed-in function, waits for the function to return, and grabs the end time. In this way, the script can finally print the execution time. It will generate an output as shown below.
Output
Myfunction took 8.000457763671875 seconds to complete its execution.
What is Profiling?
Sometimes the programmer wants to measure attributes such as the use of memory, time complexity, or the usage of particular instructions in a program, to measure the real capability of that program. This kind of measurement of a program is called profiling. Profiling uses dynamic program analysis to do such measurement.
In the subsequent sections, we will learn about the different Python modules for profiling.
Example
import threading

def increment_global():
    global x
    x += 1

def taskofThread(lock):
    for _ in range(50000):
        lock.acquire()
        increment_global()
        lock.release()

def main():
    global x
    x = 0
    lock = threading.Lock()
    t1 = threading.Thread(target=taskofThread, args=(lock,))
    t2 = threading.Thread(target=taskofThread, args=(lock,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(5):
        main()
        print("x = {1} after Iteration {0}".format(i, x))
The above code is saved in the thread_increment.py file. Now, execute the code with cProfile on the command line as follows:
python -m cProfile thread_increment.py
From the above output, it is clear that cProfile prints out all the 3577 functions called, with the time spent in each and the number of times they have been called. Following are the columns we got in the output:
cumtime: It is the cumulative time spent in this and all subfunctions. It is even
accurate for recursive functions.
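The same profile data can also be analyzed programmatically with the standard-library pstats module, for example to sort by cumulative time. A small sketch (the workload function is illustrative):

```python
import cProfile
import pstats
import io

def workload():
    total = 0
    for i in range(100000):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Dump the statistics into a string buffer, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)   # top 5 entries only
print(buffer.getvalue())
```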
11. Concurrency in Python – Pool of Threads
Suppose we had to create a large number of threads for our multithreaded tasks. That would be computationally expensive, as too many threads can cause many performance issues. A major issue could be throughput getting limited. We can solve this problem by creating a pool of threads. A thread pool may be defined as a group of pre-instantiated, idle threads that stand ready to be given work. Creating a thread pool is preferred over instantiating new threads for every task when we need to do a large number of tasks. A thread pool can manage the concurrent execution of a large number of threads as follows:
If a thread in a thread pool completes its execution then that thread can be reused.
If a thread is terminated, another thread will be created to replace that thread.
In our subsequent sections, we will learn about the different classes of the
concurrent.futures module.
Executor Class
Executor is an abstract class of the concurrent.futures Python module. It cannot be
used directly and we need to use one of the following concrete subclasses:
ThreadPoolExecutor
ProcessPoolExecutor
Example
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def task(message):
    sleep(2)
    return message

def main():
    executor = ThreadPoolExecutor(5)
    future = executor.submit(task, ("Completed"))
    print(future.done())
    sleep(2)
    print(future.done())
    print(future.result())

if __name__ == '__main__':
    main()
Output
False
True
Completed
In the above example, a ThreadPoolExecutor has been constructed with 5 threads. Then a task, which waits for 2 seconds before returning the message, is submitted to the thread pool executor. As seen from the output, the task does not complete for 2 seconds, so the first call to done() returns False. After 2 seconds, the task is done, and we get the result of the future by calling the result() method on it.
Example
The following example is borrowed from the Python docs. In this example, first of all the concurrent.futures module has to be imported. Then a function named load_url() is created, which loads the requested URL. The function then creates a ThreadPoolExecutor with 5 threads in the pool. The ThreadPoolExecutor is utilized as a context manager. We can get the result of the future by calling the result() method on it.
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
Output
Following would be the output of the above Python script:
Example
In the example below, the map function is used to apply the square() function to every value in the values array.
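A minimal sketch of this example, assuming a square() helper and a small values list consistent with the output below:

```python
from concurrent.futures import ThreadPoolExecutor

values = [2, 3, 4, 5]

def square(n):
    return n * n

def main():
    with ThreadPoolExecutor(max_workers=3) as executor:
        # map() applies square() to every element and yields the
        # results in the order of the input values.
        results = executor.map(square, values)
    for result in results:
        print(result)

if __name__ == '__main__':
    main()
```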
Output
The above Python script generates the following output:
4
9
16
25
12. Concurrency in Python – Pool of Processes
A pool of processes can be created and used in the same way as we created and used a pool of threads. A process pool can be defined as a group of pre-instantiated, idle processes that stand ready to be given work. Creating a process pool is preferred over instantiating new processes for every task when we need to do a large number of tasks.
Executor Class
Executor is an abstract class of the concurrent.futures Python module. It cannot be
used directly and we need to use one of the following concrete subclasses:
ThreadPoolExecutor
ProcessPoolExecutor
Example
We will now consider the same example that we used while creating thread pool, the only
difference being that now we will use ProcessPoolExecutor instead of
ThreadPoolExecutor.
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def task(message):
    sleep(2)
    return message

def main():
    executor = ProcessPoolExecutor(5)
    future = executor.submit(task, ("Completed"))
    print(future.done())
    sleep(2)
    print(future.done())
    print(future.result())

if __name__ == '__main__':
    main()
Output
False
False
Completed
In the above example, a ProcessPoolExecutor has been constructed with 5 worker processes. Then a task, which waits for 2 seconds before returning the message, is submitted to the process pool executor. As seen from the output, done() returns False at both calls: starting a worker process adds overhead, so slightly more than 2 seconds pass before the task finishes. We still get the result of the future by calling the result() method on it, which blocks until the task is done.
Example
For better understanding, we take the same example as used while creating the thread pool. In this example, we need to start by importing the concurrent.futures module. Then a function named load_url() is created, which loads the requested URL. The ProcessPoolExecutor is then created with 5 worker processes in the pool. The ProcessPoolExecutor is utilized as a context manager. We can get the result of the future by calling the result() method on it.
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))

if __name__ == '__main__':
    main()
Output
The above Python script will generate the following output:
Example
We will consider the same example that we used while creating the thread pool using the Executor.map() function. In the example given below, the map function is used to apply the square() function to every value in the values array.
Output
The above Python script will generate the following output:
4
9
16
25
If we use ProcessPoolExecutor, then we do not need to worry about the GIL because it uses
multiprocessing. Moreover, the execution time will be less when compared to
ThreadPoolExecutor. Consider the following Python script example to understand this.
Example
import time
import concurrent.futures

value = [8000000, 7000000]

def counting(n):
    start = time.time()
    while n > 0:
        n -= 1
    return time.time() - start

def main():
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, time_taken in zip(value, executor.map(counting, value)):
            print('Start: {} Time taken: {}'.format(number, time_taken))
        print('Total time taken: {}'.format(time.time() - start))

if __name__ == '__main__':
    main()
Output
Start: 8000000 Time taken: 1.5509998798370361
Start: 7000000 Time taken: 1.3259999752044678
Total time taken: 2.0840001106262207
Now, replace ProcessPoolExecutor with ThreadPoolExecutor in the same script:

import time
import concurrent.futures

value = [8000000, 7000000]

def counting(n):
    start = time.time()
    while n > 0:
        n -= 1
    return time.time() - start

def main():
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for number, time_taken in zip(value, executor.map(counting, value)):
            print('Start: {} Time taken: {}'.format(number, time_taken))
        print('Total time taken: {}'.format(time.time() - start))

if __name__ == '__main__':
    main()
Output
Start: 8000000 Time taken: 3.8420000076293945
Start: 7000000 Time taken: 3.6010000705718994
Total time taken: 3.8480000495910645
From the outputs of both the programs above, we can see the difference in execution time
between ProcessPoolExecutor and ThreadPoolExecutor.
13. Concurrency in Python – Multiprocessing
In this chapter, we will focus more on the comparison between multiprocessing and
multithreading.
Multiprocessing
It is the use of two or more CPUs within a single computer system. It is the best
approach to get the full potential out of our hardware, by utilizing all the CPU cores
available in our computer system.
Multithreading
It is the ability of a CPU to execute multiple threads concurrently, managed by the
operating system. The main idea of multithreading is to achieve parallelism by dividing
a process into multiple threads.
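The two APIs mirror each other closely. As a minimal sketch (the names here are illustrative), the same callable can be driven either by a thread or by a process:

```python
import threading
import multiprocessing

def work(label):
    # The same task can be scheduled on a thread or on a process.
    print("running in", label)

if __name__ == '__main__':
    t = threading.Thread(target=work, args=('a thread',))
    p = multiprocessing.Process(target=work, args=('a process',))
    t.start()
    p.start()
    t.join()
    p.join()
```

Threads share the memory of one interpreter, while each process gets its own interpreter and memory space; this is why the multiprocessing version can use all CPU cores.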
One of the important differences between them: multiprocessing takes less time to
process the jobs, while multithreading takes more time.
With the use of multiprocessing, we can effectively bypass the limitation caused by the GIL:
since each process has its own interpreter and its own GIL, there is no longer a restriction
to executing the bytecode of only one thread within our program at any one time.
A child process can be created using any of the following three start methods:

Fork

Spawn

Forkserver
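The start method can be selected explicitly through the multiprocessing API; a minimal sketch using get_context ('spawn' is available on every platform, while 'fork' and 'forkserver' are Unix-only):

```python
import multiprocessing as mp

def greet(name):
    print("Hello from", name)

if __name__ == '__main__':
    # get_context returns a context bound to one start method,
    # without changing the process-wide default.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=greet, args=('the child',))
    p.start()
    p.join()
```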
getpid(): This system call returns the process ID (PID) of the calling process.
Example
The following Python script example will help you understand how to create a new child
process and get the PIDs of the child and parent processes:
import os

def child():
    n = os.fork()
    if n > 0:
        print("PID of Parent process is : ", os.getpid())
    else:
        print("PID of Child process is : ", os.getpid())

child()
Output
PID of Parent process is : 25989
PID of Child process is : 25990
Example
The following example of Python script helps in spawning three processes:
import multiprocessing

def spawn_process(i):
    print('This is process: %s' % i)
    return

if __name__ == '__main__':
    Process_jobs = []
    for i in range(3):
        p = multiprocessing.Process(target=spawn_process, args=(i,))
        Process_jobs.append(p)
        p.start()
        p.join()
Output
This is process: 0
This is process: 1
This is process: 2
For creating a new process, our Python program will send a request to the Forkserver,
and it will create a process for us.

The server then receives the command and handles all the requests for creating new
processes.
Example
Here, we are using the same example as used with daemon threads. The only difference
is the change of module from threading to multiprocessing and setting the
daemonic flag to True. However, there would be a change in the output, as shown below:
import multiprocessing
import time

def nondaemonProcess():
    print("starting my Process")
    time.sleep(8)
    print("ending my Process")

def daemonProcess():
    while True:
        print("Hello")
        time.sleep(2)

if __name__ == '__main__':
    nondaemonProcess = multiprocessing.Process(target=nondaemonProcess)
    daemonProcess = multiprocessing.Process(target=daemonProcess)
    daemonProcess.daemon = True
    nondaemonProcess.daemon = False
    daemonProcess.start()
    nondaemonProcess.start()
Output
starting my Process
ending my Process
The output is different when compared to the one generated by daemon threads, because
the process in non-daemon mode has an output. The daemonic process ends
automatically after the main program ends, to avoid the persistence of running processes.
Example
import multiprocessing
import time

def Child_process():
    print('Starting function')
    time.sleep(5)
    print('Finished function')

P = multiprocessing.Process(target=Child_process)
P.start()
print("My Process has terminated, terminating main thread")
print("Terminating Child Process")
P.terminate()
print("Child Process successfully terminated")
Output
My Process has terminated, terminating main thread
Terminating Child Process
Child Process successfully terminated
The output shows that the program terminates before the execution of child process that
has been created with the help of the Child_process() function. This implies that the child
process has been terminated successfully.
The pid attribute of multiprocessing.current_process() gives the PID of the current
process:

import multiprocessing

print(multiprocessing.current_process().pid)
Example
The following example of Python script helps find out the PID of main process as well as
PID of child process:
import multiprocessing
import time

def Child_process():
    print("PID of Child Process is: {}".format(multiprocessing.current_process().pid))

print("PID of Main process is: {}".format(multiprocessing.current_process().pid))
P = multiprocessing.Process(target=Child_process)
P.start()
P.join()
Output
PID of Main process is: 9401
PID of Child Process is: 9402
Example
import multiprocessing

class MyProcess(multiprocessing.Process):
    def run(self):
        print('called run method in process: %s' % self.name)
        return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        P = MyProcess()
        jobs.append(P)
        P.start()
        P.join()
Output
called run method in process: MyProcess-1
called run method in process: MyProcess-2
called run method in process: MyProcess-3
called run method in process: MyProcess-4
called run method in process: MyProcess-5
apply() method
This method is similar to the .submit() method of ThreadPoolExecutor. It blocks until
the result is ready.
apply_async() method
When we need parallel execution of our tasks, we use the apply_async() method to
submit tasks to the pool. It is an asynchronous operation that will not block the main
thread while the child processes execute.
map() method
Just like the apply() method, it also blocks until the result is ready. It is equivalent to the
built-in map() function that splits the iterable data in a number of chunks and submits to
the process pool as separate tasks.
map_async() method
It is a variant of the map() method, as apply_async() is to the apply() method. It
returns an AsyncResult object. When the result becomes ready, a callback is applied to
it. The callback should complete immediately; otherwise, the thread that handles the
results will get blocked.
Example
The following example will help you implement a process pool for performing parallel
execution. A simple calculation of the square of a number has been performed by applying
the square() function through the multiprocessing.Pool class. Then pool.map() has
been used to submit the tasks, where the input is a list of integers from 0 to 4. The result
is stored in p_outputs and printed.

import multiprocessing

def square(n):
    result = n * n
    return result

if __name__ == '__main__':
    inputs = list(range(5))
    p = multiprocessing.Pool(processes=4)
    p_outputs = p.map(square, inputs)
    p.close()
    p.join()
    print('Pool :', p_outputs)
Output
Pool : [0, 1, 4, 9, 16]
14. Concurrency in Python – Processes Intercommunication
Queues
Queues can be used with multi-process programs. The Queue class of the multiprocessing
module is similar to the queue.Queue class. Hence, the same API can be used.
multiprocessing.Queue provides us a thread- and process-safe FIFO (first-in first-out)
mechanism of communication between processes.
Example
Following is a simple example taken from python official docs on multiprocessing to
understand the concept of Queue class of multiprocessing.
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

def main():
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())
    p.join()

if __name__ == '__main__':
    main()
Output
[42, None, 'hello']
Pipes
It is a data structure, which is used to communicate between processes in multi-process
programs. The Pipe() function returns a pair of connection objects connected by a pipe
which by default is duplex (two-way). It works in the following manner:
It returns a pair of connection objects that represent the two ends of pipe.
Every object has two methods – send() and recv(), to communicate between
processes.
Example
Following is a simple example taken from python official docs on multiprocessing to
understand the concept of Pipe() function of multiprocessing.
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()
Output
[42, None, 'hello']
Manager
Manager is a class of multiprocessing module that provides a way to coordinate shared
information between all its users. A manager object controls a server process, which
manages shared objects and allows other processes to manipulate them. In other words,
managers provide a way to create data that can be shared between different processes.
Following are the different properties of manager object:
The main property of manager is to control a server process, which manages the
shared objects.
Another important property is to update all the shared objects when any process
modifies it.
Example
Following is an example which uses the manager object for creating a list record in server
process and then adding a new record in that list.
import multiprocessing

def insert_record(record, records):
    records.append(record)
    print("A New record is added\n")

def print_records(records):
    for record in records:
        print("Name: {0}\nScore: {1}\n".format(record[0], record[1]))

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        records = manager.list([('Computers', 1), ('History', 5), ('Hindi', 9)])
        new_record = ('English', 3)
        p1 = multiprocessing.Process(target=insert_record, args=(new_record, records))
        p2 = multiprocessing.Process(target=print_records, args=(records,))
        p1.start()
        p1.join()
        p2.start()
        p2.join()
Output
A New record is added
Name: Computers
Score: 1
Name: History
Score: 5
Name: Hindi
Score: 9
Name: English
Score: 3
Example
The following Python script example helps us utilize namespaces for sharing data across
main process and child process:
import multiprocessing

def Mng_NaSp(using_ns):
    using_ns.x += 5
    using_ns.y *= 10

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    using_ns = manager.Namespace()
    using_ns.x = 1
    using_ns.y = 1
    print('before', using_ns)
    p = multiprocessing.Process(target=Mng_NaSp, args=(using_ns,))
    p.start()
    p.join()
    print('after', using_ns)
Output
before Namespace(x=1, y=1)
after Namespace(x=6, y=10)
Example
The following Python script is an example taken from the Python docs to utilize ctypes
Array and Value for sharing some data between processes.

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])
Output
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
In the above PyCSP process network, there are two processes – Process1 and Process 2.
These processes communicate by passing messages through two channels – channel 1
and channel 2.
Installing PyCSP
With the help of the following command, we can install the PyCSP Python library:

pip install pycsp
Example
The following Python script is a simple example of running two processes in parallel to
each other. It is done with the help of the PyCSP Python library:

from pycsp.parallel import *
import time

@process
def P1():
    time.sleep(1)
    print('P1 exiting')

@process
def P2():
    time.sleep(1)
    print('P2 exiting')

def main():
    Parallel(P1(), P2())
    print('Terminating')

if __name__ == '__main__':
    main()
In the above script, two functions namely P1 and P2 have been created and then
decorated with @process for converting them into processes.
Output
P2 exiting
P1 exiting
Terminating
15. Concurrency in Python – Event-Driven Programming
asyncio.get_event_loop(): This method will provide the event loop for the current
context.
loop.time(): This method is used to return the current time according to the event
loop’s internal clock.
asyncio.set_event_loop(): This method will set the event loop for the current
context to the loop.
asyncio.new_event_loop(): This method will create and return a new event loop
object.
Example
The following example of event loop helps in printing hello world by using the
get_event_loop() method. This example is taken from the Python official docs.
import asyncio

def hello_world(loop):
    print('Hello World')
    loop.stop()

loop = asyncio.get_event_loop()
loop.call_soon(hello_world, loop)
loop.run_forever()
loop.close()
Output
Hello World
Futures
The asyncio.Future class is almost compatible with the concurrent.futures.Future class
and represents a computation that has not yet been accomplished. The following are the
differences between asyncio.Future and concurrent.futures.Future:
result() and exception() methods do not take a timeout argument and raise an
exception when the future isn’t done yet.
Callbacks registered with add_done_callback() are always called via the event
loop’s call_soon().
Example
The following is an example that will help you understand how to use the
asyncio.Future class.
import asyncio

async def Myoperation(future):
    await asyncio.sleep(2)
    future.set_result('Future Completed')

loop = asyncio.get_event_loop()
future = asyncio.Future()
asyncio.ensure_future(Myoperation(future))
try:
    loop.run_until_complete(future)
    print(future.result())
finally:
    loop.close()
Output
Future Completed
Coroutines
The concept of coroutines in Asyncio is similar to the concept of standard Thread object
under threading module. This is the generalization of the subroutine concept. A coroutine
can be suspended during the execution so that it waits for the external processing and
returns from the point at which it had stopped when the external processing was done.
The following two ways help us in implementing coroutines:

async def function

A coroutine can be defined by using the async def statement. Following is a Python
script for the same:

import asyncio

async def Myoperation():
    print("First Coroutine")

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(Myoperation())
finally:
    loop.close()
Output
First Coroutine
@asyncio.coroutine decorator
Another method for implementation of coroutines is to utilize generators with the
@asyncio.coroutine decorator. Following is a Python script for the same:
import asyncio

@asyncio.coroutine
def Myoperation():
    print("First Coroutine")

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(Myoperation())
finally:
    loop.close()
Output
First Coroutine
Tasks
The Task class of the asyncio module is a subclass of Future and is responsible for the
execution of coroutines within an event loop in a parallel manner. The following Python
script is an example of processing some tasks in parallel.
import asyncio

async def Task_ex(n):
    await asyncio.sleep(1)
    print("Processing {}".format(n))

async def Generator_task():
    for i in range(10):
        asyncio.ensure_future(Task_ex(i))
    print("Tasks Completed")
    await asyncio.sleep(2)

loop = asyncio.get_event_loop()
loop.run_until_complete(Generator_task())
loop.close()
Output
Tasks Completed
Processing 0
Processing 1
Processing 2
Processing 3
Processing 4
Processing 5
Processing 6
Processing 7
Processing 8
Processing 9
Transports
Asyncio module provides transport classes for implementing various types of
communication. These classes are not thread safe and always paired with a protocol
instance after establishment of communication channel.
Following are five distinct methods of the BaseTransport class that are consistent
across the four transport types:
is_closing(): This method will return true if the transport is closing or is already
closed.
Protocols
Asyncio module provides base classes that you can subclass to implement your network
protocols. Those classes are used in conjunction with transports; the protocol parses
incoming data and asks for the writing of outgoing data, while the transport is responsible
for the actual I/O and buffering. Following are three classes of Protocol:
Protocol: This is the base class for implementing streaming protocols for use with
TCP and SSL transports.
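As a sketch of how a protocol and a transport cooperate, the following minimal echo server subclasses asyncio.Protocol; the class name and the use of an ephemeral port are illustrative, and asyncio.run() requires Python 3.7 or later:

```python
import asyncio

class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        # The transport is handed to the protocol once the connection exists.
        self.transport = transport

    def data_received(self, data):
        # The protocol decides what to do with incoming bytes and asks
        # the transport to perform the actual outgoing I/O.
        self.transport.write(data)
        self.transport.close()

async def main():
    loop = asyncio.get_running_loop()
    # Port 0 lets the OS pick a free ephemeral port.
    server = await loop.create_server(EchoProtocol, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'hello')
    await writer.drain()
    print((await reader.read(100)).decode())  # hello
    writer.close()
    server.close()
    await server.wait_closed()

asyncio.run(main())
```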
16. Concurrency in Python – Reactive Programming
Reactive programming is a programming paradigm that deals with data flows and the
propagation of change. It means that when a data flow is emitted by one component, the
change will be propagated to other components by reactive programming library. The
propagation of change will continue until it reaches the final receiver. The difference
between event-driven and reactive programming is that event-driven programming
revolves around events and reactive programming revolves around data.
Observable class
This class is the source of data stream or events and it packs the incoming data so that
the data can be passed from one thread to another. It will not emit data until some
observer subscribes to it.
Observer class
This class consumes the data stream emitted by observable. There can be multiple
observers with observable and each observer will receive each data item that is emitted.
The observer can receive three types of events by subscribing to the observable:

on_next() event: It implies there is an element in the data stream.

on_completed() event: It implies the end of emission, meaning no more items are coming.
on_error() event: It also implies end of emission but in case when an error is
thrown by observable.
Example
Following is a Python script which uses the RxPY module and its classes Observable and
Observer for reactive programming. There are basically two classes:

PrintObserver(): for printing the strings from the observer. It uses all three events of
the observer class. It also uses the subscribe() method.
Output
Received Ram
Received Mohan
Received Shyam
Finished
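The observable/observer contract described above can also be sketched in plain Python, without RxPY; the class names here are illustrative:

```python
class Observable:
    def __init__(self, *items):
        self.items = items

    def subscribe(self, observer):
        # Push every item to the observer, then signal completion or error.
        try:
            for item in self.items:
                observer.on_next(item)
        except Exception as error:
            observer.on_error(error)
        else:
            observer.on_completed()

class PrintObserver:
    def on_next(self, value):
        print("Received {}".format(value))

    def on_completed(self):
        print("Finished")

    def on_error(self, error):
        print("Error: {}".format(error))

Observable("Ram", "Mohan", "Shyam").subscribe(PrintObserver())
```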
Example
Following example uses the PyFunctional module and its seq class which act as the
stream object with which we can iterate and manipulate. In this program, it maps the
sequence by using the lamda function that doubles every value, then filters the value
where x is greater than 4 and finally it reduces the sequence into a sum of all the remaining
values.
Output
Result: 6
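For comparison, the same pipeline can be written with the built-in map and filter plus functools.reduce, which seq wraps in a fluent style; the input values [1, 2, 3] are an assumption consistent with the result of 6:

```python
from functools import reduce

values = [1, 2, 3]
doubled = map(lambda x: x * 2, values)           # 2, 4, 6
filtered = filter(lambda x: x > 4, doubled)      # 6
result = reduce(lambda x, y: x + y, filtered)    # sum of the survivors
print("Result: {}".format(result))               # Result: 6
```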