
Commit fce455e

Add more explanation.
1 parent 9ab68fa commit fce455e

File tree: 1 file changed, +129 -47

Doc/howto/concurrency.rst (129 additions, 47 deletions)
@@ -29,6 +29,17 @@ for comparison, when possible.
 complicate the solution. In-depth discussion of this point
 is outside the scope of this document.
 
+.. note::
+
+   Free-threading is one of the oldest concurrency models, fundamental
+   to operating systems, and widely supported in programming languages.
+   However, it is generally considered perilous and not human-friendly.
+   Other concurrency models have demonstrated better usability, and
+   newer programming languages typically avoid exposing threads directly.
+   Take that into consideration before reaching for threads, and look at
+   the alternatives first.
+   See `the table below`_.
+
 
 All About Concurrency
 =====================
@@ -39,11 +50,19 @@ What is concurrency?
 At its most fundamental, concurrency means doing multiple things at once,
 from a strictly *logical* viewpoint.
 
-When a computer program runs, it executes a sequence of code in order.
+When a computer program runs, it executes a sequence of code
+in a given order. If you were to trace the actual execution, you would
+still end up with a *linear* series of executed instructions that matches
+the code. We call this sequence of code (and instructions) a logical
+"thread" of execution.
+
 Sometimes it makes sense to break up that sequence into smaller pieces,
-where some of them can run independently of others.
+where some of them can run independently of others. Thus the program
+involves multiple logical threads. This is also called
+"multitasking", and each logical thread is called a "task".
 
-For example, consider the following program with three pieces::
+As an example of splitting up the sequence, consider the following
+abstract program with three pieces::
 
    prep
    do A
@@ -60,16 +79,21 @@ end up with the same result::
 In the first alternative, we swap ``do A`` and ``do B``. In the second
 one we split the original program into two programs that we can run at
 the same time. In the third one, we run ``do A`` and ``do B`` at the
-same time. "At the same time" means concurrency.
+same time. "At the same time" means concurrency. It always involves
+multiple logical threads.
 
-Concurrency often involves some degree of synchronization between
-the tasks. At the most basic conceptual level: one task may wait
-for another to finish.
+Additionally, concurrency often involves some degree of synchronization
+between the logical threads. At the most basic conceptual level:
+one thread may wait for another to finish.
 
-In addition to code running at the same time, concurrency typically
+Aside from code running at the same time, concurrency typically
 also involves some amount of resources shared between the concurrent
 tasks. That may include memory, files, and sockets.
 
+One important observation is that most concurrent programs
+can be represented instead as a single task, with the code of the
+concurrent tasks merged into a single sequence.
+
 What is parallelism?
 --------------------
 
@@ -81,6 +105,46 @@ are physically running at the same time, not just logically.
 
 That second way is parallelism.
 
+Modern CPUs are designed around parallelism, with multiple cores
+and sometimes multiple execution pipelines per core. The operating
+system exposes physical CPU threads as OS threads and as processes.
+A programming language (or runtime) may add additional layers of
+abstraction on top of that.
+
+Parallelism is where concurrent logical threads are running
+on distinct physical threads across multiple cores.
117+
Concurrency Models
118+
------------------
119+
120+
The concept of concurrency has been a part of the study and practice
121+
of computer software since very early on, in the 1950s and 1960s,
122+
long before the wide-spread adotion of multi-core CPUs. Clearly
123+
its about more than just parallelism.
124+
125+
Over the decades, research and use of concurrency has led to a variety
126+
of well defined abstract models, with different characteristics and
127+
tradeoffs. The application of the different theoretical concurrency
128+
models can be categorized as follows:
129+
130+
* free threads - using multiple physical threads in the same process,
131+
with no isolation between them
132+
* isolated threads - threads, often physical, with strict isolation
133+
between them (e.g. CSP and actor model)
134+
* multiprocessing - using multiple isolated processes
135+
* distributed - multiprocessing across multiple computers
136+
* async/await - using coroutines (AKA "cooperative multitasking")
137+
138+
(There are certainly others, but these are the focus here.)
139+
140+
There are tradeoffs to each. Free-threading probably has the most
141+
notoriety and the most examples, but is also the most likely to cause
142+
you pain.
143+
Isolated threads have few of the downsides but are less familiar.
144+
Multiprocessing and distributed are less efficient at smaller scales.
145+
Async can be straightforward, but may cascade throughout a code base
146+
and doesn't necessarily give you parallelism.
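+
+To give a feel for two of these models, here is the same pair of
+trivial tasks expressed with free threads and then with async/await
+(a toy sketch; the task bodies are placeholders)::
+
+   import threading
+
+   def task(name):
+       print('running', name)
+
+   threads = [threading.Thread(target=task, args=(name,))
+              for name in ('A', 'B')]
+   for t in threads:
+       t.start()
+   for t in threads:
+       t.join()
+
+The async/await version::
+
+   import asyncio
+
+   async def task(name):
+       print('running', name)
+
+   async def main():
+       await asyncio.gather(task('A'), task('B'))
+
+   asyncio.run(main())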
+
 What problems can concurrency help solve?
 -----------------------------------------
 
@@ -92,6 +156,7 @@ the following:
 * run on multiple CPU cores (parallelism)
 * keep blocking resources from blocking the whole program
 * make sure critical tasks have priority
+* make sure other tasks have a fair share of time
 * process results as they come, instead of waiting for them all
 
 Other possible benefits:
@@ -103,55 +168,58 @@ Other possible benefits:
 What are the downsides?
 -----------------------
 
-The main challenge when using concurrency is the extra complexity.
-
-.. XXX
-
-* races on shared resources
-* error handling
-* ...
-
-The operating system, along with some libraries and frameworks,
-can help mitigate the extra complexity. So can the concurrency
-model you use, which we'll talk about a little later..
+The main challenge when using concurrency is the (potential) extra
+complexity. This complexity comes from the effect of multiple logical
+threads running at the same time and interacting with each other.
+In practice, this falls into two categories: data races and tracing
+relative execution. Both are a form of "spooky action at a distance".
+
+The first category relates to mutable data shared between threads:
+a data race is where one thread writes to memory at a time when another
+thread is expecting the value to be unchanged, invalidating its logic.
+Similarly, two threads could write to the same memory location at the
+same time, either corrupting the data there or invalidating
+the expectations of one of the threads.
+
+In each case, the non-deterministic scheduling of threads means it is
+both hard to reproduce races and to track down where a race happened.
+These qualities make these bugs especially frustrating
+and worth diligently avoiding.
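+
+For instance, here is a minimal sketch of the first kind of race
+(illustrative only): two threads updating a shared counter without
+synchronization, so updates may be lost::
+
+   import threading
+
+   counter = 0
+
+   def increment():
+       global counter
+       for _ in range(100_000):
+           # Read-modify-write; another thread may run in between.
+           counter += 1
+
+   threads = [threading.Thread(target=increment) for _ in range(2)]
+   for t in threads:
+       t.start()
+   for t in threads:
+       t.join()
+   # May print less than 200000, depending on interpreter and timing.
+   print(counter)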
+
+Races are possible when the concurrency approach is subject
+to parallel execution or to non-deterministic switching.
+(This excludes "async/await", which relies on cooperative multitasking.)
+When all memory is possibly shared, as is the case with free-threading,
+then all memory is at risk.
+
+Data races are often managed using locks (AKA mutexes)
+at a low level, and thread-safe types and APIs at a high level.
+Depending on the programming language, the complexity is sometimes
+mitigated somewhat by the compiler and runtime. There are even
+libraries and frameworks that help abstract away the complexity
+to an extent. On top of that, there are tools that can help identify
+potential races via static analysis. Unfortunately, none of these aids
+is foolproof and the risk of hitting a race is always looming.
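+
+At a low level, guarding the shared counter from the earlier sketch
+might look like this::
+
+   import threading
+
+   counter = 0
+   lock = threading.Lock()
+
+   def increment():
+       global counter
+       for _ in range(100_000):
+           # Only one thread at a time may update the counter.
+           with lock:
+               counter += 1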
+
+.. XXX mention reentrancy?
+
+The second category of complexity is the problem of tracing the execution
+of one logical thread relative to another. This is especially relevant
+for error handling, when an error in one thread is exposed in the
+other. This applies equally to threads that start other threads as to
+concurrency models that use callbacks. Knowing where the failing thread
+was started is valuable when debugging, as is knowing where a callback
+was registered.
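+
+For example, with ``concurrent.futures`` an error raised in a worker
+thread only surfaces where the result is requested (a toy sketch)::
+
+   import concurrent.futures
+
+   def work():
+       raise RuntimeError('failed in the worker thread')
+
+   with concurrent.futures.ThreadPoolExecutor() as executor:
+       future = executor.submit(work)
+
+   # The exception from the worker is re-raised here, far from
+   # where the thread actually failed.
+   future.result()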
 
 Workloads
 ---------
 
 We've looked at what you can do with concurrency from a high level.
 Now let's look at some concrete examples.
 
-
 ...
 
 
-Concurrency Models
-------------------
-
-The concept of concurrency has been a part of the study and practice
-of computer software since the 1950s and 1960s, with various innovations
-since then. The application of the different theoretical concurrency
-models can be categorized as follows:
-
-* free threads - using multiple threads in the same process,
-  with no isolation between them
-* isolated threads - threads with strict isolation between them
-  (e.g. CSP and actor model)
-* multiprocessing - using multiple isolated processes
-* distributed - multiprocessing across multiple computers
-* async/await - using coroutines (AKA cooperative multitasking)
-
-(There are certainly others, but these are the focus here.)
-
-There are tradeoffs to each. Free-threading probably has the most
-notoriety and the most examples, but is also the most likely to cause
-you pain.
-Isolated threads have few of the downsides but are less familiar.
-Multiprocessing and distributed are less efficient at smaller scales.
-Async can be straightforward, but may cascade throughout a code base,
-doesn't necessarily give you parallelism.
-
-
 Python Concurrency Models
 =========================
 
@@ -197,6 +265,20 @@ also see:
 * https://github.com/ericsnowcurrently/concurrency-benchmarks
 
 
+.. _the table below:
+
+.. rst-class:: align-left
+
+======== ========= =============== ===== =============== ===
+workload threading subinterpreters async multiprocessing smp
+======== ========= =============== ===== =============== ===
+1        Y         Y               Y     Y               Y
+2        Y         Y               Y     Y               Y
+3        Y         Y               Y     Y               Y
+4        Y         Y               Y     Y               Y
+======== ========= =============== ===== =============== ===
+
+
 Workload 1
 ----------
 