
Commit 3ec10e2: Merge pull request realpython#709 from Michael-F-Bryan/speed ("Speed")
2 parents da6648f + dde23c2

docs/scenarios/speed.rst

Lines changed: 206 additions & 2 deletions
Numba
-----

.. todo:: Write about Numba and the autojit compiler for NumPy

Concurrency
:::::::::::


Concurrent.futures
------------------

The `concurrent.futures`_ module is a standard library module that provides a
"high-level interface for asynchronously executing callables". It abstracts
away many of the more complicated details of using multiple threads or
processes for concurrency, and lets the user focus on accomplishing the task
at hand.

The `concurrent.futures`_ module exposes two main classes, the
`ThreadPoolExecutor` and the `ProcessPoolExecutor`. The ThreadPoolExecutor
creates a pool of worker threads that a user can submit jobs to. Each job is
then executed in another thread when the next worker thread becomes available.

The ProcessPoolExecutor works in the same way, except that it uses multiple
processes for its workers instead of multiple threads. This makes it possible
to side-step the GIL; however, because of the way arguments and results are
passed to worker processes, only picklable objects can be executed and
returned.

Because of the way the GIL works, a good rule of thumb is to use a
ThreadPoolExecutor when the task being executed involves a lot of blocking
(i.e. making requests over the network) and to use a ProcessPoolExecutor
when the task is computationally expensive.

There are two main ways of executing things in parallel using the two
Executors. One way is with the `map(func, iterables)` method. This works
almost exactly like the builtin `map()` function, except it executes
everything in parallel:

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor
    import requests

    def get_webpage(url):
        page = requests.get(url)
        return page

    pool = ThreadPoolExecutor(max_workers=5)

    my_urls = ['http://google.com/'] * 10  # Create a list of urls

    for page in pool.map(get_webpage, my_urls):
        # Do something with the result
        print(page.text)

For even more control, the `submit(func, *args, **kwargs)` method will schedule
a callable to be executed (as `func(*args, **kwargs)`) and return a `Future`_
object that represents the execution of the callable.

The Future object provides various methods that can be used to check on the
progress of the scheduled callable. These include:

cancel()
    Attempt to cancel the call.
cancelled()
    Return True if the call was successfully cancelled.
running()
    Return True if the call is currently being executed and cannot be
    cancelled.
done()
    Return True if the call was successfully cancelled or finished running.
result()
    Return the value returned by the call. Note that by default this call
    will block until the scheduled callable returns.
exception()
    Return the exception raised by the call. If no exception was raised then
    this returns `None`. Note that this will block just like `result()`.
add_done_callback(fn)
    Attach a callback function that will be executed (as `fn(future)`) when
    the scheduled callable returns.

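As a small sketch of how some of these methods behave, here is a trivial
`square` function (invented purely for illustration) submitted to a pool:

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    def square(x):
        return x * x

    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(square, 7)
        value = fut.result()  # Blocks until square(7) has finished

    done = fut.done()                # True once the call has finished
    was_cancelled = fut.cancelled()  # False: it ran to completion
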

.. code-block:: python

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def is_prime(n):
        if n < 2:
            return n, False
        if n == 2:
            return n, True
        if n % 2 == 0:
            return n, False

        sqrt_n = int(n**0.5)
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0:
                return n, False
        return n, True

    PRIMES = [
        112272535095293,
        112582705942171,
        112272535095293,
        115280095190773,
        115797848077099,
        1099726899285419]

    futures = []
    with ProcessPoolExecutor(max_workers=4) as pool:
        # Schedule the ProcessPoolExecutor to check if a number is prime
        # and add the returned Future to our list of futures
        for p in PRIMES:
            fut = pool.submit(is_prime, p)
            futures.append(fut)

    # As the jobs are completed, print out the results
    for fut in as_completed(futures):
        number, result = fut.result()
        if result:
            print("{} is prime".format(number))
        else:
            print("{} is not prime".format(number))

The `concurrent.futures`_ module contains two helper functions for working with
Futures. The `as_completed(futures)` function returns an iterator over the list
of futures, yielding the futures as they complete.

The `wait(futures)` function will simply block until all futures in the list of
futures provided have completed.

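To make that concrete, here is a minimal sketch of `wait()` using a toy
`double` function (invented for illustration):

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor, wait

    def double(x):
        return 2 * x

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(double, n) for n in range(5)]

        # Block until every future in the list has completed
        done, not_done = wait(futures)

    results = sorted(f.result() for f in done)

`wait()` returns a named tuple of two sets, the completed futures and those
still pending; with no timeout given, the second set is empty.
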
For more information on using the `concurrent.futures`_ module, consult the
official documentation.

Threading
---------

The standard library comes with a `threading`_ module that allows a user to
work with multiple threads manually.

Running a function in another thread is as simple as passing a callable and
its arguments to `Thread`'s constructor and then calling `start()`:

363+
.. code-block:: python
364+
365+
from threading import Thread
366+
import requests
367+
368+
def get_webpage(url):
369+
page = requests.get(url)
370+
return page
371+
372+
some_thread = Thread(get_webpage, 'http://google.com/')
373+
some_thread.start()
374+
To wait until the thread has terminated, call `join()`:

.. code-block:: python

    some_thread.join()

If `join()` was given a timeout, it is a good idea to check afterwards whether
the thread is still alive (because the call may have timed out):

.. code-block:: python

    if some_thread.is_alive():
        print("join() must have timed out.")
    else:
        print("Our thread has terminated.")

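The timed-out case can be demonstrated with a deliberately slow task (a
contrived sketch; the two-second sleep stands in for real work):

.. code-block:: python

    from threading import Thread
    import time

    def slow_task():
        time.sleep(2)  # Stand-in for real work

    worker = Thread(target=slow_task)
    worker.start()

    # join() with a timeout returns after at most 0.1 seconds,
    # whether or not the thread has finished.
    worker.join(timeout=0.1)
    alive_after_timeout = worker.is_alive()  # True: the task needs ~2 seconds

    worker.join()  # Block until the thread really terminates
    alive_after_join = worker.is_alive()
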
Because multiple threads have access to the same section of memory, there can
be situations where two or more threads try to write to the same resource at
the same time, or where the output depends on the sequence or timing of
certain events. This is called a `data race`_ or race condition. When this
happens, the output will be garbled, or you may encounter problems which are
difficult to debug. A good example is this `stackoverflow post`_.

The way this can be avoided is by using a `Lock`_ that each thread needs to
acquire before writing to a shared resource. Locks can be acquired and released
through either the context manager protocol (`with` statement), or by using
`acquire()` and `release()` directly. Here is a (rather contrived) example:

.. code-block:: python

    from threading import Lock, Thread

    file_lock = Lock()

    def log(msg):
        with file_lock:
            with open('website_changes.log', 'a') as f:
                f.write(msg)

    def monitor_website(some_website):
        """
        Monitor a website and then if there are any changes,
        log them to disk.
        """
        while True:
            changes = check_for_changes(some_website)
            if changes:
                log(changes)

    websites = ['http://google.com/', ... ]
    for website in websites:
        t = Thread(target=monitor_website, args=(website,))
        t.start()

Here, we have a bunch of threads checking for changes on a list of sites and
whenever there are any changes, they attempt to write those changes to a file
by calling `log(changes)`. When `log()` is called, it will wait to acquire
the lock with `with file_lock:`. This ensures that at any one time, only one
thread is writing to the file.

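The same pattern shows up in a smaller, self-contained sketch: several threads
increment a shared counter, and the lock keeps each read-modify-write step
atomic (the thread count and iteration count here are arbitrary):

.. code-block:: python

    from threading import Lock, Thread

    counter = 0
    counter_lock = Lock()

    def increment(times):
        global counter
        for _ in range(times):
            # Without the lock, this read-modify-write could interleave
            # with another thread's and lose updates.
            with counter_lock:
                counter += 1

    threads = [Thread(target=increment, args=(100000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # counter is now exactly 4 * 100000
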

Spawning Processes
------------------

.. _`New GIL`: http://www.dabeaz.com/python/NewGIL.pdf
.. _`Special care`: http://docs.python.org/c-api/init.html#threads
.. _`David Beazley's`: http://www.dabeaz.com/GIL/gilvis/measure2.py
.. _`concurrent.futures`: https://docs.python.org/3/library/concurrent.futures.html
.. _`Future`: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future
.. _`threading`: https://docs.python.org/3/library/threading.html
.. _`Lock`: https://docs.python.org/3/library/threading.html#threading.Lock
.. _`stackoverflow post`: http://stackoverflow.com/questions/26688424/python-threads-are-printing-at-the-same-time-messing-up-the-text-output
.. _`data race`: https://en.wikipedia.org/wiki/Race_condition
