@@ -226,13 +226,212 @@ Numba
226
226
-----
227
227
.. todo:: Write about Numba and the autojit compiler for NumPy
228
228
229
- Threading
230
- :::::::::
229
+ Concurrency
230
+ :::::::::::
231
+
232
+
233
+ Concurrent.futures
234
+ ------------------
235
+
236
+ `concurrent.futures`_ is a standard library module that provides a
+ "high-level interface for asynchronously executing callables". It abstracts
+ away many of the more complicated details of using multiple threads or
+ processes for concurrency, so the user can focus on the task at hand.
241
+
242
+ The `concurrent.futures`_ module exposes two main classes, the
+ `ThreadPoolExecutor` and the `ProcessPoolExecutor`. The ThreadPoolExecutor
+ creates a pool of worker threads that a user can submit jobs to. Each job is
+ executed in a worker thread as soon as one becomes available.
247
+
248
+ The ProcessPoolExecutor works in the same way, except that its workers are
+ separate processes rather than threads. This makes it possible to side-step
+ the GIL. However, because callables, arguments, and results are pickled to
+ pass them between processes, only picklable objects can be executed and
+ returned.
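As a rough illustration of the picklability constraint, the sketch below uses the standard `pickle` module directly to check whether an object could be sent to a worker process. The helper name `is_picklable` is hypothetical and is not part of `concurrent.futures`.

```python
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps() (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# Built-in functions are pickled by reference, so they can be
# handed to a ProcessPoolExecutor.
print(is_picklable(abs))

# A lambda has no importable name, so pickling it fails; submitting it
# to a ProcessPoolExecutor would fail for the same reason.
print(is_picklable(lambda x: x * 2))
```

In practice this means the callables passed to a ProcessPoolExecutor should be functions defined at module level, not lambdas or nested functions.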
252
+
253
+ Because of the way the GIL works, a good rule of thumb is to use a
+ ThreadPoolExecutor when the task being executed involves a lot of blocking
+ (e.g. making requests over the network) and to use a ProcessPoolExecutor
+ when the task is computationally expensive.
257
+
258
+ There are two main ways of executing things in parallel using the two
+ Executors. One way is with the `map(func, *iterables)` method. This works
+ almost exactly like the builtin `map()` function, except that the calls are
+ executed concurrently, with the results yielded in input order:
262
+
263
+ .. code-block:: python
+
+     from concurrent.futures import ThreadPoolExecutor
+     import requests
+
+     def get_webpage(url):
+         page = requests.get(url)
+         return page
+
+     pool = ThreadPoolExecutor(max_workers=5)
+
+     my_urls = ['http://google.com/'] * 10  # Create a list of urls
+
+     for page in pool.map(get_webpage, my_urls):
+         # Do something with the result
+         print(page.text)
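The ordering behaviour of `map()` can be checked without network access. The following is a minimal sketch, assuming a made-up `slow_square` function whose smaller inputs deliberately take longer, so that later items finish first:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    # Sleep longer for smaller inputs so later items finish first
    time.sleep(0.05 * (3 - n))
    return n * n

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in the order of the inputs,
    # not in the order in which the calls finish
    results = list(pool.map(slow_square, [0, 1, 2]))

print(results)
```

Using the executor as a context manager also shuts the pool down once the block exits.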
279
+
280
+ For even more control, the `submit(func, *args, **kwargs)` method schedules
+ a callable to be executed (as `func(*args, **kwargs)`) and returns a `Future`_
+ object that represents the execution of the callable.
283
+
284
+ The Future object provides various methods that can be used to check on the
+ progress of the scheduled callable. These include:
+
+ cancel()
+     Attempt to cancel the call. Returns False if the call is already running
+     or has finished and cannot be cancelled.
+ cancelled()
+     Return True if the call was successfully cancelled.
+ running()
+     Return True if the call is currently being executed and cannot be
+     cancelled.
+ done()
+     Return True if the call was successfully cancelled or finished running.
+ result()
+     Return the value returned by the call. By default, this blocks until the
+     scheduled callable returns. If the callable raised an exception, it is
+     re-raised here.
+ exception()
+     Return the exception raised by the call. If no exception was raised then
+     this returns `None`. Note that this will block just like `result()`.
+ add_done_callback(fn)
+     Attach a callback function that will be executed (as `fn(future)`) when the
+     scheduled callable returns.
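A few of these methods can be seen together in the short sketch below. The `divide` and `on_done` functions are made up for illustration; a ThreadPoolExecutor is used only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def divide(a, b):
    return a / b

finished = []

def on_done(future):
    # Called as on_done(future) once the call has finished
    finished.append(future)

with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(divide, 10, 2)
    bad = pool.submit(divide, 1, 0)
    ok.add_done_callback(on_done)

    value = ok.result()       # blocks until the call returns
    error = bad.exception()   # blocks too, returns the exception object

print(value)
print(type(error).__name__)
print(ok.done(), ok.cancelled())
```

Calling `bad.result()` instead of `bad.exception()` would re-raise the `ZeroDivisionError` in the calling thread.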
305
+
306
+
307
+ .. code-block:: python
+
+     from concurrent.futures import ProcessPoolExecutor, as_completed
+
+     def is_prime(n):
+         if n < 2:
+             return n, False
+         if n % 2 == 0:
+             # 2 is the only even prime
+             return n, n == 2
+
+         sqrt_n = int(n ** 0.5)
+         for i in range(3, sqrt_n + 1, 2):
+             if n % i == 0:
+                 return n, False
+         return n, True
+
+     PRIMES = [
+         112272535095293,
+         112582705942171,
+         112272535095293,
+         115280095190773,
+         115797848077099,
+         1099726899285419]
+
+     # The guard is needed when worker processes are spawned (e.g. on Windows)
+     if __name__ == '__main__':
+         futures = []
+         with ProcessPoolExecutor(max_workers=4) as pool:
+             # Schedule the ProcessPoolExecutor to check if a number is prime
+             # and add the returned Future to our list of futures
+             for p in PRIMES:
+                 fut = pool.submit(is_prime, p)
+                 futures.append(fut)
+
+             # As the jobs are completed, print out the results.
+             # as_completed() yields Future objects, so unpack result().
+             for fut in as_completed(futures):
+                 number, result = fut.result()
+                 if result:
+                     print("{} is prime".format(number))
+                 else:
+                     print("{} is not prime".format(number))
343
+
344
+ The `concurrent.futures`_ module contains two helper functions for working with
+ Futures. The `as_completed(futures)` function returns an iterator over the
+ given futures that yields each Future as it completes. Call `result()` on a
+ yielded Future to get the value returned by its callable.
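One common pattern, sketched here with a made-up `wait_and_return` function, is to keep a dictionary that maps each Future back to the input that produced it, since `as_completed()` does not preserve submission order:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def wait_and_return(delay):
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]
completion_order = []

with ThreadPoolExecutor(max_workers=3) as pool:
    # Map each Future back to the input that produced it
    future_to_delay = {pool.submit(wait_and_return, d): d for d in delays}
    for fut in as_completed(future_to_delay):
        # fut is a Future; result() returns immediately because it is done
        completion_order.append(fut.result())

print(completion_order)
```

The shortest delay will normally be reported first, although the exact order depends on thread scheduling.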
347
+
348
+ The `wait(futures)` function blocks until all of the given futures have
+ completed (the default behaviour) and returns a named tuple of two sets: the
+ futures that are done and the futures that are not.
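A minimal sketch of `wait()` follows, again using a made-up `wait_and_return` function. It also shows the `return_when` argument, which can make `wait()` return as soon as the first future finishes instead of waiting for all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def wait_and_return(delay):
    time.sleep(delay)
    return delay

with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(wait_and_return, 0.05)
    slow = pool.submit(wait_and_return, 0.5)

    # Return as soon as at least one future has finished
    done, not_done = wait([fast, slow], return_when=FIRST_COMPLETED)
    first_done = set(done)

    # Default behaviour: block until every future has finished
    done, not_done = wait([fast, slow])

print(len(done), len(not_done))
```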
350
+
351
+ For more information on using the `concurrent.futures`_ module, consult the
+ official documentation.
232
353
233
354
Threading
234
355
---------
235
356
357
+ The standard library comes with a `threading`_ module that allows a user to
+ work with multiple threads manually.
+
+ Running a function in another thread is as simple as passing a callable and
+ its arguments to `Thread`'s constructor (as `target` and `args`) and then
+ calling `start()`:
362
+
363
+ .. code-block:: python
+
+     from threading import Thread
+     import requests
+
+     def get_webpage(url):
+         page = requests.get(url)
+         return page
+
+     # args must be a tuple, hence the trailing comma
+     some_thread = Thread(target=get_webpage, args=('http://google.com/',))
+     some_thread.start()
374
+
375
+ To wait until the thread has terminated, call `join()`:
+
+ .. code-block:: python
+
+     some_thread.join()
380
+
381
+ `join()` also accepts an optional timeout in seconds. Because `join()` always
+ returns `None`, after calling it with a timeout you should check whether the
+ thread is still alive to find out whether the call timed out:
383
+
384
+ .. code-block:: python
+
+     if some_thread.is_alive():
+         print("join() must have timed out.")
+     else:
+         print("Our thread has terminated.")
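The interplay between a `join()` timeout and `is_alive()` can be tried out without any network access. This is a small sketch, assuming a made-up `slow_task` function that just sleeps:

```python
import time
from threading import Thread

def slow_task():
    time.sleep(0.3)

worker = Thread(target=slow_task)
worker.start()

worker.join(timeout=0.05)      # give up waiting after 50 ms
timed_out = worker.is_alive()  # still running, so the join timed out

worker.join()                  # no timeout: wait until the thread finishes
finished = not worker.is_alive()

print(timed_out, finished)
```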
390
+
391
+ Because multiple threads have access to the same section of memory, two or
+ more threads may try to write to the same resource at the same time, or the
+ output may depend on the sequence or timing of certain events. This is called
+ a `data race`_ or race condition. When this happens, the output may be
+ garbled, or you may encounter problems that are difficult to debug. A good
+ example is this `stackoverflow post`_.
397
+
398
+ This can be avoided by using a `Lock`_ that each thread needs to acquire
+ before writing to a shared resource. Locks can be acquired and released
+ through either the context manager protocol (`with` statement), or by using
+ `acquire()` and `release()` directly. Here is a (rather contrived) example:
402
+
403
+
404
+ .. code-block:: python
+
+     from threading import Lock, Thread
+
+     file_lock = Lock()
+
+     def log(msg):
+         with file_lock:
+             # Append so that earlier entries are not overwritten
+             with open('website_changes.log', 'a') as f:
+                 f.write(msg)
+
+     def monitor_website(some_website):
+         """
+         Monitor a website and then if there are any changes,
+         log them to disk.
+         """
+         while True:
+             # check_for_changes() is assumed to be defined elsewhere
+             changes = check_for_changes(some_website)
+             if changes:
+                 log(changes)
+
+     websites = ['http://google.com/', ...]
+     for website in websites:
+         t = Thread(target=monitor_website, args=(website,))
+         t.start()
429
+
430
+ Here, we have a bunch of threads checking for changes on a list of sites and
+ whenever there are any changes, they attempt to write those changes to a file
+ by calling `log(changes)`. When `log()` is called, the `with file_lock:`
+ statement waits until the lock can be acquired. This ensures that at any one
+ time, only one thread is writing to the file.
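For comparison, here is a self-contained sketch that uses `acquire()` and `release()` directly instead of the `with` statement. The shared `state` dictionary and the `add_many` function are made up for illustration; the `try`/`finally` ensures the lock is released even if the protected code raises.

```python
from threading import Lock, Thread

state = {'count': 0}
state_lock = Lock()

def add_many(n):
    for _ in range(n):
        state_lock.acquire()
        try:
            # Read-modify-write on shared state, protected by the lock
            state['count'] += 1
        finally:
            state_lock.release()

threads = [Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state['count'])
```

The `with state_lock:` form is usually preferable because it performs the same acquire and release steps with less room for error.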
236
435
237
436
Spawning Processes
238
437
------------------
@@ -248,3 +447,8 @@ Multiprocessing
248
447
.. _`New GIL`: http://www.dabeaz.com/python/NewGIL.pdf
.. _`Special care`: http://docs.python.org/c-api/init.html#threads
.. _`David Beazley's`: http://www.dabeaz.com/GIL/gilvis/measure2.py
+ .. _`concurrent.futures`: https://docs.python.org/3/library/concurrent.futures.html
+ .. _`Future`: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future
+ .. _`threading`: https://docs.python.org/3/library/threading.html
+ .. _`stackoverflow post`: http://stackoverflow.com/questions/26688424/python-threads-are-printing-at-the-same-time-messing-up-the-text-output
+ .. _`data race`: https://en.wikipedia.org/wiki/Race_condition
+ .. _`Lock`: https://docs.python.org/3/library/threading.html#lock-objects