|
5 | 5 | ----------
|
6 | 6 | Problem
|
7 | 7 | ----------
|
8 |
| -You’ve heard about the Global Interpreter Lock (GIL), and are worried that it might be |
9 |
| -affecting the performance of your multithreaded program. |
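| 8 | +You have heard about the Global Interpreter Lock (GIL) and are worried that it might be affecting the performance of your multithreaded program. |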
10 | 9 |
|
11 | 10 | |
|
12 | 11 |
|
13 | 12 | ----------
|
14 | 13 | Solution
|
15 | 14 | ----------
|
16 |
| -Although Python fully supports thread programming, parts of the C implementation |
17 |
| -of the interpreter are not entirely thread safe to a level of allowing fully concurrent |
18 |
| -execution. In fact, the interpreter is protected by a so-called Global Interpreter Lock |
19 |
| -(GIL) that only allows one Python thread to execute at any given time. The most no‐ |
20 |
| -ticeable effect of the GIL is that multithreaded Python programs are not able to fully |
21 |
| -take advantage of multiple CPU cores (e.g., a computationally intensive application |
22 |
| -using more than one thread only runs on a single CPU). |
23 |
| - |
24 |
| -Before discussing common GIL workarounds, it is important to emphasize that the GIL |
25 |
| -tends to only affect programs that are heavily CPU bound (i.e., dominated by compu‐ |
26 |
| -tation). If your program is mostly doing I/O, such as network communication, threads |
27 |
| -are often a sensible choice because they’re mostly going to spend their time sitting |
28 |
| -around waiting. In fact, you can create thousands of Python threads with barely a con‐ |
29 |
| -cern. Modern operating systems have no trouble running with that many threads, so |
30 |
| -it’s simply not something you should worry much about. |
31 |
| -For CPU-bound programs, you really need to study the nature of the computation being |
32 |
| -performed. For instance, careful choice of the underlying algorithm may produce a far |
33 |
| -greater speedup than trying to parallelize an unoptimal algorithm with threads. Simi‐ |
34 |
| -larly, given that Python is interpreted, you might get a far greater speedup simply by |
35 |
| -moving performance-critical code into a C extension module. Extensions such as |
36 |
| -NumPy are also highly effective at speeding up certain kinds of calculations involving |
37 |
| -array data. Last, but not least, you might investigate alternative implementations, such |
38 |
| -as PyPy, which features optimizations such as a JIT compiler (although, as of this writing, |
39 |
| -it does not yet support Python 3). |
40 |
| -It’s also worth noting that threads are not necessarily used exclusively for performance. |
41 |
| -A CPU-bound program might be using threads to manage a graphical user interface, a |
42 |
| -network connection, or provide some other kind of service. In this case, the GIL can |
43 |
| -actually present more of a problem, since code that holds it for an excessively long period |
44 |
| -will cause annoying stalls in the non-CPU-bound threads. In fact, a poorly written C |
45 |
| -extension can actually make this problem worse, even though the computation part of |
46 |
| -the code might run faster than before. |
47 |
| -Having said all of this, there are two common strategies for working around the limi‐ |
48 |
| -tations of the GIL. First, if you are working entirely in Python, you can use the multi |
49 |
| -processing module to create a process pool and use it like a co-processor. For example, |
50 |
| -suppose you have the following thread code: |
51 |
| - |
52 |
| -# Performs a large calculation (CPU bound) |
53 |
| -def some_work(args): |
54 |
| - ... |
55 |
| - return result |
56 |
| - |
57 |
| -# A thread that calls the above function |
58 |
| -def some_thread(): |
59 |
| - while True: |
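| 15 | +Although Python fully supports thread programming, |
| 16 | +parts of the interpreter's C implementation are not thread safe enough to allow fully concurrent execution. |
| 17 | +In fact, the interpreter is protected by a Global Interpreter Lock (GIL) that allows only one Python thread to execute at any given time. |
| 18 | +The most noticeable effect of the GIL is that multithreaded Python programs cannot take full advantage of multiple CPU cores |
| 19 | +(e.g., a computationally intensive application using more than one thread runs on only a single CPU). |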
| 20 | + |
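| 21 | +Before discussing common GIL workarounds, it is important to emphasize that the GIL tends to affect only programs that are heavily CPU bound (i.e., dominated by computation). |
| 22 | +If your program is mostly doing I/O, such as network communication, threads are often a sensible choice, |
| 23 | +because they spend most of their time waiting. In fact, you can create thousands of Python threads with barely a concern. |
| 24 | +Modern operating systems have no trouble running that many threads, so it is not something you should worry much about. |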
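The point about I/O-bound threads can be checked with a small experiment. The sketch below is only an illustration: `fake_io` and `run_io_threads` are hypothetical names, and `time.sleep` stands in for a real blocking call such as a socket read. Because a sleeping thread does not hold the GIL, the waits overlap instead of adding up.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call; a sleeping thread does not hold the GIL."""
    time.sleep(delay)


def run_io_threads(nthreads, delay):
    """Run nthreads waiting threads concurrently and return the elapsed wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_io_threads(nthreads=20, delay=0.2)
    # The waits overlap, so the total should be far below 20 * 0.2 seconds.
    print('20 waiting threads took %.2f seconds' % elapsed)
```

Replacing `fake_io` with a pure-Python computation of similar duration would be expected to show the opposite behavior, with the total time growing roughly in proportion to the number of threads.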
| 25 | + |
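| 26 | +For CPU-bound programs, you need to study the nature of the computation being performed. |
| 27 | +For instance, a careful choice of the underlying algorithm may produce a far greater speedup than parallelizing a suboptimal algorithm with threads. |
| 28 | +Similarly, because Python is interpreted, moving performance-critical code into a C extension module |
| 29 | +may yield a far greater speedup. Extensions such as NumPy are also highly effective at speeding up calculations on array data. |
| 30 | +Last, you might investigate alternative implementations such as PyPy, which features optimizations such as a JIT compiler |
| 31 | +(although, at the time the book was written, it did not yet support Python 3). |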
| 32 | + |
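| 33 | +It is also worth noting that threads are not used exclusively for performance. |
| 34 | +A CPU-bound program might use threads to manage a graphical user interface, a network connection, or some other kind of service. |
| 35 | +In this case, the GIL can actually present more of a problem, because code that holds it for an excessively long period causes annoying stalls in the non-CPU-bound threads. |
| 36 | +In fact, a poorly written C extension can make this problem worse, |
| 37 | +even though the computation part of the code might run faster than before. |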
| 38 | + |
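| 39 | +Having said all of this, there are two common strategies for working around the limitations of the GIL. |
| 40 | +First, if you are working entirely in Python, you can use the ``multiprocessing`` module to create a process pool |
| 41 | +and use it like a co-processor. For example, suppose you have the following thread code: |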
| 42 | + |
| 43 | +.. code-block:: python |
| 44 | +
|
| 45 | + # Performs a large calculation (CPU bound) |
| 46 | + def some_work(args): |
60 | 47 | ...
|
61 |
| - r = some_work(args) |
| 48 | + return result |
| 49 | +
|
| 50 | + # A thread that calls the above function |
| 51 | + def some_thread(): |
| 52 | + while True: |
| 53 | + ... |
| 54 | + r = some_work(args) |
62 | 55 | ...
|
63 | 56 |
|
64 |
| -Here’s how you would modify the code to use a pool: |
| 57 | +Here is how you would modify the code to use a pool: |
65 | 58 |
|
66 |
| -# Processing pool (see below for initiazation) |
67 |
| -pool = None |
| 59 | +.. code-block:: python |
68 | 60 |
|
69 |
| -# Performs a large calculation (CPU bound) |
70 |
| -def some_work(args): |
71 |
| - ... |
72 |
| - return result |
| 61 | + # Processing pool (see below for initialization) |
| 62 | + pool = None |
73 | 63 |
|
74 |
| -# A thread that calls the above function |
75 |
| -def some_thread(): |
76 |
| - while True: |
77 |
| - ... |
78 |
| - r = pool.apply(some_work, (args)) |
| 64 | + # Performs a large calculation (CPU bound) |
| 65 | + def some_work(args): |
79 | 66 | ...
|
| 67 | + return result |
| 68 | +
|
| 69 | + # A thread that calls the above function |
| 70 | + def some_thread(): |
| 71 | + while True: |
| 72 | + ... |
| 73 | + r = pool.apply(some_work, (args,)) |
| 74 | + ... |
| 75 | +
|
| 76 | + # Initialize the pool |
| 77 | + if __name__ == '__main__': |
| 78 | + import multiprocessing |
| 79 | + pool = multiprocessing.Pool() |
| 80 | +
|
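| 81 | +This example works around the GIL using a neat trick. |
| 82 | +Whenever a thread wants to perform CPU-intensive work, it hands the work to the pool. |
| 83 | +The pool, in turn, hands the work to a separate Python interpreter running in a different process. |
| 84 | +While the thread is waiting for the result, it releases the GIL. |
| 85 | +Moreover, because the calculation is performed in a separate interpreter, it is no longer bound by the restrictions of the GIL. |
| 86 | +On a multicore system, this technique easily allows you to take advantage of all the CPUs. |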
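For readers who want something they can run, the pattern described above might be filled in roughly as follows. This is a sketch under assumptions, not the recipe's own code: the sum-of-squares body of `some_work`, the `results` list, and the input sizes are made up for illustration. The pool is created once in the main block, before any thread starts.

```python
import multiprocessing
import threading

# Process pool shared by all threads (created in the main block below)
pool = None


def some_work(n):
    """CPU-bound stand-in: sum of squares below n (hypothetical workload)."""
    return sum(i * i for i in range(n))


def some_thread(n, results, index):
    """Thread body: hand the calculation to the pool and wait for the result."""
    # apply() blocks only this thread; other threads keep running meanwhile.
    results[index] = pool.apply(some_work, (n,))


if __name__ == '__main__':
    # Create the pool once, before starting any threads.
    pool = multiprocessing.Pool()
    sizes = [100000, 200000, 300000]
    results = [None] * len(sizes)
    threads = [threading.Thread(target=some_thread, args=(n, results, i))
               for i, n in enumerate(sizes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pool.close()
    pool.join()
    print(results)
```

Note that the arguments are passed as the one-element tuple `(n,)`; writing `(n)` would pass a bare integer where `apply()` expects a sequence of arguments.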
| 87 | + |
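| 88 | +The second strategy for working around the GIL is to focus on C extension programming. |
| 89 | +The general idea is to move computationally intensive tasks to C, independent of Python, and have the C code release the GIL while it is working. |
| 90 | +This is done by inserting special macros into the C code like this: |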
| 91 | + |
| 92 | +:: |
| 93 | + |
| 94 | + #include "Python.h" |
| 95 | + ... |
| 96 | + |
| 97 | + PyObject *pyfunc(PyObject *self, PyObject *args) { |
| 98 | + ... |
| 99 | + Py_BEGIN_ALLOW_THREADS |
| 100 | + // Threaded C code |
| 101 | + ... |
| 102 | + Py_END_ALLOW_THREADS |
| 103 | + ... |
| 104 | + } |
80 | 105 |
|
81 |
| -# Initiaze the pool |
82 |
| -if __name__ == '__main__': |
83 |
| - import multiprocessing |
84 |
| - pool = multiprocessing.Pool() |
85 |
| - |
86 |
| -This example with a pool works around the GIL using a neat trick. Whenever a thread |
87 |
| -wants to perform CPU-intensive work, it hands the work to the pool. The pool, in turn, |
88 |
| -hands the work to a separate Python interpreter running in a different process. While |
89 |
| -the thread is waiting for the result, it releases the GIL. Moreover, because the calculation |
90 |
| -is being performed in a separate interpreter, it’s no longer bound by the restrictions of |
91 |
| -the GIL. On a multicore system, you’ll find that this technique easily allows you to take |
92 |
| -advantage of all the CPUs. |
93 |
| -The second strategy for working around the GIL is to focus on C extension program‐ |
94 |
| -ming. The general idea is to move computationally intensive tasks to C, independent of |
95 |
| -Python, and have the C code release the GIL while it’s working. This is done by inserting |
96 |
| -special macros into the C code like this: |
97 |
| - |
98 |
| -#include "Python.h" |
99 |
| -... |
100 |
| - |
101 |
| -PyObject *pyfunc(PyObject *self, PyObject *args) { |
102 |
| - ... |
103 |
| - Py_BEGIN_ALLOW_THREADS |
104 |
| - // Threaded C code |
105 |
| - ... |
106 |
| - Py_END_ALLOW_THREADS |
107 |
| - ... |
108 |
| -} |
109 |
| -
|
110 |
| -If you are using other tools to access C, such as the ctypes library or Cython, you may |
111 |
| -not need to do anything. For example, ctypes releases the GIL when calling into C by |
112 |
| -default. |
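| 106 | +If you are using other tools to access C, such as the ctypes library or Cython, you may not need to do anything. |
| 107 | +For example, ctypes releases the GIL by default when calling into C. |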
113 | 108 |
|
114 | 109 | |
|
115 | 110 |
|
116 | 111 | ----------
|
117 | 112 | Discussion
|
118 | 113 | ----------
|
119 |
| -Many programmers, when faced with thread performance problems, are quick to blame |
120 |
| -the GIL for all of their ills. However, doing so is shortsighted and naive. Just as a real- |
121 |
| - |
122 |
| -world example, mysterious “stalls” in a multithreaded network program might be caused |
123 |
| -by something entirely different (e.g., a stalled DNS lookup) rather than anything related |
124 |
| -to the GIL. The bottom line is that you really need to study your code to know if the |
125 |
| -GIL is an issue or not. Again, realize that the GIL is mostly concerned with CPU-bound |
126 |
| -processing, not I/O. |
127 |
| -If you are going to use a process pool as a workaround, be aware that doing so involves |
128 |
| -data serialization and communication with a different Python interpreter. For this to |
129 |
| -work, the operation to be performed needs to be contained within a Python function |
130 |
| -defined by the def statement (i.e., no lambdas, closures, callable instances, etc.), and the |
131 |
| -function arguments and return value must be compatible with pickle. Also, the amount |
132 |
| -of work to be performed must be sufficiently large to make up for the extra communi‐ |
133 |
| -cation overhead. |
134 |
| -Another subtle aspect of pools is that mixing threads and process pools together can be |
135 |
| -a good way to make your head explode. If you are going to use both of these features |
136 |
| -together, it is often best to create the process pool as a singleton at program startup, |
137 |
| -prior to the creation of any threads. Threads will then use the same process pool for all |
138 |
| -of their computationally intensive work. |
139 |
| -For C extensions, the most important feature is maintaining isolation from the Python |
140 |
| -interpreter process. That is, if you’re going to offload work from Python to C, you need |
141 |
| -to make sure the C code operates independently of Python. This means using no Python |
142 |
| -data structures and making no calls to Python’s C API. Another consideration is that |
143 |
| -you want to make sure your C extension does enough work to make it all worthwhile. |
144 |
| -That is, it’s much better if the extension can perform millions of calculations as opposed |
145 |
| -to just a few small calculations. |
146 |
| -Needless to say, these solutions to working around the GIL don’t apply to all possible |
147 |
| -problems. For instance, certain kinds of applications don’t work well if separated into |
148 |
| -multiple processes, nor may you want to code parts in C. For these kinds of applications, |
149 |
| -you may have to come up with your own solution (e.g., multiple processes accessing |
150 |
| -shared memory regions, multiple interpreters running in the same process, etc.). Al‐ |
151 |
| -ternatively, you might look at some other implementations of the interpreter, such as |
152 |
| -PyPy. |
153 |
| -See Recipes 15.7 and 15.10 for additional information on releasing the GIL in C |
154 |
| -extensions. |
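| 114 | +Many programmers, when faced with thread performance problems, are quick to blame the GIL for all of their ills. |
| 115 | +However, doing so is shortsighted and naive. |
| 116 | +As a real-world example, mysterious "stalls" in a multithreaded network program |
| 117 | +might be caused by something entirely different (e.g., a stalled DNS lookup) rather than anything related to the GIL. |
| 118 | +The bottom line is that you need to study your code to know whether the GIL is an issue. |
| 119 | +Again, realize that the GIL is mostly a concern for CPU-bound processing, not I/O. |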
| 120 | + |
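| 121 | +If you are going to use a process pool as a workaround, be aware that doing so involves data serialization and communication with a different Python interpreter. |
| 122 | +For this to work, the operation to be performed needs to be contained within a Python function defined by the ``def`` statement (i.e., no lambdas, closures, callable instances, etc.), |
| 123 | +and the function arguments and return value must be compatible with ``pickle``. |
| 124 | +Also, the amount of work to be performed must be large enough to make up for the extra communication overhead. |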
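One way to see the pickle requirement in practice is to try serializing a candidate before handing it to a pool. The helper below is a hypothetical convenience, not part of `multiprocessing`; it simply reports whether `pickle.dumps()` accepts an object. `math.sqrt` is used as the picklable function because it can always be looked up by name.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps(), as pool arguments must."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    # A module-level function is pickled by reference to its importable name.
    print('math.sqrt:', is_picklable(math.sqrt))
    # A lambda has no importable name, so pickling it fails.
    print('lambda:', is_picklable(lambda x: x * x))
    # Plain data made of built-in types pickles without trouble.
    print('tuple of data:', is_picklable((1, 'two', [3.0])))
```

A check like this only covers serialization; it says nothing about whether the work is large enough to justify the communication overhead.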
| 125 | + |
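| 126 | +Another subtle aspect of pools is that mixing threads and process pools together can be a good way to make your head explode. |
| 127 | +If you are going to use both together, it is often best to create the process pool as a singleton at program startup, prior to the creation of any threads. |
| 128 | +Threads will then use the same process pool for all of their computationally intensive work. |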
| 129 | + |
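| 130 | +For C extensions, the most important feature is maintaining isolation from the Python interpreter process. |
| 131 | +That is, if you are going to offload work from Python to C, |
| 132 | +you need to make sure the C code operates independently of Python. |
| 133 | +This means using no Python data structures and making no calls to Python's C API. |
| 134 | +Another consideration is that your C extension should do enough work to make it all worthwhile. |
| 135 | +That is, it is much better if the extension performs millions of calculations rather than just a few small ones. |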
| 136 | + |
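| 137 | +Needless to say, these solutions for working around the GIL do not apply to all possible problems. |
| 138 | +For instance, certain kinds of applications do not work well if separated into multiple processes, |
| 139 | +nor may you want to code parts of them in C. |
| 140 | +For these kinds of applications, you may have to come up with your own solution |
| 141 | +(e.g., multiple processes accessing shared memory regions, multiple interpreters running in the same process, etc.). |
| 142 | +Alternatively, you might look at other implementations of the interpreter, such as PyPy. |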
| 143 | + |
| 144 | +See Recipes 15.7 and 15.10 for additional information on releasing the GIL in C extensions. |