|
5 | 5 | ----------
|
6 | 6 | 问题
|
7 | 7 | ----------
|
8 |
| -You’re writing a multithreaded program where threads need to acquire more than one |
9 |
| -lock at a time while avoiding deadlock. |
| 8 | +你正在写一个多线程程序,其中线程需要一次获取多个锁,此时如何避免死锁问题。 |
10 | 9 |
|
11 | 10 | |
|
12 | 11 |
|
13 | 12 | ----------
|
14 | 13 | 解决方案
|
15 | 14 | ----------
|
16 |
| -In multithreaded programs, a common source of deadlock is due to threads that attempt |
17 |
| -to acquire multiple locks at once. For instance, if a thread acquires the first lock, but |
18 |
| -then blocks trying to acquire the second lock, that thread can potentially block the |
19 |
| -progress of other threads and make the program freeze. |
20 |
| -One solution to deadlock avoidance is to assign each lock in the program a unique |
21 |
| -number, and to enforce an ordering rule that only allows multiple locks to be acquired |
22 |
| -in ascending order. This is surprisingly easy to implement using a context manager as |
23 |
| -follows: |
24 |
| - |
25 |
| -import threading |
26 |
| -from contextlib import contextmanager |
27 |
| - |
28 |
| -# Thread-local state to stored information on locks already acquired |
29 |
| -_local = threading.local() |
30 |
| - |
31 |
| -@contextmanager |
32 |
| -def acquire(*locks): |
33 |
| - # Sort locks by object identifier |
34 |
| - locks = sorted(locks, key=lambda x: id(x)) |
35 |
| -
|
36 |
| - # Make sure lock order of previously acquired locks is not violated |
37 |
| - acquired = getattr(_local,'acquired',[]) |
38 |
| - if acquired and max(id(lock) for lock in acquired) >= id(locks[0]): |
39 |
| - raise RuntimeError('Lock Order Violation') |
40 |
| - |
41 |
| - # Acquire all of the locks |
42 |
| - acquired.extend(locks) |
43 |
| - _local.acquired = acquired |
44 |
| - |
45 |
| - try: |
46 |
| - for lock in locks: |
47 |
| - lock.acquire() |
48 |
| - yield |
49 |
| - finally: |
50 |
| - # Release locks in reverse order of acquisition |
51 |
| - for lock in reversed(locks): |
52 |
| - lock.release() |
53 |
| - del acquired[-len(locks):] |
54 |
| - |
55 |
| -To use this context manager, you simply allocate lock objects in the normal way, but use |
56 |
| -the acquire() function whenever you want to work with one or more locks. For |
57 |
| -example: |
58 |
| - |
59 |
| -import threading |
60 |
| -x_lock = threading.Lock() |
61 |
| -y_lock = threading.Lock() |
62 |
| - |
63 |
| -def thread_1(): |
64 |
| - while True: |
65 |
| - with acquire(x_lock, y_lock): |
66 |
| - print('Thread-1') |
67 |
| - |
68 |
| -def thread_2(): |
69 |
| - while True: |
70 |
| - with acquire(y_lock, x_lock): |
71 |
| - print('Thread-2') |
72 |
| - |
73 |
| -t1 = threading.Thread(target=thread_1) |
74 |
| -t1.daemon = True |
75 |
| -t1.start() |
76 |
| - |
77 |
| -t2 = threading.Thread(target=thread_2) |
78 |
| -t2.daemon = True |
79 |
| -t2.start() |
80 |
| - |
81 |
| -If you run this program, you’ll find that it happily runs forever without deadlock—even |
82 |
| -though the acquisition of locks is specified in a different order in each function. |
83 |
| -The key to this recipe lies in the first statement that sorts the locks according to object |
84 |
| -identifier. By sorting the locks, they always get acquired in a consistent order regardless |
85 |
| -of how the user might have provided them to acquire(). |
86 |
| -The solution uses thread-local storage to solve a subtle problem with detecting potential |
87 |
| -deadlock if multiple acquire() operations are nested. For example, suppose you wrote |
88 |
| -the code like this: |
89 |
| - |
90 |
| -import threading |
91 |
| -x_lock = threading.Lock() |
92 |
| -y_lock = threading.Lock() |
93 |
| - |
94 |
| -def thread_1(): |
95 |
| - |
96 |
| - while True: |
97 |
| - with acquire(x_lock): |
98 |
| - with acquire(y_lock): |
99 |
| - print('Thread-1') |
100 |
| - |
101 |
| -def thread_2(): |
102 |
| - while True: |
103 |
| - with acquire(y_lock): |
104 |
| - with acquire(x_lock): |
105 |
| - print('Thread-2') |
106 |
| - |
107 |
| -t1 = threading.Thread(target=thread_1) |
108 |
| -t1.daemon = True |
109 |
| -t1.start() |
110 |
| - |
111 |
| -t2 = threading.Thread(target=thread_2) |
112 |
| -t2.daemon = True |
113 |
| -t2.start() |
114 |
| - |
115 |
| -If you run this version of the program, one of the threads will crash with an exception |
116 |
| -such as this: |
117 |
| - |
118 |
| -Exception in thread Thread-1: |
119 |
| -Traceback (most recent call last): |
120 |
| - File "/usr/local/lib/python3.3/threading.py", line 639, in _bootstrap_inner |
121 |
| - self.run() |
122 |
| - File "/usr/local/lib/python3.3/threading.py", line 596, in run |
123 |
| - self._target(*self._args, **self._kwargs) |
124 |
| - File "deadlock.py", line 49, in thread_1 |
125 |
| - with acquire(y_lock): |
126 |
| - File "/usr/local/lib/python3.3/contextlib.py", line 48, in __enter__ |
127 |
| - return next(self.gen) |
128 |
| - File "deadlock.py", line 15, in acquire |
129 |
| - raise RuntimeError("Lock Order Violation") |
130 |
| -RuntimeError: Lock Order Violation |
131 |
| ->>> |
132 |
| -
|
133 |
| -This crash is caused by the fact that each thread remembers the locks it has already |
134 |
| -acquired. The acquire() function checks the list of previously acquired locks and en‐ |
135 |
| -forces the ordering constraint that previously acquired locks must have an object ID |
136 |
| -that is less than the new locks being acquired. |
| 15 | +在多线程程序中,死锁问题很大一部分是由于线程同时获取多个锁造成的。举个例子:一个线程获取了第一个锁,然后在获取第二个锁的 |
| 16 | +时候发生阻塞,那么这个线程就可能阻塞其他线程的执行,从而导致整个程序假死。 |
| 17 | +解决死锁问题的一种方案是为程序中的每一个锁分配一个唯一的id,然后只允许按照升序规则来使用多个锁,这个规则使用上下文管理器 |
| 18 | +是非常容易实现的,示例如下: |
| 19 | + |
| 20 | +.. code-block:: python |
| 21 | + import threading |
| 22 | + from contextlib import contextmanager |
| 23 | +
|
| 24 | + # Thread-local state to stored information on locks already acquired |
| 25 | + _local = threading.local() |
| 26 | +
|
| 27 | + @contextmanager |
| 28 | + def acquire(*locks): |
| 29 | + # Sort locks by object identifier |
| 30 | + locks = sorted(locks, key=lambda x: id(x)) |
| 31 | +
|
| 32 | + # Make sure lock order of previously acquired locks is not violated |
| 33 | + acquired = getattr(_local,'acquired',[]) |
| 34 | + if acquired and max(id(lock) for lock in acquired) >= id(locks[0]): |
| 35 | + raise RuntimeError('Lock Order Violation') |
| 36 | +
|
| 37 | + # Acquire all of the locks |
| 38 | + acquired.extend(locks) |
| 39 | + _local.acquired = acquired |
| 40 | +
|
| 41 | + try: |
| 42 | + for lock in locks: |
| 43 | + lock.acquire() |
| 44 | + yield |
| 45 | + finally: |
| 46 | + # Release locks in reverse order of acquisition |
| 47 | + for lock in reversed(locks): |
| 48 | + lock.release() |
| 49 | + del acquired[-len(locks):] |
| 50 | +
|
| 51 | +如何使用这个上下文管理器呢?你可以按照正常途径创建一个锁对象,但不论是单个锁还是多个锁中都使用 ``acquire()`` 函数来申请锁, |
| 52 | +示例如下: |
| 53 | + |
| 54 | +.. code-block:: python |
| 55 | + import threading |
| 56 | + x_lock = threading.Lock() |
| 57 | + y_lock = threading.Lock() |
| 58 | +
|
| 59 | + def thread_1(): |
| 60 | + while True: |
| 61 | + with acquire(x_lock, y_lock): |
| 62 | + print('Thread-1') |
| 63 | +
|
| 64 | + def thread_2(): |
| 65 | + while True: |
| 66 | + with acquire(y_lock, x_lock): |
| 67 | + print('Thread-2') |
| 68 | +
|
| 69 | + t1 = threading.Thread(target=thread_1) |
| 70 | + t1.daemon = True |
| 71 | + t1.start() |
| 72 | +
|
| 73 | + t2 = threading.Thread(target=thread_2) |
| 74 | + t2.daemon = True |
| 75 | + t2.start() |
| 76 | +
|
| 77 | +如果你执行这段代码,你会发现它即使在不同的函数中以不同的顺序获取锁也没有发生死锁。 |
| 78 | +其关键在于,在第一段代码中,我们对这些锁进行了排序。通过排序,使得不管用户以什么样的顺序来请求锁,这些锁都会按照固定的顺序被获取。 |
| 79 | +如果有多个 ``acquire()`` 操作被嵌套调用,可以通过线程本地存储(TLS)来检测潜在的死锁问题。 |
| 80 | +假设你的代码是这样写的: |
| 81 | + |
| 82 | +.. code-block:: python |
| 83 | + import threading |
| 84 | + x_lock = threading.Lock() |
| 85 | + y_lock = threading.Lock() |
| 86 | +
|
| 87 | + def thread_1(): |
| 88 | +
|
| 89 | + while True: |
| 90 | + with acquire(x_lock): |
| 91 | + with acquire(y_lock): |
| 92 | + print('Thread-1') |
| 93 | +
|
| 94 | + def thread_2(): |
| 95 | + while True: |
| 96 | + with acquire(y_lock): |
| 97 | + with acquire(x_lock): |
| 98 | + print('Thread-2') |
| 99 | +
|
| 100 | + t1 = threading.Thread(target=thread_1) |
| 101 | + t1.daemon = True |
| 102 | + t1.start() |
| 103 | +
|
| 104 | + t2 = threading.Thread(target=thread_2) |
| 105 | + t2.daemon = True |
| 106 | + t2.start() |
| 107 | +
|
| 108 | +如果你运行这个版本的代码,必定会有一个线程发生崩溃,异常信息可能像这样: |
| 109 | + |
| 110 | +.. code-block:: python |
| 111 | + Exception in thread Thread-1: |
| 112 | + Traceback (most recent call last): |
| 113 | + File "/usr/local/lib/python3.3/threading.py", line 639, in _bootstrap_inner |
| 114 | + self.run() |
| 115 | + File "/usr/local/lib/python3.3/threading.py", line 596, in run |
| 116 | + self._target(*self._args, **self._kwargs) |
| 117 | + File "deadlock.py", line 49, in thread_1 |
| 118 | + with acquire(y_lock): |
| 119 | + File "/usr/local/lib/python3.3/contextlib.py", line 48, in __enter__ |
| 120 | + return next(self.gen) |
| 121 | + File "deadlock.py", line 15, in acquire |
| 122 | + raise RuntimeError("Lock Order Violation") |
| 123 | + RuntimeError: Lock Order Violation |
| 124 | + >>> |
| 125 | +
|
| 126 | +发生崩溃的原因在于,每个线程都记录着自己已经获取到的锁。 ``acquire()`` 函数会检查之前已经获取的锁列表, |
| 127 | +由于锁是按照升序排列获取的,所以函数会认为之前已获取的锁的id必定小于新申请到的锁,这时就会触发异常。 |
137 | 128 |
|
138 | 129 | |
|
139 | 130 |
|
140 | 131 | ----------
|
141 | 132 | 讨论
|
142 | 133 | ----------
|
143 |
| -The issue of deadlock is a well-known problem with programs involving threads (as |
144 |
| -well as a common subject in textbooks on operating systems). As a rule of thumb, as |
145 |
| -long as you can ensure that threads can hold only one lock at a time, your program will |
146 |
| -be deadlock free. However, once multiple locks are being acquired at the same time, all |
147 |
| -bets are off. |
148 |
| - |
149 |
| -Detecting and recovering from deadlock is an extremely tricky problem with few elegant |
150 |
| -solutions. For example, a common deadlock detection and recovery scheme involves |
151 |
| -the use of a watchdog timer. As threads run, they periodically reset the timer, and as |
152 |
| -long as everything is running smoothly, all is well. However, if the program deadlocks, |
153 |
| -the watchdog timer will eventually expire. At that point, the program “recovers” by |
154 |
| -killing and then restarting itself. |
155 |
| -Deadlock avoidance is a different strategy where locking operations are carried out in |
156 |
| -a manner that simply does not allow the program to enter a deadlocked state. The |
157 |
| -solution in which locks are always acquired in strict order of ascending object ID can |
158 |
| -be mathematically proven to avoid deadlock, although the proof is left as an exercise to |
159 |
| -the reader (the gist of it is that by acquiring locks in a purely increasing order, you can’t |
160 |
| -get cyclic locking dependencies, which are a necessary condition for deadlock to occur). |
161 |
| -As a final example, a classic thread deadlock problem is the so-called “dining philoso‐ |
162 |
| -pher’s problem.” In this problem, five philosophers sit around a table on which there |
163 |
| -are five bowls of rice and five chopsticks. Each philosopher represents an independent |
164 |
| -thread and each chopstick represents a lock. In the problem, philosophers either sit and |
165 |
| -think or they eat rice. However, in order to eat rice, a philosopher needs two chopsticks. |
166 |
| -Unfortunately, if all of the philosophers reach over and grab the chopstick to their left, |
167 |
| -they’ll all just sit there with one stick and eventually starve to death. It’s a gruesome |
168 |
| -scene. |
169 |
| -Using the solution, here is a simple deadlock free implementation of the dining philos‐ |
170 |
| -opher’s problem: |
171 |
| - |
172 |
| -import threading |
173 |
| - |
174 |
| -# The philosopher thread |
175 |
| -def philosopher(left, right): |
176 |
| - while True: |
177 |
| - with acquire(left,right): |
178 |
| - print(threading.currentThread(), 'eating') |
179 |
| - |
180 |
| -# The chopsticks (represented by locks) |
181 |
| -NSTICKS = 5 |
182 |
| -chopsticks = [threading.Lock() for n in range(NSTICKS)] |
183 |
| - |
184 |
| -# Create all of the philosophers |
185 |
| -for n in range(NSTICKS): |
186 |
| - t = threading.Thread(target=philosopher, |
187 |
| - args=(chopsticks[n],chopsticks[(n+1) % NSTICKS])) |
188 |
| - t.start() |
189 |
| - |
190 |
| -Last, but not least, it should be noted that in order to avoid deadlock, all locking oper‐ |
191 |
| -ations must be carried out using our acquire() function. If some fragment of code |
192 |
| -decided to acquire a lock directly, then the deadlock avoidance algorithm wouldn’t work. |
193 |
| - |
| 134 | +死锁是每一个多线程程序都会面临的一个问题(就像它是每一本操作系统课本的共同话题一样)。根据经验来讲,尽可能保证每一个 |
| 135 | +线程只能同时保持一个锁,这样程序就不会被死锁问题所困扰。一旦有线程同时申请多个锁,一切就不可预料了。 |
| 136 | + |
| 137 | +死锁的检测与恢复是一个几乎没有优雅的解决方案的扩展话题。一个比较常用的死锁检测与恢复的方案是引入看门狗计数器。当线程正常 |
| 138 | +运行的时候会每隔一段时间重置计数器,在没有发生死锁的情况下,一切都正常进行。一旦发生死锁,由于无法重置计数器导致定时器 |
| 139 | +超时,这时程序会通过重启自身恢复到正常状态。 |
| 140 | + |
| 141 | +避免死锁是另外一种解决死锁问题的方式,在进程获取锁的时候会严格按照对象id升序排列获取,经过数学证明,这样保证程序不会进入 |
| 142 | +死锁状态。证明就留给读者作为练习了。避免死锁的主要思想是,单纯地按照对象id递增的顺序加锁不会产生循环依赖,而循环依赖是 |
| 143 | +死锁的一个必要条件,从而避免程序进入死锁状态。 |
| 144 | + |
| 145 | +下面以一个关于线程死锁的经典问题:“哲学家就餐问题”,作为本节最后一个例子。题目是这样的:五位哲学家围坐在一张桌子前,每个人 |
| 146 | +面前有一碗饭和一只筷子。在这里每个哲学家可以看做是一个独立的线程,而每只筷子可以看做是一个锁。每个哲学家可以处在静坐、 |
| 147 | +思考、吃饭三种状态中的一个。需要注意的是,每个哲学家吃饭是需要两只筷子的,这样问题就来了:如果每个哲学家都拿起自己左边的筷子, |
| 148 | +那么他们五个都只能拿着一只筷子坐在那儿,直到饿死。此时他们就进入了死锁状态。 |
| 149 | +下面是一个简单的使用死锁避免机制解决“哲学家就餐问题”的实现: |
| 150 | + |
| 151 | +.. code-block:: python |
| 152 | + import threading |
| 153 | +
|
| 154 | + # The philosopher thread |
| 155 | + def philosopher(left, right): |
| 156 | + while True: |
| 157 | + with acquire(left,right): |
| 158 | + print(threading.currentThread(), 'eating') |
| 159 | +
|
| 160 | + # The chopsticks (represented by locks) |
| 161 | + NSTICKS = 5 |
| 162 | + chopsticks = [threading.Lock() for n in range(NSTICKS)] |
| 163 | +
|
| 164 | + # Create all of the philosophers |
| 165 | + for n in range(NSTICKS): |
| 166 | + t = threading.Thread(target=philosopher, |
| 167 | + args=(chopsticks[n],chopsticks[(n+1) % NSTICKS])) |
| 168 | + t.start() |
| 169 | +
|
| 170 | +最后,要特别注意到,为了避免死锁,所有的加锁操作必须使用 ``acquire()`` 函数。如果代码中的某部分绕过acquire |
| 171 | +函数直接申请锁,那么整个死锁避免机制就不起作用了。 |
0 commit comments