@@ -72,8 +72,7 @@ example, they add a NULL pointer or a boundary check, fix a race by adding
  a missing memory barrier, or add some locking around a critical section.
  Most of these changes are self contained and the function presents itself
  the same way to the rest of the system. In this case, the functions might
- be updated independently one by one. (This can be done by setting the
- 'immediate' flag in the klp_patch struct.)
+ be updated independently one by one.

  But there are more complex fixes. For example, a patch might change
  ordering of locking in multiple functions at the same time. Or a patch
@@ -125,40 +124,23 @@ safe to patch tasks:
  b) Patching CPU-bound user tasks. If the task is highly CPU-bound
  then it will get patched the next time it gets interrupted by an
  IRQ.
- c) In the future it could be useful for applying patches for
- architectures which don't yet have HAVE_RELIABLE_STACKTRACE. In
- this case you would have to signal most of the tasks on the
- system. However this isn't supported yet because there's
- currently no way to patch kthreads without
- HAVE_RELIABLE_STACKTRACE.

  3. For idle "swapper" tasks, since they don't ever exit the kernel, they
  instead have a klp_update_patch_state() call in the idle loop which
  allows them to be patched before the CPU enters the idle state.

  (Note there's not yet such an approach for kthreads.)

- All the above approaches may be skipped by setting the 'immediate' flag
- in the 'klp_patch' struct, which will disable per-task consistency and
- patch all tasks immediately. This can be useful if the patch doesn't
- change any function or data semantics. Note that, even with this flag
- set, it's possible that some tasks may still be running with an old
- version of the function, until that function returns.
+ Architectures which don't have HAVE_RELIABLE_STACKTRACE solely rely on
+ the second approach. It's highly likely that some tasks may still be
+ running with an old version of the function, until that function
+ returns. In this case you would have to signal the tasks. This
+ especially applies to kthreads. They may not be woken up and would need
+ to be forced. See below for more information.

- There's also an 'immediate' flag in the 'klp_func' struct which allows
- you to specify that certain functions in the patch can be applied
- without per-task consistency. This might be useful if you want to patch
- a common function like schedule(), and the function change doesn't need
- consistency but the rest of the patch does.
-
- For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
- must set patch->immediate which causes all tasks to be patched
- immediately. This option should be used with care, only when the patch
- doesn't change any function or data semantics.
-
- In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
- may be allowed to use per-task consistency if we can come up with
- another way to patch kthreads.
+ Unless we can come up with another way to patch kthreads, architectures
+ without HAVE_RELIABLE_STACKTRACE are not considered fully supported by
+ the kernel livepatching.

  The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
  is in transition. Only a single patch (the topmost patch on the stack)
@@ -197,6 +179,11 @@ modules is permanently disabled when the force feature is used. It cannot be
  guaranteed there is no task sleeping in such module. It implies unbounded
  reference count if a patch module is disabled and enabled in a loop.

+ Moreover, the usage of force may also affect future applications of live
+ patches and cause even more harm to the system. Administrator should first
+ consider to simply cancel a transition (see above). If force is used, reboot
+ should be planned and no more live patches applied.
+
  3.1 Adding consistency model support to new architectures
  ---------------------------------------------------------

@@ -234,13 +221,6 @@ few options:
  a good backup option for those architectures which don't have
  reliable stack traces yet.

- In the meantime, patches for such architectures can bypass the
- consistency model by setting klp_patch.immediate to true. This option
- is perfectly fine for patches which don't change the semantics of the
- patched functions. In practice, this is usable for ~90% of security
- fixes. Use of this option also means the patch can't be unloaded after
- it has been disabled.
-

  4. Livepatch module
  ===================
@@ -296,9 +276,6 @@ into three levels:
  only for a particular object ( vmlinux or a kernel module ). Note that
  kallsyms allows for searching symbols according to the object name.

- There's also an 'immediate' flag which, when set, patches the
- function immediately, bypassing the consistency model safety checks.
-
  + struct klp_object defines an array of patched functions (struct
  klp_func) in the same object. Where the object is either vmlinux
  (NULL) or a module name.
@@ -317,9 +294,6 @@ into three levels:
  symbols are found. The only exception are symbols from objects
  (kernel modules) that have not been loaded yet.

- Setting the 'immediate' flag applies the patch to all tasks
- immediately, bypassing the consistency model safety checks.
-
  For more details on how the patch is applied on a per-task basis,
  see the "Consistency model" section.

@@ -334,14 +308,12 @@ section "Livepatch life-cycle" below for more details about these
  two operations.

  Module removal is only safe when there are no users of the underlying
- functions. The immediate consistency model is not able to detect this. The
- code just redirects the functions at the very beginning and it does not
- check if the functions are in use. In other words, it knows when the
- functions get called but it does not know when the functions return.
- Therefore it cannot be decided when the livepatch module can be safely
- removed. This is solved by a hybrid consistency model. When the system is
- transitioned to a new patch state (patched/unpatched) it is guaranteed that
- no task sleeps or runs in the old code.
+ functions. This is the reason why the force feature permanently disables
+ the removal. The forced tasks entered the functions but we cannot say
+ that they returned back. Therefore it cannot be decided when the
+ livepatch module can be safely removed. When the system is successfully
+ transitioned to a new patch state (patched/unpatched) without being
+ forced it is guaranteed that no task sleeps or runs in the old code.


  5. Livepatch life-cycle
@@ -355,19 +327,12 @@ First, the patch is applied only when all patched symbols for already
  loaded objects are found. The error handling is much easier if this
  check is done before particular functions get redirected.

- Second, the immediate consistency model does not guarantee that anyone is not
- sleeping in the new code after the patch is reverted. This means that the new
- code needs to stay around "forever". If the code is there, one could apply it
- again. Therefore it makes sense to separate the operations that might be done
- once and those that need to be repeated when the patch is enabled (applied)
- again.
-
- Third, it might take some time until the entire system is migrated
- when a more complex consistency model is used. The patch revert might
- block the livepatch module removal for too long. Therefore it is useful
- to revert the patch using a separate operation that might be called
- explicitly. But it does not make sense to remove all information
- until the livepatch module is really removed.
+ Second, it might take some time until the entire system is migrated with
+ the hybrid consistency model being used. The patch revert might block
+ the livepatch module removal for too long. Therefore it is useful to
+ revert the patch using a separate operation that might be called
+ explicitly. But it does not make sense to remove all information until
+ the livepatch module is really removed.


  5.1. Registration