Skip to content

Commit d49fbf7

Browse files
Michal Hockotorvalds
authored andcommitted
proc, oom: drop bogus task_lock and mm check
Series "Handle oom bypass more gracefully", V5 The following 10 patches should put some order to very rare cases of mm shared between processes and make the paths which bypass the oom killer oom reapable and therefore much more reliable finally. Even though mm shared outside of thread group is rare (either vforked tasks for a short period, use_mm by kernel threads or exotic thread model of clone(CLONE_VM) without CLONE_SIGHAND) it is better to cover them. Not only it makes the current oom killer logic quite hard to follow and reason about it can lead to weird corner cases. E.g. it is possible to select an oom victim which shares the mm with unkillable process or bypass the oom killer even when other processes sharing the mm are still alive and other weird cases. Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write. This can be considered a bug fix with a low impact as nobody has noticed for years. Patch 2 drops sighand lock because it is not needed anymore as pointed by Oleg. Patch 3 is a clean up of oom_score_adj handling and a preparatory work for later patches. Patch 4 enforces oom_adj_score to be consistent between processes sharing the mm to behave consistently with the regular thread groups. This can be considered a user visible behavior change because one thread group updating oom_score_adj will affect others which share the same mm via clone(CLONE_VM). I argue that this should be acceptable because we already have the same behavior for threads in the same thread group and sharing the mm without signal struct is just a different model of threading. This is probably the most controversial part of the series, I would like to find some consensus here. There were some suggestions to hook some counter/oom_score_adj into the mm_struct but I feel that this is not necessary right now and we can rely on proc handler + oom_kill_process to DTRT. I can be convinced otherwise but I strongly think that whatever we do the userspace has to have a way to see the current oom priority as consistently as possible. Patch 5 makes sure that no vforked task is selected if it is sharing the mm with oom unkillable task. Patch 6 ensures that all user tasks sharing the mm are killed which in turn makes sure that all oom victims are oom reapable. Patch 7 guarantees that task_will_free_mem will always imply reapable bypass of the oom killer. Patch 8 is new in this version and it addresses an issue pointed out by 0-day OOM report where an oom victim was reaped several times. Patch 9 puts an upper bound on how many times oom_reaper tries to reap a task and hides it from the oom killer to move on when no progress can be made. This will give an upper bound to how long an oom_reapable task can block the oom killer from selecting another victim if the oom_reaper is not able to reap the victim. Patch 10 tries to plug the (hopefully) last hole when we can still lock up when the oom victim is shared with oom unkillable tasks (kthreads and global init). We just try to be best effort in that case and rather fallback to kill something else than risk a lockup. This patch (of 10): Both oom_adj_write and oom_score_adj_write are using task_lock, check for task->mm and fail if it is NULL. This is not needed because the oom_score_adj is per signal struct so we do not need mm at all. The code has been introduced by 3d5992d ("oom: add per-mm oom disable count") but we do not do per-mm oom disable since c9f0124 ("oom: remove oom_disable_count"). The task->mm check is even not correct because the current thread might have exited but the thread group might be still alive - e.g. thread group leader would lead that echo $VAL > /proc/pid/oom_score_adj would always fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would succeed. This is unexpected at best. Remove the lock along with the check to fix the unexpected behavior and also because there is not real need for the lock in the first place. Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 194dc87 commit d49fbf7

File tree

1 file changed

+4
-18
lines changed

1 file changed

+4
-18
lines changed

fs/proc/base.c

Lines changed: 4 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1083,15 +1083,9 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf,
10831083
goto out;
10841084
}
10851085

1086-
task_lock(task);
1087-
if (!task->mm) {
1088-
err = -EINVAL;
1089-
goto err_task_lock;
1090-
}
1091-
10921086
if (!lock_task_sighand(task, &flags)) {
10931087
err = -ESRCH;
1094-
goto err_task_lock;
1088+
goto err_put_task;
10951089
}
10961090

10971091
/*
@@ -1121,8 +1115,7 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf,
11211115
trace_oom_score_adj_update(task);
11221116
err_sighand:
11231117
unlock_task_sighand(task, &flags);
1124-
err_task_lock:
1125-
task_unlock(task);
1118+
err_put_task:
11261119
put_task_struct(task);
11271120
out:
11281121
return err < 0 ? err : count;
@@ -1186,15 +1179,9 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
11861179
goto out;
11871180
}
11881181

1189-
task_lock(task);
1190-
if (!task->mm) {
1191-
err = -EINVAL;
1192-
goto err_task_lock;
1193-
}
1194-
11951182
if (!lock_task_sighand(task, &flags)) {
11961183
err = -ESRCH;
1197-
goto err_task_lock;
1184+
goto err_put_task;
11981185
}
11991186

12001187
if ((short)oom_score_adj < task->signal->oom_score_adj_min &&
@@ -1210,8 +1197,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
12101197

12111198
err_sighand:
12121199
unlock_task_sighand(task, &flags);
1213-
err_task_lock:
1214-
task_unlock(task);
1200+
err_put_task:
12151201
put_task_struct(task);
12161202
out:
12171203
return err < 0 ? err : count;

0 commit comments

Comments
 (0)