this is the current threading patchset, which has accumulated over the
past two weeks. The biggest part of it is a set of changes from Roland
to make threaded signals work. There were still tons of testcases and
boundary conditions (mostly in the signal/exit/ptrace area) that we did
not handle correctly.
Roland's thread-signal semantics/behavior/ptrace fixes:
- fix signal delivery race with do_exit() => signals are re-queued to the
'process' if do_exit() finds pending unhandled ones. This prevents
signals from getting lost when a thread calls sys_exit().
- a non-main thread dies on one processor and goes to TASK_ZOMBIE, but
before it gets to release_task(), a sys_wait4() on the other processor
reaps it. This only gets through eligible_child() because the thread is
ptraced. Somewhere in there the main thread is also dying, so it
reparents the child thread, which is how it hits this case. This means
there is a race where P, the task being reaped, might be totally invalid.
- forget_original_parent() does not do the right thing when the group
leader dies: it reparents threads to init even though there is a zombie
group leader. Perhaps it doesn't matter for any practical purpose
without ptrace, though it makes ppid=1 show up for each thread in core
dumps, which looks funny. Incidentally, SIGCHLD here really should be
p->exit_signal.
- one of the gdb tests makes a questionable assumption about what kill
will do when the target process has some threads stopped by ptrace and
others running.
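The first fix above (re-queueing a dying thread's pending signals to the process) can be pictured with a toy model. This is a sketch under stated assumptions, not the kernel code: pending signals are modelled as a plain bitmask, and `toy_process`, `toy_thread`, `toy_sigbit()` and `toy_requeue_on_exit()` are hypothetical names invented for illustration. The point is only the ordering: whatever is still pending on the exiting thread is moved to the process-wide set instead of being discarded with the thread.

```c
#include <stdint.h>

/* Hypothetical toy model, not the kernel's sigpending/sigqueue:
 * bit (sig - 1) is set while signal number sig is pending. */
typedef uint64_t toy_sigset;

struct toy_process {
    toy_sigset shared_pending;   /* signals any thread may dequeue */
};

struct toy_thread {
    struct toy_process *proc;
    toy_sigset pending;          /* signals aimed at this thread only */
};

static toy_sigset toy_sigbit(int sig)
{
    return (toy_sigset)1 << (sig - 1);
}

/* Exit path: anything still pending (i.e. not yet dequeued and handled)
 * on the dying thread is moved to the process-wide set, so a surviving
 * thread can still take it instead of it being lost with the thread. */
static void toy_requeue_on_exit(struct toy_thread *t)
{
    t->proc->shared_pending |= t->pending;
    t->pending = 0;
}
```

Signals already pending on the process are kept; the exiting thread's set is simply OR-ed in and then cleared.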
exit races:
1. Processor A is in sys_wait4 case TASK_STOPPED considering task P.
Processor B is about to resume P and then switch to it.
While A is inside that case block, B starts running P and it clears
P->exit_code, or takes a pending fatal signal and sets it to a new
value. Depending on the interleaving, the possible failure modes are:
a. A gets to its put_user after B has cleared P->exit_code
=> returns with WIFSTOPPED, WSTOPSIG==0
b. A gets to its put_user after B has set P->exit_code anew
=> returns with e.g. WIFSTOPPED, WSTOPSIG==SIGKILL
A can spend an arbitrarily long time in that case block, because
getrusage and put_user can take page faults, and write-locking the
tasklist_lock can block. But even if it's short, the race is there in
principle.
2. This is new with NPTL, i.e. CLONE_THREAD.
Two processors A and B are both in sys_wait4 case TASK_STOPPED
considering task P.
Both get through their tests and fetches of P->exit_code before either
gets to P->exit_code = 0. => two threads return the same pid from
waitpid.
In other interleavings where one processor gets to its put_user after
the other has cleared P->exit_code, it's like case 1(a).
3. SMP races with stop/cont signals
First, take:
kill(pid, SIGSTOP);
kill(pid, SIGCONT);
or:
kill(pid, SIGSTOP);
kill(pid, SIGKILL);
It's possible for this to leave the process stopped with a pending
SIGCONT/SIGKILL. That's a state that should never be possible.
Moreover, a single kill(pid, SIGKILL), without any repetition, should
always be enough to kill a process. (Likewise, SIGCONT, when you know
it's sequenced after the last stop signal, must be sufficient to resume
a process.)
4. take:
kill(pid, SIGKILL); // or any fatal signal
kill(pid, SIGCONT); // or SIGKILL
It's possible for this to cause pid to be reaped with status 0
instead of its true termination status. The equivalent scenario
happens when the process being killed is already in an _exit call, or
is taking a trap-induced fatal signal, before the kills.
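Some of the races above can be observed from user space. The sketch below is an illustration under assumptions, not part of the patchset: `stopped_status_from()` and `send_pair_and_reap()` are hypothetical helpers. The first assumes that the stopped status a waiter receives is built from the task's exit_code as `(exit_code << 8) | 0x7f` (the traditional Linux layout), which shows why race 1 yields "stopped by signal 0" or "stopped by SIGKILL". The second forks an idle child, sends two signals back to back with no wait in between (the patterns from races 3 and 4), and reaps it; on a correct kernel the child should always be reported as killed by SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Race 1 (assumed encoding): the stopped report a waiter builds from
 * exit_code. A cleared exit_code gives WSTOPSIG == 0; an overwritten
 * one gives e.g. WSTOPSIG == SIGKILL. */
static int stopped_status_from(int exit_code)
{
    return (exit_code << 8) | 0x7f;
}

/* Races 3 and 4: fork an idle child, send `first` then `second` with no
 * wait in between, then reap the child. Stores the status in *status;
 * returns 0 on success, -1 if fork() or waitpid() failed. */
static int send_pair_and_reap(int first, int second, int *status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: idle until a signal ends it */
    }

    kill(pid, first);
    kill(pid, second);

    /* No WUNTRACED, so only termination is reported. If the child were
     * left stopped with SIGKILL pending (race 3), this would block. */
    if (waitpid(pid, status, 0) != pid)
        return -1;
    return 0;
}
```

For example, `send_pair_and_reap(SIGKILL, SIGCONT, &st)` should leave `WIFSIGNALED(st)` true with `WTERMSIG(st) == SIGKILL`; race 4 would instead show up as a status of 0.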
plus i've done stability fixes for bugs that popped up during
beta-testing, and minor tidying of Roland's changes:
- a rare tasklist corruption during exec, causing some very spurious and
colorful crashes.
- a copy_process()-related dereference of an already freed thread
structure when hit with a SIGKILL at the wrong moment.
- SMP spinlock deadlocks in the signal code
this patchset has been tested quite well in the 2.4 backport of the
threading changes, and i've done some stress-testing on 2.5.59 SMP as
well, plus an x86 UP test-compile and test-boot.