Use a real Ruby mutex in rb_io_close_wait_list #7884

KJTsanaktsidis · 2023-06-01T07:47:28Z

Because a thread calling IO#close now blocks in a native condvar wait, it's possible for there to be no threads left to actually handle incoming signals/ubf calls/etc.

This manifested as failing tests on Solaris 10 (SPARC), because:

* One thread called IO#close, which sent a SIGVTALRM to the other thread to interrupt it, and then waited on the condvar to be notified that the reading thread was done.
* One thread was calling IO#read, but it hadn't yet reached the actual call to select(2) when the SIGVTALRM arrived, so it never unblocked itself.

This results in a deadlock.

The fix is to use a real Ruby mutex for the close lock; that way, the closing thread goes into sigwait-sleep and can keep trying to interrupt the select(2) thread.
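
For readers who want to picture the scenario at the Ruby level, here is a minimal sketch of one thread blocked in IO#read while another calls IO#close on the same IO. It is an illustration only, not the failing test from the Solaris run; the pipe, the variable names, and the timings are assumptions made for the example.

```ruby
# Sketch only: names and timings are illustrative, not taken from the patch.
r, w = IO.pipe

reader = Thread.new do
  begin
    r.read        # blocks: nothing is ever written to the pipe
    :eof
  rescue IOError => e
    e.class       # the read is interrupted because the IO was closed
  end
end

sleep 0.1         # give the reader a chance to block (not a guarantee)
r.close           # the closing thread waits here until the reader lets go

# Bound the wait for the reader so a stuck reader shows up as :timeout.
result = reader.join(5) ? reader.value : :timeout
w.close
puts result.inspect
```

On a build without the deadlock this should report IOError for the reader. The `r.close` call is the step that, per the description above, could wait forever if the reader missed its interrupt.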

See the discussion in: #7865

In the PR linked above, I also said "nogvl_wait_for should wait not only on the actual FD that was requested, but also on the sigwait FD (if it's available)". I think I want to do that as well, but this patch seems sufficient to solve the deadlock I introduced, and the tests pass on the Solaris VM I'm using.
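
As a rough Ruby-level analogy for that idea (not the C implementation of nogvl_wait_for), a waiter can watch a second "wakeup" descriptor alongside the one it actually cares about, so that it can be unblocked even when the requested FD never becomes ready. In this sketch the second pipe is a hypothetical stand-in for the sigwait FD, and all names are assumptions made for illustration.

```ruby
# Sketch only: wake_r/wake_w are a hypothetical stand-in for the sigwait FD.
data_r, data_w = IO.pipe
wake_r, wake_w = IO.pipe

waiter = Thread.new do
  # Wait on both descriptors, with a timeout so the example cannot hang.
  ready, = IO.select([data_r, wake_r], nil, nil, 5)
  if ready.nil?
    :timeout
  elsif ready.include?(wake_r)
    :woken        # unblocked via the wakeup descriptor
  else
    :data         # the requested descriptor became readable
  end
end

wake_w.write("!") # simulate a wakeup; data_r never becomes readable
outcome = waiter.value
puts outcome.inspect

[data_r, data_w, wake_r, wake_w].each(&:close)
```

The point of the design is that the waiter does not depend on a single signal arriving at exactly the right moment: anything written to the wakeup descriptor is still there when the wait begins.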

CC @mame and @ioquatix

KJTsanaktsidis · 2023-06-01T07:49:06Z

thread.c

-        rb_native_mutex_initialize(&busy->mu);
-        rb_native_cond_initialize(&busy->cv);
+        wakeup_mutex = rb_mutex_new();
+        RBASIC_CLEAR_CLASS(wakeup_mutex); /* hide from ObjectSpace */


Is it OK to use a Ruby mutex like this inside thread.c/io.c? It works, and does what I want, but I don't know if this kind of thing is considered wrong for some reason?

ioquatix · 2023-06-01

I think it's okay, and I'd even say you don't need to worry about exposing it to object space.

ioquatix · 2023-06-01T08:37:28Z

Thanks for your work on this difficult issue.

ko1 · 2023-06-01T08:55:46Z

It doesn't work on Ractors so I'll change them later.

KJTsanaktsidis · 2023-06-01T09:04:57Z

Is the issue that two Ractors could be doing IO to the same fd (e.g. by both creating an IO with IO.for_fd), and the mutex would wind up being shared between Ractors unsafely? Would the fix for that be to just protect the rb_io_close_wait_list with the RB_VM_LOCK_ENTER() lock?

luanzeba · 2023-06-01T18:51:39Z

Thanks for this change. We were seeing deadlocks when running cc698c6 but this change fixed it.

ioquatix merged commit edee9b6 into ruby:master Jun 1, 2023

mame mentioned this pull request Jun 2, 2023

Fix busy-loop when waiting for file descriptors to close #7865

Merged

ioquatix mentioned this pull request May 2, 2025

Make waiting_fd behaviour per-IO. #13127

Open
