-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Use a real Ruby mutex in rb_io_close_wait_list #7884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a real Ruby mutex in rb_io_close_wait_list #7884
Conversation
Because a thread calling IO#close now blocks in a native condvar wait, it's possible for there to be _no_ threads left to actually handle incoming signals/ubf calls/etc. This manifested as failing tests on Solaris 10 (SPARC), because: * One thread called IO#close, which sent a SIGVTALRM to the other thread to interrupt it, and then waited on the condvar to be notified that the reading thread was done. * One thread was calling IO#read, but it hadn't yet reached the actual call to select(2) when the SIGVTALRM arrived, so it never unblocked itself. This results in a deadlock. The fix is to use a real Ruby mutex for the close lock; that way, the closing thread goes into sigwait-sleep and can keep trying to interrupt the select(2) thread. See the discussion in: ruby#7865
rb_native_mutex_initialize(&busy->mu); | ||
rb_native_cond_initialize(&busy->cv); | ||
wakeup_mutex = rb_mutex_new(); | ||
RBASIC_CLEAR_CLASS(wakeup_mutex); /* hide from ObjectSpace */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to use a Ruby mutex like this inside thread.c
/io.c
? It works, and does what I want, but I don't know if this kind of thing is considered wrong for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay, and I'd even say you don't need to worry about exposing it to object space.
Thanks for your work on this difficult issue. |
It doesn't work on Ractors so I'll change them later. |
Is the issue that two Ractors could be doing IO to the same fd (e.g. by both creating an IO with |
Thanks for this change. We were seeing deadlocks when running cc698c6 but this change fixed it. |
Because a thread calling IO#close now blocks in a native condvar wait, it's possible for there to be no threads left to actually handle incoming signals/ubf calls/etc.
This manifested as failing tests on Solaris 10 (SPARC), because:
This results in a deadlock.
The fix is to use a real Ruby mutex for the close lock; that way, the closing thread goes into sigwait-sleep and can keep trying to interrupt the select(2) thread.
See the discussion in: #7865
In that linked PR above, I also said "nogvl_wait_for should wait not only on the actual FD that was requested, but also on the sigwait FD (if it's available).". I think I want to do that as well. But this patch seems sufficient to solve the deadlock I introduced, and the tests pass on the Solaris VM i'm using.
CC @mame and @ioquatix