-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Integrate Ractor#join and Ractor#value with the fiber scheduler #13517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Integrate Ractor#join and Ractor#value with the fiber scheduler #13517
Conversation
544cad7
to
c7c78b5
Compare
This looks good to me.
|
Allow ractors to be used within a fiber scheduler context. Currently, only Ractor.new { ... }.value or Ractor.new { ... }.join are supported and tested. When calling Ractor#join or Ractor#value within a non-blocking fiber scheduler context, do not block the calling thread. Instead, call the fiber scheduler's `block` method, which will transfer to another fiber if coded correctly. When the ractor has terminated and is ready to give its value to the calling thread, it will send an interrupt to that thread telling it to switch back to its original fiber. If the thread is blocked on IO (because it's in the fiber scheduler close hook), it will unblock it by calling its unblock function. When the thread receives the interrupt, it will call the fiber scheduler's `unblock` method, which will either transfer to the original fiber or put it on a ready list, if coded correctly. Ex: ```ruby scheduler = Scheduler.new # your fiber scheduler Fiber.set_scheduler scheduler class << scheduler attr_reader :test_blockers def block(blocker, timeout=nil) (@test_blockers ||= []) << [blocker, timeout] super end end ordering = [] blocked_thread = nil Fiber.schedule do # in f2 r = Ractor.new do # in f3 sleep 0.5 end ordering << "f2 before join" # Calling `r.join` should schedule us away from f2 back to f1. In f1, we end the script # and then Scheduler#close is called, which can block on IO.select or similar. When the ractor # is finished, it resumes fiber f2. blocked_thread = Thread.current r.join ordering << "f2 after join" end ordering << "f1 thread finish" expected_ordering = ["f2 before join", "f1 thread finish", "f2 after join"] assert_equal expected_ordering, ordering assert_equal 1, scheduler.test_blockers.size assert scheduler.test_blockers.first[0].is_a?(Thread) # the blocked thread that called join assert_equal blocked_thread, scheduler.test_blockers.first[0] ```
c7c78b5
to
5fb6b50
Compare
❌ Tests Failed✖️no tests failed ✔️61914 tests passed(3 flakes) |
The PR is rebased and ready for review by @ko1. Thanks! |
rb_fiber_t *fiber = th->ec->fiber_ptr; | ||
if (scheduler != Qnil && fiber && !rb_fiberptr_blocking(fiber)) { | ||
waiter.fiber = fiber; | ||
RACTOR_UNLOCK(cr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when another Ractor wakes up this Ractor here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. By looking at the code, I would say it might try to "unblock" the fiber before the fiber gets "blocked" by the ruby interrupt. I'll try to reproduce this behavior, and come back with a solution for this racy case. Thank you 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I checked out the situation by adding a sleep
call after the RACTOR_UNLOCK
, and for the use case of this PR (Only Ractor#join
and Ractor#take
can be called in fiber scheduler context), it is not racy. This is because the only possibility for wakeup would be the end of the ractor during ractor_notify_exit
, and that uses an interrupt to signal this thread. Because interrupts are not checked in between the unlock and the time rb_fiber_scheduler_block
is called, unblock
cannot happen before block
and the normal order of operations happens.
Edit: I also added a new commit that should fix some edge-case behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ractor::Port#send
also wakes up the target ractor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the scope of this PR is just to allow Ractor#join
and Ractor#value
inside the scheduler. This is why I only added tests for this behavior. Technically, you can call Port#send
and it will also hook into the scheduler because I changed the generic wait
and wakeup
functions, so you are right. To address your concerns, we could do 1 of 3 things I can think of:
-
Only change the code path of
wakeup
when it's happening fromractor_notify_exit
. This would add additional arguments to some functions. Also don't hook intowait
when callingRactor.receive
orRactor.select
for these same reasons. -
Get all ractor actions working properly with the fiber scheduler. This would be considerably more work and our team at Shopify don't want to do this right now, however we do want to do it in the future.
-
Leave this PR as is and commit to supporting the rest of the ractor actions with the fiber scheduler in the future.
For now I will add a commit that addresses point 1.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about to use Thread with Fiber scheduler app?
class Ractor
def join
Thread.new{ self.orig_join }.join
end
end
In this patch, rb_threadptr_interrupt_exec
isn't used with thread creation, but interrupting at any timing is dangerous, so I think a similar technique is still necessary. After that, the cost of Ractor#join
becomes equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea, and I did think about this. Ideally, we wanted to try to hook into the scheduler natively without thread creation. However, if you think it is too racy to use interrupts, I have no more ideas of how to accomplish this right now. In the worse case of this PR being rejected, we can use threads.
@@ -14113,6 +14114,26 @@ ractor.$(OBJEXT): $(top_srcdir)/internal/thread.h | |||
ractor.$(OBJEXT): $(top_srcdir)/internal/variable.h | |||
ractor.$(OBJEXT): $(top_srcdir)/internal/vm.h | |||
ractor.$(OBJEXT): $(top_srcdir)/internal/warnings.h | |||
ractor.$(OBJEXT): $(top_srcdir)/prism/defines.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why prism code are added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I could look into it. I first opened the PR without these lines but the build failed saying it was missing these header dependencies, and tool/update-deps says that it's needed.
Now that |
Understood, but in this case, but given that the functionality is only present when the fiber scheduler is enabled, and it's a requirement not to block the scheduler, are you okay if we continue moving forward with this? |
We need to always clear the `waiter` off the current ractor's waiters list even if we get an uncaught error during the execution of the transferred fiber. Therefore, we need to be able to delete the `struct ractor_waiter` off the list multiple times if necessary: once when woken up (if it was woken up) and also before raising in the waiting thread. We use `ccan_list_del_init` to clear the node's `next` and `prev` pointers for safe deletion multiple times. Also, we now RACTOR_LOCK around this list deletion in the case of an error.
59e4b5d
to
459c9ff
Compare
Allow ractors to be used within a fiber scheduler context. Currently,
only Ractor.new { ... }.value or Ractor.new { ... }.join are supported
and tested.
When calling Ractor#join or Ractor#value within a non-blocking fiber
scheduler context, do not block the calling thread. Instead, call the
fiber scheduler's
block
method, which will transfer to another fiberif coded correctly. When the ractor has terminated and is ready to give
its value to the calling thread, it will send an interrupt to that
thread telling it to switch back to its original fiber. If the thread is
blocked on IO (because it's in the fiber scheduler close hook), it will unblock
it by calling its unblock function. When the thread receives the interrupt,
it will call the fiber scheduler's
unblock
method, which will eithertransfer to the original fiber or put it on a ready list, if coded correctly.
Ex: