-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Integrate Ractor#join and Ractor#value with the fiber scheduler #13517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Integrate Ractor#join and Ractor#value with the fiber scheduler #13517
Conversation
544cad7
to
c7c78b5
Compare
This looks good to me.
|
Allow ractors to be used within a fiber scheduler context. Currently, only Ractor.new { ... }.value or Ractor.new { ... }.join are supported and tested. When calling Ractor#join or Ractor#value within a non-blocking fiber scheduler context, do not block the calling thread. Instead, call the fiber scheduler's `block` method, which will transfer to another fiber if coded correctly. When the ractor has terminated and is ready to give its value to the calling thread, it will send an interrupt to that thread telling it to switch back to its original fiber. If the thread is blocked on IO (because it's in the fiber scheduler close hook), it will unblock it by calling its unblock function. When the thread receives the interrupt, it will call the fiber scheduler's `unblock` method, which will either transfer to the original fiber or put it on a ready list, if coded correctly. Ex: ```ruby scheduler = Scheduler.new # your fiber scheduler Fiber.set_scheduler scheduler class << scheduler attr_reader :test_blockers def block(blocker, timeout=nil) (@test_blockers ||= []) << [blocker, timeout] super end end ordering = [] blocked_thread = nil Fiber.schedule do # in f2 r = Ractor.new do # in f3 sleep 0.5 end ordering << "f2 before join" # Calling `r.join` should schedule us away from f2 back to f1. In f1, we end the script # and then Scheduler#close is called, which can block on IO.select or similar. When the ractor # is finished, it resumes fiber f2. blocked_thread = Thread.current r.join ordering << "f2 after join" end ordering << "f1 thread finish" expected_ordering = ["f2 before join", "f1 thread finish", "f2 after join"] assert_equal expected_ordering, ordering assert_equal 1, scheduler.test_blockers.size assert scheduler.test_blockers.first[0].is_a?(Thread) # the blocked thread that called join assert_equal blocked_thread, scheduler.test_blockers.first[0] ```
c7c78b5
to
5fb6b50
Compare
❌ Tests Failed✖️no tests failed ✔️61914 tests passed(3 flakes) |
The PR is rebased and ready for review by @ko1. Thanks! |
rb_fiber_t *fiber = th->ec->fiber_ptr; | ||
if (scheduler != Qnil && fiber && !rb_fiberptr_blocking(fiber)) { | ||
waiter.fiber = fiber; | ||
RACTOR_UNLOCK(cr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when another Ractor wakes up this Ractor here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. By looking at the code, I would say it might try to "unblock" the fiber before the fiber gets "blocked" by the ruby interrupt. I'll try to reproduce this behavior, and come back with a solution for this racy case. Thank you 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I checked out the situation by adding a sleep
call after the RACTOR_UNLOCK
, and for the use case of this PR (Only Ractor#join
and Ractor#take
can be called in fiber scheduler context), it is not racy. This is because the only possibility for wakeup would be the end of the ractor during ractor_notify_exit
, and that uses an interrupt to signal this thread. Because interrupts are not checked in between the unlock and the time rb_fiber_scheduler_block
is called, unblock
cannot happen before block
and the normal order of operations happens.
Edit: I also added a new commit that should fix some edge-case behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ractor::Port#send
also wakes up the target ractor.
@@ -14113,6 +14114,26 @@ ractor.$(OBJEXT): $(top_srcdir)/internal/thread.h | |||
ractor.$(OBJEXT): $(top_srcdir)/internal/variable.h | |||
ractor.$(OBJEXT): $(top_srcdir)/internal/vm.h | |||
ractor.$(OBJEXT): $(top_srcdir)/internal/warnings.h | |||
ractor.$(OBJEXT): $(top_srcdir)/prism/defines.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why prism code are added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I could look into it. I first opened the PR without these lines but the build failed saying it was missing these header dependencies, and tool/update-deps says that it's needed.
Now that |
Understood, but in this case, but given that the functionality is only present when the fiber scheduler is enabled, and it's a requirement not to block the scheduler, are you okay if we continue moving forward with this? |
We need to always clear the `waiter` off the current ractor's waiters list even if we get an uncaught error during the execution of the transferred fiber. Therefore, we need to be able to delete the `struct ractor_waiter` off the list multiple times if necessary: once when woken up (if it was woken up) and also before raising in the waiting thread. We use `ccan_list_del_init` to clear the node's `next` and `prev` pointers for safe deletion multiple times. Also, we now RACTOR_LOCK around this list deletion in the case of an error.
59e4b5d
to
459c9ff
Compare
Allow ractors to be used within a fiber scheduler context. Currently,
only Ractor.new { ... }.value or Ractor.new { ... }.join are supported
and tested.
When calling Ractor#join or Ractor#value within a non-blocking fiber
scheduler context, do not block the calling thread. Instead, call the
fiber scheduler's
block
method, which will transfer to another fiberif coded correctly. When the ractor has terminated and is ready to give
its value to the calling thread, it will send an interrupt to that
thread telling it to switch back to its original fiber. If the thread is
blocked on IO (because it's in the fiber scheduler close hook), it will unblock
it by calling its unblock function. When the thread receives the interrupt,
it will call the fiber scheduler's
unblock
method, which will eithertransfer to the original fiber or put it on a ready list, if coded correctly.
Ex: