Integrate Ractor#join and Ractor#value with the fiber scheduler #13517

Open
wants to merge 2 commits into
base: master

luke-gruber
Contributor

Allow ractors to be used within a fiber scheduler context. Currently,
only Ractor.new { ... }.value and Ractor.new { ... }.join are supported
and tested.

When Ractor#join or Ractor#value is called within a non-blocking fiber
scheduler context, the calling thread is not blocked. Instead, the fiber
scheduler's `block` method is called, which transfers to another fiber
if the scheduler is implemented correctly. When the ractor has terminated
and is ready to give its value to the calling thread, the ractor sends an
interrupt to that thread telling it to switch back to its original fiber.
If the thread is blocked on IO (because it is in the fiber scheduler's close
hook), the ractor unblocks the thread by calling the thread's unblock function.
When the thread receives the interrupt, it calls the fiber scheduler's
`unblock` method, which either transfers to the original fiber or puts it on
a ready list, again assuming a correctly implemented scheduler.

Example:

scheduler = Scheduler.new # your fiber scheduler
Fiber.set_scheduler scheduler
class << scheduler
  attr_reader :test_blockers
  def block(blocker, timeout = nil)
    (@test_blockers ||= []) << [blocker, timeout]
    super
  end
end
ordering = []
blocked_thread = nil
# in f1
Fiber.schedule do
  # in f2
  r = Ractor.new do
    # in f3
    sleep 0.5
  end
  ordering << "f2 before join"
  # Calling `r.join` should schedule us away from f2 back to f1. In f1, we end the script
  # and then Scheduler#close is called, which can block on IO.select or similar. When the ractor
  # is finished, it resumes fiber f2.
  blocked_thread = Thread.current
  r.join
  ordering << "f2 after join"
end
ordering << "f1 thread finish"
expected_ordering = ["f2 before join", "f1 thread finish", "f2 after join"]

assert_equal expected_ordering, ordering
assert_equal 1, scheduler.test_blockers.size
assert scheduler.test_blockers.first[0].is_a?(Thread) # the blocked thread that called join
assert_equal blocked_thread, scheduler.test_blockers.first[0]
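
To make the "if coded correctly" condition concrete, here is a minimal sketch of a scheduler whose `block` parks the current fiber and whose `unblock` puts it on a ready list. `TinyScheduler` and `run_join_demo` are hypothetical names, not part of this PR or of Ruby. Because the Ractor integration described above is what this PR proposes, the demo uses `Thread#join` as a stand-in blocker; it already goes through the same `block`/`unblock` hooks in released Ruby (3.0+). The sketch ignores timeouts and IO, and uses a `Thread::Queue` as the ready list instead of an IO-based wakeup.

```ruby
# Minimal fiber scheduler sketch: only `block` and `unblock` do real work.
class TinyScheduler
  def initialize
    @ready = Thread::Queue.new # unblocked fibers; Queue is safe across threads
    @blocked = 0               # number of fibers currently parked in `block`
  end

  # Hook behind Fiber.schedule: start a non-blocking fiber right away.
  def fiber(&work)
    f = Fiber.new(blocking: false, &work)
    f.resume
    f
  end

  # Called on the scheduler's thread when a non-blocking fiber must wait.
  # The timeout is ignored in this sketch.
  def block(_blocker, _timeout = nil)
    @blocked += 1
    Fiber.yield # hand control back to whoever resumed this fiber
  ensure
    @blocked -= 1
  end

  # May be called from another thread, so it only enqueues the fiber.
  def unblock(_blocker, fiber)
    @ready << fiber
  end

  # Called when the scheduler is replaced: resume parked fibers as they
  # become ready, until none are left.
  def close
    @ready.pop.resume while @blocked > 0
  end

  # Hooks that Fiber.set_scheduler may require but this sketch never uses.
  def kernel_sleep(_duration = nil)
    raise NotImplementedError
  end

  def io_wait(_io, _events, _timeout)
    raise NotImplementedError
  end

  def fiber_interrupt(_fiber, _exception)
    raise NotImplementedError
  end
end

# Runs the scenario in its own thread so the caller's thread is untouched.
def run_join_demo
  ordering = []
  Thread.new do
    Fiber.set_scheduler(TinyScheduler.new)
    worker = Thread.new { sleep 0.1 } # stand-in for the ractor
    Fiber.schedule do
      ordering << "f2 before join"
      worker.join # parks f2 via TinyScheduler#block
      ordering << "f2 after join"
    end
    ordering << "f1 thread finish"
    Fiber.set_scheduler(nil) # runs TinyScheduler#close, which resumes f2
  end.join
  ordering
end

p run_join_demo
```

The design point is that `unblock` never resumes the fiber itself: it can run on a different thread, so it only records the fiber, and the scheduler's own thread does the resuming inside `close`.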

@luke-gruber luke-gruber force-pushed the ractor_ports_with_fiber_scheduler branch from 544cad7 to c7c78b5 Compare June 4, 2025 18:36
@samuel-williams-shopify
Contributor

This looks good to me.

  • Let's split out the cosmetic changes so we can merge them separately.
  • Then let's rebase this PR so we can focus on the review cycle with @ko1.

@luke-gruber luke-gruber force-pushed the ractor_ports_with_fiber_scheduler branch from c7c78b5 to 5fb6b50 Compare June 4, 2025 21:03

launchable-app bot commented Jun 4, 2025

Tests Failed

✖️no tests failed ✔️61914 tests passed(3 flakes)

@luke-gruber
Contributor Author

The PR is rebased and ready for review by @ko1. Thanks!

rb_fiber_t *fiber = th->ec->fiber_ptr;
if (scheduler != Qnil && fiber && !rb_fiberptr_blocking(fiber)) {
    waiter.fiber = fiber;
    RACTOR_UNLOCK(cr);
Contributor

What happens when another Ractor wakes up this Ractor here?

Contributor Author

Good point. By looking at the code, I would say it might try to "unblock" the fiber before the fiber gets "blocked" by the ruby interrupt. I'll try to reproduce this behavior, and come back with a solution for this racy case. Thank you 😄

Contributor Author

@luke-gruber luke-gruber Jun 10, 2025

I checked this by adding a sleep call after the RACTOR_UNLOCK. For the use case of this PR (only Ractor#join and Ractor#value can be called in a fiber scheduler context), it is not racy. The only possible wakeup is the end of the ractor during ractor_notify_exit, which uses an interrupt to signal this thread. Because interrupts are not checked between the unlock and the call to rb_fiber_scheduler_block, unblock cannot happen before block, and the normal order of operations is preserved.
Edit: I also added a new commit that should fix some edge-case behavior.

Contributor

Ractor::Port#send also wakes up the target ractor.

@@ -14113,6 +14114,26 @@ ractor.$(OBJEXT): $(top_srcdir)/internal/thread.h
ractor.$(OBJEXT): $(top_srcdir)/internal/variable.h
ractor.$(OBJEXT): $(top_srcdir)/internal/vm.h
ractor.$(OBJEXT): $(top_srcdir)/internal/warnings.h
ractor.$(OBJEXT): $(top_srcdir)/prism/defines.h
Contributor

Why are the prism headers added here?

Contributor Author

I don't know, but I can look into it. I first opened the PR without these lines, but the build failed because these header dependencies were missing, and tool/update-deps says they are needed.

@ko1
Contributor

ko1 commented Jun 9, 2025

Now that Ractor::Port was recently merged and is not yet mature, I don't want to introduce more complexity that would make it harder to debug.
However, Ractor::Port is dramatically simpler than the previous take/yield pair, so the situation has improved.

@samuel-williams-shopify
Contributor

> I don't want to introduce more complexity that would make it harder to debug.

Understood, but given that this functionality is only active when the fiber scheduler is enabled, and not blocking the scheduler is a requirement, are you okay if we continue moving forward with this?

We always need to remove the `waiter` from the current ractor's waiters
list, even if an uncaught error occurs while the transferred fiber is
executing. Therefore, we need to be able to delete the `struct ractor_waiter`
from the list more than once if necessary: once when it is woken up (if it was
woken up) and again before raising in the waiting thread. We use
`ccan_list_del_init`, which resets the node's `next` and `prev` pointers after
unlinking it, so that deleting the same node multiple times is safe.

Also, we now take RACTOR_LOCK around this list deletion in the error case.
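
The repeated-deletion argument can be illustrated with a small Ruby analogue of an intrusive, circular doubly linked list. This is only a sketch of the idea behind `ccan_list_del_init`, not the C code from this PR; `WaiterNode`, `add_tail`, `del_init`, and `names` are hypothetical names. After the first deletion the node links only to itself, so a second deletion rewrites its own pointers and leaves the rest of the list untouched.

```ruby
# Hypothetical Ruby analogue of an intrusive, circular doubly linked list node.
class WaiterNode
  attr_accessor :prev, :next
  attr_reader :name

  def initialize(name)
    @name = name
    @prev = @next = self # an unlinked node points at itself
  end

  # Link `node` just before self. When self is the list head, this appends.
  def add_tail(node)
    node.prev = @prev
    node.next = self
    @prev.next = node
    @prev = node
  end

  # Unlink self, then reset it to the self-linked state.
  def del_init
    @prev.next = @next
    @next.prev = @prev
    @prev = @next = self
  end

  # Names of all nodes after the head, in list order.
  def names
    out = []
    node = @next
    until node.equal?(self)
      out << node.name
      node = node.next
    end
    out
  end
end

head = WaiterNode.new(:head)
a = WaiterNode.new(:a)
b = WaiterNode.new(:b)
head.add_tail(a)
head.add_tail(b)

a.del_init # first deletion, e.g. on normal wakeup
a.del_init # second deletion, e.g. on the error path; a no-op on the list
p head.names
```

Without the reset step, the second deletion would follow stale `prev` and `next` pointers and could relink nodes that are still on the list.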
@luke-gruber luke-gruber force-pushed the ractor_ports_with_fiber_scheduler branch from 59e4b5d to 459c9ff Compare June 10, 2025 18:54