Skip to content

Get String#crypt working with multi-ractor in cases where !HAVE_CRYPT_R #13567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

luke-gruber
Copy link
Contributor

@luke-gruber luke-gruber commented Jun 9, 2025

In commit 12f7ba5, ractor safety was added to String#crypt, however in certain cases it can cause a deadlock. When we lock a native mutex, we cannot allocate ruby objects because they might trigger GC which starts a VM barrier. If the barrier is triggered and other native threads are waiting on this mutex, they will not be able to be woken up in order to join the barrier. To fix this, we don't allocate ruby objects when we hold the lock.

The following could reproduce the problem:

strings = []
10_000.times do |i|
  strings << "my string #{i}"
end

STRINGS = Ractor.make_shareable(strings)

rs = []
100.times do
  rs << Ractor.new do
    STRINGS.each do |s|
      s.dup.crypt(s.dup)
    end
  end
end
while rs.any?
  r, obj = Ractor.select(*rs)
  rs.delete(r)
end

I will not be adding tests because I am almost finished a PR to enable running test-all test cases inside many ractors at once, which is how I found the issue.

@luke-gruber luke-gruber force-pushed the fix_string_crypt_with_ractors branch 2 times, most recently from dc39cd1 to 8d25d6f Compare June 9, 2025 21:58
In commit 12f7ba5, ractor safety was added to String#crypt, however
in certain cases it can cause a deadlock. When we lock a native mutex,
we cannot allocate ruby objects because they might trigger GC which
starts a VM barrier. If the barrier is triggered and other native threads
are waiting on this mutex, they will not be able to be woken up in order to join
the barrier. To fix this, we don't allocate ruby objects when we hold the
lock.

The following could reproduce the problem:

```ruby
strings = []
10_000.times do |i|
  strings << "my string #{i}"
end

STRINGS = Ractor.make_shareable(strings)

rs = []
100.times do
  rs << Ractor.new do
    STRINGS.each do |s|
      s.dup.crypt(s.dup)
    end
  end
end
while rs.any?
  r, obj = Ractor.select(*rs)
  rs.delete(r)
end
```

I will not be adding tests because I am almost finished a PR to enable
running test-all test cases inside many ractors at once, which is how I
found the issue.

Co-authored-by: jhawthorn <john@hawthorn.email>
@luke-gruber luke-gruber force-pushed the fix_string_crypt_with_ractors branch from 8d25d6f to eb97984 Compare June 9, 2025 22:02
Comment on lines +11271 to +11273
size_t res_size = strlen(res)+1;
char *dup = malloc(res_size); // need to hold onto lock while duplicating a potentially static buffer
memcpy(dup, res, res_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for readers: We have to do this because of our #define strdup ruby_strdup. We want to malloc here, not xmalloc. And I don't think there's a portable way to determine the size of crypt's output buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res = (strdup)(res); can avoid the macro.

// Don't allocate a ruby object while holding this lock, we could hit a VM barrier, which
// causes a deadlock if other ractors are waiting on this lock.
result = rb_str_new_cstr(dup);
free(dup);
Copy link
Member

@jhawthorn jhawthorn Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rb_str_new_cstr erorrs, this pointer might leak.

I think that may not be worth dealing with as crypt is semi-deprecated https://bugs.ruby-lang.org/issues/14915

I prefer that (very unlikely) leak to the deadlock that will occur in the test suite under ractors.

result = rb_str_new_cstr(res);

size_t res_size = strlen(res)+1;
char *dup = malloc(res_size); // need to hold onto lock while duplicating a potentially static buffer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encrypted result would be small enough in alloca.
And this copy should not be needed if crypt_r is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants