Fix regular expressions across ractors that match different encodings #13568

Merged (1 commit) on Jun 10, 2025

luke-gruber
Contributor

Commit d42b9ff introduced an optimization that can speed up Regexp#match by 15% when a regexp is matched against strings of different encodings. This optimization, however, does not work across ractors. To fix this, we only use the optimization if no ractors have been started. In the future, we could use atomics for the reference counting if that proves necessary and more performant.
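
The guard described above can be sketched roughly as follows. This is an illustrative model only, not the code from this PR: every name in it (`regexp_obj`, `compiled_pattern`, `prepare`, `compile_for`, `multi_ractor_started`) is hypothetical, and it assumes the optimization amounts to recompiling the cached pattern in place whenever a plain, non-atomic use count is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not identifiers from Ruby's source. */

typedef struct {
    const char *encoding;  /* encoding this compiled form targets */
} compiled_pattern;

typedef struct {
    compiled_pattern *ptr; /* compiled form cached on the regexp object */
    int usecnt;            /* plain, non-atomic count of matches using ptr */
} regexp_obj;

/* Assumed flag: set once any additional ractor has been started. */
static bool multi_ractor_started = false;

compiled_pattern *compile_for(const char *encoding)
{
    compiled_pattern *p = malloc(sizeof(*p));
    if (p == NULL) abort();
    p->encoding = encoding;
    return p;
}

/*
 * Return a compiled pattern usable for a string in `str_encoding`.
 * *is_temporary is set to true when the caller owns the result
 * and must free() it after the match.
 */
compiled_pattern *prepare(regexp_obj *re, const char *str_encoding,
                          bool *is_temporary)
{
    assert(re != NULL && re->ptr != NULL);
    *is_temporary = false;

    /* Encodings agree: the cached form can be used directly. */
    if (strcmp(re->ptr->encoding, str_encoding) == 0) {
        return re->ptr;
    }

    /* Fast path: recompile in place. The non-atomic usecnt check is only
     * trustworthy while no other ractor can touch re->ptr concurrently. */
    if (!multi_ractor_started && re->usecnt == 0) {
        free(re->ptr);
        re->ptr = compile_for(str_encoding);
        return re->ptr;
    }

    /* Slow path: leave the shared object alone and hand back a temporary. */
    *is_temporary = true;
    return compile_for(str_encoding);
}
```

In this model, two ractors taking the fast path at once could both free the same `re->ptr`, which would be consistent with the "pointer being freed was not allocated" abort in the backtrace below. Gating the fast path on the ractor flag avoids that at the cost of an extra allocation per mismatched match once ractors are in use.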

The backtrace of the misbehaving native thread:

  * frame #0: 0x0000000189c94388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000189ccd88c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000189bd6c60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000189adb174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000189adec90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x0000000189ae321c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x00000001001c3be4 ruby`onig_free_body(reg=0x000000012d84b660) at regcomp.c:5663:5
    frame #7: 0x00000001001ba828 ruby`rb_reg_prepare_re(re=4748462304, str=4748451168) at re.c:1680:13
    frame #8: 0x00000001001bac58 ruby`rb_reg_onig_match(re=4748462304, str=4748451168, match=(ruby`reg_onig_search [inlined] rbimpl_RB_TYPE_P_fastpath at value_type.h:349:14
ruby`reg_onig_search [inlined] rbimpl_rstring_getmem at rstring.h:391:5
ruby`reg_onig_search at re.c:1781:5), args=0x000000013824b168, regs=0x000000013824b150) at re.c:1708:20
    frame #9: 0x00000001001baefc ruby`rb_reg_search_set_match(re=4748462304, str=4748451168, pos=<unavailable>, reverse=0, set_backref_str=1, set_match=0x0000000000000000) at re.c:1809:27
    frame #10: 0x00000001001bae80 ruby`rb_reg_search0(re=<unavailable>, str=<unavailable>, pos=<unavailable>, reverse=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at re.c:1861:12 [artificial]
    frame #11: 0x0000000100230b90 ruby`rb_pat_search0(pat=<unavailable>, str=<unavailable>, pos=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at string.c:6619:16 [artificial]
    frame #12: 0x00000001002287f4 ruby`rb_str_sub_bang [inlined] rb_pat_search(pat=4748462304, str=4748451168, pos=0, set_backref_str=1) at string.c:6626:12
    frame #13: 0x00000001002287dc ruby`rb_str_sub_bang(argc=1, argv=0x00000001381280d0, str=4748451168) at string.c:6668:11
    frame #14: 0x000000010022826c ruby`rb_str_sub

You can reproduce this by running:

RUBY_TESTOPTS="--name=/test_str_capitalize/" make test-all TESTS=test/ruby/test_m17n.comb

However, the failure only reproduces when the test is run in multiple ractors at once.

Co-authored-by: jhawthorn <john@hawthorn.email>
@jhawthorn jhawthorn merged commit 585dcff into ruby:master Jun 10, 2025
83 checks passed