
Optimize str_casecmp length check using pointer end #14163


Open. erimicel wants to merge 3 commits into base: master

@erimicel (Contributor) commented Aug 10, 2025

This change refactors the final length check in str_casecmp to use existing pointers instead of recalculating the string lengths.

Currently, after the comparison loop finishes, str_casecmp calls RSTRING_LEN on str1 and str2 to compare their lengths. This reads both lengths again, even though the end pointers set up for the loop already encode them.

This PR replaces the RSTRING_LEN calls with a check against the p1 and p2 pointers and their respective end pointers (p1end and p2end). Since these pointers are already advanced during the comparison loop, this approach avoids redundant length calculations and slightly improves performance.

The new logic is:

  • If both pointers have reached their end (p1 == p1end && p2 == p2end), the strings have equal length and the function returns 0.
  • If only p1 has reached its end (p1 == p1end), str1 is shorter and the function returns -1.
  • Otherwise, p2 has reached its end but p1 has not, so str2 is shorter and the function returns 1.
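
The three cases above can be sketched in plain C. This is a minimal illustration, not the code in this PR: `casecmp_sketch` is a hypothetical stand-in that works on raw byte buffers instead of Ruby `VALUE`s, covers only the single-byte ASCII path, and assumes case folding via `tolower`. Only the pointer names (`p1`, `p2`, `p1end`, `p2end`) come from the description.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch of the ASCII path only; the name, signature, and
 * the use of tolower() are assumptions made for illustration. */
static int
casecmp_sketch(const char *p1, size_t len1, const char *p2, size_t len2)
{
    const char *p1end = p1 + len1;
    const char *p2end = p2 + len2;

    /* Compare byte by byte until one of the strings is exhausted. */
    while (p1 < p1end && p2 < p2end) {
        if (*p1 != *p2) {
            int c1 = tolower((unsigned char)*p1);
            int c2 = tolower((unsigned char)*p2);
            if (c1 != c2) return c1 < c2 ? -1 : 1;
        }
        p1++;
        p2++;
    }

    /* The loop exits only when at least one pointer has reached its end,
     * so the pointers alone tell us which string (if any) is shorter. */
    if (p1 == p1end && p2 == p2end) return 0;  /* same length */
    if (p1 == p1end) return -1;                /* str1 is shorter */
    return 1;                                  /* str2 is shorter */
}
```

Because the loop condition guarantees that at least one pointer equals its end pointer on exit, the final `return 1` needs no explicit check of `p2 == p2end`.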

I expect a 3-5% performance increase for str_casecmp.

I ran the benchmarks five times on my Apple M1 and averaged the results:

Benchmarks against master branch:

| Benchmark Name | Iterations per Second | Total Iterations | Total Time (s) | ns/Iteration |
|---|---|---|---|---|
| casecmp-1 | 16.389M | 39.178M | 2.390 | 61.02 |
| casecmp-10 | 2.937M | 8.348M | 2.843 | 340.50 |
| casecmp-100 | 328.633k | 980.450k | 2.983 | 3040 |
| casecmp-1000 | 33.216k | 99.165k | 2.985 | 30110 |
| casecmp-1000vs10 | 2.930M | 8.376M | 2.858 | 341.27 |
| casecmp-nonascii1 | 25.605M | 57.385M | 2.241 | 39.05 |
| casecmp-nonascii10 | 27.403M | 56.683M | 2.068 | 36.49 |
| casecmp-nonascii100 | 27.373M | 56.938M | 2.080 | 36.53 |
| casecmp-nonascii1000 | 27.426M | 56.404M | 2.057 | 36.46 |
| casecmp-nonascii1000vs10 | 27.400M | 57.036M | 2.082 | 36.50 |

Benchmarks against current branch:

| Benchmark Name | Iterations per Second | Total Iterations | Total Time (s) | ns/Iteration |
|---|---|---|---|---|
| casecmp-1 | 17.256M | 42.248M | 2.448 | 57.95 |
| casecmp-10 | 3.035M | 8.745M | 2.881 | 329.48 |
| casecmp-100 | 340.974k | 1.018M | 2.984 | 2930 |
| casecmp-1000 | 34.471k | 103.586k | 3.005 | 29010 |
| casecmp-1000vs10 | 3.032M | 8.734M | 2.880 | 329.78 |
| casecmp-nonascii1 | 27.663M | 60.119M | 2.173 | 36.15 |
| casecmp-nonascii10 | 27.357M | 59.933M | 2.191 | 36.55 |
| casecmp-nonascii100 | 27.309M | 59.682M | 2.185 | 36.62 |
| casecmp-nonascii1000 | 27.138M | 59.608M | 2.197 | 36.85 |
| casecmp-nonascii1000vs10 | 26.929M | 59.701M | 2.217 | 37.13 |
