Do not use `Double` arithmetics in `Integer.parseInt()`. #5193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

sjrd wants to merge 1 commit into scala-js:main from sjrd:parseint-without-doubles

Member

sjrd commented Jun 6, 2025 •

edited

Loading

Only use Int arithmetics. To detect overflow, we precompute a table of the maximum string length for each radix.

Int arithmetics is faster than Double arithmetics. The previous code used Doubles to have a concise way of detecting the overflow, which is not bad on JS engines. Given a cached overflow detection mechanism, resorting to Doubles is not necessary anymore.

sjrd requested a review from gzm0

June 6, 2025 14:58

sjrd force-pushed the parseint-without-doubles branch from 6d4a737 to c46f3a5 Compare

June 6, 2025 15:19

Contributor

ekrich commented Jun 6, 2025

I was wondering if you could take a peek and make any recommendations based on this work? https://github.com/scala-native/scala-native/blob/main/javalib/src/main/scala/java/lang/Integer.scala

Member Author

sjrd commented Jun 6, 2025

Those are fine implementations, but not as good as what's in this PR. In particular, caching the max is worthwhile, because divisions by non-constants are expensive. Otherwise it's pretty similar.

ekrich mentioned this pull request

Improve Integer.parseInt based on new Scala.js implementation scala-native/scala-native#4363

Open

Contributor

gzm0 commented Jun 7, 2025

I'll review fully, but could you amend the description to include some information about why this is better?

Member Author

sjrd commented Jun 7, 2025 •

edited

Loading

I'll review fully, but could you amend the description to include some information about why this is better?

I added a paragraph in the PR description. I'll integrate it in the commit message next time I amend/rebase/etc. The short answer is "it's faster" ;)

Edit: done

sjrd force-pushed the parseint-without-doubles branch from c46f3a5 to 4f2dccf Compare

June 7, 2025 16:04

gzm0 requested changes

View reviewed changes

javalib/src/main/scala/java/lang/Integer.scala Outdated

+                  for (radix <- Character.MIN_RADIX to Character.MAX_RADIX) {
+                    val overflowBarrier = divideUnsigned(-1, radix)
+                    var radixPower = 1

Contributor

gzm0 Jun 8, 2025

Suggested change

      
                  var radixPower = 1
          
                  var radixPower = 1 // radix^(maxLength - 1)

The "off-by-one" nature of the loop below threw me off at first.

javalib/src/main/scala/java/lang/Integer.scala Outdated

    
                  // We need at least one digit

                  if (i >= s.length)

                  // We need at least one digit, even if it is a 0 digit; also check the radix

                  if (start == len || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)

Contributor

gzm0 Jun 8, 2025

Suggested change

      
                if (start == len || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
          
                if (start >= len || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)

For clarity / safety?

javalib/src/main/scala/java/lang/Integer.scala

+                    val digit = Character.digitWithValidRadix(s.charAt(safeLen), radix)
+                    if (digit == -1 || (result ^ SignBit) > (radixInfo.overflowBarrier ^ SignBit))
+                      fail()
+                    result = result * radix + digit

Contributor

gzm0 Jun 8, 2025

Just to check my understanding: we need both maxLength and overflowBarrier because up to here, result could be 0, so we couldn't detect overflow as-we-go.

Member Author

sjrd Jun 8, 2025

result cannot be 0 here. We trim leading zeros beforehand. If safeLen != len, it means there were maxLength characters after the leading zeros. The main loop has executed exactly maxLength - 1 times, and during the first iteration digit was at least 1. Therefore at this point result is at least radix^(maxLength - 2).

The maxLength check ahead of time is so that we can avoid the overflow checks during the main iteration. The overflow barrier is required for that last overflow check. We could write the algorithm without maxLength at all, and check for overflow at every iteration instead.

javalib/src/main/scala/java/lang/Integer.scala

+                private final class StringRadixInfo(val maxLength: Int, val overflowBarrier: Int)
+                /** Precomputed table for parseIntInternal. */
+                private lazy val StringRadixInfos: Array[StringRadixInfo] = {

Contributor

gzm0 Jun 8, 2025

Did you compare this in code size and speed with two Array[Int]s? Especially since lookup in overflowBarriers would be rare, this might be worth it.

Member Author

sjrd Jun 8, 2025

Code size is unfortunately not meaningfully different. The extra lazy val and the extra loop to compute it compensate for the removed class StringRadixInfo (which is actually very small). IMO the code is less clean if we split into two arrays (see diff), so I lean towards keeping the class StringRadixInfo.

javalib/src/main/scala/java/lang/Integer.scala

+                      fail()
+                    result = result * radix + digit
+                    if ((result ^ SignBit) < (Character.MAX_RADIX ^ SignBit)) // unsigned comparison
+                      fail()

Contributor

gzm0 Jun 8, 2025

Could you amend the test suite to have a failure case for this case and the overflow barrier case above? (IIUC this is not the case right now).

Member Author

sjrd Jun 8, 2025

I added comments to existing test cases that trigger these particular overflow checks.


          Do not use Double arithmetics in Integer.parseInt().

f340b41

Only use `Int` arithmetics. To detect overflow, we precompute a
table of the maximum string length for each radix.

`Int` arithmetics are faster than `Double` arithmetics. The
previous code used `Double`s to have a concise way of detecting
the overflow, which is not bad on JS engines. Given a cached
overflow detection mechanism, resorting to `Double`s is not
necessary anymore.

We use a `scala.Array` to be Wasm-friendly, since everything else
in the algorithm is now otherwise Wasm-friendly.

sjrd force-pushed the parseint-without-doubles branch from 4f2dccf to f340b41 Compare

June 8, 2025 10:15

sjrd requested a review from gzm0

June 8, 2025 10:16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet