-
Notifications
You must be signed in to change notification settings - Fork 396
Do not use Double
arithmetics in Integer.parseInt()
.
#5193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6d4a737
to
c46f3a5
Compare
I was wondering if you could take a peek and make any recommendations based on this work? https://github.com/scala-native/scala-native/blob/main/javalib/src/main/scala/java/lang/Integer.scala |
Those are fine implementations, but not as good as what's in this PR. In particular, caching the |
I'll review fully, but could you amend the description to include some information about why this is better? |
I added a paragraph in the PR description. I'll integrate it in the commit message next time I amend/rebase/etc. The short answer is "it's faster" ;) Edit: done |
c46f3a5
to
4f2dccf
Compare
|
||
for (radix <- Character.MIN_RADIX to Character.MAX_RADIX) { | ||
val overflowBarrier = divideUnsigned(-1, radix) | ||
var radixPower = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var radixPower = 1 | |
var radixPower = 1 // radix^(maxLength - 1) |
The "off-by-one" nature of the loop below threw me off at first.
// We need at least one digit | ||
if (i >= s.length) | ||
// We need at least one digit, even if it is a 0 digit; also check the radix | ||
if (start == len || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (start == len || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) | |
if (start >= len || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) |
For clarity / safety?
val digit = Character.digitWithValidRadix(s.charAt(safeLen), radix) | ||
if (digit == -1 || (result ^ SignBit) > (radixInfo.overflowBarrier ^ SignBit)) | ||
fail() | ||
result = result * radix + digit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check my understanding: we need both maxLength and overflowBarrier because up to here, result
could be 0, so we couldn't detect overflow as-we-go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result
cannot be 0 here. We trim leading zeros beforehand. If safeLen != len
, it means there were maxLength
characters after the leading zeros. The main loop has executed exactly maxLength - 1
times, and during the first iteration digit
was at least 1. Therefore at this point result
is at least radix^(maxLength - 2)
.
The maxLength
check ahead of time is so that we can avoid the overflow checks during the main iteration. The overflow barrier is required for that last overflow check. We could write the algorithm without maxLength
at all, and check for overflow at every iteration instead.
private final class StringRadixInfo(val maxLength: Int, val overflowBarrier: Int) | ||
|
||
/** Precomputed table for parseIntInternal. */ | ||
private lazy val StringRadixInfos: Array[StringRadixInfo] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you compare this in code size and speed with two Array[Int]
s? Especially since lookup in overflowBarriers
would be rare, this might be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code size is unfortunately not meaningfully different. The extra lazy val
and the extra loop to compute it compensate for the removed class StringRadixInfo
(which is actually very small). IMO the code is less clean if we split into two arrays (see diff), so I lean towards keeping the class StringRadixInfo
.
fail() | ||
result = result * radix + digit | ||
if ((result ^ SignBit) < (Character.MAX_RADIX ^ SignBit)) // unsigned comparison | ||
fail() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you amend the test suite to have a failure case for this case and the overflow barrier case above? (IIUC this is not the case right now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments to existing test cases that trigger these particular overflow checks.
Only use `Int` arithmetics. To detect overflow, we precompute a table of the maximum string length for each radix. `Int` arithmetics are faster than `Double` arithmetics. The previous code used `Double`s to have a concise way of detecting the overflow, which is not bad on JS engines. Given a cached overflow detection mechanism, resorting to `Double`s is not necessary anymore. We use a `scala.Array` to be Wasm-friendly, since everything else in the algorithm is now otherwise Wasm-friendly.
4f2dccf
to
f340b41
Compare
Only use
Int
arithmetics. To detect overflow, we precompute a table of the maximum string length for each radix.Int
arithmetics is faster thanDouble
arithmetics. The previous code usedDouble
s to have a concise way of detecting the overflow, which is not bad on JS engines. Given a cached overflow detection mechanism, resorting toDouble
s is not necessary anymore.