-
Notifications
You must be signed in to change notification settings - Fork 396
Introduce IR UnaryOps for floating point bit manipulation. #5158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a1cd758
to
59ca785
Compare
private val areTypedArraysBigEndian = { | ||
if (areTypedArraysSupported) { | ||
int32Array(0) = 0x01020304 | ||
(new typedarray.Int8Array(arrayBuffer, 0, 8))(0) == 0x01 | ||
} else { | ||
true // as good a value as any | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[comment]
Ah, it was needed because TypedArray relies on the platform's native endianness and doesn't allow specifying it.
When I first looked at this code, I didn't quite understand these, but the code using DataView
looks much clear and easier to understand 🚀
@@ -477,6 +477,14 @@ object Trees { | |||
final val UnwrapFromThrowable = 30 | |||
final val Throw = 31 | |||
|
|||
// Floating point bit manipulation, introduced in 1.20 | |||
final val Float_toBits = 32 | |||
// final val Float_toRawBits = 33 // Reserved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, I believe the difference between doubleToRawBits
and doubleToBits
is how they handle NaN, whether they convert a given NaN to an integer using its bit pattern (raw), or if they use the bit pattern of a "normalized" NaN (like the result of 0.0 / 0.0).
If I understand correctly, are there any use cases for the toRawBits
version?
Also, is it correct that "raw" NaNs typically being caused only from calculations, and as a rule of thumb, we should use normalized NaN (0.0/0.0) when we use NaNs as constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. The specific normalized bit pattern is even spelled out in the JavaDoc.
The typical use case is if you want to implement "NaN-boxing", by storing actual data in the various payloads of NaNs. But IMO a much safer way to do that is to manipulate the Long
s everywhere, and only bit-concert to Double
when you know it's a number.
I have yet to see a compelling use case for the raw variants, TBH.
c9344b2
to
3d8498b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of the first 3 commits. I propose we split out the rework of javalib method replacement: it seems to be used in other pending PRs as well, splitting it out will unblock that faster.
The IR cleaner rework might also be a candidate, if other PRs depend on it (I haven't checked).
project/JavalibIRCleaner.scala
Outdated
try { | ||
val iter = fs.getRootDirectories().iterator() | ||
while (iter.hasNext()) { | ||
Files.walkFileTree(iter.next(), EnumSet.of(FileVisitOption.FOLLOW_LINKS), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following links here seems odd. We traverse the entire jar anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well here I copied what PathIRContainer
does. Granted, that one uses the same code for directory walking and jar walking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove it. It will also allow us to drop the other two parameters as we can use a simpler overload and we can remove the enum set.
* underlying buffer is at least 8 bytes long. | ||
*/ | ||
@inline | ||
def bitsToDouble(fpBitsDataView: AnyRef): Double = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the IR cleaner be able to widen the method type here? Would avoid the cast.
Previously, we used several typed arrays pointing to the same `ArrayBuffer`. Now, we use a single `DataView`. Since we can (must) force a specific endianness with `DataView`, we get rid of the `highOffset` and `lowOffset` fields, which further streamlines the code. The assembly produced by V8 is now better. Since it knows that it manipulates a single buffer, it needs fewer loads and type tests internally. I expect the same would be true for other optimizing engines.
3d8498b
to
aa73146
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo minor comments / question.
project/JavalibIRCleaner.scala
Outdated
try { | ||
val iter = fs.getRootDirectories().iterator() | ||
while (iter.hasNext()) { | ||
Files.walkFileTree(iter.next(), EnumSet.of(FileVisitOption.FOLLOW_LINKS), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove it. It will also allow us to drop the other two parameters as we can use a simpler overload and we can remove the enum set.
project/Build.scala
Outdated
@@ -2053,16 +2053,16 @@ object Build { | |||
case `default212Version` => | |||
if (!useMinifySizes) { | |||
Some(ExpectedSizes( | |||
fastLink = 623000 to 624000, | |||
fastLink = 624000 to 625000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we understand what is causing the size increase here? IIUC we are testing this with ES2015 (as we IMO should). Is it just the additional helpers in the corejslib?
IIUC they are now always present, even if relevant floating point bit ops are unused. However, any off-the-shelve JS optimizer should be able to "tree-shake" these away, so not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's just the helpers in the CoreJSLib, which will be removed by off-the-shelves JS optimizers.
/* Defines a core function of 1 argument `x` that uses the `fpBitsDataView` | ||
* global var. When linking for ES 2015+, the provided body is always | ||
* used, as `fpBitsDataView` is known to exist. When linking for 5.1, | ||
* a polyfill from `scala.scalajs.runtime.FloatingPointBitsPolyfills` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
org.scalajs.linker.runtime ?
That allows the IR cleaner to see the JS type definitions from the library when it appears as a jar on the classpath. This is the case for the linker private library. Adding that support will allow the linker private library to use JS type definitions from the Scala.js library, as long as the references can be erased away.
Previously, our floating point bit manipulation methods were implemented in user space. This commit instead introduces dedicated IR `UnaryOp`s for them. On Wasm, the implementations are straightforward opcodes. In JavaScript, we use the same strategies as before, but moved to the linker world. In most cases, we use a unique scratch `DataView` to perform the manipulations. It is now allocated as a `globalVar` in the emitter, rather than in user space. The functions for `Float`/`Int` conversions are straightforward, as well as those for `Double`/`Long` when we use `bigint`s for `Long`s. When we use `RuntimeLong`s, the conversions are implemented in `RuntimeLong`, but require the scratch `DataView` as an argument, which the linker injects. This new strategy brings several benefits: * For JavaScript, the generated code is basically optimal now. It produces tight sequences of Assembly instructions that do not allocate anything. * For Wasm, the implementation does not rely on JS interop anymore, even when the optimizer does not run. This property will be required when we target Wasm outside of a JS host. * We open a path to adding support for the "raw" variants `floatToRawIntBits`/`doubleToRawLongBits` in Wasm in the future, should we need it.
aa73146
to
b49cff9
Compare
try { | ||
val iter = fs.getRootDirectories().iterator() | ||
while (iter.hasNext()) | ||
Files.walkFileTree(iter.next(), dirVisitor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized: We should probably use Files.walk
here, because we do not need attrs
.
I'm happy to follow-up on this; I might take the opportunity to rewrite the entire thing using streams. Seems like it might actually give us some memory benefits.
First: Use a DataView in FloatingPointBits.
Previously, we used several typed arrays pointing to the same
ArrayBuffer
. Now, we use a singleDataView
. Since we can (must) force a specific endianness withDataView
, we get rid of thehighOffset
andlowOffset
fields, which further streamlines the code.The assembly produced by V8 is now better. Since it knows that it manipulates a single buffer, it needs fewer loads and type tests internally. I expect the same would be true for other optimizing engines.
Second: Introduce IR UnaryOps for floating point bit manipulation.
Previously, our floating point bit manipulation methods were implemented in user space. This commit instead introduces dedicated IR
UnaryOp
s for them.On Wasm, the implementations are straightforward opcodes. In JavaScript, we used the same strategies as before, but moved to the linker world.
In most cases, we use a unique scratch
DataView
to perform the manipulations. It is now allocated as aglobalVar
in the emitter, rather than in user space. The functions forFloat
/Int
conversions are straightforward, as well as those forDouble
/Long
when we usebigint
s forLong
s. When we useRuntimeLong
s, the conversions are implemented inRuntimeLong
, but require the scratchDataView
as an argument, which the linker injects.This new strategy brings several benefits:
floatToRawIntBits
/doubleToRawLongBits
in Wasm in the future, should we need it.