
Introduce IR UnaryOps for floating point bit manipulation. #5158


Merged · 3 commits · May 21, 2025

Conversation

@sjrd (Member) commented Apr 24, 2025

First: Use a DataView in FloatingPointBits.

Previously, we used several typed arrays pointing to the same ArrayBuffer. Now, we use a single DataView. Since we can (must) force a specific endianness with DataView, we get rid of the highOffset and lowOffset fields, which further streamlines the code.

The assembly produced by V8 is now better. Since it knows that it manipulates a single buffer, it needs fewer loads and type tests internally. I expect the same would be true for other optimizing engines.
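The single-DataView strategy can be sketched in plain JavaScript (illustrative names; the actual linker-generated code differs):

```javascript
// One scratch buffer accessed through a single DataView. Unlike typed
// arrays, every DataView access takes an explicit endianness flag, so
// there is no need to probe the platform's byte order up front.
const scratch = new DataView(new ArrayBuffer(8));

function floatToBits(f) {
  scratch.setFloat32(0, f, true); // true = little-endian, always
  return scratch.getInt32(0, true);
}

function intBitsToFloat(bits) {
  scratch.setInt32(0, bits, true);
  return scratch.getFloat32(0, true);
}
```

Because only one object is ever dereferenced, an optimizing engine can prove that all accesses hit the same buffer, which is what removes the extra loads and type tests.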


Second: Introduce IR UnaryOps for floating point bit manipulation.

Previously, our floating point bit manipulation methods were implemented in user space. This commit instead introduces dedicated IR UnaryOps for them.

On Wasm, the implementations are straightforward opcodes. In JavaScript, we use the same strategies as before, but they have moved to the linker world.

In most cases, we use a unique scratch DataView to perform the manipulations. It is now allocated as a globalVar in the emitter, rather than in user space. The functions for Float/Int conversions are straightforward, as well as those for Double/Long when we use bigints for Longs. When we use RuntimeLongs, the conversions are implemented in RuntimeLong, but require the scratch DataView as an argument, which the linker injects.
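For the `bigint`-based Longs, the Double conversions follow the same pattern as the Float/Int ones; a hedged JavaScript sketch (hypothetical names, not the emitter's actual output):

```javascript
const scratch = new DataView(new ArrayBuffer(8));

// Double <-> 64-bit pattern as a BigInt. With bigint Longs this is all
// there is to it; with RuntimeLongs the same scratch DataView is
// instead injected into RuntimeLong as an extra argument.
function doubleToLongBits(d) {
  scratch.setFloat64(0, d, true);
  return scratch.getBigInt64(0, true);
}

function longBitsToDouble(bits) {
  scratch.setBigInt64(0, bits, true);
  return scratch.getFloat64(0, true);
}
```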

This new strategy brings several benefits:

  • For JavaScript, the generated code is basically optimal now. It produces tight sequences of assembly instructions that do not allocate anything.
  • For Wasm, the implementation does not rely on JS interop anymore, even when the optimizer does not run. This property will be required when we target Wasm outside of a JS host.
  • We open a path to adding support for the "raw" variants floatToRawIntBits/doubleToRawLongBits in Wasm in the future, should we need it.

@sjrd sjrd requested a review from gzm0 April 24, 2025 08:58
@sjrd sjrd force-pushed the floating-point-bits-opcodes branch 3 times, most recently from a1cd758 to 59ca785 Compare April 26, 2025 13:54
Comment on lines -64 to -70
private val areTypedArraysBigEndian = {
  if (areTypedArraysSupported) {
    int32Array(0) = 0x01020304
    (new typedarray.Int8Array(arrayBuffer, 0, 8))(0) == 0x01
  } else {
    true // as good a value as any
  }
Contributor:
Ah, it was needed because TypedArray relies on the platform's native endianness and doesn't allow specifying it.
When I first looked at this code, I didn't quite understand it, but the code using DataView looks much clearer and is easier to understand 🚀
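The probe being discussed can be reproduced in plain JavaScript for illustration; the DataView replacement makes it unnecessary because each accessor takes an explicit endianness flag:

```javascript
// Old approach: typed arrays always use the platform's native byte
// order, so the code had to detect it once at startup.
const buf = new ArrayBuffer(4);
new Int32Array(buf)[0] = 0x01020304;
const isBigEndian = new Int8Array(buf)[0] === 0x01;

// New approach: DataView takes an explicit littleEndian flag, so the
// same read works identically on any platform.
const dv = new DataView(buf);
const value = dv.getInt32(0, /* littleEndian = */ !isBigEndian);
// `value` is 0x01020304 whichever byte order the platform uses.
```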

@@ -477,6 +477,14 @@ object Trees {
    final val UnwrapFromThrowable = 30
    final val Throw = 31

    // Floating point bit manipulation, introduced in 1.20
    final val Float_toBits = 32
    // final val Float_toRawBits = 33 // Reserved
Contributor:

Just out of curiosity, I believe the difference between doubleToRawBits and doubleToBits is how they handle NaN, whether they convert a given NaN to an integer using its bit pattern (raw), or if they use the bit pattern of a "normalized" NaN (like the result of 0.0 / 0.0).

If I understand correctly, are there any use cases for the toRawBits version?


Also, is it correct that "raw" NaNs typically arise only from calculations, and that, as a rule of thumb, we should use the normalized NaN (0.0/0.0) when we use NaNs as constants?

Member Author:

Yes, that's correct. The specific normalized bit pattern is even spelled out in the JavaDoc.

The typical use case is if you want to implement "NaN-boxing", by storing actual data in the various payloads of NaNs. But IMO a much safer way to do that is to manipulate the Longs everywhere, and only bit-convert to Double when you know it's a number.

I have yet to see a compelling use case for the raw variants, TBH.
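The distinction can be illustrated with a JavaScript sketch (hedged: payload preservation in the raw direction is not guaranteed in JS, since engines may canonicalize NaN bit patterns when storing them):

```javascript
const dv = new DataView(new ArrayBuffer(8));

function doubleToRawLongBits(d) {
  // "Raw": whatever bit pattern gets stored, payload included
  // (JS engines are permitted to canonicalize NaN payloads here).
  dv.setFloat64(0, d, true);
  return dv.getBigInt64(0, true);
}

function doubleToLongBits(d) {
  // "Normalized": every NaN collapses to the canonical pattern that
  // the JavaDoc spells out, 0x7ff8000000000000L.
  return Number.isNaN(d) ? 0x7ff8000000000000n : doubleToRawLongBits(d);
}
```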

@sjrd sjrd force-pushed the floating-point-bits-opcodes branch 2 times, most recently from c9344b2 to 3d8498b Compare May 16, 2025 15:46
@gzm0 (Contributor) left a comment:

Review of the first 3 commits. I propose we split out the rework of javalib method replacement: it seems to be used in other pending PRs as well, and splitting it out will unblock those faster.

The IR cleaner rework might also be a candidate, if other PRs depend on it (I haven't checked).

try {
  val iter = fs.getRootDirectories().iterator()
  while (iter.hasNext()) {
    Files.walkFileTree(iter.next(), EnumSet.of(FileVisitOption.FOLLOW_LINKS),
Contributor:

Following links here seems odd. We traverse the entire jar anyway.

Member Author:

Well here I copied what PathIRContainer does. Granted, that one uses the same code for directory walking and jar walking.

Contributor:

Let's remove it. It will also allow us to drop the other two parameters as we can use a simpler overload and we can remove the enum set.

* underlying buffer is at least 8 bytes long.
*/
@inline
def bitsToDouble(fpBitsDataView: AnyRef): Double = {
Contributor:

Shouldn't the IR cleaner be able to widen the method type here? Would avoid the cast.

@sjrd sjrd force-pushed the floating-point-bits-opcodes branch from 3d8498b to aa73146 Compare May 19, 2025 21:24
@sjrd sjrd requested a review from gzm0 May 19, 2025 21:25
@gzm0 (Contributor) left a comment:

LGTM modulo minor comments / question.


@@ -2053,16 +2053,16 @@ object Build {
      case `default212Version` =>
        if (!useMinifySizes) {
          Some(ExpectedSizes(
-           fastLink = 623000 to 624000,
+           fastLink = 624000 to 625000,
Contributor:

Do we understand what is causing the size increase here? IIUC we are testing this with ES2015 (as we IMO should). Is it just the additional helpers in the corejslib?

IIUC they are now always present, even if the relevant floating point bit ops are unused. However, any off-the-shelf JS optimizer should be able to "tree-shake" these away, so it is not a big deal.

Member Author:

Yes, it's just the helpers in the CoreJSLib, which will be removed by off-the-shelf JS optimizers.

/* Defines a core function of 1 argument `x` that uses the `fpBitsDataView`
* global var. When linking for ES 2015+, the provided body is always
* used, as `fpBitsDataView` is known to exist. When linking for 5.1,
* a polyfill from `scala.scalajs.runtime.FloatingPointBitsPolyfills` is
Contributor:

`org.scalajs.linker.runtime`?

sjrd added 2 commits May 21, 2025 15:33
That allows the IR cleaner to see the JS type definitions from
the library when it appears as a jar on the classpath. This is
the case for the linker private library. Adding that support will
allow the linker private library to use JS type definitions from
the Scala.js library, as long as the references can be erased
away.
@sjrd sjrd force-pushed the floating-point-bits-opcodes branch from aa73146 to b49cff9 Compare May 21, 2025 13:33
try {
  val iter = fs.getRootDirectories().iterator()
  while (iter.hasNext())
    Files.walkFileTree(iter.next(), dirVisitor)
Contributor:

I just realized: We should probably use Files.walk here, because we do not need attrs.

I'm happy to follow up on this; I might take the opportunity to rewrite the entire thing using streams. Seems like it might actually give us some memory benefits.

@sjrd sjrd merged commit 9e5b837 into scala-js:main May 21, 2025
3 checks passed
@sjrd sjrd deleted the floating-point-bits-opcodes branch May 21, 2025 19:08