
Introduce IR UnaryOps for floating point bit manipulation. #5158


Merged · 3 commits · May 21, 2025

Conversation

@sjrd (Member) commented Apr 24, 2025

First: Use a DataView in FloatingPointBits.

Previously, we used several typed arrays pointing to the same ArrayBuffer. Now, we use a single DataView. Since we can (must) force a specific endianness with DataView, we get rid of the highOffset and lowOffset fields, which further streamlines the code.

The assembly produced by V8 is now better. Since it knows that it manipulates a single buffer, it needs fewer loads and type tests internally. I expect the same would be true for other optimizing engines.
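The single-DataView strategy can be sketched in plain JavaScript (illustrative names; the actual linker-generated code differs):

```javascript
// One scratch buffer accessed through a single DataView. Unlike typed
// arrays, every DataView access takes an explicit endianness flag, so
// there is no need to probe the platform's byte order up front.
const scratch = new DataView(new ArrayBuffer(8));

function floatToBits(f) {
  scratch.setFloat32(0, f, true); // true = little-endian, always
  return scratch.getInt32(0, true);
}

function intBitsToFloat(bits) {
  scratch.setInt32(0, bits, true);
  return scratch.getFloat32(0, true);
}
```

Because only one object is ever dereferenced, an optimizing engine can prove that all accesses hit the same buffer, which is what removes the extra loads and type tests.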


Second: Introduce IR UnaryOps for floating point bit manipulation.

Previously, our floating point bit manipulation methods were implemented in user space. This commit instead introduces dedicated IR UnaryOps for them.

On Wasm, the implementations are straightforward opcodes. In JavaScript, we use the same strategies as before, but they have moved to the linker world.

In most cases, we use a unique scratch DataView to perform the manipulations. It is now allocated as a globalVar in the emitter, rather than in user space. The functions for Float/Int conversions are straightforward, as well as those for Double/Long when we use bigints for Longs. When we use RuntimeLongs, the conversions are implemented in RuntimeLong, but require the scratch DataView as an argument, which the linker injects.
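For the `bigint`-based Longs, the Double conversions follow the same pattern as the Float/Int ones; a hedged JavaScript sketch (hypothetical names, not the emitter's actual output):

```javascript
const scratch = new DataView(new ArrayBuffer(8));

// Double <-> 64-bit pattern as a BigInt. With bigint Longs this is all
// there is to it; with RuntimeLongs the same scratch DataView is
// instead injected into RuntimeLong as an extra argument.
function doubleToLongBits(d) {
  scratch.setFloat64(0, d, true);
  return scratch.getBigInt64(0, true);
}

function longBitsToDouble(bits) {
  scratch.setBigInt64(0, bits, true);
  return scratch.getFloat64(0, true);
}
```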

This new strategy brings several benefits:

  • For JavaScript, the generated code is basically optimal now. It produces tight sequences of assembly instructions that do not allocate anything.
  • For Wasm, the implementation does not rely on JS interop anymore, even when the optimizer does not run. This property will be required when we target Wasm outside of a JS host.
  • We open a path to adding support for the "raw" variants floatToRawIntBits/doubleToRawLongBits in Wasm in the future, should we need it.

@sjrd sjrd requested a review from gzm0 April 24, 2025 08:58
@sjrd sjrd force-pushed the floating-point-bits-opcodes branch 3 times, most recently from a1cd758 to 59ca785 Compare April 26, 2025 13:54
Comment on lines -64 to -70
private val areTypedArraysBigEndian = {
  if (areTypedArraysSupported) {
    int32Array(0) = 0x01020304
    (new typedarray.Int8Array(arrayBuffer, 0, 8))(0) == 0x01
  } else {
    true // as good a value as any
  }
Contributor:
Ah, it was needed because TypedArray relies on the platform's native endianness and doesn't allow specifying it.
When I first looked at this code, I didn't quite understand it, but the code using DataView looks much clearer and is easier to understand 🚀
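The probe being discussed can be reproduced in plain JavaScript for illustration; the DataView replacement makes it unnecessary because each accessor takes an explicit endianness flag:

```javascript
// Old approach: typed arrays always use the platform's native byte
// order, so the code had to detect it once at startup.
const buf = new ArrayBuffer(4);
new Int32Array(buf)[0] = 0x01020304;
const isBigEndian = new Int8Array(buf)[0] === 0x01;

// New approach: DataView takes an explicit littleEndian flag, so the
// same read works identically on any platform.
const dv = new DataView(buf);
const value = dv.getInt32(0, /* littleEndian = */ !isBigEndian);
// `value` is 0x01020304 whichever byte order the platform uses.
```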

@@ -477,6 +477,14 @@ object Trees {
    final val UnwrapFromThrowable = 30
    final val Throw = 31

    // Floating point bit manipulation, introduced in 1.20
    final val Float_toBits = 32
    // final val Float_toRawBits = 33 // Reserved
Contributor:

Just out of curiosity, I believe the difference between doubleToRawBits and doubleToBits is how they handle NaN, whether they convert a given NaN to an integer using its bit pattern (raw), or if they use the bit pattern of a "normalized" NaN (like the result of 0.0 / 0.0).

If I understand correctly, are there any use cases for the toRawBits version?


Also, is it correct that "raw" NaNs typically arise only from calculations, and that, as a rule of thumb, we should use the normalized NaN (0.0/0.0) when we use NaNs as constants?

Member Author:

Yes, that's correct. The specific normalized bit pattern is even spelled out in the JavaDoc.

The typical use case is if you want to implement "NaN-boxing", by storing actual data in the various payloads of NaNs. But IMO a much safer way to do that is to manipulate the Longs everywhere, and only bit-convert to Double when you know it's a number.

I have yet to see a compelling use case for the raw variants, TBH.
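The distinction can be illustrated with a JavaScript sketch (hedged: payload preservation in the raw direction is not guaranteed in JS, since engines may canonicalize NaN bit patterns when storing them):

```javascript
const dv = new DataView(new ArrayBuffer(8));

function doubleToRawLongBits(d) {
  // "Raw": whatever bit pattern gets stored, payload included
  // (JS engines are permitted to canonicalize NaN payloads here).
  dv.setFloat64(0, d, true);
  return dv.getBigInt64(0, true);
}

function doubleToLongBits(d) {
  // "Normalized": every NaN collapses to the canonical pattern that
  // the JavaDoc spells out, 0x7ff8000000000000L.
  return Number.isNaN(d) ? 0x7ff8000000000000n : doubleToRawLongBits(d);
}
```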

@sjrd sjrd force-pushed the floating-point-bits-opcodes branch 2 times, most recently from c9344b2 to 3d8498b Compare May 16, 2025 15:46
@gzm0 (Contributor) left a comment:

Review of the first 3 commits. I propose we split out the rework of javalib method replacement: it seems to be used in other pending PRs as well, and splitting it out will unblock those faster.

The IR cleaner rework might also be a candidate, if other PRs depend on it (I haven't checked).

try {
  val iter = fs.getRootDirectories().iterator()
  while (iter.hasNext()) {
    Files.walkFileTree(iter.next(), EnumSet.of(FileVisitOption.FOLLOW_LINKS),
Contributor:

Following links here seems odd. We traverse the entire jar anyway.

Member Author:

Well here I copied what PathIRContainer does. Granted, that one uses the same code for directory walking and jar walking.

Contributor:

Let's remove it. It will also allow us to drop the other two parameters as we can use a simpler overload and we can remove the enum set.

* underlying buffer is at least 8 bytes long.
*/
@inline
def bitsToDouble(fpBitsDataView: AnyRef): Double = {
Contributor:

Shouldn't the IR cleaner be able to widen the method type here? Would avoid the cast.

@sjrd sjrd force-pushed the floating-point-bits-opcodes branch from 3d8498b to aa73146 Compare May 19, 2025 21:24
@sjrd sjrd requested a review from gzm0 May 19, 2025 21:25
@gzm0 (Contributor) left a comment:

LGTM modulo minor comments / question.


@@ -2053,16 +2053,16 @@ object Build {
      case `default212Version` =>
        if (!useMinifySizes) {
          Some(ExpectedSizes(
-           fastLink = 623000 to 624000,
+           fastLink = 624000 to 625000,
Contributor:

Do we understand what is causing the size increase here? IIUC we are testing this with ES2015 (as we IMO should). Is it just the additional helpers in the corejslib?

IIUC they are now always present, even if the relevant floating point bit ops are unused. However, any off-the-shelf JS optimizer should be able to "tree-shake" these away, so it is not a big deal.

Member Author:

Yes, it's just the helpers in the CoreJSLib, which will be removed by off-the-shelf JS optimizers.

/* Defines a core function of 1 argument `x` that uses the `fpBitsDataView`
* global var. When linking for ES 2015+, the provided body is always
* used, as `fpBitsDataView` is known to exist. When linking for 5.1,
* a polyfill from `scala.scalajs.runtime.FloatingPointBitsPolyfills` is
Contributor:

`org.scalajs.linker.runtime`?

sjrd added 2 commits May 21, 2025 15:33
That allows the IR cleaner to see the JS type definitions from
the library when it appears as a jar on the classpath. This is
the case for the linker private library. Adding that support will
allow the linker private library to use JS type definitions from
the Scala.js library, as long as the references can be erased
away.
@sjrd sjrd force-pushed the floating-point-bits-opcodes branch from aa73146 to b49cff9 Compare May 21, 2025 13:33
try {
  val iter = fs.getRootDirectories().iterator()
  while (iter.hasNext())
    Files.walkFileTree(iter.next(), dirVisitor)
Contributor:

I just realized: We should probably use Files.walk here, because we do not need attrs.

I'm happy to follow up on this; I might take the opportunity to rewrite the entire thing using streams. Seems like it might actually give us some memory benefits.

@sjrd sjrd merged commit 9e5b837 into scala-js:main May 21, 2025
3 checks passed
@sjrd sjrd deleted the floating-point-bits-opcodes branch May 21, 2025 19:08