Introduce IR ops for unsigned extension and comparisons. #5186

Merged
merged 2 commits into scala-js:main from ir-unsigned-ops on Jun 8, 2025

Conversation

sjrd
Member

@sjrd sjrd commented May 30, 2025

This completes the set of `UnaryOp`s and `BinaryOp`s to directly manipulate the unsigned representation of integers.

Unlike other operations, such as `Int_unsigned_/`, the unsigned extension and comparisons have efficient implementations in user land. It is common for regular code to directly use the efficient implementation (e.g., `x.toLong & 0xffffffffL`) instead of the dedicated library method (`Integer.toUnsignedLong`).
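
As a sanity check, the user-land pattern and the library method do agree on all inputs; here is a minimal sketch in plain Java (the class and variable names are mine, for illustration only):

```java
public class UnsignedExtensionCheck {
    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int x : samples) {
            // User-land pattern: widen to long, then mask off the sign extension.
            long viaPattern = (long) x & 0xffffffffL;
            // Dedicated library method.
            long viaLibrary = Integer.toUnsignedLong(x);
            if (viaPattern != viaLibrary)
                throw new AssertionError("mismatch for " + x);
            System.out.println(x + " -> " + viaPattern);
        }
    }
}
```

For example, `-1` maps to `4294967295` and `Integer.MIN_VALUE` to `2147483648`, which is exactly the zero-extension the `UnsignedIntToLong` IR node denotes.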

If we only replaced the body of the library methods with IR nodes, we would miss improvements in all the other code.

Therefore, in this case, we instead recognize the efficient patterns in the optimizer, and replace them with the unsigned IR operations through folding.

When targeting JavaScript, the new IR nodes do not actually make any difference. For `int` operations, the Emitter sort of "undoes" the folding of the optimizer to implement them. That said, it could choose an alternative implementation based on `>>> 0`, which we should investigate in the future. For `Long`s, the subexpressions of the patterns are expanded into the `RuntimeLong` operations before folding gets a chance to recognize them. That's fine, because internal folding of the underlying `int` operations will do the best possible thing anyway.

The size increase is only due to the additional always-reachable methods in `RuntimeLong`. Those can be removed by standard JS minifiers.

When targeting Wasm, this allows the emitter to produce the dedicated Wasm opcodes, which are more likely to be efficient.

To be fair, we could have achieved the same result by recognizing the patterns in the Wasm emitter instead. The deeper reason to add those IR operations is for completeness. They were the last operations from a standard set that were missing in the IR.


Second commit: Use `x >>> 0` instead of `x ^ 0x80000000` for unsigned comparisons.

Benchmarks show that this is slightly faster. Inspection of the V8 source code also suggests that it does not treat `x ^ 0x80000000` specially, but it does recognize `x >>> 0` as producing an `Unsigned32`, and it generates unsigned comparisons when both inputs are known to be `Unsigned32`.
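
To illustrate why both lowerings are valid, here is a small cross-check in plain Java (the class name is made up; Java has no direct equivalent of JavaScript's `x >>> 0` reinterpretation, so the widening variant below emulates it through the unsigned `long` representation):

```java
public class UnsignedCompareCheck {
    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int a : samples) {
            for (int b : samples) {
                boolean reference = Integer.compareUnsigned(a, b) < 0;
                // Sign-flip trick: xor-ing both operands with 0x80000000 turns
                // an unsigned comparison into a signed one.
                boolean viaFlip = (a ^ 0x80000000) < (b ^ 0x80000000);
                // Widening trick: compare the zero-extended values, which is
                // what JS `x >>> 0` achieves by reinterpreting as unsigned.
                boolean viaWiden = ((long) a & 0xffffffffL) < ((long) b & 0xffffffffL);
                if (reference != viaFlip || reference != viaWiden)
                    throw new AssertionError(a + " <u " + b);
            }
        }
        System.out.println("all comparisons agree");
    }
}
```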


(this really is the last IR change I have in mind these days :p)

@sjrd sjrd requested a review from gzm0 May 30, 2025 16:52
@sjrd sjrd force-pushed the ir-unsigned-ops branch from 3fb4cac to 690cfd2 on May 31, 2025 15:29
@gzm0
Contributor

gzm0 commented Jun 1, 2025

Could you split the first commit into its own PR? I think it really does not relate to this one except for discovery. (Also, we should get that one in fast, so we don't have broken printing.)

@sjrd
Member Author

sjrd commented Jun 4, 2025

I added a commit that switches to using x >>> 0 in unsigned comparisons in the JS emitter. It does seem to make things faster.

@sjrd sjrd force-pushed the ir-unsigned-ops branch from ec9a680 to e46ca69 on June 4, 2025 18:04
Contributor

@gzm0 gzm0 left a comment

On a high level: I think adding the remaining opcodes makes sense for completeness.

However, I'm concerned about test coverage: IIUC, the only way we generate these opcodes is through optimization. I'm not sure that's enough to test that the tail of the pipeline is handling them correctly?

@sjrd
Member Author

sjrd commented Jun 7, 2025

However, I'm concerned about test coverage: IIUC, the only way we generate these opcodes is through optimization. I'm not sure that's enough to test that the tail of the pipeline is handling them correctly?

What exactly do you refer to as the tail of the pipeline? The emitters definitely have to handle them correctly. The only exception is the JS emitter with `RuntimeLong`, for the `Long` operations.

We could introduce (some of) them in javalib methods, to exercise them more. `UnsignedIntToLong` has a natural corresponding method in `jl.Integer.toUnsignedLong`. The comparison operators, not so much. We can pick one of each set (`Int_<` and `Long_<`, for example) and put them in `jl.Integer.compareUnsigned` and `jl.Long.compareUnsigned`. But we don't have the opportunity to test all of them that way.

@gzm0
Contributor

gzm0 commented Jun 7, 2025

What exactly do you refer to as the tail of the pipeline?

I mean the emitters, yes. What I was trying to say is that, to me, it is absolutely not obvious that our current test suite covers the relevant code in the emitters. And that is a bit scary :-/

Now that I'm writing this, I realize the comment also applies to the optimizer (and in fact it feels worse): If a given opcode is never in serialized IR, how do we test that the optimizations on that opcode are correct?

We could introduce (some of) them in javalib methods, to exercise them more. `UnsignedIntToLong` has a natural corresponding method in `jl.Integer.toUnsignedLong`.

That would probably help, yes. IIUC this is quite easy to do with the "replace body infrastructure"?

The comparison operators, not so much. We can pick one of each set (`Int_<` and `Long_<`, for example) and put them in `jl.Integer.compareUnsigned` and `jl.Long.compareUnsigned`. But we don't have the opportunity to test all of them that way.

I don't have a good intuition whether this is acceptable. On one hand, a lot of this code is quite mechanical and often shared, so a bug feels unlikely. On the other hand, it seems sub-par compared to how we usually test.

I'll think about it some more.

@sjrd
Member Author

sjrd commented Jun 7, 2025

That would probably help, yes. IIUC this is quite easy to do with the "replace body infrastructure"?

Yes, that's very easy. I added a commit with just that for now, to see what it looks like.

Another possibility would be to recognize the unsigned comparison patterns in the compiler backend. That would significantly expand the coverage. (for UnsignedIntToLong, the body replacement of toUnsignedLong + recognizing the pattern in the optimizer should be plenty)

@sjrd sjrd force-pushed the ir-unsigned-ops branch from 2b59eaf to bac9d07 on June 7, 2025 11:18
@sjrd
Member Author

sjrd commented Jun 7, 2025

And I pushed yet another commit where the compiler backend recognizes unsigned comparisons.

LMK which variant you think is best. I have a slight preference for the last one, even if it is a bit inelegant.

@gzm0
Contributor

gzm0 commented Jun 7, 2025

I have a slight preference for the last one,

I agree. Consider adding an AST test to make sure we're actually hitting the expected rewrites.

@sjrd sjrd force-pushed the ir-unsigned-ops branch from 1babd10 to a903431 on June 7, 2025 15:35
@sjrd sjrd requested a review from gzm0 June 7, 2025 15:39
@sjrd
Member Author

sjrd commented Jun 7, 2025

Alright. Rebased, squashed, cleaned up, added AST test, and addressed the other earlier comments. It's ready for another round.

sjrd added 2 commits June 7, 2025 21:43
This completes the set of `UnaryOp`s and `BinaryOp`s to directly
manipulate the unsigned representation of integers.

Unlike other operations, such as `Int_unsigned_/`, the unsigned
extension and comparisons have efficient (and convenient)
implementations in user land. It is common for regular code to
directly use the efficient implementation (e.g.,
`x.toLong & 0xffffffffL`) instead of the dedicated library method
(`Integer.toUnsignedLong`).

If we only replaced the body of the library methods with IR nodes,
we would miss improvements in all the other code.

Therefore, in this case, we instead recognize the user-space
patterns in the optimizer, and replace them with the unsigned IR
operations through folding. Moreover, for unsigned comparisons,
we also recognize the patterns in the compiler backend. The purpose
here is mostly to make sure that all these opcodes end up in the
serialized IR, so that we effectively test them along the entire
pipeline.

When targeting JavaScript, the new IR nodes do not actually make
any difference. For `int` operations, the Emitter sort of "undoes"
the folding of the optimizer to implement them. That said, it could
choose an alternative implementation based on `>>> 0`, which we
should investigate in the future. For `Long`s, the subexpressions
of the patterns are expanded into the `RuntimeLong` operations
before folding gets a chance to recognize them (when they have not
been transformed by the compiler backend). That's fine, because
internal folding of the underlying `int` operations will do the
best possible thing anyway.

The size increase is only due to the additional always-reachable
methods in `RuntimeLong`. Those can be removed by standard JS
minifiers.

When targeting Wasm, this allows the emitter to produce the
dedicated Wasm opcodes, which are more likely to be efficient.

To be fair, we could have achieved the same result by recognizing
the patterns in the Wasm emitter instead. The deeper reason to add
those IR operations is for completeness. They were the last
operations from a standard set that were missing in the IR.

Use `x >>> 0` instead of `x ^ 0x80000000` for unsigned comparisons.

Benchmarks show that this is slightly faster. Inspection of the
V8 source code also suggests that it does not treat
`x ^ 0x80000000` specially, but it does recognize `x >>> 0` as
producing an `Unsigned32`, and it generates unsigned comparisons
when both inputs are known to be `Unsigned32`.
@sjrd sjrd force-pushed the ir-unsigned-ops branch from a903431 to 6d179c9 on June 7, 2025 19:44
Contributor

@gzm0 gzm0 left a comment

LGTM. Only one note, but for sure not actionable on this PR.

private object IntFlipSign {
  def unapply(tree: PreTransform): Option[PreTransform] = tree match {
    case PreTransBinaryOp(BinaryOp.Int_^, PreTransLit(IntLiteral(Int.MinValue)), x) =>
      Some(x)
Contributor

Oh, dear: looking at this I got confused as hell: it seems we do not consistently normalize literals to lhs / rhs.

Comparisons: literal to rhs

case (PreTransLit(IntLiteral(_)), _) =>
  foldBinaryOp(flippedOp, rhs, lhs)

Bit ops (example): literal to lhs

case (_, PreTransLit(IntLiteral(_))) => foldBinaryOp(Int_^, rhs, lhs)

So this pattern match is correct, but the overall system is confusing :-/

Member Author

Yes, for almost everything we normalize literals on the lhs, but for comparisons we normalize to the rhs. 🤷‍♂️

@gzm0 gzm0 merged commit 1312c97 into scala-js:main Jun 8, 2025
3 checks passed
@sjrd sjrd deleted the ir-unsigned-ops branch June 8, 2025 06:36