Introduce IR ops for unsigned extension and comparisons. #5186
Conversation
Could you split the first commit into its own PR? I think it really does not relate to this one except for discovery. (Also, we should get that one in fast, to not have broken printing.)
I added a commit that switches to using …
On a high level: I think adding the remaining opcodes makes sense for completeness.
However, I'm concerned about test coverage: IIUC the only way we generate these opcodes is through optimization. I'm not sure that's enough to test that the tail of the pipeline handles them correctly?
What exactly do you refer to with "the tail of the pipeline"? The emitters definitely have to handle them correctly. The only exception is the JS emitter with …

We could introduce (some of) them in javalib methods, to exercise them more.
I mean the emitters, yes. What I was trying to say is that, to me, it is absolutely not obvious that our current test suite covers the relevant code in the emitters. And that is a bit scary :-/

Now that I'm writing this, I realize the comment also applies to the optimizer (and in fact it feels worse): if a given opcode is never in serialized IR, how do we test that the optimizations on that opcode are correct?
That would probably help, yes. IIUC this is quite easy to do with the "replace body infrastructure"?
I don't have a good intuition for whether this is acceptable. On one hand, a lot of the code is quite mechanical and often shared, so it feels like a bug is unlikely. On the other hand, it seems sub-par compared to how we usually test. I'll think about it some more.
Yes, that's very easy. I added a commit with just that for now, to see what it looks like.

Another possibility would be to recognize the unsigned comparison patterns in the compiler backend. That would significantly expand the coverage. (for …)
And I pushed yet another commit where the compiler backend recognizes unsigned comparisons. LMK which variant you think is best. I have a slight preference for the last one, even if it is a bit inelegant.
I agree. Consider adding an AST test to make sure we're actually hitting the expected rewrites.
Alright. Rebased, squashed, cleaned up, added an AST test, and addressed the other earlier comments. It's ready for another round.
This completes the set of `UnaryOp`s and `BinaryOp`s to directly manipulate the unsigned representation of integers.

Unlike other operations, such as `Int_unsigned_/`, the unsigned extension and comparisons have efficient (and convenient) implementations in user land. It is common for regular code to directly use the efficient implementation (e.g., `x.toLong & 0xffffffffL`) instead of the dedicated library method (`Integer.toUnsignedLong`). If we only replaced the body of the library methods with IR nodes, we would miss improvements in all the other code. Therefore, in this case, we instead recognize the user-space patterns in the optimizer, and replace them with the unsigned IR operations through folding.

Moreover, for unsigned comparisons, we also recognize the patterns in the compiler backend. The purpose here is mostly to make sure that all these opcodes end up in the serialized IR, so that we effectively test them along the entire pipeline.

When targeting JavaScript, the new IR nodes do not actually make any difference. For `int` operations, the Emitter sort of "undoes" the folding of the optimizer to implement them. That said, it could choose an alternative implementation based on `>>> 0`, which we should investigate in the future. For `Long`s, the subexpressions of the patterns are expanded into the `RuntimeLong` operations before folding gets a chance to recognize them (when they have not been transformed by the compiler backend). That's fine, because internal folding of the underlying `int` operations will do the best possible thing anyway. The size increase is only due to the additional always-reachable methods in `RuntimeLong`. Those can be removed by standard JS minifiers.

When targeting Wasm, this allows the emitter to produce the dedicated Wasm opcodes, which are more likely to be efficient. To be fair, we could have achieved the same result by recognizing the patterns in the Wasm emitter instead. The deeper reason to add those IR operations is for completeness. They were the last operations from a standard set that were missing in the IR.
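For concreteness, here is the user-land zero-extension pattern the description refers to, next to the dedicated library method (plain Scala for illustration only; the exact shapes the optimizer folds are defined in `OptimizerCore`):

```scala
object UnsignedExtensionPattern {
  // The user-land pattern mentioned above: zero-extend an Int into a Long.
  // Semantically the same as the dedicated library method Integer.toUnsignedLong(x).
  def zeroExtend(x: Int): Long =
    x.toLong & 0xffffffffL

  def main(args: Array[String]): Unit = {
    assert(zeroExtend(-1) == 0xffffffffL)
    assert(zeroExtend(-1) == java.lang.Integer.toUnsignedLong(-1))
    assert(zeroExtend(42) == 42L)
  }
}
```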
Second commit: Use `x >>> 0` instead of `x ^ 0x80000000` for unsigned comparisons.

Benchmarks show that this is slightly faster. Inspection of the source code of V8 also suggests that it does not recognize `x ^ 0x80000000` as doing anything special, but it does recognize `x >>> 0` as emitting an `Unsigned32`, and it does generate unsigned comparisons when both inputs are known to be `Unsigned32`.
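For reference, the two encodings order values identically. A small plain-Scala sanity check of that claim (the JS `x >>> 0` is mirrored here by zero-extending to a `Long`; this is an assumption of the sketch, not emitter code):

```scala
object UnsignedComparisonEncodings {
  // Encoding 1: flip the sign bit of both operands, then compare as signed.
  def viaSignFlip(x: Int, y: Int): Boolean =
    (x ^ 0x80000000) < (y ^ 0x80000000)

  // Encoding 2: zero-extend both operands (what `x >>> 0` achieves in JS)
  // and compare the resulting non-negative values.
  def viaZeroExtend(x: Int, y: Int): Boolean =
    (x.toLong & 0xffffffffL) < (y.toLong & 0xffffffffL)

  def main(args: Array[String]): Unit = {
    val samples = List(0, 1, -1, 42, Int.MinValue, Int.MaxValue)
    for (x <- samples; y <- samples)
      assert(viaSignFlip(x, y) == viaZeroExtend(x, y))
  }
}
```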
LGTM. Only one note, but for sure not actionable on this PR.
private object IntFlipSign {
  def unapply(tree: PreTransform): Option[PreTransform] = tree match {
    case PreTransBinaryOp(BinaryOp.Int_^, PreTransLit(IntLiteral(Int.MinValue)), x) =>
      Some(x)
    case _ =>
      None
  }
}
Oh, dear: Looking at this I got confused as hell: it seems we do not consistently normalize literals to the lhs / rhs.
Comparisons: literal to rhs
scala-js/linker/shared/src/main/scala/org/scalajs/linker/frontend/optimizer/OptimizerCore.scala
Lines 4517 to 4518 in 052d861
case (PreTransLit(IntLiteral(_)), _) =>
  foldBinaryOp(flippedOp, rhs, lhs)
Bit ops (example): literal to lhs
scala-js/linker/shared/src/main/scala/org/scalajs/linker/frontend/optimizer/OptimizerCore.scala
Line 4323 in 052d861
case (_, PreTransLit(IntLiteral(_))) => foldBinaryOp(Int_^, rhs, lhs)
So this pattern match is correct, but the overall system is confusing :-/
Yes, for almost everything we normalize literals on the lhs, but for comparisons we normalize to the rhs. 🤷♂️
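Presumably that lhs convention for bit ops is also why `IntFlipSign` above matches the `Int.MinValue` literal in the first operand position: by the time folding runs, `x ^ 0x80000000` has already been normalized with the literal on the lhs. A trivial value-level illustration of the two conventions (plain Scala, not optimizer code; the normalization itself is purely syntactic):

```scala
object NormalizationIllustration {
  def main(args: Array[String]): Unit = {
    val x = 42
    // Bit ops: the literal is normalized to the lhs (commutative, so the value is unchanged).
    assert((x ^ 0x80000000) == (0x80000000 ^ x))
    // Comparisons: the literal is normalized to the rhs, flipping the operator.
    assert((5 < x) == (x > 5))
  }
}
```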
This completes the set of `UnaryOp`s and `BinaryOp`s to directly manipulate the unsigned representation of integers.

Unlike other operations, such as `Int_unsigned_/`, the unsigned extension and comparisons have efficient implementations in user land. It is common for regular code to directly use the efficient implementation (e.g., `x.toLong & 0xffffffffL`) instead of the dedicated library method (`Integer.toUnsignedLong`). If we only replaced the body of the library methods with IR nodes, we would miss improvements in all the other code. Therefore, in this case, we instead recognize the efficient patterns in the optimizer, and replace them with the unsigned IR operations through folding.

When targeting JavaScript, the new IR nodes do not actually make any difference. For `int` operations, the Emitter sort of "undoes" the folding of the optimizer to implement them. That said, it could choose an alternative implementation based on `>>> 0`, which we should investigate in the future. For `Long`s, the subexpressions of the patterns are expanded into the `RuntimeLong` operations before folding gets a chance to recognize them. That's fine, because internal folding of the underlying `int` operations will do the best possible thing anyway. The size increase is only due to the additional always-reachable methods in `RuntimeLong`. Those can be removed by standard JS minifiers.

When targeting Wasm, this allows the emitter to produce the dedicated Wasm opcodes, which are more likely to be efficient. To be fair, we could have achieved the same result by recognizing the patterns in the Wasm emitter instead. The deeper reason to add those IR operations is for completeness. They were the last operations from a standard set that were missing in the IR.

Second commit: Use `x >>> 0` instead of `x ^ 0x80000000` for unsigned comparisons.

Benchmarks show that this is slightly faster. Inspection of the source code of V8 also suggests that it does not recognize `x ^ 0x80000000` as doing anything special, but it does recognize `x >>> 0` as emitting an `Unsigned32`, and it does generate unsigned comparisons when both inputs are known to be `Unsigned32`.

(this really is the last IR change I have in mind these days :p)