Replace the JS object given to jl.Class by primitive IR operations. #4998
(force-pushed d79a966 to 9aed462)
I was prompted to seriously think about these things because of WebAssembly. Although we could indeed implement the old spec with a JS object, it was awkward. Moreover, benchmarks have already shown that the JS interop in […]. This PR is done on […]
I'm prioritizing #4988 over this one. LMK if I should change that.
One thought I had that you can probably already answer: What prompted the decision to model this as operators rather than special purpose IR trees?
TBH, I did not like the idea of adding 10 different IR nodes for infrequent operations. IR nodes are "expensive" in terms of maintenance. Unary and binary ops were originally grouped together on the rationale that they were all pure, but that was broken when we made integer divisions reliably throw on a zero right-hand side. So I reasoned that what made operators still meaningful as a category is that they share the following properties: […]

Those properties are meaningful for manipulating the trees. For example, unnesting in […]. Under that category, I would actually consider integrating […]. Conversely, […]
Agreed. Probably we should do that to reduce trees (and document the properties on the ops).
Doesn't this also apply to […]?
Ah yes, that's true. It also applies to […]. Exclusions are good if the nodes need to be treated differently in most cases. But otherwise probably not.
I was thinking about this on the side, and I find it annoying that we need to handle […]. I contemplated adding non-nullable […]. However, what if the […]?

If we do introduce non-nullable types in the IR in the future, we could then naturally remove that restriction and only demand a non-null […].

@gzm0 WDYT about this idea? Does it seem acceptable?
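For illustration, the check-not-null semantics discussed here can be sketched as a tiny user-space helper (the name `checkNotNull` is made up for this sketch; the PR models this as an IR operation, not a library method):

```scala
// Hypothetical sketch of the check-not-null semantics discussed above:
// return the argument unchanged when it is non-null, otherwise throw a
// NullPointerException (the NPE is what the IR operation would demand).
def checkNotNull[A <: AnyRef](x: A): A =
  if (x == null) throw new NullPointerException()
  else x
```

So `checkNotNull("abc")` evaluates to `"abc"`, while `checkNotNull(null)` throws.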
(force-pushed f088f31 to 54678b7)
I have implemented the above alternative as an additional commit (second-to-last). If we go for it, it should be squashed with the main commit, since it undoes a number of changes of the main commit.
(force-pushed 54678b7 to 1af41b1)
Rebased.
(force-pushed 1af41b1 to ab45adc)
Rebased.
It is a pity that […]
My gut feeling is that restricting the operands lexically goes a bit far, especially if we cannot maintain the property post optimizer. This will force us to either: […]

I don't think the first one is something we want to commit to. So if we are to commit to the second one, we might just do it? (It would not surprise me if this would lead to various simplifications in the backends, especially w.r.t. […]) Looking […]
Not necessarily. We can also have the optimizer systematically insert an […]
(force-pushed ab45adc to 978442e)
Rebased, including adding changes in the Wasm backend (the rest is unchanged).
Have we considered adding an (opaque) type data primitive type to the IR? The type would be non-nullable and the argument type to the class operations. IIUC this would also allow […]
Ah, that is an interesting take. The typed closures PR shows that it is indeed possible to have IR types that don't have any interop semantics, by not being subtypes of […]
Meh, that's not so great. In addition to introducing more difficulties to handle […]
FYI, I'm getting somewhere with non-nullable types in the IR, but I'm not done yet (the optimizer is not type-preserving yet, so Wasm+optimizer generates invalid Wasm code):
True non-nullable types in the IR available at #5018.
(force-pushed 978442e to 1a49ebd)
So re-working on the […]
Given that it has always been broken, IMO it's best to fix those things after we land either #5031 or this PR. That said, it adds to the list of things both variants of the operation will need to deal with. On the one hand, it gives one more advantage to the single-dimension variant, because there's only one failure mode to deal with at the operation level; on the other hand, since both failure modes throw the same exception, the checks could maybe be reused better in the multi-dimensions variant. 🤷♂️
Looking at #5031 made me wonder whether we should actually try this... What happens if we introduce a […]?

The more I look at this, the more I agree that multi-dimensional array creation should have never made it into the IR. So I'm actually wondering if we should be more aggressive and attempt to throw out […]
In case we go with single dimension in the IR only, I think we can live with the class check being over-performed if that makes the implementation of multi-dimension new array easier (which it likely does).
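As a rough illustration of the single-dimension-primitive approach, here is a hedged user-space sketch (the name `newMultiArray` is invented for this sketch) that builds a multi-dimensional array on top of the one-dimensional `java.lang.reflect.Array.newInstance` alone, using a zero-length allocation to recover the inner array class:

```scala
import java.lang.reflect.{Array => JArray}

// Hedged sketch: implement the multi-dimensional newInstance overload in
// terms of the single-dimension primitive only. The class of the inner
// (n-1)-dimensional arrays is recovered by allocating a zero-length
// instance and taking its getClass.
def newMultiArray(componentType: Class[_], dims: List[Int]): AnyRef = dims match {
  case Nil =>
    throw new IllegalArgumentException("at least one dimension is required")
  case length :: Nil =>
    JArray.newInstance(componentType, length) // the only primitive we rely on
  case length :: rest =>
    val innerClass = newMultiArray(componentType, rest.map(_ => 0)).getClass
    val result = JArray.newInstance(innerClass, length).asInstanceOf[Array[AnyRef]]
    for (i <- 0 until length)
      result(i) = newMultiArray(componentType, rest)
    result
}
```

Note how the component-class validation done by `newInstance` is repeated for every inner allocation, i.e. the class check is over-performed, as accepted above.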
Nits discovered on a re-review.
RE dispatch: I think we should go with the magic `data` field for now to get this moving. We can still decide to change the dispatch if we believe it is an advantage.
RE single / multi dimensions: See my other comment.
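As a toy illustration of the magic `data` field dispatch (all names below are hypothetical; the real field is added at the Emitter level and is invisible to user code):

```scala
// Toy model of the "magic data field" dispatch: jl.Class's constructor
// takes no arguments anymore; the backend stores the $TypeData
// equivalent in a field it controls, and the Class_* primitives read
// through it.
final class TypeData(val name: String, val isInterface: Boolean)

final class JLClass { // stand-in for java.lang.Class
  var data: TypeData = null // the magic field, filled in by the backend
}

// What UnaryOp.Class_name would conceptually compile to:
def classNameOp(c: JLClass): String = c.data.name
```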
```diff
-/** Represents a property of one of the special `ArrayClass`es.
+/** Represents a synthetic property of some Scala classes.
+ *
+ *  A synthetic property not exist as `FieldDef` or `MethodDef` in the IR, but
```
Typo: ... property does not exist
```scala
 * throw an `ArithmeticException` when their right-hand-side is 0. That
 * exception is not subject to undefined behavior.
 *
 * `String_charAt` throws string index out bounds.
```
`StringIndexOutOfBoundsException`?
Huh. I don't think we have precedent for the deserialization hacks to rely on a method in the linker library. The hacks turn previously valid IR into newly valid IR, regardless of the linker. The linker library is an implementation detail of the linker. It is even possible for an alternative linker not to have a linker library at all. I think we could conceivably rewrite multi-dimension […].

Edit: if we do that, then we could also rewrite single-dimension […]
(force-pushed 08695ae to fcbf48d)
Added two commits to be squashed that also rewrite multi-[…]. I kept the single-dim […]
(force-pushed eaf163d to 9f1966d)
I think we should proceed with the variant with the simplified NewArray: IMO the deser hacks do not get much worse, but we get more simplification downstream.
I haven't finished in-detail review yet, but have one comment RE CheckNull.
```scala
} else if (methodSimpleNameString == "newArrayOfThisClass") {
  val newName = MethodIdent(MethodName(methodName.simpleName,
      List(IntRef), ClassRef(ObjectClass)))(method.name.pos)
  val lengthParam = ParamDef(LocalIdent(LocalName("length")),
      NoOriginalName, IntType, mutable = false)
  val newBody = BinaryOp(BinaryOp.Class_newArray, thisJLClass, lengthParam.ref)
  MethodDef(method.flags, newName, method.originalName,
      List(lengthParam), AnyType, Some(newBody))(
      method.optimizerHints.withInline(true), method.version)
```
Would you be very opposed to exposing `CheckNull` as a `UnaryOp`? Then this (and the strange private method in `Class`) would go away.
No, I think it definitely makes sense. There are a few open design questions for that, though:

- If we make it a `UnaryOp`, we cannot determine the `tpe` of a `UnaryOp` from its `op` alone anymore. It now depends on the type of the argument. Should this be grounds to make it its own node instead? Not sure. (Note that `Clone` will have the same issue if we want to turn it into a `UnaryOp`.)
- Instead of a new kind of operation, should it be merged into the behavior of `AsInstanceOf`? If the target type is not nullable (but also not a primitive, because that's an unboxing operation) and the argument's type is nullable, then the behavior would include a null check. That has the useful side effect that we can express an `AsInstanceOf` from non-nullable to non-nullable as well, basically for free. However, it's a more involved change.

Do you have any opinion about these alternatives?
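To make the first point concrete, here is a toy model (all names are hypothetical, not the actual Scala.js IR) of why a check-not-null `UnaryOp` breaks the "result type is a function of the op code alone" property:

```scala
// Toy model: most unary op codes determine their result type by
// themselves, but a check-not-null op's result type is derived from the
// operand's type (the same type, minus nullability).
sealed trait Type
case object IntType extends Type
final case class ClassType(className: String, nullable: Boolean) extends Type

sealed trait UnaryOpCode
case object Array_length extends UnaryOpCode // result type is always IntType
case object CheckNotNull extends UnaryOpCode // result type depends on the argument

def resultType(op: UnaryOpCode, argTpe: Type): Type = op match {
  case Array_length => IntType
  case CheckNotNull =>
    argTpe match {
      case ClassType(name, _) => ClassType(name, nullable = false)
      case other              => other // primitives are already non-nullable
    }
}
```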
> Should this be grounds to make it its own node instead?

I'm not sure. It feels like that change would still be extremely local. All usage sites are going to look at `tpe` anyway if they need the type.

> Instead of a new kind of operation, should it be merged into the behavior of `AsInstanceOf`?

I think conceptually this would be cleaner, yes. I have been fiddling with this and a couple of challenges surface already:

- We need more optimization in the emitter stage: checking for null and checking for a type are very different at the emitter level, so always doing both is wasteful.
- If we naively replace `CheckNotNull(expr)` with `AsInstanceOf(expr, expr.tpe.toNonNullable)`, we get undefined references (at the GCC level) for `$as_jl_Long` and `$as_jl_Throwable`. The former is probably the fault of the optimizer (runtime long substitution is, IIUC, the last thing that prevents us from running IR checking post optimizer). The latter is probably the fault of the emitter (JS throwable replacement).
- Side-effect checking for `AsInstanceOf` under semantics becomes more complicated: we now depend on two semantics (asInstanceOf and NPE). Again, if we do the checking naively (require both semantics to be unchecked), we'll almost certainly regress in code size.

So, all in all, I'm not sure we're better off with the `AsInstanceOf` option (but maybe more testing is required). I think what we should definitely do is allow non-nullable types in `AsInstanceOf` if the target expr is already non-nullable (but IIUC that is a trivial change).
Branch: https://github.com/scala-js/scala-js/compare/main...gzm0:allow-non-null-as-instance?expand=1
I agree with the analysis. It seems merging with `AsInstanceOf` causes more issues than it solves.

I made the unary op version in #5037.
```scala
MethodDef(method.flags, method.name, method.originalName,
    method.args, method.resultType, Some(newBody))(
    method.optimizerHints.withInline(true), method.version)
```
It seems this marks `getSimpleName` and `computeCachedSimpleNameBestEffort` as inline-flagged too (am I correct?). Is that okay? I don't see `@inline` on those methods in Class.scala.
Good catch! Fixed.
(force-pushed 9f1966d to 495917e)
Rebased on top of #5037, being optimistic that we're going to opt one way or another for a version based on a check-not-null operation.
(force-pushed 495917e to a7446dc)
Could you perform the squashes for ease of review?
(force-pushed 6c4b8da to 57834b8)
Squashed :)
In the next commit, we will add `jl.Class.data` as a synthetic property, which we will have to handle in the same way as `ArrayClass` properties. We therefore rename the concept to be more general and include any synthetic property on Scala classes.
(force-pushed 723bd06 to fc8870d)
```scala
case Class_isAssignableFrom =>
  ClassType(ClassClass, nullable = true)
```
Now that we have the ability to insert a `CheckNotNull`, should we also require a non-nullable `jl.Class` as the rhs of `isAssignableFrom`?

It's not as critical as the lhs, because the NPE naturally comes after evaluating both arguments, which is not true for the lhs. But it would probably be more consistent.

WDYT?
This is quite a bit better with that change, IMO, so I did it.
(force-pushed 2fc6fa6 to 2dd91c1)
`java.lang.Class` requires primitive capabilities that only the linker can provide for most of its core methods. This commit changes the fundamental way to provide those. In the process, remove creation of multi-dimensional arrays from the IR, and move it to user-space instead.

---

Previously, the contract was that the linker provided a JS object with specified properties and methods to the constructor of `jl.Class`. For example, it contained a `"name"` property in order to implement `jl.Class.getName()`. We happened to reuse the `$TypeData` instance as that object, for convenience, and to avoid more indirections.

This design grew organically over the years, and was eventually specified in Scala.js IR 1.0.0. However, it always had some shortcomings that I was not happy about.

First, the `getSuperclass()` method had to be special-cased in the Analyzer and Emitter: its very presence requires additional metadata. When it is not reachable, we do not actually provide the supposedly specified member `"getSuperclass"()` on the JS object. That assumes that only `jl.Class.getSuperclass()` actually calls that method. In turn, that assumes it never gets inlined (!), since doing so would make it unreachable, but the feature would still be required.

Second, for optimization purposes, the emitter needs direct access to the `$TypeData` stored inside `jl.Class` (for some of its Transients). There again, that assumed a field with a particular name be present in `jl.Class` in which the JS object would be stored. This implicit requirement was already leaking in the way `jl.Class` was written, in order to prevent scalac from renaming the field.

---

In this commit, we completely change the approach, and therefore the IR spec. We remove the JS object passed to `jl.Class`; its constructor now takes no arguments. Instead, we add primitives in the IR, in the form of new `UnaryOp`s and `BinaryOp`s.

These new primitives operate on an argument of type `jl.Class!`, and possibly a second argument. For example, `UnaryOp.Class_name` replaces the functionality previously offered by the `"name"` field. `BinaryOp.Class_superClass` replaces `"getSuperclass"()`.

It is up to the backend to ensure it can, somehow, implement those operations given an instance of `jl.Class`. We choose to add a magic field to `jl.Class` at the Emitter level to store the corresponding instance of `$TypeData`, and implement the operations in terms of that.

The approach has several benefits:

* It solves the two fundamental issues of the JS object, mentioned above.
* It does not require "public" (unminifiable) property names for the features; since there is no JS object spec, we can consistently minify those members.
* Several methods that were given an *intrinsic* treatment by the optimizer now fall, more naturally, under regular folding of operations.

---

Since we are changing the spec anyway, we use the opportunity to switch what is primitive about `Array.newInstance`. Previously, the overload with multiple dimensions was primitive, and the one with a single dimension delegated to it. This was a waste, since usages of the multi-dimensional overload are basically non-existent otherwise. Similarly, the `NewArray` node was designed for multiple dimensions, and the single dimension was a special case.

We now make the one-dimensional operations the only primitives. We implement the multi-dimensional overload of `newInstance` in terms of the other one. We also use that overload as a replacement for multi-dimensional `NewArray` nodes. This simplifies the treatment in the linker, and produces shorter and more efficient code at the end of the day.

---

The change of approach comes with a non-negligible cost for backward compatibility. It is entirely done using deserialization hacks. The most glaring issue is the synthesization of the multi-dimensional overload of `Array.newInstance`.

In this commit, we do not change the library/compiler yet, so that we can exercise the deserialization hacks.

---

We use the same technique as for `String_length` and `String_charAt`, for which we hijack the method body from the compiler and replace it with the appropriate operation. Likewise, we actually change the `NewArray` data type to contain a single length. We only enable the deserialization hacks of the previous commit when reading old IR.
(force-pushed 37d97e3 to cafe731)
Just some questions regarding the deser hack.
```scala
outermostComponentType.ref,
GetClass(Apply(EAF, This()(ClassType(ReflectArrayModClass, nullable = false)),
    MethodIdent(newInstanceSingleName),
    List(outermostComponentType.ref, IntLiteral(0)))(AnyType))
```
Is it deliberate that you don't use `BinaryOp.Class_newArray` here?
```scala
val length = varDef("length", IntType, ArraySelect(dimensions.ref, offset.ref)(IntType))
val result = varDef("result", AnyType,
    Apply(EAF, ths, MethodIdent(newInstanceSingleName), List(componentType.ref, length.ref))(AnyType))
```
Is it deliberate that you don't use `Class_newArray` here?
Yes, it is deliberate. Here and for the other 2 comments, the idea is to best mimic the user-space implementation that we later implement in the source code. In other words, not to "cheat" with the deser hack by emitting code that we won't actually have when we don't need the deser hack anymore.
```scala
val result2 = varDef("result2", ArrayType(objectArrayTypeRef, nullable = true),
    AsInstanceOf(result.ref, ArrayType(objectArrayTypeRef, nullable = true)))
val innerComponentType = varDef("innerComponentType", jlClassType,
    Apply(EAF, componentType.ref, MethodIdent(getComponentTypeName), Nil)(jlClassType))
```
Also here: Is it deliberate that you don't use the primitive unary op?
The second commit changes the compiler and library and disables the deserialization hacks when reading IR v1.17+.