Skip to content

Replace the JS object given to jl.Class by primitive IR operations. #4998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 20, 2024

Conversation

sjrd
Copy link
Member

@sjrd sjrd commented Jun 14, 2024

java.lang.Class requires primitive capabilities that only the linker can provide for most of its core methods. This commit changes the fundamental way to provide those.

In the process, remove creation of multi-dimensional arrays from
the IR, and move it to user-space instead.


Previously, the contract was that the linker provided a JS object with specified properties and methods to the constructor of jl.Class. For example, it contained a "name" property in order to implement jl.Class.getName(). We happened to reuse the $TypeData instance as that object, for convenience, and to avoid more indirections.

This design grew organically over the years, and was eventually specified in Scala.js IR 1.0.0. However, it always had some shortcomings that I was not happy about.

First, the getSuperclass() method had to be special-cased in the Analyzer and Emitter: its very presence requires additional metadata. When it is not reachable, we do not actually provide the supposedly specified member "getSuperclass"() on the JS object. That assumes that only jl.Class.getSuperclass() actually calls that method. In turn, that assumes it never gets inlined (!), since doing so would make it unreachable, but the feature would still be required.

Second, for optimization purposes, the emitter needs direct access to the $TypeData stored inside jl.Class (for some of its Transients). There again, that assumed a field with a particular name be present in jl.Class in which the JS object would be stored. This implicit requirement was already leaking in the way jl.Class was written, in order to prevent scalac from renaming the field.


In this commit, we completely change the approach, and therefore the IR spec. We remove the JS object passed to jl.Class; its constructor now takes no arguments. Instead, we add primitives in the IR, in the form of new UnaryOps and BinaryOps. These new primitives operate on an argument of type jl.Class!, and possibly a second argument.

For example, UnaryOp.Class_name replaces the functionality previously offered by the "name" field. BinaryOp.Class_superClass replaces "getSuperclass"().

It is up to the backend to ensure it can, somehow, implement those operations given an instance of jl.Class. We choose to add a magic field to jl.Class at the Emitter level to store the corresponding instance of $TypeData, and implement the operations in terms of that.

The approach has several benefits:

  • It solves the two fundamental issues of the JS object, mentioned above.
  • It does not require "public" (unminifiable) property names for the features; since there is no JS object spec, we can consistently minify those members.
  • Several methods that were given an intrinsic treatment by the optimizer now fall, more naturally, under regular folding of operations.

Since we are changing the spec anyway, we use the opportunity to switch what is primitive about Array.newInstance. Previously, the overload with multiple dimensions was primitive, and the one with a single dimension delegated to it. This was a waste, since usages of the multi-dimensional overload are basically non-existent otherwise. Similarly, the NewArray node was designed for multiple dimensions, and the single dimension was a special case.

We now make the one-dimensional operations the only primitives. We implement the multi-dimensional overload of newInstance in terms of the other one. We also use that overload as a replacement for multi-dimensional NewArray nodes. This simplifies the treatment in the linker, and produces shorter and more efficient code at the end of the day.


The change of approach comes with a non-negligible cost for backward compatibility. It is entirely done using deserialization hacks. The most glaring issue is the synthesization of the multi-dimensional overload of Array.newInstance.

In the first big commit, we do not change the library/compiler yet, so that we can exercise the deserialization hacks.

The second commit changes the compiler and library and disable the deserialization hacks when reading IR v1.17+.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from d79a966 to 9aed462 Compare June 14, 2024 12:07
@sjrd sjrd marked this pull request as ready for review June 14, 2024 12:18
@sjrd sjrd requested a review from gzm0 June 14, 2024 12:18
@sjrd
Copy link
Member Author

sjrd commented Jun 14, 2024

I was prompted to seriously think about these things because of WebAssembly. Although we could indeed implement the old spec with a JS object, it was awkward. Moreover, benchmarks have already shown that the JS interop in isArray and isAssignableFrom was really bad if used in a tight loop. That's because Scala's Array.copy uses those two methods.

This PR is done on main, though, as I believe we should evaluate the merits of this change independently of Wasm. The impact on Wasm can be seen in this commit:
sjrd@cd627e1

@sjrd sjrd changed the title CI ONLY Replace the JS object given to jl.Class by primitive IR operations. Replace the JS object given to jl.Class by primitive IR operations. Jun 14, 2024
@gzm0
Copy link
Contributor

gzm0 commented Jun 14, 2024

I'm prioritizing #4988 over this one. LMK if I should change that.

@gzm0
Copy link
Contributor

gzm0 commented Jun 19, 2024

One thought I had that you can probably already answer: What prompted the decision to model this as operators rather than special purpose IR trees?

@sjrd
Copy link
Member Author

sjrd commented Jun 19, 2024

TBH, I did not like the idea to add 10 different IR nodes for unfrequent operations. IR nodes are "expensive" in terms of maintenance.

Unary and binary ops originally had the rationale that they were all together because they were pure, but that was broken when we made integer divisions reliably throw on / 0. Additionally, String_+ is not really pure, since we have no guarantee that the toString() methods are pure. String_charAt even introduced semantics-dependent UB handling, so the fact that we now have NPEs in ops is not truly new.

So I reasoned that what made operators still meaningful as a category is that they share the following properties:

  • They have a fixed number of arguments
  • They first evaluate their arguments, from left to right, irrespective of the operator
  • Then they compute a result based on the value of the arguments, and not on scope or control-flow or things like that

Those properties are meaningful for manipulating the trees. For example, unnesting in FunctionEmitter is handled for all operators in a consistent way thanks to those properties.

Under that category, I would actually consider integrating GetClass, Clone and IdentityHashCode as unary ops.

Conversely, String_+ might deserve its own IR node, because it is the only one whose result may also depend on the state that is accessible from the operand values (all the other ones are stateless).

@gzm0
Copy link
Contributor

gzm0 commented Jun 22, 2024

Those properties are meaningful for manipulating the trees. For example, unnesting in FunctionEmitter is handled for all operators in a consistent way thanks to those properties.

Under that category, I would actually consider integrating GetClass, Clone and IdentityHashCode as unary ops.

Agreed. Probably we should do that to reduce trees (and document the properties on the ops).

Conversely, String_+ might deserve its own IR node, because it is the only one whose result may also depend on the state that is accessible from the operand values (all the other ones are stateless).

Doesn't this also apply to Clone? But TBH, I'm not sure that this is a meaningful criteria for exclusion. Ops can be side-effecting, and it feels that distinguishing this specific type of side-effect is kind of odd 🤷‍♂️

@sjrd
Copy link
Member Author

sjrd commented Jun 22, 2024

Ah yes, that's true. It also applies to Clone. I had not thought about that.

Exclusions are good if the nodes need to be treated differently in most cases. But otherwise probably not.

@sjrd
Copy link
Member Author

sjrd commented Jun 26, 2024

I was thinking about this on the side, and I find it annoying that we need to handle nulls. In theory, since the Class_x operators can be used with arbitrary operands, and we don't have non-nullable ClassTypes, the optimizer and backends must be ready to handle nulls. In practice, however, the only occurrences of these operators uses This() as the (lhs) Class value, which is statically known to be non-null. That makes the optimizer and backends more complicated than necessary, as accidental complexity.

I contemplated adding non-nullable ClassTypes in the IR, but that quickly looked daunting at this time.

However, what if the ClassDefChecker only allowed the (lhs) operand to Class_x operators to lexically be a This() node? After the optimizer, that would be relaxed (conditioned on the allowTransients flag, probably). This way, even without non-nullable types in the IR, we have a static guarantee that the operand to Class_x is non-null, and that means we don't need to specify and implement what happens when it's null.

If we do introduce non-nullable types in the IR in the future, we could then naturally remove that restriction and only demand a non-null jl.Class for the operand.

@gzm0 WDYT about this idea? Does it seem acceptable?

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from f088f31 to 54678b7 Compare June 28, 2024 14:56
@sjrd
Copy link
Member Author

sjrd commented Jun 28, 2024

I have implemented the above alternative as an additional commit (second-to-last). If we go for it, it should be squashed with the main commit, since it undoes a number of changes of the main commit.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 54678b7 to 1af41b1 Compare July 1, 2024 08:34
@sjrd
Copy link
Member Author

sjrd commented Jul 1, 2024

Rebased.

@sjrd
Copy link
Member Author

sjrd commented Jul 28, 2024

Rebased.

@gzm0
Copy link
Contributor

gzm0 commented Aug 10, 2024

It is a pity that jl.Class has actual implementations. Otherwise it feels like it might make sense to model values of jl.Class as an implementation defined primitive type.

  • This would resolve the nullability issue.
  • jl.Class would become a hijacked class.
  • We wouldn't have to do the field / constructor patching.

My gut feeling is that restricting the operands lexically goes a bit far, especially if we cannot maintain the property post optimizer. This will force us to either:

  • Never run the IR checker post optimizer, or
  • Implement non-nullable types in the IR.

I don't think the first one is something we want to commit to. So if we are to commit to the second one, we might just do it? (It would not surprise me if this would lead to various simplifications in the backends, especially w.r.t. AssumeNotNull).

Looking ir.Types: things we'd need for non-nullable IR types:

  • Add a nullable: Boolean field to ClassType and ArrayType
  • Adjust Types#isSubtype
  • Disallow zeros of non-nullable class types / array types
  • Per consequence disallow fields of non-nullable class types / array types
  • A full review of ir.Trees for the types they generate (other stuff can be improved as we go IMO).

@sjrd
Copy link
Member Author

sjrd commented Aug 10, 2024

My gut feeling is that restricting the operands lexically goes a bit far, especially if we cannot maintain the property post optimizer. This will force us to either:

* Never run the IR checker post optimizer, or

* Implement non-nullable types in the IR.

Not necessarily. We can also have the optimizer systematically insert an AssumeNotNull there, and the IR checker would allow both a This or an AssumeNotNull. If run after the optimizer, the optimizer would have to know about Transients anyway.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from ab45adc to 978442e Compare August 10, 2024 22:24
@sjrd
Copy link
Member Author

sjrd commented Aug 10, 2024

Rebased, including adding changes in the Wasm backend (the rest is unchanged).

@gzm0
Copy link
Contributor

gzm0 commented Aug 11, 2024

Have we considered adding an (opaque) type data primitive type to the IR?

The type would be non-nullable and the argument type to the class operations. IIUC this would also allow jl.Class to become more normal again. (IIUC the only issue we'd have to solve is the zero of the data field, but maybe we can hack to allow null there).

@sjrd
Copy link
Member Author

sjrd commented Aug 11, 2024

Ah, that is an interesting take. The typed closures PR shows that it is indeed possible to have IR types that don't have any interop semantics, by not being subtypes of any. I'll try that.

@sjrd
Copy link
Member Author

sjrd commented Aug 11, 2024

Meh, that's not so great. In addition to introducing more difficulties to handle nulls, it reintroduces the problem that the linker must know about the data field of jl.Class. For example to constant-fold classOf[C].getName(), which becomes name(classOf[C].data), we must be able to make the connection that .data is necessarily the TypeData value corresponding to C. So, not an improvement IMO.

@sjrd
Copy link
Member Author

sjrd commented Aug 21, 2024

FYI, I'm getting somewhere with non-nullable types in the IR, but I'm not done yet (the optimizer is not type-preserving yet, so Wasm+optimizer generates invalid Wasm code):
sjrd/scala-js@optimizer-with-webassembly...sjrd:scala-js:nullable-types-in-ir

@sjrd
Copy link
Member Author

sjrd commented Aug 22, 2024

True non-nullable types in the IR available at #5018.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 978442e to 1a49ebd Compare August 25, 2024 10:49
@sjrd
Copy link
Member Author

sjrd commented Sep 2, 2024

So re-working on the newArray implementation made me realize that we've always been ignoring the IllegalArgumentExceptions that are supposed to be thrown in the following conditions:

  • For both overloads: if the cls is in fact classOf[Unit] (ironically there was a79ce3f which prevented the optimizer from messing it up, but the run-time code behind is still broken)
  • For the multi-dimensions only: when the array is empty (rather than null)

Given that it has always been broken, IMO it's best to fix those things after we land either #5031 or this PR. That said, it adds to the list of things both variants of the operation will need to deal with. On the one hand, it gives one more advantage to the single-dimension variant because there's only one failure mode to deal at the operation level; on the other hand, since both failure modes throw the same exception, the checks could maybe be reused better in the multi-dimensions variant. 🤷‍♂️

Specs here

@gzm0
Copy link
Contributor

gzm0 commented Sep 9, 2024

IMO its highly unlikely that we'll unroll multi dimensional NewArrays statically;

Looking at #5031 made me wonder whether we should actually try this...

What happens if we introduce a newArray(tpe: Class[_], dimensions: Array[Int]) helper function for the deserializer in the linker private library and use it both for the deserialization hack proposed here and for NewArray with more than one dimension?

The more I look at this, the more I agree that multi-dimensional array creation should have never made it into the IR. So I'm actually wondering if we should be more aggressive and attempt to throw out newArrayObject in both core libs.

@gzm0
Copy link
Contributor

gzm0 commented Sep 9, 2024

the checks could maybe be reused better in the multi-dimensions variant. 🤷‍♂️

In case we go with single dimension in IR only, I think we can live with the class check being over-performed if that makes the implementation of multi dimension new array easier (which it likely does).

Copy link
Contributor

@gzm0 gzm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits discovered on a re-review.

RE dispatch: I think we should go with the magic data field for now to get this moving. We can still decide to change the dispatch if we believe it is an advantage.

RE single / multi dimensions: See my other comment.

/** Represents a property of one of the special `ArrayClass`es.
/** Represents a synthetic property of some Scala classes.
*
* A synthetic property not exist as `FieldDef` or `MethodDef` in the IR, but
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: ... property does not exist

* throw an `ArithmeticException` when their right-hand-side is 0. That
* exception is not subject to undefined behavior.
*
* `String_charAt` throws string index out bounds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringIndexOutOfBoundsException?

@sjrd
Copy link
Member Author

sjrd commented Sep 9, 2024

What happens if we introduce a newArray(tpe: Class[_], dimensions: Array[Int]) helper function for the deserializer in the linker private library and use it both for the deserialization hack proposed here and for NewArray with more than one dimension?

Huh. I don't think we have precedent for the deserialization hacks to rely on a method in the linker library. The hacks turn previously valid IR into newly valid IR, regardless of the linker. The linker library is an implementation detail of the linker. It is even possible for an alternative linker not to have a linker library at all.

I think we could conceivably rewrite multi-dimension NewArray into a call to j.l.r.Array.newInstance, because that one is at least publicly specified.

Edit: if we do that, then we could also rewrite single-dimension NewArray into a Class_newArray, and make sure the backends do something smart when the first argument is a literal.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 08695ae to fcbf48d Compare September 12, 2024 16:10
@sjrd sjrd requested a review from gzm0 September 12, 2024 16:13
@sjrd
Copy link
Member Author

sjrd commented Sep 12, 2024

Added two commits to be squashed that also rewrite multi-NewArray into calls to jlr.Array.newInstance.

I kept the single-dim NewArray after all, because its tpe is better than the result of Class_newArray. A number of optimizations would regress if we rewrote to Class_newArray.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from eaf163d to 9f1966d Compare September 13, 2024 07:07
Copy link
Contributor

@gzm0 gzm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should proceed with the variant with the simplified NewArray: IMO the deser hacks do not get much worse, but we get more simplification downstream.

I haven't finished in-detail review yet, but have one comment RE CheckNull.

Comment on lines 1520 to 1528
} else if (methodSimpleNameString == "newArrayOfThisClass") {
val newName = MethodIdent(MethodName(methodName.simpleName,
List(IntRef), ClassRef(ObjectClass)))(method.name.pos)
val lengthParam = ParamDef(LocalIdent(LocalName("length")),
NoOriginalName, IntType, mutable = false)
val newBody = BinaryOp(BinaryOp.Class_newArray, thisJLClass, lengthParam.ref)
MethodDef(method.flags, newName, method.originalName,
List(lengthParam), AnyType, Some(newBody))(
method.optimizerHints.withInline(true), method.version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be very opposed to expose CheckNull as a UnaryOp? Then this (and the strange private method in Class) would go away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think it definitely makes sense. There's a bit of open design questions for that, though:

  • If we make it a UnaryOp, we cannot determine the tpe of a UnaryOp's op alone anymore. It now depends on the type of the argument. Should this be grounds to make it its own node instead? Not sure. (Note that Clone will have the same issue if we want to turn it into a UnaryOp.)
  • Instead of a new kind of operation, should it be merged into the behavior of AsInstanceOf? If the target type is not nullable (but also not a primitive because that's an unboxing operation) and the argument's type is nullable, then the behavior would include a null check. That has the useful side effect that we can express an AsInstanceOf from non-nullable to non-nullable as well, basically for free. However it's a more involved change.

Do you have any opinion about these alternatives?

Copy link
Contributor

@gzm0 gzm0 Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be grounds to make it its own node instead?

I'm not sure. It feels like that change would still be extremely local. All usage sites are going to look at tpe anyways if they need the type.

Instead of a new kind of operation, should it be merged into the behavior of AsInstanceOf?

I think conceptually this would be cleaner, yes. I have been fiddling with this and a couple of challenges surface already:

  • We need more optimization in the emitter stage: Checking for null and for a type are very different at the emitter level. So always doing both is wasteful.
  • If we naively replace CheckNotNull(expr) with AsInstanceOf(expr, expr.tpe.toNonNullable) we get undefined references (at the GCC level) for $as_jl_Long and $as_jl_Throwable. The former is probably the fault of the optimizer (runtime long substitution is the last thing IIUC that prevents us from running IR checking post optimizer). The latter is probably the fault of the emitter (JS throwable replacement).
  • Side-effect checking for AsInstanceOf under semantics becomes more complicated: We now depend on two semantics (asInstanceOf and NPE). Again, if we do the checking naively (require both semantics to be unchecked), we'll almost certainly regress in code size.

So all-in-all, I'm not sure we're better of with the AsInstanceOf option (but maybe more testing is required). I think what we should definitely do is allow non-nullable types in AsInstanceOf if the target expr is already non-nullable (but IIUC that is a trivial change).

Branch: https://github.com/scala-js/scala-js/compare/main...gzm0:allow-non-null-as-instance?expand=1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the analysis. It seems merging with AsInstanceOf causes more issues than it solves.

I made the unary op version in #5037.


MethodDef(method.flags, method.name, method.originalName,
method.args, method.resultType, Some(newBody))(
method.optimizerHints.withInline(true), method.version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems marking getSimpleName and computeCachedSimpleNameBestEffort inline flagged too (am I correct?), is that okay? I don't see those methods@inline in Class.scala.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 9f1966d to 495917e Compare September 17, 2024 10:06
@sjrd
Copy link
Member Author

sjrd commented Sep 17, 2024

Rebased on top of #5037, being optimistic that we're going to opt one way or another for a version based on a check-not-null operation.

@sjrd sjrd requested a review from gzm0 September 17, 2024 10:07
@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 495917e to a7446dc Compare September 18, 2024 08:44
@gzm0
Copy link
Contributor

gzm0 commented Sep 18, 2024

Could you perform the squashes for ease of review?

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 6c4b8da to 57834b8 Compare September 18, 2024 11:02
@sjrd
Copy link
Member Author

sjrd commented Sep 18, 2024

Squashed :)

In the next commit, we will add `jl.Class.data` as a synthetic
property, which we will have to handle in the same way as
`ArrayClass` properties. We therefore rename the concept to be more
general and include any synthetic property on Scala classes.
@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 723bd06 to fc8870d Compare September 18, 2024 13:30
Comment on lines 486 to 487
case Class_isAssignableFrom =>
ClassType(ClassClass, nullable = true)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have the ability to insert a CheckNotNull, should we also require a non-nullable jl.Class as the rhs of isAssignableFrom?

It's not as critical as the lhs, because the NPE naturally comes after evaluating both arguments, which is not true for the lhs. But it would probably be more consistent.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a bit better with that change, IMO, so I did it.

@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 2fc6fa6 to 2dd91c1 Compare September 19, 2024 12:40
`java.lang.Class` requires primitive capabilities that only the
linker can provide for most of its core methods. This commit changes
the fundamental way to provide those.

In the process, remove creation of multi-dimensional arrays from
the IR, and move it to user-space instead.

---

Previously, the contract was that the linker provided a JS object
with specified properties and methods to the constructor of
`jl.Class`. For example, it contained a `"name"` property in order
to implement `jl.Class.getName()`. We happened to reuse the
`$TypeData` instance as that object, for convenience, and to avoid
more indirections.

This design grew organically over the years, and was eventually
specified in Scala.js IR 1.0.0. However, it always had some
shortcomings that I was not happy about.

First, the `getSuperclass()` method had to be special-cased in the
Analyzer and Emitter: its very presence requires additional metadata.
When it is not reachable, we do not actually provide the supposedly
specified member `"getSuperclass"()` on the JS object. That assumes
that only `jl.Class.getSuperclass()` actually calls that method.
In turn, that assumes it never gets inlined (!), since doing so
would make it unreachable, but the feature would still be required.

Second, for optimization purposes, the emitter needs direct access
to the `$TypeData` stored inside `jl.Class` (for some of its
Transients). There again, that assumed a field with a particular
name be present in `jl.Class` in which the JS object would be
stored. This implicit requirement was already leaking in the way
`jl.Class` was written, in order to prevent scalac from renaming
the field.

---

In this commit, we completely change the approach, and therefore
the IR spec. We remove the JS object passed to `jl.Class`; its
constructor now takes no arguments. Instead, we add primitives in
the IR, in the form of new `UnaryOp`s and `BinaryOp`s. These new
primitives operate on an argument of type `jl.Class!`, and possibly
a second argument.

For example, `UnaryOp.Class_name` replaces the functionality
previously offered by the `"name"` field. `BinaryOp.Class_superClass`
replaces `"getSuperclass"()`.

It is up to the backend to ensure it can, somehow, implement those
operations given an instance of `jl.Class`. We choose to add a
magic field to `jl.Class` at the Emitter level to store the
corresponding instance of `$TypeData`, and implement the operations
in terms of that.

The approach has several benefits:

* It solves the two fundamental issues of the JS object, mentioned
  above.
* It does not require "public" (unminifiable) property names for
  the features; since there is no JS object spec, we can consistently
  minify those members.
* Several methods that were given an *intrinsic* treatment by the
  optimizer now fall, more naturally, under regular folding of
  operations.

---

Since we are changing the spec anyway, we use the opportunity to
switch what is primitive about `Array.newInstance`. Previously, the
overload with multiple dimensions was primitive, and the one with a
single dimension delegated to it. This was a waste, since usages of
the multi-dimensional overload are basically non-existent otherwise.
Similarly, the `NewArray` node was designed for multiple dimensions,
and the single dimension was a special case.

We now make the one-dimensional operations the only primitives.
We implement the multi-dimensional overload of `newInstance` in
terms of the other one. We also use that overload as a replacement
for multi-dimensional `NewArray` nodes. This simplifies the treatment
in the linker, and produces shorter and more efficient code at the
end of the day.

---

The change of approach comes with a non-negligible cost for backward
compatibility. It is entirely done using deserialization hacks. The
most glaring issue is the synthesization of the multi-dimensional
overload of `Array.newInstance`.

In this commit, we do not change the library/compiler yet, so that
we can exercise the deserialization hacks.
We use the same technique as for `String_length` and `String_charAt`,
for which we hijack the method body from the compiler and replace
it with the appropriate operation.

Likewise, we actually change the `NewArray` data type to contain a
single length.

We only enable the deserialization hacks of the previous commit when
reading old IR.
@sjrd sjrd force-pushed the primitive-jl-class-operations branch from 37d97e3 to cafe731 Compare September 19, 2024 15:43
Copy link
Contributor

@gzm0 gzm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions regarding the deser hack.

outermostComponentType.ref,
GetClass(Apply(EAF, This()(ClassType(ReflectArrayModClass, nullable = false)),
MethodIdent(newInstanceSingleName),
List(outermostComponentType.ref, IntLiteral(0)))(AnyType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it deliberate that you don't use BinaryOp.Class_newArray here?


val length = varDef("length", IntType, ArraySelect(dimensions.ref, offset.ref)(IntType))
val result = varDef("result", AnyType,
Apply(EAF, ths, MethodIdent(newInstanceSingleName), List(componentType.ref, length.ref))(AnyType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it deliberate that you don't use Class_newArray here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is deliberate. Here and for the other 2 comments, the idea is to best mimic the user-space implementation that we later implement in the source code. In other words, not to "cheat" with the deser hack by emitting code that we won't actually have when we don't need the deser hack anymore.

val result2 = varDef("result2", ArrayType(objectArrayTypeRef, nullable = true),
AsInstanceOf(result.ref, ArrayType(objectArrayTypeRef, nullable = true)))
val innerComponentType = varDef("innerComponentType", jlClassType,
Apply(EAF, componentType.ref, MethodIdent(getComponentTypeName), Nil)(jlClassType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here: Is it deliberate that you don't use the primitive unary op?

@sjrd sjrd merged commit 48b179d into scala-js:main Sep 20, 2024
3 checks passed
@sjrd sjrd deleted the primitive-jl-class-operations branch September 20, 2024 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants