Foundations of generics support #1290

nevkontakte · 2024-04-13T20:55:14Z

As discussed in the issue, merging foundations of the generics support into master to keep it in sync with Go 1.20 work and general development.

The general approach taken in the PR is closer to monomorphization: each generic function or type is translated multiple types, once for each combination of type arguments discovered in the program. The main advantages of this approach are:

Performance of generic code is equivalent to non-generic.
No significant changes to prelude - programs that don't use generics pay no output size penalty.
Better maintenance story: there is no divergence between handling of generic and non-generic code, and no need in duplicating parts of compiler logic in the runtime (hence, bigger prelude).

This differs from the original proposal I made, which was partially implemented in the generics branch. The reason that proposal was abandoned are explained in this comment.

Currently generics implementation is NOT production-ready and hidden behind the GOPHERJS_EXPERIMENT=generics flag. Namely, some of the major missing features:

Generic instantiation discovery doesn't work across package boundary. This means that you can't use generic types or functions defined in a different package, which is a very common practical scenario. Changes to the build system are required to support that.
Blocking function analysis hasn't been updated to be generics-aware. The main effect is that a lot of generic calls are considered blocking when they don't need to be, leading to a less efficient code.
Dead code elimination doesn't work on individual instantiations. Namely, if a single instance of a generic object is considered alive, all other instantiations of the same object would be considered alive as well.

These issues will be addressed in the upcoming RPs.

Updates #1013.

This commit updates the run.go to do correct flag parsing and lists all failing test cases as known failures.

The library scans type-checked Go AST and incrementally discovers instantiations of generic types and functions in it. Each instantiation is recorded with the origin object and a set of type arguments. The scan is seeded from two sources: - Any Instances that have been added to the working InstanceSet externally (for example, from references by other packages). - Any instances that are discovered within the non-generic code in the package itself. The discovery proceeds to search for instances in the generic code by traversing its AST for each set of previously discovered type arguments. In that scan, whenever a type is defined in terms of a type param, we map it onto a concrete type given the current set of type arguments. For now, the library is not fully ready to handle generic instances that come across the package boundaries, that will come later. The type mapping logic turns out to be fairly complex, so instead of reimplementing it I decided to cheat and "borrow" it from the go/types package via a go:linkname directive. I didn't want to vendor it because I expect it would be changing fairly regularly in future. We shall see whether this was a wise choice. This commit also includes a couple of supporting refactorings: - Moved the symbols logic used by go:linkname to a separate package, since it turned out to be useful for generics as well. - Added a testingx helper library with a generic `Must(t)` helper to make trivial error handling in tests less verbose.

When translating a function with type params, we translate it for each discovered combination of type arguments. Like regular functions, generic ones are assigned a variable name, but instead of representing the function directly, it is mapping from instance keys to type-specific translation of the function. The key is a string that is derived from type arguments and is guaranteed to be different for different set of type arguments. To make this possible, I've make following changes: - For functions we now emit two decls, one that reserves the variable name, and one per instantiation that generates function body. This is necessary to make sure variable containing generic functions is initialized once. - Decl struct now has a new RefExpr field that contains a JS expression referencing the object the decl represents. This is necessary because the variable name that was previously used for linkname purposes is now stored in a different decl as explained above. - Mapping from object names to variable names is now stored in funcContext. This is necessary since each generic function is translated several times and we need to create a new local variable for objects within it each time. - Added handling of ast.IndexExpr and ast.IndexListExpr in the compiler for the purposes of specifying type parameters. At this stage, the implementation is incomplete in several major ways: - Generic types are not supported. - Generic instantiation discovery is only possible within a single package. Detecting instantiations from other packages will require changes to the build package. - When translating generic functions we do not yet apply type parameter to type argument mapping. - DCE does not distinguish between different instantiations of the same object and doesn't properly handle variable reservation decls. All those issues will be addressed in later commits.

Instead of using type information directly from types.Info.TypeOf, we now use a funcContext.typeOf, which passes the resulting type through typeparams.Resolver. If we are translating a parameterized code, it will perform type parameter substitution according to the type arguments of the instance we are currently translating. This allows us to generate a type-specific code for each function instance, making sure correct operations are used in each case.

Initially I thought it would be trivial to derive them from the main type instances, but it led to more duplicated code than necessary, so now they will be discovered and processed explicitly.

Similar to standalone functions, we generate a set of initialization code for each combination of type parameters discovered. Code for metadata initialization is generated with type argument substitution active for the given instantiation. Methods are translated essentially the same as standalone functions, except that they are assigned to the receiver type object instead of a standalone variable.

Type expressions for generic types contain an index expressions, which was previously impossible. With this change, we correctly recognize and check them.

Previously passing any type other than int would cause a type assertion failure and a panic. This bug could be triggered when evaluating a result of len() built-in for array types. However, it remained hidden in non-generic code, since the expression was determined to be constant, short-circuiting translation. Apparently, in generic code go/types can't prove the constant value, thus revealing the bug.

Prior to this change, while scanning for seed generic instances we would not traverser generic function (or method) bodies, which lead to generic types declared inside them not being added to objMap. Later, when an instance of such type was discovered, we had no matching AST node to process for further instantiations. With this change, we do traverse generic function bodies, but only to populate objMap. We ignore potential type instantiations in this case because they would be defined in terms of a type param, which are not interesting to us.

When calling a method on a type param or a type defined using a type param we replace the original selection from go/types.Info.Selections with a synthetic counterpart for the instantiated receiver type. Unfortunately, we can't construct instances of types.Selection due to its fields being private, so we create an interface-compatible implementation instead.

Prior to generics an inline type was always first encountered by the compiler at its decl statement, where it reserved the variable name for it. With generics, it may be encountered before that when translating one of the instantiations of a generic type. With this change, variable name allocation will happen correctly regardless of where it would be encountered first.

Such variables are stored in a single-element arrays, and the pointer object is stored in the $ptr property of such an array. The earlier refactoring of object name assignment made the code generation to emit `varname[0].$ptr` instead of `varname.$ptr`, which broke it for primitive types. This change restores the original behavior.

Prior to this change the discovered instance would have referred to the *types.Var object representing the implicitly declared struct field. With this change, we extract the *types.TypeName that represents the instantiated type.

Since we can now traverse the same AST subtree multiple types while processing different generic instances, type declaration statements may be visited more than once as well. The collected type list is then used to emit JS for type definitions, so duplicate *types.TypeName entries led to emitting duplicate code. Using set semantics prevents that. Note that we use ordered set to make sure code generation remains deterministic.

Previously used symbol.Name produces a collision when types with the same name declared in two different scopes, for example: ```go func f() { type E struct{} var e E _ = e } func g() { type E struct{} var e E _ = e } ``` In the example above, two declarations of `E` are actually two different objects, but their instances would have been mixed up.

This change fixes a class of generic instance collisions caused by type name shadowing by generating synthetic numeric keys for each type instance in the program instead of deriving a string-based key from type arguments. Each instance added to the InstanceSet is assigned a number which is guaranteed to be unique among all instances of the same object. This allows us to distinguish two different instances of `f` in the example below: ```go func f[T any]() {} func main() { type X int f[X]() { type X string // Shadows outer X. f[X]() } } ```

Prior to this change, when a generic function had named result variables, we had an identity mismatch when accessing result variables through the instantiated function signature vs through the function body AST. The latter references the types.Var instance defined in terms of type parameters, whereas the former produced synthetic instantiated versions of the same types.Var objects. As a result, two different names were assigned to what should be the same object, depending on how it was reached. With this change, we preserve the original, uninstantiated signature, which result variables match the ones referenced in the body AST. Whenever we need access to the instantiated signature types we perform type substitution ad-hoc.

Prior to this change, we did not substitute type parameters with instantiated counterparts, which led to the incorrect treatment of the type param types as interfaces and invalid type assertion results. This change corrects the mistake. Although typeswitch2.go and typeswitch5.go generate a different output from upstream Go, it is semantically equivalent and the difference is down to console.log() behaving slightly differently than println() in native Go.

Despite being defined in the unsafe package, these functions actually are implemented as builtins. For non-generic types go/types automatically computes constant values for corresponding AST nodes, but without type substitution it can't do the same for generic code. To make it work, we explicitly implement these builtins in our code, where type substitution takes effect, using go/types.Sizes to provide actual byte values.

Previously we've been using similar functionality from go/types package. Since it was unexported, we've used a //go:linkname directive to gain access to it. However, upgrade to go1.19 changed arguments of the function in ways that are difficult to replicate, so I switched to an equivalent reimplementation from the go/ssa package. This version is implemented in terms of public go/types interface, so it's easy to vendor and update. To simplify code updates in future I kept all original code intact and added a separate source file that exports the necessary functions for gopherjs use.

- typeswitch5.go now uses fmt.Println() and passes. - Most generics-related tests added in 1.19 also pass now.

Previously it was masked by missing generics support, but now it revealed another latent bug: #1271.

I tried to fixup original comments to keep the history tidier, but that generated far too many merge conflicts than it's worth. Let it be my lesson for why pull requests shall be of a modest size.

An alternate approach to generics support

This will be useful for testing cross-package generic instance discovery.

- Add PackageInstanceSets type to organize instances by package the instantiated object belongs to. - Update Collector and related code to work with PackageInstanceSets. Changes necessary to use this on the compiler side are not implemented yet, so we still can't _compile_ the code with generic instances across packages.

Refactor srctesting to support parsing and checking multiple packages.

Support collecting instances across packages in typeparams.Collector.

This commit adds support for GOPHERJS_EXPERIMENT environment variable that can be used to enable experimental functionality in the compiler. Experiment flags can be easily added or removed by updating fields of the internal/experiments.Flags struct. For now, only boolean flags are supported, but it should be easy enough to extend if needed. Users need to have GOPHERJS_EXPERIMENT=generics set in their environment to enable support. Adding the flag will allow us to merge generics support into the main branch before it's fully production-ready without creating unnecessary confusion for the users.

Guard generics support behind an experiment flag.

nevkontakte · 2024-04-13T20:57:10Z

Note: all the changes in this PR have been previously reviewed in the generics-ng branch, but I would like to solicit an overall approval to get this into master.

flimzy

Didn't do a thorough review, since as mentioned, this has been reviewed before, but I did skim through every file and left a few comments.

compiler/analysis/info.go

compiler/internal/typeparams/subst/export.go

compiler/internal/typeparams/utils.go

compiler/utils.go

github-actions · 2024-04-15T19:56:36Z

Reference app: jQuery TodoMVC (acf500a6c32a83d8c4582d967b09a65febf0e120)

BRANCH	ORIGINAL	MINIFIED	COMPRESSED (GZIP)
Pull request (generics-ng)	3,017,576 bytes	1,958,821 bytes	389,222 bytes
Target branch (master)	0.63% increase (18,977 bytes)	0.32% increase (6,317 bytes)	1.00% increase (3,864 bytes)

#outputSize

nevkontakte added 30 commits December 2, 2023 14:28

Add typeparam test cases to the gorepo test suite.

59e4d76

This commit updates the run.go to do correct flag parsing and lists all failing test cases as known failures.

Do not fail compilation when encountering generics.

82b84ff

Explicitly generate method instances for generic names types.

ae5a9c2

Initially I thought it would be trivial to derive them from the main type instances, but it led to more duplicated code than necessary, so now they will be discovered and processed explicitly.

Re-triage failing test cases.

b65288d

Correctly detect type conversions to generic types.

8f8d041

Type expressions for generic types contain an index expressions, which was previously impossible. With this change, we correctly recognize and check them.

Re-triage remaining typeparam test failures.

77b5223

Include type arguments into generic type strings.

49c773a

Merge master with Go 1.19 support.

5aaeafb

Update run.go to match the current set of passing tests.

fb19fae

- typeswitch5.go now uses fmt.Println() and passes. - Most generics-related tests added in 1.19 also pass now.

Diagnose fixedbugs/issue50672.go issue.

deb26d0

Previously it was masked by missing generics support, but now it revealed another latent bug: #1271.

Gofumpt the files to keep the linter happy.

8a80f06

Address pull request comments.

b288e83

I tried to fixup original comments to keep the history tidier, but that generated far too many merge conflicts than it's worth. Let it be my lesson for why pull requests shall be of a modest size.

Merge remote-tracking branch 'upstream/generics-ng' into generics-ng

2ce9bec

nevkontakte and others added 7 commits March 2, 2024 16:15

Merge pull request #1272 from nevkontakte/generics-ng

6de32f8

An alternate approach to generics support

Refactor srctesting to support parsing and checking multiple packages.

0134061

This will be useful for testing cross-package generic instance discovery.

Merge pull request #1274 from nevkontakte/gng1

4d56d3f

Refactor srctesting to support parsing and checking multiple packages.

Merge pull request #1276 from nevkontakte/gng2

732a8f6

Support collecting instances across packages in typeparams.Collector.

Merge pull request #1279 from nevkontakte/gng3

7f11cd4

Guard generics support behind an experiment flag.

nevkontakte requested a review from flimzy April 13, 2024 20:55

This comment has been minimized.

Sign in to view

flimzy approved these changes Apr 15, 2024

View reviewed changes

compiler/analysis/info.go Outdated Show resolved Hide resolved

compiler/internal/typeparams/subst/export.go Outdated Show resolved Hide resolved

compiler/internal/typeparams/utils.go Outdated Show resolved Hide resolved

compiler/utils.go Show resolved Hide resolved

Address code review feedback from PR #1290.

852c4d2

nevkontakte enabled auto-merge April 15, 2024 19:55

nevkontakte merged commit c0be063 into master Apr 15, 2024

nevkontakte deleted the generics-ng branch April 15, 2024 20:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Foundations of generics support #1290

Foundations of generics support #1290

Uh oh!

nevkontakte commented Apr 13, 2024

Uh oh!

This comment has been minimized.

nevkontakte commented Apr 13, 2024

Uh oh!

flimzy left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Apr 15, 2024

Uh oh!

Uh oh!

Foundations of generics support #1290

Foundations of generics support #1290

Uh oh!

Conversation

nevkontakte commented Apr 13, 2024

Uh oh!

This comment has been minimized.

nevkontakte commented Apr 13, 2024

Uh oh!

flimzy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Apr 15, 2024

Uh oh!

Uh oh!