|
| 1 | +# Proposal: Generic parameterization of array sizes |
| 2 | + |
| 3 | +Author(s): Andrew Werner |
| 4 | + |
| 5 | +Last updated: March 16th, 2021 |
| 6 | + |
| 7 | +## Abstract |
| 8 | + |
| 9 | +With the type parameters generics proposal has been accepted, though not yet |
| 10 | +fully specified or implemented, we can begin to talk about extension. [That |
| 11 | +proposal][type parameters] lists the following omission: |
| 12 | + |
| 13 | +> No parameterization on non-type values such as constants. This arises most |
| 14 | +obviously for arrays, where it might sometimes be convenient to write type |
| 15 | +`Matrix[n int] [n][n]float64`. It might also sometimes be useful to specify |
| 16 | +significant values for a container type, such as a default value for elements. |
| 17 | + |
| 18 | +This proposal seeks to resolve this limitation by (a) specifying when `len` can |
| 19 | +be used as a compile-time constant and (b) adding syntax to specify constraints |
| 20 | +for all arrays of a given type in type lists. |
| 21 | + |
| 22 | +## Background |
| 23 | + |
| 24 | +An important property of the generics proposal is that it enables the creation |
| 25 | +of libraries of specialized container data structures. The existence of such |
| 26 | +libraries will help developers write more efficient code as these data |
| 27 | +structures will be able to allocate fewer object and provide greater access |
| 28 | +locality. [This Google blog post][block based data structures] about block-based |
| 29 | +C++ data drives home the point. |
| 30 | + |
| 31 | +The justification is laid out in the omission of the type parameter proposal. |
| 32 | +The motivation that I've stumbled upon is in trying to implement a B-Tree |
| 33 | +and allowing the client to dictate the degree of the node. |
| 34 | + |
| 35 | +One initial idea would be to allow the client to provide the actual array |
| 36 | +which will be backing the data inside the node as a type parameter. This might |
| 37 | +actually be okay in some data structure user cases but in a B-Tree it's bad |
| 38 | +because we still would like to instantiate an array for the pointers and that |
| 39 | +array needs to have a size that is a function of the data array. |
| 40 | + |
| 41 | +The proposal here seeks to make it possible for clients to provide default |
| 42 | +values for array sizes of generic data structures in a way that is minimally |
| 43 | +invasive to the concepts which go already has. The shorthand comment stated |
| 44 | +in the Omission of the Type Parameter Proposal waves its hand at what feels |
| 45 | +like a number of new and complex concepts for the language. |
| 46 | + |
| 47 | +## Proposal |
| 48 | + |
| 49 | +This proposals attempts to side-step questions of how one might provide a |
| 50 | +scalar value in a type constraint by not ever providing a scalar directly. |
| 51 | +This proposal recognizes that constants can be used to specify array lengths. |
| 52 | +It also notes that the value of `len()` can be computed as a compile-time |
| 53 | +constant in some cases. Lastly, it observes that type lists could be extended |
| 54 | +minimally to indicate a constraint that a type is an array of a given type |
| 55 | +without constraining the length of the array. |
| 56 | + |
| 57 | +### The vanilla generic B-Tree |
| 58 | + |
| 59 | +Let's explore the example of a generic B-Tree with a fixed-size buffer. Find |
| 60 | +such an example [here][vanilla btree]. |
| 61 | + |
| 62 | +```go |
| 63 | +// These constants are the wart. |
| 64 | +const ( |
| 65 | + degree = 16 |
| 66 | + maxItems = 2*degree - 1 |
| 67 | + minItems = degree - 1 |
| 68 | +) |
| 69 | + |
| 70 | +func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { |
| 71 | + return &btree[K, V]{cmp: cmp} |
| 72 | +} |
| 73 | + |
| 74 | +type btree[K, V any] struct { |
| 75 | + cmp LessFn[K] |
| 76 | + root *node[K, V] |
| 77 | +} |
| 78 | + |
| 79 | +// ... |
| 80 | + |
| 81 | +type node[K, V any] struct { |
| 82 | + count int16 |
| 83 | + leaf bool |
| 84 | + keys [maxItems]K |
| 85 | + vals [maxItems]V |
| 86 | + children [maxItems + 1]*node[K, V] |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +### Parameterized nodes |
| 91 | + |
| 92 | +Then we allow parameterization on the node type within the btree implementation |
| 93 | +so that different node concrete types with different memory layouts may be |
| 94 | +used. Find an example of this generalization |
| 95 | +[here][parameterized node btree]. |
| 96 | + |
| 97 | +```go |
| 98 | +type nodeI[K, V, N any] interface { |
| 99 | + type *N |
| 100 | + find(K, LessFn[K]) (idx int, found bool) |
| 101 | + insert(K, V, LessFn[K]) (replaced bool) |
| 102 | + remove(K, LessFn[K]) (K, V, bool) |
| 103 | + len() int16 |
| 104 | + at(idx int16) (K, V) |
| 105 | + child(idx int16) *N |
| 106 | + isLeaf() bool |
| 107 | +} |
| 108 | + |
| 109 | +func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { |
| 110 | + type N = node[K, V] |
| 111 | + return &btree[K, V, N, *N]{ |
| 112 | + cmp: cmp, |
| 113 | + newNode: func(isLeaf bool) *N { |
| 114 | + return &N{leaf: isLeaf} |
| 115 | + }, |
| 116 | + } |
| 117 | +} |
| 118 | + |
| 119 | +type btree[K, V, N any, NP nodeI[K, V, N]] struct { |
| 120 | + len int |
| 121 | + cmp LessFn[K] |
| 122 | + root NP |
| 123 | + newNode func(isLeaf bool) NP |
| 124 | +} |
| 125 | + |
| 126 | +type node[K, V any] struct { |
| 127 | + count int16 |
| 128 | + leaf bool |
| 129 | + keys [maxItems]K |
| 130 | + vals [maxItems]V |
| 131 | + children [maxItems + 1]*node[K, V] |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +This still ends up using constants and there's no really easy |
| 136 | +way around that. You might want to parameterize the arrays into the node like |
| 137 | +in [this example][bad parameterization btree]. This still |
| 138 | +doesn't tell a story about how to relate the children array to the items. |
| 139 | + |
| 140 | +### The proposal to parameterize the arrays |
| 141 | + |
| 142 | +Instead, we'd like to find a way to express the idea that there's a size |
| 143 | +constant which can be used in the type definitions. The proposal would |
| 144 | +result in an implementation that looked like |
| 145 | +[this][proposal btree]. |
| 146 | + |
| 147 | +```go |
| 148 | + |
| 149 | +// StructArr is a constraint that says that a type is an array of empty |
| 150 | +// structs of any length. |
| 151 | +type StructArr interface { |
| 152 | + type [...]struct{} |
| 153 | +} |
| 154 | + |
| 155 | +type btree[K, V, N any, NP nodeI[K, V, N]] struct { |
| 156 | + len int |
| 157 | + cmp LessFn[K] |
| 158 | + root NP |
| 159 | + newNode func(isLeaf bool) NP |
| 160 | +} |
| 161 | + |
| 162 | +// NewBTree constructs a generic BTree-backed map with degree 16. |
| 163 | +func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { |
| 164 | + const defaultDegree = 16 |
| 165 | + return NewBTreeWithDegree[K, V, [defaultDegree]struct{}](cmp) |
| 166 | +} |
| 167 | + |
| 168 | +// NewBTreeWithDegree constructs a generic BTree-backed map with degree equal |
| 169 | +// to the length of the array used as type parameter A. |
| 170 | +func NewBTreeWithDegree[K, V any, A StructArr](cmp LessFn[K]) OrderedMap[K, V] { |
| 171 | + type N = node[K, V, A] |
| 172 | + return &btree[K, V, N, *N]{ |
| 173 | + cmp: cmp, |
| 174 | + newNode: func(isLeaf bool) *N { |
| 175 | + return &N{leaf: isLeaf} |
| 176 | + }, |
| 177 | + } |
| 178 | +} |
| 179 | + |
| 180 | +type node[K, V any, A StructArr] struct { |
| 181 | + count int16 |
| 182 | + leaf bool |
| 183 | + keys [2*len(A) - 1]K |
| 184 | + values [2*len(A) - 1]V |
| 185 | + children [2 * len(A)]*node[K, V, A] |
| 186 | +} |
| 187 | +``` |
| 188 | +### The Matrix example |
| 189 | + |
| 190 | +The example of the omission in type parameter proposal could be achieved in |
| 191 | +the following way: |
| 192 | + |
| 193 | +```go |
| 194 | +type Dim interface { |
| 195 | + type [...]struct{} |
| 196 | +} |
| 197 | + |
| 198 | +type SquareFloatMatrix2D[D Dim] [len(D)][len(D)]float64 |
| 199 | +``` |
| 200 | + |
| 201 | +### Summary |
| 202 | + |
| 203 | +1) Support type list constraints to express that a type is an array |
| 204 | + |
| 205 | + |
| 206 | +```go |
| 207 | +// Array expresses a constraint that a type is an array of T of any |
| 208 | +// size. |
| 209 | +type Array[T any] interface { |
| 210 | + type [...]T |
| 211 | +} |
| 212 | +``` |
| 213 | + |
| 214 | +2) Support a compile-time constant expression for `len([...]T)` |
| 215 | + |
| 216 | +This handy syntax would permit parameterization of arrays relative to other |
| 217 | +array types. Note that the constant expression `len` function on array types |
| 218 | +could actually be implemented today using `unsafe.Sizeof` by a parameterization |
| 219 | +over an array whose members have non-zero size. For example `len` could be |
| 220 | +written as `unsafe.Sizeof([...]T)/unsafe.Sizeof(T)` so long as |
| 221 | +`unsafe.Sizeof(T) > 0`. |
| 222 | + |
| 223 | +## Rationale |
| 224 | + |
| 225 | +This approach is simpler than generally providing a constant scalar expression |
| 226 | +parameterization of generic types. Of the two elements of the proposal, neither |
| 227 | +feels particularly out of line with the design of the language or its concepts. |
| 228 | +The `[...]T` syntax exists in the language to imply length inference for array |
| 229 | +literals and is not a hard to imagine concept. It is the deeper requirement to |
| 230 | +make this proposal work. |
| 231 | + |
| 232 | +One potential downside of this proposal is that we're not really using the |
| 233 | +array for anything other than its size which may feel awkward. For that reason |
| 234 | +I've opted to use a constraint which forces the array to use `struct{}` values |
| 235 | +to indicate that the structure of the elements isn't relevant. This awkwardness |
| 236 | +feels justified to side-step introduces scalars to type parameters. |
| 237 | + |
| 238 | +## Compatibility |
| 239 | + |
| 240 | +This proposal is fully backwards compatible with all of the language and also |
| 241 | +the now accepted type parameters proposal. |
| 242 | + |
| 243 | +## Implementation |
| 244 | + |
| 245 | +Neither of the two features of this proposal feel particularly onerous to |
| 246 | +implement. My guess is that the `[...]T` type list constraint would be extremely |
| 247 | +straightforward given an implementation of type parameters. The `len` |
| 248 | +implementation is also likely to be straightforward given the existence of |
| 249 | +both compile-time evaluation of `len` expressions on array types which exist |
| 250 | +in the language and the constant nature of `unsafe.Sizeof`. Maybe there'd be |
| 251 | +some pain in deferring the expression evaluation until after type checking. |
| 252 | + |
| 253 | +[type parameters]: https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md |
| 254 | +[block based data structures]: https://opensource.googleblog.com/2013/01/c-containers-that-save-memory-and-time.html |
| 255 | +[vanilla btree]: https://go2goplay.golang.org/p/A5auAIdW2ZR |
| 256 | +[parameterized node btree]: https://go2goplay.golang.org/p/TFn9BujIlc3 |
| 257 | +[bad parameterization btree]: https://go2goplay.golang.org/p/JGgyabtu_9F |
| 258 | +[proposal btree]: https://go2goplay.golang.org/p/4o36RLxF73C |
0 commit comments