This repository was archived by the owner on Jan 31, 2023. It is now read-only.

Commit bea8b57

Mention integration with systems languages
1 parent 978e17c commit bea8b57

1 file changed (+14, -0 lines)


README.md

Lines changed: 14 additions & 0 deletions
@@ -59,6 +59,20 @@ The WTF family of encodings has been chosen over the respective UTF family of en
> Depending on the programming environment, a Unicode string may or may not be required to be in the corresponding Unicode encoding form. For example, strings in Java, C#, or ECMAScript are Unicode 16-bit strings, but are not necessarily well-formed UTF-16 sequences. In normal processing, it can be far more efficient to allow such strings to contain code unit sequences that are not well-formed UTF-16—that is, isolated surrogates. Because strings are such a fundamental component of every program, checking for isolated surrogates in every operation that modifies strings can create significant overhead, especially because supplementary characters are extremely rare as a percentage of overall text in programs worldwide.
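
The check the quoted passage refers to is a linear scan over the 16-bit code units, looking for a high surrogate not followed by a low surrogate, or a low surrogate not preceded by a high one. A minimal Python sketch, assuming code units are given as a sequence of integers (an illustrative representation, not how any engine stores strings):

```python
from typing import Sequence


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """Return True if the 16-bit code units contain no isolated surrogates."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: well-formed only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate without a preceding high surrogate.
            return False
        i += 1
    return True
```

Running a scan like this after every string-modifying operation costs time proportional to the string length, which is the overhead that tolerating isolated surrogates avoids.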

### Integration with linear memory-based languages

This document does not require languages that use linear memory to fully support GC.

The `string.new` and `string.lower` instructions are useful at the boundary even if a module does not otherwise embrace or support GC. Legalizing these instructions in the following cases enables interoperability with, or between, systems languages such as C/C++ and Rust:

* Calling an imported function with a string argument created by `string.new`
* Consuming a string argument in an exported function with `string.lower`
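
To make the boundary case concrete, the following Python sketch models both instructions over a `bytearray` standing in for linear memory, with a Python `str` standing in for the `stringref`. The function names, the `(memory, ptr, length, encoding)` parameter order, and the `"wtf8"` / `"wtf16"` labels are hypothetical, not the proposal's actual instruction signatures. WTF-8 and WTF-16 are approximated with Python's `utf-8` and `utf-16-le` codecs plus the `surrogatepass` error handler, which lets isolated surrogates through (it is slightly more lenient than WTF-8, which forbids a surrogate pair encoded as two separate 3-byte sequences).

```python
# Hypothetical encoding labels mapped to (Python codec, bytes per code unit).
# "surrogatepass" lets isolated surrogates through, approximating WTF-8/WTF-16.
CODECS = {"wtf8": ("utf-8", 1), "wtf16": ("utf-16-le", 2)}


def string_new(memory: bytearray, ptr: int, length: int, encoding: str) -> str:
    """Model of `string.new`: decode `length` code units starting at `ptr`."""
    codec, unit_size = CODECS[encoding]
    end = ptr + length * unit_size
    if ptr < 0 or end > len(memory):
        raise IndexError("out-of-bounds read from linear memory")
    return bytes(memory[ptr:end]).decode(codec, "surrogatepass")


def string_lower(ref: str, memory: bytearray, ptr: int, encoding: str) -> int:
    """Model of `string.lower`: encode `ref` at `ptr`, return code units written."""
    codec, unit_size = CODECS[encoding]
    data = ref.encode(codec, "surrogatepass")
    if ptr < 0 or ptr + len(data) > len(memory):
        raise IndexError("out-of-bounds write to linear memory")
    memory[ptr:ptr + len(data)] = data
    return len(data) // unit_size


# Assumed setup: the caller keeps WTF-8 in its memory, the callee wants WTF-16.
# The string is "h", an isolated U+D800, then "i".
caller_memory = bytearray(b"h\xed\xa0\x80i" + bytes(11))
callee_memory = bytearray(16)

ref = string_new(caller_memory, 0, 5, "wtf8")            # before calling the import
written = string_lower(ref, callee_memory, 0, "wtf16")   # inside the export
print(written, bytes(callee_memory[:2 * written]))       # prints: 3 b'h\x00\x00\xd8i\x00'
```

Note that the isolated surrogate crosses the boundary intact instead of being rejected or replaced.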

Furthermore, when a `string.new` creates a string from linear memory on one side of the boundary and a `string.lower` immediately lowers that string on the other, as is the common case in systems languages, the engine can skip creating an intermediate `stringref` and optimize the operation into either:

* A single copy from the source to the target memory, if the encodings match
* A re-encoding from the source to the target memory, if the encodings do not match
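
The two cases can be sketched as a single hypothetical `fused_new_lower` function that reads directly from the source memory and writes directly to the target memory, with no intermediate string. When the encodings match it performs one copy; otherwise it transcodes code unit by code unit. Only the WTF-16 to WTF-8 direction is written out, and the function name, parameters, and `"wtf8"` / `"wtf16"` labels (WTF-16 assumed little-endian) are assumptions of this sketch, not part of the proposal.

```python
def encode_code_point(cp: int) -> bytes:
    """Generalized UTF-8 encoding of one code point; surrogates allowed (WTF-8 style)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def fused_new_lower(src_mem: bytearray, src_ptr: int, src_len: int, src_enc: str,
                    dst_mem: bytearray, dst_ptr: int, dst_enc: str) -> int:
    """Hypothetical fused `string.new` + `string.lower`; returns bytes written.

    `src_len` counts code units of `src_enc` ("wtf8" or "wtf16").
    """
    unit_size = 2 if src_enc == "wtf16" else 1
    src_end = src_ptr + src_len * unit_size
    if src_ptr < 0 or src_end > len(src_mem):
        raise IndexError("out-of-bounds read from source memory")
    src = src_mem[src_ptr:src_end]

    if src_enc == dst_enc:
        out = src  # Encodings match: a single copy.
    elif (src_enc, dst_enc) == ("wtf16", "wtf8"):
        units = [int.from_bytes(src[k:k + 2], "little") for k in range(0, len(src), 2)]
        out = bytearray()
        i = 0
        while i < len(units):
            unit = units[i]
            if (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                    and 0xDC00 <= units[i + 1] <= 0xDFFF):
                # Surrogate pair: combine into one supplementary code point.
                cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                # BMP code point or isolated surrogate, passed through as-is.
                cp = unit
                i += 1
            out += encode_code_point(cp)
    else:
        raise NotImplementedError(f"{src_enc} -> {dst_enc} is not sketched here")

    if dst_ptr < 0 or dst_ptr + len(out) > len(dst_mem):
        raise IndexError("out-of-bounds write to target memory")
    dst_mem[dst_ptr:dst_ptr + len(out)] = out
    return len(out)
```

For example, the WTF-16 code units for `h`, an isolated U+D800, and `i` (`b"h\x00\x00\xd8i\x00"`) become the 5 WTF-8 bytes `b"h\xed\xa0\x80i"`, while a valid surrogate pair becomes a single 4-byte sequence.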

## Implementation notes

As described in this document, Universal WebAssembly Strings can be implemented as a managed object with one slot per encoding. When a string is created from encoding A, only the slot for A is populated. Accessing slot B triggers a re-encoding from A to B, which populates slot B before it is used.
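
A minimal Python sketch of the slot-per-encoding layout. The class name, the `"wtf8"` / `"wtf16"` labels, and the use of Python's `utf-8` / `utf-16-le` codecs with the `surrogatepass` error handler as stand-ins for WTF-8 / WTF-16 are assumptions of this sketch, not part of the proposal. It also re-encodes through a Python `str` for brevity, where an engine would transcode directly.

```python
from typing import Dict, Optional

# Hypothetical encoding labels mapped to Python codecs. With "surrogatepass",
# isolated surrogates survive, approximating WTF-8 and WTF-16 (little-endian).
CODECS = {"wtf8": "utf-8", "wtf16": "utf-16-le"}


class UniversalString:
    """Managed string with one slot per encoding, filled lazily on first access."""

    def __init__(self, encoding: str, data: bytes) -> None:
        if encoding not in CODECS:
            raise ValueError(f"unknown encoding: {encoding}")
        # Only the slot of the creating encoding is populated up front.
        self.slots: Dict[str, Optional[bytes]] = {enc: None for enc in CODECS}
        self.slots[encoding] = bytes(data)
        self.origin = encoding
        self.reencodings = 0  # how many slots had to be filled by re-encoding

    def view(self, encoding: str) -> bytes:
        """Return the string in `encoding`, re-encoding from the origin slot if needed."""
        if encoding not in self.slots:
            raise ValueError(f"unknown encoding: {encoding}")
        if self.slots[encoding] is None:
            text = self.slots[self.origin].decode(CODECS[self.origin], "surrogatepass")
            self.slots[encoding] = text.encode(CODECS[encoding], "surrogatepass")
            self.reencodings += 1
        return self.slots[encoding]


s = UniversalString("wtf16", b"h\x00\x00\xd8i\x00")  # "h", isolated U+D800, "i"
print(s.view("wtf8"), s.reencodings)  # first access fills the WTF-8 slot
print(s.view("wtf8"), s.reencodings)  # second access reuses the filled slot
```

Filling slots lazily means a string that is only ever used in its original encoding never pays for re-encoding, at the cost of extra memory once several slots are populated.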
