From Go to Rust 1: async Dispatch
Higher-ranked Trait Bounds (HRTB), Generic async Dispatch
Introduction
This is the first part of a two-part series on my solutions to a few challenges I encountered while porting some code from go to rust.
Part 1, this part, gives an overall introduction to the problem and describes how I implemented a generic async dispatcher, which is a synchronous function that makes dynamic calls to async methods of an object. My solution involves using a custom trait and higher-ranked trait bounds (HRTB) to express a complex lifetime relationship that can’t be expressed with HRTB alone. In this post, I break it all down and give you a way to understand how it all works.
In Part 2, I describe how I implemented a generic async lock. This is an async-safe, non-blocking read/write mutex that can be used in async code without explicitly referencing any particular async runtime. The solution I present creates an AsyncRwLock
trait and a concrete implementation that uses tokio::sync::RwLock
, so it’s not as much about locking as it is about dealing with opaque types (“impl types”), which I needed to grapple with for the implementation.
There’s a lot of information out there about rust that is outdated because rust is still evolving, so I want you, my reader, to know right away how old this is. That way, you can check to see whether some of this material may be out-of-date in case I don’t manage to come back and update this post! As I write this, it is December 2024, and the current stable rust version is 1.83. Some of the workarounds I introduce will cease to be needed as more features stabilize in rust. Some use techniques that can be avoided with some existing crates, but there’s a lot to learn by understanding my solutions anyway.
This post assumes some familiarity with both go and rust. At a minimum, you should know about passing functions around as arguments to other functions, and you should understand the basics of rust lifetimes and async code. It’s okay if you don’t know about higher-ranked trait bounds as I will cover these in the post.
All the code in this blog post can be found in the GitHub repository: https://github.com/jberkenbilt/go-to-rust.
Background
In July 2024, I started a job working for KnectIQ. KnectIQ has a product called SelectiveTRUST™. Part of this is a device SDK that has to work on a wide range of platforms including your standard desktop and server environments, mobile, browser (WASM), and real-time operating systems. As such, we have multiple implementations of the device SDK, and some have to work in constrained environments, including environments in which we may be limited on how many threads we can run.
The original code was all written in go, and the go implementation still has broad applicability, but go can’t run in all the places we need to run, and since rust offers some advantages that are especially compelling, we intend for the rust implementation of the SDK to be the default implementation.
For the most part, porting from go to rust was straightforward enough, but there were some challenges, most of which are the usual trade-offs between rust and go. For example:
- In go, WASM mostly “just works” because a WASM target built in go includes enough of a go runtime to run goroutines even though WASM is inherently single-threaded (at least for now), and it includes shims so things like making HTTP calls work using the standard library HTTP client. But the inclusion of the go runtime makes the WASM bundle large.
- In go, any code can be made asynchronous by just running it in a goroutine. In contrast, rust requires you to pick an async runtime, mark functions as async (which ripples through the code), and avoid blocking the thread in async code, which means you need separate sync and async versions of many things.
- In go, it’s more common and idiomatic to use channels than locks (you know, the whole “don’t communicate by sharing memory; share memory by communicating” thing), even though locks are sometimes more performant and may lead to simpler code, particularly for things like caching. Go exposes a primitive mutex, but it is not tied to the data it protects and has to be locked and unlocked manually, leaving it up to the application to implement the right locking protocols.
- Go allows concurrent access to mutable data, while rust, as a fundamental principle, does not. Go code that uses locks or holds pointers to mutable data can be hard to port to rust and will need to be refactored. For example, go code that has a bunch of struct fields protected by some mutex will have to be refactored because, in rust, protected data live inside the mutex. This forces you to organize code in a different (and usually better!) way.
Problem Statement
The system I ported is the SelectiveTRUST device SDK — a software library for interacting with our trusted endpoint devices. The rust device SDK implementation has a bottom layer and various independent top layers that access the bottom layer.
The bottom layer performs cryptographic operations and communicates with a server using an HTTP-based API. It makes heavy use of async code, and it is intentionally async runtime-agnostic. That means library users must provide the async runtime, a situation that is quite common for rust libraries.
Each top layer makes use of the bottom layer for specific environments and exposes its functionality in various ways. One example is our C API, which creates a synchronous, C-friendly API on top of a singleton instance of the device controller. The C API creates an async runtime with tokio and dispatches work to various async functions using tokio’s usual approach for sync to async bridging. If we switch the WASM device implementation to rust, it will be running in the single-threaded async runtime provided by the browser. We also have various applications and language bindings that can each use whatever kind of runtime is most suitable.
In rust, it’s easy to write async functions that don’t know or care about the runtime in which they are executing, but there are a few significant things you have to keep in mind:
- Rust futures are inert — they only execute when you `await` them. If you want to run something in the background, you have to spawn a task. While the standard library provides a way to spawn a new operating system thread, there is no runtime-agnostic way to spawn a task. To do that, you have to make a call to your specific runtime.
- You must avoid blocking the thread inside an async function. Blocking operations such as sleeping, obtaining locks, or performing I/O must be implemented in a manner that yields control to the runtime rather than blocking the thread. These operations typically require calling out to your specific runtime. The reason why you must avoid blocking the thread from async code is another topic in its own right (and is not specific to rust), but there’s plenty written about this, so I will not go into it beyond a one-sentence summary: if you block the thread from async code, all other async functions running in the same OS thread will also be blocked since there will be no opportunity to return control to the async runtime. Got it? If it doesn’t make sense to you, just take my word for it for now. You don’t need to understand that for the rest of the blog post.
- Synchronous code can’t directly call async functions. You need to “build a bridge” by creating a runtime and handing it an async future to execute. A runtime will have some call, such as `block_on`, that takes a `Future` and polls the future until it is completed.
In the course of my port, I ran into issues related to all the above concerns. Most were straightforward to resolve, but two of them were tricky. In this series, I will focus on these issues:
- Creating a synchronous API that dispatches work to async methods of a mutex-protected singleton object. Due to the lack of stabilization of some aspects of async closures or async function traits, this required some extra work. This is covered in this post.
- Creating a generic async read/write lock trait that can be used without depending on any particular async runtime, shifting the dependency on a specific runtime to an opaque implementation of the trait. Because of limitations around the use of opaque (“impl”) types and dynamic trait objects, this turned out to be more difficult than you would think. This is covered in Part 2.
In this blog, I will start with some go code and a simple port to synchronous rust, and then I will take you along a path to a final rust async implementation that resolves the issues I encountered along the way. Let’s dive in!
The Sample Code
For this post, I have created a trivial device controller and API. They don’t do anything real, but they have enough behavior to illustrate the challenges and their solutions:
- There is a `controller` package that stands in for the internal rust implementation. It has some data that is protected behind a read/write mutex. The contents get updated with each call. The sample code doesn’t actually make any HTTP calls, but it shows where you would have to hold a lock over an await point in rust when using any popular async HTTP client library.
- There is a `device` package that implements a simple, function-based API that operates on a global singleton object. I haven’t created a C API on top of it since this is not a post about exposing a rust API in a C API, but it would be simple to create one.
The Go Implementation
You can find the code at https://github.com/jberkenbilt/go-to-rust, but you don’t have to — I’m including excerpts inline for discussion purposes. For comparison, I’ve implemented the sample in go so you can see how it works.
Below is source code for the “controller” in our example. I’ll point out several things after the code.
Here are some things to notice:
- The `Controller` struct contains a mutex and some fields. As is typical in go code, it is the programmer’s responsibility to lock and unlock the mutex in the correct way. While it is possible to create an API that encapsulates the mutex and locking, the language doesn’t force you to do so. It is possible for the developer to leave things locked, use the wrong kind of unlock (shared vs. exclusive) for a lock, or forget to lock the mutex before accessing the data. Some of these mistakes will create runtime errors, and some may create bugs or race conditions.
- This example doesn’t contain any goroutines. I thought about putting some in, but in go, running things concurrently with goroutines is quite simple and doesn’t broadly impact the code base in the way that async does in rust, so I decided it was just clutter. The real code I was porting uses goroutines in non-trivial ways such as to run periodic tasks, monitor activity on channels, and implement cache expiration.
- This example is a bit contrived in that both calls lock the mutex for write and then later lock it for read. But such is the nature of minimal examples — this is not about the architecture of how to manage mutex-protected data or how you should share information. I merely need something here that holds a lock over what would be an async call in rust.
- I’ve included the test code for reference. It’s pretty simple code, and you can see how it works by looking at the tests.
Next, we have the “device” code. This is the function-based API that operates on a singleton. For the real system, this corresponds to what is exposed in the C API.
Things to notice:
- There’s a global singleton, which is implemented as an initially nil pointer. Calling `Init` initializes it. There are no explicit protections against concurrent access, though the code is intended to be called from a single-threaded environment.
- The `runMethod` function is a dispatcher. You pass it a closure and an argument, and it checks that the device is initialized and, if so, calls the closure with the device and argument. It looks really simplistic in this example, but keep in mind that this pattern is used in the real application for implementing an FFI-friendly (FFI = foreign function interface) C API, so the wrapping and dispatch is a little more involved in the real application.
- The test code is essentially identical. A real application could do better about avoiding code duplication, but for purposes of example, I wanted it to be clear that the wrapper behaves identically to the underlying API.
There’s nothing tricky or unusual about this code. It’s pretty straightforward to do this kind of thing in go.
Rust Version: Initial
Here is an initial rust port of the go code. You can find this in https://github.com/jberkenbilt/go-to-rust/tree/main/rust/00-initial. This port contains only synchronous code and is a straightforward port of the go code. You can look at the two implementations side-by-side and match them up easily. I’ll include the code here and make a few remarks.
Notice:
- While the go code has a struct containing some fields and a mutex, the rust code has a mutex that contains the fields. This is how you do it in rust. Mutexes implement thread-safe interior mutability, meaning the guarantee about having only one mutable reference at a time is enforced at runtime rather than compile time. Specifically, locking the mutex is the only way to get a mutable reference. Since rust uses RAII (“resource acquisition is initialization”), a pattern that guarantees resource cleanup through use of destructors (via the `Drop` trait in rust), the mutex gets unlocked automatically when the lock guard (returned by `read` or `write`) goes out of scope. There’s no possibility of forgetting to lock or unlock the mutex.
- The rest of the port is straightforward. Error handling uses `Result` types, the tests use `unwrap` and `assert_eq!`, etc. You could probably get similar results using machine translation — everything is simple.
Here’s the device implementation.
Notice:
- Where the go code uses a global pointer that is initially nil, the rust code uses an `Option` type inside an `RwLock` inside a `LazyLock`. Typically for rust, this is a bit more plumbing, but it’s also null-safe, thread-safe, etc.
- The dispatcher is a little more complex here. We don’t just have the global singleton sitting there. We have to obtain it in a manner that ensures we have no data race conditions. The dispatcher first grabs a reference to the global controller from the lock, and then it calls the function. Still, it’s pretty simple and closely parallels the go code, but safer.
- As before, the test for the device parallels the test for the controller.
Making it Asynchronous
If only it were simple to make this async. If we could just run async code easily inside `Controller::request`, there wouldn’t be anything to write. To be fair, it would be possible to create an async runtime in that function (or to use a sync wrapper provided by the HTTP client library), but remember that this is a simplified example. In the real system, there is other async stuff going on, and the real request function doesn’t serialize by grabbing a lock in this way…there’s a lot more going on.
Even with rust’s complications, the first attempt at adding async to this is quite easy, but there are two problems with it:
- The `device` module uses nightly unstable code. As mentioned at the top, this was written in December 2024 with rust 1.83. At that time, the `async_closure` feature was not stable, and there is still debate about exactly how it is going to work, but this shows how easy this may eventually become. Overcoming this is the topic of the rest of this post.
- I had to break the rule about being async runtime-agnostic in `controller`. Since the `Controller::request` function makes an async call, I’m no longer allowed to use a sync lock as holding a sync lock across an await point may deadlock. This isn’t an error, but clippy warns you about it, and it is likely to deadlock in a single-threaded async environment or even in a multi-threaded one if you have enough threads. For now, I just pulled in `tokio` and used its async-aware lock. Resolving this issue is the topic of Part 2. Spoiler: I’m going to hide the tokio lock behind a trait, but implementing the trait is harder than you’d think.
You can find this code at https://github.com/jberkenbilt/go-to-rust/tree/main/rust/01-async-request-unstable. Here’s a diff followed by the full code. I’ll throw it all up here, then repeat chunks of it with discussion.
Let’s discuss. This might be easier to follow if you have one window open with this text and another one that you can scroll around to the code and diff, but I’ll repeat chunks inline for discussion to make it possible to read it straight through.
Let’s take a look at the first part of the diff.
In the request function, you can see several things:
- We have replaced `std::sync::RwLock` with `tokio::sync::RwLock` because we need an async lock. This is a problem! We don’t want to directly use `tokio` here! We’ll come back and deal with that in Part 2 by introducing a generic async `RwLock` trait that uses `tokio::sync::RwLock` in its implementation in a separate crate.
- The `request` function is now async.
- The lock API has changed. Sync locks can fail if a thread panicked while holding the lock, but async locks can’t, so there’s no more `unwrap`. But async locks are, well, async, so we have to `await` them.
- I added a gratuitous async block just to show that you can actually do an await here while holding the lock.
All the rest of the changes in `controller` are changes to the lock API and modifying the test suite to be async (using the `tokio::test` macro) and to `await` on the method calls.
That’s all we’ll say about `controller` for now. See Part 2 for the rest.
Moving on to differences in the device API, there are two notable changes. First:
We need a sync to async bridge. There’s nothing special about this — it’s basically right from the `tokio` documentation. Our wrapper around the controller now also includes an async runtime, which we initialize with the singleton. Then, each call calls the now-async method using the `block_on` method of the runtime. But how did the `f` argument become async?
So simple. We just had to add the `async` keyword into the bound for the `FnT` generic type. Except, wait: that’s an unstable feature. I had to enable the `async_closure` feature and use a nightly rust, and as indicated in the comments, there is still debate about whether you’ll add the `async` keyword in front of the existing family of `Fn` traits or there will be a whole new collection of traits. Right now (nightly rust 1.85, December 2024), either syntax is accepted with the `async_closure` feature. Well, we have something to look forward to, but we want to stick with stable rust here, so what do we do?
There are several approaches. A popular one is to use boxed futures. (If the link is bad, at the time of writing, it pointed to the `BoxFuture` type in a 0.x version of the `futures` crate.) I refer you to the docs in case you want to know more about that. I didn’t really want to go down that path because of the extra allocation (not a huge deal, really, but we may target real-time operating systems and other environments where we really want to minimize allocations) and because it makes you change the async function itself. So I decided to pursue a solution that doesn’t require extra crates or allocations. Here’s a change that you can find in https://github.com/jberkenbilt/go-to-rust/tree/main/rust/02-stable-rust-first-try.
This is a pretty standard trick when dealing with generic async functions. Generic async functions typically return `impl Future` types, which are not allowed in return types of `Fn` trait bounds, so it’s common to split the future type into a separate generic parameter like the above. But it doesn’t actually work in this case. Look what happens when we try to compile:
There’s quite a bit to unpack here, so let’s unpack it.
- Line 7 above uses this `impl for<'a> Future` syntax. If you haven’t seen that before, the `for<'a>` introduces a higher-ranked trait bound. It took me quite a bit of effort before I felt that I fully understood this concept. It’s documented, but the documentation seems to be written by theoreticians for theoreticians. Are you well-versed on the Hindley-Milner type system? How’s your Haskell? I’m going to try to break it down in a way that will make it understandable if you’re not deep in the theory behind type systems in programming languages.
- Line 9 suggests awaiting on both futures. This is not useful or relevant in this case.
- Line 10 mentions that distinct uses of `impl Trait` result in different opaque types. That’s interesting to know, but it’s not really the issue here either. But `impl Trait` opaque types are very limited in rust right now, and that ends up being the root cause of both of the issues we explore in this post. In this case, restrictions on where `impl Trait` types can be used were what forced us down the path of splitting `Fut` into its own type. (Or we could have used `Pin<Box<dyn Future>>`, but I already ruled that out.)
Before we go on, let’s take a detour into the land of Higher-Ranked Trait Bounds (HRTB).
A Way to Think About Lifetimes
I personally found the concept of higher-ranked trait bounds (HRTB) to be the most difficult of any concept I had to wrap my head around in learning rust. I’ve seen explanations of it, but, even as a very experienced developer, I didn’t really understand any of them on first reading. Here, I aim to present a mental model you can use to better understand lifetimes and the limitations of the syntax used in rust to specify them. It is my hope that some issue that causes you to need HRTB will fit into this model naturally.
Let’s start by talking about lifetimes. I am going to assume that you have at least some level of understanding about lifetimes, but it’s okay if you’re a little murky about them. I’m going to go back to basics for a minute, so bear with me. I’m painting a picture here.
If you have a simple function like this:
fn length(a: &str) -> i32 {
// ...
}
the use of `&str` as the type of the parameter `a` is telling rust exactly what type the variable `a` has to have. Programming 101. Now suppose you have this:
fn length<T: HasLength>(a: T) -> i32 {
// ...
}
// or
fn length<T>(a: T) -> i32
where
T: HasLength
{
// ...
}
Now we’re saying that the type of `a` is some type `T` where `T` implements some trait called `HasLength`. Let’s take a look at the syntax and what it means. When we declare the arguments, we have `a: T`. This is saying that `a` has type `T`. But unlike `&str`, we don’t know what `T` is. The angle-bracket syntax is just how rust lets you define `T`. Rust gives you two choices about how to specify the constraints, or “bounds,” on `T`: you can stick them right with `T`, or you can use a `where` clause. (Welcome to rust, where there’s always more than one way!) Other languages do it in other ways. We could make up our own syntax and do something like this:
// not really rust...maybe this is tarnish?
where T: HasLength {
fn length(a: T) -> i32 {
// ...
}
}
That’s not valid rust, but you can see what it’s doing, right? Here, we’re creating a “scope” in which we define `T` based on its bounds. Rust doesn’t do that though…the scope of `T` is implied to be just for the function where it is a generic type. But if you put generic types in an `impl` block, you can use them for any functions in the block, so it’s kind of like that.
Now I’m going to state something kind of obvious. You may have never really thought about it, but by declaring `T` as a generic type, we are implicitly doing two things: stating what the bounds of `T` are, and creating a scope that defines where using it is syntactically valid. If you feel like I’m not really saying anything, bear with me, because this is going to matter with lifetimes.
Sometimes, the scope of one generic parameter is limited to the scope of another one. Consider this:
pub fn f<T, U>(a: T)
where
T: AsRef<U>
{
// ...
}
Here, the generic type `U` is only used in the definition of `T`. It’s not directly used with the arguments of `f`. You could imagine something like this made-up syntax:
// not really rust...probably some other form of oxidation
pub fn f<T>(a: T)
where
T<U>: AsRef<U>
{
// ...
}
This expresses more clearly that `U` is only significant in the definition of `T`, but rust has no such syntax, and there’s not really any need for it because where we define `U` doesn’t really carry any meaning of its own. Sure, we could use `U` other than in `T`, but we don’t, and the fact that we could doesn’t matter. But, with lifetimes, things are different!
Consider this function (which may look familiar if you've read the Rust Programming Language):
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
// ...
}
Here, `'a` is a lifetime parameter. Just as with generic type parameters, rust “hangs” the introduction of the lifetime parameter `'a` off of the function name with angle brackets. In this example, we are saying that parameters `x` and `y` have the same lifetime and that the return value also has that lifetime. But what does that really mean?
Lifetime constraints are similar to type constraints in that they tell the compiler not to accept any parameter that doesn’t meet the constraints. When you say `x: &str`, you’re saying “don’t accept any value whose type isn’t `&str`.” Likewise, when you say `x: &'a str`, you are saying, “Don’t accept any value whose type isn’t `&str` and that doesn’t live at least as long as some scope we are referring to as `'a`.” Using rust terminology, we would state this as “`x` must outlive `'a`.” But what scope is that exactly? I think this is where people get muddled about lifetimes, but it’s actually pretty simple once you know how to think about it. By hanging `'a` off of the function name, we are creating a scope just like with the generic types, but it means a bit more. Imagine how this would look with our first fake syntax:
// more tarnish
'a {
fn longest1(x: &'a str, y: &'a str) -> &'a str {
// ...
}
}
As with generic types, this is saying “the name `'a` is valid as a lifetime in this scope,” but it’s also saying something else: “The lifetime `'a` refers to the scope that is active when `longest1` is called.” You see the difference? With a lifetime declaration, you are not just creating a scope in which that symbol can be used, you are actually defining the scope itself. This is really important because a lifetime scope is a lower bound. That means that any reference (or struct or trait or anything else that can have a lifetime marker) has to be alive when the scope is entered and remain alive until the scope is exited. That’s why we say that `x` outlives `'a`. To be more concrete, consider this:
{ // beginning of scope
    // some code
    let s1 = "potato"; // This begins s1's lifetime
    f(&s1);
    // do something else
} // end of scope

fn f<'a>(s2: &'a str) { /* ... */ }
In this concrete example, the lifetime `'a` (which could be elided since lifetime annotations are not needed when the default lifetime rules apply) refers to the scope that is active wherever `f` is called, so we are saying that the lifetime `'a` has to start somewhere before the call of `f` and end sometime after `f` returns. The life of `s1` begins when it is initialized and ends at the end of the block, so it lives long enough to pass to `f`.
For simple lifetime relationships like the ones above, you don’t really have to think about this, and you can just worry about relating the lifetimes to each other. But when you start getting into higher-ranked polymorphic functions, things get a little different, so this seems like a good time to introduce some new terminology.
I’m not going to go very theoretical here, but if you’re interested in going deeper, look up “Hindley-Milner type system.” For now, think of this as “type theory light.”
- Monomorphic function. The word monomorphic literally means “one shape.” If you have a strongly typed programming language and you create a function whose arguments all have a specified, concrete type, the function is monomorphic. It can be any concrete type including a built-in type (like int or float types) or a user-defined type like a struct or enum. Some older languages, like C, only have monomorphic functions.
- Polymorphic function. The word polymorphic means “many shapes.” A polymorphic function has at least one argument whose concrete type is determined when the function is called rather than being explicitly stated in the function’s definition. A concrete type must exist when the code is actually executed, but polymorphic functions delay the decision about which actual type will be used. Many modern languages, including rust, implement polymorphism using generics, though there are other ways to do it and other things that also fall into the category of polymorphism (think of dynamic types in object-oriented programming languages). While not theoretically rigorous, I like to think of polymorphism as a way of replacing “this argument must be type X” with “this argument must be some type that meets a specified set of constraints.” There are several ways to state those constraints, and the mechanisms vary from language to language. In rust, we introduce the generic parameters with the name of the thing that uses them, and we add constraints, or bounds, either right there with a colon or in a separate `where` clause.
- Rank. This is best defined by example, which I will provide after the explanation. All monomorphic functions have rank 0. A polymorphic function that doesn’t have any arguments that are themselves functions has rank 1. Most modern languages, including rust, implement functional programming, in which you can pass functions around as arguments to other functions. If a polymorphic function has an argument that is also a polymorphic function, the outer function has a higher rank than the inner function. Specifically, a polymorphic function’s rank is one more than the highest rank of any of its arguments. Note that functions that take other functions as arguments are called higher-order functions. A function can be higher-order without being polymorphic, and it can be polymorphic without being higher-order. All functions of rank 2 or higher are necessarily higher-order functions.
Let’s make this concrete with an example.
pub fn rank2<FnT, T, Ret>(f: FnT, v: T, len: usize) -> Ret
where
FnT: FnOnce(T, usize) -> Ret,
{
f(v, len)
}
This function is a polymorphic function of rank 2. Why? Its argument `f` has type `FnT`. Type `FnT` is defined as `FnOnce(T, usize) -> Ret`, where `T` and `Ret` are both generic types. Since `T` and `Ret` are not constrained to be functions, `FnT` is a basic polymorphic function of rank 1. Since `rank2` takes an argument of type `FnT`, its rank must be one higher, which is rank 2.
Higher-Ranked Trait Bounds (HRTB)
So how does this relate to HRTB? (From now on, I’m just going to say “HRTB” rather than spelling it out.)
Take a look at this example. Here’s a monomorphic trait with an explicit lifetime along with a function that calls its associated function. For this to work, we have to introduce HRTB, which is what the `for<'a>` is. I’ll explain.
pub trait WithLifetime<'a> {
fn check_len(&'a self, len: usize) -> bool;
}
pub fn call_with_lifetime<T>(v: T, len: usize) -> bool
where
for<'a> T: WithLifetime<'a>,
{
v.check_len(len)
}
Let’s consider what it would mean if we did this instead:
pub fn call_with_lifetime<'a, T>(v: T, len: usize) -> bool
where
T: WithLifetime<'a>,
{
// ERROR: v does not live long enough!
v.check_len(len)
}
In this case, we get a compiler error that `v` does not live long enough! Let’s look at this in “tarnish” and see what’s going on. In my fake rust (tarnish), we would write this like this:
// not really rust (more tarnish)
'a {
pub fn call_with_lifetime<T>(v: T, len: usize) -> bool
where
T: WithLifetime<'a>,
{
v.check_len(len)
}
}
Here, you can see that we’re saying `v` has to still be alive after `call_with_lifetime` returns, but it won’t be: we moved `v` into `call_with_lifetime`, so it actually gets dropped right before `call_with_lifetime` returns! That’s actually okay for us though…it is passed by reference to `check_len`, so we want the lifetime bound to reflect that it has to be valid until after that function returns. We want something more like this:
// still more tarnish
pub fn call_with_lifetime<T>(v: T, len: usize) -> bool
where
'a {
T: WithLifetime<'a>,
}
{
v.check_len(len)
}
We want to define the lifetime 'a as the scope in which T::check_len is called, not the scope in which call_with_lifetime is called. This is very similar to what happens with higher-ranked polymorphic functions. Remember this example of our rank 2 function:
pub fn rank2<FnT, T, Ret>(f: FnT, v: T, len: usize) -> Ret
where
FnT: FnOnce(T, usize) -> Ret,
{
f(v, len)
}
Here, the type used for T is actually based on what is passed to f and has to satisfy the requirements of any f that may be passed in. This is the same kind of thing as our lifetime bound. So, if we can't hang the lifetime parameter 'a off call_with_lifetime, where do we put it? The answer is that we introduce the HRTB syntax for<'a>.
If you read the docs about this, they will tell you that it means "for any lifetime." This may be strictly true in a theoretical sense, but I personally find it to be unhelpful. I prefer to think of it as just disconnecting 'a from the outer lifetime. I think it actually does mean "for any lifetime" and is therefore not exactly the same as my "tarnish" syntax, which would be explicitly creating a smaller lifetime, but since my smaller lifetime falls into the category of "any lifetime," it will do the job.
So, to sum it all up: when you get an error that something won't live long enough because of the positioning of the lifetime parameters, but you know it actually does live long enough for how you're using it, it may be time to introduce HRTB. Usually, the rust compiler will tell you that you need one, but I have seen it suggest using HRTB when it is not necessary. When does that happen? Well, right here, for one thing. In the interest of full disclosure, the example I presented doesn't actually need HRTB because, in this case, the lifetime on WithLifetime isn't doing anything. We could omit the whole thing and end up with this:
pub trait WithoutLifetime {
fn check_len(&self, len: usize) -> bool;
}
pub fn call_without_lifetime<T>(v: T, len: usize) -> bool
where
T: WithoutLifetime,
{
v.check_len(len)
}
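As a quick check that the simplified trait really does the same job, here is a hypothetical implementor (Name is my own illustration, not from the original); notice there is no lifetime plumbing anywhere:

```rust
// The lifetime-free version from above, repeated so this snippet is
// self-contained.
pub trait WithoutLifetime {
    fn check_len(&self, len: usize) -> bool;
}

pub fn call_without_lifetime<T>(v: T, len: usize) -> bool
where
    T: WithoutLifetime,
{
    v.check_len(len)
}

// Hypothetical implementor; the elided &self lifetime is all we need.
struct Name(String);

impl WithoutLifetime for Name {
    fn check_len(&self, len: usize) -> bool {
        self.0.len() == len
    }
}

fn main() {
    assert!(call_without_lifetime(Name("ferris".into()), 6));
    println!("ok");
}
```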
As originally written, the code needed HRTB, and the compiler may suggest it; it just could have been simplified to avoid the need for HRTB entirely. But there's no need to get hung up on that. Sometimes an explicit HRTB increases the clarity of the intent, and there are times when the lifetime is required. Those are the times that you need HRTB. It's just that creating an example that couldn't possibly work without HRTB is a bit more difficult, and it's easier to explain HRTB with this simpler case. But never fear: one place where HRTB is absolutely required is with Futures (at least for now), and we will come back to that later in this post.
I believe this introduces the concepts we need to proceed with making our generic dispatcher handle async functions without using any unstable rust, so let’s continue!
Making run_method async, Part 2
Let's take another look at the compiler error that sent us down the HRTB detour. Here's the definition of run_method:
fn run_method<ArgT, ResultT, FnT, Fut>(
f: FnT,
arg: ArgT,
) -> Result<ResultT, Box<dyn Error + Sync + Send>>
where
FnT: FnOnce(&Controller, ArgT) -> Fut,
Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>,
{
let lock = CONTROLLER.controller.read().unwrap();
let Some(controller) = &*lock else {
return Err("call init first".into());
};
CONTROLLER.rt.block_on(f(controller, arg))
}
The compiler rejects this, suggesting that we need some kind of HRTB, but the error is kind of confusing because it mentions the opaque type impl Future, which doesn't appear anywhere in the code. What's going on here is that a Future has a lifetime. It's not that easy to see, and it's not explicit in the definition of Future, but you can see it. Here's how Future is defined:
pub trait Future {
type Output;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
See that context? It has a lifetime. Going into the guts of how futures are implemented is out of scope for this post, and there is good material already written about that, so I'm going to hand-wave a little. Remember that rust futures are inert: nothing happens until you await them. When you await a Future, the runtime calls the poll method, and the poll method advances the async activity. While not precise, and possibly not completely accurate, I find it useful to think of the Future as "capturing" a lifetime equal to the scope in which the async function is invoked, because the future is effectively running that function. So if you have an async method on an object, I find it useful to think of the future as having the same lifetime as the object. I don't know if that's strictly accurate, but it provides a simple and useful mental model. If you look at the last line of the function definition:
CONTROLLER.rt.block_on(f(controller, arg))
you can see that the future is run to completion inside rt.block_on. This is what block_on does. So the future doesn't have to last as long as the outer scope. This is exactly where we would want to use HRTB. But we also need HRTB for the function because the function takes a reference to controller, which is dropped before the function returns. It has the same basic structure as our earlier HRTB example. This time, controller isn't moved into the function; it is actually created in the function as we borrow it from the static lock. Either way, it is dropped before run_method returns. So, what if we try this:
fn run_method<ArgT, ResultT, FnT, Fut>(
f: FnT,
arg: ArgT,
) -> Result<ResultT, Box<dyn Error + Sync + Send>>
where
for<'a> FnT: FnOnce(&'a Controller, ArgT) -> Fut,
for<'a> Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>,
{
let lock = CONTROLLER.controller.read().unwrap();
let Some(controller) = &*lock else {
return Err("call init first".into());
};
CONTROLLER.rt.block_on(f(controller, arg))
}
Turns out this doesn't help at all! Why not? Well, the 'a for FnT and the 'a for Fut have to be the same lifetime. If we had our tarnish syntax, we would want something like this:
fn run_method<ArgT, ResultT, FnT, Fut>(
f: FnT,
arg: ArgT,
) -> Result<ResultT, Box<dyn Error + Sync + Send>>
where
// more "tarnish" -- rust doesn't have this
'a {
FnT: FnOnce(&'a Controller, ArgT) -> Fut,
Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>,
}
{
let lock = CONTROLLER.controller.read().unwrap();
let Some(controller) = &*lock else {
return Err("call init first".into());
};
CONTROLLER.rt.block_on(f(controller, arg))
}
but, alas, there is no such syntax, and this time, we’re kind of stuck as there is actually no way syntactically to relate the two HRTB lifetimes to each other. So what do we do?
The answer is that we can create a special trait and use the trait to tie the lifetimes together. I haven’t seen this pattern used very much, but I find it to be very convenient for these special cases. Note that, when async closures are stable, we will no longer need this workaround, and I will no longer have any examples of where I need to use custom traits to tie lifetimes together. But for now, here is what it looks like.
Let’s look first at just the trait definition. We have:
trait MethodCaller<'a, ArgT, ResultT>: FnOnce(&'a Controller, ArgT) -> Self::Fut {
type Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>;
}
What is this?
- The MethodCaller trait is a generic trait with a lifetime parameter as well as type parameters. The lifetime parameter applies to everything in the trait definition, and the type parameters are the ones that we actually care about in the function we are encapsulating, without worrying about its async nature.
- The MethodCaller trait is a subtrait of FnOnce. That's like a derived class or subclass in object-oriented programming, but in rust, it means that anything you provide an implementation of MethodCaller for must also implement FnOnce. In this case, it must implement specifically this exact FnOnce that returns the future type.
- Notice that the future type is not a generic parameter. Instead, it is an associated type. Associated types are just types that have to be filled in when you implement a trait, just like functions do. (Fun fact: traits can also have associated constants, though we have no use for them here.) Why is Fut an associated type instead of an additional generic type parameter? If we had it as an additional parameter, we would be stuck in our original situation, as there would be no way to use the same lifetime bound for the subtrait bound and the bound on Fut.
- The MethodCaller trait has a lifetime. The lifetime parameter applies to the whole implementation, which means it ties the lifetime of the future together with the lifetime of the Controller reference in the function!
But a trait is no good without an implementation, so we have to provide one. In this case, we can provide a blanket implementation, which is an implementation of a generic trait in terms of other generics. Here’s the trait again, this time followed by its implementation:
trait MethodCaller<'a, ArgT, ResultT>: FnOnce(&'a Controller, ArgT) -> Self::Fut {
type Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>;
}
impl<
'a,
ArgT,
ResultT,
FnT: FnOnce(&'a Controller, ArgT) -> Fut,
Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>,
> MethodCaller<'a, ArgT, ResultT> for FnT
{
type Fut = Fut;
}
So what's this? This is a blanket implementation because we are implementing MethodCaller<'a, ArgT, ResultT> for FnT, which is itself a generic, also defined in terms of the same generic parameters. We also have the Fut type parameter, which is used to supply the value for the Self::Fut associated type. If you look at the bounds of FnT and Fut on the implementation, you can see that they are identical to the subtrait bounds and associated type bounds on the trait definition. That means that our implementation is a blanket implementation for every possible combination of FnT and Fut that would satisfy the trait definition. Furthermore, the lifetime 'a applies to the whole thing, indicating that the function and the future have identical lifetimes. With this, we can define the bound of FnT in run_method as
for<'a> FnT: MethodCaller<'a, ArgT, ResultT>,
So there you have it: our one single 'a is on an HRTB like we wanted, but rather than having to apply the same lifetime to two different HRTBs, which is not representable in rust syntax, we use MethodCaller to glue it all together so a single HRTB will cover the whole thing.
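To see the pieces work together end to end on stable rust, here is a minimal, self-contained sketch. The tokio runtime and the static CONTROLLER are replaced with stand-ins of my own (a local Controller value and a busy-poll block_on built on a no-op waker), so treat this as an illustration of the trait mechanics rather than the real dispatcher:

```rust
use std::error::Error;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for the real Controller (hypothetical).
struct Controller {
    base: usize,
}

trait MethodCaller<'a, ArgT, ResultT>: FnOnce(&'a Controller, ArgT) -> Self::Fut {
    type Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>;
}

impl<'a, ArgT, ResultT, FnT, Fut> MethodCaller<'a, ArgT, ResultT> for FnT
where
    FnT: FnOnce(&'a Controller, ArgT) -> Fut,
    Fut: Future<Output = Result<ResultT, Box<dyn Error + Sync + Send>>>,
{
    type Fut = Fut;
}

// Stand-in for rt.block_on: busy-polls the future with a no-op waker.
// Good enough here because our example future completes on the first poll.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        fn nop(_: *const ()) {}
        RawWaker::new(std::ptr::null(), &RawWakerVTable::new(clone, nop, nop, nop))
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// The single HRTB on MethodCaller ties the controller reference and the
// future to the same lifetime, so the future may die with the reference.
fn run_method<ArgT, ResultT, FnT>(
    f: FnT,
    arg: ArgT,
) -> Result<ResultT, Box<dyn Error + Sync + Send>>
where
    FnT: for<'a> MethodCaller<'a, ArgT, ResultT>,
{
    let controller = Controller { base: 40 };
    block_on(f(&controller, arg))
}

// An ordinary async fn taking &Controller satisfies the bound through
// the blanket implementation; no manual impl is needed.
async fn add(c: &Controller, n: usize) -> Result<usize, Box<dyn Error + Sync + Send>> {
    Ok(c.base + n)
}

fn main() {
    assert_eq!(run_method(add, 2).unwrap(), 42);
    println!("ok");
}
```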
Here’s a little bonus. You can even do this:
trait MethodCallerNoLT<ArgT, ResultT>: for<'a> MethodCaller<'a, ArgT, ResultT> {}
impl<ArgT, ResultT, M> MethodCallerNoLT<ArgT, ResultT> for M where
for<'a> M: MethodCaller<'a, ArgT, ResultT>
{
}
Here we're just hiding the lifetime required by MethodCaller inside another trait, using HRTB in another place where it's syntactically allowed. With this, it actually works to use this bound:
FnT: MethodCallerNoLT<ArgT, ResultT>,
We have essentially created a custom AsyncFnOnce trait. See how similar this looks to one of the possible future rust implementations:
FnT: AsyncFnOnce(&Controller, ArgT) -> Result<ResultT, Box<dyn Error + Sync + Send>>,
In that version, there are no HRTBs because the default lifetime rules work. Personally, I find the version with HRTB a little easier to understand because it makes it explicit that the lifetime of the Controller reference is shorter than the outer lifetime, but it's nice to know you can actually write this without any explicit lifetime annotations at all!
Well, that was a bit of an uphill climb, but we now have the generic async dispatcher working with all stable code!
Still here? It’s time to move on to the generic async lock implementation, which you can find in Part 2!