Google have chosen to break my blog by removing support for the chart API which I used for the formulae. For a better version with all the formulae go to my github repo.
There are many introductions to the Expectation-Maximisation algorithm.
Unfortunately, every one I could find uses seemingly arbitrary tricks that appear to be plucked out of a hat by magic.
They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems.
Examples of tricks I've seen used are:
Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one, which is equal?
Substituting a factor of $\frac{q(z)}{q(z)}$ for some arbitrary distribution $q$ in the middle of an expression. Again, you can insert a factor like this just about anywhere. Why this factor, and why there? Similarly, I found derivations that insert a suitably chosen extra term into an expression.
Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary.
There's nothing original here, I'm merely expanding on a stackexchange answer.
Generalities about EM
The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.
First assume no hidden variables.
We assume there is a vector of parameters $\theta$ that defines some model.
We make some observations $x$.
We have a probability density $p(x|\theta)$ that depends on $\theta$.
The likelihood of $\theta$ given the observations $x$ is $L(\theta) = p(x|\theta)$.
The maximum likelihood estimator for $\theta$ is the choice of $\theta$ that maximises $L(\theta)$ for the $x$ we have observed.
Now suppose there are also some variables $z$ that we didn't get to observe.
We assume a density $p(x, z|\theta)$.
We now have
$$p(x|\theta) = \sum_z p(x, z|\theta),$$
where we sum over all possible values of $z$.
The MLE approach says we now need to maximise
$$\sum_z p(x, z|\theta).$$
One of the things that makes this a challenge is that the components of $\theta$ might be mixed up among the terms in the sum.
If, instead, each term only referred to its own unique block of components of $\theta$, then the maximisation would be easier as we could maximise each term independently of the others.
Here's how we might move in that direction.
Consider instead the log-likelihood
$$\log \sum_z p(x, z|\theta).$$
Now imagine that by magic we could commute the logarithm with the sum.
We'd need to maximise
$$\sum_z \log p(x, z|\theta).$$
One reason this would be to our advantage is that $p(x, z|\theta)$ often takes the form of an exponential $\exp(g(x, z, \theta))$, where $g$ is a simple function to optimise.
In addition, $g$ may break up as a sum of terms, each with its own block of $\theta$'s.
Moving the logarithm inside the sum would give us something we could easily maximise term by term.
What's more, the $p(x, z|\theta)$ for each $z$ is often a standard probability distribution whose likelihood we already know how to maximise.
But, of course, we can't just move that logarithm in.
Maximisation by proxy
Sometimes a function is too hard to optimise directly.
But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead.
That will give us a new guess and we can continue from there.
This is the basis of gradient descent.
Suppose $f$ is a differentiable function in a neighbourhood of $x_0$.
Then around $x_0$ we have
$$f(x) \approx f(x_0) + (x - x_0)\cdot\nabla f(x_0).$$
We can try optimising this approximation with respect to $x$ within a neighbourhood of $x_0$.
If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent.
(Note that picking a circular neighbourhood is itself a somewhat arbitrary step,
but that's another story.)
For gradient descent we're choosing the linear approximation as our proxy because it matches both the value and derivatives of $f$ at $x_0$.
We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.
We want our logarithm of a sum to be a sum of logarithms.
But instead we'll settle for a proxy function that is a sum of logarithms.
We'll make the derivatives of the proxy match those of the original function
precisely so we're not making an arbitrary choice.
Write our proxy function as
$$Q(\theta, \theta_0) = \sum_z a_z \log p(x, z|\theta).$$
The $a_z$ are constants we'll determine.
We want to match the derivatives on either side of the approximation at $\theta = \theta_0$.
Differentiating the original log-likelihood gives
$$\frac{\partial}{\partial\theta}\log\sum_z p(x, z|\theta)\Big|_{\theta=\theta_0} = \frac{\sum_z \frac{\partial}{\partial\theta} p(x, z|\theta_0)}{p(x|\theta_0)}.$$
On the other hand we have
$$\frac{\partial}{\partial\theta}\sum_z a_z \log p(x, z|\theta)\Big|_{\theta=\theta_0} = \sum_z a_z\,\frac{\frac{\partial}{\partial\theta} p(x, z|\theta_0)}{p(x, z|\theta_0)}.$$
To achieve equality we want to make these expressions match.
We choose
$$a_z = \frac{p(x, z|\theta_0)}{p(x|\theta_0)} = p(z|x, \theta_0).$$
Our desired proxy function is:
$$Q(\theta, \theta_0) = \sum_z p(z|x, \theta_0)\,\log p(x, z|\theta).$$
So the procedure is to take an estimate $\theta_0$ and obtain a new estimate $\theta_1$
by optimising this proxy function $Q(\theta, \theta_0)$ with respect to $\theta$.
This is the standard EM algorithm.
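To make the recipe concrete, here is a small worked example of my own (it is not in the original post): a two-component mixture with known component densities $p_0$ and $p_1$ and a single unknown mixing weight $\theta$, with hidden indicators $z_i$ saying which component produced observation $x_i$:
$$p(x_i|\theta) = \theta\,p_1(x_i) + (1-\theta)\,p_0(x_i), \qquad r_i = p(z_i = 1|x_i, \theta_n) = \frac{\theta_n\,p_1(x_i)}{\theta_n\,p_1(x_i) + (1-\theta_n)\,p_0(x_i)}.$$
The proxy built from the current estimate $\theta_n$ is
$$Q(\theta, \theta_n) = \sum_{i=1}^N \left[ r_i \log\bigl(\theta\,p_1(x_i)\bigr) + (1 - r_i)\log\bigl((1-\theta)\,p_0(x_i)\bigr) \right],$$
and maximising it in $\theta$ gives the familiar update $\theta_{n+1} = \frac{1}{N}\sum_{i=1}^N r_i$: the new mixing weight is the average responsibility.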
It turns out that this proxy has some other useful properties.
For example, because of the concavity of the logarithm,
the proxy is always smaller than the original likelihood.
This means that when we optimise it we never optimise "too far"
and that progress optimising the proxy is always progress optimising the
original likelihood.
But I don't need to say anything about this as it's all part of the standard literature.
Afterword
As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute
$$\operatorname{argmax}_\theta \sum_z f(z, \theta)$$
for some positive function $f$, you can iterate, at each step computing
$$\theta_{n+1} = \operatorname{argmax}_\theta \sum_z \frac{f(z, \theta_n)}{\sum_{z'} f(z', \theta_n)} \log f(z, \theta),$$
where $\theta_n$ is the previous iteration.
If the $\log f(z, \theta)$ take a convenient form then this may turn out to be much easier.
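For what it's worth, here is a minimal Haskell sketch of this iteration (mine, not part of the post). To avoid needing a real optimiser it cheats by taking the argmax over a finite grid of candidate parameters; the names emStep and emIterate, the grid, and the finite list of z values are all illustrative assumptions.

import Data.List (maximumBy)
import Data.Ord (comparing)

-- One EM-style step: weight each z by its share of the current total,
-- then pick the candidate parameter that maximises the weighted sum of logs.
emStep :: [theta] -> [z] -> (z -> theta -> Double) -> theta -> theta
emStep grid zs f t0 = maximumBy (comparing proxy) grid
  where
    total    = sum [f z t0 | z <- zs]
    weight z = f z t0 / total
    proxy t  = sum [weight z * log (f z t) | z <- zs]

-- Iterate the step a fixed number of times from an initial guess.
emIterate :: Int -> [theta] -> [z] -> (z -> theta -> Double) -> theta -> theta
emIterate n grid zs f = foldr (.) id (replicate n (emStep grid zs f))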
Note
This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.
Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you that you should do this: that you can do so with marginal effort on top of your day-to-day vibe coding and that you will get both short- and long-term benefits from making your own personal benchmark exist.
I started thinking about benchmarks for coding in part out of my frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know "what's the best model" or "what's the best coding IDE"? One might imagine that the way to answer this question would be to test the models on a variety of problems from real world uses of the LLM for coding, and then compare how well various systems do on this. Indeed, whenever a new SOTA model releases, the lab will usually tell you about the model's performance against a few well known coding benchmarks. Problem solved?
Of course not! In fact, for the most part, no one really talks about benchmarks when comparing models. Why? I argue the most popular benchmarks measure tasks that are largely different from what a user wants out of an LLM. For example, take the recent Gemini 2.5 Pro release. In their headline table, they test against LiveCodeBench, Aider Polyglot and SWE-bench Verified. LiveCodeBench and Aider Polyglot derive their problems from contest programming and pedagogical exercises (respectively), while SWE-bench assesses bug fixes to preexisting codebases. While useful, this is only a small slice of the things people want to do with LLMs.
Wouldn't it be great if you had your own, personal benchmark, based on problems you actually care about? If you are tweaking your .cursorrules, you could run your benchmark to see if a change you made helped or not. When a new model comes out, you could spend a few bucks to run your eval and make a decision if you should switch your daily driver. And then on social media, if you wanted to stan the new model, instead of asking the model to drop a ball inside a rotating hexagon or vagueposting about how the new model is incredible, you could just post your benchmark results.
It's a collection of nearly 100 tests I've extracted from my actual conversation history with various LLMs.
There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. Most questions are evaluated by actually running the code the model writes but the framework supports a bunch of other evaluation methods as well. And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants.
I have been working on my own benchmark based off of Carlini's benchmark, and I can confirm that this works well for the traditional style of coding eval, where you have a one-shot task that generates and executes the code against some test cases. My basic strategy is to vibe code as usual, but whenever I give an LLM a task that it isn't able to one shot, I consider adding it to the benchmark. In more detail:
I only add a task if a SOTA LLM failed it. This ensures the benchmark consists of problems of appropriate difficulty: easy enough that I thought an LLM should be able to do it, but hard enough that a SOTA model failed on it. I don't need problems that are too hard (this is already well covered by well known benchmarks like SWE-Bench or SWE-Lancer), and I don't mind if my problems saturate because, hey, that means the models are that much better for my use cases!
After I have added the task to the benchmark, I can use the benchmark runner to tell if changing the model, tweaking the prompt, or even just running the prompt again at nonzero temperature can make it pass. Indeed, it's helpful to find some configuration that makes the eval pass, as this is good for debugging issues in the evaluation function itself... also it means you have working code for whatever task you were working on. Conversely, you can make the task harder by leaving things out from the prompt.
Writing the test is the labor intensive part, but you can always vibe code a test. Importantly, you have a failing implementation (your initial generation) and some way you (manually?) determined that the implementation was wrong, so just turn this into your evaluation function! (And for all you yak shaving aficionados, if the model fails to vibe code your test, well, you have another task for your benchmark!)
For example, the other day I needed to take an asciinema recording and convert it into a sequence of frames rendered as plain text. However, the only project for doing these conversions was agg, which converts recordings into animated gifs. In
agg_to_text, I ask an LLM to take agg's source code and create a new program which dumps the frames as plain text rather than gif images. This task is difficult because there is some discretion in deciding when to emit a frame, and with my original prompt the LLM didn't precisely replicate the original behavior in agg. While working on the benchmark, I realized that instructing the model specifically about how frame batching worked was enough to get it to preserve the original behavior. But I don't think I should need to do this: thus this task. (P.S. If this test saturates, well, I can always make it harder by removing the agg source code from the prompt.)
The ability to benchmark one shot tasks is here today, but I would like to speculate a bit about what lies beyond them. In particular, most of my LLM coding activity involves asking the LLM to make changes to a pre-existing project, which makes it less amenable to "single prompt creates self contained program". (Also, I usually only ask one-shot questions that the LLM can answer, so most of them would never go in my benchmark.)
In short, how can I extract tasks from my day-to-day work? There seem to be two big extra levers we have:
Codebase tasks. This is the heavy-weight approach: you record the Git commit of your codebase at the time you prompted for some new feature to be added, and then when you want to run an eval on a new model you just check out the codebase at that commit and let the end-to-end system go. You'll typically want to execute the modified code, which means you'll also need a way to reliably setup the runtime environment for the code; things like lockfiles can help a lot here.
Transcript tasks. You don't actually need the entire codebase to be available to ask an LLM for a completion; you only need the conversation transcript up to the point of the critical generation. If the transcript is mostly your agent system reading in files for context, you can end up with a relatively system-generic prompt that can tell you something about other systems. Of course, if you want to actually run the change, you still need the full codebase, which is why this approach is much more amenable if you're going to do some static analysis on the output. For example, if a model keeps adding try: ... except: ... blocks that are suppressing errors, you can take some transcripts where you've caught the model red-handed doing this and make an eval that checks if the model is still doing this. I suspect testing on transcripts works best for checking whether changing prompts or rules improves performance: the transcript itself puts the model into a particular latent space, and a different model might have made different choices, leading to a different latent space. Transcripts from thinking models are especially susceptible to this!
I have started adapting Carlini's framework to work better for these cases, although I would love to be told someone has already solved this problem for me. In particular, I am very excited about using transcript tasks to evaluate whether or not things I add to my prompts / triggered rules are helping or not. Current SOTA model instruction following isn't great and I regularly catch models doing behaviors that I explicitly told them not to in the system prompt. I have started some initial analysis over all of my chat logs to find cases where the model misbehaved, although I haven't quite worked out how I want to build an eval out of it.
One word of warning: to make transcript tasks, you need an AI coding system that doesn't obscure how it assembles its underlying prompts (which rules out most of the popular closed source AI code editors.)
I started building evals for a selfish reason: I wanted to be able to tell if modifications to my prompts were doing anything. But I also think there is a broader opportunity that arises if we also publish these benchmarks to the world.
For one, building a real world benchmark on use cases we care about is a way to communicate to the people training AI models whether or not they are doing well. Historical evals have focused on LeetCoding, and consequently we have models that would ace any big tech interview and yet on real world tasks will drive you off a cliff at the first opportunity. And this is not just free labor for the top labs: if you believe in open source models, one of the biggest barriers to good small models is having really high quality data. We, the OSS vibe coding community, can directly help here.
I think there is a tremendous opportunity for the open source community to really push the state of the art in coding evaluations. There's only so many benchmarks that I, personally, can create, but if everyone is making benchmarks I could eventually imagine a universe of benchmarks where you could curate the problems that are relevant to your work and quickly and cheaply judge models in this way: a Wikipedia of Coding Benchmarks.
To summarize: every time an LLM fails to solve a problem you ask it for, this is a potential new benchmark. As long as there is a way to automate testing if the LLM has solved the problem, you can turn this into a benchmark. Do this for yourself, and you can quickly have a personal benchmark with which to evaluate new models. Do this at scale, and you can help push the frontier in coding models.
Haskell is the world’s best programming language1, but let’s
face the harsh reality that a lot of times in life you’ll have to write in other
programming languages. But alas you have been fully Haskell-brained and
lost all ability to program unless it is type-directed, you don’t even know how
to start writing a program without imagining its shape as a type first.
Well, fear not. The foundational theory behind Algebraic Data Types and
Generalized Algebraic Data Types (ADTs and GADTs) are so fundamental that
they’ll fit (somewhat) seamlessly into whatever language you’re forced to write.
After all, if they can fit profunctor
optics in Microsoft’s Java code, the sky’s the limit!
This is an “April Fools” joke in the tradition of my previous one, in that some of the ways we are going to twist these other languages might seem unconventional or possibly ill-advised… but also the title is definitely a lie: these languages definitely should have them! :D
Normal ADTs
As a reminder, Algebraic Data Types (ADTs) are products and sums; that's why they're algebraic, after all!
Product Types
Products are just immutable structs, which pretty much every language
supports — as long as you’re able to make sure they are never mutated.
This is much simpler in languages where you can associate functions with
data, like OOP and classes. For example, this is the common “value object”
pattern in java (roughly related to the java bean2):
In this case, not only are these ADTs (algebraic data types), they’re also
ADTs (abstract data types): you are meant to work with them
based on a pre-defined abstract interface based on type algebra, instead of
their internal representations.
Sum Types
If your language doesn’t support sum types, usually the way to go is with the
visitor pattern: the underlying implementation is hidden, and the only
way to process a sum type value is by providing handlers for every branch — a
pattern match as a function, essentially. Your sum values then basically
determine which handler is called.
For example, we can implement it for a network address type that can either
be IPv4 or IPv6. Here we are using C++ just for generics and lambdas with
closures, for simplicity, but we’ll discuss how this might look in C later.
Note that in this way, the compiler enforces that we handle every branch.
And, if we ever add a new branch, everything that ever consumes
IPAddress with an IPAddressVisitor will have to add a
new handler.
In a language without generics or powerful enough polymorphism, it’s
difficult to enforce the “pure” visitor pattern because you can’t ensure that
all branches return the same type.
One common pattern is to have an “effectful” visitor pattern, where the point
isn’t to return something, but to execute something on the payload of
the present branch. This is pretty effective for languages like C, javascript,
python, etc. where types aren’t really a rigid thing.
For example, this might be how you treat an “implicit nullable”:
This is basically for_ from Haskell: You can do something like
conditionally launch some action if the value is present.
visitMaybe(
  () => console.log("Nothing to request"),
  (reqPayload) => makeRequest("google.com", reqPayload),
  maybeRequest
);
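For comparison, a minimal Haskell rendering of the same idea using for_ (the makeRequest and maybeRequest names here are hypothetical stand-ins mirroring the JavaScript above, and the empty case is simply skipped):

import Data.Foldable (for_)

makeRequest :: String -> String -> IO ()
makeRequest host payload = putStrLn ("requesting " <> host <> " with payload " <> payload)

maybeRequest :: Maybe String
maybeRequest = Just "some payload"

-- Runs the action only when a payload is present.
main :: IO ()
main = for_ maybeRequest (makeRequest "google.com")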
On a simpler note, if your language has subtyping built in (maybe with classes
and subclasses) or some other form of dynamic dispatch, you can implement it in
terms of that, which is nice in python, java, C++, etc.
interface ExprVisitor<R> {
  R visitLit(int value);
  R visitNegate(Expr unary);
  R visitAdd(Expr left, Expr right);
  R visitMul(Expr left, Expr right);
}

abstract class Expr {
  public abstract <R> R accept(ExprVisitor<R> visitor);
}
Alternatively, you’re in a language where lambdas are easy, instead of
tupling up the visitor, you could just have accept itself take a
number of arguments corresponding to each constructor:
// Alternative definition without an explicit Visitor
abstract class Expr {
  public abstract <R> R accept(
      Function<Integer, R> visitLit,
      Function<Expr, R> visitNegate,
      BiFunction<Expr, Expr, R> visitAdd,
      BiFunction<Expr, Expr, R> visitMul
  );
}
(Note that C++ doesn’t allow template virtual methods — not because it’s not
possible within the language semantics and syntax, but rather because the
maintainers are too lazy to add it — so doing this faithfully requires a bit
more creativity)
Now, if your language has dynamic dispatch or subclass polymorphism, you can
actually do a different encoding, instead of the tagged union. This will work in
languages that don’t allow or fully support naked union types, too. In this
method, each constructor becomes a class, but it’s important to only
allow access using accept to properly enforce the sum type
pattern.
class Lit extends Expr {
  private final int value;
  public Lit(int value) { this.value = value; }
  @Override
  public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitLit(value); }
}

class Negate extends Expr {
  private final Expr unary;
  public Negate(Expr unary) { this.unary = unary; }
  @Override
  public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitNegate(unary); }
}

class Add extends Expr {
  private final Expr left;
  private final Expr right;
  public Add(Expr left, Expr right) { this.left = left; this.right = right; }
  @Override
  public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitAdd(left, right); }
}

class Mul extends Expr {
  private final Expr left;
  private final Expr right;
  public Mul(Expr left, Expr right) { this.left = left; this.right = right; }
  @Override
  public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitMul(left, right); }
}
(But, just wanted to note that if you actually are working in java,
you can actually do something with sealed classes, which allows exhaustiveness
checking for its native switch/case statements.)
Alternatively you could make all of the subclasses anonymous and expose them
as factory methods, if your language allows it:
abstract class Expr {
  public abstract <R> R accept(ExprVisitor<R> visitor);

  public static Expr lit(int value) {
    return new Expr() {
      @Override
      public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitLit(value); }
    };
  }

  public static Expr negate(Expr unary) {
    return new Expr() {
      @Override
      public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitNegate(unary); }
    };
  }

  public static Expr add(Expr left, Expr right) {
    return new Expr() {
      @Override
      public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitAdd(left, right); }
    };
  }

  // ... etc
}
Passing around function references like this is actually pretty close to the
scott encoding of our data type — and for non-recursive types, it’s essentially
the church encoding.
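For reference, here is a minimal Haskell sketch of the scott encoding of this Expr type (my own illustration, assuming RankNTypes): a value is its own pattern match, and the handlers receive the raw sub-expressions, exactly like ExprVisitor above.

{-# LANGUAGE RankNTypes #-}

newtype Expr = Expr
  { match :: forall r.
       (Int -> r)           -- visitLit
    -> (Expr -> r)          -- visitNegate
    -> (Expr -> Expr -> r)  -- visitAdd
    -> (Expr -> Expr -> r)  -- visitMul
    -> r
  }

lit :: Int -> Expr
lit n = Expr (\onLit _ _ _ -> onLit n)

negate' :: Expr -> Expr
negate' e = Expr (\_ onNegate _ _ -> onNegate e)

add :: Expr -> Expr -> Expr
add l r = Expr (\_ _ onAdd _ -> onAdd l r)

mul :: Expr -> Expr -> Expr
mul l r = Expr (\_ _ _ onMul -> onMul l r)

-- Evaluation has to recurse explicitly, because the handlers get raw Exprs back.
eval :: Expr -> Int
eval e = match e id (negate . eval) (\l r -> eval l + eval r) (\l r -> eval l * eval r)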
Recursive Types
Speaking of recursive types…what if your language doesn’t allow recursive
data types? What if it doesn’t allow recursion at all, or what if recursively
generated values are just annoying to deal with? Just imagine writing that
Expr type in a language with explicit memory management, for
example. Or, what if you wanted a way to express your recursive types in a more
elegant and runtime-safe manner?
One thing you can instead do is have your visitor be in its “catamorphism”,
or church encoding. Instead of having the “visitor” take the recursive
sub-values, have it take the results of recursively applying the fold to those
sub-values.
Let’s do this in dhall, one of the most famous non-recursive
languages. Dhall does have native sum types, so we won’t worry about
manually writing a visitor pattern. But it does not have recursive data
types.
Let’s define a type like:
data Expr = Lit Natural | Add Expr Expr | Mul Expr Expr
But we can’t define data types in dhall that refer to themselves. So instead,
we can define them in their “church encoding”: give what you would do with an
Expr to consume it, where the consumption function is given as if
it were recursively applied.
Note that ExprF r is essentially
ExprVisitor<R>, except instead of add being
Expr -> Expr -> r, it’s r -> r -> r: the
input values aren’t the expression, but rather the results of recursively
folding on the expression. In fact, our original non-recursive
ExprVisitor<R> (to be more precise, the
R accept(ExprVisitor<R>)) is often called the “scott
encoding”, as opposed to the recursive “church encoding” fold.
For value creation, you take the visitor and recursively apply:
And finally, using the data type involves providing the
handler to fold up from the bottom to top. Note that
add : \(left : Natural) -> \(right : Natural) -> left + right
already assumes that the handler has been applied to the sub-expressions, so you
get Naturals on both sides instead of Expr.
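To make the contrast concrete back in Haskell, here is a minimal sketch of the church-encoded Expr (again my own illustration, assuming RankNTypes): each handler receives the results of folding the sub-expressions, not the sub-expressions themselves.

{-# LANGUAGE RankNTypes #-}

import Numeric.Natural (Natural)

newtype Expr = Expr
  { fold :: forall r.
       (Natural -> r)  -- lit
    -> (r -> r -> r)   -- add, applied to already-folded sub-results
    -> (r -> r -> r)   -- mul
    -> r
  }

lit :: Natural -> Expr
lit n = Expr (\onLit _ _ -> onLit n)

add :: Expr -> Expr -> Expr
add l r = Expr (\onLit onAdd onMul ->
  onAdd (fold l onLit onAdd onMul) (fold r onLit onAdd onMul))

mul :: Expr -> Expr -> Expr
mul l r = Expr (\onLit onAdd onMul ->
  onMul (fold l onLit onAdd onMul) (fold r onLit onAdd onMul))

-- Evaluation is just a choice of handlers; no explicit recursion needed here.
evalExpr :: Expr -> Natural
evalExpr e = fold e id (+) (*)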
This pattern is useful even in languages with good datatype recursion, like
Haskell — it’s actually the recursion-schemes
refactoring of a recursive data type, and it can be useful to have it live
alongside your normal recursive types. I’ve written this blog
post talking about how useful this pattern is to have alongside your normal
recursive types.
This pattern is pretty portable to other languages too, as long as you can
scrounge together something like Rank-N types:
interface ExprFold<R> {
  R foldLit(int value);
  R foldNegate(R unary);
  R foldAdd(R left, R right);
  R foldMul(R left, R right);
}

interface Expr {
  public abstract <R> R accept(ExprFold<R> fold);

  public static Expr lit(int value) {
    return new Expr() {
      @Override
      public <R> R accept(ExprFold<R> fold) { return fold.foldLit(value); }
    };
  }

  public static Expr negate(Expr unary) {
    return new Expr() {
      @Override
      public <R> R accept(ExprFold<R> fold) { return fold.foldNegate(unary.accept(fold)); }
    };
  }

  // etc.
}
By “Rank-N types” here, I mean that your objects can generate polymorphic
functions: given an Expr, you could generate an
<R> R accept(ExprFold <R> fold) for any R,
and not something pre-determined or pre-chosen by your choice of representation
of Expr.
Generalized Algebraic Data Types
You’ve implemented ADTs in your language of choice, or you are currently in a
language with native ADTs. Life is good, right? Until that sneaky voice starts
whispering in your ear: “we need more type safety.” You resist that urge, maybe
even get a lot done without it, but eventually you are compelled to give in and
embrace the warm yet harsh embrace of ultimate type safety. Now what?
Singletons and Witnesses
In Haskell, singletons are essentially enums used to associate a value with a
reifiable type. “Reifiable” here means that you can take the runtime value of a
singleton and use it to bring evidence to the type-level. I ran into a
real-world usage of this while writing https://coronavirus.jle.im/, a web-based data visualizer of
COVID-19 data (source here) in
purescript. I needed a singleton to represent scales for scatter plots
and linking them to the data that can be plotted. And, not only did it need to
be type-safe in purescript (which has ADTs but not GADTs), it had to be
type-safe in the javascript ffi as well.
Here’s how it might look in Haskell:
-- | Numeric types
data NType :: Type -> Type where
    NInt     :: NType Int
    NDouble  :: NType Double
    NPercent :: NType Percent

-- | Define a scale
data Scale :: Type -> Type where
    ScaleDate   :: Scale Date
    ScaleLinear :: Bool -> NType a -> Scale a  -- ^ whether to include zero in the axis or not
    ScaleLog    :: NType a -> Scale a
You’d then run it like this:
plot :: Scale a -> Scale b -> [(a, b)] -> Canvas
So, we have the type of the input tuples being determined by the
values you pass to plot:
But let’s say we only had ADTs. And then we’re passing them down to a
javascript FFI which only has structs and functions. We could drop the
type-safety and instead error on runtime, but…no. Type unsafety is not
acceptable.
The fundamental ability we want to gain is that if we pattern match on
ScaleDate, then we know a has to be
Date. If we match on NInt, we know that a has to be Int.
For the sake of this example, we’re going to be implementing a simpler
function in purescript and in javascript: a function that takes a scale type and
a list of points and prints the bounds. In Haskell, this looks like:
(Pretend the Percent type is just a newtype-wrapped
Float or something)
There are at least two main approaches to do this. We’ll be discussing
runtime equality witnesses and Higher-Kinded Eliminators.
Runtime Witnesses and Coyoneda
Embedding
The Yoneda Lemma
is one of the most powerful tools that Category Theory has yielded as a branch
of math, but its sibling coyoneda
is one of the most useful Haskell abstractions.
This doesn’t give you GADTs, but it’s a very lightweight way to “downgrade”
your GADTs into normal ADTs, which is appropriate if you don’t need the full
power.
The trick is this: if you have MyGADT a, and you know you are
going to be using it to produce as, you can do a covariant
coyoneda transform.
For example, if you have this type representing potential data sources:
data Source :: Type -> Type where
    ByteSource   :: Handle -> Source Word
    StringSource :: FilePath -> Source String

readByte :: Handle -> IO Word
readString :: FilePath -> IO String

readSource :: Source a -> IO a
readSource = \case
    ByteSource h    -> readByte h
    StringSource fp -> readString fp
You could instead turn Source into a non-GADT by making it a
normal parameterized ADT and adding a X -> a field, which is a
type of CPS transformation:
data Source a
    = ByteSource Handle (Word -> a)
    | StringSource FilePath (String -> a)

byteSource :: Handle -> Source Word
byteSource h = ByteSource h id

stringSource :: FilePath -> Source String
stringSource fp = StringSource fp id

readSource :: Source a -> IO a
readSource = \case
    ByteSource h out    -> out <$> readByte h
    StringSource fp out -> out <$> readString fp
A nice benefit of this method is that Source can now have a
Functor instance, which the original GADT could not.
And, if MyGADT a is going to be consuming as, you can do the contravariant
coyoneda transform:
data Sink a
    = ByteSink Handle (a -> Word)
    | StringSink FilePath (a -> String)
And, if you are going to be both consuming and producing as, you
can do the invariant coyoneda transform
data Interface a
    = ByteInterface Handle (Word -> a) (a -> Word)
    | StringInterface FilePath (String -> a) (a -> String)
However, in practice, true equality involves being able to lift
under injective type constructors, and carrying every single
continuation is unwieldy. We can package them up together with a runtime
equality witness.
This is something we can put “inside” NInt such that, when we
pattern match on a NType a, the type system can be assured that
a is an Int.
You need some sort of data of type IsEq a b with functions:
refl :: IsEq a a
to :: IsEq a b -> a -> b
sym :: IsEq a b -> IsEq b a
trans :: IsEq a b -> IsEq b c -> IsEq a c
inj :: IsEq (f a) (f b) -> IsEq a b
If you have to and sym you also get
from :: IsEq a b -> b -> a.
From all of this, we can recover our original
IsEq a Word -> Word -> a and
IsEq a Word -> a -> Word functions, saving us from having to
store two separate functions.
Your language of choice might already have this IsEq. But one of
the more interesting ways to me is Leibniz equality (discussed a lot in this
Ryan Scott post), which works in languages with higher-kinded polymorphism.
Leibniz equality in languages with higher-kinded polymorphism means that
a and b are equal if
forall p. p a -> p b: any property of a is also
true of b.
In Haskell, we write this like:
newtype Leibniz a b = Leibniz (forall p. p a -> p b)

refl :: Leibniz a a
refl = Leibniz id
The only possible way to construct a ‘Leibniz’ is with both type parameters
being the same: You can only ever create a value of type
Leibniz a a, never a value of Leibniz a b where
b is not a.
You can prove that this is actually equality by writing functions
Leibniz a b -> Leibniz b a and
Leibniz a b -> Leibniz b c -> Leibniz a c (this
Ryan Scott post goes over it well), but in practice we realize this equality
by safely coercing a and b back and forth:
newtype Identity a = Identity { runIdentity :: a }

to :: Leibniz a b -> a -> b
to (Leibniz f) = runIdentity . f . Identity

newtype Op a b = Op { getOp :: b -> a }

from :: Leibniz a b -> b -> a
from (Leibniz f) = getOp (f (Op id))
So, if your language supports higher-kinded Rank-2 types, you have a
solution!
There are other solutions in other languages, but they will usually all be
language-dependent.
Let’s write everything in purescript. The key difference is we use
map (to isNumber) :: Array a -> Array Number, etc., to turn our
Array a into an array of a type we know how to work with.
import Text.Printf

newtype Leibniz a b = Leibniz (forall p. p a -> p b)

to :: Leibniz a b -> a -> b
from :: Leibniz a b -> b -> a

data NType a
    = NInt (Leibniz a Int)
    | NNumber (Leibniz a Number)
    | NPercent (Leibniz a Percent)

type AxisBounds a =
    { minValue :: a
    , minLabel :: String
    , maxValue :: a
    , maxLabel :: String
    }

displayNumericAxis :: NType a -> Array a -> AxisBounds a
displayNumericAxis = case _ of
    NInt isInt -> \xs ->
      let
        xMin = minimum $ map (to isInt) xs
        xMax = maximum $ map (to isInt) xs
        showInt = show
      in
        { minValue: xMin
        , minLabel: showInt xMin
        , maxValue: xMax
        , maxLabel: showInt xMax
        }
    NNumber isNumber -> \xs ->
      let
        xMin = minimum $ map (to isNumber) xs
        xMax = maximum $ map (to isNumber) xs
        showFloat = printf (Proxy :: Proxy "%.4f")   -- it works a little differently
      in
        { minValue: xMin
        , minLabel: showFloat xMin
        , maxValue: xMax
        , maxLabel: showFloat xMax
        }
    NPercent isPercent -> \xs ->
      let
        xMin = minimum $ map (to isPercent) xs
        xMax = maximum $ map (to isPercent) xs
        showPercent = printf (Proxy :: Proxy "%.1f%%") <<< (_ * 100.0)
      in
        { minValue: xMin
        , minLabel: showPercent xMin
        , maxValue: xMax
        , maxLabel: showPercent xMax
        }
To work with our [a] as if it were [Int], we have to map the coercion function
that our Leibniz a Int gives us over it. Admittedly, this naive way adds the
runtime cost of copying the array, but we could be more creative and find the
minimum and maximum this way in constant space with no extra allocations.
And, if we wanted to outsource this to the javascript FFI, remember that
javascript doesn’t quite have sum types, so we can create a quick visitor:
type NVisitor a r =
    { nvInt     :: Leibniz a Int -> r
    , nvNumber  :: Leibniz a Number -> r
    , nvPercent :: Leibniz a Percent -> r
    }

type NAccept a = forall r. NVisitor a r -> r

toAccept :: NType a -> NAccept a
toAccept = case _ of
    NInt isInt         -> \nv -> nv.nvInt isInt
    NNumber isNumber   -> \nv -> nv.nvNumber isNumber
    NPercent isPercent -> \nv -> nv.nvPercent isPercent

foreign import _formatNumeric :: forall a. Fn2 (NAccept a) a String

formatNumeric :: NType a -> a -> String
formatNumeric nt = runFn2 _formatNumeric (toAccept nt)
Admittedly in the javascript we are throwing away the “GADT type safety”
because we throw away the equality. But we take what we can — we at least retain
the visitor pattern for sum-type type safety and exhaustiveness checking. I
haven’t done this in typescript yet so there might be a way to formalize Leibniz
equality to do this in typescript and keep the whole chain type-safe from top to
bottom.
Higher-Kinded Eliminators
This is essentially the higher-kinded version of the visitor pattern, except
in dependent type theory these visitors are more often called “eliminators” or
destructors, which is definitely a cooler name.
In the normal visitor you’d have:
data User = TheAdmin | Member Int

data UserHandler r = UH
    { uhTheAdmin :: r
    , uhMember   :: Int -> r
    }
But note that if you have the right set of continuations, you have something
that is essentially equal to User without having to actually use
User:
type User' = forall r. UserHandler r -> r

fromUser :: User -> User'
fromUser = \case
    TheAdmin      -> \UH{..} -> uhTheAdmin
    Member userId -> \UH{..} -> uhMember userId

toUser :: User' -> User
toUser f = f $ UH { uhTheAdmin = TheAdmin, uhMember = Member }
This means that User is actually equivalent to
forall r. UserHandler r -> r: they’re the same type, so if your
language doesn’t have sum types, you could encode it as
forall r. UserHandler r -> r instead. Visitors, baby.
But, then, what actually does the r type variable represent
here, semantically? Well, in a UserHandler r, r is the
“target” that we interpret into. But there’s a deeper relationship between
r and User: A UserHandler r essentially
“embeds” a User into an r. And, a
UserHandler r -> r is the application of that embedding to an
actual User.
If we pick r ~ (), then UserHandler () embeds
User into (). If we pick r ~ String, then
UserHandler String embeds User into String
(like, “showing” it). And if we pick r ~ User, a
UserHandler User embeds a User into…itself?
So here, r is essentially the projection that we view the user
through. And by making sure we are forall r. UserHandler r -> r
for all r, we ensure that we do not lose any information:
the embedding is completely 1-to-1. It lets you “create” the User
faithfully in a “polymorphic” way.
In fact, to hammer this home, some people like to use the name of the type as
the type variable: UserHandler user:
-- | The same thing as before but with things renamed to prove a point
data MakeUser user = MakeUser
    { uhTheAdmin :: user
    , uhMember   :: Int -> user
    }

type User' = forall user. MakeUser user -> user
The forall user. lets us faithfully “create” a User
within the system we have, without actually having a User data
type. Essentially we can imagine the r in the forall r
as “standing in” for User, even if that type doesn’t actually
exist.
Now, here’s the breakthrough: If we can use forall (r :: Type)
to substitute for User :: Type, how about we use a
forall (p :: Type -> Type) to substitute for a
Scale :: Type -> Type?
data Scale :: Type -> Type where
    ScaleDate   :: Scale Date
    ScaleLinear :: Bool -> NType a -> Scale a
    ScaleLog    :: NType a -> Scale a

data ScaleHandler p a = SH
    { shDate   :: p Date
    , shLinear :: Bool -> NType a -> p a
    , shLog    :: NType a -> p a
    }

type Scale' a = forall p. ScaleHandler p a -> p a

fromScale :: Scale a -> Scale' a
fromScale = \case
    ScaleDate              -> \SH{..} -> shDate
    ScaleLinear hasZero lt -> \SH{..} -> shLinear hasZero lt
    ScaleLog nt            -> \SH{..} -> shLog nt

toScale :: Scale' a -> Scale a
toScale f = f $ SH { shDate = ScaleDate, shLinear = ScaleLinear, shLog = ScaleLog }
So in our new system, forall p. ScaleHandler p a -> p a is
identical to Scale: we can use p a to substitute in
Scale in our language even if our language itself cannot support
GADTs.
So let’s write formatNType in purescript. We no longer have an
actual Scale sum type, but its higher-kinded church encoding:
type NType a =
    forall p.
    { int :: p Int
    , number :: p Number
    , percent :: p Percent
    } -> p a

type Scale a =
    forall p.
    { date :: p Date
    , linear :: Bool -> NType a -> p a
    , log :: NType a -> p a
    } -> p a

ntInt :: NType Int
ntInt nth = nth.int

ntNumber :: NType Number
ntNumber nth = nth.number

ntPercent :: NType Percent
ntPercent nth = nth.percent

formatNType :: NType a -> a -> String
formatNType nt = f
  where
    Op f = nt
      { int: Op show
      , number: Op $ printf (Proxy :: Proxy "%.4f")
      , percent: Op $ printf (Proxy :: Proxy "%.1f%%") <<< (_ * 100.0)
      }
Here we are using
newtype Op b a = Op (a -> b)
as our “target”: turning an NType a into an
Op String a. And an Op String a is an
a -> String, which is what we wanted! The int field
is Op String Int, the number field is
Op String Number, etc.
In many languages, using this technique effectively requires having a newtype
wrapper on-hand, so it might be unwieldy in non-trivial situations. For example,
if we wanted to write our previous axis function which is
NType a -> [a] -> String, we’d have to have a newtype wrapper
for [a] -> String that has a as its argument:
newtype OpList b a = OpList ([a] -> b)
or you could re-use Compose:
newtype Compose f g a = Compose (f (g a))
and your p projection type would be Compose (Op String) [].
So, you don’t necessarily have to write a bespoke newtype wrapper, but you do
have to devote some brain cycles to think it through (unless you’re in a
language that doesn’t need newtype wrappers to have this work, like we’ll
discuss later).
By the way, this method generalizes well to multiple arguments: if you have a
type like MyGADT a b c, you just need to project into a
forall (p :: k1 -> k2 -> k3 -> Type).
I believe I have read somewhere that the two methods discussed here (runtime
equality witness vs. higher-kinded eliminator) are not actually fully identical
in their power, and there are GADTs where one would work and not the other … but
I can’t remember where I read this and I’m also not big-brained enough to figure
out what those situations are. But if you, reader, have any idea, please let me
know!
Existential Types
Let’s take a quick break to talk about something that’s not
technically related to GADTs but is often used alongside them.
What if we wanted to store a value with its NType and hide the
type variable? In Haskell we’d write this like:
data NType :: Type -> Type where
    NInt     :: NType Int
    NDouble  :: NType Double
    NPercent :: NType Percent

data SomeNType = forall a. SomeNType (NType a) a

formatNType :: NType a -> a -> String
formatNType nt x = ...

formatSomeNType :: SomeNType -> String
formatSomeNType (SomeNType nt x) = formatNType nt x

myFavoriteNumbers :: [SomeNType]
myFavoriteNumbers = [SomeNType NInt 3, SomeNType NDouble pi]
But what if our language doesn’t have existentials? Remember, this is
basically a value SomeNType that isn’t itself generic (it has no type
parameter), but contains both an NType a and an a that share the
same type variable.
One strategy we have available is to CPS-transform our existentials into
their CPS form (continuation-passing style form). Basically, we write exactly
what we want to do with our contents if we pattern matched on them.
It’s essentially a Rank-N visitor pattern with only a single constructor:
type SomeNType = forall r. (forall a. NType a -> a -> r) -> r

someNType :: NType a -> a -> SomeNType
someNType nt x = \f -> f nt x

formatSomeNumeric :: SomeNType -> String
formatSomeNumeric snt = snt \nt x -> formatNumeric nt x
You can imagine, syntactically, that snt acts as its “own”
pattern match, except instead of matching on
SomeNType nt x -> .., you “match” on
\nt x -> ..
This general pattern works for languages with traditional generics like Java
too:
interface SomeNTypeVisitor<R> {
  <A> R visit(NType<A> nt, A val);
}

interface SomeNType {
  public abstract <R> R accept(SomeNTypeVisitor<R> visitor);

  // One option: the factory method
  public static <A> SomeNType someNType(NType<A> nt, A val) {
    return new SomeNType() {
      @Override
      public <R> R accept(SomeNTypeVisitor<R> visitor) {
        return visitor.visit(nt, val);
      }
    };
  }
}

// Second option: the subtype hiding a type variable, which you have to always
// make sure to upcast into `SomeNType` after creating
class SomeNTypeImpl<A> implements SomeNType {
  private NType<A> nt;
  private A val;

  public SomeNTypeImpl(NType<A> nt, A val) {
    this.nt = nt;
    this.val = val;
  }

  @Override
  public <R> R accept(SomeNTypeVisitor<R> visitor) {
    return visitor.visit(nt, val);
  }
}
Does…anyone write java like this? I tried committing this once while at
Google and I got automatically flagged to be put on a PIP.
Recursive GADTs
The climax of this discussion: what if your language does not support GADTs
or recursive data types?
We’re going to be using dhall as an example again, but note that the
lessons applied here are potentially useful even when you do have
recursive types: we’re going to be talking about a higher-kinded church
encoding, which can be a useful form of your data types that live alongside your
normal recursive ones.
Let’s imagine Expr as a GADT, where Expr a
represents an Expr that evaluates to an a:
data Expr :: Type -> Type where
    NatLit  :: Natural -> Expr Natural
    BoolLit :: Bool -> Expr Bool
    Add     :: Expr Natural -> Expr Natural -> Expr Natural
    LTE     :: Expr Natural -> Expr Natural -> Expr Bool
    Ternary :: Expr Bool -> Expr a -> Expr a -> Expr a

eval :: Expr a -> a
eval = \case
    NatLit n      -> n
    BoolLit b     -> b
    Add x y       -> eval x + eval y
    LTE a b       -> eval a <= eval b
    Ternary b x y -> if eval b then eval x else eval y
Adding this type variable ensures that our Expr is type-safe:
it’s impossible to Add an Expr Bool, and the two
branches of a Ternary must have the same result type, etc. And, we
can write eval :: Expr a -> a and know exactly what type will be
returned.
Now, let’s combine the two concepts: First, the church encoding, where our
handlers take the “final result” of our fold r instead of the
recursive value Expr. Second, the higher-kinded eliminator pattern
where we embed Expr :: Type -> Type into
forall (p :: Type -> Type).
Again, now instead of add taking Expr, it takes
p Natural: the “Natural result of the fold”.
p not only stands in for what we embed Expr into, it
stands in for the result of the recursive fold. That’s why in eval,
the first arguments of add are the Natural results of
the sub-evaluation.
These values can be created in the same way as before, merging the two
techniques, sending the handlers downstream:
If all of this is difficult to parse, try reviewing both the recursive ADT
section and the higher-kinded eliminator section and making sure you understand
both well before tackling this, which combines them together!
Admittedly in Haskell (and purescript) this is a lot simpler because we don’t
have to explicitly pass in type variables:
data ExprF p = ExprF
    { natLit  :: Natural -> p Natural
    , boolLit :: Bool -> p Bool
    , add     :: p Natural -> p Natural -> p Natural
    , ternary :: forall a. p Bool -> p a -> p a -> p a
    }

type Expr a = forall p. ExprF p -> p a

eval :: Expr a -> a
eval e = runIdentity $ e ExprF
    { natLit  = Identity
    , boolLit = Identity
    , add     = \(Identity x) (Identity y) -> Identity (x + y)
    , ternary = \(Identity b) (Identity x) (Identity y) -> if b then Identity x else Identity y
    }

ternary :: Expr Bool -> Expr a -> Expr a -> Expr a
ternary b x y handlers = handlers.ternary (b handlers) (x handlers) (y handlers)
But one nice thing about the dhall version that’s incidental to dhall is that
it doesn’t require any extra newtype wrappers like the Haskell one does. That’s
because type inference tends to choke on things like this, but dhall doesn’t
really have any type inference: all of the types are passed explicitly. It’s one
of the facts about dhall that make it nice for things like this.
Congratulations
In any case, if you’ve made it this far, congratulations! You are a master of
ADTs and GADTs. Admittedly every language is different, and some of these
solutions have to be tweaked for the language in question. And, if your program
gets very complicated, there is a good chance that things will become
ergonomically unfeasible.
But I hope, at least, that this inspires your imagination to try to bring
your haskell principles, techniques, standards, practices, and brainrot into the
language of your choice (or language you are forced to work with).
And, if you ever find interesting ways to bring these things into a language
not discussed here (or a new interesting technique or pattern), I would
absolutely love to hear about it!
Until next time, happy “Haskelling”!
Special Thanks
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
I bet you thought there was going to be some sort of caveat in this
footnote, didn’t you?↩︎
I didn’t think I’d ever write “java bean” non-ironically on my
blog, but there’s a first time for everything.↩︎
Be aware that this implementation is not necessarily
appropriately lazy or short-circuiting in Ternary: it might
evaluate both branches before returning the chosen one.↩︎
To visit a tree or graph in breadth-first order, there are two main
implementation approaches: queue-based or level-based.
Our goal here is to develop a level-based approach where the levels of
the breadth-first walk are constructed compositionally and dynamically.
Compositionality means that for every node, its descendants—the other nodes
reachable from it—are defined by composing the descendants of its children.
Dynamism means that the children of a node are generated only when that node
is visited; we will see that this requirement corresponds to asking for a
monadic unfold.
A prior solution, using the Phases applicative functor,
is compositional but not dynamic in that sense. The essence of Phases
is a zipping operation in free applicative functors.
What if we did zipping in free monads instead?
A breadth-first walk explores the tree level by level; every level contains the
nodes at the same distance from the root. The list of levels of a tree can be defined
recursively—it is a fold. For a tree Node x l r, the first level contains
just the root node x, and the subsequent levels are obtained by appending the
levels of the subtrees l and r pairwise.
(We can’t just use zipWith because it throws away the end of a list when the
other list is empty.)
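A minimal sketch of that definition, assuming the tree type data Tree a = Leaf | Node a (Tree a) (Tree a) used in this post (the helper name appendLevels is mine):

data Tree a = Leaf | Node a (Tree a) (Tree a)

levels :: Tree a -> [[a]]
levels Leaf = []
levels (Node x l r) = [x] : appendLevels (levels l) (levels r)

-- Like zipWith (++), but keeps the tail of the longer list instead of dropping it.
appendLevels :: [[a]] -> [[a]] -> [[a]]
appendLevels [] yss = yss
appendLevels xss [] = xss
appendLevels (xs : xss) (ys : yss) = (xs ++ ys) : appendLevels xss yss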
Finally, we concatenate the levels together to obtain the list of nodes in
breadth-first order.
toListBF :: Tree a -> [a]
toListBF = concat . levels
Thanks to laziness, the list will indeed be produced by walking the tree in
breadth-first order.
So far so good.
The above function lets us fold a tree in breadth-first order.
The next level of difficulty is to traverse a tree, producing a tree
with the same shape as the original tree, only with modified labels.
This has the exact same type as traverse, which you might obtain with
deriving (Foldable, Traversable). The stock-derived Traversable—enabled
by the DeriveTraversable extension—is a depth-first traversal, but the laws
of traverse don’t specify the order in which nodes should be visited,
so you could make it a breadth-first traversal if you wanted.
“Breadth-first numbering” is a special case of “breadth-first traversal”
where the arrow (a -> m b) is specialized to a counter.
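As a sketch of that specialization (assuming the Tree type and the breadth-first traverseBF discussed below; the name numberBF is mine):

import Control.Monad.State (evalState, state)

-- Numbering is traversing with a counter: each visited node receives the next
-- integer, so the labels follow the order in which nodes are visited.
numberBF :: Tree a -> Tree Int
numberBF t = evalState (traverseBF (\_ -> state (\n -> (n, n + 1))) t) 0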
Okasaki presents a “numbering” solution based on queues and another solution
based on levels.
Both are easily adaptable to the more general “traversal” problem as we will
soon see.
There is a wonderful Discourse thread from 2024 on the topic of
breadth-first traversals.
The first post gives an elegant breadth-first numbering algorithm
which also appears in the appendix of Okasaki’s paper,
but sadly it does not generalize from “numbering” to
“traversal” beyond the special case m = State s.
Last but not least, another level-based solution to the breadth-first traversal
problem can be found in the
tree-traversals library by Noah Easterly.
It is built around an applicative transformer named Phases,
which is a list of actions—imagine the type “[m _]”—where each
element m _ represents one level of the tree.
The Phases applicative enables a compositional definition of a
breadth-first traversal, similarly to the levels function above:
the set of nodes reachable from the root is defined by combining the sets of
nodes reachable from its children. This concern of compositionality
is one of the main motivations behind this post.
Non-standard terminology
The broad family of algorithms being discussed is typically called
“breadth-first search” (BFS) or “breadth-first traversal”,
but in general these algorithms are not “searching” for anything,
and in Haskell, “traversal” is reserved for “things like traverse”.
Instead, this post will use “walks” as a term encompassing folds, traversals,
unfolds, or any concept that can be qualified with “breadth-first”.
Problem statement: Breadth-first unfolds
Both the fold toListBF and the traversal traverseBF have in common that they receive a tree as input.
receive a tree as an input. This explicit tree makes the notion of levels
“static”. With unfolds, we will have to deal with levels that exist only
“dynamically” as the result of unfolding the tree progressively.
To introduce the unfolding of a tree, it is convenient to introduce its “base
functor”. We modify the tree type by replacing the recursive tree fields with
an extra type parameter:
An unfold generates a tree from a seed and a
function which expands the seed into a leaf or a node containing more seeds.
A pure unfold—or anamorphism—can be defined readily:
The order in which nodes are evaluated depends on
how the resulting tree is consumed. Hence unfold
is neither inherently “depth-first” nor “breadth-first”.
The situation changes if we make the unfold monadic.
unfoldM :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
An implementation of unfoldM must decide upon an ordering between actions.
To see why adding an M to unfold imposes an ordering,
contemplate the fact that these expressions have the same meaning:
Node a (unfold f l) (unfold f r)
= ( let tl = unfold f l in
let tr = unfold f r in
Node a tl tr )
= ( let tr = unfold f r in
let tl = unfold f l in
Node a tl tr )
whereas these monadic expressions do not have the same meaning in general:
( unfoldM f l >>= \tl ->
unfoldM f r >>= \tr ->
pure (Node a tl tr) )
/=
( unfoldM f r >>= \tr ->
unfoldM f l >>= \tl ->
pure (Node a tl tr) )
Without further requirements, there is an “obvious” definition of unfoldM,
which is a depth-first unfold:
We unfold the left subtree l fully before unfolding the right one r.
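A sketch of that obvious definition, assuming the TreeF base functor has constructors LeafF and NodeF a s s (the name unfoldM_DF is mine):

unfoldM_DF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_DF f s = do
  step <- f s
  case step of
    LeafF       -> pure Leaf
    NodeF a l r -> do
      tl <- unfoldM_DF f l   -- fully unfold the left subtree first,
      tr <- unfoldM_DF f r   -- only then the right one
      pure (Node a tl tr)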
The problem is to define a breadth-first unfoldM.
If you want to think about this problem on your own, you can stop reading here.
The rest of this post presents solutions.
Queue-based unfold
The two breadth-first numbering algorithms in Okasaki’s paper can
actually be generalized to breadth-first unfolds.
Here is the first one that uses queues (using the function (<+) for “push” and
pattern-matching on (:>) for “pop”):
If you’re frowning upon the use of error—as you should be—you can replace
error with dummy values here (Empty, Leaf), but
(1) that won’t be possible with tree structures that must be non-empty
(e.g., if Leaf contained a value) and (2) this is dead code, which
is harmless but no more elegant than making it obvious with error.
The correctness of this solution is also not quite obvious.
There are subtle ways to get this implementation wrong:
should the recursive call be b2 <+ b1 <+ q or b1 <+ b2 <+ q?
Should the pattern be p :> t1 :> t2 or p :> t2 :> t1?
For another version of this challenge, try implementing the unfold for another
tree type, such as finger trees or rose trees, without getting lost in the
order of pushes and pops (by the way, this is Data.Tree.unfoldTreeM_BF in
containers). The invariant is not complex but there is room for mistakes.
I believe that the compositional approach that will be presented later is more
robust on that front, although that is admittedly a subjective quality for which
it is difficult to make a strong case.
Some uses of unfolds
Traversals from unfolds
One sense in which unfoldM is a more difficult problem than traverse is
that we can use unfoldM to implement traverse.
We do have to make light of the technicality that there is a Monad constraint
instead of Applicative, which makes unfoldM not suited to implement the
Traversable class.
A depth-first unfold gives a depth-first traversal:
We can use a tree unfold to explore a graph.
This usage distinguishes unfolds from folds and traversals,
which only let you explore trees.
Given a type of vertices V, a directed graph is represented by a function
V -> F V, where F is a functor which describes the arity of each node.
The obvious choice for F is lists, but we will stick to TreeF here
so we can just reuse this post’s unfoldM implementations.
The TreeF functor restricts us to graphs where each node has zero or two
outgoing edges; it is a weird restriction, but we will make do for the sake of
example.
An ASCII drawing of a graph
+-------+
v |
+->1--->2--->3 |
| | | ^ |
| v v | |
| 4--->5--->6--+
| | | ^
| +----|----+
| |
+-------+
The graph drawn above turns into the following function, where every vertex
is mapped either to NodeF with the same vertex as the first argument followed
by its two adjacent vertices, or to LeafF if it has no outgoing edges or does
not belong to the graph.
If we simply feed that function to unfold, we will get the infinite tree
of all possible paths from a chosen starting vertex.
To obtain a finite tree, we want to keep track of vertices that we have
already visited, using a stateful memory. The following function wraps graph,
returning LeafF also if a vertex has already been visited.
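A sketch of such a wrapper, assuming graph :: Int -> TreeF Int Int as above and a State monad carrying the set of visited vertices:

import Control.Monad.State (State, get, put)
import qualified Data.Set as Set

visitGraph :: Int -> State (Set.Set Int) (TreeF Int Int)
visitGraph v = do
  visited <- get
  if v `Set.member` visited
    then pure LeafF                  -- already visited: stop expanding here
    else do
      put (Set.insert v visited)
      pure (graph v)                 -- expand v into its successors (or LeafF)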
Applying unfoldM_BF to that function produces a “breadth-first tree”
of the graph, an encoding of the trajectory of a breadth-first walk through the
graph. “Breadth-first trees” are a concept from graph theory with well-studied
properties.
-- Visit `graph` in breadth-first order
bfGraph_Q :: Int -> Tree Int
bfGraph_Q = (`evalState` Set.empty) . unfoldM_BF_Q visitGraph
This post is a compilable Literate Haskell file. You can run all of the tests
and benchmarks in here. The source repository provides the necessary
configuration to build it with cabal.
$ cabal build breadth-first-unfolds
Test cases can then be selected with the -p option and a pattern
(see the tasty documentation for details).
Run all tests and benchmarks by passing no option.
$ cabal exec breadth-first-unfolds -- -p "/Q-graph/||/S-graph/"
All
Q-graph: OK
S-graph: OK
“Global” level-based unfold
The other solution from Okasaki’s paper can also be adapted into a monadic unfold.
The starting point is to unfold a list of seeds [s] instead of a single seed:
we can traverse the list with the expansion function s -> m (TreeF a s) to
obtain another list of seeds, the next level of the breadth-first unfold,
and keep going.
Iterating this process naively yields a variant of monadic unfold without a
result. This no-result variant can be generalized from TreeF to
any foldable structure:
Modifying this solution to create the output tree requires a little more thought.
We must keep hold of the intermediate list of ts :: [TreeF a s] to
reconstruct trees after the recursive call returns.
This solution is less brittle than the queue-based solution because
we always traverse lists left-to-right.
To avoid the uses of error in reconstruct,
you can probably create a specialized data structure in place of [TreeF a s],
but that is finicky in its own way.
In search of compositionality
Both of the solutions above (the queue-based and the “monolithic” level-based unfolds)
stem from a global view of breadth-first walks: we are iterating on a list or a
queue which holds all the seeds from one or two levels at a time.
That structure represents a “front line” between visited and unvisited
vertices, and every iteration advances the front line a little: with a queue we
advance it one vertex at a time, with a list we advance the whole front line
in an inner loop—one call to traverse—before recursing.
The opposite local view of breadth-first order is exemplified by the earlier
levels function: it only produces a list of lists of the vertices
reachable from the current root. It does so recursively, by composing
together the vertices reachable from its children. Our goal here is to find a
similarly local, compositional implementation of breadth-first unfolds.
Rather than defining unfoldM directly, which sequences the computations on
all levels into a single computation, we will introduce an intermediate
function weave that keeps levels separate—just as toListBF is defined
using levels.
The result of weave will be in an as yet unknown applicative functor F m
depending on m.
And because levels are kept separate, weave only needs
a constraint Applicative m to compose computations on the same level.
The goal is to implement this signature, where the result type F is also an
unknown:
With only what we know so far, a bit of type-directed programming leads to the
following incomplete definition. We have constructed something of type
m (F m (Tree a)), while we expect F m (Tree a):
To fill the hole _, we postulate the following primitive, weft,
as part of the unknown definition of F:
weft :: Applicative m => m (F m a) -> F m a
Intuitively, F m represents “multi-level computations”.
The weft function constructs a multi-level (F m)-computation from
one level of m-computation which returns the subsequent levels
as an (F m)-computation.
We fill the hole with weft, completing the definition of weave:
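As a sketch, treating F m as the applicative functor stipulated above:
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = weft (weaveF <$> f s)
  where
    weaveF LeafF         = pure Leaf
    weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)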
The function weave defines a multi-level computation which represents
a breadth-first walk from a seed s:
the first level of the walk is f s, expanding the initial seed;
the auxiliary function weaveF constructs the remaining levels from
the initial seed’s expansion:
if the seed expands to LeafF, there are no more seeds,
and we terminate with an empty computation (pure);
if the seed expands to NodeF, we obtain two sub-seeds l and r,
they generate their own weaves recursively (weave f l and weave f r),
and we compose them (liftA2).
One way to think about weft is as a generalization of the following primitives:
we can “embed” m-computations into F m,
and we can “delay” multi-level (F m)-computations, shifting the
m-computation on each level to the next level.
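As a sketch, both primitives are definable from weft (assuming the signature above):
embed :: Applicative m => m a -> F m a
embed u = weft (pure <$> u)  -- run u on the first level; nothing afterwards

delay :: Applicative m => F m a -> F m a
delay v = weft (pure v)      -- an empty first level, then v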
The key law relating these two operations is that embedded computations
and delayed computations commute with each other:
embed u *> delay v = delay v <* embed u
The embed and delay operations are provided by the Phases applicative
functor that I mentioned earlier, which enables breadth-first traversals,
but not breadth-first unfolds. Thus, weft is a strictly more expressive
primitive than embed and delay.
Eventually, we will run a multi-level computation as a single m-computation
so that we can use weave to define unfoldM. The runner function will be
called mesh:
mesh :: Monad m => F m a -> m a
It is characterized by this law which says that mesh executes the first
level of the computation u :: m (F m a), then executes the remaining levels
recursively:
mesh (weft u) = u >>= mesh
Putting everything together, weave and mesh combine into a breadth-first unfold:
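A sketch of the assembled unfold (the name is assumed; the final version is tweaked slightly further below):
unfoldM_BF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF f = mesh . weave f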
It remains to find an applicative functor F equipped with weft and mesh.
The weave applicative
A basic approach to design a type is to make some of the operations it
should support into constructors. The weave applicative WeaveS has
constructors for pure and weft:
data WeaveS m a = EndS a | WeftS (m (WeaveS m a))
(The suffix “S” stands for Spoilers. Read on!)
We instantiate the unknown functor F with WeaveS.
type F = WeaveS
Astute readers will have recognized WeaveS as the free monad.
Just as Phases has the same type definition as the free applicative functor but
a different Applicative instance, we will give WeaveS an Applicative
instance that does not coincide with the Applicative and Monad instances of
the free monad.
Starting with the easy functions,
weft is WeftS, and the equation for mesh above is basically its definition.
We just need to add an equation for EndS.
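As a sketch:
-- The WeftS equation is the mesh law above; EndS just returns its value.
mesh :: Monad m => WeaveS m a -> m a
mesh (EndS a)  = pure a
mesh (WeftS u) = u >>= mesh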
Recall that WeaveS represents multi-level computations.
Computations are composed level-wise with the following liftS2.
The interesting case is the one where both arguments are WeftS: we compose
the first level with liftA2, and the subsequent ones with liftS2
recursively.
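A sketch of liftS2 along those lines, together with the evident Functor instance (the EndS clauses fall back on fmap, a detail that matters shortly):
instance Functor m => Functor (WeaveS m) where
  fmap g (EndS a)  = EndS (g a)
  fmap g (WeftS u) = WeftS (fmap g <$> u)

liftS2 :: Applicative m => (a -> b -> c) -> WeaveS m a -> WeaveS m b -> WeaveS m c
liftS2 g (EndS a)  wb        = g a <$> wb
liftS2 g wa        (EndS b)  = (`g` b) <$> wa
liftS2 g (WeftS u) (WeftS v) = WeftS (liftA2 (liftS2 g) u v)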
liftS2 will be the liftA2 in WeaveS’s Applicative instance.
The Functor and Applicative instances show that WeaveS is an
applicative transformer: for every applicative functor m,
WeaveS m is also an applicative functor.
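A sketch of weaveS consistent with the description that follows:
weaveS :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveS m (Tree a))
weaveS f s = f s <&> \case
  LeafF       -> EndS Leaf
  NodeF a l r -> WeftS (liftA2 (liftS2 (Node a)) (weaveS f l) (weaveS f r))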
The outer weft constructor was moved into the recursive calls.
The result type has an extra m, which makes it more apparent that
we always start with a call to f. It’s the same vibe as replacing the type
[a] with NonEmpty a when we know that a list will always have at least one
element; weaveS always produces at least one level of computation.
We also replace (<$>) with its flipped version (<&>) for aesthetic reasons:
we can apply it to a lambda without parentheses, and that change makes the
logic flow naturally from left to right: we first expand the seed s using
f, and continue depending on whether the expansion produced LeafF or NodeF.
To define unfoldM, instead of applying mesh directly, we chain it with
(>>=).
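A sketch, with the name chosen by analogy with unfoldM_BF_Q:
unfoldM_BF_S :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_S f s = weaveS f s >>= mesh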
That solution is Obviously Correct™, but it has a terrible flaw:
it does not run in linear time!
We can demonstrate this by generating a “thin” tree whose height
is equal to its size.
The height h is the seed of the unfolding, and we generate a NodeF as long
as it is non-zero, asking for a decreased height h - 1 on the right,
and a zero height on the left.
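A plausible shape for that generator, assuming the Identity monad and the unfoldM_BF_S name from the sketch above:
thinTreeS :: Int -> Tree ()
thinTreeS = runIdentity . unfoldM_BF_S (Identity . f)
  where
    f 0 = LeafF
    f h = NodeF () 0 (h - 1)  -- left subtree stops immediately; right keeps going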
$ cabal exec breadth-first-unfolds -- -p "S-thin"
All
S-thin
1x: OK
27.6 μs ± 2.6 μs, 267 KB allocated, 317 B copied, 6.0 MB peak memory
10x: OK
2.90 ms ± 181 μs, 23 MB allocated, 178 KB copied, 7.0 MB peak memory, 105.35x
Multiplying the height by 10x makes the function run 100x slower.
Dramatically quadratic.
Complexity analysis
We can compare this implementation with level from earlier, which is linear-time.
In particular, looking at zipLevels with liftS2—which play similar
roles—there is a crucial difference when one of the arguments is empty
([] or EndS):
zipLevels simply returns the other argument, whereas liftS2 calls (<$>),
continuing the recursion down the other argument.
So zipLevels stops working after reaching the end of either argument, whereas
liftS2 walks to the end of both arguments. There is at least one
call to liftS2 on every level which will walk to the bottom of the tree,
so we get a quadratic lower bound Ω(height²).
Out of sight, out of mind
The problematic combinators are fmap and liftS2, which weaveS uses to
construct the unfolded tree. If we don’t care about that tree—wanting only
the effect of a monadic unfold—then we can get rid of the complexity
associated with those combinators.
With no result to return, we remove the a type parameter from the definition
of WeaveS, yielding the oblivious (“O”) variant:
data WeaveO m = EndO | WeftO (m (WeaveO m))
We rewrite mesh into meshO, reducing a WeaveO m computation
into m () instead of m a.
To implement a breadth-first walk, we modify weaveS above by replacing
liftA2 (Node a) with (<>). Note that the type parameter a is no longer in
the result. It was only used in the tree that we decided to forget.
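A sketch of the oblivious pieces; the Semigroup instance for WeaveO m is an assumption on my part, zipping levels and, crucially, returning the other argument as soon as one side is EndO:
instance Applicative m => Semigroup (WeaveO m) where
  EndO    <> w       = w
  w       <> EndO    = w
  WeftO u <> WeftO v = WeftO (liftA2 (<>) u v)

meshO :: Monad m => WeaveO m -> m ()
meshO EndO      = pure ()
meshO (WeftO u) = u >>= meshO

weaveO :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveO m)
weaveO f s = f s <&> \case
  LeafF       -> EndO
  NodeF _ l r -> WeftO (liftA2 (<>) (weaveO f l) (weaveO f r))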
Running weaveO with meshO yields an oblivious monadic unfold:
it produces m () instead of m (Tree a).
(You may remember seeing another implementation of that same signature
just earlier, unfoldM_BF_G_.)
Previously, we benchmarked the function thinTreeS that outputs a tree by
forcing the tree. With an oblivious unfold, there is no tree to force.
Instead we will count the number of generated NodeF constructors:
thinTreeO :: Int -> Int
thinTreeO = (`execState` 0) . unfoldM_BF_O_ (state . f)
  where
    f 0 counter = (LeafF, counter)
    f h counter = (NodeF () 0 (h - 1), counter + 1)  -- increment the counter for every NodeF
We adapt the benchmark from before to measure the complexity of
unfolding thin trees. We have to increase the baseline height from 100 to 500
because this benchmark runs so much faster than the previous ones.
$ cabal exec breadth-first-unfolds -- -p O-thin
All
O-thin
1x: OK
148 μs ± 8.3 μs, 543 KB allocated, 773 B copied, 6.0 MB peak memory
10x: OK
1.45 ms ± 113 μs, 5.4 MB allocated, 82 KB copied, 7.0 MB peak memory, 9.78x
The growth is linear, as desired:
the “10x” bench is 10x slower than the baseline “1x” bench.
Laziness for the win
The oblivious unfold avoided quadratic explosion by simplifying the problem.
Now let’s solve the original problem again,
so we can’t just get rid of fmap and liftA2.
As mentioned previously, the root cause was that (1) liftA2 calls fmap when
one of the constructors is EndS, and (2) fmap traverses the other argument.
The next solution will be to make fmap take constant time,
by storing the “mapped function” in the constructor.
Behold the “L” variant of WeaveS, which is a GADT:
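A sketch of a possible layout (the EndL and WeftL names come from later in the post; the second field of WeftL caches the function still to be mapped over the result, so fmap composes functions in constant time):
data WeaveL m a where
  EndL  :: a -> WeaveL m a
  WeftL :: m (WeaveL m b) -> (b -> a) -> WeaveL m a

instance Functor (WeaveL m) where
  fmap g (EndL a)     = EndL (g a)
  fmap g (WeftL wb h) = WeftL wb (g . h)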
The Applicative instance is… a good exercise for the reader.
The details are not immediately important—we only care about improving fmap
for now—we will come back to have a look at the Applicative instance soon.
The runner function meshL is a simple bit of type Tetris.
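A sketch, under the layout assumed above; this is the point where the accumulated function is finally applied with m's own (<$>):
meshL :: Monad m => WeaveL m a -> m a
meshL (EndL a)    = pure a
meshL (WeftL u g) = g <$> (u >>= meshL)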
$ cabal exec breadth-first-unfolds -- -p "L-thin"
All
L-thin
1x: OK
14.1 μs ± 782 ns, 59 KB allocated, 5 B copied, 6.0 MB peak memory
10x: OK
140 μs ± 13 μs, 586 KB allocated, 51 B copied, 6.0 MB peak memory, 9.93x
Lazy in more ways than one
As hinted by the “L” and “S” suffixes,
WeaveL is a “lazy” variant of WeaveS: fmap for WeaveL “postpones”
work by accumulating functions in the WeftL constructor.
That work is “forced” by meshL, which is where the fmap ((<$>)) of the
underlying monad m is called, performing the work accumulated
by possibly many calls to WeaveL’s fmap.
One subtlety is that there are multiple “lazinesses” at play.
The main benefit of using WeaveL is really to delay computation,
that is a kind of laziness, but WeaveL doesn’t need to be
implemented in a lazy language.
We can rewrite all of the code we’ve seen so far in a strict language
with minor changes, and we will still observe the quadratic vs linear behavior
of WeaveS vs WeaveL on thin trees.
The “manufactured laziness” of WeaveL is a concept independent of the
“ambient laziness” in Haskell.
Nevertheless, we can still find an interesting role for that “ambient laziness”
in this story. Indeed, the function weaveL also happens to be lazier than
weaveS in the usual sense.
A concrete test case is worth a thousand words. Consider the following
tree generator which keeps unfolding left subtrees while making
every right subtree undefined:
whnfTreeS :: TestTree
whnfTreeS = expectFail $ testCase "S-whnf" $ do
  case partialTreeS of
    Node _ _ _ -> pure ()        -- Succeed
    Leaf -> error "unreachable"  -- definitely not a Leaf
As it turns out, this test using the “S” variant fails. (That’s
why the test is marked with expectFail.)
Forcing partialTreeS evaluates the undefined in partialTreeF.
Therefore partialTreeS is not equivalent to partialTree.
$ cabal exec breadth-first-unfolds -- -p "L-whnf"
All
L-whnf: OK
This difference can only be seen with “lazy monads”, where (>>=) is
lazy in its first argument.
(If this definition sounds not quite right, that’s probably because of seq.
It makes a precise definition of “lazy monad” more complicated.)
Examples of lazy monads from the transformers library
are Identity, Reader, lazy State, lazy Writer, and Accum.
The secret sauce is the definition of liftA2 for WeaveL:
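A sketch of how that instance might look, under the WeaveL layout assumed earlier (the real definition may differ in details):
instance Applicative m => Applicative (WeaveL m) where
  pure = EndL
  liftA2 f (EndL a) wb = f a <$> wb            -- fmap is now O(1)
  liftA2 f wa (EndL b) = (\a -> f a b) <$> wa
  liftA2 f (WeftL u g) (WeftL v h) =
    -- pair up the underlying results, and stash f under the topmost WeftL
    -- behind a lazy pattern, so its result is reachable without running anything
    WeftL (liftA2 (liftA2 (,)) u v) (\ ~(a, b) -> f (g a) (h b))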
In the third clause of liftA2, we put the function f in a lambda with a
lazy pattern (~(a, b)) directly under the topmost constructor WeftL.
Thus, we can access the result of f from the second field of WeftL
without looking at the first field. In liftS2 earlier, f was
passed as an argument to (liftA2 . liftS2), that forces us to run the
computation before we can get a hold on the result of f.
Maximizing laziness
The “L” variant of unfoldM is lazier than the “S” variant,
but there is still a gap between partialTreeL and the pure partialTree:
if we force not only the root, but also the left subtree of partialTreeL,
then we run into undefined again.
Although the unfold using WeaveL is lazier than using WeaveS,
it is not yet as lazy as it could be.
The reason is that, strictly speaking, WeaveL’s liftA2 is a strict function.
The expansion function partialTreeF produces a level with an undefined
sub-computation, which crashes the whole level.
Each level in a computation will be either completely defined or undefined.
To recap, we’ve been looking at the following trees:
It is natural to ask: can we define a breadth-first unfold that, when applied
to partialTreeF, will yield the same tree as partialTree?
More generally, the new problem is to define a breadth-first unfoldM
whose specialization with the Identity functor is equivalent to
the pure unfold even on partially-defined values. That is, it satisfies
the following equation:
unfold f = runIdentity . unfoldM (Identity . f)
Laziness without end
The strictness of liftA2 is caused by WeaveL having two constructors.
Let’s get rid of EndL.
Wait a second. I spoke too fast, GHC gives us an error:
error: [GHC-87005]
• An existential or GADT data constructor cannot be used
inside a lazy (~) pattern
• In the pattern: WeftE wa g
In the pattern: ~(WeftE wa g)
In an equation for ‘fmap’: fmap f ~(WeftE wa g) = WeftE wa (f . g)
|
641 | > fmap f ~(WeftE wa g) = WeftE wa (f . g)
| ^^^^^^^^^^
The feature we need is “first-class existentials”,
for which there is an open GHC proposal.
Not letting that stop us, there is a simple version of first-class existentials
available in the package some,
as the module Data.Some.Newtype (internally using unsafeCoerce).
That will be sufficient for our purposes.
All we need is an abstract type Some and a pattern synonym:
-- imported from Data.Some.Newtype
data Some f
pattern Some :: f a -> Some f
And we’re back on track. Here comes the actual “E” (endless) variant:
The endless WeaveE enables an even lazier implementation of unfoldM.
When specialized to the identity monad, it lets us force the resulting
tree in any order. The forceLeftTreeE test passes (unlike forceLeftTreeL).
$ cabal exec breadth-first-unfolds -- -p "E-left"
All
E-left: OK
One can also check that forcing the left spine of partialTreeE
arbitrarily deep throws no errors.
We made it lazy, but at what cost?
First, this “Endless” variant only works for lazy monads.
With a strict monad, the runner meshE will loop forever.
It is possible to run things more incrementally by pattern-matching on
WeaveE, but you’re better off using the oblivious WeaveO anyway.
Second, when you aren’t running into an unproductive loop, the “Endless” variant of
unfoldM has quadratic time complexity Ω(height²). The reason
is essentially the same as for the “Strict” variant: liftA2 keeps looping even if
one argument is a pure weave—before, that was to traverse the other
non-pure argument; now, there isn’t even a way to tell when the computation
has ended.
Thus, every leaf may create work proportional to the height of the tree.
Running the same benchmark as before, we measure even more baffling timings:
Using the previous setup comparing a baseline and a 10x run, we see a more than
700x slowdown, so much worse than the 100x predicted by a quadratic model.
Interestingly, the raw output shows that the total cumulative allocations did
grow by a 100x factor.1
But it gets weirder with more data points: it does not follow a clear power law.
If Time(n) grew as n^c for some fixed exponent c, then the ratio
Time(Mn)/Time(n) would be M^c, a constant that does not depend on n.
In the following benchmark, we keep doubling the height (M = 2) for every
test case, and we measure the time relative to the preceding case each time.
A quadratic model predicts a 4x slowdown at every step. Instead, we
observe wildly varying factors.
Benchmark output (each time factor is relative to the preceding line,
for example, the “4x” benchmark is 9.5x slower than the “2x” benchmark):
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
E-thin-more
1x: OK
222 μs ± 9.3 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
2x: OK
2.43 ms ± 85 μs, 4.8 MB allocated, 236 KB copied, 7.0 MB peak memory, 10.94x
4x: OK
23.1 ms ± 1.2 ms, 19 MB allocated, 2.7 MB copied, 10 MB peak memory, 9.53x
8x: OK
126 ms ± 7.8 ms, 76 MB allocated, 18 MB copied, 24 MB peak memory, 5.44x
16x: OK
181 ms ± 7.0 ms, 119 MB allocated, 30 MB copied, 24 MB peak memory, 1.44x
I believe this benchmark is triggering some pathological behavior in the garbage
collector. I modified tasty-bench with an option to measure CPU time without GC
(mutator time). At time of writing, tasty-bench is still waiting for a new release.
We can point Cabal to an unreleased commit of tasty-bench by adding the following
lines to cabal.project.local.
For the “2x” benchmarks, we are closer to the expected 4x slowdown, but there is
still a noticeable gap.
I’m going to chalk the rest up to inherent measurement errors (the cost of
tasty-bench’s simplicity) exacerbated by the pathological GC behavior;
a possible explanation is that the pattern of memory usage becomes so bad that
it affects non-GC time.
Benchmark output (excluding GC time, each measurement is relative to the
preceding line):
Microbenchmarks: Queues vs Global Levels vs Weaves
So far we’ve focused on asymptotics (linear vs quadratic). Some readers
will inevitably wonder about real speed.
Among the linear-time algorithms—queues (“Q”), global levels (“G”),
and weaves (lazy “L” or oblivious “O”)—which one is faster?
tl;dr: Queues are (much) faster in these microbenchmarks (up to 25x!),
but keep in mind that these are all quite naive implementations.
There are two categories to measure separately: unfolds which produce trees,
and oblivious unfolds—which don’t produce trees. These microbenchmarks
construct full trees up to a chosen number of nodes. When there is an
output tree, we force it (using nf), otherwise we force a counter of the
number of nodes. We run on different sufficiently large sizes (500 and 5000)
to check the stability of the measured factors, ensuring that we are only
comparing the time components that dominate at scale.
The tables list times relative to the queue benchmark for each tree size.
I hope to have piqued your interest in breadth-first unfolds without
using queues.
To the best of my knowledge, this specific problem hasn’t been studied in the
literature. It is of course related to breadth-first traversals,
previously solved using the Phases applicative.2
The intersection of functional programming and breadth-first walks is a small
niche, which makes it quick to survey that corner of the world for any
ideas related to those presented here.
The paper Modular models of monoids with operations by Zhixuan Yang
and Nicolas Wu, in ICFP 2023, mentions a general construction of Phases as an
example application of their theory. Basically, Phases is defined by a
fixed-point equation:
Phases f = Day f Phases :+: Identity
We can express Phases abstractly as a least fixed-point
μx.f▫x + Id in any monoidal category with a suitable structure.
If we instantiate the monoidal product ▫ not with Day convolution,
but with functor composition (Compose), then we get Weave.
In another coincidence, the monad-coroutine package
implements a weave function which is a generalization of
liftS2—this may require some squinting.
While WeaveS as a data type coincides with the free monad Free,
monad-coroutine’s core data type Coroutine coincides
with the free monad transformer FreeT.
We can view Phases as a generalization of “zipping” from
lists to free applicatives—which are essentially lists of actions,
and Weave generalizes that further to free monads. To recap, the surprise was
that the naive data type of free monads results in a quadratic-time unfold.
That issue motivated a “lazy” variant3 which achieves a linear-time
breadth-first unfold. That in turn suggested an even “lazier” variant which
enables more control on evaluation order at the cost of efficiency.
I’ve just released the weave library which implements
the main ideas of this post. I don’t expect it to have many users, given
how much slower it is compared to queue-based solutions.
But I would be curious to find a use case for the new compositionality
afforded by this abstraction.
Recap table
|                     | Unfolds        | Time        | Laziness         | Compositional |
|---------------------|----------------|-------------|------------------|---------------|
| Phases*             | No             | linear†     | by levels        | Yes           |
| Queue (Q)           | Yes            | linear†     | strict           | No            |
| Global Levels (G)   | Yes            | linear†     | by levels        | No            |
| Strict Weave (S)    | Yes            | quadratic‡  | strict           | Yes           |
| Oblivious Weave (O) | Oblivious only | linear†     | N/A              | Yes           |
| Lazy Weave (L)      | Yes            | linear†     | by levels        | Yes           |
| Endless Weave (E)   | Yes            | quadratic‡ᴱ | maximally lazy◊  | Yes           |
† Linear wrt. size: Θ(size).
‡ Quadratic wrt. height: lower bound Ω(height²), upper bound O(height × size).
ᴱ The “Endless” meshE only terminates with lazy monads.
* I guess there exists an “endless Phases” variant, that would be quadratic and maximally lazy.
◊ The definition of “maximally lazy” in this post actually misses a range of possible lazy behaviors with monads other than Identity. A further refinement seems to be another can of worms.
Note that tasty-bench also reports memory statistics
(allocated, copied, and peak memory) when certain RTS options are enabled,
which I’ve done by compiling the test executable with -with-rtsopts=-T.↩︎
Today, 2025-03-16, at 1930 UTC (12:30 pm PST, 3:30 pm EST, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 41st episode of the Haskell Unfolder live on YouTube.
Generic functions are a powerful tool that allows us to make more type classes derivable. In this episode, we’ll look at a simple example, namely deriving Monoid instances for product types, using both GHC’s built-in generics and the generics-sop library.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
This is the twenty-sixth edition of our GHC activities report, which describes
the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain.
The current edition covers roughly the months of December 2024 to February 2025.
You can find the previous editions collected under the
ghc-activities-report tag.
Sponsorship
We offer Haskell Ecosystem Support Packages to provide
commercial users with support from Well-Typed’s experts, while investing in the
Haskell community and its technical ecosystem. Clients who engage in these packages
both fund the work described in this report
and support the Haskell Foundation.
We are delighted to announce two new Bronze Haskell Ecosystem Supporters: Channable
and QBayLogic.
Many thanks also to our existing clients who also contribute to making this work possible:
Anduril, Juspay and Mercury,
and to the HLS Open Collective for
supporting HLS release management.
In addition, many others within Well-Typed contribute to GHC, Cabal and HLS
occasionally, or contribute to other open source Haskell libraries and tools.
GHC
GHC Releases
We are currently overseeing releases of the GHC 9.10 and 9.12 release series.
Zubin oversaw the preparation and final release of GHC 9.12.1 on 16 December 2024. Unfortunately, it was found in mid-January that this release was affected by a regression affecting sub-word division (#25653). In response to this we scheduled a minimal 9.12.2 release fixing this issue, which was released on 14 March 2025. We anticipate that the next GHC 9.12 release will come in the summer.
Our current release engineering focus is 9.10.2, which is currently being worked on by Andreas and Ben. We expect that the release candidate for this will be out in late March.
In parallel, Zubin has been working towards cutting a corresponding release of HLS and introducing support for GHC 9.12.
Platform support
For many years, GHC’s FreeBSD support has been in a state of limbo: while the compiler has usually been functional on FreeBSD, we have never had proper CI support, meaning that we could neither systematically validate correctness nor produce binary distributions.
Late last year Ben fixed several issues that had been plaguing FreeBSD, allowing it to pass
the testsuite and worked with a contributor to bring up a continuous integration
runner for this platform (!13619, !13963). In response, we expect that GHC 9.14 will ship binary distributions for FreeBSD as a tier 2 platform.
However, supporting GHC’s compatibility matrix requires a real investment of time and energy. If you rely on FreeBSD or any other BSD, we would appreciate your help in looking after and improving GHC’s support of these platforms.
Frontend
Matthew, Adam, and Rodrigo wrote and proposed the now-accepted Explicit Level Imports proposal. This proposal represents a significant step forward in Haskell’s staged metaprogramming story, introducing syntax to distinguish imports needed at runtime from those only needed at compile-time (e.g. for TemplateHaskell splices). This distinction opens the door to compile-time improvements, more robust cross-compilation support, and a more expressive metaprogramming story.
Sam finished up work by GHC contributor Jade, giving GHCi error messages
their own error codes (#23338, !13094).
Sam made several internal improvements to the typechecker, surrounding the
function checkTyEqRhs which is responsible for skolem escape, occurs checks
and representation-polymorphism checking (!13778, !13931).
Sam implemented a simplification of the logic for solving of quantified constraints,
both improving solver efficiency and simplifying specialization (!13958).
Sam refactored the GHC “error context” infrastructure, migrating it to a
structured representation, like the one already used for the error message contents (#23436, !10540).
Sam implemented defaulting of representational equalities, which allows
GHC to accept several uses of coerce that used to be rejected with ambiguous
types (#21003, !13834).
Sam fixed GHC emitting spurious “incomplete record selectors” warnings due to
missing long-distance information in the pattern-match checker (#25749, !13979).
Backend
Sam prevented GHC emitting LLVM code with incompatible vector types (e.g. the same
variable being declared as 4xi32 and used as 8xi16) (!13936).
Sam investigated CI failures with the LLVM backend, identifying several
critical bugs such as #25771 and #25773.
Ben improved the naming of various compiler-generated binders, which will make it easier
to make sense of -ddump-simpl output and runtime stacks (!13849, !13875).
Rodrigo ensured that certain join-points inline,
dramatically improving runtime allocations of certain programs (#25723, !13909).
In response to a serious correctness regression in 9.12.1,
Ben improved the testing story for primops by adding Cmm surface syntax for the
previously-untested Mul2 operations and expanding the scope of the
test-primops testsuite (!13843, test-primops!27).
Compiler performance
Matthew improved the performance of type family consistency checking by
ensuring checks are run in topological order, significantly reducing redundant work
(#25554, !13685).
Rodrigo made a variety of improvements in GHC’s memory consumption, including
refactoring the ModuleGraph interface (!13658) and reducing the memory
usage of module transitive closure calculations done when encountering
Template Haskell splices (#25634, !13753).
Rodrigo refactored the HomePackageTable and HomeUnitGraph to avoid
significant space leaks (#25511, !13675).
Rodrigo improved the performance of compiling deriving Show and deriving Data
(!13739).
Runtime system
Ben fixed a few bugs in the linker’s object unloading implementation resulting in
runtime crashes (#24935, !13704; #25039, !13714).
Ben removed some dead code in the IO manager which was causing some CI jobs
to fail (!13678).
Ben lifted the runtime system’s limit of 256 capabilities, ensuring that
the runtime system can scale to large multicore systems (#25560, !13692).
Ben fixed a bug in mmapInRegion which would cause it to loop indefinitely
in certain circumstances on FreeBSD (#25492, !13618).
GHCi & bytecode interpreter
Matthew improved the error reporting of out-of-scope qualified names in
GHCi (!13751).
Matthew fixed segfaults in the bytecode interpreter that were caused by the
FastString table being loaded unoptimised (!13877).
Matthew dramatically improved the performance of the bytecode interpreter
by avoiding generating no-op SLIDE x 0 instructions (!13868),
by using a strict genericLength function (!13885),
by avoiding intermediate lists in nameToCLabel (!13898),
and by using Name rather than FastString to key the symbol cache (!13914).
Matthew fixed the INTERP_STATS macro that is used for performance statistics
of GHCi (!13879), and then proceeded to fix accounting errors in these
statistics (!13956).
Ben and Matthew improved the printing of BCOs to assist debugging the bytecode
interpreter (!13570, !13878, !13955).
Ben fixed an incorrect assumption regarding which unlifted types can appear at
the top-level (#25641, !13796).
Libraries
Ben re-introduced missing {Enum/Show} IOSubSystem instances that were
accidentally removed in !9676 (#25549, !13683).
Ben introduced Data.Enum and Data.Bounded as agreed in CLC Proposal #208, correcting
an accidental inclusion of Data.Enum in the ghc-internal refactor
(#25320, !11347, !13790).
Ben implemented CLC proposal #305, ensuring that threads created by GHC’s base library can be easily identified by their thread label (#25452, !13566).
Rodrigo improved the implementation of SomeException for SomeAsyncException,
implementing CLC Proposal #309 (!13725).
Build system
Ben fixed #25501, ensuring that the ld-override logic is consistent between the configure
script and ghc-toolchain (!13617).
Ben mitigated a race condition with mktexfmt in Hadrian (#25564, !13703).
Ben allowed i686 to be parsed as part of triples in the configure script
(#25691, !13874).
Cabal
Matthew fixed two Cabal 3.14 regressions in which the current working directory
was not correctly taken into account, when creating the build folder
(#10772, #10800) and when running test executables (#10704, #10725).
Matthew fixed a Cabal 3.14 regression in which Cabal would erroneously pick
versions of build tools (such as alex or happy) from the system environment
rather than the versions specified in build-tool-depends (#10692, #10731).
ghc-debug
Zubin implemented support in the ghc-debug backend and Brick front-end for
streaming heap traversals, enabling constant-space analysis of large heaps
(!66).
A new release of Liquid Haskell is out after quite an active period of
development with 99 pull requests in the liquidhaskell repository, and
29 pull requests in the liquid-fixpoint repository from about ten contributors.
This post is to provide an overview of the changes that made it into the latest release.
There were contributions to the reflection and proof mechanisms; we got
contributions to the integration with GHC; the support of cvc5 was improved
when dealing with sets, bags, and maps; and there was a rather large overhaul
of the name resolution mechanism.
Reflection improvements
Liquid Haskell is a tool to verify Haskell programs. We can write formal
specifications inside special Haskell comments {-@ ... @-}, and the tool
will check whether the program behaves as specified. For instance, the following
specification of the filter function says that we expect all of the elements
in the result to satisfy the given predicate.
{-@ filter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | all p v } @-}
Liquid Haskell would then analyze the implementation of filter to verify that
it does indeed yield elements that satisfy the predicate.
To verify such a specification, Liquid Haskell needs to attach a meaning to the
names in the predicate all p v. It readily learns that p is a parameter
of filter, and that v is the result. all, however, isn’t bound by the specification’s parameters, so it refers to whatever is in scope, which is the
Haskell function from the Prelude.
all :: (a -> Bool) -> [a] -> Bool
And Liquid Haskell has a mechanism to provide logic meaning to the implementation
of a function like all, known as reflection. While it has always been convenient to reflect functions in modules analyzed by Liquid Haskell, it was not so easy when there was a
mix of local and imported definitions from dependencies that are not analysed with
Liquid Haskell. Last year, there was an internship at Tweag to address exactly this
friction, which resulted in contributions to the
latest release.
Reasoning and reflection of lambdas
The reflection mechanism also has other specific limitations at the moment. For instance,
it doesn’t allow reflecting recursive functions defined in let or where bindings. And
until recently, it didn’t allow reflecting functions that contained anonymous functions.
For example,
takePositives = filter (\x -> x > 0)
In the latest release, we have several contributions that introduce support for reflecting lambdas and improve the story for reasoning with them.
This feature is considered experimental at the moment, since there are still usability and
performance concerns that deserve further contributions, but one can
already explore the experience that we could expect in the long run.
Integration with GHC
In 2020 Liquid Haskell became a compiler plugin for GHC. It was hooked into the
end of the type checking phase firstly to ensure it only runs on well-typed programs,
and secondly, to ensure the plugin runs when GHC is only asked to typecheck the
module but not to generate code, which was helpful to IDEs.
For a few technical reasons, the plugin was re-parsing and re-typechecking the module
instead of using the abstract syntax tree (AST) that GHC handed to it as the result of
type checking. That is no longer the case in the latest release, where the AST after
type checking is now used for all purposes. In addition, there were several improvements
to how the ghc library is used.
cvc5 support
Liquid Haskell offloads part of its reasoning to a family of automated theorem
provers known as SMT solvers. For most developments, Liquid Haskell has been
used with the Z3 SMT solver, and this is what has been used most of the time in
continuous integration pipelines.
In theory, any SMT solver can be used with Liquid Haskell, if it provides a standard
interface known as SMT-LIB. In practice, however, experiments are done with
theories that are not part of the standard. For instance, the reasoning capabilities
for bags, sets, and maps used to require z3. But now the latest release implements
support for cvc5 as well.
Name resolution overhaul
Name resolution determines, for each name in a program, what is the definition that
it refers to. Liquid Haskell, in particular, is responsible for resolving names
that appear in specifications. This task was problematic when the programs
it was asked to verify spanned many modules.
There were multiple kinds of names, each with their own name resolution rules,
and names were resolved in different environments when verifying a module and
when importing it elsewhere, not always yielding the same results, which often
produced confusing errors.
Name resolution, however, was done all over the code base, and any attempt to
rationalize it would require a few months of effort. I started such an epic last
September, and managed to conclude it in February.
These changes made it into the latest release together with an awful lot of
side quests to simplify the existing code.
The road ahead
There is no coordinated roadmap for Liquid Haskell. Much of the contributions
that it receives depend on the opportunity enabled by academic research or
the needs of particular use cases.
On my side, I’m trying to improve the adoption of Liquid Haskell. Much of the challenge
is reducing the amount of common workarounds that the proficient Liquid Haskeller
needs to employ today. For instance, supporting reflection of functions in local bindings
would save the user the trouble of rewriting her programs to put the recursive functions
in the top level.
Repairing the support for type classes would allow functions to be verified
even if they use type classes, which is a large subset of Haskell today.
And without having defined a scope with precision yet, Liquid Haskell still needs to
improve its user documentation, its error messages, and its tracing and logging.
The project is chugging along, though. It is making significant leaps in usability. The
upgrade costs have been quantified for a few GHC releases, and
no longer look like an unbounded risk. The amount of external contributions has
increased last year, although we still have to see if it is a trend. And there is
no shortage of interest from academia and industrial interns.
Thanks to the many contributors for their work and their help during code
reviews. I look forward to learning what makes it into the coming Liquid Haskell releases!
On this episode of the Haskell Interlude, Andres Löh and Mike Sperber are joined by Farhad Mehta, a professor at OST Rapperswil, and one of the organizers of ZuriHac. Farhad tells us about formal methods, building tunnels, the importance of education, and the complicated relationship between academia and industry.
At work I sometimes need to deal with large and deep JSON objects where I'm only
interested in a few of the values. If all the interesting values are on the top
level, then aeson has functions that make it easy to implement FromJSON's
parseJSON (Constructors and accessors), but if the values are spread out then
the functions in aeson come up a bit short. That's when I reach for lens-aeson,
as lenses make it very easy to work with large structures. However, I've found
that using its lenses to implement parseJSON becomes a lot easier with a few
helper functions.
Many of the lenses produce results wrapped in Maybe, so the first function is
one that transforms a Maybe a to a Parser a. Here I make use of Parser
implementing MonadFail.
infixl 8 <!>

(<!>) :: (MonadFail m) => Maybe a -> String -> m a
(<!>) mv err = maybe (fail err) pure mv
In some code I wrote this week I used it to extract the user name out of a JWT
produced by Keycloak:
instance FromJSON OurClaimsSet where
    parseJSON = ... $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "preferred username missing"
        ...
        pure $ OurClaimsSet cs n ...
Also, all the lenses start with a Value and that makes the withX functions
in aeson to not be a perfect fit. So I define variations of the withX
functions, e.g.
withObjectV :: String -> (Value -> Parser a) -> Value -> Parser a
withObjectV s f = withObject s (f . Object)
That makes the full FromJSON instance for OurClaimsSet look like this
instance FromJSON OurClaimsSet where
    parseJSON = withObjectV "OurClaimsSet" $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "name"
        let rs = o ^.. key "resource_access" . members . key "roles" . _Array . traverse . _String
        pure $ OurClaimsSet cs n rs
The GHC developers are happy to announce the release of GHC 9.12.2.
Binary distributions, source distributions, and documentation are available at
downloads.haskell.org.
We hope to have this release available via ghcup shortly. This is a small
release fixing a critical code generation bug, #25653, affecting some subword
division operations.
As always, GHC’s release status, including planned future releases, can
be found on the GHC Wiki status.
We would like to thank IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors who
contribute their code, tickets, and energy to the GHC project.
As always, do give this release a try and open a ticket if you see
anything amiss.
I’ve created an open mirror
contest which will run in
parallel to the official contest, so if you want to grab some friends
and try solving some of the problems together using your favorite
language, be my guest!
Today, 2025-03-12, at 1930 UTC (12:30 pm PST, 3:30 pm EST, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 40th episode of the Haskell Unfolder live on YouTube.
QuickCheck is useful for more than just testing. Comparing the behaviour of a system to a model can be used to check if a system under construction is working correctly, but it can also be used to better understand an already existing system. In this episode we show that this does not need to be very difficult, by designing a model that we can use to understand tensor convolutions in an existing large library.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
A few months ago I explained that one reason why this blog has become more quiet is that all my work on Lean is covered elsewhere.
This post is an exception, because it is an observation that is (arguably) interesting, but does not lead anywhere, so where else to put it than my own blog…
When defining a function recursively in Lean that has nested recursion, e.g. a recursive call that is in the argument to a higher-order function like List.map, extra attention used to be necessary so that Lean can see that xs.map applies its argument only to elements of the list xs. The usual idiom is to write xs.attach.map instead, where List.attach attaches to the list elements a proof that they are in that list. You can read more about this in my Lean blog post on recursive definitions and our new shiny reference manual; look for the example “Nested Recursion in Higher-order Functions”.
To make this step less tedious I taught Lean to automatically rewrite xs.map to xs.attach.map (where suitable) within the construction of well-founded recursion, so that nested recursion just works (issue #5471). We already do such a rewriting to change if c then … else … to the dependent if h : c then … else …, but the attach-introduction is much more ambitious (the rewrites are not definitionally equal, there are higher-order arguments etc.) Rewriting the terms in a way that we can still prove the connection later when creating the equational lemmas is hairy at best. Also, we want the whole machinery to be extensible by the user, setting up their own higher order functions to add more facts to the context of the termination proof.
I implemented it like this (PR #6744) and it ships with 4.18.0, but in the course of this work I thought about a quite different and maybe better™ way to do this, and well-founded recursion in general:
WellFounded.fix : (hwf : WellFounded r) (F : (x : α) → ((y : α) → r y x → C y) → C x) (x : α) : C x
To use it, we have to rewrite the functorial of the recursive function, which naturally has type
F : ((y : α) → C y) → ((x : α) → C x)
to the one above, where all recursive calls take the termination proof r y x. This is a fairly hairy operation, mangling the type of matcher’s motives and whatnot.
so the functorial’s type is unmodified (here β will be ((x : α) → C x)), and everything else is in the propositional side-condition monotone F. For this predicate we have a syntax-guided compositional tactic, and it’s easily extensible, e.g. by
theorem monotone_mapM (f : γ → α → m β) (xs : List α) (hmono : monotone f) :
monotone (fun x => xs.mapM (f x))
Once given, we don’t care about the content of that proof. In particular proving the unfolding theorem only deals with the unmodified F that closely matches the function definition as written by the user. Much simpler!
Isabelle has it easier
Isabelle also supports well-founded recursion, and has great support for nested recursion. And it’s much simpler!
There, all you have to do to make nested recursion work is to define a congruence lemma of the form, for List.map something like our List.map_congr_left
List.map_congr_left : (h : ∀ a ∈ l, f a = g a) :
List.map f l = List.map g l
This is because in Isabelle, too, the termination proof is a side-condition that essentially states “the functorial F calls its argument f only on smaller arguments”.
Can we have it easy, too?
I had wished we could do the same in Lean for a while, but that form of congruence lemma just isn’t strong enough for us.
But maybe there is a way to do it, using an existential to give a witness that F can alternatively implemented using the more restrictive argument. The following callsOn P F predicate can express that F calls its higher-order argument only on arguments that satisfy the predicate P:
section setup
variable {α : Sort u}
variable {β : α → Sort v}
variable {γ : Sort w}
def callsOn (P : α → Prop) (F : (∀ y, β y) → γ) :=
∃ (F': (∀ y, P y → β y) → γ), ∀ f, F' (fun y _ => f y) = F f
variable (R : α → α → Prop)
variable (F : (∀ y, β y) → (∀ x, β x))
local infix:50 " ≺ " => R
def recursesVia : Prop := ∀ x, callsOn (· ≺ x) (fun f => F f x)
noncomputable def fix (wf : WellFounded R) (h : recursesVia R F) : (∀ x, β x) :=
wf.fix (fun x => (h x).choose)
def fix_eq (wf : WellFounded R) h x :
fix R F wf h x = F (fix R F wf h) x := by
unfold fix
rw [wf.fix_eq]
apply (h x).choose_spec
This allows nice compositional lemmas to discharge callsOn predicates:
theorem callsOn_base (y : α) (hy : P y) :
callsOn P (fun (f : ∀ x, β x) => f y) := by
exists fun f => f y hy
intros; rfl
@[simp]
theorem callsOn_const (x : γ) :
callsOn P (fun (_ : ∀ x, β x) => x) :=
⟨fun _ => x, fun _ => rfl⟩
theorem callsOn_app
{γ₁ : Sort uu} {γ₂ : Sort ww}
(F₁ : (∀ y, β y) → γ₂ → γ₁) -- can this also support dependent types?
(F₂ : (∀ y, β y) → γ₂)
(h₁ : callsOn P F₁)
(h₂ : callsOn P F₂) :
callsOn P (fun f => F₁ f (F₂ f)) := by
obtain ⟨F₁', h₁⟩ := h₁
obtain ⟨F₂', h₂⟩ := h₂
exists (fun f => F₁' f (F₂' f))
intros; simp_all
theorem callsOn_lam
{γ₁ : Sort uu}
(F : γ₁ → (∀ y, β y) → γ) -- can this also support dependent types?
(h : ∀ x, callsOn P (F x)) :
callsOn P (fun f x => F x f) := by
exists (fun f x => (h x).choose f)
intro f
ext x
apply (h x).choose_spec
theorem callsOn_app2
{γ₁ : Sort uu} {γ₂ : Sort ww}
(g : γ₁ → γ₂ → γ)
(F₁ : (∀ y, β y) → γ₁) -- can this also support dependent types?
(F₂ : (∀ y, β y) → γ₂)
(h₁ : callsOn P F₁)
(h₂ : callsOn P F₂) :
callsOn P (fun f => g (F₁ f) (F₂ f)) := by
apply_rules [callsOn_app, callsOn_const]
With this setup, we can have the following, possibly user-defined, lemma expressing that List.map calls its arguments only on elements of the list:
theorem callsOn_map (δ : Type uu) (γ : Type ww)
(P : α → Prop) (F : (∀ y, β y) → δ → γ) (xs : List δ)
(h : ∀ x, x ∈ xs → callsOn P (fun f => F f x)) :
callsOn P (fun f => xs.map (fun x => F f x)) := by
suffices callsOn P (fun f => xs.attach.map (fun ⟨x, h⟩ => F f x)) by
simpa
apply callsOn_app
· apply callsOn_app
· apply callsOn_const
· apply callsOn_lam
intro ⟨x', hx'⟩
dsimp
exact (h x' hx')
· apply callsOn_const
end setup
So here is the (manual) construction of a nested map for trees:
section examples
structure Tree (α : Type u) where
val : α
cs : List (Tree α)
-- essentially
-- def Tree.map (f : α → β) : Tree α → Tree β :=
-- fun t => ⟨f t.val, t.cs.map Tree.map⟩)
noncomputable def Tree.map (f : α → β) : Tree α → Tree β :=
fix (sizeOf · < sizeOf ·) (fun map t => ⟨f t.val, t.cs.map map⟩)
(InvImage.wf (sizeOf ·) WellFoundedRelation.wf) <| by
intro ⟨v, cs⟩
dsimp only
apply callsOn_app2
· apply callsOn_const
· apply callsOn_map
intro t' ht'
apply callsOn_base
-- ht' : t' ∈ cs -- !
-- ⊢ sizeOf t' < sizeOf { val := v, cs := cs }
decreasing_trivial
end examples
This makes me happy!
All details of the construction are now contained in a proof that can proceed by a syntax-driven tactic and that’s easily and (likely robustly) extensible by the user. It also means that we can share a lot of code paths (e.g. everything related to equational theorems) between well-founded recursion and partial_fixpoint.
I wonder if this construction is really as powerful as our current one, or if there are certain (likely dependently typed) functions where this doesn’t fit, but the β above is dependent, so it looks good.
With this construction, functions defined by well-founded recursion will reduce even worse in the kernel, I assume. This may be a good thing.
The cake is a lie
What unfortunately kills this idea, though, is the generation of the functional induction principles, which I believe is not (easily) possible with this construction: The functional induction principle is proved by massaging F to return a proof, but since the extra assumptions (e.g. for ite or List.map) only exist in the termination proof, they are not available in F.
Oh wey, how anticlimactic.
PS: Path dependencies
Curiously, if we didn’t have functional induction at this point yet, then very likely I’d change Lean to use this construction, and then we’d either not get functional induction, or it would be implemented very differently, maybe a more syntactic approach that would re-prove termination. I guess that’s called path dependence.
There’s yet again been a bit of functional programming-adjacent twitter drama
recently, but it’s actually sort of touched on some subtleties about sum types
that I am asked about (and think about) a lot nowadays. So, I’d like to take
this opportunity to talk a bit about the “why” and nature of sum types, how
to use them effectively, how they contrast with other related concepts in
programming and software development, and even cases where sum types aren’t
the best option.
Sum Types at their Best
The quintessential sum type that you just can’t live without is
Maybe, now adopted in a lot of languages as
Optional:
data Maybe a = Nothing | Just a
If you have a value of type Maybe Int, it means that its valid
values are Nothing, Just 0, Just 1,
etc.
This is also a good illustration to why we call it a “sum” type: if
a has n possible values, then Maybe a has
1 + n: we add the single new value Nothing to it.
The “benefit” of the sum type is illustrated pretty clearly here too: every
time you use a value of type Maybe Int, you are forced to
consider the fact that it could be Nothing:
showMaybeInt :: Maybe Int -> String
showMaybeInt = \case
  Nothing -> "There's nothing here"
  Just i  -> "Something is here: " <> show i
That’s because usually in sum type implementations, they are implemented in a
way that forces you to handle each case exhaustively. Otherwise, sum types are
much less useful.
At the most fundamental level, this behaves like a compiler-enforced null
check, but built within the language in user-space instead of being compiler magic,
ad-hoc syntax1, or static analysis — and the fact that it
can live in user-space is why it’s been adopted so widely. At a higher level,
functional abstractions like Functor, Applicative, Monad, Foldable, Traversable
allow you to use a Maybe a like just a normal a with
the appropriate semantics, but that’s a
topic for another time (like 2014).
This power is very special to me on a personal level. I remember many years
ago on my first major haskell project changing a type from String
to Maybe String, and then GHC telling me every place in the
codebase where something needed to change in order for things to work still.
Coming from dynamically typed languages in the past, this sublime experience
truly altered my brain chemistry and Haskell-pilled me for the rest of my life.
I still remember the exact moment, what coffee shop I was at, what my order was,
the weather that day … it was truly the first day of the rest of my life.
It should be noted that I don’t consider sum types a “language feature” or a
compiler feature as much as I’d consider it a design pattern. Languages that
don’t have sum types built-in can usually implement them using typed unions and
an abstract visitor pattern interface (more on that later). Of course, having a
way to “check” your code before running it (like with a type system or
statically verified type annotations) does make a lot of the features much more
useful.
Anyway, this basic pattern can be extended to include more error information
in your Nothing branch, which is how you get the
Either e a type in the Haskell standard library, or the
Result<T,E> type in rust.
Along different lines, we have the common use case of defining syntax
trees:
data Expr
  = Lit Int
  | Negate Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr

eval :: Expr -> Int
eval = \case
  Lit i -> i
  Negate x -> -(eval x)
  Add x y -> eval x + eval y
  Sub x y -> eval x - eval y
  Mul x y -> eval x * eval y

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
      Lit i -> show i
      Negate x -> wrap prio 2 $ "-" <> go 2 x
      Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
      Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
      Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y

main :: IO ()
main = do
  putStrLn $ pretty myExpr
  print $ eval myExpr
  where
    myExpr = Mul (Negate (Add (Lit 4) (Lit 5))) (Lit 8)
-(4 + 5) * 8
-72
Now, if we add a new command to the sum type, the compiler enforces us to
handle it.
data Expr
  = Lit Int
  | Negate Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr
  | Abs Expr

eval :: Expr -> Int
eval = \case
  Lit i -> i
  Negate x -> -(eval x)
  Add x y -> eval x + eval y
  Sub x y -> eval x - eval y
  Mul x y -> eval x * eval y
  Abs x -> abs (eval x)

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
      Lit i -> show i
      Negate x -> wrap prio 2 $ "-" <> go 2 x
      Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
      Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
      Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y
      Abs x -> wrap prio 2 $ "|" <> go 0 x <> "|"
Another example where sum types shine is clearly-defined APIs between
processes. For example, we can imagine a “command” type that sends different
types of commands with different payloads. This can be interpreted as perhaps
the result of parsing command line arguments or the message in some
communication protocol.
For example, you could have a protocol that launches and controls
processes:
data Command a
  = Launch String (Int -> a)  -- ^ takes a name, returns a process ID
  | Stop Int (Bool -> a)      -- ^ takes a process ID, returns success/failure

launch :: String -> Command Int
launch nm = Launch nm id

stop :: Int -> Command Bool
stop pid = Stop pid id
This ADT is written in the “interpreter” pattern (used often with things like
free monad), where any arguments not involving a are the command
payload, and any X -> a represents that the command could respond with an X.
Let’s write a sample interpreter backing the state in an IntMap in an
IORef:
import qualified Data.IntMap as IM
import Data.IntMap (IntMap)

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
  Launch newName next -> do
    currMap <- readIORef ref
    let newId = case IM.lookupMax currMap of
          Nothing -> 0
          Just (i, _) -> i + 1
    modifyIORef ref $ IM.insert newId newName
    pure (next newId)
  Stop procId next -> do
    existed <- IM.member procId <$> readIORef ref
    modifyIORef ref $ IM.delete procId
    pure (next existed)

main :: IO ()
main = do
  ref <- newIORef IM.empty
  aliceId <- runCommand ref $ launch "alice"
  putStrLn $ "Launched alice with ID " <> show aliceId
  bobId <- runCommand ref $ launch "bob"
  putStrLn $ "Launched bob with ID " <> show bobId
  success <- runCommand ref $ stop aliceId
  putStrLn $
    if success
      then "alice succesfully stopped"
      else "alice unsuccesfully stopped"
  print =<< readIORef ref
Launched alice with ID 0
Launched bob with ID 1
alice succesfully stopped
fromList [(1, "bob")]
Let’s add a command to “query” a process id for its current status:
data Command a
  = Launch String (Int -> a)   -- ^ takes a name, returns a process ID
  | Stop Int (Bool -> a)       -- ^ takes a process ID, returns success/failure
  | Query Int (String -> a)    -- ^ takes a process ID, returns a status message

query :: Int -> Command String
query pid = Query pid id

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
  -- ...
  Query procId next -> do
    procName <- IM.lookup procId <$> readIORef ref
    pure case procName of
      Nothing -> "This process doesn't exist, silly."
      Just n -> "Process " <> n <> " chugging along..."
Relationship with Unions
To clarify a common confusion: sum types can be described as “tagged unions”:
you have a tag to indicate which branch you are on (which can be case-matched
on), and then the rest of your data is conditionally present.
In many languages this can be implemented under the hood as a struct with a
tag and a union of data, along with some abstract visitor pattern
interface to ensure exhaustiveness.
Remember, it’s not exactly a plain union. Consider, for example, a type
like:
data Entity = User Int | Post Int
An Entity here could represent a user identified by a user id, or a post
identified by a post id. If we considered it purely as a union of Int and
Int:
union Entity {
  int user_id;
  int post_id;
};
we’d lose the ability to branch on whether we have a user or a post.
If we keep a tag alongside the union, we recover the original sum type semantics.
Of course, you still need an abstract interface like the visitor pattern to
actually be able to use this as a sum type with guarantees that you handle every
branch, but that’s a story for another day. Alternatively, if your language
supports dynamic dispatch nicely, that’s another underlying implementation that
would work to back a higher-level visitor pattern interface.
Subtypes Solve a Different Problem
Now, sum types aren’t exactly a part of the common programming education
curriculum, but subtypes and supertypes have definitely been
drilled into every CS student’s brain and waking nightmares since their first
year.
Informally (a la Liskov), B is a subtype of A (and
A is a supertype of B) if anywhere that expects an
A, you could also provide a B.
In normal object-oriented programming, this often shows up in early lessons
as Cat and Dog being subclasses of an
Animal class, or Square and Circle being
subclasses of a Shape class.
When people first learn about sum types, there is a tendency to understand
them as similar to subtyping. This is unfortunately understandable, since a lot
of introductions to sum types often start with something like
-- | Bad Sum Type Example!
data Shape = Circle Double | Rectangle Double Double
While there are situations where this might be a good sum type (e.g., for an
API specification or a state machine), at face value it is a bad example for
illustrating the sum types vs. subtyping distinction.
You might notice the essential “tension” of the sum type: you declare all of
your options up-front, but the functions that consume your value are open and
declared ad-hoc. And, if you add new options, all of the consuming functions
must be adjusted.
So, subtypes (and supertypes) are more effective when they lean into
the opposite end: the universe of possible options is open and declared ad-hoc,
but the consuming functions are closed. And, if you add new functions,
all of the members must be adjusted.
In typed languages with a concept of “objects” and “classes”, subtyping is
often implemented using inheritance and interfaces.
So, a function like processWidget(Widget widget) that expects a
Widget would be able to be passed a Button or
InputField or Box. And, if you had a container like
List<Widget>, you could assemble a structure using
Button, InputField, and Box. A perfect
Liskov storm.
In typical library design, you’re able to add new implementations of
Widget as an open universe easily: anyone that imports
Widget can, and they can now use it with functions taking
Widgets. But, if you ever wanted to add new functionality
to the Widget interface, that would be a breaking change to all
downstream implementations.
However, this implementation of subtyping, while prevalent, is the most
mind-numbingly boring realization of the concept, and it pained my soul to even
spend time talking about it. So let’s jump into the more interesting way that
subtype and supertype relationships manifest in the only language where anything
is interesting: Haskell.
Subtyping via Parametric Polymorphism
In Haskell, subtyping is implemented in terms of parametric polymorphism and
sometimes typeclasses. This allows us to work nicely with the concept of
functions and APIs as subtypes and supertypes of each other.
For example, let’s look at a function that takes indexers and applies
them:
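Something along these lines (a sketch of the intended function; the particular locations and example list are my own illustration):

-- sumAtLocs expects an indexing function of type [Double] -> Int -> Double
-- and applies it at a few fixed locations, summing the results.
sumAtLocs :: ([Double] -> Int -> Double) -> Double
sumAtLocs f = f xs 0 + f xs 2 + f xs 3
  where
    xs = [1, 5, 10, 20, 50]

-- ghci> sumAtLocs (!!)
-- 31.0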
So, what functions could you pass to sumAtLocs? Can you
only pass [Double] -> Int -> Double?
Well, not quite. Look at the above where we passed (!!), which
has type forall a. [a] -> Int -> a!
In fact, what other types could we pass? Here are some examples:
fun1 :: [a] -> Int -> a
fun1 = (!!)

fun2 :: [a] -> Int -> a
fun2 xs i = reverse xs !! i

fun3 :: Floating a => [a] -> Int -> a
fun3 xs i = if length xs > i then xs !! i else pi

fun4 :: Num a => [a] -> Int -> a
fun4 xs i = sum (take i xs)

fun5 :: (Integral b, Num c) => a -> b -> c
fun5 xs i = fromIntegral i

fun6 :: (Foldable t, Fractional a, Integral b) => t a -> b -> a
fun6 xs i = sum xs / fromIntegral i

fun7 :: (Foldable t, Integral b, Floating a) => t a -> b -> a
fun7 xs i = logBase (fromIntegral i) (sum xs)
What’s going on here? Well, the function expects a
[Double] -> Int -> Double, but there are a lot of other types
that could be passed instead.
At first this might seem like meaningless semantics or trickery, but it’s
deeper than that: remember that each of the above types actually has a very
different meaning and different possible behaviors!
forall a. [a] -> Int -> a means that the a must come from the given list. In fact, any function with that type is
guaranteed to be partial: if you pass it an empty list, there is no
a available to use.
forall a. Num a => [a] -> Int -> a means that the
result might actually come from outside of the list: the implementation could
always return 0 or 1, even if the list is empty. It
also guarantees that it will only add, subtract, multiply, or abs: it will never
divide.
forall a. Fractional a => [a] -> Int -> a means that
we could possibly do division on the result, but we can’t do anything “floating”
like square rooting or logarithms.
forall a. Floating a => [a] -> Int -> a means that we
can possibly start square rooting or taking the logarithms of our input
numbers.
[Double] -> Int -> Double gives us the least guarantees
about the behavior: the result could come from thin air (and not be a part of
the list), and we can even inspect the machine representation of our
inputs.
So, we have all of these types with completely different semantics and
meanings. And yet, they can all be passed to something expecting a
[Double] -> Int -> Double. That means that they are all
subtypes of [Double] -> Int -> Double!
[Double] -> Int -> Double is a supertype that houses
multitudes of possible values, uniting all of the possible values and semantics
into one big supertype.
Through the power of parametric polymorphism and typeclasses, you can
actually create an extensible hierarchy of supertypes, not just of
subtypes.
Consider a common API for json serialization. You could have multiple
functions that serialize into JSON:
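For instance, a sketch along these lines (Foo, Bar, and Baz stand in for your own data types, and Value is a stand-in for a JSON value type like aeson’s):

data Value = VNull | VString String | VNumber Double  -- stand-in for a JSON value

data Foo = Foo String
data Bar = Bar Double
data Baz = Baz

fooToValue :: Foo -> Value
fooToValue (Foo s) = VString s

barToValue :: Bar -> Value
barToValue (Bar d) = VNumber d

-- ...or one overloaded function, in the style of aeson's ToJSON class:
class ToJSON a where
  toJSON :: a -> Value

instance ToJSON Foo where
  toJSON = fooToValue

instance ToJSON Bar where
  toJSON = barToValue

instance ToJSON Baz where
  toJSON _ = VNull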
The type of toJSON :: forall a. ToJSON a => a -> Value is a
subtype of Foo -> Value, Bar -> Value, and
Baz -> Value, because everywhere you would want a
Foo -> Value, you could give toJSON instead. Every
time you want to serialize a Foo, you could use
toJSON.
This usage works well, as it gives you an extensible abstraction to design
code around. When you write code polymorphic over Monoid a, it
forces you to reason about your values with respect to only the aspects relating
to monoidness. If you write code polymorphic over Num a, it forces
you to reason about your values only with respect to how they can be added,
subtracted, negated, or multiplied, instead of having to worry about things like
their machine representation.
The extensibility comes from the fact that you can create even more
supertypes of forall a. ToJSON a => a -> Value easily,
just by defining a new typeclass instance. So, if you need a
MyType -> Value, you could make it a supertype of
toJSON :: ToJSON a => a -> Value by defining an instance of
the ToJSON typeclass, and now you have something you can use in its
place.
In practice, this approach is used by many libraries. For example, ad uses it for automatic
differentiation: its diff function looks scary:
diff :: (forall s. AD s ForwardDouble -> AD s ForwardDouble) -> Double -> Double
But it relies on the fact that
(forall s. AD s ForwardDouble -> AD s ForwardDouble) is a
supertype of (forall a. Floating a => a -> a),
(forall a. Num a => a -> a), etc., so you can give it
functions like \x -> x * x (which is a
forall a. Num a => a -> a) and it will work at that
AD s type:
ghci> diff (\x -> x * x) 10
20  -- 2*x
This “numeric overloading” method is used by libraries for GPU programming,
as well, to accept numeric functions to be optimized and compiled to GPU
code.
Another huge application is in the lens library, which
uses subtyping to unite its hierarchy of optics.
For example, an Iso is a subtype of Lens, which
is a subtype of Traversal, and Fold is a supertype of
both Traversal and Getter, etc. In the end the system even
allows you to use id from the Prelude as a lens or a
traversal, because the type signature of id :: a -> a is
actually a subtype of all of those types!
Subtyping using Existential Types
What more closely matches the spirit of subtypes in OOP and other
languages is the existential type: a value that can be a value of any
type matching some interface.
For example, let’s imagine a value that could be any instance of
Num:
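A sketch of the kind of wrapper meant here (the list at the end is my own illustration):

{-# LANGUAGE ExistentialQuantification #-}

-- A value that can hold any instance of Num, with the concrete type hidden.
data SomeNum = forall a. Num a => SomeNum a

-- A heterogeneous collection of numbers of different concrete types.
someNums :: [SomeNum]
someNums = [SomeNum (3 :: Int), SomeNum (0.25 :: Double), SomeNum (10000 :: Integer)]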
This is somewhat equivalent to Java’s
List<MyInterface> or List<MyClass>, or
python’s List[MyClass].
Note that to use this effectively in Haskell with superclasses and
subclasses, you need to manually wrap and unwrap:
data SomeFractional = forall a. Fractional a => SomeFractional a

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional x) = SomeNum x
So, SomeNum is “technically” a supertype of
SomeFractional: everywhere a SomeNum is expected, a
SomeFractional can be given…but in Haskell it’s a lot less
convenient because you have to explicitly cast.
In OOP languages, you can often cast “down” using runtime reflection
(SomeNum -> Maybe SomeFractional). However, this is impossible
in Haskell the way we have written it!
That’s because of type erasure: Haskell does not (by default) couple a value
at runtime with all of its associated interface implementations. When you create
a value of type SomeNum, you are packing an untyped pointer to that
value as well as a “dictionary” of all the functions you could use it with:
data NumDict a = NumDict
  { (+) :: a -> a -> a
  , (*) :: a -> a -> a
  , negate :: a -> a
  , abs :: a -> a
  , fromInteger :: Integer -> a
  }

mkNumDict :: Num a => NumDict a
mkNumDict = NumDict (+) (*) negate abs fromInteger

data FractionalDict a = FractionalDict
  { numDict :: NumDict a
  , (/) :: a -> a -> a
  , fromRational :: Rational -> a
  }

-- | Essentially equivalent to the previous 'SomeNum'
data SomeNum = forall a. SomeNum
  { numDict :: NumDict a
  , value :: a
  }

-- | Essentially equivalent to the previous 'SomeFractional'
data SomeFractional = forall a. SomeFractional
  { fractionalDict :: FractionalDict a
  , value :: a
  }

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional (FractionalDict {numDict}) x) = SomeNum numDict x

castDown :: SomeNum -> Maybe SomeFractional
castDown (SomeNum nd x) = error "not possible!"
All of these function pointers essentially exist at runtime inside
the SomeNum. So, SomeFractional can be “cast up” to
SomeNum by simply dropping the FractionalDict.
However, you cannot “cast down” from SomeNum because there is no
way to materialize the FractionalDict: the association from type to
instance is lost at runtime. OOP languages usually get around this by having the
value itself hold pointers to all of its interface implementations at
runtime. However, in Haskell, we have type erasure by default: there are no
tables carried around at runtime.2
In the end, existential subtyping requires explicit wrapping/unwrapping
instead of the implicit or lightweight casting possible in OOP languages optimized
around this sort of behavior.3 Existential-based subtyping is just less
common in Haskell because parametric polymorphism offers a solution to most
similar problems. For more on this, Simon Peyton Jones has a nice lecture on the
topic.
The pattern of using existentially qualified data in a container
(like [SomeNum]) is often called the “widget pattern” because it’s
used in libraries like xmonad to allow
extensible “widgets” stored alongside the methods used to manipulate them. It’s
more common to explicitly store the handler functions (a “dictionary”) inside
the type instead of using existential typeclasses, but sometimes it can be nice to
let the compiler handle generating and passing your method tables implicitly for
you. Using existential typeclasses instead of explicit dictionaries also allows
you to bless certain methods and functions as “canonical” to your type, and the
compiler will make sure they are always coherent.
I do mention in a blog
post about different types of existential lists, however, that this
“container of instances” type is much less useful in Haskell than in other
languages for many reasons, including the up/downcasting issues mentioned above.
In addition, Haskell gives you a whole wealth of functionality for operating over
homogeneous parameters (like [a], where all items have the same
type), so much of which you give up by jumping to heterogeneous lists.
Aside
Let’s briefly take a moment to talk about how typeclass hierarchies give us
subtle subtype/supertype relationships.
Let’s look at the classic Num and Fractional:
class Num a
class Num a => Fractional a
Num is a superclass of Fractional, and
Fractional is a subclass of Num. Everywhere a
Num constraint is required, you can provide a
Fractional constraint to do the same thing.
However, in these two types:
Num a => a
Fractional a => a
forall a. Num a => a is actually a subtype of
forall a. Fractional a => a! That’s because if you need a
forall a. Fractional a => a, you can provide a
forall a. Num a => a instead. In fact, let’s look at three
levels: Double, forall a. Fractional a => a, and
forall a. Num a => a.
-- can be used as `Double`
1.0 :: Double
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Fractional a => a`
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Num a => a`
1 :: Num a => a
So, Double is a supertype of Fractional a => a,
which is in turn a supertype of Num a => a.
The general idea here is that the more super- you go, the more you “know”
about the actual term you are creating. So, with Num a => a, you
know the least (and, you have the most possible actual terms because
there are more instances of Num than of Fractional).
And, with Double, you know the most: you even know its
machine representation!
So, Num is a superclass of Fractional but
forall a. Num a => a is a subtype of
forall a. Fractional a => a. This actually follows the typical
rules of subtyping: if something appears on the “left” of an arrow
(=> in this case), it gets flipped from sub- to super-. We often
call the left side a “negative” (contravariant) position and the right side a
“positive” position, because a negative of a negative (the left side of a left
side, like a in (a -> b) -> c) is a
positive.
Also note that our “existential wrappers”:
data SomeNum = forall a. Num a => SomeNum a
data SomeFractional = forall a. Fractional a => SomeFractional a
can be CPS-transformed to their equivalent types:
type SomeNum' = forall r. (forall a. Num a => a -> r) -> r
type SomeFractional' = forall r. (forall a. Fractional a => a -> r) -> r

toSomeNum' :: SomeNum -> SomeNum'
toSomeNum' (SomeNum x) f = f x

toSomeNum :: SomeNum' -> SomeNum
toSomeNum sn = sn SomeNum
And in those cases, Num and Fractional again appear
in the covariant (positive) position, since they’re the negative of a negative.
So, this aligns with our intuition that SomeFractional is a subtype
of SomeNum.
The Expression Problem
This tension that I described earlier is closely related to the expression
problem, and is a tension that is inherent to a lot of different aspects of
language and abstraction design. However, in the context laid out in this post,
it serves as a good general guide to decide what pattern to go down:
If you expect a canonical set of “inhabitants” and an open set of
“operations”, sum types can suit that end of the spectrum well.
If you expect a canonical set of “operations” and an open set of
“inhabitants”, consider subtyping and supertyping.
I don’t really think of the expression problem as a “problem” in the sense of
“some hindrance to deal with”. Instead, I see it in the “math problem” sort of
way: by adjusting how you approach things, you can play with the equation to make
the most out of the requirements you have in your design.
Looking Forward
A lot of frustration in Haskell (and programming in general) lies in trying
to force abstraction and tools to work in a way they weren’t meant to. Hopefully
this short run-down can help you avoid going against the point of these
design patterns and start making the most of what they can offer. Happy
Haskelling!
Special Thanks
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
Most OOP languages also have mechanisms for type erasure, but
the default is unerased, which is the opposite of Haskell.↩︎
Note that there are current GHC proposals
that attempt to allow “naked” existentials without newtype wrappers, so we could
actually get the same seamless and implicit up-casting as we would get in OOP
languages. However, the jury is out on whether or not this is a good idea.↩︎
A while ago, we wrote a post on how we helped a client initially
integrate the Testwell CTC++ code coverage tool from Verifysoft into
their Bazel build.
Since then, some circumstances have changed, and we were recently challenged to
see if we could improve the CTC++/Bazel integration to the point where CTC++
coverage builds could enjoy the same benefits of Bazel caching and incremental
rebuilds as regular (non-coverage) builds. Our objective was to make it feasible
for developers to do coverage builds with CTC++ locally, rather than them
using different coverage tools or delaying coverage testing altogether.
Thus we could enable the client to focus their efforts on improving
overall test coverage with CTC++ as their only coverage tool.
In this sequel to the initial integration, we, as a team, have come up with a more involved
scheme for making CTC++ meet Bazel’s expectations of hermetic and reproducible
build actions. There is considerable extra complexity needed to make this work,
but the result is a typical speedup of 5-10 times on most coverage builds.
The kind of speedup that not only makes your CI faster, but that allows
developers to work in a different and more efficient way, altogether.
More generally, we hope this blog post can serve as a good example (or maybe a
cautionary tale 😉) of how to take a tool that does not play well with Bazel’s
idea of a well-behaved build step, and force it into a shape where we can still
leverage Bazel’s strengths.
The status quo
You can read our previous blog post for more details, but here
we’ll quickly summarize the relevant bits of the situation after our initial
integration of CTC++ coverage builds with Bazel:
CTC++ works by wrapping the compiler invocation with its ctc tool, and
adding coverage instrumentation between the preprocessing and compiling steps.
In addition to instrumenting the source code itself, ctc also writes
instrumentation data in a custom text format (aka. symbol data) to a separate
output file, typically called MON.sym (aka. the symbol file).
At runtime the instrumented unit tests will collect coverage statistics and
write these (in binary form) to another separate output file: MON.dat.
As far as Bazel is concerned, both the MON.sym and MON.dat files are
untracked side-effects of the respective compilation and testing steps. As
such we had to poke a hole in the Bazel sandbox and arrange for these files
to be written to a persistent location without otherwise being tracked or
managed by Bazel.
More importantly, these side-effects mean that we have to disable all caching
and re-run the entire build and all tests from scratch every single time.
Otherwise, we would end up with incomplete MON.sym and MON.dat files.
Another consideration - not emphasized in our previous post since we had to
disable caching of intermediate outputs in any case - is that the outputs from
ctc are not hermetic and reproducible. Both the instrumentation that is added
to the source code, as well as the symbol file that is written separately by
ctc contain the following information that is collected at compile time:
Absolute paths to source code files: Even though Bazel passes relative
paths on the command-line, ctc will still resolve these into absolute paths
and record these paths into its outputs. Since all these build steps run
inside the Bazel sandbox, the recorded paths vary arbitrarily from build to
build. Even worse: the paths are made invalid as soon as the sandbox is
removed, when the compilation step is done.
Timestamps: ctc will also record timestamps into the instrumented source
code and the symbol file. As far as we know, these might have been part of
some internal consistency check in previous versions of CTC++, but currently
they are simply copied into the final report, and displayed as a property of
the associated symbol data on which the HTML report is based. Since our
coverage reports are already tied to known Git commits in the code base,
these timestamps have no additional value for us.
Fingerprints: ctc calculates a 32-bit fingerprint based on the symbol
data, and records this fingerprint into both the instrumented source and the
symbol file. Since the symbol data already contains absolute path names as
detailed above, the resulting fingerprint will also vary accordingly, and thus
not be reproducible from one build to the next, even when all other inputs
remain unchanged.
Outlining the problems to be solved
If we are to make CTC++ coverage builds quicker by leveraging the Bazel cache,
we must answer these two questions:
Can we make ctc’s outputs reproducible? Without this, re-enabling the Bazel
cache for these builds is a non-starter, as each re-evaluation of an
intermediate build step will have never-before-seen action inputs, and none
of the cached outputs from previous builds will ever get reused.
Can we somehow capture the extra MON.sym output written by ctc at build
time, and appropriately include it into Bazel’s build graph?1
We need Bazel to cache and reuse the symbol data associated with a
compilation unit in exactly the same way that it would cache and reuse the
object file associated with the same compilation unit.
Solving both of these would allow us to achieve a correct coverage report
assembled from cached object files and symbol data from previously-built and
unchanged source code, together with newly-built object files and symbol data
from recently-changed source code (in addition to the coverage statistics
collected from re-running all tests).
Achieving reproducibility
Let’s tackle the problem of making ctc’s outputs reproducible first. We start
by observing that ctc allows us to
configure hook scripts that will be invoked at various
points while ctc is running. We are specifically interested in:
RUN_AFTER_CPP, which allows access to the preprocessed source before the
instrumentation step, and
RUN_AFTER_INSTR, which allows access to the instrumented source before it’s passed
on to the underlying compiler.
From our existing work, we of course also have our own wrapper script around
ctc, which allows us to access the outputs of each ctc invocation before
they are handed back to Bazel. We also know, from our previous work, that we can
instruct ctc to write a separate symbol file per compilation unit, rather than
have all compilation units append to the same MON.sym file.
Together this allows us to rewrite the outputs from ctc in such a way as to
make them reproducible. What we want to rewrite, has already been outlined
above:
Absolute paths into the sandbox: We could rewrite these into corresponding
absolute paths to the original source tree instead, but we can just as well
take it one step further and simply strip the sandbox root directory prefix
from all absolute paths. This turns them into relative paths that happen to
resolve correctly, whether they’re taken relative to the sandbox directory at
compile time, or relative to the root of the source tree afterwards.
Timestamps: This one is relatively easy: we just need to decide on a
static timestamp that does not change across builds. For some reason the CTC++
report tooling did not like us passing the ultimate default timestamp, aka.
the Unix Epoch, so we instead settled for midnight on January 1 2024.2
Fingerprints: Here we need to calculate a 32-bit value that will reflect
the complete source code in this compilation unit (but importantly with
transient sandbox paths excluded). We don’t have direct access to the
in-progress symbol data that ctc uses to calculate its own fingerprint,
so instead we settle on calculating a CRC32 checksum across the entire
preprocessed source code (before ctc adds its own instrumentation).3
Once we’ve figured out what to rewrite, we can move on to the how:
Using the RUN_AFTER_CPP option to ctc, we can pass in a small script that
calculates our new fingerprint by running the preprocessed source code
through CRC32.
Using the RUN_AFTER_INSTR option to ctc, we can pass in a script that
processes the instrumented source, line by line:
rewriting any absolute paths that point into the Bazel sandbox,
rewriting the timestamp recorded by ctc into our static timestamp, and
rewriting the fingerprint to the one calculated in step 1.
In our script that wraps the ctc invocation, we can insert the above two
options on the ctc command line. We can also instruct ctc to write a
separate .sym file for this compilation unit inside the sandbox.
In the same wrapper script, after ctc is done producing the object file
and symbol file for a compilation unit, we can now rewrite the symbol file
that ctc produced. The rewrites are essentially the same as performed in
step 2, although the syntax of the symbol file is different than the
instrumented source.
At this point, we have managed to make ctc’s outputs reproducible, and we can
proceed to looking at the second problem from above: properly capturing and
maintaining the symbol data generated by ctc. However, we have changed
the nature of the symbol data somewhat: Instead of having multiple compilation
units write to the same MON.sym file outside of the sandbox, we now have one
.sym file per compilation unit written inside the sandbox. These files are
not yet known to Bazel, and would be removed together with the rest of the
sandbox as soon as the compilation step is finished.
Enabling correct cache/reuse of symbol data
What we want to achieve here is for the symbol data associated with a
compilation unit to closely accompany the corresponding object file from the
same compilation unit: If the object file is cached and later reused by Bazel,
we want the symbol file to be treated the same. And when the object file is
linked into an executable or a shared library, we want the symbol file to
automatically become part of any coverage report that is later created based on
running code from that executable or library.
I suspect there are other ways we could handle this, for example using
Bazel aspects, or similar, but since we’re already knee-deep
in compiler wrappers and rewriting outputs…
In for a penny, in for a pound…
Given that we want the symbol file to be as closely associated with the object
file as possible, let’s take that to the ultimate conclusion and make it a
stowaway inside the object file. After all, the object file is “just” an ELF
file, and it does not take too much squinting to regard the ELF format as a
generic container of sections, where a section really can be any piece of
data you like.
The objcopy tool, part of the GNU binutils tool suite, also comes to our aid
with options like --add-section and --dump-section to help us embed and
extract such sections from any ELF file.
With this in hand, we can design the following scheme:
In our wrapper script, after ctc has generated an object file with an
accompanying symbol file, we run
objcopy --add-section ctc_sym=$SYMBOL_FILE $OBJECT_FILE to embed the
symbol file as a new ctc_sym section inside the object file.
We make no changes to our Bazel build, otherwise. We merely expect Bazel to
collect, cache, and reuse the object files as it would do with any
intermediate build output. The symbol data is just along for the ride.
In the linking phase (which is already intercepted by ctc and our wrapper
script) we can forward the symbol data from the linker inputs (ELF object
files) into the linker output (a shared library or executable, also in the
ELF format), like this: Extract the ctc_sym from each object file passed as
input (objcopy --dump-section ctc_sym=$SYMBOL_FILE $OBJECT_FILE /dev/null),
then concatenate these symbol files together, and finally embed that into the
ELF output file from the linker.4
At test run time, in addition to running the tests (which together produce
MON.dat as a side effect), we can iterate over the test executables and
their shared library dependencies, and extract any ctc_sym sections that
we come across. These are then split into separate symbol files and placed
next to MON.dat.
Finally, we can pass MON.dat and all the .sym files on to the ctcreport
report generator to generate the final HTML report.5
Results
With all of the above in place, we can run coverage builds with and without our
changes, while testing various build scenarios, to see what we have achieved.
Let’s look at some sample build times for generating CTC++ coverage reports.
All times below are taken from the best of three runs, all on the same machine.
Status quo
Starting with the situation as of our previous blog post:
Scope of coverage build + tests | bazel build/test | ctcreport | Total
Entire source tree              | 38m46s           | 2m06s     | 44m26s
One large application           | 13m59s           | 43s       | 15m30s
One small application           | 21s              | 1s        | 35s
Since caching is intentionally disabled and there is no reuse between these
coverage builds, these are the kinds of numbers you will get, no matter the
size of your changes since the last coverage build.
Let’s look at the situation after we made the changes outlined above.
Worst case after our changes: No cache to be reused
First, for a new coverage build from scratch (i.e. a situation in which there is
nothing that can be reused from the cache):
Scope of coverage build + tests | bazel build/test | ctcreport | Total  | Speedup
Entire source tree              | 38m48s           | 1m59s     | 43m03s | 1.0x
One large application           | 13m04s           | 43s       | 14m26s | 1.1x
One small application           | 19s              | 1s        | 22s    | 1.6x
As expected, these numbers are very similar to the status quo. After all, we are
doing the same amount of work, and this is not the scenario we sought to improve
in any case.
There is maybe a marginal improvement in the overhead (i.e. the time spent
between/around bazel and ctcreport), but it’s pretty much lost in the noise,
and certainly nothing worth writing a blog post about.
Best case after our changes: Rebuild with no changes
This is the situation where we are now able to reuse already-instrumented
intermediate build outputs. In fact, in this case there are no changes
whatsoever, and Bazel can reuse the test executables from the previous build
directly, no (re-)building necessary. However, as discussed above, we do need
to re-run all tests and then re-generate the coverage report:
Scope of coverage build + tests | bazel build/test | ctcreport | Total | Speedup
Entire source tree              | 3m24s            | 1m58s     | 6m55s | 6.4x
One large application           | 1m31s            | 42s       | 2m49s | 5.5x
One small application           | 1s               | 1s        | 4s    | 8.8x
Common case after our changes: Rebuild with limited change set
This last table is in many ways the most interesting (but least accurate),
as it tries to reflect the common case that most developers are interested in:
“I’ve made a few changes to the source code, how long will I have to wait to
see the updated coverage numbers?”
Of course, as with a regular build, it depends on the size of your changes, and
the extent to which they cause misses in Bazel’s build cache. Here, I’ve made
some small source code changes that cause rebuilds in a handful of compilation
units:
Scope of coverage build + tests | bazel build/test | ctcreport | Total | Speedup
Entire source tree              | 3m23s            | 1m57s     | 6m54s | 6.4x
One large application           | 1m34s            | 42s       | 2m52s | 5.4x
One small application           | 4s               | 1s        | 6s    | 5.8x
The expectation here would be that the total time needed is the sum of how long
it takes to do a regular build of your changes, plus the numbers from the no-op
case above. And this seems to largely hold true, especially for the
single-application case, where we expect your changes to affect the application’s
unit tests, and therefore the build phase must strictly precede the test runs.
In the full source tree scenario, it seems that Bazel can start running other
(unrelated) tests concurrently with building your changes, and as long as your
changes, and the tests on which they depend, are not among the slowest tests
to run, then those other, slower tests will “hide” the marginal build time cost
imposed by your changes.
Conclusion
We have achieved what we set out to do: to leverage the Bazel cache to avoid
unnecessary re-building of coverage-instrumented source code. It involves a
fair amount of added complexity in the build process, in order to make CTC++’s
outputs reproducible, and thus reusable by Bazel, but the end result, in the
common case - a developer making a small source code change relative to a
previous coverage build - is a 5-10x speedup of the total time needed to build
and test with coverage instrumentation, including the generation of the final
coverage report.
Future work
A natural extension of the above scheme is to apply a similar treatment to the
generation of the coverage statistics at test runtime: Bazel allows for test
runs to be cached, so that later build/test runs can reuse the results and logs
from earlier test runs, rather than having to re-run tests that haven’t changed.
However, in much the same way as for symbol data at build time, we would need to
make sure that coverage statistics (.dat files) were saved and reused along
with the corresponding test run results/logs.
One could imagine each test creating a separate .dat file when run, and then
have Bazel cache this together with the test logs. The report generation phase
would then need to collect the .dat files from both the reused/cached and
the new/uncached test runs, and pass them all to the ctcreport tool.
Failure to do so correctly would cause coverage statistics to be lost, and the
resulting coverage report would be misleading.
With all this in place we could then enable caching of test results (in
practice, removing the --nocache_test_results flag that we currently pass),
and enjoy yet another speedup courtesy of Bazel’s cache.
That said, we are entering the realm of diminishing returns: Unit tests - once
they are built - typically run quickly, and there is certainly less time to be
saved here than what is saved by reusing cached build results. Looking at the
above numbers: even if we were able to fully eliminate time used by
bazel test, we would still only achieve another 2x speedup, theoretically.
For now, we can live with re-running all tests from scratch in order to create
a complete MON.dat file, every time.
And that is where I believe it stops: extending this even further to
incrementally generate the coverage report itself, in effect to re-generate
parts of the report based on a few changed inputs, is - as far as I can see -
not possible with the existing tools.
Finally, I want to commend Verifysoft for their understanding and cooperation.
I can only imagine that for someone not used to working with Bazel, our initial
questions must have seemed very eccentric. They were, however, eager to
understand our situation and find a way to make CTC++ work for us. They have
even hinted at including a feature in a future version of CTC++ to allow
shortening/mapping paths at instrumentation time. Using such a feature to
remove the sandbox paths would also have the nice side effect of making CTC++’s
own fingerprint logic reproducible, as far as we can see. Together, this would
enable us to stop rewriting paths and fingerprints on our own.
Thanks to Mark Karpov for being my main co-conspirator in coming up with this
scheme, and helping to work out all the side quests and kinks along the way.
Also thanks to Christopher Harrison, Joseph Neeman, and Malte Poll for their reviews of this article.
Four years ago I bought a pair of YubiKey 5s:
One YubiKey 5 Nano, which fits in my laptop’s USB slot, and another YubiKey 5 NFC as backup, which sat in my home office.
However, I kept worrying about what happens if my house burns down or something, taking both my laptop and office YubiKeys together at the same time.
On the other hand, if I stored my YubiKey 5 NFC offsite, then whenever I needed to register a new FIDO service, I would need to go fetch the key, update it, and then return it.
Based on my personal experience, even if that were not a big pain, the "return it" step often gets delayed indefinitely because it feels so low priority.
Then I read a popular comment made on Hacker News: Get three YubiKeys.
Suddenly everything clicked!
I bought a second YubiKey 5 NFC last year.
Now, I keep a second YubiKey 5 NFC offsite, in addition to the one in my laptop and the one in my office.
If my home burns down, I still have an offsite YubiKey available.
But the best thing about having a second YubiKey 5 NFC is that it partly mitigates the offsite update problem.
In the previous scenario, we required potentially two trips offsite to update the backup YubiKey.
However, now the procedure to register a new FIDO service is to first update the office YubiKey 5 NFC key (and the YubiKey 5 Nano).
Then, at your earliest convenience, you swap the office YubiKey 5 NFC key with the offsite YubiKey 5 NFC.
When you get the offsite YubiKey home, you update it with the new FIDO service and then it becomes the new office YubiKey.
There is no need to return to the offsite location.
Part of the issue is that there is no "public FIDO key", like there is with a "public PGP key".
You need the actual YubiKey in hand to register it with a FIDO service, no matter whether it is a discoverable credential or not.
If you were only using the YubiKey as an OpenPGP smart card, then perhaps you could get away with just having a local key and an offsite key.
Even so, I would recommend a third YubiKey so that whenever the time comes to do some operation on your offsite key, you can perform the same swapping trick.
The title of this article says that three is the right number of YubiKeys.
However, this is because I have only one Nano, in my laptop, since that is my primary computing interface.
I do have a desktop computer that I mostly only access as a remote server.
If you have multiple computer devices that you regularly use, it would make sense to have a YubiKey nano device in each of them.
And in addition to those, have one offsite key, and one local key for swapping with the offsite key.
Retrieval-augmented generation (RAG) is about providing large language models
with extra context to help them produce more informative responses. Like
any machine learning application, your RAG app needs to be monitored and
evaluated to ensure that it continues to respond accurately to user queries.
Fortunately, the RAG ecosystem has developed to the point where you can evaluate
your system in just a handful of lines of code.
The outputs of these evaluations are easily interpretable: numbers between
0 and 1, where higher numbers are better. Just copy our
sample code below, paste it into your continuous monitoring system,
and you’ll be looking at nice dashboards in no time. So that’s it, right?
Well, not quite. There are several common pitfalls in RAG evaluation. From this
blog post, you will learn what the metrics mean and how to check that they’re
working correctly on your data, drawing on the knowledge we have gained in the field. As they say,
“forewarned is forearmed”!
Background
If you’re new to RAG evaluation, our previous posts about it give an
introduction to evaluation and discuss benchmark suites.
For now, you just need to know that a benchmark suite consists of a collection of
questions or prompts, and for each question establishes:
a “ground truth” context, consisting of documents from our database that are relevant
for answering the question; and
a “ground truth” answer to the question.
For example:

Query                          | Ground truth context                                                                                | Ground truth answer
What is the capital of France? | Paris, the capital of France, is known for its delicious croissants.                               | Paris
Where are the best croissants? | Lune Croissanterie, in Melbourne, Australia, has been touted as ‘the best croissant in the world.’ | Melbourne
Then the RAG system provides (for each question):
a “retrieved” context — the documents that our RAG system thought were relevant — and
a generated answer.
The inputs to a RAG evaluator
Example
Here’s an example that uses the
Ragas library to evaluate the “faithfulness” (how well
the response was supported by the context) of a single RAG output, using an LLM
from AWS Bedrock:
from langchain_aws import ChatBedrockConverse
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness
# In real life, this probably gets loaded from an internal file (and hopefully
# has more than one element!)
eval_dataset = EvaluationDataset.from_list([{
    "user_input": "What is the capital of France?",
    "retrieved_contexts": ["Berlin is the capital of Germany."],
    "response": "I don't know.",
}])

# The LLM to use for computing metrics (more on this below).
model = "anthropic.claude-3-haiku-20240307-v1:0"
evaluator = LangchainLLMWrapper(ChatBedrockConverse(model=model))

print(evaluate(dataset=eval_dataset, metrics=[Faithfulness(llm=evaluator)]))
If you paid close attention in the previous section, you’ll have noticed
that our evaluation dataset doesn’t include all of the components we talked about.
That’s because the “faithfulness” metric only requires the retrieved context and
the generated answer.
RAG evaluation metrics
There are a variety of RAG evaluation metrics available; to keep them straight, we
like to use the RAG Triad, a helpful system of categorizing some RAG metrics.
A RAG system has one input (the query) and two outputs
(the context and the response), and the RAG Triad lets us visualize the three interactions
that need to be evaluated.
The RAG triad
Evaluating retrieval
Feeding an LLM with accurate and relevant context can help it respond well; that’s the
whole idea of RAG.
Your system needs to find that relevant context, and your
evaluation system needs to figure out how well the retrieval is working. This
is the top-right side of the RAG Triad: evaluating the relationship between the query
and the retrieved context.
The two main retrieval metrics are precision and recall; each one has a classical
definition, plus an “LLM-enhanced” definition for RAG.
Roughly, “good precision” means that we don’t return irrelevant information,
while “good recall” means that we don’t miss any relevant information. Let’s
say that each of our benchmark queries is labelled with a ground truth set of
relevant documents, so that we can check how many of the retrieved documents are relevant.
Then the classical precision and recall are
\text{precision} = \frac{\text{\# relevant retrieved docs}}{\text{\# retrieved docs}}
\qquad
\text{recall} = \frac{\text{\# relevant retrieved docs}}{\text{\# relevant docs in the database}}
These metrics are well-established, useful, and easy to compute. But in a RAG system, the
database might be large, uncurated, and contain redundant documents.
For example, suppose you have ten related documents, each containing an
answer to the query. If your retrieval system returns just one of them then it will have
done its job adequately, but it will only receive a 10% recall score.
With a large database, it’s also possible that there’s a document with the necessary
context that wasn’t tagged as relevant by the benchmark builder. If the retrieval
system finds that document, it will be penalized in the precision score even though
the document is relevant.
Because of these issues with classical precision and recall, RAG evaluations often
adapt them to work on statements instead of documents. We
list the statements in the ground-truth context and in the retrieved
context; we call a retrieved statement “relevant” if it was present in the ground-truth
context.
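Spelled out (our paraphrase of the statement-level definitions above):

\text{precision} = \frac{\text{\# retrieved statements found in the ground-truth context}}{\text{\# retrieved statements}}
\qquad
\text{recall} = \frac{\text{\# ground-truth statements found in the retrieved context}}{\text{\# ground-truth statements}}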
This definition of precision and recall is better tailored to RAG than the classical one, but
it comes with a big disadvantage: you need to decide what a “statement” is, and whether two statements
are “the same.” Usually you’ll want to automate this decision with an LLM, but that raises
its own issues with cost and reliability. We’ll say more about that later.
Evaluating generation
Once your retrieval is working well — with continuous monitoring and
evaluation, of course — you’ll need to evaluate your generation step. The most
commonly used metric here is faithfulness1, which
measures whether a generated answer is factually supported by the retrieved
context; this is
the bottom side of the RAG Triad.
To calculate faithfulness, we count the number of factual claims in
the generated answer, and then decide which of them is supported
by the context. Then we define
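Spelling that out:

\text{faithfulness} = \frac{\text{\# claims in the answer supported by the context}}{\text{\# claims in the answer}}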
Like the RAG-adapted versions of context precision and recall,
this is a statement-based metric. To automate it, we’d need an LLM
to count the factual claims and decide which of them is context-supported.
You can evaluate faithfulness without having retrieval working yet, as long as you
have a benchmark with ground truth contexts. But if you do that, there’s one crucial point
to keep in mind: you also need to test generation when retrieval is bad, like when
it contains distracting irrelevant documents or just doesn’t have anything useful at all.
Bad retrieval will definitely happen in the wild, and so you need to ensure that your
generation (and your generation evaluation) will degrade gracefully. More on that below.
Evaluating the answer
Finally, there is a family of
commonly-used generation metrics that evaluate the quality of the answer by comparing it
to the prompt and the ground truth:
answer semantic similarity measures the semantic similarity between the generated answer and the ground truth;
answer correctness also compares the generated answer and the ground truth, but is based on counting factual claims
instead of semantic similarity; and
answer relevance measures how well the generated answer corresponds to the question that the prompt asked.
This is the top-left side of the RAG Triad.
These metrics directly get to the key outcome of your RAG system: are the generated responses good?
They come with the usual pluses and minuses of end-to-end metrics. On the one hand, they measure
exactly what you care about; on the other hand, when they fail you don’t know which component
is to blame.
As you’ve seen above, many of the metrics used for evaluating RAG rely on
LLMs to extract and evaluate factual claims. That means that some of the
same challenges you’ll face while building your RAG system also apply to its
evaluation:
You’ll need to decide which model (or models) to use for evaluation, taking into account cost, accuracy, and reliability.
You’ll need to sanity-check the evaluator’s responses, preferably with continuous monitoring and
occasional manual checks.
Because the field is moving so quickly, you’ll need to evaluate the options yourself — any
benchmarks you read online have a good chance of being obsolete by the time you
read them.
When the judges don’t agree
In order to better understand these issues, we ran a few experiments on a basic RAG
system — without query re-writing, context re-ranking or other tools to improve
retrieval — using the Neural Bridge benchmark dataset as our test set. We first ran
these experiments in early 2024; when we re-visited them in December 2024 we found
that newer base LLMs had improved results somewhat but not dramatically.
The Neural Bridge dataset contains 12,000 questions; each one comes with a
context and an answer. We selected 200 of these questions at random and ran them
through a basic RAG system using Chroma DB as the vector store and either Llama
2 or Claude Haiku 3 as the LLM for early 2024 and December 2024 runs, respectively. The RAG system was
not highly tuned — for example, its retrieval step was just a vector similarity search
— and so it gave a mix of good answers, bad answers, and answers saying essentially
“I don’t know: the context doesn’t say.”
Finally, we used Ragas to evaluate various metrics on the generated responses, while varying
the LLMs used to power the metrics.
Experimental results
Our goal in these experiments was to determine:
whether the LLM evaluators were correct, and
whether they were consistent with one another.
We found that different LLMs are often not in agreement. In particular, they can’t all be correct.
Here are the evaluation scores of five different models on four different metrics,
averaged across our benchmark dataset. You’ll notice a fair amount of spread
in the scores for faithfulness and context precision.
Average metrics scores across models
But the scores above are just averages across the dataset — they don’t tell
us how well the LLMs agreed on individual ratings. For that, we checked the
correlation
between model scores and again found some discrepancies between models.
Here are the results for answer relevancy scores: the correlations show
that even though the different models gave very similar average scores,
they aren’t in full agreement.
Correlation of answer relevancy scores across models. A score of one means
that the models agree completely, while a score of zero means that they
agree or disagree essentially at random.
It might not be too surprising that models from the same family (GPT 3.5 and 4,
and Sonnet 3 and 3.5) had larger overlaps than models from different families.
If your budget allows it, choosing multiple uncorrelated models and evaluating
with all of them might make your evaluation more robust.
When faithfulness gets difficult
We dug a little more into the specific reasons for LLM disagreement, and found
something interesting about the faithfulness score:
we restricted to the subset of questions for which
retrieval was particularly bad, having no overlap with the ground truth data.
Even the definition of faithfulness is tricky when the context is
bad. Let’s say the LLM decides that the context doesn’t have relevant information
and so responds “I don’t know” or “The context doesn’t say.” Are those factual statements?
If so, are they supported
by the context? If not, then according to the definition, the faithfulness is zero
divided by zero. Alternatively, you could try to detect responses like this
and treat them as a sort of meta-response that doesn’t go through the normal
metrics pipeline. We’re not sure how best to handle this corner case, but we do know
that you need to do it explicitly and consistently. You also need to be prepared
to handle null values and empty responses from your metrics pipeline, because
this situation often induces them.
Experimental results
On the subset of questions with poor retrieval our Ragas-computed faithfulness
scores ranged from 0%, as judged by Llama 3, to more than 80%, as judged by
Claude 3 Sonnet. We emphasize that these were faithfulness scores evaluated by
different LLMs judging the same retrievals, responses, and generated answers.
Even if you exclude Llama 3 as an outlier, there is a lot of variation.
Faithfulness scores across models, when the context is bad
This variation in scores doesn’t seem to be an intentional choice (to the extent
that LLMs can have “intent”) by the evaluator LLMs, but rather a situation of corner
cases compounding one another. We noticed that this confusing situation made some
models — Llama 3 most often, but also other models — fail to respond in the JSON
format expected by the Ragas library. Depending on how you treat these failures, this
can result in missing metrics or strange scores.
You can sidestep these issues somewhat if you have thorough evaluation across the entire
RAG pipeline: if other metrics are flagging poor retrieval, it matters less that your
generation metrics are behaving strangely on poorly-retrieved examples.
In general, there’s no good substitute for careful human evaluation. The LLM judges don’t agree, so
which one agrees best with ground truth human evaluations (and is the agreement good enough
for your application)? That will depend on your documents, your typical questions, and on future releases of improved models.
Conclusion
Oh, were you hoping we’d tell you which LLM you should use? No such luck: our advice would be
out of date by the time you read this, and if your data doesn’t closely resemble our benchmark
data, then our results might not apply anyway.
In summary, it’s easy to compute metrics for your RAG application, but don’t just
do it blindly. You’ll want to test different LLMs for driving the metrics, and you’ll need
to evaluate their outputs. Your metrics should cover all the sides of the RAG triad, and
you should know what they mean (and be aware of their corner cases) so that you can
interpret the results. We hope that helps, and happy measuring!
The terminology is not quite settled: what Ragas calls “faithfulness,” TruLens
calls “groundedness.” Since the RAG Triad was introduced by TruLens, you’ll
usually see it used in conjunction with their terminology. We’ll use the Ragas
terminology in this post, since that’s what we used for our experiments.↩
The GHC developers are happy to announce the availability of the first and
likely final release candidate of GHC 9.12.2. This is an important bug-fix
release resolving a significant correctness issue present in 9.12.1
(#25653).
In accordance with our under-discussion release policies this candidate
will have a two-week testing window. The final 9.12.2 release will likely come
the week of 12 March 2025.
As always, if you find anything amiss please open a ticket.
Cooked Validators is a Haskell library designed to simplify
the complex process of crafting and testing transactions on the Cardano
blockchain. Writing proper transactions in Cardano can be challenging due to its
UTXO-based model, which requires precise definitions and careful structuring of
inputs, outputs, and complementary components. cooked-validators tackles these
challenges by offering a powerful framework for defining transactions in a
minimal and declarative manner while incorporating a significant degree of
automation.
One of the library’s core strengths lies in its ability to help developers
transform simple transaction templates, referred to as “skeletons”, or TxSkel,
into fully-formed transactions that satisfy the technical requirements of
Cardano’s validation process. This automation not only minimizes boilerplate
code but also reduces the room for errors, thus streamlining the creation and
testing of transactions. In particular, we’ve used cooked-validators extensively
to rigorously audit smart contracts for many
clients and well-known products now live on
Cardano.
Although cooked-validators has been a reliable tool for years, no blog post
has yet explored how it automates key aspects of transaction creation,
simplifying complex processes into manageable workflows. This post aims to fill
that gap by showcasing how the library helps developers build Cardano
transactions with ease and efficiency, allowing them to focus on high-level
design and intent rather than getting bogged down by low-level technical
details.
Validating transactions in cooked-validators
cooked-validators provides a convenient way to interact with the blockchain
through a type class abstraction, MonadBlockChain. Among
the primitives provided by this type class, the most fundamental is
validateTxSkel which:
takes a transaction skeleton as input,
expands the skeleton’s content based on missing parts and skeleton options,
generates a transaction,
submits this transaction for validation, and
returns the validated transaction, or throws an error if it is invalid.
Thus, the function has the following type signature:
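In sketch form (the exact constraint and return type may differ slightly in the library, but this is the shape of it):

-- Sketch: takes a skeleton, and returns the validated transaction in the
-- MonadBlockChain monad (or throws an error if validation fails).
validateTxSkel :: MonadBlockChain m => TxSkel -> m CardanoTx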
In the remainder of this post, we will explore the fields of the transaction
skeleton (TxSkel) and how validateTxSkel behaves when automatically
expanding this skeleton.
Transaction skeletons
Cardano transactions are usually represented by large Haskell records containing
a predefined set of fields that evolve alongside the Cardano protocol. The
traditional approach to building transactions involves directly creating
instances of these records and submitting them for validation.
In cooked-validators, however, transactions are further abstracted through a
custom record called TxSkel, which has its own set of fields, some of which map directly to
corresponding fields in a Cardano transaction, while others guide the
translation process. The primary motivation behind using this abstraction is to
highlight the most relevant information for common use cases while hiding less
critical details that can be inferred automatically based on the provided
data1.
There are several additional reasons for the use of TxSkel:
Our transaction skeletons embed as much type information as possible for
scripts and UTXOs, thus increasing type-safety.
Each transaction skeleton includes its own set of options to guide transaction
generation, with sensible default values.
Our transaction skeletons have default values for all fields, allowing users
to provide minimal information relevant to their use-case.
Our skeleton elements use meaningful yet simple types. By defaulting to the
current Cardano era, they avoid the complex overlays and type annotations
commonly found in the Cardano or Ledger APIs.
While TxSkel is designed to be lighter and more user-friendly than Cardano
transactions, it does not compromise user flexibility. Since TxSkel ultimately
generates Cardano transactions, users are provided with the option to manually
tweak the generated transaction if desired. This ensures that users retain full
control and can build their Cardano transactions in any way they prefer.
To build a transaction skeleton, users simply override the fields they need to
set from the default skeleton, txSkelTemplate.
txSkelTemplate {txSkelIns = ..., txSkelMints = ..., ...}
From manual ADA payments to automated transaction balancing
The first feature one might expect from a transaction is to pay assets to a
given peer. Surprisingly, this can be quite complex due to the underlying
extended UTXO model on which Cardano is based. Without diving too deeply into
the details, it’s important to understand that exchanging assets in Cardano is
done through “pouches” of various sizes, called UTXOs. If Alice wants to send 12
ADA (Cardano’s currency) to Bob, and she possesses one UTXO with 4 ADA and
another with 10 ADA, she will have to provide both UTXOs, create a new UTXO with
12 ADA for Bob, and return a UTXO with 2 ADA for herself. Moreover, she will
also need to account for transaction fees, meaning the returning UTXO will
actually contain something like 1.998222 ADA (1,998,222 lovelace).
In summary, this seemingly simple payment of 12 ADA will result in a transaction
with 2 inputs and 2 outputs, along with an additional “phantom” payment
corresponding to the transaction fees. However, from the user’s perspective, the
key point is that Alice needs to pay 12 ADA to Bob. cooked-validators allows
users to focus on these high-level intentions, as demonstrated by the following
skeleton:
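The skeleton itself is missing from this extract; a minimal sketch follows. Only txSkelTemplate appears in the surrounding text, so the field and helper names txSkelOuts, txSkelSigners, paysPK and ada are illustrative assumptions:

-- Sketch: pay 12 ADA to Bob, with Alice signing (field and helper names assumed)
simplePayment :: TxSkel
simplePayment =
  txSkelTemplate
    { txSkelOuts    = [paysPK bob (ada 12)] -- the payment to Bob
    , txSkelSigners = [alice]               -- Alice signs and will fund the transaction
    }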
In this skeleton, we specify that the transaction pays 12 ADA to Bob and that
Alice is a signer of the transaction. And that’s it.
Internally, cooked-validators processes this skeleton through a balancing
phase. In this context, “balancing” is a multifaceted term. It not only refers
to ensuring that the inputs and outputs of the transaction contain the same
amount of ADA (and other assets)2, but also to calculating fees, accounting
for them in the transaction, and handling associated collaterals when necessary
(funds that are made available within the transaction in case a script failure
occurs during validation). This automated process is a part of the added value
provided by cooked-validators.
Computing fees, collaterals, and balancing transactions is notoriously difficult
in Cardano due to circular dependencies (higher fees imply more collaterals,
which increase transaction size, which in turn leads to higher fees…) and the
unpredictable resource consumption of scripts in terms of memory space and
computation cycles. See cooked-validators’s
documentation for the details of what balancing involves, how
cooked-validators performs it, and the options available to control
balancing. Notably, cooked-validators is non-invasive, meaning that the automation
can be disabled if needed. For instance, users can manually set fees and
collaterals and even balance their transactions themselves.
After balancing, the skeleton will look like this:
In most cases, this skeleton will remain hidden from the user, though it can be
retrieved and used if necessary by manually invoking the balancing function or
checking the logs.
From manual payments to automated minimal amount of ADA
While Alice is using Cardano, she might come across non-ADA tokens with custom
names4, such as mySmartContractToken. These tokens are provided by smart contracts
and dedicated to specific purposes such as
NFTs to represent ownership
of a certain resource. Alice might also want to send such a token to Bob:
As shown above, cooked-validators will attempt to balance this skeleton by
retrieving an instance of mySmartContractToken from Alice’s UTXOs, along with
the necessary ADA to cover the transaction fee. However, validating the
resulting balanced skeleton will fail because Cardano requires every UTXO to
include a minimum amount of lovelace to cover its storage cost. This minimum
amount, derived from the protocol parameters, also acts as a safeguard against
potential security risks that could arise if UTXOs were allowed to exist without
any ADA. Thankfully, cooked-validators can automatically calculate this required
amount when the appropriate transaction option is enabled. The updated skeleton
then becomes:
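The updated skeleton is not preserved here either; the sketch below conveys the idea. Only the option name txOptEnsureMinAda appears in the text, so txSkelOpts, def, and the other field and helper names are assumptions:

-- Sketch: pay a custom token to Bob, letting the library add the minimal ADA (names partly assumed)
tokenPayment :: TxSkel
tokenPayment =
  txSkelTemplate
    { txSkelOuts    = [paysPK bob mySmartContractToken]
    , txSkelSigners = [alice]
    , txSkelOpts    = def {txOptEnsureMinAda = True} -- opt in to the automatic min-ADA adjustment
    }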
Enabling this option triggers an initial transformation pass, before balancing,
which calculates the required amount of ADA to sustain the output and adds this
amount to the transaction skeleton. After both passes, the skeleton will
resemble the following, with remainingValue being the original value in
Alice’s UTXO minus the fees and the payment to Bob:
By default, txOptEnsureMinAda is set to False, which may seem
counterintuitive. However, this prevents unexpected adjustments to ADA amounts
that may have been carefully computed. If a transaction output is meant to
contain a specific ADA amount based on a precise calculation, but the protocol
requires a higher minimum, enabling this option would silently modify the
value. This could obscure computation errors, allowing transactions to validate
without the user realizing the discrepancy. To stay true to
cooked-validators’s philosophy of minimal intervention, the option remains off
by default, ensuring that any necessary adjustments are made explicitly.
From spending scripts to automated script witness binding
In the previous examples, we saw how cooked-validators can handle the addition
of inputs in a transaction skeleton. However, there are cases where one might
want to manually specify the inputs. This is typically necessary when a
transaction needs to consume UTXOs belonging to scripts, in which case a
redeemer must be provided, as it cannot be inferred automatically. A redeemer is
a piece of information (which may be empty) required whenever a script from a
smart contract is invoked. This redeemer usually informs the script as to why it
has been called, and can also pass dynamic values as inputs to the script. In
the examples above, the added inputs were UTXOs from peers, so
emptyTxSkelRedeemer was automatically provided.
When consuming scripts, collaterals must be included in case the validation
process fails after the script execution. These collaterals cover the
computation resources used during validation, which cannot be covered by fees,
as fees are only paid if the transaction is successfully validated. The
inclusion (or omission) of collaterals, depending on whether the transaction
involves scripts, is handled during balancing. Collaterals can only be provided
as UTXOs from peers, so a signer is also required in such cases, even if no peer
UTXO is consumed. A transaction skeleton that consumes a script can thus be
written as:
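The corresponding skeleton is also missing from this extract; a rough sketch follows. The map-shaped txSkelIns field (here using Data.Map, imported qualified as Map), the scriptOutRef reference and the someRedeemer value are assumptions; only txSkelIns and emptyTxSkelRedeemer are named in the text:

-- Sketch: consume a UTXO locked by a script, providing a redeemer (shapes and names partly assumed)
spendFromScript :: TxSkel
spendFromScript =
  txSkelTemplate
    { txSkelIns     = Map.singleton scriptOutRef someRedeemer -- the script UTXO and the redeemer it expects
    , txSkelSigners = [alice]                                 -- a signer is still needed for fees and collateral
    }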
From this skeleton, cooked-validators offers two types of automation. The first
is the balancing mechanism, which has already been discussed. Beyond computing
fees and collaterals, the balancing process also creates an output at the first
signer’s address to return any excess value from inputs and consumes a UTXO from
the user to cover the transaction fees.
The second automation concerns the addition of script witnesses. On-chain,
scripts are represented by their hash, which serves different purposes depending
on the script’s type3—address for spending scripts, policy ID for minting
scripts, or staking ID for staking scripts. However, during validation, scripts
must be executed, and their hash alone is insufficient. Instead, users must
supply the full scripts as witnesses, ensuring their hash matches the expected
on-chain hash.
When a UTXO is created at a spending script’s address, cooked-validators
retains the script, allowing it to automatically attach the required witness for
these inputs in future transactions. However, for minting or staking scripts,
the tool lacks knowledge of the necessary witnesses, so they must be specified
manually.
Since September 2022, Cardano has supported reference scripts, which are complete
scripts stored on-chain in UTXOs. These reference scripts can be used as
witnesses in place of the full script, reducing transaction size and
fees. cooked-validators also automates the inclusion of such reference
scripts. In practice, when a script witness is required, the following process
unfolds:
if a witness is manually provided by the user, it is used as is.
if no such witness exists, cooked-validators attempts to find a reference
witness among known UTXOs.
if no such witness could be found, and the script is used for spending a
UTXO, cooked-validators attempts to find a direct witness among its known
scripts.
In the previous example, assuming a reference witness was present on some UTXO,
the skeleton will look like this:
From issuing proposals to automated deposit payment
The final type of automation we will discuss in this post involves proposals
issued by users, a feature introduced in the Conway era. These proposals can
vary, but the most common are parameter changes, where users propose new values
for parameters that control on-chain behaviors. These proposals must obey a set
of constitutional rules, which is checked by a constitution script. For example,
here is a skeleton where Alice proposes to update the cost of fees per byte in
the size of a transaction to 100 lovelace, witnessed by a given constitution
script:
Each proposal requires a deposit of a certain amount of lovelace, as specified
by the protocol parameters. cooked-validators takes such deposits into account
during the balancing phase. It looks up the current required deposit amount and
retrieves this amount from the available UTXOs from the balancing wallet to
include in the transaction. After balancing, the skeleton will look like this:
Currently, cooked-validators allows users to provide any constitution
script to validate whether the proposal adheres to constitutional rules. In
practice, the ledger rejects any such script that does not correspond to the
current official Cardano constitution. Thus, in the future, cooked-validators
might automatically fetch this script and attach it to proposals.
Conclusion
One of cooked-validators’ main strengths is its ability to allow users to
express their high-level transaction requirements conveniently and efficiently,
without having to deal with the intricate technical details of the resulting
transaction. This is achieved through TxSkels, which are transaction
abstractions that can be partially filled by users. cooked-validators performs
several passes on these partial skeletons, such as filling in missing minimal
ADA, balancing the transaction, and automatically adding witnesses, to translate
these minimal skeletons into transactions that can be submitted for
validation. This blog post has summarized these key automation steps; stay tuned
for more posts around cooked-validators.
it is always possible to override those fields in the generated transaction
though, as cooked-validators never forces users to build their
transactions one way or another.↩
the actual balancing equation is more complicated: withdrawals + inputs + mints = burn + outputs + deposits + fees↩
this name here stands for the combination of a token name and a policy ID.↩
all scripts are defined in the same way since the Conway era; what we call
script types are only abstractions to reference the way they are used.↩
In this episode Wouter Swierstra and Niki Vazou talk with Conal Elliott. Conal discusses doing things just for the poetry, how most programs miss their purpose, and the simplest way to ask a question. Conal is currently working on a book about his ideas and actively looking for partners.
Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context:
instance (Foo a, Foo b) => Foo (a, b) where
  ...
Then, of course, if we know both Foo a and Foo b, we can infer Foo (a, b). To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the Dict type, and thus the following program will typecheck:
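The program itself is not preserved in this extract; the following is a minimal reconstruction, assuming the Foo class from the post, the Dict type from the constraints package, and a hypothetical name forwards:

import Data.Constraint (Dict (..))

forwards :: Dict (Foo a, Foo b) -> Dict (Foo (a, b))
forwards Dict = Dict  -- matching on Dict brings Foo a and Foo b into scope; the instance above then solves Foo (a, b)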
with the only change required coming from the type constructor instances:
instance (Foo a, Foo b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...
or, if you want to be cute about it:
instance Evidence (a, b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...
By sticking Evidence into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a Foo dictionary around. And our earlier backwards program now typechecks as expected.
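That backwards program does not appear in this extract; presumably it is the dual of the forwards sketch above, which now typechecks thanks to the Evidence superclass:

backwards :: Dict (Foo (a, b)) -> Dict (Foo a, Foo b)
backwards Dict = Dict  -- the superclass Evidence (a, b), i.e. (Foo a, Foo b), comes for free with the Foo (a, b) dictionary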
When I first joined the Topiary Team, I floated the idea of trying to
format Bash with Topiary. While this did nothing to appease my
unenviable epithet of “the Bash guy,” it was our first foray into
expanding Topiary’s support beyond OCaml and simple syntaxes like JSON.
Alas, at the time, the Tree-sitter Bash grammar was
not without its problems. I got quite a long way, despite this, but
there were too many things that didn’t work properly for us to graduate
Bash to a supported language.
Fast-forward two years and both Topiary and the Tree-sitter Bash grammar
have moved on. As the incumbent Bash grammar was beginning to cause
downstream problems from bit rot — frustratingly breaking the builds of
both Topiary and Nickel — my fellow Topiarist, Nicolas Bacquey,
migrated Topiary to the latest version of the Bash grammar and updated
our Bash formatting queries to match.
With surprisingly little effort, Nicolas was
able to resolve all those outstanding problems. So with that, Bash was
elevated to the lofty heights of “supported language” and — with the
changes I’ve made from researching this blog post — Bash formatting is
now in pretty good shape in Topiary v0.6.
So much so, in fact, let me put my money where my mouth is! Let’s see
how Topiary fares against a rival formatter. I’ll do this, first, by
taking you down some of the darker alleys of Bash parsing, just to show
you what we’re up against.
Hello darkness, my old friend
There is a fifth dimension beyond that which is known to man. It is a
dimension as vast as space and as timeless as infinity. It is the middle
ground between light and shadow, between science and superstition; it
lies between the pit of man’s fears and the summit of his knowledge.
This is the dimension of imagination. It is an area we call: the Bash
grammar.
In our relentless hubris, man has built a rocket that — rather than
exploding on contact with reality — dynamically twists and turns to
meet reality’s expectations. Is that a binary? Execute it! Is that a
built-in? Execute it! Is that three raccoons in a trench coat,
masquerading as a function? Execute it! And so, with each token parsed,
we are Bourne Again and stray ever further from god.
Trailing comments must be preceded by whitespace or a semicolon.
However, if either of those are escaped, they are interpreted as
literals and this changes the tokenisation semantics:
echo \ # Ceci n'est pas
| une pipe'
Here, perhaps the writer intended to add a comment against the first
line. But, what looks like a comment isn’t a comment at all; it
becomes an argument to echo, along with everything that follows.
That includes the apostrophe in “n’est”, which is interpreted as an
opening quote — a raw string — which is closed at the end of the
next line.
Case statements idiomatically delimit each branch condition with a
closing parenthesis. In a subshell, for example, this leads to
unbalanced brackets:
(case $x in
  foo ) # Wat?
    ...
    echo bar;;
esac) # 🤯
This subshell outputs bar when the variable $x is equal to foo.
Whereas, on a more casual reading, this formulation might just look
like a confusing syntax error.
Speaking of case statements, did you know that ;& and ;;& are also
valid branch terminators? Without checking the manual — if you can
find the single paragraph where it’s mentioned — can you tell me
how they differ?
Bash will try to compute an array index if it looks like an arithmetic
expression:
# Output the (foo - bar)th element of array
echo "${array[foo-bar]}"
However, if array in this example is an associative array (i.e., a
hash map/dictionary), then foo-bar could be a valid key. In which
case, it’s not evaluated and used verbatim.
Without backtracking, it’s not possible to distinguish between an
arithmetic expansion and a command substitution containing a subshell
at its beginning or end:
echo $((foo + bar))
echo $((foo);(bar))
Here, the first statement will output the value of the addition of
those two variables; the second will execute foo then bar, each in
a subshell, echoing their output. In the subshell case, the POSIX
standards even recommend that you add spaces — e.g., $( (foo) ) —
to remove this ambiguity.
Heredocs effectively switch the parser into a different state, where
everything is interpreted literally except when it isn’t. This alone
is tricky, but Bash introduces some variant forms that allow
additional indentation (with hard tabs), switching off all string
interpolation, or both.
# Indented, with interpolation
cat <<-HEREDOC
I am a heredoc. Hear me roar.
HEREDOC
Suffice to say, any formatter has their work cut out.
Battle of the Bash formatters
The de facto formatter for Bash is shfmt. It’s written in Go,
by Daniel Martí, actively maintained and has been around for the best
part of a decade.
Let’s compare Topiary’s Bash formatting with shfmt in a contest worthy
of a Netflix special. I’ll look specifically at each tool’s parsing and
formatting capabilities as well as their performance characteristics. I
won’t, however, compare their subjective formatting styles, as this is
largely a matter of taste.
When it comes to formatting Bash in a way that is commonly attested in
the wild, there are three things that Topiary cannot currently do.
Unfortunately, these stem either from the absence of a feature in
Topiary or from a lack of fidelity in the Tree-sitter grammar; no amount of
hacking on queries will fix them.
The worst offender is probably the inability to distinguish line
continuations from other token boundaries. These are used in Bash
scripts all the time to break up long commands into more digestible
code. In the following example, the call to topiary was spread over
multiple lines, with line continuations. Topiary slurps everything onto
a single line, whereas shfmt preserves the original line continuations
in the input:
One saving grace is that Topiary’s Bash parser understands a trailing
|, in a pipeline, to accept a line break. As such — while it isn’t my
personal favourite style3 — Topiary does support multi-line
pipelines. Arguably, they even look a little nicer in Topiary than in
shfmt, which only preserves where the line breaks occurred in the
input:
# Topiary
foo |
bar |
baz |
quux
# shfmt
foo | bar |
baz | quux
Otherwise, in Topiary, every command is a one-liner…whether you like
it or not!
Next on the “nice to have” list is the long-standing (and
controversial) feature request of “alignment blocks”;
specifically for comments. That is, presumably related comments
appearing on a series of lines should be aligned to the same column:
# Topiary
here # comment
is # comment
a # comment
sequence # comment
of # comment
commands # comment
# shfmt
here     # comment
is       # comment
a        # comment
sequence # comment
of       # comment
commands # comment
The tl;dr of the controversy is that, despite being a popular request —
and we all know where popularity gets us, these days — it’s a slap in
the face to one of Topiary’s core design principles: minimising diffs.
Because we live in a universe where elastic tabstops never really took
off, a small change to the above example — say, adding an option to one
of the commands — would produce the following noisy diff:
For the time being, Topiary won’t be making alignment great again.
Finally, string interpolations — with command substitution and
arithmetic expansions — cannot be formatted without potentially
breaking the string itself. This is particularly true of heredocs; the
full subtleties of which escape the Tree-sitter Bash grammar and so are
easily corruptible with naive formatting changes. As such, Topiary has
to treat these as immutable leaves and leave them untouched:
# Topiary
echo "2 + 2 = $((2+2))"
cat <<EOF
Today is $(date)
EOF
# shfmt
echo "2 + 2 = $((2+2))"
cat <<EOF
Today is $(date)
EOF
So far, I have only found three constructions that are syntactically
correct, but the Tree-sitter Bash grammar cannot parse (whereas, shfmt
can):
A herestring that follows a file redirection (issue #282):
rev > output <<< hello
A workaround, for now, is to switch the order; so the herestring
comes first.
Similar to line continuations, the Tree-sitter Bash grammar seems to
swallow escaped spaces at the beginning of tokens, interpreting them
as tokenisation whitespace rather than literals (issue #284):
# This should output:
# <a>
# <b>
# < >
# <c>
printf "<%s>\n" a b \ c
For what it’s worth, shfmt also supports POSIX shell and mksh (a
KornShell implementation). As of writing, there are no Tree-sitter
grammars for these shells. However, their syntax doesn’t diverge too far
from Bash, so it’s likely that Topiary’s Bash support will be sufficient
for large swathes of such scripts. Moreover, the halcyon years of the
1990s are a long way behind us, so maybe this doesn’t matter.
shfmt is part of a wider project that includes a Bash parser for the
Go ecosystem. A purpose-built parser, particularly for Bash, should
perform better than the generalised promise of Tree-sitter and, indeed,
that’s what we see. However, there are a few minor constructions that
shfmt doesn’t like, but the Tree-sitter Bash grammar accepts:
An array index assignment which uses the addition augmented
assignment operator:
my_array=(
foo
[0]+=bar
)
To be fair to shfmt, while this is valid Bash, not even the
venerable ShellCheck can parse this!
Topiary leaves array indices unformatted, despite them allowing
arithmetic expressions. shfmt, however, will add whitespace to any
index that looks like an arithmetic expression (e.g., [foo-bar]
will become [ foo - bar ]); even if the original, unspaced version
could be a valid associative array key.
(Neither Topiary nor shfmt can handle indices containing spaces.
However, the standard Bash workaround™ is to quote these:
${array["foo bar"]}.)
Brace expansions can appear — perhaps
surprisingly — almost anywhere. Particularly surprising to shfmt
is when they appear in variable declarations, which it cannot parse:
While it’s a bit of a hack,4 we also implement something akin to
“rewrite rules” in our Topiary Bash formatting queries, which shfmt
(mostly) doesn’t do. This is to enforce a canonical style over certain
constructions. Namely:
All $... variables are rewritten in their unambiguous form of
${...}, excluding special variables such as $1 and $@. (Note
that this doesn’t affect $'...' ANSI C strings, despite their
superficial similarity.)
All function signatures are rewritten to the name() { ... } form,
rather than function name { ... } or function name() { ... }.
All POSIX-style [ ... ] test clauses are rewritten to the Bash
[[ ... ]] form.
All legacy $[ ... ] arithmetic expansions are rewritten to their
$(( ... )) form.
All `...` command substitutions are rewritten to their
$( ... ) form.
(This is one that shfmt does do.)
Technically, it is also possible to write rules that put quotes around
unquoted command arguments, ignoring things like -o/--options. While
this is good practice, we do not enforce this style as it changes the
code’s semantics and there may be legitimate reasons to leave arguments
unquoted.
Throughput
Let’s be honest: If you have so much Bash to format that throughput
becomes meaningful, then formatting is probably the least of your
worries. That being said, it is the one metric that we can actually
quantify.
Our first problem is that we need a large corpus of normal scripts. By
“normal,” I mean things that you’d see in the wild and could conceivably
understand if you squint hard enough. This rules out the Bash test
suite, for example, which — while quite large — is a grimoire of weird
edge cases that neither Topiary nor shfmt handle well. Quite frankly,
if you’re writing Bash that looks like this, then you don’t deserve
formatting:
:$(case a in a):;#esac ;;esac)
Digging around on r/bash, I came across this
repository of scripts. They’re all fairly short, but
they’re quite sane. This will do.
We need to slam large amounts of Bash into the immovable objects that
are our formatters; a “Bash test dummy,”5 if
you will. It would be ideal if we could stream Bash into our formatters
— so we could orchestrate sampling at regular time intervals —
however, neither Topiary nor shfmt support streaming formatting. This
stands to reason as there are cases where formatting will depend on some
future context, so the whole input will need to be read upfront. As
such, we need to invert our approach to collecting metrics and sample
over input size instead.
The general method is:
Locate the scripts in the repository that are Bash, by looking at
their shebang.
Filter this list to those which Topiary can handle without tripping
over itself because of some obscure parsing failure. (We assume
shfmt doesn’t require such a concession.)
Perform N trials, in which:
The whitelist of scripts is randomised, to remove any potential
confounding from caching.
The top M scripts are concatenated to obtain a single trial
input.6 This is to increase the input size to
the formatters in each trial, which is presumed to be the dependent
variable, but may be subject to confounding effects when the input
is small.
The trial input is read to /dev/null a handful of times to warm
up the filesystem cache.
The trial input is fed into the following, with benchmarks — trial
input size (bytes) and runtime (nanoseconds) — recorded for each:
cat, which acts as a control;
Topiary (v0.5.1; release build, with the query changes described
in this blog post);
Topiary, with its idempotence checking disabled;
shfmt (v3.10.0).
This identified 156 Bash scripts within the test repository; of which,
154 of them could be handled by Topiary.7 On an 11th
generation Intel Core i7, at normal stepping, with N=50 and M=25, on
a Tuesday afternoon, I obtained the following results:
cat, which does nothing, is unsurprisingly way out in front; by two
orders of magnitude. This is not interesting, but establishes that input
can be read faster than it can be formatted. That is, our little
experiment is not accidentally I/O bound.
What is interesting is that Topiary is about 3× faster than
shfmt. We also see that the penalty imposed by idempotency checking —
which formats twice, to check the output reaches a fixed point — is
quite negligible. This indicates that most of the work Topiary is doing
is in its startup overhead, which involves loading the grammar and
parsing the formatting query file.
Since Topiary only has to do this once per trial, it’s a little unfair
to set M=25; that is, an artificially enlarged input that is
syntactically valid but semantically meaningless. However, if we set
M=1 (i.e., individual scripts), then we see a similar comparison:
For small inputs, the idempotency check penalty is barely perceptible.
Otherwise, the startup overhead dominates for both formatters — hence
the much lower throughput values — but, still, Topiary comfortably
outperforms shfmt by a similar factor.
And the winner is…
In an attempt to regain some professional integrity, I’ll fess up to the
fact that Topiary has a bit of a home advantage and maybe — just
maybe — I’m ever so slightly biased. That is, as we are in the
(dubious) position of building a plane while attempting to fly it, I was
able to tweak and fix a few of our formatting rules to improve Topiary’s
Bash support during the writing of this blog post:
I added formatting rules for arrays (and associative arrays) and
their elements.
I corrected the formatting of trailing comments that appear at the
end of a script.
I corrected the function signature rewriting rule.
I corrected the formatting of a string of commands that are
interposed by Bash’s & asynchronous operator.
I fixed the formatting of test commands and added a rewrite rule for
POSIX-style [ ... ] tests.
I updated the $... variable rewrite rule to avoid targeting
special forms like $0, $? and $@, etc.
I implemented a rewrite rule that converts legacy $[ ... ]
arithmetic expansions into their $(( ... )) form.
I implemented a rewrite rule that converts `...` command
substitutions into their $(...) form.
I fixed the spacing within variable declarations, to accommodate
arguments and expansions.
I forced additional spacing in command substitutions containing
subshells, to remove any ambiguity with arithmetic expansions.
The point I’m making here is that these adjustments were very easy to
conjure up; just a few minutes of thought for each, across our
Tree-sitter queries, was required.
So who’s the winner?
Well, would it be terribly anticlimactic of me, after all that, not to
call it? shfmt is certainly more resilient to Bash-weirdness and, of
the “big three� I discussed, its line continuation handling is a must
have. However, Topiary does pretty well, regardless: It’s much faster,
for what that’s worth, and — more to the point — far easier to tweak
and hack on.
Indeed, when the Topiary team first embarked upon this path, we weren’t
even sure whether it would be possible to format Bash. Now that the
Tree-sitter Bash grammar has matured, Topiary — perhaps with future
fixes to address some of its shortcomings, uncovered by this blog post
— is a contender in the Bash ecosystem.
Thanks to Nicolas Bacquey, Yann Hamdaoui, Tor Hovland, Torsten Schmits
and Arnaud Spiwack for their reviews and input on this post, and to
Florent Chevrou for his assistance with the side-by-side code styling.
Recently I looked again at PHOAS, and once again I concluded it's nice for library APIs, but so painful to do anything with inside those libraries. So let's convert to something else, like de Bruijn.
There are standalone source files if you just want to see the code:
There is always a way to cheat, though. You can turn the PHOAS ->
untyped de Bruijn machinery into the PHOAS -> typed de Bruijn
machinery by checking that future contexts indeed extend past contexts
and throwing an error otherwise (which can't happen, because future
contexts always extend past contexts, but it's a metatheorem).
In "Generic Conversions of Abstract Syntax Representation" by Steven Keuchel and Johan Jeuring, authors also "cheat" a bit. The "Parametrhic higher-order abstract syntax" section ends with a somewhat disappointing
  where postulate whatever : _
Keuchel and Jeuring also mention "Unembedding Domain-Specific Languages" by Robert Atkey, Sam Lindley and Jeremy Yallop; where there is one unsatisfactory ⊥ (undefined in Haskell) hiding.
I think that for practical developments (say a library in Haskell) it is ok to take a small shortcut; but I kept wondering whether there is a way to make the conversion without cheating.
Well... it turns out that we cannot avoid "cheating". Well-formedness of the PHOAS representation depends on parametricity, and the conversion challenge seems to require a theorem for which there is no proof in Agda.
In unpublished (?) work Adam Chlipala shows a way to do the conversion without relying on postulates, http://adam.chlipala.net/cpdt/html/Intensional.html; but that procedure requires an extra well-formedness proof of the given PHOAS term.
This Agda development is a translation of that development.
Common setup
Our syntax representations will be well-typed, so we need types:
-- Types
data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

Ctx : Set
Ctx = List Ty

variable
  A B C : Ty
  Γ Δ Ω : Ctx
  v : Ty → Set
de Bruijn syntax
Var : Ctx → Ty → Set
Var Γ A = Idx A Γ  -- from agda-np, essentially membership relation.

data DB (Γ : Ctx) : Ty → Set where
  var : Var Γ A → DB Γ A
  app : DB Γ (fun A B) → DB Γ A → DB Γ B
  lam : DB (A ∷ Γ) B → DB Γ (fun A B)
  abs : DB Γ emp → DB Γ A
Parametric Higher-order abstract syntax
data PHOAS (v : Ty → Set) : Ty → Set where
  var : v A → PHOAS v A
  app : PHOAS v (fun A B) → PHOAS v A → PHOAS v B
  lam : (v A → PHOAS v B) → PHOAS v (fun A B)
  abs : PHOAS v emp → PHOAS v A

-- closed "true" PHOAS terms.
PHOAS° : Ty → Set₁
PHOAS° A = ∀ {v} → PHOAS v A
de Bruijn to PHOAS
This direction is trivial: anecdotal evidence that the de Bruijn representation is easier to do transformations on.
phoasify : NP v Γ → DB Γ A → PHOAS v A
phoasify γ (var x)   = var (lookup γ x)
phoasify γ (app f t) = app (phoasify γ f) (phoasify γ t)
phoasify γ (lam t)   = lam λ x → phoasify (x ∷ γ) t
phoasify γ (abs t)   = abs (phoasify γ t)
Interlude: Well-formedness of PHOAS terms
Adam Chlipala defines an equivalence relation between two PHOAS terms (exp_equiv in Intensional, wf in the CPDT book). We only need single-term well-formedness, so we can do a little less.
Terms like invalid cannot be values of PHOAS°, as all values of "v" inside PHOAS° have to originate from lam-constructor abstractions. We really should keep the v parameter free, i.e. parametric, when constructing PHOAS terms.
The idea is then simply to track which variables (values of v) are introduced by lambda abstraction.
data phoasWf {v : Ty → Set} (G : List (Σ Ty v)) : {A : Ty} → PHOAS v A → Set where
  varWf : ∀ {A} {x : v A}
    → Idx (A , x) G
    → phoasWf G (var x)
  appWf : ∀ {A B} {f : PHOAS v (fun A B)} {t : PHOAS v A}
    → phoasWf G f
    → phoasWf G t
    → phoasWf G (app f t)
  lamWf : ∀ {A B} {f : v A → PHOAS v B}
    → (∀ (x : v A) → phoasWf ((A , x) ∷ G) (f x))
    → phoasWf G (lam f)
  absWf : ∀ {A} {t : PHOAS v emp}
    → phoasWf G t
    → phoasWf G (abs {A = A} t)

-- closed terms start with an empty G
phoasWf° : PHOAS° A → Set₁
phoasWf° tm = ∀ {v} → phoasWf {v = v} [] tm
A meta-theorem is then that all PHOAS° terms are well-formed, i.e.
meta-theorem-proposition : Set₁
meta-theorem-proposition = ∀ {A} (t : PHOAS° A) → phoasWf° t
As far as I'm aware this proposition cannot be proved nor refuted in Agda.
de Bruijn to PHOAS translation creates well-formed PHOAS terms.
As a small exercise we can show that phoasify of closed de Bruijn terms creates well-formed PHOAS terms.
toList : NP v Γ → List (Σ Ty v)
toList []       = []
toList (x ∷ xs) = (_ , x) ∷ toList xs

phoasifyWfVar : (γ : NP v Γ) (x : Var Γ A) → Idx (A , lookup γ x) (toList γ)
phoasifyWfVar (x ∷ γ) zero    = zero
phoasifyWfVar (x ∷ γ) (suc i) = suc (phoasifyWfVar γ i)

phoasifyWf : (γ : NP v Γ) (t : DB Γ A) → phoasWf (toList γ) (phoasify γ t)
phoasifyWf γ (var x)   = varWf (phoasifyWfVar γ x)
phoasifyWf γ (app f t) = appWf (phoasifyWf γ f) (phoasifyWf γ t)
phoasifyWf γ (lam t)   = lamWf λ x → phoasifyWf (x ∷ γ) t
phoasifyWf γ (abs t)   = absWf (phoasifyWf γ t)

phoasifyWf° : (t : DB [] A) → phoasWf° (phoasify [] t)
phoasifyWf° t = phoasifyWf [] t
PHOAS to de Bruijn
The rest deals with the opposite direction.
In Intensional, Adam Chlipala uses the instantiation v = λ _ → ℕ to make the translation.
I think that in the typed setting using v = λ _ → Ctx turns out nicer.
The idea in both is that we instantiate PHOAS variables to be de Bruijn levels.
data IsSuffixOf {ℓ} {a : Set ℓ} : List a → List a → Set ℓ where
  refl : ∀ {xs} → IsSuffixOf xs xs
  cons : ∀ {xs ys} → IsSuffixOf xs ys → ∀ {y} → IsSuffixOf xs (y ∷ ys)
We need to establish the well-formedness of a PHOAS expression in relation to some context Γ.
Note that variables encode de Bruijn levels, thus the contexts we "remember" in variables should be a suffix of the outside context.
wf : (Γ : Ctx) → PHOAS (λ _ → Ctx) A → Set
wf {A = A} Γ (var Δ) = IsSuffixOf (A ∷ Δ) Γ
wf Γ (app f t)       = wf Γ f × wf Γ t
wf Γ (lam {A = A} t) = wf (A ∷ Γ) (t Γ)
wf Γ (abs t)         = wf Γ t
And if (A ∷ Δ) is suffix of context Γ, we can convert the evidence to the de Bruijn index (i.e. variable):
makeVar : IsSuffixOf (A ∷ Δ) Γ → Var Γ A
makeVar refl     = zero
makeVar (cons s) = suc (makeVar s)
Given the term is well-formed in relation to context Γ we can convert it to de Bruijn representation.
dbify : (t : PHOAS (λ _ → Ctx) A) → wf Γ t → DB Γ A
dbify (var x) wf          = var (makeVar wf)
dbify (app f t) (fʷ , tʷ) = app (dbify f fʷ) (dbify t tʷ)
dbify {Γ = Γ} (lam t) wf  = lam (dbify (t Γ) wf)
dbify (abs t) wf          = abs (dbify t wf)
What is left is to show that we can construct wf for all phoasWf-well-formed terms.
Adam Chlipala defines a helper function:
makeG′ : Ctx → List (Σ Ty (λ _ → Ctx))
makeG′ []      = []
makeG′ (A ∷ Γ) = (A , Γ) ∷ makeG′ Γ
However for somewhat technical reasons, we rather define
Today, 2025-02-12, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 39th episode of the Haskell Unfolder live on YouTube.
In this episode we’ll discuss the four different ways GHC offers for deriving class instance definitions: the classic “stock” deriving, generalised “newtype” deriving, as well as the “anyclass” and “via” strategies. For each of these, we’ll explain the underlying ideas, use cases, and limitations.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
Normalization by evaluation using parametric higher order syntax. In Agda.
I couldn't find a self-contained example of PHOAS NbE, so here it is. I hope someone might find it useful.
module NbEXP.PHOAS where

data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

data Tm (v : Ty → Set) : Ty → Set where
  var : ∀ {a} → v a → Tm v a
  app : ∀ {a b} → Tm v (fun a b) → Tm v a → Tm v b
  lam : ∀ {a b} → (v a → Tm v b) → Tm v (fun a b)

data Nf (v : Ty → Set) : Ty → Set
data Ne (v : Ty → Set) : Ty → Set

data Ne v where
  nvar : ∀ {a} → v a → Ne v a
  napp : ∀ {a b} → Ne v (fun a b) → Nf v a → Ne v b

data Nf v where
  neut : Ne v emp → Nf v emp
  nlam : ∀ {a b} → (v a → Nf v b) → Nf v (fun a b)

Sem : (Ty → Set) → Ty → Set
Sem v emp       = Ne v emp
Sem v (fun a b) = Sem v a → Sem v b

lower : ∀ {v : Ty → Set} (a : Ty) → Sem v a → Nf v a
raise : ∀ {v : Ty → Set} (a : Ty) → Ne v a → Sem v a

lower emp       s = neut s
lower (fun a b) s = nlam λ x → lower b (s (raise a (nvar x)))

raise emp       n = n
raise (fun a b) n x = raise b (napp n (lower a x))

eval : {v : Ty → Set} {a : Ty} → Tm (Sem v) a → Sem v a
eval (var x)   = x
eval (app f t) = eval f (eval t)
eval (lam t) x = eval (t x)

nf : {a : Ty} → {v : Ty → Set} → Tm (Sem v) a → Nf v a
nf {a} t = lower a (eval t)

nf_parametric : {a : Ty} → ({v : Ty → Set} → Tm v a) -> ({v : Ty → Set} → Nf v a)
nf_parametric t = nf t
This last month has been fascinating. I guess LLMs have finally
resonated with me on a deeper level. It wasn’t like I woke up and
suddenly everything was different, but their impact is growing on me
non-linearly, forcing me to rewire my brain.
I've been fortunate to be nominated for a few teaching awards over my career, and even to win a couple. The nomination I just received may be the best.
As a new student at the uni, Philip Wadler was the first introductory lecture I had, and his clear passion for the subject made me feel excited to begin my journey in computer science. In particular he emphasised the importance of asking questions, which made the idea of tutorials and lectures a lot less intimidating, and went on to give really valuable advice for starting university. I enjoyed this session so much, and so was looking forward to the guest lectures he was going to do for Inf1A at the end of semester 1. They certainly did not disappoint, the content he covered was engaging, interesting, and above all very entertaining to listen to, especially when he dressed up as a superhero to cement his point. Because I found these talks so rewarding, I also attended the STMU that he spoke at about AI and ChatGPT, and everyone I talked to after the event said they had a really good time whilst also having a completely new insightful perspective on the topic. In summary, Philip Wadler has delivered the best lectures I have attended since starting university, and I have gotten a lot out of them.
President Trump has started rolling out his tariffs, something I blogged about in November. People are talking about these tariffs a lot right now, with many people (correctly) commenting on how consumers will end up with higher prices as a result of these tariffs. While that part is true, I’ve seen a lot of people taking it to the next, incorrect step: that consumers will pay the entirety of the tax. I put up a poll on X to see what people thought, and while the right answer got a lot of votes, it wasn't the winner.
Checking on people's general view of taxes. When the government imposes a tax on trade (sales tax, VAT, tariff, or even payroll tax), which party absorbs the cost of the tax?
For purposes of this blog post, our ultimate question will be the following:
Suppose apples currently sell for $1 each in the entire United States.
There are domestic sellers and foreign sellers of apples, all receiving the same price.
There are no taxes or tariffs on the purchase of apples.
The question is: if the US federal government puts a $0.50 import tariff per apple, what will be the change in the following:
Number of apples bought in the US
Price paid by buyers for apples in the US
Post-tax price received by domestic apple producers
Post-tax price received by foreign apple producers
Before we can answer that question, we need to ask an easier, first question: before instituting the tariff, why do apples cost $1?
And finally, before we dive into the details, let me provide you with the answers to the ultimate question. I recommend you try to guess these answers before reading this, and if you get it wrong, try to understand why:
The number of apples bought will go down
The buyers will pay more for each apple they buy, but not the full amount of the tariff
Domestic apple sellers will receive a higher price per apple
Foreign apple sellers will receive a lower price per apple, but not lowered by the full amount of the tariff
In other words, regardless of who sends the payment to the government, both taxed parties (domestic buyers and foreign sellers) will absorb some of the costs of the tariff, while domestic sellers will benefit from the protectionism provided by tariffs and be able to sell at a higher price per unit.
Let’s say I absolutely love apples, they’re my favorite food. How much would I be willing to pay for a single apple? You might say “$1, that’s the price in the supermarket,” and in many ways you’d be right. If I walk into supermarket A, see apples on sale for $50, and know that I can buy them at supermarket B for $1, I’ll almost certainly leave A and go buy at B.
But that’s not what I mean. What I mean is: how high would the price of apples have to go everywhere so that I’d no longer be willing to buy a single apple? This is a purely personal, subjective opinion. It’s impacted by how much money I have available, other expenses I need to cover, and how much I like apples. But let’s say the number is $5.
How much would I be willing to pay for another apple? Maybe another $5. But how much am I willing to pay for the 1,000th apple? 10,000th? At some point, I’ll get sick of apples, or run out of space to keep the apples, or not be able to eat, cook, and otherwise preserve all those apples before they rot.
The point being: I’ll be progressively willing to spend less and less money for each apple. This form of analysis is called marginal benefit: how much benefit (expressed as dollars I’m willing to spend) will I receive from each apple? This is a downward sloping function: for each additional apple I buy (quantity demanded), the price I’m willing to pay goes down. This is what gives my personal demand curve. And if we aggregate demand curves across all market participants (meaning: everyone interested in buying apples), we end up with something like this:
Assuming no changes in people’s behavior and other conditions in the market, this chart tells us how many apples will be purchased by our buyers at each price point between $0.50 and $5. And ceteris paribus (all else being equal), this will continue to be the demand curve for apples.
Marginal cost
Demand is half the story of economics. The other half is supply, or: how many apples will I sell at each price point? Supply curves are upward sloping: the higher the price, the more a person or company is willing and able to sell a product.
Let’s understand why. Suppose I have an apple orchard. It’s a large property right next to my house. With about 2 minutes of effort, I can walk out of my house, find the nearest tree, pick 5 apples off the tree, and call it a day. 5 apples for 2 minutes of effort is pretty good, right?
Yes, there was all the effort necessary to buy the land, and plant the trees, and water them… and a bunch more that I likely can’t even guess at. We’re going to ignore all of that for our analysis, because for short-term supply-and-demand movement, we can ignore these kinds of sunk costs. One other simplification: in reality, supply curves often start descending before ascending. This accounts for achieving efficiencies of scale after the first several units produced. But since both these topics are unneeded for understanding taxes, I won’t go any further.
Anyway, back to my apple orchard. If someone offers me $0.50 per apple, I can do 2 minutes of effort and get $2.50 in revenue, which equates to a $75/hour wage for me. I’m more than happy to pick apples at that price!
However, let’s say someone comes to buy 10,000 apples from me instead. I no longer just walk out to my nearest tree. I’m going to need to get in my truck, drive around, spend the day in the sun, pay for gas, take a day off of my day job (let’s say it pays me $70/hour). The costs go up significantly. Let’s say it takes 5 days to harvest all those apples myself, it costs me $100 in fuel and other expenses, and I lose out on my $70/hour job for 5 days. We end up with:
Total expenditure: $100 + $70 * 8 hours a day * 5 days == $2900
Total revenue: $5000 (10,000 apples at $0.50 each)
Total profit: $2100
So I’m still willing to sell the apples at this price, but it’s not as attractive as before. And as the number of apples purchased goes up, my costs keep increasing. I’ll need to spend more money on fuel to travel more of my property. At some point I won’t be able to do the work myself anymore, so I’ll need to pay others to work on the farm, and they’ll be slower at picking apples than me (less familiar with the property, less direct motivation, etc.). The point being: at some point, the number of apples can go high enough that the $0.50 price point no longer makes me any money.
This kind of analysis is called marginal cost. It refers to the additional amount of expenditure a seller has to spend in order to produce each additional unit of the good. Marginal costs go up as quantity sold goes up. And like demand curves, if you aggregate this data across all sellers, you get a supply curve like this:
Equilibrium price
We now know, for every price point, how many apples buyers will purchase, and how many apples sellers will sell. Now we find the equilibrium: where the supply and demand curves meet. This point represents where the marginal benefit the next buyer would receive becomes less than the cost it would take the next seller to produce the next apple. Let’s see it in a chart:
You’ll notice that these two graphs cross at the $1 price point, where 63 apples are both demanded (bought by consumers) and supplied (sold by producers). This is our equilibrium price. We also have a visualization of the surplus created by these trades. Everything to the left of the equilibrium point and between the supply and demand curves represents surplus: an area where someone is receiving something of more value than they give. For example:
When I bought my first apple for $1, but I was willing to spend $5, I made $4 of consumer surplus. The consumer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and above the equilibrium price point.
When a seller sells his first apple for $1, but it only cost $0.50 to produce it, the seller made $0.50 of producer surplus. The producer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and below the equilibrium price point.
Another way of thinking of surplus is “every time someone got a better price than they would have been willing to take.”
OK, with this in place, we now have enough information to figure out how to price in the tariff, which we’ll treat as a negative externality.
Modeling taxes
Alright, the government has now instituted a $0.50 tariff on every apple sold within the US by a foreign producer. We can generally model taxes by either increasing the marginal cost of each unit sold (shifting the supply curve up), or by decreasing the marginal benefit of each unit bought (shifting the demand curve down). In this case, since only some of the producers will pay the tax, it makes more sense to modify the supply curve.
First, let’s see what happens to the foreign seller-only supply curve when you add in the tariff:
With the tariff in place, for each quantity level, the price at which the seller will sell is $0.50 higher than before the tariff. That makes sense: if I was previously willing to sell my 82nd apple for $3, I would now need to charge $3.50 for that apple to cover the cost of the tariff. We see this as the tariff “pushing up” or “pushing left” the original supply curve.
We can add this new supply curve to our existing (unchanged) supply curve for domestic-only sellers, and we end up with a result like this:
The total supply curve adds up the individual foreign and domestic supply curves. At each price point, we add up the total quantity each group would be willing to sell to determine the total quantity supplied for each price point. Once we have that cumulative supply curve defined, we can produce an updated supply-and-demand chart including the tariff:
As we can see, the equilibrium has shifted:
The equilibrium price paid by consumers has risen from $1 to $1.20.
The total number of apples purchased has dropped from 63 apples to 60 apples.
Consumers therefore received 3 fewer apples. They spent $72 for these 60 apples, whereas previously they spent $63 for 63 apples, a definite decrease in consumer surplus.
Foreign producers sold 36 of those apples (see the raw data in the linked Google Sheet), for a gross revenue of $43.20. However, they also need to pay the tariff to the US government, which accounts for $18, meaning they only receive $25.20 post-tariff. Previously, they sold 42 apples at $1 each with no tariff to be paid, meaning they took home $42.
Domestic producers sold the remaining 24 apples at $1.20, giving them a revenue of $28.80. Since they don’t pay the tariff, they take home all of that money. By contrast, previously, they sold 21 apples at $1, for a take-home of $21.
The government receives $0.50 for each of the 36 imported apples sold, or in other words receives $18 in revenue it wouldn’t have received otherwise.
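As a quick sanity check on the arithmetic above, here is a small script that recomputes the post-tariff outcomes from the numbers in the example (60 apples at $1.20, of which 36 are foreign and 24 domestic, with a $0.50 tariff on imported apples); the printed values match the figures above up to floating-point rounding:

-- Recomputing the example's post-tariff outcomes
main :: IO ()
main = do
  let price    = 1.20 :: Double -- new equilibrium price per apple
      tariff   = 0.50           -- tariff per imported apple
      imported = 36             -- apples sold by foreign producers
      domestic = 24             -- apples sold by domestic producers
  putStrLn $ "Consumers spend:    " ++ show (price * (imported + domestic)) -- about $72.00
  putStrLn $ "Foreign take-home:  " ++ show (imported * (price - tariff))   -- about $25.20
  putStrLn $ "Domestic take-home: " ++ show (domestic * price)              -- about $28.80
  putStrLn $ "Government revenue: " ++ show (tariff * imported)             -- about $18.00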
We could be more specific about the surpluses, and calculate the actual areas for consumer surplus, producer surplus, inefficiency from the tariff, and government revenue from the tariff. But I won’t bother, as those calculations get slightly more involved. Instead, let’s just look at the aggregate outcomes:
Consumers were unquestionably hurt. Their price paid went up by $0.20 per apple, and received less apples.
Foreign producers were also hurt. Their price received went down from the original $1 to the new post-tariff price of $1.20, minus the $0.50 tariff. In other words: foreign producers only receive $0.70 per apple now. This hurt can be mitigated by shifting sales to other countries without a tariff, but the pain will exist regardless.
Domestic producers scored. They sell more apples at a higher price, and make more revenue doing it.
And the government walked away with an extra $18.
Hopefully you now see the answer to the original questions. Importantly, while the government imposed a $0.50 tariff, neither side fully absorbed that cost. Consumers paid a bit more, foreign producers received a bit less. The exact details of how that tariff was split across the groups is mediated by the relevant supply and demand curves of each group. If you want to learn more about this, the relevant search term is “price elasticity,” or how much a group’s quantity supplied or demanded will change based on changes in the price.
Other taxes
Most taxes are some kind of a tax on trade. Tariffs on apples is an obvious one. But the same applies to income tax (taxing the worker for the trade of labor for money) or payroll tax (same thing, just taxing the employer instead). Interestingly, you can use the same model for analyzing things like tax incentives. For example, if the government decided to subsidize domestic apple production by giving the domestic producers a $0.50 bonus for each apple they sell, we would end up with a similar kind of analysis, except instead of the foreign supply curve shifting up, we’d see the domestic supply curve shifting down.
And generally speaking, this is what you’ll always see with government involvement in the economy. It disrupts an existing equilibrium, lets the market readjust to a new equilibrium, and incentivizes some behavior, causing some people to benefit and others to lose out. With the apple tariff, we saw that domestic producers and the government benefited while others lost.
You can see the reverse though with tax incentives. If I give a tax incentive of providing a deduction (not paying income tax) for preschool, we would end up with:
Government needs to make up the difference in tax revenue, either by raising taxes on others or printing more money (leading to inflation). Either way, those paying the tax or those holding government debased currency will pay a price.
Those people who don’t use the preschool deduction will receive no benefit, so they simply pay a cost.
Those who do use the preschool deduction will end up paying less on tax+preschool than they would have otherwise.
This analysis is fully amoral. It’s not saying whether providing subsidized preschool is a good thing or not, it simply tells you where the costs will be felt, and points out that such government interference in free economic choice does result in inefficiencies in the system. Once you have that knowledge, you’re more well educated on making a decision about whether the costs of government intervention are worth the benefits.
For many years I wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It’s winter right now, but when it’s summer again this will become relevant again. This weekend I got closer to that goal.
TL;DR: Using code-server on a beefy machine seems to be quite neat.
Passively lit coding
Personal history
Looking back at my own old blog entries I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used it. I subsequently noticed that the eBook reader is rather useful for reading eBooks, and it has been in heavy use for that ever since.
Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android, and it had the very promising feature of an HDMI input. So hopefully I could attach it to my laptop and it would just work™. Turns out that this never worked as well as I hoped: even if I set the resolution to exactly the tablet’s screen’s resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful for taking notes, and it has been in sporadic use for that.
Going off on a tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don’t have to use Boox’s “monitor” app but could use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn’t convenient enough to be used regularly.
I also played around with pure terminal approaches, e.g. SSH’ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.
My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it’s not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it’s a bit spooky to run these workloads without the laptop’s fan spinning up.
In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficient on my eInk tablet. Can I replicate this setup there?
VSCode itself doesn’t run on Android directly. There are projects that run Linux in a chroot or in termux on the Android system, and then you can use VNC to connect to it (e.g. on Andronix)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet’s system.
code-server, running remotely
A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work with it reasonably well.
Access
With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the Android system, and I liked the idea of being able to use any device to work in my environment.
I also decided against the more involved “reverse proxy behind proper hostname with SSL” setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.
That left me with the option of using code-server’s built-in support for self-signed certificates and a password:
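Concretely, that amounts to a handful of settings in code-server’s config file, roughly like the following sketch (the values here are placeholders, not the actual setup):

# ~/.config/code-server/config.yaml (sketch)
bind-addr: 0.0.0.0:8080
auth: password
password: some-long-fixed-password
cert: true        # have code-server generate and use a self-signed certificate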
With trust-on-first-use this seems reasonably secure.
Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot “install” the page (as a Progressive Web App) unless it has a valid certificate. But since I don’t have superuser access to that machine, I can’t just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using acme.sh) and copied the files over, so the configuration now reads as follows:
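A systemd user unit along these lines does the job (the paths and file names below are placeholders, not the exact values used):

# ~/.config/systemd/user/code-server.service (sketch)
[Unit]
Description=code-server

[Service]
Environment=PATH=%h/.nix-profile/bin:/usr/bin:/bin
ExecStart=%h/.nix-profile/bin/code-server \
    --bind-addr 0.0.0.0:8080 \
    --cert %h/.config/code-server/host.example.com.crt \
    --cert-key %h/.config/code-server/host.example.com.key
Restart=on-failure

[Install]
WantedBy=default.target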
(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.)
For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.
Git credentials
The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved.
I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on Github always have full access. It wouldn’t be horrible, but I still wondered if I can do better.
I thought of creating fine-grained personal access tokens that only allow me to push code to specific repositories, and nothing else, and just storing them permanently on the remote server. That would still be a neat and convenient option, but creating PATs for our org requires approval and I didn’t want to bother anyone on the weekend.
So I am experimenting with Github’s git-credential-manager now. I have configured it to use git’s credential cache with an elevated timeout, so that once I log in, I don’t have to log in again for a whole workday.
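The configuration boils down to a couple of git settings, something like the following sketch (the timeout value is a placeholder; check the git-credential-manager documentation for the exact option names and values):

# sketch: let git-credential-manager store credentials in git's credential cache,
# and keep them for 8 hours
git config --global credential.credentialStore cache
git config --global credential.cacheOptions "--timeout 28800"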
To log in, I have to visit https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…
This still grants rather broad permissions to the code-server, but at least only temporarily.
Android setup
On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!
I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It’s not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it’s a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn’t on Open-VSX.
For some reason the F11 key doesn’t work, but going fullscreen is crucial, because screen estate is scarce in this setup. I can go fullscreen using VSCode’s command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of the fullscreen mode, which is annoying. I still have to pay attention to when that’s happening; maybe it’s the Esc key, which I am of course using a lot due to me using vim bindings.
A more annoying problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: the Boox has two virtual keyboards installed: the usual Google AOSP keyboard, and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn’t. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which allowed me to see more Android settings, and in particular allowed me to disable the Onyx keyboard. So this is fixed.
I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn’t use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions.
Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn’t work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don’t do that I get the full screen experience. We’ll see.
I did not work enough with this yet to assess how much the smaller screen estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean’s InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically.
I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I’d need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don’t have the Trackpoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?
Conclusion
After this initial setup chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.
A few bits could be better. In particular logging in and authenticating GitHub access could be both more convenient and more safe – I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?
Over the last year, Well-Typed have carried out significant work in Cabal,
Haskell’s build system, thanks to funding from the Sovereign Tech
Fund. Our main goal was to re-think the Cabal
architecture for building packages. This was historically tied to the Setup
command-line interface, with each package technically capable of providing its
own independent build system via the Custom build-type. In practice, the full
generality of this interface is not useful, and it obstructs the development of
new features and creates a drag on maintenance, so there has long been an
appetite to reimagine this interface within Cabal.1
With the release of Cabal-3.14.0.0
and cabal-install-3.14.1.1,
the new Hooks build-type we have developed, together with the Cabal-hooks
library, are now available to package authors. Over time, we hope to see packages that
depend on the Custom build-type gradually migrate to use Hooks instead.
The Cabal specification (2005)
was designed to allow Haskell tool authors to package their code and share it
with other developers.
The Haskell Package System (Cabal) has the following main goal:
to specify a standard way in which a Haskell tool can be packaged, so that it
is easy for consumers to use it, or re-package it, regardless of the Haskell
implementation or installation platform.
The Cabal concept of a package is a versioned unit of distribution in source
format, with enough metadata to allow it to be built and packaged by
downstream distributors (e.g. Linux distributions and other build tools).
A Cabal package consists of multiple components which map onto individual
Haskell units (e.g. a single library or executable).
The Cabal package model
Each package must bundle some metadata, specified in a .cabal file. Chiefly:
the package name and version number,
its dependencies, including version bounds (e.g. base >= 4.17 && < 4.21, lens ^>= 5.3),
what the package provides (libraries and their exposed modules, executables…),
how to build the package (e.g. build-type: Simple).
The Cabal library then implements everything required to build individual
packages, first parsing the .cabal file and then building and invoking the
Setup script of the package.
The Setup interface
The key component of the original Cabal specification is that each
package must provision an executable which is used to build it.
As written in an early draft:
To help users install packages and their dependencies, we propose a system
similar to Python’s Distutils, where each Haskell package is distributed with a
script which has a standard command-line interface.
More precisely, to comply with the Cabal specification, the build system of a
package need only implement the Setup command-line interface, i.e. provide a
Setup executable that supports invocations of the form ./Setup <cmd>:
configure: resolve compiler, tools and dependencies
build/haddock/repl: prepare sources and build/generate docs/open a session in the interpreter
test/bench: run testsuites or benchmarks
copy/install/register: move files into an image dir or final location/register libraries with the compiler
sdist: create an archive for distribution/packaging
clean: clean local files (local package store, local build artifacts, …)
In practice, the ./Setup configure command takes a large number of parameters
(as represented in the Cabal ConfigFlags datatype).
This configuration is preserved for subsequent invocations, which usually only
take a couple of parameters (e.g. ./Setup build -v2 --builddir=<dir>).
This interface can be used directly to build any package, by executing the
following recipe:
build and install the dependencies in dependency order;
configure the package by invoking ./Setup configure with the appropriate arguments;
build it with ./Setup build;
copy the files into place with ./Setup copy (or ./Setup install);
if the package contains a library, register it with the compiler using hc-pkg.
Usually, these steps will be executed by a build tool such as cabal-install,
which provides a more convenient user interface than invoking Setup commands
directly. Some systems (such as nixpkgs) do directly use this interface, however.
The tricky parts in the above are:
passing appropriate arguments to ./Setup configure,
in particular exactly specifying dependencies,2
and making sure the arguments are consistent with those
expected by the cabal-version of the package,3
constructing the correct environment for invoking ./Setup, e.g. adding
appropriate build-tool-depends executables in PATH and defining the
corresponding <buildTool>_datadir environment variables.
Library registration
In the above recipe to build packages, there was a single step which wasn’t an
invocation of the Setup script: a call to hc-pkg. To quote from the original
Cabal specification:
Each Haskell compiler hc must provide an associated package-management
program hc-pkg. A compiler user installs a package by placing the package’s
supporting files somewhere, and then using hc-pkg to make the compiler aware
of the new package. This step is called registering the package with the compiler.
To register a package, hc-pkg takes as input an installed package description (IPD),
which describes the installed form of the package in detail.
This is the key interchange mechanism between Cabal and the Haskell compiler.
The installed package description format is laid out in the Cabal specification;
in brief, it contains all the information the Haskell compiler needs to use
a library, such as its exposed modules, its dependencies, and its installation
path. This information can be seen by calling hc-pkg describe:
Note that, perhaps confusingly, the hc-pkg interface is not concerned with
Cabal’s notion of “packages”. Rather, it deals only in “units”; these generally
map to Cabal components, such as the package’s main library and its private and
public sublibraries. For example, the internal attoparsec-internal sublibrary
of the attoparsec package is registered separately:
Centering the package build process around the Setup script provides a great
deal of flexibility to package authors, as the Setup executable can be
implemented in any way the package author chooses. In this way, each package
brings its own build system.
However, in practice, this is more expressiveness than most library authors want
or need. Consequently, almost all packages use one of the following two build
systems:
build-type: Simple (most packages). For such packages, the Setup.hs file
is of the following form:
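-- the standard build-type: Simple Setup.hs
import Distribution.Simple
main = defaultMain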
build-type: Custom where the Setup.hs file uses the Cabal library to
perform most of the build, but brackets some of its logic with package-specific code using
the Cabal UserHooks mechanism,
e.g. so that it runs custom configuration code after
Cabal configure, or generates module sources
before running Cabal build.
For an example of case (2), the custom Setup.hs code for hooking into the
configure phase might look like the following:
main =
  defaultMainWithHooks
    simpleUserHooks
      { confHook = \ info cfgFlags -> do
          info' <- customPreConfHook info cfgFlags
          confHook simpleUserHooks info' cfgFlags
      }
In this example, simpleUserHooks means “no hooks” (or more accurately “exactly
the hooks that build-type: Simple uses”). So the above snippet shows how we
can include custom logic in customPreConfHook in order to update the Cabal
GenericPackageDescription, before calling the Cabal library configure
function (via confHook simpleUserHooks). Here, a GenericPackageDescription
is the representation of a .cabal file used by Cabal (the Generic part
means “before attempting to resolve any conditionals”).
The fact that Setup executables may (in principle) be arbitrary when using
build-type: Custom fundamentally limits what build tools such as
cabal-install or the Haskell Language Server can do in multi-package projects.
The tool has to treat the build system of each package as an opaque black box,
merely invoking functionality defined by the specific version of the Setup
interface supported by the package.
The main observation is that, in practice, custom Setup.hs scripts only insert
benign modifications to the build process: they still fundamentally rely on
the Cabal library to do the bulk of the work building the package.
A replacement for Custom setup scripts
The limitations of the Setup interface discussed above motivate the need for a new mechanism
to customise the build system of a package:
The bulk of the work should be carried out by the Cabal library,
which exposes functions such as configure
and build,
but these need to be augmented
with hooks so that individual packages can customise certain phases.
The hooks provided by this mechanism should be kept to a minimum
(to give more flexibility to build tools)
while still accommodating the needs of package authors in practice.
Customisation should be declared by a Haskell library interface (as opposed
to the black-box command-line interface of Setup.hs), in order
to enable as much introspection by build systems as possible.
This will enable a gradual restructuring of build tools such as cabal-install
away from the Setup command-line interface, which has grown unwieldy due to
the difficulty of evolving it to meet requirements that could not be foreseen
when it was created.
Building on this understanding, as well as a survey of existing use cases of
build-type: Custom, we have introduced an alternative mechanism for customizing
how a package is built: build-type: Hooks.
This mechanism does not allow arbitrary
replacement of the usual Cabal build logic, but rather merely exposes a set of
well-defined hooks which bracket a subset of Cabal’s existing build steps.
We arrived at this design through collaboration with Cabal developers, users,
and packagers as part of an RFC process in
Haskell Foundation Tech Proposal #60.
Introducing build-type: Hooks
The main documentation for usage of the hooks API is provided in the Haddocks
for the Cabal-hooks package.
The Cabal Hooks overlay contains patched packages
using build-type: Hooks.
It can be used as an overlay like head.hackage, for constructing build plans
without any build-type: Custom packages. It can also serve as a reference for
usage of the API.
At a high-level, a package with build-type: Hooks:
declares in its .cabal file:
a cabal-version of at least 3.14,
build-type: Hooks,
a custom-setup stanza with a dependency on Cabal-hooks (the latter is
a library bundled with Cabal that provides the API for writing hooks); a sketch of such a stanza is shown after this list;
contains a SetupHooks.hs Haskell module source file, next to the .cabal file,
which specifies the hooks the package uses.
This module exports a value setupHooks :: SetupHooks (in which the
SetupHooks type is exported by Distribution.Simple.SetupHooks from the
Cabal-hooks package).
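For reference, the custom-setup stanza mentioned above might look something like this (the version bounds are illustrative only):

custom-setup
  setup-depends:
    base,
    Cabal-hooks >= 3.14 && < 3.15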
The hooks themselves fall into four groups:
configure hooks allow customising how a package will be built;
pre-build rules allow generating source files to be built;
post-build hooks allow the package to customise the linking step;
install hooks allow the package to install additional files alongside
the usual binary artifacts.
In the remainder of this blog post, we will focus on the two most important
(and most commonly used) hooks: configure hooks and pre-build rules.
Configure hooks
The configure hooks allow package authors to make decisions about how to
build their package, by modifying the Cabal package description (which is Cabal’s
internal representation of the information in a .cabal file). Crucially,
these modifications will persist to all subsequent phases.
Configuration happens at two levels:
global configuration covers the entire package,
local configuration covers a single component.
There are three hooks into the configure phase:
Package-wide pre-configure. This can be used for custom logic in the style
of traditional ./configure scripts, e.g. finding out information about
the system and configuring dependencies, when those don’t easily fit into
Cabal’s framework.
Package-wide post-configure. This can be used to write custom package-wide
information to disk, to be consumed by (3).
Per-component pre-configure. This can be used to modify individual
components, e.g. adding exposed modules or specifying flags to be used when
building the component.
Per-package configuration
Suppose our package needs to use some external executable, e.g. a preprocessor.
If we require custom logic to find this external executable on the system, or
to parse its version number, we need to go beyond Cabal’s built-in support for
build-tool-depends.
We can do this in a pre-configure hook:
myConfigureHooks :: ConfigureHooks
myConfigureHooks =
  noConfigureHooks
    { preConfigurePackageHook = Just configureCustomPreProc }

configureCustomPreProc :: PreConfPackageInputs -> IO PreConfPackageOutputs
configureCustomPreProc pcpi@( PreConfPackageInputs { configFlags = cfg, localBuildConfig = lbc } ) = do
  let verbosity = fromFlag $ configVerbosity cfg
      progDb    = withPrograms lbc
  configuredPreProcProg <- configureUnconfiguredProgram verbosity customPreProcProg progDb
  return $
    ( noPreConfPackageOutputs pcpi )
      { extraConfiguredProgs =
          Map.fromList [ ( customPreProcName, configuredPreProcProg ) ]
      }

customPreProcName :: String
customPreProcName = "customPreProc"

customPreProcProg :: Program
customPreProcProg =
  ( simpleProgram customPreProcName )
    { programFindLocation =
        -- custom logic to find the installed location of myPreProc
        -- on the system used to build the package
        myPreProcProgFindLocation
    , programFindVersion =
        -- custom logic to find the program version
        myPreProcProgFindVersion
    }
Cabal will then add this program to its program database, allowing the program
to be used to satisfy build-tool-depends requirements, as well as making it
available in subsequent hooks (e.g. pre-build hooks).
Modifying individual components
Suppose we want to modify a component of a Cabal package, e.g. inserting
configuration options determined by inspecting the system used to build the
package (e.g. availability of certain processor capabilities). We can do this
using hooks into the configure phase. For illustration, consider the following
example, which includes:
a package-wide post-configure hook, which inspects the system to determine
availability of AVX2 CPU features, and writes it out to a "system-info" file,
a per-component pre-configure hook which reads the "system-info" file,
and uses that to pass appropriate compiler options (e.g. -mavx2) when
compiling each component.
Pre-build rules
Pre-build rules can be used to generate Haskell source files which can then be
built as part of the compilation of a unit. Since we want to ensure that such
generated modules don’t break recompilation avoidance (thereby crippling HLS and
other interactive tools), these hooks comprise a simple build system. They
are described in the Haddock documentation for Cabal-hooks.
The overall structure is that one specifies a collection of Rules
inside the monadic API in the RulesM monad.
Each individual rule contains a Command,
consisting of a statically specified action to run (e.g. a preprocessor such as
alex, happy or c2hs) bundled with (possibly dynamic) arguments (such as
the input and output filepaths). In the Hooks API, these are constructed using
the mkCommand function.
The actions are referenced using static pointers;
this allows the static pointer table of the SetupHooks module to be used as a
dispatch table for all the custom preprocessors provided by the hooks.
One registers rules using staticRule,
declaring the inputs and outputs of each rule. In this way, we can think of each
rule as corresponding to an individual invocation of a custom preprocessor.
Rules are also allowed to have dynamic dependencies (using dynamicRule
instead of staticRule); this supports use-cases such as C2Hs in which one
needs to first process .chs module headers to discover the import structure.
Let’s start with a simple toy example to get used to the API:
declare hooks that run alex on Lexer.alex and happy on
Parser.happy (running alex/happy on *.x/*.y files is built into Cabal,
but this is just for illustrative purposes).
The static Dict arguments to mkCommand provide evidence that the arguments
passed to the preprocessor can be serialised and deserialised. While syntactically
inconvenient for writers of Hooks, this crucially allows external build tools
(such as cabal-install or HLS) to run and re-run individual build rules without
re-building everything, as explained in the Haskell Foundation Tech Proposal #60.
Rules are allowed to depend on the output of other rules, as well as
directly on files (using the Location datatype). If rule B depends on a file generated by rule A, then one
must declare A as a rule dependency of B (and not use a file dependency).
To summarise, the general structure is that we use the monadic API to declare
a collection of rules (usually, one rule per Haskell module we want to generate,
but a rule can generate multiple outputs as well). Each rule stores a reference
(via StaticPointers) to a command to run, as well as the (possibly dynamic)
arguments to that command.
We can think of the pre-build rules as a table of statically known custom
pre-processors, together with a collection of invocations of these custom
pre-processors with specific arguments.
A word of warning: authors of pre-build rules should use the static keyword
at the top-level whenever possible in order to avoid GHC bug #16981. In the example above, this corresponds to defining
runAlex and runHappy at the top-level, instead of defining them in-line in
the body of myPreBuildRules.
Custom pre-processors
To illustrate how to write pre-build rules, let’s suppose one wants to declare
a custom preprocessor, say myPreProc, which generates Haskell modules from
*.hs-mypp files. Any component of the package which requires such
pre-processing would declare build-tool-depends: exe:myPreProc.
The pre-build rules can be structured as follows:
Look up the pre-processor in the Cabal ProgramDb (program database).
Define how, given input/output files, we should invoke this preprocessor,
e.g. what arguments should we pass to it.
Search for all *.hs-mypp files relevant to the project, monitoring
the results of this search (for recompilation checking).
For each file found by the search in (3), register a rule which invokes the
processor as in (2).
{-# LANGUAGE StaticPointers #-}

myBuildHooks =
  noBuildHooks
    { preBuildComponentRules =
        Just $ rules ( static () ) myPreBuildRules
    }

myPreBuildRules :: PreBuildComponentInputs -> RulesM ()
myPreBuildRules
  PreBuildComponentInputs
    { buildingWhat   = what
    , localBuildInfo = lbi
    , targetInfo     = TargetInfo { targetComponent = comp, targetCLBI = clbi }
    } = do
    let verbosity = buildingWhatVerbosity what
        progDb    = withPrograms lbi
        bi        = componentBuildInfo comp
        mbWorkDir = mbWorkDirLBI lbi

    -- 1. Look up our custom pre-processor in the Cabal program database.
    for_ ( lookupProgramByName myPreProcName progDb ) $ \ myPreProc -> do

      -- 2. Define how to invoke our custom preprocessor.
      let myPpCmd :: Location -> Location -> Command MyPpArgs ( IO () )
          myPpCmd inputLoc outputLoc =
            mkCommand ( static Dict ) ( static ppModule )
              ( verbosity, mbWorkDir, myPreProc, inputLoc, outputLoc )

      -- 3. Search for "*.hs-mypp" files to pre-process in the source directories of the package.
      let glob = GlobDirRecursive [ WildCard, Literal "hs-mypp" ]
      myPpFiles <- liftIO $ for ( hsSourceDirs bi ) $ \ srcDir -> do
        let root = interpretSymbolicPath mbWorkDir srcDir
        matches <- runDirFileGlob verbosity Nothing root glob
        return
          [ Location srcDir ( makeRelativePathEx match )
          | match <- globMatches matches
          ]

      -- Monitor existence of file glob to handle new input files getting added.
      -- NB: we don't have to monitor the contents of the files, because the files
      -- are declared as inputs to rules, which means that their contents are
      -- automatically tracked.
      addRuleMonitors [ monitorFileGlobExistence $ RootedGlob FilePathRelative glob ]
        -- NB: monitoring a directory recursive glob isn't currently supported;
        -- but implementing support would be a nice newcomer-friendly task for cabal-install.
        -- See https://github.com/haskell/cabal/issues/10064.

      -- 4. Declare rules, one for each module to be preprocessed, with the
      --    corresponding preprocessor invocation.
      for_ ( concat myPpFiles ) $ \ inputLoc@( Location _ inputRelPath ) -> do
        let outputBaseLoc = autogenComponentModulesDir lbi clbi
            outputLoc =
              Location
                outputBaseLoc
                ( unsafeCoerceSymbolicPath $ replaceExtensionSymbolicPath inputRelPath "hs" )
        registerRule_ ( toShortText $ getSymbolicPath inputRelPath ) $
          staticRule ( myPpCmd inputLoc outputLoc ) [] ( outputLoc NE.:| [] )

type MyPpArgs = ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location )
  -- NB: this could be a datatype instead, but it would need a 'Binary' instance.

ppModule :: MyPpArgs -> IO ()
ppModule ( verbosity, mbWorkDir, myPreProc, inputLoc, outputLoc ) = do
  let inputPath  = location inputLoc
      outputPath = location outputLoc
  createDirectoryIfMissingVerbose verbosity True $
    interpretSymbolicPath mbWorkDir $ takeDirectorySymbolicPath outputPath
  runProgramCwd verbosity mbWorkDir myPreProc
    [ getSymbolicPath inputPath, getSymbolicPath outputPath ]
This might all be a bit much on first reading, but the key principle is that
we are declaring a preprocessor, and then registering one invocation of this
preprocessor per *.hs-mypp file:
In myPpCmd, the occurrence of static ppModule can be thought of as
declaring a new preprocessor,4
with ppModule being the function to run. This is accompanied by the
neighbouring static Dict occurrence, which provides a way to serialise
and deserialise the arguments passed to preprocessor invocations.
We register one rule per module to pre-process, which means that
external build tools can re-run the preprocessor on individual modules
when the source *.hs-mypp file changes.
Conclusion
This post has introduced build-type: Hooks for the benefit of package authors
who use build-type: Custom. We hope that this introduction will inspire and
assist package authors to move away from build-type: Custom in the future.
We encourage package maintainers to explore build-type: Hooks and
contribute their feedback on the Cabal issue tracker,
helping refine the implementation and expand its adoption across
the ecosystem. To assist such explorations,
we also recall the existence of the Cabal Hooks overlay,
an overlay repository like head.hackage which contains packages that have been
patched to use build-type: Hooks instead of build-type: Custom.
In addition to the work described here, we have done extensive work in
cabal-install to address technical debt and enable it to make use of the new
interface as opposed to going through the Setup CLI. The changes needed in
cabal-install and other build tools (such as HLS) will be the subject of a
future post.
While there remains technical work needed in cabal-install and HLS to fully
realize the potential of build-type: Hooks, it should eventually lead to:
decreases in build times,
improvements in recompilation checking,
more robust HLS support,
removal of most limitations of build-type: Custom, such as
the lack of ability to use multiple sublibraries,
better long-term maintainability of the Cabal project.
Well-Typed are grateful to the Sovereign Tech
Fund for funding this work. In order to continue
our work on Cabal and the rest of the Haskell tooling ecosystem, we are offering
Haskell Ecosystem Support Packages. If
your company relies on Haskell, please encourage them to consider purchasing a
package!
e.g. --package-db=<pkgDb>, --cid=<unitId> and
--dependency=<depPkgNm>:<depCompNm>=<depUnitId> arguments↩︎
The cabal-version field of a package description specifies the version of the Cabal specification it expects. As the Cabal specification evolves, so does the set of flags understood by the Setup CLI.
This means that, when invoking the Setup script for a package, the build tool needs
to be careful to pass arguments consistent with that version; see for instance how
cabal-install handles this in Distribution.Client.Setup.filterConfigureFlags.↩︎
In practice, this means adding an entry to the static pointer table.↩︎
Below we present some animations that illustrate operations on finite patches of Penrose’s Kite and Dart tiles.
These were created using PenroseKiteDart which is a Haskell package available on Hackage making use of the Haskell Diagrams package. For details, see the PenroseKiteDart user guide.
Penrose’s Kite and Dart tiles can produce infinite aperiodic tilings of the plane. There are legal tiling rules to ensure aperiodicity, but these rules do not guarantee that a finite tiling will not get stuck. A legal finite tiling which can be continued to cover the whole plane is called a correct tiling. The rest, which are doomed to get stuck, are called incorrect tilings. (More details can be found in the links at the end of this blog.)
Decomposition Animations
The function decompose is a total operation which is guaranteed to preserve the correctness of a finite tiling represented as a tile graph (or Tgraph). Let us start with a particular Tgraph called sunGraph which is defined in PenroseKiteDart and consists of 5 kites arranged with a common origin vertex. It is drawn using default style in figure 1 on the left. On the right of figure 1 it is drawn with both vertex labels and dotted lines for half-tile join edges.
Figure 1: sunGraph
We can decompose sunGraph three times by selecting index 3 of the infinite list of its decompositions.
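In code, this is something like the following (using PenroseKiteDart’s decompositions):

sunD3 :: Tgraph
sunD3 = decompositions sunGraph !! 3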
The result (sunD3) is drawn in figure 2 (scaled up).
Figure 2: sunD3
The animation in figure 3 illustrates two further decompositions of sunD3 in two stages.
Figure 3: Two decompositions of sunD3
Figure 4 also illustrates two decompositions, this time starting from forcedKingD.
forcedKingD :: Tgraph
forcedKingD = force (decompose kingGraph)
Figure 4: Two decompositions of forcedKingD
A Composition Animation
An inverse to decomposing (namely composing) has some extra intricacies. In the literature (see for example 1 and 2) versions of the following method are frequently described.
Firstly, split darts in half.
Secondly, glue all the short edges of the half-darts where they meet a kite (simultaneously). This will form larger scale complete darts and larger scale half kites.
Finally join the halves of the larger scale kites.
This works for infinite tilings, but we showed in Graphs, Kites and Darts and Theorems that this method is unsound for finite tilings. There is the trivial problem that a half-dart may not have a complete kite on its short edge. Worse still, the second step can convert a correct finite tiling into an incorrect larger scale tiling. An example of this is given in Graphs, Kites and Darts and Theorems where we also described our own safe method of composing (never producing an incorrect Tgraph when given a correct Tgraph). This composition can leave some boundary half-tiles out of the composition (called remainder half-tiles).
The animation in figure 5 shows such a composition where the remainder half-tiles are indicated with lime green edges.
Figure 5: Composition Animation
In general, compose is a partial operation as the resulting half-tiles can break some requirements for Tgraphs (namely, connectedness and no crossing boundaries). However we have shown that it is a total function on forced Tgraphs. (Forcing is discussed next.)
Forcing Animations
The process of forcing a Tgraph adds half-tiles on the boundary where only one legal choice is possible. This continues until either there are no more forced additions possible, or a clash is found showing that the tiling is incorrect. In the latter case it must follow that the initial tiling before forcing was already an incorrect tiling.
The process of forcing is animated in figure 6, starting with a 5 times decomposed kite and in figure 7 with a 5 times decomposed dart.
Figure 6: Force animationFigure 7: Another force animation
It is natural to wonder what forcing will do with cut-down (but still correct) Tgraphs. For example, taking just the boundary faces from the final Tgraph shown in the previous animation forms a valid Tgraph (boundaryExample) shown in figure 8.
Applying force to boundaryExample just fills in the hole to recreate force (decompositions dartGraph !! 5) modulo vertex numbering. To make it more interesting we tried removing further half-tiles from boundaryExample to make a small gap. Forcing this also completes the filling in of the boundary half-tiles to recreate force (decompositions dartGraph !! 5). However, we can see that this filling in is constrained to preserve the required Tgraph property of no crossing boundaries, which prevents the tiling closing round a hole.
This is illustrated in the animation shown in figure 9.
Figure 9: Boundary gap animation
As another experiment, we take the boundary faces of a (five times decomposed but not forced) star. When forced this fills in the star and also expands outwards, as illustrated in figure 10.
Figure 10: Star boundary
In the final example, we pick out a shape within a correct Tgraph (ensuring the chosen half-tiles form a valid Tgraph) then animate the force process and then run the animation in both directions (by adding a copy of the frames in reverse order).
The result is shown in figure 11.
Figure 11: Heart animation
Creating Animations
Animations as gif files can be produced by the Haskell Diagrams package using the rasterific back end.
The main module should import both Diagrams.Prelude and Diagrams.Backend.Rasterific.CmdLine. This will expose the type B standing for the imported backend, and diagrams then have type Diagram B.
An animation should have type [(Diagram B, Int)] and consist of a list of frames for the animation, each paired with an integer delay (in one-hundredths of a second).
The animation can then be passed to mainWith.
module Main (main) where

import Diagrams.Prelude
import Diagrams.Backend.Rasterific.CmdLine

...

fig :: [(Diagram B, Int)]
fig = myExampleAnimation
main :: IO ()
main = mainWith fig
If main is then compiled and run (e.g. with parameters -w 700 -o test.gif) it will produce an output file (test.gif with width 700).
Crossfade tool
The decompose and compose animations were defined using crossfade.
crossfade :: Int -> Diagram B -> Diagram B -> [Diagram B]
crossfade n d1 d2 = map blending ratios
  where
    blending r = opacity (1-r) d1 <> opacity r d2
    ratios = map ((/ fromIntegral n) . fromIntegral) [0..n]
Thus crossfade n d1 d2 produces n+1 frames, each with d1 overlaid on d2 but with varying opacities (decreasing for d1 and increasing for d2).
Adding the same pause (say 10 hundredths of a second) to every frame can be done by applying map (,10) and this will produce an animation.
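For example (d1 and d2 stand for two arbitrary diagrams; the TupleSections extension is needed for the (,10) section):

{-# LANGUAGE TupleSections #-}
-- d1, d2 :: Diagram B are two diagrams to fade between
fig :: [(Diagram B, Int)]
fig = map (,10) (crossfade 20 d1 d2)   -- 21 frames, each shown for 10/100 s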
Force animation tool
To create force animations it was useful to create a tool (forceFrames) to produce frames with stages of forcing. It takes four arguments:
an angle argument (to rotate the diagrams in the animation from the default alignment of the Tgraph),
an Int (for the required number of frames),
a Tgraph (to be forced),
a triple of colours for filling darts, kites and grout (edge colour), respectively.
The definition of forceFrames uses stepForce to advance forcing a given number of steps to get the intermediate Tgraphs. The total number of forcing steps will be the number of faces (half-tiles) in the final force g less the number of faces in the initial g. All the Tgraphs are drawn (using colourDKG) but the resulting diagrams must all be aligned properly. The alignment can be achieved by creating a VPatch (vertex patch) from the final Tgraph which is then rotated. All the Tgraphs can then be drawn using sub vertex patches of the final rotated one. (For details see Overlaid examples in the PenroseKiteDart user guide.)
Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
consider a generic implementation of alpha-beta game tree search with transposition table, generic enough to be applicable to any user-specified game. what should be its API? what features should it provide?
evaluate to infinite depth (possible because of transposition table), returning game value and line (principal variation). intended for small games.
return the transposition table so that it can be reused for subsequent moves.
evaluate to given depth. or, user-specified predicate of whether to stop searching, e.g., quiescence search. quiescence search wants access to the transposition table.
ambitious: because of the many ways game tree search can be customized (for many examples, albeit often poorly described, see the chessprogramming wiki), structure the algorithm as a collection of components each of which can be modified and hooked together in various ways. I have no idea what language or framework could enable this kind of software engineering, though functional programming languages seem attractive as the first thing to try. but beware that a pure functional programming language such as Haskell easily leaks space for this kind of task, and threading state, the transposition table, through the computation may be awkward.
common customizations sacrifice accuracy (correctness or completeness) for speed. for example, if two different evaluated positions have the same key (for example, a 64-bit Zobrist hash in chess), one can optimize by doing no transposition table collision resolution; the second position gets ignored, assumed to have already been evaluated. the default algorithm should not do such optimizations but should allow the user to specify both safe and unsafe optimizations.
allow the search to be augmented with various statistics gathered along the way that get consumed by other user-specified parts of the algorithm. for example, the move generator could order moves based on values of similar moves already evaluated in other parts of the tree.
provide visibility into how user customizations are working, ways to evaluate whether or not they are worth it.
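a rough haskell sketch of what the core of such an API could look like (every name, type, and signature below is invented for illustration; it is not an existing library):

{-# LANGUAGE TypeFamilies #-}
module GameSearch where

import Data.Word (Word64)
import qualified Data.Map.Strict as Map

-- a user-specified game: positions, moves, evaluation, and a position key
class Game g where
  type Move g
  moves  :: g -> [Move g]      -- legal moves; empty list means a terminal position
  play   :: Move g -> g -> g   -- apply a move
  static :: g -> Double        -- static evaluation from the side to move
  key    :: g -> Word64        -- position key, e.g. a Zobrist hash

-- transposition table entry: value, depth it was searched to, best move found
data Entry g = Entry
  { entryValue :: Double
  , entryDepth :: Int
  , entryMove  :: Maybe (Move g)
  }

type Table g = Map.Map Word64 (Entry g)

-- evaluate to a given depth (Nothing = search exhaustively, for small games),
-- returning the game value, the principal variation, and the updated table,
-- which the caller keeps and reuses for subsequent moves
alphaBeta :: Game g => Maybe Int -> Table g -> g -> (Double, [Move g], Table g)
alphaBeta = error "sketch: signature only"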
There’s a common anti-pattern I see in beginner-to-intermediate Haskell programmers that I wanted to discuss today. It’s the tendency to conceptualize the creation of an object by repeated mutation. Often this takes the form of repeated insertion into an empty container, but comes up under many other guises as well.
This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The gang of four “builder pattern” is exactly this: you can build an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaining together setter methods.
Even if you don’t subscribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this:
Doodad doodad = new Doodad();
foreach (Widget widget in widgets) {
    doodad.addWidget(widget);
}
To be more concrete, maybe instead of doodads and widgets you have BSTs and Nodes. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.
Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like:
foldr (\(k, v) m -> Map.insert k v m) Map.empty
  $ toKVPairs something
Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.
The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and foldrs all have in common is that they are strategies for building something.
But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.
I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one?
cgo :: (a -> (UInt, UInt)) -> [a] -> [NonEmpty a]
cgo f = foldr step []
  where
    step a [] = [pure a]
    step a bss0@((b :| bs) : bss)
      | let (al, ac) = f a
      , let (bl, bc) = f b
      , al + 1 == bl && ac == bc
      = (a :| b : bs) : bss
      | otherwise = pure a : bss0
which I found by grepping through haskell-language-server for foldr, and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!
It’s Always Monoids
The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces.
The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow fort. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelmingly powerful” and yet “hard to get wrong.”
As an amazingly fast recap, a monoid is a collection of three things: some type m, some value of that type mempty, and a binary operation over that type (<>) :: m -> m -> m, subject to a bunch of laws:
∀a. mempty <> a = a = a <> mempty
∀a b c. (a <> b) <> c = a <> (b <> c)
which is to say, mempty does nothing and (<>) doesn’t care where you stick the parentheses.
If you’re going to memorize any two particular examples of monoids, it had better be these two:
instance Monoid [a] where
  mempty = []
  a <> b = a ++ b

instance (Monoid a, Monoid b) => Monoid (a, b) where
  mempty = (mempty, mempty)
  (a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2)
The first says that lists form a monoid under the empty list and concatenation. The second says that products preserve monoids.
The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably satisfy the equation toList a <> toList b = toList (a <> b). Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.
The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example:
from :: (Bool -> a) -> (a, a)
from f = (f False, f True)

to :: (a, a) -> (Bool -> a)
to (a, _) False = a
to (_, a) True  = a
and under this isomorphism, we should thereby expect the Monoid a => Monoid (Bool -> a) instance to agree with Monoid a => Monoid (a, a). If you generalize this out, you get the following instance:
instance Monoid a => Monoid (x -> a) where
  mempty = \_ -> mempty
  f <> g = \x -> f x <> g x
which combines values in the codomain monoidally. We can show the equivalence between this monoid instance and our original product preservation:
from f <> from g
  = (f False, f True) <> (g False, g True)
  = (f False <> g False, f True <> g True)
  = ((f <> g) False, (f <> g) True)
  = from (f <> g)
and
to (a11, a12) <> to (a21, a22)
  = \x -> to (a11, a12) x <> to (a21, a22) x
  = \x -> case x of
      False -> to (a11, a12) False <> to (a21, a22) False
      True  -> to (a11, a12) True  <> to (a21, a22) True
  = \x -> case x of
      False -> a11 <> a21
      True  -> a12 <> a22
  = \x -> to (a11 <> a21, a12 <> a22) x
  = to (a11 <> a21, a12 <> a22)
which is a little proof that our function monoid agrees with the preservation-of-products monoid. The same argument works for any type x in the domain of the function, but showing it generically is challenging.
Anyway, I digress.
The reason to memorize this Monoid instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be.
Don’t believe me? A Map k v is an encoding of the function k -> Maybe v optimized to efficiently query which k values map to Just something. That is to say, it’s a sparse representation of a function.
From Theory to Practice
What does all of this look like in practice? Stuff like worrying about foldr is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application.
The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just how far one can get by composing these things.
At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well.
It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the Scope constructors. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find these Scope nodes, and propagating their information downstream.
Rather amazingly, this implementation kinda sorta worked! But it was slow, and took \(O(10k)\) SLOC to implement.
The insight here is that everything we needed to collect was monoidal:
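The record itself looked something like this sketch (the field names are reconstructed from the helper functions shown below; Node, Metadata and Graph are the application’s own domain types, each of which has, or is given, a Semigroup/Monoid instance):

{-# LANGUAGE DerivingVia, DeriveGeneric, DerivingStrategies #-}
import Data.Map (Map)
import Data.Set (Set)
import GHC.Generics (Generic, Generically(..))

-- sketch: all the pieces we want to collect, bundled as one product of monoids
data Solution = Solution
  { collectedNodes :: Set Node
  , metadata       :: Map Node Metadata
  , graph          :: Graph
  }
  deriving stock (Generic)
  deriving (Semigroup, Monoid) via (Generically Solution)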
where the deriving (Semigroup, Monoid) via Generically Solution stanza gives us the semigroup and monoid instances that we’d expect from Solution being the product of a bunch of other monoids.
And now for the coup de grace: we hook everything up with the Writer monad. Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and underwhelming at logging compared to a real logger type. But the charm is in the details:
instance Monoid w => Monad (Writer w)
Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions:
collectNode :: MonadWriter Solution m => Node -> m ()
collectNode n = tell $ mempty { collectedNodes = Set.singleton n }

addMetadata :: MonadWriter Solution m => Node -> Metadata -> m ()
addMetadata n m = tell $ mempty { metadata = Map.singleton n m }

emitGraphFragment :: MonadWriter Solution m => Graph -> m ()
emitGraphFragment g = tell $ mempty { graph = g }
each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function of the type:
algorithm
  :: Metadata  -- ^ the current scope
  -> Tree      -- ^ the tree we're reducing
  -> Writer Solution Node
     -- ^ our partial solution, and the node corresponding to the root of the tree
which traverses the Tree, recursing with a different Metadata whenever it comes across a Scope constructor, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root Node of the section of the graph it just built, which recursing calls can use to break up the problem into inductive pieces.
This new implementation is roughly 20x smaller, coming in at \(O(500)\) SLOC, and was free of all the bugs we’d been diligently trying to squash under the previous implementation.
Suppose we have a sequence of integers \(a_1, \dots, a_n\) and want to be
able to perform two operations:
we can update any \(a_i\) by adding some value \(v\) to it; or
we can perform a range query, which asks for the sum of the values
\(a_i + \dots + a_j\) for any range \([i,j]\).
There are several ways to solve this problem. For example:
We could just keep the sequence of integers in a mutable array.
Updating is \(O(1)\), but range queries are \(O(n)\) since we must
actually loop through the range and add up all the values.
We could keep a separate array of prefix sums on the side, so
that \(P_i\) stores the sum \(a_1 + \dots + a_i\). Then the range
query on \([i,j]\) can be computed as \(P_j - P_{i-1}\), which only
takes \(O(1)\); however, updates now take \(O(n)\) since we must also
update all the prefix sums which include the updated element.
We can get the best of both worlds using a segment tree, a binary
tree storing the elements at the leaves, with each internal node
caching the sum of its children. Then both update and range query
can be done in \(O(\lg n)\).
I won’t go through the details of this third solution here, but it is
relatively straightforward to understand and implement, especially in
a functional language.
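For a concrete picture of that third option, here is one rough Haskell sketch (not taken from the paper; just a straightforward sum segment tree with cached subtree sums):

-- A segment tree over indices [lo..hi]; each internal node caches the sum of its subtree.
data SegTree
  = Leaf Int Integer                       -- index, value
  | Node Integer Int Int SegTree SegTree   -- cached sum, lo, hi, left, right

-- an all-zero tree over the index range [lo..hi]
mkTree :: Int -> Int -> SegTree
mkTree lo hi
  | lo == hi  = Leaf lo 0
  | otherwise = Node 0 lo hi (mkTree lo mid) (mkTree (mid + 1) hi)
  where mid = (lo + hi) `div` 2

-- add v to the element at index i: O(lg n)
update :: Int -> Integer -> SegTree -> SegTree
update i v t = case t of
  Leaf j x
    | i == j            -> Leaf j (x + v)
    | otherwise         -> t
  Node s lo hi l r
    | i < lo || i > hi  -> t
    | otherwise         -> Node (s + v) lo hi (update i v l) (update i v r)

-- sum of the elements with indices in [i..j]: O(lg n)
rangeSum :: Int -> Int -> SegTree -> Integer
rangeSum i j t = case t of
  Leaf k x
    | i <= k && k <= j   -> x
    | otherwise          -> 0
  Node s lo hi l r
    | j < lo || hi < i   -> 0                          -- disjoint ranges
    | i <= lo && hi <= j -> s                          -- fully covered: use the cache
    | otherwise          -> rangeSum i j l + rangeSum i j r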
However, there is a fourth solution, known as a Fenwick tree or
Fenwick array, independently invented by Ryabko (1989) and
Fenwick (1994). Here’s a typical Java implementation of a Fenwick
tree:
class FenwickTree {
    private long[] a;

    public FenwickTree(int n) { a = new long[n+1]; }

    public long prefix(int i) {
        long s = 0;
        for (; i > 0; i -= LSB(i)) s += a[i];
        return s;
    }

    public void update(int i, long delta) {
        for (; i < a.length; i += LSB(i)) a[i] += delta;
    }

    public long range(int i, int j) { return prefix(j) - prefix(i-1); }

    public long get(int i) { return range(i, i); }

    public void set(int i, long v) { update(i, v - get(i)); }

    private int LSB(int i) { return i & (-i); }
}
I know what you’re thinking: what the heck!? There are some loops adding and
subtracting LSB(i), which is defined as the bitwise AND of i and
-i? What on earth is this doing? Unless you have seen this
before, this code is probably a complete mystery, as it was for me the
first time I encountered it.
However, from the right point of view, we can derive this mysterious imperative
code as an optimization of segment trees. In particular, in my
paper I show how we can:
Start with a segment tree.
Delete some redundant info from the segment tree, and shove the
remaining values into an array in a systematic way.
Define operations for moving around in the resulting Fenwick array by
converting array indices to indices in a segment tree, moving
around the tree appropriately, and converting back.
Describe these operations using a Haskell EDSL for
infinite-precision 2’s complement binary arithmetic, and fuse away
all the intermediate conversion steps, until the above mysterious
implementation pops out.
Sam Lindley is a Reader in Programming Languages Design and Implementation at the University of Edinburgh. In this episode, he tells us how difficult naming is, the different kinds of effect systems and handlers, languages *much* purer than Haskell, and Modal logic.
Admittedly a bit late, buuuuuut Merry belated Christmas and Happy New Years
to all!
This past December I again participated in Eric Wastl’s Advent of Code, a series of 25 daily
Christmas-themed puzzles. Each puzzle comes with a cute story about saving
Christmas, and the puzzles increase in difficulty as the stakes get higher and
higher. Every night at midnight EST, my friends and I (including the good people
of libera chat’s ##advent-of-code channel) discuss the latest
puzzle and creative ways to solve and optimize it. But, the main goal isn’t to
solve it quickly, it’s always to see creative ways to approach the puzzle and
share different insights. The puzzles are bite-sized enough that there are often
multiple ways to approach them, and in the past I’ve leveraged group theory, Galilean
transformations and linear algebra, and more group theory.
This year was also the special 10 year anniversary event, with callbacks to fun
story elements of all the previous years!
Most of the puzzles are also pretty nice to solve in Haskell! Lots of DFS’s
that melt away as simple recursion or recursion schemes, and even the BFS’s that
expose you to different data structures and encodings.
This year I’ve moved almost all of my Haskell code to an Advent of Code Megarepo.
I also like to post write-ups on Haskelly ways to approach the problems, and
they are auto-compiled on the megarepo wiki.
I try my best every year, but sometimes I am able to complete write-ups for
all 25 puzzles before the new year catches up. The last time was 2020, and I’m
proud to announce that 2024 is now also 100% complete!
You can find all of
them here, but here are links to each individual one. Hopefully you can find
them helpful. And if you haven’t yet, why not try Advent of Code yourself? :) And drop by the
libera ##advent-of-code channel, we’d love to say hi and chat, or
help out! Thanks all for reading, and also thanks to Eric for a great event this
year, as always!
Lucas Escot wrote a good blog post titled “Making My Life Easier with GADTs”, which contains a demonstration of GADTs that made his life easier.
He posted the article to reddit.
I’m going to trust that - for his requirements and anticipated program evolution - the solution is a good one for him, and that it actually made his life easier.
However, there’s one point in his post that I take issue with:
Dependent types and assimilated type-level features get a bad rep. They are often misrepresented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion stems from a severe misconception about the presumed complexity of dependent type systems.
I am often arguing against complexity in Haskell codebases.
While Lucas’s prediction about “misconceptions” may be true for others, it is not true for me.
I have worked extensively with Haskell’s most advanced features in large scale codebases.
I’ve studied “Types and Programming Languages,” the Idris book, “Type Theory and Formal Proof”, and many other resources on advanced type systems.
I don’t say this to indicate that I’m some kind of genius or authority, just that I’m not a rube who’s looking up on the Blub Paradox.
My argument for simplicity comes from the hard experience of having to rip these advanced features out, and the pleasant discovery that simpler alternatives are usually nicer in every respect.
They are often misrepresented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion stems from a severe misconception about the presumed complexity of dependent type systems.
This opinion - in my case at least - stems from having seen people code themselves into a corner with fancy type features where a simpler feature would have worked just as well.
In this case, the “simplest solution” is to have two entirely separate datatypes, as the blog post initially starts with. These datatypes, after all, represent different things - a typed environment and an untyped environment. Why mix the concerns? What pain or requirement is solved by having one more complicated datatype when two datatypes work pretty damn well?
I could indeed keep typed environments completely separate. Different datatypes, different information. But this would lead to a lot of code duplication. Given that the compilation logic will be mostly identical for these two targets, I don’t want to be responsible for the burden of keeping both implementations in sync.
Code duplication can be a real concern. In this case, we have code that is not precisely duplicated, but simply similar - we want compilation logic to work for both untyped and typed logics, and only take typing information into account. When we want code to work over multiple possible types, we have two options: parametric polymorphism and ad-hoc polymorphism.
With parametric polymorphism, the solution looks like this:
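(The original snippet isn’t reproduced here; the following is a minimal sketch of the shape it takes, with Name and Expr as stand-in placeholder types rather than anything from Lucas’s code.)

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

type Name = String
data Expr = Expr deriving Show   -- stand-in for the real expression type

-- The annotation is an ordinary type variable: () for untyped programs,
-- LamBox.Type for typed ones.
data GlobalDecl ty = GlobalDecl
  { declName :: Name
  , declBody :: Expr
  , declType :: ty
  } deriving (Show, Functor, Foldable, Traversable)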
This is actually very similar to the GADT approach, because we’re threading a type variable through the system. For untyped, we can write GlobalDecl (), and for typed, we can write GlobalDecl LamBox.Type.
Functions which can work on either untyped or typed would have GlobalDecl a -> _ as their input, and functions which require a representation can specify it directly. This would look very similar to the GADT approach: in practice, replace GlobalDecl Typed with GlobalDecl Type and GlobalDecl Untyped with GlobalDecl () and you’re good.
(or, heck, data Untyped = Untyped and the change is even smaller).
This representation is much easier to work with. You can deriving stock (Show, Eq, Ord). You can $(deriveJSON ''GlobalEnv). You can delete several language extensions. It’s also more flexible: you can use Maybe Type to represent partially typed programs (or programs with type inference). You can use Either TypeError Type to represent full ASTs with type errors. You can deriving stock (Functor, Foldable, Traversable) to get access to fmap (change the type with a function) and toList (collect all the types in the AST) and traverse (change each type effectfully, combining results).
When we choose GADTs here, we pay significant implementation complexity costs, and we give up flexibility. What is the benefit? Well, the entire benefit is that we’ve given up flexibility. With the parametric polymorphism approach, we can put anything in for that type variable a. The GADT prevents us from writing TypeDecl () and it forbids you from having anything other than Some (type :: Type) or None in the fields.
This restriction is what I mean by ‘coding into a corner’. Let’s say you get a new requirement to support partially typed programs. If you want to stick with the GADT approach, then you need to change data Typing = Typed | Untyped | PartiallyTyped and modify all the WhenTyped machinery - Optional :: Maybe a -> WhenTyped PartiallyTyped a. Likewise, if you want to implement inference or type-checking, you need another constructor on Typing and another on WhenTyped - ... | TypeChecking and Checking :: Either TypeError a -> WhenTyped TypeChecking a.
But wait - now our TypeAliasDecl has become overly strict!
But, uh oh, we also want to write functions that can operate in many of these states. We can extend IsTypedish with a function witness witnessTypedish :: WhenTyped t Type -> Type, but that also doesn’t quite work - the t actually determines the output type.
but actually working with this becomes a bit obnoxious. You see, without knowing t, you can’t know the result of witnessTypedish, so you end up needing to say things like (IsTypedish t, TypedIshPayload t ~ f Type, Foldable f) => ... to cover the Maybe and Either case - and this only lets you fold the result. But now you’re working with the infelicities of type classes (inherently open) and sum types (inherently closed) and the way that GHC tries to unify these two things with type class dispatch.
Whew.
Meanwhile, in parametric polymorphism land, we get almost all of the above for free. If we want to write code that covers multiple possible cases, then we can use much simpler type class programming. Consider how easy it is to write this function and type:
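(Again a sketch rather than the post’s own code, continuing the stand-in GlobalDecl above:)

import Data.Foldable (toList)

-- Works uniformly for any Foldable annotation: Maybe Type,
-- Either TypeError Type, [Type], and so on.
declaredTypes :: Foldable f => GlobalDecl (f ty) -> [ty]
declaredTypes = toList . declType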
My Emacs config's todo-list has long had an item about finding some way to
review GitHub PRs without having to leave Emacs and when the forge issue that I
subscribe to came alive again I thought it was time to see if I can improve my
config.
I've tried the first one before but at the time it didn't seem to work at all.
Apparently that's improved somewhat, though there's a PR with a change that's
necessary to make it work.1 The first two don't support comments on multiple
lines of a PR; there are issues/discussions for both.
The last one, emacs-pr-review does support commenting on multiple lines, but
it lacks a nice way of opening a review from magit. What I can do is
position the cursor on a PR in the magit status view, then
copy the PR's URL using forge-copy-url-at-point-as-kill, and
open the PR by calling pr-review and pasting the PR's URL.
Which I did for a few days until I got tired of it and wrote a function to cut
out the copy/paste part.
(defun mes/pr-review-via-forge ()
  (interactive)
  (if-let* ((target (forge--browse-target))
            (url (if (stringp target) target (forge-get-url target)))
            (rev-url (pr-review-url-parse url)))
      (pr-review url)
    (user-error "No PR to review at point")))
I've bound it to a key in magit-mode-map to make it easier.
I have to say I'm not completely happy with emacs-pr-review, so if either of
the other two sort out commenting on multiple lines I'll check them out again.
The links to formulae here are broken but a PDF version is available at github.
Preface
Functional programming encourages us to program without mutable state.
Instead we compose functions that can be viewed as state transformers.
It's a change of perspective that can have a big impact on how we reason about our code.
But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that allows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations.
(Unless you're infinitely unlucky!)
Markov Chains
A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state.
So it is a sequence \(X_1, X_2, X_3, \dots\) such that \(P(X_{n+1}=x \mid X_1=x_1, \dots, X_n=x_n) = P(X_{n+1}=x \mid X_n=x_n)\) for all \(n\).
A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny).
(And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)
Like imperative code, this description is stateful.
The state at step \(n+1\) depends on the state at step \(n\).
Probability is often easier to reason about when we work with independent identically drawn random variables and our \(X_n\) aren't of this type.
But we can eliminate the state from our description using the same method used by functional programmers.
Let's choose a Markov chain to play with.
I'll pick one with 3 states called \(A\), \(B\) and \(C\), and with transition probabilities \(P(X_{n+1}=j \mid X_n=i) = T_{ij}\) given by the matrix
\[ T = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix} \]
Here's a diagram illustrating our states:
Implementation
First some imports:
> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}
> import Control.Monad.State
> import System.Random
> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)
We are now in a position to simulate our Markov chain.
First we need some random numbers drawn uniformly from [0, 1]:
> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random
And now the code to take a single step in the Markov chain:
> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
> a <- uniform
> if a < 0.5
> then return A
> else return B
> step B = do
> a <- uniform
> if a < 1/3.0
> then return A
> else if a < 2/3.0
> then return B
> else return C
> step C = do
> a <- uniform
> if a < 0.5
> then return B
> else return C
Notice how the step function generates a new state at random in a way that depends on the previous state.
The m ABC in the type signature makes it clear that we are generating random states at each step.
We can simulate the effect of taking \(n\) steps with a function like this:
> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
> i <- steps (n-1) i
> step i
We can run it for a few steps, starting with \(A\), with a line like so:
*Main> evalState (steps 3 A) gen
B
The starting state of our random number generator is given by gen.
Consider the distribution of states after taking \(n\) steps.
For Markov chains of this type, we know that as \(n\) goes to infinity the distribution of the \(n\)th state approaches a limiting "stationary" distribution.
There are frequently times when we want to sample from this final distribution.
For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution.
But for real world problems this can be intractable.
Instead, a popular solution is to pick a large \(n\) and hope it's large enough.
As \(n\) gets larger the distribution gets closer to the limiting distribution.
And that's the problem I want to solve here - sampling from the limit.
It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.
Some random functions
Here is a new version of our random step function:
> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
> a <- uniform
> return $ \case
> A -> if a < 0.5 then A else B
> B -> if a < 1/3.0
> then A
> else if a < 2/3.0 then B else C
> C -> if a < 0.5 then B else C
In many ways it's similar to the previous one.
But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state.
We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:
> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
> fs <- replicateA n step'
> return $ foldr (flip (.)) id fs
Notice the use of flip.
We want to compose functions \(f_n \circ f_{n-1} \circ \dots \circ f_1\), each time composing on the left by the new \(f_n\).
This means that for a fixed seed gen, each time you increase \(n\) by 1 you get the next step in a single simulation:
(BTW I used replicateA instead of replicateM to indicate that these are independent random draws.
It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)
*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]
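(replicateA isn't in base, by the way; if you want to run the code, one definition consistent with the intent is:)

> replicateA :: Applicative f => Int -> f a -> f [a]
> replicateA n x = sequenceA (replicate n x)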
When I first implemented this I accidentally forgot the flip.
So maybe you're wondering what effect removing the flip has?
The effect is about as close to a miracle as I've seen in mathematics.
It allows us to sample from the limiting distribution in a finite number of steps!
Here's the code:
> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
> fs <- replicateA n step'
> return $ foldr (.) id fs
We end up building \(f_1 \circ f_2 \circ \dots \circ f_n\).
This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'.
Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain.
Running with argument \(n\) draws a random composition of \(n\) functions.
But if you increase \(n\) by 1 you don't add a new step at the end.
Instead you effectively restart the Markov chain with a new first step generated by a new random seed.
Try it and see:
*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A,A,A,A,A,A,A,A,A,A,A]
Maybe that's surprising.
It seems to get stuck in one state.
In fact, we can try applying the resulting function to all three states.
*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]
In other words, for large enough \(n\) we get the constant function.
Think of it this way:
If f isn't injective then it's possible that two states get collapsed to the same state.
If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state.
Once this happens, we'll get the same result no matter how large we take \(n\).
If we can detect this then we've found the limit of \(f_1 \circ f_2 \circ \dots \circ f_n\) as \(n\) goes to infinity.
But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition.
That flip can't change what probability distribution we're drawing from - just the dependence on the seed.
So the value the constant function takes is actually a draw from the limiting stationary distribution.
We can code this up:
> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as
> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
> all_equal $ map f $ enumFromTo minBound maxBound
This technique is called coupling from the past.
It's "coupling" because we've arranged that different starting points coalesce.
And it's "from the past" because we're essentially answering the question of what the outcome of a simulation would be if we started infinitely far in the past.
> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
> m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
> if test_constant f
> then return f
> else do
> f' <- step
> couple_from_past step (f . f')
We can now sample from the limiting distribution a million times, say:
*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen
We can now count how often A appears:
*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748
That's a pretty good approximation to \(\frac{2}{7} \approx 0.2857\), the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
> gen = mkStdGen 669
Notes
The technique of coupling from the past first appeared in a paper by Propp and Wilson.
The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it.
Note that the code above is absolutely not how you'd implement this for real.
I wrote the code that way so that I could switch algorithm with the simple removal of a flip.
In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred.
Or even with uncountably large state spaces.
But I'll let you read the Propp-Wilson paper to find out how.
Writing an interpreter for Brainfuck is almost a rite of passage for any programming language implementer,
and it’s my turn now. In this post, we’ll write not one but four Brainfuck interpreters in Haskell. Let’s go!
Brainfuck (henceforth BF) is the most famous of esoteric programming languages. Its fame lies in
the fact that it is extremely minimalistic, with only eight instructions, and very easy to implement.
Yet, it is Turing-complete and as capable as any other programming language1. Writing
an interpreter for BF is a fun exercise, and so there are hundreds, maybe even thousands of them. Since BF
is very verbose, optimizing BF interpreters is almost a sport, with people posting benchmarks of their
creations. I can’t say that what I have in this post is novel, but it was definitely a fun exercise for me.
BF has eight instructions of one character each. A BF program is a sequence of these instructions. It may have other characters as well, which are treated as comments and are ignored while executing. An instruction pointer (IP) points at the next instruction to be executed, starting with the first instruction. The instructions are executed sequentially, except for the jump instructions that may cause the IP to jump to remote instructions. The program terminates when the IP moves past the last instruction.
BF programs work by modifying data in a memory that is an array of at least 30000 byte cells initialized to zero. A data pointer (DP) points to the current byte of the memory to be modified, starting with the first byte of the memory. BF programs can also read from standard input and write to standard output, one byte at a time using the ASCII character encoding.
The eight BF instructions each consist of a single character:
>
Increment the DP by one to point to the next cell to the right.
<
Decrement the DP by one to point to the next cell to the left.
+
Increment the byte at the DP by one.
-
Decrement the byte at the DP by one.
.
Output the byte at the DP.
,
Accept one byte of input, and store its value in the byte at the DP.
[
If the byte at the DP is zero, then instead of moving the IP forward to the next command, jump it forward to the command after the matching ] command.
]
If the byte at the DP is nonzero, then instead of moving the IP forward to the next command, jump it back to the command after the matching [ command.
Each [ matches exactly one ] and vice versa, and the [ comes first. Together, they add conditions and loops to BF.
Some details are left to implementations. In our case, we assume that the memory cells are signed bytes that underflow and overflow without errors. Also, accessing the memory beyond array boundaries wraps to the opposite side without errors.
For a taste, here is a small BF program that prints Hello, World! when run:
As you can imagine, interpreting BF is easy, at least when doing it naively. So instead of writing one interpreter, we are going to write four, with increasing performance and complexity.
Setup
First, some imports:
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Monad (void)
import Data.Bits (shiftR, (.&.))
import Data.ByteArray qualified as BA
import Data.Char (chr, ord)
import Data.Functor (($>))
import Data.Int (Int8)
import Data.Kind (Type)
import Data.Vector qualified as V
import Data.Vector.Storable.Mutable qualified as MV
import Data.Word (Word16, Word8)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable qualified as S
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO qualified as IO
import Text.ParserCombinators.ReadP qualified as P
We use the GHC2021 extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the memory and vector libraries.
We abstract the interpreter interface as a typeclass:
class Interpreter a where
  data Program a :: Type
  parse :: String -> Program a
  interpret :: Memory -> Program a -> IO ()
An Interpreter is specified by a data type Program and two functions: parse parses a string to a Program, and interpret interprets the parsed Program.
For modelling the mutable memory, we use a mutable unboxed IOVector of signed bytes (Int8) from the vector package. Since our interpreter runs in IO, this works well for us. The DP, hence, is modelled as an index into this vector, which we name the MemIdx type.
We wrap the IOVector Int8 in a Memory newtype. newMemory creates a new memory array of bytes initialized to zero. memorySize returns the size of the memory. readMemory, writeMemory and modifyMemory are for reading from, writing to and modifying the memory respectively. nextMemoryIndex and prevMemoryIndex increment and decrement the array index respectively, taking care of wrapping at boundaries.
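A minimal sketch of what such a Memory wrapper could look like (the exact definitions may differ from the real ones):

newtype Memory = Memory (MV.IOVector Int8)

type MemIdx = Int

newMemory :: Int -> IO Memory
newMemory n = Memory <$> MV.replicate n 0

memorySize :: Memory -> Int
memorySize (Memory v) = MV.length v

readMemory :: Memory -> MemIdx -> IO Int8
readMemory (Memory v) = MV.read v

writeMemory :: Memory -> MemIdx -> Int8 -> IO ()
writeMemory (Memory v) = MV.write v

modifyMemory :: Memory -> (Int8 -> Int8) -> MemIdx -> IO ()
modifyMemory (Memory v) = MV.modify v

nextMemoryIndex :: Memory -> MemIdx -> MemIdx
nextMemoryIndex memory i = (i + 1) `mod` memorySize memory

prevMemoryIndex :: Memory -> MemIdx -> MemIdx
prevMemoryIndex memory i = (i - 1) `mod` memorySize memory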
Now we write the main function using the Interpreter typeclass functions:
The main function calls the parse and interpret functions for the right interpreter with a new memory and the input string read from the file specified in the command line argument. We make sure to filter out non-BF characters when reading the input file.
With the setup done, let’s move on to our first interpreter.
String Interpreter
A BF program can be interpreted directly from its string representation, going over the characters and executing the right logic for them. But strings in Haskell are notoriously slow because they are implemented as singly linked-lists of characters. Indexing into strings has \(O(n)\) time complexity, so it is not a good idea to use them directly. Instead, we use a char Zipper2.
Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus3.
This zipper is a little different from the usual implementations because we need to know when the focus of the zipper has moved out of the program boundaries. Hence, we model the focus as Maybe Char. czFromString creates a char zipper from a string. czMoveLeft and czMoveRight move the focus left and right respectively, taking care of setting the focus to Nothing if we move outside the program string.
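A hedged sketch of a char zipper matching that description:

data CharZipper = CharZipper
  { czLeft  :: [Char]       -- characters to the left of the focus, nearest first
  , czFocus :: Maybe Char   -- Nothing once we move outside the program
  , czRight :: [Char]       -- characters to the right of the focus
  }

czFromString :: String -> CharZipper
czFromString []       = CharZipper [] Nothing []
czFromString (c : cs) = CharZipper [] (Just c) cs

czMoveLeft :: CharZipper -> CharZipper
czMoveLeft (CharZipper ls f rs) = case ls of
  []        -> CharZipper [] Nothing (maybe rs (: rs) f)
  (l : ls') -> CharZipper ls' (Just l) (maybe rs (: rs) f)

czMoveRight :: CharZipper -> CharZipper
czMoveRight (CharZipper ls f rs) = case rs of
  []        -> CharZipper (maybe ls (: ls) f) Nothing []
  (r : rs') -> CharZipper (maybe ls (: ls) f) (Just r) rs'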
Parsing the program is thus the same as creating the char zipper from the program string. For interpreting the program, we write this function:
interpretCharZipper :: Memory -> CharZipper -> IO ()
interpretCharZipper memory = go 0
  where
    go !memIdx !program = case czFocus program of
      Nothing -> return ()
      Just c -> case c of
        '+' -> modifyMemory memory (+ 1) memIdx >> goNext
        '-' -> modifyMemory memory (subtract 1) memIdx >> goNext
        '>' -> go (nextMemoryIndex memory memIdx) program'
        '<' -> go (prevMemoryIndex memory memIdx) program'
        ',' -> do
          getChar >>= writeMemory memory memIdx . fromIntegral . ord
          goNext
        '.' -> do
          readMemory memory memIdx >>= putChar . chr . fromIntegral
          goNext
        '[' -> readMemory memory memIdx >>= \case
          0 -> go memIdx $ skipRight 1 program
          _ -> goNext
        ']' -> readMemory memory memIdx >>= \case
          0 -> goNext
          _ -> go memIdx $ skipLeft 1 program
        _ -> goNext
      where
        program' = czMoveRight program
        goNext = go memIdx program'
Our main driver here is the tail-recursive go function that takes the memory index and the program as inputs. It then gets the current focus of the program zipper, and executes the BF logic accordingly.
If the current focus is Nothing, it means the program has finished running. So we end the execution. Otherwise, we switch over the character and do what the BF spec tells us to do.
For + and -, we increment or decrement respectively the value in the memory cell at the current index, and go to the next character. For > and <, we increment or decrement the memory index respectively, and go to the next character.
For ,, we read an ASCII encoded character from the standard input, and write it to the memory at the current memory index as a byte. For ., we read the byte from the memory at the current memory index, and write it out to the standard output as an ASCII encoded character. After either cases, we go to the next character.
For [, we read the byte at the current memory index, and if it is zero, we skip right over the part of the program till the matching ] is found. Otherwise, we go to the next character.
For ], we skip left over the part of the program till the matching [ is found, if the current memory byte is non-zero. Otherwise, we go to the next character.
The next two functions implement the skipping logic:
The tail-recursive functions skipRight and skipLeft skip over parts of the program by moving the focus to right and left respectively, till the matching bracket is found. Since the loops can contain nested loops, we keep track of the depth of loops we are in, and return only when the depth becomes zero. If we move off the program boundaries while skipping, we throw an error.
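A hedged reconstruction of what these two functions could look like, built on the zipper sketch above:

skipRight :: Int -> CharZipper -> CharZipper
skipRight !depth program =
  let program' = czMoveRight program
   in case czFocus program' of
        Nothing  -> error "skipRight: no matching ]"
        Just '[' -> skipRight (depth + 1) program'
        Just ']'
          | depth == 1 -> czMoveRight program'   -- continue after the matching ]
          | otherwise  -> skipRight (depth - 1) program'
        Just _   -> skipRight depth program'

skipLeft :: Int -> CharZipper -> CharZipper
skipLeft !depth program =
  let program' = czMoveLeft program
   in case czFocus program' of
        Nothing  -> error "skipLeft: no matching ["
        Just ']' -> skipLeft (depth + 1) program'
        Just '['
          | depth == 1 -> czMoveRight program'   -- continue just after the matching [
          | otherwise  -> skipLeft (depth - 1) program'
        Just _   -> skipLeft depth program'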
That’s it! We now have a fully functioning BF interpreter. To test it, we use these two BF programs: hanoi.bf and mandelbrot.bf.
hanoi.bf solves the Tower of Hanoi puzzle, animating the solution process as ASCII art:
A freeze-frame from the animation of solving the Tower of Hanoi puzzle with hanoi.bf
mandelbrot.bf prints an ASCII art showing the Mandelbrot set:
Mandelbrot set ASCII art by mandelbrot.bf
Both of these BF programs serve as good benchmarks for BF interpreters. Let’s test ours by compiling and running it4:
❯ nix-shell -p "ghc.withPackages (pkgs: with pkgs; [vector memory])" \
--run "ghc --make bfi.hs -O2"
[1 of 2] Compiling Main ( bfi.hs, bfi.o )
[2 of 2] Linking bfi [Objects changed]
❯ time ./bfi -s hanoi.bf > /dev/null
29.15 real 29.01 user 0.13 sys
❯ time ./bfi -s mandelbrot.bf > /dev/null
94.86 real 94.11 user 0.50 sys
That seems quite slow. We can do better.
AST Interpreter
Instead of executing BF programs from their string representations, we can parse them to an Abstract Syntax Tree (AST). This allows us to match brackets only once at parse time, instead of doing it repeatedly at run time. We capture loops as AST nodes, allowing us to skip them trivially.
There is one constructor per BF instruction, except for loops where the Loop constructor captures both the start and end of loop instructions. We use immutable boxed vectors for lists of instructions instead of Haskell lists so that we can index into them in \(O(1)\).
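A hedged sketch of such an AST type (constructor names are guesses, not necessarily the original ones):

data Instruction
  = IncPtr    -- >
  | DecPtr    -- <
  | IncByte   -- +
  | DecByte   -- -
  | Output    -- .
  | Input     -- ,
  | Loop Instructions
  deriving (Show)

type Instructions = V.Vector Instruction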
We use the parser combinator library ReadP to write a recursive-descent parser for BF:
All cases except the loop one are straightforward. For loops, we call the parser recursively to parse the loop body. Note that the parser matches the loop brackets correctly. If the brackets don’t match, the parser fails.
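A hedged sketch of a ReadP parser matching that description, using the guessed constructor names from above:

parseInstructions :: P.ReadP Instructions
parseInstructions = V.fromList <$> P.many parseInstruction
  where
    parseInstruction =
      P.choice
        [ P.char '>' $> IncPtr
        , P.char '<' $> DecPtr
        , P.char '+' $> IncByte
        , P.char '-' $> DecByte
        , P.char '.' $> Output
        , P.char ',' $> Input
        , Loop <$> (P.char '[' *> parseInstructions <* P.char ']')
        ]

To run it, one would feed the program string to P.readP_to_S and keep the parse that consumes the whole input.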
The AST interpreter code is quite similar to the string interpreter one. This time we use an integer as the IP to index the Instructions vector. All cases except the loop one are pretty much the same as before.
For loops, we read the byte at the current memory index, and if it is zero, we skip executing the Loop AST node and go to the next instruction. Otherwise, we recursively interpret the loop body and go to the next instruction, taking care of passing the updated memory index returned from the recursive call to the execution of the next instruction.
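A hedged sketch that follows this description; it re-checks the loop condition at the Loop node rather than inside the recursive call, which is semantically equivalent:

interpretInstructions :: Memory -> Instructions -> MemIdx -> IO MemIdx
interpretInstructions memory instrs = go 0
  where
    go !ip !memIdx
      | ip >= V.length instrs = return memIdx
      | otherwise = case instrs V.! ip of
          IncPtr  -> go (ip + 1) (nextMemoryIndex memory memIdx)
          DecPtr  -> go (ip + 1) (prevMemoryIndex memory memIdx)
          IncByte -> modifyMemory memory (+ 1) memIdx >> go (ip + 1) memIdx
          DecByte -> modifyMemory memory (subtract 1) memIdx >> go (ip + 1) memIdx
          Output  -> readMemory memory memIdx >>= putChar . chr . fromIntegral >> go (ip + 1) memIdx
          Input   -> getChar >>= writeMemory memory memIdx . fromIntegral . ord >> go (ip + 1) memIdx
          Loop body -> readMemory memory memIdx >>= \case
            0 -> go (ip + 1) memIdx                                  -- skip the loop
            _ -> interpretInstructions memory body memIdx >>= go ip  -- run the body, then re-check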
And we are done. Let’s see how it performs:
❯ time ./bfi -a hanoi.bf > /dev/null
14.94 real 14.88 user 0.05 sys
❯ time ./bfi -a mandelbrot.bf > /dev/null
36.49 real 36.32 user 0.17 sys
Great! hanoi.bf runs 2x faster, whereas mandelbrot.bf runs 2.6x faster. Can we do even better?
Bytecode Interpreter
AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST, it jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode. That’s what our next interpreter uses.
We reuse the parser from the AST interpreter, but then we convert the resultant AST into bytecode by translating and assembling it5. We use the Bytes byte array data type from the memory package to represent bytecode.
Unlike AST, bytecode has a flat list of instructions—called Opcodes—that can be encoded in a single byte each, with optional parameters. Because of its flat nature and compactness, bytecode is more CPU friendly to execute, which is where it gets its performance from. The downside is that bytecode is not human readable unlike AST.
The assembleOpcode function assembles an Opcode to a list of bytes (Word8s). For all cases except for OpLoop, we simply return a unique byte for the opcode.
For OpLoop, we first recursively assemble the loop body. We encode both the body and the body length in the assembled bytecode, so that the bytecode interpreter can use the body length to skip over the loop body when required. We use two bytes to encode the body length, so we first check if the body length plus three is over 65536 (\(= 2^8*2^8\)). If so, we throw an error. Otherwise, we return:
a unique byte for loop start (6),
followed by the body length encoded in two bytes (in the Little-endian order),
then the assembled loop body,
followed by a unique byte for loop end (7),
finally followed by the encoded body length again.
We encode the body length at the end again so that we can use it to jump backward to the start of the loop, to continue looping. Let’s look at this example to understand the loop encoding better:
In Haskell, the pointer type Ptr is parametrized by the type of the data it points to. We have two types of pointers here, one that points to the bytecode program, and another that points to the memory cells. So in this case, the IP and DP are actually pointers.
The go function here is again the core of the interpreter loop. We track the current IP and DP in it, and execute the logic corresponding to the opcode at the current memory location. go ends when the IP points to the end of the program byte array.
Most of the cases in go are similar to previous interpreters. Only difference is that we use pointers to read the current opcode and memory cell. For the loop start opcode, we read the byte pointed to by the DP, and if it is zero, we read the next two bytes from the program bytecode, and use it as the offset to jump the IP by to skip over the loop body. Otherwise, we jump the IP by 3 bytes to skip over the loop start opcode and encoded loop body length bytes. For the loop end opcode, we follow similar steps, except we jump backward to the start of the loop.
The helper functions for doing pointer arithmetic are following:
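A hedged sketch of what wrapping pointer arithmetic could look like (the names and the start-pointer/size arguments are assumptions, not the post's exact API):

nextMemPtr :: Ptr Int8 -> Int -> Ptr Int8 -> Ptr Int8
nextMemPtr start memSize ptr =
  let i = (ptr `minusPtr` start + 1) `mod` memSize
   in start `plusPtr` i

prevMemPtr :: Ptr Int8 -> Int -> Ptr Int8 -> Ptr Int8
prevMemPtr start memSize ptr =
  let i = (ptr `minusPtr` start - 1) `mod` memSize
   in start `plusPtr` i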
nextMemPtr and prevMemPtr implement wrapping of pointers as we do for memory indices in nextMemoryIndex and prevMemoryIndex. Let’s see what the results of our hard work are:
❯ time ./bfi -b hanoi.bf > /dev/null
11.10 real 11.04 user 0.04 sys
❯ time ./bfi -b mandelbrot.bf > /dev/null
15.72 real 15.68 user 0.04 sys
1.3x and 2.3x speedups for hanoi.bf and mandelbrot.bf respectively over the AST interpreter. Not bad. But surely we can do even better?
Optimizing Bytecode Interpreter
We can optimize our bytecode interpreter by emitting specialized opcodes for particular patterns of opcodes that occur frequently. Think of it as replacing every occurrence of a long phrase in a text with a single word that means the same, leading to a shorter text and faster reading time. Since BF is so verbose, there are many opportunities for optimizing BF bytecode7. We are going to implement only one simple optimization, just to get a taste of how to do it.
The optimizing bytecode interpreter is pretty much same as the bytecode interpreter, with the optimize function called between the translation and assembly phases.
The pattern of opcode we are optimizing for is [-] and [+]. Both of these BF opcodes when executed, decrement or increment the current memory cell till it becomes zero. In effect, these patterns clear the current cell. We start the process by adding a new Opcode for clearing a cell:
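A hedged sketch of the idea (the Opcode constructors here are guesses, not the post's actual names):

data Opcode
  = OpMoveR | OpMoveL | OpInc | OpDec | OpOutput | OpInput
  | OpLoop [Opcode]
  | OpClear                      -- the new opcode: set the current cell to 0
  deriving (Show)

optimize :: [Opcode] -> [Opcode]
optimize = map $ \case
  OpLoop [OpDec] -> OpClear      -- [-]
  OpLoop [OpInc] -> OpClear      -- [+]
  OpLoop body    -> OpLoop (optimize body)
  op             -> op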
We can see how the patterns [-] and [+] that may execute operations tens, maybe hundreds, of times, are replaced by a single operation in the interpreter now. This is what gives us the speedup in this case. Let’s run it:
❯ time ./bfi -o hanoi.bf > /dev/null
4.07 real 4.04 user 0.01 sys
❯ time ./bfi -o mandelbrot.bf > /dev/null
15.58 real 15.53 user 0.04 sys
hanoi.bf runs 2.7x faster, whereas mandelbrot.bf is barely 1% faster as compared to the non-optimizing bytecode interpreter. This demonstrates how different optimizations apply to different programs, and hence the need to implement a wide variety of them to be able to optimize all programs well.
Comparison
It’s time for a final comparison of the run times of the four interpreters:
Interpreter           Hanoi     Mandelbrot
String                29.15s    94.86s
AST                   14.94s    36.49s
Bytecode              11.10s    15.72s
Optimizing Bytecode    4.07s    15.58s
The final interpreter is 7x faster than the baseline one for hanoi.bf, and 6x faster for mandelbrot.bf. Here’s the same data as a chart:
Run time of the four interpreters
That’s it for this post. I hope you enjoyed it and took something away from it. In a future post, we’ll explore more optimization for our BF interpreter. The full code for this post is available here.
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
BF is Turing-complete. That means it can be used to implement any computable program. However, it is a Turing tarpit, which means it is not feasible to write any useful programs in it because of its lack of abstractions.↩︎
A string interpreter also serves as a useful baseline for measuring the performance of BF interpreters. That’s why I decided to use strings instead of Data.Text or Data.Sequence, which are more performant.↩︎
I am a big fan of zippers, as evidenced by this growing list of posts that I use them in.↩︎
We use Nix for getting the dependency libraries.↩︎
If you are unfamiliar, >>> is the left-to-right function composition function:
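Specialised to plain functions, Control.Arrow's (>>>) is just flipped composition:

(>>>) :: (a -> b) -> (b -> c) -> (a -> c)
f >>> g = g . f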
While the only way to access byte arrays is pointers, we could have continued accessing the memory vector using indices. I benchmarked both methods, and found that using pointers for memory access sped up the execution of hanoi.bf by 1.1x and mandelbrot.bf by 1.6x as compared to index-based access. It’s also nice to learn how to use pointers in Haskell. This is why we chose to use Storable vectors for the memory.↩︎
See BFC, which touts itself as “an industrial-grade Brainfuck compiler”, with a huge list of optimizations.↩︎
I’ve always considered lenses to be a bit uncomfortable. While they’re occasionally useful for doing deeply nested record updates, they often seem to be more trouble than they’re worth. There’s a temptation in the novice programmer, to ^.. and folded their way to a solution that is much more naturally written merely as toList. And don’t get me started about the stateful operators like <<+= and their friends. Many programs which can be more naturally written functionally accidentally end up being imperative due to somebody finding a weird lens combinator and trying to use it in anger. Much like a serious drug collection, the tendency is to push it as far as you can.
Thus, my response has usually been one of pushback and moderation. I don’t avoid lenses at all costs, but I do try to limit myself to the prime types (Lens', Prism', Iso'), and to the boring combinators (view, set, over). I feel like these give me most of the benefits of lenses, without sending me tumbling down the rabbit hole.
All of this is to say that my grokkage of lenses has always been one of generalized injections and projections, for a rather shallow definition of “generalized”. That is, I’ve grown accustomed to thinking about lenses as getter/setter pairs for data structures—eg, I’ve got a big product type and I want to pull a smaller piece out of it, or modify a smaller piece in a larger structure. I think about prisms as the dual structure over coproducts—“generalized” injecting and pattern matching.
And this is all true; but I’ve been missing the forest for the trees on this one. That’s not to say that I want to write lensier code, but that I should be taking the “generalized” part much more seriously.
The big theme of my intellectual development over the last few years has been thinking about abstractions as shared vocabularies. Monoids are not inherently interesting; they’re interesting because of how they let you quotient seemingly-unrelated problems by their monoidal structure. Applicatives are cool because once you’ve grokked them, you begin to see them everywhere. Anywhere you’ve got conceptually-parallel, data-independent computations, you’ve got an applicative lurking somewhere under the surface (even if it happens to be merely the Identity applicative.)
I’ve had a similar insight about lenses, and that’s what I wanted to write about today.
The Context
At work, I’ve been thinking a lot about compilers and memory layout lately. I won’t get into the specifics of why, but we can come up with an inspired example. Imagine we’d like to use Haskell to write a little eDSL that we will use to generate x86 machine code.
The trick of course, is that we’re writing Haskell in order to not write machine code. So the goal is to design high-level combinators in Haskell that express our intent, while simultaneously generating machine code that faithfully implements the intention.
One particularly desirable feature about eDSLs is that they allow us to reuse Haskell’s type system. Thus, imagine we have some type:
type Code :: Type -> Type
data Code a = Code { getMachineCode :: [X86OpCode] }
Notice that the a parameter here is entirely phantom; it serves only to annotate the type of the value produced by executing getMachineCode. For today’s purpose, we’ll ignore all the details about calling conventions and register layout and what not; let’s just assume a Code a corresponds to a computation that leaves a value (or pointer) to something of type a in a well-known place, whether that be the top of the stack, or eax or something. It doesn’t matter!
Since the type parameter to Code is phantom, we need to think about what role it should have. Keeping it at phantom would be disastrous, since this type isn’t used by Haskell, but it is certainly used to ensure our program is correct. Similarly, representational seems wrong, since coerce is meaningful only when thinking about Haskell; which this thing decidedly is not. Thus, our only other option is:
type role Code nominal
Frustratingly, due to very similar reasoning, Code cannot be a functor, because there’s no way1 to lift an arbitrary Haskell function a -> b into a corresponding function Code a -> Code b. If there were, we’d be in the clear! But alas, we are not.
The Problem
All of the above is to say that we are reusing Haskell’s type system, but not its values. An expression of type Code Bool has absolutely no relation to the values True or False—except that we could write, by hand, a function litBool :: Bool -> Code Bool which happened to do the right thing.
It is tempting, however, to make new Haskell types in order to help constrain the assembly code we end up writing. For example, maybe we want to write a DSP for efficiently decoding audio. We can use Haskell’s types to organize our thoughts and prevent ourselves from making any stupid mistakes:
data Decoder = Decoder
  { format  :: Format
  , seekPos :: Int
  , state   :: ParserState
  }

data Chunk = ...

createDecoder :: Code MediaHandle -> Code Decoder
decodeChunk :: Code Decoder -> (Code Decoder, Code Chunk)
We now have a nice interface in our eDSL to guide end-users along the blessed path of signal decoding. We have documented what we are trying to do, and how it can be used once it’s implemented. But due to our phantom, yet nominal, parameter to Code, this is all just make believe. There is absolutely no correlation between what we’ve written down and how we can use it. The problem arises when we go to implement decodeChunk. We’ll need to know what state we’re in, which means we’ll need some function:
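The signature meant here is presumably something of this shape (the name is a guess):

getState :: Code Decoder -> Code ParserState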
In a world where Code is a functor, this is implemented trivially as fmap state. But Code is not a functor! Alas! Woe! What ever can we do?
The Solution
Lenses, my guy!
Recall that Code is phantom in its argument, even if we use roles to restrict that fact. This means we can implement a safe-ish version of unsafeCoerce, that only fiddles with the parameter of our phantom type:
unsafeCoerceCode :: Code a -> Code b
unsafeCoerceCode (Code ops) = Code ops
Judicious use of unsafeCoerceCode allows us to switch between a value’s type and its in-memory representation. For example, given a type:
type Bytes :: Nat -> Type
data Bytes n
we can reinterpret a Decoder as a sequence of bytes:
decoderRep :: Iso' (Code Decoder) (Code (Bytes (32 + 4 + 1)))
decoderRep = iso unsafeCoerceCode unsafeCoerceCode

stateRep :: Iso' (Code ParserState) (Code (Bytes 1))
stateRep = iso unsafeCoerceCode unsafeCoerceCode
which says we are considering our Decoder to be laid out in memory like:
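Judging by the 32 + 4 + 1 in decoderRep and the field order of Decoder, the intended layout is presumably format in bytes 0..31, seekPos in bytes 32..35, and state in byte 36.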
Of course, this is a completely unsafe transformation, as far as the Haskell type system is aware. We’re in the wild west out here, well past any type theoretical life buoys. We’d better be right that this coercion is sound. But assuming this is in fact the in-memory representation of a Decoder, we are well justified in this transformation.
Notice the phrasing of our Iso' above. It is not an iso between Decoder and Bytes 37, but between Codes of such things. This witnesses the fact that it is not true in the Haskell embedding, merely in our Code domain. Of course, isos are like the least exciting optics, so let’s see what other neat things we can do.
Imagine we have some primitives:
slice
    :: n <= m
    => Int      -- ^ offset
    -> Proxy n  -- ^ size
    -> Code (Bytes m)
    -> Code (Bytes n)

overwrite
    :: n <= m
    => Int      -- ^ offset
    -> Bytes n
    -> Bytes m
    -> Bytes m
which we can envision as Haskell bindings to the pseudo-C functions:
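Judging by the next paragraph, the lens built from these primitives has a shape like this (the name is hypothetical), with the getter coming from slice and the setter from overwrite:

stateOf :: Lens' (Code Decoder) (Code ParserState)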
Such a lens acts exactly as a record selector would, in that it allows us to view, set, and over a ParserState inside of a Decoder. But recall that Code is just a list of instructions we eventually want the machine to run. We’re using the shared vocabulary of lenses to emit machine code! What looks like using a data structure to us when viewed through the Haskell perspective, is instead invoking an assembler.
Reflections
Once the idea sinks in, you’ll start seeing all sorts of cool things you can do with optics to generate code. Prisms generalize running initializer code. A Traversal over Code can be implemented as a loop. And since all the sizes are known statically, if you’re feeling plucky, you can decide to unroll the loop right there in the lens.
Outside of the context of Code, the realization that optics are this general is still doing my head in. Something I love about working in Haskell is that I’m still regularly having my mind blown, even after a decade.
Humans want the resources of other humans. I want the food that the supermarket owns so that I can eat it. Before buying it, I wanted the house that I now own. And before that, someone wanted to build a house on that plot of land, which was owned by someone else first. Most of the activities we engage in during our lifetime revolve around extracting something from someone else.
There are two basic modalities to getting the resources of someone else. The first, the simplest, and the one that has dominated the majority of human history, is force. Conquer people, kill them, beat them up and take their stuff, force them into slavery and make them do your work. It’s a somewhat effective strategy. This can also be more subtle, by using coercive and fraudulent methods to trick people into giving you their resources. Let’s call this modality the looter approach.
The second is trade. In the world of trade, I can only extract resources from someone else when they willingly give them to me in exchange for something else of value. This can be barter of value for value, payment in money, built-up goodwill, favors, charity (exchanging resources for the benefit you receive for helping someone else), and more. In order to participate in this modality, you need to create your own valuable resources that other people want to trade for. Let’s call this the producer approach.
The producer approach is better for society in every conceivable way. The looter approach causes unnecessary destruction, pushes production into ventures that don’t directly help anyone (like making more weapons), and rewards people for their ability to inflict harm. By contrast, the producer approach rewards the ability to meet the needs of others and causes resources to end up in the hands of those who value them the most.
Looter philosophy is rooted in the concept of the zero sum game, the mistaken belief that I can only have more if someone else has less. By contrast, the producer philosophy correctly identifies the fact that we can all end up better by producing more goods in more efficient ways. We live in our modern world of relatively widespread luxury because producers have made technological leaps—for their own self-serving motives—that have improved everyone’s ability to produce more goods going forward. Think of the steam engine, electricity, computing power, and more.
A producer-only world
It would be wonderful to live in a world in which there are no looters. We all produce, we all trade, everyone receives more value than they give, and there is no wasted energy or destruction from the use of force.
Think about how wonderful it could be! We wouldn’t need militaries, allowing a massive amount of productive capacity to be channeled into things that make everyone’s lives better. We wouldn’t need police. Not only would that free up more resources, but would remove the threat of improper use of force by the state against citizens. The list goes on and on.
I believe many economists—especially Austrian economists—are cheering for that world. I agree with them on the cheering. It’s why things like Donald Trump’s plans for tariffs are so horrific in their eyes. Tariffs introduce an artificial barrier between nations, impeding trade, preventing the peaceful transfer of resources, and leading to a greater likelihood of armed conflict.
There’s only one problem with this vision, and it’s also based in economics: game theory.
Game theory and looters
Imagine I’m a farmer. I’m a great farmer, I have a large plot of land, I run my operations efficiently, and I produce huge amounts of food. I sell that food into the marketplace, and with that money I’m able to afford great resources from other people, who willingly trade them to me because they value the money more than their own resources. For example, how many T-shirts does the clothing manufacturer need? Instead of his 1,000th T-shirt, he’d rather sell it for $5 and buy some food.
While I’m really great as a farmer, I’m not very good as a fighter. I have no weapons training, I keep no weapons on my property, and I dislike violence.
And finally, there’s a strong, skilled, unethical person down the street. He could get a job with me on the farm. For back-breaking work 8 hours a day, I’ll pay him 5% of my harvest. Or, by contrast, he could act like the mafia, demand a “protection fee” of 20%, and either beat me up, beat up my family, or cause harm to my property, if I don’t pay it.
In other words, he could be a producer and get 5% in exchange for hard work, or be a looter and get 20% in exchange for easy (and, likely for him, fun) work. As described, the game theoretic choice is clear.
So how do we stop a producer world from devolving back into a looter world?
Deterrence
There’s only one mechanism I’m aware of for this, and it’s deterrence. As the farmer, I made a mistake. I should get weapons training. I should keep weapons on my farm. I should be ready to defend myself and my property. Because if I don’t, game theory ultimately predicts that all trade will collapse, and society as we know it will crumble.
I don’t necessarily have to have the power of deterrence myself. I could hire a private security company, once again allowing the producer world to work out well. I trade something of lesser value (some money) for something I value more (the protection afforded by private security). If I’m lucky, that security company will never need to do anything, because the mere threat of their presence is sufficient.
And in modern society, we generally hope to rely on the government police force to provide this protection.
There are easy ways to defeat the ability of deterrence to protect our way of life. The simplest is to defang it. Decriminalize violent and destructive acts, for example. Remove the consequences for bad, looter behavior, and you will incentivize looting. This is far from a theoretical discussion. We’ve seen the clear outcome in California, which has decriminalized theft under $950, resulting—in a completely predictable way—in more theft, stores closing, and an overall erosion of producer philosophy.
And in California, this is even worse. Those who try to be their own deterrence, by arming themselves and protecting their rights, are often the targets of government force instead of the looters.
I’m guessing this phrasing has now split my reading audience into three groups. Group A agrees wholly with what I’m saying. Group B believes what I’ve just written is pure evil and garbage. Group C initially disagreed with my statements, but has an open mind and is willing to consider a different paradigm. The next section is targeted at groups A and C. Group B: good luck with the broken world you’re advocating.
Global scale
This concept of deterrence applies at a global scale too. I would love to live in a world where all nations exchange value for value and never use force against others. In fact, I believe the ultimate vision for this kind of a world ends with anarcho-capitalism (though I don’t know enough about the topic to be certain). There ends up being no need for any force against anyone else. It’s a beautiful vision for a unified world, where there are no borders, there is no destruction, there is only unity through trade. I love it.
But game theory destroys this too. If the entire world disarmed, it would take just one person who thinks he can do better through looter tactics to destroy the system. The only way to defeat that is to have a realistic threat of force to disincentivize someone from acting like a looter.
And this is the paradox. In order to live in our wonderful world of production, prosperity, health, and happiness, we always need to have our finger near enough to the trigger to respond to looters with force. I know of no other approach that allows production to happen. (And I am very interested in other theoretical solutions to this problem, if anyone wants to share reading material.)
Peace through strength
This line of thinking leads to the concept of peace through strength. When those tempted to use violence see the overwhelming strength of their potential victims, they will be disincentivized to engage in violent behavior. It’s the story of the guy who wants to rob my farm. Or the roaming army in the ancient world that bypassed the well fortified walled city and attacked its unprotected neighbor.
There are critics of this philosophy. As put by Andrew Bacevich, "'Peace through strength' easily enough becomes 'peace through war.'" I don’t disagree at all with that analysis, and it’s something we must remain vigilant against. But disarming is not the answer, as it will, of course, necessarily lead to the victory of those willing to use violence on others.
In other words, my thesis here is that the threat of violence must be present to keep society civilized. But the cost of using that violence must be high enough that neither side is incentivized to initiate it.
Israel
I’d been thinking of writing a blog post on this topic for a few months now, but finally decided to today. Israel just agreed to a hostage deal with Hamas. In exchange for the release of 33 hostages taken in the October 7 massacre, Israel will hand over 1,000 terrorists in Israeli prisons.
I have all the sympathy in the world for the hostages and their families. I also have great sympathy for the Palestinian civilians who have been harmed, killed, displaced, and worse by this war. And I have empathy (as one of the victims) for all of the Israeli citizens who have lived under threat of rocket attacks, had our lives disrupted, and for those who have been killed by this war. War is hell, full stop.
My message here is to those who have been pushing the lie of “peace through negotiations.” Or peace through capitulation. Or anything else. These tactics are the reason the war has continued. As long as the incentive structure makes initiating a war a positive, wars will continue to be initiated. Hamas has made its stance on the matter clear: it has sworn for the eradication of all Jews within the region, and considers civilian casualties on the Palestinian side not only acceptable, but advantageous.
I know that many people who criticize Israel and put pressure on us to stop the war in Gaza believe they are doing so for noble reasons. (For the record, I also believe many people have less altruistic reasons for their stance.) I know people like to point to the list of atrocities they believe Israel has committed. And, by contrast, the pro-Israel side is happy to respond with corresponding atrocities from the other side.
I honestly believe this is all far beyond irrelevant. The only question people should be asking is: how do we disincentivize the continuation of hostilities? And hostage deals that result in the release of terrorists, allow “aid” to come in (which, if history is any indication, will be used to further the construction of tunnels and other sources for attack on Israel), and give Hamas an opportunity to rearm, only incentivize the continuation of the war.
In other words, if you care about the innocent people on either side, you should be opposed to this kind of capitulation. Whatever you think about the morality of each side, more people will suffer with this approach.
Skin in the game
It’s easy to say things like that when your life isn’t on the line. I also don’t think that matters much. Either the philosophical, political, and economic analysis is correct, or it isn’t. Nonetheless, I do have skin in the game here. I still live in a warzone. I am less than 15 kilometers from the Lebanese border. We’ve had Hezbollah tunnels reaching into our surrounding cities. My family had to lock ourselves inside when Hezbollah paratroopers attempted to land in our city.
My wife (Miriam) and I have discussed this situation at length, many times, over the course of this war. If I’m ever taken hostage, I hope the Israeli government bombs the hell out of wherever I am being held. I say this not only because I believe it is the right, just, moral, ethical, and strategically correct thing to do. I say this because I am selfish:
I would rather die than be tortured by our enemies.
I would rather die than be leveraged to make my family and country less safe.
I would rather die than live the rest of my life a shell of my former self, haunted not only by the likely torture inflicted on me, but by the guilt of the harm to others resulting from my spared life.
I don’t know why this hostage deal went through now. I don’t know what pressures have been brought to bear on the leaders in Israel. I don’t know if they are good people trying to protect their citizens, nefarious power hungry cretins looking to abuse both the Israeli and Palestinian populace to stay in control, weak-willed toadies who do what they’re told by others, or simply stupid. But my own stance is clear.
But what about the Palestinians?
I said it above, and I’ll say it again: I truly do feel horrible for the trauma that the Palestinian people are going through. Not for the active terrorists, mind you; I feel no qualms about those raising arms against us being destroyed. But everyone else, even those who wish me and my fellow Israelis harm. (And, if polling is to be believed, that’s the majority of Palestinians.) I would much rather that they not be suffering now, and that eventually, through earned trust on both sides, everyone’s lot improves.
But the framework being imposed by those who “love” peace isn’t allowing that to happen. Trust cannot be built when there’s a greater incentive to return to the use of force. I was strongly opposed to the 2005 disengagement from Gaza. But once it happened, it could have been one of those trust-building starting points. Instead, I saw many people justify further violence by Hamas—such as non-stop rocket attacks on the south of Israel—because Israel hadn’t done enough yet.
Notice how fundamentally flawed this mentality is, just from an incentives standpoint! Israel gives up control of land, something against its own overall interests and something desired by Palestinians, and is punished for it with increased violence against citizens. Hamas engaged in a brutal destruction of all of its opponents within the Palestinian population, launched attacks on Israel, and when Israel did respond with force, Israel was blamed for having not done enough to appease Hamas.
I know people will want to complicate this story by bringing up the laundry list of past atrocities, of assigning negative motivations to Israel and its leaders, and a million other evasions that are used to avoid actually solving this conflict. Instead, I beg everyone to just use basic logic.
The violence will continue as long as the violence gets results.
My blog posts and reading material have both had a decidedly economics-heavy slant recently. The topic today, incentives, squarely falls into the category of economics. However, when I say economics, I’m not talking about “analyzing supply and demand curves.” I’m talking about the true basis of economics: understanding how human beings make decisions in a world of scarcity.
A fair definition of incentive is “a reward or punishment that motivates behavior to achieve a desired outcome.” When most people think about economic incentives, they’re thinking of money. If I offer my son $5 if he washes the dishes, I’m incentivizing certain behavior. We can’t guarantee that he’ll do what I want him to do, but we can agree that the incentive structure itself will guide and ultimately determine what outcome will occur.
The great thing about monetary incentives is how easy they are to talk about and compare. “Would I rather make $5 washing the dishes or $10 cleaning the gutters?” But much of the world is incentivized in non-monetary ways too. For example, using the “punishment” half of the definition above, I might threaten my son with losing Nintendo Switch access if he doesn’t wash the dishes. No money is involved, but I’m still incentivizing behavior.
And there are plenty of incentives beyond our direct control! My son is also incentivized to not wash dishes because it’s boring, or because he has some friends over that he wants to hang out with, or dozens of other things. Ultimately, the conflicting array of incentive structures placed on him will determine what actions he chooses to take.
Why incentives matter
A phrase I see often in discussions—whether they are political, parenting, economic, or business—is “if they could just do…” Each time I see that phrase, I cringe a bit internally. Usually, the underlying assumption of the statement is “if people would behave contrary to their incentivized behavior then things would be better.” For example:
If my kids would just go to bed when I tell them, they wouldn’t be so cranky in the morning.
If people would just use the recycling bin, we wouldn’t have such a landfill problem.
If people would just stop being lazy, our team would deliver our project on time.
In all these cases, the speakers are seemingly flummoxed as to why the people in question don’t behave more rationally. The problem is: each group is behaving perfectly rationally.
The kids have a high time preference, and care more about the joy of staying up now than the crankiness in the morning. Plus, they don’t really suffer the consequences of morning crankiness, their parents do.
No individual suffers much from their individual contribution to a landfill. If they stopped growing the size of the landfill, it would make an insignificant difference versus the amount of effort they need to engage in to properly recycle.
If a team doesn’t properly account for the productivity of individuals on a project, each individual receives less harm from their own inaction. Sure, the project may be delayed, company revenue may be down, and they may even risk losing their job when the company goes out of business. But their laziness individually won’t determine the entirety of that outcome. By contrast, they greatly benefit from being lazy by getting to relax at work, go on social media, read a book, or do whatever else they do when they’re supposed to be working.
My point here is that, as long as you ignore the reality of how incentives drive human behavior, you’ll fail at getting the outcomes you want.
If everything I wrote up until now made perfect sense, you understand the premise of this blog post. The rest of it will focus on a bunch of real-world examples to hammer home the point, and demonstrate how versatile this mental model is.
Running a company
Let’s say I run my own company, with myself as the only employee. My personal revenue will be 100% determined by my own actions. If I decide to take Tuesday afternoon off and go fishing, I’ve chosen to lose that afternoon’s revenue. Implicitly, I’ve decided that the enjoyment I get from an afternoon of fishing is greater than the potential revenue. You may think I’m being lazy, but it’s my decision to make. In this situation, the incentive (money) is perfectly aligned with my actions.
Compare this to a typical company/employee relationship. I might have a bank of Paid Time Off (PTO) days, in which case once again my incentives are relatively aligned. I know that I can take off 15 days throughout the year, and I’ve chosen to use half a day for the fishing trip. All is still good.
What about unlimited time off? Suddenly incentives are starting to misalign. I don’t directly pay a price for not showing up to work on Tuesday. Or Wednesday as well, for that matter. I might ultimately be fired for not doing my job, but that will take longer to work its way through the system than simply not making any money for the day taken off.
Compensation overall falls into this misaligned incentive structure. Let’s forget about taking time off. Instead, I work full time on a software project I’m assigned. But instead of using the normal toolchain we’re all used to at work, I play around with a new programming language. I get the fun and joy of playing with new technology, and potentially get to pad my resume a bit when I’m ready to look for a new job. But my current company gets slower results, less productivity, and is forced to subsidize my extracurricular learning.
When a CEO has a bonus structure based on profitability, he’ll do everything he can to make the company profitable. This might include things that actually benefit the company, like improving product quality, reducing internal red tape, or finding cheaper vendors. But it might also include destructive practices, like slashing the R&D budget to show massive profits this year, in exchange for a catastrophe next year when the next version of the product fails to ship.
Or my favorite example. My parents owned a business when I was growing up. They had a back office where they ran operations like accounting. All of the furniture was old couches from our house. After all, any money they spent on furniture came right out of their paychecks! But in a large corporate environment, each department is generally given a budget for office furniture, a budget which doesn’t roll over year-to-year. The result? Executives make sure to spend the entire budget each year, often buying furniture far more expensive than they would choose if it was their own money.
There are plenty of details you can quibble with above. It’s in a company’s best interest to give people downtime so that they can come back recharged. Having good ergonomic furniture can in fact increase productivity in excess of the money spent on it. But overall, the picture is pretty clear: in large corporate structures, you’re guaranteed to have mismatches between the company’s goals and the incentive structure placed on individuals.
Using our model from above, we can lament how lazy, greedy, and unethical the employees are for doing what they’re incentivized to do instead of what’s right. But that’s simply ignoring the reality of human nature.
Moral hazard
Moral hazard is a situation where one party is incentivized to take on more risk because another party will bear the consequences. Suppose I tell my son when he turns 21 (or whatever legal gambling age is) that I’ll cover all his losses for a day at the casino, but he gets to keep all the winnings.
What do you think he’s going to do? The most logical course of action is to place the largest possible bets for as long as possible, asking me to cover each time he loses, and taking money off the table and into his bank account each time he wins.
But let’s look at a slightly more nuanced example. I go to a bathroom in the mall. As I’m leaving, I wash my hands. It will take me an extra 1 second to turn off the water when I’m done washing. That’s a trivial price to pay. If I don’t turn off the water, the mall will have to pay for many liters of wasted water, benefiting no one. But I won’t suffer any consequences at all.
This is also a moral hazard, but most people will still turn off the water. Why? Usually due to some combination of other reasons such as:
We’re so habituated to turning off the water that we don’t even consider not turning it off. Put differently, the mental effort needed to not turn off the water is more expensive than the 1 second of time to turn it off.
Many of us have been brought up with a deep guilt about wasting resources like water. We have an internal incentive structure that makes the 1 second to turn off the water much less costly than the mental anguish of the waste we created.
We’re afraid we’ll be caught by someone else and face some kind of social repercussions. (Or maybe more than social. Are you sure there isn’t a law against leaving the water tap on?)
Even with all that in place, you may notice that many public bathrooms use automatic water dispensers. Sure, there’s a sanitation reason for that, but it’s also to avoid this moral hazard.
A common denominator in both of these is that the person taking the action that causes the liability (either the gambling or leaving the water on) is not the person who bears the responsibility for that liability (the father or the mall owner). Generally speaking, the closer together the person making the decision and the person incurring the liability are, the smaller the moral hazard.
It’s easy to demonstrate that by extending the casino example a bit. I said it was the father who was covering the losses of the gambler. Many children (though not all) would want to avoid totally bankrupting their parents, or at least financially hurting them. Instead, imagine that someone from the IRS shows up at your door, hands you a credit card, and tells you you can use it at a casino all day, taking home all the chips you want. The money is coming from the government. How many people would put any restriction on how much they spend?
And since we’re talking about the government already…
Government moral hazards
As I was preparing to write this blog post, the California wildfires hit. The discussions around those wildfires gave a huge number of examples of moral hazards. I decided to cherry-pick a few for this post.
The first and most obvious one: California is asking for disaster relief funds from the federal government. That sounds wonderful. These fires were a natural disaster, so why shouldn’t the federal government pitch in and help take care of people?
The problem is, once again, a moral hazard. In the case of the wildfires, California and Los Angeles both had ample actions they could have taken to mitigate the destruction of this fire: better forest management, larger fire department, keeping the water reservoirs filled, and probably much more that hasn’t come to light yet.
If the federal government bails out California, it will be a clear message for the future: your mistakes will be fixed by others. You know what kind of behavior that incentivizes? More risky behavior! Why spend state funds on forest management and extra firefighters—activities that don’t win politicians a lot of votes in general—when you could instead spend it on a football stadium, higher unemployment payments, or anything else, and then let the feds cover the cost of screw-ups.
You may notice that this is virtually identical to the 2008 “too big to fail” bail-outs. Wall Street took insanely risky behavior, reaped huge profits for years, and when they eventually got caught with their pants down, the rest of us bailed them out. “Privatizing profits, socializing losses.”
And here’s the absolute best part of this: I can’t even truly blame either California or Wall Street. (I mean, I do blame them, I think their behavior is reprehensible, but you’ll see what I mean.) In a world where the rules of the game implicitly include the bail-out mentality, you would be harming your citizens/shareholders/investors if you didn’t engage in that risky behavior. Since everyone is on the hook for those socialized losses, your best bet is to maximize those privatized profits.
There’s a lot more to government and moral hazard, but I think these two cases demonstrate the crux pretty solidly. But let’s leave moral hazard behind for a bit and get to general incentivization discussions.
Non-monetary competition
At least 50% of the economics knowledge I have comes from the very first econ course I took in college. That professor was amazing, and had some very colorful stories. I can’t vouch for the veracity of the two I’m about to share, but they definitely drive the point home.
In the 1970s, the US had an oil shortage. To “fix” this problem, they instituted price caps on gasoline, which of course resulted in insufficient gasoline. To “fix” this problem, they instituted policies where, depending on your license plate number, you could only fill up gas on certain days of the week. (Irrelevant detail for our point here, but this just resulted in people filling up their tanks more often, no reduction in gas usage.)
Anyway, my professor’s wife had a friend. My professor described in great detail how attractive this woman was. I’ll skip those details here since this is a PG-rated blog. In any event, she never had any trouble filling up her gas tank any day of the week. She would drive up, be told she couldn’t fill up gas today, bat her eyes at the attendant, explain how helpless she was, and was always allowed to fill up gas.
This is a demonstration of non-monetary compensation. Most of the time in a free market, capitalist economy, people are compensated through money. When price caps come into play, there’s a limit to how much monetary compensation someone can receive. And in that case, people find other ways of competing. Like this woman’s case: through using flirtatious behavior to compensate the gas station workers to let her cheat the rules.
The other example was much more insidious. Santa Monica had a problem: it was predominantly wealthy and white. They wanted to fix this problem, and decided to put in place rent controls. After some time, they discovered that Santa Monica had become wealthier and whiter, the exact opposite of their desired outcome. Why would that happen?
Someone investigated, and ended up interviewing a landlady that demonstrated the reason. She was an older white woman, and admittedly racist. Prior to the rent controls, she would list her apartments in the newspaper, and would be legally obligated to rent to anyone who could afford it. Once rent controls were in place, she took a different tack. She knew that she would only get a certain amount for the apartment, and that the demand for apartments was higher than the supply. That meant she could be picky.
She ended up finding tenants through friends-of-friends. Since it wasn’t an official advertisement, she wasn’t legally required to rent it out if someone could afford to pay. Instead, she got to interview people individually and then make them an offer. Normally, that would have resulted in receiving a lower rental price, but not under rent controls.
So who did she choose? A young, unmarried, wealthy, white woman. It made perfect sense. Women were less intimidating and more likely to maintain the apartment well. Wealthy people, she determined, would be better tenants. (I have no idea if this is true in practice or not, I’m not a landlord myself.) Unmarried, because no kids running around meant less damage to the property. And, of course, white. Because she was racist, and her incentive structure made her prefer whites.
You can deride her for being racist, I won’t disagree with you. But it’s simply the reality. Under the non-rent-control scenario, her profit motive for money outweighed her racism motive. But under rent control, the monetary competition was removed, and she was free to play into her racist tendencies without facing any negative consequences.
Bureaucracy
These were the two examples I remember from that course. But non-monetary compensation pops up in many more places. One highly pertinent example is bureaucracies. Imagine you have a government office, or a large corporation’s acquisition department, or the team that apportions grants at a university. In all these cases, you have a group of people making decisions about handing out money that has no monetary impact on them. If they give to the best qualified recipients, they receive no raises. If they spend the money recklessly on frivolous projects, they face no consequences.
Under such an incentivization scheme, there’s little to encourage the bureaucrats to make intelligent funding decisions. Instead, they’ll be incentivized to spend the money where they recognize non-monetary benefits. This is why it’s so common to hear about expensive meals, gift bags at conferences, and even more inappropriate ways of trying to curry favor with those that hold the purse strings.
Compare that ever so briefly with the purchases made by a small mom-and-pop store like my parents owned. Could my dad take a bribe to buy from a vendor who’s ripping him off? Absolutely he could! But he’d lose more on the deal than he’d make on the bribe, since he’s directly incentivized by the deal itself. It would make much more sense for him to go with the better vendor, save $5,000 on the deal, and then treat himself to a lavish $400 meal to celebrate.
Government incentivized behavior
This post is getting longer than I’d intended, so I’ll finish off with this section and make it a bit briefer. Beyond all the methods mentioned above, government has another mechanism for modifying behavior: directly changing incentives via legislation, regulation, and monetary policy. Let’s see some examples:
Artificial modification of interest rates encourages people to take on more debt than they would in a free capital market, leading to malinvestment and a consumer debt crisis, and causing the boom-bust cycle we all painfully experience.
Going along with that, giving tax breaks on interest payments further artificially incentivizes people to take on debt that they wouldn’t otherwise.
During COVID-19, at some points unemployment benefits were greater than minimum wage, incentivizing people to stay home rather than get a job, leading to reduced overall productivity in the economy and more printed dollars for benefits. In other words, it was a perfect recipe for inflation.
The tax code gives deductions to “help” people. That might be true, but the real impact is incentivizing people to make decisions they wouldn’t have otherwise. For example, giving out tax deductions on children encourages having more kids. Tax deductions on childcare and preschools incentivizes dual-income households. Whether or not you like the outcomes, it’s clear that it’s government that’s encouraging these outcomes to happen.
Tax incentives cause people to engage in behavior they wouldn’t otherwise (daycare+working mother, for example).
Inflation means that the value of your money goes down over time, which encourages people to spend more today, when their money has a larger impact. (Milton Friedman described this as high living.)
Conclusion
The idea here is simple, and fully encapsulated in the title: incentives determine outcomes. If you want to know how to get a certain outcome from others, incentivize them to want that to happen. If you want to understand why people act in seemingly irrational ways, check their incentives. If you’re confused why leaders (and especially politicians) seem to engage in destructive behavior, check their incentives.
We can bemoan these realities all we want, but they are realities. While there are some people who have a solid internal moral and ethical code, and that internal code incentivizes them to behave against their externally-incentivized interests, those people are rare. And frankly, those people are self-defeating. People should take advantage of the incentives around them. Because if they don’t, someone else will.
(If you want a literary example of that last comment, see the horse in Animal Farm.)
How do we improve the world under these conditions? Make sure the incentives align well with the overall goals of society. To me, it’s a simple formula:
Focus on free trade, value for value, as the basis of a society. In that system, people are always incentivized to provide value to other people.
Reduce the size of bureaucracies and large groups of all kinds. The larger an organization becomes, the farther the consequences of decisions are from those who make them.
And since the nature of human beings will be to try and create areas where they can control the incentive systems to their own benefits, make that as difficult as possible. That comes in the form of strict limits on government power, for example.
And even if you don’t want to buy in to this conclusion, I hope the rest of the content was educational, and maybe a bit entertaining!
At work a few weeks back, I found myself digging into profile reports, trying to determine why our program was running so slowly. Despite having the extremely obvious-in-retrospect data in front of me, I wasted a lot of time speeding up code that turned out to not move the needle at all.
Although perhaps it will be interesting only to future me, I thought it would be a good exercise to write up the experience—if only so I learn the lesson about how to read profiles and not make the same mistake again.
Some Context
I’m currently employed to work on a compiler. The performance has never been stellar, in that we were usually seeing about 5s to compile programs, even trivially small ones consisting of less than a hundred instructions. It was painful, but not that painful, since the test suite still finished in a minute or two. It was a good opportunity to get a coffee. I always assumed that the time penalties we were seeing were constant factors; perhaps it took a second or two to connect to Z3 or something like that.
But then we started unrolling loops, which turned trivially small programs into merely small programs, and our performance ballooned. Now we were looking at 45s for some of our tests! Uh oh! That’s no longer in the realm of constant factors, and it was clear that something was asymptotically wrong.
So I fired up GHC with the trusty old -prof flag, and ran the test suite in +RTS -p mode, which instruments the program with all sorts of profiling goodies. After a few minutes, the test suite completed, and left a test-suite.prof file lying around in the current directory. You can inspect such things by hand, but tools like profiteur make the experience much nicer.
Without further ado, here’s what our profile looked like:
Now we’re in business. I dutifully dug into toSSA, the transforms, and collectGarbage. I cached some things, used better data structures, stopped appending lists, you know, the usual Haskell tricks. My work was rewarded, in that I managed to shave 80% off the runtime of our program.
A few months later, we wrote a bigger program and fed it to the compiler. This one didn’t stop compiling. We left it overnight.
Uh oh. Turns out I hadn’t fixed the problem. I’d only papered over it.
Retrospective
So what went wrong here? Quite a lot, in fact! And worse, I had all of the information all along, but managed to misinterpret it at several steps of the process.
Unwinding the story stack, the most salient aspect of having not solved the problem was reducing the runtime by only 80%. Dramatic percentages feel like amazing improvements, but that’s because human brains are poorly designed for building software. In the real world, big percentages are fantastic. In software, they are linear improvements.
That is to say that a percentage-based improvement is \(O(n)\) faster in the best case. My efforts improved our runtime from 45s to 9s. Which feels great, but the real problem is that this program is measured in seconds at all.
It’s more informative to think in terms of orders of magnitude. Taking 45s on a ~3GHz processor is on the order of \(10^{11}\) instructions, while 9s is \(10^{10}\). How the hell is it taking us TEN BILLION instructions to compile a dinky little program? That’s the real problem. Improving things from one hundred billion down to ten billion is no longer very impressive at all.
To get a sense of the scale here, even if we spent 1M cycles (which feels conservatively expensive) for each instruction we wanted to compile, we should still be looking at < 0.1s. Somehow we are over 1000x worse than that.
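To spell out the arithmetic behind that claim (using the roughly hundred-instruction programs and the ~3GHz clock assumed above):

\[
45\,\mathrm{s} \times 3\times10^{9}\,\tfrac{\mathrm{cycles}}{\mathrm{s}} \approx 1.4\times10^{11} \text{ cycles spent}, \qquad 100 \times 10^{6}\,\mathrm{cycles} = 10^{8}\,\mathrm{cycles} \approx 0.03\,\mathrm{s},
\]

a ratio of well over \(1000\times\).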
So that’s one mistake I made: being impressed by extremely marginal improvements. Bad Sandy.
The other mistake came from my interpretation of the profile. As a quick pop quiz, scroll back up to the profile and see if you can spot where the problem is.
After expanding a few obviously-not-the-problem cost centers that each accounted for 100% of the runtime, I turned my brain off and opened all of the 100% nodes. But in doing so, I accidentally breezed past the real problem. The real problem is either that compileProgram takes 100% of the time of the test, or that transformSSA takes 100% of compiling the program. Why’s that? Because unlike main and co, test does more work than just compiling the program. It also does non-trivial IO to produce debugging outputs, and property checks the resulting programs. Similarly for compileProgram, which does a great deal more than transformSSA.
This is somewhat of a philosophical enlightenment. The program execution hasn’t changed at all, but our perspective has. Rather than micro-optimizing the code that is running, this new perspective suggests we should focus our effort on determining why that code is running in the first place.
Digging through transformSSA made it very obvious the problem was an algorithmic one: we were running an unbounded loop that terminated on convergence, where each step did \(O(n^2)\) work. When I stopped to actually read the code, the problem was immediate, and the solution obvious.
The lesson? Don’t read the profile. Read the code. Use the profile to focus your attention.
In my previous two posts "Ways to use torch.compile" and "Ways to use torch.export", I often said that PyTorch would be good for a use case, but there might be some downsides. Some of the downsides are foundational and difficult to remove. But some... just seem like a little something is missing from PyTorch. In this post, here are some things I hope we will end up shipping in 2025!
Improving torch.compile
A programming model for PT2. A programming model is an abstract description of the system that is both simple (so anyone can understand it and keep it in their head all at once) and can be used to predict the system's behavior. The torch.export programming model is an example of such a description. Beyond export, we would like to help users understand why all aspects of PT2 behave the way they do (e.g., via improved error messages), and give simple, predictable tools for working around problems when they arise. The programming model helps us clearly define the intrinsic complexity of our compiler, which we must educate users about. This is a big effort involving many folks on the PyTorch team and I hope we can share more about this effort soon.
Pre-compilation: beyond single graph export. Whenever someone realizes that torch.compile compilation is taking a substantial amount of time on expensive cluster machines, the first thing they ask is, "Why don't we just compile it in advance?" Supporting precompilation with the torch.compile API exactly as is turns out to be not so easy; unlike a traditional compiler, which gets the source program directly as input, users of torch.compile must actually run their Python program to hit the regions of code that are intended to be compiled. Nor can these regions be trivially enumerated and then compiled: not only must we know all the metadata of the input tensors flowing into a region, a user might not even know what the compiled graphs are if a model has graph breaks.
OK, but why not just run the model, dump all the compiled products, and then reuse them later? This works! Here is a POC from Nikita Shulga where a special decorator aot_compile_sticky_cache swaps between exporting a graph and running the exported product. Zhengxu Chen used a similar idea to export Whisper as a few distinct graphs, which he then manually stitched together in C++ to get a Python-free version of Whisper. If you want training to work, you can more directly integrate AOTInductor as an Inductor backend, e.g., as seen in this POC. We are a stone's throw away from working precompilation, which can guarantee no compilation at runtime; we just need to put the pieces together!
Improving caching further. There are some gaps with caching which we hope to address in the near future: (1) loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup (James Wu is on this), (2) if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request at the beginning (Oguz Ulgen recently landed this), (3) AOTAutograd cache is not fully rolled out yet (James Wu again). These collectively should be worth a 2x speedup or even more on warm cache time.
Fix multithreading. We should just make sure multithreading works, doing the testing and fiddly thread safety auditing needed to make it work. Here's a list of multithreading related issues.
Improving torch.export
Draft mode export. Export requires a lot of upfront work to even get an exported artifact in the first place. Draft mode export capitalizes on the idea that it's OK to generate an unsound "draft" graph early in the export, because even an incorrect graph is useful for kicking the tires on the downstream processing that happens after export. A draft export gives you a graph, and it also gives you a report describing what potential problems need to be fixed to get some guarantees about the correctness of the export. You can then chip away on the problems in the report until everything is green. One of the biggest innovations of draft-mode export is pervasive use of real tensor propagation when doing export: you run the export with actual tensors, so you can always trace through code, even if it is doing spicy things like data-dependent control flow.
Libtorch-free AOTInductor. AOTInductor generated binaries have a relatively small ABI surface that needs to be implemented. This hack from the most recent CUDA Mode meetup shows that you can just create an alternate implementation of the ABI that has no dependence on libtorch. This makes your deployed binary size much smaller!
Support for bundling CUDA kernels into AOTInductor. AOTInductor already supports directly bundling Triton kernels into the generated binary, but traditional CUDA kernels cannot be bundled in this way. There's no reason this has to be the case though: all we're doing is bundling cubins in both cases. If we have the ability to bundle traditional CUDA kernels into AOTInductor, this means you could potentially directly embed custom operators into AOTInductor binaries, which is nice because then those operators no longer have to be provided by the runtime (especially if you're commonly iterating on these kernels!).
Export multigraphs. Export's standard model is to give you a single graph that you call unconditionally. But it's easy to imagine a level of indirection on top of these graphs, where we can dispatch between multiple graphs depending on some arguments to the model. For example, if you have a model that optionally takes an extra Tensor argument, you can simply have two graphs, one for when the Tensor is absent, and one for when it is present.
ABI stable PyTorch extensions. It's hard work being a third-party PyTorch extension with native code, because whenever there's a new release of Python or PyTorch you have to rebuild all of your wheels. If there was a limited ABI that you could build your extension against that didn't expose CPython and only relied on a small, stable ABI of PyTorch functions, your binary packaging situation would be much simpler! And if an extension relied on a small ABI, it could even be bundled with an AOTInductor binary, letting these export products be truly package agnostic (one of the lessons we learned with torch.package is that picking the split between "what is packaged" and "what is not" is very difficult, and people would much rather just have everything be packaged). Jane Xu is investigating how to do this, and separately, Scott Wolchok has been refactoring headers in libtorch so that a small set of headers can be used independently of the rest of libtorch.
GHC since version 9.8 allows us to create callbacks from JS to Haskell code, which enables us to create full-fledged browser apps.
This article shows how to use the JS backend with foreign component libraries.
When people talk about functional programming in modern multi-paradigm languages, they usually mention Rust, Scala, or Kotlin. You rarely hear Swift being mentioned. This is odd, as one might argue that, of these languages, Swift places the strongest emphasis on functional programming.
In this talk, I will explain the core functional programming features of Swift, including its expressive type system, value types, and mutability control. Furthermore, I will discuss how Swift’s language design is influenced by the desire to create a language that addresses the whole spectrum from low-level systems programming up to high-level applications with sophisticated graphical user interfaces. Beyond the core language itself, functional programming also permeates Swift’s rich ecosystem of libraries. To support this point, I will outline some FP-inspired core libraries, covering concepts ranging from functional data structures through functional reactive programming to declarative user interfaces.
Finally, I will briefly summarise practical considerations for using Swift in your own projects. This includes the cross-platform toolchain, the package manager, and interoperability with other languages.
The seat layout fits on a grid. Each position is either floor (.), an empty seat (L), or an occupied seat (#). For example, the initial seat layout might look like this:
All decisions are based on the number of occupied seats adjacent to a given seat (one of the eight positions immediately up, down, left, right, or diagonal from the seat).
The following rules are applied to every seat simultaneously:
If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
Otherwise, the seat’s state does not change.
Floor (.) never changes; seats don’t move, and nobody sits on the floor.
This is a classic Cellular Automaton problem. We need to write a program that simulates seats being occupied till no further seats are emptied or occupied, and returns the final number of occupied seats. Let’s solve this in Haskell.
The Cellular Automaton
First, some imports:
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Comonad (Comonad (..))
import Data.Function (on)
import Data.List (intercalate, nubBy)
import Data.Massiv.Array (Ix2 (..))
import Data.Massiv.Array qualified as A
import Data.Massiv.Array.Unsafe qualified as AU
import Data.Proxy (Proxy (..))
import Data.Vector.Generic qualified as VG
import Data.Vector.Generic.Mutable qualified as VGM
import Data.Vector.Unboxed qualified as VU
import System.Environment (getArgs, getProgName)
We use the GHC2021 extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the comonad, massiv and vector libraries.
A cellular automaton consists of a regular grid of cells, each in one of a finite number of states.
For each cell, a set of cells called its neighborhood is defined relative to the specified cell.
An initial state is selected by assigning a state for each cell.
A new generation is created, according to some fixed rule that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood.
Let’s model the automaton of the challenge using Haskell:
A cell in the grid can be in empty, occupied or floor state. We encode this with the pattern synonyms Empty, Occupied and Floor over the Cell newtype, which wraps a Char.
The parseCell function parses a character to a Cell. The rule function implements the automaton rule.
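Concretely, these definitions can be sketched as follows (the character encoding mirrors the puzzle input; the error branch is illustrative):

newtype Cell = Cell Char deriving (Eq)

pattern Empty, Occupied, Floor :: Cell
pattern Empty = Cell 'L'
pattern Occupied = Cell '#'
pattern Floor = Cell '.'

parseCell :: Char -> Cell
parseCell = \case
  'L' -> Empty
  '#' -> Occupied
  '.' -> Floor
  c -> error $ "Invalid character: " <> show c

-- The puzzle rule: an empty seat with no occupied neighbours becomes
-- occupied; an occupied seat with four or more occupied neighbours
-- becomes empty; everything else stays the same.
rule :: Cell -> [Cell] -> Cell
rule cell neighbours =
  let occupiedCount = length $ filter (== Occupied) neighbours
   in case cell of
        Empty | occupiedCount == 0 -> Occupied
        Occupied | occupiedCount >= 4 -> Empty
        _ -> cell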
The Solution
We are going to solve this puzzle in three different ways. So, let’s abstract the details and solve it top-down.
class (Eq a) => Grid a where
  fromLists :: [[Cell]] -> a
  step :: a -> a
  toLists :: a -> [[Cell]]

solve :: forall a. (Grid a) => Proxy a -> [[Cell]] -> Int
solve _ =
  fromLists @a
    >>> fix step
    >>> toLists
    >>> fmap (filter (== Occupied) >>> length)
    >>> sum
  where
    fix f x = let x' = f x in if x == x' then x else fix f x'
We solve the challenge using the Grid typeclass that all our different solutions implement. A grid is specified by three functions:
fromLists: converts a list of lists of cells to the grid.
step: runs one step of the CA simulation.
toLists: converts the grid back to a list of lists of cells.
The solve function calculates the number of finally occupied seats for any instance of the Grid typeclass by running the simulation till it converges.
Now, we use solve to solve the challenge in three ways depending on the command line argument supplied:
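A sketch of main, assuming the -z, -a and -s flags seen in the timing runs below select the zipper, array and stencil grids respectively, looks like this:

main :: IO ()
main = do
  progName <- getProgName
  getArgs >>= \case
    [flag, fileName] -> do
      -- Parse the input file into a list of rows of cells.
      cells <- map (map parseCell) . lines <$> readFile fileName
      case flag of
        "-z" -> print $ solve (Proxy @(ZGrid Cell)) cells
        "-a" -> print $ solve (Proxy @(AGrid Cell)) cells
        "-s" -> print $ solve (Proxy @(SGrid Cell)) cells
        _ -> error $ "Unknown flag: " <> flag
    _ -> putStrLn $ "Usage: " <> progName <> " <-z|-a|-s> <input-file>"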
We have set up the top (main) and the bottom (rule) of our solutions. Now let’s work on the middle part.
The Zipper
To simulate a CA, we need to focus on each cell of the automaton grid, and run the rule for the cell. What is the first thing that comes to the mind of a functional programmer when we want to focus on a part of a data structure? Zippers!
Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus.
The first way to solve the challenge uses a zipper for once-nested lists. Let’s start with creating the zipper for a simple list:
data Zipper a = Zipper [a] a [a] deriving (Eq, Functor)

zPosition :: Zipper a -> Int
zPosition (Zipper left _ _) = length left

zLength :: Zipper a -> Int
zLength (Zipper left _ right) = length left + 1 + length right

listToZipper :: [a] -> Zipper a
listToZipper = \case
  [] -> error "Cannot create Zipper from empty list"
  (x : xs) -> Zipper [] x xs

zipperToList :: Zipper a -> [a]
zipperToList (Zipper left focus right) = reverse left <> (focus : right)

pShowZipper :: (Show a) => Zipper a -> String
pShowZipper (Zipper left focus right) =
  unwords $
    map show (reverse left) <> (("[" <> show focus <> "]") : map show right)

zLeft :: Zipper a -> Zipper a
zLeft z@(Zipper left focus right) = case left of
  [] -> z
  x : xs -> Zipper xs x (focus : right)

zRight :: Zipper a -> Zipper a
zRight z@(Zipper left focus right) = case right of
  [] -> z
  x : xs -> Zipper (focus : left) x xs
A list zipper has a focus element, and two lists that capture the elements to the left and right of the focus. We use it through these functions:
zPosition returns the zero-indexed position of the focus in the zipper.
zLength returns the length of the zipper.
listToZipper and zipperToList do conversions between lists and zippers.
pShowZipper pretty-prints a zipper, highlighting the focus.
zLeft and zRight move the zipper’s focus to the left and right respectively.
ZGrid is a newtype over a zipper of zippers. It has functions similar to Zipper for getting focus, position and size, for conversions to-and-from lists of lists, and for pretty-printing.
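A sketch of ZGrid and these helpers (the helper names zgFocus, zgPosition, zgSize, listsToZGrid, zGridToLists and pShowZGrid are assumed here) looks like this:

newtype ZGrid a = ZGrid (Zipper (Zipper a)) deriving (Eq, Functor)

-- The focused cell of the focused row.
zgFocus :: ZGrid a -> a
zgFocus (ZGrid (Zipper _ (Zipper _ focus _) _)) = focus

-- (row, column) of the focus, and (rows, columns) of the grid.
zgPosition :: ZGrid a -> (Int, Int)
zgPosition (ZGrid rows@(Zipper _ row _)) = (zPosition rows, zPosition row)

zgSize :: ZGrid a -> (Int, Int)
zgSize (ZGrid rows@(Zipper _ row _)) = (zLength rows, zLength row)

listsToZGrid :: [[a]] -> ZGrid a
listsToZGrid = ZGrid . listToZipper . map listToZipper

zGridToLists :: ZGrid a -> [[a]]
zGridToLists (ZGrid rows) = map zipperToList $ zipperToList rows

-- Pretty-print the grid, bracketing the focused column in every row.
pShowZGrid :: (Show a) => ZGrid a -> String
pShowZGrid (ZGrid rows) = intercalate "\n" . map pShowZipper $ zipperToList rows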
Next, the functions to move the focus in the grid:
zgUp :: ZGrid a -> ZGrid a
zgUp (ZGrid rows) = ZGrid $ zLeft rows

zgDown :: ZGrid a -> ZGrid a
zgDown (ZGrid rows) = ZGrid $ zRight rows

zgLeft :: ZGrid a -> ZGrid a
zgLeft (ZGrid rows) = ZGrid $ fmap zLeft rows

zgRight :: ZGrid a -> ZGrid a
zgRight (ZGrid rows) = ZGrid $ fmap zRight rows
It works as expected. Now, how do we use this to simulate a CA?
The Comonad
A CA requires us to focus on each cell of the grid, and run a rule for the cell that depends on the neighbours of the cell. A Haskell abstraction that neatly fits this requirement is Comonad.
Comonads are duals of Monads. We don’t need to learn everything about them for now. For our purpose, Comonad provides an interface that exactly lines up with what is needed for simulating CA:
class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  extend :: (w a -> b) -> w a -> w b
  {-# MINIMAL extract, (duplicate | extend) #-}
Assuming we can make ZGrid a comonad instance, the signatures for the above functions for ZGrid Cell would be:
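Specialised to ZGrid Cell, they read roughly as:

extract :: ZGrid Cell -> Cell
duplicate :: ZGrid Cell -> ZGrid (ZGrid Cell)
extend :: (ZGrid Cell -> Cell) -> ZGrid Cell -> ZGrid Cell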
The extract function would return the current focus of the grid.
The duplicate function would return a grid of grids, one inner grid for each possible focus of the input grid.
The extend function would apply the automata rule to each possible focus of the grid, and return a new grid.
The nice part is, we need to implement only the extract and duplicate functions, and the generation of the new grid is taken care of automatically by the default implementation of the extend function. Let’s write the comonad instance for ZGrid.
First, we write the comonad instance for Zipper:
instance Comonad Zipper where
  extract (Zipper _ focus _) = focus
  duplicate zipper = Zipper left zipper right
    where
      pos = zPosition zipper
      left = iterateN pos zLeft $ zLeft zipper
      right = iterateN (zLength zipper - pos - 1) zRight $ zRight zipper

iterateN :: Int -> (a -> a) -> a -> [a]
iterateN n f = take n . iterate f
extract for Zipper simply returns the input zipper’s focus element.
duplicate returns a zipper of zippers, with the input zipper as its focus, and the left and right lists of zippers as variations of the input zipper with all possible focuses. Trying out the functions in GHCi gives a better idea:
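For example, a session along these lines (using a zipper of Ints and the pShowZipper pretty-printer) shows how duplicate enumerates every possible focus:

ghci> z = listToZipper [1, 2, 3 :: Int]
ghci> pShowZipper z
"[1] 2 3"
ghci> pShowZipper (zRight z)
"1 [2] 3"
ghci> map pShowZipper (zipperToList (duplicate z))
["[1] 2 3","1 [2] 3","1 2 [3]"]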
zGridNeighbours returns the neighbour cells of the currently focussed cell of the grid. It does so by moving the focus in all eight directions, and extracting the new focuses. We also make sure to return unique cells by their position.
stepZGrid implements one step of the CA using the extend function of the Comonad typeclass. We call extend with a function that takes the current grid, and returns the result of running the CA rule on its focus and the neighbours of the focus.
Finally, we plug our functions into the ZGrid Cell instance of Grid.
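Here is a sketch of the ZGrid comonad instance together with these functions, building on the zgFocus, zgPosition and zgSize helpers assumed earlier:

instance Comonad ZGrid where
  extract = zgFocus
  duplicate grid = ZGrid $ Zipper up centre down
    where
      (row, _) = zgPosition grid
      (rowCount, _) = zgSize grid
      centre = horizontalDup grid
      up = map horizontalDup . iterateN row zgUp $ zgUp grid
      down = map horizontalDup . iterateN (rowCount - row - 1) zgDown $ zgDown grid

      -- All refocusings of a grid along its focused row.
      horizontalDup g = Zipper left g right
        where
          (_, col) = zgPosition g
          (_, colCount) = zgSize g
          left = iterateN col zgLeft $ zgLeft g
          right = iterateN (colCount - col - 1) zgRight $ zgRight g

zGridNeighbours :: ZGrid a -> [a]
zGridNeighbours grid =
  -- Move the focus in all eight directions, drop the original position,
  -- and keep only one cell per resulting position.
  map snd . nubBy ((==) `on` fst) . filter ((/= here) . fst) $
    [ (zgPosition g, zgFocus g)
    | vMove <- [id, zgUp, zgDown]
    , hMove <- [id, zgLeft, zgRight]
    , let g = hMove (vMove grid)
    ]
  where
    here = zgPosition grid

stepZGrid :: ZGrid Cell -> ZGrid Cell
stepZGrid = extend $ \grid -> rule (extract grid) (zGridNeighbours grid)

instance Grid (ZGrid Cell) where
  fromLists = listsToZGrid
  step = stepZGrid
  toLists = zGridToLists

Let’s compile and run it: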
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2"
[1 of 2] Compiling Main ( seating-system.hs, seating-system.o )
[2 of 2] Linking seating-system
❯ time ./seating-system -z input.txt
2243
2.72 real 2.68 user 0.02 sys
I verified with the Advent of Code website that the result is correct. We also see the time elapsed, which is 2.7 seconds. That seems pretty high. Can we do better?
The Array
The problem with the zipper approach is that lists in Haskell are too slow. Some operations on them, like length, are \(O(n)\). They are also lazy in both spine and value, and build up thunks. We could switch to a different list-like data structure, or cache the grid size and neighbour indices for each index to make it run faster. Or we could try an entirely different approach.
Let’s think about it for a bit. Zippers intermix two things together: the data in the grid, and the focus. When running a step of the CA, the grid data does not change when focussing on all possible focuses, only the focus itself changes. What if we separate the data from the focus? Maybe that’ll make it faster. Let’s try it out.
Let’s model the grid as a combination of a 2D array and an index into the array. We are using the arrays from the massiv library.
data AGrid a = AGrid {aGrid :: A.Array A.B A.Ix2 a, aGridFocus :: A.Ix2}
  deriving (Eq, Functor)
A.Ix2 is massiv’s way of representing an index into a 2D array, and is essentially the same as a two-tuple of Ints. A.Array A.B A.Ix2 a here means a 2D boxed array of as. massiv uses representation strategies to decide how arrays are actually represented in memory, among which are boxed, unboxed, primitive, storable, delayed etc. Even though primitive and storable arrays are faster, we have to go with boxed arrays here because the Functor instance of A.Array exists only for boxed and delayed arrays, and boxed is the faster of the two for our purpose.
It is actually massively easier to write the Comonad instance for AGrid:
The extract implementation simply looks up the element from the array at the focus index. This time, we don’t need to implement duplicate because it is easier to implement extend directly. We map with index (A.imap) over the grid, calling the function f for the variation of the grid with the index as the focus.
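A sketch of that instance, assuming massiv's A.imap and A.computeAs for rebuilding the boxed array, looks like this:

instance Comonad AGrid where
  -- Look up the element at the focus index.
  extract (AGrid grid focus) = grid A.! focus
  -- Refocus the grid at every index and apply f there.
  extend f (AGrid grid focus) =
    AGrid (A.computeAs A.B $ A.imap (\ix _ -> f (AGrid grid ix)) grid) focus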
Next, we write the CA step:
listsToAGrid :: [[Cell]] -> AGrid Cell
listsToAGrid = A.fromLists' A.Seq >>> flip AGrid (0 :. 0)

aGridNeighbours :: AGrid a -> [a]
aGridNeighbours (AGrid grid (x :. y)) =
  [ grid A.! (x + i :. y + j)
  | i <- [-1, 0, 1]
  , j <- [-1, 0, 1]
  , (x + i, y + j) /= (x, y)
  , validIndex (x + i, y + j)
  ]
  where
    A.Sz (rowCount :. colCount) = A.size grid
    validIndex (a, b) = and [a >= 0, b >= 0, a < rowCount, b < colCount]

stepAGrid :: AGrid Cell -> AGrid Cell
stepAGrid = extend $ \grid -> rule (extract grid) (aGridNeighbours grid)

instance Grid (AGrid Cell) where
  fromLists = listsToAGrid
  step = stepAGrid
  toLists = aGrid >>> A.toLists
listsToAGrid converts a list of lists of cells into an AGrid focussed at (0,0). aGridNeighbours finds the neighbours of the current focus of a grid by directly looking up the valid neighbour indices in the array. stepAGrid calls extract and aGridNeighbours to implement the CA step, much like the ZGrid case. And finally, we create the AGrid Cell instance of Grid.
Let’s compile and run it:
❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -a input.txt
2243
0.10 real 0.09 user 0.00 sys
Woah! It takes only 0.1 second this time. Can we do even better?
The Stencil
massiv has a construct called Stencil that can be used for simulating CA:
Stencil is abstract description of how to handle elements in the neighborhood of every array cell in order to compute a value for the cells in the new array.
That sounds like exactly what we need. Let’s try it out next.
With stencils, we do not need the instance of Comonad for the grid. So we can switch to the faster unboxed array representation:
newtype instance VU.MVector s Cell = MV_Char (VU.MVector s Char)
newtype instance VU.Vector Cell = V_Char (VU.Vector Char)
deriving instance VGM.MVector VU.MVector Cell
deriving instance VG.Vector VU.Vector Cell
instance VU.Unbox Cell

type SGrid a = A.Array A.U A.Ix2 a
The first five lines make Cell an instance of the Unbox typeclass. We chose to make Cell a newtype wrapper over Char because Char has an Unbox instance.
Then we define a new grid type SGrid that is a 2D unboxed array.
Now, we define the stencil and the step function for our CA:
We make a stencil of size 3-by-3, where the focus is at index (1,1) relative to the stencil’s top-left cell. In the callback function, we use the supplied get function to get the neighbours of the focus by using indices relative to the focus, and call rule with the cells at focus and neighbour indices.
Then we write the step function stepSGrid that maps the stencil over the grid. Finally, we put everything together in the SGrid Cell instance of Grid.
Let’s compile and run it:
❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt
2243
0.08 real 0.07 user 0.00 sys
It is only a bit faster than the previous solution. But this time we have another trick up our sleeves. Did you notice the A.computeP we sneaked in there? With stencils, we can now run the step for all cells in parallel! Let’s recompile it with the right options and run it again:
❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2 -threaded -rtsopts"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt +RTS -N
2243
0.04 real 0.11 user 0.05 sys
The -threaded option enables multithreading, and the +RTS -N option makes the process use all CPU cores. We get a nice speedup of 2x over the single-threaded version.
Bonus Round: Simulation Visualization
Since you’ve read the entire post, here is a bonus visualization of the CA simulation for you (warning: lots of fast blinking):
Play the simulation
That’s it for this post! I hope you enjoyed it and took something away from it. The full code for this post is available here.
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
The reason for using a newtype instead of a data is explained in the Stencil section.↩︎
If you are unfamiliar, >>> is the left-to-right function composition function:
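Specialised to ordinary functions, it is simply flipped composition:

-- As exported by Control.Arrow, specialised to functions.
(>>>) :: (a -> b) -> (b -> c) -> (a -> c)
f >>> g = g . f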
Cartoonist Ann Telnaes has quit the Washington Post, after they refused to publish one of her cartoons, depicting Mark Zuckerberg (Meta), Sam Altman (OpenAI), Patrick Soon-Shiong (LA Times), the Walt Disney Company (ABC News), and Jeff Bezos (Amazon & Washington Post). All that exists is her preliminary sketch, above. Why is this important? See her primer below. (Spotted via Boing Boing.)
Previously, I discussed the value proposition of torch.compile. While doing so, I observed a number of downsides (long compile time, complicated operational model, lack of packaging) that were intrinsic to torch.compile's API contract, which emphasized being able to work on Python code as is, with minimal intervention from users. torch.export occupies a different spot in the tradeoff space: in exchange for more upfront work making a model exportable, it allows for use of PyTorch models in environments where using torch.compile as is would be impossible.
Enable end-to-end C++ CPU/GPU Inference
Scenario: Like before, suppose you want to deploy your model for inference. However, now you have more stringent runtime requirements: perhaps you need to do inference from a CPython-less environment (because your QPS requirements require GIL-less multithreading; alternately, CPython execution overhead is unacceptable but you cannot use CUDA graphs, e.g., due to CPU inference or dynamic shapes requirements). Or perhaps your production environment requires hermetic deploy artifacts (for example, in a monorepo setup, where infrastructure code must be continually pushed but model code should be frozen). But like before, you would prefer not to have to rewrite your model; you would like the existing model to serve as the basis for your Python-less inference binary.
What to do: Use torch.export targeting AOTInductor. This will compile the model into a self-contained shared library which can then be directly invoked from a C++ runtime. This shared library contains all of the compiler generated Triton kernels as precompiled cubins and is guaranteed not to need any runtime compilation; furthermore, it relies only on a small runtime ABI (with no CPython dependency), so the binaries can be used across versions of libtorch. AOTInductor's multithreading capability and low runtime overhead also make it a good match for CPU inference!
You don't have to go straight to C++ CPU/GPU inference: you can start with using torch.compile on your code before investing in torch.export. There are four primary extra requirements export imposes: (1) your model must compile with fullgraph=True (though you can sometimes bypass missing Dynamo functionality by using non-strict export; sometimes, it is easier to do non-strict torch.export than it is to torch.compile!), (2) your model's inputs/outputs must only be in torch.export's supported set of argument types (think Tensors in pytrees), (3) your model must never recompile; specifically, you must specify which inputs have dynamic shapes, and (4) the top-level of your model must be an nn.Module (so that export can keep track of all of the parameters your model has).
Some tips:
Check out the torch.export programming model. The torch.export programming model is an upcoming doc which aims to help set expectations on what can and cannot be exported. It talks about things like "Tensors are the only inputs that can actually vary at runtime" and common mistakes such as module code which modifies NN modules (not supported!) or optional input types (you will end up with an export that takes in that input or not, there is no runtime optionality).
Budget time for getting a model to export. With torch.compile for Python inference, you could just slap it on your model and see what happens. For torch.export, you have to actually finish exporting your entire model before you can even consider running the rest of the pipeline. For some of the more complicated models we have exported, there were often dozens of issues that had to be worked around in one way or another. And that doesn't even account for all of the post-export work you have to do, like validating the numerics of the exported model.
Intermediate value debugging. AOTInductor has an option to add dumps of intermediate tensor values in the compiled C++ code. This is good for determining, e.g., the first time where a NaN shows up, in case you are suspecting a miscompilation.
Open source examples: Among other things, torchchat has an example end-to-end AOTInductor setup for server-side LLM inference, which you can view in run.cpp.
torch.export specific downsides:
No built-in support for guard-based dispatch (multiple compilations). Earlier, I mentioned that an exported model must not have any recompiles. This leads to some fairly common patterns of code not being directly supported by torch.export: you can't export a single model that takes an enum as input, or has an optional Tensor argument, or accepts two distinct tensor shapes that need to be compiled individually. Now, technically, we could support this: you could imagine a package that contains multiple exported artifacts and dispatches between them depending on some conditions (e.g., the value of the enum, whether or not the optional Tensor argument was provided, the shape of the input tensor). But you're on your own: torch.compile will do this for you, but torch.export will not.
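Since export will not generate this dispatch for you, the workaround is to export one artifact per case and write the glue yourself. Here is a minimal sketch of what that hand-rolled, guard-style dispatch could look like; the package paths and the shape threshold are made up, and the `aoti_load_package` loader name is an assumption based on recent releases, not an official API for this pattern.

```python
import torch
import torch._inductor

# Hypothetical: two separately exported and compiled artifacts, one per case.
small_model = torch._inductor.aoti_load_package("model_small_batch.pt2")
large_model = torch._inductor.aoti_load_package("model_large_batch.pt2")

def run(x: torch.Tensor) -> torch.Tensor:
    # The "guard" now lives in ordinary Python: pick the artifact whose
    # compilation assumptions match the input we actually received.
    if x.shape[0] <= 32:
        return small_model(x)
    return large_model(x)
```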
No built-in support for models that are split into multiple graphs. Similarly, we've mentioned that an exported model must be a single graph. This is in contrast to torch.compile, which will happily insert graph breaks and compile distinct islands of code that can be glued together with Python eager code. Now, technically, you can do this with export too: you can carve out several distinct subnets of your model, export them individually, and then glue them together with some custom written code on the other end (in fact, Meta's internal recommendation systems do this), but there's no built-in support for this workflow.
The extra requirements often don't cover important components of real world models. I've mentioned this previously as the extra restrictions export places on you, but it's worth reiterating some of the consequences of this. Take an LLM inference application: obviously, there is a core model that takes in tokens and produces logit predictions--this part of the model is exportable. But there are also important other pieces such as the tokenizer and sampling strategy which are not exportable (tokenizer because it operates on strings, not tensors; sampling because it involves complicated control flow). Arguably, it would be much better if all of these things could be directly bundled with the model itself; in practice, end-to-end applications should just expect to directly implement these in native code (e.g., as is done in torchchat). Our experience with TorchScript taught us that we don't really want to be in the business of designing a general purpose programming language that is portable across all of export's targets; better to just bet that the tokenizer doesn't change that often and eat the cost of natively integrating it by hand.
AOTInductor specific downsides:
You still need libtorch to actually run the model. Although AOTInductor binaries bundle most of their compiled kernel implementation, they still require a minimal runtime that can offer basic necessities such as tensor allocation and access to custom operators. There is not yet an official offering of an alternative, lightweight implementation of the stable ABI that AOTInductor binaries depend on, so if you do want to deploy AOTInductor binaries you will typically have to also bring libtorch along. This is usually not a big deal server side, but it can be problematic if you want to do client side deployments!
No CUDA graphs support. This one is not such a big deal since you are much less likely to be CPU bound when the host side logic is all compiled C++, but there's no support for CUDA graphs in AOTInductor. (Funnily enough, this is also something you technically can orchestrate from outside of AOTInductor.)
Edge deployment
Scenario: You need to deploy your PyTorch model to edge devices (e.g., a mobile phone or a wearable device) where computational resources are limited. You have requirements that are a bit different from server side: you care a lot more about minimizing binary size and startup time. Traditional PyTorch deployment with full libtorch won't work. The device you're deploying to might also have some strange extra processors, like a DSP or NPU, that you want your model to target.
What to do: Use torch.export targeting Executorch. Among other things, Executorch offers a completely separate runtime for exported PyTorch programs (i.e., it has no dependency on libtorch, except perhaps there are a few headers which we share between the projects) which was specifically designed for edge deployment. (Historical note: we spent a long time trying to directly ship a stripped down version of libtorch to mobile devices, but it turns out it's really hard to write code that is portable on server and client, so it's better to only share when absolutely necessary.) Quantization is also a pretty important part of deployment to Edge, and Executorch incorporates this into the end-to-end workflow.
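For orientation, the lowering flow looks roughly like the sketch below, based on ExecuTorch's documented `to_edge` / `to_executorch` path; the model and file name are hypothetical, the API details are assumptions, and a real deployment would add steps such as backend partitioning and quantization.

```python
import torch
from torch.export import export
from executorch.exir import to_edge  # ExecuTorch's lowering entry point

class TinyModel(torch.nn.Module):  # hypothetical model
    def forward(self, x):
        return torch.sigmoid(x)

# Export, lower to the Edge dialect, then to an ExecuTorch program,
# and write out the .pte file that the on-device ExecuTorch runtime loads.
ep = export(TinyModel(), (torch.randn(1, 16),))
et_program = to_edge(ep).to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```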
Open source examples: torchchat also has an Executorch integration letting you run an LLM on your Android phone.
Downsides. All of the export related downsides described previously apply here. But here's something to know specifically about Executorch:
The edge ecosystem is fragmented. At time of writing, there are seven distinct backends Executorch can target. This is not really Executorch's fault, it comes with the territory--but I want to call it out because it stands in stark contrast to NVIDIA's server-side hegemony. Yes, AMD GPUs are a thing, and various flavors of CPU are real, but it really is a lot easier to be focused on server side because NVIDIA GPUs come first.
Pre-compiled kernels for eager mode
Scenario: You need a new function or self-contained module with an efficient kernel implementation. However, you would prefer not to have to write the CUDA (or even Triton) by hand; the kernel is something that torch.compile can generate from higher level PyTorch implementation. At the same time, however, you cannot tolerate just-in-time compilation at all (perhaps you are doing a massive training job, and any startup latency makes it more likely that one of your nodes will fail during startup and then you make no progress at all; or maybe you just find it annoying when PyTorch goes out to lunch when you cache miss).
What to do: Use torch.export targeting AOTInductor, and then load and run the AOTInductor generated binary from Python.
Downsides. So, we know this use case works, because we have internally used this to unblock people who wanted to use Triton kernels but could not tolerate Triton's just-in-time compilation. But there's not much affordance in our APIs for this use case; for example, guard-based dispatch is often quite useful for compiled functions, but you'll have to roll that by hand. More generally, when compiling a kernel, you have to make tradeoffs about how static versus dynamic the kernel should be (for example, will you force the inputs to be evenly divisible by eight? Or would you have a separate kernel for the divisible and not divisible cases?) Once again, you're on your own for making the call there.
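As a concrete (if toy) illustration of this workflow, the sketch below compiles a small fused op ahead of time and then calls it from ordinary eager code, so nothing is JIT-compiled when the job starts. The module, shapes, and file name are hypothetical, and the `aoti_compile_and_package` / `aoti_load_package` names are assumptions that have shifted across recent releases.

```python
import torch
import torch._inductor
from torch.export import export

class FusedGeluScale(torch.nn.Module):
    """Hypothetical small op we want as a single precompiled kernel."""
    def forward(self, x, scale):
        return torch.nn.functional.gelu(x) * scale

# Ahead of time (e.g., in a build step, not at job startup). Note the tradeoff
# discussed above: with no dynamic_shapes declared, this kernel is compiled for
# exactly these input shapes.
ep = export(FusedGeluScale(), (torch.randn(4, 1024), torch.tensor(0.5)))
torch._inductor.aoti_compile_and_package(ep, package_path="fused_gelu_scale.pt2")

# In the training job: load the precompiled artifact; no JIT compilation here.
fused = torch._inductor.aoti_load_package("fused_gelu_scale.pt2")
out = fused(torch.randn(4, 1024), torch.tensor(0.5))
```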
An exchange format across systems
Scenario: In an ideal world, you would have a model, you could export it to an AOTInductor binary, and then be all done. In reality, maybe this export process needs to be a multi-stage process, where it has to be processed to some degree on one machine, and then finish processing on another machine. Or perhaps you need to shift the processing over time: you want to export a model to freeze it (so it is no longer tied to its original source code), and then repeatedly run the rest of the model processing pipeline on this exported program (e.g., because you are continuously updating its weights and then reprocessing the model). Maybe you want to export the model and then train it from Python later, committing to a distributed training strategy only when you know how many nodes you are running. The ability to hermetically package a model and then process it later is one of the big value propositions of TorchScript and torch.package.
What to do: Use torch.export by itself, potentially using pre-dispatch if you need to support training use-cases. torch.export produces an ExportedProgram which has a clean intermediate representation that you can do processing on, or just serialize and then do processing on later.
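Here is a minimal sketch of that "export now, process later" split, with a hypothetical module and file name; `torch.export.save` / `torch.export.load` are the serialization entry points assumed here.

```python
import torch
from torch.export import export, load, save

class Frozen(torch.nn.Module):  # hypothetical model to freeze
    def forward(self, x):
        return torch.sin(x) + 1.0

# Stage 1 (machine A): freeze the model into an ExportedProgram and serialize it.
ep = export(Frozen(), (torch.randn(16),))
save(ep, "frozen_model.pt2")

# Stage 2 (machine B, possibly much later): reload and inspect or transform the
# IR before handing it to whatever backend comes next.
ep2 = load("frozen_model.pt2")
print(ep2.graph_module.graph)  # the clean intermediate representation mentioned above
```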
Downsides:
Custom operators are not packaged. A custom operator typically refers to some native code which was linked with PyTorch proper. There's no way to extract out this kernel and embed it into the exported program so that there is no dependence; instead, you're expected to ensure the eventual runtime relinks with the same custom operator. Note that this problem doesn't apply to user defined Triton kernels, as export can simply compile it and package the binary directly into the exported product. (Technically, this applies to AOTInductor too, but this tends to be much more of a problem for use cases which are primarily about freezing rapidly evolving model code, as opposed to plain inference where you would simply just expect people to not be changing custom operators willy nilly.)
Choose your own decompositions. Export produces IR that only contains operators from a canonical operator set. However, the default choice is sometimes inappropriate for use cases (e.g., some users want aten.upsample_nearest2d.vec to be decomposed while others do not), so in practice for any given target you may have a bespoke operator set that is appropriate for that use case. Unfortunately, it can be fiddly getting your operator set quite right, and while we've talked about ideas like a "build your own operator set interactive tool" these have not been implemented yet.
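For illustration, choosing decompositions happens through `ExportedProgram.run_decompositions`; the sketch below shows the default path, and the commented line indicates the kind of per-op customization described above. The exact helpers for building a custom table vary by release, so treat the details as assumptions.

```python
import torch
from torch.export import export

class Upsampler(torch.nn.Module):  # hypothetical module using the op in question
    def forward(self, x):
        return torch.nn.functional.interpolate(x, scale_factor=2, mode="nearest")

ep = export(Upsampler(), (torch.randn(1, 3, 8, 8),))

# Default: decompose down to the canonical (Core ATen) operator set.
core_ep = ep.run_decompositions()

# Custom: pass your own decomposition table instead, e.g. one with the
# aten.upsample_nearest2d.vec entry removed so that op is preserved as-is.
# (Exactly how you build that table differs across PyTorch releases.)
# custom_ep = ep.run_decompositions(my_decomp_table)

print(core_ep.graph_module.graph)
```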
Annoyingly large FC/BC surface. Something I really like about AOTInductor is that it has a very small FC/BC surface: I only need to make sure I don't make breaking changes to the C ABI, and I'm golden. With export IR, the FC/BC surface is all of the operators produced by export. Even a decomposition is potentially BC breaking: a downstream pass could be expecting to see an operator that no longer exists because I've decomposed it into smaller pieces. Matters get worse in pre-dispatch export, since the scope of APIs used inside export IR expands to include autograd control operators (e.g., torch.no_grad) as well as tensor subclasses (since Tensor subclasses cannot be desugared if we have not yet eliminated autograd). We will not break your AOTInductor blobs. We can't as easily give the same guarantee for the IR here.
Next time: What's missing, and what we're doing about it
Up until this year, my Bitcoin custody strategy was fairly straightforward, and likely familiar to other hodlers:
Buy a hardware wallet
Put the seed phrase on steel plates
Secure those steel plates somewhere on my property
But in October of last year, the situation changed. I live in Northern Israel, close to the Lebanese border. The past 14 months have involved a lot of rocket attacks, including destruction of multiple buildings in my home town. This brought into question how to properly secure my sats. Importantly, I needed to balance two competing goals:
Resiliency of the saved secrets against destruction. In other words: make sure I didn't lose access to the wallet.
Security against attackers trying to steal those secrets. In other words: make sure no one else got access to the wallet.
I put some time into designing a solution to these conflicting goals, and would like to share some thoughts for others looking to improve their BTC custody strategy. And if anyone has any recommendations for improvements, I'm all ears!
Goals
Self custody I didn't want to rely on an external custody company. Not your keys, not your coins.
Full access I always maintain full access to my funds, without relying on any external party.
Computer hack resilient If my computer systems are hacked, I will not lose access to or control of my funds (neither stolen nor lost).
Physical destruction resilient If my hardware device and steel plates are both destroyed (as well as anything else physically located in my home town), I can still recover my funds.
Will survive me If I'm killed, I want my wife, children, or other family members to be able to recover and inherit my BTC.
Multisig
The heart of this protection mechanism is a multisig wallet. Unfortunately, interfaces for setting up multisig wallets are tricky. I'll walk through the basics and then come back to how to set it up.
The concept of a multisig is that your wallet is protected by multiple signers. Each signer can be any "normal" wallet, e.g. a software or hardware wallet. You choose a number of signers and a threshold of signers required to perform a transaction.
For example, a 2 of 2 multisig would mean that 2 wallets can sign transactions, and both of them need to sign to make a valid transaction. A 3 of 5 would mean 5 total signers, any 3 of them being needed to sign a transaction.
For my setup, I set up a 2 of 3 multisig, with the 3 signers being a software wallet, a hardware wallet, and a SLIP39 wallet. Let's go through each of those, explain how they work, and then see how the solution addresses the goals.
Software wallet
I set up a software wallet and saved the seed phrase in a dedicated password manager account using Bitwarden. Bitwarden offers an emergency access feature, which essentially means a trusted person can be listed as an emergency contact and can recover your account. The process includes a waiting period, during which the account owner can reject the request.
Put another way: Bitwarden is offering a cryptographically secure, third party hosted, fully managed, user friendly dead-man switch. Exactly what I needed.
I added a select group of trusted people as the recoverers on the account. Otherwise, I keep the account securely locked down in Bitwarden and can use it for signing when necessary.
Let's see how this stacks up against the goals:
Self custody Check, no reliance on anyone else
Full access Check, I have access to the wallet at all times
Computer hack resilient Fail, if my system is hacked, I lose control of the wallet
Physical destruction resilient Check, Bitwarden lives beyond my machines
Will survive me Check thanks to the dead-man switch
Hardware wallet
Not much to say about the hardware wallet setup that I haven't said already. Let's do the goals:
Self custody Check, no reliance on anyone else
Full access Check, I have access to the wallet at all times
Computer hack resilient Check, the private keys never leave the hardware device
Physical destruction resilient Fail, the wallet and plates could easily be destroyed, and the plates could easily be stolen. (The wallet could be stolen too, but thanks to the PIN mechanism would theoretically be resistant to compromise. But that's not a theory I'd want to bet my wealth on.)
Will survive me Check, anyone can take my plates and recover the wallet
SLIP39
This one requires a bit of explanation. SLIP39 is a not-so-common standard for taking some data and splitting it up into a number of shards. You can define the threshold of shards necessary to reconstruct the original secret. This uses an algorithm called Shamir's Secret Sharing. (And yes, it is very similar in function to multisig, but implemented differently).
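To make the shard-and-threshold idea concrete, here is a minimal, illustrative sketch of Shamir's Secret Sharing in plain Python. It is not SLIP39 (which adds mnemonic word encoding, checksums, and share groups on top of this idea) and should not be used for real funds; the threshold and share counts are just example numbers.

```python
# Illustrative Shamir's Secret Sharing over a prime field (not SLIP39).
import secrets

PRIME = 2**127 - 1  # prime modulus large enough for a ~128-bit secret

def split(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, shares + 1)]

def combine(points: list[tuple[int, int]]) -> int:
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

if __name__ == "__main__":
    s = secrets.randbits(128) % PRIME
    shards = split(s, threshold=4, shares=7)  # e.g. 4-of-7; thresholds are hypothetical
    assert combine(shards[:4]) == s           # any 4 shards recover the secret
```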
The idea here is that this wallet is controlled by a group of friends and family members. Without getting into my actual setup, I could choose 7 very trusted individuals from all over the world and tell them that, should I contact them and ask, they should send me their shards so I can reconstruct that third wallet. And to be especially morbid, they also know the identity of some backup people in the event of my death.
In any event, the idea is that if enough of these people agree, they can reconstruct the third wallet. The assumption is that these are all trustworthy people. But even with trustworthy people, (1) I could be wrong about how trustworthy they are, or (2) they could be coerced or tricked. So let's see how this security mechanism stands up:
Self custody Fail, I'm totally reliant on others.
Full access Fail, by design I don't keep this wallet myself, so I must rely on others.
Computer hack resilient Check, the holders of these shards keep them in secure, offline storage.
Physical destruction resilient Check (sort of), since the probability of all copies being destroyed or stolen is negligible.
Will survive me Check, by design
Comparison against goals
We saw how each individual wallet stacked up against the goals. How about all of them together? Well, there are certainly some theoretical ways I could lose the funds, e.g. my hardware wallet and plates are destroyed and a majority of shard holders for the SLIP39 lost their shards. However, if you look through the check/fail lists, every category has at least two checks. Meaning: on all dimensions, if some catastrophe happens, at least two of the wallets should survive.
Now the caveats (I seem to like that word). I did a lot of research on this, and this is at least tangential to my actual field of expertise. But I'm not a dedicated security researcher, and can't really claim full, deep understanding of all these topics. So if I made any mistakes here, please let me know.
How-to guide
OK, so how do you actually get a system like this running? I'll give you my own step-by-step guide. Best case scenario for all this: download all the websites and programs mentioned onto a fresh Linux system install, disconnect the internet, run the programs and copy down any data as needed, and then wipe the system again. (Or, alternatively, do all the actions from a Live USB session.)
Set up the SLIP39. You can use an online generator. Choose the number of bits of entropy (IMO 128bit is sufficient), choose the total shares and threshold, and then copy down the phrases.
Generate the software wallet. You can use a sister site to the SLIP39 generator. Choose either 12 or 24 words, and write those words down. On a different, internet-connected computer, you can save those words into a Bitwarden account, and set it up with appropriate emergency access.
Open up Electrum. (Other wallets, like Sparrow, probably work for this too, but I've only done it with Electrum.) The rest of this section will include a step-by-step guide through the Electrum steps. And yes, I took these screenshots on a Mac, but for a real setup use a Linux machine.
Set up a new wallet. Enter a name (doesn't matter what) and click next.
Choose a multisig wallet and click next.
Choose 3 cosigners and require 2 signatures.
Now we're going to enter all three wallets. The first one will be your hardware device. Click next, then follow all the prompts to set it up.
After a few screens (they'll be different based on your choice of hardware device), you'll be prompted to select a derivation path. Use native segwit and the standard derivation path.
This next screen was the single most complicated for me, simply because the terms were unclear. First, you'll see a Zpub string displayed as a "master public key," e.g.:
You need to write this down. It's the same as an xpub, but for multisig wallets. This represents all the possible public keys for your hardware wallet. Putting together the three Zpub values will allow your software of choice to generate all the receiving and change addresses for your new wallet. You'll need all three, so don't lose them! But on their own, they cannot be used to access your funds. Therefore, treat them with "medium" security. Backing up in Bitwarden with your software wallet is a good idea, and potentially simply sending to some friends to back up just in case.
And that explanation brings us back to the three choices on the screen. You can choose to either enter a cosigner key, a cosigner seed, or use another hardware wallet. The difference between key and seed is that the former is public information only, whereas the latter is full signing power. Often, multisig wallets are set up by multiple different people, and so instead of sharing the seed with each other (a major security violation), they each generate a seed phrase and only share the key with each other.
However, given that you're setting up the wallet with access to all seed phrases, and you're doing it on an airgapped device, it's safe to enter the seed phrases directly. And I'd recommend it, to avoid the risk of generating the wrong master key from a seed. So go ahead and choose "enter cosigner seed" and click next.
And now onto the second most confusing screen. I copied my seed phrase into this text box, but it won't let me continue!
The trick is that Electrum, by default, uses its own concept of seed phrases. You need to click on "Options" and then choose BIP39, and then enter your seed phrase.
Continue through the other screens until you're able to enter the final seed. This time, instead of choosing BIP39, choose SLIP39. You'll need to enter enough of the SLIP39 shards to meet the threshold.
And with that, you can continue through the rest of the screens, and you'll now have a fully operational multisig!
Open up Electrum again on an internet-connected computer. This time, connect the hardware wallet as before, enter the BIP39 as before, but for the SLIP39, enter the master key instead of the SLIP39 seed phrase. This will ensure that no internet connected device ever has both the software wallet and SLIP39 at the same time. You should confirm that the addresses on the airgapped machine match the addresses on the internet connected device.
If so, you're ready for the final test. Send a small amount of funds into the first receiving address, and then use Electrum on the internet connected device to (1) confirm in the history that it arrived and (2) send it back to another address. You should be asked to sign with your hardware wallet.
If you made it this far, congratulations! You're the proud owner of a new 2 of 3 multisig wallet.
Conclusion
I hope the topic of death and war wasn't too terribly morbid for others. But these are important topics to address in our world of self custody. I hope others found this useful. And once again, if anyone has recommendations for improvements to this setup, please do let me know!
Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware. In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid *laziness* unrepresentable!
I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!
Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!
Submission Deadline: 9 February 2025. Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.
Submit your talk and share your wisdom with the FP community.
The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:
Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
Try to beat inflation by investing in the stock market and other risk-on investments.
Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.
(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)
High living and melting ice cubes
Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.
But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. There’s little to no risk in those of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.
Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.
The investment option
With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is the hope that, between price movements and dividends, the value of their investment will grow faster than anything else they can choose.
You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?
Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, but by providing actual value.
So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.
Michael (Midas) Mulligan
Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.
He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.
And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.
No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.
That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.
Michael (Crappy Investor) Snoyman
I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.
I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.
So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.
I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.
And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.
Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than gambling at a casino.
And yes, I ultimately lost a massive amount of money through this.
Normal people shouldn’t invest
Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.
(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not sufficiently enough to justify the returns people anticipate from making “good” gambles.)
The only reason most people ever consider this is because they feel forced into it; otherwise they’ll simply be sitting on their melting ice cubes. But once they get into the game, between risk, stress, and time investment, their lives will often get worse.
One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.
Option 4: deflation
The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system exists so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.
If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.
And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.
So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.
Can Bitcoin go to 0? Certainly, though year by year that outcome becomes less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.
I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.
The GHC developers are very pleased to announce the release of GHC 9.12.1.
Binary distributions, source distributions, and documentation are available at downloads.haskell.org.
We hope to have this release available via ghcup shortly.
GHC 9.12 will bring a number of new features and improvements, including:
The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.
The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.
Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.
The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.
More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details.
GHC now accepts type syntax in expressions as part of GHC Proposal #281.
The WASM backend now has support for TemplateHaskell.
Experimental support for the RISC-V platform with the native code generator.
… and many more
A full accounting of changes can be found in the release notes.
As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.
We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.
As always, do give this release a try and open a ticket if you see anything amiss.
The Stackage team is happy to announce that Stackage LTS version 23 was finally released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series which was the longest lived LTS major release to date (with probable final snapshot lts-22.43).
We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexpectedly around the end of May. We are indebted to Christopher for his many years of wide Haskell community service, including also being one of the Stackage Curators up until the time he passed away. He is warmly remembered.
LTS 23 includes many package changes, and almost 3200 packages!
Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen.
(The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)
At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this.
This initial release build was made by Jens Petersen (64 commits).
Most of our upper bounds were dropped for this rebase, so quite a lot of packages had to be disabled.
You can see all the changes made relative to the preceding last 9.8 nightly snapshot.
Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section.
We also have some tracking issues still open related to 9.10 core boot libraries.
Thank you to all those who have already done work updating their packages for ghc-9.10.
Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.