Apr. 17, 2015 · Felix S. Klock II
+
+
+
One of the primary goals of the Rust project is to enable safe systems
+programming. Systems programming usually implies imperative
+programming, which in turn often implies side-effects, reasoning
+about shared state, et cetera.
+
At the same time, to provide safety, Rust programs and data types
+must be structured in a way that allows static checking to ensure
+soundness. Rust has features and restrictions that operate in tandem
+to ease writing programs that can pass these checks and thus ensure
+safety. For example, Rust incorporates the notion of ownership deeply
+into the language.
+
Rust's match expression is a construct that offers an interesting
+combination of such features and restrictions. A match expression
+takes an input value, classifies it, and then jumps to code written to
+handle the identified class of data.
+
In this post we explore how Rust processes such data via match.
+The crucial elements that match and its counterpart enum tie
+together are:
+
+-
+
Structural pattern matching: case analysis with ergonomics vastly
+improved over a C or Java style switch statement.
+
+-
+
Exhaustive case analysis: ensures that no case is omitted
+when processing an input.
+
+-
+
match embraces both imperative and functional styles of
+programming: you can continue using break statements, assignments,
+et cetera, rather than being forced to adopt an expression-oriented
+mindset.
+
+-
+
match "borrows" or "moves", as needed: Rust encourages the developer to
+think carefully about ownership and borrowing. To ensure that
+one is not forced to yield ownership of a value
+prematurely, match is designed with support for merely borrowing
+substructure (as opposed to always moving such substructure).
+
+
+
We cover each of the items above in detail below, but first we
+establish a foundation for the discussion: What does match look
+like, and how does it work?
+
The Basics of match
+
The match expression in Rust has this form:
+
match INPUT_EXPRESSION {
+ PATTERNS_1 => RESULT_EXPRESSION_1,
+ PATTERNS_2 => RESULT_EXPRESSION_2,
+ ...
+ PATTERNS_n => RESULT_EXPRESSION_n
+}
+
+
where each of the PATTERNS_i contains at least one pattern. A
+pattern describes a subset of the possible values to which
+INPUT_EXPRESSION could evaluate.
+The syntax PATTERNS => RESULT_EXPRESSION is called a "match arm",
+or simply "arm".
+
Patterns can match simple values like integers or characters; they
+can also match user-defined symbolic data, defined via enum.
+
The code below demonstrates generating the next guess (poorly) in a number
+guessing game, given the answer from a previous guess.
+
enum Answer {
+ Higher,
+ Lower,
+ Bingo,
+}
+
+fn suggest_guess(prior_guess: u32, answer: Answer) {
+ match answer {
+ Answer::Higher => println!("maybe try {} next", prior_guess + 10),
+ Answer::Lower => println!("maybe try {} next", prior_guess - 1),
+ Answer::Bingo => println!("we won with {}!", prior_guess),
+ }
+}
+
+#[test]
+fn demo_suggest_guess() {
+ suggest_guess(10, Answer::Higher);
+ suggest_guess(20, Answer::Lower);
+ suggest_guess(19, Answer::Bingo);
+}
+
+
(Incidentally, nearly all the code in this post is directly
+executable; you can cut-and-paste the code snippets into a file
+demo.rs, compile the file with --test, and run the resulting
+binary to see the tests run.)
+
Patterns can also match structured data (e.g. tuples, slices, user-defined
+data types) via corresponding patterns. In such patterns, one often
+binds parts of the input to local variables;
+those variables can then be used in the result expression.
+
The special _ pattern matches any single value, and is often used as
+a catch-all; the special .. pattern generalizes this by matching any
+series of values or name/value pairs.
+
Also, one can collapse multiple patterns into one arm by separating the
+patterns by vertical bars (|); thus that arm matches either this pattern,
+or that pattern, et cetera.
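
As a small warm-up, here is a minimal sketch of the wildcard, the .. pattern, and vertical bars working on a plain tuple. The function name describe_point and its messages are made up for illustration; they are not part of the guessing game.

```rust
// Hypothetical helper (not from the post): classify a 3-D point by its
// first coordinate only.
fn describe_point(p: (i32, i32, i32)) -> &'static str {
    match p {
        // `..` ignores every remaining field of the tuple.
        (0, ..) => "on the plane x = 0",
        // `|` lets a single arm cover several patterns;
        // each `_` ignores exactly one field.
        (1, _, _) | (-1, _, _) => "one unit away from the plane x = 0",
        // A lone `_` is the catch-all for everything else.
        _ => "somewhere else",
    }
}

#[test]
fn demo_describe_point() {
    assert_eq!(describe_point((0, 5, 7)), "on the plane x = 0");
    assert_eq!(describe_point((-1, 2, 3)), "one unit away from the plane x = 0");
    assert_eq!(describe_point((4, 0, 0)), "somewhere else");
}
```

Arms are tried from top to bottom, so the catch-all arm has to come last.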
+
These features are illustrated in the following revision to the
+guessing-game answer generation strategy:
+
struct GuessState {
+ guess: u32,
+ answer: Answer,
+ low: u32,
+ high: u32,
+}
+
+fn suggest_guess_smarter(s: GuessState) {
+ match s {
+ // First arm only fires on Bingo; it binds `p` to last guess.
+ GuessState { answer: Answer::Bingo, guess: p, .. } => {
+ // ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~ ~~
+ // | | | |
+ // | | | Ignore remaining fields
+ // | | |
+ // | | Copy value of field `guess` into local variable `p`
+ // | |
+ // | Test that `answer` field is equal to `Bingo`
+ // |
+ // Match against an instance of the struct `GuessState`
+
+ println!("we won with {}!", p);
+ }
+
+ // Second arm fires if answer was too low or too high.
+ // We want to find a new guess in the range (l..h), where:
+ //
+ // - If it was too low, then we want something higher, so we
+ // bind the guess to `l` and use our last high guess as `h`.
+ // - If it was too high, then we want something lower; bind
+ // the guess to `h` and use our last low guess as `l`.
+ GuessState { answer: Answer::Higher, low: _, guess: l, high: h } |
+ GuessState { answer: Answer::Lower, low: l, guess: h, high: _ } => {
+ // ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~ ~~~~~~~~ ~~~~~~~
+ // | | | | |
+ // | | | | Copy or ignore
+ // | | | | field `high`,
+ // | | | | as appropriate
+ // | | | |
+ // | | | Copy field `guess` into
+ // | | | local variable `l` or `h`,
+ // | | | as appropriate
+ // | | |
+ // | | Copy value of field `low` into local
+ // | | variable `l`, or ignore it, as appropriate
+ // | |
+ // | Test that `answer` field is equal
+ // | to `Higher` or `Lower`, as appropriate
+ // |
+ // Match against an instance of the struct `GuessState`
+
+ let mid = l + ((h - l) / 2);
+ println!("let's try {} next", mid);
+ }
+ }
+}
+
+#[test]
+fn demo_guess_state() {
+ suggest_guess_smarter(GuessState {
+ guess: 20, answer: Answer::Lower, low: 10, high: 1000
+ });
+}
+
+
This ability to simultaneously perform case analysis and bind input
+substructure leads to powerful, clear, and concise code, focusing the
+reader's attention directly on the data relevant to the case at hand.
+
That is match in a nutshell.
+
So, what is the interplay between this construct and Rust's approach to
+ownership and safety in general?
+
Exhaustive case analysis
+
+...when you have eliminated all which is impossible,
+then whatever remains, however improbable, must be the truth.
+-- Sherlock Holmes (Arthur Conan Doyle, "The Blanched Soldier")
+
+
One useful way to tackle a complex problem is to break it down
+into individual cases and analyze each case individually.
+For this method of problem solving to work, the breakdown must be
+collectively exhaustive; all of the cases you identified must
+actually cover all possible scenarios.
+
Using enum and match in Rust can aid this process, because
+match enforces exhaustive case analysis:
+Every possible input value for a match must be covered by the pattern
+in at least one arm in the match.
+
This helps catch bugs in program logic and ensures that the value of a
+match expression is well-defined.
+
So, for example, the following code is rejected at compile-time.
+
fn suggest_guess_broken(prior_guess: u32, answer: Answer) {
+ let next_guess = match answer {
+ Answer::Higher => prior_guess + 10,
+ Answer::Lower => prior_guess - 1,
+ // ERROR: non-exhaustive patterns: `Bingo` not covered
+ };
+ println!("maybe try {} next", next_guess);
+}
+
+
Many other languages offer a pattern matching construct (ML and
+various macro-based match implementations in Scheme both come to
+mind), but not all of them have this restriction.
+
Rust has this restriction for these reasons:
+
+-
+
First, as noted above, dividing a problem into cases only yields a
+general solution if the cases are exhaustive. Exhaustiveness-checking
+exposes logical errors.
+
+-
+
Second, exhaustiveness-checking can act as a refactoring aid. During
+the development process, I often add new variants for a particular
+enum definition. The exhaustiveness-check helps point out all of
+the match expressions where I only wrote the cases from the prior
+version of the enum type.
+
+-
+
Third, since match is an expression form, exhaustiveness ensures
+that such expressions always either evaluate to a value of the correct type,
+or jump elsewhere in the program.
+
+
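The refactoring point can be sketched as follows. AnswerV2 is a hypothetical copy of the Answer enum with one extra variant, GiveUp, and describe_answer is an invented helper. Because the match lists every variant and has no catch-all arm, deleting the GiveUp arm would make the compiler reject the function with a non-exhaustive patterns error.

```rust
// Hypothetical revision of the post's `Answer` enum (assumed for this sketch).
enum AnswerV2 {
    Higher,
    Lower,
    Bingo,
    GiveUp, // the newly added variant
}

fn describe_answer(answer: &AnswerV2) -> &'static str {
    // No `_` arm here on purpose: every variant is spelled out, so adding
    // another variant later turns this match into a compile-time error
    // until the new case is handled.
    match *answer {
        AnswerV2::Higher => "go higher",
        AnswerV2::Lower => "go lower",
        AnswerV2::Bingo => "correct",
        AnswerV2::GiveUp => "player gave up",
    }
}

#[test]
fn demo_describe_answer() {
    assert_eq!(describe_answer(&AnswerV2::GiveUp), "player gave up");
}
```

The trade-off is that a `_` arm would keep old code compiling after the enum grows, but it would also silence exactly the reminder described above.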
+
Jumping out of a match
+
The following code is a fixed version of the suggest_guess_broken
+function we saw above; it directly illustrates "jumping elsewhere":
+
fn suggest_guess_fixed(prior_guess: u32, answer: Answer) {
+ let next_guess = match answer {
+ Answer::Higher => prior_guess + 10,
+ Answer::Lower => prior_guess - 1,
+ Answer::Bingo => {
+ println!("we won with {}!", prior_guess);
+ return;
+ }
+ };
+ println!("maybe try {} next", next_guess);
+}
+
+#[test]
+fn demo_guess_fixed() {
+ suggest_guess_fixed(10, Answer::Higher);
+ suggest_guess_fixed(20, Answer::Lower);
+ suggest_guess_fixed(19, Answer::Bingo);
+}
+
+
The suggest_guess_fixed function illustrates that match can handle
+some cases early (and then immediately return from the function),
+while computing whatever values are needed from the remaining cases
+and letting them fall through to the remainder of the function
+body.
+
We can add such special case handling via match without fear
+of overlooking a case, because match will force the case
+analysis to be exhaustive.
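
Returning is not the only way to jump out of an arm; break and continue work the same way inside a loop. The following is a minimal sketch with an invented helper, sum_leading_digits, which is not part of the guessing game.

```rust
// Hypothetical helper: add up the decimal digits at the start of `text`,
// stopping at the first character that is not a digit.
fn sum_leading_digits(text: &str) -> u32 {
    let mut total = 0;
    for c in text.chars() {
        match c.to_digit(10) {
            // This arm is executed for its side effect on `total`.
            Some(d) => total += d,
            // This arm jumps out of the enclosing `for` loop.
            None => break,
        }
    }
    total
}

#[test]
fn demo_sum_leading_digits() {
    assert_eq!(sum_leading_digits("123abc9"), 6);
    assert_eq!(sum_leading_digits("x1"), 0);
}
```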
+
Algebraic Data Types and Structural Invariants
+
Algebraic data types succinctly describe classes of data and allow one
+to encode rich structural invariants. Rust uses enum and struct
+definitions for this purpose.
+
An enum type allows one to define mutually-exclusive classes of
+values. The examples shown above used enum for simple symbolic tags,
+but in Rust, enums can define much richer classes of data.
+
For example, a binary tree is either a leaf, or an internal node with
+references to two child trees. Here is one way to encode a tree of
+integers in Rust:
+
enum BinaryTree {
+ Leaf(i32),
+ Node(Box<BinaryTree>, i32, Box<BinaryTree>)
+}
+
+
(The Box<V> type describes an owning reference to a heap-allocated
+instance of V; if you own a Box<V>, then you also own the V it
+contains, and can mutate it, lend out references to it, et cetera.
+When you finish with the box and let it fall out of scope, it will
+automatically clean up the resources associated with the
+heap-allocated V.)
+
The above enum definition ensures that if we are given a BinaryTree, it
+will always fall into one of the above two cases. One will never
+encounter a BinaryTree::Node that does not have a left-hand child.
+There is no need to check for null.
+
One does need to check whether a given BinaryTree is a Leaf or
+is a Node, but the compiler statically ensures such checks are done:
+you cannot accidentally interpret the data of a Leaf as if it were a
+Node, nor vice versa.
+
Here is a function that sums all of the integers in a tree
+using match.
+
fn tree_weight_v1(t: BinaryTree) -> i32 {
+ match t {
+ BinaryTree::Leaf(payload) => payload,
+ BinaryTree::Node(left, payload, right) => {
+ tree_weight_v1(*left) + payload + tree_weight_v1(*right)
+ }
+ }
+}
+
+/// Returns a tree that looks like:
+///
+/// +----(4)---+
+/// | |
+/// +-(2)-+ [5]
+/// | |
+/// [1] [3]
+///
+fn sample_tree() -> BinaryTree {
+ let l1 = Box::new(BinaryTree::Leaf(1));
+ let l3 = Box::new(BinaryTree::Leaf(3));
+ let n2 = Box::new(BinaryTree::Node(l1, 2, l3));
+ let l5 = Box::new(BinaryTree::Leaf(5));
+
+ BinaryTree::Node(n2, 4, l5)
+}
+
+#[test]
+fn tree_demo_1() {
+ let tree = sample_tree();
+ assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5);
+}
+
+
Algebraic data types establish structural invariants that are strictly
+enforced by the language. (Even richer representation invariants can
+be maintained via the use of modules and privacy; but let us not
+digress from the topic at hand.)
+
Both expression- and statement-oriented
+
Unlike many languages that offer pattern matching, Rust embraces
+both statement- and expression-oriented programming.
+
Many functional languages that offer pattern matching encourage one to
+write in an "expression-oriented style", where the focus is always on
+the values returned by evaluating combinations of expressions, and
+side-effects are discouraged. This style contrasts with imperative
+languages, which encourage a statement-oriented style that focuses on
+sequences of commands executed solely for their side-effects.
+
Rust excels in supporting both styles.
+
Consider writing a function which maps a non-negative integer to a
+string rendering it as an ordinal ("1st", "2nd", "3rd", ...).
+
The following code uses range patterns to simplify things, but also,
+it is written in a style similar to a switch in a statement-oriented
+language like C (or C++, Java, et cetera), where the arms of the
+match are executed for their side-effect alone:
+
fn num_to_ordinal(x: u32) -> String {
+ let suffix;
+ match (x % 10, x % 100) {
+ // `..=` is the inclusive range pattern (older compilers spelled it `...`).
+ (1, 1) | (1, 21..=91) => {
+ suffix = "st";
+ }
+ (2, 2) | (2, 22..=92) => {
+ suffix = "nd";
+ }
+ (3, 3) | (3, 23..=93) => {
+ suffix = "rd";
+ }
+ _ => {
+ suffix = "th";
+ }
+ }
+ return format!("{}{}", x, suffix);
+}
+
+#[test]
+fn test_num_to_ordinal() {
+ assert_eq!(num_to_ordinal( 0), "0th");
+ assert_eq!(num_to_ordinal( 1), "1st");
+ assert_eq!(num_to_ordinal( 12), "12th");
+ assert_eq!(num_to_ordinal( 22), "22nd");
+ assert_eq!(num_to_ordinal( 43), "43rd");
+ assert_eq!(num_to_ordinal( 67), "67th");
+ assert_eq!(num_to_ordinal(1901), "1901st");
+}
+
+
The Rust compiler accepts the above program. This is notable because
+its static analyses ensure both:
+
+-
+
suffix is always initialized before we run the format! at the end
+of the function, and
+
+-
+
suffix is assigned at most once during the function's execution (because if
+we could assign suffix multiple times, the compiler would force us
+to mark suffix as mutable).
+
+
+
To be clear, the above program certainly can be written in an
+expression-oriented style in Rust; for example, like so:
+
fn num_to_ordinal_expr(x: u32) -> String {
+ format!("{}{}", x, match (x % 10, x % 100) {
+ (1, 1) | (1, 21..=91) => "st",
+ (2, 2) | (2, 22..=92) => "nd",
+ (3, 3) | (3, 23..=93) => "rd",
+ _ => "th"
+ })
+}
+
+
Sometimes expression-oriented style can yield very succinct code;
+other times the style requires contortions that can be
+avoided by writing in a statement-oriented style.
+(The ability to return from one match arm in the
+suggest_guess_fixed function earlier was an example of this.)
+
Each of the styles has its use cases. Crucially, switching to a
+statement-oriented style in Rust does not sacrifice every other
+feature that Rust provides, such as the guarantee that a non-mut
+binding is assigned at most once.
+
An important case where this arises is when one wants to
+initialize some state and then borrow from it, but only on
+some control-flow branches.
+
fn sometimes_initialize(input: i32) {
+ let string: String; // a dynamically-constructed string value
+ let borrowed: &str; // a reference to string data
+ match input {
+ 0..=100 => {
+ // Construct a String on the fly...
+ string = format!("input prints as {}", input);
+ // ... and then borrow from inside it.
+ borrowed = &string[6..];
+ }
+ _ => {
+ // String literals are *already* borrowed references
+ borrowed = "expected between 0 and 100";
+ }
+ }
+ println!("borrowed: {}", borrowed);
+
+ // Below would cause compile-time error if uncommented...
+
+ // println!("string: {}", string);
+
+ // ...namely: error: use of possibly uninitialized variable: `string`
+}
+
+#[test]
+fn demo_sometimes_initialize() {
+ sometimes_initialize(23); // this invocation will initialize `string`
+ sometimes_initialize(123); // this one will not
+}
+
+
The interesting thing about the above code is that after the match,
+we are not allowed to directly access string, because the compiler
+requires that the variable be initialized on every path through the
+program before it can be accessed.
+At the same time, we can, via borrowed, access data that
+may be held within string, because a reference to that data is held by the
+borrowed variable when we go through the first match arm, and we
+ensure borrowed itself is initialized on every execution path
+through the program that reaches the println! that uses borrowed.
+
(The compiler ensures that no outstanding borrows of the
+string data could possibly outlive string itself, and the
+generated code ensures that at the end of the scope of string, its
+data is deallocated if it was previously initialized.)
+
In short, for soundness, the Rust language ensures that data is always
+initialized before it is referenced, but the designers have strived to
+avoid requiring artificial coding patterns adopted solely to placate
+Rust's static analyses (such as requiring one to initialize string
+above with some dummy data, or requiring an expression-oriented style).
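
The same shape works when the result is returned rather than printed. The sketch below uses an invented function, describe_len, with made-up messages. The binding owned gets a value in only one arm and never needs a dummy initializer, while label is assigned exactly once on every path.

```rust
// Hypothetical helper: describe how many items a collection holds.
fn describe_len(len: usize) -> String {
    let owned: String; // initialized only in the last arm
    let label: &str; // initialized in every arm, so it is usable afterwards
    match len {
        // String literals are already borrowed, so no owner is needed.
        0 => label = "empty",
        1 => label = "singleton",
        n => {
            // Build a String on this path only, then borrow from it.
            owned = format!("{} items", n);
            label = owned.as_str();
        }
    }
    label.to_string()
}

#[test]
fn demo_describe_len() {
    assert_eq!(describe_len(0), "empty");
    assert_eq!(describe_len(3), "3 items");
}
```

Neither binding is declared mut, so the at-most-once assignment guarantee still applies to both.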
+
Matching without moving
+
Matching an input can borrow input substructure, without taking
+ownership; this is crucial for matching a reference (e.g. a value of
+type &T).
+
The "Algebraic Data Types" section above described a tree datatype, and
+showed a program that computed the sum of the integers in a tree
+instance.
+
That version of tree_weight has one big downside, however: it takes
+its input tree by value. Once you pass a tree to tree_weight_v1, that
+tree is gone (as in, deallocated).
+
#[test]
+fn tree_demo_v1_fails() {
+ let tree = sample_tree();
+ assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5);
+
+ // If you uncomment this line below ...
+
+ // assert_eq!(tree_weight_v1(tree), (1 + 2 + 3) + 4 + 5);
+
+ // ... you will get: error: use of moved value: `tree`
+}
+
+
This is not a consequence, however, of using match; it is rather
+a consequence of the function signature that was chosen:
+
fn tree_weight_v1(t: BinaryTree) -> i32 { 0 }
+// ^~~~~~~~~~ this means this function takes ownership of `t`
+
+
In fact, in Rust, match is designed to work quite well without
+taking ownership. In particular, the input to match is an L-value
+expression; this means that the input expression is evaluated to a
+memory location where the value lives.
+match works by doing this evaluation and then
+inspecting the data at that memory location.
+
(If the input expression is a variable name or a field/pointer
+dereference, then the L-value is just the location of that variable or
+field/memory. If the input expression is a function call or other
+operation that generates an unnamed temporary value, then it will be
+conceptually stored in a temporary area, and that is the memory
+location that match will inspect.)
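
A minimal sketch of this in-place inspection, using an invented Player struct and has_answer helper: the match input is a field reached through a borrowed reference, and since no arm binds any part of the value, nothing is moved out of it.

```rust
// Hypothetical type for illustration; not part of the guessing game above.
struct Player {
    last_answer: Option<String>,
}

fn has_answer(p: &Player) -> bool {
    // `p.last_answer` names a memory location inside the borrowed `Player`.
    // `match` inspects the value at that location; the `_` pattern binds
    // nothing, so the `String` stays where it is.
    match p.last_answer {
        Some(_) => true,
        None => false,
    }
}

#[test]
fn demo_has_answer() {
    let p = Player { last_answer: Some("higher".to_string()) };
    assert!(has_answer(&p));
    // `p` is still fully usable: the match above moved nothing.
    assert!(has_answer(&p));
}
```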
+
So, if we want a version of tree_weight that merely borrows a tree
+rather than taking ownership of it, then we will need to make use of
+this feature of Rust's match.
+
fn tree_weight_v2(t: &BinaryTree) -> i32 {
+ // ^~~~~~~~~~~ The `&` means we are *borrowing* the tree
+ match *t {
+ BinaryTree::Leaf(payload) => payload,
+ BinaryTree::Node(ref left, payload, ref right) => {
+ tree_weight_v2(left) + payload + tree_weight_v2(right)
+ }
+ }
+}
+
+#[test]
+fn tree_demo_2() {
+ let tree = sample_tree();
+ assert_eq!(tree_weight_v2(&tree), (1 + 2 + 3) + 4 + 5);
+}
+
+
The function tree_weight_v2 looks very much like tree_weight_v1.
+The only differences are: we take t as a borrowed reference (the &
+in its type), we added a dereference *t, and,
+importantly, we use ref-bindings for left and
+right in the Node case.
+
The dereference *t, interpreted as an L-value expression, is just
+extracting the memory address where the BinaryTree is represented
+(since the t: &BinaryTree is just a reference to that data in
+memory). The *t here is not making a copy of the tree, nor moving it
+to a new temporary location, because match is treating it as an
+L-value.
+
The only piece left is the ref-binding, which
+is a crucial part of how the destructuring binding of
+L-values works.
+
First, let us carefully state the meaning of a non-ref binding:
+
+-
+
When matching a value of type T, an identifier pattern i will, on
+a successful match, move the value out of the original input and
+into i. Thus we can always conclude in such a case that i has type
+T (or more succinctly, "i: T").
+For some types T, known as copyable T (also pronounced "T
+implements Copy"), the value will in fact be copied into i for such
+identifier patterns. (Note that in general, an arbitrary type T is not copyable.)
+Either way, such pattern bindings do mean that the variable i has
+ownership of a value of type T.
+
+
+
Thus, the bindings of payload in tree_weight_v2 both have type
+i32; the i32 type implements Copy, so the weight is copied into
+payload in both arms.
+
Now we are ready to state what a ref-binding is:
+
+- When matching an L-value of type T, a ref-pattern ref i
+will, on a successful match, merely borrow a reference into the
+matched data. In other words, a successful ref i match of a value of
+type T will imply that i has the type of a reference to T
+(or more succinctly, "i: &T").
+
+
Thus, in the Node arm of
+tree_weight_v2, left will be a reference to the left-hand box (which
+holds a tree), and right will likewise reference the right-hand tree.
+
We can pass these borrowed references to trees into the recursive calls to tree_weight_v2,
+as the code demonstrates.
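
For readers on newer compilers: since Rust 1.26, default binding modes (often called "match ergonomics") let one match on the reference itself, and the bindings then become references without any ref keyword. The sketch below restates the tree type under the hypothetical name Tree so that it stands alone, and tree_weight_v3 is an invented name. It is meant as an equivalent spelling, not a replacement for the explanation above.

```rust
// A stand-alone copy of the post's tree type, renamed to avoid a clash.
enum Tree {
    Leaf(i32),
    Node(Box<Tree>, i32, Box<Tree>),
}

fn tree_weight_v3(t: &Tree) -> i32 {
    // Matching a `&Tree` against non-reference patterns makes every
    // binding a reference: `payload: &i32`, `left`/`right`: `&Box<Tree>`.
    match t {
        Tree::Leaf(payload) => *payload,
        Tree::Node(left, payload, right) => {
            tree_weight_v3(left) + *payload + tree_weight_v3(right)
        }
    }
}

#[test]
fn tree_demo_v3() {
    let left = Tree::Node(Box::new(Tree::Leaf(1)), 2, Box::new(Tree::Leaf(3)));
    let tree = Tree::Node(Box::new(left), 4, Box::new(Tree::Leaf(5)));
    assert_eq!(tree_weight_v3(&tree), (1 + 2 + 3) + 4 + 5);
}
```

One visible difference: payload is now a reference as well, so it must be dereferenced explicitly, whereas the ref-based version copied it.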
+
Likewise, a ref mut-pattern (ref mut i) will, on a successful
+match, borrow a mutable reference into the input: i: &mut T. This allows
+mutation and ensures there are no other active references to that data
+at the same time. A destructuring
+binding form like match allows one to take mutable references to
+disjoint parts of the data simultaneously.
+
This code demonstrates this concept by incrementing all of the
+values in a given tree.
+
fn tree_grow(t: &mut BinaryTree) {
+ // ^~~~~~~~~~~~~~~ `&mut`: we have exclusive access to the tree
+ match *t {
+ BinaryTree::Leaf(ref mut payload) => *payload += 1,
+ BinaryTree::Node(ref mut left, ref mut payload, ref mut right) => {
+ tree_grow(left);
+ *payload += 1;
+ tree_grow(right);
+ }
+ }
+}
+
+#[test]
+fn tree_demo_3() {
+ let mut tree = sample_tree();
+ tree_grow(&mut tree);
+ assert_eq!(tree_weight_v2(&tree), (2 + 3 + 4) + 5 + 6);
+}
+
+
Note that the code above now binds payload by a ref mut-pattern;
+if it did not use a ref pattern, then payload would be bound to a
+local copy of the integer, while we want to modify the actual integer
+in the tree itself. Thus we need a reference to that integer.
+
Note also that the code is able to bind left and right
+simultaneously in the Node arm. The compiler knows that the two
+values cannot alias, and thus it allows both &mut-references to live
+simultaneously.
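
The same idea on a flat struct, as a minimal sketch: Bounds and normalize are invented names, loosely modeled on the low and high fields of GuessState. One pattern hands out two mutable references at once, and the compiler accepts it because they point at distinct fields.

```rust
// Hypothetical type for illustration.
struct Bounds {
    low: u32,
    high: u32,
}

// Ensure that `low <= high`, swapping the two fields if necessary.
fn normalize(b: &mut Bounds) {
    match *b {
        // Two `&mut u32` bindings into disjoint fields, alive together.
        Bounds { ref mut low, ref mut high } => {
            if *low > *high {
                std::mem::swap(low, high);
            }
        }
    }
}

#[test]
fn demo_normalize() {
    let mut b = Bounds { low: 9, high: 2 };
    normalize(&mut b);
    assert_eq!((b.low, b.high), (2, 9));
}
```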
+
Conclusion
+
Rust takes the ideas of algebraic data types and pattern matching
+pioneered by the functional programming languages, and adapts them to
+imperative programming styles and Rust's own ownership and borrowing
+systems. The enum and match forms provide clean data definitions
+and expressive power, while static analysis ensures that the resulting
+programs are safe.
+
For more information
+on details that were not covered here, such as:
+
+-
+
how to say Higher instead of Answer::Higher in a pattern,
+
+-
+
defining new named constants,
+
+-
+
binding via ident @ pattern, or
+
+-
+
the potentially subtle difference between { let id = expr; ... } versus match expr { id => { ... } },
+
+
+
consult the Rust
+documentation, or quiz our awesome community (in #rust on IRC, or in
+the user group).
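
As a brief taste of two of those items, the sketch below imports the enum's variants so that patterns can say Higher directly, and uses ident @ pattern to test a range while also binding the matched value. The Answer enum is restated so the snippet stands alone; comment_on and its messages are invented for illustration.

```rust
enum Answer {
    Higher,
    Lower,
    Bingo,
}

// Bring the variants into scope so patterns need no `Answer::` prefix.
use Answer::{Bingo, Higher, Lower};

// Hypothetical helper: comment on a guess given the answer it received.
fn comment_on(answer: Answer, guess: u32) -> String {
    match (answer, guess) {
        (Bingo, n) => format!("{} is right", n),
        // `n @ 0..=9` checks the range and binds the matched value to `n`.
        (Higher, n @ 0..=9) => format!("{} is a single digit; aim much higher", n),
        (Higher, n) => format!("aim higher than {}", n),
        (Lower, n) => format!("aim lower than {}", n),
    }
}

#[test]
fn demo_comment_on() {
    assert_eq!(comment_on(Higher, 7), "7 is a single digit; aim much higher");
    assert_eq!(comment_on(Lower, 50), "aim lower than 50");
}
```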
+
(Many thanks to those who helped review this post, especially Aaron Turon
+and Niko Matsakis, as well as
+Mutabah, proc, libfud, asQuirrel, and annodomini from #rust.)
+
+