Functional Algorithms, Verified!
Isabelle
Isabelle [63, 76, 57] is a proof assistant for the logic HOL (= Higher-Order
Logic), which is why the system is often called Isabelle/HOL. HOL is a gen-
eralization of first-order logic: functions can be passed as parameters and
returned as results, just as in functional programming, and they can be quan-
tified over. Isabelle’s version of HOL also supports a simple version of Haskell’s
type classes.
The main strength of Isabelle and other proof assistants is their trustwor-
thiness: all definitions, lemma statements, and inferences are checked. Beyond
trustworthiness, formal proofs can also clarify arguments, by exposing and
explaining difficult steps. Most Isabelle users will confirm that their pen-and-
paper proofs have become more lucid, and more correct, after they subjected
themselves to the discipline of formal proof.
As emphasized above, the reader need not be familiar with Isabelle or
HOL in order to read this book. However, to take full advantage of our proof
assistant approach, readers are encouraged to learn how to write Isabelle def-
initions and proofs themselves — and to solve some of the exercises in this
book. To this end we recommend the tutorial Programming and Proving in
Isabelle/HOL [52], which is also Part I of the book Concrete Semantics [56].
Prerequisites
We expect the reader to be familiar with
• the basics of discrete mathematics: propositional and first-order logic, sets
and relations, proof principles including induction;
• a typed functional programming language like Haskell [27], OCaml [60] or
Standard ML [64];
• simple inductive proofs about functional programs.
Under Development
This book is meant to grow. New chapters are meant to be added over time.
The list of authors is meant to grow — you could become one of them!
Colour
For the quick orientation of the reader, definitions are displayed in coloured
boxes:
From a logical point of view there is no difference between the two kinds of
definitions except that auxiliary definitions need not be executable.
1 Basics 7
2 Sorting 19
3 Selection 35
4 Binary Trees 49
7 2-3 Trees 81
8 Red-Black Trees 91
12 Tries 143
20 Queues 219
Part V Appendix
C Notation 265
References 271
Index 277
1
Basics
In this chapter we describe the basic building blocks the book rests on.
Programs: The functional programming language we use is merely sketched
because of its similarity with other well known functional languages.
Predefined types and notation: We introduce the basic predefined types
and notations used in the book.
Inductive proofs: Although we do not explain proofs in general, we make
an exception for certain inductive proofs.
Running time: We explain how we model running time by step counting
functions.
1.1 Programs
A useful bit of notation: any infix operator can be turned into a function
by enclosing it in parentheses, e.g. (+).
1.2 Types
Type variables are denoted by 'a, 'b, etc. The function type arrow is ⇒.
Type constructor names follow their argument types, e.g. 'a list. The notation
t :: τ means that term t has type τ. The following types are predefined.
Booleans
Type bool comes with the constants True and False and the usual operations.
Numbers
There are three numeric types: the natural numbers nat (0, 1, ...), the integers
int and the real numbers real. They correspond to the mathematical sets
ℕ, ℤ and ℝ and not to any machine arithmetic. All three types come with the
usual (overloaded) operations.
Sets
The type 'a set of sets (finite and infinite) over type 'a comes with the
standard mathematical operations. The minus sign "−", unary or binary, can
denote set complement or difference.
Lists
The type 'a list of lists whose elements are of type 'a is a recursive data type:
Constant Nil represents the empty list and Cons x xs represents the list with
first element x, the head, and rest list xs, the tail. The following syntactic
sugar is sprinkled on top:
[]             ≡ Nil
x # xs         ≡ Cons x xs
[x₁, ..., xₙ]   ≡ x₁ # ... # xₙ # []
The symbol ≡ means that the left-hand side is merely an abbreviation of the
right-hand side.
A library of predefined functions on lists is shown in Appendix A. The
length of a list xs is denoted by |xs|.
Type 'a option

Pairs are written (a, b). Functions fst and snd select the first and second
component of a pair: fst (a, b) = a and snd (a, b) = b. The type unit
contains only a single element (), the empty tuple.
Functions are defined by equations and pattern matching, for example over
lists. Natural numbers may also be used in pattern-matching definitions:
fib (n + 2) = fib (n + 1) + fib n
Occasionally we use an extension of pattern matching where patterns can
be named. For example, the defining equation
f (x # ( y # zs =: ys )) = ys @ zs
introduces a variable ys on the left that stands for y # zs and can be referred
to on the right. Logically it is just an abbreviation of
f (x # y # zs ) = ( let ys = y # zs in ys @ zs )
although it is suggestive of a more efficient interpretation. The general format
is pattern =: variable.
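Readers who want to try this in Haskell can use as-patterns, which play the same role, except that the name is written before the pattern rather than after it. A minimal sketch of mine, not taken from the Isabelle development:

-- Haskell counterpart of the named-pattern example above:
-- the as-pattern ys@(y : zs) names the sublist y : zs.
f :: [a] -> [a]
f (x : ys@(y : zs)) = ys ++ zs

-- the logically equivalent desugared form
f' :: [a] -> [a]
f' (x : y : zs) = let ys = y : zs in ys ++ zs

Both versions are partial (there is no equation for lists with fewer than two elements), mirroring the example in the text.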
The numeric types nat, int and real are all distinct. Converting between them
requires explicit coercion functions, in particular the inclusion functions
int :: nat ⇒ int and real :: nat ⇒ real that do not lose any information.
1.2.3 Multisets
⦃⦄ :: 'a multiset
(∈#) :: 'a ⇒ 'a multiset ⇒ bool
Their meaning: ⦃⦄ is the empty multiset; (∈#) is the element test; add_mset
adds an element to a multiset; (+) is the sum of two multisets, where multiplicities
of elements are added; size M, written |M|, is the number of elements
in M, taking multiplicities into account; mset converts a list into a multiset
by forgetting about the order of elements; set_mset converts a multiset into a
set; image_mset applies a function to all elements of a multiset; filter_mset
removes all elements from a multiset that do not satisfy the given predicate;
sum_mset is the sum of the values of a multiset, the iteration of (+) (taking
multiplicity into account).
We use some additional suggestive syntax for some of these operations:
⦃x ∈# M | P x⦄       ≡ filter_mset P M
⦃f x | x ∈# M⦄       ≡ image_mset f M
∑# M                 ≡ sum_mset M
∑x∈#M f x            ≡ sum_mset (image_mset f M)
See Section C.3 in the appendix for an overview of such syntax.
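Haskell has no multiset type in its standard library; for experimentation, the operations above can be approximated by a finite map from elements to multiplicities. The following sketch is mine (names such as addMset are not Isabelle's):

import qualified Data.Map.Strict as M

-- a multiset as a map from elements to their (positive) multiplicities
type Multiset a = M.Map a Int

emptyMset :: Multiset a                                       -- ⦃⦄
emptyMset = M.empty

memberMset :: Ord a => a -> Multiset a -> Bool                -- (∈#)
memberMset x m = M.findWithDefault 0 x m > 0

addMset :: Ord a => a -> Multiset a -> Multiset a             -- add_mset
addMset x = M.insertWith (+) x 1

plusMset :: Ord a => Multiset a -> Multiset a -> Multiset a   -- (+)
plusMset = M.unionWith (+)

sizeMset :: Multiset a -> Int                                 -- |M|
sizeMset = sum . M.elems

mset :: Ord a => [a] -> Multiset a                            -- forget the order
mset = foldr addMset emptyMset

filterMset :: Ord a => (a -> Bool) -> Multiset a -> Multiset a  -- filter_mset
filterMset p = M.filterWithKey (\x _ -> p x)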
1.3 Notation
1.4 Proofs
Proofs are the raison d’être of this book. Thus we present them in more
detail than is customary in a book on algorithms. However, not all proofs:
• We omit proofs of simple properties of numbers, lists, sets and multisets,
our predefined types. Obvious properties (e.g. |xs @ ys| = |xs| + |ys| or
commutativity of ∪) are used implicitly without proof.
• With some exceptions, we only state properties if their proofs require
induction, in which case we will say so, and we will always indicate which
supporting properties were used.
• If a property is simply described as “inductive” or its proof is described by a
phrase like “by an easy/automatic induction” it means that in the Isabelle
proofs all cases of the induction were automatic, typically by simplification.
As a simple example of an easy induction consider the append function
(@) :: 'a list ⇒ 'a list ⇒ 'a list
[] @ ys = ys
(x # xs) @ ys = x # xs @ ys
Because most of our proofs are about recursive functions, most of them are
by induction, and we say so explicitly. If we do not state explicitly what form
the induction takes, it is by an obvious structural induction. The alternative
and more general induction schema is computation induction where the
induction follows the terminating computation, but from the bottom up. For
example, the terminating recursive definition for gcd :: nat ⇒ nat ⇒ nat
gcd m n = (if n = 0 then m else gcd n (m mod n))
gives rise to the following induction schema:
If (n ≠ 0 ⟶ P n (m mod n)) ⟶ P m n (for all m and n),
then P m n (for all m and n).
In general, let f :: τ ⇒ τ' be a terminating function of, for simplicity, one
argument. Proving P (x :: τ) by induction on the computation of f means
proving
P r₁ ∧ ... ∧ P rₙ ⟶ P e
for every defining equation
f e = ... f r₁ ... f rₙ ...
where f r₁, ..., f rₙ are all the recursive calls. For simplicity we have ignored
the if and case contexts that a recursive call f rᵢ occurs in and that
should be preconditions of the assumption P rᵢ as in the gcd example. If the
defining equations for f overlap, the above proof obligations are stronger than
necessary.
E[[f p₁ ... pₙ = e]] = (Tf p₁ ... pₙ = T[[e]] + 1)

T[[g e₁ ... eₖ]] = T[[e₁]] + ... + T[[eₖ]] + Tg e₁ ... eₖ    (1.1)

T[[if b then e₁ else e₂]] = T[[b]] + (if b then T[[e₁]] else T[[e₂]])

T[[case e of p₁ ⇒ e₁ | ... | pₖ ⇒ eₖ]]
= T[[e]] + (case e of p₁ ⇒ T[[e₁]] | ... | pₖ ⇒ T[[eₖ]])

T[[let x = e₁ in e₂]] = T[[e₁]] + (λx. T[[e₂]]) e₁

Note that T[[e₂]] must be evaluated completely before (λx. T[[e₂]]) e₁
is β-contracted to ensure that T[[x]] = 1 (see above) kicks in instead of
generating multiple copies of T[[e₁]].
• For simplicity we restrict ourselves to a first-order language above. Nev-
ertheless we use a few basic higher-order functions like map in the book.
Their running time functions are defined in Appendix B.1.
As an example consider the append function (@) defined above. The defining
equations for Tappend :: 'a list ⇒ 'a list ⇒ nat are easily derived. The
first equation translates like this:
E[[[] @ ys = ys]]
= (Tappend [] ys = T[[ys]] + 1)
= (Tappend [] ys = 2)
The right-hand side of the second equation translates like this:
T[[x # (xs @ ys)]]
= T[[x]] + T[[xs @ ys]] + TCons x (xs @ ys)
= 1 + (1 + 1 + Tappend xs ys) + 1
Tappend [] ys = 1
Tappend (x # xs ) ys = Tappend xs ys + 1
Unless noted otherwise we always work with canonical running time functions.
This may also involve looking inside if/case when reducing additive constants,
e.g. reducing (if b then 2 + e else 3) + 2 to if b then 1 + e else 1.
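As a quick sanity check of the scheme, the canonical time function of (@) can be transcribed to Haskell and evaluated. This small sketch is my own and not part of the Isabelle development:

-- canonical running time of append: one step per call
tAppend :: [a] -> [a] -> Int
tAppend []       _  = 1
tAppend (_ : xs) ys = tAppend xs ys + 1

-- tAppend xs ys equals length xs + 1
main :: IO ()
main = print (tAppend "abc" "de")   -- prints 4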
Occasionally we will apply one more simplification step: when defining
some Tf, you may drop additive constants when they are added to some term
Tg ... where Tg has already been defined (i.e. g ≠ f and f and g are not
mutually recursive). This is because by construction all Tg return positive
results and adding a constant does not change O(Tf).
In the main body of the book we initially show the definition of each
Tf . Once the principles above have been exemplified sufficiently, the time
functions are relegated to Appendix B.
This section exemplifies not just the definition of time functions but also their
analysis. The standard list reversal function rev is defined in Appendix A.
This is the corresponding canonical time function:
Trev :: 'a list ⇒ nat
Trev [] = 1
Trev (x # xs ) = Trev xs + Tappend (rev xs ) [x ] + 1
itrev :: 'a list ⇒ 'a list ⇒ 'a list
itrev [] ys = ys
itrev (x # xs) ys = itrev xs (x # ys)

Titrev :: 'a list ⇒ 'a list ⇒ nat
Titrev [] _ = 1
Titrev (x # xs) ys = Titrev xs (x # ys) + 1
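Transcribed to Haskell (my own sketch, not from the Isabelle theories), the two reversal functions and their canonical time functions can be compared directly; rev is quadratic, itrev linear:

rev :: [a] -> [a]                  -- naive reversal
rev []       = []
rev (x : xs) = rev xs ++ [x]

tRev :: [a] -> Int                 -- Trev; Tappend xs ys = length xs + 1
tRev []       = 1
tRev (_ : xs) = tRev xs + (length (rev xs) + 1) + 1

itrev :: [a] -> [a] -> [a]         -- reversal with an accumulator
itrev []       ys = ys
itrev (x : xs) ys = itrev xs (x : ys)

tItrev :: [a] -> [a] -> Int        -- Titrev
tItrev []       _  = 1
tItrev (x : xs) ys = tItrev xs (x : ys) + 1

main :: IO ()
main = print (tRev [1 .. 10 :: Int], tItrev [1 .. 10 :: Int] [])   -- (66,11)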
1.5.2 Discussion
Analysing the running time of any program, let alone a functional one, is
tricky unless you have two formally specified entities: a machine model that
includes a notion of running time and a translation from programs to ma-
chines. For imperative programs the standard model is the Random Access
Machine (RAM) where each instruction takes one time unit. For functional
programs a standard measure is the number of function calls. We adopt this
measure and equate it with time although a more honest name would be
computation steps.
A full proof that the execution time of our functional programs is in O(Tf )
on some actual soft- and hardware is a major undertaking: one would need
to formalize the full stack of compiler, runtime system and hardware. We
do not offer such a proof. Thus our formalization of “time” should be seen as
conditional: given a stack that satisfies our basic assumptions in the definition
of E and T , our analyses are correct for that stack. We argue that these
assumptions are not unreasonable (on a RAM) provided we accept that both
the address space and numbers have a fixed size and cannot grow arbitrarily.
Of course this means that actual program execution may abort if the resources
are exhausted.
Our basic assumption is that function calls take constant time. This is
reasonable because we just need to allocate (and later deallocate) a new stack
frame of constant size because all parameters are references or numbers and
thus of fixed size. We also assumed that variable access takes constant time.
This is a standard RAM assumption. Assuming that constructor functions
take constant time is reasonable because the memory manager could simply
employ a single reference to the first free memory cell and increment that with
each constructor function call. How to account for garbage collection is less
clear. In the worst case we have to assume that garbage collection is switched
off, which simply exhausts memory more quickly. Finally we assume that
operations on bool and numbers take constant time. The former is obvious,
the latter follows from our assumption that we have fixed-size numbers.
In the end, we are less interested in a specific model of time and more in
the principle that time (and other resources) can be analyzed just as formally
as functional correctness once the ground rules (e.g. T ) have been established.
The above approach to running time analysis is nicely concrete and avoids the
more sophisticated machinery of asymptotic notation, O(·) and friends. Thus
we have intentionally lowered the entry barrier to the book for readers who
want to follow the Isabelle formalization: we require no familiarity with Isa-
belle’s real analysis library and in particular with the existing formalization of
and automation for asymptotic notation [17]. Of course this comes at a price:
one has to come up with and reason about somewhat arbitrary constants in
the analysis of individual functions. Moreover we rarely appeal to the “master
theorem" (although Eberl [17] provides a generalized version) but prove so-
lutions to recurrence equations correct by induction. Again, this is merely to
reduce the required mathematical basis and to show that it can be done. In
informal explanations, typically when considering inessential variations, we
do use standard mathematical notation and write, for example, O(n lg n).
Part I
In this chapter we define and verify the following sorting functions: insertion
sort, quicksort, and one top-down and two bottom-up merge sorts. We also
analyze their running times, except for quicksort.
Sorting involves an ordering. In this book we assume such an ordering
to be provided by comparison operators ≤ and < defined on the underlying
type.
Sortedness of lists is defined as follows:
sorted :: ('a::linorder) list ⇒ bool
sorted [] = True
sorted (x # ys) = ((∀y ∈ set ys. x ≤ y) ∧ sorted ys)
That is, every element is ≤ than all elements to the right of it: the list is
sorted in increasing order. But what does 'a::linorder mean?
The type variable 'a is annotated with linorder, which means that sorted
is only applicable if a binary predicate (≤) :: 'a ⇒ 'a ⇒ bool is defined and
(≤) is a linear order, i.e. the following properties are satisfied:
reflexivity: x ≤ x
transitivity: x ≤ y ∧ y ≤ z ⟶ x ≤ z
antisymmetry: a ≤ b ∧ b ≤ a ⟶ a = b
linearity/totality: x ≤ y ∨ y ≤ x
Moreover, the binary predicate (<) must satisfy x < y ⟷ x ≤ y ∧ x ≠ y.
You can assume that if you see ≤ or < on a generic type 'a in this
book, 'a is implicitly annotated with linorder. Note further that (≤) on the
numeric types nat, int and real is a linear order.
insort1 :: 'a ⇒ 'a list ⇒ 'a list
insort1 x [] = [x]
insort1 x (y # ys) = (if x ≤ y then x # y # ys else y # insort1 x ys)

insort :: 'a list ⇒ 'a list
insort [] = []
insort (x # xs) = insort1 x (insort xs)
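For readers who prefer to experiment in Haskell, here is a direct transcription of mine:

insort1 :: Ord a => a -> [a] -> [a]    -- insert x into a sorted list
insort1 x [] = [x]
insort1 x (y : ys)
  | x <= y    = x : y : ys
  | otherwise = y : insort1 x ys

insort :: Ord a => [a] -> [a]          -- sort by repeated insertion
insort []       = []
insort (x : xs) = insort1 x (insort xs)

main :: IO ()
main = print (insort [3, 1, 4, 1, 5, 9, 2, 6 :: Int])   -- [1,1,2,3,4,5,6,9]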
These are the canonical running time functions (according to Section 1.5):
Tinsort1 :: 'a ⇒ 'a list ⇒ nat
Tinsort1 _ [] = 1
Tinsort1 x (y # ys) = (if x ≤ y then 0 else Tinsort1 x ys) + 1

Tinsort :: 'a list ⇒ nat
Tinsort [] = 1
Tinsort (x # xs) = Tinsort xs + Tinsort1 x (insort xs) + 1
A dismal quadratic upper bound for the running time of insertion sort is
proved readily:
2.2.3 Exercises
Exercise 2.1. Show that any sorting function has the same input/output
behaviour as insertion sort:
(∀xs. mset (f xs) = mset xs) ∧ (∀xs. sorted (f xs)) ⟶
f xs = insort xs
Exercise 2.2. Show that Tinsort achieves its optimal value of 2·n + 1 for
sorted lists, and its worst-case value of (n + 1)·(n + 2) div 2 for the list
rev [0..<n].
2.3 Quicksort
quicksort :: 'a list ⇒ 'a list
quicksort [] = []
quicksort (x # xs)
= quicksort (filter (λy. y < x) xs) @ x # quicksort (filter (λy. x ≤ y) xs)
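A Haskell transcription of mine of this quicksort; the two filter passes and the list append are exactly what the exercises below ask you to improve:

quicksort :: Ord a => [a] -> [a]
quicksort []       = []
quicksort (x : xs) =
  quicksort (filter (< x) xs) ++ [x] ++ quicksort (filter (>= x) xs)

main :: IO ()
main = print (quicksort [3, 1, 4, 1, 5, 9, 2, 6 :: Int])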
2.3.2 Exercises
Exercise 2.3. Function quicksort appends the lists returned from the recur-
sive calls. This is expensive because the running time of (@) is linear in the
length of its first argument. Define a function quicksort2 :: 'a list ⇒ 'a list
⇒ 'a list that avoids (@) but accumulates the result in its second parameter
via (#) only. Prove quicksort2 xs ys = quicksort xs @ ys.
partition3 :: 'a ⇒ 'a list ⇒ 'a list × 'a list × 'a list
partition3 x xs
= (filter (λy. y < x) xs, filter (λy. y = x) xs, filter (λy. y > x) xs)

quicksort3 :: 'a list ⇒ 'a list
quicksort3 [] = []
quicksort3 (x # xs)
= (let (ls, es, gs) = partition3 x xs
   in quicksort3 ls @ x # es @ quicksort3 gs)
Prove that this version of quicksort also produces the correct results.
Tquicksort :: 'a list ⇒ nat
Tquicksort [] = 1
Tquicksort (x # xs) = Tquicksort (filter (λy. y < x) xs) +
  Tquicksort (filter (λy. x ≤ y) xs) + 2 · Tfilter (λ_. 1) xs + 1

Tquicksort xs ≤ a · |xs|² + b · |xs| + c
merge :: 'a list ⇒ 'a list ⇒ 'a list
merge [] ys = ys
merge xs [] = xs
merge (x # xs) (y # ys)
= (if x ≤ y then x # merge xs (y # ys)
   else y # merge (x # xs) ys)
msort :: 'a list ⇒ 'a list
msort xs
= (let n = |xs|
   in if n ≤ 1 then xs
      else merge (msort (take (n div 2) xs))
                 (msort (drop (n div 2) xs)))
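In Haskell (my transcription, not the Isabelle text):

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

msort :: Ord a => [a] -> [a]
msort xs
  | n <= 1    = xs
  | otherwise = merge (msort (take (n `div` 2) xs))
                      (msort (drop (n `div` 2) xs))
  where n = length xs

main :: IO ()
main = print (msort [3, 1, 4, 1, 5, 9, 2, 6 :: Int])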
= mset xs   □
Now we turn to sortedness. An induction on the computation of merge,
using (2.11), yields
sorted (merge xs ys) = (sorted xs ∧ sorted ys)   (2.12)
The proof is an easy induction on the computation of msort. The base case
(n ≤ 1) follows because every list of length ≤ 1 is sorted. The induction step
follows with the help of (2.12).
To simplify the analysis, and in line with the literature, we only count the
number of comparisons:
Cmerge :: 'a list ⇒ 'a list ⇒ nat
Cmerge [] _ = 0
Cmerge _ [] = 0
Cmerge (x # xs) (y # ys)
= 1 + (if x ≤ y then Cmerge xs (y # ys) else Cmerge (x # xs) ys)
Cmsort :: 'a list ⇒ nat
Cmsort xs
= (let n = |xs|;
       ys = take (n div 2) xs;
       zs = drop (n div 2) xs
   in if n ≤ 1 then 0
      else Cmsort ys + Cmsort zs + Cmerge (msort ys) (msort zs))
Proof. The proof is by induction on k. The base case is trivial and we concentrate
on the step. Let n = |xs|, ys = take (n div 2) xs and zs = drop
(n div 2) xs. The case n ≤ 1 is trivial. Now assume n > 1.
Cmsort xs
= Cmsort ys + Cmsort zs + Cmerge (msort ys ) (msort zs )
Bottom-up merge sort starts by turning the input [x 1 ; : : : ; x n ] into the list
[[x 1 ]; : : : ; [x n ]]. Then it passes over this list of lists repeatedly, merging pairs
Termination of merge_all relies on the fact that merge_adj halves the length
of the list (rounding up). Computation induction proves
|merge_adj xs| = (|xs| + 1) div 2   (2.16)
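The functions merge_adj and merge_all are only referred to above; the following Haskell sketch of mine, based on the description in the text, shows one way to define them, reusing the two-way merge from the top-down version:

merge :: Ord a => [a] -> [a] -> [a]     -- as in top-down merge sort
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

mergeAdj :: Ord a => [[a]] -> [[a]]     -- merge adjacent pairs of sorted lists
mergeAdj []              = []
mergeAdj [xs]            = [xs]
mergeAdj (xs : ys : zss) = merge xs ys : mergeAdj zss

mergeAll :: Ord a => [[a]] -> [a]       -- keep merging until one list remains
mergeAll []   = []
mergeAll [xs] = xs
mergeAll xss  = mergeAll (mergeAdj xss)

msortBU :: Ord a => [a] -> [a]          -- bottom-up merge sort
msortBU xs = mergeAll (map (\x -> [x]) xs)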
sorted (msort_bu xs )
The third and the last property prove functional correctness of msort_bu. The
proof of each proposition may use the preceding proposition and the propo-
sitions (2.10) and (2.12). The propositions about merge_adj and merge_all
are proved by computation inductions.
Cmerge_adj :: 'a list list ⇒ nat
Cmerge_adj [] = 0
Cmerge_adj [_] = 0
Cmerge_adj (xs # ys # zss) = Cmerge xs ys + Cmerge_adj zss

Cmerge_all :: 'a list list ⇒ nat
Cmerge_all [] = 0
Cmerge_all [xs] = 0
Cmerge_all xss = Cmerge_adj xss + Cmerge_all (merge_adj xss)

Cmsort_bu :: 'a list ⇒ nat
Cmsort_bu xs = Cmerge_all (map (λx. [x]) xs)
= m · k · 2^k   □
Setting m = 1 we obtain the same upper bound as for top-down merge sort
in Lemma 2.4:
Corollary 2.6. |xs| = 2^k ⟶ Cmsort_bu xs ≤ k · 2^k
A disadvantage of all the sorting functions we have seen so far (except insertion
sort) is that even in the best case they do not improve upon the n lg n bound.
For example, given the sorted input [1; 2; 3; 4; 5], msort_bu will, as a first
step, create [[1]; [2]; [3]; [4]; [5]] and then merge this list of lists recursively.
A slight variation of bottom-up merge sort, sometimes referred to as “nat-
ural merge sort,” first partitions the input into its constituent ascending and
descending subsequences (collectively referred to as runs) and only then starts
merging. In the above example we would get merge_all [[1; 2; 3; 4; 5]], which
returns immediately with the result [1; 2; 3; 4; 5]. Assuming that obtaining
runs is of linear complexity, this yields a best-case performance that is linear
in the number of list elements.
Function runs computes the initial list of lists; it is defined mutually
recursively with asc and desc, which gather ascending and descending runs
in accumulating parameters:
runs :: 'a list ⇒ 'a list list
runs (a # b # xs)
= (if b < a then desc b [a] xs else asc b ((#) a) xs)
runs [x] = [[x]]
runs [] = []

asc :: 'a ⇒ ('a list ⇒ 'a list) ⇒ 'a list ⇒ 'a list list
asc a as (b # bs)
= (if ¬ b < a then asc b (as ∘ (#) a) bs else as [a] # runs (b # bs))
asc a as [] = [as [a]]

desc :: 'a ⇒ 'a list ⇒ 'a list ⇒ 'a list list
desc a as (b # bs)
= (if b < a then desc b (a # as) bs else (a # as) # runs (b # bs))
desc a as [] = [a # as]

nmsort :: 'a list ⇒ 'a list
nmsort xs = merge_all (runs xs)
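A Haskell sketch of mine of the run-detection phase; as in the Isabelle version, the accumulator of asc is a function (a difference list), so an ascending run can be extended at its right end in constant time:

runs :: Ord a => [a] -> [[a]]
runs (a : b : xs)
  | b < a     = desc b [a] xs
  | otherwise = asc b (a :) xs
runs [x] = [[x]]
runs []  = []

asc :: Ord a => a -> ([a] -> [a]) -> [a] -> [[a]]
asc a as (b : bs)
  | not (b < a) = asc b (as . (a :)) bs
  | otherwise   = as [a] : runs (b : bs)
asc a as [] = [as [a]]

desc :: Ord a => a -> [a] -> [a] -> [[a]]
desc a as (b : bs)
  | b < a     = desc b (a : as) bs
  | otherwise = (a : as) : runs (b : bs)
desc a as [] = [a : as]

main :: IO ()
main = print (runs [1, 2, 3, 2, 1, 5, 6 :: Int])   -- [[1,2,3],[1,2],[5,6]]

Natural merge sort is then the composition of runs with the merging phase of the bottom-up sort sketched earlier.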
We have
8 xs ys : f (xs @ ys ) = f xs @ ys ) !
(
Cruns :: 'a list ⇒ nat
Cruns (a # b # xs) = 1 + (if b < a then Cdesc b xs else Casc b xs)
Cruns [] = 0
Cruns [_] = 0

Casc :: 'a ⇒ 'a list ⇒ nat
Casc a (b # bs) = 1 + (if ¬ b < a then Casc b bs else Cruns (b # bs))
Casc _ [] = 0

Cdesc :: 'a ⇒ 'a list ⇒ nat
Cdesc a (b # bs) = 1 + (if b < a then Cdesc b bs else Cruns (b # bs))
Cdesc _ [] = 0
Cnmsort :: 'a list ⇒ nat
Cnmsort xs = Cruns xs + Cmerge_all (runs xs)
Again note the mutually recursive definitions of Cruns , Casc , and Cdesc .
Hence the remark on proofs about runs also applies to proofs about Cruns .
Before talking about Cnmsort , we need a variant of Lemma 2.5 that also
works for lists whose lengths are not powers of two (since the result of runs
will usually not satisfy this property).
To this end, we will need the following two results, which we prove by two
simple computation inductions using (2.15) and (2.13):
Cmerge_adj xss ≤ |concat xss|   (2.27)
|concat (merge_adj xss)| = |concat xss|   (2.28)
Cmerge_all xss
= Cmerge_adj xss + Cmerge_all (merge_adj xss)
|concat (asc a f ys)| = 1 + |f []| + |ys|,
|concat (desc a xs ys)| = 1 + |xs| + |ys|,
2.7 Stability
Proof. The proof is by induction on xs. The base case is trivial. In the induction
step we consider the list a # xs and perform a case analysis. If f a ≠ k
the claim follows by IH using (2.35). Now assume f a = k:
filter (λy. f y = k) (insort_key f (a # xs))
= filter (λy. f y = k) (insort1_key f a (insort_key f xs))
Proof. Suppose x ≠ y and then w.l.o.g. x < y. This implies:
⦃z ∈# mset xs | z ≤ x⦄ ⊂# ⦃z ∈# mset xs | z < y⦄   (3.2)
From this we can prove the contradiction |xs| < |xs|:
|xs| = |⦃z ∈# mset xs | z ≤ x⦄| + |⦃z ∈# mset xs | z > x⦄|
     < |⦃z ∈# mset xs | z < y⦄| + |⦃z ∈# mset xs | z > x⦄|   using (3.2)
One can simply run one of the O(n lg n) sorting algorithms and then return the k-th element of the resulting
sorted list. It is also fairly easy to come up with an algorithm that has
running time O(kn), i.e. that runs in linear time in n for any fixed k (see
Exercise 3.3).
In the remainder of this chapter, we will look at a selection algorithm due
to Blum et al. [10] that achieves O(n) running time unconditionally (i.e. for
any k < n). Since a selection algorithm must, in general, inspect each element
at least once (see Exercise 3.4), this running time is asymptotically optimal.
3.0.1 Exercises
Exercise 3.1. A simple special case of selection is select 0 xs, i.e. the mini-
mum. Implement a linear-time function select0 such that
xs ≠ [] ⟶ select0 xs = select 0 xs
and prove this. This function should be tail-recursive and traverse the list ex-
actly once. You need not prove the linear running time (it should be obvious).
Exercise 3.2. How can your select0 algorithm be modified to obtain an anal-
ogous algorithm select1 such that
|xs| > 1 ⟶ select1 xs = select 1 xs
Do not try to prove the correctness yet; it gets somewhat tedious and you will
be able to prove it more easily after the next exercise.
Exercise 3.3.
1. Based on the previous two exercises, implement and prove correct an
algorithm select_fixed that fulfils
k < |xs| ⟶ select_fixed k xs = select k xs
The algorithm must be tail-recursive with running time O(kn) and tra-
verse the list exactly once.
Hint: one approach is to first define a function take_sort that computes
take m (sort xs ) in time O(mn).
2. Prove your select1 from the previous exercise correct by showing that it
is equivalent to select_fixed 1.
3. Define a suitable time function for your select_fixed. Prove that this time
function is O(kn), i.e. that
Tselect_fixed k xs ≤ C₁·k·|xs| + C₂·|xs| + C₃·k + C₄
for all k < |xs| for some constants C₁ to C₄.
If you have trouble finding the concrete values for these constants, try
proving the result with symbolic constants first and observe what condi-
tions need to be fulfilled in order to make the induction step go through.
Exercise 3.4. Show that if xs is a list of integers with no repeated elements,
an algorithm computing the result of select k xs must examine every single
element, i.e. for any index i < |xs|, the i-th element can be replaced by some
other number such that the result changes. Formally:
k < |xs| ∧ i < |xs| ∧ distinct xs ⟶
(∃z. distinct (xs[i := z]) ∧ select k (xs[i := z]) ≠ select k xs)
Here, the notation xs[i := z] denotes the list xs where the i-th element has
been replaced with z (the first list element, as always, having index 0).
Hint: a lemma you might find useful is that λk. select k xs is injective if
xs has no repeated elements.
select k xs
= select k (ls @ es @ gs)   using (3.5)
= (if k < |ls| then select k ls
   else if k − |ls| < |es| then select (k − |ls|) es
   else select (k − |ls| − |es|) gs)   using (3.6)
Clearly, k − |ls| < |es| ⟶ k < |ls| + |es| and select (k − |ls|) es = x
since select (k − |ls|) es ∈ set es and set es ⊆ {x} by construction.   □
Note that this holds for any pivot x. Indeed, x need not even be in the
list itself. Therefore, the algorithm is (partially) correct no matter what pivot
we choose. However, the number of recursive calls (and thereby the running
time) depends strongly on the pivot choice:
• If we always choose a pivot that is smaller than any element in the list,
the algorithm does not terminate at all.
• If we choose the smallest element in the list as a pivot every time, only
one element is removed from the list in every recursion step so that we get
n recursive calls in total. Since we do a linear amount of work in every
step, this leads to a running time of Θ(n²).
• If we choose pivots from the list at random, the worst-case running time is
again Θ(n²), but the expected running time can be shown to be Θ(n), similarly
to the situation in Quicksort. Indeed, it can also be shown that it is
very unlikely that the running time is significantly “worse than linear” [39,
Section 2.5].
• If we choose a pivot that cuts the list in half every time, we get roughly
lg n recursive steps and, by the Master Theorem, a running time of Θ(n).
median :: 'a list ⇒ 'a
median xs = select ((|xs| − 1) div 2) xs
Exercise 3.5. Show that select k xs can be reduced to computing the median
of a list in linear time, i.e. give a linear-time function reduce_select_median
such that
xs ≠ [] ∧ k < |xs| ⟶
reduce_select_median k xs ≠ [] ∧
median (reduce_select_median k xs) = select k xs
Since, as we have seen, computing a true median in every recursive step is too
expensive, the next natural question is: is there something that is almost as
good as a median but easier to compute?
This is indeed the case, and this is where the ingenuity of the algorithm
is hidden: instead of computing the median of all the list elements, we use
another list element M, which is computed as follows:
• chop the list into groups of 5 elements each (possibly with one smaller
group at the end if n is not a multiple of 5)
• compute the median of each of the ⌈n/5⌉ groups (which can be done in
constant time for each group using e.g. insertion sort, since their sizes are
bounded by 5)
• compute the median M of these ⌈n/5⌉ elements (which can be done by a
recursive call to the selection algorithm); a code sketch of these steps follows this list
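Here is the promised sketch in Haskell (mine; slowMedian stands for the naive sort-based median that the full algorithm only uses on the small groups and in its base case):

import Data.List (sort)

chop :: Int -> [a] -> [[a]]          -- groups of n, last group possibly shorter
chop _ [] = []
chop n xs = take n xs : chop n (drop n xs)

slowMedian :: Ord a => [a] -> a      -- median by sorting; fine for tiny lists
slowMedian xs = sort xs !! ((length xs - 1) `div` 2)

mom :: Ord a => [a] -> a             -- the median of medians (chopping size 5)
mom xs = slowMedian (map slowMedian (chop 5 xs))
-- the real algorithm computes this outer median by a recursive call to the
-- selection function instead of sorting

main :: IO ()
main = print (mom [1 .. 25 :: Int])  -- 13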
We call M the median of medians. M is not quite as good a pivot as the
true median, but it is still fairly decent:
Theorem 3.4 (Pivoting bounds for the median of medians).
Let xs be a list and let ≺ be either < or >. Define
M := median (map median (chop 5 xs)),
where the chop function performs the chopping described earlier. Then:
|⦃y ∈# mset xs | y ≺ M⦄| ≤ ⌈0.7·n + 3⌉
Proof. The result of chop 5 xs is a list of ⌈n/5⌉ chunks, each of size at
most 5, i.e. |chop 5 xs| = ⌈n/5⌉. Let us split these chunks into two groups
according to whether their median is ≺ M or not:
Y≺ := ⦃ys ∈# mset (chop 5 xs) | median ys ≺ M⦄
Y⊀ := ⦃ys ∈# mset (chop 5 xs) | ¬ median ys ≺ M⦄
We clearly have
mset xs = (∑ys←chop 5 xs. mset ys)   (3.8)
mset (chop 5 xs) = Y≺ + Y⊀   (3.9)
⌈n/5⌉ = |Y≺| + |Y⊀|   (3.10)
and since M is the median of the medians of the groups, we also know that:
|Y≺| < 1/2 · ⌈n/5⌉   (3.11)
The core idea of the proof is that any group ys ∈# Y⊀ can have at most 2
elements that are ≺ M:
|⦃y ∈# mset ys | y ≺ M⦄|
≤ |⦃y ∈# mset ys | y ≺ median ys⦄|   because ¬ median ys ≺ M
≤ |ys| div 2   using (3.1)
≤ 5 div 2 = 2
And of course, since each group has size at most 5, any group ys ∈# Y≺
can contribute at most 5 elements. In summary, we have:
∀ys ∈# Y⊀. |⦃y ∈# mset ys | y ≺ M⦄| ≤ 2   (3.12)
⦃y ∈# mset xs | y ≺ M⦄
= (∑ys←chop 5 xs. ⦃y ∈# mset ys | y ≺ M⦄)   using (3.8)
= (∑ys ∈# (Y≺ + Y⊀). ⦃y ∈# mset ys | y ≺ M⦄)   using (3.9)
Taking the size of both sides, we have
|⦃y ∈# mset xs | y ≺ M⦄|
= (∑ys ∈# (Y≺ + Y⊀). |⦃y ∈# mset ys | y ≺ M⦄|)
= (∑ys ∈# Y≺. |⦃y ∈# mset ys | y ≺ M⦄|) + (∑ys ∈# Y⊀. |⦃y ∈# mset ys | y ≺ M⦄|)
≤ (∑ys ∈# Y≺. 5) + (∑ys ∈# Y⊀. 2)   using (3.12)
= 5·|Y≺| + 2·|Y⊀|
= 2·⌈n/5⌉ + 3·|Y≺|   using (3.10)
≤ 3.5·⌈n/5⌉   using (3.11)
≤ ⌈0.7·n + 3⌉
The delicate arithmetic reasoning about rounding in the end can thankfully
be done fully automatically by Isabelle’s linarith tactic.   □
We now have all the ingredients to write down our algorithm: the base cases
(i.e. sufficiently short lists) can be handled using the naive approach of per-
forming insertion sort and then returning the k -th element. For bigger lists,
we perform the divide-and-conquer approach outlined in Theorem 3.3 using
M as a pivot. We have two recursive calls: one on a list with exactly ⌈0.2·n⌉
elements to compute M, and one on a list with at most ⌈0.7·n + 3⌉. We will
still need to show later that this actually leads to a linear-time algorithm, but
the fact that 0.7 + 0.2 < 1 is at least encouraging.
The full algorithm looks like this:
slow_median :: 'a list ⇒ 'a
slow_median xs = slow_select ((|xs| − 1) div 2) xs
(ls, es, gs) = partition3 M xs
seems somewhat arbitrary. Find the smallest constant n 0 for which the algo-
rithm still works. Why do you think 20 was chosen?
else T'mom_select ⌈0.2·n⌉ + T'mom_select ⌈0.7·n + 3⌉ + 17·n + 50)
The time functions of the auxiliary functions used here can be found in Sec-
tion B.2 in the appendix. The proof is a simple computation induction using
Theorem 3.4 and the time bounds for the auxiliary functions from Chapter B
in the appendix.
The next section will be dedicated to showing that T'mom_select ∈ O(n).
Exercise 3.7. Show that the upper bound ⌈0.7·n + 3⌉ is fairly tight by
giving an infinite family (xsᵢ)ᵢ∈ℕ of lists with increasing lengths for which
more than 70 % of the elements are larger than the median of medians (with
chopping size 5). In Isabelle terms: define a function f :: nat ⇒ nat list such
that ∀n. |f n| < |f (n + 1)| and
|⦃y ∈# mset (f n) | y > mom (f n)⦄| / |f n| > 0.7
where 0 < a, b < 1 and C₁, C₂ > 0. The Akra–Bazzi Theorem then tells
Instead of presenting the full theorem statement and its proof right away,
let us take a more explorative approach. What we want to prove in the end
is that there are real constants C₃ > 0 and C₄ such that f n ≤ C₃·n +
C₄ for all n. Suppose these constants are fixed already and now we want to
prove that the inequality holds for them. For simplicity of the presentation,
we assume b, d ≥ 0, but note that these assumptions are unnecessary and
the proof still works for negative b and d if we replace b and d with max 0 b
and max 0 d.
The obvious approach to show this is by induction on n, following the
structure of the recurrence above. To do this, we use complete induction
(i.e. the induction hypothesis holds for all n' < n)¹ and a case analysis on
n > n₀ (where n₀ is some constant we will determine later).
The two cases we have to show in the induction are then:
Base case: ∀n ≤ n₀. f n ≤ C₃·n + C₄
Step: ∀n > n₀. (∀m < n. f m ≤ C₃·m + C₄) ⟶ f n ≤ C₃·n + C₄
We can see that in order to even be able to apply the induction hypothesis
in the induction step, we need ⌈a·n + b⌉ < n and ⌈c·n + d⌉ < n. We can make the estimate²
⌈a·n + b⌉ ≤ a·n + b + 1 <! n
which leads to the conditions
n₀ ≥ (b + 1) / (1 − a)  and  n₀ ≥ (d + 1) / (1 − c)   (3.13)
However, it will later turn out that these are implied by the other conditions
we will have accumulated anyway.
Now that we have ensured that the basic structure of our induction will
work out, let us continue with the two cases.
The base case (n ≤ n₀) is fairly uninteresting: we can simply choose C₄
to be large enough to satisfy all of them.
In the recursive step, unfolding one step of the recurrence and applying
the induction hypotheses leaves us with the following proof obligation
C₃·⌈a·n + b⌉ + C₄ + (C₃·⌈c·n + d⌉ + C₄) + C₁·n + C₂ ≤! C₃·n + C₄,
¹ In Isabelle, the corresponding rule is called less_induct.
² The notation <! stands for “must be less than”. It emphasises that this inequality
is not a consequence of what we have shown so far, but something that we still
need to show, or in this case something that we need to ensure by adding suitable
preconditions.
or, equivalently,
C₃·(⌈a·n + b⌉ + ⌈c·n + d⌉ − n) + C₁·n + C₂ + C₄ ≤! 0,
We estimate the left-hand side like this:
C₃·(⌈a·n + b⌉ + ⌈c·n + d⌉ − n) + C₁·n + C₂ + C₄
≤ C₃·(a·n + b + 1 + (c·n + d + 1) − n) + C₁·n + C₂ + C₄
= C₃·(b + d + 2) + C₂ + C₄ − (C₃·(1 − a − c) − C₁)·n   (∗)
≤ C₃·(b + d + 2) + C₂ + C₄ − (C₃·(1 − a − c) − C₁)·n₀   (†)
≤! 0
The step from (∗) to (†) uses the fact that n > n₀ and requires the factor
C₃·(1 − a − c) − C₁ in front of the n to be positive, i.e. we need to add the
assumption
C₃ > C₁ / (1 − a − c).   (3.14)
The term (†) (which we want to be ≤ 0) is now a constant. If we solve that
inequality for C₃, we get the following two additional conditions:
n₀ > (b + d + 2) / (1 − a − c)  and  C₃ ≥ (C₁·n₀ + C₂ + C₄) / ((1 − a − c)·n₀ − b − d − 2)   (3.15)
The former of these directly implies our earlier conditions (3.13), so we can
safely discard those now.
Now all we have to do is to find a combination of n₀, C₃, and C₄ that
satisfies (3.14) and (3.15). This is straightforward:
n₁ := max n₀ (⌊(b + d + 2) / (1 − a − c)⌋ + 1)
C₄ := Max {f n | n ≤ n₁}
C₃ := max (C₁ / (1 − a − c)) ((C₁·n₁ + C₂ + C₄) / ((1 − a − c)·n₁ − b − d − 2))
And with that, the induction goes through and we get the following theorem:
(∀n > n₀. f n = f ⌈a·n + b⌉ + f ⌈c·n + d⌉ + C₁·n + C₂) ⟶
(∃C₃ C₄. ∀n. f n ≤ C₃·n + C₄)
Exercise 3.8.
1. Suppose that instead of groups of 5, we now chop into groups of size l ≥ 1.
Prove a corresponding generalisation of Theorem 3.4.
2. Examine (on paper only): how does this affect correctness and running
time of our selection algorithm? Why do you think l = 5 was chosen?
Part II
Search Trees
4
Binary Trees
⟨⟩          ≡ Leaf
⟨l, x, r⟩    ≡ Node l x r
The trees l and r are the left and right children of the node ⟨l, x, r⟩.
Because most of our trees will be binary trees, we drop the “binary” most
of the time and simply call the type tree.
When displaying a tree in the usual graphical manner we show only the
Nodes. For example, ⟨⟨⟨⟩, 3, ⟨⟩⟩, 9, ⟨⟨⟩, 7, ⟨⟩⟩⟩ is displayed as a node labelled 9
with children 3 and 7.
The (label of the) root node is 9. The depth (or level) of some node (or leaf)
in a tree is the distance from the root. We use these concepts only informally.
The inorder, preorder and postorder traversals (we omit the latter) list
the elements in a tree in a particular order:
inorder :: 'a tree ⇒ 'a list
inorder ⟨⟩ = []
inorder ⟨l, x, r⟩ = inorder l @ [x] @ inorder r

preorder :: 'a tree ⇒ 'a list
preorder ⟨⟩ = []
preorder ⟨l, x, r⟩ = x # preorder l @ preorder r
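In Haskell the data type and the two traversals read as follows (my transcription):

data Tree a = Leaf | Node (Tree a) a (Tree a) deriving Show

inorder :: Tree a -> [a]
inorder Leaf         = []
inorder (Node l x r) = inorder l ++ [x] ++ inorder r

preorder :: Tree a -> [a]
preorder Leaf         = []
preorder (Node l x r) = x : preorder l ++ preorder r

-- the example tree from the text: root 9 with children 3 and 7
example :: Tree Int
example = Node (Node Leaf 3 Leaf) 9 (Node Leaf 7 Leaf)

main :: IO ()
main = print (inorder example, preorder example)   -- ([3,9,7],[9,3,7])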
These two size functions count the number of Nodes and Leaf s in a tree:
size :: 'a tree ⇒ nat
|⟨⟩| = 0
|⟨l, _, r⟩| = |l| + |r| + 1

size1 :: 'a tree ⇒ nat
|⟨⟩|₁ = 1
|⟨l, _, r⟩|₁ = |l|₁ + |r|₁

The syntactic sugar |t| for size t and |t|₁ for size1 t is only used in this text,
not in the Isabelle theories.
Induction proves a convenient fact that explains the name size1:
|t|₁ = |t| + 1
The height and the minimal height of a tree are defined as follows:
height :: 'a tree ⇒ nat
h ⟨⟩ = 0
h ⟨l, _, r⟩ = max (h l) (h r) + 1

min_height :: 'a tree ⇒ nat
mh ⟨⟩ = 0
mh ⟨l, _, r⟩ = min (mh l) (mh r) + 1
You can think of them as the longest and shortest (cycle-free) path from the
root to a leaf. The real names of these functions are height and min_height.
The abbreviations h and mh are only used in this text, not in the Isabelle
theories.
The obvious properties h t ≤ |t| and mh t ≤ h t and the following classical
properties have easy inductive proofs:
2^(mh t) ≤ |t|₁    |t|₁ ≤ 2^(h t)
subtrees :: 'a tree ⇒ 'a tree set
subtrees ⟨⟩ = {⟨⟩}
subtrees ⟨l, a, r⟩ = {⟨l, a, r⟩} ∪ subtrees l ∪ subtrees r
Note that every tree is a subtree of itself.
Exercise 4.1. Function inorder has quadratic complexity because the run-
ning time of (@) is linear in the length of its first argument. Define a function
inorder2 :: 'a tree ⇒ 'a list ⇒ 'a list that avoids (@) but accumulates the
result in its second parameter via (#) only. Its running time should be linear
in the size of the tree. Prove inorder2 t xs = inorder t @ xs.
A complete tree is one where all the leaves are on the same level. An example
is shown in Figure 4.1. The predicate complete is defined recursively:
Fig. 4.1. A complete tree
complete :: 'a tree ⇒ bool
complete ⟨⟩ = True
complete ⟨l, _, r⟩ = (h l = h r ∧ complete l ∧ complete r)
This recursive definition is equivalent with the above definition that all leaves
must have the same distance from the root. Formally:
Lemma 4.1. complete t ⟷ mh t = h t
Proof. The proof is by induction and case analyses on min and max.   □
The following classic property of complete trees is easily proved by induction:
It turns out below that this is in fact a defining property of complete trees.
For complete trees we have 2^(mh t) = |t|₁ = 2^(h t). For incomplete trees
the = signs become <, as the following two lemmas prove:
Lemma 4.3. ¬ complete t ⟶ |t|₁ < 2^(h t)
The proof of this lemma is completely analogous to the previous proof except
that one also needs to use Lemma 4.1.
From the contrapositive of Lemma 4.3 one obtains |t|₁ = 2^(h t) ⟶ complete t,
the converse of Lemma 4.2. Thus we arrive at:
The complete trees are precisely the ones where the height is exactly the
logarithm of the number of leaves.
Exercise 4.2. Define a function mcs that computes a maximal complete sub-
tree of some given tree. You are allowed only one traversal of the input but
you may freely compute the height of trees and may even compare trees for
equality. You are not allowed to use complete or subtrees.
Prove that mcs returns a complete subtree (which should be easy) and
that it is maximal in height:
u ∈ subtrees t ∧ complete u ⟶ h u ≤ h (mcs t)
Bonus: get rid of any tree equality tests in mcs.
An almost complete tree is one where the leaves may occur not just at the
lowest level but also one level above:
acomplete :: 'a tree ⇒ bool
acomplete t = (h t − mh t ≤ 1)
An example of an almost complete tree is shown in Figure 4.2. You can think
of an almost complete tree as a complete tree with (possibly) some additional
nodes one level below the last full level.
Fig. 4.2. Almost complete tree
Almost complete trees are important because among all the trees with the
same number of nodes they have minimal height:
We will now see how to convert a list xs into an almost complete tree t such
that inorder t = xs. If the list is sorted, the result is an almost complete
binary search tree (see the next chapter). The basic idea is to cut the list
in two halves, turn them into almost complete trees recursively and combine
them. But cutting up the list in two halves explicitly would lead to an n·lg n
algorithm, whereas we want a linear one. Therefore we use an additional nat
parameter to tell us how much of the input list should be turned into a tree.
The remaining list is returned with the tree:
bal :: nat ⇒ 'a list ⇒ 'a tree × 'a list
bal n xs = (if n = 0 then (⟨⟩, xs)
            else let m = n div 2;
                     (l, ys) = bal m xs;
                     (r, zs) = bal (n − 1 − m) (tl ys)
                 in (⟨l, hd ys, r⟩, zs))
The trick is not to chop xs in half but n because we assume that arithmetic
is constant-time. Hence bal runs in linear time (see Exercise 4.4). Figure 4.3
shows the result of bal 10 [0..9].
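A Haskell transcription of mine of bal, with the Tree type as in the earlier sketch:

data Tree a = Leaf | Node (Tree a) a (Tree a) deriving Show

-- turn the first n elements of xs into an almost complete tree;
-- the unused suffix of xs is returned alongside the tree
bal :: Int -> [a] -> (Tree a, [a])
bal n xs
  | n == 0    = (Leaf, xs)
  | otherwise = let m       = n `div` 2
                    (l, ys) = bal m xs
                    (r, zs) = bal (n - 1 - m) (tail ys)
                in (Node l (head ys) r, zs)

balanceList :: [a] -> Tree a
balanceList xs = fst (bal (length xs) xs)

main :: IO ()
main = print (balanceList [0 .. 5 :: Int])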
Correctness
The following lemma clearly expresses that bal n xs turns the prefix of length
n of xs into a tree and returns the corresponding suffix of xs:
Proof by complete induction on n, assuming that the property holds for all
values below n. If n = 0 the claim is trivial. Now assume n ≠ 0 and let m = n
div 2 and m' = n − 1 − m (and thus m, m' < n). From bal n xs = (t, zs)
we obtain l, r and ys such that bal m xs = (l, ys), bal m' (tl ys) = (r,
zs) and t = ⟨l, hd ys, r⟩. Because m < n ≤ |xs| the induction hypothesis
implies xs = inorder l @ ys ∧ |l| = m (1). This in turn implies m' ≤ |tl ys|
and thus the induction hypothesis implies tl ys = inorder r @ zs ∧ |r| =
m' (2). Properties (1) and (2) together with t = ⟨l, hd ys, r⟩ imply the claim
xs = inorder t @ zs ∧ |t| = n because ys ≠ [].   □
The corresponding correctness properties of the derived functions are easy
consequences:
n ≤ |xs| ⟶ inorder (bal_list n xs) = take n xs
inorder (balance_list xs) = xs
n ≤ |t| ⟶ inorder (bal_tree n t) = take n (inorder t)
inorder (balance_tree t) = inorder t
To prove that bal returns an almost complete tree we determine its height
and minimal height.
Lemma 4.10. n ≤ |xs| ∧ bal n xs = (t, zs) ⟶ h t = ⌈lg (n + 1)⌉
Proof. The proof structure is the same as for Lemma 4.9 and we reuse the
variable names introduced there. In the induction step we obtain the simplified
induction hypotheses h l = ⌈lg (m + 1)⌉ and h r = ⌈lg (m' + 1)⌉. This
leads to
h t = max (h l) (h r) + 1
= h l + 1   because m' ≤ m
= ⌈lg (m + 1) + 1⌉
(and consequently the functions that build on it) returns an almost complete
tree:
Corollary 4.12. n ≤ |xs| ∧ bal n xs = (t, ys) ⟶ acomplete t
4.3.2 Exercises
Exercise 4.4. Prove that the running time of function bal is linear in its first
argument. (Isabelle hint: you need to refer to bal as Balance.bal.)
A tree of type 'a tree only stores elements of type 'a. However, it is frequently
necessary to store some additional information of type 'b in each node too,
usually for efficiency reasons. Typical examples are:
• The size or the height of the tree, because recomputing them requires
traversing the whole tree.
• Lookup tables where each key of type 'a is associated with a value of type
'b.
In this case we simply work with trees of type ('a × 'b) tree and call them
augmented trees. As a result we need to redefine a few functions that should
ignore the additional information. For example, function inorder, when applied
to an augmented tree, should return an 'a list. Thus we redefine it in
the obvious way:
inorder :: ('a × 'b) tree ⇒ 'a list
inorder ⟨⟩ = []
inorder ⟨l, (a, _), r⟩ = inorder l @ a # inorder r
of f on augmented trees that focuses on the 'a-values just like inorder above
does. Of course functions that do not depend on the information in the nodes,
e.g. size and height, stay unchanged.
Note that there are two alternative redefinitions of inorder (and similar
functions): map fst ∘ Tree.inorder or Tree.inorder ∘ map_tree fst where
Tree.inorder is the original function.
sz :: ('a × nat) tree ⇒ nat
sz ⟨⟩ = 0
sz ⟨_, (_, n), _⟩ = n

F :: ('a × 'b) tree ⇒ 'b
F ⟨⟩ = zero
F ⟨l, (a, _), r⟩ = f (F l) a (F r)

This generalizes the definition of size. Let node_f compute the 'b-value from
the 'b-values of its children via f:

b_val :: ('a × 'b) tree ⇒ 'b
b_val ⟨⟩ = zero
4.4.2 Exercises
Exercise 4.5. Augment trees by a pair of a boolean and something else where
the boolean indicates whether the tree is complete or not. Define ch, node_ch
and invar_ch as in Section 4.4.1 and prove the following properties:
invar_ch t ⟶ ch t = (complete t, ?)
Exercise 4.6. Assume type 'a is of class linorder and augment each Node
with the maximum value in that tree. Following Section 4.4.1 (but mind
the option type!) define mx :: ('a × 'b) tree ⇒ 'b option, node_mx and
invar_mx and prove
invar_mx t ⟶
mx t = (if t = ⟨⟩ then None else Some (Max (set_tree t)))
where Max is the predefined maximum operator on finite, non-empty sets.
5
Binary Search Trees
bst :: ('a::linorder) tree ⇒ bool
bst ⟨⟩ = True
bst ⟨l, a, r⟩
= ((∀x ∈ set_tree l. x < a) ∧ (∀x ∈ set_tree r. a < x) ∧ bst l ∧ bst r)
not found it yet. In the worst case this takes time proportional to the height
of the tree. In later chapters we discuss a number of methods for ensuring
that the height of the tree is logarithmic in its size. For now we ignore all
efficiency considerations and permit our BSTs to degenerate.
Exercise 5.1. The above recursive definition of bst is not a direct transla-
tion of the description “For each node” given in the text. For a more direct
translation define a function
nodes :: 'a tree ⇒ ('a tree × 'a × 'a tree) set
that collects all the nodes as triples (l, a, r). Now define bst_nodes as
bst_nodes t = (∀(l, a, r) ∈ nodes t. ...) and prove bst_nodes t = bst t.
5.1 Interfaces
Trees are concrete data types that provide the building blocks for realizing
abstract data types like sets. The abstract type has a fixed interface, i.e. set of
operations, through which the values of the abstract type can be manipulated.
The interface hides all implementation detail. In the Search Tree part of the
book we focus on the abstract type of sets with the following interface:
empty :: 's
insert :: 'a ⇒ 's ⇒ 's
delete :: 'a ⇒ 's ⇒ 's
isin :: 's ⇒ 'a ⇒ bool
where 's is the type of sets of elements of type 'a. Most of our implementations
of sets will be based on variants of BSTs and will require a linear order
on 'a, but the general interface does not require this. The correctness of an
implementation of this interface will be proved by relating it back to HOL's
type 'a set via an abstraction function, e.g. set_tree.
datatype cmp_val = LT | EQ | GT

cmp :: ('a::linorder) ⇒ 'a ⇒ cmp_val
cmp x y = (if x < y then LT else if x = y then EQ else GT)
5.2 Implementing Sets via Unbalanced BSTs
empty :: 'a tree
empty = ⟨⟩

isin :: 'a tree ⇒ 'a ⇒ bool
isin ⟨⟩ _ = False
isin ⟨l, a, r⟩ x
= (case cmp x a of LT ⇒ isin l x | EQ ⇒ True | GT ⇒ isin r x)

insert :: 'a ⇒ 'a tree ⇒ 'a tree
insert x ⟨⟩ = ⟨⟨⟩, x, ⟨⟩⟩
insert x ⟨l, a, r⟩ = (case cmp x a of
  LT ⇒ ⟨insert x l, a, r⟩ |
  EQ ⇒ ⟨l, a, r⟩ |
  GT ⇒ ⟨l, a, insert x r⟩)

delete :: 'a ⇒ 'a tree ⇒ 'a tree
delete x ⟨⟩ = ⟨⟩
delete x ⟨l, a, r⟩
= (case cmp x a of
   LT ⇒ ⟨delete x l, a, r⟩ |
   EQ ⇒ (if r = ⟨⟩ then l
         else let (a', r') = split_min r in ⟨l, a', r'⟩) |
   GT ⇒ ⟨l, a, delete x r⟩)
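The same implementation in Haskell (my transcription; splitMin removes and returns the smallest element of a non-empty tree):

data Tree a = Leaf | Node (Tree a) a (Tree a) deriving Show

isin :: Ord a => Tree a -> a -> Bool
isin Leaf _ = False
isin (Node l a r) x = case compare x a of
  LT -> isin l x
  EQ -> True
  GT -> isin r x

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x (Node l a r) = case compare x a of
  LT -> Node (insert x l) a r
  EQ -> Node l a r
  GT -> Node l a (insert x r)

splitMin :: Tree a -> (a, Tree a)          -- only called on non-empty trees
splitMin (Node Leaf a r) = (a, r)
splitMin (Node l    a r) = let (m, l') = splitMin l in (m, Node l' a r)
splitMin Leaf            = error "splitMin: empty tree"

delete :: Ord a => a -> Tree a -> Tree a
delete _ Leaf = Leaf
delete x (Node l a r) = case compare x a of
  LT -> Node (delete x l) a r
  EQ -> case r of
          Leaf -> l
          _    -> let (a', r') = splitMin r in Node l a' r'
  GT -> Node l a (delete x r)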
5.2.1 Deletion
a' and r by r', where
delete2 :: 'a ⇒ 'a tree ⇒ 'a tree
delete2 _ ⟨⟩ = ⟨⟩
delete2 x ⟨l, a, r⟩ = (case cmp x a of
  LT ⇒ ⟨delete2 x l, a, r⟩ |
  EQ ⇒ join l r |
  GT ⇒ ⟨l, a, delete2 x r⟩)

join :: 'a tree ⇒ 'a tree ⇒ 'a tree
join t ⟨⟩ = t
join ⟨⟩ t = t
join ⟨t₁, a, t₂⟩ ⟨t₃, b, t₄⟩
= (case join t₂ t₃ of
   ⟨⟩ ⇒ ⟨t₁, a, ⟨⟨⟩, b, t₄⟩⟩ |
   ⟨u₂, x, u₃⟩ ⇒ ⟨⟨t₁, a, u₂⟩, x, ⟨u₃, b, t₄⟩⟩)
We call this deletion by joining. The characteristic property of join is
inorder (join l r) = inorder l @ inorder r.
The definition of join may appear needlessly complicated. Why not this
much simpler version:
join0 t ⟨⟩ = t
join0 ⟨⟩ t = t
join0 ⟨t₁, a, t₂⟩ ⟨t₃, b, t₄⟩ = ⟨t₁, a, ⟨join0 t₂ t₃, b, t₄⟩⟩
Because with this version of join, deletion may almost double the height of
the tree, in contrast to join and also deletion by replacing, where the height
cannot increase:
Exercise 5.2. First prove that join behaves well:
h (join l r) ≤ max (h l) (h r) + 1
Now show that join0 behaves badly: find an upper bound ub of h (join0 l r)
such that ub is a function of h l and h r. Prove h (join0 l r) ≤ ub and prove
that ub is a tight upper bound if l and r are complete trees.
We focus on delete, deletion by replacing, in the rest of the chapter.
5.3 Correctness
Why is the above implementation correct? Roughly speaking, because the
implementations of empty, insert, delete and isin on type 'a tree simulate
the behaviour of {}, ∪, − and ∈ on type 'a set. Taking the abstraction function
into account we can formulate the simulation precisely:
set_tree empty = {}
set_tree (insert x t) = set_tree t ∪ {x}
set_tree (delete x t) = set_tree t − {x}
isin t x = (x ∈ set_tree t)
However, the implementation only works correctly on BSTs. Therefore we
need to add the precondition bst t to all but the first property. But why are we
permitted to assume this precondition? Only because bst is an invariant of
this implementation: bst holds for empty, and both insert and delete preserve
bst. Therefore every tree that can be manufactured through the interface is
a BST. Of course this adds another set of proof obligations for correctness,
invariant preservation:
bst empty
bst t ⟶ bst (insert x t)
bst t ⟶ bst (delete x t)
When looking at the abstract data type of sets from the user (or ‘client’)
perspective, we would call the collection of all proof obligations for the cor-
rectness of an implementation the specification of the abstract type.
Exercise 5.3. Verify the implementation in Section 5.2 by showing all the
proof obligations above, without the detour via sorted lists explained below.
approach is quite generic: it works not only for the BSTs in this chapter but
also for the more efficient variants discussed in later chapters. The remainder
of this section can be skipped if one is not interested in proof automation.
The key idea [54] is to express bst and set_tree via inorder :
bst t = sorted (inorder t ) and set_tree t = set (inorder t )
where
sorted :: 'a list ⇒ bool
sorted [] = True
sorted [_] = True
sorted (x # y # zs) = (x < y ∧ sorted (y # zs))
Note that this is “sorted w.r.t. (<)” whereas in the chapter on sorting sorted
was defined as “sorted w.r.t. (≤)”.
Instead of showing directly that BSTs implement sets, we show that they
implement an intermediate specification based on lists (and later that the
list-based specification implies the set-based one). We can assume that the
lists are sorted because they are abstractions of BSTs. Insertion and deletion
on sorted lists can be defined as follows:
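The definitions of ins_list and del_list are not reproduced here; the following Haskell sketch of mine shows the intended behaviour on duplicate-free sorted lists:

insList :: Ord a => a -> [a] -> [a]   -- insert into a strictly sorted list
insList x [] = [x]
insList x (a : as)
  | x < a     = x : a : as
  | x == a    = a : as                -- already present: no duplicates
  | otherwise = a : insList x as

delList :: Ord a => a -> [a] -> [a]   -- delete from a sorted list
delList _ [] = []
delList x (a : as)
  | x == a    = as
  | otherwise = a : delList x as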
The abstraction function from trees to lists is function inorder. The specification
in Figure 5.1 expresses that empty, insert, delete and isin implement
[], ins_list, del_list and λxs x. x ∈ set xs. One nice aspect of this specification
inorder empty = []
The first and last step are by definition, the second step by induction hypothe-
sis, and the third step by lemmas in Figure 5.2: (5.1) rewrites the assumption
sorted (inorder t) to sorted (inorder l @ [a]) ∧ sorted (a # inorder
r), thus allowing (5.5) to rewrite the term ins_list x (inorder l @ a #
inorder r) to ins_list x (inorder l) @ a # inorder r.
The lemma library in Figure 5.2 helps to prove the properties in Fig-
ure 5.1. These proofs are by induction on t and lead to (possibly nested) tree
constructor terms like ⟨⟨t₁, a₁, t₂⟩, a₂, t₃⟩ where the tᵢ and aᵢ are variables.
Evaluating inorder of such a tree leads to a list of the following form:
inorder t₁ @ a₁ # inorder t₂ @ a₂ # ... # inorder tₙ
Now we discuss the lemmas in Figure 5.2 that simplify the application of
sorted, ins_list and del_list to such terms.
Terms of the form sorted (xs₁ @ a₁ # xs₂ @ a₂ # ... # xsₙ) are
decomposed into the following basic formulas
sorted (xs @ [a])   (simulating ∀x ∈ set xs. x < a)
sorted (a # xs)     (simulating ∀x ∈ set xs. a < x)
a < b
by the rewrite rules (5.1)–(5.2). Lemmas (5.3)–(5.4) enable deductions from
basic formulas.
isin :: ('a × 'b) tree ⇒ 'a ⇒ bool
isin ⟨⟩ _ = False
isin ⟨l, (a, _), r⟩ x
= (case cmp x a of LT ⇒ isin l x | EQ ⇒ True | GT ⇒ isin r x)
This works for any kind of augmented BST, not just interval trees.
5.5.2 Intervals
closed set between low and high. The standard mathematical notation is [l, h],
the Isabelle notation is {l..h}. We restrict ourselves to non-empty intervals:
low p ≤ high p
Type 'a can be any linearly ordered type with a minimum element ⊥ (for
example, the natural numbers or the real numbers extended with −∞). Intervals
can be linearly ordered by first comparing low, then comparing high.
The definitions are as follows:
The readers should convince themselves that overlap does what it is supposed
to do: overlap x y = ({low x..high x} ∩ {low y..high y} ≠ {})
We also define the concept of an interval overlapping with some interval
in a set:
has_overlap S y = (∃x ∈ S. overlap x y)
An interval tree associates to each node a number max_hi, which records the
maximum high value of all intervals in the subtrees. This value is updated
during insert and delete operations, and will be crucial for enabling efficient
determination of overlap with some interval in the tree.
max_hi :: 'a ivl_tree ⇒ 'a
max_hi ⟨⟩ = ⊥
max_hi ⟨_, (_, m), _⟩ = m

max3 :: 'a ivl ⇒ 'a ⇒ 'a ⇒ 'a
max3 a m n = max (high a) (max m n)
it follows by induction that max_hi is the maximum value of high in the tree
and comes from some node in the tree:
Interval trees can implement sets of intervals via unbalanced BSTs as in Section
5.2. Of course empty = ⟨⟩. Function isin was already defined in Section
5.5.1. Insertion and deletion are also very close to the versions in Section
5.2, but the value of max_hi must be computed (by max3) for each new
node. We follow Section 4.4 and introduce a smart constructor node for that
purpose and replace ⟨l, a, r⟩ by node l a r (on the right-hand side):
node ::
0a ivl_tree ) 0a ivl ) 0a ivl_tree ) 0a ivl_tree
node l a r = hl ; a ; max a max_hi l max_hi r ; r i
( 3 ( ) ( ))
insert ::
0a ivl ) 0a ivl_tree ) 0a ivl_tree
insert x hi hhi; x ; high x ; hii
= ( )
LT ) node insert x l a r j ( )
EQ ) hl ; a ; m ; r i j ( )
GT ) node l a insert x r ( ))
delete ::
0a ivl
) 0a ivl_tree ) 0a ivl_tree
delete _ hi hi=
delete x hl ; a ; _ ; r i
( )
= ( case cmp x a of
LT ) node (delete x l ) a r j
EQ ) if r = hi then l else let (x ; y ) = split_min r in node l x y j
GT ) node l a (delete x r ))
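Continuing the hypothetical Haskell sketch started above, the smart constructor and insertion could look as follows; the IvlTree type, its augmentation field and the Bounded constraint standing in for ⊥ are assumptions.

data IvlTree a = IvlLeaf | IvlNode (IvlTree a) (Ivl a, a) (IvlTree a)

maxHi :: Bounded a => IvlTree a -> a
maxHi IvlLeaf = minBound                  -- plays the role of ⊥
maxHi (IvlNode _ (_, m) _) = m

max3 :: Ord a => Ivl a -> a -> a -> a
max3 a m n = max (high a) (max m n)

-- Smart constructor: recompute the max_hi field of the new node.
node :: (Ord a, Bounded a) => IvlTree a -> Ivl a -> IvlTree a -> IvlTree a
node l a r = IvlNode l (a, max3 a (maxHi l) (maxHi r)) r

insert :: (Ord a, Bounded a) => Ivl a -> IvlTree a -> IvlTree a
insert x IvlLeaf = IvlNode IvlLeaf (x, high x) IvlLeaf
insert x t@(IvlNode l (a, m) r) =
  case compare x a of          -- intervals compared by (low, high)
    LT -> node (insert x l) a r
    EQ -> t
    GT -> node l a (insert x r)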
The correctness proofs for insertion and deletion cover two aspects. Func-
tional correctness and preservation of the invariant sorted inorder (the
BST property) are proved exactly as in Section 5.3 for ordinary BSTs. Preser-
vation of the invariant inv_max_hi can be proved by a sequence of simple
inductive properties. In the end the main correctness properties are
sorted (inorder t) ⟶ inorder (insert x t) = ins_list x (inorder t)
sorted (inorder t) ⟶ inorder (delete x t) = del_list x (inorder t)
inv_max_hi t ⟶ inv_max_hi (insert x t)
inv_max_hi t ⟶ inv_max_hi (delete x t)
Defining invar t = (inv_max_hi t ∧ sorted (inorder t)) we obtain the following top-level correctness corollaries:
invar s ⟶ set_tree (insert x s) = set_tree s ∪ {x}
invar s ⟶ set_tree (delete x s) = set_tree s − {x}
invar s ⟶ invar (insert x s)
The added functionality of interval trees over ordinary BSTs is function search
that searches for an overlapping rather than identical interval:
search :: 'a ivl_tree ⇒ 'a ivl ⇒ bool
search ⟨⟩ _ = False
search ⟨l, (a, _), r⟩ x
= (if overlap x a then True
   else if l ≠ ⟨⟩ ∧ low x ≤ max_hi l then search l x
   else search r x)
Under the invariants inv_max_hi t and sorted (inorder t), search finds an overlapping interval if and only if there is one:
search t x = has_overlap (set_tree t) x
Proof. The result is clear when t is ⟨⟩. Now suppose t is of the form ⟨l, (a, m), r⟩, where m is the value of max_hi at the root. If a overlaps with x, search returns True as expected. Otherwise, there are two cases.
• If l ≠ ⟨⟩ and low x ≤ max_hi l, the search goes to the left child. If there is an interval in the left child overlapping with x, then the search returns True as expected. Otherwise, we show there is also no interval in the right child overlapping with x. Since l ≠ ⟨⟩, Lemma 5.2 yields a node p in the left child such that high p = max_hi l. Since low x ≤ max_hi l, we have low x ≤ high p. Since p does not overlap with x, we must have high x < low p. But then, for every interval rp in the right child, low p ≤ low rp, so that high x < low rp, which implies that rp does not overlap with x.
• Now we consider the case where either l = ⟨⟩ or max_hi l < low x. In this case, the search goes to the right. We show there is no interval in the left child that overlaps with x. This is clear if l = ⟨⟩. Otherwise, for each interval lp, we have high lp ≤ max_hi l by Lemma 5.1, so that high lp < low x, which means lp does not overlap with x. □
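A Haskell version of search, reusing the hypothetical IvlTree, maxHi and overlap from the sketches above, makes the pruning logic of the proof explicit:

search :: (Ord a, Bounded a) => IvlTree a -> Ivl a -> Bool
search IvlLeaf _ = False
search (IvlNode l (a, _) r) x
  | overlap x a = True
  | goLeft      = search l x   -- an overlap, if any, must be in the left child
  | otherwise   = search r x   -- no overlap possible in the left child
  where
    goLeft = case l of
               IvlLeaf -> False
               _       -> low x <= maxHi l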
Exercise 5.4. Define a function that determines if a given point is in some interval in a given interval tree. Starting with
in_ivl :: 'a ⇒ 'a ivl ⇒ bool
in_ivl x iv = (low iv ≤ x ∧ x ≤ high iv)
5.5.6 Application
assume that each ADT describes one type of interest T. In the set interface T is 's. This type T must be specified by some existing HOL type A, the abstract model. In the case of sets this is straightforward: the model for sets is simply the HOL type 'a set. The motto is that T should behave like A. In order to bridge the gap between the two types, the specification needs an
• abstraction function α :: T ⇒ A
that maps concrete values to their abstract counterparts. Moreover, in general only some elements of T represent elements of A. For example, in the set implementation in the previous chapter not all trees but only BSTs represent sets. Thus the specification should also take into account an
• invariant invar :: T ⇒ bool
Note that the abstraction function and the invariant are not part of the interface, but they are essential for specification and verification purposes.
As an example, the ADT of sets is shown in Figure 6.1 with suggestive
keywords and a fixed mnemonic naming schema for the labels in the speci-
fication. This is the template for ADTs that we follow throughout the book.
ADT Set =
interface
  empty :: 's
  insert :: 'a ⇒ 's ⇒ 's
  delete :: 'a ⇒ 's ⇒ 's
  isin :: 's ⇒ 'a ⇒ bool
abstraction set :: 's ⇒ 'a set
invariant invar :: 's ⇒ bool
specification
  invar empty (empty-inv)
  set empty = {} (empty)
  invar s ⟶ invar (insert x s) (insert-inv)
  invar s ⟶ set (insert x s) = set s ∪ {x} (insert)
  invar s ⟶ invar (delete x s) (delete-inv)
  invar s ⟶ set (delete x s) = set s − {x} (delete)
  invar s ⟶ isin s x = (x ∈ set s) (isin)
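As a rough Haskell rendering of this interface (a sketch, not the book's formalization), the operations can be bundled in a record parameterized by the implementation type s and the element type a; the abstraction function and the invariant deliberately do not appear, mirroring the remark above.

data SetADT s a = SetADT
  { empty  :: s
  , insert :: a -> s -> s
  , delete :: a -> s -> s
  , isin   :: s -> a -> Bool
  }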
The actual Isabelle text can of course be found in the source files, and locales
are explained in a dedicated manual [5].
We conclude this section by explaining what the specification of an arbitrary ADT looks like. We assume that for each function f of the interface there is a corresponding function fA in the abstract model, i.e. defined on A. For a uniform treatment we extend α and invar to arbitrary types by setting α x = x and invar x = True for all types other than T. Each function f of the interface gives rise to two properties in the specification: preservation of the invariant and simulation of fA. The precondition is shared:
invar x1 ∧ … ∧ invar xn ⟶
  invar (f x1 … xn) (f-inv)
  α (f x1 … xn) = fA (α x1) … (α xn) (f)
To understand how the specification of ADT Set is the result of this uniform schema one has to take two things into account:
• Precisely which abstract operations on type 'a set model the functions in the interface of the ADT Set? This correspondence is implicit in the specification: empty is modeled by {}, insert is modeled by λx s. s ∪ {x}, delete is modeled by λx s. s − {x} and isin is modeled by λs x. x ∈ s.
• Because of the artificial extension of α and invar the above uniform format often collapses to something simpler where some α's and invar's disappear.
6.3 Maps
An even more versatile data structure than sets are (efficient) maps from 'a to 'b. In fact, sets can be viewed as maps from 'a to bool. Conversely, many data structures for sets also support maps, e.g. BSTs. Although, for simplicity, we focus on sets in this book, the ADT of maps should at least be introduced. It is shown in Figure 6.2. Type 'm is the type of maps from 'a to 'b. The ADT Map is very similar to the ADT Set except that the abstraction function lookup is also part of the interface: it abstracts a map to a function of type 'a ⇒ 'b option. This implies that the equations are between functions of that type. We use the function update notation (see Section 1.3) to explain update and delete: update is modeled by λm a b. m(a := b) and delete by λm a. m(a := None).
ADT Map =
interface
  empty :: 'm
  update :: 'a ⇒ 'b ⇒ 'm ⇒ 'm
  delete :: 'a ⇒ 'm ⇒ 'm
  lookup :: 'm ⇒ 'a ⇒ 'b option
abstraction lookup
invariant invar :: 'm ⇒ bool
specification
  invar empty (empty-inv)
  lookup empty = (λa. None) (empty)
  invar m ⟶ invar (update a b m) (update-inv)
  invar m ⟶ lookup (update a b m) = (lookup m)(a := Some b) (update)
  invar m ⟶ invar (delete a m) (delete-inv)
  invar m ⟶ lookup (delete a m) = (lookup m)(a := None) (delete)
6.5 Exercises
Exercise 6.1. Modify the ADT Set specification by making isin the abstraction function (from 's to 'a ⇒ bool). Follow the example of the ADT Map specification.
Exercise 6.2. In the ADT Map specification, following the general schema, there should be a property labeled (lookup), but it is missing. The reason is that given the correct abstract model of lookup, the equation becomes trivial: invar m ⟶ lookup m a = lookup m a. Why is that, and which function models lookup?
Exercise 6.3. Implement ADT Map via unbalanced BSTs (like Set in Chap-
ter 5) using augmented trees. Verify the implementation by proving all the
correctness properties in the specification of ADT Map directly, without any
detour via sorted lists as in Section 5.4.
7
2-3 Trees
datatype 'a tree23 =
  Leaf |
  Node2 ('a tree23) 'a ('a tree23) |
  Node3 ('a tree23) 'a ('a tree23) 'a ('a tree23)

⟨⟩ ≡ Leaf
⟨l, a, r⟩ ≡ Node2 l a r
⟨l, a, m, b, r⟩ ≡ Node3 l a m b r
The size, height and the completeness of a 2-3 tree are defined by adding an equation for Node3 to the corresponding definitions on binary trees:
|⟨l, _, m, _, r⟩| = |l| + |m| + |r| + 1
h ⟨l, _, m, _, r⟩ = max (h l) (max (h m) (h r)) + 1
complete ⟨l, _, m, _, r⟩
= (h l = h m ∧ h m = h r ∧ complete l ∧ complete m ∧ complete r)
Exercise 7.1. Define a function maxt :: nat ⇒ unit tree23 that creates the tree with the largest number of nodes given the height of the tree. We use type unit because we are not interested in the elements in the tree. Prove |maxt n| = (3^n − 1) div 2 and that no tree of the given height can be larger: |t| ≤ (3^(h t) − 1) div 2. Note that both subtraction and division on type nat can be tedious to work with. You may want to prove the two properties as corollaries of subtraction- and division-free properties. Alternatively, work with real instead of nat by replacing div by /.
The implementation will maintain the usual ordering invariant and addition-
ally completeness. When we speak of a 2-3 tree we will implicitly assume these
two invariants now.
Searching a 2-3 tree is like searching a binary tree (see Section 5.2) but with one more defining equation:
isin ⟨l, a, m, b, r⟩ x
= (case cmp x a of LT ⇒ isin l x | EQ ⇒ True |
   GT ⇒ (case cmp x b of LT ⇒ isin m x | EQ ⇒ True | GT ⇒ isin r x))
datatype 'a upI = TI ('a tree23) | OF ('a tree23) 'a ('a tree23)

treeI :: 'a upI ⇒ 'a tree23
treeI (TI t) = t
treeI (OF l a r) = ⟨l, a, r⟩
Deletion is dual. Recursive calls must report back to the caller if the child has “underflown”, i.e. decreased in height. Therefore deletion returns a result of type 'a upD:
datatype 'a upD = TD ('a tree23) | UF ('a tree23)

treeD :: 'a upD ⇒ 'a tree23
treeD (TD t) = t
treeD (UF t) = t
ins :: 'a ⇒ 'a tree23 ⇒ 'a upI
ins x ⟨⟩ = OF ⟨⟩ x ⟨⟩
ins x ⟨l, a, r⟩
= (case cmp x a of
     LT ⇒ (case ins x l of TI l' ⇒ TI ⟨l', a, r⟩
           | OF l1 b l2 ⇒ TI ⟨l1, b, l2, a, r⟩) |
     EQ ⇒ TI ⟨l, a, r⟩ |
     GT ⇒ (case ins x r of TI r' ⇒ TI ⟨l, a, r'⟩
           | OF r1 b r2 ⇒ TI ⟨l, a, r1, b, r2⟩))
ins x ⟨l, a, m, b, r⟩
= (case cmp x a of
     LT ⇒ (case ins x l of TI l' ⇒ TI ⟨l', a, m, b, r⟩
           | OF l1 c l2 ⇒ OF ⟨l1, c, l2⟩ a ⟨m, b, r⟩) |
     EQ ⇒ TI ⟨l, a, m, b, r⟩ |
     GT ⇒ (case cmp x b of
             LT ⇒ (case ins x m of TI m' ⇒ TI ⟨l, a, m', b, r⟩
                   | OF m1 c m2 ⇒ OF ⟨l, a, m1⟩ c ⟨m2, b, r⟩) |
             EQ ⇒ TI ⟨l, a, m, b, r⟩ |
             GT ⇒ (case ins x r of TI r' ⇒ TI ⟨l, a, m, b, r'⟩
                   | OF r1 c r2 ⇒ OF ⟨l, a, m⟩ b ⟨r1, c, r2⟩)))
hI (TI t) = h t
hI (OF l a r) = h l
The intuition is that hI is the height of the tree before insertion. A routine induction proves
complete t ⟶ complete (treeI (ins a t)) ∧ hI (ins a t) = h t
which implies by definition that
complete t ⟶ complete (insert a t)
del :: 'a ⇒ 'a tree23 ⇒ 'a upD
del x ⟨⟩ = TD ⟨⟩
del x ⟨⟨⟩, a, ⟨⟩⟩ = (if x = a then UF ⟨⟩ else TD ⟨⟨⟩, a, ⟨⟩⟩)
del x ⟨⟨⟩, a, ⟨⟩, b, ⟨⟩⟩
= TD (if x = a then ⟨⟨⟩, b, ⟨⟩⟩
      else if x = b then ⟨⟨⟩, a, ⟨⟩⟩ else ⟨⟨⟩, a, ⟨⟩, b, ⟨⟩⟩)

hD (TD t) = h t
hD (UF t) = h t + 1
node21 :: 'a upD ⇒ 'a ⇒ 'a tree23 ⇒ 'a upD
node21 (TD t1) a t2 = TD ⟨t1, a, t2⟩
node21 (UF t1) a ⟨t2, b, t3⟩ = UF ⟨t1, a, t2, b, t3⟩
node21 (UF t1) a ⟨t2, b, t3, c, t4⟩ = TD ⟨⟨t1, a, t2⟩, b, ⟨t3, c, t4⟩⟩

node22 :: 'a tree23 ⇒ 'a ⇒ 'a upD ⇒ 'a upD
node22 t1 a (TD t2) = TD ⟨t1, a, t2⟩
node22 ⟨t1, b, t2⟩ a (UF t3) = UF ⟨t1, b, t2, a, t3⟩
node22 ⟨t1, b, t2, c, t3⟩ a (UF t4) = TD ⟨⟨t1, b, t2⟩, c, ⟨t3, a, t4⟩⟩

node31 :: 'a upD ⇒ 'a ⇒ 'a tree23 ⇒ 'a ⇒ 'a tree23 ⇒ 'a upD
node31 (TD t1) a t2 b t3 = TD ⟨t1, a, t2, b, t3⟩
node31 (UF t1) a ⟨t2, b, t3⟩ c t4 = TD ⟨⟨t1, a, t2, b, t3⟩, c, t4⟩
node31 (UF t1) a ⟨t2, b, t3, c, t4⟩ d t5 = TD ⟨⟨t1, a, t2⟩, b, ⟨t3, c, t4⟩, d, t5⟩

node32 :: 'a tree23 ⇒ 'a ⇒ 'a upD ⇒ 'a ⇒ 'a tree23 ⇒ 'a upD
node32 t1 a (TD t2) b t3 = TD ⟨t1, a, t2, b, t3⟩
node32 t1 a (UF t2) b ⟨t3, c, t4⟩ = TD ⟨t1, a, ⟨t2, b, t3, c, t4⟩⟩
node32 t1 a (UF t2) b ⟨t3, c, t4, d, t5⟩ = TD ⟨t1, a, ⟨t2, b, t3⟩, c, ⟨t4, d, t5⟩⟩

node33 :: 'a tree23 ⇒ 'a ⇒ 'a tree23 ⇒ 'a ⇒ 'a upD ⇒ 'a upD
node33 l a m b (TD r) = TD ⟨l, a, m, b, r⟩
node33 t1 a ⟨t2, b, t3⟩ c (UF t4) = TD ⟨t1, a, ⟨t2, b, t3, c, t4⟩⟩
node33 t1 a ⟨t2, b, t3, c, t4⟩ d (UF t5) = TD ⟨t1, a, ⟨t2, b, t3⟩, c, ⟨t4, d, t5⟩⟩
split_min t = (x, t') ∧ complete t ∧ 0 < h t ⟶ complete (treeD t')
complete t ⟶ hD (del x t) = h t
For each property of node21 there are analogous properties for the other nodeij functions, which we omit.
datatype 'a tree23s = T ('a tree23) | TTs ('a tree23) 'a ('a tree23s)

len :: 'a tree23s ⇒ nat
len (T _) = 1
len (TTs _ _ ts) = len ts + 1

trees :: 'a tree23s ⇒ 'a tree23 set
trees (T t) = {t}
trees (TTs t _ ts) = {t} ∪ trees ts

inorder2 :: 'a tree23s ⇒ 'a list
inorder2 (T t) = inorder t
inorder2 (TTs t a ts) = inorder t @ a # inorder2 ts
Repeatedly passing over the alternating list until only a single tree remains
is expressed by the following functions:
Note that join_adj is not and does not need to be defined on single trees. We express this precondition with an abbreviation:
not_T ts ≡ (∀t. ts ≠ T t)
Also note that join_all terminates only because join_adj shortens the list:
not_T ts ⟶ len (join_adj ts) < len ts
In fact, it reduces the length at least by a factor of 2:
not_T ts ⟶ len (join_adj ts) ≤ len ts div 2 (7.1)
The whole process starts with a list of alternating leaves and elements:
leaves :: 'a list ⇒ 'a tree23s
leaves [] = T ⟨⟩
leaves (a # as) = TTs ⟨⟩ a (leaves as)
The conversion produces a complete tree:
complete (tree23_of_list as)
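The functions join_adj and join_all themselves are not displayed above, so the following Haskell sketch is only one plausible reading of the description (pair adjacent trees into Node2/Node3 nodes until a single tree remains); all equations here are assumptions, not the book's code.

data Tree23 a = Leaf | Node2 (Tree23 a) a (Tree23 a)
              | Node3 (Tree23 a) a (Tree23 a) a (Tree23 a)

-- Alternating list t1, a1, t2, a2, ..., tn of trees and elements.
data Tree23s a = T (Tree23 a) | TTs (Tree23 a) a (Tree23s a)

-- One pass: join adjacent trees, roughly halving the length of the list.
-- Deliberately undefined on a single tree (T t), as noted above.
joinAdj :: Tree23s a -> Tree23s a
joinAdj (TTs t1 a (T t2))            = T (Node2 t1 a t2)
joinAdj (TTs t1 a (TTs t2 b (T t3))) = T (Node3 t1 a t2 b t3)
joinAdj (TTs t1 a (TTs t2 b ts))     = TTs (Node2 t1 a t2) b (joinAdj ts)

-- Repeat until a single tree remains.
joinAll :: Tree23s a -> Tree23 a
joinAll (T t) = t
joinAll ts    = joinAll (joinAdj ts)

leaves :: [a] -> Tree23s a
leaves []       = T Leaf
leaves (a : as) = TTs Leaf a (leaves as)

tree23OfList :: [a] -> Tree23 a
tree23OfList = joinAll . leaves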
Why does the conversion take linear time? Because the first pass over an alternating list of length n takes n steps, the next pass n/2 steps, the next pass n/4 steps, etc., and this sums up to at most 2n. The canonical time functions for the formal proof are shown in Appendix B.3. The following upper bound is easily proved by induction on the computation of join_adj:
not_T ts ⟶ T_join_adj ts ≤ len ts div 2 (7.2)
An upper bound T_join_all ts ≤ 2 · len ts follows by induction on the computation of join_all.
Bibliographic Remarks
The invention of 2-3 trees is credited to Hopcroft in 1970 [13, p. 337]. Equa-
tional definitions were given by Hoffmann and O’Donnell [29] (only insertion)
and Reade [66]. Our formalisation is based on the teaching material by Tur-
bak [74] and the article by Hinze [28].
8
Red-Black Trees
R l a r ≡ ⟨l, (a, Red), r⟩
B l a r ≡ ⟨l, (a, Black), r⟩

color :: 'a rbt ⇒ color
color ⟨⟩ = Black
color ⟨_, (_, c), _⟩ = c

paint c ⟨l, (a, _), r⟩ = ⟨l, (a, c), r⟩
8.1 Invariants
rbt :: 'a rbt ⇒ bool
rbt t = (invc t ∧ invh t ∧ color t = Black)
The color invariant expresses that red nodes must have black children:
invc :: 'a rbt ⇒ bool
invc ⟨⟩ = True
invc ⟨l, (_, c), r⟩
= ((c = Red ⟶ color l = Black ∧ color r = Black) ∧ invc l ∧ invc r)
The height invariant expresses (via the black height bh) that all paths from the root to a leaf have the same number of black nodes:
invh :: 'a rbt ⇒ bool
invh ⟨⟩ = True
invh ⟨l, (_, _), r⟩ = (bh l = bh r ∧ invh l ∧ invh r)

bh :: 'a rbt ⇒ nat
bh ⟨⟩ = 0
bh ⟨l, (_, c), _⟩ = (if c = Black then bh l + 1 else bh l)
Note that although bh traverses only the left spine of the tree, the fact that
invh traverses the complete tree ensures that all paths from the root to a leaf
are considered. (See Exercise 8.2)
The split of the invariant into invc and invh improves modularity: fre-
quently one can prove preservation of invc and invh separately, which fa-
cilitates proof search. For compactness we will mostly present the combined
invariance properties.
In a red-black tree, i.e. rbt t, every path from the root to a leaf has the same number of black nodes, and no such path has two red nodes in a row. Thus each leaf is at most twice as deep as any other leaf, and therefore h t ≤ 2 · lg |t|₁. The detailed proof starts with a key inductive relationship between h and bh, which has the easy corollary rbt t ⟶ h t ≤ 2 · bh t. Together with the easy inductive property
invc t ∧ invh t ⟶ 2 ^ bh t ≤ |t|₁
this yields the logarithmic bound on h t.
We implement sets by red-black trees that are also BSTs. As usual, we only
discuss the proofs of preservation of the rbt invariant.
Function isin is implemented as for all augmented BSTs (see Section 5.5.1).
ins :: 'a ⇒ 'a rbt ⇒ 'a rbt
ins x ⟨⟩ = R ⟨⟩ x ⟨⟩
ins x (B l a r) = (case cmp x a of
  LT ⇒ baliL (ins x l) a r |
  EQ ⇒ B l a r |
  GT ⇒ baliR l a (ins x r))
ins x (R l a r) = (case cmp x a of
  LT ⇒ R (ins x l) a r |
  EQ ⇒ R l a r |
  GT ⇒ R l a (ins x r))

baliL :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
baliL (R (R t1 a t2) b t3) c t4 = R (B t1 a t2) b (B t3 c t4)
baliL (R t1 a (R t2 b t3)) c t4 = R (B t1 a t2) b (B t3 c t4)
baliL t1 a t2 = B t1 a t2

baliR :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
baliR t1 a (R t2 b (R t3 c t4)) = R (B t1 a t2) b (B t3 c t4)
baliR t1 a (R (R t2 b t3) c t4) = R (B t1 a t2) b (B t3 c t4)
baliR t1 a t2 = B t1 a t2

8.2.1 Insertion
8.2.1 Insertion
= R (B ⟨⟩ 0 ⟨⟩) 1 (B ⟨⟩ 2 (R ⟨⟩ 3 ⟨⟩))
Passing a red node up means an overflow occurred (as in 2-3 trees) that needs
to be dealt with further up — at the latest at the very top where insert turns
red into black.
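Presumably the top-level insertion therefore repaints the root after running ins, along the lines of insert x t = paint Black (ins x t); this equation is an assumption inferred from the sentence above, not a quotation of the book's definition.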
Function ins preserves invh but not invc: it may return a tree with a red root node (see example above). However, once the root node is colored black, everything is fine again. Thus we introduce the weaker invariant invc2 as an abbreviation:
invc2 t ≡ invc (paint Black t)
It is easy to prove that baliL and baliR preserve invh and upgrade from invc2 to invc:
invh l ∧ invh r ∧ invc2 l ∧ invc r ∧ bh l = bh r ⟶
invc (baliL l a r) ∧ invh (baliL l a r) ∧ bh (baliL l a r) = bh l + 1
8.2.2 Deletion
Deletion from a red-black tree is shown in Figure 8.2. It follows the deletion-
by-replacing approach (Section 5.2.1). The tricky bit is how to maintain the
invariants. As before, intermediate trees may only satisfy the weaker invariant
invc2. Functions del and split_min decrease the black height of a tree with a
black root node and leave the black height unchanged otherwise. To see that
this makes sense, consider deletion from a singleton black or red node. The
case that the element to be removed is not in the black tree can be dealt with
by coloring the root node red. These are the precise input/output relations:
Lemma 8.1. split_min t = (x, t') ∧ t ≠ ⟨⟩ ∧ invh t ∧ invc t ⟶
invh t' ∧ (color t = Red ⟶ bh t' = bh t ∧ invc t') ∧
(color t = Black ⟶ bh t' = bh t − 1 ∧ invc2 t')
Lemma 8.2. invh t ∧ invc t ∧ t' = del x t ⟶
invh t' ∧ (color t = Red ⟶ bh t' = bh t ∧ invc t') ∧
(color t = Black ⟶ bh t' = bh t − 1 ∧ invc2 t')
It is easy to see that the del-Lemma implies correctness of delete:
Corollary 8.3. rbt t ! rbt ( delete x t )
del :: 'a ⇒ 'a rbt ⇒ 'a rbt
del _ ⟨⟩ = ⟨⟩
del x ⟨l, (a, _), r⟩
= (case cmp x a of
     LT ⇒ let l' = del x l
          in if l ≠ ⟨⟩ ∧ color l = Black then baldL l' a r else R l' a r |
     EQ ⇒ if r = ⟨⟩ then l
          else let (a', r') = split_min r
               in if color r = Black then baldR l a' r' else R l a' r' |
     GT ⇒ let r' = del x r
          in if r ≠ ⟨⟩ ∧ color r = Black then baldR l a r' else R l a r')

baldL :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
baldL (R t1 a t2) b t3 = R (B t1 a t2) b t3
baldL t1 a (B t2 b t3) = baliR t1 a (R t2 b t3)
baldL t1 a (R (B t2 b t3) c t4) = R (B t1 a t2) b (baliR t3 c (paint Red t4))
baldL t1 a t2 = R t1 a t2

baldR :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
baldR t1 a (R t2 b t3) = R t1 a (B t2 b t3)
baldR (B t1 a t2) b t3 = baliL (R t1 a t2) b t3
baldR t1 a t2 = R t1 a t2
The proofs of the two lemmas need the following precise characterizations of baldL and baldR, the counterparts of baliL and baliR:
Lemma 8.4. invh l ∧ invh r ∧ bh l + 1 = bh r ∧ invc2 l ∧ invc r ∧ t' = baldL l a r ⟶
invh t' ∧ bh t' = bh r ∧ invc2 t' ∧ (color r = Black ⟶ invc t')
Dually, t' = baldR l a r ⟶
invh t' ∧ bh t' = bh l ∧ invc2 t' ∧ (color l = Black ⟶ invc t')
The proofs of these lemmas are by case analyses over the defining equations using the characteristic properties of baliL and baliR given above.
The case a < x is dual and the case x = a is similar (using Lemma 8.1). We do not show the details because they are tedious but routine. □
The proof of Lemma 8.1 is similar but simpler.
8.3 Exercises
Exercise 8.1. Show that the logarithmic height of red-black trees is already guaranteed by the color and height invariants:
invc t ∧ invh t ⟶ h t ≤ 2 · lg |t|₁ + 2
del :: 'a ⇒ 'a rbt ⇒ 'a rbt
del _ ⟨⟩ = ⟨⟩
del x ⟨l, (a, _), r⟩
= (case cmp x a of
join :: 'a rbt ⇒ 'a rbt ⇒ 'a rbt
join ⟨⟩ t = t
join t ⟨⟩ = t
join (R t1 a t2) (R t3 c t4)
= (case join t2 t3 of
     R u2 b u3 ⇒ R (R t1 a u2) b (R u3 c t4) |
     t23 ⇒ R t1 a (R t23 c t4))
join (B t1 a t2) (B t3 c t4)
= (case join t2 t3 of
     R u2 b u3 ⇒ R (B t1 a u2) b (B u3 c t4) |
     t23 ⇒ baldL t1 a (B t23 c t4))
join t1 (R t2 a t3) = R (join t1 t2) a t3
join (R t1 a t2) t3 = R t1 a (join t2 t3)
Exercise 8.3. Following Section 7.3, define a linear time function rbt_of_list :: 'a list ⇒ 'a rbt and prove both inorder (rbt_of_list as) = as and rbt (rbt_of_list as).
Bibliographic Remarks
Red-Black trees were invented by Bayer [6] who called them “symmetric bi-
nary B-trees”. The red-black color convention was introduced by Guibas and
Sedgewick [23] who studied their properties in greater depth. The first func-
tional version of red-black trees (without deletion) is due to Okasaki [62] and
everybody follows his code. A functional version of deletion was first given
by Kahrs [36]¹ and Section 8.2.3 is based on it. Germane [22] presents a function for deletion by replacement that is quite different from the one in Section 8.2.2. Our starting point was the Isabelle proofs by Reiter and Krauss (based on Kahrs). Other verifications of red-black trees are reported by Filliâtre and Letouzey [19] (using their own deletion function) and Appel [4] (based on Kahrs).
¹ The code for deletion is not in the article but can be retrieved from this URL:
http://www.cs.ukc.ac.uk/people/staff/smk/redblack/rb.html
9
AVL Trees
The AVL tree [2] (named after its inventors Adelson-Velsky and Landis) is
the granddaddy of efficient binary search trees. Its logarithmic height is main-
tained by rotating subtrees based on their height. For efficiency reasons the
height of each subtree is stored in its root node. That is, the underlying data
structure is a height-augmented tree (see Section 4.4):
Function ht extracts the height field and node is a smart constructor that
sets the height field:
ht :: 'a tree_ht ⇒ nat
ht ⟨⟩ = 0
ht ⟨_, (_, n), _⟩ = n

node :: 'a tree_ht ⇒ 'a ⇒ 'a tree_ht ⇒ 'a tree_ht
node l a r = ⟨l, (a, max (ht l) (ht r) + 1), r⟩
An AVL tree is a tree that satisfies the AVL invariant: the heights of the left and right child of any node differ by at most 1
avl :: 'a tree_ht ⇒ bool
avl ⟨⟩ = True
avl ⟨l, (_, n), r⟩
= (|int (h l) − int (h r)| ≤ 1 ∧
   n = max (h l) (h r) + 1 ∧ avl l ∧ avl r)
and the height field contains the correct value. The conversion function int :: nat ⇒ int is needed because on type nat subtraction is cut off at 0.
AVL trees have logarithmic height. The key insight for the proof is that M n, the minimal number of leaves of an AVL tree of height n, satisfies the recurrence relation M (n + 2) = M (n + 1) + M n. Instead of formalizing this function M we prove directly that an AVL tree of height n has at least fib (n + 2) leaves where fib is the Fibonacci function:
= fib (h l + 2) + fib (h l + 1)
≤ |l|₁ + fib (h l + 1)   by fib (h l + 2) ≤ |l|₁
≤ |l|₁ + |r|₁            by fib (h r + 2) ≤ |r|₁
The last step is justified because h l + 1 ≤ h r + 2 (which follows from avl t) and fib is monotone. □
Now we prove a well-known exponential lower bound for fib, where φ = (1 + √5)/2 is the golden ratio: fib (n + 2) ≥ φ^n. The interesting case of the induction is
fib (n + 4) = fib (n + 3) + fib (n + 2)
≥ φ^(n+1) + φ^n   by IHs
= (φ + 1) · φ^n
= φ^(n+2)   because φ + 1 = φ² □
That is, the height of an AVL tree is at most 1/lg φ ≈ 1.44 times worse than the optimal height.
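Putting the two facts together (a hedged summary of the argument, not a verbatim statement from the text): |t|₁ ≥ fib (h t + 2) ≥ φ^(h t), hence h t ≤ log_φ |t|₁ = lg |t|₁ / lg φ ≈ 1.44 · lg |t|₁.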
9.2.1 Insertion
Insertion follows the standard approach: insert the element as usual and
reestablish the AVL invariant on the way back up.
insert :: 'a ⇒ 'a tree_ht ⇒ 'a tree_ht
insert x ⟨⟩ = ⟨⟨⟩, (x, 1), ⟨⟩⟩
insert x ⟨l, (a, n), r⟩ = (case cmp x a of
  LT ⇒ balL (insert x l) a r |
  EQ ⇒ ⟨l, (a, n), r⟩ |
  GT ⇒ balR l a (insert x r))
Functions balL/balR readjust the tree after an insertion into the left/right child. The AVL invariant has been lost if the difference in height has become 2; it cannot become more because the height can only increase by 1. Consider the definition of balL in Figure 9.1 (balR in Figure 9.2 is dual). If the AVL invariant has not been lost, i.e. if ht AB ≠ ht C + 2, then we can just return the AVL tree node AB c C. But if ht AB = ht C + 2, we need to “rotate” the subtrees suitably. Clearly AB must be of the form ⟨A, (a, _), B⟩. There are two cases, which are illustrated in Figure 9.1. Rectangles denote trees. Rectangles of the same height denote trees of the same height. Rectangles with a +1 denote the additional level due to insertion of the new element.
If ht B ≤ ht A then balL performs what is known as a single rotation.
If ht A < ht B then B must be of the form ⟨B1, (b, _), B2⟩ (where either B1 or B2 has increased in height) and balL performs what is known as a double rotation.
It is easy to check that in both cases the tree on the right satisfies the
AVL invariant.
balL :: 'a tree_ht ⇒ 'a ⇒ 'a tree_ht ⇒ 'a tree_ht
balL AB c C
= (if ht AB = ht C + 2
   then case AB of
     ⟨A, (a, x), B⟩ ⇒
       if ht B ≤ ht A then node A a (node B c C)
       else case B of
         ⟨B1, (b, _), B2⟩ ⇒ node (node A a B1) b (node B2 c C)
   else node AB c C)
Single rotation performed by balL (Figure 9.1): ⟨⟨A, (a, _), B⟩, (c, _), C⟩ is turned into node A a (node B c C).
Double rotation performed by balL (Figure 9.1): ⟨⟨A, (a, _), ⟨B1, (b, _), B2⟩⟩, (c, _), C⟩ is turned into node (node A a B1) b (node B2 c C).
balR :: 'a tree_ht ⇒ 'a ⇒ 'a tree_ht ⇒ 'a tree_ht
balR A a BC
= (if ht BC = ht A + 2
   then case BC of
     ⟨B, (c, x), C⟩ ⇒
       if ht B ≤ ht C then node (node A a B) c C
       else case B of
         ⟨B1, (b, _), B2⟩ ⇒ node (node A a B1) b (node B2 c C)
   else node A a BC)
9.2.2 Deletion
delete :: 'a ⇒ 'a tree_ht ⇒ 'a tree_ht
delete _ ⟨⟩ = ⟨⟩
delete x ⟨l, (a, _), r⟩
= (case cmp x a of
     LT ⇒ balR (delete x l) a r |
     EQ ⇒ if l = ⟨⟩ then r else let (l', a') = split_max l in balR l' a' r |
     GT ⇒ balL l a (delete x r))
Figure 9.3 shows deletion-by-replacing (see 5.2.1). The recursive calls are
dual to insertion: in terms of the difference in height, deletion of some element
from one child is the same as insertion of some element into the other child.
Thus functions balR/balL can again be employed to restore the invariant.
An element is deleted from a node by replacing it with the maximal el-
ement of the left child (the minimal element of the right child would work
just as well). Function split_max performs that extraction and uses balL to
restore the invariant after splitting an element off the right child.
The fact that balR/balL can be reused for deletion can be illustrated by
drawing the corresponding rotation diagrams. We look at how the code for
balL behaves when an element has been deleted from C. Dashed rectangles
indicate a single additional level that may or may not be there. The label -1
indicates that the level has disappeared due to deletion.
Single rotation in balL after deletion in C: ⟨⟨A, (a, _), B⟩, (c, _), C⟩ becomes node A a (node B c C).
Double rotation in balL after deletion in C: ⟨⟨A, (a, _), ⟨B1, (b, _), B2⟩⟩, (c, _), C⟩ becomes node (node A a B1) b (node B2 c C).
9.3 Exercises
Exercise 9.1. The logarithmic height of AVL trees can be proved directly. Prove
avl t ∧ h t = n ⟶ 2 ^ (n div 2) ≤ |t|₁
We are only interested in the shape of these trees. Therefore the nodes just contain dummy unit values (). Hence we need to define the AVL invariant again for trees without annotations:
avl0 :: 'a tree ⇒ bool
avl0 ⟨⟩ = True
avl0 ⟨l, _, r⟩ = (|int (h l) − int (h r)| ≤ 1 ∧ avl0 l ∧ avl0 r)
Prove the following properties of Fibonacci trees:
avl0 (fibt n)    |fibt n|₁ = fib (n + 2)
Conclude that the Fibonacci trees are minimal (w.r.t. their size) among all AVL trees of a given height:
avl t ⟶ |fibt (h t)|₁ ≤ |t|₁
Exercise 9.3. Show that every almost complete tree is an AVL tree:
acomplete t ⟶ avl0 t
As in the previous exercise we consider trees without height annotations.
avl (avl_of_list as ).
9.4 An Optimization
Instead of recording the height of the tree in each node, it suffices to record the balance factor, i.e. the difference in height of its two children. Rather than the three integers -1, 0 and 1 we utilize a new data type:
datatype bal = Lh | Bal | Rh
The names Lh and Rh stand for “left-heavy” and “right-heavy”. The AVL invariant for these trees reflects these names:
avl :: 'a tree_bal ⇒ bool
avl ⟨⟩ = True
avl ⟨l, (_, b), r⟩ = ((case b of
    Lh ⇒ h l = h r + 1 |
    Bal ⇒ h r = h l |
    Rh ⇒ h r = h l + 1) ∧
  avl l ∧ avl r)
The code for insertion (and deletion) is similar to the height-based version. The key difference is that the test whether the AVL invariant has been lost cannot be based on the height anymore. We need to detect whether the tree has increased in height upon insertion based on the balance factors. The key insight is that a height increase is coupled with a change from Bal to Lh or Rh, except when we transition from ⟨⟩ to ⟨⟨⟩, (a, Bal), ⟨⟩⟩. This insight is encoded in the test incr:
The test for a height increase compares the trees before and after insertion.
Therefore it has been pulled out of the balance functions into insertion:
insert :: 'a ⇒ 'a tree_bal ⇒ 'a tree_bal
insert x ⟨⟩ = ⟨⟨⟩, (x, Bal), ⟨⟩⟩
insert x ⟨l, (a, b), r⟩
= (case cmp x a of
     LT ⇒ let l' = insert x l
          in if incr l l' then balL l' a b r else ⟨l', (a, b), r⟩ |
     EQ ⇒ ⟨l, (a, b), r⟩ |
     GT ⇒ let r' = insert x r
          in if incr r r' then balR l a b r' else ⟨l, (a, b), r'⟩)
The balance functions are shown in Figure 9.4. Function rot2 implements
double rotations. Function balL is called if the left child AB has increased
in height. If the tree was Lh then single or double rotations are necessary
balL :: 'a tree_bal ⇒ 'a ⇒ bal ⇒ 'a tree_bal ⇒ 'a tree_bal
balL AB c bc C
= (case bc of
     Lh ⇒ (case AB of
             ⟨A, (a, Lh), B⟩ ⇒ ⟨A, (a, Bal), ⟨B, (c, Bal), C⟩⟩ |
             ⟨A, (a, Bal), B⟩ ⇒ ⟨A, (a, Rh), ⟨B, (c, Lh), C⟩⟩ |
             ⟨A, (a, Rh), B⟩ ⇒ rot2 A a B c C) |
     Bal ⇒ ⟨AB, (c, Lh), C⟩ |
     Rh ⇒ ⟨AB, (c, Bal), C⟩)

balR :: 'a tree_bal ⇒ 'a ⇒ bal ⇒ 'a tree_bal ⇒ 'a tree_bal
balR A a ba BC
= (case ba of
     Lh ⇒ ⟨A, (a, Bal), BC⟩ |
     Bal ⇒ ⟨A, (a, Rh), BC⟩ |
     Rh ⇒ (case BC of
             ⟨B, (c, Lh), C⟩ ⇒ rot2 A a B c C |
             ⟨B, (c, Bal), C⟩ ⇒ ⟨⟨A, (a, Rh), B⟩, (c, Lh), C⟩ |
             ⟨B, (c, Rh), C⟩ ⇒ ⟨⟨A, (a, Bal), B⟩, (c, Bal), C⟩))

rot2 :: 'a tree_bal ⇒ 'a ⇒ 'a tree_bal ⇒ 'a ⇒ 'a tree_bal ⇒ 'a tree_bal
rot2 A a B c C
= (case B of
     ⟨B1, (b, bb), B2⟩ ⇒
       let b1 = if bb = Rh then Lh else Bal;
           b2 = if bb = Lh then Rh else Bal
       in ⟨⟨A, (a, b1), B1⟩, (b, Bal), ⟨B2, (c, b2), C⟩⟩)
The functions incr and decr are almost dual, except that incr implicitly assumes t' ≠ ⟨⟩ because insertion is guaranteed to return a Node. Thus we could use decr instead of incr but not the other way around.
Deletion and split_max change in the same manner as insertion:
delete :: 'a ⇒ 'a tree_bal ⇒ 'a tree_bal
delete _ ⟨⟩ = ⟨⟩
delete x ⟨l, (a, ba), r⟩
= (case cmp x a of
     LT ⇒ let l' = delete x l
          in if decr l l' then balR l' a ba r else ⟨l', (a, ba), r⟩ |
     EQ ⇒ if l = ⟨⟩ then r
          else let (l', a') = split_max l
               in if decr l l' then balR l' a' ba r else ⟨l', (a', ba), r⟩ |
     GT ⇒ let r' = delete x r
          in if decr r r' then balL l a ba r' else ⟨l, (a, ba), r'⟩)
This theorem tells us not only that avl is preserved but also that incr indi-
cates correctly if the height has increased or not.
Similarly for deletion and decr :
Theorem 9.7. avl t ∧ t' = delete x t ⟶
avl t' ∧ h t = h t' + (if decr t t' then 1 else 0)
9.5 Exercises
Exercise 9.6. We map type 'a tree_bal back to type ('a × nat) tree, called 'a tree_ht in the beginning of the chapter:
debal :: 'a tree_bal ⇒ ('a × nat) tree
debal ⟨⟩ = ⟨⟩
debal ⟨l, (a, _), r⟩ = ⟨debal l, (a, max (h l) (h r) + 1), debal r⟩
Prove that the AVL property is preserved: avl t ⟶ avl_ht (debal t) where avl_ht is the avl predicate on type 'a tree_ht from the beginning of the chapter.
Define a function debal2 of the same type that traverses the tree only once and in particular does not use function h. Prove avl t ⟶ debal2 t = debal t.
Exercise 9.7. To realize the full space savings potential of balance factors we
encode them directly into the node constructors and work with the following
special tree type:
datatype 'a tree4 = Leaf
  | Lh ('a tree4) 'a ('a tree4)
  | Bal ('a tree4) 'a ('a tree4)
  | Rh ('a tree4) 'a ('a tree4)
On this type define the AVL invariant, insertion, deletion and all necessary
auxiliary functions. Prove theorems 9.6 and 9.7. Hint: modify the theory un-
derlying Section 9.4.
10
Beyond Insert and Delete: ∪, ∩ and −
smaller set is inserted into the larger one or the other way around. Of course the former sum is less than or equal to the latter sum. To estimate the growth of lg n + ⋯ + lg (n + m − 1) = lg (n ⋯ (n + m − 1)) we can easily generalize the derivation of lg (n!) ∈ Θ(n · lg n) in the initial paragraph of Section 7.3. The result is lg (n ⋯ (n + m − 1)) ∈ Θ(m · lg n). That is, inserting m elements into an n element set one by one takes time Θ(m · lg n).
There is a second, possibly naive sounding algorithm for computing the union: flatten both trees to ordered lists (using function inorder2 from Exercise 4.1), merge both lists and convert the resulting list back into a suitably balanced search tree. All three steps take linear time. The last step is the only slightly nontrivial one but has been dealt with before (see Section 7.3 and Exercises 8.3 and 9.5). This algorithm takes time O(m + n), which is significantly better than O(m · lg n) if m ≈ n but significantly worse if m ≪ n.
This chapter is concerned with a third approach that has the following
salient features:
• Union, intersection and difference take time O(m · lg (n/m + 1)).
• It works for a whole class of balanced trees, including AVL, red-black and
weight-balanced trees.
• It is based on a single function for joining two balanced trees to form a
new balanced tree.
We call it the join approach. It is easily and efficiently parallelizable, a
property we will not explore here.
The join approach is at least as fast as the one-by-one approach: from m + n ≤ m · n it follows that n/m + 1 ≤ n (if m, n ≥ 2). The join approach is also at least as fast as the tree-to-list-to-tree approach because m + n = m · (n/m + 1) (if m ≥ 1).
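For instance, for m = n the bound O(m · lg (n/m + 1)) is O(n), matching the list-based approach, while for m = 1 it is O(lg n), the cost of a single insertion.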
Before explaining the join approach we extend the ADT Set by three new
functions union, inter and diff. The specification in Figure 10.1 is self-
explanatory.
Now we come to the heart of the matter, the definition of union, intersection
and difference in terms of a single function join. We promised that the al-
gorithms would be generic across a range of balanced trees. Thus we assume
that we operate on augmented trees of type ('a × 'b) tree where 'a is the type of the elements and 'b is the balancing information (which we can ignore
set of elements, join must behave like union. But it need only return a BST
if both trees are BSTs and the element a lies in between the elements of
the two trees, i.e. if bst hl ; (a ; _); r i. The structural invariant inv must be
preserved by formation and destruction of trees. Thus we can see join as a
smart constructor that builds a balanced tree.
To define union and friends we need a number of simple auxiliary functions shown in Figure 10.3. Function split_min decomposes a tree into its leftmost (minimal) element and the remaining tree; the remaining tree is reassembled via join, thus preserving inv. Function join2 is reduced to join with the help of split_min. Function split splits a BST w.r.t. a given element a into a triple (l, b, r) such that l contains the elements less than a, r contains the elements greater than a, and b indicates whether a was in the tree.
insert :: 'a ⇒ ('a × 'b) tree ⇒ ('a × 'b) tree
delete :: 'a ⇒ ('a × 'b) tree ⇒ ('a × 'b) tree
The efficiency can be improved a little by taking the returned b into account.
split_min :: ('a × 'b) tree ⇒ 'a × ('a × 'b) tree
split_min ⟨l, (a, _), r⟩
= (if l = ⟨⟩ then (a, r)
   else let (m, l') = split_min l in (m, join l' a r))

join2 :: ('a × 'b) tree ⇒ ('a × 'b) tree ⇒ ('a × 'b) tree
join2 l ⟨⟩ = l
join2 l r = (let (m, r') = split_min r in join l m r')

split :: ('a × 'b) tree ⇒ 'a ⇒ ('a × 'b) tree × bool × ('a × 'b) tree
split ⟨⟩ _ = (⟨⟩, False, ⟨⟩)
split ⟨l, (a, _), r⟩ x
= (case cmp x a of
     LT ⇒ let (l1, b, l2) = split l x in (l1, b, join l2 a r) |
     EQ ⇒ (l, True, r) |
     GT ⇒ let (r1, b, r2) = split r x in (join l a r1, b, r2))
10.2.1 Correctness
We need to prove that union, inter and diff satisfy the specification in Fig-
ure 10.1 where set = set_tree and invar t = inv t ^ bst t. That is, for each
function we show its set-theoretic property and preservation of inv and bst
using the assumptions in Figure 10.2. Most of the proofs in this section are
obvious and automatic inductions and we do not discuss them.
First we need to prove suitable properties of the auxiliary functions
split_min, join2 and split :
split_min t = (m, t') ∧ t ≠ ⟨⟩ ⟶
m ∈ set_tree t ∧ set_tree t = {m} ∪ set_tree t'
split_min t = (m, t') ∧ bst t ∧ t ≠ ⟨⟩ ⟶
bst t' ∧ (∀x ∈ set_tree t'. m < x)
split_min t = (m, t') ∧ inv t ∧ t ≠ ⟨⟩ ⟶ inv t'
union :: ('a × 'b) tree ⇒ ('a × 'b) tree ⇒ ('a × 'b) tree
union ⟨⟩ t = t
union t ⟨⟩ = t
union ⟨l1, (a, _), r1⟩ t2
= (let (l2, b2, r2) = split t2 a
   in join (union l1 l2) a (union r1 r2))

inter :: ('a × 'b) tree ⇒ ('a × 'b) tree ⇒ ('a × 'b) tree
inter ⟨⟩ t = ⟨⟩
inter t ⟨⟩ = ⟨⟩
inter ⟨l1, (a, _), r1⟩ t2
= (let (l2, b2, r2) = split t2 a;
       l' = inter l1 l2; r' = inter r1 r2
   in if b2 then join l' a r' else join2 l' r')

diff :: ('a × 'b) tree ⇒ ('a × 'b) tree ⇒ ('a × 'b) tree
diff ⟨⟩ t = ⟨⟩
diff t ⟨⟩ = t
diff t1 ⟨l2, (a, _), r2⟩
= (let (l1, b1, r1) = split t1 a
   in join2 (diff l1 l2) (diff r1 r2))

bst (join2 l r)
inv l ∧ inv r ⟶ inv (join2 l r)
split t x = (l, b, r) ∧ bst t ⟶ …
The proofs of the preservation properties are automatic, but the proof of the set_tree property is more involved than the corresponding proof for union and we take a closer look at the induction. We focus on the case t1 = ⟨l1, (a, _), r1⟩ and t2 ≠ ⟨⟩. Let L1 = set_tree l1 and R1 = set_tree r1. Let (l2, b, r2) = split t2 a, L2 = set_tree l2, R2 = set_tree r2 and A = (if b then {a} else {}). The separation properties
a ∉ L1 ∪ R1    a ∉ L2 ∪ R2
L2 ∩ R2 = {}   L1 ∩ R2 = {}   L2 ∩ R1 = {}
follow from bst t1, bst t2 and (10.6). Now for the main proof:
set_tree t1 ∩ set_tree t2
= (L1 ∪ R1 ∪ {a}) ∩ (L2 ∪ R2 ∪ A)   by (10.6), bst t2
= L1 ∩ L2 ∪ R1 ∩ R2 ∪ A   by the separation properties
= set_tree (inter t1 t2)   by (10.1), (10.5), IHs, bst t1, bst t2, (10.6)
The correctness properties of diff follow the same pattern and their proofs
are similar to the proofs of the inter properties. This concludes the generic
join approach.
This section shows how to implement join efficiently on red-black trees. The
basic idea is simple: descend along the spine of the higher of the two trees
until reaching a subtree whose height is the same as the height of the lower
tree. With suitable changes this works for other balanced trees as well [9].
The function definitions are shown in Figure 10.5. Function join calls joinR
(descending along the right spine of l) if l is the higher tree, or calls joinL
(descending along the left spine of r ) if r is the higher tree, or returns B l x r
joinL :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
joinL l x r
= (if bh r ≤ bh l then R l x r
   else case r of
     ⟨l', (x', Red), r'⟩ ⇒ R (joinL l x l') x' r' |
     ⟨l', (x', Black), r'⟩ ⇒ baliL (joinL l x l') x' r')

joinR :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
joinR l x r
= (if bh l ≤ bh r then R l x r
   else case l of
     ⟨l', (x', Red), r'⟩ ⇒ R l' x' (joinR r' x r) |
     ⟨l', (x', Black), r'⟩ ⇒ baliR l' x' (joinR r' x r))

join :: 'a rbt ⇒ 'a ⇒ 'a rbt ⇒ 'a rbt
join l x r
= (if bh r < bh l then paint Black (joinR l x r)
   else if bh l < bh r then paint Black (joinL l x r)
   else B l x r)
otherwise. The running time is linear in the black height (and thus logarithmic in the size) if we assume that the black height is stored in each node; our implementation of red-black trees would have to be augmented accordingly. Note that in joinR (and similarly in joinL) the comparison is not bh l = bh r but bh l ≤ bh r, in order to simplify the proofs.
10.3.1 Correctness
We need to prove that join on red-black trees (and a suitable inv ) satisfies its
specification in Figure 10.2. We start with properties of joinL; the properties
of function joinR are completely symmetric. These are the automatically
provable inductive properties:
invc l ∧ invc r ∧ invh l ∧ invh r ∧ bh l ≤ bh r ⟶
invc2 (joinL l x r) ∧
(bh l ≠ bh r ∧ color r = Black ⟶ invc (joinL l x r)) ∧
invh (joinL l x r) ∧ bh (joinL l x r) = bh r

bh l ≤ bh r ⟶ set_tree (joinL l x r) = set_tree l ∪ {x} ∪ set_tree r
bst ⟨l, (a, n), r⟩ ∧ bh l ≤ bh r ⟶ bst (joinL l a r)
Because joinL employs baliL from the chapter on red-black trees, the proof
of the first property makes use of the baliL property shown in Section 8.2.1.
We define the invariant inv required for the specification in Figure 10.2 as follows:
inv t = (invc t ∧ invh t)
Although weaker than rbt, it still guarantees logarithmic height (see Exercise 8.1). Note that rbt itself does not work because it does not satisfy property (10.4). The properties of join and inv are now easy consequences of the joinL (and joinR) properties shown above.
10.4 Exercises
Exercise 10.1. Define an alternative version diff1 of diff where in the third equation pattern matching is on t1 and t2 is split. Prove bst t1 ∧ bst t2 ⟶ set_tree (diff1 t1 t2) = set_tree t1 − set_tree t2.
Exercise 10.2. Following the general idea of the join function for red-black trees, define a join function for 2-3 trees. Start with two functions joinL, joinR :: 'a tree23 ⇒ 'a ⇒ 'a tree23 ⇒ 'a upI and combine them into the overall join function:
join :: 'a tree23 ⇒ 'a ⇒ 'a tree23 ⇒ 'a tree23
Prove the following correctness properties:
complete l ∧ complete r ⟶ complete (join l x r)
complete l ∧ complete r ⟶
inorder (join l x r) = inorder l @ x # inorder r
The corresponding (and needed) properties of joinL and joinR are slightly more involved.
Bibliographic Remarks
The join approach goes back to Adams [1]. Blelloch et al. [9] generalized the approach from weight-balanced trees to AVL trees, red-black trees and treaps. In particular they proved the O(m · lg (n/m + 1)) bound for the work (and an O(lg m · lg n) bound for the span).
11
Arrays via Braun Trees
Braun trees are a subclass of almost complete trees. In this chapter we explore
their use as arrays and in Chapter 16 as priority queues.
11.1 Array
So far we have discussed sets (or maps) over some arbitrary linearly ordered
type. Now we specialize that linearly ordered type to nat to model arrays. In
principle we could model arrays as maps from a subset of natural numbers
to the array elements. Because arrays are contiguous, it is more appropriate
to model them as lists. The type 0a list comes with two array-like operations
(see Appendix A):
Indexing: xs ! n is the nth element of the list xs.
Updating: xs [n := x ] is xs with the nth element replaced by x.
By convention, indexing starts with n = 0. If n ≥ length xs then xs ! n and xs[n := x] are underdefined: they are defined terms but we do not know what their value is.
Note that operationally, indexing and updating take time linear in the
index, which may appear inappropriate for arrays. However, the type of lists
is only an abstract model that specifies the desired functional behaviour of
arrays but not their running time complexity.
The ADT of arrays is shown in Figure 11.1. Type 0ar is the type of ar-
rays, type 0a the type of elements in the arrays. The abstraction function list
abstracts arrays to lists. It would make perfect sense to include list in the
interface as well. In fact, our implementation below comes with a (reasonably
efficiently) executable definition of list.
The behaviour of lookup, update, size and array is specified in terms
of their counterparts on lists and requires that the invariant is preserved.
ADT Array =
interface
  lookup :: 'ar ⇒ nat ⇒ 'a
  update :: nat ⇒ 'a ⇒ 'ar ⇒ 'ar
  len :: 'ar ⇒ nat
  array :: 'a list ⇒ 'ar
abstraction list :: 'ar ⇒ 'a list
invariant invar :: 'ar ⇒ bool
specification
  invar ar ∧ n < len ar ⟶ lookup ar n = list ar ! n (lookup)
  invar ar ∧ n < len ar ⟶ invar (update n x ar) (update-inv)
  invar ar ∧ n < len ar ⟶ list (update n x ar) = (list ar)[n := x] (update)
  invar ar ⟶ len ar = |list ar| (len)
  invar (array xs) (array-inv)
  list (array xs) = xs (array)
What distinguishes the specifications of lookup and update from the standard
schema (see Chapter 6) is that they carry a size precondition because the
result of lookup and update is only specified if the index is less than the size
of the array.
One can implement arrays by any one of the many search trees presented in
this book. Instead we take advantage of the fact that the keys are natural
numbers and implement arrays by so-called Braun trees which are almost
complete and thus have minimal height.
The basic idea is to index a node in a binary tree by the non-zero bit string
that leads from the root to that node in the following fashion. Starting from
the least significant bit and while we have not reached the leading 1 (which
is ignored), we examine the bits one by one. If the current bit is 0, descend
into the left child, otherwise into the right child. Instead of bit strings we
use the natural numbers ≥ 1 that they represent. The Braun tree with nodes
indexed by 1–15 is shown in Figure 11.2. The numbers are the indexes and not
the elements stored in the nodes. For example, the index 14 is 0111 in binary
(least significant bit first). If you follow the path left-right-right corresponding
to 011 in Figure 11.2 you reach node 14.
Figure 11.2. The Braun tree with nodes indexed by 1 to 15 (levels: 1; 2 3; 4 6 5 7; 8 12 10 14 9 13 11 15).
It turns out that the following invariant guarantees that a tree t contains exactly the nodes indexed by 1, ..., |t|:
braun :: 'a tree ⇒ bool
braun ⟨⟩ = True
braun ⟨l, _, r⟩ = ((|l| = |r| ∨ |l| = |r| + 1) ∧ braun l ∧ braun r)
mh ⟨l, x, r⟩ = ⌊lg |r|₁⌋ + 1
… |r|₁ + 1 ≤ 2^i. This implies i ≥ ⌊lg |r|₁⌋ + 1 and i = ⌈lg (|r|₁ + 1)⌉, using
⌊lg |r|₁⌋ + 1 = ⌈lg (|r|₁ + 1)⌉.
Now we can show that all Braun trees are almost complete:
Lemma 11.2. braun t ! acomplete t
Thus we know that Braun trees have optimal height (Lemma 4.6) and can
even quantify it (Lemma 4.7).
Proof. The proof is by induction. We focus on the induction step where t = ⟨l, x, r⟩. By assumption we have acomplete l and acomplete r. Because of braun t we can distinguish two cases. First assume |l| = |r| + 1. The claim acomplete t follows immediately from the previous lemma. Now assume |l| = |r|. By definition, there are four cases to consider when proving acomplete t. By symmetry it suffices to consider only two of them. If h l ≤ h r and mh r < mh l then acomplete t reduces to acomplete r, which is true by assumption. Now assume h l ≤ h r and mh l ≤ mh r. Because |l| = |r|, the fact that the height of an almost complete tree is determined uniquely by its size (Lemma 4.7) implies h l = h r and thus acomplete t reduces to acomplete l, which is again true by assumption. □
Note that the proof does not rely on the fact that it is the left child that is
potentially one bigger than the right one; it merely requires that the difference
in size between two siblings is at most 1.
In this section we implement arrays by means of Braun trees and verify cor-
rectness and complexity. We start by defining array-like functions on Braun
trees. After the above explanation of Braun trees the following lookup func-
tion will not come as a surprise:
lookup1 :: 'a tree ⇒ nat ⇒ 'a
lookup1 ⟨l, x, r⟩ n
= (if n = 1 then x else lookup1 (if even n then l else r) (n div 2))
The least significant bit is the parity of the index and we advance to the next
bit by div 2. The function is called lookup1 rather than lookup to emphasize
that it expects the index to be at least 1. This simplifies the implementation
via Braun trees but is in contrast to the Array interface where by convention
indexing starts with 0.
Function update1 descends in the very same manner but also performs an
update when reaching 1:
update1 :: nat ⇒ 'a ⇒ 'a tree ⇒ 'a tree
update1 n x ⟨⟩ = ⟨⟨⟩, x, ⟨⟩⟩
update1 n x ⟨l, a, r⟩
= (if n = 1 then ⟨l, x, r⟩
   else if even n then ⟨update1 (n div 2) x l, a, r⟩
   else ⟨l, a, update1 (n div 2) x r⟩)
The second equation updates existing entries in case n = 1. The first equation, however, creates a new entry and thus supports extending the tree. That is, update1 (|t| + 1) x t extends the tree with a new node x at index |t| + 1. Function adds iterates this process (again expecting |t| + 1 as the index) and thus adds a whole list of elements:
adds :: 'a list ⇒ nat ⇒ 'a tree ⇒ 'a tree
adds [] _ t = t
adds (x # xs) n t = adds xs (n + 1) (update1 (n + 1) x t)

lookup (t, _) n = lookup1 t (n + 1)
update n x (t, m) = (update1 (n + 1) x t, m)
len (t, m) = m
array xs = (adds xs 0 ⟨⟩, |xs|)
invar (t, l) = (braun t ∧ l = |t|)
Instead we define list recursively and derive the above equation later on:
list :: 'a tree ⇒ 'a list
list ⟨⟩ = []
list ⟨l, x, r⟩ = x # splice (list l) (list r)
This definition is best explained by looking at Figure 11.2. The subtrees with roots 2 and 3 are mapped to the lists [2, 4, 6, 8, 10, 12, 14] and [3, 5, 7, 9, 11, 13, 15]. The obvious way to combine these two lists into [2, 3, ..., 15] is to take their elements alternately, which is what splice does:
splice :: 'a list ⇒ 'a list ⇒ 'a list
splice [] ys = ys
splice (x # xs) ys = x # splice ys xs
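For example (an instance computed from the equations above), splice [2, 4, 6] [3, 5, 7] takes 2 and then swaps the arguments, so the result is [2, 3, 4, 5, 6, 7].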
Note that because of this reasonably efficient (O(n lg n), see Section 11.3.2)
implementation of list we can also regard list as part of the interface of arrays.
Before we embark on the actual proofs we state a helpful arithmetic truth
that is frequently used implicitly below:
braun ⟨l, x, r⟩ ∧ n ∈ {1..|⟨l, x, r⟩|} ∧ 1 < n ⟶
(odd n ⟶ n div 2 ∈ {1..|r|}) ∧ (even n ⟶ n div 2 ∈ {1..|l|})
where {m..n} = {k | m ≤ k ∧ k ≤ n}.
We will now verify that the implementation in Figure 11.3 of the Array
interface in Figure 11.1 satisfies the given specification.
We start with property (len), the correctness of function len. Because of the invariant, (len) follows directly from
|list t| = |t|
The following property relating lookup1 and update1 is again proved by induction:
braun t ∧ n ∈ {1..|t|} ⟶
lookup1 (update1 n x t) m = (if n = m then x else lookup1 t m)
The last three properties together with (11.2) and list extensionality prove the following proposition, which implies (update):
braun t ∧ n ∈ {1..|t|} ⟶ list (update1 n x t) = (list t)[n − 1 := x]
Finally we turn to the constructor array. It is implemented in terms of adds and update1. Their correctness is captured by the following properties whose inductive proofs build on each other:
braun t ⟶ |update1 (|t| + 1) x t| = |t| + 1 (11.3)
braun t ⟶ braun (update1 (|t| + 1) x t) (11.4)
braun t ⟶ list (update1 (|t| + 1) x t) = list t @ [x] (11.5)
braun t ⟶ |adds xs (|t|) t| = |t| + |xs| ∧ braun (adds xs (|t|) t)
braun t ⟶ list (adds xs (|t|) t) = list t @ xs
The last two properties imply the remaining proof obligations (array-inv) and (array). The proof of (11.5) requires the following two properties of splice which are proved by simultaneous induction:
|ys| ≤ |xs| ⟶ splice (xs @ [x]) ys = splice xs ys @ [x]
|xs| ≤ |ys| + 1 ⟶ splice xs (ys @ [y]) = splice xs ys @ [y]
Flexible arrays can be grown and shrunk at either end. Figure 11.4 shows
the specification of all four operations. (For tl and butlast see Appendix A.)
Array_Flex extends the basis specification Array in Figure 11.1.
interface
  add_lo :: 'a ⇒ 'ar ⇒ 'ar
  del_lo :: 'ar ⇒ 'ar
  add_hi :: 'a ⇒ 'ar ⇒ 'ar
  del_hi :: 'ar ⇒ 'ar
specification
  invar ar ⟶ invar (add_lo a ar) (add_lo-inv)
  invar ar ⟶ list (add_lo a ar) = a # list ar (add_lo)
  invar ar ⟶ invar (del_lo ar) (del_lo-inv)
  invar ar ⟶ list (del_lo ar) = tl (list ar) (del_lo)
  invar ar ⟶ invar (add_hi a ar) (add_hi-inv)
  invar ar ⟶ list (add_hi a ar) = list ar @ [a] (add_hi)
  invar ar ⟶ invar (del_hi ar) (del_hi-inv)
  invar ar ⟶ list (del_hi ar) = butlast (list ar) (del_hi)
This was easy but extending an array at the low end seems hard because one has to shift the existing entries. However, Braun trees support a logarithmic implementation:
(Diagram: add_lo turns the Braun tree with root 1, children 2, 3 and grandchildren 4, 6, 5 into the tree with root 0, children 1, 2 and grandchildren 3, 5, 4, 6.)
Function del_lo simply reverses add_lo by removing the root and merging
the children:
merge :: 'a tree ⇒ 'a tree ⇒ 'a tree
merge ⟨⟩ r = r
merge ⟨l, a, r⟩ rr = ⟨rr, a, merge l r⟩
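The equations of add_lo on trees are not shown above; a Haskell sketch consistent with the diagram and with merge (which reverses it) would be the following. The definitions are assumptions, not the book's code.

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- New element becomes the root; old root moves down and the children swap roles.
addLo :: a -> Tree a -> Tree a
addLo x Leaf         = Node Leaf x Leaf
addLo x (Node l a r) = Node (addLo a r) x l

merge :: Tree a -> Tree a -> Tree a
merge Leaf r          = r
merge (Node l a r) rr = Node rr a (merge l r)

-- Remove the root and merge the children, as described above.
delLo :: Tree a -> Tree a
delLo Leaf         = Leaf
delLo (Node l _ r) = merge l r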
Figure 11.5 shows the obvious implementation of the functions in the Ar-
ray_Flex specification from Figure 11.4 (on the left-hand side) with the help
of the corresponding Braun tree operations (on the right-hand side). It is an
extension of the basic array implementation from Figure 11.3. All Array_Flex
functions have logarithmic time complexity because the corresponding Braun
tree functions do because they descend along one branch of the tree.
add_lo x (t, l) = (add_lo x t, l + 1)
del_lo (t, l) = (del_lo t, l − 1)
add_hi x (t, l) = (update1 (l + 1) x t, l + 1)
del_hi (t, l) = (del_hi l t, l − 1)
We now have to prove the properties in Figure 11.4. We have already dealt
with update1 and thus add_hi above. Properties (add_hi -inv ) and (add_hi )
follow from (11.3), (11.4) and (11.5) stated earlier.
Correctness of del_hi on Braun trees is captured by the following two
properties proved by induction:
braun t ⟶ braun (del_hi |t| t)
butlast (splice xs ys)
= (if |ys| < |xs| then splice (butlast xs) ys else splice xs (butlast ys))
In this section we meet efficient versions of some old and new functions on
Braun trees. The implementation of the corresponding array operations is
trivial and is not discussed.
The size of a Braun tree can be computed without having to traverse the
entire tree:
diff :: 'a tree ⇒ nat ⇒ nat
diff ⟨⟩ _ = 0
diff ⟨l, _, r⟩ n
= (if n = 0 then 1
Function size_fast descends down the right spine, computes the size of a Node as if both children were of the same size (1 + 2 · n), but adds diff l n to compensate for bigger left children. Correctness of size_fast,
Lemma 11.3. braun t ⟶ size_fast t = |t|
follows from this property of diff:
braun t ∧ |t| ∈ {n, n + 1} ⟶ diff t n = |t| − n
The running time of size_fast is quadratic in the height of the tree (see
Exercise 11.3).
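The remaining equations of diff and the definition of size_fast are not visible above, so the following Haskell sketch is only one way to realize the description (the exact index splitting in diff is an assumption).

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Intended to compute |t| - n, assuming braun t and |t| ∈ {n, n+1}.
diff :: Tree a -> Int -> Int
diff Leaf _ = 0
diff (Node l _ r) n
  | n == 0    = 1
  | even n    = diff r (n `div` 2 - 1)
  | otherwise = diff l (n `div` 2)

-- Size of a node as if both children had size n, corrected by diff l n.
sizeFast :: Tree a -> Int
sizeFast Leaf = 0
sizeFast (Node l _ r) = let n = sizeFast r in 1 + 2 * n + diff l n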
Above we only considered the construction of a Braun tree from a list. Alternatively one may want to create a tree (array) where all elements are initialized to the same value. Of course one can call update1 n times, but one can also build the tree directly:
braun_of_naive x n
= (if n = 0 then ⟨⟩
   else let m = (n − 1) div 2
        in if odd n
           then ⟨braun_of_naive x m, x, braun_of_naive x m⟩
           else ⟨braun_of_naive x (m + 1), x, braun_of_naive x m⟩)
This solution also has time complexity O(n · lg n) but it can clearly be improved by sharing identical recursive calls. Function braun2_of shares as much as possible by producing trees of size n and n + 1 in parallel:
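The definition of braun2_of is not shown above; the following Haskell sketch realizes the stated idea (a pair of Braun trees of sizes n and n + 1 whose subtrees are shared). The equations are assumptions, not the book's code.

data Tree a = Leaf | Node (Tree a) a (Tree a)

braun2Of :: a -> Int -> (Tree a, Tree a)
braun2Of x n
  | n == 0    = (Leaf, Node Leaf x Leaf)
  | otherwise =
      let (s, t) = braun2Of x ((n - 1) `div` 2)   -- sizes m and m+1
      in if odd n then (Node s x s, Node t x s)   -- sizes 2m+1 = n and n+1
                  else (Node t x s, Node t x t)   -- sizes 2m+2 = n and n+1

-- One recursive call per level, so only logarithmically many new nodes are built.
braunOf :: a -> Int -> Tree a
braunOf x n = fst (braun2Of x n)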
We improve on function adds from Section 11.3 that has running time Θ(n · lg n) by developing a linear-time function. Given a list of elements [1, 2, ...], we can subdivide it into sublists [1], [2, 3], [4, ..., 7], ... such that the kth sublist contains the elements of level k of the corresponding Braun tree. This is simply because on each level we have the entries whose index has k+1 bits. Thus we need to process the input list in chunks of size 2^k to produce the trees on level k. But we also need to get the order right. To understand how that works, consider the last two levels of the tree in Figure 11.2:
4 6 5 7
8 12 10 14 9 13 11 15
8 12 9 13 10 14 11 15
the following pattern emerges: the left subtrees are labeled [8, ..., 11], the right subtrees [12, ..., 15]. Call ti the tree with root label i. The correct order of subtrees, i.e. t4, t6, t5, t7, is restored when the three lists [t4, t5], [2, 3] (the labels above) and [t6, t7] are combined into new trees by going through them simultaneously from left to right, yielding [⟨t4, 2, t6⟩, ⟨t5, 3, t7⟩], the level above.
Abstracting from this example we arrive at the following code. Loosely speaking, brauns k xs produces the Braun trees on level k.
Function brauns chops off a chunk ys of size 2^k from the input list and
recursively converts the remainder of the list into a list ts of (at most) 2^(k+1)
trees. This list is (conceptually) split into take 2^k ts and drop 2^k ts which
are combined with ys by function nodes that traverses its three argument
lists simultaneously. As a local optimization, we pass all of ts rather than just
take 2^k ts to nodes.
nodes :: 'a tree list ⇒ 'a list ⇒ 'a tree list ⇒ 'a tree list
nodes (l # ls) (x # xs) (r # rs) = ⟨l, x, r⟩ # nodes ls xs rs
nodes (l # ls) (x # xs) []       = ⟨l, x, ⟨⟩⟩ # nodes ls xs []
nodes []       (x # xs) (r # rs) = ⟨⟨⟩, x, r⟩ # nodes [] xs rs
nodes []       (x # xs) []       = ⟨⟨⟩, x, ⟨⟩⟩ # nodes [] xs []
nodes _ [] _ = []
Because the input list may not have exactly 2^n − 1 elements, some of the
chunks of elements and trees may be shorter than 2^k. To compensate for
that, function nodes implicitly pads lists of trees at the end with leaves. This
padding is the purpose of equations two to four.
The top-level function for turning a list into a tree simply extracts the
first (and only) element from the list computed by brauns 0:
brauns1 :: 'a list ⇒ 'a tree
brauns1 xs = (if xs = [] then ⟨⟩ else brauns 0 xs ! 0)
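As an executable illustration of the description above, here is a Haskell sketch of the construction; the names and the padding via headOr are ours, and the code is not the verified Isabelle text.

data Tree a = Leaf | Node (Tree a) a (Tree a)

headOr :: [Tree a] -> Tree a
headOr (t : _) = t
headOr []      = Leaf

-- Combine left subtrees, root labels and right subtrees;
-- missing subtrees are padded with Leaf.
nodes :: [Tree a] -> [a] -> [Tree a] -> [Tree a]
nodes _  []       _  = []
nodes ls (x : xs) rs =
  Node (headOr ls) x (headOr rs) : nodes (drop 1 ls) xs (drop 1 rs)

-- brauns k xs: the Braun trees on level k of the tree for xs.
brauns :: Int -> [a] -> [Tree a]
brauns _ [] = []
brauns k xs =
  let (ys, zs) = splitAt (2 ^ k) xs
      ts       = brauns (k + 1) zs
  in nodes (take (2 ^ k) ts) ys (drop (2 ^ k) ts)

brauns1 :: [a] -> Tree a
brauns1 [] = Leaf
brauns1 xs = head (brauns 0 xs)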
Functional Correctness
The key correctness lemma below expresses a property of Braun trees: the
subtrees on level k consist of all elements of the input list xs that are 2^k
elements apart, starting from some offset. To state this concisely we define
take_nths :: nat ⇒ nat ⇒ 'a list ⇒ 'a list
take_nths i k [] = []
take_nths i k (x # xs)
  = (if i = 0 then x # take_nths (2^k − 1) k xs
     else take_nths (i − 1) k xs)
take_nths 0 1 (splice xs ys) = xs ∧
take_nths 1 1 (splice xs ys) = ys                                 (11.13)
|take_nths 0 1 xs| = |take_nths 1 1 xs| ∨
|take_nths 0 1 xs| = |take_nths 1 1 xs| + 1                       (11.14)
We also introduce a predicate relating a tree to a list:
braun_list :: 'a tree ⇒ 'a list ⇒ bool
braun_list ⟨⟩ xs = (xs = [])
braun_list ⟨l, x, r⟩ xs
  = (xs ≠ [] ∧ x = hd xs ∧
     braun_list l (take_nths 1 1 xs) ∧
     braun_list r (take_nths 2 1 xs))
This definition may look a bit mysterious at first but it satisfies a simple
specification: braun_list t xs = (braun t ∧ xs = list t) (see below). The
idea of the above definition is that instead of relating ⟨l, x, r⟩ to xs via
splice we invert the process and relate l and r to the even and odd numbered
elements of drop 1 xs.
Proof. The proof is by induction on t. The base case is trivial. In the induction
step the key properties are (11.14) to prove braun t and (11.9) and (11.13)
to prove xs = list t. ut
The correctness proof of brauns rests on a few simple inductive properties:
|nodes ls xs rs| = |xs|                                           (11.15)
i < |xs| ⟶
  nodes ls xs rs ! i
    = ⟨if i < |ls| then ls ! i else ⟨⟩, xs ! i,
       if i < |rs| then rs ! i else ⟨⟩⟩                           (11.16)
|brauns k xs| = min |xs| (2^k)                                    (11.17)
The main theorem expresses the following correctness property of the el-
ements of brauns k xs: every tree brauns k xs ! i is a Braun tree and its list
of elements is take_nths i k xs:
Proof. The proof is by induction on the length of xs. Assume i < min |xs| (2^k),
which implies xs ≠ []. Let zs = drop (2^k) xs. Thus |zs| < |xs| and therefore
the IH applies to zs and yields the property
∀ i j. j = i + 2^k ∧ i < min |zs| (2^(k+1)) ⟶
  braun_list (ts ! i) (take_nths j (k + 1) xs)
8 12 9 13 10 14 11 15
The following strategy strongly suggests itself: first the roots, then the left
children, then the right children. The recursive application of this strategy
also takes care of the required reordering of the subtrees. Of course we have
to ignore any leaves we encounter. This is the resulting function:
list_fast_rec ts
  = (let us = filter (λt. t ≠ ⟨⟩) ts
     in if us = [] then []
        else map value us @ list_fast_rec (map left us @ map right us))
value ⟨l, x, r⟩ = x
left ⟨l, x, r⟩ = l
right ⟨l, x, r⟩ = r
6= [] this proves ' (map left us @ map right us ) < ' ts. We do not show
the technical details.
Finally, the top level function to extract a list from a single tree:
From list_fast one can easily derive an efficient fold function on Braun
trees that processes the elements in the tree in the order of their indexes.
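A Haskell rendering of this strategy (our own sketch, not the verified definition) looks as follows; left and right return Leaf on a leaf so that the recursion can treat missing children uniformly.

data Tree a = Leaf | Node (Tree a) a (Tree a)

value :: Tree a -> a
value (Node _ x _) = x
value Leaf         = error "value: Leaf"

left, right :: Tree a -> Tree a
left  (Node l _ _) = l
left  Leaf         = Leaf
right (Node _ _ r) = r
right Leaf         = Leaf

-- Roots of the non-empty trees first, then recurse on all their children.
listFastRec :: [Tree a] -> [a]
listFastRec ts =
  case filter nonEmpty ts of
    [] -> []
    us -> map value us ++ listFastRec (map left us ++ map right us)
  where
    nonEmpty Leaf = False
    nonEmpty _    = True

listFast :: Tree a -> [a]
listFast t = listFastRec [t]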
Functional Correctness
where n = jts j jxs j. This can be proved pointwise. Take some i < k
2 . If i
< jxs j then take nths i k xs = take 1 (drop i xs ) (which can be proved by
_
induction on xs). By definition of braun_list it follows that t ! i = hl ; xs !
i ; r i for some l and r such that braun_list l [] and braun_list l [] and thus
l = r = hi, i.e. t ! i = hhi; xs ! i ; hii. If : i < jxs j then take_nths i k xs =
[] by (11.11) and thus braun_list (ts ! i ) [] by the second premise and thus
Tlist_fast_rec ::
0a tree list ) nat
Tlist_fast_rec ts
= (let us = filter (t : t 6= hi) ts
in jts j +
(if us = [] then 0
8Pt 2set ts : t 6 hi P!
( = )
(
t ts k jt j ) = (
t map left ts map right ts k jt j
@
k jts j (11.18) ) +
P
The suggestive notation x xs : f x abbreviates sum_list map f xs . ( )
Proof. The proof is by induction on the size of ts, again using the measure
function t : 2 jt j + 1 which decreases with recursive calls as we proved
above. If ts = [] the claim is trivial. Now assume ts 6= [] and let us = filter
(t : t 6= hi) ts and children = map left us @ map right us.
= (
Pt us 7 jt j) + jts j P
by (11.18)
(
t ts 7 jt j) + jts j = ( t ts 7 jt j + 1) ut
11.6 Exercises
Exercise 11.1. Instead of first showing that Braun trees are almost complete,
give a direct proof of braun t ⟶ h t = ⌈lg |t|₁⌉ by first showing
braun t ⟶ 2 ^ h t ≤ 2 · |t| + 1 by induction.
Exercise 11.2. Show that function bal in Section 4.3.1 produces Braun trees:
n jxs j ^ bal n xs = (t ; zs ) ! braun t. (Isabelle hint: bal needs to be
qualified as Balance :bal.)
Exercise 11.3. One can view Braun trees as tries (see Chapter 12) by index-
ing them not with a nat but a bool list where each bit tells us whether to
go left or right (as explained at the start of Section 11.2). Function nat_of
specifies the intended correspondence:
nat_of :: bool list ) nat
nat_of [] = 1
nat_of (b # bs ) = 2 nat_of bs + ( if b then 1 else 0)
Exercise 11.4. Function del_lo is defined with the help of function merge.
Define a recursive function del_lo2 :: 0a tree ) 0a tree without recourse to
any auxiliary function and prove del_lo2 t = del_lo t.
Exercise 11.5. Let lh, the “left height”, compute the length of the left spine
of a tree. Prove that the left height of a Braun tree is equal to its height:
braun t ! lh t = h t
Exercise 11.6. Show that the running time of size_fast is quadratic in the
height of the tree: Define the running time functions Tdiff and Tsize_fast (taking
0 time in the base cases) and prove Tsize_fast t (h t )2 .
Bibliographic Remarks
Braun trees were investigated by Braun and Rem [67] and later, in a functional
setting, by Hoogerwoord [31] who coined the term “Braun tree”. Section 11.5
is partly based on work by Okasaki [61]. The whole chapter is based on work
by Nipkow and Sewell [58].
12
Tries
A trie is a search tree where keys are strings, i.e. lists. A trie can be viewed as
a tree-shaped finite automaton where the root is the start state. For example,
the set of strings fa; an; can; car; catg is encoded as the trie in Figure 12.1.
The solid states are accepting, i.e. those nodes terminate the string leading
to them.
There is no invariant, i.e. the invariant is simply True: there are no or-
dering, balance or other requirements.
Below we use a variant of the function update notation f(a := b):
f(a ↦ b) = f(a := Some b)
empty :: 'a trie
empty = Nd False (λ_. None)
isin :: 'a trie ⇒ 'a list ⇒ bool
isin (Nd b _) [] = b
isin (Nd _ m) (k # xs)
  = (case m k of None ⇒ False | Some t ⇒ isin t xs)
insert :: 'a list ⇒ 'a trie ⇒ 'a trie
insert [] (Nd _ m) = Nd True m
insert (x # xs) (Nd b m)
  = (let s = case m x of None ⇒ empty | Some t ⇒ t
     in Nd b (m(x ↦ insert xs s)))
delete :: 'a list ⇒ 'a trie ⇒ 'a trie
delete [] (Nd _ m) = Nd False m
delete (x # xs) (Nd b m)
  = Nd b (case m x of None ⇒ m | Some t ⇒ m(x ↦ delete xs t))
The definitions are straightforward. But note that delete does not try to
shrink the trie. For example:
delete [a] turns the trie for the set {a} into a trie in which the node reached
via a is still present but no longer accepting. Formally:
delete [a] (Nd False [a ↦ Nd True (λ_. None)])
  = Nd False [a ↦ Nd False (λ_. None)]
where [x ↦ t] = (λ_. None)(x ↦ t). The resulting trie is correct (it represents
the empty set of strings) but could have been shrunk to Nd False (λ_. None).
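To make the data structure concrete, here is a Haskell sketch in which the function space 'a ⇒ 'a trie option is replaced by Data.Map; this representation choice is ours and is not the one verified above.

import qualified Data.Map as M

-- A node stores whether the string ending here is in the set.
data Trie a = Nd Bool (M.Map a (Trie a))

empty :: Trie a
empty = Nd False M.empty

isin :: Ord a => Trie a -> [a] -> Bool
isin (Nd b _) []       = b
isin (Nd _ m) (k : ks) = maybe False (\t -> isin t ks) (M.lookup k m)

insert :: Ord a => [a] -> Trie a -> Trie a
insert []       (Nd _ m) = Nd True m
insert (k : ks) (Nd b m) =
  let s = M.findWithDefault empty k m
  in Nd b (M.insert k (insert ks s) m)

-- Like the verified version, delete does not shrink the trie.
delete :: Ord a => [a] -> Trie a -> Trie a
delete []       (Nd _ m) = Nd False m
delete (k : ks) (Nd b m) =
  case M.lookup k m of
    Nothing -> Nd b m
    Just t  -> Nd b (M.insert k (delete ks t) m)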
Functional Correctness
For the correctness proof we take a lazy approach and define the abstraction
function in a trivial manner via isin:
set ::
0a trie ) 0a list set
set t = fxs j isin t xs g
Correctness of empty and isin (set empty = fg and isin t xs = (xs 2 set t ))
are trivial, correctness of insertion and deletion are easily proved by induction:
set (insert xs t ) = set t [ fxs g
set (delete xs t ) = set t fxs g
This simple model of tries leads to simple correctness proofs but is com-
putationally inefficient because of the function space in 0a ) 0a trie option.
In principle, any representation of this function space, for example by some
search tree, works. However, such a generic theory is relatively complex. Hence
we restrict ourselves to binary tries when exploring more efficient implemen-
tations of tries.
In figures of binary tries, Lf s are not shown at all, and the edge labels indicate
that False refers to the left and True to the right child. This convention is
encoded in the following auxiliary functions selecting from and modifying
pairs:
sel2 :: bool ⇒ 'a × 'a ⇒ 'a
sel2 b (a₁, a₂) = (if b then a₂ else a₁)
mod2 :: ('a ⇒ 'a) ⇒ bool ⇒ 'a × 'a ⇒ 'a × 'a
mod2 f b (a₁, a₂) = (if b then (a₁, f a₂) else (f a₁, a₂))
empty :: trie
empty = Lf
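The remaining operations on binary tries follow the same pattern as before, with sel2 and mod2 doing the child selection. The following Haskell sketch shows one possible implementation; it is our own reconstruction and need not coincide with the Isabelle definitions.

-- A binary trie: Lf is the empty set; Nd b (l, r) stores in b whether the
-- empty string is present, and in l/r the strings starting with False/True.
data Trie = Lf | Nd Bool (Trie, Trie)

sel2 :: Bool -> (a, a) -> a
sel2 b (a1, a2) = if b then a2 else a1

mod2 :: (a -> a) -> Bool -> (a, a) -> (a, a)
mod2 f b (a1, a2) = if b then (a1, f a2) else (f a1, a2)

isin :: Trie -> [Bool] -> Bool
isin Lf         _        = False
isin (Nd b _)   []       = b
isin (Nd _ lr)  (k : ks) = isin (sel2 k lr) ks

insert :: [Bool] -> Trie -> Trie
insert []       Lf        = Nd True (Lf, Lf)
insert []       (Nd _ lr) = Nd True lr
insert (k : ks) Lf        = Nd False (mod2 (insert ks) k (Lf, Lf))
insert (k : ks) (Nd b lr) = Nd b (mod2 (insert ks) k lr)

delete :: [Bool] -> Trie -> Trie
delete _        Lf        = Lf
delete []       (Nd _ lr) = Nd False lr
delete (k : ks) (Nd b lr) = Nd b (mod2 (delete ks) k lr)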
Functional Correctness
For the correctness proof we take the same lazy approach as above:
We are also lazy in that we set the invariant to True. A more precise
invariant would express that the tries are minimal, i.e. cannot be shrunk. See
Exercise 12.2. It turns out that the correctness properties do not require this
more precise invariant.
The two non-trivial correctness properties
set_trie (insert xs t) = set_trie t ∪ {xs}                        (12.1)
set_trie (delete xs t) = set_trie t − {xs}                        (12.2)
are simple consequences of the following inductive properties:
isin (insert xs t) ys = (xs = ys ∨ isin t ys)
isin (delete xs t) ys = (xs ≠ ys ∧ isin t ys)
Tries can contain long branches without branching. These can be contracted
by storing the branch directly in the start node. The result is called a Patricia
trie. The following figure shows the contraction of a trie into a Patricia trie:
car
c s t
s t
The implementation of the Set interface via binary Patricia tries is shown
in Figure 12.3.
Functional Correctness
emptyP = LfP
in if ps = take n ks
then case drop n ks of [] )bjk # x ) isinP ( sel2 k lr ) x
else False )
j (qs ; []; p # ps 0) )
let t = NdP ps 0 b lr
in NdP qs True (if p then (LfP ; t ) else (t ; LfP ))
j (_; k # ks 0; []) ) NdP ps b (mod2 (insertP ks 0) k lr )
j (qs ; k # ks 0; _ # ps 0) )
let tp = NdP ps 0 b lr ; tk = NdP ks 0 True (LfP ; LfP )
in NdP qs False (if k then (tp ; tk ) else (tk ; tp )))
(qs ; ks ; p # ps ) ) NdP ps b lr
0 0
Again we take a lazy approach and set the invariant on trieP to True.
Correctness of emptyP is trivial. Correctness of the remaining operations
is proved by induction and requires a number of supporting inductive lemmas
which we display before the corresponding correctness property.
Correctness of isinP :
isin (prefix_trie ps t ) ks = (ps = take jps j ks ^ isin t drop jps j ks ))
(
12.4 Exercises
Exercise 12.1. Rework the above theory of binary tries as follows. Eliminate
the bool argument from constructor Nd by replacing Nd by two constructors
representing Nd True and Nd False.
Exercise 12.2. Define the invariant invar that characterizes fully shrunk
binary tries, i.e. tries where every non-Lf trie represents a non-empty set.
Note that a trie represents the empty set if it does not contain any node Nd
True _. Prove that insert and delete maintain the invariant.
Bibliographic Remarks
Tries were first sketched by De La Briandais [15] and described in more detail
by Fredkin [20] who coined their name based on the word reTRIEval. However,
“trie” is usually pronounced like “try” rather than “tree” to avoid confusion.
Patricia tries are due to Morrison [50].
13
Huffman’s Algorithm
C₁ = {a ↦ 0, b ↦ 10, c ↦ 110, d ↦ 111}
gives the 14-bit code word 01001100100111. The code C1 is optimum: No
code that unambiguously encodes source symbols one at a time could do
better than C1 on the input ‘abacabad’. With a fixed-length code such as
C₂ = {a ↦ 00, b ↦ 01, c ↦ 10, d ↦ 11}
the same input would require 16 bits.
The code trees corresponding to C₁ and C₂ are, respectively, a skewed tree
with leaves a, b, c, d at depths 1, 2, 3, 3 and a balanced tree with all four
leaves at depth 2; each left edge is labeled 0 and each right edge 1. The code
word for a given symbol can be obtained
as follows: Start at the root and descend toward the leaf node associated with
the symbol one node at a time; emit a 0 whenever the left child of the current
node is chosen and a 1 whenever the right child is chosen. The generated
sequence of 0s and 1s is the code word.
To avoid ambiguities, we require that only leaf nodes are labeled with
symbols. This ensures that no code word is a prefix of another. Moreover, it
is sufficient to consider only full binary trees (trees whose inner nodes all have
two children), because any node with only one child can advantageously be
eliminated by removing it and letting the child take its parent’s place.
Each node in a code tree is assigned a weight. For a leaf node, the weight
is the frequency of its symbol; for an inner node, it is the sum of the weights
of its subtrees. In diagrams, we often annotate the nodes with their weights.
Two trees with root weights w₁ and w₂ are combined into a single tree whose
root weight is w₁ + w₂, and the new tree is inserted back
into the list so as to keep it ordered. Finally, repeat the process until only one
tree is left in the list.
As an illustration, executing the algorithm for the frequencies f_d = 3,
f_e = 11, f_f = 5, f_s = 7, f_z = 2 gives rise to the following sequence of states,
where each tree is written with its weight and leaves carry their symbol:
1. z(2), d(3), f(5), s(7), e(11)
2. [z, d](5), f(5), s(7), e(11)
3. s(7), [[z, d], f](10), e(11)
4. e(11), [s, [[z, d], f]](17)
5. [e, [s, [[z, d], f]]](28)
Leaf nodes are of the form Leaf w a, where a is a symbol and w is the
frequency associated with a, and inner nodes are of the form Node w t 1 t 2 ,
where t 1 and t 2 are the left and right subtrees and w caches the sum of the
weights of t 1 and t 2 . The cachedWeight function extracts the weight stored
in a node:
cachedWeight ::
0a tree ) nat
cachedWeight (Leaf w _) = w
cachedWeight (Node w _ _) = w
uniteTrees ::
0a tree ) 0a tree ) 0a tree
uniteTrees t 1 t 2 = Node (cachedWeight t 1 + cachedWeight t 2 ) t 1 t 2
The second function, insortTree, inserts a tree into a list sorted by cached
weight, preserving the sort order:
insortTree ::
0a tree ) 0a tree list ) 0a tree list
insortTree u [] = [u ]
insortTree u (t # ts )
= (if cachedWeight u cachedWeight t then u # t # ts
else t # insortTree u ts )
huffman ::
0a tree list ) 0a tree
huffman [t ] = t
huffman (t 1 # t 2 # ts ) = huffman (insortTree (uniteTrees t 1 t 2 ) ts )
The function should initially be invoked with a non-empty list of leaf nodes
sorted by weight. It repeatedly unites the first two trees of the list it receives
as argument until a single tree is left.
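The algorithm is short enough to transcribe directly; the following Haskell sketch mirrors cachedWeight, uniteTrees, insortTree and huffman, with an example call for the frequencies from the running example above.

-- Code trees with cached weights, mirroring the definitions above.
data Tree a = Leaf Int a | Node Int (Tree a) (Tree a)

cachedWeight :: Tree a -> Int
cachedWeight (Leaf w _)   = w
cachedWeight (Node w _ _) = w

uniteTrees :: Tree a -> Tree a -> Tree a
uniteTrees t1 t2 = Node (cachedWeight t1 + cachedWeight t2) t1 t2

-- Insert into a list sorted by cached weight, preserving the order.
insortTree :: Tree a -> [Tree a] -> [Tree a]
insortTree u [] = [u]
insortTree u (t : ts)
  | cachedWeight u <= cachedWeight t = u : t : ts
  | otherwise                        = t : insortTree u ts

-- Expects a non-empty list of leaves sorted by weight.
huffman :: [Tree a] -> Tree a
huffman [t]            = t
huffman (t1 : t2 : ts) = huffman (insortTree (uniteTrees t1 t2) ts)
huffman []             = error "huffman: empty forest"

example :: Tree Char
example = huffman [Leaf 2 'z', Leaf 3 'd', Leaf 5 'f', Leaf 7 's', Leaf 11 'e']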
This section introduces basic concepts such as alphabet, consistency and opti-
mality, which are needed to state the correctness and optimality of Huffman’s
algorithm. The next section introduces more specialized functions that arise
in the proof.
The alphabet of a code tree is the set of symbols appearing in the tree’s
leaf nodes:
alphabet ::
0a tree ) 0a set
alphabet (Leaf _ a ) = fa g
alphabet (Node _ t 1 t 2 ) = alphabet t 1 [ alphabet t 2
A tree is consistent if for each inner node the alphabets of the two subtrees
are disjoint. Intuitively, this means that a symbol occurs in at most one leaf
node. Consistency is a sufficient condition for a (the length of the code word
for a) to be uniquely defined. This well-formedness property appears as an
assumption in many of the lemmas. The definition follows:
consistent :: 'a tree ⇒ bool
consistent (Leaf _ _) = True
consistent (Node _ t₁ t₂)
  = (alphabet t₁ ∩ alphabet t₂ = {} ∧ consistent t₁ ∧ consistent t₂)
depth :: 'a tree ⇒ 'a ⇒ nat
depth (Leaf _ _) _ = 0
depth (Node _ t₁ t₂) a
  = (if a ∈ alphabet t₁ then depth t₁ a + 1
     else if a ∈ alphabet t₂ then depth t₂ a + 1 else 0)
By convention, symbols that do not occur in the tree or that occur at the
root of a one-node tree are given a depth of 0. If a symbol occurs in several
leaf nodes (of an inconsistent tree), the depth is arbitrarily defined in terms
of the leftmost node labeled with that symbol.
The height of a tree is the length of the longest path from the root to a
leaf node, or equivalently the length of the longest code word:
height ::
0a tree ) nat
height (Leaf _ _) = 0
height (Node _ t 1 t 2 ) = max (height t 1 ) (height t 2 ) + 1
freq ::
0a tree ) 0a ) nat
freq (Leaf w a ) b = (if b = a then w else 0)
freq (Node _ t 1 t 2 ) b = freq t 1 b + freq t 2 b
For consistent trees, the sum comprises at most one nonzero term. The fre-
quency is then the weight of the leaf node labeled with the symbol, or 0 if
there is no such node.
Two trees are comparable if they have the same alphabet and symbol
frequencies. This is an important concept, because it allows us to state not
only that the tree constructed by Huffman’s algorithm is optimal but also
that it has the expected alphabet and frequencies.
The weight function returns the weight of a tree:
weight ::
0a tree ) nat
weight (Leaf w _) = w
weight (Node _ t 1 t 2 ) = weight t 1 + weight t 2
In the Node case, we ignore the weight cached in the node and instead com-
pute the tree’s weight recursively.
The cost (or weighted path length) of a consistent tree is the sum
    ∑ (a ∈ alphabet t) freq t a · depth t a,
which we wrote as ∑_a w_a · δ_a above. It is defined recursively by
cost ::
0a tree ) nat
cost (Leaf _ _) = 0
cost (Node _ t 1 t 2 ) = weight t 1 + cost t 1 + weight t 2 + cost t 2
A tree is optimum if and only if its cost is not greater than that of any
comparable tree:
optimum ::
0a tree ) bool
optimum t
= (8 u : consistent u !
alphabet t = alphabet u !
freq t = freq u ! cost t cost u )
Tree functions are readily generalized to lists of trees, or forests. For ex-
ample, the alphabet of a forest is defined as the union of the alphabets of its
trees. The forest generalizations have a subscript ‘F ’ attached to their name
(e.g. alphabet F ).
vice versa, and to refer to the two symbols with the lowest frequencies. These
concepts are represented by the functions swapSyms, swapFourSyms,
mergeSibling and splitLeaf and the predicate minima.
The interchange function swapSyms takes a tree t and two symbols a, b,
and exchanges them:
swapSyms ::
0a tree ) 0a ) 0a ) 0a tree
swapSyms t a b = swapLeaves t (freq t a ) a (freq t b ) b
The following lemma captures the intuition that to minimize the cost,
more frequent symbols should be encoded using fewer bits than less frequent
ones:
Lemma 13.1. consistent t ∧ a ∈ alphabet t ∧ b ∈ alphabet t ∧
freq t a ≤ freq t b ∧ depth t a ≤ depth t b ⟶
cost (swapSyms t a b) ≤ cost t
The four-way symbol interchange function swapFourSyms takes four sym-
bols a, b, c, d with a 6= b and c 6= d, and exchanges them so that a and b
occupy c’s and d’s positions. A naive definition of this function would be
swapSyms (swapSyms t a c ) b d. This naive definition fails in the face of
aliasing: If a = d, but b 6= c, then swapFourSyms a b c d would leave a in
b’s position. Instead, we use this definition:
swapFourSyms :: 'a tree ⇒ 'a ⇒ 'a ⇒ 'a ⇒ 'a ⇒ 'a tree
swapFourSyms t a b c d
  = (if a = d then swapSyms t b c
     else if b = c then swapSyms t a d
     else swapSyms (swapSyms t a c) b d)
The function mergeSibling merges two sibling leaves labeled a and b into a
single leaf labeled a. The frequency of a in the resulting tree is the sum of
the original frequencies of a and b. The function is defined by straightforward
recursion equations.
The sibling function returns the label of the node that is the (left or right)
sibling of the node labeled with the given symbol a in tree t. If a is not in t’s
alphabet or it occurs in a node with no sibling leaf, we simply return a. This
gives us the nice property that if t is consistent, then sibling t a 6= a if and
only if a has a sibling. The definition, which is omitted here, distinguishes
the same cases as mergeSibling.
Using the sibling function, we can state that merging two sibling leaves
with weights w_a and w_b decreases the cost by w_a + w_b.
Dually, splitLeaf replaces a leaf labeled a by an inner node whose two
children are leaves labeled a and b. It is defined by the equations
splitLeaf (Leaf w_c c) w_a a w_b b
  = (if c = a then Node w_c (Leaf w_a a) (Leaf w_b b) else Leaf w_c c)
splitLeaf (Node w t₁ t₂) w_a a w_b b
  = Node w (splitLeaf t₁ w_a a w_b b) (splitLeaf t₂ w_a a w_b b)
Splitting a leaf with weight wa + wb into two sibling leaves with weights
wa and wb increases the cost by wa + wb :
Lemma 13.3. consistent t ^ a 2 alphabet t ^ freq t a = wa + wb !
cost (splitLeaf t w a a w b b ) = cost t + w a + w b
Finally, the minima predicate expresses that two symbols a, b have the
lowest frequencies in the tree t and that freq t a freq t b:
minima ::
0a tree ) 0a ) 0a ) bool
minima t a b
= (a 2 alphabet t ^ b 2 alphabet t ^ a 6= b ^
(8 c 2alphabet t :
The main difficulty is to prove the optimality of the tree constructed by Huff-
man’s algorithm. We need three lemmas before we can present the optimality
theorem.
First, if a and b are minima, and c and d are at the very bottom of the
tree, then exchanging a and b with c and d does not increase the tree’s cost.
Graphically, we have
cost cost
c a
d b
a b c d
optimum =) optimum
a
a b
Proof. We assume that t’s cost is less than or equal to that of any other
comparable tree v and show that splitLeaf t w a a w b b has a cost less than
or equal to that of any other comparable tree u. For the nontrivial case where
height t > 0, it is easy to prove that there must be two symbols c and d
occurring in sibling nodes at the very bottom of u. From u we construct the
tree swapFourSyms u a b c d in which the minima a and b are siblings:
We know nothing specific about the rest of u's structure. Merging a and b
then gives a tree comparable with t, which we can use to instantiate v.
The next key lemma (Lemma 13.6) states that applying huffman to a forest
in which the minima a and b already appear as sibling leaves (with weights
w_a and w_b) below a common parent, next to the remaining trees c, d, …, z
with weights w_c, w_d, …, w_z, gives the same result as applying the algorithm
on the "flat" forest c(w_c), a(w_a + w_b), d(w_d), …, z(w_z),
followed by splitting the leaf node a into two nodes a and b with frequencies
wa , wb . The lemma effectively provides a way to flatten the forest at each step
of the algorithm.
This leads us to our main result.
Proof. The proof is by induction on the length of ts. The assumptions ensure
that ts is of the form
a(w_a), b(w_b), c(w_c), d(w_d), …, z(w_z)
The first step of the algorithm unites a and b, so the result is huffman applied
to the forest consisting of the tree over a and b (of weight w_a + w_b) and the
remaining leaves c(w_c), d(w_d), …, z(w_z).
In the diagram, we put the newly created tree at position 2 in the forest; in
general, it could be anywhere. By Lemma 13.6, the above tree equals
splitLeaf (huffman (c(w_c), a(w_a + w_b), d(w_d), …, z(w_z))) w_a a w_b b
To prove that this tree is optimum, it suffices by Lemma 13.5 to show that
huffman (c(w_c), a(w_a + w_b), d(w_d), …, z(w_z))
is optimum, which follows from the induction hypothesis.
The sorted list of trees constitutes a simple priority queue (Part III). The
time complexity of Huffman’s algorithm is quadratic in the size n of this
queue. By using a binary search to implement insortTree, we can obtain an
O(n lg n) imperative implementation. An O(n) implementation is possible by
maintaining two queues, one containing the unprocessed leaf nodes and the
other containing the combined trees [43].
Huffman’s algorithm was invented by David Huffman [32]. The proof above
was inspired by Knuth’s informal argument [43]. This chapter’s text is based
on a published article [8], with the publisher’s permission. An alternative
formal proof, developed using Coq, is due to Théry [73].
Knuth [42] presented an alternative, more abstract view of Huffman’s al-
gorithm as a “Huffman algebra.” Could his approach help simplify our proof?
The most tedious steps above concerned splitting nodes, merging siblings, and
swapping symbols. These steps would still be necessary, as the algebraic ap-
proach seems restricted to abstracting over the arithmetic reasoning, which is
not very difficult in the first place. On the other hand, with Knuth’s approach,
perhaps the proof would gain in elegance.
Part III
Priority Queues
14
Priority Queues
ADT Priority_Queue =
interface
  empty   :: 'q
  insert  :: 'a ⇒ 'q ⇒ 'q
  del_min :: 'q ⇒ 'q
  get_min :: 'q ⇒ 'a
abstraction mset :: 'q ⇒ 'a multiset
invariant invar :: 'q ⇒ bool
specification
  invar empty                                                   (empty-inv)
  mset empty = ⦃⦄                                                (empty)
  invar q ⟶ invar (insert x q)                                   (insert-inv)
  invar q ⟶ mset (insert x q) = mset q + ⦃x⦄                     (insert)
  invar q ∧ mset q ≠ ⦃⦄ ⟶ invar (del_min q)                      (del_min-inv)
  invar q ∧ mset q ≠ ⦃⦄ ⟶
    mset (del_min q) = mset q − ⦃get_min q⦄                      (del_min)
  invar q ∧ mset q ≠ ⦃⦄ ⟶ get_min q = Min_mset (mset q)          (get_min)
Our priority queues are simplified. The more general version contains el-
ements that are pairs of some item and its priority. In that case a priority
queue can be viewed as a set, but because we have dropped the item we need
to view a priority queue as a multiset. In imperative implementations, priority
queues frequently also provide an operation decrease_key: given some direct
reference to an element in the priority queue, decrease the element’s priority.
This is not completely straightforward in a functional language. Lammich and
Nipkow [46] present an implementation, a Priority Search Tree.
14.1 Heaps
A popular implementation technique for priority queues are heaps, i.e. trees
where the minimal element in each subtree is at the root:
heap :: 'a tree ⇒ bool
heap ⟨⟩ = True
heap ⟨l, m, r⟩
  = ((∀x ∈ set_tree l ∪ set_tree r. m ≤ x) ∧ heap l ∧ heap r)
empty = ⟨⟩
get_min = value
Bibliographic Remarks
The idea of the heap goes back to Williams [77] who also coined the name.
15
Leftist Heaps
Leftist heaps are heaps in the sense of Section 14.1 and implement mergeable
priority queues. The key idea is to maintain the invariant that at each node
the minimal height of the right child is at most that of the left child. We
represent leftist heaps as augmented trees that store the minimal height in
every node:
mht ::
0a lheap ) nat
mht hi = 0
mht h_; (_; n ); _i = n
There are two invariants: the standard heap invariant (on augmented trees)
heap :: (
0a 0b ) tree ) bool
heap hi = True
heap hl ; (m ; _); r i
= ((8 x 2set_tree l [ set_tree r : m x ^ heap l ^ heap r
) )
and the structural leftist tree invariant that requires that the minimal height
of the right child is no bigger than that of the left child (and that the minimal
height information in the node is correct):
ltree :: 'a lheap ⇒ bool
ltree ⟨⟩ = True
ltree ⟨l, (_, n), r⟩
  = (mh r ≤ mh l ∧ n = mh r + 1 ∧ ltree l ∧ ltree r)
Thus a tree is a leftist tree if for every subtree the right spine is a shortest
path from the root to a leaf. Pictorially:
Exercise 15.1. An alternative definition of leftist tree is via the length of the
right spine of the tree:
rank ::
0a tree ) nat
rank hi = 0
rank h_; _; r i = rank r + 1
merge :: 'a lheap ⇒ 'a lheap ⇒ 'a lheap
merge ⟨⟩ t = t
merge t ⟨⟩ = t
merge (⟨l₁, (a₁, n₁), r₁⟩ =: t₁) (⟨l₂, (a₂, n₂), r₂⟩ =: t₂)
  = (if a₁ ≤ a₂ then node l₁ a₁ (merge r₁ t₂)
     else node l₂ a₂ (merge t₁ r₂))
node :: 'a lheap ⇒ 'a ⇒ 'a lheap ⇒ 'a lheap
node l a r
  = (let mhl = mht l; mhr = mht r
     in if mhr ≤ mhl then ⟨l, (a, mhr + 1), r⟩ else ⟨r, (a, mhl + 1), l⟩)
Termination of merge can be proved either by the sum of the sizes of the two
arguments (which goes down with every call) or by the lexicographic product
of the two size measures: either the first argument becomes smaller or it stays
unchanged and the second argument becomes smaller.
As shown in Section 14.1, once we have merge, the other operations are
easily definable. We repeat their definitions simply because this chapter em-
ploys augmented rather than ordinary trees:
empty ::
0a lheap
empty = hi
get_min :: 0a lheap ) 0a
get_min h_; (a ; _); _i = a
insert :: 'a ⇒ 'a lheap ⇒ 'a lheap
insert x t = merge ⟨⟨⟩, (x, 1), ⟨⟩⟩ t
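For comparison, here is an unverified Haskell sketch of leftist heaps along the same lines; the smart constructor node keeps the child with the smaller minimal height on the right, and the names are ours.

-- Augmented tree: each node stores (element, minimal height).
data LHeap a = E | N (LHeap a) (a, Int) (LHeap a)

mht :: LHeap a -> Int
mht E              = 0
mht (N _ (_, n) _) = n

node :: LHeap a -> a -> LHeap a -> LHeap a
node l a r =
  let ml = mht l; mr = mht r
  in if mr <= ml then N l (a, mr + 1) r else N r (a, ml + 1) l

merge :: Ord a => LHeap a -> LHeap a -> LHeap a
merge E t = t
merge t E = t
merge t1@(N l1 (a1, _) r1) t2@(N l2 (a2, _) r2)
  | a1 <= a2  = node l1 a1 (merge r1 t2)
  | otherwise = node l2 a2 (merge t1 r2)

insert :: Ord a => a -> LHeap a -> LHeap a
insert x t = merge (N E (x, 1) E) t

getMin :: LHeap a -> a
getMin (N _ (a, _) _) = a
getMin E              = error "getMin: empty heap"

delMin :: Ord a => LHeap a -> LHeap a
delMin (N l _ r) = merge l r
delMin E         = error "delMin: empty heap"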
15.2 Correctness
Of course the above proof (ignoring the ltree part) works for any mergeable
priority queue implemented as a heap.
The running time functions are shown in Appendix B.4. By induction on the
computation of merge we obtain
ltree l ∧ ltree r ⟶ Tmerge l r ≤ mh l + mh r + 1
ltree t ⟶ Tdel_min t ≤ 2 · lg |t|₁ + 1
The derivation of the bound for insertion is trivial but the one for deletion
uses a little lemma, assuming ltree t:
Tdel_min t = Tmerge l r + 1
           ≤ lg |l|₁ + lg |r|₁ + 2       using (15.1)
           ≤ 2 · lg |t|₁ + 1             because lg x + lg y + 1 < 2 · lg (x + y)
Bibliographic Remarks
Leftist heaps were invented by Crane [14]. Another version of leftist trees,
based on weight rather than height, was introduced by Cho and Sahni [12, 68].
16
Priority Queues via Braun Trees
insert :: 'a ⇒ 'a tree ⇒ 'a tree
insert a ⟨⟩ = ⟨⟨⟩, a, ⟨⟩⟩
insert a ⟨l, x, r⟩
  = (if a < x then ⟨insert x r, a, l⟩ else ⟨insert a r, x, l⟩)
To delete the minimal (i.e. root) element from a tree, extract the leftmost
element from the tree and let it sift down to its correct position in the tree in
the manner of heapsort:
sift_down :: 'a tree ⇒ 'a ⇒ 'a tree ⇒ 'a tree
sift_down ⟨⟩ a _ = ⟨⟨⟩, a, ⟨⟩⟩
sift_down ⟨⟨⟩, x, _⟩ a ⟨⟩
  = (if a ≤ x then ⟨⟨⟨⟩, x, ⟨⟩⟩, a, ⟨⟩⟩ else ⟨⟨⟨⟩, a, ⟨⟩⟩, x, ⟨⟩⟩)
sift_down (⟨l₁, x₁, r₁⟩ =: t₁) a (⟨l₂, x₂, r₂⟩ =: t₂)
  = (if a ≤ x₁ ∧ a ≤ x₂ then ⟨t₁, a, t₂⟩
     else if x₁ ≤ x₂ then ⟨sift_down l₁ a r₁, x₁, t₂⟩
     else ⟨t₁, x₂, sift_down l₂ a r₂⟩)
In the first two equations for sift_down, the Braun tree property guarantees
that the “_” arguments must be empty trees if the pattern matches.
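The functions del_left and del_min themselves are not shown in this excerpt; the following Haskell sketch is our reconstruction of the approach just described (extract the leftmost element, then sift it down over the swapped subtrees).

data Tree a = Leaf | Node (Tree a) a (Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert a Leaf = Node Leaf a Leaf
insert a (Node l x r)
  | a < x     = Node (insert x r) a l
  | otherwise = Node (insert a r) x l

-- Remove the leftmost element (at the end of the left spine).
delLeft :: Tree a -> (a, Tree a)
delLeft (Node Leaf x _) = (x, Leaf)
delLeft (Node l x r)    = let (y, l') = delLeft l in (y, Node r x l')
delLeft Leaf            = error "delLeft: empty tree"

-- Let an element sift down to its place, as in heapsort.
siftDown :: Ord a => Tree a -> a -> Tree a -> Tree a
siftDown Leaf a _ = Node Leaf a Leaf
siftDown (Node _ x _) a Leaf
  | a <= x    = Node (Node Leaf x Leaf) a Leaf
  | otherwise = Node (Node Leaf a Leaf) x Leaf
siftDown t1@(Node l1 x1 r1) a t2@(Node l2 x2 r2)
  | a <= x1 && a <= x2 = Node t1 a t2
  | x1 <= x2           = Node (siftDown l1 a r1) x1 t2
  | otherwise          = Node t1 x2 (siftDown l2 a r2)

delMin :: Ord a => Tree a -> Tree a
delMin Leaf            = Leaf
delMin (Node Leaf _ _) = Leaf
delMin (Node l _ r)    = let (y, l') = delLeft l in siftDown r y l'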
16.2 Correctness
We outline the correctness proofs for insert and del_min by presenting the
key lemmas. Correctness of insert is straightforward:
|insert x t| = |t| + 1
del_left t = (x, t') ∧ t ≠ ⟨⟩ ∧ braun t ⟶ braun t'
braun ⟨l, a, r⟩ ⟶ |sift_down l a r| = |l| + |r| + 1
braun ⟨l, a, r⟩ ⟶
  mset_tree (sift_down l a r) = ⦃a⦄ + (mset_tree l + mset_tree r)
braun t ⟶ |del_min t| = |t| − 1
braun t ∧ t ≠ ⟨⟩ ⟶
  mset_tree (del_min t) = mset_tree t − ⦃get_min t⦄
Bibliographic Remarks
Our implementation of priority queues via Braun trees is due to Paulson [64]
who credits it to Okasaki.
17
Binomial Heaps
binomial tree.
2 1 0
1 0 0
Fig. 17.1. A binomial tree of rank 3. The node labels depict the rank of each node.
A node of rank r has child nodes of ranks r 1; : : : ; 0.
rank hr ; x ; ts i = r root hr ; x ; ts i = x
This datatype contains all binomial trees, but also some non-binomial
trees. To carve out the binomial trees, we define an invariant, which reflects
the informal definition above:
btree :: 'a tree ⇒ bool
btree ⟨r, _, ts⟩ = ((∀t ∈ set ts. btree t) ∧ map rank ts = rev [0..<r])
Additionally, we require the heap property, i.e., that the root element of
each subtree is a minimal element in that subtree:
heap :: 'a tree ⇒ bool
heap ⟨_, x, ts⟩ = (∀t ∈ set ts. heap t ∧ x ≤ root t)
Thus, a binomial tree is a tree that satisfies both the structural and the
heap invariant. The two invariants are combined in a single predicate:
bheap ::
0a tree ) bool
bheap t = ( btree t ^ heap t )
invar :: 'a trees ⇒ bool
invar ts = ((∀t ∈ set ts. bheap t) ∧ sorted_wrt (<) (map rank ts))
Note that sorted_wrt states that a list is sorted wrt. the specified relation,
here (<). It is defined in Appendix A.
17.1 Size
The following functions return the multiset of elements in a binomial tree and
in a binomial heap:
Most operations on binomial heaps are linear in the length of the heap.
To show that the length is bounded by the number of heap elements, we first
observe that the number of elements in a binomial tree is already determined
by its rank. A binomial tree of rank r has 2^r nodes:
btree t ⟶ |mset_tree t| = 2 ^ rank t
To prove this, recall that the heap ts is strictly sorted by rank. Thus, we can
underestimate the ranks of the trees in ts by 0, 1, …, |ts| − 1. This means
that they must have at least 2⁰, 2¹, …, 2^(|ts|−1) elements, i.e., at least 2^|ts| − 1
elements together, which yields the desired bound.
Obviously, the empty list is a binomial heap with no elements, and a binomial
heap is empty only if it is the empty list:
invar []
mset_trees [] = ⦃⦄
(mset_trees ts = ⦃⦄) = ( ts = [])
17.2.2 Insertion
A crucial property of binomial trees is that we can link two binomial trees of
rank r to form a binomial tree of rank r + 1, simply by prepending one tree
as the first child of the other. To preserve the heap property, we add the tree
with the bigger root element below the tree with the smaller root element.
This linking of trees is illustrated in Figure 17.2. Formally:
link :: 'a tree ⇒ 'a tree ⇒ 'a tree
link (⟨r, x₁, ts₁⟩ =: t₁) (⟨r', x₂, ts₂⟩ =: t₂)
  = (if x₁ ≤ x₂ then ⟨r + 1, x₁, t₂ # ts₁⟩ else ⟨r + 1, x₂, t₁ # ts₂⟩)
Fig. 17.2. Linking two binomial trees of rank 2 to form a binomial tree of rank 3,
by linking the left tree as first child of the right tree, as indicated by the dashed
line. We assume that the root element of the left tree is greater than or equal to the
root element of the right tree, such that the heap property is preserved.
By case distinction, we can easily prove that link preserves the invariant
and that the resulting tree contains the elements of both arguments.
bheap t 1 ^ bheap t 2 ^ rank t 1 = rank t 2 ! bheap (link t 1 t 2 )
mset_tree (link t 1 t 2 ) = mset_tree t 1 + mset_tree t 2
The link operation forms the basis of inserting a tree into a heap: if the
heap does not contain a tree with the same rank, we can simply insert the tree
at the correct position in the heap. Otherwise, we merge the two trees and
recursively insert the result. For our purposes, we can additionally assume
that the rank of the tree to be inserted is smaller than or equal to the lowest
rank in the heap, which saves us a case in the following definition:
insert :: 'a ⇒ 'a trees ⇒ 'a trees
insert x ts = ins_tree ⟨0, x, []⟩ ts
The above definition meets the specification for insert required by the
Priority_Queue ADT:
invar t ! invar (insert x t )
mset_trees (insert x t ) = ⦃x⦄ + mset_trees t
17.2.3 Merging
Recall the merge algorithm used in top-down merge sort (Section 2.4). It
merges two sorted lists by repeatedly taking the smaller list head. We use a
similar idea for merging two heaps: if the rank of one list’s head is strictly
smaller, we choose it. If both ranks are equal, we link the two heads and insert
the resulting tree into the merged remaining heaps. Thus, the resulting heap
will be strictly ordered by rank. Formally:
merge :: 'a trees ⇒ 'a trees ⇒ 'a trees
merge ts₁ [] = ts₁
merge [] (v # va) = v # va
merge (t₁ # ts₁ =: h₁) (t₂ # ts₂ =: h₂)
  = (if rank t₁ < rank t₂ then t₁ # merge ts₁ h₂
     else if rank t₂ < rank t₁ then t₂ # merge h₁ ts₂
     else ins_tree (link t₁ t₂) (merge ts₁ ts₂))
The merge function can be regarded as an algorithm for adding two sparse
binary numbers. This intuition is explored in Exercise 17.2.
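The following Haskell sketch collects the operations discussed so far (link, ins_tree, insert and merge); it is an unverified transcription and uses our own names.

-- A binomial tree: rank, root element, children in decreasing rank order.
data BTree a = BNode Int a [BTree a]
type Heap a = [BTree a]   -- trees in strictly increasing rank order

rank :: BTree a -> Int
rank (BNode r _ _) = r

-- Link two trees of equal rank; the larger root goes below the smaller.
link :: Ord a => BTree a -> BTree a -> BTree a
link t1@(BNode r x1 ts1) t2@(BNode _ x2 ts2)
  | x1 <= x2  = BNode (r + 1) x1 (t2 : ts1)
  | otherwise = BNode (r + 1) x2 (t1 : ts2)

-- Insert a tree whose rank is at most the lowest rank in the heap.
insTree :: Ord a => BTree a -> Heap a -> Heap a
insTree t [] = [t]
insTree t (t' : ts)
  | rank t < rank t' = t : t' : ts
  | otherwise        = insTree (link t t') ts

insert :: Ord a => a -> Heap a -> Heap a
insert x = insTree (BNode 0 x [])

merge :: Ord a => Heap a -> Heap a -> Heap a
merge ts [] = ts
merge [] ts = ts
merge h1@(t1 : ts1) h2@(t2 : ts2)
  | rank t1 < rank t2 = t1 : merge ts1 h2
  | rank t2 < rank t1 = t2 : merge h1 ts2
  | otherwise         = insTree (link t1 t2) (merge ts1 ts2)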
We show that the merge operation preserves the invariant and adds the
elements:
invar ts 1 ^ invar ts 2 ! invar (merge ts 1 ts 2 )
mset_trees (merge ts 1 ts 2 ) = mset_trees ts 1 + mset_trees ts 2
For a binomial tree, the root node always contains a minimal element. Unfor-
tunately, there is no such property for the whole heap—the minimal element
may be at the root of any of the heap’s trees. To get a minimal element from
a non-empty heap, we look at all root nodes:
get_min :: 0a trees ) 0a
get_min [t ] = root t
get_min (t # ts ) = min (root t ) (get_min ts )
To delete a minimal element, we first need to find one and then remove it.
Removing the root node of a tree with rank r leaves us with a list of its
children, which are binomial trees of ranks r 1; : : : ; 0. Reversing this list
yields a valid binomial heap, which we merge with the remaining trees in the
original heap:
Here, the auxiliary function get_min_rest splits a heap into a tree with min-
imal root element, and the remaining trees.
We prove that, for a non-empty heap, del_min preserves the invariant and
deletes the minimal element:
ts ≠ [] ∧ invar ts ⟶ invar (del_min ts)
ts ≠ [] ⟶ mset_trees ts = mset_trees (del_min ts) + ⦃get_min ts⦄
The proof is straightforward. For invariant preservation, the key is to show
that get_min_rest preserves the invariants:
get_min_rest ts = (t', ts') ∧ ts ≠ [] ∧ invar ts ⟶ bheap t'
get_min_rest ts = (t', ts') ∧ ts ≠ [] ∧ invar ts ⟶ invar ts'
To show that we actually remove a minimal element, we show that
get_min_rest selects the same tree as get_min:
ts ≠ [] ∧ get_min_rest ts = (t', ts') ⟶ root t' = get_min ts
The running time functions are shown in Appendix B.5. Intuitively, the op-
erations are linear in the length of the heap, which in turn is logarithmic in
the number of elements (cf. Section 17.1).
The running time analysis for insert is straightforward. The running time
is dominated by ins_tree. In the worst case, it iterates over the whole heap,
taking constant time per iteration. By straightforward induction, we show
Tins_tree t ts jts j + 1
and thus
The running time analysis for merge is more interesting. In each recursion,
we need constant time to compare the ranks. However, if the ranks are equal,
we link the trees and insert them into the merger of the remaining heaps.
In the worst case, this costs linear time in the length of the merger. A naive
analysis would estimate jmerge ts 1 ts 2 j jts 1 j + jts 2 j, and thus yield a
quadratic running time in the length of the heap.
However, we can do better: we observe that every link operation in
ins_tree reduces the number of trees in the heap. Thus, over the whole merge,
we can only have linearly many link operations in the combined size of both
heaps.
To formalize this idea, we estimate the running time of ins_tree and merge
together with the length of the result:
Tins_tree t ts + |ins_tree t ts| ≤ 2 + |ts|
From the bound for merge, we can easily derive a bound for del_min:
invar ts ∧ ts ≠ [] ⟶ Tdel_min ts ≤ 6 · lg (|mset_trees ts| + 1) + 3
The only notable point is that we use a linear time bound for reversing a list,
as explained in Section 1.5.1:
Trev ::
0a list ) nat
Trev xs = jxs j + 1
17.4 Exercises
Exercise 17.1. A node in a tree is on level n if it needs n edges from the
root to reach it. Define a function nol :: nat ⇒ 'a tree ⇒ nat such that
nol n t is the number of nodes on level n in tree t and show that a binomial
tree of rank r has (r choose l) nodes on level l:
btree t ⟶ nol l t = rank t choose l
Hint: You might want to prove separately that
(∑ i < r. i choose n) = r choose (n + 1)
Bibliographic Remarks
It computes the well-known Fibonacci numbers. You may also have noticed
that calculating fib 50 already causes quite some stress for your computer and
there is no hope for fib 500 to ever return a result.
This is quite unfortunate considering that there is a very simple imperative
program to compute these numbers efficiently:
int fib(n) {
int a = 0;
int b = 1;
for (i in 1..n) {
int temp = b;
b = a + b;
a = temp;
}
return a;
}
same function, and use whichever one is more suited for the task at hand. For
fib, of course, it is trivial to define a functional analogue of the imperative
program and to prove its correctness. However, doing this for all recursive
functions we would like to define is tedious. Instead, this chapter will sketch a
recipe that allows to define such recursive functions in the natural way, while
still getting an efficient implementation “for free”.
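For reference, such a functional analogue of the imperative loop is indeed a one-liner; the following Haskell sketch carries the last two Fibonacci numbers through a tail-recursive helper.

-- Linear-time Fibonacci, mirroring the imperative program above.
fibIter :: Integer -> Integer
fibIter n = go n 0 1
  where
    go 0 a _ = a
    go k a b = go (k - 1) b (a + b)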
In the following, the Fibonacci function will serve as a simple example on
which we can illustrate the idea. Next, we will show how to prove the cor-
rectness of the efficient implementation in an efficient way. Subsequently, we
will discuss further details of the approach and how it can be applied beyond
fib. The chapter closes with the study of two famous (and archetypical)
dynamic programming algorithms: the Bellman-Ford algorithm for finding
shortest paths in weighted graphs and an algorithm due to Knuth for com-
puting optimal binary search trees.
18.1 Memoization
Let us consider the tree of recursive calls that are issued when computing fib
5 (Fig. 18.1). We can see that the subtree for fib 3 is computed twice, and
4 3
3 2 2 1
2 1 1 0 1 0
1 0
that the subtree for fib 2 is even computed three times. How can we avoid
these repeated computations? A common solution is memoization: we store
previous computation results in some kind of memory and consult it to poten-
tially recall a memoized result before issuing another recursive computation.
Here is a simple memoizing version of fib that implements the memory as a
mapping of type nat * nat (the notation m (k 7! v ) is a shorthand for m (k
:= Some v ), see also Section 12.1):
fib₁ :: nat ⇒ (nat ⇀ nat) ⇒ nat × (nat ⇀ nat)
fib₁ 0 m = (0, m)
fib₁ 1 m = (1, m)
fib₁ (n + 2) m
  = (case m (n + 2) of
       Some v ⇒ (v, m)
     | None ⇒
         let (i, m) = fib₁ n m; (j, m) = fib₁ (n + 1) m
         in (i + j, m(n + 2 ↦ i + j)))
And indeed, we can ask Isabelle to compute (via the value command) fib 1 50
or even fib 1 500 and we get the result within a split second.
However, we are not yet happy with this code. Carrying the memory
around means a lot of additional weight for the definition of fib 1 , and proving
that this function computes the same value as fib is not completely trivial
(how would you approach this?). Let us streamline the definition first by
pulling out the reading and writing of memory into a function memo (for a
type 0k of keys and a type 0v of values):
memo :: 'k ⇒ (('k ⇀ 'v) ⇒ 'v × ('k ⇀ 'v)) ⇒ ('k ⇀ 'v) ⇒ 'v × ('k ⇀ 'v)
memo k f m
  = (case m k of None ⇒ let (v, m) = f m in (v, m(k ↦ v))
               | Some v ⇒ (v, m))
fib₂ :: nat ⇒ (nat ⇀ nat) ⇒ nat × (nat ⇀ nat)
fib₂ 0 = memo 0 (λm. (0, m))
fib₂ 1 = memo 1 (λm. (1, m))
fib₂ (n + 2)
  = memo (n + 2)
      (λm. let (i, m) = fib₂ n m;
               (j, m) = fib₂ (n + 1) m
           in (i + j, m))
This already looks a lot more like the original definition but it still has one
problem: we have to thread the memory through the program explicitly. This
can become rather tedious for more complicated programs and diverges
from the original shape of the program, complicating the proofs.
Let us examine the type of fib 2 more closely. We can read it as the type of
a function that, given a natural number, returns a computation. Given an
initial memory, it computes a pair of a result and an updated memory. We
can capture this notion of “stateful” computations in a data type:
datatype ('s, 'a) state = State ('s ⇒ 'a × 's)
run_state :: ('s, 'a) state ⇒ 's ⇒ 'a × 's
run_state (State f) s = f s
The advantage of this definition may not seem immediate. Its value only starts
to show when we see how it allows us to chain stateful computations. To do
so, we only need to define two constants: return to pack up a result in a
computation, and bind to chain two computations after each other.
return :: 'a ⇒ ('s, 'a) state
return x = State (λs. (x, s))
bind :: ('s, 'a) state ⇒ ('a ⇒ ('s, 'b) state) ⇒ ('s, 'b) state
bind a f = State (λs. let (x, s') = run_state a s in run_state (f x) s')
We add a little syntax on top and write h x i for return x, and a >>= f instead
of bind a f. The “identity” computation h x i simply leaves the given state
unchanged and produces x as a result. The chained computation a >>= f
starts with some state s, runs a on it to produce a pair of a result x and a
new state s 0, and then evaluates f x to produce another computation that is
run on s 0.
We have now seen how to pass state around but we are not yet able to
interact with it. For this purpose we define get and set to retrieve and update
the current state, respectively:
get :: ('s, 's) state
get = State (λs. (s, s))
set :: 's ⇒ ('s, unit) state
set s = State (λ_. ((), s))
memo₁ :: 'k ⇒ ('k ⇀ 'v, 'v) state ⇒ ('k ⇀ 'v, 'v) state
memo₁ k a
  = get >>=
    (λm. case m k of
           None ⇒ a >>= (λv. get >>= (λm'. set (m'(k ↦ v)) >>= (λ_. ⟨v⟩)))
         | Some x ⇒ ⟨x⟩)
fib₃ :: nat ⇒ (nat ⇀ nat, nat) state
fib₃ 0 = ⟨0⟩
fib₃ 1 = ⟨1⟩
fib₃ (n + 2)
  = memo₁ (n + 2) (fib₃ n >>= (λi. fib₃ (n + 1) >>= (λj. ⟨i + j⟩)))
Can you see how we have managed to hide the whole handling of state behind
the scenes? The only explicit interaction with the state is now happening
inside of memo 1 . This is sensible as this is the only place where we really
want to recall a memoized result or to write a new value to memory.
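In Haskell the same structure can be expressed with the standard State monad from the mtl library and Data.Map as the memory; the sketch below is our own analogue of fib₃ and memo₁, not the Isabelle development, and assumes the containers and mtl packages.

import Control.Monad.State
import qualified Data.Map as M

type Memo = M.Map Integer Integer

-- Look up k; if absent, run the given computation and record its result.
memo :: Integer -> State Memo Integer -> State Memo Integer
memo k a = do
  m <- get
  case M.lookup k m of
    Just v  -> return v
    Nothing -> do
      v <- a
      modify (M.insert k v)
      return v

fibMemo :: Integer -> State Memo Integer
fibMemo 0 = return 0
fibMemo 1 = return 1
fibMemo n = memo n $ do
  i <- fibMemo (n - 2)
  j <- fibMemo (n - 1)
  return (i + j)

fib :: Integer -> Integer
fib n = evalState (fibMemo n) M.empty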
While this is great, we still want to polish the definition further: the syn-
tactic structure of the last case of fib 3 still does not match fib exactly. To this
end, we lift function application f x to the state monad:
(.) :: ('s, 'a ⇒ ('s, 'b) state) state ⇒ ('s, 'a) state ⇒ ('s, 'b) state
f_m . x_m = (f_m >>= (λf. x_m >>= (λx. f x)))
We can now spell out our final memoizing version of fib where (.) replaces
ordinary function applications in the original definition:
You may wonder why we added that many additional computations in this last
step. On the one hand, we have gained the advantage that we can now closely
follow the syntactic structure of fib to prove that fib 4 is correct (notwith-
standing that memo 1 will need a special treatment, of course). On the other
hand, we can remove most of these additional computations in a final post-
processing step.
Let us recap what we have seen so far in this chapter. We noticed that the
naive recursive formulation of the Fibonacci numbers leads to a highly inef-
ficient implementation. We then showed how to work around this problem
by using memoization to obtain a structurally similar but efficient implemen-
tation. After all this, you may wonder why this chapter is titled Dynamic
Programming and not Memoization.
Dynamic programming relies on two main principles. First, to find an
optimal solution for a problem by computing it from optimal solutions for
“smaller” instances of the same problem, i.e. recursion. Second, to memoize
these solutions for smaller problems in, e.g. a table. Thus we could be bold
and state:
dynamic programming = recursion + memoization
A common objection to this equation would be that memoization should
be distinguished from tabulation. In this view, the former only computes
“necessary” solutions for smaller sub-problems, while the latter just “blindly”
builds solutions for sub-problems of increasing size, many of which might be
unnecessary. The benefit of tabulation could be increased performance, for
instance due to improved caching performance. We believe that this distinc-
tion is largely irrelevant to our approach. First, in this book we focus on
asymptotically efficient solutions, not constant-factor optimizations. Second,
in many dynamic programming algorithms memoization would actually com-
pute solutions for the same set of sub-problems as tabulation does. No matter
which of the two approaches is used in the implementation, the hard part is
usually to come up with a recursive solution that can efficiently make use of
sub-problems in the first place.
There are problems, however, where clever tabulation instead of naive
memoization is necessary to achieve an asymptotically optimal solution in
terms of memory consumption. One instance of this is the Bellman-Ford
algorithm presented in Section 18.4. On this example, we will show that our
approach is also akin to tabulation. It can easily be introduced as a final
“post-processing” step.
Some readers may have noticed that our optimized implementations of fib
are not really optimal as they use a map for memoization. Indeed it is possible
to swap in other memory implementations as long as they provide a lookup
and an update method. One can even make use of imperative data structures
like arrays. As this is not the focus of this book, the interested reader is
referred to the literature that is provided at the end of this chapter. Here,
we will just assume that the maps used for memoization are implemented as
red-black trees (and Isabelle’s code generator can be instructed to do so).
For the remainder of this chapter, we will first outline how to prove that
fib 4 is correct. Then, we will sketch how to apply our approach of memoization
beyond fib. Afterwards, we will study some prototypical examples of dynamic
programming problems and show how to apply the above formula to them.
We now want to prove that fib 4 is correct. But what is it exactly that we
want to prove? We surely want fib 4 to produce the same result as fib when
run with an empty memory (where empty x : None):
fst (run_state (fib 4 n ) empty ) = fib n (18.1)
If we were to make a naive attempt at this proof, we would probably start
with an induction on the computation of fib just to realize that the induction
hypotheses are not strong enough to prove the recursion case, since they
demand an empty memory. We can attempt generalization as a remedy:
fst (run_state (fib 4 n ) m ) = fib n
However, this statement does not hold anymore for every memory m.
What do we need to demand from m? It should only memoize values that
are consistent with fib:
dom :: ('k ⇀ 'v) ⇒ 'k set
dom m = {a | m a ≠ None}
Note that, from now, we use the type 0a mem to denote memoized values of
type 0a that have been “wrapped up” in our memoizing state monad. Using
cmem, we can formulate a general notion of equivalence between a value v
and its memoized version a, written v . a: starting from a consistent memory
m, a should produce another consistent memory m 0, and the result v.
(.) :: 'a ⇒ 'a mem ⇒ bool
v . a
  = (∀m. cmem m ⟶
       (let (v', m') = run_state a m in v = v' ∧ cmem m'))
lines:
“f m is a memoized function that, when applied to a value x, yields a
memoized value that is equivalent to f x ”.
This goes beyond what we can currently express with (.) as v . a merely
states that “a is a memoized value equivalent to v ”. What we need is more lib-
erty in our choice of equivalence. That is, we want to use statements v .R a,
with the meaning: “a is a memoized value that is related to v by R”. The
formal definition is analogous to (.) (and (.) = (.(=))):
v .R s
  = (∀m. cmem m ⟶
       (let (v', m') = run_state s m in R v v' ∧ cmem m'))
However, we still do not have a means of expressing the second part of our
sentence. To this end, we use the function relator ( ): V
(V0 ) ::
0c
( a ) ) bool )
) 0
( b ) 0d ) bool ) 0a ) 0b ) 0c ) 0d ) bool
) ( ) ( )
R VS = ( f g : 8 x y : R x y ! S f x g y ( ) ( ))
Our current rule for application (18.4) does not match this goal. Thus we need
to generalize it. In addition, we need a new rule for return, and a rule for ( ). V
To summarize, we need the following set of theorems about our consistency
relation, applying them wherever they match syntactically to finish the proof
of (18.2):
Rxy ! x .R h y i
x .R x m ^ f .R V . f m ! f x .S f m : x m
8 x y: R x y ! S f x g y ! R V S f g
S
( ( ) ( )) ( )
18.3 Details of Memoization
In this section, we will look at some further details of the memoization process
and sketch how it can be applied beyond fib. First note that our approach of
memoization hinges on two rather independent components: We transform the
original program to use the state monad, to thread (an a priori arbitrary)
state through the program. Only at the call sites of recursion, we then in-
troduce the memoization functionality by issuing lookups and updates to the
memory (as implemented by memo 1 ). We will name this first process monad-
ification. For the second component, many different memory implementations
can be used, as long as we can define memo 1 and prove its characteristic the-
orem (18.3). For details on this, the reader is referred to the literature. Here,
we want to turn our attention towards monadification.
To discuss some of the intricacies of monadification, let us first stick with
fib for a bit longer and consider the following alternative definition (which is
mathematically equivalent but not the same program):
We have not yet seen how to handle two ingredients of this program: con-
structs like if-then-else or case-combinators; and higher-order functions such
as map.
It is quite clear how if-then-else can be lifted to the state monad:
By following the structure of the terms, we can also deduce a proof rule for
if m :
b . b_m ∧ x .R x_m ∧ y .R y_m ⟶
(if b then x else y) .R if_m b_m x_m y_m
(If you are just interested in the dynamic programming algorithms of the
following sections, this section can safely be skipped on first reading.)
However, suppose we want to apply this proof rule to our new equation for
fib. We will certainly need the knowledge of whether n = 0 to make progress
in the correctness proof. Thus we make our rule more precise:
b . b_m ∧ (b ⟶ x .R x_m) ∧ (¬ b ⟶ y .R y_m) ⟶
(if b then x else y) .R if_m b_m x_m y_m
How can we lift map to the state monad level? Consider its defining equa-
tions:
map f [] = []
map f (x # xs) = f x # map f xs
map m 0 f [] = h []i
map m 0 f (x # xs ) = h a : h b : h a # bi i i : hf i : hxi :
( ) ( map m 0 f xs )
((
0a ) 0b mem ) ) (
0a list ) 0b list mem ) mem ) mem
because map has two arguments and we need one layer of the state monad
for each of its arguments. Therefore we simply define
map m = h f : h map m 0 f i i
For inductive proofs about the new definition of fib, we also need the
knowledge that fib is recursively applied only to smaller values than n when
computing fib n. That is, we need to know which values f is applied to in
map f xs. We can encode this knowledge in a proof rule for map:
xs = ys ^ (8 x : x 2 set ys ! f x .R f m x ! )
( h a : h 1 + a i i :
The correctness proof for fib m is analogous to the one for fib 4 , once we have
proved the new rules discussed above.
At the end of this section, we note that the techniques that were sketched
above also extend to case-combinators and other higher-order functions. Most
of the machinery for monadification and the corresponding correctness proofs
can be automated in Isabelle [79]. Finally note that none of the techniques
we used so far are specific to fib. The only parts that have to be adapted are
the definitions of memo₁ and cmem. In Isabelle, this can be done by simply
instantiating a locale.
This concludes the discussion of the fundamentals of our approach towards
verified dynamic programming. We now turn to the study of two typical
examples of dynamic programming algorithms: the Bellman-Ford algorithm
and an algorithm for computing optimal binary search trees.
and some target vertex (known as sink ), and we want to calculate the weight
of the shortest (i.e. minimum weight) paths from every vertex to the sink.
Figure 18.2 shows an example of such a graph.
Formally, we will take a simple view of graphs. We assume that we are given
a number of nodes numbered 0, …, n, and some sink t ∈ {0..n} (thus n = t
= 4 in the example). Edge weights are given by a function W :: int ⇒ int
⇒ int extended. The type int extended extends the integers with
positive and negative infinity:
datatype 'a extended = Fin 'a | ∞ | −∞
We refrain from giving the explicit definition of addition and comparison on
this domain, and rely on your intuition instead. A weight assignment W i j
= ∞ means that there is no edge from i to j. The purpose of −∞ will become
clear later.
 i\v    0    1    2    3    4
  0     ∞    ∞    ∞    ∞    0
  1     ∞    ∞    3    2    0
  2     5    6    3    1    0
  3     5    5    3    1    0
  4     4    5    3    1    0
Table 18.1. The minimum weights of paths from vertices v = 0, …, 4 to t that use
at most i = 0, …, 4 edges.
The analysis we just ran on the example already gives us a clear intuition
on all we need to deduce a dynamic program: a recursion on sub-problems, in
this case to compute the weight of shortest paths with at most i + 1 edges
from the weights of shortest paths with at most i edges. To formalize this
recursion, we first define the notion of a minimum weight path from some
node v to t with at most i edges, denoted as OPT i v :
weight _ [_] = 0
(lhs ≤ rhs) For this direction, we essentially need to show that every path
on the rhs is covered by the lhs, which is trivial.
(lhs ≥ rhs) We skip the cases where OPT (i + 1) v is trivially 0 or ∞
(i.e. where it is given by the singleton set in the definition of OPT). Thus
consider some xs such that OPT (i + 1) v = weight (v # xs @ [t]),
|xs| ≤ i, and set xs ⊆ {0..n}. The cases where |xs| < i or i = 0 are trivial.
Otherwise, we have OPT (i + 1) v = W v (hd xs) + weight (xs @ [t])
by definition of weight, and OPT i (hd xs) ≤ weight (xs @ [t]) by def-
inition of OPT. Therefore, we can show:
OPT (i + 1) v ≥ W v (hd xs) + OPT i (hd xs) ≥ rhs                  ⊓⊔
We can turn these equations into a recursive program:
bf (i + 1) v
  = min_list (bf i v # map (λw. W v w + bf i w) [0..<n + 1])
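The recursion can be tabulated row by row, which is essentially what the memoized implementation computes. The following Haskell sketch is our own illustration; it encodes ∞ as Nothing (a simplification of int extended that ignores −∞) and already returns Nothing overall when a negative reaching cycle is detected, anticipating the discussion below.

-- Row i of the result holds bf i v for v = 0..n; w v u = Nothing means "no edge".
bellmanFordRows :: Int -> Int -> (Int -> Int -> Maybe Int) -> [[Maybe Int]]
bellmanFordRows n t w = iterate step row0
  where
    row0 = [ if v == t then Just 0 else Nothing | v <- [0 .. n] ]
    step prev =
      [ minimumMaybe (prev !! v : [ addE (w v u) (prev !! u) | u <- [0 .. n] ])
      | v <- [0 .. n] ]
    addE (Just c) (Just d) = Just (c + d)
    addE _        _        = Nothing
    minimumMaybe xs = case [ x | Just x <- xs ] of
                        [] -> Nothing
                        ys -> Just (minimum ys)

-- Shortest-path weights to t, or Nothing if a negative reaching cycle exists.
bellmanFord :: Int -> Int -> (Int -> Int -> Maybe Int) -> Maybe [Maybe Int]
bellmanFord n t w =
  let rows = bellmanFordRows n t w
      xs   = rows !! n
      ys   = rows !! (n + 1)
  in if xs == ys then Just xs else Nothing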
Have we solved the initial problem now? The answer is “not quite” because
we have ignored one additional complication. Consider our example table 18.1
again. The table stops at path length five because no shorter paths with more
edges exist. For this example, five corresponds to the number of nodes, which
bounds the length of the longest simple path. However, is it the case that we
will never find shorter non-simple paths in other graphs? The answer is “no”.
If a graph contains a negative reaching cycle, i.e. a cycle with a negative sum
of edge weights from which the sink is reachable, then we can use it arbitrarily
often to find shorter and shorter paths.
Luckily, we can use the Bellman-Ford algorithm to detect this situation
by examining the relationship of OPT n and OPT (n + 1). The following
proposition summarizes the key insight:
The graph contains a negative reaching cycle if and only if there exists a
v ≤ n such that OPT (n + 1) v < OPT n v
Proof. If there is no negative reaching cycle, then all shortest paths are
either simple or contain superfluous cycles of weight 0. Thus, we have
OPT (n + 1) v = OPT n v for all v ≤ n.
Otherwise, there is a negative reaching cycle ys = a # xs @ [a ] with
weight ys < 0. Working towards a contradiction, assume that OPT n v ≤
OPT (n + 1) v for all v ≤ n. Using the recursion we proved above, this
implies OPT n v ≤ W v u + OPT n u for all u, v ≤ n. By applying this
inequality to the nodes in a # xs, we can prove the inequality
sum_list (map (OPT n) ys) ≤ sum_list (map (OPT n) ys) + weight ys
This implies 0 ≤ weight ys, which yields the contradiction.   □
This means we can use bf to detect the existence of negative reaching
cycles by computing one more round, i.e. bf (n + 1) v for all v. If nothing
changes in this step, we know that there are no negative reaching cycles and
that bf n correctly represents the shortest path weights. Otherwise, there has
to be a negative reaching cycle.
Finally, we can use memoization to obtain an efficient implementation that
solves the single-destination shortest path problem. Applying our memoiza-
tion technique from above, we first obtain a memoizing version bf_m of bf. We
then define the following program:
bellman_ford ::
  ((nat × nat, int extended) mapping, int extended list option) state
bellman_ford
= iter_bf (n, n) >>=
  (λ_. map_m′ (bf_m n) [0..<n + 1] >>=
  (λxs. map_m′ (bf_m (n + 1)) [0..<n + 1] >>=
  (λys. ⟨if xs = ys then Some xs else None⟩)))
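For orientation, the same computation can be written iteratively in Haskell, using plain lists of rows instead of the book's memoized state monad (the function names and the Maybe encoding of ∞ are ours; negative reaching cycles are reported by returning Nothing):

  -- One round: from the row "at most i edges" to the row "at most i+1 edges".
  step :: (Int -> Int -> Maybe Int) -> Int -> [Maybe Int] -> [Maybe Int]
  step w n row =
    [ foldr1 minW (row !! v : [ addW (w v u) (row !! u) | u <- [0 .. n] ])
    | v <- [0 .. n] ]
    where
      addW (Just a) (Just b) = Just (a + b)
      addW _        _        = Nothing
      minW Nothing  y        = y
      minW x        Nothing  = x
      minW (Just a) (Just b) = Just (min a b)

  -- Compute OPT n and OPT (n+1); if they agree, there is no negative
  -- reaching cycle and OPT n is the answer.
  bellmanFord :: (Int -> Int -> Maybe Int) -> Int -> Int -> Maybe [Maybe Int]
  bellmanFord w n t =
    let row0 = [ if v == t then Just 0 else Nothing | v <- [0 .. n] ]
        rows = iterate (step w n) row0
        xs   = rows !! n
        ys   = rows !! (n + 1)
    in  if xs == ys then Just xs else Nothing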
18.5 Optimal Binary Search Trees

Our task is equivalent to minimizing the weighted path length (or cost )
as we did for Huffman encodings (Chapter 13). Recall that the weighted path
length is the sum of the frequencies of every node in the tree multiplied by
its depth in the tree. It fulfills the following (recursive) equations:
cost ⟨⟩ = 0
cost ⟨l, k, r⟩
= (∑k∈set_tree l. p k) + cost l + p k + cost r + (∑k∈set_tree r. p k)
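In Haskell, with a minimal binary-tree type of our own and the frequencies given by a function p, the same equations read:

  data Tree a = Leaf | Node (Tree a) a (Tree a)

  keys :: Tree a -> [a]
  keys Leaf         = []
  keys (Node l k r) = keys l ++ [k] ++ keys r

  -- Weighted path length: every key contributes its frequency once per level.
  cost :: (a -> Int) -> Tree a -> Int
  cost _ Leaf         = 0
  cost p (Node l k r) =
    sum (map p (keys l)) + cost p l + p k + cost p r + sum (map p (keys r))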
The key insight into the problem is that subtrees of optimal binary search
trees are also optimal. The left and right subtrees of the root must be optimal,
since if we could improve either one, we would also get a better tree for the
complete range of keys. This motivates the following definition:
wpl W i j ⟨⟩ = 0
wpl W i j ⟨l, k, r⟩
= wpl W i (k − 1) l + wpl W (k + 1) j r + W i j

W i j = (∑ k = i..j. p k)
min_wpl i j
= (if j < i then 0
   else min_list
          (map (λk. min_wpl i (k − 1) + min_wpl (k + 1) j + W i j)
               [i..j]))
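Before turning to memoization, here is a direct (exponential-time, un-memoized) Haskell rendering of min_wpl together with the weight sum W, assuming the frequencies are given by a function p — purely illustrative and not the implementation derived below:

  -- Sum of the frequencies of keys i..j (the W of the text).
  wSum :: (Int -> Int) -> Int -> Int -> Int
  wSum p i j = sum [ p k | k <- [i .. j] ]

  -- Minimal weighted path length for keys i..j, trying every root k.
  minWpl :: (Int -> Int) -> Int -> Int -> Int
  minWpl p i j
    | j < i     = 0
    | otherwise = minimum [ minWpl p i (k - 1) + minWpl p (k + 1) j + wSum p i j
                          | k <- [i .. j] ]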
18.5.2 Memoization
We can apply the memoization techniques that were discussed above to ef-
ficiently compute min_wpl and opt_bst. The only remaining caveat is that
W also needs to be computed efficiently from the distribution p. If we just
use the defining equality W i j = (∑ k = i..j. p k), the computation of W is
unnecessarily costly. Another way is to memoize W itself, using the following
recursion:

W p i j = (if i ≤ j then W p i (j − 1) + p j else 0)

W p i j = W_m′ p i j

We can now iterate W_m′ p i n for i = 0 … n to pre-compute all relevant
values of W p i j :
Using the correctness theorem for map_m′ from above, it can easily be shown
that this yields a consistent memory:

cmem (W_c p n)

We can show the following equation for computing W :

W p i j = (case (W_c p n) (i, j) of None ⇒ W p i j | Some x ⇒ x)
Note that the None branch will only be triggered when indices outside of
0 … n are accessed. Finally, we can use W_c to pass the pre-computed values
of W to opt_bst :
R i j
= argmin (λk. W i j + min_wpl i (k − 1) + min_wpl (k + 1) j) [i..j]
The proof of this fact is rather involved and the details can be found in the
references provided at the end of this section.
With this knowledge, we can make the following optimization to opt_bst :
opt_bst 2 (k + 1) j i)
[left ::right ]))
You may wonder whether this change really yields an asymptotic runtime im-
provement. Indeed, it can be shown that it improves the algorithm’s runtime
by a factor of O(n). For a fixed search tree size d = j − i, the total number
of recursive computations is given by the following telescoping series:

(∑ j = d..n. let i = j − d in R (i + 1) j − R i (j − 1) + 1)
= R (n − d + 1) n − R 0 (d − 1) + n − d + 1

This quantity is bounded by 2 · n, which implies that the overall number of
recursive calls is bounded by O(n²).
Bibliographic Remarks
The original O (n 2 ) algorithm for Binary Search Trees is due to Knuth [41].
Yao later explained this optimization more elegantly in his framework of
"quadrilateral inequalities" [80]. Nipkow and Somogyi follow Yao’s approach
in their Isabelle formalization [59], on which the last subsection of this chapter
is based. The other parts of this chapter are based on a paper by Wimmer et
al. [79] and its accompanying Isabelle formalization [78]. The formalization
also contains further examples of dynamic programming algorithms, includ-
ing solutions for the Knapsack and the minimum edit distance problems, and
the CYK algorithm.
19
Amortized Analysis
So far we implicitly assumed that all operations return the data structure
as a result, otherwise Φ (f s) does not make sense. How should we analyze
so-called observer functions that do not modify the data structure but
return a value of some other type? Amortized analysis does not make sense
here because the same observer can be applied multiple times to the same
data structure value without modifying it. Classical worst-case complexity is
needed, unless the observer does modify the data structure as a side effect or
by returning a new value. Then one can perform an amortized analysis that
ignores the returned observer value (but not the time it takes to compute it).
Now we study two important examples of amortized analyses. More complex
applications are found in later chapters.
Φ bs = |filter (λx. x) bs|
19.3.1 Insertion
The key observation is that doubling the size of the table upon overflow leads
to an amortized cost of 3 per insertion: 1 for inserting the element, plus 2
towards the later cost of copying a table of size l upon overflow (because only
the l/2 elements that lead to the overflow pay for it).
Insertion always increments n by 1. The size increases from 0 to 1 with
the first insertion and doubles with every further overflow:
Φ (n, l) = 2 · n − l
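As a quick check (our own arithmetic, ignoring the exact constants of the book's time functions): if no overflow occurs (n < l), the actual cost is 1 and Φ grows by 2, so the amortized cost is 1 + 2 = 3; upon overflow (n = l), the actual cost is n + 1 for copying the n old elements and inserting the new one, while Φ drops from 2 · n − l = n to 2 · (n + 1) − 2 · l = 2, so the amortized cost is again (n + 1) + (2 − n) = 3.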
Now the load factor is always between 1/4 and 1. It turns out that the lower
bound is not needed in the proofs and we settle for a simpler invariant:
Φ (n, l) = (if n < l/2 then l/2 − n else 2 · n − l)
Exercise 19.2. Prove that in the dynamic table with naive deletion (where
deletion decrements n but leaves l unchanged), insertion has an amortized
cost of at most 3 and deletion of at most 1.
the table is multiplied by 2/3, i.e. reduced to (2 · l) div 3. Prove that insertion
and deletion have constant amortized complexity using the potential
Φ (n, l) = |2 · n − l|.
Bibliographic Remarks
ADT Queue =
interface empty :: 0q
enq :: 0a ) 0q ) 0q
deq :: 0q ) 0q
first :: 0q ) 0a
is_empty :: 0q ) bool
abstraction list :: 0q ) 0a list
invariant invar :: 0q ) bool
specification list empty = []
invar q ! list (enq x q ) = list q @ [x ]
invar q ! list (deq q ) = tl (list q )
invar q ^ list q 6= [] ! first q = hd (list q )
invar q ! is_empty q = (list q = [])
invar empty
invar q ! invar (enq x q )
invar q ! invar (deq q )
A trivial implementation is as a list, but then enq is linear in the length of the
queue. To improve this we consider two more sophisticated implementations.
First, a simple implementation where every operation has amortized constant
complexity. Second, a tricky “real time” implementation where every operation
has worst-case constant complexity.
The queue is implemented as a pair of lists (fs ; rs ), the front and rear lists.
Function enq adds elements to the head of the rear rs and deq removes
elements from the head of the front fs. When fs becomes empty, it is replaced
by rev rs (and rs is emptied) — the reversal ensures that now the oldest
element is at the head. Hence rs is really the reversal of the rear of the queue
but we just call it the rear. The abstraction function is obvious:
list :: 0a list × 0a list ⇒ 0a list
list (fs, rs) = fs @ rev rs
Clearly enq and deq are constant-time until the front becomes empty.
Then we need to reverse the rear which takes linear time (if it is implemented
by itrev, see Section 1.5.1). But we can pay for this linear cost up front by
paying a constant amount for each call of enq. Thus we arrive at amortized
constant time. See below for the formal treatment.
The implementation is shown in Figure 20.1. Of course empty = ([], []).
Function norm performs the reversal of the rear once the front becomes

norm :: 0a list × 0a list ⇒ 0a list × 0a list
norm (fs, rs) = (if fs = [] then (itrev rs [], []) else (fs, rs))

enq :: 0a ⇒ 0a list × 0a list ⇒ 0a list × 0a list
enq a (fs, rs) = norm (fs, a # rs)

deq :: 0a list × 0a list ⇒ 0a list × 0a list
deq (fs, rs) = (if fs = [] then (fs, rs) else norm (tl fs, rs))

first :: 0a list × 0a list ⇒ 0a
first (a # _, _) = a
empty. Why does not only deq but also enq call norm? Because otherwise
enq xₙ (… (enq x₁ empty) …) would result in ([], [xₙ, …, x₁]) and first would
invar :: 0a list × 0a list ⇒ bool
invar (fs, rs) = (fs = [] ⟶ rs = [])
Φ (_, rs) = |rs|
because |rs| is the amount we have accumulated by charging 1 for each enq.
This is enough to pay for the eventual reversal. Now it is easy to prove that
both enq and deq have amortized constant running time:
Tenq a (fs, rs) + Φ (enq a (fs, rs)) − Φ (fs, rs) ≤ 4
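This bound can be checked directly against the time functions in Appendix B.6 (our own arithmetic, taking Titrev xs ys = |xs| + 1): if fs ≠ [], the actual cost is 2 and Φ grows by 1, for an amortized cost of 3; if fs = [], the reversal makes the actual cost |rs| + 4 while Φ drops from |rs| to 0, for an amortized cost of exactly 4.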
The two observer functions first and is_empty have constant running time.
Breaking down a reversal operation into multiple steps can be done using the
following function:
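One natural way to realize such a single step, written in Haskell (the name and packaging are ours, not the book's rev_step), moves one element from the list being reversed onto an accumulator and leaves an exhausted list alone:

  -- One constant-time step of a reversal; iterating it |xs| times
  -- turns (xs, ys) into ([], reverse xs ++ ys), as Lemma 20.1 below states.
  revStep :: ([a], [a]) -> ([a], [a])
  revStep (x : xs, ys) = (xs, x : ys)
  revStep ([],     ys) = ([], ys)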
Note that each call to rev_step takes constant time since its definition is
non-recursive.
Using the notation f^n for the n-fold composition of function f we can
state a simple inductive lemma:

Lemma 20.1. rev_step^|xs| (xs, ys) = ([], rev xs @ ys)

As a special case this implies rev_step^|xs| (xs, []) = ([], rev xs).
Phases (1) and (2) are independent and can be performed at the same time,
hence, when starting from this configuration
f = [q0, …, qm]    f′ = []    r = [qm+1, …, qn]    r′ = []

after max |f| |r| steps of reversal the state would be the following:

f = []    f′ = [qm, …, q0]    r = []    r′ = [qn, …, qm+1]
Phase (3) reverses f′ onto r′ to obtain fr, the same result as a call to list:

fr = [q0, …, qm, qm+1, …, qn]
20.3.4 Implementation
not constant. Finally, status tracks the current reversal phase of the queue in
a 0a status value.
datatype 0a status =
  Idle |
  Rev nat ( 0a list) ( 0a list) ( 0a list) ( 0a list) |
  App nat ( 0a list) ( 0a list) |
  Done ( 0a list)
exec :: 0a status ⇒ 0a status
exec (Rev ok (x # f) f′ (y # r) r′) = Rev (ok + 1) f (x # f′) r (y # r′)
exec (Rev ok [] f′ [y] r′) = App ok f′ (y # r′)
exec (App 0 f′ r′) = Done r′
exec (App ok (x # f′) r′) = App (ok − 1) f′ (x # r′)
exec s = s
invalidate :: 0a status ⇒ 0a status
invalidate (Rev ok f f′ r r′) = Rev (ok − 1) f f′ r r′
invalidate (App 0 f′ (_ # r′)) = Done r′
invalidate (App ok f′ r′) = App (ok − 1) f′ r′
invalidate s = s
By decreasing the value of ok, the number of elements from f′ that are
moved into r′ in phase (3) is reduced, since exec might produce Done early,
once ok = 0, ignoring the remaining elements of f′. Furthermore, since f′ is
a reversal of the front list, elements left behind in its tail correspond directly
to those being removed from the queue.
The rest of the implementation is shown below. Auxiliary function exec2,
as its name suggests, applies exec twice and updates the queue accordingly if
Done is returned.
exec2 :: 0a queue ⇒ 0a queue
exec2 q = (case exec (exec q) of
             Done fr ⇒ q (|status := Idle, front := fr|) |
             newstatus ⇒ q (|status := newstatus|))
check :: 0a queue ⇒ 0a queue
check q
= (if lenr q ≤ lenf q then exec2 q
empty ::
0a queue
empty = make 0 [] Idle [] 0
first ::
0a queue ) 0a
first q = hd (front q )
enq :: 0a ⇒ 0a queue ⇒ 0a queue
enq x q = check (q (|rear := x # rear q, lenr := lenr q + 1|))
deq ::
0a queue ) 0a queue
deq q
= check
The two main queue operations, enq and deq, alter front and rear as
expected, with additional updates to lenf and lenr to keep track of their
lengths. To perform all “internal” operations, both functions call check. Addi-
tionally, deq uses invalidate to mark elements as removed.
Function check calls exec2 if lenr is not larger than lenf. Otherwise a
reversal process is initiated: rear is emptied and lenr is set to 0; lenf is
increased to the size of the whole queue since, conceptually, all elements are
now in the soon-to-be-computed front; the status newstate is initialized as
described at the beginning of Section 20.3.2.
The time complexity of this implementation is clearly constant, since there
are no recursive functions.
invar q
= (lenf q = |front_list q| ∧ lenr q = |rear_list q| ∧
   lenr q ≤ lenf q ∧
   (case status q of
      Rev ok f f′ _ _ ⇒
        2 · lenr q ≤ |f| ∧ ok ≠ 0 ∧
        2 · |f| + ok + 2 ≤ 2 · |front q|
      …) ∧
   (∄ fr. status q = Done fr) ∧
   inv_st (status q))
For case App ok f′ r′, the front list corresponds to the final result of the
stepped reversal (20.1) but only elements in f′ that are still in the queue,
denoted by take ok f′, are considered. Analogously for Rev ok f f′ r r′,
both stepped reversal results are appended and only relevant elements in f′
are used; however, rear lists r and r′ are reversed again to achieve canonical
order.
Continuing with invar, the inequality lenr q ≤ lenf q is the main invariant in
our reversal strategy, and by the previous two equalities it must hold even as
internal operations occur. Furthermore, the predicate ∃ rest. front_list q = front
q @ rest ensures front q is contained within front_list q, thus preventing
any mismatch between the internal state and the queue’s front. Given that
exec2 is the only function that manipulates a queue’s status, it holds that
∄ fr. status q = Done fr since any internal Done result is replaced by Idle.
The case distinction on status q places size bounds on the internal lists front
and rear, ensuring the front does not run out of elements and the rear never
grows beyond the bound lenr q ≤ lenf q. In order to clarify some of the formulations used
in this part of invar, consider the following correspondences, which hold once
the reversal process starts:
• lenr q corresponds to the number of enq operations performed so far, and
2 · lenr q denotes the exec applications in those operations.
before reaching Done must be less than the minimum number of applications
possible, denoted by 2 · |f| + ok + 2 ≤ 2 · |front q|.
Case App ok f′ r′ has similar invariants, with the equation 2 · lenr q ≤ |r′|
bounding the growth of r′ as it was previously done with f′. Moreover, ok
+ 1 ≤ 2 · |front q| ensures front q is not exhausted before the reversal is
completed.
Using invar we can prove the properties of enq and deq required by the
Queue ADT using the following abstraction function:
list ::
0a queue ) 0a list
list q = front_list q @ rear_list q
Bibliographic Remarks
The representation of queues as pairs of lists is due to Burton [11]. The Hood-
Melville queues are due to Hood and Melville [30]. The implementation is
based on the presentation by Okasaki [62].
21
Splay Trees
21.1 Implementation
The central operation on splay trees is the splay function shown in Fig-
ure 21.1. It rotates the given element x to the root of the tree if x is already
in the tree. Otherwise the last element found before the search for x hits a
leaf is rotated to the root.
Function isin has a trivial implementation in terms of splay:
isin ::
0a tree ) 0a ) bool
isin t x = ( case splay x t of hi ) False j h_; a ; _i ) x = a)
Except that splay creates a new tree that needs to be returned from a proper
isin as well to achieve the amortized logarithmic complexity (see the discus-
sion of observer functions at the end of Section 19.1). This is why splay trees
splay x hAB ; b ; CD i
= (case cmp x b of
LT ) case AB of
hi ) hAB ; b ; CD i j
hA; a ; B i )
case cmp x a of
LT ) if A = hi then hA; a ; hB ; b ; CD ii
else case splay x A of
hA1 ; a 0; A2 i ) hA1 ; a 0; hA2 ; a ; hB ; b ; CD iii j
EQ ) hA; a ; hB ; b ; CD ii j
GT ) if B = hi then hA; a ; hB ; b ; CD ii
else case splay x B of
hB 1 ; b 0; B 2 i ) hhA; a ; B 1 i; b 0; hB 2 ; b ; CD ii j
EQ ) hAB ; b ; CD i j
GT ) case CD of
hi ) hAB ; b ; CD i j
hC ; c ; D i )
case cmp x c of
LT ) if C = hi then hhAB ; b ; C i; c ; D i
else case splay x C of
hC 1 ; c 0; C 2 i ) hhAB ; b ; C 1 i; c 0; hC 2 ; c ; D ii j
EQ ) hhAB ; b ; C i; c ; D i j
GT ) if D = hi then hhAB ; b ; C i; c ; D i
else case splay x D of
hD 1 ; d ; D 2 i ) hhhAB ; b ; C i; c ; D 1 i; d ; D 2 i)
are inconvenient in functional languages. For the moment we ignore this as-
pect and stick with the above isin because it has the type required by the
Set ADT.
The implementation of insert x t in Figure 21.2 is straightforward: let hl ;
a ; r i = splay x t ; if a = x, return hl ; a ; r i; otherwise make x the root of a
suitable recombination of l, a and r.
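In Haskell this recombination step might look as follows (a sketch under our own Tree type, with splay passed in as a parameter; the particular way of rebuilding the tree around the new root x is the standard one, not a quotation of Figure 21.2):

  data Tree a = Leaf | Node (Tree a) a (Tree a)

  insert :: Ord a => (a -> Tree a -> Tree a) -> a -> Tree a -> Tree a
  insert _     x Leaf = Node Leaf x Leaf
  insert splay x t =
    case splay x t of
      Node l a r
        | x == a    -> Node l a r                  -- x was already in the tree
        | x < a     -> Node l x (Node Leaf a r)    -- x becomes the new root
        | otherwise -> Node (Node l a Leaf) x r
      Leaf -> Node Leaf x Leaf                     -- cannot happen for t /= Leaf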
The implementation of delete x t in Figure 21.3 starts similarly: let hl ; a ;
r i = splay x t ; if a 6= x, return hl ; a ; r i. Otherwise follow the deletion-by-
replacing paradigm (Section 5.2.1): if l 6= hi, splay the maximal element m
in l to the root and replace x with it. Note that splay_max returns a tree
that is just a glorified pair: if t 6= hi then splay_max t is of the form ht 0;
m ; hii. The definition splay_max hi = hi is not really needed (splay_max
is always called with non-hi argument) but some lemmas can be stated more
slickly with this definition.
insert ::
0
a ) 0
a tree ) 0
a tree
insert x t
= (if t = hi then hhi; x ; hii
delete ::
0
a ) 0
a tree ) 0
a tree
delete x t
= (if t = hi then hi
21.2 Correctness
The inorder approach of Section 5.4 applies. Because the details are a bit
different (everything is reduced to splay) we present the top-level structure.
The following easy inductive properties are used implicitly in a number of
subsequent proofs:
splay a t = ⟨⟩ ⟷ t = ⟨⟩
splay_max t = ⟨⟩ ⟷ t = ⟨⟩
Correctness of isin
sorted (inorder t ) ! isin t x = ( x 2 set ( inorder t ))
Φ :: 0a tree ⇒ real
Φ ⟨⟩ = 0
Φ ⟨l, a, r⟩ = φ ⟨l, a, r⟩ + Φ l + Φ r
φ t = log₂ |t|₁
[Diagram: the zig-zig case — the input tree ⟨⟨A, a, B⟩, b, C⟩ on the left, the
intermediate tree with A replaced by A′ = ⟨A1, a′, A2⟩ in the middle, and the
output ⟨A1, a′, ⟨A2, a, ⟨B, b, C⟩⟩⟩ on the right.]
This is the case where x < a < b and A 6= hi. On the left we have the
input and on the right the output of splay x. Because A 6= hi, splay x A =
hA1 ; a 0; A2 i =: A 0 for some A1 , a 0 and A2 . The intermediate tree is obtained
by replacing A by A 0. This tree is shown for illustration purpose only; in the
algorithm the right tree is constructed directly from the left one. Let X = hl ;
x ; r i. Clearly X 2 subtrees A. We abbreviate compound trees like hA; a ; B i
by the names of their subtrees, in this case AB. First note that
φ A1A2BC = φ ABC   (∗)
by and definition of ( )
by IH and X 2 subtrees A
= 2 ' A ' A2 BC ' BC ' AB
+ + 'X 3 + 2
because x y < x
1 + lg y if x; y >
+ lg 2 lg ( + ) 0
[Diagram: the zig-zag case — the input tree ⟨⟨A, a, B⟩, b, C⟩ on the left, the
intermediate tree with B replaced by B′ = ⟨B1, b′, B2⟩ in the middle, and the
output ⟨⟨A, a, B1⟩, b′, ⟨B2, b, C⟩⟩ on the right.]
This is the case where a < x < b and B 6= hi. On the left we have the
input and on the right the output of splay x. Because B 6= hi, splay x B =
hB 1 ; b 0; B 2 i =: B 0 for some B 1 , b 0 and B 2 . The intermediate tree is obtained
by replacing B by B 0. Let X = hl ; x ; r i. Clearly X 2 subtrees B. The proof
is very similar to the zig-zig case, the same naming conventions apply and we
omit some details:
Asplay x ABC = Tsplay x A + 1 + AB B C ABC
1 2
'B
3 + ' AB 1 + ' B2 C ' AB ' B 0 'X 3 + 2
by IH and X 2 subtrees B
= 2 'B + ' AB 1 + ' B2 C ' AB 'X 3 + 2
Because ' hl ; x ; r i
, the above theorem implies
1
If x is not in the tree we show that there is a y in the tree such that
splaying with y would produce the same tree in the same time:
Lemma 21.3. t ≠ ⟨⟩ ∧ bst t ⟶
(∃ y ∈ set_tree t. splay y t = splay x t ∧ Tsplay y t = Tsplay x t)
Element y is the last element in the tree that the search for x encounters
before it hits a leaf. Naturally, the proof is by induction on the computation
of splay.
Combining this lemma with Corollary 21.2 yields the final unconditional
amortized complexity of splay on BSTs:
Corollary 21.4. bst t ⟶ Asplay x t ≤ 3 · φ t + 1
Asplay_max t ≤ 3 · φ t + 1
A running time analysis of isin is trivial because isin is just splay followed
by a constant-time test.
21.4 Exercises
Exercise 21.1. Find a sequence of numbers n₁, n₂, . . . , nₖ such that the
insertion of these numbers one by one creates a tree of height k.
Bibliographic Remarks
Splay trees were invented and analyzed by Sleator and Tarjan [70] for which
they received the 1999 ACM Paris Kanellakis Theory and Practice Award [38].
In addition to the amortized complexity as shown above they proved that
splay trees perform as well as static BSTs (the Static Optimality Theorem)
and conjectured that, roughly speaking, they even perform as well as any other
BST-based algorithm. This Dynamic Optimality Conjecture is still open.
This chapter is based on earlier publications [69, 53, 54, 55].
22
Skew Heaps
Skew heaps are heaps in the sense of Section 14.1 and implement mergeable
priority queues. Skew heaps can be viewed as a self-adjusting form of leftist
heaps that attempts to maintain balance by unconditionally swapping all
nodes in the merge path when merging two heaps.
merge :: 0a tree ⇒ 0a tree ⇒ 0a tree
merge ⟨⟩ t = t
merge t ⟨⟩ = t
merge (⟨l₁, a₁, r₁⟩ =: t₁) (⟨l₂, a₂, r₂⟩ =: t₂)
= (if a₁ ≤ a₂ then ⟨merge t₂ r₁, a₁, l₁⟩ else ⟨merge t₁ r₂, a₂, l₂⟩)
The remaining operations (fg, insert, get_min and del_min) are defined as
in Section 14.1.
The following properties of merge have easy inductive proofs:
|merge t₁ t₂| = |t₁| + |t₂|
Φ :: 0a tree ⇒ int
Φ ⟨⟩ = 0
Φ ⟨l, _, r⟩ = Φ l + Φ r + rh l r

rh :: 0a tree ⇒ 0a tree ⇒ nat
rh l r = (if |l| < |r| then 1 else 0)
The rough intuition: because merge descends along the right spine, the more
right-heavy nodes a tree contains, the longer merge takes.
Two auxiliary functions count the number of right-heavy nodes on the left
spine (lrh) and left-heavy (= not right-heavy) nodes on the right spine (rlh):
lrh :: 0a tree ⇒ nat
lrh ⟨⟩ = 0
lrh ⟨l, _, r⟩ = rh l r + lrh l

rlh :: 0a tree ⇒ nat
rlh ⟨⟩ = 0
rlh ⟨l, _, r⟩ = 1 − rh l r + rlh r
They imply
lrh t ≤ lg |t|₁    rlh t ≤ lg |t|₁    (22.1)
Now we are ready for the amortized analysis. All canonical time functions
can be found in Appendix B.8. The key lemma is an upper bound of the
amortized complexity of merge in terms of lrh and rlh:
The amortized complexity of insertion and deletion follows easily from the
complexity of merge:
Tinsert a t + Φ (insert a t) − Φ t ≤ 3 · lg (|t|₁ + 2) + 2
Bibliographic Remarks
Skew heaps were invented by Sleator and Tarjan [71] as one of the first self-
organizing data structures. Their presentation was imperative. Our presenta-
tion follows earlier work by Nipkow [53, 55] based on the functional account
by Kaldewaij and Schoenmakers [37].
23
Pairing Heaps
The abstraction function to multisets and the invariant follow the heap
paradigm:
pheap :: 0a heap ⇒ bool
pheap Empty = True
pheap (Hp x hs)
= (∀ h ∈ set hs. (∀ y ∈# mset_heap h. x ≤ y) ∧ pheap h)
Note that pheap is not the full invariant. Moreover, Empty does not occur
inside a non-empty heap.
The implementations of empty and get_min are obvious, and insert fol-
lows the standard heap paradigm:
empty = Empty
get_min :: 0a heap ) 0a
get_min (Hp x _) = x
insert ::
0a ) 0a heap ) 0a heap
insert x h = merge (Hp x []) h
Function merge is not recursive (as in binary heaps) but simply adds one of
the two heaps to the front of the top-level heaps of the other, depending on
the root value:
merge ::
0a heap ) 0a heap ) 0a heap
merge h Empty = h
merge Empty h = h
merge (Hp x hsx =: hx ) (Hp y hsy =: hy )
= (if x < y then Hp x (hy # hsx ) else Hp y (hx # hsy ))
Thus merge and insert have constant running time. All the work is offloaded
on del_min which just calls merge_pairs:
pass 1 ::
0a heap list ) 0a heap list
pass 1 (h 1 # h 2 # hs ) = merge h 1 h 2 # pass 1 hs
pass 1 hs = hs
pass 2 ::
0a heap list ) 0a heap
pass 2 [] = Empty
pass 2 (h # hs ) = merge h (pass 2 hs )
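For orientation, here is how the two passes combine into del_min in a Haskell sketch of this list-based representation (the Haskell packaging is ours; the book works with the pass₁/pass₂ combination described here):

  data Heap a = Empty | Hp a [Heap a]

  delMin :: Ord a => Heap a -> Heap a
  delMin Empty     = Empty
  delMin (Hp _ hs) = pass2 (pass1 hs)
    where
      pass1 (h1 : h2 : rest) = merge h1 h2 : pass1 rest
      pass1 rest             = rest
      pass2 []        = Empty
      pass2 (h : hs') = merge h (pass2 hs')
      merge h Empty = h
      merge Empty h = h
      merge hx@(Hp x hsx) hy@(Hp y hsy)
        | x < y     = Hp x (hy : hsx)
        | otherwise = Hp y (hx : hsy)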
empty = hi
get_min :: 0a tree ) 0a
get_min h_; x ; _i = x
link :: 0a tree ⇒ 0a tree
link ⟨hsx, x, ⟨hsy, y, hs⟩⟩
= (if x < y then ⟨⟨hsy, y, hsx⟩, x, hs⟩ else ⟨⟨hsx, x, hsy⟩, y, hs⟩)
link hp = hp

pass₁ :: 0a tree ⇒ 0a tree
pass₁ ⟨hsx, x, ⟨hsy, y, hs⟩⟩ = link ⟨hsx, x, ⟨hsy, y, pass₁ hs⟩⟩
pass₁ hp = hp

pass₂ :: 0a tree ⇒ 0a tree
pass₂ ⟨hsx, x, hs⟩ = link ⟨hsx, x, pass₂ hs⟩
pass₂ ⟨⟩ = ⟨⟩

merge :: 0a tree ⇒ 0a tree ⇒ 0a tree
merge ⟨⟩ hp = hp
merge hp ⟨⟩ = hp
merge ⟨hsx, x, ⟨⟩⟩ ⟨hsy, y, ⟨⟩⟩ = link ⟨hsx, x, ⟨hsy, y, ⟨⟩⟩⟩

insert :: 0a ⇒ 0a tree ⇒ 0a tree
insert x hp = merge ⟨⟨⟩, x, ⟨⟩⟩ hp
as the tree hhs 1 ; x 1 ; hhs 2 ; x 2 ; :::i:::i. This simplifies the analysis because we
now have to deal only with a single type, trees.
The code for the tree representation of pairing heaps is shown in Fig-
ure 23.1. We work with the pass 1 /pass 2 version of del_min. The correctness
proof is very similar to what we saw in the previous section. We merely display
the two invariants:
pheap :: 0a tree ⇒ bool
pheap ⟨⟩ = True
pheap ⟨l, x, r⟩ = ((∀ y ∈ set_tree l. x ≤ y) ∧ pheap l ∧ pheap r)
Now we turn to the amortized analysis. The potential of a tree is the sum
of the logarithms of the sizes of the subtrees:
Φ :: 0a tree ⇒ real
Φ ⟨⟩ = 0
Φ ⟨l, x, r⟩ = lg |⟨l, x, r⟩| + Φ l + Φ r
These easy inductive size properties are frequently used implicitly below:
|link hp| = |hp|
|pass₁ hp| = |hp|
|pass₂ hp| = |hp|
is_root h₁ ∧ is_root h₂ ⟶ |merge h₁ h₂| = |h₁| + |h₂|
We can now analyze the differences in potential caused by all the queue op-
erations. In a separate step we will derive their amortized complexities.
For insertion, the following upper bound follows trivially from the defini-
tions:
Lemma 23.1. is_root hp ⟶ Φ (insert x hp) − Φ hp ≤ lg (|hp| + 1)

Φ (merge h₁ h₂) − Φ h₁ − Φ h₂ ≤ lg (|h₁| + |h₂|) + 1
Proof. From
Φ (merge h₁ h₂)
= Φ (link ⟨hs₁, x, h₂⟩)
= Φ hs₁ + Φ hs₂ + lg (|hs₁| + |hs₂| + 1) + lg (|hs₁| + |hs₂| + 2)
= Φ hs₁ + Φ hs₂ + lg (|hs₁| + |hs₂| + 1) + lg (|h₁| + |h₂|)
it follows that
Φ (merge h₁ h₂) − Φ h₁ − Φ h₂
= lg (|hs₁| + |hs₂| + 1) + lg (|h₁| + |h₂|)
  − lg (|hs₁| + 1) − lg (|hs₂| + 1)
≤ lg (|h₁| + |h₂|) + 1
because lg (1 + x + y) ≤ 1 + lg (1 + x) + lg (1 + y) if x, y ≥ 0.   □
Now we come to the core of the proof, the analysis of del_min. Its running
time is linear in the number of nodes reachable by descending to the right
(starting from the left child of the root). We denote this metric by len:
len ::
0a tree ) nat
len hi = 0
len h_; _; r i = 1 + len r
Therefore we have to show that the potential change compensates for this
linear work. Our main goal is this:
Theorem 23.3. Φ (del_min ⟨hs, x, ⟨⟩⟩) − Φ ⟨hs, x, ⟨⟩⟩
≤ 2 · lg (|hs| + 1) − len hs + 2
It will be proved in two steps: First we show that pass 1 frees enough potential
to compensate for the work linear in len hs and increases the potential only
by a logarithmic term. Then we show that the increase due to pass 2 is also
only at most logarithmic. Combining these results one easily shows that the
amortized running time of del_min is indeed logarithmic.
First we analyze the potential difference caused by pass 1 :
Lemma 23.4. Φ (pass₁ hs) − Φ hs ≤ 2 · lg (|hs| + 1) − len hs + 2
Proof by induction on the computation of pass₁. The base cases are trivial.
We focus on the induction step. Let t = ⟨hs₁, x, ⟨hs₂, y, hs⟩⟩, n₁ = |hs₁|,
n₂ = |hs₂| and m = |hs|.
Φ (pass₁ t) − Φ t
= lg (n₁ + n₂ + 1) − lg (n₂ + m + 1) + Φ (pass₁ hs) − Φ hs
≤ lg (n₁ + n₂ + 1) − lg (n₂ + m + 1)
  + 2 · lg (m + 1) − len hs + 2                      by IH
≤ 2 · lg (n₁ + n₂ + m + 1) − lg (n₂ + m + 1) + lg (m + 1)
  − len hs            because lg x + lg y + 2 ≤ 2 · lg (x + y) if x, y > 0
≤ 2 · lg (n₁ + n₂ + m + 2) − len hs
= 2 · lg |t| − len t + 2
≤ 2 · lg (|t| + 1) − len t + 2   □
Now we turn to pass 2 :
Lemma 23.5. hs ≠ ⟨⟩ ⟶ Φ (pass₂ hs) − Φ hs ≤ lg |hs|
Proof by induction on hs. The base cases are trivial. The induction step (for
⟨hs₁, x, hs⟩) is trivial if hs = ⟨⟩. Assume hs = ⟨hs₂, y, r⟩. Now we need one
more property of pass₂:

∃ hs₃ z. pass₂ ⟨hs₂, y, r⟩ = ⟨hs₃, z, ⟨⟩⟩
is_root h₁ ∧ is_root h₂ ⟶
Tmerge h₁ h₂ + Φ (merge h₁ h₂) − Φ h₁ − Φ h₂
≤ lg (|h₁| + |h₂| + 1) + 2
They follow from the corresponding Lemmas 23.1 and 23.2.
Combining this inductive upper bound for the running time of the two
passes
Tpass2 (pass₁ hs₁) + Tpass1 hs₁ ≤ len hs₁ + 2
with Theorem 23.3 yields the third and final amortized running time:
is_root h ⟶
Tdel_min h + Φ (del_min h) − Φ h ≤ 2 · lg (|h| + 1) + 5
Thus we could prove that insertion, merging and deletion all have amor-
tized logarithmic running times.
Bibliographic Remarks
Pairing heaps were invented by Fredman et al. [21] as a simpler but compet-
itive alternative to Fibonacci heaps. The authors gave the amortized analysis
presented above and conjectured that it can be improved. Later research con-
firmed this [33, 65, 34] but the final analysis is still open. An empirical study
[47] showed that pairing heaps do indeed outperform Fibonacci heaps in prac-
tice. This chapter is based on an article by Nipkow and Brinkop [55].
Part V
Appendix
A
List Library
length ::
0a list ) nat
|[]| = 0
|x # xs| = |xs| + 1
(@) ::
0a list ) 0a list ) 0a list
[] @ ys = ys
( x # xs ) @ ys = x # xs @ ys
set ::
0a list ) 0a set
set [] = fg
set (x # xs ) = fx g [ set xs
map :: (
0a ) 0b ) ) 0a list ) 0b list
map f [] = []
map f (x # xs ) = f x # map f xs
filter :: (
0a ) bool ) )
0a list ) 0a list
filter p [] = []
filter p (x # xs ) = ( if p x then x # filter p xs else filter p xs )
concat ::
0a list list ) 0a list
concat [] = []
concat (x # xs ) = x @ concat xs
hd ::
0a list ) 0a
hd (x # xs ) = x
tl ::
0a list ) 0a list
tl [] = []
tl (x # xs ) = xs
butlast ::
0a list ) 0a list
butlast [] = []
butlast (x # xs ) = ( if xs = [] then [] else x # butlast xs )
rev ::
0a list ) 0a list
rev [] = []
rev (x # xs ) = rev xs @ [ x]
(!) ::
0a list ) nat ) 0a
( x # xs ) ! n = ( case n of 0 )x jk + 1 ) xs ! k)
list_update ::
0a list ) nat ) 0a ) 0a list
[][ _ := _] = []
( x # xs )[i := v] = ( case i of 0 )v # xs jj + 1 )x # xs [j := v ])
sum_list :: 0a list ) 0a
sum_list [] = 0
sum_list (x # xs ) = x + sum_list xs
min_list :: 0a list ) 0a
min_list (x # xs )
= (case xs of [] ) x j _ # _ ) min x (min_list xs ))
B
Time Functions
B.1 Lists
Tlength ::
0a list ) nat
Tlength [] = 1
Tlength (_ # xs ) = Tlength xs + 1
Tmap :: (
0a ) nat ) )
0a list ) nat
Tmap _ [] = 1
Tmap Tf (x # xs ) = Tf x + Tmap Tf xs + 1
Tfilter ::
0
( a ) nat ) )
0a list ) nat
Tfilter _ [] = 1
Tfilter Tp (x # xs ) = Tp x + Tfilter Tp xs + 1
Tlength xs = jxs j + 1
Tmap Tf xs = (∑x←xs. Tf x) + |xs| + 1
Tfilter Tp xs = (∑x←xs. Tp x) + |xs| + 1
Ttake n xs = min n jxs j + 1
Tdrop n xs = min n jxs j + 1
B.2 Selection
Tslow_median ::
0a list ) nat
Tslow_median xs = Tslow_select ((|xs| − 1) div 2) xs + 1

Tchop d xs ≤ 5 · |xs| + 1
Tpartition3 x xs = |xs| + 1
Tslow_select k xs ≤ |xs|² + 3 · |xs| + 3
Tslow_median xs ≤ |xs|² + 3 · |xs| + 4
Tjoin_adj ::
0a tree23s ) nat
Tjoin_adj (TTs _ _ (T _)) = 1
Tjoin_adj (TTs _ _ (TTs _ _ (T _))) = 1
Tjoin_adj (TTs _ _ (TTs _ _ ts )) = Tjoin_adj ts + 1
Tjoin_all ::
0a tree23s ) nat
Tjoin_all (T _) = 1
Tjoin_all ts = Tjoin_adj ts + Tjoin_all (join_adj ts ) + 1
Ttree23_of_list ::
0a list ) nat
Ttree23_of_list as = Tleaves as + Tjoin_all (leaves as ) + 1
Tmerge :: ( 0a × nat) tree ⇒ ( 0a × nat) tree ⇒ nat
Tmerge ⟨⟩ t = 1
Tmerge t ⟨⟩ = 1
Tmerge (⟨l₁, (a₁, n₁), r₁⟩ =: t₁) (⟨l₂, (a₂, n₂), r₂⟩ =: t₂)
= (if a₁ ≤ a₂ then Tmerge r₁ t₂ else Tmerge t₁ r₂) + 1
Tinsert :: 0a ⇒ ( 0a × nat) tree ⇒ nat
Tdel_min :: (
0a nat ) tree ) nat
Tdel_min hi = 1
Tdel_min hl ; _; r i = Tmerge l r + 1
Tlink ::
0a tree ) 0a tree ) nat
Tlink _ _ = 1
Tins_tree ::
0a tree ) 0a tree list ) nat
Tins_tree _ [] = 1
Tins_tree t 1 (t 2 # ts )
= (if rank t 1 < rank t 2 then 1
Tinsert ::
0a ) 0a tree list ) nat
Tinsert x ts = Tins_tree (Node 0 x []) ts + 1
Tmerge ::
0a tree list ) 0a tree list ) nat
Tmerge _ [] = 1
Tmerge [] (_ # _) = 1
Tmerge (t 1 # ts 1 =: h 1 ) (t 2 # ts 2 =: h 2)
= 1 +
Tget_min ::
0a tree list ) nat
Tget_min [_] = 1
Tget_min (_ # v # va ) = 1 + Tget_min (v # va )
Tget_min_rest ::
0a tree list ) nat
Tget_min_rest [_] = 1
Tget_min_rest (_ # v # va ) = 1 + Tget_min_rest (v # va )
Trev ::
0a list ) nat
Trev xs = jxs j + 1
Tdel_min ::
0a tree list ) nat
Tdel_min ts
= Tget_min_rest ts +
(case get_min_rest ts of
B.6 Queues
Tnorm ::
0a list 0a list ) nat
Tnorm (fs ; rs ) = ( if fs = [] then Titrev rs [] else 0) + 1
Tenq ::
0a ) 0a list 0a list ) nat
Tenq a (fs, rs) = Tnorm (fs, a # rs) + 1
Tdeq ::
0a list 0a list ) nat
Tdeq (fs ; rs ) = ( if fs = [] then 0 else Tnorm (tl fs ; rs )) + 1
Tfirst ::
0a list 0a list ) nat
Tfirst (_ # _; _) = 1
Tis_empty ::
0a list 0a list ) nat
Tis_empty (_; _) = 1
Tsplay ::
0a ) 0a tree ) nat
Tsplay _ hi = 1
Tsplay x hAB ; b ; CD i
= (case cmp x b of
LT ) case AB of
hi ) j 1
GT ) case CD of
hi ) j 1
hC ; c ; D i ) case cmp x c of
LT ) if C hi then = 1 else Tsplay x C + 1 j
EQ ) j 1
Tsplay_max ::
0a tree ) nat
Tsplay_max hi = 1
Tsplay_max h_; _; hii = 1
Tsplay_max h_; _; h_; _; C ii = ( if C = hi then 1 else Tsplay_max C + 1)
Tinsert ::
0a ) 0a tree ) nat
Tinsert x t = 1 + ( if t =hi then 0 else Tsplay x t )
Tdelete ::
0a ) 0a tree ) nat
Tdelete x t
= 1 +
(if t = hi then 0
else Tsplay x t +
(case splay x t of
hl ; a ; _i )
if x 6= a then 0 else if l = hi then 0 else Tsplay_max l ))
Tmerge ::
0a tree ) 0a tree ) nat
Tmerge hi _ = 1
Tmerge _ hi = 1
Tmerge ⟨l₁, a₁, r₁⟩ ⟨l₂, a₂, r₂⟩
= (if a₁ ≤ a₂ then Tmerge ⟨l₂, a₂, r₂⟩ r₁ else Tmerge ⟨l₁, a₁, r₁⟩ r₂) + 1
Tinsert ::
0a ) 0a tree
) int
Tinsert a t = Tmerge hhi; a ; hii t + 1
Tdel_min ::
0a tree ) int
Tdel_min t = (case t of ⟨⟩ ⇒ 1 | ⟨t₁, _, t₂⟩ ⇒ Tmerge t₁ t₂ + 1)
Tinsert ::
0a ) 0a tree ) nat
Tinsert _ _ = 1
Tmerge ::
0a tree ) 0a tree ) nat
Tmerge _ _ = 1
Tdel_min ::
0a tree ) nat
Tdel_min hi = 1
Tdel_min hhs ; _; _i = Tpass2 (pass 1 hs ) + Tpass1 hs + 1
Tpass1 ::
0a tree ) nat
Tpass1 hi = 1
Tpass1 h_; _; hii = 1
Tpass2 ::
0a tree ) nat
Tpass2 hi = 1
Tpass2 h_; _; hs i = Tpass2 hs + 1
C
Notation
The following table gives an overview of all the special symbols used in this
book and how to enter them into Isabelle. The second column shows the
full internal name of the symbol; the third column shows additional ASCII
abbreviations. Either of these can be used to input the character using the
auto-completion popup.
≤     \<le>                  <=
≥     \<ge>                  >=
∘     \<circ>                       function composition
×     \<times>               <*>    cartesian prod., prod. type
| |   \<bar>                 ||     absolute value
⌊ ⌋   \<lfloor> \<rfloor>    [. .]  floor
⌈ ⌉   \<lceil> \<rceil>      [. .]  ceiling
∑     \<Sum>                 SUM    see Section C.3
∏     \<Prod>                PROD   see Section C.3
Note that the symbols “⦃” and “⦄” that are used for multiset notation in the
book do not exist in Isabelle; instead, the ASCII notations {# and #} are used
(cf. Section C.3).
C.3 Syntactic Sugar
The following table lists relevant syntactic sugar that is used in the book or
its supplementary material. In some cases, the book notation deviates slightly
from the Isabelle notation for better readability.
The last column gives the formal meaning of the notation (i.e. what it
expands to). In most cases, this is not important for the user to know, but
it can occasionally be useful to find relevant lemmas, or to understand that
e.g. if one encounters the term sum f A, this is just the η-contracted form of
∑x∈A. f x.
Lists
|xs|             |xs|              length xs
[]               []                Nil
x # xs           x # xs            Cons x xs
[x, y]           [x, y]            x # y # []

Sets
{}               {}                empty
{x, y}           {x, y}            insert x (insert y {})
x ∈ A            x ∈ A             Set.member x A
x ∉ A            x ∉ A             ¬ (x ∈ A)

Multisets
|M|                                size M
⦃⦄               {#}               empty_mset
⦃x⦄ + M                            add_mset x M
⦃x, y⦄           {#x, y#}          add_mset x (add_mset y {#})
x ∈# M           x ∈# M            x ∈ set_mset M
x ∉# M           x ∉# M            ¬ (x ∈# M)
⦃x ∈# M | P x⦄   {#x ∈# M. P x#}   filter_mset P M
⦃f x | x ∈# M⦄   {#f x. x ∈# M#}   image_mset f M
∀ x ∈# M. P x    ∀ x ∈# M. P x     ∀ x ∈ set_mset M. P x
∃ x ∈# M. P x    ∃ x ∈# M. P x     ∃ x ∈ set_mset M. P x
M ⊆# M′          M ⊆# M′           subseteq_mset M M′
Sums
∑ A              ∑ A               sum (λx. x) A
∑x∈A f x         ∑ x∈A. f x        sum f A
∑ k=i..j f k     ∑ k = i..j. f k   sum f {i..j}
∑# M             ∑# M              sum_mset M
∑x∈#M f x        ∑ x∈#M. f x       sum_mset (image_mset f M)
∑x←xs f x        ∑ x←xs. f x       sum_list (map f xs)
(analogous for products)
15. R. De La Briandais. File searching using variable length keys. In Papers Pre-
sented at the the March 3-5, 1959, Western Joint Computer Conference,
IRE-AIEE-ACM ’59 (Western), pages 295–298. ACM, 1959.
16. M. Eberl. The number of comparisons in quicksort. Archive of For-
mal Proofs, Mar. 2017. http://isa-afp.org/entries/Quick_Sort_Cost.html,
Formal proof development.
17. M. Eberl. Proving divide and conquer complexities in Isabelle/HOL. Journal
of Automated Reasoning, 58(4):483–508, 2017.
18. M. Eberl, M. W. Haslbeck, and T. Nipkow. Verified analysis of random binary
tree structures. In J. Avigad and A. Mahboubi, editors, Interactive Theorem
Proving (ITP 2018), volume 10895 of LNCS, pages 196–214. Springer, 2018.
19. J.-C. Filliâtre and P. Letouzey. Functors for proofs and programs. In ESOP,
volume 2986 of LNCS, pages 370–384. Springer, 2004.
20. E. Fredkin. Trie memory. Commun. ACM, 3(9):490–499, 1960.
21. M. L. Fredman, R. Sedgewick, D. Sleator, and R. Tarjan. The pairing heap: A
new form of self-adjusting heap. Algorithmica, 1(1):111–129, 1986.
22. K. Germane and M. Might. Deletion: The curse of the red-black tree. J.
Functional Programming, 24(4):423–433, 2014.
23. L. J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In
19th Annual Symposium on Foundations of Computer Science (FOCS 1978),
pages 8–21, 1978.
24. F. Haftmann. Code generation from Isabelle/HOL theories. http://
isabelle.in.tum.de/doc/codegen.pdf.
25. F. Haftmann. Haskell-style type classes with Isabelle/Isar. http://isabelle.
in.tum.de/doc/classes.pdf.
26. F. Haftmann and T. Nipkow. Code generation via higher-order rewrite sys-
tems. In M. Blume, N. Kobayashi, and G. Vidal, editors, Functional and Logic
Programming (FLOPS 2010), volume 6009 of LNCS, pages 103–117. Springer,
2010.
27. Haskell website. https://www.haskell.org.
28. R. Hinze. On constructing 2-3 trees. J. Funct. Program., 28:e19, 2018.
29. C. M. Hoffmann and M. J. O’Donnell. Programming with equations. ACM
Trans. Program. Lang. Syst., 4(1):83–112, 1982.
30. R. Hood and R. Melville. Real-time queue operation in pure LISP. Inf. Process.
Lett., 13(2):50–54, 1981.
31. R. R. Hoogerwoord. A logarithmic implementation of flexible arrays. In R. Bird,
C. Morgan, and J. Woodcock, editors, Mathematics of Program Construction,
volume 669 of LNCS, pages 191–207. Springer, 1992.
32. D. A. Huffman. A method for the construction of minimum-redundancy codes.
In Proceedings of the I.R.E., pages 1098–1101, 1952.
33. J. Iacono. Improved upper bounds for pairing heaps. In M. M. Halldórsson,
editor, Algorithm Theory - SWAT 2000, 7th Scandinavian Workshop on Al-
gorithm Theory, volume 1851 of LNCS, pages 32–45. Springer, 2000.
34. J. Iacono and M. V. Yagnatinsky. A linear potential function for pairing heaps.
In T. H. Chan, M. Li, and L. Wang, editors, Combinatorial Optimization and
Applications — 10th International Conference, COCOA 2016, volume 10043
of LNCS, pages 489–504. Springer, 2016.
54. T. Nipkow. Automatic functional correctness proofs for functional search trees.
In J. Blanchette and S. Merz, editors, Interactive Theorem Proving (ITP
2016), volume 9807 of LNCS, pages 307–322. Springer, 2016.
55. T. Nipkow and H. Brinkop. Amortized complexity verified. J. Autom. Reason-
ing, 62(3):367–391, 2019.
56. T. Nipkow and G. Klein. Concrete Semantics with Isabelle/HOL. Springer,
2014. 298 pp. http://concrete-semantics.org.
57. T. Nipkow, L. Paulson, and M. Wenzel. Isabelle/HOL — A Proof Assistant
for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002. http://www.
in.tum.de/~nipkow/LNCS2283/.
58. T. Nipkow and T. Sewell. Proof pearl: Braun trees. In J. Blanchette and
C. Hritcu, editors, Certified Programs and Proofs, CPP 2020, pages 18–31.
ACM, 2020.
59. T. Nipkow and D. Somogyi. Optimal binary search trees. Archive of Formal
Proofs, May 2018. https://isa-afp.org/entries/Optimal_BST.html, Formal
proof development.
60. Ocaml website. https://ocaml.org.
61. C. Okasaki. Three algorithms on Braun trees. J. Functional Programming,
7(6):661–666, 1997.
62. C. Okasaki. Purely Functional Data Structures. Cambridge University Press,
1998.
63. L. C. Paulson. The foundation of a generic theorem prover. Journal of Auto-
mated Reasoning, 5:363–397, 1989.
64. L. C. Paulson. ML for the Working Programmer. Cambridge University Press,
2nd edition, 1996.
65. S. Pettie. Towards a final analysis of pairing heaps. In 46th Annual IEEE Sym-
posium on Foundations of Computer Science (FOCS, pages 174–183. IEEE
Computer Society, 2005.
66. C. Reade. Balanced trees with removals: An exercise in rewriting and proof.
Sci. Comput. Program., 18(2):181–204, 1992.
67. M. Rem and W. Braun. A logarithmic implementation of flexible arrays. Mem-
orandum MR83/4. Eindhoven University of Technology, 1983.
68. S. Sahni. Leftist trees. In D. P. Mehta and S. Sahni, editors, Handbook of Data
Structures and Applications. Chapman and Hall/CRC, 2004.
69. B. Schoenmakers. A systematic analysis of splaying. Information Processing
Letters, 45:41–50, 1993.
70. D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. J. ACM,
32(3):652–686, 1985.
71. D. D. Sleator and R. E. Tarjan. Self-adjusting heaps. SIAM J. Comput.,
15(1):52–69, 1986.
72. R. E. Tarjan. Amortized complexity. SIAM J. Alg. Disc. Meth., 6(2):306–318,
1985.
73. L. Théry. Formalising Huffman’s algorithm. Technical Report TRCS 034, De-
partment of Informatics, University of L’Aquila, 2004.
74. F. Turbak. CS230 Handouts — Spring 2007, 2007. http://cs.wellesley.edu/
~cs230/spring07/handouts.html.
75. J. Vuillemin. A data structure for manipulating priority queues. Commun.
ACM, 21(4):309–315, 1978.
|_|₁, 50
|_|, 50, 253
[], 8
[_..<_], 254
(@), 253
(!), 254
(#), 8
_[_ := _], 254
⦃⦄, 10
∑#, 10
(∈#), 10
bst, 61
butlast, 254
child, 49
cmp, 62
coercion, 9
complete, 52
complete tree, 51
computation induction, 12
concat, 253