Editor
David S. Hardin
Rockwell Collins, Inc.
400 Collins Road NE.
Cedar Rapids IA 52498
USA
dshardin@rockwellcollins.com
Preface
that have actually been built and deployed, and feature systems that have been cer-
tified at high Evaluation Assurance Levels, namely the Rockwell Collins AAMP7G
microprocessor (EAL7) and the Green Hills INTEGRITY-178B separation kernel
(EAL6+). The contributing authors to this book have endeavored to bring forth
compelling new material on significant, modern design and verification efforts;
many of the results described herein were obtained only within the past year.
This book is intended for practicing computer engineers, computer scientists,
professionals in related fields, as well as faculty and students, who have an interest
in the intersection of high-assurance design, microprocessor systems, and formal
verification, and wish to learn about current developments in the field. It is not in-
tended as a tutorial for any of the aforementioned subjects, for which excellent texts
already exist.
The approach we have taken is to treat each subject that we examine in depth.
Rather than presenting a mere summary of the work, we provide details: how ex-
actly the design is specified and implemented, how the design is formalized, what
the exact correctness properties are, and how the design is shown to meet its speci-
fication. Thus, for example, the text describes precisely how a radix-4 SRT divider
for a commercial microprocessor is implemented and proven correct. Another chap-
ter details how a complete AES-128 design is refined from an abstract specification
traceable back to the FIPS-197 document all the way down to a high-performance
hardware-based implementation that is provably equivalent to the FIPS-197 specifi-
cation. The contributors to this book have made an extraordinary effort to produce
descriptions of their work that are as complete and detailed as possible.
Just as important, this book takes the time to derive useful correctness statements
from basic principles. The text formally develops the “GWV” family of information
flow theorems used in the certifications of the AAMP7G as well as the INTEGRITY-
178B kernel, proceeding from a simple model of computing systems (expressed
in the language of the PVS theorem prover) called the “calculus of indices”, and
formally developing the GWVr1 and GWVr2 information flow theorems. The text
presents a proof of how the GWV formulation maps to classical noninterference, as
well as a proof demonstrating that a system can be shown to uphold the GWVr1 in-
formation flow specification via model checking. Another example of development
from basic principles can be found in the chapter detailing the refinement frame-
works used in the verification of the seL4 microkernel.
Along the way, we delve into a number of “tools of the trade” – theorem provers
(e.g., ACL2, HOL4, Isabelle/HOL, PVS), model checkers (BAT, NuSMV, Prover),
and equivalence checkers – and show how formal verification toolchains are increas-
ingly able to parse the actual engineering artifacts under analysis, with the result that
the formal models are much more detailed and accurate. Another tool trend noted
in several chapters is the combination of theorem proving, model checking, sym-
bolic simulation, etc., to produce a final verification result. A notable example of
this combination of techniques documented in the text is the process used by Cen-
taur Technology to verify their x86 compatible processors. The book also highlights
ways in which ideas from, for example, theorem proving and compiler design, are
being combined to produce novel and useful capabilities.
ACL2 and Its Applications to Digital System Verification
M. Kaufmann and J S. Moore
1 Introduction
Digital system designs are growing in complexity, with huge state spaces and even
larger sets of possible execution paths through those spaces. Traditional simulation-
based testing for complex systems covers only relatively few of those execution
paths. One solution that is getting increased attention is formal verification: the ap-
plication of mechanized mathematical techniques to verify design properties for all
execution paths.
In this chapter, we introduce ACL2 – a programming language, logic, and proof
development environment – and explore its use in the formal verification of digital
systems. Section 2 explores the general problem of proving properties of digital
machines. Next, Sect. 3 introduces ACL2. Finally, in Sect. 4, we illustrate how to
apply ACL2 to reason about digital system models and programs running on them.
In what language or languages does one operate? Most readers will immediately think
of several programming languages well suited to describing abstract state machines,
e.g., VHDL, C, Java, etc., as well as various modeling languages.
But we assume that most readers are less familiar with mathematical proof sys-
tems and so we explore that more closely here. A formal logic includes a syntax
describing all well-formed formulas together with some rules of inference that al-
low one to deduce “new” formulas from “old” ones. There are many formal logics:
propositional calculus, first-order predicate calculus, higher order logic, a plethora
of modal logics, etc. A logic can be turned into a theory by identifying some for-
mulas as axioms. It is conventional to assign meaning to formulas so that some are
considered valid with respect to the axioms and the rules of inference preserve va-
lidity: if the “old” formulas used by a rule are valid, so is the new formula produced
by the rule. The axioms characterize the properties of the primitive objects and the
rules of inference let us deduce additional properties. A proof is a derivation of a
formula from some axioms using the rules of inference. The new formula is called
a theorem (or, if it is just a stepping stone used to derive a more interesting formula,
a lemma) and it is valid. Thus, a way to determine that a formula is valid is to con-
struct a proof of the formula. A piece of software that checks that an alleged proof
is indeed a proof is called, naturally enough, a proof checker. A piece of software
that attempts to discover a proof given an alleged theorem is a theorem prover. If we
were talking about the game of chess, proof checking is akin to checking that every
move in a published game is legal and theorem proving is akin to playing the game
against an opponent (the alleged theorem).
Because numbers are so basic in digital machines – they are typically used as
data, as addresses, as instructions, etc. – our theory will have to have axioms char-
acterizing the numbers (especially the natural numbers including 0 and 1) and
possibly other “atomic” objects relevant to our machine or its description, such as
symbols and strings. In addition, our axioms should allow us to build composite
structures such as lists or vectors, tables, trees, graphs, etc., because these are typi-
cally used in machine descriptions.
All of these objects are inductively constructed in the sense that they can be built
up by repeatedly applying some basic functions. For example, the naturals are built
from 0 by adding 1. Vectors can be built from the empty vector by the operation
of adding an element. Tables can be built by adding a row or column (vector) to the
empty table, etc. To reason about inductive objects, one must have an inductive rule
of inference. The most familiar is: to prove that φ(n) is valid for all natural numbers
n, (a) prove φ(n) when n is 0, and (b) prove that if φ(n) holds for the natural number
n, then φ(n+1) holds. Similar rules of inference can be formulated for vectors,
tables, etc.
Because no preexisting theory will contain all the concepts needed to describe
our machine, our logical theory must also support the notion of definition, allowing
us to define new functions and relations. For example, we might need to speak of
the “physical address,” if any, associated with a given “virtual address” in a cer-
tain table. The main idea behind a definitional principle is to add one or more
new axioms that characterize the properties of some previously undistinguished
3 ACL2
The name “ACL2” is used to refer to three distinct systems: a functional (side-
effect free) programming language, a formal mathematical theory, and an interactive
mechanized theorem prover and proof development environment. ACL2 stands for
“A Computational Logic for Applicative Common Lisp” and hence might have been
written "ACL²."
In this section, we introduce ACL2 and provide a few references, but much more
information is available about it. In particular, the reader is invited to visit the ACL2
home page [30], where one can find tutorials, demos, publications, mailing lists, and
an extensive hypertext user’s manual. The home page also links to home pages of
past ACL2 workshops, where one may find many dozens of papers and slide presen-
tations about ACL2 and its applications, many of which are on the topic of digital
system verification. Some of that work is also published in journals and conference
proceedings, but there are advantages to starting with the workshops' Web sites: (1)
ACL2 Workshop papers are freely available in full on the Web (other than those of
the first workshop; see [31]), (2) the Web site often provides supplemental mate-
rial in the form of ACL2 source material (e.g., “ACL2 books”) or exercises, and (3)
the reader will learn about the standards and activities of the ACL2 community by
browsing the workshop Web sites.
Among the primitive data types supported by ACL2, aside from the numbers, are
characters (e.g., #\A, #\a, and #\Newline); strings (e.g., "Hello, world!");
the Boolean symbols t and nil denoting true and false, respectively; other symbols
(e.g., LOAD and X); and pairs of objects discussed at greater length below. Various
primitive functions allow us to manipulate, compare, and construct such objects. For
example, (+ n 1) is the sum of n and 1, and (< x y) is t if x is less than y,
and nil otherwise. The predicate equal takes two objects and returns t if they
are the same and nil otherwise.
The most basic “method” for constructing composite structures is cons, which
takes two objects and returns the ordered pair containing them. The car of such a
pair is the first object and the cdr is the second. The predicate consp returns t or
nil according to whether its argument is an ordered pair constructed by cons.
Because ACL2 is untyped, any object x can be treated as a list denoting a finite
sequence of objects. If x is a cons pair whose car is a and whose cdr is d , then
x denotes the sequence whose first element is a and whose remaining elements are
those denoted by the list d . If x is not a cons pair, it denotes the empty sequence.
It is conventional to use nil as the representative of the empty list, though any
non-cons will do. When a list is nil-terminated it is said to be a “true-list.”
For example, the sequence containing the elements 1, 2, and 3 is denoted by the
object constructed by (cons 1 (cons 2 (cons 3 nil))). This object is
written (1 2 3) in ACL2.
Given any object v in ACL2, it is possible to write a literal constant in the lan-
guage that evaluates to that object, namely ’v.
Suppose we wished to define the function sum-list which takes one argu-
ment, x, and treats it as a list of numbers, returning the sum of its elements. Then
we could define sum-list as follows:
(defun sum-list (x)
(if (consp x)
(+ (car x)
(sum-list (cdr x)))
0))
In particular, if x is an ordered pair, then sum-list returns the sum of the car
(head) of x and the result of recursively summing the elements of the cdr (rest) of
x; otherwise, sum-list returns 0. Thus, (sum-list ’(1 2 3)) is 6.
The syntax of ACL2 may be extended by a powerful abbreviational facility called
“macros.” A macro is similar to a function except it operates on the syntax of the lan-
guage. For example, it is possible to define list as a macro so that (list x (+ y 1)
(* y 2)) is just an abbreviation for (cons x (cons (+ y 1) (cons (* y 2) nil))).
The idea is that list is defined (as a macro) so that when given the list
(x (+ y 1) (* y 2)) it returns the list (cons x (cons (+ y 1) (cons (* y 2) nil))).
The syntax of ACL2 is defined so that
if a macro is called in an alleged expression, the macro is evaluated on the argu-
ment list (not on the value of arguments) and the object returned is treated as the
expression meant by the macro call. The process is repeated recursively until no
macros are called in the expression. It is beyond the scope of this chapter to explain
macros in greater detail. Macros allow the full power of recursive computation to be
exploited in the syntax; amazing abbreviations can be introduced.
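For instance, a list-building macro of the kind just described might be defined along the following lines (this particular definition and the name my-list are ours, not taken from the ACL2 sources):
(defmacro my-list (&rest args)
  ; Expand (my-list a b c) into (cons a (cons b (cons c nil))).
  (if (consp args)
      (list 'cons (car args) (cons 'my-list (cdr args)))
    nil))
Macroexpansion is applied recursively, so (my-list x (+ y 1)) expands first to (cons x (my-list (+ y 1))) and ultimately to (cons x (cons (+ y 1) nil)).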
is an ordinal measure that decreases according to o< in each recursive call. This
principle is logically conservative (nothing new can be proved in the extended logic
unless it involves the newly introduced symbol) and thus insures the soundness
of the extended theory. Informally, the Definitional Principle insures that every de-
fined function terminates. To use the principle, the user must exhibit an appropriate
measure of the size of the arguments, although by default the system uses an often-
convenient notion of the “size” of an object.
For example, here is a definition of a list concatenation function that takes two
lists, x and y, and returns a list containing all the elements of x followed by all the
elements of y, in sequence.
(defun app (x y)
(if (consp x)
(cons (car x) (app (cdr x) y))
y))
This definition terminates because the size of x decreases on each recursion. Once
accepted, it is executable: given two concrete lists x and y the answer can be com-
puted directly from the definitions and axioms. (app ’(1 2 3) ’(4 5 6))
evaluates to (1 2 3 4 5 6).
In duality with the Definitional Principle, the ACL2 Induction Principle permits
one, in the induction step of a proof of φ, to assume any number of instances of φ, each
of o<-smaller "size," as determined by a user-supplied ordinal-valued measure on
the variables in φ. To use the induction principle one must exhibit the measure,
prove that it is ordinal valued, and prove that it decreases under the case analysis
and variable substitutions used in the induction steps.
For example, using induction and axioms about the basic data types, ACL2 can
automatically prove the theorem
(equal (sum-list (app x y))
(+ (sum-list x) (sum-list y)))
by induction on x. The base case is the formula
(implies (not (consp x))
(equal (sum-list (app x y))
(+ (sum-list x) (sum-list y))))
and the induction step is the formula
(implies (and (consp x)
(equal (sum-list (app (cdr x) y))
(+ (sum-list (cdr x)) (sum-list y))))
(equal (sum-list (app x y))
(+ (sum-list x) (sum-list y))))
Both follow easily from the definitions of sum-list and app and the axioms.
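Stated as an ACL2 event, the theorem might be submitted as follows (the rule name sum-list-app is our choice):
(defthm sum-list-app
  (equal (sum-list (app x y))
         (+ (sum-list x) (sum-list y))))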
The Defchoose Principle permits one to introduce a function symbol that returns
an object satisfying a given property, if such an object exists. It is conservative.
Using it we can provide the power of full first-order quantification, e.g., it is possible
The ACL2 theorem prover accepts as input an alleged theorem and attempts to find
a proof. Also provided as input is a logical world which, roughly speaking, lists the
axioms, user-supplied definitions, and proved theorems of the session. Additional
arguments allow the user to provide hints and other pragmatic advice. The theo-
rem prover either terminates with a success or failure message or else runs until
interrupted by the user. When it reports failure, the meaning is simply that no proof
was found and not that the formula is not a theorem. However, the prover’s output
can help the user to find a counterexample if the formula is not valid or to formu-
late useful lemmas to assist in subsequent proof attempts [28]. ACL2 also includes
numerous proof debugging tools [24, 29].
It tries to find proofs by applying a suite of standard proof techniques based
largely on the function symbols used in the goal formula. The most common tech-
nique is simplification, which replaces the goal formula by a set of supposedly
The best designed books formalize a set of useful concepts and configure ACL2
to reason about those formal concepts effectively. ACL2’s home page provides ac-
cess to hundreds of user-developed books that attempt to provide convenient settings
for reasoning about arithmetic in its many forms (integer, rational, bit-vector, float-
ing point, etc.), list processing, and other more specialized domains. But virtually
every project introduces important concepts that are idiosyncratic to that project or
model and these become the standard books within that project.
The design of the ACL2 theorem prover – driven as it is by the available rules –
puts a great deal of burden on the user in one sense and relieves a great burden in
another. In the early- and mid-stages of a major project, ACL2 users are often less
worried about proving the particular goal theorem than they are about discovering
and codifying proof strategies in the form of books. This can make progress slow.
But once a suitable collection of books has been created allowing the “automatic”
proof of the main theorem, the investment pays off in the later stages of the project
where the verified artifact is repeatedly modified, elaborated, and improved. Each
such modification requires a proof of correctness. This is called proof maintenance.
If each modification required a “hand-made” proof, even if it is produced from an
explicitly described earlier proof, progress here would be much slower. But the
ACL2 user frequently finds that minor modifications to the artifact can be veri-
fied automatically, and that when that verification fails the problem is just in the
region changed, requiring the incremental formalization of the insight that justified
the modification to the artifact.
3.4 Efficiency
Using ACL2 in industrial projects has required a great deal of engineering, aside
from the more obviously necessary attention to powerful proof techniques.
The ACL2 user highly values its capability as a programming language. This
makes well-designed ACL2 models doubly useful: as simulation engines and as for-
mal artifacts about which one can reason. But to be useful as a simulation platform,
the user must have the means to make them efficiently executable without compli-
cating their logical semantics. We give three examples.
It is possible to annotate Lisp code with declarations asserting the intended types
of the values of the variables at runtime. The Common Lisp compiler merely as-
sumes that these declarations are accurate and lays down suitably optimized code. To
see how this can produce much more efficient execution, consider the representation
of numbers in Lisp. To provide semantic cleanliness, every number is a first class
“Object,” a state of affairs that may be achieved by “boxing” every number, i.e., by
representing every number as an instance of some class with the actual magnitude
of the number somehow coded in the fields. Operationally, every number is then
a pointer to one or more memory locations containing the binary representation of
the number. But this would make common arithmetic exceedingly slow because,
naively, one would have to allocate memory to add two numbers. Most Lisp im-
plementations solve this by essentially preallocating all the “small” numbers, e.g.,
those representable in 30 bits, often by representing the small two’s complement
integers by the corresponding binary addresses. Thus, if two small numbers are to
be added and their sum is known (or assumed) to be small, the compiler can generate
the native add instruction on the host processor. If one of these conditions is not met,
the compiler must lay down code that “unboxes,” sums, and “boxes” appropriately.
But it is logically dangerous to assume that the declarations are accurate. Thus,
ACL2 provides a mechanism (called guard verification) by which the user can not
only annotate functions and formulas without affecting their logical meanings, but
also prove mechanically the accuracy of those annotations. ACL2 will not execute
optimized code unless the declarations have been verified; in the absence of ver-
ification, ACL2 arranges for Common Lisp to execute the code as it would if no
declarations were present. Thus, the user can annotate code for efficiency; the annota-
tions do not complicate the proof obligations when reasoning about the functional
properties of the code, and if those annotations are subsequently verified, the user
will observe his or her functions executing much faster.
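One common form of such an annotation is an ACL2 guard. As a small sketch (the name sum-ilist and the choice of guard are ours), here is a guarded variant of sum-list restricted to lists of integers:
(defun sum-ilist (x)
  (declare (xargs :guard (integer-listp x)))
  (if (consp x)
      (+ (car x) (sum-ilist (cdr x)))
    0))
The declare form does not change the logical meaning of sum-ilist; guard verification proves that the guard of every function called in the body, including the recursive call, is satisfied whenever the input satisfies (integer-listp x), after which suitably optimized Common Lisp code may be run.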
A second example is the mbe (“must be equal”) facility, described at length in
[17]. This mechanism allows the user to provide, for example, two entirely different
definitions for a function, one to use in logical reasoning and one to use in execution,
but produces the obligation to prove the two definitions equivalent. For example, it
may be easiest to reason about a function defined in a natural, recursive style but
more efficient to compute it with an iterative (tail-recursive) scheme eliminating
the need for a stack at the expense of some clever invariant among some auxiliary
variables.
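A minimal sketch of this idiom, assuming a guard of rational lists and deferring guard verification, might look as follows (the names sum-list-tr and sum-list-fast are ours):
(defun sum-list-tr (x acc)
  ; Tail-recursive accumulation of the sum of the elements of x into acc.
  (if (consp x)
      (sum-list-tr (cdr x) (+ (car x) acc))
    acc))

(defun sum-list-fast (x)
  ; Reason with the simple recursive sum-list, execute the tail-recursive
  ; version; the :logic/:exec equivalence is discharged when the guards
  ; are verified (deferred here with :verify-guards nil).
  (declare (xargs :guard (rational-listp x)
                  :verify-guards nil))
  (mbe :logic (sum-list x)
       :exec (sum-list-tr x 0)))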
A third example is support for single-threaded objects, also called stobjs [5].
Semantically, these objects are just list structures. But “under the hood,” they use
destructive operations such as updating an array at a given index. Syntactic restric-
tions are imposed and enforced so that the user cannot detect the difference between
the functional semantics alleged by the axioms and the imperative implementation.
With stobjs one can attain execution that approaches the efficiency of C (e.g., 90%
of the speed of C on a small microprocessor model is reported by Hardin et al.
in [18]).
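A hypothetical stobj declaration illustrating the idea (the names mach, pctr, and dmem are ours) might be:
(defstobj mach
  ; A program counter and a small byte-addressed memory.  Logically mach
  ; is just a list of these fields; under the hood the array field is
  ; updated destructively.
  (pctr :type (integer 0 *) :initially 0)
  (dmem :type (array (unsigned-byte 8) (256)) :initially 0))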
Other efficiency issues that have required careful engineering attention include
manipulating and even printing large formulas and constants [25], implementing
proof techniques efficient enough to deal with industrial-scale definitions and for-
mulas, being able to load deep hierarchies of books into the session in a reasonable
amount of time, and being able to recertify deep and broad hierarchies of books in
parallel fast enough to allow overnight rebuilding of recently modified systems.
What does a microprocessor model look like in ACL2? Below we show a complete
description of a very simple machine akin to the Java Virtual Machine. To save
space, we use a slightly smaller font in our displays. The state of the machine is a list
of four items: a program counter (here called ipc because the name pc is already
defined in ACL2), a list of local variable values (locals), a list of values pushed
onto a stack (stack), and a list of instructions (code). Below we define make-state
to construct a state and the four accessor functions to return the corresponding
components. A semicolon (;) delimits a comment to the end of the line.
(defun make-state (ipc locals stack code)
(list ipc locals stack code))
(defun ipc (s) (nth 0 s))
(defun locals (s) (nth 1 s))
(defun stack (s) (nth 2 s))
(defun code (s) (nth 3 s))
(defun next-inst (s) ; fetch instruction at ipc in code
(nth (ipc s) (code s)))
Macros can be written that make it easy to describe the shape of a state and have the
appropriate functions defined automatically.
Instructions are represented by lists, where the 0th element of the list (i.e., the
car, but below written (nth 0 inst)) is the symbolic name of the opcode
and the remaining elements are the operands. For example, an ICONST instruc-
tion, which will cause the machine to push a literal constant onto the stack, will be
represented (ICONST c), where c (i.e., (nth 1 inst)) is the literal constant
to push. The list of local variable values is also accessed with nth; a new list of
values may be obtained from an old one by update-nth, which “replaces” the
item at a given location in a list by another item. Stacks will be represented here
as lists, with the car being the top-most element and the cdr being the rest of the
stack, i.e., the result of popping the stack. The code is the analog of an execute-only
memory and is just a list of instructions.
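For instance, under these conventions:
(nth 0 '(ICONST 8))      ; evaluates to ICONST, the opcode
(nth 1 '(ICONST 8))      ; evaluates to 8, the operand
(update-nth 1 7 '(5 0))  ; evaluates to (5 7)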
Clearly we are describing this little machine at a very high level of abstraction.
It will be possible to refine our description. For example, instructions and opcodes
could be integers instead of lists and symbols; the opcode and operands could be
obtained by arithmetic shifting and masking. The code could be refined into a list of
integers representing the contents of successive machine addresses and would most
likely become a read–write memory. The stack could be an integer address into a
writable region of memory, etc. We return to this imagined lower level description
below, but for now we continue to describe the machine at the abstract level.
For each opcode we define a function that executes the instructions with that
opcode, i.e., the function takes an instruction (of the given opcode) and a state and
returns the next state. Such functions are called semantic functions because they give
the semantics of our instructions. The semantic function must define the program
counter, locals, stack, and code of the next state.
(defun execute-ICONST (inst s) ; (ICONST c): push c onto stack
(make-state (+ 1 (ipc s))
(locals s)
(cons (nth 1 inst) (stack s))
(code s)))
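The remaining semantic functions are not reproduced here; the following sketches (ours, written to be consistent with the conventions above and with the pc-relative offsets used in the factorial code below) suggest how two of them might be defined:
(defun execute-IADD (inst s) ; (IADD): pop two values, push their sum
  (declare (ignore inst))
  (make-state (+ 1 (ipc s))
              (locals s)
              (cons (+ (nth 0 (stack s)) (nth 1 (stack s)))
                    (cdr (cdr (stack s))))
              (code s)))

(defun execute-IFLE (inst s) ; (IFLE offset): pop v; add offset to ipc if v <= 0
  (make-state (if (<= (nth 0 (stack s)) 0)
                  (+ (nth 1 inst) (ipc s))
                (+ 1 (ipc s)))
              (locals s)
              (cdr (stack s))
              (code s)))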
It should be obvious how to add other instructions and other state components to
this machine. When the semantic functions for all instructions have been defined, we
introduce, below, a single function which takes an arbitrary instruction and steps the
state accordingly, by simply doing a “big switch” on the opcode of the instruction
and invoking the appropriate semantic function. Note below that if an unknown
instruction is encountered, its semantics is a no-op: the new state is the old state.
(defun do-inst (inst s)
(case (nth 0 inst)
(ICONST (execute-ICONST inst s))
(ILOAD (execute-ILOAD inst s))
(ISTORE (execute-ISTORE inst s))
(IADD (execute-IADD inst s))
(ISUB (execute-ISUB inst s))
(IMUL (execute-IMUL inst s))
(GOTO (execute-GOTO inst s))
(IFLE (execute-IFLE inst s))
(otherwise s)))
We finally define the single-step state transition function simply to fetch the next
instruction and “do” it.
(defun istep (s)
(do-inst (next-inst s) s))
We conclude by defining the function run that takes a “schedule” and a state
and runs the state according to the schedule. Here, we just step the state once for
every element of the schedule, but in general the schedule provides additional input
to the istep function and may indicate signals received at that step on external
pins, which “thread” in a multithreaded state to step, etc.
(defun run (sched s)
(if (endp sched)
s
(run (cdr sched) (istep s))))
Recall our earlier hints of a lower level machine modeled mainly with integers.
If we had such a lower level model we could proceed to formalize and prove the
relation between it and this one, e.g., a commuting diagram or bisimulation between
the two machines. This has been done many times in ACL2 (and its predecessor
Nqthm [4]) for complex and realistic models, including some pipelined machines
[1, 7, 20, 22, 38, 40, 50, 52].
The abstract model above may be executed on concrete data. For example, con-
sider the code produced by a straightforward compilation of the pseudocode for
factorial:
a = 1;
while (n > 0) {
  a = n * a;
  n = n - 1;
}
return a;
If we allocate local variable n to locals[0] and local variable a to locals[1],
the resulting code is as shown below. We define the ACL2 constant *fact-code*
to be this code snippet.
(defconst *fact-code*
’((ICONST 1) ;;; 0
(ISTORE 1) ;;; 1 a = 1;
(ILOAD 0) ;;; 2 while ; loop: ipc=2
(IFLE 10) ;;; 3 (n > 0)
(ILOAD 0) ;;; 4
(ILOAD 1) ;;; 5
(IMUL) ;;; 6
(ISTORE 1) ;;; 7 a = n * a;
(ILOAD 0) ;;; 8
(ICONST 1) ;;; 9
(ISUB) ;;; 10
(ISTORE 0) ;;; 11 n = n-1;
(GOTO -10) ;;; 12 ; jump to loop
(ILOAD 1) ;;; 13
(HALT) ;;; 14 return a;
))
Note that the unknown HALT instruction at program counter 14 halts the machine
since stepping that instruction is a no-op.
The following expression evaluates a state by taking 100 steps. The term
(repeat ’TICK 100) just returns a list of 100 repetitions of the symbol
TICK and is used as a schedule here. The make-state below constructs the
initial state: the program counter is 0; we have two locals, the 0th (called n
in our pseudocode) having the value 5 and the 1st (called a) having the value 0;
the stack is empty; and the code is our code constant. Note that this example
illustrates running the code snippet with n= 5.
(run
(repeat ’TICK 100) ; 100 clock ticks
(make-state
0 ; ipc
’(5 0) ; locals: n=5, a=0
nil ; stack
*fact-code* ; code
))
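Assuming that IFLE and GOTO interpret their operands as offsets relative to the current ipc (as the comments in *fact-code* and the sketches of the semantic functions above suggest), this expression should evaluate to the state
(make-state 14        ; ipc: the HALT instruction
            '(0 120)  ; locals: n=0, a=120
            '(120)    ; stack: 5! = 120 on top
            *fact-code*)
i.e., the machine reaches the HALT well within the 100 ticks, and the remaining ticks leave the state unchanged.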
The two make-state expressions above are the initial and final states of a run of
our code. In the initial state, the program counter is set to 0 and the local variables
have the unknown symbolic values n and a. In the final state, obtained by running
the initial state some number of steps determined by the function ifact-sched,
the program counter is 14 (i.e., points to the HALT) and we find the factorial of
n pushed on the stack. This equivalence holds provided n is a natural number.
This theorem is proved by induction and establishes the correctness of the facto-
rial snippet.
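The statement sketched in the preceding paragraph can be rendered roughly as follows; the schedule function ifact-sched is mentioned above but not defined here, and the factorial function fact and the rule name are our additions for this sketch:
(defun fact (n)
  (if (zp n)
      1
    (* n (fact (- n 1)))))

(defthm fact-code-correct   ; hypothetical rendering of the result described above
  (implies (natp n)
           (equal (run (ifact-sched n)
                       (make-state 0 (list n a) nil *fact-code*))
                  (make-state 14
                              (list 0 (fact n))
                              (list (fact n))
                              *fact-code*))))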
In [41], we present this same machine as well as a compiler for the pseudocode
used, and we describe the methodology used to configure ACL2 to do proofs about
code. Code is often verified with ACL2 and Nqthm this way, e.g., the binary code
produced by gcc for the Berkeley C String Library is verified against a model of
the Motorola 68020 in [6], a game of Nim is verified in [57] for the fabricated and
verified microprocessor described in [22], and JVM bytecode is verified against a
detailed model of the JVM in [35]. In [7], a commercially designed digital signal
microprocessor is modeled and verified to implement a given ISA. The machine, the
Motorola CAP, included a three-stage pipeline that exposed many programmer vis-
ible hazards. A tutorial on pipelined machine verification in ACL2 may be found in
[50]. In [53], a pipelined microarchitecture with speculative execution, exceptions,
and program modification capability is verified.
Because ACL2 is a general purpose mathematical logic, many different proof
styles and strategies can be brought to bear on the problem of verifying properties
of systems described in it [36, 43, 45]. These include commuting diagrams, bisim-
ulation and stuttering bisimulation, direct proofs of functional equivalence, and a
variety of methods related to inductive assertions. Different proof styles may be
mixed.
We have used a simple example to illustrate a methodology for using ACL2 to verify
correctness properties for digital system models. But there are many ways to use
ACL2 for digital system verification.
Another common approach to code verification in ACL2 skips the problem of
formalizing the entire microprocessor and instead models the code as an ACL2
function. This is akin to modeling our factorial code with the expression (ifact
n 1), where
(defun ifact (n a)
(if (zp n) ; if n <= 0
a ; return a
(ifact (- n 1) ; else loop with a:=n*a, n:=n-1
(* n a))))
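The functional model is then related to the intended mathematical function by a lemma roughly of the following shape (a sketch; the statement and name are ours, using the fact function sketched earlier):
(defthm ifact-computes-factorial
  (implies (and (natp n)
                (acl2-numberp a))
           (equal (ifact n a)
                  (* (fact n) a))))
so that, in particular, (ifact n 1) is (fact n) for every natural number n.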
The production of functions like ifact is sometimes done by hand and other times
is automated by tools that embed the semantics of the code. For example, the cor-
rectness of the floating-point division algorithm on the AMD K5 microprocessor
was proved using the former approach (in which the semantics of the relevant mi-
crocode was modeled as an ACL2 function) [42]. The correctness of divide and
square root on the IBM Power4 was also proved that way [51]. At Rockwell Collins,
the AAMP7G cryptoprocessor was modeled this way and a security-critical separa-
tion property was proved, allowing Rockwell to obtain NSA MILS certification [19].
Such certification requires a comparison of the model to the actual design (if they
are different) and it behooves the modeler to produce a model with as much fidelity
to the actual design as possible.
For this reason, or when code proofs are to be done repeatedly, it is more of-
ten worthwhile to build mechanical translators or to formalize the actual design
language so that fidelity is assured – or at least has to be checked just once at the
“metalevel.” This methodology has been used many times in the verification of float-
ing point designs at the register-transfer level (RTL) at AMD [9,48,49] and is similar
to the use by Rockwell Collins [13] of “reader macros” that allow near-isomorphism
between models written in ACL2 and in a very simple subset of C. Another approach
is to formalize the hardware description language itself [21, 22, 46, 50].
Other chapters in this volume describe applications of ACL2 in more detail. In
particular, the chapter by Hunt et al. describes research at Centaur Technology using
an extension of ACL2 that supports a formalized hardware design language and
efficient symbolic simulation to reason about RTL.
Traditional formal software verification often uses a verification condition gen-
erator (VCG) to take code annotated with assertions and produce proof obligations,
so that the provability of those obligations implies that the assertions always hold.
The “interpreter” approach of Sect. 4, by contrast, formalizes execution of the code
so that properties can be proved directly against the semantics. However, the VCG
approach can be emulated using the interpreter approach, see [39, 44].
In all ACL2 work, the problem arises of what to do when the theorem prover fails
to find a proof. As noted, it is the responsibility of the user to determine whether the
original goal formula was not a theorem (perhaps by constructing a counterexample
from the failed proof) or to formulate lemmas and hints to lead ACL2 to a proof.
Mechanizing the construction of counterexamples for ACL2 formulas is a topic of
ongoing work in the ACL2 community, e.g., Reeber and Hunt [47] describe a system
that unrolls a certain class of ACL2 formulas and attempts to prove them with a SAT
solver, converting any counterexample produced by SAT into an ACL2 counterex-
ample. Chamarthi, Dillinger, Kaufmann, and Manolios are working on an approach
using failed proofs and testing (private communication 2009).
6 Summary
mechanically. Second, modifications to the system can often be verified with an in-
cremental amount of effort on the part of the user. Nevertheless, in order to lead
ACL2 to proofs about industrial-scale designs the user must be tenacious and must
apply talent in mathematics, programming, and pattern recognition.
Acknowledgements We wish to thank the entire ACL2 community for helping push this work
along. We especially thank Warren Hunt for his recognition that Nqthm was particularly well suited
to modeling and verifying microprocessors and his decades of leadership in this area.
The preparation of this chapter was funded in part by NSF grants IIS-0417413 and EIA-
0303609 and DARPA/NSF CyberTrust grant CNS-0429591. Kaufmann also thanks the Texas –
United Kingdom Collaborative for travel support to Cambridge, England and the Computer Labo-
ratory at the University of Cambridge for hosting him during preparation of this paper.
References
1. Bevier W, Hunt WA Jr, Moore JS, Young W (1989) Special issue on system verification. J Au-
tom Reason 5(4):409–530
2. Boyer RS, Moore JS (1979) A computational logic. Academic, New York
3. Boyer RS, Moore JS (1981) Metafunctions: proving them correct and using them efficiently
as new proof procedures. In: Boyer RS, Moore JS (eds) The correctness problem in computer
science. Academic, London
4. Boyer RS, Moore JS (1997) A computational logic handbook, 2nd edn. Academic, New York
5. Boyer RS, Moore JS (2002) Single-threaded objects in ACL2. In: PADL 2002, LNCS 2257.
Springer, Heidelberg, pp 9–27. http://www.cs.utexas.edu/users/moore/publications/stobj/main.
ps.gz
6. Boyer RS, Yu Y (1996) Automated proofs of object code for a widely used microprocessor.
J ACM 43(1):166–192
7. Brock B, Hunt WA Jr (1999) Formal analysis of the motorola CAP DSP. In: Hinchey M, Bowen
J (eds) Industrial-strength formal methods. Springer, Heidelberg
8. Flatau AD (1992) A verified implementation of an applicative language with dynamic storage
allocation. PhD thesis, University of Texas at Austin
9. Flatau A, Kaufmann M, Reed D, Russinoff D, Smith E, Sumners R (2002) Formal verification
of microprocessors at AMD. In: Proceedings of designing correct circuits 2002. http://www.
cs.chalmers.se/~ms/DCC02/Slides.html
10. Goerigk W, Hoffmann U (1998) Rigorous compiler implementation correctness: how to prove
the real thing correct. In: Proceedings FM-TRENDS’98 international workshop on current
trends in applied formal methods, Boppard, LNCS
11. Goodstein RL (1964) Recursive number theory. North-Holland, Amsterdam
12. Greve D, Wilding M (2002) Evaluatable, high-assurance microprocessors. In: NSA high-
confidence systems and software conference (HCSS), Linthicum, MD. http://hokiepokie.org/
docs/hcss02/proceedings.pdf
13. Greve D, Wilding M, Hardin D (2000) High-speed, analyzable simulators. In: Kaufmann M,
Manolios P, Moore JS (eds) Computer-aided reasoning: ACL2 case studies. Kluwer, Boston,
MA, pp 113–136
14. Greve D, Wilding M, Vanfleet WM (2003) A separation kernel formal security policy. In: ACL2
workshop 2003, Boulder, CO. http://www.cs.utexas.edu/users/moore/acl2/workshop-2003/
15. Greve D, Richards R, Wilding M (2004) A summary of intrinsic partitioning verifica-
tion. In: ACL2 workshop 2004, Austin, TX. http://www.cs.utexas.edu/users/moore/acl2/
workshop-2003/
16. Greve D, Wilding M, Richards R, Vanfleet M (2005) Formalizing security policies for dy-
namic and distributed systems. In: Proceedings of systems and software technology conference
(SSTC) 2005, Salt Lake City, UT. http://hokiepokie.org/docs/sstc05.pdf
17. Greve D, Kaufmann M, Manolios P, Moore JS, Ray S, Ruiz-Reina JL, Sumners R, Vroon D,
Wilding M (2008) Efficient execution in an automated reasoning environment. J Funct Program
18(01):15–46
18. Hardin D, Wilding M, Greve D (1998) Transforming the theorem prover into a digital design
tool: from concept car to off-road vehicle. In: Hu AJ, Vardi MY (eds) Computer-aided verifica-
tion – CAV '98, Lecture notes in computer science, vol 1427. Springer, Heidelberg. See
http://pobox.com/users/hokie/docs/concept.ps
19. Hardin DS, Smith EW, Young WD (2006) A robust machine code proof framework for highly
secure applications. In: ACL2 ’06: proceedings of the sixth international workshop on the
ACL2 theorem prover and its applications. ACM, New York, NY, pp 11–20. DOI
http://doi.acm.org/10.1145/1217975.1217978
20. Hunt WA Jr (1994) FM8501: a verified microprocessor. LNAI 795. Springer, Heidelberg
21. Hunt WA Jr (2000) The DE language. In: Kaufmann M, Manolios P, Moore JS (eds) Computer-
aided reasoning: ACL2 case studies. Kluwer, Boston, MA, pp 151–166
22. Hunt WA Jr, Brock B (1992) A formal HDL and its use in the FM9001 verification. Philosoph-
ical Transactions of the Royal Society: Physical and Engineering Sciences, 339(1652):35–47
23. Hunt WA Jr, Kaufmann M, Krug RB, Moore JS, Smith EW (2005) Meta reasoning in ACL2.
In: Hurd J, Melham T (eds) 18th international conference on theorem proving in higher order
logics: TPHOLs 2005, Lecture notes in computer science, vol 3603. Springer, Heidelberg,
pp 163–178
24. Kaufmann M (2008) Aspects of ACL2 User interaction (Invited talk, 8th international work-
shop on user interfaces for theorem provers (UITP 2008), Montreal, Canada, August, 2008).
See www.ags.uni-sb.de/omega/workshops/UITP08/kaufmann-UITP08/talk.html
25. Kaufmann M (2009) Abbreviated output for input in ACL2: an implementation case study.
In: Proceedings of ACL2 workshop 2009. http://www.cs.utexas.edu/users/sandip/acl2-09
26. Kaufmann M, Moore JS (1997) A precise description of the ACL2 logic. Technical report,
Department of Computer Sciences, University of Texas at Austin. http://www.cs.utexas.edu/
users/moore/publications/km97a.ps.gz
27. Kaufmann M, Moore JS (2001) Structured theory development for a mechanized logic.
J Autom Reason 26(2):161–203
28. Kaufmann M, Moore JS (2008) An ACL2 tutorial. In: Proceedings of theorem proving in
higher order logics, 21st international conference, TPHOLs 2008. Springer, Heidelberg. See
http://dx.doi.org/10.1007/978-3-540-71067-7_4
29. Kaufmann M, Moore JS (2008) Proof search debugging tools in ACL2. In: A Festschrift in
honour of Prof. Michael J. C. Gordon FRS. Royal Society, London
30. Kaufmann M, Moore JS (2009) The ACL2 home page. http://www.cs.utexas.edu/users/moore/
acl2/
31. Kaufmann M, Manolios P, Moore JS (eds) (2000a) Computer-aided reasoning: ACL2 case
studies. Kluwer, Boston, MA
32. Kaufmann M, Manolios P, Moore JS (2000b) Computer-aided reasoning: an approach. Kluwer,
Boston, MA
33. Kaufmann M, Moore JS, Ray S, Reeber E (2009) Integrating external deduction tools with
ACL2. J Appl Logic 7(1):3–25
34. Kaufmann M, Moore JS, Ray S (in press) Foundations of automated induction for a structured
mechanized logic
35. Liu H (2006) Formal specification and verification of a jvm and its bytecode verifier. PhD
thesis, University of Texas at Austin
36. Manolios P (2000) Correctness of pipelined machines. In: Formal methods in computer-aided
design, FMCAD 2000, LNCS 1954. Springer, Heidelberg, pp 161–178
37. Manolios P, Vroon D (2003) Ordinal arithmetic in ACL2. In: ACL2 workshop 2003, Boulder,
CO. http://www.cs.utexas.edu/users/moore/acl2/workshop-2003/
38. Manolios P, Namjoshi K, Sumners R (1999) Linking theorem proving and model-checking
with well-founded bisimulation. In: Computed aided verification, CAV ’99, LNCS 1633.
Springer, Heidelberg, pp 369–379
39. Matthews J, Moore JS, Ray S, Vroon D (2006) Verification condition generation via theorem
proving. In: Proceedings of 13th international conference on logic for programming, artificial
intelligence, and reasoning (LPAR 2006), vol LNCS 4246, pp 362–376
40. Moore JS (1996) Piton: a mechanically verified assembly-level language. Automated reasoning
series. Kluwer, Boston, MA
41. Moore JS (2008) Mechanized operational semantics: lectures and supplementary material.
In: Marktoberdorf summer school 2008: engineering methods and tools for software safety and
security. http://www.cs.utexas.edu/users/moore/publications/talks/marktoberdorf-08/index.
html
42. Moore JS, Lynch T, Kaufmann M (1998) A mechanically checked proof of the correctness
of the kernel of the AMD5K86 floating point division algorithm. IEEE Trans Comput 47(9):
913–926
43. Ray S, Hunt WA Jr (2004) Deductive verification of pipelined machines using first-order quan-
tification. In: Proceedings of the 16th international conference on computer-aided verification
(CAV 2004), vol LNCS 3117. Springer, Heidelberg, pp 31–43
44. Ray S, Moore JS (2004) Proof styles in operational semantics. In: Hu AJ, Martin AK (eds)
Formal methods in computer-aided design (FMCAD-2004), Lecture notes in computer science,
vol 3312. Springer, Heidelberg, pp 67–81
45. Ray S, Hunt WA Jr, Matthews J, Moore JS (2008) A mechanical analysis of program verifica-
tion strategies. J Autom Reason 40(4):245–269
46. Reeber E, Hunt WA Jr (2005) Formalization of the DE2 language. In: Correct hardware design
and verification methods (CHARME 2005), vol LNCS 3725. Springer, Heidelberg, pp 20–34
47. Reeber E, Hunt WA Jr (2006) A SAT-based decision procedure for the subclass of unrollable
list functions in ACL2 (SULFA). In: Proceedings of 3rd international joint conference on au-
tomated reasoning (IJCAR 2006). Springer, Heidelberg, pp 453–467
48. Russinoff DM, Flatau A (2000) RTL verification: a floating-point multiplier. In: Kaufmann M,
Manolios P, Moore JS (eds) Computer-aided reasoning: ACL2 case studies. Kluwer, Boston,
MA, pp 201–232
49. Russinoff D, Kaufmann M, Smith E, Sumners R (2005) Formal verification of floating-point
RTL at AMD using the ACL2 theorem prover. In: IMACS’2005 world congress
50. Sawada J (2000) Verification of a simple pipelined machine model. In: Kaufmann M, Mano-
lios P, Moore JS (eds) Computer-aided reasoning: ACL2 case studies. Kluwer, Boston, MA,
pp 137–150
51. Sawada J (2002) Formal verification of divide and square root algorithms using series calcula-
tion. In: Proceedings of the ACL2 workshop, 2002, Grenoble. http://www.cs.utexas.edu/users/
moore/acl2/workshop-2002
52. Sawada J, Hunt WA Jr (1998) Processor verification with precise exceptions and specula-
tive execution. In: Computer aided verification, CAV ’98, LNCS 1427. Springer, Heidelberg,
pp 135–146
53. Sawada J, Hunt WA Jr (2002) Verification of FM9801: an out-of-order microprocessor model
with speculative execution, exceptions, and program modification capability. Formal Methods
Syst Des 20(2):187–222
54. Shankar N (1994) Metamathematics, machines, and Gödel's proof. Cambridge University
Press, Cambridge
55. Shoenfield JR (1967) Mathematical logic. Addison-Wesley, Reading, MA
56. Steele GL Jr (1990) Common lisp the language, 2nd edn. Digital Press, Burlington, MA
57. Wilding M (1993) A mechanically verified application for a mechanically verified environ-
ment. In: Courcoubetis C (ed) Computer-aided verification – CAV ’93, Lecture Notes in
Computer Science, vol 697. Springer, Heidelberg. See ftp://ftp.cs.utexas.edu/pub/boyer/nqthm/
wilding-cav93.ps
58. Young WD (1988) A verified code generator for a subset of Gypsy. Technical report 33. Com-
putational Logic Inc., Austin, TX
A Mechanically Verified Commercial SRT
Divider
David M. Russinoff
1 Introduction
the corresponding values Q and R (quotient and remainder) of the data outputs.
Under suitable input constraints, the following relations must be satisfied:
1. Y = QX + R
2. |R| < |X|
3. Either R = 0 or R and X have the same sign.
Regrettably (from a verification perspective), the simplicity of this behavioral
specification is not reflected in the design. In contrast to Taylor’s circuit, which uses
only five state-holding registers, the divider of the Llano processor uses 56. In order
to address this complexity, the proof is divided into four parts, which model the
design at successively lower levels of abstraction.
At the highest level, as discussed in Sect. 2, we establish the essential properties
of the underlying SRT algorithm. Our description of the algorithm is based on an
unspecified radix, 2^r. In the case of interest, we have r = 2, which means that
two quotient bits are generated per cycle. The main result of this section pertains
to the iterative phase of the computation, which generates the sequences of partial
remainders p_0, ..., p_n, quotient digits m_1, ..., m_n, and resulting partial quotients
Q_0, ..., Q_n. We also address several relevant issues that are ignored in the proofs
cited above: (1) prescaling of the divisor and dividend and postscaling of the
remainder; (2) determination of the required number n of iterations, which depends
on the relative magnitudes of the operands; (3) incremental ("on-the-fly") computation
of the quotient, which involves the integration of positive and negative quotient
digits; and (4) derivation of the final remainder and quotient R and Q, as specified
above, from the results R′ and Q′ of the iteration, which are characterized by
Y = Q′X + R′ and |R′| ≤ |X|.
In the radix-4 case, the quotient digits are confined to the range −3 ≤ m_k ≤ 3.
Each m_k is read from a table of 4 × 32 = 128 entries according to indices derived
from the normalized divisor d and the previous partial remainder p_{k−1} and is used to
compute the next partial remainder by the recurrence formula p_k = 4p_{k−1} − m_k·d.
At the second level of abstraction, in Sect. 3, we present the actual table used in
our implementation, which was adapted from the IBM z990 [4], and prove that it
preserves the invariant |p_k| ≤ |d|.
At the third level, the algorithm is implemented in XFL, a simple formal language
developed at AMD for the specification of the AMD64 instruction set architecture.
XFL is based on unbounded integer and arbitrary precision rational data types and
combines the basic constructs of C with the logical bit vector operations of Verilog
in which AMD RTL designs are coded. The XFL encodings of the lookup table
and the divider are displayed in Appendices 1 and 2. Like most XFL programs, this
code was automatically generated from a hand-coded C++ program, which has been
subjected to testing for the purpose of validating the model.
The XFL model is significantly smaller than the RTL, which consists of some
150 kilobytes of Verilog code, but it is designed to perform the same sequence
of register-transfer-level operations while avoiding low-level implementation con-
cerns. Thus, much of the complexity of the design is captured at this third level,
including several essential features that are absent from higher-level models such as
Taylor’s circuit specification:
1. A hardware implementation of Taylor’s model, which computes an explicit
representation of the partial remainder on each iteration, would require a
time-consuming full-width carry-propagate adder, resulting in a prohibitively
long cycle time. In contrast, a typical contemporary commercial implementation
such as the Llano divider stores the remainder in a redundant form, which may
be computed by a much faster carry-save adder. A single full-width addition is
then performed at the end of the iterative phase.
2. The derivation of the final results R and Q from the intermediate values R′ and
Q′ involves consideration of the special cases R′ = 0 and R′ = ±X. Timing
considerations dictate that these conditions be detected in advance of the
full-width addition that produces R′. This requires special logic for predicting
cancellation.
3. The module is also responsible for detecting overflow, i.e., a quotient that is too
large to be represented in the target format. This involves an analysis that is
performed concurrently with the final computation of the quotient.
Each of these complications introduces a possible source of design error that cannot
be ignored. In Sect. 4, we present a complete proof of the claim that the algorithm
is correctly implemented by the XFL model.
The lowest level of abstraction to be considered is that of the RTL itself. The
proof of equivalence between the RTL and XFL models represents a significant
portion of the overall effort, involving the analysis of a complex state machine,
innumerable timing and scheduling issues, and various other implementation con-
cerns. However, this part of the proof would be of relatively little interest to a
general readership; moreover, neither space nor proprietary confidentiality allows
its inclusion here.
Thus, the purpose of this paper is an exposition of the proof of correctness of
the Llano divider as represented by the XFL model. The presentation is confined to
standard mathematical notation, avoiding any obscure special-purpose formalism,
but assumes familiarity with the general theory of bit vectors and logical oper-
ations, making implicit use of the results found in [10]. Otherwise, the proof is
self-contained and surveyable, with one exception: Lemma 6, which provides a set
of inequalities that are satisfied by the entries of the lookup table, involves machine-
checked computation that is too extensive to be carried out by hand.
We emphasize, however, that a comprehensive statement of correctness of the
RTL module itself has been formalized in the logic of ACL2 and its proof has
been thoroughly checked with the ACL2 prover. This includes a formalization of
the proof presented here, along with a detailed proof of equivalence between the
XFL and RTL models. For this purpose, the XFL model was recoded directly in
ACL2 and the RTL module was translated to ACL2 by a tool that was developed
for this purpose [11]. Thus, the validity of the proof depends only on the semantic
correctness of the Verilog–ACL2 translator and the soundness of the ACL2 prover,
both of which have been widely tested.
2 SRT Division
p_k = 2^r·p_{k−1} − m_k·d,
[Fig. 1 shows the SRT quotient-digit selection table plotted over the region 1 ≤ d < 2, −2 < p < 2 of the dp-plane: columns are indexed by the leading fraction bits of d (00, 01, 10, 11), rows by the leading five bits of the two's complement representation of p, entries are digits 0–3 or are left undefined ("—") outside the region bounded by p = ±d, and the selection boundaries p = ±d, p = ±(3/4)d, p = ±(1/2)d, and p = ±(1/4)d are drawn across the table.]
Fig. 1 SRT table
where the multiplier m_k contributes to the accumulated quotient. The invariant |p_k| ≤ |d|
is guaranteed by selecting m_k from the interval

(2^r·p_{k−1})/d − 1 ≤ m_k ≤ (2^r·p_{k−1})/d + 1.
For further motivation, refer to [6].
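As a concrete illustration (the numbers are ours), take r = 2, d = 3/2, and p_{k−1} = 6/5. Then (2^r·p_{k−1})/d = 4·(6/5)/(3/2) = 3.2, so the interval above admits the integers in [2.2, 4.2]; within the radix-4 digit set the only choice is m_k = 3, and indeed

p_k = 4·(6/5) − 3·(3/2) = 24/5 − 9/2 = 3/10,

which satisfies |p_k| ≤ |d| = 3/2.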
p_k = 2^r·p_{k−1} − m_k·d

and

Q_k = 2^r·Q_{k−1} + m_k,

where m_k is an integer such that if |p_{k−1}| ≤ |d|, then |p_k| ≤ |d|. Let R = 2^expo(X)·p_n
and Q = Q_n. Then Y = QX + R and |R| ≤ |X|.

Proof. It follows by induction that |p_k| ≤ |d| for all k ≤ n. We shall also show by induction
that

p_k = 2^(rk)·p_0 − Q_k·d.

The claim clearly holds for k = 0, and for 0 < k ≤ n,

p_k = 2^r·p_{k−1} − m_k·d
    = 2^r·(2^(r(k−1))·p_0 − Q_{k−1}·d) − m_k·d
    = 2^(rk)·p_0 − (2^r·Q_{k−1} + m_k)·d
    = 2^(rk)·p_0 − Q_k·d.

In particular,

p_n = 2^(rn)·p_0 − Q_n·d

and

Y = 2^expo(X)·2^(rn)·p_0 = 2^expo(X)·(Q_n·d + p_n) = QX + R,

where |R| = |2^expo(X)·p_n| ≤ |2^expo(X)·d| = |X|. □
A quotient and remainder that satisfy the conclusion of Lemma 1 may be easily
adjusted to satisfy the specification stated in Sect. 1.
and

E_k^+ = (Q_k + 1)[N−1:0].

Proof. The claim holds trivially for k = 0. In the inductive step, there are seven
equations to consider. For example, if m_k < −1, then

E_k^+ = (4·E_{k−1}^− + m_k + 5)[N−1:0]
     = (4·(Q_{k−1} − 1)[N−1:0] + m_k + 5)[N−1:0]
     = (4·(Q_{k−1} − 1) + m_k + 5)[N−1:0]
     = (4·Q_{k−1} + m_k + 1)[N−1:0]
     = (Q_k + 1)[N−1:0].
Proof. If B ≥ 0, then

2n = 2(⌊B/2⌋ + 1) ≥ 2((B − 1)/2 + 1) = B + 1,

so that 2n + expo(X) ≥ B + 1 + expo(X) ≥ expo(Y) + 1 and
In this section, we define a process for computing the quotient bits m_k of Lemma 1
and prove that the invariant |p_k| ≤ |d| is preserved. The problem may be formulated
as follows:
Given rational numbers d and p such that 1 ≤ |d| < 2 and |p| ≤ |d|, find an integer m
such that −3 ≤ m ≤ 3 and |4p − dm| ≤ |d|.
We may restrict our attention to the case d > 0, since the inequalities in the above
objective are unaffected by reversing the signs of both d and m. Thus, we have
1 ≤ d < 2 and −2 < p < 2. These constraints determine a rectangle in the dp-plane as
displayed in Fig. 1, which is adapted from [4]. The rectangle is partitioned into an
array of rectangles of width 1/4 and height 1/8. The columns and rows of the array are
numbered with indices i and j, respectively, where 0 ≤ i < 4 and 0 ≤ j < 32.
Let R_ij denote the rectangle in column i and row j, and let (δ_i, π_j) be its lower left
vertex. Thus,

R_ij = {(d, p) | δ_i ≤ d < δ_i + 1/4 and π_j ≤ p < π_j + 1/8}.
The numbering scheme is designed so that if (d, p) ∈ R_ij, then i comprises the
leading two bits of the fractional part of d and j comprises the leading 5 bits of the
two's complement representation of p.
The contents of the rectangles of Fig. 1 represent a function m = m(i, j). All that
is guaranteed of p by the index j, however, is that

π_j ≤ p < π_j + 1/4.

Thus, in geometric terms, we may assume that (d, p) is known to lie within the
square S_ij formed as the union of the rectangle R_ij and the rectangle directly above it:

S_ij = {(d, p) | δ_i ≤ d < δ_i + 1/4 and π_j ≤ p < π_j + 1/4}.
We would like to show that if (d, p) ∈ S_ij and m = m(i, j), then |4p − dm| ≤ d,
or, equivalently,

(m − 1)/4 ≤ p/d ≤ (m + 1)/4.

We first present an informal argument, which will then be formalized and proved
analytically.
The definition of σ is driven by the following observations:
1. Since |p| ≤ d, (d, p) lies between the lines p = d and p = −d. Therefore, if S_ij lies entirely above the line p = d or entirely below the line p = −d, then m is inconsequential and left undefined. In all other cases, m is defined.
2. Since p ≤ d, the upper bound
p/d ≤ (m + 1)/4
is satisfied trivially if m = 3. In order to guarantee that this bound holds generally, it suffices to ensure that if m ≠ 3, then S_ij lies below the line p = (m + 1)d/4.
3. Since p ≥ −d, the lower bound
p/d ≥ (m − 1)/4
is satisfied trivially if m = −3. In order to guarantee that this bound holds generally, it suffices to ensure that if m ≠ −3, then S_ij lies above the line p = (m − 1)d/4.
It is easily verified by inspection of Fig. 1 that in all cases in which m is defined, the conditions specified by (2) and (3) are satisfied and, consequently, the desired inequality holds. It should also be noted that in some cases, there is a choice between two acceptable values of m: if S_ij lies within the region bounded by p = (m/4)d and p = ((m + 1)/4)d, where −3 ≤ m ≤ 2, then the inequality is satisfied by both m and m + 1.
These observations are expressed analytically as follows:
1. The condition that S_ij lies entirely above the line p = d is determined by the location of its lower right vertex, (δ_i + 1/4, π_j), and is expressed by the inequality
π_j ≥ δ_i + 1/4.
The condition that S_ij lies entirely below the line p = −d is similarly determined by the location of its upper right vertex, (δ_i + 1/4, π_j + 1/4), and is expressed by the inequality
π_j ≤ −(δ_i + 1/4) − 1/4 = −δ_i − 1/2.
Thus, m = σ(i, j) is defined if and only if neither of these inequalities holds, i.e.,
−δ_i − 1/2 < π_j < δ_i + 1/4.
2. The maximum value of the quotient p/d in S_ij occurs at either the upper left or the upper right vertex, depending on the sign of their common p-coordinate, π_j + 1/4. Thus, S_ij lies below the line p = (m + 1)d/4 if and only if both vertices lie on or below the line, i.e.,
(π_j + 1/4)/δ_i = (4π_j + 1)/(4δ_i) ≤ (m + 1)/4
and
(π_j + 1/4)/(δ_i + 1/4) = (4π_j + 1)/(4δ_i + 1) ≤ (m + 1)/4.
3. The minimum value of the quotient p/d in S_ij occurs at either the lower left or the lower right vertex, depending on the sign of π_j. Thus, S_ij lies above the line p = (m − 1)d/4 if and only if both vertices lie on or above the line, i.e.,
π_j/δ_i ≥ (m − 1)/4
and
π_j/(δ_i + 1/4) = 4π_j/(4δ_i + 1) ≥ (m − 1)/4.
We shall also require analytical expressions for δ_i and π_j as functions of i and j. The definition of δ_i is trivial.
Definition 1. For each integer i such that 0 ≤ i < 4,
δ_i = 1 + i/4.
Since j is the five-bit two's complement representation of the signed integer 8π_j, we have the following definition, in which the function SgndIntVal(w, x) computes the value represented by a bit vector x with respect to a signed integer format of width w:
Definition 2. For each integer j such that 0 ≤ j < 32,
π_j = SgndIntVal(5, j)/8 = j/8 if j < 16, and (j − 32)/8 if j ≥ 16.
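Definition 2 is ordinary two's complement decoding scaled by 1/8. A direct C transcription of SgndIntVal (the helper name sgnd_int_val is mine) is:

#include <assert.h>

/* SgndIntVal(w, x): the w-bit vector x read as a two's complement integer. */
static long sgnd_int_val(unsigned w, unsigned long x) {
  x &= (1ul << w) - 1;                          /* keep the low w bits       */
  return (x < (1ul << (w - 1))) ? (long)x : (long)x - (long)(1ul << w);
}

int main(void) {
  assert(sgnd_int_val(5, 8)  ==   8);           /* pi_8  =   8/8 =  1        */
  assert(sgnd_int_val(5, 16) == -16);           /* pi_16 = -16/8 = -2        */
  assert(sgnd_int_val(5, 31) ==  -1);           /* pi_31 =  -1/8             */
  return 0;
}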
The formal statement of correctness of σ appears below as Lemma 7. The constraints on σ that were derived above are required in the proof. These are summarized in Lemma 6, which is proved by straightforward exhaustive computation.
Lemma 6. Let i and j be integers, 0 ≤ i < 4 and 0 ≤ j < 32. Assume that −δ_i − 1/2 < π_j < δ_i + 1/4 and let m = σ(i, j).
(a) If m ≠ 3, then max{(4π_j + 1)/(4δ_i), (4π_j + 1)/(4δ_i + 1)} ≤ (m + 1)/4;
(b) If m ≠ −3, then min{π_j/δ_i, 4π_j/(4δ_i + 1)} ≥ (m − 1)/4.
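The "straightforward exhaustive computation" can be reproduced in a few lines of C. The sketch below checks Lemma 6 for the table columns i = 2 and i = 3 only, since those are the columns whose entries appear in the SRTLookup switch of Appendix 2; columns 0 and 1 are omitted because their entries are not visible here. All quantities are scaled to integers (P = 8π_j, D = 4δ_i) so that the inequalities of the lemma become exact integer comparisons; the table encoding and helper names are mine.

/* Exhaustive check of Lemma 6 for columns i = 2, 3 of the digit table.     */
#include <assert.h>
#include <stdio.h>

#define UNDEF 9                      /* rows where the table leaves m undefined */

static int sigma(int i, int j) {     /* entries copied from Appendix 2          */
  static const int col2[32] = {
    /* 0x00..0x0F */ 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, UNDEF, UNDEF,
    /* 0x10..0x1F */ UNDEF, -3, -3, -3, -3, -3, -3, -2, -2, -2, -2, -1, -1, -1, 0, 0 };
  static const int col3[32] = {
    /* 0x00..0x0F */ 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    /* 0x10..0x1F */ -3, -3, -3, -3, -3, -3, -3, -2, -2, -2, -2, -1, -1, -1, 0, 0 };
  return (i == 2) ? col2[j] : col3[j];
}

int main(void) {
  for (int i = 2; i <= 3; i++) {
    int D = 4 + i;                              /* 4*delta_i                 */
    for (int j = 0; j < 32; j++) {
      int P = (j < 16) ? j : j - 32;            /* 8*pi_j = SgndIntVal(5, j) */
      int m = sigma(i, j);
      if (m == UNDEF || !(-2*D - 4 < P && P < 2*D + 2))
        continue;                               /* hypothesis of Lemma 6     */
      if (m != 3) {                             /* part (a)                  */
        assert(2*(P + 2) <= (m + 1)*D);
        assert(2*(P + 2) <= (m + 1)*(D + 1));
      }
      if (m != -3) {                            /* part (b)                  */
        assert(2*P >= (m - 1)*D);
        assert(2*P >= (m - 1)*(D + 1));
      }
    }
  }
  printf("Lemma 6 holds for columns 2 and 3\n");
  return 0;
}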
Lemma 7. Let d and p be rational numbers, 1 ≤ d < 2 and |p| ≤ d. Let i and j be integers, 0 ≤ i < 4 and 0 ≤ j < 32, such that δ_i ≤ d < δ_i + 1/4 and π_j ≤ p < π_j + 1/4. Let m = σ(i, j). Then |4p − dm| ≤ d.
Proof. Since
π_j ≤ p ≤ d < δ_i + 1/4
and
π_j > p − 1/4 ≥ −d − 1/4 > −(δ_i + 1/4) − 1/4 = −δ_i − 1/2,
we may apply Lemma 6.
We must show that −d ≤ 4p − dm ≤ d, i.e.,
(m − 1)/4 ≤ p/d ≤ (m + 1)/4.
For the upper bound, if m = 3, then
p/d ≤ 1 = (3 + 1)/4.
If m ≠ 3 and π_j + 1/4 ≥ 0, then by Lemma 6(a),
p/d < (π_j + 1/4)/d ≤ (π_j + 1/4)/δ_i = (4π_j + 1)/(4δ_i) ≤ (m + 1)/4,
and if π_j + 1/4 < 0, then
p/d < (π_j + 1/4)/d < (π_j + 1/4)/(δ_i + 1/4) = (4π_j + 1)/(4δ_i + 1) ≤ (m + 1)/4.
For the lower bound, if m = −3, then
p/d ≥ −1 = (−3 − 1)/4.
If m ≠ −3 and π_j ≥ 0, then by Lemma 6(b),
p/d ≥ π_j/d ≥ π_j/(δ_i + 1/4) = 4π_j/(4δ_i + 1) ≥ (m − 1)/4,
and if π_j < 0, then
p/d ≥ π_j/d ≥ π_j/δ_i ≥ (m − 1)/4. □
4 Implementation
The results of this section refer to the values assumed by variables during a
hypothetical execution of the XFL function SRT, defined in Appendix 2. With the
exception of the loop variables b and k, each variable of SRT belongs to one of two classes:
– Some variables assume at most one value during an execution. The value of such a variable will be denoted by the name of the variable in italics, e.g., X, dEnc, and YNB.
– Variables that are assigned inside the main for loop may assume only one value during each iteration and may or may not be assigned an initial value before the loop is entered. The value assigned to such a variable during the kth iteration will be denoted with the subscript k, e.g., p_k, mAbs_k, and addA_k. If such a variable is assigned an initial value outside of the loop, it will be denoted with the subscript 0, e.g., p_0 and QPart_0. When convenient, the subscript may be omitted and understood to have the value k. When replaced with a prime (′), it will be understood to have the value k − 1. For example, in the statement of Lemma 14, m and p′ represent m_k and p_{k−1}, respectively.
SRT has four input parameters:
– isSigned is a boolean indication of a signed or unsigned integer format;
– w is the format width, which is assumed to be 8, 16, 32, or 64;
– XEnc is the signed or unsigned w-bit encoding of the divisor;
– YEnc is the signed or unsigned 2w-bit encoding of the dividend.
Three values are returned:
– a boolean indication of whether the computation completed successfully;
– the signed or unsigned w-bit encoding of the quotient;
– the signed or unsigned w-bit encoding of the remainder.
The last two values are of interest only when the first is true, in which case they are the values of the variables QOut and ROut, respectively.
Some of the variables of SRT do not contribute to the outputs, but are used only in our analysis and in embedded assertions. Of these (listed in a preamble to the function), X and Y are the integer values represented by XEnc and YEnc, and Q and R are the quotient and remainder, which, unless X = 0, satisfy Y = QX + R, |R| < |X|, and either R = 0 or sgn(R) = sgn(Y).
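For reference, this specification (Y = QX + R, |R| < |X|, and R = 0 or sgn(R) = sgn(Y)) coincides with the truncating division required of C99's / and % operators. The following illustrative sketch, not taken from the chapter, states it as executable assertions:

#include <assert.h>
#include <stdlib.h>

/* For X != 0, Q and R are the unique integers with Y = Q*X + R, |R| < |X|,
 * and R = 0 or sgn(R) = sgn(Y); C99 truncating division computes them.     */
static void check_spec(long long Y, long long X) {
  long long Q = Y / X, R = Y % X;
  assert(Y == Q * X + R);
  assert(llabs(R) < llabs(X));
  assert(R == 0 || (R > 0) == (Y > 0));
}

int main(void) {
  for (long long Y = -50; Y <= 50; Y++)
    for (long long X = -50; X <= 50; X++)
      if (X != 0) check_spec(Y, X);
  return 0;
}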
Our objective is to show that success is indicated if and only if X ≠ 0 and Q is representable with respect to the indicated format, in which case Q and R are the integer values of QOut and ROut. Since this obviously holds when X = 0, we shall assume X ≠ 0 in the following. The main result is the theorem at the end of this
section.
The computation is naturally partitioned into three phases, which are described
in the following three subsections.
In the first phase, the operands are analyzed and normalized in preparation for the
iterative computation of the quotient and remainder, and the number n of iterations
is established.
The variable XNB represents the "number of bits" of X, derived by counting the leading zeroes or ones; as asserted in the function, XNB = expo(X) + 1, i.e., 2^{XNB−1} ≤ |X| < 2^{XNB}.
Proof. If X > 0, then X = XEnc and XNB − 1 is the index of the leading 1 of X, which implies 2^{XNB−1} ≤ X < 2^{XNB}, and the claim follows.
If XNegPower2 = 1, then XEnc[b] = 1 if and only if w > b ≥ XNB − 1. It follows that XEnc = 2^w − 2^{XNB−1} and X = XEnc − 2^w = −2^{XNB−1}, so that |X| = 2^{XNB−1}.
In the remaining case, X < 0, XNB − 1 is the index of the leading 0 of XEnc, and XEnc[XNB − 2:0] ≠ 0. It follows that
2^w − 2^{XNB} < XEnc < 2^w − 2^{XNB−1},
which implies −2^{XNB} < X < −2^{XNB−1}, i.e., 2^{XNB−1} < |X| < 2^{XNB}. □
X = 2^{expo(X)} d = 2^{XNB−1} d.
Clearly, 1 ≤ |d| < 2.
Lemma 10 gives an expression for i, the first argument of the table access function σ:
Lemma 10. i = ⌊4(|d| − 1)⌋.
Proof. If X > 0, then since 4 ≤ 4d < 8,
i = 0 = ⌊4(|d| − 1)⌋.
⌊4(|d| − 1)⌋ = ⌊4(2^{−63} dEnc[64:0] − 1)⌋ = ⌊4(dEnc[64:63] + 2^{−63} dEnc[62:0] − 1)⌋.
YNB is the "number of bits" of Y, including, in the negative case, the final trailing sign bit.
Proof. If Y > 0, then YNB − 1 is the index of the leading 1 of YEnc = Y, i.e., expo(Y) = YNB − 1.
If Y = −1, then YNB = 1 and
2^{YNB−2} = 1/2 < |Y| = 1 = 2^{YNB−1}.
In the remaining case, Y < −1, YNB − 2 is the index of the leading 0 of YEnc, which implies
2^{2w} − 2^{YNB−1} ≤ YEnc < 2^{2w} − 2^{YNB−2}.
But since Y = YEnc − 2^{2w},
−2^{YNB−1} ≤ Y < −2^{YNB−2},
and
2^{YNB−2} < |Y| ≤ 2^{YNB−1}. □
The number of iterations, n, satisfies the requirement of Lemma 1.
Lemma 13.
(a) If n = 0, then
pEncHi_0 = (2^{64−YNB} Y)[67:0] if YNB[0] = XNB[0], and
pEncHi_0 = (2^{65−YNB} Y)[67:0] if YNB[0] ≠ XNB[0].
Proof. First consider the case YNB[0] = XNB[0]. We may assume YNB > 0; otherwise, Y = 0 and the lemma is trivial.
Therefore,
Thus,
pEnc = pEnc[131:0] = (2^{128−YNB} Y)[131:0].
If n = 0, then YNB < XNB ≤ 64 and
and
p_0 = 2^{−expo(X)−2n} Y = 2^{−YNB−1} Y.
Thus, 2^{129} p_0 = 2^{128−YNB} Y is an integer and
The proof for the case YNB[0] ≠ XNB[0] is similar, with every occurrence of 127 or 128 replaced by 128 or 129. Thus, we have
4.2 Iteration
The second phase is the iteration loop in which the quotient digits are selected
and the partial remainder and quotient are updated accordingly. The main re-
sults pertaining to the iterative computation of the partial remainder are given by
Lemmas 14 and 16:
1. The quotient digit m is correctly computed as the value of ±σ(i, j), as stated in Lemma 14.
2. The partial remainder p_k = 4p_{k−1} − m_k d is encoded by pEncHi, carryHi, and pEncLo, as stated in Lemma 16.
The proof of (2) depends on (1), and that of (1) requires the assumption that (2) holds on the preceding iteration.
Lemma 14. Let 0 < k ≤ n. Suppose that |p′| ≤ |d| < 2, 2^{129} p′ is an integer, and
Then
(a) m = σ(i, j) if X ≥ 0, and m = −σ(i, j) if X < 0;
(b) π_j ≤ p′ < π_j + 1/4.
Proof. First suppose pEncHi′ + carryHi′ ≥ 2^{68}. Then pTop = 63; otherwise,
and therefore,
π_j = −1/8 < 0 ≤ p′ < 1/8 = π_j + 1/4.
We may assume, therefore, that pEncHi′ + carryHi′ < 2^{68} and hence
(1/8) pTop ≤ p′ < (1/8) pTop + 1/4,
and mSign = XSign, which implies (a). To prove (b), we need only observe that
Now suppose p′ < 0. Then 2^{129} p′ = (2^{129} p′)[131:0] − 2^{132} and the above estimate yields
(1/8)(pTop − 64) ≤ p′ < (1/8)(pTop − 64) + 1/4.
Thus, pTop > 8p′ + 62 > −16 + 62 = 46, so pTop ≥ 47. Let us assume that pTop ≥ 48. Then j = pIndex and |m| = |σ(i, j)| = −σ(i, j). Thus, to establish (a), we need only show that m and X have opposite signs. But this follows from mSign = XSign ⊕ pSign and pSign = 1. To prove (b), it suffices to show that pTop = SgndIntVal(5, j) + 64. But in this case, j = pTop[4:0] = pTop − 32 ≥ 16, so SgndIntVal(5, j) = j − 32 = pTop − 64.
There remains the special case pTop = 47. Since
p′ < (1/8)(pTop − 64) + 1/4 = −17/8 + 1/4 = −15/8,
2 > |d| ≥ |p′| > 15/8, which implies
Thus,
|m| = mAbs = SRTLookup(3, 15) = 3.
On the other hand, σ(i, j) = σ(3, 16) = −3. But again, since pSign = 1, m and X have opposite signs and (a) follows. To prove (b), note that SgndIntVal(5, j) = −16; hence,
π_j = −2 < p′ < −17/8 + 1/4 < π_j + 1/4. □
The computation of the partial remainder, as described in Lemma 16, involves a "compression" that reduces four addends to two. This is performed by the serial operation of two carry-save adders, as described by the following basic result, taken from [10]:
Lemma 15. Given n-bit vectors x, y, and z, let
a = x ⊕ y ⊕ z
and
b = 2(x & y | x & z | y & z).
Then
x + y + z = a + b.
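Lemma 15 is the standard 3:2 carry-save adder identity. The brute-force C check below (an illustration, not from the chapter) confirms it on random 32-bit vectors, computing the sums in 64 bits so nothing is truncated:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int main(void) {
  for (int t = 0; t < 1000000; t++) {
    uint32_t x = (uint32_t)rand() ^ ((uint32_t)rand() << 16);
    uint32_t y = (uint32_t)rand() ^ ((uint32_t)rand() << 16);
    uint32_t z = (uint32_t)rand() ^ ((uint32_t)rand() << 16);
    uint32_t a = x ^ y ^ z;                            /* sum bits            */
    uint64_t b = 2ull * ((x & y) | (x & z) | (y & z)); /* carries, shifted    */
    assert((uint64_t)x + y + z == (uint64_t)a + b);    /* Lemma 15            */
  }
  return 0;
}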
Lemma 16. If n > 0 and 0 ≤ k ≤ n, then |p| ≤ |d| < 2, 2^{129} p is an integer, and
Thus, applying Lemma 7, with the signs of d and m reversed if d < 0, we have |p| = |4p′ − md| ≤ d.
By induction,
2^{129} p = 2^{131} p′ − 2^{129} md
is an integer. The computation of (2^{129} p)[131:0] involves a 4–2 compressor with inputs addA, addB, addC, addD. We shall show that
The first two terms, addA and addB, if not 0, represent ±2d and ±d, respectively, depending on the value of m. However, in the negative case, in order to avoid a full 67-bit addition, the simple complement of 2d or d is used in place of its negation, and the missing 1 is recorded in the variable inject, which is more conveniently combined later with addD. Thus, our first goal is to prove that
If mSign = 1, then
addA = addA[67:0]
     = mAbs[1] · (2(~dEnc[66:0]) + 1)[67:0]
     = mAbs[1] · (2((−dEnc − 1)[66:0]) + 1)[67:0]
     = mAbs[1] · ((−2dEnc − 2)[67:0] + 1)[67:0]
     = mAbs[1] · (−2 dEnc − 1)[67:0]
     = (−2 mAbs[1] dEnc − mAbs[1])[67:0],
The remaining two terms, addC and addD, represent the shifted result of the previous iteration, 4p′. Thus,
and therefore,
But by Lemma 9,
and thus,
For k > 0,
and
Lemma 19 refers to the quotient and remainder before the correction step:
Lemma 20.
REncPre = (2^{66−XNB} RPre)[66:0] if n > 0;
REncPre = (2^{64−YNB} RPre)[66:0] if n = 0 and YNB[0] = XNB[0];
REncPre = (2^{65−YNB} RPre)[66:0] if n = 0 and YNB[0] ≠ XNB[0].
On the other hand, if n = 0, then REncPre = pEncHi_0[66:0] and the lemma follows from Lemma 13. □
The encoding REnc of the final remainder, which is derived from REncPre, depends on the signs of RPre and Y and the special cases RPre = 0 or RPre = ±X. Timing considerations dictate that these conditions must be detected before the full addition that produces REncPre is actually performed. This requires a technique for predicting cancellation, which is provided by the following result, found in [10]:
Lemma 21. Given n-bit vectors a and b and a one-bit vector c, let
τ = a ⊕ b ⊕ (2(a | b) + c).
If 0 ≤ k < n, then
(a + b + c)[k:0] = 0 ⇔ τ[k:0] = 0.
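Lemma 21 is a carry-free zero-detection trick: the low-order bits of a + b + c are all zero exactly when the corresponding bits of a ⊕ b ⊕ (2(a | b) + c) are. The exhaustive C check below is illustrative only, and the name tau for the derived vector is mine:

#include <assert.h>
#include <stdint.h>

int main(void) {
  const int n = 8;                              /* check all 8-bit a and b   */
  for (uint32_t a = 0; a < (1u << n); a++)
    for (uint32_t b = 0; b < (1u << n); b++)
      for (uint32_t c = 0; c <= 1; c++) {
        uint32_t tau = a ^ b ^ (2 * (a | b) + c);
        uint32_t sum = a + b + c;
        for (int k = 0; k < n; k++) {           /* compare the low k+1 bits  */
          uint32_t mask = (1u << (k + 1)) - 1;
          assert(((sum & mask) == 0) == ((tau & mask) == 0));
        }
      }
  return 0;
}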
Proof. By Lemma 21, RIs0 is true if and only if REncPre = 0. If n > 0, then by Lemma 20, REncPre = (2^{65−expo(X)} RPre)[66:0]. But by Lemma 19,
where
Proof. If fixupNeeded is false, then R = RPre, REnc = REncPre[66:2], and the lemma follows from Lemma 20. If n = 0, then as noted in the proof of Lemma 22, RPre = Y and |Y| < |X|, from which it follows that fixupNeeded is false. Thus, we may assume that fixupNeeded is true and n > 0. We may further assume that RIsX = 0; otherwise, REnc = R = 0. If RSign = XSign, then
and |Q| ≤ 2^w.
Now suppose Y < 0. Then |X| < 2^{XNB} and |Y| > 2^{YNB−2}. Since the format is signed, it will suffice to show that |Q| > 2^{w−1} or |Y/X| ≥ 2^{w−1} + 1. If XNB = w, then |X| ≥ 2^{w−1} and we must have X = −2^{w−1} and
|Y/X| > 2^{YNB−2}/2^{XNB−1} = 2^{YNB−XNB−1} ≥ 2^w.
2^{XNB+w−1}/(2^{XNB} − 1) ≥ 2^{w−1} + 1,
or, equivalently,
from which we conclude that Q ≠ 0. Thus, if YSign = XSign, then Q > 0, and if YSign ≠ XSign, then Q < 0.
Suppose Q > 0. Then since Q ≤ 2^{w+2}, Q = Q[w+2:0] = QEnc[w+2:0]. If the format is unsigned, then
Q is representable ⇔ Q ≤ 2^{w−1}
⇔ QEnc[w+1:0] ≤ 2^{w+2} − 2^{w−1}
⇔ QEnc[w+1:w−1] ≠ 7
⇔ QTooLarge = 0. □
Next, we show that if Q is representable, then QOut and ROut are the encodings of Q and R. Clearly, QOut = QEnc[w−1:0] = Q[w−1:0], which is the encoding of Q. We must also show that ROut = R[w−1:0].
Consider the case n > 0. By Lemma 26,
Acknowledgments The author is grateful to Mike Achenbach, the principal designer of the Llano
divider, for facilitating this work and especially for his patience in explaining the design and an-
swering endless questions about it.
References
11. Russinoff DM (2005) Formal verification of floating-point RTL at AMD using the ACL2 theo-
rem prover, IMACS World Congress, Paris, 2005. http://www.russinoff.com/papers/paris.html
12. Taylor GS (1981) Compatible hardware for division and square root. In: Proceedings of the 5th
symposium on computer arithmetic. IEEE Computer Society, Washington, DC
13. Tocher KD (1958) Techniques of multiplication and division for automatic binary computers.
Q J Mech Appl Math 11(3):364–384
case 2:
switch (j) {
case 0x0D: case 0x0C: case 0x0B: case 0x0A: case 0x09: case 0x08:
return 3;
case 0x07: case 0x06: case 0x05:
return 2;
case 0x04: case 0x03: case 0x02: case 0x01:
return 1;
case 0x00: case 0x1F: case 0x1E:
return 0;
case 0x1D: case 0x1C: case 0x1B:
return -1;
case 0x1A: case 0x19: case 0x18: case 0x17:
return -2;
case 0x16: case 0x15: case 0x14: case 0x13: case 0x12: case 0x11:
return -3;
default: assert(false);
}
case 3:
switch (j) {
case 0x0F: case 0x0E: case 0x0D: case 0x0C:
case 0x0B: case 0x0A: case 0x09: case 0x08:
return 3;
case 0x07: case 0x06: case 0x05:
return 2;
case 0x04: case 0x03: case 0x02: case 0x01:
return 1;
case 0x00: case 0x1F: case 0x1E:
return 0;
case 0x1D: case 0x1C: case 0x1B:
return -1;
case 0x1A: case 0x19: case 0x18: case 0x17:
return -2;
case 0x16: case 0x15: case 0x14: case 0x13:
case 0x12: case 0x11: case 0x10:
return -3;
default: assert(false);
}
default: assert(false);
}
}
<bool, nat, nat> SRT(nat YEnc, nat XEnc, nat w, bool isSigned) {
assert((w == 8) || (w == 16) || (w == 32) || (w == 64));
// Decode operands:
if (isSigned) {
Y = SgndIntVal(2*w, YEnc[2*w-1:0]);
X = SgndIntVal(w, XEnc[w-1:0]);
}
else {
Y = YEnc[2*w-1:0];
X = XEnc[w-1:0];
}
// Compute the number of divisor bits that follow the leading sign
// bits. In the case of the negative of a power of 2, the trailing
// sign bit is included as a divisor bit:
bool XSign = isSigned ? XEnc[w-1] : false;
nat b = w;
while ((b > 0) && (XEnc[b-1] == XSign)) {
b--;
}
bool XNegPower2 = XSign && ((b == 0) || (XEnc[b-1:0] == 0));
nat XNB = XNegPower2 ? b+1 : b;
assert(XNB == expo(X) + 1);
// Compute the number of dividend bits that follow the leading sign
// bits. In the negative case, the trailing sign bit is
// included as a dividend bit.
bool YSign = isSigned ? YEnc[2*w-1] : false;
b = 2*w;
while ((b > 0) && (YEnc[b-1] == YSign)) {
b--;
}
nat YNB = YSign ? b + 1 : b;
if (Y > 0) {
assert(1 << (YNB - 1) <= Y && Y < 1 << YNB);
}
else if (Y < 0) {
assert(1 << (YNB - 2) < abs(Y) && abs(Y) <= 1 << (YNB - 1));
};
assert(Y == 0 || YNB >= expo(Y)+1);
// Table lookup:
nat pTop = pEncHi[67:62];
bool pSign = pTop[5];
nat pIndex = pTop[4:0]; // second argument of SRTLookup
// 4-2 compression:
nat sum1 = addA ^ addB ^ addC;
nat carry1 = (addA & addB | addB & addC | addA & addC) << 1;
nat sum2 = sum1 ^ carry1 ^ addD;
nat carry2 = (sum1 & carry1 | carry1 & addD | sum1 & addD) << 1;
assert((sum2 + carry2)[67:0] == p[2:-65]);
// Update quotient:
QPart = 4*QPart + m;
assert(abs(QPart) < (1 << 2*k));
if (mAbs == 0) {
QPEnc = (Q0Enc << 2)[66:0] | 1;
QMEnc = (QMEnc << 2)[66:0] | 3;
Q0Enc = (Q0Enc << 2)[66:0];
}
else if (mSign == 0) {
switch (mAbs) {
case 1:
QPEnc = (Q0Enc << 2)[66:0] | 2;
QMEnc = (Q0Enc << 2)[66:0];
Q0Enc = (Q0Enc << 2)[66:0] | 1;
break;
case 2:
QPEnc = (Q0Enc << 2)[66:0] | 3;
QMEnc = (Q0Enc << 2)[66:0] | 1;
Q0Enc = (Q0Enc << 2)[66:0] | 2;
break;
case 3:
QPEnc = (QPEnc << 2)[66:0];
QMEnc = (Q0Enc << 2)[66:0] | 2;
Q0Enc = (Q0Enc << 2)[66:0] | 3;
break;
default: assert(false);
}
}
else { // mSign == 1
switch (mAbs) {
case 1:
QPEnc = (Q0Enc << 2)[66:0];
Q0Enc = (QMEnc << 2)[66:0] | 3;
QMEnc = (QMEnc << 2)[66:0] | 2;
break;
case 2:
QPEnc = (QMEnc << 2)[66:0] | 3;
Q0Enc = (QMEnc << 2)[66:0] | 2;
// Encoding of remainder:
nat REncPre = (pEncHi + carryHi)[66:0];
if (YNB >= XNB) {
assert(REncPre == (RPre << (66 - XNB))[66:0]);
}
else if (YNB[0] == XNB[0]) {
assert(REncPre == (RPre << (64 - YNB))[66:0]);
}
else {
assert(REncPre == (RPre << (65 - YNB))[66:0]);
}
if (QTooLarge) {
return <false, 0, 0>;
}
else {
ROut = ((RSign << w) - (RSign << (YNB+1))) | REnc[63:63-YNB];
}
assert(QOut == Q[w-1:0]);
assert(ROut == R[w-1:0]);
return <true, QOut, ROut>;
}
Use of Formal Verification at Centaur
Technology
Warren A. Hunt, Jr., Sol Swords, Jared Davis, and Anna Slobodova
1 Introduction
In our verification process, we first translate the Verilog RTL source code of
Centaur’s design into EMOD, a formally defined HDL. This process captures a de-
sign as an ACL2 object that can be interpreted by an ACL2-based HDL simulator.
The HDL simulator is used both to run concrete test cases and to extract symbolic
representations of the circuit logic of blocks of interest. We then use a combina-
tion of theorem proving and equivalence checking to prove that the functionality
of the circuit in question is equivalent to a higher-level specification. A completed
verification yields an ACL2 theorem that precisely states what we have proven.
We have developed a deep embedding of our hardware description language,
EMOD [12], in the ACL2 logic. We describe the EMOD language in Sect. 4.1. Our
1.2 Timeline
The integration of formal methods into Centaur’s design methodology has been on-
going for several years. Hunt first met with Centaur representatives in April 2007.
This led Hunt and Swords to join Centaur in June 2007 to see if our existing (ACL2-
based) tools could be usefully deployed on Centaur verification problems. Our use
of formal methods is not new, and AMD [19] has been using ACL2 for many
years for floating-point hardware verification. However, there are several things that
differentiate our effort from all others: the Centaur design is converted into our
EMOD-formalized hardware description language (described later), our verification
(BDD and AIG) algorithms are themselves verified, and all of our claims are checked as ACL2 theorems.
The use of formal methods to aid hardware design has been ongoing for many
years. Possibly the earliest adopter was IBM with equivalence checking mechanisms
that they developed in the early 1980s; IBM protected these mechanisms as trade
secrets. With the development of simple microprocessor verification examples, such
as the FM8501 [9] and the VIPER [5], and introduction of BDDs [6], commercial
organizations started integrating some use of formal methods into their design flow.
A big impetus for the use of formal methods came from the Intel FDIV bug [18].
Work that allowed us to get an immediate start was just being finished when
our Centaur-based effort began. Boyer and Hunt had implemented BDDs [3, 4]
with an extended version of ACL2 that included unique object representation and
function memoization [3]. Separately, Hunt and Reeber had previously embedded
the DE2 HDL into ACL2 [10], and this greatly influenced the development of the
EMOD HDL.
Our initial efforts were directed along two fronts: analyzing microcode for integer
division and verifying the floating-point addition/subtraction hardware. Our analy-
sis of the microcode for the integer divide algorithm involved creating an abstraction
of the microcode with ACL2 functions and then using the ACL2 theorem-prover to
mechanically check that our model of the divide microcode computes the correct
answer. This effort discovered an anomaly that was subsequently corrected.
Our work on the verification of the floating-point addition/subtraction hardware
was much more involved. Because of the size of the design – some 34,000 lines of
Verilog – it was necessary for us to create a translator from Verilog into our EMOD
hardware description language. We enhanced a Verilog parser, written by Terry
Parks (of Centaur), so that it emitted an EMOD-language version of the floating-
point hardware design; this translator created an EMOD-language representation of
the entire module hierarchy, including all interface and wire names. The semantics
of the EMOD language are given by the EMOD simulator which allows an EMOD-
language-based design to be simulated or symbolically simulated with a variety
(e.g., BDDs and AIGs) of mechanisms. Simultaneously, we developed an extension
to ACL2 that provides a symbolic simulator for the entire ACL2 logic; this system
was called G. Given these components, we were able to attempt the verification of
Centaur’s designs; this was done by comparing the symbolic equations produced
by the EMOD HDL symbolic simulator to the equations produced by the G-based
symbolic simulation of our ACL2 floating-point specifications.
Our verification of Centaur’s floating-point addition/subtraction instructions led
to the discovery of two design flaws: for two of the four floating-point adders, the
floating-point control flag inputs arrived one cycle early, and for one pair of 80-bit
numbers (described more fully later), the sum/difference was incorrect. Both of
these very subtle problems were fixed. This work was completed within the first
year of our efforts at Centaur. This effort strained our Verilog translator and illumi-
nated areas where we wanted to better integrate symbolic simulation into the ACL2
system.
In the summer of 2008, Davis arrived and began developing a more capable
Verilog translator named VL. The new translator was itself written in ACL2, and
it was designed with simplicity and assurance in mind. The translator has provi-
sions for translating Verilog annotations and property specifications into the EMOD
language.
Starting in the summer of 2008, Swords began an effort to build a verified version
of the ACL2 G symbolic simulator, called GL (for G in the Logic). This new system
represents symbolic ACL2 expressions as ACL2 data objects, which allows proofs
to be carried out which show that such objects are manipulated correctly.
In the fall of 2008, Slobodova joined Centaur as manager of the formal veri-
fication team and began using these tools to verify a number of different execution units.
[Figure: block interface of the verified unit: inputs Instruction, Clocks, Control Flags, Data A, and Data B; 1074 inputs and 394 outputs in total.]
2 Modeling Effort
The specification of the CN processor consists of over half a million lines of Verilog;
this Verilog is frequently updated by the logic designers. To bring this design into
our EMOD HDL, we have developed a translator named VL. This is a challenge since
Verilog is such a large language with no formal semantics. Our work is based on the
IEEE Verilog 1364-2005 standard [13], and we do not yet support the SystemVerilog
extensions. This standard usually explains things well, but sometimes it is vague; in
these cases, we have carried out thousands of tests and attempted to emulate the
behavior of Cadence’s Verilog simulator.
VL needs to produce a “sound” translation or our verification results may be
meaningless. Because of this, we have written VL in the purely functional program-
ming language of the ACL2 theorem prover, and our emphasis from the start has
been on correctness rather than performance. For instance, our parser is written in
a particularly naive way: to begin, each source file is read, in its entirety, into a
simple list of extended characters, which associate each character with its filename
and position. This makes the remaining steps in the parsing process ordinary list-
transforming functions:
read : filename → echar list
preprocess : echar list → echar list
lex : echar list → token list
eat-comments : token list → token list × comment map
parse : token list → module list
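An "extended character" simply pairs a character with its source location. A minimal C rendering of the idea (the struct layout and names are mine; VL's actual representation is an ACL2 object) might look like this:

#include <stdio.h>
#include <stdlib.h>

/* An extended character: the character itself plus where it came from.     */
typedef struct {
  char c;
  const char *filename;
  unsigned line, col;
} echar;

/* Read a whole file into an array of extended characters, tracking the
 * line and column of each one, in the spirit of the "read" step above.     */
static echar *read_echars(const char *filename, size_t *count) {
  FILE *f = fopen(filename, "rb");
  if (!f) return NULL;
  size_t cap = 1024, n = 0;
  echar *out = malloc(cap * sizeof *out);
  unsigned line = 1, col = 1;
  int ch;
  while (out && (ch = fgetc(f)) != EOF) {
    if (n == cap) out = realloc(out, (cap *= 2) * sizeof *out);
    if (!out) break;
    out[n++] = (echar){ (char)ch, filename, line, col };
    if (ch == '\n') { line++; col = 1; } else { col++; }
  }
  fclose(f);
  *count = n;
  return out;
}

int main(int argc, char **argv) {
  size_t n = 0;
  echar *cs = argc > 1 ? read_echars(argv[1], &n) : NULL;
  if (cs) { printf("%zu extended characters read from %s\n", n, argv[1]); free(cs); }
  return 0;
}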
2.1.1 Unparameterization
Verilog modules can have parameters, e.g., an adder module might take
input wires of some arbitrary width, and other modules can then instantiate
adder with different widths, say 8, 16, and 32. Our first transformation is to
eliminate parametrized modules, e.g., we would introduce three new modules,
adder$width=8, adder$width=16, and adder$width=32, and change
the instances of adder to point to these new modules as appropriate.
Modules may be instantiated using either positional or named argument lists. For
instance, given a module M with ports a, b, and c, the following instances of M are
equivalent:
M my_instance(1, 2, 3);
M my_instance(.b(2), .c(3), .a(1));
In this transformation, we convert all instances to the positional style and annotate
the arguments as inputs or outputs.
We can reduce the variety of operators we need to deal with by simply rewriting
some operators away. In particular, we perform rewrites such as
a && b → (|a) & (|b),
a != b → |(a ^ b), and
a < b → ~(a >= b).
This process eliminates all logical operators (&&, ||, and !), equality comparisons (== and !=), negated reduction operators (~&, ~|, and ~^), and standardizes all inequality comparisons (<, >, <=, and >=) to the >= format. We have a considerable
simulation test suite to validate these rewrites.
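Because these rewrites are Boolean-vector identities, they can also be checked exhaustively at a small width. The C sketch below, my own check rather than VL's test suite, verifies the three rewrites over all 4-bit operands, modeling Verilog's reduction OR and ignoring signedness and X/Z values:

#include <assert.h>
#include <stdint.h>

static unsigned red_or(uint32_t a) { return a != 0; }   /* Verilog |a        */

int main(void) {
  const uint32_t W = 16;                                 /* all 4-bit vectors */
  for (uint32_t a = 0; a < W; a++)
    for (uint32_t b = 0; b < W; b++) {
      /* a && b  -->  (|a) & (|b)                                            */
      assert(((a != 0) && (b != 0)) == (red_or(a) & red_or(b)));
      /* a != b  -->  |(a ^ b)                                               */
      assert((a != b) == red_or(a ^ b));
      /* a <  b  -->  ~(a >= b), as a one-bit complement of the comparison   */
      assert((a < b) == (unsigned)(1u ^ (a >= b)));
    }
  return 0;
}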
We now annotate every expression with its type (sign) and width. This is tricky.
The rules for determining widths are quite complicated, and if they are not properly
implemented then, for instance, carries might be inappropriately kept or dropped. It
took a lot of experimenting with Cadence and many readings of the standard to be
sure that we had it right.
After the widths have been computed, we introduce explicit wires to hold the inter-
mediate values in expressions,
assign w = (a + b) - c;
→
wire [width:0] newname;
assign newname = a + b;
assign w = newname - c;
We also split inputs to module and gate instances
my_mod my_inst(a + b, ...);
→
wire [width:0] newname;
assign newname = a + b;
my_mod my_inst(newname, ...);
We now replace all assignments with module instances. First, we develop a way to
generate modules to perform each operation at a given width, and we write these
modules using only gates and submodule instances. Next, we replace each assign-
ment with an instance of the appropriate module, e.g.,
assign w = a + b;
!
VL_13_BIT_PLUS newname(w, a, b);
This is one of our more complicated transformations, so we have developed a
test suite which, for instance, uses Cadence to exhaustively test VL_4_BIT_PLUS
against an ordinary addition operation. We are careful to handle the X and Z behav-
ior appropriately. We go out of our way so that all of w’s bits become X if any bit of
a or b is X or Z, even though this makes our generated adders more complex.
We have left out a few other rewrites like naming any unnamed instances, elimi-
nating supply wires, and some minor optimizations. But the basic idea is that, taken
all together, our simplifications leave us with a new list of modules where only
simple gate and module instances are used. This design lets us focus on each task
separately instead of needing to consider all of Verilog at once.
It takes around 20 min to run our full translation process on the whole of CN. A lot of
memory is needed, and we ordinarily use a machine with 64 GB of physical memory
to do the translation. Not all modules can be translated successfully (e.g., because
they use constructs which are not yet supported). However, a large portion of the
chip is fully supported.
The translator is run against multiple versions of the chip each night, and the
resulting EMOD modules are stored on disk into files that can be loaded into an
ACL2 executable in seconds. This process also results in internal Web pages that
allow the original source code, translated source code, and warnings about each
module to be easily viewed and some other Lint-like reports for the benefit of the
logic designers and verification engineers.
3 Verification Method
Our verification efforts so far have concentrated on proving the functional correct-
ness of instructions running on certain execution units; that is, showing that they
operate equivalently to a high-level specification. However, we believe our method-
ology would also be useful for proving nonfunctional properties of the design.
Our specifications are functions written in ACL2. They are executable and can
therefore be used to run tests against the hardware model or a known implemen-
tation. In most cases, we write specifications that operate at the integer level on
vectors of signals. Often these specifications are simple enough that we are satisfied
that they are correct by examination; by comparison with the RTL designs of the cor-
responding hardware units, they are very small indeed. For floating-point addition,
we use a low-level integer-based specification that is somewhat optimized for sym-
bolic execution performance and is relatively complicated compared to our other
specifications. However, this specification has been separately proven equivalent to
a high-level, rational-number-based specification. Before this proof was completed,
we had also tested the specification by running it on millions of inputs and com-
paring the results to those produced by running the same floating-point operations
directly on the local CPU.
Figure 2 shows the verification methodology we used in proving the correct-
ness of the fadd unit’s floating-point addition instructions. We compare the result
of symbolic simulations of an instruction specification and our model of the fadd
hardware.
[Fig. 2: per-instruction verification flow: case-splitting and parametrization of the inputs, AIG symbolic simulation of the hardware model, AIG2BDD conversion, and comparison against the specification (c = a + b).]
To obtain our model of the hardware, we translate the fadd unit's Verilog
design into our EMOD hardware description language. We then run an AIG-based
symbolic simulation of the fadd model using the EMOD symbolic simulator; the re-
sults of this simulation describe the outputs of the fadd unit as four-valued functions
of the inputs, and we represent these functions with AIGs. We then specialize these
functions by setting input control bits to values appropriate for the desired instruc-
tion. To compare these functions with those produced by the specification, we then
convert these AIGs into BDDs.
For many instructions, it is feasible to simply construct BDDs representing the
outputs as functions of the inputs, and we therefore may verify these instructions
directly using symbolic simulation. For the case of floating-point addition, however,
there is a capacity problem due to the shifted addition of mantissas. We therefore use
case splitting via BDD parametrization [1, 16] to restrict the analysis to subsets of
the input space. This allows us to choose a BDD variable ordering specially for each
input subset, which is essential to avoid this blowup. For each case split, we run a
symbolic simulation of the instruction specification and an AIG-to-BDD conversion
of the specialized AIGs for the instruction. If corresponding BDDs from these re-
sults are equal, this shows that the fadd unit operates identically to the specification
function on the subset of the input space covered by the case split; otherwise, we
can generate counterexamples by analyzing the differences in the outputs.
For each instruction, we produce a theorem stating that evaluation of the
instruction-specialized AIGs yields the same result as the instruction’s specification
function. This theorem is proven using the GL symbolic simulation framework
[4], which automates the process of proving theorems by BDD-based symbolic
execution, optionally with parametrized case splitting. Much of the complexity
of the flow is hidden from the user by the automation provided by GL; the user
provides the statement of the desired theorem and high-level descriptions of the
case split, symbolic simulation inputs, and suitable BDD variable orderings. BDD
parametrization and the AIG to BDD conversion algorithm are used automatically
based on these parameters. The statement of the theorem is independent of the sym-
bolic execution mechanism; it is stated in terms of universally quantified variables
which collectively represent a (concrete) input vector for the design.
In the following subsections, we will describe in more detail the case-splitting
mechanism, the process of translating the Verilog design into an EMOD description,
and the methods of symbolic simulation used for the fadd unit model and the in-
struction specification.
For verifying the floating-point addition instructions, we use case splitting to avoid
BDD blowup that occurs due to a nonconstant shift of the operand mantissas based
on the difference in their exponents. By choosing case-splitting boundaries appropri-
ately, the shift amount can be reduced to a constant. The strategy for choosing these
boundaries is documented by others [1, 7, 15, 20], and we believe it to be reusable
for new designs.
In total, we split into 138 cases for single, 298 for double, and 858 for extended
precision. Most of these cases cover input subsets over which the exponent dif-
ference of the two operands is constant and either all input vectors are effective
additions or all are effective subtractions. Exponent differences greater than the
maximum shift amount are considered as a block. Special inputs such as NaNs and
infinities are considered separately. For performance reasons, we use a finer-grained
case-split for extended precision than for single or double precision.
For each case split, we restrict the simulation coverage to the chosen subset of
the input space using BDD parametrization. This generates a symbolic input vector
(a BDD for each input bit) that covers exactly and only the appropriate set of inputs;
we describe BDD parametrization in more detail in Sect. 4.3. Each such symbolic
input vector is used in both an AIG-to-BDD conversion and a symbolic simulation
of the specification. The BDD variable ordering is chosen specifically for each case
split, thereby reducing the overall size of the intermediate BDDs. No knowledge of
the design was used to determine the case-splitting approach.
We use the EMOD symbolic simulator to obtain Boolean formulas (AIGs) repre-
senting the outputs of a unit in terms of its inputs. In such simulations, we use
a four-valued logic in which each signal may take values 1 (true), 0 (false), X
(unknown), or Z (floating). This is encoded using two AIGs (onset and offset) per
signal. The Boolean values taken by each AIG determine the value taken by the
signal as in Fig. 3.
The fadd unit is mainly a pipeline, where each instruction is bounded by a fixed
latency. To verify its instructions, we set all bits of the initial state to unknown (X )
values – the onsets and offsets of all nonclock inputs are set to free Boolean variables
at each cycle, so that every input signal but the clocks can take any of the four values.
We then symbolically simulate it for a fixed number of cycles. This results in a fully
general formula for each output in terms of the inputs at each clock cycle.
To obtain symbolic outputs for a particular instruction, we restrict the fully gen-
eral output formulas by setting control signals to the values required for performing
the given instruction and any signals we know to be irrelevant to unknown (X ) input
values. This reduces the number of variables present in these functions and keeps
our result as general as possible. Constant propagation with these specified values
restricts the AIGs to formulas in terms of only the inputs relevant to the instruction
we are considering. For the floating-point addition instructions of the fadd unit, the
remaining inputs are the operands and the status register, which are the same as the
inputs to the specification function.
The theorems produced by our verifications typically say that for any well-
formed input vector, the evaluation of the instruction-specialized AIGs using the
variable assignment generated from the input vector is equivalent to the output of the
specification function on that input vector. Such a theorem may often be proven au-
tomatically, given appropriate BDD ordering and case splitting, by the GL symbolic
execution framework. GL has built in the notion of symbolically evaluating an AIG
using BDDs, effectively converting the Boolean function representation from one
form to the other. It uses the procedure AIG2BDD described in Sect. 4.4 for this
process; this algorithm avoids computing certain intermediate-value BDDs that are
irrelevant to the final outputs, which helps to solve some BDD size explosions.
The specification for an instruction is generally an ACL2 function that takes inte-
gers or Booleans representing some of the inputs to a block and produces integers
or Booleans representing the relevant outputs. Such functions are usually defined in
terms of word-level primitives such as shifts, bit-wise logical operations, plus, and
minus. For the floating-point addition instructions, the function takes integers repre-
senting the operands and the control register and produces integers representing the
result and the flag register. It is optimized for symbolic simulation performance
rather than referential clarity; however, it has separately been proven equivalent
to a high-level, rational arithmetic-based specification of the IEEE floating-point
standard [14]. Additionally, it has been tested against floating-point instructions run-
ning on Intel and AMD CPUs on many millions of input operand pairs, including a
test suite designed to detect floating-point corner-cases [22] as well as random tests.
To support symbolic simulation of our specifications, we developed the GL sym-
bolic execution framework for ACL2 [4]. The GL framework allows user-provided
ACL2 code to be symbolically executed using a BDD-based symbolic object rep-
resentation. The symbolic execution engine is itself verified in ACL2 so that its
results provably reflect the behavior of the function that was symbolically executed.
GL also provides automation for proving theorems based on such symbolic execu-
tions. Since these theorems do not depend on any unverified routines, they offer the
same degree of assurance as any proof in ACL2: that is, they can be trusted if ACL2
itself can be trusted.
GL automates several of the steps in our verification methodology. For a theo-
rem in which we show that the evaluation of an AIG representation of the circuit
produces results equivalent to a specification function, the GL symbolic execution
encompasses the AIG-to-BDD transformation and the comparison of the results, as
well as the counterexample generation if there is a bug. If the proof requires case
splitting, the parametrization mechanism is also handled by GL. The user specifies
the BDD variable ordering used to construct the symbolic input vectors, as well as
the case split. To specify the case split, the user provides a predicate which deter-
mines whether an input vector is covered by a given case; like the theorem itself,
this predicate is written at the level of concrete objects. Typically, all computations
at the symbolic (BDD) level are performed by GL; the user programs only at the
concrete level.
For each case split in which the results from the symbolic simulations of the
specification and the hardware model are equal, this serves to prove that for any con-
crete input vector drawn from the coverage set of the case, a simulation of the fadd
model will produce the same result as the instruction specification. If the results are
not equal, we can generate a counterexample by finding a satisfying assignment for
the XOR of two corresponding output BDDs.
To prove the top-level theorem that the fadd unit produces the same result as the
specification for all legal concrete inputs, we must also prove that the union of all
such input subsets covers the entire set of legal inputs. This is handled automatically
by the GL framework. For each case, GL produces a BDD representing the indicator
function of the coverage set (the function which is true on inputs that are elements of
the set and false on inputs that are not.) As in [7], the OR of all such BDDs is shown
to be implied by the indicator function BDD of the set of legal inputs; therefore, if
an input vector is legal then it is in one or more of the coverage sets of the case split.
(defm *half-adder-module*
`(:i (a b)
:o (sum carry)
:occs
((:u o0 :o (sum) :op ,*xor2* :i (a b))
(:u o1 :o (carry) :op ,*and2* :i (a b)))))
(defm *one-bit-cntr*
`(:i (c-in reset-)
:o (out c)
:occs
((:u o2 :o out :op ,*ff* :i (sum-reset))
(:u o0 :o (sum c) :op ,*half-adder-module* :i (c-in out))
(:u o1 :o (sum-reset) :op ,*and2* :i (sum reset-)))))
BDDs and AIGs both are data objects that represent Boolean-valued functions of
Boolean variables. We have defined evaluators for both BDDs and AIGs in ACL2.
The BDD (resp. AIG) evaluator, given a BDD (AIG) and an assignment of Boolean values to the relevant variables, produces the Boolean value of the function it represents at that variable assignment. Here, for brevity, we use the notation ⟨x⟩bdd(env) or ⟨x⟩aig(env) for the evaluation of x with variable assignment env. We use the same notation when x is a list to denote the mapping of ⟨·⟩bdd(env) over the elements of x.
The BDD and AIG logical operators are defined in the ACL2 logic and proven correct relative to the evaluator functions. For example, the following theorem shows the correctness of the BDD AND operator (written ∧bdd); similar theorems are proven for every basic BDD and AIG operator such as NOT, OR, XOR, and ITE:
⟨x ∧bdd y⟩bdd(env) = ⟨x⟩bdd(env) ∧ ⟨y⟩bdd(env).
4.3 Parametrization
In the symbolic simulation process for the fadd unit, we obtain AIGs representing
the outputs as a function of the primary inputs and subsequently assign parametrized
input BDDs to each primary input, computing BDDs representing the function
composition of the AIG with the input BDDs. A straightforward (but inefficient)
method to obtain this composition is an algorithm that recursively computes the
BDD corresponding to each AIG node: at a primary input, look up the assigned
BDD; at an AND node, compute the BDD AND of the BDDs corresponding to the
child nodes; and at a NOT node, compute the BDD NOT of the BDD correspond-
ing to the negated node. This method proves to be impractical for our purpose; we
describe here the algorithm AIG2BDD that we use instead.
To improve the efficiency of the straightforward recursive algorithm, one nec-
essary modification is to memoize it so as to traverse the AIG as a DAG (without
examining the same node twice) rather than as a tree: due to multiple fanouts in
the hardware model, most AIGs produced would take time exponential in the logic
depth if traversed as a tree. The second important improvement is to attempt to avoid
computing the full BDD translation of nodes that are not relevant to the primary out-
puts. For example, if there is a multiplexer present in the circuit and its selector is set
to 1 for all settings of the inputs possible under the current parametrization, then the
value of the unselected input is irrelevant unless it has another fanout that is relevant.
In AIGs, such irrelevant branches appear as fanins to ANDs in which the other fanin
is unconditionally false. More generally, an AND of two child AIGs a and b can be
reduced to a if it can be shown that a ⇒ b (though the most common occurrence
of this is when a is unconditionally false.) The AIG2BDD algorithm applies in iter-
ative stages of two methods that can each detect certain of these situations without
fully translating b to a BDD. In both methods, we calculate exact BDD translations
for nodes, beginning at the leaves and moving toward the root, until some node’s
translation exceeds a BDD size limit. We replace the over-sized BDD with a new
representation that loses some information but allows the computation to continue
while avoiding blowup. When the primary outputs are computed, we check to see
whether or not they are exact BDD translations. If so, we are done; if not, we in-
crease the size limit and try again. During each iteration of the translation, we check
each AND node for an irrelevant branch; if a branch is irrelevant it is removed from
the AIG so that it will be ignored in subsequent iterations. We use the weaker of
the two methods first with small size limits, then switch to the stronger method at a
larger size limit.
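To illustrate the DAG-versus-tree point, the following C sketch evaluates an AIG under a concrete input assignment with a memo table, so each node is visited once no matter how many fanouts it has. The node encoding and names are mine, EMOD's AIGs are ACL2 objects, and the real algorithm computes BDDs rather than concrete Booleans:

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A tiny AIG: node 0 is constant false; nodes 1..num_inputs are primary
 * inputs; higher nodes are ANDs of two literals.  A literal is 2*node+neg. */
#define MAXN 1024
typedef struct { uint32_t lit0, lit1; } and_node;

static and_node nodes[MAXN];
static int num_inputs;

static int eval_lit(uint32_t lit, const int *inputs, int *memo) {
  int id = lit >> 1, neg = lit & 1, v;
  if (memo[id] < 0) {                       /* not yet computed              */
    if (id == 0)               v = 0;       /* constant false                */
    else if (id <= num_inputs) v = inputs[id - 1];
    else                       v = eval_lit(nodes[id].lit0, inputs, memo) &
                                   eval_lit(nodes[id].lit1, inputs, memo);
    memo[id] = v;                           /* each node evaluated once      */
  }
  return memo[id] ^ neg;
}

int main(void) {
  /* xor(x1, x2) built from ANDs and inverters; shared nodes are reused.     */
  num_inputs = 2;
  nodes[3] = (and_node){ 2, 4 };            /*  x1 &  x2                     */
  nodes[4] = (and_node){ 3, 5 };            /* ~x1 & ~x2                     */
  nodes[5] = (and_node){ 7, 9 };            /* ~(node3) & ~(node4) = xor     */
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++) {
      int inputs[2] = { a, b }, memo[MAXN];
      memset(memo, -1, sizeof memo);
      assert(eval_lit(2 * 5, inputs, memo) == (a ^ b));
    }
  return 0;
}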
In the weaker method, the translated value of each AIG node is two BDDs that
are upper and lower bounds for its exact BDD translation, in the sense that the
lower-bound BDD implies the exact BDD and the exact BDD implies the upper-
bound BDD. If the upper and lower bound BDDs for a node are equal, then they
both represent the exact BDD translation for the node. When a BDD larger than the
size limit is produced, it is thrown away and the constant-true and constant-false
BDDs are instead used for its upper and lower bounds. If an AND node a ^ b is
encountered for which the upper bound for a implies the lower bound for b, then
we have a ) b; therefore we may replace the AND node with a. Thus using the
weak method we can, for example, replace an AIG representing a ^ .a _ b/ with a
whenever the BDD translation of a is known exactly, without computing the exact
translation for b.
In the stronger method, instead of approximating BDDs by an upper and
lower bound, fresh BDD variables are introduced to replace over-sized BDDs.
(We necessarily take care that these variables are not reused.) The BDD associated
with a node is its exact translation if it references only the variables used in the
primary input assignments. This catches certain additional pruning opportunities
that the weaker method might miss, such as reducing b ⊕ (a ⊕ b) to a.
These two AIG-to-BDD translation methods, as well as the combined method
AIG2BDD that uses both in stages, have been proven in ACL2 to be equivalent,
when they produce an exact result, to the naive AIG-to-BDD translation algorithm
described above.
When symbolically simulating the fadd unit, using input parametrization in con-
junction with the AIG2BDD procedure works around the problem that BDD variable
orderings that are efficient for one execution path are inefficient for another. Input
parametrization allows cases where one execution path is selected to be analyzed
separately from cases where others are used. However, a naive method of building
BDDs from the hardware model might still construct the BDDs of the intermediate
signals produced by multiple paths, leading to blowups. The AIG2BDD procedure
ensures that unused paths do not cause a blowup.
Suppose, for example, that the hypothesis of a theorem states that x and y are both 16-bit natural numbers, and we conclude that when x and y are given as inputs to the circuit, the result produced is x + y. To prove this, the user specifies what
shape of input objects should be used for symbolic execution (in this case, 16-bit
natural numbers). This shape specification also gives the BDD ordering for the bits
of x and y. From the shape specification, GL constructs symbolic objects represent-
ing x and y. It then symbolically executes the conclusion. Ideally, the result of this
symbolic execution will be a symbolic object that can syntactically be determined
to always represent true. If not, GL will extract counterexamples from the resulting
symbolic object, giving concrete values of x and y that falsify the conjecture. When
the symbolic execution produces a true result, the final step in proving this theorem
is to show that the symbolic objects used as inputs to the simulation cover the finite
set of concrete inputs recognized by the hypothesis. In this example, 16-bit sym-
bolic natural numbers suffice to cover the input space provided all the bits are free,
independent variables; smaller symbolic naturals would not be adequate.
Symbolic objects are structures that describe functions over Booleans. Depend-
ing on the shape of such objects, they may take as their values any object in the
ACL2 universe. For example, we represent symbolic integers as the pairing of a tag,
which distinguishes such an object from other symbolic types such as Booleans and
ordered pairs, and a list of BDDs, which represents the two’s-complement digits of
the integer. We define an evaluator function for symbolic objects, which gives the
concrete value represented by an object under an assignment of Booleans to each
variable. For the integer example, the evaluator recognizes the tag and evaluates
each BDD in the representation under the given assignment. Then it produces the
integer whose two’s-complement representation matches the resulting list of bits.
To perform a symbolic execution, we employ two methods. We may create a
symbolic counterpart fsym for a user-specified function f . fsym is an executable
ACL2 function that operates on symbolic objects in the same way as f operates on
concrete objects. It is defined by examining the definition of f , creating symbolic
counterparts recursively for all its subfunctions and nesting them in the same man-
ner as in the definition. Alternatively, we may symbolically interpret an ACL2 term
under an assignment of symbolic objects to that term’s free variables. In this case,
we walk over the given term. At each function call, we either call that function’s
symbolic counterpart if it exists or else look up the function’s definition and recur-
sively symbolically interpret it under an assignment that pairs its formals with the
corresponding symbolic values produced by the given actual parameters.
In both methods of symbolic execution, it is necessary for any ACL2 primi-
tives to have predefined symbolic counterparts, since they do not have definitions.
We have defined many of these functions manually and proven the correctness of
their symbolic counterparts. For example, the symbolic counterpart of + is defined
such that on symbolic integers, it performs a BDD-level ripple-carry algorithm,
producing a new symbolic integer that provably always evaluates to the sum of
the evaluations of the inputs. We have also manually defined symbolic counter-
parts for certain functions for which symbolic interpretation of the ACL2 definitions
would be inefficient. For example, the bit-wise negation function lognot is defined
as lognot(x) = −x − 1, but for symbolic execution it is more efficient to per-
form the bit-wise negation directly by negating the BDDs in the symbolic integer
representation; in fact, we define the negation operator in terms of lognot, rather
than the reverse.
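The identity behind that choice is elementary two's complement arithmetic: bitwise complement agrees with lognot(x) = −x − 1 on every integer. A small illustrative C check:

#include <assert.h>
#include <stdint.h>

int main(void) {
  for (int64_t x = -100000; x <= 100000; x++)
    assert(~x == -x - 1);          /* bit-wise negation computes lognot(x)   */
  return 0;
}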
The correctness condition for a symbolic counterpart fsym states a correspon-
dence between the operation of fsym on symbolic objects and the operation of f on
concrete objects. Namely, the evaluation of the (symbolic) result of fsym on some
symbolic inputs is the same as the (concrete) result of running f on the evaluations
of those inputs:
                     f_sym
  Symbolic Inputs ----------> Symbolic Results
        | ev                        | ev
        v                           v
  Concrete Inputs ----------> Concrete Results
                       f
Each primitive symbolic counterpart we have defined is proven (using standard
ACL2 proof methodology) to provide this correctness condition. The correctness of
symbolic counterparts of functions defined in terms of these primitives follows from
this; the correctness proofs are automated in the routine that creates symbolic coun-
terparts. The symbolic interpreter is also itself verified; its correctness condition is
similar. Suppose we symbolically interpret a term x with a binding of its variables
v_i to symbolic objects s_i, yielding a symbolic result. We have proven that the evaluation of this result under assignment a equals the result of running the term x with its variables v_i each bound to ev(s_i, a).
These correctness conditions allow theorems to be proven using symbolic execu-
tion. Consider our previous example of the 16-bit adder. Suppose we symbolically
execute the conclusion of our theorem on symbolic inputs sx ; sy and the result is an
object that evaluates to true under every variable assignment:
∀a . ev(conc_sym(s_x, s_y), a).
By the correctness condition for the symbolic counterpart, it follows that
∀a . conc(ev(s_x, a), ev(s_y, a)).
That is, the conclusion holds of any pair of values x and y such that sx and sy eval-
uate to that pair under some assignment. The coverage side condition then requires
us to show that sx and sy are general enough to cover any pair that satisfies the the-
orem’s hypothesis. Once this is proven, the proof of the theorem (hypotheses imply
conclusion) is complete.
6 Related Work
20, 21]. The symbolic simulation frameworks used in all of these verifications,
including the symbolic trajectory evaluation implementation in Intel’s Forte prover,
are themselves unverified programs. Similarly, the floating-point verification de-
scribed in [7] uses the SMV model checker and a separate argument that its case
split provides full coverage. To obtain more confidence in our results, we construct
our symbolic simulation mechanisms within the theorem prover and prove that they
yield sound results. Combining tool verifications with the results of our symbolic
simulations yields a theorem showing that the instruction implementation equals its
specification.
7 Conclusion
Acknowledgments We would like to acknowledge the support of Centaur Technology, Inc. and
ForrestHunt, Inc. We would also like to thank Bob Boyer for development of much of the tech-
nology behind EMOD and the ACL2 BDD package, Terry Parks for developing a very detailed
floating-point addition specification, and Robert Krug for his proof that our integer-level, floating-
point addition specification performs the rational arithmetic and rounding specified by the IEEE
floating-point standard. Portions of this chapter originally appeared in [11].
References
1. Aagaard MD, Jones RB, Seger CJH (1999) Formal verification using parametric represen-
tations of boolean constraints. In: Proceedings of the 36th design automation conference,
pp 402–407
2. Baumgartner J (2006) Integrating FV into main-stream verification: the IBM
experience. Tutorial given at FMCAD. Available at http://domino.research.ibm.com/
comm/research projects.nsf/pages/sixthsense.presentations.html
3. Boyer RS, Hunt WA Jr (2006) Function memoization and unique object representation for
ACL2 functions. In: ACL2 ’06: Proceedings of the sixth international workshop on the ACL2
theorem prover and its applications. ACM, New York, NY, pp 81–89
4. Boyer RS, Hunt WA Jr (2009) Symbolic simulation in ACL2. In: Proceedings of the eighth
international workshop on the ACL2 theorem prover and its applications
5. Brock B, Hunt WA Jr (1991) Report on the formal specification and partial verification of the
VIPER microprocessor. In: NASA contractor report, 187540
6. Bryant RE (1986) Graph-based algorithms for boolean function manipulation. IEEE Trans
Comput C-35(8):677–691
7. Chen Y-A, Bryant RE (1998) Verification of floating-point adders. In: Hu AJ, Vardi MY (eds)
Computer aided verification, Lecture notes in computer science, vol 1427. Springer, Heidelberg
8. Harrison J (2006) Floating-point verification using theorem proving. In: Bernardo M, Cimatti A
(eds) Formal methods for hardware verification, Sixth international school on formal methods
for the design of computer, communication, and software systems, SFM 2006, Lecture notes
in computer science, Bertinoro, Italy, vol 3965. Springer, New York, pp 211–242
9. Hunt WA Jr (1994) FM8501: a verified microprocessor. Springer, London
10. Hunt WA Jr, Reeber E (2005) Formalization of the DE2 language. In: Proceedings of the 13th
conference on correct hardware design and verification methods (CHARME 2005), pp 20–34
11. Hunt WA Jr, Swords S (2009a) Centaur technology media unit verification: case study: floating-
point addition. In: Computer aided verification, Lecture notes in computer science 5643.
Springer, Berlin, pp 353–367
12. Hunt WA Jr, Swords S (2009b) Use of the E language. In: Martin A, O’Leary J (eds) Hardware
design and functional languages ETAPS 2009 Workshop, York, UK
13. IEEE (2005) IEEE standard (1364-2005) for verilog hardware description language
14. IEEE Computer Society (2008) IEEE standard for floating-point arithmetic, IEEE Std 754™-2008 edn
15. Jacobi C, Weber K, Paruthi V, Baumgartner J (2005) Automatic formal verification of
fused-multiply-add FPUs. In: Proceedings of design, automation and test in Europe, vol 2,
pp 1298–1303
16. Jones RB (2002) Symbolic simulation methods for industrial formal verification. Kluwer,
Dordrecht
17. Kaivola R, Ghughal R, Narasimhan N, Telfer A, Whittemore J, Pandav S, Slobodová A,
Taylor C, Frolov V, Reeber E, Naik A (2009) Replacing testing with formal verification
in Intel® Core™ i7 processor execution engine validation. In: Computer aided verification,
Lecture notes in computer science. Springer, Berlin, pp 414–429
18. Price D (1995) Pentium FDIV flaw – lessons learned. IEEE Micro 15(2):88–87
19. Russinoff D (2000) A case study in formal verification of register-transfer logic with ACL2:
the floating point adder of the AMD Athlon (TM) processor. In: Hunt WA Jr, Johnson SD (eds)
Formal methods in computer-aided design, LNCS 1954. Springer, Berlin, pp 22–55
20. Seger CJH, Jones RB, O’Leary JW, Melham T, Aagaard MD, Barrett C, Syme D (2005) An
industrially effective environment for formal hardware verification. IEEE Trans Comput Aided
Des Integr Circuits Syst 24(9):1381
21. Slobodová A (2006) Challenges for formal verification in industrial setting. In: Brim L,
Haverkort B, Leucker M, van de Pol J (eds) Formal methods: applications and technology,
LNCS 4346. Springer, Berlin, pp 1–22
22. University of California at Berkeley, Department of Electrical Engineering and Computer
Science, Industrial Liaison Program. A compact test suite for P754 arithmetic – version 2.0.
23. Visser E (2005) A survey of strategies in rule-based program transformation systems.
J Symbolic Comput 40:831–873
Designing Tunable, Verifiable Cryptographic
Hardware Using Cryptol
1 Introduction
1 Cryptol is a registered trademark of Galois, Inc. in the USA and other countries. This chapter is derived from materials copyrighted by Galois, Inc., and used by permission.
S. Browning ()
Galois, Inc., Portland, OR, USA
e-mail: sally@galois.com
1.1 Outline
2 Cryptol Overview
This section briefly mentions some of Cryptol’s useful features, including available
primitives, features of the type system, and other constructs.
In a functional language, functions are values just like any other expressions. For
example, f where f x = x + 1 is a function that increments its argument. It
can be bound to a variable, which can then be applied:
g = f where f x = x + 1;
y = g 10;
g = \x -> x + 1;
y = g 10;
In Cryptol, \arg -> body is simply syntactic sugar for f where f arg =
body.
Cryptol supports two interesting polymorphic terms: zero and undefined. Both
of these have the following type:
{a} a
This means they can be of any type. When zero is a sequence of bits, then each
element in the sequence is False. Otherwise, each element in zero is itself zero.
For example, zero : [4][3] is a sequence of four 3-bit elements, each of which is all False, or simply this:
[0b000 0b000 0b000 0b000]
Consider, for example, a function that returns the element at index 3 of a sequence:
f x = x @ 3;
Note that Cryptol utilizes 0-based array indices. The type of this function can be
as generic as:
f : {a b} [a]b -> b;
This can be read as “for any types a and b, the function f takes a sequence of width
a, where each element of the sequence is of type b, and returns a single element of
type b.” However, to ensure that we do not try to index outside the sequence, we
should constrain a to be at least 4:
f : {a b} (a >= 4) => [a]b -> b;
This says that, for any types a and b where a is at least 4, the function takes in a
sequence of width a, where each element of the sequence is of type b, and returns a
single element of type b. The type variable b could itself be Bit or some sequence
of arbitrary length.
Type ascriptions can appear almost anywhere within Cryptol code, not just on a
line of their own. This is especially useful to prevent Cryptol from inferring a width
that is too small. For example, at the Cryptol interpreter prompt, we can observe the
following behavior:
Cryptol> 1+1
0x0
This is because 1 defaults to a width of 1, the smallest width necessary to repre-
sent the value. The type of + requires its two arguments and its result to all have the
same width. Therefore, the result has the same width as the inputs, so 0x2 overflows to
0x0. To prevent this, we can ascribe a type to either argument of +, which causes
Cryptol to infer the type of the other argument and the result:
Cryptol> (1:[4]) + 1
0x2
All polymorphic types must be specialized to monomorphic (i.e., specific) types
at compile time. Cryptol will infer types as much as possible, including defaulting
to the minimum width possibly needed to represent a sequence. If it cannot reduce
a function to a monomorphic type, it will refuse to apply or compile it.
The way to force an expression to a monomorphic type is to ascribe a type, either
directly to the expression or somewhere else that causes the compiler to infer the
type of that expression. For example, consider a polymorphic function that incre-
ments its argument:
inc : {a} (a >= 1) => [a] -> [a];
inc x = x + 1;
In the expression inc x, the compiler must be able to infer either the type of
x or the type of inc x in order to know the type of this particular instantiation
of inc. We can explicitly ascribe either of these types using inc x :: [10] or
inc (x :: [10]), or we can place an ascription elsewhere in our code that will
cause the compiler to infer one of these types.
Cryptol allows constant terms to be used as types. So, we could define a and the
type of f as follows:
a = 16;
f : {b} [a]b -> b;
Note that, in this case, the type of f is no longer parameterized over the type
variable a.
Cryptol supports a primitive called width that can be used at both the expression
level and the type level. At the expression level, width x is the number of elements
in x, and it must be known at compile time; if x is a sequence of bits, then width x
is the number of bits needed to represent x. At the type level, width can be used
in a constraint.
So, we can write polymorphic functions whose behavior depends on the width
of the inputs, as in the following example that outputs the least significant bit of xs
when xs has more than 10 bits and otherwise outputs the most significant bit. (See
Sect. 2.1.5 for more about the indexing operators @ and !.)
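A sketch of such a function, using the constraint syntax introduced above (the name f and the width ascription are illustrative), might be:
f : {a} (fin a, a >= 1) => [a] -> Bit;
f xs = if width xs > (10 : [8]) then xs @ 0 else xs ! 0;
For a sequence of bits, xs @ 0 is the least significant bit and xs ! 0 is the most significant bit.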
Cryptol supports type aliases and records. One can define a new type as an alias
to existing types, and it can be parameterized over other types. For example, the
following defines a type Complex x as an alias to (x, x).
type Complex x = (x, x);
We can use the type alias to define a function that multiplies two complex
numbers.
A record contains named fields. We can define a record for complex numbers that
names the real and imaginary fields:
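Sketches of both definitions (the name cMult, the record alias ComplexRec, and its field names are illustrative) might look like:
cMult : (Complex [8], Complex [8]) -> Complex [8];
cMult (x, y) = (a * c - b * d, a * d + b * c)
where {
(a, b) = x;
(c, d) = y;
};
type ComplexRec x = {real : x; imaginary : x};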
2.1.4 Enumerations
Cryptol supports finite and infinite enumerations. Following are some examples:
[1..10] == [1 2 3 4 5 6 7 8 9 10]
[1 3..10] == [1 3 5 7 9]
[1..] == [1 2 3 4 5 6 7 ..]
[10--1] == [10 9 8 7 6 5 4 3 2 1]
[10 6--0] == [10 6 2]
[10--] == [10 9 8 7 6 ...]
Cryptol supports the following index operators: @, @@, !, and !!. The @ operator indexes
from the least significant element and the ! operator indexes from the most significant
element. The least significant element is the rightmost for sequences of bits and the
leftmost for other sequences (note that this is different in Cryptol 2.0). The operators
@@ and !! look up a range of indices. So, the following equalities hold:
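For a sequence of numbers, the behavior is roughly:
[1 2 3 4] @ 0 == 1
[1 2 3 4] ! 0 == 4
[1 2 3 4] @@ [0 1] == [1 2]
[1 2 3 4] !! [0 1] == [4 3]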
All sequence widths are fixed and therefore must be known at compile time. The
width of the result of @@ and !! depends on the value of the second argument;
therefore, the value of the second argument must be known at compile time.
Cryptol also provides the take and drop functions. As their names imply, they take or drop
a certain number of elements from the beginning of a sequence. Because the width of the result depends on the first argument
(number of elements that are taken or dropped), this argument must be constant. For
example, we cannot define the following function, because n is not known:
f n = take(n, [1 2 3 4 5 6 7 8 9 10]);
The split, splitBy, and groupBy functions each convert a sequence into
a two-dimensional sequence. Their inverse is the join function. split and
splitBy behave the same; the splitBy function is provided as an alternative
to split because it allows the user to explicitly choose the size of the first dimen-
sion, rather than forcing the compiler to infer it.
The following equalities show how splitBy, groupBy, join, and
transpose behave.
splitBy (3, [1 2 3 4 5 6 7 8 9 10 11 12]) ==
[[1 2 3 4] [5 6 7 8] [9 10 11 12]]
groupBy (3, [1 2 3 4 5 6 7 8 9 10 11 12]) ==
[[1 2 3] [4 5 6] [7 8 9] [10 11 12]]
join [[1 2 3 4] [5 6 7 8] [9 10 11 12]] ==
[1 2 3 4 5 6 7 8 9 10 11 12]
join [[1 2 3] [4 5 6] [7 8 9] [10 11 12]] ==
[1 2 3 4 5 6 7 8 9 10 11 12]
transpose [[1 2 3 4] [5 6 7 8]] ==
[[1 5] [2 6] [3 7] [4 8]]
transpose [[1 2 3] [4 5 6] [7 8 9]] ==
[[1 4 7] [2 5 8] [3 6 9]]
One enters the Cryptol interpreter by typing “cryptol” at the shell command prompt.
The interpreter provides a typical command-line interface (a read–eval–print loop).
One can execute a shell command from within the interpreter by placing a ! at
the beginning of the command:
Cryptol> !ls
The Cryptol interpreter supports a number of commands, each of which begins
with a colon (:). Following are the most common commands used in the Cryptol
interpreter. For a more detailed discussion of these and other commands, see the
reference manual.
:load <path>
Load all definitions from a .cry file, bringing them into scope.
:set <mode>
Switch to a given mode. Each mode supports a different set of options and per-
forms evaluation on a different intermediate form that is translated from the Abstract
Syntax Tree (AST). See Sect. 2.3 for a discussion of the modes that are useful for
hardware design.
:translate <function> [<path>]
Compile a function to the intermediate form associated with the current mode (see
Sect. 2.3). This function is known as the top-level function. The optional <path>
argument is relative to the current working directory, or can be written as an absolute
path, and should include the desired extension. If <path> is not provided, then
Cryptol uses the outfile setting, saves the file in outdir, and adds the extension
automatically.
:fm <function> [<path>]
Generate a formal model from a function. The path is optional and will be chosen
automatically unless provided by the user. All modes that support the generation of
formal models produce the same format, so functions can be checked for equiva-
lence across modes.
:eq <function or path> <function or path>
Determines whether two functions are equivalent. Requires two arguments, each of
which is either a Cryptol expression in parentheses or the path to a formal model in
quotes.
This section discusses the Cryptol modes that are relevant to hardware design and
verification. Enter a mode by typing :set <mode>, where <mode> is one of the
following modes, at the Cryptol interpreter prompt. When an expression is entered
at the interpreter prompt, it is translated to the intermediate form associated with
the current mode and then evaluated. The translate command produces the in-
termediate form for an expression in the current mode, but does not evaluate it.
Regardless of the mode, all concrete syntax is first translated into an abstract syntax
tree called the IR (intermediate representation). For some modes, the IR is their in-
termediate form, while other modes translate from the IR to their own intermediate
form.
The relevant Cryptol modes are as follows:
symbolic: This performs symbolic interpretation on the IR. This is useful for
prototyping circuits and supports equivalence checking.
LLSPIR: This compiles the IR to Low Level Signal Processing Intermediate
Representation (LLSPIR), inlining all function calls into the top-level function and
performing timing transformations that optimize the circuit. This provides rough
profiling information of the final circuit, including longest path, estimated clockrate,
output rate, latency, and size of circuit. This supports equivalence checking. Rather
than output LLSPIR, the translate command produces a .dot file, a graph of
the LLSPIR circuit that can be viewed graphically.
VHDL: This compiles to LLSPIR and then translates to VHDL. Evaluation is
performed by using external tools to compile the VHDL to a simulation executable
and then running the executable. This is useful for generating VHDL that is
manually integrated into another design, rather than directly synthesizing the result.
FPGA: This compiles to LLSPIR, translates to VHDL, and uses external tools to
synthesize the VHDL to an architecture-dependent netlist. There are several options
in this mode that control what the external tools should do next, and they are most
easily accessed via the following aliases:
FSIM: This compiles the netlist to a low-level structural VHDL netlist suitable
for simulation only. Evaluation is performed by compiling the VHDL to a simula-
tion executable and running the executable. This produces profiling information
that does not take into account routing delays. This reports the maximum theo-
retical clockrate.
TSIM: This is similar to FSIM, but performs map and place-and-route when
generating the VHDL netlist. This process can increase compilation time signifi-
cantly, but produces very accurate profiling, including a true obtainable clockrate.
FPGA Board: This compiles the architecture-dependent netlist to a bitstream
suitable for loading onto a particular FPGA board.
The :eq command is supported in FSIM and TSIM modes. It changes the syn-
thesis target to a Verilog netlist suitable for equivalence checking and compiles this
netlist to a formal model.
The top-level function is compiled to a single VHDL entity whose interface is
determined by the type of the function. The top-level function always has some
variation of the type a -> b. For each of a and b, if the type is a tuple, then each element of that tuple becomes
a port in VHDL; otherwise the type becomes a bit or bit-vector in VHDL. Tuples
nested inside a top-level tuple are appended into single bit-vectors. For example, a
top-level function of type ([8],[8]) -> [16] would yield an entity with two 8-bit input ports and one 16-bit output port.
The following modes support equivalence checking: symbolic, LLSPIR, and FPGA
(FSIM and TSIM). The user can generate a formal model of a function in any of
these modes using the :fm command and check a function for equivalence to a
formal model using :eq. Two formal models generated in two different modes may
also be compared using :eq.
There are two main uses of equivalence checking: (1) to verify that an implemen-
tation is correct with respect to a specification and (2) to verify that Cryptol compiles
a particular function correctly, by comparing a function in LLSPIR or FPGA mode
to the same function in symbolic mode.
Cryptol supports three specific equivalence checkers: jaig, eaig, and abc. The
equivalence checker outputs True if two functions are equivalent and otherwise
outputs False along with a counterexample.
This section discusses the Cryptol language with respect to hardware design. First,
it discusses features in the language that are not supported by the compiler and
other issues that may make it difficult to generate efficient circuits. It also describes
techniques for making space–time tradeoffs and applies these techniques to several
concrete and simple examples. These techniques are the same as those used to
manipulate the AES implementations in Sect. 5.
This section discusses some of the limitations of the Cryptol hardware compiler,
including some techniques for avoiding them.
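One such limitation concerns higher-order functions, which the compiler handles by inlining them away. A hypothetical example (the definitions below are a reconstruction using the names app_tup, inc, and inc_tup from the following discussion) is:
inc x = x + 1;
app_tup (f, x, y) = (f x, f y);
inc_tup (x, y) = app_tup (inc, x, y);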
Then, the compiler inlines inc into app_tup to obtain code that no longer
contains higher-order functions:
inc_tup (x, y) = (inc x, inc y) where inc x = x + 1;
However, the hardware compiler does not support functions that return nonclosed
functions, such as the following:
f : [8] -> ([8] -> [8]);
f x = g where g y = x + y;
Hardware performance can vary drastically based on subtle changes in how se-
quence comprehensions are written. This section explains how to generate efficient
circuits from sequence comprehensions.
Although the following two expressions are semantically equivalent, they com-
pile to significantly different circuits:
take(N, [1..])
[1..N]
The first one generates code to calculate a sequence of numbers at run time.
Furthermore, because the sequence is infinite, it is mapped across time, so each
element is calculated in a different clock cycle. The second expression generates the
enumeration at compile time. When used in larger circuits, this subtle difference can
cause drastic changes in performance.
Consider a function that takes in some fixed number of bytes, N, and pair-wise
multiplies each byte by 1 through N, respectively. A naive implementation might
look like this:
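A hypothetical reconstruction, following the run-time take(N, [1..]) pattern just discussed, is:
prods xs = [| x * i || i <- take (width xs, [1 ..]) || x <- xs |];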
3 Some pure Cryptol functions, which look like combinatorial circuits, may actually map to clocked, sequential FPGA primitives, such as a Block RAM. In this case, they may have a latency of several clock cycles.
In the stream model, sequential circuits are modeled using infinite sequences over
time, so a function in the stream model has some variation of the following type:
[inf]input -> [inf]output;
Each element in the input or output corresponds to some number of clock cycles,
which is the latency of the circuit. To manage state, the user may define a stream
within the circuit that holds the state, as in the definition of lift_step below.
One can lift a combinational circuit into the stream model using the following
function. It takes in the function for a combinational circuit and an infinite stream
and maps the function across all elements in the stream.
lift : {a b} (a -> b, [inf]a) -> [inf]b;
lift (f, ins) = [| f x || x <- ins |];
Note that this may not cause the circuit to become clocked, especially if there is
no stateful information passed from one cycle to the next. It is important that circuits
generated by Cryptol always be clocked, otherwise the synthesis tools cannot make
use of clock constraints to produce useful timing analyses. In general, it is good to
latch onto the inputs and outputs of a circuit by inserting registers after the inputs,
before the outputs, or both. The following function lifts a combinatorial function
into the stream model and places those registers:
lift_and_latch : {a b} (a -> b, [inf]a) -> [inf]b;
lift_and_latch (f, ins) = [undefined] #
[| f x || x <- [undefined]
# ins |];
See Sects. 3.3 and 3.5 to learn how to use registers in Cryptol.
In the step model, sequential circuits are modeled as combinatorial circuits that
are later lifted into the stream model. A function in the step model is defined as a
pure mapping from input and current state to output and next state, so it has some
variation of the following type:
(input, state) -> (output, state)
Consider a sequence, s, in the stream model, so s is infinite and mapped over time.
Assuming an output rate of one element per cycle, we can delay the stream by n
cycles by appending n elements to the beginning of the stream. For example, the
following function outputs its inputs unmodified, but each output is delayed by 1
cycle:
delay : {a} [inf]a -> [inf]a;
delay ins = [undefined] # ins;
And this function delays the output by 2 cycles:
delay2 : {a} [inf]a -> [inf]a;
delay2 ins = [undefined undefined] # ins;
Alternatively, we could define delay2 by applying delay to the input twice:
delay (delay ins).
Note that using zero instead of undefined adds latency to the circuit because
it takes some time to initialize it.
While a “delay” causes its output to occur some number of cycles after the input,
an “undelay” causes its output to occur before the input. One can cause an undelay
to occur using the drop construct:
undelay : {a} [inf]a -> [inf]a;
undelay ins = drop(1, ins);
Undelays are not synthesizable. During an optimization pass, the compiler
pushes delays and undelays around in the circuit, introducing new ones and cancel-
ing delays with adjacent undelays.
Delays and undelays can be used to synchronize data across time. For exam-
ple, the following Fibonacci implementation uses drop to look back in time in the
stream so that we can add the previous two values together:
fib : [inf][32];
fib = [ 1 1 ] # [| x + y || x <- fib
|| y <- drop(1, fib) |];
One can also use delays to produce pipelines. A delay synthesizes to a
register/latch. Section 3.5 shows how to use registers to pipeline circuits.
Cryptol supports two simple but very powerful pragmas that control space-time
tradeoffs in the compiler. The par pragma causes circuitry to be replicated, whereas
the seq pragma causes circuitry to be reused over multiple clock cycles. By default
the compiler replicates circuitry as much as possible in exchange for performance,
and the user overrides this behavior using seq; the par pragma is only useful for
switching back to the default behavior within an instance of seq.
Semantically, both seq and par are the identity, because the types and semantics
of Cryptol have no notion of time:
seq : {n t} [n]t -> [n]t;
seq x = x;
By default, the compiler will unroll and parallelize the sequence comprehension
as much as possible. However, seq (f xs) requests that the circuitry within f
be reused over n cycles. This requires extra flip-flops to synchronize each reuse
of the circuitry within the comprehension, but can reduce overall logic utilization,
resulting in a smaller circuit.
For example, the prods’ function from Sect. 3.1.2 can be defined to reuse one
multiplier over multiple clock cycles:
prods’ xs = seq [| x * i || i <- [1..(width xs)]
|| x <- xs |];
This section presents two examples: one involving a sequence comprehension whose elements can be performed in parallel and one involving a sequence comprehension that must
be performed sequentially. In both examples, the seq pragma is used to map the
sequence over multiple clock cycles, and the performance advantages and disadvan-
tages of doing so are discussed.
For both examples, we assume that there is some function f of type a->b or
b->b. The advantages of using seq are greater when f consumes more area, be-
cause reusing f will then have a greater impact on logic utilization.
Consider the map function that applies some function to every element in a finite
sequence:
map : {n a b} (fin n) => (a -> b, [n]a) -> [n]b;
map (f, xs) = [| f x || x <- xs |];
[Figures: dataflow graphs of the two map implementations. In the fully parallel version, the input is projected into four elements (at offsets 0, 16, 32, and 48), each element is fed to its own instance of f, and the results are appended to form the output. In the seq version, a single instance of f is shared: switches, delays, and a loop pulse route each element to f on successive cycles, and the delayed results are appended at the output.]
However, in a naive definition of iterate, f appears twice. We want f to appear only once (inside
the comprehension), so that when we use the seq pragma f is only instantiated
once. So, we define iterate as follows, where the element at index k is the one
that results from k applications of f:
iterate : b -> [(k+1)]b;
iterate x = outs
where outs = [x] # [| f prev
|| i <- [1..k]
|| prev <- outs |];
Note that each element depends on the previous, so the sequence will be evaluated
sequentially. Rather than instantiating f in parallel, as in the previous section, the
sequence comprehension will be unrolled.
We define two hardware implementations; in the graphs of these implementations
below, k is fixed at 4. The first implementation simply returns the last element in the
sequence produced by iterate. Its definition and graph are provided in Fig. 4. It
uses k copies of f and chains them together sequentially in one clock cycle.
iterate2 uses the seq pragma to request that f be reused each of k clock
cycles. Its definition and graph are provided in Fig. 5.
[Fig. 4: definition and graph of iterate1; the input passes through k = 4 chained instances of f in a single clock cycle.]
iterate1 : b -> b;
iterate1 x = iterate x ! 0;
[Fig. 5: definition and graph of iterate2; a single instance of f is reused over k = 4 clock cycles.]
iterate2 : b -> b;
iterate2 x = seq (iterate x) ! 0;
The result is that iterate1 will have a lower clockrate and will take up more
area, but will have an input/output rate of one element per clock cycle. iterate2
will have a higher clockrate and will take up less area, but it will have an input/output
rate of one element every k cycles and will require extra flip-flops to latch onto the
output of f each cycle.
We can verify that the two implementations are equivalent, using iterate1 as
the reference spec:
:set symbolic
:fm iterate1 "iterate1.fm"
:set LLSPIR
:eq iterate1 "iterate1.fm"
:eq iterate2 "iterate1.fm"
3.5 Pipelining
Sequential circuits in the stream model can be pipelined to increase clockrate and
throughput. One separates a function into several smaller computational units, each
of which is a stage in the pipeline that consumes output from the previous stage and
produces output for the next stage. The stages are synchronized by placing registers
between them.
Pipelining an implementation typically increases the overall latency and area of
a circuit, but can increase the clockrate and total throughput dramatically. Each
stage is a relatively small circuit with some propagation delay. The clockrate is
limited by the stage in the pipeline with the highest propagation delay, whereas the
un-pipelined implementation would be limited by the sum of the propagation delays
of all stages. So, rather than performing one large computation on one input during
a very long clock cycle, an n-stage pipeline performs n parallel computations on n consecutive inputs, each during a much shorter clock cycle.
Consider the following combinatorial circuit, where g and h are arbitrary combina-
torial circuits of type a -> a. This will be our reference specification.
add_g_h_spec : ([c],[c]) -> [c];
add_g_h_spec (a, b) = g(a) + h(b);
The definitions of g and h are not revealed here, because they are irrelevant to
implementing the pipeline unless we want to split g and h themselves into multiple
stages (see add_g_h_4 below). The functions g and h can be polymorphic, but
for simplicity we fix the width:
c = 8;
Below are five separate implementations of add_g_h in the stream model. The
first simply lifts the original spec into the stream model. The remaining are pipelined
implementations. These functions all share the same type:
add_g_h_1, add_g_h_2, add_g_h_3, add_g_h_4 :
[inf]([c],[c]) -> [inf][c];
The definitions and graphs of each implementation are provided in the figures
below.
First, we implement the spec in the stream model as a circuit that consumes
input and produces output on every clock cycle. In this case, g(a) and h(b) are
performed in parallel, but the addition operation cannot be performed until g(a)
and h(b) both finish. Thus, the total propagation delay is the maximum delay of g
and h plus the delay of the addition. See Fig. 6 for the definition and graph of this
circuit.
[Fig. 6: definition and graph of add_g_h_1; g and h are applied to the two inputs in parallel and their results are added, producing an output every cycle.]
[Fig. 7: definition and graph of add_g_h_2; the outputs of g and h are registered (delayed), and the addition is performed on the registered values in the following stage.]
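A sketch of the unpipelined version, which is essentially the spec lifted into the stream model with the lift function from above:
add_g_h_1 ins = lift (add_g_h_spec, ins);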
We can pipeline this circuit by identifying two distinct stages that can execute in
parallel: (1) the applications of g and h and (2) the addition operation. To implement
the pipeline, we evaluate g(a) and h(b) in parallel and store the results in a state
in 1 cycle. On the next cycle, we add the two elements of the state together and
make that the output of the circuit. This adds an extra clock cycle of latency so that
it now takes 2 cycles to perform the entire computation. However, the clockrate is
only limited by maximum delay of g, h, and the addition, whereas in the previous
implementation it was the maximum delay of g and h plus the delay of the addition.
Therefore, each stage of the computation can execute faster, and the throughput
increases.
Note that during any given clock cycle, a pipelined implementation operates on
data associated with two consecutive and unrelated inputs; it applies h and g to the
most recent inputs, and it applies addition to the state which stores g(a) and h(b)
associated with the inputs of the previous cycle.
Our first pipelined implementation, add_g_h_2, is provided in Fig. 7.
In each pipelined implementation above, the second stage only performs a single
addition operation. Therefore, if either g or h has a propagation delay of more than
one addition operation, then the first stage is the bottleneck of the pipeline and
should be split into multiple stages if possible. Suppose g and h are defined as
follows:
g a = (a+5) * 6;
h b = b*7 + 1;
We can implement g and h each as a two-stage pipeline, so our entire circuit has
three stages: (1) perform a+5 and b*7; (2) apply multiplication by 6 and addition
by 1 to the results from stage 1; and (3) add the results from stage 2. This three-stage
pipeline, add_g_h_3, is defined in Fig. 8.
We can define a utility function that lifts any combinatorial circuit into a stage of
a pipeline in the stream model:
stage : {a b} (a -> b, [inf]a) -> [inf]b;
stage (f, ins) = [undefined] # [| f x || x <- ins |];
Equivalence checking is not possible on infinite streams, so we cannot verify
that a sequential circuit is equivalent to its spec for all time. However, we can still
provide some level of assurance that the circuit is correct. First, we verify that the
function always produces the correct output for the first input. For example, to test
add_g_h_1 above, use a function like the following:
test_add_g_h : ([c],[c]) -> [c];
test_add_g_h input = add_g_h_1 ([input] # zero) @ 0;
This function should be equivalent to the spec, which we can verify with the
following command:
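Following the :eq usage from Sect. 3.4, the check would be along these lines:
:set symbolic
:eq test_add_g_h add_g_h_spec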
In order to pipeline the above examples, we had to lift each circuit into the stream
model. This is because we need to have access to a stream that is mapped over time
in order to delay it.
In this section, we introduce a new pragma that allows the user to pipeline com-
binational code without lifting code into the stream model, and show how it can be
applied to the examples above. This allows the user to pipeline code without chang-
ing the structure, yielding a pipelined implementation that more closely resembles
the original spec.
When used as intended, this reg pragma causes a combinational circuit to be
divided into smaller combinational circuits with registers between. Each applica-
tion of reg generates a Delay–Undelay pair in the SPIR AST, so the net delay
through the circuit is exactly 0. This allows us and the compiler to treat the circuit as
combinational and without any notion of time. During the translation from SPIR to
LLSPIR, the circuit becomes sequential as the compiler uses specific rewrite rules
to move the Delays and Undelays around while keeping the circuit synchronized
with respect to time.
Unlike when the user pipelines by appending [undefined], the compiler is
aware of the latency that the reg pragma introduces. The compiler will report the
correct latency of the circuit, and when we lift the circuit into the stream model, we
do not have to drop the undefined outputs from the beginning of the stream; the first
element will be the first valid output.
Using the reg pragma, we can pipeline g and h as combinational circuits
without changing the definition of add_g_h_spec:
g x = reg(reg(x+5) * 6);
h y = reg(reg(y*7) + 1);
Using these definitions of g and h, when add_g_h_spec is lifted into the
stream model, it will be identical to the add_g_h_4 circuit that was manually
pipelined above.
We can also use the reg pragma to pipeline the iterate function introduced
in Sect. 3.4.2:
iterate : b -> [(k+1)]b;
iterate x = outs
where outs = [x] # [| reg(f prev)
|| i <- [1..k]
|| prev <- outs |];
iterate_pipe’ : b -> b;
iterate_pipe’ x = iterate x ! 0;
4 AES Specification
This section develops AES from the description in FIPS-197 [4] while using
Cryptol’s advanced features to make the specification clear and amenable to ver-
ification techniques. While in many respects the correspondence between the spec-
ification and the Cryptol code is easily seen, we have made one major departure.
Namely, while the specification is defined in terms of successively applying permu-
tations to the plaintext to yield the ciphertext, this implementation uses higher-order
functions to compute a single key-dependent pair of permutations that encrypt and
decrypt. The advantage of this is that the decryption function is, by construction,
manifestly the inverse of the encryption function.
Most of the sections and subsections herein are numbered as they are in FIPS-
197. The intent is that this document be read in tandem with that one.
4.1 API
Within FIPS-197 there are three key sizes (128, 192, and 256) and a common block
size (128). These three key sizes correspond to encryption/decryption primitives
AES128.encrypt(key128,plaintext)
AES128.decrypt(key128,ciphertext)
AES192.encrypt(key192,plaintext)
AES192.decrypt(key192,ciphertext)
AES256.encrypt(key256,plaintext)
AES256.decrypt(key256,ciphertext)
where plaintext and ciphertext are 128-bit blocks and key128, key192,
and key256 are 128-, 192-, and 256-bit keys. The return value for all six primitives
has type [128]. Naturally the encrypt and decrypt functions for a given size and
key are inverses of each other.
The Cryptol code for the API follows:
AES128 : {encrypt : (Key(4),Block) -> Block;
decrypt : (Key(4),Block) -> Block};
AES128 = {encrypt = Cipher; decrypt = InvCipher};
4.2 Types
This section loosely corresponds to FIPS-197 3 for concrete types like byte and
word. Cryptol’s more advanced types facilitate more appropriate signatures for
many of the functions defined in FIPS-197.
AES has the familiar data structures byte ([8]) and word ([32]). It also has
block4 ([128]) and state, a four-by-four5 matrix of bytes ([4][4][8]).
In code these are:
type BitsPerByte = 8;
type BitsPerWord = 32;
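Sketches of the corresponding non-parameterized aliases, based on the widths above, would be roughly:
type Byte = [BitsPerByte];
type Word = [BitsPerWord];
type Block = [128];
type State = [4][4]Byte;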
4 While AES insists that the block size is 128, Rijndael [1], on which AES was based, allows block sizes of 128, 160, 192, 224, and 256.
5 If AES had block sizes other than 128, the number of columns would differ from 4.
6 Rijndael allows for key sizes of 128, 160, 192, 224, and 256 which correspond to values of 4, 5, 6, 7, and 8 for Nk. AES insists that the key size be one of 128, 192, and 256.
Functions to construct duos from permutations and involutions are given by:
makeDuo : (Permutation,Permutation) -> Duo;
makeDuo(t,u) = {function = t; inverse = u};
makeDuoFromInvolution : {a} Involution -> Duo;
makeDuoFromInvolution(i) = makeDuo(i,i);
Since code in the sequel is predicated on the components of a Duo being inverses
of each other, ideally we would enforce that somehow in code. The brute force ap-
proach of ensuring that both compositions of the duo’s function and inverse behave
as the identity on the 2^(4·4·8) = 340 282 366 920 938 463 463 374 607 431 768 211
456 inputs is impractical. While we lack a general approach, there are some efforts
to be made in the following as many of the permutations have a structure that allows
for decomposition of the inverse test onto substructures having smaller domains.
4.5.1 Addition
4.6 Multiplication
4.6.1 Multiplication by x
Cryptol’s ease of defining gTimes above means we do not need the xtime()
function defined in the corresponding section of FIPS-197.
aMatrix : State;
aMatrix = transpose(columns)
where {
columns = [| [ 0x02 0x01 0x01 0x03 ] >>> i ||
i <- [0 .. 3] |];
};
And multiplying by a^(-1)(x) from FIPS-197 4.3 corresponds to the function:
invMixColumn : Column -> Column;
invMixColumn(column) = gMatrixVectorProduct
(aInverseMatrix,column);
aInverseMatrix : State;
aInverseMatrix = transpose(columns)
where {
columns = [| [ 0x0e 0x09 0x0d 0x0b ] >>> i ||
i <- [0 .. 3] |];
};
The routines for performing the matrix-vector product follow:
gMatrixVectorProduct : (State,Column) -> Column;
gMatrixVectorProduct(matrix,column) =
join(gMatrixProduct(matrix,split(column)));
In this section, we follow the development of the AES algorithm in FIPS-197 5.
Although the more abstract types State, Permutation, and Duo are used
throughout, all the functions with capitalized names have essentially the same func-
tionality as those in FIPS-197.
The implementation is quite different in that it focuses on composing permuta-
tions to compute a permutation that corresponds to encryption and then applying
that composed permutation to the plaintext to get the ciphertext rather than succes-
sively applying permutations to get intermediate results whose culmination is the
ciphertext. The advantage to composing the permutations is that the Duo type car-
ries the inverse through those permutations, so that it is readily apparent that the
decryption permutation is the inverse of the encryption permutation.
4.9 Cipher
Cipher : {nk} (fin nk,8 >= width(nk),nk >= 1) =>
(Key(nk),Block) -> Block;
Cipher(key,plaintext) = stateToBlock(out)
where {
in = blockToState(plaintext);
roundKeys = KeyExpansion(keyToWords(key));
duos = duosByRound(roundKeys);
out = applyPermutation(composeDuos(duos),in);
};
In it the duos (permutations paired with their inverses) by round are computed,
composed, and applied to the suitably processed plaintext.
duosByRound : {nr} (fin nr,nr >= 1) => RoundKeys(nr) ->
[nr + 1]Duo;
duosByRound(roundKeys)
= [ (initialRoundDuo(roundKeys @ 0)) ]
# [| (medialRoundDuo(roundKey))
|| roundKey <- roundKeys @@ [1 .. width(roundKeys) - 2] |]
# [ (finalRoundDuo(roundKeys ! 0)) ];
SubBytes : Permutation;
SubBytes(state) = mapBytes(f,state)
where {
f(b) = SBox @ b;
};
SBox : [256]Byte;
SBox = [
0x63 0x7c 0x77 0x7b 0xf2 0x6b 0x6f 0xc5 0x30 0x01 0x67 0x2b
0xfe 0xd7 0xab 0x76 0xca 0x82 0xc9 0x7d 0xfa 0x59 0x47 0xf0
0xad 0xd4 0xa2 0xaf 0x9c 0xa4 0x72 0xc0 0xb7 0xfd 0x93 0x26
0x36 0x3f 0xf7 0xcc 0x34 0xa5 0xe5 0xf1 0x71 0xd8 0x31 0x15
0x04 0xc7 0x23 0xc3 0x18 0x96 0x05 0x9a 0x07 0x12 0x80 0xe2
0xeb 0x27 0xb2 0x75 0x09 0x83 0x2c 0x1a 0x1b 0x6e 0x5a 0xa0
0x52 0x3b 0xd6 0xb3 0x29 0xe3 0x2f 0x84 0x53 0xd1 0x00 0xed
0x20 0xfc 0xb1 0x5b 0x6a 0xcb 0xbe 0x39 0x4a 0x4c 0x58 0xcf
0xd0 0xef 0xaa 0xfb 0x43 0x4d 0x33 0x85 0x45 0xf9 0x02 0x7f
0x50 0x3c 0x9f 0xa8 0x51 0xa3 0x40 0x8f 0x92 0x9d 0x38 0xf5
0xbc 0xb6 0xda 0x21 0x10 0xff 0xf3 0xd2 0xcd 0x0c 0x13 0xec
0x5f 0x97 0x44 0x17 0xc4 0xa7 0x7e 0x3d 0x64 0x5d 0x19 0x73
0x60 0x81 0x4f 0xdc 0x22 0x2a 0x90 0x88 0x46 0xee 0xb8 0x14
0xde 0x5e 0x0b 0xdb 0xe0 0x32 0x3a 0x0a 0x49 0x06 0x24 0x5c
0xc2 0xd3 0xac 0x62 0x91 0x95 0xe4 0x79 0xe7 0xc8 0x37 0x6d
0x8d 0xd5 0x4e 0xa9 0x6c 0x56 0xf4 0xea 0x65 0x7a 0xae 0x08
0xba 0x78 0x25 0x2e 0x1c 0xa6 0xb4 0xc6 0xe8 0xdd 0x74 0x1f
0x4b 0xbd 0x8b 0x8a 0x70 0x3e 0xb5 0x66 0x48 0x03 0xf6 0x0e
0x61 0x35 0x57 0xb9 0x86 0xc1 0x1d 0x9e 0xe1 0xf8 0x98 0x11
0x69 0xd9 0x8e 0x94 0x9b 0x1e 0x87 0xe9 0xce 0x55 0x28 0xdf
0x8c 0xa1 0x89 0x0d 0xbf 0xe6 0x42 0x68 0x41 0x99 0x2d 0x0f
0xb0 0x54 0xbb 0x16 ];
rcon : [10]Word;
rcon = [| zero # p || p <- take(10,ps) |]
where {
ps = [ 0x01 ] # [| gTimes(p,0x02) || p <- ps |];
};
The other functions needed by KeyExpansion are straightforward:
SubWord : Word -> Word;
SubWord(w) = join([| (SBox @ b) || b <- split(w) |]);
Due to the Duo data structure, the InvCipher function below reads almost exactly
like the Cipher function of 4.9.
InvCipher : {nk} (fin nk,8 >= width(nk),nk >= 1) =>
(Key(nk),Block) -> Block;
InvCipher(key,ciphertext) = stateToBlock(out)
where {
in = blockToState(ciphertext);
roundKeys = KeyExpansion(keyToWords(key));
duos = duosByRound(roundKeys);
out = applyInversePermutation(composeDuos(duos),in);
};
InvSubBytes : Permutation;
InvSubBytes(state) = mapBytes(f,state)
where {
f(b) = InverseSBox @ b;
};
InverseSBox : [256]Byte;
InverseSBox = [
0x52 0x09 0x6a 0xd5 0x30 0x36 0xa5 0x38 0xbf 0x40 0xa3 0x9e
0x81 0xf3 0xd7 0xfb 0x7c 0xe3 0x39 0x82 0x9b 0x2f 0xff 0x87
0x34 0x8e 0x43 0x44 0xc4 0xde 0xe9 0xcb 0x54 0x7b 0x94 0x32
0xa6 0xc2 0x23 0x3d 0xee 0x4c 0x95 0x0b 0x42 0xfa 0xc3 0x4e
0x08 0x2e 0xa1 0x66 0x28 0xd9 0x24 0xb2 0x76 0x5b 0xa2 0x49
0x6d 0x8b 0xd1 0x25 0x72 0xf8 0xf6 0x64 0x86 0x68 0x98 0x16
0xd4 0xa4 0x5c 0xcc 0x5d 0x65 0xb6 0x92 0x6c 0x70 0x48 0x50
0xfd 0xed 0xb9 0xda 0x5e 0x15 0x46 0x57 0xa7 0x8d 0x9d 0x84
0x90 0xd8 0xab 0x00 0x8c 0xbc 0xd3 0x0a 0xf7 0xe4 0x58 0x05
0xb8 0xb3 0x45 0x06 0xd0 0x2c 0x1e 0x8f 0xca 0x3f 0x0f 0x02
0xc1 0xaf 0xbd 0x03 0x01 0x13 0x8a 0x6b 0x3a 0x91 0x11 0x41
0x4f 0x67 0xdc 0xea 0x97 0xf2 0xcf 0xce 0xf0 0xb4 0xe6 0x73
0x96 0xac 0x74 0x22 0xe7 0xad 0x35 0x85 0xe2 0xf9 0x37 0xe8
0x1c 0x75 0xdf 0x6e 0x47 0xf1 0x1a 0x71 0x1d 0x29 0xc5 0x89
0x6f 0xb7 0x62 0x0e 0xaa 0x18 0xbe 0x1b 0xfc 0x56 0x3e 0x4b
0xc6 0xd2 0x79 0x20 0x9a 0xdb 0xc0 0xfe 0x78 0xcd 0x5a 0xf4
0x1f 0xdd 0xa8 0x33 0x88 0x07 0xc7 0x31 0xb1 0x12 0x10 0x59
0x27 0x80 0xec 0x5f 0x60 0x51 0x7f 0xa9 0x19 0xb5 0x4a 0x0d
0x2d 0xe5 0x7a 0x9f 0x93 0xc9 0x9c 0xef 0xa0 0xe0 0x3b 0x4d
0xae 0x2a 0xf5 0xb0 0xc8 0xeb 0xbb 0x3c 0x83 0x53 0x99 0x61
0x17 0x2b 0x04 0x7e 0xba 0x77 0xd6 0x26 0xe1 0x69 0x14 0x63
0x55 0x21 0x0c 0x7d ];
The definition style below is encrypt rather than decrypt biased, but the symmetry
of the Duos, the shrewd use of reverse, and the ordering of the statements make
the algorithm of EqInvCipher presented in FIPS-197 readily apparent.
eqFinalRoundDuo : RoundKey -> Duo;
eqFinalRoundDuo(roundKey)=makeAddRoundKeyDuo(roundKey);
Following are some convenience functions for operating on data of type State in
a Byte-by-Byte or Column-by-Column fashion.
mixColumnsDuo : Duo;
mixColumnsDuo = makeDuo(MixColumns,InvMixColumns);
shiftRowsDuo : Duo;
shiftRowsDuo = makeDuo(ShiftRows,InvShiftRows);
subBytesDuo : Duo;
subBytesDuo = makeDuo(SubBytes,InvSubBytes);
makeAddRoundKeyDuo : RoundKey -> Duo;
makeAddRoundKeyDuo(roundKey) =
makeDuoFromInvolution(makeAddRoundKeyInvolution
(roundKey));
a1_test : Bit;
a1_test = ws == ws’
where {
roundKeys : RoundKeys(10);
roundKeys =
KeyExpansion(keyToWords
(0x2b7e151628aed2a6abf7158809cf4f3c));
ws’ = join([| [| join(reverse(row))
|| row <- transpose(roundKey) |]
|| roundKey <- roundKeys |]);
ws : [44]Word;
ws = [ 0x2b7e1516 0x28aed2a6 0xabf71588 0x09cf4f3c
0xa0fafe17 0x88542cb1 0x23a33939 0x2a6c7605
0xf2c295f2 0x7a96b943 0x5935807a 0x7359f67f
0x3d80477d 0x4716fe3e 0x1e237e44 0x6d7a883b
0xef44a541 0xa8525b7f 0xb671253b 0xdb0bad00
0xd4d1c6f8 0x7c839d87 0xcaf2b8bc 0x11f915bc
The permutation composition style used herein does not lend itself to verifying the
intermediate steps as in FIPS-197 Appendix B. We content ourselves with the end-
to-end test:
b_test : Bit;
b_test = output == output’
where {
key : Key(4);
key = 0x2b7e151628aed2a6abf7158809cf4f3c;
plaintext : Block;
plaintext = 0x3243f6a8885a308d313198a2e0370734;
ciphertext : Block;
ciphertext = AES128.encrypt(key,plaintext);
output’ : State;
output’ = blockToState(ciphertext);
output : State;
output = [ [ 0x39 0x02 0xdc 0x19 ]
[ 0x25 0xdc 0x11 0x6a ]
[ 0x84 0x09 0x85 0x0b ]
[ 0x1d 0xfb 0x97 0x32 ] ];
};
Evaluating b_test returns True as one would expect.
Again the permutation composition style used herein does not lend itself to verifying
the intermediate steps as in FIPS-197 Appendix C. We content ourselves with the
following end-to-end tests.
c1_test_1: Bit;
c1_test_1= AES128.encrypt(0x000102030405060708090a0b0c0d0e0f,
0x00112233445566778899aabbccddeeff)
== 0x69c4e0d86a7b0430d8cdb78070b4c55a;
c1_test_2: Bit;
c1_test_2= AES128.decrypt(0x000102030405060708090a0b0c0d0e0f,
0x69c4e0d86a7b0430d8cdb78070b4c55a)
== 0x00112233445566778899aabbccddeeff;
c1_test_3: Bit;
c1_test_3= EqInvCipher(0x000102030405060708090a0b0c0d0e0f : [128],
0x69c4e0d86a7b0430d8cdb78070b4c55a)
== 0x00112233445566778899aabbccddeeff;
Evaluating (c1_test_1,c1_test_2,c1_test_3) returns the expression
(True, True, True).
5 AES Implementations
This section documents the process of refining the AES specification described in
Sect. 4 into synthesizable implementations of AES-128 and AES-256. First, we re-
move constructs that are unsupported in the hardware compiler, replace many of
the functions with more efficient versions, and specialize the algorithm to two im-
plementations, one for AES-128 and one for AES-256. Then, we define a second
pair of implementations by defining a function that performs one round of key ex-
pansion and encryption for each algorithm and using this function to combine the
KeyExpansion and Cipher functions into one top-level function.
We adjust the second pair of implementations to meet space and time require-
ments. We provide two specific implementations of each algorithm: (1) a smaller
implementation that uses the seq pragma to reuse the same key expansion and en-
cryption circuitry over multiple clock cycles and (2) a high-throughput pipelined
implementation.
We then improve the AES-128 implementation by replacing the round of encryp-
tion with an equivalent and more efficient T-Box implementation and use the seq
pragma again to produce a small circuit from the T-Box implementation.
In Sect. 5.2.2, we show how to use the new reg pragma to pipeline the AES-128
implementations in this section.
The resulting implementations are checked for equivalence with the original spec
in symbolic, LLSPIR, and FPGA modes. We also present performance results with
and without Block RAMs in both LLSPIR mode and TSIM mode.
5.1 Implementation #1
In this section, we replace many of the functions from the reference specification
with much more efficient versions and specialize the encryption algorithm. Specifi-
cally, we perform the following changes:
Replace gTimes with a more efficient implementation from the Rijndael spec,
then rewrite using stream recursion
Eliminate gPower, replacing with specialized gPower2
Eliminate gTimes, replacing with specialized gTimes2 and gTimes3
Replace mixColumn with much more efficient implementation, through inlin-
ing and static evaluation
Specialize KeyExpansion to AES-128 and AES-256 (only AES-128 is
shown)
These changes result in an AES implementation with very reasonable perfor-
mance (see Sect. 5.5).
The following are not supported by the Cryptol hardware compiler and therefore
should be removed when producing an implementation from the spec:
Recursive functions, which can usually be replaced by stream recursion
Functions that return nonclosed functions
There are no recursive functions in the spec. However, the Cipher Duo record
contains higher-order functions. We can remove the use of higher-order func-
tions and inline into Cipher the functions that use Duo, yielding a first-order definition of Cipher.
Inlining and statically evaluating the aMatrix constant from Sect. 4 gives
[[0x2 0x3 0x1 0x1] [0x1 0x2 0x3 0x1] [0x1 0x1 0x2 0x3]
[0x3 0x1 0x1 0x2]]
Given that the argument to mixColumn is [y0 y1 y2 y3], then the matrix
product of the above aMatrix and this column argument is simply:
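Written with the specialized multipliers gTimes2 and gTimes3 (sketched below), the product is roughly:
[ (gTimes2 y0 ^ gTimes3 y1 ^ y2 ^ y3)
(y0 ^ gTimes2 y1 ^ gTimes3 y2 ^ y3)
(y0 ^ y1 ^ gTimes2 y2 ^ gTimes3 y3)
(gTimes3 y0 ^ y1 ^ y2 ^ gTimes2 y3) ]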
The original gTimes is now used only by gPower. However, gPower is called
only by Rcon and always with a constant first argument, 2. Therefore, we could
rewrite gPower as gPower2:
For each replaced function above, we can use the equivalence checker to verify
that the new version is equivalent to the original. For example, the following com-
mands can be used to verify that gTimes2 and gTimes3 are correct with respect
to the spec for gTimes:
:set symbolic
:eq (\x -> gTimes(2,x)) gTimes2
:eq (\x -> gTimes(3,x)) gTimes3
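The specialized multipliers themselves are simple; sketches consistent with the standard AES xtime operation would be:
gTimes2 : Byte -> Byte;
gTimes2 b = if b ! 0 then (b << 1) ^ 0x1b else b << 1;
gTimes3 : Byte -> Byte;
gTimes3 b = gTimes2 b ^ b;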
To produce a more efficient implementation of KeyExpansion, we specialize
it to a 128-bit key size by making two minor changes. First, we replace nextWord
with nextWord_128, which does not have to check if the key size is more than
6. Second, we replace the infinite intermediate sequences, ws and [Nk ..] with
finite sequences.
In the specification, KeyExpansion is implemented using an infinite sequence
of words, ws, that is defined by drawing i from an infinite sequence, [Nk ..].
KeyExpansion then draws a finite number of elements from that sequence using
take. The hardware compiler can produce a more efficient implementation if we
draw i from [4..43], so that its contents can be evaluated at compile time, and
implement ws as a finite sequence. The new implementation is defined in Fig. 9.
The same technique is used to define key expansion for AES-256.
Finally, Cipher128 is written to use the new definitions (see Fig. 10), and we
lift this definition into the stream model as follows:
Cipher128_stream : [inf]([128],[128]) -> [inf][128];
Cipher128_stream ins = [| Cipher128 in || in <- ins |];
Fig. 9 KeyExpansion128
Fig. 10 Cipher128
5.2 Implementation #2
Rather than use nextWord to produce one word at a time, we can write a func-
tion that produces four words at a time. This function is specialized to AES-128 in
Fig. 11 and is equivalent to four sequential applications of nextWord.
It is important to note that the SubWord and RotWord operations can be ex-
changed. This can be checked as follows:
:eq (\x -> SubWord (RotWord x)) (\x -> RotWord (SubWord x))
rounds = [init] #
[| oneRound128 (round, state_and_key)
|| state_and_key <- rounds
|| round <- [1..11]
|];
(final_state, dont_care) = rounds ! 0
};
Fig. 13 Cipher128’
When pipelining AES in the following sections, exchanging these operations may
produce a faster implementation.
We can now implement the oneRound128 function that performs a single
round of encryption and key expansion. It takes in the previous state and the key
for this round and produces the key for the next round and a new state that uses that
key from this round. This function is provided in Fig. 12.
Our new Cipher function simply applies the appropriate “oneRound” function
for each of the 11 rounds. This is defined in Fig. 13.
To prevent the Cipher function from being laid out over time, we lift it into the
stream world:
Cipher128’_stream: [inf]([128],[128]) -> [inf][128];
Cipher128’_stream ins= [| Cipher128’ in || in <- ins|];
This sort of function should be used whenever translating to a hardware imple-
mentation.
In this section, we optimize the implementation from Sect. 5.2 for area by reusing
one round over multiple clock cycles. Cipher128’ was written with this goal in
mind, so we can reuse the oneRound128 function and all we have to do is insert
the seq pragma as shown in Fig. 14. The seq pragma has the same effect here as
it did in Sect. 3.4.
To implement in hardware, this function should be lifted into the stream model:
Cipher128_seq_stream : [inf]([128],[128]) -> [inf][128];
Cipher128_seq_stream ins = [| Cipher128_seq in || in <- ins |];
In this section, we use the reg pragma to pipeline the implementation from
Sect. 5.2. All we do is apply the reg pragma to each round. The function is de-
fined in Fig. 15.
We use the same method as before to conditionally insert registers after each
stage:
add_reg (i, x) = if check_stage i then (reg x) else x;
d = [| if (round == 11)
then [(t0@1) (t1@2) (t2@3) (t3@0)]
else t0 ˆ t1 ˆ t2 ˆ t3
where {
t0 = T0 (state @ 0 @ (j+0));
t1 = T1 (state @ 1 @ (j+1));
t2 = T2 (state @ 2 @ (j+2));
t3 = T3 (state @ 3 @ (j+3));
}
|| j <- [0 .. 3] |];
};
In this section, we use the seq pragma to optimize the T-Box implementation for
area, reusing the round each cycle. We use the same technique as in Sects. 3.4 and
5.2.1.
The implementation reuses the oneRound128_Tbox function from the previ-
ous section and is defined in Fig. 18.
rounds = [init] #
[| oneRound128_Tbox (round, state_and_key)
|| state_and_key <- rounds
|| round <- [1..11] |];
(final_state, dont_care) = rounds ! 0
};
rounds = [init] #
seq [| oneRound128_Tbox (round, state_and_key)
|| state_and_key <- rounds
|| round <- [1..11] |];
(final_state, dont_care) = rounds ! 0
};
In this section, we use the reg pragma to pipeline the AES-128 T-Box implemen-
tation. We have decided to pipeline each round to five stages and target a clockrate
of about 400 MHz (2.5 ns).
We should attempt to implement the same number of stages for both “nextKey”
and “oneRound,” because they execute in parallel. If we do not use the same number
of stages in each function, the compiler will simply insert registers to keep the cir-
cuits synchronized, but the registers will not be optimally placed. Also, we should
use Block RAMs with 2-cycle latency; they have a higher clockrate, and the extra
latency simply allows us to do more in parallel with the Block RAM. Therefore, to
obtain five stages per round we should insert three more registers into both “nex-
tKey” and “oneRound.”
Each Block RAM behaves as a register, so the SBox and T-Box operations al-
ready define some of the pipeline stages. We are defining a pipeline with many small
stages, so the latency of Block RAMs will dominate. Therefore, the input to a Block
RAM should never be calculated in the same stage as the Block RAM because this
would increase the delay of that stage. For example, we should place a register
between the RotWord and SubWord operations so that they are performed in sep-
arate stages. Alternatively, we can exchange these operations as this may result in a
more balanced pipeline.
Because compiling to LLSPIR is relatively quick and provides us with an
estimated clockrate, one can experiment with many different combinations and
placements of reg in search of the fastest possible clockrate.
The “nextKey” circuit performs the following, sequentially:
1. The following, in parallel:
(a) SubWord and RotWord
(b) Rcon and xor with w0
2. Xor the previous two results
3. Xor with w1
4. Xor with w2
5. Xor with w3
Because it is implemented using a Block RAM, the SubWord operation will take
2 cycles. We can check the propagation delay of the other operations by executing
the following commands in LLSPIR mode:
rounds = [init] #
[| oneRound128_Tbox_reg (round, state_and_key)
|| state_and_key <- rounds
|| round <- [1..11] |];
(final_state, dont_care) = rounds ! 0
};
In this section, we verify that the implementations in Sects. 5.1 and 5.2 are equiva-
lent to the reference specifications for AES-128 and AES-256.
First, we generate formal models of the reference specification in symbolic
mode:
:l AES_Revisited.tex
:set symbolic
:fm (Cipher : ([128],[128]) -> [128]) "aes128-spec.fm"
:fm (Cipher : ([256],[128]) -> [128]) "aes256-spec.fm"
:set symbolic
:fm Cipher128 "aes128-imp1.fm"
:fm Cipher256 "aes256-imp1.fm"
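The corresponding equivalence checks, following the :eq usage shown earlier, would be along these lines:
:eq "aes128-spec.fm" "aes128-imp1.fm"
:eq "aes256-spec.fm" "aes256-imp1.fm"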
5.5 Performance
6 Conclusion
We have introduced the types and constructs of the Cryptol language, as well
as the Cryptol interpreter, and in Sect. 3 we have provided several examples of
how to use the language and toolchain to specify, implement, refine, and verify
hardware circuits. We then used these techniques to produce and refine several implementations of AES-128 and AES-256.
References
1. Daemen J, Rijmen V (2000) The block cipher Rijndael. In: Quisquater J-J, Schneier B (eds)
Smart card research and applications, LNCS 1820. Springer, Berlin, pp 288–296
2. Galois, Inc. (2008a) Cryptol programming guide. http://www.galois.com/files/Cryptol/Cryptol
Programming Guide.pdf
3. Galois, Inc. (2008b) From cryptol to FPGA: a tutorial. http://www.galois.com/files/Cryptol/
Cryptol Tutorial.pdf
4. National Institute of Standards and Technology (2001) Federal information processing stan-
dards publication 197: advanced encryption standard (AES). http://csrc.nist.gov/publications/
fips/fips197/fips-197.pdf
Verifying Pipelines with BAT
1 Introduction
In this chapter, we show how to use the Bit-level Analysis Tool (BAT) [4, 20–22]
for hardware verification. The BAT system has been used in the analysis of systems
ranging from cryptographic hash functions to machine code to biological systems
to large component-based software systems [18, 23, 24], but here we focus on one
application: verification of pipelined hardware systems. This chapter brings together
results from previous work in a self-contained way, and is intended as a starting
point for someone who is interested in using automatic formal verification tools to
prove the correctness of hardware or low-level software. The structure and examples
in this chapter are based on previous work by the authors that showed how to use
the ACL2 theorem proving system [8] to model and verify pipelined machines [12].
Hardware systems are ubiquitous and are an integral part of safety-critical and
security-critical systems. Ensuring the correct functioning of hardware is therefore
of paramount importance as failure of deployed systems can lead to loss of life
and treasure. A well-known example is the bug found in the floating-point
division (FDIV) unit of the Intel Pentium processor, which led to a 475 million
dollar write-off by Intel. Estimates show that a similar bug in the current generation
of Intel processors would cost the processor company about 12 billion dollars [1].
One of the key optimizations used in hardware systems is pipelining. Pipelining
is used extensively in hardware designs, including both mainstream and embedded
microprocessor designs, multi-core systems, cache coherence protocols, memory
interfaces, etc. Therefore, the verification of pipelines is an important, ubiquitous
problem in hardware verification and has received a lot of attention from the research
community.
P. Manolios ()
Northeastern University, Boston, MA, USA
e-mail: pete@ccs.neu.edu
Pipelines are essentially assembly lines. Just like it is much more efficient to
build cars using an assembly line, it is also much more efficient to break up the
execution of processor instructions into well-defined stages, e.g., fetch, decode, and
execute. In this way, at any point in time there can be multiple instructions being ex-
ecuted simultaneously, in parallel and in various stages of completion. Furthermore,
in order to extract maximum performance from pipelines, synchronization between
the various instructions being executed in parallel is required. This synchronization
between instructions, memories, and register files is provided by complex pipeline
controllers. This added complexity makes the design and verification of pipelines a
challenging problem.
We use the BAT system [22] for pipelined machine verification for several rea-
sons. The BAT specification language [21] is designed as a synthesizable HDL with
formal semantics and can therefore be used to construct bit-level pipelined machine
models amenable to formal analysis. The decision procedure incorporated in BAT
includes a memory abstraction algorithm and memory rewriting techniques and can
therefore deal with verification problems that involve large memories [20]. Also, the
BAT decision procedure uses an efficient circuit to CNF compiler, which drastically
improves efficiency [4, 19].
The notion of correctness that we use for pipelined machines is based on Well-
founded Equivalence Bisimulation (WEB) refinement [10, 11]. There are several
attractive properties of refinement. The instruction set architecture (ISA) is used as
the specification. Both safety and liveness are accounted for. The refinement map
(a function used to relate states of the pipelined machine with states of its ISA) is a
parameter of the framework and can therefore be studied and optimized to improve
efficiency [7, 13, 15, 16]. Refinement is a compositional notion, a property that can
be exploited to deal with scalability issues [14].
The rest of the chapter is organized as follows. Section 2 describes the BAT
system, including the BAT specification language and the BAT decision procedure.
Section 3 describes a three-stage pipelined machine example and its ISA, and also
shows how to model these machines using BAT. In Sect. 4, we provide an overview
of the notion of correctness we use, which is based on refinement. Section 5 shows
how to verify pipelines with the BAT system, using the example of the three-stage
pipeline. Section 6 provides an overview of techniques to cope with the efficiency
and scalability issues that arise when reasoning about more complex pipelined sys-
tems. Conclusions are given in Sect. 7.
BAT is a system for solving verification problems arising from hardware,
software, and security. BAT is designed to be used as a bounded model checker
and k-induction engine for register transfer level (RTL) models. At the core of
the system is a decision procedure for quantifier-free formulas over the exten-
sional theory of fixed-size bit-vectors and fixed-size bit-vector arrays (memories).
BAT also incorporates a specification language that can be used to model hardware
designs at the word-level and to express linear temporal logic (LTL) properties.
In this section, we describe the BAT specification language and provide a brief
overview of the BAT decision procedure.
The BAT specification language is strongly typed and includes a type inference
algorithm. BAT takes as input a machine description and LTL specification, and tries
to either find a counterexample requiring no more steps than a user provided upper
bound, or tries to prove no such counterexample exists. While BAT accepts various
file formats, a commonly used format for the machine specification requires the
following four sections: :vars, :init, :trans, and :spec. These correspond
to the declaration of the variables making up the machine state, a Boolean formula
describing valid initial states, a Boolean formula describing the transition relation,
and an LTL formula giving the desired property, respectively. In this section, we
describe the main features of the language. For a complete description, see the BAT
Web page [21].
The BAT language is strongly typed. Variables are either bit-vectors or memories.
The :vars section is a list of variable declarations that specify the types of each
variable. Each variable declaration is either (1) A symbol corresponding to the vari-
able name, in which case the variable is a bit-vector of one bit (e.g., x). (2) A list
with two elements, a variable name and a positive integer, in which case the variable
is a bit-vector of the given number of bits (e.g., (x 4) is a bit-vector of 4 bits). (3)
A list with three elements, a variable name and two positive integers, specifying
that the variable is a memory with the given number of words and word size (e.g.,
(x 8 4) is a memory with eight 4-bit words).
A :vars section then looks like this: (:vars (x 2) y (z 8 16)). In
addition to variables, there are bit-vectors and integer constants. Bit-vectors can be
given in binary, hex, or octal. For example, numbers in binary start with 0b and are
followed by an arbitrary sequence of 0s and 1s.
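For instance, 0b1011 is a bit-vector constant written in binary; presumably its width is given by the number of digits written, four bits in this case.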
Integers are represented by signed bit-vectors. The size of the bit-vector is deter-
mined by BAT’s type-inferencing mechanism. The appropriate size is determined
by the context in which the integer is used. For example, if x is a 4-bit bit-vector,
then if we bitwise-and it with 3, it is written as (and x 3). Then in this context,
3 is represented by the bit-vector 0b0011, since bit-vectors that are bitwise-anded
together must be of the same type. The only restriction in this case is that the integer
must be representable in signed binary notation (2’s complement) in the number of
bits dictated by the context.
2.1.2 Primitives
BAT supports primitives for Boolean, arithmetic, and memory operations. All the
basic bitwise Boolean functions are provided. The functions and, or, and xor
all take an arbitrary number of arguments and perform the appropriate operations.
In addition, -> (implication), and <-> (iff) take exactly two arguments. The not
function takes exactly one argument. All of these functions take bit-vectors of the
same size, and return a bit-vector of that size.
Arithmetic operations include =, <, >, <= (less than or equal to), >= (greater than
or equal to), add, sub, inc, and dec. BAT contains bit-vector related functions
as well. These include different kinds of shift and rotate operations, concatenation,
and (signed) extension. For example, the cat function concatenates bit-vectors,
returning a bit-vector with size equal to the sum of the inputs to the cat function.
The most significant bits are to the left so the earlier arguments to the cat formula
are more significant than the later arguments.
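Under this convention, a call such as (cat 0b10 0b01) would presumably yield the 4-bit vector 0b1001, with the first argument occupying the high-order bits.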
Memories have to be treated with care because the obvious translation that con-
verts formulas involving memories to propositional logic leads to an exponential
blow-up. The BAT system introduced a decision procedure for memories that leads
to greatly reduced SAT problems [20]. The memory-specific BAT functions are get
and set. The get function takes a memory and a bit-vector and returns the word
of the memory addressed by the bit-vector. The set function takes a memory and
two bit-vectors. It returns a memory equivalent to the original memory except that
the word addressed by the first bit-vector is set to the value of the second bit-vector.
In both cases the size of the addresses must be equal to the ceiling of the log of
the number of words in the memory, and in the case of the set the size of the last
argument must be equal to the word size of the memory. Memories can be directly
compared for equality using = (type checking makes sure that they have the same
type, i.e., that they have the same word size and the same number of elements). In a
similar way, the type of an if can be a memory (type checking again checks that
the then and else cases have the same type).
2.1.3 Expressions
BAT supports several constructs to build bit-vector and bit-vector memory expres-
sions. Conditional statements include if and cond. The if statement takes three
arguments: the first is the test and must be a 1-bit bit-vector. The second and third
arguments are the then and else clauses respectively and must be the same type.
Cond statements are convenient for expressing a series of if statements. For ex-
ample, a cond statement that returns -1 if x < y, 1 if x > y, and 0 otherwise might
be written as follows (a sketch, assuming x and y are signed bit-vectors of the same width):
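(cond ((< x y) -1)
      ((> x y) 1)
      (1b1 0))
The final clause uses the 1-bit constant 1b1 as the default test, the same convention
followed in the machine definitions later in this chapter.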
BAT provides a way to return multiple values from an expression (this becomes
helpful in conjunction with user-defined functions). This is done simply by wrap-
ping a sequence of values in an mv form:
(mv (+ a b) (set m x y))
This returns both the sum and the result of the set form.
The most complex construct of the BAT language is local. In its simplest form,
it operates like a let* in Lisp. The following implementation of an ALU slice
demonstrates one of the more complex abilities of the local.
(local ((nb (xor bnegate b))
(res0 (and a nb))
(res1 (or a nb))
(((cout 1) (res2 1)) (fa a nb cin)))
(cat cout (mux-4 res0 res1 res2 1u op))))
Here, the last binding binds variables cout and res2 simultaneously. It declares
each to be 1 bit, and binds them to the 2-bit output of the fa function (a user-
defined function). This splits up the output of the fa function between cout and
res2 according to their sizes. Another feature of the local is illustrated by the
following.
(local ((c 2))
(((t0 (c 0))
(alu-slice (a 0) (b 0) bnegate bnegate op))
((t1 (c 1))
(alu-slice (a 1) (b 1) t0 bnegate op))
(zero (= c 0)))
(cat t1 c zero))
Here an extra argument appears at the beginning of the local. This is a list of
bit-vector variable declarations. The idea is that these variables can be bound by bits
and pieces through the bindings. The first binding binds several values, as in the last
example. However, in this example the second value being bound is not a variable,
but a bit of the variable, c, declared in the first argument to the local. Likewise,
the other bit of c is set in the second binding. It is also possible to set a sequence of
bits in a similar way by giving two integers: ((c 0 1) (and a b)).
Finally, it is possible to set multiple values to the result of an mv form:
(local ((aa mm) (mv (inc a) (set m a b)))
(set mm c aa))
Here the types of the variables being bound are inferred from the type of the
mv form.
BAT takes specifications in one of three formats. The first is a machine descrip-
tion for bounded model checking. A file in this format contains three items. The first
is the keyword “:machine” (without the quotes). The second is the machine descrip-
tion (described above). The third is a natural number which represents the number
of steps you want BAT to check the property for.
The other two formats are very similar. They are used to check if a formula holds
for some values of the variables (existential), or if a formula holds for all values of
the variables (universal). These files contain four items. The first is either “:exists”
or “:forall” (without the quotes). The next is a list of variable declarations for the
formula. The third is a list of function definitions for use in the formula (this can be
( ) if there are no functions). The final argument is the formula itself, which is over
the variables and functions declared earlier in the file.
For examples of all these formats, see the BAT Web page [21].
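To make the shape of the universal format concrete, a file asking BAT to prove that
bitwise and is commutative on 4-bit vectors might consist of the following four items
(a sketch based on the description above; the concrete file syntax may differ slightly):

:forall
((x 4) (y 4))
()
(= (and x y) (and y x))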
As we saw in the earlier section, a BAT specification includes a model and a property
about the model that BAT attempts to verify. The BAT decision procedure translates
the input specification to a Boolean formula in conjunctive normal form (CNF). The
CNF formula is then checked using a SAT solver. In the common case, where we are
checking validity, if the CNF formula is found to be unsatisfiable, then this corre-
sponds to a formal proof that the user-provided property is valid. If the CNF formula
is satisfiable, then the satisfying assignment is used to construct a counterexample
for the input property.
The translation from the input specification to CNF is performed using four
high-level compilation steps and is based on a novel data structure for represent-
ing circuits known as the NICE dag, because it is a dag that contains Negations,
Ites (If–Then–Else operators), Conjunctions, and Equivalences [4]. In the first step,
functions are inlined, constants are propagated, and a range of other simplifications
are performed. The output of the first step is a NICE dag that also includes next
operators, memory variables, and memory operators. In the second step, the tran-
sition relation is unrolled for as many steps as specified by the specification. This
eliminates the next operators, resulting in a NICE dag with memory variables and
memory operators. In the third step, BAT uses its own decision procedure for the
extensional theory of arrays to reduce memories [20], which are then eliminated by
replacing memory variables and memory operators with Boolean circuits, resulting
in a NICE dag. In the fourth step, the NICE dag is translated to a SAT problem in
CNF format.
The key idea of the abstraction algorithm is to reduce the size of a memory to a
size that is comparable to the number of unique accesses (both read and write) to
that memory. The insight here is that if in a correctness property, there are only 10
unique accesses to a memory with, say, 2^32 words, it is enough to reason about a re-
duced version of the memory whose resulting size is just larger than 10, to check the
property. Therefore, the original memory size can be drastically reduced. Note, how-
ever, that care has to be taken when performing the reduction because a memory access
could be a symbolic reference, i.e., an access that could reference any one of a large
number of words in the memory. Another complication is that we allow memories to
be directly compared in any context, i.e., we have to support an extensional theory
of arrays.
The efficiency of memory abstraction depends on the size of the reduced mem-
ories, which in turn depends on the number of unique memory accesses. However,
because of nested memory operations, it is often hard to determine if two different
memory references correspond to the same symbolic reference. To improve the ef-
ficiency of the abstraction, BAT incorporates automated term-rewriting techniques
employing a number of rewrite rules that are used to simplify expressions with mem-
ory operators. The simplifications performed by rewriting help to identify equivalent
memory references thereby improving the efficiency of memory abstraction.
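As an illustration of the kind of simplification involved, a standard rewrite from the
theory of arrays (not necessarily one of BAT's actual rules) turns (get (set m a v) b)
into (if (= a b) v (get m b)); in the common case where a and b are syntactically
identical, this collapses directly to v, eliminating the nested memory operation and
making it evident that the two references touch the same word.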
CNF generation can significantly affect SAT solving times. BAT introduced a new
linear-time CNF generation algorithm, and extensive experiments have shown that
our algorithm leads to faster SAT solving times and smaller CNF than existing ap-
proaches. Our CNF generation algorithm is based on NICE dags, which subsume
and-inverter graphs (AIGs) and are designed to provide better normal forms at lin-
ear complexity. The details are beyond the scope of this chapter, but are described
in detail elsewhere [4].
In this section, we show how to model a simple instruction set architecture and a
three-stage pipelined implementation of this instruction set architecture using the
BAT specification language. We start by defining ISA, a sequential machine that di-
rectly implements the instruction set architecture. We then define MA, a three-stage
pipelined implementation (the microarchitecture machine). As stated previously,
the models are based on our previous work on using ACL2 for hardware verifi-
cation [12]. Those models in turn are based on Sawada’s simple machine [27] and
our subsequent related machines [9].
The instructions in the ISA have four components, including an opcode, a des-
tination register, and two source registers. The pipelined MA machine is shown
Fig. 1 The three-stage MA machine, with memory, pipeline latches 1 and 2, and an ALU
in Fig. 1. The functionality of the ISA is split into three stages so that each of the
stages can operate in parallel on different instructions. Registers, known as pipeline
latches, are used to separate the stages. The pipeline latches hold the intermediate
results generated in a stage. The MA machine has two pipeline latches, latch 1 and
latch 2 as shown in the figure. The three stages of our MA machine are fetch, set
up, and write. In the fetch stage, an instruction is fetched from memory using the
program counter as the address, and is stored in latch 1. In the set-up stage, the
source operands are retrieved from the register file and stored in latch 2, along with
the rest of the instruction. In the write stage, the appropriate operation is performed
by the ALU (arithmetic and logic unit), and the result of the ALU operation is stored
in the destination register specified by the destination address of the instruction.
Consider a simple example, where the contents of the memory are as follows:

Inst 0:  add rb ra ra
Inst 1:  add ra rb ra
The following traces are obtained when the two-line code segment is executed
on the ISA and MA machines. Note that we only show the values of the program
counter and the contents of registers ra and rb.
The rows correspond to steps of the machines, e.g., row Clock 0 corresponds to
the initial state, Clock 1 to the next state, and so on. The ISA and MA columns con-
tain the relevant parts of the state of the machines: a pair consisting of the Program
Counter (PC) and the register file (itself a pair consisting of registers ra and rb). The
final two columns indicate what stage the instructions are in (only applicable to the
MA machine).
The PC in the initial state (in row Clock 0) of the ISA machine is 0. The values
of the registers ra and rb are 1. The next state of the ISA machine (row Clock 1)
is obtained after executing instruction “Inst 0.” In this state, the PC is incremented
to 1, and the sum of the values stored in registers ra and rb (2) is computed and
stored in rb. In the second clock cycle, instruction “Inst 1” is executed. The PC is
again incremented to 2. The sum of the values stored in registers ra and rb (3) is
computed and stored in ra.
In the initial state of the MA machine, the PC is 0. We assume that the two latches
are initially empty. In the first clock cycle, “Inst 0” is fetched and the PC is incre-
mented. In the second clock cycle, “Inst 1” is fetched, the PC is incremented again,
and “Inst 0” proceeds to the set-up stage. In the third clock cycle, “Inst 0” com-
pletes and updates register rb with the correct value (as can be seen from the MA
column). However, during this cycle, “Inst 1” cannot proceed, as it requires the rb
value computed by “Inst 0,” and therefore is stalled and remains in the fetch stage.
In the next clock cycle, “Inst 1” moves to set-up, as it can obtain the rb value it
requires from the register file, which has now been updated by “Inst 0.” In the fifth
clock cycle, “Inst 1” completes and updates register ra.
We now consider how to define the ISA and MA machines using BAT. The first
machine we define is a 32-bit ISA, i.e., the data path is 32 bits. The main function
is isa-step, a function that steps the ISA machine, i.e., it takes an ISA state and
returns the next ISA state. The definition of isa-step follows.
(isa-step
((32) (4294967296 32)
(4294967296 100) (4294967296 32))
((pc 32) (regs 4294967296 32)
(imem 4294967296 100) (dmem 4294967296 32))
(local
((inst (get imem pc))
(op (opcode inst))
(rc (dest-c inst))
(ra (src-a inst))
(rb (src-b inst)))
(cond ((= op 0) (isa-add rc ra rb pc regs imem dmem))
;; REGS[rc] := REGS[ra] + REGS[rb]
((= op 1) (isa-sub rc ra rb pc regs imem dmem))
;; REGS[rc] := REGS[ra] - REGS[rb]
((= op 2) (isa-and rc ra rb pc regs imem dmem))
;; REGS[rc] := REGS[ra] and REGS[rb]
((= op 3) (isa-load rc ra pc regs imem dmem))
;; REGS[rc] := MEM[ra]
((= op 4) (isa-loadi rc ra pc regs imem dmem))
;; REGS[rc] := MEM[REGS[ra]]
((= op 5) (isa-store ra rb pc regs imem dmem))
;; MEM[REGS[ra]] := REGS[rb]
((= op 6) (isa-bez ra rb pc regs imem dmem))
;; REGS[ra]=0 -> pc:=pc+REGS[rb]
((= op 7) (isa-jump ra pc regs imem dmem))
;; pc:=REGS[ra]
(1b1 (isa-default pc regs imem dmem)))))
In the above function regs refers to the register file, imem is the instruc-
tion memory, and dmem is the data memory. The function fetches the instruction
from the instruction memory, which is a bit-vector. Then it uses decode functions
opcode, dest-c, src-a, and src-b to decode the instruction. The opcode
is then used to figure out what action to take. For example, in the case of an
add instruction, the next ISA state is (isa-add rc ra rb pc regs imem
dmem), where isa-add provides the semantics of add instructions. The definition
of isa-add is given below.
(isa-add
((32)(4294967296 32)
(4294967296 100) (4294967296 32))
((rc 32) (ra 32) (rb 32) (pc 32) (regs 4294967296 32)
(imem 4294967296 100) (dmem 4294967296 32))
(mv (bits (+ pc 1) 0 31)
(add-rc ra rb rc regs)
imem
dmem))
Notice that the program counter is incremented and the register file is updated by
setting the value of register rc to the sum of the values in registers ra and rb. This
happens in function add-rc.
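A sketch of how add-rc might be defined, following the style of the load-rc helper
shown below (the exact argument declarations and bit-slicing of the sum are
assumptions), is:

(add-rc
 (4294967296 32)
 ((ra 32) (rb 32) (rc 32) (regs 4294967296 32))
 ;; set register rc to the 32-bit sum of registers ra and rb
 (set regs rc (bits (+ (get regs ra) (get regs rb)) 0 31)))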
The other ALU instructions are similarly defined. We now show how to define
the semantics of the rest of the instructions. The semantics of the load instructions
are shown next.
(isa-loadi
((32) (4294967296 32)
(4294967296 100) (4294967296 32))
((rc 32) (ra 32) (pc 32) (regs 4294967296 32)
(imem 4294967296 100) (dmem 4294967296 32))
(load-rc
(4294967296 32)
((ad 32) (rc 32) (regs 4294967296 32)
(dmem 4294967296 32))
(set regs rc (get dmem ad)))
(isa-load
((32) (4294967296 32)
(4294967296 100) (4294967296 32))
((rc 32) (ad 32) (pc 32) (regs 4294967296 32)
(imem 4294967296 100) (dmem 4294967296 32))
(mv (bits (+ pc 1) 0 31)
(load-rc ad rc regs dmem)
imem
dmem))
(store
(4294967296 32)
((ra 32) (rb 32) (regs 4294967296 32)
(dmem 4294967296 32))
(set dmem (get regs ra) (get regs rb)))
(bez
(32)
((ra 32) (rb 32) (regs 4294967296 32) (pc 32))
(cond ((= (get regs ra) 0)
(bits (+ pc (bits (get regs rb) 0 31)) 0 31))
(1b1 (bits (+ pc 1) 0 31))))
3.2 MA Definition
The MA machine is a pipelined machine with three stages that implements the in-
struction set architecture of the ISA machine. Therefore, the ISA machine can be
thought of as a specification of the MA machine. The MA machine contains a PC, a
register file, a memory, and two pipeline latches. The latches are used to implement
pipelining and store intermediate results generated in each stage. The first latch
contains a flag which indicates if the latch is valid, an opcode, the target register, and
two source registers. The second latch contains a flag as before, an opcode, the tar-
get register, and the values of the two source registers. The definition of ma-step
follows.
(ma-step
((298) (4294967296 32)
(4294967296 100) (4294967296 32))
((ma 298) (regs 4294967296 32)
(imem 4294967296 100) (dmem 4294967296 32))
(mv
(cat
(step-latch2 ma regs)
(step-latch1 ma imem)
(step-pc ma regs imem))
(step-regs ma regs dmem)
imem
(step-dmem ma dmem)))
The ma-step function works by calling functions that given one of the MA
components return the next state value of that component. Note that this is very
different from isa-step, which calls functions, based on the type of the next
instruction, that return the complete next isa state.
Below, we show how the register file is updated. If latch2 is valid, then if we have
an ALU instruction, the output of the ALU is used to update register rc. Otherwise,
if we have a load instruction, then we update register rc with the appropriate word
from memory.
(step-regs
(4294967296 32)
((ma 298) (regs 4294967296 32) (dmem 4294967296 32))
(local
((validp (getvalidp2 ma))
(op (getop2 ma))
(rc (getrc2 ma))
(ra-val (getra-val2 ma))
(rb-val (getrb-val2 ma)))
(cond ((and validp (alu-opp op))
(set regs rc (alu-output op ra-val rb-val)))
((and validp (load-opp op))
(set regs rc (get dmem ra-val)))
(1b1 regs))))
(alu-opp
(1)
((op 4))
(or (= op 0) (= op 1) (= op 2)))
(load-opp
(1)
((op 4))
(or (= op 3) (= op 4)))
(alu-output
(32)
((op 4) (val1 32) (val2 32))
(cond ((= op 0) (bits (+ val1 val2) 0 31))
((= op 1) (bits (- val1 val2) 0 31))
(1b1 (bits (and val1 val2) 0 31))))
Finally, the PC is updated as follows. If latch 1 stalls, then the PC is not modified.
Otherwise, if latch 1 is invalidated, then if this is due to a bez instruction in latch 2,
the jump address can now be determined, so the program counter is updated as
per the semantics of the bez instruction. Otherwise, if the invalidation is due to a
jump instruction in latch 1, the jump address can be computed and the program
counter is set to this address. The only other possibility is that the invalidation is
due to a bez instruction in latch 1; in this case the jump address has not yet been
determined, so the pc is not modified. Note, this simple machine does not have
a branch predictor. If the invalidate signal does not hold, then we increment the
program counter unless we are fetching a branch instruction.
(step-pc (32)
((ma 298) (regs 4294967296 32) (imem 4294967296 100))
(local
4 Refinement
In the previous section, we saw how one can model a pipelined machine and its
instruction set architecture in BAT. We now discuss how to verify such machines.
Consider the partial traces of the ISA and MA machines on the simple two-line code
fragment from the previous section (add rb ra ra followed by add ra rb ra). We
are only showing the value of the program counter and the contents of registers ra
and rb.
ISA            MA                           MA                           MA
(0, (1,1))     (0, (1,1))                   (0, (1,1))                   (0, (1,1))
(1, (1,2))     (1, (1,1))   --Commit PC-->  (0, (1,1))   --Remove-->     (1, (1,2))
(2, (3,2))     (2, (1,1))                   (0, (1,1))     Stutter       (2, (3,2))
               (2, (1,2))                   (1, (1,2))
               ( , (1,2))                   (1, (1,2))
               ( , (3,2))                   (2, (3,2))
Notice that the PC differs in the two traces and this occurs because the pipeline,
initially empty, is being filled and the PC points to the next instruction to fetch.
If the PC were to point to the next instruction to commit (i.e., the next instruction to
complete), then we would get the trace shown in column 3. Notice that in column 3,
the PC does not change from 0 to 1 until Inst 0 is committed, at which point the next
instruction to commit is Inst 1. We now have a trace that is the same as the ISA
trace except for stuttering; after removing the stuttering we have, in column 4, the
ISA trace.
We now formalize the above and start with the notion of a refinement map, a
function that maps MA states to ISA states. In the above example we mapped MA
states to ISA states by transforming the PC. Proving correctness amounts to relating
MA states with the ISA states they map to under the refinement map and proving a
WEB. Proving a WEB guarantees that MA states and related ISA states have related
computations up to finite stuttering. This is a strong notion of equivalence, e.g., a
consequence is that the two machines satisfy the same CTL*\X formulas.¹ This includes the
class of next-time free safety and liveness (including fairness) properties, e.g., one
such property is that the MA machine cannot deadlock (because the ISA machine
cannot deadlock).
Why “up to finite stuttering”? Because we are comparing machines at differ-
ent levels of abstraction: the pipelined machine is a low-level implementation of
the high-level ISA specification. When comparing systems at different levels of ab-
straction, it is often the case that the low-level system requires several steps to match
a single step of the high-level system.
Why use a refinement map? Because there may be components in one system that
do not appear in the other, e.g., the MA machine has latches but the ISA machine
does not. In addition, data can be represented in different ways, e.g., a pipelined
machine might use binary numbers whereas its instruction set architecture might
use a decimal representation. Yet another reason is that components present in both
systems may have different behaviors, as is the case with the PC above. Notice that
the refinement map affects how MA and ISA states are related, not the behavior of
the MA machine. The theory of refinement we present is based on transition systems
(TSs). A TS, M, is a triple ⟨S, →, L⟩, consisting of a set of states, S, a left-total
transition relation, → ⊆ S², and a labeling function L whose domain is S and
where L.s (we sometimes use an infix dot to denote function application) corre-
sponds to what is “visible” at state s. Clearly, the ISA and MA machines can be
thought of as transition systems (TS).
Our notion of refinement is based on the following definition of stuttering bisim-
ulation [2], where by fp(σ, s) we mean that σ is a fullpath (infinite path) starting at s,
and by match(B, σ, δ) we mean that the fullpaths σ and δ are equivalent sequences
up to finite stuttering (repetition of states).
¹ CTL* is a branching-time temporal logic; CTL*\X is CTL* without the next-time operator X.
Browne et al. have shown that states that are stuttering bisimilar satisfy the same
next-time-free temporal logic formulas [2].
Lemma 1. Let B be an STB on M and let sBw. For any CTL*\X formula f,
M, w ⊨ f iff M, s ⊨ f.
We note that stuttering bisimulation differs from weak bisimulation [25] in that
weak bisimulation allows infinite stuttering. Stuttering is a common phenomenon
when comparing systems at different levels of abstraction, e.g., if the pipeline is
empty, MA will require several steps to complete an instruction, whereas ISA com-
pletes an instruction during every step. Distinguishing between infinite and finite
stuttering is important, because (among other things) we want to distinguish dead-
lock from stutter.
When we say that MA refines ISA, we mean that in the disjoint union (⊎) of the
two systems, there is an STB that relates every pair of states w, s such that w is an
MA state and r(w) = s.
In this form, the above rule exactly matches the compositional proof rules in [5].
The above theorem states that to prove MA ∥ P ⊢ φ (that MA, the pipelined ma-
chine, executing program P satisfies property φ, a property over the ISA visible
state), it suffices to prove that MA refines ISA (which can be done using a sequence
of refinement proofs) and that ISA ∥ P ⊢ φ, i.e., that ISA, executing
P, satisfies φ. That is, we can prove that code running on the pipelined machine is
correct, by first proving that the pipelined machine refines the instruction set archi-
tecture and then proving that the software running on the instruction set – not on the
pipelined machine – is correct.
5 Verification
This section describes how BAT is used to verify the three-stage pipelined machine
given in Sect. 3. Note that the definition of WEBs given in Sect. 4 cannot be di-
rectly expressed in the BAT specification language. Therefore, we first strengthen
the WEB refinement proof obligation such that we obtain a statement that is express-
ible as a quantifier-free formula over the extensional theory of fixed-size bit-vectors
and fixed-size bit-vector arrays (memories), the kind of formulas that BAT decides.
We first define the equivalence classes of B to consist of an ISA state and all
the MA states whose image under the refinement map r is the ISA state. As a re-
sult, condition 2 of the WEB refinement definition clearly holds. Since an ISA
machine never stutters with respect to the MA machine, the second disjunct of the
third condition in the WEB definition can be ignored. Also, the ISA machine is
deterministic, and the MA machine, if not deterministic, can be transformed to a de-
terministic machine using oracle variables [11]. Using these simplifications and after
some symbolic manipulation, Condition 3 of the WEB definition can be strength-
ened to the following core refinement-based correctness formula, where rank is a
function that maps states of MA into the natural numbers.
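Written out, the strengthened obligation has roughly the following shape (a
reconstruction from the description that follows; see [10, 11] for the precise
statement):

\[
\forall w \in \mathrm{MA}:\;
\big(\, s = r(w) \;\wedge\; u = \textit{ISA-step}(s) \;\wedge\; v = \textit{MA-step}(w) \;\wedge\; u \neq r(v) \,\big)
\;\Longrightarrow\;
\big(\, s = r(v) \;\wedge\; \mathit{rank}(v) < \mathit{rank}(w) \,\big)
\]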
The correctness formula shown above is also depicted in Fig. 2. In the formula
above, if MA is the set of all reachable MA states, MA-step is a step of the MA
machine, and ISA-step is a step of the ISA machine, then proving the above
formula guarantees that the MA machine refines the ISA machine. In the formula
above, w is an MA state and v (also an MA state) is a successor of w. s is an ISA
state obtained by applying the refinement map r to w and u (also an ISA state)
is a successor of s. The formula states that if applying the refinement map r to v
does not result in the ISA state u, then r.v/ must be equal to s and the rank of v
should decrease w.r.t. the rank of w. Also, the proof obligation relating s and v can
Fig. 2 Diagram of the core refinement-based correctness formula, relating MA-step, ISA-step, the refinement map r, and the rank function
be thought of as the safety component, and the proof obligation rank(v) < rank(w)
can be thought of as the liveness component.
If the ISA and MA models are described at the bit-level, then the core refinement-
based correctness formula relating these models is in fact expressible in the logic
that BAT decides.
To check the core refinement-based correctness formula using BAT, two witness
functions are required, a refinement map and a rank function. There are many dif-
ferent ways in which these witness functions can be defined. In this section, we
describe one approach.
The following function is a recognizer for “good” MA states.
(good-ma (1)
((ma 298) (regs 4294967296 32) (imem 4294967296 100)
(dmem 4294967296 32))
(local
(((nma nregs nimem ndmem)
(committed-ma ma regs imem dmem))
((nma1 nregs1 nimem1 ndmem1)
(ma-step nma nregs nimem ndmem))
((nma2 nregs2 nimem2 ndmem2)
(ma-step nma1 nregs1 nimem1 ndmem1)))
(cond ((getvalidp2 ma)
(equiv-ma nma2 nregs2 nimem2 ndmem2
ma regs imem dmem))
((getvalidp1 ma)
(equiv-ma nma1 nregs1 nimem1 ndmem1
ma regs imem dmem))
(1b1 1b1))))
The “good” MA states (also known as reachable states) are states that are reach-
able from the reset states (states in which the pipeline latches are empty). The reason
for using a recognizer for reachable states is that unreachable states can be inconsis-
tent and interfere with verification by raising spurious counterexamples. A state in
which a pipeline latch has an add instruction, when there are no add instructions in
memory is an example of an inconsistent unreachable state. We check for reachable
states by stepping the committed state, the state obtained by invalidating all partially
completed instructions and altering the program counter so that it points to the next
instruction to commit.
(committed-ma
((298) (4294967296 32)
(4294967296 100) (4294967296 32))
((ma 298) (regs 4294967296 32) (imem 4294967296 100)
(dmem 4294967296 32))
(local ((inst (get imem (getppc ma))))
(mv
(cat
(getpch2 ma) (getrb-val2 ma)
(getra-val2 ma) (getrc2 ma)
(getop2 ma) 1b0
(getppc ma) (src-b inst)
(src-a inst) (dest-c inst)
(opcode inst) 1b0
(committed-pc ma))
regs imem dmem)))
The program counter (PC) of the committed state is the PC of the instruction
in the first valid latch. Each latch has a history variable that stores the PC value
corresponding to the instruction in that latch. Therefore, the PC of the committed
state can be obtained from the history variables.
(committed-pc (32) ((ma 298))
(cond ((getvalidp2 ma) (getpch2 ma))
((getvalidp1 ma) (getpch1 ma))
(1b1 (getppc ma))))
The equiv-MA function is used to check if two MA states are equal. Note, how-
ever, that if latch 1 is invalid in both states, then the contents of latch 1 in the two
states are not compared for equality. Latch 2 is compared similarly.
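A sketch of how equiv-ma might be written, where the latch accessor names
getlatch1 and getlatch2 are hypothetical stand-ins for whatever accessors the
model provides, is:

(equiv-ma (1)
 ((ma1 298) (regs1 4294967296 32) (imem1 4294967296 100)
  (dmem1 4294967296 32) (ma2 298) (regs2 4294967296 32)
  (imem2 4294967296 100) (dmem2 4294967296 32))
 (and (= (getppc ma1) (getppc ma2))
      (= regs1 regs2) (= imem1 imem2) (= dmem1 dmem2)
      (= (getvalidp1 ma1) (getvalidp1 ma2))
      ;; latch 1 contents compared only when valid in both states
      (-> (and (getvalidp1 ma1) (getvalidp1 ma2))
          (= (getlatch1 ma1) (getlatch1 ma2)))
      (= (getvalidp2 ma1) (getvalidp2 ma2))
      ;; latch 2 handled analogously
      (-> (and (getvalidp2 ma1) (getvalidp2 ma2))
          (= (getlatch2 ma1) (getlatch2 ma2)))))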
The committed-MA function invalidates partially executed instructions in the
pipeline and essentially rolls back the program counter to correspond with the next
instruction to be committed. The consistent states of MA are determined by checking
that they are reachable from the committed states within two steps. The refinement
map is defined as follows.
(ma-to-isa
((32) (4294967296 32)
(4294967296 100) (4294967296 32))
The commitment theorem also includes an inductive proof for the “good” MA
invariant, i.e., we check that if we step MA from any good state, then the succes-
sor of that state will also be good. Next, the property that we ask BAT to check is
shown below. We declare a symbolic MA state in the (:vars) section. The sym-
bolic state essentially corresponds to the set of all syntactically possible MA states.
In the (:spec) section, we ask BAT to check the commitment-theorem for
all the MA states, which corresponds to the core theorem applied to the “good” MA
states together with an inductive invariance proof for the “good” MA invariant.
(:vars (mastate 298) (regs 4294967296 32)
(imem 4294967296 100) (dmem 4294967296 32))
(:spec (commitment-theorem mastate regs imem dmem)))
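The commitment-theorem function itself packages these pieces; a sketch (where
core-theorem is a hypothetical name for the correctness formula of Fig. 2 applied
to a single symbolic state) might be:

(commitment-theorem (1)
 ((ma 298) (regs 4294967296 32) (imem 4294967296 100)
  (dmem 4294967296 32))
 (-> (good-ma ma regs imem dmem)
     (and
      ;; core refinement-based correctness formula of Fig. 2
      (core-theorem ma regs imem dmem)
      ;; inductive invariance: a step from a good state yields a good state
      (local (((nma nregs nimem ndmem)
               (ma-step ma regs imem dmem)))
        (good-ma nma nregs nimem ndmem)))))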
Table 1 shows the verification times and CNF statistics for the verification of five
three-stage processor models using BAT. The models are obtained by varying the
size of the data path and the number of words in the register file and memories. Note
that the original three-stage model was parametrized, and the models for the exper-
iments were generated by varying the parameters. The models are given the name
“DLX3-n,” where “n” indicates the size of the data path and the size of the program
counter. The instruction memory, the data memory, and the register file each have
2n words. The experiments were conducted on a 1.8-GHz Intel (R) Core(TM) Duo
CPU, with an L1 cache size of 2,048 KB. The SAT problems generated by BAT
were checked using version 1.14 of the MiniSat SAT solver [6].
The formal proof of correctness for the three-stage pipelined machine required
stating the refinement correctness formula in the BAT specification language. BAT
was then able to automatically prove the refinement theorem relating the three-stage
pipelined machine and its ISA. However, a big challenge in verifying pipelined ma-
chines using decision procedures is that as the complexity of the machine increases,
the verification times are known to increase exponentially [18]. An alternate ap-
proach to verifying pipelined machines is based on using general-purpose theorem
provers. More complex designs can be handled using theorem provers, but a
heroic effort is typically required on the part of the expert user to carry
out refinement-based correctness proofs for pipelined machines [17]. In this sec-
tion, we discuss some techniques for handling the scalability issues when using
decision procedures for pipelined machine verification.
One of the advantages of using the WEB refinement framework is that the re-
finement map is factored out and can be studied independently. In Sect. 5, the
commitment refinement map was described. There are other approaches to define
the refinement map as well. Another well-known approach to define the refinement
map is based on flushing, the idea being that partially executed instructions in the
pipeline latches of a pipelined machine state are forced to complete without fetch-
ing any new instructions. Projecting out the programmer-visible components in the
resulting state gives the ISA state.
There are several more approaches to define the refinement map that have been
found to be computationally more efficient. One approach is the Greatest Fixpoint
invariant based commitment [15]. The idea here is to define the invariant that charac-
terizes the set of reachable states in a computationally more efficient way. A second
approach is collapsed flushing, which is an optimization of the flushing refinement
map [7]. A third approach is intermediate refinement maps, which combine both flush-
ing and commitment by choosing a point midway in the pipeline and committing all
the latches before that point and flushing all the latches after that point [16]. This
approach is also known to improve scalability and efficiency.
The BAT decision procedure directly handles the verification problem at the RTL.
One approach to handling scalability issues is to abstract and verify the pipelined
machine at the term level. The drawback, however, is that the final correctness result
is only about the abstract model and the formal connection with the RTL model is
lost. Hybrid approaches that exploit the refinement framework and use both theorem
proving and decision procedures have been developed to address this problem [17].
The idea is to use the theorem prover to formally reduce the verification problem at
the RTL to an abstract verification problem, which can then be handled by a decision
procedure. The approach scales better for some complex machines, but is much less
automatic than using a decision procedure like BAT.
6.4 Parametrization
An advantage of using BAT is that the models can be easily parametrized. This pro-
vides an effective debugging mechanism. The idea is based on the fact that models
with smaller data path widths lead to computationally more tractable verification
problems. For example, the verification of a 32-bit pipelined machine with many
pipeline stages may not be tractable, but BAT could probably verify a 2-bit or 4-bit
version of the model. While verifying a 4-bit version of the model does not guaran-
tee correctness, a majority of the bugs (for example, control bugs that do not depend
on the width of the data path) will be exposed. Generating a 4-bit version of a 32-bit
model is easy to accomplish if the model is parametrized.
7 Conclusions
In this chapter, we described how to use the BAT system to verify that pipelined
machines refine their instruction set architectures. The notion of correctness that
we used is based on WEB refinement. We showed how to strengthen the WEB
refinement condition to obtain a statement in the BAT specification language, for
which BAT includes a decision procedure. This allows us to automatically check
that the pipelined machine satisfies the same safety and liveness properties as its
specification, the instruction set architecture. If there is a bug, then BAT will provide
a counterexample. We also discussed various techniques to deal with more complex
designs.
While much of the focus of pipelined machine verification has been in verifying
microprocessor pipelines, these techniques can also be used to reason about other
domains in which pipelines occur. Examples include cache coherence protocols and
memory interfaces that use load and store buffers.
BAT is not limited to proving properties of pipelines. Any system that can be
modeled using BAT’s synthesizable hardware description language can be analyzed
using BAT. This includes verification problems arising in both hardware and soft-
ware, embedded systems, cryptographic hash functions, biological systems, and the
assembly of large component-based software systems.
References
20. Manolios P, Srinivasan SK, Vroon D (2006a) Automatic memory reductions for RTL
model verification. In: Hassoun S (ed) International conference on computer-aided design
(ICCAD’06). ACM, New York, NY, pp 786–793
21. Manolios P, Srinivasan SK, Vroon D (2006b) BAT: the bit-level analysis tool. Available from
http://www.ccs.neu.edu/~pete/bat/
22. Manolios P, Srinivasan SK, Vroon D (2007a) BAT: the bit-level analysis tool. In: International
conference on computer aided verification (CAV’07)
23. Manolios P, Oms MG, Valls SO (2007b) Checking pedigree consistency with PCS. In: Tools
and algorithms for the construction and analysis of systems, TACAS, vol 4424 of Lecture notes
in computer science. Springer, Berlin, pp 339–342
24. Manolios P, Vroon D, Subramanian G (2007c) Automating component-based system assem-
bly. In: Proceedings of the ACM/SIGSOFT international symposium on software testing and
analysis, ISSTA. ACM, New York, NY, pp 61–72
25. Milner R (1990) Communication and concurrency. Prentice-Hall, Upper Saddle River, NJ
26. Namjoshi KS (1997) A simple characterization of stuttering bisimulation. In: 17th Conference
on foundations of software technology and theoretical computer science, vol 1346 of LNCS,
pp 284–296
27. Sawada J (2000) Verification of a simple pipelined machine model. In: Kaufmann M,
Manolios P, Moore JS (eds) Computer-aided reasoning: ACL2 case studies. Kluwer, Dordrecht,
pp 137–150
Formal Verification of Partition Management
for the AAMP7G Microprocessor
1 Introduction
The AAMP7G is the latest in the line of Collins Adaptive Processing System
(CAPS) processors and AAMP microprocessors developed by Rockwell Collins for
use in military and civil avionics since the early 1970s [2]. AAMP designs have his-
torically been tailored to embedded avionics product requirements, accruing size,
weight, power, cost, and specialized feature advantage over alternate solutions. Each
new AAMP makes use of the same multitasking stack-based instruction set while
adding state-of-the-art technology in the design of each new CPU and peripheral set.
AAMP7G adds built-in partitioning technology among other improvements.
AAMP processors feature a stack-based architecture with 32-bit segmented, as
well as linear, addressing. AAMP supports 16/32-bit integer and fractional and
32/48-bit floating point operations. The lack of user-visible registers improves code
density (many instructions are a single byte), which is significant in embedded ap-
plications where code typically executes directly from slow Read-Only Memory.
The AAMP provides a unified call and operand stack, and the architecture defines
both user and executive modes, with separate stacks for each user “thread,” as well
as a separate stack for executive mode operation. The transition from user to exec-
utive mode occurs via traps; these traps may be programmed or may occur as the
result of erroneous execution (illegal instruction, stack overflow, etc.). The AAMP
architecture also provides for exception handlers that are automatically invoked in
the context of the current stack for certain computational errors (divide by zero,
arithmetic overflow). The AAMP instruction set is of the CISC variety, with over
200 instructions, supporting a rich set of memory data types and addressing modes.
The AAMP7G provides a feature called “Intrinsic Partitioning” that allows it to host
several safety-critical or security-critical applications on the same CPU. The inten-
tion is to provide a system developer an architectural approach that will simplify the
overall complexity of an integrated system, such as is described in [22]. The transi-
tion from multiple CPUs to a single multifunction CPU is shown in Fig. 1. On the
left, three federated processors provide three separate functions, A, B, and C. It is
straightforward to show that these three functions have no unintended interaction.
On the right of Fig. 1, an integrated processor provides for all three functions.
The processor executes code from A, B, and C; its memory contains all data and
I/O for A, B, and C. A partition is a container for each function on a multifunction
partitioned CPU like the AAMP7G. AAMP7G follows two rules to ensure partition
independence:
1. Time partitioning. Each partition must be allowed sufficient time to execute the
intended function.
2. Space partitioning. Each partition must have exclusive-use space for storage.
Fig. 1 Three federated processors (left) replaced by a single integrated, partitioned processor (right) hosting functions A, B, and C
Each partition must be allowed sufficient time to execute the intended function. The
AAMP7G uses strict time partitioning to ensure this requirement. Each partition is
allotted certain time slices during which time the active function has exclusive use
and control of the CPU and related hardware.
For the most secure systems, time slices are allocated at system design time
and not allowed to change. For dynamic reconfiguration, a “privileged” partition
may be allowed to set time slices. AAMP7G supports both of these approaches as
determined by the system designer.
The asynchronous nature of interrupts poses interesting challenges for time-
partitioned systems. AAMP7G has partition-aware interrupt capture logic. Each
interrupt is assigned to a partition; the interrupt is only recognized during its
partition’s time slice. Of course, multiple interrupts may be assigned to a partition.
In addition, an interrupt may be shared by more than one partition if needed.
System-wide interrupts, such as imminent power loss or tamper detection, also need
to be addressed in a partitioned processor. In these cases, AAMP7G will suspend
current execution, abandon the current partition control list, and start up a list
of partition interrupt handlers. Each partition’s interrupt handler will then run, per-
forming finalization or zeroization as required by the application.
Each partition must have exclusive-use space for storage. The AAMP7G uses mem-
ory management to enforce space partitioning. Each partition is assigned areas in
memory that it may access. Each data and code transfer for that partition is checked
to see if the address of the transfer is legal for the current partition. If the transfer is
legal, it is allowed to complete. If the transfer is not legal, the AAMP7G Partition
Management Unit (PMU) prevents the CPU from accessing read data or code fetch
data; the PMU also preempts write control to the addressed memory device.
Only a small amount of memory is needed for the AAMP7G partition control struc-
tures (summarized in Fig. 2). This data space is typically not intended to be included
in any partition’s memory access ranges. Each partition’s control structure includes
its time allotment, memory space rights, initial state, and test access key, which are
stored in ROM. Each partition’s saved state is stored in RAM.
Partition control blocks are linked together defining a partition activation sched-
ule. AAMP7G partition initialization and partition switching are defined entirely by
these structures.
The partition control structures are interpreted entirely in microcode, so no soft-
ware access is needed to the AAMP7G partitioning structures. This limits the
verification of AAMP7G partitioning to proving that the partitioning microcode per-
forms the expected function and that no other microcode accesses the partitioning
structures.
Fig. 2 AAMP7G partition control structures
The AAMP7G formal processing model is shown in Fig. 3. Actual AAMP7G pro-
cessing layers are shown in nonitalic text, while layers introduced for the sake of
formal reasoning are shown in italics.
We generally prove correspondence between a concrete model at a given level
and a more abstract model. Sequences of microcode implement a given instruction;
sequences of abstract instruction steps form basic blocks; a machine code subroutine
is made up of a collection of basic blocks. Subroutine invocations are performed in
the context of an AAMP thread, and multiple user threads plus the executive mode
constitute an AAMP7G partition. Our model supports the entire context switching
machinery defined by the AAMP architecture, including traps, outer procedure re-
turns, executive mode error handlers, and so on.
Some aspects of the AAMP7G model are useful for general comprehension
of the AAMP7G architecture and for organizing the proof effort. In particular,
the correctness theorem we proved about the AAMP7G partitioning mechanism
relates the behavior of the microcode of the microprocessor to an abstract notion of
AAMP7G partitioning operation, so understanding many of the layers of the model
is not strictly necessary to understanding what has been proved about the AAMP7G.
Rockwell Collins has performed a formal verification of the AAMP7G partition-
ing system using the ACL2 theorem prover [11]. This work was part of an evaluation
effort which led the AAMP7G to receive a certification from NSA, enabling a sin-
gle AAMP7G to concurrently process Unclassified through Top Secret codeword
information. We first established a formal security specification, as described in
[8], and summarized in Sect. 5. We produced an abstract model of the AAMP7G’s
partitioning system as well as a low-level model that directly corresponded to the
AAMP7G microcode. We then used ACL2 to automatically produce the following:
We have chosen the ACL2 logic, an enhancement of the Common Lisp program-
ming language [11], to describe our security specification. ACL2 is a good choice
for this work because of its usefulness in modeling and reasoning about computing
systems [6, 9] as well as the substantial automation afforded by the ACL2 theorem
proving system.
The formal security specification describes abstractly what a separation kernel
does. The machine being modeled supports a number of partitions whose names
are provided by the constant function allparts. We use the notation of ACL2’s
encapsulate command to indicate a function of no arguments that returns a single
value.
((allparts)=>*)
One of the partitions is designated the “current” partition. The function current
calculates the current partition given a machine state.
((current *)=>*)
We use the notation of ACL2’s defthm command, which presents a theorem
expressed in Common Lisp notation, to indicate a property about the functions
current and allparts.
(defthm current-is-partition
(member (current st) (allparts)))
Associated with partitions are memory segments. Memory segments have names
and are intended to model portions of the machine state. The names of the memory
segments associated with a particular partition are available from the function segs,
which takes as an argument the name of the partition. (Note that since segs is a
function only of partition name and not, for example, a function of machine state,
the assignment of segments to partitions is implicitly invariant.)
((segs *)=>*)
The values in a machine state that are associated with a memory segment are
extracted by the function select. select takes two arguments: the name of
the memory segment and the machine state.
((select * *)=>*)
The separation kernel enforces a communication policy on the memory seg-
ments. This policy is modeled with the function dia (for Direct Interaction
Allowed), which represents the pairs of memory segments for which direct in-
teraction is allowed. The function takes as an argument a memory segment name
and returns a list of memory segments that are allowed to affect it. (Note that
since dia is a function only of the memory segment name, the formalization here
implicitly requires that the communication policy is invariant.)
((dia *)=>*)
The last function constrained in the security specification is next, which mod-
els one step of computation of the machine state. The function next takes as an
argument a machine state and returns a machine state that represents the effect of
the single step.
((next *)=>*)
The aforementioned constrained functions are used to construct several
additional functions. selectlist takes a list of segments and returns a list
of segment values; segslist takes a list of partition names and returns the list of
memory segments associated with the partitions; and run takes an initial machine
state and number of steps and returns an initial machine state updated by executing
the number of steps indicated.
(defun selectlist (segs st)
(if (consp segs)
(cons
(select (car segs) st)
(selectlist (cdr segs) st))
nil))
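The companion functions segslist and run are not shown above. A minimal sketch of how they can be written, following the same recursion pattern as selectlist (the definitions used in the actual development may differ in detail), is:

  (defun segslist (partnames)
    ;; collect the memory segments of every partition in the list
    (if (consp partnames)
        (append (segs (car partnames))
                (segslist (cdr partnames)))
      nil))

  (defun run (st n)
    ;; step the machine n times from state st
    (if (zp n)
        st
      (run (next st) (- n 1))))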
Figure 5 shows the theorem proved about the AAMP7G, which is an instance of the
abstract security specification of Fig. 4, made concrete so as to show that the abstract
notion of separation holds of a concretely described model of the AAMP7G. The
reification of the theorem in Fig. 4 to the AAMP7G theorem in Fig. 5 is not obvious,
not least because the theorem includes an operational model of the AAMP7G,
which must be relatable to the actual device in order for the correctness proof to
have practical usefulness as part of a certification process.
Recall the different levels of AAMP7G models described in Fig. 3. The correctness
theorem involves models of the AAMP7G at two levels: the functional level and the
abstract level. The functional model closely corresponds with the actual
AAMP7G microarchitecture implementation. For example, in the functional model,
RAM is modeled as an array of values. The abstract model represents the data of the
AAMP7G in a manner more convenient for describing properties. For example, the
partitioning. All the partitioning-relevant microcode runs in this trusted mode, and
the low-level design model of the AAMP7G models all the microcode that runs in
trusted mode.
Considerable thought was put into defining the “step” of the AAMP7G micro-
processor for the purpose of formalizing it in the next function. Broadly speaking,
a step is a high-level partition step indicated at the highest level of Fig. 3. For ex-
ample, in the nominal case (where there is no unusual event such as a power-down
warning), a step is the loading of a partition including relevant protections, an exe-
cution of a user partition, and the saving of the state of that partition.
While the notion of “step” implemented by the next function is abstract, its
definition in ACL2 is most assuredly not. It corresponds to the most concrete, low-
est level of Fig. 3. This is because another crucial consideration when developing
the model of the AAMP7G contained in the next function is how to validate this
hand-written model against the actual AAMP7G. The AAMP7G is a microcoded
microprocessor, and much of the functionality of the machine is encoded in its mi-
crocode. The low-level design model is written specifically to make a code-to-spec
review that relates the model to the actual implementation relatively straightforward.
An ACL2 macro allows an imperative-style description that eases comparison with
microcode. Also, very importantly, the model is written with the model of mem-
ory that the microcode programmer uses. That is, memory has only two operations:
read and write. The simplicity of the memory model makes the code-to-spec review
easier but adds a great deal of complexity to the proof. Since the proof is machine
checked, while the model validation process requires review by the evaluators, this is
a good trade-off: it provides a high level of assurance with a reasonable level of evaluation effort. Much
of the effort on the project was spent constructing the proofs, but the proofs were
reviewed relatively easily by the evaluators because they could be replayed using
ACL2.
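As an illustration of what such a two-operation memory interface looks like, here is a minimal sketch with assumed function names (the memory representation in the actual proof development differs; it is built on a linear address space formalization):

  (defun read-mem (addr mem)
    ;; look up the value stored at addr; mem is an (address . value) alist
    (cdr (assoc-equal addr mem)))

  (defun write-mem (addr val mem)
    ;; record a new value for addr, shadowing any earlier binding
    (acons addr val mem))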
Figure 6 presents an example fragment of the low-level functional design model.
It is typical of the ACL2 microcode model in that each line of microcode is modeled
by how it updates the state of the partition-relevant machine. A small program
was written that identifies all microcode that can be run in trusted mode, and the
results were used to check informally that the ACL2 model in fact models every
line of microcode that runs in trusted mode. An additional manual check was per-
formed to ensure that the output of this tool correctly identified entry/exit points of
the trusted microcode. The entire AAMP7G model is approximately 3,000 lines of
ACL2 definitions.
The AAMP7G GWV theorem was proved using ACL2. The proof architecture
decomposes the proof into three main pieces:
1. Proofs validating the correctness theorem (as described in [7])
2. Proof that the abstract model meets the security specification
3. Proof that the low-level model corresponds with the abstract model (see the sketch below).
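The third piece typically has the shape of a commuting-diagram theorem. The following ACL2 sketch is purely illustrative, with hypothetical names (abstr for the abstraction mapping, next-low and next-abs for the low-level and abstract step functions, and good-state-p for the model invariant); the actual AAMP7G theorem is far more detailed:

  (defthm low-level-step-corresponds
    ;; abstracting after a low-level step agrees with stepping the abstract model
    (implies (good-state-p st)
             (equal (abstr (next-low st))
                    (next-abs (abstr st)))))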
The analysis described in Sects. 3 and 4 was part of the evaluation that led to the
AAMP7’s certification [18]. Machine checking provides a high level of confidence
that the theorem was in fact a theorem. But how do we ensure that proving the theorem
really means that the AAMP7G has the appropriate behavior? An important step in
the process was to conduct a code-to-spec review with a National Security Agency
evaluation team. This review validated the theorem. Each of the functions in the
formal specification was reviewed. The most complex of these functions is the rep-
resentation of the AAMP7G’s design, as the link between the model of the design
and the actual implementation must be established. As discussed earlier, the model
was designed specifically to facilitate this kind of scrutiny.
The documentation package that was created for this review included:
Material explaining the semantics of ACL2 and AAMP7G microcode
Listings of the AAMP7G microcode and the ACL2 low-level model
The source code listing of a tool that identifies trusted-mode microcode se-
quences, and a listing of such sequences in the AAMP7G microcode
Cross-references between microcode line numbers, addresses, and formal model
line numbers
The ACL2-checkable proofs in electronic form.
The exhaustive review accounted for each line of trusted microcode and each model
of a line of trusted microcode, ensuring that there was nothing left unmodeled, that
there was nothing in the model that was not in the actual device, and that each line
of the model represented the actual behavior of the microcode.
This review was made possible because the model of the AAMP7G was designed
to correspond to the actual device: in particular, the concrete microprocessor model
maintains a line-for-line correspondence with the microcode and employs a
linear address space memory model. Although the proof was considerably more challenging
to construct because this approach was taken, the proofs were all machine checked
so little of that effort was borne by the evaluators. The machine-checked formal
analysis allowed the evaluators to focus on validation that the security policy and
model described what they were interested in – operation of a separation kernel and
the AAMP7G – rather than trying to determine through inspection or testing that
the device implementation always did the right thing.
Since we model the AAMP7G instruction set in its entirety, we can analyze
AAMP7G machine code from any source, including compilers and assemblers.
Additionally, since we directly model memory, we merely translate the binary file
for a given AAMP7G machine code program into a list of (address, data) pairs that
can be loaded into ACL2. We load the code, reset the model, and the execution of
the machine code then proceeds, under the control of an Eclipse-based user interface
that was originally written to control the actual AAMP7G.
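Loading such a binary image then amounts to folding memory writes over the pair list. A minimal sketch, reusing the hypothetical write-mem helper from the earlier memory-model sketch, is:

  (defun load-image (pairs mem)
    ;; pairs is a list of (address . data) conses taken from the binary file
    (if (consp pairs)
        (load-image (cdr pairs)
                    (write-mem (caar pairs) (cdar pairs) mem))
      mem))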
We then validate the AAMP7G instruction set model by executing instruction
set diagnostics on the model that are used for AAMP processor acceptance testing.
A typical diagnostic exercises each instruction, plus context switching, exception
handling, etc.
7 Conclusion
We have presented a summary of the formal modeling and verification that led to
a MILS certificate for the AAMP7G microprocessor, enabling a single AAMP7G
to concurrently process Unclassified through Top Secret codeword information. We
discussed the formal model architecture of the AAMP7G at several levels, includ-
ing the microcode and instruction set levels. We described how the ACL2 theorem
prover was used to develop a formal security specification, the GWV theorem, and
outlined a mathematical proof (machine-checked using ACL2) which established
that the AAMP7G trusted microcode implemented that security specification, in
accordance with EAL 7 requirements. We discussed the evaluation process that val-
idated the formal verification evidence through a code-to-spec review. Finally, we
detailed a compositional reasoning technique at the instruction set level, based on
symbolic simulation.
Acknowledgments Many thanks to three excellent teams that contributed greatly to this effort: the
AAMP7G development team at Rockwell Collins, the ACL2 development team at the University
of Texas at Austin, and the AAMP7G security evaluation team from the US DoD.
References
1. Alves-Foss J, Taylor C (2004) An analysis of the GWV security policy. In: Proceedings of the
fifth international workshop on ACL2 and its applications, Austin, TX, Nov. 2004
2. Best D, Kress C, Mykris N, Russell J, Smith W (1982) An advanced-architecture CMOS/SOS
microprocessor. IEEE Micro 2(3):11–26
3. Common Criteria for Information Technology Security Evaluation (CCITSE) (1999) Available
at http://www.radium.ncsc.mil/tpep/library/ccitse/ccitse.html
4. Greve D (2004) Address enumeration and reasoning over linear address spaces. In: Proceedings
of ACL2’04, Austin, TX, Nov. 2004
5. Greve D (2010) Information security modeling and analysis. In Hardin D (ed) Design and
verification of microprocessor systems for high-assurance applications. Springer, Berlin,
pp 249–299
6. Greve D, Wilding M, Hardin D (2000) High-speed, analyzable simulators. In: Kaufmann M,
Manolios P, Moore JS (eds) Computer-aided reasoning: ACL2 case studies. Kluwer, Dordrecht,
pp 89–106
7. Greve D, Wilding M, Vanfleet M (2003) A separation kernel formal security policy. In: Pro-
ceedings of ACL2’03
8. Greve D, Richards R, Wilding M (2004) A summary of intrinsic partitioning verification.
In: Proceedings of ACL2’04, Austin, TX, Nov. 2004
9. Hardin D, Wilding M, Greve D (1998) Transforming the theorem prover into a digital design
tool: from concept car to off-road vehicle. In: Hu A, Vardi M (eds) CAV’98, vol 1427 of LNCS.
Springer, Berlin, pp 39–44
10. Hardin D, Smith E, Young W (2006) A robust machine code proof framework for highly secure
applications. In: Proceedings of ACL2’06, Seattle, WA, Aug. 2006
11. Kaufmann M, Manolios P, Moore JS (2000) Computer-aided reasoning: an approach. Kluwer,
Dordrecht
12. Matthews J, Moore JS, Ray S, Vroon D (2006) Verification condition generation via theorem
proving. In: Proceedings of LPAR’06, vol 4246 of LNCS, pp 362–376
13. Moore JS (2003) Inductive assertions and operational semantics. In Geist D (ed) CHARME
2003, vol 2860 of LNCS. Springer, Berlin, pp 289–303
14. Moore JS, Boyer R (2002) Single-threaded objects in ACL2. In: Proceedings of PADL 2002,
vol 2257 of LNCS. Springer, Berlin, pp 9–27
15. Richards R (2010) Modeling and security analysis of a commercial real-time operating system
kernel. In Hardin D (ed) Design and verification of microprocessor systems for high-assurance
applications. Springer, Berlin, pp 301–322
16. Richards R, Greve D, Wilding M, Vanfleet M (2004) The common criteria, formal methods,
and ACL2. In: Proceedings of the fifth international workshop on ACL2 and its applications,
Austin, TX, Nov. 2004
17. Rockwell Collins, Inc. (2003) AAMP7r1 reference manual
18. Rockwell Collins, Inc. (2005) Rockwell Collins receives MILS certification from NSA on
microprocessor. Rockwell Collins press release, 24 August 2005. http://www.rockwellcollins.
com/news/page6237.html
19. RTCA, Inc. (2000) Design assurance guidance for airborne electronic hardware, RTCA/DO-
254
20. Rushby J (1981) Design and verification of secure systems. In: Proceedings of the eighth sym-
posium on operating systems principles, vol 15, December 1981
21. Rushby J (1999) Partitioning for safety and security: requirements, mechanisms, and assurance.
NASA contractor report CR-1999–209347
22. Wilding M, Hardin D, Greve D (1999) Invariant performance: a statement of task isolation
useful for embedded application integration. In: Weinstock C, Rushby J (eds) Proceedings of
dependable computing for critical applications – DCCA-7. IEEE Computer Society Depend-
able Computing Series
Compiling Higher Order Logic by Proof
There has recently been a surge of research on verified compilers for languages like
C and Java, conducted with the aid of proof assistants [22, 24, 25]. In work of this
kind, the syntax and semantics of the levels of translation – from the source language
to various intermediate representations and finally to the object code – are defined
explicitly by datatypes and inductively defined evaluation relations. Verification of a
program transformation is then typically performed by proving semantics preserva-
tion, e.g. by proving that a simulation relation holds, usually by rule induction over
the evaluation relation modelling the operational semantics. This deep-embedding
approach, in which the compiler under study is a logical function from a source
datatype to a target datatype, both represented in the logic, is a by-now classi-
cal methodology, which advances in proof environments support increasingly well.
A major benefit of such a formalized compiler is that all datatypes and algorithms
comprising the compiler are explicitly represented in the logic and are therefore
available for a range of formal analyses and manipulations. For example, compila-
tion algorithms are being re-scrutinized from the perspective of formal verification,
with the result that some are being precisely specified for the first time [41] and are
even being simplified. Finally, the technique applies to a wide range of functional
and imperative programming languages.1
1 There are a couple of choices when it comes to the deployment of such a verified compiler.
Since it is a deep embedding, an actual working compiler exists in the logic and deductive steps
can be used to compile and execute programs. This is not apt to scale, so it is much more likely
that the formalized compiler is automatically written out into the concrete syntax of an existing
programming language, compiled, and subsequently deployed in a standard fashion.
K. Slind ()
Rockwell Collins, Inc., Bloomington, MN, USA
e-mail: klslind@rockwellcollins.com
2 There are some exceptions; for example, to the authors’ knowledge, no efficient purely func-
tional Union-Find algorithm has yet been found. However, see [10] for an imperative Union-Find
implementation that can be used in a purely functional manner.
There are a variety of solutions to this problem. For example, one could prove
an equivalence between f and the operational semantics applied to the AST corre-
sponding to f , or one could attempt to prove the desired functional properties of
the AST using the operational semantics directly [42]. We will not investigate these
paths. Instead, we will avoid the verified compiler and compile the logical functions.
We have been pursuing an approach, based on the use of verified rewrite rules, to
construct a verifying compiler for the functional programming language inherent
in a general-purpose logical framework. In particular, a subset of the term language
dwelling within higher order logic (HOL) is taken as the source language; thus, there
is no AST type in our approach. Intermediate languages introduced during compi-
lation are not embodied in new types, only as particular kinds of terms. This means
that source programs and intermediate forms are simply functions in HOL enjoying
exactly the same semantics. Thus, a major novelty of our compiler is that pro-
gram syntax and operational semantics need not be formalized. In addition, program
transformations are isolated clearly and specified declaratively, as term rewrites; a
different order of applying rewrites can lead to a different certifying compiler. For a
rewriting step, a theorem that establishes the equality for the input and result of the
transformation is given as by-product. We call this technique compilation by proof,
and it can be seen as a fine-grained version of translation validation [40] in which
much of the validation is conducted offline, when proving the rewrite rules.
Each intermediate language is derived from the source language by restricting
its syntax to certain formats and introducing new administrative terms to facilitate
compilation and validation. Thus, an intermediate language is a restricted instance of
the source language. One advantage of this approach is that intermediate forms can
be reasoned about using ordinary facilities supplied by the logic implementation,
e.g. ˇ-conversion. Our compiler applies translations such as normalization, inline
expansion, closure conversion, polymorphism elimination, register allocation and
structured assembly generation in order to translate a source program into a form
that is suitable for machine code generation. Before examining these more closely,
we first provide a self-contained discussion of the HOL logic and its implementation
in the HOL4 system [45].
The logic implemented by HOL systems [12] (there are several mature implementa-
tions besides HOL4, including ProofPower and HOL Light)3 is a typed higher order
predicate calculus derived from Church’s Simple Theory of Types [9]. HOL is a
3 Isabelle/HOL is a similar system; it extends the HOL logic with a Haskell-like type-class system.
classical logic and has a set theoretic semantics, in which types denote non-empty
sets and the function space denotes total functions. Formulas are built on a lambda
calculus with an ML-style system of types with polymorphic type variables.
Formally, the syntax is based on signatures for types (Ω) and terms (Σ_Ω). The
type signature assigns arities to type operators, while the term signature assigns
constants their types. These signatures are extended by principles of definition for
types and terms, as discussed later.
Definition 1 (HOL types). The set of types is the least set closed under the following rules:
Type variable. There is a countable set of type variables, which are represented with
Greek letters, e.g. α, β, etc.
Compound type. If op in Ω has arity n, and each of ty₁, …, tyₙ is a type, then
(ty₁, …, tyₙ)op is a type.
A type constant is represented by a 0-ary compound type. Types are definitionally
constructed in HOL, building on the initial types found in Ω: truth values (bool),
function space (written α → β), and an infinite set of individuals (ind).
Definition 2 (HOL terms). The set of terms is the least set closed under the following rules:
Variable. If v is a string and ty is a type built from Ω, then v : ty is a term.
Constant. (c : ty) is a term if c : σ is in Σ_Ω and ty is an instance of σ, i.e. there
exists a substitution θ for type variables such that each element of the range of θ is
a type in Ω and θ(σ) = ty.
Combination. (M N) is a term of type β if M is a term of type α → β and N is a
term of type α.
Abstraction. (λv. M) is a term of type α → β if v is a variable of type α and M is
a term of type β.
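As a small worked example of these formation rules: if Ω contains bool and Σ_Ω contains a constant T : bool, then v : bool is a term (Variable), (λv. v) is a term of type bool → bool (Abstraction), and ((λv. v) T) is a term of type bool (Constant and Combination).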
Initially, Σ_Ω contains constants denoting equality (= : α → α → bool), implication
(⇒ : bool → bool → bool), and an indefinite description operator ε : (α → bool) → α.
Types and terms form the basis of the prelogic, in which basic algorithmic manipulations
on types and terms are defined: e.g. the free variables of a type or term,
α-convertibility, substitution, and β-conversion. For describing substitution in the
following, the notation [M₁ ↦ M₂]N is used to represent the term N in which all
free occurrences of M₁ have been replaced by M₂. Of course, M₁ and M₂ must
have the same type in this operation. During substitution, every binding occurrence
of a variable in N that would capture a free variable of M₂ is renamed to avoid the
capture taking place.
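For instance, with this capture-avoiding convention, [x ↦ y](λy. x + y) = λz. y + z: the binder y is first renamed to a fresh z, so the y substituted for x is not captured.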
The primitive inference rules and axioms of the HOL deductive system are as follows (Γ, Γ₁, Γ₂, Γ₃ range over sets of hypotheses):

⇒-intro: from Γ ⊢ Q infer Γ − {P} ⊢ P ⇒ Q
⇒-elim: from Γ₁ ⊢ P ⇒ Q and Γ₂ ⊢ P infer Γ₁ ∪ Γ₂ ⊢ Q
∧-intro: from Γ₁ ⊢ P and Γ₂ ⊢ Q infer Γ₁ ∪ Γ₂ ⊢ P ∧ Q
∧-elim: from Γ ⊢ P ∧ Q infer Γ ⊢ P and Γ ⊢ Q
∨-intro: from Γ ⊢ P infer Γ ⊢ P ∨ Q and Γ ⊢ Q ∨ P
∨-elim: from Γ₁ ⊢ P ∨ Q, Γ₂ ∪ {P} ⊢ M, and Γ₃ ∪ {Q} ⊢ M infer Γ₁ ∪ Γ₂ ∪ Γ₃ ⊢ M
Assume: {P} ⊢ P
Refl: ⊢ M = M
Sym: from Γ ⊢ M = N infer Γ ⊢ N = M
Trans: from Γ₁ ⊢ M = N and Γ₂ ⊢ N = P infer Γ₁ ∪ Γ₂ ⊢ M = P
Comb: from Γ₁ ⊢ M = N and Γ₂ ⊢ P = Q infer Γ₁ ∪ Γ₂ ⊢ M P = N Q
Abs: from Γ ⊢ M = N infer Γ ⊢ (λv. M) = (λv. N), provided v is not free in Γ
Bool: ⊢ P ∨ ¬P
Eta: ⊢ (λv. M v) = M
Select: ⊢ P x ⇒ P (εx. P x)
Infinity: ⊢ ∃f : ind → ind. (∀x y. (f x = f y) ⇒ (x = y)) ∧ ∃y. ∀x. ¬(y = f x)
2.2 Definitions
This concludes the formal description of the basic HOL logic. For more extensive
discussion, see [35].
4 However, in some cases, it is practical to allow external proof tools to be treated as oracles deliv-
ering HOL theorems sans proof. Such theorems are tagged in such a way that the provenance of
subsequent theorems can be ascertained.
The view of proof in HOL4 is that the user interactively develops the proof at a high
level, leaving subsidiary proofs to automated reasoners. Towards this, the system
provides an underlying database of theorems (case analysis, induction, etc.) which
supports user control of decisive proof steps. In combination with a few ‘declarative
proof’ facilities, this allows many proofs to be conducted at a high level.
HOL4 provides a suite of automated reasoners. All produce HOL proofs. Propo-
sitional logic formulas can be sent off to external SAT tools and the resulting
resolution-style proofs are backtranslated into HOL proofs. For formulas involv-
ing N, Z, or R, decision procedures for linear arithmetic may be used. A decision
procedure for n-bit words has recently been released. For formulas falling (roughly)
into first-order logic, a robust implementation of ordered resolution, implemented
by Joe Hurd, is commonly used.
Finally, probably the most commonly used proof technique in HOL (in common
with other proof systems) is simplification. There are several simplification proof
tools. For example, there is a call-by-value evaluation mechanism which reduces
ground, and some symbolic, terms to normal form [3]. A more general, and more
heavily used, tool – the simplifier – provides conditional and contextual ordered
rewriting, using matching for higher order patterns. The simplifier may be extended
with arbitrary context-aware decision procedures.
As mentioned, most simple proofs in HOL can be accomplished via a small amount
of interactive guidance (specifying induction or case-analysis, for example) fol-
lowed by application of the simplifier and first-order proof search. However, it is
common for proof tool developers to write their own inference procedures, special-
ized to the task at hand. Such work is typically based on tactics and conversions.
The HOL4 system provides a wide collection of theories on which to base further
verifications: booleans, pairs, sums, options, numbers (N, Z, Q, R, fixed point,
floating point, and n-bit words), lists, lazy lists, character strings, partial orders,
monad instances, predicate sets, multi-sets, finite maps, polynomials, probability,
abstract algebra, elliptic curves, lambda calculus, program logics (Hoare logic,
separation logic), machine models (ARM, PPC, and IA32), temporal logics (!-
automata, CTL, -calculus, and PSL), and so on. All theories have been built up
definitionally and together represent hundreds of man-years of effort by researchers
and students.
HOL4 also has an informal notion of a library, which is a collection of theo-
ries, APIs, and proof procedures supporting a particular domain. For example, the
library for N provides theories formalizing Peano Arithmetic and extensions (nu-
merals, gcd, and simple number theory), a decision procedure, simplification sets
for arithmetic expressions, and an extensive collection of syntactic procedures for
manipulating arithmetic terms. Loading a library extends the logical context with
the types, constants, definitions, and theorems of the comprised theories; it also
automatically extends general proof tools, such as the simplifier and the evaluator,
with library-specific contributions.
Both theories and libraries are persistent: this is achieved by representing them
as separately compiled ML structures. A ‘make’-like dependency maintenance
tool is used to automatically rebuild formalizations involving disparate collections
of HOL4 libraries and theories, as well as ML or external source code in other
programming languages.
2.3.5 Applications
Peter Sewell and colleagues have used HOL4 to give the first detailed formal spec-
ifications of commonly used network infrastructure (UDP, TCP) [6]. This work has
heavily used the tools available in HOL4 for operational semantics. They also imple-
mented a derived inference rule which tested the conformance of real-world traces
with their semantics.
As an application of the HOL4 backend of the Ott tool [43], Scott Owens
has formalized the operational semantics of a large subset of OCaml and proved
type soundness [36]. The formalization heavily relied upon the definition packages
for datatypes, inductive relations, and recursive functions. Most of the proofs pro-
ceeded by rule induction, case analysis, simplification, and first-order proof search
with user-selected lemmas. In recent work, Norrish has formalized the semantics of
C++ [34].
An extremely detailed formalization of the ARM due to Anthony Fox sits at the
centre of much current work in HOL4 focusing on the verification of low-level soft-
ware. The development is based on a proof that a micro-architecture implements the
ARM instruction set architecture. In turn, the ISA has been extended with so-called
‘Thumb’ instructions (which support compact code) and co-processor instructions.
On top of the ISA semantics, Myreen has built a separation logic for the ARM and
provided proof automation [31].
The source language for our compiler is a subset of the HOL term language. This
subset, called TFL in [44], amounts to a polymorphic, simply typed, higher order,
pure functional programming language supporting pattern matching over algebraic
datatypes. A ‘program’ in this language is simply a (total) mathematical function,
and its semantics are obtained by applying the semantics of classical HOL [12].
Thus, notions of program execution, including evaluation order, are absent. This
approach has several benefits:
1. Proofs about TFL programs may be conducted in the ordinary mathematics
supported by HOL. Reasoning about a TFL program is typically based on the
induction theorem arising from the recursion structure of the program, rather
than induction along the evaluation relation of an operational semantics.
2. Many front end tasks in a compiler are already provided by HOL4: lexical anal-
ysis, parsing, type inference, overloading resolution, function definition, and
termination proof.
3. The syntax of the language resembles the pure core subset of widely used func-
tional programming languages such as SML and OCAML. Thus, our results can
be easily extended to these practical languages.
The syntax of TFL is shown in Fig. 2, where [term]separator means a sequence of
terms separated by the separator.
For example, Quicksort can be defined by the following invocations of a package
implementing the automatic definition of TFL functions:
Define
‘(PART P [] l1 l2 = (l1,l2)) /\
(PART P (h::rst) l1 l2 =
if P h then PART P rst (h::l1) l2
else PART P rst l1 (h::l2))‘;
tDefine
"QSORT"
‘(QSORT ord [] = []) /\
(QSORT ord (h::t) =
let (l1,l2) = PART (\y. ord y h) t [] []
in
QSORT ord l1 ++ [h] ++ QSORT ord l2)‘
(WF_REL_TAC ‘measure (LENGTH o SND)‘ THEN
< ... rest of termination proof ... >);
The definition of the partition function PART is by primitive recursion, using pat-
tern matching over Nil and Cons.5 Similarly, the definition of QSORT is recursive;
however, an explicit termination proof (mostly elided) is needed in this case. Thus,
Define automatically performs a simple but useful class of termination proofs while
tDefine has the termination argument explicitly supplied as a tactic. Reasoning
about QSORT is performed by using an induction theorem automatically derived af-
ter termination is proved.
In the following, we will sometimes not distinguish between fun and Define
or between datatype and Hol_datatype. As well, we use some ASCII renderings
of mathematical symbols. For example, λv. M is rendered as \v. M and ∧ is rendered as /\.
4 Compilation by Rewriting
The compilation process performs transformations that are familiar from exist-
ing functional language compilers except that transformations are implemented by
deductive steps. TFL’s high-level features such as polymorphism, higher order func-
tions, pattern matching, and composite expressions need to be expressed in terms of
much lower level structures. Briefly, the translator
Converts pattern matching first into nested case expressions and eventually into
explicit conditional expressions
Removes polymorphism from TFL programs by making duplications of poly-
morphic datatype declarations and functions for each distinct combination of
instantiating types
Names intermediate computation results and imposes an evaluation order in the
course of performing a continuation-passing-style (CPS) transformation
5 The ML-like notation [] and infix :: is surface syntax for Nil and Cons.
Fig. 3 Pattern matching: the translation from a stack of subterms to be matched and a matrix of pattern clauses into a nested case expression

For each datatype ty with constructors c₁, …, cₙ, a case constant satisfying

  ty_case f₁ … fₙ (c₁ x⃗) = f₁ x⃗
    ⋮
  ty_case f₁ … fₙ (cₙ x⃗) = fₙ x⃗
is defined. For example, the case expression for the natural numbers is defined as
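  num_case b f 0 = b
  num_case b f (SUC n) = f n

(equations restated from the standard HOL4 definition; the argument order — zero case, successor case, scrutinee — matches the gcd example shown later in this section).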
Case expressions form the target of the pattern-matching translation. The algorithm
shown in Fig. 3 converts a sequence of clauses of the form [patᵢ = rhsᵢ]
into a nested case expression. It takes two arguments: a stack of subterms that are
yet to be matched and a matrix whose rows correspond to the clauses in the pattern.
All rows are of equal length, and the elements in a column should have the same type.
Conversion proceeds from left to right, column by column. At each step the
first column is examined. If each element in this column is a variable, then the head
variable z in the stack is substituted for the corresponding vᵢ in the right-hand side
of each clause. If each element in the column is an application of a constructor of
some type τ whose constructors are c₁, …, cₙ, then the rows are partitioned into n
groups of sizes k₁, …, kₙ according to the constructors. After partitioning, a row
(c(p̄) :: pats; rhs) has its lead constructor discarded, resulting in the row
(p̄ @ pats; rhs). Here :: is the list constructor, and @ appends the second list to
the first one. If constructor cᵢ has type τ₁ → ⋯ → τⱼ → τ, then new variables
v₁ : τ₁, …, vⱼ : τⱼ are pushed onto the stack. Finally, the results for all groups
are combined into a case expression for the specified type.
(gcd (x, y) =
if y <= x then gcd (x - y, y)
else gcd (x, y - x))‘
After the pattern-matching transformation is used to define the function and ter-
mination is proved, we obtain the theorem (the irregularly named variables are an
artefact of the pattern-matching translation)
|- gcd a =
pair_case (\v v1.
num_case
(num_case 0 (\v7. SUC v7) v1)
(\v4. num_case (SUC v4)
(\v9. if SUC v4 >= SUC v9
then gcd (SUC v4 - SUC v9,SUC v9)
else gcd (SUC v4,SUC v9 - SUC v4))
v1)
v) a
where operator isCᵢ tells whether a variable matches the i-th constructor Cᵢ, i.e.
isCᵢ(Cⱼ x⃗) = T iff i = j; and operator destCᵢⱼ is the j-th projection function
for constructor Cᵢ. These operations are automatically defined when a datatype
is declared. In addition, an optimization is performed on tupled variables: if an
argument variable x has type τ₁ # … # τₙ, then it is replaced by a tuple of new
variables (x₁, …, xₙ). Superfluous branches and ‘let’ bindings are removed and
some generally useful rewrites are applied. In this manner, the gcd equations are
converted to
|- gcd (n1,n2) =
if n1 = 0 then
if n2 = 0 then 0 else n2
else
if n2 = 0 then n1
else
if n1 >= n2
then gcd (n1 - n2, n2)
else gcd (n1, n2 - n1)
4.2 Polymorphism
The idea of this transformation is to duplicate polymorphic datatype declarations at each
ground type used and function declarations at each type used, resulting in multiple
monomorphic clones of polymorphic datatypes and functions. This step paves the way
for subsequent conversions such as type-based defunctionalization. Although this
approach would seem to lead to code explosion, it is manageable in practice. For
example, MLton, a high-quality compiler for Standard ML, uses similar techniques
and reports a maximum increase of 30% in code size.
The first step is to build an instantiation map that enumerates, for each datatype
and function declaration, the full set of instantiations for each polymorphic type. As
mentioned above, a TFL program will be type checked by the HOL system and be
decorated with polymorphic type variables such as α, β, … when it is defined. In
particular, type inference is done for (mutually) recursive function definitions. The
remaining task is then to instantiate the generic types of a function with the actual
types of arguments at its call sites, and this is also achieved by type inference.
The notation used in this section is as follows. A substitution rule R = (t ↪ {τ})
maps a parameterized type t to a set of its type instantiations; an instantiation
set S = {R} is a set of substitution rules; and an instantiation map M = {z ↪ S}
maps a datatype or a function z to its instantiation set S. We write M.y for the value
at field y in the map M; if y ∉ Dom M then M.y returns an empty set. The union of
two substitution sets, S₁ ∪ₛ S₂, is {t ↪ (S₁.t ∪ S₂.t) | t ∈ Dom S₁ ∪ Dom S₂}. We
write ∪ₛ{S} for the union of a set of substitution rules. The union of two instantiation
maps, M₁ ∪ₘ M₂, is defined similarly. The composition of two instantiation sets
S₁ and S₂, denoted S₁ ∘ᵣ S₂, is {z ↪ ∪ₛ{S₂.t | t ∈ S₁.z} | z ∈ Dom S₁}.
Finally, the composition of an instantiation map M and a set S is defined as
M ∘ₘ S = {z ↪ M.z ∘ᵣ S | z ∈ Dom M}.
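For instance, under the union definition above, if S₁ = {α ↪ {num}} and S₂ = {α ↪ {bool}, β ↪ {num}}, then S₁ ∪ₛ S₂ = {α ↪ {num, bool}, β ↪ {num}}.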
The instantiation information for each occurrence of a polymorphic function and
datatype is gathered into an instantiation map during a syntax-directed, bottom-up
traversal. The main conversion rules, shown in Fig. 4, build the instantiation
map by investigating types and expressions, respectively. The rule for a single
variable/function declaration is trivial and omitted here: we just need to walk over
the right-hand side of its definition. If a top-level function f is called in the body of
another function g, then g must be visited first to generate an instantiation map Mg,
and then f is visited to generate Mf; finally these two maps are combined into a new
one, i.e. (Mf ∘ₘ Mg.f) ∪ₘ Mg. The clauses in mutually recursive functions can
be visited in an arbitrary order.
This algorithm makes use of a couple of auxiliary functions provided by the
HOL system. Function con2tp(c) maps a constructor c to the datatype to which it
belongs; at_tp(τ, D) returns the instantiation σ if there is a datatype definition
datatype (σ)D = … of …; and, when x is either a function name or a constructor,
match_tp(x, τ) matches the original type of x (i.e. the type when x is defined) with τ
and returns a substitution set.
After the final instantiation map is obtained, we duplicate each polymorphic
datatype and function declaration for all combinations of its type instantiations and
replace each call of the polymorphic function with the call to its monomorphic clone
with respect to the type. The automatic correctness proof for the transformation is
datatype (α, β) ty = c of α # β
fun f (x : α) = x
fun g (x : β, y : γ) = let h(z) = c (f x, f z) in h(y)
j = (g (1 : num, F), g (F, T))
In this definition, h has the inferred type γ → (β, γ) ty. The algorithm builds the
following instantiation maps:
This transformation bridges the gap between the form of expressions and control
flow structures in TFL and assembly. A TFL program is converted to a simpler
form such that (1) the arguments to function and constructor applications are atoms
like variables or constants; (2) discriminators in case expressions are also simple
expressions; (3) compound expressions nested in an expression are lifted to make
new ‘let’ bindings; and (4) curried functions are uncurried to a sequence of simple
functions that take a single tupled argument.
To achieve this, a CPS transformation is performed. The effect is to sequential-
ize the computation of TFL expressions by introducing variables for intermediate
results, and the control flow is pinned down into a sequence of elementary steps. It
extends the one in our software compiler [27] by addressing higher level structures
specific to TFL. Since there is no AST of programs in our approach, the CPS trans-
lation cannot be defined over a particular type; instead, it comprises a generic set of
rewrite rules applicable to any HOL term. The basis is the following definition of
the suspended application of a continuation parameter:
  ⊢ c e k = k e

To CPS an expression e we create the term c e (λx. x) and then exhaustively apply
the following rewrite rules, which are easy to prove since they are just rearrangements
of simple facts about the lambda calculus (Fig. 5).
Application of these rules pushes occurrences of c deeper into the term. After this
phase of rewriting finishes, we rewrite with the theorem ⊢ c e k = (let x = e in k x)
and β-reduce to obtain a readable ‘let’-based normal form.
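Fig. 5 is not reproduced here, but rules of the following shape are representative (an illustrative sketch rather than the exact rule set; each is an equational consequence of ⊢ c e k = k e, obtained by β-conversion, with the usual freshness side conditions and, for the conditional, a case split on the boolean test):

  ⊢ c (f e) k = c e (λv. c (f v) k)
  ⊢ c (if p then e₁ else e₂) k = c p (λv. if v then c e₁ k else c e₂ k)
  ⊢ c (let x = e₁ in e₂) k = c e₁ (λx. c e₂ k)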
Example 3. As a simple example with no control flow, consider the following oper-
ation found in the TEA block cipher [47]:
ShiftXor (x, s, k₀, k₁) = ((x ≪ 4) + k₀) ⊕ (x + s) ⊕ ((x ≫ 5) + k₁)
All operations are on 32-bit machine words. In HOL4’s ASCII representation this is
|- ShiftXor (x,s,k0,k1) =
((x << 4) + k0) ?? (x+s) ?? ((x >> 5) + k1)
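After CPS conversion and the final ‘let’-introduction rewrite, every intermediate result is named and an evaluation order is fixed. The resulting theorem has roughly the following shape (the variable names and the exact ordering chosen by the compiler are illustrative only):

|- ShiftXor (x,s,k0,k1) =
     let v1 = x << 4 in
     let v2 = v1 + k0 in
     let v3 = x + s in
     let v4 = v2 ?? v3 in
     let v5 = x >> 5 in
     let v6 = v5 + k1 in
       v4 ?? v6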
4.4 Defunctionalization
In the next phase of compilation, we convert higher order functions into equivalent
first-order functions and hoist nested functions to the top level. This is achieved
through a type-based closure conversion. After the conversion, no nested functions
exist; and function call is made by dispatching on the closure tag followed by a
top-level call.
Function closures are represented as algebraic data types in a way that, for each
function definition, a constructor taking the free variables of this function is created.
For each arrow type, we create a dispatch function, which converts the definition of
a function of this arrow type into a closure constructor application. A nested func-
tion is hoisted to the top level with its free variables to be passed as extra arguments.
After that, calls to the original function are replaced by calls to the relevant
dispatch function, passing a closure containing the values of the function's
free variables. The dispatch function examines the closure tag and passes control
to the appropriate hoisted function. Thus, higher order operations on functions are
replaced by equivalent operations on first-order closure values.
As an optimization, we first run a pass to identify all ‘targeted’ functions which
appear in the arguments or outputs of other functions and record them in a side effect
variable Targeted. Non-targeted functions need not be closure converted, and
calls to them are made as usual. During this pass we also find the functions to be
defined at the top level and record them in Hoisted. Finally, Hoisted contains
all top-level functions and nested functions to be hoisted.
The conversion works on simply typed functions obtained by monomorphisation.
We create a closure datatype and a dispatch function for each of the arrow types that arise.
As shown in Fig. 6, the main translation algorithm inspects the references and
applications of targeted functions and replaces them with the corresponding closures
and dispatch functions. A type-translation function returns the new types of variables.
When walking over expressions, the translation replaces calls to unknown functions
(i.e. those not present in Hoisted) with calls to the appropriate dispatch function,
and calls to known functions with calls to the hoisted functions; in the latter case,
the values of free variables are passed as extra arguments. Function references are
also replaced with appropriate closures. Finally, Redefn contains all converted
functions, which will be renamed and redefined in HOL at the top level.
(Fig. 6: the closure-conversion translation rules over types, expressions, and function declarations.)

Now we show the technique used to prove the equivalence of a source function f to
its converted form f′. We say that a variable v′ : σ′ corresponds to v : σ iff (1) v = v′
when both σ and σ′ are closure types or neither of them is; or (2) ∀x ∀x′.
dispatch_σ′(v′, x′) = v x when v′ is of closure type and v is of arrow type, and x′
corresponds to x; or vice versa. Then f′ is equivalent to f iff they correspond to each
other. The proof process is simple, as it suffices to rewrite with the old and new
definitions of the functions.
Example 4. The following higher order program
is closure converted to
where 1 and 2 stand for arrow types num ! bool and num ! num respectively.
The following theorems (which are proved automatically) justify the correctness of
this conversion:
`f Df0 ` k0 D k
` .8x: dispatch1 .s 0 ; x/ D s x/ ) 8x8y: dispatch2 .g 0 .s 0 ; x/; y/
D .g .s; x// y
In this section, the naming convention is: variables yet to be allocated begin with v,
variables spilled begin with m (memory variable), and those in registers begin with
r (register variable). A wildcard notation matches a variable of any of these kinds.
v̂, r̂, and m̂ stand for a fresh variable, an unused register, and a new memory location,
respectively. The predicate r ≽ v specifies that variable v is assigned to register r; by
definition ∀r ∈ S_mach. r ≽ r and ∀r ∈ S_mach. ∀m. ¬(r ≽ m) (where S_mach is the
set of machine registers). The notation avail e returns the set of available registers after
allocating e, i.e. avail e = S_mach − {r | ∃w. w ∈ e ∧ r ≽ w}. The administrative terms
app, save and restore are all defined as λx. x; and loc(v, l) = l indicates that
variable v is allocated to location l (where l is r or m). A function application is
denoted by app.
When the variable v in an expression let v = e₁ in e₂[v] is to be assigned a register, the
live variables to be considered are just the free variables in e₂, excluding v. If the live
variables do not use up all the machine registers, then we pick an available register
and assign v to it by applying rule assgn. Otherwise, we spill to memory a
variable consuming a register and assign this register to v. In some cases, we prefer
to spill a variable as early as possible: in the early spill rule, variable w's value is
spilled from r for future use; r may not be allocated to v in the subsequent allocation.
When encountering a memory variable in later phases, we need to generate code
that will restore its value from memory to a register (the v̂ in rule restore will
be assigned a register by the subsequent application of rule assgn).
The allocation can be viewed as being implemented by rewriting with a set of rules,
each of the form redex → contractum provided P. Such a rule specifies that an
expression matching redex can be replaced with the instantiated contractum provided
that side condition P over the redex holds. The declarative part of the rule,
redex → contractum, is a HOL theorem that characterizes the transformation to be
performed; the control part, P, specifies in what cases the rewrite should be applied.
Notation e[v] stands for an expression that has free occurrences of expression v; and
e[v₁, …, vₙ] → e[w₁, …, wₙ] indicates that, for all i with 1 ≤ i ≤ n, all occurrences
of vᵢ in e are replaced with wᵢ.
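As an illustration of this rule format (a sketch only; the actual rules also thread the loc, save, and restore bookkeeping described above), the assgn rule can be pictured as

  let v = e₁ in e₂[v]  →  let r̂ = e₁ in e₂[r̂]    provided r̂ ∈ avail e₂

i.e. an as-yet-unallocated variable is renamed to an unused register when one is available; the declarative part is a theorem because the renaming is just α-equivalence.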
Saving is necessary not only when registers are spilled, but also when functions
are called. Our compiler adopts the caller-save convention, so every function call is
assumed to destroy the values of all registers. Therefore, we need to save the values
of all registers that are live at that point, as implemented in the caller save rule. In
addition, as we allocate the two branches of a conditional expression separately, a
variable may be assigned different registers by the branches. This will contradict the
convention that a variable should be assigned only one register. In this case, we just
need to early spill it through the spill if rule.
In the final step, all save, store, and loc in an expression are eliminated. This
results in an equivalent expression containing only register variables and memory
variables.
6 Related Work
Hickey and Nogin [20] describe compiler construction in a logical framework based
on higher order rewrite rules. A set of unverified rewriting rules is used to convert a
higher level program to a lower level program. They use higher order abstract syntax
to represent programs and do not define the semantics of these programs. Thus, no
formal verification of the rewriting rules is done.
Proof producing compilation for smaller subsets of logic has already been in-
vestigated in a prototype hardware compiler, which synthesizes Verilog netlists
[14], and a software compiler [28], which produced low-level code from first-order
HOL functions.
Hannan and Pfenning [17] constructed a verified compiler in LF for the untyped
λ-calculus. The target machine is a variant of the CAM runtime and differs greatly
from real machines. In their work, programs are associated with operational seman-
tics; and both compiler transformation and verifications are modeled as deductive
systems. Chlipala [8] further considered compiling a simply typed λ-calculus to
assembly language. He proved semantics preservation based on denotational se-
mantics assigned to the intermediate languages. Type preservation for each compiler
pass was also verified. The source language in these works is the bare lambda cal-
culus and is thus much simpler than TFL; thus, their compilers only begin to deal
with the high-level issues, which we discuss in this paper.
Compared with Chlipala [8] who gives intermediate languages dependent types,
Benton and Zarfaty [4] interpret types as binary relations. They proved semantic
type soundness for a compiler from a simple imperative language with heap-
allocated data into an idealized assembly language.
Leroy [7, 25] verified a compiler from a subset of C, i.e. Clight, to PowerPC as-
sembly code in the Coq system. The semantics of Clight is completely deterministic
and specified as a big-step operational semantics. Several intermediate languages
are introduced, and translations between them are verified. The proof of semantics
preservation for the translation proceeds by induction over the Clight evaluation
derivation and case analysis on the last evaluation rule used; in contrast, our proofs
proceed by verifying the rewriting steps.
A purely operational semantics-based development is that of Klein and Nipkow
[22] which gives a thorough formalization of a Java-like language. A compiler from
this language to a subset of Java Virtual Machine is verified using Isabelle/HOL.
The Isabelle/HOL theorem prover is also used to verify the compilation from a
type-safe subset of C to DLX assembly code [24], where a big step semantics and
a small step semantics for this language are defined. Meyer and Wolff [29] de-
rive in Isabelle/HOL a verified compilation of a lazy language called MiniHaskell
to a strict language called MiniML based on the denotational semantics of these
languages.
a range of program analyses, some of which may be done by the compiler itself,
but some of which are external. Among the most important of these analyses is
program verification. In a setting where program properties are important enough to
be formally proved, the following are desirable:
A source language with clear semantics and a good program logic.
A compilation path, the result of which is machine code with a guarantee that
executing that code on the target platform yields a result equal to that specified
by the source program.
The approach taken in this paper satisfies the above criteria. Since the source lan-
guage is a subset of HOL functions, the semantics are clear and the program logic
is just HOL. We have also shown how compilation of a program can produce lower
level programs as an integral part of proving the compilation run correct.
In future work, we want to scale up the compilation algorithms and investigate
wider application of the technique. In particular, what applications formalize easily
in pure functional programming and are important enough to warrant full functional
verification and correctness of compilation?
Acknowledgement Thanks to David Hardin, Mike Gordon, Magnus Myreen, and Thomas Türk
for help, encouragement, and advice.
References
11. Dave MA (2003) Compiler verification: a bibliography. ACM SIGSOFT Softw Eng Notes
28(6):2
12. Gordon M, Melham T (1993) Introduction to HOL, a theorem proving environment for higher
order logic. Cambridge University Press, Cambridge
13. Gordon M, Milner R, Wadsworth C (1979) Edinburgh LCF: a mechanised logic of computa-
tion, Lecture notes in computer science, vol 78. Springer, Berlin
14. Gordon M, Iyoda J, Owens S, Slind K (2005) Automatic formal synthesis of hardware from
higher order logic. In: Proceedings of fifth international workshop on automated verification of
critical systems (AVoCS), ENTCS, vol 145
15. Gordon MJC, Hunt WA, Kaufmann M, Reynolds J (2006a) An embedding of the ACL2 logic
in HOL. In: Proceedings of ACL2 2006, ACM international conference proceeding series, vol
205. ACM, New York, NY, pp 40–46
16. Gordon MJC, Reynolds J, Hunt WA, Kaufmann M (2006b) An integration of HOL and ACL2.
In: Proceedings of FMCAD 2006. IEEE Computer Society, Washington, DC, pp 153–160
17. Hannan J, Pfenning F (1992) Compiler verification in LF. In: Proceedings of the 7th sympo-
sium on logic in computer science
18. Harrison J (1995) Inductive definitions: automation and application. In: Schubert ET, Windley
PJ, Alves-Foss J (eds) Proceedings of the 1995 international workshop on higher order logic
theorem proving and its applications (Aspen Grove, Utah), LNCS, vol 971. Springer, Berlin,
pp 200–213
19. Harrison J (1998) Theorem proving with the real numbers. CPHC/BCS distinguished disserta-
tions, Springer, Berlin
20. Hickey J, Nogin A (2006) Formal compiler construction in a logical framework. High Order
Symbolic Comput 19(2–3):197–230
21. Kaufmann M, Manolios P, Moore JS (2000) Computer-aided reasoning: an approach. Kluwer,
Dordrecht
22. Klein G, Nipkow T (2006) A machine-checked model for a Java-like language, virtual machine
and compiler. TOPLAS 28(4):619–695
23. Krauss A (2009) Automating recursive definitions and termination proofs in higher order logic.
PhD thesis, Institut für Informatik, Technische Universität München
24. Leinenbach D, Paul W, Petrova E (2005) Towards the formal verification of a C0 compiler:
code generation and implementation correctness. In: 4th IEEE international conference on soft-
ware engineering and formal methods (SEFM 2006)
25. Leroy X (2006) Formal certification of a compiler backend, or: programming a compiler with
a proof assistant. In: Proceedings of POPL 2006. ACM, New York, NY
26. Leroy X (2009) Formal verification of a realistic compiler. Commun ACM 52(7):107–115
27. Li G, Slind K (2007) Compilation as rewriting in higher order logic. In: Conference on auto-
mated deduction (CADE-21), July 2007
28. Li G, Owens S, Slind K (2007) Structure of a proof-producing compiler for a subset of higher
order logic. In: 16th European symposium on programming (ESOP’07)
29. Meyer T, Wolff B (2004) Tactic-based optimized compilation of functional programs. In:
Filliâtre J-C, Paulin-Mohring C, Werner B (eds) TYPES 2004. Springer, Heidelberg
30. Myreen M (2009) Formal verification of machine-code programs. PhD thesis, University of
Cambridge
31. Myreen M, Gordon M (2007) Hoare logic for realistically modelled machine code. In: Pro-
ceedings of TACAS 2007, LNCS vol 4424. Springer, Berlin
32. Myreen M, Slind K, Gordon M (2009) Extensible proof-producing compilation. In: de Moor
O, Schwartzbach M (eds) Compiler construction, LNCS, vol 5501. Springer, Heidelberg
33. Nipkow T, Paulson LC, Wenzel M (2002) Isabelle/HOL – a proof assistant for higher-order
logic, LNCS, vol 2283. Springer, Berlin
34. Norrish M (2008) A formal semantics for C++. In: Informal proceedings of TTVSI
35. Norrish M, Slind K (2009) The HOL system: logic, 1998–2009. At http://hol.sourceforge.net/
36. Owens S (2008) A sound semantics for OCaml-Light, In: Proceedings of ESOP 2008, LNCS,
vol 4960. Springer, Berlin
37. Owre S, Rushby JM, Shankar N, Stringer-Calvert DJ (1998) PVS system guide. SRI Computer
Science Laboratory, Menlo Park, CA. Available at http://pvs.csl.sri.com/manuals.html
38. Paulson L (1983) A higher order implementation of rewriting. Sci Comput Program 3:119–149
39. Pfenning F, Elliot C (1988) Higher order abstract syntax. In: Proceedings of PLDI. ACM,
New York, NY, pp 199–208
40. Pnueli A, Siegel M, Singerman E (1998) Translation validation. In: Proceedings of TACAS’98,
Lecture notes in computer science, vol 1384. Springer, Berlin, pp 151–166
41. Rideau L, Serpette B, Leroy X (2008) Tilting at windmills with Coq: formal verification of a
compilation algorithm for parallel moves. J Autom Reason 40(4):307–326
42. Ridge T (2009) Verifying distributed systems: the operational approach. In: Shao Z, Pierce BC
(eds) Proceedings of the 36th ACM SIGPLAN-SIGACT symposium on principles of program-
ming languages, POPL 2009, Savannah, GA, USA, January 21–23, 2009. ACM, New York,
NY, pp 429–440
43. Sewell P, Nardelli F, Owens S, Peskine G, Ridge T, Sarkar S, Strnisa R (2007) Ott: effective
tool support for the working semanticist. In: Proceedings of ICFP 2007. ACM, New York, NY
44. Slind K (1999) Reasoning about terminating functional programs. PhD thesis, Institut für In-
formatik, Technische Universität München
45. Slind K, Norrish M (2008) A brief overview of HOL4. In: Mohamed O, Muñoz C, Tahar S
(eds) TPHOLs, Lecture notes in computer science, vol 5170. Springer, Heidelberg, pp 28–32
46. Tolmach A, Oliva DP (1998) From ML to Ada: strongly-typed language interoperability via
source translation. J Funct Program 8(4):367–412
47. Wheeler D, Needham R (1999) TEA, a tiny encryption algorithm. In: Fast software encryption:
second international workshop, Lecture notes in computer science, vol 1008. Springer, Berlin,
pp 363–366
Specification and Verification of ARM Hardware
and Software
This introductory section provides a high-level summary of the history and evolving
goals of the ARM verification project. Section 2, by Anthony Fox, is a more detailed
look into the modelling and verification of ARM processors. Section 3, by Magnus
Myreen, is more detailed than the others and introduces a new method for creating
trustworthy software implementations directly on bare metal. This approach uses
the Fox processor model for the semantics of a machine code programming logic
that borrows some ideas from separation logic.
In the late 1990s, Graham Birtwistle, at the University of Leeds, was investi-
gating the use of the Standard ML (SML) functional programming language for
modelling ARM processors. He approached Mike Gordon, a longtime collabora-
tor, about the possibility of a joint project to extend the Leeds modelling work to
formal verification. Birtwistle and Gordon, together with contacts at ARM Ltd in
Cambridge, submitted a research proposal to the UK Engineering and Physical Sci-
ences Research Council (EPSRC) entitled “Formal Specification and Verification of
ARM6”. This application was initially turned down on the grounds that the ARM6
processor was obsolete. However, following a strong letter from ARM pointing out
that they could not place more modern designs in the public domain, the project was
funded on resubmission.
The EPSRC project supported two PhD students at Leeds, Dominic Pajak and
Daniel Schostak, and a postdoctoral researcher at Cambridge, Anthony Fox. Pajak
and Schostak developed SML models of the ARMv3 ISA and ARM6 micro-
architecture, respectively. They both had summer internships at ARM in Cambridge,
and this enabled them to talk to ARM engineers to find out details, especially
concerning the ARM6 micro-architecture, that were not easily available. Fox took
details from Pajak and Schostak's models, and public ARM documentation, and
developed formal specifications in higher order logic (HOL) suitable for formal
verification.
1 For example, the class of temporal abstraction maps required for superscalar designs is necessarily more general than that needed for conventional pipelined processors.
2 At the time, a pen-and-paper proof of correctness was not feasible/attempted for the superscalar design. Since then some bugs have been identified.
In the latter half of 2000, Fox moved to Cambridge to start working on the ARM6
project. Work had already begun at Leeds; however, their ARM6 model had not
been completed yet. This meant that Fox, who had no previous experience in using
theorem provers, could gradually start learning HOL4 (then at version Taupo-4).
Getting to know HOL can be challenging but fortunately he shared an office with
Michael Norrish and there were various other HOL gurus around, including Kon-
rad Slind. As an initial project, the Swansea approach was formalised in HOL. This
involved defining predicates that characterised the various classes of state systems
and abstraction maps (for example, state-dependent immersions); formalising the
definition of correctness (one general enough to cover conventional pipelined pro-
cessors); and proving the 1-step theorems. Then the framework was given a test run
with the formal verification of a tiny micro-programmed CPU, see [5]. This was
followed by the formal verification of the pipelined design from Fox’s thesis.3
Work on specifying the ARM instruction set architecture (ISA) in HOL began in
2001, see [6]. In this context, the ISA is taken to correspond with the assembly
programmer’s view of an architecture. In general, programmers have access to a
fixed set of registers (contained in a CPU) and to a much larger main memory –
this is usually connected to the processor via a memory bus.4 To write code, the
assembly programmer has at their disposal a set of low-level instructions – these all
update the registers and memory in various precisely defined ways. For example,
typically there will be a set of data processing instructions, which use an arithmetic
logic unit (ALU) to perform primitive operations – such as addition, multiplica-
tion or bitwise logic – on registers.5 There will also be a set of memory access
instructions for loading data from memory to registers and for storing registers to
memory. The overall set of instructions is often extended with the introduction of
new architecture generations. The number and variety of registers and instructions
can vary considerably across platforms, but there will normally be at least a handful
of registers and a few dozen or so instructions. Instructions are encoded as a sequence
3 A minor bug was found in the pen-and-paper proof.
4 In practice, memory may be implemented with a series of caches, firmware, RAM and sometimes with virtual memory, e.g. a hard disk or a solid-state drive (SSD). However, memory details are invariably implementation dependent and are mostly hidden from the programmer. In some cases, the actual behaviour can be somewhat counter-intuitive, see [1].
5 In CISC architectures, these instructions may address the memory as well as just registers.
of bits (machine code) to be stored in the main memory. In the x86 architecture,
instructions have a variable length,6 but with the ARM architecture all instructions
are 32 bits long.
The official descriptions of ISAs need to be relatively precise. This is invariably
achieved through the use of pseudo-code and in some cases the descriptions are
semi-formal. To define an operational semantics for the ARM architecture in HOL,
Fox used the specifications produced by Birtwistle’s group at Leeds, in conjunction
with Steve Furber’s book [10] and the official ARM610 data sheet. The objective
was to accurately declare a type S, corresponding to the programmer's model
state space (registers and memory), and to define a next-state function next : S → S
that specifies the operational semantics of the ARM instructions, i.e. the effect of
the instructions on the registers and memory. Fortunately, HOL provides excellent
support for modelling systems in a functional style, thanks to its “type base” tools,7
and by virtue of Slind’s TFL environment, see [29]. The specification was structured
according to instruction classes i.e. groups of similar instructions were specified as
a whole. To begin with I/O was not considered, in particular, hardware interrupts
were not modelled.
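To give a flavour of this modelling style, here is a minimal Standard ML sketch of the shape of such a programmer's-model state space and next-state function; the type, field and constructor names are our own placeholders, not Fox's actual HOL declarations.

(* Sketch only: placeholder names, not the project's HOL definitions. *)
datatype RName = R0 | R1 | R2 | R15 | R8_usr | CPSR   (* register names, elided *)

type word32 = Word32.word

type state = {
  registers : RName  -> word32,   (* programmer's model registers *)
  memory    : word32 -> word32    (* main memory, word addressed  *)
}

(* next decodes the instruction at the program counter and applies its
   effect to the registers and memory; the body is elided here.        *)
fun next (s : state) : state = s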
where RName represents the complete set of register names, e.g. r8_usr and CPSR.
Although HOL has good support for working with algebraic types, there was a slight
problem with regards to modelling machine words. At the time, HOL had a theory
of words developed by Wai Wong, see [31]; however, this theory was list based and
made heavy use of restricted quantifiers, with predicates used to restrict the scope
of universal quantifiers.8 It was decided that this theory would be too cumbersome
to use in the context of the ARM6 verification effort, particularly with regards to
symbolic evaluation. This started the winding road to developing HOL’s current
6 This is mainly because the x86 architecture has its origins in 8-bit and 16-bit computing. Although this variable instruction length can greatly complicate the hardware needed to decode instructions, it can give excellent code density. ARM added a set of 16-bit (Thumb) instructions in order to improve code density.
7 It is possible to define and work with types in HOL that correspond with algebraic data types.
8 HOL is based on simple type theory and does not directly support predicate sub-typing.
theory of n-bit words, with the latest version using an idea from John Harrison
(see [17]) in order to get around the perceived need for restricted quantifiers.
In 2002, the ARM6 micro-architecture was modelled in HOL, see [7], and in 2003
a formal verification was completed, see [8]. The ARM6 microprocessor dates from
around 1994 and was widely deployed in a number of low-powered devices, such as
the Apple Newton PDA. The processor’s micro-architecture is relatively simple, em-
ploying a three-stage pipeline with fetch, decode and execute stages. As with other
commercial designs, details of the processor’s implementation are not in the public
domain. It was only through collaboration with ARM Ltd, and Daniel Schostak’s
internship there, that it was possible to develop the formal model. The ARM6 pro-
cessor is no longer in production, which was a factor in us gaining permission to
carry out this research. However, it is worth noting that the ARM9 (circa 2004 and
used in the Nokia N-Gage) is not a superscalar design and has a five-stage pipeline.
It can be argued therefore that the verification of the ARM6 is still pertinent with
respect to some more modern designs.
Daniel Schostak produced a very detailed model of the ARM6 for his thesis,
see [28]. He only introduced a limited amount of abstraction, modelling the RTL
(register transfer level) with a two-phase clock model. A limited amount of data
abstraction was applied when producing the cycle accurate HOL model. One of the
most useful resources in achieving this was Schostak’s tabular style paper specifi-
cation.9 For example, his tabular description of the DIN latch (which stores input
from the data bus) is shown below.
DIN
IC          IS
*           *
data_proc   t2    IREG
mrs_msr     t2    IREG
ldr         t2    IREG
ldr         t4    DATA
str         t2    IREG
ldm         t4    DATA
ldm         tn    DATA
swp         t4    DATA
br          t2    IREG
mrc         t4    DATA
ldc         t2    IREG
stc         t2    IREG
x           x     x
9 Schostak produced extensive paper specifications of the ARM6 using various styles. He also produced a high-fidelity implementation in ML and now works full time at ARM Ltd.
Here ic represents the instruction class and is is the instruction step, for example,
t3 is the first cycle of the pipeline execute stage and tn represents an iterated phase.
By defining the next-state behaviour of all of the processor’s latches and buses, it
was possible to define a next-state function for the entire ARM6 core. Formal
verification proceeded by case splitting over the instruction class – the final version
had 17 such classes. Inevitably, a small number of bugs were found in all of the
specifications. Ultimately the ARM6 can be regarded as a reference implementation
and so the formal verification can be seen as an exercise in developing an ISA model
that is a verified abstraction of the processor.
2.4.1 Coverage
Somewhat confusingly, the ARM6 processor implements version three of the ARM
architecture, written as ARMv3. To begin with, all of the ARMv3 instructions were
modelled at the ISA level but some “hard” features were not included in the first
ARM6 model – accordingly they were dropped from the ISA model prior to carry-
ing out the initial verification attempt. The omissions included the mul, ldm and
stm instructions, which all have relatively complex low-level behaviour (an iterated
phase).10 To complete the formal proof, invariants were constructed for these cases.
The coprocessor instructions and hardware interrupts required models with input
and output and this is discussed below. A feature complete formal verification was
finished in 2005, see [9].
To accommodate input and output (I/O) features, the HOL formalisation of the
Swansea approach was extended. It was also necessary to make significant changes
to the ISA and micro-architecture models, and the formal verification required a fair
amount of reworking. More sophisticated reasoning is required when verifying the
correctness of coprocessor instructions and hardware interrupts. For example, the
10 The ARM6 ALU does not contain a multiplier, so instead the processor's adder and shifter are used to implement Booth's algorithm over a number of clock cycles.
communication between the ARM core and coprocessor happens through a busy-
wait loop, which has to be assumed to terminate after some indeterminate interval.
It is also necessary to reason about the priority and timing of interrupts and, added
to this mix, a reset signal can abort instructions at any cycle.
In the process of adding I/O, the memory was removed from the state-space of
the ISA and micro-architecture specifications. At the ISA level, this meant that the
state-space consisted of just the programmer's model registers, together with an
instruction register (the op-code of the instruction to be run) and an exception status
field, that is:
Following the formal verification of 2005, it was decided to extend the ISA model
and focus on machine code verification, forgoing the considerable overhead associ-
ated with further extending and re-verifying the ARM6 model. It was at about this
time that Magnus Myreen started his PhD at Cambridge. To begin with ARMv3M
was supported (with the inclusion of long [64-bit] multiplies) and then ARMv4 was
covered (through the addition of half-word and signed load and store instructions).
At the time of writing this article, the ARMv4 architecture is still very much in use
– it is implemented by a selection of processors in the ARM7, ARM8 and ARM9
families (as used in the Nintendo DS and Apple iPod).
After making these extensions, the next step was to provide support for reason-
ing about assembly code. In particular, it is not especially practical to work directly
with 32-bit machine-code values. To this end, a HOL type was added to represent
decoded ARM instructions; a parser/assembler was written;11 there was also sup-
port for pretty-printing instructions, i.e. providing disassembly of machine code.
A new top-level next-state function was defined (using the existing definitions as
sub-functions) and this reintroduced the main memory as part of the state-space.
Consequently it was again possible to run code using the model and one could
11 This was originally done using mosmllex and mosmlyac and later ported to mllex and mlyacc, so as to generate Standard ML.
also start reasoning about the semantics of programs. A pure memory model was
assumed, that is to say, the memory was treated as a simple array with read and
write accesses that never fail. A fast method for running code (useful in testing
the model) was provided through the use of Konrad Slind’s EmitML tool – this
converted the HOL definitions into Standard ML. This ML code was compiled with
MLton, resulting in an instruction throughput performance of approximately 10,000
instructions-per-second (10 kips).
It was then necessary to address the problem that the formal model some-
what obfuscates the behaviour of particular instruction instances. For example,
one cannot read the specification and immediately see the effect of the instruc-
tion add r1,r2,r3. The reasons for this are: the underlying model is based
on machine code; the specification is structured according to instruction classes
(not instruction instances); the overall semantics is expressed through one mono-
lithic, top-level next-state function. To address this, a collection of single-step
theorems of the following form are generated:
P(s)  ⇒  (next(s) = s′)
With the addition of the 16-bit Thumb instructions, the ARMv4 model was later
extended to ARMv4T. A more advanced mechanism for constructing a complete
system was also examined, i.e. building a system composed of ISA, memory and
coprocessor models. A compositional, circuit-based style was adopted, wherein
the output from one unit is connected to the input of another. This means that one
can more easily consider different system configurations, for example, “plugging-
in” different memory models. This contrasts with the previous approach wherein the
memory was more hard-wired into the ISA specification.
In addition to his work with the ARM architecture, Myreen has also worked with
formal models of the x86 and PowerPC architectures. The x86 model initially came
from collaborating with Susmit Sarkar, who has been working with Peter Sewell and
others in the field of relaxed memory models, see [1]. This group made enquiries
as to the suitability of the ARM model with respect to their research. However, a
single-step operational semantics was not what they were after – they needed to know
the precise order of all memory and register accesses. In collaboration with Myreen,
they had developed a monadic approach to ISA specification and, inspired by this,
Fox agreed to completely re-specify the entire ARM ISA using this approach. This
would provide an event-based semantics for work on relaxed memory models and
an operational semantics for work on code verification.
In their monadic approach, three principal operators are used: sequencing
(seqT or >>=), parallel composition (parT or |||) and returning a constant
value (constT or return). For example, in
(f ||| g) >>= (λ(x,y). return (x + y + 1))
the operations f and g are performed in parallel and the results are then combined
in a summation and returned. The overall type of this term is num M and the pre-
cise details of this type are hidden underneath a HOL-type abbreviation on M. For
example, in the standard ARM operational semantics, we have:
'a M = arm_state → ('a, arm_state) error_option
Here arm_state is the state-space and error_option is just like the standard
functional option type, except that the "none" case is tagged with a string, which
provides a useful mechanism for reporting erroneous behaviour. In this sequential
operational semantics, the parT operator is evaluated sequentially with a left-to-
right ordering, e.g. f is applied before g in the example above.
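To make the shape of this monad concrete, the following is a small Standard ML sketch of the sequential instance described above; the operator names follow the text, while the placeholder state type and the exact constructor names are our own assumptions rather than the HOL4 definitions.

type state = int list                          (* placeholder state type        *)

datatype ('a, 's) error_option =
    ValueState of 'a * 's                      (* normal result and next state  *)
  | Error of string                            (* the "none" case, tagged       *)

type 'a M = state -> ('a, state) error_option

fun constT (x : 'a) : 'a M = fn s => ValueState (x, s)

fun seqT (f : 'a M) (g : 'a -> 'b M) : 'b M =
  fn s => case f s of
            ValueState (x, s') => g x s'
          | Error e            => Error e

(* In the sequential semantics parT is evaluated left to right. *)
fun parT (f : 'a M) (g : 'b M) : ('a * 'b) M =
  seqT f (fn x => seqT g (fn y => constT (x, y)))

fun errorT (msg : string) : 'a M = fn _ => Error msg

(* The example from the text: run f and g, then return x + y + 1. *)
fun example (f : int M) (g : int M) : int M =
  seqT (parT f g) (fn (x, y) => constT (x + y + 1))

With pretty-printing, terms built from these combinators can be displayed in a do-notation style, as in the branch_write_pc example below.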
There are many advantages to working in this monadic style; these include:
- The ability to modify the underlying semantics by simply changing the monad's type and the definitions of the monadic operators.
- The ability to avoid excessive parameter passing and to hide details of the state-space. In some cases, there might not even be a state-space.
- It provides a clean way to handle erroneous cases. In particular, it is easy to model behaviour that the ARM architecture classifies as Unpredictable.
- With some pretty-printing support, the definitions look more like imperative code. This makes the specifications more readable to those unfamiliar with functional programming and it also provides a more visible link with pseudo-code from reference manuals.
For example, consider the following pseudo-code from the ARM architectural ref-
erence manual:
With pretty-printing turned on this corresponds with the following HOL code:
⊢ ∀ii address.
    branch_write_pc ii address =
      do
        iset ← current_instr_set ii;
        if iset = InstrSet_ARM then
          do
            version ← arch_version ii;
            if version < 6 ∧ (1 >< 0) address <> 0w then
              errorT "branch_write_pc: unpredictable"
            else
              branch_to ii ((31 <> 2) address)
          od
        else
          branch_to ii ((31 <> 1) address)
      od
Although the translation is not literal, there is clearly a connection between the
two specifications. The function branch_write_pc has return type unit M, that
is to say, it is similar to a procedure (or void function in C). The HOL model
introduces a variable ii, which is used to uniquely identify the source of all read
and write operations – this becomes significant in multi-core systems with shared
memory. The operator errorT is used to handle the unpredictable case. The word
extract and slice operations (>< and <>) are used to implement the bit operations
shown in the ARM reference. Inequality is overloaded to be <>, which corresponds
with != in the pseudo-code. Observe that the HOL specification does not explicitly
refer to state components; such details are hidden by the monad, and the operations
arch_version and current_instr_set automatically have access to all the
data that they need. In the sequential model, the state actually contains a component
that identifies the specific version of the architecture being modelled e.g. ARMv4 or
ARMv4T, both of which give a version number of four. This makes it possible to
simultaneously support multiple architecture versions. Further refinement has also
been made in the process of producing the new specification, especially with regard
to instruction decoding and the representation of instructions.
2.6.1 Coverage
The monadic specification covers all of the currently supported ARM architec-
ture versions, that is to say: ARMv4, ARMv4T, ARMv5T, ARMv5TE, ARMv6,
ARMv6K, ARMv6T2, ARMv7-A, ARMv7-R and ARMv7-M. A significant number
of new instructions were introduced with ARMv6, which first appeared with
the ARM11 family of processors. The latest generation (ARMv7) has only a small
number of extra ARM instructions but these versions do all support Thumb2 – this
provides a large number of double-width Thumb instructions, which cover nearly all
of the functionality of the standard ARM instructions. In fact, the latest Cortex-M
processors only run Thumb2 instructions and are designed to be used as microcon-
trollers. The Cortex-A8 processor (as found in the Apple iPhone 3GS and Palm Pre)
implements ARMv7-A.
The HOL model also covers the Security and Multiprocessor extensions. It does
not support Jazelle, which provides hardware support for running Java bytecode.
Technical details about Jazelle and its implementations are restricted to ARM li-
censees only, see [2]. Consequently, a HOL specification of Jazelle is very unlikely.
Documentation is available for the ThumbEE, VFP (vector floating-point) and
Advanced SIMD extensions, but they have not been specified yet – the SIMD ex-
tensions were introduced with ARMv7 and the associated infrastructure is referred
to as NEON™ technology.
Recently a tool for generating single-step theorems for the monadic model has been
developed. These theorems are now generated entirely on-the-fly for specific op-
codes.12 This contrasts with the previous approach whereby a collection of pre-
generated theorems (effectively templates) are stored and then specialised prior to
use. The old approach is not practical in the context of the much larger number of
instructions and range of contexts. The single-step theorems are generated entirely
through forward proof and so the process is not especially fast. Consequently, it
may prove necessary to store some of the resulting theorems in order to improve
runtimes further down the line.
The function call
arm_stepLib.arm_step "v6T2,be,thumb,sys" "FB02F103";
generates the corresponding single-step theorem. For brevity/clarity, some parts of
the antecedent have been omitted. The first argument to arm_step is a string
containing configuration options, e.g. the architecture version and the byte ordering.
The second string is the instruction op-code. In the example above, 0xFB02F103 is
the machine code for the Thumb2 instruction mul r1,r2,r3.13 The four instruction
bytes are read from memory using the program counter, which is register 15.
12 This tool makes heavy use of the HOL conversion EVAL.
13 At the moment op-codes are being generated using GNU's binutils tools. FB02F103 breaks up into 251w, 2w, 241w and 3w, which are used in the theorem.
The next-step theorem generated in this way bears little or no visible resemblance to the
underlying monadic specification. The functions in uppercase are defined in a post
hoc manner, so as to present a more conventional state-oriented semantics. The top-
level next-state function ARM NEXT returns an option type – if an error occurred
(e.g. with an unpredictable case), then the result would be NONE, but in practice the
tool raises an ML exception for such cases.
Current work includes focusing on updating the parser, assembler and pretty-
printing support. The instruction parser has been completely rewritten in ML,
abandoning the use of mllex and mlyacc. This provides greater flexibility in sup-
porting multiple architectures and encoding schemes. It should also facilitate better
code portability. It would have been possible to avoid writing an assembler and in-
stead interface with GNU’s binutils but this would require users to specifically
install these tools, configuring them as an ARM cross-compiler. Also, only the very
latest version of the GNU assembler supports Thumb2, and it does appear to have a
small number of teething problems (bugs) in that area. The parser is nearly complete
– generating op-codes is the next stage.
Another future area of work will be in handling I/O. It should be relatively
straightforward to add hardware interrupts. Readers may have observed that the se-
quential version of the monadic model again includes the memory as part of the
state-space. This could be considered to be a regressive step in comparison with
the approach discussed at the end of Sect. 2.5. However, the monadic approach
does make it easier to modify the underlying memory model.14 It is expected that
memory-mapped I/O (MMIO) can be supported by interleaving calls to next-state
functions, i.e. the ISA next-state function would be interleaved with an MMIO next-
state function.
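As an illustration of this interleaving idea (our sketch, not the project's code), the combined system could be driven as follows, with placeholder component types:

type system = {cpu : int, mem : int -> int}    (* placeholder system state *)

fun isa_next  (s : system) : system = s        (* elided: one ISA step     *)
fun mmio_next (s : system) : system = s        (* elided: one MMIO step    *)

(* Run n interleaved steps of the combined system. *)
fun run (0, s : system) = s
  | run (n, s)          = run (n - 1, mmio_next (isa_next s))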
In 2005, Myreen started his PhD which came to focus on theories and tools for
proving ARM machine code correct on top of Fox’s formal specification of the
ARM ISA. This section presents the current state of the resulting framework which
has come to support both formal verification of existing ARM machine code and
synthesis of new ARM code from functional specifications. The three layers of this
framework are as follows:
1. Hoare logic for machine code is used for making concise and composable formal
specifications about ARM code (Sect. 3.1).
14 Although at the moment the arm_step tool does make some assumptions about the memory model.
A formal definition of this Hoare triple will be given later, also see [22].
The frame property manifests itself in practice as a proof rule called the frame
rule (again borrowed from separation logic). The frame rule allows an arbitrary
assertion r to be added to any Hoare triple {p} c {q} using the separating conjunction *
(defined later):

{p} c {q}  ⇒  ∀r. {p * r} c {q * r}

The frame property of our Hoare triples allows us to mention only locally relevant
resources; e.g. a theorem describing the ARM instruction add r4,r3,r4
(encoded as 0xE0834004) need only mention registers 3, 4 and 15 (the
program counter). For example, the following Hoare-triple theorem states that if register
3 has value a, register 4 has value b and the program counter is p, then the code
E0834004 at location p will reach a state where register 3 has value a, register 4
holds a + b and the program counter is set to p + 4:

{ r3 a * r4 b * pc p }
p : E0834004
{ r3 a * r4 (a + b) * pc (p + 4) }

The frame rule allows us to infer that the value of register 5 is left unchanged by
the above ARM instruction, since we can instantiate r in the frame rule above with
an assertion saying that register 5 holds value c, i.e. r5 c.
{ r3 a * r4 b * pc p * r5 c }
p : E0834004
{ r3 a * r4 (a + b) * pc (p + 4) * r5 c }
All user-level ARM instructions satisfy specifications in this style. Memory reads
and writes are not much different; e.g. the Hoare-triple theorem describing the in-
struction swp r4,r4,[r3] (E1034094), which swaps the content of memory
location a given in register 3 with that of register 4, is given below. Here m m
states that a function m, a partial mapping from addresses (32-bit words) to values
(32-bit words), correctly represents a portion of memory (the addresses in domain m);
address a must be in the memory portion covered by m and for tidiness needs to be
word-aligned, i.e. a & 3 = 0; we write m[a ↦ b] for m updated to map a to b.

a & 3 = 0 ∧ a ∈ domain m ⇒
{ r3 a * r4 b * m m * pc p }
p : E1034094
{ r3 a * r4 (m(a)) * m (m[a ↦ b]) * pc (p + 4) }
The following subsections will present the definition of our machine-code Hoare
triple and some proof rules (HOL theorems) that have been derived from the defini-
tion of the Hoare triple and are hence sound.
The definition of our machine-code Hoare triple uses the separating conjunction,
which we define unconventionally to split sets rather than partial functions. Our
definition of the set-based separating conjunction states that (p * q) s holds whenever s
can be split into two disjoint sets u and v such that p holds for u and q holds for v:

(p * q) s  =  ∃u v. p u ∧ q v ∧ (u ∪ v = s) ∧ (u ∩ v = {})
An ARM state is converted into a set of state elements by the function arm2set:

arm2set state =
  range (λr. Reg r (arm_read_reg r state)) ∪
  range (λa. Mem a (arm_read_mem a state)) ∪
  range (λs. Status s (arm_read_status s state)) ∪
  { Undef (arm_read_undefined state) }
Some basic assertions are defined over sets of ARM state elements as follows.
We often write r1 a, r2 b, etc. as abbreviations for reg 1 a, reg 2 b, etc.

(reg i a) s  =  (s = { Reg i a })
(mem a w) s  =  (s = { Mem a w })
These assertions have their intended meaning when used with arm2set:

aligned a        =  (a & 3 = 0)
emp s            =  (s = {})
⟨b⟩ s            =  (s = {}) ∧ b
(code c) s       =  (s = { Mem (a[31:2]) i | (a, i) ∈ c })
(m m) s          =  (s = { Mem (a[31:2]) (m a) | a ∈ domain m ∧ aligned a })
(pc p) s         =  (s = { Reg 15 p, Undef F }) ∧ aligned p
(s (n,z,c,v)) s  =  (s = { Status N n, Status Z z, Status C c, Status V v })
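As a small executable illustration of the set-splitting reading of the separating conjunction, here is our Standard ML sketch, with state elements reduced to plain strings rather than the Reg/Mem/Status elements above:

type element   = string                     (* stand-in for ARM state elements *)
type assertion = element list -> bool       (* assertions over sets-as-lists   *)

(* All ways of splitting a list into two disjoint parts. *)
fun splits []        = [([], [])]
  | splits (x :: xs) =
      List.concat
        (map (fn (u, v) => [(x :: u, v), (u, x :: v)]) (splits xs))

(* (p * q) s holds when s splits into disjoint u and v with p u and q v. *)
fun star (p : assertion, q : assertion) : assertion =
  fn s => List.exists (fn (u, v) => p u andalso q v) (splits s)

(* Assertions in the style of (reg i a): exactly one named element. *)
fun one (e : element) : assertion = fn s => s = [e]
val emp : assertion = fn s => null s

(* star (one "Reg 3 a", one "Reg 4 b") holds of ["Reg 3 a", "Reg 4 b"]. *)
val example = star (one "Reg 3 a", one "Reg 4 b") ["Reg 3 a", "Reg 4 b"]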
Let run.n; s/ be a function which applies the next-state function from our ARM
ISA specification n times to ARM state s.
run(0, s)    =  s
run(n+1, s)  =  run(n, arm_next_state(s))
Our machine-code Hoare triple has the following definition: any state s which sat-
isfies p separately from code c and some frame r (written p * code c * r) will
reach (after some k applications of the next-state function) a state which satisfies q
separately from the code c and frame r (written q * code c * r).
Below we list some theorems proved from the definition of our Hoare triple. These
theorems are cumbersome to use in manual proofs, but easy to use in building proof
automation, which is the topic of the next two sections.
Frame:  {p} c {q}  ⇒  ∀r. {p * r} c {q * r}
The frame rule allows any assertions to be added to the pre- and postconditions of a Hoare
triple, often applied before composition.
3.2.1 Example
Given some ARM code, which calculates the length of a linked list,
0: E3A00000 mov r0, #0 ; set reg 0 to 0
4: E3510000 L: cmp r1, #0 ; compare reg 1 with 0
8: 12800001 addne r0, r0, #1 ; if not equal: add 1 to reg 0
12: 15911000 ldrne r1, [r1] ; load mem[reg 1] into reg 1
16: 1AFFFFFB bne L ; jump to compare
the decompiler reads the hexadecimal numbers, extracts a function f and a safety
condition fpre which describe the data-update performed by the ARM code:
The decompiler also proves the following theorem, which states that f is accurate
with respect to the ARM model for input values that satisfy fpre. Here
(k1, k2, ..., kn) is (x1, x2, ..., xn) abbreviates k1 x1 * k2 x2 * ... * kn xn,
i.e. the expression (r0, r1, m) is (r0, r1, m) states that register 0 has value r0, register 1
is r1 and part of memory is described by m.
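As a rough Standard ML rendering of the kind of function and side condition the decompiler extracts from this code (our reconstruction from the assembly above; the names f, fpre and the memory representation are assumptions):

type mem = int -> int option     (* partial memory: NONE means "not in domain" *)

(* The data update performed by the loop at offsets 4..16: count one node,
   follow the link held in register 1, and repeat until the pointer is null. *)
fun loop (r0, r1, m : mem) =
  if r1 = 0 then (r0, r1, m)
  else loop (r0 + 1, valOf (m r1), m)

(* f corresponds to the whole block, including the initial  mov r0,#0. *)
fun f (r0 : int, r1, m : mem) = loop (0, r1, m)

(* fpre: every address the loop dereferences is word aligned and inside m.
   (Like the loop itself, this only terminates for acyclic lists.)          *)
fun fpre (r1, m : mem) =
  r1 = 0 orelse
  (r1 mod 4 = 0 andalso isSome (m r1) andalso fpre (valOf (m r1), m))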
The user can then prove that the original machine code indeed calculates the
length of a linked-list by simply proving that the extracted function f does that. Let
list state that abstract list l is stored in memory m from address a onwards.
list (nil, a, m)   =  (a = 0)
list (x::l, a, m)  =  ∃a′. m(a) = a′ ∧ m(a+4) = x ∧ a ≠ 0 ∧ list (l, a′, m) ∧ aligned a
Given (2) and (3), it follows immediately from (1) that the ARM code calculates the
length of a linked-list correctly:
3.2.2 Implementation
The following loop-introduction rule is the key idea behind our decompiler imple-
mentation. This rule can introduce any tail-recursive function tailrec, with safety
condition tailrec_pre, of the form sketched below.
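A Standard ML rendering of that shape (our sketch, not the HOL definition) is a generic tail-recursive schema with a loop guard, a step function and an exit action:

(* Generic tail-recursive schema: loop while the guard holds, stepping with
   the step function, and apply the exit action on leaving the loop.  The
   names guard/step/exit play the roles of G, F and D in the rule below.
   tailrec_pre would additionally require the side conditions of each step
   to hold on every iteration (not shown).                                  *)
fun tailrec (guard, step, exit) x =
  if guard x then tailrec (guard, step, exit) (step x) else exit x

The list-length loop of Sect. 3.2.1 is an instance, with the guard testing whether register 1 is non-null, the step following one link while incrementing the count, and the exit action the identity.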
Given a theorem for the step case, {r(x)} c {r(F x)}, and one for the base case,
{r(x)} c {r′(D x)}, the loop rule can introduce tailrec.
The loop rule can be derived from the rule for composition of Hoare triples given
in Sect. 3.1. For details of decompilation, see [22, 24].
It is often the case that we prefer to synthesise ARM code from specifications rather
than apply post hoc verification to existing ARM code. For this purpose, we have
developed a proof-producing compiler [25] which maps tail-recursive functions in
HOL4, i.e. functional specifications, to ARM machine code and proves that the
ARM code is a valid implementation of the original HOL4 functions.
3.3.1 Example 1
Given a function f,
If we have manually proved that f calculates the unsigned-word modulus of 10, i.e.
∀x. f(x) = x mod 10, then we immediately know that the ARM code calculates
the modulus of 10:
3.3.2 Example 2
f(r1, r2, r3) = let r1 = r1 + r2 in
                let r1 = r1 + r3 in
                let r1 = r1 mod 10 in
                  r1
will compile successfully into a theorem which makes use of the previously verified
code (the last three instructions in the code below):
{ r1 r1 * r2 r2 * r3 r3 * pc p * s }
p : E0811002, E0811003, E351000A, 2241100A, 2AFFFFFC
{ r1 (f(r1, r2, r3)) * r2 r2 * r3 r3 * pc (p + 20) * s }
3.3.3 Implementation
Extensions are implemented by making the decompiler use the theorems pro-
vided when constructing the step- and base-theorems for instantiating its loop rule,
as explained in Sect. 3.2.
The construction of a verified LISP interpreter [23] is, to date, the largest case study
conducted on top of the ARM model. This case study included producing and veri-
fying implementations for:
- A copying garbage collector
- An implementation of basic LISP evaluation
- A parser and printer for s-expressions
3.4.1 Example
For a flavour of what we have implemented and proved, consider an example: if our
implementation is supplied with the following call to function pascal-triangle,
(pascal-triangle ’((1)) ’6)
The theorem we have proved about our LISP implementation can be used to show
e.g. that running pascal-triangle will terminate and print the first n + 1 rows of
Pascal's triangle, without a premature exit due to lack of heap space. One can use
our theorem to derive sufficient conditions on the inputs to guarantee that there will
be enough heap space.
The most interesting part of this case study is possibly the construction of verified
code for LISP evaluation. For this we used our extensible compiler, described above.
First, the compiler’s input language was extended with theorems that provide
ARM code that performs LISP primitives, car, cdr, cons, equal, etc. These theo-
rems make use of an assertion lisp, which states that a heap of s-expressions v1 : : : v6
is present in memory. For car of s-expressions v1 , we have the theorem:
is pair v1 )
f lisp .v1 ; v2 ; v3 ; v4 ; v5 ; v6 ; l/ pc p g
p W E5933000
f lisp .car v1 ; v2 ; v3 ; v4 ; v5 ; v6 ; l/ pc .p C 4/ g
The cons primitive was the hardest one to construct and prove correct, since the im-
plementation of cons contains the garbage collector: cons is guaranteed to succeed
whenever the size of all live s-expressions is less than the heap limit l.
Once the compiler understood enough LISP primitives, we defined lisp_eval as
a lengthy tail-recursive function and used the compiler to synthesise ARM code for
implementing lisp_eval.
In order to verify the correctness of lisp_eval, we proved that lisp_eval will al-
ways evaluate s to r in environment ρ whenever a clean relational semantics for LISP
evaluation, which had been developed in unrelated previous work [11], evaluates s
to r in environment ρ, written (s, ρ) →eval r. Here the s-expression nil initialises vari-
ables v2, v3, v4 and v6; functions t and u are translation functions from one form of
s-expression into another.

∀s r. (s, ρ) →eval r  ⇒  fst (lisp_eval (t s, nil, nil, nil, u ρ, nil, l)) = t r
The heap of s-expressions defined within the lisp assertion used above is non-trivial
to set up. Therefore we constructed verified code for setting up and tearing down a
heap of s-expressions. The set-up code also parses s-expressions stored as a string
in memory and sets up a heap containing that s-expression. The tear-down code
prints the string representation of an s-expression from the heap into a buffer in
memory. The code for set-up/tear-down and parsing/printing was again synthesised
from functions in the HOL4 logic.
By composing theorems for parsing, evaluation and printing, we get the final cor-
rectness theorem: if →eval relates s with r under the empty environment (i.e.
(s, []) →eval r), no illegal symbols are used (i.e. sexp_ok (t s)), running lisp_eval
on t s will not run out of memory (i.e. lisp_eval_pre (t s, nil, nil, nil, nil, nil, l)), the
string representation of t s is in memory (i.e. string a (sexp2string (t s)))
and there is enough space to parse t s and set up a heap of size l (i.e.
enough_space (t s) l), then the code will execute successfully and terminate with
the string representation of t r stored in memory (i.e. string a (sexp2string (t r))).
The ARM code expects the address of the input string to be in register 3, i.e. r3 a.
∀s r l p.
  (s, []) →eval r ∧ sexp_ok (t s) ∧ lisp_eval_pre (t s, nil, nil, nil, nil, nil, l) ⇒
  { ∃a. r3 a * string a (sexp2string (t s)) * enough_space (t s) l * pc p }
  p : ... code not shown ...
  { ∃a. r3 a * string a (sexp2string (t r)) * enough_space′ (t s) l * pc (p + 10404) }
We have also proved this result for similar x86 and PowerPC code. Our verified
LISP implementations can be run on ARM, x86 and PowerPC hardware.
The ARM verification project has been a fairly modest-scale effort: one person
full-time specifying and verifying the hardware (Fox) and one to two part-time
researchers looking at software and the background mathematical theories (Hurd,
Myreen). In addition, some students have spent time assisting the research, namely
Scott Owens, Guodong Li and Thomas Tuerk.
The project aims to verify systems built out of COTS components where ev-
erything – micro-architecture up to abstract mathematics – is formalised within a
single framework. The research is still in progress and, unlike the celebrated CLI
Stack [21], we have not yet completely joined up the various levels of modelling,
but this remains our goal. Unlike most other work, we have used a COTS processor
and have tried (and are still trying) to formally specify as much as possible, includ-
ing difficult features like input/output and interrupts. The closest work we know of is
the verification of security properties of the Rockwell Collins AAMP7G processor
[13, 14]. More on AAMP7G can be found in other chapters of this book.
Even though the ARM ISA is relatively simple, the low-level details can over-
whelm verification attempts. During the project we have found that it is important
to abstract as much as possible so that proofs are not cluttered with such details.
A key tool for this has been the derivation of a next-state function for CPU–memory
combinations, which can then be used to derive clean semantic specifications for in-
struction use-cases and then support a further abstraction to Hoare-like rules for
machine code segments, with the frame problem managed via a separating conjunc-
tion. Some of the technical details pertaining to this abstraction methodology are
sketched in the preceding two sections.
Although our formal specifications include input/output, interrupts and facili-
ties for modelling complex memory models, we have yet (2009) to demonstrate
significant verification case studies that utilize them. Our current work aims to cre-
ate a complete functional programming platform on bare metal, with high-fidelity
modelling of system level timing and communication with the environment. We ex-
pect that achieving this will take several more years of research at the current level
of effort.
References
1. Alglave J, Fox A, Ishtiaq S, Myreen M, Sarkar S, Sewell P, Nardelli FZ (2009) The semantics of
Power and ARM multiprocessor machine code. In: Basin D, Wolff B (eds) Proceedings of the
4th ACM SIGPLAN workshop on declarative aspects of multicore programming. Association
for Computing Machinery, New York, NY, pp 13–24
2. ARM Ltd. (2009) Jazelle technology. http://www.arm.com/products/multimedia/java/jazelle.html (accessed in July 2009)
3. Burch J, Dill D (1994) Automatic verification of pipelined microprocessor control. In: Computer aided verification (CAV '94), Lecture notes in computer science, vol 818. Springer, Berlin, pp 68–80
4. Fox ACJ (1998) Algebraic models for advanced microprocessors. PhD thesis, University of
Wales, Swansea
5. Fox ACJ (2001a) An algebraic framework for modelling and verifying microprocessors using
HOL. In: Technical report 512, University of Cambridge Computer Laboratory, April 2001
6. Fox ACJ (2001b). A HOL specification of the ARM instruction set architecture. In: Technical
report 545, University of Cambridge Computer Laboratory, June 2001
7. Fox ACJ (2002) Formal verification of the ARM6 micro-architecture. In: Technical report 548,
University of Cambridge, Computer Laboratory, 2002
8. Fox ACJ (2003) Formal specification and verification of ARM6. In: Basin D, Wolff B (eds)
Theorem proving in higher order logics, vol 2758 of Lecture notes in computer science.
Springer, Berlin, pp 25–40
9. Fox ACJ (2005) An algebraic framework for verifying the correctness of hardware with in-
put and output: a formalization in HOL. In: Fiadeiro J, Harman N, Roggenbach M, Rutten
JJMM (eds) CALCO 2005, vol 3629 of Lecture notes in computer science. Springer, Berlin,
pp 157–174
10. Furber S (2000) ARM: system-on-chip architecture, 2nd edn. Addison-Wesley, Reading, MA
11. Gordon M (2007) Defining a LISP interpreter in a logic of total functions. In: The ACL2
theorem prover and its applications (ACL2)
12. Gordon MJC (1983) Proving a computer correct with the LCF-LSM hardware verification
system. In: Technical report 42, University of Cambridge Computer Laboratory, 1983
13. Greve D, Wilding M, Vanfleet WM (2003) A separation kernel formal security policy. In:
ACL2 workshop 2003, June 2003
14. Greve D, Richards R, Wilding M (2004) A summary of intrinsic partitioning verification.
In: ACL2 Workshop 2004, November 2004
15. Hardin D (2008) Invited tutorial: considerations in the design and verification of micropro-
cessors for safety-critical and security-critical applications. In: Proceedings of FMCAD 2008,
November 2008
16. Harman NA, Tucker JV (1997) Algebraic models of microprocessors: the verification of a sim-
ple computer. In: Stavridou V (ed) Mathematics of dependable systems II. Oxford University
Press, Oxford, pp 135–170
17. Harrison JR (2005) A HOL theory of Euclidean space. In: Hurd J, Melham T (eds) Theo-
rem proving in higher order logics, 18th International conference, TPHOLs 2005, vol 3603 of
Lecture notes in computer science, Oxford, UK. Springer, Berlin, pp 114–129
18. Hurd J (2005) Formalizing elliptic curve cryptography in higher order logic. Available from
the author’s Web site, October 2005
19. Hurd J, Gordon M, Fox A (2006) Formalized elliptic curve cryptography. In: High confidence
software and systems: HCSS 2006, April 2006
20. McCarthy J, Abrahams PW, Edwards DJ, Hart TP, Levin MI (1966) LISP 1.5 programmer’s
manual. MIT, Cambridge, MA
21. Moore JS (foreword) (1989) Special issue on systems verification. J Autom Reason 5(4):
461–492
22. Myreen MO (2009a) Formal verification of machine-code programs. PhD thesis, University of
Cambridge
23. Myreen MO (2009b) Verified implementation of LISP on ARM, x86 and PowerPC.
In: Theorem proving in higher-order logics (TPHOLs). Springer, Berlin
24. Myreen MO, Slind K, Gordon MJC (2008) Machine-code verification for multiple architec-
tures – an application of decompilation into logic. In: Formal methods in computer aided design
(FMCAD). IEEE, New York, NY
25. Myreen MO, Slind K, Gordon MJC (2009) Extensible proof-producing compilation. In: Com-
piler construction (CC). Springer, Heidelberg
26. Reynolds J (2002) Separation logic: a logic for shared mutable data structures. In: Proceedings
of logic in computer science (LICS). IEEE Computer Society, Washington, DC
27. Sawada J, Hunt WA Jr (2002) Verification of fm9801: an out-of-order microprocessor model
with speculative execution, exceptions, and program-modifying capability. Formal Methods
Syst Des 20(2):187–222
28. Schostak D (2003) Methodology for the formal specification of RTL RISC processor designs
(with particular reference to the ARM6). PhD thesis, University of Leeds
29. Slind K (2009) TFL: an environment for terminating functional programs. http://www.cl.cam.
ac.uk/ks121/tfl.html (accessed in July 2009)
30. Thery L (2007) Proving the group law for elliptic curves formally. In: Technical report RT-
0330, INRIA, 2007
31. Wong W (1993) Formal verification of VIPER's ALU. In: Technical report 300, University of Cambridge Computer Laboratory, April 1993
Information Security Modeling and Analysis
David A. Greve
1 Introduction
Coq, in which proof scripts are stored as special comments within PVS theories.
Many of the theories we present here have been edited for the sake of brevity,
but a Web site containing the complete proof scripts can be found by visiting
http://extras.springer.com and entering the ISBN for this book.
While many of the concepts in this chapter were first formalized in the logic
of ACL2, PVS provides a convenient, generally accessible mathematical frame-
work for presenting high-level concepts involving sets, quantifiers, and first-class
functions not available in ACL2. These formalizations also act as a sanity check,
helping to ensure that our understanding of the concepts is consistent and portable
across different formalizations. The theories presented here were developed using
PVS 4.2 and ProofLite 4.0.
In our framework, sequential computing systems that interact with the external envi-
ronment are generally modeled as state machines. A state machine model suggests a
state transition (or step) function that operates over a set of inputs and an initial state
to produce a set of outputs and a next state. This function can be applied iteratively
to successive inputs and states to simulate the evolution of the system state and its
outputs over time.
State machine models are significant in our analysis because they allow us to
decompose our analysis into both the single-step and the multistep (trace) behavior
of the system. Many security properties are best stated as single-step properties.
Some properties, however, must be analyzed over an entire execution trace. State
machine models support the analysis of both.
State transition operations in our framework are modeled functionally. A func-
tional model is one in which an output is computed only from the inputs with no
hidden side-effects. State transition functions must therefore accept as input an ini-
tial state plus inputs, all state changes and outputs must be computed relative to the
initial state and the inputs, and the updated state plus outputs must be returned by
the function. By eliminating side-effects, functional models require that all compu-
tations be explicit. This is important in a security context, where it is essential to be
able to account for all system behavior.
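A minimal Standard ML sketch of this functional modelling style (our illustration, with placeholder types) is a step function from an input and a current state to an output and a next state, iterated over successive inputs:

type input  = int
type output = int
type state  = int

(* One step: the output and the next state are computed solely from the
   input and the initial state; there are no hidden side effects.        *)
fun step (i : input, s : state) : output * state = (s, s + i)

(* Iterate step over a list of inputs, returning the final state and the
   outputs produced along the way.                                        *)
fun run (s : state, [])      = (s, [])
  | run (s, i :: rest) =
      let val (out, s')      = step (i, s)
          val (sFinal, outs) = run (s', rest)
      in  (sFinal, out :: outs) end

Single-step properties are stated about step alone, while trace properties are stated about run; the state machine view supports both, as noted above.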
There are many obvious and effective techniques for formally modeling computing
systems. This seems not to be the case for the information processed by comput-
ing systems. This is not to say that there are no useful mathematical models of
1 Because index values are computed, analysis in our model is, in general, undecidable.
2 Note that such metalogical conclusions often disguise pointer aliasing issues: the faulty intuition being that syntactically unique symbols must point to unique address locations.
Every indexing scheme is associated with a specific type of object, though each
type of object may have many indexing schemes. Most common data structures
suggest obvious indexing schemes. The set of field names is a natural choice for
records. Tuples might be well served by a natural number basis that maps to each
tuple position. The contents of a list could be described by their position in the list
and the contents of an association by the keys of the association. Arrays suggest a
very natural indexing scheme: each index value is simply an index into the array.
The fact that array indices are generally computable within the logic highlights the
need to support computation within the calculus. Arguments to functions may also
be indexed, either by name or position, as may their return values.
Sets of index values are important in our formalizations. The PVS theory we use
to model index sets introduces many common set operations as well as short-hand
(infix) notations for set insertion, deletion, and union.
IMPORTING sets_lemmas[index]
END IndexSet
2.1.2 Paths
IndexingExample : THEORY
BEGIN
index: TYPE = { a, b, c }
END IndexingExample
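The state type and the projection function for this example are elided above; one plausible shape, sketched in Standard ML with our own placeholder names, uses the field names of a record as the basis:

datatype index = A | B | C                       (* the basis: field names *)

type value = int
type state = {a : value, b : value, c : value}

(* get projects the portion of state associated with an index. *)
fun get (A, st : state) = #a st
  | get (B, st)         = #b st
  | get (C, st)         = #c st

In this sketch each index projects a distinct field, so the basis enjoys the properties discussed next.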
Just as with basis sets in linear algebra, a variety of bases may be capable of
accurately modeling a given system. However, unlike basis sets in linear algebra,
it is not always possible to translate models expressed in one basis set into an equiv-
alent model expressed in another basis set. Some care is thus needed in the choice
of a basis set to ensure that it is useful for expressing interesting properties about
the system.
Conversely, the choice of basis set will influence the meaning of the information
flow properties expressed in terms of that basis. It is possible, for example, to choose
a degenerate basis set that would render nearly any information flow theorem vac-
uous. To help guard against such deception, it may be useful to consider several
properties that different basis sets might exhibit:
PolyValued. It is possible for the value projected by each index to assume at least
two unique values. This property ensures that the projection functions are not
constant.
Divisive. It is possible for the value projected at two unique index values to vary
independently of each other. This property ensures that no projection function
simply returns the entire state.
Orthogonal. The portion of state associated with each index value is independent
of the portions associated with every other index. A nonorthogonal basis gives
the illusion of separation (between two index values) when, in fact, those index
values overlap and are therefore dependent.
Complete. If the values projected at every index are equal, then the states are
equal. If a basis set is incomplete, then there is some portion of the state that is
not observable using that set, and that portion of the state could potentially be
used as a covert channel.
Here we formalize these and additional concepts as predicates over projection
functions.
BEGIN
gettablevalue?(g: projection,
i: index,
v:value): bool =
EXISTS (st: state): v = g(i,st)
PolyTypeIndex: bool =
(EXISTS (i,j: index): (j /= i))
OrthogonalSet_implies_Injectable: LEMMA
FORALL (g: projection):
OrthogonalSet(g) => Injectable(g)
END Ideal
IndexingExampleProperties: THEORY
BEGIN
IMPORTING IndexingExample
IMPORTING Ideal[state,index,polyvalue]
ExampleIsIdeal: LEMMA
Ideal(get)
END IndexingExampleProperties
Two objects are equivalent modulo an index when the values projected
from the two objects agree at that index. It is straightforward to extend the notion of
equivalence modulo an index to equivalence modulo a set of indices.
IMPORTING IndexSet[index]
END Equiv
The kind of congruence relation we use to model communication says: given two
arbitrary application instances of a specific function (next), the value at a selected
index3 (seg) in the range of the output of the two application instances will be the
same if the values of the input domains (st1,st2) within a set (DIA) of index
locations are the same.
equivSet(DIA,st1,st2) =>
equiv(seg,next(st1),next(st2))
A set of index values (DIA) satisfying this assertion is called the interferes set of
the index (seg), since it contains every index location that might interfere with (or
influence or communicate with) the final value of the selected output. This set could
also be called the use set of the index, since it must contain every input index value
used4 in computing the specified output. Every input index that does not appear in
3 In our original formulation, an index value was referred to as a seg, which is to say, a segment of the state.
4 In this context, the term "used" is potentially too broad, since not every index value referenced in the course of computation needs to be included if they can be shown to be functionally irrelevant. However, the term "required" is perhaps too narrow, as we allow the use set to be an overapproximation. A more precise description would be those index values which have not been shown to be irrelevant.
this set satisfies a “noninterference” property with respect to the selected output:
it is impossible for such input indices to interfere (communicate) with the selected
output.
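The following Standard ML sketch (our illustration, with placeholder types) spells out the equivalence relations involved and the shape of this hypothesis for one concrete pair of states; the congruence itself quantifies over all states:

type index = int
type state = index -> int        (* the projection is function application *)

(* Two states agree at one index. *)
fun equiv (i : index) (st1 : state, st2 : state) = (st1 i = st2 i)

(* Two states agree at every index in a set (here a list) of indices. *)
fun equivSet (dia : index list) (st1 : state, st2 : state) =
  List.all (fn i => equiv i (st1, st2)) dia

(* One instance of the congruence: agreement on the interferes set DIA of
   the selected output index seg implies agreement at seg after next.      *)
fun interferes_instance (dia : index list, seg : index, next : state -> state)
                        (st1 : state, st2 : state) =
  not (equivSet dia (st1, st2)) orelse equiv seg (next st1, next st2)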
The GWV theorem was our earliest formulation of this congruence [7]. The the-
orem was motivated by and specifically targeted toward separation kernels. The
original theorem formulation broke the equivalence relation in the hypothesis into
three components: one similar to the hypothesis shown above, one targeting the cur-
rently active partition, and one that required that the value of seg be equivalent in
the initial state. While the theorem was sufficient for simple, static separation ker-
nels like the AAMP7G, it suffered from some expressive limitations which spurred
development of two more expressive and more general formulations [1, 8].
In the original GWV theorem, DIA (the name was chosen as an acronym for
direct interaction allowed) was expressed as a function of the output index. In that
formulation, next was the transition function of a state machine model, speci-
fying what the system can do in a single step of its operation. The designation
“Direct,” therefore, emphasized the single-step nature of the characterization and
distinguished it from what may take place transitively over multiple steps of the
system.
Origin notwithstanding, this congruence may be used to characterize any func-
tion, not just state transition functions. In subsequent revisions of the theorem, the
computation of the interferes set has become more dynamic. It is now computed in
its full generality as a function of state. For representational convenience, however,
this computation has been decomposed into two steps: the computation of a com-
prehensive information flow model encompassing the behavior of the entire function
and the extraction of a specific interferes set based on the output index being con-
sidered. The comprehensive information flow model is called the information flow
graph. The function that extracts from the graph the interferes set associated with a
specific output index is now referred to as the DIA function.
Our model of graphs in PVS has two types of edges: one that maps an index value
to a set of index values (a Compute edge) and one that maps an index value into
a single index value (a Copy edge). One model of information flow utilizes Copy
edges to model “frame conditions,” locations that remain unchanged following the
execution of a function. Index values associated with Computed edges are locations
in the state that may have in some way been changed during the course of function
execution.
Note that a graph models the information flow of a function and that the type of
the output of a function may differ from the type of its input. This means that, in
general, the type of the index value used to index the graph will differ from the type
of the index value found in the set returned by the graph. The strong typing in PVS
helps to make this explicit. Our GWV Graph theory, therefore, is parameterized by
both the input and the output index types.
IMPORTING IndexSet[INindex]
IMPORTING IndexSet[OUTindex]
IMPORTING GraphEdge[INindex]
The DIA function, defined over graphs, computes the interferes set for a specific
output index by mapping the output index to the set of input index values upon
which it depends. The DIASet function extends the behavior of DIA to apply to sets
of output indices. DIASet is simply the union of the DIA values for each member
of the set.
The overall define set (or defSet) of a function, the set of locations modified
by the function, can be computed from the graph. The same is true of the set of
locations upon which the define set depends, the use set (or useSet). The inverse
DIA function, a function that computes the set of outputs that depend upon a given
input, is also provided as it is useful for expressing certain graph properties.
END GWV_Graph
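A Standard ML sketch of this graph representation and of the DIA/DIASet extraction (our illustration; index sets are represented as lists and the example graph is hypothetical):

datatype 'i edge = Copy of 'i          (* frame condition: value copied    *)
                 | Compute of 'i list  (* value computed from these inputs *)

type ('i, 'o) graph = 'o -> 'i edge

(* The interferes set of a single output index. *)
fun dia (x : 'o, g : ('i, 'o) graph) : 'i list =
  case g x of
      Copy i     => [i]
    | Compute is => is

(* The union of the interferes sets of a set of output indices. *)
fun diaSet (xs : 'o list, g : ('i, 'o) graph) : 'i list =
  List.concat (map (fn x => dia (x, g)) xs)

(* Example: output 0 is computed from inputs 1 and 2; all other locations
   are simply copied from the same input location.                        *)
val exampleGraph : (int, int) graph =
  fn 0 => Compute [1, 2]
   | n => Copy n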
Having formalized graphs in PVS, we are now in a position to formalize our exten-
sions of the original GWV information flow theorem, extensions expressed in terms
of information flow graphs. Our ability to quantify over functions in PVS allows us
to express these extensions in their full generality.
IMPORTING GWV_Graph[INindex,OUTindex]
IMPORTING Equiv[INindex,INState,INvalue,getIN]
AS Input
IMPORTING Equiv[OUTindex,OUTState,OUTvalue,getOUT]
AS Output
5. The representational similarity between models of information flow graphs and models of transition systems is partially responsible for inspiring subsequent work on model checking transitive information flow properties.
GWVr1(Next: StepFunction)
(Hyps: PreCondition,
Graph: GraphFunction): bool =
FORALL (x: OUTindex, in1,in2: INState):
Input.equivSet(DIA(x,Graph(in1)),in1,in2) &
Hyps(in1) & Hyps(in2) =>
Output.equiv(x,Next(in1),Next(in2))
GWVr1Set(Next: StepFunction)
(Hyps: PreCondition,
Graph: GraphFunction): bool =
FORALL (x: set[OUTindex], in1,in2: INState):
Input.equivSet(DIAset(x,Graph(in1)),in1,in2) &
Hyps(in1) & Hyps(in2) =>
Output.equivSet(x,Next(in1),Next(in2))
GWVr1_implies_GWVr1Set: LEMMA
FORALL (Next: StepFunction,
Hyps: PreCondition,
Graph: GraphFunction):
GWVr1(Next)(Hyps,Graph) =>
GWVr1Set(Next)(Hyps,Graph)
END GWVr1
It is interesting to note that there is a witnessing graph satisfying GWVr1 for any
function (assuming a Complete basis set). In particular, it is easy to show that the
graph in which every output depends upon every input characterizes any function,
though such a graph is not a particularly useful characterization of any function.
IMPORTING Ideal[INState,INindex,INvalue]
GWVr1Existence: LEMMA
Complete(getIN) =>
FORALL (F: StepFunction,H: PreCondition):
EXISTS (G: GraphFunction):
GWVr1(F)(H,G)
END GWVr1Existence
For further insight into how GWVr1 works, it is instructive to consider how it can
be applied to some simple examples. Consider a state data structure modeled as
a mapping from natural numbers to integers. We define a projection function for
this state (get), as well as a function that allows us to selectively modify specific
portions of the state (set). With these definitions in hand, we import the GWVr1
theory.
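The definitions themselves are elided; they might look like the following minimal sketch (the argument order of set is an assumption made here for illustration):

state: TYPE = [nat -> int]

get(i: nat, st: state): int = st(i)

set(i: nat, v: int, st: state): state = st WITH [(i) := v]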
GWVr1Test: THEORY
BEGIN
IMPORTING GWVr1[nat,state,int,get,
nat,state,int,get]
addDIA(i:nat,d:nat,g:graph): graph =
(LAMBDA (n:nat):
IF (n = i) THEN
Compute(DIA(i,g) + d)
ELSE
g(n)
ENDIF)
Our first example is a function that modifies two locations of our state data
structure. twoAssignment updates location 0 with the value obtained by read-
ing location 1 and it updates location 2 with the sum of the values from locations
3 and 4.
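The definition of twoAssignment is elided in this excerpt; using the get and set sketch above, it might read (illustrative only):

twoAssignment(st: state): state =
  set(0, get(1,st), set(2, get(3,st) + get(4,st), st))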
AUTO_REWRITE+ twoAssignment!
twoAssignmentGraph(st:state): graph =
addDIA(0,1,addDIA(2,3,addDIA(2,4,ID)))
AUTO_REWRITE+ twoAssignmentGraph!
Assuming that we have covered all of the cases, we should now be able to prove
our GWVr1 theorem for twoAssignment. And, in fact, we can.
twoAssignmentGWVr1: LEMMA
GWVr1(twoAssignment)(Hyps,twoAssignmentGraph)
But what if we had not covered all of the cases? Would the theorem catch our
mistakes? Consider what would happen if we failed to account for one of the writes
performed by twoAssignment. The following graph fails to account for the write to
location 2. Consider what happens when we try to prove that this new graph still
characterizes our original function. Note that we employ the same proof tactics.
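A plausible reconstruction of the elided graph, which simply drops both Compute edges for location 2 (illustrative only):

missedUpdateGraph(st:state): graph =
  addDIA(0,1,ID)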
AUTO_REWRITE+ missedUpdateGraph!
missedUpdateGWVr1Fails: LEMMA
GWVr1(twoAssignment)(Hyps,missedUpdateGraph)
In this case, the proof fails. The failed subgoal is included below. Note that the
remaining proof obligation is to show that the sum of locations 3 and 4 for the two
different instances is the same. However, the only hypothesis we have is that the
instances agree at location 2 (x!1 = 2). This failed proof typifies proof attempts for
graphs that do not account for all state updates.
missedUpdateGWVr1Fails :
{-1} (2 = x!1)
{-2} (in1!1(x!1) = in2!1(x!1))
{-3} Hyps(in1!1)
{-4} Hyps(in2!1)
|-------
{1} (in1!1(3) + in1!1(4) = in2!1(3) + in2!1(4))
{2} (0 = x!1)
{3} (x!1 = 0)
Rule?
Now consider what happens if we fail to appropriately account for all uses of
a location. The following graph fails to account for the use of location 4 in the
computation of location 2. Again we employ the same basic proof strategy.
missedUseGraph(st:state): graph =
addDIA(0,1,addDIA(2,3,ID))
AUTO_REWRITE+ missedUseGraph!
missedUseGWVr1Fails: LEMMA
GWVr1(twoAssignment)(Hyps,missedUseGraph)
Again the proof fails. In the failed subgoal for this proof, we see that we know
that locations 2 and 3 are the same. However, in order to complete the proof we need
to know that location 4 is also the same in both instances. This failed proof typifies
proof attempts for graphs that do not account for all uses of state locations.
missedUseGWVr1Fails :
{-1} (2 = x!1)
{-2} (x!1 = 2)
{-3} (in1!1(3) = in2!1(3))
{-4} (in1!1(2) = in2!1(2))
{-5} Hyps(in1!1)
{-6} Hyps(in2!1)
|-------
{1} (in1!1(3) + in1!1(4) = in2!1(3) + in2!1(4))
{2} (0 = x!1)
{3} (x!1 = 0)
Rule?
The following graph accurately reflects a precise information flow model of the
function conditionalUpdates. Note that location 4 depends upon location 7
only if the test succeeds. However, it depends upon location 0 regardless of the
outcome of the test. The GWVr1 proof for this graph succeeds.
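The definitions of conditionalUpdates and its graph are elided here; a sketch consistent with the description and with the failed subgoal shown further below (the particular locations and the test get(0,st) = 3 are taken from that subgoal; everything else is an illustrative reconstruction) is:

conditionalUpdates(st: state): state =
  IF get(0,st) = 3 THEN set(4, get(7,st), st) ELSE st ENDIF

conditionalUpdatesGraph(st: state): graph =
  (LAMBDA (n: nat):
    IF n = 4 THEN
      IF get(0,st) = 3
        THEN Compute(add(0, singleton(7)))  % depends on the guard (0) and on 7
        ELSE Compute(add(0, singleton(4)))  % depends on the guard (0) and on old 4
      ENDIF
    ELSE ID(n)
    ENDIF)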
AUTO_REWRITE+ conditionalUpdatesGraph!
conditionalUpdatesGWVr1: LEMMA
GWVr1(conditionalUpdates)
(Hyps,conditionalUpdatesGraph)
If, however, the conditional dependency is omitted from one (or both) of the
branches, the proof fails.
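A sketch of such a graph (illustrative reconstruction) simply drops the dependency on the condition variable, location 0:

missedConditionalGraph(st: state): graph =
  (LAMBDA (n: nat):
    IF n = 4
      THEN Compute(add(4, singleton(7)))  % dependency on the guard (location 0) is missing
      ELSE ID(n)
    ENDIF)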
AUTO_REWRITE+ missedConditionalGraph!
missedConditionalGWVr1Fails: LEMMA
GWVr1(conditionalUpdates)
(Hyps,missedConditionalGraph)
In the failed subgoal below, we are trying to prove that location 4 is the
same in both instances. Observe, however, that the conditional expression
get(0,st) = 3 has produced different outcomes for the two different input
instances ({2} and {1}). This is possible because location 0 is not known to be the
same in those instances. This failed proof typifies proof attempts for graphs that do
not accurately account for the dependencies of conditional assignments.
missedConditionalGWVr1Fails.1 :
{-1} (4 = x!1)
{-2} in1!1(0) = 3
{-3} (x!1 = 4)
{-4} (in1!1(7) = in2!1(7))
{-5} (in1!1(4) = in2!1(4))
{-6} Hyps(in1!1)
{-7} Hyps(in2!1)
|-------
{1} in2!1(0) = 3
{2} (in1!1(7) = in2!1(x!1))
Rule?
We have shown how graphs can be used to model the information flow properties of
individual functions. We now explore the transitive information flow relationships
that result from function composition. That is to say, if A depends upon B in a
first function and B depends upon C in a second function, it should be possible
to conclude that A depends upon C when those two functions are combined. The
combination of two functions is called function composition. The combination of
two graphs is called graph composition. Graph composition, when properly defined,
models function composition. That is to say, given two graphs characterizing the
information flow of two functions, the composition of the graphs is a model of the
information flow of the composition of the two functions.
In PVS, we denote graph composition using the infix “o” operator and we define it
as follows. We also provide the function GraphComposition for use when the
infix operator may be ambiguous.
GraphComposition[index1,index2,index3: TYPE]:
THEORY BEGIN
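% A sketch of the elided body (illustrative reconstruction): the dependencies
% of an output index in the composed graph are obtained by tracing its
% interferes set in the left graph back through the right graph. For
% simplicity this sketch treats every composed edge as a Compute edge; the
% actual definition also propagates Copy edges.
IMPORTING GWV_Graph[index1,index2] AS G12
IMPORTING GWV_Graph[index2,index3] AS G23
IMPORTING GWV_Graph[index1,index3] AS G13

o(g23: G23.graph, g12: G12.graph): G13.graph =
  LAMBDA (i: index3): Compute(G12.DIAset(G23.DIA(i,g23), g12))

% GraphComposition(g23,g12) is provided as a prefix alias for the infix
% operator; its definition is omitted from this sketch.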
END GraphComposition
Note that, in order to compose two graphs, the input index type of the left graph
must agree with the output index type of the right graph. Graph composition is not
commutative, but it is associative.
GraphCompositionProperties[index1,index2,index3,
index4: TYPE]:
THEORY BEGIN
IMPORTING GraphComposition[index1,index2,index3]
IMPORTING GraphComposition[index2,index3,index4]
IMPORTING GraphComposition[index1,index2,index4]
IMPORTING GraphComposition[index1,index3,index4]
compose_is_associative: LEMMA
FORALL (g12: graph12,
g23: graph23,
g34: graph34):
((g34 o g23) o g12) =
(g34 o (g23 o g12))
When the input and output index types are the same, we can define an identity
graph. The identity graph maps each index into its own singleton set. The DIASet
of any set, when applied to an identity graph, is the original set. Likewise, any graph
composed with the identity graph (either from the left or from the right) remains
unchanged.
IMPORTING GWV_Graph[Index,Index]
IMPORTING GraphComposition[Index,Index,Index]
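% Sketch of the elided definition: the identity graph maps each index to a
% Copy edge of itself, so its DIA at any index is that index's singleton set.
ID: graph = LAMBDA (i: Index): Copy(i)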
DIAset_ID: LEMMA
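% sketch of the elided statement
FORALL (s: set[Index]): DIAset(s,ID) = s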
compose_ID_1: LEMMA
FORALL (g: graph):
ID o g = g
compose_ID_2: LEMMA
FORALL (g: graph):
g o ID = g
END GraphID
GWVr1_Composition[Index1,Index2,Index3,
State1,State2,State3,
Value1,Value2,Value3: TYPE,
Get1: [[Index1,State1]->Value1],
Get2: [[Index2,State2]->Value2],
Get3: [[Index3,State3]->Value3]]:
THEORY BEGIN
IMPORTING GWVr1[Index1,State1,Value1,Get1,
Index2,State2,Value2,Get2] AS P12
IMPORTING GWVr1[Index2,State2,Value2,Get2,
Index3,State3,Value3,Get3] AS P23
IMPORTING GWVr1[Index1,State1,Value1,Get1,
Index3,State3,Value3,Get3] AS P13
IMPORTING GraphComposition[Index1,Index2,Index3]
AS G
IMPORTING Equiv[Index1,State1,Value1,Get1] AS ST1
IMPORTING Equiv[Index2,State2,Value2,Get2] AS ST2
IMPORTING Equiv[Index3,State3,Value3,Get3] AS ST3
IMPORTING function_props[State1,State2,State3] AS F
GWVr1_Composition_Theorem: LEMMA
FORALL (Next12 : P12.StepFunction,
Graph12 : P12.GraphFunction,
Hyp1 : P12.PreCondition,
Next23 : P23.StepFunction,
Graph23 : P23.GraphFunction,
Hyp2 : P23.PreCondition):
(P12.GWVr1(Next12)(Hyp1,Graph12) AND
P23.GWVr1(Next23)(Hyp2,Graph23)) =>
P13.GWVr1 (Next23 o Next12)
((lambda (in1: State1):
(Hyp2(Next12(in1)) & Hyp1(in1))),
(lambda (in1: State1):
(GraphComposition((Graph23 o Next12)(in1),
Graph12(in1)))))
END GWVr1_Composition
Given a concrete graph describing the information flow of a system, it may be pos-
sible to construct a more abstract graph of the same system containing less detail
that preserves the essential information flow properties of the system. A graph ab-
straction associates groups of concrete index values with abstract index names. Such
abstractions can substantially reduce the complexity of a graph, especially when it
serves to partition a large basis set into a small number of security domains.
An abstraction has three components: a lifting graph that transforms concrete
input indices into abstract input indices, a lifting graph that transforms concrete
output indices into abstract output indices, and an abstract graph that models the ab-
stract information flow relation between abstract input and output indices. We
say that an abstraction is conservative if the interferes set of each concrete in-
dex is preserved (or extended) by the abstraction. With a conservative abstraction,
noninterference questions about concrete index values can be translated into nonin-
terference questions about the abstract index values. Because abstract graphs may
be more concise than concrete graphs, such questions may be easier to answer in
the abstract domain.
GraphAbstractionProperty[CiIndex,CoIndex,AiIndex,AoIndex: TYPE]:
THEORY BEGIN
IMPORTING GWV_Graph[CiIndex,CoIndex] AS C
IMPORTING GWV_Graph[CiIndex,AiIndex] AS Li
IMPORTING GWV_Graph[CoIndex,AoIndex] AS Lo
IMPORTING GWV_Graph[AiIndex,AoIndex] AS A
IMPORTING GraphComposition[CiIndex,CoIndex,AoIndex]
IMPORTING GraphComposition[CiIndex,AiIndex,AoIndex]
IMPORTING IndexSet[CiIndex]
IMPORTING IndexSet[CoIndex]
IMPORTING IndexSet[AiIndex]
ConservativeAbstraction(GCStep: C.graph,
GLi : Li.graph,
GLo : Lo.graph)
(AStep : A.graph): bool =
FORALL (Ao: AoIndex, Ci: CiIndex):
member(Ci,DIA(Ao,GLo o GCStep)) =>
member(Ci,DIA(Ao,AStep o GLi))
AbstractNonInterference: LEMMA
FORALL (Ci : CiIndex,
Co : CoIndex,
Ao : AoIndex,
GCStep: C.graph,
GLi : Li.graph,
GLo : Lo.graph,
AStep : A.graph):
(ConservativeAbstraction(GCStep,GLi,GLo)(AStep) &
disjoint(invDIA(Ci,GLi),DIA(Ao,AStep)) &
member(Co,DIA(Ao,GLo)))
=>
not(member(Ci,DIA(Co,GCStep)))
END GraphAbstractionProperty
IMPORTING GraphAbstractionProperty[T1,T2,T4,T5]
IMPORTING GraphAbstractionProperty[T2,T3,T6,T7]
IMPORTING GraphAbstractionProperty[T1,T3,T4,T7]
IMPORTING GraphAbstractionProperty[T2,T2,T5,T6]
IMPORTING GWV_Graph[T5,T6]
IMPORTING GraphComposition[T1,T2,T3] AS P13
IMPORTING GraphComposition[T4,T5,T6] AS P46
IMPORTING GraphComposition[T4,T6,T7] AS P47
IMPORTING GraphID[T2]
IMPORTING GraphID2[T1,T2]
Composition: LEMMA
FORALL (S12: graph[T1,T2],
L14: graph[T1,T4],
L25: graph[T2,T5],
A45: graph[T4,T5],
S23: graph[T2,T3],
L26: graph[T2,T6],
L37: graph[T3,T7],
A67: graph[T6,T7],
B56: graph[T5,T6]):
ConservativeAbstraction(S12,L14,L25)(A45) &
ConservativeAbstraction(I22,L25,L26)(B56) &
ConservativeAbstraction(S23,L26,L37)(A67) =>
ConservativeAbstraction(P13.o(S23,S12),L14,L37)
(P47.o(A67,(P46.o(B56,A45))))
END AbstractGraphComposition
While GWVr1 has been shown to be effective at modeling information flow proper-
ties of functions, there are examples of functions for which GWVr1 requires the use
of an information flow graph that seems to overapproximate the information flow of
the function.
Recall our illustrative state data structure modeled as a mapping from natural
numbers to integers, the associated get projection function, and the set function
that allows us to selectively modify specific portions of the state.
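The definition of guardedDomain is elided; the sketch below illustrates the shape of such a function and of its naïve, state-dependent graph (the particular guard at location 0 and the update of location 1 from location 2 are assumptions made purely for illustration):

guardedDomain(st: state): state =
  IF get(0,st) = 0 THEN st
  ELSE set(1, get(2,st), st)
  ENDIF

guardedDomainGraph(st: state): graph =
  IF get(0,st) = 0 THEN ID
  ELSE addDIA(1,0,addDIA(1,2,ID))
  ENDIF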
AUTO_REWRITE+ guardedDomain!
AUTO_REWRITE+ guardedDomainGraph!
guardedDomainGWVr1Fails: LEMMA
GWVr1(guardedDomain)(Hyps,guardedDomainGraph)
guardedDomainGWVr1Fails.2 :
Rule?
There is a graph that characterizes this function. However, in this graph every
index location depends upon location 0.
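Continuing the illustrative sketch above, such a graph might be written as follows; every location carries a Compute edge that includes the guard:

overkillGraph(st: state): graph =
  (LAMBDA (n: nat):
    IF n = 1 THEN Compute(add(0, add(1, singleton(2))))
    ELSE Compute(add(0, singleton(n)))
    ENDIF)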
AUTO_REWRITE+ overkillGraph!
overkillGWVr1: LEMMA
GWVr1(guardedDomain)(Hyps,overkillGraph)
We call this graph overkillGraph, because it seems like overkill for every
index to have to depend upon this condition. Domain guards turn out to be very
common in dynamic systems. For example, null pointer checks on heap resident
data structures are one kind of domain guard. If every part of the state of a dynamic
system must depend upon every domain guard in the system, pretty soon every-
thing depends upon everything and the information flow graph becomes useless as
a specification. Our efforts to model the Green Hills INTEGRITY-178B operat-
ing system, with its many heap resident (dynamic) data structures and null pointer
checks, brought this issue to a head [10]. It was precisely that modeling experience
that motivated our development of GWVr2.
3 GWVr2
ASSUMING
IMPORTING Ideal[OUTState,OUTindex,OUTvalue]
copy_right: ASSUMPTION
FORALL (i: INindex, o: OUTindex, ival: INvalue):
gettablevalue(getOUT,o,copy(i,o)(ival))
OrthogonalSet_And_Complete: ASSUMPTION
OrthogonalSet(getOUT) & Complete(getOUT)
ENDASSUMING
IMPORTING GWV_Graph[INindex,OUTindex]
IMPORTING Equiv[INindex,INState,INvalue,getIN]
AS Input
IMPORTING Equiv[OUTindex,OUTState,OUTvalue,getOUT]
AS Output
The construction of the new state object is complicated by the fact that we want
GWVr2 to be as strong as GWVr1 under appropriate conditions. This leads us to
choose (via epsilon, representing an axiom of choice) a value for the new state that
is, in a sense, the one most likely to cause our efforts to fail. We call this condition
the “bad boy” condition and the resulting state the “bad state” (st_bad). If, however,
despite the odds, we succeed in proving equivalence using this malicious state, it
ensures that we will be able to prove a similar equivalence with any other state. This
claim is reminiscent of GWVr1, which gives us some hope that GWVr2 will at least
be similar to GWVr1.
The process begins by defining appropriate types and predicates and articulating
exactly how we would recognize a bad boy state: the result of applying Next to that
state would differ at the index of interest from an application of Next to our original
state. We then use epsilon to choose our malicious state: st_bad.
use_equiv(Hyp: PreCondition,
u: set[INindex],
st1: PreState(Hyp))
(stx: INState): bool =
equivSet(u,st1,stx) & Hyp(stx)
use_equiv_state(Hyp: PreCondition,
u: set[INindex],
st1: PreState(Hyp)): TYPE =
{ s: INState | use_equiv(Hyp,u,st1)(s) }
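The bad-boy predicate and the epsilon-based choice are elided; they might be sketched as follows (the name bad_boy? and the use of the prelude epsilon are illustrative reconstructions based on the description above):

bad_boy?(Hyp: PreCondition, i: OUTindex, Next: StepFunction,
         u: set[INindex], st1: PreState(Hyp))
        (stx: use_equiv_state(Hyp,u,st1)): bool =
  NOT equiv(i, Next(st1), Next(stx))

st_bad(Hyp: PreCondition, i: OUTindex, Next: StepFunction,
       u: set[INindex], st1: PreState(Hyp)): use_equiv_state(Hyp,u,st1) =
  epsilon(bad_boy?(Hyp,i,Next,u,st1))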
st_bad, because of its base type, has several useful properties. It satisfies the
Hyp precondition and it is equivalent to its st1 argument at every index location in
the set u. If possible, it would also satisfy the bad boy predicate. However, if that is
not possible, we get a very nice property: one that mirrors GWVr1.
st_bad_next_equiv: LEMMA
FORALL (Hyp: PreCondition, i: OUTindex,
Next: StepFunction, u: set[INindex],
st1: INState):
Hyp(st1) &
equiv(i,Next(st1),
Next(st_bad(Hyp,i,Next,u,st1))) =>
FORALL (st2: INState):
equivSet(u,st1,st2) & Hyp(st1) & Hyp(st2) =>
equiv(i,Next(st1),Next(st2))
pstate: TYPE =
[ i:OUTindex -> gettablevalue(getOUT,i) ]
The function NextGpst interprets the graph to construct a pstate output. For
a computed index, the result is the projection at that index of the Next function
applied to a state that is equivalent (via st_bad) to the input state st at every index
location dictated by DIA(i,G). For a copied index, on the other hand, the result is
a direct copy of that input. The function NextG simply converts NextGpst back
into a state object.
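The elided definitions might be sketched roughly as follows; the helper pstate2state, which rebuilds an OUTState from a pstate (relying on the Complete basis assumption), is hypothetical:

NextGpst(Hyp: PreCondition, Next: StepFunction)
        (G: graph, st: PreState(Hyp)): pstate =
  LAMBDA (i: OUTindex):
    IF member(i, defSet(G))
      THEN getOUT(i, Next(st_bad(Hyp, i, Next, DIA(i,G), st)))
      ELSE copy(CopyIndex(G(i)), i)(getIN(CopyIndex(G(i)), st))
    ENDIF

NextG(Hyp: PreCondition, Next: StepFunction)
     (G: graph, st: PreState(Hyp)): OUTState =
  pstate2state(NextGpst(Hyp,Next)(G,st))  % pstate2state: hypothetical converter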
GWVr2 is expressed simply as equality between Next and NextG under the ap-
propriate preconditions.
GWVr2(Next: StepFunction)
(Hyp: PreCondition,
Graph: GraphFunction): bool =
FORALL (st: INState):
Hyp(st) =>
Next(st) = NextG(Hyp,Next)(Graph(st),st)
FramedGWVr1 [
INindex: TYPE, INState: TYPE+, INvalue: TYPE,
getIN: [[INindex, INState] -> INvalue],
OUTindex, OUTState, OUTvalue: TYPE,
getOUT: [[OUTindex, OUTState] -> OUTvalue],
copy:[[INindex,OUTindex] -> [INvalue -> OUTvalue]]]:
THEORY BEGIN
IMPORTING GWVr1[INindex,INState,INvalue,getIN,
OUTindex,OUTState,OUTvalue,getOUT]
For Copy edges, GWVr2 reduces to a single state theorem about how the func-
tion being characterized is equivalent to a function that copies an input value to
the output. We call this theorem the frame condition (FrameCondition). The
frame condition provides a strong functional theorem about the behavior of Next
at copied index values. Typically copy is defined as the identity function (the first
two parameters to copy are included only to satisfy type reasoning). In that case,
FrameCondition says that Next leaves the state completely unchanged at copied
index locations. We will see in subsequent sections that this strong functional theo-
rem, which is not available with GWVr1, provides a convenient means of expressing
noninterference theorems. The defSet of a graph is composed entirely of computed
index values. The nonmembership of an index value in the defSet of a graph is
equivalent to that index being a Copy edge in the graph.
FrameCondition(Next: StepFunction)
(Hyps: PreCondition,
Graph: GraphFunction): bool =
FORALL (x: OUTindex, st: INState):
not(member(x,defSet(Graph(st)))) & Hyps(st) =>
getOUT(x,Next(st)) =
copy(CopyIndex(Graph(st)(x)),x)
(getIN(CopyIndex(Graph(st)(x)),st))
For Compute edges, GWVr2 reduces to GWVr1 (due, in part, to the virtues
of the carefully chosen st bad). We call a GWVr1 theorem that is condi-
tional on the graph edge associated with the output index a framed congruence
(FramedGWVr1).
FramedGWVr1(Next: StepFunction)
(Hyps: PreCondition,
Graph: GraphFunction): bool =
FORALL (x: OUTindex, in1,in2: INState):
equivSet(DIA(x,Graph(in1)),in1,in2) &
Hyps(in1) & Hyps(in2) &
member(x,defSet(Graph(in1))) =>
equiv(x,Next(in1),Next(in2))
GWVr2_reduction: LEMMA
FORALL (Next: StepFunction,
Hyp: PreCondition,
Graph: GraphFunction):
GWVr2(Next)(Hyp,Graph) =
(FramedGWVr1(Next)(Hyp,Graph) &
FrameCondition(Next)(Hyp,Graph))
The development of GWVr2 was motivated in part by the need for a better method
for modeling systems containing dynamic domain guards. Recall our definition of
guardedDomain.
AUTO_REWRITE+ guardedDomain!
And recall the naïve, type-safe graph that failed under GWVr1.
AUTO_REWRITE+ guardedDomainGraph!
The proof that failed under GWVr1 succeeds under FramedGWVr1, for an ap-
propriate definition of copy.
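For this example a suitable copy is simply the identity on values, with the index parameters ignored (a sketch):

copy(i: nat, o: nat)(v: int): int = v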
IMPORTING FramedGWVr1[nat,state,int,get,
nat,state,int,get,
copy]
guardedDomainFramedGWVr1: LEMMA
FramedGWVr1(guardedDomain)(Hyps,guardedDomainGraph)
guardedDomainFrameCondition: LEMMA
FrameCondition(guardedDomain)
(Hyps,guardedDomainGraph)
%|- (then
%|- (auto-rewrite "FrameCondition")
%|- (auto-rewrite! "DIA" "defSet")
%|- (auto-rewrite! "addDIA" "defSet" "get" "set")
%|- (auto-rewrite "ID")
%|- (auto-rewrite "member")
%|- (auto-rewrite "equiv")
%|- (auto-rewrite "copy")
%|- (auto-rewrite-theory
%|- "EquivSetRules[nat,state,int,get]")
%|- (apply (repeat* (then*
%|- (lift-if) (ground) (skosimp)))))
%|- QED
Our MultiCycle theory makes several assumptions: that the graph characterizes the
step function, that the precondition is independent of the inputs, that the basis set is
Orthogonal and Complete, and that the graph is reactive, which is to say that any
input contained in the graph depends upon itself.
ASSUMING
NextCharacterization: ASSUMPTION
GWVr1(Next)(Hyp,Graph)
Next_PostCondition: ASSUMPTION
FORALL (s: state):
Hyp(s) =>
Hyp(Next(s))
HypCongruence: ASSUMPTION
FORALL (s1,s2: state):
equivSet(not(InputSet),s1,s2) =>
Hyp(s1) = Hyp(s2)
IMPORTING Ideal[state,index,value]
OrthogonalSet_get: ASSUMPTION
OrthogonalSet(get)
Complete_get: ASSUMPTION
Complete(get)
reactive_Graph: ASSUMPTION
FORALL (i: index):
member(i,InputSet) =>
FORALL (st: state):
member(i,DIA(i,Graph(st)))
ENDASSUMING
get_applyInputs: LEMMA
FORALL (i: index, input,st: state):
get(i,applyInputs(input,st)) =
IF member(i,InputSet) THEN
get(i,input)
ELSE
get(i,st)
ENDIF
We are now in a position to describe the functional behavior of our state machine
over time. Time is modeled as a natural number and a trace is defined as a mapping
from time to states. The first argument to our state machine model is the time up
to which it is to run, beginning at time zero. The second input to our model is a
trace (the oracle) containing the inputs that are to be consumed by the machine at
each step. Additionally, the oracle at time zero is interpreted as the initial state of
the system. In every step, the machine applies the inputs at the current time to the
result of applying a single step (Next) to the state computed in the previous time.
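The definition itself is elided; it might be sketched as follows (returning the oracle's initial state at time zero is an assumption consistent with the description above):

Run(t: time)(oracle: trace): RECURSIVE state =
  IF t = 0 THEN oracle(0)
  ELSE applyInputs(oracle(t), Next(Run(t-1)(oracle)))
  ENDIF
MEASURE t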
Observe that our recursive model of sequential execution can be viewed as a se-
quence of compositions of the state transition function with the inputs and itself. Not
surprisingly, the information flow of our sequential system can also be modeled as
a sequence of compositions of the graph that characterizes the state transition func-
tion with itself. In other words, the information flow model of a sequential system
is itself a sequential system.
IMPORTING GraphID[index]
It is relatively easy to prove, given the fact that graph composition is a model of
function composition, that such a state machine model does, in fact, track informa-
tion flow in a sequential system. However, the development of the proof requires
that we extend our calculus of indices to take into account the fact that our inputs
are applied fresh in every cycle. There are a variety of ways to model this poten-
tially unbounded collection of inputs. We could, for example, extend our basis set
to construct a unique index value for each input at each step in time. The technique
we choose, however, is to extend our interpretation of what an input is. Rather than
an input being a single value, we view each input as a sequence of values that may
vary over time. We call such mappings from time to values signals. Employing this
concept we define a new projection function, sigget, that projects input index values
from the input oracle into signals that may vary over time. For representational con-
venience, the projection function also projects index values that are not inputs into
signals that assume the value of the index in the initial state (the oracle at time 0) for
all time. Employing this new projection function, we define sigEquiv and sigSetE-
quiv as equivalence relations between two different oracle traces.
These new definitions, in conjunction with selected theory assumptions and their
consequences, allow us to prove that GraphRun characterizes Run (in a GWVr1
sense) for all time.
RunGWVr1: LEMMA
FORALL (t: time):
FORALL (i: index,oracle1,oracle2: trace):
Hyp(oracle1(0)) & Hyp(oracle2(0)) &
sigSetEquiv(DIA(i,GraphRun(t)(oracle1)),
oracle1,oracle2) =>
equiv(i,Run(t)(oracle1),Run(t)(oracle2))
5 Classical Noninterference
Rushby goes on to claim that the requisite lack of perception exists if the behavior
of v remains unchanged, even after purging from the system trace of all the actions
performed by u.
Terminology aside, these formalisms are similar in that they both speak of secu-
rity domains (or users) that are selectively empowered with a set of computational
capabilities (commands or actions). Notions such as actions, commands, and users,
while convenient for expressing certain policy statements, are nearly orthogonal to
the essential concept of noninterference found in both formulations: that of not per-
ceiving (or not seeing) some “effect” and the use of equality between some projected
portion of state as a litmus test of that fact.
We view this essential concept of noninterference as an information flow prop-
erty. Furthermore, we require that higher level concepts (such as security domains
and capabilities) be given formal information flow models before noninterference
properties expressed in terms of those concepts can be verified.
5.1 Domains
5.2 Capabilities
Consider a system containing two domains: a Red domain and a Black domain.
Informally, the Red domain is noninterfering with the Black domain if no action
of the Red domain can ever be perceived by the Black domain. We express this
property as a theorem about the sequential behavior of the system after an arbitrary
number of steps. In particular, we want to show that the value of each element of
the Black domain is the same whether we start from an arbitrary state or from that
arbitrary state modified by some capability of the Red domain.
6. Such functions have also been called crawlers [4, 6].
Our noninterference example extends the MultiCycle theory presented previ-
ously. Our single-step model function, Next, is employed as a generic model of
the application of one or more system capabilities. Run, therefore, models an ar-
bitrary evolution of our system over some amount of time and GraphRun models
the information flow of the overall system during that time. One system capability
is explicitly identified, however, and we call it RedCapability. Associated with that
capability is a graph, RedGraph, that characterizes the information flow behavior of
RedCapability. We assume that the system state can be partitioned into some num-
ber of security domains, including a Red domain (RedDomain) and a Black domain
(BlackDomain).
The type DomainFunction is provided to help identify domain functions, being
defined as a function that maps a state into a set of index values. We define a copy
function and use it to import the FramedGWVr1 theory. We also define a function
that constructs an input oracle trace from a trace and an initial state.
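The copy function is presumably the identity on values, as in the earlier example. The Oracle constructor is elided; a sketch consistent with the convention that the oracle at time zero carries the initial state is:

Oracle(inp: trace, st: state): trace =
  LAMBDA (t: time): IF t = 0 THEN st ELSE inp(t) ENDIF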
Predicates are introduced that become obligations on the various functions ap-
pearing in our example. The trueCopies predicate states that the index values stored
in the copy edges of a graph function really are copies of the index value used to
index the graph. The PostCondition predicate says that the precondition is invariant
over the step function. The RedGraphRestriction predicate restricts the defSet of
the RedGraph to be a subset of the RedDomain (ensuring no writes outside of the
bounds of the Red domain by the Red capability). The final predicate, the NonInter-
ferenceProperty, is a property of the system, GraphRun, that says that no member
of the Red domain ever appears in the DIA (interferes set) of any member of the
Black domain.
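The definition of trueCopies is elided; it might be sketched as follows (illustrative reconstruction):

trueCopies(Graph: X.GraphFunction): bool =
  FORALL (x: index, st: state):
    NOT member(x, defSet(Graph(st))) =>
      CopyIndex(Graph(st)(x)) = x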
PostCondition(Hyp: X.PreCondition,
Next: X.StepFunction): bool =
FORALL (s: state):
Hyp(s) => Hyp(Next(s))
RedGraphRestriction
(RedGraph: X.GraphFunction,
RedDomain: DomainFunction) : bool =
FORALL (st: state):
subset?(defSet(RedGraph(st)),RedDomain(st))
NonInterferenceProperty
(RedDomain,BlackDomain: DomainFunction): bool =
FORALL (blk,red: index,
in: trace,
st: state):
Hyp(st) &
member(blk,BlackDomain(st)) &
member(red,RedDomain(st)) =>
FORALL (t: time):
not(member(red,
DIA(blk,GraphRun(t)(Oracle(in,st)))))
Employing these definitions, our example noninterference theorem says that, for
every RedCapability, RedGraph, RedDomain, and BlackDomain, if the RedGraph
contains trueCopies, the RedCapability satisfies the PostCondition, the RedCapabil-
ity satisfies the FrameCondition (half of GWVr2) suggested by the RedGraph under
Hyp, the RedGraph satisfies RedGraphRestriction relative to the RedDomain, and
the system GraphRun satisfies the NonInterferenceProperty with respect to the Red-
Domain and BlackDomain, then the value extracted by get at every member of the
BlackDomain after running t cycles following the execution of the RedCapability
will be the same as the value at that index after running t cycles without execut-
ing the RedCapability. The proof of this theorem ultimately appeals to the fact that
GraphRun characterizes Run.
NonInterferenceTheorem: LEMMA
FORALL
(RedCapability : X.StepFunction,
RedGraph : X.GraphFunction,
RedDomain : DomainFunction,
BlackDomain : DomainFunction
):
trueCopies(RedGraph) &
PostCondition(Hyp,RedCapability) &
FrameCondition(RedCapability)(Hyp,RedGraph) &
RedGraphRestriction(RedGraph,RedDomain) &
NonInterferenceProperty(RedDomain,BlackDomain)
=>
FORALL (t: time,
blk: index,
in: trace,
st: state):
Hyp(st) &
member(blk,BlackDomain(st)) =>
get(blk,Run(t)(Oracle(in,RedCapability(st)))) =
get(blk,Run(t)(Oracle(in,st)))
It is worth noting that the proof of this particular theorem is made possible by the
strong functional property provided by the GWVr2 frame condition. In particular,
the frame condition allows us to reduce RedCapability to a copy (no op) when we
examine index values outside of its defSet.
The trueCopies, PostCondition, FrameCondition, and RedGraphRestriction
properties appearing in this theorem are all obligations that can be dispatched
locally. That is, they can be established as properties of RedCapability, RedGraph,
RedDomain, and BlackDomain without knowledge of the rest of the system. The
NonInterferenceProperty, on the other hand, is a property of the entire system infor-
mation flow graph (GraphRun). It is a system-wide property and it is the true heart
of the noninterference theorem. Goguen and Meseguer claim that noninterference is
useful as a system security policy. We claim that information flow graphs are useful
as a system specification. Our example noninterference theorem demonstrates how
a noninterference policy can be established from a property (NonInterferenceProp-
erty) of a graphical system specification.
simple_trace_prop(Hyp,Prop: GSPred,
Step?: GSRelation): bool =
(FORALL (trace: GSTrace(Step?)):
Hyp(trace(0)) =>
(FORALL (n: nat): Prop(trace(n))))
IMPORTING LTL[GState]
simple_LTL_prop(Hyp,Prop: GSPred,
Step?: GSRelation): bool =
FORALL (s: GSTrace(Step?)):
(s |= (Holds(Hyp) =>
G(Holds(Prop))))
trace_to_LTL: LEMMA
FORALL (Hyp,Prop: GSPred,
Step?: GSRelation):
simple_LTL_prop(Hyp,Prop,Step?) =>
simple_trace_prop(Hyp,Prop,Step?)
Alternatively, one could express our simple property using MU calculus. This
formulation is provably equivalent to our LTL formulation. For properties expressed
in the MU calculus, PVS provides built-in model checking capabilities.
simple_MU_prop(Hyp,Prop: GSPred,
Step: GSStep): bool =
(FORALL (gs: GState):
Hyp(gs) =>
AG(relation(Step),Prop)(gs))
LTL_to_MU: LEMMA
FORALL (Hyp,Prop: GSPred,
Step: GSStep,
Step?: StepGSRelation(Step)):
simple_LTL_prop(Hyp,Prop,Step?) =
simple_MU_prop(Hyp,Prop,Step)
END TraceMU
Recall the NonInterferenceProperty introduced above:
NonInterferenceProperty
(RedDomain,BlackDomain: DomainFunction): bool =
FORALL (blk,red: index,
in: trace,
st: state):
Hyp(st) &
member(blk,BlackDomain(st)) &
member(red,RedDomain(st)) =>
FORALL (t: time):
not(member(red,
DIA(blk,GraphRun(t)(Oracle(in,st)))))
We define a compound type that contains both our system state and our graph
state. Using this we define a relation, StepRelation, that constrains our trace to re-
flect the evolution of our system state machine and system information flow graph.
A predicate, GSHyp0, is defined to reflect our preconditions on the initial system
state and the initial graph state.
noninterference_hyp(RedDomain: DomainFunction,
BlackDomain:DomainFunction,
red,blk: index)
(gs: GState): bool =
GSHyp0(gs) &
member(blk,BlackDomain(state(gs))) &
member(red,RedDomain(state(gs)))
noninterference_prop(red,blk: index)
(gs: GState): bool =
not(member(red,DIA(blk,graph(gs))))
TraceNonInterferenceProperty
(RedDomain,BlackDomain: DomainFunction): bool =
FORALL (blk,red: index, gs: StepTrace):
noninterference_hyp
(RedDomain,BlackDomain,red,blk)(gs(0)) =>
FORALL (t: time):
noninterference_prop(red,blk)(gs(t))
TraceNonInterference_is_NonInterference: LEMMA
FORALL (Red: DomainFunction,
Black: DomainFunction):
TraceNonInterferenceProperty(Red,Black) =
NonInterferenceProperty(Red,Black)
IMPORTING TraceMU[GState]
TraceNonInterference_as_simple_trace_prop: LEMMA
FORALL (RedD,Black: DomainFunction):
TraceNonInterferenceProperty(RedD,Black) =
FORALL (blk,red: index):
simple_trace_prop(
noninterference_hyp(RedD,Black,red,blk),
noninterference_prop(red,blk),
StepRelation)
6 Conclusion
The AAMP7G effort also employed a commuting proof between the implementation
model and an abstract representation, as well as a proof that the abstract model
implemented the desired interpartition communication policy, expressed as an
information flow property.
Our models of information flow have been used compositionally to verify in-
teresting security properties of more abstract computing systems. We verified the
security properties of a simple firewall implemented on a partitioned operating
system by appealing to the information flow properties of the OS and the indi-
vidual partitions. The hierarchical approach used in our analysis of the Green Hills
INTEGRITY-178B operating system derived information flow properties of systems
by composing information flow properties of their subsystems. Finally, an analysis
of an abstract model of the Turnstile system verified several key information flow
properties of the system and confirmed a known information back-channel resulting
from “assured delivery” requirements [15].
We have also established that our model of information flow accurately expresses
and predicts system behavior in the domain of interest. Recall that our concern is
how information may or may not be communicated within computing systems. Our
techniques have been shown to address three specific concerns from the secure com-
puting domain [7]:
Exfiltration. A computational principal is able to read information in violation of
the system security policy, a policy such as a Bell-LaPadula “read-up” policy.
Infiltration. A computational principal is able to write information in violation of
the system security policy, a policy such as a Bell-LaPadula “write-down” policy.
Mediation. A computational principal is able to move information in the system,
contrary to a policy which does not allow that principal to perform that action, a
policy such as a “Chinese wall” separation of duties policy.
Having demonstrated that our models accurately express and predict behavior in the
domain of information assurance, can be used in the verification of larger systems,
and can be validated against actual systems, we feel that we have met our objective
of developing a good mathematical framework for modeling and reasoning about
information security specifications.
References
1. Alves-Foss J, Taylor C (2004) An analysis of the GWV security policy. In: Proceedings of the
fifth international workshop on ACL2 and its applications, Austin, TX, Nov. 2004
2. Clarke EM, Grumberg O, Peled DA (1999) Model checking. MIT, Cambridge, MA
3. Goguen JA, Meseguer J (1982) Security policies and security models. In: Proceedings of the
1982 IEEE symposium on security and privacy, pp 11–20. IEEE Computer Society Press,
Washington, DC
4. Greve D (2004) Address enumeration and reasoning over linear address spaces. In: Proceedings
of ACL2’04, Austin, TX, Nov. 2004
5. Greve D (2006) Parameterized congruences in ACL2. In: Proceedings of ACL2’06, Austin,
TX, Nov. 2006
Raymond J. Richards
1 Introduction
– System state
– Behavior
– Information flow
The proof architecture used to demonstrate correspondence.
The informal analysis of the hardware abstraction layer.
2 Separation Theorem
[Figure: the system step function and its graph-interpreting counterpart system*, each mapping a State and Inputs to a new State' and Outputs.]
If it can be proven that system and system* produce identical results for all
inputs of interest, it implies that the graph used by system* completely captures the
information flow of the system. This is the GWVr2 theorem.
(equal
  (system state)
  (system* graph state))
A trivial graph that satisfies this theorem simply gives each subsystem all inputs
and state elements. Conversely, there is a minimal graph, for which removing any
element from the input of any subsystem causes the theorem to fail. Elements can
be added to the minimal graph, without impacting the correctness of the theorem.
This means that the input for one of the subsystems defines all of the data necessary
for computing one element of the next state or output.
The GWVr2 Theorem is the Common Criteria Security Policy Model for
INTEGRITY-178B.
To be consistent with the goal that the formal analysis be platform independent, the
model of system state is that of nested abstract data structures. Elements within a
data structure can either be a scalar or a nested data structure. A data structure that
contains other data structures may be a record of heterogeneous data items or an
array of homogeneous data items. All elements in a data structure have names that
uniquely identify them and distinguish them from their peer elements. This is analo-
gous to a Unix file system containing directories and files. The directories represent
nested data structures and the files represent scalar data elements.
In such a file system, one can identify any directory or file resident within a
particular directory by specifying a path. The path contains the name of every sub-
directory that must be traversed in order to reach the item of interest. Similarly, in
the model of state, one can reach any piece of state that is resident in a data structure
by specifying a path. Arrays are represented in this model by using the array indices
as a specifier in the path.
Paths are considered scalar data items; they can be stored as part of state. This is
how C language pointers are modeled. Paths can be references to state locations and
can be dereferenced. Dereferencing a path produces the value stored at that location
in state.
An example data structure is shown in Fig. 3. Four scalar values are stored in
nested data structures. The paths to these four values and the data stored in this
structure are shown in Table 1.
The ACL2 representation of a path is simply a list of identifiers. The head of
a path is the outermost data structure. The tail of a path represents a path that is
relative to the head. In this way, paths are analogous to a directory path in a Unix
[Fig. 3 and Table 1: an example state containing nested data structures — Struct1 holds Value1 (“A String”) and Value2 (123), and Struct2 holds Value3 (7.5) and Value4; a sample path into this state is State.Struct1.Value2.]
file system. Absolute paths are relative to the root of the file system; relative paths
are referenced from the current location.
Operators are defined to update and query the state. Both of these operators use
a path to specify which element of the state they are affecting. The query operator
“GP,” or Get from Path, returns the value stored at the specified location. Its im-
plementation is a recursive function that fetches the data specified by the identifier
at the head of the path and recursively calls itself using the tail of the path and the
fetched bit of state as the recursive arguments. The signature of the GP operator is:
(GP path st)
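As an illustration only (not the kernel model's actual definition), a GP along these lines could be written as follows, assuming for the sketch that each nested structure is an association list keyed by identifier:

;; Illustrative sketch: a path is a list of identifiers; each (sub)structure
;; is an alist from identifiers to scalar values or nested substructures.
(defun gp (path st)
  (if (endp path)
      st
    (gp (cdr path) (cdr (assoc-equal (car path) st)))))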
The update operator “SP,” or Set Path, returns a new state, where the element
specified by the given path is replaced with a new value. Its implementation is
also recursive. SP replaces the element specified by the head of the path with the
value returned by its recursive call. The arguments to this recursive call are the tail
of the given path and the value found in the state at the location specified by the
head of the path. The signature of the SP operator is:
(SP path value st)
In this example, the outer call to the function reflex depends upon the results
of the inner call. This results in the proof of termination depending upon the termina-
tion of the function. When this occurs in the INTEGRITY-178B kernel, it is modeled
by unrolling the recursion. The recursion encountered in INTEGRITY-178B is con-
trolled by a simple counting variable; when that variable reaches a particular value,
the recursion is terminated.
A set of ACL2 macros is used to allow the functional model to have an impera-
tive look and feel. These macros are known collectively as the reader macro. The
reader macro expands statements into a functional form. The reader macro is a form
that begins with the symbol “%.” This allows a syntax that closely resembles C to
expand into native ACL2. The native ACL2 uses the state operators “SP” and “GP”
to interact with the system state. The following types of statements are handled by
the reader macro:
– Global variable access
– Assignment
– Function invocation
– Conditional early exit from a function
In the functional language model, all state information, except for local variables,
is stored in the state structure that is passed throughout the model. Local, or stack,
variables are modeled by local variables in ACL2, as long as there is no address-
based accessing of the variable. A local variable that has its address passed to a
subordinate function must be modeled as a state element.
The C language’s use of variable identifiers does not distinguish between global
and local variables. Since global variables are elements in the state structure, syntax
was adopted to indicate when an identifier is an access to a global variable. Preced-
ing an identifier with the symbol “@” indicates that the identifier should be treated
as a path to a global variable. Preceding any path, including that of a local variable,
with the symbol “*” queries the value stored at the location pointed to by the path.
4.1.2 Assignment
Assignment statement syntax depends on the impact of the assignment. That is,
assignments to local variables have a different syntax than assignments that change
the persistent state.
4.1.3 Functions
C language functions may or may not return a value. When modeling in ACL2,
functions need to at least return the state that is a result of their invocation. The
reader macro transforms function invocations that appear to not return a value into a
function call that returns the new state, catching it in the appropriate variable. Model
functions are declared using a form called “defmodel,” which is similar to the ACL2
defun form.
Functions that return a value are modeled using a multivalued return. That is, it
returns a list of items. The return list has a length of two; the second item is always
the state returned from the function.
clause includes whatever error handling is needed. The else clause contains the re-
mainder of the function. The syntax for conditional early exit is:
(ifx (conditional)
error handling )
The following example will be used to illustrate the various parts of this analysis.
The example is a function that operates on a circular, doubly linked list. This func-
tion removes one element from the list, maintaining a well-formed linked list. This
function is passed two arguments. The first is a pointer to a structure that contains a
pointer to the head of the list. The second is a pointer to the element that is removed
from the list. It is assumed that the element pointed to by the second argument is a
member of the list pointed to by the first argument. How this assumption is captured
in the analysis will be discussed later in this chapter. The example’s C language
implementation is:
void RemoveFromList (LIST *TheList, ELEMENT *Element){
  ELEMENT *NextInList, *PrevInList;
  NextInList = Element->next;  /* the element's successor in the circular list */
  if(NULL == NextInList)
    return;
  /* Update list */
  if (Element == NextInList){
    /* only element in the list */
    TheList->First = NULL;
  } else {
    /* not only element in the list */
    if (TheList->First == Element){
      /* Element is first in list */
      TheList->First = NextInList;
    }
    PrevInList = Element->prev;
    PrevInList->next = NextInList;
    NextInList->prev = PrevInList;
  }
  /* clear this element's links */
  Element->next = NULL;
  Element->prev = NULL;
}
;; else
(%
;; not only element in the list
(if (equal (* TheList -> First) Element)
;; else
st)
The following table describes how various C language constructs are modeled.

C                       Lisp/ACL2                        Notes

Variable reference
x                       x                                Value of local variable x
x                       (* (@ x))                        Value of global variable x
*xp                     (* xp)                           Value pointed to by local variable xp
*xp                     (* (@ xp))                       Value pointed to by global variable xp
&x                      (@ x)                            Address of global variable x

Variable assignment
x = ...;                (% .. (x = ...) ..)              Assign value of local variable x
x = ...;                (% .. (x @= ...) ..)             Assign value of global variable x
*xp = ...;              (% .. ((* xp) @= ...) ..)        Assign value pointed to by local variable xp
*xp = ...;              (% .. ((* (@ xp)) @= ...) ..)    Assign value pointed to by global variable xp

Simple structure references

Logical operators
x == y                  (equal x y)                      Equal
x != y                  (not (equal x y))                Not-equal (Negation)
x < y                   (< x y)                          Less-than
x > y                   (> x y)                          Greater-than
x <= y                  (<= x y)                         Less-than or equal-to
xp == NULL              (NULLP xp)                       Null pointer test
!xp
xp == 0
xp                      (NNULL xp)                       Non-Null pointer test

Function Declarations
type foo ...            (defun foo ...                   Return type not specified in function signature
... foo (int x) ...     ... foo (x st)                   No parameter type declarations; state (st)
                                                         parameter added for access to global state
... foo (int *xp) ...   ... foo (xp st)                  No parameter type declarations; state (st)
                                                         parameter added for access to global state
return;                 (return st)                      Function return when function application only
                                                         changes state
return x;               (return x)                       Function returns a single value x when function
                                                         application does not change state
There are several important boundaries to the kernel model that could not be
modeled as a straightforward translation of the system source code. These bound-
aries include asynchronous interactions with the world outside of the kernel, includ-
ing interrupt processing and execution of application code. Another boundary is the
interaction with the portion of the kernel that is specific to the hardware platform.
The GWVr2 theorem requires that steady-state operation be defined as a step that
can be performed repeatedly. Since the steady-state execution of most operating
[Figure: steady-state operation over time — secure states separated by a repeating step consisting of Load, Execute, and Store.]
Information flows are modeled by defining, for a given element of the system state,
the set of state elements that can influence its next value. This can be thought
of as a graph with vertices representing state elements and edges representing
dependencies. For each function of the model, a new function is defined that
calculates its graph, given its set of inputs, including the input state. The naming
convention for these graph-computing functions (often referred to as graph func-
tions) is to append “-graph” to the model function’s name. The graph function’s
parameters are identical to that of the model function. The graph function returns
a data structure that contains all of the graph edges for each state element that is
updated by the model function.
5.1 Crawlers
5.2 Graphs
A graph describes, for a set of state elements that may have their value changed by
an operation, what are the sources of information that are used in calculating the
new values. Using our circular linked list example, let us consider a graph for either
a sort or remove-element operation. In each case, the state elements that may be
changed are the previous and next fields of all of the elements of the list. The new
values that may be stored in these locations are the values that are stored in these
same locations before the operation. Therefore, the graph states that the new values
of the previous and next fields in the list depend upon what is currently stored in the
previous and next fields in the list. More precisely, the graph contains an entry for
each previous and next pointer as a location that may be updated. Each entry defines
a dependency on the set of previous and next pointers as the source of information
for the updated values.
Several functions and macros are defined to assist in developing graph functions.
Chief among these is “defgraph,” which takes four arguments. The first argument
is the name of the function whose graph is being defined. The second argument
is the list of names of the function’s parameters. Any parameter that is a pass-by-
value structure has its name in parentheses. The third argument is the list of variable
names whose values are returned by this function. Again, pass-by-value structures
have their names in parentheses. The last argument is the body of the graph defining
function.
The functions “du,” “du*,” and “merg-u2” are used to create dependencies or use
lists. The functions “su” and “su*” are used to associate a dependency set with an
element that has its value defined by the function. The function “mvg” returns the
graph and associates variables with returned values.
The graph for the RemoveFromList example function is defined as follows:
(defgraph RemoveFromList (:TheList :Element (:st))
((:st))
(%
;; determine the nodes in the list
(list-nodes = (crawl-list TheList st))
In the formal analysis, graphs are created for each function in the kernel. A graph
for a function that calls other functions must be no smaller than the graphs of the
subordinate functions. That is, the dependencies defined by any graph of a called
function must exist in the graph of the calling function.
In the circular linked list example, consider an Add operation. The state elements
that may be updated are not only the previous and next fields of the existing list, but
also the previous and next fields of the element being added to the list. The sources
of new values for these elements are not only the previous and next pointers of the
existing list, but also the parameter to the function pointing to the new element.
The graph of any function calling the Add operation must relate the previous
and next fields of the current list members and the previous and next fields of any
elements that might be added to the list to the sources of possible new values. The
sources of possible new values are, of course, the previous and next fields of the
current members of the list and the locations that could supply the new elements to
the Add operation.
6 Proof of Separation
In order to prove the GWVr2 theorem, it is useful to first prove two lemmas with re-
spect to the function being analyzed. These lemmas are referred to as the Workhorse
Lemma and the ClearP Lemma. We will discuss these lemmas with respect to
the circular linked list example. Before we discuss these lemmas, we will define
functions needed to support them.
RemoveFromList-Hyp. For every model function “foo” a function “foo-hyp” is
defined. This function states the hypothesis that is needed in order to have the
model function work appropriately. The hypothesis function takes the same ar-
guments as the model function. Recall that for the RemoveFromList function, it
was assumed that the element given to the function is a member of the list; the
RemoveFromList-Hyp function is where that assumption is stated.
Keys. The Key function is passed a dependency graph and returns the set of state
elements that may be updated, according to the graph.
DIA. The direct interaction allowed (DIA) function takes a state element and a
graph. It returns the set of state elements that the passed-in element has depen-
dencies on, as defined by the graph.
CP-Set-Equal. CP-Set-Equal is a predicate that takes a set of state elements and
two states. It evaluates to True if the two states have the same value for each
member of the set. It does not say anything about portions of the state that are
not in the set. Therefore, the two states may be different in the parts of the state
not defined in the set.
CLRP-Set. CLRP-Set takes a set of state elements and a state. It returns a state
that is a copy of the passed-in state, but the elements of the state specified in
the set have been cleared. In this case, cleared means that their values have been
replaced with nil.
The Workhorse Lemma states a relationship between the results of two invocations
of a function. These two invocations operate on different states, but on the same
parameters. For the circular linked list example, the two states satisfy the following
constraints:
– Both states satisfy the RemoveFromList-Hyp assumptions. This means we are
  only considering invocations of this function where the element to be removed
  is a member of the list.
– The two states have the same values for all elements that are in one of the
  dependency sets defined by the RemoveFromList graph.
The ClearP Lemma demonstrates that all of the changes to state performed by a
function are captured by the function’s graph. In the circular linked list example, for
a list, element, and a state that satisfy the function’s hypothesis function
8 Conclusion
After a thorough review of all of the certification evidence, including the for-
mal, semiformal, and informal analysis described herein, NIAP granted a Com-
mon Criteria Certificate for the INTEGRITY-178B kernel at the EAL6C level on
September 1, 2008. The “home page” for the certification documentation can be
found online [8]; a summary of the formal verification activities can be found in the
Security Target document [3].
References
1. Alves-Foss J, Rinker B, Taylor C (2002) Towards common criteria certification for DO-178B
compliant airborne systems, Center for Secure and Dependable Systems, University of Idaho
2. Common Criteria for Information Technology Security Evaluation (CCITSE) (1999). Available
at http://www.radium.ncsc.mil/tpep/library/ccitse/ccitse.html
G. Klein
NICTA, Sydney, NSW, Australia
e-mail: gerwin.klein@nicta.com.au

1 Introduction
On the surface, these two large refinement proofs use different formalisms and
connect different kinds of specification artefacts. Technical details on these two
proofs have appeared elsewhere [4, 17, 23]. This article recalls some of these details
and shows how they are put together into a common, general refinement framework
that allows us to connect the results and extract the main overall theorem: the C code
of seL4 correctly implements its abstract specification.
Section 2 shows the overall data refinement framework. Section 3 gives some
example code on the monadic and C level. Section 4 summarises the refinement
proof RA and shows how it is mapped into the framework. Section 5 does the same
for the C implementation proof RC .
2 Data Refinement
The ultimate objective of our effort is to prove refinement between an abstract and a
concrete process. Following de Roever and Engelhardt [6], we define a process as a
triple containing an initialisation function, which creates the process state with ref-
erence to some external state, a step function which reacts to an event, transforming
the state, and a finalisation function which reconstructs the external state.
record process =
  Init :: 'external ⇒ 'state set
  Step :: 'event ⇒ ('state × 'state) set
  Fin  :: 'state ⇒ 'external
The idea is that the external state is the one observable on the outside, about which
one may formulate Hoare logic properties. A process may also contain hidden state
to implement its data structures. In the simple case, the full state space of a compo-
nent is just a pair of external and hidden states and the projection function Fin is just
the canonical projection from pairs. With more complex processes, the projection
function that extracts the observable state may become more complex as well.
The execution of a process may be non-deterministic, starting from an initial ex-
ternal state and resulting, via a sequence of inputs, in a set of external states:
  steps δ s events ≡ foldl (λ states event. (δ event) `` states) s events
  execution A s events ≡ (Fin A) ` (steps (Step A) (Init A s) events)
where R `` S and f ` S are the images of the set S under the relation R and the
function f, respectively.
Process A is refined by C, if with the same initial state and input events, execution
of C yields a subset of the external states yielded by executing A:
  A ⊑ C ≡ ∀ s events. execution C s events ⊆ execution A s events
This is the classic notion of refinement as reducing non-determinism. Note that it
also includes data refinement: A and C may work on different internal state spaces;
they merely both need to project to the same external state space.
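To make these definitions tangible, here is a small executable Haskell sketch of processes, execution, and the refinement check over finite state sets; the names Process, steps, execution and refines mirror the text, but the list-based representation of sets and the toy example are our own simplifications, not the Isabelle development.

import Data.List (nub)

-- A process: initialisation from an external state, a step relation per event,
-- and a finalisation that projects the internal state back to an external one.
data Process st ext ev = Process
  { pInit :: ext -> [st]          -- Init :: 'external => 'state set
  , pStep :: ev -> [(st, st)]     -- Step :: 'event => ('state x 'state) set
  , pFin  :: st -> ext            -- Fin  :: 'state => 'external
  }

-- steps: fold the step relation over the event sequence (relation image at each step).
steps :: Eq st => (ev -> [(st, st)]) -> [st] -> [ev] -> [st]
steps delta = foldl (\states event -> nub [s' | (s, s') <- delta event, s `elem` states])

-- execution: the set of observable external states after the given events.
execution :: Eq st => Process st ext ev -> ext -> [ev] -> [ext]
execution a s events = map (pFin a) (steps (pStep a) (pInit a s) events)

-- Refinement as reduction of non-determinism: every behaviour of C is one of A.
refines :: (Eq st1, Eq st2, Eq ext) =>
           Process st1 ext ev -> Process st2 ext ev -> ext -> [[ev]] -> Bool
refines a c ext traces =
  and [ all (`elem` execution a ext es) (execution c ext es) | es <- traces ]

main :: IO ()
main = do
  -- A tiny abstract process that may add 1 or 2 to a counter per event,
  -- and a concrete one that always adds 1: the concrete refines the abstract.
  let abstract = Process (\e -> [e]) (\() -> [(n, n + d) | n <- [0 .. 10], d <- [1, 2]]) id
      concrete = Process (\e -> [e]) (\() -> [(n, n + 1) | n <- [0 .. 10]]) id
  print (refines abstract concrete (0 :: Int) [[()], [(), ()]])   -- True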
A well-known property of refinement is that it is equivalent to the preservation
of Hoare logic properties.
[Figure: forward simulation diagram – the abstract and concrete operations are connected on both sides by the state relation]
2.2 Structure
The three processes we are interested in have a common structure in their Step
operations. We model five kinds of events in our processes. The first two are transi-
tions that do not involve the kernel: user thread execution and idle thread execution.
We model the execution of user threads with unrestricted non-determinism, allowing
all possible behaviours. We distinguish the idle thread as it may run in the kernel’s
context and thus must be better behaved. The next two kinds of events model the
transition from user mode to kernel mode when exceptions occur: user mode excep-
tions and idle mode exceptions. The final event type is the one we are interested in:
kernel execution. This is the only part of the Step operation that differs between our
processes.
Formally, we model this in a function global-automaton that takes the kernel be-
haviour as a parameter and implements the above transitions generically. The kernel
transition is:
global-automaton kernel-call KernelTransition ≡
  {((s, KernelMode, Some e), (s', m, None)) | s s' e m. (s, s', m) ∈ kernel-call e}
The parameter kernel-call is a relation between current and final kernel state and the
next mode the machine is switched into (kernel mode, user mode, and idle mode).
The state space of the process is a triple of the kernel-observed machine state, in-
cluding memory and devices, a current mode, and a current kernel entry event. The
latter is produced by the other transitions in the model. For instance, in idle mode,
only an interrupt event can be generated:
global-automaton kernel-call IdleEventTransition ≡
  {((s, IdleMode, None), (s, KernelMode, Some Interrupt)) | s. True}
From user mode, any kernel entry event e is possible. The transition from user to
kernel mode itself does not change the state; the context switch is modelled inside
the kernel transition that comes after, because it is modelled differently at each ab-
straction level. The transition assumes no further conditions and does not depend on
the parameter kernel-call.
global-automaton kernel-call UserEventTransition ≡
  {((s, UserMode, None), (s, KernelMode, Some e)) | s e. True}
The other transitions are analogous.
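A minimal Haskell rendering of the global automaton may help to see the shape of these transitions; the type and constructor names (Mode, Event, globalAutomaton) are ours, the set of user-mode events is invented for illustration, and the kernel behaviour is abstracted to a relation passed in as a parameter, just as kernel-call is above.

data Mode  = UserMode | IdleMode | KernelMode deriving (Eq, Show)
data Event = Interrupt | Syscall | Fault deriving (Eq, Show)

-- The state observed by the automaton: kernel-observed machine state,
-- current mode, and the pending kernel-entry event (if any).
type Global s = (s, Mode, Maybe Event)

-- The kernel behaviour is a parameter: given an event and a state, it yields
-- the possible (next state, next mode) pairs.
type KernelCall s = Event -> s -> [(s, Mode)]

-- One transition of the global automaton (non-deterministic, as a list).
globalAutomaton :: KernelCall s -> Global s -> [Global s]
globalAutomaton kernelCall (s, mode, pending) = case (mode, pending) of
  -- Kernel execution: the only transition that depends on the kernel behaviour.
  (KernelMode, Just e)  -> [ (s', m, Nothing) | (s', m) <- kernelCall e s ]
  -- In idle mode only an interrupt can force entry into the kernel.
  (IdleMode,   Nothing) -> [ (s, KernelMode, Just Interrupt) ]
  -- From user mode any kernel-entry event may be generated; the state is unchanged.
  (UserMode,   Nothing) -> [ (s, KernelMode, Just e) | e <- [Interrupt, Syscall, Fault] ]
  _                     -> []

main :: IO ()
main = do
  -- A toy kernel that counts its invocations and returns to user mode.
  let kernel _ n = [(n + 1 :: Int, UserMode)]
  print (globalAutomaton kernel (0, UserMode, Nothing))
  print (globalAutomaton kernel (0, KernelMode, Just Syscall))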
The definition of kernel execution may vary between our three processes, but
they share a common aspect. Each is implemented through a call to the top-level
kernel handler function from which a call graph proceeds in a structured language.
Exploiting this structure is the key aspect of our approach.
2.3 Correspondence
3 Example
The seL4 kernel [8] provides the following operating system kernel services:
inter-process communication, threads, virtual memory, access control, and interrupt
control. In this section, we present a typical function, cteMove, with which we will
illustrate the two proof frameworks for refinement. Figure 3 shows the same func-
tion in the monadic executable specification and in the C implementation. The first
refinement proof relates two monadic specifications; the second refinement proof
relates the two layers shown in the figure.
Access control in seL4 is based on capabilities. A capability contains an object
reference along with access rights. A capability table entry (CTE) is a kernel data
structure with two fields: a capability and an mdbNode. The latter is book-keeping
information and contains a pair of pointers which form a doubly linked list.
The cteMove operation, shown in Fig. 3, moves a CTE from src to dest.
The first six lines in Fig. 3 initialise the destination entry and clear the source
entry; the remainder of the function updates the pointers in the doubly linked list.
During the move, the capability in the entry may be diminished in access rights.
Thus, the argument cap is this possibly diminished capability, previously retrieved
from the entry at src.
In this example, the C source code is structurally similar to the executable
specification. This similarity is not accidental: the executable specification describes
the low-level design with a high degree of detail. Most of the kernel functions ex-
hibit this property. It is also true, to a lesser degree, for the refinement between two
monadic specifications. Even so, the implementation here makes a small optimisa-
tion: in the specification, updateMDB always checks that the given pointer is not
NULL. In the implementation, this check is done for prev ptr and next ptr –
which may be NULL – but omitted for srcSlot and destSlot. In verifying
cteMove, we will have to prove that these checks are not required.
4 Monadic Refinement
The abstract and executable specifications over which RA is proved are written
in a monadic style inspired by Haskell. The type constructor ('a, 's) nd-monad
is a non-deterministic state monad representing computations with a state type 's
and a return value type 'a. Return values can be injected into the monad using the
return :: 'a ⇒ ('a, 's) nd-monad operation. The composition operator bind ::
('a, 's) nd-monad ⇒ ('a ⇒ ('b, 's) nd-monad) ⇒ ('b, 's) nd-monad performs
the first operation and makes the return value available to the second operation.
These canonical operators form a monad over ('a, 's) nd-monad and satisfy the
usual monadic laws. More details are given elsewhere [4]. The ubiquitous do ...
od syntax seen in Sect. 3 is syntactic sugar for a sequence of operations composed
using bind.
The type ('a, 's) nd-monad is isomorphic to 's ⇒ ('a × 's) set × bool. This can
be thought of as a non-deterministic state transformer (mapping from states to sets
of states) extended with a return value (required to form a monad) and a boolean
failure flag. The flag is set by the fail :: ('a, 's) nd-monad operation to indicate
unrecoverable errors in a manner that is always propagated and not confused by
non-determinism. The destructors mResults and mFailed access, respectively, the
set of outcomes and the failure flag of a monadic operation evaluated at a state.
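For readers who prefer runnable code, a plausible Haskell model of such a non-deterministic state monad with a failure flag is sketched below; the names mResults and mFailed follow the text, while the newtype ND, the list-based representation of sets, and the auxiliary operations (chooseND, getND, putND) are our own simplifications.

-- ('a, 's) nd-monad modelled as 's -> (set of (result, state), failure flag),
-- with finite lists standing in for sets.
newtype ND s a = ND { runND :: s -> ([(a, s)], Bool) }

mResults :: ND s a -> s -> [(a, s)]
mResults m s = fst (runND m s)

mFailed :: ND s a -> s -> Bool
mFailed m s = snd (runND m s)

returnND :: a -> ND s a
returnND a = ND (\s -> ([(a, s)], False))

-- bind runs the first computation and feeds each result into the second;
-- failure is propagated from either stage, regardless of non-determinism.
bindND :: ND s a -> (a -> ND s b) -> ND s b
bindND m f = ND (\s ->
  let (outs, failed) = runND m s
      rest           = [ runND (f a) s' | (a, s') <- outs ]
  in (concatMap fst rest, failed || any snd rest))

failND :: ND s a
failND = ND (\_ -> ([], True))

-- Non-deterministic choice and simple state access, for illustration.
chooseND :: [a] -> ND s a
chooseND xs = ND (\s -> ([(x, s) | x <- xs], False))

getND :: ND s s
getND = ND (\s -> ([(s, s)], False))

putND :: s -> ND s ()
putND s' = ND (\_ -> ([((), s')], False))

main :: IO ()
main = do
  let prog = getND `bindND` \n ->
             chooseND [1, 2] `bindND` \d ->
             putND (n + d) `bindND` \_ ->
             returnND (n + d)
  print (mResults prog (10 :: Int))          -- [(11,11),(12,12)]
  print (mFailed prog 10)                    -- False
  print (mFailed (failND :: ND Int ()) 10)   -- True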
Exception handling is introduced by using a return value in the sum type. An
alternative composition operator op >>=E :: ('e + 'a, 's) nd-monad ⇒ ('a
⇒ ('e + 'b, 's) nd-monad) ⇒ ('e + 'b, 's) nd-monad inspects the return
value, executing the subsequent operation for normal (right) return values and skip-
ping it for exceptional (left) ones. There is an alternative return operator returnOk
and these form an alternative monad. Exceptions are thrown with throwError and
caught with catch.
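The exception layer can be sketched in the same style; the operator names below (returnOk, bindE, and catchE for the catch operation, plus throwError) follow the text, but the definitions are only our guess at their shape, with Either standing in for the sum type and the monad written as a plain function rather than the Isabelle constant.

-- Exceptional computations: the return value lives in a sum type.
type NDE s e a = s -> ([(Either e a, s)], Bool)

returnOk :: a -> NDE s e a
returnOk a s = ([(Right a, s)], False)

throwError :: e -> NDE s e a
throwError e s = ([(Left e, s)], False)

-- bindE runs the second computation only for normal (Right) results,
-- and skips it, keeping the exception, for Left results.
bindE :: NDE s e a -> (a -> NDE s e b) -> NDE s e b
bindE m f s =
  let (outs, failed) = m s
      step (Right a, s') = f a s'
      step (Left e,  s') = ([(Left e, s')], False)
      rest = map step outs
  in (concatMap fst rest, failed || any snd rest)

-- catchE handles exceptional results with the supplied handler.
catchE :: NDE s e a -> (e -> NDE s e a) -> NDE s e a
catchE m h s =
  let (outs, failed) = m s
      step (Left e,  s') = h e s'
      step (Right a, s') = ([(Right a, s')], False)
      rest = map step outs
  in (concatMap fst rest, failed || any snd rest)

main :: IO ()
main = do
  let safeDiv x y = if y == 0 then throwError "div by zero" else returnOk (x `div` y)
      prog = safeDiv (10 :: Int) 0 `catchE` \_ -> returnOk 0
  print (fst (prog ()))   -- [(Right 0,())]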
We define a Hoare triple, denoted {|P|} a {|Q|}, on a monadic operation a, precondition
P and postcondition Q. We have a verification condition generator (VCG) for such
Hoare triples, which are used extensively both to establish invariants and to make
use of them in correspondence proofs.
4.2 Correspondence
Correspondence between an abstract operation A and a concrete operation C is captured by the predicate corres below, which takes additional parameters: R is a predicate which will relate abstract and concrete return values,
and the preconditions P and P’ restrict the input states, allowing use of information
such as global invariants:
corres R P P' A C ≡ ∀ (s, s') ∈ state-relation. P s ∧ P' s' →
  (∀ (r', t') ∈ mResults (C s'). ∃ (r, t) ∈ mResults (A s). (t, t') ∈ state-relation ∧ R r r')
  ∧ ¬ mFailed (C s')
Note that the outcome of the monadic computation is a pair of result and failure
flag. The last conjunct of the corres statement mandates non-failure for C.
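Because the definition only quantifies over states and outcomes, corres can be checked by brute force on small finite examples; the Haskell sketch below (with an explicit list-based state relation and the Op type standing in for nd-monad) is our own illustration, not the Isabelle predicate.

-- Monadic operations modelled as: state -> (list of (result, state), failure flag).
type Op s a = s -> ([(a, s)], Bool)

-- corres R P P' A C over an explicit, finite state relation.
corres :: (Eq sa, Eq sc)
       => [(sa, sc)]               -- state relation as a finite list of pairs
       -> (ra -> rc -> Bool)       -- R: return value relation
       -> (sa -> Bool)             -- P: abstract precondition
       -> (sc -> Bool)             -- P': concrete precondition
       -> Op sa ra -> Op sc rc -> Bool
corres stateRel r p p' a c =
  and [ noFail && allMatched
      | (s, s') <- stateRel, p s, p' s'
      , let (cOuts, cFailed) = c s'
            (aOuts, _)       = a s
            noFail           = not cFailed
            allMatched       = and [ or [ (t, t') `elem` stateRel && r rv rv'
                                        | (rv, t) <- aOuts ]
                                   | (rv', t') <- cOuts ] ]

main :: IO ()
main = do
  -- Abstract: non-deterministically add 1 or 2; concrete: always add 1.
  let abstractOp n = ([(d, n + d) | d <- [1, 2 :: Int]], False)
      concreteOp n = ([(1, n + 1 :: Int)], False)
      rel          = [(n, n) | n <- [0 .. 20 :: Int]]
  print (corres rel (==) (< 20) (< 20) abstractOp concreteOp)   -- True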
The key property of corres is that it decomposes over the bind constructor
through the CORRES - SPLIT rule.
CORRES-SPLIT:
  corres R' P P' A C
  ∀ r r'. R' r r' → corres R (S r) (S' r') (B r) (D r')
  {|Q|} A {|S|}      {|Q'|} C {|S'|}
  -----------------------------------------------------
  corres R (P and Q) (P' and Q') (A >>= B) (C >>= D)
This splitting rule decomposes the problem into four subproblems. The first two
are corres predicates relating the subcomputations. Two Hoare triples are also re-
quired. This is because the input states of the subcomputations appearing in the
second subproblem are intermediate states, not input states, of the original problem.
Any preconditions assumed in solving the second subproblem must be shown to
hold at the intermediate states by proving a Hoare triple over the partial computa-
tion. Use of Hoare triples to demonstrate intermediate conditions is both a strength
and a weakness of this approach. In some cases, the result is repetition of existing
invariant proofs. However, in the majority of cases, this approach makes the flexi-
bility and automation of the VCG available in demonstrating preconditions that are
useful as assumptions in proofs of the corres predicate.
The decision to mandate non-failure for concrete elements and not abstract ones
is pragmatic. Proving non-failure on either system could be done independently;
however, the preconditions needed are usually the same as in corres proofs and it
is convenient to solve two problems simultaneously. Unfortunately we cannot so
easily prove abstract non-failure. Because the concrete specification may be more
deterministic than the abstract one, there is no guarantee that we will examine all
possible failure paths. In particular, if a conjunct mandating abstract non-failure was
added to the definition of corres, the splitting rule above would not be provable.
Similar splitting rules exist for other common monadic constructs including
bindE, catch, and conditional expressions. There are terminating rules for the ele-
mentary monadic functions, for example:
CORRES-RETURN:
  R a b
  -----------------------------------
  corres R ⊤ ⊤ (return a) (return b)
The corres predicate also has a weakening rule, similar to the Hoare Logic.
CORRES-PRECOND-WEAKEN:
  corres R Q Q' A C      ∀ s. P s → Q s      ∀ s. P' s → Q' s
  -------------------------------------------------------------
  corres R P P' A C
Proofs of the corres property take a common form: first the definitions of the
terms under analysis are unfolded and the CORRES - PRECOND - WEAKEN rule is ap-
plied. As with the VCG, this allows the syntactic construction of a precondition to
suit the proof. The various splitting rules are used to decompose the problem, in
some cases with carefully chosen return value relations. Existing results are then
used to solve the component corres problems. Some of these existing results, such
as CORRES - RETURN, require compatibility properties on their parameters. These
are typically established using information from previous return value relations.
The VCG eliminates the Hoare triples, bringing preconditions assumed in corres
properties at later points back to preconditions on the starting states. Finally, as in
Dijkstra’s postcondition propagation [7], the precondition used must be proved to
be a consequence of the one that was originally assumed.
To prove RA , we must connect the corres framework described above to the for-
ward simulation property we wish to establish. The Step actions of the processes we
are interested in are equal for all events other than kernel executions, and simulation
is trivial to prove for equal operations. In the abstract process A, kernel execution
is defined in the monadic function call-kernel. The semantics of the whole abstract
process A are then derived by using call-kernel in the call to global-automaton.
The context switch is modelled by explicitly changing all user accessible parts,
for instance the registers of the current thread, fully non-deterministically. The se-
mantics of the intermediate process for the executable specification E are derived
similarly from a monadic operation callKernel. These two top-level operators sat-
isfy a correspondence theorem KERNEL - CORRES:
  ∀ event. corres (λ rv rv'. True) invs invs' (call-kernel event) (callKernel event)
The required forward simulation property for kernel execution (assuming the
system invariants) is implied by this correspondence rule. Invariant preservation for
the system invariants follows similarly from Hoare triples proved over the top-level
monadic operations:
  ∀ event. {|invs|} call-kernel event {|λ_. invs|}
  ∀ event. {|invs'|} callKernel event {|λ_. invs'|}
From these facts, we may thus conclude that RA holds:
Theorem 1. The executable specification refines the abstract one.
A ⊑ E
5 C Refinement
In this section, we describe our infrastructure for parsing C into Isabelle/HOL and
for reasoning about the result.
[Fig. 4: the parser translates the C code, using the C memory model and C expressions/guards, into C-SIMPL, an instantiation of the generic SIMPL language with its operational semantics]
The seL4 kernel is implemented almost entirely in C99 [15]. Direct hardware
accesses are encapsulated in machine interface functions, some of which are im-
plemented in ARMv6 assembly. In the verification, we axiomatise the assembly
functions using Hoare triples.
Figure 4 gives an overview of the components involved in importing the ker-
nel into Isabelle/HOL. The right-hand side shows our instantiation of SIMPL [18],
a generic, imperative language inside Isabelle. The SIMPL framework provides a
program representation, a semantics, and a VCG. This language is generic in its ex-
pressions and state space. We instantiate both components to form C-SIMPL, with
a precise C memory model and C expressions, generated by a parser. The left-hand
side of Fig. 4 shows this process: the parser takes a C program and produces a
C-SIMPL program.
SIMPL provides a data type and semantics for statement forms; expressions are
shallowly embedded. Along with the usual constructors for conditional statements
and iteration, SIMPL includes statements of the form Guard F P c which raises the
fault F if the condition P is false and executes c otherwise.
Program states in SIMPL are represented by Isabelle records containing a field
for each local variable in the program and a field globals containing all global vari-
ables and the heap. Variables are then simply functions on the state.
SIMPL semantics are represented by judgements of the form Γ ⊢ ⟨c, x⟩ ⇒ x',
which means that executing statement c in state x terminates and results in state x';
the parameter Γ maps function names to function bodies. These states include both
the program state and control flow information, including that for abruptly termi-
nating THROW statements used to implement the C statements return, break,
and continue.
The SIMPL environment also provides a VCG for partial correctness triples;
Hoare-triples are represented by judgements of the form Γ ⊢/F P c C, A, where
P is the precondition, C is the postcondition for normal termination, A is the
postcondition for abrupt termination, and F is the set of ignored faults. If F is UNIV,
the universal set, then all Guard statements are effectively ignored. Both A and F
may be omitted if empty.
Our C subset allows type-unsafe operations including casts. To achieve this
soundly, the underlying heap model is a function from addresses to bytes. This al-
lows, for example, the C function memset, which sets each byte in a region of the
heap to a given value. We generally use a more abstract interface to this heap: we
use additional typing information to lift the heap into functions from typed pointers
to Isabelle terms; see Tuch et al. [20, 21] for more detail.
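The idea of a byte-level heap lifted to typed views can be illustrated in a few lines of Haskell; the sketch below (a map from 32-bit addresses to bytes, with little-endian loadWord/storeWord and a memset over the raw bytes) is our own simplification of this memory model, not a rendering of the actual Isabelle development of Tuch et al.

import qualified Data.Map as M
import           Data.Map (Map)
import           Data.Word (Word8, Word32)
import           Data.Bits (shiftL, shiftR, (.&.), (.|.))

-- The underlying heap: a function-like map from addresses to bytes.
type Addr = Word32
type Heap = Map Addr Word8

-- memset: set each byte in a region to a given value (type-unsafe, byte level; n > 0 assumed).
memset :: Addr -> Word8 -> Word32 -> Heap -> Heap
memset p v n h = foldr (\i -> M.insert (p + i) v) h [0 .. n - 1]

-- Typed view: load/store a 32-bit little-endian word through the byte heap.
loadWord :: Addr -> Heap -> Word32
loadWord p h = foldr (\i acc -> acc `shiftL` 8 .|. byte i) 0 [0, 1, 2, 3]
  where byte i = fromIntegral (M.findWithDefault 0 (p + i) h)

storeWord :: Addr -> Word32 -> Heap -> Heap
storeWord p w h = foldr (\i -> M.insert (p + i) (byte i)) h [0 .. 3]
  where byte i = fromIntegral ((w `shiftR` (8 * fromIntegral i)) .&. 0xff)

main :: IO ()
main = do
  let h  = storeWord 0x100 0xdeadbeef M.empty
      h' = memset 0x100 0 2 h          -- clobber the two low bytes
  print (loadWord 0x100 h)             -- 3735928559 (0xdeadbeef)
  print (loadWord 0x100 h')            -- 3735879680 (0xdead0000)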
The C parser takes C source files and generates the corresponding C-SIMPL
terms, along with Hoare-triples describing the set of variables mutated by the
functions. Although our C subset does not include union types, we have a tool which
generates data types and manipulation functions which implement tagged unions via
C structures and casts [3]. The tool also generates proofs of Hoare-triples describing
the operations.
As with the correspondence statement for RA , we deal with state preconditions and
return values by including guards on the states and a return value relation in the
RC correspondence statement. In addition, we include an extra parameter used for
dealing with early returns and breaks from loops, namely a list of statements called
a handler stack.
We thus extend the semantics to lists of statements, writing Γ ⊢ ⟨c # hs, s⟩ ⇒ x'.
The statement sequence hs is a handler stack; it collects the CATCH handlers which
Fig. 5 Correspondence (diagram: the monadic operation takes a state s satisfying P to a result (s', rv); the corresponding C-SIMPL execution is related to it via the state relation S and the return-value relation r)
ccorres r xf P P' hs a c ≡
  ∀ (s, t) ∈ S. ∀ t'. s ∈ P ∧ t ∈ P' ∧ ¬ mFailed (a s) ∧ Γ ⊢ ⟨c # hs, t⟩ ⇒ t'
    → ∃ (s', rv) ∈ mResults (a s).
        ∃ t'N. t' = Normal t'N ∧ (s', t'N) ∈ S ∧ r rv (xf t'N)
The definition can be read as follows: given related states s and t with the pre-
conditions P and P’, respectively, if the abstract specification a does not fail when
evaluated at state s, and the concrete statement c evaluates under handler stack hs in
extended state t to extended state t’, then the following must hold:
1. Evaluating a at state s returns some value rv and new abstract state s’.
2. The result of the evaluation of c is some normal (non-abrupt) state Normal t’N .
3. States s’ and t’N are related by the state relation S.
4. Values rv and xf t’N – the extraction function applied to the final state of c – are
related by r, the given return value relation.
Note that a is non-deterministic: we may pick any suitable rv and s’. As mentioned
in Sect. 4.2, the proof of RA entails that the executable specification does not fail.
Thus, in the definition of ccorres, we may assume : mFailed .a s/. In practice, this
means assertions and other conditions for (non-)failure in the executable specifica-
tion become known facts in the proof. Of course, these facts are only free because
we have already proven them in RA .
Data refinement predicates can, in general [6], be rephrased and solved as Hoare
triples. We do this in our framework by using the VCG after applying the following
rule:
  ∀ s. Γ ⊢ {t | s ∈ P ∧ t ∈ P' ∧ (s, t) ∈ S}
           c
         {t' | ∃ (rv, s') ∈ mResults (a s). (s', t') ∈ S ∧ r rv (xf t')}
  ----------------------------------------------------------------------
  ccorres r xf P P' hs a c
In essence, this rule states that to show correspondence between a and c, for a
given initial specification state s, it is sufficient to show that executing c results in
normal termination where the final state is related to the result of evaluating a at s.
The VCG precondition can assume that the initial states are related and satisfy the
correspondence preconditions.
Use of this rule in verifying correspondence is limited by two factors. First, the
verification conditions produced by the VCG may be excessively large or complex.
Our experience is that the output of a VCG step usually contains a separate term
for every possible path through the target code and that the complexity of these
terms tends to increase with the path length. Second, the specification return value
and result state are existential and thus outside the range of our extensive automatic
support for showing universal properties of specification fragments. Fully expanding
the specification is always possible, and in the case of deterministic operations it
will yield a single state/return value pair, but the resulting term structure may also
be large.
5.4 Splitting
In the C-SIMPL VCG obligation, we may ignore any guard faults as their absence
is implied by the first premise. In fact, in most cases the C-SIMPL VCG step can
be omitted altogether, because the postcondition collapses to the universal set after
simplifications.
We have developed a tactic which assists in splitting: C-SIMPL’s encoding of
function calls and struct member updates requires multiple specialised rules. The
tactic symbolically executes and moves any guards if required, determines the cor-
rect splitting rule to use, instantiates the extraction function, and lifts the second
correspondence premise.
We map the C kernel into a process by lifting the operational semantics of the kernel
C code into a non-deterministic monad:
  exec-C Γ c ≡ λ s. ({()} × {s' | Γ ⊢ ⟨c, Normal s⟩ ⇒ Normal s'}, False)
that is, for a given statement c we construct a function from an initial state s into the
set of states resulting from evaluating c at s. We define the return value of this exe-
cution as the unit. We set the failure flag to False and require a successful Normal
result from C.
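The lifting itself is a one-liner once the operational semantics is available as a function; the Haskell sketch below mimics it, with the names XState, BigStep and execC, the toy semantics, and the list-based monad all being our own illustrative choices.

-- Extended result states of the operational semantics.
data XState s = Normal s | Abrupt s | Fault deriving (Eq, Show)

-- A big-step semantics: statement -> initial state -> possible final xstates.
type BigStep stmt s = stmt -> s -> [XState s]

-- Lift the operational semantics into the non-deterministic state monad.
-- The return value is unit, the failure flag is False, and only Normal results are kept.
execC :: BigStep stmt s -> stmt -> s -> ([((), s)], Bool)
execC sem c s = ([ ((), s') | Normal s' <- sem c s ], False)

main :: IO ()
main = do
  -- A toy semantics over Int states with two "statements".
  let sem :: BigStep String Int
      sem "inc"  n = [Normal (n + 1)]
      sem "trap" n = [Abrupt n, Fault]
      sem _      n = [Normal n]
  print (execC sem "inc" 41)    -- ([((),42)],False)
  print (execC sem "trap" 41)   -- ([],False): abrupt/fault results are dropped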
We then construct a function callKernel-C, parametrised by the input event,
which simulates the hardware exception dispatch mechanism. The function exam-
ines the argument and dispatches the event to the corresponding kernel entry point.
Finally, we form the process ADT-C by instantiating the global automaton with this
step function.
We again establish a correspondence result between the kernel entry points, this
time between callKernel in E and callKernel-C in C. This time, we did not need to
prove additional invariants about the concrete level (the C program). The framework
presented above enabled us to shift all such reasoning to the level of the executable
specification E.
Theorem 2. The translated C code refines its executable specification.
E ⊑ C
6 Main Theorem
Putting the two theorems from the previous sections together, we arrive via transi-
tivity of refinement at the main functional correctness theorem.
Theorem 3. A ⊑ C
7 Related Work
8 Conclusion
We have presented the different refinement techniques used in the verification of the
seL4 microkernel. We have given an overview of the overall unifying framework, of
the refinement calculus used for stateful, monadic specification, of the refinement
calculus for imperative programs, and we have shown how these are put together
into the final theorem.
The two frameworks presented here have withstood the test of large-scale ap-
plication to high-performance C code in the Isabelle/HOL verification of the seL4
microkernel. Proving functional correctness for a real-world application down to the
implementation level is possible and feasible.
Acknowledgments We thank the other current and former members of the L4.verified and seL4
teams: David Cock, Tim Bourke, June Andronick, Michael Norrish, Jia Meng, Catherine Menon,
Jeremy Dawson, Harvey Tuch, Rafal Kolanski, David Tsai, Andrew Boyton, Kai Engelhardt, Kevin
Elphinstone, Philip Derrin, and Dhammika Elkaduwe for their contributions to this verification.
NICTA is funded by the Australian Government as represented by the Department of Broad-
band, Communications and the Digital Economy and the Australian Research Council through the
ICT Centre of Excellence program.
References
17. Klein G, Elphinstone K, Heiser G, Andronick J, Cock D, Derrin P, Elkaduwe D, Engelhardt K,
Kolanski R, Norrish M, Sewell T, Tuch H, Winwood S (2009) seL4: formal verification of an
OS kernel. In: Proceedings of 22nd SOSP, Big Sky, MT, USA, October 2009. ACM, New York,
NY, pp 207–220
18. Schirmer N (2006) Verification of sequential imperative programs in Isabelle/HOL. PhD thesis,
Technische Universität München
19. Tews H, Weber T, Völp M (2008) A formal model of memory peculiarities for the verification
of low-level operating-system code. In: Huuck R, Klein G, Schlich B (eds) Proceedings of
3rd international workshop on systems software verification (SSV’08), vol 217 of ENTCS.
Elsevier, Amsterdam, pp 79–96
20. Tuch H (2009) Formal verification of C systems code: structured types, separation logic
and theorem proving. J Autom Reason (special issue on operating system verification)
42(2–4):125–187
21. Tuch H, Klein G, Norrish M (2007) Types, bytes, and separation logic. In: Hofmann M,
Felleisen M (eds) Proceedings of 34th ACM SIGPLAN-SIGACT symposium on principles
of programming languages, Nice, France. ACM, New York, NY, pp 97–108
22. Walker B, Kemmerer R, Popek G (1980) Specification and verification of the UCLA unix
security kernel. Commun ACM 23(2):118–131
23. Winwood S, Klein G, Sewell T, Andronick J, Cock D, Norrish M (2009) Mind the gap: a
verification framework for low-level C. In: Berghofer S, Nipkow T, Urban C, Wenzel M (eds)
Proceedings of TPHOLs'09, vol 5674 of LNCS, Munich, Germany, August 2009. Springer,
Berlin, pp 500–515
Specification and Checking of Software
Contracts for Conditional Information Flow
1 Introduction
T. Amtoft ()
Kansas State University, Manhattan, KS, USA
e-mail: tamtoft@cis.ksu.edu
2 Example
[Fig. 1: Mailbox architecture – Client 0 and Client 1, each in its own partition on the separation
kernel, communicate through the mailbox's shared Input 0/Input 1 and Output segments – together
with an excerpt of the SPARK procedure MACHINE_STEP (its information flow contract, shown
in Fig. 2, goes in the annotation placeholder). The excerpt declares DATA_0, DATA_1 : CHARACTER
and begins:
  if IN_0_RDY and not OUT_1_RDY then
    DATA_0 := IN_0_DAT;
    IN_0_RDY := FALSE;
    ...]
Figure 1 shows the mailbox mediating communication between two client processes – each running on its own partition in the separation kernel. Client 0
writes data to communicate in the memory segment Input 0 that is shared between
Client 0 and the mailbox, then it sets the Input 0 Ready flag. The mailbox process
polls its ready flags; when it finds that, e.g., Input 0 Ready is set and Output 1
Ready is cleared (indicating that Client 1 has already consumed data deposited in
the Output 1 slot in a previous communication), then it copies the data from Input 0
to Output 1 and clears Input 0 Ready and sets Output 1 Ready. The communication
from Client 1 to Client 0 follows a symmetric set of steps. The actions to be taken in
each execution frame are encoded in SPARK Ada by the MACHINE STEP procedure
shown in Fig. 1.
Figure 2a shows SPARK Ada annotations for the MACHINE STEP procedure,
whose information flow properties are captured by derives annotations. SPARK re-
quires that each parameter and each global variable referenced by the procedure be
classified as in (read only), out (written and initial values [values at the point of pro-
cedure call] are unread), or in out (written and initial values read). For a procedure
P , variables annotated as in or in out are called input variables and denoted as
INP , while variables annotated as out or in out are output variables and denoted
as OUTP . Each output variable xo must have a derives annotation indicating the
input variables whose initial values are used to directly or indirectly calculate the
final value of xo . One can also think of each derives clause as expressing a depen-
dence relation (or program slice) between an output variable and the input variables
that it transitively depends on (via both data and control dependence).
For example, the second derives clause specifies that on each MACHINE STEP
execution the output value of OUT 1 DAT is possibly determined by the input val-
ues of several variables: from IN 0 DAT when the Mailbox forwards data supplied
by Client 0, from OUT 1 DAT when the conditions on the ready flags are not sat-
isfied (OUT 1 DAT’s output value then is its input value), and from OUT 1 RDY and
IN 0 RDY because these variables control whether or not data flows from Client 0 on
a particular machine step (i.e., they guard the flow).
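One way to see what such a derives clause claims is to test it by brute force against a model of the step function; the Haskell sketch below models a boiled-down mailbox state (field and function names are ours) and checks, in the style of noninterference, that OUT 1 DAT's final value is determined entirely by the four variables listed in the second clause, even when the remaining state differs.

-- A boiled-down model of the mailbox state (field names are ours, not the SPARK ones).
data MB = MB { in0Rdy, out1Rdy, in1Rdy :: Bool
             , in0Dat, in1Dat, out1Dat :: Int } deriving (Eq, Show)

-- One machine step of the Client 0 -> Client 1 direction, mirroring Fig. 1.
machineStep :: MB -> MB
machineStep m
  | in0Rdy m && not (out1Rdy m) =
      m { out1Dat = in0Dat m, in0Rdy = False, out1Rdy = True }
  | otherwise = m

-- Noninterference-style check: whenever two states agree on the four variables
-- listed in the derives clause for OUT_1_DAT, they agree on OUT_1_DAT afterwards,
-- even if they differ on IN_1_RDY or IN_1_DAT.
derivesClauseHolds :: Bool
derivesClauseHolds =
  and [ out1Dat (machineStep s1) == out1Dat (machineStep s2)
      | s1 <- allStates, s2 <- allStates
      , in0Dat s1 == in0Dat s2, out1Dat s1 == out1Dat s2
      , in0Rdy s1 == in0Rdy s2, out1Rdy s1 == out1Rdy s2 ]
  where
    bools = [False, True]
    dats  = [0, 1]
    allStates = [ MB a b c d e f | a <- bools, b <- bools, c <- bools
                                 , d <- dats, e <- dats, f <- dats ]

main :: IO ()
main = print derivesClauseHolds   -- True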
(a) SPARK information flow contract:
  --# global in out IN_0_RDY, IN_1_RDY,
  --#               OUT_0_RDY, OUT_1_RDY,
  --#               OUT_0_DAT, OUT_1_DAT;
  --#        in     IN_0_DAT, IN_1_DAT;
  --# derives
  --#   OUT_0_DAT from IN_1_DAT, OUT_0_DAT, OUT_0_RDY, IN_1_RDY &
  --#   OUT_1_DAT from IN_0_DAT, OUT_1_DAT, IN_0_RDY, OUT_1_RDY &
  --#   IN_0_RDY  from IN_0_RDY, OUT_1_RDY &
  --#   IN_1_RDY  from IN_1_RDY, OUT_0_RDY &
  --#   OUT_0_RDY from OUT_0_RDY, IN_1_RDY &
  --#   OUT_1_RDY from OUT_1_RDY, IN_0_RDY;

(b) Fragment with the proposed conditional extensions:
  --# derives
  --#   OUT_0_DAT from
  --#     IN_1_DAT  when (IN_1_RDY and not OUT_0_RDY),
  --#     OUT_0_DAT when (not IN_1_RDY or OUT_0_RDY),
  --#     OUT_0_RDY, IN_1_RDY &
  --#   OUT_1_DAT from
  --#     IN_0_DAT  when (IN_0_RDY and not OUT_1_RDY),
  --#     OUT_1_DAT when (not IN_0_RDY or OUT_1_RDY),
  --#     OUT_1_RDY, IN_0_RDY

Fig. 2 (a) SPARK information flow contract for Mailbox example. (b) Fragment of same example
with proposed conditional information flow extensions
While upper levels of the MILS architecture require reasoning about lattices of
security levels (e.g., unclassified, secret, top secret), the policies of infrastructure
components such as separation kernels and guard applications usually focus on data
separation policies (reasoning about flows between components of program state),
and we restrict ourselves to such reasoning in this paper.
No other commercial language framework provides automatically checkable
information flow specifications, so the use of the information flow checking frame-
work in SPARK is a significant step forward. As illustrated above, SPARK derives
clauses can be used to specify flows of information from input variables to output
variables, but they do not have enough expressive power to state that information
only flows under specific conditions. For example, in the Mailbox code, informa-
tion from IN 0 DAT only flows to OUT 1 DAT when the flag IN 0 RDY is set and
OUT 1 RDY is cleared; otherwise OUT 1 DAT remains unchanged. In other words,
the flags IN 0 RDY and OUT 1 RDY guard the flow of information through the mail-
box. Unfortunately, SPARK derives clauses cannot distinguish the flag variables as
guards nor express the conditions under which the guards allow information to pass
or be blocked. This means that guarding logic, which is central to many MLS ap-
plications including those developed at Rockwell Collins, is completely absent from
the checkable specifications in SPARK.
In general, the inability to express conditional information flow not
only inhibits automatic verification of guarding logic specifications, but also re-
sults in imprecision which cascades and builds throughout the specifications in
the application.
The SPARK subset of Ada is designed for programming and verifying high assur-
ance applications such as avionics applications certified to DO-178B Level A. It
deliberately omits constructs that are difficult to reason about such as dynamically
created data, pointers, and exceptions. In Fig. 3, we present the syntax of a simple
imperative language with assertions that one can consider to be an idealized ver-
sion of SPARK. We omit some features of SPARK that do not present conceptual
challenges, such as records, and the package and inheritance structure.
Referring to Fig. 3, we consider three kinds of expressions (E ∈ Exp): arithmetic
(A ∈ AExp), boolean (B ∈ BExp), and array expressions (H ∈ HExp). We use
x, y to range over scalar variables, h to range over array variables, and w, z to range
over both kinds of variables; actual variables appearing in programs are depicted
using typewriter font. We also use c to range over integer constants, p to range
over named (parameterless) procedures, op to range over arithmetic operators in
{+, −, mod, ...}, and bop to range over comparison operators in {=, <, ...}.
The use of programmer assertions is optional, but often helps to improve the
precision of our analysis. For example, a loop while B do S od, which is known
to have invariant φ, may be transformed into while B do assert(φ ∧ B); S od;
Fig. 3 (syntax):
  Expressions:
    arithmetic   A ::= x | c | A op A
    boolean      B ::= A bop A
  Assertions:
    φ ::= B | φ ∧ φ | φ ∨ φ | ¬ φ
  Commands:
    S ::= skip
        | x := A                 assignment
        | S ; S                  sequential composition
        | assert(φ)              programmer assertion
        | call p                 procedure call
        | if B then S else S     conditional
        | while B do S od        iteration
3.1 Semantics
The semantics of an arithmetic expression [[A]] is a function from stores into values,
where a value (v ∈ Val) is an integer n and where a store s ∈ Store maps variables to
values; we write dom(s) for the domain of s and write [s | x ↦ v] for the store that is
like s except that it maps x to v. Similarly, [[B]]s denotes a boolean.
Figure 4 summarizes the semantics of commands. A command transforms the
store into another store; hence its semantics is given in relational style, in the form
s [[S]] s'. For some S and s, there may not exist any s' such that s [[S]] s'; this can
Fig. 4 Semantics of commands:
  s [[skip]] s'                    iff  s' = s
  s [[x := A]] s'                  iff  ∃ v: v = [[A]]s and s' = [s | x ↦ v]
  s [[S1 ; S2]] s'                 iff  ∃ s'': s [[S1]] s'' and s'' [[S2]] s'
  s [[assert(φ)]] s'               iff  s ⊨ φ and s' = s
  s [[call p]] s'                  iff  s P(p) s'
  s [[if B then S1 else S2]] s'    iff  ([[B]]s = True and s [[S1]] s')
                                        or ([[B]]s = False and s [[S2]] s')
  s [[while B do S od]] s'         iff  ∃ i ≥ 0: s f_i s', where f_i is inductively defined by:
                                          s f_0 s'      iff  [[B]]s = False and s' = s
                                          s f_{i+1} s'  iff  ∃ s'': [[B]]s = True and
                                                              s [[S]] s'' and s'' f_i s'
happen if a while loop does not terminate, or an assert fails. We assume an implicit
global procedure environment P that for each p returns a relation between input
and output stores.
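The relational semantics of Fig. 4 can be transcribed almost literally into Haskell as a sanity check; the sketch below (constructor names, the Store representation, and the omission of procedure calls and array expressions are our own simplifications) returns the list of possible output stores, which is empty when an assert fails, and which may diverge for a non-terminating loop, mirroring the partiality of the relation.

import qualified Data.Map as M
import           Data.Map (Map)

type Var   = String
type Store = Map Var Int

data AExp = V Var | C Int | Add AExp AExp | Mod AExp AExp
data BExp = Eq AExp AExp | Lt AExp AExp
data Phi  = B BExp | And Phi Phi | Or Phi Phi | Not Phi
data Stmt = Skip | Assign Var AExp | Seq Stmt Stmt | Assert Phi
          | If BExp Stmt Stmt | While BExp Stmt

aval :: AExp -> Store -> Int
aval (V x)     s = M.findWithDefault 0 x s
aval (C n)     _ = n
aval (Add a b) s = aval a s + aval b s
aval (Mod a b) s = aval a s `mod` aval b s

bval :: BExp -> Store -> Bool
bval (Eq a b) s = aval a s == aval b s
bval (Lt a b) s = aval a s <  aval b s

holds :: Phi -> Store -> Bool
holds (B b)     s = bval b s
holds (And p q) s = holds p s && holds q s
holds (Or p q)  s = holds p s || holds q s
holds (Not p)   s = not (holds p s)

-- Relational semantics: the (possibly empty) list of output stores.
-- A failing assert yields no result; a diverging loop yields none either.
exec :: Stmt -> Store -> [Store]
exec Skip           s = [s]
exec (Assign x a)   s = [M.insert x (aval a s) s]
exec (Seq s1 s2)    s = concatMap (exec s2) (exec s1 s)
exec (Assert p)     s = [s | holds p s]
exec (If b s1 s2)   s = if bval b s then exec s1 s else exec s2 s
exec w@(While b s1) s = if bval b s then concatMap (exec w) (exec s1 s) else [s]

main :: IO ()
main = do
  let prog = Seq (Assign "r" (C 0))
                 (While (Lt (V "i") (C 3))
                        (Seq (Assign "r" (Add (V "r") (V "i")))
                             (Assign "i" (Add (V "i") (C 1)))))
  print (exec prog (M.fromList [("i", 0)]))   -- r = 0+1+2 = 3, i = 3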
Assertions are also called 1-assertions since they represent predicates on a
single program state; we write s ⊨ φ to denote that φ holds in s following the
standard semantics. We write φ ⊨₁ φ' if whenever s ⊨ φ also s ⊨ φ'. As usual we
define φ1 → φ2 as ¬φ1 ∨ φ2; we also define true as 0 = 0 and false as 0 = 1.
MILS seeks to prevent security breaches that can occur via unauthorized/unintended
information flow from one partition to another; thus previous certification efforts for
MILS components have among the core requirements included the classical prop-
erty of non-interference [17] which (in this setting) states: for every pair of runs of
a program, if the runs agree on the initial values of one partition’s data (but may
disagree on the data of other partitions) then the runs also agree on the final values
of that partition’s data.
The logic developed in [1] was designed to verify specifications of the following
form: given two runs of P that initially agree on variables x1, ..., xn, the runs agree
on variables y1, ..., ym at the end of the runs. This includes noninterference as a
special case.
To capture conditional information flow, recent work [3] by Banerjee and the first
author introduced conditional agreement assertions, also called 2-assertions. They
are of the form φ ⇒ E⋈, which is satisfied by a pair of stores if either at least one
of them does not satisfy φ, or they agree on the value of E:
  s & s1 ⊨ φ ⇒ E⋈   iff   whenever s ⊨ φ and s1 ⊨ φ then [[E]]s = [[E]]s1.
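Checking a 2-assertion on a concrete pair of stores is a one-liner; in the Haskell sketch below the antecedent and the expression are given semantically as functions on stores, and the names Assert2 and holds2, as well as the mailbox-flavoured example, are our own.

import qualified Data.Map as M
import           Data.Map (Map)

type Store = Map String Int

-- A conditional agreement assertion phi ==> E⋈, with the antecedent and the
-- expression given semantically as functions on stores.
data Assert2 = Assert2 { ant :: Store -> Bool, expr :: Store -> Int }

-- s & s1 |= phi ==> E⋈  iff  whenever both satisfy phi they agree on E.
holds2 :: Assert2 -> Store -> Store -> Bool
holds2 a s s1 = not (ant a s && ant a s1) || expr a s == expr a s1

main :: IO ()
main = do
  let var x st = M.findWithDefault 0 x st
      -- (in0Rdy = 1 and out1Rdy = 0) ==> in0Dat ⋈
      guardFlow = Assert2 (\st -> var "in0Rdy" st == 1 && var "out1Rdy" st == 0)
                          (var "in0Dat")
      s  = M.fromList [("in0Rdy", 1), ("out1Rdy", 0), ("in0Dat", 7)]
      s1 = M.fromList [("in0Rdy", 1), ("out1Rdy", 0), ("in0Dat", 8)]
      s2 = M.fromList [("in0Rdy", 0), ("out1Rdy", 0), ("in0Dat", 8)]
  print (holds2 guardFlow s s1)  -- False: both satisfy the guard but disagree on in0Dat
  print (holds2 guardFlow s s2)  -- True: s2 does not satisfy the guard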
Fig. 5 A derivation for the mailbox example, illustrating the handling of conditionals
agreement assertions z⋈_c indexed by a channel identifier c – which one can usually
associate with a particular output variable. In the multichannel logic, the confused
triple above can now be correctly stated as {w⋈_z, y⋈_x} S {z⋈_z, x⋈_x}. (Alterna-
tively, we could have two single-channel triples: {y⋈} S {x⋈} and {w⋈} S {z⋈}.)
The algorithm to be given in Sect. 5 extends to the multichannel version of the logic
in a straightforward manner; hence the implementation described subsequently sup-
ports the multichannel version of the logic. For notational simplicity, we continue
the discussion of the semantics of contracts using the single-channel version of
the logic.
We now give a more convenient notation for triples of the form {Θ} P {Θ'}. This
will provide a formal interpretation for method contracts that capture conditions
of flows from beginning to end of a method P. A flow judgment is of the form
Θ ⇝ Θ', with Θ the precondition and Θ' the postcondition. We say that Θ ⇝ Θ'
is valid for command S, written S ⊨ Θ ⇝ Θ', if whenever s1 & s2 ⊨ Θ and s1 [[S]] s1'
and s2 [[S]] s2' then also s1' & s2' ⊨ Θ' (if the 2-assertions in the precondition hold for
input states s1 and s2, the postcondition must also hold for the associated output states
s1' and s2').
The logic of the preceding section is potentially much more powerful than what
we actually want to expose to developers – instead, we view it as a “core calculus”
in which information flow reasoning is expressed. Our design goals that determine
how much of the power of the logic we wish to expose to developers in enhanced
SPARK conditional information flow contracts are (1) the effort required to write
the contracts should be as simple as possible, (2) the contracts should be able to
capture common idioms of MILS information guarding, (3) the contract checking
framework should be compositional so as to support MILS goals, and (4) there
should be a natural progression (e.g., via formal refinements) from unconditional
derives statements to conditional statements.
The agreement assertions from the logic of Sect. 3 have the form φ ⇒ E⋈. Here
E is an arbitrary expression (not necessarily a variable), whereas SPARK derives
statements are phrased in terms of IN/OUT variables only. We believe that includ-
ing arbitrary expressions in SPARK conditional derives statements would add
significant complexity for developers, and our experimental studies have shown
that little increase in precision would be gained by such an approach. Instead, we
retain the use of expression-based assertions φ ⇒ E⋈ only during intermediate
(automated) steps of the analysis. Appealing to Fact 2, we have a canonical way
{Θ} (R) ⇐ if B then S1 else S2 {Θ'}
iff {Θ1} (R1) ⇐ S1 {Θ'}, {Θ2} (R2) ⇐ S2 {Θ'}, R = R1′ ∪ R2′ ∪ R′′ ∪ R0, and Θ = dom(R),
where R1′ = {((φ1 ∧ B) ⇒ E1⋈, m, θ') | θ' ∈ Θ'_m, (φ1 ⇒ E1⋈, _, θ') ∈ R1}
and   R2′ = {((φ2 ∧ ¬B) ⇒ E2⋈, m, θ') | θ' ∈ Θ'_m, (φ2 ⇒ E2⋈, _, θ') ∈ R2}
and   R′′ = {(((φ1 ∧ B) ∨ (φ2 ∧ ¬B)) ⇒ B⋈, m, θ')
              | θ' ∈ Θ'_m, (φ1 ⇒ E1⋈, _, θ') ∈ R1, (φ2 ⇒ E2⋈, _, θ') ∈ R2}
and   R0  = {(((φ1 ∧ B) ∨ (φ2 ∧ ¬B)) ⇒ E⋈, u, θ')
              | θ' ∈ Θ'_u, (φ1 ⇒ E⋈, u, θ') ∈ R1, (φ2 ⇒ E⋈, u, θ') ∈ R2}
and   Θ'_m = {θ' ∈ Θ' | ∃ (_, m, θ') ∈ R1 ∪ R2}  and  Θ'_u = Θ' \ Θ'_m
Note that Theorem 1 is termination insensitive; this is not surprising given our
choice of a relational semantics (but see [2] for a logic-based approach that is termi-
nation sensitive). Also note that correctness is phrased directly wrt the underlying
semantics, unlike [1, 4] which first establish the semantic soundness of a logic and
next provide a sound implementation of that logic. Theorem 1 is proved in the tech-
nical report accompanying this paper [5], much as the corresponding result [3] (that
handled a language with heap manipulation but without procedure calls and without
automatic computation of loop invariants), by establishing some auxiliary properties
that have largely determined the design of Pre. The first such property is a variant
of the “*-property” by Bell and La Padula [11], also called “write confinement” [7],
which is used to preclude, e.g., “low writes under high guards.” In our setting, it
captures the role of the u tag and reads as follows:
Lemma 3. Assume {Θ} (R) ⇐ S {Θ'}. Then dom(R) = Θ and ran(R) = Θ'.
Given θ' ∈ Θ', there exists at most one θ such that (θ, u, θ') ∈ R. If there exists
such θ, then con(θ) = con(θ'), and with E = con(θ) we have that if s [[S]] s' then s
agrees with s' on fv(E).
Lemma 3, proved in [5], is needed in the proof of Theorem 1 to handle the case
where the two runs in question follow different branches in a conditional, as we
must then ensure that neither run modifies a variable on which we want the two runs
to agree afterward. We shall also use a lemma, proved in [5], which expresses that
there will always be one applicable condition in the precondition:
Lemma 4. Assume {Θ} (R) ⇐ S {Θ'}. Given θ' ∈ Θ', there exists (θ, _, θ') ∈ R
such that whenever s [[S]] s' and s' ⊨ ant(θ') then s ⊨ ant(θ).
We now explain the various clauses of Pre in Fig. 6, where the clause for skip is
trivial. For an assignment x := A, each 2-assertion φ ⇒ E⋈ in Θ' produces exactly
one 2-assertion in Θ, given by substituting A for x (as in standard Hoare logic) in φ
as well as in E; the connection is tagged m when x occurs in E. For example, if
S is x := w then R might contain the triplets (y > 4 ⇒ w⋈, m, y > 4 ⇒ x⋈) and
(w > 3 ⇒ z⋈, u, x > 3 ⇒ z⋈).
The rule for S1 I S2 works backward, first computing S2 ’s precondition which
is then used to compute S1 ’s; the tags express that a consequent is modified iff
it has been modified in either S1 or S2 . The rule for assert allows us to weaken
2-assertions, by strengthening their antecedents; this is sound since execution will
abort from stores not satisfying the new antecedents.
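The assignment clause is easy to prototype; in the Haskell sketch below, assertions are kept as syntax so that substitution can be performed, and the tiny expression type together with the names Tag, subst and preAssign are our own rather than part of the implemented tool.

data Exp = Var String | Lit Int | Add Exp Exp | Gt Exp Exp deriving (Eq, Show)
-- A 2-assertion: antecedent ==> consequent-agreement (phi ==> E ⋈).
data A2  = Exp :==>: Exp deriving (Eq, Show)
data Tag = M | U deriving (Eq, Show)   -- consequent modified / unmodified

-- Substitute expression a for variable x.
subst :: String -> Exp -> Exp -> Exp
subst x a e = case e of
  Var y   -> if y == x then a else e
  Lit _   -> e
  Add l r -> Add (subst x a l) (subst x a r)
  Gt  l r -> Gt  (subst x a l) (subst x a r)

fv :: Exp -> [String]
fv (Var y)   = [y]
fv (Lit _)   = []
fv (Add l r) = fv l ++ fv r
fv (Gt l r)  = fv l ++ fv r

-- Pre for "x := a": each postcondition assertion produces one precondition
-- assertion by substitution, tagged M exactly when x occurs in the consequent.
preAssign :: String -> Exp -> [A2] -> [(A2, Tag, A2)]
preAssign x a post =
  [ ( subst x a phi :==>: subst x a e
    , if x `elem` fv e then M else U
    , theta' )
  | theta'@(phi :==>: e) <- post ]

main :: IO ()
main =
  -- S is x := w; postconditions  y > 4 ==> x ⋈  and  x > 3 ==> z ⋈.
  mapM_ print (preAssign "x" (Var "w")
                 [ Gt (Var "y") (Lit 4) :==>: Var "x"
                 , Gt (Var "x") (Lit 3) :==>: Var "z" ])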
To illustrate and motivate the rule for conditionals, we shall use Fig. 5 where,
given postcondition OUT 0 DAT⋈, the then branch generates (as the domain of R1)
precondition IN 1 DAT⋈, which by R1′ contributes the first conditional assertion
of the overall precondition. The skip command in the implicit else branch gen-
erates (as the domain of R2) precondition OUT 0 DAT⋈, which by R2′ contributes
the second conditional assertion of the overall precondition. We must also capture
that two runs, in order to agree on OUT 0 DAT after the conditional, must agree
on the value of the test B; this is done by R′′ which generates the precondition
(true ∧ B) ∨ (true ∧ ¬B) ⇒ B⋈; optimizations (not shown) in our algorithm sim-
plify this to B⋈ and then use Fact 2 to split out the variables in the conjuncts of
B into the two unconditional assertions of the overall precondition. Finally, assume
the postcondition contained an assertion φ ⇒ E⋈, where E is not modified by ei-
ther branch: if φ is also not modified then φ ⇒ E⋈ belongs to both R1 and R2,
and hence by R0 also to the overall precondition; if φ is modified by one or both
branches, R0 generates a more complex antecedent for E⋈.
rm⁺_X(B) = true   if fv(B) ∩ X ≠ ∅          rm⁻_X(B) = false  if fv(B) ∩ X ≠ ∅
rm⁺_X(B) = B      if fv(B) ∩ X = ∅          rm⁻_X(B) = B      if fv(B) ∩ X = ∅
rm⁺_X(φ1 ∧ φ2) = rm⁺_X(φ1) ∧ rm⁺_X(φ2)      rm⁻_X(φ1 ∧ φ2) = rm⁻_X(φ1) ∧ rm⁻_X(φ2)
rm⁺_X(φ1 ∨ φ2) = rm⁺_X(φ1) ∨ rm⁺_X(φ2)      rm⁻_X(φ1 ∨ φ2) = rm⁻_X(φ1) ∨ rm⁻_X(φ2)
rm⁺_X(¬φ0) = ¬ rm⁻_X(φ0)                    rm⁻_X(¬φ0) = ¬ rm⁺_X(φ0)
Equipped with rm⁺, we can now define the analysis of procedure calls, as
done in Fig. 6 and illustrated in Fig. 7. Here Ru deals with assertions (such as
x > 5 ∧ z > 7 ⇒ v⋈ in the example) whose consequent has not been modified
by the procedure call (its "frame conditions", determined by the OUT declaration).
For an assertion whose consequent E has been modified (such as x > 7 ∧ z > 5 ⇒
(x + u)⋈), we must ensure that the variables of E agree after the procedure call
(when the antecedent holds). For those not in OUTp (such as u), this is done by
R0 (which expresses some "semiframe conditions"); for those in OUTp (such as x),
this is done by Rm, which utilizes the procedure summary (contract) of the called
procedure.
For while loops (the only iterative construct), the idea is to consider assertions of
the form φ_x ⇒ x⋈ and then repeatedly analyze the loop body so as to iteratively
weaken the antecedents until a fixed point is reached. To illustrate the overall behav-
ior, consider the example in Fig. 8 where we are given r⋈ as postcondition; hence
the initial value of r's antecedent is true, whereas all other antecedents are initial-
ized to false. The first iteration updates v's antecedent to odd(i), since v is used to
compute r when i is odd, and also updates i's antecedent to true, since (the parity
of) i is used to decide whether r is updated or not. The second iteration updates x's
antecedent to ¬odd(i), since in order for two runs to agree on v when i is odd, they
must have agreed on x in the previous iteration when i was even. The third iteration
Fig. 8 Iterative analysis of a while loop. (We use odd(i) as a shorthand for i mod 2 = 1)
updates x’s antecedent to true, since in order for two runs to agree on x when i is
even, they must agree on x always (as x does not change). We have now reached a
fixed point. It is noteworthy that even though the postcondition mentions r⋈, and r
is updated using v which in turn is updated using h, the generated precondition does
not mention h, since the parity of i was exploited. This shows [3] that even if we
should only aim at producing contracts where all assertions are unconditional, pre-
cision may still be improved if the analysis engine makes internal use of conditional
assertions.
In the general case, however, fixed point iteration may not terminate. To ensure
termination, we need a "widening operator" ∇ on 1-assertions, with the following
properties:
(a) For all φ and ψ, φ logically implies φ ∇ ψ and also ψ logically implies φ ∇ ψ.
(b) If for all i we have that φ_{i+1} is of the form ψ ∇ φ_i, then the chain {φ_i | i ≥ 0}
    eventually stabilizes.
A trivial widening operator is the one that always returns true, in effect converting
conditional agreement assertions into unconditional ones. A less trivial option will utilize
a number of assertions, say ψ1, ..., ψk, and allow φ ∇ φ' = ψj if ψj is logically
implied by φ as well as by φ'; such assertions may be given by the user if he has a
hint that a suitable invariant may have one of ψ1, ..., ψk as antecedent.
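The trivial widening, the candidate-based variant, and the resulting fixed-point loop can be prototyped in a few lines; in the Haskell sketch below the candidate antecedents are a small enumerated type with a hand-written implication check, so the names (Ant, widen, iterateToFix) and the toy "loop-body analysis" are our own simplifications of the idea rather than the implemented algorithm.

-- Candidate antecedents plus false/true; widening always jumps to one of these,
-- so every widened chain stabilizes (property (b) above).
data Ant = FalseA | OddI | NotOddI | TrueA deriving (Eq, Show)

-- A (deliberately coarse) implication check on the candidates.
implies :: Ant -> Ant -> Bool
implies FalseA _     = True
implies _      TrueA = True
implies a      b     = a == b

-- widen a b: the strongest candidate implied by both arguments (property (a));
-- TrueA is always a fallback, which by itself is the trivial widening.
widen :: Ant -> Ant -> Ant
widen a b = head [ c | c <- [FalseA, OddI, NotOddI, TrueA], implies a c, implies b c ]

-- Iterate a "weaken the antecedent once more" step, widening each round,
-- until the antecedent stabilizes.
iterateToFix :: (Ant -> Ant) -> Ant -> Ant
iterateToFix step phi
  | phi' == phi = phi
  | otherwise   = iterateToFix step phi'
  where phi' = widen phi (step phi)

main :: IO ()
main = do
  -- A toy loop-body analysis that first demands odd(i), then also not odd(i).
  let step FalseA = OddI
      step OddI   = NotOddI
      step p      = p
  print (iterateToFix step FalseA)   -- TrueA: the chain stabilizes after widening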
We can now explain the various lines in the clause for while loops in Fig. 6. The
iteration starts with antecedents φ_w^0 that are computed such that the correspond-
ing 2-assertions φ_w^0 ⇒ w⋈ imply the postcondition Θ'. The ith iteration updates the
antecedents φ_w^i into antecedents φ_w^{i+1} that are potentially weaker, in that for each
w ∈ X, each disjunct of ψ_w^i must imply φ_w^{i+1}; here ψ_w^i captures the "business logic"
of the while loop:
1. If the precondition computed for the iteration contains an assertion φ ⇒ E⋈
   with w ∈ fv(E), then φ is an element of ψ_w^i.
2. If a consequent has been modified by the loop body, then the antecedent must
   belong to ψ_w^i for all w ∈ fv(B).
Here (2) ensures that if one run stays in the loop and updates a variable on which
the two runs must agree, then also the other run stays in the loop (similar to the
role of R′′ in the clause for conditionals), whereas (1) caters for soundness when
both runs stay in the loop, cf. the role of R1′ and R2′ in the case for conditionals.
Alternatively, to more closely follow the rule for conditionals, for (1) we could
instead demand that φ ∧ B belongs to ψ_w^i; our current choice reflects that we expect
the bodies of while loops to be prefixed by assert statements (which will automati-
cally add B to the antecedents), but do not expect such transformations for branches
of a conditional.
With the iteration stabilizing after j steps (thanks to the widening operator), the
while loop’s precondition and its R component can now be computed; the former
is given as the domain of the latter which is made up from two parts:
First, Ru deals with those assertions in Θ' whose consequents have not been
modified (a kind of "frame condition" for the while loop); each such asser-
tion is connected to an assertion with the same consequent (so as to establish
Lemma 3) but with an antecedent that is designed to be so weak that we can
establish Lemma 4.
Next, Rm deals with those assertions in Θ' whose consequents have been
modified; each such assertion is connected to all other assertions in Θ_j so as
to express that the subsequent iterations of the while loop may give rise to chains
of variable dependences. (It would be possible to give a definition that in most
cases produces only a subset of those connections, but this would increase the
conceptual complexity of Pre without – we conjecture – any improvement in
the overall precision of the algorithm.) In addition, again to establish Lemma 4,
we introduce a trivial assertion true ⇒ 0⋈.
6 Evaluation
We focus the experimental studies of this section on the more challenging problem
of automatically inferring contracts starting from code with no existing derives
annotations.
For each procedure P, with OUTP = {w1, ..., wk}, the algorithm analyzes the
body wrt a postcondition w1⋈_1, ..., wk⋈_k. Since SPARK disallows recursion, we
simply move in a bottom-up fashion through the call-graph – guaranteeing that a
contract exists for each called procedure. When deployed in actual development,
one would probably allow developers to tweak the generated contracts (e.g., by
removing unnecessary conditions for establishing end-to-end policies) before pro-
ceeding with contract inference for methods in the next level of the call hierarchy.
However, in our experiments, we used autogenerated contracts for called methods
without modification. All experiments were run under JDK 1.6 on a 2.2-GHz Intel
Core2 Duo.
Embedded security devices are the initial target domain for our work, and the
security-critical sections to be certified from these code bases are often relatively
small, e.g., roughly 1,000 LOC for the guard partition of the Rockwell Collins high
assurance guard mentioned earlier and 3,000 LOC for the (undisclosed) device re-
cently certified by Naval Research Labs researchers [19]. For our evaluation, we
consider a collection of five small to moderate size applications from the SPARK
distribution in addition to an expanded version of the mailbox example of Sect. 2. Of
these, the Autopilot and Missile Control applications are the most realistic. There are
well over 250 procedures in the code bases, but due to space constraints, in Table 1
we list metrics for only the most complex procedures from each application (see
[29] for the source code of all the examples). Columns LOC, C, L, and P report the
number of noncomment lines of code, conditional expressions, loops, and procedure
calls in each method. Our tool can run in two modes. The first mode (identified as
version 1 in Table 1) implements the rules of Fig. 6 directly, with just one small op-
timization: a collection of boolean simplifications are introduced, e.g., simplifying
assertions of the form true ∧ φ ⇒ E⋈ to φ ⇒ E⋈. The second mode (version 2 in
Table 1) enables a collection of simplifications aimed at compacting and eliminating
redundant flows from the generated set of assertions. One simplification performed
is elimination of assertions with false in the antecedent (these are trivially true) and
elimination of duplicate assertions. Also, it eliminates simple entailed assertions,
such as φ ⇒ E⋈ when true ⇒ E⋈ also appears in the assertion set.
Column SF gives the number of flows in the original SPARK contract. Column Flows gives the number of flows generated by different versions
of our algorithm. This number increases over SF as SPARK flows are refined into
conditional flows (often creating two or more conditioned flows for a particular
IN=OUT variable pair). The data shows that the compacting optimizations often
substantially reduce the number of flows; the practical impact of this is to sub-
stantially increase the readability/tractability of the contracts. Column Cond. Flows
indicates the number of flows from Flows that are conditional. As expected, the
refining power of our approach shows up primarily in procedures with conditionals
(column C), but we also see increases in precision due to conditional contracts
of called procedures (column P). In a few cases, we see a blow-up in the number
of conditional flows. The worst case is MissileGuidance.Transition, which
contains a case statement with each branch containing nested conditionals and pro-
cedure calls with conditional contracts – leading to an exponential explosion in path
conditions. Only a few variables in these conditions lie in what we consider to be
the “control logic” of the system. The tractability of this example would improve
significantly with the methodology suggested earlier in which developers declare
explicitly the guarding variables (such as the xx RDY variables of Fig. 1) and the al-
gorithm then omits tracking of conditional flows not associated with declared guard
variables. Overall, a manual inspection of each inferred contract showed that the
algorithm usually produces conditions that an expert would expect.
As can be seen in the Time columns, the algorithm is quite fast for all the examples,
usually taking a little longer in version 2 (all optimizations on). However, for some
examples, version 2 is actually faster; these are the cases of procedures with calls to
other procedures. Due to the optimizations, the callees now have simpler contracts,
simplifying the processing of the caller procedures.
allow us to answer the important question: does our approach provide the precision
needed to better verify local and end-to-end MILS policies, without generating large
contracts that become unwieldy for developers and certifiers?
In this section, we give a detailed discussion of two case studies: the Mailbox exam-
ple (briefly discussed in Sect. 2) and part of the control code for the Autopilot code
base (for which number figures were given in Table 1). For the Mailbox, we will
discuss in detail the MACHINE STEP procedure, previously introduced, comparing
the results of running our tool with the original SPARK specification. In the Au-
topilot case study, we will discuss four procedures and two functions, spanning an
entire call chain in the package, starting at the Main procedure, and going through
the code that controls the altitude in this simplified example of an aircraft autopilot.
The Mailbox example was discussed in detail in Sect. 2, so we will focus on comparing the
resulting information flow specifications obtained from running our tool on the code
with the original SPARK specification. Figure 9 shows the procedure MACHINE STEP
with the original SPARK information flow specification. Figure 10 shows the in-
formation flow specification obtained by running our tool on the same procedure
(using the slightly modified version of the SPARK language described in Sect. 4.2).
For simplicity, in Fig. 10 we have omitted the body of the procedure, as well as the
global annotations. In addition to using unabbreviated variable names, the code of
Fig. 9 differs from that of Fig. 1 in its use of procedures to manipulate both the con-
trol variables (e.g., Mailbox.CHARACTER_INPUT_0_READY) as well as the data vari-
ables of the system. For example, the procedure NOTIFY_INPUT_0_CONSUMED clears
the Mailbox.CHARACTER_INPUT_0_READY flag, whereas NOTIFY_OUTPUT_1_READY
sets the Mailbox.CHARACTER_OUTPUT_1_READY flag.
Upon close examination of Fig. 10, we can see the usage of the symbol {}. These
empty braces are used to represent flow from a constant value. For example, in the
following information flow declaration from Fig. 10:
derives ... Mailbox.CHARACTER_INPUT_0_READY from
  ... {} when (Mailbox.CHARACTER_INPUT_0_READY
               and not Mailbox.CHARACTER_OUTPUT_1_READY)
indicates that the variable Mailbox.CHARACTER_INPUT_0_READY, in the case when
the condition specified holds, has its postcondition value derived from a constant
instead of another variable. By examining the code in Fig. 9, we can see that this is
the case when Mailbox.CHARACTER_INPUT_0_READY is assigned the literal false.
The results displayed in Fig. 10 show that the information flow specifications for
every variable in this example have been refined with at least one conditional flow.
Now, we wish to determine what benefits are gained from having such a refined
information flow specification; that is, what do we gain from having information
flow specifications split into cases denoted by particular conditions? We must keep
in our mind the objective of our research and engineering effort: we want to build a
foundation for an information assurance specification and verification framework.
From our point of view, an adequate information flow assurance framework
must capture and describe the following information about an information-critical
system:
Admissible channels of information flow. The framework must provide mecha-
nisms to appropriately specify when a flow of information from one part of the
system to another (or from one variable to another) is acceptable. The original
derives annotations from SPARK, and its corresponding checking mechanism,
can already be used for this purpose (although they were not originally intended
to fulfill this functionality).
Enabling conditions for information flow channels. The framework must pro-
vide mechanisms to specify under what conditions a particular information flow
channel is active. In information flow assurance applications, information flow
channels are often controlled by system conditions. However, as it is, SPARK
does not possess any mechanism for specifying under what conditions a
particular information flow channel is active.
In the case of the mailbox example, we have a device intended to serve as a
communication channel between two entities. If we were to try to describe the in-
formation flow policy requirements for the mailbox, we could write something like:
The mailbox will guarantee that information produced by Client 0 will be
forwarded to Client 1, and the information produced by Client 1 will be forwarded
to Client 0.
However, when we look at the information flow specification for Client 0’s
output in Fig. 9, we have:
derives Mailbox.CHARACTER_OUTPUT_0_DATA from Mailbox.CHARACTER_INPUT_1_DATA,
                                             Mailbox.CHARACTER_OUTPUT_0_READY,
                                             Mailbox.CHARACTER_OUTPUT_0_DATA,
                                             Mailbox.CHARACTER_INPUT_1_READY
The output of Client 0 is derived not only from Client 1’s input but also from
three other variables. It is not necessarily obvious where these other dependences
are coming from, and they certainly do not match our first attempt at describing the
mailbox’s behavior. As it turns out, what happens here is that this specification de-
scribes more than one information flow channel, and the conditions on which they
are active, but all this information has been merged into a single annotation. Let us
look at the equivalent annotation from Fig. 10 to see what is going on:
derives Mailbox.CHARACTER_OUTPUT_0_DATA
  from Mailbox.CHARACTER_INPUT_1_DATA
         when (Mailbox.CHARACTER_INPUT_1_READY
               and not Mailbox.CHARACTER_OUTPUT_0_READY),
       Mailbox.CHARACTER_OUTPUT_0_READY,
       Mailbox.CHARACTER_OUTPUT_0_DATA
         when (not (Mailbox.CHARACTER_INPUT_1_READY
                    and not Mailbox.CHARACTER_OUTPUT_0_READY)),
       Mailbox.CHARACTER_INPUT_1_READY
In the original SPARK specification, we cannot tell whether there are several in-
formation channels, or whether the target variable is derived from a combination of the
source variables, because there are no conditions. However, by looking at the spec-
ification produced by our tool, we can see that there are actually two information
flow channels acting on this variable, controlled by two different conditions. We
can also see that the dependence on the extra two variables is produced from control
dependence on the variables that are used to compute the conditions.
It is now clear that there are two information flow channels acting on this vari-
able: (1) when information is available from Client 1 and Client 0 is ready to receive
it, the output read by Client 0 is derived from the input produced by Client 1; and
(2) when there is no input from Client 1, or Client 0 is not ready to receive, the output
read by Client 0 keeps its old value. Which of these two channels is active depends
on the aforementioned conditions, which in turn produce a control dependence on
the variables that keep track of whether Client 1 has produced any information and
whether Client 0 is ready to receive.
After the previous discussion, the benefits of having conditional information flow
specifications are immediately clear. We have a more precise description of the be-
havior of the system and are able to check both aspects of the information assurance
behavior of a system that we described before: the channels of information flow and
the conditions under which those channels are active.
Another improvement that could be made is to differentiate the parts of the
specification that deal with the control logic from those that deal exclusively with
information flow. For example, in the case of the mailbox annotation for output 0,
we get dependences on a couple of extra variables that arise from control depen-
dence. Perhaps one could mark these flows with a special annotation to explicitly
state that they arise from the control logic. Similarly, we can see in Figs. 9 and 10
that, besides those for the output variables, we have flow annotations for each of
the control variables. These annotations are needed because these variables may be
reset by the procedure. However, these modifications of control variables are also
part of the control logic, and perhaps these flows could also be annotated in a spe-
cial way. Furthermore, one could imagine a tool that would use these annotations
to filter views and show all annotations or hide flows corresponding to the control
logic, etc.
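As a purely hypothetical illustration (this marker is supported by neither SPARK nor our current tool), the annotation for Client 0’s output might distinguish the control-induced sources as follows:

derives Mailbox.CHARACTER_OUTPUT_0_DATA
  from Mailbox.CHARACTER_INPUT_1_DATA
         when (Mailbox.CHARACTER_INPUT_1_READY
               and not Mailbox.CHARACTER_OUTPUT_0_READY),
       Mailbox.CHARACTER_OUTPUT_0_DATA
         when (not (Mailbox.CHARACTER_INPUT_1_READY
                    and not Mailbox.CHARACTER_OUTPUT_0_READY)),
       control Mailbox.CHARACTER_INPUT_1_READY,
       control Mailbox.CHARACTER_OUTPUT_0_READY

A filtering view could then hide every source marked control and show only the data channels.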
The Autopilot system is one of the examples included in the SPARK distribution
(discussed in detail in [8, Chapter 14]). It is a control system that manages both the
altitude and heading of an aircraft: the altitude is controlled by manipulating the
elevators, and the heading by manipulating the ailerons and rudder. The
autopilot has a control panel with three switches, each of which has two positions –
on and off.
The master switch – the autopilot is completely inactive if this is off.
The altitude switch – the autopilot controls the altitude if this is on.
The heading switch – the autopilot controls the heading if this is on.
Desired autopilot heading values are entered in a console by the pilot, whereas
desired altitude values are determined by the current altitude (similar to how an
automobile cruise control takes its target speed from the current speed when the
cruise control is activated). For this example, we will take a look at a total of four
procedures and two functions.
The procedure in Fig. 15 is interesting for conditional information flow analysis
for multiple reasons:
It contains nested case statements with a call at the lowest level of nesting to
procedure Pitch.Pitch AP that updates global variables.
The actual updates to global variables occur several levels down the call chain
from Pitch.Pitch AP.
The call chain includes several procedures with conditional flows – some of the
conditions propagate up through the call chain, whereas others do not.
We discuss in detail the conditional information flow along the following call
path:
Main (main.adb): It contains an infinite loop that does nothing but call
AP.Control on each iteration.
AP.Control (ap.adb): It reads values for the three switches above
from the environment. If Master Switch is on, then it uses the values
read for Altitude Switch and Heading Switch to set switch variables
Altitude Selected and Heading Selected , otherwise Altitude Selected
and Heading Selected are set to “off.” Instruments needed to calculate altitude
and heading are read, then Altitude.Maintain (with Altitude Selected as
the actual parameter for Switch Pressed ) and Heading.Maintain are called to
update the autopilot state.
AP.Altitude.Maintain (ap-altitude.adb): If Altitude Switch has
transitioned from off to on, Present Altitude is used as the value for
Target Altitude ; otherwise, Target Altitude keeps its previous value.
Pitch.Pitch AP is called to calculate the value
of Surfaces.Elevators based on the parameter values of Pitch.Pitch AP and
the pitch history.
Pitch.Pitch AP (ap-altitude-pitch.ads): It calls a series of
helper functions which update the local variables Present Pitchrate ,
Target Pitchrate , and Elevator Movement ; these are then used in
Surfaces.Move Elevators to calculate the value of the global output variable
Surfaces.Elevators . The behavior of Surfaces.Move Elevators lies out-
side the SPARK boundary and thus the interface to Surfaces.Move Elevators
represents the leaf of the call tree path under consideration.
We will also consider two functions called from Pitch.Pitch AP:
Altitude.Target Rate and Altitude.Target ROC. This will allow us to
illustrate some interesting aspects of computing information flow specifications
for SPARK functions.
The first procedure we look at is the main procedure. As in most languages,
this is the topmost procedure and the entry point for the whole system.
loop
   AP.Control;
end loop;
end Main;
Fig. 11 Original SPARK specification for procedure main from the autopilot code base
Figure 11 shows the original SPARK specifications and the code, and Fig. 12 shows
the corresponding information flow specifications computed by our tool. The first
thing we note is that there are more derived variables in the annotations generated by
our tool than in the original annotations. This is not a mistake. The reason for this is
that we still have not incorporated SPARK’s “own refinement” abstraction mecha-
nism in our tool. All the variables in the derives annotations from Fig. 12 that start
with AP. are abstracted into the variable AP.State in Fig. 11. As a consequence,
we get more flow specifications because they are refined from those in the original
annotations.
An interesting effect of not having abstraction in our annotations is that
some of the false flows introduced by the abstraction process are not present
in our annotations. For instance, in Fig. 11 one of the annotations suggests that
Surfaces.Ailerons may be derived from Instruments.Altitude . However, as
we can see in Fig. 12, this is not the case; such a flow is absent from the specification.
The reason this false flow appears in the abstracted version is that, when all the AP.-prefixed
variables are merged into the single abstract variable AP.State, a flow into any one of them
becomes a flow into AP.State, so every variable that depends on any part of AP.State
appears to depend on all of the flows into it.
Fig. 12 Results of running tool on procedure main from the autopilot code base
procedure Control
--# global in     Controls.Master_Switch,
--#               Controls.Altitude_Switch,
--#               Controls.Heading_Switch;
--#        in out Altitude.State,
--#               Heading.State;
--#        out    Surfaces.Elevators,
--#               Surfaces.Ailerons,
--#               Surfaces.Rudder;
--#        in     Instruments.Altitude,
--#               Instruments.Bank,
--#               Instruments.Heading,
--#               Instruments.Heading_Bug,
--#               Instruments.Mach,
--#               Instruments.Pitch,
--#               Instruments.Rate_Of_Climb,
--#               Instruments.Slip;
--# derives Altitude.State
--#           from *,
--#                Controls.Master_Switch,
--#                Controls.Altitude_Switch,
--#                Instruments.Altitude,
--#                Instruments.Pitch &
--#         Heading.State
--#           from *,
--#                Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Instruments.Bank,
--#                Instruments.Slip &
--#         Surfaces.Elevators
--#           from Controls.Master_Switch,
--#                Controls.Altitude_Switch,
--#                Altitude.State,
--#                Instruments.Altitude,
--#                Instruments.Mach,
--#                Instruments.Pitch,
--#                Instruments.Rate_Of_Climb &
--#         Surfaces.Ailerons
--#           from Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Heading.State,
--#                Instruments.Bank,
--#                Instruments.Heading,
--#                Instruments.Heading_Bug,
--#                Instruments.Mach &
--#         Surfaces.Rudder
--#           from Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Heading.State,
--#                Instruments.Mach,
--#                Instruments.Slip
--#         ;
is
   Master_Switch, Altitude_Switch, Heading_Switch,
   Altitude_Selected, Heading_Selected : Controls.Switch;
   Present_Altitude : Instruments.Feet;
   Bank             : Instruments.Bankangle;
   Present_Heading  : Instruments.Headdegree;
   Target_Heading   : Instruments.Headdegree;
   Mach             : Instruments.Machnumber;
   Pitch            : Instruments.Pitchangle;
   Rate_Of_Climb    : Instruments.Feetpermin;
   Slip             : Instruments.Slipangle;
begin
   Controls.Read_Master_Switch (Master_Switch);
   Controls.Read_Altitude_Switch (Altitude_Switch);
   Controls.Read_Heading_Switch (Heading_Switch);
   case Master_Switch is
      when Controls.On =>
         Altitude_Selected := Altitude_Switch;
         Heading_Selected  := Heading_Switch;
      when Controls.Off =>
         Altitude_Selected := Controls.Off;
         Heading_Selected  := Controls.Off;
   end case;
   Instruments.Read_Altimeter (Present_Altitude);
   Instruments.Read_Bank_Indicator (Bank);
   Instruments.Read_Compass (Present_Heading);
   Instruments.Read_Heading_Bug (Target_Heading);
   Instruments.Read_Mach_Indicator (Mach);
   Instruments.Read_Pitch_Indicator (Pitch);
   Instruments.Read_VSI (Rate_Of_Climb);
   Instruments.Read_Slip_Indicator (Slip);
   Altitude.Maintain (Altitude_Selected, Present_Altitude, Mach, Rate_Of_Climb, Pitch);
   Heading.Maintain (Heading_Selected, Mach, Present_Heading, Target_Heading, Bank, Slip);
end Control;
Fig. 13 Original specification for procedure AP.Control from the autopilot code base
procedure Control;
--# derives Altitude.Switch_Pressed_Before
--#           from *,
--#                {},
--#                Controls.Master_Switch,
--#                Controls.Altitude_Switch &
--#         Altitude.Pitch.Rate.Pitch_History
--#           from *,
--#                {},
--#                Controls.Master_Switch,
--#                Controls.Altitude_Switch,
--#                Instruments.Pitch &
--#         Altitude.Target_Altitude
--#           from *,
--#                {},
--#                Controls.Master_Switch,
--#                Controls.Altitude_Switch,
--#                Altitude.Switch_Pressed_Before,
--#                Instruments.Altitude &
--#         Heading.Yaw.Rate.Yaw_History
--#           from *,
--#                {},
--#                Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Instruments.Slip &
--#         Heading.Roll.Rate.Roll_History
--#           from *,
--#                {},
--#                Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Instruments.Bank &
--#         Surfaces.Elevators
--#           from {},
--#                Controls.Master_Switch,
--#                Controls.Altitude_Switch,
--#                Altitude.Target_Altitude,
--#                Altitude.Switch_Pressed_Before,
--#                Altitude.Pitch.Rate.Pitch_History,
--#                Instruments.Altitude,
--#                Instruments.Mach,
--#                Instruments.Pitch,
--#                Instruments.Rate_Of_Climb &
--#         Surfaces.Ailerons
--#           from {},
--#                Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Heading.Roll.Rate.Roll_History,
--#                Instruments.Bank,
--#                Instruments.Heading,
--#                Instruments.Heading_Bug,
--#                Instruments.Mach &
--#         Surfaces.Rudder
--#           from {},
--#                Controls.Master_Switch,
--#                Controls.Heading_Switch,
--#                Heading.Yaw.Rate.Yaw_History,
--#                Instruments.Mach,
--#                Instruments.Slip
--#         ;
Fig. 14 Results of running tool on procedure AP.Control from the autopilot code base
However, all these conditions are dropped once the top three procedure calls are
analyzed: the procedures that read the values of the switches (the guard variables).
What happens is that the generated conditions are essentially predicates over
the guard variables (the values of the switches), and since the top three procedure calls
set these switch variables, we have to drop the conditions and turn the annotation
into an unconditional one.
To see this in more detail, let us take a look at what happens at the return
point of the procedure call to Controls.Read Heading Switch. Recall that our
algorithm is a weakest precondition algorithm and, as such, it works bottom-up. So,
when we reach the point right before the call to this procedure, the algorithm has,
among all of the derivations generated, the following flow specification:
derives Surfaces.Ailerons from Instruments.Heading
  when Heading_Switch = Controls.On and Master_Switch = Controls.On
When the calls that read the switch values are processed, the variables Heading Switch
and Master Switch appearing in this condition are themselves assigned, so the condition
can no longer be expressed in terms of the procedure's initial state and has to be dropped.
One way to preserve such conditions would be to split Control into two procedures,
where Read Controls Switches simply performs the top three procedure
calls in Control , and the rest of the functionality is implemented in
Execute Control Logic. Then the conditional information flow specifications of
Control would be exposed in the procedure Execute Control Logic.
Another option would be to exploit other annotations in the code (like postcon-
ditions and/or assertions) to avoid unnecessary generalizations. For example, if the
procedure Controls.Read Heading Switch had the following annotation:
procedure Read_Heading_Switch (Heading_Switch)
--# post: Heading_Switch = Controls.Heading_Switch;
then from this annotation we could determine exactly what the value of
Heading Switch is in the postcondition (Controls.Heading Switch) and perform
a direct substitution in the condition expressions instead of having to drop them.
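Assuming an analogous postcondition for Controls.Read Master Switch, the substitution would let the tool keep the condition, expressed over package state rather than over the local switch variables, yielding a specification along the lines of:

derives Surfaces.Ailerons from Instruments.Heading
  when Controls.Heading_Switch = Controls.On and Controls.Master_Switch = Controls.On

This is only a sketch of the intended effect; the tool does not currently perform this substitution.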
These are all options that we are considering for future versions of the tool.
The next procedure to discuss is Altitude.Maintain , which is called from
AP.Control . This is the first procedure in our study of the Autopilot that has gener-
ated conditional specifications. The original SPARK annotations as well as the code
are displayed in Fig. 15, and the annotations generated by our tool are presented in
Fig. 16. The purpose of this procedure is to maintain the altitude of the airplane de-
pending on the current configuration of the autopilot, so there are quite a few cases
this procedure has to handle, which is why we get several conditional information
flow specifications.
Fig. 15 Original SPARK specification for procedure Altitude.Maintain from the autopilot code
base
Fig. 16 Results of running tool on procedure Altitude.Maintain from the autopilot code base
from ON to ON), unless the system transitions again from OFF to ON. So
Switch Pressed Before is basically used to detect the transitions from OFF to
ON and set the target altitude. A similar analysis applies to the information flow
specifications for Surfaces.Elevators .
Now let us examine procedure Altitude.Pitch.Pitch AP which is called from
Altitude.Maintain . The original SPARK annotations as well as the code can be
seen in Fig. 17, and the results of our tool are presented in Fig. 18. This example
is actually relatively simple and, as can be seen by comparing the figures, the results of
our tool are exactly the same as the original SPARK annotations. As Pitch AP’s
purpose is just to update a set of variables, depending on its input, there is really
no conditional information flow behavior in this procedure. This procedure sets the
procedure Pitch_AP (Present_Altitude : in Instruments.Feet;
                    Target_Altitude  : in Instruments.Feet;
                    Mach             : in Instruments.Machnumber;
                    Climb_Rate       : in Instruments.Feetpermin;
                    The_Pitch        : in Instruments.Pitchangle)
--# global in out Rate.Pitch_History;
--#        out    Surfaces.Elevators;
--# derives Rate.Pitch_History
--#           from *,
--#                The_Pitch &
--#         Surfaces.Elevators
--#           from Rate.Pitch_History,
--#                Present_Altitude,
--#                Target_Altitude,
--#                Mach,
--#                Climb_Rate,
--#                The_Pitch
--#         ;
is
   Present_Pitchrate : Degreespersec;
   Target_Pitchrate  : Degreespersec;
   Elevator_Movement : Surfaces.Controlangle;
begin
   Calc_Pitchrate (The_Pitch, Present_Pitchrate);
   Target_Pitchrate  := Target_Rate (Present_Altitude, Target_Altitude, Climb_Rate);
   Elevator_Movement := Calc_Elevator_Move (Present_Pitchrate, Target_Pitchrate, Mach);
   Surfaces.Move_Elevators (Elevator_Movement);
end Pitch_AP;
Fig. 17 Original SPARK specification for procedure AP.Altitude.Pitch AP from the autopilot code
base
procedure Pitch_AP (Present_Altitude : in Instruments.Feet;
                    Target_Altitude  : in Instruments.Feet;
                    Mach             : in Instruments.Machnumber;
                    Climb_Rate       : in Instruments.Feetpermin;
                    The_Pitch        : in Instruments.Pitchangle)
--# derives Rate.Pitch_History
--#           from *,
--#                The_Pitch &
--#         Surfaces.Elevators
--#           from Rate.Pitch_History,
--#                Present_Altitude,
--#                Target_Altitude,
--#                Mach,
--#                Climb_Rate,
--#                The_Pitch
--#         ;
is
   Present_Pitchrate : Degreespersec;
   Target_Pitchrate  : Degreespersec;
   Elevator_Movement : Surfaces.Controlangle;
begin
   Calc_Pitchrate (The_Pitch, Present_Pitchrate);
   Target_Pitchrate  := Target_Rate (Present_Altitude, Target_Altitude, Climb_Rate);
   Elevator_Movement := Calc_Elevator_Move (Present_Pitchrate, Target_Pitchrate, Mach);
   Surfaces.Move_Elevators (Elevator_Movement);
end Pitch_AP;
Fig. 18 Results of running tool on procedure AP.Altitude.Pitch AP from the autopilot code base
pitch, depending on the values of the present and target altitude. What is interesting
about this procedure is that it calls a SPARK function, which is the one we look at
next.
To conclude, we look at a couple of SPARK functions, which are at the bottom of
this call chain in the Autopilot. The reason we look at these functions is to discuss a
couple of concepts relevant to the computation of annotations for functions,
issues which do not arise in the original SPARK. The functions are Target Rate, which
is called from Pitch AP, and Target ROC, which is called from Target Rate. The
code for these functions and the annotations obtained with our tool are presented in
Figs. 19 and 20, respectively.
Fig. 19 Results of running tool on function AP.Altitude.Target Rate from the autopilot code base
function Target_ROC (Present_Altitude : Instruments.Feet;
                     Target_Altitude  : Instruments.Feet)
   return Floorfpm
--# derives @result
--#           from {}
--#                  when ((Target_Altitude - Present_Altitude) / 10 < Floorfpm'First
--#                        and not ((Target_Altitude - Present_Altitude) / 10 > Floorfpm'Last)),
--#                {}
--#                  when ((Target_Altitude - Present_Altitude) / 10 > Floorfpm'Last),
--#                Target_Altitude,
--#                Present_Altitude
--#         ;
is
   Result : Instruments.Feetpermin;
begin
   Result := Instruments.Feetpermin (Integer (Target_Altitude - Present_Altitude) / 10);
   if Result > Floorfpm'Last then
      Result := Floorfpm'Last;
   elsif Result < Floorfpm'First then
      Result := Floorfpm'First;
   end if;
   return Result;
end Target_ROC;
Fig. 20 Results of running tool on function AP.Altitude.Target ROC from the autopilot code base
The main thing to observe in these functions is the annotations. In SPARK, func-
tions do not have derives annotations. Because functions are not allowed to have
side-effects, the required information is implicit in the declaration of arguments and
return value: the return value simply depends on all parameters and globals declared
in the function. However, in our case, to provide a means to capture conditional
flows, we need to explicitly introduce flow contracts as shown in Figs. 19 and 20.
In order to be able to specify conditional information flow for functions, we need
to be able to talk about the value computed by the function. We define a special
variable @result , which denotes the value returned by the function, and we com-
pute dependences for this special variable. In the case of Target Rate, we do not
have conditional information flow specifications, but in the case of Target ROC, we
do have a couple of conditional flows. However, these two conditional flows are
constant flows and as such they disappear in Target Rate. The main point we want
to present here is the need for information flow specifications for SPARK functions
and the way we implement those in our adaptation of SPARK.
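To make the @result notation concrete, the contract computed for Target Rate has roughly the following shape (a sketch consistent with the description above, in which no conditional flows appear for this function; the parameter types are inferred from the call site in Pitch AP and may differ from the actual declaration):

function Target_Rate (Present_Altitude : Instruments.Feet;
                      Target_Altitude  : Instruments.Feet;
                      Climb_Rate       : Instruments.Feetpermin)
   return Degreespersec
--# derives @result from Present_Altitude,
--#                      Target_Altitude,
--#                      Climb_Rate;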
7 Related Work
The theoretical framework for the SPARK information flow framework is provided
by Bergeretti and Carré [12] who present a compositional method for inferring
and checking dependences [14] among variables. That approach is flow sensitive,
whereas most security type systems [7, 33] are flow insensitive as they rely on as-
signing a security level (“high” or “low”) to each variable. Chapman and Hilton [13]
describe how SPARK information flow contracts could be extended with lattices of
security levels and how the SPARK Examiner could be enhanced to check confor-
mance of flows to particular security levels. Those ideas could be applied directly
to provide security levels of flows in our framework. Rossebo et al. [26] show how
the existing SPARK framework can be applied to verify various unconditional prop-
erties of a MILS Message Router. Apart from SPARK, there exist several tools for
analyzing information flow properties, notably Jif (Java + information flow) which
is based on [23] and Flow Caml [28].
The seminal work on agreement assertions is [1], whose logic is flow sensi-
tive, and comes with an algorithm for computing (weakest) preconditions, but the
approach does not integrate with programmer assertions. To address that, and to
analyze heap-manipulating languages, the logic of [4] employs three kinds of prim-
itive assertions: agreement, programmer, and region (for a simple alias analysis).
But, since those can be combined only through conjunction, programmer assertions
are not smoothly integrated, and it is not possible to capture conditional information
flows. That was what motivated Amtoft and Banerjee [3] to introduce conditional
agreement assertions for a heap-manipulating language. This paper integrates that
approach into the SPARK setting (where the lack of heap objects enables us to omit
the “object flow invariants” of [3]) for practical industrial development, adds inter-
procedural contract-based compositional checking, adds an algorithm for computing
loop invariants (rather than assuming they are provided by the user), and provides
an implementation as well as reports on experiments.
A recently popular approach to information flow analysis is self-composition,
first proposed by Barthe et al. [10] and later extended by, e.g., Terauchi and
Aiken [31] and (for heap-manipulating programs) Naumann [24]. Self-composition
works as follows: for a given program S, a copy S' is created with all variables re-
named (primed); if the observable variables are, say, x and y, then noninterference holds
provided that the sequential composition S; S', when given precondition x = x' ∧ y = y',
also ensures postcondition x = x' ∧ y = y'. This is a property that can be checked
using existing verifiers like BLAST [20], Spec# [9], or ESC/Java2 [15]. Darvas
et al. [16] use the KeY tool for interactive verification of noninterference; informa-
tion flow is modeled by a dynamic logic formula, rather than by assertions as in
self-composition.
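Written as a Hoare triple over the composed program (a standard way of presenting self-composition, not tied to any particular verifier), the check is:

\[
\{\, x = x' \wedge y = y' \,\}\;\; S ;\, S' \;\;\{\, x = x' \wedge y = y' \,\}
\]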
When it comes to conditional information flow, the most noteworthy existing
tool is the slicer by Snelting et al. [30] which generates path conditions in pro-
gram dependence graphs for reasoning about end-to-end flows between specified
program points/variables. In contrast, we provide a contract-based approach for
compositional reasoning about conditions on flows with an underlying logic rep-
resentation that can provide external evidence for conformance to conditional flow
properties. We have recently received the implementation of the approach in [30],
and we are currently investigating the deeper technical connections between the two
approaches.
Finally, we have already noted how our work has been inspired by and aims to
complement previous ground-breaking efforts in certification of MILS infrastruc-
ture [18, 19]. While the direct theorem-proving approach followed in these efforts
enables proofs of very strong properties beyond what our framework can currently
handle, our aim is to dramatically reduce the labor required, and the potential for
error, by integrating automated techniques directly on code, models, and developer
workflows to allow many information flow verification obligations to be discharged
earlier in the life cycle.
8 Conclusion
Acknowledgments This work was supported in part by the US National Science Foundation
(NSF) awards 0454348, 0429141, and CAREER award 0644288, the US Air Force Office of Scien-
tific Research (AFOSR), and Rockwell Collins. The authors gratefully acknowledge the assistance
of Rod Chapman and Trevor Jennings of Praxis High Integrity Systems in obtaining SPARK
examples and running the SPARK tools. The material in this chapter originally appeared in the
Proceedings of FM’08, LNCS 5014.
References
1. Amtoft T, Banerjee A (2004) Information flow analysis in logical form. In: 11th static analysis
symposium (SAS), LNCS, vol 3148. Springer, Berlin, pp 100–115
2. Amtoft T, Banerjee A (2007a) A logic for information flow analysis with an application to
forward slicing of simple imperative programs. Sci Comp Prog 64(1):3–28
3. Amtoft T, Banerjee A (2007b) Verification condition generation for conditional information
flow. In: 5th ACM workshop on formal methods in security engineering (FMSE), a long
version, with proofs, appears as technical report CIS TR 2007-2, Kansas State University,
Manhattan, KS, pp 2–11
4. Amtoft T, Bandhakavi S, Banerjee A (2006) A logic for information flow in object-oriented
programs. In: 33rd Principles of programming languages (POPL), pp 91–102
5. Amtoft T, Hatcliff J, Rodriguez E, Robby, Hoag J, Greve D (2007) Specification and
checking of software contracts for conditional information flow (extended version). Tech-
nical report SAnToS-TR2007-5, CIS Department, Kansas State University. Available at
http://www.sireum.org
6. Amtoft T, Hatcliff J, Rodríguez E (2009) Precise and automated contract-based reasoning for
verification and certification of information flow properties of programs with arrays. Technical
report, Kansas State University. URL http://www.cis.ksu.edu/edwin/papers/TR-esop10.pdf
7. Banerjee A, Naumann DA (2005) Stack-based access control and secure information flow.
J Funct Program 2(15):131–177
8. Barnes J (2003) High integrity software – the SPARK approach to safety and security. Addison-
Wesley, Reading, MA
9. Barnett M, Leino KRM, Schulte W (2004) The Spec# programming system: an overview. In:
Construction and analysis of safe, secure, and interoperable smart devices (CASSIS), pp 49–69
10. Barthe G, D’Argenio P, Rezk T (2004) Secure information flow by self-composition. In:
Foccardi R (ed) CSFW’04. IEEE, New York, NY, pp 100–114
11. Bell D, LaPadula L (1973) Secure computer systems: mathematical foundations. Technical
report, MTR-2547, MITRE Corp
12. Bergeretti JF, Carré BA (1985) Information-flow and data-flow analysis of while-programs.
ACM TOPLAS 7(1):37–61
13. Chapman R, Hilton A (2004) Enforcing security and safety models with an information flow
analysis tool. In: SIGAda’04, Atlanta, Georgia. ACM, New York, NY, pp 39–46
14. Cohen ES (1978) Information transmission in sequential programs. In: Foundations of secure
computation. Academic, New York, NY, pp 297–335
15. Cok DR, Kiniry J (2004) ESC/Java2: uniting ESC/Java and JML. In: Construction and analysis
of safe, secure, and interoperable smart devices (CASSIS), pp 108–128
16. Darvas A, Hähnle R, Sands D (2005) A theorem proving approach to analysis of secure infor-
mation flow. In: 2nd International conference on security in pervasive computing (SPC 2005),
LNCS, vol 3450. Springer, Berlin, pp 193–209
17. Goguen JA, Meseguer J (1982) Security policies and security models. In: IEEE symposium on
security and privacy, pp 11–20
18. Greve D, Wilding M, Vanfleet WM (2003) A separation kernel formal security policy. In: 4th
International workshop on the ACL2 prover and its applications (ACL2-2003)
19. Heitmeyer CL, Archer M, Leonard EI, McLean J (2006) Formal specification and verification
of data separation in a separation kernel for an embedded system. In: 13th ACM conference on
computer and communications security (CCS’06), pp 346–355
20. Henzinger TA, Jhala R, Majumdar R, Sutre G (2003) Software verification with blast. In: 10th
SPIN workshop, LNCS, vol 2648. Springer, Berlin, pp 235–239
21. Jackson D, Thomas M, Millett LI (eds) (2007) Software for dependable systems: sufficient
evidence? National Academies Press, Committee on certifiably dependable software systems,
National Research Council
22. Kaufmann M, Manolios P, Moore JS (2000) Computer-aided reasoning: an approach. Kluwer,
Dordrecht
23. Myers AC (1999) JFlow: practical mostly-static information flow control. In: POPL’99,
San Antonio, Texas. ACM, New York, NY, pp 228–241
24. Naumann DA (2006) From coupling relations to mated invariants for checking information
flow. In: Gollmann D, Meier J, Sabelfeld A (eds) 11th European symposium on research in
computer security (ESORICS’06), LNCS, vol 4189. Springer, Berlin, pp 279–296
25. Owre S, Rushby JM, Shankar N (1992) PVS: a prototype verification system. In: Proceed-
ings of the 11th international conference on automated deduction (Lecture notes in computer
science 607)
26. Rossebo B, Oman P, Alves-Foss J, Blue R, Jaszkowiak P (2006) Using SPARK-Ada to model
and verify a MILS message router. In: Proceedings of the international symposium on secure
software engineering
27. Rushby J (1981) The design and verification of secure systems. In: 8th ACM symposium on
operating systems principles, vol 15, Issue 5, pp 12–21
28. Simonet V (2003) Flow Caml in a nutshell. In: Hutton G (ed) First APPSEM-II workshop,
pp 152–165
29. Sireum website. http://www.sireum.org
30. Snelting G, Robschink T, Krinke J (2006) Efficient path conditions in dependence graphs for
software safety analysis. ACM Trans Softw Eng Method 15(4):410–457
31. Terauchi T, Aiken A (2005) Secure information flow as a safety problem. In: 12th Static anal-
ysis symposium, LNCS, vol 3672. Springer, Berlin, pp 352–367
32. Vanfleet M, Luke J, Beckwith RW, Taylor C, Calloni B, Uchenick G (2005) MILS: architecture
for high-assurance embedded computing. CrossTalk: J Defense Softw Eng 18:12–16
33. Volpano D, Smith G, Irvine C (1996) A sound type system for secure flow analysis. J Comput
Security 4(3):167–188
Model Checking Information Flow
1 Introduction
2 A Motivating Example
[Figure (not reproduced): a shared buffer written and read by Secret and Unclassified processes; the controller mode (SECRET or UNCLASSIFIED) determines which of the accesses S_Read, U_Read, and U_Write are enabled.]
to control the buffer until a corresponding read from the buffer is completed. The
controller is designed to ensure that the secret data is only allowed to be consumed
by the secret output and symmetrically that the unclassified data is only consumed
by the unclassified output.
Given this system, we would like to determine whether or not there is information
flow between the secret processes and the unclassified processes. In other words, is
it possible for the unclassified processes to glean information of any kind from the
secret processes and vice versa? This information sharing is usually called interfer-
ence; noninterference is the dual idea expressing that no information sharing occurs.
In this example, the potential for interference exists via the scheduler. Unclassified
processes can perceive the state of the buffer (whether they are able to read from and
write to it) via the scheduler, which is affected by the secret processes.
If we decide that this interference is allowable, we would like to be able to de-
termine whether there are any other sources of interference between the secret and
unclassified processes. An analysis which does not account for the current system
state will probably decide that there is the potential for interference, since both kinds
of processes use a shared buffer. We would like a more accurate analysis that ac-
counts for the scheduler state in order to show that there is no interference through
the shared buffer.
This example demonstrates important features of the analysis that we will de-
scribe in the next sections:
Conditional information flow. We would like the analysis to account for enough
of the system state to allow an accurate analysis (e.g., that no information flows
from a secret input to unclassified output through the shared buffer)
“Covert” information flow. The scheduler does not directly convey information
from secret processes to unclassified processes, yet its state allows information
about the secret processes to be perceived. The analysis should detect this inter-
ference.
Intransitive information flow. If we are willing to allow information flow through
the scheduler, there should be a mechanism to allow us to tag this information
path as “allowable” and determine if other sources of flow exist. In the nonin-
terference literature, this is generally described as intransitive noninterference
[5, 19, 20]. The meaning of intransitive has to do with the nature of information
flows. Since the scheduler depends on the secret input and the unclassified output
depends on the scheduler, a transitive analysis would assert that the unclassified
output depends on the secret input. However, we would like to be able to tag
certain mediation points (e.g., downgraders or encryptors) as “allowed” sources
of information flow.
A Simulink model of the shared buffer example is shown in Fig. 2. The inputs to the
model are shown on the left: we have the requests to use the buffer from the four pro-
cesses (the secret input/output process and the unclassified input/output processes)
as well as the input buffer data from the secret and unclassified input processes. The
scheduler subsystem determines access to the buffer, while the buffer subsystem
uses the scheduler state to determine which process writes to the shared buffer.
The information flow analysis is performed in terms of a set of principal
variables. These variables are the variables that we are interested in tracking
through the model. We always track the input variables to the model, and we
sometimes track computed variables internal to the model. To perform the analysis,
the Simulink model is annotated to add the principal variables as shown in Fig. 3.
Once we have annotated the model, we use the Gryphon tool set [24] to automat-
ically construct an information flow model that can be model checked on a variety
of model checking tools including NuSMV [8], SAL [23], and Prover [16]. The
analysis process extends the original model with a flow model that operates over
sets of principal variables. Each computed variable in the original model has a flow
variable in the flow model that tracks its dependencies in terms of the principal
variables.
For model checking, sets of principal variables are encoded as bit sets, and check-
ing whether information flow is possible is the same as determining whether it is
possible that one of the principal bits is set. For the model above, the translation
generates a bit set over the declared principals (indexed by names such as ui_idx and uo_idx in the property below).
Now we can write properties over output variables. For example, suppose we
want to show that the secret output data is unaffected by the unclassified input or
output principal. In this case, we could write:
LTLSPEC G(!(gry_IF_so_data[ui_idx] |
gry_IF_so_data[uo_idx]));
gry_IF is the prefix used for the flow variables, so the analysis checks whether
there is flow to the so_data output from the ui principal or the uo principal. These
principals correspond to flow from the ui_req, ui_data, and uo_req input variables.
As described earlier, this property is violated, because there is information flow
from the unclassified processes to the secret output through the scheduler. NuSMV
generates a counterexample that we can examine to determine how the information
leak occurred.
After analyzing the problem, we decide that the flow of information through the
scheduler state is allowable. We would now like to search for additional sources
of flow. By adding an additional principal for the scheduler state, as shown in
Fig. 4, we can ignore the flows from the ui and uo principals that occur through the
scheduler. After rerunning the analysis, the model checker finds no other sources of
information flow.
Languages such as Simulink [11] and SCADE [4] are examples of synchronous
dataflow languages. The languages are synchronous because computation proceeds
in a sequence of discrete instants. In each instant, inputs are perceived and states and
outputs are computed. From the perspective of the formal semantics, the computa-
tions are instantaneous. The languages are dataflow because they can be understood
as a system of assignment equations, where an assignment can be computed as
soon as the equations on which it is dependent are computed. The equations can
be represented either textually or graphically. As an example, consider a system that
computes the values of two variables, X and Y, based on four inputs: a, b, c, and
d, as shown in Fig. 5.
The variables (often referred to as signals) in a dataflow model are used to label
a particular computation graph. Therefore, it is incorrect to view the equations as
a set of constraints on the model: a set of equations shown in Fig. 6 is not a valid
model because X and Y mutually refer to one another. This is shown in Fig. 6, where
the bold lines indicate the cyclic dependencies. Such a system may have no solution
or infinitely many solutions, so cannot be directly used as a deterministic program.
If viewed as a graph, these sets of equations have data dependency cycles and are
considered incorrect.
However, in order for the language to be useful, we must be able to have mutual
reference between variables. To allow benign cyclic dependencies, we create a step-
delay operator (i.e., a latch) using the comma operator. For example, {X = 2a/Y;
Y = (1, (X + d))} defines a system where X is equal to 2a divided by the current
value of Y, while Y is initially equal to 1, and thereafter equal to the previous value
of X plus d.
There are several examples of textual dataflow languages, including Lustre [7],
Lucid Synchrone [3], and Signal [9], that differ in terms of structuring mechanisms,
computational complexity (i.e., whether recursion is allowed), and clocks that de-
fine the rates of computation for variables. Our analysis is defined over the Lustre
language. Lustre is the kernel language of the SCADE tool suite and also the inter-
nal language of the Rockwell Collins Gryphon tool suite. Lustre is also sufficient to
model the portions of the Simulink/Stateflow languages that are suitable for hard-
ware/software codesign.
InterferenceTheorem: LEMMA
This theorem states that if two traces are equivalent (vtraceEquivSet) on the
dependencies computed for a variable idx by our Interferes set (DepSet(idx,gt1)),
then the two traces agree on the value of idx. The details of the theorem and the steps
in its proof will be explained in the following sections.
In practice, the user suggests what is believed to be a noninterfering principal
variable for some variable c, and a model checker is used to determine whether or not
this variable interferes with (i.e., affects) c.
PVS [15, 22] is a mechanized theorem prover based on classical, typed higher order
logic. Specifications are organized into (potentially parameterized) theories, which
are collections of type and function definitions, assumptions, axioms, and theorems.
The proof language of PVS is composed of a variety of primitive inference proce-
dures that may be combined to construct more powerful proof strategies.
Normally in PVS the proof process is performed interactively, and the proof
script encoding the entire proof is not visible to the user. In our development, we
used the ProofLite [14] extension to PVS in order to embed the proofs as comments
into the PVS theories. To make the theories shorter and easier to understand, we
omit the ProofLite scripts in this chapter. However, the interested reader is encour-
aged to visit http://extras.springer.com, and enter the ISBN for this book, in order to
view the complete scripts.
1 Opaque types in PVS allow one to define a type as an unspecified set of values.
variables that are necessary for computing v. These traces are defined by the gtrace
and graphState types, respectively.
Note that our states are defined over an infinite set of variables nat. In a real
system, we would have a finite set, but this can be modeled by simply ignoring all
variables above some maximum index. This change does not affect the formalization
or the proofs.
Next, we define processes that constrain the traces in Fig. 8. The processes are
built from expressions: an (unspecified) set of unary and binary operators, constants,
variable references, and if/then/else expressions.
We define different kinds of semantics for the values produced by a program and
also for the information flow. The semantic functions that are introduced follow a
naming convention to make them easier to follow and to relate to one another. The
form of the semantics functions is as follows:
<TYPE><syntax><OPTIONAL RESTRICTION>
For example, the Se function defines the value-semantic function for expressions,
and the IFsG function defines the information-flow function for states with respect
to gates.
The <TYPE>s of semantics that will be used in the following discussion are as
follows:
S: Value semantics for traces
D: Syntactic dependencies
DS: Dependencies based on syntax and current state
IF: Information flow dependencies
The <syntax>es that will be discussed are the following:
e: Expressions
i: Indices (assignments)
s: States
t: Traces
The <OPTIONAL RESTRICTION>s restrict the semantic functions at a par-
ticular syntactic level to:
I: Inputs
G: Gates
L: Latches
We next create semantic functions for the expressions and programs in Fig. 8. Fol-
lowing [1] and [12], the semantics are defined in terms of trace conformance, as
shown in Fig. 9. We state that a trace conforms to a program if the values computed
by the assignment expressions for the gates and latches correspond to the values in
the trace. The Se function computes a value from a Process expression. The SsG
predicate checks conformance between the gate assignments and a state, and the
SsL predicates check conformance between the latch assignments and the trace. The
St predicate defines trace conformance over both gates and latches.
Now we can create a semantics that tracks information flow through the model,
as shown in Fig. 10. This semantics maps indices to the set of indices used when
computing the value of the index. For expressions, we create two different seman-
tics; the first tracks the indices that are immediately used within the computation of
the expression; the second traces the indices back to principal variables, which are
the actual concern of the information flow analysis. For the moment, we consider
the inputs as the principal variables. We expand this notion when we talk about
intransitive interference in Sect. 6.
The only difference between the DSe and IFe semantics in Fig. 10 is in the
behavior of the Variable branch. For the IFe semantics, a set of principal variables
are provided. If a referenced variable is a principal variable, then we return it as a
dependency; if it is not, then we return the dependencies of that variable. The effect
of this rule is to backchain through the intermediate variables so that dependencies
are always a subset of the principal variables. The DSe semantics, on the other hand,
return the immediate dependencies (i.e., the indices of all variables referenced in the
assignment expression).
Note that both the DSe and IFe semantics are state dependent: For if/then/else
expressions, the set of dependencies depends on the if-test; only dependencies for
the used branch are returned. This feature allows conditional dependencies to be
tracked within the model.
After defining the expression semantics, we define the IF semantics on states
and programs, matching the structure of the S definitions in Fig. 9. At the bottom
of Fig. 10, we define trace pairs as a type and define trace pair conformance to a
program based on both semantics.
We can now state the interference theorem that should be proven over the trace
pairs. Informally, we would like to state that for a particular index idx, if the inputs
referenced in an information flow trace for idx (DepSet) have the same values in two
state traces (vtraceEquivSet), then the two traces will have the same values for idx.
Formally, this obligation is expressed in Fig. 11. Note that there is an asymmetry in
the interference theorem: we define two execution traces (st1 and st2) but only one
graph trace (gt1). The graph trace (gt1) corresponding to an execution trace (st1) for
a given index idx characterizes the signals that must match for any other execution
trace (in this case st2) to match st1 for signal idx. It is equivalent to use a graph trace
based on st2.
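Schematically, and eliding the PVS quantifier structure of Fig. 11, the obligation can be read as follows (this is a paraphrase of the prose above, not the literal PVS text):

\[
\mathit{vtraceEquivSet}\bigl(st_1,\, st_2,\, \mathit{DepSet}(\mathit{idx},\, gt_1)\bigr)\;\Longrightarrow\; st_1 \text{ and } st_2 \text{ agree on } \mathit{idx}
\]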
To prove this theorem, we have to build a hierarchy of equivalences shown in
Fig. 12. This graph does not show all of the connections between proofs (e.g.,
which theorems are instantiated in the proofs of other theorems), but it provides
a good overview of the structure of the proof. Ultimately, we are interested in prov-
ing the final theorem, which defines a relationship between traces as described by
the information flow semantics IF and the value semantics S . In order to prove
this theorem, we define an intermediate flow semantics based on state dependen-
cies (DS). Whereas the information flow semantics unwinds the dependencies from
outputs to inputs implicitly through the use of the graph state and graph trace, the
DS flow semantics unwind the graph explicitly and therefore provide an easier basis
for inductive proof.
[Fig. 12 (diagram not reproduced): the proof hierarchy relating the IFe, De, Si, DSiP, DSiIF, and DSt results and the GraphUnwinding theorem; the key distinguishes GWV equivalences, information flow equivalences, subset relations, and defined-in-terms-of edges.]
The “rows” of the proof graph correspond to a level in the evaluation hierarchy.
Reading from top to bottom, we talk about equivalences in terms of expres-
sions, then in terms of indices (assignments), then states, and, finally, traces. The
“columns” correspond to the different semantics. On the left is the information flow
(IF) semantics, in the middle is the DS semantics, and on the right is the value (S)
semantics. One semantics bridges the IF and DS semantics (DSiIF).
There are two different kinds of theorems that are proved between the semantics.
The first are equivalences between the different flow semantics (e.g., that two flow
semantics yield the same set of dependencies). The second are GWV-style theorems,
in the same style as [6]. These state that if the values of the dependent indices for
a piece of syntax are equal within two states or traces s1 and s2, then the value
produced by evaluating over s1 and s2 will be equal.
In our analysis, we prove GWVr1-style theorems. GWVr1 is less expressive than
GWVr2 but it is simpler to formulate. The additional expressive power in GWVr2 is
necessary to describe dynamic memory, but the synchronous models that we analyze
in this chapter do not use dynamic memory, so GWVr1 is sufficiently expressive for
our purposes. The connection between the formulation in this chapter and [6] is
explored further in Sect. 7.
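Schematically, a GWVr1-style theorem at the expression level has the following shape, where Dep(e, s) is the dependency set computed by the corresponding flow semantics (a paraphrase of the prose above, not the literal PVS statement):

\[
\bigl(\forall i \in \mathit{Dep}(e, s_1).\; s_1(i) = s_2(i)\bigr)\;\Longrightarrow\; \mathit{Se}(e, s_1) = \mathit{Se}(e, s_2)
\]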
In Fig. 13, we begin the process of proving the final theorem by describing some
lemmas over expressions. These will form the basis of the later proofs over larger
pieces of syntax.
The DSe subset De lemma states that the state-aware dependency function (DSe)
returns a subset of the indices referenced by the syntactic dependency function (De).
We appeal to this lemma (through another lemma: WFg to WFgDSe) to establish a
basis for induction for some of the proofs involving equivalence of gate assignments.
The Compose function is used to look up each of the entries in a set in the graph
state. It performs the same function as the Direct Interaction Allowed (DIA) function
in Greve’s formulation [6]. It is used to map from a set of immediate dependencies
to their dependencies.
The IFe to DSe Property lemma defines the first mapping between the state-
based DS dependency semantics and the gtrace-based IF dependency semantics.
Remember from Sect. 3.4 that the IFe semantics are defined in terms of a set of
principals: if a variable is principal, then we look up its dependencies in the graph
state. This property creates an equivalence between these semantics by looking up
(via Compose) the nonprincipal variables from the DSe semantics.
In Fig. 14, we define a bridge between the program well-formedness constraint WFp
and state dependencies (DSe). This bridge will allow us to use the WFp predicate
in reasoning about GWV equivalences involving state dependencies. We define a
WFgDSe predicate that defines well-formedness in terms of the DSe and show that
WFp implies the (more accurate) WFgDSe predicate.
We then prove a series of GWV-style equivalences stating that if the relevant
dependencies agree between two states or traces s1 and s2, then the value produced by evaluating over s1 and s2 will
match. The idea is that we will start from the immediate dependencies of an expres-
sion and progressively unwind the dependencies toward the inputs. This unwinding
occurs in two stages as follows:
First we unwind to the principals, which (for the purposes of the proof) are
the states and inputs. Another way of looking at this first unwinding is un-
winding back to the “beginning” of the step. This is the definition of the DSiP
dependencies.
Next, we unwind the dependencies back to the inputs by examining the graph
trace over time. This is the definition of the DSt dependencies.
We also map these state-based equivalences that are computed via explicit
unwindings of dependencies to the IF equivalences, which implicitly unwind
the dependencies using the graph states. This is accomplished by using the DSiIF
dependency relation. This will be the key lemma to show the equivalence of the IF
and DS formulations.
Figure 15 shows the dependency proof for the DSe dependencies. There are two
equivalences: the first over evaluation of expressions and the second over evaluation
of indices.
Figure 16 shows the proofs for the next level of unwinding: showing that if the
principal variables are the same for two states, then the results produced for an index
will be the same. This step removes the gates from the dependency calculation.
Figure 17 shows the proofs of the next level of unwinding to the dependen-
cies of the states. The definition of the DSiIF predicate is particularly important
as it bridges between the graph-trace-based IF semantics and the state-based DS
semantics. Like the DSiP semantics, it backtraces through the gates to reach de-
pendencies based on states and inputs. The distinction is that it then looks up the
state dependencies in the graph state. This means that the dependencies computed
by DSiIF will match the dependencies computed by the IF relation, as demonstrated
by the IFe to DSiIF lemma. This is a key lemma in proving the unwinding theorem
over state dependency traces DSt and information flow traces IFt.
Finally, in Fig. 18, we map dependencies to inputs across a multistep trace. First,
we prove a lemma that is sufficient for the proof of latch assignment at step zero
(GWVr1 Si SsL0). This lemma will be used to provide the base case for latches in
the GWVr1 Si DSt proof.
Next, in Fig. 19, we have to define a graph unwinding theorem, which maps
between our state-dependency-based formulation DSt and our graph-dependency-
based formulation IFt. This is performed in two steps. First, we show that the DSiIF
formulation matches the result returned by IFe. Next, we define the unwinding
theorem which demonstrates that DSt and IFt yield the same dependencies.
Now we have finally assembled the pieces necessary to prove the trace theorem
that was proposed in Fig. 8 in Sect. 3.7. The proof is shown in Fig. 20. We state
that the information flow characterizes the execution of a model if it satisfies the
InterferenceTheorem.
Fig. 19 The graph unwinding theorem demonstrating equivalence between IFt and DSt semantics
4 Interference to Noninterference
We first assume Rushby’s formalization of LTL [2] in PVS presented in [6]. We now
prove in Fig. 22 that a noninterference assertion over a graph state machine follows
from a particular LTL assertion, in the same way as in Greve [6].
Recall that the IF semantics correspond to graph traces (gtrace) that are composed
of a sequence of graph states (gstate). Each gstate maps program variables to a finite
set of Principal variables. The information flow semantics from the previous section
are then encoded as set manipulations. The information flow model is then the set
of assignments to the information flow variables.
The mechanism for creating the information flow variable assignments is a set of
transformation rules that are applied to the syntax of ProcessExpr and ProcessAs-
sign datatypes defined in Fig. 8. The transformation rules generate a slightly richer
expression syntax (shown in Fig. 23) that contains two additional variables. The first
expression, IF Variable, allows references to variables in the information flow graph
state. The second, SingletonSet, takes an index and generates a singleton set con-
taining that index.
We can now reflect the information flow semantics into an extended program
ProgramExt that contains assignments for both the state and graph traces, as shown
in Fig. 24.
The hybrid model in Fig. 24 contains assignments both for the state variables (st)
and the graph variables (gr). The syntax of the state assignments does not change;
however, the strong typing of PVS requires that we define a transformation to
map from the ProcessExpr and ProcessAssignment datatypes into the ExprExt and
AssignExt datatypes, respectively. This is performed by the IDe and IDa functions,
respectively.
The mapping of the information flow IF semantics into syntax that can be inter-
preted is performed by the TR functions. These functions create new syntax based
Y is semantically equivalent in both cases, and the soundness of the flow analysis
follows from the existing proof of if/then/else expressions in Fig. 12. Note that the
condition variable for if/then/else (C) is always used for the information flow analy-
sis, so if both variables in a Boolean expression are control variables, the following
is generated:
Y = C0 and C1
Y = if C0 then (if C1 then C0 else false) else (if C1 then C0 else false)
After applying the syntactic TRe transformation to the right-hand side of the
equivalence and simplifying, this yields the “standard” information flow expression
for the original binary expression: Bop(Union, TRe(a1, Pr), TRe(a2, Pr)).
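To make the flavor of the transformation concrete, the sketch below (Python, illustrative only) mirrors the two TRe cases discussed above: a data-flowing binary operator becomes the union of its operands' flows, while an if/then/else always contributes the condition's flow in addition to the flow of the selected branch. The constructors and the tre function are stand-ins for the chapter's PVS rules over the ProcessExpr syntax.

# Sketch of a TRe-style transformation from expressions to flow expressions.
def tre(expr, principals):
    kind = expr[0]
    if kind == "var":
        name = expr[1]
        # A principal contributes itself; any other variable contributes the
        # flows recorded for it in the information flow graph state.
        return ("SingletonSet", name) if name in principals else ("IF_Variable", name)
    if kind == "ite":
        _, c, t, e = expr
        # The condition's flow reaches the result whichever branch is taken.
        return ("Union", tre(c, principals),
                         ("ite", c, tre(t, principals), tre(e, principals)))
    _, a, b = expr   # data-flowing binary operator such as "and", "or", "+"
    return ("Union", tre(a, principals), tre(b, principals))

# Y = C0 and C1 with both operands principal yields the "standard" form
# Union(Singleton C0, Singleton C1), i.e. Bop(Union, TRe(a1, Pr), TRe(a2, Pr)).
print(tre(("and", ("var", "C0"), ("var", "C1")), {"C0", "C1"}))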
In an intransitive security policy, information is allowed to flow between levels
as long as it is mediated in some way. The reasoning for allowing this interference
is well explained by Roscoe and Goldsmith [19]:
It seems intuitively obvious that the relation must be transitive: how can it make sense for
A to have lower security level than B, and B to have lower level than C, without A hav-
ing lower level than C? But this argument misses a crucial possibility, that some high-level
users are trusted to downgrade material or otherwise influence low-level users. Indeed, it
has been argued that no large-scale system for handling classified data would make sense
without some mechanism for downgrading information after some review process, inter-
val (e.g., the U.K. 30-year rule) or defined event (the execution of some classified mission
plan, for example). Largely to handle this important problem, a variety of extended theories
proposing definitions of “intransitive noninterference” have appeared, though we observe
that this term is not really accurate, as it is in fact the interference rather than the nonin-
terference relation which is not transitive. Perhaps the best way to read the term is as an
abbreviation for “noninterference under an intransitive security policy.”
There have been several formulations of intransitive interference based on state ma-
chines [20], process algebras [19], and event traces [10].
[Figure: encryptor example — input A feeds the Encryptor block whose output is the mediation variable B; additional signals C and O appear in the model]
on B, even though there is clearly a flow that bypasses B. The problem is that the
encryptor variable is functionally derived from a single input A, so the equivalence
on B forces a corresponding equivalence on the input A. In other words, requiring
a trace equivalence on a computed principal variable may cause an implicit equiv-
alence on another principal variable. These implicit equivalences allow an attacker
to bypass the desired mediation variable.
An approach that could be considered for intransitive interference reframes the prob-
lem: given a program P involving a computed principal variable c, we construct a
program P′ in which c is an input and assert that all traces must agree on P′. P′
has at least as many traces as P, as the value of c is unconstrained with respect
to the other variables in P′. The additional traces distinguish variables that bypass
the computed principal as there is no longer a functional connection between the
computed variable and the inputs.
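A minimal sketch of this reframing, assuming (for illustration) that a program is represented as a mapping from computed variables to their defining expressions: P′ is obtained simply by deleting the definition of the computed principal c, which turns c into a free input over which all traces must agree.

# Sketch of the P -> P' construction for a computed principal c. Deleting c's
# defining equation leaves c unconstrained, so P' has at least as many traces
# as P; flows that bypass c are no longer hidden by c's functional dependence
# on the inputs. The dict-of-equations representation is an assumption.
def cut_principal(program, c):
    p_prime = dict(program)
    p_prime.pop(c, None)       # c becomes an input of P'
    return p_prime

# Encryptor example: B is computed from A, so asking for agreement on B in P
# implicitly constrains A; in P' that implicit constraint disappears.
P = {"B": "encrypt(A)", "O": "combine(B, C)"}
print(cut_principal(P, "B"))   # {'O': 'combine(B, C)'} -- B is now unconstrained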
Unfortunately, treating states as inputs leads to overly conservative analyses in-
volving traces that are impossible in the original program. Consider the shared
buffer model from Sect. 2. If a new model is created in which the scheduler out-
put is instead a system input, then the scheduler can no longer correctly mediate
access to the shared buffer and so information flow occurs through the buffer. The
flow analysis will (correctly) state that there is information flow through the buffer,
but the flagged traces are not possible in the original model.
[Figure: downgrader example — original model with inputs A, B, C, D and equations X = A or B; Y = if X then C else D; Z = C and Y, shown alongside its transitive information flow graph: X_Graph = A bit_or B; Y_Graph = X_Graph bit_or (if X then C else D); Z_Graph = C bit_or Y_Graph]
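The flow-graph companion equations in the figure can be reproduced with a short sketch; here Python sets stand in for the bit-vector (bit_or) encoding used by the generated model, and the input valuation is an arbitrary example.

# Sketch of the downgrader example's flow-graph shadow equations, with Python
# sets in place of the bit-vector (bit_or) encoding. Inputs carry themselves.
A, B, C, D = True, False, True, True              # an arbitrary input valuation
A_f, B_f, C_f, D_f = {"A"}, {"B"}, {"C"}, {"D"}   # flow (principal) sets

# Model equations
X = A or B
Y = C if X else D
Z = C and Y

# Flow-graph companion equations from the figure (bit_or becomes set union)
X_Graph = A_f | B_f
Y_Graph = X_Graph | (C_f if X else D_f)   # condition's flow plus the taken branch
Z_Graph = C_f | Y_Graph

print(Z_Graph)   # {'A', 'B', 'C'} for this valuation (set order may vary)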
7 Connections to GWV
In the current chapter and the earlier chapter by Greve [6], we have presented
two quite similar formulations of information flow modeling. The formulation in
Greve’s chapter is more abstract and describes information flow over arbitrary func-
tions using flow graphs. It then describes how these functions can be composed and
how multistep state transition systems can be encoded. Two different formulations
(GWVr1 and GWVr2) are presented. The GWVr2 formulation is capable of model-
ing dynamic information flows, in which storage locations are created and released
during the computation of the function, but this additional capability comes at a cost
of some additional complexity.
In this chapter, we have modeled information flow specifically for synchronous
dataflow languages. The basis for this approach was modeling GWV-style equiv-
alences using a model checker. However, the approach was originally justified by
manual proofs over trace equivalences due to the first author’s familiarity with this
style of formalization for synchronous dataflow languages. The mechanized proofs
in this chapter reflect the manual proofs.
As a basis for formalization, the trace equivalence allows a very natural style of
presentation. It provides a nice abstraction of the computation and information flow
analysis in that a total computation order for the assignments of the semantic and
flow analyses is not required. Instead, we can talk about conformance to some exist-
ing trace. Also, since the entire trace is provided, we can describe latch conformance
by examining the previous state in the trace.
On the other hand, this formulation does not directly reuse
the infrastructure that had already been established in [6] with respect to function
composition, mapping from interference to noninterference, and justifying LTL
theorems in terms of trace equivalence. It would be possible to reformalize the syn-
chronous language semantics defined in Sect. 3 in order to better utilize the GWV
infrastructure, but we leave this for future work.
We now demonstrate the information flow analysis in the Rockwell Collins Gryphon
tool suite. Gryphon is an analysis framework designed to support model-based
development tools such as Simulink/Stateflow and SCADE. Model-based develop-
ment (MBD) refers to the use of domain-specific, graphical modeling languages that
can be executed and analyzed before the actual system is built. The use of such mod-
eling languages allows the developers to create a model of the system, execute it on
their desktop, analyze it with automated tools, and use it to automatically generate
code and test cases.
As MBD established itself as a reliable technique for software development, an
effort was made to develop a set of tools to enable the practitioners of MBD to for-
mally reason about the models they created. Figure 30 illustrates the MBD development
process flow.
The following sections briefly describe each component of the MBD toolchain.
Simulink, Stateflow, and MATLAB are products of The MathWorks, Inc. [11].
Simulink is an interactive graphical environment for use in the design, simulation,
implementation, and testing of dynamic systems. The environment provides a cus-
tomizable set of block libraries from which the user assembles a system model by
selecting and connecting blocks. Blocks may be hierarchically composed from pre-
defined blocks.
[Fig. 30: MBD development process flow — graphical system models (Simulink, Stateflow, MATLAB) and requirements feed the Gryphon RCI formal translator, which produces a Lustre formal specification of the model and supports model verification and executable C and VHDL implementations]
8.1.2 Reactis
8.1.3 Gryphon
Gryphon [24] refers to the Rockwell Collins tool suite that automatically translates
from two popular commercial modeling languages, Simulink/Stateflow and SCADE
[4], into several back-end analysis tools, including model checkers and theorem
provers. Gryphon also supports code generation into Spark/Ada and C. An overview
of the Gryphon framework is shown in Fig. 31.
[Fig. 31: the Gryphon framework — Simulink and SCADE gateways translate models into the Lustre-based internal representation, with back-end model checkers including NuSMV, Prover, BAT, Kind, and SAL]
Gryphon uses the Lustre [7] formal
specification language (the kernel language of SCADE) as its internal representa-
tion. This allows for the reuse of many of the RCI proprietary optimizations.
8.1.4 Prover
Prover [16] is a best-of-breed commercial model checking tool for analysis of the
behavior of software and hardware models. Prover can analyze both finite-state
models and infinite-state models, that is, models with unbounded integers and real
numbers, through the use of integrated decision procedures for real and integer arith-
metic. Prover supports several proof strategies that offer high performance for a
number of different analysis tasks including functional verification, test-case gener-
ation, and bounded model checking (exhaustive verification to a certain maximum
number of execution steps).
A large-scale use of the Gryphon analysis was performed on the Rockwell
Collins Turnstile high-assurance cross-domain guard [18]. A high-level view of
the architecture is shown in Fig. 32. The offload engines (OEs) provide the external
interface to Turnstile. The Guard Engine (GE) is responsible for enforcing the
desired security policy for message transport. The guard data movers (GDMs) pro-
vide a high-speed mechanism to transfer messages under the direction of the GE.
The GE is implemented on the EAL-7 AAMP7 microprocessor [25] and uses the
partitioning guarantees provided by the AAMP to ensure secure operation.
In its initial implementation, Turnstile provides a “one-way” guard. It has a high-
side OE (OE1 in Fig. 32) that submits messages (generates input) for the guard, a
low-side OE (OE3 in Fig. 32) that emits messages if they are allowed to pass through
the guard, and an audit OE (OE2 in Fig. 32) that provides audit functionality for the
system.
The architectural analysis focused on the interaction between the GDMs, GE,
and OEs. The OEs, GDMs, and GE do not share a common clock; all of them execute
and communicate asynchronously. In the model, we clock each of the subsystems
independently to capture this asynchrony.
The Simulink model of the Turnstile system architecture is shown in Fig. 33. The
components were modeled at various levels of fidelity, depending on their relevance
to the information flow problem:
[Fig. 33: Simulink model of the Turnstile system architecture — the OE1, OE2, and OE3 interfaces (TX, RX, CTRL, HLST/heartbeat, and audit channels), the GDMs, and the GE, with the OE read and write processes marked as Gryphon if_principal blocks]
The GDMs are responsible for most of the data routing and are modeled to a
high level of fidelity. All of the GDM channels (transmit, receive, audit, control,
and health monitor) are modeled as well as the GDM-to-GDM and GDM-to-GE
transfer protocols.
The data routing portions of the GE were accurately modeled. The policy en-
forcement portions (the guard evaluator) were modeled nondeterministically: the
GE component randomly chooses whether messages are dropped or propagated.
The OEs were modeled at a fairly low level of fidelity. As the OEs are not trusted
by the Turnstile architecture, we allow them to nondeterministically submit re-
quests on all of the interfaces between OE and GDM. This approach allows us to
model situations in which the OE violates the Turnstile communications proto-
cols (which should cause the system to enter a fail-safe mode).
The principals of interest are those processes on the Offload Engines that inter-
act with the outside world (the low and high networks): the reading and writing
processes on OE1 and the reading and writing processes on OE3. To represent the
arbitrary interleavings of the Turnstile processes, we used enabled (clocked) subsys-
tems in Simulink. The GDMs run in synchrony at the basic rate of the model while
the OEs and GE run at arbitrary intervals of the basic rate.
The model in Fig. 33 was translated via Gryphon into the model checkers
NuSMV [8] and Prover [16]. With these tools we analyzed several of the information
flows through the model. Since the OE has multiple inputs in our model (and in real
life), we analyzed every input into the OEs for the possible presence of information
from an unwanted source. In a one-way guard configuration, we are interested in
determining whether there is backflow of information to the high-side network, that
is, whether any GDM input into OE1 is influenced by the low-side (OE3) reading
or writing principals. These properties can be encoded as shown in Fig. 34.
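Expressed over the flow-graph encoding, the backflow check amounts to asserting that the principal sets computed for the GDM-provided inputs of OE1 never contain the OE3 reading or writing principals. The following Python sketch checks that condition over an already-computed flow trace; the variable and principal names are illustrative placeholders, not the actual Turnstile signal names, and the real properties are discharged by NuSMV and Prover as described in the text.

# Sketch of a backflow check over a computed information flow trace: no
# OE1-facing input may ever carry flow from the low-side (OE3) principals.
LOW_SIDE_PRINCIPALS = {"OE3_Read_Process", "OE3_Write_Process"}

def no_backflow(flow_trace, oe1_inputs):
    """flow_trace is a list of graph states mapping variables to principal sets."""
    for step, gstate in enumerate(flow_trace):
        for var in oe1_inputs:
            leaked = gstate.get(var, set()) & LOW_SIDE_PRINCIPALS
            if leaked:
                return (False, step, var, leaked)
    return (True, None, None, None)

# Two-step example: OE1's RX data carries only GE and OE1 flow, so no backflow.
trace = [{"OE1_RX_Data": {"GE", "OE1_Write_Process"}},
         {"OE1_RX_Data": {"GE"}}]
print(no_backflow(trace, {"OE1_RX_Data"}))   # (True, None, None, None)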
One of the backflow properties (shown in bold font) was violated in the architec-
tural model. However, this was already a known source of backflow because of the
implementation of the GDM transfer protocol that resulted from a quality of service
In this chapter, we have described an analysis procedure that can be used to check a
variety of information flow properties of hardware and software systems, including
noninterference over system traces. This procedure is an instantiation of the GWV-
style flow analysis specialized for synchronous dataflow languages such as SCADE
[4] and Simulink [11]. Our analysis is based on annotations that can be added di-
rectly to a Simulink or SCADE model that describe specific sources and sinks of
information. After this annotation phase, the translation and model checking tools
check the specified flow properties automatically.
There are several directions for future work given the framework that has been
created. First, there are a variety of interesting properties beyond noninterference
that can be formalized using temporal logic. For example, it is possible to be-
gin talking about rates of information flow through a system by creating more
interesting temporal logic formulations of flow properties. For instance, one can
state that flow occurs at most once every ten cycles of evaluation (say), with the follow-
ing Real-Time CTL (RTCTL) [2] property:
SPEC AG(gry_IF_output[P1] -> ABF[5,23] (!gry_IF_output[P1]));
where “ABF” is the bounded future operator of RTCTL. This formula states that if
flow occurs from principal P1 to variable output in the current step, then no flow
occurs from P1 to output over the next ten steps. In order to be informative, this
obligation would have to be paired with some notion of how much information was
being transmitted by a particular flow in an instant when flow occurs. It should be
possible to annotate (manually or automatically) an information flow model with
the flow rates along particular edges within the graph. Such an annotation could be
used to reason about the aggregate rate of information flow between principals.
Acknowledgments We would like to thank the reviewers of early drafts of this paper, espe-
cially Matt Staats, Andrew Gacek, and Kimberly Whalen, for their many helpful comments and
suggestions.
References