Electronic Research Archive of Blekinge Institute of Technology
http://www.bth.se/fou/
This is an author produced version of a conference paper. The paper has been peer-reviewed
but may not include the final publisher proof-corrections or pagination of the proceedings.
Citation for the published Conference paper:
Title:
Type checking cryptography implementations
Author:
Manuel Barbosa, Andrew Moss, Dan Page, Nuno Rodrigues, Paulo Silva
Conference Name:
4th IPM International Conference on Fundamentals of Software Engineering, FSEN 2011
Conference Year:
2012
Conference Location:
Access to the published version may require subscription.
Published with permission from:
Springer
Type Checking Cryptography Implementations
Manuel Barbosa1
Andrew Moss2
Dan Page3
1,4
Nuno F. Rodrigues
Paulo F. Silva1
1
Departamento de Informática, Universidade do Minho
School of Computing, Blekinge Institute of Technology
3
Department of Computer Science, University of Bristol
4
DIGARC, Instituto Politécnico do Cávado e do Ave
2
Abstract. Cryptographic software development is a challenging field:
high performance must be achieved, while ensuring correctness and compliance with low-level security policies. CAO is a domain specific language
designed to assist development of cryptographic software. An important
feature of this language is the design of a novel type system introducing
native types such as predefined sized vectors, matrices and bit strings,
residue classes modulo an integer, finite fields and finite field extensions,
allowing for extensive static validation of source code. We present the
formalisation, validation and implementation of this type system.
1
Introduction
The development of cryptographic software is clearly distinct from other areas of
software engineering. The design and implementation of cryptographic software
draws on skills from mathematics, computer science and electrical engineering.
Also, since security is difficult to sell as a feature in software products, cryptography needs to be as close to invisible as possible in terms of computational
and communication load. As a result, cryptographic software must be optimised
aggressively, without altering the security semantics. Finally, cryptographic software is implemented on a very wide range of devices, from embedded processors
with very limited computational power and memory, to high-end servers, which
demand high-performance and low-latency. Therefore, the implementation of
cryptographic kernels imposes a specific set of challenges that do not apply to
other system components. For example, direct implementation in assembly language is common, not only to guarantee a more efficient implementation, but
also to ensure that low-level security policies are satisfied by the machine code.
The CAO language. The CAO language aims to change this state of affairs, allowing natural description of cryptographic software implementations, which can
be analysed by a compiler that performs security-aware analysis, transformation
and optimisation. The driving principle behind the design of CAO is that the
language should support cryptographic concepts as first-class language features.
Unlike the languages used in mathematical software packages such as Magma or
Maple, which allow the description of high-level mathematical constructions in
their full generality, CAO is restricted to enabling the implementation of cryptographic components such as block ciphers, hash functions and sequences of finite
field arithmetic for Elliptic Curve Cryptography (ECC).
2
CAO preserves some higher-level features to be familiar to an imperative
programmer, whilst focusing on the implementation aspects that are most critical
for security and efficiency. The memory model of CAO is, by design, extremely
simple to prevent memory management errors (there is no dynamic memory
allocation and it has call-by-value semantics). Furthermore, the language does
not support any input/output constructions, as it is targeted at implementing
the core components in cryptographic libraries. In fact, a typical CAO program
comprises only the definition of a global state and a set of functions that permit
performing cryptographic operations over that state. Conversely, the native types
and operators in the language are highly expressive and tuned to the specific
domain of cryptography. In short, the design of CAO allowed trading off the
generality of a language such as C or Java, for a richer type system that permits
expressing cryptographic software implementations in a more natural way.
CAO introduces as first-class features pure incarnations of mathematical
types commonly used in cryptography (arbitrary precision integer, ring of residue
classes modulo an integer, finite field of residue classes modulo a prime, finite
field extensions and matrices of these mathematical types) and also bit strings
of known finite size. A more expressive type system would be expected from any
domain-specific language. However, in the case of CAO, the design of the type
system was taken a step further in order not only to allow an elegant formalisation of the type checking rules, but also to allow the efficient implementation
of a type checking system that performs extensive preliminary validation of the
code, and extracts a very rich body of information from it. This fact makes the
CAO type checker a critical building block in the implementation of compilation
and formal verification tools supporting the language.
Contributions. This paper presents the formalisation, validation and implementation of the CAO type system. Our main contribution is to show that the
trade-offs in language features that were introduced in the design of CAO –
specifically for cryptographic software implementation – enabled us to tame the
complexity of formalising and validating a surprisingly powerful type system.
We also show, resorting to practical examples, how this type system enforces
strong typing rules and how these rules detect several common run-time errors.
To support this claim, we outline our proof of soundness of the CAO type system.
More in detail, we describe a formalisation of the CAO type system and
the corresponding implementation of a type checker5 as a front-end of the CAO
tool chain. One of the main achievements of our system is the enforcement of
strong typing rules that are aware of type parameters in the data types of the
language. The type checking rules permit determining concrete values for these
parameters and, furthermore, resolving the consistency of these parameters inside CAO programs. Concretely, the CAO type system explicitly includes as type
parameters the sizes of containers such as vectors, matrices and bit strings. In
other words, CAO is dependently typed. Furthermore, typing of complex opera5
An implementation of a CAO interpreter (including the type system and semantics)
is available via http://www.cace-project.eu.
3
tions over these containers, including concatenation and extensional assignment,
statically checks the compatibility of these parameters.
More interestingly, we are able to handle parameters in mathematical types
in a similar way. Our type system maintains information for the concrete values of integer moduli and polynomial moduli, so that it is possible to validate
the consistency of complex mathematical expressions, including group and finite
field operations, the conversion between a finite field element and its polynomial
representation, and other type conversions. Finally, the CAO type system also
deals with language usability issues that include implicit (automatic) type conversions between bit strings and the integer value that they represent, and also
between values within the same finite field extension hierarchy.
Paper organisation. In Section 2 we expand on the relevant features of CAO.
We then build some intuition for the subsequent formal presentation of the type
system by introducing real-world examples of CAO code in Section 3. In Section 4
we present the CAO type system, including a detailed example of its operation.
In Section 5 we describe our implementation. We conclude with a discussion of
soundness and related work in Sections 6 and 7.
2
A closer look at CAO
Real world examples of the most relevant CAO language features are presented
in Section 3. We now provide an intuitive description of the CAO type system.
Bit strings. The bits type represents a string of n bits (labelled 0 . . . n−1, where
the 0-th is the least-significant bit). This should not be seen as the “bit vector”
type, as the get operator a[i] actually returns type bits[1]. The distinction between
ubits and sbits concerns only the conversion convention to the integer type, which
can be unsigned or two’s complement respectively. The bits type is equipped with
a set of C-like bit-wise operators, including the usual Boolean, shift and rotate
operators, which are closed over the bit-length. The range selection/assignment
(or slicing) operator (..), combined with the concatenation operator @ can be
used to (de)construct bit strings of different sizes using a very concise syntax.
For example, the following is a valid CAO statement over bit strings:
a[3..8] := b[0..2] @ c[2..4];
Integers and the mod type. Operations modulo some prime or composite
integer are used extensively in cryptography [7]; for example, the ring6 Zn underlies the pervasively used RSA function [5], and the finite field7 Fp is widely
used in ECC. Therefore, CAO includes not only arbitrary precision integers as
a native type (int), but also a mod[n] type. For example, the mod[7] type is an
instance of mod with modulus 7. In this case the modulus is prime, and hence
inhabitants of this type are actually elements of a finite field. More generally,
6
7
The ring of residue classes modulo an integer n can be seen as the set of numbers in
the range 0 to n-1 with addition and multiplication modulo n.
The ring of residue classes modulo an integer p is actually a field when p is prime:
all non-zero elements have a multiplicative inverse.
4
the modulus can be prime or composite, provided it is fixed at compile-time.
Algebraic operations over the mod type are closed over the modulus parameter.
Internal representation and Casts. The internal representation of mathematical types is deliberately undefined. The CAO semantics ensures that arithmetic with such values is valid, but makes no guarantee about (and hence disallows access to) their physical representation. Nevertheless, the CAO type system
includes the necessary functionality to access the conceptually natural representation of algebraic types, by supporting appropriate cast operators. For example,
to obtain the representation of a finite field element in mod[p] as an integer value
of the appropriate range, one simply casts it into the int type. To obtain the representation of an arbitrary precision integer, one can cast it into a bit string
of a predetermined size, and so on. Hence, compared to C, a CAO cast is more
explicitly a conversion. Aside from this nuance, the syntax of casts is similar to
C: one specifies the target type in parenthesis, e.g. y := (int) x.
General moduli. An alternative form of the mod type allows defining finite
field extensions, as shown below:
typedef a := mod[ 2 ];
typedef b := mod[ a<X> / X**8 + X**4 + X**3 + X + 1 ];
The type synonym a represents a mod type whose modulus is 2; this is simply the
field F2 . This is used as the base type for a second type synonym b which represents the field F28 . In addition to the base type one also specifies an indeterminate
symbol (in this case X), and an irreducible polynomial in the ring of polynomials
with coefficients in the base type (in this case P (X) = X 8 +X 4 +X 3 +X +1). Intuitively, this declaration defines an implementation of the field based on the referred polynomial ring, with arithmetic defined via standard polynomial algebra
with reductions modulo P (X). To access the coefficients in this representation,
one can cast the value into a vector of elements in the base type.
Matrices. The matrix type represents a 2-dimensional algebraic matrix over
which one can perform addition and multiplication. For this reason, there are
some restrictions on what the base type can be. The matrix type also has an
undefined representation; its size must be fixed at compile-time, but the ordering
of elements in memory (e.g. row-major or column-major order) is a choice that
can be made by the compiler. The matrix type also supports get and range
selection/assignment operations that permit easily (de)constructing matrices of
different sizes.
Vectors. The vector type represents a 1-dimensional generic container of elements of homogeneous type, where each element is referred to by a single index
in the range 0 . . . n − 1, offering selection/assignment, concatenation and rotate
operations similar to the bits type.
5
3
CAO Type System in Action
In this section we present some examples of CAO code taken from the implementation of the NaCl cryptographic library8 that illustrate the validation capacity
of the type checker over real world examples.
The following program fragment was taken from the implementation of the
poly1305 one-time message authentication mechanism [3]. The function receives
two vectors ciu and ru of content type byte, which is an alias for type unsigned
bits[8], and an integer q. It returns a value of type mod1305, an alias for type
mod[2**130-5].
def polyStep(ciu:vector[17] of byte, ru:vector[16] of byte, q:int) : mod1305 {
def r : unsigned bits[16*8]; def ci : unsigned bits[17*8];
r := ru[0]@ru[1]@ru[2]@ru[3]@ru[4]@ru[5]@ru[6]@ru[7]@ru[8]@ru[9]@ru[10]@
ru[11]@ru[12]@ru[13]@ru[14]@ru[15];
ci:= ciu[0]@ciu[1]@ciu[2]@ciu[3]@ciu[4]@ciu[5]@ciu[6]@ciu[7]@ciu[8]@
ciu[9]@ciu[10]@ciu[11]@ciu[12]@ciu[13]@ciu[14]@ciu[15]@ciu[16];
return ((mod1305)ci * (mod1305)r**q); }
The type system must solve the following problems to type the function body.
Firstly, the concatenation of several bit strings must be typed to a single bit
string of the appropriate type and size (and fail if these do not match in assignment). Secondly, the type checker must recognise that the cast to type mod1305
requires the expression on the right to be coerced to type int.
The next program fragment is from the NaCl implementation of hsalsa20 [4].
seq i := 0 to 3 {
x[i+1] := from_littleendian( k[i*4..i*4+3]);
x[i+6] := from_littleendian(in[i*4..i*4+3]);
x[i+11] := from_littleendian( k[i*4+16..i*4+19]); }
...
seq i := 0 to 3 {
out[i*4..i*4+3]
:= to_littleendian(x[5*i]);
out[i*4+16..i*4+19] := to_littleendian(x[i+6]); }
This is a good example of how CAO was fine tuned to provide assistance to
the programmer in what, at first sight, might seem like a surprisingly powerful
validation procedure. Range selection and assignment operators in bit strings,
vectors and matrices may depend on the value of integer expressions, which can
only be formed by literals, constants and basic arithmetic operations that can
be evaluated at compile-time. This might seem just like a pre-processing step
of compilation, were it not for the fact that we are also able to include in these
expressions locally defined constants. Our type system is able to validate that all
range selections (resp. assignments) result in vectors that are compatible with
calls to function from littleendian (resp. return type of function to littleendian).
Finally, the following code snippet is extracted from a CAO implementation
of AES. It shows how our type system is capable of dealing with the complex
mathematical types that arise in cryptographic implementations. In this case we
have a matrix multiplication operation mix * s[0..3,i], where the contents of the
matrices are elements of a finite field extension GF2N.
8
http://nacl.cr.yp.to
6
n
x
fp
dv
dfp
ds
:
:
:
:
:
:
Num
IdV
IdFP
DecV
DecFP
DecS
Numerals
Variable Identifiers
Function and Procedure Identifiers
Variable declarations
Function and Procedure declarations
Struct declarations
pg
e
c
l
pol
t
:
:
:
:
:
:
Progs
Exp
Stm
Lv
Poly
Types
Programs
Expressions
Statements
LValues
Polynomials
Types
e ::= n | true | false | x | −e | e1 † e2 | e.x | e1 [e2 ] | e1 [e2 ..e3 ] |
e1 [e2 , e3 ] | e1 [e2 ..e3 , e4 ..e5 ] |∼ e | (t) e | fp(e1 , ..., en ) | ! e
l ::= x | l.x | l[e] | l[e1 ..e2 ] | l[e1 , e2 ] | l[e1 ..e2 , e3 ..e4 ]
c ::= dv | l1 , ..., li := e1 , ..., ej | c1 ; c2 | if (e) { c1 } | if (e) { c1 } else { c2 } |
while (e) { c } | seq x := e1 to e2 by e3 { c } | seq x := e1 to e2 { c } |
return e1 , ..., en | fp(e1 , ..., en )
dv ::= def x1 , ..., xn : t1 , ..., tn | def x1 , ..., xn : t1 , ..., tn :=e1 , ..., en
ds ::= typedef x := t; | typedef x1 := struct [ def x2 : t1 ; ...; def xn : tn ];
dfp ::= def fp (x1 : t1 , ..., xn : tn ) : rt { c }
rt ::= void | t1 , . . . , tn
t ::= x | int | bool | signed bits [e] | unsigned bits [e] | mod [e] | mod [ t x / pol ] |
vector [n] of t | matrix [n1 , n2 ] of t
pg ::= dv ; | ds | dfp | pg 1 pg 2
Fig. 1: Formal syntax of CAO
typedef GF2 := mod[ 2 ];
typedef GF2N := mod[ GF2<X> / X**8 + X**4 + X**3 + X + 1 ];
typedef S
:= matrix[4,4] of GF2N;
def mix : matrix[4,4] of GF2N :=
{[X],[X+1],[1],[1],[1],[X],[X+1],[1],[1],[1],[X],[X+1],[X+1],[1],[1],[X]};
def MixColumns( s : S ) : S {
def r : S;
seq i := 0 to 3 { r[0..3,i] := mix * s[0..3,i]; }
return r; }
In addition to resolving the matrix size restrictions imposed by the matrix multiplication operation, our type system is able to individually type the finite field
literals in the matrix initialisation, and check that these types are compatible
with the type of the matrix contents. Note that this implies recognising that a
literal of type mod[2] is coercible to GF2N.
4
Formalisation of the CAO Type System
In this Section, we will overview our formalisation of the CAO type system.
Since CAO is a relatively large language, only the most interesting features will
be covered. A full description of the CAO formalisation can be found in [?].
CAO Syntax. The formal syntax of CAO is presented in Figure 1. To simplify
presentation we use † to represent a set of traditional binary operators, namely
† ∈ {+, −, ∗, /, %, ∗∗, &, ˆ, |, ≫, ≪, @, ==, ! =, <, >, <=, >=, ||, &&, ˆˆ}
7
Most of the binary operators are the same as their C equivalents, although
they are overloaded for multiple types. Worth mentioning are the multiplicative
exponentiation operator for integers, residue class groups and fields (∗∗); the
bit-wise conjunction (AND), inclusive- (IOR) and exclusive-disjunction (XOR)
operators (&, | and ˆ respectively); the shift operators for bit strings and vectors
(≫ and ≪); the concatenation operator for bit strings and vectors @; and the
boolean logic exclusive-disjunction (XOR) operator (ˆˆ).
Most of the language syntactic entities, and the accompanying syntax rules,
are also similar to C. Additional domains have been added to this basic set: some
for the sake of a clearer presentation, and others because they are part of CAO’s
domain specific character for cryptography.
4.1
CAO Type System
Function Classification. The type checker is able to automatically classify
CAO functions with respect to their interaction with global variables. The type
checking rules classify functions as either of the following three types:
Pure functions Do not depend on global variables in any way and can only
call other pure functions. These functions are, not only side-effect free, but
also return the same result in every invocation with the same input. This
property is often called referential transparency.
Read-only functions Can read values from global variables, but they cannot
assign values to them. They can call pure functions and other read-only
functions, but not procedures. These functions are side-effect free.
Procedures Can read and assign values from/to global variables. They can call
pure functions, read-only functions and other procedures.
For the CAO type checker, the most important distinction is that between procedures and other functions. Procedures are only admitted in restricted contexts, such as simple assignment constructions. This distinction is completely
automated in the type-checking rules that associate the following total order of
classifiers to CAO constructions: Pure < ReadOnly < Procedure
Put simply, the type checking system enforces the following rules: 1) A construction depending only on local variables is classified as Pure; 2) When reading the value of a global variable, the classifier is set to Read-only; 3) When a
global variable is used in an assignment target, the classifier is set to Procedure;
4) Expressions and statements procedures are classified with respect to their
sub-elements using the maximum operator defined over the total order specified
above. Note that this classification system is conservative in the sense that, for
example, it will fail to correctly classify a function as pure when it reads a global
variable but does not use its value.
Environments, type judgements and conventions. We use symbol τ (possibly with subscripts) to represent an arbitrary (fixed) data type. We write x :: τ
to denote that x has type τ . We use two distinct environments in our type rules:
8
the type environment relation Γ , which collects all the declarations (e.g. variables, function, procedures) together with their associated types; and the constant environment relation ∆, which records the values associated with integer
constants. The Γ environment is partitioned into two relations: ΓG for global definitions and ΓL for local definitions. This distinction is important to deal with
symbol scoping and visibility when typing, for example function declarations.
Whenever this distinction is not important we will just write Γ to abbreviate
ΓG , ΓL . Notation Γ [x :: τ ] is used to extend the environment Γ with a new
variable x of type τ , providing that x is not in the original environment (i.e.,
x 6∈ dom(Γ )). Similarly, ∆[x := n] is used to extend the environment ∆ with a
new constant x with value n, also provided that x is not in the domain of environment ∆. Notation Γ (x) and ∆(x) represent, respectively, the type and the
integer value associated with identifier x, assuming that x belongs to the domain
of the respective environment. Environments are built by order of declaration
in source code, implying that recursive declarations are not possible and that
function classifiers are already known when the functions are called.
We use symbol ⊢ for type judgement of expressions of the form Γ, ∆ ⊢ e ::
(τ, c), retrieving type τ and functional classifier c associated to an expression.
Operator β denotes type judgements of special statements that may modify the
type environment relation: it retrieves not only a typed statement, but also a
new type environment relation. Subscript β (seen as a place-holder) in operator
β represents the return type of the function in which the statement was defined.
This information is particularly useful, allowing the type checker to guarantee
that the several return statements that may appear in a function are always in
accordance with the return type of the corresponding function declaration.
Evaluation of integer expressions. We define a partial function φ∆ to deal
with type parameters such as vector sizes that must be determined at compile
time. This function is used in typing rules to compute the integer value of a
given expression e in context ∆. If this value cannot be determined, then typing
will fail. This function is defined as follows
φ∆ (n) = n
φ∆ (x) = ∆(x), x ∈ dom ∆
φ∆ (−e) = −φ∆ (e)
φ∆ (e1 ∗∗ e2 ) = (φ∆ (e1 ))
φ∆ (e1 † e2 ) = φ∆ (e1 ) † φ∆ (e2 )
(φ∆ (e2 ))
φ∆ (e1 % e2 ) = φ∆ (e1 )
mod φ∆ (e2 )
for † ∈ {+, −, ∗, /}. When evaluating integer expressions in typing rules, we write
...
φ∆ (e) = n
Γ, ∆ ⊢ . . .
...
to mean
...
Γ, ∆ ⊢ e :: (Int, Pure) φ∆ (e) = n
Γ, ∆ ⊢ . . .
...
which implicitly implies that expression e is of integer type.
Data types. In Section 2, types were informally described using CAO syntax
for type declarations. Here we will distinguish between a type declaration and
the type it refers to in our formalisation. We use upper case to indicate the CAO
data types shown in Table 1. An important difference is that the CAO grammar
allows any expression as a parameter of a type declaration, while CAO types
9
Table 1: CAO data types.
Bool
Int
UBits [i]
SBits [i]
Mod [n]
Mod [τ /pol ]
Vector [i] of τ
Matrix [i, j] of α
Booleans
Arbitrary precision integers
Unsigned bit strings of length i
Signed bit strings of length i
Rings or fields defined by integer n
Extension field defined by τ /pol
Vectors of i elements of type τ
Matrices of i × j elements of type α ∈ A
A = {Int, Mod [m], Matrix [i, j] of α | α ∈ A}
must have parameters of the correct type and with a fully determined value,
e.g., sizes must be integer values. In Table 1, A denotes the set of algebraic
types, which are the only ones that can be used to construct matrices. These are
types for which addition, multiplication and symmetric operators are closed. In
order to emphasise occurrences where the type must be algebraic, we will use α
(possibly with subscripts) instead of τ .
Type translation. To deal with the type parameters informally described in
Section 1, we introduce a new judgement that makes the translation between type
declaration in the CAO syntax and types used in the type checking process. This
judgement, of the form ∆ ⊢t t
τ , depends only on the environment ∆, which
can in turn be used to determine the values of expressions that only depend on
constants. This accounts for the fact that, during type checking, types must have
their parameters fully determined, while type declarations in CAO can depend
on arithmetic expressions using constants stored in the environment ∆. Hence
the translation judgement uses evaluation function φ∆ to compute parameter
expressions in the declaration of bit string, vector and matrix sizes, ensuring
that no negative or zero sizes are used. The evaluation function is also used in
modular types with integer modulus to determine its value and ensure that it is
meaningful (i.e., greater than 1). We present only part of this definition below.
φ∆ (e) = n
∆ ⊢t unsigned bits [e]
UBits[n]
n≥1
φ∆ (e) = n
n≥2
∆ ⊢t mod [e]
Mod[n]
φ∆ (e) = n ∆ ⊢t t
τ
Γ, ∆ ⊢t vector [e] of t
Vector [n] of τ
φ∆ (e1 ) = n φ∆ (e2 ) = m ∆ ⊢t t
α
∆ ⊢t matrix [e1 , e2 ] of t
Matrix [n, m] of α
n≥1
α ∈ A, n ≥ 1, m ≥ 1
Type coercions. Type coercions are essentially implicit (typically data preserving) type conversions, whereby the programmer is allowed to use terms of
some type whenever another type is expected. In CAO, this mechanism is remarkably useful, for example when dealing with field extensions (cf. the third
rule in Table 2), since a field can be seen as a subtype of all its field extensions.
In general, when a CAO type τ1 is coercible to another type τ2 , then the set of
10
Table 2: Type coercion relation, ⊢≤ t1 ≤ t2
t1
UBits[n]
SBits[n]
τ
Vector[n] of τ1
Matrix [i, j] of α1
t2
Int
Int
Mod[τ ′ /pol ]
Vector[n] of τ2
Matrix [i, j] of α2
Condition
⊢≤ τ ≤ τ ′
⊢≤ τ 1 ≤ τ 2
⊢≤ α1 ≤ α2 and α1 , α2 ∈ A
Table 3: A few cases for the cast relation, ⊢c t1 ⇒ t2 .
t1
Int
Int
Vector [i] of τ1
Mod [τ1 /pol ]
Matrix [1, j] of α
Vector [i] of τ
Vector [i] of τ1
Matrix [i, j] of α1
t2
Bits [i]
Mod [n]
Mod [τ2 /pol ]
Vector [i] of τ2
Vector [j] of τ
Matrix [i, 1] of α
Vector [i] of τ2
Matrix [i, j] of α2
Condition
⊢c
⊢c
⊢c
⊢c
⊢c
⊢c
τ1 ⇒ τ2 and i = degree(pol )
τ1 ⇒ τ2 and i = degree(pol )
α ⇒ τ and α ∈ A
τ ⇒ α and α ∈ A
τ1 ⇒ τ2
α1 ⇒ α2 and α1 , α2 ∈ A
values in τ1 can be seen as a subset of the values in τ2 . For example, all bitstrings of a given size can be coerced to the integer type. We define a coercion
relation ≤, associated with a new kind of judgement ⊢≤ . Coercions are naturally
reflexive, and Table 2 summarises the other possible coercions.
Often the arguments of an operation have different types but are coercible to
a common type, or one is coercible to the other. In order to capture this situation,
we define the ↑ operator on types, which returns the least upper bound of the
types to which its arguments are coercible:
τ1 ↑ τ2 = min{τ | ⊢≤ τ1 ≤ τ and ⊢≤ τ2 ≤ τ }
This requires that the coercion relation ≤ is regarded as a partial order on types,
thus requiring the reflexivity, transitivity and anti-symmetry properties to hold.
As we have seen before, the coercion relation is reflexive; the transitivity and
anti-symmetry requirements are also easy to add and well suited to our intuitive
notion of coercion. With these properties in place, and for the particular set of
coercions allowed in CAO, we have that τ1 ↑ τ2 is always uniquely defined. In
typing rules, we therefore abbreviate the following pattern
...
Γ, ∆ ⊢ e :: τ1 ⊢≤ τ1 ≤ τ2
Γ, ∆ ⊢ . . .
...
by
...
Γ, ∆ ⊢ e ≤ τ2
Γ, ∆ ⊢ . . .
...
.
Casts. The CAO language includes a cast mechanism that allows for explicitly
converting values from one type to another. However, not all casts are possible:
the set of admissible type cast operations has been carefully designed to account
for those conversions that are conceptually meaningful in the mathematical sense
and/or are important for the implementation of cryptographic software in a natural way. We define a type cast relation ⇒, which is associated with a new kind of
11
judgment ⊢c . Table 3 shows the part of the definition of the cast relation. Using
the cast relation, we only have to provide one typing rule for cast expressions.
⊢≤ τ 1 ≤ τ 2
⊢c τ 1 ⇒ τ 2
∆ ⊢t t
τ
Γ, ∆ ⊢ e :: (τ ′ , c) ⊢c τ ′ ⇒ τ
Γ, ∆ ⊢ (t) e :: (τ, c)
The additional rule on the left is needed so that coercions can be made explicit,
which also implies that a certain type can be cast to itself.
Sizes of bit strings, vectors and matrices. Since type declarations are
mandatory and container types have explicit sizes, we can use these to verify if
operations deal consistently with these sizes. Furthermore, the type system can
feed this information to subsequent components in the CAO tool chain.
For instance, the operation that concatenates two vectors should return a
new vector whose size is the sum of the sizes of the individual vectors, and
whose type is the least upper bound of the types of the two vectors, with respect
to the coercion ordering ≤:
Γ, ∆ ⊢ e1 :: (Vector[i] of τ1 , c1 ) Γ, ∆ ⊢ e2 :: (Vector[j] of τ2 , c2 )
Γ, ∆ ⊢ e1 @ e2 :: (Vector[i + j] of τ, max(c1 , c2 ))
τ1 ↑ τ2 = τ
The concatenation of bit strings is similar. Moreover, in the case of matrix algebraic operations, e.g. multiplication, the dimension of the matrices can be
checked for correctness.
When range selection is used over bit strings, vectors or matrices, we require
that the integer expressions must be evaluated at compile-time so that the size
of the expression, and therefore its type can be determined. In this case, the
limits of the range are compared against the bounds of the associated type. For
instance, for a range access to a vector we have:
Γ, ∆ ⊢ e :: (Vector[k] of τ, c) φ∆ (e1 ) = i φ∆ (e2 ) = j
k > j, j ≥ i ≥ 0
Γ, ∆ ⊢ e[e1 ..e2 ] :: (Vector[j − i + 1] of τ, c)
This is also a limited form of dependent typing since the type associated with
the expression depends on the expression itself.
Rings, Finite Fields and Extensions. One of the most unusual features of
the CAO language is the support for ring and finite field types and their possible
extensions. Our type checking rules allow us to ensure that operations over values
of these types are well-defined and that values from different (instances of these)
types are not being erroneously mixed due to programming errors. For instance,
the typing rule for division is:
Γ, ∆ ⊢ e1 :: (Mod [m1 ], c1 )
Γ, ∆ ⊢ e2 :: (Mod [m2 ], c2 ) Mod [m1 ] ↑ Mod [m2 ] = Mod [m]
Γ, ∆ ⊢ e1 / e2 :: (Mod [m], max(c1 , c2 ))
The use of the least upper bound captures the fact that the types may be equal,
or one may be an extension of the other.
Variables and function calls. The classification of expressions depends
on the environment accessed when retrieving the value of a variable. If a local
12
variable is accessed, the code is considered pure; if a global variable is read, the
code is classified as read-only.
ΓG (x) = τ
ΓG , ΓL , ∆ ⊢ x :: (τ, ReadOnly)
ΓL (x) = τ
ΓG , ΓL , ∆ ⊢ x :: (τ, Pure)
x ∈ dom(ΓG )
x ∈ dom(ΓL )
Since in expression, we can only use functions that do not cause side-effects, the
typing rule for function application has a side condition to ensure that the body
of the function is not a procedure (i.e., it does not modify a global variable):
ΓG (f ) = ((τ1 , . . . , τn ) → τ, c)
ΓG , ΓL , ∆ ⊢ e1 ≤ (τ1 , c1 ) . . . ΓG , ΓL , ∆ ⊢ en ≤ (τn , cn )
ΓG , ΓL , ∆ ⊢ f (e1 , . . . , en ) :: (τ, max(c, c1 , . . . , cn ))
max(c, c1 , . . . , cn ) < Procedure and f ∈ dom(ΓG )
Functions, procedures and statements. We introduce symbol • as a possible (empty) return type to detect misuses of the return statement. We distinguish
the cases when a block has explicitly executed a return statement from the cases
where no return statement has been executed. In the former case we take the
type of the parameter passed to the return statement or • if no such parameter
exists. In the latter case we also use the • symbol. Thus, a return statement is
typed with the same type as its argument, which must coincide with the expected
return type for the block.
Γ, ∆
Γ, ∆ ⊢ e1 ≤ (τ1 , cc1 ) . . . Γ, ∆ ⊢ en ≤ (τn , ccn )
(τ1 ,...,τn ) return e1 , . . . , en :: ((τ1 , . . . , τn ), max(cc1 , . . . , ccn ), Γ )
Since CAO has a call-by-value semantics, returning multiple values is allowed in
order to make references or additional structures unnecessary.
The typing rule for a function definition therefore verifies if the type of its
body is not • to ensure that a return statement was used to exit the function.
Moreover, the returned type has to be equal (or coercible) to the declared type
(recall the use of judgement τ ).
The seq statement permits iterating over an integer variable varying between
two statically determined bounds. The index starts with the value of the lower
(resp. upper) bound and at each step is incremented (resp. decremented) by
the amount of the step value until it reaches the upper (resp. lower) bound.
The interesting feature of this mechanism is that the iterator is regarded as a
constant at each iteration step. In the typing rules, this allows us to add the
index and its respective value to the environment ∆ at each iteration:
φ∆ (e1 ) = i φ∆ (e2 ) = j ∀n∈{i...j} ΓG , ΓL [x :: Int], ∆[x := n]
ΓG , Γ L , ∆
τ
τ
c :: (ρ, cc, ΓG′ , ΓL′ )
seq x := e1 to e2 { c } :: (•, cc, ΓG , ΓL )
ρ ∈ {τ, •}, x 6∈ dom ΓL , i ≤ j
Therefore, declarations and access expressions inside the body of the sequence
statement may depend on the index but may still be statically typeable. As highlighted in Section 3, the combination of range selection and assignment operators
13
for bit strings, vectors and matrices with this simplified loop construction is a
good example of how the CAO language design allowed us to fine tune the type
checker to provide extra assistance to the programmer. Note, however, that sequential statements can make the type checking process slow, as sequences must
be explicitly unfolded and typed for each possible value of the iterator.
A Detailed Example. We now present a detailed example of the how our type
system handles the hsalsa20 fragment introduced in Section 3. The syntactic form
of the program is
seq i := 0 to 3
x[i+1] :=
x[i+6] :=
x[i+11] :=
{
from_littleendian( k[i*4..i*4+3]);
from_littleendian(in[i*4..i*4+3]);
from_littleendian( k[i*4+16..i*4+19]); }
where we desire type annotations for each node in the parse tree. The inference
process traverses the tree matching rules against syntax. This traversal highlights
aspects of the inference at three levels in the tree. Before reaching this fragment
the declarations have already been produced and thus the initial environment is
ΓL = {k :: Vec[32] of UBits[8], in :: Vec[16] of UBits[8], x :: Vec[8] of UBits[32]}
ΓG = {to littleendian :: UBits[32] → Vec[4] of UBits[8],
from littleendian :: Vec[4] of UBits[8] → UBits[32]}
∆ = {}
The first step matches the entire fragment against seq i := 0 to 3 {s1 ; s2 ; s3 }
∀n∈{0...3} ΓG , ΓL [i :: Int], ∆[i := n]
ΓG , ΓL , ∆
τ
τ
c :: (ρ, cc, ΓG′ , ΓL′ )
seq i := 0 to 3 {s1 ; s2 ; s3 } :: (•, cc, ΓG , ΓL )
This entails, for each of the n ∈ {0, 1, 2, 3} cases, that for assignments (li :=ri ) =
si in each of the s1 , s2 , s3 preconditions, each statement is matched by
Γn , ∆n ⊢ li :: (τ, cl) Γ, ∆n ⊢ ri ≤ (τ, c)
Γn , ∆n |=τ li := ri :: (•, max(cl, c), Γ )
Here Γn = ΓG , ΓL [i :: Int] and ∆n = ∆[i := n]. Now, for each of the li we obtain
something of the form x[i + 1] where ΓL (x) = Vec[8] of UBits[32] and an index
expression i + 1 :: (Int, Pure), thus we can match
Γn , ∆n ⊢ x :: (Vec[8] of UBits[32], Pure) Γn , ∆n ⊢ i + 1 ≤ (Int, Pure)
Γn , ∆n ⊢ x[i + 1] :: (UBits[32], max(Pure, Pure))
Finally, for each of the ri the function parameter ei is either ΓG [k] or ΓG [in] ::
Vec[16] of UBits[8], Furthermore, the index expression is defined only over i,
whose value is known, and integer literals. Thus each expression of the form
k[i ∗ 4..i ∗ 4 + 3] becomes a slice over determined indices after application of φ∆
and k[i ∗ 4..i ∗ 4 + 3] :: (Vec[4] of UBits[8], Pure). Hence
ΓG (from littleendian) = (Vec[4] of UBits[8] → UBits[32], Pure)
ΓG , ΓL , ∆1 ⊢ k[i ∗ 4..i ∗ 4 + 3] ≤ (Vec[4] of UBits[8], Pure)
ΓG , ΓL [i :: Int], ∆1 ⊢ from littleendian(k[i ∗ 4..i ∗ 4 + 3]) :: (UBits[32], max(Pure, Pure))
14
5
Implementation
The CAO type-checker was fully implemented in the Haskell functional language,
which provides a plethora of libraries and built-in language features. Among
these, we found some to be particularly useful, such as classes, specific syntax
for handling monadic data types and the monad Error data type. These Haskell
assets, not only simplified the implementation process, but also helped improving
substantially the readability of the code and its comparison with the formal
specification of the type checking rules described in the previous section.
To generally illustrate Haskell’s ability to deal with the formal type checking rules that we specified in the previous section, consider the following code
snippet, which implements the rule for type checking CAO while statements.
tcStatement s@(WhileStatement info cond wstms) h rt =
do (cond’, condt, cb) <- tcExp cond h
checkMatchType info condt Boolean
(wstms’, wst, cc, h’) <- tcStatements wstms h rt
return (mkWhileStatement (buildTcNodeInfo info Bullet)
cond’ wstms’, Bullet, max cb cc ,h)
The interpretation of the above code is quite immediate. Function tcStatement
is our formal statement type checking function |=, rt represents the expected
return type, which in the formal definition subscripts |= and h corresponds to
the type environments Γ and ∆. Note that, even though we have made clear the
distinction between Γ and ∆ in the formal rules, this was mainly justified by
presentational reasons. Still on the arguments side, one finds (WhileStatement
info cond wstms), trivially matching while b {c}, except for the info identifier,
which is an add-on of the implementation for storing the exact place where the
CAO code being analysed appears in the input file.
Regarding the function body, in accordance to the formal rule, which relies
on premises referring to ⊢ and |=, so does the implementation, referring to functions tcExp and tcStatements respectively. Here, however, one resorts to Haskell’s
monadic operator <- over the monad Error data type. In this way we combine
calls to different type checking functions that may return type checking errors,
ensuring that if an error occurs in one of the calls, the error is propagated down
to the end of the type checker execution, without interfering with any other type
checking rule in between.
Function checkMatchType corresponds to our order comparison operator ≤
over data types, while Bullet is our functional representation of symbol •. Function max ensures that type classifiers, which allow the type system to recognise
various types of functions, are properly propagated. Instead of returning the
type of the expression being evaluated, the implementation returns the expression received annotated with its type, to be used by subsequent compilation
steps. Nevertheless, the above rule implementation illustrates how we have kept
the implementation reasonably close to the formal definition, therefore favoring
a direct validation by inspection of the implementation.
6
Soundness of the Type System
As usual, the CAO type system aims to ensure that “well-typed programs do
not go wrong” [8]. This is formalised as a soundness theorem relating static
15
(typing) and dynamic semantics. For the moment, our result only ensures that
the evaluation of well-typed program does not fall into a certain class of errors:
formally, we are proving a weak soundness theorem. Concretely, we have shown
that only a well-defined set of run-time errors (trapped errors, denoted by ǫ in the
semantic domain V) can occur when evaluating a correctly typed program. These
are explicitly captured in the semantics of the language, and they are limited to
divisions by zero and out of bounds accesses to containers. In this Section, we first
shortly present some aspects of our formalization of the CAO semantics necessary
to provide support to the subsequent discussion of our soundness theorem and
proof sketch. The complete description of both can be found in [?].
CAO Semantics Evaluation of a CAO program is defined by an evaluation
relation that relates an initial configuration (a CAO program together with a
description of the initial state) with a final configuration (a semantic value and
a final state). The domain of semantic values is defined as a solution of the
domain equation V = Z + V⋆ + E, where Z denotes the domain of integers,
V⋆ denotes sequences of values of type V of the form [v0 , . . . , vn−1 ] and E is
the type of the trapped error value ǫ. A trapped error is an execution error
that results in an immediate fault (run-time error); an untrapped error is an
execution error that does not immediately result in a fault, corresponding to an
unexpected behavior. We denote such an error by ⊥ (considering the lift version
of the semantic domain V⊥ ).
We define three mutually recursive evaluation relations, each of them responsible for characterising the evaluation of different syntactic classes: expressions,
statements and declarations:
– h e | st i → r evaluates expression e in state st to the value r. Expression
evaluation is side-effect free, and hence the state is not changed.
– h c | st i ⇒ h r | st′ i means that the evaluation of statement c in state st
transforms the state into st′ , and (possibly) produces result r.
– h d | st i ⇛ h st′ i means that the evaluation of declaration d in state st
transforms the state into st′ .
CAO has a call by value semantics, where there are no references and each variable identifier denotes a value. Assignments mean that old values are replaced by
the new ones in the state. Since expressions are effect-free, simultaneous value
assignments are possible.
Var
Assign
h y | st i → st(y)
h e1 | st i → r1
···
h en | st i → rn
h l1 , ..., ln := e1 , ..., en | st i ⇒ h • | st[r1 ...rn /l1 ...ln ] i
In CAO, a run-time trapped error can occur only in three cases: 1) accessing
a vector, matrix or bit string out of the bounds; 2) division (or remainder of
division) by zero; and 3) assigning a value to a vector, matrix or bit string out
of bounds. We present the rules for the first two cases below.
16
VSel
VSel-Err
h ea | st i → [r0 , ..., rn−1 ]
h ei | st i → i
h ea[ei] | st i → at(i, [r0 , ..., rn−1 ])
h ea | st i → [r0 , ..., rn−1 ]
h ei | st i → i
h ea[ei] | st i → ǫ
Div-Zero
h e1 | st i → r1 h en | st i → 0
h e1 / e2 | st i → ǫ
Rem-Zero
h e1 | st i → r1 h en | st i → 0
h e1 % e2 | st i → ǫ
0≤i<n
i<0∨i≥n
where function at returns the n-th element of a sequence. Range accesses actually
cannot cause trapped errors, as the type system enforces that the limits must
be statically defined in order to determine the size of the result, which means
that such errors can be detected. Trapped errors are propagated throughout
evaluation rules, i.e., whenever a premiss evaluates to E the overall rule also
evaluates to E. All cases that fall outside of our semantic rules are implicitly
evaluated to untrapped errors (⊥ value).
Soudness theorem and proof sketch Our result is stated in the following
theorem, where ⊢ ρ :: ΓG denotes consistency and ◦ denotes empty store/state.
Theorem 1. Given a program p if ◦, ◦, ◦ ⊢ p :: (•, ΓG ) and h p | ◦ i ⇛ h ρ i
terminates, then ⊢ ρ :: ΓG or ρ is an error state.
Proof (Sketch). The proof is by induction on type judgements. The base case for
induction is that prior to execution, every type-checked program has an initial
evaluation environment that is (trivially) consistent with the typing environment. Here, consistency means that all variables in the evaluation environment
have associated values compatible with their corresponding type in the typing
environment. The inductive cases are considered for each transition defined in
the semantics of the language. In each case we show that one of two cases can
occur: 1) either a consistent environment is produced at the end of each transition; or 2) a trapped error has been generated and is returned by the program
along with a (still consistent) evaluation environment. We present two cases, illustrating how the proof proceeds for expressions that may raise trapped errors.
Division Expressions. We have to prove that if h e1 ⊘ e2 | ρ i → v terminates
then v ∈ V\E. Two semantic rules can be applied for each operator, one in the
case of division by 0; the other in the general case:
– If h e1 | ρ i → v1 and h e2 | ρ i → 0 terminate, then h e1 /e2 | ρ i evaluates
to ǫ ∈ V by semantic Div-Zero.
– If h e1 | ρ i → v1 and h e2 | ρ i → v2 terminate, with v2 6= 0, then
h ||e1 /e2 || | ρ i evaluates to [[/]][v1 , v2 ] by semantic rule Op. Here [[/]] gives
the interpretation of the / operator with respect to the values v1 and v2 . By
induction hypothesis, v1 and v2 are in the semantic domain V, corresponding
17
to representations of integer values. Since division is well-defined for integer
representations, then [[/]][v1 , v2 ] evaluates to another value v which is again
a representation of an integer and hence v ∈ V\E.
Bit string Access Expressions. We have to prove that if h e1 [e2 ] | ρ i → v
terminates then v ∈ V\E. In bit string accesses, two semantic rules are applicable, VSel and VSel-Error, depending on the validity of the index value:
– If h e1 | ρ i → [r0 , . . . , rn−1 ] and h e2 | ρ i → i terminate, for 0 ≤ i < n,
then h e1 [e2 ] | ρ i evaluates to at(i, [r0 , . . . , rn−1 ]) by semantic rule VSel,
where [r0 , . . . , rn−1 ] represents a bit string with r0 ∈ V\E, . . . rn−1 ∈ V\E,
and i ∈ V\E represents an integer, both by induction hypothesis. This case
follows from the observation that the at function only fails if the integer index
is outside the range of positions of the argument sequence and by hypothesis
this cannot happen. Therefore, the returned value always belongs to V\E.
– If h e1 | ρ i → [r0 , . . . , rn−1 ] and h e2 | ρ i → i terminate, whenever i <
0 ∨ i ≤ n, by VSel-Error, h e1 [e2 ] | ρ i evaluates to ǫ which belongs to the
semantic domain and hence this case also holds.
7
Related Work
Cryptol [6] is a domain-specific language and tool suite developed for the specification and implementation of cryptographic algorithms. It is a functional DSL
without global state or side-effects, which was developed with the main purpose
of producing formally verified hardware implementations of symmetric cryptographic primitives such as block ciphers and hash functions. CAO is an imperative language that targets a wider application domain, although also restricted to
cryptography. Indeed, the CAO language features have been designed to permit
expressing, not only symmetric but also asymmetric cryptographic primitives, in
a natural way. Furthermore, CAO tools are released under an open-source policy.
Dependent types offer a powerful approach to ensure program properties.
However, this power in not incorporated in any of the mainstream languages,
while the prototypical languages that do it are mostly functional. The first prototype of an imperative language to use dependent types was Xanadu [9], allowing,
e.g., to statically verify that accesses to arrays are within bounds. So far, CAO
offers a modest form of dependent types, where all type parameters values must
be statically known. Ongoing work aims extend CAO with a more powerful approach to dependent types inspired by [9]. This new version of the type system
allows for symbolic parametrisation, dropping the requirement that all sizes are
known at compilation, using an SMT solver to handle associated constraints.
The use of Generalized Algebraic Data Types (GADTs) in Haskell, together
with type families and existential types, allows the implementation of embedded
DSL’s with some dependent typing features. Moreover, since this approach relies
on Haskell’s type system, this permits avoiding the full implementation of a type
checker. CAO does not follow this embedded approach because it would make it
18
harder to preserve characteristics of the language that pre-date this work. For example, the CAO syntax tries to follow the cryptographic specification standards,
and GADTs would impose their own syntax, which is targeted at building combinator systems. One could of course try to use a GADT-based intermediate
representation, but it is not clear that this would pay out in terms of the global
implementation effort. In particular, we anticipate that dealing with coercions
and casts would complicate the type checking apparatus [?]. Moreover, it would
probably be difficult using an embedded approach to keep the implementation
structure close to the formal specification.
The use of an embedded implementation in a dependently typed language,
e.g. Coq or Agda, could also be an option for the implementation of our type system. However, this would suffer from the same drawbacks previously presented
for GADTs, and would also require specific expertise that are not realistic to
assume in the target audience for CAO. The need to reason about the correctness and termination of CAO programs at this level would also be an overkill for
most applications. In the CAO tool-chain, this sort of analysis is enabled by an
independent deductive formal verification tool called CAOVerif.
8
Conclusion
CAO is a language aimed at closing the gap between the usual way of specifying cryptographic algorithms and their actual implementation, reducing the
possibility of errors and increasing the understanding of the source code. This
language offers high-level features and a type system tailored to the implementation of cryptographic concepts, statically ruling out some important classes of
errors. In this paper, we have presented a short overview of CAO and the specification, validation and implementation of a type-system designed to support the
implementation of front-ends for CAO compilation and formal verification tools.
References
1. Barbosa, Manuel (editor). CACE Deliverable 5.2: Formal Specification Language
Definitions and Security Policy Extensions, June 2009.
2. Barbosa, Manuel (editor). CAO Formal Specification, November 2010. http://
www3.di.uminho.pt/~mbb/FSENSubmission/.
3. D. J. Bernstein. The Poly1305-AES message-authentication code. In H. Gilbert
and H. Handschuh, editors, FSE, volume 3557 of LNCS. Springer, 2005.
4. D. J. Bernstein. Cryptography in NaCl, 2009. http://nacl.cr.yp.to.
5. J. Jonsson and B. Kaliski. Public-Key Cryptography Standards (PKCS) #1: RSA
Cryptography Specification Version 2.1, 2003.
6. J. Lewis. Cryptol: specification, implementation and verification of high-grade cryptographic applications. In FMSE ’07, page 41. ACM, 2007.
7. A. J. Menezes, S. A. Vanstone, and P. C. V. Oorschot. Handbook of Applied Cryptography. CRC Press, Inc., Boca Raton, FL, USA, 1996.
8. R. Milner. A theory of type polymorphism in programming. Journal of Computer
and System Sciences, 17:348–375, Aug. 1978.
9. H. Xi. Imperative programming with dependent types. In LICS, pages 375–387,
2000.