Academia.eduAcademia.edu

Semantics-based generation of verification conditions via program specialization

2017, Science of Computer Programming

ISTITUTO DI ANALISI DEI SISTEMI ED INFORMATICA CONSIGLIO NAZIONALE DELLE RICERCHE E. De Angelis, F. Fioravanti, A. Pettorossi, M. Proietti SEMANTICS-BASED GENERATION OF VERIFICATION CONDITIONS VIA PROGRAM SPECIALIZATION R. 9 2016 Emanuele De Angelis – DEC, University “G. d’Annunzio”, Pescara, Italy, and Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti” del CNR, Via dei Taurini 15, I-00185 Roma, Italy. Email : deangelis@sci.unich.it. URL: http://www.sci.unich.it/~deangelis. Fabio Fioravanti – DEC, University “G. d’Annunzio”, Pescara, Italy, and Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti” del CNR, Via dei Taurini 15, I-00185 Roma, Italy. Email : fioravanti@sci.unich.it. URL: http://www.sci.unich.it/~fioravan. Alberto Pettorossi – DICII, University “Tor Vergata”, Roma, Italy, and Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti” del CNR, Via dei Taurini 15, I-00185 Roma, Italy. Email : adp@iasi.cnr.it. URL : http://www.iasi.cnr.it/~adp. Maurizio Proietti – Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti” del CNR, Via dei Taurini 19, I-00185 Roma, Italy. Email : maurizio.proietti@iasi.cnr.it. URL : http://www.iasi.cnr.it/~proietti. ISSN: 1128–3378 Collana dei Rapporti dell’Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti”, CNR, via dei Taurini 19, 00185 ROMA, Italy tel. ++39-0649931 fax ++39-0649937106 email: iasi@iasi.cnr.it URL: http://www.iasi.cnr.it Semantics-based generation of verification conditions via program specialization✩ E. De Angelisa,c,∗, F. Fioravantia,c,∗, A. Pettorossib,c,∗, M. Proiettic,∗ a DEC, University ”G. d’Annunzio” of Chieti-Pescara, Viale Pindaro 42, 65127 Pescara, Italy University of Rome Tor Vergata, Via del Politecnico 1, 00133 Roma, Italy c CNR-IASI, Via dei Taurini 19, 00185 Roma, Italy b DICII, Abstract We present a method for automatically generating verification conditions for a class of imperative programs and safety properties. Our method is parametric with respect to the semantics of the imperative programming language, as it generates the verification conditions by specializing, using unfold/fold transformation rules, a Horn clause interpreter that encodes that semantics. We define a multi-step operational semantics for a fragment of the C language and compare the verification conditions obtained by using this semantics with those obtained by using a more traditional small-step semantics. The flexibility of the approach is further demonstrated by showing that it is possible to easily take into account alternative operational semantics definitions for modeling additional language features. We have proved that the verification condition generation takes a number of transformation steps that is linear with respect to the size of the imperative program to be verified. Also the size of the verification conditions is linear with respect to the size of the imperative program. Besides the theoretical computational complexity analysis, we also provide an experimental evaluation of the method by generating verification conditions using the multi-step and the small-step semantics for a few hundreds of programs taken from various publicly available benchmarks, and by checking the satisfiability of these verification conditions by using state-of-the-art Horn clause solvers. These experiments show that automated verification of programs from a formal definition of the operational semantics is indeed feasible in practice. Keywords: constraint logic programming, Horn clauses, program verification, program specialization, semantics of programming languages, verification conditions, software model checking. 1. Introduction A well-established technique for the verification of program correctness relies on the generation of suitable verification conditions (VCs, for short) starting from the program code [2, 11, 29]. Verification conditions are logical formulas whose satisfiability implies program correctness, and the satisfiability check can be performed, if at all possible (because, in general, the problem of verifying program correctness is undecidable), by using special purpose theorem provers or Satisfiability Modulo Theories (SMT) solvers [4, 16]. Recently, constrained Horn clauses have been proposed as a common encoding format for software verification problems, thus facilitating the interoperability of different software verifiers, and efficient solvers have been made available for checking the satisfiability of Horn-based verification conditions [4, 10, 16, 28, 32]. The notion of a constrained Horn clause we use in this paper is basically equivalent to the notion of a Constraint Logic Programming (CLP) clause [33]. The choice of either terminology depends on the context of use. Constraints are assumed to be formulas of any first order theory. ✩ This paper is an extended, improved version of [12]. author Email addresses: emanuele.deangelis@unich.it (E. De Angelis), fabio.fioravanti@unich.it (F. Fioravanti), adp@iasi.cnr.it (A. Pettorossi), proietti@iasi.cnr.it (M. Proietti) ∗ Corresponding Preprint submitted to Elsevier December 19, 2016 Typically, verification conditions are automatically generated, starting from the programs to be verified, by using verification condition generators. These generators are special purpose software components that implement algorithms for handling the syntax and the semantics of both the programming language in which programs are written and the class of properties to be verified. A VC generator takes as input a program written in a given programming language, and a property of that program to be verified, and by applying axiomatic rules à la Floyd-Hoare, it produces as output a set of verification conditions. Having built a VC generator for a given programming language, to build a new VC generator for programs written in an extension of that language, or a different programming language, or even for programs written in the same language syntax but with a different language semantics (for instance, the big-step semantics, instead of the smallstep semantics [52]) requires the design and the implementation of a new, ad hoc VC generator. In this paper we present a method for generating verification conditions which is based on a CLP encoding of the operational semantics of the programming language and on the CLP program specialization technique which uses the unfold/fold transformation rules. The use of CLP program specialization for analyzing programs is not novel. Peralta et al. [49] have used it for analyzing simple imperative programs and Albert et al. [1, 27] for analyzing Java bytecode. In a previous work of ours [11] VCs are generated from a small-step semantics, for verifying imperative programs using iterated specialization. Here we extend and further develop the VC generation technique based on CLP specialization, and we demonstrate its generality and flexibility by showing that suitable customizations of the CLP specialization strategy are able to effectively deal with a multi-step semantics and several variants thereof. By the term multi-step semantics, which is folklore, we mean a hybrid between small-step semantics and big-step semantics, where: (i) the execution of each command, different from a function call, is formalized as a one-step transition from a state to the next one, and (ii) the execution of a function call is formalized as a sequence of one-step transitions from the state where the function is called to the one where the function evaluation terminates. We also show the scalability of our technique for VC generation through both a theoretical complexity analysis and the results we have achieved by using our implementation. Finally, we show, in an empirical way, that our specialization strategy returns VCs which are of high quality, in the sense that these VCs can effectively be handled by state-of-the-art solvers for checking the satisfiability of Horn clause verification conditions [4, 16, 28, 32]. Actually, some solvers perform better on VCs generated by specialization, than on VCs generated by ad hoc algorithms. Our verification method can be described as follows. Given an imperative program P and a property ϕ to be proved, we construct a CLP program I, which defines a nullary predicate unsafe such that P satisfies the property ϕ if and only if the atom unsafe is not derivable from I. The construction of the CLP program I depends on the following parameters: (i) the imperative program P, (ii) the operational semantics of the imperative language in which P is written, (iii) the property ϕ to be proved, and (iv) the logic that is used for specifying ϕ (in this case, the reachability of an unsafe state, that is, a state where ϕ does not hold). The verification conditions are obtained by specializing program I with respect to its parameters. This specialization process is performed by applying semantics-preserving unfold/fold transformation rules [17]. The application of these rules is guided by a strategy particularly designed for verification condition generation, called the VCG strategy. Thus, the correctness of the verification conditions follows directly from the correctness of the unfold/fold transformation rules that are applied during program specialization. When we perform the specialization of the CLP program I, we get the effect of removing from I the overhead due to the level of interpretation which is present in I because of the clauses defining the operational semantics of the imperative programming language. This specialization, called the removal of the interpreter in this paper, realizes the first Futamura projection, which is a well-known operation in the program specialization literature [35]. Indeed, by the first Futamura projection we have that the specialization of an interpreter written in a language L (CLP, in our case) with respect to a source program (written in C, in our case) has the effect of compiling the source program into L. The removal of the interpreter drastically simplifies program I by getting rid of the complex terms (including lists) that encode the commands of the imperative language and their operational semantics. Then, the simplified program, derived after the removal of the interpreter, is handled by using special purpose automatic tools for Horn clauses with linear integer arithmetic constraints [4, 16, 28, 32]. In our approach, similarly to what is done in other papers [6, 44, 48], we use a formal representation of the operational semantics of the language in which the imperative programs are written, as an explicit parameter of the verification problem. One of the most significant advantages of this technique is that it enables us to design widely 4 applicable VC generators for programs written in different programming languages, and for different operational semantics of languages with the same syntax, by making small modifications only. An Introductory example Let us consider the program sum upto in Figure 1. The final value of the program variable z is the sum of the integers from 1 to the initial value of the program variable x (which equals the final value of x, as that value never changes). The semantics of the program can be viewed as a relation between an initial configuration, where the global program variables x and z have values X and Z, respectively, and a final configuration, where x and z have values X1 and Z1, respectively. Suppose we want to check the following safety property: for every computation that starts from an initial configuration where X>=2 is true (we use Prolog syntax for constraints) and terminates, then in the final configuration Z1>X1 is true. This property is equivalent to the Hoare triple {x>=2} sum upto {z>x}. As mentioned above, our method works by introducing a CLP program I that encodes the negation of the safety property through the following clause: unsafe:- initConf(C), reach(C,C1), errorConf(C1). where initConf(C) holds if C is an initial configuration where X>=2 is true, reach(C,C1) holds if configuration C1 can be reached from configuration C, and errorConf(C1) holds if C1 is a final configuration where Z1=<X1 holds. The predicate reach is defined in terms of predicates that encode the interpreter of the language. The CLP specialization technique presented in this paper will produce the new CLP program Isp shown in Figure 1, which encodes the verification conditions for the program and property of interest. The predicates of Isp correspond to some of the program points of sum upto. We have indicated this correspondence in Figure 1. By the correctness of specialization, the safety property holds for sum upto if and only if Isp ̸|= unsafe, or equivalently, Isp ∪ {¬ unsafe} is satisfiable. Thus, by using one of the state-of-the-art solvers for Horn clauses with linear integer constraints, we can attempt to check the satisfiability of the set Isp ∪ {¬ unsafe} of clauses. Now, (i) if that set is satisfiable, then the safety property holds, while (ii) if that set is unsatisfiable, then the safety property does not hold. Clearly, being the satisfiability of Horn clauses with linear integer constraints an undecidable problem, the solver may not terminate with a definite answer. Fortunately, in this example, by using the solver Z3 we get that Isp ∪ {¬ unsafe} is satisfiable, and hence the safety property holds. ! Program sum upto Verification conditions Isp int x, z; int f(int n) { int r; if (n <= 0) r = 0; else r = f(n-1)+n; return r; } void sum_upto() { z = f(x); } unsafe :- X>=2, Z1=<X1, new1(X,X1,Z1). new1(X,X1,Z1) :- N=X, Z1=R, new2(N,X,X1,R). new2(N,X,X1,R) :- N=<0, X1=X, R=0. new2(N,X,X1,R) :- N>=1, new3(N,X,X1,R). new3(N,X,X1,R) :- M=N-1, R=R1+N, new2(M,X,X1,R1). /*new2*/ /*new3*/ /*new1*/ Figure 1: The program sum upto and the verification conditions ensuring the safety of the program. The contributions of this paper can be summarized as follows. - We have defined a multi-step operational semantics for a fragment of the C language manipulating integers and integer arrays. - We have designed a VCG strategy which is parametric with respect to the operational semantics of the imperative language under consideration and the logic used for specifying the program property of interest. - We have presented two results about the computational complexity of the VCG strategy. First, we have shown that, under suitable conditions on unfolding, the VCG strategy always terminates in a number of transformation steps that is linear with respect to the size of the imperative program. Then, we have shown that the size of the generated verification conditions is linear with respect to the size of the imperative program. 5 - We have presented two transformation techniques aimed at reducing the number of arguments of the predicates used in the VCs. These techniques extend to the case of CLP programs analogous techniques that have been developed for logic programs [38, 51]. The first technique is a transformation strategy, called NLR (Non-Linking variable Removal), that removes variables occurring as arguments of an atom in the body of a clause and do not occur elsewhere in the clause. The second technique, called constrained FAR, is a generalization of liveness analysis, and removes arguments that are not actually used. Similarly to the case of logic programming, the reduction of predicate arity improves the time and space needed for matching atoms during satisfiability proofs. Through our experiments we show that this arity reduction is very effective in the case of large programs. - We have compared the verification conditions obtained by applying our VCG strategy on the multi-step semantics, with those obtained by using the same VCG strategy on the more traditional small-step semantics. Indeed, although these two semantics are equivalent with respect to the input-output behavior of the programs, they show differences in the structure of the verification conditions that are generated and also in the subsequent ability of an automatic system to prove the program properties of interest. - We have demonstrated the flexibility of the approach by showing that it is possible, with a very low effort, to take into account alternative operational semantics definitions for modeling new, additional language features. - Finally, we have empirically proved the feasibility of the approach by performing an experimental evaluation. We have generated verification conditions in the cases of the multi-step semantics and the small-step semantics for a few hundreds of programs taken from various publicly available benchmarks. We have also checked the satisfiability of these verifications conditions by using state-of-the-art Horn clause solvers such as ELDARICA [32], MathSAT [4], QARMC [28], and Z3 [16]. Our experiments also show that, when compared with the HSF(C) software model checker [28], which makes use of an ad hoc technique for generating VCs and then uses QARMC to test their satisfiability, our semantics-based approach to VC generation incurs in a relatively small increase of verification time but, interestingly enough, determines a significant improvement of accuracy over HSF(C) itself. We have also shown that if we apply the NLV and FAR techniques for removing redundant arguments, we can obtain VCs that are easier to be verified by Horn solvers. In conclusion, we have demonstrated that the use of program specialization for generating VCs provides great flexibility with little performance overhead, and thus it is effectively usable in practical software verification applications. 2. An Imperative Language and its Operational Semantics We consider imperative programs manipulating integers and integer arrays, written in a language L which is a fragment of the C intermediate Language (CIL) [47]. The syntax of our imperative language L is shown in Table 1, where: (i) Vars is a set of integer variable identifiers, (ii) AVars is a set of integer array identifiers, (iii) Functs is a set of function identifiers, (iv) Z is the set of integers, and (v) Labels are non-negative integers. The language L is an extension of the one considered by De Angelis et al. [11]. In particular, in L: (i) functions can be recursively defined, and (ii) there is an abort command which causes the abrupt termination of the execution of the program. The global variables of a program are those introduced in the declarations of the program, and the local variables of the functions are those introduced in the declarations of the function definitions. We assume that local variables are suitably renamed so to avoid name clashes between the local and the global variables. In what follows we will feel free to say ‘command’, instead of ‘labeled command’. Language assumptions. We assume that: (i) every two distinct labeled commands have distinct labels, and labeled commands are linearly ordered according to the textual order of the program, (ii) the evaluation of expressions has no side effects, while the evaluation of functions may have side effects, (iii) in theif (expr) ℓ1 else ℓ2 commands, the labels ℓ1 and ℓ2 are different, and (iv) in every program there exists the definition of the function void main() whose first command has label ℓ0 and whose last command is ℓh : halt and this is the only halt command in the program. In our language there are neither blocks, nor structures, nor pointers. We can deal with commands of the form ‘if (expr) cmd else cmd’ and ‘while (expr) {cmd}’ by considering their translation in terms of if-else and goto commands. Jumps are allowed only to labeled commands which occur within the same function definition. Without loss of generality, we assume that the global variables of the program and the local variables of every function definition are not initialized, and every function definition has a unique return command and at most one abort command. For reasons of simplicity, we will consider one-dimensional arrays only. 6 x, y, . . . , i, j, . . . ∈ a, b, . . . ∈ f, g, . . . ∈ k ∈ ℓ, ℓ0 , ℓ1 , . . . ∈ type ∈ uop, bop ∈ prog decl fundef lab cmd cmd expr Vars AVars Functs Z Labels Types Ops (integer variable identifiers) (integer array identifiers) (function identifiers) (integer constants) (labels) (void, int, char, . . .) (unary and binary operators: −, +, ∗, =, ≥, . . .) decl∗ fundef + type x type f (decl∗ ) { decl∗ lab cmd+ } ℓ : cmd x = expr | a[expr] = expr | x = f (expr∗ ) | goto ℓ | | if (expr) ℓ1 else ℓ2 | return expr | abort | halt ::= k | x | uop expr | expr bop expr | a[expr] ::= ::= ::= ::= ::= Table 1: Syntax of the imperative language L under consideration. Superscripts respectively. Commands occurring in sequences are separated by semicolons. + and ∗ (programs) (declarations) (function definitions) (labeled commands) (commands) (expressions) denote non-empty and possibly empty finite sequences, In order to define the multi-step operational semantics, denoted MS, of our imperative language whose syntax is shown in Table 1, we need the following structures (see, for instance, [50]). (i) A global environment δ which is a function that maps global variables to integers or integer arrays, and (ii) a local environment σ which is a function that maps the formal parameters and the local variables to integers or integer arrays. A global or local environment with domain V maps: (i) every integer variable identifier x ∈ V to a value v ∈ Z, and (ii) every array identifier a ∈ V, whose dimension is dim(a), to a finite function from the set {0, . . . , dim(a) − 1} to Z, that is, to a sequence of dim(a) integers. Let δ, σ, and ⊥ denote a global environment, a local environment, and an aborted execution, respectively. A configuration is a pair ⟨⟨c, γ⟩⟩, where: (i) c is a labeled command, and (ii) γ is either a pair ⟨δ, σ⟩ in case of a regular execution (the configuration is said to be regular), or a triple ⟨⊥, δ, σ⟩ in case of an aborted execution (the configuration is said to be aborted). Given any mapping g : X → D, by update(g, x, d), with x ∈ X and d ∈ D, we denote the mapping g′ that is equal to g, except that g′ (x) = d. Given the mappings g1 : X1 → D and g2 : X2 → D, with X1 ∩X2 = ∅, the pair of mappings ⟨g1 , g2 ⟩ : X1∪X2 → D is defined as follows: ⟨g1 , g2 ⟩(x) = if x ∈ X1 then g1 (x) else g2 (x). We extend the update function to act on pairs of mappings ⟨g1 , g2 ⟩ as follows: for any x ∈ X1 ∪X2 , with X1 ∩X2 = ∅, and d ∈ D, update(⟨g1 , g2 ⟩, x, d) = if x ∈ X1 then ⟨update(g1 , x, d), g2⟩ else ⟨g1 , update(g2 , x, d)⟩. Given a finite function a denoting an array of n elements, and given an integer i in {0, . . . , n−1} and an integer v, in what follows we will write write(a, i, v), instead of update(a, i, v). Thus, write(a, i, v) is a new array obtained from a by replacing the element of a at position i by v. We will use the write function to define the semantics of operations on arrays. For any program P, for any label ℓ, (i) at(ℓ) denotes the command in P with label ℓ, and (ii) nextlab(ℓ) denotes the label of the command, if any, that is written in P immediately after the command with label ℓ. Given a function f , the first command of f is called the entry point of f and its label is denoted by firstlab( f ). For any expression e, any global environment δ, and any local environment σ, !e" ⟨δ σ⟩ denotes the value of e in ⟨δ, σ⟩. For instance, if x is a global integer variable and δ(x) = ⟨δ σ⟩(x) = 2, then !x+1"⟨δ σ⟩ = 3. 2.1. Multi-step semantics MS The MS semantics of an imperative program P is represented as a binary transition relation between configurations, denoted =⇒ which is defined by the following rules R1–R5. As usual, =⇒∗ denotes the reflexive, transitive closure of =⇒. If C1 =⇒ C2 (or C1 =⇒∗ C2 ), we say that C1 is the source configuration and C2 is the target configuration of the transition relation =⇒ (or =⇒∗ , respectively). In the MS semantics, similarly to the small-step semantics, the relation =⇒ formalizes the notion of ‘one step of computation’. However, in the case of a function call, =⇒ is defined 7 in terms of =⇒∗ (see rule (R2)), hence the name multi-step semantics. This semantics is different both from the smallstep semantics, which defines the semantics of function calls by introducing a stack of calls, and from the big-step (or evaluation) semantics, which defines a relation =⇒ from configurations to final values [52]. (R1) Assignment. If x is an integer (global or local) variable identifier: ⟨⟨ℓ : x = e, ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(nextlab(ℓ)), update(⟨δ, σ⟩, x, !e"⟨δ σ⟩)⟩⟩ If a is an integer (global or local) array identifier: ⟨⟨ℓ : a[ie] = e, ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(nextlab(ℓ)), update(⟨δ, σ⟩, a, write(⟨δ, σ⟩(a), !ie"⟨δ σ⟩, !e"⟨δ σ⟩)⟩⟩ Informally, an assignment updates either the global environment δ or the local environment σ. (R2) Function call. During the execution a function definition, one of the following two situations may occur: either the execution aborts (see rule (R2a)), or the execution proceeds regularly and the value of a given expression is returned (see rule (R2r)). Let {x1 , . . . , xk } and {y1 , . . . , yh } be the set of the formal parameters and the set of the local variables, respectively, of the function f . (R2a) ⟨⟨ℓ : x = f (e1 , . . . , ek ), ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨ℓa : abort, ⟨⊥, δ′ , σ⟩⟩⟩ if ⟨⟨at(firstlab( f )), ⟨δ, σ⟩⟩⟩ =⇒∗ ⟨⟨ℓa : abort, ⟨⊥, δ′ , σ′ ⟩⟩⟩ (R2r) ⟨⟨ℓ : x = f (e1 , . . . , ek ), ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(nextlab(ℓ)), update(⟨δ′ , σ⟩, x, !e" ⟨δ′ σ′ ⟩)⟩⟩ if ⟨⟨at(firstlab( f )), ⟨δ, σ⟩⟩⟩ =⇒∗ ⟨⟨ℓr : return e, ⟨δ′ , σ′ ⟩⟩⟩ In these rules (R2a) and (R2r): (i) the arguments e1 , . . . , ek are evaluated in the global and local environments of the caller and their values, say v1 = !e1 " ⟨δ, σ⟩, . . . , vk = !ek " ⟨δ, σ⟩, are bound to the formal parameters of the function f , and (ii) σ is the local environment for the evaluation of f . The environment σ is of the form: {⟨x1 , v1 ⟩, . . . , ⟨xk , vk ⟩, ⟨y1 , n1 ⟩, . . . , ⟨yh , nh ⟩}, where n1 , . . . , nh are some values in Z. (Recall that we assume that, when the local variables y1 , . . . , yh are declared, they are not initialized.) Note that, since the values of n1 , . . . , nh are left unspecified, the transition relation defined by these rules (R2a) and (R2r) is nondeterministic. Informally, a function call either (i) aborts, if the execution of the function definition eventually leads to an aborted configuration (see Rule R2a), or (ii) updates the global environment using the value returned by the function call and then the computation continues by executing the command that occurs after the function call (see Rule R2r). (R3) Abort. ⟨⟨ℓ : abort, ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨ℓ : abort, ⟨⊥, δ, σ⟩⟩⟩ The abort command forces a transition from a regular configuration to an aborted configuration. (R4) Conditional. If !e" ⟨δ σ⟩ ! 0: ⟨⟨ℓ : if (e) ℓ1 else ℓ2 , ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(ℓ1 ), ⟨δ, σ⟩⟩⟩ If !e" ⟨δ σ⟩ = 0: ⟨⟨ℓ : if (e) ℓ1 else ℓ2 , ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(ℓ2 ), ⟨δ, σ⟩⟩⟩ Depending on the evaluation of the expression used in the condition, an if-else command follows either the ‘then’ branch or the ‘else’ branch, leaving unchanged the global environment δ and the local environment σ. (R5) Jump. ⟨⟨ℓ : goto ℓ′ , ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(ℓ′ ), ⟨δ, σ⟩⟩⟩ The goto ℓ′ command simply makes the program execution to continue from the command with label ℓ′ , leaving unchanged the global environment δ and the local environment σ. Note that rules are given neither for the halt command, nor the return commands, nor for aborted configurations, and rule (R3) is applied only when the abort command occurs in a regular configuration. 3. Encoding Program Safety Using Constraint Logic Programs In this section we define the notion of program safety and we show how to encode this notion as a CLP program. Given a program P whose global variables are z1 , . . . , zr , we define an initial configuration to be a triple of the form: ⟨⟨ℓ0 : c0 , ⟨δInit , σInit ⟩⟩⟩, where: (i) ℓ0 : c0 is the first command of the function main() in P, (ii) δInit is the initial global environment of the form: {⟨z1 , n1 ⟩, . . . , ⟨zr , nr ⟩}, where n1 , . . . , nr are some given integers in Z, and (iii) the initial local 8 environment σInit of the form: {⟨y1 , m1 ⟩, . . . , ⟨y s , m s ⟩}, where y1 , . . . , y s are the local variables of the function main() and m1 , . . . , m s are some given integers in Z. A final configuration is either an aborted configuration or a configuration whose labeled command is ℓh : halt. An error configuration is a final configuration in which an undesirable property holds, as we now specify. Safety. We say that a program is safe when, starting from an initial configuration, it is impossible to reach an error configuration via an execution of the program. In order to formalize this safety notion for any given program P with global variables z1 . . . , zr , we introduce the notion of an unsafety triple of the form {{Init}} P {{Err}}, where Init and Err are formulas with free variables in {z1 . . . , zr }, that denote a set CInit of initial configurations and a set CErr of error configurations, respectively. We have that a configuration C is in the set CInit iff C is an initial configuration and the global environment δ of C satisfies Init, that is, Init[δ(z1 )/z1 , . . . , δ(zr )/zr ] holds. Likewise, we have that a configuration C is in the set CErr iff C is a final configuration and the global environment δ of C satisfies Err, that is, Err[δ(z1 )/z1 , . . . , δ(zr )/zr ] holds. We say that a program P is unsafe with respect to Init and Err, that is, the unsafety triple {{Init}} P {{Err}} holds, iff there exist a configuration Ci in CInit and a configuration Ce in CErr such that Ci =⇒∗ Ce . We say that a program P is said to be safe iff it is not unsafe. Note that our definition of the program safety is independent of the particular values to which the global variables of the program P are initially bound. In the next Section 3.1 we show how to encode the multi-step semantics MS and an unsafety triple as a CLP program. 3.1. CLP encoding of the interpreter for multi-step semantics MS First, we recall some basic notions of Constraint Logic Programming (CLP) we need in this paper. For other notions not mentioned here the reader may refer to the book by Lloyd [42] or the paper by Jaffar and Maher [33]. In this paper we will consider constraint logic programs with linear constraints over the integers and one-dimensional integer arrays. These constraints are not standard in Prolog-based CLP systems, and for handling them we will use solvers for constrained Horn clauses such as ELDARICA [32], MathSAT [4], QARMC [28], and Z3 [16]. Atomic integer constraints are formulas of the form: p1 = p2 , or p1 > p2 , or p1 > p2 , where p1 and p2 are linear polynomials with integer variables and coefficients. When writing polynomials the sum and the multiplication operations are denoted by + and ∗, respectively. An integer constraint is a conjunction of atomic integer constraints. Atomic array constraints are constraints of the form: (i) dim(A,N), denoting that N is the dimension of the array A, or (ii) read(A,I,V), denoting that the I-th element of the array A has value V, or (iii) write(A,I,V,B), denoting that the array B is equal to the array A except that the I-th element of B has value V. Indexes of arrays and elements of arrays are assumed to be integers. An array constraint is a conjunction of atomic array constraints. A constraint is either true, or false, or an integer constraint, or an array constraint, or a conjunction of constraints. An atom is an atomic formula of the form q(t1 , . . . , tm ), where: (i) q is a m-ary predicate symbol different from =, >, >, dim, read, and write, and (ii) t1 , . . . , tm are terms constructed out of variables, constants, and function symbols different from + and *. Thus, for instance, the atom q(2*X+1) is replaced by the atom q(Y), where Y is a new variable such that the constraint Y=2*X+1 holds. A CLP program is a finite set of clauses each of which is of the form A :- c, B, where A is an atom, c is a constraint, and B is a (possibly empty) conjunction of atoms. If B is an atom only, then ‘c, B’ is said to be a constrained atom. As usual, in a clause A :- c, B, the atom A is called the head and the conjunction ‘c, B’ is called the body. Without loss of generality, we assume that in every clause head, all occurrences of integer terms are distinct variables. For instance, the clause p(X,X+1) :- X>0, q(X) is written as p(X,Y) :- Y=X+1, X>0, q(X). A clause A :- c is called a constrained fact. If in the clause A :- c the constraint c is true, then it is omitted and the resulting clause is called a fact. A CLP clause is said to be linear if it is of the form A :- c, B, where B consists of at most one atom. A CLP program is said to be linear if all its clauses are linear. By vars(ϕ) we denote the set of all free variables in the formula ϕ. We extend this notation to sets of formulas so that, for instance, vars({ϕ1 , ϕ2 }) = vars(ϕ1 ) ∪ vars(ϕ2 ). Now we define the semantics of CLP programs. An A-interpretation D is an interpretation such that: (i) the carrier of D is the Herbrand universe [42] constructed out of the integers (that is, the elements of Z), the finite sequences of integers (which provide the interpretation for arrays), and the function symbols of any (null or positive) arity, different from + and *, (ii) D assigns to the function symbols + and * the expected meaning in Z×Z → Z, and to the predicate symbols =, >, and > the expected meaning in Z×Z, 9 (iii) for all sequences a0 . . . an-1 of integers, for all integers d, dim(a0 . . . an-1 , d) is true in D iff d=n, (iv) for all sequences a0 . . . an-1 and b0 . . . bm-1 of integers, for all integers i and v, read(a0 . . . an-1 , i, v) is true in D iff 0 ≤ i ≤ n-1 and v = ai , and write(a0 . . . an−1 , i,v,b0 . . . bm-1 ) is true in D iff 0 ≤ i ≤ n-1 and n = m and bi = v and for j=0,...,n-1, if j!i then aj = bj , (v) D is an Herbrand interpretation [42] for function and predicate symbols different from +, *, =, >, >, dim, read, and write. We can identify an A-interpretation D with the set of all ground atoms that are true in D, and hence A-interpretations are partially ordered by the set inclusion relation. Given a formula ϕ, if for every A-interpretation D, we have that ϕ is true in D, then we write A |= ϕ, and we say that ϕ is true in A. A constraint c is satisfiable iff A |= ∃(c), where for every formula ϕ, ∃(ϕ) denotes the existential closure of ϕ. Likewise, ∀(ϕ) denotes the universal closure of ϕ. A constraint is unsatisfiable iff it is not satisfiable. The semantics of a CLP program Q is defined to be the least A-model of Q, denoted M(Q), that is, the least A-interpretation D such that every clause of Q is true in D [33]. For the multi-step semantics MS, the transition relation =⇒ between configurations and its reflexive, transitive closure =⇒∗ are encoded by the binary predicates tr and reach, respectively. These predicates, whose defining clauses are shown in Table 2 below, constitute the CLP interpreter for the multi-step semantics of our imperative language of Table 1. In Table 2 we have the clauses relative to: (i) assignments (clause 1), (ii) function calls (clauses 2a and 2r), (iii) aborts (clause 3), (iv) conditionals (clauses 4t and 4f ), (v) jumps (clause 5), and (vi) reachability of configurations (clauses 6 and 7). 1. tr(cf(cmd(L,asgn(X,expr(E))), (D,S)), cf(cmd(L1,C), (D1,S1))) :eval(E,(D,S),V), update((D,S),X,V,(D1,S1)), nextlab(L,L1), at(L1,C). 2a. tr(cf(cmd(L,asgn(X,call(F,Es))), (D,S)), cf(cmd(LA,abort), (bot,D1,S))) :eval list(Es,(D,S),Vs), build funenv(F,Vs,Sbar), firstlab(F,FL), at(FL,C), reach(cf(cmd(FL,C), (D,Sbar)), cf(cmd(LA,abort), (bot,D1,S1))). 2r. tr(cf(cmd(L,asgn(X,call(F,Es))), (D,S)), cf(cmd(L2,C2), (D2,S2))) :eval list(Es,(D,S),Vs), build funenv(F,Vs,Sbar), firstlab(F,FL), at(FL,C), reach(cf(cmd(FL,C), (D,Sbar)), cf(cmd(LR,return(E)), (D1,S1))), eval(E,(D1,S1),V), update((D1,S),X,V,(D2,S2)), nextlab(L,L2), at(L2,C2). 3. tr(cf(cmd(L,abort), (D,S)), cf(cmd(L,abort), (bot,D,S))). 4t. tr(cf(cmd(L,ite(E,L1,L2)), (D,S)), cf(cmd(L1,C), (D,S))) :- beval(E,(D,S)), at(L1,C). 4f. tr(cf(cmd(L,ite(E,L1,L2)), (D,S)), cf(cmd(L2,C), (D,S))) :- beval(not(E),(D,S)), at(L2,C). 5. tr(cf(cmd(L,goto(L1)), (D,S)), cf(cmd(L1,C), (D,S))) :- at(L1,C). 6. reach(C,C). 7. reach(C,C2) :- tr(C,C1), reach(C1,C2). Table 2: The CLP interpreter for the multi-step operational semantics MS: the clauses for tr (encoding ⇒) and reach (encoding ⇒∗ ). Configurations are represented as terms of the form cf(cmd(L,C),Env), where: (i) L and C encode the label and the command, respectively, (ii) Env is either a pair (D,S) or a triple (bot,D,S), where bot represents the symbol ⊥, and D and S encode the global and the local environment, respectively. The term asgn(X,expr(E)) encodes the assignment of the value of the expression E to the variable X. The predicate eval(E,(D,S),V) holds iff V is the value of the expression E in the global environment D and local environment S. The predicate eval list extends the predicate eval to lists of expressions. The predicate beval(E,(D,S)) holds iff the value of the expression E is not 0 in the global environment D and the local environment S. The predicate at(L,C) holds iff the command C has label L. The predicate nextlab(L,L1) holds iff L1 is the label of the command that is written in the given imperative program immediately after the command with label L. The predicate firstlab(F,L1) holds iff L1 is the label of the first command of the definition of the function F. The predicate build funenv(F,Vs,Sbar) holds iff Sbar is the local environment needed for the execution of the body of the function F, and Vs is the list of the values of the actual parameters in the call of F. The term ite(E,L1,L2) encodes the conditional command (ite stands for if-then-else), and labels L1 and L2 specify where to jump to, depending on the value of the expression E. The term goto(L) encodes the jump to the 10 command with label L. The predicate update((D,S),X,V,(D1,S1)) holds iff the new global and local environments D1 and S1 are computed from the old global and local environments D and S, by binding the (global or local) variable X to the value V, using the function update acting on pairs of mappings (see Section 2). Note that no constraint appears in Table 2. However, constraints are used for defining some predicates whose clauses are not shown, such as eval, beval, and update. Moreover, constraints are used in the encoding of an unsafety triple, as we now show. 3.2. CLP encoding of an unsafety triple We encode any given unsafety triple {{Init}} P {{Err}} by the CLP program I containing the following clause: 8. unsafe:- initConf(C), reach(C,C1), errorConf(C1). and also the clauses defining: (i) the predicates tr and reach that encode the interpreter (these clauses are given in Table 2), (ii) the predicates initConf and errorConf that encode the formulas Init and Err, respectively, denoting the sets of the initial and error configurations, and (iii) the predicates that encode the declarations and the function definitions of the imperative program P (among these clauses there are those defining the predicate at that encode the commands of P). Since the focus of this paper is the generation of the verification conditions, we will restrict ourselves to Init and Err properties that are constraints. The interested reader may refer to a paper by De Angelis et al. [13] where it is shown how to deal with more complex Init and Err properties such as those defined by a set of recursive constrained Horn clauses. Now to fix the ideas we give an example of an unsafety triple and its encoding via a CLP program, which we call I. Let us consider the unsafety triple {{Init}} gcd {{Err}}, where: (i) gcd is the C program (shown in Column (a) of Table 3) that computes, as a final value of the variable x (and y), the greatest common divisor of the two positive integers which are the initial values of the variables x and y, respectively, (ii) Init is the constraint x ≥ 1 ∧ y ≥ 1, and (iii) Err is the constraint x < 0. The program I that encodes that triple, is made out of: (i) the clauses 1–7 of Table 2 and clause 8 above, (ii) the following clauses 9 and 10 encoding the constraints Init and Err, and (iii) the clauses defining the predicates that encode the declarations and the function definitions of the program gcd (see Column (c) of Table 3). Clauses 9 and 10 below and the clauses of Point (iii) are constructed by first translating the program gcd into a program written in the language L of Table 1. The resulting program is shown in Column (b) of Table 3. Note that in this translation the while-loop at line (n8) is replaced by using the conditional ‘3: if(x!=y) 4 else 9’ and the two jumps ‘6: goto 3’ and ‘8: goto 3’. 9. initConf(cf(cmd(3,C),[(x,X),(y,Y)],[])) :- at(3,C), X>=1, Y>=1. 10. errorConf(cf(cmd(9,C),[(x,X),(y,Y)],[])) :- at(9,C), X=<-1. where: (a) the body of the clause for initConf is made out of the constraint Init and the atom at(3,C) which refers to the command ‘3: if(x!=y) 4 else 9’ which is the first command of the main function (see Column (b) of Table 3), and (b) the body of the clause for errorConf is made out of the the constraint Err and the atom at(9,C) which refers to the command ‘9: halt’ in the main function (see Column (b) of Table 3) The clauses of Point (iii) defining the predicates that encode the declarations and the function definitions of the program gcd are shown in Column (c) of Table 3. They are constructed as follows starting from the program of Column (b) of that table. In these clauses we have: (i) the predicate globals for encoding the list of identifiers of the global variables (see clause 11), (ii) the predicate fun for encoding function definitions (see clauses 12 and 16), and (iii) the predicate at for encoding labeled commands as indicated at the end of Section 3.1. In particular, clause 12 encodes the sub function that has two formal parameters [a,b], one local variable [r], and whose definition starts with the command ‘1: r=a-b’ encoded by the constrained fact ‘at(1,asgn(r,minus(a,b)))’ (see clause 13). Similarly, clause 15 encodes the main function that has no formal parameters and no local variables (hence, the two empty lists []), and whose definition starts with the command ‘3: if(x!=y) 4 else 9’ encoded by the constrained 11 (a) the C program gcd (n1) (n2) (n3) (n4) (n5) (n6) (n7) (n8) (n9) (n10) (n11) (n12) (n13) (n14) (n15) int x, y; int sub(int a, int b) { int r; r=a-b; return r; } void main() { while (x!=y) { if (x>y) x=sub(x,y); } else { y=sub(y,x); } } } (b) gcd in the language L of Table 1 (c) the set of CLP clauses encoding gcd int x, y; int sub(int a, int b) { int r; 1 : r=a-b; 2 : return r; } void main() { 3 : if(x!=y) 4 else 9; 4 : if(x>y) 5 else 7; 5 : x=sub(x,y); 6 : goto 3; 7 : y=sub(y,x); 8 : goto 3; 9 : halt } 11. globals([x,y]). 12. fun(sub,[a,b],[r],1). 13. at(1,asgn(r,minus(a,b))). 14. at(2,return(r)). 15. 16. 17. 18. 19. 20. 21. 22. fun(main,[],[],3). at(3,ite(neq(x,y),4,9)). at(4,ite(gt(x,y),5,7)). at(5,asgn(x,call(sub,[x,y]))). at(6,goto(3)). at(7,asgn(y,call(sub,[y,x]))). at(8,goto(3)). at(9,halt). Table 3: The gcd program of the unsafety triple {{Init}} gcd {{Err}} (Column (a)), its translation into the language L (Column (b)), and its encoding CLP clauses (Column (c)). fact ‘at(3,ite(neq(x,y),4,9))’ (see clause 16). The first argument of the ite term in clause 16 is the condition neq(x,y) of the while-loop, and the second and the third arguments (that is, 4 and 9) are the labels of the first commands occurring, respectively, in the ‘then’ and ‘else’ branches of the conditional. The jump commands ‘6: goto 3’ and ‘8: goto 3’ are encoded by the constrained facts ‘at(6,goto(3))’ and ‘at(8,goto(3))’ (see clauses 19 and 21, respectively). Given any unsafety triple {{Init}} P {{Err}}, its encoding program I constructed as indicated above is correct in the sense that the unsafety triple holds (and thus the program P is unsafe) iff the atom unsafe belongs to the least A-model of I. This is a straightforward consequence of the fact that the tr and reach predicates of Table 2 are a correct encoding of the operational semantics. Thus, we have the following correctness result. Theorem 1. (Correctness of CLP Encoding) Let I be the CLP encoding of an unsafety triple {{Init}} P {{Err}}. The program P is safe with respect to Init and Err iff unsafe " M(I). The proof of this theorem is similar to the proof of Theorem 1 of a paper by De Angelis et al. [11]. However, in this paper: (i) we use a slightly different representation of configurations (we do not use an execution stack for dealing with function calls), and (ii) the predicate reach has two arguments, instead of one argument only (this change is needed by the multi-step semantics MS for encoding the reachability relation within function calls). 4. Automatic Generation of Verification Conditions by Program Specialization In this section we present the Verification Condition Generation strategy (VCG strategy, for short), which we use for automatically generating verification conditions. From this section onwards, we consider the CLP program I that encodes an unsafety triple {{Init}} P {{Err}} as shown in Section 3, and we assume that the imperative program P is written in the language L. The VCG strategy is a specialization strategy for CLP programs. In general program specialization, or partial evaluation, is a program transformation technique that aims at customizing a general purpose program to a specific context of use [35, 37], thereby deriving a so called residual program. One prominent application of program specialization is program compilation and compiler generation via the so-called Futamura projections. Indeed, a compiler from a source language L1 to a target language L2 can be viewed as a program that specializes an interpreter, written in L2 , with respect to a source program, written in L1 . In particular, our VCG strategy can be viewed as a compiler from the imperative language L to CLP. There are two main categories of program specializers: 12 - online specializers, that implement a strategy that makes decisions on what call to unfold and what call to fold (or memoize) on the basis of an analysis performed at specialization time; and - offline specializers, that implement a two-stage strategy: (i) first a binding time analysis produces an annotation of the program to be specialized, which tells which call to unfold and which call to fold, and (ii) then the specializer works by using this annotation. Usually, offline specializers are more efficient, while online specializers produce better quality residual programs. The VCG strategy can be considered as a strategy for offline specialization, as it is based on an unfolding annotation that is computed before the specialization is performed. We will show that VCG is very efficient, both in theory and in practice, and the quality of the VCs it generates is comparable to the one achieved by ad hoc VC generators. The VCG strategy (see Figure 2 below) takes as input the CLP program I and produces as output a specialized CLP program Isp , encoding a set of verification conditions, such that Isp is equivalent to I with respect to the atom unsafe, that is, unsafe ∈ M(I) iff unsafe ∈ M(Isp ). Thus, by Theorem 1, program P is safe with respect to Init and Err iff unsafe " M(I). The VCG strategy works by performing the so-called removal of the interpreter, that is, by removing the overhead due to the level of interpretation which is present in the initial CLP program I because of the CLP clauses defining the operational semantics of the imperative programming language and the clauses encoding the commands of the program P. The set Isp of specialized CLP clauses has the graph of predicate calls that can be viewed as an abstraction of the control flow graph of the imperative program P. Now, due to undecidability limitations, there is no algorithm for checking whether or not unsafe∈ M(Isp ). However, by relying on the fact that unsafe ∈ M(Isp ) iff Isp ∪ {¬unsafe} is unsatisfiable, we can prove that program P is safe (or unsafe) by showing that Isp ∪ {¬unsafe} is satisfiable (or unsatisfiable, respectively). Despite the undecidability of the verification problem in the general case, it is often the case in practice that this satisfiability check can successfully be performed by automatic tools that deal with Horn clauses with linear integer arithmetic constraints [4, 10, 16, 28]. Moreover, it turns out that checking the satisfiability of Isp ∪ {¬unsafe} is often easier than checking the satisfiability of I ∪ {¬unsafe}. This is due to the fact that when the VCG strategy specializes program I, it produces drastically simplified clauses by compiling away both the references to the commands of the program P and the references to the operational semantics of the imperative programming language. 4.1. The VCG strategy During the application of the VCG strategy we use the following transformation rules: unfolding, definition introduction, and folding [17, 19]. The VCG strategy starts off by unfolding clause 8 of program I (see Section 3.2), which defines the top-level predicate unsafe, whose validity in the least A-model M(I) establishes the unsafety of the given program P with respect to the formulas Init and Err. By unfolding the initConf, errorConf, and reach atoms, the VCG strategy performs a symbolic exploration of the control flow graph of the imperative program P. Now we recall the definition of the transformation rules we use. Unfolding Rule. Let C be a clause of the form H :- c,L,A,R, where H and A are atoms, L and R are (possibly empty) conjunctions of atoms, and c is a constraint. Let {Ki :- ci ,Bi | i = 1, . . . , m} be the set of the (renamed apart) clauses in the CLP program I such that, for i = 1, . . . , m, A is unifiable with Ki via the most general unifier ϑi and (c,ci) ϑi is satisfiable. We define the following function Unf: Unf (C, A, I) = { (H :- c,ci ,L,Bi ,R) ϑi | i = 1, . . . , m } Each clause in Unf (C, A, I) is said to be derived by unfolding C w.r.t. A (or by unfolding A in C) using I. For the first execution of the assignment SpC := Unf (C, A, I), the VCG strategy selects the leftmost atom in the body of the input clause, that is, initConf(C), so that the left argument in reach(C,C1) will be instantiated to the initial configuration. For the subsequent executions, the atom A will be the only atom in the body of clause C, as all clauses added to InCls have a single atom in their body. The application of the unfolding rule during specialization is guided by an annotation of the atoms in the body of the clauses that tells us whether or not a clause should be unfolded with respect to an atom. This annotation guarantees that the Unfolding phase of the VCG strategy terminates (that is, only finite sequences of applications of the unfolding rule are generated) (see Figure 2). The reader may refer to a paper by Leuschel and Bruynooghe [37] for a survey on related techniques which guarantee finiteness of unfolding. In Section 4.2, we will explain in detail how this annotation is generated. For the 13 Input: The CLP program I encoding the given unsafety triple {{Init}} P {{Err}}. Output: A CLP program Isp encoding a set of verification conditions, such that unsafe∈ M(I) iff unsafe∈ M(Isp ). Initialization: Isp := ∅; InCls := { unsafe :- initConf(C), reach(C,C1), errorConf(C1)}; Defs := ∅; while in InCls there is a clause C with an atom in its body do Unfolding: SpC := Unf (C, A, I) where A is the leftmost atom in the body of C; while in SpC there is a clause D whose body contains an occurrence of an unfoldable atom A do SpC := (SpC − {D}) ∪ U where: U = Unf (D, A, I), if A is unfoldable once, and U = FullUnf (D, A, I), if A is fully unfoldable end-while; Definition-Introduction & Folding: while in SpC there is a clause E of the form: H :- e, L, reach(cf1,cf2), R where H is either unsafe or an atom of the form newp(X), e is a constraint, (cf1,cf2) is a pair of terms representing configurations, and L and R are possibly empty conjunctions of atoms do if in Defs there is a (renamed apart) clause D of the form: newq(V) :- B where: V is the tuple of variables occurring in B, and for some renaming substitution ϑ, Bϑ = reach(cf1,cf2) then SpC := (SpC − {E}) ∪ { H :- e, L,newq(V)ϑ,R }; else let F be the clause: newr(V) :- reach(cf1,cf2) where: newr is a predicate symbol not occurring in I ∪ Defs, and V is the tuple of variables occurring in reach(cf1,cf2); InCls := InCls ∪ {F}; Defs := Defs ∪ {F}; SpC := (SpC − {E}) ∪ { H :- e, L, newr(V), R } end-while; InCls := InCls − {C}; Isp := Isp ∪ SpC; end-while; Figure 2: The Verification Condition Generation (VCG) strategy. description of the VCG strategy we only assume that every atom is marked by an unfolding annotation, which is: either (i) unfoldable once, or (ii) fully unfoldable, or (iii) non-unfoldable. An atom is said to be unfoldable if it is annotated as unfoldable once or fully unfoldable. For a clause C and an atom A which is unfoldable once, the unfolding rule derives the set Unf (C,A,I) of clauses. For an atom A which is fully unfoldable, the unfolding rule derives the set FullUnf (C,A,I) of clauses D recursively defined as follows: either (i) D ∈ Unf(C,A,I) and D contains no unfoldable atom in its body, or (ii) D ∈ FullUnf (D′,B,I) for some D′ ∈ Unf (C,A,I) and some unfoldable atom B occurring in the body of D′ . Informally, for an atom A which is fully unfoldable, the unfolding rule is repeatedly applied to all clauses which are directly or indirectly derived by unfolding C w.r.t. A using I, until all unfoldable atoms have been unfolded. Note that the order in which clauses and atoms in their bodies are selected for unfolding is not relevant. At the end of the Unfolding phase new predicate definitions are introduced for calls to the predicate reach, by applying the following rule. Definition Introduction Rule. A new predicate newr is introduced by the clause: newr(X) :- A, where A is an atom 14 and the argument X of newr is a tuple of variables occurring in A. Clauses introduced by the definition introduction rule are called definitions. Note that in the VCG strategy the atom A will always consist of an atom of the form reach(cf1,cf2) and X will be the tuple of all variables occurring in reach(cf1,cf2). However, in Section 6 we will present a transformation strategy, that in order to improve efficiency, aims at reducing the number of arguments of the predicates. In that strategy the tuple X is constructed by taking a subset of the variables of an atom A (which is not of the form reach(cf1,cf2)). The new predicate introduced by the definition introduction rule can be viewed as a generalization of the constrained atom ‘e, reach(cf1,cf2)’ where the constraint e is replaced by the constraint true (which is implicit in the clause defining newr). For more sophisticated generalization techniques used in specialization-based verification approaches we refer to the literature [7, 8, 11, 14, 21]. Then, calls to reach, with complex arguments representing configurations, are replaced by calls to the newly introduced predicates, by applying the following folding rule. Folding Rule. Let C: H :- e, L, B, R be a clause and D: newr(X) :- A be a (renamed apart) definition such that, for some renaming substitution ϑ: (i) B = Aϑ, and (ii) for every variable Y occurring in A and not in X, Yϑ does not occur in {H, e, L, R}. Then C is folded w.r.t. B by using D, thereby deriving the new clause H :- e, L, newr(X)ϑ, R. The VCG strategy proceeds by adding the clause defining the new predicate newr to the set InCls to be specialized and to the set Defs of clauses introduced by the definition introduction rule. The strategy terminates when all clauses in InCls have been processed, and no new predicate definitions are added to that set because all clauses derived by unfolding (and different from constrained facts) can be folded by using clauses in Defs. The correctness of the VCG strategy with respect to the least model semantics is a direct consequence of the correctness of the transformation rules [17, 20]. Indeed, we have the following result. Theorem 2 (Correctness of the VCG strategy). Suppose that, given the input program I, the VCG strategy terminates and upon termination it returns the CLP program Isp . Then unsafe ∈ M(I) iff unsafe ∈ M(Isp ). 4.2. Termination We will refer to the outer loop of the VCG strategy (that is, the loop with double vertical lines in Figure 2), as the UDF-loop. That loop consists of: (i) the Unfolding phase, and (ii) the Definition-Introduction & Folding phase. The VCG strategy may not terminate because of two reasons: (i) the non-termination of the while-loop of the Unfolding phase, and (ii) the non-termination of the UDF-loop (indeed, the termination of the while-loop of the Definition-Introduction & Folding phase is obvious because the number of clauses with the reach atom decreases). As already mentioned, the unfolding annotation of the atoms occurring in the body of the clauses should guarantee finiteness of the Unfolding phase. First we need the following definition. Definition 1. We say that an unfolding annotation guarantees finiteness if, for every clause C such that the predicates occurring in the body of C are defined in the CLP program I, the Unfolding phase of the VCG strategy, with the given annotation, terminates. Now we define an unfolding annotation for the VCG strategy, denoted UA, that guarantees finiteness. The annotation UA also guarantees that the size of the specialized clauses generated by the VCG strategy is linear with respect to the size of the given clauses, and hence with respect to the size of the imperative program to be verified, that is, the number of labeled commands occurring in P (see Section 4.3). The unfolding annotation UA is defined as follows: (i) an atom whose predicate symbol is different from tr or reach is (annotated as) fully unfoldable; (ii) an atom of the form tr(cf(LCmd, ), ) is unfoldable once, if LCmd is the term cmd(L,asgn(X,call(F,Es))) representing a function call; otherwise it is fully unfoldable; (iii) an atom of the form reach(cf(cmd(L,Cmd), ), ) is unfoldable once, if Cmd is an assignment or a goto command, and L is neither the entry point of a function definition nor a label occurring in an ite or goto command; otherwise reach(cf(cmd(L,Cmd), ), ) is non-unfoldable. 15 This definition of the unfolding annotation UA is specific to the interpreter considered in Section 3.1, and it has been suggested by the following general principles: (1) (Finite unfolding) the unfolding annotation should guarantee finiteness, (2) (Determinate unfolding) the annotation should enforce that, after the first unfolding of a given predicate definition, every subsequent unfolding step derives at most one new clause (the notion of determinate unfolding has been considered in a paper by Gallagher [24] as a means for avoiding code explosion during program specialization), and (3) (Singular unfolding) different variants of the same atom should not be unfolded while specializing different predicate definitions. The above principles can be applied to design unfolding annotations in a systematic, possibly automatic, way for the interpreters of various programming language, similarly to the Binding Time Analysis performed before offline specialization [39, 40]. The following definition and lemmata are needed for the Termination Theorem 3 below. When not presented in the text, the proofs of these and other lemmata and theorems will be found in the Appendix. Definition 2 (Set of Definitions and Size of an Imperative Program). (i) Let ∆ be the set of new predicate definitions that are introduced during the execution of the VCG strategy. (ii) The size of an imperative program P written in the language L is the number of labeled commands occurring in P. Lemma 1. The unfolding annotation UA guarantees finiteness. In order to show that the UDF-loop terminates, it is enough to prove that the set ∆ is finite, and hence the set InCls will eventually become empty. The fact that ∆ is finite follows from the facts that: (i) each clause introduced during the Definition-Introduction & Folding phase is of the form: newr(V) :- reach(cf1,cf2), and (ii) for any given program P, there are finitely many pairs (cf1,cf2) of configurations. The next lemma shows that the cardinality of the set ∆ is linear with respect to the size of P. Lemma 2. Let ∆ be the set of new predicate definitions introduced during the execution of the VCG strategy. Then every clause in ∆ has the form : newr(V) :- reach(cf1,cf2) where : (i) cf1 is a configuration of the form cf(cmd(L1,Cmd1),(D1,S1)); (ii) cmd(L1,Cmd1) is a labeled command in P; (iii) cf2 is a configuration of the form cf(cmd(L2,Cmd2),Env); (iv) Cmd2 is either halt, or abort, or return(E), and cmd(L2,abort) or cmd(L2,return(E)) is the unique abort or return command, respectively, occurring in the definition of the function where L1 occurs; (v) Env is either (D2,S2) or (bot,D2,S2); (vi) D1, S1, D2, and S2 are (global or local) environments, that is, finite functions represented as lists of the form [(x1,X1),(x2,X2),...], where x1,x2,... are (global or local) variables of P and X1,X2,... are CLP variables; (vii) (D1,S1) and (D2,S2) are uniquely determined, modulo CLP variable renaming, by L1 and L2, respectively. Then, the cardinality of ∆ is of the order of O(n), where n is the size of P. From Lemmata 1 and 2 we immediately get the following result. Theorem 3 (Termination of the VCG strategy). The VCG strategy using the unfolding annotation UA terminates on input CLP program I. Proof. The UDF-loop of the VCG strategy terminates because, by Lemma 2, a finite number of new definitions is introduced. The while-loop of the Unfolding phase terminates because, by Lemma 1, the unfolding annotation UA guarantees finiteness. Finally, the while-loop of the Definition-Introduction & Folding phase terminates because at each iteration the number of occurrences of the reach predicate decreases by one unit. ! 16 4.3. Computational Complexity of Verification Condition Generation Now we present two results concerning the time and space complexity of the VCG strategy. First, we show that the VCG strategy terminates in a number of transformation steps that is linear with respect to the size of the imperative program P. Then, we show that the size of the generated verification conditions is linear with respect to the size of P. For reasons of simplicity, in our complexity analysis we count only the transformation steps that are: either (i) the unfolding of a clause with respect to a tr atom or a reach atom (in particular, we do not count the transformation steps consisting in the unfolding of a clause with respect to atoms occurring in the definition of the operational semantics, and having different predicate symbols, such as eval, update, nextlab, and at), or (ii) the introduction of a new definition, or (iii) the folding of a clause. First we show that, during the VCG strategy, by using the unfolding annotation UA, a linear number of unfolding steps is performed. We need the following definition. A configuration is said to be a return configuration if its command is of the form: return(E). Lemma 3. For every label L in the program P, for every final or return configuration cfz, there exists at most one clause which is unfolded w.r.t. an atom of the form reach(cf(cmd(L,Cmd), ),cfz), during the execution of the VCG strategy. Next, we prove that the result of the Unfolding phase, performed according to the unfolding annotation UA, is a set of clauses whose size is bounded by a constant. Definition 3 (Size of Clauses). (i) The size of a clause C is the number α(C) of the atoms occurring in C. (ii) The size α(S ) of a set S of clauses is the sum of the sizes of the clauses in S . Lemma 4. There exists a positive integer k such that, for every clause C: newp(X) :- reach(cf1,cfz), where cfz is a final or return configuration, the result of applying the Unfolding phase to C using the unfolding annotation UA is a set SpC of clauses with α(SpC) ≤ k. Now, we are able to show that the VCG strategy takes at most O(n) transformation steps, where n is the size of the imperative program P to be verified. Theorem 4 (Time Complexity of VCG). Let I be the CLP program encoding a given unsafety triple {{Init}} P {{Err}}. The VCG strategy terminates on the input program I in O(n) transformation steps, where n is the size of P. Proof. By Lemmata 2 and 3, O(n) unfolding steps are performed during the VCG strategy. By Lemma 2, the definition introduction rule is applied O(n) times. Finally, the VCG strategy applies the folding rule once for each atom occurring in the body of a clause in SpC at the end of the Unfolding phase, and hence, by Lemma 4, O(n) folding steps are performed. ! Finally, we show that the size of the CLP program Isp , which is the output of the VCG strategy, is linear with respect to the size of the program P. Theorem 5 (Size of the Output of VCG). Let Isp be the output of the VCG strategy on the input program I. Then α(Isp ) is of the order of O(n), where n is the size of P. Proof. Suppose that the VCG strategy terminates after r iterations of the UDF-loop. Thus, the set ∆ of new predicate ! definitions introduced by I has cardinality |∆| = r. Let ∆ = {C1 , . . . , Cr }. By construction, Isp = ri=1 SpCi , where, for i = 1, . . . , r, SpCi is the set of clauses derived from Ci after one iteration of the UDF-loop. By Lemma 4 there exists a positive integer k (independent of I) such that the set of clauses derived by unfolding Ci using the unfolding annotation UA has size not larger than k. Since the folding rule replaces a single atom by another single atom, we have that α(SpCi ) ≤ k. Hence, α(Isp ) ≤ k · |∆| and, by Lemma 2, we get the thesis. ! 17 4.4. An Example of Application of the VCG Strategy Now we will see in action the VCG strategy of Figure 2, when given in input the CLP program I encoding the unsafety triple {{Init}} gcd {{Err}} presented in Section 3.2. In order to generate a set of VCs for the gcd program, we use the unfolding annotation UA defined in Section 4.2. In the following, for reasons of readability, we will omit the round parentheses around the pair of lists denoting the global and local environments. The VCG strategy starts off by performing the Unfolding phase for the set InCls = {8}. First Iteration of the UDF-loop. By unfolding clause 8 w.r.t. the fully unfoldable atom initConf(X), we get: 23. unsafe:- X>=1, Y>=1, reach(cf(cmd(3,ite(neq(x,y)),4,9),[(x,X),(y,Y)],[]),C), errorConf(C). Then, the Unfolding phase selects the fully unfoldable atom errorConf(C), as it is the only unfoldable atom in clause 23 (note that the reach atom is not unfoldable because the command in its source configuration is an if-thenelse). By unfolding errorConf(C) we get: 24. unsafe:- X>=1, Y>=1, X1=< -1, reach(cf(cmd(3,ite(neq(x,y)),4,9),[(x,X),(y,Y)],[]),cf(cmd(9,halt),[(x,X1),(y,Y1)],[]))). No atom in the body of clause 24 is unfoldable. Thus, we continue by executing the Definition-Introduction & Folding phase. In order to fold clause 24 the following clause is introduced in Defs and added to InCls: 24. new1(X,Y,X1,Y1):reach(cf(cmd(3,ite(neq(x,y)),4,9),[(x,X),(y,Y)],[]),cf(cmd(9,halt),[(x,X1),(y,Y1)],[]))). where new1 is a new predicate symbol. By folding clause 24 w.r.t. the atom reach using clause 24 we get: 25. unsafe:- X>=1, Y>=1, X1=< -1, new1(X,Y,X1,Y1). Second Iteration of the UDF-loop. Now, we consider clause 24 in InCls, and we perform one more iteration of the UDF-loop. By unfolding clause 24 w.r.t. the atom in its body we get: 26. new1(X,Y,X1,Y1):- X=Y, reach(cf(cmd(9,halt),[(x,X),(y,Y)],[]),cf(cmd(9,halt),[(x,X1),(y,Y1)],[])). 27. new1(X,Y,X1,Y1):- X>=Y+1, reach(cf(cmd(4,ite(gt(x,y)),5,7),[(x,X),(y,Y)],[]),cf(cmd(9,halt),[(x,X1),(y,Y1)],[])). 28. new1(X,Y,X1,Y1):- X+1=<Y, reach(cf(cmd(4,ite(gt(x,y)),5,7),[(x,X),(y,Y)],[]),cf(cmd(9,halt),[(x,X1),(y,Y1)],[])). Note that the symbolic evaluation of the while-loop condition neq(x,y) in clause 24 generates the three constraints X=Y (which is the exit condition of the loop), X>=Y+1, and X+1=<Y (which together make the loop condition x!=y) in clauses 26–28 . We have that the atom in clause 26 is unfoldable, and hence, by reflexivity of the reach predicate (clause 6), we get the following constrained fact: 29. new1(X,Y,X,Y):- X=Y. By unfolding clause 26 using clause 7, and then further unfolding this clause w.r.t. the tr atom in its body, we get the empty set of clauses because the predicate tr has no clauses defining a transition from the halt command. No unfoldable atom occurs in the body of clauses 27 and 28. In order to fold clause 27 and 28 the following definition is introduced in Defs and added to InCls: 30. new2(X,Y,X1,Y1):reach(cf(cmd(4,ite(gt(x,y)),5,7),[(x,X),(y,Y)],[]),cf(cmd(9,halt),[(x,X1),(y,Y1)],[])). By folding clauses 27 and 28 using clause 30 we get: 31. new1(X,Y,X1,Y1):- X>=Y+1, new2(X,Y,X1,Y1). 32. new1(X,Y,X1,Y1):- X+1=<Y, new2(X,Y,X1,Y1). Third Iteration of the UDF-loop. Now, we perform one more iteration of the UDF-loop starting from clause 30 in InCls. By unfolding the reach atom in clause 30 we get: 33. new2(X,Y,X1,Y1):- X>=Y+1, reach(cf(cmd(5,asgn(x,call(sub,[x,y]))),[(x,X),(y,Y)],[]),cf(C,[(x,X1),(y,Y1)],[])). 34. new2(X,Y,X1,Y1):- X=<Y, reach(cf(cmd(7,asgn(y,call(sub,[y,x]))),[(x,X),(y,Y)],[]),cf(C,[(x,X1),(y,Y1)],[])). 18 Clauses 33 and 34 correspond to the ‘then’ and ‘else’ branches of the conditional at line (n9) of the gcd program. No unfoldable atom occurs in the body of clauses 33 and 34 (note that the reach atoms are not unfoldable because the commands in their source configurations are function calls). In order to fold clause 33 the following definition is introduced in Defs and added to InCls: 35. new3(X,Y,X1,Y1):reach(cf(cmd(5,asgn(x,call(sub,[x,y]))),[(x,X),(y,Y)],[]), cf(C,[(x,X1),(y,Y1)],[])). By folding clause 33 using clause 35 we get: 36. new2(X,Y,X1,Y1):- X>=Y+1, new3(X,Y,X1,Y1) Clause 34, corresponding to the ‘else’ branch, is processed in a similar way. We first introduce the following definition: 37. new4(X,Y,X1,Y1):reach(cf(cmd(7,asgn(x,call(sub,[y,x]))),[(x,X),(y,Y)],[]), cf(C,[(x,X1),(y,Y1)],[])). By folding 34 using definition 37 we get: 38. new2(X,Y,X1,Y1):- X=<Y, new4(X,Y,X1,Y1). Fourth Iteration of the UDF-loop. Since we have introduced a new definition, namely clause 35, we start a new iteration of the UDF-loop (Clause 37 will be considered in the next iteration of the UDF-loop below). From clause 35, after some unfolding steps, we get: 39. new3(X,Y,X3,Y3):- A=X, B=Y, X2=R1, reach(cf(cmd(1,asgn(r,minus(a,b))),[(x,X),(y,Y)],[(a,A),(b,B),(r,R)]), cf(cmd(2,return(r)),[(x,X1),(y,Y1)],[(a,A1),(b,B1),(r,R1)]))), reach(cf(cmd(3,ite(neq(x,y)),4,9),[(x,X2),(y,Y1)],[]), cf(cmd(9,halt),[(x,X3),(y,Y3)],[]))). We observe that: (i) the command occurring in the first argument of the first reach atom corresponds to the entry point of the sub function, and (ii) the command occurring in the first argument of the second reach atom is an ite command. Thus, none of the atoms of clause 39 are unfoldable. Note also that the local environments in the first argument of the first reach atom is a list where new logical variables, namely A, B, and R, are associated with the parameters and local variable identifiers used by sum, that is, a, b, and r. In order to fold the first atom occurring in the body of clause 39 the following clause is introduced: 40. new5(X,Y,A,B,R,X1,Y1,A1,B1,R1):reach(cf(cmd(1,asgn(r,minus(a,b))),[(x,X),(y,Y)],[(a,A),(b,B),(r,R)]), cf(cmd(2,return(r)),[(x,X1),(y,Y1)],[(a,A1),(b,B1),(r,R1)]))). By folding clause 39 using definition 40 we get: 41. new3(X,Y,X3,Y3):- A=X, B=Y, X2=R1, new5(X,Y,A,B,R,X1,Y1,A1,B1,R1), reach(cf(cmd(3,ite(neq(x,y)),4,9),[(x,X2),(y,Y1)],[]),cf(cmd(9,halt),[(x,X3),(y,Y3)],[]))). In order to fold the second atom occurring in the body of clause 41 the VCG strategy does not require the introduction of any new definition. Indeed, it is possible to fold clause 41 using clause 24 in Defs and we get: 42. new3(X,Y,X3,Y3):- A=X, B=Y, X2=R1, new5(X,Y,A,B,R,X1,Y1,A1,B1,R1), new1(X2,Y1,X3,Y3). Fifth Iteration of the UDF-loop. We perform one more iteration of the UDF-loop for clause 37 defining the new predicate new4. From clause 37, after some unfolding steps, we get: 43. new4(X,Y,X3,Y3):- A=Y, B=X, Y2=R1, reach(cf(cmd(1,asgn(r,minus(a,b))),[(x,X),(y,Y)],[(a,A),(b,B),(r,R)]), cf(cmd(2,return(r)),[(x,X1),(y,Y1)],[(a,A1),(b,B1),(r,R1)]))), reach(cf(cmd(3,ite(neq(x,y)),4,9),[(x,X1),(y,Y2)],[]), cf(cmd(9,halt),[(x,X3),(y,Y3)],[]))). In order to fold the atoms occurring in the body of clause 43 the VCG strategy does not require the introduction of any new definition. Indeed, it is possible to fold clause 43 using clauses 40 and 24 in Defs, and we get: 44. new4(X,Y,X3,Y3):- A=Y, B=X, Y2=R1, new5(X,Y,A,B,R,X1,Y1,A1,B1,R1), new1(X1,Y2,X3,Y3). 19 Sixth Iteration of the UDF-loop. We take clause 40 from InDefs and we start a new iteration of the UDF-loop. By unfolding clause 40 we get: 45. new5(X,Y,A,B,R,X1,Y1,A1,B1,R1):- R1=A-B, reach(cf(cmd(2,return(r)),[(x,X),(y,Y)],[(a,A),(b,B),(r,R)]), cf(cmd(2,return(r)),[(x,X1),(y,Y1)],[(a,A1),(b,B1),(r,R1)]))). The atom reach in the above clause is unfoldable. After one more unfolding step, by using the reflexivity of the reach predicate, we get: 46. new5(X,Y,A,B,R,X,Y,A,B,R1):- R1=A-B. Since InCls = ∅, the VCG strategy terminates. The final, specialized program consists of the following set VCMS of verification conditions: 25. unsafe:- X>=1, Y>=1, X1=< -1, new1(X,Y,X1,Y1). 29. new1(X,Y,X,Y):- X=Y. 31. new1(X,Y,X1,Y1):- X>=Y+1, new2(X,Y,X1,Y1). 32. new1(X,Y,X1,Y1):- X+1=<Y, new2(X,Y,X1,Y1). 36. new2(X,Y,X1,Y1):- X>=Y+1, new3(X,Y,X1,Y1). 38. new2(X,Y,X1,Y1):- X=<Y, new4(X,Y,X1,Y1). 42. new3(X,Y,X3,Y3):- A=X, B=Y, X2=R1, new5(X,Y,A,B,R,X1,Y1,A1,B1,R1), new1(X2,Y1,X3,Y3). 44. new4(X,Y,X3,Y3):- A=Y, B=X, Y2=R1, new5(X,Y,A,B,R,X1,Y1,A1,B1,R1), new1(X1,Y2,X3,Y3). 46. new5(X,Y,A,B,R,X,Y,A,B,R1):- R1=A-B. Now, by using either the solver ELDARICA, or MathSAT, or QARMC, or Z3, we are able to prove that VCMS ∪ {¬ unsafe} is satisfiable, and hence the gcd program is safe. 5. Multi-Step and Small-Step Semantics Compared Now we will compare the multi-step operational semantics MS presented in Section 2 with a small-step operational semantics, denoted SS, that extends the semantics presented in a paper De Angelis et al. [11] to deal with the syntax presented in Table 1. We will also discuss the main differences between the verification conditions we obtain by applying the VCG strategy for these two different semantics. The small-step semantics SS is similar to the multi-step semantics MS in the case of expressions, assignments, conditionals, and jumps. We will not show here the rules for these commands and the interested reader may refer to the above mentioned paper by De Angelis et al. [11]. These two semantics differ in the way they deal with function calls and function return’s. The SS semantics keeps an execution stack (which is empty in the initial configurations), whose elements are called activation frames. Each activation frame contains information about a single function call, that is, it includes: (i) the label where to jump after returning from the function call, (ii) the variable used for storing the value returned by the call, and (iii) the local environment to be used during the execution of the function. Configurations are represented as terms of the form cf(Cmd,D,T), where Cmd is a labeled command, D is a global environment, and T is a stack of activation frames. When a function call of the form ℓ : x = f (e1 , . . . , ek ) is encountered, the SS semantics ‘dives into’ the function definition and makes a transition from the configuration containing the function call to the configuration containing the entry point of f (that is, the command at(firstlab( f ))). When making this transition, encoded by the following clause s1, the SS semantics pushes a new activation frame on top of the execution stack. The loc env(T,S) predicate holds iff either (i) both T and S are empty lists, or (ii) S is the local environment component of the topmost activation frame in T. s1. tr(cf(cmd(L,asgn(X,call(F,Es))),D,T),cf(cmd(FL,C),D,[frame(L1,X,FEnv)|T])):firstlab(F,FL), at(FL,C), nextlab(L,L1), loc env(T,S), eval list(Es,D,S,Vs), build funenv(F,Vs,FEnv). When exiting from a function call, that is, when a command of the form ℓ : return e is encountered, the topmost activation frame in the execution stack is retrieved, and the caller environment is updated using the value returned by the function call. Then, program execution proceeds by popping the activation frame from the execution stack and jumping to the command which is written immediately after the function call. Thus, the transition for a return command is encoded by the following clause: 20 s2. tr(cf(cmd(L,return(E)),D,[frame(L1,X,S)|T]),cf(cmd(L1,C),D1,T1)):eval(E,D,S,V), update(D,T,X,V,D1,T1), at(L1,C). Unlike SS, the MS semantics does not need to keep an execution stack for dealing with function calls. Indeed, when a function call is encountered, MS ‘steps over’ the function definition and makes a transition from the configuration containing the function call to the configuration containing the command which is written immediately after the function call. Since such transition can only be performed if the function call terminates, MS checks that there exists a sequence of transitions (hence, the semantics has been called multi-step) from the configuration containing the entry point of the function definition to a configuration containing either a return or an abort command occurring in the function definition. To make that check possible, the MS semantics requires the introduction of a reach predicate with two arguments that encode the source and target configurations (see clauses 6 and 7 of Table 2), while for the SS semantics it suffices to use a reach predicate that has only one argument that stores the current configuration. Indeed, for the SS semantics, program unsafety is specified by using the following clauses, where the predicate reach is unary and encodes the reachability of an error configuration, not that of a generic configuration as in the case of the MS semantics. s3. unsafe :- initConf(C), reach(C). s4. reach(C) :- tr(C,C1), reach(C1). s5. reach(C) :- errorConf(C). As a consequence of these differences, the VCs generated by our VCG strategy for the MS semantics are different from those generated for the SS semantics. In particular, in the case of the unsafety triple {{x ≥ 1 ∧ y ≥ 1}} gcd {{x < 0}}, the VCG strategy for the SS semantics generates the following set VCSS of verification conditions: ss1. ss2. ss3. ss4. ss5. ss6. ss7. ss8. ss9. ss10. unsafe :- X>=1, Y>=1, new ss1(X,Y). new ss1(X,Y) :- X>=1+Y, new ss2(X,Y). new ss1(X,Y) :- X+1=<Y, new ss2(X,Y). new ss1(X,Y) :- X=< -1, Y=X. new ss2(X,Y) :- X=<Y, new ss3(X,Y). new ss2(X,Y) :- X>=Y+1, new ss4(X,Y). new ss3(X,Y) :- A=Y, B=X, new ss5(X,Y,A,B,R). new ss4(X,Y) :- A=X, B=Y, new ss6(X,Y,A,B,R). new ss5(X,Y,A,B,R) :- Y1=A-B, new ss1(X,Y1). new ss6(X,Y,A,B,R) :- X1=A-B, new ss1(X1,Y). Linearity of clauses. The VCs generated by using the small-step semantics SS consist of linear Horn clauses (that is, clauses having at most one atom in their body), while those generated by using the multi-step semantics might contain nonlinear clauses (see clauses 42 and 44 in Section 4.4). This is due to the fact that the predicate tr encoding the transition relation =⇒ for MS, is defined in terms of the predicate reach that encodes the reflexive, transitive closure =⇒∗ of the relation =⇒. Thus, the clauses obtained at the end of the Unfolding phase of the VCG strategy may contain multiple reach atoms in their body. We will see in Section 8 that linear clauses are typically easier to analyze than nonlinear ones. Moreover, some Horn clause solvers are unable to deal with nonlinear clauses [4, 10]. Processing function calls. According to clause s1 each activation frame includes information about a single function call (the label where to jump after returning from the function call and the variable used for storing the value returned by the call). Hence, the new definition introduced for the entry point of a function contains information that makes it dependent on the context in which the function is called. A consequence of this fact is that such a definition cannot be used for folding all the reach atoms corresponding to the same entry point, that are reached from different calls to the same function. Therefore for each different function call the VCG strategy may need to repeat the specialization process corresponding to the function body. Now let us illustrate this phenomenon by considering again the gcd example. The specialization of the small-step semantics SS introduces the following two definitions for the entry point of the function sub (in these definitions ‘. . .’ stands for some term which is of no interest in our example here), while the specialization of the multi-step semantics MS produces the definition clause 40 only. new ss5(X,Y,A,B,R) :reach(cf(cmd(1,asgn(r,minus(a,b))),[(x,X),(y,Y)],[frame(8,y,[(a,A),(b,B),(r,R)])|. . .])). 21 new ss6(X,Y,A,B,R) :reach(cf(cmd(1,asgn(r,minus(a,b))),[(x,X),(y,Y)],[frame(6,x,[(a,A),(b,B),(r,R)])|. . .])). Both reach atoms refer to the entry point of the definition of the sub function. However, they encode different environments (see, in particular, the different return labels and the different variable names used for storing the value returned by each call). This difference prevents the VCG strategy from introducing a single definition for folding both atoms. Recursive functions. If functions are recursively defined, then the specialization of MS generates better verification conditions than the one of SS in most examples. Indeed, in the presence of recursive definitions the specialization of the SS semantics will not be able to remove the dynamic data structure that encodes the execution stack, and this will make the task of verifying satisfiability much harder for Horn solvers. In contrast, the multi-step semantics can easily deal with recursively defined functions and produces nonlinear VCs whose satisfiability can be checked by using SMT solvers for Horn clauses with constraints over integers and integer arrays. Number of variables. The atoms occurring in the VCs generated when using the MS semantics often have more variables than those occurring in VCs generated when using the SS semantics. This is due to the fact that the SS semantics is encoded by a unary reachability relation reach on configurations, while the MS semantics is encoded by a binary reachability relation reach on configurations. We will see in Section 8 that the differences between the VCs automatically generated using the SS and MS semantics have an impact on the effectiveness of the Horn clause solvers we use for proving satisfiability. Indeed, current Horn solvers are more effective at proving linear VCs, like those generated by the SS semantics, than at proving non-linear VCs, like those generated by the MS semantics. 6. Removing Redundant Arguments It is well known that program specialization and transformation techniques often produce clauses with more arguments than those that are actually needed [25, 38, 51]. Thus, it is not surprising to observe that such a side-effect also occurs when generating VCs via program specialization. Indeed, it is often the case that some of the variables occurring in the CLP program Isp , which is the output of the VCG strategy, are not actually needed to check whether or not unsafe ∈ M(Isp). Avoiding those unnecessary variables, and thus deriving predicates with smaller arity, can increase the effectiveness and the efficiency of applying Horn clause solvers. In this section we will present two transformation techniques aimed at reducing the number of variables occurring in the program Isp . They are extensions to the case of CLP programs of analogous transformations of logic programs presented in other papers [38, 51]. 1. Non-Linking variable Removal Strategy (NLR). We first consider a transformation strategy, called NLR (NonLinking variable Removal) strategy, whose objective is to remove non-linking variables, that is, variables that occur as arguments of an atom in the body of a clause and do not occur elsewhere in the clause [51]. Definition 4 (Linking Variables). Let C be a clause of the form H :- c, L, B, R , where c is a constraint, L and R are (possibly empty) conjunctions of atoms, and B is an atom. The set of the linking variables of the atom B in C, denoted linkvars(B, C), is vars(B) ∩ vars({H,c,L,R}). The set of the non-linking variables of B in C is vars(B) − linkvars(B, C). Before presenting the NLR strategy, we show an example of its effect. Let us suppose that, by applying the VCG strategy, we get the set P1 of clauses in Figure 3, where the non-linking variables have been underlined. By applying the NLR strategy we will get the set P2 of clauses. Now, P2 is equivalent to P1 with respect to the query unsafe, that is, unsafe ∈ M(P1) iff unsafe ∈ M(P2). In particular, NLR replaces the predicates newp1 and newp2, which are called with the non-linking variables X2, Y1, and Y2 (see clauses 1 and 2), by the two new predicates newp3 and newp4, respectively, which are called with linking variables only. Note that the removal of the two arguments Y1 and X2 of newp1, that are the non-linking variables in clause 1, determines in clause 2 the removal of the two arguments Y1 and X2, that are linking variables of newp2. Thus, from newp2 with six arguments in clause 2, by removing also the non-linking variable Y2, we get the predicate newp4 with three arguments only. 22 P1: VCs obtained by VCG P2: VCs obtained by NLR 1. unsafe:- X1>=0, Y2=<0, newp1(X1,Y1,X2,Y2). 2. newp1(X1,Y1,X2,Z2):- Z1=X1+1, newp2(X1,Y1,Z1,X2,Y2,Z2). 3. newp2(X1,Y1,Z1,X2,Y2,Z2):- Z1=<9, Z3=Z1+1, newp2(X1,Y1,Z3,X2,Y2,Z2). 4. newp2(X1,Y1,Z1,X1,Y1,Z1):- Z1>=10. 1′ . unsafe:- X1>=0, Y2=<0, newp3(X1,Y2). 2′ . newp3(X1,Z2):- Z1=X1+1, newp4(X1,Z1,Z2). 3′ . newp4(X1,Z1,Z2):- Z1=<9, Z3=Z1+1, newp4(X1,Z3,Z2). 4′ . newp4(X1,Z1,Z1):- Z1>=10. Figure 3: Application of the Non-Linking variable Removal (NLR) Strategy. The NLR strategy is similar to the VCG strategy, and now we will mention the differences between the two. The NLR strategy is obtained from the VCG strategy in Figure 2 by: (1) assuming that the Unfolding phase is performed with all atoms annotated as non-unfoldable (and thus, for each definition, only the first step of unfolding is performed), (2) replacing the Definition-Introduction & Folding phase with the Definition-Introduction of Figure 4, and (3) performing the Folding phase of Figure 5 at the end of the outermost loop of the VCG strategy (that is, at the end of the loop with double vertical lines in Figure 2), after all unfolding and definition introduction steps. We assume that the input of NLR is any CLP program Prog. To keep the notation simple, we will identify a tuple of variables with the set of variables occurring in it. The union of two tuples is constructed by erasing duplicate elements. Definition-Introduction: while in SpC there is a clause E of the form: H :- c, L, B, R , such that E cannot be folded w.r.t. the atom B using any clause in Defs do let F be newp(P):- B, where newp is a predicate symbol not occurring in Prog ∪ Defs, and P = linkvars(B, E); if in Defs there is a clause D of the form newq(Q):- S such that for some renaming substitution ϑ, Bϑ = S then let G be newp(L):- B, where L = Pϑ ∪ Q; Defs := (Defs − {D}) ∪ {G}; InCls := (InCls − {D}) ∪ {G}; else Defs := Defs ∪ {F}; InCls := InCls ∪ {F}; end-while; Figure 4: The Definition-Introduction phase. Folding: while in SpC there is a clause E of the form H :- c, L, B, R , and in Defs there is a clause D of the form newp(P):- B (modulo variable renaming), and E can be folded w.r.t. B by using D do SpC := (SpC − {E}) ∪ {H :- c, L, newp(P), R}; end-while; Figure 5: The Folding phase. The peculiarity of the NLR strategy lies in the careful treatment of the set of variables occurring in the head of the definition clauses during the Definition-Introduction phase. Let E be a clause in SpC of the form: H :- c, L, B, R , where the predicate symbol of B occurs in Prog. If E cannot be folded with respect to the atom B using any clause in Defs, then we have to introduce a new definition clause as we now explain. First, we consider a definition F whose head contains only the linking variables of the atom B in the clause E. Let F be newp(P):- B, where newp is a predicate symbol not occurring in the set Prog ∪ Defs, and P is the set linkvars (B, E) of the linking variables of B in E. 23 If the set Defs contains a clause D of the form: newq(Q) :- S such that, for some renaming substitution ϑ, Bϑ = S, then we replace clause D in Defs with the clause newp(L):- B, where L = Pϑ ∪ Q. Otherwise, we introduce the definition clause F and we add it to Defs. The introduction of the definition F might seem to be the best choice in the sense that it contains exactly the head variables which are actually needed for folding clause E. However, (variants of) B may occur also in some other clauses to be folded. Thus, if we introduce definitions whose heads contain only the linking variables, we run the risk of introducing several definitions with the same atom in the body and different sets of variables in the head (modulo variable renaming). In order to keep the number of definitions as low as possible (and this often enhances the ability of proving program correctness), instead of introducing multiple definitions containing the same atom in the body, by applying the NLR strategy, we merge them in a single definition whose set of head variables is the union of the head variables occurring in the merged definitions (modulo variable renaming). Theorem 6 (Termination, Correctness, and Size of the Output of NLR). Given any CLP program Prog, the NLR strategy terminates and produces a CLP program Prog′ such that: (i) unsafe ∈ M(Prog) iff unsafe ∈ M(Prog′ ), and (ii) α(Prog′ ) ≤ α(Prog). 2. Constrained FAR Algorithm (cFAR). Now we present an extension to constraint logic programs of the FAR algorithm propoed by Leuschel and Sørensen [38] for removing redundant arguments from logic programs. This extension will be called constrained FAR algorithm, or cFAR, for short. The objective of the FAR algorithm is to remove arguments that are not actually used during any computation of the program at hand. Indeed, it has been shown by Henriksen and Gallagher [30] that the FAR algorithm (and thus, also the cFAR algorithm) can be seen as a generalization of the liveness analysis. In Figure 6 we show the effect of applying the cFAR algorithm to the CLP program P2 obtained by the NLR strategy (see Figure 3). The output of the algorithm is the CLP program P3. Note that in program P3 the predicate symbol newp4 denotes a different relation with respect to the one in program P2, because in P3 it has arity 2 and not 3. P3: VCs obtained by cFAR P2: VCs obtained by NLR 1′ . unsafe:- X1>=0, Y2=<0, newp3(X1,Y2). 2′ . newp3(X1,Z2):- Z1=X1+1, newp4(X1,Z1,Z2). 3′ . newp4(X1,Z1,Z2):- Z1=<9, Z3=Z1+1, newp4(X1,Z3,Z2). 4′ . newp4(X1,Z1,Z1):- Z1>=10. 1′′ . unsafe:- X1>=0, Y2=<0, newp3(X1,Y2). 2′′ . newp3(X1,Z2):- Z1=X1+1, newp4(Z1,Z2). 3′′ . newp4(Z1,Z2):- Z1=<9, Z3=Z1+1, newp4(Z3,Z2). 4′′ . newp4(Z1,Z1):- Z1>=10. Figure 6: Application of the constrained FAR (cFAR) algorithm. In order to define the cFAR algorithm we need to introduce some preliminary notions, some of which have been adapted from the above cited paper by Leuschel and Sørensen [38]. Definition 5 (Erasure, Erased Atom, Erased Clause, Erased Program). (i) An erasure is a set of pairs each of which is of the form (p, k), where p is a predicate symbol of arity n and 1 ≤ k ≤ n. (ii) Given an erasure E and an atom A whose predicate symbol is p, the erased atom A|E is obtained by dropping all the arguments that occur at position k, for some (p, k) ∈ E. (iii) Given an erasure E and a clause C (respectively, a CLP program Prog), the erased clause C|E (respectively, the erased program Prog|E ) is obtained by replacing all atoms A in C (respectively, in Prog) by A|E . In order to avoid the risk of collisions between predicate symbols after erasing some arguments, we assume that Prog does not contain identical predicate symbols with different arity. Obviously, we are interested in removing redundant arguments without altering the semantics of the original program, in the sense captured by the following definition. 24 Definition 6 (Correctness of Erasure). An erasure E is correct for a program Prog if, for all atoms A, we have that : A ∈ M(Prog) iff A|E ∈ M(Prog|E ). Since we are dealing with constraint logic programs, the notion of multiple occurrences of a variable which is used in the original formulation of FAR [38], needs to be generalized as follows. We assume that variables occurring in atomic constraints are distinct. Definition 7 (Variable Constrained to Another Variable). Given two distinct variables X and Y and a constraint c of the form c1∧ . . . ∧ch , where the ci ’s are atomic constraints, we say that X is constrained to Y (in c) if there exists cj , with 1≤ j ≤ h , such that either (i) {X, Y} ⊆ vars(cj ), or (ii) there exists a variable Z such that (ii.1) {X, Z} ⊆ vars(cj ) and (ii.2) Z is constrained to Y (in c). Now we are ready to introduce the notion of safe erasure that will be used during the application of the constrained FAR algorithm. Definition 8 (Safe Erasure). Given a program Prog, an erasure E is a safe erasure if, for all (p, k) ∈ E and clauses H :- c,G in Prog, where H is of the form p(X1,...,Xn) and c is a constraint, we have that : (i) Xk is a variable in {X1,...,Xn} and A |= ∀Xk . ∃Y1, . . . ,Ym . c, with {Y1,...,Ym} = vars(c) − {Xk}, (ii) Xk is not constrained (in c) to any other variable occurring in H, and (iii) Xk is not constrained (in c) to any variable occurring in G|E . By a proof similar to the one by Leuschel and Sørensen [38], it can be shown that if an erasure E is safe, then it is also correct. The cFAR algorithm takes as input a CLP program Prog, computes a safe erasure E, and produces as output the program Prog|E . The algorithm starts off by initializing the current erasure E to the full erasure, that is, the set of all pairs (p, k), where p is a predicate of arity n occurring in Prog and 1 ≤ k ≤ n. Then, while E contains a pair (p, k) such that one of the conditions of Definition 8 is not satisfied, the pair (p, k) is removed from E. The algorithm terminates when it is no longer possible to remove a pair (p, k) from E, and thus E is a safe erasure. The cFAR algorithm terminates and preserves the semantics and the size of the input program, as stated by the following theorem. Theorem 7 (Termination, Correctness, and Size of the Output of cFAR). Given any CLP program Prog, the cFAR algorithm terminates and produces a CLP program Prog|E such that unsafe ∈ M(Prog) iff unsafe ∈ M(Prog|E ) and α(Prog) = α(Prog|E ). Finally, we would like to note that, even if the objectives of the NLR strategy and cFAR algorithm are similar, they work in a different way. While cFAR is goal independent, NLR starts from the predicate unsafe and proceeds by unfolding in a goal directed fashion, similarly to redundant argument filtering [38]. It can be shown that, in general, the NLR strategy and the cFAR algorithm have incomparable effects. 7. Encoding Variations of the Semantics One of the biggest advantages of a semantics-based approach to VC generation via program specialization lies in its agility, that is, its ability to rapidly adapt to changes in the semantics of the imperative programming language under consideration. For example, it might be desirable for a software verification engineer to start modeling a core fragment of the language semantics. That fragment of the semantics will be incrementally extended and refined by adding support for language features which were initially ignored. In this section, we will see how to extend the MS semantics for supporting additional features and how easy it is to encode such extensions in our VC generation framework, without having to modify the VCG strategy. 25 Side-effect free functions. In general, functions may have side effects, that is, the value of the global variables may be altered by a function call. However, if we know that a given function is side-effect free, then we can use custom semantics rules that leave the global environment unchanged, thus generating verification conditions that are hopefully easier to verify. Here is the rule for a function call to f that is side-effect free. ⟨⟨ℓ : x = f (e1 , . . . , ek ), ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(nextlab(ℓ)), update(⟨δ, σ⟩, x, !e" δ σ′ )⟩⟩ if ⟨⟨at(firstlab( f )), ⟨δ, σ⟩⟩⟩ =⇒∗ ⟨⟨ℓr : return e, ⟨δ, σ′ ⟩⟩⟩ If we use this rule, instead of rule R2r, the number of logical variables in the VCs decreases because there is no need to encode the values of the global variables occurring in the target configuration. Let us show an example of this fact. Consider again the gcd program of Section 4.4. If we annotate (either manually or by using an automated analysis) the sub function as side-effect free, then the VCG strategy generates a set of verification conditions which is identical to the set VCMS of verification conditions obtained at the end of Section 4.4, except for the clauses 42, 44, and 46 defining the predicates new3, new4, and new5, respectively, which have to be replaced by the following ones: (R2rsef ) 42sef . new3(X,Y,X3,Y3):- A=X, B=Y, X2=R1, new5(X,Y,A,B,R,A1,B1,R1), new1(X2,Y1,X3,Y3). 44sef . new4(X,Y,X3,Y3):- A=Y, B=X, Y2=R1, new5(X,Y,A,B,R,A1,B1,R1), new1(X1,Y2,X3,Y3). 46sef . new5(X,Y,A,B,R,A,B,R1):- R1=A-B. Note that in clause 46sef the predicate new5, encoding the body of the sub function, has two arguments less than the corresponding predicate new5 in clause 46, which was obtained using rule R2r, instead of R2rsef . We observe that the same effect can also be obtained by applying the NLR strategy to the program VCMS . However, if the information about the side effect freeness is already available, the use of a custom semantics rule for side effect free function calls allows us to avoid performing additional transformations. Undefined functions and assertions. When presenting the multi-step semantics of our language we have assumed that there exists a definition for every function that is called. Now we remove this assumption and we allow programs to call functions whose definition is unknown at verification time (for instance, library functions or functions defined by external modules). In order to extend our semantics with this new feature, we should: (i) restrict the applicability of the rules (R2a) and (R2r) for function calls to defined functions only, and (ii) introduce the following two new rules (R2au) and (R2ru ) for dealing with an undefined function fu . (R2au) ⟨⟨ℓ : x = fu (e1 ,. . .,ek ), ⟨δ,σ⟩⟩⟩ =⇒ ⟨⟨ℓa : abort, ⟨⊥,δ′,σ′ ⟩⟩⟩ This rule considers the case where the call to fu aborts. In this case there is a transition to an aborted configuration. Note that the environments δ′ and σ′ are unknown. We also assume that, for each undefined function fu , we are given an assertion assn( fu ), which denotes an overapproximation of the set of values which may be returned by fu .1 The environment δ′ is unknown. (R2ru ) ⟨⟨ℓ : x = fu (e1 , . . . , ek ), ⟨δ, σ⟩⟩⟩ =⇒ ⟨⟨at(nextlab(ℓ)), update(⟨δ′ , σ⟩, x, v)⟩⟩ where v ∈ assn( fu ). This rule considers the case where the call to fu returns an unknown value v satisfying the assertion on fu . In this case the caller environment is updated by using v as the new value of variable x. Let us now assume that the definition of the sub function of our gcd program of Section 4.4 is unknown. We only know that sub returns a value x such that x ≥ 0. If we annotate the program with this assertion, then we get a set of VCs which is identical to the set VCMS except that: (i) the predicate new5 is not defined (and thus clause 46 is erased), and (ii) clauses 42 and 44 are replaced by the following clauses 42u and 44u : 42u . new3(X,Y,X3,Y3):- A=X, B=Y, X2>=0, new1(X2,Y1,X3,Y3). 44u . new4(X,Y,X3,Y3):- A=Y, B=X, Y2>=0, new1(X1,Y2,X3,Y3). In this replacement the atoms of clauses 42 and 44 with predicate new5, encoding the calls to the sub function, together with the constraints binding the return values to variables of the calling contexts, have been substituted by the underlined constraints. 1 Library functions usually provide some information about their specifications. For instance, the abs function of the GNU C Library is sideeffect free and returns a value greater than zero. 26 Aborted stack traces. In case of an aborted execution, it might be desirable, for debugging purposes, to record the call stack trace containing the command labels and the local environments which led to the execution of the abort command. This can be done by adding to the configuration an extra third component that stores the stack trace. We should also make the following changes to the rules for the abort command and the function call (stack traces are represented by using the familiar list notation): (R3st ) ⟨⟨ℓa: abort, ⟨δ, σ⟩, [ ]⟩⟩ =⇒ ⟨⟨ℓa: abort, ⟨⊥, δ, σ⟩, [(ℓa, σ)]⟩⟩ (R2ast ) ⟨⟨ℓ : x = f (e1 , . . . , ek ), ⟨δ, σ⟩, [ ]⟩⟩ =⇒ ⟨⟨ℓa : abort, ⟨⊥, δ′ , σ⟩, [(ℓ, σ)|s]⟩⟩ if ⟨⟨at(firstlab( f )), ⟨δ, σ⟩, [ ]⟩⟩ =⇒∗ ⟨⟨ℓa : abort, ⟨⊥, δ′ , σ′ ⟩, s⟩⟩ Tuning the VCG strategy. An additional value of the rule-based transformational approach to VC generation is that it gives to the verification engineer a fine-grained control over the shape of the VCs which can be generated. For example, as shown in the gcd example above, by using the unfolding annotation UA presented in Section 4.2 we are guaranteed that the size of the VCs is linear with respect to the size of the imperative program. Thus, we avoid a well-known risk of potential exponential explosion of the number of VCs, and automatically obtain an effect similar to that described by Flanagan and Saxe [23]. In some situations, however, it could be advantageous to use different unfolding annotations. For example, by enlarging the set of reach atoms that are annotated as unfoldable once, we could derive the following set of VCs for gcd example which is considerably smaller than VCMS : a1. unsafe:- X>=1, Y>=1, X1=< -1, new6(X,Y,X1,Y1). a2. new6(X,Y,X2,Y2):- Y1=Y-X, X+1=<Y, new6(X,Y1,X2,Y2). a3. new6(X,Y,X2,Y2):- X1=X-Y, X>=Y+1, new6(X1,Y,X2,Y2). a4. new6(X,Y,X,Y):- X=Y. Of course, care must be taken to ensure that the chosen unfolding annotation still guarantees the termination of the VCG strategy. Conversely, if we reduce the set of atoms that are annotated as unfoldable, the termination of the VCG strategy is always guaranteed, but more definitions are introduced and, consequently, the set of VCs tends to grow. For example, we may tune the unfolding annotation so that the VCG strategy introduces a definition for each program point. For the gcd example, we obtain the following set of VCs: b1. b2. b3. b4. b5. b6. b7. b8. b9. b10. b11. unsafe :- X>=1, Y>=1, X1=< -1, new7(X,Y,X1,Y1). new7(X,Y,X1,Y1):- new8(X,Y,X1,Y1). %main new8(X,Y,X1,Y1):- X+1=<Y, new9(X,Y,X1,Y1). %loop new8(X,Y,X1,Y1):- X>=Y+1, new9(X,Y,X1,Y1). %loop new8(X,Y,X,Y):- X=Y. %loop new9(X,Y,X1,Y1):- X=<Y, new10(X,Y,X1,Y1). %else new9(X,Y,X1,Y1):- X>=Y+1, new11(X,Y,X1,Y1). %then new11(X,Y,X2,Y2):- A=X, B=Y, R1=X1, new12(X,Y,A,B,R,A1,B1,R1), new8(X1,Y,X2,Y2). %sub new10(X,Y,X2,Y2):- A=Y, B=X, R1=Y1, new12(X,Y,A,B,R,A1,B1,R1), new8(X,Y1,X2,Y2). %sub new12(X,Y,A,B,R,A1,B1,R1):- A-B=R, new13(X,Y,A,B,R,A1,B1,R1). %asgn new13(X,Y,A,B,R,A,B,R). %return 8. Experimental Evaluation In this section we present the results of the experimental evaluation we have performed for assessing the viability of our semantics-based method for generating VCs. This experimental evaluation is important because the form of the VCs may have a significant impact on the efficiency and, more importantly, on the effectiveness of the tools which are then used for checking the satisfiability of the VCs. We have applied our VCG strategy for generating the VCs for several verification problems taken from the literature, using both the SS semantics and the MS semantics. Then, we have evaluated the quality of the generated VCs by giving them as input to the following state-of-the-art Horn solvers: (i) QARMC (the Horn solver of the HSF(C) software model checking tool [28]), (ii) Z3 [16] using the PDR engine, (iii) MSATIC3 (a version of MathSAT [4] 27 optimized for Horn solving), and (iv) ELDARICA [32]. In order to evaluate the efficiency of our implementation we have also run the HSF(C) tool alone on the same benchmark set. The results of the experiments demonstrate that our method improves the overall accuracy of HSF(C) with a little increase of verification time, and, thus, it is viable in practice. We also show the performance improvements that we have obtained by improving the implementation of our VCG strategy. Verification problems. We have considered a benchmark set of 320 verification problems written in the C language (227 of which are safe and the remaining 93 are unsafe), taken from the benchmark sets of various software model checking tools,2 whose size ranges from a dozen to about three thousand lines of code. The C programs of the problems we have considered and the VCs we have generated are available at http://map.uniroma2.it/vcgen. Implementation. We have implemented our approach as a part of VeriMAP [10], a software model checking tool written in SICStus Prolog and based on program transformation of CLP programs. Our prototype implementation of the VC generator consists of three modules. (1) A front-end module, based on the C Intermediate Language (CIL) [47], that compiles the given verification problem into a set of Horn clauses (such as the clauses for the at, initConf, and errorConf predicates) using a custom implementation of the CIL visitor pattern. (2) A back-end module, based on VeriMAP, realizing the VCG strategy described in Section 4.1. (3) A module that translates the generated VCs to the specific input format of the solvers we have considered, that is, the constrained Horn clauses dialect of QARMC and ELDARICA and the SMT-LIBv2 format for the Z3 and MSATIC3 solvers. Technical resources. The experiments have been performed using GNU Parallels [54] on 24 to 32 logical cores of an Intel Xeon CPU E5-2640 2.00GHz processor with 64GB of memory under the GNU Linux operating system CentOS 7 (64 bit). Timings are computed as if the experiments were run sequentially. A time limit of five minutes has been set for all problems. (The experimental settings are slightly different from those used in a previous work of ours [12].) Generating the VCs. Now we discuss the performance and the scalability of the VC generation process. In a previous paper [11] we have shown that our verification framework can be effectively used to generate the VCs from a smallstep semantics for a subset of the language presented in Table 1. In the present work, besides experimenting with different formalizations of the operational semantics and different unfolding strategies, we have also implemented several optimizations for increasing the scalability of our method. In particular, we have introduced more efficient procedures for: (i) checking the satisfiability of constraints, and (ii) computing the set FullUnf for atoms annotated as fully unfoldable. Regarding Point (i), the specialization strategy presented in De Angelis et al. [11] makes use of the psat operator, which checks the satisfiability of a constraint and projects it over a given set of variables. However, since projection is not needed when applying the unfolding rule, we have implemented a more efficient operator, called sat, which only performs the satisfiability check. Regarding Point (ii), we have implemented the full unfolding of an atom A simply by evaluating the query A via the findall Prolog predicate and collecting all the answers, hence avoiding the computational overhead due to repeated applications of the meta-level unfolding operation. We report the results we have obtained in Table 4. Small-step Multi-step p SS o SS so SS pf SS fs 1. 2. 3. 4. 5. MS n 216 320 317 320 320 t VCG t 216 VCG 180.43 180.43 1215.62 38.66 4475.19 40.67 221.68 15.17 141.85 10.24 Table 4: Times (in seconds) taken for the VC generation using different language semantics and settings. The time limit is five minutes. n is the number of programs out of 320, for which the VCs were generated. 2 DAGGER (21 problems) TRACER (66 problems) and InvGen (68 problems), WHALE (7 problems) and from the TACAS Software Verification Competition (149 problems). The remaining 9 problems are taken from the literature. 28 Columns (n) and (t VCG ) report the total number of verification tasks for which our tool was able to generate the VCs within the time limit of five minutes, and the time taken for the generation, respectively. Line 1 (SS po ) reports the results obtained when the VCG strategy uses the small-step SS semantics [11] with the psat operator (denoted by superscript p) and unfoldable atoms can only be annotated as unfoldable once (denoted by subscript o), not as fully unfoldable. Line 2 reports the results obtained by replacing the psat operator with the more efficient sat operator (denoted by superscript s). Line 3 and 4 show the results obtained by enabling the use of fully unfoldable annotations for all atoms with non-recursive predicates (denoted by subscript f ) and by using psat and sat, respectively. The best performance of the SS semantics is obtained (see line 4) by using the efficient satisfiability test and the fully unfoldable annotations (SS sf ) allowing us to produce the VCs for the whole benchmark set in less than 4 minutes. Note that if we use psat there are always timed out problems. With regard to the multi-step MS semantics, in line 5 we report the results we obtained by using sat and fully unfoldable annotations for all tr atoms, except those which can be unified with the head of clauses 2a and 2r (see Table 2), whose body contains a reach atom (this restriction is needed for guaranteeing the termination of the calls to the FullUnf procedure). The VC generation process using the MS semantics is faster than using the SS semantics. Moreover, the VCs generated using the MS semantics are more compact than those obtained by the SS semantics. Indeed, the number of clauses of the VCs generated using MS is about the 37% lower than that of the VCs generated using SS fs . (The number of the clauses of the VCs is not shown in Table 4.) In order to compare the VC generation times obtained by varying the version of the semantics and the settings used, we consider the subset of the benchmark consisting of the 216 verification problems for which the specializer is able to generate the VCs (within the time out) whichever semantics and setting is used. In Column (t 216 VCG ) we report the total time required to generate the VCs on this subset of the benchmark set. We note that for this subset the VC generation speedup with respect to SS po reaches 12× for SS sf and 17.5× for MS. In our experiments with different semantics we have also considered a subset of the benchmark set consisting of the SV-COMP verification tasks systemc-transmitter∗ and systemc-token ring∗ (43 problems) whose size ranges from 450 LOC to 2 KLOC. On this subset the VC generation time using MS is always lower than the one required to generate the VCs using SS sf . Moreover, if we consider the hardest verification tasks in this set, namely systemc-transmitter.16 unsafeil.c and systemc-token ring.15 unsafeil.c, the VC generation time using SS po is about 40 minutes, for each problem. This time drops dramatically if we generate the VCs using SS sf (about 8s, for each problem) and MS (about 3.5s, for each problem). Solving the VCs (that is, proving satisfiability of the VCs). The results we have obtained by running the Horn solvers QARMC, Z3, MSATIC3, and ELDARICA on the VCs generated by our tool are reported in Table 5. Small-step (SS sf ) QARMC c s u i f m to n Correct answers safe problems unsafe problems Incorrect answers false alarms missed bugs Timeouts Total problems t VCG st tt at VCG time Solving time Total time Average Time Z3 MSAT Multi-step (MS) ELD QARMC Z3 MSAT HSF(C) ELD 217 161 56 5 3 2 98 320 208 150 58 0 0 0 112 320 205 158 47 3 1 2 112 320 217 158 59 2 0 2 101 320 210 160 50 3 1 2 120 320 196 144 52 0 0 0 124 320 177 147 30 1 1 0 142 320 182 141 41 0 0 0 138 320 189 158 31 12 3 9 119 320 221.68 3656.24 3877.92 17.87 221.68 4221.39 4443.07 21.36 221.68 2988.86 3210.54 15.66 221.68 8809.58 9031.26 41.62 141.85 2674.00 2815.85 13.41 141.85 2704.95 2846.80 14.52 141.85 1896.96 2038.81 11.52 141.85 2779.18 2921.03 16.05 N/A N/A 631.11 3.14 Table 5: Verification results using QARMC, Z3, MSATIC3 (MSAT, for short), ELDARICA (ELD, for short), and HSF(C). The time limit is five minutes. Times are in seconds. 29 Line (c) reports the total number of correct answers, which is the sum of the number of correct answers for safe and unsafe problems reported at lines (s) and (u), respectively. Line (i) reports the total number of incorrect answers, which is the sum of the number of false alarms (safe problems that have been proved unsafe) and missed bugs (unsafe problems that have been proved safe) reported at lines (f ) and (m), respectively. Line (to) reports the number of problems for which the tool did not provide any conclusive answer within the time limit of five minutes. Line (n) reports the total number of problems on which the tool has been applied. Line (t VCG ) reports the time taken by the execution of the VCG strategy. Line (st) reports the time taken for solving the VCs, that is, proving their satisfiability (or unsatisfiability). Lines (tt) and (at) report the total and average verification time, respectively. These times are computed on the (correct or incorrect) answers, excluding the time taken by problems which timed out. We have also reported in the last column the results obtained by running the HSF(C) tool alone, that is, using its own specific VC generator3. If we consider the VCs generated by applying the VCG strategy using the SS semantics, QARMC and ELDARICA provide the highest number of correct answers, while if we consider the VCs generated by using the MS semantics, QARMC provides more correct answers than Z3, ELDARICA, MSATIC34 and, surprisingly, even than HSF(C). Moreover, the SS semantics provides a higher precision (defined as the ratio between the number of programs which has been shown to be safe or unsafe, and the total number of programs) than the MS semantics. We note also that Z3 and ELDARICA provide the highest number of correct answers on unsafe problems. Unfortunately, when executed on the VCs generated by the VCG strategy, most Horn solvers also give some incorrect answers which are due to missed bugs. The incorrect answers are due to the fact that we have considered an idealized semantics for C expressions, which are viewed as expressions in the theory of integer arithmetics. This idealized semantics does not model correctly the overflows that may occur during the evaluation of C expressions. Indeed, the missed bugs of Table 5 are due to unsigned integer expressions occurring in the conditions of if-then-else commands that, when evaluated in the theory of integer arithmetics, are unsatisfiable and make one branch of the conditional unfeasible. This idealized semantics is often adopted by verifiers, such as HSF(C), that do not focus on the problem of handling overflows, and for a fair comparison among the verifiers, we have considered the same idealized semantics. However, our approach is parametric with respect to the constraint solvers used during specialization, and we can define a semantics that agrees with the standard behavior of C arithmetic expressions by using, for instance, a solver that supports modular integer arithmetics (one such solver is available in the Z3 system). If we examine line (at) reporting the average verification time, the best performance is achieved by HSF(C) followed by MSATIC3 and QARMC (whose verification times are 3.6–5.7 times higher). (Recall that the verification times for QARMC and MSATIC3 include the times for VC generation taken by VeriMAP.) The higher time taken by QARMC with respect to HSF(C) can be justified by the fact that it solves more verification problems whose size is large (up to two thousand lines of code). Indeed, if we consider, for example, the set of 190 problems for which an answer (either correct or incorrect) is provided by both HSF(C) and QARMC (on the VCs generated by using the MS semantics), the ratio between their verification times decreases from 4.46 to about 3.29. In this set of problems there are eleven problems (SVCOMP13-locks-test locks∗) having the same structure, but different size, on which QARMC is particularly slow. If we remove these examples from the set, the ratio drops down to 1.29. For QARMC, we also measured the overhead introduced by the VCG strategy, computed as the ratio between the VCG time and total time for the problems which did not time out. We found that this overhead is quite low and ranges from about 5.7% for the SS semantics to 5% for the MS semantics. Improving effectiveness of solving. In Table 6 we show the results obtained by using Z3 after the application of the auxiliary transformations realized by the NLR strategy and the cFAR algorithm presented in Section 6. Those transformations are applied only to the verification conditions for which Z3 was not able to provide an answer. In particular, column ‘VCG ; Z3’ reports the results we get by applying the VCG strategy for the MS semantics, and 3 For technical reasons we were only able to run HSF(C) on an Intel Core Duo E7300 2.66Ghz processor with 4GB of memory under the GNU Linux operating system Ubuntu 12.10 (64 bit). Note that, however, the power of the cores of this processor is comparable to the power of the cores of the processor used for running the other experiments. 4 MSATIC3 is only able to deal with Horn clauses which are linear, possibly after some preprocessing. However, in general the clauses produced by using the MS semantics may be nonlinear. 30 c s u to n Correct answers safe problems unsafe problems Timeouts Total problems t VCG t NLR t cFAR VCG time NLR time cFAR time st Solving time tt at Total time Average time VCG ; Z3 196 144 52 124 320 VCG ; NLR ; Z3 7 3 4 117 124 VCG ; NLR ; cFAR ; Z3 9 7 2 108 117 40.65 – – 20.48 58.39 – 4.57 9.53 304.84 2704.95 988.15 649.56 2745.60 14.01 1067.02 152.43 968.50 107.61 Table 6: Verification results obtained for the MS semantics by applying the VCG strategy, possibly followed by the NLR strategy and the cFAR algorithm, and then by the Z3 solver. The time limit is five minutes. Times are in seconds. then by applying Z3 on the VCs generated. Column ‘VCG ; NLR ; Z3’ reports the results we get after the application of the NLR strategy on the VCs obtained by applying the VCG strategy. Column ‘VCG ; NLR ; cFAR ; Z3’ reports the results we get after the application of the cFAR algorithm on the VCs obtained by applying the NLR strategy. Lines (t VCG ), (t NLR ), and (t cFAR ) report the time taken by the execution of the VCG, NLR, and cFAR transformations on the correct answers, respectively. The NLR strategy enables Z3 to prove 7 additional verification problems. In particular, it allows Z3 to prove the program ntdrvsimpl-cdaudio simpl1 unsafeil.c, that is the largest program in the benchmark set (2.1 KLOC). Concerning the time required for executing the NLR strategy, we want to point out that this program takes the 91% of the total NLR time (t NLR ), that is 53.04 seconds. Therefore, the remaining six programs only require 5.35 seconds to be transformed. The cFAR algorithm allows Z3 to prove 9 additional verification problems. In this case, about 89% of the total cFAR time (t cFAR ), that is 271.62 seconds, is required for specializing two programs, namely ntdrvsimpl-diskperf simpl1 safeil.c (98.82 seconds) and ntdrvsimpl-floppy simpl3 safeil.c (172.80 seconds) whose size is about 1 KLOC each. 9. Related Work and Conclusions Constraint logic programming, also named constrained Horn clauses, has been shown to be a powerful, flexible formalism to reason about the correctness of programs [1, 10, 11, 22, 27–29, 34, 36, 45, 49]. It can also be used as a common intermediate language for exchanging VCs between software verifiers [1, 3, 9, 26] to take advantage of the many special purpose solvers that are available nowadays. In this paper we have shown that program transformation techniques, and more specifically, specialization of CLP programs, can be effectively applied for automatically generating VCs in the form of Horn clauses, starting from different CLP interpreters for the operational semantics of the programming language and for the logic in which the property of interest is specified. Program specialization of a CLP interpreter for the small-step operational semantics of an imperative language has been initially proposed by Peralta et al. [49]. In their approach the specialization process yields a residual CLP program on which analysis techniques based on abstract interpretation are subsequently applied. In a previous work [11] we have presented a VC generation method which overcomes some limitations which were present in Peralta et al.’s paper [49]. In particular, we have introduced support for (non-recursive) functions, and we have improved scalability by encoding programs as sets of facts, instead of terms. In this paper, we have provided support also for recursive functions and multi-step semantics. Specialization of interpreters has also been used by Albert et al. [1, 27] to decompile (that is, translate) Java bytecode into Prolog programs. Then, the residual program is given as input to the CiaoPP analyzer for Prolog that 31 makes use of abstract interpretation techniques to infer properties of the original Java bytecode. Although the work by Albert et al. and ours share some objectives and methods, they also exhibit several differences. First of all, the source languages and their semantics are different: Java bytecode with a big-steps semantics on one side, and C with a multistep semantics on the other side. Also the target languages are different: Prolog on one side, and a CLP language over integers and integer arrays on the other side. As a consequence, the type of verification tools that can be applied and the properties that can be proved are different. In particular, the approach presented in this paper does not produce a program which is directly executable by a CLP (or Prolog) system, but it enables the use of SMT solvers that can prove theorems in the theories of integers and integer arrays. These theorems (such as those expressing properties related to the contents of arrays whose size is a parameter) may not be proved by the CiaoPP analyzer. Another distinctive feature of our approach is that we specialize the interpreter of the program together with the property to be verified, hence enabling a property-directed generation and transformation of verification conditions. We have shown in other papers that this feature may allow us to prove many complex properties besides safety, including recursively defined properties and program equivalence [13, 15]. Also the specialization method of Albert et al. is different from the one presented in this paper. The former follows the approach formalized by Lloyd and Shepherdson [43], while we use unfold/fold rules. While the two approaches are semantically equivalent, we believe that the rule-based approach has some advantages: (i) it gets for free (via folding) the renaming mechanism performed by the codegen procedure, and more importantly (ii) it can easily be combined with the many transformations that can be expressed via unfold/fold rules. Another difference lies in the specialization strategies: Albert et al.’s strategy can be classified, by using the terminology of partial evaluation, as a hybrid online-offline control strategy (some unfolding decisions are taken at specialization time), while our VCG strategy is purely offline. Finally, the two approaches are similar also with respect to scalability, as they both present specialization strategies that run in linear time and produce residual programs that have linear size with respect to the size of the input bytecode. However, here we have proved a more refined property, as we compute the size of the residual programs in terms of the number of atoms, instead of the number of clauses. It would have been interesting to compare the scalability of the two approaches in practice, but this is not straightforward due to the difference of source programs. Our approach shares the same objective of the work by van Leeuwen [55], where generic programming and monadic denotational semantics have been used to define a compositional method for building VC generators, which can be extended to new language features or new languages. A considerable effort has been placed in the area of automated VC generation, as it is evident from the many tools currently available, such as ESC/Java [5], Boogie [2], and Why3 [18]. These tools generate VCs by using Dijkstra’s weakest precondition calculus. ESC/Java generates VCs for Java programs with (user-provided) annotations. Boogie, besides using program annotations, takes advantage of abstract interpretation techniques for inferring inductive invariants, and relies on front-ends that translate programs written in different languages (e.g. C, .NET) into the intermediate BoogiePL language. Similarly, the Why3 [18] verification platform generates VCs for C, Java, and Ada programs by converting them to an intermediate specification and programming language (WhyML). Similarly to Boogie, the Valigator tool [31] is able to infer loop invariants, but it uses different techniques (symbolic summation, Gröbner basis computation, and quantifier elimination) and the strongest postcondition calculus. The approach we have presented in this paper is able, like Boogie and Valigator, to automatically infer loop invariants. To this purpose, we can configure the VCG specialization strategy by using suitable generalization operators [11]. Our method does not rely on a specific calculus to generate the VCs, and it is parametric with respect to the logic in which the property of interest is specified. The generation of VCs based on theorem proving and operational semantics has been investigated by Moore and Matthews et al. Moore [46] presents a proof of concept method to prove partial correctness of programs that makes use of a small-step operational semantics. The semantics is explicitly expressed in the logic, and the VCs are generated as a by-product of the correctness proof. Matthews et al. [44] describe a related approach, where it is shown how an off-the-shelf theorem prover and an operational semantics can be converted into a VC generator. The design of general purpose abstract interpreters, parameterized with respect to the semantics of the programming language has been studied by Cousot [6] and implemented in the TVLA system [41]. Finally, somewhat related to our work, we would like to mention the K rewriting-based framework [53], which has been used for defining executable semantics of several programming languages (including ANSI C). We believe that the use of transformational methods can play an important role in the development of highly 32 parametric tools that support the verification of programs, starting from the formal definition of the programming language semantics and the logic of the properties to be proved. 10. Acknowledgments We thank to the anonymous referees of the conferences PPDP 2015 and CILC 2015, and we also thank John Gallagher for stimulating discussions on the subject. We acknowledge the financial support of INDAM-GNCS (Italy). References [1] E. Albert, M. Gómez-Zamalloa, L. Hubert, and G. Puebla. Verification of Java Bytecode Using Analysis and Transformation of Logic Programs. In M. Hanus, editor, Practical Aspects of Declarative Languages, Lecture Notes in Computer Science 4354, pages 124–139, Springer, 2007. [2] M. Barnett, B.-Y. E. Chang, R. De Line, B. Jacobs, and K. R. M. Leino. Boogie: A modular reusable verifier for object-oriented programs. In F. de Boer, M. Bonsangue, S. Graf, and W.-P. de Roever, editors, Formal Methods for Components and Objects, Lecture Notes in Computer Science 4111, pages 364–387, Springer, 2006. [3] N. Bjørner, K. McMillan, and A. Rybalchenko. Program verification as satisfiability modulo theories. In Proceedings of the 10th International Workshop on Satisfiability Modulo Theories, SMT-COMP ’12, pages 3–11, 2012. [4] A. Cimatti, A. Griggio, B. Schaafsma, and R. Sebastiani. The MathSAT5 SMT Solver. In N. Piterman and S. Smolka, editors, Proceedings of TACAS, Lecture Notes in Computer Science 7795, pages 93–107, Springer, 2013. [5] D. Cok and J. Kiniry. ESC/Java2: Uniting ESC/Java and JML. In Proceedings of the 2004 International Conference on Construction and Analysis of Safe, Secure, and Interoperable Smart Devices, CASSIS’04, pages 108–128, Springer-Verlag, 2005. [6] P. Cousot. Abstract interpretation based static analysis parameterized by semantics. In Proceedings of the 4th International Symposium on Static Analysis, SAS ’97, pages 388–394, London, UK, Springer-Verlag, 1997. [7] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximation of fixpoints. In Proceedings of the 4th ACM-SIGPLAN Symposium on Principles of Programming Languages, POPL ’77, pages 238–252. ACM, 1977. [8] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In Proceedings of the Fifth ACM Symposium on Principles of Programming Languages, POPL ’78, pages 84–96. ACM, 1978. [9] E. De Angelis, F. Fioravanti, J. A. Navas, and M. Proietti. Verification of programs by combining iterated specialization with interpolation. In Proceedings First Workshop on Horn Clauses for Verification and Synthesis, HCVS 2014, Vienna, Austria, 17 July 2014, Electronic Proceedings in Theoretical Computer Science, volume 169, pages 3–18, 2014. [10] E. De Angelis, F. Fioravanti, A. Pettorossi, and M. Proietti. VeriMAP: A Tool for Verifying Programs through Transformations. In Proceedings of the 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS ’14, Lecture Notes in Computer Science 8413, pages 568–574, Springer, 2014. Available at: http://www.map.uniroma2.it/VeriMAP. [11] E. De Angelis, F. Fioravanti, A. Pettorossi, and M. Proietti. Program verification via iterated specialization. Science of Computer Programming, 95, Part 2:149–175, 2014. Selected and extended papers from Partial Evaluation and Program Manipulation 2013. [12] E. De Angelis, F. Fioravanti, A. Pettorossi, and M. Proietti. Semantics-based generation of verification conditions by program specialization. In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, Siena, Italy, July 14-16, 2015, pages 91–102. ACM, 2015. [13] E. De Angelis, F. Fioravanti, A. Pettorossi, and M. Proietti. Proving correctness of imperative programs by linearizing constrained Horn clauses. Theory and Practice of Logic Programming, 15(4-5):635–650, 2015. [14] E. De Angelis, F. Fioravanti, A. Pettorossi, and M. Proietti. A rule-based verification strategy for array manipulating programs. Fundamenta Informaticae, 140(3-4):329–355, 2015. [15] E. De Angelis, F. Fioravanti, A. Pettorossi, and M. Proietti. Relational Verification through Horn Clause Transformation. In Proceedings of the 23rd International Symposium on Static Analysis, SAS ’16, Lecture Notes in Computer Science 9837, pages 147–169, Springer, 2016. [16] L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS ’08, Lecture Notes in Computer Science 4963, pages 337–340, Springer, 2008. [17] S. Etalle and M. Gabbrielli. Transformations of CLP modules. Theoretical Computer Science, 166:101–146, 1996. [18] J.-C. Filliâtre and A. Paskevich. Why3 - Where programs meet provers. In Programming Languages and Systems, 22nd European Symposium on Programming, ESOP ’13, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS ’13, Rome, Italy, March 16–24, 2013. Proceedings, Lecture Notes in Computer Science 7792, pages 125–128, Springer, 2013. [19] F. Fioravanti, A. Pettorossi, and M. Proietti. Automated strategies for specializing constraint logic programs. In K.-K. Lau, editor, Proceedings of the Tenth International Workshop on Logic-based Program Synthesis and Transformation, LOPSTR ’00, London, UK, 24-28 July 2000, Lecture Notes in Computer Science 2042, pages 125–146, Springer-Verlag, 2001. [20] F. Fioravanti, A. Pettorossi, M. Proietti, and V. Senni. Improving reachability analysis of infinite state systems by specialization. Fundamenta Informaticae, 119(3-4):281–300, 2012. [21] F. Fioravanti, A. Pettorossi, M. Proietti, and V. Senni. Generalization strategies for the verification of infinite state systems. Theory and Practice of Logic Programming. Special Issue on the 25th Annual GULP Conference, 13(2):175–199, 2013. [22] C. Flanagan. Automatic software model checking via constraint logic. Sci. Comput. Program., 50(1–3):253–270, 2004. [23] C. Flanagan and J. Saxe. Avoiding exponential explosion: Generating compact verification conditions. SIGPLAN Not., 36(3):193–205, 2001. [24] J. P. Gallagher. Tutorial on specialisation of logic programs. In Proceedings of the 1993 ACM SIGPLAN Symposium on Partial Evaluation and Semantics Based Program Manipulation, PEPM ’93, Copenhagen, Denmark, pages 88–98. ACM Press, 1993. 33 [25] J. P. Gallagher and B. Kafle. Analysis and Transformation Tools for Constrained Horn Clause Verification. Theory and Practice of Logic Programming, 14(4-5):90–101 (Supplementary Materials), 2014. [26] G. Gange, J. A. Navas, P. Schachte, H. Søndergaard, and P. J. Stuckey. Horn clauses as an intermediate representation for program analysis and transformation. Theory and Practice of Logic Programming, 15(4-5):526–542, 2015. [27] M. Gómez-Zamalloa, E. Albert, and G. Puebla. Decompilation of Java Bytecode to Prolog by Partial Evaluation. Information and Software Technology, 51(10):1409–1427, October 2009. [28] S. Grebenshchikov, A. Gupta, N. P. Lopes, C. Popeea, and A. Rybalchenko. HSF(C): A Software Verifier based on Horn Clauses. In C. Flanagan and B. König, editors, Proc. of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS ’12, Lecture Notes in Computer Science 7214, pages 549–551, Springer, 2012. [29] S. Grebenshchikov, N. P. Lopes, C. Popeea, and A. Rybalchenko. Synthesizing software verifiers from proof rules. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, pages 405–416, 2012. [30] K. S. Henriksen and J. P. Gallagher. Abstract interpretation of pic programs through logic programming. In Proceedings of the 6th IEEE International Workshop on Source Code Analysis and Manipulation, SCAM ’06, pages 103–179, 2006. [31] T. Henzinger, T. Hottelier, and L. Kovács. Valigator: A verification tool with bound and invariant generation. In Logic for Programming, Artificial Intelligence, and Reasoning, 15th International Conference, LPAR 2008, Doha, Qatar, November 22-27, 2008. Proceedings, pages 333–342, 2008. [32] H. Hojjat, F. Konecný, F. Garnier, R. Iosif, V. Kuncak, and P. Rümmer. A verification toolkit for numerical transition systems. In D. Giannakopoulou and D. Méry, editors, FM ’12: Formal Methods, 18th International Symposium, Paris, France, August 27–31, 2012. Proceedings, Lecture Notes in Computer Science 7436, pages 247–251, Springer, 2012. [33] J. Jaffar and M. Maher. Constraint logic programming: A survey. Journal of Logic Programming, 19/20:503–581, 1994. [34] J. Jaffar, J. A. Navas, and A. E. Santosa. Unbounded Symbolic Execution for Program Verification. In Proceedings of the 2nd International Conference on Runtime Verification, RV ’11, Lecture Notes in Computer Science 7186, pages 396–411, Springer, 2012. [35] N. D. Jones, C. K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall, 1993. [36] B. Kafle and J. P. Gallagher. Constraint Specialisation in Horn Clause Verification. In Proceedings of the 2015 Workshop on Partial Evaluation and Program Manipulation, PEPM ’15, Mumbai, India, January 15–17, 2015, pages 85–90, ACM, 2015. [37] M. Leuschel and M. Bruynooghe. Logic program specialisation through partial deduction: Control issues. Theory and Practice of Logic Programming, 2(4&5):461–515, 2002. [38] M. Leuschel and M. H. Sørensen. Redundant argument filtering of logic programs. In J. Gallagher, editor, Logic Program Synthesis and Transformation, Proceedings LOPSTR ’96, Stockholm, Sweden, Lecture Notes in Computer Science 1207, pages 83–103, Springer-Verlag, 1996. [39] M. Leuschel, S. S.J. Craig, M. Bruynooghe, and W. Vanhoof. Specialising interpreters using offline partial deduction. In M. Bruynooghe and K.-K. Lau, editors, Program Development in Computational Logic, Lecture Notes in Computer Science 3049, pages 340–375, Springer Berlin Heidelberg, 2004. [40] M. Leuschel and G. Vidal. Fast offline partial evaluation of logic programs. Information and Computation, 235:70–97, 2014. [41] T. Lev-Ami, R. Manevich, and M. Sagiv. Tvla: A system for generating abstract interpreters. In R. Jacquart, editor, Building the Information Society, volume 156 of IFIP International Federation for Information Processing, pages 367–375, Springer US, 2004. [42] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, 1987. Second Edition. [43] J. W. Lloyd and J. C. Shepherdson. Partial evaluation in logic programming. Journal of Logic Programming, 11:217–242, 1991. [44] J. Matthews, J. Moore, S. Ray, and D. Vroon. Verification condition generation via theorem proving. In M. Hermann and A. Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning, Lecture Notes in Computer Science 4246, pages 362–376, Springer Berlin Heidelberg, 2006. [45] K. L. McMillan and A. Rybalchenko. Solving constrained Horn clauses using interpolation. MSR Technical Report 2013-6, Microsoft Report, 2013. [46] J. Moore. Inductive assertions and operational semantics. In D. Geist and E. Tronci, editors, Correct Hardware Design and Verification Methods, Lecture Notes in Computer Science 2860, pages 289–303, Springer Berlin Heidelberg, 2003. [47] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In R. Horspool, editor, Compiler Construction, Lecture Notes in Computer Science 2304, pages 209–265, Springer, 2002. [48] J. C. Peralta and J. P. Gallagher. Imperative Program Specialisation: An Approach Using CLP. In A. Bossi, editor, Logic Programming Synthesis and Transformation, 9th International Workshop, LOPSTR’99, Venezia, Italy, September 22–24, 1999, Selected Papers, Lecture Notes in Computer Science 1817, pages 102–117, Springer, 2000. [49] J. C. Peralta, J. P. Gallagher, and H. Saglam. Analysis of Imperative Programs through Analysis of Constraint Logic Programs. In G. Levi, editor, Proceedings of the 5th International Symposium on Static Analysis, SAS ’98, Lecture Notes in Computer Science 1503, pages 246–261, Springer, 1998. [50] B. C. Pierce. Types and Programming Languages. MIT Press, Cambridge, MA, USA, 2002. [51] M. Proietti and A. Pettorossi. Unfolding-definition-folding, in this order, for avoiding unnecessary variables in logic programs. Theoretical Computer Science, 142(1):89–124, 1995. [52] C. J. Reynolds. Theories of Programming Languages. Cambridge University Press, 1998. [53] G. Rosu and T. Serbanuta. An overview of the K semantic framework. Journal of Logic and Algebraic Programming, 79(6):397–434, 2010. [54] O. Tange. Gnu parallel - the command-line power tool. ;login: The USENIX Magazine, 36(1):42–47, Feb 2011. [55] A. van Leeuwen. Building verification condition generators by compositional extension. Electronic Notes in Theoretical Computer Science, 191(0):73–83, Proceedings of the Doctoral Symposium affiliated with the Fifth Integrated Formal Methods Conference (IFM 2005), 2007. 34 Appendix Proof of Lemma 1 Proof. In this paper we have assumed that the initial and error properties are expressed by constraints, and hence both the predicates initConf and the predicate errorConf is defined by a constrained fact. Thus, the full unfolding of the initConf and errorConf atoms terminates. Similarly, we assume that the full unfolding of atoms having predicate different from tr and reach (that is, eval, update, etc.) terminates. All tr atoms, except those corresponding to the semantics of function calls, are defined by non recursive predicates. Thus, the full unfolding of these atoms terminates. Let us now consider a sequence of unfolding steps performed during the while-do loop of the Unfolding phase by using the annotation UA. We will show that the sequence is finite. We refer to the command occurring in the source configuration of tr or reach as the source command. We have the following properties: (i) by full unfolding a tr atom whose source command is not a function call, the tr atom is deleted; (ii) by unfolding once a tr atom whose source command is a function call, the tr atom is replaced by an atom of the form reach(cf(cmd(L,Cmd), ), ) which is non-unfoldable because L is the entry point of a function definition; (iii) by unfolding once a reach atom whose source command is an assignment, and then by fully unfolding the derived tr atom, the reach atom is replaced by a new reach atom whose source configuration has a command with a greater label (according to the linear order of the labels); (iv) by unfolding once a reach atom whose source command is a jump goto(L), and then by full unfolding of the derived tr atom, the reach atom is replaced by an atom of the form reach(cf(cmd(L,Cmd), ), ) which is non-unfoldable because L occurs in a goto command; (v) by unfolding once a reach atom whose source command is the abort command, and then unfolding the derived atoms, the reach atom is deleted. By properties (i)–(v), during the sequence either (a) the number of unfoldable atoms in the body of a clause decreases or (b) it does not increase and the source configuration of a reach atom contains a command with a greater label. Thus, the sequence of unfolding steps is finite. ! Proof of Lemma 2 Proof. After the Unfolding phase, all atoms in the body of clauses in SpC have the reach predicate symbol. Indeed, all other atoms are unfoldable (fully or once). Then, as prescribed by the Definition & Folding phase, only clauses of the form newr(V) :- reach(cf1,cf2) will be introduced for folding clauses in SpC. Points (i)–(vii) follow from the definition of the semantics in Table 2. In particular, observe the following facts. Points (i)–(iii) follow from the definition of tr as a binary relation between configurations. Point (iv) follows from the fact that an error configuration can only contain halt or abort commands, and by unfolding a function call we can only derive new reach atoms whose target configuration contains either an abort or a return command. Point (v) directly follows from the definition of tr. Point (vi) follows from the fact that the domain of every global environment is determined by the set of global variables occurring in the program, and the domains of the local environments S1 and S2 are determined by the formal parameters and local variables of the functions where L1 and L2 appear. Point (vii) follows from the fact that the set of program variables belonging to the domain of the environment at a given label is uniquely determined by the label itself, and their values are represented by distinct CLP variables. By Points (i)–(vii), every clause in ∆ contains a label of P in its source configuration and, for every label of P, there are at most three definitions whose source configuration contains that label. Hence the thesis. ! Proof of Lemma 3 Proof. The proof is by contradiction. Suppose that there exist two distinct definitions: D1. new1(X) :- reach(cf1, ) D2. new2(X) :- reach(cf2, ) such that cf1 and cf2 have commands with distinct labels and by unfolding (possibly several times) D1 and D2, respectively, we get two clauses of the form: D3. new1(X) :- ...,reach(cf(cmd(L,Cmd), ),cfz) 35 D4. new2(X) :- ...,reach(cf(cmd(L,Cmd), ),cfz) and then reach(cf(cmd(L,Cmd), ),cfz) is unfolded in both clauses. We may assume that D3 and D4 are the first (in the unfolding order) pair of clauses where this happens. Since L is reached from two distinct labels, it must occur in some ite or goto, or it is the entry point of a function definition. Thus, according to the unfolding annotation UA, reach(cf(cmd(L,Cmd), ),cfz) is non-unfoldable, and we get a contradiction. ! Proof of Lemma 4 Proof. We prove this lemma by taking k = 6. If the command in cf1 is halt, abort, or return, then by unfolding reach(cf1,cfz) in C we derive at most one clause of the form: D1. newp(X)ϑ where ϑ is the most general unifier of cf1 and cfz. Hence, the lemma holds. Otherwise, by unfolding reach(cf1,cfz) in C we derive a clause of the form: D2. newp(X) :- tr(cf1,Y), reach(Y,cfz) and SpC = {D2}. There are the following two cases. (Case 1: tr(cf1,Y) is fully unfoldable.) By unfolding tr(cf1,Y) in D2 we derive at most two clauses. Indeed, precisely two clauses (D3, D4 below) in the case where the command in cf1 is an if-then-else, and one clause (D3) otherwise: D3. newp(X) :- c, reach(cf2,cfz) D4. newp(X) :- d, reach(cf3,cfz) where c and d are constraints. In the case where the command in cf1 is an if-then-else both reach(cf2,cfz) and reach(cf3,cfz) is non-unfoldable. Thus, α(SpC) = α({D3, D4}) = 4 and we get the thesis. Otherwise, SpC = {D3} and reach(cf2,cfz) may be unfoldable. If the command in cf2 is abort, then from D3 we derive a clause with empty body. If the command in cf2 is either an assignment or a jump, then from D3 we derive, by unfolding reach(cf2,cfz) and then by unfolding also the generated tr atom, a clause of the form: D5. newp(X) :- e, reach(cf4,cfz) where e is a constraint. Otherwise, reach(cf2,cfz) is non-unfoldable. The unfolding of D5 will eventually terminate (see Lemma 1) and will derive precisely one clause with at most one atom in its body. Thus, by unfolding, from D2 we derive (possibly in many steps), a set SpC consisting of at most one clause with at most one atom in its body. In this case α(SpC) ≤ 2 and we get the thesis. (Case 2: tr(cf1,Y) is unfoldable once.) This case occurs when cf1 contains a function call. By unfolding tr(cf1,Y) in D2 we derive two clauses of the form: D3. newp(X) :- reach(cf2,cf3), reach(cf4,cfz) D4. newp(X) :- reach(cf2,cf5), reach(cf6,cfz) where: (i) cf2 corresponds to the entry point of a function, cf3 corresponds to an abort command, cf5 corresponds to a return command, and cf4, cf6 are the configurations reached after the exit from the function call. Then, by the unfolding annotation UA, reach(cf2,cf3) and reach(cf2,cf5) are non-unfoldable. The atoms reach(cf4,cfz) and reach(cf6,cfz) are either unfoldable once or non-unfoldable. Since the unfolding of a reach atom (possibly followed by the unfolding of a tr atom) which is annotated as unfoldable once has the effect of replacing that atom by at most one new atom, when the Unfolding phase terminates, from D3 and D4 we derive a set SpC consisting of two clauses with at most two atoms in their body. Thus, α(SpC) ≤ 6. ! Proof of Theorem 6 Proof. (Termination.) The while-loop within the Unfolding phase terminates because no predicate is annotated as unfoldable, and hence exactly one unfolding is performed for each clause in InCls. The Definition-Introduction phase in Figure 4 terminates because it introduces a finite number of definitions, at most one for each atom B occurring in the body of a clause in SpC. The while-loop that iterates the Unfolding and Definition-Introduction phases terminates because each new definition is of the form newp(L):- B, where L is a tuple of variables occurring in B. The while-loop of the Folding phase in Figure 5, terminates because at each iteration the number of occurrences of atoms that can be folded decreases by one. 36 (Correctness.) The correctness of the NLR strategy follows from the correctness of the transformation rules presented in Section 4.1. (Size of the Output.) For any given atom B to be folded by using a definition introduced by the NLR strategy, the Definition-Introduction in Figure 4 either (then branch) replaces the renamed apart definition H :- B in Defs with a definition of the form F :- B such that vars(H) ⊆ vars(F) or (else branch) adds a definition of the form F :- B where vars(F) ⊆ vars(B). Therefore, at the end of the Definition-Introduction phase, for each atom B to be folded, there exists a single definition that can be used to fold all the renamed apart variants of B occurring in the clauses in SpC. Hence, we have that the NLR strategy does not increase the number of predicates with respect to those introduced by the VCG strategy. By construction, each predicate p′ in Prog′ corresponds to a predicate p in Prog such that p and p′ are defined by sets of clauses with the same number of atoms (possibly, with smaller arity). ! Proof of Theorem 7 Proof. (Termination.) Termination follows from the fact that the erasure E is finite and its size decreases at each iteration of the loop of the cFAR algorithm. (Correctness.) Since all the erasures that are not safe are removed by the cFAR algorithm, for all atoms A, we have that A ∈ M(Prog) iff A|E ∈ M(Prog|E ). Thus, in particular, unsafe ∈ M(Prog) iff unsafe ∈ M(Prog|E ). (Size of the Output.) The program Prog|E is obtained from the program Prog by replacing every atom A in Prog with the atom A|E . Thus all clauses C|E in Prog|E have the same number of atoms of the corresponding clause C in Prog. Since, by definition, the size of a clause C is the number of the atoms occurring in C and the size of a set S of clauses is the sum of the sizes of the clauses in S , we have that α(Prog) = α(Prog|E ). ! 37