Academia.eduAcademia.edu

Goanna: Syntactic Software Model Checking

2008, Lecture Notes in Computer Science

Goanna: Syntactic Software Model Checking Ansgar Fehnker, Jörg Brauer, Ralf Huuck, and Sean Seefried National ICT Australia Ltd. (NICTA)⋆ Locked Bag 6016 University of New South Wales Sydney NSW 1466, Australia Abstract. Goanna is an industrial-strength static analysis tool used in academia and industry alike to find bugs in C/C++ programs. Unlike existing approaches Goanna uses the off-the-shelf NuSMV model checker as its core analysis engine on a syntactic flow-sensitive program abstraction. The CTL-based model checking approach enables a high degree of flexibility in writing checks, scales to large number of checks, and can scale to large code bases. Moreover, the tool incorporates techniques from constraint solving, classical data flow analysis and a CEGAR inspired counterexample based path reduction. In this paper we describe Goanna’s core technology, its features and the relevant techniques, as well as our experiences of using Goanna on large code bases such as the Firefox web browser. 1 Introduction Model checking and static analysis are automated techniques promising to ensure (limited) correctness of software and to find certain classes of bugs automatically. One of the drawbacks of software model checkers [1–3] is that they typically operate on a low level semantic abstraction making them suitable for small code bases, but less so for larger software and, when soundness is paramount, are not applicable to industrial C/C++ code containing pointer arithmetic, unions, templates and alike. On the other hand, static analysis tools [4] have been concentrating on a shallower but more scalable and applicable analysis of large code bases. Typically, soundness is sacrificed for performance and practicality [5]. There are, however, many advantages in using a model checker. Specifications can often be given elegantly in temporal logic, there are many built-in optimizations in state-of-the-art tools, and especially CTL model checkers have been shown to be rather insensitive to the number of different checks performed on the same model. In this work we present Goanna, a tool for static program analysis that makes use of the advances of modern model checkers and combines it with constraint ⋆ National ICT Australia is funded by the Australian Governments Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australias Ability and the ICT Research Centre of Excellence programs. solving and counterexample based abstraction refinement (CEGAR) techniques [15, 1]. Goanna uses standard symbolic CTL model checking as implemented in the NuSMV [6] tool on a high-level program abstraction. This abstraction includes the control flow graph (CFG) of a program and labels (atomic propositions) consisting of syntactic occurrences of interest. On this level of abstraction model checking is fast, scalable to large code fragments and scalable to many such checks in the same model. Given that there are typically only a few bugs in every thousand lines of code [7] the abstraction is also appropriate for a first approximation. In a second step, more advanced features are used such a constraint solving and CEGAR-inspired path reduction to exclude false alarms. On top we incorporated alias analysis and summary-based interprocedural analysis to gain additional depth while remaining scalable. In Section 2 we briefly explain the underlying technology, followed by a list of additional features in Section 3. A summary of our experiences from analyzing industrial code can be found in Section 4. 2 Core Technology Goanna is built on an automata based static analysis framework as described in [8], which is related to [9–11]. The basic idea of this approach is to map a C/C++ program to its CFG, and to label this CFG with occurrences of syntactic constructs of interest automatically. The CFG together with the labels can be seen as a transition systems with atomic propositions, which can easily be mapped to the input language of a model checker, in our case NuSMV, or directly translated into a Kripke structure for model checking. A simple example of this approach is shown in Fig. 1. Consider the contrived program foo which is allocating some memory, copying it a number of times to a, and freeing the memory in the last loop iteration. One example of a property to check is that after freeing some resource it will not be used, i.e., otherwise indicating some memory corruption. In our approach we syntactically identify program locations that allocate, use, and free resource p. This is done automatically by pattern matching for the pre-defined relevant constructs on the program’s abstract syntax tree. Next, we automatically label the program’s CFG with this information as shown on the right hand side of Fig. 1 and encode the check itself as follows in CTL: AG (mallocp ⇒ AG (f reep ⇒ ¬EF usedp )), which means that whenever there is free after malloc for a resource p, there is no path such that p is used later on. Neglecting any further semantic information will lead to a false alarm in the current example since p is only freed once in the last loop iteration and there is no access to it later. However, the abstraction in Fig. 1 does not reflect this. We will come back to this issue in Section 3.2. One of the advantages of the proposed approach is that, e.g., a stronger variant of the above check can easily be obtained by switching from EF to AF , i.e., warning only when something goes wrong on all paths. As such, CTL model 2 1 void foo() { 2 int x, *a; 3 int *p=malloc(sizeof(int)); 4 for(x = 10; x > 0; x--) { 5 a = *p; 6 if(x == 1) 7 free(p); l1 mallocp l2 l7 l3 usedp f reep l5 } 8 9 l0 l4 l6 } Fig. 1. Example program and labeled CFG for use-after-free check. checking is an easy and powerful approach to defining different properties with different strength quickly. 3 Features The aforementioned static analysis approach allows for describing properties in a simple straightforward fashion. In this section we present a number of more advanced features integrated in Goanna. In particular, we describe new techniques to increase the precision of the analysis and the type of bugs that can be found by the tool. 3.1 Constraint Analysis The model-checking based static analysis approach described in Section 2 is well suited for checking control-flow dependent properties such as certain types of memory leaks, uninitialized variables, potential null-pointer dereferences or alike. However, one of the major concerns for software correctness as well as software security are buffer overruns such as accesses to arrays beyond their bounds. Typically, these problems have been addressed by abstract interpretation solutions [12]. In Goanna we added an interval constraint solving approach that uses the approach first developed by [13]. These techniques guarantee a precise least solution for interval analysis including operations such as additions, subtraction and multiplication without any widening in the presence of (simple) loops. Goanna uses interval analysis to infer for every integer variable in every program statement its potential range and uses it to, e.g., check for buffer overand underruns. 3 3.2 False Path Reduction Not every bug is a real one, i.e., the initial coarse syntactic abstraction might lead to a number of false positives. For instance in the example in Figure 1 the memory will only be accessed as long as the memory is not freed. A straightforward flowsensitive analysis will not catch this fact. In Goanna we make use of the aforementioned interval constraint solving approach to rule out some of those spurious alarms automatically [14]. For a given counterexample path Goanna subjects it to the interval analysis as described in the previous Section 3.1. This means, for every variable every possible value will be approximated along the counterexample path. E.g., in the program of Figure 1 the counterexamples loops trough the program after condition x==1 becomes true. However, when x = 1, the loop counter will be set to x = 0 in the next iteration, invalidating the loop condition of x > 0 and preventing the loop body to be re-entered. An interval constraint analysis over the counterexample path will come to the same conclusion by discovering that here is no possible satisfying valuation for x. Goanna will learn this fact and create an observer automaton as described in [14]. This observer rules out a minimum set of conflicts, i.e., those conditions responsible for a false path. Observers are put in parallel composition to the existing coarse grained model to obtain a more precise model showing fewer false positives. While the parallel composition adds some overhead to the overall computation time, it is only added in a few cases were spurious alarms are found, ensuring an overall efficient analysis. Similar to CEGAR the above process is iterated until no more bugs are found or no more counterexample paths can be eliminated. Note that since the interval analysis is an approximation itself, it is not guaranteed that all false positives can be removed. 3.3 Interprocedural Analysis The aforementioned analysis techniques are mostly applicable for intraprocedural analysis, i.e., analyzing one function at a time. While this is sufficient for returning good results on standard software, it neglects, e.g., null pointers being passed on through several functions and then finally dereferenced without being properly checked. Therefore, we developed a summary-based interprocedural analysis. This analysis includes two features: The computation of alias information by inferring points-to sets [16] and computing a summary based on the lattice of the property under investigation. In the case of passing null pointer information, the summary records for every passed pointer if it points to Null, to something not Null or to an unknown value. Given the call graph of the whole program Goanna computes the fixed point on the summary information, rerunning the local analysis when needed. Even in the presence of recursion this procedure typically terminates within two to three iterations involving a limited set of functions. 4 3.4 Additional Features We briefly summarize a number of additional features built into Goanna, which are mostly on a usability level. Integration. Goanna can be run from the command line as well as be tightly integrated in the Eclipse IDE. In the latter case, all the warnings are displayed in the IDE, settings and properties can be set in the IDE and for every bug a counterexample trace created by the model checker can be displayed. As a consequence, Goanna can be used during software development for every compilation. Incremental analysis. To minimize the analysis overhead Goanna creates hashes of every function and object. If those functions and objects are not changed between compilations there will be no re-analysis. Of course, hashes are insensitive to additional comments, line breaks and minor code rearrangements. User defined checks. Goanna comes with a simple language including CTL-style patterns, which enables the user to define his own checks. The language builds on pre-defined labels for constructs of interest. This language does not enable full flexibility, but it is safe to use and covers most scenarios. A full abstract language is in preparation. 4 Experiences We have been evaluating Goanna on numerous open source projects as well as on industrial code. The largest code base has been the Firefox web browser, which has 2.5 million lines of code after preprocessing. Run-time is roughly 3 to 4 times slower than compilation itself for intraprocedural analysis. Interprocedural analysis can double the run-time, but it is worth to mention, that most of the time is spent in the alias analysis that is not yet efficiently implemented. In a typical analysis run over 90% of files are analyzed in less than 2 seconds. Roughly 40% of the analysis time is spent for model checking, 30% for interval analysis and 30% for pattern matching, parsing and other computations. Adding more checks gives modest penalties, i.e., a threefold increase in the number of checks doubles the analysis time. Interval analysis is fast as long as the number of constraints is below a few hundred, i.e., resulting in a maximal overhead of one second. In very rare cases, due to C/C++ macros or C++ templates a function might contain several hundreds of variables and very long code fragments. In these cases the overall analysis might take considerable time. However, in our experience from analyzing the source code of Firefox a time out of 2 minutes was only exceeded once out of roughly 250, 000 functions for the complete analysis. The overall defect density is between 0.3 to 2 bugs per 1000 lines of code, depending on the code base and type of checks enabled. This is comparable with other commercial static code checkers [4]. 5 References 1. Clarke, E., Kroening, D., Sharygina, N., Yorav, K.: SATABS: SAT-based predicate abstraction for ANSI-C. In: Proc. TACAS 2005. LNCS 3440 (2005) 570–574 2. Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In: Proc. TACAS 2004. LNCS 2988 (2004) 168–176 3. Henzinger, T.A., Jhala, R., Majumdar, R., McMillan, K.L.: Abstractions from proofs. In: POPL. (2004) 232–244 4. Emanuelsson, P., Nilsson, U.: A comparative study of industrial static analysis tools. Electronic notes in theoretical computer science (2008) 5. Engler, D., Chelf, B., Chou, A., Hallem, S.: Checking system rules using systemspecific, programmer-written compiler extensions. In: Proc. Symposium on Operating Systems Design and Implementation, San Diego, CA. (October 2000) 6. Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M., Sebastiani, R., Tacchella, A.: NuSMV Version 2: An OpenSource Tool for Symbolic Model Checking. In: Intl. Conf. on Computer-Aided Verification (CAV 2002). Volume 2404 of LNCS., Springer (2002) 7. scan.coverity.com: Open source report. Technical report, Coverity Inc. (2008) 8. Fehnker, A., Huuck, R., Jayet, P., Lussenburg, M., Rauch, F.: Model checking software at compile time. In: Proc. TASE 2007, IEEE Computer Society (2007) 9. Holzmann, G.: Static source code checking for user-defined properties. In: Proc. IDPT 2002, Pasadena, CA, USA (June 2002) 10. Dams, D., Namjoshi, K.: Orion: High-precision methods for static error analysis of C and C++ programs. Bell Labs Tech. Mem. ITD-04-45263Z, Lucent Technologies (2004) 11. Schmidt, D.A., Steffen, B.: Program analysis as model checking of abstract interpretations. In: Proc. SAS ’98, Springer-Verlag (1998) 351–380 12. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: Conference Record of the Sixth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Antonio, Texas, ACM Press, New York, NY (1979) 269–282 13. Gawlitza, T., Seidl, H.: Precise fixpoint computation through strategy iteration. In: ESOP. LNCS 4421, Springer Verlag (2007) 300–315 14. Fehnker, A., Huuck, R., Seefried, S.: Counterexample guided path reduction for static program analysis. In: Correctness, Concurrency, and Compositionality. Volume number to be assigned of Festschrift Series., LNCS (2008) 15. Henzinger, T., Jhala, R., Majumdar, R., SUTRE, G.: Software verification with BLAST. In: Proc. SPIN2003. LNCS 2648 (2003) 235–239 16. Andersen, L.: Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, Unversity of Copenhagen (1994) 6