OCaml Static Analysis
OCaml Static Analysis
OCaml Static Analysis
281
slicer might remove information leaks — in particular, a malicious 3.2 Control over the use of resources
programmer could insert information leaks that he knows the tradi- One of Frama-C’s first plug-ins was a value analysis based on ab-
tional slicer used for the audit will remove. stract interpretation. This plug-in computes supersets of possible
values for expressions of the program. Among other things, these
over-approximated sets are useful to exclude the possibility of a
run-time error. In contrast with the heuristic techniques used in
other static analysis tools, which may be very efficient but solve
a different problem, shortcuts do not work when the question is
to correctly — that is, without false negatives — find all possible
run-time errors in large embedded C programs. The analysis has
to start from the entry point of the analyzed program and unroll
function calls (and often loops). In addition, the modular nature of
Frama-C and the interface that the value analysis aimed at provid-
ing to other plug-ins meant that abstract states had to be memo-
rized at each statement, which dictated the choice of persistent data
structures, with sharing between identical sub-parts of states (Cuoq
and Doligez 2008). This meant at the very least that a garbage-
collected language had to be used. While there are popular imper-
ative languages that are garbage-collected nowadays, and some of
these languages have huge libraries of data structures ready to use,
persistent data structures are often under-represented in these li-
braries, and are slightly annoying to write in these languages. For
writing a static analyzer, one is not worse off with OCaml and a few
select libraries (Conchon et al. 2008; Filliâtre and Conchon 2006)
Figure 1. Frama-C’s value analysis displaying results in the GUI than with any of Python, the .NET framework or the Java platform,
although it is by no means impossible to write static analyzers in
any of these.
2.3 History of Frama-C By contrast, in the development of Caveat in C++, there were
At the Software Reliability Labs, Frama-C was initially seen as a explicit deallocation functions, and sanity checks that warned at
major evolution to Caveat (Baudin et al. 2002; Randimbivololona the end of a session if deallocation had been forgotten for some
et al. 1999; Delmas et al. 2008), a software verification tool for C nodes. Programming time that could have been spent usefully was
programs based on Hoare logic. Caveat is used by Airbus for part spent writing the calls to the deallocation primitives and the source
of the verification of part of the software embedded in the A380. code’s readability was diminished, but this is neglectable compared
As the DO-178B standard mandates, Caveat has been qualified by to the time that had to be spent debugging (the warnings told the
Airbus as a verification tool to be used for the certification of this developper that the deallocation had been forgotten, not where
particular software. Caveat is supported by the Software Reliability it should have been done). The solution to a subtle deallocation
Labs but new ideas are now developed within Frama-C. problem was often to make copies of existing nodes (although it is
OCaml was pushed as the implementation language to choose not meaningful to compare the memory consumption of Caveat to
for Frama-C by the new hires, but the actual reason it was ac- that of Frama-C’s value analysis, because they work on different
cepted is that OCaml was not completely unheard of to the senior principles).
researchers there. Indeed, OCaml was already used in the (predom- To conclude, in Frama-C, garbage collection paradoxically en-
inantly C++) Caveat project as the scripting language that allows ables a tighter control of memory usage than explicit deallocation
an interactive validation process to be re-played in batch mode. because it makes sharing possible in practice. OCaml is a strict lan-
guage. We do not know what the influence of lazy evaluation would
3. Technical context be on memory use.
282
This drawback (less developers implies less available libraries) with the help of source distribution systems such as GARNOME or
is mitigated by higher code re-usability when you do find code MacPorts. In this case, retrieving and installing the dependencies of
to re-use, the existence of the Caml Hump3 and the fact that the the gtksourceview1 library (a GTK + widget for displaying source
grapevine works well between OCaml developers. OCaml does not code) is a pain. This is not directly an OCaml problem, but another
have anything that begins to compare for instance with CPAN, the development platform could have made it possible to use the mod-
Comprehensive Perl Archive Network, but it does have a healthy ern gtksourceview2 (which solves the dependencies problem) or
community. provided more toolkits to choose from initially (to the best of our
knowledge, OCaml only offers Tk and GTK + at this time). Now
3.4 Portability that gtksourceview2 has become stable, there is talk on the lablgtk
It would have been an annoyance if the development platform for development list about including it in lablgtk. This is anyway a very
Frama-C had allowed it to be used only on Unix variants, because minor quibble. It should be kept in mind for comparison that Java
many potential users only have access to Windows stations. Unix or .NET/Mono do not come pre-installed on every platform either.
being the system used by the majority of the researchers at the Again, we have no reason to regret the choices of OCaml and GTK +
Software Reliability Labs, the switch to a Windows-only platform from the standpoint of portability.
was not considered. Motivated users — and at this time actually OCaml as a scripting language It should be noted that while this
deploying formal methods requires motivation anyway — have is not its strongest point, OCaml is an acceptable scripting lan-
found their way past this limitation for previous projects developed guage. In other words, when OCaml is chosen as the main language
at the Labs. for a new project, the project may be saved the introduction of addi-
From this point of view, the choice of OCaml (and later of GTK + tional dependencies towards various dedicated scripting languages
as the toolkit for the graphical interface, through the lablgtk bind- down the road. For instance, the HTML pages of the Frama-C web
ings) was an excellent compromise, with Unix clearly being the site are processed with the yamlpp preprocessor4 , which is written
primary platform of the compiler, and Win32 robustly supported in OCaml. For comparison, in its 15 years of development, Caveat
through either Cygwin or Visual C++. Compiling a large OCaml had at one point accumulated dependencies towards Perl, Python,
project on Windows+Cygwin is slow. This is probably caused one Bash and Zsh (in addition to C++, and in addition to OCaml, used
way or the other by the use of Unix compilation idioms (config- for journalizing and re-playing). Some of these dependencies have
uration script, makefile) on an OS where things are usually done since be removed, by replacing some tools with OCaml equivalents.
differently, and is not a limitation in this context. Whatever the main language, discipline is obviously the foremost
It should be noted that many OCaml developments are refer- factor in avoiding “dependency creep”.
enced in the source distribution GODI, with dependency lists and
automated compilation scripts. All of those Frama-C dependencies 3.5 Module system
that are written in OCaml are referenced in GODI, and Frama-C it-
self is. Some Frama-C users who have no interest in OCaml outside OCaml’s module system (Leroy 1996) has direct advantages: it
of Frama-C have found that this was the most convenient installa- creates separate namespaces and, when the modules are in separate
tion path for them. files and interfaces have been defined, fully type-checked separate
Frama-C has been tested under Windows, Mac OS X, Solaris compilation. It is easy to underestimate the importance of these
(32-bit), OpenBSD and Linux. Binaries are also distributed for features in the management of a big project because they make the
some of these platforms. compiler transparent, but when they are missing, their absence is
unpleasantly noticeable. We discuss these, and the (theoretically
64-bit readiness A 64-bit address space is available to OCaml more interesting) functor system per se.
programs for many 64-bit platforms. Frama-C compiles in both
32- and 64-bit mode, and the resulting versions are functionally Separate compilation With OCaml, in bytecode, separate com-
identical. This was not a big effort, as OCaml encourages to write pilation has the same meaning as everywhere: compilation is par-
high-level code, for instance by providing bignums. For some 64- allelizable and only modified files need to be recompiled, with a
bit-aware platforms (Mac OS X), it is a simple configure option quick final link phase. With native compilation, all the ancestors of
to choose between 32-bit or 64-bit pointers at the time of compiling the modified modules in the dependency graph must be recompiled,
OCaml. For others, it is troublesome to go against the default size and the compilation of two files with a parenthood relationship
(Linux). With Linux, getting an OCaml compiler with a word size can not be parallelized. Depending on the structure of an OCaml
different from the distribution default is akin to building a cross- project, recompilation after an incremental change in a low-level
compiler. However, efforts are under way in the OCaml community module may sometimes feel longish, but in truth, it is much faster
to improve support for cross-compilation, including from Linux to to recompile Frama-C with ocamlopt than to recompile Caveat
Win32. We are looking forward to the maturation of such initia- with g++.
tives. The existence of two OCaml compilers, one with blazingly fast
compilation in general, the other with acceptable recompilation
Availability of a graphical toolkit Frama-C uses the GTK + toolkit time and producing reasonably fast code, allows very short modify-
for its graphical user interface. This section does not discuss the recompile-test cycles. Again, it is easy to take short recompilation
merits or demerits of this toolkit with respect to others. The ques- times for granted but with other languages, when a software project
tion it tries to answer is “If I choose OCaml for a software project, grows in size, this can sometimes be lost, and sorely missed.
will I find one satisfactory toolkit to design the user interface The OCaml compiler tries very hard not to get in the way
with?”. The choice of GTK + for Frama-C was somewhat arbitrary, between a programmer and his program, and it does not force the
but it allows to give a positive answer to the question above, without programmer to write interfaces. However, if the interface m.mli
prejudice to other available toolkits. is missing for module M, the compiled interface is generated from
Our experience is that for some Unix variants (Solaris, Mac m.ml. This means that any change to m.ml changes the compiled
OS X, very old Linux distributions), it is necessary to obtain and interface and forces the recompilation of every module that uses M,
compile the missing GTK + libraries manually, or semi-manually even in bytecode. In a large project, modules should always have
3 The Caml Hump an is informal central repository for OCaml libraries 4 http://www.lri.fr/
~filliatr/yamlpp.en.html
283
interfaces, if only for the sake of separate compilation. OCaml has when a function takes several arguments of the same type with no
an option to generate automatically the interface m.mli that exports obvious normal order between them. The only language that we
everything from M. know of with a feature vaguely similar to OCaml’s labels is Objec-
tive C’s infix notation for function calls.
Separate namespaces for compilation units Orthogonally to sep- Syntax begets style. The “OCaml style” is to write pure func-
arate compilation, but as importantly for big projects, OCaml’s tions unless an exception needs to be made because the syntax re-
module system provides separate namespaces. Better yet, the com- wards the use of immutable definitions, as seen in the following:
piler option -pack allows to group several compilation units into let x = 2 in ... x ...
a namespace. As a consequence, compilation units that have been let x = ref 2 in ... !x ...
put into different packs may safely use the same name for a type, We argue that using labels rewards consistent naming schemes in a
variable or module. For instance the types tree in files both called similar fashion. When it is common for an argument to be passed
m.ml in packs lib1 and lib2 are seen as Lib1.M.tree and repeatedly as-is from caller to callee without any computations ac-
Lib2.M.tree. tually happening to it (and in a persistent setting as much of Frama-
This feature is very useful for libraries because libraries may use C is, this happens with a lot of arguments), the labels syntax re-
very common filenames (util.ml) with the guarantee that there wards the consistent choice of a unique label and eponymous vari-
will not be a clash at link-time for users of this library (on condition able name for this argument by a very concise syntax. In this ex-
that the pack name itself is unique). ample, the function f is being defined and calls functions g and
In Frama-C, plug-ins are independent from each other: each eval.
plug-in only interfaces with the Frama-C kernel, and does not see
the implementation details of other plug-ins. In order to implement let f ~mode ~env x y =
this separation, the Frama-C system automatically packs each plug- let context = ... in
in. Thus, two different plug-ins may use files with identical names ... g ~mode ~context (x+y) ...
and still be linked together within Frama-C. ... eval ~mode ~context ~env ...
Interfaces and functors The possibility to write functors (mod- If the programmer deviates from this style by using different la-
ules that are parameterized by other modules or functors), intro- bel names or variable names for mode, context, or env, he re-
duced before objects (at the time of Caml Special Light), has ceives a gentle slap on the wrist in the form of the awkward
proved a workable, and completely statically checked, alternative ~context:computation_context syntax. This changes the way
to object-oriented programming. We use OCaml objects only when of reading labels-enabled OCaml programs, too. The reader can
interfacing with existing OCaml code that uses objects (CIL and put more trust in the names of variables, without having to look for
lablgtk), and use functors for the rest. context all the time. The level of obtrusiveness of the label syntax
Some very structural idioms seem destined to be expressed is exactly the same as with the definition of mutable values, and it
with objects (or, for that matter, class types): equality, pretty- is exactly right, too. Good style is encouraged but the system can
printing, hashconsing or marshaling functions5 . Most of our data be circumvented when it needs to.
structure are complicated enough that automatically produced Optional arguments (a syntax for giving a labeled argument a
pretty-printers or equality functions would not fit the bill. Con- default value if it is omitted) are convenient when the consequences
sequently, it is in our case neither more nor less tedious to write of the omission (and subsequent use of the default value) are visible
modules (and interfaces) that sport, in addition to a type t, func- and traceable (for instance, to provide a toolkit interface that is both
tions such as pretty: Format.formatter -> t -> unit and powerful and beginner-friendly). It is in general a bad idea to use an
equal: t -> t -> bool, and for unmarshaling values of a hash- optional argument to add a new mode to an existing function, both
consed type, rehash: t -> t. But, speaking of equality, it should because of all the existing calls to this function — that the compiler
be noted on the other hand that OCaml’s polymorphic comparison would be glad to help the programmer inspect if s/he did not use an
functions (including =, >= and even ==) are dangerous pitfalls. The optional argument — and because of all the calls to be written in
type-checker does not complain when they are applied wrongly the future where the optional argument will be omitted by accident.
instead of, for instance, equal above.
In OCaml, the module system allows to encapsulate the defini- 4. Development of Frama-C
tions of data structures, and in particular to give a purely functional There are a number of features in Frama-C’s architecture that any
interface to a sophisticated data structure that uses mutable values Frama-C developer must be aware of. The goal of this section is not
internally for optimization. With this compromise, the amount of to provide a complete list — which can be found in the Frama-C
stateful information that the programmer has to keep in mind is plug-in development guide (Signoles 2008) — but to give a mildly
limited by the module boundaries, and the implementation’s algo- technical overview of each interesting one, with reference to the
rithmic complexity may be better than that of all known pure im- OCaml feature(s) that make its implementation possible.
plementations. Some in the OCaml community call such an impure
module “persistent” (Conchon and Filliâtre 2007). In fact, some 4.1 Software architecture
positive reports on the industrial use of Haskell (Nanavati 2008)
The software architecture of Frama-C is plug-in-oriented. This ar-
resonate deeply with our own programming experience, except that
chitecture allows fine-grained collaboration of analysis techniques
we attribute to OCaml’s module system the advantages attributed
(as opposed to the large-grain collaboration that happens when one
there to Haskell’s purity.
technique is used in a first pass and another in a second pass). As
3.6 Labels and optional arguments a consequence, mutual recursion between plug-ins must be possi-
ble: a plug-in A must be able to use a plug-in B that uses A. A
OCaml allows to use labels for function arguments. This feature Frama-C plug-in is a packed compilation unit (see Section 3.5) but,
does not make anything possible that was not already, but in prac- unfortunately, OCaml does not support mutually-recursive compi-
tice, labels provide a concise way to remove the risk of confusion lation units. This problem is circumvented pragmatically by using
references to functions placed in a module that all plug-ins are al-
5 In
the presence of hashconsing, not only do you have to write your own lowed to depend on.
unmarshaling functions, but they are extremely tricky to get right
284
This “central directory” module is called db.ml and a snippet 4.4 Multi-project framework
of it may look like: Frama-C is able to handle several ASTs simultaneously. This al-
/* db.ml: kernel database of plug-in stubs */ lows to build slicing plug-ins where an original AST is navigated
module Plugin1: sig val run: (unit->unit) ref end through while a reduced AST is being built. Each of these ASTs
module Plugin2: sig ... end has its own state (containing for instance the results of the analy-
ses that have been run on this AST). The AST and corresponding
During its initialization, a plug-in registers each of its exported state form what is called a project (Signoles 2009). The desirable
functions in the appropriate stub in Db — another OCaml feature is “safety property” of projects is the absence of interference between
that each compilation unit can have its own initialization effects. two distinct projects. To enforce this property, each global mutable
/* plugin1_register.ml */ value of Frama-C must be “projectified”. A set of functors are pro-
let run () = ... (* the analysis goes here *) vided to this effect (these functors add a project-aware indirection
to any mutable data that is used by any of the functions made visible
(* registration of [run] in the kernel *) by the plug-in). We wish OCaml’s type system helped us enforce
let () = Db.Plugin1.run := run this rule, but we plan to move to dynamic tags to detect at least
at analysis-time when a variable that should have been projectified
Thought this solution is the most common way to break mu- wasn’t.
tual recursion between compilation units, it has trade-offs. Firstly,
polymorphic functions may not be registered this way. This is not 4.5 Journalization
an issue here: each plug-in is a static analyzer, and none of the
analyzers we have written so far wanted to provide polymorphic During an interactive session, Frama-C journalizes most of the ac-
functions. Secondly, the types of the registered functions have to tions which modified its global state. This means that, like Caveat,
be known by the Frama-C kernel. Here again, that is not a big issue it generates an OCaml script retracing what happened during the
in our context, especially because Frama-C encourages the use of session. The journal may be compiled and statically or dynami-
ACSL (Baudin et al. 2008), a common specification language, as cally linked with the Frama-C kernel in order to replay the same
the lingua franca to transmit knowledge between plug-ins. Finally, actions. Furthermore, the journal can be used to grasp Frama-C’s
the most significant trade-off with this solution is that any plug-in internals (translating GUI actions into function calls), and it is pos-
that wishes to provide an interface to other plug-ins (as opposed to sible to modify it before compiling and replaying it. As for dynamic
interacting with the user only) needs to modify some well-identified loading, phantom types allow to safely implement this feature (see
parts of the Frama-C kernel. This has not been a problem so far Section 4.6).
because, for now, plug-ins written outside Frama-C’s development 4.6 Phantom types for dynamic typing in a static setting
team have all been dedicated to answering a specific problem, as
opposed to providing computations to help other plug-ins. Both dynamic loading and journalization rely on phantom types
(Rhiger 2003). Phantom types — parameterized types which em-
4.2 Dynamic loading of plug-ins ploy their type variables for encoding meta-information — are used
OCaml has allowed dynamic linking of bytecode compilation units in both cases to ensure the dynamic safety of function calls which
(through module Dynlink) for a long time. In OCaml 3.11, dy- cannot be checked by the OCaml type system. Indeed we provide
namic linking of native code became available for a large number a library of dynamic typing. Its implementation requires the use
of target architectures. of unsafe features (through OCaml standard library’s module Obj)
Frama-C uses dynamic linking where available in order to pro- but phantom types allow to provide a safe interface: the use of the
vide dynamic loading of plug-ins. This is an alternative way to plug library cannot break type safety (as long as there is no implemen-
analyzers into the Frama-C kernel. When dynamic linking is used tation error in the library).
for the plugging, the plug-in’s functions are registered in a global
table in the kernel at load-time. Because all functions do not have 5. Conclusion
the same ML type, phantom types (Rhiger 2003) are used in or- We have not yet considered the point of view of the external Frama-
der to dynamically ensure the program’s safety (see Section 4.6). C plug-in developer. We hope to see in the future many useful plug-
Dynamic linking solves two out of three issues of static linking: it ins written outside the circle of the initial developers. It is too early
ceases to be necessary for the kernel to be aware of the types of all to draw conclusions on the consequences of the choice of OCaml as
plug-ins’ exported functions, and it becomes more convenient to the platform’s language for this goal. Responses so far have ranged
distribute a plug-in separately from Frama-C (in particular a plug- from the enthusiastic (“and it’s even written in OCaml”) to the
in no longer needs to patch the kernel). rejection (“[...]drawback that the extensions have to be written in
Ocaml[sic]”), with in the middle at least one person who decided
4.3 Impure functional programming
to learn OCaml because there was something s/he wanted to do
Most analyses in Frama-C are written in a functional style. How- with Frama-C.
ever Frama-C’s value analysis (whose results are used by many
other plug-ins) relies on hashconsing (Filliâtre and Conchon 2006) Acknowledgments
and memoization, which are both implemented with mutable data
structures. More generally, Frama-C makes use of imperative fea- We would like to acknowledge the help of our colleagues at ProVal,
tures in order to improve efficiency. For instance, the abstract syn- at INRIA Sophia Antipolis’ projects Everest and now Marelle,
tax tree (inherited from CIL) contains many mutable fields. Besides, and at CEA LIST, in the building of Frama-C. The feedback of
Frama-C has a global state which is composed of many global ta- users of Frama-C within the CAT project, at Fraunhofer FIRST
bles. or elsewhere has been great. The anonymous referees suggested
various improvements to this experience report. Special thanks go
to the developers of the OCaml system.
Keywords OCaml, software architecture, plug-ins, static analysis
285
References Jean-Christophe Filliâtre and Sylvain Conchon. Type-safe modular hash-
consing. In ML ’06: Proceedings of the 2006 workshop on ML, pages
Patrick Baudin, Anne Pacalet, Jacques Raguideau, Dominique Schoen, and
Nicky Williams. Caveat: a tool for software validation. In Dependable 12–19, New York, NY, USA, 2006. ACM. ISBN 1-59593-483-9.
Systems and Networks, 2002, pages 537+, 2002. Xavier Leroy. A syntactic theory of type generativity and sharing. Journal
Patrick Baudin, Jean-Christophe Filliâtre, Thierry Hubert, Claude Marché, of Functional Programming, 6:1–32, 1996.
Benjamin Monate, Yannick Moy, and Virgile Prevosto. ACSL: ANSI C Yaron Minsky. Caml trading: Experiences in functional programming on
Specification Language (preliminary design V1.4), preliminary edition, Wall Street. In Wouter Swierstra, editor, The Monad.Reader, April 2007.
October 2008. URL http://frama-c.cea.fr/acsl.html. Benjamin Monate and Julien Signoles. Slicing for security of code. In Peter
Géraud Canet, Pascal Cuoq, and Benjamin Monate. A value analysis for C Lipp, Ahmad-Reza Sadeghi, and Klaus-Michael Koch, editors, TRUST,
programs, 2009. To appear in the proceedings of SCAM2009. volume 4968 of Lecture Notes in Computer Science, pages 133–142.
Sylvain Conchon and Jean-Christophe Filliâtre. A persistent union-find data Springer-Verlags, March 2008.
structure. In ML ’07: Proceedings of the 2007 workshop on Workshop Ravi Nanavati. Experience report: a pure shirt fits. SIGPLAN Not., 43(9):
on ML, pages 37–46, New York, NY, USA, 2007. ACM. ISBN 978-1- 347–352, 2008. ISSN 0362-1340.
59593-676-9. doi: http://doi.acm.org/10.1145/1292535.1292541. George C. Necula, Scott Mcpeak, Shree P. Rahul, and Westley Weimer.
Sylvain Conchon, Jean-Christophe Filliâtre, and Julien Signoles. Designing CIL: Intermediate language and tools for analysis and transformation
a generic graph library using ML functors. In Marco T. Morazán, editor, of C programs. In International Conference on Compiler Construction,
Trends in Functional Programming, volume 8 of Trends in Functional pages 213–228, 2002.
Programming, pages 124–140. Intellect, UK/The University of Chicago Famantanantsoa Randimbivololona, Jean Souyris, Patrick Baudin, Anne
Press, USA, 2008. ISBN 978-1-84150-196-3. Pacalet, Jacques Raguideau, and Dominique Schoen. Applying formal
Pascal Cuoq. Documentation of Frama-C’s value analysis plug- proof techniques to avionics software: A pragmatic approach. In FM
in, 2008. URL http://frama-c.cea.fr/download/ ’99: Proceedings of the Wold Congress on Formal Methods in the De-
frama-c-manual-Lithium-en.pdf. velopment of Computing Systems-Volume II, pages 1798–1815, London,
Pascal Cuoq and Damien Doligez. Hashconsing in an incrementally UK, 1999. Springer-Verlag. ISBN 3-540-66588-9.
garbage-collected system: a story of weak pointers and hashconsing in Morten Rhiger. A foundation for embedded languages. ACM Transactions
OCaml 3.10.2. In ML ’08: Proceedings of the 2008 ACM SIGPLAN on Programming Languages and Systems (TOPLAS), 25(3):291–315,
workshop on ML, pages 13–22, New York, NY, USA, 2008. ACM. ISBN 2003. ISSN 0164-0925.
978-1-60558-062-3. Julien Signoles. Plug-in development guide, 2008. URL http://
David Delmas, Stéphane Duprat, Patrick Baudin, and Benjamin Monate. frama-c.cea.fr/download/plug-in_development_guide.pdf.
Proving temporal properties at code level for basic operators of con- Julien Signoles. Foncteurs impératifs et composés: la notion de projets
trol/command programs. In 4th European Congress on Embedded Real dans Frama-C. In Actes des Journées Francophones des Langages
Time Software, 2008. Applicatifs, pages 37–54, January 2009. In French.
286