BedRock Systems Ebook - Developing With Formal Methods

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

(c) 2022 BedRock Systems, Inc.

This paper incorporates material from:


G. Malecha, G. Stewart, F. Farka, J. Haag and Y. Hirai, “Developing
With Formal Methods at BedRock Systems, Inc.,” in IEEE Security
& Privacy, vol. 20, no. 3, pp. 33-42, May-June 2022, doi: 10.1109/
MSEC.2022.3158196.

URL:https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnum-
ber=9760701&isnumber=9782817
The BedRock HyperVisorTM is a commercial, highly concurrent, verified virtualiza-
tion platform that employs formal methods to enable proofs of complex, lock-free
concurrent code; support automating proofs of large programs; and integrate with
“informal” parts of the software lifecycle.

Building on academic research but with feet firmly ware engineering best practices. Further, our ex-
planted in industrial applications, BedRock SystemsTM periences suggest that FM techniques are increas-
is in the process of building the BedRock HyperVisor ingly able to directly address (and in some cases,
(BHVTM), a formally verified, highly concurrent, mi- improve upon) current best practices in software
crokernel-based commercial hypervisor. By for- engineering.
mally verified, we mean that the C++ and assem-
bly-code implementation of the operating system Despite the alignment of aims, the road is not al-
is proved correct in the Coq proof assistant.1 By ways an easy one. Pioneering FM at scale means
highly concurrent, we mean that we use, and veri- that we must build many tools and libraries our-
fy, lock-free data structures in core parts of the im- selves. Although FM tools still lag behind more
plementation. That the BHV is microkernel based mainstream tools, we believe they have matured
means that most of the system runs in user mode to the point of being usable in an industrial con-
on top of a small, kernel-level program, which pro- text. Further, development and adoption of these
vides bare-bones abstractions such as address tools is growing, and we anticipate the situation
spaces and interprocess communication (IPC). will continue to improve.

To verify the BHV, BedRock Systems is pioneer- Beyond the tools themselves, writing verified soft-
ing the use of formal methods (FM) at scale. This ware, even in mainstream languages, still suffers
experience report explains how we use FM and from a dearth of libraries. One of the main benefits
why. Our experience shows that advances in FM of mature ecosystems such as C++ is the availabil-
techniques finally enable them to integrate well ity of libraries, but very few of these libraries are
in the standard software development process. In formally specified, let alone verified. Our own work
essence, FM-based software development is “just” has already begun to address this problem inter-
mathematically rigorous software engineering. nally, and we believe that it is just a matter of time
Additionally, the design aims of FM align with soft- before developers are able to use libraries that are
formally verified.

bedrocksystems.com 2
bedrocksystems.com 3
Figure 1. The BHV. Top-level applications
such as the Virtual Machine Monitor
(VMM) and virtual switch (vSwitch) sit
atop the master controller. The master
controller provides support for userland
services, which are used by independent
programs to provide application-specif-
ic services. The NOVA microhypervisor
provides minimal primitives that enable
these features. The Active Security mod-
ule enforces security policies on guests
at runtime. UMX is the system console
multiplexer.

bedrocksystems.com 4
An Extensible, Distrusting Platform
The BHV supports running unverified applications side by side with verified ones, without compromising the in-
tegrity of the verified applications. Without this requirement, we could verify a weaker specification that requires
a disciplined use of the API. However, these weaker specifications are insufficient when adversarial code might
be running on the platform. To address this issue, the BHV’s top-level specification uses an operational semantics
based on process calculi in which untrusted processes, such as guests, are modeled by their machine-level behav-
ior. Although verbose in some cases, operational semantics enables us to model untrusted code as simply “what
the bits say.”

Supporting unverified applications is crucial in practice because it enables a path to verified systems rather than
mandating an all-or-nothing mentality. This enables both “preview” releases, which may contain unverified func-
tionality, and the ability to support customer applications, which have not been verified.

bedrocksystems.com 5
BRiCk—A Program Logic for C++
To verify C++ programs, we need formal reasoning principles for the language fragment that the programs use. These
rules are codified in BRiCk,5 the BedRock C++ program logic. BRiCk builds off of the Clang/LLVM6 compiler front end
and uses its source-level abstract syntax to represent C++ programs.

BRiCk axiomatizes reasoning principles for each type of node. We justify these reasoning principles informally by ap-
pealing to the C++ standard7 and academic work-formalizing aspects of both C8 and C++.9 The choice to formalize C++
axiomatically, rather than operationally, is primarily a pragmatic one: it enables us to easily underspecify language
constructs, grow the supported feature set over time and, we believe, accurately model the standard.

Automation for BRiCk


Automation is crucial to scaling program verification to large and evolving code bases. BedRock System’s automation
for BRiCk is built around the mental model of symbolic debugging, where the current state is expressed formally in
separation logic and the core automation interprets program fragments against this state. To be understandable, the
automation must preserve program-specific abstractions as much as possible. Reaching into a complex invariant to
justify a read may enable the verification to make progress, but the resulting state is often incomprehensible to clients
who wish to remain insulated from the definition of the invariant. To achieve this, library developers write, and prove,
“hints” encapsulating common reasoning patterns that are not immediately obvious from the code. These hints cover
coding patterns sanctioned by the library writer and are applied automatically when clients follow these guidelines.
Deviating from these coding patterns leads the automation to get stuck, but in an understandable state that facilitates
debugging.

BedRock System’s automation also provides more manual tactics for reasoning about language constructs that de-
serve special attention. For example, loop invariants are notoriously difficult to find automatically, so we provide
tactics to specify loop invariants manually. Our collection of tactics also includes some more specialized tactics for
reasoning about common coding patterns such as cas-loops and foreach-loops. Beyond their general usefulness,
these tactics also document reusable reasoning patterns. We have also experimented with tactics that apply more ag-
gressive heuristics. However, we tend to prefer slightly more verbose (but maintainable) proof scripts as these more
aggressive tactics can sometimes fail in unpredictable ways.

bedrocksystems.com 6
bedrocksystems.com 7
bedrocksystems.com 8
Once we have covered the core functionality of the Although the previous two bugs were found during
class, we formalize the definitions. Generally, this pro- the FM review phase, in other instances, our reviews
cess is rather straightforward, but it relies on good missed subtle bugs that were ultimately uncovered as
working knowledge of the verification tool (Coq, in our we formalized the proof. In the UMX, we uncovered a
case). At this level, we are choosing specific data repre- synchronization issue that would occur if a client dis-
sentations, such as whether to use a list or a finite map, connected at precisely the right time during data for-
how to express the relationship between an array in warding. Ultimately, this bug could cause data loss, but
C++ and its length, and so on. Many of these problems reliably triggering it in a testing scenario would be ex-
can be solved prescriptively, e.g., “always use arrayR to tremely difficult.
represent an array”; however, we do not claim to have
all of right answers yet. But as we verify more code, Beyond logical bugs, FM code reviews and proofs un-
we refine patterns and create new ones. Beyond pro- covered portability and standards compliance issues
viding a codified best practice, patterns such as these within our code. Portability bugs often arise in code
also enable us to narrow the focus of the automation’s that implements low-level data marshaling and might
development. rely, e.g., on the endianness of the system or the ability
to perform unaligned reads and writes. Although strict
When code does not fit within our existing abstrac- adherence to the C++ standard may seem overly pe-
tions, we look to expand them, develop new ones, or dantic, we believe that it is the only viable path forward
rework the code to fit within them. Although rewriting in the long term. Optimizing compilers crucially rely on
code may seem to indicate that our techniques are undefined behavior to enable optimizations, and non-
not up to the challenge, we note that developers often compliant code can result in bugs at higher optimiza-
prefer simplified code, and on many occasions, very tion levels that are difficult to track down because they
subtle bugs have been found around these points. do not exist in debug builds. The C++ standard is the
Mathematically, separation logic can scale to arbitrari- contract between developers and compiler writers; if
ly complex code, but keeping reasoning simple is often developers need something that the standard does
the better path in the long term. When proposing code not provide, the standard needs to be expanded to
changes, we always consider runtime costs, readabili- provide it.
ty considerations, and limitations (or enablements) of
the new code.

Bugs
Throughout verification we found and fixed a number
of bugs across all parts of our stack. Although many
bugs are found during testing, we have determined
that concurrency bugs, resource leaks, and error-han-
dling logic are especially difficult to test and are there-
fore often caught by code reviews or during proofs.
For example, in developing our shared-pointer library,
we ran a significant number of randomized tests with-
out uncovering several bugs caught during the FM
code review. Another instance of a subtle logic bug
was a synchronization issue within NOVA, which could
cause incorrect continuation to be used when switch-
ing threads (execution contexts in NOVA). In these in-
stances, and many others involving concurrency, we
found that state-based reasoning, which focuses on
what is true in a particular state, is more useful than
trace-based reasoning, which describes the operations
that took place to arrive at a given state, for zeroing in
on problems.

bedrocksystems.com 9
Proof Maintenance. Keeping proofs in sync with code is from some of the more technical details of specifica-
essential to maintaining high quality through refactor- tions and code.
ings. At BedRock Systems, our continuous integration
(CI) checks that all proofs succeed before any merge We contrast this semiautomatic reasoning with more
to the main branch. Overall, we have not found this to manual reasoning traditionally provided by interactive
be particularly burdensome as well-designed verified theorem provers and the Iris Proof Mode (IPM).11 The
code tends to be fairly stable. When code changes are IPM provides fine-grained context management and
small (and correct), our automation is often able to dis- low-level primitive tactics for reasoning about sep-
charge new obligations automatically, and no changes aration logic formulas. This sort of reasoning is ideal
to the proof scripts are necessary. Inevitably though, for subtle proofs that require tricky resource man-
more complex changes, especially those that affect agement; our metatheory leverages this verification
class invariants and concurrency protocols, require approach extensively. When verifying C++ programs,
manual proof maintenance. Robust proof automation however, we find that large parts of the proof are “fol-
and appropriate abstractions mitigate the burden to low-your-nose” proofs. Indeed, in many instances, the
some degree, but do not scale to all interface- and program is effectively the proof, and the proof is mere-
specification-level refactorings. In these situations, the ly bookkeeping. In these circumstances, it is ideal to
cost is unavoidable: significant algorithmic changes teach the automation to follow its own nose so that
will require fundamentally different proofs. the verification engineer can focus on subtle aspects
of the verification.
Beyond guaranteeing bug freedom, one benefit of
proofs over testing is that they are hyperlocal. Proof We offer two instances where custom, but reusable,
failures tell you exactly the point where the code may hints accelerated proof development. The first arose
be broken as well as giving you the symbolic state that when verifying higher-level specifications on top of
is problematic. This information is especially useful in lower-level ones for the microkernel. When verifying
tracking down concurrency “Heisenbugs” that occur the microkernel, we need to provide specifications
infrequently and are therefore difficult to test and re- that are maximally distrusting of applications running
produce. atop it. We achieve this by providing low-level, “undis-
ciplined” specifications that can support arbitrary, es-
When working on improving automation, checking pecially concurrent, usage. In practice, however, these
proofs in CI is crucial as proofs of code double as test specifications can be both difficult to read and program
cases for automation. Timing statistics from CI runs against. On top of these specifications, we can prove
provide useful information around automation perfor- simpler specifications that are able to hide details
mance. Line-count statistics of new proofs are a good when using the API in a more restricted setting. As a
first-order signal of the effectiveness of the automa- simple example, if we know that a capability must refer
tion because verbose proofs often point to shortcom- to a semaphore object, then we do not need to consid-
ings in automation, although it is important to factor in er error codes from the microkernel that correspond
complexity of the underlying code as well. to capability mismatches. Proving the well-behaved
specifications from the unsafe ones can be onerous,
Extensible Automation but is generally not complicated; however, the tedium
The need to understand and maintain proofs requires of this task can be alleviated by a small number of ge-
that we express them at a high level. Making this pos- neric insights, which are easily expressed within our
sible for larger C++ programs that evolve over time hint infrastructure. Using these hints, we reduced the
requires that we keep the proofs small by automating size of some proofs by more than half, which great-
the administrative reasoning necessary to complete a ly increased the readability and maintainability of the
proof. To facilitate high-level reasoning, our automa- proofs as the specifications evolved. Ultimately, the
tion is post-facto extensible using stylized reasoning proofs became fairly close to the proof outlines written
principles that we call hints. Hints are justified once and by experts because the automation was extended with
applied auto-matically by automation whenever the sit- the expert’s strategy for reasoning.
uation merits it. These hints enable us to reason at a
natural level of abstraction while also insulating clients

bedrocksystems.com 10
A second instance where hints were able to abstract virtual functions in C++.
low-level details arose when working with arrays, and In theory, we could desugar these to tables of function
especially with array initialization and destruction. For pointers, but doing so (even abstractly) would expose
modularity purposes, BRiCk’s semantics describes ar- reasoning principles not justified by the C++ standard.
ray initialization compositionally by initializing each Further, desugaring the construct would require all de-
array cell sequentially. Although this provides a clear velopers using the construct to reason about the de-
specification, using this approach becomes quite costly sugaring, something which is clearly undesirable. Sup-
when working with large arrays. Providing special hints porting the feature directly not only keeps us closer to
for default-initializing primitive arrays enabled us to the standard but also enables us to build opinionated
fuse many reasoning steps together, resulting in more abstractions and automation for the use cases of vir-
natural descriptions of the program state. Our hints tual functions.
can also codify patterns for reasoning about array ac-
cesses in a natural way by decomposing (and recom- Supporting Sophisticated Language Features. Industri-
posing) a large array into locations of interest and the al languages also have features that are difficult to rea-
rest of the array. These sorts of “borrows” constitute son about. The archetypical difficult language feature
a large part of the administrative reasoning necessary in C++ is the object model, which is front and center
in low-level C++ programs, and our ability to express in C++ semantics. Keep in mind that the concurrent
these borrowing patterns generically generally means memory model is another cross-cutting feature, albeit
that the automation can churn through array reason- one more limited in scope, because regular C++ vari-
ing with relatively little manual intervention. able accesses must be data-race free. Although devel-
opers often think of C++ pointers as virtual addresses,
Industrial Programming Languages the C++ language puts significantly more structure on
One of the largest sources of complexity in verification them. This additional structure gives optimizers the
is not the code we write but the language in which we ability to reason more aggressively about programs
write it. C++, and modern programming languages but comes at the cost of more bookkeeping in formal
in general, are both large and complex and provide proofs. For example, our semantics tracks pointer
formal reasoning challenges in and of themselves. Al- provenance to rule out undefined behavior that arises
though necessary to address, we note that these chal- from low-level pointer manipulation.
lenges arise on a per-language rather than a per-pro-
gram basis, so most FM practitioners need not worry Although not pervasive, the interoperation between
about these issues. Further, BRiCk already addresses C++ and assembly is another necessary part of low-lev-
these issues for the fragment of C++ that it supports. el programming. Beyond just giving a semantics for
assembly, we are forced to answer questions such as,
Supporting Large Languages. The size of modern lan- “What is the effect of sharing memory between multi-
guages means that we must formalize them incremen- ple programs?” Empirically, we know what compilers
tally. To facilitate this, BRiCk’s semantics is directly ex- do, but the standard text is silent on many of these
pressed as a program logic, rather than as a derived low-level questions.
logic on top of an underlying operational semantics.
This approach allows us to leverage the built-in modu- To avoid these issues in the short term, we make ju-
larity of separation logic to modularize our semantics. dicious, simplifying assumptions that seem to hold
It also makes it natural to underspecify the semantics in practice. For example, BRiCk assumes that deallo-
of particular language constructs, a property that is cating memory does not invalidate the pointer, an as-
essential early on and still useful when working with sumption not sanctioned by the standard. This choice
multiple related languages, e.g., C++14 and C++17. requires that we add liveness side conditions to cer-
tain operations to avoid obvious unsoundnesses. Re-
Although many language features can be desugared to searchers suggest12 that this seems necessary (in C) to
simpler primitives, we avoid this when possible. This support common, low-level programming idioms and
is partly for soundness as some transformations only is therefore likely justified by compilers in practice.
refine the high-level specification; however, there are
also reasons to support the sugar natively. Consider

bedrocksystems.com 11
bedrocksystems.com 12
Hiring and Training for FM
Spike-based formal verification improves onboarding, but hiring for FM is still difficult. There are few candidates
with general FM expertise and even fewer with expertise in specialized areas such as concurrent separation logic.
The universities teaching FM to undergraduates help mitigate the gap, but we have found that teaching FM “on the
job” is necessary. Even Ph.D.s with deep experience in separation logic require training to transition from academic
FM (which often focus on smaller programs and meta-level issues) to industrial program verification. Over time,
this transition period at BedRock Systems has shortened, and we believe that good training material and exposure
through spike-based verification will further reduce the overhead.

When hiring general software engineers, we have found that candidates with functional programming background
are able to pick up FM much more readily than those without such exposure. In part, this is attributable to the
fact that Coq is built around a functional programming language (Gallina), but we believe that it is more than that.
Functional programming languages tend to focus on minimalism, which seems to train developers to more quickly
separate the core problem from the noise surrounding it. This skill is transferable not only to specification writing
but also to the design of good interfaces.

Figure 2. The lifecycle of a


specification (spec). We veri-
fy code in spikes that address
both users and implementers
of specified code, iteratively
refining specs until they are
both realizable and useful. Lat-
er clients benefit from usabili-
ty improvements that occur in
previous verification cycles.

Incorporating FM into all aspects of the software de- bases means building automatable, but also highly
velopment lifecycle, from software system design to expressive, program logics. BedRock System’s auto-
implementation to code maintenance, has the poten- mation for C++ supports a hybrid of highly automat-
tial to revolutionize the software industry. But making ed reasoning when possible and deliberate reasoning
pervasive FM a reality requires solving deep technical when necessary. The need for both is essential as pro-
and nontechnical challenges, many of which we have grams grow not only in size but also complexity.
begun to address at BedRock Systems.
On the nontechnical side, it is crucial to expand profi-
On the technical side, we see refining language stan- ciency in FM across the board. Specialized FM experts
dards and improving automation as crucial barriers will still be necessary, but we believe that much of the
that are beginning to fall. Reasoning about industrial knowledge required for verification could be made
languages such as C++ is necessary but raises difficult accessible to undergraduates. Mainstream interest
problems in semantics, especially around complicated in Rust and the growing popularity of functional pro-
corners of standards. This is an active area of research, gramming languages both suggest that attitudes to-
and increasingly, standards committees (especially the ward new technologies are changing in a positive way
C standard committee) are seeing the value in it. Engi- for FM. In the meantime, on-the-job training can com-
neering verification to scale to complex industrial code pensate for a lack of general knowledge.

bedrocksystems.com 13
1. Y. Bertot and P. Castran, Interactive Theorem Prov- Gregory Malecha is the director of formal methods at
ing and Program Development: Coq’Art the Calcu- BedRock Systems, Inc., San Mateo, California, 94401,
lus of Inductive Constructions. New York, NY, USA: USA. His research interests include formal verification,
Springer-Verlag, 2010. automation, and programming languages. Malecha
2. M. S. Tsirkin and C. Huck, “Virtual I/O Device (VIR- received a Ph.D. in computer science from Harvard
TIO),” OASIS Virtual I/O Device (VIRTIO) Techni- University. Contact him at gregory@bed-rocksystems.
cal Committee, Version 1.1, [Online]. Avail-able: com.
https://docs.oasis-open.org/virtio/virtio/v1.1/vir-
tio-v1.1.html Gordon Stewart is a formal methods lead at BedRock
3. J. Reynolds, “Separation logic: A logic for shared Systems, Inc., San Mateo, California, 94401, USA. His
mutable data structures,” in Proc. 17th Annu. IEEE research interests include formal verification and com-
Symp. Logic Comput. Sci., 2002, pp. 55–74, doi: piler correctness. Stewart received a Ph.D. in computer
10.1109/LICS.2002.1029817. science from Princeton University. Contact him at
4. R. Jung et al., “Iris: Monoids and invariants as an gordon@bedrocksystems.com.
orthogonal basis for concurrent reasoning,” ACM
SIG-PLAN Notices, vol. 50, no. 1, pp. 637–650, 2015, František Farka is a senior formal methods engineer
doi: 10.1145/2775051.2676980. at BedRock Systems, Inc., San Mateo, California, 94401,
5. “BedRock systems/BRiCk.” GitHub. https://github. USA. His research interests include logic in computer
com/bedrocksystems/BRiCk (Accessed: Nov. 30, science, type theory, and proof search. Farka received
2021). C. a Ph.D. in computer science from the University of St
6. Lattner and V. Adve, “LLVM: A compilation frame- Andrews and Heriot-Watt University. Contact him at
work for lifelong program analysis and transfor- frantisek@bedrocksys tems.com.
mation,” in Proc. Int. Symp. Code Generation Opti-
mization, San Jose, CA, USA, Mar. 2004, pp. 75–86, Jasper Haag is a formal methods engineer at BedRock
doi: 10.1109/CGO.2004.1281665. Systems, Inc., San Mateo, California, 94401, USA. His
7. Information Technology—Programming Languag- research interests include formal verification. Haag re-
es—C++, ISO/IEC 14882: 2011. ceived a B.S. in computer science from the Massachu-
8. R. J. Krebbers, “The C standard formalized in Coq,” setts Institute of Technology. Contact him at
Ph.D. dissertation, Radboud University Nijmegen, jasper@bedrocksystems.com.
Nijmegen, The Netherlands, 2015.
9. T. Ramananandro, G. Dos Reis, and X. Leroy, “For- Yoichi Hirai is a senior software engineer BedRock
mal verification of object layout for C++ multiple Systems, Inc., San Mateo, California, 94401, USA. His
inheritance,” SIGPLAN Notices, vol. 46, no. 1, pp. research interests include modal logics for knowledge
67–80, Jan. 2011, doi: 10.1145/1925844.1926395. and concurrency. Hirai received a Ph.D. in computer
10. U. Steinberg and B. Kauer, “Nova: A microhyper- science from the University of Tokyo. Contact him at
visor- based secure virtualization architecture,” yoichi@bedrocksystems.com.
in Proc. 5th Eur. Conf. Comput. Syst., Association
for Computing Machinery, 2010, pp. 209–222, doi:
10.1145/1755913.1755935. (c) 2022 BedRock Systems, Inc.
11. R. Krebbers, A. Timany, and L. Birkedal, “Interactive
This paper incorporates material from:
proofs in higher-order concurrent separation log- G. Malecha, G. Stewart, F. Farka, J. Haag and Y. Hirai, “Developing
ic,” SIG-PLAN Notices, vol. 52, no. 1, pp. 205–217, With Formal Methods at BedRock Systems, Inc.,” in IEEE Security
& Privacy, vol. 20, no. 3, pp. 33-42, May-June 2022, doi: 10.1109/
Jan. 2017, doi: 10.1145/3093333.3009855.
MSEC.2022.3158196.
12. P. E. McKenney et al., Pointer Lifetime-End Zap,
ISO/IEC JTC1/SC22/WG21 P1726R0, 2019. [Online]. URL:https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnum-
ber=9760701&isnumber=9782817
Avail-able: http://www.open-std.org/jtc1/sc22/
wg21/docs/papers/2019/p1726r0.pdf

bedrocksystems.com 14

You might also like