

The Art, Science, and Engineering of Fuzzing: A Survey

Valentin J. M. Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo

• V. J. M. Manès is with the KAIST Cyber Security Research Center, Korea. E-mail: valentin.manes@kaist.ac.kr.
• H. Han and S. K. Cha are with KAIST, Korea. E-mail: hyungseok.han@kaist.ac.kr and sangkilc@kaist.ac.kr.
• C. Han is with Naver Corp., Korea. E-mail: cwhan.tunz@navercorp.com.
• M. Egele is with Boston University. E-mail: megele@bu.edu.
• E. J. Schwartz is with SEI, Carnegie Mellon University. E-mail: edmcman@cmu.edu.
• M. Woo is with Carnegie Mellon University. E-mail: pooh@cmu.edu.

Corresponding author: Sang Kil Cha. Manuscript submitted on April 8, 2019.

arXiv:1812.00140v4 [cs.CR] 8 Apr 2019

Abstract—Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective.

Index Terms—software security, automated software testing, fuzzing.

1 INTRODUCTION

Ever since its introduction in the early 1990s [152], fuzzing has remained one of the most widely-deployed techniques to discover software security vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. In practice, attackers routinely deploy fuzzing in scenarios such as exploit generation and penetration testing [21], [109]; several teams in the 2016 DARPA Cyber Grand Challenge (CGC) also employed fuzzing in their cyber reasoning systems [37], [200], [9], [93]. Fueled by these activities, defenders have started to use fuzzing in an attempt to discover vulnerabilities before attackers do. For example, prominent vendors such as Adobe [1], Cisco [2], Google [61], [5], [15], and Microsoft [38], [8] all employ fuzzing as part of their secure development practices. More recently, security auditors [237] and open-source developers [4] have also started to use fuzzing to gauge the security of commodity software packages and provide some suitable forms of assurance to end-users.

The fuzzing community is extremely vibrant. As of this writing, GitHub alone hosts over a thousand public repositories related to fuzzing [86]. And as we will demonstrate, the literature also contains a large number of fuzzers (see Figure 1 on p. 5), and an increasing number of fuzzing studies appear at major security conferences (e.g., [225], [52], [37], [176], [83], [239]). In addition, the blogosphere is filled with many success stories of fuzzing, some of which also contain what we consider to be gems that warrant a permanent place in the literature.

Unfortunately, this surge of work in fuzzing by researchers and practitioners alike also bears a warning sign of impeded progress. For example, the descriptions of some fuzzers do not go much beyond their source code and manual page. As such, it is easy to lose track of the design decisions and potentially important tweaks in these fuzzers over time. Furthermore, there has been an observable fragmentation in the terminology used by various fuzzers. For example, whereas AFL [231] uses the term "test case minimization" to refer to a technique that reduces the size of a crashing input, the same technique is called "test case reduction" in funfuzz [187]. At the same time, while BFF [49] includes a similar-sounding technique called "crash minimization", this technique actually seeks to minimize the number of bits that differ between the crashing input and the original seed file and is not related to reducing input size. This makes it difficult, if not impossible, to compare fuzzers using published evaluation results. We believe such fragmentation makes it difficult to discover and disseminate fuzzing knowledge, and this may severely hinder progress in fuzzing research in the long run.

Due to the above reasons, we believe it is prime time to consolidate and distill the large amount of progress in fuzzing, much of which happened after the three trade books on the subject were published in 2007–2008 [79], [203], [205].

As we attempt to unify the field, we will start by using §2 to present our fuzzing terminology and a unified model of fuzzing. Staying true to the purpose of this paper, our terminology is chosen to closely reflect current predominant usage, and our model fuzzer (Algorithm 1, p. 3) is designed to suit a large number of fuzzing tasks as classified in a taxonomy of the current fuzzing literature (Figure 1, p. 5). With this setup, we will then explore every stage of our model fuzzer in §3–§7, and present a detailed overview of major fuzzers in Table 1 (p. 6). At each stage, we will survey the relevant literature to explain the design choices, discuss important trade-offs, and highlight many marvelous engineering efforts that help make modern-day fuzzers effective at their task.
2 SYSTEMIZATION, TAXONOMY, AND TEST PROGRAMS

The term "fuzz" was originally coined by Miller et al. in 1990 to refer to a program that "generates a stream of random characters to be consumed by a target program" [152, p. 4]. Since then, the concept of fuzz as well as its action—"fuzzing"—has appeared in a wide variety of contexts, including dynamic symbolic execution [90], [226], grammar-based test case generation [88], [105], [213], permission testing [24], [80], behavioral testing [122], [175], [224], complexity testing [135], [222], kernel testing [216], [196], [186], representation dependence testing [121], function detection [227], robustness evaluation [223], exploit development [111], GUI testing [197], signature generation [72], and penetration testing [81], [156]. To systematize the knowledge from the vast literature of fuzzing, let us first present a terminology of fuzzing extracted from modern uses.

2.1 Fuzzing & Fuzz Testing

Intuitively, fuzzing is the action of running a Program Under Test (PUT) with "fuzz inputs". Honoring Miller et al., we consider a fuzz input to be an input that the PUT may not be expecting, i.e., an input that the PUT may process incorrectly and trigger a behavior that was unintended by the PUT developer. To capture this idea, we define the term fuzzing as follows.

Definition 1 (Fuzzing). Fuzzing is the execution of the PUT using input(s) sampled from an input space (the "fuzz input space") that protrudes the expected input space of the PUT.

Three remarks are in order. First, although it may be common for the fuzz input space to contain the expected input space, this is not necessary—it suffices for the former to contain an input not in the latter. Second, in practice fuzzing almost surely runs for many iterations; thus writing "repeated executions" above would still be largely accurate. Third, the sampling process is not necessarily randomized, as we will see in §5.

Fuzz testing is a form of software testing that utilizes fuzzing. To differentiate it from other forms of testing and to honor what we consider to be its most prominent purpose, we deem it to have the specific goal of finding security-related bugs, which include program crashes. In addition, we also define fuzzer and fuzz campaign, both of which are common terms in fuzz testing:

Definition 2 (Fuzz Testing). Fuzz testing is the use of fuzzing to test if a PUT violates a security policy.

Definition 3 (Fuzzer). A fuzzer is a program that performs fuzz testing on a PUT.

Definition 4 (Fuzz Campaign). A fuzz campaign is a specific execution of a fuzzer on a PUT with a specific security policy.

The goal of running a PUT through a fuzzing campaign is to find bugs [26] that violate the specified security policy. For example, a security policy employed by early fuzzers tested only whether a generated input—the test case—crashed the PUT. However, fuzz testing can actually be used to test any security policy observable from an execution, i.e., EM-enforceable [183]. The specific mechanism that decides whether an execution violates the security policy is called the bug oracle.

Definition 5 (Bug Oracle). A bug oracle is a program, perhaps as part of a fuzzer, that determines whether a given execution of the PUT violates a specific security policy.

We refer to the algorithm implemented by a fuzzer simply as its "fuzz algorithm". Almost all fuzz algorithms depend on some parameters beyond (the path to) the PUT. Each concrete setting of the parameters is a fuzz configuration:

Definition 6 (Fuzz Configuration). A fuzz configuration of a fuzz algorithm comprises the parameter value(s) that control(s) the fuzz algorithm.

The definition of a fuzz configuration is intended to be broad. Note that the type of values in a fuzz configuration depends on the type of the fuzz algorithm. For example, a fuzz algorithm that sends streams of random bytes to the PUT [152] has a simple configuration space {(PUT)}. On the other hand, sophisticated fuzzers contain algorithms that accept a set of configurations and evolve the set over time—this includes adding and removing configurations. For example, CERT BFF [49] varies both the mutation ratio and the seed over the course of a campaign, and thus its configuration space is {(PUT, s1, r1), (PUT, s2, r2), ...}. A seed is a (commonly well-structured) input to the PUT, used to generate test cases by modifying it. Fuzzers typically maintain a collection of seeds, and some fuzzers evolve the collection as the fuzz campaign progresses. This collection is called a seed pool. Finally, a fuzzer is able to store some data within each configuration. For instance, coverage-guided fuzzers may store the attained coverage in each configuration.
2.2 Paper Selection Criteria

To achieve a well-defined scope, we have chosen to include all publications on fuzzing in the proceedings of four major security conferences and three major software engineering conferences from January 2008 to February 2019. Alphabetically, the former include (i) the ACM Conference on Computer and Communications Security (CCS), (ii) the IEEE Symposium on Security and Privacy (S&P), (iii) the Network and Distributed System Security Symposium (NDSS), and (iv) the USENIX Security Symposium (USEC); and the latter include (i) the ACM International Symposium on the Foundations of Software Engineering (FSE), (ii) the IEEE/ACM International Conference on Automated Software Engineering (ASE), and (iii) the International Conference on Software Engineering (ICSE). For writings that appear in other venues or mediums, we include them based on our own judgment of their relevance.
As mentioned in §2.1, fuzz testing only differentiates itself from software testing in that fuzz testing is security-related. In theory, focusing on security bugs does not imply a difference in the testing process beyond the selection of a bug oracle. The techniques used often vary in practice, however. When designing a testing tool, access to source code and some knowledge about the PUT are often assumed. Such assumptions often drive the development of testing tools to have different characteristics compared to those of fuzzers, which are more likely to be employed by parties other than the PUT's developer. Nevertheless, the two fields are still closely related to one another. Therefore, when we are unsure whether to classify a publication as relating to "fuzz testing" and include it in this survey, we follow a simple rule of thumb: we include the publication if the word fuzz appears in it.

2.3 Fuzz Testing Algorithm

We present a generic algorithm for fuzz testing, Algorithm 1, which we imagine to have been implemented in a model fuzzer. It is general enough to accommodate existing fuzzing techniques, including black-, grey-, and white-box fuzzing as defined in §2.4. Algorithm 1 takes a set of fuzz configurations C and a timeout tlimit as input, and outputs a set of discovered bugs B. It consists of two parts. The first part is the PREPROCESS function, which is executed at the beginning of a fuzz campaign. The second part is a series of five functions inside a loop: SCHEDULE, INPUTGEN, INPUTEVAL, CONFUPDATE, and CONTINUE. Each execution of this loop is called a fuzz iteration, and each time INPUTEVAL executes the PUT on a test case is called a fuzz run. Note that some fuzzers do not implement all five functions. For example, to model Radamsa [102], which never updates the set of fuzz configurations, CONFUPDATE always returns the current set of configurations unchanged.

ALGORITHM 1: Fuzz Testing
  Input: C, tlimit
  Output: B (a finite set of bugs)
  B ← ∅
  C ← PREPROCESS(C)
  while telapsed < tlimit ∧ CONTINUE(C) do
    conf ← SCHEDULE(C, telapsed, tlimit)
    tcs ← INPUTGEN(conf)
    B′, execinfos ← INPUTEVAL(conf, tcs, Obug)   // Obug is embedded in a fuzzer
    C ← CONFUPDATE(C, conf, execinfos)
    B ← B ∪ B′
  return B

PREPROCESS(C) → C
  A user supplies PREPROCESS with a set of fuzz configurations as input, and it returns a potentially-modified set of fuzz configurations. Depending on the fuzz algorithm, PREPROCESS may perform a variety of actions such as inserting instrumentation code into PUTs or measuring the execution speed of seed files. See §3.

SCHEDULE(C, telapsed, tlimit) → conf
  SCHEDULE takes in the current set of fuzz configurations, the current time telapsed, and a timeout tlimit as input, and selects a fuzz configuration to be used for the current fuzz iteration. See §4.

INPUTGEN(conf) → tcs
  INPUTGEN takes a fuzz configuration as input and returns a set of concrete test cases tcs as output. When generating test cases, INPUTGEN uses specific parameter(s) in conf. Some fuzzers use a seed in conf for generating test cases, while others use a model or grammar as a parameter. See §5.

INPUTEVAL(conf, tcs, Obug) → B′, execinfos
  INPUTEVAL takes a fuzz configuration conf, a set of test cases tcs, and a bug oracle Obug as input. It executes the PUT on tcs and checks if the executions violate the security policy using the bug oracle Obug. It then outputs the set of bugs found B′ and information about each of the fuzz runs execinfos, which may be used to update the fuzz configurations. We assume Obug is embedded in our model fuzzer. See §6.

CONFUPDATE(C, conf, execinfos) → C
  CONFUPDATE takes a set of fuzz configurations C, the current configuration conf, and the information about each of the fuzz runs execinfos as input. It may update the set of fuzz configurations C. For example, many grey-box fuzzers reduce the number of fuzz configurations in C based on execinfos. See §7.

CONTINUE(C) → {True, False}
  CONTINUE takes a set of fuzz configurations C as input and outputs a boolean indicating whether a new fuzz iteration should occur. This function is useful to model white-box fuzzers that can terminate when there are no more paths to discover.
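For readers who prefer runnable code to pseudocode, the sketch below transliterates Algorithm 1 into Python. The five functions are stubbed with trivial placeholder behavior, since their real implementations are precisely what §3–§7 survey:

```python
import time

def preprocess(C): return C                          # §3: e.g., instrument the PUT
def schedule(C, t_elapsed, t_limit): return C[0]     # §4: pick the next configuration
def inputgen(conf): return [b"\x00"]                 # §5: produce concrete test cases
def inputeval(conf, tcs, o_bug):                     # §6: run the PUT, apply the oracle
    execinfos = [{"tc": tc, "crashed": False} for tc in tcs]
    return {e["tc"] for e in execinfos if o_bug(e)}, execinfos
def confupdate(C, conf, execinfos): return C         # §7: e.g., evolve the seed pool
def continue_(C): return True                        # always start another iteration
def o_bug(execinfo): return execinfo["crashed"]      # bug oracle (embedded)

def fuzz(C, t_limit):
    """Algorithm 1: return the finite set of bugs B found within t_limit seconds."""
    B = set()
    C = preprocess(C)
    start = time.time()
    while time.time() - start < t_limit and continue_(C):  # one fuzz iteration per pass
        conf = schedule(C, time.time() - start, t_limit)
        tcs = inputgen(conf)
        B_new, execinfos = inputeval(conf, tcs, o_bug)     # each PUT execution = fuzz run
        C = confupdate(C, conf, execinfos)
        B |= B_new
    return B

print(fuzz([{"put": "./target"}], t_limit=0.01))
```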

2.4 Taxonomy of Fuzzers

For this paper, we have categorized fuzzers into three groups based on the granularity of semantics a fuzzer observes in each fuzz run. These three groups are called black-, grey-, and white-box fuzzers, which we define below. Note that this classification is different from traditional software testing, where there are only two major categories (black- and white-box testing) [158]. As we will discuss in §2.4.3, grey-box fuzzing is a variant of white-box fuzzing that can only obtain some partial information from each fuzz run.

2.4.1 Black-box Fuzzer

The term "black-box" is commonly used in software testing [158], [32] and fuzzing to denote techniques that do not see the internals of the PUT—these techniques can observe only the input/output behavior of the PUT, treating it as a black box. In software testing, black-box testing is also called IO-driven or data-driven testing [158]. Most traditional fuzzers [13], [103], [49], [6], [50] are in this category. Some modern fuzzers, e.g., funfuzz [187] and Peach [76], also take structural information about inputs into account to generate more meaningful test cases while maintaining the characteristic of not inspecting the PUT. A similar intuition is used in adaptive random testing [57].

2.4.2 White-box Fuzzer

At the other extreme of the spectrum, white-box fuzzing [90] generates test cases by analyzing the internals of the PUT and the information gathered when executing the PUT.
Thus, white-box fuzzers are able to explore the state space of the PUT systematically. The term white-box fuzzing was introduced by Godefroid [87] in 2007 and refers to dynamic symbolic execution (DSE), which is a variant of symbolic execution [39], [126], [108]. In DSE, symbolic and concrete execution operate concurrently, where concrete program states are used to simplify symbolic constraints, e.g., by concretizing system calls. DSE is thus often referred to as concolic testing (concrete + symbolic) [191], [89]. In addition, white-box fuzzing has also been used to describe fuzzers that employ taint analysis [84]. The overhead of white-box fuzzing is typically much higher than that of black-box fuzzing. This is partly because DSE implementations [90], [46], [25] often employ dynamic instrumentation and SMT solving [155]. While DSE is an active research area [90], [88], [38], [172], [112], many DSEs are not white-box fuzzers because they do not aim to find security bugs. As such, this paper does not provide a comprehensive survey on DSEs, and we refer the reader to recent survey papers [17], [185] for more information on DSEs for non-security applications.

2.4.3 Grey-box Fuzzer

Some fuzzers [78], [68], [205] take a middle-ground approach which is dubbed grey-box fuzzing. In general, grey-box fuzzers can obtain some information internal to the PUT and/or its executions. Unlike white-box fuzzers, grey-box fuzzers do not reason about the full semantics of the PUT; instead, they may perform lightweight static analysis on the PUT and/or gather dynamic information about its executions, such as code coverage. Grey-box fuzzers rely on approximated, imperfect information in order to gain speed and be able to test more inputs. Although there usually is a consensus among security experts, the distinction between black-, grey-, and white-box fuzzing is not always clear. Black-box fuzzers may collect some information about fuzz runs, and white-box fuzzers often use some approximations. When classifying the fuzzers in this survey, particularly in Table 1, we used our best judgement.

An early example of a grey-box fuzzer is EFS [68], which uses code coverage gathered from each fuzz run to generate test cases with an evolutionary algorithm. Randoop [166] also used a similar approach, though it did not target security vulnerabilities. Modern fuzzers such as AFL [231] and VUzzer [176] are exemplars in this category.

2.5 Fuzzer Genealogy and Overview

Figure 1 (p. 5) presents our categorization of existing fuzzers in chronological order. Starting from the seminal work by Miller et al. [152], we manually chose popular fuzzers that either appeared in a major conference or obtained more than 100 GitHub stars, and showed their relationships as a graph. Black-box fuzzers are in the left half of the figure, and grey- and white-box fuzzers are in the right half. Furthermore, fuzzers are subdivided depending on the type of input the PUT uses: file, network, UI, web, kernel I/O, or threads (in the case of concurrency fuzzers).

Table 1 (p. 6) presents a detailed summary of the techniques used in the most notable fuzzers in Figure 1. We had to omit some of the fuzzers in Figure 1 due to space constraints. Each fuzzer is summarized based on its implementation of the five functions of our model fuzzer, plus a miscellaneous section that provides other details on the fuzzer. We describe the properties captured by each column below. The first column (feedback gathering granularity) indicates whether the fuzzer is black- (●), white- (○), or grey-box (◐). Two circles appear when a fuzzer has two phases that use different kinds of feedback gathering. For example, SymFuzz [52] runs a white-box analysis as a preprocessing step in order to optimize the performance of a subsequent black-box campaign (○+●), and hybrid fuzzers, such as Driller [200], alternate between white- and grey-box fuzzing (◐+○). The second column shows whether the source code of the fuzzer is publicly available. The third column denotes whether fuzzers need the source code of the PUT to operate. The fourth column points out whether fuzzers support in-memory fuzzing (see §3.1.2). The fifth column is about whether fuzzers can infer models (see §5.1.2). The sixth column shows whether fuzzers perform either static or dynamic analysis in PREPROCESS. The seventh column indicates if fuzzers support handling multiple seeds and perform scheduling. The mutation column specifies if fuzzers perform input mutation to generate test cases; we use ◐ to indicate fuzzers that guide input mutation based on execution feedback. The model-based column is about whether fuzzers generate test cases based on a model. The constraint-based column shows which fuzzers perform a symbolic analysis to generate test cases. The taint analysis column marks fuzzers that leverage taint analysis to guide their test case generation process. The two columns in the INPUTEVAL section show whether fuzzers perform crash triage using either stack hashing or code coverage. The first column of the CONFUPDATE section indicates if fuzzers evolve the seed pool during CONFUPDATE, such as by adding new seeds to the pool (see §7.1). The second column of the CONFUPDATE section is about whether fuzzers learn an input model in an online fashion. Finally, the third column of the CONFUPDATE section shows which fuzzers remove seeds from the seed pool (see §7.2).

3 PREPROCESS

Some fuzzers modify the initial set of fuzz configurations before the first fuzz iteration. Such preprocessing is commonly used to instrument the PUT, to weed out potentially-redundant configurations (i.e., "seed selection" [177]), to trim seeds, and to generate driver applications. As will be detailed in §5.1.1 (p. 9), PREPROCESS can also be used to prepare a model for future input generation (INPUTGEN).

3.1 Instrumentation

Unlike black-box fuzzers, both grey- and white-box fuzzers can instrument the PUT to gather execution feedback as INPUTEVAL performs fuzz runs (see §6), or to fuzz the memory contents at runtime. The amount of collected information defines the color of a fuzzer (i.e., black-, white-, or grey-box). Although there are other ways of acquiring information about the internals of the PUT (e.g., processor traces or system call usage [204], [92]), instrumentation is often the method that collects the most valuable feedback.
[Figure 1 occupies this page: a genealogy graph of notable fuzzers from Miller et al. (circa 1990) through 2019, spanning black-, grey-, and white-box fuzzers for network, file, kernel, web, UI, and concurrency targets. The graph itself does not survive text extraction.]

Fig. 1: Genealogy tracing significant fuzzers' lineage back to Miller et al.'s seminal work. Each node in the same row represents a set of fuzzers that appeared in the same year. A solid arrow from X to Y indicates that Y cites, references, or otherwise uses techniques from X. A marker next to a name denotes that a paper describing the work was published.
TABLE 1: Overview of fuzzers sorted by their instrumentation granularity and their name. ●, ◐, and ○ represent black-, grey-, and white-box, respectively. A dagger (†) marks a cell where the corresponding fuzzer is derived from AFL and changed that part of the fuzzing algorithm. The columns group into: Misc. (feedback gathering granularity, open-sourced, source code required, supports in-memory fuzzing, model construction), PREPROCESS (program analysis), SCHEDULE (seed scheduling), INPUTGEN (mutation, model-based, constraint-based, taint analysis), INPUTEVAL (crash triage by stack hash, crash triage by coverage), and CONFUPDATE (evolutionary seed pool update, model update, seed pool culling).

[The per-fuzzer check-mark matrix does not survive text extraction and is omitted here. It covers, among others: black-box fuzzers such as BFF, CodeAlchemist, CLsmith, DELTA, DIFUZE, Digtool, Doupé et al., FOE, GLADE, IMF, jsfunfuzz, LangFuzz, Miller et al., Peach, PULSAR, Radamsa, Ruiter et al., TLS-Attacker, and zzuf; two-phase fuzzers such as FLAX, IoTFuzzer, and SymFuzz; grey-box fuzzers such as AFL, AFLFast, AFLGo, AssetFuzzer, AtomFuzzer, CalFuzzer, classfuzz, CollAFL, DeadlockFuzzer, FairFuzz, go-fuzz, Hawkeye, honggfuzz, kAFL, LibFuzzer, MagicFuzzer, Nautilus, RaceFuzzer, RedQueen, Steelix, and Syzkaller; hybrid fuzzers such as Angora, Cyberdyne, DigFuzz, Driller, QSYM, T-Fuzz, and VUzzer; and white-box fuzzers such as BitFuzz, BuzzFuzz, CAB-Fuzz, Chopper, Dewey et al., Dowser, GRT, KLEE, MoWF, MutaGen, Narada, SAGE, and TaintScope.]
Program instrumentation can be either static or dynamic—the former happens before the PUT runs (PREPROCESS), whereas the latter happens while the PUT is running (INPUTEVAL). For the reader's convenience, we summarize both static and dynamic instrumentation in this section.

Static instrumentation is often performed at compile time on either source code or intermediate code. Since it occurs before runtime, it generally imposes less runtime overhead than dynamic instrumentation. If the PUT relies on libraries, these have to be separately instrumented, commonly by recompiling them with the same instrumentation. Beyond source-based instrumentation, researchers have also developed binary-level static instrumentation (i.e., binary rewriting) tools [238], [132], [77].

Although it has higher overhead than static instrumentation, dynamic instrumentation has the advantage that it can easily instrument dynamically linked libraries, because the instrumentation is performed at runtime. There are several well-known dynamic instrumentation tools, such as DynInst [173], DynamoRIO [42], Pin [144], Valgrind [163], and QEMU [33].

A given fuzzer can support more than one type of instrumentation. For example, AFL supports static instrumentation at the source code level with a modified compiler, or dynamic instrumentation at the binary level with the help of QEMU [33]. When using dynamic instrumentation, AFL can instrument either (1) executable code in the PUT itself, which is the default setting, or (2) executable code in the PUT and any external libraries (with the AFL_INST_LIBS option). The second option—instrumenting all encountered code—can report coverage information for code in external libraries, and thus provides a more complete picture of coverage. However, it will also encourage AFL to fuzz additional paths in external library functions.

3.1.1 Execution Feedback

Grey-box fuzzers typically take execution feedback as input to evolve test cases. AFL and its descendants compute branch coverage by instrumenting every branch instruction in the PUT. However, they store the branch coverage information in a bit vector, which can cause path collisions. CollAFL [83] recently addressed this issue by introducing a new path-sensitive hash function. Meanwhile, LibFuzzer [7] and Syzkaller [216] use node coverage as their execution feedback. Honggfuzz [204] allows users to choose which execution feedback to use.
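The bit-vector branch coverage just described—and the collisions that CollAFL addresses—can be modeled in a few lines. The sketch below follows the scheme documented for AFL (a fixed-size map indexed by a hash of the previous and current basic-block identifiers); the block IDs and the trace are hypothetical:

```python
MAP_SIZE = 1 << 16  # AFL's default 64 KiB coverage map

class EdgeCoverage:
    """Simplified model of AFL-style branch (edge) coverage feedback."""
    def __init__(self):
        self.bitmap = bytearray(MAP_SIZE)
        self.prev = 0

    def on_block(self, block_id: int):
        # An edge is identified by hashing the (previous, current) block pair.
        # Distinct edges can hash to the same slot: this is the path-collision
        # problem that CollAFL's path-sensitive hashing mitigates.
        slot = (block_id ^ self.prev) % MAP_SIZE
        self.bitmap[slot] = min(self.bitmap[slot] + 1, 255)
        self.prev = block_id >> 1  # shift so that A->B and B->A map differently

cov = EdgeCoverage()
for blk in [0x4001, 0x4130, 0x4001, 0x4130]:  # a hypothetical execution trace
    cov.on_block(blk)
print(sum(1 for b in cov.bitmap if b))        # number of distinct edges observed
```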
3.1.2 In-Memory Fuzzing

When testing a large program, it is sometimes desirable to fuzz only a portion of the PUT without re-spawning a process for each fuzz iteration, in order to minimize execution overhead. For example, complex (e.g., GUI) applications often require several seconds of processing before they accept input. One approach to fuzzing such programs is to take a snapshot of the PUT after the GUI is initialized. To fuzz a new test case, one can then restore the memory snapshot before writing the new test case directly into memory and executing it. The same intuition applies to fuzzing network applications that involve heavy interaction between client and server. This technique is called in-memory fuzzing [104]. As an example, GRR [211], [92] creates a snapshot before loading any input bytes. This way, it can skip over unnecessary startup code. AFL also employs a fork server to avoid some of the process startup costs. Although it has the same motivation as in-memory fuzzing, a fork server involves forking off a new process for every fuzz iteration (see §6).

Some fuzzers [7], [231] perform in-memory fuzzing on a function without restoring the PUT's state after each iteration. We call such a technique in-memory API fuzzing. For example, AFL has an option called persistent mode [233], which repeatedly performs in-memory API fuzzing in a loop without restarting the process. In this case, AFL ignores potential side effects from the function being called multiple times in the same execution.

Although efficient, in-memory API fuzzing suffers from unsound fuzzing results: bugs (or crashes) found by in-memory fuzzing may not be reproducible, because (1) it is not always feasible to construct a valid calling context for the target function, and (2) there can be side effects that are not captured across multiple function calls. Notice that the soundness of in-memory API fuzzing mainly depends on the entry-point function, and finding such a function is a challenging task.
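To make the trade-off concrete, here is a deliberately simplified, hypothetical Python harness in the spirit of in-memory API fuzzing (this is not AFL's actual persistent mode; the target function parse_header and the whole setup are invented for illustration):

```python
import random

def parse_header(data: bytes) -> int:
    """Hypothetical target entry point (stands in for a library function)."""
    if data[:3] == b"ID3" and data[3] > 10:
        raise ValueError("malformed version")   # stands in for a crash
    return len(data)

def in_memory_api_fuzz(seed: bytes, iterations: int):
    crashes = []
    for _ in range(iterations):                 # no process restart between runs
        tc = bytearray(seed)
        tc[random.randrange(len(tc))] = random.randrange(256)
        try:
            parse_header(bytes(tc))
        except Exception as e:                  # bug oracle: any uncaught error
            crashes.append((bytes(tc), repr(e)))
        # Caveat: side effects of parse_header (globals, caches, leaked handles)
        # persist into the next iteration, so a "crash" found here may not
        # reproduce when the PUT is re-run from scratch on the same input.
    return crashes

print(len(in_memory_api_fuzz(b"ID3\x04 payload", 1000)))
```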
3.1.3 Thread Scheduling

Race condition bugs can be difficult to trigger because they rely on non-deterministic behaviors which may only occur infrequently. However, instrumentation can also be used to trigger different non-deterministic program behaviors by explicitly controlling how threads are scheduled [189], [190], [169], [116], [131], [47], [181]. Existing work has shown that even randomly scheduling threads can be effective at finding race condition bugs [189].

3.2 Seed Selection

Recall from §2 that fuzzers receive a set of fuzz configurations that control the behavior of the fuzzing algorithm. Unfortunately, some parameters of fuzz configurations, such as seeds for mutation-based fuzzers, have large value domains. For example, suppose an analyst fuzzes an MP3 player that accepts MP3 files as input. There is an unbounded number of valid MP3 files, which raises a natural question: which seeds should we use for fuzzing? This problem of decreasing the size of the initial seed pool is known as the seed selection problem [177].

There are several approaches and tools that address the seed selection problem [177], [76]. A common approach is to find a minimal set of seeds that maximizes a coverage metric, e.g., node coverage; this process is called computing a minset. For example, suppose the current set of configurations C consists of two seeds s1 and s2 that cover the following addresses of the PUT: {s1 → {10, 20}, s2 → {20, 30}}. If we have a third seed s3 → {10, 20, 30} that executes roughly as fast as s1 and s2, one could argue it makes sense to fuzz s3 instead of s1 and s2, since it intuitively tests more program logic for half the execution time cost. This intuition is supported by Miller's report [153], which showed that a 1% increase in code coverage increased the percentage of bugs found by 0.92%. As noted in §7.2, this step can also be part of CONFUPDATE, which is useful for fuzzers that can introduce new seeds into the seed pool throughout the campaign.
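A common way to approximate a minset is the greedy heuristic for set cover; the sketch below assumes (hypothetically) that each seed's covered addresses have already been measured:

```python
def greedy_minset(cover: dict) -> list:
    """Pick a small set of seeds whose union preserves total coverage."""
    remaining = set().union(*cover.values())
    minset = []
    while remaining:
        # Greedily take the seed covering the most not-yet-covered addresses.
        best = max(cover, key=lambda s: len(cover[s] & remaining))
        gain = cover[best] & remaining
        if not gain:
            break
        minset.append(best)
        remaining -= gain
    return minset

# The example from the text: s3 alone covers everything s1 and s2 do.
print(greedy_minset({"s1": {10, 20}, "s2": {20, 30}, "s3": {10, 20, 30}}))  # ['s3']
```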
Fuzzers use a variety of different coverage metrics in practice. For example, AFL's minset is based on branch coverage with a logarithmic counter on each branch. The rationale behind this decision is to allow branch counts to be considered different only when they differ in orders of magnitude. Honggfuzz [204] computes coverage based on the number of executed instructions, executed branches, and unique basic blocks. This metric allows the fuzzer to add longer executions to the minset, which can help discover denial-of-service vulnerabilities or performance problems.

3.3 Seed Trimming

Smaller seeds are likely to consume less memory and entail higher throughput. Therefore, some fuzzers attempt to reduce the size of seeds prior to fuzzing them, which is called seed trimming. Seed trimming can happen prior to the main fuzzing loop in PREPROCESS or as part of CONFUPDATE. One notable fuzzer that uses seed trimming is AFL [231], which uses its code coverage instrumentation to iteratively remove a portion of the seed as long as the modified seed achieves the same coverage. Meanwhile, Rebert et al. [177] reported that their size minset algorithm, which selects seeds by giving higher priority to smaller seeds, results in fewer unique bugs compared to a random seed selection. For the specific case of fuzzing Linux system call handlers, MoonShine [167] extends syzkaller [216] to reduce the size of seeds while preserving the dependencies between calls, which are detected using a static analysis.

3.4 Preparing a Driver Application

When it is difficult to directly fuzz the PUT, it makes sense to prepare a driver for fuzzing. This process is largely manual in practice, although it is done only once at the beginning of a fuzzing campaign. For example, when our target is a library, we need to prepare a driver program that calls functions in the library. Similarly, kernel fuzzers may fuzz userland applications to test kernels [125], [165], [31]. IoTFuzzer [54] targets IoT devices by letting the driver communicate with the corresponding smartphone application.

4 SCHEDULING

In fuzzing, scheduling means selecting a fuzz configuration for the next fuzz iteration. As we have explained in §2.1, the content of each configuration depends on the type of the fuzzer. For simple fuzzers, scheduling can be straightforward—for example, zzuf [103] in its default mode allows only one configuration, and thus there is simply no decision to make. But for more advanced fuzzers such as BFF [49] and AFLFast [37], a major factor in their success lies in their innovative scheduling algorithms. In this section, we will discuss scheduling algorithms for black- and grey-box fuzzing only; scheduling in white-box fuzzing requires a complex setup unique to symbolic executors, and we refer the reader to another source [38].

4.1 The Fuzz Configuration Scheduling (FCS) Problem

The goal of scheduling is to analyze the currently-available information about the configurations and pick one that is likely to lead to the most favorable outcome, e.g., finding the largest number of unique bugs, or maximizing the coverage attained by the set of generated inputs. Fundamentally, every scheduling algorithm confronts the same exploration vs. exploitation conflict—time can either be spent on gathering more accurate information on each configuration to inform future decisions (explore), or on fuzzing the configurations that are currently believed to lead to more favorable outcomes (exploit). Woo et al. [225] dubbed this inherent conflict the Fuzz Configuration Scheduling (FCS) Problem.

In our model fuzzer (Algorithm 1), the function SCHEDULE selects the next configuration based on (i) the current set of fuzz configurations C, (ii) the current time telapsed, and (iii) the total time budget tlimit. This configuration is then used for the next fuzz iteration. Notice that SCHEDULE is only about decision-making. The information on which this decision is based is acquired by PREPROCESS and CONFUPDATE, which augment C with this knowledge.

4.2 Black-box FCS Algorithms

In the black-box setting, the only information an FCS algorithm can use is the fuzz outcomes of a configuration—the number of crashes and bugs found with it and the amount of time spent on it so far. Householder and Foote [107] were the first to study how such information can be leveraged, in the CERT BFF black-box mutational fuzzer [49]. They postulated that a configuration with a higher observed success rate (#bugs / #runs) should be preferred. Indeed, after replacing the uniform-sampling scheduling algorithm in BFF, they observed 85% more unique crashes over 5 million runs of ffmpeg, demonstrating the potential benefit of more advanced FCS algorithms.

Shortly after, the above idea was improved on multiple fronts by Woo et al. [225]. First, they refined the mathematical model of black-box mutational fuzzing from a sequence of Bernoulli trials [107] to the Weighted Coupon Collector's Problem with Unknown Weights (WCCP/UW). Whereas the former assumes each configuration has a fixed eventual success probability and learns it over time, the latter explicitly maintains an upper bound on this probability as it decays. Second, the WCCP/UW model naturally led Woo et al. to investigate algorithms for multi-armed bandit (MAB) problems, a popular formalism for coping with the exploration vs. exploitation conflict in decision science [34]. To this end, they were able to design MAB algorithms that accurately exploit configurations that are not known to have decayed yet. Third, they observed that, all else being equal, a configuration that is faster to fuzz allows a fuzzer either to collect more unique bugs with it, or to decrease the upper bound on its future success probability more rapidly. This inspired them to normalize the success probability of a configuration by the time that has been spent on it, thus causing a faster configuration to be more preferable. Fourth, they changed the orchestration of fuzz runs in BFF from a fixed number of fuzz iterations per configuration selection ("epochs" in BFF parlance) to a fixed amount of time per selection. With this change, BFF is no longer forced to spend more time in a slow configuration before it can re-select. By combining the above, the evaluation [225] showed a 1.5× increase in the number of unique bugs found using the same amount of time as the existing BFF.
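As a toy illustration of this family of algorithms—not the exact method of [107] or [225]—the following sketch prefers configurations by observed success rate normalized by time spent, plus a simple exploration bonus of our own choosing:

```python
import math

class Config:
    """Per-configuration statistics available to a black-box scheduler."""
    def __init__(self, name):
        self.name = name
        self.bugs = 0            # unique bugs found with this configuration
        self.time_spent = 0.0    # seconds fuzzed so far

def pick_next(configs, now):
    """Prefer a high success rate per unit time, with an exploration bonus."""
    def score(c):
        rate = c.bugs / (c.time_spent + 1.0)                           # exploit
        bonus = math.sqrt(math.log(now + 2.0) / (c.time_spent + 1.0))  # explore
        return rate + bonus
    return max(configs, key=score)

pool = [Config("seed1-ratio0.001"), Config("seed2-ratio0.04")]
chosen = pick_next(pool, now=60.0)   # then fuzz `chosen` for a fixed time slice
print(chosen.name)
```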
4.3 Grey-box FCS Algorithms

In the grey-box setting, an FCS algorithm can choose to use a richer set of information about each configuration, e.g., the coverage attained when fuzzing a configuration. AFL [231] is the forerunner in this category, and it is based on an evolutionary algorithm (EA). Intuitively, an EA maintains a population of configurations, each with some value of "fitness". An EA selects fit configurations and applies genetic transformations to them, such as mutation and recombination, to produce offspring, which may later become new configurations. The hypothesis is that these produced configurations are more likely to be fit.

To understand FCS in the context of an EA, we need to define (i) what makes a configuration fit, (ii) how configurations are selected, and (iii) how a selected configuration is used. As a high-level approximation, among the configurations that exercise a control-flow edge, AFL considers the one that contains the fastest and smallest input to be fit ("favorite" in AFL parlance). AFL maintains a queue of configurations, from which it selects the next fit configuration essentially as if the queue were circular. Once a configuration is selected, AFL fuzzes it for an essentially constant number of runs. From the perspective of FCS, notice that the preference for fast configurations is common with the black-box setting [225].

Recently, AFLFast by Böhme et al. [37] has improved upon AFL in each of the three aspects above. First, AFLFast adds two overriding criteria for an input to become a "favorite": (i) among the configurations that exercise a control-flow edge, AFLFast favors the one that has been chosen least. This has the effect of cycling among configurations that exercise this edge, thus increasing exploration. (ii) When there is a tie in (i), AFLFast favors the one that exercises a path that has been exercised least. This has the effect of increasing the exercise of rare paths, which may uncover more unobserved behavior. Second, AFLFast forgoes the round-robin selection in AFL and instead selects the next fit configuration based on a priority. In particular, a fit configuration has a higher priority than another if it has been chosen less often or, when tied, if it exercises a path that has been exercised less often. In the same spirit as the first change, this has the effect of increasing the exploration among fit configurations and the exercising of rare paths. Third, AFLFast fuzzes a selected configuration a variable number of times as determined by a power schedule. The FAST power schedule in AFLFast starts with a small "energy" value to ensure initial exploration among configurations and increases it exponentially up to a limit to quickly ensure sufficient exploitation. In addition, it also normalizes the energy by the number of generated inputs that exercise the same path, thus promoting exploration of less-frequently fuzzed configurations. The overall effect of these changes is very significant—in a 24-hour evaluation, Böhme et al. observed that AFLFast discovered 3 bugs that AFL did not, and was on average 7× faster than AFL on 6 other bugs that were discovered by both.

AFLGo [36] extends AFLFast by modifying its priority attribution in order to target specific program locations. Hawkeye [53] further improves directed fuzzing by leveraging a static analysis in its seed scheduling and input generation. FairFuzz [136] guides the campaign to exercise rare branches by employing a mutation mask for each pair of a seed and a rare branch. QTEP [218] uses static analysis to infer which parts of the binary are more 'faulty' and prioritizes the configurations that cover them.
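The flavor of such a power schedule can be sketched as follows; this is our simplified rendition of the published idea (exponential growth in selection count, normalized by path frequency, capped by a constant), not AFLFast's exact formula or code:

```python
def fast_energy(times_chosen: int, path_frequency: int,
                base: int = 4, cap: int = 1024) -> int:
    """How many inputs to generate for the selected configuration."""
    # Small initial energy (exploration), exponential growth in how often the
    # configuration was chosen (exploitation), damped by how often its path
    # has already been exercised, and capped by a constant.
    energy = base * (2 ** times_chosen) / max(path_frequency, 1)
    return max(1, int(min(energy, cap)))

for s in range(6):  # energy assigned over successive selections of one seed
    print(s, fast_energy(times_chosen=s, path_frequency=8))
```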
5 INPUT GENERATION

Since the content of a test case directly controls whether or not a bug is triggered, the technique used for input generation is naturally one of the most influential design decisions in a fuzzer. Traditionally, fuzzers are categorized into either generation- or mutation-based fuzzers [203]. Generation-based fuzzers produce test cases based on a given model that describes the inputs expected by the PUT; we call such fuzzers model-based fuzzers in this paper. On the other hand, mutation-based fuzzers produce test cases by mutating a given seed input. Mutation-based fuzzers are generally considered to be model-less because seeds are merely example inputs, and even in large numbers they do not completely describe the expected input space of the PUT. In this section, we explain and classify the various input generation techniques used by fuzzers based on their underlying test case generation (INPUTGEN) mechanism.

5.1 Model-based (Generation-based) Fuzzers

Model-based fuzzers generate test cases based on a given model that describes the inputs or executions that the PUT may accept, such as a grammar precisely characterizing the input format, or less precise constraints such as magic values identifying file types.

5.1.1 Predefined Model

Some fuzzers use a model that can be configured by the user. For example, Peach [76], PROTOS [120], and Dharma [3] take in a specification provided by the user. Autodafé [214], Sulley [16], SPIKE [13], and SPIKEfile [202] expose APIs that allow analysts to create their own input models. Tavor [240] also takes in an input specification written in Extended Backus-Naur Form (EBNF) and generates test cases conforming to the corresponding grammar. Similarly, network protocol fuzzers such as PROTOS [120], SNOOZE [29], KiF [12], and T-Fuzz [114] also take in a protocol specification from the user. Kernel API fuzzers [115], [216], [157], [162], [221] define an input model in the form of system call templates. These templates commonly specify the number and types of arguments a system call expects as inputs. The idea of using a model in kernel fuzzing originated in Koopman et al.'s seminal work [128], where they compared the robustness of OSes with a finite set of manually chosen test cases for system calls. Nautilus [22] employs grammar-based input generation for general-purpose fuzzing, and also uses its grammar for seed trimming (see §3.3).

Other model-based fuzzers target a specific language or grammar, and the model of this language is built into the fuzzer itself. For example, cross_fuzz [232] and DOMfuzz [187] generate random Document Object Model (DOM) objects. Likewise, jsfunfuzz [187] produces random, but syntactically correct, JavaScript code based on its own grammar model. QuickFuzz [94] utilizes existing Haskell libraries that describe file formats when generating test cases. Some network protocol fuzzers, such as Frankencerts [41], TLS-Attacker [195], tlsfuzzer [124], and llfuzzer [198], are designed with models of specific network protocols such as TLS and NFC. Dewey et al. [69], [70] proposed a way to generate test cases that are not only grammatically correct but also semantically diverse by leveraging constraint logic programming. LangFuzz [105] produces code fragments by parsing a set of seeds that are given as input. It then randomly combines the fragments and mutates seeds with the fragments to generate test cases. Since it is provided with a grammar, it always produces syntactically correct code. LangFuzz was applied to JavaScript and PHP. BlendFuzz [229] is based on similar ideas as LangFuzz, but targets XML and regular expression parsers.
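To illustrate what a predefined grammar buys over random bytes, the sketch below randomly expands a tiny, hypothetical arithmetic-expression grammar; real tools such as Peach or Tavor consume far richer specifications, and this is not their actual API:

```python
import random

# expr -> term | term '+' expr ;  term -> NUMBER | '(' expr ')'
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"]],
    "term": [["NUMBER"], ["(", "expr", ")"]],
}

def generate(symbol="expr", depth=0):
    """Expand a nonterminal into a syntactically valid string."""
    if symbol == "NUMBER":
        return str(random.randrange(100))
    if symbol not in GRAMMAR:
        return symbol                          # terminal literal
    rules = GRAMMAR[symbol]
    # Force the shortest production once deep enough, to bound recursion.
    rule = rules[0] if depth > 8 else random.choice(rules)
    return "".join(generate(s, depth + 1) for s in rule)

print(generate())  # e.g. "(7+42)+3" -- always parses, unlike random bytes
```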
5.1.2 Inferred Model

Inferring the model, rather than relying on a predefined or user-provided model, has recently been gaining traction. Although there is an abundance of published research on the topic of automated input format and protocol reverse engineering [66], [45], [141], [63], [28], only a few fuzzers leverage these techniques. Similar to instrumentation (§3.1), model inference can occur in either PREPROCESS or CONFUPDATE.

5.1.2.1 Model Inference in PREPROCESS: Some fuzzers infer the model as a preprocessing step. TestMiner [67] searches for the data available in the PUT, such as literals, to predict suitable inputs. Given a set of seeds and a grammar, Skyfire [217] uses a data-driven approach to infer a probabilistic context-sensitive grammar, and then uses it to generate a new set of seeds. Unlike previous works, it focuses on generating semantically valid inputs. IMF [99] learns a kernel API model by analyzing system API logs, and it produces C code that invokes a sequence of API calls using the inferred model. CodeAlchemist [100] breaks JavaScript code into "code bricks" and computes assembly constraints, which describe when distinct bricks can be assembled or merged together to produce semantically valid test cases. These constraints are computed using both static and dynamic analysis. Neural [62] and Learn&Fuzz [91] use a neural network-based machine learning technique to learn a model from a given set of test files, and generate test cases from the inferred model. Liu et al. [142] proposed a similar approach specific to text inputs.

5.1.2.2 Model Inference in CONFUPDATE: Other fuzzers can potentially update their model after each fuzz iteration. PULSAR [85] automatically infers a network protocol model from a set of captured network packets generated from a program. The learned network protocol is then used to fuzz the program. PULSAR internally builds a state machine and maps which message token is correlated with a state. This information is later used to generate test cases that cover more states in the state machine. Doupé et al. [73] propose a way to infer the state machine of a web service by observing its I/O behavior. The inferred model is then used to scan for web vulnerabilities. The work of Ruiter et al. [180] is similar, but targets TLS and bases its implementation on LearnLib [174]. GLADE [30] synthesizes a context-free grammar from a set of I/O samples and fuzzes the PUT using the inferred grammar. Finally, go-fuzz [215] is a grey-box fuzzer which builds a model for each of the seeds it adds to its seed pool. This model is used to generate new inputs from the seed.

5.1.3 Encoder Model

Fuzzing is often used to test decoder programs which parse a certain file format. Many file formats have corresponding encoder programs, which can be thought of as an implicit model of the file format. MutaGen [123] is a unique fuzzer that leverages this implicit model contained in an encoder program to generate new test cases. MutaGen leverages mutation to generate test cases, but unlike most mutation-based fuzzers, which mutate an existing test case (see §5.2), MutaGen mutates the encoder program. Specifically, to produce a new test case, MutaGen computes a dynamic program slice of the encoder program and runs it. The underlying idea is that the program slices will slightly change the behavior of the encoder program so that it produces test cases that are slightly malformed.

5.2 Model-less (Mutation-based) Fuzzers

Classic random testing [20], [98] is not efficient in generating test cases that satisfy specific path conditions. Suppose there is a simple C statement: if (input == 42). If input is a 32-bit integer, the probability of randomly guessing the right input value is 1/2^32. The situation gets worse when we consider well-structured inputs such as MP3 files. It is extremely unlikely that random testing will generate a valid MP3 file as a test case in a reasonable amount of time. As a result, the MP3 player will mostly reject the generated test cases from random testing at the parsing stage, before reaching deeper parts of the program.

This problem motivates the use of seed-based input generation as well as white-box input generation (see §5.3). Most model-less fuzzers use a seed, which is an input to the PUT, in order to generate test cases by modifying the seed. A seed is typically a well-structured input of a type supported by the PUT: a file, a network packet, or a sequence of UI events. By mutating only a fraction of a valid file, it is often possible to generate a new test case that is mostly valid but also contains abnormal values that trigger crashes of the PUT. There are a variety of methods used to mutate seeds, and we describe the common ones below.

5.2.1 Bit-Flipping

Bit-flipping is a common technique used by many model-less fuzzers [231], [204], [103], [6], [102]. Some fuzzers simply flip a fixed number of bits, while others determine the number of bits to flip at random. To randomly mutate seeds, some fuzzers employ a user-configurable parameter called the mutation ratio, which determines the number of bit positions to flip for a single execution of INPUTGEN. Suppose a fuzzer wants to flip K random bits from a given N-bit seed. In this case, the mutation ratio of the fuzzer is K/N.
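The K/N description translates directly into code; a minimal sketch, with a hypothetical seed:

```python
import random

def bit_flip(seed: bytes, mutation_ratio: float) -> bytes:
    """Flip K = ratio * N random bits of an N-bit seed (one INPUTGEN call)."""
    n_bits = len(seed) * 8
    k = max(1, round(mutation_ratio * n_bits))
    buf = bytearray(seed)
    for pos in random.sample(range(n_bits), k):
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

seed = b"ID3\x03\x00 a well-formed MP3 header ..."
testcase = bit_flip(seed, mutation_ratio=0.01)   # K/N = 0.01
```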
SymFuzz [52] showed that fuzzing performance is sensitive to the mutation ratio, and that there is no single ratio that works well for all PUTs. There are several approaches to finding a good mutation ratio. BFF [49] and FOE [50] use an exponentially scaled set of mutation ratios for each seed and allocate more iterations to mutation ratios that prove to be statistically effective [107]. SymFuzz [52] leverages a white-box program analysis to infer a good mutation ratio for each seed.

5.2.2 Arithmetic Mutation

AFL [231] and honggfuzz [204] contain another mutation operation, where they consider a selected byte sequence as an integer and perform simple arithmetic on that value. The computed value is then used to replace the selected byte sequence. The key intuition is to bound the effect of mutation by a small number. For example, AFL selects a 4-byte value from a seed and treats the value as an integer i. It then replaces the value in the seed with i ± r, where r is a randomly generated small integer. The range of r depends on the fuzzer and is often user-configurable. In AFL, the default range is 0 ≤ r < 35.

5.2.3 Block-based Mutation

There are several block-based mutation methodologies, where a block is a sequence of bytes of a seed: (1) insert a randomly generated block into a random position of a seed [231], [7]; (2) delete a randomly selected block from a seed [231], [102], [204], [7]; (3) replace a randomly selected block with a random value [231], [204], [102], [7]; (4) randomly permute the order of a sequence of blocks [102], [7]; (5) resize a seed by appending a random block [204]; and (6) take a random block from one seed to insert into, or replace, a random block of another seed [231], [7].

5.2.4 Dictionary-based Mutation

Some fuzzers use a set of predefined values with potentially significant semantic weight, e.g., 0 or −1, and format strings for mutation. For example, AFL [231], honggfuzz [204], and LibFuzzer [7] use values such as 0, -1, and 1 when mutating integers. Radamsa [102] employs Unicode strings, and GPF [6] uses formatting characters such as %x and %s to mutate strings [55].

5.3 White-box Fuzzers

White-box fuzzers can also be categorized into either model-based or model-less fuzzers; traditional dynamic symbolic execution [90], [112], [27], [147], [200], for example, does not use a model. Below, we summarize white-box fuzzing techniques based on their underlying test case algorithm. Please note that we intentionally omit dynamic symbolic executors such as [89], [191], [60], [46], [209], [51] unless they explicitly call themselves a fuzzer, as mentioned in §2.2.

5.3.1 Dynamic Symbolic Execution

At a high level, classic symbolic execution [126], [39], [108] runs a program with symbolic values as inputs, which represent all possible values. As it executes the PUT, it builds symbolic expressions instead of evaluating concrete values. Whenever it reaches a conditional branch instruction, it conceptually forks two symbolic interpreters, one for the true branch and another for the false branch. For every path, a symbolic interpreter builds up a path formula (or path predicate) for every branch instruction it encountered during an execution. A path formula is satisfiable if there is a concrete input that executes the desired path. One can generate concrete inputs by querying an SMT solver [155] for a solution to a path formula. Dynamic symbolic execution is a variant of traditional symbolic execution, where both symbolic execution and concrete execution operate at the same time. Thus, we often refer to dynamic symbolic execution as concolic (concrete + symbolic) testing. The idea is that concrete execution states can help reduce the complexity of symbolic constraints. An extensive review of the academic literature on dynamic symbolic execution, beyond its application to fuzzing, is out of the scope of this paper; a broader treatment of dynamic symbolic execution can be found in other sources [17], [185].

Dynamic symbolic execution is slow compared to grey-box or black-box approaches, as it involves instrumenting and analyzing every single instruction of the PUT. To cope with the high cost, a common strategy has been to narrow its usage; for instance, by letting the user specify uninteresting parts of the code [210], or by alternating between concolic testing and grey-box fuzzing. Driller [200] and Cyberdyne [92] have shown the usefulness of this technique at the DARPA Cyber Grand Challenge. QSYM [230] seeks to improve the integration between grey- and white-box fuzzing by implementing a fast concolic execution engine. DigFuzz [239] optimizes the switch between grey- and white-box fuzzing by first estimating the probability of exercising each path using grey-box fuzzing, and then having its white-box fuzzer focus on the paths that are believed to be most challenging for grey-box fuzzing.

5.3.2 Guided Fuzzing

Some fuzzers leverage static or dynamic program analysis techniques for enhancing the effectiveness of fuzzing. These
not require any model as in mutation-based fuzzers, but techniques usually involve fuzzing in two phases: (i) a costly
some symbolic executors [88], [172], [125] leverage input program analysis for obtaining useful information about the
models such as an input grammar to guide the symbolic PUT, and (ii) test case generation with the guidance from the
executor. previous analysis. This is denoted in the sixth column of Ta-
Although many white-box fuzzers including the semi- ble 1 (p. 6). For example, TaintScope [219] uses a fine-grained
nal work by Godefroid et al. [90] use dynamic symbolic taint analysis to find “hot bytes”, which are the input bytes
execution to generate test cases, not all white-box fuzzers that flow into critical system calls or API calls. A similar
are dynamic symbolic executors. Some fuzzers [219], [52], idea is presented by other security researchers [75], [110].
[145], [182] leverage a white-box program analysis to find Dowser [97] performs a static analysis during compilation
information about the inputs a PUT accepts in order to use to find loops that are likely to contain bugs based on a
it with black- or grey-box fuzzing. In the rest of this subsec- heuristic. Specifically, it looks for loops containing pointer
tion, we briefly summarize the existing white-box fuzzing dereferences. It then computes the relationship between
12

input bytes and the candidate loops with a taint analysis. vulnerability that overwrites a data or code pointer with
Finally, Dowser runs dynamic symbolic execution while an invalid value will usually cause a segmentation fault
making only the critical bytes to be symbolic hence improv- or abort when it is dereferenced. In addition, this policy is
ing performance. VUzzer [176] and GRT [145] leverage both efficient and simple to implement, since operating systems
static and dynamic analysis techniques to extract control- allow such exceptional situations to be trapped by the fuzzer
and data-flow features from the PUT and use them to guide without any instrumentation.
input generation. However, the traditional policy of detecting crashes will
Angora [56] and RedQueen [23] decrease the cost of not detect every memory vulnerability that is triggered. For
their analysis by first running for each seed with a costly example, if a stack buffer overflow overwrites a pointer on
instrumentation and using this information for generating the stack with a valid memory address, the program might
inputs which are run with a lighter instrumentation. An- run to completion with an invalid result rather than crash-
gora improves upon the “hot bytes” idea by using taint ing, and the fuzzer would not detect this. To mitigate this,
analysis to associate each path constraint to correspond- researchers have proposed a variety of efficient program
ing bytes. It then performs a search inspired by gradient transformations that detect unsafe or unwanted program
descent algorithm to guide its mutations towards solving behaviors and abort the program. These are often called
these constraints. On the other hand, RedQueen tries to sanitizers.
detect how inputs are used in the PUT by instrumenting all
comparisons and looking for correspondence between their 6.1.1 Memory and Type Safety
operands and the given input. Once a match is found, it can Memory safety errors can be separated into two classes:
be used to solve a constraint. spatial and temporal. Informally, spatial memory errors
occur when a pointer is dereferenced outside of the object it
5.3.3 PUT Mutation was intended to point to. For example, buffer overflows and
One of the practical challenges in fuzzing is bypassing a underflows are canonical examples of spatial memory errors.
checksum validation. For example, when a PUT computes Temporal memory errors occur when a pointer is accessed
a checksum of an input before parsing it, many test cases after it is no longer valid. For example, a use-after-free
will be rejected by the PUT. To handle this challenge, vulnerability, in which a pointer is used after the memory
TaintScope [219] proposed a checksum-aware fuzzing tech- it pointed to has been deallocated, is a typical temporal
nique, which identifies a checksum test instruction with a memory error.
taint analysis, and patches the PUT to bypass the checksum Address Sanitizer (ASan) [192] is a fast memory error
validation. Once they find a program crash, they generate detector that instruments programs at compile time. ASan
the correct checksum for the input to generate a test case that can detect spatial and temporal memory errors and has
crashes the unmodified PUT. Caballero et al. [44] suggested an average slowdown of only 73%, making it an attractive
a technique called stitched dynamic symbolic execution that alternative to a basic crash harness. ASan employs a shadow
can generate test cases in the presence of checksums. memory that allows each memory address to be quickly
T-Fuzz [170] extends this idea to efficiently penetrate checked for validity before it is dereferenced, which allows
all kind of conditional branches with grey-box fuzzing. It it to detect many (but not all) unsafe memory accesses, even
first builds a set of Non-Critical Checks (NCC), which are if they would not crash the original program. MEDS [101]
branches that can be transformed without modifying the improves on ASan by leveraging the large memory space
program logic. When the fuzzing campaign stops discov- available on 64-bit platforms to create large chunks of inac-
ering new paths, it picks an NCC, transforms it, and then cessible memory redzones in between allocated objects. These
restarts a fuzzing campaign on the modified PUT. Finally, redzones make it more likely that a corrupted pointer will
when a crash is found fuzzing a transformed program, T- point to invalid memory and cause a crash.
Fuzz tries to reconstruct it on the original program using SoftBound/CETS [159], [160] is another memory error
symbolic execution. detector that instruments programs during compilation.
Rather than tracking valid memory addresses like ASan,
6 I NPUT E VALUATION however, SoftBound/CETS associates bounds and temporal
information with each pointer, and can theoretically de-
After an input is generated, the fuzzer executes the PUT
tect all spatial and temporal memory errors. However, as
on the input, and decides what to do with the resulting
expected, this completeness comes with a higher average
execution. This process is called input evaluation. Although
overhead of 116% [160]. CaVer [133], TypeSan [96] and
the simplicity of executing a PUT is one of the reasons
HexType [113] instrument programs during compilation so
that fuzzing is attractive, there are many optimizations and
that they can detect bad-casting in C++ type casting. Bad
design decisions related to input evaluation that effect the
casting occurs when an object is cast to an incompatible type,
performance and effectiveness of a fuzzer, and we explore
such as when an object of a base class is cast to a derived
these considerations in this section.
type. CaVer has been shown to scale to web browsers, which
have historically contained this type of vulnerability, and
6.1 Bug Oracles imposes between 7.6 and 64.6% overhead.
The canonical security policy used with fuzz testing consid- Another class of memory safety protection is Control Flow
ers every program execution terminated by a fatal signal Integrity [10], [11] (CFI), which detects control flow transi-
(such as a segmentation fault) to be a violation. This pol- tions at runtime that are not possible in the original program.
icy detects many memory vulnerabilities, since a memory CFI can be used to detect test cases that have illegally
13

modified the control flow of a program. A recent project testing, which uses differential testing of multiple inputs on
focused on protecting against a subset of CFI violations has a single program to map mutations from the PUT’s input to
landed in the mainstream gcc and clang compilers [208]. its output. These mappings are used to identify information
leaks.
6.1.2 Undefined Behaviors
Languages such as C contain many behaviors that are left 6.2 Execution Optimizations
undefined by the language specification. The compiler is Our model considers individual fuzz iterations to be exe-
free to handle these constructs in a variety of ways. In cuted sequentially. While the straightforward implementa-
many cases, a programmer may (intentionally or other- tion of such an approach would simply load the PUT every
wise) write their code so that it is only correct for some time a new process is started at the beginning of a fuzz it-
compiler implementations. Although this may not seem eration, the repetitive loading processes can be significantly
overly dangerous, many factors can impact how a compiler reduced. To this end, modern fuzzers provide functionalities
implements undefined behaviors, including optimization that skip over these repetitive loading processes. For exam-
settings, architecture, compiler, and even compiler version. ple, AFL [231] provides a fork-server that allows each new
Vulnerabilities and bugs often arise when the compiler’s fuzz iteration to fork from an already initialized process.
implementation of an undefined behavior does not match Similarly, in-memory fuzzing is another way to optimize
the programmer’s expectation [220]. the execution speed as discussed in §3.1.2. Regardless of the
Memory Sanitizer (MSan) is a tool that instruments exact mechanism, the overhead of loading and initializing
programs during compilation to detect undefined behav- the PUT is amortized over many iterations. Xu et al. [228]
iors caused by uses of uninitialized memory in C and further lower the cost of an iteration by designing a new
C++ [199]. Similar to ASan, MSan uses a shadow memory system call that replaces fork().
that represents whether each addressable bit is initialized
or not. Memory Sanitizer has approximately 150% over- 6.3 Triage
head. Undefined Behavior Sanitizer (UBSan) [71] modifies
programs at compile-time to detect undefined behaviors. Triage is the process of analyzing and reporting test cases
Unlike other sanitizers which focus on one particular source that cause policy violations. Triage can be separated into
of undefined behavior, UBSan can detect a wide variety three steps: deduplication, prioritization, and test case mini-
of undefined behaviors, such as using misaligned pointers, mization.
division by zero, dereferencing null pointers, and integer
6.3.1 Deduplication
overflow. Thread Sanitizer (TSan) [193] is a compile-time
modification that detects data races with a trade-off between Deduplication is the process of pruning any test case from
precision and performance. A data race occurs when two the output set that triggers the same bug as another test
threads concurrently access a shared memory location and case. Ideally, deduplication would return a set of test cases
at least one of the accesses is a write. Such bugs can cause in which each triggers a unique bug.
data corruption and can be extremely difficult to reproduce Deduplication is an important component of most
due to non-determinism. fuzzers for several reasons. As a practical implementation
manner, it avoids wasting disk space and other resources by
6.1.3 Input Validation storing duplicate results on the hard drive. As a usability
consideration, deduplication makes it easy for users to
Testing for input validation vulnerabilities such as XSS understand roughly how many different bugs are present,
(cross site scripting) and SQL injection vulnerabilities is a and to be able to analyze an example of each bug. This is
challenging problem, as it requires understanding the behav- useful for a variety of fuzzer users; for example, attackers
ior of the very complicated parsers that power web browsers may want to look only for “home run” vulnerabilities that
and database engines. KameleonFuzz [74] detects successful are likely to lead to reliable exploitation.
XSS attacks by parsing test cases with a real web browser, There are currently three major deduplication implemen-
extracting the Document Object Model tree, and comparing tations used in practice: stack backtrace hashing, coverage-
it against manually specified patterns that indicate a success- based deduplication, and semantics-aware deduplication.
ful XSS attack. µ4SQLi [18] uses a similar trick to detect SQL 6.3.1.1 Stack Backtrace Hashing: Stack backtrace
injections. Because it is not possible to reliably detect SQL hashing [154] is one of the oldest and most widely used
injections from a web applications response, µ4SQLi uses a methods for deduplicating crashes, in which an automated
database proxy that intercepts communication between the tool records a stack backtrace at the time of the crash, and
target web application and the database to detect whether assigns a stack hash based on the contents of that back-
an input triggered harmful behavior. trace. For example, if the program crashed while executing
a line of code in function foo, and had the call stack
6.1.4 Semantic Difference main → d → c → b → a → foo (see Fig. 2), then a
Semantic bugs are often discovered using a technique called stack backtrace hashing implementation with n = 5 would
differential testing [148], which compares the behavior of sim- group that test case with other crashing executions whose
ilar (but not identical) programs. Several fuzzers [41], [171], backtrace ended with d → c → b → a → foo.
[59] have used differential testing to identify discrepancies Stack hashing implementations vary widely, starting
between similar programs, which are likely to indicate a with the number of stack frames that are included in the
bug. Jung et al. [118] introduced black-box differential fuzz hash. Some implementations use one [19], three [154], [225],
14

6.3.2 Prioritization and Exploitability


main
Prioritization, a.k.a. the fuzzer taming problem [58], is the
d process of ranking or grouping violating test cases according
to their severity and uniqueness. Fuzzing has traditionally
c been used to discover memory vulnerabilities, and in this
context prioritization is better known as determining the
n=5 b exploitability of a crash. Exploitability informally describes
the likelihood of an adversary being able to write a prac-
a tical exploit for the vulnerability exposed by the test case.
Both defenders and attackers are interested in exploitable
foo (crashed ) bugs. Defenders generally fix exploitable bugs before non-
exploitable ones, and attackers are interested in exploitable
bugs for obvious reasons.
Fig. 2: Stack backtrace hashing example.
One of the first exploitability ranking systems was
Microsoft’s !exploitable [150], which gets its name from
the !exploitable WinDbg command name that it
five [82], [49], or do not have any limit [123]. Implementa- provides. !exploitable employs several heuristics paired
tions also differ in the amount of information included from with a simplified taint analysis [164], [185]. It clas-
each stack frame. Some implementations will only hash the sifies each crash on the following severity scale:
function’s name or address, but other implementations will EXPLOITABLE > PROBABLY_EXPLOITABLE > UNKNOWN >
hash both the name and the offset or line. Neither option NOT_LIKELY_EXPLOITABLE, in which x > y means that x
works well all the time, so some implementations [150], [82] is more severe than y . Although these classifications are not
produce two hashes: a major and minor hash. The major formally defined, !exploitable is informally intended to be
hash is likely to group dissimilar crashes together as it only conservative and error on the side of reporting something
hashes the function name, whereas the minor hash is more as more exploitable than it is. For example, !exploitable con-
precise since it uses the function name and line number, and cludes that a crash is EXPLOITABLE if an illegal instruction
also includes an unlimited number of stack frames. is executed, based on the assumption that the attacker was
Although stack backtrace hashing is widely used, it is able to coerce control flow. On the other hand, a division by
not without its shortcomings. The underlying hypothesis of zero crash is considered NOT_LIKELY_EXPLOITABLE.
stack backtrace hashing is that similar crashes are caused Since !exploitable was introduced, other, similar rule-
by similar bugs, and vice versa, but, to the best of our based heuristics systems have been proposed, including the
knowledge, this hypothesis has never been directly tested. exploitable plugin for GDB [82] and Apple’s CrashWran-
There is some reason to doubt its veracity: some crashes do gler [19]. However, their correctness has not been system-
not occur near the code that caused the crash. For example, atically studied and evaluated yet.
a vulnerability that causes heap corruption might only crash
when an unrelated part of the code attempts to allocate 6.3.3 Test case minimization
memory, rather than when the heap overflow occurred. Another important part of triage is test case minimization.
6.3.1.2 Coverage-based Deduplication: AFL [231] is Test case minimization is the process of identifying the
a popular grey-box fuzzer that employs an efficient source- portion of a violating test case that is necessary to trigger the
code instrumentation to record the edge coverage of each violation, and optionally producing a test case that is smaller
execution of the PUT, and also measure coarse hit counts and simpler than the original, but still causes a violation.
for each edge. As a grey-box fuzzer, AFL primarily uses this Although test case minimization and seed trimming (see 3.3,
coverage information to select new seed files. However, it p. 8) are conceptually similar in that they aim at reducing the
also leads to a fairly unique deduplication scheme as well. size of an input, they are distinct because a minimizer can
As described by its documentation, AFL considers a crash to leverage a bug oracle.
be unique if either (i) the crash covered a previously unseen Some fuzzers use their own implementation and algo-
edge, or (ii) the crash did not cover an edge that was present rithms for this. BFF [49] includes a minimization algorithm
in all earlier crashes. tailored to fuzzing [106] that attempts to minimize the
6.3.1.3 Semantics-aware Deduplication: RETracer number of bits that are different from the original seed
[65] performs crash triage based on the semantics recovered file. AFL [231] also includes a test case minimizer, which
from a reverse data-flow analysis from each crash. Specif- attempts to simplify the test case by opportunistically set-
ically, RETracer checks which pointer caused the crash by ting bytes to zero and shortening the length of the test case.
analyzing a crash dump (core dump), and recursively iden- Lithium [179] is a general purpose test case minimization
tifies which instruction assigned the bad value to it. It even- tool that minimizes files by attempting to remove “chunks”
tually finds a function that has the maximum frame level, of adjacent lines or bytes in exponentially descending sizes.
and “blames” the function. The blamed function can be used Lithium was motivated by the complicated test cases pro-
to cluster crashes. The authors showed that their technique duced by JavaScript fuzzers such as jsfunfuzz [187].
successfully deduped millions of Internet Explorer bugs into There are also a variety of test case reducers that are
one. In contrast, stack hashing categorized the same crashes not specifically designed for fuzzing, but can nevertheless
into a large number of different groups. be used for test cases identified by fuzzing. These include
15

format agnostic techniques such as delta debugging [236], a branch has been taken. STADS [35] presents a statistical
and specialized techniques for specific formats such as C- framework inspired by ecology to estimate how many new
Reduce [178] for C/C++ files. Although specialized tech- configurations will be discovered if the fuzzing campaign
niques are obviously limited in the types of files they can re- continues. Another common strategy is to measure the
duce, they have the advantage that they can be significantly fraction of conditions that are met when complex branch
more efficient than generic techniques, since they have an conditions are evaluated. For example, LAF-INTEL [130]
understanding of the grammar they are trying to simplify. simply breaks multi-byte comparison into several branches,
which allows it to detect when a new seed passes an inter-
mediate byte comparison. LibFuzzer [7], Honggfuzz [204],
7 C ONFIGURATION U PDATING go-fuzz [215] and Steelix [138] instrument all comparisons,
The C O N F U P D A T E function plays a critical role in dis- and add any test case that makes progress on a comparison
tinguishing the behavior of black-box fuzzers from grey- to the seed pool. A similar idea was also released as a
and white-box fuzzers. As discussed in Algorithm 1, the stand-alone instrumentation module for clang [119]. Ad-
C O N F U P D A T E function can modify the set of configurations ditionally, Steelix [138] checks which input offsets influence
(C) based on the configuration and execution information comparison instructions. Angora [170] improves the fitness
collected during the current fuzzing run. In its simplest criteria of AFL by considering the calling context of each
form, C O N F U P D A T E returns the C parameter unmodified. branch taken.
Black-box fuzzers do not perform any program introspec- VUzzer [176] is an EA-based fuzzer whose fitness func-
tion beyond evaluating the bug oracle Obug , and so they tion relies on the results of a custom program analysis
typically leave C unmodified because they do not have any that determines weights for each basic block. Specifically,
information collected that would allow them to modify it1 . VUzzer first uses a built-in program analysis to classify basic
In contrast, grey- and white-box fuzzers are distin- blocks as either normal or error handling (EH). For a normal
guished by their more sophisticated implementations of the block, its weight is inversely proportional to the probability
C O N F U P D A T E function, which allows them to incorporate that a random walk on the CFG containing this block visits it
new fuzz configurations, or remove old ones that may according to transition probabilities defined by VUzzer. This
have been superseded. C O N F U P D A T E enables information encourages VUzzer to prefer configurations that exercise
collected during one fuzzing iteration to be used by all normal blocks deemed rare by the aforementioned random
future fuzzing iterations. For example, white-box fuzzers walk. The weight of EH blocks is negative, and its magnitude
typically create a new fuzz configuration for every new test is the ratio of the number of basic blocks compared to the
case produced, since they produce relatively few test cases number of EH blocks exercised by this configuration. These
compared to black- and grey-box fuzzers. negative weights are used to discourage the execution of
error handling (EH) blocks, based on the hypothesis that
7.1 Evolutionary Seed Pool Update traversing an EH block signals a lower chance of exercising
a vulnerability since bugs often coincide with unhandled
An Evolutionary Algorithm (EA) is a heuristic-based ap-
errors.
proach that involves biological evolution mechanisms such
as mutation, recombination, and selection. In the context of
fuzzing, an EA maintains a seed pool of promising individ- 7.2 Maintaining a Minset
uals (i.e., seeds) that evolves over the course of a fuzzing
With the ability to create new fuzzing configurations comes
campaign as new individuals are discovered. Although the
the risk of creating too many configurations. A common
concept of EAs is relatively simple, it forms the basis of strategy used to mitigate this risk is to maintain a minset,
many grey-box fuzzers [231], [7], [216]. The process of
or a minimal set of test cases that maximizes a coverage
choosing the seeds to be mutated and the mutation process
metric. Minsetting is also used during P R E P R O C E S S , and is
itself were detailed in §4.3 and §5 respectively.
described in more detail in §3.2.
Arguably, the most important step of an EA is to add
Some fuzzers use a variant of maintaining a minset that
a new configuration to the set of configurations C, which
is specialized for configuration updates. As one example,
occurs during the C O N F U P D A T E step of fuzzing. Most EA-
rather than completely removing configurations that are not
based fuzzers use node or branch coverage as a fitness
in the minset, which is what Cyberdyne [92] does, AFL [231]
function: if a new node or branch is discovered by a test
uses a culling procedure to mark minset configurations as
case, it is added to the seed pool. As the number of reachable
being favorable. Favorable fuzzing configurations are given
paths can be orders of magnitude larger than the number of
a significantly higher chance of being selected for fuzzing by
seeds, the seed pool is intended to be a diverse subselection
the S C H E D U L E function. The author of AFL notes that “this
of all reachable paths in order to represent the current
provides a reasonable balance between queue cycling speed
exploration of the PUT. Also note that seed pools of different
and test case diversity” [235].
size can have the same coverage (as mentioned in §3.2, p. 7).
A common strategy in EA fuzzers is to refine the fitness
function so that it can detect more subtle and granular indi- 8 R ELATED WORK
cators of improvements. For example, AFL [231] refines its
The literature on fuzzing had an early bloom in 2007–
fitness function definition by recording the number of times
2008, when three trade-books on the subject were published
1. Some fuzzers add violating test cases to the set of seeds. For within the two-year period [79], [203], [205]. These books
example, BFF [49] calls this feature crash recycling. took a more practical approach by presenting the different
16

tools and techniques available at the time and their usages [13] D. Aitel, “An introduction to SPIKE, the fuzzer creation kit,” in
on a variety of targets. We note that Takanen et al. [205] Proceedings of the Black Hat USA, 2001.
[14] ——, “Sharefuzz,” https://sourceforge.net/projects/sharefuzz/,
already distinguished among black-, white- and grey-box 2001.
fuzzers, although no formal definitions were given. Most [15] M. Aizatsky, K. Serebryany, O. Chang, A. Arya, and
recently, [205] had been revised after a decade. The second M. Whittaker, “Announcing OSS-Fuzz: Continuous fuzzing for
edition [206] contained many updates to include modern open source software,” Google Testing Blog, 2016.
[16] P. Amini, A. Portnoy, and R. Sears, “sulley,”
tools such as AFL [231] and ClusterFuzz [61]. https://github.com/OpenRCE/sulley.
We are aware of two fuzzing surveys that are concurrent [17] S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen,
to ours [137], [139]. However, the goals of both of these W. Grieskamp, M. Harman, M. J. Harrold, and P. Mcminn, “An
surveys are more focused than ours, which is to provide orchestrated survey of methodologies for automated software
test case generation,” Journal of Systems and Software, vol. 86,
a comprehensive study on recent developments covering no. 8, pp. 1978–2001, 2013.
the entire area. In particular, Li et al. [137] provided a thor- [18] D. Appelt, C. D. Nguyen, L. C. Briand, and N. Alshahwan,
ough review of many recent advances in fuzzing, though “Automated testing for sql injection vulnerabilities: An input
mutation approach,” in Proceedings of the International Symposium
the authors have also decided to focus on the detail of
on Software Testing and Analysis, 2014, pp. 259–269.
coverage-based fuzzing and not others. More similar to [19] Apple Inc., “Accessing crashwrangler to analyze crashes for
ours, Liang et al. [139] proposed an informal model to security implications,” Technical Note TN2334.
describe various fuzzing techniques. However, their model [20] A. Arcuri, M. Z. Iqbal, and L. Briand, “Random testing:
Theoretical results and practical implications,” IEEE Transactions
is not flexible enough to encompass some of the fuzzing on Software Engineering, vol. 38, no. 2, pp. 258–277, 2012.
approaches we cover in this paper, such as model inference [21] Ars Technica, “Pwn2own: The perfect antidote to fanboys who
(see §5.1.2) and hybrid fuzzing (see §5.3). say their platform is safe,”
Klees et al. [127] recently found that there has been no http://arstechnica.com/security/2014/03/pwn2own-the-perfect-antidote-to-f
2014.
coherent way of evaluating fuzzing techniques, which can [22] C. Aschermann, P. Jauernig, T. Frassetto, A.-R. Sadeghi, T. Holz,
hamper our ability to compare the effectiveness of fuzzing and D. Teuchert, “NAUTILUS: Fishing for deep bugs with
techniques. In addition, they provided several useful guide- grammars,” in Proceedings of the Network and Distributed System
lines for evaluating fuzzing algorithms. We consider their Security Symposium, 2019.
[23] C. Aschermann, S. Schumilo, T. Blazytko, R. Gawlik, and
work to be orthogonal to ours as the evaluation of fuzzing T. Holz, “REDQUEEN: Fuzzing with input-to-state
algorithms is beyond the scope of this paper. correspondence,” in Proceedings of the Network and Distributed
System Security Symposium, 2019.
[24] K. W. Y. Au, Y. F. Zhou, Z. Huang, and D. Lie, “PScout:
9 C ONCLUDING R EMARKS Analyzing the android permission specification,” in Proceedings
of the ACM Conference on Computer and Communications Security,
As we have set forth in §1, our goal for this paper is 2012, pp. 217–228.
to distill a comprehensive and coherent view of modern [25] T. Avgerinos, A. Rebert, S. K. Cha, and D. Brumley, “Enhancing
fuzzing literature. To this end, we first present a general- symbolic execution with Veritesting,” in Proceedings of the
purpose model fuzzer to facilitate our effort to explain the International Conference on Software Engineering, 2014, pp.
1083–1094.
many forms of fuzzing in current use. Then, we illustrate a [26] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic
rich taxonomy of fuzzers using Figure 1 (p. 5) and Table 1 concepts and taxonomy of dependable and secure computing,”
(p. 6). We have explored every stage of our model fuzzer by IEEE Transactions on Dependable and Secure Computing, vol. 1,
discussing the design decisions as well as showcasing the no. 1, pp. 11–33, 2004.
[27] D. Babic, L. Martignoni, S. McCamant, and D. Song,
many achievements by the community at large. “Statically-directed dynamic automated test generation,” in
Proceedings of the International Symposium on Software Testing and
Analysis, 2011, pp. 12–22.
R EFERENCES [28] G. Bai, J. Lei, G. Meng, S. S. Venkatraman, P. Saxena, J. Sun,
[1] “Binspector: Evolving a security tool,” Y. Liu, and J. S. Dong, “AUTHSCAN: Automatic extraction of
web authentication protocols from implementations.” in
https://blogs.adobe.com/security/2015/05/binspector-evolving-a-security-tool.html.
[2] “Cisco secure development lifecycle,” Proceedings of the Network and Distributed System Security
Symposium, 2013.
https://www.cisco.com/c/en/us/about/security-center/security-programs/secure-development-lifecycle/sdl-process/validate.html.
[3] “dharma,” https://github.com/MozillaSecurity/dharma. [29] G. Banks, M. Cova, V. Felmetsger, K. Almeroth, R. Kemmerer,
[4] “The fuzzing project,” and G. Vigna, “SNOOZE: Toward a stateful network protocol
https://fuzzing-project.org/software.html. fuzzer,” in Proceedings of the International Conference on
[5] “Google chromium security,” Information Security, 2006, pp. 343–358.
https://www.chromium.org/Home/chromium-security/bugs. [30] O. Bastani, R. Sharma, A. Aiken, and P. Liang, “Synthesizing
[6] “GPF,” http://www.vdalabs.com/tools/efs gpf.html. program input grammars,” in Proceedings of the ACM Conference
[7] “LibFuzzer,” http://llvm.org/docs/LibFuzzer.html. on Programming Language Design and Implementation, 2017, pp.
[8] “Microsoft Security Development Lifecycle, verification phase,” 95–110.
https://www.microsoft.com/en-us/sdl/process/verification.aspx. [31] I. Beer, “pwn4fun spring 2014–safari–part ii,”
[9] “Reddit: Iama mayhem, the hacking machine that won darpa’s http://googleprojectzero.blogspot.com/2014/11/pwn4fun-spring-2014-safari-
cyber grand challenge. ama!” 2014.
https://www.reddit.com/r/IAmA/comments/4x9yn3/iama mayhem [32] the Beizer, Black-box
B. hacking machineTesting: Techniques
that won for Functional Testing of
darpas/.
[10] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow Software and Systems. John Wiley & Sons, 1995.
integrity,” in Proceedings of the ACM Conference on Computer and [33] F. Bellard, “QEMU, a fast and portable dynamic translator,” in
Communications Security, 2005, pp. 340–353. Proceedings of the USENIX Annual Technical Conference, 2005, pp.
[11] ——, “Control-flow integrity principles, implementations, and 41–46.
applications,” ACM Transactions on Information and Systems [34] D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation
Security, vol. 13, no. 1, pp. 4:1–4:40, 2009. of Experiments. Springer Netherlands, 1985.
[12] H. J. Abdelnur, R. State, and O. Festor, “KiF: A stateful sip [35] M. Böhme, “STADS: Software testing as species discovery,”
fuzzer,” in Proceedings of the International Conference on Principles, ACM Transactions on Software Engineering and Methodology,
2007, pp. 47–56. vol. 27, no. 2, pp. 7:1–7:52, 2018.
17

[36] M. Böhme, V.-T. Pham, M.-D. Nguyen, and A. Roychoudhury, [58] Y. Chen, A. Groce, C. Zhang, W.-K. Wong, X. Fern, E. Eide, and
“Directed greybox fuzzing,” in Proceedings of the ACM Conference J. Regehr, “Taming compiler fuzzers,” in Proceedings of the ACM
on Computer and Communications Security, 2017, pp. 2329–2344. Conference on Programming Language Design and Implementation,
[37] M. Böhme, V.-T. Pham, and A. Roychoudhury, “Coverage-based 2013, pp. 197–208.
greybox fuzzing as markov chain,” in Proceedings of the ACM [59] Y. Chen, C. Su, C. Sun, S. Su, and J. Zhao, “Coverage-directed
Conference on Computer and Communications Security, 2016, pp. differential testing of jvm implementations,” in Proceedings of the
1032–1043. ACM Conference on Programming Language Design and
[38] E. Bounimova, P. Godefroid, and D. Molnar, “Billions and Implementation, 2016, pp. 85–99.
billions of constraints: Whitebox fuzz testing in production,” in [60] V. Chipounov, V. Kuznetsov, and G. Candea, “S2E: A platform
Proceedings of the International Conference on Software Engineering, for in-vivo multi-path analysis of software systems,” in
2013, pp. 122–131. Proceedings of the International Conference on Architectural Support
[39] R. S. Boyer, B. Elspas, and K. N. Levitt, “SELECT—a formal for Programming Languages and Operating Systems, 2011, pp.
system for testing and debugging programs by symbolic 265–278.
execution,” ACM SIGPLAN Notices, vol. 10, no. 6, pp. 234–245, [61] Chrome Security Team, “Clusterfuzz,”
1975. https://code.google.com/p/clusterfuzz/.
[40] S. Bratus, A. Hansen, and A. Shubina, “LZfuzz: a fast [62] CIFASIS, “Neural fuzzer,” http://neural-fuzzer.org.
compression-based fuzzer for poorly documented protocols,” [63] P. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda,
Dartmouth College, Tech. Rep. TR2008-634, 2008. “Prospex: Protocol specification extraction,” in Proceedings of the
[41] C. Brubaker, S. Janapa, B. Ray, S. Khurshid, and V. Shmatikov, IEEE Symposium on Security and Privacy, 2009, pp. 110–125.
“Using frankencerts for automated adversarial testing of [64] J. Corina, A. Machiry, C. Salls, Y. Shoshitaishvili, S. Hao,
certificate validation in SSL/TLS implementations,” in C. Kruegel, and G. Vigna, “DIFUZE: Interface aware fuzzing for
Proceedings of the IEEE Symposium on Security and Privacy, 2014, kernel drivers,” in Proceedings of the ACM Conference on Computer
pp. 114–129. and Communications Security, 2017, pp. 2123–2138.
[42] D. L. Bruening, “Efficient, transparent, and comprehensive [65] W. Cui, M. Peinado, S. K. Cha, Y. Fratantonio, and V. P. Kemerlis,
runtime code manipulation,” Ph.D. dissertation, Massachusetts “RETracer: Triaging crashes by reverse execution from partial
Institute of Technology, 2004. memory dumps,” in Proceedings of the International Conference on
[43] A. Budi, D. Lo, L. Jiang, and Lucia, “kb-Anonymity: A model for Software Engineering, 2016, pp. 820–831.
anonymized behavior-preserving test and debugging data,” in [66] W. Cui, M. Peinado, K. Chen, H. J. Wang, and L. Irun-Briz,
Proceedings of the ACM Conference on Programming Language “Tupni: Automatic reverse engineering of input formats,” in
Design and Implementation, 2011, pp. 447–457. Proceedings of the ACM Conference on Computer and
[44] J. Caballero, P. Poosankam, S. McCamant, D. Babić, and D. Song, Communications Security, 2008, pp. 391–402.
“Input generation via decomposition and re-stitching: Finding [67] L. Della Toffola, C. A. Staicu, and M. Pradel, “Saying ‘hi!’ is not
bugs in malware,” in Proceedings of the ACM Conference on enough: Mining inputs for effective test generation,” in
Computer and Communications Security, 2010, pp. 413–425. Proceedings of the International Conference on Automated Software
[45] J. Caballero, H. Yin, Z. Liang, and D. Song, “Polyglot: Automatic Engineering, 2017, pp. 44–49.
extraction of protocol message format using dynamic binary [68] J. D. DeMott, R. J. Enbody, and W. F. Punch, “Revolutionizing
analysis,” in Proceedings of the ACM Conference on Computer and the field of grey-box attack surface testing with evolutionary
Communications Security, 2007, pp. 317–329. fuzzing,” in Proceedings of the Black Hat USA, 2007.
[46] C. Cadar, D. Dunbar, and D. Engler, “KLEE: Unassisted and [69] K. Dewey, J. Roesch, and B. Hardekopf, “Language fuzzing
automatic generation of high-coverage tests for complex using constraint logic programming,” in Proceedings of the
systems programs,” in Proceedings of the USENIX Symposium on International Conference on Automated Software Engineering, 2014,
Operating System Design and Implementation, 2008, pp. 209–224. pp. 725–730.
[47] Y. Cai and W. Chan, “MagicFuzzer: Scalable deadlock detection [70] ——, “Fuzzing the rust typechecker using clp,” in Proceedings of
for large-scale applications,” in Proceedings of the International the International Conference on Automated Software Engineering,
Conference on Software Engineering, 2012, pp. 606–616. 2015, pp. 482–493.
[48] D. Caselden, A. Bazhanyuk, M. Payer, L. Szekeres, S. McCamant, [71] W. Dietz, P. Li, J. Regehr, and V. Adve, “Understanding integer
and D. Song, “Transformation-aware exploit generation using a overflow in C/C++,” in Proceedings of the International Conference
HI-CFG,” University of California, Tech. Rep. on Software Engineering, 2012, pp. 760–770.
UCB/EECS-2013-85, 2013. [72] B. Dolan-Gavitt, A. Srivastava, P. Traynor, and J. Giffin, “Robust
[49] CERT, “Basic Fuzzing Framework,” signatures for kernel data structures,” in Proceedings of the ACM
https://www.cert.org/vulnerability-analysis/tools/bff.cfm. Conference on Computer and Communications Security, 2009, pp.
[50] ——, “Failure Observation Engine,” 566–577.
https://www.cert.org/vulnerability-analysis/tools/foe.cfm. [73] A. Doupé, L. Cavedon, C. Kruegel, and G. Vigna, “Enemy of the
[51] S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley, “Unleashing State: A state-aware black-box web vulnerability scanner,” in
mayhem on binary code,” in Proceedings of the IEEE Symposium Proceedings of the USENIX Security Symposium, 2012, pp. 523–538.
on Security and Privacy, 2012, pp. 380–394. [74] F. Duchene, S. Rawat, J.-L. Richier, and R. Groz, “Kameleonfuzz:
[52] S. K. Cha, M. Woo, and D. Brumley, “Program-adaptive Evolutionary fuzzing for black-box XSS detection,” in
mutational fuzzing,” in Proceedings of the IEEE Symposium on Proceedings of the ACM Conference on Data and Application Security
Security and Privacy, 2015, pp. 725–741. and Privacy, 2014, pp. 37–48.
[53] H. Chen, Y. Xue, Y. Li, B. Chen, X. Xie, X. Wu, and Y. Liu, [75] D. Duran, D. Weston, and M. Miller, “Targeted taint driven
“Hawkeye: Towards a desired directed grey-box fuzzer,” in fuzzing using software metrics,” in Proceedings of the CanSecWest,
Proceedings of the ACM Conference on Computer and 2011.
Communications Security, 2018, pp. 2095–2108. [76] M. Eddington, “Peach fuzzing platform,”
[54] J. Chen, W. Diao, Q. Zhao, C. Zuo, Z. Lin, X. Wang, W. C. Lau, http://community.peachfuzzer.com/WhatIsPeach.html.
M. Sun, R. Yang, and K. Zhang, “IoTFuzzer: Discovering [77] A. Edwards, A. Srivastava, and H. Vo, “Vulcan: Binary
memory corruptions in IoT through app-based fuzzing,” in transformation in a distributed environment,” Microsoft
Proceedings of the Network and Distributed System Security Research, Tech. Rep. MSR-TR-2001-50, 2001.
Symposium, 2018. [78] S. Embleton, S. Sparks, and R. Cunningham, ““sidewinder”: An
[55] K. Chen and D. Wagner, “Large-scale analysis of format string evolutionary guidance system for malicious input crafting,” in
vulnerabilities in debian linux,” in Proceedings of the Workshop on Proceedings of the Black Hat USA, 2006.
Programming Languages and Analysis for Security, 2007, pp. 75–84. [79] G. Evron, N. Rathaus, R. Fly, A. Jenik, D. Maynor, C. Miller, and
[56] P. Chen and H. Chen, “Angora: Efficient fuzzing by principled Y. Naveh, Open Source Fuzzing Tools. Syngress, 2007.
search,” in Proceedings of the IEEE Symposium on Security and [80] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, “Android
Privacy, 2018, pp. 855–869. permissions demystified,” in Proceedings of the ACM Conference
[57] T. Y. Chen, F.-C. Kuo, R. G. Merkel, and T. H. Tse, “Adaptive on Computer and Communications Security, 2011, pp. 627–638.
random testing: The ART of test case diversity,” Journal of [81] S. Fewer, “A collection of burpsuite intruder payloads, fuzz lists
Systems and Software, vol. 83, no. 1, pp. 60–66, 2010. and file uploads,” https://github.com/1N3/IntruderPayloads.
18

[82] J. Foote, “Gdb exploitable,” [108] W. E. Howden, “Methodology for the generation of program test
https://github.com/jfoote/exploitable. data,” IEEE Transactions on Computers, vol. C, no. 5, pp. 554–560,
[83] S. Gan, C. Zhang, X. Qin, X. Tu, K. Li, Z. Pei, and Z. Chen, 1975.
“CollAFL: Path sensitive fuzzing,” in Proceedings of the IEEE [109] InfoSec Institute, “Charlie Miller reveals his process for security
Symposium on Security and Privacy, 2018, pp. 660–677. research,”
[84] V. Ganesh, T. Leek, and M. Rinard, “Taint-based directed http://resources.infosecinstitute.com/how-charlie-miller-does-research/,
whitebox fuzzing,” in Proceedings of the International Conference 2011.
on Software Engineering, 2009, pp. 474–484. [110] V. Iozzo, “0-knowledge fuzzing,” in Proceedings of the Black Hat
[85] H. Gascon, C. Wressnegger, F. Yamaguchi, D. Arp, and K. Rieck, USA, 2010.
“PULSAR: Stateful black-box fuzzing of proprietary network [111] S. Jana and V. Shmatikov, “Abusing file processing in malware
protocols,” in Proceedings of the International Conference on detectors for fun and profit,” in Proceedings of the IEEE
Security and Privacy in Communication Systems, 2015, pp. 330–347. Symposium on Security and Privacy, 2012, pp. 80–94.
[86] GitHub, “Public fuzzers,” [112] K. Jayaraman, D. Harvison, V. Ganesh, and A. Kiezun, “jFuzz: A
https://github.com/search?q=fuzzing&type=Repositories. concolic whitebox fuzzer for java,” in Proceedings of the First
[87] P. Godefroid, “Random testing for security: Blackbox vs. NASA Forma Methods Symposium, 2009, pp. 121–125.
whitebox fuzzing,” in Proceedings of the International Workshop on [113] Y. Jeon, P. Biswas, S. Carr, B. Lee, and M. Payer, “HexType:
Random Testing, 2007, pp. 1–1. Efficient detection of type confusion errors for c++,” in
Proceedings of the ACM Conference on Computer and
[88] P. Godefroid, A. Kiezun, and M. Y. Levin, “Grammar-based
Communications Security, 2017, pp. 2373–2387.
whitebox fuzzing,” in Proceedings of the ACM Conference on
[114] W. Johansson, M. Svensson, U. E. Larson, M. Almgren, and
Programming Language Design and Implementation, 2008, pp.
V. Gulisano, “T-Fuzz: Model-based fuzzing for robustness
206–215.
testing of telecommunication protocols,” in Proceedings of the
[89] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed IEEE International Conference on Software Testing, Verification and
automated random testing,” in Proceedings of the ACM Conference Validation, 2014, pp. 323–332.
on Programming Language Design and Implementation, 2005, pp.
[115] D. Jones, “Trinity,” https://github.com/kernelslacker/trinity.
213–223.
[116] P. Joshi, C.-S. Park, K. Sen, and M. Naik, “A randomized
[90] P. Godefroid, M. Y. Levin, and D. A. Molnar, “Automated dynamic program analysis technique for detecting real
whitebox fuzz testing,” in Proceedings of the Network and deadlocks,” in Proceedings of the ACM Conference on Programming
Distributed System Security Symposium, 2008, pp. 151–166. Language Design and Implementation, 2009, pp. 110–120.
[91] P. Godefroid, H. Peleg, and R. Singh, “Learn&Fuzz: Machine [117] R. L. S. Jr., “A framework for file format fuzzing with genetic
learning for input fuzzing,” in Proceedings of the International algorithms,” Ph.D. dissertation, University of Tennessee, 2012.
Conference on Automated Software Engineering, 2017, pp. 50–59. [118] J. Jung, A. Sheth, B. Greenstein, D. Wetherall, G. Maganis, and
[92] P. Goodman and A. Dinaburg, “The past, present, and future of T. Kohno, “Privacy oracle: A system for finding application
cyberdyne,” in Proceedings of the IEEE Symposium on Security and leaks with black box differential testing,” in Proceedings of the
Privacy, 2018, pp. 61–69. ACM Conference on Computer and Communications Security, 2008,
[93] GrammaTech, “Grammatech blogs: The cyber grand challenge,” pp. 279–288.
http://blogs.grammatech.com/the-cyber-grand-challenge. [119] M. Jurczyk, “CompareCoverage,”
[94] G. Grieco, M. Ceresa, and P. Buiras, “Quickfuzz: An automatic https://github.com/googleprojectzero/CompareCoverage.
random fuzzer for common file formats,” in Proceedings of the 9th [120] R. Kaksonen, M. Laakso, and A. Takanen, “Software security
International Symposium on Haskell, 2016, pp. 13–20. assessment through specification mutations and fault injection,”
[95] A. H. H, “Melkor elf fuzzer,” in Proceedings of the IFIP TC 6/TC 11 International Conference on
https://github.com/IOActive/Melkor ELF Fuzzer. Communications and Multimedia Security, 2001, pp. 173–183.
[96] I. Haller, Y. Jeon, H. Peng, M. Payer, C. Giuffrida, H. Bos, and [121] A. Kanade, R. Alur, S. Rajamani, and G. Ramanlingam,
E. Van Der Kouwe, “TypeSan: Practical type confusion “Representation dependence testing using program inversion,”
detection,” in Proceedings of the ACM Conference on Computer and in Proceedings of the International Symposium on Foundations of
Communications Security, 2016, pp. 517–528. Software Engineering, 2010, pp. 277–286.
[97] I. Haller, A. Slowinska, M. Neugschwandtner, and H. Bos, [122] A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and
“Dowsing for overflows: A guided fuzzer to find buffer V. Paxson, “Hulk: Eliciting malicious behavior in browser
boundary violations,” in Proceedings of the USENIX Security extensions,” in Proceedings of the USENIX Security Symposium,
Symposium, 2013, pp. 49–64. 2014, pp. 641–654.
[98] D. Hamlet, “When only random testing will do,” in Proceedings [123] U. Kargén and N. Shahmehri, “Turning programs against each
of the International Workshop on Random Testing, 2006, pp. 1–9. other: High coverage fuzz-testing using binary-code mutation
[99] H. Han and S. K. Cha, “IMF: Inferred model-based fuzzer,” in and dynamic slicing,” in Proceedings of the International
Proceedings of the ACM Conference on Computer and Symposium on Foundations of Software Engineering, 2015, pp.
Communications Security, 2017, pp. 2345–2358. 782–792.
[124] H. Kario, “tlsfuzzer,” https://github.com/tomato42/tlsfuzzer.
[100] H. Han, D. Oh, and S. K. Cha, “CodeAlchemist:
Semantics-aware code generation to find vulnerabilities in [125] S. Y. Kim, S. Lee, I. Yun, W. Xu, B. Lee, Y. Yun, and T. Kim,
javascript engines,” in Proceedings of the Network and Distributed “CAB-Fuzz: Practical concolic testing techniques for COTS
System Security Symposium, 2019. operating systems,” in Proceedings of the USENIX Annual
Technical Conference, 2017, pp. 689–701.
[101] W. Han, B. Joe, B. Lee, C. Song, and I. Shin, “Enhancing memory
[126] J. C. King, “Symbolic execution and program testing,”
error detection for large-scale applications and fuzz testing,” in
Communications of the ACM, vol. 19, no. 7, pp. 385–394, 1976.
Proceedings of the Network and Distributed System Security
[127] G. Klees, A. Ruef, B. Cooper, S. Wei, and M. Hicks, “Evaluating
Symposium, 2018.
fuzz testing,” in Proceedings of the ACM Conference on Computer
[102] A. Helin, “radamsa,” https://github.com/aoh/radamsa. and Communications Security, 2018, pp. 2123–2138.
[103] S. Hocevar, “zzuf,” https://github.com/samhocevar/zzuf. [128] P. Koopman, J. Sung, C. Dingman, D. Siewiorek, and T. Marz,
[104] G. Hoglund, “Runtime decompilation,” in Proceedings of the Black “Comparing operating systems using robustness benchmarks,”
Hat USA, 2003. in Proceedings of the Symposium on Reliable Distributed Systems,
[105] C. Holler, K. Herzig, and A. Zeller, “Fuzzing with code 1997, pp. 72–79.
fragments,” in Proceedings of the USENIX Security Symposium, [129] J. Koret, “Nightmare,”
2012, pp. 445–458. https://github.com/joxeankoret/nightmare.
[106] A. D. Householder, “Well there’s your problem: Isolating the [130] lafintel, “Circumventing fuzzing roadblocks with compiler
crash-inducing bits in a fuzzed file,” CERT, Tech. Rep. transformations,”
CMU/SEI-2012-TN-018, 2012. https://lafintel.wordpress.com/2016/08/15/circumventing-fuzzing-roadblock
[107] A. D. Householder and J. M. Foote, “Probability-based 2016.
parameter selection for black-box fuzz testing,” CERT, Tech. Rep. [131] Z. Lai, S. Cheung, and W. Chan, “Detecting atomic-set
CMU/SEI-2012-TN-019, 2012. serializability violations in multithreaded programs through
19

[131] Z. Lai, S. Cheung, and W. Chan, "Detecting atomic-set serializability violations in multithreaded programs through active randomized testing," in Proceedings of the International Conference on Software Engineering, 2010, pp. 235–244.
[132] M. Laurenzano, M. M. Tikir, L. Carrington, and A. Snavely, "PEBIL: Efficient static binary instrumentation for Linux," in Proceedings of the IEEE International Symposium on Performance Analysis of Systems & Software, 2010, pp. 175–183.
[133] B. Lee, C. Song, T. Kim, and W. Lee, "Type casting verification: Stopping an emerging attack vector," in Proceedings of the USENIX Security Symposium, 2015, pp. 81–96.
[134] S. Lee, C. Yoon, C. Lee, S. Shin, V. Yegneswaran, and P. Porras, "DELTA: A security assessment framework for software-defined networks," in Proceedings of the Network and Distributed System Security Symposium, 2017.
[135] C. Lemieux, R. Padhye, K. Sen, and D. Song, "PerfFuzz: Automatically generating pathological inputs," in Proceedings of the International Symposium on Software Testing and Analysis, 2018, pp. 254–265.
[136] C. Lemieux and K. Sen, "FairFuzz: A targeted mutation strategy for increasing greybox fuzz testing coverage," in Proceedings of the International Conference on Automated Software Engineering, 2018, pp. 475–485.
[137] J. Li, B. Zhao, and C. Zhang, "Fuzzing: A survey," Cybersecurity, vol. 1, no. 1, 2018.
[138] Y. Li, B. Chen, M. Chandramohan, S.-W. Lin, Y. Liu, and A. Tiu, "Steelix: Program-state based binary fuzzing," in Proceedings of the International Symposium on Foundations of Software Engineering, 2017, pp. 627–637.
[139] H. Liang, X. Pei, X. Jia, W. Shen, and J. Zhang, "Fuzzing: State of the art," IEEE Transactions on Reliability, vol. 67, no. 3, pp. 1199–1218, 2018.
[140] C. Lidbury, A. Lascu, N. Chong, and A. F. Donaldson, "Many-core compiler fuzzing," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2015, pp. 65–76.
[141] Z. Lin and X. Zhang, "Deriving input syntactic structure from execution," in Proceedings of the International Symposium on Foundations of Software Engineering, 2008, pp. 83–93.
[142] P. Liu, X. Zhang, M. Pistoia, Y. Zheng, M. Marques, and L. Zeng, "Automatic text input generation for mobile testing," in Proceedings of the International Conference on Software Engineering, 2017, pp. 643–653.
[143] LMH, S. Grubb, I. van Sprundel, E. Sandeen, and J. Wilson, "fsfuzzer," http://people.redhat.com/sgrubb/files/fsfuzzer-0.7.tar.gz.
[144] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, "Pin: Building customized program analysis tools with dynamic instrumentation," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2005, pp. 190–200.
[145] L. Ma, C. Artho, C. Zhang, H. Sato, J. Gmeiner, and R. Ramler, "GRT: Program-analysis-guided random testing," in Proceedings of the International Conference on Automated Software Engineering, 2015, pp. 212–223.
[146] R. Mahmood, N. Esfahani, T. Kacem, N. Mirzaei, S. Malek, and A. Stavrou, "A whitebox approach for automated security testing of Android applications on the cloud," in Proceedings of the International Workshop on Automation of Software Test, 2012, pp. 22–28.
[147] L. Martignoni, S. McCamant, P. Poosankam, D. Song, and P. Maniatis, "Path-exploration lifting: Hi-fi tests for lo-fi emulators," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2012, pp. 337–348.
[148] W. M. McKeeman, "Differential testing for software," Digital Technical Journal, vol. 10, no. 1, pp. 100–107, 1998.
[149] D. Mckinney, "antiparser," http://antiparser.sourceforge.net/.
[150] Microsoft Corporation, "!exploitable crash analyzer – MSEC debugger extensions," https://msecdbg.codeplex.com.
[151] ——, "MiniFuzz," https://msdn.microsoft.com/en-us/biztalk/gg675011.
[152] B. P. Miller, L. Fredriksen, and B. So, "An empirical study of the reliability of UNIX utilities," Communications of the ACM, vol. 33, no. 12, pp. 32–44, 1990.
[153] C. Miller, "Fuzz by number: More data about fuzzing than you ever wanted to know," in Proceedings of the CanSecWest, 2008.
[154] D. Molnar, X. C. Li, and D. A. Wagner, "Dynamic test generation to find integer bugs in x86 binary Linux programs," in Proceedings of the USENIX Security Symposium, 2009, pp. 67–82.
[155] L. D. Moura and N. Bjørner, "Satisfiability modulo theories: Introduction and applications," Communications of the ACM, vol. 54, no. 9, pp. 69–77, 2011.
[156] C. Mulliner, N. Golde, and J.-P. Seifert, "SMS of death: From analyzing to attacking mobile phones on a large scale," in Proceedings of the USENIX Security Symposium, 2011, pp. 24–24.
[157] MWR Labs, "KernelFuzzer," https://github.com/mwrlabs/KernelFuzzer.
[158] G. J. Myers, C. Sandler, and T. Badgett, The Art of Software Testing. Wiley, 2011.
[159] S. Nagarakatte, J. Zhao, M. M. K. Martin, and S. Zdancewic, "SoftBound: Highly compatible and complete spatial memory safety for C," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2009, pp. 245–258.
[160] ——, "CETS: Compiler enforced temporal safety for C," in Proceedings of the International Symposium on Memory Management, 2010, pp. 31–40.
[161] NCC Group, "Hodor fuzzer," https://github.com/nccgroup/hodor.
[162] ——, "Triforce Linux syscall fuzzer," https://github.com/nccgroup/TriforceLinuxSyscallFuzzer.
[163] N. Nethercote and J. Seward, "Valgrind: A framework for heavyweight dynamic binary instrumentation," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2007, pp. 89–100.
[164] J. Newsome and D. Song, "Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software," in Proceedings of the Network and Distributed System Security Symposium, 2005.
[165] D. Oleksiuk, "Ioctl fuzzer," https://github.com/Cr4sh/ioctlfuzzer, 2009.
[166] C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, "Feedback-directed random test generation," in Proceedings of the International Conference on Software Engineering, 2007, pp. 75–84.
[167] S. Pailoor, A. Aday, and S. Jana, "MoonShine: Optimizing OS fuzzer seed selection with trace distillation," in Proceedings of the USENIX Security Symposium, 2018, pp. 729–743.
[168] J. Pan, G. Yan, and X. Fan, "Digtool: A virtualization-based framework for detecting kernel vulnerabilities," in Proceedings of the USENIX Security Symposium, 2017, pp. 149–165.
[169] C.-S. Park and K. Sen, "Randomized active atomicity violation detection in concurrent programs," in Proceedings of the International Symposium on Foundations of Software Engineering, 2008, pp. 135–145.
[170] H. Peng, Y. Shoshitaishvili, and M. Payer, "T-Fuzz: Fuzzing by program transformation," in Proceedings of the IEEE Symposium on Security and Privacy, 2018, pp. 917–930.
[171] T. Petsios, A. Tang, S. J. Stolfo, A. D. Keromytis, and S. Jana, "NEZHA: Efficient domain-independent differential testing," in Proceedings of the IEEE Symposium on Security and Privacy, 2017, pp. 615–632.
[172] V.-T. Pham, M. Böhme, and A. Roychoudhury, "Model-based whitebox fuzzing for program binaries," in Proceedings of the International Conference on Automated Software Engineering, 2016, pp. 543–553.
[173] P. Project, "DynInst: Putting the performance in high performance computing," http://www.dyninst.org/.
[174] H. Raffelt, B. Steffen, and T. Berg, "LearnLib: A library for automata learning and experimentation," in Proceedings of the International Workshop on Formal Methods for Industrial Critical Systems, 2005, pp. 62–71.
[175] S. Rasthofer, S. Arzt, S. Triller, and M. Pradel, "Making malory behave maliciously: Targeted fuzzing of Android execution environments," in Proceedings of the International Conference on Software Engineering, 2017, pp. 300–311.
[176] S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos, "VUzzer: Application-aware evolutionary fuzzing," in Proceedings of the Network and Distributed System Security Symposium, 2017.
[177] A. Rebert, S. K. Cha, T. Avgerinos, J. Foote, D. Warren, G. Grieco, and D. Brumley, "Optimizing seed selection for fuzzing," in Proceedings of the USENIX Security Symposium, 2014, pp. 861–875.
[178] J. Regehr, Y. Chen, P. Cuoq, E. Eide, C. Ellison, and X. Yang, "Test-case reduction for C compiler bugs," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2012, pp. 335–346.
[179] J. Ruderman, "Lithium," https://github.com/MozillaSecurity/lithium/.
[180] J. D. Ruiter and E. Poll, "Protocol state fuzzing of TLS implementations," in Proceedings of the USENIX Security Symposium, 2015, pp. 193–206.
[181] M. Samak, M. K. Ramanathan, and S. Jagannathan, "Synthesizing racy tests," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2015, pp. 175–185.
[182] P. Saxena, S. Hanna, P. Poosankam, and D. Song, "FLAX: Systematic discovery of client-side validation vulnerabilities in rich web applications," in Proceedings of the Network and Distributed System Security Symposium, 2010.
[183] F. B. Schneider, "Enforceable security policies," ACM Transactions on Information and System Security, vol. 3, no. 1, pp. 30–50, 2000.
[184] S. Schumilo, C. Aschermann, R. Gawlik, S. Schinzel, and T. Holz, "kAFL: Hardware-assisted feedback fuzzing for OS kernels," in Proceedings of the USENIX Security Symposium, 2017, pp. 167–182.
[185] E. J. Schwartz, T. Avgerinos, and D. Brumley, "All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask)," in Proceedings of the IEEE Symposium on Security and Privacy, 2010, pp. 317–331.
[186] M. Schwarz, D. Gruss, M. Lipp, C. Maurice, T. Schuster, A. Fogh, and S. Mangard, "Automated detection, exploitation, and elimination of double-fetch bugs using modern CPU features," in Proceedings of the ACM Symposium on Information, Computer and Communications Security, 2018, pp. 587–600.
[187] Mozilla Security, "funfuzz," https://github.com/MozillaSecurity/funfuzz.
[188] ——, "orangfuzz," https://github.com/MozillaSecurity/orangfuzz.
[189] K. Sen, "Effective random testing of concurrent programs," in Proceedings of the International Conference on Automated Software Engineering, 2007, pp. 323–332.
[190] ——, "Race directed random testing of concurrent programs," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2008, pp. 11–21.
[191] K. Sen, D. Marinov, and G. Agha, "CUTE: A concolic unit testing engine for C," in Proceedings of the International Symposium on Foundations of Software Engineering, 2005, pp. 263–272.
[192] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, "AddressSanitizer: A fast address sanity checker," in Proceedings of the USENIX Annual Technical Conference, 2012, pp. 309–318.
[193] K. Serebryany and T. Iskhodzhanov, "ThreadSanitizer: Data race detection in practice," in Proceedings of the Workshop on Binary Instrumentation and Applications, 2009, pp. 62–71.
[194] Z. Sialveras and N. Naziridis, "Introducing Choronzon: An approach at knowledge-based evolutionary fuzzing," in Proceedings of the ZeroNights, 2015.
[195] J. Somorovsky, "Systematic fuzzing and testing of TLS libraries," in Proceedings of the ACM Conference on Computer and Communications Security, 2016, pp. 1492–1504.
[196] D. Song, F. Hetzelt, D. Das, C. Spensky, Y. Na, S. Volckaert, G. Vigna, C. Kruegel, J.-P. Seifert, and M. Franz, "PeriScope: An effective probing and fuzzing framework for the hardware-OS boundary," in Proceedings of the Network and Distributed System Security Symposium, 2019.
[197] W. Song, X. Qian, and J. Huang, "EHBDroid: Beyond GUI testing for Android applications," in Proceedings of the International Conference on Automated Software Engineering, 2017, pp. 27–37.
[198] C. Spensky and H. Hu, "LL-Fuzzer," https://github.com/mit-ll/LL-Fuzzer.
[199] E. Stepanov and K. Serebryany, "MemorySanitizer: Fast detector of uninitialized memory use in C++," in Proceedings of the International Symposium on Code Generation and Optimization, 2015, pp. 46–55.
[200] N. Stephens, J. Grosen, C. Salls, A. Dutcher, R. Wang, J. Corbetta, Y. Shoshitaishvili, C. Kruegel, and G. Vigna, "Driller: Augmenting fuzzing through selective symbolic execution," in Proceedings of the Network and Distributed System Security Symposium, 2016.
[201] M. Sutton, "FileFuzz," http://osdir.com/ml/security.securiteam/2005-09/msg00007.html.
[202] M. Sutton and A. Greene, "The art of file format fuzzing," in Proceedings of the Black Hat Asia, 2005.
[203] M. Sutton, A. Greene, and P. Amini, Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley Professional, 2007.
[204] R. Swiecki and F. Gröbert, "honggfuzz," https://github.com/google/honggfuzz.
[205] A. Takanen, J. D. DeMott, and C. Miller, Fuzzing for Software Security Testing and Quality Assurance. Artech House, 2008.
[206] A. Takanen, J. D. DeMott, C. Miller, and A. Kettunen, Fuzzing for Software Security Testing and Quality Assurance, 2nd ed. Artech House, 2018.
[207] D. Thiel, "Exposing vulnerabilities in media software," in Proceedings of the Black Hat EU, 2008.
[208] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, U. Erlingsson, L. Lozano, and G. Pike, "Enforcing forward-edge control-flow integrity in GCC & LLVM," in Proceedings of the USENIX Security Symposium, 2014, pp. 941–955.
[209] N. Tillmann and J. De Halleux, "Pex–white box test generation for .NET," in Proceedings of the International Conference on Tests and Proofs, 2008, pp. 134–153.
[210] D. Trabish, A. Mattavelli, N. Rinetzky, and C. Cadar, "Chopped symbolic execution," in Proceedings of the International Conference on Software Engineering, 2018, pp. 350–360.
[211] Trail of Bits, "GRR," https://github.com/trailofbits/grr.
[212] R. Valotta, "Taking browsers fuzzing to the next (DOM) level," in Proceedings of the DeepSec, 2012.
[213] S. Veggalam, S. Rawat, I. Haller, and H. Bos, "IFuzzer: An evolutionary interpreter fuzzer using genetic programming," in Proceedings of the European Symposium on Research in Computer Security, 2016, pp. 581–601.
[214] M. Vuagnoux, "Autodafé: An act of software torture," in Proceedings of the Chaos Communication Congress, 2005, pp. 47–58.
[215] D. Vyukov, "go-fuzz," https://github.com/dvyukov/go-fuzz.
[216] ——, "syzkaller," https://github.com/google/syzkaller.
[217] J. Wang, B. Chen, L. Wei, and Y. Liu, "Skyfire: Data-driven seed generation for fuzzing," in Proceedings of the IEEE Symposium on Security and Privacy, 2017, pp. 579–594.
[218] S. Wang, J. Nam, and L. Tan, "QTEP: Quality-aware test case prioritization," in Proceedings of the International Symposium on Foundations of Software Engineering, 2017, pp. 523–534.
[219] T. Wang, T. Wei, G. Gu, and W. Zou, "TaintScope: A checksum-aware directed fuzzing tool for automatic software vulnerability detection," in Proceedings of the IEEE Symposium on Security and Privacy, 2010, pp. 497–512.
[220] X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama, "Towards optimization-safe systems: Analyzing the impact of undefined behavior," in Proceedings of the ACM Symposium on Operating System Principles, 2013, pp. 260–275.
[221] V. M. Weaver and D. Jones, "perf fuzzer: Targeted fuzzing of the perf_event_open() system call," UMaine VMW Group, Tech. Rep., 2015.
[222] J. Wei, J. Chen, Y. Feng, K. Ferles, and I. Dillig, "Singularity: Pattern fuzzing for worst case complexity," in Proceedings of the International Symposium on Foundations of Software Engineering, 2018, pp. 213–223.
[223] S. Winter, C. Sârbu, N. Suri, and B. Murphy, "The impact of fault models on software robustness evaluations," in Proceedings of the International Conference on Software Engineering, 2011, pp. 51–60.
[224] M. Y. Wong and D. Lie, "IntelliDroid: A targeted input generator for the dynamic analysis of Android malware," in Proceedings of the Network and Distributed System Security Symposium, 2016.
[225] M. Woo, S. K. Cha, S. Gottlieb, and D. Brumley, "Scheduling black-box mutational fuzzing," in Proceedings of the ACM Conference on Computer and Communications Security, 2013, pp. 511–522.
[226] T. Xie, N. Tillmann, J. de Halleux, and W. Schulte, "Fitness-guided path exploration in dynamic symbolic execution," in Proceedings of the International Conference on Dependable Systems and Networks, 2009, pp. 359–368.
[227] D. Xu, J. Ming, and D. Wu, "Cryptographic function detection in obfuscated binaries via bit-precise symbolic loop mapping," in Proceedings of the IEEE Symposium on Security and Privacy, 2017, pp. 921–937.
[228] W. Xu, S. Kashyap, C. Min, and T. Kim, "Designing new operating primitives to improve fuzzing performance," in Proceedings of the ACM Conference on Computer and Communications Security, 2017, pp. 2313–2328.
[229] D. Yang, Y. Zhang, and Q. Liu, "Blendfuzz: A model-based framework for fuzz testing programs with grammatical inputs," in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2012, pp. 1070–1076.
[230] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim, "QSYM: A practical concolic execution engine tailored for hybrid fuzzing," in Proceedings of the USENIX Security Symposium, 2018, pp. 745–762.
[231] M. Zalewski, "American Fuzzy Lop," http://lcamtuf.coredump.cx/afl/.
[232] ——, "Crossfuzz," https://lcamtuf.blogspot.com/2011/01/announcing-crossfuzz-potential-0-day-in.html.
[233] ——, "New in AFL: Persistent mode," https://lcamtuf.blogspot.com/2015/06/new-in-afl-persistent-mode.html.
[234] ——, "ref_fuzz," http://lcamtuf.blogspot.com/2010/06/announcing-reffuzz-2yo-fuzzer.html.
[235] ——, "Technical "whitepaper" for afl-fuzz," http://lcamtuf.coredump.cx/afl/technical_details.txt.
[236] A. Zeller and R. Hildebrandt, "Simplifying and isolating failure-inducing input," IEEE Transactions on Software Engineering, vol. 28, no. 2, pp. 183–200, 2002.
[237] K. Zetter, "A famed hacker is grading thousands of programs—and may revolutionize software in the process," https://goo.gl/LRwaVl.
[238] M. Zhang, R. Qiao, N. Hasabnis, and R. Sekar, "A platform for secure static binary instrumentation," in Proceedings of the International Conference on Virtual Execution Environments, 2014, pp. 129–140.
[239] L. Zhao, Y. Duan, H. Yin, and J. Xuan, "Send hardest problems my way: Probabilistic path prioritization for hybrid fuzzing," in Proceedings of the Network and Distributed System Security Symposium, 2019.
[240] M. Zimmermann, "Tavor," https://github.com/zimmski/tavor.