
HyPFuzz: Formal-Assisted Processor Fuzzing

Chen Chen, Rahul Kande, Nathan Nguyen, Flemming Andersen, and Aakash Tyagi, Texas A&M University; Ahmad-Reza Sadeghi, Technische Universität Darmstadt; Jeyavijayan Rajendran, Texas A&M University
https://www.usenix.org/conference/usenixsecurity23/presentation/chen-chen

This paper is included in the Proceedings of the 32nd USENIX Security Symposium.
August 9–11, 2023 • Anaheim, CA, USA
ISBN 978-1-939133-37-3

Open access to the Proceedings of the 32nd USENIX Security Symposium is sponsored by USENIX.
HyPFuzz: Formal-Assisted Processor Fuzzing

Chen Chen†, Rahul Kande†, Nathan Nguyen†, Flemming Andersen†, Aakash Tyagi†, Ahmad-Reza Sadeghi∗, and Jeyavijayan Rajendran†
†Texas A&M University, USA; ∗Technische Universität Darmstadt, Germany
†{chenc, rahulkande, nathan.tm.nguyen, flandersen, tyagi, jv.rajendran}@tamu.edu, ∗{ahmad.sadeghi}@trust.tu-darmstadt.de

Abstract

Recent research has shown that hardware fuzzers can effectively detect security vulnerabilities in modern processors. However, existing hardware fuzzers do not fuzz well the hard-to-reach design spaces. Consequently, these fuzzers cannot effectively fuzz security-critical control- and data-flow logic in the processors, hence missing security vulnerabilities.

To tackle this challenge, we present HyPFuzz, a hybrid fuzzer that leverages formal verification tools to help fuzz the hard-to-reach parts of processors. To increase the effectiveness of HyPFuzz, we perform optimizations in time and space. First, we develop a scheduling strategy to prevent under- or over-utilization of the capabilities of formal tools and fuzzers. Second, we develop heuristic strategies to select points in the design space for the formal tool to target.

We evaluate HyPFuzz on five widely-used open-source processors. HyPFuzz detected all the vulnerabilities detected by the most recent processor fuzzer and found three new vulnerabilities that were missed by previous extensive fuzzing and formal verification. This led to two new common vulnerabilities and exposures (CVE) entries. HyPFuzz also achieves 11.68× faster coverage than the most recent processor fuzzer.

1 Introduction

Hardware designs are becoming increasingly complex to meet the rising need for custom hardware and increased performance. Around 67% of the application-specific integrated circuit (ASIC) designs developed in 2020 have over 1 million gates, and 45% of them embed two or more processors [2]. However, unlike software vulnerabilities that can be patched, most hardware vulnerabilities cannot be fixed post-fabrication, resulting in security vulnerabilities that put many critical systems at risk and tarnish the reputation of the companies involved. Hence, it is essential to detect vulnerabilities pre-fabrication. However, the emergence of hardware security vulnerabilities [19, 39, 46] shows that vulnerabilities are becoming more stealthy and harder to detect [19]. MITRE reports 111 hardware-related common weakness enumerations (CWEs) as of 2022 [49]. There exist a variety of traditional methodologies and tools for hardware security verification, each having its own advantages and shortcomings, as we explain below.

Hardware verification techniques. Academic and industry researchers have developed numerous hardware vulnerability detection techniques. These techniques can be classified as (i) formal: theorem proving [16], formal assertion proving [78], model checking [15], and information-flow tracking [31]; and (ii) simulation-based: random regression [52] and hardware fuzzing [32, 36, 43, 74].

Formal verification techniques prove whether a design-under-test (DUT) satisfies specified properties [42]. However, these techniques alone cannot verify the entire DUT because: (i) in most cases, writing properties requires manual effort and expert knowledge of the DUT, which is error-prone and time-consuming [19, 36], and (ii) large DUTs (such as processors) lead to state explosion, making it impractical to comprehensively verify a DUT for security vulnerabilities [14, 19].

Random regression can automatically generate test cases for verification. However, it does not scale well to large DUTs, including processors [32, 36, 43], especially the regions of the design that are hard to reach. For example, the probability that a random regression technique generates a test case triggering a zero flag (indicating that the output is “0”) in a 64-bit subtract module is 2^-64. Unfortunately, many hardware security-critical components are inherently hard to reach (e.g., access control and password checkers) [19].

Inspired by the success of software fuzzing methods, researchers have investigated hardware fuzzing to significantly increase the exploration of design spaces and accelerate the detection of security vulnerabilities [32, 36, 43, 74]. Unfortunately, software fuzzers cannot be directly applied to the software model of hardware due to their fundamental differences [36, 43, 69, 74]; for instance, hardware does not have an equivalent of a software crash, and software does not have floating wires. Hardware fuzzers outperform traditional hardware verification techniques, such as random regression and


formal verification techniques [32, 36, 43, 74], in terms of coverage, scalability, and efficiency in detecting vulnerabilities, and they can fuzz large designs such as processors [32, 36], including Rocket Core [11] and CVA6 [79] [32]. They have found vulnerabilities that lead to privilege escalation and arbitrary code execution attacks [36]. To improve efficiency, these fuzzers use coverage data that succinctly captures different hardware behaviors—finite-state machines (FSMs), branch conditions, statements, multiplexors, etc.—to generate and mutate new test cases.

However, while hardware fuzzing is very promising, it still does not even cover 70% of the hardware design in a practical amount of time. For example, a recent hardware fuzzer, TheHuzz, which has higher and faster coverage than random regression techniques and DIFUZZRTL [32], has covered only about 63% of the total coverage points in the processors, leaving one-third of the space unexplored for vulnerabilities [36]. The coverage of current hardware fuzzers falls well below industry standards. For instance, Google states that security-critical programs should achieve at least 90% coverage [34]. Achieving 90% coverage is also typical in hardware verification [77]. Faster coverage can promote the decision to tape out and expose unverified design spaces and vulnerabilities early [72]. Unfortunately, none of the existing hardware processor fuzzers meet these criteria.

Our goals and contributions. To alleviate the above limitation of hardware fuzzers, and inspired by hybrid software fuzzers [70], we aim at making the first step towards building a hybrid hardware fuzzer that combines the capabilities of formal verification techniques/tools and fuzzing tools. Figure 1 illustrates the intuition of a hybrid hardware fuzzer. The maze represents the entire design space of hardware, and the walls represent the conditions required to reach a design space. Formal tools lead fuzzers to the hard-to-reach design spaces so that fuzzers can quickly explore them and detect the vulnerabilities in those spaces. To this end, we developed a new hybrid hardware fuzzer, HyPFuzz, which is non-trivial due to the following reasons.

Figure 1: Formal tools and fuzzers catch vulnerabilities in the maze of designs.

The first challenge is to build a dynamic time schedule between the formal tool and the fuzzer, since static scheduling leads to under- or over-utilization of the capabilities of the formal tool and the fuzzer. The second challenge concerns the selection of coverage points in the DUT to be targeted by the formal tool and the fuzzer, as the former is better at reaching hard-to-reach design spaces while the latter is better at exploring design spaces. The third challenge concerns the incompatibility of formal tools and fuzzing tools: formal tools target assertions about properties of the hardware, whereas fuzzers target coverage points of the hardware. For a seamless integration of formal and fuzzing tools, one needs to convert these assertions into coverage points, and vice-versa.

To solve these challenges: (i) We create a scheduling strategy for the fuzzer and formal tool that increases the overall coverage rate of HyPFuzz (see Section 4.3). (ii) We propose multiple strategies to select the coverage points for the formal tool and empirically determine the best-performing strategy (see Section 4.4). (iii) We develop a custom property generator that converts coverage points to assertions taken by formal tools and a custom test case converter that converts Boolean assignments from formal tools into test cases taken by fuzzers, enabling seamless integration of fuzzing and formal techniques for hardware (see Section 4.2).

Consequently, HyPFuzz achieves 11.68× faster coverage than the most recently proposed processor fuzzer, TheHuzz [36]. It has detected three new vulnerabilities, apart from detecting all the vulnerabilities previously reported. It is also 3.06× faster than TheHuzz.

In summary, our main contributions are:

• We present a novel processor fuzzer, HyPFuzz, which combines fuzzing and formal verification techniques to verify large-scale processor designs and supports commonly-used hardware description languages (HDLs) like Verilog and SystemVerilog. We use scheduling and selection strategies that make HyPFuzz fast and efficient for design space exploration and vulnerability detection.

• We evaluate the effectiveness of HyPFuzz on five real-world open-source processors from the RISC-V instruction set architecture (ISA) [63]—Rocket Core [11], CVA6 [79], and BOOM [80]—and the OpenRISC ISA [54]—mor1kx [53] and OR1200 [55]—which are widely used as benchmarks in the hardware security community and include all the benchmarks used by DIFUZZRTL [32] and TheHuzz [36].

• HyPFuzz achieves 11.68× faster coverage than the most recent processor fuzzer and 239.93× faster coverage than random regression. It found three new vulnerabilities leading to two common vulnerabilities and exposures (CVE) entries, CVE-2022-33021 and CVE-2022-33023, apart from detecting all the vulnerabilities detected by TheHuzz. HyPFuzz is 3.06× faster regarding run-time and 3.05× faster regarding the number of instructions than TheHuzz.


2 Background

We now provide a succinct background on formal verification and hardware fuzzing, which form the basis of HyPFuzz.

2.1 Formal Verification

Formal verification techniques have been shown to be effective at finding subtle vulnerabilities [15], such as side-channel leakage [23, 48, 60, 73], information leakage [20, 22, 62], and concurrency errors [17, 27]. These techniques can find these vulnerabilities because they can exhaustively prove whether a design-under-test (DUT) satisfies specified properties.

Existing hybrid software fuzzers use symbolic execution to generate test cases and explore the hard-to-reach regions of a DUT [25, 26, 57, 58, 70, 81, 82]. This is done by identifying the execution paths a fuzzer cannot reach and generating test cases that force the target DUT to execute these paths [25]. Symbolic execution explores all possible execution paths of a DUT, but it requires a mapping between other coverage metrics and execution paths and is limited by the huge space of execution paths in large designs [82].

HyPFuzz uses another method: it uses commercial hardware formal tools, like Cadence JasperGold [4], to generate such test cases. JasperGold requires SystemVerilog Assertion (SVA) properties [28] known as cover properties as inputs to enable HyPFuzz to generate test cases. The properties proved by such tools mostly fall under two categories: (i) “assert/safety” properties, where one must verify all possible execution paths of the DUT to ensure that the property is not violated; and (ii) “cover/progress/trace-existing” properties, where one must verify that there exists a path from the initial state of the DUT to a state where the property is satisfied. The relationship between a cover property and an assert property can be expressed as cover(p) = assert(¬p), where p represents the expression of a property. Hence, compared to symbolic execution, formal tools such as JasperGold that support cover properties can (i) explore hard-to-reach regions based on various coverage metrics rather than execution paths and (ii) efficiently verify larger DUTs, since the tools only need to find the existence of one path that satisfies the property.

On proving a cover property, formal tools will return one of three results: (i) unreachable, (ii) reachable, or (iii) undetermined [61]. A property is unreachable when there is no execution path from an initial state of the DUT to a state satisfying the property. A reachable property will have at least one such path. Formal tools usually output such a path as Boolean assignments for the inputs of the DUT for each clock cycle needed to satisfy the property. HyPFuzz uses such assignments to generate the test cases as seeds for the fuzzer (see Section 4.2). A property is undetermined if the formal tool cannot find a path within a given time limit. We account for all these outcomes while building HyPFuzz.

Several commercial formal tools verify hardware DUTs, such as Siemens Questa [6], Synopsys VC Formal [8], and Cadence JasperGold. They operate on hardware designs represented in different hardware description languages (HDLs). For HyPFuzz, we currently use JasperGold, which is well known for its performance and supported features [61] and is also used for the verification of RISC-V processors [1].

Despite their performance, these tools cannot formally verify complete processor designs because the size and/or complexity of the designs are often too big. For instance, JasperGold took around eight days to verify 94.51% of the branch coverage points in the CVA6 processor [79] (see Section 3.3), motivating the need for techniques such as hardware fuzzing.

JasperGold includes SAT/BDD-based formal engines with variations of these algorithms to prove properties [3]. Therefore, HyPFuzz could use any other SAT/BDD-based proof engine that supports the generation of Boolean assignments for SVA properties to generate test cases.

2.2 Hardware Fuzzing

Most hardware fuzzers consist of a seed corpus, a mutator, and a vulnerability detector [32, 36, 74]. The seed corpus is an initial set of input test cases called seeds [32]. These test cases are the inputs required to simulate the DUT. The seed corpus is either manually crafted or generated randomly [51]. The fuzzer simulates the DUT with these test cases, collects coverage, and mutates all “interesting” test cases (i.e., test cases that achieve coverage) using its mutator to generate new test cases [43]. The vulnerability detector reports any vulnerabilities detected during the simulation. The fuzzer simulates these new test cases and repeats the cycle until it achieves the desired coverage. Next, we explain the various components and tasks performed by hardware fuzzers.

DUT is a hardware design written in HDLs like Verilog and SystemVerilog [36, 51, 74] or hardware construction languages (HCLs) like Chisel [13, 32, 43]. Hardware fuzzers use simulation tools like Verilator [69], Synopsys VCS [71], and Siemens Modelsim [5] to simulate these DUTs.

Test cases of generic hardware fuzzers include data for each input signal of the DUT for each clock cycle [43, 51, 74]. In contrast, fuzzers designed specifically to fuzz processors generate binary executable files as test cases [32, 36].

Coverage measures the number of various types of hardware behaviors, such as toggling the select signals of muxes (mux-toggle coverage [43]) and setting the registers that drive the select signals of muxes to different values (control-register coverage [32]) during the simulation. Coverage points are assigned to each of these behaviors. For example, branch coverage indicates whether the different paths of a branch statement are covered or not. Whenever a design enters one of the branch paths, its corresponding coverage point is considered covered; otherwise, it remains uncovered.

The DUT is instrumented to generate coverage during the simulations [43]. Thus, covering all the coverage points in



the DUT is essential to verify all the hardware behaviors. Hardware fuzzers use coverage as feedback to determine the interesting test cases [32, 36, 43, 74].

Mutations are data manipulation operations, such as bit-flip, byte-flip, clone, and swap, inspired by software fuzzers like the AFL fuzzer [44] [36, 43].

Vulnerability detection in hardware fuzzers involves either differential testing or assertion checking. In differential testing, the outputs of the DUT and a golden reference model (GRM), when tested with the same test case, are compared to detect vulnerabilities [32, 36]. In assertion checking, we insert the conditions to trigger the vulnerabilities or assertion properties into the DUT based on its specification and use the violations of these assertions during the simulation to detect vulnerabilities [51, 74]. Note that, unlike software, hardware does not have events like crashes, memory leaks, and buffer overflows to use for vulnerability detection [74].

3 Motivation

In this section, we highlight the limitations of existing formal and fuzzing techniques to motivate the need for hybrid hardware fuzzers. To this end, we use a popular, open-sourced RISC-V [63] based processor, CVA6 [79], as a case study. However, we perform extensive evaluation of HyPFuzz on all modules of five different processors (see Section 5).

3.1 Case Study on CVA6 Processor

Consider Listing 1, which shows the trigger conditions of three interrupts in the interrupt handler of CVA6. Verifying the correctness of the interrupt handler is critical, as a vulnerable interrupt handler can be exploited for information leakage [18]. To trigger each type of interrupt, the corresponding bits of two control and status registers (CSRs), mie and mip, need to be enabled (i.e., set to 1'b1). Thus, the test case should simultaneously consist of instructions that set the bits of both the mie and mip registers. This condition is covered by the branch coverage metric, which checks if both directions of the branch (in this case, the if statement) are taken [50].

Listing 1: Interrupt handler in the CVA6 processor [79].
  1  localparam SupervisorIrq = 1;
  2  localparam logic [63:0] S_EXT_INTERRUPT = 64'd9;
  3  ...
  4  // Interrupt handler
  5  if (mie[S_TIMER_INTERRUPT] && mip[S_TIMER_INTERRUPT])
  6    interrupt_cause = S_TIMER_INTERRUPT; // Supervisor Timer
  7  if (mie[S_SW_INTERRUPT] && mip[S_SW_INTERRUPT])
  8    interrupt_cause = S_SW_INTERRUPT; // Supervisor Software
  9  if (mie[S_EXT_INTERRUPT] && (mip[S_EXT_INTERRUPT] | irq[SupervisorIrq]))
 10    interrupt_cause = S_EXT_INTERRUPT; // Supervisor External

3.2 Limitations of Existing Hardware Fuzzers

Hardware fuzzers iteratively perform seed generation and mutation to improve coverage [13, 32, 36, 43, 74]. However, fuzzers still require an exponentially large amount of time to cover some coverage points—whose test cases are hard to generate due to the specific conditions required to trigger them—leaving multiple design spaces unexplored and vulnerabilities undetected.

Case Study on CVA6's interrupt controller. We fuzzed the CVA6 processor with the most recent processor fuzzer, TheHuzz, for 72 hours, generating more than 200K test cases*. Unfortunately, TheHuzz did not cover any of the branch coverage points of the three interrupts [36]. We performed further analysis to understand this limitation.

Consider the S_EXT interrupt in Line 9 of Listing 1. Triggering this interrupt requires the S_EXT_INTERRUPT bit of both the CSRs, mie and mip, to be enabled. According to the RISC-V instruction set architecture (ISA) emulator, Spike [64], only four instructions (CSRRW, CSRRWI, CSRRS, and CSRRSI) out of the total 1146 RISC-V instructions can modify the values of CSRs, and there are 229 CSRs in total. The probability of generating an instruction that sets the mie register is 4/(1146 × 229) = 1.524 × 10^-5. As we also need the bit of the mip CSR to be set, the combined probability of generating such a test case is only (1.524 × 10^-5)^2 = 2.323 × 10^-10.

Though this is the probability of randomly generating a test case, it sheds light on why existing hardware fuzzers [13, 32, 36]—which use coverage feedback and mutation techniques—could not cover this coverage point after fuzzing for 72 hours. This demonstrates that existing hardware fuzzers are insufficient to explore hard-to-reach design spaces, leaving vulnerabilities undetected.

3.3 Limitations of Formal Verification

Theoretically, formal tools can use cover properties to prove the reachability of all the coverage points in a DUT [28], achieving 100% of the reachable coverage. However, this requires writing and proving cover properties for all the coverage points, an error-prone and time-consuming task (as it requires design knowledge and manual effort), especially in large and complex hardware designs like processors with thousands of coverage points [19]. Moreover, since the reachability of a cover property only requires the existence of one path, formal tools like JasperGold do not guarantee to find all the paths with vulnerabilities.

Case Study on CVA6's interrupt controller. The CVA6 processor has 9.53 × 10^3 branch coverage points. First, there exists no tool that can convert these branch coverage points into cover properties that tools like JasperGold [4] can prove. Thus, one has to craft these properties manually. We developed and used a property generator that can automatically derive cover properties from branch coverage points.

* For this paper, a test case refers to a binary executable.
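At its core, deriving a cover property from a branch coverage point is a string-templating step: the branch condition extracted from the HDL source is wrapped into an SVA `cover property`. The sketch below is a minimal illustration of that idea, not the paper's implementation; the function name and the default clock signal `clk` are assumptions for the example, and real branch conditions would need hierarchical signal paths and reset handling:

```python
def branch_to_cover_property(label, condition, clock="clk"):
    # Wrap a branch condition (extracted from the HDL source) into an
    # SVA cover property that a formal tool can try to prove reachable.
    return f"{label}: cover property (@(posedge {clock}) ({condition}));"

# Condition of the S_EXT interrupt branch from Listing 1.
prop = branch_to_cover_property(
    "cp_s_ext",
    "mie[S_EXT_INTERRUPT] && (mip[S_EXT_INTERRUPT] | irq[SupervisorIrq])")
```

A reachable result for such a property yields the per-cycle Boolean input assignments that the test case converter later turns into an executable.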

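The probability estimate in Section 3.2 can be checked with a few lines of arithmetic (all values taken from the text above):

```python
# 4 CSR-writing instructions out of 1146 instructions x 229 CSRs.
p_mie = 4 / (1146 * 229)
# Both the mie and mip bits must be set by independently generated instructions.
p_both = p_mie ** 2
print(f"{p_mie:.3e}", f"{p_both:.3e}")  # prints 1.524e-05 2.323e-10
```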


Listing 2: The Verilog code of CSR reading in CVA6.
  1  if (csr_read) begin
  2    unique case (csr_address)
  3      // Counters and Timers
  4      ML1_ICACHE_MISS, ML1_DCACHE_MISS, MITLB_MISS,
  5      MDTLB_MISS, MLOAD, MSTORE, MEXCEPTION,
  6      MEXCEPTION_RET, MBRANCH_JUMP, MCALL, MRET,
  7      MMIS_PREDICT, MSB_FULL, MIF_EMPTY,
  8      MHPM_COUNTER_17,
  9      ...
 10      MHPM_COUNTER_31: csr_rdata = perf_data_i; => Point

Second, but more importantly, we evaluate the time taken by JasperGold to prove all these properties and the coverage it achieves. Since the corresponding Boolean assignments generated by JasperGold cannot be directly used as test cases for fuzzing a processor, we developed a test case converter that can automatically convert these Boolean assignments into the binary executable format. We then simulate these test cases and collect the branch coverage achieved. The time consumption of each point is the sum of the formal verification and simulation times. To prevent JasperGold from spending too much time on the property of one point, we limit the maximal time on each property (see Section 5.1). JasperGold took eight days to verify 94.51% of the branch points in the CVA6 processor, as shown in Figure 2. This shows that using formal tools requires extensive manual labor and tremendous runtime to verify all coverage points in the DUT.

Case Study on vulnerability detection. Theoretically, formal tools alone can achieve 100% coverage by proving the cover properties of all the coverage points. But there is still scope for vulnerabilities. For example, consider the vulnerability V3 found by HyPFuzz, where the CVA6 processor returns X-values when accessing unallocated CSRs. Listing 2 is a code segment from the CVA6 processor that accesses data from the hardware performance counters (HPCs) using the address of the CSRs (csr_address). The HPCs are used for anomaly and malicious behavior detection [41, 76], hence verifying their security is essential. Triggering the vulnerability V3 requires a test case to access the data of the counters among MHPM_COUNTER_17 to MHPM_COUNTER_31. However, JasperGold will not always find a path to access the target counters. This is because all counters share the same point to reduce the instrumentation overhead from the branch coverage metric. A cover property does not explore all paths under the point, which shows that using formal tools with standard cover properties is insufficient to detect all vulnerabilities.

3.4 Advantages of a Hybrid Fuzzer

By using both techniques in tandem, hybrid fuzzers overcome the limitations of using fuzzing and formal techniques alone. The fuzzer quickly explores the DUT through the mutation of effective test cases. The formal tool verifies points that the fuzzer is struggling to cover and provides the corresponding test cases as seeds to the fuzzer. The fuzzer then mutates these test cases to explore the target design further.

Case Study on CVA6's interrupt controller. Since the fuzzer can cover none of the three coverage points in the interrupt handler (see Section 3.2), the hybrid fuzzer uses a formal tool for assistance. Of these three branch coverage points, consider the point of S_EXT_INTERRUPT at line 9 in Listing 1 as an example. We first convert the conditions for covering this point (see Appendix B) into an SVA cover property: cover property (mie[S_EXT_INTERRUPT] && (mip[S_EXT_INTERRUPT] | irq[SupervisorIrq])). JasperGold then takes only 20 seconds to find a path to this property and dump Boolean assignments, as shown in Figure 9.

However, the fuzzer cannot directly use these Boolean assignments because the processors require binary executables as test cases. Hence, we identify instruction-related signals (e.g., the input instruction port of the decoder) from the Boolean assignments and parse their values beginning from the initial state to the state that satisfies the property. Listing 3 shows the extracted sequence of instructions.

Listing 3: Instructions covering the point of the S_EXT_INTERRUPT.
  1  ORI X6, X3, h'204;  // update reg X6 with s_ext value
  2  CSRRS X0, mie, X6;  // write the value of X6 to mie
  3  CSRRS X0, mip, X6;  // write the value of X6 to mip

Then, the sequence of instructions is converted into a valid executable file. Such files consist of three instruction sequences: INIT instructions that initialize the registers and memory of the processor, TEST instructions that contain the testing instruction sequence, and EXIT instructions that handle normal and abnormal (e.g., exception) termination of the simulation, as shown in Figure 10. To generate a valid executable file, we create an executable file template with NOP instructions as TEST instructions, compare and identify the initial memory address of the TEST instruction section from the disassembly file, and replace these NOP instructions with the instruction sequence extracted from the Boolean assignments.

The hybrid fuzzer can now use this test case as a seed, mutate it, and generate new test cases that cover the remaining two coverage points in the interrupt handler. For example, a commonly-used mutation technique in existing hardware fuzzers [13, 32, 36], random-8, overwrites a random byte of the instruction with a random value [36, 43]. This mutation can easily toggle the other bits in the mie and mip CSRs, covering the other coverage points of the interrupt handler. Therefore, a hybrid fuzzer can use test cases from formal tools to overcome the limitation of fuzzers, thereby increasing the coverage.

We built a property generator to automatically generate the cover properties of coverage points and a test case converter (see Section 4.5) to automatically convert the Boolean assignments of reachable properties into test cases.

Case Study on CVA6. Figure 2 shows the branch coverage achieved by the most recent processor fuzzer, TheHuzz; the formal tool, JasperGold; and our hybrid fuzzer, HyPFuzz. JasperGold continues to achieve coverage but is slow due to its high run-time to cover each coverage point; it reaches


Figure 2: Eight-day coverage results of TheHuzz [36], JasperGold [4], and HyPFuzz for the CVA6 processor [79].

Figure 3: Framework of HyPFuzz. The circle and cross represent covered and uncovered points, respectively.

a coverage of 94.51% after running for eight days. On the other hand, even though TheHuzz initially achieves faster coverage than JasperGold, it fails to push coverage beyond 88%, even after running for 72 hours, as all the remaining coverage points are hard to reach. In contrast, HyPFuzz achieves 94.78% coverage (6.1% more coverage compared to TheHuzz) in 72 hours, and it achieves the 94.51% coverage achieved by JasperGold in 50.71 hours (3.71× faster than JasperGold). Therefore, a hybrid fuzzer can use the fuzzer to explore the DUT and overcome the manual and run-time overhead limitations of formal tools, achieving coverage faster.

4 Hybrid Hardware Fuzzing

In this section, we first elaborate on the challenges of building a hybrid fuzzer and how we address them in HyPFuzz.

4.1 Challenges

C1. Scheduling: The speed of the fuzzer varies over time depending on the design-under-test (DUT) and the type of fuzzer used. Similarly, the speed of the formal tool varies from one coverage point to another based on the DUT, the type of formal tool, and the computational resources. Thus, challenge C1 is to build a dynamic scheduler between the formal tool and the fuzzer that minimizes the under- and over-utilization of the capabilities of the formal tool and the fuzzer.

C2. Selection of coverage points: Since the coverage point targeted by the formal tool determines the seed of the fuzzer, it also impacts the successive points covered by the fuzzer as it mutates this seed. Thus, the hybrid fuzzer should select the uncovered points that maximize the number of coverage points the fuzzer can uncover, thereby increasing the speed of the fuzzer. However, the current set of uncovered points depends on the DUT and also on which points have been covered by the fuzzer in the past. Thus, challenge C2 is to build a point selector that maximizes the rate of coverage despite the uneven distribution of uncovered points in the DUT, which also changes with time.

C3. Seamless integration: A hybrid fuzzer should seamlessly integrate the formal tool and the fuzzer to be faster and easier to use. However, the inputs and outputs of the fuzzer and formal tool are incompatible. The fuzzer uses test cases as input, while the formal tool generates Boolean assignments for the inputs of the DUT for each clock cycle as its output. Also, the formal tool needs cover properties as input, while the fuzzer outputs the coverage of each point in the DUT. Hence, it is not straightforward to combine a formal tool and a fuzzer. Thus, challenge C3 is to seamlessly integrate formal and fuzzing tools to build an automated flow for the hybrid fuzzer.

We address challenges C1 and C2 by building a dynamic scheduler and an uncovered point selector, respectively. We solve C3 by building a property generator and a test case converter, which facilitate seamless integration of the fuzzer with the formal tool.

4.2 Framework of HyPFuzz

HyPFuzz consists of the scheduler, point selector, property generator, and test case converter, apart from the fuzzer and the formal tool, as shown in Figure 3. HyPFuzz starts by invoking the point selector, which heuristically selects the uncovered point that the formal tool should verify (see Section 4.4). Then, the property generator generates the cover property for that point (see Section 4.5). The formal tool proves this property and generates the Boolean assignments for each input of the DUT for every clock cycle required to trigger that uncovered point. The test case converter converts the Boolean assignments of instruction signals into a test case, which the fuzzer uses as a seed (see Section 4.5). The fuzzer simulates the DUT with this seed to reach the coverage point. The fuzzer also mutates this seed to cover the neighborhood points. It runs until the scheduler stops it. Then, the point selector selects the next uncovered point. HyPFuzz repeats this process until it achieves the target coverage or hits a timeout.

4.3 Scheduling of Fuzzer and Formal Tool

Our scheduler uses a dynamic scheduling strategy that switches HyPFuzz from the fuzzer to the formal tool when the rate
1366 32nd USENIX Security Symposium USENIX Association


of coverage increment of the fuzzer (r_fuzz) is less than that of the formal tool (r_fml). The rates reflect their capability to explore the design spaces. On the other hand, the formal tool keeps running until it generates the Boolean assignments to reach the uncovered point. Next, we formulate the rates of coverage increment of the fuzzer and the formal tool, which help one to schedule them dynamically.

Formal tool's coverage increment rate (r_fml) is the ratio of the number of coverage points verified by the formal tool to the time spent proving them so far. Since a formal tool will process the points that are hard for the fuzzer to cover, we can calculate the optimal r_fml as n/t_p, where n represents the number of uncovered points selected when switching to the formal tool, and t_p represents the time spent by the formal tool on proving the corresponding properties. However, since we need to calculate r_fml before switching to the formal tool, HyPFuzz needs to predict the uncovered points selected and the time taken by the formal tool on the corresponding properties. Unfortunately, both predictions are difficult due to the randomness of fuzzing and the low prediction accuracy of the time cost of a formal tool (the accuracy of the state-of-the-art machine learning strategy is 68%, given a property [21]). Therefore, we first calculate the average time t_ave spent by a formal tool on properties. We then use t_ave to estimate r_fml as 1/t_ave to reflect how the formal tool will explore the hard-to-reach spaces of the fuzzer in a design.

However, it is infeasible to calculate t_ave by proving all uncovered points in the hard-to-reach region. Therefore, we estimate r_fml using the moving average, which is widely applied to estimate the underlying trend [33], as

    r_fml = |C| / Σ_{c∈C} t_fml(c),    (1)

where C is the set of coverage points verified by the formal tool, and t_fml(c) denotes the run-time of the formal tool to generate a test case for the coverage point c.

Fuzzer's coverage increment rate (r_fuzz) is the ratio of the number of new coverage points covered by the fuzzer in a rolling window over the run-time of the fuzzer. To prevent under- or over-utilization of the fuzzer, r_fuzz should reflect how well the fuzzer recently explored the design spaces. Fuzzers usually achieve faster coverage initially and then slow down due to the hard-to-reach spaces in the design [32, 36, 43]. Therefore, if we calculate r_fuzz including the coverage increment at the beginning, r_fuzz will become unnecessarily high and cause over-utilization of the fuzzer, delaying the switching process. However, if we calculate r_fuzz using only the coverage achieved by the most recent test case, that test case may not achieve new coverage, whereas the upcoming test cases can due to the randomness of the fuzzing process. This causes under-utilization of the fuzzer, which hence cannot fully fuzz around the seed from the formal tool. Therefore, a rolling window is used to compute the instantaneous rate of the fuzzer. Let K denote the set of all the test cases generated by the fuzzer, where k_i denotes the ith test case generated. Then, the set of test cases in the window w is K_w = {k_i ∈ K | |K| − w < i ≤ |K|}. Let n(k_i) and t_fuzz(k_i) denote the number of new coverage points covered and the run-time of the fuzzer for each test case, respectively. Thus,

    r_fuzz(w) = Σ_{k_i∈K_w} n(k_i) / Σ_{k_i∈K_w} t_fuzz(k_i)    (2)

The scheduler computes r_fuzz and r_fml in real-time (see Section 5.1), runs the fuzzer as long as it can cover more points than the formal tool, and then switches to the formal tool when r_fuzz < r_fml.

4.4 Selection of Uncovered Points

The hardware design and verification process is modular [12, 13]. Therefore, to be compatible with the existing hardware design verification flow, we develop strategies for the selection of uncovered points at the module level.

We run existing processor fuzzers, including DIFUZZRTL [32] and TheHuzz, on up to five different processors and analyze their effectiveness at the module level. Based on this study, we make the following observations:

Observation O1. The farther the module is from the inputs of the DUT, the harder it is for the fuzzer to trigger the module's components accurately. Thus, the coverage of the fuzzer is inversely proportional to the distance of the module from the input.

Observation O2. The more the number of uncovered points in the module, the more coverage points the fuzzer can cover if the fuzzer's seed activates the module.

Observation O3. The more DUT logic a module drives, the higher the probability of the fuzzer covering new points.

Based on these observations, we have developed three strategies: (i) BotTop, (ii) MaxUncovd, and (iii) ModDep. The strategies include deterministic and non-deterministic operations, which first select a module and then randomly select an uncovered point inside it. We also use RandSel, a naive selection strategy that randomly selects the uncovered points. We evaluate these strategies on a comprehensive set of five real-world, open-source processors covering one of the first (OpenRISC ISA [54]) and the most widely used (RISC-V ISA [63]) open instruction sets, respectively, and select the best strategy for our point selector based on empirical results (see Section 5.3). We now explain these three strategies using CVA6 as an example.

4.4.1 BotTop Strategy

To ensure the efficient usage of the formal tool and the fuzzer, the formal tool should target the hard-to-reach coverage points, and the fuzzer should target the remaining points. Based on Observation O1, BotTop selects modules deep in the DUT, i.e., by their distance from the DUT's input. We define this distance as

USENIX Association 32nd USENIX Security Symposium 1367


the number of modules between the input of the DUT and the target module. For example, consider the simplified pipeline of the CVA6 processor shown in Figure 4. In this pipeline, the distance of FETCH, DECODE, and EXECUTE is one, and both FPMul and FPDiv have a distance of four. Thus, the BotTop strategy assigns the highest priority to FPMul and FPDiv over other modules in the CVA6 processor.

Figure 4: Simplified pipeline of the CVA6 processor [79]. The connectivity in dotted lines is an example to show how a module drives various other modules in the design.

4.4.2 MaxUncovd Strategy

In a hybrid fuzzer, the seed generated using the formal tool will cover a coverage point in the hard-to-reach design space. On mutating the seed, the fuzzer will explore the design space in the "vicinity" of this covered point. Thus, based on Observation O2, having more uncovered points in the "vicinity" will increase the number of coverage points the fuzzer can cover, thereby accelerating HyPFuzz. Thus, our second strategy, MaxUncovd, prioritizes the module with the maximum number of uncovered points over the rest of the DUT's modules, irrespective of its distance from the inputs. Unlike the BotTop strategy, whose module distances are fixed for a given DUT, the number of uncovered points in a given module decreases as HyPFuzz explores more design space over time. Hence, MaxUncovd is a dynamic strategy that recomputes the priorities of each module every time the point selector is invoked.

For example, in the CVA6 processor, the Decoder and floating point unit (FPU) modules have 381 and 108 branch coverage points, respectively. In the beginning, all the coverage points in the DUT are uncovered. Hence, the Decoder will have more uncovered points; consequently, MaxUncovd prioritizes the Decoder over the FPU. However, over time, the number of uncovered points in the Decoder decreases as instructions of all types activate this module.

4.4.3 ModDep Strategy

Triggering a coverage point first requires activating the modules that drive the logic corresponding to this coverage point. Thus, based on Observation O3, targeting the coverage points from a module with many other modules in its fanout cone of influence (COI) will allow the fuzzer to uncover more points. Hence, our ModDep strategy prioritizes modules with higher COI over other modules, as shown in Algorithm 2. For example, ModDep assigns the highest priority to the CSR module in the CVA6 processor because it has the highest fanout due to driving multiple components: interrupt control logic (shown using red dotted lines in Figure 4), write logic in the Register file, and enable logic in the Data cache.

4.5 Integrating Fuzzer and Formal Tool

Formal tools target the DUT's properties, but fuzzers target the DUT's coverage points. Due to this incompatibility, we cannot directly send the target coverage point to the formal tool to verify or take the Boolean assignments of a proved property as an executable test case for the fuzzer, creating challenge C3. To seamlessly integrate the fuzzer and the formal tool, we develop a property generator and a test case converter.

Property generator generates the cover property for the selected uncovered point. It parses the DUT's logic and identifies the conditions for covering the point. Then, it converts the conditions into a cover property and loads it to the formal tool along with the DUT. For example, given an uncovered point of branch coverage, the property generator analyzes the dependencies of the branch statement. It then identifies the conditions to cover that point, and the logical conjunction of these conditions forms the expression of the corresponding cover property. The property generator is compatible with uncovered points of other coverage metrics, as shown in Appendix B.

Test case converter As mentioned earlier, the formal tool generates the Boolean assignments of the input signals for each clock cycle to cover a target coverage point. However, the fuzzer has a different input format. For example, processor fuzzers, such as TheHuzz and DIFUZZRTL [32], use sequences of instructions as test cases to fuzz [32, 36, 43].
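To make this conversion step concrete, the sketch below packs the per-cycle Boolean assignments of a DUT's 32-bit instruction bus into instruction words and wraps them with initialization and termination stubs. The trace layout and the stub encodings (a RISC-V nop, 0x00000013, and ebreak, 0x00100073) are illustrative assumptions, not HyPFuzz's actual implementation.

```python
# Hypothetical sketch of the test case converter: per-cycle Boolean
# assignments from a formal tool's cover trace are packed into 32-bit
# instruction words and wrapped with initialization/termination stubs.

def bits_to_word(bits):
    """Pack 32 Booleans (LSB first) into one instruction word."""
    word = 0
    for i, bit in enumerate(bits):
        word |= int(bit) << i
    return word

def convert_trace(trace, init=(0x00000013,), fini=(0x00100073,)):
    """trace: one list of 32 instruction-bus bits per clock cycle.
    Returns the fuzzer seed: init stub + decoded instructions + terminator."""
    body = [bits_to_word(cycle) for cycle in trace]
    return list(init) + body + list(fini)

# Example: a one-cycle trace whose bits encode addi a0, x0, 0 (0x00000513).
trace = [[(0x00000513 >> i) & 1 for i in range(32)]]
seed = convert_trace(trace)  # [0x00000013, 0x00000513, 0x00100073]
```

In a real flow, the per-cycle bit vectors would come from the formal tool's counterexample/cover trace, and the stubs would be full initialization and termination routines rather than single instructions.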



Algorithm 1: HyPFuzz

    Input: DUT: design-under-test;
           s: point selection strategy;
           t_limit: time limit of fuzzing;
           tc: target coverage
    Output: n: total coverage achieved
    1  t ← 0, r_fml ← 0
    2  Initialize registers and memory in DUT to zero
    3  while (n < tc) and (t < t_limit) do
    4      r_fml, seed, t ← SwitchToFormalTool(DUT, s, t, r_fml)
    5      n, t ← SwitchToFuzzer(DUT, t, r_fml, seed, t_limit)
    6  return n
    7  Function SwitchToFormalTool(DUT, s, t, r_fml):
           /* select an uncovered point p based on RandSel,
              MaxUncovd, BotTop, and ModDep strategies */
    8      p ← SelectionStrategy(DUT, s)
    9      c_prop ← PropertyGenerator(DUT, p)
    10     Boolean_assignment, r_fml ← FormalTool(DUT, c_prop)
    11     seed ← TestCaseConverter(DUT, Boolean_assignment)
    12     return r_fml, seed, t
    13 Function SwitchToFuzzer(DUT, t, r_fml, seed, t_limit):
    14     repeat
    15         n, t, r_fuzz ← Fuzzer(DUT, seed, t)
    16     until (r_fuzz < r_fml) or (t ≥ t_limit)
    17     return n, t

The test case converter will use the ISA of the DUT to map the Boolean assignments generated by the formal tool to a sequence of instructions. This mapping process uses the ISA of the target processor and is repeated for each clock cycle. The test case converter also wraps this test case with additional sequences of instructions that initialize and terminate the processor's simulation to form the seed.

4.6 Putting It All Together

As shown in Algorithm 1, HyPFuzz first marks all the coverage points as uncovered. Then, we run a test case where we initialize all the registers and memory values of the DUT to zero using nop instructions (Line 2). We then select an uncovered point (p) based on the selected strategy (s) (Line 8). We then convert this p into a corresponding cover property and invoke the formal tool to target this property (Line 9). The formal tool returns the Boolean assignment for each signal of the DUT for each clock cycle (Line 10), which is then converted to a sequence of instructions to be used as the seed of the fuzzer (Line 11). We then invoke the fuzzer on the DUT (Line 15). The fuzzer continues to execute until its coverage rate is less than that of the formal tool (Line 16); otherwise, we select a new uncovered point for the formal tool to target. This cycle continues until the target coverage is achieved or the time limit is reached.

5 Evaluation

We first evaluate how the rates of the formal tool (r_fml) and the fuzzer (r_fuzz) reflect the capability of the formal tool and the fuzzer to explore the design spaces. We then evaluate HyPFuzz on five open-source processors from the RISC-V [63] and OpenRISC [54] instruction set architectures (ISAs) to empirically select the best point selection strategy. We then compare HyPFuzz with the most recent hardware processor fuzzer [36] regarding the coverage achieved and the vulnerabilities detected. Finally, we investigate the capability of HyPFuzz to cover points of different coverage metrics. We ran our experiments on a 32-core, 2.6 GHz Intel Xeon processor with 512 GB of RAM running CentOS Linux release 7.9.2009.

5.1 Evaluation Setup

Benchmark selection. Most commercial processors are protected intellectual property (IP) and closed-source. Thus, we pick three large (in terms of the number of gates) and widely-used open-source processors, Rocket Core [11], BOOM [80], and CVA6 [79], from the RISC-V ISA and OR1200 [55] and mor1kx [53] from the OpenRISC ISA as a diverse set of benchmarks to evaluate HyPFuzz. Table 1 lists the details of these processors.

Table 1: Benchmarks used in our study.

    Processor        | # Latches | # Gates  | # Branch points | Time limit (sec)
    -----------------|-----------|----------|-----------------|-----------------
    OR1200 [55]      | 3.08×10^3 | 2.48×10^4 | 7.00×10^2      | 4.80×10^2
    mor1kx [53]      | 5.29×10^3 | 4.64×10^4 | 1.34×10^3      | 6.98×10^2
    CVA6 [79]        | 2.48×10^4 | 4.63×10^5 | 9.53×10^3      | 3.80×10^3
    Rocket Core [11] | 1.64×10^5 | 9.24×10^5 | 1.33×10^4      | 2.71×10^3
    BOOM [80]        | 1.99×10^5 | 1.26×10^6 | 2.42×10^4      | 3.83×10^3

The CVA6 and BOOM processors are more complex than the Rocket Core, mor1kx, and OR1200, with advanced micro-architectural features like out-of-order execution (OoO) and single instruction-multiple data (SIMD).

Evaluation environment. We use the popular and industry-standard Cadence JasperGold [4] and Synopsys VCS [9] tools as the formal tool and the simulation tool, respectively. We use the branch coverage generated by VCS to evaluate HyPFuzz because branch coverage is an important coverage metric in vulnerability detection [50]. Appendix B details how we can evaluate HyPFuzz with other coverage metrics. We use Chipyard [10] as a simulation environment for the RISC-V processors. For HyPFuzz's fuzzer, we use the most recent hardware fuzzer for processors, TheHuzz [36], as it achieves more coverage than prior hardware fuzzers, e.g., DIFUZZRTL [32], and



traditional random regression. To ensure a fair comparison, we constrain the environment of JasperGold to be the same as the hardware fuzzer's simulation environment [13, 32, 36, 43]. To prevent JasperGold from getting stuck while proving a property, we set a time limit on JasperGold to prove each property. We compute this time limit by using the time JasperGold spends on 30 random coverage points and applying survival analysis [38] to calculate a time limit large enough to prove over 99% of the points in the design. We set the rolling window size (w) as 100 to calculate r_fuzz using Equation 2. We ran each experiment for 72 hours and repeated it three times.

Figure 5: Speedup of HyPFuzz over TheHuzz [36].
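The scheduling policy evaluated next, Equations (1) and (2) together with the r_fuzz < r_fml switching rule from Section 4.3, can be sketched as follows. This is an illustrative reimplementation, not HyPFuzz's source, and the class and method names are hypothetical.

```python
# Illustrative sketch of the dynamic scheduler: r_fml is the moving average
# over points the formal tool has proved (Equation 1), r_fuzz(w) is computed
# over a rolling window of the last w test cases (Equation 2), and HyPFuzz
# keeps fuzzing while r_fuzz >= r_fml.
from collections import deque

class Scheduler:
    def __init__(self, w=100):
        self.window = deque(maxlen=w)  # (new_points, runtime) per test case
        self.fml_points = 0            # |C|: points proved by the formal tool
        self.fml_time = 0.0            # sum of t_fml(c) over c in C

    def record_fuzz(self, new_points, runtime):
        self.window.append((new_points, runtime))

    def record_formal(self, runtime, proved=1):
        self.fml_points += proved
        self.fml_time += runtime

    @property
    def r_fuzz(self):                  # Equation (2) over the rolling window
        total_time = sum(t for _, t in self.window)
        if total_time == 0:
            return float("inf")        # no data yet: let the fuzzer run
        return sum(n for n, _ in self.window) / total_time

    @property
    def r_fml(self):                   # Equation (1)
        return self.fml_points / self.fml_time if self.fml_time else 0.0

    def keep_fuzzing(self):
        return self.r_fuzz >= self.r_fml
```

For instance, after the formal tool spends 10 seconds proving one point (r_fml = 0.1) and the fuzzer covers 5 new points in 10 seconds (r_fuzz = 0.5), the scheduler keeps fuzzing; a long stretch of test cases with no new coverage then drives r_fuzz below r_fml and triggers a switch to the formal tool.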

5.2 Evaluating the Scheduling Strategy

We pick the coverage results of HyPFuzz on CVA6 to evaluate the coverage increment rates of the formal tool (r_fml) and the fuzzer (r_fuzz). The results are shown in Appendix E.

Evaluation on r_fml. We evaluate how quickly r_fml converges and reflects how fast the formal tool will explore the hard-to-reach spaces of the fuzzer in a design. The scheduler updates r_fml every time JasperGold proves the reachability of an uncovered point. As shown in Figure 11, HyPFuzz switched to using JasperGold more than 300 times. The accumulated r_fml only varies during the first 24 switches, after which the mean difference between the final r_fml and the accumulated r_fml is less than 5%. Hence, compared to the entire experiment, r_fml only requires several samples to converge and can reflect the capability of space exploration of JasperGold in a design. The time limit on JasperGold also helps accelerate the convergence of r_fml.

Evaluation on r_fuzz(w). We evaluate how the rolling window size (w) affects the accuracy of estimating how well the fuzzer in HyPFuzz recently explored the design space. The fuzzer in HyPFuzz executes 10 test cases at a time; hence, we analyze r_fuzz(w) for different values of w in increments of 10. For each value of w, we plot the number of times the scheduler will under-utilize the fuzzer and switch to the formal tool, as shown in Figure 12. We consider the fuzzer under-utilized if its coverage is temporarily stagnated but it would cover new points through mutation of test cases had the scheduler not switched to the formal tool. We can observe from Figure 12 that when the window is set to 100, r_fuzz(w) can capture almost all coverage increments from the following test cases. Hence, we set w to 100 in our experiments.

5.3 Evaluating Point Selection Strategies

To find the most efficient selection strategy for HyPFuzz, we now evaluate the three point selection strategies BotTop, MaxUncovd, and ModDep, along with RandSel, using two metrics: the amount of coverage achieved and the coverage speed.

Coverage achieved. Figure 8 shows the branch coverage achieved by the different selection strategies for all the processors. The difference between the coverage achieved by the four selection strategies is less than 0.5% on all the processors except for the CVA6 processor, which has a difference of 1.31%. The difference in the coverage achieved by all the strategies decreases over time, and the coverage achieved will eventually converge. However, they still achieve an average of 1.72% more coverage compared to TheHuzz and 4.73% more coverage compared to random regression, across the five processors (see Section 5.4).

Coverage speed of the four selection strategies of HyPFuzz on the five processors is shown in Figure 5. The MaxUncovd selection strategy is the fastest for all processors except for the OR1200, where the RandSel strategy performs the best. The reason the RandSel strategy is the fastest for the OR1200 processor is its smaller size (around 25K gates), and RandSel has a faster selection time compared to the other strategies. However, this is not the case for the other processors, especially the larger ones, such as CVA6 and BOOM, which have several hundred thousand gates and more microarchitectural features. For these designs, the order of point selection has a more dominant effect on coverage speed than the time taken to select the point. Based on these analyses, we select the MaxUncovd strategy as our point selection strategy for HyPFuzz. All further evaluations will use the MaxUncovd strategy unless stated otherwise.

5.4 Coverage Achieved

We now evaluate the capability of random regression, TheHuzz, and HyPFuzz in achieving coverage. Across the five processors, HyPFuzz achieves 4.73% more coverage than random regression and is 239.93× faster than random regression after fuzzing for 72 hours, as seen in Figure 8. Also, HyPFuzz achieves 1.72% more coverage than TheHuzz and is 11.68× faster than TheHuzz after running for the same 72 hours, as seen in Figure 8 and Figure 5, respectively.

The CVA6 and BOOM processors are complex with advanced OoO and SIMD features. Hence, HyPFuzz achieves substantially higher coverage than TheHuzz on them. Figure 6 shows that HyPFuzz achieves the highest branch coverage improvement of 6.84% on CVA6 compared to TheHuzz. Thus, the coverage speed of TheHuzz is less for these processors. However, HyPFuzz leverages the efficient point selector to generate



seeds and maximize the coverage achieved by its fuzzer. This results in a speedup of 41.24× on CVA6 and 11.42× on BOOM compared to TheHuzz. On the other hand, the Rocket Core, mor1kx, and OR1200 do not have advanced OoO and SIMD features. However, HyPFuzz is still at least 1.5× faster than TheHuzz. This is because the dynamic scheduler adapts to the complexity of the DUT using the values of r_fml and r_fuzz, scheduling the fuzzer and the formal tool efficiently.

In summary, HyPFuzz achieved more coverage and is faster than random regression and TheHuzz on all five processors. HyPFuzz is faster when fuzzing processors, including the ones with complex micro-architectural features. Moreover, as the complexity and size of the processor grow, the point selector and the scheduler become more effective, resulting in more speedup achieved by HyPFuzz.

Figure 6: Total branch points covered by random regression, TheHuzz [36], and HyPFuzz on CVA6 [79].

5.5 Vulnerabilities Detected

While the coverage achieved sheds light on the extent of the DUT verified, it is not a direct measure of a tool's ability to detect vulnerabilities. Hence, we evaluate HyPFuzz's ability to detect vulnerabilities in real-world processors.

Vulnerability detection strategy of HyPFuzz is the same as that of many existing fuzzers [13, 32, 36]. For the same test case, we output the architecture states of both the DUT and the golden reference model (GRM). The architecture state includes the values of general purpose registers (GPRs), control and status registers (CSRs), instructions committed, and exceptions triggered. Any mismatch between the architecture states indicates a potential vulnerability in the DUT or the GRM. In our experiment, we use the RISC-V ISA emulator, Spike [64], as the GRM for the Rocket Core, BOOM, and CVA6 processors and the OpenRISC ISA emulator, or1ksim [56], for the OR1200 and mor1kx processors.

Apart from detecting all the vulnerabilities detected by the most recent fuzzer [36], HyPFuzz detected three new vulnerabilities, as listed in Table 2.

Vulnerability V1 is in the memory control unit of the CVA6 processor and is similar to an out-of-bounds memory access vulnerability in software programs [66]. According to the RISC-V specification [63], a processor must raise an exception when operations try to access data at invalid memory addresses. However, CVA6 does not raise exceptions in this situation. We detected this vulnerability as a mismatch when Spike raised an exception for such operations and interrupted the program, whereas CVA6 continued to execute the program. Because operating systems usually use such exceptions to protect isolated executable memory space, missing them can allow an attacker to access data from all of memory (CWE-1252 [37]).

Vulnerability V2 is located in the decode stage of the CVA6 processor and is similar to an undefined behavior vulnerability in a software program [7]. According to the RISC-V specification [63], the decoder should throw an illegal instruction exception when the destination register (rd) of instruction MULH is the same as the first (rs1) or second (rs2) source register. This specification reduces the utilization of multiplier units and increases the performance of processors. The vulnerability is that the decoder in CVA6 allows the rd of MULH to share the same register as rs1 or rs2. This means the design of CVA6 violates the ISA specification. We detected this vulnerability when HyPFuzz generated a test case containing MULH with the same rd, rs1, and rs2. Spike threw an illegal instruction exception, whereas CVA6 executed the instruction. This vulnerability can result in a potential performance bottleneck when executing applications with heavy multiplier operations, such as machine learning (CWE-440 [59]).

Vulnerability V3 is a cross-modular vulnerability where the read logic in the CSR module enables access to undefined hardware performance counters (HPCs), resulting in unknown values when these HPCs are accessed. It is similar to an undefined behavior in a software program [7]. CVA6 has implemented 14 HPCs for recording various hardware behaviors, and its CSR module is responsible for reading the value of an HPC based on requests from the operating system. However, the read logic in the CSR module enables access to 32 HPCs, which causes X-propagation when reading nonexistent HPCs. We detected this vulnerability when Spike returned regular numbers while CVA6 returned X (unknown) values. This vulnerability can cause potential issues during synthesis and break the functionality of the HPCs (CWE-1281 [24]).

Vulnerabilities V2 and V3 resulted in two new common vulnerabilities and exposures (CVE) entries, CVE-2022-33021 and CVE-2022-33023.

Comparison with TheHuzz. In addition to the three new vulnerabilities, HyPFuzz detected all 11 vulnerabilities reported by TheHuzz, as listed in Table 2. The speedup column shows how fast HyPFuzz is compared to TheHuzz in terms of both run-time and the number of instructions. It can be seen that the speed of vulnerability detection follows the same trend as the coverage speeds. HyPFuzz detects vulnerabilities in CVA6 faster than the vulnerabilities in the Rocket Core, mor1kx, and OR1200 processors. However, TheHuzz detected some vulnerabilities faster than HyPFuzz. This is because the



Table 2: Vulnerabilities detected by HyPFuzz. N.A. denotes "Not Applicable." Instruction and time columns list TheHuzz / HyPFuzz / Speedup.

    Processor        | Vulnerability Description                                      | CWE      | New? | # Instructions                  | Time (sec)
    -----------------|----------------------------------------------------------------|----------|------|---------------------------------|---------------------------------
    CVA6 [79]        | V1: Missing exceptions when accessing invalid addresses.       | CWE-1252 | Yes  | N.A. / 2.67×10^1 / N.A.         | N.A. / 6.67×10^1 / N.A.
                     | V2: Incorrect decoding logic for multiplication instructions.  | CWE-440  | Yes  | N.A. / 9.80×10^4 / N.A.         | N.A. / 4.20×10^3 / N.A.
                     | V3: Returning X-value when accessing unallocated CSRs.         | CWE-1281 | Yes  | N.A. / 2.08×10^3 / N.A.         | N.A. / 1.70×10^2 / N.A.
                     | V4: Failure to detect cache coherency violation.               | CWE-1202 | No   | 1.72×10^5 / 2.15×10^5 / 0.80×   | 6.50×10^3 / 8.40×10^3 / 0.77×
                     | V5: Incorrect decoding logic for the FENCE.I instruction.      | CWE-440  | No   | 1.36×10^4 / 1.08×10^4 / 1.26×   | 1.68×10^3 / 3.08×10^2 / 5.45×
                     | V6: Incorrect exception type in instruction queue.             | CWE-1202 | No   | 4.02×10^4 / 7.40×10^3 / 5.43×   | 2.54×10^3 / 2.92×10^2 / 8.69×
    Rocket Core [11] | V7: Missing exceptions for some illegal instructions.          | CWE-1242 | No   | 1.81×10^6 / 1.43×10^5 / 12.66×  | 5.67×10^4 / 6.41×10^3 / 8.85×
                     | V8: Instruction commit count not increased when EBREAK.        | CWE-1201 | No   | 7.76×10^2 / 1.07×10^2 / 7.25×   | 1.19×10^2 / 6.67×10^1 / 1.78×
    mor1kx [53]      | V9: Incorrect implementation of the carry flag generation.     | CWE-1201 | No   | 20 / 20 / 1.00×                 | 10 / 10 / 1.00×
                     | V10: Missing access checking for privileged register.          | CWE-1262 | No   | 4.46×10^5 / 2.30×10^5 / 1.94×   | 1.97×10^4 / 8.20×10^3 / 2.40×
                     | V11: Incomplete implementation of the EEAR register write logic. | CWE-1199 | No | 1.12×10^5 / 2.13×10^5 / 0.53×   | 4.89×10^3 / 7.60×10^3 / 0.64×
    OR1200 [55]      | V12: Incorrect forwarding logic for the GPR0.                  | CWE-1281 | No   | 1.05×10^3 / 1.23×10^3 / 0.85×   | 1.12×10^2 / 60 / 1.87×
                     | V13: Incorrect overflow logic for MSB & MAC instructions.      | CWE-1201 | No   | 47 / 73 / 0.64×                 | 1.25×10^1 / 5.05×10^1 / 0.25×
                     | V14: Incorrect generation of overflow flag.                    | CWE-1201 | No   | 2.21×10^4 / 1.78×10^4 / 1.24×   | 1.34×10^3 / 6.72×10^2 / 1.99×

number of instructions required to trigger them is too low (< 1000 instructions) for the coverage feedback loop (i.e., measuring the coverage and mutating test cases) to pay off. For example, V9 and V13 require less than one hundred instructions to trigger. On average, HyPFuzz detects vulnerabilities 3.06× faster than TheHuzz in terms of run time and uses 3.05× fewer instructions to trigger them.

5.6 Evaluation on Compatibility of HyPFuzz

In this section, we demonstrate the compatibility of HyPFuzz in achieving coverage with different coverage metrics, such as condition and finite-state machine (FSM) coverage. HyPFuzz uses its property generator to generate the cover properties for the points of these metrics, as discussed in Appendix B. Figure 7 shows the coverage achieved by HyPFuzz and TheHuzz on the CVA6 processor when run for 72 hours and repeated three times. HyPFuzz achieves an average of 15.95% more condition coverage and 33.76% more FSM coverage compared to TheHuzz, hence demonstrating that HyPFuzz is capable of improving coverage for different types of coverage metrics.

Figure 7: Total condition points and FSM points covered by TheHuzz [36] and HyPFuzz.

6 Related Works

Hardware fuzzers. RFUZZ [43] uses mux-toggle coverage as feedback to fuzz hardware and was one of the first attempts towards creating hardware fuzzers. The mux-toggle coverage covers the activity in the select signals of muxes in the DUT. DirectFuzz [13] leverages the mux-toggle coverage and allocates more mutation energy to test cases that achieve coverage close to a manually selected module, hence achieving faster coverage. However, the mux-toggle metric does not scale well to large designs like BOOM and CVA6 due to instrumentation overhead. DIFUZZRTL [32] addresses the overhead limitation of RFUZZ by developing a new coverage metric, control-register coverage. It uses all possible combinations of the values of all the registers driving the selection logic of MUXes. However, this coverage metric does not assign coverage points to many registers as well as combinational logic, thereby missing many security-critical vulnerabilities [36]. Therefore, TheHuzz [36] uses code coverage metrics (branch,



condition coverage, etc.) as feedback information to guide the fuzzer. TheHuzz cannot verify the DUT efficiently as it is increasingly difficult to cover the coverage points in hard-to-reach regions of hardware (see Section 3.2). The coverage achieved on this processor is only 63%. Fuzzing hardware like software [74], unlike other hardware fuzzers, first translates a hardware design into a software model, then uses coverage-guided software fuzzers to fuzz the hardware. However, the software model does not support all the hardware constructs like latches and floating wires [36].

In summary, existing hardware fuzzers suffer from one or more of the following limitations: they (i) cannot scale to large designs, (ii) cannot capture all hardware behavior in RTL, thereby missing vulnerabilities, (iii) are not designed to fuzz the entire DUT, or (iv) do not support all the constructs like latches and floating wires.

Hybrid techniques. Existing software hybrid techniques combine fuzzers with symbolic execution [25, 26, 57, 58, 70, 81, 82]. Fuzzers leverage symbolic execution's constraint-solving capabilities to explore deep execution paths. Meanwhile, symbolic execution utilizes concrete test cases generated by fuzzers to mitigate scalability issues [35]. Therefore, most hybrid fuzzers run symbolic execution in parallel to keep calculating the paths covered by test cases and also use symbolic execution to generate new test cases when the original program terminates normally (halt) or abnormally (e.g., crash) [25, 26, 57, 58, 81]. Driller [70] invokes symbolic execution when the fuzzer cannot achieve coverage after a pre-defined time threshold. Moreover, Driller will select an unexplored path close to the paths covered by test cases. DART [25] applies a depth-first search along the path tree, and SAGE [26] selects paths under the paths explored by previous inputs with maximal coverage increment. Pak's strategy [57] profiles unexplored paths and assigns difficult ones a higher priority to be selected, and DigFuzz [81] applies Monte Carlo methods [65] to quantify the difficulty of unexplored paths with probability.

Compared to software hybrid fuzzers, HyPFuzz: (i) generates SVA properties and invokes formal tools, ensuring compatibility with various coverage metrics for different security and verification purposes rather than just path coverage, (ii) does not require formal tools to check the test cases generated by the fuzzers, utilizing formal tools only when necessary, and (iii) develops multiple uncovered-point selection strategies based on computer architecture and hardware design routines [29, 67].

Apart from these fuzzing techniques, researchers have combined various formal verification techniques with either random regression [30, 40] or fuzzing [45] in the context of test generation, targeting faults in hardware rather than security vulnerabilities, which is the focus of HyPFuzz. Also, none of these techniques scale to large, real-world designs.

In contrast, HyPFuzz: (i) is scalable to large real-world processors, (ii) captures all activity in the hardware details, such as FSMs and combinational logic, (iii) fuzzes the entire processor, and (iv) uses industry-standard hardware simulators that support all hardware constructs and have the ability to detect vulnerabilities.

7 Discussion

Utilization of point selection strategies. HyPFuzz specifically adopts the MaxUncovd strategy to select uncovered points since it shows the highest coverage speed compared to the other strategies. However, we can modify HyPFuzz to switch between them dynamically. We can assign each strategy a weight representing the probability of it being selected by HyPFuzz. We can then use evolutionary algorithms, such as the particle swarm optimization algorithm [47, 68], to dynamically update the weights for fast and high coverage.

FPGA emulation is 10× faster than hardware simulation [32, 43]. HyPFuzz uses hardware simulators to instrument and collect the coverage data. Thus, currently, it does not readily support FPGA emulation. One can provide FPGA emulation support for HyPFuzz by first instrumenting the DUT using existing tools such as Verific [75] or by modifying intermediate representation (IR) compilers [13, 32, 43]. The instrumented DUT will then be emulated on the FPGA while the formal tool and the rest of the components of HyPFuzz run on the host machine. Thus, one can use HyPFuzz to fuzz designs emulated on an FPGA.

8 Conclusion

Existing hardware fuzzers do not fuzz the hard-to-reach parts of the processor, thereby missing security vulnerabilities. HyPFuzz tackles this challenge by using a formal tool to help fuzz the hard-to-reach parts. We fuzzed five open-source processors, including the million-gate BOOM processor. HyPFuzz found three new memory- and undefined-behavior-related vulnerabilities and detected all the existing vulnerabilities 3.06× faster than the most recent processor fuzzer. It also achieved 239.93× faster coverage than random regression and 11.68× faster coverage than the most recent processor fuzzer.

Responsible disclosure. We responsibly disclosed the vulnerabilities to the designers.

9 Acknowledgement

Our research work was partially funded by the US Office of Naval Research (ONR Award #N00014-18-1-2058), by Intel's Scalable Assurance Program, and by the European Union (ERC, HYDRANOS, 101055025). We thank the anonymous reviewers and our shepherd for their comments. Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors and do not necessarily reflect those of the US Government, the European Union, or the European Research Council.



References

[1] Accelerating exhaustive and complete verification of RISC-V processors, 2021. https://riscv.org/news/2021/08/accelerating-exhaustive-and-complete-verification-of-risc-v-processors-ashish-darbari-axiomise/. Last accessed on 09/30/2022.

[2] Prologue: The 2020 Wilson Research Group Functional Verification Study. https://blogs.sw.siemens.com/verificationhorizons/2020/11/05/part-1-the-2020-wilson-research-group-functional-verification-study/, 2020. Last accessed on 02/13/2022.

[3] Jasper Engine Selection Guide. https://www.cadence.com/en_US/home/support.html, 2022. Last accessed on 01/11/2023.

[4] Jasper RTL Apps. https://www.cadence.com/en_US/home/tools/system-design-and-verification/formal-and-static-verification/jasper-gold-verification-platform.html, 2022. Last accessed on 02/11/2022.

[5] ModelSim. https://eda.sw.siemens.com/en-US/ic/modelsim/, 2022. Last accessed on 05/02/2022.

[6] Questa Formal Verification Apps. https://eda.sw.siemens.com/en-US/ic/verification-and-validation/formal-verification/, 2022. Last accessed on 09/16/2022.

[7] Undefined behavior. https://en.cppreference.com/w/cpp/language/ub, 2022. Last accessed on 09/30/2022.

[8] VC Formal. https://www.synopsys.com/verification/static-and-formal-verification/vc-formal.html, 2022. Last accessed on 02/11/2022.

[9] VCS. https://www.synopsys.com/verification/simulation/vcs.html, 2022. Last accessed on 05/02/2022.

[10] A. Amid, D. Biancolin, et al. Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs. IEEE Micro, 40(4):10–21, 2020.

[11] K. Asanović, R. Avizienis, et al. The Rocket Chip Generator. UCB/EECS-2016-17, Apr 2016.

[12] B. Bailey. Incremental Design Breakdown. https://github.com/openrisc/mor1kx, 2022. Last accessed on 04/08/2022.

[13] S. Canakci, L. Delshadtehrani, et al. DirectFuzz: Automated Test Generation for RTL Designs using Directed Graybox Fuzzing. Design Automation Conference, 2021.

[14] E. Clarke, O. Grumberg, et al. Progress on the State Explosion Problem in Model Checking. Informatics, pages 176–194, 2001.

[15] E. M. Clarke, T. A. Henzinger, et al. Handbook of Model Checking. 2018.

[16] D. Cyrluk, S. Rajan, et al. Effective Theorem Proving for Hardware Verification. International Conference on Theorem Provers in Circuit Design, 1994.

[17] S. Das, C. Karfa, et al. Formal Modeling of Network-on-Chip Using CFSM and Its Application in Detecting Deadlock. IEEE Transactions on Very Large Scale Integration Systems, 28(4):1016–1029, 2020.

[18] R. De Clercq, F. Piessens, et al. Secure Interrupts on Low-End Microcontrollers. IEEE International Conference on Application-Specific Systems, Architectures and Processors, pages 147–152, 2014.

[19] G. Dessouky, D. Gens, et al. HardFails: Insights into Software-Exploitable Hardware Bugs. USENIX Security Symposium, pages 213–230, 2019.

[20] C. Deutschbein, A. Meza, et al. Toward Hardware Security Property Generation at Scale. IEEE Security and Privacy, (01):2–10, 2022.

[21] E. El Mandouh and A. G. Wassal. Estimation of Formal Verification Cost Using Regression Machine Learning. IEEE International High Level Design Validation and Test Workshop, pages 121–127, 2016.

[22] H. Eldib, C. Wang, et al. Formal Verification of Software Countermeasures against Side-Channel Attacks. ACM Transactions on Software Engineering and Methodology, 24(2):1–24, 2014.

[23] M. R. Fadiheh, J. Müller, et al. A Formal Approach for Detecting Vulnerabilities to Transient Execution Attacks in Out-of-Order Processors. ACM/IEEE Design Automation Conference, pages 1–6, 2020.

[24] N. Fern. CWE-1281. https://cwe.mitre.org/data/definitions/1281.html, 2020. Last accessed on 09/12/2022.

[25] P. Godefroid, N. Klarlund, et al. DART: Directed Automated Random Testing. ACM Conference on Programming Language Design and Implementation, 2005.

[26] P. Godefroid, M. Y. Levin, et al. Automated Whitebox Fuzz Testing. NDSS, 8:151–166, 2008.



[27] I. Graja, S. Kallel, et al. A Comprehensive Survey on Modeling of Cyber-Physical Systems. Concurrency and Computation: Practice and Experience, 32(15), 2020.

[28] S. L. W. Group. IEEE Standard for SystemVerilog: Unified Hardware Design, Specification, and Verification Language. IEEE Std 1800-2017, 2018.

[29] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. 2011.

[30] P.-H. Ho, T. Shiple, et al. Smart Simulation Using Collaborative Formal and Simulation Engines. International Conference on Computer-Aided Design, 2000.

[31] W. Hu, A. Ardeshiricham, et al. Hardware Information Flow Tracking. ACM Computing Surveys, 2021.

[32] J. Hur, S. Song, et al. DIFUZZRTL: Differential Fuzz Testing to Find CPU Bugs. IEEE Symposium on Security and Privacy, pages 1286–1303, 2021.

[33] R. J. Hyndman. Moving Averages. 2011.

[34] M. Ivanković, G. Petrović, et al. Code Coverage at Google. ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 955–963, 2019.

[35] I. B. Kadron, Y. Noller, et al. Fuzzing, Symbolic Execution, and Expert Guidance for Better Testing. IEEE Software, 2023.

[36] R. Kande, A. Crump, et al. TheHuzz: Instruction Fuzzing of Processors Using Golden-Reference Models for Finding Software-Exploitable Vulnerabilities. USENIX Security Symposium, pages 3219–3236, 2022.

[37] A. Kanuparthi, H. Khattri, et al. CWE-1252. https://cwe.mitre.org/data/definitions/1252.html, 2022. Last accessed on 09/12/2022.

[38] D. G. Kleinbaum and M. Klein. Survival Analysis: A Self-Learning Text. 1996.

[39] P. Kocher, J. Horn, et al. Spectre Attacks: Exploiting Speculative Execution. IEEE Symposium on Security and Privacy, 2019.

[40] A. Kolbi, J. Kukula, et al. Symbolic RTL Simulation. IEEE/ACM Design Automation Conference, 2001.

[41] P. Krishnamurthy, R. Karri, et al. Anomaly Detection in Real-Time Multi-Threaded Processes Using Hardware Performance Counters. IEEE Transactions on Information Forensics and Security, 15:666–680, 2019.

[42] Y. Kukimoto. Introduction to Formal Verification. https://ptolemy.berkeley.edu/projects/embedded/research/vis/doc/VisUser/vis_user/node4.html, 2011. Last accessed on 09/12/2022.

[43] K. Laeufer, J. Koenig, et al. RFUZZ: Coverage-Directed Fuzz Testing of RTL on FPGAs. IEEE International Conference on Computer-Aided Design, 2018.

[44] lcamtuf. American Fuzzy Lop (AFL) Fuzzer. http://lcamtuf.coredump.cx/afl/technical_details.txt, 2014. Last accessed on 02/07/2022.

[45] T. Li, H. Zou, et al. Symbolic Simulation Enhanced Coverage-Directed Fuzz Testing of RTL Design. IEEE International Symposium on Circuits and Systems, 2021.

[46] M. Lipp, M. Schwarz, et al. Meltdown: Reading Kernel Memory from User Space. USENIX Security, 2018.

[47] C. Lyu, S. Ji, et al. MOPT: Optimized Mutation Scheduling for Fuzzers. USENIX Security Symposium, 2019.

[48] Y. A. Manerkar, D. Lustig, et al. RTLCheck: Verifying the Memory Consistency of RTL Designs. IEEE/ACM International Symposium on Microarchitecture, 2017.

[49] MITRE. CWE VIEW: Hardware Design. https://cwe.mitre.org/data/definitions/1194.html, 2019. Last accessed on 05/02/2022.

[50] A. Mockus, N. Nagappan, et al. Test Coverage and Post-Verification Defects: A Multiple Case Study. IEEE/ACM International Symposium on Empirical Software Engineering and Measurement, 2009.

[51] S. K. Muduli, G. Takhar, et al. HyperFuzzing for SoC Security Validation. ACM/IEEE International Conference on Computer-Aided Design, pages 1–9, 2020.

[52] Y. Naveh, M. Rimon, et al. Constraint-Based Random Stimuli Generation for Hardware Verification. AI Magazine, 28(3):13–13, 2007.

[53] OpenRISC. mor1kx source code. https://github.com/openrisc/mor1kx, 2020. Last accessed on 04/08/2021.

[54] OpenRISC. OpenRISC Homepage. https://openrisc.io/, 2020. Last accessed on 04/08/2021.

[55] OpenRISC. or1200 source code. https://github.com/openrisc/or1200, 2020. Last accessed on 04/08/2021.

[56] OpenRISC. Or1ksim source code. https://github.com/openrisc/or1ksim, 2020. Last accessed on 04/08/2021.

[57] B. S. Pak. Hybrid Fuzz Testing: Discovering Software Bugs via Fuzzing and Symbolic Execution. School of Computer Science, Carnegie Mellon University, 2012.



[58] V.-T. Pham, M. Böhme, et al. Model-Based Whitebox Fuzzing for Program Binaries. International Conference on Automated Software Engineering, 2016.

[59] PLOVER. CWE-440. https://cwe.mitre.org/data/definitions/440.html, 2022. Last accessed on 09/12/2022.

[60] H. Ponce-de León and J. Kinder. Cats vs. Spectre: An Axiomatic Approach to Modeling Speculative Execution Attacks. IEEE Symposium on Security and Privacy, 2022.

[61] R. J. Punnoose, R. C. Armstrong, et al. Survey of Existing Tools for Formal Verification. 2014.

[62] J. Rajendran, A. M. Dhandayuthapany, et al. Formal Security Verification of Third Party Intellectual Property Cores For Information Leakage. IEEE/ACM International Conference on VLSI Design and International Conference on Embedded Systems, 2016.

[63] RISC-V. RISC-V Webpage. https://riscv.org/, 2021. Last accessed on 04/08/2021.

[64] RISC-V. SPIKE source code. https://github.com/riscv/riscv-isa-sim, 2021. Last accessed on 05/12/2022.

[65] C. P. Robert. Monte Carlo Methods in Statistics. arXiv:0909.0389, 2009.

[66] K. Serebryany, D. Bruening, et al. AddressSanitizer: A Fast Address Sanity Checker. USENIX Annual Technical Conference, pages 309–318, 2012.

[67] J. P. Shen and M. H. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. 2013.

[68] Y. Shi and R. Eberhart. A Modified Particle Swarm Optimizer. IEEE International Conference on Evolutionary Computation Proceedings, 1998.

[69] W. Snyder. Verilator. https://www.veripool.org/wiki/verilator, 2021. Last accessed on 04/08/2021.

[70] N. Stephens, J. Grosen, et al. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. NDSS, 2016.

[71] Synopsys. Synopsys Webpage. https://www.synopsys.com/, 2021. Last accessed on 04/08/2021.

[72] Synopsys. Accelerating Verification Shift Left with Intelligent Coverage Optimization. https://www.synopsys.com/cgi-bin/verification/dsdla/pdfr1.cgi?file=ico-wp.pdf, 2022. Last accessed on 02/18/2023.

[73] C. Trippel, D. Lustig, et al. CheckMate: Automated Synthesis of Hardware Exploits and Security Litmus Tests. IEEE International Symposium on Microarchitecture, 2018.

[74] T. Trippel, K. G. Shin, et al. Fuzzing Hardware Like Software. USENIX Security Symposium, 2022.

[75] Verific. Verific Design Automation. https://www.verific.com/, 2022. Last accessed on 04/08/2021.

[76] X. Wang and R. Karri. NumChecker: Detecting Kernel Control-flow Modifying Rootkits by Using Hardware Performance Counters. Design Automation Conference, 2013.

[77] B. Wile, J. Goss, et al. Comprehensive Functional Verification: The Complete Industry Cycle. 2005.

[78] H. Witharana, Y. Lyu, et al. A Survey on Assertion-based Hardware Verification. ACM Computing Surveys, 2022.

[79] F. Zaruba and L. Benini. The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology. IEEE Transactions on Very Large Scale Integration Systems, 2019.

[80] J. Zhao, B. Korpan, et al. SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine. 4th Workshop on Computer Architecture Research with RISC-V, 2020.

[81] L. Zhao, Y. Duan, et al. Send Hardest Problems My Way: Probabilistic Path Prioritization for Hybrid Fuzzing. NDSS, 2019.

[82] X. Zhu, S. Wen, et al. Fuzzing: A Survey for Roadmap. ACM Computing Surveys (CSUR), 54(11s):1–36, 2022.

Appendix

A Total branch coverage achieved by random regression, TheHuzz [36], and HyPFuzz

Figure 8 shows the branch coverage achieved by random regression, TheHuzz [36], and different point selection strategies of HyPFuzz. Across the five processors, HyPFuzz achieves 4.73% more coverage than random regression after fuzzing for 72 hours, as seen in Figure 8. Also, HyPFuzz achieves 1.72% more coverage than TheHuzz after running for the same 72 hours. More details are discussed in Section 5.4.



Figure 8: Total branch points covered by random regression, TheHuzz [36], and HyPFuzz. [Five panels, (a) CVA6 [79], (b) BOOM [80], (c) Rocket Core [11], (d) mor1kx [53], and (e) OR1200 [55], each plotting the percentage of branch points covered (80 to 100%) against time (0 to 70 hrs) for random regression, TheHuzz, HyPFuzz-RandSel, HyPFuzz-MaxUncovd, HyPFuzz-ModDep, and HyPFuzz-BotTop.]
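Appendix B below walks through translating coverage points into SVA cover properties by hand. As a rough illustration of how such a property generator can be automated, the sketch below emits branch- and condition-coverage cover properties from textual point descriptions. The input format (guard and path-condition strings, lists of 1-bit signal names) is a simplified assumption for illustration, not HyPFuzz's actual interface:

```python
# Sketch of automated SVA cover-property generation (simplified
# assumption: each coverage point is reported as expression strings).
def branch_cover_properties(guard: str, path_condition: str) -> list[str]:
    """Emit SVA cover properties for the taken and not-taken outcomes
    of a branch with the given guard, reached under path_condition
    (the conjunction of the enclosing branch guards)."""
    taken = f"cover property (({path_condition}) && ({guard}));"
    not_taken = f"cover property (({path_condition}) && !({guard}));"
    return [taken, not_taken]

def condition_cover_properties(signals: list[str]) -> list[str]:
    """Emit one cover property per combination of 1-bit signal values,
    mirroring condition coverage on the signals of a branch guard."""
    props = []
    for mask in range(2 ** len(signals)):
        terms = [f"{s} == 1'b{(mask >> i) & 1}" for i, s in enumerate(signals)]
        props.append(f"cover property ({' && '.join(terms)});")
    return props
```

For the example in Listing 4, branch_cover_properties("a || b && c", "start_q == IDLE") yields the two branch properties shown in Appendix B, and condition_cover_properties(["a", "b", "c"]) enumerates all eight condition points.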

Listing 4: Code snippet showing different coverage metrics.
1 module code_cov_example (input a,b,c,output d);
2 reg [1:0] state_d,state_q;
3 always@(*)begin
4 case (start_q) begin
5 IDLE: begin // FSM
6 if (a || b && c) begin // branch, condition
7 state_d = FINISH;
8 end else begin
9 state_d = a & c; // expression
10 ...
11 FINISH: begin // FSM
12 d = 1'b1; // toggle

B The Property Generator for Different Coverage Metrics

HyPFuzz currently uses the branch, condition, and finite-state-machine (FSM) coverage metrics for evaluation to demonstrate the compatibility of HyPFuzz with different coverage metrics. Our property generator is similarly compatible with other coverage metrics, as we can translate the coverage points into the cover properties needed by the formal tools, such as JasperGold [4]. Usually, the coverage metrics are reported as either a signal name or a Boolean expression. Hence, it is always possible to translate these metrics into SystemVerilog Assertions (SVA) cover properties because the cover property is any legal temporal logic (TL) expression [28] with the form: cover property <TL-expression>, where <TL-expression> can be temporal in general. We use Listing 4 as an example to show how to generate SVA cover properties for different code coverage metrics.

Branch coverage allocates coverage points following the branch statements (i.e., two coverage points, one for the if branch statement in Line 6 taken and another one for not taken). Following the branch statement tree, the condition covering the point for taking the if branch statement will be (start_q == IDLE) and (a||b&&c). And, the condition to check the branch not taken is (start_q == IDLE) and not(a||b&&c). The cover properties for the two points can be specified in the SVA format as:

cover property ((start_q == IDLE) && (a||b&&c))
cover property ((start_q == IDLE) && !(a||b&&c))

Condition coverage allocates coverage points for all possible combinations of values for the signals in a branch statement (i.e., the three 1-bit signals in the if branch statement in Line 6 lead to eight condition coverage points). The cover properties:

cover property (a == 1'b0 && b == 1'b0 && c == 1'b0)
...
cover property (a == 1'b1 && b == 1'b1 && c == 1'b1)

Expression coverage allocates coverage points for all possible combinations of values for the signals in an assignment (i.e., the two 1-bit signals in Line 9 lead to four expression coverage points). The cover properties:

cover property (a == 1'b0 && c == 1'b0)
...
cover property (a == 1'b1 && c == 1'b1)

Finite-state machine (FSM) coverage allocates two sets of coverage points to check the FSM in a module. The first set of points checks if the state registers have reached all possible state values. The second set of points captures all state transitions in the FSM. Lines 5 and 11 show the two state values and at least one state transition of the state_q register. The cover properties in SVA format:



(1) checking FSM states:
cover property (state_q == IDLE)
...
cover property (state_q == FINISH)
(2) checking state transitions:
cover property (state_q == IDLE ##1 state_q == FINISH)

Toggle coverage entails verifying whether specific bits have been flipped during simulation or not. Line 12 contains a toggle coverage point that verifies if the 1-bit output signal d has been flipped from 0 to 1. The cover property:

cover property (d == 1'b0 ##1 d == 1'b1)

C Case study example for Boolean assignments and Test case converter

Figure 9 shows the Boolean assignments output by JasperGold after proving a cover property. Figure 10 shows the template disassembly file used to determine the address of the TEST instructions, the binary file template, and the valid binary file generated by the test case converter to cover the S_EXT interrupt branch point. More details are in Section 3.4.

Figure 9: Boolean assignments from JasperGold [4]. [Waveform of Clock, MIE[S_EXT], MIP[S_EXT], and Instruction taking the values 32'h2041e313, 32'h30432073, and 32'h34432073.]

Figure 10: Test case conversion.

Template (Disas.)        Template (Binary)           Valid file (Binary)
0x80000000 <INIT>:
...                      0x80000000 0x00000113       0x80000000 0x00000113
0x8000258c <TEST>:
0x8000258c nop           0x8000258c 0x00000013       0x8000258c 0x2041e313
0x80002590 nop           0x80002590 0x00000013       0x80002590 0x30432073
0x80002594 nop           0x80002594 0x00000013       0x80002594 0x34432073
0x80002598 nop           0x80002598 0x00000013       0x80002598 0x00000013
...
0x80001e0c <EXIT>:
...                      0x80001e0c 0x0000006f       0x80001e0c 0x0000006f

D ModDep algorithm

Algorithm 2 is the method to calculate the fanout COI of each module. The module with the highest fanout COI will be selected first. HyPFuzz will prioritize modules based on the dependence before running the experiment. Hence, unlike the MaxUncovd strategy, the priority of module dependence will not change. More details are discussed in Section 4.4.3.

Algorithm 2: ModDep strategy
Input: M = {m0, m1, ..., mn}: a set of modules;
Output: M′, (|M′| = |M|): ordered set of modules based on ModDep strategy;
1 M′ ← ∅
2 |COI| ← |M|  // store COI of the modules
3 for j ← 0...(|M| − 1) do
4     coi_j ← 0
5     for output ∈ GETOUTPUTS(m_j) do
6         coi_j ← coi_j + MEASURECOI(output)
  /* sort the module set based on the fanout COI of the modules */
7 M′ ← SORT(M, COI, MaxToMin)
8 return M′

E Evaluation results of r_fuzz and r_fml

Figure 11 and Figure 12 show the evaluation results of the coverage increment rate of the formal tool (r_fml) and the fuzzer (r_fuzz), respectively, on the CVA6 processor. Figure 11 evaluates r_fml over time, and Figure 12 evaluates the number of underutilizations of the fuzzer when the window size (w) is insufficient. More details are discussed in Section 5.2.

Figure 11: The change of r_fml over time. [Plot of the rate of the formal tool (r_fml) against the number of times the formal tool is used (0 to 350), showing the final r_fml, the accumulated r_fml, and the mean of the difference; final r_fml = 9.797x10^-4.]

Figure 12: Sampling results of underutilization of fuzzer. [Bar chart of the number of times the fuzzer is under-utilized for window sizes w = 10 to 100: 107, 64, 43, 31, 22, 16, 14, 9, 4, and 3.]
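The fanout-COI ranking of the ModDep strategy (Algorithm 2) can be sketched in a few lines. The module list, per-module output query, and COI-size oracle below are stand-ins for the netlist queries a formal tool such as JasperGold would answer, not HyPFuzz's actual tool interface:

```python
# Sketch of the ModDep strategy (Algorithm 2) under the assumption that
# get_outputs(m) lists the output signals a module drives and
# measure_coi(out) returns the size of that output's cone of influence.
def mod_dep_order(modules, get_outputs, measure_coi):
    """Return the modules sorted by total fanout cone-of-influence
    (COI), largest first, mirroring the ModDep strategy."""
    coi = {}
    for m in modules:
        # Lines 4-6 of Algorithm 2: sum the COI of every output m drives.
        coi[m] = sum(measure_coi(out) for out in get_outputs(m))
    # Line 7: sort from maximum to minimum fanout COI.
    return sorted(modules, key=lambda m: coi[m], reverse=True)
```

A module whose outputs reach more downstream logic is scheduled first, so its uncovered points receive formal-tool attention earliest; ties fall back to the input order since Python's sort is stable.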

