HyPFuzz Formal-Assisted Processor Fuzzing
HyPFuzz Formal-Assisted Processor Fuzzing
HyPFuzz Formal-Assisted Processor Fuzzing
Chen Chen† , Rahul Kande† , Nathan Nguyen† , Flemming Andersen† , Aakash Tyagi† ,
Ahmad-Reza Sadeghi∗ , and Jeyavijayan Rajendran†
† Texas A&M University, USA, ∗ Technische Universität Darmstadt, Germany
† {chenc, rahulkande, nathan.tm.nguyen, flandersen, tyagi, jv.rajendran}@tamu.edu,
∗ {ahmad.sadeghi}@trust.tu-darmstadt.de
Switch
Fuzzer Coverage Formal tool
Fuzzer rate
% Branch points covered
95 Scheduler
90 Mutation engine
01010 10001
85
11100 01000 Boolean assign
80
00111 11111 Point
selector
75
Test case
Jaspergold
70 database Property Test case
TheHuzz DUT
65
HyPFuzz generator converter
Seed
60
0 1 2 3 4 5 6 7 8
Day
Figure 3: Framework of HyPFuzz. The circle and cross repre-
Figure 2: Eight day coverage results of TheHuzz [36], Jasper- sent covered and uncovered points, respectively.
Gold [4], and HyPFuzz for the CVA6 processor [79].
Data
Instr Realigner LSU
Register cache
Instr queue Register
EXECUTE/COMMIT
file ALU
ISSUE/EXECUTE
FETCH/DECODE
DECODE/ISSUE
Branch WB
Scoreboard
Compressed
PC predictor decoder
Branch
Scoreboard unit
M
Predicted
Illegal Instr Exception
U
addr FPU
Decoder
ADD
X
Instr ops Merge
Parallel
Instruction
FPUOp
4
Legal instr FPMul Interrupt
MUX decoding
Branch check
prediction Illegal FPDiv
mie Valid interrupt
Interrupt Interrupt
CSR mip Interrupt type
handler
Figure 4: Simplified pipeline of the CVA6 processor [79]. The connectivity in dotted lines is an example to show how a module
drives various other modules in the design.
the number of modules between the input of the DUT and the Thus, based on Observation O3, targeting the coverage points
target module. For example, consider the simplified pipeline from a module with many other modules in their fanout COI
of the CVA6 processor shown in Figure 4. In this pipeline, will allow the fuzzer to uncover more points. Hence, our Mod-
the distance of FETCH, DECODE, and EXECUTE is one, and both Dep strategy prioritizes modules with higher COI over other
FPMul and FPDiv have a distance of four. Thus, the BotTop modules, as shown in Algorithm 2. For example, ModDep
strategy assigns the highest priority to FPMul and FPDiv over assigns the highest priority to the CSR module in the CVA6
other modules in the CVA6 processor. processor because it has the highest fanout due to driving
multiple components: interrupt control logic (shown using
4.4.2 MaxUncovd Strategy red dotted lines in Figure 4), write logic in Register file,
and enable logic in Data cache.
In a hybrid fuzzer, the seed generated using the formal tool
will cover a coverage point in the hard-to-reach design space.
On mutating the seed, the fuzzer will explore the design space 4.5 Integrating Fuzzer and Formal Tool
in the “vicinity” of this covered point. Thus, based on Ob-
servation O2, having more uncovered points in the “vicinity” Formal tools target the DUT’s properties, but fuzzers target the
will increase the number of coverage points the fuzzer can DUT’s coverage points. Due to this incompatibility, we cannot
cover, thereby accelerating HyPFuzz. Thus, our second strat- directly send the target coverage point to the formal tool to
egy, MaxUncovd prioritizes the module with the maximum verify or take the Boolean assignments of a proved property as
number of uncovered points over the rest of DUT’s modules, an executable test case for the fuzzer, creating challenge C3.
irrespective of its distance from the inputs. Unlike the BotTop To seamlessly integrate the fuzzer and formal tool, we develop
strategy, whose module distance is fixed for a given DUT, the a property generator and a test case converter.
number of uncovered points in a given module decreases as Property generator generates the cover property for the se-
HyPFuzz explores more design space over time. Hence, Max- lected uncovered point. It parses the DUT’s logic and iden-
Uncovd is a dynamic strategy that recomputes the priorities tifies the conditions for covering the point. Then, it converts
of each module every time the point selector is invoked. the conditions into a cover property and loads it to the formal
For example, in the CVA6 processor, the Decoder and float- tool along with the DUT. For example, given an uncovered
ing point unit (FPU) modules have 381 and 108 branch cov- point of branch coverage, the property generator analyzes
erage points, respectively. In the beginning, all the coverage the dependencies of the branch statement. It then identifies
points in the DUT are uncovered. Hence, the Decoder will the conditions to cover that point, and the logical conjunction
have more uncovered points; consequently, MaxUncovd pri- of them will form the expression of the corresponding cover
oritizes the Decoder over the FPU. However, over time, the property. The property generator is compatible to uncovered
number of uncovered points in the Decoder decreases as in- points of other coverage metrics, as shown in Appendix B.
structions of all types activate this module. Test case converter As mentioned earlier, the formal tool
generates the Boolean assignments of the input signals for
4.4.3 ModDep Strategy each clock cycle to cover a target coverage point. However,
the fuzzer has a different input format. For example, pro-
Triggering a coverage point first requires activating the mod- cessor fuzzers, such as TheHuzz and DIFUZZRTL [32], use
ules that drive the logic corresponding to this coverage point. sequences of instructions as test cases to fuzz [32, 36, 43].
41.24
traditional random regression. To ensure a fair comparison, CVA6
36.41
we constrain the environment of JasperGold to be the same as BOOM
40
Rocket Core
the hardware fuzzer’s simulation environment [13, 32, 36, 43]. mor1kx
Speedup
30 OR1200
To prevent JasperGold from getting stuck while proving a
property, we set a time limit on JasperGold to prove each prop- 20
11.42
erty. We compute this time limit by using the time JasperGold
8.78
5.57
10
4.17
spends on 30 random coverage points and applying survival
2.07
2.03
2.00
1.99
1.99
1.90
1.85
1.81
1.73
1.72
1.63
1.63
1.37
1.31
analysis [38] to calculate a time limit large enough to prove 0
over 99% of the points in the design. We set the rolling win- dDep zz-RandSel uzz-BotTop xUnco
vd
zz-Mo HyPF zz-Ma
HyPFu HyPFu HyPFu
dow size (w) as 100 to calculate r f uzz using Equation 2. We
ran each experiment for 72 hours and repeated it three times. Figure 5: Speedup of HyPFuzz over TheHuzz [36].
5.2 Evaluating the Scheduling Strategy sors except for the CVA6 processor, which has a difference
We pick the coverage results of HyPFuzz on CVA6 to evalu- of 1.31%. The difference in the coverage achieved by all the
ate the coverage increment rate of the formal tool (r f ml ) and strategies decreases over time, and the coverage achieved will
the fuzzer (r f uzz ). The results are shown in Appendix E. eventually converge. However, they still achieve an average
Evaluation on r f ml . We evaluate how the r f ml will quickly of 1.72% more coverage compared to TheHuzz and 4.73%
converge and reflect how fast the formal tool will explore the more coverage compared to random regression, across the
hard-to-reach spaces of the fuzzer in a design. The sched- five processors (see Section 5.4).
uler updates r f ml every time when JasperGold proves the Coverage speed of the four selection strategies of HyPFuzz
reachability of an uncovered point. As shown in Figure 11, on the five processors is shown in Figure 5. The MaxUncovd
HyPFuzz switched to using JasperGold more than 300 times. selection strategy is the fastest for all processors except for
The accumulated r f ml only varies during the first 24 switches, the OR1200, where the RandSel strategy performs the best.
after which the mean difference between the final r f ml and The reason why RandSel strategy is the fastest for the OR1200
the accumulated r f ml is less than 5%. Hence, compared to processor is because of its smaller size (around 25K gates),
the entire experiment, r f ml only requires several samples to and RandSel has a faster selection time compared to other
converge and can reflect the capability of space exploration strategies. However, this is not the case for the other pro-
of JasperGold in a design. The time limit on JasperGold also cessors, especially the larger processors, such as CVA6 and
helps accelerate the convergence of r f ml . BOOM, which have several hundred thousand gates and more
Evaluation on r f uzz (w). We evaluate how the rolling win- microarchitectural features. For these designs, the order of
dow size (w) affects the accuracy of estimating how well the point selection has a more dominant effect on coverage speed
fuzzer in HyPFuzz recently explored the design space. The than the time taken to select the point. Based on these analy-
fuzzer in HyPFuzz executes 10 test cases at a time; hence we ses, we select the MaxUncovd strategy as our point selection
analyze the r f uzz (w) for different values of w in increments of strategy for HyPFuzz. All further evaluations will use the
10. For each value of w, we plot the number of times the sched- MaxUncovd strategy unless stated otherwise.
uler will under-utilize the fuzzer and switch to a formal tool,
as shown in Figure 12. We consider the fuzzer under-utilized 5.4 Coverage Achieved
if the coverage of the fuzzer is temporarily stagnated but will
cover new points through mutation of test cases if scheduler We now evaluate the capability of random regression, The-
had not switched to the formal tool. We can observe from Huzz, and HyPFuzz in achieving coverage. Across the five
Figure 12 that when the window is set to 100, r f uzz (w) can processors, HyPFuzz achieves 4.73% more coverage than ran-
capture almost all coverage increments from the following dom regression and is 239.93× × faster than random regression
test cases. Hence, we set the experiment w to be 100. after fuzzing for 72 hours, as seen in Figure 8. Also, HyPFuzz
5.3 Evaluating Point Selection Strategies achieves 1.72% more coverage than TheHuzz and is 11.68× ×
faster than TheHuzz after running for the same 72 hours, as
To find the most efficient selection strategy for HyPFuzz, seen in Figure 8 and Figure 5, respectively.
we now evaluate the three point selection strategies BotTop, The CVA6 and BOOM processors are complex with advanced
MaxUncovd, and ModDep, along with the RandSel, using two OoO and SIMD features. Hence, HyPFuzz achieves substan-
metrics: amount of coverage achieved and coverage speed. tially higher coverage than TheHuzz on them. Figure 6 shows
Coverage achieved. Figure 8 shows the branch coverage that HyPFuzz achieves the highest branch coverage improve-
achieved by different selection strategies for all the proces- ment of 6.84% on CVA6 compared to TheHuzz. Thus, the cov-
sors. The difference between the coverage achieved by the erage speed of TheHuzz is less for these processors. However,
four selection strategies is less than 0.5% on all the proces- HyPFuzz leverages the efficient point selector to generate
95
number of instructions required to trigger them is too low (<
1000 instructions) (i.e., measure the coverage and mutate test 85
% Points covered
cases). For example, V9 and V13 require less than one hun- 75
dred instructions to trigger. On an average, HyPFuzz detects
65
× faster than TheHuzz in terms of run time
vulnerabilities 3.06×
and uses 3.05×× fewer number of instructions to trigger them. 55
45 FSM:TheHuzz Condtion:TheHuzz
FSM:HyPFuzz Condtion:HyPFuzz
5.6 Evaluation on Compatibility of HyPFuzz 35
0 10 20 30 40 50 60 70
Time (hrs)
In this section, we demonstrate the compatibility of HyPFuzz
in achieving coverage with different coverage metrics such as Figure 7: Total condition points and FSM points covered by
condition and finite-state machine (FSM) coverage. HyPFuzz TheHuzz [36] and HyPFuzz.
uses its property generator to generate the cover properties
for the points of them as discussed in Appendix B. Figure 7
shows the coverage achieved by HyPFuzz and TheHuzz on
DUT. DirectFuzz [13] leverages the mux-toggle coverage
CVA6 processor when run for 72 hours and repeated three
and allocates more mutation energy to test cases that achieve
times. HyPFuzz achieves an average of 15.95% more con-
coverage closed to a manually selected model, hence achiev-
dition coverage and 33.76% more FSM coverage compared
ing faster coverage. However, the mux-toggle metric does
to TheHuzz hence demonstrating that HyPFuzz is capable of
not scale well to large designs like BOOM and CVA6 due to
improving coverage for different types of coverage metrics.
instrumentation overhead. DIFUZZRTL [32] addresses the
overhead limitation of RFUZZ by developing a new coverage
6 Related Works metric, control-register coverage. It uses all possible combina-
tions of values of all the registers driving the selection logic
Hardware fuzzers. RFUZZ [43] uses mux-toggle coverage of MUX. However, this coverage metric does not assign cov-
as feedback to fuzz hardware and was one of the first attempts erage points to many registers as well as combinational logic,
towards creating hardware fuzzers. The mux-toggle cover- thereby missing many security-critical vulnerabilities [36].
age covers the activity in the select signals of muxes in the Therefore, TheHuzz [36] uses code coverage metrics (branch,
[4] Jasper RTL Apps. https://www.cadence.com/en [18] R. De Clercq, F. Piessens, et al. Secure Interrupts on
_US/home/tools/system-design-and-verificat Low-End Microcontrollers. IEEE International Confer-
ion/formal-and-static-verification/jasper ence on Application-Specific Systems, Architectures and
-gold-verification-platform.html, 2022. Last Processors, pages 147–152, 2014.
accessed on 02/11/2022. [19] G. Dessouky, D. Gens, et al. HardFails: Insights into
Software-Exploitable Hardware Bugs. USENIX Security
[5] ModelSim. https://eda.sw.siemens.com/en-US/
Symposium, pages 213–230, 2019.
ic/modelsim/, 2022. Last accessed on 05/02/2022.
[20] C. Deutschbein, A. Meza, et al. Toward Hardware Se-
[6] Questa Formal Verification Apps. https://eda.sw.s curity Property Generation at Scale. IEEE Security and
iemens.com/en-US/ic/verification-and-valid Privacy, (01):2–10, 2022.
ation/formal-verification/, 2022. Last accessed
on 09/16/2022. [21] E. El Mandouh and A. G. Wassal. Estimation of Formal
Verification Cost Using Regression Machine Learning.
[7] Undefined behavior. https://en.cppreference. IEEE International High Level Design Validation and
com/w/cpp/language/ub, 2022. Last accessed on Test Workshop, pages 121–127, 2016.
09/30/2022.
[22] H. Eldib, C. Wang, et al. Formal Verification of Software
[8] VC Formal. https://www.synopsys.com/verific Countermeasures against Side-Channel Attacks. ACM
ation/static-and-formal-verification/vc-f Transactions on Software Engineering and Methodol-
ormal.html, 2022. Last accessed on 02/11/2022. ogy, 24(2):1–24, 2014.
[34] M. Ivanković, G. Petrović, et al. Code Coverage at [49] MITRE. CWE VIEW: Hardware Design. https://cw
Google. ACM Joint Meeting on European Software e.mitre.org/data/definitions/1194.html, 2019.
Engineering Conference and Symposium on the Foun- Last accessed on 05/02/2022.
dations of Software Engineering, pages 955–963, 2019.
[50] A. Mockus, N. Nagappan, et al. Test Coverage and
[35] I. B. Kadron, Y. Noller, et al. Fuzzing, Symbolic Exe- Post-Verification Defects: A Multiple Case Study.
cution, and Expert Guidance for Better Testing. IEEE IEEE/ACM International Symposium on Empirical Soft-
Software, 2023. ware Engineering and Measurement, 2009.
[36] R. Kande, A. Crump, et al. TheHuzz: Instruction
[51] S. K. Muduli, G. Takhar, et al. HyperFuzzing for SoC
Fuzzing of Processors Using Golden-Reference Mod-
Security Validation. ACM/IEEE International Confer-
els for Finding Software-Exploitable Vulnerabilities.
ence on Computer-Aided Design, pages 1–9, 2020.
USENIX Security Symposium, pages 3219–3236, 2022.
[37] A. Kanuparthi, H. Khattri, et al. CWE-1252. https: [52] Y. Naveh, M. Rimon, et al. Constraint-Based Random
//cwe.mitre.org/data/definitions/1252.html, Stimuli Generation for Hardware Verification. AI maga-
2022. Last accessed on 09/12/2022. zine, 28(3):13–13, 2007.
[38] D. G. Kleinbaum and M. Klein. Survival Analysis a [53] OpenRISC. mor1kx source code. https://github
Self-Learning Text. 1996. .com/openrisc/mor1kx, 2020. Last accessed on
04/08/2021.
[39] P. Kocher, J. Horn, et al. Spectre Attacks: Exploiting
Speculative Execution. IEEE Symposium on Security [54] OpenRISC. OpenRISC Homepage. https://openri
and Privacy, 2019. sc.io/, 2020. Last accessed on 04/08/2021.
[40] A. Kolbi, J. Kukula, et al. Symbolic RTL Simulation.
[55] OpenRISC. or1200 source code. https://github
IEEE/ACM Design Automation Conference, 2001.
.com/openrisc/or1200, 2020. Last accessed on
[41] P. Krishnamurthy, R. Karri, et al. Anomaly Detection in 04/08/2021.
Real-Time Multi-Threaded Processes Using Hardware
Performance Counters. IEEE Transactions on Informa- [56] OpenRISC. Or1ksim Source Code. https://gith
tion Forensics and Security, 15:666–680, 2019. ub.com/openrisc/or1ksim, 2020. Last accessed on
04/08/2021.
[42] Y. Kukimoto. Introduction to Formal Verification. ht
tps://ptolemy.berkeley.edu/projects/embedd [57] B. S. Pak. Hybrid Fuzz Testing: Discovering Software
ed/research/vis/doc/VisUser/vis_user/node4 Bugs via Fuzzing and Symbolic Execution. School of
.html, 2011. Last accessed on 09/12/2022. Computer Science Carnegie Mellon University, 2012.
[64] RISC-V. SPIKE Source Code. https://github.c [79] F. Zaruba and L. Benini. The Cost of Application-
om/riscv/riscv-isa-sim, 2021. Last accessed on Class Processing: Energy and Performance Analysis
05/12/2022. of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-
nm FDSOI Technology. IEEE Transactions on Very
[65] C. P. Robert. Monte Carlo Methods in Statistics. Large Scale Integration Systems, 2019.
arXiv:0909.0389, 2009.
[66] K. Serebryany, D. Bruening, et al. AddressSanitizer: A [80] J. Zhao, B. Korpan, et al. SonicBOOM: The 3rd Gen-
Fast Address Sanity Checker. USENIX Annual Techni- eration Berkeley Out-of-Order Machine. 4th Workshop
cal Conference, pages 309–318, 2012. on Computer Architecture Research with RISC-V, 2020.
[67] J. P. Shen and M. H. Lipasti. Modern Processor Design: [81] L. Zhao, Y. Duan, et al. Send Hardest Problems My
Fundamentals of Superscalar Processors. 2013. Way: Probabilistic path prioritization for hybrid fuzzing.
NDSS, 2019.
[68] Y. Shi and R. Eberhart. A Modified Particle Swarm Op-
timizer. IEEE international conference on evolutionary
[82] X. Zhu, S. Wen, et al. Fuzzing: A Survey for Roadmap.
computation proceedings., 1998.
ACM Computing Surveys (CSUR), 54(11s):1–36, 2022.
[69] W. Snyder. Verilator. https://www.veripool.org/w
iki/verilator, 2021. Last accessed on 04/08/2021.
Appendix
[70] N. Stephens, J. Grosen, et al. Driller: Augmenting
Fuzzing Through Selective Symbolic Execution. NDSS, A Total branch coverage achieved by random
2016. regression, TheHuzz [36], and HyPFuzz
[71] Synopsys. Synopsys Webpage. https://www.synops
ys.com/, 2021. Last accessed on 04/08/2021. Figure 8 shows the branch coverage achieved by random re-
gression, TheHuzz [36], and different point selection strategies
[72] Synopsys. Accelerating Verification Shift Left with of HyPFuzz. Across the five processors, HyPFuzz achieves
Intelligent Coverage Optimization. https://www.sy 4.73% more coverage than random regression after fuzzing
nopsys.com/cgi-bin/verification/dsdla/pdfr for 72 hours, as seen in Figure 8. Also, HyPFuzz achieves
1.cgi?file=ico-wp.pdf, 2022. Last accessed on 1.72% more coverage than TheHuzz after running for the
02/18/2023. same 72 hours. More details are discussed in Section 5.4.
90 90 90
85 85 85
80 80 80
0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70
Time (hrs) Time (hrs) Time (hrs)
(a) CVA6 [79] (b) BOOM [80] 100 (c) Rocket Core [11]
Listing 4: Code snippet showing different coverage metrics. ering the point for taking the if branch statement will be
1 module code_cov_example (input a,b,c,output d); (start_q == IDLE) and (a||b&&c). And, the condition
2 reg [1:0] state_d,state_q;
3 always@(*)begin to check the branch not taken is (start_q == IDLE) and
4 case (start_q) begin not(a||b&&c). The cover properties for the two points can be
5 IDLE: begin // FSM
6 if (a || b && c) begin // branch, condition specified in the SVA format as:
7 state_d = FINISH; cover property ((start_q == IDLE)&&(a||b&&c))
8 end else begin
9 state_d = a & c; // expression cover property ((start_q == IDLE)&&!(a||b&&c))
10 ...
11 FINISH: begin // FSM
12 d = 1'b1; // toggle Condition coverage allocates coverage points for all possible
combinations of values for the signals in a branch statement
B The Property Generator for Different Cover- (i.e., three 1-bit signals in the if branch statement in Line 6
age Metrics lead to eight condition coverage points.) The cover properties:
cover property (a == 1′ b0&&b == 1′ b0&&c == 1′ b0)
HyPFuzz currently uses the branch, condition, and finite-state- ..
.
machine (FSM) coverage metrics for evaluation to demon- cover property (a == 1′ b1&&b == 1′ b1&&c == 1′ b1)
strate the compatibility of HyPFuzz with different coverage
metrics. Our property generator is similarly compatible with Expression coverage allocates coverage points for all possi-
other coverage metrics also, as we can translate the coverage ble combinations of values for the signals in an assignment
points to cover properties needed by the formal tools, such (i.e., two 1-bit signals in Line 9 lead to four expression cover-
as JasperGold [4]. Usually, the coverage metrics are reported age points). The cover properties:
as either a signal name or Boolean expression. Hence, it is
cover property (a == 1′ b0&&c == 1′ b0)
always possible to translate these metrics into SystemVer- ..
ilog Assertions (SVA) cover properties because the cover .
property is any legal, temporal logic (TL) expression [28] cover property (a == 1′ b1&&c == 1′ b1)
with the form: cover property < T L − expression >, where
< T L − expression > can be temporal in general. We use Finite-state machine (FSM) coverage allocates two sets of
Listing 4 as an example to show how to generate SVA cover coverage points to check the FSM in a module. The first set of
properties for different code coverage metrics. point checks if state registers have reached all possible state
Branch coverage allocates coverage points following the values. The second set of point captures all state transitions
branch statement (i.e., two coverage points, one for if branch in the FSM. Lines 5 and 11 show the two state values and
statement in Line 6 taken and another one for not taken). at least one state transition of the state_q register. The cover
Following the branch statement tree, the conditions of cov- properties in SVA format:
0.00
100
each module. The module with the highest fanout COI will
be selected first. HyPFuzz will prioritize modules based on Figure 12: Sampling results of underutilization of fuzzer.
the dependence before running the experiment. Hence, unlike