Improving auto-generated dictionary of Cmplog #2493

am009 · 2025-07-09T09:44:08Z

Background: We observed that on FuzzBench, Honggfuzz performs better than AFL++ on the proj4 benchmark (for example, here). Over the past month, we have been investigating the reasons and attempting to further improve AFL++.

For proj4, we noticed that Honggfuzz’s auto-generated dictionary is of much higher quality than AFL++’s. Honggfuzz employs a simple strategy when collecting string constants for its dictionary: during memcmp/strcmp operations, it checks whether one of the pointers points into the ELF file’s mapped memory (typically writable or non-writable data sections), which excludes pointers into the heap. For example, when input is compared against a string constant, the input data usually lives on the heap, while the string constant resides in the ELF’s read-only data section. When porting this to AFL++, we recorded this information in an unused field of the cmplog rtn entry.
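The pointer-origin check described above can be sketched with glibc's `dl_iterate_phdr`, which walks the program headers of every loaded ELF object. This is a minimal illustration for Linux, not the code used by Honggfuzz or by this PR; the names `addr_kind`, `addr_query`, `segment_cb`, and `classify_addr` are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a pointer passed to memcmp/strcmp. */
enum addr_kind {
  ADDR_NOT_IN_IMAGE = 0, /* heap, stack, anonymous mappings, ... */
  ADDR_IMAGE_RO = 1,     /* inside a non-writable PT_LOAD segment */
  ADDR_IMAGE_RW = 2      /* inside a writable PT_LOAD segment */
};

struct addr_query {
  uintptr_t      addr;
  enum addr_kind kind;
};

/* Called once per loaded ELF object (main binary and shared libraries). */
static int segment_cb(struct dl_phdr_info *info, size_t size, void *data) {
  (void)size;
  struct addr_query *q = data;

  for (size_t i = 0; i < info->dlpi_phnum; i++) {
    const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
    if (ph->p_type != PT_LOAD) continue;

    uintptr_t start = (uintptr_t)info->dlpi_addr + ph->p_vaddr;
    uintptr_t end = start + ph->p_memsz;

    if (q->addr >= start && q->addr < end) {
      q->kind = (ph->p_flags & PF_W) ? ADDR_IMAGE_RW : ADDR_IMAGE_RO;
      return 1; /* non-zero stops the iteration */
    }
  }
  return 0; /* keep iterating */
}

/* Returns where `p` lives relative to the loaded ELF images. */
enum addr_kind classify_addr(const void *p) {
  struct addr_query q = {(uintptr_t)p, ADDR_NOT_IN_IMAGE};
  dl_iterate_phdr(segment_cb, &q);
  return q.kind;
}
```

With such a helper, a comparison hook would treat an operand as a dictionary candidate only when `classify_addr` returns something other than `ADDR_NOT_IN_IMAGE`. Walking the program headers on every comparison is slow, so a real implementation would likely cache the segment ranges.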

We initially ported Honggfuzz’s strategy (Version 1, aflplusplus_hfdictv1), but the generated dictionary still contained many low-quality bytes. We modified AFL++ to log auto-dictionary additions per code location (Here), revealing that location3 and location4 (link) added numerous suboptimal dictionary entries, typically 32 bytes in size.

AFL++’s cmplog also instruments functions whose signatures merely resemble memcmp/strcmp, and it ignores the size passed as the third argument (see this code). When it encounters such a function, it records up to 32 bytes from the memory pointed to by each of the first two arguments as a cmplog rtn entry. The related condition checks (link) are quite loose, and over 90% of auto-dictionary entries originate there. After further filtering these entries, we achieved better results (Version 2, aflplusplus_hfdictv2).
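The kind of filtering described above could look roughly like the following. This is only a sketch under stated assumptions: `rtn_operand_sketch` is a hypothetical simplified record, not AFL++'s actual `cmpfn_operands` layout, and the exact condition used in the PR may differ. It assumes an operand is worth keeping only if it points into the ELF image and its recorded length stayed below the 32-byte maximum, since a full 32-byte capture often means the bytes were copied blindly rather than delimited by a real string.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTN_MAX_LEN 32 /* maximum bytes recorded per operand, per the text */

/* Hypothetical, simplified view of one recorded rtn operand. */
struct rtn_operand_sketch {
  uint8_t bytes[RTN_MAX_LEN];
  uint8_t len;        /* number of valid bytes in `bytes` */
  bool    from_image; /* pointer was inside ELF-mapped memory */
};

/* Decide whether an operand should become an auto-dictionary entry. */
bool rtn_operand_is_dict_candidate(const struct rtn_operand_sketch *op) {
  if (!op->from_image) return false;           /* heap/stack data: likely input */
  if (op->len == 0) return false;              /* nothing to add */
  if (op->len >= RTN_MAX_LEN) return false;    /* full-size capture: likely noise */
  return true;
}
```

A short constant such as `+proj=` located in read-only data passes this filter, while a 32-byte capture or any heap-resident operand is dropped.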

Local FuzzBench Results
Fuzzers:

aflplusplus_hfdictv1: First version with Honggfuzz’s auto-dictionary logic.
aflplusplus_hfdictv2: Version 1 plus filtering of the 32-byte dictionary entries that come from cmplog rtn entries.
aflplusplus_proj4dict: Auto-dictionary entries extracted from a 23-hour Honggfuzz instance (converted and fed to AFL++).
honggfuzz_orig: Original Honggfuzz from FuzzBench.
aflplusplus_recent: Recent stable AFL++ version.

On the proj4_proj_crs_to_crs_fuzzer benchmark, AFL++ now performs as well as Honggfuzz:

hfdict-base-aflpphfdictv2-23h.zip

Testing on 4 other benchmarks also showed improvements on some of them:

report-5-benchmarks.zip

We are also requesting public FuzzBench experiments to further validate the results across more benchmarks (experiments request PR link).

Wait for the public FuzzBench results
Possibly find better names for the new functions, and possibly split a dedicated field out of the unused field in cmpfn_operands
Rebase onto the latest dev branch


vanhauser-thc · 2025-07-10T13:59:57Z

Thanks for looking into this!
Optimizations that help one target in the benchmark might make others worse, so it will be interesting to see what the verdict will be. I can imagine, though, that your changes will be beneficial overall; let's see.
I have rights to start FuzzBench runs, so I initiated one for your PR there.

vanhauser-thc · 2025-07-15T15:15:33Z

This is the result that I extracted from the data of the FuzzBench run:

file                               aflplusplus_recent  aflplusplus_hfdict_v2  aflplusplus_hfdict_v1
bloaty_fuzz_target.TXT             0.996               1.000                  0.997
curl_curl_fuzzer_http.TXT          0.999               1.000                  0.998
freetype2_ftfuzzer.TXT             1.000               0.999                  1.000
harfbuzz_hb.TXT                    1.000               1.000                  1.000
jsoncpp_jsoncpp_fuzzer.TXT         1.000               1.000                  1.000
lcms_cms_transform_fuzzer.TXT      0.926               0.985                  1.000
libjpeg.TXT                        1.000               1.000                  0.999
libpcap_fuzz_both.TXT              0.981               1.000                  0.989
libpng_libpng_read_fuzzer.TXT      1.000               1.000                  0.999
libxml2_xml.TXT                    0.997               1.000                  0.996
libxslt_xpath.TXT                  1.000               1.000                  0.999
mbedtls_fuzz_dtlsclient.TXT        1.000               0.990                  0.998
openssl_x509.TXT                   1.000               1.000                  0.999
openthread_ot.TXT                  0.998               0.943                  1.000
proj4_proj_crs_to_crs_fuzzer.TXT   0.937               1.000                  0.941
re2_fuzzer.TXT                     1.000               1.000                  0.999
sqlite3_ossfuzz.TXT                0.993               1.000                  0.981
systemd_fuzz.TXT                   1.000               0.999                  0.999
vorbis_decode_fuzzer.TXT           0.998               1.000                  1.000
woff2_convert_woff2ttf_fuzzer.TXT  1.000               0.997                  0.998
zlib_zlib_uncompress_fuzzer.TXT    0.998               0.998                  1.000

RESULTS.TXT                        0.998               1.000                  0.999

so overall it looks like it gives a marginal improvement - which is already good!

vanhauser-thc · 2025-07-17T18:07:03Z

@am009 ping

am009 · 2025-07-17T23:54:34Z

> @am009 ping

I have been quite busy with other things these days. I will get back to this on Sunday.

vanhauser-thc and others added 3 commits June 28, 2025 22:29

Merge pull request AFLplusplus#2486 from AFLplusplus/dev

11a5e37

push to stable

Add instrumentation for addr attr.

16a7fb9

cmplog: more strict condition for auto dict.

5558192
