Improving auto-generated dictionary of Cmplog #2493

Open
wants to merge 3 commits into base: dev


am009
Contributor

@am009 am009 commented Jul 9, 2025

Background: We observed that on FuzzBench, Honggfuzz performs better than AFL++ on the proj4 benchmark (for example, here). Over the past month, we have been investigating the reasons and attempting to further improve AFL++.

For proj4, we noticed that Honggfuzz’s auto-generated dictionary is of much higher quality than AFL++’s. Honggfuzz uses a simple strategy when collecting string constants for its dictionary: during memcmp/strcmp operations, it checks whether one of the pointers originates from the ELF file’s mapped memory (typically its writable or read-only data sections), and it excludes pointers into the heap. For example, when input is compared against a string constant, the input buffer usually lives on the heap, while the string constant pointer points into the ELF’s read-only data section. When porting this to AFL++, we recorded this information in an unused field of the cmplog rtn entry.
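To illustrate the pointer check described above, here is a minimal sketch for Linux/glibc. It assumes the check can be done by walking the loaded objects' `PT_LOAD` segments with `dl_iterate_phdr`; this is not the code from this PR or from Honggfuzz, and the names `ptr_origin`, `ptr_query`, `match_segment` and `classify_ptr` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* dl_iterate_phdr is a GNU/Linux extension */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from AFL++ or Honggfuzz. */
enum ptr_origin { PTR_NOT_ELF = 0, PTR_ELF_RO = 1, PTR_ELF_RW = 2 };

struct ptr_query {
  uintptr_t       addr;   /* address being classified */
  enum ptr_origin origin; /* result, filled in by the callback */
};

/* Called once per loaded object (main binary and shared libraries). */
static int match_segment(struct dl_phdr_info *info, size_t size, void *data) {
  struct ptr_query *q = data;
  (void)size;
  assert(q != NULL);

  for (size_t i = 0; i < info->dlpi_phnum; i++) {
    const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
    if (ph->p_type != PT_LOAD) continue; /* only segments mapped from the file */

    uintptr_t start = (uintptr_t)info->dlpi_addr + (uintptr_t)ph->p_vaddr;
    uintptr_t end = start + (uintptr_t)ph->p_memsz;

    if (q->addr >= start && q->addr < end) {
      q->origin = (ph->p_flags & PF_W) ? PTR_ELF_RW : PTR_ELF_RO;
      return 1; /* non-zero return stops the iteration */
    }
  }
  return 0; /* keep looking in the next object */
}

/* Heap and stack addresses fall outside every PT_LOAD segment. */
enum ptr_origin classify_ptr(const void *p) {
  struct ptr_query q = { (uintptr_t)p, PTR_NOT_ELF };
  dl_iterate_phdr(match_segment, &q);
  return q.origin;
}
```

In a comparison hook, an operand would only be considered for the dictionary when `classify_ptr` returns something other than `PTR_NOT_ELF`. Walking the program headers on every comparison is not free, so a real implementation would likely cache the segment ranges.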

We initially ported Honggfuzz’s strategy (Version 1, aflplusplus_hfdictv1), but the generated dictionary still contained many low-quality entries. We modified AFL++ to log auto-dictionary additions per code location (here), which revealed that location3 and location4 (link) added numerous suboptimal dictionary entries, typically 32 bytes in size.

AFL++’s cmplog even instruments functions whose signatures merely resemble memcmp/strcmp, ignoring the size in the third argument (see this code). When it encounters such a function, it directly records up to 32 bytes from the memory pointed to by each of the first two arguments as a cmplog rtn entry. The related condition checks (link) are quite loose, and over 90% of auto-dictionary entries originate there. After further filtering these entries, we achieved better results (Version 2, aflplusplus_hfdictv2).
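The filtering idea can be sketched as a small predicate. This is an illustration under stated assumptions, not the PR's implementation: the record `struct rtn_operand`, its fields, the constant `RTN_CAPTURE_MAX` and the function `worth_adding_to_dict` are all hypothetical, and AFL++'s real rtn entry layout differs. The assumption is that an operand captured at the full 32-byte limit has an unknown true length and is likely noise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTN_CAPTURE_MAX 32 /* capture limit mentioned in the PR description */

/* Hypothetical record for illustration; not AFL++'s actual structure. */
struct rtn_operand {
  uint8_t bytes[RTN_CAPTURE_MAX]; /* captured operand bytes */
  size_t  len;                    /* number of valid bytes, 0..32 */
  bool    from_elf;               /* pointer was inside an ELF-mapped segment */
};

/* Decide whether a captured operand is a plausible dictionary token. */
bool worth_adding_to_dict(const struct rtn_operand *op) {
  if (op == NULL || op->len == 0) return false;

  /* Version 1 idea: only keep operands that point into the ELF image. */
  if (!op->from_elf) return false;

  /* Version 2 idea: drop entries captured at the full 32-byte limit. */
  if (op->len >= RTN_CAPTURE_MAX) return false;

  return true;
}
```

With this predicate, a 6-byte constant such as `+proj=` read from the binary's data section would be kept, while a full 32-byte capture or any heap operand would be skipped.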

Local FuzzBench Results
Fuzzers:

  • aflplusplus_hfdictv1: First version with Honggfuzz’s auto-dictionary logic.
  • aflplusplus_hfdictv2: Version 1 plus filtering of 32-byte dictionary entries that come from cmplog rtn entries.
  • aflplusplus_proj4dict: Auto-dictionary entries extracted from a 23-hour Honggfuzz instance (converted and fed to AFL++).
  • honggfuzz_orig: Original Honggfuzz from FuzzBench.
  • aflplusplus_recent: Recent stable AFL++ version.

On the proj4_proj_crs_to_crs_fuzzer benchmark, AFL++ now performs as well as Honggfuzz:

[Image: 2025-07-09172510]

hfdict-base-aflpphfdictv2-23h.zip

Testing on 4 other benchmarks also showed improvements on some of them:

[Image: 2025-07-09164000]

report-5-benchmarks.zip

We are also requesting public FuzzBench experiments to further validate the results across more benchmarks (experiments request PR link).

  • Wait for the public FuzzBench results
  • Possibly find better names for the new functions, and possibly split a dedicated field out of the unused field in cmpfn_operands
  • Rebase onto the latest dev branch

@vanhauser-thc
Member

Thanks for looking into this!
Optimizations that help one target in the benchmark might make things worse for others, so it will be interesting to see what the verdict is. I can imagine, though, that your changes will be beneficial overall, let's see.
I have rights to start FuzzBench runs, so I initiated one for your PR there.

@vanhauser-thc
Member

These are the results I extracted from the data of the FuzzBench run:

bloaty_fuzz_target.TXT
aflplusplus_recent 0.996
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.997

curl_curl_fuzzer_http.TXT
aflplusplus_recent 0.999
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.998

freetype2_ftfuzzer.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 0.999
aflplusplus_hfdict_v1 1.000

harfbuzz_hb.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 1.000

jsoncpp_jsoncpp_fuzzer.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 1.000

lcms_cms_transform_fuzzer.TXT
aflplusplus_recent 0.926
aflplusplus_hfdict_v2 0.985
aflplusplus_hfdict_v1 1.000

libjpeg.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.999

libpcap_fuzz_both.TXT
aflplusplus_recent 0.981
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.989

libpng_libpng_read_fuzzer.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.999

libxml2_xml.TXT
aflplusplus_recent 0.997
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.996

libxslt_xpath.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.999

mbedtls_fuzz_dtlsclient.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 0.990
aflplusplus_hfdict_v1 0.998

openssl_x509.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.999

openthread_ot.TXT
aflplusplus_recent 0.998
aflplusplus_hfdict_v2 0.943
aflplusplus_hfdict_v1 1.000

proj4_proj_crs_to_crs_fuzzer.TXT
aflplusplus_recent 0.937
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.941

re2_fuzzer.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.999

sqlite3_ossfuzz.TXT
aflplusplus_recent 0.993
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.981

systemd_fuzz.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 0.999
aflplusplus_hfdict_v1 0.999

vorbis_decode_fuzzer.TXT
aflplusplus_recent 0.998
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 1.000

woff2_convert_woff2ttf_fuzzer.TXT
aflplusplus_recent 1.000
aflplusplus_hfdict_v2 0.997
aflplusplus_hfdict_v1 0.998

zlib_zlib_uncompress_fuzzer.TXT
aflplusplus_recent 0.998
aflplusplus_hfdict_v2 0.998
aflplusplus_hfdict_v1 1.000


RESULTS.TXT
aflplusplus_recent 0.998
aflplusplus_hfdict_v2 1.000
aflplusplus_hfdict_v1 0.999

So overall it looks like it gives a marginal improvement, which is already good!

@vanhauser-thc
Member

@am009 ping

@am009
Contributor Author

am009 commented Jul 17, 2025

> @am009 ping

I have been quite busy with other things these days. I will get back to this on Sunday.
