Abstract
JavaScript (JS), a platform-independent programming language, has remained the most popular language for years. However, the JavaScript engines that web browsers widely use to interpret JS code have become the most common targets for attackers, so ensuring the security and reliability of JS engines is critical. Fuzzing is a simple yet effective method for uncovering vulnerabilities. However, existing JS fuzzers focus mainly on designing effective mutation mechanisms that generate diverse and valid seeds, and they often overlook the importance of the initial seed corpus that drives the fuzzing process. In this paper, we performed extensive experiments to systematically evaluate the impact of seed selection on fuzzing JavaScript engines. In particular, we investigate seed selection along three main dimensions: the sources from which seeds are collected (e.g., CVE PoCs, regression tests), the number and size of the seeds, and a set of code properties of interest. Our major findings reveal that the source of the seeds can significantly affect fuzzing effectiveness (in particular, CVE PoCs perform significantly better than the other types of seeds), and that seed files containing the code structures of interest lead existing fuzzers to achieve superior results in terms of both code coverage and the number of unique crashes identified. Inspired by these observations, we devised a simple heuristic to prioritize JavaScript files when selecting a seed corpus. Our experiments show that, when driven by the seed corpus selected this way, an existing state-of-the-art fuzzer achieves significantly higher code coverage and identifies more crashes.
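To make the idea of prioritizing seed files concrete, here is a minimal sketch of a ranking function that orders candidate JS files first by where they were collected and then by how many "code properties of interest" they contain. This is not the authors' heuristic: the source ordering beyond "CVE PoCs first", the regex-based feature patterns, the `Seed` type, and the `prioritize` and `feature_score` names are all hypothetical placeholders chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical source ranking (lower = higher priority). Only "CVE PoCs first"
# reflects the reported finding; the order of the other sources is assumed.
SOURCE_PRIORITY = {"cve_poc": 0, "regression_test": 1, "other": 2}

# Hypothetical placeholder "code properties of interest". These regexes are
# NOT the properties studied in the paper; swap in your own feature detectors.
FEATURE_PATTERNS = [
    re.compile(r"\bclass\b"),
    re.compile(r"\bnew\s+Proxy\b"),
    re.compile(r"\bArrayBuffer\b"),
    re.compile(r"\bfor\s*\("),
]


@dataclass
class Seed:
    name: str    # file name of the candidate seed
    source: str  # where the seed was collected, e.g. "cve_poc"
    code: str    # JavaScript source text


def feature_score(code: str) -> int:
    """Count how many distinct placeholder features occur in the file."""
    return sum(1 for pattern in FEATURE_PATTERNS if pattern.search(code))


def prioritize(seeds: list[Seed], budget: int) -> list[Seed]:
    """Rank seeds by source priority, then by feature score (descending),
    then by name for a deterministic order, and keep the first `budget`."""
    unknown_rank = len(SOURCE_PRIORITY)  # unknown sources go last
    ranked = sorted(
        seeds,
        key=lambda s: (
            SOURCE_PRIORITY.get(s.source, unknown_rank),
            -feature_score(s.code),
            s.name,
        ),
    )
    return ranked[:budget]


if __name__ == "__main__":
    corpus = [
        Seed("a.js", "regression_test", "class A {}; for (;;) break;"),
        Seed("b.js", "cve_poc", "var x = 1;"),
        Seed("c.js", "cve_poc", "class B {}; new Proxy({}, {});"),
    ]
    print([s.name for s in prioritize(corpus, 2)])  # ['c.js', 'b.js']
```

A lexicographic sort key keeps the policy easy to audit: the seed source always dominates, and feature counts only break ties within a source. A real implementation would more likely detect structures on a parsed AST rather than with regexes.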
Data Availability Statement
The collected seeds and the experimental analysis results are available in the Git repository at https://github.com/CGCL-codes/JSFuzz.
Notes
A fuzzer usually performs a dry run on the seed corpus to obtain initial information.
Acknowledgements
We sincerely thank the editor for their help in reviewing this paper and all anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Grant No. 62002125) and by the Young Elite Scientists Sponsorship Program by CAST (Grant No. 2021QNRC001).
Additional information
Communicated by: Tingting Yu.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wen, M., Wang, Y., Xia, Y. et al. Evaluating seed selection for fuzzing JavaScript engines. Empir Software Eng 28, 133 (2023). https://doi.org/10.1007/s10664-023-10340-9
Accepted:
Published:
DOI: https://doi.org/10.1007/s10664-023-10340-9