Reviewer Integration and Performance Measurement for Malware Detection

Miller, Brad; Kantchelian, Alex; Tschantz, Michael Carl; Afroz, Sadia; Bachwani, Rekha; Faizullabhoy, Riyaz; Huang, Ling; Shankar, Vaishaal; Wu, Tony; Yiu, George; Joseph, Anthony D.; Tygar, J. D.

doi:10.1007/978-3-319-40667-1_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9721)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment


Abstract

We present and evaluate a large-scale malware detection system that integrates machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system’s ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778 GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries that were undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the time the training data is first seen with that of training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
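The abstract reports detection at a fixed false positive rate (72% at 0.5%) and a daily budget of 80 reviews. The sketch below shows, under stated assumptions, one generic way those two ideas fit together: pick a score threshold from benign scores so the false positive rate stays at or below the target, measure detection at that threshold, and send the samples whose scores sit closest to the threshold to reviewers. The selection rule here is plain uncertainty sampling, swapped in for illustration; it is not necessarily the query strategy the authors use, and the function names and synthetic scores are hypothetical.

```python
import math
import random
from typing import List, Sequence


def threshold_at_fpr(benign_scores: Sequence[float], target_fpr: float) -> float:
    """Pick a threshold so that at most `target_fpr` of benign samples
    score strictly above it (and would be flagged as malicious)."""
    scores = sorted(benign_scores, reverse=True)  # highest score first
    # Number of false positives the target rate allows, capped to a valid index.
    allowed_fp = min(math.floor(target_fpr * len(scores)), len(scores) - 1)
    # Only benign scores ranked above position `allowed_fp` exceed this value,
    # so at most `allowed_fp` benign samples are flagged (fewer with ties).
    return scores[allowed_fp]


def detection_rate(malicious_scores: Sequence[float], threshold: float) -> float:
    """Fraction of malicious samples scoring strictly above the threshold."""
    return sum(s > threshold for s in malicious_scores) / len(malicious_scores)


def select_for_review(unlabeled_scores: Sequence[float], threshold: float,
                      budget: int) -> List[int]:
    """Uncertainty sampling (hypothetical stand-in): return the indices of the
    `budget` samples whose scores lie closest to the decision threshold."""
    order = sorted(range(len(unlabeled_scores)),
                   key=lambda i: abs(unlabeled_scores[i] - threshold))
    return order[:budget]


if __name__ == "__main__":
    rng = random.Random(0)
    benign = [rng.gauss(0.0, 1.0) for _ in range(10_000)]     # synthetic scores
    malicious = [rng.gauss(3.0, 1.0) for _ in range(2_000)]   # synthetic scores
    thr = threshold_at_fpr(benign, target_fpr=0.005)          # 0.5% FPR
    print(f"threshold={thr:.3f} detection={detection_rate(malicious, thr):.3f}")
    todays_batch = [rng.gauss(1.5, 1.5) for _ in range(5_000)]
    to_review = select_for_review(todays_batch, thr, budget=80)  # 80 reviews/day
    print(f"sending {len(to_review)} samples to reviewers")
```

Fixing the false positive rate first and then reporting detection mirrors how the abstract states its results; the review step only decides which samples receive labels, so the reviewed samples can then be added to the next training round.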

B. Miller and G. Yiu—Primarily contributed while at UC Berkeley.

R. Bachwani—Primarily contributed while at Intel.


Notes

1. http://secml.cs.berkeley.edu/detection_platform/.
2. In particular, we include the following vendors: AVG, Antiy-AVL, Avast, BitDefender, CAT-QuickHeal, ClamAV, Comodo, ESET-NOD32, Emsisoft, F-Prot, Fortinet, GData, Ikarus, Jiangmin, K7AntiVirus, Kaspersky, McAfee, McAfee-GW-Edition, Microsoft, Norman, Panda, SUPERAntiSpyware, Sophos, Symantec, TheHacker, TotalDefense, TrendMicro, TrendMicro-HouseCall, VBA32, VIPRE, ViRobot and nProtect.
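The abstract contrasts training labels available when a sample is first seen with labels obtained months later. A minimal sketch of time-consistent labeling, assuming each binary has a history of scan results from the vendors listed in Note 2: a label "as of" a given date uses only the most recent scan at or before that date. The detection-count rule (`MIN_DETECTIONS = 4`), the `Scan` layout, and the function name are hypothetical choices for illustration, not the authors' labeling procedure.

```python
from datetime import datetime
from typing import FrozenSet, List, Optional, Tuple

# Vendors listed in Note 2; only their verdicts count toward a label.
VENDORS = frozenset({
    "AVG", "Antiy-AVL", "Avast", "BitDefender", "CAT-QuickHeal", "ClamAV",
    "Comodo", "ESET-NOD32", "Emsisoft", "F-Prot", "Fortinet", "GData",
    "Ikarus", "Jiangmin", "K7AntiVirus", "Kaspersky", "McAfee",
    "McAfee-GW-Edition", "Microsoft", "Norman", "Panda", "SUPERAntiSpyware",
    "Sophos", "Symantec", "TheHacker", "TotalDefense", "TrendMicro",
    "TrendMicro-HouseCall", "VBA32", "VIPRE", "ViRobot", "nProtect",
})

# Hypothetical scan record: (scan time, names of vendors that flagged the binary).
Scan = Tuple[datetime, FrozenSet[str]]

# Hypothetical labeling rule: malicious if at least this many listed vendors detect it.
MIN_DETECTIONS = 4


def label_as_of(scans: List[Scan], as_of: datetime,
                min_detections: int = MIN_DETECTIONS) -> Optional[bool]:
    """Label a binary using only scan results available at `as_of`.

    Returns True (malicious), False (benign), or None when the binary had
    not been scanned yet, i.e. no label existed at that time."""
    available = [scan for scan in scans if scan[0] <= as_of]
    if not available:
        return None
    _, detected_by = max(available, key=lambda scan: scan[0])  # most recent scan
    # Ignore vendors outside the Note 2 list.
    return len(detected_by & VENDORS) >= min_detections
```

For a model trained at time `t`, a time-consistent evaluation would label its training data with `label_as_of(scans, t)`, whereas a retrospective evaluation would pass a much later date; binaries whose labels differ between the two calls are the ones that produce the measurement gap the abstract describes.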


Author information

Authors and Affiliations

Google Inc., Mountain View, USA
Brad Miller
UC Berkeley, Berkeley, USA
Alex Kantchelian, Riyaz Faizullabhoy, Vaishaal Shankar, Tony Wu, Anthony D. Joseph & J. D. Tygar
International Computer Science Institute, Berkeley, USA
Michael Carl Tschantz & Sadia Afroz
Netflix, Los Gatos, USA
Rekha Bachwani
DataVisor, Mountain View, USA
Ling Huang
Pinterest, San Francisco, USA
George Yiu


Corresponding author

Correspondence to Brad Miller.

Editor information

Editors and Affiliations

IMDEA Software Institute, Pozuelo de Alarcón, Madrid, Spain
Juan Caballero
Mondragon University, Arrasate, Guipúzcoa, Spain
Urko Zurutuza
Universidad de Zaragoza, Zaragoza, Spain
Ricardo J. Rodríguez


About this paper

Cite this paper

Miller, B. et al. (2016). Reviewer Integration and Performance Measurement for Malware Detection. In: Caballero, J., Zurutuza, U., Rodríguez, R. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2016. Lecture Notes in Computer Science, vol 9721. Springer, Cham. https://doi.org/10.1007/978-3-319-40667-1_7


DOI: https://doi.org/10.1007/978-3-319-40667-1_7
Published: 12 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40666-4
Online ISBN: 978-3-319-40667-1
eBook Packages: Computer Science, Computer Science (R0)
