
Dos and Don’ts of Machine Learning

in Computer Security
Daniel Arp, Erwin Quiring, Feargus Pendlebury, Alexander Warnecke,
Fabio Pierazzi, Christian Wressnegger, Lorenzo Cavallaro, Konrad Rieck
USENIX Security 2022
Machine Learning already solved
many problems in computer security

Unfortunately not…

2
Motivation—Historical Examples

Network intrusion detection: The base rate fallacy


• Intrusion detectors should have low false positive rates (FPR)
• A ‘low’ FPR often still corresponds to a large number of false positives, because benign events vastly outnumber attacks
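The arithmetic behind the base-rate fallacy can be sketched directly with Bayes' rule; the numbers below are illustrative, not taken from Axelsson's study:

```python
# Base-rate fallacy: even a 'low' FPR yields mostly false alarms
# when intrusions are rare. All numbers here are illustrative.

def precision(tpr, fpr, base_rate):
    """Fraction of raised alerts that are true intrusions (Bayes' rule)."""
    tp = tpr * base_rate            # true alarms per event
    fp = fpr * (1.0 - base_rate)    # false alarms per event
    return tp / (tp + fp)

# A detector with 99% TPR and a seemingly low 1% FPR, when only
# 1 in 10,000 events is an intrusion:
p = precision(tpr=0.99, fpr=0.01, base_rate=1e-4)
print(f"{p:.4f}")  # ~0.0098: fewer than 1% of alerts are real intrusions
```

With a 50/50 base rate the same detector would look excellent, which is exactly why the rarity of attacks must enter the evaluation.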

Android malware detection: Spatio-temporal bias inflating performance


• Models trained with access to ‘future’ information
• Unrealistic class balance inflates performance
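A time-aware train/test split in the spirit of TESSERACT can be sketched as follows; the field names and dates are hypothetical:

```python
# Sketch of a temporal split: train only on samples observed before
# the test window, so the model never sees 'future' information.
from datetime import date

samples = [
    {"app": "a", "seen": date(2019, 1, 10), "label": 0},
    {"app": "b", "seen": date(2019, 6, 2),  "label": 1},
    {"app": "c", "seen": date(2020, 3, 15), "label": 0},
    {"app": "d", "seen": date(2020, 8, 1),  "label": 1},
]

split = date(2020, 1, 1)
train = [s for s in samples if s["seen"] < split]
test  = [s for s in samples if s["seen"] >= split]

# Every training sample strictly predates every test sample
assert max(s["seen"] for s in train) < min(s["seen"] for s in test)
print(len(train), len(test))  # 2 2
```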

Axelsson. The base-rate fallacy and the difficulty of intrusion detection. ACM TISSEC, 2000.
Pendlebury et al. TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time. USENIX Security, 2019.
4
Overview

1. Identification of common pitfalls


• 10 subtle issues affecting ML for security
• Recommendations for avoiding them

2. Survey on the prevalence of pitfalls


• Review of 30 top papers in security (2011–2020)
• Pitfalls are widespread

3. Case studies demonstrating impact of pitfalls


• Mobile malware detection
• Vulnerability discovery
• Source code authorship attribution
• Network intrusion detection

Important remark
This work should not be interpreted as a finger-pointing exercise. Every work mentioned as having pitfalls still makes important contributions, and we identify pitfalls in our own work as well.
8
ML Pipeline and Pitfalls

Data Collection and Labeling
• P1 Sampling bias
• P2 Label inaccuracy

System Design and Learning
• P3 Data snooping
• P4 Spurious correlations
• P5 Biased parameters

Performance Evaluation
• P6 Inappropriate baselines
• P7 Inappropriate measures
• P8 Base rate fallacy

Deployment and Operation
• P9 Lab-only evaluation
• P10 Inappropriate threat model

19
Prevalence Study

1. Paper Selection   2. Review Process   3. Author Feedback

Each pitfall is rated as either…
• present (possibly discussed by the authors)
• partly present (possibly discussed)
• not present
• unclear from text

Papers drawn from the top security venues, 2011–2020.

22
Prevalence Study

[Bar chart: for each of the ten pitfalls, the fraction of the 30 reviewed papers in which the pitfall is present, partly present, or discussed.]

22
Pitfalls are prevalent even in top research!
Impact Analysis

Android Malware Detection
• P1: Sampling Bias
• P4: Spurious Correlations
• P7: Inappropriate Performance Measures

Network Intrusion Detection
• P6: Inappropriate Baselines
• P9: Lab-Only Evaluation

Authorship Attribution
• P1: Sampling Bias
• P4: Spurious Correlations

Vulnerability Discovery
• P2: Label Inaccuracy
• P4: Spurious Correlations
• P6: Inappropriate Baselines

23
Impact Study: Mobile Malware Detection P1: Sampling Bias
P4: Spurious Correlations
P7: Inappropriate Performance Measures

What is the problem?


• Merging of data from different sources leads to sampling bias
• Different origins of malware and benign apps can introduce unwanted shortcuts

[Figure: for apps grouped by their number of AV detections (0–25), the probability of originating from the Google Play Store, Chinese markets, or other origins. Apps with no detections stem ≈80% from Google Play, while clearly malicious apps stem ≈70% from the other sources.]
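The resulting shortcut can be sketched with synthetic counts: assuming, hypothetically, that most benign apps come from Google Play while most malware comes from other markets, a trivial 'detector' that only inspects the origin already scores high without looking at app behaviour at all:

```python
# Illustration with made-up counts (label 1 = malware): merging
# benign apps mostly from Google Play with malware mostly from
# other markets lets a classifier key on origin instead of behaviour.

data = [("play", 0)] * 800 + [("play", 1)] * 50 + \
       [("other", 0)] * 200 + [("other", 1)] * 450

# Degenerate detector: flag everything that is not from Google Play
correct = sum((origin != "play") == bool(label) for origin, label in data)
accuracy = correct / len(data)
print(f"{accuracy:.2f}")  # 0.83 despite ignoring app content entirely
```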

Allix et al. AndroZoo: collecting millions of Android apps for the research community. ACM MSR, 2016.
Arp et al. DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket. NDSS, 2014.
24
Impact Study: Mobile Malware Detection P1: Sampling Bias
P4: Spurious Correlations
P7: Inappropriate Performance Measures

What is the impact?


• Comparison on datasets with (D1) and without (D2) the artifact
• Training of SVM on two different feature sets

True positive rate of an SVM trained on each feature set:

  Feature set    With artifact (D1)    Without artifact (D2)    Change
  Drebin         0.96                  0.85                     −11%
  Opseqs         0.88                  0.73                     −17%

Results
• Experimental results show how sampling bias affects results (P1)
• The URL "play.google.com" is among the top features in D1 (P4)
• Using accuracy as the performance measure would have underestimated the bias (P7)
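A rough sketch of why accuracy can hide such a drop (P7): with an assumed 90/10 benign/malware split (the class balance here is illustrative, not from the study), an 11-point fall in true positive rate moves accuracy by only about one point:

```python
# On an imbalanced test set, accuracy is dominated by the majority
# (benign) class, so a large TPR drop barely registers. The 90/10
# class balance and the 0.99 TNR below are assumptions.

def accuracy(tpr, tnr, malware_frac):
    """Overall accuracy from per-class rates and class balance."""
    return tpr * malware_frac + tnr * (1.0 - malware_frac)

# TPR drops from 0.96 (D1) to 0.85 (D2); TNR assumed constant
acc_d1 = accuracy(tpr=0.96, tnr=0.99, malware_frac=0.10)
acc_d2 = accuracy(tpr=0.85, tnr=0.99, malware_frac=0.10)
print(f"{acc_d1:.3f} vs {acc_d2:.3f}")  # 0.987 vs 0.976
```

Reporting TPR and FPR separately (or precision/recall) makes the degradation visible where a single accuracy number does not.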

25
Dos and Don’ts of Machine Learning in Computer Security
• We identify 10 subtle pitfalls affecting the field
• Find that they are prevalent throughout top research
• Demonstrate their impact through case studies

Updates on pitfalls and recommendations:


• https://dodo-mlsec.org/