Detection of Recurring Software Vulnerabilities
by
Nam H. Pham
A thesis submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
Major: Computer Engineering
Program of Study Committee:
Tien N. Nguyen, Major Professor
Akhilesh Tyagi
Samik Basu
Iowa State University
Ames, Iowa
2010
c Nam H. Pham, 2010. All rights reserved.
Copyright ⃝
ii
TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iv
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
v
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vi
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vii
CHAPTER 1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
CHAPTER 2. BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
2.1
Terminology and Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
2.2
Bug Detection and Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
2.3
Vulnerability Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
CHAPTER 3. EMPIRICAL STUDY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
3.1
Hypotheses and Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
3.2
Representative Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
3.3
Results and Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
3.4
Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
CHAPTER 4. APPROACH OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . .
17
4.1
Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
18
4.2
Algorithmic Solution and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . .
19
CHAPTER 5. SOFTWARE VULNERABILITY DETECTION . . . . . . . . . . . . . . .
21
5.1
Type 1 Vulnerability Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
5.1.1
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
5.1.2
Feature Extraction and Similarity Measure . . . . . . . . . . . . . . . . . . .
21
iii
5.1.3
Candidate Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
5.1.4
Origin Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
Type 2 Vulnerability Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
5.2.1
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
5.2.2
Feature Extraction and Similarity Measure . . . . . . . . . . . . . . . . . . .
27
5.2.3
Candidate Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
CHAPTER 6. EMPIRICAL EVALUATION . . . . . . . . . . . . . . . . . . . . . . . . . .
31
5.2
6.1
Evaluation of Type 1 Vulnerability Detection . . . . . . . . . . . . . . . . . . . . . .
31
6.2
Evaluation of Type 2 Vulnerability Detection . . . . . . . . . . . . . . . . . . . . . .
33
6.3
Patching Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
37
CHAPTER 7. CONCLUSIONS AND FUTURE WORK . . . . . . . . . . . . . . . . . . .
39
APPENDIX A. ADDITIONAL TECHNIQUES USED IN SECURESYNC . . . . . . . . .
42
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
46
iv
LIST OF TABLES
Table 3.1
Recurring Software Vulnerabilities . . . . . . . . . . . . . . . . . . . . . . .
15
Table 6.1
Recurring Vulnerability Type 1 Detection Evaluation . . . . . . . . . . . . .
32
Table 6.2
Recurring Vulnerability Type 2 Detection Evaluation . . . . . . . . . . . . .
34
Table 6.3
Recurring Vulnerability Type 2 Recommendation . . . . . . . . . . . . . . .
37
Table A.1
Extracted Patterns and Features . . . . . . . . . . . . . . . . . . . . . . . . .
44
Table A.2
Feature Indexing and Occurrence Count . . . . . . . . . . . . . . . . . . . .
44
v
LIST OF FIGURES
Figure 3.1
Vulnerable Code in Firefox 3.0.3 . . . . . . . . . . . . . . . . . . . . . . . .
10
Figure 3.2
Vulnerable Code in SeaMonkey 1.1.12 . . . . . . . . . . . . . . . . . . . . .
10
Figure 3.3
Patched Code in Firefox 3.0.4 . . . . . . . . . . . . . . . . . . . . . . . . . .
11
Figure 3.4
Patched Code in SeaMonkey 1.1.13 . . . . . . . . . . . . . . . . . . . . . .
12
Figure 3.5
Recurring Vulnerability in NTP 4.2.5 . . . . . . . . . . . . . . . . . . . . . .
13
Figure 3.6
Recurring Vulnerability in Gale 0.99 . . . . . . . . . . . . . . . . . . . . . .
13
Figure 4.1
SecureSync’s Working Process . . . . . . . . . . . . . . . . . . . . . . . . .
17
Figure 4.2
Detection of Recurring Vulnerabilities . . . . . . . . . . . . . . . . . . . . .
19
Figure 5.1
xAST from Code in Figure 3.1 and Figure 3.2 . . . . . . . . . . . . . . . . .
22
Figure 5.2
xAST from Patched Code in Figure 3.3 and Figure 3.4 . . . . . . . . . . . .
22
Figure 5.3
xGRUMs from Vulnerable and Patched Code in Figure 3.5 . . . . . . . . . .
26
Figure 5.4
Graph Alignment Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . .
28
Figure 6.1
Vulnerable Code in Thunderbird 2.0.17 . . . . . . . . . . . . . . . . . . . .
33
Figure 6.2
Vulnerable Code in Arronwork 1.2 . . . . . . . . . . . . . . . . . . . . . . .
35
Figure 6.3
Vulnerable and Patched Code in GLib 2.12.3 . . . . . . . . . . . . . . . . . .
35
Figure 6.4
Vulnerable Code in SeaHorse 1.0.1 . . . . . . . . . . . . . . . . . . . . . . .
36
Figure 7.1
The SecureSync Framework . . . . . . . . . . . . . . . . . . . . . . . . . .
40
Figure A.1
The Simulink Model and Graph Representation . . . . . . . . . . . . . . . .
43
vi
ACKNOWLEDGEMENTS
I would like to take this opportunity to express my thanks to those who helped me with various
aspects of conducting research and the writing of this thesis.
First and foremost, Dr. Tien N. Nguyen for his guidance, patience and support throughout this
research and the writing of this thesis. Her insights and words of encouragement have often inspired me
and renewed my hopes for completing my graduate education. I would also like to thank my committee
members for their efforts and contributions to this work: Dr. Akhilesh Tyagi and Dr. Samik Basu.
I would additionally like to thank Tung Thanh Nguyen and Hoan Anh Nguyen for their comments
and support throughout the all stages of this thesis.
vii
ABSTRACT
Software security vulnerabilities are discovered on an almost daily basis and have caused substantial damage. It is vital to be able to detect and resolve them as early as possible. One of early detection
approaches is to consult with the prior known vulnerabilities and corresponding patches. With the hypothesis that recurring software vulnerabilities are due to software reuse, we conducted an empirical
study on several databases for security vulnerabilities and found several recurring and similar software
security vulnerabilities occurring in different software systems. Most of recurring vulnerabilities occur
in the systems that reuse source code, share libraries/APIs or reuse at a higher level of abstraction (e.g.
algorithms, protocols, or specifications).
The finding suggests that one could effectively detect and resolve some unreported vulnerabilities
in one software system by consulting the prior known and reported vulnerabilities in the other systems
that reuse/share source code, libraries/APIs, or specifications. To help developers with this task, we
developed SecureSync, a supporting tool to automatically detect recurring software vulnerabilities in
different systems that share source code or libraries, which are the most frequent types of recurring
vulnerabilities. SecureSync is designed to work with a semi-automatically built knowledge base of the
prior known/reported vulnerabilities, including the corresponding systems, libraries, and vulnerable
and patched code. To help developers check and fix the vulnerable code, SecureSync also provides
some suggestions such as adding missed function calls, adding checking of input/output of a function
call, replacing the operators in an expression, etc.
We conducted an evaluation on 60 vulnerabilities of with the totals of 176 releases in 119 opensource software systems. The result shows that SecureSync is able to detect recurring vulnerabilities
with high accuracy and to identify several vulnerable code locations that are not yet reported or fixed
even in mature systems.
1
CHAPTER 1. INTRODUCTION
New software security vulnerabilities are discovered on an almost daily basis [4]. Attacks against
computer software, which is one of the key infrastructures of our modern society and economies, can
cause substantial damage. For example, according to the CSI Computer Crime and Security Survey
2008 [30], 522 US companies were reported to have lost in total $3.5 billion per year due to the
attacks on critical business software applications. Many systems are developed, deployed, and used
over years that contain significant security weaknesses. Over 90% of security incidents reported to the
Computer Emergency Response Team (CERT) Coordination Center result from software defects [17].
Because late corrections of errors could cost up to 200 times as much as early correction [23], it is
vital to be able to detect and resolve them as early as possible. One of early detection approaches is
to consult with the prior known vulnerabilities and corresponding patches. In current practice, known
software security vulnerabilities and/or patches are often reported in public databases (e.g. National
Vulnerability Database (NVD) [17], Common Vulnerabilities and Exposures database (CVE) [4]), or
on public websites of specific software applications.
With the hypothesis that recurring software vulnerabilities are due to software reuse, we conducted
an empirical study on several databases for security vulnerabilities including NVD [17], CVE [4], and
others. We found several recurring and similar software security vulnerabilities occurring in different
software systems. Most of recurring vulnerabilities occur in the systems that reuse source code (e.g.
having the same code base, deriving from the same source, or being developed on top of a common
framework). That is, a system has some vulnerable code fragments. Then, such code fragments are
reused in other systems (e.g. by copy-and-paste practice, by branching/duplicating the code base and
then developing new versions or new systems). Patches in one of such systems were late propagated
into other systems. Due to the reuse of source code, the recurring vulnerable code fragments are
2
identical or highly similar in code structure and names of function calls, variables, constants, literals,
or operators. Let us call them Type 1.
Another type of recurring vulnerabilities occurs across different systems that share APIs/libraries
(Type 2). For example, such systems use the same function from a library and have the same errors in
API usages, e.g. missing or wrongly checking the input/output of the function; missing or incorrectly
ordering function calls, etc. The corresponding vulnerable code fragments on such systems tend to
misuse the same APIs in a similar manner, e.g., using the incorrect orders, missing step(s) in function
calls, missing the same checking statements, incorrectly using the same comparison expression, etc.
There are also some systems having recurring or similar vulnerabilities due to the reuse at a higher
level of abstraction. For example, such systems share the same algorithms, protocols, specifications,
standards, and then have the same bugs or programming faults. We call such recurring vulnerabilities
Type 3. The examples and detailed results of all three types will be discussed in Chapter 3.
This finding suggests that one could effectively detect and resolve some unreported vulnerabilities
in one software system by consulting the prior known and reported vulnerabilities in the other systems that reuse/share source code, libraries, or specifications. To help developers with this task, we
developed SecureSync, a supporting tool that is able to automatically detect recurring software vulnerabilities in different systems that share source code or libraries, which are the most frequent types
of recurring vulnerabilities. Detecting recurring vulnerabilities in systems reusing at higher levels of
abstraction will be investigated in future work.
SecureSync is designed to work with a semi-automatically built knowledge base of the prior known/reported vulnerabilities, including the corresponding systems, libraries, and vulnerable and patched code.
It could support detecting and resolving vulnerabilities in the two following scenarios:
1. Given a vulnerability report in a system A with corresponding vulnerable and patched code,
SecureSync analyzes the patches and stores the information in its knowledge base. Then, via Google
Code Search [6], it searches for all other systems B that share source code and libraries with A, checks
if B has the similarly vulnerable code, and reports such locations (if any).
2. Given a system X for analysis, SecureSync will check whether X reuses some code fragments
or libraries with another system Y in its knowledge base. Then if the shared code in X is sufficiently
3
similar to the vulnerable code in Y , SecureSync will report it to be likely vulnerable and point out the
vulnerable location(s).
In those scenarios, to help developers check and fix the vulnerable code, SecureSync also provides
some suggestions such as adding missed function calls, adding checking of input/output before or after
a call, replacing the operators in an expression, etc.
The key technical goals of SecureSync are how to represent vulnerable and patched code and how
to detect code fragments that are similar to vulnerable ones. We have developed two core techniques for
those problems to address two kinds of recurring vulnerabilities. For recurring vulnerabilities of Type
1 (reusing source code), SecureSync represents vulnerable code fragments as Abstract Syntax Tree
(AST)-like structures, with the labels of nodes representing both node types and node attributes. For
example, if a node represents a function call, its label will include the node type FUNCTION CALL, the
function name, and the parameter list. The similarity of code fragments is measured by the similarity
of structures of such labeled trees. Our prior technique, Exas [44, 49], is used to approximate structure
information of labeled trees and graphs by vectors and to measure the similarity of such trees via vector
distance.
For recurring vulnerabilities of Type 2 (systems sharing libraries), the traditional code clone detection techniques do not work in these cases because the similarity measurement must involve program
semantics such as API usages and relevant semantic information. SecureSync represents vulnerable
code fragments as graphs, with the nodes representing function calls, condition checking blocks (as
control nodes) in statements such as if, while, or for, and operations such as ==, !, or <. Labels of
nodes include their types and names. The edges represent the relations between nodes, e.g. control/data
dependencies, and orders of function calls. The similarity of such graphs is measured based on their
largest common subgraphs.
To improve the performance, SecureSync also uses several filtering techniques. For example, it
uses text-based filtering to keep only source files containing identifiers/tokens related to function names
appearing in vulnerable code in its knowledge base. It also uses locality-sensitive hashing (LSH) [20]
to perform fast searching for similar trees in its knowledge base: only trees having the same hash
code are compared to each other. SecureSync uses set-based filtering to find the candidates of Type 2:
4
only the graphs containing the nodes having the same/similar labels with the nodes of the graphs in its
knowledge base are kept as candidates for comparison.
We conducted an evaluation on 48 and 12 vulnerabilities of Type 1 and Type 2 with the totals of 51
and 125 releases, respectively, in 119 open-source software systems. The result shows that SecureSync
is highly accurate. It is able to correctly locate most of locations of vulnerable code in the systems that
share source code/libraries with the systems in its knowledge base. Interestingly, it detects 90 releases
having potentially vulnerable code locations (fragments having vulnerabilities) that, to the best of our
knowledge, have not been detected, reported, or patched yet even in mature systems. Based on the
recommendations from our tool, we produced the patches for such vulnerabilities and reported to the
developers of those systems. Some of such vulnerabilities and patches were actually confirmed. We
still wait for replies on others.
The contribution of this thesis includes:
1. An empirical study that confirms the existence of recurring/similar software vulnerabilities and
our aforementioned software reuse hypothesis, and provides us insights on their characteristics;
2. Two representations and algorithms to detect recurring vulnerabilities on systems sharing source
code and/or APIs/libraries;
3. SecureSync: An automatic prototype tool that detects recurring vulnerabilities and recommends
the resolution for them;
4. An empirical evaluation of our tool on real-world datasets of vulnerabilities and systems shows
its accuracy and usefulness.
The outline of the thesis is as follow: Chapter 2 discusses about the literature review. Chapter 3
reports our empirical study on recurring vulnerabilities. Chapter 4 and Chapter 5 present the overview
of our approach and detailed techniques. The empirical evaluation is in Chapter 6. Finally, conclusions
appear last in Chapter 7.
5
CHAPTER 2. BACKGROUND
First, we discuss about the terminology and concepts used within the thesis. Second, we present
related approaches in bug detection and localization. Finally, an overview of the current state of open
software security databases appears last in this chapter.
2.1
Terminology and Concepts
A system is a software product, program, module, or library under investigation (e.g. Firefox,
OpenSSL, etc). A release refers to a specific release/version of a software system (e.g. Firefox 3.0.5,
OpenSSL 0.9.8).
A software bug is the common term used to describe an error, flaw, mistake, failure, or fault in a
computer program or system that produces an incorrect or unexpected result, or causes it to behave in
unintended ways [13].
A patch is a piece of software designed to fix problems with, or update a computer program or
its supporting data. This includes fixing security vulnerabilities and other bugs, and improving the
usability or performance [11].
A vulnerability is an exploitable software fault occurring on specific release(s) of a system. For
example, CVE-2008-5023 reported a vulnerability in security checks on Firefox 3.0.0 to 3.0.4. A
software vulnerability can be caused by a software bug which may allow an attacker to misuse an
application.
A recurring vulnerability is a vulnerability that occurs and should be fixed/patched on at least two
different releases (of the same or different systems). This term also refers to a group of vulnerabilities
having the same causes on different systems/releases. Examples of recurring vulnerabilities are in
Chapter 3.
6
2.2 Bug Detection and Localization
Our research is closely related to static approaches to detect similar and recurring bugs. Static
approaches could be categorized into two types: rule/pattern-based and peer-based. The strategy in
pattern-based approaches is first to detect the correct code patterns via mining frequent code/usages, via
mining rules from existing code, or predefined rules. The anomaly usages deviated from such patterns
are considered as potential bugs. In contrast, a peer-based approach attempts to match the code fragment under investigation against its databases of cases and provides a fix recommendation if a match
occurs. Peer-based approach was chosen for SecureSync because a vulnerability must be detected early
even though it does not necessarily repeat frequently yet. It is desirable that a vulnerability is detected
even if it matches with one case in the past. SecureSync efficiently stores only features for each case.
Several bug finding approaches are based on mining of code patterns [32,38,55–57]. Sun et al. [54]
present a template- and rule-based approach to automatically propagate bug fixes. Their approach
supports some pre-defined templates/rules (e.g. orders of pairs of method calls, condition checking
around the calls) and requires a fix to be extracted/expressed as rules in order to propagate it. Other
tools also detect pre-defined, common bug patterns using syntactic pattern matching [26,32]. In contrast
to those pattern-based approaches, SecureSync is based on our general graph-based representation for
API-related code and its feature extraction that could support any vulnerable and corresponding patched
code.
JADET [56] and GrouMiner [46] perform mining object usage patterns and detect the violations
as potential bugs. Several approaches mine usage patterns in term of the orders of pairs of method
calls [18, 38, 57], or association rules [53, 55]. Error-handling bugs are detected via mining sequence
association rules [55]. Chang et al. [24] find patterns on condition nodes on program dependence graphs
and detect bugs involving negligent conditions. Hipikat [27] extracts lexical information while building
the project’s memories and recommends relevant artifacts. Song et al. [53] detect association rules
between six types of bugs from the project’s history for bug prediction. However, those approaches
focus only on specific sets of patterns and bugs (object usages [56], error-handling [55], condition
checking [24]). BugMem [35] uses a textual difference approach to identify changed texts and detects
similar bugs. SecureSync captures better the contexts of API usages with its graph representation,
7
thus, it could detect any type of API-related recurring vulnerabilities more precisely and generally
than specific sets of bug patterns. It is designed toward API better than GrouMiner [46] with data and
operator nodes, and their relations with function calls.
FixWizard [45] is a peer-based approach for detecting recurring bugs. It relies on code peers, i.e.
methods/classes with similar interactions in the same system, thus, cannot work across systems. Patch
Miner [12] finds all code snapshots with similar snippets (i.e. cloned code) to the fragment that was
fixed. CP-Miner [37] mines frequent subsequences of tokens to detect bugs caused by inconsistent
editing to cloned code. Jiang et al. [34] detect clone-related bugs via formulating context-based inconsistencies. Existing supports for consistent editing of cloned code are limited to interactive synchronization in editors such as CloneTracker [28]. Compared to clone-related bug detection approaches [34,37],
our detection for code-reused vulnerabilities could also handle non-continuous clones (e.g. source code
in Figure 3.1 and Figure 3.2). More importantly, our graph-based representation and feature extraction
are more specialized toward finding fragments with similar API usages, and they are more flexible to
support the detection of API-related bugs across systems.
Several approaches have been proposed to help users localize buggy code areas [22, 31, 36, 40, 42].
Some leverage the project’s historical information: the amount of changed LOC over the total in a time
period [42], frequently/recently modified/fixed modules [31], code co-changes and bug locality [36],
change and complexity metrics [41, 48, 52], social network among developers [21, 22, 51, 58], etc.
In software security, vulnerability prediction approaches that relied on projects’ components and
history include [29, 43]. Other researchers study the characteristics of vulnerabilities via discovery
rates [25], or time and efforts of patching [19]. Longstaff [39] defined a vulnerability classification
called Ease of Exploit Classification, based on the difficulty of exploitation. In general, none of existing software security approaches has studied recurring software security vulnerabilities and their
characteristics.
2.3 Vulnerability Databases
There are many publicly available vulnerability databases in the Internet including, but not limited
to, Common Vulnerabilities and Exposure (CVE) [4], Internet Security Systems (ISS) [7], National
8
Vulnerability Database (NVD) [17], The Open Source Vulnerability Database (ovsbd) [15]. There is
also a suite of selected open standards (The Security Content Automation Protocol - SCAP) that enumerate software flaws, security related configuration issues, and product names; measure systems to
determine the presence of vulnerabilities; and provide mechanisms to rank the results of these measurements in order to evaluate the impacts of the discovered security issues. SCAP standards [16] are
comprised of Common Vulnerabilities and Exposures (CVE) [4], Common Configuration Enumeration (CCE) [2], Common Platform Enumeration (CPE) [3], Common Vulnerability Scoring System
(CVSS) [5], Extensible Configuration Checklist Description Format (XCCDF) [14], and Open Vulnerability and Assessment Language (OVAL) [10].
9
CHAPTER 3. EMPIRICAL STUDY
3.1 Hypotheses and Process
Hypotheses. In this study, our research is based on the philosophy that similar code tends to have
similar properties. In the context of this thesis, the similar properties are software bugs, programming
flaws, or vulnerabilities. Due to software reuse, in reality, there exist many systems that have similar
code and/or share libraries/components. For example, they could be developed from a common framework, share some code fragments due to the copy-and-paste programming practice, use the same API
libraries, or implement the same algorithms or specifications, etc. Therefore, we make hypotheses that
(H1) there exist software vulnerabilities recurring in different systems, and (H2) one of the causes of
such existence is software reuse. In this study, we use the following important terms.
Analysis Process. To confirm those two hypotheses, we considered around 3,000 vulnerability
reports in several security databases: National Vulnerability Database (NVD [17]), Open Source Computer Emergency Response Team Advisories (oCERT) [9], Mozilla Foundation Security Advisories
(MFSA [8]), and Apache Security Team (ASF) [1]. Generally, each report describes a vulnerability,
thus, a recurring vulnerability would be described in multiple reports. Sometimes, one report (such as
in oCERT) describes more than one recurring vulnerabilities. Since it is impossible to manually analyze all those reports, we used a textual analysis technique to cluster them into different groups having
similar textual contents and manually read such groups. Interestingly, we found some groups which really report recurring vulnerabilities. To verify such recurring vulnerabilities and gain more knowledge
about them (e.g. causes and patches), we also collected and analyzed all available source code, bug
reports, discussions relevant to them. Let us discuss representative examples and then present detailed
results.
10
PRBool nsXBLBinding::AllowScripts() {
...
JSContext* cx = (JSContext*) context->GetNativeContext();
nsCOMPtr<nsIDocument> ourDocument;
mPrototypeBinding->XBLDocumentInfo()->GetDocument(getter_AddRefs(ourDocument));
/* Vulnerable code: allows remote attackers to bypass the protection mechanism for codebase
principals and execute arbitrary script via the -moz-binding CSS property in a signed
JAR file */
PRBool canExecute;
nsresult rv = mgr->CanExecuteScripts(cx, ourDocument->NodePrincipal(), &canExecute);
return NS_SUCCEEDED(rv) && canExecute;
}
Figure 3.1 Vulnerable Code in Firefox 3.0.3
PRBool nsXBLBinding::AllowScripts() {
...
JSContext* cx = (JSContext*) context->GetNativeContext();
nsCOMPtr<nsIDocument> ourDocument;
mPrototypeBinding->XBLDocumentInfo()->GetDocument(getter_AddRefs(ourDocument));
nsIPrincipal* principal = ourDocument->GetPrincipal();
if (!principal){
return PR_FALSE;
}
// Similarly vulnerable code
PRBool canExecute;
nsresult rv = mgr->CanExecuteScripts(cx, principal, &canExecute);
return NS_SUCCEEDED(rv) && canExecute;
}
Figure 3.2 Vulnerable Code in SeaMonkey 1.1.12
3.2 Representative Examples
Example 1. In CVE-2008-5023, it is reported that there is a vulnerability which “allows remote attackers to bypass the protection mechanism for codebase principals and execute arbitrary script via
the -moz-binding CSS property in a signed JAR file”. Importantly, this vulnerability recurs in different
software systems: all versions of Firefox 3.x before 3.0.4, Firefox 2.x before 2.0.0.18, and SeaMonkey
1.x before 1.1.13 have this vulnerability.
Figure 3.1 and Figure 3.2 show the vulnerable code fragments extracted from Firefox 3.0.3 and
SeaMonkey 1.1.12. Figure 3.3 and Figure 3.4 show the corresponding patched code in Firefox 3.0.4
and SeaMonkey 1.1.13. As we could see, the vulnerable code is patched by adding more checking
mechanisms via two functions GetHasCertificate and Subsumes. Further analyzing, we figure
11
PRBool nsXBLBinding::AllowScripts() {
....
JSContext* cx = (JSContext*) context->GetNativeContext();
nsCOMPtr<nsIDocument> ourDocument;
mPrototypeBinding->XBLDocumentInfo()->GetDocument(getter_AddRefs(ourDocument));
// PATCHED CODE --------------------PRBool canExecute;
nsresult rv = mgr->CanExecuteScripts(cx, ourDocument->NodePrincipal(), &canExecute);
if (NS_FAILED(rv) || !canExecute) {
return PR_FALSE;
}
PRBool haveCert;
doc->NodePrincipal()->GetHasCertificate(&haveCert);
if (!haveCert){
return PR_TRUE;
}
PRBool subsumes;
rv = ourDocument->NodePrincipal()->Subsumes(doc->NodePrincipal(), &subsumes);
return NS_SUCCEEDED(rv) && subsumes;
}
Figure 3.3 Patched Code in Firefox 3.0.4
out that the vulnerability is recurring due to the reuse of source code. In Figure 3.1 and Figure 3.2, the
vulnerable code fragments are highly similar. Because Firefox and SeaMonkey are developed from the
common framework Mozilla, they largely share source code, including the vulnerable code in those
fragments. Therefore, it causes a recurring vulnerability in those two systems.
This example is a representative example of the recurring vulnerabilities that we classify as Type
1 (denoted by RV1). The vulnerabilities of this type are recurring due to the reuse of source code i.e.
a code fragment in one system has some undetected flaws and is reused in another system. Therefore,
when the flaws are exploited as a vulnerability, the vulnerability is recurring in both systems. In general,
the reuse could be made via copy-and-paste practice, or via branching the whole code base to create a
new version of a system. In other cases, systems could be derived from the same codebase/framework,
but are later independently developed as a new product. Because of reuse, the vulnerable code tends to
be highly similar in texts (e.g. names of called functions, variables, names/values of literals, etc) and in
structure (e.g. the structure of statements, branches, expressions, etc). Due to this nature, those similar
features could be used to identify them.
A special case of Type 1 is related to different releases of a system (e.g. Firefox 2.x and 3.x in this
example). Generally, such versions/releases are developed from the same codebase, e.g. later versions
12
PRBool nsXBLBinding::AllowScripts() {
...
JSContext* cx = (JSContext*) context->GetNativeContext();
nsCOMPtr<nsIDocument> ourDocument;
mPrototypeBinding->XBLDocumentInfo()->GetDocument(getter_AddRefs(ourDocument));
nsIPrincipal* principal = ourDocument->GetPrincipal();
if (!principal){
return PR_FALSE;
}
// PATCHED CODE --------------------PRBool canExecute;
nsresult rv = mgr->CanExecuteScripts(cx, ourDocument->NodePrincipal(), &canExecute);
if (NS_FAILED(rv) || !canExecute) {
return PR_FALSE;
}
PRBool haveCert;
doc->NodePrincipal()->GetHasCertificate(&haveCert);
if (!haveCert){
return PR_TRUE;
}
PRBool subsumes;
rv = ourDocument->NodePrincipal()->Subsumes(doc->NodePrincipal(), &subsumes);
return NS_SUCCEEDED(rv) && subsumes;
}
Figure 3.4 Patched Code in SeaMonkey 1.1.13
are copied from earlier versions, thus, most likely have the same vulnerable code. When vulnerable
code is fixed in the later versions, it should also be fixed in the earlier versions. This kind of patching
is referred as back-porting.
Example 2. OpenSSL, an open source implementation of the SSL and TLS protocols, provides an
API library named EVP as a high-level interface to its cryptographic functions. As described in
EVP documentation, EVP has a protocol for signature verification, which could be used in the following procedure. First, EVP VerifyInit is called to initialize a verification context object. Then,
EVP VerifyUpdate is used to hash the data for verification into that verification context. Finally,
that data is verified against corresponding public key(s) via EVP VerifyFinal. EVP VerifyFinal
would return one of three values: 1 if the data is verified to be correct; 0 if it is incorrect; and -1 if
there is any failure in the verification process. However, the return value of -1 is overlooked by several
developers. In other words, they might understand that, EVP VerifyFinal would return only two
values: 1 and 0 for correct and incorrect verification. Therefore, in several systems, the flaw state-
13
static int crypto_verify() {
...
EVP VerifyInit(&ctx, peer->digest);
EVP VerifyUpdate(&ctx, (u_char *)&ep->tstamp, vallen + 12);
/* Vulnerable code: EVP\_VerifyFinal returns 1: correct, 0: incorrect, and -1: failure.
This expression is false for both 1 and -1, thus, verification failure is mishandled
as correct verification */
if (!EVP VerifyFinal(&ctx, (u_char *)&ep->pkt[i], siglen, pkey))
return (XEVNT_SIG);
...
}
Figure 3.5 Recurring Vulnerability in NTP 4.2.5
int gale_crypto_verify_raw(...) {
...
EVP VerifyInit(&context,EVP_md5());
EVP VerifyUpdate(&context,data.p,data.l);
for (i = 0; is_valid && i < key_count; ++i) {
... //Similarly vulnerable code
if (!EVP VerifyFinal(&context,sigs[i].p,sigs[i].l,key)) {
crypto_i_error();
is_valid = 0;
goto cleanup;
}
}
cleanup: EVP_PKEY_free(key);
}
Figure 3.6 Recurring Vulnerability in Gale 0.99
ment if (!EVP VerifyFinal(...)) is used to check for some error(s) in verification. Thus, when
EVP VerifyFinal returns -1, i.e. a failure occurs in the verification process, the control expression
(!EVP VerifyFinal(...))is false as in the case when 1 is returned. As a result, the program would
behave as if the verification is correct, i.e. it is vulnerable to this exploitation.
From CVE-2009-0021 and CVE-2009-0047, this programming flaw appeared in two systems NTP
and Gale using EVP library, and really caused a recurring vulnerability that “allows remote attackers
to bypass validation of the certificate chain via a malformed SSL/TLS signature for DSA and ECDSA
keys”. Figure 3.5 and Figure 3.6 show the corresponding vulnerable code that we found from NTP
4.2.5 and Gale 0.99. Despite detailed differences, both of them use the signature verification protocol
provided by EVP and incorrectly process the return value of EVP VerifyFinal by the aforementioned
if statement.
We classify this vulnerability into Type 2, i.e. API-shared/reused recurring vulnerability (denoted
14
by RV2). The vulnerabilities of this type occur on the systems that share APIs/libraries. Generally,
APIs should be used following a usage protocol specified by API designers. For example, the API
functions must be called in the correct orders; the input/output provided to/returned from API function
calls must be properly checked. However, developers could wrongly use such APIs, i.e. do not follow
the intended protocols or specifications. They could call the functions in an incorrect order, miss
an essential call, pass an unchecked/wrong-typed input parameter, or incorrectly handle the return
values. Since they reuse the same library in different systems, they could make such similar erroneous
usages, thus, create similar faulty code, and make their programs vulnerable to the same or similar
vulnerabilities. Generally, each RV2 is related to a misused API function or protocol.
Example 3. Besides those two types, the other identified recurring vulnerabilities are classified as
Type 3, denoted by RV3. Here is an example of type 3. According to CVE-2006-4339 and CVE2006-7140, two systems OpenSSL 0.9.7 and Sun Solaris 9 “when using an RSA key with exponent 3,
removes PKCS-1 padding before generating a hash, which allows remote attackers to forge a PKCS
#1 v1.5 signature that is signed by that RSA key and prevents libike from correctly verifying X.509 and
other certificates that use PKCS #1”. Those two systems realize the same RSA encryption algorithm,
even though with different implementations. Unfortunately, the developers of both systems make the
same mistake in their corresponding implementations of that algorithm, i.e. “removes PKCS-1 padding
before generating a hash”, thus make both systems vulnerable to the same exploitation.
Generally, recurring vulnerabilities of Type 3 occur in the systems with the reuse of artifacts at a
higher level of abstraction. For example, they could implement the same algorithms, specifications, or
same designs to satisfy the same requirements. Then, if their developers made the same implementation
mistakes, or the shared algorithms/specifications had some flaws, the corresponding systems would
have the same or similar vulnerabilities. However, unlike Type 1 and Type 2, vulnerable code of Type
3 is harder to recognize/localize/match in those systems due to the wide varieties of implementation
choices and differences in design, architecture, programming language, etc among systems.
15
Database
NVD
oCERT
MFSA
ASF
TOTAL
Report
2,598
30
103
234
2,965
Group
151
30
77
59
299
RV
143
34
77
59
313
RV1
74
18
77
59
228
RV2
36
14
0
0
50
RV3
33
2
0
0
35
Table 3.1 Recurring Software Vulnerabilities
3.3 Results and Implications
Table 3.1 summarizes the result of our study. Column Report shows the total number of vulnerability reports we collected and column Group shows the total number of groups of reports about
recurring vulnerabilities we manually analyzed in each database. Column RV is the number of identified
recurring vulnerabilities. The last three columns display the numbers for each type. The result confirms
our hypotheses H1 and H2: there exist many recurring vulnerabilities in different systems (see column
RV), and those vulnerabilities recur due to the reuse of source code (RV1), APIs/libraries (RV2), and
other artifacts at higher levels of abstraction, e.g. algorithms, specifications (RV3). Note that, each
group in oCERT contains only one report, however, each report in oCERT generally describes several
vulnerabilities, and many of them are recurring. Thus, the number in column RV is larger than that in
column Group.
The result also shows that the numbers of RV1s (source code-reused recurring vulnerabilities) and
RV2s (API-reused) are considerable. All vulnerabilities reported on Mozilla and Apache are RV1
because Mozilla and Apache are two frameworks on which the systems in analysis are developed.
Therefore, such systems share a large amount of code including vulnerable code fragments. Recurring
vulnerabilities of Type 3 (RV3s) are less than RV1s and RV2s partly because the chance that developers
make the same mistakes when implementing an algorithm might be less than the chance that they create
a flaw in source code or misuse libraries in similar ways. Moreover, the systems sharing designs or
algorithms might be not as many as the ones reusing source code and libraries.
Implications. The study confirms our hypotheses on recurring software vulnerabilities. Those
vulnerabilities are classified in three types based on the artifacts that their systems reuse. This finding
suggests that we could use the knowledge of prior known vulnerabilities in reported systems to detect
16
and resolve not-yet-reported vulnerabilities recurring in other systems/releases that reuse the related
source code/libraries/algorithms, etc.
The study also provides some insights about the characteristics of vulnerable code of Types 1 and
2. While Type 1 vulnerable code is generally similar in texts and structure, Type 2 vulnerable code
tends to have similar method calls, and similar input checking and output handling before and after
such calls. Those insights are used in our detection and resolution of recurring vulnerabilities.
3.4
Threats to Validity
Public vulnerability databases used in our research could be incomplete because some of vulnerabilities are not disclosed or do not have patches available to the public yet. Since our empirical study is
based on the reported vulnerabilities with available patches, it results might be affected. Furthermore,
recurring vulnerabilities in our study are examined and identified by human beings. Therefore, the
result could be biased due to human subjective views and mistakes.
17
CHAPTER 4. APPROACH OVERVIEW
We have developed SecureSync, an automatic tool to support the detection and resolution recommendation for RV1s and RV2s. The tool builds a knowledge base of the prior known/reported vulnerabilities and locates the vulnerable code fragments in a given system that are similar to the ones in its
knowledge base. The working process of SecureSync is illustrated in Figure 4.1.
Open Security
Vulnerability Database
Testing
Systems
Extracting vulnerability with
available patches
Text-based
Filtering
Sample vulnerable
and patched code
Source Code
Candidates
Source Code Parser
Source Code Parser
Candidate
xASTs and xGRUMs
Recommending patches
Extracted
xASTs and xGRUMs
Report Vulnerable
Source Code
Knowledge Base
Database
Sample
xASTs and xGRUMs
Feature Extraction and
Similarity Measurement
Figure 4.1 SecureSync’s Working Process
In order to build the knowledge base, SecureSync searches for vulnerabilities with available patches
in Open Security Vulnerability Databases. The vulnerable and patched samples are then extracted and
stored in the Knowledge Base Database as xASTs and xGRUMs. When a testing system arrives,
SecureSync performs text-based filtering to keep only the source files related to sample vulnerabilities
(i.e similar identifiers/tokens). Then, these source files are parsed to build a set of xASTs and xGRUMs.
Given candidates X and samples Y as a set of xASTs and xGRUMs extracted from the testing system
18
and the knowledge base respectively, SecureSync calculates the similarity between pairs of xASTs
(Type 1) and pairs of xGRUMs (Type 2) from X and Y and reports the candidates which are sufficiently
similar to the vulnerable sample and less similar to the corresponding patched one. SecureSync also
helps developers by pointing out the specific locations of vulnerable code and providing corresponding
patches derived from sample patches.
4.1 Problem Formulation
To build SecureSync, there are two main challenges: how to represent and measure the similarity
of RV1s and RV2s, and how to localize the recurring ones in different systems. In SecureSync, we
represent vulnerabilities via the features extracted from their vulnerable and patched code, and calculate
the similarity of those vulnerabilities via such features. Feature extraction and similarity measure
functions are defined differently for the detection of RV1s and RV2s, due to the differences in their
characteristics. The problem of detecting recurring vulnerabilities is formulated as follows.
Definition 1 (Feature and Similarity) Two functions F () and Sim() are called the feature extraction
and similarity measure functions for the code fragments. F (A) is called the feature set of a fragment
A. Sim(A, B) is the similarity measurement of two fragments A and B.
Definition 2 (Recurring Vulnerable Code) Given a vulnerable code fragment A and its corresponding patched code A′ . If a code fragment B is sufficiently similar to A and less similar to A′ , i.e.
Sim(B, A) ≥ σ and Sim(B, A′ ) < Sim(B, A), then B is considered as a recurring vulnerable code
fragment of A. σ is a chosen threshold.
A and A′ could be similar because A′ is modified from A. Thus, B could be similar to both A and
A′ . The second condition requires B to be more similar to vulnerable code than to patched code.
Definition 3 (Detecting Recurring Vulnerability) Given a knowledge base as a set of vulnerable and
patched code fragments K ={(A1 , A′1 ), (A2 , A′2 ), ..., (An , A′n )} and a program as a set of code fragments P ={B1 , B2 , ..., Bm }. Find fragment Bi (s) that is recurring vulnerable code of some fragment
Aj (s).
19
4.2 Algorithmic Solution and Techniques
The general process for detection of RV1s and RV2s is illustrated in Figure 4.2. First, SecureSync
produces candidates fragments from the program P under investigation (line 2). Then, each candidate is
compared against vulnerable and patched code of the vulnerabilities in the knowledge base K to find the
recurring ones (lines 3-4). Detected vulnerabilities are reported to the users with the recommendation.
1 function Detect(P, K, σ) //detect recurring vulnerability
2 C = Candidates(P, K) //produce candidates
3 for each fragment B ∈ C: //check against knowledge base for recurring
4
if ∃(A, A′ ) ∈ K : Sim(B, A) ≥ σ ∧ Sim(B, A′ ) < Sim(B, A)
5
ReportAndRecommend(B)
Figure 4.2 Detection of Recurring Vulnerabilities
This algorithm requires the following techniques:
Feature Extraction and Similarity Measure. For RV1s, SecureSync uses a tree-based representation, called extended AST (xAST), that incorporates textual and structural features of code fragments.
The similarity of fragments is computed based on the similarity of such trees via Exas, an approach for
structural approximation and similarity measure of trees and graphs [44]. For RV2s, SecureSync uses
a novel graph-based representation, called xGRUM. Each code fragment is represented as a graph, in
which nodes represent function calls, variables, operators and branching points of control statements
(e.g. if, while); and edges represent control/data dependencies between nodes. With this, SecureSync
could represent the API usage information relevant to the orders of function calls, the checking of input or handling of output of function calls. Then, the similarity of code fragments is measured by the
similarity of those xGRUMs based on their aligned nodes (Section 5).
Building Knowledge Base of Reported Vulnerabilities. We build the knowledge base for SecureSync using a semi-automated method. First, we access to vulnerability databases and manually
analyze each report to choose vulnerabilities. Then, using code search, we find the corresponding vulnerable and patched code for the chosen vulnerabilities. We use SecureSync to automatically produce
corresponding xASTs and xGRUMs from those collected code fragments as their features. Note that,
20
this knowledge building process could be fully automated if the vulnerability databases provide the
information on the vulnerable and corresponding patched code.
Producing Candidate Code Fragments. After having functions for feature extraction, similarity
measure, and the knowledge base, SecureSync produces code fragments from the program under investigation to find recurring vulnerable code using Definition 2. To improve the detection performance,
SecureSync uses a text-based filtering technique to keep for further processing only the files having
some tokens (i.e. words) identical or similar to the names of the functions in vulnerable code of the
knowledge base.
Recommending Patches. For Type 1 with the nature of source code reuse, the patch in the knowledge base might be applicable to the detected vulnerable code with little modification. Thus, SecureSync does recommendation by pointing out the vulnerable statements and the sample patch taken
from its knowledge base. For Type 2 with the nature of API usage, via its novel graph alignment algorithm, SecureSync suggests the addition of missed function calls, or the checking of input/output
before/after calls, etc.
21
CHAPTER 5. SOFTWARE VULNERABILITY DETECTION
5.1 Type 1 Vulnerability Detection
5.1.1 Representation
To detect Type 1 recurring vulnerabilities, SecureSync represents code fragments, including vulnerable and patched code in its knowledge base, via an AST-like structure, which we call extended AST
(xAST). An xAST is an augmented AST in which a node representing a function call, a variable, a literal, or an operator has its label containing the node’s type, the signature, the data type, the value, or the
token of the corresponding program entity. This labeling provides more semantic information for the
tree (e.g. two nodes of the same type of function call but different labels would represent different calls).
Figure 5.1 illustrates two xASTs of similar vulnerable statements in Figure 3.1 and Figure 5.2 represents the corresponding patch code. Two trees have mostly similar structures and nodes’ labels, e.g.
the nodes representing the function calls CanExecuteScripts, the variable of data type nsresult,
etc. Therefore, they similarly have the same patch. (For simplicity, the node types or parameter lists
are not drawn).
5.1.2 Feature Extraction and Similarity Measure
SecureSync considers feature set F (A) of a code fragment A is a set of xASTs, each represents
a statement of A. For example, vulnerable code fragments in Figure 3.1 have feature sets of sex and
eight xASTs, respectively. Then, the similarity of two fragments is measured via the similarity of
corresponding feature sets of xASTs.
SecureSync uses Exas [44] to approximate the xAST structures and measure their similarity. Using
Exas, each xAST or a set of xASTs T is represented by a characteristic vector of occurrence-counts of
22
a. xAST tree in FireFox
ASSIGN
nsresult
CanExecuteScripts
nsIScriptSecurityManager
JSContext
NodePrincipal
PRBool
nsCOMPtr<nsIDocument>
nsresult rv = mgr->CanExecuteScripts(cx, ourDocument->NodePrincipal(), &canExecute);
b. xAST tree in SeaMonkey
ASSIGN
nsresult
CanExecuteScripts
nsIScriptSecurityManager
JSContext
nsIPrincipal
PRBool
nsresult rv = mgr->CanExecuteScripts (cx, principal, &canExecute);
Figure 5.1 xAST from Code in Figure 3.1 and Figure 3.2
IF
OR
PRBool
NodePrincipal
RETURN
NOT
PRBool
nsIDocument
PR_FALSE
IF
GetHasCertificate
NOT
RETURN
PRBool
PRBool
PR_TRUE
NodePrincipal
nsCOMPtr<nsIDocument>
NodePrincipal
if (NS_FAILED(rv) || !canExecute) return PR_FALSE;
doc->NodePrincipal()->GetHasCertificate(&haveCert);
if (!haveCert) return PR_TRUE;
rv = ourDocument->NodePrincipal()->Subsumes(doc-> NodePrincipal(), &subsumes);
Figure 5.2 xAST from Patched Code in Figure 3.3 and Figure 3.4
nsIDocument
Subsumes
PRBool
23
its structural features Ex(T ). For trees, a structural feature is a sequence of labels of the nodes along a
limited length path. For example, both trees in Figure 5.1 has a feature [ASSIGN]-[nsresult] and a
feature [ASSIGN]-[CanExecuteScripts]-[PRBool].
The similarity of two fragments are measured based on the Manhattan distance of two corresponding Exas vectors of their feature sets:
Sim(A, B) = 1 −
|Ex(F (A)) − Ex(F (B))|
|Ex(F (A))| + |Ex(F (B))|
(5.1)
This formula normalizes vector distance with the vectors’ sizes. Thus, with the same threshold for
similarity, larger trees, which have larger vectors, are allowed to have more different vectors.
5.1.3 Candidate Searching
A vulnerable code fragment of Type 1 generally scatters in several non-consecutive statements (see
Figure 3.1 and 3.2). Thus, traditional code clone detection techniques could not handle well such
similar, non-consecutive fragments. To address that and to find the candidates of such vulnerable code
fragments, SecureSync compares every statement of P to vulnerable code statements in its knowledge
base and merges such statements into larger fragments. To improve searching, SecureSync uses two
levels of filtering.
Text-based Filtering. Text-based filtering aims at filtering out the source files that do not have any
code textually similar to vulnerable code in the knowledge base. For each file, SecureSync does lexical
analysis, and only keeps the files that contain the tokens/words (e.g. identifiers, literals) identical or
similar to the names in the vulnerable code (e.g. function/variable names, literals). This text-based
filtering is highly effective. For example, in our evaluation, after filtering Mozilla Firefox with more
than 6,000 source files, SecureSync keeps only about 100 files.
Structure-based Filtering. Structure-based filtering aims at keeping only the statements that
potentially have similar xAST structures to the vulnerable ones in knowledge base. To do this, SecureSync uses locality-sensitive hashing (LSH) [20]. LSH scheme provides the hash codes for the
vectors such that the more similar the two vectors are, the higher probability they would have the same
hash code [20]. SecureSync first parses each source file kept from the previous step into an xAST.
24
Then, for each sub-tree representing a statement S in the file, it extracts an Exas feature vector Ex(S).
To check whether statement S is similar to a statement T in the knowledge base, SecureSync compares
LSH hash codes of Ex(S) with those of Ex(T ). If Ex(S) and Ex(T ) have some common LSH-hash
code, they are likely to be similar vectors, thus, S and T tend to have similar xAST structures. For faster
processing, every statement T of the vulnerable code in knowledge base is pre-hashed into a hashing
table. Thus, if a statement S does not share any hash code in that hash table, it will be disregarded.
Candidate Producing and Comparing. After previous steps, SecureSync has a set of candidate
statements that potentially have similar xAST structures with some statement(s) in vulnerable code in
its knowledge base. SecureSync now merges consecutive candidate statements into larger code fragments, generally at the method level. Then, candidate code fragments will be compared to vulnerable
and patched code fragments in the knowledge base, using Definition 2 Section 4.1 and Formula 5.1.
Based on the LSH hash table, SecureSync compares each candidate B with only the code fragment(s)
A in the knowledge base that contain(s) the statements T s correspondingly having some common LSH
hash codes with the statements Ss of A.
5.1.4 Origin Analysis
When reusing source code, developers could make modifications to the identifiers/names of the
code entities. For example, the function ap proxy send dir filter in Apache HTTP Server 2.0.x
was renamed to proxy send dir filter in Apache HTTP Server 2.2.x. Because the features of
xASTs rely on names, such renaming could affect the comparison of code fragments in different systems/versions. This problem is addressed by an origin analysis process that provides the name mapping
between two versions of a system or two code-sharing systems. Using such mapping, when producing
the features of xASTs for candidate code fragments, SecureSync uses the mapped names, instead of
the names in the candidate code, thus, avoids the renaming problem. To map the names from such
two versions, currently, SecureSync uses an origin analysis technique in our prior work, OperV [47].
OperV models each software system by a project tree where each of its nodes corresponds to a program
element such as package, file, function, or statement. Then, it does origin analysis using a tree alignment algorithm that compares two project trees based on the similarity of the sub-trees and provides the
25
mapping of the nodes. Using OperV, SecureSync knows the mapped entities thus, is able to produce
the name mapping that is used in feature extraction.
5.2 Type 2 Vulnerability Detection
5.2.1 Representation
Type 2 vulnerabilities are caused by the misuse or mishandling of APIs. Therefore, we emphasize
on API usages to detect such recurring vulnerabilities. If a candidate code fragment B has similar
API usages to a vulnerable code fragment A, B is likely to have a recurring vulnerability with A. In
SecureSync, API usages are represented as graph-based models, in which nodes represent the usages
of API function calls, data structures, and control structures, and edges represent the relations or control/data dependencies between them. Our graph-based representation for API usages, called Extended
GRaph-based Usage Model (xGRUM), is as follows:
Definition 4 (xGRUM) Each extended graph-based usage model is a directed, labeled, acyclic graph
in which:
1. Each action node represents a function or method call;
2. Each data node represents a variable;
3. Each control node represents the branching point of a control structure (e.g. if, for, while,
switch);
4. Each operator node represents an operator (e.g. not, and, or);
5. An edge connecting two nodes x and y represents the control and data dependencies between x
and y; and
6. The label of an action, data, control, and operator node is the name, data type, or token of the
corresponding function, variable, control structure, or operator, along with the type of the corresponding node.
The rationale behind this representation is as follows. The usage of an API function or data structure
is represented as an action or data node. The order of two API function calls, e.g. x must be called
before y, is represented by an edge connecting the action nodes corresponding to the calls to x and
26
a. Vulnerable API usage in Gale
FOR
b. Vulnerable API usage in NTP
EVP_VerifyInit
EVP_md5
EVP_VerifyInit
EVP_VerifyUpdate
EVP_MD_CTX
EVP_VerifyUpdate
EVP_VerifyFinal
IF
EVP_PREY
NOT
crypto_i_error
EVP_VerifyInit
EVP_MD_CTX
EVP_VerifyUpdate
EVP_VerifyFinal
IF
EVP_PKEY_free
c. Patched API usage in NTP
EVP_MD_CTX
EVP_VerifyFinal
IF
NOT
RETURN
LEQ
integer 0
RETURN
Legends
action node
control dependency
data node
data dependency
control node
Figure 5.3 xGRUMs from Vulnerable and Patched Code in Figure 3.5
to y. The checking of input or output of API functions is modeled via the control and operator nodes
surrounding the action nodes of those function calls and via the edges between such control, operator
and action nodes.
Figure 5.3 partially shows three xGRUMs of two vulnerable code fragments in two figures 3.5 and
3.6 and one patched code fragment. (For better understanding, only the nodes/edges related to API
usages and the vulnerabilities are drawn). The xGRUMs have the action nodes representing function
calls EVP VerifyFinal, EVP VerifyInit, data node EVP MD CTX, control nodes IF, FOR, and operator nodes NOT, LEQ. An edge from EVP VerifyUpdate to EVP VerifyFinal represents both their
control dependency, (i.e. EVP VerifyUpdate is used before EVP VerifyFinal), and the data dependency: those two nodes share data via data node EVP MD CTX. The edge between EVP MD CTX and
EVP VerifyFinal shows that the corresponding variable is used as an input for EVP VerifyFinal
(as well as an output, since the variable is a reference). The edge from action node EVP VerifyFinal
to control node IF shows the control dependency: EVP VerifyFinal is called before the branching
point of that if statement. That is, condition checking occurs after the call.
Especially, operator node NOT represents the operator in the control expression !EVP VerifyFinal(...).
It has control dependencies with two nodes EVP VerifyFinal and IF. In Figure 5.3c, the control
expression is modified into EVP VerifyFinal(...)<=. Thus, that operator node NOT is replaced
by the operator LEQ and the data node integer 0 is added for the literal value zero. A literal is
27
modeled as a special data node, and its label is formed by its type and value. SecureSync models
only the literals of supported primitive data types (e.g. integer, float, char) and special values (e.g. 0, 1, -1, null, empty string). Prior work [24, 34] showed that bugs often occur at condition checking points of such special values. Currently, SecureSync uses intra-procedural data analysis to find the data dependencies between graph nodes. For example, the data dependency between
EVP VerifyInit, EVP VerifyUpdate, and EVP VerifyFinal are found via their connections to
the data node EVP MD CTX.
5.2.2 Feature Extraction and Similarity Measure
Each vulnerable code fragment A is represented by an xGRUM G. Extracted features of A represent the nodes of G that are relevant to misused APIs. Note that, not all nodes in G is relevant
to the misused API functions. For example, in Example 2, only EVP VerifyFinal is misused,
EVP VerifyInit and EVP VerifyUpdate are correctly used. Thus, the feature of the vulnerabil-
ity should emphasize on the action node EVP VerifyFinal, operator node NOT, control node IF, and
data node EVP MD CTX, and of course, on the edges, i.e. the control and data dependencies between
such nodes.
For RV2s, features are extracted from the comparison between two xGRUMs representing the vulnerable and patched code, respectively. SecureSync finds the nodes related to misused APIs based on
the idea that: if some program entities are related to the bug, they should be changed/affected by the fix.
Since SecureSync represents program entities and dependencies via labels and edges, changed/affected
entities are represented by the nodes having different labels or neighborhoods, or being added/deleted.
That is, the unchanged nodes between two xGRUMs of the vulnerable and patched code represent the
entities irrelevant to API misuse. Thus, sub-graphs containing changed nodes of those two xGRUMs
are considered as features of the corresponding vulnerability. To find the changed and unchanged
nodes, SecureSync uses the following approximate graph alignment algorithm Figure 5.4.
28
1 function Align(G, G′ , µ) //align and differ two usage models
2 for all u ∈ G, v ∈ G′ //calculate similarity of all nodes.
3
if label(u) = label(v)
4
sim(u, v) = 1 − |N (u) − N (v)|/(|N (u)| + |N (v)|)
5 M = MaximumWeightedMatching(U, U ′ , sim) //matching
6 for each (u, v) ∈ M :
7
if sim(u, v) < µ then M.remove((u, v)) //remove too low matches
8 return M
Figure 5.4 Graph Alignment Algorithm
5.2.2.1 Graph Alignment Algorithm
This algorithm aligns (i.e. maps) the nodes between two xGRUMs G and G′ based on their labels
and neighborhoods, then the aligned nodes could be considered as unchanged nodes and not affected
by the patch. The detailed algorithm is shown in Figure 5.4. For each node u into a graph, SecureSync
extracts an Exas vector N (u) to represent the neighborhood of u. The similarity of two nodes u ∈ G
and v ∈ G′ , sim(u, v), is calculated based on the vector distance of N (u) and N (v) as in Formula 5.1
if they have the same label (see lines 2-4), otherwise they have zero similarity. Then, the maximum
weighted matching with such similarity as weights is computed (line 5). Only matched nodes with
sufficiently high similarity are kept (lines 6-7) and returned as aligned nodes (line 8).
5.2.2.2 Feature Extraction and Similarity Measure
Using that algorithm, SecureSync extracts features as follows. It first parses the vulnerable and
corresponding patched code fragments A and A′ into two xGRUMs G and G′ . Then, it runs the graph
alignment algorithm to find the aligned nodes and considered them as unchanged. Unaligned nodes are
considered as changed, and the subgraphs formed by such nodes in G and G′ are put into the feature
sets F (A) and F (A′ ) for the current vulnerability.
Let us examine the code in Figure 3.5 and 3.6. Figure 5.3b and Figure 5.3c display the xGRUMs
G and G′ of vulnerable code and patched code fragments in NTP. The neighborhood structures of
two nodes labeled EVP VerifyInit in two graphs are identical, thus, they are 100% similar. The
similarity of two nodes labeled EVP VerifyFinal is less because they have different neighborhood
structures (one has a neighbor node NOT, one has LEQ). Therefore, after maximum matching, those
29
two EVP VerifyInit nodes are aligned, however, the nodes EVP VerifyFinal and other nodes
representing operators NOT, LEQ and literal integer 0 are considered as changed nodes. Then, each
feature set F (A) and F (A′ ) contains the corresponding sub-graph with those changed nodes in gray
color in Figures 5.3b and 5.3c.
Similarity Measure. Given a code fragment B with the corresponding xGRUM H. SecureSync
measures the similarity of B against A in the database based on the usages of API functions that are
(mis)used in A and (re)used in B. To find such API usages, SecureSync aligns H and F (A) which
contains the changed nodes representing the entities related to the misused API functions in A. This
alignment also uses the aforementioned graph alignment algorithm with a smaller similarity threshold
µ because the difference between B and A might be larger than that of A′ and A.
Assume that the sets of aligned nodes are M (A) and M (B). SecureSync builds two xGRUMs
U (A) and U (B) containing the nodes in M (A) and M (B) as well as their dependent nodes and edges
in G and H, respectively. Since M (A) and M (B) contain the nodes related to API functions that are
(mis)used in A and are (re)used in B, U (A) and U (B) will represent the corresponding API usages in
A and in B. Then, the similarity of A and B are measured based on the similarity of U (A) and U (B):
Sim(A, B) = 1 −
|Ex(U (A)) − Ex(U (B))|
|Ex(U (A))| + |Ex(U (B))|
(5.2)
This formula is in fact similar to Formula 5.1. The only different is that Ex(U (A)) and Ex(U (B)) are
Exas vectors of two xGRUMs, not xASTs. In Figures 5.3a and 5.3b, M (A) and M (B) will contain
the action node EVP VerifyFinal, operator node NOT and control node IF. Then, U (A) and U (B)
will be formed from them and their data/control-dependent nodes, such as EVP VerifyInit, EVP
VerifyUpdate, and EVP MD CTX. In Figure 5.3a, nodes EVP PKEY free, FOR, EVP PKEY, and crypto i error
are also included in U (A). Their similarity calculated based on Formula 5.2 is 90%.
5.2.3 Candidate Searching
Similarly to the detection of RV1s, SecureSync uses origin analysis to find renaming of API functions between systems and versions. Then, it also uses text-based filtering and set-based filtering to
keep only source files and xGRUMs that contain tokens and names similar to misused API functions
stored as the features in its database. After such filterings, SecureSync has a set of xGRUMs that
30
potentially contain similarly misused API functions with some xGRUMs in the vulnerable code in its
database. Then, the candidate xGRUMs of code fragments are compared to xGRUMs of vulnerable
and patched code fragments in the database, using Definition 2 and Formula 5.2. Matched candidates
are recommended for patching.
31
CHAPTER 6. EMPIRICAL EVALUATION
This chapter presents our evaluation of SecureSync on real-world software systems and vulnerabilities. The evaluation is separated into two experiments for the detection of type 1 and type 2 vulnerabilities. Each experiment has three steps: 1) selecting of vulnerabilities and systems for building
knowledge base, 2) investigating and running, and 3) analyzing results. Let us describe the details of
each experiment.
6.1
Evaluation of Type 1 Vulnerability Detection
Selecting. We chose three Mozilla-based open-source systems FireFox, Thunderbird and SeaMonkey for the evaluation of type 1 vulnerabilities because they are actively developed and maintained, and
have available source code, security reports, forums, and discussions. First, we contacted and obtained
the release and branch history of those three systems from Mozilla security team. For each system, we
chose a range of releases that are currently maintained and supported on security updates, with the total
of 51 releases for three systems. The numbers of releases of each system is shown in column Release
of Table 6.1.
We aimed to evaluate how SecureSync uses the knowledge base of vulnerabilities built from the
reports in some releases of FireFox to detect the recurring ones in Thunderbird and SeaMonkey, and
also in different FireFox’s release branches in which those vulnerabilities are not reported yet. Thus, we
selected 48 vulnerabilities reported in the chosen releases of FireFox with publicly available vulnerable
code and corresponding patched code to build the knowledge base for SecureSync. In the cases that a
vulnerability occurred and was patched in several releases of FireFox, i.e., there were several pairs of
vulnerable and patched code fragments, we chose only one pair to build its features in the database.
Running. With the knowledge base of those 48 vulnerabilities, we run SecureSync on 51 chosen
32
Systems
ThunderBird
SeaMonkey
FireFox
TOTAL
Release
12
10
29
51
DB
report
21
28
14
63
SS
report
33
39
22
94
X
in DB
19
26
14
59
X
new
11
10
5
26
X
3
3
3
9
Miss
in DB
2
2
0
4
Table 6.1 Recurring Vulnerability Type 1 Detection Evaluation
releases. For each release, SecureSync reported the locations of vulnerable code (if any). We analyzed
those results and considered a vulnerability v to be correctly detected in a release r if either 1) v is
officially reported about r; or 2) the code locations of r reported by SecureSync have the same or highly
similar programming flaws to the vulnerable code of v. We also sent the reports from SecureSync to
Mozilla security team for their confirmation. We did not count the cases when a vulnerability is reported
on the release or branch from which the vulnerable and patched code are used in the database since a
vulnerability is considered recurring if it occurs on different release branches.
Analyzing. Table 6.1 shows the analysis result. There are 21 vulnerabilities which had been
officially reported by MFSA [8] and verified by us as truly recurrings on Thunderbird (see column DB
report). However, SecureSync reports 33 RV1s (column SS report). The manual analysis confirms
that 19 of them (see Xin DB) were in fact officially reported (i.e. coverage of 19/21 = 90%) and that
11 RV1s are not-yet-reported and newly discovered ones (see Xnew). Thus, three cases are incorrectly
reported (column X) and two are missed (Miss in DB), given the precision of 91% (30/33). The
results on SeaMonkey are even better: coverage of 93% (26/28) and precision of 92% (36/39). The
detection of RV1 on different branches of FireFox is also quite good: coverage of 100% (14/14) and
precision of 86% (19/22).
The result shows that SecureSync is able to detect RV1s with high accuracy. Most importantly,
it is able to correctly detect the total of 26 not-yet-reported vulnerabilities in three subject systems.
Figure 6.1 shows a vulnerable code fragment in Thunderbird 2.0.17 as an example of such not-yet reported and patched vulnerabilities. Note that, this one is the same vulnerability presented in Example 1.
However, it is reported in CVE-2008-5023 for only FireFox and SeaMonkey and now it is revealed by
SecureSync on Thunderbird. Based on the recommendation from our tool, we had produced a patch
33
PRBool nsXBLBinding::AllowScripts() {
...
JSContext* cx = (JSContext*) context->GetNativeContext();
nsCOMPtr<nsIDocument> ourDocument;
mPrototypeBinding->XBLDocumentInfo()->GetDocument(getter_AddRefs(ourDocument));
nsIPrincipal* principal = ourDocument->GetPrincipal();
if (!principal) return PR_FALSE;
PRBool canExecute;
nsresult rv = mgr->CanExecuteScripts(cx, principal, &canExecute);
return NS_SUCCEEDED(rv) && canExecute; ...
}
Figure 6.1 Vulnerable Code in Thunderbird 2.0.17
and sent it to Mozilla security team. They had kindly confirmed this programming flaw and our provided patch. We are waiting for their confirmation on other vulnerabilities reported by SecureSync and
corresponding patches that we built based on its fixing recommendation.
6.2
Evaluation of Type 2 Vulnerability Detection
Selecting. Out of 50 RV2s identified in our empirical study, some have no publicly available
source code (e.g. commercial software), and some have no available patches. We found available
patches (vulnerable and corresponding patched code) for only 12 RV2s and used all of them to build
the knowledge base for SecureSync in this experiment. For each of those 12 RV2s, if it is related to
an API function m, we used Google Code Search to find all systems using m and randomly chose
1-2 releases of each system from the result returned by Google Code Search (it could return several
systems using m, and several releases for each system). Some of those releases have been officially
reported to have the RV2s in knowledge base, and some have not. However, we did not select the
releases containing the vulnerable and patched code that we already used for building the knowledge
base. Thus, in total, we selected 12 RV2s, 116 different systems, with 125 releases for this experiment.
Running and Analyzing. The running and analyzing is similar to the experiment for RV1s. Table 6.2 shows the analysis result. For example, there is an RV2 related to the misuse of two functions
seteuid and setuid in ftpd and ksu programs which “...do not check return codes for setuid calls,
which might allow local users to gain privileges by causing setuid to fail to drop privileges” (CVE-
34
API function related
to vulnerability
seteuid/setuid
ftpd
gmalloc
ObjectStream
EVP VerifyFinal
DSA verify
libcurl
RSA public decrypt
ReadSetOfCurves
DSA do verify
ECDSA verify
ECDSA do verify
TOTAL
System
Release
42
21
10
7
7
7
7
5
4
3
2
1
116
46
23
10
8
8
8
7
5
4
3
2
1
125
DB
report
4
0
3
3
5
3
3
1
1
1
0
0
24
SS
report
28
19
10
6
7
8
7
5
4
2
2
1
99
X
in DB
3
0
3
3
5
3
3
1
1
0
0
0
22
X
new
20
12
7
3
2
4
4
4
3
2
2
1
64
X
5
7
0
0
0
1
0
0
0
0
0
0
13
Miss
in DB
1
0
0
0
0
0
0
0
0
1
0
0
2
Table 6.2 Recurring Vulnerability Type 2 Detection Evaluation
2006-3084). We found 46 releases of 42 different systems using those two functions. 4 out of 46 are
officially reported in CVE-2006-3083 and CVE-2006-3084 (column DB report). In the experiment,
SecureSync reports 28 out of 46 releases vulnerable. Manually checking confirms 23/28 to be correct
and 5 are incorrect (giving precision of 82%). In the 23 correctly reported vulnerabilities, 3 are officially reported and 20 others are not-yet-reported. SecureSync missed only one officially reported case.
Similarly, for the RV2 related to API ftpd, it correctly detected 12 unreported releases and wrongly
reported on 7 releases. For other RV2s, it detects correctly in almost all releases.
Manual analyzing of all the cases that SecureSync missed, we found that they are due to the data
analysis. Currently, the implementation of data analysis in SecureSync is restricted to intra-procedural.
Therefore, it misses the cases when checking/handling of inputs/outputs for API function calls is processed in different functions. For the cases that SecureSync incorrectly detected, we found the problem
is mostly due to the chosen threshold. In this experiment, we chose σ = 0.8. When we chose σ = 0.9,
for the RV2 related to ftpd, the number of wrongly detected cases reduces from 7 to 3, however, the
number of correctly detected cases also reduces from 12 to 10. However, the results still show that
SecureSync is useful, and it could be improved with more powerful data analysis.
Interesting Examples. Here are some interesting cases on which SecureSync correctly detected
35
static int ssl3_get_key_exchange(s){
...
if (pkey->type == EVP_PKEY_DSA){
/* lets do DSS */
EVP VerifyInit(&md_ctx,EVP_dss1());
EVP VerifyUpdate(&md_ctx,&(s->s3->client_random[0]),SSL3_RANDOM_SIZE);
EVP VerifyUpdate(&md_ctx,&(s->s3->server_random[0]),SSL3_RANDOM_SIZE);
EVP VerifyUpdate(&md_ctx,param,param_len);
if (!EVP VerifyFinal(&md_ctx,p,(int)n,pkey)){
/* bad signature */
al=SSL3_AD_ILLEGAL_PARAMETER;
SSLerr(SSL_F_SSL3_GET_KEY_EXCHANGE,SSL_R_BAD_SIGNATURE);
goto f_err;
}
}
}
Figure 6.2 Vulnerable Code in Arronwork 1.2
gchar *g_base64_encode
gchar *out;
gint state = 0, save
g_return_val_if_fail
g_return_val_if_fail
(const guchar *data, gsize len) {
= 0, outlen;
(data != NULL, NULL);
(len > 0, NULL);
- g_malloc (len * 4 / 3 + 4);
+
+
if (len >= ((G_MAXSIZE - 1) / 4 - 1) * 3)
g_error("Input too large for Base64 encoding "...);
+
out = g_malloc ((len / 3 + 1) * 4 + 1);
outlen = g_base64_encode_step (data, len, FALSE, out, &state, &save);
outlen += g_base64_encode_close (FALSE, out + outlen, &state, &save);
out[outlen] = ’\0’;
return (gchar *) out;
}
Figure 6.3 Vulnerable and Patched Code in GLib 2.12.3
not-yet-reported RV2s. Figure 6.2 illustrates a code fragment in Arronwork having the same vulnerability related to the incorrect usage of EVP VerifyFinal function as described in Chapter 3, and to the
best of our knowledge, it has not been reported anywhere. The code in Arronwork has different details
from the code in NTP (which we chose in building knowledge base). For example, there are different
variables and function calls, and EVP VerifyUpdate is called three times, instead of one. However,
it uses the same EVP protocol and has the same flaw. Using the recommendation from SecureSync
to change the operator and expression related to the function call to EVP VerifyFinal, we derived a
patch for it and reported this case to Arron’s developers.
36
guchar * seahorse_base64_decode (const gchar *text, gsize *out_len) {
guchar *ret;
gint inlen, state = 0, save = 0;
inlen = strlen (text);
ret = g_malloc0 (inlen * 3 / 4);
*out_len = seahorse_base64_decode_step (text, inlen, ret, &state, &save);
return ret;
}
gchar * seahorse_base64_encode (const guchar *data, gsize len) {
gchar *out;
gint state = 0, outlen, save = 0;
out = g_malloc (len * 4 / 3 + 4);
outlen = seahorse_base64_encode_step (data, len, FALSE, out, &state, &save);
outlen += seahorse_base64_encode_close (FALSE, out + outlen, &state, &save);
out[outlen] = ’\0’;
return (gchar *) out;
}
Figure 6.4 Vulnerable Code in SeaHorse 1.0.1
Here is another interesting example. CVE-2008-4316 reported “Multiple integer overflows in
glib/gbase64.c in GLib before 2.20 allow context-dependent attackers to execute arbitrary code via
a long string that is converted either (1) from or (2) to a base64 representation”. The vulnerable and patched code in GLib is in Figure 6.3, the new and removed code are marked with symbols “+” and “-”, respectively. This vulnerability is related to the misuse of function g malloc
for memory allocation with a parameter that is unchecked against the amount of available memory
(len>=((G MAXSIZE-1)/4-1)*3)), and against an integer overflow in the expression (len*4/3+4).
Using this patch, SecureSync is able to detect a similar flaw in SeaHorse system in which two functions base64 encoder and decoder incorrectly use g malloc and g malloc0 (see Figure 6.4). The
interesting point is that, the API function names in two systems are just similar, but not identical (e.g.
g malloc0 and g malloc, g base64 * and seahorse base64 *). Thus, the origin analysis infor-
mation SecureSync uses is helpful for this correct detection. Using the alignment between g malloc
and g malloc0 when comparing the graphs of two methods in Figure 6.4 with that of the method in
Figure 6.3, SecureSync correctly suggests the fixing by adding the if statement before the calls to
g malloc and g malloc0 functions, respectively.
37
API-related
seteuid/setuid
gmalloc
ftpd
ObjectStream
EVP VerifyFinal
DSA verify
libcurl
RSA public decrypt
ReadSetOfCurves
DSA do verify
ECDSA verify
ECDSA do verify
Total
Systems
42
10
21
7
7
7
7
5
4
3
2
1
116
Releases
46
10
23
8
8
8
7
5
4
3
2
1
125
SS report
28
12
19
6
7
8
7
5
8
2
2
1
105
Correct
23
11
12
6
7
7
7
5
8
2
2
1
91
Incorrect
5
1
7
0
0
1
0
0
0
0
0
0
14
Precision
82
92
63
100
100
88
100
100
100
100
100
100
Table 6.3 Recurring Vulnerability Type 2 Recommendation
6.3 Patching Recommendation
SecureSync could suggest developers fixing code by applying the tree edit operations to transform
the xAST of buggy code into the patched one. As in Example 1 Section 3.2, after SecureSync detects
the recurring vulnerability in AllowScripts function in Thunderbird system, it compares two xASTs
of buggy and patched code to detect that the patch adds the function calls for privilege checking and a
change in the return statement. Therefore, it suggests to add GetHasCertificate() and Subsumes,
and to replace the return variable canExecute of the return statement with subsumes variable. For
Type 2, SecureSync provides the operations related to API function calls for developers to fix API
misuses. For example, in Figure 6.3 and Figure 6.4, SecureSync detects the changes in g malloc and
g malloc0 functions when comparing the graphs of two methods seahorse base64 decode and
seahorse base64 encode with that of g base64 encode method. It correctly suggests fixing by
adding the if statement before calling g malloc and g malloc0 functions.
Table 6.3 shows the recommendation result for type 2 vulnerabilities. For each testing system,
SecureSync not only checks whether it is vulnerable, but also point out the locations of vulnerable code with proposed patches. For example, there is a vulnerability related to the misuse of API
ReadSetOfCurves. Among 4 releases of testing systems, SecureSync detect 8 locations (column SS
38
report) containing vulnerable code. Manually checking confirms all of them correct (column Correct),
thus giving the precision 100% (column Precision).
39
CHAPTER 7. CONCLUSIONS AND FUTURE WORK
This thesis reports an empirical study on recurring software vulnerabilities. The study shows that
there exist many vulnerabilities recurring in different systems due to the reuse of source code, APIs,
and artifacts at higher levels of abstraction (e.g. specifications). We also introduce an automatic tool
to detect such recurring vulnerabilities on different systems. The core of SecureSync includes two
techniques for modeling and matching vulnerable code across different systems. The evaluation on
real-world software vulnerabilities and systems shows that SecureSync is able to detect recurring vulnerabilities with high accuracy and to identify several vulnerable code locations that are not yet reported
or fixed even in mature systems. A couple of detected ones were confirmed by developers.
Future Work. We want to extend SecureSync approach to build a framework that incorporates
the knowledge from vulnerability reports and vulnerable source code to better detect recurring vulnerabilities. In detail, the core of SecureSync will include a usage model and a mapping algorithm for
matching vulnerable code across different systems, a model for the comparison of vulnerability reports,
and a tracing technique from a report to corresponding source code [50]. In other words, we will extend
SecureSync that:
1. Represents and compares the vulnerability reports to identify the ones that report the recurring/similar vulnerabilities,
2. Traces from a vulnerability report to the corresponding source code fragment(s) in the codebase,
3. Represents and compares code fragments to find the ones that are similar due to code reuse or
similar in API library usages.
Figure 7.1 illustrates our framework. Given a system S1 with source code C1 and a known security
vulnerability reported by R1 . The framework can support two following scenarios:
40
Report R1
Report R2
Vulnerability Model Extraction
Vulnerability
Model M1
Similarity-based
Mapping
Vulnerability
Model M2
Concept/Entity Localization
via Tracing
Source Code
C1
Suggestion of
Resolution
Source Code
C2
Usage Model Extraction
Usage
Model U1
Similarity-based
Mapping
Usage
Model U2
Figure 7.1 The SecureSync Framework
Scenario 1. Given a system S2 , one needs to determines whether S2 potentially has a recurring/similar vulnerability as S1 and point out the potential buggy code C2 . In the case that S2 is a
different version of S1 , the problem is referred to as back-porting. In general, due the difference in the
usage contexts in two systems S1 and S2 , the buggy code C1 and C2 might be different.
From R1 , a vulnerability model M1 is built to describe the vulnerability of S1 . Then, the trace from
R1 will help to find the corresponding source code fragments C1 , which are used to extract the usage
model U1 . If the tracing link is not available, SecureSync extends a traceability link recovery method,
called incremental Latent Semantic Indexing (iLSI) [33], that we developed in prior work. From usage
model U1 , SecureSync uses its usage clone detection algorithm (will be discussed later) to find code
fragments C2 with the usage U2 similar to U1 . Those C2 fragments are considered as potential buggy
code that could cause a recurring/similar vulnerability as in S1 . The suggested patch for code in S2 is
derived from the comparison between U1 and its patched U1′ . The change from U1 to U1′ will be applied
to U2 (which is similar to U1 ). Then, the concrete code will be derived to suggest the fix to C2 .
Scenario 2.Provided that R2 is reported on S2 , SecureSync compares vulnerability models extracted from security reports. First, SecureSync extracts M2 from R2 and then searches for a vulnerability model M1 in the security database that is similar to M2 . If such M1 exists, SecureSync will
41
identify the corresponding system S1 , the patch, and then map the code fragments and recommend the
patch in the similar manner as in scenario 1.
42
APPENDIX A. ADDITIONAL TECHNIQUES USED IN SECURESYNC
There are two techniques SecureSync used to calculate graph similarity and improve its perfomance. It is Exas - an approach previously developed by us to approximate and capture structure
information of labeled graphs by vectors and measure the similarity of such graphs via vector distance.
SecureSync also uses the hashing technique called Locality Sensitive Hashing to filter labeled trees
with similar structure.
Exas: A Structural Characteristic Feature Extraction Approach
Structure-oriented Representation. In our structure-oriented representation approach, a software
artifact is modeled as a labeled, directed graph (tree is a special case of graph), denoted as G =
(V, E, L). V is the set of nodes in which a node represents an element within an artifact. E is the set of
edges in which each edge between two nodes models their relationship. L is a function that maps each
node/edge to a label that describes its attributes. For example, for ASTs, node types could be used as
nodes’ labels. For Simulink models, the label of a node could be the type of its corresponding block.
Other attributes could also be encoded within labels. In existing clone detection approaches, labels for
edges are rarely explored. However, for general applicability, Exas supports the labels for both nodes
and edges.
Figure A.1 shows an illustrated example of a Simulink model, its representation graph and two
cloned fragments A and B.
Structural Feature Selection. Exas focuses on two kinds of patterns of structural information of
the graph, called (p, q)-node and n-path.
A (p, q)-node is a node having p incoming and q outgoing edges. The values of p and q associated
to a certain node might be different in different examined fragments. For example, node 9 in Figure A.1
43
2
5
Gain
Fragment A
Gain1
1
In1
1
2
Out
In2
1
2
Mul1
1
In
In
6
Mul
9
Sum
12
Out
Unit Delay
z
3
3
In
In3
4
In4
Mul2
4
In
7
2
Gain2
8
Mul
Gain
10
11
Delay
Sum
Fragment B
a) A Simulink model
b) The representation graph
Figure A.1 The Simulink Model and Graph Representation
is a (3,1)-node if entire graph is currently considered as a fragment, but is a (2,0)-node if fragment A is
examined.
An n-path is a directed path of n nodes, i.e. a sequence of n nodes in which any two consecutive
nodes are connected by a directed edge in the graph. A special case is 1-path which contains only one
node.
Structural feature of a (p, q)-node is the label of the node along with two numbers p and q. For
example, node 6 in fragment A is (2, 1)-node and gives the feature mul-2-1. Structural feature of an
n-path is a sequence of labels of nodes and edges in the path. For example, the 3-path 1-5-9 gives the
feature in-gain-sum. Table A.1 lists all patterns and features extracted from A and B. It shows that
both fragments have the same feature set and the same number of each feature. Later, we will show
that it holds for all isomorphic fragments.
Characteristic Vectors. An efficient way to express the property “having the same or similar
features” is the use of vectors. The characteristic vector of a fragment is the occurrence-count vector
of its features. That is, each position in the vector is indexed for a feature and the value at that position is
the number of occurrences of that feature in the fragment. Table A.2 shows the indexes of the features,
which are global across all vectors, and their occurrence counts in fragment A.
Two fragments having the same feature sets and occurrence counts will have the same vectors and
vice versa. The vector similarity can be measured by an appreciably chosen vector distance such as
44
Pattern
1-path
2-path
3-path
(p,q)-node
(p,q)-node
(continued)
Features of fragment A
1
2
5
6
in
in
gain
mul
1-5
1-6
2-6
6-9
in-gain in-mul in-mul mul-sum
1-5-9
1-6-9
in-gain-sum
in-mul-sum
1
2
in-0-2
in-0-1
6
9
mul-2-1
sum-2-0
9
sum
5-9
gain-sum
2-6-9
in-mul-sum
5
gain-1-1
Features of fragment B
4
3
8
7
in
in
gain
mul
4-8
4-7
3-7
7-11
in-gain in-mul in-mul mul-sum
4-8-11
4-7-11
in-gain-sum
in-mul-sum
4
3
in-0-2
in-0-1
7
11
mul-2-1
sum-2-0
11
sum
8-11
gain-sum
3-7-11
in-mul-sum
8
gain-1-1
Table A.1 Extracted Patterns and Features
Feature
in
gain
mul
sum
Index
1
2
3
4
Counts
2
1
1
1
Feature
in-gain
in-mul
gain-sum
mul-sum
Index
5
6
7
8
Counts
1
2
1
1
Feature
in-gain-sum
in-mul-sum
in-0-1
in-0-2
Index
9
10
11
12
Counts
1
2
1
1
Feature
gain-1-1
mul-2-1
sum-2-0
Index
13
14
15
Counts
1
1
1
Table A.2 Feature Indexing and Occurrence Count
1-norm distance.
In the Table A.2, based on the occurrence counts of features in fragment A, the vector for A is
(2,1,1,1,1,2,1,1,1,2,1,1,1,1,1).
LSH: Locality Sensitive Hashing
A locality-sensitive hashing (LSH) function is a hash function for vectors such that the probability
that two vectors having a same hash code is a strictly decreasing function of their corresponding distance. In other words, vectors having smaller distance will have higher probability to have the same
hash code, and vice versa. Then, if we use locality-sensitive hash functions to hash the fragments into
buckets based on the hash codes of their vectors, fragments having similar vectors tend to be hashed
into same buckets, and the other ones are less likely to be so.
The vector distance used in SecureSync for similarity measure is Manhattan distance. Therefore,
it uses locality-sensitive hash functions for l1 norm. The following family H of hash functions was
proved to be locality-sensitive for Manhattan distance:
h(u) = ⌊
a.u + b
⌋
w
In this formula, a is a vector whose elements are drawn from Cauchy distribution; w is a fixed
positive real number; and b is a random number in [0, w]. Common implementations choose w = 4.
45
If ∥u − v∥ = l then
P r(l) = P r[h(u) = h(v)] =
∫
w
0
x 2
2.e−( l )
x
√
(1 − )dx
w
l. 2π
P r(l) is proved to be a decreasing function of l [20]. Then, for l ≤ δ, we have P r(l) ≤ p = P r(δ).
Therefore, for any two vectors u, v that ∥u − v∥ ≤ δ, P r[h(u) = h(v)] ≤ p . That means, they have a
chance at least p to be hashed into a same bucket.
However, two distant points also have a chance at most p to be hashed into a same bucket. To reduce
that odds, we could use more than one hash functions. Each hash function h used in SecureSync is a
tuple of k independent hash functions of H: h = (h1 , h2 , ..., hk ). That means hashcode of each vector
u will be a vector of integers h(u) = (h1 (u), h2 (u), ..., hk (u)), with each corresponding integer index
for such a vector hashcode is calculated as follows:
h(u) =
k
∑
ri .hi (u) mod P
i=1
where each ri is a randomly chosen integer and P is a very large prime number. In SecureSync, we use
a 24 bit prime number.
We call this kind of hash functions as k-line functions. Then, two distant vectors having the same
vector hashcode if all of the member hashcodes are the same, and the probability of this event is q ≤ pk .
The corresponding probability for two similar vectors is p ≥ pk .
Since the chance for similar vectors be hashed to the same buckets reduces, SecureSync uses N
independent k-hash functions, and each vector is hashed to N corresponding buckets. Then, if u and
v are missed by a hash function, they still have chances from the others. Indeed, the probability that u
and v are missed by all those N functions, i.e. having all different hash codes is (1 − p)N ≤ (1 − pk )N .
If N is large enough, this probability approaches to zero, i.e. u and v are hashed into at least the same
bucket with a high probability.
46
BIBLIOGRAPHY
[1] ASF Security Team. http://www.apache.org/security/.
[2] Common Configuration Enumeration. http://cce.mitre.org/.
[3] Common Platform Enumeration. http://cpe.mitre.org/.
[4] Common Vulnerabilities and Exposures. http://cve.mitre.org/.
[5] Common Vulnerability Scoring System. http://www.first.org/cvss/.
[6] Google Code Search. http://www.google.com/codesearch.
[7] IBM Internet Security Systems. http://www.iss.net/.
[8] Mozilla Foundation Security Advisories. http://www.mozilla.org/security/.
[9] Open Source Computer Emergency Response Team. http://www.ocert.org/.
[10] Open Vulnerability and Assessment Language. http://oval.mitre.org/.
[11] Patch (computing). http://en.wikipedia.org/wiki/Patch (computing).
[12] Pattern Insight. http://patterninsight.com/solutions/find-once.php.
[13] Software Bug. http://en.wikipedia.org/wiki/Software bug.
[14] The
eXtensible
Configuration
Checklist
Description
http://scap.nist.gov/specifications/xccdf.
[15] The Open Source Vulnerability Database. http://osvdb.org/.
[16] The Security Content Automation Protocol. www.nvd.nist.gov/scap/docs/SCAP.doc.
Format.
47
[17] US-CERT Bulletins. http://www.us-cert.gov/.
[18] Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. Mining API patterns as partial orders from source
code: From usage scenarios to specifications. In Proc. 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software
Engineering (ESEC/FSE 2007), pages 25–34, September 2007.
[19] O.H. Alhazmi, Y.K. Malaiya, and I. Ray. Measuring, analyzing and predicting security vulnerabilities in software systems. Computers & Security, 26(3):219 – 228, 2007.
[20] Alexandr Andoni and Piotr Indyk. E2LSH 0.1 User manual. http://www.mit.edu/ andoni/LSH/manual.pdf.
[21] Erik Arisholm and Lionel C. Briand. Predicting fault-prone components in a java legacy system. In ISESE ’06: Proceedings of the 2006 ACM/IEEE international symposium on Empirical
software engineering, pages 8–17. ACM, 2006.
[22] Christian Bird, David Pattison, Raissa D’Souza, Vladimir Filkov, and Premkumar Devanbu. Latent social structure in open source projects. In SIGSOFT ’08/FSE-16: Proceedings of the 16th
ACM SIGSOFT International Symposium on Foundations of software engineering, pages 24–35.
ACM, 2008.
[23] Barry Boem. Software Engineering Economics. Prentice Hall, Englewood Cliffs, 1981.
[24] Ray-Yaung Chang, Andy Podgurski, and Jiong Yang. Discovering neglected conditions in software by mining dependence graphs. IEEE Trans. Softw. Eng., 34(5):579–596, 2008.
[25] Omar Alhazmi Colorado and Omar H. Alhazmi. Quantitative vulnerability assessment of systems
software. In Proc. Annual Reliability and Maintainability Symposium, pages 615–620, 2005.
[26] Tom Copeland. PMD Applied. Centennial Books, 2005.
[27] Davor Cubranic, Gail C. Murphy, Janice Singer, and Kellogg S. Booth. Hipikat: A project memory for software development. IEEE Transactions on Software Engineering, 31:446–465, 2005.
48
[28] Ekwa Duala-Ekoko and Martin P. Robillard. Tracking code clones in evolving software. In ICSE
’07: Proceedings of the 29th international conference on Software Engineering, pages 158–167,
Washington, DC, USA, 2007. IEEE Computer Society.
[29] Michael Gegick, Laurie Williams, Jason Osborne, and Mladen Vouk. Prioritizing software security fortification throughcode-level metrics. In QoP ’08: Proceedings of the 4th ACM workshop
on Quality of protection, pages 31–38, New York, NY, USA, 2008. ACM.
[30] Computer Security Institute. http://gocsi.com/survey.
[31] Ahmed E. Hassan and Richard C. Holt. The top ten list: Dynamic fault prediction. In ICSM
’05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages
263–272, Washington, DC, USA, 2005. IEEE Computer Society.
[32] David Hovemeyer and William Pugh. Finding bugs is easy. SIGPLAN Not., 39(12):92–106, 2004.
[33] Hsin-Yi Jiang, T. N. Nguyen, Ing-Xiang Chen, H. Jaygarl, and C. K. Chang. Incremental latent
semantic indexing for automatic traceability link evolution management. In ASE ’08: Proceedings
of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, pages
59–68, Washington, DC, USA, 2008. IEEE Computer Society.
[34] Lingxiao Jiang, Zhendong Su, and Edwin Chiu. Context-based detection of clone-related bugs.
In ESEC-FSE ’07: Proceedings of the the 6th joint meeting of the European software engineering
conference and the ACM SIGSOFT symposium on The foundations of software engineering, pages
55–64, New York, NY, USA, 2007. ACM.
[35] Sunghun Kim, Kai Pan, and E. E. James Whitehead, Jr. Memories of bug fixes. In SIGSOFT
’06/FSE-14: Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of
software engineering, pages 35–45, New York, NY, USA, 2006. ACM.
[36] Sunghun Kim, Thomas Zimmermann, E. James Whitehead Jr., and Andreas Zeller. Predicting
faults from cached history. In ICSE ’07: Proceedings of the 29th international conference on
Software Engineering, pages 489–498, Washington, DC, USA, 2007. IEEE Computer Society.
49
[37] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. Cp-miner: Finding copy-paste and
related bugs in large-scale software code. IEEE Trans. Softw. Eng., 32(3):176–192, 2006.
[38] Benjamin Livshits and Thomas Zimmermann. Dynamine: finding common error patterns by
mining software revision histories. SIGSOFT Softw. Eng. Notes, 30(5):296–305, 2005.
[39] T. Longstaff. Update: Cert/cc vulnerability. knowledge base. Technical report, Technical presentation at a DARPA workshop in Savannah, Georgia, 1997.
[40] Tim Menzies, Jeremy Greenwald, and Art Frank. Data mining static code attributes to learn defect
predictors. IEEE Trans. Softw. Eng., 33(1):2–13, 2007.
[41] Raimund Moser, Witold Pedrycz, and Giancarlo Succi. A comparative analysis of the efficiency
of change metrics and static code attributes for defect prediction. In ICSE ’08: Proceedings of
the 30th international conference on Software engineering, pages 181–190, New York, NY, USA,
2008. ACM.
[42] Nachiappan Nagappan and Thomas Ball. Use of relative code churn measures to predict system defect density. In ICSE ’05: Proceedings of the 27th international conference on Software
engineering, pages 284–292. ACM, 2005.
[43] Stephan Neuhaus and Thomas Zimmermann. The beauty and the beast: Vulnerabilities in red
hat’s packages. In Proceedings of the 2009 USENIX Annual Technical Conference, June 2009.
[44] Hoan Anh Nguyen, Tung Thanh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen.
Accurate and efficient structural characteristic feature extraction for clone detection. In FASE
’09: Proceedings of the 12th International Conference on Fundamental Approaches to Software
Engineering, pages 440–455, Berlin, Heidelberg, 2009. Springer-Verlag.
[45] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen.
Recurring bug fixes in object-oriented programs. In 32nd International Conference on Software
Engineering (ICSE 2010).
50
[46] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen.
Graph-based mining of multiple object usage patterns. In ESEC/FSE ’09: Proceedings of the
the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT
symposium on The foundations of software engineering, pages 383–392, New York, NY, USA,
2009. ACM.
[47] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, and Tien N. Nguyen. Operv: Operationbased, fine-grained version control model for tree-based representation. In FASE’ 10: The 13th
International Conference on Fundamental Approaches to Software Engineering.
[48] T. Ostrand, E. Weyuker, and R. Bell. Predicting the location and number of faults in large software
systems. volume 31, pages 340–355. IEEE CS, 2005.
[49] Nam H. Pham, Hoan Anh Nguyen, Tung Thanh Nguyen, Jafar M. Al-Kofahi, and Tien N. Nguyen.
Complete and accurate clone detection in graph-based models. In ICSE ’09: Proceedings of the
2009 IEEE 31st International Conference on Software Engineering, pages 276–286, Washington,
DC, USA, 2009. IEEE Computer Society.
[50] Nam H. Pham, Tung Thanh Nguyen, Hoan Anh Nguyen, Xinying Wang, Anh Tuan Nguyen, and
Tien N. Nguyen. Detecting recurring and similar software vulnerabilities. In 32nd International
Conference on Software Engineering (ICSE 2010 NIER Track).
[51] Martin Pinzger, Nachiappan Nagappan, and Brendan Murphy. Can developer-module networks
predict failures? In SIGSOFT ’08/FSE-16: Proceedings of the 16th ACM SIGSOFT International
Symposium on Foundations of software engineering, pages 2–12, New York, NY, USA, 2008.
ACM.
[52] Jacek Śliwerski, Thomas Zimmermann, and Andreas Zeller. Hatari: raising risk awareness. In
ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly
with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages
107–110. ACM, 2005.
51
[53] Qinbao Song, Martin Shepperd, Michelle Cartwright, and Carolyn Mair. Software defect association mining and defect correction effort prediction. IEEE Trans. Softw. Eng., 32(2):69–82,
2006.
[54] Boya Sun, Ray-Yaung Chang, Xianghao Chen, and Andy Podgurski. Automated support for
propagating bug fixes. In ISSRE ’08: Proceedings of the 2008 19th International Symposium on
Software Reliability Engineering, pages 187–196, Washington, DC, USA, 2008. IEEE Computer
Society.
[55] Suresh Thummalapenta and Tao Xie. Mining exception-handling rules as sequence association
rules. In ICSE ’09: Proceedings of the 2009 IEEE 31st International Conference on Software
Engineering, pages 496–506, Washington, DC, USA, 2009. IEEE Computer Society.
[56] Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig. Detecting object usage anomalies.
In ESEC-FSE ’07: Proceedings of the the 6th joint meeting of the European software engineering
conference and the ACM SIGSOFT symposium on The foundations of software engineering, pages
35–44, New York, NY, USA, 2007. ACM.
[57] Chadd C. Williams and Jeffrey K. Hollingsworth. Automatic mining of source code repositories
to improve bug finding techniques. volume 31, pages 466–480, Piscataway, NJ, USA, 2005. IEEE
Press.
[58] Timo Wolf, Adrian Schroter, Daniela Damian, and Thanh Nguyen. Predicting build failures using
social network analysis on developer communication. In ICSE ’09: Proceedings of the 2009 IEEE
31st International Conference on Software Engineering, pages 1–11. IEEE CS, 2009.