Journal of Network and Computer Applications: Mohiuddin Ahmed, Abdun Naser Mahmood, Jiankun Hu

Journal of Network and Computer Applications ∎ (∎∎∎∎) ∎∎∎–∎∎∎
Contents lists available at ScienceDirect

Journal of Network and Computer Applications

journal homepage: www.elsevier.com/locate/jnca

Review

A survey of network anomaly detection techniques

Mohiuddin Ahmed, Abdun Naser Mahmood, Jiankun Hu

School of Engineering and Information Technology, UNSW Canberra, ACT 2600, Australia

Article info

Article history:
Received 10 June 2015
Received in revised form 29 October 2015
Accepted 19 November 2015

Keywords:
Intrusion detection
Computer security
Anomaly detection
Classification
Clustering
Information theory
Statistics

Abstract

Information and Communication Technology (ICT) has a great impact on social wellbeing, economic growth and national security in today's world. Generally, ICT includes computers, mobile communication devices and networks. ICT is also embraced by a group of people with malicious intent, also known as network intruders, cyber criminals, etc. Confronting these detrimental cyber activities is an international priority and an important research area. Anomaly detection is an important data analysis task which is useful for identifying network intrusions. This paper presents an in-depth analysis of four major categories of anomaly detection techniques: classification, statistical, information theory and clustering. The paper also discusses research challenges with the datasets used for network intrusion detection.

© 2015 Published by Elsevier Ltd.
Contents

1. Introduction
    1.1. Roadmap of the paper
2. Preliminary discussion
    2.1. Types of anomalies
    2.2. Output of anomaly detection techniques
    2.3. Types of network attacks
    2.4. Mapping of network attacks with anomalies
3. Classification based network anomaly detection
    3.1. Support vector machine
    3.2. Bayesian network
    3.3. Neural network
    3.4. Rule-based
4. Statistical anomaly detection
    4.1. Mixture model
    4.2. Signal processing technique
    4.3. Principal component analysis (PCA)
5. Information theory
    5.1. Correlation analysis
6. Clustering-based
    6.1. Regular clustering
    6.2. Co-clustering
7. Intrusion detection datasets and issues
    7.1. Limitations of DARPA/KDD datasets
    7.2. Contemporary network attacks evaluation dataset: ADFA-LD12
    7.3. Current network data repositories
8. Evaluation of network anomaly detection techniques
9. Conclusions and future research directions
References

http://dx.doi.org/10.1016/j.jnca.2015.11.016
1084-8045/© 2015 Published by Elsevier Ltd.

Please cite this article as: Ahmed M, et al. A survey of network anomaly detection techniques. Journal of Network and Computer
Applications (2015), http://dx.doi.org/10.1016/j.jnca.2015.11.016

1. Introduction

Computer security has become a necessity due to the proliferation of information technologies in everyday life. The mass usage of computerized systems has given rise to critical threats such as zero-day vulnerabilities, mobile threats, etc. Although research in the security domain has increased significantly, these threats are yet to be mitigated. The evolution of computer networks has greatly exacerbated computer security concerns, particularly internet security in today's networking environment and advanced computing facilities. Although Internet Protocols (IPs) were not designed to place a high priority on security issues, network administrators now have to handle a large variety of intrusion attempts by both individuals with malicious intent and large botnets (Papalexakis et al., 2012). According to Symantec's Internet Security Threat Report, there were more than three billion malware attacks reported in 2010 and the number of denial of service attacks increased dramatically by 2013 (Symantec internet security threat report, 2014). As stated in Verizon's Data Breach Investigation Report 2014, 63,437 security breaches were carried out by hackers (Verizon's data breach investigation report, 2014). The Global State of Information Security Survey 2015 (The Global State of Information Security Survey, 2015) found a great rise in incidents. Figure 1 shows the growth in security incidents from 2009 to 2014. Therefore, the detection of network attacks has become the highest priority today. In addition, the expertise required to commit cyber crimes has decreased due to easily available tools (Hacking and cracking tools, 2014).

Fig. 1. Growth of information security incidents (The Global State of Information Security Survey, 2015). Axes: year (2009 to 2014) versus number of incidents (in millions).

Anomaly detection is an important data analysis task that detects anomalous or abnormal data from a given dataset. It is an interesting area of data mining research as it involves discovering enthralling and rare patterns in data. It has been widely studied in statistics and machine learning (Ahmed et al., 2014), and is also synonymously termed outlier detection, novelty detection, deviation detection and exception mining. Although an anomaly is defined by researchers in various ways based on its application domain, one widely accepted definition is that of Hawkins (Hawkins, 1980): 'An anomaly is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism'. Anomalies are considered important because they indicate significant but rare events and can prompt critical actions to be taken in a wide range of application domains; for example, an unusual traffic pattern in a network could mean that a computer has been hacked and data is being transmitted to unauthorized destinations; anomalous behavior in credit card transactions could indicate fraudulent activities; and an anomaly in an MRI image may indicate the presence of a malignant tumor (Ahmed et al., 2015a). Anomaly detection has been widely applied in countless application domains such as medical and public health, fraud detection, intrusion detection, industrial damage, image processing, sensor networks, robot behavior and astronomical data (Mahmood et al., 2010; Ahmed et al., 2015b).

Figure 2 displays a generic framework for network anomaly detection. The input data requires processing because the data are of different types; for example, the IP addresses are hierarchical, whereas the protocols are categorical and port numbers are numerical in nature (Mahmood et al., 2008). Processing techniques are based on the individual anomaly detection techniques. Then, the anomaly detection techniques (broadly categorized in two: supervised and unsupervised) are applied on the data. For evaluation of the output, either scores or labels are used (discussed in Section 2.2).

Fig. 2. Generic framework for network anomaly detection.

Network anomaly detection seems straightforward: we need to find the data that do not follow normal behavioral patterns. However, despite the many techniques available, the following research challenges remain.

- A lack of a universally applicable anomaly detection technique; for example, an intrusion detection technique in a wired network may be of little use in a wireless network.
- Data contain noise which tends to resemble actual anomalies and, therefore, is difficult to segregate.
- A lack of publicly available labeled datasets to be used for network anomaly detection.
- As normal behaviors are continually evolving and may not be normal forever, current intrusion detection techniques may not be useful in the future (Qin et al., 2011).
- A need for newer and more sophisticated techniques because the intruders are already aware of the prevailing techniques.

Due to the aforementioned challenges, network anomaly detection has become more challenging than before. As most existing supervised techniques are based on knowledge provided by an external agent, they require labeled data and are unable to detect zero-day vulnerabilities. The research community has shown increased interest in proactive network security systems.

During the last decade several surveys of network intrusion detection have been conducted. One of the earliest was that of Debar et al. (Towards a taxonomy of intrusion-detection systems, 1999), who
classified intrusion detection systems based on detection method, network behavior, audit source location and/or frequency of usage. The same authors extended their work to include the detection paradigm as state- or transition-based (Debar et al., 2000). Another popular survey was by Axelsson (1998), which focused on the detection principle and operational aspects. In the taxonomy matrix of intrusion systems proposed by Furnell et al. (2007), the matrix is formed from the output type and data scale elements of the taxonomy. Another survey by Estevez-Tapiador et al. (2004) considered anomaly detection methods for network intrusion detection in wired networks. Most recently, Gogoi et al. (2011) classified outlier detection techniques as supervised and unsupervised, but their taxonomy is slightly confusing because they consider proximity-based approaches under the supervised category.

Table 1 shows the methods, application domains and data covered by us and related surveys. Chandola et al. (2009) provided an extensive survey encompassing various techniques and application domains; however, it is not a stand-alone survey of network anomaly detection. Phua et al. categorized, compared and summarized a great number of published technical and review articles on automated fraud detection; however, since that report was published, newer attacks have emerged in the last couple of years. Patcha and Park (2007) and Hodge and Austin (2004) presented various anomaly detection techniques based on supervised, unsupervised and clustering methods with no priority on network anomaly detection. Markou and Singh (2003) and Beckman and Cook (1983) conducted surveys of anomaly detection but they considered only supervised methods. A comprehensive review focussing on signature-based, behavior-based and specification-based methods for intrusion detection systems is presented in Liao et al. (2013). Xie et al. (2011) also provided a survey on anomaly detection in wireless sensor networks. One problem with the above studies is that they do not include any discussion of the research challenges related to datasets. Our previous work (Ahmed et al., 2015a) addressed the data issue in the financial domain for fraud detection. Network anomaly detection techniques are generally tested using datasets (such as DARPA/KDD) developed at the end of the last century (Creech and Hu, 2013), justified by the need for publicly available test data and the lack of any alternative datasets. Widely accepted as benchmarks, these datasets no longer represent relevant architecture or contemporary attack protocols, and are accused of data corruptions and inconsistencies. Hence, testing of network anomaly detection techniques using these datasets does not provide an effective performance metric, and contributes to erroneous efficacy claims. We consider this phenomenon as 'Data Issue' in the scope of this paper.

Table 1
Comparison of this and related surveys (n - this survey).

Survey    Network anomaly detection    Data issue
Ahmed[n]    √    √
Debar-1 (Towards a taxonomy of intrusion-detection systems, 1999)    √
Chandola et al. (2009)    √
Patcha and Park (2007)
Debar-2 (Debar et al., 2000)    √
Phua et al.
Hodge and Austin (2004)
Axelsson (1998)    √
Markou and Singh (2003)
Furnell et al. (2007)    √
Beckman and Cook (1983)
Estevez-Tapiador et al. (2004)    √
Gogoi et al. (2011)    √
Ahmed et al. (2015a)    √
Xie et al. (2011)    √
Liao et al. (2013)    √

Based on the discussion of the existing surveys, this paper aims at the following:

- Mapping different types of anomaly with the major types of attacks based on the characteristics analysis.
- Providing a taxonomy of network anomaly detection based on classification, statistics, information theory and clustering (Fig. 3).
- Evaluating the effectiveness of different classes of techniques using different criteria such as computational complexity, attack detection priority, output, etc.
- Discussing the recent research related to overcoming the issues of publicly available network intrusion evaluation datasets and providing a list of current repositories.

Fig. 3. Taxonomy of network anomaly detection techniques.

1.1. Roadmap of the paper

In Section 2, we provide a brief introduction to anomaly detection, different types of attacks and the mapping of these attacks with different types of anomalies. Sections 3–6 analyse the four categories of techniques (Fig. 3). Section 7 discusses the dataset issues related to network traffic and Section 8 compares and contrasts different categories of network anomaly detection techniques. Section 9 concludes with future research directions in handling large volumes of network traffic.

2. Preliminary discussion

2.1. Types of anomalies

Anomalies are referred to as patterns in data that do not conform to a well-defined characteristic of normal patterns. They are generated by a variety of abnormal activities, e.g., credit card fraud, mobile phone fraud, cyber attacks, etc., which are significant to data analysts. An important aspect of anomaly detection is the nature of the anomaly. An anomaly can be categorized in the following ways (Ahmed et al., 2014, 2015a).

- Point anomaly: When a particular data instance deviates from the normal pattern of the dataset, it can be considered a point anomaly. As a realistic example, if a person's normal car fuel usage is five litres per day but it becomes fifty litres on any random day, then it is a point anomaly.
- Contextual anomaly: When a data instance behaves anomalously in a particular context, it is termed a contextual or conditional anomaly; for example, expenditure on a credit card during a festive period, e.g., Christmas or New Year, is usually higher than during the rest of the year. Although it can be high, it may not be anomalous as high expenses are contextually normal in nature. On the other hand, an equally high expenditure during a non-festive month could be deemed a contextual anomaly.
- Collective anomaly: When a collection of similar data instances behaves anomalously with respect to the entire dataset, the group of data instances is termed a collective anomaly. For example, in a human electrocardiogram (ECG) output, the existence of low values for a long period of time indicates an outlying phenomenon corresponding to an abnormal premature contraction (Lin et al., 2005), whereas one low value by itself is not considered anomalous.

2.2. Output of anomaly detection techniques

One important issue for anomaly detection is how anomalies are represented as output which, generally, is in one of the two following ways (Chandola et al., 2009).

- Scores: Scoring-based anomaly detection techniques assign an anomaly score to each data instance. Then, the scores are ranked and an analyst chooses the anomalies or uses a threshold to select them; for example, in Table 2, the data instances are represented as a, b, c, d, e and the corresponding anomaly scores lie within a range from 0 to 1.
- Binary/label: According to these techniques, outputs are considered in a binary fashion, i.e., either anomalous or normal. If we consider the data instances in Table 2, binary-based outputs will label each data instance as either normal or anomaly.

Techniques which provide binary labels are computationally efficient since each data instance does not need to provide or have an anomaly score.

Table 2
Outputs from anomaly detection techniques.

Data instance    Score    Binary/Label
a                0.3      Normal
b                0.4      Normal
c                0.2      Normal
d                0.8      Anomaly
e                0.1      Normal

2.3. Types of network attacks

The task of network security is to protect digital information by maintaining data confidentiality and integrity, and ensuring the availability of resources. In simple terms, a threat/attack refers to anything which has detrimental characteristics aimed at compromising a network. The poor design of a network, carelessness of its users and/or mis-configuration of its software or hardware can make it vulnerable to attacks (Kendall, 1999).

1. Denial of service (DoS): It is a type of misuse of the rights to the resources of a network or host which is targeted at disrupting the normal computing environment and rendering the service unavailable. A simple example of a DoS attack is denying legitimate users access to a web service when the server is flooded with numerous connection requests. As performing a DoS attack requires no prior access to the target, it is considered a dreaded attack.
2. Probe: It is used to gather information about a targeted network or host and, more formally, for reconnaissance purposes. Reconnaissance attacks are quite common ways of gathering information about the types and numbers of machines connected to a network, and a host can be attacked to determine the types of software installed and/or applications used. A Probe attack is considered the first step in an actual attack to compromise a host or network. Although no specific damage is caused by these attacks, they are considered serious threats to corporations because they might obtain useful information for launching another dreadful attack.
3. User to Root (U2R): It is an attack launched when an attacker aims to gain illegal access to an administrative account to manipulate or abuse important resources. Using a social engineering approach or password sniffing, the attacker can access a normal user account and then exploit one or more vulnerabilities to gain the privilege of a super user.
4. Remote to User (R2U): It is launched when an attacker wants to gain local access as a user of a targeted machine to have the privilege of sending packets over its network (also known as R2L). Most commonly, the attacker uses a trial and error method to guess the password by automated scripts, a brute force method, etc. There are also some sophisticated attacks whereby an attacker installs a sniffing tool to capture the password before penetrating the system.

2.4. Mapping of network attacks with anomalies

Based on the discussion of different types of anomalies and attacks in the previous sections, in this section we identify the relationship among the attacks and anomalies. The DoS attack characteristics match with the collective anomaly (Ahmed and Mahmood, 2014a, 2014b, 2015). As stated in Section 2.1, when a collection of data instances behaves anomalously, it is called a collective anomaly, but a single data instance from that group is not anomalous. In the case of a DoS attack, numerous connection requests to a web server constitute a collective anomaly, but a single request is legitimate. So, we can consider the DoS attack as a collective anomaly. Probe attacks are based on a specific intention to attain information and perform reconnaissance. The authors of this paper map them to the contextual anomaly. U2R and R2L attacks are condition specific and sophisticated. Launching such attacks is not easy compared to others. Therefore, these attacks are considered point anomalies. Figure 4 illustrates the mapping of the attack classes.

3. Classification based network anomaly detection

Classification-based techniques rely on experts' extensive knowledge of the characteristics of network attacks. When a network expert provides details of the characteristics to the detection system, an attack with a known pattern can be detected as soon as it is launched. Detection is solely dependent on the attack's signature, as the system is capable of detecting an attack only if its signature has been provided earlier by a network expert. This demonstrates that a system which can detect only what it knows is vulnerable to new attacks, which are constantly appearing in different versions and are more stealthily launched. Even if a new attack's signature is created and incorporated in the system, the
initial loss is irreplaceable and the repair procedure is extremely expensive.

Fig. 4. Mapping of attacks with anomalies.

The classification based approaches rely on the normal traffic activity profile that builds the knowledge base and consider activities that deviate from the baseline profile as anomalous. The advantage lies in their capability to detect attacks which are completely novel, assuming that they exhibit ample deviations from the normal profile. Additionally, as normal traffic not included in the knowledge base is considered an attack, there will be inadvertent false alarms. Therefore, training is required for anomaly detection techniques to build a normal activity profile, which is time-consuming and also depends on the availability of completely normal traffic datasets. In practice, it is very rare and expensive to obtain attack-free traffic instances. Moreover, in today's dynamic and evolving network environments, it is extremely difficult to keep a normal profile up-to-date. Among the large pool of classification-based network anomaly detection techniques available, we discuss four major techniques as follows.

3.1. Support vector machine

The basic principle of the Support Vector Machine (SVM) is to derive a hyperplane that maximizes the separating margin between the positive and negative classes (Eskin et al., 2002). An interesting property of SVM is that it is an approximate implementation of the structural risk minimization principle, based on statistical learning theory. The standard SVM algorithm is a supervised learning technique, which requires labeled data to create a classification rule. However, it can also be adapted as an unsupervised learning algorithm whereby it tries to separate the entire set of training data from the origin, whereas the regular supervised SVM attempts to separate two classes of data in a feature space by a hyperplane. In a paper by Eskin et al. (2002), the concept of the unsupervised SVM is used to detect anomalous events. The algorithm finds hyperplanes which separate the data instances from the origin with the maximal margin and then an optimization problem is solved to determine the best hyperplane (for more details, see Cristianini and Shawe-Taylor, 2000). The optimization problem is solved using a variant of the Sequential Minimal Optimization algorithm (Platt, 1999). Using a similar concept to that of the One-class SVM (OCSVM) but in a supervised manner, in the paper by Heller et al. (2003), a new approach called Registry Anomaly Detection (RAD) is developed to monitor Windows registry queries. It is usual that during normal computer activity, a certain set of registry keys is accessed by the Windows program. Based on the fact that users tend to frequently use certain programs and registry activities are normal, deviations from these activities will be considered anomalous. The OCSVM is applied to the RAD system to detect anomalous activities in the Windows registry. RAD maps the input data into a high-dimensional feature space via a kernel and iteratively finds the maximal margin hyperplane to separate two classes of data.

Interestingly, in a paper by Hu et al. (2003), an anomaly detection method which ignores noisy data is developed using the Robust SVM (RSVM). In practice, training data often contain noise which invalidates the main assumption of the SVM that all the sample data for training are independently and identically distributed. As a result, the standard SVM results in a highly non-linear decision boundary which leads to poor generalization. In this scenario, the RSVM incorporates the averaging technique in the form of a class centre to make the decision surface smoother and automatically control regularization. In addition, the number of support vectors in the RSVM is significantly less than in the standard SVM, which results in a reduced run time. More recently, a patent was published describing a method and system for confident anomaly detection in computer network traffic in which SVM is explored (Balabine and Velednitsky, 2015).

3.2. Bayesian network

A Bayesian network is an efficient approach for modeling a domain containing uncertainty. A discrete random variable is represented using a directed acyclic graph (DAG), where each node reflects the state of the random variable and contains a conditional probability table (CPT). The task of the CPT is to provide the probability of a node being in a specific state. In a Bayesian network, a parent–child relationship exists among the nodes which indicates that a variable represented by a child node is dependent on those represented by the parent nodes. As this network can be used for an event classification scheme, it is also applicable for network anomaly detection. In a paper by Kruegel et al. (2003), two major causes of high false positives in anomaly detection techniques are identified. It is assumed that anomaly detection systems contain a number of models for analyzing different features of an event. The first problem is that models which provide a score or probability for the normality/abnormality of an event require the anomaly detection system to aggregate their different outputs, which results in high false positives. Secondly, anomaly detection systems cannot handle behaviors which are unusual but legitimate, for example, a sudden increase in CPU utilization, memory usage, etc. If this problem occurs, additional information that could explain unusual but non-anomalous behaviors is ignored.

Based on the concept of the Bayesian network, the authors of Kruegel et al. (2003) proposed an approach for solving the aforementioned problems. For an ordered stream of input events ($S = e_1, e_2, \ldots$), the event classification system decides whether an event is normal or abnormal. This decision is based on the outputs ($o_i \mid i = 1, 2, \ldots, k$) from $k$ models ($M = m_1, m_2, \ldots, m_k$) and possible additional information ($I$). The models analyze the features of a given input event and compare their results with those from previously established models. The result from an event classification system (EC) is defined as:

\[
EC(o_1, o_2, \ldots, o_k, I) =
\begin{cases}
e \text{ is normal}, & \sum_{i=1}^{k} o_i \le I \\
e \text{ is anomalous}, & \sum_{i=1}^{k} o_i > I
\end{cases}
\tag{1}
\]

A Bayesian network is applied to identify anomalous events by introducing the root node which represents a variable with two states. One child node is used to capture the models' outputs and is connected to the root node; it is expected that the output events will be different when the input is either abnormal
or normal. A more recent application of a Bayesian network can be found in a telecommunication network (Deljac et al., 2015).

3.3. Neural network

The strength of a neural network for classifying data has also been used for network anomaly detection. Neural networks have been applied in various application domains, such as image and speech processing, but they have high computational requirements. For network anomaly detection, a neural network has been merged with other techniques, such as a statistical approach and variants of it. In Hawkins et al. (2002), a Replicator Neural Network (RNN) is used to provide an outlyingness factor for anomalous network traffic. It is a feed-forward multi-layer perceptron with three hidden layers placed between the input and output layers. Its objective is to reproduce the input data pattern at the output layer with a minimized error via training. Figure 5 presents a schematic view of an RNN. The $S_k(I_{ki})$ function produces the output from unit $i$ for layer $k$ as

$$
\theta = I_{ki} = \sum_{j=0}^{L_{k-1}} w_{kij} Z_{(k-1)j}
\tag{2}
$$

where $I_{ki}$ is the weighted sum of the inputs to the unit, $Z_{kj}$ the output from the $j$th unit of the $k$th layer and $L_k$ the number of units in the $k$th layer. The outlier factor is defined using the trained RNN as follows, where $x_{ij}$ is the input value and $o_{ij}$ the output value from the RNN.

$$
OF_i = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - o_{ij})^2
\tag{3}
$$

[Figure 5: input units V1, V2, V3, ..., Vi, ..., Vn in layer 1, three hidden layers (layers 2 to 4), and target units V1, ..., Vn in layer 5.]
Fig. 5. Schematic of replicator neural network, adapted from Hawkins et al. (2002).

In Zhang et al. (2001), a hierarchical intrusion detection system in which neural networks are combined with statistical models to detect network attacks is proposed. The output from the neural network classifier is represented as a continuous variable ($t$), where 1 means an intrusion with absolute certainty and −1 no attack. In addition, Self-organizing Maps (SOM) are used for network anomaly detection. Ramadas et al. (2003) suggested that, using SOM, network traffic can be classified in real time. SOM relies on the hypothesis that network attacks can be characterized by different sets of neurons that cover larger areas compared to others on an output neuron map. In Poojitha et al. (2010) a feed-forward neural network trained by the back-propagation algorithm is developed to detect anomalies using a given dataset with information related to the computer network during normal and anomalous behavior.

3.4. Rule-based

Rule-based anomaly detection techniques are widely used in supervised learning algorithms (Lee et al., 1999). The basic idea is to learn the normal behavior of a system, and anything not encompassed within it is considered anomalous. These techniques consider both single and multi-label learning algorithms. From a machine learning point of view, single-label classification aims to learn from a set of instances each of which is associated with a unique class label from a set of disjoint class labels. However, multi-label classification allows one instance to be associated with more than one class which can be correlated with fuzzy clustering. For a given training set ($S = (x_i, y_i),\ 1 \le i \le n$) consisting of $n$ training instances ($x_i \in x,\ y_i \in y$) which are independent and identically distributed, multi-label learning produces a multi-label classifier ($n : x \to y$) that optimizes the specific evaluation function. In Yang et al. (2013) a rule-based method for IEC 60870-5-104 driven SCADA networks using an in-depth protocol analysis and a deep packet inspection method is proposed.

4. Statistical anomaly detection

Intrusion detection techniques have also been developed using statistical theories; for example, the established chi-square theory is used for anomaly detection in Ye and Chen (2001). According to this technique, a profile of normal events in an information system is created. The basic idea in this approach is to treat a large departure of events from the normal profile as anomalous and as an intrusion. A distance measure based on the chi-square test statistic is developed as

$$
X^2 = \sum_{i=1}^{n} \frac{(X_i - E_i)^2}{E_i}
\tag{4}
$$

where $X_i$ is the observed value of the $i$th variable, $E_i$ is the expected value of the $i$th variable, and $n$ is the number of variables.

$X^2$ has a low value when an observation of the variables is near the expected. Following the $\mu \pm 3\sigma$ rule, an observation whose $X^2$ is greater than $\overline{X^2} + 3S_{X^2}$ is considered an anomaly.

Krügel et al. (2002) proposed a statistical processing unit for detecting anomalous network traffic, more specifically to detect rare attacks such as R2L and U2R. A metric is developed which allows the system to automatically search for identical characteristics of different service requests. The anomaly score of a request is calculated based on the following three main characteristics:

• the type of request;
• the length of the request; and
• the payload distribution.

The network administrator defines a threshold to raise alarms for anomalous requests. The anomaly score is derived as in Eq. (5) where the payload distribution is given more weight than the other properties.

$$
AS = 0.3\,AS_{type} + 0.3\,AS_{length} + 0.4\,AS_{payload}
\tag{5}
$$

Based on the principles of statistical theory, different types of techniques have been developed to detect anomalies; these are discussed next.

4.1. Mixture model

Based on the concept that an anomaly lies within a large number of normal elements, Eskin (2000) proposed a mixture model for detecting anomalies from noisy data. Generally, in mixture models, each element falls into one of the following two classes:
• having a small probability of $\lambda$; or
• the majority of elements having the probability of $1 - \lambda$.

The authors of Eskin (2000) assumed from an intrusion detection perspective that the set of system calls with the probability of $1 - \lambda$ is a legitimate use of the system and the intrusions have the probability of $\lambda$. From a mixture model perspective, the two probability distributions which generate the data are called the majority ($M$) and anomalous ($A$) distributions, with an element ($x_i$) generated from either. When the generative distribution for the data is $D$, it can be represented as

$$
D = (1 - \lambda) M + \lambda A
\tag{6}
$$

The data elements generated from the $A$ distribution are considered anomalous.

4.2. Signal processing technique

Although signal processing is an interesting research area, using such a technique for anomaly detection has hardly been explored. In Thottan and Ji (2003), a statistical signal processing technique based on abrupt change detection is presented. The authors describe network anomalies in two ways:

• anomalies that correspond to network failures and performance problems; and
• anomalies that encompass security-related issues such as DoS attacks.

In Thottan and Ji (2003), management information bases are used to produce a network health function that can be used to raise alarms corresponding to anomalous networks. The unusual behaviors in these bases are determined by finding abrupt changes in their statistics. A hypothesis test based on the generalized likelihood ratio (GLR) is used to detect the changes and to provide the degree of abnormality on a scale between 0 and 1.

4.3. Principal component analysis (PCA)

Shyu et al. (2003) presented an easier way to analyze high dimensional network traffic datasets using PCA. Principal components are linear combinations of $p$ random variables ($A_1, A_2, \ldots, A_p$) and can be characterized as:

1. uncorrelated,
2. with their variances sorted in order from high to low, and
3. their total variance equal to the variance of the original data.

A brief mathematical formulation of PCA is as follows. Let $A$ be an $n \times p$ data matrix of $n$ observations on each of $p$ variables ($A_1, A_2, \ldots, A_p$) and $S$ a $p \times p$ sample covariance matrix of $A_1, A_2, \ldots, A_p$. If $(\lambda_1, e_1), \ldots, (\lambda_p, e_p)$ are the $p$ eigenvalue–eigenvector pairs of the matrix $S$, the $i$th principal component is as follows, where $i = 1, 2, \ldots, p$ and $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0$.

$$
y_i = e_i^{\top}(x - \bar{x}) = e_{i1}(x_1 - \bar{x}_1) + e_{i2}(x_2 - \bar{x}_2) + \cdots + e_{ip}(x_p - \bar{x}_p)
\tag{7}
$$

An anomaly detection technique based on PCA (Shyu et al., 2003) has the benefits of:

• being free from any assumption of statistical distribution;
• being able to reduce the dimension of the data without losing any important information; and
• having minimal computational complexity which supports real-time anomaly detection.

According to the approach by Shyu et al. (2003), it is assumed that the number of normal instances is much higher than that of anomalies. The principal component classifier (PCC) contains two scores, one for each of the major and minor components, and a data instance ($x$) is classified as an anomaly if

$$
\sum_{i=1}^{q} \frac{y_i^2}{\lambda_i} > c_1 \quad \text{or} \quad \sum_{i=p-r+1}^{p} \frac{y_i^2}{\lambda_i} > c_2
\tag{8}
$$

and is normal if

$$
\sum_{i=1}^{q} \frac{y_i^2}{\lambda_i} \le c_1 \quad \text{and} \quad \sum_{i=p-r+1}^{p} \frac{y_i^2}{\lambda_i} \le c_2
\tag{9}
$$

where $c_1$ and $c_2$ are outlier thresholds chosen to yield a specific false alarm rate. It is also assumed that the data distribution is multivariate normal and the false alarm rate of the classifier is

$$
\alpha = \alpha_1 + \alpha_2 - \alpha_1 \alpha_2
\tag{10}
$$

where

$$
\alpha_1 = P\left( \sum_{i=1}^{q} \frac{y_i^2}{\lambda_i} > c_1 \,\middle|\, x \text{ is a normal instance} \right)
\tag{11}
$$

and

$$
\alpha_2 = P\left( \sum_{i=p-r+1}^{p} \frac{y_i^2}{\lambda_i} > c_2 \,\middle|\, x \text{ is a normal instance} \right)
\tag{12}
$$

5. Information theory

Information-theoretic measures can be used to create an appropriate anomaly detection model. In a paper by Lee and Xiang (2001), several measures, such as entropy, conditional entropy, relative entropy, information gain and information cost, are used to explain the characteristics of a dataset. We provide the following definitions of these measures.

• Entropy is a basic concept of information theory which measures the uncertainty of a collection of data items. For a dataset $D$ in which each data item belongs to a class ($x \in C_D$), the entropy of $D$ relative to the $|C_D|$-wise classification is defined as

$$
H(D) = \sum_{x \in C_D} P(x) \log \frac{1}{P(x)}
\tag{13}
$$

where $P(x)$ is the probability of $x$ in $D$.

• Conditional entropy of $D$ given $Y$ is the entropy of the probability distribution $P(x \mid y)$, defined as

$$
H(D \mid Y) = \sum_{x, y \in C_D, C_Y} P(x, y) \log \frac{1}{P(x \mid y)}
\tag{14}
$$

where $P(x, y)$ is the joint probability of $x$ and $y$ and $P(x \mid y)$ the conditional probability of $x$ given $y$.

• Relative entropy is the entropy between two probability distributions $p(x)$ and $q(x)$ defined over the same $x \in C_D$ as

$$
relEntropy(p \mid q) = \sum_{x \in C_D} p(x) \log \frac{p(x)}{q(x)}
\tag{15}
$$

• Relative conditional entropy is the entropy between two probability distributions ($p(x \mid y)$ and $q(x \mid y)$) defined over the same
$x \in C_D$ and $y \in C_Y$ as

$$
relCondEntropy(p \mid q) = \sum_{x, y \in C_D, C_Y} p(x, y) \log \frac{p(x \mid y)}{q(x \mid y)}
\tag{16}
$$

• Information gain is a measure of the information gain of an attribute or feature $A$ in a dataset $D$ and is

$$
Gain(D, A) = H(D) - \sum_{v \in Values(A)} \frac{|D_v|}{|D|} H(D_v)
\tag{17}
$$

where $Values(A)$ is the set of possible values of $A$ and $D_v$ the subset of $D$ where $A$ has the value $v$.

Based on this knowledge, appropriate anomaly detection models can be built. Supervised anomaly detection techniques require a training dataset followed by test data to evaluate the performance of a model. In this situation, firstly, information-theoretic measures are used to determine whether a model is suitable for testing the new dataset. Noble and Cook (2003) conducted experiments on the benchmark DARPA and UNM audit datasets to demonstrate the utility of information-theoretic measures and concluded that they can be used to create efficient anomaly detection models and also to explain their performances.

5.1. Correlation analysis

In Ambusaidi et al. (2014) a nonlinear correlation coefficient-based (NCC) similarity measure is suggested to extract both linear and nonlinear correlations between network traffic. The extracted correlative information is used to detect malicious network behaviours. Pearson's correlation coefficient is a basic linear correlation method to find out dependence between two variables (Ahmed et al., 2015c); however, there are datasets where nonlinear correlation exists between different variables such as in network traffic. The NCC is defined by Wang et al. (2005) as in Eq. (18), where $H^r(X)$ and $H^r(Y)$ are the revised entropies of the variables $X$ and $Y$.

$$
NCC(X; Y) = H^r(X) + H^r(Y) - H^r(X, Y)
\tag{18}
$$

Given a set of $m$ normal training data instances, NCC is calculated first. For any incoming instance the NCC between the incoming one and the normal instances is recorded as $NCC^{m,m+1}$. For a user-defined threshold $\sigma$, which ranges between 0 and 1, an incoming traffic instance is considered anomalous if the difference in NCC is greater than $\sigma$ (19).

$$
NCC^{m} - NCC^{m,m+1} > \sigma
\tag{19}
$$

In Tan et al. (2014a), for DoS attack detection a system is proposed that uses multivariate correlation analysis (MCA) for accurate network traffic characterization by extracting the geometrical correlations between network traffic features. The detection process contains three major steps as shown in Fig. 6. In Step 1, basic features are generated in a well-defined time interval. Step 2 includes the multivariate correlation analysis, where the 'triangle area map generation' module is applied to extract the correlations between two distinct features within each traffic instance coming from the first step. Step 3 contains decision-making based on training and testing phases.

Fig. 6. MCA based framework for DoS attack detection, adapted from Tan et al. (2014a).

The concept of the multivariate correlation analysis approach in Tan et al. (2014a) is incorporated to characterize network traffic instances and to convert them into respective images. These images are used for DoS attack detection based on a widely used dissimilarity measure, namely Earth Mover's Distance (EMD) (Rubner et al., 1998). EMD considers cross-bin matching and provides a more accurate evaluation of the dissimilarity between distributions than some other well-known dissimilarity measures.

6. Clustering-based

Clustering refers to unsupervised learning algorithms which do not require pre-labeled data to extract rules for grouping similar data instances (Jain et al., 1999). Although there are different types of clustering techniques, we discuss the usefulness of regular clustering and co-clustering for network anomaly detection. The difference between regular clustering and co-clustering is the processing of rows and columns. Regular clustering techniques such as k-means (Ahmed and Naser, 2013) cluster the data considering the rows of the dataset whereas co-clustering considers both rows and columns of the dataset simultaneously to produce clusters (Ahmed et al., 2015d).

The three key assumptions that are always made when using clustering to detect anomalies are briefly discussed below.

• Assumption 1: As we can create clusters of only normal data, any subsequent new data that do not fit well with existing clusters of normal data are considered anomalies; for example, as density-based clustering algorithms do not include noise inside clusters (Ester et al., 1996), noise is considered anomalous.
• Assumption 2: When a cluster contains both normal and anomalous data, it has been found that the normal data lie close to the nearest cluster's centroid but anomalies are far away from centroids (Ahmed and Naser, 2013). Under this assumption, anomalous events are detected using a distance score.
• Assumption 3: In a clustering with clusters of various sizes, the smaller and sparser can be considered anomalous and the thicker normal. Instances belonging to clusters whose sizes and/or densities are below a threshold are considered anomalous.

6.1. Regular clustering

The approach used by Münz et al. (2007) to detect anomalous data is quite straightforward. They use k-means clustering to generate normal and anomalous clusters. Once clustering is achieved, it is analyzed using the following assumptions:

• An instance is classified as normal if it is closer to the normal than to the anomalous cluster's centroid, and vice versa;
• If the distance between an instance and centroid is larger than a predefined threshold ($d_{max}$), the instance is treated as an anomaly; and
• An instance is treated as an anomaly if it is closer to the anomalous than to the normal cluster's centroid or if its distance to the normal cluster's centroid is larger than the predefined threshold.

More recently, in Ahmed and Mahmood (2014a), the authors also used x-means clustering (Pelleg and Moore, 2000) to detect collective anomalies such as DoS attacks. The performance of their technique was significantly better than other existing clustering-based methods.
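As an illustration only, the distance-based rules listed above for the k-means approach of Münz et al. (2007) can be sketched in Python as follows. The function names, the argument layout and the handling of ties are assumptions made for this sketch and are not taken from the original work; the two centroids (one labelled normal, one labelled anomalous) are assumed to have been obtained beforehand from a k-means run.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def classify_instance(x, normal_centroid, anomalous_centroid, d_max):
    """Label an instance using the combined distance rules.

    Assumption of this sketch: an instance is an anomaly if it is closer to
    the anomalous centroid than to the normal one, or if its distance to the
    normal centroid exceeds the threshold d_max. Ties between the two
    distances are treated as normal (a hypothetical choice).
    """
    d_normal = euclidean(x, normal_centroid)
    d_anomalous = euclidean(x, anomalous_centroid)
    if d_anomalous < d_normal or d_normal > d_max:
        return "anomaly"
    return "normal"


# Hypothetical two-dimensional centroids and threshold, for illustration only.
normal_c = (0.0, 0.0)
anomalous_c = (10.0, 10.0)
labels = [
    classify_instance((0.1, 0.0), normal_c, anomalous_c, d_max=1.0),
    classify_instance((9.0, 9.0), normal_c, anomalous_c, d_max=1.0),
    classify_instance((3.0, 0.0), normal_c, anomalous_c, d_max=1.0),
]
```

The third example is closer to the normal centroid but lies beyond the assumed threshold, so it is labelled an anomaly by the distance rule alone.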
Petrovic et al. (2006) proposed a cluster-labeling strategy based on a combination of clustering evaluation techniques. The Davies–Bouldin clustering evaluation index and comparison of the centroid diameters of the clusters are combined in order to respond adequately to the properties of attack vectors. They consider the compactness of the corresponding clusters and the separation between them, and the principal parameters which distinguish between 'normal' and 'abnormal' behavior in the analyzed network. However, they do not explain the reason for the use of k = 2 for their k-means clustering. According to their approach, the attack vectors are often very similar, if not identical; for example, the corresponding cluster in the case of a massive attack is extremely compact and the Davies–Bouldin index of such a clustering is either 0 (when the non-attack cluster is empty) or very close to 0. Bearing in mind the expected similarity among attack vectors, as the diameter of the centroid of an attack cluster is expected to be smaller than that of a non-attack cluster, they can distinguish between normal and anomalous clusters.

Portnoy et al. (2001) proposed clustering based on the width to classify data instances. The width is constant and remains the same for all clusters. Once clustering is performed, based on the assumption that normal instances constitute an overwhelmingly large proportion of the entire dataset, N percent of clusters are normal and the rest are anomalous. Using this assumption, Leung and Leckie (2005) proposed a density- and grid-based clustering algorithm which is suitable for unsupervised anomaly detection.

Syarif et al. (2012) investigated the performances of various clustering algorithms when applied for anomaly detection. They used five different approaches: the k-means, improved k-means, k-medoids, Expectation Maximization (EM) clustering, and distance-based anomaly detection algorithms. Table 3 demonstrates the performance evaluation of clustering algorithms used for network anomaly detection.

Table 3
Network anomaly detection: evaluation using NSL-KDD dataset (Syarif et al., 2012).

Algorithm                           Accuracy (%)   False positive (%)
k-means                             57.81          22.95
Improved k-means                    65.40          21.52
k-medoids                           76.71          21.83
EM clustering                       78.06          20.74
Distance-based anomaly detection    80.15          21.14

Arshad et al. (https://repository.lib.fit.edu/handle/11141/126) developed an approach to determine if a cluster is an outlier (CLAD). This technique relies on two properties of a cluster: its density and distance from other clusters. According to them the cluster density is dependent on the number of data instances. To determine the distance, they calculate the average inter-cluster distance (ICD) between one cluster and the others. According to Guan et al. (2003), if the population ratio of one cluster is above a given threshold, all the instances in that cluster are classified as normal; otherwise, they are labeled intrusive. Joshua et al. (Oldmeadow et al., 2004) proposed a solution for time-varying network intrusion detection and they also demonstrate how feature weighting can improve classification accuracy. Clustering techniques are also incorporated in other systems as hybrid techniques, for example in k-NN based classifiers for online anomaly detection (Su, 2011).

6.2. Co-clustering

Co-clustering can be simply considered a simultaneous clustering of both rows and columns (Govaert and Nadif, 2008; Banerjee et al., 2007). It can produce a set of c column clusters of the original columns (C) and a set of r row clusters of the original row instances (R). Unlike other clustering algorithms, co-clustering defines a clustering criterion and then optimizes it. In a nutshell, it simultaneously finds the subsets of rows and columns of a data matrix using a specified criterion. The benefits of co-clustering over regular clustering are the following:

1. Simultaneous grouping of both rows and columns can provide a more compressed representation and it preserves information contained in the original data.
2. Co-clustering can be considered as a dimensionality reduction technique and it is suitable for creating new features.
3. Significant reduction in computational complexity. For example, the traditional k-means algorithm has a computational complexity of O(mnk), where m is the number of rows, n the number of columns and k the number of clusters, whereas in co-clustering the computational complexity is O(mkl + nkl), where l is the number of column clusters. Hence O(mnk) ≫ O(mkl + nkl) when l is much smaller than m and n.

According to Ahmed and Mahmood (2014b), co-clustering is beneficial for detecting DoS attacks and significant performance improvement is achieved while it is being used in the collective anomaly detection framework (Table 4 depicts the experimental results). However, in Papalexakis et al. (2012), the usage of co-clustering for detecting all types of network attacks is investigated. Table 5 shows the comparison of clustering purity between Ahmed and Mahmood (2014b) and Papalexakis et al. (2012) for identifying DoS attacks.

7. Intrusion detection datasets and issues

Due to privacy issues, the datasets used for network traffic analysis are not easily available. There are very few publicly available datasets and among them the DARPA/KDD datasets are considered the benchmark. In this section, we discuss the limitations of the publicly available DARPA/KDD datasets and also provide a detailed introduction to a recent technique for building an intrusion detection dataset.

7.1. Limitations of DARPA/KDD datasets

Among the anomaly detection techniques discussed in the scope of this paper, more than 50% of them use the DARPA/KDD datasets due to their availability. However, these datasets are criticized by Testing intrusion detection systems (2000) for the generation procedure and the analysis by Mahoney and Chan

Table 4
Evaluation results.

Accuracy   Precision   Recall   F-measure   Attack cluster purity   Normal cluster purity
92.82%     0.9236      0.9923   0.96        92.36%                  95.6%
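As a quick consistency check on Table 4, the reported F-measure can be recomputed from the reported precision and recall. The sketch below assumes the standard F1 definition (the harmonic mean of precision and recall), which the text does not spell out; the function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F1, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table 4.
table4_precision = 0.9236
table4_recall = 0.9923

table4_f = f_measure(table4_precision, table4_recall)
print(round(table4_f, 2))  # prints 0.96
```

Under this assumption the recomputed value agrees with the F-measure of 0.96 listed in Table 4.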
1 Table 5 exploits. Interestingly, the performances of intrusion detection 67

2 Cluster purity comparison. techniques on the KDD datasets vary from the ADFA-LD12 as KDD 68
3 dataset fails to represent contemporary attacks. 69
Purity Ahmed and Mahmood (2014b) Papalexakis et al. (2012)
4 70
5 Normal (%) 95.6 75.84 71
6 Attack (%) 92.36 92.44 7.3. Current network data repositories 72
7 73
8 Some publicly available network traffic datasets are based on 74
9 the current operating systems and hardware (their sources and 75
10 Table 6
comments about them are listed below). However, several projects 76
11 Attack structure, adapted from Creech and Hu (2013). 77
dedicated to the development of benchmark intrusion detection
12 78
Payload/effect Vector evaluation datasets are currently being undertaken.
13 79
14 80
Password bruteforce FTP, SSH by Hydra (Thc-hydra, 2014)
PREDICT (Predict, 2014): Stands for Protected Repository for the
15 Adding new super-user Client side poisoned executable 81
Java-based meterpreter Tiki Wiki vulnerability exploit (Tikiwiki, 2014) Defense of Infrastructure Against cyber threats. It is a US-based
16 82
Linux Meterpreter Payload Client side poisoned executable community of producers of security-relevant network opera-
17 C100 webshell PHP remote file inclusion vulnerability
83
tions data and consists of researchers of networking and
18 84
information security. This dataset supports developers and
19 85
evaluators by providing regularly updated network data rele-
20 (2003) found evidence of simulation artifacts that could result in 86
vant to cyber security research.
21 over-estimations of anomaly detection performances. The KDD 87
CAIDA (Caida, 2014): Provides basic captured network traces but
22 datasets were developed using a Solaris-based operating system to 88
it is not labeled and lacks multiple-attack scenarios.
23 collect a wide range of data due to its easy deployment. However, 89
Internet Traffic Archive (Internet traffic archive, 2014): It is a
24 we can see significant differences in today's operating systems 90
repository for supporting widespread access to traces of inter-
25 which barely resemble Solaris. In this age of Ubuntu, Windows and 91
net network traffic and is sponsored by ACM SIGCOMM. How-
26 MAC, Solaris has almost no market share. The traffic collector used 92
ever, it suffers from heavy anonymization, lacks the necessary
27 in KDD datasets, TCPdump, is very likely to become overloaded 93
packet information, is not labeled and has no multiple-attack
28 and drop packets from a heavy traffic load. More importantly, 94
scenarios.
29 there is some confusion about these datasets attack distributions. 95
DEFCON (Defcon, 2014): It is different from real network traffic
30 According to an attack analysis, Probe is not an attack unless the 96
and consists mainly of intrusive traffic and normally used for
31 number of iterations exceeds a specific threshold while label 97
the alert correlation technique.
32 inconsistency has been reported. A description of the KDD datasets 98
ADFA Intrusion Detection Datasets (Adfa intrusion detection
33 states that there are 24 training and 14 test attacks. However, it is 99
datasets, 2014): It covers both Linux and Windows, and are
34 reported by Shafi and Abbass (2013) that the training data contain 100
designed for the evaluation by a system call-based HIDS.
35 22 attacks and the test data 17. This inconsistency has a significant 101
NSL-KDD (NSL-KDD, 2014): It is a dataset suggested as a means
36 impact on the class distribution of attacks. In this scenario, it is 102
of solving some of the inherent problems of the KDD dataset
37 important to create intrusion detection datasets in modern-day 103
mentioned in Testing intrusion detection systems (2000).
38 computing to address the issues of DARPA/KDD. The next section 104
KYOTO (Kyoto Dataset, 2014): It contains traffic data from Kyoto
39 discusses one such contemporary dataset for network traffic 105
University's ‘Honeypots’.
40 analysis. 106
ISCX 2012 (Shiravi et al., 2012): It is developed by Information
41 107
Security Centre of Excellence at University of New Brunswick. It
42 7.2. Contemporary network attacks evaluation dataset: ADFA-LD12 108
Creech and Hu (2013) introduced a publicly available dataset, ADFA-LD12, which is representative of modern attack structure and methodology. The dataset was developed using Ubuntu Linux version 11.04 (a modern, widely used Linux distribution; Ubuntu Linux, 2014) as the host operating system (Adfa intrusion detection datasets, 2014). To allow web-based attacks, Apache Version 2.2.17 (The apache software foundation, 2014) with PHP Version 5.3.5 (PHP: Hypertext processor, 2014) was installed and enabled. TikiWiki Version 8.1 (Tikiwiki: Cms groupware, 2014) was installed as a web-based collaborative tool because it has a known vulnerability, as detailed in Tikiwiki (2014). The developers of the dataset selected the attacks carefully and focused on the methods of contemporary penetration testers and hackers. There was also a trade-off between the vulnerability of the targeted system and the realism required, with the intended target server for ADFA-LD12 fully patched to create a realistic target. The vulnerability used in this scenario, such as the TikiWiki remote code execution vulnerability (Tikiwiki, 2014), is considered to emulate a realistic, small flaw in a well-configured server and is an acceptable simulation of the real world. Table 6 shows the breakdown of payloads and vectors used to attack the Ubuntu OS. The ADFA-LD12 is a possible successor to the DARPA/KDD datasets as it uses a modern Linux operating system and up-to-date

contains seven days of captured traffic with 2450324 flows overall, including DoS attacks.

ICS Attack (ICS Attack Dataset, 2014): Oak Ridge National Laboratories (ORNL) have created three datasets that include measurements of normal, disturbance, control and cyber attack behaviors in an electric transmission system.

8. Evaluation of network anomaly detection techniques

This paper contains discussion on four major categories of network anomaly detection (Sections 3–6). In this section, we compare and contrast (Table 7) these techniques based on the following criteria:

Table 7
Evaluation of network anomaly detection techniques.

Technique            Output         Attack priority   Complexity
Classification       Label, score   DoS               Quadratic
Statistical          Label, score   R2L, U2R          Linear
Clustering           Label          DoS               Quadratic
Information theory   Label          Neutral           Exponential
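The rows of Table 7 can also be held as a small lookup structure so that the comparison can be queried programmatically. The following is a minimal Python sketch and is not part of the original survey: the names Technique, TABLE_7, label_only, prioritising and by_complexity, and the numeric ranking of the complexity classes, are illustrative assumptions. Only the cell values are taken from Table 7.

```python
from dataclasses import dataclass

# Assumed ordering of the complexity classes named in Table 7 (lower is cheaper).
COMPLEXITY_RANK = {"Linear": 0, "Quadratic": 1, "Exponential": 2}


@dataclass(frozen=True)
class Technique:
    """One row of Table 7 (hypothetical container, for illustration only)."""
    name: str
    outputs: tuple          # "Label" and/or "Score"
    attack_priority: tuple  # attack classes, or ("Neutral",)
    complexity: str         # key of COMPLEXITY_RANK


# Cell values copied from Table 7.
TABLE_7 = (
    Technique("Classification", ("Label", "Score"), ("DoS",), "Quadratic"),
    Technique("Statistical", ("Label", "Score"), ("R2L", "U2R"), "Linear"),
    Technique("Clustering", ("Label",), ("DoS",), "Quadratic"),
    Technique("Information theory", ("Label",), ("Neutral",), "Exponential"),
)


def label_only(rows):
    """Names of techniques whose only output is a label."""
    return [t.name for t in rows if t.outputs == ("Label",)]


def prioritising(rows, attack):
    """Names of techniques whose attack priority includes `attack`."""
    return [t.name for t in rows if attack in t.attack_priority]


def by_complexity(rows):
    """Names ordered from lowest to highest complexity class (stable sort)."""
    ranked = sorted(rows, key=lambda t: COMPLEXITY_RANK[t.complexity])
    return [t.name for t in ranked]
```

For example, `prioritising(TABLE_7, "DoS")` selects the classification and clustering rows, and `by_complexity(TABLE_7)` places the statistical techniques first.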
• Computational Complexity (Papadimitriou, 1994): Linear, Quadratic, Exponential
• Preference of attack detection: DoS, Probe, R2L, U2R
• Output: Label, Score

As discussed earlier in Section 2.2, anomaly detection techniques that produce only labelled output are more efficient than those that produce scores. On this criterion, clustering and information theory based techniques are better than classification and statistical techniques. As far as the priority of attack detection is concerned, the classification and clustering-based techniques are geared towards identifying DoS attacks. Moreover, DoS attacks are the most easily launched attacks, yet they have a detrimental impact on any network. Therefore, techniques that identify DoS attacks are in greater demand than others. In terms of computational complexity, statistical techniques are better than the other techniques because of their linear complexity. Linear complexity is usual when fitting a statistical distribution such as a Gaussian or a mixture model. However, in the case of principal component analysis, the complexity is not linear because of the underlying computations. The clustering and classification-based techniques have quadratic complexity for the following reasons:

• Clustering techniques require pairwise distance computation.
• Classification techniques require quadratic optimization to separate two or more classes (e.g. SVM).

The information theory based techniques suffer from exponential time complexity because of the calculation of measurements such as entropy, relative uncertainty, etc. These techniques also require dual optimization for minimizing the subset size and simultaneously reducing the complexity in the dataset.

From the above discussion, we can arrive at the following conclusions:

• As far as the output style is concerned, clustering and information theory based techniques are better than the others. Clustering techniques are computationally efficient and specifically target DoS attack detection, while the information theory based techniques have no specific attack target.
• Based on the attack preference, both clustering and classification target DoS attacks, while the statistical techniques have a preference for R2L and U2R attacks, which are very rare. Though the statistical techniques have the lowest complexity, they are not suitable for DoS attack detection.
• Although classification and clustering techniques have similar complexity and the same target, the classification techniques are based on supervised learning, which requires training on pre-labeled data, whereas clustering is completely unsupervised. Clustering cannot outdo the statistical techniques in complexity, yet it outperforms them on all other criteria.

9. Conclusions and future research directions

The survey of literature reported in this paper has categorized the network anomaly detection methods into four major categories. For each category, we described the assumptions for segregating normal data instances from anomalous ones. These assumptions provide a guideline to assess the efficiency of the techniques when applied in a particular domain. Compared to other surveys, this paper provided a discussion on network traffic dataset issues, which are of significant concern to the research community in the area of network traffic analysis.

Data mining and machine learning techniques constantly attempt to improve the knowledge discovery process. The ubiquitous data streams generated from various applications are increasingly large in volume (Ahmed and Mahmood, 2014; Ahmed et al., 2015e, 2015f, 2015c). Internet traffic doubles each year and computer network traffic is increasing at a fast rate, which makes it challenging to monitor a network in real time. Applications such as email, ftp, http and p2p generate a large amount of data, even for small networks, which cannot be analyzed in real time (Zhu, 2011). Consequently, many existing data mining techniques cannot be applied to data streams, which are evolving and need to be mined in a single pass. As argued by Hoplaros et al. (2014), summarization is a potential solution to this issue. However, existing summarization processes are complex and struggle to find emerging patterns in huge volumes of data, also known as ‘Big Data’. This poses a challenge for the next-generation data mining community. From a network traffic perspective, finding both normal and anomalous traffic patterns is important and could be an interesting area for future research.

Existing anomaly detection techniques mostly monitor a single system or a single network by carrying out local analysis for attacks. Hence, no communication or interaction exists between instances of such standalone anomaly detection techniques. Such a solution will not be able to detect sophisticated and highly distributed attacks (Vasilomanolakis et al., 2015; Tan et al., 2014b). Thus, for the security of large networks and large IT ecosystems (i.e. cloud services), collaborative techniques, which consist of several monitors that act as sensors and collect data, are extremely efficient. Because implementations of collaborative techniques such as CIDSs (Collaborative Intrusion Detection Systems) are unavailable, future research efforts are necessary for extensive quantitative evaluation with state-of-the-art network infrastructure.

References

Adfa intrusion detection datasets, accessed: 2014-12-29. URL 〈http://seit.unsw.adfa.edu.au/staff/sites/hu/〉.
Ahmed M, Mahmood A. Clustering based semantic data summarization technique: a new approach. In: 2014 IEEE 9th conference on industrial electronics and applications (ICIEA), 2014, p. 1780–5.
Ahmed M, Mahmood A. Network traffic analysis based on collective anomaly detection. In: 2014 IEEE 9th conference on industrial electronics and applications (ICIEA), 2014. p. 1141–46.
Ahmed M, Mahmood AN. Network traffic pattern analysis using improved information-theoretic co-clustering based collective anomaly detection. In: Security and privacy in communication networks, Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering, vol. 153. Springer International Publishing, 2014. p. 1–16.
Ahmed M, Mahmood A. Novel approach for network traffic pattern analysis using clustering-based collective anomaly detection. Ann. Data Sci. 2015;2(1):111–30.
Ahmed M, Naser A. A novel approach for outlier detection and clustering improvement. In: 2013 8th IEEE conference on industrial electronics and applications (ICIEA), 2013, p. 577–82.
Ahmed M, Mahmood AN, Hu J. Outlier Detection, CRC Press, New York, USA, 2014. p. 3–21, Chapter 1 (in book: The State of the Art in Intrusion Prevention and Detection).
Ahmed M, Mahmood AN, Islam MR. A survey of anomaly detection techniques in financial domain, Future Generation Computer Systems, http://dx.doi.org/10.1016/j.future.2015.01.001.
Ahmed M, Anwar A, Mahmood AN, Shah Z, Maher MJ. An investigation of performance analysis of anomaly detection techniques for big data in scada systems. EAI Endorsed Trans Ind Netw Intell Syst 2015b;15(3):1–16.
Ahmed M, Mahmood AN, Maher MJ. An efficient technique for network traffic summarization using multiview clustering and statistical sampling. EAI Endorsed Trans Scalable Inf Syst 2015c;15(5):1–9.
Ahmed M, Mahmood A, Maher M. Heart disease diagnosis using co-clustering. In: Jung JJ, Badica C, Kiss A, editors. Scalable information systems, Lecture notes of the institute for computer sciences, Social informatics and telecommunications engineering, vol. 139. Springer International Publishing; 2015d. p. 61–70.
Ahmed M, Mahmood A, Maher M. A novel approach for network traffic summarization. In: Jung JJ, Badica C, Kiss A, editors. Scalable information systems, Lecture notes of the institute for computer sciences, Social informatics and
telecommunications engineering, vol. 139. Springer International Publishing; 2015e. p. 51–60.
Ahmed M, Mahmood A, Maher M. An efficient approach for complex data summarization using multiview clustering. In: Jung JJ, Badica C, Kiss A, editors. Scalable information systems, lecture notes of the institute for computer sciences, Social informatics and telecommunications engineering, vol. 139. Springer International Publishing; 2015f. p. 38–47.
Ambusaidi MA, Tan Z, He X, Nanda P, Lu LF, Jamdagni A. Intrusion detection method based on nonlinear correlation measure. Int J Internet Protoc Technol 2014;8(2/3):77–86.
Axelsson S. Technical report: Research in intrusion-detection systems: A survey, no. 98–17, SE–412 96, Göteborg, Sweden, 1998.
Balabine I, Velednitsky A. Method and system for confident anomaly detection in computer network traffic. Google Patents, 2015.
Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha DS. A generalized maximum entropy approach to bregman co-clustering and matrix approximation. J Mach Learn Res 2007;8:1919–86.
Beckman RJ, Cook RD. Outlier..........s. Technometrics 1983;25(2):119–49.
Caida, accessed: 2014-12-29. URL 〈www.caida.org〉.
Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv 2009;41(3):15:1–58.
Creech G, Hu J. Generation of a new ids test dataset: time to retire the kdd collection, In: Wireless communications and networking conference (WCNC), 2013 IEEE, 2013. p. 4487–92.
Cristianini N, Shawe-Taylor J. An introduction to support vector machines: and other kernel-based learning methods. New York, NY, USA: Cambridge University Press; 2000.
Debar H, Dacier M, Wespi A. A revised taxonomy for intrusion-detection systems. Ann Des Télécommun 2000;55(7–8):361–78.
Defcon, accessed: 2014-12-29. URL 〈www.defcon.org〉.
Deljac Željko, Randic M, Krcelic G. Early detection of network element outages based on customer trouble calls. Decis Support Syst 2015;73:57–73.
Eskin E. Anomaly detection over noisy data using learned probability distributions. In: Proceedings of the seventeenth international conference on machine learning, ICML '00, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., 2000. p. 255–62.
Eskin E, Arnold A, Prerau M, Portnoy L, Stolfo S. A geometric framework for unsupervised anomaly detection. In: Barbará D, Jajodia S (editors). Applications of data mining in computer security, Adv Inf Secur, vol. 6. Springer US, 2002. p. 77–101.
Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD'96, 1996. p. 226–31.
Estevez-Tapiador JM, Garcia-Teodoro P, Diaz-Verdejo JE. Anomaly detection methods in wired networks: a survey and taxonomy. Comput Commun 2004;27(16):1569–84.
Furnell S, Tucker C, Furnell S, Ghita B, Brooke P. A new taxonomy for comparing intrusion detection systems. Internet Res 2007;17(1):88–98.
Gogoi P, Bhattacharyya D, Borah B, Kalita JK. A survey of outlier detection methods in network anomaly identification. Comput J 2011;54(4):570–88.
Govaert G, Nadif M. Block clustering with Bernoulli mixture models: comparison of different approaches. Comput Stat Data Anal 2008;52(6):3233–45.
Guan Y, Ghorbani A, Belacel N. Y-means: a clustering method for intrusion detection. In: IEEE CCECE 2003 Canadian conference on electrical and computer engineering, 2003, vol. 2; 2003. p. 1083–6.
Hacking and cracking tools, accessed: 2014-12-29. URL 〈http://hackingncrackingtools.blogspot.com.au/〉.
Hawkins D. Identification of outliers (monographs on statistics and applied probability). 1st ed. Springer; 1980.
Hawkins S, He H, Williams G, Baxter R. Outlier detection using replicator neural networks. In: Kambayashi Y, Winiwarter W, Arikawa M, editors. Data warehousing and knowledge discovery, lecture notes in computer science, vol. 2454. Berlin, Heidelberg: Springer; 2002. p. 170–80.
Heller KA, Svore KM, Keromytis AD, Stolfo SJ. One class support vector machines for detecting anomalous windows registry accesses. In: Proceedings of the workshop on data mining for computer security, 2003.
Hodge V, Austin J. A survey of outlier detection methodologies. Artif Intell Rev 2004;22(2):85–126.
Hoplaros D, Tari Z, Khalil I. Data summarization for network traffic monitoring. J Netw Comput Appl 2014;37(0):194–205.
Hu W, Liao Y, Vemuri VR. Robust anomaly detection using support vector machines. In: Proceedings of the international conference on machine learning; 2003.
ICS Attack Dataset, 2014, accessed: 2015-02-27. URL 〈http://www.ece.msstate.edu/wiki/〉.
Identifying outliers via clustering for anomaly detection, accessed: 2014-12-29. URL 〈https://repository.lib.fit.edu/handle/11141/126〉.
Internet traffic archive, accessed: 2014-12-29. URL 〈http://ita.ee.lbl.gov/〉.
Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv 1999;31(3):264–323.
Kendall K. A database of computer attacks for the evaluation of intrusion detection systems. In: Proceedings of DARPA information survivability conference and exposition (DISCEX); 1999. p. 12–26.
Kruegel C, Mutz D, Robertson W, Valeur F. Bayesian event classification for intrusion detection. In: Proceedings of 19th annual computer security applications conference; 2003. p. 14–23.
Krügel C, Toth T, Kirda E. Service specific anomaly detection for network intrusion detection. In: Proceedings of the 2002 ACM symposium on applied computing, SAC '02, ACM, New York, NY, USA; 2002. p. 201–8.
Kyoto Dataset, accessed: 2014-12-29. URL 〈www.takakura.com〉.
Lee W, Xiang D. Information-theoretic measures for anomaly detection. In: Proceedings of 2001 IEEE symposium on security and privacy, 2001 S&P 2001; 2001. p. 130–43.
Lee W, Stolfo S, Mok K. A data mining framework for building intrusion detection models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy; 1999. p. 120–32.
Leung K, Leckie C. Unsupervised anomaly detection in network intrusion detection using clusters. In: Proceedings of the twenty-eighth Australasian conference on computer science – vol. 38, ACSC '05, Australian Computer Society, Inc., Darlinghurst, Australia, Australia; 2005. p. 333–42.
Liao H-J, Lin C-HR, Lin Y-C, Tung K-Y. Intrusion detection system: a comprehensive review. J Netw Comput Appl 2013;36(1):16–24.
Lin J, Keogh E, Fu A, Van Herle H. Approximations to magic: finding unusual medical time series. In: Proceedings of the 18th IEEE symposium on computer-based medical systems, CBMS '05, IEEE Computer Society, Washington, DC, USA; 2005. p. 329–334.
Mahmood A, Leckie C, Udaya P. An efficient clustering scheme to exploit hierarchical data in network traffic analysis. IEEE Trans Knowl Data Eng 2008;20(6):752–67.
Mahmood AN, Hu J, Tari Z, Leckie C. Critical infrastructure protection: resource efficient sampling to improve detection of less frequent patterns in network traffic. J Netw Comput Appl 2010;33(4):491–502.
Mahoney M, Chan P. An analysis of the 1999 darpa/lincoln laboratory evaluation data for network anomaly detection. In: Vigna G, Kruegel C, Jonsson E, editors. Recent advances in intrusion detection, Lecture notes in computer science, vol. 2820. Berlin, Heidelberg: Springer; 2003. p. 220–37.
Markou M, Singh S. Novelty detection: a review; part 2: neural network based approaches. Signal Process 2003;83(12):2499–521.
Münz G, Li S, Carle G. Traffic anomaly detection using kmeans clustering. In: GI/ITG Workshop MMBnet, 2007.
Noble CC, Cook DJ. Graph-based anomaly detection. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, KDD '03, ACM, New York, NY, USA; 2003. p. 631–6.
NSL-KDD, accessed: 2014-12-29. URL 〈http://nsl.cs.unb.ca/NSL-KDD/〉.
Oldmeadow J, Ravinutala S, Leckie C. Adaptive clustering for network intrusion detection. In: Dai H, Srikant R, Zhang C, editors. Advances in knowledge discovery and data mining, Lecture notes in computer science, vol. 3056. Berlin, Heidelberg: Springer; 2004. p. 255–9.
Papadimitriou CM. Computational complexity. Reading, MA: Addison-Wesley; 1994.
Papalexakis EE, Beutel A, Steenkiste P. Network anomaly detection using co-clustering. In: Proceedings of the 2012 international conference on advances in social networks analysis and mining (ASONAM 2012), ASONAM '12, IEEE Computer Society, Washington, DC, USA; 2012. p. 403–10.
Patcha A, Park J-M. An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput Netw 2007;51(12):3448–70.
Pelleg D, Moore AW. X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, ICML '00, San Francisco, CA, USA; Morgan Kaufmann Publishers Inc., 2000. p. 727–34.
Petrovic S, Alvarez G, Orfila A, Carbo J. Labelling clusters in an intrusion detection system using a combination of clustering evaluation techniques. In: Proceedings of the 39th annual Hawaii international conference on System Sciences, 2006. HICSS '06, vol. 6; 2006. p. 129b–129b.
PHP: Hypertext processor, accessed: 2014-12-29. URL 〈http://www.php.net〉.
Phua C, Lee VCS, Smith-Miles K, Gayler RW. A comprehensive survey of data mining-based fraud detection research, CoRR abs/1009.6119. URL arxiv.org/abs/1009.6119.
Platt JC. Advances in kernel methods. In: Fast training of support vector machines using sequential minimal optimization. MIT Press, Cambridge, MA, USA; 1999. p. 185–208.
Poojitha G, Kumar K, Reddy P. Intrusion detection using artificial neural network, In: 2010 International conference on computing communication and networking technologies (ICCCNT); 2010. p. 1–7.
Portnoy L, Eskin E, Stolfo S. Intrusion detection with unlabeled data using clustering. In: Proceedings of ACM CSS workshop on data mining applied to security (DMSA); 2001.
Predict, accessed: 2014-12-29. URL 〈www.predict.org〉.
Qin T, Guan X, Li W, Wang P, Huang Q. Monitoring abnormal network traffic based on blind source separation approach. J. Netw. Comput. Appl. 2011;34(5):1732–42 (dependable multimedia communications: systems, services, and applications).
Ramadas M, Ostermann S, Tjaden B. Detecting anomalous network traffic with self-organizing maps. In: Vigna G, Kruegel C, Jonsson E, editors. Recent advances in intrusion detection, Lecture notes in computer science, vol. 2820. Berlin, Heidelberg: Springer; 2003. p. 36–54.
Rubner Y, Tomasi C, Guibas L. A metric for distributions with applications to image databases. In: Sixth international conference on computer vision, 1998; 1998. p. 59–66.
Shafi K, Abbass H. Evaluation of an adaptive genetic-based signature extraction system for network intrusion detection. Pattern Anal Appl 2013;16(4):549–66.
Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 2012;31(3):357–74.
Shyu M-L, Chen S-C, Sarinnapakorn K, Chang L. A novel anomaly detection scheme based on principal component classifier. In: IEEE foundations and new directions of data mining workshop, in conjunction with ICDM'03, 2003. p. 171–9.
Su M-Y. Using clustering to improve the knn-based classifiers for online anomaly network traffic identification. J Netw Comput Appl 2011;34(2):722–30 (efficient and robust security and services of wireless mesh networks).
Syarif I, Prugel-Bennett A, Wills G. Unsupervised clustering approach for network anomaly detection. In: Benlamri R, editor. Networked digital technologies communications in computer and information science, vol. 293. Berlin Heidelberg: Springer; 2012. p. 135–45.
Symantec internet security threat report, accessed: 2014-12-29. URL 〈http://www.symantec.com/〉.
Tan Z, Jamdagni A, He X, Nanda P, Liu RP. A system for denial-of-service attack detection based on multivariate correlation analysis. IEEE Trans Parallel Distrib Syst 2014a;25(2):447–56.
Tan Z, Nagar UT, He X, Nanda P, Liu RP, Wang S, Hu J. Enhancing big data security with collaborative intrusion detection. IEEE Cloud Comput 2014b:27–33.
Testing intrusion detection systems: a critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory. ACM Trans Inf Syst Secur 2000;3(4):262–94.
Thc-hydra, accessed: 2014-12-29. URL 〈http://www.thc.org/thc-hydra/〉.
The apache software foundation, accessed: 2014-12-29. URL 〈http://apache.org〉.
The Global State of Information Security Survey 2015, accessed: 2015-01-19. URL 〈http://www.pwc.com〉.
Thottan M, Ji C. Anomaly detection in ip networks. IEEE Trans Signal Process 2003;51(8):2191–204.
Tikiwiki cms groupware remote php code injection, accessed: 2014-12-29. URL 〈http://www.exploit-db.com/exploits/18265/〉.
Tikiwiki: Cms groupware, accessed: 2014-12-29. URL 〈http://info.tiki.org/Tiki+Wiki+CMS+Groupware〉.
Towards a taxonomy of intrusion-detection systems, Comput. Netw. 31 (9) (1999) 805–822.
Ubuntu Linux, accessed: 2014-12-29. URL 〈http://www.ubuntu.com〉.
Vasilomanolakis E, Karuppayah S, Mühlhäuser M, Fischer M. Taxonomy and survey of collaborative intrusion detection. ACM Comput Surv 2015;47(4):55:1–33.
Verizon's data breach investigation report 2014, accessed: 2014-12-29. URL 〈http://www.verizonenterprise.com/DBIR/2014/〉.
Wang Q, Shen Y, Zhang JQ. A nonlinear correlation measure for multivariable data set. Phys D: Nonlinear Phenom 2005;200(3–4):287–95.
Xie M, Han S, Tian B, Parvin S. Anomaly detection in wireless sensor networks: a survey. J Netw Comput Appl 2011;34(4):1302–25 (advanced topics in cloud computing).
Yang Y, McLaughlin K, Littler T, Sezer S, Wang H. Rule-based intrusion detection system for scada networks. In: Renewable power generation conference (RPG 2013), 2nd IET; 2013. p. 1–4.
Ye N, Chen Q. An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems. Qual Reliab Eng Int 2001;17:105–12.
Zhang Z, Li J, Manikopoulos CN, Jorgenson J, Ucles J. Hide: a hierarchical network intrusion detection system using statistical preprocessing and neural network classification. In: Proceedings of IEEE workshop on information assurance and security; 2001. p. 85–90.
Zhu R. Intelligent rate control for supporting real-time traffic in wlan mesh networks. J Netw Comput Appl 2011;34(5):1449–58 (dependable multimedia communications: systems, services, and applications).