
Anomaly detection using baseline and K-means clustering

2010

Anomaly detection refers to methods that provide warnings of unusual behaviors which may compromise the security and performance of communication networks. This paper proposes a novel model for network anomaly detection combining baseline, K-means clustering and particle swarm optimization (PSO). The baseline consists of profiles of normal network traffic behavior, generated by applying the Baseline for Automatic Backbone Management (BLGBA) model to a historical SNMP network data set, while K-means is an unsupervised learning clustering algorithm used to recognize patterns or features in data sets. In order to escape from the local optima problem, K-means is combined with PSO, a metaheuristic whose main characteristics include low computational complexity and dependence on a small number of input parameters. The proposed anomaly detection approach classifies data clusters from baseline and real traffic using K-means combined with PSO. Anomalous behaviors can be identified by comparing the distance between real traffic and cluster centroids. Tests were performed in the network of State University of Londrina and the obtained detection and false alarm rates are promising.

Moisés F. Lima∗, Bruno B. Zarpelão†, Lucas D. H. Sampaio∗, Joel J. P. C. Rodrigues‡, Taufik Abrão∗ and Mario Lemes Proença Jr.∗
∗ Computing Science Department, State University of Londrina (UEL), Londrina, Brazil
† School of Elect. & Comp. Engineering, University of Campinas (UNICAMP), Campinas, Brazil
‡ Instituto de Telecomunicações, University of Beira Interior, Covilhã, Portugal
E-mails: {moisesflima, brunozarpelao, lucas.dias.sampaio}@gmail.com, joeljr@ieee.org, {taufik, proenca}@uel.br

1. INTRODUCTION

Identifying network anomalies is essential for communication networks of enterprises or institutions.
The goal of anomaly detection is to provide an early warning about unusual behavior that can affect the security and performance of a network. It is very important to detect and treat anomalies efficiently, because they affect the quality of the services provided, resulting in degraded network performance and even in interruption of operations. Due to the large number of anomalous events that can occur in networks, the main challenge is to detect and classify anomalies automatically [1]–[3]. According to [4], anomaly detection techniques are divided into three major areas: statistical anomaly detection, data mining, and machine learning based techniques. Two other research areas, information theory and spectral theory, are included in the anomaly detection classification provided by [5]. Among data mining techniques, a wide variety of algorithms can be applied to anomaly detection, and clustering stands out as the most important unsupervised learning process for finding patterns in unlabeled data [6]. Among the wide variety of applications for anomaly detection, the most common are network traffic monitoring, intrusion detection for cyber security, fault detection in safety-critical systems, insurance, and military surveillance of enemy activities [5]. This work proposes the use of the K-means clustering algorithm [7] combined with network behavior profiles called baseline [8] and Particle Swarm Optimization (PSO) [9] for anomaly detection. This model fits into the data mining based methods and aims to detect volume anomalies. Classified as an unsupervised learning technique, K-means clustering is a classical algorithm, initially developed by J. MacQueen in 1967. Although it is a simple algorithm, it suffers from the inability to escape from local optima, which can be overcome by combining it with the PSO algorithm.
PSO is a highly efficient heuristic technique with low computational complexity and the capability to escape from local optima, developed in 1995 by Kennedy and Eberhart [9]. The baseline consists of different normal behavior profiles for a specific network device or segment, generated by the GBA tool (Automatic Backbone Management) [8] using data collected from Simple Network Management Protocol (SNMP) objects. The proposed anomaly detection system (ADS) combines the K-means and PSO algorithms to calculate the cluster centroids of real traffic collected from an SNMP object and of its respective baseline. Anomaly detection is performed by comparing real traffic with the cluster centroids. Tests were carried out in a real network environment at the State University of Londrina (UEL), Brazil. Numerical results show that the obtained detection and false alarm rates are promising.

This paper is organized as follows. Section 2 presents related work on network anomalies, and the traffic characterization model is detailed in Section 3. Section 4 describes the K-means and swarm optimization aspects. Section 5 details the proposed anomaly detection approach, while Section 6 discusses the test setup and the respective performance results. Finally, the main conclusions and future work are presented in Section 7.

2. RELATED WORK

In recent years, several works such as [2], [10], [11] have been developed in the anomaly detection area. Though using different approaches, they share the goal of maximizing the detection rate while minimizing the false alarm rate. Establishing a model of normal network traffic and increasing the anomaly detection rate while lowering the false alarm rate are still challenging tasks, which keeps anomaly detection an open research area. Xiao et al. [7] proposed a K-means algorithm based on PSO for network anomaly detection.
As a hill-climbing method, if the initial synaptic weights and input patterns of K-means are not set correctly, the method does not converge, or converges to a local optimum. Because of this tendency to converge to a local minimum, Particle Swarm Optimization, which has good global search ability, is combined with K-means to solve the local convergence problem. The KDD CUP 1999 dataset [12] was used to evaluate the proposed method. Experimental results show that the method is effective for partitioning large datasets and is useful for anomaly detection, reaching satisfactory detection rates for different classes of anomalies.

In [13], Liu proposed a modified version of the traditional quantum-behaved particle swarm optimization (QPSO), called MQPSO. This algorithm is employed to train a wavelet neural network (WNN), which is used for network anomaly detection. A multidimensional vector composed of WNN parameters is associated with a particle in the evolutionary learning algorithm. A suitable parameter combination determines the feasibility of the search space to obtain the optimal solution. To validate the proposed approach, the KDDCup99 [12] training dataset was used as the experimental data set. Results showed that the proposed algorithm has better training performance, faster convergence, and a better ability to detect new, unknown types of attack, compared to QPSO.

3. TRAFFIC CHARACTERIZATION: BLGBA AND BASELINE

The first step to detect anomalies is to adopt a model that characterizes the network traffic efficiently, which represents a significant challenge due to the non-stationary nature of network traffic. The traffic behavior of large networks is composed of daily cycles, where traffic levels are usually higher in working hours and are also distinct for workdays and weekends. An efficient traffic characterization model should therefore be able to represent these characteristics faithfully. Thus, in this work the GBA tool is used to generate different profiles of normal behavior for each day of the week, meeting this requirement. These behavior profiles are named Digital Signature of Network Segment (baseline), proposed by Proença in [8] and applied to anomaly detection with good results in [3].

The BLGBA algorithm was developed based on a variation of the calculation of the statistical mode. In order to determine an expected value for a given second of the day, the model analyzes the values for the same second in previous weeks. These values are distributed in frequencies, based on the difference between the greatest ($G_{aj}$) and the smallest ($S_{aj}$) element of the sample, using 5 classes. This difference, divided by five, forms the amplitude $h$ between the classes, $h = (G_{aj} - S_{aj})/5$. Then, the limits $L_{Ck}$ of each class are obtained. They are calculated by $L_{Ck} = S_{aj} + h \cdot k$, where $C_k$ represents the $k$th class ($k = 1 \ldots 5$). The value that is the greatest element inserted in the class with accumulated frequency equal to or greater than 80% is included in the baseline. The samples for the generation of the baseline are collected second by second along the day by the GBA tool.

Two types of baseline are generated: the bl-7, consisting of one baseline for each day of the week, and the bl-3, consisting of one baseline for the workdays, one for Saturday and another one for Sunday. Figure 1 shows a chart containing one day of monitoring of the UEL network. Data were collected from the SNMP object ifInOctets at the University's Web server on 02/08/2010. The monitored traffic is represented in green and the respective baseline values by the blue line. A close fit between the behavior of the real traffic and the baseline can be observed, except from 5 p.m. to 10 p.m., when a volume anomaly occurs.

Figure 1. Test day. Traffic and baseline of 02/08/2010 from the ifInOctets SNMP object, on the main Web server of State University of Londrina.

4. K-MEANS CLUSTERING AND PARTICLE SWARM OPTIMIZATION

K-means is a well-known clustering algorithm created by J. MacQueen. It can be used for unsupervised learning of neural networks, pattern recognition, clustering analysis and more. The algorithm classifies data sets into K groups based on their attributes. The grouping is performed by minimizing the sum of squares of distances between the data and the corresponding cluster centroid. The K-means algorithm lacks a diversity mechanism to escape from local optima. Thus, in order to overcome this drawback while keeping computational complexity under control, which is a concern mainly for high-dimensional problems, the K-means algorithm can be combined with PSO [7], [14].

PSO is an evolutionary computation technique based on swarm intelligence, created by Kennedy and Eberhart in 1995 and inspired by the social behavior of birds [9]. PSO is powerful since it is able to escape from local optima while keeping a simple structure. In PSO, the solutions in the search space are called particles. Each particle has a fitness value, measured by the function to be optimized, and a velocity that drives its flight through the search space. The PSO principle is the movement of a group of particles, randomly distributed in the search space, each one with its own position and velocity. The position of each particle is modified by the application of its velocity in order to reach a better performance [9]. The interaction among particles enters through the calculation of the particle velocity. Hence, at each iteration, the speed and position of all particles from a population of size $M$ are updated.
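The particle movement described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard PSO velocity and position updates with velocity clamping, and searches for a single centroid that minimizes the sum of Euclidean distances to the samples. The function names, swarm size, iteration count, the acceleration coefficients `c1` and `c2`, and the clamping factor `delta` are all assumptions chosen for illustration; only the unit inertia weight follows the text of this paper.

```python
import math
import random


def fitness(candidate, samples):
    """Sum of Euclidean distances from one candidate centroid to every sample."""
    return sum(math.dist(candidate, s) for s in samples)


def pso_centroid(samples, n_particles=20, n_iter=100,
                 w=1.0, c1=2.0, c2=2.0, delta=0.1, seed=0):
    """Search for a single centroid with a basic PSO loop (illustrative only)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    # Maximum velocity per dimension, as a fraction of the search range.
    v_max = [delta * (hi[d] - lo[d]) for d in range(dim)]

    # Particles start at random positions inside the data bounds, at rest.
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, samples) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity component to [-v_max, +v_max].
                vel[i][d] = max(-v_max[d], min(v_max[d], v))
                pos[i][d] += vel[i][d]
            fit = fitness(pos[i], samples)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Calling `pso_centroid(samples)` on a list of equal-length tuples returns the best position found and its fitness. The global best is updated as soon as a particle improves on it, which is one common variant; a synchronous update after each full sweep would be equally valid.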
If better values for the local or global solutions are found, the respective best position candidate-vector is updated, where $\mathbf{p}^{best}_i$ is the best position value obtained so far by each particle in the population of size $M$, and $\mathbf{p}^{best}_g$ is the best position value obtained by all particles so far. The best local and global particles are column vectors with dimension $D$. In the PSO strategy, each candidate-vector at the $t$th iteration, defined as $\mathbf{p}_i[t]$ with dimension $D \times 1$, is used for the velocity calculation of the next iteration as:

$$\mathbf{v}_i[t+1] = \omega \cdot \mathbf{v}_i[t] + \phi_1 \cdot \mathbf{U}_{i1}[t]\,(\mathbf{p}^{best}_i - \mathbf{p}_i[t]) + \phi_2 \cdot \mathbf{U}_{i2}[t]\,(\mathbf{p}^{best}_g - \mathbf{p}_i[t])$$ (1)

where $\omega$ is the inertia weight, adopted as a unitary value in this work for simplicity; $\mathbf{U}_{i1}[t]$ and $\mathbf{U}_{i2}[t]$ are diagonal matrices with dimension $D$ whose elements are random variables with uniform distribution $\sim \mathcal{U} \in [0, 1]$, generated for the $i$th particle at iteration $t = 1, 2, \ldots, G$; $\mathbf{p}^{best}_g$ and $\mathbf{p}^{best}_i$ are the best global position and the best local positions found until the $t$th iteration, respectively; $\phi_1$ and $\phi_2$ are acceleration coefficients regarding the influence of the best particle positions and the best global position on the velocity update, respectively. The $i$th particle's position at iteration $t$ is a clustering candidate-vector $\mathbf{p}_i[t]$ of size $D \times 1$. The position of each particle is updated using the new velocity vector (1) for that particle, according to:

$$\mathbf{p}_i[t+1] = \mathbf{p}_i[t] + \mathbf{v}_i[t+1], \quad i = 1, \ldots, M$$ (2)

The PSO algorithm consists of repeated application of the velocity and position updating equations until a stop criterion is met. The stop criterion can be a fixed number of iterations or can be determined by the lack of improvement in the solution as the algorithm evolves. In order to reduce the likelihood that a particle might leave the search universe, a maximum velocity factor $V_m$ is added to the PSO model (1), which limits the velocity to the range $[\pm V_m]$. The adjustment of velocity allows the particle to move in a continuous but constrained subspace, and is simply accomplished by:

$$v_i[t] = \min\{V_m;\ \max\{-V_m;\ v_i[t]\}\}$$ (3)

From (3) it is clear that if $|v_i[t]|$ exceeds a positive constant value $V_m$ specified by the user, the $i$th particle's velocity is assigned to be $\mathrm{sign}(v_i[t])\,V_m$, i.e. the particle's velocity on each of the $D$ dimensions is clamped to a maximum magnitude $V_m$. If the search space is defined by the bounds $[P_{\min}; P_{\max}]$, then the value of $V_m$ is typically set to $V_m = \delta\,(P_{\max} - P_{\min})$, where $0.1 \le \delta \le 1.0$.

In this work, the objective function to be minimized by PSO is the sum of Euclidean distances of the candidate-vector regarding each data point of the $j$th cluster generated by K-means, given by:

$$J(\mathbf{p}) = \sum_{j=1}^{K} \sum_{n=1}^{N} \sqrt{\left| p_{nj} - c_j \right|^2}$$ (4)

where $K$ is the number of clusters, $N$ is the number of traffic samples and $c_j$ is the $j$th cluster centroid.

5. NETWORK ANOMALY DETECTION MODEL BASED ON SWARM INTELLIGENCE

As seen in Section 3, the baseline is responsible for the normal traffic characterization, using historical SNMP network data. The proposed ADS therefore does not have a pre-processing phase to characterize normal traffic; instead, the baseline is responsible for this task. The objective of the K-means and PSO combination is to enable an efficient calculation of the traffic sample and baseline centroids over high-dimensional data. In this work a fixed value of $K = 1$ is considered, and the value of $c_j$ is calculated every 300 seconds. Then the distance between each traffic sample and the cluster centroid is calculated. If one sample in the 300-second interval exceeds a threshold, the interval is considered anomalous.

The elements of the proposed ADS can be seen in Figure 2. The GBA tool [8] is responsible for the collection of real traffic samples and the generation of the baseline. The PSO-Cls system calculates the cluster centroids of the traffic and baseline. Then, the PSO Alarm system analyzes the distance between cluster centroids and real traffic samples, aiming to find anomalies.

Figure 2. Proposed anomaly detection system model.

The anomaly detection process of the proposed system is divided into two stages:
1. The PSO-Cls system analyzes traffic data collected from SNMP objects and their respective baseline every 300 seconds. Firstly, traffic data and baseline from each 300-second interval are clustered simultaneously. Then, a centroid is calculated for each cluster, which represents the expected behavior for the traffic samples of the cluster. The clustered data and cluster centroids generated in this stage are used in the next step.
2. The PSO Alarm system is responsible for analyzing the results generated by step 1, verifying whether anomalies exist in the analyzed interval. The PSO Alarm system checks how close each traffic sample is to its corresponding cluster centroid. The distance measure adopted in this work is the Euclidean distance, which is the straight-line distance between two points. A sample is considered anomalous if the Euclidean distance between it and its respective cluster centroid exceeds a threshold value called $\rho$. Then, the PSO Alarm system triggers an alarm to notify the network administrator that an anomaly occurred.

There is no unanimity regarding the definition of anomalies in network traffic. The same behavior deviation can be classified differently according to distinct management policies. For one network administrator, even small deviations should be detected, in order to identify every possible undesired usage of network resources. Another network administrator may be interested only in long deviations, which cause users to experience degraded quality of service. Different approaches are found in the literature to define which behaviors are anomalies. Thottan and Ji [15] consider as anomalies only the behavior deviations that result in disruption of operations. Tapiador et al.
[16] showed some events that were not reported in syslogs and did not cause disruption of operations, but should have been detected because they degraded the quality of service provided to end users. Thus, in order to evaluate the proposed ADS in terms of detection and false alarm rates according to different anomaly characteristics, a parameterized definition for volume anomalies was implemented. Two parameters are used to define anomalies: $\alpha$, which is related to the amplitude of the anomaly, and $\beta$, which represents its duration. The polling interval of the traffic monitor is 10 seconds, so one day has 8640 traffic samples. If, during monitoring, a traffic sample exceeds or stays below its baseline by $\alpha$%, an alert interval opens. If within the alert interval there are $\beta$ samples that exceed or stay below their baseline by $\alpha$%, the interval is considered anomalous and must be detected by the proposed ADS.

6. NUMERICAL RESULTS

To validate our ADS, we used a real network environment from the State University of Londrina (UEL). The traffic used in the experiment was monitored during the day 02/08/2010, from the ifInOctets SNMP MIB object of the UEL main Web server. ifInOctets reports the total number of octets received on the interface. The objective of the proposed algorithm is to detect the large difference between the real traffic (reading) and its baseline in the period between 5 p.m. and 10 p.m., as can be seen in Figure 1, which indicates a volume anomaly. As seen in Section 5, for each traffic sample the PSO Alarm system calculates the Euclidean distance between the monitored data sample and its respective cluster centroid, to verify whether the sample is anomalous. Every time the monitored traffic deviates significantly from the baseline, a substantial variation in the Euclidean distance values takes place, which can characterize a volume anomaly according to the parameters specified by the network administrator.
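The per-sample distance check described above can be sketched as follows. This is a hedged illustration rather than the PSO Alarm system itself: the function name `anomalous_intervals`, the parameter name `rho` for the threshold, and the assumption that one centroid per interval is already available (for example from a PSO or K-means step) are introduced here. The default of 30 samples per interval is derived from the 10-second polling and 300-second intervals stated in the text. Samples are treated as scalars, so the Euclidean distance reduces to an absolute difference.

```python
def anomalous_intervals(traffic, centroids, rho, samples_per_interval=30):
    """Flag each interval whose samples stray farther than rho from its centroid.

    traffic   -- flat list of scalar traffic samples, in time order
    centroids -- one centroid per interval (assumed to be computed elsewhere)
    rho       -- distance threshold above which a sample counts as anomalous
    """
    flags = []
    for k, centroid in enumerate(centroids):
        start = k * samples_per_interval
        window = traffic[start:start + samples_per_interval]
        # In one dimension the Euclidean distance is |sample - centroid|.
        flags.append(any(abs(x - centroid) > rho for x in window))
    return flags
```

A single sample beyond the threshold is enough to mark the whole interval, matching the rule stated in Section 5.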
If the distance between a sample and its centroid exceeds the $\rho$ threshold value, the PSO Alarm system triggers an alarm to notify the network administrator. The evaluation of the proposed ADS is based on two performance metrics: the detection rate, which is the detection probability given by (5), and the false alarm rate, which represents the probability of alarms that do not correspond to a significant variation between the real traffic and the baseline, according to (6). The variables used to calculate the detection and false alarm rates are:
∙ anomalies detected: number of anomalies that were correctly detected.
∙ all anomalies: number of anomalies that occurred in the traffic.
∙ false positives: number of alarms that do not correspond to an anomalous situation.
∙ all alarms: number of generated alarms.

$$\text{detection rate} = \frac{\text{anomalies detected}}{\text{all anomalies}}$$ (5)

$$\text{false positive rate} = \frac{\text{false positives}}{\text{all alarms}}$$ (6)

In order to assess the efficiency of the proposed ADS, one class of anomalies, called class 1, was defined. This class concerns long-duration anomalies that exceed or stay below their baseline by 60%. The parameters used to define this class are $\alpha = 60\%$ and $\beta = 25$. To find the $\rho$ value that results in the best detection and false alarm rates for the test day, the detection algorithm was run several times with different values of $\rho$ for anomalies of class 1. Figure 3 describes the performance of the PSO-based ADS in terms of the trade-off between detection and false alarm rates, with $\rho$ varying on the [1 . . . 100] interval for class 1. The best threshold found for class 1 was $\rho = 5.74 \times 10^{6}$. The results confirm that the proposed method is useful for anomaly detection, achieving its best trade-off at a detection rate of 82.92% and a false alarm rate of 2.85% for class 1 over the test day traffic. Figure 4 shows the alarms generated by the proposed ADS for anomalies of class 1 with $\rho = 5.74 \times 10^{6}$ for the test day.
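The two rates, together with the parameterized anomaly definition of Section 5, can be sketched as below. Several points are assumptions made for illustration and are not stated in the paper: the function and parameter names (`alpha`, `beta`, `truth`, `alarms`), the treatment of an alert interval as a fixed window of samples handed to the function, and the one-to-one matching of anomalies and alarms per interval.

```python
def is_anomalous_interval(traffic, baseline, alpha, beta):
    """Ground-truth label for one alert interval.

    True when at least `beta` samples deviate from their baseline value,
    above or below, by more than `alpha` percent.
    """
    deviating = sum(
        1 for x, b in zip(traffic, baseline)
        if abs(x - b) > (alpha / 100.0) * b
    )
    return deviating >= beta


def detection_and_false_alarm_rates(truth, alarms):
    """Compute the two rates from per-interval booleans.

    truth[k]  -- interval k is anomalous by the parameterized definition
    alarms[k] -- the detector raised an alarm for interval k
    """
    all_anomalies = sum(truth)
    all_alarms = sum(alarms)
    detected = sum(1 for t, a in zip(truth, alarms) if t and a)
    false_alarms = sum(1 for t, a in zip(truth, alarms) if a and not t)
    # Guard against empty denominators on days with no anomalies or alarms.
    detection_rate = detected / all_anomalies if all_anomalies else 0.0
    false_alarm_rate = false_alarms / all_alarms if all_alarms else 0.0
    return detection_rate, false_alarm_rate
```

Sweeping the detector threshold and recomputing both rates for each value is one way to trace the kind of trade-off curve reported for the test day.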
In Figure 4, the y-axis represents the Euclidean distances between samples and baseline cluster centroids, the x-axis represents the time at which they occurred, and the red dotted line represents the $\rho$ threshold. One can observe that between 5 p.m. and 10 p.m. there is a wide variation between baseline and traffic, which was correctly detected by the PSO Alarm system. For this day, the detection and false alarm rates reached 82.92% and 2.85%, respectively, for anomalies of class 1, demonstrating that the proposed system achieved excellent results against anomalous traffic.

Figure 3. ROC curve for the test day.

Figure 4. Alarms for the test day.

7. CONCLUSIONS AND FUTURE WORK

This paper presented the K-means algorithm combined with particle swarm optimization for anomaly detection. The experimental results obtained in a real network environment showed that the proposed method is capable of detecting volume anomalies in real network traffic, achieving satisfactory results. The experiments were performed using data monitored from the SNMP object ifInOctets of the main web server of State University of Londrina. The proposed clustering-based anomaly detection algorithm showed robustness against false alarms while maintaining good anomaly detection rates, achieving an 82.92% detection rate with a 2.85% false alarm rate for the test day, as shown in Figure 3. Our ongoing work is centered on expanding the proposed detection model through the simultaneous monitoring of several SNMP objects, in order to correlate them and classify the anomalies. Thus, the proposed approach can be extended to other types of anomalies while improving the detection and false alarm rates.

REFERENCES

[1] A. Lakhina, M. Crovella, and C. Diot, “Mining anomalies using traffic feature distributions,” SIGCOMM Comput. Commun. Rev., vol. 35, no. 4, 2005.
[2] A. Kind, M. P. Stoecklin, and X. Dimitropoulos, “Histogram-based traffic anomaly detection,” in IEEE Transactions on Network Service Management, vol. 6, no. 2, June 2009.
[3] B. B. Zarpelão, L. S. Mendes, M. L. Proença Jr., and J. J. P. C. Rodrigues, “Parameterized anomaly detection system with automatic configuration,” in GC’09 CSS. 2009 IEEE Global Communications Conference (IEEE GLOBECOM 2009), Communications Software and Services Symposium, 2009.
[4] A. Patcha and J. M. Park, “An overview of anomaly detection techniques: Existing solutions and latest technological trends,” Computer Networks: The International Journal of Computer and Telecommunications Networking, 2007.
[5] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys, vol. 41, no. 3, July 2009.
[6] M. Jianliang, S. Haikun, and B. Liang, “The application on intrusion detection based on k-means cluster algorithm,” in International Forum on Information Technology and Applications, 2009.
[7] L. Xiao, Z. Shao, and G. Liu, “K-means algorithm based on particle swarm optimization algorithm for anomaly intrusion detection,” in WCICA 2006. The Sixth World Congress on Intelligent Control and Automation, 2006, pp. 5854–5858.
[8] M. L. Proença Jr., C. Coppelmans, M. Botolli, and L. S. Mendes, Security and reliability in information systems and networks: Baseline to help with network management. Springer, 2006, pp. 149–157.
[9] R. C. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Proceedings of the Sixth International Symposium on Micromachine and Human Science, 1995, pp. 39–43.
[10] Y. ling Zhang, Z. guo Han, and J. xia Ren, “A network anomaly detection method based on relative entropy theory,” in Proceedings of the 2009 Second International Symposium on Electronic Commerce and Security, 2009, pp. 231–235.
[11] V. Sotiris, P. Tse, and M. Pecht, “Anomaly detection through a bayesian support vector machine,” Reliability, IEEE Transactions on, pp. 277–286, June 2010.
[12] The third international knowledge discovery and data mining tools competition data set KDD99-Cup. Available at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[13] L. li Liu and Y. Liu, “MQPSO based on wavelet neural network for network anomaly detection,” in Wireless Communications, Networking and Mobile Computing, 2009. WiCom ’09. 5th International Conference on, 2009.
[14] B. Firouzi, T. Niknam, and M. Nayeripour, “A new evolutionary algorithm for cluster analysis,” in International Journal of Computer Science, 2009.
[15] M. Thottan and C. Ji, “Anomaly detection in IP networks,” IEEE Transactions on Signal Processing, vol. 51, no. 8, pp. 2191–2204, 2004.
[16] J. M. Tapiador, P. G. Teodoro, and J. E. D. Verdejo, “Anomaly detection methods in wired networks: a survey and taxonomy,” Computer Communications, vol. 27, no. 16, pp. 1569–1584, October 2004.