Generalized Discriminant Analysis and Support Vector Machines

P Indira Priyadarsini
Dept. of Computer Science & Engg., Acharya Nagarjuna University, Guntur, A.P., India
indupullagura@gmail.com

I Ramesh Babu
Dept. of Computer Science & Engg., Acharya Nagarjuna University, Guntur, A.P., India
rinampudi@hotmail.com
ABSTRACT

These days incidents of cyber attacks have increased, so developing effective Intrusion Detection Systems (IDSs) is mandatory for defending information system security. In general, existing IDSs use all the features in the network packet to trace out known intrusive patterns, and a few of these features are irrelevant or redundant. Moreover, high-dimensional and non-linear data degrade performance due to the curse of dimensionality. We have used a non-linear dimensionality reduction technique, Generalized Discriminant Analysis (GDA), which finds an optimized transformation that maximizes class separability and avoids the curse of dimensionality problem. Well-organized IDSs make the classification process more effective and efficient. Support Vector Machines (SVMs) are used in this process since they have eminent classifying ability with good generalization power. The purpose of this paper is to select the most important features, which are useful in building a computationally effective IDS. We have successfully introduced an IDS with GDA using SVMs and are able to speed up the process with minimal memory space and CPU utilization. We conducted experiments on the KDD Cup 99 dataset with standard Principal Component Analysis (PCA) using SVMs and compared the performance with GDA.

Key words: Curse of dimensionality, Intrusion Detection System, Generalized Discriminant Analysis (GDA), Support Vector Machines (SVMs), Principal Component Analysis (PCA).

1. INTRODUCTION

With the advent of the Internet and worldwide connectivity, the damage that can be caused by attacks launched over the Internet against remote systems has increased. Intrusion Detection Systems (IDSs) have therefore become an essential module of computer security to detect these attacks. Several methods have been proposed in the development of IDSs, including Decision Trees, Bayesian networks, Artificial Neural Networks, Association Rules and SVMs. Besides these methods, research has investigated the performance of the SVM as the tool for the classification module of the intrusion detection system using various kernels [1],[2]. Now, to enhance the learning capability and reduce the computational cost of SVMs, different dimensionality reduction techniques are applied.

In recent years a large variety of nonlinear dimensionality reduction techniques have been proposed, many of which depend on the evaluation of local properties of the data [3],[4],[5]. Dimensionality reduction is a major criterion in improving the performance of Intrusion Detection Systems; the need for reducing the data is to remove redundant or irrelevant features. In one study, Ravi Kiran et al. [6] demonstrated that Principal Component Analysis (PCA) improved the performance of an Artificial Neural Network (ANN) classifier for intrusion detection. Other related work achieved good results with PCA as the feature reduction technique together with Support Vector Machines in IDS [7],[8],[9]. More recently, Rupali et al. investigated Linear Discriminant Analysis (LDA) in intrusion detection systems and achieved drastically good results [10], and Srilatha et al. [11] showed a comparably good detection rate.

Having examined the many aspects of dimensionality reduction techniques in IDS, we use Generalized Discriminant Analysis (GDA) as the approach for reducing the number of dimensions, and SVMs for classifying the network data, since they are accurate and perform well on large datasets. The goal of this paper is to apply the kernel trick, which offers a modular framework for dimensionality reduction. The remainder of this paper is organized as follows. Section 2 gives an overview of Intrusion Detection Systems and the KDD Cup 99 dataset. Section 3 describes Support Vector Machine classification. Section 4 describes the dimensionality reduction techniques: standard PCA and the proposed GDA algorithm. Section 5 describes the proposed framework for building an efficient IDS. Section 6 gives the metrics for evaluating the results, while the last section presents conclusions and future work.

2. AN OVERVIEW OF INTRUSION DETECTION SYSTEMS (IDS)

2.1. NETWORKING ATTACKS

In 1998 the DARPA intrusion detection evaluation program set up an environment simulating a typical U.S. Air Force LAN to obtain raw TCP/IP dump data for a network, which during that period was blasted with multiple attacks. DARPA'98 comprises 7 weeks of network traffic, processed into about 5 million connection records of about 100 bytes each. The KDD Cup 99 training dataset [12] consists of around 4,900,000 single connection vectors, each of which contains 41 features and is marked as either normal or an attack, with exactly one specific attack type; the testing data contain the last two weeks of traffic with nearly 300,000 connections. It holds new types of attacks that were not contained in the training data.
From the training database a subset of 494,021 records was taken as the standard dataset, which is 10% of the original data. Each class specifies a category of simulated attack, and the attack categories are Denial of Service (DOS), User to Root (U2R), Remote to Local (R2L), and Probing.

Denial of Service (DOS): an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests by sending malicious packets.

User to Root (U2R): an attack in which the attacker attempts to gain access to the root account of the target system, exploiting some vulnerability to obtain root access.

Remote to Local (R2L): an attack in which an attacker who does not have any account on the target machine makes use of some flaw to try to gain access to that machine.

Probing: an attack in which a malicious attacker attempts to gather information about a network of computers for the observable purpose of circumventing its security controls.

The IDS trains and builds a classifier on a training dataset containing TCP/IP data and labels taken from the KDD 99 dataset, and then tests itself by trying to classify a set of test cases into the above groups.

2.2. KDD CUP 99 DATASET DESCRIPTION

The KDD Cup 99 dataset used in the experiments was taken from the Third International Knowledge Discovery and Data Mining Tools Competition. Each connection record is given 41 attributes. The list of attributes contains both continuous and discrete variables, whose statistical distributions differ drastically from each other, which makes detecting intrusions a very exacting task. There are 22 kinds of attacks grouped into the following four classes: DOS, R2L, U2R, and Probe [13]. The dataset has 391,458 DOS attack records, 97,278 normal records, 4,107 Probe attack records, 1,126 R2L attack records and 52 U2R attack records; this is the subset taken from only 10 percent of the original dataset. The data were preprocessed before being used for training and testing the IDS model. The 41 attributes [1] are referred to in this paper, in order, as A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, AA, AB, AC, AD, AE, AF, AG, AH, AI, AJ, AK, AL, AM, AN, AO, with the class label AP.
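For illustration, the 10% subset can be loaded with named columns as in the following sketch. The file name kddcup.data_10_percent and the header names (taken from the kddcup.names file accompanying the dataset [12]) are assumptions of this example, not part of the method above.

    import pandas as pd

    # 41 feature names plus the class label as the 42nd column
    # (assumed to follow the kddcup.names file of [12]).
    kdd_columns = [
        "duration", "protocol_type", "service", "flag", "src_bytes",
        "dst_bytes", "land", "wrong_fragment", "urgent", "hot",
        "num_failed_logins", "logged_in", "num_compromised", "root_shell",
        "su_attempted", "num_root", "num_file_creations", "num_shells",
        "num_access_files", "num_outbound_cmds", "is_host_login",
        "is_guest_login", "count", "srv_count", "serror_rate",
        "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
        "diff_srv_rate", "srv_diff_host_rate", "dst_host_count",
        "dst_host_srv_count", "dst_host_same_srv_rate",
        "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
        "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
        "dst_host_srv_serror_rate", "dst_host_rerror_rate",
        "dst_host_srv_rerror_rate", "label",
    ]

    df = pd.read_csv("kddcup.data_10_percent", header=None, names=kdd_columns)
    print(df.shape)                           # expected: (494021, 42)
    print(df["label"].value_counts().head())  # class distribution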
3. SUPPORT VECTOR MACHINES

Support Vector Machines (SVMs) are machines that perform classification based on support vectors. SVMs were introduced by Vapnik [24],[25] and are built on Statistical Learning Theory (SLT). They are accurate on training samples and have good generalization ability on testing samples, and they can create both linear and non-linear decision margins using an optimization problem [14]. These days SVMs have become a tremendous area of research and a powerful tool in machine learning.

SVMs solve a two-class problem by separating the dataset using a maximal-margin hyperplane defined by a set of support vectors, as shown in Fig. 1. The dataset is separated by a hyperplane positioned so that the margin is maximized, which can be done by extending the marginal lines on both sides; the separating hyperplane is called the Maximal Marginal Hyperplane (MMH). The support vectors are the subset of the training dataset that plays the crucial role in the classification process; hence the method is named Support Vector Machines. If the SVM is not able to separate the data into two classes directly, it maps the input dataset into a high-dimensional space using a kernel function, where it is then able to classify with good accuracy. Several kernel functions are used in SVM classification, such as the linear, polynomial and Gaussian kernels. The SVM algorithm is given below.

[Figure 1: MMH is shown in SVM classification. The plot depicts the two classes (labels y = -1 and y = +1), the support vectors, the hyperplanes w.x + b = -1, w.x + b = 0 (the MMH) and w.x + b = +1, with a margin of width 1/||w|| on each side of the MMH.]

SVM Algorithm:
For classifying a training dataset, we try to estimate a function f: Rn → {+1, -1}. Suppose there are two classes, denoted A and B. Class A is given by x ∈ A, y = 1 and class B by x ∈ B, y = -1, for (xi, yi) ∈ Rn × {+1, -1}. If the training data are linearly separable, then there exists a pair (w, b) ∈ Rn × R such that y(wT x + b) ≥ 1 for all x ∈ A ∪ B, where w is the weight vector and b is the bias.

1. The SVM can be defined as a maximal-margin classifier, in which classifying the dataset is expressed as the optimization problem:
   Min ψ(w) = (1/2)||w||²  subject to  y(wT x + b) ≥ 1.

2. The dual of this problem is to find the Lagrange multipliers λi that maximize
   Max W(λ) = Σi λi - (1/2) Σi Σj λi λj yi yj xiT xj
   subject to λi ≥ 0 for all i and Σi λi yi = 0.
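As an illustration of this maximal-margin formulation, the following minimal sketch fits scikit-learn's SVC with a linear kernel on made-up separable data and reads off w, b, the support vectors and the margin width. The toy points and the large C value (approximating a hard margin) are assumptions of the example.

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable toy classes, labels y in {+1, -1} as in the text.
    X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class +1
                  [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class -1
    y = np.array([1, 1, 1, -1, -1, -1])

    clf = SVC(kernel="linear", C=1e6)  # very large C ~ hard-margin SVM
    clf.fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b)
    print("support vectors:\n", clf.support_vectors_)
    print("margin width 2/||w|| =", 2 / np.linalg.norm(w))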
SVMs were originally designed for binary classification and were later extended to handle multiclass problems, which are solved using the one-versus-rest, one-versus-one and Directed Acyclic Graph methods.

4. DIMENSIONALITY REDUCTION TECHNIQUES

A dimension is a measurement of a certain characteristic of an object. The general purpose of dimensionality reduction is to remove irrelevant and redundant data, to reduce the computational cost, to avoid over-fitting [17], and to improve the quality of the data. Dimensionality reduction is an effective solution to the problem of the "curse of dimensionality": experiments have shown that as the number of dimensions increases linearly, the number of examples required for learning increases exponentially [18]. In practice the terms dimension, feature, variable, feature vector, object and attribute are used interchangeably. Consider any application in which a system processes data (e.g. speech signals, images, or patterns in general) in the form of a collection of vectors; in many situations, removing the irrelevant and redundant dimensions from such data produces successful results [19],[23].

4.1. PRINCIPAL COMPONENT ANALYSIS (PCA)

PCA is the most widely used dimensionality reduction technique; it is also known as the Karhunen-Loeve transform in the signal processing literature. Class label information is not taken into consideration in this method. PCA decreases the number of dimensions needed to classify new data and generates a set of principal components, which are orthonormal eigenvalue/eigenvector pairs [20]. It reduces the dimensionality of the data to the directions in the feature space where the variance is largest; the amount of the total variance captured by a component is proportional to its eigenvalue [21]. PCA can be calculated by the following steps [6].

STEP 1: Get the data.
The 41 features of the preprocessed KDD Cup 99 dataset, which are redundant and correlated, are given to PCA for feature optimization.

STEP 2: Subtract the mean.
The mean across each dimension is computed as
   X̄ = (1/n) Σi Xi   ----- (1)
and is subtracted from every value in that dimension, so the resulting dataset has zero mean.

STEP 3: Compute the covariance matrix.
The covariance between dimensions X and Y is calculated as
   cov(X, Y) = Σi (Xi - X̄)(Yi - Ȳ) / (n - 1)   ----- (2)

STEP 4: Compute the eigenvectors and eigenvalues of the covariance matrix.
Solving the eigenvalue problem of this square matrix yields the eigenvectors and their eigenvalues.

STEP 5: Form a feature vector by selecting components.
The components with the largest eigenvalues are chosen; these are called the principal components.

STEP 6: Derive the new dataset.
After multiplying the selected components with the mean-subtracted old data, we get the new, lower-dimensional data.
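The six steps above can be sketched in a few lines of NumPy on a stand-in data matrix; the random data and the choice of k = 2 retained components are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # stand-in for the 41-feature data

    X_centered = X - X.mean(axis=0)        # Steps 1-2: subtract the mean
    C = np.cov(X_centered, rowvar=False)   # Step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # Step 4: eigenvectors/eigenvalues

    order = np.argsort(eigvals)[::-1]      # Step 5: rank by eigenvalue and
    k = 2                                  # keep the top-k components
    W = eigvecs[:, order[:k]]

    X_reduced = X_centered @ W             # Step 6: project onto components
    print(X_reduced.shape)                 # (100, 2)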
4.2. GENERALIZED DISCRIMINANT ANALYSIS (GDA)

GDA is also called Kernel Fisher Discriminant Analysis (KFD) or Kernel Discriminant Analysis (KDA). It is a kernelized version of Linear Discriminant Analysis (LDA): an extension of standard LDA from the linear domain to a nonlinear domain via the kernel trick, and it is named after Fisher. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be applied successfully. GDA extends KFD to multiple classes. It was proposed independently by Baudat et al. [22], who require the kernel matrix K to be non-singular, with applications on low-dimensional feature spaces.

Suppose X is an n-dimensional sample set with N elements, let Xc denote the subset of X belonging to class c, and let C be the number of classes. X is mapped by a non-linear mapping function ɸ into a feature space Z, i.e. ɸ: X → Z. The between-class scatter matrix and the within-class scatter matrix of the nonlinearly mapped data are given by [26]:

   Bɸ = Σc Mc mɸc (mɸc)T   ----- (3)

   Wɸ = Σc Σ{x ∈ Xc} ɸ(x) ɸ(x)T   ----- (4)

where mɸc is the mean of class Xc in Z and Mc is the number of instances in Xc.

The main purpose of GDA is to find the projection matrix Uɸ that maximizes the ratio

   Uɸopt = arg max |(Uɸ)T Bɸ Uɸ| / |(Uɸ)T Wɸ Uɸ| = [uɸ1, uɸ2, ..., uɸN]   ----- (5)

where the vectors uɸ can be taken as solutions of the generalized eigenvalue problem

   Bɸ uɸi = λi Wɸ uɸi   ----- (6)

The training samples are first adjusted to the center (zero mean, unit variance) in the feature space Z. From the theory of reproducing kernels, any solution uɸ ∈ Z must lie in the span of all training samples in Z, so

   uɸ = Σc Σi αci ɸ(xci)   ----- (7)

where the αci are real weights and xci is the i-th sample of class c. The solution is obtained by solving

   λ = (αT K D K α) / (αT K K α)   ----- (8)

where K is the kernel matrix with entries Ki,j = k(xi, xj), arranged by class blocks as

   K = (Kkl), k = 1, ..., C, l = 1, ..., C   ----- (9)

and the matrix D is given by

   Di,j = 1/mk  if xi and xj both belong to the k-th class,
        = 0    otherwise   ----- (10)

Solving this eigenvalue problem yields the coefficient vectors α that define the projection vectors uɸ ∈ Z. The projection of a test vector xtest is computed as

   (uɸ)T ɸ(xtest) = Σc Σi αci k(xci, xtest)   ----- (11)

GDA Algorithm:
Input: training dataset X with class labels; input data point z.
Output: low-dimensional dataset.
1. Compute the matrices K and D using equations (9) and (10).
2. Decompose K using eigenvector decomposition.
3. Compute the eigenvectors α and eigenvalues of problem (5), via equation (8).
4. Compute the eigenvectors uɸ from the αci using equation (7) and normalize them using (8).
5. Compute the projections of the test points onto the eigenvectors uɸ using equation (11).
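As a rough sketch of these steps, the following NumPy/SciPy fragment solves the generalized eigenproblem behind Eq. (8), K D K α = λ K K α, on synthetic two-class data. The RBF kernel width, the centering of K, and the small ridge term added for numerical stability are assumptions of this sketch, not part of the algorithm above.

    import numpy as np
    from scipy.linalg import eigh
    from scipy.spatial.distance import cdist

    def rbf_kernel(A, B, gamma=0.5):
        # k(x, y) = exp(-gamma * ||x - y||^2)
        return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
    y = np.repeat([0, 1], 30)

    N = len(X)
    K = rbf_kernel(X, X)
    J = np.eye(N) - np.ones((N, N)) / N   # center K in feature space
    Kc = J @ K @ J                        # (the text's zero-mean adjustment)

    # D from Eq. (10): D_ij = 1/m_k when x_i, x_j share class k, else 0.
    D = np.zeros((N, N))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        D[np.ix_(idx, idx)] = 1.0 / len(idx)

    reg = 1e-6 * np.eye(N)                # assumed ridge term for stability
    lam, alpha = eigh(Kc @ D @ Kc, Kc @ Kc + reg)
    # keep the top C-1 discriminant directions (largest eigenvalues last)
    alpha = alpha[:, ::-1][:, : len(np.unique(y)) - 1]

    Z = Kc @ alpha                        # Eq. (11) for the training points
    print(Z.shape)                        # (60, 1) for two classes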
5. PROPOSED FRAMEWORK

Because of the large dataset with 41 input features, the classification process, including training and testing, becomes slow. Irrelevant and redundant features can lead to slow training and testing, greater resource consumption, and a poor detection rate, so they are omitted. To build a computationally capable intrusion detection system it is necessary to identify the important input features. In this paper a statistical dimensionality reduction technique is introduced to identify the most significant attributes, which speeds up the learning process, and SVM classification is applied to obtain accurate results. In the experiments a standard data reduction scheme, Principal Component Analysis, was also performed, and the proposed approach is compared with PCA. Due to the large differences in the attack patterns of the various attack classes, there is generally an overlap between some of these classes in the feature space; in this situation a feature transformation mechanism that maximizes the between-class scatter while minimizing the within-class scatter, as given in the previous section, is used. The proposed framework is shown in Fig. 2.

A. DATASET SELECTION
The KDD Cup 99 dataset of 494,020 records contains 97,277, 391,458, 4,107, 1,126 and 52 instances of Normal, DOS, Probe, R2L and U2R respectively. The dataset cannot be processed in its raw format, so pre-processing techniques must be applied. Pre-processing a dataset is therefore a crucial task in classification, and it also makes a difference in accuracy.

B. PRE-PROCESSING STEP
Although the pre-processing step is often neglected, it is an important step in the data mining process.
1) The dataset contains no missing values, which is one advantage. The class attribute in the dataset contains the normal label and all categories of attacks without differentiation, so they are assigned category names: the target attribute is converted based on the four attack categories. For example, back, land, neptune, pod, smurf and teardrop belong to the DOS category, so instances with these labels are converted to DOS in the class label. The remaining attack categories are named similarly.
2) The symbolic attributes are then converted to numeric values; in particular, the attributes B, C and D are converted. For example, attribute B is protocol_type, which has the values icmp, tcp and udp; these are assigned 1 for icmp, 2 for tcp and 3 for udp. The other symbolic attributes are converted likewise.
3) Since the KDD Cup 99 dataset is retrieved without column headers, headers are added specifying duration, protocol_type, src_bytes, dst_bytes, etc.
4) Sampling is then applied to the dataset. The 10% KDD Cup 99 dataset consists mostly of normal and DOS instances, with only a very minor part belonging to the remaining attacks, so sampling is applied to the normal and DOS instances while all instances of the remaining attacks are retained. From the 97,277 normal instances, 32,790 are taken; from the 391,458 DOS instances, 32,790 are taken; and all 4,107 Probe, 1,126 R2L and 52 U2R instances are taken into the final dataset. After pre-processing, the final dataset contains 70,865 instances, and the experiments are performed on 5 classes. Two-thirds of this final dataset is taken as the training set and one-third as the test set, containing 46,771 and 24,094 instances respectively.

[Figure 2: Building the proposed system framework. Flow: collection of the KDD Cup dataset → pre-processing step (final dataset with 41 attributes, 70,865 instances) → dimension reduction (GDA) step (reduced to 12 attributes, 70,865 instances) → classification using SVM → testing the model on 24,094 test instances → metrics used to analyze the results.]
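A minimal sketch of pre-processing steps 1) and 2) on a toy dataframe follows; the DOS name list and the icmp/tcp/udp codes come from the text, while the to_category helper and the toy rows are illustrative only (Probe, R2L and U2R would be handled by the same pattern).

    import pandas as pd

    dos_attacks = {"back", "land", "neptune", "pod", "smurf", "teardrop"}

    def to_category(label):
        name = label.rstrip(".")        # raw KDD labels end with a dot
        if name == "normal":
            return "Normal"
        if name in dos_attacks:
            return "DOS"
        return "Other"                  # Probe/R2L/U2R handled analogously

    df = pd.DataFrame({"protocol_type": ["icmp", "tcp", "udp", "tcp"],
                       "label": ["smurf.", "normal.", "back.", "normal."]})
    df["class"] = df["label"].map(to_category)
    df["protocol_type"] = df["protocol_type"].map({"icmp": 1, "tcp": 2, "udp": 3})
    print(df)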
C. DIMENSION REDUCTION
Dimension reduction is done using PCA and GDA. Since the final dataset contains 41 attributes and 70,865 instances, the number of attributes is reduced. Both experiments were run on a Windows platform with a 3.40 GHz CPU and 2.0 GB of RAM, using Java 1.6 and Weka 3.6.9.

D. CLASSIFICATION USING SVM
After the dataset is reduced, the model is constructed using Support Vector Machines. The kernel used is the Radial Basis Function (RBF) with parameters C = 10 and γ = 0.01. The reason for choosing the RBF kernel is that it has fewer numerical difficulties, since its kernel values lie between zero and one, whereas polynomial kernel values may go to infinity or zero when the degree is large. The final step consists of running the SVM classification algorithm on the lower-dimensional training dataset; the test dataset is then classified, and the results obtained are shown in Section 6.
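For illustration, this classification stage can be sketched with scikit-learn's SVC using the stated RBF parameters; the random arrays below merely stand in for the GDA-reduced 12-attribute training and test sets, so the snippet runs standalone.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 12))     # stand-in for reduced train data
    y_train = rng.integers(0, 5, 200)        # 5 classes: Normal/DOS/Probe/R2L/U2R
    X_test = rng.normal(size=(50, 12))

    clf = SVC(kernel="rbf", C=10, gamma=0.01)  # parameters from the text
    clf.fit(X_train, y_train)                  # multiclass handled one-vs-one
    pred = clf.predict(X_test)
    print(pred[:10])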
EXPERIMENT 1:
The first experiment uses PCA: the final dataset is reduced with Principal Component Analysis, which selects 19 of the 41 attributes, namely A, B, C, D, E, F, J, M, P, U, V, W, X, AA, AD, AE, AF, AH, AK. Training is then done on 46,771 instances using SVMs. The time taken to build the model using 19 attributes is 2350.89 s. On the test dataset, 20,802 instances are correctly classified and the remaining 3,294 are incorrectly classified.

EXPERIMENT 2:
The proposed approach, Generalized Discriminant Analysis, is then applied to the final dataset; it selects 12 of the 41 attributes, namely C, E, F, L, W, X, Y, AB, AE, AF, AG, AI. Training is again done on 46,771 instances using SVMs. The time taken to build the model using 12 attributes is 1521.82 s. Of the 24,094 test instances, 22,648 are correctly classified and the rest are incorrectly classified.

6. EVALUATION AND RESULTS

6.1. METRICS USED FOR EVALUATION
We have used three metrics to analyze the results: the confusion matrix, the Detection Rate (DR) and the False Alarm Rate (FAR).

Confusion matrix: the confusion matrix records the actual and predicted classifications made by a classifier for each class; for the proposed system it contains 5 classes, namely Normal, DOS, Probe, R2L and U2R. A two-class confusion matrix contains True Positive (TP), False Negative (FN), False Positive (FP) and True Negative (TN) counts, as shown in Table I, and the performance of the system is computed from the data in this matrix. Rows represent actual categories, while columns represent predicted categories. Table II shows, as an illustration, the confusion matrix for the Normal class derived from Table III below.

TABLE I: CONFUSION MATRIX

                       Predicted
   Actual       Normal       Attack
   Normal       TP           FN
   Attack       FP           TN

TABLE II: CONFUSION MATRIX FOR NORMAL, DERIVED FROM TABLE III

                       Predicted
   Actual       Normal       Attack
   Normal       10473        668
   Attack       2323         10630

For the Normal class of Table III:
TP = actual normal instances correctly predicted as normal = 10473.
FN = actual normal instances incorrectly predicted as attacks = 425 + 94 + 144 + 5 = 668.
FP = actual attacks incorrectly labeled as normal = 1876 + 214 + 224 + 9 = 2323.
TN = all remaining attacks correctly classified as non-normal = 9293 + 4 + 178 + 976 + 10 + 58 + 44 + 57 + 2 + 3 + 2 + 3 = 10630.

Detection Rate (DR): the proportion of instances of a class that are correctly detected as that class:
   DR = TP / (TP + FN) × 100%.

False Alarm Rate (FAR): the proportion of instances of the other classes that are falsely detected as the class under consideration:
   FAR = FP / (FP + TN) × 100%.
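These per-class metrics can be computed directly from a multi-class confusion matrix by collapsing it to class-versus-rest, as in the following sketch (the dr_far helper is a hypothetical name introduced here); it reproduces the Normal-class values derived above.

    import numpy as np

    def dr_far(cm, k):
        # DR = TP/(TP+FN), FAR = FP/(FP+TN) for class k of confusion matrix cm
        tp = cm[k, k]
        fn = cm[k].sum() - tp        # class-k rows predicted as other classes
        fp = cm[:, k].sum() - tp     # other classes predicted as class k
        tn = cm.sum() - tp - fn - fp
        return 100 * tp / (tp + fn), 100 * fp / (fp + tn)

    # Table III (SVM with PCA); rows/columns: Normal, DOS, Probe, R2L, U2R
    cm = np.array([[10473,  425,  94, 144, 5],
                   [ 1876, 9293,   4,   0, 0],
                   [  214,  178, 976,  10, 0],
                   [  224,   58,  44,  57, 0],
                   [    9,    2,   3,   2, 3]])
    dr, far = dr_far(cm, 0)
    print(f"Normal: DR = {dr:.1f}%, FAR = {far:.2f}%")  # ~94.0%, ~17.9%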
TABLE III: CONFUSION MATRIX OBTAINED BY THE SVM CLASSIFIER WITH PCA

                             Predicted
   Actual     Normal    DOS      Probe    R2L     U2R     %
   Normal     10473     425      94       144     5       94
   DOS        1876      9293     4        0       0       83
   Probe      214       178      976      10      0       71
   R2L        224       58       44       57      0       15
   U2R        9         2        3        2       3       16
   %          82        93       87       27      38

Time taken to build the model: 2350.89 s.

TABLE IV: CONFUSION MATRIX OBTAINED BY THE SVM CLASSIFIER WITH GDA

                             Predicted
   Actual     Normal    DOS      Probe    R2L     U2R     %
   Normal     11118     23       0        0       0       98
   DOS        998       10149    26       0       0       91
   Probe      10        4        1279     85      0       93
   R2L        90        120      75       98      0       26
   U2R        10        2        2        1       4       21
   %          91        98       93       53      0

Time taken to build the model: 1521.82 s.

TABLE V: COMPARISON OF BOTH MODELS

              SVM classifier with PCA      SVM classifier with GDA
              DR (%)       FAR (%)         DR (%)       FAR (%)
   Normal     94           17.92           99.7         8.5
   DOS        83.17        5.13            90.8         1.06
   Probe      70.8         0.63            92.8         0.45
   R2L        14.8         0.61            25.5         0.358
   U2R        15.7         0.02            21.0         0

7. CONCLUSIONS AND FUTURE WORK

In this paper GDA is used to select the important features for classifying the KDD Cup 99 dataset. Compared to PCA, which finds the (principal) directions that best represent the original data, GDA obtains directions that are efficient for discrimination. Among the several SVM kernels available, we used the RBF kernel because it shows better performance, particularly in IDS. The experimental results show that our system is able to speed up the training and testing process of Intrusion Detection Systems. Even though both GDA and SVMs are slightly bewildering, with many mathematical equations, they are good at solving complex problems. As future work, we will extend our SVM system to build more proficient intrusion detection systems based on several non-linear dimensionality reduction techniques.
REFERENCES
[1] Mukkamala S., Janoski G., Sung A. H., "Intrusion Detection Using Neural Networks and Support Vector Machines", Proceedings of the IEEE International Joint Conference on Neural Networks, 2002, pp. 1702-1707.
[2] Wun-Hwa Chen, Sheng-Hsun Hsu, "Application of SVM and ANN for intrusion detection", Computers & Operations Research, Elsevier, 2005.
[3] C. J. C. Burges, "Geometric Methods for Feature Selection and Dimensional Reduction: A Guided Tour", in Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005.
[4] B. Schölkopf, A. J. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem", Neural Computation, 10(5):1299-1319, 1998.
[5] J. Wang, Z. Zhang, and H. Zha, "Adaptive manifold learning", in Advances in Neural Information Processing Systems, vol. 17, pp. 1473-1480, The MIT Press, Cambridge, MA, USA, 2005.
[6] Ravi Kiran Varma, V. Valli Kumari, "Feature Optimization and Performance Improvement of a Multiclass Intrusion Detection System using PCA and ANN", International Journal of Computer Applications (0975-8887), Vol. 44, No. 13, April 2012.
[7] Hansheng Lei, Venu Govindaraju, "Speeding Up Multi-class SVM Evaluation by PCA and Feature Selection", The 5th SIAM International Conference on Data Mining Workshop, California, USA, 2005.
[8] Gopi K. Kuchimanchi, Vir V. Phoha, Kiran S. Balagani, Shekhar R. Gaddam, "Dimension Reduction Using Feature Extraction Methods for Real-time Misuse Detection Systems", Proceedings of the 2004 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, June 2004.
[9] Heba F. Eid, Ashraf Darwish, Aboul Ella Hassanien, and Ajith Abraham, "Principle Components Analysis and Support Vector Machine based Intrusion Detection System", ISDA 2010, pp. 363-367.
[10] Rupali Datti, Bhupendra Verma, "Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis", (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 04, 2010, pp. 1072-1078.
[11] Srilatha Chebrolu, Ajith Abraham, Johnson P. Thomas, "Feature deduction and ensemble design of intrusion detection systems", Computers & Security, Elsevier, 24 (2005), pp. 295-307.
[12] KDD Cup 1999, October 2007, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[13] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set", Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defence Applications (CISDA 2009).
[14] P Indira Priyadarsini, Nagaraju Devarakonda, I Ramesh Babu, "A Chock-Full Survey on Support Vector Machines", International Journal of Computer Science and Software Engineering, Vol. 3, Issue 10, 2013.
[15] Mukkamala S., Janoski G., Sung A. H., "Comparison of Neural Networks and Support Vector Machines in Intrusion Detection", Workshop on Statistical and Machine Learning Techniques in Computer Intrusion Detection, June 11-13, 2002.
[16] Vapnik V., "The Nature of Statistical Learning Theory", Springer-Verlag, New York, 1995.
[17] Andrew Y. Ng, "Preventing overfitting of cross-validation data", in Proceedings of the Fourteenth International Conference on Machine Learning, pp. 245-253, 1997.
[18] R. Bellman, "Adaptive Control Processes: A Guided Tour", Princeton University Press, Princeton, 1961.
[19] U. M. Fayyad and R. Uthurusamy, "Evolving data mining into solutions for insights", Communications of the ACM, 45(8):28-31, August 2002.
[20] I. T. Jolliffe, "Principal Component Analysis", Springer-Verlag, New York, 2002.
[21] E. E. Cureton and R. B. D'Agostino, "Factor Analysis: An Applied Approach", London: Lawrence Erlbaum Associates, vol. I, 1983.
[22] G. Baudat and F. Anouar, "Generalized Discriminant Analysis Using a Kernel Approach", Neural Computation, 2000.
[23] Kai-mei Zheng, Xu Qian, Na An, "Supervised Non-Linear Dimensionality Reduction Techniques for Classification in Intrusion Detection", International Conference on Artificial Intelligence and Computational Intelligence, 2010.
[24] Boser, Guyon, and Vapnik, "A training algorithm for optimal margin classifiers", Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, 1992.
[25] Cortes C., Vapnik V., "Support-vector networks", Machine Learning, 20: pp. 273-297, 1995.
[26] Sebastian Mika, Gunnar Rätsch, Jason Weston, Bernhard Schölkopf, and Klaus-Robert Müller, "Fisher discriminant analysis with kernels", IEEE, 1999.