Integrated Intrusion Detection Using SCT: Selvakani Kandeeban and R. S. Rajesh
optimization framework of their work is based on the wrapper model [12].

One IDS tool that uses GAs to detect intrusions, and is available to the public, is the Genetic Algorithm as an Alternative Tool for Security Audit Trails Analysis (GASSATA). GASSATA finds, among all possible sets of known attacks, the subset of attacks most likely to have occurred in a set of audit data [11]. Since there can be many possible attack types and finding the optimal subset is very expensive to compute, GAs are used to search efficiently. The population to be evolved consists of vectors with a bit set for each attack that is comprised in the data set. Crossover and mutation converge the population to the most probable attacks.

A second tool that is implemented and undergoing more advanced enhancement is the Network Exploitation Detection Analyst Assistant (NEDAA), developed by the Applied Research Laboratories of the University of Texas at Austin [9]. It uses different machine learning techniques, such as a finite state machine, a decision tree, and a GA, to generate artificial intelligence (AI) rules for IDS. One network connection and its related behavior can be translated into a rule that judges whether or not a real-time connection is an intrusion. These rules can be modeled as chromosomes inside the population, which evolves until the evaluation criteria are met. The generated rule set can be used as knowledge inside the IDS for judging whether network connections and behaviors are potential intrusions.

The extraction of knowledge in the form of rules has been successfully explored before on RBF networks using the hidden unit Rule Extraction (hREx) algorithm [7]. This work inspired the authors to develop knowledge synthesis, or knowledge insertion, by manipulating the RBF network parameters, although the information flow/extraction was in the opposite direction.

3. Methodology

As indicated in the introduction, the basic objective of this work is to determine the contribution of the 41 features in the KDD 99 intrusion detection data sets to attack detection.

3.1 Feature Selection

Formally, the mutual information of two discrete random variables X and Y can be defined as

I(X;Y) = Σ_{y∈Y} Σ_{x∈X} p(x,y) log [ p(x,y) / (p(x) p(y)) ]    (1)

where p(x,y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y respectively.

Intuitively, mutual information measures the information about X that is shared by Y: it measures how much knowing one of these variables reduces our uncertainty about the other. If X and Y are independent, then X contains no information about Y and vice versa, so their mutual information is zero: knowing X does not give any information about Y (and vice versa). If X and Y are identical, then all information conveyed by X is shared with Y: knowing X provides all necessary information about Y and vice versa, so the mutual information is the same as the uncertainty contained in Y (or X) alone, namely the entropy of Y (or X; clearly, if X and Y are identical they have equal entropy). In a specific sense [2], mutual information quantifies the distance between the joint distribution of X and Y and the product of their marginal distributions.

Decision Independent Correlation (DIC) is defined as the ratio between the mutual information and the uncertainty of the feature:

DIC_{Xj}(Xi, Xj) = I(Xi; Xj) / H(Xj),    (2)

DIC_{Xi}(Xi, Xj) = I(Xi; Xj) / H(Xi).    (3)

The Correlation Measure quantifies the information redundancy between Xi and Xj with respect to the decision Y:

Q_Y(Xi, Xj) = [ I(Y; Xi) + I(Y; Xj) − I(Y; Xi, Xj) ] / H(Y)    (4)

The ranked list of features is obtained by using a simple forward-selection hill-climbing search, starting with an empty set, evaluating each feature individually, and forcing the search to continue to the far side of the search space.

It has been shown that dependency or correlation measures quantify the accuracy of a decision to predict the value of one variable. However, the symmetrical uncertainty measure is not accurate enough to quantify the dependency among features with respect to a given decision. A critical point that has been neglected is that the correlation or redundancy between features is strongly related to the decision variable. The feature subset is obtained as follows:

1. Generate feature set R from the ranked list of features.
2. For each feature and each type of attack, calculate the mutual information between the feature Xi and the decision Y, I(Y;Xi).
3. Update the relevant feature set R by comparing the mutual information I(Y;Xi):
   if I(Y;Xi) ≥ δx then R ← R + {Xi},
   where δx is a user-defined threshold.
4. Create working set W by copying R.
5. Set goal set G = null.
6. While e(G) < δ2 do
   if W = null then break
   choose Xk ∈ W subject to
   (i) mutual information: I(Y;Xk) ≥ I(Y;Xl) for all l ≠ k, Xl ∈ W
   (ii) correlation measure: Q_Y(Xk, Xn) ≤ Q_Y(Xm, Xn) for all m ≠ k, Xn ∈ G
   W ← W − {Xk}
   G ← G + {Xk}
   End Loop
7. Obtain the final feature subset as the intersection of the subsets of all the attacks.

3.2 GA Rules Formation

By analyzing the dataset, rules are generated into the rule set. These rules are in an 'if-then' format as follows:

if {condition} then {act}
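The measures in Eqs. (1)-(4) and the selection loop of steps 1-7 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are ours, features are assumed to be discrete-valued, criteria (i) and (ii) are folded into a single relevance-minus-redundancy score, and the stopping rule e(G) < δ2 is replaced by a fixed subset size k.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Empirical entropy H(X) in bits."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """Empirical mutual information I(X;Y), Eq. (1)."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in Counter(zip(xs, ys)).items())

def corr_measure(y, xi, xj):
    """Decision-dependent correlation Q_Y(Xi, Xj), Eq. (4):
    redundancy of Xi and Xj with respect to the decision Y."""
    xij = list(zip(xi, xj))  # treat (Xi, Xj) as one joint variable
    return (mutual_info(y, xi) + mutual_info(y, xj)
            - mutual_info(y, xij)) / entropy(y)

def select_features(y, feats, delta_x, k):
    """Steps 1-7, simplified: keep features with I(Y;Xi) >= delta_x,
    then greedily move the most relevant / least redundant feature
    from the working set W into the goal set G."""
    relevant = {f: col for f, col in feats.items()
                if mutual_info(y, col) >= delta_x}
    work, goal = dict(relevant), {}
    while work and len(goal) < k:  # stand-in for e(G) < delta_2
        def score(f):
            # (ii): penalize redundancy against features already in G
            penalty = max((corr_measure(y, work[f], g) for g in goal.values()),
                          default=0.0)
            # (i): reward mutual information with the decision
            return mutual_info(y, work[f]) - penalty
        best = max(work, key=score)
        goal[best] = work.pop(best)
    return goal
```

On a toy decision y, a feature identical to y has I(Y;X) = H(Y) while an independent feature has I(Y;X) = 0, so only the former survives the threshold and enters the goal set.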
130 (IJCNS) International Journal of Computer and Network Security,
Vol. 1, No. 2, November 2009
The condition in this format refers to the attributes in the rule set that form a network connection in the dataset, as selected in the feature-selection phase. The condition evaluates to 'true' or 'false', and the attack name is asserted only if the condition is true. The act field refers to an action taken once the condition is true, such as reporting an alert to the system administrator. For example, a rule in the rule set can be defined as follows:

if number of "hot" indicators <= 0.0
   and number of connections to the same host as the current connection in the past two seconds <= 500.82
   and % of connections that have "REJ" errors > 0.01 and <= 0.21
   and number of connections to the host > 41.2 and <= 112.3
then SMURF

In this genetic algorithm, each chromosome represents one learning rule. To evaluate a chromosome, an appropriately sized network was configured for each of the 20 tasks. An individual of each population consists of genes, where each gene represents a certain feature and its value represents the value of that feature. Each GA is trained for 300 generations, and in each generation the 100 worst-performing individuals are replaced with newly generated ones. The same process was repeated for ten epochs and the results were analyzed.

The second part of the GA is the fitness function. The fitness function F determines whether a rule is 'good', i.e. it detects intrusions, or 'bad', i.e. it does not. F is calculated for each rule from the following equations:

support = |A and B| / N
confidence = |A and B| / |A|
fitness = t1 × support + t2 × confidence

where N is the total number of records, A stands for the number of network connections matching the condition part of the rule, 'A and B' for the number matching both the condition and the predicted attack, and t1 and t2 are weighting coefficients.

Such a rule base can also interact with the RBF-based system in some loosely or tightly coupled protocol. However, by converting the fuzzy rules into the RBF architecture, they can be subjected to further analysis by rule extraction, and hybrid system integration issues are avoided.

4. Experiments and Results

We have used WEKA (Waikato Environment for Knowledge Analysis), an open-source machine learning framework written at the University of Waikato, New Zealand. The algorithms can either be applied directly to a data set or called from our own Java code. The input data for WEKA classifiers is represented in ARFF (Attribute-Relation File Format), consisting of the list of all instances with the values of each instance separated by commas. As a result of data set training and testing, a confusion matrix is generated showing, for each class, the number of instances assigned to it.

Experiments were conducted to verify the performance of the intrusion detection approach based on the above discussion. All the experimental data comes from the corrected data set of KDD Cup 1999. Important features were identified based on the correlation measure and information gain. There were 21 types of intrusions in the test set but only 7 of them were present in the training set; the selected data therefore also challenged the ability to detect unknown intrusions.

The main concern with feature reduction is false alarms and missed intrusion detections. In this work, we attempted to reduce the features that may be effectively used for intrusion detection without compromising security. We have especially focused on statistical techniques to test individual significance and mutual significance.

In this KDD data set each sample is unique, with 34 numerical features and 7 symbolic features.

[Table 1: Information Gain Measures (columns: attack type, feature, information gain)]
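The support/confidence fitness above can be sketched as follows, a minimal illustration under our own assumptions: a connection record is a dict carrying a 'label' field, the rule condition is a predicate function, and t1, t2 are the weights from the fitness equation.

```python
def rule_fitness(records, condition, attack, t1=0.5, t2=0.5):
    """Fitness = t1*support + t2*confidence for one 'if condition then attack'
    rule: A = records matching the condition, and 'A and B' = records that
    match the condition and also carry the predicted attack label."""
    n = len(records)
    a = [r for r in records if condition(r)]
    a_and_b = [r for r in a if r["label"] == attack]
    support = len(a_and_b) / n
    confidence = len(a_and_b) / len(a) if a else 0.0
    return t1 * support + t2 * confidence

# Hypothetical records and a condition loosely resembling the SMURF rule above:
records = [
    {"rej_error_rate": 0.15, "label": "smurf"},
    {"rej_error_rate": 0.18, "label": "smurf"},
    {"rej_error_rate": 0.00, "label": "normal"},
    {"rej_error_rate": 0.02, "label": "normal"},
]
condition = lambda r: r["rej_error_rate"] > 0.1
fitness = rule_fitness(records, condition, "smurf")
```

Here the condition matches exactly the two smurf records, giving support 0.5, confidence 1.0, and fitness 0.75; in the GA this score decides which rule chromosomes survive into the next generation.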
In preprocessing, continuous features with small integer ranges, such as num_compromised [0, 255], were scaled linearly to the range [0.0, 1.0]. Two features span a very large integer range, namely src_bytes [0, 693375640] and dst_bytes [0, 5203179]; these were scaled logarithmically to the ranges [0.0, 20.4] and [0.0, 15.5] respectively. Boolean features, having values 0 or 1, were left unchanged.

It should be noted that the test data is not from the same probability distribution as the training data. Moreover, the test data includes novel attack types that do not appear in the training data.

In the second stage, rules are formed from the reduced feature subset using the genetic algorithm on the KDD data set and tested on the KDD training set to observe their performance with respect to detection rate, false alarm rate and missed alarm rate. The only drawback here is that the rules are biased towards the training data set. The genetic algorithm in the proposed design evaluates the rules and discards the bad ones while generating more rules, to reduce the false alarm rate and increase intrusion detection. The GA thus continues to detect intrusions and produce new rules, storing the good rules and discarding the bad ones.

dst_host_srv_count <= 227
|  num_failed_logins <= 0
|  |  rerror_rate <= 0
|  |  |  num_access_files <= 0
|  |  |  |  protocol_type = tcp
|  |  |  |  |  dst_host_same_srv_rate <= 0.11
|  |  |  |  |  |  dst_host_serror_rate <= 0.01
|  |  |  |  |  |  dst_host_serror_rate > 0.01: warezmaster
|  |  |  |  |  dst_host_same_srv_rate > 0.11
|  |  |  |  |  |  is_host_login <= 0: warezmaster
|  |  |  |  |  |  is_host_login > 0: multihop
|  |  |  |  protocol_type = udp: multihop
|  |  |  |  protocol_type = icmp: multihop
|  |  |  num_access_files > 0: ftp_write
|  |  rerror_rate > 0: imap
|  num_failed_logins > 0: guess_passwd
dst_host_srv_count > 227: guess_passwd

The summary of the results after RBF training is as follows:

Correctly Classified Instances      916    99.7821 %
Incorrectly Classified Instances      2     0.2179 %
Kappa statistic                  0.9959
Mean absolute error              0.0008
Root mean squared error          0.0269
Relative absolute error          0.4262 %
Root relative squared error      9.0025 %
Total Number of Instances        918

The confusion matrix shows the accuracy of the original RBF network compared with the synthesized RBF. The numbers represent test cases; those lying on the diagonal have been classified accurately, while those off the diagonal have been misclassified. The original network has an accuracy of 51.3 % on the high-speed data, but when it is modified by inserting domain rules that are characteristic of the nature of high-speed data, the accuracy goes up to 95.0 %.

We have performed experiments using the 50, 75, and 100 best-performing rules for detecting attacks. From Table 3 we believe that our system outperforms the best-performing model reported in the literature. Moreover, our previous statement about reducing the false positive rate when deploying additional rule systems for detecting DoS attacks and normal connections is confirmed, as the false positive rate has decreased in each of the cases.

Table 2: Ranked List of Features

Attack type | Ranked list
DoS   | 5,23,3,33,35,34,24,36,2,39,4,38,26,25,29,30,6,12,10,13,40,41,31,37,32,8,7,28,27,9,1,19,18,22,20,21,14,11,17,15,16
Probe | 23,29,27,36,4,32,34,40,35,3,30,2,5,41,28,37,33,25,38,26,39,10,9,12,11,6,1,8,7,21,19,20,31,22,24,15,13,14,18,16,17
U2R   | 6,3,13,15,12,14,18,19,16,17,20,4,5,1,2,10,11,7,9,8,35,36,32,34,33,40,41,37,39,38,24,25,21,23,22,29,31,30,26,28,27
R2L   | 3,34,1,6,5,33,35,36,32,12,23,24,10,2,37,4,38,13,16,15,14,8,7,11,9,29,30,27,28,40,41,31,39,19,20,17,18,25,26,21,22

Ten kinds of network attacks are included in the training set, namely back, land, neptune, pod, smurf, teardrop, ipsweep, portsweep, buffer_overflow and guess_passwd. Fifteen kinds of network attacks are included in the testing set, namely perl, xlock, mailbomb, udpstorm, saint, back, land, neptune, pod, smurf, teardrop, ipsweep, portsweep, buffer_overflow and guess_passwd. The test data set is similar to the training data set; the only difference is that the test data set includes some unknown attacks that do not occur in the training data set.

   a    b    c    d    e    f   <-- classified as
  76    0    1    0    0    0 |  a = back
   0    7    0    0    0    0 |  b = land
   1    0  250    0    0    0 |  c = neptune
   0    0    0   17    0    0 |  d = pod
   0    0    0    0  566    0 |  e = smurf
   0    0    0    0    0    0 |  f = teardrop

Table 3: Performance of the Implemented System

No. of rules | Detection rate (%)        | False positive rate (%)
             | DoS   Probe  U2R   R2L    | DoS   Probe  U2R   R2L
50           | 86.7  79.2   81.2  86.1   | 0.9   1.1    1.5   1.2
75           | 81.4  71.3   75.4  83.4   | 1.9   2.7    2.9   2.3
100          | 78.3  67.0   71.0  82.5   | 2.3   3.1    3.6   2.7

The time complexity is quite low. Computing the pairwise feature correlation matrix requires m·(n²−n)/2 operations, where m is the number of instances and n is the initial number of features. The feature selection requires (n²−n)/2 operations for a forward selection and a backward elimination. The hill-climbing search is in principle exhaustive, but the use of the stopping criterion makes the probability of exploring the entire search space small. In particular we
have stressed the message that feature selection can help to focus a learning algorithm on only the most relevant features inside a given dataset, and thus on the main aspects of the considered problem.

[Figure: detection rates obtained with 100, 75, and 50 rules for the DoS, Probe, U2R and R2L attack types; y-axis 0-100 %.]

[Figure 2: bar chart comparing the proposed method with Binary Tree, LAMSTAR, SOM and ART across the Normal, DoS, Probe, U2R and R2L attack types; y-axis 0-80.]

Figure 2. Comparison with other IDS

When compared to other IDS [5, 6], in this approach an efficient algorithm for feature extraction is proposed to remove irrelevance during the data-preparation period. Experimental results showed that the new decision-dependent correlation measure can be used to select a near-optimal feature subset. The smaller number of features results in a much faster rule-generation process and reduces the overhead of collecting data when used in a real network environment. The rules generated from the genetic algorithm DNA encoding are capable of identifying and classifying attack categories correctly. Once rules are extracted using the genetic algorithm, inserting the rule base back into a new network with a similar problem domain has the desired potential. This is similar to the heuristics given to expert systems, and like those heuristics the extracted/inserted rules may be refined as more knowledge becomes available. Since the number of features used is minimal, this method not only improves the detection performance but also trims the time required for training.

5. Conclusion

The proposed approach yields a detection scheme that is more general and able to handle all types of data, symbolic as well as numerical.

References

[1] Andrews and Geva, "Intrusion Detection Rules and Networks," Proceedings of the Rule Extraction from Trained Artificial Neural Networks Workshop, Artificial Intelligence and Simulation of Behaviour, Brighton, UK, 1996.

[2] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, "A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data," in Applications of Data Mining in Computer Security, Chapter 4, D. Barbara and S. Jajodia (eds.), Kluwer, ISBN 1-4020-7054-3, 2002.
Authors Profile
Selvakani Kandeeban received the MCA degree from Manonmanium Sundaranar University and the M.Phil degree from Madurai Kamaraj University. Presently she is working as a Professor and Head of the MCA Department at Francis Xavier Engineering College, Tirunelveli. Previously she was with Jaya College of Engineering and Technology as an Assistant Professor in the MCA Department. She has presented 4 papers in national conferences and 1 paper in an international conference, and has published 1 paper in a national journal and 8 papers in international journals. She is currently pursuing her PhD degree in Network Security.