International Journal of Engineering Technology and Management (IJETM) ISSN: 2394-6881
Available Online at www.ijetm.org
Volume 3, Issue 4; July-August: 2016; Page No. 01-12
Classification Rule Discovery on Biological Dataset Using Ant Colony Optimization
M. Ramachandro¹, Dr. R. Bhramaramba²
¹Asst. Professor, Department of CSE, GMRIT Rajam-32127, Srikakulam
rama00565@gmail.com
²Associate Professor, Dept. of Information Technology, GIT, GITAM University, Visakhapatnam
bhramarambaravi@gmail.com
ABSTRACT:
Classification systems have been widely utilized in the medical domain to explore patients' data and extract a predictive model. This model helps physicians to improve their prognosis, diagnosis or treatment planning procedures. Data mining can be done using different functionalities, and classification is one of them. Classification is a data mining technique that assigns objects to predefined classes or labels; its aim is to assign each object to its target class. Biology-inspired algorithms such as Genetic Algorithms (GA) and swarm-based approaches like Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) have been used to solve many data mining problems. In this project, binary classification is considered as the problem area. The main aim of this project is to discover classification rules on a biological dataset using Ant Miner, with an accuracy function that determines the pheromone update levels. Ant Miner is a rule induction algorithm that uses collective intelligence to construct classification rules.
Keywords: Particle Swarm Optimization, Ant Colony Optimization, Classification
INTRODUCTION
Data mining has been defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It uses machine learning and visualization techniques to discover knowledge and present it in a form that is easily comprehensible to humans. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records, unusual records and dependencies. Among the several data mining tasks, including regression, clustering and dependence modeling, classification is the most studied and most popular. The main objective of classification is to build a model that predicts the class of an unseen data instance from its predicting attributes. Rule discovery is an important data mining task because it generates a set of symbolic rules that describe each class or category in a natural way. The human mind is able to understand rules better than any other data mining model. However, these rules need to be simple and comprehensible; otherwise, a human will not be able to comprehend them. Evolutionary algorithms have been widely used for rule discovery, a well-known approach being learning classifier systems.
Corresponding author: M. Ramachandro
1.1 EXISTING SYSTEM
Classification is a data mining technique that assigns items to predefined categories, classes or labels. The aim of classification is to predict the target class of the input data. Biology-inspired algorithms such as Genetic Algorithms (GA) and swarm-based approaches like Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) have been used to solve many data mining problems, and PSO and ACO are currently the most prominent choices in the area of swarm intelligence. In this paper, binary classification is considered as the problem area, and a modified Ant Miner is used to solve it. The basic Ant Miner algorithm has been modified to use a different classification accuracy function.
1.2 PROPOSED SYSTEM
In many real-world problems, classification is used as an important decision-making technique. The classification task applies when a tuple or sample must be assigned to one of a predefined set of classes based on a set of attributes. Many real-world problems can be
M.Ramachandro, et.al.,International Journal of Engineering Technology and Management (IJETM)
framed as classification problems, such as weather forecasting, credit risk evaluation, medical diagnosis and bankruptcy prediction.
In binary classification problems, a tuple described by a set of attributes is assigned to one of two classes, as in a Yes or No decision. In a multiclass classification problem, a tuple is assigned to one of more than two classes, such as Class A, Class B or Class C. In this work, a different quality function is used which is simpler to compute (i.e., it needs fewer multiplication and division operations), is suitable for binary classification and produces good results. The function is given below:
$$Q = \frac{TP + TN}{TP + TN + FP + FN}$$
All the parameters in this quality function have the same meaning as in the basic Ant Miner.
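As a minimal sketch, the modified quality function can be computed directly from the four confusion-matrix counts of a rule. For comparison, the sketch also computes the sensitivity × specificity quality of the basic Ant-Miner. The function names and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def modified_quality(tp: int, tn: int, fp: int, fn: int) -> float:
    """Modified Ant Miner quality: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def basic_quality(tp: int, tn: int, fp: int, fn: int) -> float:
    """Basic Ant-Miner quality: sensitivity * specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


# Hypothetical confusion counts for one candidate rule.
counts = {"tp": 40, "tn": 45, "fp": 5, "fn": 10}
print(modified_quality(**counts))         # 0.85
print(f"{basic_quality(**counts):.2f}")   # 0.72
```

The modified function needs a single division, whereas the basic one needs two divisions and a multiplication, which is the sense in which the text calls it simpler. Note that (TP + TN)/(TP + TN + FP + FN) is simply the rule's accuracy on the cases it is evaluated against.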
1.3. ADVANTAGES
Positive feedback accounts for rapid discovery of good solutions.
Distributed computation avoids premature convergence.
The greedy heuristic helps find acceptable solutions in the early stages of the search process.
Ant-Miner discovered rule lists much simpler (i.e., smaller) than the rule sets discovered by C4.5 and the rule lists discovered by CN2.
2. LITERATURE SURVEY
In the last several decades, the volume of data has been increasing rapidly every day. Contributing factors include the widespread use of barcodes for most commercial products and the computerization of many business, scientific and government transactions. In addition, the popular use of the World Wide Web as a global information system has flooded us with a tremendous amount of data. Most of this data is unstructured and complex, though available in digital form. Analyzing unstructured data is difficult and inefficient until it is converted into structured data. Data mining is the process of discovering new patterns from large data sets using methods from artificial intelligence, machine learning, statistics and database systems. It is an effective way to obtain structured data patterns, since it extracts knowledge from the dataset in a human-understandable structure. The main methods in the data mining process are association, classification and clustering.
2.1. CLASSIFICATION
Classification is done on the basis of a learnt classification model and consists of assigning a class label to test samples.
Properties of classification
With classification rules, the groups (or classes) are specified beforehand, and each training data instance belongs to a particular class.
1. The class labels are obtained from the training data.
2. This type of learning is called supervised learning.
3. Problems of this type are solved by classification.
Classification versus Clustering
Classification is supervised learning, whereas clustering is unsupervised learning. In general, in classification we have a set of predefined classes and want to know which class a new object belongs to, whereas clustering tries to group a set of objects and find whether there is some relationship between them.
2.2 Performance Evaluation of Classification Methods
Classification methods are usually compared on the basis of the following criteria:
2.2.1 Predictive Accuracy
This is the ability of the classification model [6] to correctly classify unseen data. After a classification model has been built with the help of training data, its accuracy is measured on test samples whose correct class labels are known but not shown to the model. Predictive accuracy is the number of correctly classified test samples divided by the total number of test samples. For example, if there are twenty test samples and the classification model correctly classifies eighteen of them, the accuracy of the model is 18/20 = 90%.
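A small sketch of the predictive-accuracy computation, reproducing the twenty-sample example above. The function name and the "yes"/"no" labels are illustrative assumptions.

```python
from typing import Sequence


def predictive_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of test samples whose predicted label equals the known label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one test sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# The example from the text: 20 test samples, 18 classified correctly.
y_true = ["yes"] * 20
y_pred = ["yes"] * 18 + ["no"] * 2
print(f"{predictive_accuracy(y_true, y_pred):.0%}")  # 90%
```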
2.2.2 Robustness
This is the ability of the classification model to perform well on noisy data or data with missing values.
2.2.3 Speed
This is the computational cost of generating the
© 2016 IJETM. All Rights Reserved.
model. This cost is measured in terms of the running time of the algorithm. The running time is measured as the number of steps/operations required by the algorithm, so that it is independent of the operating system and the machine used.
2.2.4 Scalability
This is the ability to construct the model efficiently even for a large amount of high-dimensional data. As the input size increases, the algorithm should be able to construct the classification model as efficiently as for a small input.
2.2.5 Interpretability
This is the ease of comprehension or understanding of the model by the user.
2.3 Types of Classifiers
A large number of classification methods are available. They can be divided into two major groups: comprehensible classifiers and statistical (or mathematical) classifiers.
2.3.1 Comprehensible Classifiers
Comprehensible classifiers [1] are usually rule-based classifiers. They are easy to understand and interpret, and are interesting for the users (or at least the domain experts). They are in contrast to mathematical classifiers, which are difficult to understand. The major benefit of these classifiers is that their comprehensibility earns the user's trust in the decisions obtained from them. Some commonly used rule induction algorithms are:
C4.5 Decision Tree
CN2
3. IMPLEMENTATION ISSUES
3.1 Classification analysis
CLASSIFICATION [6] is one of the most frequently occurring tasks of human decision making. A classification problem encompasses the assignment of an object to a predefined class according to its characteristics. Many decision problems in a variety of domains, such as engineering, medical sciences, human sciences and management science, can be considered classification problems. Popular examples are speech recognition, character recognition, medical diagnosis, bankruptcy prediction and credit scoring. Over the years, a myriad of classification techniques have been proposed, such as linear and logistic regression, decision trees and rules, k-nearest neighbor classifiers, neural networks and support vector machines (SVMs). Various benchmarking studies indicate the success of the latter two nonlinear classification techniques, but their strength is also their main weakness: since the classifiers generated by neural networks and SVMs are described as complex mathematical functions, they are rather incomprehensible and opaque to humans. This opacity prevents them from being used in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation. For example, in credit scoring, since the models concern key decisions of a financial institution, they need to be validated by a financial regulator. Transparency and comprehensibility are, therefore, of crucial importance. Similarly, classification models provided to physicians for medical diagnosis need to be validated, demanding the same clarity as any domain that requires regulatory validation.
Classification is a two-step process. The first step is model construction.
Model construction: A model is built that describes a set of predetermined classes. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of tuples used for model construction is called the training set. The model is represented as classification rules, decision trees, or mathematical formulae.
Model usage: This is the second step in classification, in which the model is used to classify future or unknown objects. First, the accuracy of the model is estimated: the known label of each test sample is compared with the result produced by the model. The test set is independent of the training set.
Many algorithms are used for classification in data mining. Some of them are:
1. Rule based classifier
2. Decision tree induction
3. Nearest neighbor classifier
4. Bayesian classifier
5. Artificial neural network
6. Support vector machine
7. Ensemble classifier
8. Regression trees
The Data
The data used in this investigation is the diabetes [7] data. It has a total dimension of 699 rows and 9 columns. For training and testing, 75% of the overall data is used for training and the remaining 25% is used for testing the classification accuracy of the selected classification methods.
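The text does not say how the 75%/25% split was drawn. The sketch below assumes a simple random shuffle with a fixed seed (a stratified split would be another reasonable choice); the function name and the 100-row stand-in data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], train_fraction: float = 0.75,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the rows and split them into disjoint training and test sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 data records
train, test = holdout_split(rows)
print(len(train), len(test))  # 75 25
```

Keeping the test set out of every later step (including the computation of imputation statistics) is what makes the measured accuracy an estimate for unseen data.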
3.2. OVERVIEW OF DIABETES
3.2.1. Diabetes
Diabetes [7] is a disease that occurs when the insulin production in the body is inadequate or the body is unable to use the produced insulin properly; as a result, blood glucose becomes high. The body breaks food down into glucose, and this glucose must be transported to all the cells of the body. Insulin is the hormone that directs the glucose produced by breaking down food into the body cells. Any change in the production of insulin leads to an increase in blood sugar levels, which can damage tissues and cause organ failure. Generally, a person is considered to be suffering from diabetes when blood sugar levels are above normal (4.4 to 6.1 mmol/L) [5]. A diabetic patient essentially has low production of insulin, or their body is not able to use insulin well. There are three main types of diabetes, viz. Type 1, Type 2 and Gestational. Type 1: the disease manifests as an autoimmune disease occurring at a very young age, below 20 years. In this type of diabetes, the pancreatic cells that produce insulin have been destroyed. Type 2: the various organs of the body become insulin resistant, which increases the demand for insulin; at this point, the pancreas does not make the required amount of insulin. Gestational diabetes tends to occur in pregnant women, as the pancreas does not make sufficient insulin. All these types of diabetes need treatment, and if they are detected at an early stage, the associated complications can be avoided. Nowadays, large amounts of information are collected by hospitals in the form of patient records. Knowledge discovery for predictive purposes is done through data mining, which is an analysis technique that helps in drawing inferences.
3.2.2 Types of Diabetes
The three main types of diabetes are described below:
1. Type 1: Only about 10% of diabetes [7] patients have this form of diabetes, although there has recently been a rise in the number of cases of this type in the United States. The disease manifests as an autoimmune disease occurring at a very young age, below 20 years, hence it is also called juvenile-onset diabetes. In this type of diabetes, the pancreatic cells that produce insulin have been destroyed by the body's own defense system. Patients suffering from Type 1 diabetes need insulin injections along with frequent blood tests and dietary restrictions.
2. Type 2: This type accounts for almost 90% of diabetes cases and is commonly called adult-onset diabetes or non-insulin-dependent diabetes. In this case, the various organs of the body become insulin resistant, which increases the demand for insulin. At this point, the pancreas does not make the required amount of insulin. To keep this type of diabetes at bay, patients have to follow a strict diet and exercise routine and keep track of their blood glucose. Obesity, being overweight and being physically inactive can lead to Type 2 diabetes. The risk of developing diabetes is also considered to increase with age. The majority of Type 2 diabetes patients have borderline diabetes, or pre-diabetes, a condition in which blood glucose levels are higher than normal but not as high as in a diabetic patient.
3. Gestational diabetes: a type of diabetes that tends to occur in pregnant women, with high sugar levels, because the pancreas does not produce a sufficient amount of insulin.
Table 4.2.2.1: Dataset [5] Description
Dataset | No. of attributes | No. of instances
Pima Indians Diabetes Database of National Institute of Diabetes and Digestive and Kidney Diseases | 8 | 768

Table 4.2.2.2: The attributes description [7]
Attribute | Short name
No. of times pregnant | Preg
Plasma glucose concentration | Plas
Diastolic blood pressure | Pres
Triceps skin fold thickness | Skin
2-hour insulin level | Insulin
Body mass index | Mass
Pedigree function | Pedi
Age | Age
Class variable (1 or 2) | Class
3.3 MODULES
3.3.1 Data Pre-processing
Data pre-processing [4] is an important step in the data mining process. Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., a negative income), impossible data combinations (e.g., Sex: Male, Pregnant: Yes), missing values, etc. Analyzing data that has not been carefully screened for such problems can produce misleading results. If much irrelevant and redundant information, or noisy and unreliable data, is present, knowledge discovery during the training phase becomes more difficult. Data preparation and filtering steps can take a considerable amount of processing time.
Data pre-processing includes cleaning, normalization, transformation, feature extraction and selection, etc. Some attributes may not be required in the analysis, and those attributes can be removed from the dataset before analysis. For example, the instance-number attribute of the Iris dataset is not required in the analysis. In Weka, this attribute can be removed by ticking its check box in the Attributes list and clicking Remove. The resulting dataset can then be stored in ARFF file format.
Missing values
Missing data might occur because a value is not relevant to a particular case, could not be recorded when the data was collected, or was withheld by users because of privacy concerns. Missing values make it difficult to extract useful information from a data set [2]. Missing data are absent data items that may hide important information [1]. Most real-world databases are characterized by an unavoidable problem of incompleteness, in terms of missing or erroneous values. There are different types of missing data:
MCAR
The term "Missing Completely at Random" (MCAR) refers to data where the missingness mechanism does not depend on the variable of interest, or on any other variable that is observed in the dataset.
MAR
Sometimes data might not be missing completely at random but may be termed "Missing at Random" (MAR). We can consider an entry X_i as missing at random if the data meets the requirement that missingness does not depend on the value of X_i after controlling for another variable.
NMAR
If the data is not missing at random, i.e., it is informatively missing, it is termed "Not Missing at Random" (NMAR). Such a situation occurs when the missingness mechanism depends on the actual value of the missing data.
Missing data imputation techniques
Listwise deletion: This method omits the cases (instances) with missing data and performs the analysis on the remaining ones. Though it is the most common method, it has two obvious disadvantages: (a) a substantial decrease in the size of the data set available for the analysis; (b) data are not always missing completely at random, so the remaining cases may be a biased sample.
Mean/Mode Imputation (MMI): Replace a missing value with the mean (numeric attribute) or mode (nominal attribute) of all observed cases. To reduce the influence of exceptional data, the median can also be used. This is one of the most commonly used methods.
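The two imputation techniques above can be sketched as follows, assuming each record is a dict with the same keys and that None marks a missing value. The attribute names echo Table 4.2.2.2, but the values are made up, and the class attribute is assumed never to be missing.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional, Union

Value = Union[float, str]
Record = Dict[str, Optional[Value]]  # None marks a missing value


def listwise_delete(records: List[Record]) -> List[Record]:
    """Listwise deletion: keep only records with no missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def mean_mode_impute(records: List[Record]) -> List[Record]:
    """Mean/Mode Imputation: fill None with the attribute mean (numeric)
    or mode (nominal), computed over all observed records of any class."""
    if not records:
        return []
    fill: Dict[str, Value] = {}
    for attr in records[0]:
        observed = [r[attr] for r in records if r[attr] is not None]
        if not observed:
            continue  # attribute entirely missing: nothing to impute from
        if all(isinstance(v, (int, float)) for v in observed):
            fill[attr] = mean(observed)
        else:
            fill[attr] = Counter(observed).most_common(1)[0][0]
    # Build new dicts so the input records are left unchanged.
    return [{a: (fill.get(a) if v is None else v) for a, v in r.items()}
            for r in records]


records = [
    {"plas": 148.0, "mass": 33.6, "class": "pos"},
    {"plas": None, "mass": 26.6, "class": "neg"},
    {"plas": 183.0, "mass": None, "class": "pos"},
]
print(len(listwise_delete(records)))         # 1
print(mean_mode_impute(records)[1]["plas"])  # 165.5
```

In practice, the means and modes would be computed on the training portion only and then reused to fill the test portion.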
Replacing the missing values
Weka provides a filter called "Replace Missing Values" that replaces all missing values in a dataset with the mean (numeric attributes) or mode (nominal attributes) of each attribute. One might instead want to replace the missing values of an attribute using only the mean of the values belonging to a certain class. For example, in a binary dataset it may seem more correct to replace a missing value in a record of the positive class with the mean calculated only from the records of the positive class. However, replacing the missing values of class A with the mean calculated from the training instances of class A biases the dataset, and the same replacement cannot be applied to test instances, whose class is unknown. To avoid this bias, which would eventually overfit the trained model, it is wise to use the default "Replace Missing Values" function, i.e., to use the mean and mode of all training instances rather than those of just one class.
3.3.2 Applying ACO model on dataset: Ant Colony System (ACS)
Ant Colony Optimization (ACO) [4] is a branch of a newly developed form of artificial intelligence called swarm intelligence. Swarm intelligence is a field which studies "the emergent collective intelligence of groups of simple agents". In groups of insects that live in colonies, such as ants and bees, an individual can only do simple tasks on its own, while the colony's cooperative work is the main reason for the intelligent behavior it shows. Most real ants are blind. However, each ant, while walking, deposits on the ground a chemical substance called pheromone. Pheromone encourages the following ants to stay close to previous moves. The pheromone evaporates over time to allow search exploration. A number of experiments presented by Dorigo and Maniezzo illustrate the complex behavior of ant colonies. For example, a set of ants built a path to some food. An obstacle with two ends was then placed in their way such that one end of the obstacle was more distant than the other. In the beginning, equal numbers of ants spread around the two ends of the obstacle. Since all ants have almost the same speed, the ants going around the nearer end of the obstacle return before the ants going around the farther end (differential path effect). With time, the amount of pheromone the ants deposit increases more rapidly on the shorter path, and so more ants prefer this path. This positive feedback is called autocatalysis. The difference between the two paths is called the preferential path effect; it is the result of the differential deposition of pheromone between the two sides of the obstacle, since the ants following the shorter path make more visits to the source than those following the longer path.
Because of pheromone evaporation, the pheromone on the longer path vanishes with time.
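To make the positive-feedback argument concrete, here is a toy, deterministic (expected-value) model of the two-path experiment. It is not taken from Dorigo and Maniezzo; the path lengths, number of ants and evaporation rate rho are hypothetical. In each round, ants split between the paths in proportion to pheromone, all pheromone evaporates by a factor (1 - rho), and each path receives a deposit equal to its number of ants divided by its length (a shorter path is completed more often).

```python
def short_path_share(short_len: float = 1.0, long_len: float = 2.0,
                     ants: int = 100, rho: float = 0.1,
                     steps: int = 50) -> float:
    """Fraction of ants expected to take the short path after `steps` rounds."""
    tau_short = tau_long = 1.0  # both sides start with equal pheromone
    for _ in range(steps):
        p_short = tau_short / (tau_short + tau_long)
        on_short = ants * p_short
        on_long = ants - on_short
        # Evaporation, then a deposit proportional to trips per unit time:
        # the shorter path is completed more often, so it receives more.
        tau_short = (1 - rho) * tau_short + on_short / short_len
        tau_long = (1 - rho) * tau_long + on_long / long_len
    return tau_short / (tau_short + tau_long)


for steps in (0, 1, 5, 20, 50):
    print(steps, round(short_path_share(steps=steps), 3))
```

With equal path lengths the share stays at 0.5. With a shorter path, each deposit favours that path by the ratio of the lengths, so its share grows round after round (autocatalysis), while evaporation erodes the pheromone left on the longer path.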
ACO Based Classification Rule Discovery: Ant Miner Algorithm
A few authors have applied ACO to the discovery of classification rules. The first ACO-based algorithm for classification rule discovery, called Ant Miner [3], was proposed by Parpinelli et al. In Ant Miner, each ant constructs a rule. It starts with an empty rule and incrementally constructs it by adding one term at a time. The selection of the term to be added is probabilistic and based on two factors: a heuristic quality of the term and the amount of pheromone deposited on it by the previous ants. The authors use information gain as the heuristic value of a term. Rule construction continues until one of two situations occurs. The first is that adding any remaining term would make the rule cover fewer cases than a user-specified threshold called Min_cases_per_rule (minimum number of cases covered by the rule). The second is that there are no more attributes that could be inserted in the rule, because all attributes have already been used by the ant. When either stopping condition is met, the ant's tour is considered complete (the rule's antecedent part is complete). The consequent of the rule is assigned by taking a majority vote among the training samples covered by the rule. The constructed rule is then pruned to remove irrelevant terms and to improve its accuracy. The quality of the constructed rule is determined, and the pheromone on the trail followed by the ant is updated in proportion to the quality of the rule. A new ant then starts with the updated pheromone values to guide its search. When all ants have constructed their rules, the best rule among them is selected and added to the discovered rule list. The training samples correctly classified by that rule are deleted from the training set. This process continues until the number of uncovered samples is less than a user-specified threshold. The final product is an ordered discovered rule list that is used to classify the test data.
The goal of Ant Miner is to extract classification rules from data. The algorithm is presented below.
1. Training set = all training cases; Discovered rule list = empty list;
2. WHILE (No. of cases in the Training set > Max_uncovered_cases)
3.     i = 0; initialize all trails with the same amount of pheromone;
4.     REPEAT
5.         i = i + 1;
6.         Ant_i incrementally constructs a classification rule;
7.         Prune the just-constructed rule;
8.         Update the pheromone of the trail followed by Ant_i;
9.     UNTIL (i ≥ No_of_Ants)
10.    Select the best rule among all constructed rules and add it to the discovered rule list;
11.    Remove the cases correctly covered by the selected rule from the training set;
12. END WHILE
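The pseudocode above is a sequential-covering loop. A minimal Python sketch of its control flow follows. Rule construction, pruning, pheromone update and rule quality are passed in as hypothetical callables (they correspond to the steps described below), and any pheromone state is assumed to live inside those callables. The break when the best rule removes no case is a safeguard of this sketch, not part of the pseudocode.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, str]              # categorical attribute -> value, plus "class"
Rule = Tuple[Dict[str, str], str]  # (antecedent terms, predicted class)


def covers(rule: Rule, case: Case) -> bool:
    """True if the case satisfies every <attribute = value> term of the rule."""
    terms, _ = rule
    return all(case.get(attr) == value for attr, value in terms.items())


def ant_miner_outer_loop(
    training_set: List[Case],
    construct_rule: Callable[[List[Case]], Rule],          # hypothetical ant tour
    prune: Callable[[Rule, List[Case]], Rule],             # hypothetical pruning
    update_pheromone: Callable[[Rule, List[Case]], None],  # hypothetical update
    quality: Callable[[Rule, List[Case]], float],
    no_of_ants: int = 10,
    max_uncovered_cases: int = 5,
) -> List[Rule]:
    discovered: List[Rule] = []
    cases = list(training_set)                     # step 1
    while len(cases) > max_uncovered_cases:        # step 2
        candidates = []
        for _ in range(no_of_ants):                # steps 3-9
            rule = prune(construct_rule(cases), cases)
            update_pheromone(rule, cases)
            candidates.append(rule)
        best = max(candidates, key=lambda r: quality(r, cases))  # step 10
        discovered.append(best)
        remaining = [c for c in cases               # step 11
                     if not (covers(best, c) and c["class"] == best[1])]
        if len(remaining) == len(cases):
            break  # safeguard: the best rule removed nothing
        cases = remaining
    return discovered


# Toy run with stub callables standing in for the real ant, pruning and update.
toy = ([{"plas": "high", "class": "pos"} for _ in range(4)]
       + [{"plas": "low", "class": "neg"} for _ in range(4)])
rules = ant_miner_outer_loop(
    toy,
    construct_rule=lambda cs: ({"plas": cs[0]["plas"]}, cs[0]["class"]),
    prune=lambda rule, cs: rule,
    update_pheromone=lambda rule, cs: None,
    quality=lambda rule, cs: 1.0,
    no_of_ants=3,
    max_uncovered_cases=0,
)
print(rules)  # [({'plas': 'high'}, 'pos'), ({'plas': 'low'}, 'neg')]
```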
Pheromone Initialization:
All cells in the pheromone table are initialized equally to the following value:
$$\tau_{ij}(t = 0) = \frac{1}{\sum_{i=1}^{a} b_i}$$
where a is the total number of attributes and b_i is the number of values in the domain of attribute i.
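A sketch of the initialization, assuming the standard Ant-Miner value τ_ij(0) = 1/Σ b_i and categorical (already discretized) attributes. The attribute bins below are hypothetical; the Pima attributes are numeric and would have to be discretized first.

```python
from typing import Dict, Sequence, Tuple

Term = Tuple[str, str]  # (attribute, value)


def init_pheromone(domains: Dict[str, Sequence[str]]) -> Dict[Term, float]:
    """Set tau_ij(0) = 1 / sum_i b_i for every term (attribute, value)."""
    total_terms = sum(len(values) for values in domains.values())  # sum of b_i
    return {(attr, value): 1.0 / total_terms
            for attr, values in domains.items() for value in values}


# Hypothetical discretized attributes: a = 2, b_1 = 3, b_2 = 2.
domains = {"plas": ["low", "mid", "high"], "mass": ["normal", "obese"]}
tau = init_pheromone(domains)
print(len(tau), tau[("plas", "low")])  # 5 0.2
```

Because every term starts with the same value, the first ant's choices are driven only by the heuristic; the entries also sum to 1.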
Rule Construction
Each rule in Ant-Miner contains a condition part as the antecedent and a predicted class as the consequent. The condition part is a conjunction of attribute-operator-value tuples. The operator used in all experiments is "=", since in Ant-Miner2, just as in Ant-Miner, all attributes are assumed to be categorical. Let us assume a rule condition such as term_ij ≈ A_i = V_ij, where A_i is the i-th attribute and V_ij is the j-th value in the domain of A_i. The probability that this condition is added to the current partial rule that the ant is constructing is given by the following equation:
$$P_{ij}(t) = \frac{\eta_{ij}\,\tau_{ij}(t)}{\sum_{i \in I} \sum_{j=1}^{b_i} \eta_{ij}\,\tau_{ij}(t)}$$
where η_ij is a problem-dependent heuristic value for term_ij, τ_ij(t) is the amount of pheromone currently available (at time t) on the connection between attribute A_i and value V_ij, and I is the set of attributes that are not yet used by the ant.
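Term selection can be sketched as roulette-wheel sampling over P_ij = η_ij·τ_ij(t) / Σ_{i∈I} Σ_j η_ij·τ_ij(t). This sketch only excludes attributes the ant has already used; it does not apply the Min_cases_per_rule check described earlier. The names and numbers are hypothetical.

```python
import random
from typing import Dict, Optional, Set, Tuple

Term = Tuple[str, str]  # (attribute, value)


def term_probabilities(eta: Dict[Term, float], tau: Dict[Term, float],
                       used_attrs: Set[str]) -> Dict[Term, float]:
    """P_ij = eta_ij * tau_ij / (sum of eta * tau over terms of unused attributes)."""
    scores = {t: eta[t] * tau[t] for t in eta if t[0] not in used_attrs}
    total = sum(scores.values())
    if total == 0:
        return {}  # no term can be chosen
    return {t: s / total for t, s in scores.items()}


def choose_term(probs: Dict[Term, float], rng: random.Random) -> Optional[Term]:
    """Roulette-wheel selection: draw one term with probability P_ij."""
    if not probs:
        return None
    terms = list(probs)
    return rng.choices(terms, weights=[probs[t] for t in terms], k=1)[0]


eta = {("plas", "high"): 0.6, ("plas", "low"): 0.2, ("mass", "obese"): 0.2}
tau = {t: 1.0 / 3 for t in eta}  # equal initial pheromone
probs = term_probabilities(eta, tau, used_attrs=set())
print(choose_term(probs, random.Random(42)))
print(term_probabilities(eta, tau, used_attrs={"plas"}))  # {('mass', 'obese'): 1.0}
```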
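Heuristic Value
In traditional ACO, a heuristic value is usually used in conjunction with the pheromone value to decide on the transitions to be made. In Ant-Miner, the heuristic value is taken to be an information-theoretic measure of the quality of the term to be added to the rule. The quality here is measured in terms of the entropy for preferring this term to the others, and is given by the following equations:
$$\eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i \in I} \sum_{j=1}^{b_i} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)}$$
$$H(W \mid A_i = V_{ij}) = -\sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \log_2 P(w \mid A_i = V_{ij})$$
where k is the number of classes, W is the class attribute, P(w | A_i = V_ij) is the empirical probability of observing class w given A_i = V_ij, and I and b_i are as defined above.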
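A sketch of the entropy-based heuristic, assuming the normalized form η_ij = (log2 k − H(W | A_i = V_ij)) / Σ (log2 k − H) used by Ant-Miner, with the sum taken over the candidate terms the caller passes in (the terms of attributes not yet used). A term that covers no training case is given zero gain. The toy cases are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Case = Dict[str, str]   # categorical attribute -> value, plus "class"
Term = Tuple[str, str]  # (attribute, value)


def conditional_entropy(cases: List[Case], term: Term) -> Optional[float]:
    """H(W | A_i = V_ij) over the cases satisfying the term; None if it covers none."""
    attr, value = term
    labels = [c["class"] for c in cases if c.get(attr) == value]
    if not labels:
        return None
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())


def entropy_heuristic(cases: List[Case], terms: List[Term]) -> Dict[Term, float]:
    """eta_ij = (log2 k - H) / sum over candidate terms of (log2 k - H)."""
    k = len({c["class"] for c in cases})
    log_k = math.log2(k) if k > 1 else 0.0
    gains = {}
    for term in terms:
        h = conditional_entropy(cases, term)
        # A term covering no case is treated as H = log2 k, i.e. zero gain.
        gains[term] = 0.0 if h is None else log_k - h
    total = sum(gains.values())
    return {t: (g / total if total > 0 else 0.0) for t, g in gains.items()}


cases = ([{"plas": "high", "class": "pos"}] * 3
         + [{"plas": "low", "class": "neg"}] * 2
         + [{"plas": "low", "class": "pos"}])
eta = entropy_heuristic(cases, [("plas", "high"), ("plas", "low"), ("plas", "mid")])
print({t: round(v, 3) for t, v in eta.items()})
```

A term whose covered cases all share one class has zero entropy and therefore the largest heuristic value, which is why "plas = high" dominates here.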
Rule Pruning
Immediately after the ant completes the construction of a rule, rule pruning is undertaken to increase the comprehensibility and accuracy of the rule. After the pruning step, the rule may be assigned a different predicted class based on the majority class in the cases covered by the rule antecedent. The rule pruning procedure iteratively removes the term whose removal causes the maximum increase in the quality of the rule. The quality of a rule is measured using the following equation, where TP, FN, TN and FP are defined below:
$$Q = \frac{TP}{TP + FN} \times \frac{TN}{FP + TN}$$
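Rule pruning and the majority-vote consequent can be sketched as below, using Q = TP/(TP + FN) × TN/(FP + TN). Following the description in this section, at each pass the term whose removal gives the highest Q is found, and it is dropped if Q does not decrease, until one term remains. The helper names and toy cases are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Case = Dict[str, str]              # categorical attribute -> value, plus "class"
Rule = Tuple[Dict[str, str], str]  # (antecedent terms, predicted class)


def covers_terms(terms: Dict[str, str], case: Case) -> bool:
    return all(case.get(a) == v for a, v in terms.items())


def majority_class(terms: Dict[str, str], cases: List[Case]) -> str:
    """Consequent: the most frequent class among the cases the antecedent covers."""
    labels = [c["class"] for c in cases if covers_terms(terms, c)]
    if not labels:  # covers nothing: fall back to the overall majority
        labels = [c["class"] for c in cases]
    return Counter(labels).most_common(1)[0][0]


def rule_quality(rule: Rule, cases: List[Case]) -> float:
    """Q = TP / (TP + FN) * TN / (FP + TN)."""
    terms, label = rule
    tp = fp = tn = fn = 0
    for c in cases:
        covered, same_class = covers_terms(terms, c), c["class"] == label
        if covered and same_class:
            tp += 1
        elif covered:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (fp + tn) if fp + tn else 0.0
    return sensitivity * specificity


def prune(rule: Rule, cases: List[Case]) -> Rule:
    """Drop, one at a time, the term whose removal gives the highest Q,
    as long as Q does not decrease and at least one term remains."""
    terms, label = dict(rule[0]), rule[1]
    current = rule_quality((terms, label), cases)
    while len(terms) > 1:
        best_q, best_rule = -1.0, (terms, label)
        for attr in terms:
            reduced = {a: v for a, v in terms.items() if a != attr}
            candidate = (reduced, majority_class(reduced, cases))
            q = rule_quality(candidate, cases)
            if q > best_q:
                best_q, best_rule = q, candidate
        if best_q < current:
            break
        current = best_q
        terms, label = best_rule
    return terms, label


cases = [
    {"plas": "high", "skin": "thin", "class": "pos"},
    {"plas": "high", "skin": "thick", "class": "pos"},
    {"plas": "low", "skin": "thin", "class": "neg"},
    {"plas": "low", "skin": "thick", "class": "neg"},
]
rule = ({"plas": "high", "skin": "thin"}, "pos")
print(rule_quality(rule, cases))  # 0.5
print(prune(rule, cases))         # ({'plas': 'high'}, 'pos')
```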
DESCRIPTION OF ANT-MINER [3]:
The pseudocode of Ant-Miner, at a very high level of abstraction, is given in the algorithm above. Ant-Miner starts by initializing the training set to the set of all training cases and the discovered rule list to an empty list. It then performs an outer loop in which each iteration discovers one classification rule.
The first step of this outer loop is to initialize all trails with the same amount of pheromone, which means that all terms have the same probability of being chosen by an ant to incrementally construct a rule. Rule construction is done by an inner loop consisting of three steps. First, an ant starts with an empty rule and incrementally constructs a classification rule by adding one term at a time to the current rule. In this step a term_ij, representing a triple <Attribute = Value>, is chosen to be added to the current rule with probability proportional to the product η_ij · τ_ij(t), where η_ij is the value of a problem-dependent heuristic function for term_ij and τ_ij(t) is the amount of pheromone associated with term_ij at iteration (time index) t. More precisely, η_ij is essentially the information gain associated with term_ij: the higher the value of η_ij, the more relevant term_ij is for classification, and so the higher its probability of being chosen. τ_ij(t) corresponds to the amount of pheromone currently available in position (i, j) of the trail being followed by the current ant. The better the quality of the rule constructed by an ant, the higher the amount of pheromone added to the trail positions (terms) visited (used) by the ant.
The second step of the inner loop consists of pruning the just-constructed rule, that is, removing irrelevant terms (terms that do not improve the predictive accuracy of the rule). In essence, a term is removed from a rule if this operation does not decrease the quality of the rule. This pruning process helps to avoid overfitting the discovered rule to the training set.
The third step of the inner loop consists of updating the pheromone of all trails: the pheromone in the trail followed by the ant is increased in proportion to the quality (Q) of the rule, and the pheromone in the other trails is decreased (simulating pheromone evaporation). The quality of a rule is measured by the equation:
$$Q = \text{Sensitivity} \times \text{Specificity}$$
where Sensitivity = TP/(TP + FN) and Specificity = TN/(TN + FP). The meaning of the acronyms TP, FN, TN and FP is as follows:
True Pos (TP) = number of true positives, that is, the number of cases covered by the rule that have the class predicted by the rule;
False Neg (FN) = number of false negatives, that is, the number of cases that are not covered by the rule but that have the class predicted by the rule;
True Neg (TN) = number of true negatives, that is, the number of cases that are not covered by the rule and that do not have the class predicted by the rule; and
False Pos (FP) = number of false positives, that is, the number of cases covered by the rule that have a class different from the class predicted by the rule.
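The pheromone update can be sketched as follows. The rule τ_ij ← τ_ij + τ_ij·Q for the terms used in the rule, followed by normalization of all entries, is an assumption based on the original Ant-Miner description rather than something this paper spells out; the normalization is what lowers the relative pheromone of unused terms. The term names are hypothetical.

```python
from typing import Dict, Tuple

Term = Tuple[str, str]  # (attribute, value)


def update_pheromone(tau: Dict[Term, float], rule_terms: Dict[str, str],
                     q: float) -> Dict[Term, float]:
    """Reinforce the terms used by the rule, then renormalize every trail."""
    used = set(rule_terms.items())
    # tau_ij <- tau_ij + tau_ij * Q for terms in the rule; others unchanged.
    raised = {t: (v + v * q if t in used else v) for t, v in tau.items()}
    # Dividing by the total lowers the share of unused terms (evaporation).
    total = sum(raised.values())
    return {t: v / total for t, v in raised.items()}


tau = {("plas", "high"): 0.25, ("plas", "low"): 0.25,
       ("mass", "normal"): 0.25, ("mass", "obese"): 0.25}
new_tau = update_pheromone(tau, {"plas": "high"}, q=1.0)
print(new_tau[("plas", "high")], new_tau[("plas", "low")])  # 0.4 0.2
```

A rule with Q = 0 leaves the table unchanged, while a high-quality rule shifts probability mass toward its own terms for the next ant.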
4. RESULTS AND DISCUSSIONS
4.1 DATA SET ANALYSIS
The data sets for performing classification were taken from the Pima Indians Diabetes Database of the National Institute of Diabetes and Digestive and Kidney Diseases, and ant colony optimization was applied to them.
4.2. Diabetes dataset
The diabetes data set consists of 9 numerical attributes and 768 instances. The data is first cleaned. After applying preprocessing
4.3 Data fields of Diabetes Dataset
4.4 OUTPUT SCREENS
4.4.1 Overview Screen
5. CONCLUSIONS AND FUTURE ENHANCEMENTS
Conclusion
In this paper, we have discussed the use of ACO for classification. Given an appropriate environment, the ants choose their paths and thereby implicitly construct a rule. One of the strengths of ant-based approaches is that the results are comprehensible, as they are in a rule-based format. Such rule lists provide insight into the decision making, which is a key requirement in domains such as credit scoring and medical diagnosis. The proposed Ant Miner technique can handle both binary and multiclass classification problems and generates rule lists consisting of propositional and interval rules. Another advantage of ACO that comes out more clearly in our approach is the possibility of handling distributed environments. Since the Ant Miner construction graph is defined as a sequence of vertex groups (whose order is irrelevant), we are able to mine distributed databases.
Future Enhancements
An issue faced by any rule-based classifier is that, although the classifier is comprehensible, it is not necessarily in line with existing domain knowledge [7]. It may well occur that data instances that are very evident for the domain expert to classify do not appear frequently enough in the data to be appropriately modeled by the rule induction technique. Hence, to be sure that the rules are intuitive and logical, expert knowledge needs to be incorporated. For example, Ant Miner can generate an unintuitive rule list containing a term that contradicts the medical knowledge that increasing tumor sizes result in a higher probability of recurrence. As shown in Fig. 10, such domain knowledge can be included in Ant Miner by changing the environment. Since the ants extract rules for the recurrence class only, we can remove the second vertex group, corresponding to the upper bound on the variable. Doing so ensures that the rules comply with the constraint required for the tumor size variable. Applying such constraints on relevant datasets to obtain accurate, comprehensible and intuitive rule lists is an interesting topic for future research.
6. References:
1. D. Martens, M. De Backer, R. Haesen, B. Baesens, C. Mues, and J. Vanthienen, "Ant-based approach to the knowledge fusion problem," in Ant Colony Optimization and Swarm Intelligence (ANTS 2006), LNCS 4150, pp. 84-95, Springer, 2006.
2. Abraham, A., Grosan, C., Ramos, V.: "Swarm Intelligence in Data Mining". Studies in Computational Intelligence, vol. 34 (2006).
3. Smaldon, J. & Freitas, A. A. (2006). A new version of the Ant-Miner algorithm discovering unordered rule sets. Proc. Genetic and Evolutionary Computation Conf. (GECCO-2006), San Francisco, CA: Morgan Kaufmann.
4. Dorigo, M. & Maniezzo, V. (1996). The ant system: optimization by a colony of cooperating … Optimization, D. Corne, M. Dorigo and F. Glover, Eds. London, U.K.: McGraw-Hill, 1999, pp. 11–32.
5. https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
6. Liu, H. A. Abbass, and B. McKay, "Classification rule discovery with ant colony optimization," IEEE Computational Intelligence Bulletin, Vol. 3, No. 1, Feb. 2004.
7. R. S. Parpinelli, H. S. Lopes, and A. A. Freitas, "An ant colony based system for data mining: Applications to medical data," in Proc. Genetic and Evol. Comput. Conf., 2001, pp. 791–797.
8. Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2002). An Ant Colony Algorithm for Classification Rule Discovery. In H. A. Abbass, R. Sarker & C. Newton (Eds.), Data Mining: A Heuristic Approach. Idea Group Publishing.
Text Books:
1. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson Education.
2. Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber, Morgan Kaufmann.