JAISCR, 2015, Vol. 5, No. 4, pp. 231-238
DOI: 10.1515/jaiscr-2015-0031

FEATURE SELECTION USING PARTICLE SWARM OPTIMIZATION IN TEXT CATEGORIZATION

Mehdi Hosseinzadeh Aghdam 1 and Setareh Heidari 2

1 Department of Computer Engineering and Information Technology, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran, Iran
2 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract

Feature selection is a main step in classification systems: a procedure that selects a subset of the original features. It is one of the major challenges in text categorization, because the high dimensionality of the feature space greatly increases the complexity of the text categorization process. This paper presents a novel feature selection method based on particle swarm optimization to improve the performance of text categorization. Particle swarm optimization is inspired by the social behavior of bird flocking and fish schooling. The complexity of the proposed method is very low because a simple classifier is used. The performance of the proposed method is compared with that of other methods on the Reuters-21578 data set. Experimental results demonstrate the superiority of the proposed method.

1 Introduction

Feature selection (FS) is used in many areas as a tool to eliminate irrelevant and redundant features. FS simplifies a data set by reducing its dimensionality and identifying relevant features without decreasing prediction accuracy. The dimensionality of a data set is often very large, and a learning algorithm may not work well until irrelevant features are removed. Reducing the number of irrelevant features significantly reduces the running time of a learning algorithm. FS has many applications, including text categorization (TC), data mining, pattern recognition and signal processing [1].

The goal of TC is to automatically assign predefined categories to text documents [2]. This goal is of great practical importance given the enormous volume of online text available through web sites, emails and digital libraries. A main challenge of TC is the high dimensionality of the feature space. The original feature space consists of the many unique terms that occur in the texts, and the number of terms can be hundreds of thousands even for a moderate-sized text collection. This is highly costly for many mining methods, so it is highly desirable to reduce the feature space without decreasing categorization accuracy.

Several methods have been applied to the problem of FS in TC. Yang and Pedersen conducted a comparative study of five FS criteria for TC, including document frequency, information gain, mutual information, a χ2-test (CHI) and term strength, and found the χ2 statistic and information gain the most effective for optimizing classification results [3]. Kim et al. examined three methods, namely centroid, orthogonal centroid and LDA/GSVD, which are designed for reducing the dimension of clustered data, for dimensionality reduction in TC [4]. Forman presented an excellent review of FS methods for TC and introduced a case study in text feature selection [5].

Exhaustive search is the easiest way to determine the optimal subset of features, since it evaluates all possible subsets; however, it is quite impractical even for a medium-sized feature set. FS methods therefore usually involve heuristic or random search strategies to avoid this complexity, although the optimality of the final feature subset is then often reduced [1].

Among the many methods that have been proposed for FS, heuristic methods such as genetic algorithms [6], ant colony optimization [7] and particle swarm optimization have attracted the interest of researchers. These methods try to reach better solutions by using knowledge gathered in previous steps. Genetic algorithms (GAs) are optimization methods based on natural selection; they apply operations found in natural genetics to guide the search through the search space [8]. Because of their advantages, GAs have been widely used as a tool for FS in data mining [9]. Particle swarm optimization (PSO), introduced by Kennedy and Eberhart, is based on a social-psychological model of social influence and social learning. Particles in a swarm follow a very simple behavior: emulate the success of neighboring individuals. The emergent group behavior discovers optimal regions of a high-dimensional search space.

In this paper a modified PSO-based FS method is presented. The classifier performance and the length of the selected feature subset are used as heuristic information for the proposed PSO-based method; thus, the method needs no prior knowledge of the features. The proposed method is applied to text features of the bag-of-words model, in which a document is considered as a set of words or phrases and each position in the feature vector corresponds to a given term of the document [10]. Finally, the length of the selected feature vector and the classifier performance are considered for performance evaluation.

The rest of this paper is structured as follows. Section 2 presents a brief overview of FS methods. The proposed PSO-based feature selection algorithm is described in Section 3. Section 4 reports the computational experiments and includes a brief discussion of the obtained results. Finally, conclusions and future work are offered in the last section.

2 Feature Selection Approaches

FS is a procedure that chooses a subset from the feature set; the optimality of a feature subset is evaluated by a criterion. Since FS is an NP-hard problem, there is no practical way to find the optimal feature subset [11]. A typical FS procedure consists of subset generation, subset evaluation, termination criteria and result validation [12]. The subset generation process implements a search method that chooses feature subsets for evaluation according to a certain search strategy; such strategies include forward selection, backward elimination and forward/backward combination methods. The process of subset generation and evaluation is repeated until a given termination condition is satisfied, and the selected best feature subset usually needs to be validated on a different test data set [13]. Methods for feature subset selection can be categorized into filter, wrapper and embedded approaches. The filter model separates FS from classifier learning and selects feature subsets that are independent of any learning algorithm [1]. In the wrapper approach, the feature subset is chosen using an evaluation function based on the same learning algorithm that will later be used for learning: the evaluation function computes the suitability of a feature subset generated by the subset generation procedure and compares it with the previous best candidate, replacing it if found to be better; a stopping criterion is tested in each iteration to determine whether the FS procedure should continue. Although wrappers may generate better solutions, they are costly to run and can break down with very large numbers of features, since they use learning algorithms in the evaluation of subsets. If the FS and the learning algorithm are interleaved, the FS method is a type of embedded method [14].
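
To make the wrapper idea concrete, the following minimal Python sketch shows a random-search wrapper loop: candidate subsets are generated, each is scored by the same learning algorithm that will later be used for classification, and the best-scoring subset is kept. The function name, the random subset generator and the toy evaluator are illustrative assumptions, not part of the methods discussed above.

```python
import random

def wrapper_selection(n_features, evaluate, n_candidates=100):
    """Minimal random-search wrapper: generate subsets, score each with the
    learning algorithm (via `evaluate`), keep the best one seen so far."""
    best_mask, best_score = None, float("-inf")
    for _ in range(n_candidates):                       # subset generation
        mask = [random.randint(0, 1) for _ in range(n_features)]
        score = evaluate(mask)                           # subset evaluation
        if score > best_score:                           # replace previous best
            best_mask, best_score = mask, score
    return best_mask, best_score

# Toy evaluator: pretend only the first four features are informative.
mask, score = wrapper_selection(8, evaluate=lambda m: sum(m[:4]) - 0.1 * sum(m))
```
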
3 Particle Swarm Optimization for Feature Selection

PSO is a computational approach that optimizes a problem in continuous, multidimensional search spaces. PSO starts with a swarm of random particles, each associated with a velocity. Particles' velocities are adjusted according to the historical behavior of each particle and of its neighbors as they fly through the search space.

Thus, the particles have a tendency to move towards the better regions of the search space [15]. The version of the PSO algorithm used here is described by the following equations:

Vi(t + 1) = w·Vi(t) + c1·r1(t)·[Pi(t) − Xi(t)] + c2·r2(t)·[Pg(t) − Xi(t)]    (1)

Xi(t + 1) = Xi(t) + Vi(t + 1)    (2)

where c1 and c2 are positive constants, called learning rates; Pi = (pi1, pi2, ..., piD) is the best previous position of the i-th particle; r1 and r2 are random values in the range [0, 1]; Xi = (xi1, xi2, ..., xiD) is the position of the i-th particle in a problem space with D dimensions; t indicates the iteration number; w is an inertia weight; the index g denotes the best particle among all particles in the population; and Vi = (vi1, vi2, ..., viD) indicates the rate of change of position (the velocity). If the sum of the factors on the right-hand side of equation (1) exceeds a specified constant value, the particles' velocities in each dimension are clamped to a maximum velocity Vmax [15].
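
As an illustration of equations (1) and (2), the sketch below performs one synchronous update of a whole swarm in NumPy. The paper's own implementation was written in Matlab; the function name, the vectorized form and the default parameter values here are assumptions made only for this example.

```python
import numpy as np

def pso_step(X, V, P_best, g_best, w=0.7, c1=1.0, c2=1.0, v_max=4.0):
    """One update of equations (1) and (2) for all particles at once.

    X, V    : (n_particles, n_dims) current positions and velocities
    P_best  : (n_particles, n_dims) best position visited by each particle
    g_best  : (n_dims,) best position found by the whole swarm
    """
    r1 = np.random.rand(*X.shape)              # r1(t), r2(t) in [0, 1]
    r2 = np.random.rand(*X.shape)
    V = w * V + c1 * r1 * (P_best - X) + c2 * r2 * (g_best - X)   # eq. (1)
    V = np.clip(V, -v_max, v_max)              # clamp to the maximum velocity Vmax
    X = X + V                                  # eq. (2)
    return X, V
```
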
PSO was originally designed for searching multidimensional continuous spaces. In this paper, it is adapted to the discrete FS problem. Each feature subset can be considered as a point in feature space; the optimal point is the subset with the smallest length and the highest classification accuracy. The initial swarm is distributed randomly over the search space, each particle taking one position. The goal of the particles is to fly to the best position. As time passes, their positions are changed by communicating with each other, and they search around the local best and global best positions. Finally, they should converge on good, possibly optimal, positions, since their exploration ability equips them to perform FS and discover optimal subsets.

PSO needs to be extended in order to deal with FS. The particle's position is represented as a binary bit string: every bit represents a feature, the bit value 1 represents a selected feature, and the bit value 0 represents a non-selected feature, so each position is a feature subset. To apply the PSO idea to this optimization problem, the following problem-dependent aspects are defined.

Figure 1. PSO-based feature selection algorithm

3.1 Updating velocity

The velocity of each particle is represented as a positive integer, bounded by a maximum velocity Vmax. It expresses how many of the particle's features should be changed to become the same as the global best point, in other words, the velocity of the particle moving toward the best position. The number of differing features (bits) between two particles corresponds to the difference between their positions. For example, let Pg = [1 0 1 1 1 0 1 0 0 1] and Xi = [0 1 0 0 1 1 0 1 0 1]. The difference between Pg and the particle's current position is Pg − Xi = [1 −1 1 1 0 −1 1 −1 0 0]. A value of 1 indicates that, compared with the best position, this feature should be selected but is not, which will decrease classification quality and lead to a lower fitness value; assume the number of 1's is M. A value of −1 indicates that, compared with the best position, this bit should not be selected but is selected; such redundant features make the subset longer and also lead to a lower fitness value. The number of −1's is N. We use the value of (M+N) to express the distance between the two positions; (M+N) may be positive or negative. Such variation gives particles an exploration ability within the solution space. In this example, (M+N) = 4 + 3 = 7, so Pg − Xi = 7.
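
The worked example above can be reproduced with a few lines of Python; the helper name and the plain-list encoding are illustrative assumptions.

```python
def bit_distance(p_g, x_i):
    """Velocity toward the global best as defined in Section 3.1:
    M counts positions where Pg - Xi == 1 (should be selected but is not),
    N counts positions where Pg - Xi == -1 (selected but should not be)."""
    diff = [g - x for g, x in zip(p_g, x_i)]
    m = diff.count(1)
    n = diff.count(-1)
    return m + n

p_g = [1, 0, 1, 1, 1, 0, 1, 0, 0, 1]
x_i = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
print(bit_distance(p_g, x_i))   # 7, matching the example (M = 4, N = 3)
```
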

3.2 Updating position

After the velocity has been updated, a particle's position is updated using the new velocity. Suppose that the new velocity is V; in this case, V bits of the particle are randomly changed, rather than simply copying Pg. The particles thus fly toward the global best while still exploring the search area, instead of becoming identical to Pg.

Vmax is used as a constraint to control the global exploration ability of particles. A larger Vmax provides global exploration, while a smaller Vmax increases local exploitation. When Vmax is low, particles have difficulty escaping from locally optimal regions; if Vmax is too high, the swarm might fly past good solutions [16]. We set Vmax = (1/2)N and limit the velocity to the range [1, (1/2)N], which prevents an overly large velocity. A particle can be close to an optimal solution, but a high velocity may make it move far away; by limiting Vmax, particles cannot move too far away from the optimal solution.

3.3 Fitness function

The fitness function is defined in equation (3):

Fitness(Xi) = ϕ·γ(Si(t)) + φ·(n − |Si(t)|)    (3)

where Si(t) is the feature subset found by particle i at iteration t, |Si(t)| is its length, and n is the total number of features. The fitness combines the classifier performance measure, γ(Si(t)), and the feature subset length. ϕ and φ are two parameters that control the relative weight of classifier performance and feature subset length, with ϕ ∈ [0, 1] and φ = 1 − ϕ. This formula reflects that classifier performance and feature subset length have different effects on FS. This paper considers classifier performance to be more important than subset length, so we set ϕ = 0.8 and φ = 0.2.
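
A direct reading of equation (3) is sketched below. The division of the length term by n, which keeps both terms on a comparable [0, 1] scale, is an added assumption; the paper only states the form ϕ·γ + φ·(n − |S|).

```python
def fitness(subset_mask, accuracy, phi=0.8):
    """Equation (3): weighted sum of classifier performance and subset brevity.

    subset_mask : binary list, 1 = feature selected (the particle position)
    accuracy    : classifier performance gamma(Si(t)) measured on the subset
    phi         : weight of classifier performance; 1 - phi weighs the length term
    """
    n = len(subset_mask)
    selected = sum(subset_mask)
    # (n - selected) rewards shorter subsets; dividing by n is an assumption.
    return phi * accuracy + (1.0 - phi) * (n - selected) / n
```
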
3.4 Solution construction

The overall process of PSO for feature selection can be seen in Figure 1. The process begins by generating a number of particles, which are then placed randomly in the search space, i.e., each particle starts at one random position. Alternatively, the number of particles placed in the search space may be set equal to the number of features in the data. Each particle starts finding a solution by a movement; from these initial positions, the particles fly through candidate solutions in each iteration. The selected subsets are collected and then evaluated. If a best subset has been discovered, or the algorithm has run a certain number of times, the process stops and returns the best feature subset encountered. If none of these conditions is met, the velocities are updated, the particles' positions are recalculated and the process iterates once more.
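
Putting Sections 3.1-3.4 together, the overall loop of Figure 1 can be summarized as follows. This is a deliberately simplified sketch: it moves particles only toward the global best (the personal-best and inertia terms of equation (1) are omitted), uses a deterministic toy evaluator in place of the nearest neighbor classifier, and interprets the position update as setting randomly chosen differing bits to the global best's values. All names and defaults are assumptions.

```python
import random

def run_binary_pso(n_features, n_particles=50, n_iters=100, phi=0.8, evaluate=None):
    """Minimal binary-PSO feature selection loop following Figure 1."""
    if evaluate is None:
        # Toy stand-in for gamma(S): pretend the first half of the features
        # are the informative ones. Replace with a real classifier score.
        evaluate = lambda mask: sum(mask[:n_features // 2]) / max(1, n_features // 2)

    v_max = max(1, n_features // 2)                      # Vmax = N/2 (Section 3.2)
    swarm = [[random.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_particles)]                # random initial positions

    def fitness(mask):                                   # equation (3), normalized
        return phi * evaluate(mask) + (1 - phi) * (n_features - sum(mask)) / n_features

    g_best = max(swarm, key=fitness)[:]
    for _ in range(n_iters):
        for x in swarm:
            # Velocity: number of bits differing from the global best (Section 3.1).
            diff = [k for k in range(n_features) if x[k] != g_best[k]]
            v = min(max(1, len(diff)), v_max)
            # Position update: set v randomly chosen differing bits to Pg (Section 3.2).
            for k in random.sample(diff, min(v, len(diff))):
                x[k] = g_best[k]
        candidate = max(swarm, key=fitness)
        if fitness(candidate) > fitness(g_best):         # keep the best subset so far
            g_best = candidate[:]
    return g_best

# Toy run: 20 candidate features, small swarm, few iterations.
print(run_binary_pso(n_features=20, n_particles=10, n_iters=20))
```
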
4 Experimental Results

Table 1. Number of Training/Test Documents

Category Name   Number of Train Documents   Number of Test Documents
Acquisition     1484                        664
Corn            170                         53
Crude           288                         126
Earn            2721                        1052
Grain           72                          32
Interest        165                         74
Money-fx        313                         106
Ship            122                         42
Trade           297                         99
Wheat           153                         51

A series of experiments was conducted to show the utility of the proposed FS algorithm. All experiments were executed on a machine with an Intel(R) Core(TM) i7 3.2 GHz CPU and 4 GB of RAM; the operating system was Windows 7 Professional. We implemented the proposed PSO algorithm and the other three FS algorithms in Matlab. We used Reuters-21578, the newer version of the corpus [17]. From the Reuters-21578 data set we selected the top ten classes, giving 5785 documents in the training set and 2299 documents in the test set. The class distribution is unbalanced.

The maximum class has 2721 documents, occupying 47.04% of the training set; the minimum class has 72 documents, occupying 1.24% of the training set. Table 1 presents the ten most frequent categories.

4.1 Feature Extraction

Text documents cannot be directly interpreted by a classifier, so an indexing procedure that maps a text document into a compact representation of its content needs to be applied uniformly to all documents. A text dj is therefore usually represented as a vector of term weights, where each position in the input feature vector corresponds to a given word or phrase. This representation is often called the bag-of-words model. Weights are determined using the normalized tfidf function [18], defined as:

wkj = tfidf(tk, dj) / sqrt( Σ s=1..|T| tfidf(ts, dj)² )    (4)

where T is the set of terms (features) that occur at least once in at least one document of the training set, and 0 ≤ wkj ≤ 1 represents how much term tk contributes to the semantics of document dj. The tfidf function itself is defined as:

tfidf(tk, dj) = #(tk, dj) · log( |Tr| / #Tr(tk) )    (5)

where #(tk, dj) is the number of occurrences of tk in dj, Tr is the training set, |Tr| is its size, and #Tr(tk) denotes the number of documents in Tr in which tk occurs.

4.2 Performance Measure

Typically, precision (π) and recall (ρ) are used for measurement; they are shown in equations (6) and (7):

πi = TPi / (TPi + FPi)    (6)

ρi = TPi / (TPi + FNi)    (7)

where TPi is the number of test documents correctly classified under the i-th category (ci), FPi is the number of test documents incorrectly classified under ci, and FNi is the number of test documents incorrectly classified under other categories. These quantities can be estimated from the contingency table for ci on a given test set. Another commonly used measure in TC is the F1 measure, defined in equation (8) [19], [20]:

F1 = 2·π·ρ / (π + ρ)    (8)

Table 2. The Global Contingency Table

Category set C = {c1, ..., c|C|}        Expert judgments: Yes        Expert judgments: No
Classifier judgments: Yes               TP = Σ i=1..|C| TPi          FP = Σ i=1..|C| FPi
Classifier judgments: No                FN = Σ i=1..|C| FNi          TN = Σ i=1..|C| TNi

When facing multiple classes, there are two possible ways of averaging the above measures, namely the macro average and the micro average. The macro average weights all classes equally, regardless of how many documents belong to each; the micro average weights all documents equally, thus favoring the performance on common classes. The global contingency table shown in Table 2 is obtained by summing the category-specific contingency tables; equations (9) to (12) give the micro-averaged and macro-averaged precision and recall:

πµ = Σ i=1..|C| TPi / Σ i=1..|C| (TPi + FPi)    (9)

ρµ = Σ i=1..|C| TPi / Σ i=1..|C| (TPi + FNi)    (10)

πM = ( Σ i=1..|C| πi ) / |C|    (11)

ρM = ( Σ i=1..|C| ρi ) / |C|    (12)

where µ denotes micro averaging and M denotes macro averaging.
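
The averaged measures in equations (8)-(12) are straightforward to compute from the per-category contingency counts; a small sketch follows. The tuple-based input format and the zero-division guards are implementation choices, not part of the paper.

```python
def f1(p, r):
    """Equation (8)."""
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro(categories):
    """Micro- and macro-averaged precision/recall, equations (9)-(12).

    categories : list of (TPi, FPi, FNi) tuples, one per category ci.
    Returns ((micro_precision, micro_recall), (macro_precision, macro_recall)).
    """
    tp = sum(c[0] for c in categories)
    fp = sum(c[1] for c in categories)
    fn = sum(c[2] for c in categories)
    micro_p = tp / (tp + fp) if tp + fp else 0.0                    # eq. (9)
    micro_r = tp / (tp + fn) if tp + fn else 0.0                    # eq. (10)
    macro_p = sum(c[0] / (c[0] + c[1]) if c[0] + c[1] else 0.0
                  for c in categories) / len(categories)           # eq. (11)
    macro_r = sum(c[0] / (c[0] + c[2]) if c[0] + c[2] else 0.0
                  for c in categories) / len(categories)           # eq. (12)
    return (micro_p, micro_r), (macro_p, macro_r)

(mp, mr), (Mp, Mr) = micro_macro([(50, 5, 10), (8, 2, 4)])
print(f1(mp, mr), f1(Mp, Mr))
```
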
4.3 Results

To show the utility of the proposed PSO-based algorithm, we compare it with the genetic algorithm, information gain and CHI. Various values were tested for the parameters of the proposed algorithm; the highest performance was achieved with the following settings: a population size of 50, a maximum of 100 iterations, c1 = c2 = 1, and w in the range [0.4, 1.4]. These values were determined empirically in our preliminary experiments, but we make no claim that they are optimal; parameter optimization is a topic for future research.
236 Mehdi Hosseinzadeh Aghdam, Setareh Heidari

make no claim that these are optimal values. Pa- performs genetic algorithm, information gain and
rameter optimization is a topic for future research. CHI.
Analyzing the precision and recall shown in Ta- Table 4. Micro-F1 and Macro-F1 of Four
ble 3, on average, the PSO-based algorithm ob- Algorithms
tained a higher accuracy value than the genetic al-
gorithm, information gain and CHI. To graphically Feature Selection Macro-F1 Micro-F1
illustrate the progress of the PSO as it searches for Algorithms
optimal solutions, we take percent features as the IG 69.8124 80.9482
horizontal coordinate and the F1 measure as the CHI 70.8601 82.2097
vertical coordinate. This should illustrate the pro- GA 76.2685 86.3854
cess of improvement of the best particle as the num- PSO 78.8564 89.5684
ber of features increase.
Table 4 describes micro-F1 and macro-F1 for
four FS algorithms. From this Table, the best cat-
egorization performance is achieved with GA and
PSO. Compared with GA, PSO is quicker in lo-
cating the optimal solution. In general, it can find
the solution within tens of iterations. If exhaustive
search is used to find the optimal feature subset in
the Reuters-21578 data set, there will be tens of bil-
lions of candidate subsets, which is impossible to
execute. But with PSO, at the 100nd iteration the
solution is found.

5 Conclusion
Figure 2. Comparison of micro-F1 of four
algorithms Exhaustive searches are impossible for even
medium sized data sets. Thus, stochastic methods
provide a promising FS mechanism. This paper
proposes a FS technique based on particle swarm
optimization. We compare its performance with
other FS methods in TC. PSO has the ability to con-
verge quickly; it has a strong search capability on
the problem space and can efficiently find minimal
feature subset.
In the proposed algorithm, the classifier perfor-
mance and the length of selected feature subset are
adopted as heuristic information. So, we can se-
lect the best feature subset without any prior knowl-
edge of features. To show the utility of the pro-
posed algorithm and to compare it with informa-
Figure 3. Comparison of macro-F1 of four
tion gain and CHI, a set of experiments were car-
algorithms
ried out on Reuters-21578 data set. The computa-
Figures 2 and 3 show the micro-averaged and tional results indicate that proposed algorithm out-
macro-averaged F1 measure for each of the FS al- performs information gain and CHI methods since
gorithms as we change the number of selected fea- it achieved better performance with the lower num-
tures. The results display that as the percentage ber of features. To show the effectiveness of the
of selected features exceeds 12% in micro-F1 and proposed algorithm, we have used a simple clas-
macro-F1 measures, the PSO-based algorithm out- sifier (nearest neighbor classifier) which can affect

Table 3. The Performance (Precision and Recall) of Information Gain, CHI, GA and PSO on the Reuters-21578 Data set

Category Name   IG Precision   IG Recall   CHI Precision   CHI Recall   GA Precision   GA Recall   PSO Precision   PSO Recall
Acquisition     91.7466        71.988      89.5575         76.2048      92.4528        81.1747     91.5264         90.5648
Corn            87.8049        67.9245     87.8049         67.9245      90.6977        73.5849     91.6321         74.2145
Crude           84.9057        71.4286     83.0189         69.8413      78.3217        88.8889     81.6942         88.8889
Earn            84.3803        94.4867     84.9829         94.6768      90.1961        96.1977     97.5612         94.3328
Grain           63.6364        65.625      60.6061         62.5         69.697         71.875      67.264          74.3568
Interest        45.7627        36.4865     54              36.4865      60             36.4865     50.6841         52.6428
Money-fx        51.7241        56.6038     61.2245         56.6038      67.8899        69.8113     68.4863         70.8113
Ship            33.8235        54.7419     37.5            57.1429      57.1429        66.6667     60.3521         75.3621
Trade           68.7023        90.9091     73.9837         91.9192      72.3577        89.899      78.6325         91.8751
Wheat           91.3043        82.3529     89.3617         82.3529      87.7551        84.3137     86.7864         85.3621
Average         70.3791        69.2547     72.204          69.5653      76.6511        75.8898     77.4619         79.8411

5 Conclusion

Exhaustive searches are impossible for even medium-sized data sets; thus, stochastic methods provide a promising FS mechanism. This paper proposes an FS technique based on particle swarm optimization and compares its performance with other FS methods in TC. PSO has the ability to converge quickly; it has a strong search capability in the problem space and can efficiently find a minimal feature subset.

In the proposed algorithm, the classifier performance and the length of the selected feature subset are adopted as heuristic information, so the best feature subset can be selected without any prior knowledge of the features. To show the utility of the proposed algorithm and to compare it with information gain and CHI, a set of experiments was carried out on the Reuters-21578 data set. The computational results indicate that the proposed algorithm outperforms the information gain and CHI methods, since it achieves better performance with a lower number of features. To show the effectiveness of the proposed algorithm, we have used a simple classifier (a nearest neighbor classifier), which can affect the categorization performance. As for future work, our intention is to apply the proposed FS algorithm with more sophisticated classifiers to improve its performance and to combine the proposed method with other population-based algorithms.

References

[1] R. Jensen, Combining rough and fuzzy sets for feature selection, Ph.D. dissertation, School of Informatics, University of Edinburgh, 2005.

[2] W. Shang, H. Huang, H. Zhu, Y. Lin, Y. Qu, and Z. Wang, A novel feature selection algorithm for text categorization, Expert Systems with Applications, vol. 33(1), pp. 1-5, 2007.

[3] Y. Yang and J.O. Pedersen, A Comparative Study on Feature Selection in Text Categorization, Proceedings of the 14th International Conference on Machine Learning, pp. 412-420, 1997.

[4] H. Kim, P. Howland, and H. Park, Dimension Reduction in Text Classification with Support Vector Machines, Journal of Machine Learning Research, vol. 6, pp. 37-53, 2005.

[5] G. Forman, Feature Selection for Text Classification, in: Computational Methods of Feature Selection, Chapman and Hall/CRC Press, 2007.

[6] M. Raymer, W. Punch, E. Goodman, L. Kuhn, and A.K. Jain, Dimensionality Reduction Using Genetic Algorithms, IEEE Transactions on Evolutionary Computation, vol. 4, pp. 164-171, 2000.

[7] M.H. Aghdam, N. Ghasem-Aghaee, and M.E. Basiri, Application of ant colony optimization for feature selection in text categorization, Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2872-2878, 2008.

[8] M. Srinivas and L.M. Patnaik, Genetic Algorithms: A Survey, IEEE Computer Society Press, Los Alamitos, 1994.

[9] W. Siedlecki and J. Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recognition Letters, vol. 10(5), pp. 335-347, 1989.

[10] M.F. Caropreso and S. Matwin, Beyond the Bag of Words: A Text Representation for Sentence Selection, Berlin: Springer-Verlag, pp. 324-335, 2006.

[11] R. Kohavi and G.H. John, Wrappers for feature subset selection, Artificial Intelligence, vol. 97(1-2), pp. 273-324, 1997.

[12] M. Dash and H. Liu, Feature selection for classification, Intelligent Data Analysis: An International Journal, vol. 1(3), pp. 131-156, 1997.

[13] H. Liu and L. Yu, Toward Integrating Feature Selection Algorithms for Classification and Clustering, IEEE Transactions on Knowledge and Data Engineering, vol. 17(4), pp. 491-502, 2005.

[14] D. Mladenić, Feature Selection for Dimensionality Reduction, in: Subspace, Latent Structure and Feature Selection: Statistical and Optimization Perspectives Workshop, SLSFS 2005, Bohinj, Slovenia, Lecture Notes in Computer Science 3940, Springer, pp. 84-102, 2006.

[15] J. Kennedy and R.C. Eberhart, Particle swarm optimization, Proceedings of the IEEE International Conference on Neural Networks, pp. 1942-1948, 1995.

[16] A.P. Engelbrecht, Fundamentals of Computational Swarm Intelligence, John Wiley & Sons, London, 2005.

[17] The Reuters-21578 text categorization test collection. Available: http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html

[18] G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Technical Report TR87-881, Cornell University, Ithaca, NY, USA, 1987.

[19] C.J. van Rijsbergen, Information Retrieval, 2nd ed., Butterworths, London, UK, 1979.

[20] M.H. Aghdam, J. Tanha, A.R. Naghsh-Nilchi, and M.E. Basiri, Combination of Ant Colony Optimization and Bayesian Classification for Feature Selection in a Bioinformatics Dataset, Journal of Computer Science & Systems Biology, vol. 2, pp. 186-199, 2009.

Mehdi Hosseinzadeh Aghdam is a Ph.D. candidate in the computer engineering department at the Iran University of Science and Technology; he is also a lecturer at Payame Noor University. He graduated with a master's degree in computer engineering (Artificial Intelligence) from the University of Isfahan (UI) in 2008. At UI, he worked on a swarm intelligence-based method for feature selection. His main research interests are: Data Mining, Computational Intelligence, and Pattern Recognition.

Setareh Heidari graduated with a master's degree in computer networks information technology from the Iran University of Science and Technology. At IUST, she worked on formal verification of security protocols. Her main research interests are: Swarm Intelligence, Network Security, Wireless Networks and Sensor Networks.
