Abstract

Feature selection is a key step in classification systems: a procedure that selects a subset of the original features. It is one of the major challenges in text categorization, because the high dimensionality of the feature space increases the complexity of the text categorization process. This paper presents a novel feature selection method based on particle swarm optimization to improve the performance of text categorization. Particle swarm optimization is inspired by the social behavior of fish schooling and bird flocking. The complexity of the proposed method is very low due to the use of a simple classifier. The performance of the proposed method is compared with that of other methods on the Reuters-21578 data set. Experimental results demonstrate the superiority of the proposed method.
$$V_i(t+1) = w\,V_i(t) + c_1\,r_1(t)\,[P_i(t) - X_i(t)] + c_2\,r_2(t)\,[P_g(t) - X_i(t)] \qquad (1)$$
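To make the update concrete, the sketch below applies Eq. (1) to a swarm of binary feature-mask particles, where V_i is the velocity, X_i the position, P_i the personal best, P_g the global best, w the inertia weight, c_1 and c_2 the acceleration coefficients, and r_1, r_2 uniform random numbers. It is a minimal illustration only; the swarm size, parameter values and the sigmoid-based bit update are assumptions, not the settings used in the paper.

import numpy as np

# Minimal sketch of Eq. (1) for a binary feature-selection PSO.
# Swarm size, w, c1, c2 and the sigmoid bit update are illustrative assumptions.
rng = np.random.default_rng(0)

n_particles, n_features = 20, 1000
w, c1, c2 = 0.7, 1.5, 1.5                        # inertia weight and acceleration coefficients (assumed)

X = rng.integers(0, 2, size=(n_particles, n_features)).astype(float)  # positions (feature masks)
V = np.zeros((n_particles, n_features))                               # velocities
P = X.copy()                                                          # personal best positions
Pg = X[0].copy()                                                      # global best position

def step(X, V, P, Pg):
    """One velocity update per Eq. (1), followed by a sigmoid-based position update."""
    r1 = rng.random(X.shape)
    r2 = rng.random(X.shape)
    V = w * V + c1 * r1 * (P - X) + c2 * r2 * (Pg - X)
    prob = 1.0 / (1.0 + np.exp(-V))              # map velocity to probability of selecting a feature
    X = (rng.random(X.shape) < prob).astype(float)
    return X, V

X, V = step(X, V, P, Pg)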
We make no claim that these are optimal values; parameter optimization is a topic for future research.

Analyzing the precision and recall shown in Table 3, on average the PSO-based algorithm obtained higher accuracy values than the genetic algorithm, information gain and CHI. To graphically illustrate the progress of the PSO as it searches for optimal solutions, we take the percentage of selected features as the horizontal coordinate and the F1 measure as the vertical coordinate. This illustrates the improvement of the best particle as the number of features increases.

Figure 2. Comparison of micro-F1 of four algorithms

Figure 3. Comparison of macro-F1 of four algorithms

Figures 2 and 3 show the micro-averaged and macro-averaged F1 measures for each of the FS algorithms as the number of selected features changes. The results show that once the percentage of selected features exceeds 12%, the PSO-based algorithm outperforms the genetic algorithm, information gain and CHI on both measures.

Table 4. Micro-F1 and Macro-F1 of four algorithms

Feature Selection Algorithm    Macro-F1    Micro-F1
IG                             69.8124     80.9482
CHI                            70.8601     82.2097
GA                             76.2685     86.3854
PSO                            78.8564     89.5684
Table 4 reports the micro-F1 and macro-F1 of the four FS algorithms. From this table, the best categorization performance is achieved with GA and PSO. Compared with GA, PSO is quicker in locating the optimal solution; in general, it finds the solution within tens of iterations. If exhaustive search were used to find the optimal feature subset in the Reuters-21578 data set, there would be tens of billions of candidate subsets, which is impossible to evaluate. With PSO, the solution is found by about the 100th iteration.
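For reference, macro-F1 averages the per-category F1 scores, while micro-F1 pools the per-category counts before computing a single F1. The small sketch below, with made-up counts (the category names come from Reuters-21578, the numbers do not), illustrates the difference between the two table columns.

# Sketch: macro- vs. micro-averaged F1 from per-category confusion counts.
# The counts below are invented for illustration only.
counts = {                     # category -> (true positives, false positives, false negatives)
    "earn":     (120, 10, 15),
    "acq":      (80,  20, 25),
    "money-fx": (10,   5, 12),
}

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)   # each category weighted equally
tp, fp, fn = (sum(col) for col in zip(*counts.values()))        # pool counts over all categories
micro_f1 = f1(tp, fp, fn)

print(f"macro-F1 = {macro_f1:.4f}  micro-F1 = {micro_f1:.4f}")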
5 Conclusion
Exhaustive searches are impossible for even medium-sized data sets; thus, stochastic methods provide a promising FS mechanism. This paper proposes an FS technique based on particle swarm optimization and compares its performance with that of other FS methods in TC. PSO has the ability to converge quickly; it has a strong search capability in the problem space and can efficiently find a minimal feature subset.
In the proposed algorithm, the classifier performance and the length of the selected feature subset are adopted as heuristic information, so the best feature subset can be selected without any prior knowledge of the features. To show the utility of the proposed algorithm and to compare it with information gain and CHI, a set of experiments was carried out on the Reuters-21578 data set. The computational results indicate that the proposed algorithm outperforms the information gain and CHI methods, since it achieves better performance with a smaller number of features. To show the effectiveness of the proposed algorithm, we used a simple classifier (the nearest neighbor classifier), which can affect the categorization performance.
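As a hedged illustration of using the classifier performance and the subset length as heuristic information, the fitness sketch below scores a candidate feature mask with a 1-nearest-neighbor classifier and penalizes larger subsets; the weighting alpha and the cross-validation setup are assumptions, not necessarily the exact fitness used in the paper.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, alpha=0.9):
    """Score a binary feature mask: reward 1-NN accuracy, penalize subset length.
    The weighting alpha is an assumed value, not the paper's setting."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0                                    # an empty subset cannot classify anything
    clf = KNeighborsClassifier(n_neighbors=1)         # simple nearest neighbor classifier
    acc = cross_val_score(clf, X[:, selected], y, cv=3).mean()
    kept = selected.size / X.shape[1]                 # fraction of features retained
    return alpha * acc + (1.0 - alpha) * (1.0 - kept)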
As for future work, the intention is to apply the proposed FS algorithm with more sophisticated classifiers to improve its performance, and to combine the proposed method with other population-based algorithms.