Abstract
In this paper we investigate a special case of the classification problem, called multi-label learning, in which each instance (or object) is associated with a set of target labels (or simple decisions). Multi-label classification is one of the most important issues in semantic indexing and text categorization systems. Most multi-label classification methods are based on a combination of binary classifiers that are trained separately for each label. In this paper we concentrate on the application of ensemble techniques to the multi-label classification problem. We present the most recent ensemble methods for both the binary classifier training phase and the combination learning phase. The proposed methods have been implemented within the SONCA system, which is part of the SYNAT project. We present experimental results obtained on the PubMed Central database of biomedical articles.
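The per-label decomposition described above is commonly called binary relevance: the multi-label problem is split into one binary problem per label, and the per-label decisions are then combined into a label set. The sketch below is a minimal, dependency-free illustration of that idea with a simple ensemble in each binary phase. Everything in it is an assumption made for illustration, not the method implemented in SONCA: the hypothetical helpers `knn_trainer` and `centroid_trainer` stand in for real base classifiers, the hypothetical class `BinaryRelevanceEnsemble` combines them by plain score averaging, and the 0.5 threshold is arbitrary.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Scorer = Callable[[Vector], float]                      # maps an instance to a score in [0, 1]
Trainer = Callable[[List[Vector], List[int]], Scorer]   # fits on binary targets, returns a scorer


def knn_trainer(k: int = 3) -> Trainer:
    """Hypothetical base learner: fraction of positives among the k nearest training points."""
    def train(X: List[Vector], y: List[int]) -> Scorer:
        def score(x: Vector) -> float:
            nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return score
    return train


def centroid_trainer(X: List[Vector], y: List[int]) -> Scorer:
    """Hypothetical base learner: soft nearest-centroid score (near 1 = close to positive centroid)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos:                 # label never occurs: always score 0
        return lambda x: 0.0
    if not neg:                 # label always occurs: always score 1
        return lambda x: 1.0
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_pos, c_neg = mean(pos), mean(neg)
    def score(x: Vector) -> float:
        d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total
    return score


class BinaryRelevanceEnsemble:
    """Hypothetical: one ensemble of binary scorers per label, averaged and then thresholded."""

    def __init__(self, trainers: List[Trainer], threshold: float = 0.5):
        self.trainers = trainers
        self.threshold = threshold
        self.models: Dict[str, List[Scorer]] = {}

    def fit(self, X: List[Vector], Y: List[Set[str]]) -> "BinaryRelevanceEnsemble":
        for label in sorted(set().union(*Y)):
            # Binary relevance: relabel every instance as "has this label" (1) or not (0).
            y = [1 if label in labels else 0 for labels in Y]
            # Training phase: fit every base learner separately for this label.
            self.models[label] = [train(X, y) for train in self.trainers]
        return self

    def scores(self, x: Vector) -> Dict[str, float]:
        # Combination phase (assumed here): unweighted mean of the base scores per label.
        return {label: sum(s(x) for s in scorers) / len(scorers)
                for label, scorers in self.models.items()}

    def predict(self, x: Vector) -> Set[str]:
        return {label for label, s in self.scores(x).items() if s >= self.threshold}


# Toy data: 2-D feature vectors, each tagged with a set of labels.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [0.0, 5.0], [0.0, 6.0]]
Y = [{"A"}, {"A"}, {"B"}, {"B"}, {"A", "B"}, {"A", "B"}]
model = BinaryRelevanceEnsemble([knn_trainer(k=3), centroid_trainer]).fit(X, Y)
print(model.predict([0.0, 0.5]), model.predict([5.0, 5.5]), model.predict([0.0, 5.5]))
```

On this toy data the three queries should receive the label sets {A}, {B} and {A, B} respectively. Averaging is only the simplest combination rule; a learned combination (for example, weights fitted on held-out data) would replace the `scores` method, while the per-label decomposition in `fit` stays the same.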
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kurach, K., Pawłowski, K., Romaszko, Ł., Tatjewski, M., Janusz, A., Nguyen, H.S. (2013). Multi-label Classification of Biomedical Articles. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_15
DOI: https://doi.org/10.1007/978-3-642-35647-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)