Multi-label Classification of Biomedical Articles

Kurach, Karol; Pawłowski, Krzysztof; Romaszko, Łukasz; Tatjewski, Marcin; Janusz, Andrzej; Nguyen, Hung Son

doi:10.1007/978-3-642-35647-6_15

Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

In this paper we investigate a special case of the classification problem, called multi-label learning, in which each instance (or object) is associated with a set of target labels (or simple decisions). Multi-label classification is one of the most important issues in semantic indexing and text categorization systems. Most multi-label classification methods are based on combinations of binary classifiers that are trained separately for each label. We concentrate on the application of ensemble techniques to the multi-label classification problem, and present the most recent ensemble methods for both the binary classifier training phase and the combination learning phase. The proposed methods have been implemented within the SONCA system, which is a part of the SYNAT project. We present experimental results obtained on the PubMed Central database of biomedical articles.
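To illustrate the idea of training one binary classifier per label, here is a minimal Python sketch. It is not the method implemented in SONCA: the toy keyword classifier, the function names, and the sample documents and labels are all assumptions made for illustration, and the ensemble and combination learning phases are not covered.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


class KeywordBinaryClassifier:
    """Hypothetical toy binary classifier, not the one used in the chapter.

    It remembers tokens seen in more positive than negative training
    documents and predicts True when a document contains any of them.
    """

    def fit(self, docs, targets):
        pos, neg = Counter(), Counter()
        for doc, is_positive in zip(docs, targets):
            (pos if is_positive else neg).update(set(tokenize(doc)))
        self.positive_tokens = {w for w in pos if pos[w] > neg[w]}
        return self

    def predict(self, doc):
        return any(w in self.positive_tokens for w in tokenize(doc))


def train_per_label(docs, label_sets):
    """Train one independent binary classifier for every label."""
    labels = sorted(set().union(*label_sets))
    return {
        label: KeywordBinaryClassifier().fit(
            docs, [label in assigned for assigned in label_sets]
        )
        for label in labels
    }


def predict_labels(models, doc):
    """Return the set of labels whose binary classifier fires on doc."""
    return {label for label, model in models.items() if model.predict(doc)}


# Made-up training data: each document carries a set of labels.
docs = [
    "gene expression in tumor cells",
    "tumor growth and chemotherapy response",
    "gene regulation in yeast",
    "clinical trial of chemotherapy dosing",
]
label_sets = [{"genetics", "oncology"}, {"oncology"}, {"genetics"}, {"oncology"}]

models = train_per_label(docs, label_sets)
predicted = predict_labels(models, "gene mutations in tumor samples")
print(sorted(predicted))
```

Because every label is handled by its own classifier, a document can receive any number of labels, including none. A realistic system would replace the toy classifier with a proper learner and filter out uninformative tokens such as stop words.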

Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Karol Kurach, Krzysztof Pawłowski, Łukasz Romaszko, Marcin Tatjewski, Andrzej Janusz & Hung Son Nguyen

Corresponding author

Correspondence to Karol Kurach.

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Robert Bembenik
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Lukasz Skonieczny
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Henryk Rybinski
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Marzena Kryszkiewicz
Interdisciplinary Centre for, University of Warsaw, Pawińskiego 5a bl. D, Warsaw, 02-106, Poland
Marek Niezgodka

About this chapter

Cite this chapter

Kurach, K., Pawłowski, K., Romaszko, Ł., Tatjewski, M., Janusz, A., Nguyen, H.S. (2013). Multi-label Classification of Biomedical Articles. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_15

DOI: https://doi.org/10.1007/978-3-642-35647-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)
