Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines

  • Conference paper
Applications of Evolutionary Computing (EvoWorkshops 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)


Infection by the human papillomavirus (HPV) is associated with the development of cervical cancer. HPV types can be classified as high- or low-risk according to their malignant potential, and detecting the risk type is important both for understanding the underlying mechanisms and for diagnosing potential patients. In this paper, we classify HPV protein sequences with support vector machines. A string kernel is introduced to discriminate HPV protein sequences; it emphasizes pairs of amino acids separated by a given distance. In the experiments, our approach is compared with previous methods in terms of accuracy and F1-score, and it shows better performance. Prediction results for unknown HPV types are also presented.
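The abstract does not give the kernel's definition, so the following is only a minimal sketch of one way a string kernel over amino acid pairs at a fixed distance could look. The function names, the parameter `d`, and the toy sequences are assumptions made for illustration and are not taken from the paper; the authors' actual kernel may weight or combine distances differently.

```python
from collections import Counter


def distance_pair_counts(seq, d):
    """Count ordered residue pairs (seq[i], seq[i + d]) exactly d positions apart.

    Hypothetical feature map: one dimension per ordered amino acid pair.
    """
    return Counter((seq[i], seq[i + d]) for i in range(len(seq) - d))


def distance_pair_kernel(s, t, d):
    """Inner product of the pair-count vectors of sequences s and t."""
    counts_s = distance_pair_counts(s, d)
    counts_t = distance_pair_counts(t, d)
    # Only pairs present in both sequences contribute to the sum.
    return sum(n * counts_t[pair] for pair, n in counts_s.items())


def gram_matrix(seqs, d):
    """Kernel value for every pair of sequences, as a list of lists."""
    return [[distance_pair_kernel(a, b, d) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Toy strings, not real HPV protein sequences.
    toy = ["ACAC", "CACA", "AAAA"]
    for row in gram_matrix(toy, d=1):
        print(row)
```

Because the kernel is an inner product of explicit count vectors, the resulting Gram matrix is symmetric and positive semidefinite, so it could in principle be passed to any SVM implementation that accepts a precomputed kernel.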



© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, S., Zhang, B.-T. (2006). Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33237-4

  • Online ISBN: 978-3-540-33238-1

  • eBook Packages: Computer Science, Computer Science (R0)
