Abstract
Infection by the human papillomavirus (HPV) is associated with the development of cervical cancer. HPV can be classified into high- and low-risk types according to its malignant potential, and detecting the risk type is important both for understanding the underlying mechanisms and for diagnosing potential patients. In this paper, we classify HPV protein sequences with support vector machines. A string kernel is introduced to discriminate between HPV protein sequences; the kernel emphasizes pairs of amino acids separated by a given distance. In the experiments, our approach is compared with previous methods in terms of accuracy and F1-score, and it shows better performance. Prediction results for unknown HPV types are also presented.
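The abstract does not give the kernel's exact definition, so the following is only a minimal sketch of one way a kernel over distance-separated amino acid pairs could look. It assumes that the feature map counts ordered pairs of residues exactly `distance` positions apart and that the kernel is the inner product of those counts. The names `gap_pair_features` and `gap_pair_kernel`, and the toy sequences, are hypothetical and not taken from the paper.

```python
from collections import Counter


def gap_pair_features(seq: str, distance: int) -> Counter:
    """Count ordered residue pairs (seq[i], seq[i + distance]).

    Assumption: this feature map is illustrative only; the paper's
    actual kernel may weight or combine distances differently.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    # range() is empty when the sequence is not longer than the distance,
    # so short sequences simply yield no features.
    return Counter(
        (seq[i], seq[i + distance]) for i in range(len(seq) - distance)
    )


def gap_pair_kernel(s: str, t: str, distance: int) -> int:
    """Inner product of the pair-count vectors of two sequences."""
    fs = gap_pair_features(s, distance)
    ft = gap_pair_features(t, distance)
    # Counter returns 0 for pairs that are absent from ft.
    return sum(count * ft[pair] for pair, count in fs.items())


if __name__ == "__main__":
    # Toy sequences, not real HPV protein data.
    a, b = "MHQKRTAM", "MHQKATAM"
    print(gap_pair_kernel(a, b, distance=1))
    print(gap_pair_kernel(a, b, distance=2))
```

Because the value is an inner product of explicit count vectors, a matrix of such values is symmetric and positive semidefinite, so under these assumptions it could be supplied to an SVM implementation that accepts a precomputed kernel matrix.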
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, S., Zhang, BT. (2006). Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_6
DOI: https://doi.org/10.1007/11732242_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33237-4
Online ISBN: 978-3-540-33238-1
eBook Packages: Computer Science (R0)