Abstract
Sentiment analysis is one of the most important tasks in text mining. It has a high impact on governments and private companies, which use it to support major decision-making. Even though Genetic Programming (GP) has been widely used to solve real-world problems, it is seldom applied to this popular problem. This contribution begins to address this research gap by proposing a novel GP system, namely Root Genetic Programming, and by extending our previous genetic operators based on projections in the phenotype space. The results show that these systems can tackle this problem while remaining competitive with other state-of-the-art classifiers, and they also give insight into approaching large-scale problems represented in high-dimensional spaces.
Notes
- 1.
- 2.
- 3. \(x_i\) could be an input vector or a scalar.
- 4. The K-nearest neighbor classifier was tested with K varying from 10 to 100; \(K=30\) gave the best result.
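The sweep over K described in note 4 can be illustrated with a minimal sketch. It is not the authors' experimental code: the step size of 10, the Euclidean distance, the accuracy score, and the synthetic two-class data (`make_toy_data`) are all assumptions made only for illustration, and the helper names are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    # `train` is a list of (features, label) pairs; squared Euclidean
    # distance is enough because only the ranking of neighbors matters.
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of validation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def sweep_k(train, valid, ks):
    """Score every candidate k and return (best_k, scores)."""
    scores = {k: accuracy(train, valid, k) for k in ks}
    # On ties, max() keeps the first (smallest) k encountered.
    best_k = max(scores, key=scores.get)
    return best_k, scores


def make_toy_data(n, rng):
    """Hypothetical two-class Gaussian blobs, used only as stand-in data."""
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 2.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


rng = random.Random(0)
train = make_toy_data(300, rng)
valid = make_toy_data(100, rng)

# Assumed grid: K = 10, 20, ..., 100 (the note does not state the step).
best_k, scores = sweep_k(train, valid, range(10, 101, 10))
print(best_k, scores[best_k])
```

The value selected on this toy data says nothing about the chapter's result; the sketch only shows the selection procedure of scoring each candidate K on held-out data and keeping the best one.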
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Graff, M., Tellez, E.S., Jair Escalante, H., Miranda-Jiménez, S. (2017). Semantic Genetic Programming for Sentiment Analysis. In: Schütze, O., Trujillo, L., Legrand, P., Maldonado, Y. (eds) NEO 2015. Studies in Computational Intelligence, vol 663. Springer, Cham. https://doi.org/10.1007/978-3-319-44003-3_2
DOI: https://doi.org/10.1007/978-3-319-44003-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44002-6
Online ISBN: 978-3-319-44003-3
eBook Packages: Engineering, Engineering (R0)