Abstract
Online news platforms have attracted a massive number of users who read digital news online. Demographic information about these users, such as gender, is critical for these platforms to provide personalized services such as news recommendation and targeted advertising. However, the gender of many users on online news platforms is not available. Fortunately, male and female users usually have different patterns in reading online news, so users' news browsing data can provide useful clues for inferring their gender. In this paper, we propose a neural gender prediction approach based on the news browsing data of users. A news article usually contains different kinds of information, such as title, body and categories. The characteristics of these components differ considerably, so they should be processed differently. Thus, we propose to learn unified user representations for gender prediction by incorporating the different components of browsed news as different views of users. In each view, we use a hierarchical framework that first learns news representations and then learns user representations from the news representations. In addition, since different words in news titles and bodies usually differ in how informative they are for learning news representations, we use attention mechanisms to select important words. Similarly, since different news articles may differ in how informative they are for gender prediction, we use news-level attention to attend to important news articles when learning informative user representations. Extensive experiments on a real-world dataset validate the effectiveness of our approach.
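The hierarchy described in the abstract (words to news, news to user, one such stack per view) can be illustrated with a small sketch. This is not the authors' implementation: the additive attention form, the concatenation of views, the dimensions, the random parameters, and every name below (`attention_pool`, `user_from_view`, and so on) are assumptions chosen for illustration, and only the title and body views are shown.

```python
import math
import random


def attention_pool(vectors, W, q):
    """Additive attention (assumed form): score each vector, softmax, weighted sum."""
    scores = []
    for v in vectors:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, v))) for row in W]
        scores.append(sum(q_i * h_i for q_i, h_i in zip(q, hidden)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(a * v[k] for a, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


rng = random.Random(0)
DIM, ATT = 6, 4  # hypothetical embedding and attention sizes


def random_vectors(n, dim=DIM):
    """Stand-in for learned word embeddings (random toy data)."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


def random_params():
    """Hypothetical attention parameters: projection W (ATT x DIM) and query q (ATT)."""
    return random_vectors(ATT), [rng.gauss(0, 1) for _ in range(ATT)]


def user_from_view(news_items, word_params, news_params):
    """One view: word-level attention per news item, then news-level attention."""
    news_vectors = [attention_pool(words, *word_params)[0] for words in news_items]
    return attention_pool(news_vectors, *news_params)


# A toy user who browsed three news articles, seen through two views.
titles = [random_vectors(n) for n in (5, 7, 4)]
bodies = [random_vectors(n) for n in (12, 9, 15)]

title_user, title_weights = user_from_view(titles, random_params(), random_params())
body_user, body_weights = user_from_view(bodies, random_params(), random_params())

# Unify the views (assumption: simple concatenation) and apply a logistic output.
user_vector = title_user + body_user
out_weights = [rng.gauss(0, 1) for _ in user_vector]
logit = sum(w * x for w, x in zip(out_weights, user_vector))
prob = 1.0 / (1.0 + math.exp(-logit))

print("news-level attention (title view):", [round(a, 3) for a in title_weights])
print("predicted probability for one class:", round(prob, 3))
```

The news-level weights sum to one, so they can be read as how much each browsed article contributes to the user representation in that view. In a trained model the parameters would be learned jointly rather than sampled at random.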
Acknowledgments
The authors would like to thank Microsoft News for providing technical support and data in the experiments, and Jiun-Hung Chen (Microsoft News) and Ying Qiao (Microsoft News) for their support and discussions. This work was supported by the National Key Research and Development Program of China under Grant number 2018YFC1604002, the National Natural Science Foundation of China under Grant numbers U1836204, U1705261, U1636113, U1536201, and U1536207, and the Tsinghua University Initiative Scientific Research Program.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, C., Wu, F., Qi, T., Huang, Y., Xie, X. (2019). Neural Gender Prediction from News Browsing Data. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_53
DOI: https://doi.org/10.1007/978-3-030-32381-3_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3
eBook Packages: Computer Science (R0)