Abstract
Multi-Label Text Classification (MLTC) aims to assign the most relevant labels to each given text. Existing methods demonstrate that label dependency can help improve model performance. However, introducing label dependency may cause the model to suffer from unwanted prediction bias. In this study, we attribute this bias to the model’s misuse of label dependency, i.e., the model tends to exploit the correlation shortcut in label dependency rather than fusing text information and label dependency for prediction. Motivated by causal inference, we propose a CounterFactual Text Classifier (CFTC) to eliminate the correlation bias and make causality-based predictions. Specifically, our CFTC first adopts a predict-then-modify backbone to extract the precise label information embedded in label dependency, and then blocks the correlation shortcut through a counterfactual de-bias technique guided by the human causal graph. Experimental results on three datasets demonstrate that our CFTC significantly outperforms the baselines and effectively eliminates the correlation bias in the datasets.





Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
In this paper, Label Information (LI) refers to the information extracted based on label dependency.
In this paper, we only require that \(T^*\) not contain the correct text information of T; it is not required to be a semantically counterfactual text.
The Gumbel(0, 1) distribution can be sampled using inverse transform sampling by drawing \(u \sim \text {Uniform}(0,1)\) and computing \(g = -\log (- \log (u))\).
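As an illustration of this note, the following is a minimal NumPy sketch (not the paper's implementation) of inverse-transform sampling for Gumbel(0, 1) noise and, for context, the Gumbel-Softmax relaxation [64] that uses it:

```python
import numpy as np

def sample_gumbel(shape, eps=1e-20, rng=np.random.default_rng()):
    """Draw Gumbel(0, 1) noise via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)  # eps guards against log(0)

def gumbel_softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over Gumbel-perturbed logits (illustrative only)."""
    y = (logits + sample_gumbel(logits.shape)) / temperature
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```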
References
Wang T, Liu L, Liu N et al (2020) A multi-label text classification method via dynamic semantic representation model and deep neural network. Appl Intell 50(8):2339–2351. https://doi.org/10.1007/s10489-020-01680-w
Wang S, Cai J, Lin Q et al (2019) An overview of unsupervised deep feature representation for text categorization. IEEE Trans Comput Soc Syst 6(3):504–517. https://doi.org/10.1109/TCSS.2019.2910599
Wankhade M, Rao ACS, Kulkarni C (2022) A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 55(7):5731–5780. https://doi.org/10.1007/s10462-022-10144-1
Alswaidan N, Menai MEB (2020) A survey of state-of-the-art approaches for emotion recognition in text. Knowl Inf Syst 62(8):2937–2987. https://doi.org/10.1007/s10115-020-01449-0
Boutell MR, Luo J, Shen X et al (2004) Learning multi-label scene classification. Pattern Recognit 37(9):1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009
Chen Z, Ren J (2021) Multi-label text classification with latent word-wise label information. Appl Intell 51(2):966–979. https://doi.org/10.1007/s10489-020-01838-6
Yang P, Sun X, Li W, et al (2018) SGM: sequence generation model for multi-label classification. In: Bender EM, Derczynski L, Isabelle P (eds) Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018. Association for Computational Linguistics, pp 3915–3926. https://aclanthology.org/C18-1330/
Yang P, Luo F, Ma S, et al (2019) A deep reinforced sequence-to-set model for multi-label classification. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, pp 5252–5258. https://doi.org/10.18653/v1/p19-1518
Wang R, Ridley R, Su X et al (2021) A novel reasoning mechanism for multi-label text classification. Inf Process Manag 58(2):102441. https://doi.org/10.1016/j.ipm.2020.102441
Xiao L, Huang X, Chen B, et al (2019) Label-specific document representation for multi-label text classification. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019. Association for Computational Linguistics, pp 466–475. https://doi.org/10.18653/v1/D19-1044
Liu N, Wang Q, Ren J (2021) Label-embedding bi-directional attentive model for multi-label text classification. Neural Process Lett 53(1):375–389. https://doi.org/10.1007/s11063-020-10411-8
Liu H, Chen G, Li P et al (2021) Multi-label text classification via joint learning from label embedding and label correlation. Neurocomputing 460:385–398. https://doi.org/10.1016/j.neucom.2021.07.031
Vu HT, Nguyen MT, Nguyen VC et al (2022) Label-representative graph convolutional network for multi-label text classification. Appl Intell. https://doi.org/10.1007/s10489-022-04106-x
Zhang X, Zhang Q, Yan Z, et al (2021) Enhancing label correlation feedback in multi-label text classification via multi-task learning. In: Zong C, Xia F, Li W, et al (eds) Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, Findings of ACL, vol ACL/IJCNLP 2021. Association for Computational Linguistics, pp 1190–1200. https://doi.org/10.18653/v1/2021.findings-acl.101
Niven T, Kao H (2019) Probing neural network comprehension of natural language arguments. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, pp 4658–4664. https://doi.org/10.18653/v1/p19-1459
Feder A, Oved N, Shalit U et al (2021) Causalm: Causal model explanation through counterfactual language models. Comput Linguistics 47(2):333–386. https://doi.org/10.1162/coli_a_00404
Shah H, Tamuly K, Raghunathan A, et al (2020) The pitfalls of simplicity bias in neural networks. In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in Neural Information Processing Systems, vol 33. Curran Associates, Inc., pp 9573–9585. https://proceedings.neurips.cc/paper/2020/file/6cfe0e6127fa25df2a0ef2ae1067d915-Paper.pdf
Yao L, Chu Z, Li S, et al (2021) A survey on causal inference. ACM Trans Knowl Discov Data 15(5):74:1–74:46. https://doi.org/10.1145/3444944
Niu Y, Tang K, Zhang H, et al (2021) Counterfactual VQA: A cause-effect look at language bias. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, pp 12700–12710. https://doi.org/10.1109/CVPR46437.2021.01251. https://openaccess.thecvf.com/content/CVPR2021/html/Niu_Counterfactual_VQA_A_Cause-Effect_Look_at_Language_Bias_CVPR_2021_paper.html
Wang W, Feng F, He X, et al (2021c) Clicks can be cheating: Counterfactual recommendation for mitigating clickbait issue. In: Diaz F, Shah C, Suel T, et al (eds) SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021. ACM, pp 1288–1297. https://doi.org/10.1145/3404835.3462962
Lewis DD, Yang Y, Rose TG, et al (2004) RCV1: A new benchmark collection for text categorization research. J Mach Learn Res 5:361–397. http://jmlr.org/papers/volume5/lewis04a/lewis04a.pdf
Lewis DD (1997) Reuters-21578 text categorization test collection, distribution 1.0. Reuters Ltd
Tsoumakas G, Katakis I (2007) Multi-label classification: An overview. Int J Data Warehous Min 3(3):1–13. https://doi.org/10.4018/jdwm.2007070101
Read J, Pfahringer B, Holmes G, et al (2009) Classifier chains for multi-label classification. In: Buntine WL, Grobelnik M, Mladenic D, et al (eds) Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part II, Lecture Notes in Computer Science, vol 5782. Springer, pp 254–269. https://doi.org/10.1007/978-3-642-04174-7_17
Nam J, Mencía EL, Kim HJ, et al (2017) Maximizing subset accuracy with recurrent neural networks in multi-label classification. In: Guyon I, von Luxburg U, Bengio S, et al (eds) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp 5413–5423. https://proceedings.neurips.cc/paper/2017/hash/2eb5657d37f474e4c4cf01e4882b8962-Abstract.html
Tsai C, Lee H (2020) Order-free learning alleviating exposure bias in multi-label classification. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, pp 6038–6045. https://ojs.aaai.org/index.php/AAAI/article/view/6066
Xun G, Jha K, Sun J, et al (2020) Correlation networks for extreme multi-label text classification. In: Gupta R, Liu Y, Tang J, et al (eds) KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020. ACM, pp 1074–1082. https://doi.org/10.1145/3394486.3403151
Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/n19-1423
Chen H, Ma Q, Lin Z, et al (2021a) Hierarchy-aware label semantics matching network for hierarchical text classification. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. Association for Computational Linguistics, pp 4370–4379. https://doi.org/10.18653/v1/2021.acl-long.337
Ma Q, Yuan C, Zhou W, et al (2021) Label-specific dual graph neural network for multi-label text classification. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. Association for Computational Linguistics, pp 3855–3864. https://doi.org/10.18653/v1/2021.acl-long.298
Ozmen M, Zhang H, Wang P, et al (2022) Multi-relation message passing for multi-label text classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022. IEEE, pp 3583–3587. https://doi.org/10.1109/ICASSP43922.2022.9747225
Wang C, Liu L, Sun S et al (2022) Rethinking the framework constructed by counterfactual functional model. Appl Intell 52(11):12957–12974. https://doi.org/10.1007/s10489-022-03161-8
Luo G, Zhao B, Du S (2019) Causal inference and bayesian network structure learning from nominal data. Appl Intell 49(1):253–264. https://doi.org/10.1007/s10489-018-1274-3
Li L, Yue W (2020) Dynamic uncertain causality graph based on intuitionistic fuzzy sets and its application to root cause analysis. Appl Intell 50(1):241–255. https://doi.org/10.1007/s10489-019-01520-6
Yue Z, Wang T, Sun Q, et al (2021) Counterfactual zero-shot and open-set visual recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, pp 15404–15414. https://doi.org/10.1109/CVPR46437.2021.01515. https://openaccess.thecvf.com/content/CVPR2021/html/Yue_Counterfactual_Zero-Shot_and_Open-Set_Visual_Recognition_CVPR_2021_paper.html
Qian C, Feng F, Wen L, et al (2021) Counterfactual inference for text classification debiasing. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. Association for Computational Linguistics, pp 5434–5445. https://doi.org/10.18653/v1/2021.acl-long.422
Wang Z, Culotta A (2021) Robustness to spurious correlations in text classification via automatically generated counterfactuals. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, pp 14024–14031. https://ojs.aaai.org/index.php/AAAI/article/view/17651
Paranjape B, Lamm M, Tenney I (2022) Retrieval-guided counterfactual generation for QA. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022. Association for Computational Linguistics, pp 1670–1686. https://doi.org/10.18653/v1/2022.acl-long.117
Du M, Manjunatha V, Jain R, et al (2021) Towards interpreting and mitigating shortcut learning behavior of NLU models. In: Toutanova K, Rumshisky A, Zettlemoyer L, et al (eds) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021. Association for Computational Linguistics, pp 915–929. https://doi.org/10.18653/v1/2021.naacl-main.71
Wang W, Feng F, He X, et al (2021b) Deconfounded recommendation for alleviating bias amplification. In: Zhu F, Ooi BC, Miao C (eds) KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021. ACM, pp 1717–1725. https://doi.org/10.1145/3447548.3467249
Cheng D, Li J, Liu L et al (2022) Sufficient dimension reduction for average causal effect estimation. Data Min Knowl Discov 36(3):1174–1196. https://doi.org/10.1007/s10618-022-00832-5
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJU4ayYgl
Chen B, Huang X, Xiao L, et al (2020) Hyperbolic capsule networks for multi-label classification. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics, pp 3115–3124. https://doi.org/10.18653/v1/2020.acl-main.283
Chen H, Xia R, Yu J (2021b) Reinforced counterfactual data augmentation for dual sentiment classification. In: Moens M, Huang X, Specia L, et al (eds) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Association for Computational Linguistics, pp 269–278. https://doi.org/10.18653/v1/2021.emnlp-main.24
Tang K, Huang J, Zhang H (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. https://proceedings.neurips.cc/paper/2020/hash/1091660f3dff84fd648efe31391c5524-Abstract.html
Yang X, Zhang H, Qi G, et al (2021) Causal attention for vision-language tasks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, pp 9847–9857. https://doi.org/10.1109/CVPR46437.2021.00972. https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Causal_Attention_for_Vision-Language_Tasks_CVPR_2021_paper.html
Jang E, Gu S, Poole B (2017) Categorical reparameterization with gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=rkE3y85ee
Schapire RE, Singer Y (1998) Improved boosting algorithms using confidence-rated predictions. In: Bartlett PL, Mansour Y (eds) Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT 1998, Madison, Wisconsin, USA, July 24-26, 1998. ACM, pp 80–91. https://doi.org/10.1145/279943.279960
Gonçalves T, Quaresma P (2003) A preliminary approach to the multilabel classification problem of portuguese juridical documents. In: Moura-Pires F, Abreu S (eds) Progress in Artificial Intelligence, 11th Portuguese Conference on Artificial Intelligence, EPIA 2003, Beja, Portugal, December 4-7, 2003, Proceedings, Lecture Notes in Computer Science, vol 2902. Springer, pp 435–444. https://doi.org/10.1007/978-3-540-24580-3_50
Kim Y (2014) Convolutional neural networks for sentence classification. In: Moschitti A, Pang B, Daelemans W (eds) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. ACL, pp 1746–1751. https://doi.org/10.3115/v1/d14-1181
Chen G, Ye D, Xing Z, et al (2017) Ensemble application of convolutional and recurrent neural networks for multi-label text categorization. In: 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14-19, 2017. IEEE, pp 2377–2383. https://doi.org/10.1109/IJCNN.2017.7966144
Chen Z, Wei X, Wang P, et al (2019) Multi-label image recognition with graph convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, pp 5177–5186. https://doi.org/10.1109/CVPR.2019.00532. http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Multi-Label_Image_Recognition_With_Graph_Convolutional_Networks_CVPR_2019_paper.html
Funding
This work was supported by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and the Shanghai Science and Technology Innovation Action Plan (20511102600).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Text contents in analysis
In this paper, we selected several representative samples from AAPD [7] to analyze model performance; their contents are shown in Tables 9 and 10. AAPD contains the abstracts and corresponding subject categories of 55,840 computer science papers collected from the arXiv website.
By comparing the predictions on these samples, we verified that the LD-based model carries unwanted bias due to the label correlation shortcut, and that our CFTC alleviates this bias and yields causality-based predictions.
Label co-occurrence matrix
To capture the implied interactions between labels, we employed the label co-occurrence matrix [12, 30] as prior knowledge and applied a graph neural network to extract deeper label information. The label co-occurrence matrix \(A \in \mathbb {R}^{\vert L\vert \times \vert L\vert }\) records the co-occurrence statistics between labels, where \(A_{ij}\) denotes the conditional probability that a text belongs to label \(L_i\) given that it belongs to label \(L_j\). We computed the label co-occurrence matrices of AAPD, RCV1 and Reuters-21578 and visualized them in Fig. 6.
The results show that sparse co-occurrence relationships exist between labels, and this relationship can provide additional information to the model in Multi-Label Text Classification tasks.
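A minimal sketch of how such a matrix can be estimated from a multi-label training set, assuming a binary label matrix Y of shape (num_texts, num_labels) with Y[n, i] = 1 iff text n carries label \(L_i\) (this is an illustration, not the authors' code):

```python
import numpy as np

def label_cooccurrence(Y, eps=1e-12):
    """Estimate A[i, j] ≈ P(L_i | L_j) from binary label matrix Y."""
    Y = Y.astype(np.float64)
    counts = Y.T @ Y                   # counts[i, j] = #texts with both L_i and L_j
    label_freq = np.diag(counts)       # label_freq[j] = #texts carrying L_j
    return counts / (label_freq[None, :] + eps)

# Example with 4 texts and 3 labels
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [1, 1, 0]])
A = label_cooccurrence(Y)  # e.g. A[0, 1] = P(L_0 | L_1) = 2/3
```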
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, C., Chen, W., Tian, J. et al. Accurate use of label dependency in multi-label text classification through the lens of causality. Appl Intell 53, 21841–21857 (2023). https://doi.org/10.1007/s10489-023-04623-3