Accurate use of label dependency in multi-label text classification through the lens of causality


Abstract

Multi-Label Text Classification (MLTC) aims to assign the most relevant labels to each given text. Existing methods demonstrate that label dependency can help to improve a model's performance. However, introducing label dependency may cause the model to suffer from unwanted prediction bias. In this study, we attribute this bias to the model's misuse of label dependency: the model tends to exploit the correlation shortcut in label dependency rather than fusing text information and label dependency for prediction. Motivated by causal inference, we propose a CounterFactual Text Classifier (CFTC) to eliminate the correlation bias and make causality-based predictions. Specifically, CFTC first adopts a predict-then-modify backbone to extract the precise label information embedded in label dependency, and then blocks the correlation shortcut through a counterfactual de-bias technique guided by a human-specified causal graph. Experimental results on three datasets demonstrate that CFTC significantly outperforms the baselines and effectively eliminates the correlation bias in the datasets.
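The abstract does not spell out the de-bias computation, but the counterfactual de-bias step belongs to a general pattern used in counterfactual inference for debiasing [19, 36]: subtract the effect measured under a counterfactual input from the effect measured under the factual input. The sketch below illustrates only that general idea; the function name, the fusion weight, and the toy scores are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def counterfactual_debias(logits_factual, logits_counterfactual, fusion_weight=1.0):
    """Subtract the shortcut-only effect (measured with a counterfactual text
    input T*) from the factual effect, keeping only the part of the score that
    genuinely depends on the text content."""
    return logits_factual - fusion_weight * logits_counterfactual

# Toy example with three labels: per-label scores with the real text versus
# with an uninformative counterfactual text T* (only the label-dependency
# shortcut can drive the second set of scores).
factual = np.array([2.1, -0.3, 1.8])
shortcut_only = np.array([0.9, -0.1, 1.5])
debiased_probs = sigmoid(counterfactual_debias(factual, shortcut_only))
print(debiased_probs)
```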


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. In this paper, Label Information (LI) refers to the information extracted based on label dependency.

  2. In this paper, we only require that \(T^*\) not contain the correct text information in T; it need not be a semantically meaningful counterfactual text.

  3. The Gumbel(0, 1) distribution can be sampled using inverse transform sampling by drawing \(u \sim \text {Uniform}(0,1)\) and computing \(g = -\log (-\log (u))\) (see the sketch following these notes).

  4. https://git.uwaterloo.ca/jimmylin/Castor-data/tree/master/datasets/AAPD

  5. http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm

  6. http://www.daviddlewis.com/resources/testcollections/reuters21578/
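A minimal sketch of the inverse-transform sampling described in note 3, written with NumPy; the helper name and the Gumbel-max usage example are illustrative assumptions rather than code from the paper:

```python
import numpy as np

def sample_gumbel(shape, eps=1e-20, rng=None):
    """Draw Gumbel(0, 1) samples by inverse transform sampling:
    u ~ Uniform(0, 1), g = -log(-log(u))."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=shape)
    # eps guards against taking the log of exactly 0.
    return -np.log(-np.log(u + eps) + eps)

# Gumbel-max trick: adding Gumbel noise to logits and taking the argmax
# draws a sample from the corresponding softmax distribution, which is
# the basis of the Gumbel-Softmax relaxation [48].
logits = np.array([1.0, 2.0, 0.5])
sample = np.argmax(logits + sample_gumbel(logits.shape))
print(sample)
```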

References

  1. Wang T, Liu L, Liu N et al (2020) A multi-label text classification method via dynamic semantic representation model and deep neural network. Appl Intell 50(8):2339–2351. https://doi.org/10.1007/s10489-020-01680-w

  2. Wang S, Cai J, Lin Q et al (2019) An overview of unsupervised deep feature representation for text categorization. IEEE Trans Comput Soc Syst 6(3):504–517. https://doi.org/10.1109/TCSS.2019.2910599

  3. Wankhade M, Rao ACS, Kulkarni C (2022) A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev 55(7):5731–5780. https://doi.org/10.1007/s10462-022-10144-1

  4. Alswaidan N, Menai MEB (2020) A survey of state-of-the-art approaches for emotion recognition in text. Knowl Inf Syst 62(8):2937–2987. https://doi.org/10.1007/s10115-020-01449-0

  5. Boutell MR, Luo J, Shen X et al (2004) Learning multi-label scene classification. Pattern Recognit 37(9):1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009

  6. Chen Z, Ren J (2021) Multi-label text classification with latent word-wise label information. Appl Intell 51(2):966–979. https://doi.org/10.1007/s10489-020-01838-6

  7. Yang P, Sun X, Li W, et al (2018) SGM: sequence generation model for multi-label classification. In: Bender EM, Derczynski L, Isabelle P (eds) Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018. Association for Computational Linguistics, pp 3915–3926. https://aclanthology.org/C18-1330/

  8. Yang P, Luo F, Ma S, et al (2019) A deep reinforced sequence-to-set model for multi-label classification. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, pp 5252–5258. https://doi.org/10.18653/v1/p19-1518

  9. Wang R, Ridley R, Su X et al (2021) A novel reasoning mechanism for multi-label text classification. Inf Process Manag 58(2):102441. https://doi.org/10.1016/j.ipm.2020.102441

  10. Xiao L, Huang X, Chen B, et al (2019) Label-specific document representation for multi-label text classification. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019. Association for Computational Linguistics, pp 466–475. https://doi.org/10.18653/v1/D19-1044

  11. Liu N, Wang Q, Ren J (2021) Label-embedding bi-directional attentive model for multi-label text classification. Neural Process Lett 53(1):375–389. https://doi.org/10.1007/s11063-020-10411-8

  12. Liu H, Chen G, Li P et al (2021) Multi-label text classification via joint learning from label embedding and label correlation. Neurocomputing 460:385–398. https://doi.org/10.1016/j.neucom.2021.07.031

  13. Vu HT, Nguyen MT, Nguyen VC et al (2022) Label-representative graph convolutional network for multi-label text classification. Appl Intell. https://doi.org/10.1007/s10489-022-04106-x

  14. Zhang X, Zhang Q, Yan Z, et al (2021) Enhancing label correlation feedback in multi-label text classification via multi-task learning. In: Zong C, Xia F, Li W, et al (eds) Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, Findings of ACL, vol ACL/IJCNLP 2021. Association for Computational Linguistics, pp 1190–1200. https://doi.org/10.18653/v1/2021.findings-acl.101

  15. Niven T, Kao H (2019) Probing neural network comprehension of natural language arguments. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, pp 4658–4664. https://doi.org/10.18653/v1/p19-1459

  16. Feder A, Oved N, Shalit U et al (2021) Causalm: Causal model explanation through counterfactual language models. Comput Linguistics 47(2):333–386. https://doi.org/10.1162/coli_a_00404

  17. Shah H, Tamuly K, Raghunathan A, et al (2020) The pitfalls of simplicity bias in neural networks. In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in Neural Information Processing Systems, vol 33. Curran Associates, Inc., pp 9573–9585. https://proceedings.neurips.cc/paper/2020/file/6cfe0e6127fa25df2a0ef2ae1067d915-Paper.pdf

  18. Yao L, Chu Z, Li S, et al (2021) A survey on causal inference. ACM Trans Knowl Discov Data 15(5):74:1–74:46. https://doi.org/10.1145/3444944

  19. Niu Y, Tang K, Zhang H, et al (2021) Counterfactual VQA: A cause-effect look at language bias. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, pp 12700–12710. https://doi.org/10.1109/CVPR46437.2021.01251. https://openaccess.thecvf.com/content/CVPR2021/html/Niu_Counterfactual_VQA_A_Cause-Effect_Look_at_Language_Bias_CVPR_2021_paper.html

  20. Wang W, Feng F, He X, et al (2021c) Clicks can be cheating: Counterfactual recommendation for mitigating clickbait issue. In: Diaz F, Shah C, Suel T, et al (eds) SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021. ACM, pp 1288–1297. https://doi.org/10.1145/3404835.3462962

  21. Lewis DD, Yang Y, Rose TG, et al (2004) RCV1: A new benchmark collection for text categorization research. J Mach Learn Res 5:361–397. http://jmlr.org/papers/volume5/lewis04a/lewis04a.pdf

  22. Lewis DD (1997) Reuters-21578 text categorization test collection, distribution 1.0. Reuters Ltd

  23. Tsoumakas G, Katakis I (2007) Multi-label classification: An overview. Int J Data Warehous Min 3(3):1–13. https://doi.org/10.4018/jdwm.2007070101

  24. Read J, Pfahringer B, Holmes G, et al (2009) Classifier chains for multi-label classification. In: Buntine WL, Grobelnik M, Mladenic D, et al (eds) Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part II, Lecture Notes in Computer Science, vol 5782. Springer, pp 254–269. https://doi.org/10.1007/978-3-642-04174-7_17

  25. Nam J, Mencía EL, Kim HJ, et al (2017) Maximizing subset accuracy with recurrent neural networks in multi-label classification. In: Guyon I, von Luxburg U, Bengio S, et al (eds) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp 5413–5423. https://proceedings.neurips.cc/paper/2017/hash/2eb5657d37f474e4c4cf01e4882b8962-Abstract.html

  26. Tsai C, Lee H (2020) Order-free learning alleviating exposure bias in multi-label classification. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, pp 6038–6045. https://ojs.aaai.org/index.php/AAAI/article/view/6066

  27. Xun G, Jha K, Sun J, et al (2020) Correlation networks for extreme multi-label text classification. In: Gupta R, Liu Y, Tang J, et al (eds) KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020. ACM, pp 1074–1082. https://doi.org/10.1145/3394486.3403151

  28. Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/n19-1423

  29. Chen H, Ma Q, Lin Z, et al (2021a) Hierarchy-aware label semantics matching network for hierarchical text classification. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. Association for Computational Linguistics, pp 4370–4379. https://doi.org/10.18653/v1/2021.acl-long.337

  30. Ma Q, Yuan C, Zhou W, et al (2021) Label-specific dual graph neural network for multi-label text classification. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. Association for Computational Linguistics, pp 3855–3864. https://doi.org/10.18653/v1/2021.acl-long.298

  31. Ozmen M, Zhang H, Wang P, et al (2022) Multi-relation message passing for multi-label text classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022. IEEE, pp 3583–3587. https://doi.org/10.1109/ICASSP43922.2022.9747225

  32. Wang C, Liu L, Sun S et al (2022) Rethinking the framework constructed by counterfactual functional model. Appl Intell 52(11):12957–12974. https://doi.org/10.1007/s10489-022-03161-8

  33. Luo G, Zhao B, Du S (2019) Causal inference and bayesian network structure learning from nominal data. Appl Intell 49(1):253–264. https://doi.org/10.1007/s10489-018-1274-3

  34. Li L, Yue W (2020) Dynamic uncertain causality graph based on intuitionistic fuzzy sets and its application to root cause analysis. Appl Intell 50(1):241–255. https://doi.org/10.1007/s10489-019-01520-6

  35. Yue Z, Wang T, Sun Q, et al (2021) Counterfactual zero-shot and open-set visual recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, pp 15404–15414. https://doi.org/10.1109/CVPR46437.2021.01515. https://openaccess.thecvf.com/content/CVPR2021/html/Yue_Counterfactual_Zero-Shot_and_Open-Set_Visual_Recognition_CVPR_2021_paper.html

  36. Qian C, Feng F, Wen L, et al (2021) Counterfactual inference for text classification debiasing. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. Association for Computational Linguistics, pp 5434–5445. https://doi.org/10.18653/v1/2021.acl-long.422

  37. Wang Z, Culotta A (2021) Robustness to spurious correlations in text classification via automatically generated counterfactuals. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, pp 14024–14031. https://ojs.aaai.org/index.php/AAAI/article/view/17651

  38. Paranjape B, Lamm M, Tenney I (2022) Retrieval-guided counterfactual generation for QA. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022. Association for Computational Linguistics, pp 1670–1686. https://doi.org/10.18653/v1/2022.acl-long.117

  39. Du M, Manjunatha V, Jain R, et al (2021) Towards interpreting and mitigating shortcut learning behavior of NLU models. In: Toutanova K, Rumshisky A, Zettlemoyer L, et al (eds) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021. Association for Computational Linguistics, pp 915–929. https://doi.org/10.18653/v1/2021.naacl-main.71

  40. Wang W, Feng F, He X, et al (2021b) Deconfounded recommendation for alleviating bias amplification. In: Zhu F, Ooi BC, Miao C (eds) KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021. ACM, pp 1717–1725. https://doi.org/10.1145/3447548.3467249

  41. Cheng D, Li J, Liu L et al (2022) Sufficient dimension reduction for average causal effect estimation. Data Min Knowl Discov 36(3):1174–1196. https://doi.org/10.1007/s10618-022-00832-5

  42. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

  43. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJU4ayYgl

  44. Chen B, Huang X, Xiao L, et al (2020) Hyperbolic capsule networks for multi-label classification. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics, pp 3115–3124. https://doi.org/10.18653/v1/2020.acl-main.283

  45. Chen H, Xia R, Yu J (2021b) Reinforced counterfactual data augmentation for dual sentiment classification. In: Moens M, Huang X, Specia L, et al (eds) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Association for Computational Linguistics, pp 269–278. https://doi.org/10.18653/v1/2021.emnlp-main.24

  46. Tang K, Huang J, Zhang H (2020) Long-tailed classification by keeping the good and removing the bad momentum causal effect. In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. https://proceedings.neurips.cc/paper/2020/hash/1091660f3dff84fd648efe31391c5524-Abstract.html

  47. Yang X, Zhang H, Qi G, et al (2021) Causal attention for vision-language tasks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foundation / IEEE, pp 9847–9857. https://doi.org/10.1109/CVPR46437.2021.00972. https://openaccess.thecvf.com/content/CVPR2021/html/Yang_Causal_Attention_for_Vision-Language_Tasks_CVPR_2021_paper.html

  48. Jang E, Gu S, Poole B (2017) Categorical reparameterization with gumbel-softmax. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=rkE3y85ee

  49. Schapire RE, Singer Y (1998) Improved boosting algorithms using confidence-rated predictions. In: Bartlett PL, Mansour Y (eds) Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT 1998, Madison, Wisconsin, USA, July 24-26, 1998. ACM, pp 80–91. https://doi.org/10.1145/279943.279960

  50. Gonçalves T, Quaresma P (2003) A preliminary approach to the multilabel classification problem of portuguese juridical documents. In: Moura-Pires F, Abreu S (eds) Progress in Artificial Intelligence, 11th Portuguese Conference on Artificial Intelligence, EPIA 2003, Beja, Portugal, December 4-7, 2003, Proceedings, Lecture Notes in Computer Science, vol 2902. Springer, pp 435–444. https://doi.org/10.1007/978-3-540-24580-3_50

  51. Kim Y (2014) Convolutional neural networks for sentence classification. In: Moschitti A, Pang B, Daelemans W (eds) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. ACL, pp 1746–1751. https://doi.org/10.3115/v1/d14-1181

  52. Chen G, Ye D, Xing Z, et al (2017) Ensemble application of convolutional and recurrent neural networks for multi-label text categorization. In: 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14-19, 2017. IEEE, pp 2377–2383. https://doi.org/10.1109/IJCNN.2017.7966144

  53. Chen Z, Wei X, Wang P, et al (2019) Multi-label image recognition with graph convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, pp 5177–5186. https://doi.org/10.1109/CVPR.2019.00532. http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Multi-Label_Image_Recognition_With_Graph_Convolutional_Networks_CVPR_2019_paper.html

Funding

This work was supported by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and the Shanghai Science and Technology Innovation Action Plan (20511102600).

Author information

Corresponding author

Correspondence to Hao He.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Text contents in analysis

In this paper, we selected several representative samples from AAPD [7] to analyze model performance; their contents are shown in Tables 9 and 10. AAPD contains the abstracts and corresponding subject categories of 55,840 computer science papers collected from arXiv.

By comparing the predictions on these samples, we verified that the label-dependency-based model suffers from unwanted bias caused by the label correlation shortcut, and that our CFTC alleviates this bias and yields causality-based predictions.

Table 10 The text content of the samples selected in Section 6.4.2

Label co-occurrence matrix

To capture the implied interactions among labels, we employed the label co-occurrence matrix [12, 30] as prior knowledge and applied a graph neural network to extract deeper label information. The label co-occurrence matrix \(A \in \mathbb {R}^{\vert L\vert \times \vert L\vert }\) records the co-occurrence statistics between labels, where \(A_{ij}\) denotes the conditional probability that a text belongs to label \(L_i\) given that it belongs to label \(L_j\). We counted the label co-occurrence matrices of AAPD, RCV1 and Reuters-21578 and visualized them in Fig. 6; a sketch of the computation is given below.
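A minimal sketch, in Python, of how such a conditional co-occurrence matrix can be built from the training label sets; the function and variable names are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def label_cooccurrence_matrix(label_sets, num_labels):
    """Build A in R^{|L| x |L|}, where A[i, j] is the conditional probability
    that a text carries label L_i given that it carries label L_j."""
    counts = np.zeros((num_labels, num_labels))  # counts[i, j]: texts carrying both L_i and L_j
    label_freq = np.zeros(num_labels)            # label_freq[j]: texts carrying L_j
    for labels in label_sets:
        for j in labels:
            label_freq[j] += 1
            for i in labels:
                counts[i, j] += 1
    # Columns of labels that never occur stay at zero.
    A = counts / np.maximum(label_freq, 1)[None, :]
    return A

# Toy example with |L| = 3: label 2 always co-occurs with label 0,
# so A[0, 2] = 1 while A[2, 0] = 0.5.
A = label_cooccurrence_matrix([{0, 1}, {0, 2}, {0}, {0, 1, 2}], num_labels=3)
print(np.round(A, 2))
```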

Fig. 6 The label co-occurrence matrices of AAPD, RCV1 and Reuters-21578

The results showed sparse co-occurrence relationships between the labels; such relationships can provide additional information to the model in multi-label text classification tasks.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, C., Chen, W., Tian, J. et al. Accurate use of label dependency in multi-label text classification through the lens of causality. Appl Intell 53, 21841–21857 (2023). https://doi.org/10.1007/s10489-023-04623-3
