Extracting optimal explanations for ensemble trees via automated reasoning

Applied Intelligence

Abstract

Ensemble trees are a popular class of machine learning models that often yield high prediction performance when analysing structured data. Although individual small decision trees are deemed explainable by nature, an ensemble of large trees is often difficult to understand. In this work, we propose an approach called optimised explanation (OptExplain) that faithfully extracts global explanations of ensemble trees using a combination of logical reasoning, sampling, and nature-inspired optimisation. OptExplain produces an interpretable surrogate model whose predictions are as close as possible to those of the original model. Building on top of this, we propose a method called the profile of equivalent classes (ProClass), which simplifies the explanation even further by solving the maximum satisfiability problem (MAX-SAT). ProClass gives the profile of the classes and features from the perspective of the model. Experiments on several datasets show that our approach can provide high-quality explanations for large ensemble tree models, and that it outperforms recent top performers.
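For readers unfamiliar with MAX-SAT, the problem asks for a truth assignment that satisfies as many clauses of a Boolean formula as possible. The following is a minimal brute-force sketch of that problem statement only; it is not the ProClass procedure, and the function name `max_sat`, the DIMACS-style clause encoding, and the toy instance are all assumptions made for illustration. Exhaustive enumeration is exponential in the number of variables, so practical work would rely on a dedicated solver.

```python
from itertools import product


def max_sat(num_vars, clauses):
    """Brute-force MAX-SAT (illustrative only, hypothetical helper).

    A clause is a list of non-zero ints: k means variable k is true,
    -k means variable k is false (DIMACS-style literals).
    Returns (number of satisfied clauses, assignment as a tuple of bools).
    """
    best_count, best_assignment = -1, None
    for bits in product([False, True], repeat=num_vars):
        # A clause is satisfied if at least one of its literals holds.
        satisfied = sum(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied > best_count:
            best_count, best_assignment = satisfied, bits
    return best_count, best_assignment


# Hypothetical toy instance over x1, x2: no assignment satisfies all four
# clauses, because every assignment falsifies exactly one of them.
clauses = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
count, assignment = max_sat(2, clauses)
print(count, assignment)
```

In this toy instance at most three of the four clauses can hold at once, which is the kind of trade-off a MAX-SAT formulation resolves when not every constraint can be kept.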




Data Availability

All data generated or analysed during this study are included in this published article (and its supplementary information files).

Code Availability

The code is available at https://github.com/GreeenZhang/OptExplain.


Funding

Yanhong Huang’s work is partially supported by the National Key Research and Development Program (2019YFB2102602).

Author information

Contributions

Gelin Zhang: Software, Investigation, Data curation, Writing – review and editing; Zhé Hóu: Conceptualization, Methodology, Writing – original draft; Yanhong Huang: Resources, Validation; Jianqi Shi: Supervision, Conceptualization, Validation; Hadrien Bride: Methodology, Software; Jin Song Dong: Conceptualization, Validation; Yongsheng Gao: Investigation, Writing – review and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yanhong Huang.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, G., Hóu, Z., Huang, Y. et al. Extracting optimal explanations for ensemble trees via automated reasoning. Appl Intell 53, 14371–14382 (2023). https://doi.org/10.1007/s10489-022-04180-1

  • DOI: https://doi.org/10.1007/s10489-022-04180-1
