Extracting optimal explanations for ensemble trees via automated reasoning

Zhang, Gelin; Hóu, Zhé; Huang, Yanhong; Shi, Jianqi; Bride, Hadrien; Dong, Jin Song; Gao, Yongsheng

doi:10.1007/s10489-022-04180-1

Published: 25 October 2022

Volume 53, pages 14371–14382 (2023)

Applied Intelligence

Abstract

Ensemble trees are a popular class of machine learning models that often yield high prediction performance on structured data. Although individual small decision trees are deemed explainable by nature, an ensemble of large trees is often difficult to understand. In this work, we propose an approach called optimised explanation (OptExplain) that faithfully extracts global explanations of ensemble trees using a combination of logical reasoning, sampling, and nature-inspired optimisation. The result is an interpretable surrogate model whose predictive ability is as close as possible to that of the original model. Building on this, we propose a method called the profile of equivalent classes (ProClass), which simplifies the explanation even further by solving the maximum satisfiability problem (MAX-SAT). ProClass gives the profile of the classes and features from the perspective of the model. Experiments on several datasets show that our approach can provide high-quality explanations for large ensemble tree models and that it outperforms recent top-performing methods.
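
To make the MAX-SAT problem named in the abstract concrete, the following is a minimal sketch of partial weighted MAX-SAT solved by exhaustive search. It is not the OptExplain or ProClass implementation (the authors' code is linked under Code Availability); the function name `max_sat`, the integer clause encoding, and the toy clauses are all assumptions made for illustration.

```python
from itertools import product


def max_sat(num_vars, hard, soft):
    """Brute-force partial weighted MAX-SAT (illustrative sketch only).

    A clause is a list of non-zero ints: k means variable k is true,
    -k means variable k is false (variables are numbered from 1).
    hard: clauses that every solution must satisfy.
    soft: (weight, clause) pairs; the total satisfied weight is maximised.
    Returns (best_weight, assignment), or (None, None) if the hard
    clauses cannot all be satisfied.
    """
    def holds(clause, assignment):
        # A clause holds if at least one of its literals is satisfied.
        return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)

    best_weight, best_assignment = None, None
    for assignment in product([False, True], repeat=num_vars):
        if not all(holds(clause, assignment) for clause in hard):
            continue  # violates a hard clause, so not a candidate
        weight = sum(w for w, clause in soft if holds(clause, assignment))
        if best_weight is None or weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


# Hypothetical toy instance: exactly one of two conditions may hold (hard),
# and each soft clause states a weighted preference.
hard = [[1, 2], [-1, -2]]
soft = [(3, [1]), (2, [2]), (2, [-1])]
print(max_sat(2, hard, soft))  # (4, (False, True))
```

Exhaustive search is exponential in the number of variables, so this sketch only illustrates the problem statement; instances of realistic size would need a dedicated MAX-SAT or SMT solver.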

Data Availability

All data generated or analysed during this study are included in this published article (and its supplementary information files).

Code Availability

The code is available at https://github.com/GreeenZhang/OptExplain.

Funding

Yanhong Huang’s work is partially supported by the National Key Research and Development Program (2019YFB2102602).

Author information

Authors and Affiliations

National Trusted Embedded Software Engineering Technology Research Center, East China Normal University, Shanghai, China
Gelin Zhang, Yanhong Huang & Jianqi Shi
Griffith University, Brisbane, Australia
Zhé Hóu, Hadrien Bride, Jin Song Dong & Yongsheng Gao
National University of Singapore, Singapore, Singapore
Jin Song Dong

Contributions

Gelin Zhang: Software, Investigation, Data curation, Writing - review and editing; Zhé Hóu: Conceptualization, Methodology, Writing - original draft; Yanhong Huang: Resources, Validation; Jianqi Shi: Supervision, Conceptualization, Validation; Hadrien Bride: Methodology, Software; Jin Song Dong: Conceptualization, Validation; Yongsheng Gao: Investigation, Writing - review and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yanhong Huang.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, G., Hóu, Z., Huang, Y. et al. Extracting optimal explanations for ensemble trees via automated reasoning. Appl Intell 53, 14371–14382 (2023). https://doi.org/10.1007/s10489-022-04180-1

Accepted: 13 September 2022
Published: 25 October 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04180-1
