Abstract
Ensemble trees are popular machine learning models that often yield high predictive performance on structured data. Although individual small decision trees are deemed explainable by nature, an ensemble of large trees is often difficult to understand. In this work, we propose an approach called optimised explanation (OptExplain) that faithfully extracts global explanations of ensemble trees using a combination of logical reasoning, sampling, and nature-inspired optimisation. OptExplain produces an interpretable surrogate model whose predictions are as close as possible to those of the original model. Building on this, we propose a method called the profile of equivalent classes (ProClass), which simplifies the explanation even further by solving the maximum satisfiability problem (MAX-SAT). ProClass gives the profile of the classes and features from the perspective of the model. Experiments on several datasets show that our approach can provide high-quality explanations of large ensemble tree models and that it outperforms recent top-performing methods.
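For readers unfamiliar with MAX-SAT, the following minimal sketch shows the problem itself: given a set of clauses that may not all be satisfiable together, find an assignment that satisfies as many as possible. It is a brute-force illustration only, not the ProClass encoding or the solver used by the authors; the function name `max_sat` and the toy clauses are hypothetical.

```python
from itertools import product


def max_sat(clauses, num_vars):
    """Brute-force MAX-SAT (illustrative only, exponential in num_vars).

    Each clause is a list of non-zero integers in DIMACS style:
    k means variable k is true, -k means variable k is false.
    Returns (number of satisfied clauses, best assignment as a tuple of bools).
    """
    best_count, best_assignment = -1, None
    for values in product([False, True], repeat=num_vars):
        # A clause is satisfied if at least one of its literals holds.
        satisfied = sum(
            any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied > best_count:
            best_count, best_assignment = satisfied, values
    return best_count, best_assignment


# Hypothetical toy instance over x1, x2: the four clauses cannot all hold at once.
clauses = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
best_count, best_assignment = max_sat(clauses, num_vars=2)
print(best_count, best_assignment)
```

In this toy instance every assignment falsifies exactly one clause, so at most three of the four clauses can be satisfied. Practical instances are handled by dedicated solvers rather than by enumeration.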









Data Availability
All data generated or analysed during this study are included in this published article (and its supplementary information files).
Code Availability
The code is available at https://github.com/GreeenZhang/OptExplain.
Funding
Yanhong Huang’s work is partially supported by the National Key Research and Development Program (2019YFB2102602).
Author information
Contributions
Gelin Zhang: Software, Investigation, Data curation, Writing - review and editing; Zhé Hóu: Conceptualization, Methodology, Writing - original draft; Yanhong Huang: Resources, Validation; Jianqi Shi: Supervision, Conceptualization, Validation; Hadrien Bride: Methodology, Software; Jin Song Dong: Conceptualization, Validation; Yongsheng Gao: Investigation, Writing - review and editing. All authors read and approved the final manuscript.
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, G., Hóu, Z., Huang, Y. et al. Extracting optimal explanations for ensemble trees via automated reasoning. Appl Intell 53, 14371–14382 (2023). https://doi.org/10.1007/s10489-022-04180-1
DOI: https://doi.org/10.1007/s10489-022-04180-1