Towards Faithful Neural Network Intrinsic Interpretation with Shapley Additive Self-Attribution

Sun, Ying; Zhu, Hengshu; Xiong, Hui

arXiv:2309.15559 (cs)

[Submitted on 27 Sep 2023]

Abstract: Self-interpreting neural networks have garnered significant research interest. Existing works in this domain often (1) lack a solid theoretical foundation ensuring genuine interpretability or (2) compromise model expressiveness. In response, we formulate a generic Additive Self-Attribution (ASA) framework. Observing the absence of the Shapley value in Additive Self-Attribution, we propose the Shapley Additive Self-Attributing Neural Network (SASANet), with theoretical guarantees that its self-attribution values equal the output's Shapley values. Specifically, SASANet uses a marginal contribution-based sequential schema and internal distillation-based training strategies to model meaningful outputs for any number of features, resulting in an unapproximated, meaningful value function. Our experimental results indicate that SASANet surpasses existing self-attributing models in performance and rivals black-box models. Moreover, SASANet is shown to be more precise and efficient than post-hoc methods in interpreting its own predictions.
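
For readers unfamiliar with the Shapley value, the sketch below illustrates the quantity the abstract says the self-attribution values equal. It is not SASANet and not the authors' code. It computes exact Shapley values for a small value function by averaging each feature's marginal contribution over all feature orderings. The names `shapley_values` and `toy_value`, and the toy function itself, are assumptions made purely for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values: average marginal contribution over all orderings.

    value_fn maps a frozenset of feature indices to a number. The cost grows
    factorially with n_features, so this is only practical for tiny examples.
    """
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        present = frozenset()
        prev = value_fn(present)
        for i in order:
            present = present | {i}
            cur = value_fn(present)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    n_orders = factorial(n_features)
    return [p / n_orders for p in phi]


def toy_value(subset):
    """Hypothetical value function: additive terms plus one interaction."""
    x = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(x[i] for i in subset)
    if 0 in subset and 1 in subset:
        total += x[0] * x[1]  # interaction shared between features 0 and 1
    return total


phi = shapley_values(toy_value, 3)
full = toy_value(frozenset({0, 1, 2}))
empty = toy_value(frozenset())
print(phi, sum(phi), full - empty)
```

In this toy case the attributions sum to the difference between the value of the full feature set and the value of the empty set. This additivity is what an additive attribution is expected to satisfy. Enumerating every ordering is only feasible for a handful of features.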

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2309.15559 [cs.LG]
	(or arXiv:2309.15559v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2309.15559

Submission history

From: Ying Sun
[v1] Wed, 27 Sep 2023 10:31:48 UTC (5,558 KB)
