Reward Engineering for Generating Semi-structured Explanation

Han, Jiuzhou; Buntine, Wray; Shareghi, Ehsan

arXiv:2309.08347 (cs)

[Submitted on 15 Sep 2023 (v1), last revised 24 Jan 2024 (this version, v2)]

Abstract: Semi-structured explanation depicts the implicit process of a reasoner with an explicit representation. This explanation highlights how available information in a specific query is utilised and supplemented with information a reasoner produces from its internal weights towards generating an answer. Despite the recent improvements in generative capabilities of language models, producing structured explanations to verify a model's true reasoning capabilities remains a challenge. This issue is particularly pronounced for not-so-large LMs (e.g., FLAN-T5-XXL). In this work, we first underscore the limitations of supervised fine-tuning (SFT) in tackling this challenge, and then introduce a carefully crafted reward engineering method in reinforcement learning (RL) to better address this problem. We investigate multiple reward aggregation methods and provide a detailed discussion which sheds light on the promising potential of RL for future research. Our proposed method achieves new state-of-the-art results on two semi-structured explanation generation benchmarks (ExplaGraph and COPA-SSE).

Comments:	Accepted to EACL 2024; code is available at this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2309.08347 [cs.CL]
	(or arXiv:2309.08347v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.08347

Submission history

From: Jiuzhou Han
[v1] Fri, 15 Sep 2023 12:10:03 UTC (63 KB)
[v2] Wed, 24 Jan 2024 04:53:13 UTC (495 KB)
