Can AI Assistants Know What They Don't Know?

Cheng, Qinyuan; Sun, Tianxiang; Liu, Xiangyang; Zhang, Wenwei; Yin, Zhangyue; Li, Shimin; Li, Linyang; He, Zhengfu; Chen, Kai; Qiu, Xipeng


arXiv:2401.13275 (cs)

[Submitted on 24 Jan 2024 (v1), last revised 28 Jan 2024 (this version, v2)]



Abstract: Recently, AI assistants based on large language models (LLMs) have shown surprising performance in many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess extensive world knowledge, they still make factual errors on some knowledge-intensive tasks, such as open-domain question answering. These untruthful responses from an AI assistant may cause significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express them through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with its Idk dataset, the assistant can refuse to answer most of its unknown questions. For the questions it does attempt to answer, accuracy is significantly higher than before alignment.
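The abstract does not spell out how a question is labeled "known" or "unknown" for a given assistant. The sketch below shows one plausible way to do it, under stated assumptions: sample several answers from the assistant, score each against the gold answers, and mark the question as known only if the fraction of correct samples reaches a threshold. Known questions keep one of the model's own correct answers as the training target; unknown questions get a refusal. The names `QAItem`, `build_idk_dataset`, `sample_fn`, the lenient substring scoring, the defaults `num_samples=10` and `threshold=1.0`, and the refusal text are all hypothetical illustration choices, not the authors' exact procedure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical refusal template; the exact wording is an assumption.
IDK_RESPONSE = "I don't know."


@dataclass
class QAItem:
    question: str
    gold_answers: List[str]


def is_correct(response: str, gold_answers: List[str]) -> bool:
    # Assumption: lenient scoring, counting a response as correct if any
    # gold answer appears in it (case-insensitive substring match).
    text = response.lower()
    return any(answer.lower() in text for answer in gold_answers)


def build_idk_dataset(
    items: List[QAItem],
    sample_fn: Callable[[str], str],
    num_samples: int = 10,
    threshold: float = 1.0,
) -> List[Dict[str, object]]:
    """Label each question as known/unknown for one specific model.

    sample_fn(question) returns one sampled answer from the assistant.
    A question counts as known when the fraction of correct samples is
    at least `threshold`; otherwise its target becomes a refusal.
    """
    if num_samples < 1 or not 0.0 < threshold <= 1.0:
        raise ValueError("need num_samples >= 1 and 0 < threshold <= 1")

    dataset = []
    for item in items:
        responses = [sample_fn(item.question) for _ in range(num_samples)]
        correct = [r for r in responses if is_correct(r, item.gold_answers)]
        known = len(correct) / num_samples >= threshold
        dataset.append(
            {
                "question": item.question,
                # Known: keep one of the model's own correct answers as the target
                # (threshold > 0 guarantees `correct` is non-empty here).
                # Unknown: train the assistant to refuse instead.
                "response": correct[0] if known else IDK_RESPONSE,
                "known": known,
            }
        )
    return dataset


if __name__ == "__main__":
    # Deterministic stand-in for a real assistant, for illustration only.
    canned = {
        "What is the capital of France?": "The capital is Paris.",
        "Who wrote the novel Middlemarch?": "Charles Dickens",
    }
    items = [
        QAItem("What is the capital of France?", ["Paris"]),
        QAItem("Who wrote the novel Middlemarch?", ["George Eliot"]),
    ]
    for row in build_idk_dataset(items, lambda q: canned[q], num_samples=3):
        print(row)
```

With `threshold=1.0`, a question counts as known only if every sample is correct, which is the strictest choice and pushes more questions toward refusal; lowering the threshold trades more attempted answers for a higher risk of wrong ones.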

Comments:	Work in progress
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2401.13275 [cs.CL]
	(or arXiv:2401.13275v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.13275

Submission history

From: Qinyuan Cheng
[v1] Wed, 24 Jan 2024 07:34:55 UTC (363 KB)
[v2] Sun, 28 Jan 2024 09:07:13 UTC (313 KB)