A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

Li, Yuang; Zhang, Min; Su, Chang; Li, Yinglu; Qiao, Xiaosong; Ren, Mengxin; Ma, Miaomiao; Wei, Daimeng; Tao, Shimin; Yang, Hao

arXiv:2309.09552 (cs)

[Submitted on 18 Sep 2023 (v1), last revised 6 Jun 2024 (this version, v4)]

Abstract: Recognizing rare named entities, such as personal names and technical terms, is challenging for automatic speech recognition (ASR) systems, especially when the entities appear infrequently in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. The detected entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that jointly learns the OV-KWS and contextual-ASR tasks. We evaluate our approach on the Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves entity recall compared to the original Whisper model. Moreover, we demonstrate that OV-KWS can serve as a plug-and-play module to enhance ASR error correction methods and frozen Whisper models.
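
The abstract reports gains in entity recall without defining the metric. As a rough illustration only, one common reading is the fraction of entity occurrences in the reference transcripts that also appear in the corresponding hypotheses. The sketch below assumes that definition and uses plain substring matching; the function name `entity_recall` and the sample sentences are hypothetical and are not taken from the paper.

```python
def entity_recall(references, hypotheses, entities):
    """Share of reference entity mentions that are also found in the hypothesis.

    Assumed definition (not confirmed by the paper): an entity counts once per
    utterance whose reference contains it, and is a hit if the paired
    hypothesis contains the same string.
    """
    total = 0
    hits = 0
    for ref, hyp in zip(references, hypotheses):
        for entity in entities:
            if entity in ref:
                total += 1
                if entity in hyp:
                    hits += 1
    # Avoid division by zero when no entity occurs in any reference.
    return hits / total if total else 0.0


# Hypothetical example data: one entity is misrecognized, one is correct.
refs = ["call Yuang tomorrow", "open the Aishell corpus"]
hyps = ["call Young tomorrow", "open the Aishell corpus"]
hot_words = ["Yuang", "Aishell"]
recall = entity_recall(refs, hyps, hot_words)
```

Substring matching is the simplest choice here; an evaluation on real Chinese or code-switched data would likely need text normalization before comparison.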

Comments:	5 pages, 2 figures, Accepted to InterSpeech 2024
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2309.09552 [cs.AI]
	(or arXiv:2309.09552v4 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2309.09552

Submission history

From: Yuang Li
[v1] Mon, 18 Sep 2023 08:03:54 UTC (1,719 KB)
[v2] Sun, 14 Jan 2024 01:12:43 UTC (1,549 KB)
[v3] Tue, 23 Jan 2024 02:59:44 UTC (1,550 KB)
[v4] Thu, 6 Jun 2024 05:18:10 UTC (1,124 KB)
