VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints

Tang, Jinghua; Zhang, Liyun; Lu, Yu; Ding, Dian; Yang, Lanqing; Chen, YiChao; Bian, Minjie; Li, Xiaoshan; Xue, Guangtao

arXiv:2408.13019 (cs)

[Submitted on 23 Aug 2024]

Abstract: Emotion recognition can make machine responses to user commands more human-like, and voiceprint-based perception systems can be easily integrated into commonly used devices such as smartphones and stereos. Although Chinese has the largest number of speakers, high-quality corpora for emotion recognition from Chinese voiceprints are noticeably absent. This paper introduces the VCEMO dataset to address this deficiency. The proposed dataset is constructed from everyday conversations and comprises over 100 users and 7,747 textual samples. Furthermore, this paper proposes a multimodal model as a benchmark, which fuses speech, text, and external knowledge using a co-attention structure. To handle the uneven distribution of the dataset and the diversity of emotional expressions, the system employs contrastive learning-based regulation. Experiments demonstrate a significant improvement of the proposed model over state-of-the-art (SOTA) methods on the VCEMO and IEMOCAP datasets. Code and dataset will be released for research.
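The abstract does not specify how the co-attention structure is built, so the following is only a minimal, parameter-free sketch of the general idea of co-attention between two modalities, not the authors' model. It assumes speech and text have already been encoded as sequences of equal-width feature vectors; the function names (`attend`, `co_attention_fuse`), the toy inputs, and the mean-pooling step are all hypothetical choices made for illustration, and the external-knowledge input mentioned in the abstract is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """For each query vector, return an attention-weighted average of the key vectors.

    Uses scaled dot-product scores; there are no learned projections here,
    which is a simplification (an assumption of this sketch).
    """
    d = len(queries[0])
    contexts = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        contexts.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return contexts


def mean_pool(sequence):
    """Average a sequence of vectors into a single vector."""
    n = len(sequence)
    return [sum(v[j] for v in sequence) / n for j in range(len(sequence[0]))]


def co_attention_fuse(speech, text):
    """Hypothetical fusion: each modality attends over the other, then both are pooled."""
    speech_ctx = attend(speech, text)  # speech frames query the text tokens
    text_ctx = attend(text, speech)    # text tokens query the speech frames
    return mean_pool(speech_ctx) + mean_pool(text_ctx)  # list concatenation


# Toy features: 3 speech frames and 2 text tokens, each of width 2.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [1.0, 0.0]]
fused = co_attention_fuse(speech, text)
print(len(fused))  # 4: two pooled context vectors of width 2, concatenated
```

In a trained model the queries and keys would normally pass through learned projections, and the fused vector would feed an emotion classifier; both are left out here to keep the sketch self-contained.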

Comments:	12 pages, 4 figures
Subjects:	Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2408.13019 [cs.MM]
	(or arXiv:2408.13019v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2408.13019

Submission history

From: Dian Ding
[v1] Fri, 23 Aug 2024 12:14:18 UTC (650 KB)
