Emotion-Aligned Contrastive Learning Between Images and Music

Stewart, Shanti; Avramidis, Kleanthis; Feng, Tiantian; Narayanan, Shrikanth

arXiv:2308.12610 (cs)

[Submitted on 24 Aug 2023 (v1), last revised 8 Dec 2024 (this version, v3)]

Abstract: Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. While most approaches aim to match general music semantics to the input queries, only a few focus on affective qualities. In this work, we address the task of retrieving emotionally relevant music from image queries by learning an affective alignment between images and music audio. Our approach focuses on learning an emotion-aligned joint embedding space between images and music. This embedding space is learned via emotion-supervised contrastive learning, using an adapted cross-modal version of the SupCon loss. We evaluate the joint embeddings through cross-modal retrieval tasks (image-to-music and music-to-image) based on emotion labels. Furthermore, we investigate the generalizability of the learned music embeddings via automatic music tagging. Our experiments show that the proposed approach successfully aligns images and music, and that the learned embedding space is effective for cross-modal retrieval applications.
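
The abstract does not spell out the adapted loss, so the following is only a rough sketch of what a cross-modal supervised contrastive (SupCon-style) objective could look like, not the authors' implementation. It assumes that anchors come from one modality and candidates from the other, that two items are positives when they share an emotion label, and that the two retrieval directions are averaged. The function names, the temperature value, and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def cross_modal_supcon(anchor_emb, cand_emb, anchor_labels, cand_labels, temperature=0.1):
    """One-directional SupCon-style loss (illustrative sketch, not the paper's code).

    Anchors come from one modality (e.g. images) and candidates from the other
    (e.g. music clips). A candidate is a positive for an anchor when both carry
    the same emotion label. The temperature of 0.1 is an assumed default.
    """
    anchor_labels = np.asarray(anchor_labels)
    cand_labels = np.asarray(cand_labels)

    # L2-normalise so that dot products are cosine similarities.
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)

    # Temperature-scaled similarities, shape (n_anchors, n_candidates).
    logits = a @ c.T / temperature

    # Numerically stable log-softmax over the candidates of each anchor.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive mask: same emotion label across modalities.
    pos = (anchor_labels[:, None] == cand_labels[None, :]).astype(float)
    n_pos = pos.sum(axis=1)

    # Anchors with no positive in the batch are skipped.
    valid = n_pos > 0
    if not valid.any():
        return 0.0

    # Average negative log-likelihood of the positives, per anchor.
    per_anchor = -(pos * log_prob).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.mean())


def symmetric_supcon(img_emb, mus_emb, img_labels, mus_labels, temperature=0.1):
    """Average of the image-to-music and music-to-image directions (assumed)."""
    i2m = cross_modal_supcon(img_emb, mus_emb, img_labels, mus_labels, temperature)
    m2i = cross_modal_supcon(mus_emb, img_emb, mus_labels, img_labels, temperature)
    return 0.5 * (i2m + m2i)
```

Under these assumptions, the loss is small when images and music with the same emotion label lie close together in the joint space and large when same-label pairs are far apart, which is the property the retrieval evaluation by emotion label relies on.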

Comments:	Published at ICASSP 2024. Code: this https URL
Subjects:	Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2308.12610 [cs.MM]
	(or arXiv:2308.12610v3 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2308.12610

Submission history

From: Shanti Stewart
[v1] Thu, 24 Aug 2023 07:20:47 UTC (2,697 KB)
[v2] Wed, 20 Sep 2023 21:11:14 UTC (1,312 KB)
[v3] Sun, 8 Dec 2024 05:26:25 UTC (1,312 KB)
