Few-shot Adaptation of Medical Vision-Language Models

Shakeri, Fereshteh; Huang, Yunshi; Silva-Rodríguez, Julio; Bahig, Houda; Tang, An; Dolz, Jose; Ayed, Ismail Ben

arXiv:2409.03868 (cs)

[Submitted on 5 Sep 2024]

Abstract: Integrating image and text data through multi-modal learning has emerged as a new approach in medical imaging research, following its successful deployment in computer vision. While considerable effort has been dedicated to establishing medical foundation models and their zero-shot transfer to downstream tasks, the popular few-shot setting remains relatively unexplored. Following the recent rise of this setting in computer vision, we introduce the first structured benchmark for adapting medical vision-language models (VLMs) in a strict few-shot regime and investigate various adaptation strategies commonly used in the context of natural images. Furthermore, we evaluate a simple generalization of the linear-probe adaptation baseline, which seeks an optimal blending of the visual prototypes and text embeddings via learnable class-wise multipliers. Surprisingly, such a text-informed linear probe yields competitive performance in comparison to convoluted prompt-learning and adapter-based strategies, while running considerably faster and accommodating the black-box setting. Our extensive experiments span three different medical modalities and specialized foundation models, nine downstream tasks, and several state-of-the-art few-shot adaptation methods. We have made our benchmark and code publicly available to trigger further developments in this emergent subject: this https URL.
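
The abstract describes the text-informed linear probe only at a high level, so the following is a minimal sketch of one way such a blend could look, not the authors' implementation. It assumes class logits of the form f · (w_k + alpha_k * t_k), where w_k is initialized from the visual prototype (the mean support feature of class k), t_k is the frozen text embedding of class k, and alpha_k is a learnable class-wise multiplier. The function names, the plain gradient-descent loop, the hyperparameters, and the synthetic features are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale vectors to unit length along the given axis.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_text_informed_probe(feats, labels, text_emb, steps=200, lr=0.5):
    """Hypothetical sketch: logits_k = f . (w_k + alpha_k * t_k).

    feats:    (n, d) pre-computed image features of the support set
    labels:   (n,)   integer class labels, every class present at least once
    text_emb: (k, d) frozen text embeddings, one per class
    """
    n = feats.shape[0]
    k = text_emb.shape[0]
    onehot = np.eye(k)[labels]
    # Visual prototypes: normalized class means of the support features.
    w = l2_normalize(onehot.T @ feats / onehot.sum(axis=0)[:, None])
    alpha = np.ones(k)  # assumed initial value of the class-wise multipliers
    for _ in range(steps):
        classifier = w + alpha[:, None] * text_emb
        probs = softmax(feats @ classifier.T)
        g = (probs - onehot) / n      # gradient of mean cross-entropy w.r.t. logits
        grad_c = g.T @ feats          # gradient w.r.t. each classifier row, (k, d)
        w -= lr * grad_c
        # Chain rule: d(logit_k)/d(alpha_k) = f . t_k
        alpha -= lr * np.sum(grad_c * text_emb, axis=1)
    return w, alpha

def predict(feats, w, alpha, text_emb):
    # Predicted class index for each feature vector.
    return np.argmax(feats @ (w + alpha[:, None] * text_emb).T, axis=1)

# Synthetic stand-ins for encoder outputs: 3 classes, 4 shots each, 16 dimensions.
rng = np.random.default_rng(0)
centers = l2_normalize(rng.normal(size=(3, 16)))
support_labels = np.repeat(np.arange(3), 4)
support_feats = l2_normalize(centers[support_labels] + 0.05 * rng.normal(size=(12, 16)))
class_text = l2_normalize(centers + 0.05 * rng.normal(size=(3, 16)))

w_fit, alpha_fit = fit_text_informed_probe(support_feats, support_labels, class_text)
support_pred = predict(support_feats, w_fit, alpha_fit, class_text)
```

Because the sketch only consumes pre-computed image and text features, it never touches the encoders themselves, which is one reading of how a probe of this kind can accommodate the black-box setting mentioned in the abstract.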

Comments:	MICCAI 2024 (Spotlight) - Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.03868 [cs.CV]
	(or arXiv:2409.03868v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.03868

Submission history

From: Fereshteh Shakeri
[v1] Thu, 5 Sep 2024 19:10:29 UTC (839 KB)
