Pre-training image-language transformers for open-vocabulary tasks

Piergiovanni, AJ; Kuo, Weicheng; Angelova, Anelia

arXiv:2209.04372 (cs)

[Submitted on 9 Sep 2022]

Abstract: We present a pre-training approach for vision and language transformer models, which is based on a mixture of diverse tasks. We explore both the use of image-text captioning data in pre-training, which does not need additional supervision, and object-aware strategies to pre-train the model. We evaluate the method on a number of text-generative vision+language tasks, such as Visual Question Answering, visual entailment, and captioning, and demonstrate large gains over standard pre-training methods.

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2209.04372 [cs.CV] (or arXiv:2209.04372v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2209.04372

Submission history

From: Aj Piergiovanni
[v1] Fri, 9 Sep 2022 16:11:11 UTC (143 KB)
