Computer Science > Computer Vision and Pattern Recognition
This paper has been withdrawn by arXiv Admin
[Submitted on 28 Apr 2022 (v1), last revised 25 May 2022 (this version, v4)]
Title: Controllable Image Captioning
Abstract: State-of-the-art image captioners can generate accurate sentences describing images in a sequence-to-sequence manner, but without considering controllability or interpretability. This falls short of what is needed for image captioning to be widely used, since an image can be interpreted in many ways depending on the target audience and the context at hand. Controllability is especially important when the captioner serves users who interpret images in different ways. In this paper, we introduce a novel framework for image captioning that generates diverse descriptions by capturing the co-dependence between Part-Of-Speech tags and semantics. Our model decouples the direct dependence between successive variables, allowing the decoder to exhaustively search over latent Part-Of-Speech choices while keeping decoding cost proportional to the size of the POS vocabulary. Given a control signal in the form of a Part-Of-Speech tag sequence, our method generates captions with a Transformer network that predicts words conditioned on the input tag sequence. Experiments on publicly available datasets show that our model significantly outperforms state-of-the-art methods at generating diverse, high-quality image captions.
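The abstract describes conditioning a Transformer decoder on a user-supplied Part-Of-Speech tag sequence as the control signal. The sketch below is an illustrative assumption, not the authors' implementation (no code is associated with this withdrawn submission); all class names, dimensions, and parameters (e.g. POSConditionedCaptioner, feat_dim, max_len) are hypothetical and show only one plausible way to wire a POS-conditioned captioning decoder.

```python
# Minimal sketch (assumed, not the authors' code): a Transformer decoder whose
# word predictions are conditioned on a Part-Of-Speech (POS) tag sequence
# supplied as a control signal, with image region features as cross-attention memory.
import torch
import torch.nn as nn

class POSConditionedCaptioner(nn.Module):          # hypothetical name
    def __init__(self, vocab_size=10000, pos_vocab_size=20, d_model=512,
                 nhead=8, num_layers=6, feat_dim=2048, max_len=30):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.pos_tag_emb = nn.Embedding(pos_vocab_size, d_model)  # control signal embedding
        self.position_emb = nn.Embedding(max_len, d_model)
        self.feat_proj = nn.Linear(feat_dim, d_model)             # image features -> decoder memory
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, words, pos_tags, image_feats):
        # words:       (B, T) previously generated word ids
        # pos_tags:    (B, T) POS tag ids acting as the control signal
        # image_feats: (B, R, feat_dim) region features from a visual backbone
        B, T = words.shape
        positions = torch.arange(T, device=words.device).unsqueeze(0)
        # Each step's word prediction is conditioned on the POS tag for that step.
        tgt = self.word_emb(words) + self.pos_tag_emb(pos_tags) + self.position_emb(positions)
        memory = self.feat_proj(image_feats)
        causal_mask = torch.triu(
            torch.full((T, T), float('-inf'), device=words.device), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal_mask)
        return self.out(hidden)  # (B, T, vocab_size) word logits

# Usage: different POS tag sequences for the same image yield different captions.
model = POSConditionedCaptioner()
words = torch.randint(0, 10000, (2, 12))
pos_tags = torch.randint(0, 20, (2, 12))
feats = torch.randn(2, 36, 2048)
logits = model(words, pos_tags, feats)   # (2, 12, 10000)
```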
Submission history
From: arXiv Admin
[v1] Thu, 28 Apr 2022 07:47:49 UTC (85 KB)
[v2] Tue, 3 May 2022 18:25:35 UTC (9,446 KB)
[v3] Mon, 23 May 2022 00:26:12 UTC (9,452 KB)
[v4] Wed, 25 May 2022 17:56:19 UTC (1 KB) (withdrawn)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)