Teaching Matters: Investigating the Role of Supervision in Vision Transformers

Walmer, Matthew; Suri, Saksham; Gupta, Kamal; Shrivastava, Abhinav


arXiv:2212.03862 (cs)

[Submitted on 7 Dec 2022 (v1), last revised 5 Apr 2023 (this version, v2)]

Abstract: Vision Transformers (ViTs) have gained significant popularity in recent years and have proliferated into many applications. However, their behavior under different learning paradigms is not well explored. We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance. We also discover ViT behaviors that are consistent across supervision, including the emergence of Offset Local Attention Heads. These are self-attention heads that attend to a token adjacent to the current token with a fixed directional offset, a phenomenon that to the best of our knowledge has not been highlighted in any prior work. Our analysis shows that ViTs are highly flexible and learn to process local and global information in different orders depending on their training method. We find that contrastive self-supervised methods learn features that are competitive with explicitly supervised features, and they can even be superior for part-level tasks. We also find that the representations of reconstruction-based models show non-trivial similarity to contrastive self-supervised models. Project website (this https URL) and code (this https URL) are publicly available.
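To make the idea of an Offset Local Attention Head concrete, the sketch below builds a synthetic attention map over an h×w grid of patch tokens in which every query attends to the patch at a fixed offset (dy, dx), then recovers that offset with a simple argmax-based check. This is an illustration under stated assumptions, not the authors' analysis code: the function names `offset_attention` and `dominant_offset` are hypothetical, the CLS token is ignored, border targets are clamped to the grid, and the argmax detector is a swapped-in heuristic rather than the paper's method.

```python
import numpy as np


def offset_attention(h, w, dy, dx):
    """Hypothetical helper: build an (h*w, h*w) attention matrix over patch
    tokens (no CLS token) where each query attends fully to the patch at
    (row + dy, col + dx). Targets outside the grid are clamped to the border."""
    n = h * w
    attn = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            tr = min(max(r + dy, 0), h - 1)
            tc = min(max(c + dx, 0), w - 1)
            attn[r * w + c, tr * w + tc] = 1.0
    return attn


def dominant_offset(attn, h, w):
    """Hypothetical detector: take each query's most-attended key, compute the
    (dy, dx) displacement on the patch grid, and return the most common
    displacement plus the fraction of queries that share it."""
    n = h * w
    keys = attn.argmax(axis=1)          # most-attended key per query
    queries = np.arange(n)
    dy = keys // w - queries // w       # row displacement
    dx = keys % w - queries % w         # column displacement
    offsets, counts = np.unique(
        np.stack([dy, dx], axis=1), axis=0, return_counts=True
    )
    best = counts.argmax()
    return tuple(int(v) for v in offsets[best]), float(counts[best]) / n


if __name__ == "__main__":
    pattern = offset_attention(4, 4, dy=0, dx=1)
    print(dominant_offset(pattern, 4, 4))  # ((0, 1), 0.75)
```

On a 4×4 grid with a one-patch rightward offset, the 12 queries not in the last column share the displacement (0, 1); the 4 clamped border queries attend to themselves, so the dominant offset covers 0.75 of queries. A head with a high shared fraction at a small nonzero offset would be a candidate for this behavior, whereas a globally attending head would show no dominant displacement.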

Comments:	Website: see this https URL. Code: see this https URL. The first two authors contributed equally. Accepted to CVPR 2023 as a conference paper.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2212.03862 [cs.CV]
	(or arXiv:2212.03862v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.03862

Submission history

From: Matthew Walmer
[v1] Wed, 7 Dec 2022 18:59:45 UTC (13,364 KB)
[v2] Wed, 5 Apr 2023 18:14:23 UTC (15,124 KB)