Representation Separation for Semantic Segmentation with Vision Transformers

Hong, Yuanduo; Pan, Huihui; Sun, Weichao; Yu, Xinghu; Gao, Huijun

Computer Science > Computer Vision and Pattern Recognition

arXiv:2212.13764 (cs)

[Submitted on 28 Dec 2022]

Abstract: Vision transformers (ViTs), which encode an image as a sequence of patches, bring new paradigms for semantic segmentation. We present an efficient framework of representation separation at the local-patch level and the global-region level for semantic segmentation with ViTs. It targets the over-smoothness peculiar to ViTs in semantic segmentation, and therefore differs both from the currently popular paradigms of context modeling and from most existing related methods, which reinforce the advantage of attention. We first introduce a decoupled two-pathway network in which a second pathway enhances and passes down local-patch discrepancy, complementing the global representations of transformers. We then propose a spatially adaptive separation module, which obtains more separate deep representations, and a discriminative cross-attention, which yields more discriminative region representations through novel auxiliary supervisions. The proposed methods achieve the following results: 1) incorporated with large-scale plain ViTs, our methods achieve new state-of-the-art performance on five widely used benchmarks; 2) using masked pre-trained plain ViTs, we achieve 68.9% mIoU on Pascal Context, setting a new record; 3) pyramid ViTs integrated with the decoupled two-pathway network even surpass well-designed high-resolution ViTs on Cityscapes; 4) the representations improved by our framework transfer favorably to images with natural corruptions. The code will be released publicly.
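The results above are reported in mIoU (mean intersection-over-union). For reference, the sketch below computes mIoU in the standard way: accumulate a confusion matrix, take IoU = TP / (TP + FP + FN) per class, and average over classes. It is a generic illustration, not the authors' evaluation code; the helper names and the `ignore_index=255` convention for unlabelled pixels are assumptions.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes confusion matrix.

    Rows index ground-truth classes, columns index predicted classes.
    Pixels whose target equals `ignore_index` (assumed convention) are skipped.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    # Encode each (target, pred) pair as a single index, then count occurrences.
    idx = target[valid] * num_classes + pred[valid]
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Return (mIoU, per-class IoU); classes absent from both pred and target are NaN."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled class c but predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return float(np.nanmean(iou)), iou


# Toy 2x3 label map with one ignored pixel (255).
target = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
conf = confusion_matrix(pred, target, num_classes=3)
miou, per_class = mean_iou(conf)
print(f"mIoU = {miou:.4f}")  # mIoU = 0.7222 (per-class IoU 0.5, 0.667, 1.0)
```

On a benchmark, the confusion matrix is usually summed over the whole validation set before computing IoU, rather than averaging per-image mIoU values; the two can differ noticeably.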

Comments:	17 pages, 13 figures. This work has been submitted to the IEEE for possible publication
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2212.13764 [cs.CV]
	(or arXiv:2212.13764v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.13764

Submission history

From: Yuanduo Hong
[v1] Wed, 28 Dec 2022 09:54:52 UTC (13,293 KB)