Computer Science > Computer Vision and Pattern Recognition
[Submitted on 20 Apr 2022 (v1), last revised 4 Oct 2022 (this version, v3)]
Title: Residual Mixture of Experts
Abstract: Mixture of Experts (MoE) can scale up vision transformers effectively. However, training a large MoE transformer requires prohibitive computational resources. In this paper, we propose Residual Mixture of Experts (RMoE), an efficient training pipeline for MoE vision transformers on downstream tasks such as segmentation and detection. RMoE achieves results comparable to the upper-bound MoE training while introducing only minor additional training cost over the lower-bound non-MoE training pipelines. The efficiency rests on our key observation: the weights of an MoE transformer can be factored into an input-independent core and an input-dependent residual. Compared with the weight core, the weight residual can be trained efficiently with far less computation, e.g., by finetuning on the downstream data. We show that, compared with the current MoE training pipeline, we obtain comparable results while saving over 30% of the training cost. Compared with state-of-the-art non-MoE transformers such as Swin-T / CvT-13 / Swin-L, we obtain +1.1 / 0.9 / 1.0 mIoU gains on ADE20K segmentation and +1.4 / 1.6 / 0.6 AP gains on MS-COCO object detection with less than 3% additional training cost.
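To make the core/residual factorization concrete, below is a minimal PyTorch-style sketch of an MoE feed-forward layer whose per-expert weights are expressed as a shared, input-independent core plus small per-expert residuals that alone are finetuned on downstream data. The class name, parameter shapes, and top-1 routing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ResidualMoEFFN(nn.Module):
    """Illustrative MoE feed-forward layer: effective expert weight =
    shared core (frozen, input-independent) + per-expert residual
    (trainable, input-dependent). Names and shapes are assumptions."""

    def __init__(self, dim, hidden_dim, num_experts, freeze_core=True):
        super().__init__()
        # Input-independent core weight shared by all experts.
        self.core = nn.Parameter(torch.empty(hidden_dim, dim))
        nn.init.xavier_uniform_(self.core)
        if freeze_core:
            self.core.requires_grad = False  # only residuals are finetuned downstream
        # Per-expert residuals, initialized to zero and trained cheaply.
        self.residuals = nn.Parameter(torch.zeros(num_experts, hidden_dim, dim))
        self.router = nn.Linear(dim, num_experts)
        self.proj = nn.Linear(hidden_dim, dim)

    def forward(self, x):                                      # x: (tokens, dim)
        # Top-1 routing: each token selects one expert.
        expert_idx = self.router(x).argmax(dim=-1)             # (tokens,)
        # Effective weight of the selected expert = core + residual.
        w = self.core + self.residuals[expert_idx]             # (tokens, hidden_dim, dim)
        h = torch.relu(torch.einsum('thd,td->th', w, x))       # per-token expert FFN
        return self.proj(h)                                    # (tokens, dim)


# Usage: route a batch of tokens through the layer.
layer = ResidualMoEFFN(dim=64, hidden_dim=128, num_experts=4)
out = layer(torch.randn(10, 64))
print(out.shape)  # torch.Size([10, 64])
```

With the core frozen, only the residual tensors and the router receive gradients, which is one way the paper's reported training-cost savings on downstream tasks could be realized in practice.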
Submission history
From: Lemeng Wu
[v1] Wed, 20 Apr 2022 17:29:48 UTC (4,153 KB)
[v2] Wed, 8 Jun 2022 07:58:28 UTC (4,154 KB)
[v3] Tue, 4 Oct 2022 10:07:03 UTC (8,312 KB)