NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

Meinhardt, Tim; Feiszli, Matt; Fan, Yuchen; Leal-Taixe, Laura; Ranjan, Rakesh

Computer Science > Computer Vision and Pattern Recognition

arXiv:2308.15266 (cs)

[Submitted on 29 Aug 2023 (v1), last revised 18 Sep 2023 (this version, v2)]

Abstract: Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to frame-by-frame online processing. However, the recent success of online methods questions this belief, in particular for challenging and long video sequences. We understand this work as a rebuttal of those recent observations and an appeal to the community to focus on dedicated near-online VIS approaches. To support our argument, we present a detailed analysis of different processing paradigms and the new end-to-end trainable NOVIS (Near-Online Video Instance Segmentation) method. Our transformer-based model directly predicts spatio-temporal mask volumes for clips of frames and performs instance tracking between clips via overlap embeddings. NOVIS represents the first near-online VIS approach that avoids any handcrafted tracking heuristics. We outperform all existing VIS methods by large margins and provide new state-of-the-art results on both the YouTube-VIS (2019/2021) and OVIS benchmarks.
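The abstract contrasts three processing paradigms: online (one frame at a time), offline (the whole video at once), and near-online (clip by clip, with tracking between clips). As a hedged illustration only, the sketch below shows how a video could be split into overlapping clips so that consecutive clips share frames. The function names `clip_windows` and `overlap_frames` and the clip length and stride values are hypothetical and not taken from the paper. The sketch does not model the NOVIS network or its learned overlap embeddings.

```python
from typing import List, Tuple


def clip_windows(num_frames: int, clip_len: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges covering a video (hypothetical helper).

    clip_len == 1 mimics online (frame-by-frame) processing,
    clip_len >= num_frames mimics offline processing, and
    1 < clip_len < num_frames with stride < clip_len gives overlapping
    near-online clips that share clip_len - stride frames.
    """
    if num_frames <= 0 or clip_len <= 0 or stride <= 0:
        raise ValueError("num_frames, clip_len and stride must be positive")
    if stride > clip_len:
        raise ValueError("stride > clip_len would skip frames")
    windows = []
    start = 0
    while True:
        # Clamp the last clip to the end of the video.
        end = min(start + clip_len, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


def overlap_frames(a: Tuple[int, int], b: Tuple[int, int]) -> List[int]:
    """Frame indices shared by two [start, end) clips (empty if disjoint)."""
    return list(range(max(a[0], b[0]), min(a[1], b[1])))


if __name__ == "__main__":
    clips = clip_windows(num_frames=10, clip_len=5, stride=3)
    print(clips)                              # [(0, 5), (3, 8), (6, 10)]
    print(overlap_frames(clips[0], clips[1]))  # [3, 4]
```

In a near-online setting, the frames shared by consecutive clips are where instances predicted in one clip can be linked to those in the next; how that link is made (here left open) is exactly what the abstract says NOVIS learns end-to-end instead of using handcrafted heuristics.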

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.15266 [cs.CV]
	(or arXiv:2308.15266v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.15266

Submission history

From: Tim Meinhardt
[v1] Tue, 29 Aug 2023 12:51:04 UTC (7,111 KB)
[v2] Mon, 18 Sep 2023 14:46:11 UTC (7,112 KB)