Faster run-length compressed suffix arrays

Brown, Nathaniel K.; Gagie, Travis; Manzini, Giovanni; Navarro, Gonzalo; Sciortino, Marinella

arXiv:2408.04537 (cs)

[Submitted on 8 Aug 2024 (v1), last revised 16 Feb 2025 (this version, v5)]

Abstract: We first review how to store a run-length compressed suffix array (RLCSA) for a text $T$ of length $n$ over an alphabet of size $\sigma$, whose Burrows-Wheeler Transform (BWT) consists of $r$ runs, in $O(r \log (n / r) + r \log \sigma + \sigma)$ bits such that later, given a character $a$ and the suffix-array (SA) interval for $P$, we can find the SA interval for $a P$ in $O(\log r_a + \log \log n)$ time, where $r_a$ is the number of runs of copies of $a$ in the BWT. We then show how to modify the RLCSA so that we find the SA interval for $a P$ in only $O(\log r_a)$ time, without increasing its asymptotic space bound. Our key idea is to apply a result by Nishimoto and Tabei (ICALP 2021) and then replace rank queries on sparse bitvectors with a constant number of select queries. Finally, we review two-level indexing and discuss how our faster RLCSA may be useful in improving it.
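
To make the operation in the abstract concrete, the following is a minimal Python sketch of one backward-search step over a run-length encoded BWT: given a character $a$ and the SA interval for $P$, it returns the SA interval for $a P$ by binary searching over the runs of $a$ only. It is an illustration under stated assumptions, not the data structure from the paper: the names `build_rlbwt`, `rank`, `backward_step` and `count` are hypothetical, the BWT is built by naive suffix sorting, the runs are kept in plain Python lists rather than compressed bitvectors, and the text is assumed to contain only characters larger than the sentinel `$`.

```python
from bisect import bisect_right
from itertools import groupby


def build_rlbwt(text):
    """Naively build the BWT of text + '$' and index its runs per character.

    Assumption: every character of text is larger than the sentinel '$'.
    """
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive suffix array
    bwt = "".join(t[i - 1] for i in sa)              # t[-1] is the sentinel
    starts, prefix = {}, {}
    pos = 0
    for ch, grp in groupby(bwt):
        length = len(list(grp))
        starts.setdefault(ch, []).append(pos)        # start of this run of ch
        p = prefix.setdefault(ch, [0])
        p.append(p[-1] + length)                     # copies of ch up to this run
        pos += length
    # C[a] = number of characters in t that are smaller than a
    C, total = {}, 0
    for ch in sorted(prefix):
        C[ch] = total
        total += prefix[ch][-1]
    return {"starts": starts, "prefix": prefix, "C": C, "n": len(t)}


def rank(index, a, i):
    """Count copies of a in bwt[0:i] with one binary search over the runs of a."""
    starts = index["starts"].get(a)
    if not starts:
        return 0
    k = bisect_right(starts, i - 1)                  # runs of a starting before i
    if k == 0:
        return 0
    prefix = index["prefix"][a]
    run_len = prefix[k] - prefix[k - 1]
    return prefix[k - 1] + min(run_len, i - starts[k - 1])


def backward_step(index, a, lo, hi):
    """Map the half-open SA interval [lo, hi) for P to the interval for aP."""
    if a not in index["C"]:
        return 0, 0
    base = index["C"][a]
    return base + rank(index, a, lo), base + rank(index, a, hi)


def count(index, pattern):
    """Count occurrences of pattern by repeated backward steps."""
    lo, hi = 0, index["n"]
    for a in reversed(pattern):
        lo, hi = backward_step(index, a, lo, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count(build_rlbwt("banana"), "ana")` walks the intervals for `a`, `na` and `ana` in turn. Each step costs two binary searches over the runs of one character, which matches the $\log r_a$ term in the bounds above; this sketch does not model the additional $\log \log n$ term or the way the paper removes it.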

Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2408.04537 [cs.DS]
	(or arXiv:2408.04537v5 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2408.04537

Submission history

From: Travis Gagie
[v1] Thu, 8 Aug 2024 15:44:37 UTC (4 KB)
[v2] Sun, 11 Aug 2024 22:27:12 UTC (5 KB)
[v3] Mon, 9 Sep 2024 21:46:04 UTC (39 KB)
[v4] Thu, 6 Feb 2025 14:46:17 UTC (537 KB)
[v5] Sun, 16 Feb 2025 21:01:25 UTC (540 KB)
