Tutorial 7
Tutorial 7
Wisely
1
Introduction of BWT
• Burrows and Wheeler introduced a new
compression algorithm based on a reversible
transformation now called the Burrows-Wheeler
Transform (BWT)
2
Transform Steps
(1) Append at the end of a text T a special character
$ smaller than any other text character
(2) Form a conceptual matrix MT whose rows are
the cyclic shifts of the string T$ sorted in
lexicographic order
3
Example :
F Tbwt
4
Notation
(1) Let C[ ] be an array of length |Σ| such that
C[c] = total # of text characters which are
alphabetically smaller than c
5
Example :
C table Occ(c,q)
F Tbwt
$ i m p s $ i m p s
0 1 5 6 8 0 1 0 0 0
0 1 0 1 0
0 1 0 1 1
0 1 0 1 2
0 1 1 1 2
1 1 1 1 2
1 1 1 2 2
1 2 1 2 2
1 2 1 2 3
1 2 1 2 4
1 3 1 2 4
1 4 1 2 4
6
Last to Front Mapping
F Tbwt
7
Backward Search
• The LF( ) mapping allows us to scan the text T
backward.
• In other words, we could search a pattern in T
backward. (How?)
8
Backward Search Algorithm
Backward_Search( P[1,p] )
{
i = p, c = P[p], First = C[c]+1, Last = C[c+1];
while ( ( First ≦ Last) and i ≧ 2 ) {
c = P[i-1];
First = C[c] + Occ(c, First-1)+1;
Last = C[c] + Occ(c, Last);
i = i-1;
}
if ( Last < First ) then
return “no occurrence” ;
else return ( First, Last );
}
9
References
[1] M. Burrows and D. Wheeler (1994).
A Block Sorting Lossless Data Compression Algorithm.
Technical Report 124, Digital Equipment Corporation.
[2] G. Manzini (2001).
An Analysis of the Burrows-Wheeler Transform.
Journal of the ACM, 48(3): pages 407 – 430.
[3] P. Ferragina and G. Manzini (2000).
Opportunistic Data Structures with Applications.
In Proceedings of FOCS, pages 390 – 398.
[4] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro (2004).
An Alphabet-Friendly FM-Index.
In Proceedings of SPIRE, pages 150 – 160.
10