0% found this document useful (0 votes)
33 views

Tutorial 7

The Burrows-Wheeler Transform (BWT) is a reversible data transformation used in data compression. It involves sorting all cyclic shifts of the input text and outputting the last column. This transforms the text in a way that groups similar characters together. The BWT allows scanning a text backward efficiently using a last-to-front mapping, enabling fast pattern searching in compressed text. It underlies compression tools like bzip2 and forms the basis for full-text indexes like the FM-index.

Uploaded by

Saad Rabbi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views

Tutorial 7

The Burrows-Wheeler Transform (BWT) is a reversible data transformation used in data compression. It involves sorting all cyclic shifts of the input text and outputting the last column. This transforms the text in a way that groups similar characters together. The BWT allows scanning a text backward efficiently using a last-to-front mapping, enabling fast pattern searching in compressed text. It underlies compression tools like bzip2 and forms the basis for full-text indexes like the FM-index.

Uploaded by

Saad Rabbi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

Burrows-Wheeler Transform

Wisely

1
Introduction of BWT
• Burrows and Wheeler introduced a new
compression algorithm based on a reversible
transformation now called the Burrows-Wheeler
Transform (BWT)

• BWT is applied in data compression techniques


such as bzip2 (http://bzip.org/)

2
Transform Steps
(1) Append at the end of a text T a special character
$ smaller than any other text character
(2) Form a conceptual matrix MT whose rows are
the cyclic shifts of the string T$ sorted in
lexicographic order

(3) Construct the transformed text Tbwt by taking the


last column of matrix MT

3
Example :

F Tbwt

4
Notation
(1) Let C[ ] be an array of length |Σ| such that
C[c] = total # of text characters which are
alphabetically smaller than c

(2) Let Occ(c,q) denote # of occurrences of


character c in the prefix Tbwt[1, q].

(3) Let LF(i) = C[ Tbwt[i] ] + Occ(Tbwt[i] , i )

5
Example :
C table Occ(c,q)
F Tbwt
$ i m p s $ i m p s
0 1 5 6 8 0 1 0 0 0
0 1 0 1 0
0 1 0 1 1
0 1 0 1 2
0 1 1 1 2
1 1 1 1 2
1 1 1 2 2
1 2 1 2 2
1 2 1 2 3
1 2 1 2 4
1 3 1 2 4
1 4 1 2 4
6
Last to Front Mapping
F Tbwt

• LF( ) = Last-to-Front Column Mapping


• The character Tbwt[i] is located in
the first column F at position LF[i]

• LF(10) = C[s] + Occ(s,10)=12.


• Both Tbwt[10] and F[12] correspond
to the first s in the mississippi

7
Backward Search
• The LF( ) mapping allows us to scan the text T
backward.
• In other words, we could search a pattern in T
backward. (How?)

8
Backward Search Algorithm
Backward_Search( P[1,p] )
{
i = p, c = P[p], First = C[c]+1, Last = C[c+1];
while ( ( First ≦ Last) and i ≧ 2 ) {
c = P[i-1];
First = C[c] + Occ(c, First-1)+1;
Last = C[c] + Occ(c, Last);
i = i-1;
}
if ( Last < First ) then
return “no occurrence” ;
else return ( First, Last );
}
9
References
[1] M. Burrows and D. Wheeler (1994).
A Block Sorting Lossless Data Compression Algorithm.
Technical Report 124, Digital Equipment Corporation.
[2] G. Manzini (2001).
An Analysis of the Burrows-Wheeler Transform.
Journal of the ACM, 48(3): pages 407 – 430.
[3] P. Ferragina and G. Manzini (2000).
Opportunistic Data Structures with Applications.
In Proceedings of FOCS, pages 390 – 398.
[4] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro (2004).
An Alphabet-Friendly FM-Index.
In Proceedings of SPIRE, pages 150 – 160.
10

You might also like