Text Compression: Examples Huffman Coding: Go Go Gophers
Text Compression: Examples Huffman Coding: Go Go Gophers
Text Compression: Examples Huffman Coding: Go Go Gophers
“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS” “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
1 1
F C
11 6 5 5 4 4 3 3 3 3 2 2 2 2 2 1 1 1 11 6 5 5 4 4 3 3 3 3 2 2 2 2 2 2 1
CPS 100 8.5 CPS 100 8.6
I E N S M A B O T G D L R U P F C I E N S M A B O T G D L R U P
“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS” “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
2 3 2 3 2 2
L R
1 1 1 2 1 1 1 2
C F P U C F P U
11 6 5 5 4 4 3 3 3 3 3 2 2 2 2 2 11 6 5 5 4 4 4 3 3 3 3 3 2 2 2
CPS 100 8.7 CPS 100 8.8
I E N S M A B O T G D L R I E N S M A B O T G D
Building a tree Building a tree
“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS” “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
23 37 23
21 16 12 11 21 16 12 11
10 11 8 8 6 6 10 11 8 8 6 6
I I
5 5 5 6 4 4 4 4 3 3 5 5 5 6 4 4 4 4 3 3
E N S M A B E N S M A B
3 2 3 3 2 2 2 2 3 2 3 3 2 2 2 2
O T G D L R O T G D L R
1 1 1 2 1 1 1 2
C F P U C F P U
23 21 16 37 23
CPS 100 8.9 CPS 100 8.10
60
CPS 100 8.11 CPS 100 8.12
Huffman Complexities Writing code out to file
How do we measure? Size of input file, size of alphabet How do we go from characters to encodings?
Which is typically bigger? Build Huffman tree
Root-to-leaf path generates encoding
Accumulating character counts: ______
How can we do this in O(1) time, though not really Need way of writing bits out to file
Building the heap/priority queue from counts ____ Platform dependent?
Initializing heap guaranteed Complicated to write bits and read in same ordering
Building Huffman tree ____
Why? See BitInputStream and BitOutputStream classes
Create table of encodings from tree ____ Depend on each other, bit ordering preserved
Why?
Write tree and compressed file _____ How do we know bits come from compressed file?
Store a magic number
01100000100001001101 1100000100001001101
60 60
37 23 37 23
21 16 12 11 21 16 12 11
10 11 8 8 6 6 10 11 8 8 6 6
I I
5 5 5 6 4 4 4 4 3 3 5 5 5 6 4 4 4 4 3 3
E N S M A B E N S M A B
3 2 3 3 2 2 2 2 3 2 3 3 2 2 2 2
O T G D L R O T G D L R
1 1 1 2 1 1 1 2
C F P U C F P U
100000100001001101 00000100001001101
60 60
37 23 37 23
21 16 12 11 21 16 12 11
10 11 8 8 6 6 10 11 8 8 6 6
I I
5 5 5 6 4 4 4 4 3 3 5 5 5 6 4 4 4 4 3 3
E N S M A B E N S M A B
3 2 3 3 2 2 2 2 3 2 3 3 2 2 2 2
O T G D L R O T G D L R
1 1 1 2 1 1 1 2
C F P U C F P U
0000100001001101 1
60 60
37 23 37 23
21 16 12 11 21 16 12 11
10 11 8 8 6 6 10 11 8 8 6 6
I I
5 5 5 6 4 4 4 4 3 3 5 5 5 6 4 4 4 4 3 3
E N S M A B E N S M A B
3 2 3 3 2 2 2 2 3 2 3 3 2 2 2 2
O T G D L R O T G D L R
1 1 1 2 1 1 1 2
C F P U C F P U