CSEP 590
Data Compression
Autumn 2007
Course Policies
Introduction to Data Compression
Entropy
Variable Length Codes
Instructors
• Instructor
– Richard Ladner
– ladner@cs.washington.edu
– 206 543-9347
• TA
– Rahul Vanam
– rahulv@u.washington.edu
Helpful Knowledge
• Algorithm Design and Analysis
• Probability
Resources
• Text Book
– Khalid Sayood, Introduction to Data Compression,
Third Edition, Morgan Kaufmann Publishers, 2006.
• Course Web Page
– http://www.cs.washington.edu/csep590a
• Papers and Sections from Books
• Discussion Board
– For discussion
Engagement by Students
• Weekly Assignments
– Understand compression methodology
– Due in class on Fridays (except midterm Friday)
– No late assignments accepted except with prior
approval
• Programming Projects
– Bi-level arithmetic coder and decoder.
– Build code and experiment
Final Exam and Grading
• 6:30-8:20 p.m. Thursday, Dec. 13, 2007
• Percentages
– Weekly assignments (50%)
– Project (20%)
– Final exam (30%)
Logistics
• I will be gone the week of October 15th. We’ll
need to have a make-up class.
• There is no class Thanksgiving week,
November 19th.
• We have some guest speakers toward the
end of the quarter.
Basic Data Compression Concepts
x (original) → Encoder → y (compressed) → Decoder → x̂ (decompressed)
• Lossless compression: x = x̂
– Also called entropy coding, reversible coding.
• Lossy compression: x ≠ x̂
– Also called irreversible coding.
• Compression ratio = |x| / |y|
– |x| is the number of bits in x.
Why Compress
• Conserve storage space
• Reduce time for transmission
– Faster to encode, send, then decode than to send
the original
• Progressive transmission
– Some compression techniques allow us to send
the most important bits first so we can get a low
resolution version of some data before getting the
high fidelity version
• Reduce computation
– Use less data to achieve an approximate answer
Braille
• System to read text by feeling raised dots on
paper (or on electronic displays). Invented in
the 1820s by Louis Braille, a blind Frenchman.
[Braille cells shown for the letters a, b, c, z; the word signs and, the, with, mother; and the contractions th, ch, gh]
Braille Example
Clear text:
Call me Ishmael. Some years ago -- never mind how
long precisely -- having little or no money in my purse,
and nothing particular to interest me on shore, I thought
I would sail about a little and see the watery part of the
world. (238 characters)
Grade 2 Braille in ASCII:
,call me ,i%mael4 ,``s ye>s ago -- n``e m9d h[ l;g
precisely -- hav+ ll or no m``oy 9 my purse1 & no?+
``picul> 6 9t]e/ me on %ore1 ,i ?``| ,i wd sail
ab a ll & see ! wat]y ``p ( ! _w4 (203 characters)
Compression ratio = 238/203 = 1.17
Lossless Compression
• Data is not lost - the original is really needed.
– text compression
– compression of computer binary files
• Compression ratio typically no better than 4:1 for
lossless compression on many kinds of files.
• Statistical Techniques
– Huffman coding
– Arithmetic coding
– Golomb coding
• Dictionary techniques
– LZW, LZ77
– Sequitur
– Burrows-Wheeler Method
• Standards - Morse code, Braille, Unix compress, gzip,
zip, bzip, GIF, JBIG, Lossless JPEG
Lossy Compression
• Data is lost, but not too much.
– audio
– video
– still images, medical images, photographs
• Compression ratios of 10:1 often yield quite
high fidelity results.
• Major techniques include
– Vector Quantization
– Wavelets
– Block transforms
– Standards - JPEG, JPEG2000, MPEG 2, H.264
Why is Data Compression Possible
• Most data from nature has redundancy
– There is more data than the actual information
contained in the data.
– Squeezing out the excess data amounts to
compression.
– However, unsqueezing is necessary to be able to
figure out what the data means.
• Information theory is needed to understand
the limits of compression and give clues on
how to compress well.
What is Information
• Analog data
– Also called continuous data
– Represented by real numbers (or complex
numbers)
• Digital data
– Finite set of symbols {a1, a2, ... , am}
– All data represented as sequences (strings) in the
symbol set.
– Example: {a,b,c,d,r} abracadabra
– Digital data can be an approximation to analog
data
Symbols
• Roman alphabet plus punctuation
• ASCII - 128 symbols (256 in extended ASCII)
• Binary - {0,1}
– 0 and 1 are called bits
– All digital information can be represented
efficiently in binary
– {a,b,c,d} fixed length representation
symbol a b c d
binary 00 01 10 11
– 2 bits per symbol
Exercise - How Many Bits Per
Symbol?
• Suppose we have n symbols. How many bits
(as a function of n) are needed to represent
a symbol in binary?
– First try n a power of 2.
Discussion: Non-Powers of Two
• Can we do better than a fixed length
representation for non-powers of two?
Information Theory
• Developed by Shannon in the 1940’s and 50’s
• Attempts to explain the limits of communication
using probability theory.
• Example: Suppose English text is being sent
– It is much more likely to receive an “e” than a “z”.
– In some sense “z” has more information than “e”.
First-order Information
• Suppose we are given symbols {a1, a2, ... , am}.
• P(ai) = probability of symbol ai occurring in the
absence of any other information.
P(a1) + P(a2) + ... + P(am) = 1
• inf(ai) = log2(1/P(ai)) is the information of ai,
measured in bits.
[Plot: y = -log2(x) for 0 < x ≤ 1; less probable symbols carry more information]
Example
• {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8
– inf(a) = log2(8) = 3
– inf(b) = log2(4) = 2
– inf(c) = log2(8/5) = .678
• Receiving an “a” has more information than
receiving a “b” or “c”.
First Order Entropy
• The first order entropy is defined for a probability
distribution over symbols {a1, a2, ... , am}.
H = Σ_{i=1}^{m} P(ai) log2(1/P(ai))
• H is the average number of bits required to code up a
symbol, given all we know is the probability distribution
of the symbols.
• H is the Shannon lower bound on the average number of
bits to code a symbol in this “source model”.
• Stronger models of entropy include context.
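A minimal sketch of this definition in Python (the function name and example values are mine; the example matches the next slide):

import math

def first_order_entropy(probs):
    # H = sum of P(ai) * log2(1/P(ai)); zero-probability symbols
    # contribute nothing by convention.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# {a, b, c} with probabilities 1/8, 1/4, 5/8 (next slide):
print(first_order_entropy([1/8, 1/4, 5/8]))  # ~1.3 bits/symbol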
Entropy Examples
• {a, b, c} with a 1/8, b 1/4, c 5/8.
– H = 1/8 *3 + 1/4 *2 + 5/8* .678 = 1.3 bits/symbol
• {a, b, c} with a 1/3, b 1/3, c 1/3. (worst case)
– H = 3* (1/3)*log2(3) = 1.6 bits/symbol
• Note that a standard code takes 2 bits per
symbol
symbol a b c
binary code 00 01 10
An Extreme Case
• {a, b, c} with a 1, b 0, c 0
– H=?
Entropy Curve
• Suppose we have two symbols with probabilities
x and 1-x, respectively.
[Plot: entropy -(x log2 x + (1-x) log2(1-x)) vs. the probability x of the first symbol; maximum entropy of 1 bit at x = .5]
A Simple Prefix Code
• {a, b, c} with a 1/8, b 1/4, c 5/8.
• A prefix code is defined by a binary tree
• Prefix code property
– no output is a prefix of another
binary tree (0 = left, 1 = right):
the root's 1-child is the leaf c;
the root's 0-child has children a (0) and b (1)
code: a → 00, b → 01, c → 1
Example: ccabccbccc encodes as
1 1 00 01 1 1 01 1 1 1
Binary Tree Terminology
[Figure: a binary tree with its root, an internal node, and a leaf labeled]
1. Each node, except the root, has a unique parent.
2. Each internal node has exactly two children.
Decoding a Prefix Code
repeat
  start at root of tree
  repeat
    if read bit = 1 then go right
    else go left
  until node is a leaf
  report leaf
until end of the code

Tree: a → 00, b → 01, c → 1. Input: 11000111100
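The loop above translates almost directly into Python. A sketch, using a nested pair for each internal node and a plain string for each leaf (this tree representation is my choice, not the slides'):

# Tree from this slide: a -> 00, b -> 01, c -> 1.
tree = (("a", "b"), "c")   # (left, right); a leaf is its symbol

def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[1] if bit == "1" else node[0]  # 1 -> right, 0 -> left
        if isinstance(node, str):                  # reached a leaf
            out.append(node)
            node = root                            # restart at the root
    return "".join(out)

print(decode("11000111100", tree))  # ccabccca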
Decoding a Prefix Code (worked example)
Tree: a → 00, b → 01, c → 1
Input: 11000111100
Reading left to right: 1 → c, 1 → c, 00 → a, 01 → b,
1 → c, 1 → c, 1 → c, 00 → a
Decoded string: ccabccca
Exercise Encode/Decode
Prefix code: a → 0, d → 11, b → 100, c → 101
• Player 1: Encode a symbol string
• Player 2: Decode the string
• Check for equality
How Good is the Code
Tree: c → 1 (prob 5/8), a → 00 (prob 1/8), b → 01 (prob 1/4)
bit rate = (1/8)·2 + (1/4)·2 + (5/8)·1 = 11/8 = 1.375 bps
Entropy = 1.3 bps
Standard code = 2 bps
(bps = bits per symbol)
Design a Prefix Code 1
• abracadabra
• Design a prefix code for the 5 symbols
{a,b,r,c,d} which compresses this string the
most.
Design a Prefix Code 2
• Suppose we have n symbols each with
probability 1/n. Design a prefix code with
minimum average bit rate.
• Consider n = 2,3,4,5,6 first.
Huffman Coding
• Huffman (1951)
• Uses frequencies of symbols in a string to build a
variable rate prefix code.
– Each symbol is mapped to a binary string.
– More frequent symbols have shorter codes.
– No code is a prefix of another.
• Example:
  a 0
  b 100
  c 101
  d 11
(tree: a is the root's 0-child; under the 1-branch, d is at 11
and the pair b, c sit below 10)
Variable Rate Code Example
• Example: a 0, b 100, c 101, d 11
• Coding:
– aabddcaa with a fixed 2-bit code = 16 bits
– 0 0 100 11 11 101 0 0 = 14 bits
• Prefix code ensures unique decodability.
– 00100111110100
– aabddcaa
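Encoding is a dictionary lookup per symbol; a one-line sketch that reproduces the 14-bit example above:

# Code from this slide: a -> 0, b -> 100, c -> 101, d -> 11.
code = {"a": "0", "b": "100", "c": "101", "d": "11"}

def encode(text):
    return "".join(code[ch] for ch in text)

bits = encode("aabddcaa")
print(bits, len(bits))  # 00100111110100 14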
Cost of a Huffman Tree
• Let p1, p2, ... , pm be the probabilities for the
symbols a1, a2, ... ,am, respectively.
• Define the cost of the Huffman tree T to be
C(T) = Σ_{i=1}^{m} pi ri
where ri is the length of the path from the root
to ai.
• C(T) is the expected length of the code of a
symbol coded by the tree T. C(T) is the bit
rate of the code.
Example of Cost
• Example: a 1/2, b 1/8, c 1/8, d 1/4
Tree T: a → 0, d → 11, b → 100, c → 101
C(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75
        (a)      (b)       (c)       (d)
Huffman Tree
• Input: Probabilities p1, p2, ... , pm for symbols
a1, a2, ... ,am, respectively.
• Output: A tree that minimizes the average
number of bits (bit rate) to code a symbol.
That is, minimizes
C(T) = Σ_{i=1}^{m} pi ri (the bit rate)
where ri is the length of the path from the root
to ai. This is the Huffman tree or Huffman
code
Optimality Principle 1
• In a Huffman tree a lowest probability symbol
has maximum distance from the root.
– If not, exchanging a lowest probability symbol with
one at maximum distance will lower the cost.
In T, the symbol with smallest probability p is at depth k
and a symbol with probability q > p is at depth h > k.
Exchanging them gives T’.
C(T’) = C(T) + hp - hq + kq - kp = C(T) - (h-k)(q-p) < C(T)
Optimality Principle 2
• The second lowest probability is a sibling of
the smallest in some Huffman tree.
– If not, we can move it there without raising the cost.
In T, the second smallest probability q is at depth k, and a
symbol with probability r > q is the sibling of the smallest
symbol p at depth h > k. Exchanging q and r gives T’.
C(T’) = C(T) + hq - hr + kr - kq = C(T) - (h-k)(r-q) ≤ C(T)
Optimality Principle 3
• Assuming we have a Huffman tree T whose two
lowest probability symbols are siblings at
maximum depth, they can be replaced by a new
symbol whose probability is the sum of their
probabilities.
– The resulting tree is optimal for the new symbol set.
In T, the two lowest probability symbols p and q are siblings
at depth h. In T’, they are replaced by a single leaf of
probability p+q at depth h-1.
C(T’) = C(T) + (h-1)(p+q) - hp - hq = C(T) - (p+q)
Optimality Principle 3 (cont’)
• If T’ were not optimal then we could find a
lower cost tree T’’. This will lead to a lower
cost tree T’’’ for the original alphabet.
Expanding the leaf of probability q+p in T’’ back into sibling
leaves p and q yields a tree T’’’ for the original alphabet.
C(T’’’) = C(T’’) + p + q < C(T’) + p + q = C(T), which is a contradiction
Recursive Huffman Tree Algorithm
1. If there is just one symbol, a tree with one
node is optimal. Otherwise
2. Find the two lowest probability symbols with
probabilities p and q respectively.
3. Replace these with a new symbol with
probability p + q.
4. Solve the problem recursively for the new symbol set.
5. Replace the leaf for the new symbol with an internal
node whose two children are the two old symbols.
Iterative Huffman Tree Algorithm
form a node for each symbol ai with weight pi;
insert the nodes in a min priority queue ordered by probability;
while the priority queue has more than one element do
min1 := delete-min;
min2 := delete-min;
create a new node n;
n.weight := min1.weight + min2.weight;
n.left := min1;
n.right := min2;
insert(n)
return the single remaining node in the priority queue (the root).
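The pseudocode above maps directly onto Python's heapq module. A sketch (the tuple-based tree and the tie-breaking counter are implementation choices of mine; with left = 0 and right = 1 the individual codewords may differ from the slides, but the codeword lengths, and hence the bit rate, are the same):

import heapq
from itertools import count

def huffman_tree(weights):
    # weights: {symbol: probability}. Internal nodes are (left, right)
    # pairs; the counter breaks ties so the heap never compares trees.
    tiebreak = count()
    heap = [(p, next(tiebreak), sym) for sym, p in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)  # min1 := delete-min
        p2, _, n2 = heapq.heappop(heap)  # min2 := delete-min
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (n1, n2)))
    return heap[0][2]

def code_table(node, prefix=""):
    # Read codewords off the tree: 0 to go left, 1 to go right.
    if not isinstance(node, tuple):
        return {node: prefix or "0"}
    table = code_table(node[0], prefix + "0")
    table.update(code_table(node[1], prefix + "1"))
    return table

# Example from the next slides:
print(code_table(huffman_tree({"a": .4, "b": .1, "c": .3, "d": .1, "e": .1})))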
Example of Huffman Tree Algorithm
• P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1
• Merge b and e: weights .4 (a), .3 (c), .2 (b,e), .1 (d)
• Merge d and (b,e): weights .4 (a), .3 (c), .3 (d,(b,e))
• Merge c and (d,(b,e)): weights .4 (a), .6 (c,(d,(b,e)))
• Merge a with the rest: weight 1.0, and the tree is complete
Huffman Code
Code: a 0, b 1110, c 10, d 110, e 1111
Average number of bits per symbol:
.4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4 = 2.1
Optimal Huffman Code vs. Entropy
• P(a) =.4, P(b)=.1, P(c)=.3, P(d)=.1, P(e)=.1
Entropy
H = -(.4 x log2(.4) + .1 x log2(.1) + .3 x log2(.3)
+ .1 x log2(.1) + .1 x log2(.1))
= 2.05 bits per symbol
Huffman Code
HC = .4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4
= 2.1 bits per symbol
pretty good!
In Class Exercise
• P(a) = 1/2, P(b) = 1/4, P(c) = 1/8, P(d) = 1/16,
P(e) = 1/16
• Compute the Optimal Huffman tree and its
average bit rate.
• Compute the Entropy
• Compare
• Hint: For the tree change probabilities to be
integers: a:8, b:4, c:2, d:1, e:1. Normalize at
the end.
Quality of the Huffman Code
• The Huffman code is within one bit of the entropy
lower bound.
H ≤ HC ≤ H + 1
• Huffman code does not work well with a two symbol
alphabet.
– Example: P(0) = 1/100, P(1) = 99/100
– HC = 1 bit/symbol (the tree has just two leaves: 0 → 0, 1 → 1)
– H = -((1/100) log2(1/100) + (99/100) log2(99/100))
    = .08 bits/symbol
Powers of Two
• If all the probabilities are powers of two then
HC = H
• Proof by induction on the number of symbols.
Let p1 ≤ p2 ≤ ... ≤ pn be the probabilities, which add up to 1.
If n = 1 then HC = H (both are zero).
If n > 1 then p1 = p2 = 2^-k for some k; otherwise the
sum cannot add up to 1.
Combine the first two symbols into a new symbol of
probability 2^-k + 2^-k = 2^(-k+1).
Powers of Two (Cont.)
By the induction hypothesis,
HC(p1+p2, p3, ..., pn) = H(p1+p2, p3, ..., pn)
  = -(p1+p2) log2(p1+p2) - Σ_{i=3}^{n} pi log2(pi)
  = -2^(-k+1) log2(2^(-k+1)) - Σ_{i=3}^{n} pi log2(pi)
  = -2^(-k+1) (log2(2^-k) + 1) - Σ_{i=3}^{n} pi log2(pi)
  = -2^-k log2(2^-k) - 2^-k log2(2^-k) - Σ_{i=3}^{n} pi log2(pi) - 2^-k - 2^-k
  = -Σ_{i=1}^{n} pi log2(pi) - (p1+p2)
  = H(p1, p2, ..., pn) - (p1+p2)
Powers of Two (Cont.)
By the previous page,
HC(p1+p2, p3, ..., pn) = H(p1, p2, ..., pn) - (p1+p2)
By the properties of Huffman trees (principle 3),
HC(p1, p2, ..., pn) = HC(p1+p2, p3, ..., pn) + (p1+p2)
Hence,
HC(p1, p2, ..., pn) = H(p1, p2, ..., pn)
Extending the Alphabet
• Assuming independence P(ab) = P(a)P(b), so
we can lump symbols together.
• Example: P(0) = 1/100, P(1) = 99/100
– P(00) = 1/10000, P(01) = P(10) = 99/10000,
P(11) = 9801/10000.
Huffman code: 11 → 1, 10 → 01, 01 → 001, 00 → 000
HC = 1.03 bits/symbol (2-bit symbols) = .515 bits/bit
Still not that close to H = .08 bits/bit
Quality of Extended Alphabet
• Suppose we extend the alphabet to blocks of
k symbols; then, per original symbol,
H ≤ HC ≤ H + 1/k
• Pros and Cons of Extending the alphabet
+ Better compression
- 2^k symbols
- padding needed to make the length of the input
divisible by k
Huffman Codes with Context
• Suppose we add a one-symbol context. That is, in
compressing a string x1x2...xn we want to take into
account xk-1 when encoding xk.
– This is a new source model, so entropy based on
independent symbol probabilities no longer applies. The new
model (2nd order entropy) assigns, for each symbol, a
probability to every symbol that can follow it.
– Example: {a,b,c}
          next
          a    b    c
prev  a   .4   .2   .4
      b   .1   .9   .0
      c   .1   .1   .8
Multiple Codes
Code for the first symbol: a 00, b 01, c 10
One Huffman code per context (previous symbol):
  after a (next probs .4, .2, .4): a → 1, b → 00, c → 01
  after b (next probs .1, .9, 0): a → 1, b → 0 (c never follows b)
  after c (next probs .1, .1, .8): a → 10, b → 11, c → 0
Example: abbacc → 00 00 0 1 01 0
Average Bit Rate for Code
• P(a) = .4 P(a) + .1 P(b) + .1 P(c)
P(b) = .2 P(a) + .9 P(b) + .1 P(c)
1 = P(a) + P(b) + P(c)
• 0 = -.6 P(a) + .1 P(b) + .1 P(c)
0 = .2 P(a) - .1 P(b) + .1 P(c)
1 = P(a) + P(b) + P(c)
• P(a) = 1/7, P(b) = 4/7, P(c) = 2/7
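The same probabilities can be found numerically by running the chain to its steady state; a sketch (power iteration is my choice of method, not the slides'):

# Transition table from the earlier slide: row = prev, column = next.
T = {"a": {"a": .4, "b": .2, "c": .4},
     "b": {"a": .1, "b": .9, "c": .0},
     "c": {"a": .1, "b": .1, "c": .8}}

# Start from any distribution and apply the chain until it settles.
P = {"a": 1/3, "b": 1/3, "c": 1/3}
for _ in range(1000):
    P = {s: sum(P[r] * T[r][s] for r in T) for s in T}

print(P)  # converges to a: 1/7, b: 4/7, c: 2/7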
Average Bit Rate for Code
Stationary probabilities: P(a) = 1/7, P(b) = 4/7, P(c) = 2/7
Average code lengths per context:
  after a: .6 x 2 + .4 x 1
  after b: 1
  after c: .2 x 2 + .8 x 1
ABR = 1/7 (.6 x 2 + .4) + 4/7 (1) + 2/7 (.2 x 2 + .8)
    = 8/7 = 1.14 bps
Complexity of Huffman Code Design
• Time to design Huffman Code is O(n log n)
where n is the number of symbols.
– Each step consists of a constant number of priority
queue operations (2 deletemin’s and 1 insert)
Approaches to Huffman Codes
1. Frequencies computed for each input
– Must transmit the Huffman code or
frequencies as well as the compressed input
– Requires two passes
2. Fixed Huffman tree designed from training data
– Do not have to transmit the Huffman tree
because it is known to the decoder.
– H.263 video coder
3. Adaptive Huffman code
– One pass
– Huffman tree changes as frequencies change
Run-Length Coding
• Lots of 0’s and not too many 1’s.
– Fax of letters
– Graphics
• Simple run-length code
– Input
00000010000000001000000000010001001.....
– Symbols
6 9 10 3 2 ...
– Code the bits as a sequence of integers
– Problem: How long should the integers be?
Golomb Code of Order m
Variable Length Code for Integers
• Let n = qm + r where 0 ≤ r < m.
– Divide m into n to get the quotient q and
remainder r.
• Code for n has two parts:
1. q is coded in unary
2. r is coded as a fixed prefix code
Example: m = 5. Fixed prefix code for r (from a binary tree
with leaves 0, 1, 2 at depth 2 and 3, 4 at depth 3):
r:    0    1    2    3    4
code: 00   01   10   110  111
Example
• n = qm + r is represented by:
  1...1 0 r̂
  (q ones, then a single 0, then r̂)
  where r̂ is the fixed prefix code for r
• Example (m = 5):
  n:    2    6     9      10     27
  code: 010  1001  10111  11000  11111010
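A sketch of the encoder in Python. The remainder r is written with the standard truncated-binary construction, which reproduces the m = 5 code table above; for m a power of two it reduces to plain fixed-length binary:

import math

def golomb_encode(n, m):
    # n = q*m + r: q in unary (q ones then a zero), r in a fixed prefix code.
    q, r = divmod(n, m)
    b = math.ceil(math.log2(m))
    cutoff = 2**b - m              # this many remainders get b-1 bits
    if r < cutoff:
        tail = format(r, "0{}b".format(b - 1)) if b > 1 else ""
    else:
        tail = format(r + cutoff, "0{}b".format(b))
    return "1" * q + "0" + tail

# Examples from this slide, m = 5:
for n in (2, 6, 9, 10, 27):
    print(n, golomb_encode(n, 5))  # 010, 1001, 10111, 11000, 11111010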
Alternative Explanation
Golomb Code of order 5
input   output
1       000
01      001
001     010
0001    0110
00001   0111
00000   1
(each input is a run of zeros ended by a 1, except 00000,
which is emitted when five zeros arrive with no 1 yet)
Variable length to variable length code.
Run Length Example: m = 5
Input:  00000 01 00000 00001 000000000010001001.....
Output:   1  001    1   0111
In this example we coded the first 17 bits in only 9 bits.
Choosing m
• Suppose that 0 has probability p and 1 has
probability 1-p.
• The probability of the run 0^n 1 is p^n (1-p). The Golomb
code of order
  m = ⌈-1 / log2(p)⌉
is optimal.
• Example: p = 127/128.
  m = ⌈-1 / log2(127/128)⌉ = 89
Average Bit Rate for Golomb Code
                   average output code length
Average Bit Rate = ---------------------------
                   average input code length

• m = 4 as an example, with p the probability of 0:

        p^4 + 3p^3(1-p) + 3p^2(1-p) + 3p(1-p) + 3(1-p)
ABR = -------------------------------------------------
       4p^4 + 4p^3(1-p) + 3p^2(1-p) + 2p(1-p) + (1-p)

input:  0000  0001      001       01      1
output: 1     011       010       001     000
weight: p^4   p^3(1-p)  p^2(1-p)  p(1-p)  1-p
CSEP 590 - Lecture 1 - Autumn 2007 80
Comparison of GC with Entropy
[Plot: relative difference (GC - entropy)/entropy as a function of the order]
Notes on Golomb codes
• Useful for binary compression when one symbol is
much more likely than another.
– binary images
– fax documents
– bit planes for wavelet image compression
• Need a parameter (the order)
– training
– adaptively learn the right parameter
• Variable-to-variable length code
• Last symbol needs to be a 1
– coder always adds a 1
– decoder always removes a 1
Tunstall Codes
• Variable-to-fixed length code
• Example
input  output
a      000
b      001
ca     010
cb     011
cca    100
ccb    101
ccc    110
Example: a b cca cb ccc ... → 000 001 100 011 110 ...
Tunstall code Properties
1. No input code is a prefix of another to
assure unique encodability.
2. Minimize the number of bits per symbol.
Prefix Code Property
input  output
a      000
b      001
ca     010
cb     011
cca    100
ccb    101
ccc    110
The inputs are the leaves of a tree: the root has children
a, b, c; the c node has children a, b, c; and the cc node
has children a, b, c.
Unused output code is 111.
Use for unused code
• Consider the string “cc” occurring at the end
of the data: it does not have a code.
• Send the unused code and some fixed code
for the cc.
• Generally, if there are k internal nodes in the
prefix tree then there is a need for k-1 fixed
codes.
Designing a Tunstall Code
• Suppose there are m initial symbols.
• Choose a target output length n where 2^n > m.
1. Form a tree with a root and m children with
edges labeled with the symbols.
2. If the number of leaves is > 2^n - m then halt.*
3. Find the leaf with highest probability and
expand it to have m children.** Go to 2.
* In the next step we will add m-1 more leaves.
** The probability is the product of the probabilities
of the symbols on the root to leaf path.
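A sketch of this procedure using a max-heap for step 3 (the dictionary representation and tie-breaking are mine; probability ties may be expanded in a different order than on the slides):

import heapq

def tunstall(probs, n):
    # probs: {symbol: probability}; n: output length in bits, 2**n > len(probs).
    m = len(probs)
    heap = [(-p, s) for s, p in probs.items()]  # negate: heapq is a min-heap
    heapq.heapify(heap)
    # Expanding a leaf replaces 1 leaf with m, a net gain of m-1,
    # so keep going while the number of leaves is at most 2**n - m.
    while len(heap) <= 2**n - m:
        p, s = heapq.heappop(heap)              # most probable leaf
        for sym, ps in probs.items():
            heapq.heappush(heap, (p * ps, s + sym))  # p is negative
    # Assign fixed n-bit outputs to the leaves.
    leaves = sorted(s for _, s in heap)
    return {s: format(i, "0{}b".format(n)) for i, s in enumerate(leaves)}

# Example from the next slide: P(a)=.7, P(b)=.2, P(c)=.1, n = 3
print(tunstall({"a": .7, "b": .2, "c": .1}, 3))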
Example
• P(a) = .7, P(b) = .2, P(c) = .1, n = 3
• Start: leaves a (.7), b (.2), c (.1)
• Expand a: leaves aa (.49), ab (.14), ac (.07), b (.2), c (.1)
• Expand aa: leaves aaa (.343), aab (.098), aac (.049),
  ab (.14), ac (.07), b (.2), c (.1)
• Code: aaa 000, aab 001, aac 010, ab 011, ac 100, b 101, c 110
Bit Rate of Tunstall
• The length of the output code divided by the
average length of the input code.
• Let pi be the probability of input code i and ri its
length (1 ≤ i ≤ s), and let n be the length of the
output code.

Average bit rate = n / (Σ_{i=1}^{s} pi ri)
Example
input  prob   output
aaa    .343   000
aab    .098   001
aac    .049   010
ab     .14    011
ac     .07    100
b      .2     101
c      .1     110

ABR = 3/[3 (.343 + .098 + .049) + 2 (.14 + .07) + .2 + .1]
    = 1.37 bits per symbol
Entropy = 1.16 bits per symbol
Notes on Tunstall Codes
• Variable-to-fixed length code
• Error resilient
– A flipped bit will introduce just one error in the
output
– Huffman is not error resilient. A single bit flip can
corrupt the decoding of everything that follows.