CSEP 590
Data Compression
Autumn 2007
Course Policies
Introduction to Data Compression
Entropy
Variable Length Codes
Instructors
• Instructor
– Richard Ladner
– ladner@cs.washington.edu
– 206 543-9347
• TA
– Rahul Vanam
– rahulv@u.washington.edu
Helpful Knowledge
• Algorithm Design and Analysis
• Probability
Resources
• Text Book
– Khalid Sayood, Introduction to Data Compression,
Third Edition, Morgan Kaufmann Publishers, 2006.
• Course Web Page
– http://www.cs.washington.edu/csep590a
• Papers and Sections from Books
• Discussion Board
– For discussion
Engagement by Students
• Weekly Assignments
– Understand compression methodology
– Due in class on Fridays (except midterm Friday)
– No late assignments accepted except with prior
approval
• Programming Projects
– Bi-level arithmetic coder and decoder.
– Build code and experiment
Final Exam and Grading
• 6:30-8:20 p.m. Thursday, Dec. 13, 2007
• Percentages
– Weekly assignments (50%)
– Project (20%)
– Final exam (30%)
Logistics
• I will be gone the week of October 15th. We’ll
need to have a make-up class.
• There is no class Thanksgiving week,
November 19th.
• We have some guest speakers toward the
end of the quarter.
Basic Data Compression Concepts
x (original) → Encoder → y (compressed) → Decoder → x̂ (decompressed)
• Lossless compression: x = x̂
– Also called entropy coding, reversible coding.
• Lossy compression: x ≠ x̂
– Also called irreversible coding.
• Compression ratio = |x| / |y|
– |x| is the number of bits in x.
Why Compress
• Conserve storage space
• Reduce time for transmission
– Faster to encode, send, then decode than to send
the original
• Progressive transmission
– Some compression techniques allow us to send
the most important bits first so we can get a low
resolution version of some data before getting the
high fidelity version
• Reduce computation
– Use less data to achieve an approximate answer
Braille
• System to read text by feeling raised dots on
paper (or on electronic displays). Invented in
the 1820s by Louis Braille, a blind Frenchman.
[Braille cells shown for the letters a, b, c, z; the word signs and, the, with, mother; and the contractions th, ch, gh]
Braille Example
Clear text:
Call me Ishmael. Some years ago -- never mind how
long precisely -- having little or no money in my purse,
and nothing particular to interest me on shore, I thought
I would sail about a little and see the watery part of the
world. (238 characters)
Grade 2 Braille in ASCII:
,call me ,i%mael4 ,``s ye>s ago -- n``e m9d h[ l;g
precisely -- hav+ ll or no m``oy 9 my purse1 & no?+
``picul> 6 9t]e/ me on %ore1 ,i ?``| ,i wd sail
ab a ll & see ! wat]y ``p ( ! _w4 (203 characters)
Compression ratio = 238/203 = 1.17
Lossless Compression
• Data is not lost - the original is really needed.
– text compression
– compression of computer binary files
• Compression ratio typically no better than 4:1 for
lossless compression on many kinds of files.
• Statistical Techniques
– Huffman coding
– Arithmetic coding
– Golomb coding
• Dictionary techniques
– LZW, LZ77
– Sequitur
– Burrows-Wheeler Method
• Standards - Morse code, Braille, Unix compress, gzip,
zip, bzip, GIF, JBIG, Lossless JPEG
Lossy Compression
• Data is lost, but not too much.
– audio
– video
– still images, medical images, photographs
• Compression ratios of 10:1 often yield quite
high fidelity results.
• Major techniques include
– Vector Quantization
– Wavelets
– Block transforms
– Standards - JPEG, JPEG2000, MPEG 2, H.264
Why is Data Compression Possible
• Most data from nature has redundancy
– There is more data than the actual information
contained in the data.
– Squeezing out the excess data amounts to
compression.
– However, unsqueezing is necessary to be able to
figure out what the data means.
• Information theory is needed to understand
the limits of compression and give clues on
how to compress well.
What is Information
• Analog data
– Also called continuous data
– Represented by real numbers (or complex
numbers)
• Digital data
– Finite set of symbols {a1, a2, ... , am}
– All data represented as sequences (strings) in the
symbol set.
– Example: {a,b,c,d,r} abracadabra
– Digital data can be an approximation to analog
data
Symbols
• Roman alphabet plus punctuation
• ASCII - 128 symbols (256 in extended ASCII)
• Binary - {0,1}
– 0 and 1 are called bits
– All digital information can be represented
efficiently in binary
– {a,b,c,d} fixed length representation
symbol a b c d
binary 00 01 10 11
– 2 bits per symbol
Exercise - How Many Bits Per
Symbol?
• Suppose we have n symbols. How many bits
(as a function of n) are needed to represent
a symbol in binary?
– First try n a power of 2.
Discussion: Non-Powers of Two
• Can we do better than a fixed length
representation for non-powers of two?
Information Theory
• Developed by Shannon in the 1940’s and 50’s
• Attempts to explain the limits of communication
using probability theory.
• Example: Suppose English text is being sent
– It is much more likely to receive an “e” than a “z”.
– In some sense “z” has more information than “e”.
First-order Information
• Suppose we are given symbols {a1, a2, ... , am}.
• P(ai) = probability of symbol ai occurring in the
absence of any other information.
P(a1) + P(a2) + ... + P(am) = 1
• inf(ai) = log2(1/P(ai)) is the information of ai,
measured in bits.
[Plot: y = -log2(x) for 0 < x ≤ 1; less probable symbols carry more information]
Example
• {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8
– inf(a) = log2(8) = 3
– inf(b) = log2(4) = 2
– inf(c) = log2(8/5) = .678
• Receiving an “a” has more information than
receiving a “b” or “c”.
First Order Entropy
• The first order entropy is defined for a probability
distribution over symbols {a1, a2, ... , am}.
H = Σ_{i=1}^{m} P(ai) log2(1/P(ai))
• H is the average number of bits required to code up a
symbol, given all we know is the probability distribution
of the symbols.
• H is the Shannon lower bound on the average number of
bits to code a symbol in this “source model”.
• Stronger models of entropy include context.
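A minimal sketch of this definition in Python (the function name and example values are mine; the example matches the next slide):

import math

def first_order_entropy(probs):
    # H = sum of P(ai) * log2(1/P(ai)); zero-probability symbols
    # contribute nothing by convention.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# {a, b, c} with probabilities 1/8, 1/4, 5/8 (next slide):
print(first_order_entropy([1/8, 1/4, 5/8]))  # ~1.3 bits/symbol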
Entropy Examples
• {a, b, c} with a 1/8, b 1/4, c 5/8.
– H = 1/8 *3 + 1/4 *2 + 5/8* .678 = 1.3 bits/symbol
• {a, b, c} with a 1/3, b 1/3, c 1/3. (worst case)
– H = 3* (1/3)*log2(3) = 1.6 bits/symbol
• Note that a standard code takes 2 bits per
symbol
symbol a b c
binary code 00 01 10
An Extreme Case
• {a, b, c} with a 1, b 0, c 0
– H=?
Entropy Curve
• Suppose we have two symbols with probabilities
x and 1-x, respectively.
[Plot: entropy -(x log2 x + (1-x) log2(1-x)) vs. the probability x of the first symbol; maximum entropy of 1 bit at x = .5]
A Simple Prefix Code
• {a, b, c} with a 1/8, b 1/4, c 5/8.
• A prefix code is defined by a binary tree
• Prefix code property
– no output is a prefix of another
binary tree (0 = left, 1 = right):
the root's 1-child is the leaf c;
the root's 0-child has children a (0) and b (1)
code: a → 00, b → 01, c → 1
Example: ccabccbccc encodes as
1 1 00 01 1 1 01 1 1 1
Binary Tree Terminology
[Figure: a binary tree with its root, an internal node, and a leaf labeled]
1. Each node, except the root, has a unique parent.
2. Each internal node has exactly two children.
Decoding a Prefix Code
repeat
  start at root of tree
  repeat
    if read bit = 1 then go right
    else go left
  until node is a leaf
  report leaf
until end of the code

Tree: a → 00, b → 01, c → 1. Input: 11000111100
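The loop above translates almost directly into Python. A sketch, using a nested pair for each internal node and a plain string for each leaf (this tree representation is my choice, not the slides'):

# Tree from this slide: a -> 00, b -> 01, c -> 1.
tree = (("a", "b"), "c")   # (left, right); a leaf is its symbol

def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[1] if bit == "1" else node[0]  # 1 -> right, 0 -> left
        if isinstance(node, str):                  # reached a leaf
            out.append(node)
            node = root                            # restart at the root
    return "".join(out)

print(decode("11000111100", tree))  # ccabccca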
Decoding a Prefix Code (worked example)
Tree: a → 00, b → 01, c → 1
Input: 11000111100
Reading left to right: 1 → c, 1 → c, 00 → a, 01 → b,
1 → c, 1 → c, 1 → c, 00 → a
Decoded string: ccabccca
Exercise Encode/Decode
Prefix code: a → 0, d → 11, b → 100, c → 101
• Player 1: Encode a symbol string
• Player 2: Decode the string
• Check for equality
How Good is the Code
Tree: c → 1 (prob 5/8), a → 00 (prob 1/8), b → 01 (prob 1/4)
bit rate = (1/8)·2 + (1/4)·2 + (5/8)·1 = 11/8 = 1.375 bps
Entropy = 1.3 bps
Standard code = 2 bps
(bps = bits per symbol)
Design a Prefix Code 1
• abracadabra
• Design a prefix code for the 5 symbols
{a,b,r,c,d} which compresses this string the
most.
Design a Prefix Code 2
• Suppose we have n symbols each with
probability 1/n. Design a prefix code with
minimum average bit rate.
• Consider n = 2,3,4,5,6 first.
Huffman Coding
• Huffman (1951)
• Uses frequencies of symbols in a string to build a
variable rate prefix code.
– Each symbol is mapped to a binary string.
– More frequent symbols have shorter codes.
– No code is a prefix of another.
• Example:
  a 0
  b 100
  c 101
  d 11
(tree: a is the root's 0-child; under the 1-branch, d is at 11
and the pair b, c sit below 10)
Variable Rate Code Example
• Example: a 0, b 100, c 101, d 11
• Coding:
– aabddcaa with a fixed 2-bit code = 16 bits
– 0 0 100 11 11 101 0 0 = 14 bits
• Prefix code ensures unique decodability.
– 00100111110100
– aabddcaa
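Encoding is a dictionary lookup per symbol; a one-line sketch that reproduces the 14-bit example above:

# Code from this slide: a -> 0, b -> 100, c -> 101, d -> 11.
code = {"a": "0", "b": "100", "c": "101", "d": "11"}

def encode(text):
    return "".join(code[ch] for ch in text)

bits = encode("aabddcaa")
print(bits, len(bits))  # 00100111110100 14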
Cost of a Huffman Tree
• Let p1, p2, ... , pm be the probabilities for the
symbols a1, a2, ... ,am, respectively.
• Define the cost of the Huffman tree T to be
C(T) = Σ_{i=1}^{m} pi ri
where ri is the length of the path from the root
to ai.
• C(T) is the expected length of the code of a
symbol coded by the tree T. C(T) is the bit
rate of the code.
Example of Cost
• Example: a 1/2, b 1/8, c 1/8, d 1/4
Tree T: a → 0, d → 11, b → 100, c → 101
C(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75
        (a)      (b)       (c)       (d)
Huffman Tree
• Input: Probabilities p1, p2, ... , pm for symbols
a1, a2, ... ,am, respectively.
• Output: A tree that minimizes the average
number of bits (bit rate) to code a symbol.
That is, minimizes
C(T) = Σ_{i=1}^{m} pi ri (the bit rate)
where ri is the length of the path from the root
to ai. This is the Huffman tree or Huffman
code
Optimality Principle 1
• In a Huffman tree a lowest probability symbol
has maximum distance from the root.
– If not, exchanging a lowest probability symbol with
one at maximum distance will lower the cost.
In T, the symbol with smallest probability p is at depth k
and a symbol with probability q > p is at depth h > k.
Exchanging them gives T’.
C(T’) = C(T) + hp - hq + kq - kp = C(T) - (h-k)(q-p) < C(T)
Optimality Principle 2
• The second lowest probability is a sibling of
the smallest in some Huffman tree.
– If not, we can move it there without raising the cost.
In T, the second smallest probability q is at depth k, and a
symbol with probability r > q is the sibling of the smallest
symbol p at depth h > k. Exchanging q and r gives T’.
C(T’) = C(T) + hq - hr + kr - kq = C(T) - (h-k)(r-q) ≤ C(T)
Optimality Principle 3
• Assuming we have a Huffman tree T whose two
lowest probability symbols are siblings at
maximum depth, they can be replaced by a new
symbol whose probability is the sum of their
probabilities.
– The resulting tree is optimal for the new symbol set.
In T, the two lowest probability symbols p and q are siblings
at depth h. In T’, they are replaced by a single leaf of
probability p+q at depth h-1.
C(T’) = C(T) + (h-1)(p+q) - hp - hq = C(T) - (p+q)
Optimality Principle 3 (cont’)
• If T’ were not optimal then we could find a
lower cost tree T’’. This will lead to a lower
cost tree T’’’ for the original alphabet.
Expanding the leaf of probability q+p in T’’ back into sibling
leaves p and q yields a tree T’’’ for the original alphabet.
C(T’’’) = C(T’’) + p + q < C(T’) + p + q = C(T), which is a contradiction
Recursive Huffman Tree Algorithm
1. If there is just one symbol, a tree with one
node is optimal. Otherwise
2. Find the two lowest probability symbols with
probabilities p and q respectively.
3. Replace these with a new symbol with
probability p + q.
4. Solve the problem recursively for the new symbol set.
5. Replace the leaf for the new symbol with an internal
node whose two children are the two old symbols.
Iterative Huffman Tree Algorithm
form a node for each symbol ai with weight pi;
insert the nodes in a min priority queue ordered by probability;
while the priority queue has more than one element do
min1 := delete-min;
min2 := delete-min;
create a new node n;
n.weight := min1.weight + min2.weight;
n.left := min1;
n.right := min2;
insert(n)
return the single remaining node in the priority queue (the root).
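The pseudocode above maps directly onto Python's heapq module. A sketch (the tuple-based tree and the tie-breaking counter are implementation choices of mine; with left = 0 and right = 1 the individual codewords may differ from the slides, but the codeword lengths, and hence the bit rate, are the same):

import heapq
from itertools import count

def huffman_tree(weights):
    # weights: {symbol: probability}. Internal nodes are (left, right)
    # pairs; the counter breaks ties so the heap never compares trees.
    tiebreak = count()
    heap = [(p, next(tiebreak), sym) for sym, p in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)  # min1 := delete-min
        p2, _, n2 = heapq.heappop(heap)  # min2 := delete-min
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (n1, n2)))
    return heap[0][2]

def code_table(node, prefix=""):
    # Read codewords off the tree: 0 to go left, 1 to go right.
    if not isinstance(node, tuple):
        return {node: prefix or "0"}
    table = code_table(node[0], prefix + "0")
    table.update(code_table(node[1], prefix + "1"))
    return table

# Example from the next slides:
print(code_table(huffman_tree({"a": .4, "b": .1, "c": .3, "d": .1, "e": .1})))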
Example of Huffman Tree Algorithm
• P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1
• Merge b and e: weights .4 (a), .3 (c), .2 (b,e), .1 (d)
• Merge d and (b,e): weights .4 (a), .3 (c), .3 (d,(b,e))
• Merge c and (d,(b,e)): weights .4 (a), .6 (c,(d,(b,e)))
• Merge a with the rest: weight 1.0, and the tree is complete
Huffman Code
Code: a 0, b 1110, c 10, d 110, e 1111
Average number of bits per symbol:
.4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4 = 2.1
Optimal Huffman Code vs. Entropy
• P(a) =.4, P(b)=.1, P(c)=.3, P(d)=.1, P(e)=.1
Entropy
H = -(.4 x log2(.4) + .1 x log2(.1) + .3 x log2(.3)
+ .1 x log2(.1) + .1 x log2(.1))
= 2.05 bits per symbol
Huffman Code
HC = .4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4
= 2.1 bits per symbol
pretty good!
In Class Exercise
• P(a) = 1/2, P(b) = 1/4, P(c) = 1/8, P(d) = 1/16,
P(e) = 1/16
• Compute the Optimal Huffman tree and its
average bit rate.
• Compute the Entropy
• Compare
• Hint: For the tree change probabilities to be
integers: a:8, b:4, c:2, d:1, e:1. Normalize at
the end.
Quality of the Huffman Code
• The Huffman code is within one bit of the entropy
lower bound.
H ≤ HC ≤ H + 1
• Huffman code does not work well with a two symbol
alphabet.
– Example: P(0) = 1/100, P(1) = 99/100
– HC = 1 bit/symbol (the tree has just two leaves: 0 → 0, 1 → 1)
– H = -((1/100) log2(1/100) + (99/100) log2(99/100))
    = .08 bits/symbol
Powers of Two
• If all the probabilities are powers of two then
HC = H
• Proof by induction on the number of symbols.
Let p1 ≤ p2 ≤ ... ≤ pn be the probabilities, which add up to 1.
If n = 1 then HC = H (both are zero).
If n > 1 then p1 = p2 = 2^-k for some k; otherwise the
sum cannot add up to 1.
Combine the first two symbols into a new symbol of
probability 2^-k + 2^-k = 2^(-k+1).
Powers of Two (Cont.)
By the induction hypothesis,
HC(p1+p2, p3, ..., pn) = H(p1+p2, p3, ..., pn)
  = -(p1+p2) log2(p1+p2) - Σ_{i=3}^{n} pi log2(pi)
  = -2^(-k+1) log2(2^(-k+1)) - Σ_{i=3}^{n} pi log2(pi)
  = -2^(-k+1) (log2(2^-k) + 1) - Σ_{i=3}^{n} pi log2(pi)
  = -2^-k log2(2^-k) - 2^-k log2(2^-k) - Σ_{i=3}^{n} pi log2(pi) - 2^-k - 2^-k
  = -Σ_{i=1}^{n} pi log2(pi) - (p1+p2)
  = H(p1, p2, ..., pn) - (p1+p2)
Powers of Two (Cont.)
By the previous page,
HC(p1+p2, p3, ..., pn) = H(p1, p2, ..., pn) - (p1+p2)
By the properties of Huffman trees (principle 3),
HC(p1, p2, ..., pn) = HC(p1+p2, p3, ..., pn) + (p1+p2)
Hence,
HC(p1, p2, ..., pn) = H(p1, p2, ..., pn)
Extending the Alphabet
• Assuming independence P(ab) = P(a)P(b), so
we can lump symbols together.
• Example: P(0) = 1/100, P(1) = 99/100
– P(00) = 1/10000, P(01) = P(10) = 99/10000,
P(11) = 9801/10000.
Huffman code: 11 → 1, 10 → 01, 01 → 001, 00 → 000
HC = 1.03 bits/symbol (2-bit symbols) = .515 bits/bit
Still not that close to H = .08 bits/bit
Quality of Extended Alphabet
• Suppose we extend the alphabet to blocks of
k symbols; then, per original symbol,
H ≤ HC ≤ H + 1/k
• Pros and Cons of Extending the alphabet
+ Better compression
- 2^k symbols
- padding needed to make the length of the input
divisible by k
Huffman Codes with Context
• Suppose we add a one-symbol context. That is, in
compressing a string x1x2...xn we want to take into
account xk-1 when encoding xk.
– This is a new source model, so entropy based on
independent symbol probabilities no longer applies. The new
model (2nd order entropy) assigns, for each symbol, a
probability to every symbol that can follow it.
– Example: {a,b,c}
          next
          a    b    c
prev  a   .4   .2   .4
      b   .1   .9   .0
      c   .1   .1   .8
Multiple Codes
Code for the first symbol: a 00, b 01, c 10
One Huffman code per context (previous symbol):
  after a (next probs .4, .2, .4): a → 1, b → 00, c → 01
  after b (next probs .1, .9, 0): a → 1, b → 0 (c never follows b)
  after c (next probs .1, .1, .8): a → 10, b → 11, c → 0
Example: abbacc → 00 00 0 1 01 0
Average Bit Rate for Code
• P(a) = .4 P(a) + .1 P(b) + .1 P(c)
P(b) = .2 P(a) + .9 P(b) + .1 P(c)
1 = P(a) + P(b) + P(c)
• 0 = -.6 P(a) + .1 P(b) + .1 P(c)
0 = .2 P(a) - .1 P(b) + .1 P(c)
1 = P(a) + P(b) + P(c)
• P(a) = 1/7, P(b) = 4/7, P(c) = 2/7
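The same probabilities can be found numerically by running the chain to its steady state; a sketch (power iteration is my choice of method, not the slides'):

# Transition table from the earlier slide: row = prev, column = next.
T = {"a": {"a": .4, "b": .2, "c": .4},
     "b": {"a": .1, "b": .9, "c": .0},
     "c": {"a": .1, "b": .1, "c": .8}}

# Start from any distribution and apply the chain until it settles.
P = {"a": 1/3, "b": 1/3, "c": 1/3}
for _ in range(1000):
    P = {s: sum(P[r] * T[r][s] for r in T) for s in T}

print(P)  # converges to a: 1/7, b: 4/7, c: 2/7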
Average Bit Rate for Code
Stationary probabilities: P(a) = 1/7, P(b) = 4/7, P(c) = 2/7
Average code lengths per context:
  after a: .6 x 2 + .4 x 1
  after b: 1
  after c: .2 x 2 + .8 x 1
ABR = 1/7 (.6 x 2 + .4) + 4/7 (1) + 2/7 (.2 x 2 + .8)
    = 8/7 = 1.14 bps
Complexity of Huffman Code Design
• Time to design Huffman Code is O(n log n)
where n is the number of symbols.
– Each step consists of a constant number of priority
queue operations (2 deletemin’s and 1 insert)
Approaches to Huffman Codes
1. Frequencies computed for each input
– Must transmit the Huffman code or
frequencies as well as the compressed input
– Requires two passes
2. Fixed Huffman tree designed from training data
– Do not have to transmit the Huffman tree
because it is known to the decoder.
– H.263 video coder
3. Adaptive Huffman code
– One pass
– Huffman tree changes as frequencies change
Run-Length Coding
• Lots of 0’s and not too many 1’s.
– Fax of letters
– Graphics
• Simple run-length code
– Input
00000010000000001000000000010001001.....
– Symbols
6 9 10 3 2 ...
– Code the bits as a sequence of integers
– Problem: How long should the integers be?
Golomb Code of Order m
Variable Length Code for Integers
• Let n = qm + r where 0 ≤ r < m.
– Divide m into n to get the quotient q and
remainder r.
• Code for n has two parts:
1. q is coded in unary
2. r is coded as a fixed prefix code
Example: m = 5. Fixed prefix code for r (from a binary tree
with leaves 0, 1, 2 at depth 2 and 3, 4 at depth 3):
r:    0    1    2    3    4
code: 00   01   10   110  111
Example
• n = qm + r is represented by:
  1...1 0 r̂
  (q ones, then a single 0, then r̂)
  where r̂ is the fixed prefix code for r
• Example (m = 5):
  n:    2    6     9      10     27
  code: 010  1001  10111  11000  11111010
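A sketch of the encoder in Python. The remainder r is written with the standard truncated-binary construction, which reproduces the m = 5 code table above; for m a power of two it reduces to plain fixed-length binary:

import math

def golomb_encode(n, m):
    # n = q*m + r: q in unary (q ones then a zero), r in a fixed prefix code.
    q, r = divmod(n, m)
    b = math.ceil(math.log2(m))
    cutoff = 2**b - m              # this many remainders get b-1 bits
    if r < cutoff:
        tail = format(r, "0{}b".format(b - 1)) if b > 1 else ""
    else:
        tail = format(r + cutoff, "0{}b".format(b))
    return "1" * q + "0" + tail

# Examples from this slide, m = 5:
for n in (2, 6, 9, 10, 27):
    print(n, golomb_encode(n, 5))  # 010, 1001, 10111, 11000, 11111010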
Alternative Explanation
Golomb Code of order 5
input   output
1       000
01      001
001     010
0001    0110
00001   0111
00000   1
(each input is a run of zeros ended by a 1, except 00000,
which is emitted when five zeros arrive with no 1 yet)
Variable length to variable length code.
Run Length Example: m = 5
Input:  00000 01 00000 00001 000000000010001001.....
Output:   1  001    1   0111
In this example we coded the first 17 bits in only 9 bits.
Choosing m
• Suppose that 0 has probability p and 1 has
probability 1-p.
• The probability of the run 0^n 1 is p^n (1-p). The Golomb
code of order
  m = ⌈-1 / log2(p)⌉
is optimal.
• Example: p = 127/128.
  m = ⌈-1 / log2(127/128)⌉ = 89
Average Bit Rate for Golomb Code
                   average output code length
Average Bit Rate = ---------------------------
                   average input code length

• m = 4 as an example, with p the probability of 0:

        p^4 + 3p^3(1-p) + 3p^2(1-p) + 3p(1-p) + 3(1-p)
ABR = -------------------------------------------------
       4p^4 + 4p^3(1-p) + 3p^2(1-p) + 2p(1-p) + (1-p)

input:  0000  0001      001       01      1
output: 1     011       010       001     000
weight: p^4   p^3(1-p)  p^2(1-p)  p(1-p)  1-p
CSEP 590 - Lecture 1 - Autumn 2007 80
Comparison of GC with Entropy
[Plot: relative difference (GC - entropy)/entropy as a function of the order]
Notes on Golomb codes
• Useful for binary compression when one symbol is
much more likely than another.
– binary images
– fax documents
– bit planes for wavelet image compression
• Need a parameter (the order)
– training
– adaptively learn the right parameter
• Variable-to-variable length code
• Last symbol needs to be a 1
– coder always adds a 1
– decoder always removes a 1
Tunstall Codes
• Variable-to-fixed length code
• Example
input  output
a      000
b      001
ca     010
cb     011
cca    100
ccb    101
ccc    110
Example: a b cca cb ccc ... → 000 001 100 011 110 ...
Tunstall code Properties
1. No input code is a prefix of another to
assure unique encodability.
2. Minimize the number of bits per symbol.
Prefix Code Property
input  output
a      000
b      001
ca     010
cb     011
cca    100
ccb    101
ccc    110
The inputs are the leaves of a tree: the root has children
a, b, c; the c node has children a, b, c; and the cc node
has children a, b, c.
Unused output code is 111.
Use for unused code
• Consider the string “cc” occurring at the end
of the data: it does not have a code.
• Send the unused code and some fixed code
for the cc.
• Generally, if there are k internal nodes in the
prefix tree then there is a need for k-1 fixed
codes.
Designing a Tunstall Code
• Suppose there are m initial symbols.
• Choose a target output length n where 2^n > m.
1. Form a tree with a root and m children with
edges labeled with the symbols.
2. If the number of leaves is > 2^n - m then halt.*
3. Find the leaf with highest probability and
expand it to have m children.** Go to 2.
* In the next step we will add m-1 more leaves.
** The probability is the product of the probabilities
of the symbols on the root to leaf path.
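A sketch of this procedure using a max-heap for step 3 (the dictionary representation and tie-breaking are mine; probability ties may be expanded in a different order than on the slides):

import heapq

def tunstall(probs, n):
    # probs: {symbol: probability}; n: output length in bits, 2**n > len(probs).
    m = len(probs)
    heap = [(-p, s) for s, p in probs.items()]  # negate: heapq is a min-heap
    heapq.heapify(heap)
    # Expanding a leaf replaces 1 leaf with m, a net gain of m-1,
    # so keep going while the number of leaves is at most 2**n - m.
    while len(heap) <= 2**n - m:
        p, s = heapq.heappop(heap)              # most probable leaf
        for sym, ps in probs.items():
            heapq.heappush(heap, (p * ps, s + sym))  # p is negative
    # Assign fixed n-bit outputs to the leaves.
    leaves = sorted(s for _, s in heap)
    return {s: format(i, "0{}b".format(n)) for i, s in enumerate(leaves)}

# Example from the next slide: P(a)=.7, P(b)=.2, P(c)=.1, n = 3
print(tunstall({"a": .7, "b": .2, "c": .1}, 3))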
Example
• P(a) = .7, P(b) = .2, P(c) = .1, n = 3
• Start: leaves a (.7), b (.2), c (.1)
• Expand a: leaves aa (.49), ab (.14), ac (.07), b (.2), c (.1)
• Expand aa: leaves aaa (.343), aab (.098), aac (.049),
  ab (.14), ac (.07), b (.2), c (.1)
• Code: aaa 000, aab 001, aac 010, ab 011, ac 100, b 101, c 110
Bit Rate of Tunstall
• The length of the output code divided by the
average length of the input code.
• Let pi be the probability of input code i and ri its
length (1 ≤ i ≤ s), and let n be the length of the
output code.

Average bit rate = n / (Σ_{i=1}^{s} pi ri)
Example
input  prob   output
aaa    .343   000
aab    .098   001
aac    .049   010
ab     .14    011
ac     .07    100
b      .2     101
c      .1     110

ABR = 3/[3 (.343 + .098 + .049) + 2 (.14 + .07) + .2 + .1]
    = 1.37 bits per symbol
Entropy = 1.16 bits per symbol
Notes on Tunstall Codes
• Variable-to-fixed length code
• Error resilient
– A flipped bit will introduce just one error in the
output
– Huffman is not error resilient. A single bit flip can
corrupt the decoding of everything that follows.