2.1 Source Code

Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

SKET 3583

Digital Communication Systems

Source Coding

Huffman Code

• Prefix-free,variable length code that can


achieve the shortest average code length for an
alphabet
• Most frequent symbols have short codes
• Procedure
– List all symbols and probabilities in descending
order
– Merge branches with two lowest probabilities,
combine their probabilities
– Repeat until one branch is left

1
Huffman Code Example

0.4 0.4 0.4 0.4 0.6 1


a 1.0
0.2 0.2 0.2 0.4 1 0.6 0.4
b 0
0.1 0.2 0.2 1 0.2
c 0.4 0
d
0.1 0.1 1 0.2
0 The Code: n  2.4
0.2
0.1 a 11
e 10.2 0.1 0 b 00 Compression
0.1
f 0 c 101 Ratio:
d 100 3.0/2.4=1.25
e 011 Entropy:
f 010 2.32

• Step 1 : Arrange the messages in order of


decreasing prob.

message Prob.
s5 0.5
s3 0.25
s2 0.125
s1 0.0625
s4 0.0625

2
5
n   ni pi
i 1

 4(0.0625 )  3(0.125)  2(0.25)  4(0.0625 )  1(0.5)


 1.875 bit/messag e
5
H X    pi log pi
i 1

 2(0.0625 ) log( 0.0625 )  0.125 log( 0.125)  0.25 log( 0.25)


 0.5 log( 0.5)  1.875 bit/message

Lempel-Ziv Algorithm

• Look through a code dictionary with


already coded segments
– If matches segment, send <dictionary address,
next character> and store segment + new
character in dictionary
– If no match, store in dictionary, send
<0,symbol>

3
Example LZ Coding

• Encode [a b a a b a b b b b b b b a b b b b b a]
Code Dictionary
Address Contents Encoded Packets
1 a <0,a> Note: 9 code
2 b <0,b> words, 3 bit
3 aa <1,a> address, 1
4 ba <2,a>
bit for new
5 bb <2,b>
6 bbb <5,b> character,
7 bba <5,a>
8 bbbb <6,b>
<4,->

Notes on Lempel-Ziv Codes

• Compression gains occur for long


sequences
• Other variations exist that encode data
using packets and run length encoding:
<# characters back, length, next character>

4
Lempel-Ziv (ZIP) Codes

• Huffman codes have some shortcomings


– Know symbol probability information a priori
– Coding tree must be known at coder/decoder
• Lempel-Ziv algorithm use text itself to
iteratively construct a sequence of variable
length code words
• Used in gzip, UNIX compress, LZW
algorithms

You might also like