
Arithmetic coding

Irina Bocharova,
University of Aerospace Instrumentation,
St.-Petersburg, Russia

Outline

• Shannon-Fano-Elias coding
• Gilbert-Moore coding
• Arithmetic coding as a generalization of SFE and GM coding
• Implementation of arithmetic coding



Shannon-Fano-Elias coding

Let x ∈ X = {1, . . . , M}, p(x) > 0,
p(1) ≥ p(2) ≥ · · · ≥ p(M).
The cumulative sum associated with the symbol x is

Q(x) = Σ_{a<x} p(a),

that is,

Q(1) = 0, Q(2) = p(1), . . . , Q(M) = Σ_{i=1}^{M−1} p(i).

Then the codeword for m is ⌊Q(m)⌋_{l_m}, the binary expansion of Q(m) truncated to l_m bits,

where l_m = ⌈−log2 p(m)⌉.



x   p(x)   Q(x)   Q(x) in binary   l(x)   codeword
1   0.6    0.0    0.0              1      0
2   0.3    0.6    0.1001. . .      2      10
3   0.1    0.9    0.1110. . .      4      1110

L = 1.6 bits, H(X) = 1.3 bits
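
As a concrete check, here is a minimal Python sketch of this construction (floating-point arithmetic for illustration only; sfe_code is a hypothetical helper name, not from the slides):

    from math import ceil, log2

    def sfe_code(probs):
        # Codeword for symbol m: the first l_m = ceil(-log2 p(m)) bits of
        # the binary expansion of Q(m). Assumes probs are sorted in
        # descending order, as required above.
        code, Q = {}, 0.0
        for m, p in enumerate(probs, start=1):
            l = ceil(-log2(p))
            code[m] = format(int(Q * 2 ** l), f'0{l}b')  # truncate Q(m) to l bits
            Q += p
        return code

    print(sfe_code([0.6, 0.3, 0.1]))  # {1: '0', 2: '10', 3: '1110'}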



If l_m binary symbols have already been transmitted, then the length of the interval of uncertainty is 2^{−l_m}. Thus we can decode uniquely if

2^{−l_m} ≤ p(m),
or
l_m ≥ −log2 p(m).

In choosing the length l_m we used only the segment to the right of the point Q(m). This segment is never longer than the corresponding left segment, since the symbol probabilities are ordered in descending order.

H(X) ≤ L < H(X) + 1.



Gilbert-Moore coding

Let x ∈ X = {1, . . . , M}, p(x) > 0 (no ordering of the probabilities is assumed).
The cumulative sum associated with the symbol x is

Q(x) = Σ_{a<x} p(a),

that is,

Q(1) = 0, Q(2) = p(1), . . . , Q(M) = Σ_{i=1}^{M−1} p(i).

Introduce

σ(x) = Q(x) + p(x)/2.

Then the codeword for m is σ̂(m) = ⌊σ(m)⌋_{l_m}, the binary expansion of σ(m) truncated to l_m bits,

where l_m = ⌈−log2(p(m)/2)⌉.



We place the point σ(m) = Q(m) + p(m)/2 at the center of the segment corresponding to m, and we choose the codeword length so that after l_m binary symbols have been transmitted, the length of the interval of uncertainty is at most p(m)/2.



x   p(x)   Q (binary)    σ (binary)    l   GM codeword   ShFE codeword
1   0.1    0.0           0.00001. . .  5   00001         0000
2   0.6    0.0001. . .   0.01100. . .  2   01            0
3   0.3    0.10110. . .  0.11011. . .  3   110           10

Note that without the descending ordering the ShFE codewords are not prefix-free (0 is a prefix of 0000), while the GM codewords are.

L = 2.6 bits (Gilbert-Moore), H(X) = 1.3 bits
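
The same style of sketch reproduces the Gilbert-Moore column of the table (floating point for illustration; gm_code is a hypothetical helper name):

    from math import ceil, log2

    def gm_code(probs):
        # Codeword for symbol m: the first l_m = ceil(-log2(p(m)/2)) bits
        # of sigma(m) = Q(m) + p(m)/2. No ordering of the probabilities
        # is required.
        code, Q = {}, 0.0
        for m, p in enumerate(probs, start=1):
            sigma = Q + p / 2
            l = ceil(-log2(p / 2))
            code[m] = format(int(sigma * 2 ** l), f'0{l}b')
            Q += p
        return code

    print(gm_code([0.1, 0.6, 0.3]))  # {1: '00001', 2: '01', 3: '110'}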



Let i < j; then σ(j) > σ(i):

σ(j) − σ(i) = (Σ_{l=1}^{j−1} p(l) − Σ_{l=1}^{i−1} p(l)) + (p(j) − p(i))/2

= Σ_{l=i}^{j−1} p(l) + (p(j) − p(i))/2

≥ p(i) + (p(j) − p(i))/2 = (p(i) + p(j))/2 ≥ max{p(i), p(j)}/2.

Since l_m = ⌈−log2(p(m)/2)⌉ ≥ −log2(p(m)/2), i.e. 2^{−l_m} ≤ p(m)/2, we obtain

σ(j) − σ(i) ≥ max{p(i), p(j)}/2 ≥ 2^{−min{l_i, l_j}},

so the truncated codewords σ̂(i) and σ̂(j) are distinct.

H(X) + 1 ≤ L < H(X) + 2.



When is symbol-by-symbol coding not efficient?

1. Memoryless source

For symbol-by-symbol coding
R = H(X) + α,
where α is the coding redundancy.

For block coding
R = (H(X^n) + α)/n = (nH(X) + α)/n = H(X) + α/n,
where H(X^n) denotes the entropy of n random variables.

If H(X) << 1, then R ≥ 1 for symbol-by-symbol coding, since every codeword occupies at least one bit. For a binary memoryless source with p(0) = 0.99, p(1) = 0.01, we have H(X) = 0.081 bits; we can easily construct a Huffman code with R = 1 bit, but it is impossible to obtain R < 1 bit.

2. Source with memory

H(X^n) ≤ nH(X) and
R = (H(X^n) + α)/n ≤ H(X) + α/n.
R → H∞(X) as n → ∞, where H∞(X) denotes the entropy rate.
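
The entropy figure quoted above is easy to verify numerically (a minimal sketch; H is a hypothetical helper name):

    from math import log2

    def H(probs):
        # Entropy of a memoryless source, in bits per symbol.
        return -sum(p * log2(p) for p in probs if p > 0)

    print(H([0.99, 0.01]))  # ~0.0808 bits, yet any symbol code spends >= 1 bit/symbol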



How to implement block coding?

Let x ∈ X = {1, . . . , M}, and suppose we encode sequences x = (x1, . . . , xn) that appear at the output of X during n consecutive time moments.

We can consider a new source X^n whose symbols correspond to the sequences x = (x1, . . . , xn) of length n, and apply any method of symbol-by-symbol coding to these symbols. We obtain

R = H(X^n)/n + α/n,

where α depends on the chosen coding procedure.

The problem is coding complexity: the alphabet of the new source has size M^n. For example, if M = 2^8 = 256, then for n = 2 we get M^2 = 65536, and for n = 3 we get M^3 = 16777216.

Arithmetic coding provides redundancy 2/n with complexity of order n^2.



Arithmetic coding is a direct extension of the Gilbert-Moore coding scheme.

Let x = (x1, x2, . . . , xn) be an M-ary sequence of length n. We construct the modified cumulative distribution function

σ(x) = Σ_{a≺x} p(a) + p(x)/2 = Q(x) + p(x)/2,

where a ≺ x means that a is lexicographically less than x, and set l(x) = ⌈−log2(p(x)/2)⌉.

The code rate R satisfies

R = (1/n) Σ_x p(x) l(x) = (1/n) Σ_x p(x) (⌈log2(1/p(x))⌉ + 1) < (H(X^n) + 2)/n.

If the source generates symbols independently we obtain

R < H(X) + 2/n.

For a source with memory,

R → H∞(X) as n → ∞.



Consider

Q(x[1,n]) = Σ_{a≺x} p(a) = Σ_{a: a[1,n−1] ≺ x[1,n−1]} p(a) + Σ_{a: a[1,n−1] = x[1,n−1], a_n ≺ x_n} p(a),

where x[1,i] = (x1, x2, . . . , xi). It is easy to see that

Q(x[1,n]) = Q(x[1,n−1]) + Σ_{a: a[1,n−1] = x[1,n−1], a_n ≺ x_n} p(a)

= Q(x[1,n−1]) + p(x[1,n−1]) Σ_{a_n ≺ x_n} p(a_n | x[1,n−1]).

If the source generates symbols independently,

p(x[1,n−1]) = Π_{i=1}^{n−1} p(x_i),

Σ_{a_n ≺ x_n} p(a_n | x[1,n−1]) = Σ_{a_n ≺ x_n} p(a_n) = Q(x_n),

where Q(x_i) denotes the cumulative probability of x_i.



[Figure: binary sequences 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, . . . listed in lexicographic order.]


We obtain the following recurrence equations:

Q(x[1,n]) = Q(x[1,n−1]) + p(x[1,n−1])Q(xn ),

p(x[1,n−1]) = p(x[1,n−2])p(xn−1).



Coding procedure

x = (x1, . . . , xn)

Initialization:

F = 0; G = 1; Q(1) = 0;
for j = 2 : M
    Q(j) = Q(j − 1) + p(j − 1);
end;

for i = 1 : n
    F ← F + Q(xi) × G;
    G ← G × p(xi);
end;

F ← F + G/2; l = ⌈−log2(G/2)⌉; F̂ ← ⌊F × 2^l⌋;



X = {a, b, c},
p(a) = 0.1, p(b) = 0.6, p(c) = 0.3

x = (b, c, b, a, b), n = 5

i   xi   p(xi)   Q(xi)   F        G
0   -    -       -       0.0000   1.0000
1   b    0.6     0.1     0.1000   0.6000
2   c    0.3     0.7     0.5200   0.1800
3   b    0.6     0.1     0.5380   0.1080
4   a    0.1     0.0     0.5380   0.0108
5   b    0.6     0.1     0.5391   0.0065

Codeword length l = ⌈−log2 G⌉ + 1 = 9

F + G/2 = 0.5423 . . . = 0.100010101 . . . in binary, and the codeword consists of its first l = 9 bits: F̂ = 100010101

H(X) = 1.3 bits, R = 9/5 = 1.8 bits/symbol
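
The whole procedure, including the trace above, fits in a short Python sketch (arith_encode is a hypothetical name; exact rational arithmetic with Fraction stands in for the unbounded precision the algorithm theoretically requires):

    from math import ceil, log2
    from fractions import Fraction

    def arith_encode(x, p, Q):
        F, G = Fraction(0), Fraction(1)
        for xi in x:
            F += Q[xi] * G       # F <- F + Q(x_i) * G
            G *= p[xi]           # G <- G * p(x_i)
        F += G / 2               # midpoint of the final interval
        l = ceil(-log2(G / 2))   # codeword length
        return format(int(F * 2 ** l), f'0{l}b')  # first l bits of F

    p = {'a': Fraction(1, 10), 'b': Fraction(6, 10), 'c': Fraction(3, 10)}
    Q = {'a': Fraction(0), 'b': Fraction(1, 10), 'c': Fraction(7, 10)}
    print(arith_encode('bcbab', p, Q))  # 100010101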



At each step of the coding algorithm we perform 1 addition and 2 multiplications.

Let p(1), . . . , p(M) be numbers with binary representations of length d. Then after the first step, F and G are numbers with binary representations of length 2d. The following steps require representations of length 3d, . . . , nd.

The complexity of the coding procedure can therefore be estimated as

d + 2d + · · · + nd = n(n + 1)d/2.

PROBLEMS

1. The algorithm requires high computational accuracy (theoretically infinite precision).

2. The computational delay equals the length of the sequence to be encoded.



Decoding of Gilbert-Moore code

Q(m), m = 1, . . . , M are known.

Input: σ̂.

Set m = 1;
while Q(m + 1) < σ̂
    m ← m + 1;
end;
Output: x(m)

Example.

σ̂ = 0.01 (binary) → σ̂ = 0.25

Q(2) = 0.1 < 0.25, so m = 2.

Q(3) = 0.7 > 0.25, so we stop with m = 2.
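
The same search in Python (a minimal sketch; gm_decode is a hypothetical name, with Q passed as a 0-based list so that Q[m] plays the role of Q(m + 1)):

    def gm_decode(sigma_hat, Q):
        # m starts at 1 and advances while Q(m+1) < sigma_hat.
        m = 1
        while m < len(Q) and Q[m] < sigma_hat:
            m += 1
        return m

    print(gm_decode(0.25, [0.0, 0.1, 0.7]))  # 2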



Decoding procedure:

F̂ ← F̂/2^l; S = 0; G = 1;

for i = 1 : n
    j = 1;
    while j < M and S + Q(j + 1) × G < F̂
        j ← j + 1;
    end;
    S ← S + Q(j) × G;
    G ← G × p(j);
    xi = j;
end;

After the ith step, G = p(x[1,i]) and S = Q(x[1,i]).
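
A Python sketch of this decoder, checked against the example that follows (arith_decode is a hypothetical name; exact rational arithmetic again replaces infinite precision):

    from fractions import Fraction

    def arith_decode(F_hat, n, p, Q, symbols):
        # Maintains the invariants S = Q(x[1,i]) and G = p(x[1,i]).
        S, G = Fraction(0), Fraction(1)
        out = []
        for _ in range(n):
            j = 0
            # advance while the next cumulative threshold stays below F-hat
            while j + 1 < len(symbols) and S + Q[symbols[j + 1]] * G < F_hat:
                j += 1
            s = symbols[j]
            S += Q[s] * G
            G *= p[s]
            out.append(s)
        return ''.join(out)

    p = {'a': Fraction(1, 10), 'b': Fraction(6, 10), 'c': Fraction(3, 10)}
    Q = {'a': Fraction(0), 'b': Fraction(1, 10), 'c': Fraction(7, 10)}
    F_hat = Fraction(int('100010101', 2), 2 ** 9)  # 0.100010101 binary = 0.541015625
    print(arith_decode(F_hat, 5, p, Q, 'abc'))     # bcbab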



X = {a, b, c}, p(a) = 0.1, p(b) = 0.6, p(c) = 0.3

Codeword 100010101, F̂ = 0.541

F̂ = 0.100010101 (binary)

S        G        Hyp.   Q(j)   S + Q(j)G      xi   p(xi)
0.0000   1.0000   a      0.0    0.0000 < F̂
                  b      0.1    0.1000 < F̂     b    0.6
                  c      0.7    0.7000 > F̂
0.1000   0.6000   a      0.0    0.1000 < F̂
                  b      0.1    0.1600 < F̂     c    0.3
                  c      0.7    0.5200 < F̂
0.5200   0.1800   a      0.0    0.5200 < F̂
                  b      0.1    0.5380 < F̂     b    0.6
                  c      0.7    0.6460 > F̂
0.5380   0.1080   a      0.0    0.5380 < F̂
                  b      0.1    0.5488 > F̂     a    0.1
0.5380   0.0108   a      0.0    0.5380 < F̂
                  b      0.1    0.5391 < F̂     b    0.6
                  c      0.7    0.5456 > F̂



Implementation of arithmetic coding

1. High < 0.5

Bit = 0;

Normalization:

Low ← Low × 2
High ← High × 2

Example:
Low = 0; High = 0.00011000001
Bit = 0; High = 0.0011000001
Bit = 0; High = 0.011000001
Bit = 0; High = 0.11000001

2. Low > 0.5

Bit = 1;

Normalization:

Low ← (Low − 0.5) × 2
High ← (High − 0.5) × 2

Example:
Low = 0.11000011
Bit = 1; Low = 0.1000011
Bit = 1; Low = 0.000011
Lund, Sweden, February, 2005 21:22


3. Low < 0.5, High > 0.5

The interval straddles the point 0.5, so the next bit is not yet determined: a value of the form 0.011111. . .1 can continue as 0.01111. . .10 or as 0.10000. . .01. The undetermined bit is deferred and counted.

Example:
Low = 0.0110 < 0.5, High = 0.1010 > 0.5
Count = 1;
Read the next symbol:
Low = 0.10001 = 0.0110 + 0.00101
High = 0.10101
Now Low > 0.5: Bit = 1, followed by Count complementary bits. Output: 10
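
The three cases combine into one renormalization loop. The sketch below is a minimal illustration, not the slides' exact routine: renormalize is a hypothetical name, floats stand in for fixed-point registers, the usual convention Low ≥ 0.5 replaces the strict Low > 0.5 above, and the calls simply replay the numbers from this example:

    def renormalize(low, high, count, out):
        # Emit bits that are already determined, defer undetermined ones
        # via count, and rescale [low, high) after each emitted/deferred bit.
        while True:
            if high < 0.5:                      # case 1: emit 0 (+ pending 1s)
                out.append('0' + '1' * count); count = 0
            elif low >= 0.5:                    # case 2: emit 1 (+ pending 0s)
                out.append('1' + '0' * count); count = 0
                low -= 0.5; high -= 0.5
            elif low >= 0.25 and high < 0.75:   # case 3: straddles 0.5, defer
                count += 1
                low -= 0.25; high -= 0.25
            else:
                return low, high, count
            low *= 2; high *= 2

    out = []
    _, _, count = renormalize(0.375, 0.625, 0, out)    # Low = 0.0110, High = 0.1010
    print(out, count)                                  # [] 1 -- bit deferred, Count = 1
    renormalize(0.53125, 0.65625, count, out)          # Low = 0.10001, High = 0.10101
    print(''.join(out))                                # begins with 10, as above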
