Redundancy (information theory)

In information theory, redundancy measures the fractional difference between the entropy $H(X)$ of an ensemble $X$ and its maximum possible value $\log(|\mathcal{A}_X|)$.[1][2] Informally, it is the amount of wasted "space" used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity.

Quantitative definition
In describing the redundancy of raw data, the rate of a source of information is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is

$$r = \lim_{n \to \infty} \frac{1}{n} H(M_1, M_2, \dots, M_n),$$

the limit, as $n$ goes to infinity, of the joint entropy of the first $n$ symbols divided by $n$. It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply $H(M)$, since by definition there is no interdependence of the successive messages of a memoryless source.
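
As a minimal sketch, the rate of a memoryless source can be computed directly as the entropy of its symbol distribution. The four-symbol distribution below is an illustrative assumption, not taken from the article:

```python
# Rate of a memoryless source: the per-symbol entropy H(M).
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical 4-symbol source (probabilities chosen for illustration).
probs = [0.5, 0.25, 0.125, 0.125]
r = entropy(probs)  # rate of the memoryless source = H(M)
print(f"rate r = {r:.2f} bits/symbol")  # -> 1.75 bits/symbol
```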

The absolute rate of a language or source is simply

$$R = \log |\mathbb{M}|,$$

the logarithm of the cardinality of the message space, or alphabet. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.

The absolute redundancy can then be defined as

$$D = R - r,$$

the difference between the absolute rate and the rate.
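
For a concrete (illustrative) example not drawn from the article: a binary memoryless source with $P(0) = 0.9$ and $P(1) = 0.1$ has rate $r = H(0.9, 0.1) \approx 0.469$ bits per symbol and absolute rate $R = \log_2 2 = 1$ bit per symbol, giving an absolute redundancy of $D = R - r \approx 0.531$ bits per symbol.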

The quantity $\frac{D}{R}$ is called the relative redundancy and gives the maximum possible data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity $R : r$ gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as $\frac{r}{R}$, so that $\frac{r}{R} = 1 - \frac{D}{R}$. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
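
Continuing the illustrative biased-coin source above, a short sketch (the probabilities are assumptions, not from the article) computes these quantities:

```python
# Absolute redundancy D = R - r, relative redundancy D/R, and
# efficiency r/R for a hypothetical binary memoryless source.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.9, 0.1]    # hypothetical biased-coin source
r = entropy(probs)    # rate
R = log2(len(probs))  # absolute rate (Hartley function)
D = R - r             # absolute redundancy

print(f"relative redundancy D/R  = {D / R:.3f}")  # ~0.531
print(f"efficiency r/R           = {r / R:.3f}")  # ~0.469
print(f"max compression ratio R:r = {R / r:.3f}") # ~2.13
```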

Other notions
A measure of redundancy between two
variables is the mutual information or a
normalized variant. A measure of
redundancy among many variables is
given by the total correlation.
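
As a sketch of mutual information used as a two-variable redundancy measure, the following computes $I(X;Y)$ for a made-up joint distribution (all numbers here are illustrative assumptions):

```python
# Mutual information I(X;Y) of two binary variables from a joint
# distribution p(x, y); positive I(X;Y) means X and Y share information.
from math import log2

# Hypothetical joint distribution p(x, y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y).
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
mi = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)
print(f"I(X;Y) = {mi:.3f} bits")  # ~0.278 bits for this distribution
```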

Redundancy of compressed data refers to the difference between the expected compressed data length of $n$ messages, $L(M^n)$ (or expected data rate $L(M^n)/n$), and the entropy $nr$ (or entropy rate $r$). (Here we assume the data is ergodic and stationary, e.g., a memoryless source.) Although the rate difference $L(M^n)/n - r$ can be arbitrarily small as $n$ is increased, the actual difference $L(M^n) - nr$ cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.
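
As an illustration of this bound, the sketch below builds a Huffman code for a hypothetical memoryless source (the distribution is an assumption) and checks that the expected codeword length $L$ satisfies $r \le L < r + 1$, so the per-symbol redundancy stays below one bit:

```python
# Huffman coding of a memoryless source: expected codeword length L
# satisfies r <= L < r + 1, so the redundancy L - r is below 1 bit.
import heapq
from math import log2

def huffman_lengths(probs):
    """Return Huffman codeword lengths for the given probabilities."""
    # Heap items: (probability, tie-breaker, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # each merge adds one bit to these symbols
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.6, 0.2, 0.1, 0.1]          # hypothetical source distribution
r = -sum(p * log2(p) for p in probs)  # entropy rate
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(f"entropy r = {r:.3f}, expected length L = {L:.3f}")
print(f"redundancy L - r = {L - r:.3f} bits (< 1)")
```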

See also
Minimum redundancy coding
Huffman encoding
Data compression
Hartley function
Negentropy
Source coding theorem
Overcompleteness

References
1. Here it is assumed $\mathcal{A}_X$ are the sets on which the probability distributions are defined.
2. MacKay, David J.C. (2003). "2.4 Definition of entropy and related functions". Information Theory, Inference, and Learning Algorithms. Cambridge University Press. p. 33. ISBN 0-521-64298-1. "The redundancy measures the fractional difference between H(X) and its maximum possible value, $\log(|\mathcal{A}_X|)$."

Reza, Fazlollah M. (1994) [1961]. An Introduction to Information Theory. New York: Dover [McGraw-Hill]. ISBN 0-486-68210-2.
Schneier, Bruce (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C. New York: John Wiley & Sons, Inc. ISBN 0-471-12845-7.
Auffarth, B.; Lopez-Sanchez, M.; Cerquides, J. (2010). "Comparison of Redundancy and Relevance Measures for Feature Selection in Tissue Classification of CT images". Advances in Data Mining. Applications and Theoretical Aspects. Springer. pp. 248–262. CiteSeerX 10.1.1.170.1528.
