The Use of Information Theory in Evolutionary Biology
Annals of the New York Academy of Sciences
Issue: The Year in Evolutionary Biology
Address for correspondence: C. Adami, Department of Microbiology and Molecular Genetics, 2209 Biomedical and Physical
Sciences Building, Michigan State University, East Lansing, MI 48824. adami@msu.edu
Information is a key concept in evolutionary biology. Information stored in a biological organism’s genome is used
to generate the organism and to maintain and control it. Information is also that which evolves. When a population
adapts to a local environment, information about this environment is fixed in a representative genome. However,
when an environment changes, information can be lost. At the same time, information is processed by animal
brains to survive in complex environments, and the capacity for information processing also evolves. Here, I review
applications of information theory to the evolution of proteins and to the evolution of information processing in
simulated agents that adapt to perform a complex task.
doi: 10.1111/j.1749-6632.2011.06422.x
hands of a Bell Laboratories engineer,15 this theory was thought to ultimately explain everything from the higher functions of living organisms down to metabolism, growth, and differentiation.16 However, this optimism soon gave way to a miasma of confounding mathematical and philosophical arguments that dampened enthusiasm for the concept of information in biology for decades. To some extent, evolutionary biology was not yet ready for a quantitative treatment of "that which evolves:" the year of publication of "Information in Biology"16 coincided with the discovery of the structure of DNA, and the wealth of sequence data that catapulted evolutionary biology into the computer age was still half a century away.

Colloquially, information is often described as something that aids in decision making. Interestingly, this is very close to the mathematical meaning of "information," which is concerned with quantifying the ability to make predictions about uncertain systems. Life—among many other aspects—has the peculiar property of displaying behavior or characters that are appropriate, given the environment. We recognize this of course as the consequence of adaptation, but the outcome is that the adapted organism's decisions are "in tune" with its environment—the organism has information about its environment. One of the insights that has emerged from the theory of computation is that information must be physical—information cannot exist without a physical substrate that encodes it.17 In computers, information is encoded in zeros and ones, which themselves are represented by different voltages on semiconductors. The information we retain in our brains also has a physical substrate, even though its physiological basis depends on the type of memory and is far from certain. Context-appropriate decisions require information, however it is stored. For cells, we now know that this information is stored in a cell's inherited genetic material, and is precisely the kind that Shannon described in his 1948 articles. If inherited genetic material represents information, then how did the information-carrying molecules acquire it? Is the amount of information stored in genes increasing throughout evolution, and if so, why? How much information does an organism store? How much in a single gene? If we can replace a discussion of the evolution of complexity along the various lines of descent with a discussion of the evolution of information, perhaps then we can find those general principles that have eluded us so far.

In this review, I focus on two uses of information theory in evolutionary biology: First, the quantification of the information content of genes and proteins and how this information may have evolved along the branches of the tree of life. Second, the evolution of information-processing structures (such as brains) that control animals, and how the functional complexity of these brains (and how they evolve) could be quantified using information theory. The latter approach reinforces a concept that has appeared in neuroscience repeatedly: the value of information for an adapted organism is fitness,18 and the complexity of an organism's brain must be reflected in how it manages to process, integrate, and make use of information for its own advantage.19

Entropy and information in molecular sequences

To define entropy and information, we first must define the concept of a random variable. In probability theory, a random variable X is a mathematical object that can take on a finite number of different states x1 · · · xN with specified probabilities p1, . . . , pN. We should keep in mind that a mathematical random variable is a description—sometimes accurate, sometimes not—of a physical object. For example, the random variable that we would use to describe a fair coin has two states: x1 = heads and x2 = tails, with probabilities p1 = p2 = 0.5. Of course, an actual coin is a far more complex device—it may deviate from being true, it may land on an edge once in a while, and its faces can make different angles with true North. Yet, when coins are used for demonstrations in probability theory or statistics, they are most succinctly described with two states and two equal probabilities. Nucleic acids can be described probabilistically in a similar manner. We can define a nucleic acid random variable X as having four states x1 = A, x2 = C, x3 = G, and x4 = T, which it can take on with probabilities p1, . . . , p4, while being perfectly aware that the nucleic acid molecules themselves are far more complex, and deserve a richer description than the four-state abstraction. But given the role that these molecules play as information carriers of the genetic material, this abstraction will serve us very well going forward.
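The entropy of such a random variable quantifies the uncertainty about its state. As a concrete illustration, the following minimal Python sketch computes the Shannon entropy H(X) = −Σi pi log2 pi for the fair coin and for a hypothetical nucleotide position; the probabilities are invented for the example and do not correspond to any particular locus.

```python
# Minimal sketch: Shannon entropy (in bits) of a discrete random variable,
# applied to the fair coin and to the four-state nucleotide abstraction above.
# The nucleotide probabilities are invented for illustration only.
from math import log2

def entropy(probs):
    """H(X) = -sum_i p_i log2 p_i, skipping zero-probability states."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                   # fair coin: 1.0 bit

hypothetical_site = [0.4, 0.1, 0.3, 0.2]     # p(A), p(C), p(G), p(T)
print(round(entropy(hypothetical_site), 2))  # about 1.85 bits; a uniform site would give 2 bits
```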
variable X28 to the variable X41:

\[
p(X_{41}\,|\,X_{28}) =
\begin{pmatrix}
p(\mathrm{A|A}) & p(\mathrm{A|C}) & p(\mathrm{A|G}) & p(\mathrm{A|T}) \\
p(\mathrm{C|A}) & p(\mathrm{C|C}) & p(\mathrm{C|G}) & p(\mathrm{C|T}) \\
p(\mathrm{G|A}) & p(\mathrm{G|C}) & p(\mathrm{G|G}) & p(\mathrm{G|T}) \\
p(\mathrm{T|A}) & p(\mathrm{T|C}) & p(\mathrm{T|G}) & p(\mathrm{T|T})
\end{pmatrix}
=
\begin{pmatrix}
0.2 & 0.235 & 0 & 0.5 \\
0 & 0.706 & 0.2 & 0.333 \\
0.8 & 0 & 0.4 & 0.167 \\
0 & 0.059 & 0.4 & 0
\end{pmatrix}. \tag{7}
\]

We can glean important information from these probabilities. It is clear, for example, that positions 28 and 41 are not independent from each other. If nucleotide 28 is an "A," then position 41 can only be an "A" or a "G," but mostly (4/5 times) you expect a "G." But consider the dependence between nucleotides 42 and 28

\[
p(X_{42}\,|\,X_{28}) =
\begin{pmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}. \tag{8}
\]

This dependence is striking—if you know position 28, you can predict (based on the sequence data given) position 42 with certainty. The reason for this perfect correlation lies in the functional interaction between the sites: 28 and 42 are paired in a stem of the tRNA molecule in a Watson–Crick pair—to enable the pairing, a "G" must be associated with a "C," and a "T" (encoding a U) must be associated with an "A." It does not matter which is at any position as long as the paired nucleotide is complementary. And it is also clear that these associations are maintained by the selective pressures of Darwinian evolution—a substitution that breaks the pattern leads to a molecule that does not fold into the correct shape to efficiently translate messenger RNA into proteins. As a consequence, the organism bearing such a mutation will be eliminated from the gene pool. This simple example shows clearly the relationship between information theory and evolutionary biology: Fitness is reflected in information, and when selective pressures maximize fitness, information must be maximized concurrently.

We can now proceed and calculate the information content. Each column in Eq. (7) represents a conditional probability to find a particular nucleotide at position 41, given a particular value is found at position 28. We can use these values to calculate the conditional entropy to find a particular nucleotide, given that position 28 is "A," for example, as

\[
H(X_{41}\,|\,X_{28}=\mathrm{A}) = -0.2\log_2 0.2 - 0.8\log_2 0.8 \approx 0.72 \text{ bits}. \tag{9}
\]

This allows us to calculate the amount of information that is revealed (about X41) by knowing the state of X28. If we do not know the state of X28, our uncertainty about X41 is 1.795 bits, as calculated earlier. But revealing that X28 actually is an "A" has reduced our uncertainty to 0.72 bits, as we saw in Eq. (9). The information we obtained is then just the difference

\[
I(X_{41} : X_{28}=\mathrm{A}) = H(X_{41}) - H(X_{41}\,|\,X_{28}=\mathrm{A}) \approx 1.075 \text{ bits}, \tag{10}
\]

that is, just over 1 bit. The notation in Eq. (10), indicating information between two variables by a colon (sometimes a semicolon), is conventional. We can also calculate the average amount of information about X41 that is gained by revealing the state of X28 as

\[
I(X_{41} : X_{28}) = H(X_{41}) - H(X_{41}\,|\,X_{28}) \approx 0.64 \text{ bits}. \tag{11}
\]

Here, H(X41|X28) is the average conditional entropy of X41 given X28, obtained by averaging the four conditional entropies (for the four possible states of X28) using the probabilities with which X28 occurs in any of its four states, given by Eq. (3). If we apply the same calculation to the pair of positions X42 and X28, we should find that knowing X28 reduces our uncertainty about X42 to zero—indeed, X28 carries perfect information about X42. The covariance between residues in an RNA secondary structure captured by the mutual entropy can be used to predict secondary structure from sequence alignments alone.24
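To make these numbers concrete, here is a minimal Python sketch that recomputes Eqs. (9) and (10) directly from the matrix in Eq. (7), and shows how the averaged quantity of Eq. (11) would be obtained. The value H(X41) = 1.795 bits is taken from the text; because the marginal distribution of X28 (Eq. (3)) is not reproduced in this excerpt, a uniform placeholder is used for the average, so only the first two printed values match the quoted results.

```python
# Sketch that recomputes Eqs. (9)-(11) from the matrix in Eq. (7). Each column
# of that matrix is the conditional distribution of X41 given one state of X28.
# H(X41) = 1.795 bits is quoted from the text; the marginal of X28 (Eq. (3))
# is not reproduced in this excerpt, so a uniform placeholder is used below.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

P_41_given_28 = {            # columns of Eq. (7), keyed by the state of X28
    "A": [0.2, 0.0, 0.8, 0.0],
    "C": [0.235, 0.706, 0.0, 0.059],
    "G": [0.0, 0.2, 0.4, 0.4],
    "T": [0.5, 0.333, 0.167, 0.0],
}
H_X41 = 1.795

# Eq. (9): H(X41 | X28 = A) is about 0.72 bits.
print(round(entropy(P_41_given_28["A"]), 2))

# Eq. (10): I(X41 : X28 = A) = H(X41) - H(X41 | X28 = A); about 1.07 bits with
# these rounded inputs (the text quotes 1.075 bits).
print(round(H_X41 - entropy(P_41_given_28["A"]), 3))

def average_information(cond, marginal, H_unconditional):
    """Eq. (11): subtract the X28-averaged conditional entropy from H(X41)."""
    H_cond = sum(marginal[s] * entropy(cond[s]) for s in cond)
    return H_unconditional - H_cond

placeholder_marginal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # stand-in for Eq. (3)
print(round(average_information(P_41_given_28, placeholder_marginal, H_X41), 3))
```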
Information content of proteins

We have seen that different positions within a biomolecule can carry information about other positions, but how much information do they store about the environment within which they evolve? This question can be answered using the same
Figure 3. Simplified phylogenetic classification of animals. At the root of this tree (on the left tree) are the eukaryotes, but only
the animal branch is shown here. If we follow the line of descent of humans, we move on the branch toward the vertebrates. The
vertebrate clade itself is shown in the tree on the right, and the line of descent through this tree follows the branches that end in the
mammals. The mammal tree, finally, is shown at the bottom, with the line ending in Homo sapiens indicated in red.
different picture. Except for an obvious split of the bacterial version of the protein and the eukaryotic one, the total entropy markedly decreases across the lines as the taxonomic depth increases. Furthermore, the arthropod COX2 is more entropic than the vertebrate one (see Fig. 5) as opposed to the ordering for the homeobox protein. This finding suggests that the evolution of the protein information content is specific to each protein, and most likely reflects the adaptive value of the protein for each family.
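The "molecular entropy" plotted in Figures 4 and 5 is, at its core, a sum of per-site entropies over an alignment. The sketch below illustrates the idea on a toy alignment; it is only a schematic stand-in for the actual procedure, which uses the Pfam alignments named in the figure captions, restricts the sum to core residues, and typically must account for the finite number of sequences (cf. Ref. 38).

```python
# Toy sketch of the per-site ("molecular") entropy summed over alignment
# columns. A real analysis would use the Pfam alignments named in the figure
# captions, restrict the sum to core residues, and account for the finite
# number of sequences; none of that is attempted here.
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; gaps are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def alignment_entropy(sequences):
    """Sum of per-column entropies over an aligned set of equal-length sequences."""
    return sum(column_entropy(col) for col in zip(*sequences))

# Hypothetical four-sequence alignment; Pfam families contain hundreds.
toy_alignment = ["MKV-A", "MRV-A", "MKVLA", "MKILA"]
print(round(alignment_entropy(toy_alignment), 3))
```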
Evolution of information in robots and animats

The evolution of information within the genes of adapting organisms is but one use of information theory in evolutionary biology. Just as anticipated in the heydays of the "Cybernetics" movement,48 information theory has indeed something to say about the evolution of information processing in animal brains. The general idea behind the connection between information and function is simple: Because information (about a particular system) is what allows the bearer to make predictions (about that particular system) with accuracy better than chance, information is valuable as long as prediction is valuable. In an uncertain world, making accurate predictions is tantamount to survival. In other words, we expect that information, acquired from the environment and processed, has survival value and therefore is selected for in evolution.

Figure 4. Entropy of homeobox domain protein sequences (PF00046 in the Pfam database, accessed July 20, 2006) as a function of taxonomic depth for different major groups that have at least 200 sequences in the database, connected by phylogenetic relationships. Selected groups are annotated by name. Fifty-seven core residues were used to calculate the molecular entropy. Core residues have at least 70% sequence coverage in the database.

Predictive information

The connection between information and fitness can be made much more precise. A key relation between information and its value for agents that survive in an uncertain world as a consequence of their actions in it was provided by Ay et al.,49 who applied a measure called "predictive information" (defined earlier by Bialek et al.50 in the context of dynamical systems theory) to characterize the behavioral complexity of an autonomous robot. These authors showed that the mutual entropy between a changing world (as represented by changing states in an organism's sensors) and the actions of motors that drive the agent's behavior (thus changing the future perceived states) is equivalent to Bialek's
predictive information as long as the agent's decisions are Markovian, that is, only depend on the state of the agent and the environment at the preceding time. This predictive information is defined as the shared entropy between motor variables Yt and the sensor variables at the subsequent time point Xt+1

\[
I_{\rm pred} = I(Y_t : X_{t+1}) = H(X_{t+1}) - H(X_{t+1}\,|\,Y_t). \tag{21}
\]

Here, H(Xt+1) is the entropy of the sensor states at time t + 1, defined as

\[
H(X_{t+1}) = -\sum_{x_{t+1}} p(x_{t+1}) \log p(x_{t+1}), \tag{22}
\]

using the probability distribution p(xt+1) over the sensor states xt+1 at time t + 1. The conditional entropy H(Xt+1|Yt) characterizes how much is left uncertain about the future sensor states Xt+1 given the robot's actions in the present, that is, the state of the motors at time t, and can be calculated in the standard manner20,21 from the joint probability distribution of present motor states and future sensor states p(xt+1, yt).
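To illustrate Eqs. (21) and (22), the following sketch computes the predictive information from a joint distribution over present motor states and subsequent sensor states. The two toy distributions are invented to reproduce the limiting cases discussed in the text (perfect prediction and no prediction) and are not data from Refs. 49 or 51.

```python
# Sketch of Eqs. (21) and (22): predictive information from a joint distribution
# over next sensor states x_t+1 and present motor states y_t. The two toy joint
# distributions are invented and are not data from Refs. 49 or 51.
from collections import defaultdict
from math import log2

def predictive_information(joint):
    """I(Y_t : X_t+1) = H(X_t+1) - H(X_t+1 | Y_t), with joint[(x_next, y)] = p."""
    p_x = defaultdict(float)  # marginal over next sensor states, as in Eq. (22)
    p_y = defaultdict(float)  # marginal over present motor states
    for (x_next, y), p in joint.items():
        p_x[x_next] += p
        p_y[y] += p
    H_x = -sum(p * log2(p) for p in p_x.values() if p > 0)
    H_x_given_y = -sum(p * log2(p / p_y[y]) for (x_next, y), p in joint.items() if p > 0)
    return H_x - H_x_given_y

# Motors perfectly predict the next sensor reading: Ipred is maximal (1 bit here).
deterministic = {("left", "L"): 0.5, ("right", "R"): 0.5}
print(predictive_information(deterministic))   # 1.0

# Motors carry no information about the next reading: Ipred = 0.
independent = {(x, y): 0.25 for x in ("left", "right") for y in ("L", "R")}
print(predictive_information(independent))     # 0.0
```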
Figure 5. Entropy of COX subunit II (PF00116 in the Pfam database, accessed June 22, 2006) protein sequences as a function of taxonomic depth for selected different groups (at least 200 sequences per group), connected by phylogenetic relationships. One hundred twenty core residues were used to calculate the molecular entropy.

As Eq. (21) implies, the predictive information measures how much of the entropy of sensorial states—that is, the uncertainty about what the detectors will record next—is explained by the motor states at the preceding time point. For example, if the motor states at time t perfectly predict what will appear in the sensors at time t + 1, then the predictive information is maximal. Another version of the predictive information studies not the effect the motors have on future sensor states, but the effect the sensors have on future motor states instead, for example to guide an autonomous robot through a maze.51 In the former case, the predictive information quantifies how actions change the perceived world, whereas in the latter case the predictive information characterizes how the perceived world changes the robot's actions. Both formulations, however, are equivalent when taking into account how world and robot states are being updated.51 Although it is clear that measures such as predictive information should increase as an agent or robot learns to behave appropriately in a complex world, it is not at all clear whether information could be used as an objective function that, if maximized, will lead to appropriate behavior of the robot. This is the basic hypothesis of Linsker's "Infomax" principle,52 which posits that neural control structures
evolve to maximize "information preservation" subject to constraints. This hypothesis implies that the infomax principle could play the role of a guiding force in the organization of perceptual systems. This is precisely what has been observed in experiments with autonomous robots evolved to perform a variety of tasks. For example, in one task visual and tactile information had to be integrated to grab an object,53 whereas in another, groups of five robots were evolved to move in a coordinated fashion54 or else to navigate according to a map.55 Such experiments suggest that there may be a deeper connection between information and fitness that goes beyond the regularities induced by a perception–action loop, that connects fitness (in the evolutionary sense as the growth rate of a population) directly to information.

As a matter of fact, Rivoire and Leibler18 recently studied abstract models of the population dynamics of evolving "finite-state agents" that optimize their response to a changing environment and found just such a relationship. In such a description, agents
processes over and above the information processed by the individual neurons, the synergistic information51

\[
SI_{\rm atom} = I(Z_t : Z_{t+1}) - \sum_{i=1}^{n} I\!\left(Z_t^{(i)} : Z_{t+1}^{(i)}\right). \tag{27}
\]

The index "atom" on the synergistic information reminds us that the sum is over the indivisible elements of the network—the neurons themselves.
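The following sketch evaluates Eq. (27) for an invented two-neuron network in which the joint state is perfectly predictable one time step ahead, while each neuron taken by itself carries no information about its own next state, so that all of the processed information is synergistic.

```python
# Sketch of Eq. (27) for an invented two-neuron network. States are tuples of
# binary neuron values; joint[(z_t, z_t1)] = p is a made-up transition table.
from collections import defaultdict
from math import log2

def mutual_information(pairs):
    """I(A : B) from a dict {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in pairs.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in pairs.items() if p > 0)

def synergistic_information(joint):
    """SI_atom = I(Z_t : Z_t+1) - sum_i I(Z_t^(i) : Z_t+1^(i)), Eq. (27)."""
    whole = mutual_information(joint)
    n = len(next(iter(joint))[0])  # number of neurons
    atoms = 0.0
    for i in range(n):
        per_neuron = defaultdict(float)
        for (z_now, z_next), p in joint.items():
            per_neuron[(z_now[i], z_next[i])] += p
        atoms += mutual_information(per_neuron)
    return whole - atoms

# The joint state is perfectly predictable (2 bits), but each neuron alone
# tells us nothing about its own next state, so all the information is synergistic.
joint = {((0, 0), (0, 0)): 0.25, ((0, 1), (1, 0)): 0.25,
         ((1, 0), (0, 1)): 0.25, ((1, 1), (1, 1)): 0.25}
print(synergistic_information(joint))  # 2.0 bits
```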
As we see later, other more general partitions of the network are possible, and oftentimes more appropriate to capture synergy. The synergistic information is related to other measures of synergy that have been introduced independently. One is simply called "integration" and defined in terms of Shannon entropies as64,66,67

\[
I = \sum_{i=1}^{n} H\!\left(Z_t^{(i)}\right) - H(Z_t). \tag{28}
\]

This measure has been introduced earlier under the name "multi-information."68,69 Another measure, called Φatom in Ref. 51, was independently introduced by Ay and Wennekers70,71 as a measure of the complexity of dynamical systems they called "stochastic interaction," and is defined as

\[
\Phi_{\rm atom} = \sum_{i=1}^{n} H\!\left(Z_t^{(i)}\,\big|\,Z_{t+1}^{(i)}\right) - H(Z_t\,|\,Z_{t+1}). \tag{29}
\]

partition of the network into nonoverlapping groups of nodes (parts) that are as independent of each other (information theoretically speaking) as possible. If we define the partition P of a network into k parts via P = {P(1), P(2), . . . , P(k)}, where each P(i) is a part of the network (a nonempty set of neurons with no overlap between the parts), then we can define a quantity that is analogous to Eq. (29) except that the sum is over the parts rather than the individual neurons61

\[
\Phi(P) = \sum_{i=1}^{k} H\!\left(P_t^{(i)}\,\big|\,P_{t+1}^{(i)}\right) - H(P_t\,|\,P_{t+1}). \tag{31}
\]

In Eq. (31), each part carries a time label because every part takes on different states as time proceeds. The so-called "minimum information partition" (or MIP) is found by minimizing a normalized Eq. (31) over all partitions

\[
\mathrm{MIP} = \arg\min_{P} \frac{\Phi(P_t)}{N(P_t)}, \tag{32}
\]

where the normalization \(N(P_t) = (k-1)\min_i[H_{\max}(P_t^{(i)})]\) balances the parts of the partition.62 Using this MIP, the integrated information is then simply given by

\[
\Phi = \Phi(P = \mathrm{MIP}). \tag{33}
\]
n
Figure 6. (A) Three candidate measures of information integration Φatom (29), ΦMC , and I (28) along the line of descent of a
representative evolutionary run in which animats adapted to solve a two-dimensional maze. (B) Three measures of information
processing, in the same run. Blue (solid): total information Itotal (24), green (dashed): atomic information S Iatom (27), and red
(dotted): predictive information Ipred (21) (from Ref. 51).
for brain complexity, among which are the predictive information Eq. (21), the total information Eq. (24), the synergistic information Eq. (27), the integration Eq. (28), the "atomic Φ" of Eq. (29), and the computationally intensive measure ΦMC. Figure 6 shows a representative run (of 64) that shows the six candidate measures as a function of evolutionary time measured in generations. During this run, the fitness increased steadily, with a big step around generation 15,000 where this particular animat evolved the capacity to use memory for navigation (from Ref. 51).

It is not clear from a single run which of these measures best correlates with fitness. If we take the value of each measure attained at the end of each of the 64 runs and plot it against the fitness achieved there (measured as the percentage of the achievable fitness in this environment), the sophisticated measure ΦMC emerges as the clear winner, with a Spearman rank correlation coefficient with achieved fitness of R = 0.937 (see Fig. 7). This suggests that measures of information integration can go beyond simple "reactive" measures such as Ipred in characterizing complex behavior, in particular when the task requires memory, as was the case there.

Figure 7. Correlation of information-based measures of complexity with fitness. ΦMC, I, Φatom, Itotal, and Ipred as a function of fitness at the end of each of 64 independent runs. R indicates Spearman's rank correlation coefficient. The red dot shows the run depicted in Figure 6 (from Ref. 51).

Future directions

Needless to say, there are many more uses for information theory in evolutionary biology than reviewed here. For example, it is possible to describe the evolution of drug resistance in terms of loss, and subsequent gain, of information: when a pathogen is treated with a drug, the fitness landscape of that pathogen is changed (often dramatically), and as a consequence the genomic sequence that represented information before the administration of the drug is not information (or much less information) about the new environment.22 As the pathogen adapts to the new environment, it acquires information about that environment and its fitness increases commensurately.

Generally speaking, it appears that there is a fundamental law that links information to fitness (suitably defined). Such a relationship can be written down explicitly for specific systems, such as the relationship between the information content of DNA binding sites and the affinity the binding proteins have for that site,72 or the relationship between the information content of ribozymes and their catalytic activity.73 We can expect such a relationship to hold as long as information is valuable, and this will always be the case as long as information can be used in decision processes (broadly speaking) that increase the long-term success of a lineage. It is possible to imagine exceptions to such a law where information would be harmful to an organism, in the sense that signals perceived by a sensory apparatus overwhelm, rather than aid, an organism. Such a situation could arise when the signals are unanticipated, and simply cannot be acted upon in an appropriate manner (for example in animal development). It is conceivable that in such a case, mechanisms will evolve that protect an organism from such signals—this is the basic idea behind the evolution of canalization,74 which is the capacity of an organism to maintain its phenotype in the face of genetic
and environmental variation. I would like to point out, however, that strictly speaking, canalization is the evolution of robustness with respect to entropy (noise), not information. If a particular signal cannot be used to make predictions, then this signal is not information. In that respect, even the evolution of canalization (if it increases organismal fitness) increases the amount of information an organism has about its environment, because insulating itself from certain forms of noise will increase the reliability of the signals that the organism can use to further its existence.

An interesting example that illustrates the benefit of information and the cost of entropy is the evolution of cooperation, couched in the language of evolutionary game theory.75 In evolutionary games, cooperation can evolve as long as the decision to cooperate benefits the group more than it costs the individual.76–78 Groups can increase the benefit accruing to them if they can choose judiciously whom to interact with. Thus, acquiring information about the game environment (in this case, the other players) increases the fitness of the group via mutual cooperative behavior. Indeed, it was shown recently that cooperation can evolve among players that interact via the rules of the so-called "Prisoner's Dilemma" game if the strategies that evolve can take into account information about how the opponent is playing.79 However, if this information is marred by noise (either from genetic mutations that decouple the phenotype from the genotype or from other sources), the population will soon evolve to defect rather than to cooperate. This happens because when the signals cannot be relied upon anymore, information (as the noise increases) gradually turns into entropy. In that case, canalization is the better strategy and players evolve genes that ignore the opponent's moves.79 Thus, it appears entirely possible that an information-theoretic formulation of inclusive fitness theory (a theory that predicts the fitness of groups76,77 that goes beyond Hamilton's kin selection theory) will lead to a predictive framework in which reliable communication is the key to cooperation.

Conclusions

Information is the central currency for organismal fitness,80 and appears to be that which increases when organisms adapt to their niche.13 Information about the niche is stored in genes, and used to make predictions about the future states of the environment. Because fitness is higher in well-predicted environments (simply because it is easier to take advantage of the environment's features for reproduction if they are predictable), organisms with more information about their niche are expected to outcompete those with less information, suggesting a direct relationship between information content and fitness within a niche (comparisons of information content across niches, on the other hand, are meaningless because the information is not about the same system). A very similar relationship, also
enforced by the rules of natural selection, can be found for information acquired not through the evolutionary process, but instead via an organism's sensors. When this information is used for navigation, for example, then a measure called "predictive information" is a good proxy for fitness as long as navigation is performed taking only sensor states into account: indeed, appropriate behavior can evolve, even when information, not fitness, is maximized. If, instead, decisions are also influenced by memory, different information-theoretic constructions based on the concept of "integrated information" appear to correlate better with fitness, and capture how the brain forms more abstract representations of the world81 that are used to predict the states of the world on temporal scales much larger than the immediate future. Thus, the ability to make predictions about the world that range far into the future may be the ultimate measure of functional complexity82 and perhaps even intelligence.83

Acknowledgments

I would like to thank Matthew Rupp for collaboration in work presented in Section 3, and J. Edlund, A. Hintze, N. Chaumont, G. Tononi, and C. Koch for stimulating discussions and collaboration in the work presented in Section 4. This work was supported in part by the Paul G. Allen Family Foundation, the Cambridge Templeton Consortium, the National Science Foundation's Frontiers in Integrative Biological Research Grant FIBR-0527023, and NSF's BEACON Center for the Study of Evolution in Action under contract No. DBI-0939454.

Conflicts of interest

The author declares no conflicts of interest.

References

1. Darwin, C. 1859. On the Origin of Species By Means of Natural Selection. John Murray. London.
2. Futuyma, D. 1998. Evolutionary Biology. Sinauer Associates. Sunderland, MA.
3. Ewens, W.J. 2004. Mathematical Population Genetics. Springer. New York.
4. Hartl, D. & A.G. Clark. 2007. Principles of Population Genetics. Sinauer Associates. Sunderland, MA.
5. Lenski, R.E. 2011. Evolution in action: a 50,000-generation salute to Charles Darwin. Microbe 6: 30–33.
6. Barrick, J.E., D.S. Yu, S.H. Yoon, et al. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461: 1243–1247.
7. Adami, C. 1998. Introduction to Artificial Life. Springer Verlag. New York.
8. Adami, C. 2006. Digital genetics: unravelling the genetic basis of evolution. Nat. Rev. Genet. 7: 109–118.
9. Lenski, R.E. & M. Travisano. 1994. Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc. Natl. Acad. Sci. U. S. A. 91: 6808–6814.
10. Cooper, T.F., D.E. Rozen & R.E. Lenski. 2003. Parallel changes in gene expression after 20,000 generations of evolution in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 100: 1072–1077.
11. Blount, Z.D., C.Z. Borland & R.E. Lenski. 2008. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 105: 7899–7906.
12. Woods, R.J., J.E. Barrick, T.F. Cooper, et al. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331: 1433–1436.
13. Adami, C., C. Ofria & T.C. Collier. 2000. Evolution of biological complexity. Proc. Natl. Acad. Sci. U. S. A. 97: 4463–4468.
14. Lenski, R.E., C. Ofria, R.T. Pennock & C. Adami. 2003. The evolutionary origin of complex features. Nature 423: 139–144.
15. Shannon, C. 1948. A mathematical theory of communication. Bell System Tech. J. 27: 379–423, 623–656.
16. Quastler, H., Ed. 1953. Information Theory in Biology. University of Illinois Press. Urbana.
17. Landauer, R. 1991. Information is physical. Phys. Today 44: 23–29.
18. Rivoire, O. & S. Leibler. 2011. The value of information for populations in varying environments. J. Stat. Phys. 142: 1124–1166.
19. Sporns, O. 2011. Networks of the Brain. MIT Press. Cambridge, MA.
20. Ash, R.B. 1965. Information Theory. Dover Publications, Inc. New York, NY.
21. Cover, T.M. & J.A. Thomas. 1991. Elements of Information Theory. John Wiley. New York, NY.
22. Adami, C. 2004. Information theory in molecular biology. Phys. Life Rev. 1: 3–22.
23. Jühling, F., M. Mörl, R.K. Hartmann, et al. 2009. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 37(Suppl. 1): D159–D162.
24. Eddy, S.R. & R. Durbin. 1994. RNA sequence analysis using covariance models. Nucleic Acids Res. 22: 2079–2088.
25. Korber, B.T., R.M. Farber, D.H. Wolpert & A.S. Lapedes. 1993. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc. Natl. Acad. Sci. U. S. A. 90: 7176–7180.
26. Clarke, N.D. 1995. Covariation of residues in the homeodomain sequence family. Protein Sci. 4: 2269–2278.
27. Atchley, W.R., K.R. Wollenberg, W.M. Fitch, et al. 2000. Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol. Biol. Evol. 17: 164–178.
28. Wang, L.-Y. 2005. Covariation analysis of local amino acid sequences in recurrent protein local structures. J. Bioinform. Comput. Biol. 3: 1391–1409.
29. Wahl, L.M., L.C. Martin, G.B. Gloor & S.D. Dunn. 2005. Using information theory to search for co-evolving residues in proteins. Bioinformatics 21: 4116–4124.
30. Wang, Q. & C. Lee. 2007. Distinguishing functional amino acid covariation from background linkage disequilibrium in HIV protease and reverse transcriptase. PLoS One 2: e814.
31. Callahan, B., R.A. Neher, D. Bachtrog, et al. 2011. Correlated evolution of nearby residues in Drosophilid proteins. PLoS Genet. 7: e1001315.
32. Levy, R.M., O. Haq, A.V. Morozov & M. Andrec. 2009. Pairwise and higher-order correlations among drug-resistance mutations in HIV-1 subtype B protease. BMC Bioinformatics 10(Suppl. 8): S10.
33. Kryazhimskiy, S., J. Dushoff, G.A. Bazykin & J.B. Plotkin. 2011. Prevalence of epistasis in the evolution of influenza A surface proteins. PLoS Genet. 7: e1001301.
34. da Silva, J. 2009. Amino acid covariation in a functionally important human immunodeficiency virus type 1 protein region is associated with population subdivision. Genetics 182: 265–275.
35. Billeter, M., Y.Q. Qian, G. Otting, et al. 1993. Determination of the nuclear magnetic resonance solution structure of an Antennapedia homeodomain-DNA complex. J. Mol. Biol. 234: 1084–1093.
36. Li, W.H., M. Gouy, P.M. Sharp, et al. 1990. Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular clocks. Proc. Natl. Acad. Sci. U. S. A. 87: 6703–6707.
37. Finn, R.D., J. Mistry, J. Tate, et al. 2010. The Pfam protein families database. Nucleic Acids Res. 38: D211–D222.
38. Basharin, G.P. 1959. On a statistical estimate for the entropy of a sequence of random variables. Theory Probab. Appl. 4: 333.
39. Scott, M.P., J.W. Tamkun & G.W. Hartzell. 1989. The structure and function of the homeodomain. Biochim. Biophys. Acta 989: 25–48.
40. van der Graaff, E., T. Laux & S.A. Rensing. 2009. The WUS homeobox-containing (WOX) protein family. Genome Biol. 10: 248.
41. Garcia-Horsman, J.A., B. Barquera, J. Rumbley, et al. 1994. The superfamily of heme-copper respiratory oxidases. J. Bacteriol. 176: 5587–5600.
42. Robinson, B.H. 2000. Human cytochrome oxidase deficiency. Pediatr. Res. 48: 581–585.
43. Taanman, J.W. 1997. Human cytochrome c oxidase: structure, function, and deficiency. J. Bioenerg. Biomembr. 29: 151–163.
44. Thornton, J.W. 2004. Resurrecting ancient genes: experimental analysis of extinct molecules. Nat. Rev. Genet. 5: 366–375.
45. Pauling, L. & E. Zuckerkandl. 1963. Chemical paleogenetics: molecular restoration studies of extinct forms of life. Acta Chem. Scand. 17: 89.
46. Benner, S. 2007. The early days of paleogenetics: connecting molecules to the planet. In Ancestral Sequence Reconstruction. D.A. Liberles, Ed.: 3–19. Oxford University Press. New York.
47. Federhen, S. 2002. The taxonomy project. In The NCBI Handbook. J. McEntyre & J. Ostell, Eds. National Center for Biotechnology Information. Bethesda, MD.
48. Wiener, N. 1948. Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press. Cambridge, MA.
49. Ay, N., N. Bertschinger, R. Der, et al. 2008. Predictive information and explorative behavior of autonomous robots. Eur. Phys. J. B 63: 329–339.
50. Bialek, W., I. Nemenman & N. Tishby. 2001. Predictability, complexity, and learning. Neural Comput. 13: 2409–2463.
51. Edlund, J., N. Chaumont, A. Hintze, et al. 2011. Integrated information increases with fitness in the evolution of animats. PLoS Comput. Biol. 7: e1002236.
52. Linsker, R. 1988. Self-organization in a perceptual network. Computer 21: 105–117.
53. Sporns, O. & M. Lungarella. 2006. Evolving coordinated behavior by maximizing information structure. In Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems. L.M. Rocha, L.S. Yaeger, M.A. Bedau, D. Floreano, R.L. Goldstone, et al., Eds.: 323–329. MIT Press. Bloomington, IN.
54. Zahedi, K., N. Ay & R. Der. 2010. Higher coordination with less control: a result of information maximization in the sensorimotor loop. Adapt. Behav. 18: 338–355.
55. Klyubin, A.S., D. Polani & C.L. Nehaniv. 2007. Representations of space and time in the maximization of information flow in the perception-action loop. Neural Comput. 19: 2387–2432.
56. Tononi, G., O. Sporns & G.M. Edelman. 1996. A complexity measure for selective matching of signals by the brain. Proc. Natl. Acad. Sci. U. S. A. 93: 3422–3427.
57. Tononi, G. 2001. Information measures for conscious experience. Arch. Ital. Biol. 139: 367–371.
58. Tononi, G. & O. Sporns. 2003. Measuring information integration. BMC Neurosci. 4: 31.
59. Tononi, G. 2004. An information integration theory of consciousness. BMC Neurosci. 5: 42.
60. Tononi, G. & C. Koch. 2008. The neural correlates of consciousness: an update. Ann. N. Y. Acad. Sci. 1124: 239–261.
61. Tononi, G. 2008. Consciousness as integrated information: a provisional manifesto. Biol. Bull. 215: 216–242.
62. Balduzzi, D. & G. Tononi. 2008. Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Comput. Biol. 4: e1000091.
63. Balduzzi, D. & G. Tononi. 2009. Qualia: the geometry of integrated information. PLoS Comput. Biol. 5: e1000462.
64. Tononi, G., O. Sporns & G.M. Edelman. 1994. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. U. S. A. 91: 5033–5037.
65. Tononi, G. 2010. Information integration: its relevance to brain function and consciousness. Arch. Ital. Biol. 148: 299–322.
66. Lungarella, M., T. Pegors, D. Bulwinkle & O. Sporns. 2005. Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics 3: 243–262.
67. Lungarella, M. & O. Sporns. 2006. Mapping information flow in sensorimotor networks. PLoS Comput. Biol. 2: e144.
68. McGill, W.J. 1954. Multivariate information transmission. Psychometrika 19: 97–116.
69. Schneidman, E., S. Still, M.J. Berry & W. Bialek. 2003. Network information and connected correlations. Phys. Rev. Lett. 91: 238701.
70. Ay, N. & T. Wennekers. 2003. Temporal infomax leads to almost deterministic dynamical systems. Neurocomputing 52–54: 461–466.
71. Ay, N. & T. Wennekers. 2003. Dynamical properties of strongly interacting Markov chains. Neural Netw. 16: 1483–1497.
72. Adami, C. 2012. Unpublished.
73. Carothers, J.M., S.C. Oestreich, J.H. Davis & J.W. Szostak. 2004. Informational complexity and functional activity of RNA structures. J. Am. Chem. Soc. 126: 5130–5137.
74. Waddington, C.H. 1942. Canalization of development and the inheritance of acquired characters. Nature 150: 563–565.
75. Maynard Smith, J. 1982. Evolution and the Theory of Games. Cambridge University Press. Cambridge, UK.
76. Queller, D.C. 1985. Kinship, reciprocity and synergism in the evolution of social behavior. Nature 318: 366–367.
77. Fletcher, J.A. & M. Zwick. 2006. Unifying the theories of inclusive fitness and reciprocal altruism. Am. Nat. 168: 252–262.
78. Fletcher, J.A. & M. Doebeli. 2009. A simple and general explanation for the evolution of altruism. Proc. R. Soc. B-Biol. Sci. 276: 13–19.
79. Iliopoulos, D., A. Hintze & C. Adami. 2010. Critical dynamics in the evolution of stochastic strategies for the iterated Prisoner's Dilemma. PLoS Comput. Biol. 6: e1000948.
80. Polani, D. 2009. Information: currency of life? HFSP J. 3: 307–316.
81. Marstaller, L., C. Adami & A. Hintze. 2012. Cognitive systems evolve complex representations for adaptive behavior. In press.
82. Adami, C. 2002. What is complexity? Bioessays 24: 1085–1094.
83. Adami, C. 2006. What do robots dream of? Science 314: 1093–1094.