The Use of Information Theory in Evolutionary Biology
Annals of the New York Academy of Sciences
Issue: The Year in Evolutionary Biology
Address for correspondence: C. Adami, Department of Microbiology and Molecular Genetics, 2209 Biomedical and Physical
Sciences Building, Michigan State University, East Lansing, MI 48824. adami@msu.edu
Information is a key concept in evolutionary biology. Information stored in a biological organism’s genome is used
to generate the organism and to maintain and control it. Information is also that which evolves. When a population
adapts to a local environment, information about this environment is fixed in a representative genome. However,
when an environment changes, information can be lost. At the same time, information is processed by animal
brains to survive in complex environments, and the capacity for information processing also evolves. Here, I review
applications of information theory to the evolution of proteins and to the evolution of information processing in
simulated agents that adapt to perform a complex task.
doi: 10.1111/j.1749-6632.2011.06422.x
hands of a Bell Laboratories engineer,15 this theory was thought to ultimately explain everything from the higher functions of living organisms down to metabolism, growth, and differentiation.16 However, this optimism soon gave way to a miasma of confounding mathematical and philosophical arguments that dampened enthusiasm for the concept of information in biology for decades. To some extent, evolutionary biology was not yet ready for a quantitative treatment of "that which evolves:" the year of publication of "Information in Biology"16 coincided with the discovery of the structure of DNA, and the wealth of sequence data that catapulted evolutionary biology into the computer age was still half a century away.

Colloquially, information is often described as something that aids in decision making. Interestingly, this is very close to the mathematical meaning of "information," which is concerned with quantifying the ability to make predictions about uncertain systems. Life—among many other aspects—has the peculiar property of displaying behavior or characters that are appropriate, given the environment. We recognize this of course as the consequence of adaptation, but the outcome is that the adapted organism's decisions are "in tune" with its environment—the organism has information about its environment. One of the insights that has emerged from the theory of computation is that information must be physical—information cannot exist without a physical substrate that encodes it.17 In computers, information is encoded in zeros and ones, which themselves are represented by different voltages on semiconductors. The information we retain in our brains also has a physical substrate, even though its physiological basis depends on the type of memory and is far from certain. Context-appropriate decisions require information, however it is stored. For cells, we now know that this information is stored in a cell's inherited genetic material, and is precisely the kind that Shannon described in his 1948 articles. If inherited genetic material represents information, then how did the information-carrying molecules acquire it? Is the amount of information stored in genes increasing throughout evolution, and if so, why? How much information does an organism store? How much in a single gene? If we can replace a discussion of the evolution of complexity along the various lines of descent with a discussion of the evolution of information, perhaps then we can find those general principles that have eluded us so far.

In this review, I focus on two uses of information theory in evolutionary biology: First, the quantification of the information content of genes and proteins and how this information may have evolved along the branches of the tree of life. Second, the evolution of information-processing structures (such as brains) that control animals, and how the functional complexity of these brains (and how they evolve) could be quantified using information theory. The latter approach reinforces a concept that has appeared in neuroscience repeatedly: the value of information for an adapted organism is fitness,18 and the complexity of an organism's brain must be reflected in how it manages to process, integrate, and make use of information for its own advantage.19

Entropy and information in molecular sequences

To define entropy and information, we first must define the concept of a random variable. In probability theory, a random variable X is a mathematical object that can take on a finite number of different states x1 · · · xN with specified probabilities p1, . . . , pN. We should keep in mind that a mathematical random variable is a description—sometimes accurate, sometimes not—of a physical object. For example, the random variable that we would use to describe a fair coin has two states: x1 = heads and x2 = tails, with probabilities p1 = p2 = 0.5. Of course, an actual coin is a far more complex device—it may deviate from being true, it may land on an edge once in a while, and its faces can make different angles with true North. Yet, when coins are used for demonstrations in probability theory or statistics, they are most succinctly described with two states and two equal probabilities. Nucleic acids can be described probabilistically in a similar manner. We can define a nucleic acid random variable X as having four states x1 = A, x2 = C, x3 = G, and x4 = T, which it can take on with probabilities p1, . . . , p4, while being perfectly aware that the nucleic acid molecules themselves are far more complex, and deserve a richer description than the four-state abstraction. But given the role that these molecules play as information carriers of the genetic material, this abstraction will serve us very well going forward.
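The entropy of such a random variable quantifies the uncertainty about its state. As a concrete illustration, the following minimal Python sketch computes the Shannon entropy H(X) = −Σi pi log2 pi for the fair coin and for a hypothetical nucleotide position; the probabilities are invented for the example and do not correspond to any particular locus.

```python
# Minimal sketch: Shannon entropy (in bits) of a discrete random variable,
# applied to the fair coin and to the four-state nucleotide abstraction above.
# The nucleotide probabilities are invented for illustration only.
from math import log2

def entropy(probs):
    """H(X) = -sum_i p_i log2 p_i, skipping zero-probability states."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                   # fair coin: 1.0 bit

hypothetical_site = [0.4, 0.1, 0.3, 0.2]     # p(A), p(C), p(G), p(T)
print(round(entropy(hypothetical_site), 2))  # about 1.85 bits; a uniform site would give 2 bits
```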
variable X28 to the variable X41:

\[
p(X_{41}\,|\,X_{28}) =
\begin{pmatrix}
p(\mathrm{A|A}) & p(\mathrm{A|C}) & p(\mathrm{A|G}) & p(\mathrm{A|T}) \\
p(\mathrm{C|A}) & p(\mathrm{C|C}) & p(\mathrm{C|G}) & p(\mathrm{C|T}) \\
p(\mathrm{G|A}) & p(\mathrm{G|C}) & p(\mathrm{G|G}) & p(\mathrm{G|T}) \\
p(\mathrm{T|A}) & p(\mathrm{T|C}) & p(\mathrm{T|G}) & p(\mathrm{T|T})
\end{pmatrix}
=
\begin{pmatrix}
0.2 & 0.235 & 0 & 0.5 \\
0 & 0.706 & 0.2 & 0.333 \\
0.8 & 0 & 0.4 & 0.167 \\
0 & 0.059 & 0.4 & 0
\end{pmatrix}. \tag{7}
\]

We can glean important information from these probabilities. It is clear, for example, that positions 28 and 41 are not independent from each other. If nucleotide 28 is an "A," then position 41 can only be an "A" or a "G," but mostly (4/5 times) you expect a "G." But consider the dependence between nucleotides 42 and 28

\[
p(X_{42}\,|\,X_{28}) =
\begin{pmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}. \tag{8}
\]

This dependence is striking—if you know position 28, you can predict (based on the sequence data given) position 42 with certainty. The reason for this perfect correlation lies in the functional interaction between the sites: 28 and 42 are paired in a stem of the tRNA molecule in a Watson–Crick pair—to enable the pairing, a "G" must be associated with a "C," and a "T" (encoding a U) must be associated with an "A." It does not matter which is at any position as long as the paired nucleotide is complementary. And it is also clear that these associations are maintained by the selective pressures of Darwinian evolution—a substitution that breaks the pattern leads to a molecule that does not fold into the correct shape to efficiently translate messenger RNA into proteins. As a consequence, the organism bearing such a mutation will be eliminated from the gene pool. This simple example shows clearly the relationship between information theory and evolutionary biology: Fitness is reflected in information, and when selective pressures maximize fitness, information must be maximized concurrently.

We can now proceed and calculate the information content. Each column in Eq. (7) represents a conditional probability to find a particular nucleotide at position 41, given a particular value is found at position 28. We can use these values to calculate the conditional entropy to find a particular nucleotide, given that position 28 is "A," for example, as

\[
H(X_{41}\,|\,X_{28}=\mathrm{A}) = -0.2\log_2 0.2 - 0.8\log_2 0.8 \approx 0.72 \text{ bits}. \tag{9}
\]

This allows us to calculate the amount of information that is revealed (about X41) by knowing the state of X28. If we do not know the state of X28, our uncertainty about X41 is 1.795 bits, as calculated earlier. But revealing that X28 actually is an "A" has reduced our uncertainty to 0.72 bits, as we saw in Eq. (9). The information we obtained is then just the difference

\[
I(X_{41} : X_{28}=\mathrm{A}) = H(X_{41}) - H(X_{41}\,|\,X_{28}=\mathrm{A}) \approx 1.075 \text{ bits}, \tag{10}
\]

that is, just over 1 bit. The notation in Eq. (10), indicating information between two variables by a colon (sometimes a semicolon), is conventional. We can also calculate the average amount of information about X41 that is gained by revealing the state of X28 as

\[
I(X_{41} : X_{28}) = H(X_{41}) - H(X_{41}\,|\,X_{28}) \approx 0.64 \text{ bits}. \tag{11}
\]

Here, H(X41|X28) is the average conditional entropy of X41 given X28, obtained by averaging the four conditional entropies (for the four possible states of X28) using the probabilities with which X28 occurs in any of its four states, given by Eq. (3). If we apply the same calculation to the pair of positions X42 and X28, we should find that knowing X28 reduces our uncertainty about X42 to zero—indeed, X28 carries perfect information about X42. The covariance between residues in an RNA secondary structure captured by the mutual entropy can be used to predict secondary structure from sequence alignments alone.24
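To make these numbers concrete, here is a minimal Python sketch that recomputes Eqs. (9) and (10) directly from the matrix in Eq. (7), and shows how the averaged quantity of Eq. (11) would be obtained. The value H(X41) = 1.795 bits is taken from the text; because the marginal distribution of X28 (Eq. (3)) is not reproduced in this excerpt, a uniform placeholder is used for the average, so only the first two printed values match the quoted results.

```python
# Sketch that recomputes Eqs. (9)-(11) from the matrix in Eq. (7). Each column
# of that matrix is the conditional distribution of X41 given one state of X28.
# H(X41) = 1.795 bits is quoted from the text; the marginal of X28 (Eq. (3))
# is not reproduced in this excerpt, so a uniform placeholder is used below.
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

P_41_given_28 = {            # columns of Eq. (7), keyed by the state of X28
    "A": [0.2, 0.0, 0.8, 0.0],
    "C": [0.235, 0.706, 0.0, 0.059],
    "G": [0.0, 0.2, 0.4, 0.4],
    "T": [0.5, 0.333, 0.167, 0.0],
}
H_X41 = 1.795

# Eq. (9): H(X41 | X28 = A) is about 0.72 bits.
print(round(entropy(P_41_given_28["A"]), 2))

# Eq. (10): I(X41 : X28 = A) = H(X41) - H(X41 | X28 = A); about 1.07 bits with
# these rounded inputs (the text quotes 1.075 bits).
print(round(H_X41 - entropy(P_41_given_28["A"]), 3))

def average_information(cond, marginal, H_unconditional):
    """Eq. (11): subtract the X28-averaged conditional entropy from H(X41)."""
    H_cond = sum(marginal[s] * entropy(cond[s]) for s in cond)
    return H_unconditional - H_cond

placeholder_marginal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # stand-in for Eq. (3)
print(round(average_information(P_41_given_28, placeholder_marginal, H_X41), 3))
```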
Information content of proteins

We have seen that different positions within a biomolecule can carry information about other positions, but how much information do they store about the environment within which they evolve? This question can be answered using the same
Figure 3. Simplified phylogenetic classification of animals. At the root of this tree (on the left tree) are the eukaryotes, but only
the animal branch is shown here. If we follow the line of descent of humans, we move on the branch toward the vertebrates. The
vertebrate clade itself is shown in the tree on the right, and the line of descent through this tree follows the branches that end in the
mammals. The mammal tree, finally, is shown at the bottom, with the line ending in Homo sapiens indicated in red.
different picture. Except for an obvious split of the bacterial version of the protein and the eukaryotic one, the total entropy markedly decreases across the lines as the taxonomic depth increases. Furthermore, the arthropod COX2 is more entropic than the vertebrate one (see Fig. 5) as opposed to the ordering for the homeobox protein. This finding suggests that the evolution of the protein information content is specific to each protein, and most likely reflects the adaptive value of the protein for each family.
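The "molecular entropy" plotted in Figures 4 and 5 is, at its core, a sum of per-site entropies over an alignment. The sketch below illustrates the idea on a toy alignment; it is only a schematic stand-in for the actual procedure, which uses the Pfam alignments named in the figure captions, restricts the sum to core residues, and typically must account for the finite number of sequences (cf. Ref. 38).

```python
# Toy sketch of the per-site ("molecular") entropy summed over alignment
# columns. A real analysis would use the Pfam alignments named in the figure
# captions, restrict the sum to core residues, and account for the finite
# number of sequences; none of that is attempted here.
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; gaps are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def alignment_entropy(sequences):
    """Sum of per-column entropies over an aligned set of equal-length sequences."""
    return sum(column_entropy(col) for col in zip(*sequences))

# Hypothetical four-sequence alignment; Pfam families contain hundreds.
toy_alignment = ["MKV-A", "MRV-A", "MKVLA", "MKILA"]
print(round(alignment_entropy(toy_alignment), 3))
```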
Evolution of information in robots and animats

The evolution of information within the genes of adapting organisms is but one use of information theory in evolutionary biology. Just as anticipated in the heydays of the "Cybernetics" movement,48 information theory has indeed something to say about the evolution of information processing in animal brains. The general idea behind the connection between information and function is simple: Because information (about a particular system) is what allows the bearer to make predictions (about that particular system) with accuracy better than chance, information is valuable as long as prediction is valuable. In an uncertain world, making accurate predictions is tantamount to survival. In other words, we expect that information, acquired from the environment and processed, has survival value and therefore is selected for in evolution.

Figure 4. Entropy of homeobox domain protein sequences (PF00046 in the Pfam database, accessed July 20, 2006) as a function of taxonomic depth for different major groups that have at least 200 sequences in the database, connected by phylogenetic relationships. Selected groups are annotated by name. Fifty-seven core residues were used to calculate the molecular entropy. Core residues have at least 70% sequence coverage in the database.

Predictive information

The connection between information and fitness can be made much more precise. A key relation between information and its value for agents that survive in an uncertain world as a consequence of their actions in it was provided by Ay et al.,49 who applied a measure called "predictive information" (defined earlier by Bialek et al.50 in the context of dynamical systems theory) to characterize the behavioral complexity of an autonomous robot. These authors showed that the mutual entropy between a changing world (as represented by changing states in an organism's sensors) and the actions of motors that drive the agent's behavior (thus changing the future perceived states) is equivalent to Bialek's
predictive information as long as the agent's decisions are Markovian, that is, only depend on the state of the agent and the environment at the preceding time. This predictive information is defined as the shared entropy between motor variables Yt and the sensor variables at the subsequent time point Xt+1

\[
I_{\rm pred} = I(Y_t : X_{t+1}) = H(X_{t+1}) - H(X_{t+1}\,|\,Y_t). \tag{21}
\]

Here, H(Xt+1) is the entropy of the sensor states at time t + 1, defined as

\[
H(X_{t+1}) = -\sum_{x_{t+1}} p(x_{t+1}) \log p(x_{t+1}), \tag{22}
\]

using the probability distribution p(xt+1) over the sensor states xt+1 at time t + 1. The conditional entropy H(Xt+1|Yt) characterizes how much is left uncertain about the future sensor states Xt+1 given the robot's actions in the present, that is, the state of the motors at time t, and can be calculated in the standard manner20,21 from the joint probability distribution of present motor states and future sensor states p(xt+1, yt).
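To illustrate Eqs. (21) and (22), the following sketch computes the predictive information from a joint distribution over present motor states and subsequent sensor states. The two toy distributions are invented to reproduce the limiting cases discussed in the text (perfect prediction and no prediction) and are not data from Refs. 49 or 51.

```python
# Sketch of Eqs. (21) and (22): predictive information from a joint distribution
# over next sensor states x_t+1 and present motor states y_t. The two toy joint
# distributions are invented and are not data from Refs. 49 or 51.
from collections import defaultdict
from math import log2

def predictive_information(joint):
    """I(Y_t : X_t+1) = H(X_t+1) - H(X_t+1 | Y_t), with joint[(x_next, y)] = p."""
    p_x = defaultdict(float)  # marginal over next sensor states, as in Eq. (22)
    p_y = defaultdict(float)  # marginal over present motor states
    for (x_next, y), p in joint.items():
        p_x[x_next] += p
        p_y[y] += p
    H_x = -sum(p * log2(p) for p in p_x.values() if p > 0)
    H_x_given_y = -sum(p * log2(p / p_y[y]) for (x_next, y), p in joint.items() if p > 0)
    return H_x - H_x_given_y

# Motors perfectly predict the next sensor reading: Ipred is maximal (1 bit here).
deterministic = {("left", "L"): 0.5, ("right", "R"): 0.5}
print(predictive_information(deterministic))   # 1.0

# Motors carry no information about the next reading: Ipred = 0.
independent = {(x, y): 0.25 for x in ("left", "right") for y in ("L", "R")}
print(predictive_information(independent))     # 0.0
```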
Figure 5. Entropy of COX subunit II (PF00116 in the Pfam database, accessed June 22, 2006) protein sequences as a function of taxonomic depth for selected different groups (at least 200 sequences per group), connected by phylogenetic relationships. One hundred twenty core residues were used to calculate the molecular entropy.

As Eq. (21) implies, the predictive information measures how much of the entropy of sensorial states—that is, the uncertainty about what the detectors will record next—is explained by the motor states at the preceding time point. For example, if the motor states at time t perfectly predict what will appear in the sensors at time t + 1, then the predictive information is maximal. Another version of the predictive information studies not the effect the motors have on future sensor states, but the effect the sensors have on future motor states instead, for example to guide an autonomous robot through a maze.51 In the former case, the predictive information quantifies how actions change the perceived world, whereas in the latter case the predictive information characterizes how the perceived world changes the robot's actions. Both formulations, however, are equivalent when taking into account how world and robot states are being updated.51 Although it is clear that measures such as predictive information should increase as an agent or robot learns to behave appropriately in a complex world, it is not at all clear whether information could be used as an objective function that, if maximized, will lead to appropriate behavior of the robot. This is the basic hypothesis of Linsker's "Infomax" principle,52 which posits that neural control structures
evolve to maximize "information preservation" subject to constraints. This hypothesis implies that the infomax principle could play the role of a guiding force in the organization of perceptual systems. This is precisely what has been observed in experiments with autonomous robots evolved to perform a variety of tasks. For example, in one task visual and tactile information had to be integrated to grab an object,53 whereas in another, groups of five robots were evolved to move in a coordinated fashion54 or else to navigate according to a map.55 Such experiments suggest that there may be a deeper connection between information and fitness that goes beyond the regularities induced by a perception–action loop, that connects fitness (in the evolutionary sense as the growth rate of a population) directly to information.

As a matter of fact, Rivoire and Leibler18 recently studied abstract models of the population dynamics of evolving "finite-state agents" that optimize their response to a changing environment and found just such a relationship. In such a description, agents
processes over and above the information processed by the individual neurons, the synergistic information51

\[
SI_{\rm atom} = I(Z_t : Z_{t+1}) - \sum_{i=1}^{n} I\!\left(Z_t^{(i)} : Z_{t+1}^{(i)}\right). \tag{27}
\]

The index "atom" on the synergistic information reminds us that the sum is over the indivisible elements of the network—the neurons themselves.
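The following sketch evaluates Eq. (27) for an invented two-neuron network in which the joint state is perfectly predictable one time step ahead, while each neuron taken by itself carries no information about its own next state, so that all of the processed information is synergistic.

```python
# Sketch of Eq. (27) for an invented two-neuron network. States are tuples of
# binary neuron values; joint[(z_t, z_t1)] = p is a made-up transition table.
from collections import defaultdict
from math import log2

def mutual_information(pairs):
    """I(A : B) from a dict {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in pairs.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in pairs.items() if p > 0)

def synergistic_information(joint):
    """SI_atom = I(Z_t : Z_t+1) - sum_i I(Z_t^(i) : Z_t+1^(i)), Eq. (27)."""
    whole = mutual_information(joint)
    n = len(next(iter(joint))[0])  # number of neurons
    atoms = 0.0
    for i in range(n):
        per_neuron = defaultdict(float)
        for (z_now, z_next), p in joint.items():
            per_neuron[(z_now[i], z_next[i])] += p
        atoms += mutual_information(per_neuron)
    return whole - atoms

# The joint state is perfectly predictable (2 bits), but each neuron alone
# tells us nothing about its own next state, so all the information is synergistic.
joint = {((0, 0), (0, 0)): 0.25, ((0, 1), (1, 0)): 0.25,
         ((1, 0), (0, 1)): 0.25, ((1, 1), (1, 1)): 0.25}
print(synergistic_information(joint))  # 2.0 bits
```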
As we see later, other more general partitions of the network are possible, and oftentimes more appropriate to capture synergy. The synergistic information is related to other measures of synergy that have been introduced independently. One is simply called "integration" and defined in terms of Shannon entropies as64,66,67

\[
I = \sum_{i=1}^{n} H\!\left(Z_t^{(i)}\right) - H(Z_t). \tag{28}
\]

This measure has been introduced earlier under the name "multi-information."68,69 Another measure, called Φatom in Ref. 51, was independently introduced by Ay and Wennekers70,71 as a measure of the complexity of dynamical systems they called "stochastic interaction," and is defined as

\[
\Phi_{\rm atom} = \sum_{i=1}^{n} H\!\left(Z_t^{(i)}\,\big|\,Z_{t+1}^{(i)}\right) - H(Z_t\,|\,Z_{t+1}). \tag{29}
\]

partition of the network into nonoverlapping groups of nodes (parts) that are as independent of each other (information theoretically speaking) as possible. If we define the partition P of a network into k parts via P = {P(1), P(2), . . . , P(k)}, where each P(i) is a part of the network (a nonempty set of neurons with no overlap between the parts), then we can define a quantity that is analogous to Eq. (29) except that the sum is over the parts rather than the individual neurons61

\[
\Phi(P) = \sum_{i=1}^{k} H\!\left(P_t^{(i)}\,\big|\,P_{t+1}^{(i)}\right) - H(P_t\,|\,P_{t+1}). \tag{31}
\]

In Eq. (31), each part carries a time label because every part takes on different states as time proceeds. The so-called "minimum information partition" (or MIP) is found by minimizing a normalized Eq. (31) over all partitions

\[
\mathrm{MIP} = \arg\min_{P} \frac{\Phi(P_t)}{N(P_t)}, \tag{32}
\]

where the normalization \(N(P_t) = (k-1)\min_i[H_{\max}(P_t^{(i)})]\) balances the parts of the partition.62 Using this MIP, the integrated information is then simply given by

\[
\Phi = \Phi(P = \mathrm{MIP}). \tag{33}
\]
n
Figure 6. (A) Three candidate measures of information integration Φatom (29), ΦMC , and I (28) along the line of descent of a
representative evolutionary run in which animats adapted to solve a two-dimensional maze. (B) Three measures of information
processing, in the same run. Blue (solid): total information Itotal (24), green (dashed): atomic information S Iatom (27), and red
(dotted): predictive information Ipred (21) (from Ref. 51).
for brain complexity, among which are the predictive information Eq. (21), the total information Eq. (24), the synergistic information Eq. (27), the integration Eq. (28), the "atomic Φ" of Eq. (29), and the computationally intensive measure ΦMC. Figure 6 shows a representative run (of 64) that shows the six candidate measures as a function of evolutionary time measured in generations. During this run, the fitness increased steadily, with a big step around generation 15,000 where this particular animat evolved the capacity to use memory for navigation (from Ref. 51).

It is not clear from a single run which of these measures best correlates with fitness. If we take the value of each measure attained at the end of each of the 64 runs and plot it against the fitness achieved there (measured as the percentage of the achievable fitness in this environment), the sophisticated measure ΦMC emerges as the clear winner, with a Spearman rank correlation coefficient with achieved fitness of R = 0.937 (see Fig. 7). This suggests that measures of information integration can go beyond simple "reactive" measures such as Ipred in characterizing complex behavior, in particular when the task requires memory, as was the case there.

Figure 7. Correlation of information-based measures of complexity with fitness. ΦMC, I, Φatom, Itotal, and Ipred as a function of fitness at the end of each of 64 independent runs. R indicates Spearman's rank correlation coefficient. The red dot shows the run depicted in Figure 6 (from Ref. 51).

Future directions

Needless to say, there are many more uses for information theory in evolutionary biology than reviewed here. For example, it is possible to describe the evolution of drug resistance in terms of loss, and subsequent gain, of information: when a pathogen is treated with a drug, the fitness landscape of that pathogen is changed (often dramatically), and as a consequence the genomic sequence that represented information before the administration of the drug is not information (or much less information) about the new environment.22 As the pathogen adapts to the new environment, it acquires information about that environment and its fitness increases commensurately.

Generally speaking, it appears that there is a fundamental law that links information to fitness (suitably defined). Such a relationship can be written down explicitly for specific systems, such as the relationship between the information content of DNA binding sites and the affinity the binding proteins have for that site,72 or the relationship between the information content of ribozymes and their catalytic activity.73 We can expect such a relationship to hold as long as information is valuable, and this will always be the case as long as information can be used in decision processes (broadly speaking) that increase the long-term success of a lineage. It is possible to imagine exceptions to such a law where information would be harmful to an organism, in the sense that signals perceived by a sensory apparatus overwhelm, rather than aid, an organism. Such a situation could arise when the signals are unanticipated, and simply cannot be acted upon in an appropriate manner (for example in animal development). It is conceivable that in such a case, mechanisms will evolve that protect an organism from such signals—this is the basic idea behind the evolution of canalization,74 which is the capacity of an organism to maintain its phenotype in the face of genetic
and environmental variation. I would like to point out, however, that strictly speaking, canalization is the evolution of robustness with respect to entropy (noise), not information. If a particular signal cannot be used to make predictions, then this signal is not information. In that respect, even the evolution of canalization (if it increases organismal fitness) increases the amount of information an organism has about its environment, because insulating itself from certain forms of noise will increase the reliability of the signals that the organism can use to further its existence.

An interesting example that illustrates the benefit of information and the cost of entropy is the evolution of cooperation, couched in the language of evolutionary game theory.75 In evolutionary games, cooperation can evolve as long as the decision to cooperate benefits the group more than it costs the individual.76–78 Groups can increase the benefit accruing to them if they can choose judiciously whom to interact with. Thus, acquiring information about the game environment (in this case, the other players) increases the fitness of the group via mutual cooperative behavior. Indeed, it was shown recently that cooperation can evolve among players that interact via the rules of the so-called "Prisoner's Dilemma" game if the strategies that evolve can take into account information about how the opponent is playing.79 However, if this information is marred by noise (either from genetic mutations that decouple the phenotype from the genotype or from other sources), the population will soon evolve to defect rather than to cooperate. This happens because when the signals cannot be relied upon anymore, information (as the noise increases) gradually turns into entropy. In that case, canalization is the better strategy and players evolve genes that ignore the opponent's moves.79 Thus, it appears entirely possible that an information-theoretic formulation of inclusive fitness theory (a theory that predicts the fitness of groups76,77 that goes beyond Hamilton's kin selection theory) will lead to a predictive framework in which reliable communication is the key to cooperation.

Conclusions

Information is the central currency for organismal fitness,80 and appears to be that which increases when organisms adapt to their niche.13 Information about the niche is stored in genes, and used to make predictions about the future states of the environment. Because fitness is higher in well-predicted environments (simply because it is easier to take advantage of the environment's features for reproduction if they are predictable), organisms with more information about their niche are expected to outcompete those with less information, suggesting a direct relationship between information content and fitness within a niche (comparisons of information content across niches, on the other hand, are meaningless because the information is not about the same system). A very similar relationship, also
enforced by the rules of natural selection, can be found for information acquired not through the evolutionary process, but instead via an organism's sensors. When this information is used for navigation, for example, then a measure called "predictive information" is a good proxy for fitness as long as navigation is performed taking only sensor states into account: indeed, appropriate behavior can evolve, even when information, not fitness, is maximized. If, instead, decisions are also influenced by memory, different information-theoretic constructions based on the concept of "integrated information" appear to correlate better with fitness, and capture how the brain forms more abstract representations of the world81 that are used to predict the states of the world on temporal scales much larger than the immediate future. Thus, the ability to make predictions about the world that range far into the future may be the ultimate measure of functional complexity82 and perhaps even intelligence.83

Acknowledgments

I would like to thank Matthew Rupp for collaboration in work presented in Section 3, and J. Edlund, A. Hintze, N. Chaumont, G. Tononi, and C. Koch for stimulating discussions and collaboration in the work presented in Section 4. This work was supported in part by the Paul G. Allen Family Foundation, the Cambridge Templeton Consortium, the National Science Foundation's Frontiers in Integrative Biological Research Grant FIBR-0527023, and NSF's BEACON Center for the Study of Evolution in Action under contract No. DBI-0939454.

Conflicts of interest

The author declares no conflicts of interest.

References

1. Darwin, C. 1859. On the Origin of Species By Means of Natural Selection. John Murray. London.
2. Futuyma, D. 1998. Evolutionary Biology. Sinauer Associates. Sunderland, MA.
3. Ewens, W.J. 2004. Mathematical Population Genetics. Springer. New York.
4. Hartl, D. & A.G. Clark. 2007. Principles of Population Genetics. Sinauer Associates. Sunderland, MA.
5. Lenski, R.E. 2011. Evolution in action: a 50,000-generation salute to Charles Darwin. Microbe 6: 30–33.
6. Barrick, J.E., D.S. Yu, S.H. Yoon, et al. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461: 1243–1247.
7. Adami, C. 1998. Introduction to Artificial Life. Springer Verlag. New York.
8. Adami, C. 2006. Digital genetics: unravelling the genetic basis of evolution. Nat. Rev. Genet. 7: 109–118.
9. Lenski, R.E. & M. Travisano. 1994. Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc. Natl. Acad. Sci. U. S. A. 91: 6808–6814.
10. Cooper, T.F., D.E. Rozen & R.E. Lenski. 2003. Parallel changes in gene expression after 20,000 generations of evolution in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 100: 1072–1077.
11. Blount, Z.D., C.Z. Borland & R.E. Lenski. 2008. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 105: 7899–7906.
12. Woods, R.J., J.E. Barrick, T.F. Cooper, et al. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331: 1433–1436.
13. Adami, C., C. Ofria & T.C. Collier. 2000. Evolution of biological complexity. Proc. Natl. Acad. Sci. U. S. A. 97: 4463–4468.
14. Lenski, R.E., C. Ofria, R.T. Pennock & C. Adami. 2003. The evolutionary origin of complex features. Nature 423: 139–144.
15. Shannon, C. 1948. A mathematical theory of communication. Bell System Tech. J. 27: 379–423, 623–656.
16. Quastler, H., Ed. 1953. Information Theory in Biology. University of Illinois Press. Urbana.
17. Landauer, R. 1991. Information is physical. Phys. Today 44: 23–29.
18. Rivoire, O. & S. Leibler. 2011. The value of information for populations in varying environments. J. Stat. Phys. 142: 1124–1166.
19. Sporns, O. 2011. Networks of the Brain. MIT Press. Cambridge, MA.
20. Ash, R.B. 1965. Information Theory. Dover Publications, Inc. New York, NY.
21. Cover, T.M. & J.A. Thomas. 1991. Elements of Information Theory. John Wiley. New York, NY.
22. Adami, C. 2004. Information theory in molecular biology. Phys. Life Rev. 1: 3–22.
23. Jühling, F., M. Mörl, R.K. Hartmann, et al. 2009. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 37(Suppl. 1): D159–D162.
24. Eddy, S.R. & R. Durbin. 1994. RNA sequence analysis using covariance models. Nucleic Acids Res. 22: 2079–2088.
25. Korber, B.T., R.M. Farber, D.H. Wolpert & A.S. Lapedes. 1993. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc. Natl. Acad. Sci. U. S. A. 90: 7176–7180.
26. Clarke, N.D. 1995. Covariation of residues in the homeodomain sequence family. Protein Sci. 4: 2269–2278.
27. Atchley, W.R., K.R. Wollenberg, W.M. Fitch, et al. 2000. Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol. Biol. Evol. 17: 164–178.
28. Wang, L.-Y. 2005. Covariation analysis of local amino acid sequences in recurrent protein local structures. J. Bioinform. Comput. Biol. 3: 1391–1409.
29. Wahl, L.M., L.C. Martin, G.B. Gloor & S.D. Dunn. 2005. Using information theory to search for co-evolving residues in proteins. Bioinformatics 21: 4116–4124.
30. Wang, Q. & C. Lee. 2007. Distinguishing functional amino acid covariation from background linkage disequilibrium in HIV protease and reverse transcriptase. PLoS One 2: e814.
31. Callahan, B., R.A. Neher, D. Bachtrog, et al. 2011. Correlated evolution of nearby residues in Drosophilid proteins. PLoS Genet. 7: e1001315.
32. Levy, R.M., O. Haq, A.V. Morozov & M. Andrec. 2009. Pairwise and higher-order correlations among drug-resistance mutations in HIV-1 subtype B protease. BMC Bioinformatics 10(Suppl. 8): S10.
33. Kryazhimskiy, S., J. Dushoff, G.A. Bazykin & J.B. Plotkin. 2011. Prevalence of epistasis in the evolution of influenza A surface proteins. PLoS Genet. 7: e1001301.
34. da Silva, J. 2009. Amino acid covariation in a functionally important human immunodeficiency virus type 1 protein region is associated with population subdivision. Genetics 182: 265–275.
35. Billeter, M., Y.Q. Qian, G. Otting, et al. 1993. Determination of the nuclear magnetic resonance solution structure of an Antennapedia homeodomain-DNA complex. J. Mol. Biol. 234: 1084–1093.
36. Li, W.H., M. Gouy, P.M. Sharp, et al. 1990. Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular clocks. Proc. Natl. Acad. Sci. U. S. A. 87: 6703–6707.
37. Finn, R.D., J. Mistry, J. Tate, et al. 2010. The Pfam protein families database. Nucleic Acids Res. 38: D211–D222.
38. Basharin, G.P. 1959. On a statistical estimate for the entropy of a sequence of random variables. Theory Probab. Appl. 4: 333.
39. Scott, M.P., J.W. Tamkun & G.W. Hartzell. 1989. The structure and function of the homeodomain. Biochim. Biophys. Acta 989: 25–48.
40. van der Graaff, E., T. Laux & S.A. Rensing. 2009. The WUS homeobox-containing (WOX) protein family. Genome Biol. 10: 248.
41. Garcia-Horsman, J.A., B. Barquera, J. Rumbley, et al. 1994. The superfamily of heme-copper respiratory oxidases. J. Bacteriol. 176: 5587–5600.
42. Robinson, B.H. 2000. Human cytochrome oxidase deficiency. Pediatr. Res. 48: 581–585.
43. Taanman, J.W. 1997. Human cytochrome c oxidase: structure, function, and deficiency. J. Bioenerg. Biomembr. 29: 151–163.
44. Thornton, J.W. 2004. Resurrecting ancient genes: experimental analysis of extinct molecules. Nat. Rev. Genet. 5: 366–375.
45. Pauling, L. & E. Zuckerkandl. 1963. Chemical paleogenetics: molecular restoration studies of extinct forms of life. Acta Chem. Scand. 17: 89.
46. Benner, S. 2007. The early days of paleogenetics: connecting molecules to the planet. In Ancestral Sequence Reconstruction. D.A. Liberles, Ed.: 3–19. Oxford University Press. New York.
47. Federhen, S. 2002. The taxonomy project. In The NCBI Handbook. J. McEntyre & J. Ostell, Eds. National Center for Biotechnology Information. Bethesda, MD.
48. Wiener, N. 1948. Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press. Cambridge, MA.
49. Ay, N., N. Bertschinger, R. Der, et al. 2008. Predictive information and explorative behavior of autonomous robots. Eur. Phys. J. B 63: 329–339.
50. Bialek, W., I. Nemenman & N. Tishby. 2001. Predictability, complexity, and learning. Neural Comput. 13: 2409–2463.
51. Edlund, J., N. Chaumont, A. Hintze, et al. 2011. Integrated information increases with fitness in the evolution of animats. PLoS Comput. Biol. 7: e1002236.
52. Linsker, R. 1988. Self-organization in a perceptual network. Computer 21: 105–117.
53. Sporns, O. & M. Lungarella. 2006. Evolving coordinated behavior by maximizing information structure. In Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems. L.M. Rocha, L.S. Yaeger, M.A. Bedau, D. Floreano, R.L. Goldstone, et al., Eds.: 323–329. MIT Press. Bloomington, IN.
54. Zahedi, K., N. Ay & R. Der. 2010. Higher coordination with less control: a result of information maximization in the sensorimotor loop. Adapt. Behav. 18: 338–355.
55. Klyubin, A.S., D. Polani & C.L. Nehaniv. 2007. Representations of space and time in the maximization of information flow in the perception-action loop. Neural Comput. 19: 2387–2432.
56. Tononi, G., O. Sporns & G.M. Edelman. 1996. A complexity measure for selective matching of signals by the brain. Proc. Natl. Acad. Sci. U. S. A. 93: 3422–3427.
57. Tononi, G. 2001. Information measures for conscious experience. Arch. Ital. Biol. 139: 367–371.
58. Tononi, G. & O. Sporns. 2003. Measuring information integration. BMC Neurosci. 4: 31.
59. Tononi, G. 2004. An information integration theory of consciousness. BMC Neurosci. 5: 42.
60. Tononi, G. & C. Koch. 2008. The neural correlates of consciousness: an update. Ann. N. Y. Acad. Sci. 1124: 239–261.
61. Tononi, G. 2008. Consciousness as integrated information: a provisional manifesto. Biol. Bull. 215: 216–242.
62. Balduzzi, D. & G. Tononi. 2008. Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Comput. Biol. 4: e1000091.
63. Balduzzi, D. & G. Tononi. 2009. Qualia: the geometry of integrated information. PLoS Comput. Biol. 5: e1000462.
64. Tononi, G., O. Sporns & G.M. Edelman. 1994. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. U. S. A. 91: 5033–5037.
65. Tononi, G. 2010. Information integration: its relevance to brain function and consciousness. Arch. Ital. Biol. 148: 299–322.
66. Lungarella, M., T. Pegors, D. Bulwinkle & O. Sporns. 2005. Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics 3: 243–262.
67. Lungarella, M. & O. Sporns. 2006. Mapping information flow in sensorimotor networks. PLoS Comput. Biol. 2: e144.
68. McGill, W.J. 1954. Multivariate information transmission. Psychometrika 19: 97–116.
69. Schneidman, E., S. Still, M.J. Berry & W. Bialek. 2003. Network information and connected correlations. Phys. Rev. Lett. 91: 238701.
70. Ay, N. & T. Wennekers. 2003. Temporal infomax leads to almost deterministic dynamical systems. Neurocomputing 52–54: 461–466.
71. Ay, N. & T. Wennekers. 2003. Dynamical properties of strongly interacting Markov chains. Neural Netw. 16: 1483–1497.
72. Adami, C. 2012. Unpublished.
73. Carothers, J.M., S.C. Oestreich, J.H. Davis & J.W. Szostak. 2004. Informational complexity and functional activity of RNA structures. J. Am. Chem. Soc. 126: 5130–5137.
74. Waddington, C.H. 1942. Canalization of development and the inheritance of acquired characters. Nature 150: 563–565.
75. Maynard Smith, J. 1982. Evolution and the Theory of Games. Cambridge University Press. Cambridge, UK.
76. Queller, D.C. 1985. Kinship, reciprocity and synergism in the evolution of social behavior. Nature 318: 366–367.
77. Fletcher, J.A. & M. Zwick. 2006. Unifying the theories of inclusive fitness and reciprocal altruism. Am. Nat. 168: 252–262.
78. Fletcher, J.A. & M. Doebeli. 2009. A simple and general explanation for the evolution of altruism. Proc. R. Soc. B-Biol. Sci. 276: 13–19.
79. Iliopoulos, D., A. Hintze & C. Adami. 2010. Critical dynamics in the evolution of stochastic strategies for the iterated Prisoner's Dilemma. PLoS Comput. Biol. 6: e1000948.
80. Polani, D. 2009. Information: currency of life? HFSP J. 3: 307–316.
81. Marstaller, L., C. Adami & A. Hintze. 2012. Cognitive systems evolve complex representations for adaptive behavior. In press.
82. Adami, C. 2002. What is complexity? Bioessays 24: 1085–1094.
83. Adami, C. 2006. What do robots dream of? Science 314: 1093–1094.