From Philosophy To Program Size Key Idea PDF
From Philosophy To Program Size Key Idea PDF
From Philosophy To Program Size Key Idea PDF
i i
i i
i i
i i
i i
i i
i i
i i
i i
i i
i i
i i
Lecture Notes on
Algorithmic Information Theory
from the 8th Estonian Winter School in
Computer Science, EWSCS’03
Institute of Cybernetics
Tallinn
i i
i i
i i
Gregory J. Chaitin
IBM Research
P. O. Box 218, Yorktown Heights, NY 10598, USA
http://www.cs.umaine.edu/~chaitin/
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/
The 8th Estonian Winter School in Computer Science, EWSCS’03, was held
at Palmse, Estonia, 2-7 March 2003, organized by the Institute of Cybernetics,
Tallinn.
The organizers acknowledge the financial support from the Estonian Informa-
tion Technology Foundation through the Tiigriülikool programme, the Euro-
pean Commission through the FP5 IST project eVikings II and the Ministry
of Education and Research of Estonia through its Centers of Excellence in
Research programme.
ISBN 9985-894-55-3
c Gregory J. Chaitin 2003
i i
i i
i i
Preface
This little book contains the course that I had the pleasure of giving at
the 8th Estonian Winter School in Computer Science (EWSCS ’03) held
at the beautiful Palmse manor in Lahemaa National Park, Estonia, from
March 2nd through 7th, 2003. There I gave four 90-minute lectures on
algorithmic information theory (AIT), which is the theory of program-
size complexity. Each of these lectures is one chapter of this book.
In these lectures I discuss philosophical applications of AIT, not prac-
tical applications. Indeed, I believe AIT has no practical applications.
The most interesting thing about AIT is that you can almost never
determine the complexity of anything. This makes the theory useless
for practical applications, but fascinating from a philosophical point of
view, because it shows that there are limits to knowledge.
Most work on computational complexity is concerned with time.
However this course will try to show that program-size complexity, which
measures algorithmic information, is of much greater philosophical sig-
nificance. I’ll discuss how one can use this complexity measure to study
what can and cannot be achieved by formal axiomatic mathematical
theories.
In particular, I’ll show (a) that there are natural information-
theoretic constraints on formal axiomatic theories, and that program-
size complexity provides an alternative path to incompleteness from the
one originally used by Kurt Gödel. Furthermore, I’ll show (b) that
in pure mathematics there are mathematical facts that are true for no
reason, that are true by accident. These have to do with determining
the successive binary digits of the precise numerical value of the halting
probability Ω for a “self-delimiting” universal Turing machine.
i i
i i
i i
—Gregory Chaitin
http://www.cs.umaine.edu/~chaitin
http://www.cs.auckland.ac.nz/CDMTCS/chaitin
i i
i i
i i
Contents
i i
i i
i i
i i
i i
i i
Additional Reading 57
i i
i i
i i
i i
i i
i i
Chapter 1
Day I—Philosophical
Necessity of AIT
11
i i
i i
i i
i i
i i
i i
DAY I—PHILOSOPHY 13
i i
i i
i i
[This assumes that the theory uses a formal language, symbolic logic,
requires complete proofs, and provides a proof-checking algorithm that
always enables you to decide mechanically whether or not a proof is
correct, whether or not it followed all the rules in deriving a logical
consequence of the axioms.]
And we shall concentrate our attention on the size in bits of this
computer program for generating all the theorems. That’s our measure
of the complexity of this theory, that’s our measure of its information
content.
So when we speak of an N -bit theory, we mean one with an N -
bit program for generating all the theorems. We don’t care that this
process is very slow and never terminates. AIT is a theoretical theory,
not a practical theory!
Okay, that’s half the setup! These are the theories we’ll consider.
Here’s the other half of the setup. Here are the theorems that we
want these theories to be able to establish. We want to use these theories
to prove that particular computer programs are elegant.
“Elegant” Programs:
No smaller program gives exactly the same output.
i i
i i
i i
DAY I—PHILOSOPHY 15
For any particular computational task, for any particular output that
we desire to achieve, there has to be at least one elegant program, and
there may even be several.
But what if we want to be able to prove that a particular program
is elegant?
i i
i i
i i
Let c be the size in bits of the fixed-size routine that does all this: It
is given the N -bit theory as data and then it runs through all possible
proofs in the theory until it finds a provably elegant program that is
sufficiently large (> N + c bits), then it runs that program and returns
that large elegant program’s output as its own output.
So the N + c bit program displayed above produces the same output
as a provably elegant program that is larger than N + c bits. But that
is impossible!
So our precise result is that an N -bit theory cannot enable us to
prove that any program that is larger than N +c bits is elegant.
Here c is the size in bits of the fixed-size routine that when added to
the N -bit formal theory as above yields the paradox that proves our
theorem.
Note: We are making the tacit assumption that if our theory proves
that a program is elegant, then that program is actually elegant. I.e.,
we assume that only true theorems are provable in our formal theory. If
this is NOT the case, then the theory is of absolutely no interest.
i i
i i
i i
DAY I—PHILOSOPHY 17
Gödel, who had the conventional Platonist view of math, was only
forced into backing new math axioms that are only justified prag-
matically just as in physics because of his famous 1931 incompleteness
theorem. And I believe that the ideas that I’ve just presented applying
program-size complexity to incompleteness, in particular my result that
it takes an N -bit theory to prove that an N -bit program is elegant, and
the results on Ω that we’ll see Day II, provide even more support for
Gödel’s heretical views on new axioms.
The way AIT measures the complexity (information content) of math-
ematical theories makes Gödel incompleteness seem much more natural,
much more pervasive, much more inevitable, and much more danger-
ous. Adding new axioms—adding more mathematical information—
seems to be the only way out, the only way to go forward!
We’ve discussed adding new axioms in math just as in physics, prag-
matically. A related question is
i i
i i
i i
i i
i i
i i
DAY I—PHILOSOPHY 19
i i
i i
i i
Tymoczko,
New Directions in the Philosophy of Mathematics,
Princeton University Press, 1998.
i i
i i
i i
DAY I—PHILOSOPHY 21
Independent parts of the world, of which living beings are the most
interesting example, have the property that their program-size complex-
ity decomposes additively.
Conversely, the parts of a living being are certainly not indepen-
dent and have high mutual information. [Mutual information will be
discussed in Sections 3.6 and 3.7, Day III, and Section 4.3, Day IV.]
i i
i i
i i
i i
i i
i i
Chapter 2
– axioms
– rules of inference
– symbolic logic
– formal grammar
23
i i
i i
i i
i i
i i
i i
DAY II—INCOMPLETENESS 25
• B = 0.d1 d2 d3 . . .
• dN =
1 −→ answer Yes!
2 −→ answer No!
0 −→ NO ANSWER
(N th text in French is not a valid yes/no question,
or cannot be answered.)
i i
i i
i i
• T = 0.b1 b2 b3 . . .
• bN =
0 −→ pN doesn’t halt,
1 −→ pN halts.
• Riemann Hypothesis
About the location of the complex zeroes of the zeta function
X 1 Y 1
ζ(s) ≡ =
n ns p 1 − p1s
(Here n ranges over positive integers and p ranges over the primes.)
Tells us a lot about the distribution of prime numbers.
• Goldbach Conjecture
Is every even number the sum of two primes?
i i
i i
i i
DAY II—INCOMPLETENESS 27
2−|p|
X
Ω≡
p halts
i i
i i
i i
i i
i i
i i
DAY II—INCOMPLETENESS 29
i i
i i
i i
H(x) is the size in bits of the smallest program for computing x on each
machine.
Then we have
HU (x) ≤ HC (x) + |πC |.
In other words, programs for U are not too large. For U to simulate
C adds only a fixed number of bits to each program for C.
Then ΩU is defined as follows:
2−|p| .
X
ΩU ≡
U (p) halts
i i
i i
i i
DAY II—INCOMPLETENESS 31
For more details, see Ord and Kieu, On the existence of a new family of
Diophantine equations for Ω,
http://arxiv.org/abs/math.NT/0301274.
So we get randomness in arithmetic! Ω’s algorithmic ran-
domness also infects elementary number theory! The disease
is spreading!
i i
i i
i i
i i
i i
i i
Chapter 3
Self-delimiting computer C
• C(p) −→ output
33
i i
i i
i i
i i
i i
i i
• If we knew all the laws of physics, and the world turned out to be
a giant computer, then we could use that computer as U !
•
P −H(x) ≤ 1.
x2
Then pick out an optimal minimal H, one with the property that for
any other H 0 there is a constant c such that
H(x) ≤ H 0 (x) + c.
1
“Art is a lie that helps us to see the truth,” Pablo Picasso.
i i
i i
i i
• Subadditivity:
i i
i i
i i
i i
i i
i i
In other words, you have to know how many bits there are plus
what each bit is. And most N -bit strings have H close to the
maximum possible. These are the algorithmically random N -
bit strings.
This notion has many philosophical resonances. First, a ran-
dom string is one for which there is no theory that obeys Leibniz’s
dictum that a theory must be smaller than what it explains. Leib-
niz clearly anticipated this definition of randomness.
Second, a random string is an unexplainable, irrational string, one
that cannot be comprehended, except as “a thing in itself” (Ding
an sich), to use Kantian terminology.
i i
i i
i i
• Low complexity:
Consider the N -bit strings consisting of N 0’s or N 1’s.
• Intermediate complexity:
Consider an elegant program p. I.e., no smaller program makes U
produce the same output.
2−|p|
X
Ω≡
U (p) halts
i i
i i
i i
Ω0 ≡ 2−H(N )
X
i i
i i
i i
2−|p| ,
X
PC (x) ≡
C(p)=x
2−|p| .
X
P (x) ≡ PU (x) ≡
U (p)=x
This takes into account all the programs that calculate x, not just the
smallest one.
P (x) ≥ 2−H(x)
This crucial result shows that AIT, at least in so far as the appearance of
its formulas is concerned, is sort of just a version of probability theory
in which we take logarithms and convert probabilities into complexities.
In particular, the important and subtle theorem that
i i
i i
i i
i i
i i
i i
i i
i i
i i
In the last lecture, Day IV, I’ll drop this abstract viewpoint and
make everything very, very concrete by indicating how to program U in
LISP and actually run programs on U . . .
i i
i i
i i
Chapter 4
Day IV—LISP
Implementation of AIT
The LISP interpreter for Day IV, which is a Java applet that will run
in your web browser, can be found at these two URL’s:
http://www.cs.umaine.edu/~chaitin/unknowable/lisp.html
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/unknowable/
lisp.html
Note that there is a concise LISP “reference manual” on these web pages,
just below the windows for input to and output from the interpreter.
45
i i
i i
i i
such that
smallest p −→ U −→ {theorems}
The rather abstract point of view taken here is in the spirit of Emil
Post’s 1944 paper R.e. sets of positive integers and their decision prob-
lems. For our purposes the internal details of the formal theory are
irrelevant. We are flying high in the sky looking down at the formal
theory. We’re so high up that we don’t care about the axioms and rules
of inference used in the theory. We can’t see all of that detail. Neverthe-
less, using these very general methods we can make some rather strong
statements about the power of the formal theory in question.
This follows from the fact that knowing ΩK , the first K bits of Ω,
enables you to answer the halting problem for all programs for U
up to K bits in size.
• Now that you know that ΩK has high complexity, in the second
half of the proof you use this fact to derive a contradiction from
i i
i i
i i
Well, you could try to make mutual information into the size of a
program. E.g., mutual information could be defined to be the size of
the largest (not the smallest!) possible common subroutine shared by
elegant programs for x and y.1
Unfortunately this alternative definition, which is much more intu-
itive than the one I actually use, doesn’t seem to fit into AIT too well.
Perhaps someone can find a way to make it work! Can you?
1
For other possibilites, see the section on common information in my 1979
paper Toward a mathematical definition of “life.”
i i
i i
i i
Therefore
i i
i i
i i
You can go on like this using more and more headers and getting more
and more complicated upper bounds. To short-circuit this process, just
go back to using a single header, and make it an elegant program for
N . That gives
i i
i i
i i
i i
i i
i i
You can get a toy version of AIT by using LISP program size!
Define an elegant LISP expression to be one with the property that
no smaller LISP expression yields the same value.
In order to measure size properly, LISP expressions are written in
a canonical standard notation with no embedded comments and with
exactly one blank separating successive elements of a list.
Then the LISP complexity of an S-expression is defined to be the
size in characters of an elegant expression with that particular value.
Next represent a formal mathematical theory in LISP as a LISP S-
expression whose evaluation never terminates and which uses the prim-
itive function display to output each theorem.
display is an identity function with the side-effect of displaying its
operand and is normally used for debugging large LISP S-expressions
that give the wrong value.
Theorem—
A formal mathematical theory with LISP complexity N cannot
enable you to prove that a LISP expression is elegant if the
elegant expression’s size is greater than N + 410 characters!
This is a very concrete, down-to-earth incompleteness theorem! In
that direction, it’s the best I can do!
The reductio ad absurdum proof consists of N + 410 characters of
LISP that define a function, and that apply this function to a quoted
form of the N -character LISP expression that generates all the theorems
of the formal theory. The first step is to measure the size N of the LISP
expression that generates all the theorems. Then use TRY (Section
4.7) to run the theory for longer and longer, until a provably elegant
expression is found whose size is greater than N + 410. When this
happens, evaluate the provably elegant expression and return its value
as our value.
So we’ve described an N +410 character LISP expression whose value
is the same as the value of an elegant expression that is larger than it
is! Contradiction!
i i
i i
i i
i i
i i
i i
i i
i i
i i
run-utm-on
bits ’ ’(a b c)
run-utm-on
append bits ’read-bit
’(0)
This yields 0.
Let’s change the data bit. Now the prefix is read-bit and the data
is 1.
run-utm-on
append bits ’read-bit
’(1)
This yields 1.
run-utm-on is a macro that expands to the definition of U , which is
cadr try no-time-limit ’eval read-exp.
Also note that when bits converts a LISP S-expression into a bit
string (a list of 0’s and 1’s) it automatically adds the newline character
required by read-exp.
i i
i i
i i
For years and years I used this inequality without having the faintest
idea what the value of the constant might be! It’s nice to know that there
is a natural choice for U that can be easily programmed in LISP and for
which c = 432! That’s a little bit like dreaming of a woman for years
and then actually meeting her! That’s also how I feel about now having
a version of U that I can actually run code on!
My book The Limits of Mathematics programs in LISP all the proofs
of my incompleteness results, in particular, all the key results about Ω.
And my book Exploring Randomness programs in LISP all the proofs
of the results presented Day III. That’s a lot of LISP programming, but
I enjoyed it a lot!
Some of these programs are fun to read, and it’s educational to do
so, but I probably went overboard! In general, I would say that you are
better off programming something yourself, rather than reading someone
else’s code, if you really want to understand what’s going on! There is
no substitute for doing it yourself, your own way, if you really want to
understand something. After all, that’s how I came up with AIT and my
alternative approach to Gödel incompleteness in the first place, because
I was dissatisfied with the usual approach!
i i
i i
i i
• I think that AIT is largely sketched out and only highly technical
questions remain. I personally am more interested in attempting
new theories of life or intelligence. This however will not be easy
to do.
3
In this connection, see footnote 6 on page 24.
i i
i i
i i
Additional Reading
The six papers listed here were the handouts that accompanied my lec-
tures at EWSCS ’03. The two books listed below provide the reference
material for the course, including complete proofs for all the theorems
that I stated here, and a detailed explanation of my version of LISP.
57
i i
i i
i i
The papers listed here are also available at the author’s websites:
http://www.cs.umaine.edu/~chaitin
http://www.cs.auckland.ac.nz/CDMTCS/chaitin
http://www.cs.umaine.edu/~chaitin/ait
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait
http://www.cs.umaine.edu/~chaitin/lm.html
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/lm.html
http://www.cs.umaine.edu/~chaitin/unknowable/lisp.html
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/unknowable/
lisp.html
i i
i i
i i
i i
i i
i i
The organizers wish to thank all lecturers of EWSCS’03 for the good
work they did at Palmse. Thanks for coming, thanks for sharing your
knowledge and your passion for your subject!
Many thanks also to the people of Park Hotel Palmse and the Visitor
Center of Lahemaa National Park.
i i
i i
i i
i i
i i
i i
i i
i i