Mathematics For Computer Science PDF
for
Theoretical Computer Science
(preliminary version 1.0)
Margaret M. Fleck
January 9, 2012
Contents
Preface
1 Math review
1.1 Some sets
1.2 Pairs of reals
1.3 Exponentials and logs
1.4 Factorial, floor, and ceiling
1.5 Summations
1.6 Variation in notation
2 Logic
2.1 A bit about style
2.2 Propositions
2.3 Complex propositions
2.4 Implication
2.5 Converse, contrapositive, biconditional
2.6 Complex statements
2.7 Logical Equivalence
2.8 Some useful logical equivalences
3 Proofs
3.1 Proving a universal statement
3.2 Another example of direct proof involving odd and even
3.3 Direct proof outline
3.4 Proving existential statements
3.5 Disproving a universal statement
3.6 Disproving an existential statement
3.7 Recap of proof methods
3.8 Direct proof: example with two variables
3.9 Another example with two variables
3.10 Proof by cases
3.11 Rephrasing claims
3.12 Proof by contrapositive
3.13 Another example of proof by contrapositive
3.14 Proof by contradiction
3.15 $\sqrt{2}$ is irrational
4 Number Theory
4.1 Factors and multiples
4.2 Direct proof with divisibility
4.3 Stay in the Set
4.4 Prime numbers
4.5 There are infinitely many prime numbers
4.6 GCD and LCM
4.7 The division algorithm
4.8 Euclidean algorithm
4.9 Pseudocode
4.10 A recursive version of gcd
4.11 Congruence mod k
4.12 Proofs with congruence mod k
4.13 Equivalence classes
4.14 Wider perspective on equivalence
4.15 Variation in Terminology
5 Sets
5.1 Sets
5.2 Things to be careful about
5.3 Cardinality, inclusion
5.4 Vacuous truth
5.5 Set operations
5.6 Set identities
5.7 Size of set union
5.8 Product rule
6 Relations
6.1 Relations
6.2 Properties of relations: reflexive
6.3 Symmetric and antisymmetric
6.4 Transitive
6.5 Types of relations
6.6 Proving that a relation is an equivalence relation
6.7 Proving antisymmetry
7.9 A 2D example
7.10 Composing two functions
7.11 A proof involving composition
7.12 Variation in terminology
9 Graphs
9.1 Graphs
9.2 Degrees
9.3 Complete graphs
9.4 Cycle graphs and wheels
9.5 Isomorphism
9.6 Subgraphs
9.7 Walks, paths, and cycles
9.8 Connectivity
10 Induction
10.1 Introduction to induction
10.2 An Example
10.3 Why is this legit?
10.4 Building an inductive proof
10.5 Another example of induction
10.6 Some comments about style
10.7 Another example
10.8 A geometrical example
10.9 Graph coloring
10.10 Postage example
10.11 Nim
10.12 Prime factorization
10.13 Variation in notation
12 Trees
12.1 Why trees?
12.2 Defining trees
12.3 m-ary trees
12.4 Height vs number of nodes
12.5 Context-free grammars
12.6 Recursion trees
12.7 Another recursion tree example
12.8 Tree induction
12.9 Heap example
12.10 Proof using grammar trees
12.11 Variation in terminology
13 Big-O
13.1 Running times of programs
13.2 Function growth: the ideas
13.3 Primitive functions
13.4 The formal definition
13.5 Applying the definition
13.6 Writing a big-O proof
13.7 Sample disproof
14 Algorithms
14.1 Introduction
14.2 Basic data structures
14.3 Nested loops
14.4 Merging two lists
14.5 A reachability algorithm
14.6 Binary search
14.7 Mergesort
14.8 Tower of Hanoi
14.9 Multiplying big integers
17 Countability
17.1 The rationals and the reals
17.2 Completeness
17.3 Cardinality
17.4 More countably infinite sets
17.5 Cantor Schroeder Bernstein Theorem
17.6 P(N) isn't countable
17.7 More uncountability results
17.8 Uncomputability
17.9 Variation in notation
A Jargon
A.1 Strange technical terms
A.2 Odd uses of normal words
A.3 Constructions
A.4 Unexpectedly normal
This book teaches two different sorts of things, woven together. It teaches
you how to read and write mathematical proofs. It also provides a survey
of basic mathematical objects, notation, and techniques which will be useful
in later computer science courses. These include propositional and predicate
logic, sets, functions, relations, modular arithmetic, counting, graphs, and
trees.
rience writing proofs, including inductive proofs, this book may be too easy
for you. You may wish to read some of the books listed as supplementary
readings in Appendix B.
For instructors
This text is designed for a broad range of computer science majors, ranging
from budding young theoreticians to capable programmers with very little
interest in theory. It assumes only limited mathematical maturity, so that it
can be used very early in the major. Therefore, a central goal is to explain
the process of proof construction clearly to students who can't just pick it
up by osmosis.
This book is designed to be succinct, so that students will read and absorb
its contents. Therefore, it includes only core concepts and a selection of
illustrative examples, with the expectation that the instructor will provide
supplementary materials as needed and that students can look up a wider
range of facts, definitions, and pictures, e.g. on the internet.
Although the core topics in this book are old and established, terminol-
ogy and notation have changed over the years and vary somewhat between
authors. To avoid overloading students, I have chosen one clean, modern
version of notation, definitions, and terminology to use consistently in the
main text. Common variations are documented at the end of each chapter. If
students understand the underlying concepts, they should have no difficulty
adapting when they encounter different conventions in other books.
Many traditional textbooks do a careful and exhaustive treatment of each
topic from first principles, including foundational constructions and theorems
which prove that the conceptual system is well-formed. However, the most
abstract parts of many topics are hard for beginners to absorb and the impor-
tance of the foundational development is lost on most students at this level.
The early parts of this textbook remove most of this clutter, to focus more
clearly on the key concepts and constructs. At the end, we revisit certain
topics to look at more abstract examples and selected foundational issues.
Chapter 1
Math review
This book assumes that you understood precalculus when you took it. So you
used to know how to do things like factoring polynomials, solving high school
geometry problems, using trigonometric identities. However, you probably
can't remember it all cold. Many useful facts can be looked up (e.g. on the
internet) as you need them. This chapter reviews concepts and notation that
will be used a lot in this book, as well as a few concepts that may be new
but are fairly easy.
$\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$ is the integers.
a point in 2D space
a complex number
$b^x b^y = b^{x+y}$
$a^x b^x = (ab)^x$
$(b^x)^y = b^{xy}$
$b^{(x^y)} \neq (b^x)^y$
Suppose that $b > 1$. Then we can invert the function $b^x$, to get the function
$\log_b x$ (logarithm of $x$ to the base $b$). Logarithms appear in computer
science as the running times of particularly fast algorithms. They are also
used to manipulate numbers that have very wide ranges, e.g. probabilities.
Notice that the log function takes only positive numbers as inputs. In this
class, $\log x$ with no explicit base always means $\log_2 x$ because analysis of
computer algorithms makes such heavy use of base-2 numbers and powers of
2.
Useful facts about logarithms include:
$b^{\log_b(x)} = x$
$\log_b(xy) = \log_b x + \log_b y$
$\log_b(x^y) = y \log_b x$
$\log_b x = \log_a x \, \log_b a$
In the change of base formula, it's easy to forget whether the last term
should be $\log_b a$ or $\log_a b$. To figure this out on the fly, first decide which of
$\log_b x$ and $\log_a x$ is larger. You then know whether the last term should be
larger or smaller than one. One of $\log_b a$ and $\log_a b$ is larger than one and
the other is smaller: figure out which is which.
More importantly, notice that the multiplier to change bases is a constant,
i.e. it doesn't depend on $x$. So it just stretches or compresses the curve
vertically without really changing its shape. That's an extremely important fact that
you should remember, even if you can't reconstruct the precise formula. In
many computer science analyses, we don't care about constant multipliers.
So the fact that base changes simply multiply by a constant means that we
frequently don't have to care what the base actually is. Thus, authors often
write $\log x$ and don't specify the base.
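As a quick numerical check of the change of base formula, here is a minimal Python sketch. It assumes a = 10 and b = 2, so the multiplier is log base 2 of 10. The function name log2_via_log10 is made up for this illustration.

```python
import math

# Assumed bases for illustration: a = 10, b = 2.
# The multiplier log_2(10) is a constant: it does not depend on x.
MULTIPLIER = math.log2(10)

def log2_via_log10(x: float) -> float:
    """Compute log_2(x) as log_10(x) * log_2(10), for x > 0."""
    return math.log10(x) * MULTIPLIER

# The same multiplier works for every positive x.
for x in (3.0, 8.0, 1000.0, 1e6):
    assert math.isclose(log2_via_log10(x), math.log2(x))
```

Since log base 2 of x is larger than log base 10 of x whenever x > 1, the multiplier has to be larger than one, which matches the rule of thumb above.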
$k! = 1 \cdot 2 \cdot 3 \cdot \ldots \cdot (k-1) \cdot k$
1.5 Summations
If $a_i$ is some formula that depends on $i$, then
\[ \sum_{i=1}^{n} a_i = a_1 + a_2 + a_3 + \ldots + a_n \]
For example
\[ \sum_{i=1}^{n} \frac{1}{2^i} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots + \frac{1}{2^n} \]
\[ \prod_{k=1}^{n} \frac{1}{k} = \frac{1}{1} \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \ldots \cdot \frac{1}{n} \]
Certain sums can be re-expressed in closed form, i.e. without the summation
notation. For example:
\[ \sum_{i=1}^{n} \frac{1}{2^i} = 1 - \frac{1}{2^n} \]
In Calculus, you may have seen the infinite version of this sum, which
converges to 1. In this class, we're always dealing with finite sums, not
infinite ones.
If you modify the start value, so we start with the zeroth term, we get
the following variation on this summation. Always be careful to check where
your summation starts.
\[ \sum_{i=0}^{n} \frac{1}{2^i} = 2 - \frac{1}{2^n} \]
Many reference books have tables of useful formulas for summations that
have simple closed forms. We'll see them again when we cover mathematical
induction and see how to formally prove that they are correct.
Only a very small number of closed forms need to be memorized. One
example is
\[ \sum_{i=1}^{n} i = \frac{n(n+1)}{2} \]
Here's one way to convince yourself it's right. On graph paper, draw a
box $n$ units high and $n+1$ units wide. Its area is $n(n+1)$. Fill in the leftmost
part of each row to represent one term of the summation: the left box in the
first row, the left two boxes in the second row, and so on. This makes a little
triangular pattern which fills exactly half the box.
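The closed forms in this section can also be spot-checked by brute force. The sketch below is an illustration only (a check for small n is not a proof). The helper names geometric_sum and triangular_sum are invented here, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def geometric_sum(n: int, start: int = 1) -> Fraction:
    """Sum of 1/2^i for i from start to n, computed term by term."""
    return sum((Fraction(1, 2**i) for i in range(start, n + 1)), Fraction(0))

def triangular_sum(n: int) -> int:
    """Sum of the integers 1 through n, computed term by term."""
    return sum(range(1, n + 1))

for n in range(1, 30):
    # Sum starting at i = 1
    assert geometric_sum(n) == 1 - Fraction(1, 2**n)
    # Same sum starting at i = 0: the extra zeroth term adds 1
    assert geometric_sum(n, start=0) == 2 - Fraction(1, 2**n)
    # Sum of the first n positive integers
    assert triangular_sum(n) == n * (n + 1) // 2
```

Changing only the start value changes the closed form, which is why it pays to check where a summation starts.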
Chapter 2
Logic
This chapter covers propositional logic and predicate logic at a basic level.
Some deeper issues will be covered later.
2.2 Propositions
Two systems of logic are commonly used in mathematics: propositional logic
and predicate logic. We'll start by covering propositional logic.
A proposition is a statement which is true or false (but never both!). For
example, "Urbana is in Illinois" or $2 \le 15$. It can't be a question. It also
can't contain variables, e.g. $x \le 9$ isn't a proposition. Sentence fragments
without verbs (e.g. "bright blue flowers") or arithmetic expressions (e.g.
$5 + 17$), aren't propositions because they don't state a claim.
The lack of variables prevents propositional logic from being useful for
very much, though it has some applications in circuit analysis, databases, and
artificial intelligence. Predicate logic is an upgrade that adds variables. We
will mostly be using predicate logic in this course. We just use propositional
logic to get started.
2.4 Implication
Two propositions p and q can also be joined into the conditional statement
"if p, then q." The proposition after the "if" (p in this case) is called the
hypothesis and the proposition after "then" (q in this example) is called
(p q) (r s)
To show this, we build the truth table for $\neg p \lor q$ and notice that the output
values exactly match those for $p \to q$.
$p$   $q$   $\neg p$   $\neg p \lor q$
T     T     F          T
T     F     F          F
F     T     T          T
F     F     T          T
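The truth-table comparison can be automated. The sketch below assumes the standard truth table for "if p, then q" (false only when p is true and q is false). The names IMPLIES and equivalent are invented for this illustration.

```python
from itertools import product

# Assumed standard truth table for "if p, then q".
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def equivalent(f, g, num_vars: int) -> bool:
    """True if f and g give the same output for every input row."""
    rows = product([True, False], repeat=num_vars)
    return all(f(*row) == g(*row) for row in rows)

def implication(p: bool, q: bool) -> bool:
    return IMPLIES[(p, q)]

def not_p_or_q(p: bool, q: bool) -> bool:
    return (not p) or q

def converse(p: bool, q: bool) -> bool:
    return IMPLIES[(q, p)]

assert equivalent(implication, not_p_or_q, 2)
# By contrast, an implication is not equivalent to its converse.
assert not equivalent(implication, converse, 2)
```

Two expressions are logically equivalent exactly when every row matches, so checking all four rows is a complete argument here, not just a spot check.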
Two very well-known equivalences are De Morgan's Laws. These state
that $\neg(p \land q)$ is equivalent to $\neg p \lor \neg q$, and that $\neg(p \lor q)$ is equivalent to
$\neg p \land \neg q$. Similar rules in other domains (e.g. set theory) are also called De
Morgan's Laws. They are especially helpful, because they tell you how to
simplify the negation of a complex statement involving "and" and "or".
We can show this easily with another truth table:
$p$   $q$   $\neg p$   $\neg q$   $p \lor q$   $\neg(p \lor q)$   $\neg p \land \neg q$
T     T     F          F          T            F                  F
T     F     F          T          T            F                  F
F     T     T          F          T            F                  F
F     F     T          T          F            T                  T
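Both of De Morgan's Laws can be confirmed the same way, by enumerating every row of the truth table. This is a small sketch, and the helper name equivalent is hypothetical.

```python
from itertools import product

def equivalent(f, g, num_vars: int) -> bool:
    """True if f and g give the same output for every input row."""
    rows = product([True, False], repeat=num_vars)
    return all(f(*row) == g(*row) for row in rows)

# not (p or q)  is equivalent to  (not p) and (not q)
assert equivalent(lambda p, q: not (p or q),
                  lambda p, q: (not p) and (not q), 2)

# not (p and q)  is equivalent to  (not p) or (not q)
assert equivalent(lambda p, q: not (p and q),
                  lambda p, q: (not p) or (not q), 2)

# A common mistake: negating without swapping the operator.
assert not equivalent(lambda p, q: not (p and q),
                      lambda p, q: (not p) and (not q), 2)
```

The last check shows why the operator has to flip: the two sides disagree on the rows where exactly one of p and q is true.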
T and F are special constant propositions with no variables that are,
respectively, always true and always false. So, since $p \land \neg p$ is always false,
we have the following equivalence:
$p \land \neg p \equiv F$
a(b + c) = ab + ac
$p \lor (q \land r) \equiv (p \lor q) \land (p \lor r)$
$p \land (q \lor r) \equiv (p \land q) \lor (p \land r)$
So, in logic, you can distribute either operator over the other. Also,
arithmetic has a clear rule that multiplication is done first, so the righthand
side doesn't require parentheses. The order of operations is less clear for the
logic, so more parentheses are required.
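$\neg(\neg p) \equiv p$
$\neg(p \land q) \equiv \neg p \lor \neg q$
$\neg(p \lor q) \equiv \neg p \land \neg q$
$\neg(p \to q) \equiv p \land \neg q$.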
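$\neg(r \to (p \lor \neg l)) \equiv r \land \neg(p \lor \neg l) \equiv r \land \neg p \land \neg\neg l \equiv r \land \neg p \land l$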
Suppose we call these $P(x)$ and $Q(y)$. Then $Q(y)$ is true if y is "mint" but
not if y is "tomato".
If we substitute concrete values for all the variables in a predicate, we're
back to having a proposition. That wasn't much use, was it?
The main use of predicates is to make general statements about what
happens when you substitute a variety of values for the variables. For exam-
ple:
In normal English, when we say that there is an object with some properties,
this tends to imply that there's only one or perhaps only a couple. If
there were many objects with this property, we normally expect the speaker
to say so. So it would seem odd to say
Or
Mathematicians, however, are happy to say things like that. When they
say "there exists an x," with certain properties, they mean that there exists
at least one x with those properties. They are making no claims about how
many such x's there are.
However, it is sometimes important to point out when one and only one x
has some set of properties. The mathematical jargon for this uses the unique
existence quantifier, as in:
Mathematicians use the adjective "unique" to mean that there's only one
such object (similar to the normal usage but not quite the same).
2.12 Notation
The universal quantifier has the shorthand notation $\forall$. For example,
$\forall x \in \mathbb{R}, x^2 + 3 \ge 0$
In this sentence, $\forall$ is the quantifier. $x \in \mathbb{R}$ declares the variable and the set
($\mathbb{R}$) from which its values can be taken, called its domain or its replacement
set. As computer scientists, we can also think of this as declaring the type
of x, just as in a computer program. Finally, $x^2 + 3 \ge 0$ is the predicate.
The existential quantifier is written $\exists$, e.g. $\exists y \in \mathbb{R}, y = \sqrt{2}$. Notice that
we don't write "such that" when the quantifier is in shorthand. The unique
existence quantifier is written $\exists!$ as in $\exists! x \in \mathbb{R}, x^2 = 0$. When existential
quantifiers are written in English, rather than shorthand, we need to add the
phrase "such that" to make the English sound right, e.g.
There exists a real number y such that $y = \sqrt{2}$.
There's no deep reason for adding "such that." It's just a quirk about how
mathematical English is written, which you should copy so that your written
mathematics looks professional. "Such that" is sometimes abbreviated "s.t."
$\forall x \in \mathbb{R}, \forall y \in \mathbb{R}, x + y \ge x$
$\forall x, y \in \mathbb{R}, x + y \ge x$
This means "for all real numbers x and y, $x + y \ge x$" (which isn't true).
In such a claim, the two variables x and y might contain different values,
but it's important to realize that they might also be equal. For example, the
following sentence is true:
$\exists x, y \in \mathbb{Z}, x - y = 0$
Notice that the quantifier stays the same: we only transform the if/then
statement inside it.
$(x, y) \in \mathbb{R}^2, x^2 + y^2 = 1$
When you later need to make precise what it means to be "on the unit circle,"
you will have to break up p into its two coordinates. At that point, you say
that since p is a point on the plane, it must have the form $(x, y)$, where
x and y are real numbers. This defines the component variables you need to
expand the definition of "on the unit circle" into the equation $x^2 + y^2 = 1$.
Similarly,
So this is a bit like De Morgan's laws: when you move the negation
across the operator, you change it to the other similar operator.
We saw above how to move negation operators from the outside to the
inside of expressions involving $\land$, $\lor$, and the other propositional operators.
Together with these two new rules to handle quantifiers, we now have a
mechanical procedure for working out the negation of any random statement
in predicate logic.
So if we have something like
Its negation is
$\forall x \in \mathbb{R}, x^2 + 3 \ge 0$
because it's less confusing if we all use the same notation. However, don't
be surprised if another class or book does things differently. For example:
There are several conventions about inserting commas after the quantifier
and/or parentheses around the following predicate. We won't be
picky about this.
Some subfields (but not this class) have a convention that "and" is
applied before "or," so that parentheses around "and" operations can
be omitted. We'll keep the parentheses in such cases.
Some authors use certain variations of "or" (e.g. "either ... or") with an
exclusive meaning, when writing mathematical English. In this class,
always read "or" as inclusive unless there is an explicit reason to do
otherwise (e.g. "X and Y, but not both").
Chapter 3
Proofs
At the start of the proof, notice that we expanded the word "rational"
into what its definition said. At the end of the proof, we went the other
way: noticed that something had the form required by the definition and
then asserted that it must be a rational.
WARNING!! Abuse of notation. Notice that the above definition of
"rational" used the word "if". If you take this literally, it would mean that the
definition could be applied only in one direction. This isn't what's meant.
Definitions are always intended to work in both directions. Technically, I
should have written "if and only if" (frequently shortened to "iff"). This
little misuse of "if" in definitions is very, very common.
Notice also that we spelled out the definition of "rational" but we just
freely used facts from high school algebra as if they were obvious. In general,
when writing proofs, you and your reader come to some agreement about
what parts of math will be considered familiar and obvious, and which re-
quire explicit discussion. In practical terms, this means that, when writing
solutions to homework problems, you should try to mimic the level of detail in
examples presented in lecture and in model solutions to previous homeworks.
1
The formal name for this is universal instantiation.
This has a slightly different form from the previous claim: $\forall x \in \mathbb{Z}$, if $P(x)$,
then $Q(x)$
Before doing the actual proof, we first need to be precise about what we
mean by "odd". And, while we are on the topic, what we mean by "even."
Such definitions are sometimes written using the jargon "has the form," as
in "An integer n is even if it has the form 2m, where m is an integer."
We'll assume that it's obvious (from our high school algebra) that every
integer is even or odd, and that no integer is both even and odd. You prob-
ably also feel confident that you know which numbers are odd or even. An
exception might be zero: notice that the above definition makes it definitely
even. This is the standard convention in math and computer science.
Using these definitions, we can prove our claim as follows:
As in the previous proof, we used our key definition twice in the proof:
once at the start to expand a technical term ("odd") into its meaning, then
again at the end to summarize our findings into the appropriate technical
terms.
At the start of the proof, notice that we chose a random (or "arbitrary"
in math jargon) integer k, like last time. However, we also supposed that
the hypothesis of the if/then statement was true. It's helpful to collect up
all your given information right at the start of the proof, so you know what
you have to work with.
The comment about what we need to show is not necessary to the proof.
It's sometimes included because it's helpful to the reader. You may also want
to include it because it's helpful to you to remind you of where you need to
get to at the end of the proof.
Similarly, introducing the variable p isn't really necessary with a claim
this simple. However, using new variables to create an exact match to a
definition may help you keep yourself organized.
We could spell out a bit more detail, but it's really not necessary. Proofs
of existential claims are often very short, though there are exceptions.
Notice one difference from our previous proofs. When we pick a value to
instantiate a universally quantified variable, we have no control over exactly
what the value is. We have to base our reasoning just on what set it belongs
to. But when we are proving an existential claim, we get to pick our own
favorite choice of concrete value, in this case zero.
Don't prove an existential claim using a general argument about why
there must exist numbers with these properties. This is not only overkill,
but very hard to do correctly and harder on the reader. Use a specific,
concrete example.
Claim 5 For every integer k, it's not the case that $k^2 + 2k + 1 < 0$.
Claim 7 For any integers m and n, if m and n are perfect squares, then so
is mn.
Notice that we used a different variable name in the two uses of the
definition of perfect square: k the first time and j the second time. It's
important to use a fresh variable name each time you expand a definition
like this. Otherwise, you could end up forcing two variables (m and n in this
case) to be equal when that isn't (or might not be) true.
Notice that the phrase "which is what we needed to show" helps tell the
reader that we're done with the proof. It's polite to indicate the end in
one way or another. In typed notes, it may be clear from the indentation.
Sometimes, especially in handwritten proofs, we put a box or triangle of dots
or Q.E.D. at the end. Q.E.D. is short for Latin "Quod erat demonstrandum,"
which is just a translation of "what we needed to show."
Claim 8 For all integers j and k, if j and k are odd, then jk is odd.
Proof: Let j and k be integers and suppose they are both odd.
Because j is odd, there is an integer p such that j = 2p + 1.
Similarly, there is an integer q such that k = 2q + 1.
So then jk = (2p+1)(2q+1) = 4pq+2p+2q+1 = 2(2pq+p+q)+1.
Since p and q are both integers, so is 2pq + p + q. Let's call it
m. Then jk = 2m + 1 and therefore jk is odd, which is what we
needed to show.
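The algebra in this proof can be traced in code. The sketch below follows the same steps: it extracts p and q from the two odd inputs, builds m = 2pq + p + q, and checks that jk = 2m + 1. The function names odd_witness and product_witness are hypothetical, and testing a range of values only illustrates the proof; it does not replace it.

```python
def odd_witness(n: int) -> int:
    """Return the integer p with n == 2*p + 1; n must be odd."""
    if n % 2 == 0:
        raise ValueError(f"{n} is not odd")
    return (n - 1) // 2

def product_witness(j: int, k: int) -> int:
    """Return m such that j*k == 2*m + 1, following the proof's algebra."""
    p = odd_witness(j)          # j = 2p + 1
    q = odd_witness(k)          # k = 2q + 1
    return 2 * p * q + p + q    # jk = 2(2pq + p + q) + 1

# Check every pair of odd integers in a small range, negatives included.
odds = range(-21, 22, 2)
for j in odds:
    for k in odds:
        m = product_witness(j, k)
        assert j * k == 2 * m + 1
```

Note that p and q are separate variables, one per use of the definition of odd, just as in the written proof.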
It is ok to have more than two cases. It's also ok if the cases overlap,
e.g. one case might assume that $x \le 0$ and another case might assume that
$x \ge 0$. However, you must be sure that all your cases, taken together, cover
all the possibilities.
In this example, each case involved expanding the definition of "even."
We expanded the definition twice but, unlike our earlier examples, only one
expansion is active at a given time. So we could have re-used the variable m
when we expanded "even" in Case 2. I chose to use a fresh variable name
(n) because this is a safer strategy if you aren't absolutely sure when re-use
is ok.
It's not clear how to start a proof for a claim like this. What is our given
information and what do we need to show?
In such cases, it is often useful to rephrase your claim using logical equiv-
alences. For example, the above claim is equivalent to
Claim 11 For every integer k, it is not the case that k is odd and $k^2$ is even.
Since we're assuming we all know that even and odd are opposites, this
is the same as
This is hard to prove in its original form, because we're trying to use
information about a derived quantity to prove something about more basic
quantities. If we rephrase as the contrapositive, we get
Claim 16 For any integers a and b, if it's not the case that $a \ge 8$ or $b \ge 8$,
then it's not the case that $a + b \ge 15$.
Claim 17 For any integers a and b, if a < 8 and b < 8, then a + b < 15.
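To see that a claim and its contrapositive really say the same thing, the sketch below evaluates both on a grid of integers. It assumes the original claim was "if a + b is at least 15, then a is at least 8 or b is at least 8" (inferred from Claim 16). The function names are made up for this illustration.

```python
def original_claim(a: int, b: int) -> bool:
    """If a + b >= 15, then a >= 8 or b >= 8 (assumed original wording)."""
    hypothesis = a + b >= 15
    conclusion = a >= 8 or b >= 8
    return (not hypothesis) or conclusion

def contrapositive_claim(a: int, b: int) -> bool:
    """If a < 8 and b < 8, then a + b < 15 (Claim 17)."""
    hypothesis = a < 8 and b < 8
    conclusion = a + b < 15
    return (not hypothesis) or conclusion

for a in range(-30, 31):
    for b in range(-30, 31):
        # The two forms agree on every input, and both hold.
        assert original_claim(a, b) == contrapositive_claim(a, b)
        assert contrapositive_claim(a, b)
```

The step from Claim 16 to Claim 17 uses De Morgan's Laws: "not (a is at least 8 or b is at least 8)" becomes "a < 8 and b < 8".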
Proof: Suppose not. That is, suppose that there were a largest
even integer. Let's call it k.
Since k is even, it has the form 2n, where n is an integer. Consider
k + 2. k + 2 = (2n) + 2 = 2(n + 1). So k + 2 is even. But k + 2
is larger than k. This contradicts our assumption that k was the
largest even integer. So our original claim must have been true.
The proof starts by informing the reader that you're about to use proof
by contradiction. The phrase "suppose not" is one traditional way of doing
this. Next, you should spell out exactly what the negation of the claim is.
Then use mathematical reasoning (e.g. algebra) to work forwards until you
deduce some type of contradiction.
3.15 $\sqrt{2}$ is irrational
One of the best known examples of proof by contradiction is the proof that
$\sqrt{2}$ is irrational. This proof, and consequently knowledge of the existence of
irrational numbers, apparently dates back to the Greek philosopher Hippasus
in the 5th century BC.
Suppose not. That is, suppose that $\sqrt{2}$ were rational.
Then we can write $\sqrt{2}$ as a fraction $\frac{a}{b}$ where a and b are integers
with no common factors.
Since $\sqrt{2} = \frac{a}{b}$, $2 = \frac{a^2}{b^2}$. So $2b^2 = a^2$.
By the definition of even, this means $a^2$ is even. But then a must
be even, by (*) above. So $a = 2n$ for some integer n.
If $a = 2n$ and $2b^2 = a^2$, then $2b^2 = 4n^2$. So $b^2 = 2n^2$. This means
that $b^2$ is even, so b must be even.
We now have a contradiction. a and b were chosen not to have
any common factors. But they are both even, i.e. they are both
divisible by 2.
Because assuming that $\sqrt{2}$ was rational led to a contradiction, it
must be the case that $\sqrt{2}$ is irrational.
Chapter 4
Number Theory
We've now covered most of the basic techniques for writing proofs. So we're
going to start applying them to specific topics in mathematics, starting with
number theory.
Number theory is a branch of mathematics concerned with the behavior
of integers. It has very important applications in cryptography and in the
design of randomized algorithms. Randomization has become an increasingly
important technique for creating very fast algorithms for storing and retriev-
ing objects (e.g. hash tables), testing whether two objects are the same (e.g.
MP3s), and the like. Much of the underlying theory depends on facts about
which numbers evenly divide one another and which numbers are prime.
$7 \mid 77$
$77 \nmid 7$
$7 \mid 7$ because $7 = 7 \cdot 1$
$(-3) \mid 12$ because $12 = -3 \cdot -4$
$3 \mid (-12)$ because $-12 = 3 \cdot -4$
A number p is even exactly when 2 | p. The fact that zero is even is just
a special case of the fact that zero is divisible by any integer.
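The divisibility examples above can be checked with a small helper. This sketch assumes the usual definition, that a divides b when b = an for some integer n. The function names divides and is_even are invented for illustration.

```python
def divides(a: int, b: int) -> bool:
    """True if b == a * n for some integer n (assumed definition of a | b)."""
    if a == 0:
        # 0 * n is always 0, so zero divides only zero.
        return b == 0
    # Python's % returns 0 exactly when a divides b, even for negatives.
    return b % a == 0

def is_even(p: int) -> bool:
    """A number p is even exactly when 2 | p."""
    return divides(2, p)

assert divides(7, 77)
assert not divides(77, 7)
assert divides(7, 7)
assert divides(-3, 12)
assert divides(3, -12)
assert is_even(0)   # zero is divisible by 2, so it is even
```

The order of the arguments matters: divides(7, 77) holds but divides(77, 7) does not.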
When we expanded the definition of divides for the second time, we used
a fresh variable name. If we had re-used k, then we would have wrongly
forced b and c to be equal.
The following two claims can be proved in a similar way:
When constructing math from the ground up, the integers are typically
constructed first and the rationals built from them. So using rationals
to prove facts about integers can lead to circular proofs.
For example, among the integers no bigger than 20, the primes are 2, 3,
5, 7, 11, 13, 17, and 19.
A key fact about prime numbers is
The word "unique" here means that there is only one way to factor each
integer.
For example, $260 = 2 \cdot 2 \cdot 5 \cdot 13$ and $180 = 2 \cdot 2 \cdot 3 \cdot 3 \cdot 5$.
We won't prove this theorem right now, because it requires a proof
technique called induction, which we haven't seen yet.
There are quite fast algorithms for testing whether a large integer is prime.
However, even once you know a number is composite, algorithms for factoring
the number are all fairly slow. The difficulty of factoring large composite
numbers is the basis for a number of well-known cryptographic algorithms
(e.g. the RSA algorithm).
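The factorizations of 260 and 180 quoted earlier can be reproduced with trial division, the simplest factoring method. This is only a sketch (the name prime_factors is ours), and it is far too slow for the large numbers used in cryptography, since it may try every candidate divisor up to $\sqrt{n}$.

```python
def prime_factors(n):
    """Return the prime factors of an integer n >= 2 in non-decreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # divide out d as many times as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(260))  # [2, 2, 5, 13]
print(prime_factors(180))  # [2, 2, 3, 3, 5]
```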
Proof: Suppose not. That is, suppose there were only finitely
many prime numbers. Let's call them $p_1$, $p_2$, up through $p_n$.
Consider $Q = p_1 p_2 \cdots p_n + 1$.
If you divide Q by one of the primes on our list, you get a
remainder of 1. So Q isn't divisible by any of the primes $p_1$, $p_2$,
up through $p_n$. However, by the Fundamental Theorem of
Arithmetic, Q must have a prime factor (which might be either itself
or some smaller number). This contradicts our assumption that
$p_1, p_2, \ldots, p_n$ was a list of all the prime numbers.
Notice one subtlety. We're not claiming that Q must be prime. Rather,
we're making the much weaker claim that Q isn't divisible by any of the first
n primes. It's possible that Q might be divisible by another prime larger
than $p_n$.
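The subtlety is easy to see on a concrete case. The sketch below (the helper name euclid_number is ours) builds Q from the first six primes: Q leaves remainder 1 on division by each of them, yet Q is not prime.

```python
from math import prod

def euclid_number(primes):
    """Return Q = (product of the given primes) + 1."""
    return prod(primes) + 1

primes = [2, 3, 5, 7, 11, 13]
Q = euclid_number(primes)

print(Q)                          # 30031
print([Q % p for p in primes])    # [1, 1, 1, 1, 1, 1]
print(Q == 59 * 509)              # True: Q is composite, with prime factors larger than 13
```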
$$\mathrm{lcm}(a, b) = \frac{ab}{\gcd(a, b)}$$
For example, $\mathrm{lcm}(70, 130) = \frac{70 \cdot 130}{10} = 910$.
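As a small sketch, the formula translates directly into Python for positive integers. Here gcd comes from the standard library; the name lcm is our own helper.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of positive integers a and b, via lcm = ab / gcd."""
    return a * b // gcd(a, b)

print(gcd(70, 130))  # 10
print(lcm(70, 130))  # 910
```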
If two integers a and b share no common factors, then gcd(a, b) = 1. Such
a pair of integers are called relatively prime.
If k is a non-zero integer, then k divides zero. For positive k, the largest
common divisor of k and zero is k. So gcd(k, 0) = gcd(0, k) = k. However, gcd(0, 0) isn't
defined. All integers are common divisors of 0 and 0, so there is no greatest
one.
The term corollary means that this fact is a really easy consequence of
the preceding claim.
gcd(a,b: positive integers)
x := a
y := b
while (y > 0)
begin
    r := remainder(x,y)
    x := y
    y := r
end
return x
Let's trace this algorithm on inputs a = 105 and b = 252. Traces should
summarize the values of the most important variables.
x      y      r = remainder(x, y)
105    252    105
252    105    42
105    42     21
42     21     0
21     0
Since x is smaller than y, the first iteration of the loop swaps x and y.
After that, each iteration reduces the sizes of x and y, because remainder(x, y) is
smaller than y. In the last iteration, y has gone to zero, so we output the
value of x, which is 21.
To verify that this algorithm is correct, we need to convince ourselves of
two things. First, it must halt, because each iteration reduces the magnitude
of y. Second, by our corollary above, the value of gcd(x, y) does not change
from iteration to iteration. Moreover, gcd(x, 0) is x, for any non-zero integer
x. So the final output will be the gcd of the two inputs a and b.
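The trace can be reproduced mechanically. The following is one possible Python rendering of the loop, assuming positive integer inputs; the name gcd_trace and the printing are our additions.

```python
def gcd_trace(a, b):
    """Euclidean algorithm for positive integers; prints x, y, r on each iteration."""
    x, y = a, b
    while y > 0:
        r = x % y            # remainder(x, y)
        print(x, y, r)
        x, y = y, r
    print(x, y)              # final state, with y == 0
    return x

result = gcd_trace(105, 252)
print(result)  # 21
```

Each pass through the loop prints one row of the table, so the five printed lines match the five rows of the trace.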
This is a genuinely very nice algorithm. Not only is it fast, but it involves
very simple calculations that can be done by hand (without a calculator).
It's much easier than factoring both numbers into primes, especially as the
individual prime factors get larger. Most of us can't quickly see whether a
large number is divisible by, say, 17.
4.9 Pseudocode
Notice that this algorithm is written in pseudocode. Pseudocode is an ab-
stracted type of programming language, used to highlight the important
This code is very simple, because this algorithm has a natural recursive
structure. Our corollary allows us to express the gcd of two numbers in terms
of the gcd of a smaller pair of numbers. That is to say, it allows us to reduce
a larger version of the task to a smaller version of the same task.
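A recursive version might look like the following in Python. This is a sketch under the assumption that the corollary is the identity gcd(a, b) = gcd(b, remainder(a, b)) for b > 0; the function name is ours.

```python
def gcd_recursive(a, b):
    """gcd of non-negative integers a and b, not both zero."""
    if b == 0:
        return a                       # base case: gcd(a, 0) = a
    return gcd_recursive(b, a % b)     # same task on a smaller pair

print(gcd_recursive(105, 252))  # 21
```

The second argument shrinks on every call, so the recursion must reach the base case.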
numbers and addition wraps around from the highest number to the lowest
one. This is true, for example, for the 12 hours on a standard US clock: 3
hours after 11 o'clock is 2 o'clock, not 14 o'clock.
The formal mathematical definitions of modular arithmetic are based on
the notion of congruence. Specifically, two integers are congruent mod k
if they differ by a multiple of k. Formally:
$3 \equiv 10 \pmod{7}$
$38 \equiv 3 \pmod{7}$
$-3 \not\equiv 3 \pmod{7}$
$-3 \equiv 3 \pmod{6}$
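Congruences like these are easy to check mechanically from the definition: a and b are congruent mod k exactly when a - b is a multiple of k. A minimal sketch (the function name is ours):

```python
def congruent(a, b, k):
    """True when a and b differ by a multiple of the positive integer k."""
    return (a - b) % k == 0

print(congruent(3, 10, 7))    # True:  3 - 10 = -7
print(congruent(38, 3, 7))    # True:  38 - 3 = 35
print(congruent(-3, 3, 7))    # False: -3 - 3 = -6
print(congruent(-3, 3, 6))    # True:  -3 - 3 = -6
```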
Notice that [-4] and [10] are exactly the same set as [3]. That is, [-4] =
[10] = [3]. So we have one object (the set) with many different names (one
per number in it). This is like a student apartment shared by Fred, Emily,
Ali, and Michelle. The superficially different phrases "Emily's apartment"
and "Ali's apartment" actually refer to one and the same apartment.
Having many names for the same object can become confusing, so people
tend to choose a special preferred name for each object. For the k
equivalence classes of numbers mod k, mathematicians tend to prefer the names
$[0], [1], \ldots, [k-1]$. Other names (e.g. [30] when k = 7) tend to occur only as
intermediate results in calculations.
Because standard arithmetic operations interact well with modular con-
gruence, we can set up a system of arithmetic on these equivalence classes.
Specifically, we define addition and multiplication on equivalence classes by:
$[x] + [y] = [x + y]$
$[x] \cdot [y] = [x \cdot y]$
This new set of numbers ($[0], [1], \ldots, [k-1]$), with these modular rules of
arithmetic and equality, is known as "the integers mod k" or $\mathbb{Z}_k$ for short.
For example, the addition and multiplication tables for $\mathbb{Z}_4$ are:
$+_4$    [0]    [1]    [2]    [3]
[0]    [0]    [1]    [2]    [3]
[1]    [1]    [2]    [3]    [0]
[2]    [2]    [3]    [0]    [1]
[3]    [3]    [0]    [1]    [2]
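Both tables can be generated directly from the two defining rules. In this sketch each class [x] is represented by its preferred name x in 0, ..., k-1; the function name zk_tables is ours.

```python
def zk_tables(k):
    """Return (addition table, multiplication table) for the integers mod k.

    Row x, column y holds the preferred name of [x] + [y] or [x] * [y].
    """
    add = [[(x + y) % k for y in range(k)] for x in range(k)]
    mul = [[(x * y) % k for y in range(k)] for x in range(k)]
    return add, mul

add4, mul4 = zk_tables(4)
for row in add4:
    print(row)      # rows of the addition table for Z_4
for row in mul4:
    print(row)      # rows of the multiplication table for Z_4
```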
² The "modulo operation" entry on Wikipedia has a nice table of what happens in
different languages.
Chapter 5
Sets
So far, we've been assuming only a basic understanding of sets. It's time to
discuss sets systematically, including a useful range of constructions,
operations, notation, and special cases. We'll also see how to compute the sizes of
sets and prove claims involving sets.
5.1 Sets
Sets are an extremely general concept, defined as follows:
For example, the natural numbers are a set. So are the integers between
3 and 7 (inclusive). So are all the planets in this solar system or all the
programs written by students in CS 225 in the last three years. The objects
in a set can be anything you want.
The items in the set are called its elements or members. We've already
seen the notation for this: $x \in A$ means that x is a member of the set A.
There are three basic ways to define a set:
Set builder notation has two parts separated with a vertical bar or a
colon. The first part names a variable (in this case x) that ranges over all
objects in the set. The second part gives one or more constraints that these
objects must satisfy, e.g. $3 \le x \le 7$. The type of the variable (integer in
our example) can be specified either before or after the vertical bar. The
separator (| or :) is often read "such that."
Here's an example of a set containing an infinite number of objects
"multiples of 7"
$\{x \in \mathbb{Z} \mid x \text{ is a multiple of } 7\}$
We've seen ordered pairs and triples of numbers, such as (3, 4) and (4, 5, 2).
The general term for an ordered sequence of k numbers is a k-tuple.¹ Tuples
are very different from sets, in that the order of values in a tuple matters
and duplicate elements don't magically collapse. So $(1, 2, 2, 3) \ne (1, 2, 3)$
and $(1, 2, 2, 3) \ne (2, 2, 1, 3)$. Therefore, make sure to enclose the elements
of a set in curly brackets and carefully distinguish curly brackets (set) from
parentheses (ordered pair).
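Python's built-in tuples and sets follow the same conventions, which makes the distinction easy to experiment with. The variable names below are illustrative, and the set-builder example is cut down to a finite range because a program cannot list an infinite set.

```python
# Tuples: order and repetition both matter.
t1 = (1, 2, 2, 3)
print(t1 == (1, 2, 3))       # False
print(t1 == (2, 2, 1, 3))    # False

# Sets: neither order nor repetition matters.
s1 = {1, 2, 2, 3}
print(s1 == {1, 2, 3})       # True
print(s1 == {3, 1, 2})       # True

# Set-builder style: the multiples of 7 among the integers from -20 to 20.
multiples = {x for x in range(-20, 21) if x % 7 == 0}
print(sorted(multiples))     # [-14, -7, 0, 7, 14]
```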
A more subtle feature of tuples is that a tuple must contain at least two
elements. In formal mathematics, a 1-dimensional value x is just written as
x, not as (x). And there's no such thing in mathematics as a 0-tuple. So a
tuple is simply a way of grouping two or more values into a single object.
By contrast, a set is like a cardboard box, into which you can put objects.
A kitty is different from a box containing a kitty. Similarly, a set containing
a single object is different from the object by itself. For example, {57} is not
the same as 57. A set can also contain nothing at all, like an empty box.
The set containing nothing is called the empty set or the null set, and has
the shorthand symbol $\emptyset$.²
The empty set may seem like a pain in the neck. However, computer
science applications are full of empty lists, strings of zero length, and the
like. It's the kind of special case that all of you (even the non-theoreticians)
will spend your life having to watch out for.
Both sets and tuples can contain objects of more than one type, e.g.
(cat, Fluffy, 1983) or {a, b, 3, 7}. A set can also contain complex objects,
e.g. {(a, b), (1, 2, 3), 6} is a set containing three objects: an ordered pair, an
ordered triple, and a single number.
Claim 27 For all natural numbers n, if 14 + n < 10, then n wood elves will
attack Siebel Center tomorrow.
I claim this is true, a fact which most students find counter-intuitive. In fact,
it wouldn't be true if n was declared to be an integer.
Notice that this statement has the form $\forall n, P(n) \rightarrow Q(n)$, where P(n) is
the predicate 14 + n < 10. Because n is declared to be a natural number, n is
never negative, so n + 14 will always be at least 14. So P(n) is always false.
Therefore, our conventions about the truth values for conditional statements
imply that $P(n) \rightarrow Q(n)$ is true. This argument works for any choice of n.
So $\forall n, P(n) \rightarrow Q(n)$ is true.
Because even mathematicians find such statements a bit weird, they
typically say that such a claim is vacuously true, to emphasize to the reader
that it is only true because of this strange convention about the meaning of
conditionals. Vacuously true statements typically occur when you are trying
to apply a definition or theorem to a special case involving an abnormally
small or simple object, such as the empty set or zero or a graph with no
arrows at all.
In particular, this means that the empty set is a subset of any set A. For
$\emptyset$ to be a subset of A, the definition of subset requires that for every object
x, if x is an element of the empty set, then x is an element of A. But this
if/then statement is considered true because its hypothesis is always false.
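Programming languages usually adopt the same convention, so vacuous truth can be observed directly. In this sketch the consequent of the wood-elf claim is replaced by a stand-in that is always false, and only finitely many values of n are tested; all names are ours.

```python
def claim_holds(n, q):
    """Truth value of 'if 14 + n < 10 then q(n)' for a single n."""
    p = 14 + n < 10
    return (not p) or q(n)

def never(n):
    """Stand-in consequent that is false for every n."""
    return False

# Natural numbers: the hypothesis is never true, so the claim holds every time.
naturals_ok = all(claim_holds(n, never) for n in range(1000))
# Integers: n = -5 makes the hypothesis true, and the consequent then fails.
integers_ok = all(claim_holds(n, never) for n in range(-10, 10))
# The empty set is a subset of any set, for the same reason.
empty_is_subset = set() <= {1, 2, 3}

print(naturals_ok, integers_ok, empty_is_subset)   # True False True
```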
$A \cap B = \{S \mid S \in A \text{ and } S \in B\}$
The set difference of A and B ($A - B$) contains all the objects that are
in A but not in B. In this case,
$M - P = \{\text{bread}\}$
The complement of a set A ($\overline{A}$) is all the objects that aren't in A. For
this to make sense, you need to define your "universal set" (often written U).
U contains all the objects of the sort(s) you are discussing. For example, in
some discussions, U might be all real numbers. U doesn't contain everything
you might imagine putting in a set, because constructing a set that inclusive
leads to paradoxes. U is more limited than that. Whenever U is used, you
and your reader need to come to an understanding about what's in it.
So, if our universe is all integers, and A contains all the multiples of 3,
then $\overline{A}$ is all the integers whose remainder mod 3 is either 1 or 2. $\overline{\mathbb{Q}}$ would
be the irrational numbers if our universe is all real numbers. If we had
been working with complex numbers, it might be the set of all irrational real
numbers plus all the numbers with an imaginary component.
If A and B are two sets, their Cartesian product ($A \times B$) contains all
ordered pairs (x, y) where x is in A and y is in B. That is
$A \times B = \{(x, y) \mid x \in A \text{ and } y \in B\}$
Notice that these two sets aren't equal: order matters for Cartesian
products.
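To see concretely that order matters, here is a small sketch using itertools.product from the Python standard library. The sets A and B are made-up examples, not the ones from the text.

```python
from itertools import product

A = {1, 2}
B = {"a", "b", "c"}

AxB = set(product(A, B))   # all ordered pairs (x, y) with x in A and y in B
BxA = set(product(B, A))   # all ordered pairs (y, x) with y in B and x in A

print(len(AxB))            # 6
print((1, "a") in AxB)     # True
print((1, "a") in BxA)     # False: order matters
print(AxB == BxA)          # False
```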
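DeMorgan's Law: $\overline{A \cup B} = \overline{A} \cap \overline{B}$
$\cup$ is like $\vee$
$\cap$ is like $\wedge$
$\overline{A}$ is like $\neg P$
The two systems aren't exactly the same. E.g. set theory doesn't use a
close analog of the $\rightarrow$ operator. But they are very similar.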
We can use this basic 2-set formula to derive the corresponding formula
for three sets A, B, and C:
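$|A \cup B \cup C| = |A| + |B \cup C| - |A \cap (B \cup C)|$
$= |A| + |B| + |C| - |B \cap C| - |A \cap (B \cup C)|$
$= |A| + |B| + |C| - |B \cap C| - |(A \cap B) \cup (A \cap C)|$
$= |A| + |B| + |C| - |B \cap C| - (|A \cap B| + |A \cap C| - |(A \cap B) \cap (A \cap C)|)$
$= |A| + |B| + |C| - |B \cap C| - |A \cap B| - |A \cap C| + |A \cap B \cap C|$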
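A quick numerical check of the final formula, as a sketch: the three sets below (multiples of 2, 3, and 5 below 30) are an arbitrary test case, and the function name is ours.

```python
def inclusion_exclusion_3(A, B, C):
    """Right-hand side of the three-set formula derived above."""
    return (len(A) + len(B) + len(C)
            - len(B & C) - len(A & B) - len(A & C)
            + len(A & B & C))

A = set(range(0, 30, 2))   # multiples of 2 below 30
B = set(range(0, 30, 3))   # multiples of 3 below 30
C = set(range(0, 30, 5))   # multiples of 5 below 30

# The direct count and the formula should agree.
print(len(A | B | C), inclusion_exclusion_3(A, B, C))
```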
The product rule: if you have p choices for one part of a task,
then q choices for a second part, and your options for the second
part don't depend on what you chose for the first part, then you
have pq options for the whole task.
This property is called "transitivity," just like similar properties for (say)
$\le$ on the real numbers. Both $\subseteq$ and $\le$ are examples of a general type of
object called a partial order, for which transitivity is a key defining property.
First, remember our definition of $\subseteq$: a set A is a subset of a set B if and
only if, for any object x, $x \in A$ implies that $x \in B$.
In this example, the second half is basically the first half written
backwards. However, this is not always the case. The power of this method
comes from the fact that, in harder problems, the two halves of the proof
can use different techniques.
To prove this, we first gather up all the facts we are given. What we need
to show is the subset inclusion $A \times C \subseteq B \times D$. To do this, we'll need to
pick a representative element from $A \times C$ and show that it's an element of
$B \times D$.
The last paragraph is optional. When you first start, it's a useful recap
because you might be a bit fuzzy about what you needed to prove. As you
get experienced with this sort of proof, it's often omitted. But you will still
see it occasionally at the end of a very long (e.g. multi-page) proof, where
even an experienced reader might have forgotten the main goal of the proof.
Proof: Let's prove the contrapositive. That is, we'll prove that if
$A \cap B \ne \emptyset$, then $(A - B) \cup (B - A) \ne A \cup B$.
So, let A and B be sets and suppose that $A \cap B \ne \emptyset$. Since
$A \cap B \ne \emptyset$, we can choose an element from $A \cap B$. Let's call it x.
Since x is in $A \cap B$, x is in both A and B. So x is in $A \cup B$.
However, since x is in B, x is not in $A - B$. Similarly, since x is in
A, x is not in $B - A$. So x is not a member of $(A - B) \cup (B - A)$.
This means that $(A - B) \cup (B - A)$ and $A \cup B$ cannot be equal,
because x is in the second set but not in the first. $\square$
Chapter 6
Relations
6.1 Relations
A relation R on a set A is a subset of $A \times A$, i.e. R is a set of ordered pairs
of elements from A. For simplicity, we will assume that the base set A is
always non-empty. If R contains the pair (x, y), we say that x is related to
y, or xRy in shorthand. We'll write $x \not R y$ to mean that x is not related to y.
For example, suppose we let A = {2, 3, 4, 5, 6, 7, 8}. We can define a
relation W on A by xWy if and only if $x \le y \le x + 2$. Then W contains
pairs like (3, 4) and (4, 6), but not the pairs (6, 4) and (3, 6). Under this
relation, each element of A is related to itself. So W also contains pairs like
(5, 5).
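Since a relation is just a set of ordered pairs, W can be written down as a set comprehension and the claims above checked directly. A minimal sketch:

```python
A = {2, 3, 4, 5, 6, 7, 8}

# W contains (x, y) exactly when x <= y <= x + 2.
W = {(x, y) for x in A for y in A if x <= y <= x + 2}

print((3, 4) in W, (4, 6) in W)        # True True
print((6, 4) in W, (3, 6) in W)        # False False
print(all((x, x) in W for x in A))     # True: every element is related to itself
```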
We can draw pictures of relations using directed graphs. We draw a
graph node for each element of A and we draw an arrow joining each pair of
elements that are related. So W looks like:
[Figures: directed graph of W on the nodes 2, 3, 4, 5, 6, 7, 8, followed by further relation diagrams on number nodes and on the people Fred, Ginger, Steve, and Alan.]
6.4 Transitive
The final important property of relations is transitivity. A relation R on a
set A is transitive if
So, to show that a relation is not transitive, we need to find one
counter-example, i.e. specific elements a, b, and c such that aRb and bRc but not
aRc. In the graph of a non-transitive relation, you can find a subsection that
looks like:
[Figure: subgraph on the nodes a, b, and c.]
It could be that a and c are actually the same element, in which case the
offending subgraph might look like:
[Figure: subgraph on the nodes a and b.]
The problem here is that if aRb and bRa, then transitivity would imply
that aRa and bRb.
One subtle point about transitive is that it's an if/then statement. So
it's ok if some sets of elements just aren't connected at all. For example, this
subgraph is consistent with the relation being transitive.
[Figure: subgraph on the nodes a and b.]
definition of transitive. It's also symmetric, for the same reason. And, oddly
enough, antisymmetric. All of these properties hold via vacuous truth.
Vacuous truth doesn't apply to reflexive and irreflexive, because they are
unconditional requirements, not if/then statements. So this empty relation
is irreflexive and not reflexive.
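These properties can be tested by brute force on a finite relation stored as a set of pairs. The sketch below uses the standard definitions of the five properties (the function names are ours) and checks them on the empty relation over a small non-empty base set.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_irreflexive(R, A):
    return all((x, x) not in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    # Whenever aRb and bRc, we need aRc.
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)

A = {1, 2, 3}
empty = set()

print(is_transitive(empty), is_symmetric(empty), is_antisymmetric(empty))  # True True True
print(is_irreflexive(empty, A), is_reflexive(empty, A))                    # True False
print(is_transitive({(1, 2), (2, 1)}))   # False: transitivity would also require (1, 1)
```

The if/then properties loop over pairs in R, so on the empty relation they succeed without checking anything. Reflexivity loops over the elements of A instead, which is why it fails.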
$[x]_R = \{y \in A \mid xRy\}$
When the relation is clear from the context (as when we discussed congruence
mod k), we frequently omit the subscript on the square brackets.
For example, we saw the relation Z on the real plane $\mathbb{R}^2$, where (x, y)Z(p, q)
if and only if $x^2 + y^2 = p^2 + q^2$. Then $[(0, 1)]_Z$ contains all the points related
to (0, 1), i.e. the unit circle.
$F = \{\frac{p}{q} \mid p, q \in \mathbb{Z} \text{ and } q \ne 0\}$
Fractions aren't the same thing as rational numbers, because each rational
number is represented by many fractions. We consider two fractions to be
equivalent, i.e. represent the same rational number, if xq = yp. So, we have
an equivalence relation $\sim$ defined by: $\frac{x}{y} \sim \frac{p}{q}$ if and only if xq = yp.
Let's show that $\sim$ is an equivalence relation.
ypt = pty, this means that xqt = qsy. Cancelling out the q's, we
get xt = sy. By the definition of $\sim$, this means that $\frac{x}{y} \sim \frac{s}{t}$.
Since $\sim$ is reflexive, symmetric, and transitive, it is an equivalence
relation.
Notice that the proof has three sub-parts, each showing one of the key
properties. Each part involves using the definition of the relation, plus a
small amount of basic math. The reflexive case is very short. The symmetric
case is often short as well. Most of the work is in the transitive case.
To show that C is a partial order, we'd need to show that it's
reflexive, antisymmetric, and transitive. We've seen how to prove two of these
properties. Let's see how to do a proof of antisymmetry.
For proving antisymmetry, it's typically easiest to use this form of the
definition of antisymmetry: if xRy and yRx, then x = y. Notice that C is a
relation on intervals, i.e. pairs of numbers, not single numbers. Substituting
the definition of C into the definition of antisymmetry, we need to show that
For any intervals (a, b) and (c, d), if (a, b) C (c, d) and (c, d) C (a, b),
then (a, b) = (c, d).
So, suppose that we have two intervals (a, b) and (c, d) such that (a, b) C (c, d)
and (c, d) C (a, b). By the definition of C, (a, b) C (c, d) implies that $a \le c$
and $d \le b$. Similarly, (c, d) C (a, b) implies that $c \le a$ and $b \le d$.
Since $a \le c$ and $c \le a$, a = c. Since $d \le b$ and $b \le d$, b = d. So
(a, b) = (c, d).
Chapter 7
Functions and onto
7.1 Functions
We're all familiar with functions from high school and calculus. However,
these prior math courses concentrate on functions whose inputs and outputs
are numbers, defined by an algebraic formula such as f(x) = 2x + 3. We'll be
using a broader range of functions, whose input and/or output values may
be integers, strings, characters, and the like.
Suppose that A and B are sets. Then a function f from A to B is an
assignment of exactly one element of B (i.e. the output value) to each element
of A (i.e. the input value). A is called the domain of f and B is called the
co-domain. All of this information can be captured in the shorthand type
signature: $f : A \to B$. If x is an element of A, then the value f(x) is also
known as the image of x.
For example, suppose P is a set of five people:
f (Margaret) = Blue
f (Tom) = Red
f (LaSonya) = Purple
f (Emma) = Red
f (Chen) = Blue
[Figure: diagram of f, with arrows from people to colors.]
Even if A and B are finite sets, there are a very large number of possible
functions from A to B. Suppose that |A| = n, |B| = p. We can write out the
elements of A as $x_1, x_2, \ldots, x_n$. When constructing a function $f : A \to B$, we
have p ways to choose the output value for $x_1$. The choice of $f(x_1)$ doesn't
affect our possible choices for $f(x_2)$: we also have p choices for that value. So
we have $p^2$ choices for the first two output values. If we continue this process
for the rest of the elements of A, we have $p^n$ possible ways to construct our
function f.
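For small sets, the count $p^n$ can be confirmed by listing every function. In this sketch a function is represented as a Python dict from inputs to outputs; the sets and the name all_functions are illustrative.

```python
from itertools import product

def all_functions(A, B):
    """Yield every function from finite A to finite B, each as a dict."""
    A = list(A)
    for outputs in product(B, repeat=len(A)):   # one output choice per input
        yield dict(zip(A, outputs))

A = ["x1", "x2", "x3"]
B = ["red", "blue"]
funcs = list(all_functions(A, B))

print(len(funcs), len(B) ** len(A))   # 8 8
```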
For any set A, the identity function $id_A$ maps each value in A to itself.
That is, $id_A : A \to A$ and $id_A(x) = x$.
[Figures: further diagrams of assignments from people to colors.]
The following isn't a function, because one input is paired with two
outputs:
[Figure: diagram from people to colors in which one person has arrows to two colors.]
$f(A) = \{f(x) : x \in A\}$
For example, suppose M = {a, b, c, d}, N = {1, 2, 3, 4}, and our function
$g : M \to N$ is as in the following diagram. Then g(M) = {1, 3, 4}.
[Figure: diagram of g, with arrows from the elements of M to elements of N.]
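$\forall y \in B, \exists x \in A, f(x) = y$
$\neg \forall y \in B, \exists x \in A, f(x) = y$
$\neg \forall y \in B, \exists x \in A, f(x) = y$
$\equiv \exists y \in B, \neg \exists x \in A, f(x) = y$
$\equiv \exists y \in B, \forall x \in A, \neg (f(x) = y)$
$\equiv \exists y \in B, \forall x \in A, f(x) \ne y$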
So, if we want to show that f is not onto, we need to find some value y
in B, such that no matter which element x you pick from A, f(x) isn't equal
to y.
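For finite sets, both the definition of onto and its negation can be checked by brute force. In this sketch the nesting of all and any mirrors the order of the quantifiers; the function names and the example function are made up for illustration.

```python
def is_onto(f, domain, codomain):
    """For every y in the co-domain, there exists x in the domain with f(x) = y."""
    return all(any(f(x) == y for x in domain) for y in codomain)

def unreached_values(f, domain, codomain):
    """All y in the co-domain such that f(x) != y for every x in the domain."""
    return [y for y in codomain if all(f(x) != y for x in domain)]

def remainder3(x):        # hypothetical example function
    return x % 3

domain = range(10)
print(is_onto(remainder3, domain, [0, 1, 2]))               # True
print(is_onto(remainder3, domain, [0, 1, 2, 3]))            # False
print(unreached_values(remainder3, domain, [0, 1, 2, 3]))   # [3]
```

A single unreached value, such as 3 here, is exactly the y that a proof of "not onto" must exhibit.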
This sentence asks you to consider some random Fleck. Then, given that
choice, it asserts that they have a toothbrush. The toothbrush is chosen
after we've picked the person, so the choice of toothbrush can depend on the
choice of person. This doesn't absolutely force everyone to pick their own
toothbrush. (For a brief period, two of my sons were using the same one
because they got confused.) However, at least this statement is consistent
with each person having their own toothbrush.
Suppose now that we swap the order of the quantifiers, to get
In this case, we're asked to choose a toothbrush t first. Then we're
asserting that every Fleck uses this one fixed toothbrush t. Eeeuw! That wasn't
what we wanted to say!
We do want the existential quantifier first when there's a single object
that's shared among the various people, as in:
There is a stove s, such that for every person p in the Fleck family,
p cooks his food on s.
Notice that this order issue only appears when a statement has a mixture of
existential and universal quantifiers. If all the quantifiers are existential, or
if all the quantifiers are universal, the order doesn't matter.
To take a more mathematical example, let's look back at modular
arithmetic. Two numbers x and y are multiplicative inverses if xy = yx = 1. In
the integers $\mathbb{Z}$, only 1 and -1 have multiplicative inverses. However, in $\mathbb{Z}_k$, many
other integers have inverses. For example, if k = 7, then [3][5] = [1]. So [3]
and [5] are inverses.
For certain values of k every non-zero element of $\mathbb{Z}_k$ has an inverse.¹ You
can verify that this is true for $\mathbb{Z}_7$: [3] and [5] are inverses, [2] and [4] are
inverses, [1] is its own inverse, and [6] is its own inverse. So we can say that
$\forall \text{ non-zero } x \in \mathbb{Z}_7, \exists y \in \mathbb{Z}_7, xy = yx = 1$
Notice that we've put the universal quantifier outside the existential one,
so that each number gets to pick its own inverse. Reversing the order of the
quantifiers would give us the following statement:
$\exists y \in \mathbb{Z}_7, \forall \text{ non-zero } x \in \mathbb{Z}_7, xy = yx = 1$
This version isn't true, because you can't pick one single number that
works as an inverse for all the rest of the non-zero numbers, even in modular
arithmetic.
However, we do want the existential quantifier first in the following claim,
because 0y = y0 = 0 for every $y \in \mathbb{Z}_7$.
$\exists x \in \mathbb{Z}_7, \forall y \in \mathbb{Z}_7, xy = yx = x$
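All three quantified statements about $\mathbb{Z}_7$ can be checked exhaustively. In this sketch the elements are represented by 0 through 6 and the variable names are ours; as before, the nesting of all and any follows the order of the quantifiers.

```python
k = 7
Zk = range(k)
nonzero = range(1, k)

# Each non-zero x paired with every y that satisfies xy = 1 in Z_7.
inverses = {x: [y for y in Zk if (x * y) % k == 1] for x in nonzero}
print(inverses)   # {1: [1], 2: [4], 3: [5], 4: [2], 5: [3], 6: [6]}

# For all non-zero x there exists y: each x may pick its own inverse.
forall_exists = all(any((x * y) % k == 1 for y in Zk) for x in nonzero)

# There exists y for all non-zero x: one y must work for every x.
exists_forall = any(all((x * y) % k == 1 for x in nonzero) for y in Zk)

# There exists x for all y with xy = x: x = 0 works.
absorbing = any(all((x * y) % k == x for y in Zk) for x in Zk)

print(forall_exists, exists_forall, absorbing)   # True False True
```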
¹ Can you figure out which values of k have this property?
Claim 32 Define the function g from the integers to the integers by the
formula g(x) = x - 8. g is onto.
For some functions, several input values map onto a single output value.
In that case, we can choose any input value in our proof, typically whichever
is easiest for the proof-writer. For example, suppose we had $g : \mathbb{Z} \to \mathbb{Z}$ such
that $g(x) = \lfloor \frac{x}{2} \rfloor$. To show that g is onto, we're given an output value x and
need to find the corresponding input value. The simplest choice would be 2x
itself. But you could also pick 2x + 1.
Suppose we try to build such a proof for a function that isn't onto, e.g.
$f : \mathbb{Z} \to \mathbb{Z}$ such that f(x) = 3x + 2.
If f was a function from the reals to the reals, we'd be ok at this point,
because x would be a good pre-image for y. However, f's inputs are declared
to be integers. For many values of y, $\frac{y-2}{3}$ isn't an integer. So it won't work
as an input value for f.
7.9 A 2D example
Here's a sample function whose domain is 2D. Let $f : \mathbb{Z}^2 \to \mathbb{Z}$ be defined by
f(x, y) = x + y. I claim that f is onto.
First, let's make sure we know how to read this definition. $\mathbb{Z}^2$ is
shorthand for $\mathbb{Z} \times \mathbb{Z}$, which is the set of pairs of integers. So f maps a pair
of integers to a single integer, which is the sum of the two coordinates.
To prove that f is onto, we need to pick some arbitrary element y in the
co-domain. That is to say, y is an integer. Then we need to find a sample
value in the domain that maps onto y, i.e. a preimage of y. At this point,
it helps to fiddle around on our scratch paper, to come up with a suitable
preimage. In this case, (0, y) will work nicely. So our proof looks like:
Notice that this function maps many input values onto each output value.
So, in our proof, we could have used a different formula for finding the input
value, e.g. (1, y - 1) or (y, 0). A proof writer with a sense of humor might
use (342, y - 342).
f(x) = 3x + 7
g(x) = x - 8
Since the domains and co-domains for both functions are the integers, we
can compose the two functions in both orders. But the two composition orders
give us different functions:
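Working the two orders out by hand gives $(f \circ g)(x) = f(g(x)) = 3(x - 8) + 7 = 3x - 17$ and $(g \circ f)(x) = g(f(x)) = (3x + 7) - 8 = 3x - 1$. The sketch below checks this numerically; compose is a helper name of our own.

```python
def f(x):
    return 3 * x + 7

def g(x):
    return x - 8

def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

f_after_g = compose(f, g)   # f o g
g_after_f = compose(g, f)   # g o f

print(f_after_g(10), 3 * 10 - 17)   # 13 13
print(g_after_f(10), 3 * 10 - 1)    # 29 29
```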
Claim 33 For any sets A, B, and C and for any functions $f : A \to B$ and
$g : B \to C$, if f and g are onto, then $g \circ f$ is also onto.
Chapter 8
Functions and one-to-one
In this chapter, we'll see what it means for a function to be one-to-one and
bijective. This general topic includes counting permutations and comparing
sizes of finite sets (e.g. the pigeonhole principle). We'll also see the method
of adding stipulations to a proof "without loss of generality" as well as the
technique of proving an equality via upper and lower bounds.
8.1 One-to-one
Suppose that $f : A \to B$ is a function from A to B. If we pick a value $y \in B$,
then $x \in A$ is a pre-image of y if f(x) = y. Notice that I said a pre-image
of y, not the pre-image of y, because y might have more than one pre-image.
For example, in the following function, 1 and 2 are pre-images of 1, 4 and 5
are pre-images of 4, 3 is a pre-image of 3, and 2 has no pre-images.
[Figure: diagram of the function described above, from {1, 2, 3, 4, 5} to {1, 2, 3, 4}.]
$\forall x, y \in A, x \ne y \rightarrow f(x) \ne f(y)$
$\forall x, y \in A, f(x) = f(y) \rightarrow x = y$
When reading this definition, notice that when you set up two variables
x and y, they don't have to have different (math jargon: "distinct") values.
In normal English, if you give different names to two objects, the listener is
expected to understand that they are different. By contrast, mathematicians
always mean you to understand that they might be different but there's also
the possibility that they might be the same object.
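The second form of the definition translates directly into a brute-force test for finite functions. This sketch stores a function as a Python dict; h encodes the pre-image example described earlier in this section (1 and 2 map to 1, 3 maps to 3, 4 and 5 map to 4), and the helper names are ours.

```python
def is_one_to_one(f):
    """For all x, y in the domain: f(x) = f(y) implies x = y."""
    return all(x == y for x in f for y in f if f[x] == f[y])

def preimages(f, y):
    """All inputs x with f(x) = y."""
    return {x for x in f if f[x] == y}

h = {1: 1, 2: 1, 3: 3, 4: 4, 5: 4}

print(preimages(h, 1), preimages(h, 2))   # {1, 2} set()
print(is_one_to_one(h))                   # False: 1 and 2 share an output
print(is_one_to_one({1: "a", 2: "b"}))    # True
```

Note that the loop deliberately includes the case x == y, matching the remark that two variable names need not refer to distinct values.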
8.2 Bijections
If a function f is both one-to-one and onto, then each output value has
exactly one pre-image. So we can invert f, to get an inverse function $f^{-1}$. A
function that is both one-to-one and onto is called bijective or a bijection. If
f maps from A to B, then $f^{-1}$ maps from B to A.
Suppose that A and B are finite sets. Constructing an onto function
from A to B is only possible when A has at least as many elements as B.
Constructing a one-to-one function from A to B requires that B have at least
as many values as A. So if there is a bijection between A and B, then the two
sets must contain the same number of elements. As we'll see later, bijections
are also used to define what it means for two infinite sets to be the same
size.
For example, if you have 8 playing cards and the cards come in five colors,
then at least two of the cards share the same color. Or, if you have a 30-
character string containing lowercase letters, it must contain some duplicate
letters, because 26 is smaller than 30.
The pigeonhole principle (like many superheroes) has a dual identity.
When you're told to apply it to some specific objects and labels, it's
obvious how to do so. However, it is often pulled out of nowhere as a clever
trick in proofs, where you would have never suspected that it might be useful.
Such proofs are easy to read, but sometimes hard to come up with.
For example, here's a fact that we can prove with a less obvious
application of the pigeonhole principle.
To show that 4 is the maximum number of points we can place with the
required separations, we need to show that it's possible to place 4 points,
but it is not possible to place 5 points. It's easiest to tackle these two
sub-problems separately, using different techniques.
Suppose we tried to place five or more points into the big
triangle. Since there are only four small triangles, by the pigeonhole
principle, some small triangle would have to contain at least two
points. But since the small triangle has side length only 1, these
points can't be separated by more than one unit.
The tricky part of this proof isn't the actual use of the pigeonhole
principle. Rather, it's the setup steps, e.g. dividing up the triangle, that got us
to the point where the pigeonhole principle could be applied. This is typical
of harder pigeonhole principle proofs.
8.4 Permutations
Now, suppose that |A| = n = |B|. We can construct a number of one-to-one
functions from A to B. How many? Suppose that $A = \{x_1, x_2, \ldots, x_n\}$.
We have n ways to choose the output value for $x_1$. But that choice uses up
one output value, so we have only n - 1 choices for the output value of $x_2$.
Continuing this pattern, we have $n(n-1)(n-2) \cdots 2 \cdot 1$ (n! for short) ways
to construct our function.
Similarly, suppose we have a group of n dragon figurines that we'd like to
arrange on the mantlepiece. We need to construct a map from positions on
the mantlepiece to dragons. There are n! ways to do this. An arrangement
of n objects in order is called a permutation of the n objects.
More frequently, we need to select an ordered list of k objects from a
larger set of n objects. For example, we have 30 dragon figurines but space
for only 10 on our mantlepiece. Then we have 30 choices for the first figurine,
29 for the second, and so forth down to 21 choices for the last one. Thus we
have $30 \cdot 29 \cdots 21$ ways to decorate the mantlepiece.
In general, an ordered choice of k objects from a set of n objects is known
as a k-permutation of the n objects. There are $n(n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}$
different k-permutations of n objects. This number is called P(n, k). P(n, k)
is also the number of one-to-one functions from a set of k objects to a set of
n objects.
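The formula for P(n, k) can be checked against a direct enumeration for small n and against Python's built-in math.perm (available from Python 3.8). The function name P follows the text; everything else is illustrative.

```python
from itertools import permutations
from math import factorial, perm

def P(n, k):
    """Number of k-permutations of n objects, computed as n! / (n-k)!."""
    return factorial(n) // factorial(n - k)

# Enumerate every ordered choice of 3 objects out of 5 and count them.
listed = list(permutations(range(5), 3))
print(len(listed), P(5, 3))          # 60 60

# The dragon example: 30 * 29 * ... * 21 arrangements.
print(P(30, 10) == perm(30, 10))     # True
```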
When we pick x and y at the start of the proof, notice that we haven't
specified whether they are the same number or not. Mathematical convention
leaves this vague, unlike normal English where the same statement would
strongly suggest that they were different.
Claim 36 For any sets A, B, and C and for any functions $f : A \to B$ and
$g : B \to C$, if f and g are one-to-one, then $g \circ f$ is also one-to-one.
We can prove this with a direct proof, by being systematic about using our
definitions and standard proof outlines. First, let's pick some representative
objects of the right types and assume everything in our hypothesis.
Now, we need to apply the definition of function composition and the fact
that f and g are each one-to-one:
Claim 37 For any sets of real numbers A and B, if f is any strictly increas-
ing function from A to B, then f is one-to-one.
$\forall x, y \in A, x \ne y \rightarrow f(x) \ne f(y)$
This method only works if you, and your reader, both agree that it's
obvious that the two cases are very similar and the proof will really be similar.
That is a dangerous assumption right now. And we've only saved a small amount of
writing, which isn't worth the risk of losing points if the grader doesn't think
it was obvious.
But this simplification can be very useful in more complicated situations
where you may have lots of cases, the proof for each case is long, and
the proofs for different cases really are very similar.
Chapter 9
Graphs
Graphs are a very general class of object, used to formalize a wide variety of
practical problems in computer science. In this chapter, we'll see the basics
of (finite) undirected graphs, including graph isomorphism, connectivity, and
graph coloring.
9.1 Graphs
A graph consists of a set of nodes V and a set of edges E. We'll sometimes
refer to the graph as a pair of sets (V, E). Each edge in E joins two nodes
in V . Two nodes connected by an edge are called neighbors or adjacent.
For example, here is a graph in which the nodes are Illinois cities and the
edges are roads joining them:
[Figure: graph whose nodes are the cities Chicago, Bloomington, Urbana, Danville, Springfield, and Decatur]
[Figure: graph on the nodes San Francisco, Oakland, Palo Alto, and Fremont, with edges labeled plane, I-80, Rt 101, I-880, and Rt 84]
9.2 Degrees
The degree of a node v, written deg(v), is the number of edges which have v as
an endpoint. Self-loops, if you are allowing them, count twice. For example,
in the following graph, a has degree 2, b has degree 6, d has degree 0, and so
forth.
[Figure: graph on the nodes a, b, c, d, e, f]
Each edge contributes to two node degrees. So the sum of the degrees of
all the nodes is twice the number of edges. This is called the Handshaking
Theorem and can be written as
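$$\sum_{v \in V} \deg(v) = 2|E|$$
Equivalently, if the nodes are named $v_1, \ldots, v_n$, this can be written as
$$\sum_{k=1}^{n} \deg(v_k) = 2|E|$$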
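The Handshaking Theorem is easy to check numerically. The following is a minimal sketch on a small made-up graph (the edge list is an assumption chosen for illustration, including one self-loop so that the "count twice" rule is exercised).

```python
from collections import Counter

def degrees(edges):
    """Map each node to its degree; a self-loop (v, v) counts twice."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Hypothetical example graph: a triangle a-b-c, an edge c-d, and a self-loop on d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "d")]
deg = degrees(edges)
print(sum(deg.values()), 2 * len(edges))
```

Both printed numbers are the same because every edge adds exactly two to the total degree.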
[Figure: graph on nodes including b, c, d, e]
$E = \{v_1 v_2, v_2 v_3, \ldots, v_{n-1} v_n, v_n v_1\}$
So $C_5$ looks like
[Figure: the cycle graph $C_5$]
Cn has n nodes and also n edges. Cycle graphs often occur in networking
applications. They could also be used to model games like telephone where
people sit in a circle and communicate only with their neighbors.
The wheel Wn is just like the cycle graph Cn except that it has an addi-
tional central hub node which is connected to all the others. Notice that
Wn has n + 1 nodes (not n nodes). It has 2n edges. For example, W5 looks
like
[Figure: the wheel graph $W_5$, with a central hub node joined to every node of the cycle]
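The node and edge counts for cycles and wheels can be confirmed by building the edge lists directly. This is a sketch under the assumption that nodes are numbered 1 through n and the extra wheel node is called "hub"; the helper names are hypothetical.

```python
def cycle_graph(n):
    """Edges of C_n on nodes 1..n: v1v2, v2v3, ..., v(n-1)vn, vnv1 (n >= 3)."""
    return [(i, i % n + 1) for i in range(1, n + 1)]

def wheel_graph(n):
    """Edges of W_n: the cycle C_n plus a hub joined to every cycle node."""
    return cycle_graph(n) + [("hub", i) for i in range(1, n + 1)]

def node_count(edges):
    """Number of distinct nodes that appear in an edge list."""
    return len({v for edge in edges for v in edge})

c5, w5 = cycle_graph(5), wheel_graph(5)
print(node_count(c5), len(c5))  # nodes and edges of C_5
print(node_count(w5), len(w5))  # nodes and edges of W_5
```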
9.5 Isomorphism
In graph theory, we only care about how nodes and edges are connected
together. We don't care about how they are arranged on the page or in
space, how the nodes and edges are named, and whether the edges are drawn
as straight or curvy. We would like to treat graphs as interchangeable if they
have the same abstract connectivity structure.
Specifically, suppose that $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are graphs.
An isomorphism from $G_1$ to $G_2$ is a bijection $f : V_1 \to V_2$ such that nodes
a and b are joined by an edge if and only if f(a) and f(b) are joined by an
edge. The graphs $G_1$ and $G_2$ are isomorphic if there is an isomorphism
from $G_1$ to $G_2$.
For example, the following two graphs are isomorphic. We can prove this
by defining the function f so that it maps 1 to d, 2 to a, 3 to c, and 4 to b.
The reader can then verify that edges exist in the left graph if and only if
the corresponding edges exist in the right graph.
[Figure: two isomorphic graphs, one on the nodes a, b, c, d and one on the nodes 1, 2, 3, 4]
If two graphs are isomorphic, then:
• The two graphs must have the same number of nodes and the same number of edges.
• For any node degree k, the two graphs must have the same number of nodes of degree k. For example, they must have the same number of nodes with degree 3.
We can prove that two graphs are not isomorphic by giving one example
of a property that is supposed to be invariant but, in fact, differs between
the two graphs. For example, in the following picture, the lefthand graph
has a node of degree 3, but the righthand graph has no nodes of degree 3, so
they can't be isomorphic.
[Figure: two graphs, one on the nodes a, b, c, d and one on the nodes 1, 2, 3, 4, that are not isomorphic]
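For very small graphs, the definition of isomorphism can be tested by brute force: try every bijection between the node sets. The sketch below assumes simple graphs without self-loops, and the example graphs (a path and a star) are made up for illustration rather than taken from the figures.

```python
from itertools import permutations

def is_isomorphic(nodes1, edges1, nodes2, edges2):
    """Try every bijection f: V1 -> V2. Only feasible for tiny graphs."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return False
    for image in permutations(nodes2):
        f = dict(zip(nodes1, image))
        # f is an isomorphism when it maps the edge set of G1 onto that of G2.
        if {frozenset((f[u], f[v])) for u, v in edges1} == e2:
            return True
    return False

path = [("a", "b"), ("b", "c"), ("c", "d")]   # degrees 1, 2, 2, 1
relabeled_path = [(2, 4), (4, 1), (1, 3)]     # the same shape with new names
star = [(1, 2), (1, 3), (1, 4)]               # node 1 has degree 3

print(is_isomorphic("abcd", path, [1, 2, 3, 4], relabeled_path))
print(is_isomorphic("abcd", path, [1, 2, 3, 4], star))
```

The star has a node of degree 3 and the path does not, so no bijection can work for the second pair, matching the degree argument above.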
9.6 Subgraphs
It's not hard to find a pair of graphs that aren't isomorphic but where the
most obvious properties (e.g. node degrees) match. To prove that such a pair
isn't isomorphic, it's often helpful to focus on certain specific local features of
one graph that aren't present in the other graph. For example, the following
two graphs have the same node degrees: one node of degree 1, three of degree
2, one of degree 3. However, a little experimentation suggests they aren't
isomorphic.
[Figure: two graphs, one on the nodes a, b, c, d, e and one on the nodes 1, 2, 3, 4, 5]
A graph G' is a subgraph of a graph G if the nodes of G' are a subset of the
nodes of G and the edges of G' are a subset of the edges of G. If two graphs G and
F are isomorphic, then any subgraph of G must have a matching subgraph
somewhere in F.
A graph has a huge number of subgraphs. However, we can usually find
evidence of non-isomorphism by looking at small subgraphs. For example, in
the graphs above, the lefthand graph has C3 as a subgraph, but the righthand
graph does not. So they can't be isomorphic.
[Figure: graph on the nodes a, b, c, d, e]
In the following graph, one cycle of length 4 has edges: ab, bc, ce, da.
Other closely-related cycles go through the same nodes but with a different
starting point or in the opposite direction, e.g. da, ab, bc, ce. Unlike cycles,
closed walks can re-use nodes, e.g. ab, ba, ac, ce, ec, ca is a closed walk but
not a cycle.
[Figure: graph on the nodes a, b, c, d, e, f]
[Figure: a second graph on the nodes a, b, c, d, e, f]
Notice that the cycle graph Cn contains 2n different cycles. For example,
if the vertices of C4 are labelled as shown below, then one cycle is ab, bc, cd, da,
another is cd, bc, ab, da, and so forth.
[Figure: the cycle graph $C_4$ with vertices labelled a, b, c, d]
9.8 Connectivity
A graph G is connected if there is a walk between every pair of nodes in
G. Our previous examples of graphs were connected. The following graph is
not connected, because there is no walk from (for example) a to g.
[Figure: a graph on the nodes a, b, c, d, e, f, g, h that is not connected]
[Figure: a second graph on the nodes a, b, c, d, e, f, g]
9.9 Distances
In graphs, distances are based on the lengths of paths connecting pairs of
nodes. Specifically, the distance d(a, b) between two nodes a and b is the
length of the shortest path from a to b. The diameter of a graph is the
maximum distance between any pair of nodes in the graph. For example,
the lefthand graph below has diameter 4, because d(f, e) = 4 and no other
pair of nodes is further apart. The righthand graph has diameter 2, because
d(1, 5) = 2 and no other pair of nodes is further apart.
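[Figure: left, a graph on the nodes a, b, c, d, e, f with diameter 4; right, a graph on the nodes 1, 2, 3, 4, 5 with diameter 2]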
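Distances and diameters can be computed with breadth-first search. The sketch below assumes a connected graph given as an edge list; the two example graphs (a 5-node path and the cycle on 5 nodes) are stand-ins chosen for illustration, not the graphs in the figure.

```python
from collections import deque

def adjacency(edges):
    """Build a neighbor-set dictionary from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distances_from(adj, start):
    """Breadth-first search: shortest path length from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Largest distance between any pair of nodes (graph assumed connected)."""
    return max(max(distances_from(adj, v).values()) for v in adj)

path5 = adjacency([(1, 2), (2, 3), (3, 4), (4, 5)])
cycle5 = adjacency([(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])
print(diameter(path5), diameter(cycle5))
```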
[Figure: graph on the nodes a, b, c, d, e, f]
An Euler circuit is possible exactly when the graph is connected and each
node has even degree. Each node has to have even degree because, in order
to complete the circuit, you have to leave each node that you enter. If the
node has odd degree, you will eventually enter a node but have no unused
edge to go out on.
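The condition for an Euler circuit translates directly into a test: check connectivity, then check that every degree is even. This is a minimal sketch that assumes the graph is given as an edge list (so isolated nodes are not represented) and that parallel edges may be listed more than once; the function name is hypothetical.

```python
def has_euler_circuit(edges):
    """True when the graph is connected and every node has even degree."""
    adj, degree = {}, {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    if not adj:
        return False
    # Depth-first search from an arbitrary node to test connectivity.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    connected = len(seen) == len(adj)
    return connected and all(d % 2 == 0 for d in degree.values())

square = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # every degree is 2
path = [("a", "b"), ("b", "c")]                            # a and c have degree 1
print(has_euler_circuit(square), has_euler_circuit(path))
```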
Fascination with Euler circuits dates back to the 18th century. At that
time, the city of Königsberg, in Prussia, had a set of bridges that looked
roughly as follows:
[Figure: a graph coloring that uses the colors R, G, and B]
But the complete graph Kn requires n colors, because each node is adja-
cent to all the other nodes. E.g. K4 can be colored as follows:
[Figure: $K_4$ colored with the four colors R, G, B, and Y]
For small finite graphs, the simplest way to show that $\chi(G) \leq n$ is to show
a coloring of G that uses n colors.
Showing that $\chi(G) \geq n$ can sometimes be equally straightforward. For
example, G may contain a copy of $K_n$, which can't be colored with fewer than
n colors. However, sometimes it may be necessary to step carefully through
all the possible ways to color G with $n - 1$ colors and show that none of them
works out.
[Figure: graphs colored with the two colors R and G]
Bipartite graphs often appear in matching problems, where the two sub-
sets represent different types of objects. For example, one group of nodes
might be students, the other group of nodes might be workstudy jobs, and
the edges might indicate which jobs each student is interested in.
The complete bipartite graph Km,n is a bipartite graph with m nodes in
V1 , n nodes in V2 , and which contains all possible edges that are consistent
with the definition of bipartite. The diagram below shows a partial bipartite
graph on a set of 7 nodes, as well as the complete bipartite graph K3,2 .
Induction
Claim 38 For any positive integer n, $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.
At that point, we didn't prove this formula correct, because this is most
easily done using a new proof technique: induction.
Mathematical induction is a technique for showing that a statement P(n)
is true for all natural numbers n, or for some infinite subset of the natural
numbers (e.g. all positive even integers). It's a nice way to produce quick,
easy-to-read proofs for a variety of facts that would be awkward to prove with
the techniques you've seen so far. It is particularly well suited to analyzing
the performance of recursive algorithms. Most of you have seen a few of these
in previous programming classes; you'll see many more in later classes.
Induction is very handy, but it may strike you as a bit weird. It may
take you some time to get used to it. In fact, you have two tasks which are
somewhat independent:
You can learn to write correct inductive proofs even if you remain some-
what unsure of why the method is legitimate. Over the next few classes,
you'll gain confidence in the validity of induction and its friend recursion.
10.2 An Example
A proof by induction has the following outline:
Proof: We will show that $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ for any positive integer
n, using induction on n.
Base: We need to show that the formula holds for n = 1, i.e.
$\sum_{i=1}^{1} i = \frac{1 \cdot 2}{2}$.
Induction: Suppose that $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ for $n = 1, 2, \ldots, k-1$.
We need to show that $\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$.
Proof: We will show that $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ for any positive integer
n, using induction on n.
Base: We need to show that the formula holds for n = 1. $\sum_{i=1}^{1} i = 1$.
And also $\frac{1 \cdot 2}{2} = 1$. So the two are equal for n = 1.
Induction: Suppose that $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ for $n = 1, 2, \ldots, k-1$.
We need to show that $\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$.
By the definition of summation notation, $\sum_{i=1}^{k} i = \left(\sum_{i=1}^{k-1} i\right) + k$.
Our inductive hypothesis states that at $n = k-1$, $\sum_{i=1}^{k-1} i = \frac{(k-1)k}{2}$.
Combining these two formulas, we get that $\sum_{i=1}^{k} i = \frac{(k-1)k}{2} + k$.
But $\frac{(k-1)k}{2} + k = \frac{(k-1)k}{2} + \frac{2k}{2} = \frac{(k-1+2)k}{2} = \frac{k(k+1)}{2}$.
So, combining these equations, we get that $\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$, which
is what we needed to show.
First, you should try (on your own) some specific integers and verify that
the claim is true. Since the claim specifies $n \geq 4$, it's worth checking that 4
does work but the smaller integers don't.
In this claim, the proposition P(n) is $2^n < n!$. So an outline of our
inductive proof looks like:
Notice that our base case is for n = 4 because the claim was specified
to hold only for integers $\geq 4$. We've also used a variation on our induction
outline, where the induction hypothesis covers values up through k (instead
of $k - 1$) and we prove the claim at $n = k + 1$ (instead of at $n = k$). It
doesn't matter whether your hypothesis goes through $n = k - 1$ or $n = k$, as
long as you prove the claim for the next larger integer.
Fleshing out the details of the algebra, we get the following full proof.
When working with inequalities, it's especially important to write down your
assumptions and what you want to conclude with. You can then work from
both ends to fill in the gap in the middle of the proof.
suitable integer n.
Proof: By induction on n.
Base: Let n = 0. Then $n^3 - n = 0^3 - 0 = 0$, which is divisible by 3.
Induction: Suppose that $n^3 - n$ is divisible by 3, for $n = 0, 1, \ldots, k$.
We need to show that $(k+1)^3 - (k+1)$ is divisible by 3.
$(k+1)^3 - (k+1) = (k^3 + 3k^2 + 3k + 1) - (k+1) = (k^3 - k) + 3(k^2 + k)$
From the inductive hypothesis, $(k^3 - k)$ is divisible by 3. And
$3(k^2 + k)$ is divisible by 3 since $(k^2 + k)$ is an integer. So their
sum is divisible by 3. That is, $(k+1)^3 - (k+1)$ is divisible by 3.
The zero base case is technically enough to make the proof solid, but
sometimes a zero base case doesn't provide good intuition or confidence. So
you'll sometimes see an extra base case written out, e.g. n = 1 in this
example, to help the author or reader see why the claim is plausible.
I claim that
Proof: by induction on n.
Base: Suppose n = 1. Then our $2^n \times 2^n$ checkerboard with one
square removed is exactly one right triomino.
Induction: Suppose that the claim is true for $n = 1, \ldots, k$. That
is, a $2^n \times 2^n$ checkerboard with any one square removed can be
tiled using right triominoes as long as $n \leq k$.
Suppose we have a $2^{k+1} \times 2^{k+1}$ checkerboard C with any one
square removed. We can divide C into four $2^k \times 2^k$ sub-checkerboards
P, Q, R, and S. One of these sub-checkerboards is already missing
a square. Suppose without loss of generality that this one is
S. Place a single right triomino in the middle of C so it covers
one square on each of P, Q, and R.
Now look at the areas remaining to be covered. In each of the
sub-checkerboards, exactly one square is missing (S) or already
covered (P, Q, and R). So, by our inductive hypothesis, each
of these sub-checkerboards minus one square can be tiled with
right triominoes. Combining these four tilings with the triomino
we put in the middle, we get a tiling for the whole of the larger
checkerboard C. This is what we needed to construct.
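The inductive step is constructive, so it can be turned into a recursive procedure. The sketch below is one possible rendering, with assumed names (`tile_board`, cells as (row, column) pairs). It stops the recursion at a single cell rather than at the 2 x 2 board used as the base case of the proof; the 2 x 2 case then falls out of one recursive step.

```python
def tile_board(k, missing):
    """Tile a 2^k x 2^k board, minus the cell `missing`, with right triominoes.

    Returns a list of triominoes, each given as a list of three (row, col) cells.
    """
    tiles = []

    def solve(top, left, size, hole):
        if size == 1:  # a single cell that is either missing or already covered
            return
        half = size // 2
        mid_r, mid_c = top + half, left + half
        # The cell of each quadrant that touches the centre of the board.
        centre = {
            (0, 0): (mid_r - 1, mid_c - 1),
            (0, 1): (mid_r - 1, mid_c),
            (1, 0): (mid_r, mid_c - 1),
            (1, 1): (mid_r, mid_c),
        }
        hole_quadrant = (int(hole[0] >= mid_r), int(hole[1] >= mid_c))
        # One triomino in the middle covers a cell in each of the other three quadrants.
        tiles.append([cell for q, cell in centre.items() if q != hole_quadrant])
        # Each quadrant now has exactly one unavailable cell, so recurse on all four.
        for (qr, qc), cell in centre.items():
            sub_hole = hole if (qr, qc) == hole_quadrant else cell
            solve(top + qr * half, left + qc * half, half, sub_hole)

    solve(0, 0, 2 ** k, missing)
    return tiles

tiles = tile_board(3, (2, 5))
print(len(tiles))  # number of triominoes used on the 8 x 8 board
```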
We can use this idea to design an algorithm (called the greedy algo-
rithm) for coloring a graph. This algorithm walks through the nodes one-by-
one, giving each node a color without revising any of the previously-assigned
colors. When we get to each node, we see what colors have been assigned to
its neighbors. If there is a previously used color not assigned to a neighbor,
we re-use that color. Otherwise, we deploy a new color. The above theorem
shows that the greedy algorithm will never use more than D + 1 colors.
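The greedy algorithm described above fits in a few lines. In this sketch, colors are represented by the integers 0, 1, 2, ..., and "re-use a previously used color" is implemented by picking the smallest color not seen on a neighbor; both choices are assumptions made for illustration. The example graph is a wheel with a hub and a 6-node rim.

```python
def greedy_coloring(adj, order=None):
    """Color nodes one at a time, never revising earlier choices."""
    color = {}
    for v in (order or adj):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:  # smallest color not already on a colored neighbor
            c += 1
        color[v] = c
    return color

# Wheel with a hub and rim nodes 1..6; the hub is listed (and colored) first.
n = 6
adj = {"hub": set(range(1, n + 1))}
for i in range(1, n + 1):
    adj[i] = {"hub", i % n + 1, (i - 2) % n + 1}

coloring = greedy_coloring(adj)
max_degree = max(len(neighbors) for neighbors in adj.values())
print(len(set(coloring.values())), max_degree + 1)
```

Passing `order=sorted(adj, key=lambda v: -len(adj[v]))` colors higher-degree nodes first, which is a common ordering heuristic.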
Notice, however, that D + 1 is only an upper bound on the chromatic
number of the graph. The actual chromatic number of the graph might be a
lot smaller. For example, D + 1 would be 7 for the wheel graph W6 but this
graph actually has chromatic number only three:
[Figure: the wheel graph $W_6$ colored with the three colors R, G, and B]
[Figure: a graph colored with R and G]
Notice that whether we need to deploy a new color to handle a node isn't
actually determined by the degree of the node but, rather, by how many of
its neighbors are already colored. So a useful heuristic is to order nodes by
their degrees and color higher-degree nodes earlier in the process. This tends
to mean that, when we reach a high-degree node, some of its neighbors will
not yet be colored. So we will be able to handle the high-degree nodes with
fewer colors and then extend this partial coloring to all the low-degree nodes.
For example, 12 cents uses three 4-cent stamps. 13 cents of postage uses
two 4-cent stamps plus a 5-cent stamp. 14 uses one 4-cent stamp plus two
5-cent stamps. If you experiment with small values, you quickly realize that
the formula for making k cents of postage depends on the one for making
$k - 4$ cents of postage. That is, you take the stamps for $k - 4$ cents and add
another 4-cent stamp. We can make this into an inductive proof as follows:
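The same idea gives a recursive procedure for actually choosing the stamps. The sketch below assumes the claim covers every amount of at least 12 cents, and it uses 12 through 15 cents as base cases (15 cents being three 5-cent stamps); the function name `stamps` is hypothetical.

```python
def stamps(k):
    """Return (fours, fives) with 4*fours + 5*fives == k, for any integer k >= 12."""
    base = {12: (3, 0), 13: (2, 1), 14: (1, 2), 15: (0, 3)}
    if k < 12:
        raise ValueError("only amounts of 12 cents or more are handled")
    if k in base:
        return base[k]
    # Reach back four cents and add one more 4-cent stamp.
    fours, fives = stamps(k - 4)
    return fours + 1, fives

for k in (12, 13, 14, 15, 16, 23):
    print(k, stamps(k))
```

The recursion reaches back exactly four values, which is why four consecutive base cases are needed.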
Notice that we needed to directly prove four base cases, since we needed
to reach back four integers in our inductive step. It's not always obvious how
many base cases are needed until you work out the details of your inductive
step.
10.11 Nim
In the parlour game Nim, there are two players and two piles of matches. At
each turn, a player removes some (non-zero) number of matches from one of
the piles. The player who removes the last match wins.1
Claim 44 If the two piles contain the same number of matches at the start
of the game, then the second player can always win.
Here's a winning strategy for the second player. Suppose your opponent
removes m matches from one pile. In your next move, you remove m matches
from the other pile, thus evening up the piles. Let's prove that this strategy
works.
Induction: Suppose that the second player can win any game that
starts with two piles of n matches, where n is any value from 1
through $k - 1$. We need to show that this is true if $n = k$.
So, suppose that both piles contain k matches. A legal move by
the first player involves removing j matches from one pile, where
$1 \leq j \leq k$. The piles then contain k matches and $k - j$ matches.
The second player can now remove j matches from the other pile.
This leaves us with two piles of $k - j$ matches. If $j = k$, then
the second player wins. If $j < k$, then we're now effectively at
the start of a game with $k - j$ matches in each pile. Since $j \geq 1$,
$k - j \leq k - 1$. So, by the induction hypothesis, we know that the
second player can finish the rest of the game with a win.
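The mirroring strategy can be simulated against a first player who moves at random. This is only a sanity check on the argument, not a proof; the function name `play` and the use of a seeded random generator are assumptions of the sketch.

```python
import random

def play(k, rng):
    """Two piles of k matches; the first player moves randomly, the second mirrors."""
    piles = [k, k]
    while True:
        # First player: remove some non-zero number of matches from a non-empty pile.
        i = rng.choice([p for p in (0, 1) if piles[p] > 0])
        j = rng.randint(1, piles[i])
        piles[i] -= j
        if piles == [0, 0]:
            return "first"
        # Second player: remove the same number from the other pile.
        piles[1 - i] -= j
        if piles == [0, 0]:
            return "second"

results = {play(k, random.Random(seed)) for k in range(1, 11) for seed in range(20)}
print(results)
```

Because the piles are equal before every first-player move, the other pile is never empty after that move, so the first player can never take the last match.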
The induction step in this proof uses the fact that our claim P(n) is
true for a smaller value of n. But since we can't control how many matches
the first player removes, we don't know how far back we have to look in the
sequence of earlier results P(1) . . . P(k). Our previous proof about postage
can be rewritten so as to avoid strong induction. It's less clear how to rewrite
proofs like this Nim example.
Proof by induction on n.
We'll leave the details of proving this as an exercise for the reader.
Again, the inductive step needed to reach back some number of steps in
our sequence of results, but we couldn't control how far back we needed to
go.
Authors writing for more experienced audiences may abbreviate the outline
somewhat, e.g. packing an entire short proof into one paragraph without
labelling the base and inductive steps separately. However, being careful
about the outline is important when you are still getting used to the
technique.
Chapter 11
Recursive Definition
$$\sum_{i=1}^{n} i = 1 + 2 + 3 + \ldots + (n-1) + n$$
This method is only ok when the reader can easily see what regular pattern
the . . . is trying to express. When precision is essential, e.g. when the pattern
is less obvious, we need to switch to recursive definitions.
Recursive function definitions in mathematics are basically similar to re-
cursive procedures in programming languages. A recursive definition defines
an object in terms of smaller objects of the same type. Because this pro-
cess has to end at some point, we need to include explicit definitions for the
smallest objects. So a recursive definition always has two parts:
• Base case(s)
• Recursive formula
For example, the summation $\sum_{i=1}^{n} i$ can be defined as:
$g(1) = 1$
$g(n) = g(n-1) + n, \quad n \geq 2$
Both the base case and the recursive formula must be present to have a
complete definition. However, it is traditional not to explicitly label these two
pieces. You're just expected to figure out for yourself which parts are base
case(s) and which is the recursive formula. The input values are normally
assumed to be integers.
The true power of recursive definition is revealed when the result for
n depends on the results for more than one smaller value, as in the strong
induction examples. For example, the famous Fibonacci numbers are defined:
$F_0 = 0$
$F_1 = 1$
$F_i = F_{i-1} + F_{i-2}, \quad i \geq 2$
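A recursive definition like this one translates almost line for line into a recursive function. In the sketch below, the two base cases and the recursive formula are visible as separate branches; the caching decorator is an implementation convenience (an assumption of the sketch) that avoids recomputing the same values many times.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(i):
    """Fibonacci numbers: F_0 = 0, F_1 = 1, F_i = F_(i-1) + F_(i-2) for i >= 2."""
    if i < 2:  # base cases F_0 = 0 and F_1 = 1
        return i
    return fib(i - 1) + fib(i - 2)  # recursive formula

print([fib(i) for i in range(10)])
```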
$T(1) = 1$
$T(n) = 2T(n-1) + 3, \quad n \geq 2$
The values of this function are T(1) = 1, T(2) = 5, T(3) = 13, T(4) = 29,
T(5) = 61. It isn't so obvious what the pattern is.
The idea behind unrolling is to substitute a recursive definition into itself,
so as to re-express T(n) in terms of $T(n-2)$ rather than $T(n-1)$. We keep
doing this, expressing T(n) in terms of the value of T for smaller and smaller
inputs, until we can see the pattern required to express T(n) in terms of n
and the base case T(1). So, for our example function, we would compute:
$$\begin{aligned}
T(n) &= 2T(n-1) + 3 \\
&= 2(2T(n-2) + 3) + 3 \\
&= 2(2(2T(n-3) + 3) + 3) + 3 \\
&= 2^3 T(n-3) + 2^2 \cdot 3 + 2 \cdot 3 + 3 \\
&= 2^4 T(n-4) + 2^3 \cdot 3 + 2^2 \cdot 3 + 2 \cdot 3 + 3 \\
&\ \ \vdots \\
&= 2^k T(n-k) + 2^{k-1} \cdot 3 + \ldots + 2^2 \cdot 3 + 2 \cdot 3 + 3
\end{aligned}$$
The first few lines of this are mechanical substitution. To get to the last line,
you have to imagine what the pattern looks like after k substitutions.
We can use summation notation to compactly represent the result of the
kth unrolling step:
$$\begin{aligned}
T(n) &= 2^k T(n-k) + 2^{k-1} \cdot 3 + \ldots + 2^2 \cdot 3 + 2 \cdot 3 + 3 \\
&= 2^k T(n-k) + 3(2^{k-1} + \ldots + 2^2 + 2 + 1) \\
&= 2^k T(n-k) + 3 \sum_{i=0}^{k-1} 2^i
\end{aligned}$$
Now, we need to determine when the input to T will hit the base case. In
our example, the input value is $n - k$ and the base case is for an input of 1.
So we hit the base case when $n - k = 1$, i.e. when $k = n - 1$. Substituting
this value for k back into our equation, and using the fact that T(1) = 1, we
get
$$\begin{aligned}
T(n) &= 2^k T(n-k) + 3 \sum_{i=0}^{k-1} 2^i \\
&= 2^{n-1} T(1) + 3 \sum_{i=0}^{n-2} 2^i \\
&= 2^{n-1} + 3 \sum_{k=0}^{n-2} 2^k \\
&= 2^{n-1} + 3(2^{n-1} - 1) = 4(2^{n-1}) - 3 = 2^{n+1} - 3
\end{aligned}$$
So the closed form for this function is $T(n) = 2^{n+1} - 3$. The unrolling
process isn't a formal proof that our closed form is correct. However, we'll
see below how to write a formal proof using induction.
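Short of a formal proof, a closed form found by unrolling can at least be checked against the recursive definition for many inputs. The sketch below does this for the example, assuming the definition T(1) = 1 and T(n) = 2T(n-1) + 3 for n at least 2; the function names are illustrative.

```python
def T(n):
    """Recursive definition: T(1) = 1, T(n) = 2*T(n-1) + 3 for n >= 2."""
    return 1 if n == 1 else 2 * T(n - 1) + 3

def T_closed(n):
    """Closed form obtained by unrolling: 2^(n+1) - 3."""
    return 2 ** (n + 1) - 3

print([T(n) for n in range(1, 6)])
print(all(T(n) == T_closed(n) for n in range(1, 50)))
```

Agreement on a finite range is evidence, not proof; the induction argument is still what establishes the formula for every n.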
$S(1) = c$
$S(n) = aS(\lceil n/b \rceil) + f(n), \quad n \geq 2$
The base case takes some constant amount of work c. The term f(n) is
the work involved in dividing up the big problem and/or merging together
the solutions for the smaller problems. The call to the ceiling function is
required to ensure that the input to S is always an integer.
Handling such definitions in full generality is beyond the scope of this
class.1 So let's consider a particularly important special case: dividing our
problem into two half-size problems, where the dividing/merging takes time
proportional to the size of the problem. And let's also restrict our input n
to be a power of two, so that we don't need to use the ceiling function. We
then get a recursive definition that looks like:
$S(1) = c$
$S(n) = 2S(n/2) + n, \quad n \geq 2$ (n a power of 2)
$$\begin{aligned}
S(n) &= 2S(n/2) + n \\
&= 2(2S(n/4) + n/2) + n \\
&= 4S(n/4) + n + n \\
&= 8S(n/8) + n + n + n \\
&\ \ \vdots \\
&= 2^i S\left(\frac{n}{2^i}\right) + in
\end{aligned}$$
We hit the base case when $\frac{n}{2^i} = 1$, i.e. when $i = \log n$ (i.e. log base 2,
which is the normal convention for algorithms applications). Substituting in
this value for i and the base case value S(1) = c, we get
$$S(n) = 2^i S\left(\frac{n}{2^i}\right) + in = 2^{\log n} c + n \log n = cn + n \log n$$
So the closed form for S(n) is $cn + n \log n$.
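As with the previous example, the closed form can be compared against the recursive definition on powers of two. The sketch below treats the base-case constant c as a parameter and computes log base 2 of a power of two exactly with integer arithmetic; the function names are assumptions.

```python
def S(n, c):
    """Recursive definition: S(1) = c, S(n) = 2*S(n/2) + n for n a power of 2."""
    return c if n == 1 else 2 * S(n // 2, c) + n

def S_closed(n, c):
    """Closed form c*n + n*log2(n), for n a power of 2."""
    log_n = n.bit_length() - 1  # exact log base 2 when n is a power of 2
    return c * n + n * log_n

powers = [2 ** i for i in range(0, 15)]
print(all(S(n, c) == S_closed(n, c) for n in powers for c in (1, 3, 10)))
```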
In real applications, our input n might not be a power of 2, so our actual
recurrence might look like:
$S(1) = c$
$S(n) = 2S(\lceil n/2 \rceil) + n, \quad n \geq 2$
Footnote 1: See any algorithms text for more details.
We could extend the details of our analysis to handle the input values that
aren't powers of 2. In many practical contexts, however, we are only
interested in the overall shape of the function, e.g. is it roughly linear? cubic?
exponential? So it is often sufficient to note that S is increasing, so values
of S for inputs that aren't powers of 2 will lie between the values of S at the
adjacent powers of 2.
11.4 Hypercubes
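Non-numerical objects can also be defined recursively. For example, the
hypercube $Q_n$ is the graph of the corners and edges of an n-dimensional
cube. It is defined recursively as follows (for any $n \in \mathbb{N}$):
1. $Q_0$ is a single node with no edges.
2. $Q_n$ consists of two copies of $Q_{n-1}$ with edges joining corresponding nodes, for any $n \geq 1$.
That is, each node $v_i$ in one copy of $Q_{n-1}$ is joined by an edge to its clone
copy $v_i'$ in the second copy of $Q_{n-1}$. $Q_0$, $Q_1$, $Q_2$, and $Q_3$ look as follows. The
node labels distinguish the two copies of $Q_{n-1}$.
[Figure: the hypercubes $Q_0$, $Q_1$, $Q_2$, and $Q_3$, with nodes labeled A and B]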
The hypercube defines a binary coordinate system. To build this coordi-
nate system, we label nodes with binary numbers, where each binary digit
corresponds to the value of one coordinate. The edges connect nodes that
differ in exactly one coordinate.
[Figure: a hypercube whose nodes are labeled with 3-bit binary numbers such as 111, 101, 010, and 000]
$Q_n$ has $2^n$ nodes. To compute the number of edges, we set up the following
recursive definition for the number of edges E(n) in the $Q_n$:
1. $E(0) = 0$
2. $E(n) = 2E(n-1) + 2^{n-1}$, for all $n \geq 1$
The $2^{n-1}$ term is the number of nodes in each copy of $Q_{n-1}$, i.e. the number
of edges required to join corresponding nodes. We'll leave it as an exercise
to find a closed form for this recursive definition.
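The binary coordinate view gives a direct way to build the hypercube and count its edges, which can then be compared with a recursive count. The sketch below assumes the recurrence E(0) = 0, E(n) = 2E(n-1) + 2^(n-1), read off from the description of the $2^{n-1}$ term above, and it deliberately does not state a closed form.

```python
def hypercube_edges(n):
    """Edges of Q_n on n-bit labels: join two labels that differ in exactly one bit."""
    return {
        (u, u | (1 << b))
        for u in range(2 ** n)
        for b in range(n)
        if not u & (1 << b)  # list each edge once, from the end whose bit b is 0
    }

def E(n):
    """Assumed recurrence: two copies of Q_(n-1) plus one edge per pair of clones."""
    return 0 if n == 0 else 2 * E(n - 1) + 2 ** (n - 1)

for n in range(5):
    print(n, len(hypercube_edges(n)), E(n))
```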
Proof: by induction on n.
Base: F0 = 0, which is even.
Some people feel a bit uncertain if the base case is a special case like zero.
It's ok to also include a second base case. For this proof, you would check the
case for n = 1, i.e. verify that $F_3$ is even. The extra base case isn't necessary
for a complete proof, but it doesn't cause any harm and may help the reader.
$f(0) = 2$
$f(1) = 3$
$\forall n \geq 1,\ f(n+1) = 3f(n) - 2f(n-1)$
I claim that:
Claim 46 $\forall n \in \mathbb{N},\ f(n) = 2^n + 1$
Proof: by induction on n.
Base: f(0) is defined to be 2. $2^0 + 1 = 1 + 1 = 2$. So $f(n) = 2^n + 1$
when n = 0.
f(1) is defined to be 3. $2^1 + 1 = 2 + 1 = 3$. So $f(n) = 2^n + 1$
when n = 1.
Induction: Suppose that $f(n) = 2^n + 1$ for $n = 0, 1, \ldots, k$.
$f(k+1) = 3f(k) - 2f(k-1)$
By the induction hypothesis, $f(k) = 2^k + 1$ and $f(k-1) = 2^{k-1} + 1$.
Substituting these formulas into the previous equation, we get:
$f(k+1) = 3(2^k + 1) - 2(2^{k-1} + 1) = 3 \cdot 2^k + 3 - 2^k - 2 = 2 \cdot 2^k + 1 = 2^{k+1} + 1$
So $f(k+1) = 2^{k+1} + 1$, which is what we needed to show.
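The claim can also be spot-checked by computing f from its recursive definition. This sketch assumes the definition f(0) = 2, f(1) = 3, f(n+1) = 3f(n) - 2f(n-1), as used in the proof, and adds caching only to keep the two-way recursion fast.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    """f(0) = 2, f(1) = 3, and f(n) = 3*f(n-1) - 2*f(n-2) for n >= 2."""
    if n == 0:
        return 2
    if n == 1:
        return 3
    return 3 * f(n - 1) - 2 * f(n - 2)

print([f(n) for n in range(6)])
print(all(f(n) == 2 ** n + 1 for n in range(60)))
```

Two base cases appear in the code for the same reason they appear in the proof: the recursive formula reaches back two values.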
Trees
[Figure: a tree for the arithmetic expression a * c + b * d, and a parse tree whose internal nodes are labeled NP]
Here's what a medical decision tree might look like. Decision trees are
also used for engineering classification problems, such as transcribing speech
waveforms into the basic sounds of a natural language.
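[Figure: decision tree whose root asks "Cough?", with yes/no branches leading to the questions "Headache?" and "Fever?", each of which also has yes/no branches]
And here is a tree storing the set of numbers {-2, 8, 10, 32, 47, 108, 200, 327, 400}
[Figure: tree with root 32; 32 has children 8 and 108; 8 has children -2 and 10; 108 has children 47 and 327; 327 has children 200 and 400]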
If you can get from x to g by following one or more parent links, then
g is an ancestor of x and x is a descendent of g. We will treat x as an
ancestor/descendent of itself. The ancestors/descendents of x other than x
itself are its proper ancestors/descendents. If you pick some random node
a in a tree T, the subtree rooted at a consists of a (its root), all of a's
descendents, and all the edges linking these nodes.
Claim 47 A full m-ary tree with i internal nodes has mi + 1 nodes total.
To see why this is true, notice that there are two types of nodes: nodes
with a parent and nodes without a parent. A tree has exactly one node with
no parent. We can count the nodes with a parent by taking the number of
parents in the tree (i) and multiplying by the branching factor m.
Therefore, the number of leaves in a full m-ary tree with i internal nodes
is $(mi + 1) - i = (m - 1)i + 1$.
as follows: ((a*c)+b)*d. This tree uses two variables (E and V) and six
different terminals: a, b, c, d, +, *.
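[Figure: parse tree for a * c + b * d, with a V node above each of a, c, b, and d]
E → E + V
E → E * V
E → V + V
E → V * V
V → a
V → b
V → c
V → d
E → E + V | E * V | V + V | V * V
V → a | b | c | d
S → aSb
S → ε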
[Figure: four parse trees from this grammar, using the rule S → aSb zero, one, two, and three times]
The sequences of terminals for the above trees are (left to right): empty
sequence (no terminals at all), ab, aabb, aaabbb. The sequences from this
grammar always have a number of a's followed by the same number of b's.
Notice that the left-to-right order of nodes in the tree must match the
order in the grammar rules. So, for example, the following tree doesn't match
the above grammar.
[Figure: a tree whose root S has the children b, S, a, in that order]
N | N | NN
[Figure: several trees whose nodes are all labeled N]
$S(1) = c$
$S(n) = 2S(n/2) + n, \quad n \geq 2$ (n a power of 2)
We can draw a picture of this definition using a recursion tree. The top
node in the tree represents S(n) and contains everything in the formula for
S(n) except the recursive calls to S. The two nodes below it represent
two copies of the computation of S(n/2). Again, the value in each node
contains the non-recursive part of the formula for computing S(n/2). The
value of S(n) is then the sum of the values in all the nodes in the recursion
tree.
[Figure: recursion tree whose second level contains two nodes with value n/2]
How high is this tree, i.e. how many levels do we need to expand before
we hit the base case n = 1?
For each level of the tree, what is the sum of the values in all nodes at
that level?
In this example, the tree has height log n, i.e. there are log n non-leaf
levels. At each level of the tree, the node values sum to n. So the sum for all
non-leaf nodes is n log n. There are n leaf nodes, each of which contributes c
to the sum. So the sum of everything in the tree is n log n + cn, which is the
same closed form we found earlier for this recursive definition.
Recursion trees are particularly handy when you only need an approx-
imate description of the closed form, e.g. is its leading term quadratic or
cubic?
$P(1) = c$
$P(n) = 2P(n/2) + n^2, \quad n \geq 2$ (n a power of 2)
[Figure: recursion tree with root $n^2$ and two children $(n/2)^2$]
The height of the tree is again log n. The sum of all nodes at the top
level is $n^2$. The next level down sums to $n^2/2$. And then we have sums:
$n^2/4$, $n^2/8$, $n^2/16$, and so forth. So the sum of all nodes at level k is $n^2 \frac{1}{2^k}$.
The lowest non-leaf nodes are at level $\log n - 1$. So the sum of all the
non-leaf nodes in the tree is
$$\sum_{k=0}^{\log n - 1} n^2 \frac{1}{2^k} = n^2 \sum_{k=0}^{\log n - 1} \frac{1}{2^k} = n^2\left(2 - \frac{1}{2^{\log n - 1}}\right) = n^2\left(2 - \frac{2}{2^{\log n}}\right) = n^2\left(2 - \frac{2}{n}\right) = 2n^2 - 2n$$
Adding cn to cover the leaf nodes, our final closed form is $2n^2 + (c - 2)n$.
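The recursion-tree answer can be checked against the recursive definition on powers of two. The sketch assumes the definition P(1) = c and P(n) = 2P(n/2) + n^2, with c passed in as a parameter; the function names are illustrative.

```python
def P(n, c):
    """Recursive definition: P(1) = c, P(n) = 2*P(n/2) + n^2 for n a power of 2."""
    return c if n == 1 else 2 * P(n // 2, c) + n * n

def P_closed(n, c):
    """Closed form read off the recursion tree: 2n^2 + (c - 2)n."""
    return 2 * n * n + (c - 2) * n

powers = [2 ** i for i in range(0, 15)]
print(all(P(n, c) == P_closed(n, c) for n in powers for c in (1, 2, 7)))
```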
[Figure: a tree T whose root has two child subtrees X and Y]
In writing such a proof, it's tempting to think that if the full tree has
height h, the child subtrees must have height $h - 1$. This is only true if
the tree is complete. For a tree that isn't necessarily complete, one of the
subtrees must have height $h - 1$ but the other subtree(s) might be shorter
than $h - 1$. So, for induction on trees, it is almost always necessary to use a
strong inductive hypothesis.
In the inductive step, notice that we split up the big tree (T ) at its
root, producing two smaller subtrees (X) and (Y ). Some students try to do
induction on trees by grafting stuff onto the bottom of the tree. Do not
do this. There are many claims, especially in later classes, for which this
grafting approach will not work and it is essential to divide at the root.
[Figure: tree with the heap property: root 32 with children 19 and 12; 19 has children 18 and 8; 12 has children 9 and 1]
Notice that the values at one level aren't uniformly bigger than the values
at the next lower level. For example, 18 in the bottom level is larger than
12 on the middle level. But values never decrease as you move along a path
from a leaf up to the root.
Trees with the heap property are convenient for applications where you
have to maintain a list of people or tasks with associated priorities. It's easy
to retrieve the person or task with top priority: it lives in the root. And it's
easy to restore the heap property if you add or remove a person or task.
I claim that:
Claim 49 If a tree has the heap property, then the value in the root of the
tree is at least as large as the value in any node of the tree.
To keep the proof simple, let's restrict our attention to full binary trees:
Claim 50 If a full binary tree has the heap property, then the value in the
root of the tree is at least as large as the value in any node of the tree.
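Before proving the claim, it can help to test it on an example. The sketch below represents a tree as a pair (value, list of child trees), which is an assumption made for illustration, and uses a small example tree with root 32 whose shape is likewise assumed.

```python
def has_heap_property(tree):
    """Every node's value is at least as large as the values of its children."""
    value, children = tree
    return all(value >= child[0] and has_heap_property(child) for child in children)

def values(tree):
    """Yield the value stored at every node of the tree."""
    value, children = tree
    yield value
    for child in children:
        yield from values(child)

def leaf(v):
    return (v, [])

# Assumed example: root 32; children 19 and 12; grandchildren 18, 8 and 9, 1.
heap = (32, [(19, [leaf(18), leaf(8)]), (12, [leaf(9), leaf(1)])])
print(has_heap_property(heap), heap[0] == max(values(heap)))
```

The check is recursive in the same way the proof is: it splits the tree at its root and handles each subtree separately.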
Let's let v(a) be the value at node a and let's use the recursive structure
of trees to do our proof.
S → ab
S → SS
S → aSb
Big-O
This chapter covers asymptotic analysis of function growth and big-O nota-
tion.
are each the product of n terms. For $2^n$, they are all 2. For n! they are the
first n integers, and all but the first two of these are bigger than 2. Although
we only proved this inequality for integer inputs, you're probably prepared
to believe that it also holds for all real inputs $\geq 4$.
In a similar way, you can use induction to show that $n^2 < 2^n$ for any
integer $n \geq 5$. And, in general, for any exponent k, you can show that $n^k < 2^n$
for any n above some suitable lower bound. And, again, the intermediate
real input values follow the same pattern. You're probably familiar with how
$1 \prec n \prec n^2 \prec n^3 \prec \ldots \prec 2^n \prec n!$
I've used curly $\prec$ because this ordering isn't standard algebraic $\leq$. The
ordering only works when n is large enough.
For the purpose of designing computer programs, only the first three of
these running times are actually good news. Third-order polynomials already
grow too fast for most applications, if you expect inputs of non-trivial size.
Exponential algorithms are only worth running on extremely tiny inputs, and
are frequently replaced by faster algorithms (e.g. using statistical sampling)
that return approximate results.
Now, let's look at slow-growing functions, i.e. functions that might be
the running times of efficient programs. We'll see that algorithms for finding
entries in large datasets often have running times proportional to log n. If
you draw the log function and ignore its strange values for inputs smaller
than 1, you'll see that it grows, but much more slowly than n.
Algorithms for sorting a list of numbers have running times that grow
like n log n. If n is large enough, $1 < \log n < n$. So $n < n \log n < n^2$. We can
summarize these relationships as:
$1 \prec \log n \prec n \prec n \log n \prec n^2$
It's well worth memorizing the relative orderings of these basic functions,
since you'll see them again and again in this and future CS classes.
The constant $c$ in the equation models the fact that we don't care about
multiplicative constants in comparing functions. The restriction that the
equation only holds for $x \geq k$ models the fact that we don't care about the
behavior of the functions on small input values.
So, for example, $3x^2$ is $O(2^x)$. $3x^2$ is also $O(x^2)$. But $3x^2$ is not $O(x)$. So
the big-O relationship includes the possibility that the functions grow at the
same rate.
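As a quick numerical sanity check of these three examples, the sketch below tests whether a proposed pair of constants $c$ and $k$ works on a sample of inputs. The function name `holds_on_sample` and the sample range are assumptions made for illustration. Passing the check is only evidence, not a proof, but a single failure does rule out that particular $c$ and $k$.

```python
def holds_on_sample(f, g, c, k, upto=200):
    """Check f(x) <= c * g(x) for every integer x with k <= x <= upto."""
    return all(f(x) <= c * g(x) for x in range(k, upto + 1))

f = lambda x: 3 * x**2

# 3x^2 is O(x^2): c = 3 and k = 1 work.
print(holds_on_sample(f, lambda x: x**2, c=3, k=1))   # True

# 3x^2 is O(2^x): c = 3 works once x >= 4, since x^2 <= 2^x from there on.
print(holds_on_sample(f, lambda x: 2**x, c=3, k=4))   # True

# 3x^2 is not O(x): any fixed c fails as soon as x > c / 3.
print(holds_on_sample(f, lambda x: x, c=100, k=1))    # False
```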
When $g(x)$ is $O(f(x))$ and $f(x)$ is $O(g(x))$, then $f(x)$ and $g(x)$ must grow
at the same rate. In this case, we say that $f(x)$ is $\Theta(g(x))$ (and also $g(x)$ is
$\Theta(f(x))$).
Big-O is a partial order on the set of all functions from the reals to
the reals. The $\Theta$ relationship is an equivalence relation on this same set of
functions. So, for example, under the $\Theta$ relation, the equivalence class $[x^2]$
contains functions such as $x^2$, $57x^2 - 301$, $2x^2 + x + 2$, and so forth.
Notice that the steps of this proof are in the opposite order from the work
we used to find values for c and k. This is standard for big-O proofs. Count
on writing them in two drafts (e.g. the first on scratch paper).
Here's another example of a big-O proof:
Outside theory classes, computer scientists often say that $f(x)$ is $O(g(x))$
when they actually mean the (stronger) statement that $f(x)$ is $\Theta(g(x))$. Or
(this drives theoreticians nuts) they will say that $g(x)$ is a "tight" big-O
bound on $f(x)$. In this class, we'll stick to the proper theory notation, so
that you can learn how to use it. That is, use $\Theta$ when you mean to say that
two functions grow at the same rate or when you mean to give a tight bound.
Very, very annoyingly, for historical reasons, the statement $f(x)$ is $O(g(x))$
is often written as $f(x) = O(g(x))$. This looks like a sort of equality, but it
isn't. It is actually expressing an inequality. This is badly-designed notation
but, sadly, common.
Chapter 14
Algorithms
14.1 Introduction
The techniques we've developed earlier in this course can be applied to analyze
how much time a computer algorithm requires, as a function of the size
of its input(s). We will see a range of simple algorithms illustrating a variety
of running times. Three methods will be used to analyze the running times:
nested loops, resource consumption, and recursive definitions.
We will figure out only the big-O running time for each algorithm, i.e.
ignoring multiplicative constants and behavior on small inputs. This will
allow us to examine the overall design of the algorithms without excessive
complexity. Being able to cut corners so as to get a quick overview is a
critical skill when you encounter more complex algorithms in later computer
science classes.
1, and you often need to examine the last subscript to find out the length.
Sequences can be stored using either an array or a linked list. The choice
sometimes affects the algorithm analysis, because these two implementation
methods have slightly different features.
An array provides constant-time access to any element. So you can
quickly access elements in any order you choose and the access time does
not depend on the length of the list. However, the length of an array is fixed
when the array is built. Changing the array length takes time proportional
to the length of the array, i.e. O(n). Adding or deleting objects in the middle
of the array requires pushing other objects sideways. This can also take O(n)
time. Two-dimensional arrays are similar, except that you need to supply
two subscripts, e.g. $a_{x,y}$.
In a linked list, each object points to the next object in the list. An
algorithm has direct access only to the elements at the ends of the list.1
Objects in the middle of the list can only be accessed by walking element-
by-element from one end, which can take O(n) time. However, the length
of the list is flexible and objects can be added to, or removed from, the ends of
the list in constant time. Once you are at a position in the middle of a list,
objects can be added or deleted at that position in constant time.
A linked list starts with its head and ends with its tail. For example,
suppose our list is L = (1, 7, 3, 4, 7, 19). Then head(L) is 1 and tail(L) is 19.
The function pop removes and returns the value at the head of a list. I.e.
pop(L) will return 1 and leave the list L containing (7, 3, 4, 7, 19).
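Here's a rough Python illustration of head, tail, and pop. It uses the standard library's `collections.deque` as a stand-in for a linked list, which is an assumption on my part: the text doesn't prescribe an implementation. A deque supports constant-time insertion and removal at both ends.

```python
from collections import deque

L = deque([1, 7, 3, 4, 7, 19])

head = L[0]        # value at the head of the list
tail = L[-1]       # value at the tail of the list
print(head, tail)  # 1 19

popped = L.popleft()  # remove and return the head, in constant time
print(popped)         # 1
print(list(L))        # [7, 3, 4, 7, 19]
```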
For some algorithms, the big-O performance does not depend on whether
arrays or linked lists are used. This happens when the number of objects is
fixed and the objects are accessed in sequential order. However, remember
that a big-O analysis ignores multiplicative constants. All other things being
equal, array-based implementations tend to have smaller constants and
therefore run faster.
When code contains a while loop, rather than a for loop, it can be less
obvious how many times the loop will run. For example, suppose we have two
sorted lists, $a_1, \ldots, a_p$ and $b_1, \ldots, b_q$. We can merge them very efficiently
into a combined sorted list. To do this, we make a new third empty list
to contain our merged output. Then we examine the first elements of the
two input lists and move the smaller value onto our output list. We keep
looking at the first elements of both lists until one list is empty. We then
copy over the rest of the non-empty list. Figure 14.2 shows pseudocode for
this algorithm.
For merge, a good measure of the size of the input is the length of the
output array n, which is equal to the sum of the lengths of the two input
arrays ($p + q$). The merge function has one big while loop. Since the operations
within the loop (lines 4-10) all take constant time, we just need to
figure out how many times the loop runs, as a function of n.
A good way to analyze while loops is to track a resource that has a known
size and is consumed as the loop runs. In this case, each time the loop runs,
we move one number from L1 or L2 onto the output list O. We have n
numbers to move, after which the loop halts. So the loop must run n times.
So merge takes O(n) (aka linear) time. This is called a resource consumption
analysis.
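The resource consumption argument can be watched in action with a short Python sketch. This is my own version of merge, not the pseudocode from Figure 14.2: it is written so that every pass through the loop moves exactly one element to the output, and it counts the passes. The name `iterations` and the use of `collections.deque` for the input lists are assumptions made for illustration.

```python
from collections import deque

def merge(l1, l2):
    """Merge two sorted sequences; also return how many times the loop ran."""
    l1, l2 = deque(l1), deque(l2)
    out = []
    iterations = 0
    while l1 or l2:                  # runs until both inputs are empty
        iterations += 1
        if not l1:
            out.append(l2.popleft())
        elif not l2:
            out.append(l1.popleft())
        elif l1[0] <= l2[0]:
            out.append(l1.popleft())
        else:
            out.append(l2.popleft())
    return out, iterations

merged, runs = merge([1, 4, 9], [2, 3, 10, 12])
print(merged)  # [1, 2, 3, 4, 9, 10, 12]
print(runs)    # 7, the total number of input elements
```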
the list M more than once. So the loop in lines 6-10 runs no more than n
times.
Line 8 starts at a node $q$ and finds all its neighbors $p$. So it traces all the
edges involving $q$. During the whole run of the code, a graph edge might get
traced twice, once in each direction. There are $m$ edges in the graph. So
lines 9-10 cannot run more than $2m$ times.
In total, this algorithm needs $O(n + m)$ time. This is an interesting case
because neither of the two terms $n$ or $m$ dominates the other. It is true
that the number of edges $m$ is $O(n^2)$ and thus the connected component
algorithm is $O(n^2)$. However, in most applications, relatively few of these
potential edges are actually present. So the $O(n + m)$ bound is more helpful.
Notice that there is a wide variety of graphs with $n$ nodes and $m$ edges.
Our analysis was based on the kind of graph that would cause the algorithm
to run for the longest time, i.e. a graph in which the algorithm reaches every
node and traverses every edge, reaching $t$ last. This is called a worst-case
analysis. On some input graphs, our code might run much more quickly, e.g.
if we encounter $t$ early in the search or if much of the graph is not connected to
$s$. Unless the author explicitly indicates otherwise, big-O algorithm analyses
are normally understood to be worst-case.
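To make the two counts concrete, here is a Python sketch of a breadth-first search from $s$ that looks for $t$ and reports how many nodes it processed and how many edge traces it made. It is a sketch under my own assumptions, not the book's pseudocode: the names `reachable`, `marked`, and `pending`, and the small example graph, are all hypothetical.

```python
from collections import deque

def reachable(adj, s, t):
    """Breadth-first search from s; returns (found_t, nodes_visited, edge_traces)."""
    marked = {s}
    pending = deque([s])
    nodes_visited = 0
    edge_traces = 0
    while pending:
        q = pending.popleft()
        nodes_visited += 1
        if q == t:
            return True, nodes_visited, edge_traces
        for p in adj[q]:             # trace every edge leaving q
            edge_traces += 1
            if p not in marked:      # each node is marked at most once
                marked.add(p)
                pending.append(p)
    return False, nodes_visited, edge_traces

# Hypothetical undirected graph with n = 5 nodes and m = 3 edges.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b"],
    "e": [],
}
print(reachable(adj, "a", "e"))  # (False, 4, 6)
```

Here the search never finds e, so it processes the whole component of a: 4 nodes (at most n) and 6 edge traces (exactly 2m).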
Figure 14.4: Binary search for $\sqrt{n}$
This isn't the fastest way to find a square root,[3] but this simple method
generalizes well to situations in which we have only a weak model of the
function we are trying to optimize. This method requires only that you can
test whether a candidate value is too low or too high. Suppose, for example,
that you are tuning a guitar string. Many amateur players can tell if the
current tuning is too low or too high, but have a poor model of how far to
turn the tuning knob. Binary search would be a good strategy for optimizing
the tuning.
To analyze how long binary search takes, first notice that the start-up and
clean-up work in the main squareroot function takes only constant time. So
we can basically ignore its contribution. The function squarerootrec makes
one recursive call to itself and otherwise does a constant amount of work.
The base case requires only a constant amount of work. So if the running
time of squarerootrec is $T(n)$, we can write the following recursive definition
for $T(n)$, where $c$ and $d$ are constants.
[3] A standard faster approach is Newton's method.
$T(1) = c$
$T(n) = T(n/2) + d$
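Here's a Python sketch of a binary search for the integer part of a square root, written recursively so that it has the shape of this recurrence: one recursive call on half the range plus a constant amount of other work. It is my own sketch and may differ in detail from the pseudocode in Figure 14.4. The names `squareroot_rec` and `calls` are assumptions.

```python
def squareroot_rec(n, low, high, calls=1):
    """Largest integer r in [low, high] with r*r <= n, plus the number of calls made.

    Assumes low*low <= n, so an answer always exists in the range.
    """
    if low == high:                   # base case: constant work
        return low, calls
    mid = (low + high + 1) // 2       # round up so the range always shrinks
    if mid * mid <= n:                # mid is not too high: keep the upper half
        return squareroot_rec(n, mid, high, calls + 1)
    return squareroot_rec(n, low, mid - 1, calls + 1)   # mid is too high

def squareroot(n):
    """Integer part of the square root of a positive integer n, with the call count."""
    return squareroot_rec(n, 1, n)

print(squareroot(17))          # (4, 5)
print(squareroot(10**6)[0])    # 1000
```

The range of candidates at least halves on every call, so the number of calls grows like $\log n$.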
14.7 Mergesort
Mergesort takes an input linked list of numbers and returns a new linked list
containing the sorted numbers. This is somewhat different from bubble and
insertion sort, which rearrange the values within a single array (and don't
return anything).
Mergesort divides its big input list (length n) into two smaller lists of
length n/2. Lists are divided up repeatedly until we have a large number of
very short lists, of length 1 or 2 (depending on the preferences of the code
writer). A length-1 list is necessarily sorted. A length 2 list can be sorted
in constant time. Then, we take all these small sorted lists and merge them
together in pairs, gradually building up longer and longer sorted lists until we
have one sorted list containing all of our original input numbers. Figure 14.5
shows the resulting pseudocode.
does $O(n)$ work merging the two results. So if the running time of mergesort
is $T(n)$, we can write the following recursive definition for $T(n)$, where $c$ and
$d$ are constants.
$T(1) = c$
$T(n) = 2T(n/2) + dn$
[Figure: recursion tree for mergesort. The root is labeled $dn$ and its two children are each labeled $dn/2$.]
The tree has O(log n) non-leaf levels and the work at each level sums
up to dn. So the work from the non-leaf nodes sums up to O(n log n). In
addition, there are n leaf nodes (aka base cases for the recursive function),
each of which involves c work. So the total running time is O(n log n) + cn
which is just O(n log n).
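For reference, here is a compact Python sketch of mergesort with the same structure as the recurrence: two recursive calls on halves of the input, followed by a linear-time merge. It uses Python lists rather than linked lists and a base case of length at most 1, both of which are my assumptions rather than details of Figure 14.5.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])     # copy over the rest of the non-empty list
    out.extend(right[j:])
    return out

def mergesort(values):
    """Return a new sorted list; the input is left unchanged."""
    if len(values) <= 1:     # base case: already sorted
        return list(values)
    middle = len(values) // 2
    return merge(mergesort(values[:middle]), mergesort(values[middle:]))

print(mergesort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```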
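$T(1) = c$
$T(n) = 2T(n-1) + d$

\[
\begin{aligned}
T(n) &= 2T(n-1) + d \\
&= 2(2T(n-2) + d) + d \\
&= 2(2(2T(n-3) + d) + d) + d \\
&= 2^3 T(n-3) + 2^2 d + 2d + d \\
&= 2^k T(n-k) + d \sum_{i=0}^{k-1} 2^i
\end{aligned}
\]

The base case is reached when $n - k = 1$, i.e. $k = n - 1$. Substituting this value of $k$:

\[
\begin{aligned}
T(n) &= 2^k T(n-k) + d \sum_{i=0}^{k-1} 2^i \\
&= 2^{n-1} c + d \sum_{i=0}^{n-2} 2^i \\
&= 2^{n-1} c + d(2^{n-1} - 1) \\
&= 2^{n-1} c + 2^{n-1} d - d \\
&= O(2^n)
\end{aligned}
\]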
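It's easy to slip on an index when unrolling, so a quick check is worthwhile. The Python sketch below evaluates the recursive definition directly and compares it with the closed form $2^{n-1}c + (2^{n-1} - 1)d$. The particular values chosen for $c$ and $d$ are arbitrary assumptions.

```python
def T(n, c=3, d=5):
    """Evaluate T(1) = c, T(n) = 2*T(n-1) + d directly from the definition."""
    return c if n == 1 else 2 * T(n - 1, c, d) + d

def closed_form(n, c=3, d=5):
    """The closed form obtained by unrolling: 2^(n-1)*c + (2^(n-1) - 1)*d."""
    return 2 ** (n - 1) * c + (2 ** (n - 1) - 1) * d

for n in range(1, 11):
    assert T(n) == closed_form(n)
print(T(10))  # 4091
```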
suppose that our input numbers are $x$ and $y$ and they each have $2m$ digits.
We can then divide them up as
$x = x_1 2^m + x_0$
$y = y_1 2^m + y_0$
$xy = A 2^{2m} + B 2^m + C$
$T(1) = c$
$B = (x_1 + x_0)(y_1 + y_0) - A - C$
This means we can compute $B$ with only one multiplication rather than two.
So, if we use this formula for $B$, the running time of multiplication has the
recursive definition
$P(1) = c$
It's not obvious that we've gained anything substantial, but we have. If
we build a recursion tree for $P$, we discover that the $k$th level of the tree
contains $3^k$ problems, each involving $n \frac{1}{2^k}$ work. The tree height is $\log_2(n)$.
So each level requires $n(\frac{3}{2})^k$ work. If you add this up for all levels, you
get a summation that is dominated by the term for the bottom level of the
tree: $n(\frac{3}{2})^{\log_2 n}$ work. If you mess with this expression a bit, using facts about
logarithms, you find that it's $O(n^{\log_2 3})$ which is approximately $O(n^{1.585})$.
So this trick, due to Anatolii Karatsuba, has improved our algorithm's
speed from $O(n^2)$ to $O(n^{1.585})$ with essentially no change in the constants.
If $n = 2^{10} = 1024$, then the naive algorithm requires $(2^{10})^2$ = 1,048,576
multiplications, whereas Karatsuba's method requires $3^{10}$ = 59,049 multiplications.
So this is a noticeable improvement and the difference will widen as
$n$ increases.
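Here is a minimal Python sketch of the three-multiplication trick, working on binary digits. It assumes $A = x_1 y_1$ and $C = x_0 y_0$, which is what the decomposition of $xy$ requires. The way it picks the split point $m$ (half of the longer operand's bit length) and its base case (an operand equal to 0 or 1) are my own choices. It is meant to show the structure of the recursion, not to compete with Python's built-in multiplication.

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three recursive products per level."""
    if x < 2 or y < 2:                  # base case: an operand is 0 or 1
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    x1, x0 = x >> m, x & mask           # x = x1 * 2^m + x0
    y1, y0 = y >> m, y & mask           # y = y1 * 2^m + y0
    A = karatsuba(x1, y1)
    C = karatsuba(x0, y0)
    B = karatsuba(x1 + x0, y1 + y0) - A - C   # one multiplication instead of two
    return (A << (2 * m)) + (B << m) + C

print(karatsuba(1234, 5678) == 1234 * 5678)  # True
```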
There are actually other integer multiplication algorithms with even faster
running times, e.g. Schönhage-Strassen's method takes $O(n \log n \log \log n)$
time. But these methods are more involved.
Chapter 15
Sets of Sets
So far, most of our sets have contained atomic elements (such as numbers or
strings) or tuples (e.g. pairs of numbers). Sets can also contain other sets.
For example, $\{\mathbb{Z}, \mathbb{Q}\}$ is a set containing two infinite sets. $\{\{a, b\}, \{c\}\}$ is a
set containing two finite sets. In this chapter, we'll see a variety of examples
involving sets that contain other sets.
Chen both play the oboe). The set of these groups might look like:
Or we could try to list all ways that we could choose a 3-person committee
from this set of students, which would be a rather large set containing
elements such as {Ian, Emily, Jose} and {Ian, Emily, Michelle}.
When a set like $B$ is the domain of a function, the function maps an
entire subset to an output value. For example, suppose we have a function
$f : B \to \{\text{dorms}\}$. Then $f$ would map each set of students to a dorm. E.g.
$f(\{\text{Michelle}, \text{Emily}\}) = \text{Babcock}$.
The value of a function on a subset can depend in various ways on whatever
is in the subset. For example, suppose that we have a set
$\mathbb{P}(A) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$
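If you want to generate a power set like this one mechanically, here's a short Python sketch. It represents each subset as a `frozenset`, because Python's ordinary sets can't be elements of other sets. The function name `power_set` is an assumption.

```python
from itertools import combinations

def power_set(elements):
    """Return the set of all subsets of `elements`, each as a frozenset."""
    items = list(elements)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }

P = power_set({1, 2, 3})
print(len(P))                   # 8 subsets, i.e. 2**3
print(frozenset() in P)         # True: the empty set is a subset
print(frozenset({1, 3}) in P)   # True
```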
[Figure: a graph with nodes a, b, c, d, e, f, g, h.]
Now, let's define the function $N$ so that it takes a node as input and
returns the neighbors of that node. A node might have one neighbor, but
it could have several, and it might have no neighbors. So the outputs of
$N$ can't be individual nodes. They must be sets of nodes. For example,
$N(a) = \{b, c, e\}$ and $N(f) = \emptyset$. It's important to be consistent about the
output type of $N$: it always returns a set. So $N(g) = \{h\}$, not $N(g) = h$.
Formally, the domain of $N$ is $V$ and the co-domain is $\mathbb{P}(V)$. So the type
signature of $N$ would be $N : V \to \mathbb{P}(V)$.
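The same discipline about output types carries over to code. In the Python sketch below, `neighbors` always returns a set, even when there is one neighbor or none. The edge list is hypothetical: I've chosen it only so that it agrees with the values of $N(a)$, $N(f)$ and $N(g)$ given above.

```python
def neighbors(edges, node):
    """Return the set of nodes that share an edge with `node` (always a set)."""
    result = set()
    for u, v in edges:
        if u == node:
            result.add(v)
        elif v == node:
            result.add(u)
    return result

# Hypothetical edge list, chosen to agree with N(a), N(f) and N(g) in the text.
edges = [("a", "b"), ("a", "c"), ("a", "e"), ("b", "d"), ("c", "d"), ("g", "h")]
print(neighbors(edges, "a") == {"b", "c", "e"})  # True
print(neighbors(edges, "f"))                     # set()
print(neighbors(edges, "g"))                     # {'h'}
```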
Suppose we have the two graphs shown below, with sets of nodes $X =
\{a, b, c, d, e\}$ and $Y = \{1, 2, 3, 4, 5\}$. And suppose that we're trying to find
all the possible isomorphisms between the two graphs. We might want a
function $f$ that retrieves likely corresponding nodes. For example, if $p$ is a
node in $X$, then $f(p)$ might be the set of nodes in $Y$ with the same degree
as $p$.
[Figure: two graphs, one with nodes a, b, c, d, e and one with nodes 1, 2, 3, 4, 5.]
$f$ can't return a single node, because there might be more than one node
in $Y$ with the same degree. Or, if the two graphs aren't isomorphic, no
nodes in $Y$ with the same degree. So we'll have $f$ return a set of nodes. For
example, $f(e) = \{1, 5\}$ and $f(a) = \{2\}$. The co-domain of $f$ will need to be
$\mathbb{P}(Y)$. So, to declare $f$, we'd write $f : X \to \mathbb{P}(Y)$.
15.3 Partitions
When we divide a base set A into non-overlapping subsets which include
every element of A, the result is called a partition of A. For example,
suppose that A is the set of nodes in the following graph. The partition
{{a, b, c, d}, {e, f, g}, {h, i, j, k}} groups nodes into the same subset if they
belong to the same connected component.
[Figure: a graph with nodes a through k, forming three connected components.]
We could also write this partition as {[0], [1], [2], [3]} since each equivalence
class is a set of numbers.
Collections of subsets don't always form partitions. For example, consider
the following graph $G$.
[Figure: the graph $G$, with nodes a, b, c, d, e, f, g.]
Suppose we collect sets of nodes in $G$ that form a cycle. We'll get the
following set of subsets. This isn't a partition because some of the subsets
overlap.
1. covers all of $A$: $A_1 \cup A_2 \cup \ldots \cup A_n = A$
1. covers all of $A$: $\bigcup_{X \in P} X = A$
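The requirements for a partition are easy to check by machine. The Python sketch below tests three conditions: the subsets cover all of $A$, they don't overlap, and none of them is empty. The last condition is the usual convention and is an assumption here, since only the covering condition survives in the text above. The function name `is_partition` is also an assumption.

```python
def is_partition(subsets, base):
    """Check that `subsets` is a partition of the set `base`."""
    subsets = [set(s) for s in subsets]
    if any(len(s) == 0 for s in subsets):    # assumed convention: no empty subsets
        return False
    covered = set().union(*subsets)
    if covered != set(base):                 # must cover all of A, and nothing else
        return False
    # Non-overlapping exactly when the sizes add up to the size of the union.
    return sum(len(s) for s in subsets) == len(covered)

A = set("abcdefghijk")
print(is_partition([set("abcd"), set("efg"), set("hijk")], A))  # True
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))                # False: overlap
print(is_partition([{1}, {2}], {1, 2, 3}))                      # False: 3 not covered
```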
15.4 Combinations
In many applications, we have an n-element set and need to count all subsets
of a particular size k. A subset of size k is called a k-combination. Notice
the difference between a permutation and a combination: we care about the
order of elements in a permutation but not in a combination.
For example, how many ways can I select a 7-card hand from a 60-card
deck of Magic cards (assuming no two cards are identical)?[1]
One way to analyze this problem is to figure out how many ways we
can select an ordered list of 7 cards, which is $P(60, 7)$. This over-counts the
number of possibilities, so we have to divide by the number of different orders
in which the same 7 cards might appear. That's just $7!$. So our total number
of hands is $\frac{P(60,7)}{7!}$. This is $\frac{60 \cdot 59 \cdot 58 \cdot 57 \cdot 56 \cdot 55 \cdot 54}{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2}$. Probably not worth simplifying or
multiplying this out unless you really have to. (Get a computer to do it.)
[1] Ok, ok, for those of you who actually play Magic, decks do tend to contain identical
land cards. But maybe we are using lots of special lands or perhaps we'll treat cards with
different artwork as different.
In general, suppose that we have a set $S$ with $n$ elements and we want to
choose an unordered subset of $k$ elements. We have $\frac{n!}{(n-k)!}$ ways to choose $k$
elements in some particular order. Since there are $k!$ ways to put each subset
into an order, we need to divide by $k!$ so that we will only count each subset
once. So the general formula for the number of possible subsets is $\frac{n!}{k!(n-k)!}$.
The expression $\frac{n!}{k!(n-k)!}$ is often written $C(n, k)$ or $\binom{n}{k}$. This is pronounced
"n choose k." It is also sometimes called a binomial coefficient, for reasons
that will become obvious shortly. So the shorthand answer to our question
about Magic cards would be $\binom{60}{7}$.
Notice that $\binom{n}{k}$ is only defined when $n \geq k \geq 0$. What is $\binom{0}{0}$? This is
$\frac{0!}{0!0!} = \frac{1}{1 \cdot 1} = 1$.
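If you do want a computer to do the arithmetic for the Magic hand example, here's a short Python sketch. It uses `math.perm` and `math.comb` from the standard library, which need Python 3.8 or later.

```python
import math

ordered = math.perm(60, 7)             # P(60, 7): ordered lists of 7 cards
hands = ordered // math.factorial(7)   # divide out the 7! orderings of each hand
print(hands)                           # 386206920
print(hands == math.comb(60, 7))       # True
print(math.comb(0, 0))                 # 1
```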
$\binom{10}{3} 25^7 + \binom{10}{2} 25^8 + \binom{10}{1} 25^9 + 25^{10}$
TT#O#MMM
T T T ## M M M
But this picture is redundant, since the items before the first separator
are always thymes, the ones between the separators are oreganos, and the
last group are mints. So we can simplify the diagram by using a star for each
object and remembering their types implicitly. Then 2 thyme, 1 oregano,
and 3 mint looks like
** # * # ***
*** ## ***
(Pascal's identity) $\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}$
This is not hard to prove from the definition of $\binom{n}{k}$. To remember it, suppose
that $S$ is a set with $n + 1$ elements. The lefthand side of the equation is the
number of $k$-element subsets of $S$.
Now, fix some element $a$ in $S$. There are two kinds of $k$-element subsets:
(1) those that don't contain $a$ and (2) those that do contain $a$. The first term
on the righthand side counts the subsets in group (1): all $k$-element subsets
of $S - \{a\}$. The second term on the righthand side counts the $(k-1)$-element
subsets of $S - \{a\}$. We then add $a$ to each of these to get the subsets in
group (2).
If we have Pascal's identity, we can give a recursive definition for the
binomial coefficients, for all natural numbers $n$ and $k$ with $k \leq n$.
Induction: $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$, whenever $0 < k < n$
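This recursive definition translates almost directly into code. In the Python sketch below, the base cases ($\binom{n}{0} = \binom{n}{n} = 1$) are the standard ones and are an assumption on my part, since they aren't shown above. The `lru_cache` decorator remembers results that have already been computed, so each pair $(n, k)$ is only worked out once.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def choose(n, k):
    """Binomial coefficient for natural numbers n and k with k <= n."""
    if k == 0 or k == n:      # assumed base cases: pick nothing, or pick everything
        return 1
    return choose(n - 1, k - 1) + choose(n - 1, k)   # Pascal's identity

print(choose(5, 2))    # 10
print(choose(60, 7))   # 386206920
```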
When we collect terms, the coefficient for each term will be the size of this
set of equivalent terms. E.g. the coefficient for $x^2 y^3$ is 10, because $[xxyyy]$
contains 10 elements. To find the coefficient for $x^{n-k} y^k$, we need to count
how many ways we can make a sequence of $n$ variable names that contains
$k$ $y$'s and $n - k$ $x$'s. This amounts to picking a subset of $k$ elements from a
set of $n$ positions in the sequence. In other words, there are $\binom{n}{k}$ such terms.
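For small $n$, you can confirm this count by brute force. The Python sketch below lists every length-$n$ sequence of the variable names x and y and counts the ones with exactly $k$ y's. The function name `count_terms` is an assumption.

```python
from itertools import product

def count_terms(n, k):
    """Count length-n sequences of 'x' and 'y' that contain exactly k copies of 'y'."""
    return sum(1 for seq in product("xy", repeat=n) if seq.count("y") == k)

print(count_terms(5, 3))   # 10, the coefficient of x^2 y^3 in (x + y)^5
```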
Chapter 16
State Diagrams
In this chapter, we'll see state diagrams, an example of a different way to use
directed graphs.
16.1 Introduction
State diagrams are a type of directed graph, in which the graph nodes represent
states and labels on the graph edges represent actions. For example,
here is a state diagram representing the life cycle of a chicken:
[Figure: state diagram with states egg, chick, chicken, and omelet, and edges labeled hatch, grow, lay, and cook.]
The label on the edge from state A to state B indicates what action
happens as the system moves from state A to state B. In many applications,
all the transitions involve one basic type of action, such as reading a character
or going through a doorway. In that case, the diagram might simply indicate
the details of this action. For example, the following diagram for a multi-room
computer game shows only the direction of motion on each edge.
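[Figure: state diagram of the rooms in the game (including the dining room and the cellar), with each edge labeled by a direction of motion.]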
Walks (and therefore paths and cycles) in state diagrams must follow the
arrow directions. So, for example, there is no path from the ferry to the
study. Second, an action can result in no change of state, e.g. attempting
to go east from the cellar. Finally, two different actions may get you to the
same new state, e.g. going either west or north from the hall gets you to the
dining room.
Remember that the full specification of a walk in a graph contains both
a sequence of nodes and a sequence of edges. For state diagrams, these
correspond to a sequence of states and a sequence of actions. It's often
important to include both sequences, both to avoid ambiguity and because
the states and actions may be important to the end user. For example, for
one walk from the hall to the barn, the full state and action sequences look
like:
[Figure: state diagram for the puzzle, with states including wgcb, wc, wcb, c, cgb, g, gb, w, and wgb.]
In this diagram, actions aren't marked on the edges: you're left to infer
the action from the change in state. The start state (wgcb) where the system
begins is marked by showing an arrow leading into it. The end state ($\emptyset$) where
nothing is left on the east bank is marked with a double ring.
In this diagram, it is possible for a (directed) walk to loop back on itself,
repeating states. So there are an infinite number of solutions to this puzzle.
[Figure: two phone lattices, the first on states 1 through 7 and the second on states 1, 8, 9, and 10.]
We can combine these two phone lattices into one large diagram, repre-
senting the union of these two sets of words:
[1] This is, of course, totally wrong. But the essential ideas stay the same when you
switch to a phonetic spelling of English.
[Figure: the combined phone lattice, on states 1 through 10.]
Notice that there are two edges leading from state 1, both marked with
the phone c. This indicates that the user (e.g. a speech understanding
program) has two options for how to handle a c on the input stream. If
this was inconvenient for our application, it could be eliminated by merging
states 2 and 8.
Many state diagrams are passive representations of a set of possibilities.
For example, a room layout for a computer game merely shows what walks
are possible; the player makes the decisions about which walk to take. Phone
lattices are often used in a more active manner, as a very simple sort of
computer called a finite automaton. The automaton reads an input sequence
of characters, following the edges in the phone lattice. At the end of the
input, it reports whether it has or hasn't successfully reached an end state.
One better approach is to build a 1D array of states. The cell for each
state contains a list of actions possible from that state, together with the
new states for each action. For example, in our final phone lattice, the
entry for state 1 would be ((c, (2, 8))) and the entry for state 3 would be
((o, (5)), (i, (5)), (a, (4))). This adjacency list style of storage is much more
compact, because we are no longer wasting space representing the large number
of impossible actions.
Another approach is to build a function, called a hash function that maps
each state/action pair to a small integer. We then allocate a 1D array with
one position for each state/action pair. Each array cell then contains a list
of new states for this state/action pair. The details of hash functions are
beyond the scope of this class. However, modern programming languages
often include built-in hash table or dictionary objects that handle the details
for you.
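Here's what the dictionary approach might look like in Python, using the built-in `dict` type keyed by state/action pairs. The entries for states 1 and 3 are the ones given above. The remaining transitions and the choice of end states are assumptions I've added to make a complete, runnable example. The function `accepts` plays the role of the finite automaton: because state 1 has two edges marked c, it keeps track of the whole set of states it might currently be in.

```python
# Transitions stored in a dictionary keyed by (state, action) pairs.
# The entries for states 1 and 3 follow the text; the remaining entries and
# the set of end states are assumptions added to make a complete example.
TRANSITIONS = {
    (1, "c"): {2, 8},
    (2, "h"): {3},
    (3, "o"): {5},
    (3, "i"): {5},
    (3, "a"): {4},
    (4, "t"): {6},
    (5, "p"): {7},
    (8, "o"): {9},
    (9, "b"): {10},
}
END_STATES = {6, 7, 10}

def accepts(word, start=1):
    """Follow every possible edge for each character; report whether an end state is reached."""
    current = {start}
    for ch in word:
        current = {nxt for state in current
                   for nxt in TRANSITIONS.get((state, ch), set())}
    return bool(current & END_STATES)

for w in ["chip", "chop", "chat", "cob", "cop", "ch"]:
    print(w, accepts(w))
```

With these assumed transitions, the first four words are accepted and the last two are rejected.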
[Figure: a phone lattice with states 1 through 15 and edges labeled c, a, o, p, and s.]
Although this lattice encodes the right set of words, it uses a lot more
states than necessary. We can represent the same information using the
following, much more compact, phone lattice.
[Figure: a compact phone lattice with states 1 through 5 and edges labeled c, a, o, p, and s.]
State merger is even more important when states are generated dynamically.
For example, suppose that we are trying to find an optimal strategy
for playing a game like tic-tac-toe. Blindly enumerating sequences of moves
might create state diagrams such as:
[Figure: state diagram of tic-tac-toe board positions.]
Searching through the same sequence of moves twice wastes time as well
as space. If we build an index of states we've already generated, we can detect
when we get to the same state a second time. We can then use the results
of our previous work, rather than repeating it. This trick, called dynamic
programming, can significantly improve the running time of algorithms.
[Figure: tic-tac-toe state diagram.]
We can also use state diagrams to model what happens when computer
programs run. Consider the following piece of code
cyclic()
    y = 0
    x = 0
    while (y < 100)
        x = remainder(x+1,4)
        y = 2x
We can represent the state of this machine by giving the values of x and y.
We can then draw its state diagram as shown below. By recognizing that we
return to the same state after four iterations, we not only keep the diagram
compact but also capture the fact that the code goes into an infinite loop.
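The same idea can be used to detect the infinite loop mechanically. The Python sketch below runs the body of cyclic(), records each (x, y) state, and stops as soon as a state repeats. The function name `trace_cyclic` and the `max_steps` safety limit are assumptions. Since the code is deterministic, coming back to an earlier state at the top of the loop means the loop can never exit.

```python
def trace_cyclic(max_steps=1000):
    """Run the loop from cyclic(), recording (x, y) states until one repeats."""
    x, y = 0, 0
    seen = [(x, y)]
    for _ in range(max_steps):
        if not (y < 100):          # loop condition from the pseudocode
            return seen, None      # the loop exited normally
        x = (x + 1) % 4
        y = 2 * x
        if (x, y) in seen:
            return seen, (x, y)    # revisited state: the loop can never exit
        seen.append((x, y))
    return seen, None

states, repeated = trace_cyclic()
print(states)     # [(0, 0), (1, 2), (2, 4), (3, 6)]
print(repeated)   # (0, 0)
```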
For some initial configurations, the system goes into a stable state or
oscillates between several configurations. However, even if the initial set of
live cells is finite, the set of live cells can grow without bound as time moves
forwards. So, for some initial configurations, the system has infinitely many
states.
When a system has an intractably large number of states, whether finite
or infinite, we obviously aren't going to build its state space explicitly. Analysis
of such systems requires techniques like computing high-level properties
of states, generating states in a smart way, and using heuristics to decide
whether a state is likely to lead to a final solution.
Chapter 17
Countability
17.2 Completeness
One big difference between the two sets is that the reals have a so-called
completeness property. It states that any subset of the reals with an upper
bound has a smallest upper bound. (And similarly for lower bounds.) So if I
have a sequence of reals that converges, the limit it converges to is also a real
number. This isn't true for the rationals. We can make a series of rational
numbers that converge (for example) such as
17.3 Cardinality
We know how to calculate and compare the sizes of finite sets. To extend
this idea to infinite sets, we use bijections to compare the sizes of
sets:
We've seen that there is a bijection between two finite sets exactly when
the sets have the same number of elements. So this definition of cardinality
matches our normal notion of how big a finite set is. But, since we don't
have any numbers of infinite size, working with bijections extends better to
infinite sets.
The integers and the natural numbers have the same cardinality, because
we can construct a bijection between them. Consider the function $f : \mathbb{N} \to \mathbb{Z}$
where $f(n) = \frac{n}{2}$ when $n$ is even and $f(n) = \frac{-(n+1)}{2}$ when $n$ is odd. $f$ maps
the even natural numbers bijectively onto the non-negative integers. It maps
the odd natural numbers bijectively onto the negative integers.
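Here's the function $f$ in Python, so you can see how it interleaves the non-negative and negative integers. Checking a finite prefix obviously doesn't prove that $f$ is a bijection, but it's a good way to catch a sign or off-by-one error in the formula.

```python
def f(n):
    """Map a natural number to an integer: evens to 0, 1, 2, ...; odds to -1, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

values = [f(n) for n in range(9)]
print(values)   # [0, -1, 1, -2, 2, -3, 3, -4, 4]
```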
Because the integers are so important, there's a special name for sets that
have the same cardinality as the integers:
The term countable is used to cover both finite sets and sets that are countably
infinite.
It is the case that if $|A| \leq |B|$ and $|B| \leq |A|$, then $|A| = |B|$. That is, if
you can build one-to-one functions in both directions, a bijection does exist.
This result, called the Cantor-Schroeder-Bernstein Theorem, isn't obvious
when the sets are infinite. You do have enough background to understand
its proof (which you can look up), but it's somewhat messy.
To see this theorem in action, consider the rational numbers. Rational
numbers are almost the same as fractions, which are basically pairs of integers,
which we know to be countably infinite. But this isn't quite right:
each rational number is represented by many fractions.
So, let's use Cantor-Schroeder-Bernstein: to show that the positive rational
numbers are the same size as the natural numbers, show that $|\mathbb{N}| \leq |\mathbb{Q}^+|$
and $|\mathbb{Q}^+| \leq |\mathbb{N}|$. It's easy to make a one-to-one function from the natural
numbers to the positive rational numbers: just map each natural number $n$
to $n + 1$ (because $\mathbb{Q}^+$ doesn't include zero). So $|\mathbb{N}| \leq |\mathbb{Q}^+|$. So now we just
need a one-to-one function from the positive rationals to the natural numbers,
to show that $|\mathbb{Q}^+| \leq |\mathbb{N}|$.
To map the positive rational numbers to the natural numbers, first map
each rational number to one representative fraction, e.g. the one in lowest
terms. This isn't a bijection, but it is one-to-one. Then use the method we
saw above to map the fractions, which are just pairs of positive integers, to
the natural numbers. We now have the required one-to-one function from
the positive rationals to the natural numbers.
Again, this construction can be adapted to also handle negative rational
numbers. So the set of rational numbers is countably infinite.
v0 1 1 0 1 1 0 1 1 1 1 ...
v1 1 1 0 0 1 0 1 1 0 0 ...
v2 0 0 0 0 1 0 0 1 0 0 ...
v3 0 1 1 1 1 0 1 0 0 0 ...
v4 0 0 0 0 1 1 1 0 1 1 ...
v5 1 1 1 0 1 0 1 0 0 1 ...
... ...
This is supposed to be a complete list of all the bit vectors. But we can
construct a bit vector $x$ that's not on the list. The value of $x_k$, i.e. the $k$th
bit in our new vector, will be 0 if the $k$th digit of $v_k$ is 1, and 1 if the $k$th digit
of $v_k$ is 0. Notice that $x$ is different from $v_3$ because the two vectors differ
in the third position. It can't be $v_{20}$ because the two vectors differ in the
twentieth position. And, in general, $x$ can't equal $v_k$ because the two vectors
differ in the $k$th position. For the example above, the new vector not in the
list would start out: 0 0 1 0 0 1 . . .
So, it's not possible to put these infinite bit vectors into a list indexed
by the natural numbers, because we can always construct a new bit vector
that's not on the list. That is, there can't be a one-to-one function from the
infinite bit vectors to the natural numbers. So there can't be a one-to-one
function from the subsets of the natural numbers to the natural numbers. So
$\mathbb{P}(\mathbb{N})$ isn't countable. That is, the subsets of the natural numbers are more
numerous than the natural numbers themselves.
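The diagonal construction is easy to try out on the finite corner of the table shown above. The Python sketch below copies the first six bits of $v_0$ through $v_5$ and flips the diagonal. The function name `diagonal_flip` is an assumption, and of course a finite table only illustrates the construction; the real argument needs the infinite list.

```python
# The first six bits of v0..v5 from the table above.
rows = [
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 0],
]

def diagonal_flip(vectors):
    """Build x with x[k] = 1 - vectors[k][k], so x differs from row k in position k."""
    return [1 - vectors[k][k] for k in range(len(vectors))]

x = diagonal_flip(rows)
print(x)  # [0, 0, 1, 0, 0, 1]
```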
17.8 Uncomputability
We can pull some of these facts together into some interesting consequences
for computer science. Notice that a formula for a function is just a finite
string of characters. So the set of formulas is countable. But the set of
functions, even from the integers to the integers, is uncountable. So there
are more functions than formulas, i.e. some functions which have no finite
formula.
Similarly, notice that a computer program is simply a finite string of
ASCII characters. So there are only countably many computer programs.
But there are uncountably many functions. So there are more functions
than programs, i.e. there are functions which cannot be computed by any
program.
A final problem is created by the fact that, although the code for a computer
program is finite in length, the trace of the program's execution may
be infinite. Specifically, program traces fall into three categories
(2) The program loops, in the sense of returning back to a previous state.
(3) The program keeps going forever, consuming more and more storage
space.
Chapter 18
Planar Graphs
[Figure: a graph with nodes A, B, C, D, a, b, c, d.]
18.2 Faces
When a planar graph is drawn with no crossing edges, it divides the plane into
a set of regions, called faces. By convention, we also count the unbounded
area outside the whole graph as one face. The boundary of a face is the
subgraph containing all the edges adjacent to that face and a boundary walk
is a closed walk containing all of those edges. The degree of the face is the
minimum length of a boundary walk. For example, in the figure below, the
lefthand graph has three faces. The boundary of face 2 has edges $df, fe, ec, cd$,
so this face has degree 4. The boundary of face 3 (the unbounded face) has
edges $bd, df, fe, ec, ca, ab$, so face 3 has degree 6.
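[Figure: two planar graphs on the nodes a, b, c, d, e, f. The lefthand graph has faces 1, 2, and 3. The righthand graph has faces 1 and 2.]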
The righthand graph above has a spike edge sticking into the middle of
face 1. The boundary of face 1 has edges $bf, fe, ec, cd, ca, ab$. However, any
boundary walk must traverse the spike twice, e.g. one possible boundary
walk is $bf, fe, ec, cd, cd, ca, ab$, in which $cd$ is used twice. So the degree of
face 1 in the righthand graph is 7. Notice that the boundary walk for such a
face is not a cycle.
Suppose that we have a graph with e edges, v nodes, and f faces. We
know that the Handshaking theorem holds, i.e. the sum of node degrees is
2e. For planar graphs, we also have a Handshaking theorem for faces: the
sum of the face degrees is 2e. To see this, notice that a typical edge forms
part of the boundary of two faces, one to each side of it. The exceptions are
edges, such as those involved in a spike, that appear twice on the boundary
of a single face.
Finally, for connected planar graphs, we have Euler's formula:
$v - e + f = 2$. We'll prove that this formula works.[1]
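As a sanity check, we can count everything for the two example graphs from the previous section. The following Python sketch is only an illustration: the boundary walks for faces 2 and 3 of the lefthand graph and face 1 of the righthand graph come from the text, but the walks marked "assumed" are read off from the shape of the drawing and are my assumption, as is the helper name `face_counts`.

```python
from collections import Counter


def face_counts(nodes, edges, face_walks):
    """Return (v, e, f, sum of face degrees) for a drawn planar graph.

    Each face is given as one boundary walk, i.e. a list of edge names,
    so the degree of a face is the length of its walk.
    """
    degree_sum = sum(len(walk) for walk in face_walks)
    return len(nodes), len(edges), len(face_walks), degree_sum


nodes = "abcdef"

left_edges = ["ab", "bd", "cd", "ca", "df", "fe", "ec"]
left_faces = [
    ["ab", "bd", "cd", "ca"],                     # face 1 (assumed)
    ["df", "fe", "ec", "cd"],                     # face 2
    ["bd", "df", "fe", "ec", "ca", "ab"],         # face 3, unbounded
]

right_edges = ["ab", "bf", "fe", "ec", "cd", "ca"]
right_faces = [
    ["bf", "fe", "ec", "cd", "cd", "ca", "ab"],   # face 1, spike used twice
    ["ab", "bf", "fe", "ec", "ca"],               # face 2, unbounded (assumed)
]

for name, edges, faces in [("lefthand", left_edges, left_faces),
                           ("righthand", right_edges, right_faces)]:
    v, e, f, degree_sum = face_counts(nodes, edges, faces)
    # Every edge shows up exactly twice on face boundaries.
    uses = Counter(edge for walk in faces for edge in walk)
    assert all(uses[edge] == 2 for edge in edges)
    assert degree_sum == 2 * e        # handshaking theorem for faces
    assert v - e + f == 2             # Euler's formula
    print(name, v, e, f, degree_sum)
```

For the lefthand graph this prints 6 nodes, 7 edges, 3 faces and a face degree sum of 14. For the righthand graph it prints 6, 6, 2 and 12.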
18.3 Trees
Before we try to prove Euler's formula, let's look at one special type of
planar graph: free trees. In graph theory, a free tree is any connected
graph with no cycles. Free trees are somewhat like normal trees, but they
don't have a designated root node and, therefore, they don't have a clear
ancestor-descendant ordering to their nodes.
A free tree doesn't divide the plane into multiple faces, because it doesn't
contain any cycles. A free tree has only one face: the entire plane surrounding
it. So Euler's formula reduces to $v - e = 1$, i.e. $e = v - 1$. Let's prove that
[1] You can easily generalize Euler's formula to handle graphs with more than
one connected component.
$v - (e - 1) + (f - 1) = 2$
So
$v - e + f = 2$
Proof: The sum of the degrees of the faces is equal to twice the
number of edges. But each face must have degree $\ge 3$. So we
have $3f \le 2e$.
Euler's formula says that $v - e + f = 2$, so $f = e - v + 2$ and
thus $3f = 3e - 3v + 6$. Combining this with $3f \le 2e$, we get
$3e - 3v + 6 \le 2e$. So $e \le 3v - 6$.
We can also use this formula to show that the graph $K_5$ isn't planar. $K_5$
has five nodes and 10 edges. This isn't consistent with the formula
$e \le 3v - 6$, because $3 \cdot 5 - 6 = 9$ is less than 10.
Unfortunately, this method won't help us with $K_{3,3}$, which isn't planar but
does satisfy this inequality.
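The arithmetic is easy to check by machine. In this small Python sketch, `satisfies_edge_bound` is a hypothetical helper name, and the two graphs are built directly from their definitions. Keep in mind that the bound is only a necessary condition: failing it shows that a graph isn't planar, but passing it proves nothing.

```python
from itertools import combinations


def satisfies_edge_bound(v, e):
    """True if the counts are consistent with e <= 3v - 6."""
    return e <= 3 * v - 6


# K5: every pair of the 5 nodes is joined by an edge.
k5_edges = list(combinations(range(5), 2))

# K3,3: every node in {0, 1, 2} is joined to every node in {3, 4, 5}.
k33_edges = [(a, b) for a in range(3) for b in range(3, 6)]

print(len(k5_edges), satisfies_edge_bound(5, len(k5_edges)))     # 10 False
print(len(k33_edges), satisfies_edge_bound(6, len(k33_edges)))   # 9 True
```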
We can also use Corollary 1 to derive a useful fact about planar
graphs:
If our graph G isn't connected, the result still holds, because we can apply
our proof to each connected component individually. So we have:
Proof: The sum of the degrees of the faces is equal to twice the
number of edges. But each face must have degree $\ge 4$ because
all cycles have length $\ge 4$. So we have $4f \le 2e$, so $2f \le e$.
Euler's formula says that $v - e + f = 2$, so $e - v + 2 = f$, so
$2e - 2v + 4 = 2f$. Combining this with $2f \le e$, we get that
$2e - 2v + 4 \le e$. So $e \le 2v - 4$.
This result lets us show that $K_{3,3}$ isn't planar. All the cycles in
$K_{3,3}$ have at least four nodes. But $K_{3,3}$ has 9 edges and 6 nodes, which
isn't consistent with this formula, because $2 \cdot 6 - 4 = 8$ is less than 9.
So $K_{3,3}$ can't be planar.
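Here is the same check in Python, as a sketch. The helper names `has_triangle` and `satisfies_triangle_free_bound` are made up for this example. The code first confirms by brute force that $K_{3,3}$ has no cycle of length 3, so the stronger bound applies, and then shows that the bound fails.

```python
from itertools import combinations


def has_triangle(nodes, edges):
    """True if some three nodes are pairwise joined by edges."""
    adjacent = {frozenset(edge) for edge in edges}
    return any(
        frozenset((a, b)) in adjacent
        and frozenset((b, c)) in adjacent
        and frozenset((a, c)) in adjacent
        for a, b, c in combinations(nodes, 3)
    )


def satisfies_triangle_free_bound(v, e):
    """True if the counts are consistent with e <= 2v - 4."""
    return e <= 2 * v - 4


k33_nodes = range(6)
k33 = [(a, b) for a in range(3) for b in range(3, 6)]

print(has_triangle(k33_nodes, k33))                    # False
print(satisfies_triangle_free_bound(6, len(k33)))      # False
```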
[Figure: two graph drawings on the nodes A, B, C, D, with additional nodes
F and G]
This was proved in 1930 by Kazimierz Kuratowski, and the proof is
apparently somewhat difficult. So we'll just see how to apply it.
For example, here's a graph known as the Petersen graph (after a Danish
mathematician named Julius Petersen).
[Figure: the Petersen graph]
This isn't planar. The offending subgraph is the whole graph, except for
the node B (and the edges that connect to B):
[Figure: the Petersen graph with node B and its edges removed]
This subgraph is a subdivision of $K_{3,3}$. To see why, first notice that the
node b is just subdividing the edge from d to e, so we can delete it. Or,
[Figure: the same subgraph with node b removed]
In the same way, we can remove the nodes A and C, to eliminate
unnecessary subdivisions:
[Figure: the subgraph with nodes b, A, and C removed]
Now deform the picture a bit and we see that we have $K_{3,3}$.
[Figure: $K_{3,3}$ drawn with nodes a, e, D in one row and c, d, E in the other]
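The steps of the Petersen graph example can be replayed in a short Python sketch. The edge list below is an assumption: it uses the usual drawing of the Petersen graph (outer cycle A, B, C, D, E, spokes Aa through Ee, and an inner five-pointed star), which is consistent with the statements above that b subdivides the edge from d to e once B is gone. All function names are illustrative only.

```python
def petersen():
    """Adjacency sets for the Petersen graph (assumed labeling)."""
    outer, inner = "ABCDE", "abcde"
    adj = {node: set() for node in outer + inner}
    for i in range(5):
        for x, y in [(outer[i], outer[(i + 1) % 5]),   # outer cycle
                     (outer[i], inner[i]),             # spokes
                     (inner[i], inner[(i + 2) % 5])]:  # inner star
            adj[x].add(y)
            adj[y].add(x)
    return adj


def remove_node(adj, node):
    """Delete a node together with all the edges that touch it."""
    return {n: nbrs - {node} for n, nbrs in adj.items() if n != node}


def smooth_degree_two(adj):
    """Undo subdivisions: replace each degree-2 node by a single edge."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if len(adj[node]) != 2:
                continue
            x, y = adj[node]
            if y in adj[x]:
                continue                # would duplicate an existing edge
            adj[x].discard(node)
            adj[y].discard(node)
            adj[x].add(y)
            adj[y].add(x)
            del adj[node]
            changed = True
    return adj


def is_k33(adj):
    """True if the graph is K3,3: two sides of 3 nodes, fully joined."""
    if len(adj) != 6 or any(len(nbrs) != 3 for nbrs in adj.values()):
        return False
    side_b = adj[next(iter(adj))]
    side_a = set(adj) - side_b
    return (all(adj[n] == side_b for n in side_a)
            and all(adj[n] == side_a for n in side_b))


reduced = smooth_degree_two(remove_node(petersen(), "B"))
print(sorted(reduced))            # ['D', 'E', 'a', 'c', 'd', 'e']
print(is_k33(reduced))            # True
print(sorted(reduced["a"]))       # ['E', 'c', 'd']
```

Under this labeling, the nodes that get smoothed away are exactly b, A, and C, and the two sides of the resulting $K_{3,3}$ are {a, e, D} and {c, d, E}.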
It's not hard, but a bit messy, to upgrade this proof to show that planar
graphs require only five colors. Four colors is much harder. Way back in
1852, Francis Guthrie hypothesized that any planar graph could be colored
with only four colors, but it took 124 years to prove that he was right. Alfred
Kempe thought he had proved it in 1879 and it took 11 years for another
mathematician to find an error in his proof.
The Four Color Theorem was finally proved by Kenneth Appel and Wolfgang
Haken at UIUC in 1976. They reduced the problem mathematically, but
were left with 1936 specific graphs that needed to be checked exhaustively,
using a computer program. Not everyone was happy to have a computer
involved in a mathematical proof, but the proof has come to be accepted as
legitimate.
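$$v = \frac{2e}{d}$$
By the handshaking theorem for faces, the sum of the face degrees is also
twice the number of edges. That is, $kf = 2e$. So
$$f = \frac{2e}{k}$$
Euler's formula says that $v - e + f = 2$, so $v + f = e + 2 > e$.
Substituting the two expressions above gives
$$\frac{2e}{d} + \frac{2e}{k} > e$$
Dividing both sides by $2e$:
$$\frac{1}{d} + \frac{1}{k} > \frac{1}{2}$$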
If we analyze this inequality, we discover that d and k can't both be larger
than 3. If they were both 4 or above, the left side of the inequality would be
$\le \frac{1}{2}$. Since we know that d and k are both $\ge 3$, this implies
that one of the two is actually equal to three and the other is some integer
that is at least 3.
Suppose we set d to be 3. Then the inequality becomes
$\frac{1}{3} + \frac{1}{k} > \frac{1}{2}$. So $\frac{1}{k} > \frac{1}{6}$,
which means that k can't be any larger than 5. Similarly, if k is 3,
then d can't be any larger than 5.
This leaves us only five possibilities for the degrees d and k: (3, 3), (3, 4),
(3, 5), (4, 3), and (5, 3). Each of these corresponds to one of the Platonic
solids.
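We can confirm this case analysis by brute force. The Python sketch below searches node degrees d and face degrees k up to an arbitrary limit (the limit, the function name, and the `NAMES` table are my additions). For each pair that works, it solves $v - e + f = 2$ with $v = 2e/d$ and $f = 2e/k$ to recover the number of nodes, edges, and faces, using exact fractions so there is no rounding error.

```python
from fractions import Fraction

# Which solid goes with which (node degree, face degree) pair.
NAMES = {
    (3, 3): "tetrahedron",
    (3, 4): "cube",
    (3, 5): "dodecahedron",
    (4, 3): "octahedron",
    (5, 3): "icosahedron",
}


def platonic_candidates(limit=20):
    """List (d, k, v, e, f) for all 3 <= d, k < limit with 1/d + 1/k > 1/2."""
    found = []
    for d in range(3, limit):
        for k in range(3, limit):
            if Fraction(1, d) + Fraction(1, k) > Fraction(1, 2):
                # Substituting v = 2e/d and f = 2e/k into v - e + f = 2
                # gives e * (2/d + 2/k - 1) = 2.
                e = 2 / (Fraction(2, d) + Fraction(2, k) - 1)
                v, f = 2 * e / d, 2 * e / k
                found.append((d, k, int(v), int(e), int(f)))
    return found


for d, k, v, e, f in platonic_candidates():
    print((d, k), NAMES[(d, k)], "v =", v, "e =", e, "f =", f)
```

For example, the pair (3, 4) gives 8 nodes, 12 edges, and 6 faces, which is the cube.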
Appendix A
Jargon
QED Short for "quod erat demonstrandum." This is just the Latin
translation of "which is what we needed to show." The Latin and the
English versions are both polite ways to tell your reader that the proof
is finished.
NTS Need to show, as in "we need to show that all horses have four legs."
Consider The author is about to pull an example or constant or the like out
of thin air. Typically, this part of the proof was constructed backwards,
and the reasons for this step will not become apparent until later in
the proof.
Or The connective "or" and its variations (e.g. "either...or") leave open the
possibility that both statements are true. If a mathematician means to
exclude the possibility that both are true ("exclusive or"), they say so
explicitly. In everyday English, you need to use context to determine
whether an "or" was meant to be inclusive or exclusive.
Has the form E.g. "if x is rational, then x has the form $\frac{p}{q}$ where
$p, q \in \mathbb{Z}$ and $q \neq 0$." Means that the definition of this type
of object forces the object to have this specific internal structure.
Recall The author is about to tell you something basic that you probably
ought to know, but he realizes that some of you have gaps in your
background or have forgotten things. He's trying to avoid offending
some of the audience by suggesting they don't know something so basic,
while not embarrassing everyone else by making them confess that they
don't.
Similarly You can easily fill in these details by adapting a previous part of
the proof, and you'll just get bored if I spell them out. Occasionally
misused in a manner similar to "obviously."
Unique There is only one item with the specified properties. E.g. "There is a
unique real number whose square is zero."
Vacuously true The claim is true, but only because it's impossible to
satisfy the hypothesis. Recall that in math, an if/then statement is
considered true if its hypothesis is false. Vacuously true statements often
occur when definitions are applied to examples that are very small (e.g.
the empty set) or lacking some important feature (e.g. a graph with
no edges, a function whose domain is the empty set).
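Programmers meet vacuous truth as well. As a small illustration in Python (the helper `implies` is a hypothetical name, not a built-in), an if/then statement with a false hypothesis comes out true, and a claim about "every element" of an empty collection is true because there is no element that could violate it.

```python
def implies(p, q):
    """Truth value of the statement "if p then q"."""
    return (not p) or q


# A false hypothesis makes the if/then statement true, whatever q is.
print(implies(False, True), implies(False, False))    # True True

# "Every number in the empty list is larger than 100" is vacuously true.
empty = []
print(all(x > 100 for x in empty))                    # True
```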
A.3 Constructions
Mathematicians also use certain syntactic constructions in ways that aren't
quite the same as in normal English.
Variable names Variable names are single-letter, e.g. f but not foo. The
letter can be adorned with a wide variety of accents, subscripts, and
superscripts, however.
Names of variables In principle, any letter can be used as the name for any
variable you need in a proof, with or without accent marks of various
sorts. However, there are strong conventions favoring certain names.
E.g. x is a real number, n is an integer, f is a function, and T is a set.
Observe what names authors use in your specific topic area and don't
stray too far from their conventions.
Appendix B
Acknowledgements and Supplementary Readings
Most of the basic ideas and examples in this book date back many years and
their original source is almost impossible to trace. I've consulted so many
books and worked with so many helpful teachers, students, and course staff
over the years that the details have blended into a blur. Nevertheless, certain
books have had a particularly strong influence on my presentation. Most of
them would make excellent supplementary materials, both for instructors
and for students.
It's impossible to even think about discrete mathematics without citing
the classic encyclopedic text by Rosen [Rosen 2007]. It helped me
understand what topics would be most relevant to a computer science, as
opposed to a mathematics, audience. Biggs [Biggs 2002], Matousek and Nesetril
[Matousek and Nesetril 1998] and West [West 2001] have been invaluable
references for discrete mathematics and graph theory.
From Liebeck [Liebeck 2006], Sipser [Sipser 2006], and Biggs [Biggs 2002],
I learned how to extract core concepts and present them concisely. From
Fendel and Resek [Fendel and Resek 1990], I learned how to explain proof
mechanics and talk about the process of constructing proofs. From my
numerous students and course staff, both at Iowa and Illinois, I have learned
to understand why some of these concepts seem so hard to beginners. My
co-instructors Eric Shaffer and Viraj Kumar helped mould the curriculum to
the needs of our students. Sariel Har-Peled has provided many interesting
examples.
Finally, a special individual citation goes to Jeff Erickson for the
"recursion fairy" (though my fairy has a slightly different job than his) and
for extended arguments about the proper way to present induction (though we
still disagree).
Bibliography
[Fendel and Resek 1990] Daniel Fendel and Diane Resek (1990) Foundations
of Higher Mathematics: Exploration and Proof, Addison-Wesley, Reading
MA.
[Matousek and Nesetril 1998] Jiri Matousek and Jaroslav Nesetril (1998)
Invitation to Discrete Mathematics, Oxford University Press.
[Rosen 2007] Kenneth H. Rosen (2007) Discrete Mathematics and Its Appli-
cations, sixth edition, McGraw Hill, New York, NY.