
Discrete Mathematics

Lecture Notes∗

Steffen van Bakel


Department of Computing, Imperial College London, 180 Queen’s Gate, London SW7 2BZ, UK

Autumn 2022

Abstract
This short course introduces some basic concepts in discrete mathematics: sets, relations, functions,
induction, countability and infinity. These notes are written to accompany the slides: the slides sum-
marise the content of the course; the notes provide more explanation and detail and should be studied
when preparing for the exam.

Introduction
This material is written as notes to accompany the first part of a course on Discrete Mathematics, Logic,
and Reasoning given to first year students in the Department of Computing at Imperial College London.
We introduce the basics of discrete mathematics – in particular, sets, relations, functions, and infinity.
These notes are intended for beginning students of computer science, who will find that a knowledge
of the concepts covered here will aid them in understanding many areas of computer science, such as
data types, databases, specification, functional programming, logic programming, and artificial
intelligence.
This course endeavours to give a rigorous account that forms the underpinnings of many courses in
the Computer Science degree. The work is self-contained, though rather concise.

Recommended books
A book everybody should have:
• K. Houston, How to Think Like a Mathematician, Cambridge University Press, 2009.
Books that give additional reading on Discrete Mathematics:
• K.H. Rosen. Discrete Mathematics and its Applications, McGraw Hill 1995.
• J.L. Gersting. Mathematical Structures for Computer Science, Freeman 1993.
• J.K. Truss. Discrete Mathematics for Computer Science, Addison-Wesley 1991.
• R. Johnsonbaugh, Discrete Mathematics, 5th ed. Prentice Hall 2000.
• C. Schumacher, Fundamental Notions of Abstract Mathematics, Addison-Wesley, 2001.
Related courses include Logic, Reasoning on Programs, Graphs and Algorithms, Haskell, Models of
Computation, Databases, Artificial Intelligence, etc.

Exercises and sections marked with an asterisk are not part of the examinable material of the course.
Solutions to the exercises will be released at the end of term.

∗ These lecture notes are in part based on previous notes by Iain Phillips and Philippa Gardner.

Contents
1 Mathematical Reasoning
1.1 Statements
1.2 Acceptable proof steps
1.3 On ‘Proof by Contradiction’
1.4 Acceptable statements
1.5 Some examples of proofs
2 Sets
2.1 Russell’s paradox
2.2 Comparing Sets
2.3 Constructing Sets
2.4 Products
2.5 Partitions
3 Relations
3.1 Constructing Relations
3.2 Equalities between Relations
3.3 Equivalence Relations
3.4 Transitive Closure
4 Orderings
4.1 Analysing Partial Orders
4.2 Well-founded Partial Orders
5 Functions
5.1 Introducing Functions
5.2 Partial Functions
5.3 Properties of Functions
5.4 Operations on Functions
5.5 Cardinality of Sets
5.6 Cantor’s discovery of different infinities
6 On Infinity
6.1 Countable Sets
6.2 Uncountable Sets
6.3 Different Infinities
7 Axioms for Set Theory

1 Mathematical Reasoning
One of the important concepts for this course is the notion of mathematical reasoning. Mathematical
reasoning is a form of human reasoning, following very precise steps that allow one to conclude that a
particular statement in mathematics holds. The result of this construction, normally a paragraph in natural
language, is often called a proof for that statement, and we will use that terminology here. 1 What exactly
is accepted as mathematical reasoning is open to interpretation, and depends strongly on the person
constructing the proof and, more importantly, on the intended reader/observer. There is quite a difference of
opinion between mathematicians, physicists, logicians, etc., about what exactly a proof is, and there are
many different approaches to teaching students what a correct proof is, and how to construct one.
The one we choose here is based on the belief that learning to construct a proof is done by understanding
all details as much as possible, acknowledging the importance of each step, and it aims to never say things
like ‘Proof: Easy’ or the like.
In constructing a proof, many different intermediate statements can appear that help establish the
correctness of the final statement, and all of these need to be shown to be true as well. This then boils
down to a rather nasty problem: how much detail are we supposed to supply for the correctness of each
individual statement? Do we need to prove that ‘8 is an even number’, for example?
In practice, a proof often is not much more than a convincing argument that creates in the observer’s
head an understanding of the ‘meaning’ of the statement and why it holds. Many (mathematicians as
well) still use this approach without hesitation, also in teaching; it suffices that the point ‘See?’ is reached
when arguing that a property holds. 2 It was only at the end of the nineteenth century that mathematicians
and philosophers started considering the problem of exactly ‘what is a proof’. For example, when we
say that one property follows from another, what do we really mean? And what kind of steps do we
allow in a proof?
The shared opinion that is the result of this scrutiny is that a proof is a mathematical reasoning over
statements that follows precise steps, and only those; a proof is constructed as a sequence of those steps,
and the statement formulated at the conclusion is then the statement that is shown to hold by the proof (is
proven). Intuition, for example, or insight is never part of a proof. Of course intuition and understanding
play an important role during the construction of a proof; they will guide the construction, and sometimes
‘filling in the details’ through writing down all the necessary steps is straightforward and not much is
gained by doing so, but it is important to know and understand that, when skipping the details, the result
can no longer be called a (formal) proof.
Below we will give the precise steps one is allowed to make in a proof, and a proof should have no
steps that are not in this list. In describing the allowed steps, we will abstract from what it is we are
aiming to prove, but will give concrete examples of proofs constructed using these rules.
Proofs are written in plain natural language (English, in our case), using sentences that specify or
clarify the reasoning, using the allowed steps. Although many think so, it is explicitly not the case that a
proof is a collection of formulas; it can contain them, but the reasoning (expressed in natural language) is
a necessary part of a proof.
Once it is understood how to construct a proof, and what a proof should be, one could argue that
presenting all the detailed steps is too cumbersome and that some might be abbreviated; this is of course
true, but only under the condition that, if necessary, a detailed proof can be produced. This implies that,
if it is not clear how the admissibility of a step could be proven, it is perhaps better not to use it in a
proof.

1 Any dictionary will tell you that the word proof stands for many different things. As you will see later in this course, in
Logic that notion is used for an object constructed following precise rules.
2 There are those that ‘show’ the axiom of induction using the fact that every non-empty set of natural numbers has a least
element, not realising that that latter property needs proof as well (in fact, it is shown using induction), and does not hold by
itself just because ‘it is obviously true’.

1.1 Statements
We will use the following notation for statements P, Q 3 that are either true or false, but not both; state-
ments can contain properties like P(x) to indicate that the statement P has an ‘open’ position, marked
by x, which can be filled by certain objects o, and P is then said to hold for o.
For the sake of clarity, we assume there are some statements for which an explicit proof is not neces-
sary, like 1 ≠ 2, 3 × 7 = 21, or a ∈ A (see Section 2).

Definition 1.1 The statements we deal with here are limited to the following:
‘P and Q’: This statement is true exactly when both P and Q are true.
‘P or Q’: This statement is true exactly when either P is true, or Q is, or both are true.
‘not P’: This statement is true exactly when P is false.
‘P implies Q’: This statement is true exactly when, whenever P is true, Q is true as well. This statement
is true even if P is false, but Q cannot be false when P is true. We use ‘P if and only if Q’ as an
abbreviation for when both ‘P implies Q’ and ‘Q implies P’ hold.
‘For every x, 4 (the property) P( x) holds’: This is true exactly when P is true irrespective of the choice
of object we put in for x.
‘There exists an x such that P( x) holds’: This is true exactly when there exists a particular object o
(which might even be unknown) such that P is true when taking o for x; there might be many
choices for o, but only one is needed.
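The truth-functional content of the first four statement forms can be sketched in code; the function names below are our own illustration, not notation from these notes.

```python
from itertools import product

# Truth-functional readings of the connectives of Definition 1.1.
def and_(p, q):    return p and q          # true exactly when both are true
def or_(p, q):     return p or q           # inclusive: also true when both hold
def not_(p):       return not p            # true exactly when p is false
def implies(p, q): return (not p) or q     # true whenever p is false, or q is true
def iff(p, q):     return implies(p, q) and implies(q, p)

# Tabulate 'P implies Q' over all truth assignments: it is only
# false when P is true and Q is false.
for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q))
```

The quantified statements (‘for every x’, ‘there exists an x’) cannot be tabulated this way, since the collection of objects that may be put in for x can be infinite.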

Since we would like to separate the language in which we express our reasoning (English in these
notes) and the language of the things we are trying to show (statements in mathematics), we could dis-
tinguish these through notation. We will use brackets and quotes to group statements, thereby separating
them from the language of the proofs themselves.
We will use a formal notation later in these notes, but will always explicitly treat these as abbreviations,
not as formulas that are to be manipulated. Proofs will always be constructed using the above listed steps.

1.2 Acceptable proof steps


When building a proof, we show step-wise that the statement at the end holds, using the truth of inter-
mediate statements achieved after the previous steps to make the next step. This might take place in the
context of assumptions, statements that, inside this reasoning, we assume to be true. It can also take
place when using a property that was shown elsewhere, effectively abbreviating a sub-proof in a single
statement. Normally, the structure of the proof is guided by the structure of the statement, but this is not
always the case: there exists no effective way of constructing proofs for statements, since that problem
is undecidable, i.e. proof search cannot be implemented, for example.
We use P, Q, etc. as position holders for statements, and will use the terminology ‘P is true’, ‘P holds’,
‘P is provable’, ‘we have a proof for P’, as well as ‘producing a proof for P’, ‘showing P’, ‘proving P’,
interchangeably.
As an example of a statement that is false, take ‘P and not P’; since P cannot both be true and false,
stating that both hold simultaneously cannot be true. If, while building a proof, we reach a statement
like ‘P and not P’, we will have reached a contradiction. This will normally lead to the rejection of an
assumption, as described below; it is impossible to build a proof that leads to a contradiction without
using assumptions.

Definition 1.2 The acceptable steps in a proof are limited to (where we use ‘. . . ’ for a sequence of
steps):

3 We avoid the use of terminology like formulas or propositions on purpose here. We are not dealing with logic, but with the
problem of how to prove a result.
4 We will see below that it is important to know what kind of objects x is taken from.

Assumption : Any statement assumed to hold will be true: so if we assume that P holds, then we can
conclude that P is true.
A proof using this step would have the structure:
Assume P holds. Then P is true.
We can also say ‘Then by assumption P is true’ or ‘Then P is true by assumption’ if we want to stress
that P only holds because it has been assumed to do so.
Note that we do not show that P is true by itself, we only say that if P occurs amongst the things
we have assumed to be true (so do not have/need a proof for), then P can be accepted to be true.

Showing ‘P and Q’: We can construct a proof for ‘P and Q is true’ by producing both a proof for P
and a proof for Q. A proof using this step would have the structure:
. . . P holds. . . . Q holds. Since both P and Q are true, we have that ‘P and Q’ holds.
The proofs for P and the proof for Q are independent, produced separately, but are both needed.

Using ‘P and Q’: We can construct a proof for P or a proof for Q from a proof for ‘P and Q’. A proof
using this step would have the structure:
. . . ‘P and Q’ holds. Then P is true.
and similarly for Q. We know that if ‘P and Q’ is true, then so are its components, even if we have
just assumed it.

Showing ‘P or Q’: We can construct a proof for ‘P or Q’ by producing either a proof for P or produc-
ing a proof for Q. A proof using this step would have the structure:
. . . P is true. Then ‘P or Q’ is true.
and similarly for Q. It does not matter which of P or Q we show, both give us the right to then say
that ‘P or Q’ holds; it might be that the other statement holds as well, and it might even be that the
other statement is false (or cannot be proven).

Using ‘P or Q’; proof by cases : If we have a proof for ‘P or Q’, we normally do not know which one
is true; it might be assumed, or it might be an instance of the ‘Law of Excluded Middle’ step (see below).
We can eliminate this uncertainty when building a proof with the structure:
. . . ‘P or Q’ holds. Assume P, . . . then R holds. Now assume Q, . . . then R holds. Then
R is true.
So we show that R holds under the assumptions P and Q separately, and since we have that ‘P
or Q’ is true, R is true as well, even if we do not know which of P or Q holds, thereby eliminating
the uncertainty.
Notice that we do not require the cases P and Q to be mutually exclusive, since both can be true,
and that all cases should prove the same conclusion.

Showing ‘P implies Q’: If we produce a proof for Q in which we have used the assumption that P
holds, then the truth of Q depends on that of P, and we can accept that as a proof that ‘if P is true,
then Q is true’, or ‘P implies Q’. A proof using this step would have the structure:
Assume P. . . . Q holds. Then ‘P implies Q’ is true.
At the conclusion, the assumption P is cancelled since P is no longer assumed to be true, and the
dependency has been incorporated into the statement.
Note that we do not need to know if P really is true, we just assume that it is. It might be that in
constructing the proof for Q, the assumption that P is true is not used at all. We can then still state
that ‘P implies Q’; this is now a weaker statement than necessary, since it suggests that the truth of
Q depends on that of P, even if that is now not the case.
Also, it might be that we have assumed P and R when showing Q; then we can build proofs for
both ‘assuming R, P implies Q’, and ‘assuming P, R implies Q’.

Using ‘P implies Q’: The statement ‘P implies Q’ stands for ‘assuming P is true, we can show that Q
is true’. If we can show that P is true without assuming it, then we know that Q is true without
precondition. A proof using this step would have the structure:
. . . P holds. . . . ‘P implies Q’ holds. Then Q is true.
This step normally gets used when applying a result you have shown before.

Showing ‘not P’: We can construct a proof for this by assuming P is true, and showing that we get a
contradiction. We then have to reject the assumption and conclude that P cannot be true, so is false,
so ‘not P’ is true; this cancels out the assumption (that P is true), of course. A proof using this step
would have the structure:
Assume P holds. . . . we have a contradiction. Then ‘not P’ is true.
Now P need no longer be assumed to be true; the assumption P is discarded, or cancelled.
Of course we could have used many assumptions to build up to the contradiction. For example, if
we had assumed P, Q, R, then the contradiction might be caused by any of them; as a matter of fact, they all
contribute to the contradiction, since they are in conflict with each other. So, given that assuming P, Q,
and R leads to a contradiction, we can construct a proof for ‘assuming Q and R, we have (not P) is true’,
for ‘assuming P and R, we have (not Q) is true’, as well as for ‘assuming Q and P, we have (not R) is
true’. Note that we do not show that R is false, but just that, assuming Q and P are true, R cannot be
true as well.

Proof by contradiction : The proof step shows that P holds because it cannot be true that it does not
hold. A proof using this step would have the structure:
Assume that ‘not P’ holds. . . . we have a contradiction. Then P is true.
The assumption ‘not P’ is cancelled.

From a contradiction, anything follows : The idea behind this is that if you can show contradiction, all
bets are off, and all statements are true. A proof using this step would have the structure:
. . . we have a contradiction. Then P is true.
and we are free to pick whichever P we like. Note that no assumption gets cancelled in this step.

Law of Excluded Middle : Every statement is either true or false; so ‘P or not P’ is always true, for
every statement P, independent of whether or not we have a proof for either P or ‘not P’. This is
typically used in a ‘proof by cases’.

Showing ‘For every x, P( x) holds’: A proof for this statement normally goes like:
Let o be an arbitrary object. . . . so P(o) holds
(this is P with all occurrences of x replaced by o). If the proof constructed is independent of the
choice for o, so o was truly arbitrary, then P(o) holds for any o, so ‘for every x, P( x) holds’.

Using ‘For every x, P( x) holds’: If we know that ‘for every x, P( x)’ holds, and o is any object that can
be used for x, then P(o) holds.

Showing ‘There exists an x such that P( x) holds’: A proof for this statement normally goes like:
Let o be a certain object. . . . P(o) holds. So ‘there exists an x such that P( x) holds’.

Using ‘There exists x such that P( x) holds’: If we know that ‘there exists x such that P( x) holds’, then
we can safely assume that it holds for the (anonymous or already known) object o, so then P(o)
holds. So if we assume ‘there exists x such that P( x)’, we can assume we can pick an object o such
that P(o) holds. A proof that uses this statement normally goes like:
. . . ‘there exists x such that P( x)’. Let o be that object; then P(o) holds.
The object o might be unknown, but we know it exists.

It might be helpful to mention the steps used when building a proof, but this is not necessary.

These steps are sufficient to prove any true statement in mathematics.


For pure administrative reasons, we distinguish the statements ‘if P then Q’ and ‘P implies Q’. We
reserve the first for a proof that assumes P and then shows Q, whereas the second is the result of applying
the step ‘Showing ( P implies Q)’ (normally directly after showing that ‘if P then Q’).
Many common constructions are not acceptable in proofs:
• ‘It is easy to see’ / ‘Obviously’ / ‘Clearly’: If it’s really easy, show it.
• ‘It has to be the case that’: It either is true, or it is not; it doesn’t have to be anything.
• ‘⇒’, ‘⇐’, ‘⇔’: these symbols can only be used in (well-formed, complete) formulas, not as a re-
placement for words. Moreover, ‘A ⇒ B’ is only acceptable if B only depends on A, not on other
information that is written elsewhere.
• ‘∴’: this is often used to justify a step, but it is not a logical symbol; use the word therefore instead.
Experience shows that students like to write this symbol in between formulas, hoping that if the one
on the left is true, then the one on the right will be as well, but without a clear idea of how to
justify it.
• ‘You know what I mean’: Writing a proof is very much like writing computer code: you would
never expect a compiler to ‘fill in the gaps’ in your code, and neither should your reader be expected
to find the details of the proof you didn’t write down. If the structure of the proof (attempt) is not
correct, there is no proof - and there will be no marks.

1.3 On ‘Proof by Contradiction’


The area of logic studies the structure of proofs, abstracting from what is proven, and instead focussing
on what the permissible steps are - which are not always the same. Certain kinds of mathematics do
not allow the ‘Law of Excluded Middle’ proof step (nor the ‘Proof by Contradiction’ step, which can be
considered equivalent), for example. The idea is that one should only allow a statement like ‘P
or Q’ if either we have a proof for P or we have a proof for Q. The ‘Law of Excluded Middle’ does not
follow this regime: ‘P or not P’ is always true, irrespective of whether or not we can show either. But
normal mathematics strongly depends on this law, as many very important results need it for their proof,
like the one we show in Example 1.10, so abolishing it severely limits what the provable statements are. 5
Likewise, some mathematicians would like the existence of a proof to give certainty about whether a result
holds or not, 6 and the ‘Law of Excluded Middle’ does not provide that. In fact, for them, showing that a
statement cannot be false is not the same as showing that it is true.

Example 1.3 Accepting the ‘Law of Excluded Middle’, we know that Goldbach’s conjecture, one of the
oldest and best-known unsolved problems in number theory (and all of mathematics), is either true or
false. It states:
Every even integer greater than 2 can be expressed as the sum of two primes.
The conjecture has been shown to hold for all integers less than 4 × 10¹⁸, but remains unproven today. 7
We can now define a real number as follows:
r_i = 0 if 2 × i can be expressed as a sum of two primes, and r_i = 1 otherwise,
and the real number r as
r = 0.0 r_2 r_3 r_4 r_5 . . .

5 There are many different results that are equivalent to the axiom of choice; that equivalence is often shown by contradiction.
6 For example, the proof for ‘There exists an x such that P(x) holds’ should then produce the object o for which P(o) holds,
rather than proving that it cannot be the case that the statement is false: the proof should produce the witness of the statement.
7 Source: https://en.wikipedia.org/wiki/Goldbach’s_conjecture.

a perfectly well-defined number, and we can ask ourselves the question: is r greater than 0? From the
above, we know that at least the first 2 × 10¹⁸ digits are 0, but have no idea about the full (infinitely long)
list of digits; is one of the digits 1? So in (classical) mathematics, which allows for the ‘Law of Excluded
Middle’, ‘r = 0 or r > 0’ holds, but we do not know which one is true.
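Any single digit r_i can be computed by direct search; this sketch (our own addition, not part of the notes) checks the Goldbach decomposition for small even numbers. What nobody knows is whether every one of the infinitely many digits is 0.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def goldbach_digit(i):
    """The digit r_i: 0 if 2*i is a sum of two primes, 1 otherwise."""
    n = 2 * i
    if any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
        return 0
    return 1

# Every digit computed so far is 0; 'r > 0' would need some digit to be 1.
print([goldbach_digit(i) for i in range(2, 30)])  # prints a list of zeros
```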

The ‘Proof by Contradiction’ step should be used sparingly. Experience shows that too often students
think that, when looking to prove P, it will be easier to prove that P cannot be false. Many a proof 8 then
gets constructed as follows:
To show that P holds, assume towards a contradiction that it does not. So assume ‘not P’
holds. . . . P holds. So we have both ‘not P’ by assumption and have shown P, which is a
contradiction. So we have to reject the assumption ‘not P’ and by the ‘Proof by Contradiction’
step we get that P is true.
The problem with this proof is that it already contains a direct proof for P (the part ‘. . . P holds.’ above)!
So if you venture in the direction of a proof by contradiction, think twice: there might be a direct proof
close by.

1.4 Acceptable statements


A complication with insisting on the rigour of the structure of proofs is that a proof is now only acceptable
if all properties are shown explicitly. This means that, for example, we cannot simply say that 3 + 4 = 7,
but would have to prove that this equality holds. That this seems to be a silly task (or even impossible,
because how does one show this using the proof steps above?) is not the issue: when we demand that a
proof is built only out of the acceptable steps above, there is no escape.
The practice in mathematics is, of course, very different. Many lecturers in mathematics will not give
a proof in terms of the rules we listed here, but reason out why a property will hold, often resorting to
an intuitive argument, for example by saying that ‘it is easy to check that 3 + 4 = 7’. Although this is
acceptable in itself, formally no proof is produced this way. That implies that, if a proof is asked for, an
intuitive argument cannot suffice.

Example 1.4 We can argue that ‘P or not P’ is equivalent to ‘(not (not P)) implies P’; after all, if ‘not
(not P)’ is true, then ‘not P’ is false, and P is true, so ‘(not (not P)) implies P’ is true.
But this is not a proof, it only shows understanding of the problem.
Moreover, this reasoning only deals with one side of the equivalence; the part that shows that the ‘Law
of Excluded Middle’ can be shown using ‘Double Negation Elimination’ is quite involved. 9

We distinguish intuitive reasoning from formal proofs; the first gives insight into the truth of a statement
(and might suggest the structure of the proof), the latter gives the precise structure that constitutes a proof
and formally justifies the statement. When asking for a proof we will insist on rigour and only accept
proofs that are constructed correctly; however, in this course, there will be basic statements, on arithmetic
for example, of the kind of ‘3 + 4 = 7’ or ‘4 is even’, that we can accept without proof: facts that
‘we know’ and assume have been proven elsewhere.

1.5 Some examples of proofs


Example 1.5 Prove that ‘( P and Q) implies Q’.
Proof : We need to show an implication, so assume the part on the left-hand side and hope to build a
proof for the right-hand side. So assume ‘P and Q’. Then by step ‘Assumption’ we have that ‘P and Q’
is true. Then by the step ‘Using (P and Q)’, we know that Q is true. So, assuming ‘P and Q’ we get Q, so we
have ‘( P and Q) implies Q’.

8 An observation made after correcting many students’ attempts at proving results.


9 Note that we do not claim that ‘if (not (not P )) implies P, then ( P or not P )’, since that is not provable.

Notice that we have not shown that Q is true, only that the implication holds; and we do not know if
‘P and Q’ is true, we just assume it. As an example of this statement, we can take ‘7 + 2 = 9’ for P and
‘2 + 2 = 3’ for Q. Then by the proof above, we have ‘(7 + 2 = 9 and 2 + 2 = 3) implies 2 + 2 = 3’.
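The statement of Example 1.5 can also be sanity-checked truth-functionally (our own addition; the function name is ours, and such a check confirms the statement but is not a proof in the above sense):

```python
from itertools import product

def example_1_5(p, q):
    # '(P and Q) implies Q', reading 'implies' as 'not ... or ...'.
    return (not (p and q)) or q

# The implication holds for every combination of truth values, including
# the instance P = '7 + 2 = 9' (true), Q = '2 + 2 = 3' (false).
assert all(example_1_5(p, q) for p, q in product([True, False], repeat=2))
```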

Example 1.6 Prove that ‘( P implies Q) implies ( P implies Q)’.


Proof : We have two ways of showing this:
1) We want to show that ‘P implies Q’, when we assume ‘P implies Q’. To show ‘P implies Q’, we
need to assume P and give a proof for Q.
Well, we assume ‘P implies Q’ holds, so we know that ‘P implies Q’ is true; we also assume
P holds, so P is true as well. Then we know that Q is true. So ‘P implies Q’ is true (under the
assumption that ‘P implies Q’ holds, but that is implicit here, since the assumption was mentioned
at the beginning of the proof), and therefore also ‘( P implies Q) implies ( P implies Q)’ is true.
2) We want to show that, assuming ‘P implies Q’, we can show ‘P implies Q’. So assume ‘P implies
Q’; then by step ‘Assumption’ we know that ‘P implies Q’ is true. So ‘( P implies Q) implies ( P
implies Q)’ is true.

Notice that an assumption is made when needed, and that it ‘exists’ and can be used until cancelled
when building the ‘implies’ that uses it.

Example 1.7 There exists x such that ‘2 + 4 = x’.


Proof : We know that ‘2 + 4 = 6’; but then ‘there exists x such that 2 + 4 = x’.

Example 1.8 For every x, if x is the product of two even numbers, it is divisible by 4.
Proof : First, we make the statement more precise: ‘For every x, if there exist y, z such that x = y × z and
there exists u such that y = 2 × u and there exists v such that z = 2 × v, then there exists w such that
x = 4 × w’.
Let n be an arbitrary number. We need to show ‘there exists w such that n = 4 × w’ from the assump-
tion ‘there exist y, z such that n = y × z and there exists u such that y = 2 × u and there exists v such that
z = 2 × v’.
From the assumption, we know there are m, k such that ‘n = m × k’ and ‘there exists u such that
m = 2 × u’ and ‘there exists v such that k = 2 × v’. From the latter two statements, we know there are
p, q such that ‘m = 2 × p’ and ‘k = 2 × q’.
So, in all, we have ‘n = m × k’, ‘m = 2 × p’, and ‘k = 2 × q’ for certain m, k, p, q. But then ‘n =
(2 × p) ×(2 × q)’, so ‘n = 4 ×( p × q)’. So we have shown ‘there exists w such that n = 4 × w’.
Cancelling the assumption, we now know that ‘(there exist y, z such that n = y × z and there exists u
such that y = 2 × u and there exists v such that z = 2 × v) implies (there exists w such that n = 4 × w)’.
This holds for arbitrary n, so we have shown that ‘for every x: (there exist y, z such that x = y × z and
there exists u such that y = 2 × u and there exists v such that z = 2 × v) implies (there exists w such that
x = 4 × w)’. So for every x, if x is the product of two even numbers, it is divisible by 4.
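The proof of Example 1.8 is constructive: it produces the witness w = p × q. This finite sanity check (our own addition, not a proof, since it only covers finitely many cases) confirms both the statement and the witness on small instances:

```python
# If x = y * z with y = 2*p and z = 2*q, the proof gives x = 4 * (p * q).
for p in range(1, 50):
    for q in range(1, 50):
        y, z = 2 * p, 2 * q        # two even numbers
        x = y * z
        assert x % 4 == 0          # x is divisible by 4
        assert x == 4 * (p * q)    # the witness w = p * q from the proof
print("checked all products of even numbers up to 98")
```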

Example 1.9 If ‘P implies Q’, then ‘(not Q) implies (not P)’.


Proof : We want to show that, assuming ‘not Q’ we can show ‘not P’, under the assumption that ‘P
implies Q’. So assume ‘not Q’ and assume ‘P implies Q’. We want to show ‘not P’, so assume P as well.
Then, using the second assumption, we know that Q is true. Using the first assumption, we now have a
contradiction, so reject the assumption we added, so we know that ‘not P’ is true. So we have ‘(not Q)
implies (not P )’ under the assumption that ‘P implies Q’.

Note that, all the use of negation notwithstanding, this is not a proof by contradiction. The following is:

Example 1.10 If ‘(not Q) implies (not P)’, then ‘P implies Q’.


Proof : We want to show that ‘P implies Q’, under the assumption that ‘(not Q) implies (not P)’. So we

assume both P and ‘(not Q) implies (not P)’ and aim to prove Q. Notice that we cannot directly use the
‘implies’; in order for us to work with these assumptions, the only path open to us is to assume ‘not Q’.
So assume ‘not Q’. Then, using the second assumption, we know that ‘not P’ is true. Using the first
assumption, we now have a contradiction, so have to reject the assumption we added, so we know that
‘not Q’ is false, so by the ‘Proof by Contradiction’ step, Q is true. We assumed P for this result, so ‘P
implies Q’ holds.

Example 1.11 ‘(not Q) implies (not P) if and only if ( P implies Q)’.


Proof : The proof has two parts; we have to show ‘((not Q) implies (not P)) implies ( P implies Q)’, and
then have to show that ‘( P implies Q) implies ((not Q) implies (not P))’.
The first part follows by Example 1.10, the second by Example 1.9.

Example 1.12 ‘( P implies Q) if and only if ((not P) or Q)’.


Proof : The proof has two parts; we have to show ‘( P implies Q) implies ((not P) or Q)’, and then have
to show that ‘((not P) or Q) implies ( P implies Q)’.
‘( P implies Q) implies ((not P) or Q)’: Assume ‘P implies Q’; to show ‘(not P) or Q’. We will use
the ‘Law of Excluded Middle’ for this; we know ‘(not P) or P’ holds, so we have two cases.
‘not P’ holds : Then ‘(not P) or Q’ holds as well.
P holds : Using the assumption, we know that Q holds as well, so ‘(not P) or Q’ holds as well.
So ‘(not P) or Q’ holds. But then ‘( P implies Q) implies ((not P) or Q)’.
‘((not P) or Q) implies ( P implies Q)’: Assume that (1) ‘(not P) or Q’ and (2) P both hold. To show
Q holds. We now have two cases from (1):
‘not P’ holds : Using the second assumption, we have a contradiction, so by ‘From a contradiction,
anything follows’, we can say that Q holds.
Q holds : Then Q holds.
Then Q holds, so ‘P implies Q’, so ‘((not P) or Q) implies ( P implies Q)’.

Both directions of this proof rely heavily on reasoning through contradiction.

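Treating each statement simply as a truth value, the equivalences of Examples 1.9 to 1.12 can be checked exhaustively; a small Python sketch, purely for illustration:

```python
from itertools import product

def implies(p, q):
    # the classical truth table for 'p implies q'
    return (not p) or q

# Check contraposition (Examples 1.9-1.11) and material implication
# (Example 1.12) over all four truth-value assignments.
for P, Q in product([False, True], repeat=2):
    assert implies(P, Q) == implies(not Q, not P)
    assert implies(P, Q) == ((not P) or Q)
print("all four assignments agree")
```

This truth-table view is a check of the statements, not a replacement for the proofs, which show *why* the equivalences hold.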
Example 1.13 1) Show that, if ‘A and ( B or C )’, then ‘( A and B) or ( A and C )’, and vice versa.
2) Show that, if ‘A or ( B and C )’, then ‘( A or B) and ( A or C )’, and vice versa.
Proof : 1) From left to right: assume that ‘A and ( B or C )’ holds, then both A holds and B holds or C
holds, and we have two cases:
B holds : Since A and B both hold, we have that ‘A and B’ holds and thereby ‘( A and B) or ( A and
C )’ is true.
C holds : Since A and C both hold, we have that ‘A and C’ holds and thereby ‘( A and B) or ( A and
C )’ is true.
Both cases lead to the same result, so ( A and B) or ( A and C ) holds.
From right to left: assume ‘( A and B) or ( A and C )’, then we have two cases:
‘A and B’ holds : Then A and B both hold; since B holds, ‘B or C’ holds as well; since also A holds,
we have that ‘A and ( B or C )’ holds.
‘A and C’ holds : Then A and C both hold; since C holds, ‘B or C’ holds as well; since also A holds,
we have that ‘A and ( B or C )’ holds.
Both cases lead to the same result, so ‘A and ( B or C )’ is true.
2) From left to right: assume ‘A or ( B and C )’, then we have two cases:
A holds : Since A holds, also ‘A or B’ holds and ‘A or C’ holds as well, so ‘( A or B) and ( A or C )’
is true.
‘B and C’ holds : Then B and C both hold. Since B holds, ‘A or B’ holds as well, and since C holds,

‘A or C’ holds as well; so ‘( A or B) and ( A or C )’ is true.
Both cases lead to the same result, so ‘( A or B) and ( A or C )’ is true.
From right to left: assume ‘( A or B) and ( A or C )’, then ‘A or B’ and ‘A or C’ both hold, so we
have four cases:
A holds : Then ‘A or ( B and C )’ is true.
A and C both hold : Since A and C both hold, we know that A holds, so ‘A or ( B and C )’ is true.
B and A both hold : Since B and A both hold, we know that A holds, so ‘A or ( B and C )’ is true.
B and C both hold : Since B and C both hold, also ‘B and C’ holds. Then ‘A or ( B and C )’ is true.
All cases lead to the same result, so ‘A or ( B and C )’ is true.

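As with the previous examples, both distributive laws can be verified over all eight truth assignments; an illustrative Python sketch:

```python
from itertools import product

# Example 1.13: check both distributive laws for all assignments of A, B, C.
ok = all(
    (A and (B or C)) == ((A and B) or (A and C)) and
    (A or (B and C)) == ((A or B) and (A or C))
    for A, B, C in product([False, True], repeat=3)
)
print(ok)  # prints: True
```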
Example 1.14 If ‘for every x we have that P( x) holds’, then ‘there exists an x such that P( x)’.
Proof : Assume ‘for every x we have that P( x) holds’. Take an arbitrary o, then we have P(o). So we
have shown that P holds for o, so ‘there exists an x such that P( x)’.

Example 1.15 If ‘for every x we have that P( x) implies Q( x)’, then ‘(there exists y such that P(y)) implies
(there exists z such that Q(z))’ holds.
Proof : Our aim is to prove that ‘(there exists y such that P(y)) implies (there exists z such that Q(z))’.
We can use the assumption that ‘for every x, P( x) implies Q( x)’; let’s call this assumption 1.
To prove the implication, we assume ‘there exists y such that P(y)’; we call this assumption 2. From
assumption 2, we can call o the object that satisfies P, so P(o) holds.
From assumption 1, taking o for x, we have that ‘P(o) implies Q(o)’. Since we have that both P(o)
holds and ‘P(o) implies Q(o)’, we know that Q(o) holds. So ‘there exists z such that Q(z)’.
Then, cancelling assumption 2, we have that ‘(there exists y such that P(y)) implies (there exists z such that
Q(z))’ holds.

Notice that we do not need to know which object o is exactly, just that it exists.

Example 1.16 If ‘there exists x such that P( x)’, then ‘not for all x we have not P( x)’.
Proof : Assume (1) ‘there exists x such that P( x)’. Our aim is to prove ‘not for all x we have not P( x)’, so
aim to prove that assuming (2) ‘for all x we have not P( x)’ leads to a contradiction.
From (1), let o be the object that satisfies P, so we have P(o). From (2) we have ‘not P(o)’. This is a
contradiction, so we reject (2) and conclude ‘not for all x we have not P( x)’.

Example 1.17 If ‘there does not exist x such that P( x)’, then ‘for all x we have not P( x)’.
Proof : Assume ‘there does not exist x such that P( x)’, or better (1) ‘not (there exists x such that P( x))’.
Our aim is to prove ‘for all x we have not P( x)’, so aim to prove ‘not P(o)’ for any object o.
So take an arbitrary object o and assume (2) P(o). Then we have that ‘there exists x such that P( x)’;
this is in contradiction with (1), so we reject (2) and conclude ‘not P(o)’. We have shown this for
arbitrary o, so ‘for all x we have not P( x)’.

Example 1.18 If ‘for all x we have not P( x)’, then ‘not (there exists x such that P( x))’.
Proof : Assume (1) ‘for all x we have not P( x)’. Our aim is to prove ‘not (there exists x such that P( x))’.
We prove this using ‘showing not’. So assume (2) ‘there exists x such that P( x)’; let o be such that
P(o) holds. From (1) we know that ‘not P(o)’ holds as well; this is a contradiction, so we reject (2) and
conclude ‘not (there exists x such that P( x))’.

Example 1.19 If ‘not for all x we have not P( x)’, then ‘there exists x such that P( x)’.
Proof : Assume (1) ‘not for all x we have not P( x)’. Our aim is to prove ‘there exists x such that P( x)’.
We prove this by contradiction. So assume (2) ‘not there exists x such that P( x)’, then by Example 1.17
we have that ‘for all x we have not P( x)’. This in contradiction with (1), so we reject (2) and conclude
‘there exists x such that P( x)’.

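Over a finite domain, ‘for all’ and ‘there exists’ correspond to Python’s all and any, so the quantifier dualities of Examples 1.16 to 1.19 can be spot-checked by computation; the domain and predicate below are arbitrary choices for illustration:

```python
# 'for all x. P(x)' is all(P(x) ...), 'there exists x. P(x)' is any(P(x) ...).
domain = range(20)
P = lambda x: x % 7 == 3   # an arbitrary sample predicate

# Examples 1.16/1.19: 'exists x. P(x)' iff 'not (for all x, not P(x))'
assert any(P(x) for x in domain) == (not all(not P(x) for x in domain))
# Examples 1.17/1.18: 'not (exists x. P(x))' iff 'for all x, not P(x)'
assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)
print("dualities hold on this domain")
```

Note that the check only covers one finite domain; the proofs above cover the general case.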
2 Sets
One of the most fundamental concepts in mathematics is that of a set. It is generally seen as ‘a collection
of things’, like ‘all even numbers’, ‘all complex numbers’, ‘all cars manufactured in England’, etc..
Although normally we will not be too meticulous about whether some collection is indeed a set, we will
make clear that care has to be taken with a formal definition: contrary to intuition, not every collection
automatically is a set.
We will first introduce Georg Cantor’s original definition from 1883 together with some notation,
much of which was introduced by Giuseppe Peano in 1895.

Definition 2.1 (C ANTOR ’ S S ETS ) 1) A set is a collection of definite and separate objects 10 (or indi-
viduals) into a whole.
2) The members of a set are also called the elements of the set and a set is said to contain its elements.
We write
a ∈ A when object a is an element of set A, and
a ∉ A when a is not an element of A
We assume that always, for all a and A, we have that either a ∈ A or a ∉ A, but not both. 11
3) Sets are written like
{ a1 , . . . , an } or { x ∈ A | P( x)}
where the former states the members of the set explicitly, and the latter stands for the collection of
all objects from the set A, ranged over by x, that satisfy the property P (the notation P( x) stands
for a formula that contains x); this is called restriction, or comprehension (see Definition 2.8).
4) We assume that each set A has an equality relation (see Section 3) ‘=A ’, that tells us whether two
elements of A are the same; we will drop the subscript on ‘=’ when clear from the context.

Notice that a ∈ A is a statement that is either true or false, so a ∉ A is the same as ¬ ( a ∈ A), 12 and
that it is impossible for a ∈ A and a ∉ A to hold simultaneously. We will often write a, b ∈ A whenever
a ∈ A and b ∈ A both hold.
Sets can be finite, i.e. have a finite number of elements (the number of elements is then an element
of IN , the set of natural numbers), or infinite, i.e. have an infinite number of elements (with that number
the same as that of the number of elements in IN , or even larger (see Section 6.3)). When small enough,
finite sets can be defined by listing their elements. Infinite sets normally are defined using a pattern or
restriction.

Example 2.2 • the two-element set Bool = { True , False };


• the set of vowels V = { a, e, i, o, u } ;
• an arbitrary set { 1, 2, e, f , 5, Imperial }, which is a set containing numbers, letters, and a string of
letters with no obvious relationship to each other;
• the set IN of natural numbers { 0, 1, 2, 3, . . . }, 13 which is read as ‘IN is the set containing the natural
numbers 0, 1, 2, 3, etc..’;
• the set ZZ of integers { . . . , −3, −2, −1, 0, 1, 2, 3, . . . };
• the empty set ∅ =∆ { }, 14 which is the (only) set that contains no elements;

• nested sets, such as the set {{ }, { a, e, i, o, u } } containing the sets { } and { a, e, i, o, u } as its two
elements.
10 Collections where an individual is allowed to appear more than once are known as multi-sets.
11 If we now define V = { n ∈ IN | n > 1 and 2 × n is not a sum of two primes }, we do not know if V has elements or not.
12 We will use this kind of abbreviation for other notions as well.
13 Note that we consider 0 to be a natural number.
14 We will use the notation A =∆ B whenever we want to stress that A = B follows by definition.

The set IN is of course an infinite set, which here is defined rather informally. We will later see
that there are more infinitely large sets, even with different, incomparable, measures of infinity (see
Section 6). The ellipses ‘. . .’ indicate that the remaining elements are given by some rule, which should
be apparent from the given elements. Notice that sets can themselves be members of other sets, as the
last example illustrates. There is an important distinction between { a, b, c } and {{ a, b, c } } for example:
the first set contains three elements, the second only one; or between ∅ and { ∅ }: the latter set is not
empty, since it contains an element.
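This distinction can be made concrete in Python, using frozenset so that sets may contain sets; the names below are illustrative only:

```python
# {a,b,c} has three elements; {{a,b,c}} has one. Likewise the empty set
# has no elements, while the set containing the empty set has one.
abc = frozenset({'a', 'b', 'c'})
nested = frozenset({abc})          # the set {{a, b, c}}
empty = frozenset()                # the empty set
non_empty = frozenset({empty})     # { {} }: not empty, it has one element

assert len(abc) == 3 and len(nested) == 1
assert len(empty) == 0 and len(non_empty) == 1
```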

2.1 Russell’s paradox


Cantor’s set theory, although very natural and intuitive, was shown to be inconsistent by Bertrand Russell
in 1901. He showed that we cannot just group any collection and call it a set but need to be more
cautious by presenting a collection that is defined using Cantor’s approach – so should be a set – but
cannot exist. The conclusion therefore is that not every collection of mathematical objects defined this
way automatically forms a set.
Russell’s counterexample is based on a number of important aspects of Cantor’s set theory that it
assumes can be used freely:
1) A set can be not an element of itself. This is directly understandable; take the collection of all finite
sets, which is itself an infinite set.
2) A set can be an element of itself. This is a bit more confusing, but when we allow arbitrary collec-
tions to form sets (as Cantor does), it is easy to see it is allowed. Take for example the collection of
all sets that contain at least two elements; using Cantor’s approach, this is a set in itself, that has at
least two elements, so contains itself.
It is often presented as a paradox, but this is actually a misnomer; it in fact shows that Cantor’s definition
is badly constructed because it leads to a contradiction (is inconsistent).
Theorem 2.3 The collection R =∆ { X | X ∉ X } is not a set.
Proof : Assume that R is a set. Then either R ∈ R or R ∉ R. We analyse the two cases:
Assume R ∈ R : then by definition of R it follows that R is a set that is not an element of itself, so
R ∉ R which is a contradiction.
Assume R ∉ R : then by definition of R it follows that R is not a set that is not an element of itself, so
R is a set that is an element of itself, so R ∈ R and again we have a contradiction.
So in both cases we come to a contradiction, hence the initial assumption is rejected and R cannot be a
set.

Russell’s counterexample led, after some commotion, to a more precise definition of the notion of
set. A consequence is that it is clear that a construction like { x | P( x)} cannot be well defined; the
principle of restriction can be safely used to define a new set, but only when used to select elements from
an already established set, as in { x ∈ A | P( x)}. If, for example, we assume that the collection of all
sets S is itself a set (as Cantor implicitly did), then we could write R =∆ { X ∈ S | X ∉ X } for Russell’s

set; notice that we still cannot avoid the above problem so have to conclude that even this R is not a set.
Since restriction is safe, we have to conclude that S itself cannot be a set. So, in particular, not every
collection as defined by Cantor is a set.
Also, when writing expressions using quantifiers, we need to specify what set we are quantifying
over: we cannot speak about ‘all things that satisfy a property’, since we would not be sure they would
constitute a set; so we should write statements like ‘For every x ∈ A, the property P( x) holds’ as well
as ‘There exists an x ∈ A such that P( x) holds’. This then leads to a strange anomaly in mathematical
proofs. You might be able to show that a property P( A) holds for any set A, but cannot extend that to a
proof for ‘For every A, the property P( A) holds’, since we would need to select A from a set, i.e. the set
of all sets, which does not exist.

We will discuss here the definition (axiomatisation) of set theory of Ernst Zermelo and Abraham
Fraenkel from 1908 (see Definition 7.1). Before we can present and discuss that definition, we need to
introduce some notions and operations on sets that are used there.

2.2 Comparing Sets


We will now define the notion of one set being a subset of (or be contained in) another set, and the related
notion of two sets being equal.

Definition 2.4 (S UBSETS ) Let B be a set.


1) A set A is called a subset of another set B, written A ⊆ B, if every element of A is also an element
of B; in other words:
A ⊆ B is defined as ‘for all x ∈ A, we have x ∈ B’
2) We call A a strict subset of B, written A ⊂ B, if A is a subset of B but not equal to B:
A ⊂ B is defined as ‘A ⊆ B and A ≠ B’
Notice that if A is a strict subset of B, then A is a subset of B.

Example 2.5 Any set is, trivially, a subset of itself. Other simple examples are
{ a, b } ⊆ { a, b, c }
IN ⊆ ZZ
∅ ⊆ { 1, 2, 5 }
To show that the last example is true we need to show that every element in ∅ is contained in { 1, 2, 5 } .
Since there are no elements in ∅, this property is vacuously true: therefore ∅ is a subset of every set.
Take the set V = { ∅, { ∅ }}, then we have both:
∅ ∈ V, { ∅ } ∈ V
∅ ⊆ V, { ∅ } ⊆ V
Notice that, of course, we can extend the last sequence (V has, in fact, four subsets), but that would
be beside the point we are trying to make here.
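The subset claims of this example can be replayed in Python, where <= is the subset test, < the strict-subset test, and set() plays the role of ∅; an illustrative sketch:

```python
# Subset checks mirroring Example 2.5; frozenset is used so that a set
# can itself be an element of a set.
assert {1, 2} <= {1, 2, 3}                  # like {a,b} ⊆ {a,b,c}
assert {1, 2} < {1, 2, 3}                   # in fact a strict subset
assert set() <= {1, 2, 5}                   # ∅ is a subset of every set

V = {frozenset(), frozenset({frozenset()})}  # V = {∅, {∅}}
assert frozenset() in V                      # ∅ ∈ V
assert {frozenset()} <= V                    # {∅} ⊆ V
print("all subset claims verified")
```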

Proposition 2.6 Let A, B, and C be arbitrary sets. If A ⊆ B and B ⊆ C, then A ⊆ C.


Proof : Assume that A, B, and C are arbitrary sets and that A ⊆ B and B ⊆ C. We need to show that
every arbitrary element a of A is also an element of C.
So assume a ∈ A. By assumption, we know that A ⊆ B, and hence by definition of the subset relation
every b that is an element of A (also) is an element of B; since we assumed that a ∈ A, we have that
a ∈ B. We also know that B ⊆ C, and again by definition of the subset relation we conclude that a ∈ C.
So assuming that a ∈ A, we have shown that a ∈ C. We have shown this without using any specific
knowledge about a other than that it is a member of A, so have shown the property for all elements of
A, so have shown ‘for every x ∈ A, we have x ∈ C’. Hence by definition A ⊆ C, as required.

Whenever we introduce a structure (in this case sets), it is important to be able to identify when we
consider two such structures to be equal, in other words when they denote the same thing. For sets, two
sets are equal when they contain the same elements, so are subsets of each other.

Definition 2.7 (S ET EQUALITY ) Let A, B be any two sets. Then A and B are the same set, written
A = B, if both A ⊆ B and B ⊆ A: that is,
A = B =∆ ‘A ⊆ B and B ⊆ A’

So equality between sets is defined through set inclusion, and we can formally only show, for any two
sets C and D, that C = D by showing that both D ⊆ C and C ⊆ D, even if our intuition ‘sees’ that C and
D are the same.

This definition of equality on sets implies that the order of the elements of a set do not matter: the sets
{ a, b, c } and { b, c, a } are equal; we can argue that they are not the same description of that set. 15

2.3 Constructing Sets


There are several ways of describing sets. So far, we have informally introduced two approaches: either
we just list the elements inside curly brackets:
V = { a, e, i, o, u } , IN = { 0, 1, 2, . . . }, { ∅, { a } , { b }, { a, b }};
or we define a set by stating the property that its elements satisfy, using restriction.
The second approach can be generalised to a general axiom of comprehension, which states that the
collection of elements from a set that satisfy a certain logical property forms a set.

Definition 2.8 (C OMPREHENSION ) We can construct a (new) set by taking all elements of a set A that
satisfy some property P, as in
{ x ∈ A | P( x)}
where P is a predicate (a mathematical formula) which contains x.

Example 2.9 1) We can define the set of all prime numbers as follows:
P =∆ { p ∈ IN | p ‘is a prime number’ }

= { p ∈ IN | ‘if p is the product of two natural numbers, one of them is equal to 1’ }

= { p ∈ IN | ‘for all n, m ∈ IN if n × m = p, then n = 1 or m = 1 ’ }
This is a case where writing a formula is perhaps clearer than when writing this statement in English.
2) Assume B = { x ∈ A | P( x)}, and C = { x ∈ A | Q( x)} and that ‘for all x ∈ A we have P( x) if and
only if Q( x )’; if we want to show that B = C, we would have to reason as follows:
a ) To show B ⊆ C, we need to show that, for all b ∈ B, also b ∈ C. Well, take b ∈ B, then by
definition of B, b is an element of A for which P(b) holds. Since for all x ∈ A, we have ‘P( x) if
and only if Q( x )’, in particular, for all x ∈ A, we have that ‘P ( x ) implies Q( x )’. But then ‘P (b)
implies Q(b)’ holds. Combining this with the fact that P (b) holds, we have that Q(b) holds as
well. Then by definition of C we have b ∈ C. So ‘b ∈ B implies b ∈ C’, for any b ∈ B, so B ⊆ C.
b ) C ⊆ B follows in a similar way, swapping B with C and P with Q in the proof above.

For set comprehension to be correct, we need that A is known to be a set. This implies that a definition
like
IR = { x | x is a real number }
would not formally be acceptable: first of all, it ‘depends on itself’, but more importantly, we have not
specified from what set x is selected.
We will now describe set constructors which build (new) sets from other sets. In each case, the
resulting set is well defined as long as the original set is.

Definition 2.10 (C OMBINING SETS ) Let A and B be any sets.


1) The following operations construct (new) sets:
Set Union : A ∪ B =∆ { x | ‘x ∈ A or x ∈ B’ }
Set Intersection : A ∩ B =∆ { x ∈ A ∪ B | ‘x ∈ A and x ∈ B’ }
Set Difference : A \ B 16 =∆ { x ∈ A ∪ B | ‘x ∈ A and x ∉ B’ }
Symmetric Set Difference : A △ B =∆ ( A \ B) ∪ ( B \ A)

2) We will also use the following notion:

15 This is comparable to saying that 3 + 4 = 7; 3 + 4 and 7 are not the same expression, but represent the same value.
16 In the literature, also the notation A − B is used for set difference.

Disjoint Sets : A, B are disjoint =∆ A ∩ B = ∅

We call A ∪ B a disjoint union when A and B are disjoint.

Notice that A ∪ B is defined to be a set; it is not defined using comprehension, since its members are
not selected from a set: for lack of a better solution, it is assumed (defined) to create a set (this is an
axiom). Moreover, we can define A ∩ B =∆ { x ∈ A | x ∈ B } or { x ∈ B | x ∈ A } and, similarly,
A \ B =∆ { x ∈ A | x ∉ B }.
Example 2.11 Let A = { 1, 3, 5, 7, 9 } and B = { 3, 5, 6, 10, 11 } . Then
A ∪ B = { 1, 3, 5, 6, 7, 9, 10, 11 }
A ∩ B = { 3, 5 }
A \ B = { 1, 7, 9 }
A △ B = { 1, 6, 7, 9, 10, 11 }
It is often helpful to illustrate these combinations of sets using Venn diagrams.

[Venn diagrams: four pairs of overlapping circles A and B, with shaded regions showing A ∪ B, A ∩ B, A \ B, and A △ B.]

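The combinations of Example 2.11 map directly onto Python’s built-in set operators; a quick check, for illustration:

```python
# Example 2.11 recomputed with Python's set operators.
A = {1, 3, 5, 7, 9}
B = {3, 5, 6, 10, 11}

assert A | B == {1, 3, 5, 6, 7, 9, 10, 11}   # union
assert A & B == {3, 5}                        # intersection
assert A - B == {1, 7, 9}                     # difference
assert A ^ B == {1, 6, 7, 9, 10, 11}          # symmetric difference
print("Example 2.11 confirmed")
```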
We will now investigate certain equalities between sets constructed from our operations.

Proposition 2.12 (P ROPERTIES OF OPERATORS ) Let A, B and C be arbitrary sets. The following prop-
erties hold: 17
Idempotence Commutativity Associativity
A∪A = A A∪B = B∪A A ∪ ( B ∪ C ) = ( A ∪ B) ∪ C
A∩A = A A∩B = B∩A A ∩ ( B ∩ C ) = ( A ∩ B) ∩ C
A△B = B△A

Empty set Distributivity Absorption
A ∪ ∅ = A A ∪ ( B ∩ C ) = ( A ∪ B) ∩ ( A ∪ C ) A ∪ ( A ∩ B) = A
A ∩ ∅ = ∅ A ∩ ( B ∪ C ) = ( A ∩ B) ∪ ( A ∩ C ) A ∩ ( A ∪ B) = A
A △ A = ∅

Proof : We will just look at the first distributivity equality; 18 some of the other cases will be set as
exercises. Let A, B, and C be arbitrary sets.
By definition of equality between sets, the proof is given by the following two results:
1) A ∪ ( B ∩ C ) ⊆ ( A ∪ B) ∩ ( A ∪ C ).
2) ( A ∪ B) ∩ ( A ∪ C ) ⊆ A ∪ ( B ∩ C ).
To prove part (1), take o to be an arbitrary element of A ∪ ( B ∩ C ). By the definition of set union, this
means that o ∈ A or o ∈ B ∩ C. So we have two cases:
o ∈ A : Then also ‘o ∈ A or o ∈ B’ holds as well; similarly, ‘o ∈ A or o ∈ C’ holds as well. But then
‘(o ∈ A or o ∈ B) and (o ∈ A or o ∈ C )’.
o ∈ B ∩ C : Then ‘o ∈ B and o ∈ C’, and also ‘o ∈ A or (o ∈ B and o ∈ C )’; by Exercise 1.13, also
‘(o ∈ A or o ∈ B) and (o ∈ A or o ∈ C )’.

17 We will use the notions defined here, like commutativity and associativity, also for other operations.
18 Drawing a Venn diagram gives some evidence that the property is indeed true, but notice that this would not be a proof,
but rather a support for understanding and intuition; it is easy to make a mistake in drawing the diagram.

So both cases lead to the same conclusion, which implies that ‘o ∈ A ∪ B and o ∈ A ∪ C’, and hence
‘o ∈ ( A ∪ B) ∩ ( A ∪ C )’. We have shown that every element o of A ∪ ( B ∩ C ) is also an element of
( A ∪ B) ∩ ( A ∪ C ), so A ∪ ( B ∩ C ) ⊆ ( A ∪ B) ∩ ( A ∪ C ).
To prove part (2), we have to prove that x ∈ ( A ∪ B) ∩ ( A ∪ C ) implies x ∈ A ∪ ( B ∩ C ). In this case
the proof is easily obtained, since it just follows the above proof in reverse.

The above proof is an example of the generality often required to prove a property about sets: it uses
arbitrary sets and arbitrary elements of such a set. In contrast, to show that a property is false, it is enough
to find one counterexample, one set for which the property does not hold. Ideally, counterexamples are
as simple as possible, to illustrate that a statement is not true with minimum effort to the reader.

Proposition 2.13 The following statements are not always true:


1) A ∪ ( B ∩ C ) = ( A ∩ B) ∪ C;
2) A ∪ ( B ∩ C ) = ( A ∩ B) ∪ ( A ∩ C ).
Proof : Take A = { 1, 2, 4, 5 } , C = { 2, 3, 5, 6 } , and B = { 4, 5, 6, 7 } . We can understand that the sets are
different by looking at the Venn diagrams:
[Venn diagrams: three overlapping circles A, B, and C, with the elements 1 to 7 placed in their regions and the relevant areas shaded for each side of the equations.]
For part (1), focussing on the areas that are shaded, we observe that A ∪ ( B ∩ C ) = { 1, 2, 4, 5, 6 }
whereas ( A ∩ B) ∪ C = { 2, 3, 4, 5, 6 } . Notice that the property holds for A = { 2, 4, 5 } ,
C = { 2, 5, 6 }, and B = { 4, 5, 6 }; then A ∪ ( B ∩ C ) = { 2, 4, 5, 6 } = ( A ∩ B) ∪ C.
For part (2), we have A ∪ ( B ∩ C ) = { 1, 2, 4, 5, 6 } and ( A ∩ B) ∪ ( A ∩ C ) = { 2, 4, 5 }. To make the
equation work we could remove 1 from A and 6 from B or C.
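The counterexample can also be checked by direct computation; a Python sketch using the same sets:

```python
# The sets of Proposition 2.13, with both failed equalities recomputed.
A, B, C = {1, 2, 4, 5}, {4, 5, 6, 7}, {2, 3, 5, 6}

lhs = A | (B & C)
assert lhs == {1, 2, 4, 5, 6}
assert (A & B) | C == {2, 3, 4, 5, 6}        # part (1): differs from lhs
assert (A & B) | (A & C) == {2, 4, 5}        # part (2): differs from lhs
print("both equalities fail for these sets")
```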

We can show that ∪ and ∩ are idempotent.

Proposition 2.14 (I DEMPOTENCE ) For any set A, A ∪ A = A and A ∩ A = A.


Proof : A ∪ A = A: Assume a ∈ A ∪ A, then ‘a ∈ A or a ∈ A’. The two cases collapse onto one, with
conclusion a ∈ A. So A ∪ A ⊆ A.
Assume a ∈ A, then ‘a ∈ A or a ∈ A’ holds, so a ∈ A ∪ A. Therefore A ⊆ A ∪ A.
A ∩ A = A : Assume a ∈ A ∩ A, then ‘a ∈ A and a ∈ A’; so a ∈ A. Thereby A ∩ A ⊆ A.
Assume a ∈ A, then ‘a ∈ A and a ∈ A’ holds, so a ∈ A ∩ A. Therefore A ⊆ A ∩ A.

We will now explore the number of elements in a finite set; in Section 5.5 we will learn how to talk
about the number of elements in an infinite set like IN , Q , and IR .

Definition 2.15 ((F INITE ) C ARDINALITY ) Let A be a finite set. The cardinality of A, written | A |, is
the number (in IN ) of (distinct) elements contained in A.

Remember that we defined that elements in a set are all distinct, where a and b are distinct when a ≠ b.

Example 2.16 | { a, e, i, o, u } | = 5
| {{ a, e } , { i }, { o, u }} ∪ {{ e, a } , { u, o }} | = 3
| ∅ | = 0

Proposition 2.17 Let A and B be finite sets.

1) If A and B are disjoint, then | A ∪ B | = | A | + | B |.
2) | A ∪ B | = | A | + | B | − | A ∩ B |.
Proof : 1) Let | A | = n, | B | = m, then there exist distinct a1 , . . . , an , and b1 , . . . , bm such that A =
{ a1 , . . . , an } and B = { b1 , . . . , bm }.
Now assume ‘there exists i with 1 ≤ i ≤ n and j with 1 ≤ j ≤ m such that ai = b j ’, then ai ∈ A and
ai ∈ B, so ai ∈ A ∩ B. But A and B are disjoint, which gives a contradiction. So it is not true that
‘there exists i with 1 ≤ i ≤ n and j with 1 ≤ j ≤ m such that ai = b j ’, so by Example 1.17 ‘for every i
with 1 ≤ i ≤ n and j with 1 ≤ j ≤ m we have ai ≠ b j ’.
Then A ∪ B = { a1 , . . . , an , b1 , . . . , bm }, which are all distinct, so | A ∪ B | = n + m = | A | + | B |.
2) It is easy to check that A = ( A \ B) ∪ ( A ∩ B)
B = ( B \ A) ∪ ( A ∩ B)
A ∪ B = ( A \ B) ∪ ( A ∩ B) ∪ ( B \ A)
(see Exercise 2.10) and that the unions on the right are disjoint. Then by the first part,
| A | = | A \ B | + | A ∩ B |
| B | = | B \ A | + | A ∩ B |
| A ∪ B | = | A \ B | + | A ∩ B | + | B \ A |
so
| A ∪ B | = | A \ B | + | A ∩ B | + | B \ A |
= | A | − | A ∩ B | + | A ∩ B | + | B | − | A ∩ B |
= | A | + | B | − | A ∩ B |

In short, the number | A | + | B | counts the elements of A ∩ B twice, so we have to subtract | A ∩ B | from
it to obtain the result.
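The inclusion-exclusion formula of part (2) is easy to spot-check; an illustrative Python sketch reusing the sets of Example 2.11:

```python
# Proposition 2.17: cardinality of a union, checked on concrete sets.
A = {1, 3, 5, 7, 9}
B = {3, 5, 6, 10, 11}

# part (2): |A ∪ B| = |A| + |B| - |A ∩ B|, here 8 == 5 + 5 - 2
assert len(A | B) == len(A) + len(B) - len(A & B)

# part (1): for disjoint sets the intersection term vanishes
C, D = {1, 2}, {3, 4}
assert not (C & D) and len(C | D) == len(C) + len(D)
```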

In Definition 2.4, we defined the notion of a subset of a set. We can create the set of all subsets of A,
which is called the power set 19 of A, which is a well-defined set as well.

Definition 2.18 (P OWER SET ) Let A be any set. Then the collection of all subsets of A (which are sets
by definition), the power set of A, written ℘ A, is assumed to be a set as well and is defined as:
℘ A =∆ { V | V ⊆ A }
Examples of power sets include:
℘ { a, b } = { ∅, { a }, { b }, { a, b }}
℘ ∅ = { ∅ }
℘ IN = { ∅,
{ 0 }, { 1 }, { 2 }, . . . ,
{ 0, 1 } , { 0, 2 }, . . . , { 1, 2 }, { 1, 3 }, . . . , { 2, 3 }, { 2, 4 } , . . . ,
{ 0, 1, 2 } , { 0, 1, 3 }, . . .
{ 2, 4, 6, 8, . . . }, . . .} 20
Note that the power set of the empty set is not empty.
Let A be an arbitrary finite set. One way to list all the elements of ℘ A is to start with ∅, then add the
sets taking one element of A at a time, then the sets taking two elements from A at a time, and so on
until the whole set A is added, and ℘ A is complete.

Proposition 2.19 Let A be a finite set with | A | = n ∈ IN . Then | ℘ A | = 2ⁿ.

19 The name is due to the cardinality result in Proposition 2.19.


20 We will see in Example 6.6 that we cannot effectively list this set.

On formulas
It is very common in mathematics to work with a formal notation for statements, since it neatly separates what we
want to prove from how we prove it. The notation used is justified as follows:

‘P and Q’ ‘P or Q’ ‘not P’ ‘P implies Q’ ‘for all x ∈ A we have P( x )’ ‘there exists x ∈ A such that P( x )’
P∧Q P∨Q ¬P P⇒Q ∀ x ∈ A ( P( x ) ) ∃ x ∈ A ( P( x ) )

But there is no real advantage in using this notation to write mathematics with formulas, other than that it is
more compact. Moreover, using formulas would threaten to redirect the focus towards a symbolic manipulation of
formulas rather than understanding the structure of a proof.
Use of these symbols is strongly discouraged if not used for the construction of formulas. It is incorrect,
and highly illogical, to replace words in sentences by logical symbols; what you write almost immediately
becomes nonsense.

Proof : Assume A = { a1 , . . . , an }. We form any subset V of A by taking each element ai in turn and
deciding whether or not to include it in V. This gives us n independent choices between two possibilities:
in V or not. We can therefore represent each subset of A by a binary number
b = b1 · · · bn ,
where bi = 1 if ai ∈ V, and bi = 0 if ai ∉ V and each number b uniquely represents a subset of A. 21 The
number of different subsets we can form is therefore the number of binary numbers representable with n
bits, i.e. 2ⁿ.

Note that, in general, there are mⁿ ways of making n independent choices between m options.
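The counting argument in the proof can be replayed in code: below, powerset lists subsets by size (the listing strategy described before Proposition 2.19) and powerset_by_masks uses the proof’s bit-vector idea; both function names are ad hoc, purely for illustration:

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, smallest first."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
                         for c in combinations(xs, r)]

def powerset_by_masks(xs):
    """The proof's idea: each n-bit number b selects a subset of xs."""
    xs = list(xs)
    return [frozenset(x for i, x in enumerate(xs) if b & (1 << i))
            for b in range(2 ** len(xs))]

A = ['a', 'b', 'c']
assert set(powerset(A)) == set(powerset_by_masks(A))
assert len(powerset(A)) == 2 ** len(A)   # |℘A| = 2^n, here 8
```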

2.4 Products
The set construct we will now consider is that of the product of two (or an arbitrary number of) sets,
which forms an essential part of the definition of relation as discussed in the next section. If we want
to describe the relationship John loves Mary , then we require a way of talking about John and Mary at the
same time. We do this using the notion of an ordered pair ⟨ a, b ⟩ 22 which is a pair of objects a and b. In
our example, we have the pair ⟨ John , Mary ⟩ in the loves relation, but not necessarily the pair ⟨ Mary , John ⟩
since the love might be unrequited; so ⟨ Mary , John ⟩ is not the same as ⟨ John , Mary ⟩; hence, the order of
the components of the pair is important.
The product 23 constructor allows us to collect the ordered pairs together in one set. It was invented
by and named after the French mathematician René Descartes.

Definition 2.20 (C ARTESIAN PRODUCT ) Let A and B be arbitrary sets.


1) An ordered pair of elements a ∈ A and b ∈ B is written as ⟨ a, b ⟩.
2) The Cartesian (or binary) product of A and B, written A × B, is the collection of all ordered pairs
that can be formed from A and B; it is assumed to be a set as well and is defined as
A × B =∆ { ⟨ a, b ⟩ | a ∈ A and b ∈ B }
We will write A² for A × A.
3) Equality on elements of A × B is defined as:
=A×B =∆ { ⟨ p1 , p2 ⟩ ∈ ( A × B )² | ∃ a, c ∈ A, b, d ∈ B [ p1 = ⟨ a, b ⟩ ∧ p2 = ⟨ c, d ⟩ ∧ a =A c ∧ b =B d ]}
or
=A×B =∆ { ⟨⟨ a, b ⟩ , ⟨ c, d ⟩⟩ ∈ ( A × B )² | a =A c ∧ b =B d }

21 See also Definition 5.7.


22 Another notation for pairs is ( a, b ); we use the ‘pointy bracket’ notation for reasons of readability.
23 The terminology is due to the result in Proposition 2.22. In fact, we could have defined h A, B i for the set of all pairs

constructed out of the sets A and B, but the product notation is universal.

or
⟨ a, b ⟩ =A×B ⟨ c, d ⟩ =∆ a =A c ∧ b =B d
for all a, c ∈ A, and b, d ∈ B.

Notice that × is not commutative, since, in general, A × B ≠ B × A, and that ⟨ a, b ⟩ ≠ { a, b }.

Example 2.21 Examples of Cartesian products include the coordinate system of real numbers IR²: points
are described by their coordinates ⟨ x, y ⟩.

The use of the symbol × to denote the set of all pairs (and of the terminology product) is motivated
by the next result.

Proposition 2.22 Let A and B be finite sets. Then | A × B | = | A | × | B |.


Proof : Let A and B be arbitrary sets with A = { a1 , . . . , am } and B = { b1 , . . . , bn }. For each element ai
in A there are m choices of element b j in B to form a pair h ai , b j i. The total number of possible pairs is
therefore n × m.
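To experiment with Cartesian products on small finite sets (and to check Proposition 2.22 on an example), one can use Python's itertools.product; this is an illustrative sketch, not part of the notes:

```python
from itertools import product

A = {1, 2, 3}
B = {'x', 'y'}

# A × B: the set of all ordered pairs (a, b) with a in A and b in B.
AxB = set(product(A, B))

# |A × B| = |A| × |B|  (Proposition 2.22)
assert len(AxB) == len(A) * len(B)
# Order matters: (1, 'x') is a pair in A × B, but ('x', 1) belongs to B × A.
assert (1, 'x') in AxB and ('x', 1) not in AxB
```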

We can extend the notion of Cartesian product to products of more than two sets.

Definition 2.23 (n- ARY PRODUCT ) 1) For any n ≥ 1, an n-tuple is a sequence h a1 , . . . , an i of n objects.
2) Let A1 , . . . , An be arbitrary sets. The n-ary product of the Ai , written A1 × · · · × An or × ni=1 Ai , is
defined as {h a1 , . . . , an i | ∀ i ∈ IN ( 1 ≤ i ≤ n ⇒ ai ∈ Ai ) }.
3) The n-ary product of A with itself is written An , with A2 the Cartesian product (see Definition 2.20).

In the first part, again, the order of the ai matters.

Example 2.24 1) The three-dimensional space of real numbers IR 3 is an example of a ternary product.
2) Let A1 , . . . , An be a collection of finite sets. Then | A1 × · · · × An | = | A1 | × · · · × | An |. We will
show this, using induction, in Exercise ??.
3) Notice that we can now form the product of three sets in three different ways:
A×B×C ( A × B)× C A ×( B × C )
These sets are all fundamentally different, so the set-operator × is not associative, whereas mul-
tiplication × on numbers is. There exists however a clear correspondence between the elements
h a, b, c i and hh a, b i, c i and h a, h b, c ii, and so these sets are in some sense equivalent; see also
Exercise 5.8.
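The three ways of forming a product of three sets can be compared directly; a small sketch (not from the notes), using Python tuples for pairs:

```python
from itertools import product

A, B, C = {1, 2}, {'x'}, {True}

flat  = set(product(A, B, C))                         # elements h a, b, c i
left  = set(product(product(A, B), C))                # elements hh a, b i, c i
right = {(a, bc) for a in A for bc in product(B, C)}  # elements h a, h b, c ii

# The three sets contain different objects, so × is not associative...
assert flat != left and left != right and flat != right
# ...but they all have the same size, reflecting the evident correspondence.
assert len(flat) == len(left) == len(right)
```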

2.5 Partitions
If we divide a set into non-overlapping parts we get a partition of the set.

Definition 2.25 (PARTITIONS ) For any set A, a partition of A is a family A1 , . . . , An of subsets of A
such that: 1) each Ai is non-empty; 2) the Ai cover A; and 3) the Ai are pairwise disjoint. Expressed
formally, these criteria are:
∀ i ∈ IN ( 1 ≤ i ≤ n ⇒ Ai ≠ ∅ )
A = A1 ∪ · · · ∪ An 24
∀ i, j ∈ IN ( 1 ≤ i, j ≤ n ∧ i ≠ j ⇒ Ai ∩ A j = ∅ )
It is sometimes convenient to write A1 ∪ · · · ∪ An as ∪ni=1 Ai .

For example, cars can be partitioned into makes and integers can be partitioned into even and odd. We
shall return to partitions when discussing equivalence relations.
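For finite sets, the three conditions of Definition 2.25 can be checked mechanically; the following sketch (the function name is mine, not from the notes) tests non-emptiness, coverage, and pairwise disjointness:

```python
def is_partition(A, parts):
    # 1) each part is non-empty
    if any(not p for p in parts):
        return False
    # 2) the parts cover A (and introduce nothing outside A)
    if set().union(*parts) != A:
        return False
    # 3) pairwise disjoint: given coverage, the sizes must add up to |A|
    return sum(len(p) for p in parts) == len(A)

integers = set(range(10))
evens_and_odds = [{n for n in integers if n % 2 == 0},
                  {n for n in integers if n % 2 == 1}]
assert is_partition(integers, evens_and_odds)
assert not is_partition(integers, [integers, {0}])   # {0} overlaps
```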

24 Notice that it would be sufficient to demand A ⊆ A1 ∪ · · · ∪ An .

Set partitioning is strongly related to the pigeonhole principle, which can be stated as follows: Suppose
that m objects are to be placed in n pigeonholes, where m > n; then some pigeonhole will have to contain
more than one object.

Theorem 2.26 (THE PIGEONHOLE PRINCIPLE ) If a set of n distinct objects is partitioned into k sub-
sets, where 0 < k < n, then at least one subset contains at least two elements.
Proof : Let | A | = n, and A1 , . . . , Ak be a partition of A; in particular, each Ai is non-empty. Assume
(towards a contradiction) that each Ai has exactly one element. Since A1 , . . . , Ak is a partition of A, by
Proposition 2.17 we know that | A | = | ∪ki=1 Ai |, and, with the Ai pairwise disjoint, | ∪ki=1 Ai | = Σki=1 | Ai |.
Notice that then n = | A | = Σki=1 | Ai | = k. It is impossible for k < n and n = k, so we have
a contradiction. So it cannot be true that each (non-empty!) Ai has exactly one element, so there is at least
one Ai that has at least two elements.

Notice that the property also holds when we drop the condition that A1 , . . . , Ak is a partition, but just
demand that A = ∪ ki=1 Ai with the Ai pairwise disjoint, and allow each Ai to be empty.
This might seem a proof by contradiction, but it is not. The statement we show is ‘not every subset Ai
has exactly one element’, which is shown by assuming that ‘every subset Ai has exactly one element’.
This leads to a contradiction, so we have to reject the assumption.
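The principle is easy to observe computationally; a small sketch (the particular placement is an arbitrary example of mine):

```python
from collections import Counter

# n = 5 objects placed into k = 3 pigeonholes, in any way whatsoever:
placement = {'a': 0, 'b': 2, 'c': 1, 'd': 2, 'e': 0}

counts = Counter(placement.values())
# Since 5 > 3, some hole must contain at least two objects.
assert max(counts.values()) >= 2
```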

Exercises
Exercise 2.1 Show that A = B if and only if ‘for every x ∈ A ∪ B we have ( x ∈ A if and only if x ∈ B)’.
Exercise 2.2 Draw Venn diagrams representing 2 and 3 possibly overlapping sets. How many regions
are there in each case? Can you draw an appropriate diagram for 4 sets? Again, how many regions does
it contain?
Exercise 2.3 Let A = { 1, 2, 3, 4 } , B = { 3, 4, 6 } , C = { 1, 6 }. Calculate the following: 1) B ∪ C 2)
A ∩ ( B ∪ C ) 3) ( A ∩ B) ∪ ( A ∩ C ).
Exercise 2.4 Let A = { 1, 3, 5 } and B = { 2, 3 }. Write down explicit sets for 1) A ∪ B and A ∩ B; 2)
A \ B and B \ A; 3) ( A ∪ B) \ B and ( A \ B) ∪ B; 4) A △ B and B △ A; 5) A × B, A × ∅ and B × A;
6) ℘ A.
Exercise 2.5 Let A = {{ a } , { b }} and B = { a, b, { a } }. Determine A ∩ B, A ∪ B, ℘ B, A ∩ ℘ B, A × B,
( A × B) ∩ ( B × A) and A △ B.
Exercise 2.6 Determine whether the following statements are true or false. 1) { x } ⊆ { x }; 2) { x } ∈
{ x }; 3) { x } ∈ { x, { x }}; 4) { x } ⊆ { x, { x }}; 5) ∅ ∈ ∅; 6) ∅ ⊆ ∅; 7) ∅ ⊆ { ∅ }.
Exercise 2.7 Of 100 students, 35 play football, 36 play tennis and 24 play cricket. 13 play football and
tennis, 2 play football and cricket but never tennis, 12 play tennis and cricket, while 4 play in all three
games. How many students do not participate in any of these games? Justify your answer.
Exercise 2.8 Show that, for all sets A, B, and C:
1) if C ⊆ A and C ⊆ B, then C ⊆ A ∩ B.
2) if A ⊆ C and B ⊆ C, then A ∪ B ⊆ C.
Exercise 2.9 Let A, B and C be sets. Decide whether each of the following statements is true or false.
You can use earlier results, or de Morgan’s law (‘not ( A and B) if and only if ((not A) or (not B))’), or
‘( A and (not A or B)) implies B’ whenever convenient.
Justify your answer either with a formal proof or a counter-example.
1) A ∪ ( B ∩ C ) = ( A ∩ B) ∪ C
2) A ∩ ( B ∪ C ) = ( A ∩ B) ∪ ( A ∩ C )
3) A ∪ ( B ∩ C ) = ( A ∩ B) ∪ ( A ∩ C )
4) A \( B ∪ C ) = ( A \ B) ∩ ( A \ C )
5) A \( B ∪ C ) = ( A \ B) ∪ ( A \ C )
6) A ∩ ( B \ C ) = ( A ∩ B) \( A ∩ C )
7) A △ ( B ∩ C ) = ( A △ B) ∩ ( A △ C )
8) A ∩ ( B △ C ) = ( A ∩ B) △ ( A ∩ C )

Exercise 2.10 Show that
1) A = ( A \ B) ∪ ( A ∩ B).
2) A ∪ B = ( A \ B) ∪ ( A ∩ B) ∪ ( B \ A).
and that these unions are disjoint.
Exercise 2.11 Let A, B be finite sets. Recall that | A ∪ B | = | A | + | B | − | A ∩ B |. Find a similar formula
for | A ∪ B ∪ C |, and justify your answer.
Exercise 2.12 Calculate ℘ { 0 }, ℘ { 0, { 0 }}, ℘ { 0, { 0 } , { 0, { 0 }}}.

3 Relations
We wish to capture the concept of objects being related: for example, John loves Mary ; 2 < 5; two
predicates P and Q are equal. Such properties can be expressed using relations.

Definition 3.1 (R ELATION ) 1) A relation R between two sets A and B, or from A to B, is a subset of
the binary product of A and B, so R ⊆ A × B. We use R, S, . . . to range over relations.
2) If R ⊆ A × B, we say that R has type A × B.
3) If R ⊆ A2 , we call R a binary relation on A.
Instead of h a, b i ∈ R, we will often write a R b, and will write a R b R c as abbreviation of ‘a R b and
b R c’.

Notice that since a relation is a subset of a product set, equality between relations is actually equality
between sets. Moreover, it might be important to point out that a pair p = h a, b i ∈ A × B is not a relation:
a relation R is a set of pairs, a subset of A × B, and p can be an element of R, and even be its unique
element, but then R = { p } = {h a, b i} ∈ ℘ ( A × B), which is different from p.
In general, there will be many relations on any set. A relation does not have to be meaningful; any
subset of a Cartesian product forms a relation. For example, for A = { a, b }, there are sixteen binary
relations on A:
∅ {h b, b i} {h a, b i, h b, a i} {h a, a i , h a, b i, h b, b i}
{h a, a i} {h a, a i , h a, b i} {h a, b i, h b, b i} {h a, a i , h b, a i, h b, b i}
{h a, b i} {h a, a i , h b, a i} {h b, a i , h b, b i} {h a, b i, h b, a i , h b, b i}
{h b, a i} {h a, a i , h b, b i} {h a, a i , h a, b i, h b, a i} {h a, a i , h a, b i, h b, a i , h b, b i}
Note that there are four elements in A2 = {h a, a i , h a, b i, h b, a i , h b, b i}; each relation is a subset of A2 ,
so there are | ℘ A2 | = 16 possibilities; in contrast, there are only four subsets of { a, b }: ∅, { a }, { b }, and
{ a, b }.
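We can confirm the count by enumerating every subset of A2 in Python; a sketch, using tuples for pairs and frozensets for relations:

```python
from itertools import product, combinations

A = {'a', 'b'}
pairs = sorted(product(A, A))        # the 4 elements of A²

# every binary relation on A is a subset of A², so there are 2⁴ = 16 of them
relations = [frozenset(c) for r in range(len(pairs) + 1)
                          for c in combinations(pairs, r)]
assert len(relations) == 16
assert frozenset() in relations      # the empty relation is one of them
```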
However, listing ordered pairs can get tedious and hard to follow. For binary relations R ⊆ A × B we
have several other representations.
Diagram : Let A = { a1 , a2 }, B = { b1 , b2 , b3 }, and R = {h a1 , b1 i, h a2 , b1 i, h a2 , b2 i}. We can
represent R by a diagram that lists the elements of A on the left and those of B on the right, with
a line from ai to b j whenever ai R b j . (Diagram omitted.)

Directed Graph: In case R is a binary relation on A, we can also use a directed graph, which consists
of a set of nodes corresponding to the elements in A, joined by arrowed lines indicating the
relationship between the elements. For example, let
A = { a1 , a2 , a3 , a4 }
R = {h a1 , a2 i, h a2 , a1 i, h a3 , a2 i, h a3 , a3 i}
The directed graph of this relation has an arrow from ai to a j whenever ai R a j . (Diagram omitted.)
Notice that the direction of the arrows matters. (It is, of course, still possible to use a diagram
where the source and target sets are drawn separately as above.)
Special representations : Some special ways of drawing certain relations exist. For example, we can
sometimes represent a binary relation on IR as an area in the plane: the relation R defined by
x R y =∆ x + y ≤ 7
corresponds to the region on and below the line x + y = 7. (Diagram omitted.)

Matrix : Take A = { a1 , . . . , am } and B = { b1 , . . . , bn }. We can represent R by an m × n matrix M of
booleans True and False . For i = 1, . . . , m and j = 1, . . . , n, define
M (i, j) = True if ai R b j , and M (i, j) = False otherwise,
where M (i, j) is the usual notation for the position on the ith row and jth column of the matrix. For
example, if A = { a1 , a2 }, B = { b1 , b2 , b3 } and R = {h a1 , b1 i, h a2 , b1 i, h a2 , b2 i} as before, then the
matrix is
True False False
True True False
It is also common to use the elements 0, 1 or F , T instead of False , True .
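The matrix representation translates directly into a nested list of booleans; a sketch reproducing the example above:

```python
A = ['a1', 'a2']
B = ['b1', 'b2', 'b3']
R = {('a1', 'b1'), ('a2', 'b1'), ('a2', 'b2')}

# M[i][j] is True exactly when A[i] R B[j].
M = [[(a, b) in R for b in B] for a in A]

assert M == [[True, False, False],
             [True, True,  False]]
```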
Just as for products, we can extend the definition of a binary relation to an n-ary relation, for any n.

Definition 3.2 (HIGHER-ORDER RELATIONS ) An n-ary relation between the sets A1 , . . . , An is a subset
of the n-ary product ×ni=1 Ai . The definition of a 2-ary relation is the same as that of a binary relation
given in Definition 3.1. A predicate over the set A is a 1-ary relation: that is, a subset of A.

Example 3.3 1) The set { p ∈ IN | p is prime } is a unary relation on IN .


2) The set {h x, y, z i ∈ IR 3 | x2 + y2 + z2 = 1 } is a ternary relation on IR , which describes the surface
of the unit sphere with centre h 0, 0, 0 i.

3.1 Constructing Relations


Just as for sets, we may construct relations from others. We just give the definitions for binary relations;
it is easy to extend the definitions to higher numbers.

Definition 3.4 (BASIC RELATION OPERATORS ) Let R, S ⊆ A × B. We define the relations R ∪ S, R ∩ S
and R, all with type A × B, and R -- 1 with type B × A, by:
Relation Union : R ∪ S =∆ {h a, b i ∈ A × B | h a, b i ∈ R ∨ h a, b i ∈ S }
Relation Intersection : R ∩ S =∆ {h a, b i ∈ A × B | h a, b i ∈ R ∧ h a, b i ∈ S }
Relation Complement : R =∆ {h a, b i ∈ A × B | h a, b i ∉ R }
Inverse relation : R -- 1 =∆ {h b, a i ∈ B × A | a R b }
Example 3.5 Let R and S be binary relations on { 1, 2, 3, 4 } defined by:
R = {h 1, 2 i, h 2, 3 i, h 3, 4 i, h 4, 1 i, h 4, 4 i}
S = {h 1, 2 i, h 2, 1 i, h 3, 4 i, h 4, 3 i}
We can construct the following relations:
R ∪ S = {h 1, 2 i , h 2, 3 i, h 3, 4 i, h 4, 1 i, h 4, 4 i, h 2, 1 i, h 4, 3 i}
R ∩ S = {h 1, 2 i , h 3, 4 i}
R = {h 1, 1 i , h 1, 3 i, h 1, 4 i, h 2, 1 i, h 2, 2 i, h 2, 4 i, h 3, 1 i, h 3, 2 i, h 3, 3 i, h 4, 2 i, h 4, 3 i}
R -- 1 = {h 2, 1 i , h 3, 2 i, h 4, 3 i, h 1, 4 i, h 4, 4 i}
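On finite relations, the operators of Definition 3.4 act exactly as the corresponding set operations; a sketch checking the computations of Example 3.5:

```python
A = {1, 2, 3, 4}
R = {(1, 2), (2, 3), (3, 4), (4, 1), (4, 4)}
S = {(1, 2), (2, 1), (3, 4), (4, 3)}

union        = R | S                                   # relation union
intersection = R & S                                   # relation intersection
complement   = {(a, b) for a in A for b in A} - R      # complement within A²
inverse      = {(b, a) for (a, b) in R}                # inverse relation

assert intersection == {(1, 2), (3, 4)}
assert inverse == {(2, 1), (3, 2), (4, 3), (1, 4), (4, 4)}
assert len(complement) == 16 - len(R)                  # the 11 remaining pairs
```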
Notice that we have overloaded notation: R ∪ S and R ∩ S denote relation union and intersection
respectively when R and S are viewed as relations, and set union and intersection when viewed as sets.
To form a relation union or intersection, the relations must be of the same type; in contrast, we can form
the union and intersection of arbitrary sets. It should be clear from the context which interpretation we
intend. It is easy to verify that if we take the inverse of the inverse of a relation, we recover the original:
( R -- 1 ) -- 1 = R. Note that the inverse is only defined for binary relations and that every binary relation has
an inverse. Sometimes the inverse is indicated by reversing an infix symbol: ‘>’ is the inverse of ‘<’,
etc..
The inverse of a relation should not be confused with its complement: for example, the inverse of ‘is
a parent of’ is ‘is a child of’;
x parent of y ⇐⇒ y child of x
whereas the complement of ‘is a parent of’ is ‘is not a parent of’:
x ‘is not a parent of’ y ⇐⇒ ¬ ( x parent of y )
As well as inheriting structure from sets, relations have operations of their own which do not apply to
sets in general, like identity and composition, which we will now define.

Definition 3.6 (IDENTITY RELATION ) Given a set A, the identity relation on A, written Id A , is the
binary relation on A defined by
Id A =∆ {h x, y i ∈ A2 | x =A y }

Definition 3.7 (COMPOSITION OF RELATIONS ) Given R ⊆ A × B and S ⊆ B × C, then the composition
of R with S, written R ◦ S, is defined by
R ◦ S =∆ {h a, c i ∈ A × C | ∃ b ∈ B ( a R b ∧ b S c ) }

We will often write a R b S c for a R b ∧ b S c.


The notation R ◦ S can be read as R composed with S. Notice that the relation R ◦ S is only defined if
the types of R and S match up.
Example 3.8 1) We can define the relation ‘grandparent of’ =∆ parent of ◦ parent of, that is:
x grandparent of y =∆ ∃ z ( x parent of z ∧ z parent of y )

2) Using the relations R and S from Example 3.5, we have


R ◦ S = {h 1, 1 i , h 2, 4 i, h 3, 3 i, h 4, 2 i, h 4, 3 i}
S ◦ R = {h 1, 3 i , h 2, 2 i, h 3, 1 i, h 3, 4 i, h 4, 4 i}
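Composition (Definition 3.7) is a one-line set comprehension; the sketch below reproduces the values of part (2), and shows at the same time that the order of composition matters:

```python
def compose(R, S):
    # a (R ∘ S) c exactly when there is some b with a R b and b S c
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

R = {(1, 2), (2, 3), (3, 4), (4, 1), (4, 4)}
S = {(1, 2), (2, 1), (3, 4), (4, 3)}

assert compose(R, S) == {(1, 1), (2, 4), (3, 3), (4, 2), (4, 3)}
assert compose(S, R) == {(1, 3), (2, 2), (3, 1), (3, 4), (4, 4)}
assert compose(R, S) != compose(S, R)    # composition is not commutative
```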

3.2 Equalities between Relations


Recall from Proposition 2.12 that we can prove certain equalities between sets constructed through set
operations. We can also show similar equalities between relations constructed through operations on
relations.

Proposition 3.9 1) If R ⊆ A × B, then Id A ◦ R = R = R ◦ IdB .

2) Composition is associative: for arbitrary relations R ⊆ A × B and S ⊆ B × C and T ⊆ C × D, we
have R ◦(S ◦ T ) = ( R ◦ S) ◦ T.
Proof : The proof of part (1) is simple and left as Exercise 3.7; we prove part (2).
Let R, S, and T be relations specified in the proposition, and let h x, u i be an arbitrary member of
( R ◦ S) ◦ T. Then there exists z ∈ C such that x ( R ◦ S) z and z T u; but then there also exists y ∈ B such
that x R y and y S z. So there exists y ∈ B such that x R y and y S ◦ T u, so also x R ◦(S ◦ T ) u and we
have shown that ( R ◦ S) ◦ T ⊆ R ◦(S ◦ T ).
The reverse, showing that R ◦(S ◦ T ) ⊆ ( R ◦ S) ◦ T can be proved in a similar way.

Notice that, since relation composition is associative, we can omit the brackets, and write simply R ◦ S ◦ T
for ( R ◦ S) ◦ T, etc..

Proposition 3.10 Let R and S be arbitrary binary relations on A. In general,


1) R 6= R -- 1 ;
2) R ◦ S 6= S ◦ R (so composition is not commutative);
3) R ◦ R -- 1 6= Id A .
Proof : As for Proposition 2.13, the way to prove that a property does not hold is to provide a counterex-
ample. For a counterexample to part (1), take A = { a, b } with a ≠ b, and the relation R = {h a, b i}.
Then R -- 1 = {h b, a i}, which is different from R since a ≠ b. To show that composition is not commuta-
tive (part (2)), we have to find R and S such that R ◦ S ≠ S ◦ R. Well, take A = B = { a, b } with a ≠ b,
R = {h a, a i}, and S = {h a, b i}. Then R ◦ S = {h a, b i}, but S ◦ R = ∅.
Part (3) is left as Exercise 3.7.

3.3 Equivalence Relations


Sometimes we would like to consider objects ‘the same’ even when they are not (syntactically) identical,
like 7 and 3 + 4; we then call these objects equivalent. We can think of an equivalence relation as a weak
equality: if R is an equivalence relation, then a R b means that a and b are in some sense indistinguishable,
but not necessarily identical as objects or expressions.
We give some universal definitions of the properties a relation might satisfy.

Definition 3.11 Let R be a binary relation on A. Then
R is reflexive =∆ ∀ x ∈ A ( x R x )
R is symmetric =∆ ∀ x, y ∈ A ( x R y ⇒ y R x )
R is transitive =∆ ∀ x, z ∈ A ( ∃ y ∈ A ( x R y ∧ y R z ) ⇒ x R z )
Relations with these properties occur naturally: the equality relation on sets is reflexive, symmetric
and transitive; the relations ≤ on numbers and ⊆ on sets are reflexive and transitive, but not symmetric;
and the relations < on numbers and ⊂ on sets are transitive, but not reflexive nor symmetric.
Notice that the definition of transitivity uses implication: if an intermediate element y can be found
between x and z, the pair h x, z i is in the relation. Thereby a relation is transitive even if there is no
intermediate element to be found.
Another way to define these relations is in terms of the operations on relations introduced in the
previous section.

Proposition 3.12 Let R be a binary relation on A.


1) R is reflexive if and only if Id A ⊆ R.
2) R is symmetric if and only if R = R -- 1 .
3) R is transitive if and only if R ◦ R ⊆ R.

The proof is straightforward and is left as Exercise 3.9.
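Proposition 3.12 gives a direct way to test the three properties on finite relations; a sketch (the function names are mine, not from the notes):

```python
def identity(A):   return {(x, x) for x in A}
def inverse(R):    return {(b, a) for (a, b) in R}
def compose(R, S): return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def is_reflexive(R, A): return identity(A) <= R    # Id_A ⊆ R
def is_symmetric(R):    return R == inverse(R)     # R = R⁻¹
def is_transitive(R):   return compose(R, R) <= R  # R ∘ R ⊆ R

A = {0, 1, 2, 3}
leq = {(x, y) for x in A for y in A if x <= y}     # the relation ≤ on A
assert is_reflexive(leq, A) and is_transitive(leq)
assert not is_symmetric(leq)                       # ≤ is not symmetric
```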

Definition 3.13 (E QUIVALENCE R ELATION ) Let R be a binary relation on the set A. The relation R is
an equivalence relation when R is reflexive, symmetric, and transitive. We often write a ∼ R b for a R b
when R is an equivalence, or just a ∼ b when it is clear or unimportant which R is intended.

In words, we can say that R is an equivalence relation when: 1) we cannot distinguish an element from
itself; 2) if x is indistinguishable from y, then so is y from x; and 3) if x is indistinguishable from y and
y is indistinguishable from z, then so is x from z.
We can use an equivalence relation R to partition a set in a natural way into disjoint subsets such that
the elements in these subsets are related through R and thereby equivalent to each other.

Definition 3.14 (EQUIVALENCE CLASSES ) Let R be an equivalence relation on A. For any a ∈ A, the
equivalence class of a with respect to R, denoted [ a] R , is defined as
[ a] R =∆ { x ∈ A | a ∼R x }.

We often write [ a] instead of [ a] R when the relation R is apparent.


The set of equivalence classes in A over the equivalence relation R is sometimes called the quotient
set A / R (or A / ∼ R ).

Example 3.15 1) Given n ∈ IN , the binary relation Rn on ZZ defined by


Rn =∆ {h a, b i ∈ ZZ 2 | ∃ q ∈ ZZ ( q × n = b – a ) }

is an equivalence relation. The set ZZ / Rn represents the integers modulo n.


2) The binary relation S on the set ZZ ×(IN \ { 0 }) defined by h z1 , n1 i S h z2 , n2 i when z1 × n2 is equal
to z2 × n1 is an equivalence relation. We can write S as:
S =∆ {hh z1 , n1 i, h z2 , n2 ii ∈ (ZZ ×(IN \ { 0 }))2 | z1 × n2 = z2 × n1 }

This is the usual representation of the rational numbers.


3) For any set A, the equality relation =A is an equivalence relation.
4) For any set A, the identity relation Id A is an equivalence relation.
5) Looking ahead, in Definition 5.25 we will define a relation between sets which characterises when
(finite and infinite) sets have the same number of elements. Proposition 5.28 shows that this has all
the properties of an equivalence relation (it is not one, since the collection of all sets is not a set, as
we have seen).
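For a finite carrier set, equivalence classes (Definition 3.14) can be computed explicitly; a sketch building the integers modulo 3 of part (1), restricted to a finite slice of ZZ :

```python
def eq_class(a, A, R):
    # [a]_R = { x ∈ A | a R x }
    return frozenset(x for x in A if (a, x) in R)

A = range(-6, 7)                       # a finite slice of the integers
R3 = {(a, b) for a in A for b in A if (b - a) % 3 == 0}

quotient = {eq_class(a, A, R3) for a in A}
assert len(quotient) == 3              # the classes of 0, 1 and 2 modulo 3
```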

We can draw the separation of a set into equivalence classes in a diagram: the set A is divided into
disjoint regions A1 , . . . , A5 , one for each of the five equivalence classes. (Diagram omitted.) In fact,
an equivalence relation always separates the elements into disjoint subsets that cover the whole of the
set, as the following proposition states formally.

Proposition 3.16 Let R be an equivalence relation on A; then { V ∈ ℘ A | ∃ a ∈ A ( V = [ a] R ) } forms a
partition of A.
Proof : We check the conditions of Definition 2.25:
Each [ a] R is non-empty : Take a ∈ A, then a ∼R a by reflexivity and so a ∈ [ a] R and [ a] R ≠ ∅.
The classes cover A: We need to show that A = ∪ a∈ A [ a] R . Take a′ ∈ A; by the previous case, we
have a′ ∈ [ a′ ] R ; so also a′ ∈ ∪ a∈ A [ a] R , for every a′ ∈ A, and hence A ⊆ ∪ a∈ A [ a] R .
To show that ∪ a∈ A [ a] R ⊆ A, notice that since R ⊆ A2 , for all a ∈ A, if b ∈ [ a] R , then by defini-
tion 3.14, b ∈ A; so for all a ∈ A, [ a] R ⊆ A, so also ∪ a∈ A [ a] R ⊆ A.
The classes are disjoint (or equal) : We need to show that, for all a, b ∈ A, if [ a] R ∩ [b] R is not empty,
then [ a] R = [b] R . So assume [ a] R ∩ [b] R ≠ ∅, and let w ∈ [ a] R ∩ [b] R ; by definition a ∼R w and
b ∼R w, and by symmetry and transitivity we have also a ∼R b (so b ∈ [ a] R ).

To show [b] R ⊆ [ a] R , take v ∈ [b] R ; then by definition b ∼ R v. Since a ∼ R b, by transitivity also
a ∼ R v. Therefore v ∈ [ a] R for any v ∈ [b] R and so [b] R ⊆ [ a] R . Using a similar argument, we can
show [ a] R ⊆ [b] R ; so we have [ a] R = [b] R .

3.4 Transitive Closure


Consider the following situation. There are various flights between various cities (Manchester, Edin-
burgh, Knock, London, Dublin, Paris, Madrid, Rome). For any two cities, we wish to know whether it is
possible to fly from one to the other allowing for changes of plane. We can model this by defining a set
City of cities and a binary relation R such that a R b if and only if there exists a direct flight from a to b.
This relation may be represented as a directed graph with the cities as nodes. (Diagram omitted.)
Now define the relation R+ by: a R+ b when there exists a trip from a to b, perhaps using many flights.
Then clearly a R+ b implies there exists some path from a to b in the directed graph. For instance,
there exists a path from Manchester to Rome, but no path from Rome to Manchester. We would like to
calculate R+ from R. Such a relation is called the transitive closure of R, since it is clearly transitive; it
is in fact a special relation in the sense that it is the smallest transitive relation containing R.

Definition 3.17 (TRANSITIVE CLOSURE AND REDUCTION ) 1) We define Rn , with n ≥ 1, as follows:
R1 =∆ R
R2 =∆ R ◦ R
R3 =∆ R ◦ R2 = R2 ◦ R (◦ is associative)
…
Rn =∆ R ◦ Rn−1 = R ◦ · · · ◦ R (n times)
…
and define the transitive closure of R, R+, by:
R+ =∆ R ∪ R2 ∪ · · · ∪ Rn ∪ · · · = ∪i≥1 Ri

2) We write R∗ for the reflexive and transitive closure of R.


3) The transitive reduction R− of a transitive relation R is a smallest (it need not be unique) set S such
that S ⊆ R, and S+ = R. So
R− = {h a, b i ∈ R | ¬∃ c ∈ A ( a 6= c ∧ b 6= c ∧ h a, c i ∈ R ∧ h c, b i ∈ R ) }.
Informally, we have a Rn b when there exists a path of length n in R from a to b.
For acyclic relations R (so there exist no a0 , a1 , . . . , an such that n > 0, ai R ai+1 for all 0 ≤ i < n and
a0 = an ), R− is unique and is a sub-relation of R. However, uniqueness of R− fails for relations with
cycles, and for infinite relations not even existence is guaranteed. The benefit is that R− is smaller than R,
while in some sense having the same information content as R, since R can be reconstructed from R−. We
will return to this problem in the easier setting of partial orders in Section 4.

Therefore, we have a R+ b if and only if there exists n ≥ 1 such that a Rn b. The transitive closure of
a binary relation R on A always exists, but might require an infinite construction.
We can express the relation R+ ⊆ A2 in terms of R also informally: we have a R+ b when there exists
some n ≥ 1, and a ‘path’ of length n from a to b, i.e. there are c0 , . . . , cn ∈ A such that ci R ci+1 for
0 ≤ i < n, and a = c0 and b = cn .
a = c0 ∧ c0 R c1 ∧ c1 R c2 ∧ · · · ∧ c n −2 R c n −1 ∧ c n −1 R c n ∧ c n = b
For the flight example above, a pair of cities is in R+ exactly when such a path connects them in the graph.

To cast more light on R+, we can understand it also through the following construction. Let R be
finite, and imagine that we want to make R transitive in the most economical fashion (we could create
a transitive relation by just adding all pairs in A2 , but that would not correspond to the notion of ‘path’
that we want to capture). If R is already transitive, we need do nothing. Otherwise, there exist
a, b, c ∈ A such that a R b and b R c, but not a R c; we then add the pair h a, c i to the relation, which
is now in some sense closer to being transitive. We carry on doing this until there are no more pairs
to add and obtain a transitive relation. Anything we have added to R was forced upon us by the
requirement of transitivity alone, so we have obtained the smallest possible transitive relation containing R.
In case A is finite, we need not consider paths of length greater than n = | A | since they will involve
repeats (visiting the same node of the graph twice; this follows from the Pigeonhole Principle). So
Rn+1 is already included in R ∪ R2 ∪ · · · ∪ Rn and we need not calculate further as we have found R+.
In fact, we often do not have to go as far as | A |. In the airline example above there are 8 cities, but
the longest paths without repeats are of length 3. Thus we compute R, R2 , R3 , and R4 and find that
R4 ⊆ R ∪ R2 ∪ R3 , so we can stop.
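The ‘add forced pairs until nothing changes’ construction translates directly into code for finite relations; a sketch:

```python
def transitive_closure(R):
    # repeatedly add every pair forced by transitivity until none remain
    closure = set(R)
    while True:
        forced = {(a, d) for (a, b) in closure for (c, d) in closure
                  if b == c and (a, d) not in closure}
        if not forced:
            return closure
        closure |= forced

R = {(1, 2), (2, 3), (3, 4)}
assert transitive_closure(R) == {(1, 2), (2, 3), (3, 4),
                                 (1, 3), (2, 4), (1, 4)}
```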

Example 3.18 Let ≤IN be the standard ‘smaller or equal’ relation of IN . Then:
( ≤IN )− = {h n, m i ∈ IN 2 | m = n + 1 }
It is very tricky to define ( ≤IR )−, if possible at all.

Exercises
Exercise 3.1 Let R and S be binary relations on { 1, 2, 3, 4 } such that
R = {h 1, 2 i, h 2, 3 i, h 3, 4 i, h 4, 1 i, h 4, 4 i}
S = {h 1, 2 i, h 2, 1 i, h 3, 4 i, h 4, 3 i}
Give R ∪ S, R ∩ S, and R.
Exercise 3.2 Let A = { 1, 2, 3, 4 } , B = { a, b, c, d } , C = { α, β, γ }, and let
R = {h 1, a i , h 2, d i, h 3, a i, h 3, b i, h 3, d i}
S = {h b, α i , h b, γ i, h c, β i, h d, γ i}
Give the diagram and matrix representations of R and S. For the following relations, either give their
elements and types, or explain why they are not well defined: 1) R -- 1 2) S 3) R ∪ S and 4) R ◦ S.
Exercise 3.3 Give examples of relations on { 1, 2, 3, 4 } having the following properties: 1) reflexive,
symmetric, not transitive 2) reflexive, not symmetric, not transitive 3) not reflexive, not symmetric, and
transitive 4) symmetric, transitive, not reflexive. Explain your answers briefly.
Exercise 3.4 If A = n, how many binary relations are there on A? How many of these are reflexive?
Exercise 3.5 If A = k and B = m, how many relations are there between A and B? If in addition
C = n, how many ternary relations are there in A × B × C?
Exercise 3.6 Let R, S ⊆ A2 . Prove that the following statements are true: 1) If R ⊆ S then R -- 1 ⊆ S -- 1 ;
2) ( R ∩ S ) -- 1 = R -- 1 ∩ S -- 1 ; 3) ( R ∪ S ) -- 1 = R -- 1 ∪ S -- 1 ; 4) ( R ) -- 1 = R -- 1 (the inverse of the
complement equals the complement of the inverse); 5) ( R ◦ S ) -- 1 = S -- 1 ◦ R -- 1 .

Exercise 3.7 1) Show that if R ⊆ A × B, then Id A ◦ R = R = R ◦ IdB .
2) Show that, in general, R ◦ R -- 1 ≠ Id A .

Exercise 3.8 Let A = { 1, 2, 3, 4 } and let R be a binary relation on A.
1) If R is reflexive, what ordered pairs must belong to R?
2) If R is symmetric, what ordered pairs must belong to R?
3) If h 1, 2 i, h 3, 1 i ∈ R and R is symmetric, then what other ordered pairs must belong to R?
4) Is the relation {h 1, 4 i} transitive?
Exercise 3.9 Let R be a binary relation on A. Show that
1) R is reflexive if and only if Id A ⊆ R.
2) R is symmetric if and only if R = R -- 1 .
3) R is transitive if and only if R ◦ R ⊆ R.
Exercise 3.10 1) Is symmetric difference △ idempotent? Explain your answer.
2) Show by means of Venn diagrams that △ is associative.
3) Let A, B be sets. Use associativity and other laws on △, which you should state, to simplify the
following:
( A △ B) △ ( A △ ( B △ A))
Your answer may involve A and/or B, but should have no occurrences of △.
Exercise 3.11 Say whether each of the following relations on { 1, 2, 3, 4, 5 } is reflexive, symmetric or
transitive (drawing directed graphs may help you).
1) R1 = {h 3, 2 i , h 1, 4 i, h 5, 5 i, h 4, 1 i}
2) R2 = {h 2, 2 i}
3) R3 = {h 4, 2 i , h 1, 4 i, h 3, 4 i, h 1, 2 i, h 3, 2 i}
4) R4 = {h 1, 1 i , h 2, 2 i, h 3, 3 i, h 4, 4 i, h 5, 5 i, h 1, 5 i}
5) R5 = {h 1, 1 i , h 2, 2 i, h 3, 3 i, h 4, 4 i, h 5, 5 i, h 1, 5 i, h 5, 1 i}
Exercise 3.12 1) Give matrix and directed graph representations of the following binary relation:
R = {(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (2, 4), (3, 4), (4, 1)}
on { 1, 2, 3, 4 } .
2) Also give matrix and directed graph representations for the complement R and the inverse R -- 1 .
Exercise 3.13 Let R, S be binary relations on a set A.
1) Show that ( R ∪ S ) -- 1 = R -- 1 ∪ S -- 1 .
2) Use (1) to deduce that R ∪ R -- 1 is symmetric.
3) Show that ( R ◦ S ) -- 1 = S -- 1 ◦ R -- 1 .
4) Now suppose that R ⊆ S and S is symmetric. Show that R ∪ R -- 1 ⊆ S.
5) Use part (3) to show that R symmetric implies R ◦ R is symmetric.
Exercise 3.14 Let R be a binary relation on A.
1) Explain why if R is symmetric, also R+ is symmetric.
2) Show that ( R ∪ R2 )+ = R+.
Exercise 3.15 Let A = { 1, 2, 3, 4 } and consider the relation
R = {h 1, 1 i, h 2, 2 i, h 2, 3 i, h 3, 2 i, h 4, 2 i, h 4, 4 i} ⊆ A2
Is R reflexive, symmetric, or transitive? If not, give the elements that are missing.
Exercise 3.16 1) Is R ◦ R -- 1 reflexive for all binary relations R on A?
2) Is a reflexive relation always symmetric?
3) Is a symmetric relation always reflexive?
4) Is the union of two symmetric relations always symmetric?

5) Is the complement of a transitive relation a transitive relation?
6) Is the complement of a non-symmetric relation a symmetric relation?
Exercise 3.17 Let R and S be binary relations on A.
1) Give specific relations R and S, such that R and S are transitive but R ∪ S is not.
2) Prove that if R and S are transitive then R ∩ S is transitive.
3) Give specific relations R and S, such that R and S are transitive but R ◦ S is not.
Exercise 3.18 A binary relation R on a set A is called circular when it satisfies:
∀ a, c ∈ A ( ∃ b ∈ A ( a R b ∧ b R c ) ⇒ c R a )
Prove that a relation R is an equivalence relation on A if and only if it is reflexive and circular on A.
Exercise 3.19 1) Suppose R = {h 1, 2 i , h 2, 3 i, h 3, 4 i, h 4, 1 i}. Give R+. Illustrate both R and its transi-
tive closure R+ as directed graphs.
2) Let R be the binary relation on the natural numbers defined by x R y =∆ y = 2x. Give R+.
Exercise 3.20 Find a binary relation R on a set A with | A| = n such that R+ 6= R ∪ R2 ∪ . . . ∪ Rn−1 .

4 Orderings
Orderings are special binary relations on sets which characterise when one object is greater than (in any
sense of the word) another. Orderings on sets of numbers such as IN , ZZ and IR are familiar: the ordering
< describes the less than ordering, and ≤ the less than or equal ordering. We can define orderings on
all sets; here we give the formal definitions and properties of some orderings.

Definition 4.1 (O RDERINGS ) Let R be a binary relation on a set A. We distinguish the following
different notions of orders.
R is a pre-order : R is reflexive and transitive (so not necessarily symmetric).
R is anti-symmetric : ∀ x, y ∈ A ( x R y ∧ y R x ⇒ x =A y ) .
R is a partial order relation : R is an anti-symmetric pre-order (so is reflexive, transitive, and anti-
symmetric).
R is irreflexive : ∀ a ∈ A ( ¬ ( a R a) ).
R is a strict partial order relation : R is irreflexive and transitive.
R is a total order : A partial order that also satisfies: ∀ a, b ∈ A ( a R b ∨ b R a ) .
Partial orders are often denoted by ≤ and we write ( A, ≤) or ≤ A to denote a partial order ≤ on A; strict
partial orders are often denoted by <. Total orders are also sometimes called linear orders.

It is important to notice that, contrary to intuition, a strict partial order relation is not a partial order
relation.
Remember that if R is symmetric then, for all x and y, if h x, y i ∈ R, then also h y, x i ∈ R. Then anti-
symmetry expresses that a relation is nowhere symmetric other than in a trivial way: for no pair h x, y i in
R with x 6= y is h y, x i also in R. Notice, however, that anti-symmetric is not the opposite of symmetric,
since a relation can be both symmetric and anti-symmetric: take for example the identity relation or the
empty relation.
Similarly, irreflexive is not the same as not reflexive; irreflexive means that the relation is non-reflexive
at every element:
∀ a ∈ A ( ¬ (a R a) )
and not reflexive means that there is at least one element at which reflexivity fails:
¬ ∀ a ∈ A ( a R a ) = ∃ a ∈ A ( ¬ (a R a) )

Notice the difference in quantifier.
Every strict partial order S on A can be extended to a partial order R by: R =∆ S ∪ Id_A. Also, every partial order ≤ on A has an associated strict partial order, defined as ≤ \ Id_A: a < b =∆ a ≤ b ∧ a ≠ b.

Example 4.2 1) Division on IN \ { 0 } induces a partial order: define
≤/ =∆ {⟨n, m⟩ ∈ IN² | ∃ k ∈ IN ( m = k × n )}
then ≤/ is a partial order.


2) For any set A, the power set of A ordered by subset inclusion, (℘ A, ⊆), is a partial order.
3) Suppose ( A, ≤_A ) is a partial order and B ⊆ A. Then ( B, ≤_B ) is a partial order, where ≤_B is defined as the restriction of ≤_A to the set B²:
≤_B =∆ {⟨b₁, b₂⟩ ∈ B² | b₁ ≤_A b₂}

4) Define a relation on formulae by: A ≤ B when ⊢ A → B. Then ≤ is a pre-order. For example, false ≤ A ≤ true and A ≤ A ∨ B.
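The axioms of Definition 4.1 are directly checkable when a relation is given as a finite set of pairs. The following sketch (not from the notes; the function names are chosen here for illustration) tests them on the division ordering of Example 4.2(1), restricted to a finite set:

```python
# Illustrative sketch: checking the axioms of Definition 4.1 for a
# relation R on a finite set A, both given explicitly.

def is_reflexive(A, R):
    return all((a, a) in R for a in A)

def is_transitive(A, R):
    # whenever (x, y) and (y, z) are in R, so is (x, z)
    return all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

def is_antisymmetric(A, R):
    return all(not ((x, y) in R and (y, x) in R and x != y)
               for x in A for y in A)

def is_partial_order(A, R):
    return is_reflexive(A, R) and is_transitive(A, R) and is_antisymmetric(A, R)

# Divisibility, as in Example 4.2(1), restricted to {1, ..., 12}:
A = set(range(1, 13))
divides = {(n, m) for n in A for m in A if m % n == 0}
print(is_partial_order(A, divides))  # True
```

The same helpers show why a symmetric relation with a non-trivial pair, such as {⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 1⟩, ⟨2, 2⟩}, fails anti-symmetry.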

Definition 4.3 For any two partially ordered sets ( A, ≤_A ) and ( B, ≤_B ), there are two important orders on the product set A × B:
Product order: ⟨a₁, b₁⟩ ≤_P ⟨a₂, b₂⟩ =∆ ( a₁ ≤_A a₂ ) ∧ ( b₁ ≤_B b₂ )
Lexicographic order: ⟨a₁, b₁⟩ ≤_L ⟨a₂, b₂⟩ =∆ ( a₁ <_A a₂ ) ∨ ( a₁ =_A a₂ ∧ b₁ ≤_B b₂ )
If ≤_A and ≤_B are both total orders, then the lexicographic order on A × B will be total (see Exercise 4.6). By contrast, the product order will in general only be partial.
For example, the left diagram represents the set {⟨r₁, r₂⟩ ∈ IR² | ⟨r₁, r₂⟩ ≤_P ⟨10, 5⟩}, whereas the set {⟨r₁, r₂⟩ ∈ IR² | ⟨r₁, r₂⟩ ≤_L ⟨10, 5⟩} is represented by the right.
[Figure: two diagrams in the x, y-plane marking the point ⟨10, 5⟩; the left shades the region with x ≤ 10 and y ≤ 5 (product order), the right shades everything with x < 10 together with the part of the line x = 10 where y ≤ 5 (lexicographic order).]

Proposition 4.4 For any partially ordered sets ( A, ≤ A ) and ( B, ≤B ), the product order is contained in
the lexicographic order: for all h a1 , b1 i, h a2 , b2 i ∈ A × B, if h a1 , b1 i ≤ P h a2 , b2 i then h a1 , b1 i ≤ L h a2 , b2 i.
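Proposition 4.4 can be checked exhaustively on a small grid. The sketch below (illustrative only; it assumes components ordered by the usual ≤ on integers) implements both orders of Definition 4.3 and verifies the containment, together with a witness that the converse fails:

```python
# Sketch: the product and lexicographic orders of Definition 4.3 on
# pairs of integers, and an exhaustive check of Proposition 4.4.

def leq_P(p, q):  # product order
    return p[0] <= q[0] and p[1] <= q[1]

def leq_L(p, q):  # lexicographic order
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

grid = [(a, b) for a in range(5) for b in range(5)]

# Every product-order pair is also a lexicographic-order pair:
assert all(leq_L(p, q) for p in grid for q in grid if leq_P(p, q))

# The converse fails: <0,3> <=_L <1,0>, but not <0,3> <=_P <1,0>.
assert leq_L((0, 3), (1, 0)) and not leq_P((0, 3), (1, 0))
```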

Definition 4.5 For any totally ordered (finite) set A, let A∗ stand for the set { ǫ } ∪ A ∪ A2 ∪ A3 ∪ · · · ,
the set of all finite sequences made from elements of A, with ǫ denoting the empty sequence.
The full lexicographic order ≤ F on A∗ is defined as follows: given two sequences u, v ∈ A∗ , if u = ǫ
then u ≤ F v and if v = ǫ then v ≤ F u. Otherwise, both u and v are non-empty so we can write u = xu′
and v = yv′ where x and y are the first symbols of u and v, respectively. So
u ≤_F v =∆ ( u = ǫ ) ∨ ( u = xu′ ∧ v = yv′ ∧ ( x <_A y ∨ ( x = y ∧ u′ ≤_F v′ )))

Remark that the full lexicographic order gives the ordering for words in a dictionary, using the total order
on characters that is defined by the alphabet.
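The recursion in Definition 4.5 can be transcribed directly. The sketch below (illustrative; it uses Python strings as sequences, with the alphabet ordered by Python's character comparison) agrees with the built-in dictionary order on strings:

```python
# Sketch of the full lexicographic order <=_F of Definition 4.5, on
# strings over the ordered alphabet of characters.

def leq_F(u, v):
    if u == "":                 # the empty sequence is below everything
        return True
    if v == "":                 # v empty, u non-empty: v <=_F u only
        return False
    x, u1 = u[0], u[1:]
    y, v1 = v[0], v[1:]
    return x < y or (x == y and leq_F(u1, v1))

# This is exactly the dictionary order, so it agrees with Python's
# built-in string comparison:
words = ["", "a", "ab", "abc", "b", "ba"]
assert all(leq_F(u, v) == (u <= v) for u in words for v in words)
```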

Example 4.6 The relation ‘is a divisor of’ for the set { 1, 2, 3, 6, 12, 18 } and the binary relation ⊆ on ℘{ 1, 2, 4 } are, respectively,

⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 6⟩, ⟨1, 12⟩, ⟨1, 18⟩, ⟨2, 2⟩, ⟨2, 6⟩, ⟨2, 12⟩, ⟨2, 18⟩,
⟨3, 3⟩, ⟨3, 6⟩, ⟨3, 12⟩, ⟨3, 18⟩, ⟨6, 6⟩, ⟨6, 12⟩, ⟨6, 18⟩, ⟨12, 12⟩, ⟨18, 18⟩

and

⟨∅, ∅⟩, ⟨∅, { 1 }⟩, ⟨∅, { 2 }⟩, ⟨∅, { 4 }⟩, ⟨∅, { 1, 2 }⟩, ⟨∅, { 1, 4 }⟩, ⟨∅, { 2, 4 }⟩, ⟨∅, { 1, 2, 4 }⟩,
⟨{ 1 }, { 1 }⟩, ⟨{ 1 }, { 1, 2 }⟩, ⟨{ 1 }, { 1, 4 }⟩, ⟨{ 1 }, { 1, 2, 4 }⟩,
⟨{ 2 }, { 2 }⟩, ⟨{ 2 }, { 1, 2 }⟩, ⟨{ 2 }, { 2, 4 }⟩, ⟨{ 2 }, { 1, 2, 4 }⟩,
⟨{ 4 }, { 4 }⟩, ⟨{ 4 }, { 1, 4 }⟩, ⟨{ 4 }, { 2, 4 }⟩, ⟨{ 4 }, { 1, 2, 4 }⟩,
⟨{ 1, 2 }, { 1, 2 }⟩, ⟨{ 1, 2 }, { 1, 2, 4 }⟩, ⟨{ 1, 4 }, { 1, 4 }⟩, ⟨{ 1, 4 }, { 1, 2, 4 }⟩,
⟨{ 2, 4 }, { 2, 4 }⟩, ⟨{ 2, 4 }, { 1, 2, 4 }⟩, ⟨{ 1, 2, 4 }, { 1, 2, 4 }⟩

4.1 Analysing Partial Orders
We will now look at properties of partial orders.

Definition 4.7 Let ( A, R ) be a partial order and a ∈ A.
   a is minimal =∆ ∀ b ∈ A ( b R a ⇒ b =_A a )
   a is least =∆ ∀ b ∈ A ( a R b )
   a is maximal =∆ ∀ b ∈ A ( a R b ⇒ a =_A b )
   a is greatest =∆ ∀ b ∈ A ( b R a )

Remember that, since R is a partial order, it is reflexive, so ∀ a ∈ A ( a R a ) .

Example 4.8 In Example 4.6, the least (and minimal) element of the first relation is 1, the maximal elements are 12 and 18, and there exists no greatest element. The least (and minimal) element of the second relation is ∅, and the greatest (and maximal) element is { 1, 2, 4 }.
With the usual partial order (IN, ≤), the least element is 0, and there exists no maximal element. The partial orders (IN, ≤) and (ZZ, ≤) are different from each other: the latter has no least element.
There can be many minimal elements: take W = { V ⊆ IN | |V| ≥ 1 }, then each { n } is minimal in (W, ⊆) and there exists no least element.
Proposition 4.9 Let ( A, ≤) be a partial order.
1) If A has a least element, then it is a minimal element.
2) If A has a least element, then it is unique.
3) If A is finite and non-empty, then ( A, ≤) has a minimal element.
4) If ( A, ≤) is a total order, where A is finite and non-empty, then it has a least element.
Proof : To prove part (1), suppose that a ∈ A is least and assume towards a contradiction that b < a for
some b ∈ A such that b 6= a. But a ≤ b by definition of a being least. This contradicts anti-symmetry.
To prove part (2), suppose that a and b are both least elements. By definition of least element, we have
a ≤ b and b ≤ a. By anti-symmetry, it follows that a = b.
To prove part (3), pick any a₁ ∈ A. If a₁ is not minimal we can pick a₂ < a₁. If a₂ is not minimal, we can pick a₃ < a₂. In this way, we either find a minimal element, or get an infinitely long decreasing chain a₁ > a₂ > a₃ > · · ·. By construction, all the elements of the chain are different, which contradicts the fact that A is a finite set. Part (4) is left as Exercise 4.3.
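For a finite partial order given explicitly, the notions of Definition 4.7 amount to simple searches. The sketch below (illustrative; the helper names are chosen here) finds the minimal, maximal, and least elements of the divisor relation from Example 4.6:

```python
# Sketch: minimal, maximal, and least elements (Definition 4.7) of a
# finite partial order, on the divisor relation of Example 4.6.

def minimal_elements(A, R):
    # a is minimal: anything below a equals a
    return {a for a in A if all(b == a for b in A if (b, a) in R)}

def maximal_elements(A, R):
    # a is maximal: anything above a equals a
    return {a for a in A if all(b == a for b in A if (a, b) in R)}

def least_element(A, R):
    # a is least: a is below everything; returns None if there is none
    least = [a for a in A if all((a, b) in R for b in A)]
    return least[0] if least else None

A = {1, 2, 3, 6, 12, 18}
R = {(n, m) for n in A for m in A if m % n == 0}
assert minimal_elements(A, R) == {1}
assert maximal_elements(A, R) == {12, 18}   # hence no greatest element
assert least_element(A, R) == 1
```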

4.2 Well-founded Partial Orders


Suppose we wish to prove that the Ackermann function
   Ack(n, m) = m + 1                          (n = 0)
   Ack(n, m) = Ack(n – 1, 1)                  (n > 0 ∧ m = 0)
   Ack(n, m) = Ack(n – 1, Ack(n, m – 1))      (n > 0 ∧ m > 0)
always terminates, and therefore defines a total function. Counting down from n is not good enough,
since the third equation does not decrease the first argument, because of the embedded Ack(n, m – 1). We
will devise a different way of counting down, by defining a well-founded partial order with the property
that it always decreases to a terminating state.

Definition 4.10 (W ELL - FOUNDED PARTIAL ORDERS ) A partial order ( A, ≤) is well-founded if it has
no infinite decreasing chain of elements: that is, for every infinite sequence a1 , a2 , a3 , . . . of elements in
A with a1 ≥ a2 ≥ a3 ≥ · · · , there exists m ∈ IN such that m ≥ 1 and an = am for every n ≥ m.

For example, the conventional numerical order ≤IN is a well-founded partial order. This is not the case
for ≤ZZ , which can decrease for ever.

Proposition 4.11 If two partial orders ( A, ≤ A ) and ( B, ≤B ) are well-founded, then the lexicographical
order ≤ L on A × B (see Definition 4.3) is also well-founded.
Proof: We need to show that every infinite decreasing chain is ultimately constant. So suppose ⟨a₁, b₁⟩ ≥_L ⟨a₂, b₂⟩ ≥_L ⟨a₃, b₃⟩ ≥_L · · ·. Then a₁ ≥_A a₂ ≥_A a₃ ≥_A · · · by definition of the lexicographic ordering. Since ( A, ≤_A ) is well-founded, this sequence ultimately consists of the same element: there exists m ∈ IN such that aₙ = aₘ for every n ≥ m.
Now by the definition of the lexicographic order, we have bₘ ≥_B bₘ₊₁ ≥_B bₘ₊₂ ≥_B · · ·, and this sequence also ultimately ends up being constant because ( B, ≤_B ) is well-founded. Thus, the original sequence is ultimately constant.

This result implies that if two partial orders ( A, ≤_A ) and ( B, ≤_B ) are well-founded, the product order on A × B is as well, since the product order is contained in the lexicographic order (see Proposition 4.4).

Example 4.12 Take the Ack function, and consider the strict lexicographic order on IN² given by
   ⟨x, y⟩ <_L ⟨x′, y′⟩ =∆ x <_IN x′ ∨ ( x = x′ ∧ y <_IN y′ )
Notice that, for any x and y,
   ⟨x + 1, 0⟩ >_L ⟨x, 1⟩
   ⟨x + 1, y + 1⟩ >_L ⟨x, m⟩   (for all m ∈ IN)
   ⟨x + 1, y + 1⟩ >_L ⟨x + 1, y⟩
(notice that the second case implies ⟨x + 1, y + 1⟩ >_L ⟨x, Ack(x + 1, y)⟩) and so evaluating the Ack
function takes us down the order. Moreover, using Proposition 4.11 and the fact that (IN , <IN ) is well-
founded we can show that the order is well-founded. Even though an element such as h 4, 3 i has infinitely
many other elements below it (for example, h 3, y i for every y), any decreasing chain is finite. Hence,
Ack is a total function and always returns an answer.
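The descent down the lexicographic order can be made visible in code. This sketch (illustrative only) computes Ack and asserts, at each recursive call, that the argument pair strictly decreases in the order of Example 4.12:

```python
# Sketch: the Ackermann function, with an explicit check that every
# recursive call strictly decreases the pair <n, m> in the
# lexicographic order of Example 4.12, which guarantees termination.
import sys
sys.setrecursionlimit(100000)  # the recursion is deep, though finite

def less_L(p, q):
    return p[0] < q[0] or (p[0] == q[0] and p[1] < q[1])

def ack(n, m):
    if n == 0:
        return m + 1
    if m == 0:
        assert less_L((n - 1, 1), (n, m))        # <x+1,0> >_L <x,1>
        return ack(n - 1, 1)
    assert less_L((n, m - 1), (n, m))            # inner call descends
    inner = ack(n, m - 1)
    assert less_L((n - 1, inner), (n, m))        # outer call descends
    return ack(n - 1, inner)

assert ack(1, 1) == 3
assert ack(2, 2) == 7
assert ack(3, 3) == 61
```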

One of the more baffling things in mathematics (and a direct result of the axiom of choice, or, equiv-
alently, directly implied by the axiom of well-ordering (see the last section)) is that there exists a well-ordering on every set, so in particular on IR; this is, of course, not the familiar order ≤.
From the axiom of choice we can prove that it exists, but only by showing that if it does not exist, we
get a contradiction. So we know that it cannot not exist, but do not know how it is defined; the proof is
not constructive in that it does not provide the relation. Likewise, the well-ordering axiom just states the
existence, not how to define it.

Exercises

Exercise 4.1 1) Let ≤_D denote a binary relation on the positive natural numbers, defined by n ≤_D m =∆ n divides m. Prove that ≤_D is a partial order.
2) Take S = { 2, 4, 5, 10, 12, 20, 25 } . Give ≤ D − on S. Which elements are maximal, and which mini-
mal?
3) Explain why the partial order in part (2) is not total. Find a total order ≤ T on S such that n ≤ D m
implies n ≤ T m.
Exercise 4.2 Definition 4.7 defines the notions of minimal, maximal, least, and greatest element for a partial order. Give the definition of those concepts for strict partial orders <, where a < b =∆ a ≤ b ∧ a ≠ b.
Exercise 4.3 1) Let ( A, ≤ ) be a partial order; show that the relation < =∆ {⟨a, b⟩ | a ≤ b ∧ a ≠ b} is a strict partial order.
2) Show that, if ( A, ≤) is a total order, where A is finite and non-empty, then it has a least element.
Exercise 4.4 Let R be a strict partial order on A.
1) Show that R is asymmetric: if x R y, then ¬ (y R x ).

2) Let R′ =∆ R ∪ Id_A. Show that R′ is a partial order.

Exercise 4.5 Let (℘ ( A), ⊆) denote the standard partial order on the subsets of A, given by set inclusion.
Take S = { a, b, c, d } .
1) Determine whether there exists a greatest and least element of (℘ A, ⊆).
2) Give ⊆− on ℘ S; you can omit Id℘ S from your answer.
3) Explain why (℘ S, ⊆) is not total. Find a total order (℘ S, ⊆ T ) such that A ⊆ B implies A ⊆ T B
for every A, B ∈ ℘ S.
Exercise 4.6 Let ≤_A be a total order on A, and ≤_B a total order on B. Show that the lexicographic order ≤_L on A × B is a total order.
Exercise 4.7 Take A = { 1, 2, 3, 4 } , and let ≤ L denote the lexicographic order on A2 that is the natural
extension of ≤ on A.
1) Find all pairs in A2 which are less than h 2, 3 i.
2) Find all pairs in A2 which are greater than h 3, 1 i.
3) Give ≤L − on A2 .
* Exercise 4.8 Define orderings on the (total) functions in IN → IN as follows:
   f ≤1 g =∆ ∀ n ∈ IN ( f (n) ≤ g(n) )
   f ≤2 g =∆ ∃ m ∈ IN ∀ n ∈ IN ( n ≥ m ⇒ f (n) ≤ g(n) )
1) Show that ≤1 is a partial ordering. What is its least element?
2) Is <1 well-founded? Justify your answer, by either producing an infinite descending chain or ex-
plaining why one cannot exist.
3) Show that ≤2 is a pre-order. Explain why it is not a partial order.

5 Functions
In this section we will formalise the notion of mathematical function as a special kind of relation, giving
the basic definitions, ways of constructing functions, and properties for reasoning about functions. In
the next section we will give the definition of computable functions, a notion that underpins computer
science.

5.1 Introducing Functions


Definition 5.1 (F UNCTIONS ) 1) A function f from a set A to a set B, written f : A → B, is a relation
f ⊆ A × B such that every element of A is related to exactly one element of B; more formally, it is
a relation which satisfies the following properties:

∀ b1 , b2 ∈ B ( ∃ a ∈ A ( h a, b1 i ∈ f ∧ h a, b2 i ∈ f ) ⇒ b1 = b2 ) (1)
∀ a ∈ A ∃ b ∈ B ( h a, b i ∈ f ) (2)

which state that, for every a ∈ A: 1) there exists at most one b ∈ B that is related to a by f ; and 2)
there exists at least one b ∈ B that is related to a by f .
2) The set A is called the domain (or source type) of f , and B is the co-domain (or range, or target
type) of f .
3) If a ∈ A, then f ( a) ( f applied to the argument a) denotes the unique b ∈ B such that h a, b i ∈ f ,
and f is said to map a to b. We also say that b is the image of a under f , or that a is a pre-image
(or inverse image) of b under f . Note that elements of the domain always have a single image, but
elements of the co-domain can have more than one pre-image or may have none.
4) If the domain A is the n-ary product A1 × · · · × An , we normally write f ( a1 , . . . , an ) instead of
f (h a1 , . . . , an i); we say that f is then a function of n arguments or an n-ary function.

5) We write B^A for the set of all functions from A to B. 25
   B^A =∆ { f ∈ ℘( A × B ) | ∀ b₁, b₂ ∈ B ( ∃ a ∈ A ( f (a) = b₁ ∧ f (a) = b₂ ) ⇒ b₁ = b₂ ) ∧ ∀ a ∈ A ∃ b ∈ B ( f (a) = b ) }

Notice that B^A is a set and that B^A ⊆ ℘( A × B ). We see f : A → B as an alternative notation for f ∈ B^A and A → B for B^A, and are therefore allowed to write ∃ f : A → B ( · · · ) for ∃ f ∈ B^A ( · · · ) and ∀ f : A → B ( · · · ) for ∀ f ∈ B^A ( · · · ), since we are essentially quantifying over the set B^A.
We define equality on functions.
Definition 5.2 Let f : A → B and h : A → B. Then f =_{A→B} h =∆ ∀ a ∈ A ( f (a) =_B h(a) ).

So two functions are equal when they have the same domain and return the same value for every element
of that domain. This implies that we consider functions equal only on the basis of their effect, not on
how they are defined.

Example 5.3 Let f : IN → IN and g : IN → IN be defined as:
   f (n) = 0                   (n = 0)
   f (n) = n + f (n – 1)       (n > 0)
   g(n) = n × (n + 1) / 2
Then f = g.
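The equality of the two definitions in Example 5.3 can be sampled in code; an actual proof for all n needs induction, treated elsewhere in the course. A small illustrative sketch:

```python
# Sketch: the recursive and closed-form definitions of Example 5.3
# compute the same value, checked here on an initial segment of IN.

def f(n):
    return 0 if n == 0 else n + f(n - 1)

def g(n):
    return n * (n + 1) // 2   # integer division; the product is even

assert all(f(n) == g(n) for n in range(200))
```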

Definition 5.4 Let f : A → B. For any V ⊆ A, we define the image of V under f to be
   f [V] =∆ { b ∈ B | ∃ a ∈ V ( b = f (a) ) }
The set f [A] ⊆ B of all images of f is called the image set of f .

Notice that, in general, f [A] ≠ B.


We explore some examples.

Example 5.5 Since functions are defined as binary relations, we can use the representations given in
Section 3 for relations to describe functions. When the domain and co-domain are finite and not too
large, a useful representation is the diagram representation.
1) Let A = { 1, 2, 3 } and B = { a, b, c }. Let f : A → B (notice that then f ⊆ A × B) be defined by f = {⟨1, a⟩, ⟨2, b⟩, ⟨3, a⟩}. Then f is a function, as is clear from the diagram. Then f [{ 1, 2, 3 }] = { a, b }; f [{ 1, 3 }], the image of { 1, 3 } under f , is { a }.
   [Diagram: 1 ↦ a, 2 ↦ b, 3 ↦ a]
2) Let A = { 1, 2, 3 }, B = { a, b }, and f ⊆ A × B be defined by f = {⟨1, a⟩, ⟨1, b⟩, ⟨2, b⟩, ⟨3, a⟩}. This f is not a function, since one element of A is related to two elements in B, as is evident from the diagram:
   [Diagram: 1 ↦ a, 1 ↦ b, 2 ↦ b, 3 ↦ a]
3) The following are examples of functions with infinite
domains and codomains:
a ) the function f : IN 2 → IN defined by f ( x, y) = x + y; f [IN 2 ] = IN .
b ) the function g : IN → IN defined by g( x) = x2 ; g[IN ] = { 0, 1, 4, 9, 16, 25, . . . }.
c ) the function h : IR → IR defined by h( x) = x + 3; h[IN ] = { n ∈ IN | n ≥ 3 }.
4) The binary relation R on IR defined by R =∆ {⟨x, y⟩ ∈ IR² | x = y²} is not a function, since for example 4 relates to both 2 and −2.

25 This notation is justified by Proposition 5.6.

Notice that it might be tempting to write { x2 | x ∈ IN } in part (3.b ), but that would not be a correctly
defined set. Rather, we should write { y ∈ IN | ∃ x ∈ IN ( y = x2 ) }.

In the finite case, we can give an exact measure for the set of functions from A to B.
Proposition 5.6 If |A| = m and |B| = n, then |B^A| = n^m.
Proof: For each element of A, there are n independent ways of mapping it to an element of B. We can view this as establishing how many numbers we can represent with m positions in base n: the answer to this is n^m.

Definition 5.7 (CHARACTERISTIC FUNCTION) 1) Let A be a set. The characteristic function of B ⊆ A is the function χ_B : A → { 0, 1 } defined as:
   χ_B(a) = 1   (a ∈ B)
   χ_B(a) = 0   (a ∈ A \ B)
2) The characteristic function of B ⊆ A₁ × · · · × Aₙ is the function χ_B : A₁ × · · · × Aₙ → { 0, 1 } defined as:
   χ_B(a₁, . . . , aₙ) = 1   (⟨a₁, . . . , aₙ⟩ ∈ B)
   χ_B(a₁, . . . , aₙ) = 0   (⟨a₁, . . . , aₙ⟩ ∉ B)

In the literature, the characteristic function of a set B is also written as 1B , IB , or f B ; notice that the
‘surrounding set’ A is implicit in that we assume we know what B is a subset of.
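A characteristic function is easy to realise in code. The following minimal sketch (the name char_fun is chosen here for illustration) builds χ_B for a given subset B:

```python
# Sketch of Definition 5.7: the characteristic function of B, as a
# Python function mapping elements to {0, 1}. The surrounding set A
# is implicit, as in the text.

def char_fun(B):
    return lambda a: 1 if a in B else 0

A = {0, 1, 2, 3, 4, 5}
B = {1, 3, 5}
chi_B = char_fun(B)
assert [chi_B(a) for a in sorted(A)] == [0, 1, 0, 1, 0, 1]
```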

5.2 Partial Functions


We have defined functions to be total (i.e. to have a value for every argument in the domain), following
usual mathematical practice. A partial function is a function which need not be defined on every member
of its domain. It therefore assigns to each element of its domain at most one element of the range.

Definition 5.8 A partial function f from a set A to a set B is a relation f ⊆ A × B such that just some
elements of A are related to unique elements of B; more formally, it is a relation which satisfies only the
first clause of Definition 5.1:
∀ a ∈ A, b1 , b2 ∈ B[h a, b1 i ∈ f ∧ h a, b2 i ∈ f ⇒ b1 = b2 ]
A function that also satisfies the second clause
∀ a ∈ A ∃ b ∈ B ( h a, b i ∈ f )
so has an image for each element of A, is then called total.

A partial function f is regarded as undefined on those elements which do not have an image under f .
It is sometimes convenient to refer to this undefined value explicitly as ⊥ (bottom); a partial function f
from A to B can straightforwardly be extended to a (total) function from A to B ∪ {⊥}, by mapping all
the elements of A that have no image under f to ⊥.
Remark that, in general, partial functions are not functions; i.e. if f : A → B is truly a partial function, in the sense that the set of elements on which f is defined is a strict subset of A, then f is not a function.

Example 5.9 1) The relation R = {h 1, a i , h 3, a i} ⊆ { 1, 2, 3 }×{ a, b } is a partial function, shown in the


diagram on the left. We can see that not every element in A maps to an element in B. It is extended
to a total function as in the diagram on the right:

   [Diagrams: left, the partial function R with 1 ↦ a, 3 ↦ a, and 2 unmapped; right, its total extension with 1 ↦ a, 3 ↦ a, 2 ↦ ⊥]

2) The binary relation R on IR defined by R =∆ {⟨x, y⟩ ∈ IR² | √x = y} is a partial function (it is not defined when x is negative).

Since not all partial functions are functions, when we use the term ‘function’, we implicitly assume
that we are dealing with total functions, so will only treat partial functions when explicitly stating so.

5.3 Properties of Functions


Recall that we highlighted certain properties of relations, such as reflexivity, symmetry and transitivity.
Since functions are special relations, we can define the identity, composition and inverse relations of
functions. The composition of two functions will turn out to always be a function. In contrast, we shall
see that the inverse relation of a function need not necessarily be a function.

Definition 5.10 (P ROPERTIES OF FUNCTIONS ) Let f : A → B be a function. We define the following


properties on f :
f is onto (surjective) : every element of B is in the image of f ; that is:
∀ b ∈ B ∃ a ∈ A ( f ( a) = b )
f is one-to-one (injective) : for each b ∈ B there exists at most one a ∈ A with f ( a) = b; that is:
∀ a, a′ ∈ A ( f ( a) = f ( a′ ) ⇒ a = a′ )
f is bijective : f is both one-to-one and onto.

We also say that f is a surjection, respectively injection or bijection.


By contra-position, 26 we can formulate f being injective also as equivalent to
∀ a, a′ ∈ A ( a 6= a′ ⇒ f ( a) 6= f ( a′ ) )
and so a one-to-one function never repeats values.
The previous definition is stated for total functions only; if we would like to extend the properties
to partial functions, we have to make them total, by either restricting the domain (eliminating all the
elements for which the function is not defined), or extending the range with ⊥.

Example 5.11 1) Let A = { 1, 2, 3 } and B = { a, b }.


The function f = {h 1, a i , h 2, b i, h 3, a i} is onto, but not A 1 a B
one-to-one, as is immediate from its diagram. It is not 2
possible to define a one-to-one function from A to B, 3 b
as there are too many elements in A for them to map
uniquely to B.
2) Let A = { 1, 2 } and B = { a, b, c } .
The function f = {h 1, c i, h 2, a i} is one-to-one, but not A 1 a B
onto. It is not possible to define an onto function from b
A to B in this case, as there are not enough elements in 2 c
A to map to all the elements of B.
3) Let A = { 1, 2, 3 } and B = { a, b, c } .
The function f = {h 1, a i , h 2, c i, h 3, b i} is bijec- A 1 a B
tive. 2 b
3 c

4) The function f : IN 2 → IN defined by f ( x, y) = x + y is onto but not one-to-one. To prove that


f is onto, take an arbitrary n ∈ IN . We have to find h m1 , m2 i ∈ IN 2 such that f (m1 , m2 ) = n.
This is easy since f (n, 0) = n + 0 = n. To show that f is not one-to-one, we need to produce a

26 Contra-position states that P → Q ⇐⇒ ¬Q → ¬P, and ¬Q → ¬P is the contra-position of P → Q; see Example 1.11.

counterexample. In other words, we have to find h m1 , m2 i, h n1 , n2 i such that h m1 , m2 i 6= h n1 , n2 i,
but f (m1 , m2 ) = f (n1 , n2 ). There are many possibilities; we pick h 1, 0 i and h 0, 1 i. In fact, since
‘+’ is commutative, h m, n i, h n, m i is a counterexample for any m and n such that m 6= n.
5) The function f : IN → IN defined by f (x) = x² is one-to-one, but the similar function f on the integers ZZ is not. The function f : IR → IR⁺ (where IR⁺ =∆ { r ∈ IR | r ≥ 0 }), defined by f (x) = x², is surjective, but restricted to IN (as f : IN → IN) it is not.


6) The function f : IR → IR given by f ( x) = 4x + 3 is a bijective function. To prove that f is one-to-
one, suppose that f (r1 ) = f (r2 ), which means that 4r1 + 3 = 4r2 + 3. It follows that 4r1 = 4r2 , and
hence r1 = r2 . To prove that f is onto, let r be an arbitrary real number. We have f ((r – 3)/4) =
4((r – 3)/4) + 3 = (r – 3) + 3 = r by definition of f , and hence f is onto. Since f is both one-to-one
and onto, it is bijective.
Notice that the restriction of f to IN is one-to-one but not onto, since 2 ∉ f [IN].

We have seen examples where we cannot define a bijection between finite sets A and B when |A| > |B|; any total function would need to map two elements of A to the same element of B. This corresponds to the pigeonhole principle, which we can rephrase in our formal language of functions:
“ Let f : A → B be a function, where A and B are finite. If |A| > |B|, then f is not injective. ”
Recall Example 5.11. We stated that it is not possible to define a one-to-one function from A = { 1, 2, 3 } to B = { a, b }, since A is too big, and the pigeonhole principle confirms this (see Proposition 5.13).
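The properties of Definition 5.10 are decidable for finite functions given as sets of pairs. This sketch (illustrative; helper names chosen here) tests Example 5.11(1) and then confirms the pigeonhole principle by exhausting every total function from a 3-element set to a 2-element set:

```python
# Sketch: injectivity and surjectivity (Definition 5.10) for functions
# represented as finite sets of pairs.
from itertools import product

def is_injective(f):
    images = [b for (_, b) in f]
    return len(images) == len(set(images))

def is_surjective(f, B):
    return {b for (_, b) in f} == B

A = {1, 2, 3}
B = {"a", "b"}
f = {(1, "a"), (2, "b"), (3, "a")}   # Example 5.11(1)
assert is_surjective(f, B) and not is_injective(f)

# Pigeonhole: with |A| > |B|, no total function A -> B is injective,
# so every candidate in the exhaustive search fails the test.
for bs in product(B, repeat=len(A)):
    g = set(zip(sorted(A), bs))
    assert not is_injective(g)
```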

Proposition 5.12 Let A and B be finite sets, let f : A → B and let V ⊆ A. Then | f [V]| ≤ |V|.
Proof: Assume towards a contradiction that | f [V]| > |V|. By the pigeonhole principle, there exists some a ∈ V and b, b′ ∈ f [V] with b ≠ b′ and f (a) = b and f (a) = b′, which cannot happen as f is a function. Therefore our assumption is false, and ¬(| f [V]| > |V|), so | f [V]| ≤ |V|.

Proposition 5.13 Let A and B be finite sets, and let f : A → B.
1) If f is one-to-one, then |A| ≤ |B|.
2) If f is onto, then |A| ≥ |B|.
3) If f is a bijection, then |A| = |B|.
Proof: 1) We will show the contrapositive of this statement: if |A| > |B| then f cannot be one-to-one. Assume |A| > |B|, and that f is injective. Since f maps different elements of A onto different elements in B, we have |A| = | f [A]|, so | f [A]| > |B|. Since f [A] ⊆ B, by the pigeonhole principle this is impossible. So f is not injective.
2) If f is onto then f [A] = B, so that in particular | f [A]| = |B|. Also, |A| ≥ | f [A]| by Proposition 5.12. Therefore |A| ≥ |B| as required.
3) This follows from parts (1) and (2).

Cantor has shown the following property, that relates injections and surjections with bijections:

Theorem 5.14 ((DUAL) CANTOR-BERNSTEIN THEOREM) If there exist functions f : A → B and g : B → A, both injective or both surjective, then there exists a bijection h : A → B.

We will not go into the details of the proof for this property here, but it will be useful in Section 5.5.
It is worthwhile to observe that we do not gain the existence of a bijection between sets A and B simply from the existence of an injection from A to B and a surjection from B to A: take A = { a } and B = { b, c }, and define f : A → B by f (a) = b, and g : B → A by g(b) = a = g(c); then clearly f is injective and g surjective, but we cannot define a bijection between A and B, since |A| ≠ |B|.

5.4 Operations on Functions
Since functions are relations, we can define identity, composition, and inverse of functions. We shall see
that the inverse relation of a function need not necessarily be a function.

Definition 5.15 (F UNCTION COMPOSITION ) Let A, B, and C be arbitrary sets, and let f : A → B and
g : B → C be arbitrary functions. The composition of f with g, written g ◦ f : A → C, is a function
defined by
   g ◦ f (a) =∆ g( f (a))
for every element a ∈ A. In other words,
   g ◦ f (a) =_C c ⇐⇒ ∃ b ∈ B ( f (a) =_B b ∧ g(b) =_C c )

Notice that composition of relations is different from functional composition. Functional composition
g ◦ f reads f followed by g (or rather g after f ), whereas the relational composition R ◦ S reads R followed
by S. The main reason for the difference is that g ◦ f is a prefix notation, whereas R ◦ S is infix. Likewise,
if we view functions f and g as relations F and G and we have g( f ( a)) = c, then a F ◦ G c, so by definition
of relation composition ∃ b ∈ B ( a F b ∧ b G c ) .
It is easy to check that g ◦ f is indeed a function (see Exercise 5.4). Notice that the co-domain of f is
the same as the domain of g for the composition to be well defined.

Example 5.16 Let A = { 1, 2, 3 }, B = { a, b, c }, C = { α, β, γ }, f : A → B = {⟨1, a⟩, ⟨2, b⟩, ⟨3, a⟩} and g : B → C = {⟨a, γ⟩, ⟨b, α⟩}. Then g ◦ f : A → C = {⟨1, γ⟩, ⟨2, α⟩, ⟨3, γ⟩}:
   [Diagrams: f followed by g, with 1 ↦ a ↦ γ, 2 ↦ b ↦ α, 3 ↦ a ↦ γ; and the resulting g ◦ f with 1 ↦ γ, 2 ↦ α, 3 ↦ γ]
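Composition of finite functions is direct to compute. The sketch below (illustrative; dictionaries play the role of functions, and the value of g at c, which Example 5.16 leaves unused, is an assumption chosen here) reproduces Example 5.16:

```python
# Sketch: composition of finite functions (Definition 5.15), with
# dictionaries as functions: g after f.

def compose(g, f):
    # (g . f)(a) = g(f(a)) for every a in the domain of f;
    # requires the image of f to lie inside the domain of g
    return {a: g[f[a]] for a in f}

f = {1: "a", 2: "b", 3: "a"}
g = {"a": "gamma", "b": "alpha", "c": "beta"}   # value at "c" assumed
gf = compose(g, f)
assert gf == {1: "gamma", 2: "alpha", 3: "gamma"}
```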

Proposition 5.17 (A SSOCIATIVITY OF FUNCTION COMPOSITION ) Let f : A → B, g : B → C, and h :


C → D be arbitrary functions. Then h ◦( g ◦ f ) = (h ◦ g) ◦ f .
Proof : This result is easily established from the definition of functional composition. Take an arbitrary
element a ∈ A; we need to show that the two functions return the same result on a. Well:
(h ◦( g ◦ f ))( a) = h(( g ◦ f )( a)) = h( g( f ( a))) = (h ◦ g)( f ( a)) = ((h ◦ g) ◦ f )( a)
The composition of two bijections gives a bijection.

Proposition 5.18 If f : A → B and g : B → C are bijections, then so is g ◦ f : A → C.


Proof : It is enough to show that:
1) if f , g are onto then so is g ◦ f ;
2) if f , g are one-to-one then so is g ◦ f .
For point (1), assume f and g are onto. We need to show that for every element c in C there exists an
element a in A such that g ◦ f ( a) = c. So, let c be an arbitrary element of C; since g : B → C is onto, we
can find an element b ∈ B such that g(b) = c. Since b ∈ B and f : A → B is onto, we can also find an
element a ∈ A such that f ( a) = b. But then g ◦ f ( a) = g( f ( a)) = g(b) = c. So we have shown that for
every element c in C there exists an element a in A such that g ◦ f ( a) = c, so g ◦ f : A → C is onto.
For point (2), assume f and g are one-to-one. We need to show that for every two elements a1 and
a2 in A, if g ◦ f ( a1 ) = g ◦ f ( a2 ) then a1 = a2 . So, let a1 , a2 be arbitrary elements of A, and suppose
g ◦ f ( a1 ) = g ◦ f ( a2 ). Then g( f ( a1 )) = g( f ( a2 )) by definition of g ◦ f . Since g is one-to-one, it follows
that f ( a1 ) = f ( a2 ). Since f is also one-to-one, it follows that a1 = a2 . So g ◦ f ( a1 ) = g ◦ f ( a2 ) implies
a1 = a2 for any a1 , a2 ∈ A, and thereby g ◦ f is one-to-one.

Definition 5.19 (I DENTITY FUNCTION ) Let A be a set. The identity function on A, denoted Id A :
A → A, is defined by Id A ( a) = a.

We shall now define the inverse function. We cannot just define it using the definition of inverse
relations, as we do not always get a function this way. For example, the inverse relation of the function
f : { 1, 2 } → { 1 } defined by f (1) = f (2) = 1 is not a function. However, we shall see that when an
inverse function exists, it corresponds to the inverse relation.

Definition 5.20 (INVERSE FUNCTION) Let f : A → B be an arbitrary function. The function g : B → A is called a:
   left inverse of f when g ◦ f = Id_A : ∀ a ∈ A ( g ◦ f (a) = a )   (3)
   right inverse of f when f ◦ g = Id_B : ∀ b ∈ B ( f ◦ g(b) = b )   (4)
g is called an inverse of f whenever it is both a left and a right inverse of f .

Example 5.21 Let A = { a, b, c }, B = { 1, 2, 3 }, f = {⟨a, 1⟩, ⟨b, 3⟩, ⟨c, 2⟩} and g = {⟨1, a⟩, ⟨2, c⟩, ⟨3, b⟩}. Then g is an inverse of f .

Proposition 5.22 Let f : A → B be a bijection, then f -- 1 (as relation) is a well-defined function, and is
an inverse of f .
Proof: The inverse relation f⁻¹ is defined as f⁻¹ =∆ {⟨b, a⟩ ∈ B × A | f (a) = b}. We need to check that this is a function, so need to show that f⁻¹ satisfies criteria (1) and (2) of Definition 5.1.
1: Let a1 , a2 ∈ A, and assume there exists b ∈ B such that h b, a1 i ∈ f -- 1 and h b, a2 i ∈ f -- 1 . Then we
have both h a1 , b i ∈ f and h a2 , b i ∈ f ; since f is one-to-one, we have a1 = a2 .
2: Let b ∈ B be arbitrary. Since f is onto, there exists an a ∈ A such that f ( a) = b, so h b, a i ∈ f -- 1 .
So f -- 1 is a function. To check that f -- 1 is the inverse of f , we check the two criteria of Definition 5.20:
3: Take a ∈ A, then h a, f ( a)i ∈ f , so h f ( a), a i ∈ f -- 1 , and therefore f -- 1 ( f ( a)) = a; so f -- 1 ◦ f = Id A .
4: Likewise, take b ∈ B, then h b, f -- 1 (b)i ∈ f -- 1 , so h f -- 1 (b), b i ∈ f , and therefore f ( f -- 1 (b)) = b, so
f ◦ f -- 1 = IdB .
so f -- 1 is the inverse of f .

We normally write f -- 1 for the inverse (function) of f , if it exists, and then f -- 1 (b) = a whenever
f ( a) = b. So we only use the notation f -- 1 when we are sure f has an inverse.

Proposition 5.23 Let f : A → B. If f has an inverse g, then f is a bijection and the inverse is unique.
Proof : Let g be the inverse of f . To show that f is onto, take an arbitrary b ∈ B. Since f ( g(b)) = b, it
follows that b is in the image of f , or b ∈ f [ A]; so f is onto. To show that f is one-to-one, let a1 , a2 be
arbitrary elements of A. Assume f ( a1 ) = f ( a2 ), which implies that g( f ( a1 )) = g( f ( a2 )), since g is a
function. Since g ◦ f = Id A , it follows that a1 = a2 ; so f is one-to-one.
To show that the inverse is unique, assume that g, g′ are both inverses of f . We will show that g = g′ .
Let b be an arbitrary element of B. Then f ( g(b)) = b = f ( g′ (b)) since g and g′ are both inverses of f .
Since f is one-to-one, it follows that g(b) = g′ (b). Therefore, for all b, g(b) = g′ (b), so g = g′ follows
by Definition 5.2.
Take b ∈ B, then g(b) ∈ A, so there exists a ∈ A such that g(b) = a; then b = f ( g(b)) = f ( a), so
f ( a) = b, so g = f -- 1 , the inverse relation of f .

In view of the preceding proposition, one way of showing that a function is a bijection is to show that
it has an inverse. Furthermore, if f is a bijection with inverse f -- 1 , then f -- 1 has an inverse, namely f ,
and so f -- 1 is also a bijection.

Example 5.24 1) Take f : IR → IR defined by f (x) = x³. The inverse is f⁻¹(x) = ∛x.

2) Take the function f : IN → IN defined by
   f (x) = x + 1   (x even)
   f (x) = x – 1   (x odd)
It is easy to check that f ◦ f ( x) = x, considering the cases when x is odd and even separately.
Therefore f is its own inverse, and we can conclude that it is a bijection.
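The self-inverse property of this function can be sampled directly; an illustrative sketch:

```python
# Sketch: the function of Example 5.24(2) is its own inverse, checked
# here on an initial segment of IN (the full claim follows from the
# even/odd case analysis in the text).

def f(x):
    return x + 1 if x % 2 == 0 else x - 1

assert all(f(f(x)) == x for x in range(1000))
# f is therefore a bijection on IN, with f itself as its inverse.
```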

5.5 Cardinality of Sets


Above we compared finite sets by counting the number of elements. We are now in a position to extend
that by comparing the size of infinite sets as well, using a natural relation between sets defined using
bijective functions. This relation also expresses that sets ’have the same size’, but extends the notion of
size to non-natural numbers.

Definition 5.25 We define ≈ 27 on sets as: 28
   A ≈ B =∆ ∃ f : A → B ( f is a bijection ).

It is straightforward to show that, if A and B are finite, and |A| = |B|, then A ≈ B.


We can now restate Theorem 5.14 as follows:

Corollary 5.26 If there exist functions f : A → B and g : B → A, both injective or both surjective, then
A ≈ B.

We can use this result to show the following:

Example 5.27 { blue, red } ≈ { cat, dog }; IN ≈ { n ∈ IN | n is even }; IN ≈ { n ∈ IN | n is prime }.

We will show that IN ≈ ZZ in Example 6.3, IN ≈ IN² in Example 5.33, IN ≈ Q in Example 6.4,
IN ≈ { V ∈ ℘IN | ∃ n ∈ IN ( |V| = n ) } in Example 6.5, IN ≉ ℘IN in Example 6.6, and IN ≉ IR in Example 6.7.
Proposition 5.28 ≈ satisfies the criteria of an equivalence relation.
Proof : We need to show that ≈ is reflexive, symmetric, and transitive.
reflexive : We need to show that A ≈ A, for every set A, so that a bijection exists between A and itself:
clearly Id A : A → A is a bijection.
symmetric : By definition, A ≈ B implies that there exists a bijection f : A → B. By Proposition 5.22,
it follows that f has an inverse f⁻¹ : B → A which is also a bijection, hence B ≈ A.
transitive : From Proposition 5.18.

* Example 5.29 Let A, B, and C be arbitrary sets, and consider the products (A × B) × C and A × (B × C).
Remember that we observed in Section 2.4 that these sets are not in general equal.
There exists however a natural bijection f : (A × B) × C → A × (B × C) such that f(⟨⟨a, b⟩, c⟩) = ⟨a, ⟨b, c⟩⟩.
Similarly, there exists a bijection g : A × (B × C) → (A × B) × C such that g(⟨a, ⟨b, c⟩⟩) = ⟨⟨a, b⟩, c⟩. In
particular, we define

Left(⟨x, y⟩) = x
Right(⟨x, y⟩) = y
f(⟨p, y⟩) = ⟨Left(p), ⟨Right(p), y⟩⟩
g(⟨x, p⟩) = ⟨⟨x, Left(p)⟩, Right(p)⟩

It is not difficult to show that g ◦ f = Id_(A×B)×C and f ◦ g = Id_A×(B×C); using Proposition 5.23 it
follows that f is a bijection, so (A × B) × C ≈ A × (B × C).
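This pair of bijections transcribes directly to Python tuples; the sketch below (illustrative only) mirrors the proof by checking the two round trips g ◦ f and f ◦ g on sample elements.

```python
def left(p):
    return p[0]

def right(p):
    return p[1]

def f(t):
    # f : (A × B) × C → A × (B × C)
    p, c = t
    return (left(p), (right(p), c))

def g(t):
    # g : A × (B × C) → (A × B) × C
    a, p = t
    return ((a, left(p)), right(p))

x = ((1, "b"), 2.5)   # an element of (A × B) × C
assert f(x) == (1, ("b", 2.5))
assert g(f(x)) == x   # g ∘ f behaves as the identity
y = (1, ("b", 2.5))   # an element of A × (B × C)
assert f(g(y)) == y   # f ∘ g behaves as the identity
```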
27 In the literature, often the symbol ∼ is used rather than ≈. In these notes, to avoid confusion, we reserve that symbol for
generic equivalence relations, and use ≈ to explicitly indicate we are talking about the measure relation on sets.
28 Since we can only define relations on sets, and Russell has shown that the collection of all sets is not a set, officially we
cannot call ≈ a relation. But by abuse of terminology, we will allow ourselves to call ≈ 'a relation on sets'.

Example 5.30 Consider the set Even of even natural numbers. There exists a bijection between Even and
IN , f : IN → Even given by f (n) = 2n. Notice that not all functions from Even to IN are bijections: for
example, the function g : Even → IN given by g(n) = n is one-to-one but not onto.
To show that Even ≈ IN it is enough to show the existence of one such bijection.

The following gives another explanation for the notation for the set of functions from A to B.

Example 5.31 Let A be such that |A| = n, and take the set { χ_B : A → { 0, 1 } | B ⊆ A }. Since each
characteristic function for a subset of A is a function from A to { 0, 1 }, we have

{ χ_B : A → { 0, 1 } | B ⊆ A } = A → { 0, 1 }

By definition, we can write the latter set as { 0, 1 }^A, or as 2^A. Now

℘A ≈ { χ_B : A → { 0, 1 } | B ⊆ A } = A → { 0, 1 } = 2^A

and |℘A| = 2ⁿ = 2^|A|.
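For a small finite A this correspondence can be enumerated exhaustively; the Python sketch below (illustrative, with an assumed three-element A) builds every function A → {0, 1} and checks that each one characterises exactly one subset.

```python
from itertools import product

A = ["a", "b", "c"]  # a hypothetical three-element set

# every function A → {0, 1}, represented as a dict of values
functions = [dict(zip(A, bits)) for bits in product([0, 1], repeat=len(A))]

# each χ is the characteristic function of exactly one subset B ⊆ A
subsets = [frozenset(x for x in A if chi[x] == 1) for chi in functions]

assert len(functions) == 2 ** len(A)        # |A → {0,1}| = 2^|A| = 8
assert len(set(subsets)) == len(functions)  # the correspondence is injective
```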

Recall that the cardinality of a finite set is the number of elements in that set. Consider a finite set A
with cardinality n. Then there exists a counting bijection
c A : { 1, 2, . . . , n } → A
If A and B have the same number of elements, then we can define a bijection f : A → B by

f(a) = (c_B ◦ c_A⁻¹)(a)
So two finite sets have the same number of elements if and only if there exists a bijection between them.
We extend this observation to compare the size of infinite sets.

Definition 5.32 (CARDINALITY) Given two (arbitrary) sets A and B, we say that A has the same car-
dinality as B, written |A| = |B|, whenever there exists a bijection between A and B, so when A ≈ B.

|A| = |B| =∆ A ≈ B

By the remark we made above (if A and B are finite, and |A| = |B|, then A ≈ B), this notion of cardinality
encompasses the one defined in Definition 2.15.
We will give some examples of sets that have the same cardinality.

Example 5.33 (IN ≈ IN²) We can build a bijection f : IN → IN² as illustrated by the following diagram:

⟨0, 0⟩ ⟨0, 1⟩ ⟨0, 2⟩ ⟨0, 3⟩ ⟨0, 4⟩ ⟨0, 5⟩ ···
⟨1, 0⟩ ⟨1, 1⟩ ⟨1, 2⟩ ⟨1, 3⟩ ⟨1, 4⟩ ⟨1, 5⟩ ···
⟨2, 0⟩ ⟨2, 1⟩ ⟨2, 2⟩ ⟨2, 3⟩ ⟨2, 4⟩ ⟨2, 5⟩ ···
⟨3, 0⟩ ⟨3, 1⟩ ⟨3, 2⟩ ⟨3, 3⟩ ⟨3, 4⟩ ⟨3, 5⟩ ···
⟨4, 0⟩ ⟨4, 1⟩ ⟨4, 2⟩ ⟨4, 3⟩ ⟨4, 4⟩ ⟨4, 5⟩ ···
  ⋮      ⋮      ⋮      ⋮      ⋮      ⋮

so f(0) = ⟨0, 0⟩, f(1) = ⟨0, 1⟩, f(2) = ⟨1, 0⟩, f(3) = ⟨2, 0⟩, f(4) = ⟨1, 1⟩, f(5) = ⟨0, 2⟩, f(6) = ⟨0, 3⟩,
etc. It is clear that f is a surjection, since all pairs will be visited; also, since all pairs are
different, f is injective, even if we do not give a formal definition for f(n).
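The zig-zag walk of the diagram can be sketched as a small Python routine (illustrative only); it reproduces the values of f listed above and confirms that no pair is visited twice.

```python
def diagonal_enum(count):
    # Enumerate pairs ⟨i, j⟩ anti-diagonal by anti-diagonal (i + j constant),
    # reversing direction on alternate diagonals, as in the diagram above.
    pairs = []
    d = 0
    while len(pairs) < count:
        diag = [(i, d - i) for i in range(d + 1)]
        if d % 2 == 0:
            diag.reverse()
        pairs.extend(diag)
        d += 1
    return pairs[:count]

first = diagonal_enum(7)
assert first == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3)]
# no pair is ever repeated, so the enumeration is injective as far as we look
assert len(set(diagonal_enum(500))) == 500
```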

* Example 5.34 (IR ≈ IR²) We argue that IR ≈ IR² through a proof sketch, by first using Peano's construction
to build a surjection from [0, 1] (the closed interval of reals between 0 and 1) to [0, 1]² in steps, as
illustrated by successive approximations of a space-filling curve (the diagrams are not reproduced here).
In the limit, this will build a surjection onto [0, 1]²; the idea behind the proof is that for every element
of [0, 1]², and any circle cast around that point, there will be an incarnation of the construction that cuts
through that circle. So in the limit we can let the circle contract to the point, and the line will hit the
point. We also know that g(x, y) = x is a surjection from [0, 1]² to [0, 1]; then by Corollary 5.26, we get
[0, 1]² ≈ [0, 1].
We also have [0, 1] ≈ IR via the surjections f : [0, 1] → IR and g : IR → [0, 1]:

f(x) = 0                    (x = 0)
f(x) = 1                    (x = 1)
f(x) = tan(π × (x – 1/2))   (otherwise)

g(x) = 0   (x ≤ 0)
g(x) = 1   (x ≥ 1)
g(x) = x   (otherwise)

and by transitivity of ≈ also IR ≈ IR². We can easily transform this construction to show that C ≈ IR.

5.6 Cantor’s discovery of different infinities


Using his famous diagonalisation argument, Cantor showed in 1891 that no set can have the same cardi-
nality as its power set. This result is clear for the case that A is finite, since we have seen that if |A| = n,
then |℘A| = 2ⁿ, and n ≠ 2ⁿ for all n ∈ IN.
In fact, Cantor showed that no function f : A → ℘ A can be surjective, by showing that every such
f misses a subset of A. The proof assumes a function f exists, and using f constructs a set B that is
outside the range of f , which therefore cannot be a surjection. The construction defines B as the set
{ a ∈ A | a ∉ f(a) }; to illustrate how this works, take

A = { 1, 2 }
℘A = { ∅, { 1 }, { 2 }, { 1, 2 } }

• If f(1) = { 2 }, f(2) = { 1, 2 }, then B = { 1 };
• If f(1) = { 1 }, f(2) = { 1, 2 }, then B = ∅;
• If f(1) = ∅, f(2) = { 1 }, then B = { 1, 2 };
• If f(1) = { 1, 2 }, f(2) = ∅, then B = { 2 }; etc.
Notice that in each case B is not in the image of f .
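This construction is finite enough to check exhaustively in code. The Python sketch below (illustrative only) runs over all 16 functions f : A → ℘A for A = {1, 2} and confirms that the diagonal set B is never in the image of f.

```python
from itertools import product

def diagonal_B(f, A):
    # Cantor's construction: B = { a ∈ A | a ∉ f(a) }
    return frozenset(a for a in A if a not in f[a])

A = [1, 2]
powerset = [frozenset(s) for s in [(), (1,), (2,), (1, 2)]]

# enumerate every function f : A → ℘A as a dict of images
for images in product(powerset, repeat=len(A)):
    f = dict(zip(A, images))
    B = diagonal_B(f, A)
    assert B not in f.values()  # B always escapes the image of f
```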

Theorem 5.35 (C ANTOR ’ S THEOREM ) For any set A, A 6≈ ℘ A.


Proof : Let f : A → ℘A be any function from A to ℘A. Notice that since f maps elements of A to
subsets of A, every element a of A is either an element of the set f(a), or not. Define

B =∆ { a ∈ A | a ∉ f(a) },

which is a well-defined subset of A, so B ∈ ℘A. We now ask ourselves: is B in the image of A under f
(i.e. B ∈ f[A]), so is there a b ∈ A such that f(b) = B?
Assume this is so, by assuming that f is surjective; then there exists b ∈ A such that f(b) = B, and
for this b, either b ∈ f(b) or b ∉ f(b). We have two cases to consider:
b ∈ f(b) = B : Since B contains all elements a ∈ A that satisfy the predicate 'a ∉ f(a)', we know that
that predicate holds for b, so b ∉ f(b): contradiction.
b ∉ f(b) = B : From b ∉ f(b) we note that b satisfies the predicate and therefore b ∈ B = f(b): con-
tradiction.
We get a contradiction in both cases, so have to reject our assumption: there is no b ∈ A such that
f(b) = B; so B ∈ ℘A is not in the image of f : A → ℘A, and f is not surjective.
So no function f : A → ℘A can be a surjection, so certainly no such function can be a bijection.

We recognise Russell's paradox here (which came of course after Cantor gave his proof). The difference
is that now B is a set, and the contradiction lies in the fact that B is not in the image of f , as was
assumed.
Notice that the above result holds for any set, finite or infinite. An illustration of
this result is that IN ≉ ℘IN; we will come back to this in Section 6.

Exercises
Exercise 5.1 Let A = { 1, 2 } and B = { a, b, c } . List the elements of the three sets A → B, A → A and
B → A. State which functions are one-to-one and onto, and which are bijections.
Exercise 5.2 Determine which of the following relations are functions. For those which are functions,
determine whether they are one-to-one and onto. Also, give the inverse function when it exists. Justify
your answer throughout.
1) {⟨m, n⟩ ∈ IN² | m = n};
2) {⟨m, n⟩ ∈ IN² | m ≠ n};
3) ⊆ IN²;
4) {⟨a, a⟩, ⟨a, b⟩, ⟨a, c⟩} ⊆ { a, b, c }²;
5) {⟨a, a⟩, ⟨b, a⟩, ⟨c, a⟩} ⊆ { a, b, c }²;
6) {⟨a, b⟩, ⟨b, c⟩, ⟨c, a⟩} ⊆ { a, b, c }²;
7) {⟨x, y⟩ ∈ ZZ² | x = y ∨ x = –y};
8) {⟨x, y⟩ ∈ IR² | y = 1/(x² – 4)};
9) {⟨x, y⟩ ∈ ZZ² | x² = y}.
Exercise 5.3 1) Let A = { a, b, { b }} and B = {{ a } , { }}. How many functions are there from A to
B? How many of these functions are onto? How many one-to-one? How many partial functions
are there from A to B?
2) Now let A and B be arbitrary sets with |A| = m and |B| = n. How many functions are there from
A to B, and how many partial functions?
Exercise 5.4 Let f : A → B and g : B → C be functions. Show that g ◦ f is also a function.
Exercise 5.5 Let f : A → B and g : B → C be functions.
1) Prove that if g ◦ f is one-to-one then so is f .
2) Give a specific example of f and g such that g ◦ f is one-to-one but g is not.
3) Prove that if g ◦ f is onto then so is g.
4) Give a specific example of f and g such that g ◦ f is onto but f is not.
Exercise 5.6 Give a specific example of sets A, B, and C, and of functions f : A → B and g : B → C such
that g ◦ f : A → C is bijective but f and g are not.
Exercise 5.7 Assume that A1 ≈ A2 and B1 ≈ B2 ; show that A1 × B1 ≈ A2 × B2 .
Exercise 5.8 Let A, B and C be arbitrary sets. If possible, find explicit bijections between the following
pairs of sets; justify your answer. If this is not possible, explain why by analysing special cases.
1) A × B and B × A;
2) A2 and A;
3) ( A → B) → C and A → ( B → C );
* Exercise 5.9 Let A, B and C be arbitrary sets. If possible, find explicit bijections between ( A × B) → C
and A → ( B → C ).

6 On Infinity
We will explore examples of infinite sets in two stages. We first look at those sets which have the same
cardinality as the natural numbers, since such sets have nice properties. We then briefly explore examples
of infinite sets which are truly bigger than the set of natural numbers.

6.1 Countable Sets


Using the relation ≈ from Definition 5.25, we will now introduce the notion of countability of a set A,
which expresses that we can effectively list the elements of A, albeit perhaps in an infinite list.

Definition 6.1 (C OUNTABILITY ) A set A is countable if A is finite or A ≈ IN .

The following properties hold:

Proposition 6.2 1) If V ⊆ IN , then V is countable.


2) Let A be a non-empty set. The statements i) A is countable; ii) there exists a surjection from IN to
A; iii) there exists an injection from A to IN , are equivalent.

We leave the proofs of these properties as Exercises 6.1 and 6.2. Notice that parts (ii) and (iii) both express
that A is countable if IN has at least as many elements as A (see also Proposition 5.13).

Example 6.3 ( ZZ IS COUNTABLE ) The integers ZZ are countable, since they can be listed as:
0, – 1, 1, – 2, 2, – 3, 3, . . . .
This counting function can be defined formally by the bijection g : IN → ZZ defined by

g(x) = x/2          (x even)
g(x) = –(x + 1)/2   (x odd)

Our bijection does not respect the order – but of course this is not required.
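This bijection is easy to transcribe and check; the Python sketch below (illustrative only) confirms that it reproduces the list 0, –1, 1, –2, 2, . . . and hits no integer twice.

```python
def g(x):
    # the counting bijection g : IN → ZZ of Example 6.3
    return x // 2 if x % 2 == 0 else -(x + 1) // 2

assert [g(x) for x in range(7)] == [0, -1, 1, -2, 2, -3, 3]
# injective on an initial segment: 1000 inputs give 1000 distinct integers
assert len({g(x) for x in range(1000)}) == 1000
```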

By Example 5.33 we know that IN 2 is countable. Since we can represent (positive) rational numbers
as fractions (i.e. pairs) of natural numbers, the positive rational numbers are also countable.
We will now give a direct construction.
Example 6.4 (Q IS COUNTABLE) We define a function f from IN to Q⁺ (the set { q ∈ Q | q > 0 })
that is onto, via the diagram below (where f(0) = 1/1, f(1) = 2/1, f(2) = 1/2, f(3) = 1/3, f(4) = 2/2, etc.):

1/1  2/1  3/1  4/1  5/1  6/1  ···
1/2  2/2  3/2  4/2  5/2  6/2  ···
1/3  2/3  3/3  4/3  5/3  6/3  ···
1/4  2/4  3/4  4/4  5/4  6/4  ···
1/5  2/5  3/5  4/5  5/5  6/5  ···
1/6  2/6  3/6  4/6  5/6  6/6  ···
1/7  2/7  3/7  4/7  5/7  6/7  ···
 ⋮    ⋮    ⋮    ⋮    ⋮    ⋮

Notice that this diagram does not define a bijection itself, since 1/2 = 2/4 = 3/6, etc.;
in fact, every fraction will appear infinitely many times in the image of f. For all this infinite waste,
it is clear however that for every r ∈ Q⁺ there will be an n such that f(n) = r, making f onto; then,
by Proposition 6.2, we know that Q⁺ is countable.

Using this counting f of Q⁺, it is now straightforward to build a counting h of Q:

h(n) = 0                 (n = 0)
h(n) = – f((n – 1)/2)    (n odd)
h(n) = f((n – 2)/2)      (n even ∧ n > 0)

(so h(0) = 0, h(1) = – f(0) = – 1/1, h(2) = f(0) = 1/1, h(3) = – f(1) = – 2/1, h(4) = f(1) = 2/1,
h(5) = – f(2) = – 1/2, h(6) = f(2) = 1/2, etc.). So also Q is countable.
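The diagonal sweep of the diagram can be sketched in Python (illustrative only); the helper below reproduces the first few values of f and checks that every small fraction is eventually visited.

```python
from fractions import Fraction

def enum_positive_rationals(count):
    # Sweep the grid of fractions p/q along anti-diagonals (p + q constant),
    # alternating direction, as in the diagram of Example 6.4.
    pairs = []
    s = 2
    while len(pairs) < count:
        diag = [(p, s - p) for p in range(1, s)]
        if s % 2 == 1:
            diag.reverse()
        pairs.extend(diag)
        s += 1
    return pairs[:count]

# matches f(0) = 1/1, f(1) = 2/1, f(2) = 1/2, f(3) = 1/3, f(4) = 2/2
assert enum_positive_rationals(5) == [(1, 1), (2, 1), (1, 2), (1, 3), (2, 2)]

# onto: every fraction with p + q ≤ 20 occurs among the first 200 entries
values = {Fraction(p, q) for p, q in enum_positive_rationals(200)}
assert all(Fraction(p, s - p) in values for s in range(2, 21) for p in range(1, s))
```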

Notice that, since the function h′ : Q⁺ → IN, defined by h′(p/q) = p – 1, is onto as well, also using
Cantor-Bernstein we know that Q⁺ ≈ IN, so Q⁺ is countable.
Alternatively, working towards a bijection directly, we can optimise this mapping by skipping all
fractions p/q in the diagram where p and q are not co-prime; 29 note that then there exists m > 1 such
that p = p′ × m and q = q′ × m, with p′ and q′ co-prime, and m defined as the product of
all the common dividers of p and q. Each fraction p/q is on the line where all frac-
tions have the property that the sum of numerator and denominator is p + q; this im-
plies that p′/q′ is on a 'higher' line, and has been 'visited' before, so p/q can be safely
skipped, as shown in the following diagram:

1/1  2/1  3/1  4/1  5/1  6/1  ···
1/2       3/2       5/2       ···
1/3  2/3       4/3  5/3       ···
1/4       3/4       5/4       ···
1/5  2/5  3/5  4/5       6/5  ···
1/6                 5/6       ···
1/7  2/7  3/7  4/7  5/7  6/7  ···
 ⋮    ⋮    ⋮    ⋮    ⋮    ⋮

This is now a bijection, so Q⁺ is countable.
We will now show that the set of all finite subsets of IN is countable.

Example 6.5 The set of finite subsets of IN, defined as { V ∈ ℘IN | ∃ n ∈ IN ( |V| = n ) }, is countable.
We will show this property first for the set of finite subsets of IN \ { 0 },

℘f(IN \ { 0 }) = { V ∈ ℘(IN \ { 0 }) | ∃ n ∈ IN ( |V| = n ) },

using Corollary 5.26.
Let V ∈ ℘f(IN \ { 0 }) be a finite set with |V| = n; then, since ⟨V, <⟩ is a well-founded order (Defini-
tion 4.10), we can write V as a list

V = v1 v2 v3 · · · vn

such that vi < vj for i < j. Let { p1, p2, p3, . . . } be the set of prime numbers such that pi < pj for i < j.
We define f : ℘f(IN \ { 0 }) → IN by:

f(V) = 2^v1 × 3^v2 × 5^v3 × 7^v4 × · · · × pn^vn = Π_{i=1}^n pi^vi

(notice that we need to exclude 0 since it would not contribute to this product). Since each number has
its unique decomposition as a product of prime numbers, it is straightforward to verify that V ≠ V′ implies
f(V) ≠ f(V′), so f is an injection. Then by Proposition 6.2, we know that ℘f(IN \ { 0 }) is countable.
Note that

{ V ⊆ IN | ∃ n ∈ IN ( |V| = n ) } ≈ { V ⊆ IN \ { 0 } | ∃ n ∈ IN ( |V| = n ) }

through the bijection g(V) = { m ∈ IN | m – 1 ∈ V }.
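The prime encoding f is concrete enough to test; the Python sketch below (illustrative only) computes f(V) for small V and verifies injectivity over all small subsets of IN \ {0}.

```python
from itertools import combinations
from math import prod

def prime_code(V):
    # f(V) = 2^v1 × 3^v2 × 5^v3 × ... for V = {v1 < v2 < ...} ⊆ IN \ {0}
    primes = []
    n = 2
    while len(primes) < len(V):
        if all(n % p != 0 for p in primes):  # trial division by smaller primes
            primes.append(n)
        n += 1
    return prod(p ** v for p, v in zip(primes, sorted(V)))

assert prime_code({1, 2, 3}) == 2**1 * 3**2 * 5**3  # = 2250

# injective on all subsets of {1,...,6} with at most 3 elements
subsets = [frozenset(c) for r in range(4) for c in combinations(range(1, 7), r)]
codes = [prime_code(V) for V in subsets]
assert len(set(codes)) == len(codes)
```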

6.2 Uncountable Sets


Using his diagonalisation argument, with Theorem 5.35, Cantor in fact showed that there are uncountable
sets: that is, infinite sets that are too large to be countable. In particular, his theorem implies that
IN ≉ ℘IN, so immediately we know that ℘IN is not countable. To illustrate the diagonalisation argument,
we look at Cantor's proof in detail for the case of natural numbers:

Example 6.6 ( ℘ IN IS NOT COUNTABLE ) We will show that any list of subsets of IN will be missing a
subset, so any mapping of IN to ℘ IN cannot be onto, so no bijection between IN and ℘ IN can exist, as
predicted by Cantor’s theorem.
29 p, q ∈ IN are co-prime if p > 1, q > 1 and they have no common factor, i.e. there does not exist r > 1 such that p = r × m
and q = r × n, for certain m, n > 1.

Note that, through its characteristic function (see Definition 5.7), we can write any subset V ⊆ IN (or
V ∈ ℘IN) as a list of 0s and 1s as

ℓV = v0 v1 v2 v3 · · ·

where every vi = 1 if i ∈ V, and vi = 0 if i ∉ V. Now let f be a listing of subsets of IN, so f : IN → ℘IN;
we write Vn for f(n), and represent f by the list

f(0), f(1), f(2), . . . = ℓV0, ℓV1, ℓV2, . . .

(since { n } ∈ ℘IN, for all n ∈ IN, this list is infinitely long).
The aim is to show that f cannot be surjective, by constructing a set that cannot be in the image
of f. We can now construct a set W through giving its characteristic function w0 w1 w2 w3 · · · by:
wi = 1 – vii (i.e. wi = 1 if vii = 0, and wi = 0 if vii = 1). Notice that this set corresponds to the set
created in Cantor's proof: if vii = 0, then i ∉ Vi and i ∈ W, and if vii = 1, then i ∈ Vi and i ∉ W, so
W = { i ∈ IN | i ∉ f(i) }. We can represent this through the following diagram:

ℓV0 = v00 v01 v02 v03 v04 ···
ℓV1 = v10 v11 v12 v13 v14 ···
ℓV2 = v20 v21 v22 v23 v24 ···
ℓV3 = v30 v31 v32 v33 v34 ···
ℓV4 = v40 v41 v42 v43 v44 ···
        ⋮
ℓW  = 1–v00  1–v11  1–v22  1–v33  1–v44  ···

Then clearly W ⊆ IN; also, W differs from each set in the list 'on the diagonal': i ∈ W ⟺ i ∉ Vi.
So clearly W ≠ Vi for all i ∈ IN, since they differ on the ith element, and therefore W is not in our list.
This is independent of the list we started with, so any list we give will be at least missing one element.
So f cannot be a surjection, and no bijection between IN and ℘IN can exist.
So we cannot list all the subsets of IN: IN ≉ ℘IN and ℘IN is not countable; both sets are infinitely
large, but with different kinds of infinity. Thus IN and ℘IN have different cardinalities (sizes), as
do ℘(℘IN), ℘(℘(℘IN)), etc. In fact, we have an infinite hierarchy of measures of infinity, each much
larger than the previous. This concept is widely accepted at the moment, but was quite controversial when
introduced by Cantor.
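For a finite prefix of any listing, the diagonal construction can be executed directly; the Python sketch below (illustrative only) builds W from the first few listed sets and checks that it differs from each of them.

```python
def diagonal_W(listing):
    # W = { i | i ∉ f(i) }, restricted to the indices of a finite prefix
    return {i for i, V in enumerate(listing) if i not in V}

listing = [set(), {0}, {0, 1, 2}, {3}, {4, 5}]
W = diagonal_W(listing)
assert W == {0, 1}
# W differs from the i-th listed set at position i, so it equals none of them
for i, V in enumerate(listing):
    assert (i in W) != (i in V)
```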
Another important example of an uncountable set is the set of real numbers IR. To show that this set
is also not countable, we will use the diagonalisation argument to show that every list of real numbers
from the interval [0, 2] is missing at least one element.

* Example 6.7 (IR IS NOT COUNTABLE) Remark that we can write any number a ∈ [0, 2] ⊆ IR (all real
numbers in between and including 0 and 2) in binary notation as an infinite sequence

a = a0 × 2⁰ + a1 × 2⁻¹ + a2 × 2⁻² + a3 × 2⁻³ + · · · = Σ_{i=0}^∞ ai × 2⁻ⁱ

where every ai ∈ { 0, 1 }, and represent a by a0 a1 a2 a3 · · ·. Notice that 2 is represented by an infinite
sequence of 1s (since 2 = Σ_{i=0}^∞ 2⁻ⁱ = 1 + 1/2 + 1/4 + 1/8 + 1/16 + · · ·), and 0 by an infinite sequence
of 0s. Moreover, there are numbers that have more than one representation: for example, 1.5 can be
represented by both 11000 · · · and 101111 · · ·; these are called the dyadic rationals, and are of the form
n/2^k. These 'doubles' all have the characteristic that they end with an infinite number of 0s or 1s; we call
this a 0-tail (or 1-tail). These create a complication in the diagonalisation argument we will use, but one that
we will deal with successfully.
Now assume [0, 2] is countable, and let a0, a1, a2, a3, . . . be a list of all the elements of [0, 2]; we write
aij for the jth binary digit of ai, and can represent the list through the following diagram:

a0 = a00 a01 a02 a03 a04 a05 · · ·
a1 = a10 a11 a12 a13 a14 a15 · · ·
a2 = a20 a21 a22 a23 a24 a25 · · ·
a3 = a30 a31 a32 a33 a34 a35 · · ·
a4 = a40 a41 a42 a43 a44 a45 · · ·
      ⋮
b  = 1–a00  a00  1–a12  a12  1–a24  a24  · · ·

We can now use the diagonalisation technique and construct a number not in the list, but have to be a
bit more careful, seeing that we need to avoid creating an alternative representation for a dyadic rational
already occurring in the list. So we 'skew' the argument, and define the number b such that, for every
i ∈ IN:

b2i = 1 – ai,2i
b2i+1 = ai,2i = 1 – b2i

Note that, for all i, if b2i = 0, then b2i+1 = 1, and if b2i = 1, then b2i+1 = 0. Then it will be clear that
b ∈ [0, 2], and b ≠ ai, for all i ∈ IN, since they differ in position 2i; so b itself as a sequence is not in the
list. So far, so good.
Did we create an alternative representation of a dyadic rational? Can it be that b is equal to one of
the elements of the list, say am, in the sense that they are different representations of the same number? As
remarked above, this is possible only if b has a 0-tail or a 1-tail (so the skewed diagonal has a 1-tail or a
0-tail). But this is impossible, since, by construction, every pair of digits b2i and b2i+1 in b contains both a
0 and a 1. So with b we have created a new number that does not represent a number already in our list.
This is independent of the list we started with, so any list we give will be at least missing one element,
so we cannot list all the real numbers in [0, 2]. So we cannot count all real numbers in the interval [0, 2],
and therefore certainly we cannot count IR.
We can also show that ℘IN ≈ IR. Since we know that IN ≉ ℘IN, then also IN ≉ IR.
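The 'skewed' diagonal can be computed for any finite prefix of a purported list; the Python sketch below (illustrative only) builds b from three listed digit sequences and checks both the disagreement positions and the no-tail property.

```python
def skewed_diagonal(digit_rows):
    # b[2i] = 1 - a_{i,2i} and b[2i+1] = a_{i,2i}, as in Example 6.7
    b = []
    for i, row in enumerate(digit_rows):
        b.append(1 - row[2 * i])
        b.append(row[2 * i])
    return b

rows = [
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
b = skewed_diagonal(rows)
for i, row in enumerate(rows):
    assert b[2 * i] != row[2 * i]              # b differs from row i in position 2i
    assert {b[2 * i], b[2 * i + 1]} == {0, 1}  # each digit pair holds a 0 and a 1
```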

* Example 6.8 (IR ≈ ℘IN) Take Seq = { a0 a1 a2 a3 · · · | ai ∈ { 0, 1 } }, the set of all infinite sequences of 0s
and 1s; we write si for the ith number in the sequence s. As in the previous example, we would like to
represent a number a ∈ [0, 2] ⊆ IR via an element of Seq, since we can write

a = a0 × 2⁰ + a1 × 2⁻¹ + a2 × 2⁻² + a3 × 2⁻³ + · · · = Σ_{i=0}^∞ ai × 2⁻ⁱ.

As before, the dyadic rationals have two representations in Seq, but we can still use Seq as representation
of elements in the real interval [0, 2].
It might be tempting to use h : Seq → ℘IN defined by:

h(s) = { i ∈ IN | si = 1 }

(then χ_h(s)(n) = 1 ⟺ n ∈ h(s) ⟺ sn = 1) to define a function from [0, 2] to ℘IN, but, as can be
expected, the dyadic rationals again create a problem. For example, h(110000 · · ·) = { 0, 1 },
and h(101111 · · ·) = IN \ { 1 }; both sequences are representations of the number 1.5, which would then
have two images, which is of course not permitted; but limiting it to one would not give a surjective
function on ℘IN.
We will define a surjective function from [0, 2] to ℘IN like h, but taking care of the dyadic numbers.
Since all dyadic rationals are of the form n/2^k, with n, k ∈ IN, the set of these numbers is representable
by IN² and is therefore countable. 30 Let Q_D =∆ { d0, d1, d2, . . . } be that counting. Let

Seq0 = { s ∈ Seq | ∃ k ∈ IN ∀ i ∈ IN ( i > k ⇒ si = 0 ) }
Seq1 = { s ∈ Seq | ∃ k ∈ IN ∀ i ∈ IN ( i > k ⇒ si = 1 ) }

be the sets of all sequences in Seq with a 0-tail or a 1-tail, respectively.
Since each element of Q_D has two representations in Seq, one with a 0-tail and one with a 1-tail, we
can define the mappings g0 : Q_D → Seq0 and g1 : Q_D → Seq1 by

gj(dk) = s ∈ Seqj, where Σ_{i=0}^∞ si × 2⁻ⁱ = dk

so g0(1.5) = 110000 · · · and g1(1.5) = 101111 · · ·.
With these two functions we can define a mapping g : Q_D → Seq0 ∪ Seq1 through:

g(d2k) = g0(dk)
g(d2k+1) = g1(dk)

We are now ready to define f : [0, 2] → ℘IN, using the function h we gave above:
30 Of course they are also countable since they form a subset of Q , which is a countable set itself.

f(x) = h(g(x))   (x ∈ Q_D)
f(x) = h(s)      (otherwise, where x = Σ_{i=0}^∞ si × 2⁻ⁱ)

By construction, this is a surjective function (notice that, for a dyadic rational d, the sequence g(d) no
longer represents the characteristic function of the set it is mapped onto by h).
In the opposite direction, take g′ : ℘IN → [0, 2] by:

g′(V) = r, where r = Σ_{i=0}^∞ vi × 2⁻ⁱ and vi = χV(i),

which is a surjection as well (notice that, due to the presence of dyadic rationals, this function is not
injective). So by Theorem 5.14 we have [0, 2] ≈ ℘IN.
Using h′(x) = tan(π × (x/2 – 1/2)) we have a surjection from [0, 2] onto IR, and with

k(x) = x   (x ∈ [0, 2])
k(x) = 0   (otherwise)

we have a surjection from IR to [0, 2], so [0, 2] ≈ IR; so also IR ≈ ℘IN.

The difference between integers and reals creates a complication for computer science: we cannot
manipulate reals in the way we can natural numbers. Instead, we have to use approximations, such as
floating point numbers. In computing, a real number is, in general, represented using a number of
significant digits (the significand), scaled using an exponent in some fixed base (normally 10, but this
can differ). A number that can be represented exactly is of the following form:

significand × base^exponent

where significand is an integer (i.e., in ZZ), base is an integer greater than or equal to two, and exponent
is also an integer. Since there are only countably many numbers that can be represented this way, we
cannot represent IR, and inaccuracies are inevitably created by this notation. For example, it is a small
programming exercise to find a float r such that r + 1 = r.
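That exercise is a one-liner in any language with IEEE-754 doubles; the Python sketch below (illustrative only) keeps doubling r until adding 1 no longer changes it.

```python
def find_absorbing_float():
    # keep doubling until 1 falls below the precision of the significand
    r = 1.0
    while r + 1 != r:
        r *= 2
    return r

r = find_absorbing_float()
assert r + 1 == r           # adding 1 is absorbed by rounding
assert r == 2.0 ** 53       # for IEEE-754 double precision (53-bit significand)
```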

6.3 Different Infinities


So we know that IR is really larger than IN and Q, but the proof suggests this is only 'by a bit'. Nothing
could be farther from the truth. Although IR is 'dense' in Q, 31 and Q is 'dense' in IR, the measure of the
sets Q and IR is very different.
To illustrate the difference between Q (where |Q| = |IN|) and IR, we can show that Q is 'sparse' in IR. We will
achieve this by defining a subset V ⊆ IR whose 'size' is arbitrarily small and that contains all rationals, so
Q ⊆ V and IR \ V is a set containing only pure reals. In other words, given any positive size, no matter
how small, we will be able to create a set of that size that contains all rationals.

Example 6.9 (Q IS A NULL SET IN IR) We know from Example 6.4 that we can count the rationals,
i.e. put them in an exhaustive list; so let q0, q1, q2, . . . be such a list. Now we cast an (open) inter-
val in IR around each element of Q, and define:

Vδi =∆ ( qi – δ × 2⁻ⁱ , qi + δ × 2⁻ⁱ )
Vδ =∆ ∪_{i=0}^∞ Vδi

Remark that each Vδi is an interval of size 2 × δ × 2⁻ⁱ = 2⁻ⁱ⁺¹ × δ; we write ||Vδi|| for this size; notice that
Q ⊆ Vδ. Now the total size of Vδ, ||Vδ||, is strictly greater than 0 and smaller than or equal to the sum of all
the sizes of the Vδi (they might overlap, of course), i.e.

0 < ||Vδ|| ≤ Σ_{i=0}^∞ ||Vδi|| = Σ_{i=0}^∞ 2⁻ⁱ⁺¹ × δ = 4δ
31 In-between each pair of rational numbers q1 and q2 there exists a real number: q1 + 1/√2 × (q2 – q1).

This holds for any δ. Since we can make δ as small as we like, this essentially shows that V = limδ→0 Vδ
is negligible, but still non-empty (also called a null or zero set) in IR . But since Q ⊆ V, so is Q .

So, for any size, however small, we can put Q in a subset of IR of that size; the rest of the real line
contains only true reals. So basically, within the space of IR , Q occupies no space at all; the rationals
do not matter in comparison: the difference in levels of infinity between Q (or IN ) and IR is enormous.
Moreover, the set of irrational numbers, defined as IR \ Q , is equivalent to IR : IR \ Q ≈ IR .
Through Examples 6.6 and 6.7 we have essentially shown that we can define a bijection between the
interval [0, 2] and ℘IN (taking care of the dyadic rationals), so also the difference in levels of infinity
between IN and ℘IN is enormous.
So there are two fundamentally different kinds of infinity here: the cardinality of IN, |IN|, called ℵ0,
and that of ℘IN, called ℵ1, etc. Notice that by Cantor's theorem we have an infinite hierarchy of notions
of infinity:

IN ≉ ℘IN ≉ ℘(℘IN) ≉ ℘(℘(℘IN)) ≉ ℘(℘(℘(℘IN))) · · ·

and

|IN| < |℘IN| < |℘(℘IN)| < |℘(℘(℘IN))| < |℘(℘(℘(℘IN)))| · · · .

Since ℘IN ≈ IR, also ℘(℘IN) ≈ ℘IR, etc.
The continuum hypothesis states that there is no different notion of infinity in between IN and ℘IN,
a hypothesis that can neither be proved nor disproved within present set theory.

Exercises
Exercise 6.1 1) Using that ‘every non-empty subset of IN has a least element’, show that every subset of
IN is countable.
2) Show that if V is not countable, and V ⊆ W, then neither is W.
Exercise 6.2 Let A be a non-empty set. Take the statements

A is countable (5)
there exists a surjection from IN to A (6)
there exists an injection from A to IN (7)

Show that (5) implies (6), that (6) implies (7) (using 'every non-empty subset V of IN has a least element'),
and that (7) implies (5).
Exercise 6.3 1) Assume that A ≈ B; show that A2 ≈ B2 .
2) Assume that the set A is infinite and countable; show that A ≈ A2 .
Exercise 6.4 Using the (Dual) Cantor-Bernstein Theorem, show that the union of two countable infinite
sets V and W is countable.
Exercise 6.5 1) Prove that { 0, 1 }× IN is countable.
2) Use this property to prove that if the disjoint sets V and W are countable, then so is V ∪ W.
3) Now prove that if V and W are countable then so is V ∪ W. You’ll need Exercise 6.1 here.
Exercise 6.6 Show that the sets ZZ 2 and IN 3 are countable.
Exercise 6.7 Show that a countable union of countable infinite sets is countable.
Exercise 6.8 Show { 0, 1 }^V ≈ ℘(V).
Exercise 6.9 Show that { 0, 1 }^IN is not countable.
* Exercise 6.10 One of the following sets is finite, one countably infinite and one uncountable. Determine
which is which, and justify your answer.
1) { f : IN → { 0, 1 } | ∀ n ∈ IN ( f (n) ≤ f (n + 1) ) }

2) { f : IN → { 0, 1 } | ∀ n ∈ IN ( f (n) ≠ f (n + 1) ) }
3) { f : IN → IN | ∀ n ∈ IN ( f (n) ≤ f (n + 1) ) }

* 7 Axioms for Set Theory


We shall now, for completeness’ sake, state the axioms for set theory as formulated by Zermelo and
Fränkel. For reasons of legibility, we state these in English rather than as mathematical formulae.

Definition 7.1 (Z ERMELO -F R ÄNKEL A XIOMS FOR S ET T HEORY )


Axiom of Extensionality : Two sets are equal (are the same set) if they have the same elements: for all
sets A, B, if for all z we have z ∈ A exactly when z ∈ B, then A = B.
∀ A, B ( ∀ x ( x ∈ A ⟺ x ∈ B ) ⇒ A = B ). 32
Axiom of Regularity : (also called the Axiom of Foundation) Every non-empty set A contains a member
y such that A and y are disjoint sets. (This implies, for example, that no set contains only itself.)
∀ A ( ∃ x ( x ∈ A ) ⇒ ∃ y ∈ A ( ¬∃ z ∈ y ( z ∈ A )))
Axiom of Specification : (also called the axiom schema of separation or of restricted comprehension)
Every subclass of a set that is defined through a predicate is itself a set.
∀ w1, . . . , wn ∀ A ∃ B ∀ x ( x ∈ B ⟺ ( x ∈ A ∧ ϕ( x, w1, . . . , wn, A) ) )
Axiom of Pairing : If A and B are sets, then there exists a set which contains A and B as elements.
∀ A ∀ B ∃ C ( A ∈ C ∧ B ∈ C ).
Axiom of Union : For every set S of sets, the collection of all members of the sets in S, i.e. the collection
{ x ∈ B | B ∈ S }, is a set.
∀S ∃ A ∀Y ∀ x ∈ Y ( Y ∈ S ⇒ x ∈ A ).
Axiom of Replacement : The image of the domain set A under the definable function f (i.e. the range
of f ) falls inside a set B.
Axiom of Infinity : There exists a set that has infinitely many members.
∃ A ( ∅ ∈ A ∧ ∀ x ∈ A ( x ∪ { x } ∈ A )) 33
Axiom of Power set : For any set A, there exists a set that contains every subset of A.
Axiom of Well-ordering : For any set A, there exists a binary relation R which is a well-founded total
order on A.

32 Notice that here we quantify over ‘all sets’, which is strictly not allowed. We can use this property for every pair of sets,

but cannot use the formula in a proof. This is true for the other axioms as well.
33 This is actually the definition of natural numbers.

