Integer and Polynomial Algebra

This document is the table of contents for a book titled "Integer and Polynomial Algebra". It lists 7 chapters that cover topics such as the integers, modular arithmetic, Diophantine equations, codes and factoring, real and complex numbers, the ring of polynomials, and finite fields. The preface provides context that this book originated from course notes for an algebra course for freshman math majors, which introduces rigorous mathematics using the familiar settings of integers and polynomials. It is intended to be a one semester course but the book has additional material for flexibility and independent reading.

MATHEMATICAL WORLD VOLUME 31

Integer and
Polynomial Algebra

Kenneth R. Davidson
Matthew Satriano
Cover design based on picture/iStock/Getty Images Plus
and MediaProduction/E+ via Getty Images.

2020 Mathematics Subject Classification. Primary 11-01, 12-01, 13-01.

For additional information and updates on this book, visit


www.ams.org/bookpages/mawrld-31

Library of Congress Cataloging-in-Publication Data


Cataloging-in-Publication Data has been applied for by the AMS.
See http://www.loc.gov/publish/cip/.
DOI: https://doi.org/10.1090/mawrld/31

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.

© 2023 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
10 9 8 7 6 5 4 3 2 1 28 27 26 25 24 23
Dedication

To Virginia, Colin, Stuart and Zoé.


–K.R.D.

To Waiwai and Quinn, and in loving memory of Susan Satriano.


–M.S.
Contents
Preface

Chapter 1. The Integers
1.1. Basic Properties
1.2. Well Ordering Principle
1.3. Primes
1.4. Many Primes
1.5. Euclidean Algorithm
1.6. Factoring Integers
1.7. Irrational Numbers
1.8. Unique Factorization in More General Rings
Notes on Chapter 1

Chapter 2. Modular Arithmetic
2.1. Linear Equations
2.2. Congruences
2.3. The Ring $Z_n$
2.4. Equivalence Relations
2.5. Chinese Remainder Theorem
2.6. Congruence Equations
2.7. Fermat’s Little Theorem
2.8. Euler’s Theorem
2.9. More on Euler’s Phi Function
2.10. Primitive Roots
Notes on Chapter 2

Chapter 3. Diophantine Equations and Quadratic Number Domains
3.1. Pythagorean Triples
3.2. Fermat’s Equation for n = 4
3.3. Quadratic Number Domains
3.4. Pell’s Equation
3.5. The Gaussian Integers
3.6. Quadratic Reciprocity
Notes on Chapter 3

Chapter 4. Codes and Factoring
4.1. Codes
4.2. The Rivest-Shamir-Adleman Scheme
4.3. Primality Testing
4.4. Factoring Algorithms
Notes on Chapter 4

Chapter 5. Real and Complex Numbers
5.1. Real Numbers
5.2. Complex Numbers
5.3. Polar Form
5.4. The Exponential Function
5.5. Fundamental Theorem of Algebra
5.6. Real Polynomials
Notes on Chapter 5

Chapter 6. The Ring of Polynomials
6.1. Preliminaries on Polynomials
6.2. Unique Factorization for Polynomials
6.3. Irreducible Polynomials in Z[x]
6.4. Eisenstein’s Criterion
6.5. Factoring Modulo Primes
6.6. Algebraic Numbers
6.7. Transcendental Numbers
6.8. Sturm’s Algorithm
6.9. Symmetric Functions
6.10. Cubic Polynomials
Notes on Chapter 6

Chapter 7. Finite Fields
7.1. Arithmetic Modulo a Polynomial
7.2. An Eight-Element Field
7.3. Fermat’s Little Theorem for Finite Fields
7.4. Characteristic
7.5. Algebraic Elements
7.6. Finite Fields
7.7. Automorphisms of $F_{p^d}$
7.8. Irreducible Polynomials of All Degrees
7.9. Factoring Algorithms for Polynomials
7.10. Factoring Rational Polynomials
Notes on Chapter 7

Bibliography
Index
Preface

This little book began as a set of course notes for an unusual but very at-
tractive freshman course in algebra for math majors. The course introduces
students to the notions of rigorous mathematics in the familiar settings of the
integers and polynomials. This is worthwhile because of the strong parallels
between the two theories. Indeed, one can argue that it is these parallels
that led to the theory of commutative algebra as a unifying force.
The current book is an expanded version of those notes. Some material
has been added, and many more exercises are included. Historical notes are
given at the end of each chapter with references to a few sources for the
material.
These topics have the advantage of being somewhat familiar to a good
high school graduate, yet harbour many interesting unforeseen results. The
number of different proof techniques in the book makes this a good intro-
duction to a wide variety of new ideas. In particular, special emphasis has
been paid to the role of algorithms in mathematics. Due to the increased use
of symbolic computing, and especially because of the availability of MAPLE
here at Waterloo, it has been natural to investigate the theory behind many
of these computations. It also provides an opportunity to have students work
out problems with much larger numbers. Many other symbolic computation
programs, such as MATHEMATICA, are equally good for use in this course.
This course has been taught at the University of Waterloo for over thirty
years. Until about a decade ago, roughly 800–1200 first year students in the
mathematical sciences took a course using the textbook Classical Algebra by
W.J. Gilbert, now in a revised edition [13] co-authored by S.A. Vanstone.
About 5% of these students took the ‘advanced’ version using these notes.
These notes were used for a one semester course. We would cover much
of the material in this book, but not all. In writing this book, it has seemed
advisable to expand on certain connections beyond the scope of the course.
It is hoped that this will provide greater flexibility for the instructor and
additional reading for the interested student.
Students entering university to study mathematics have probably encountered
prime numbers. Chances are great that they believe every integer
factors uniquely into a product of primes, but have not seen a proof. This
important fact, known as the Fundamental Theorem of Arithmetic, is of
crucial importance in the theory of numbers. It is not easy to prove. More
importantly, it is not intuitively obvious. Indeed, its significance is only re-
alized with very large numbers beyond our real experience. The crucial fact
that enables us to prove this with relative ease is the Euclidean Algorithm
for finding greatest common divisors. Chapters 1 and 2 deal with these
basic properties of the integers and modular arithmetic. After giving the
proof of the Fundamental Theorem of Arithmetic, we show that, in fact, the
proof technique applies in much greater generality. In Section 1.8, we define
Euclidean Domains and prove that all such rings have unique factorization.
Throughout the book, we see applications of this general theorem in a large
variety of settings, such as the Gaussian integers and polynomial rings over
a field.
It is worth noting that there are number systems not very much different
from the integers in which unique factorization into primes fails. Far from
being a disaster, this is an opportunity to investigate why this phenomenon
occurs. It shows us which properties of the integers themselves are crucial
to make the theory work. That is why we make a foray into quadratic
number domains in Chapter 3. Already the material covered in Chapters 1–3
allows us to prove Quadratic Reciprocity, one of the crowning achievements
of elementary number theory.
A nice application of modular arithmetic is the Rivest-Shamir-Adleman
public key cryptography scheme. This code, which is covered in Chapter 4,
allows the author to publish the method of encoding a message in a public
place, while keeping the method of decoding the message secret. This is
a rather different idea in coding, as for all previously known codes, the
method of decoding merely reversed the encoding method. The secret here
is that it is very easy (with a computer) to find large primes (say 200–300
digits) but very difficult to factor the product of two large primes. When
one first encounters the problem of determining if a given number is prime,
it is natural to try the brute force method of dividing by all numbers up
to the square root. However, it turns out there are beautiful and clever
methods to test for primality without finding any factors at all. We delve
more deeply into this subject, briefly discussing the Agrawal-Kayal-Saxena
algorithm and its connection to the topics we have seen thus far. We also
discuss the probabilistic test due to Miller-Rabin.
In Chapter 5, we introduce the complex numbers. There is a tacit as-
sumption that the student is already reasonably familiar with the real num-
bers from studying calculus. However, a section is devoted to a brief discus-
sion of how the real numbers are developed. The main result of this chapter
is the Fundamental Theorem of Algebra, which states that every complex
polynomial factors into a product of linear terms. We emphasize how analytic
techniques play a key role in the proof of this cornerstone algebraic
result. The proof we give is one of the simplest, and relies on the Extreme
Value Theorem. We also develop the complex exponential function, which
plays a vital role in applications of the complex numbers.
In Chapter 6, we show that the same theory developed for the integers
applies to the algebra of polynomials. In particular, there is a Euclidean
Algorithm and unique factorization into irreducible polynomials. We exam-
ine various tests for irreducibility, and study connections with irrationality
of the roots. We then follow up with special topics about real and complex
polynomials such as Sturm’s Theorem for counting real roots, and the for-
mula for solving cubics. In Chapter 7, we study finite fields in some detail.
We draw parallels between modular arithmetic for the integers and arith-
metic modulo an irreducible polynomial. Many of the results we have seen
for Zp in earlier chapters carry over to all finite fields. A rather beautiful
application of these ideas is an algorithm for factoring polynomials over the
rationals. This algorithm is based on a method for factoring polynomials
modulo a prime integer p. It turns out that factoring a polynomial of degree
d mod p is much easier than factoring a d digit base p number.
We would like to take this opportunity to thank the people who have
helped with this endeavour. In particular, the first author thanks Stanley
Burris with whom he has had many enjoyable conversations about this ma-
terial. The first author also thanks Keith Geddes for some conversations
on the algorithms used by MAPLE. The second author would like to thank
David Jao and Stephen New for answering questions about the practical as-
pects of RSA. It is a pleasure to thank Anton Mosunov for a careful reading
of an early draft of the new version of this book and for sending us detailed
comments and corrections. We thank the referees and editors at AMS/MAA
for their helpful comments. Lastly, we thank the many students in Math
145 classes who suffered through various versions of these notes and offered
many helpful suggestions and corrections.

Kenneth R. Davidson
Matthew Satriano
Waterloo, January, 2023
Chapter 1

The Integers
The basic object which we shall study in the first four chapters is the set
of integers. As a mathematical object, the integers have a wealth of structure.
First, you can add, subtract and multiply integers together. It is
the multiplicative structure which is of most interest, because the recipro-
cal operation of division is not always defined (within the integers). The
notion of divisibility leads to the definition of prime numbers, and then to
the factorization of numbers into primes. The reader may well have been
told that every number factors into primes in a unique way. This non-trivial
result is known as the Fundamental Theorem of Arithmetic. It is far from
obvious. We will prove it in this chapter. In the last section, we will show
that essentially the same argument will work in a very abstract context. The
advantage of doing this is that we will later see several explicit, important
contexts to which it applies, such as the ring of all polynomials.

1.1. Basic Properties


The set of integers is
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }.
Beyond being a set, Z comes with the operations of addition and multipli-
cation. Addition has an inverse operation called subtraction. However the
inverse operation of multiplication, namely division, does not always yield
an integer answer, which leads to the notion of divisibility. Describing the
integers takes a little time, but the following list of properties is natural.

[S1] The integers consist of a set Z together with two binary opera-
tions addition (+) and multiplication (·).
[A1] (commutativity of addition) For all a, b ∈ Z,
a + b = b + a.
[A2] (associativity of addition) For all a, b, c ∈ Z,
(a + b) + c = a + (b + c).
[A3] (additive identity or zero) There is an element 0 ∈ Z so that
for all a ∈ Z,
a + 0 = a = 0 + a.
[A4] (additive inverses) For each a ∈ Z, there is an element −a ∈ Z
such that
a + (−a) = 0.
[M1] (commutativity of multiplication) For all a, b ∈ Z,
a · b = b · a.
[M2] (associativity of multiplication) For all a, b, c ∈ Z,
(a · b) · c = a · (b · c).
[M3] (multiplicative identity or one) There is an element 1 ∈ Z so
that for all a ∈ Z,
a · 1 = a = 1 · a.
[D1] (distributive law) For all a, b, c ∈ Z,
(a + b) · c = a · c + b · c.
We did not define subtraction—it is enough to include the additive in-
verse. That is because a − b is just an abbreviation for a + (−b).
This is certainly a list of properties satisfied by the integers. But this
collection of properties is satisfied by many other mathematical sets, for
example by the collection R of all real numbers and by Q, the set of rational
numbers (fractions). Also, the set
$$Z[\sqrt{3}] = \{a + b\sqrt{3} : a, b \in Z\},$$
with the usual operations satisfies all these properties. Consider the set
Z ⊕ Z = {(a, b) : a, b ∈ Z}
with coordinate-wise addition and multiplication, i.e.,
(a, b) + (c, d) = (a + c, b + d) and (a, b) · (c, d) = (ac, bd).
This also satisfies these laws. What are the zero and one in this case?
In fact a great many mathematical objects satisfy these laws. They are
called commutative rings. The word ring is used for a set satisfying all
these laws except M1–commutativity of multiplication, and with another
distributive law added:
[D2] For all a, b, c ∈ Z,
a · (b + c) = a · b + a · c
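The set of 2 × 2 matrices with integer entries is an example of a
non-commutative ring. Addition is coordinate-wise, but multiplication is defined
by the rule
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.$$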
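To see concretely that this multiplication is not commutative, here is a minimal Python sketch. It assumes a 2 × 2 matrix is stored as a pair of rows; the helper name `mat_mul` and the two sample matrices are illustrative choices, not taken from the text.

```python
# Hypothetical representation: a 2 x 2 integer matrix is a tuple of two rows.
def mat_mul(m, n):
    """Multiply two 2 x 2 matrices using the rule displayed above."""
    (a, b), (c, d) = m
    (w, x), (y, z) = n
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))

A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)        # ((2, 1), (1, 1))
print(BA)        # ((1, 1), (1, 2))
print(AB == BA)  # False
```

A single pair of matrices with AB different from BA is enough to show that [M1] fails in this ring.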
1.1.1. Example. Another example that will play an important role in this
book is the set of integers modulo n. For now, consider the ring $Z_2$
consisting of two elements {0, 1} with operations given by the tables:

+ | 0 1        · | 0 1
--+----        --+----
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1

Notice that in this example, unlike the others, 1 + 1 = 0. This may seem
rather strange, but it gives us a clue about how to add further properties to
the above list to ensure that the integers are the only example.
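
The two tables can be generated mechanically. The following Python sketch assumes that the operations are ordinary addition and multiplication followed by taking the remainder on division by n, which agrees with the tables above when n = 2; `tables_mod` is a hypothetical helper name.

```python
def tables_mod(n):
    """Return the addition and multiplication tables for {0, 1, ..., n-1},
    where each result is reduced to its remainder on division by n."""
    add = [[(a + b) % n for b in range(n)] for a in range(n)]
    mul = [[(a * b) % n for b in range(n)] for a in range(n)]
    return add, mul

add2, mul2 = tables_mod(2)
print(add2)  # [[0, 1], [1, 0]]
print(mul2)  # [[0, 0], [0, 1]]
```

The entry `add2[1][1]` is 0, which is the relation 1 + 1 = 0 noted above.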

One property that will ensure we do not get too big a set is a stipulation
that
[G1] Z is generated by {0, 1} as a ring.
This means that we start with 0 and 1, and form all the elements needed
to provide the minimal collection satisfying all our properties. Since a ring
is closed under the operations of addition and multiplication, we need all
the numbers of the form 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1,. . . . You should
convince yourself that the distributive law ensures that this set is closed
under multiplication, as well as addition. In order to satisfy [A4], additive
inverses, we may have to add in −1, −(1 + 1),. . . . You should now convince
yourself that this collection is rich enough to satisfy all the properties. This
includes checking that (−1) + (−1) = −(1 + 1), etc. None of the necessary
steps are hard, but it is very time-consuming to write them all out.
Unfortunately, this still does not ensure that we have the integers. The
example $Z_2$ above is also generated by its 0 and 1. We can eliminate this
example by decreeing that 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1,. . . are all different.
If this holds in any ring, the collection S = {1,1+1,1+1+1,1+1+1+1, . . .}
will be indistinguishable from the natural numbers N = {1, 2, 3, . . .} by
any mathematical property. In fact, to ensure that they are all different, it
is enough that none are 0. Why? Then it follows that −S does not intersect
S (why?), and that R = S ∪ {0} ∪ −S is a ring which has all the same
properties as Z. We will name this last property [F1] for free:
[F1] No nontrivial sum of 1’s is equal to 0.
We have not written down all the important properties of the integers.
But at least, we have come up with a list of properties that distinguishes
the integers from other similar objects. Before leaving this point, we will
show how we can define another very useful property – order – using what
we already have. Define order as follows:
a < b if b − a ∈ N
a = b if b − a = 0
a > b if b − a ∈ −N.
This order satisfies some simple properties:
[O1] For all a, b, and c in Z with a < b, a + c < b + c.
[O2] For all a, b, and c in Z with a < b and c > 0, ac < bc.
We say that an integer n is positive if n > 0. Notice that by definition
of the ordering on Z, an integer n is positive if and only if n ∈ N.
What do we mean when we say that two mathematical objects are the
same, or at least have exactly the same properties? Throughout mathematics,
one is concerned about this issue. It is generally dealt with by considering
maps between sets that preserve the structure that one is studying.
The following definition captures part of this for rings.

1.1.2. Definition. If R and S are rings, a function ϕ : R → S is called a
ring isomorphism if ϕ is a bijection (i.e., one-to-one and onto) such that
• ϕ(0) = 0 and ϕ(1) = 1,
• $\phi(r_1 + r_2) = \phi(r_1) + \phi(r_2)$ for all $r_1, r_2 \in R$,
• $\phi(r_1 r_2) = \phi(r_1)\phi(r_2)$ for all $r_1, r_2 \in R$.
Say R and S are isomorphic if there exists a ring isomorphism ϕ : R → S.

If R and S are isomorphic rings, then they are indistinguishable on the
basis of their properties as rings. We consider them to be equivalent objects.
For an example, see Exercise 7, where the ring constructed is just the
integers ‘in disguise’.

Exercises


1. Show that $Z[\sqrt{3}]$ is a commutative ring.
2. Show that if [F1] holds, then sums of different numbers of 1’s are all
distinct, and their additive inverses are all distinct from sums of 1’s.
3. Verify the properties of a commutative ring for $Z_2$.
4. Can an operation < be put on $Z_2$ satisfying [O1]?
5. Describe explicitly the ring $Z[\sqrt[3]{5}]$ generated by 1 and $\sqrt[3]{5}$.
6. (a) What are the additive and multiplicative identities for Z ⊕ Z?
(b) Show that there are non-zero elements in Z ⊕ Z which multiply to 0.
7. Consider the ring $R = \{2^n : n \in Z\}$ with addition ⊕ given by
$2^n \oplus 2^m = 2^{n+m}$, and multiplication ⊙ given by $2^n \odot 2^m = 2^{nm}$.
Show that this is a ring. Then show that the map taking $2^n$ to its
logarithm base 2 (namely n) is a ring isomorphism from R to Z.
8. What other properties of the integers can you think of? Can these
properties be deduced from [S1], [A1]–[A4], [M1]–[M3], and [D1]?

1.2. Well Ordering Principle


In this section, we will look at a ‘self evident’ principle. We shall see that it
leads us to the principle of induction, a basic proof technique which we will
introduce here. In mathematics, one does have to be careful about what we
think is self-evident, as this is not as clear as the reader might think. This
principle can be justified.

1.2.1 Well Ordering Principle. Every non-empty subset of N has a
least element.

This is true for the following reason. If S is a non-empty subset of N,
then it contains an element s. Consider the finite list of integers 1, 2, 3, . . . , s.
The first integer in this list which belongs to S is the desired least element.
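
The justification above is effectively a search procedure. Here is a minimal Python sketch of it, assuming the set S is described by a membership test `belongs` (a hypothetical name) and that one element s of S is already known.

```python
def least_element(belongs, s):
    """Scan 1, 2, ..., s and return the first integer that belongs to S.

    `belongs(k)` is assumed to report whether k is in S, and s is assumed
    to be a known element of S, so the scan is guaranteed to succeed.
    """
    if s < 1 or not belongs(s):
        raise ValueError("s must be a positive integer belonging to S")
    for k in range(1, s + 1):
        if belongs(k):
            return k

# S = multiples of 7 that exceed 20; the element 70 is known to lie in S.
print(least_element(lambda k: k > 20 and k % 7 == 0, 70))  # 21
```

The scan terminates because the list 1, 2, . . . , s is finite, even when S itself is infinite.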
We will use this principle to formalize certain arguments. First, let
us consider induction. Induction is a method used to verify a long (often
infinite) list of propositions. Call the propositions P (n) for n ∈ N. That is,
each P (n) is a mathematical statement which might be true or false.

1.2.2 Principle of Induction. Suppose that proposition P (1) is true.
Furthermore, suppose that for each n > 1, if P (k) is true for all 1 ≤ k < n,
then P (n) is true. Then P (n) is true for all n ≥ 1.

Proof. Let S be the set of all n such that P (n) is false. If S is empty, we
have the desired conclusion. Otherwise, S is non-empty. In this case, the
Well Ordering Principle tells us that S has a least element, say n. Since
P (1) is true by hypothesis, n ≠ 1. Since n is the smallest integer in S, we
see that P (k) is true for all 1 ≤ k < n. By the induction hypothesis, P (n) is
true. This contradicts the fact that P (n) is false. The contradiction must be
due to a false supposition; in this case, that must be the supposition that S is
non-empty. So S is empty, and P (n) is true for all n ≥ 1. □

Sometimes this is called the generalized principle of induction because
it allows all of the statements P (k) for 1 ≤ k < n to be assumed true
in order to deduce P (n), not just P (n − 1). This is sometimes an
important improvement. See the Second Proof of Theorem 1.3.3 in the next
section.

1.2.3. Example. We look for a formula for the sum of the first n squares:
$$s_n = \sum_{i=1}^{n} i^2.$$
The first few terms are 1, 5, 14, 30, 55, 91. While no obvious
formula is apparent, the reader can check that
$$P(n): \quad s_n = \frac{n(n+1)(2n+1)}{6}$$
is valid for n = 1, 2, 3, 4, 5, 6. We will use induction to verify this for all n ≥ 1.
First
$$\frac{1(1+1)(2 \cdot 1 + 1)}{6} = \frac{6}{6} = 1 = s_1.$$
Thus P (1) is true. In this example, to check P (n), it is enough to use the
fact that P (n − 1) is true. Then
$$s_n = s_{n-1} + n^2 = \frac{(n-1)(n)(2n-1)}{6} + n^2 = \frac{2n^3 - 3n^2 + n + 6n^2}{6} = \frac{2n^3 + 3n^2 + n}{6} = \frac{n(n+1)(2n+1)}{6}.$$
Thus if P (n − 1) is true, so is P (n). By induction, this formula holds for all
n ≥ 1.
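
A numerical check is no substitute for the induction argument, but it is a quick way to catch an algebra slip. This Python sketch compares the direct sum with the closed form; the function names are illustrative only.

```python
def sum_of_squares(n):
    """Compute 1^2 + 2^2 + ... + n^2 directly."""
    return sum(i * i for i in range(1, n + 1))

def closed_form(n):
    """Evaluate n(n+1)(2n+1)/6; the division is always exact."""
    return n * (n + 1) * (2 * n + 1) // 6

first_terms = [sum_of_squares(n) for n in range(1, 7)]
print(first_terms)  # [1, 5, 14, 30, 55, 91]

# Agreement for n = 1, ..., 1000 is evidence, not a proof.
print(all(sum_of_squares(n) == closed_form(n) for n in range(1, 1001)))  # True
```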

1.2.4. Example. Consider the following ‘proof’ by induction. We will
show that all people have the same hair colour. Let P (n) be the statement
that every set of n people all have the same hair colour. This is evident for
n = 1. Now look at larger n. Suppose that P (n − 1) is true. Given a group
of n people, apply the induction hypothesis to all but the last person in the
group. The people in this smaller group all have the same hair colour. Now
repeat this argument
with all but the first person. We find that all the people have the same hair
colour by combining these two facts. By induction, all people have the same
hair colour.
This is patently absurd, and you are undoubtedly ready to refute this
by saying that Eric has different hair colour from Alana. But we want you
to find the mistake in the induction argument.

Exercises

1. Prove by induction that
$$\sum_{i=1}^{n} i^3 = \Bigl(\sum_{i=1}^{n} i\Bigr)^2.$$
2. Prove by induction that
$$\sum_{i=1}^{n} (-1)^i i^2 = \frac{(-1)^n n(n+1)}{2}.$$
3. Find the error in the induction argument in Example 1.2.4.
Hint: P (1) is true, and P (73) implies P (74).
4. Prove that $n! > 2^n > n^2$ for n ≥ 5.
5. Prove that if x > −1 is a real number and n ≥ 1, then $(1 + x)^n \ge 1 + nx$.
6. Let x > 1 be a real number such that $x + x^{-1}$ is an integer. Prove that
$x^n + x^{-n}$ is an integer for all n ≥ 1.
Hint: evaluate $(x + x^{-1})(x^n + x^{-n})$.
7. Consider the Fibonacci sequence, given by F (0) = F (1) = 1, and for
n ≥ 0, F (n + 2) = F (n) + F (n + 1). Let $\tau = (\sqrt{5} + 1)/2$. Prove by
induction that
$$F(n) = \bigl(\tau^{n+1} - (-1/\tau)^{n+1}\bigr)/\sqrt{5}.$$
8. Define a sequence of real numbers by the rules
$$s_0 = 0 \quad\text{and}\quad s_{n+1} = \sqrt{3 + s_n} \quad\text{for } n \ge 0.$$
(a) Show by induction that $s_n < s_{n+1} < 3$ for all n ≥ 0.
(b) The least upper bound principle (see Chapter 5) shows that the
sequence has a limit. Show that the limit should be $\sigma = \frac{1 + \sqrt{13}}{2}$.
(c) Obtain a formula for $\sigma - s_{n+1}$ in terms of $\sigma - s_n$. Hence prove by
induction that $0 < \sigma - s_n < 3/4^n$ for all n ≥ 0.
9. A real number x has a decimal expansion $x = x_0.x_1 x_2 x_3 \ldots$ where
$x_0 \in Z$ and $x_i \in \{0, 1, 2, \ldots, 9\}$ for i ≥ 1. Say that this expansion is eventually
periodic if there are positive integers d and N so that $x_{n+d} = x_n$ for all
n ≥ N . Prove that a real number with eventually periodic decimal
expansion is rational.
Hint: consider $10^{N+d} x - 10^N x$.
10. Let $x = p/q$ be a rational number, with q ≥ 1.
(a) Find $r_k \in \{0, 1, \ldots, q-1\}$ so that $10^k = a_k q + r_k$ for k ≥ 0. Show
that there are two integers 0 ≤ k < l ≤ q such that $r_k = r_l$; so
$q \mid (10^l - 10^k)$. Hint: the pigeonhole principle states that if q + 1
objects are placed in q boxes, at least one box has two or more
objects in it.
(b) Show that if $0 \le a < 10^d - 1$, then $\frac{a}{10^d - 1}$ has a periodic decimal
expansion.
(c) Prove that x has an eventually periodic decimal expansion.
Hint: consider $(10^l - 10^k)x$.

1.3. Primes
We have noted that division is not a part of the axioms for the integers.
There are two good reasons for this. The first is that a/b is not defined as
an integer for all pairs of integers a and b with b ≠ 0. Secondly, division
is the inverse relation to multiplication in the same way that subtraction
is the inverse of addition. Subtraction does not occur in the axioms either;
but is shorthand for combining addition with the additive inverse. For these
reasons, we define divisibility in terms of multiplication.
1.3.1. Definition. Say that an integer a divides an integer b if there is
an integer c such that b = ac. The notation for this is a|b.
An integer p is prime if p ≠ ±1 and the only integers which divide p
are ±1 and ±p.

We don’t consider ±1 to be prime because they are invertible in Z;
such elements are called units of Z. They divide every number, so including
them as factors does not
substantially change the factorization. Note that ±2 and ±3 are primes. So
6 = 2 · 3 = (−2)(−3) = 3 · 2 = (−3)(−2).
These are considered to be trivial differences because permutation of factors
is irrelevant since multiplication is commutative, and −1 is a unit, so that
we can put this as a factor into any term. Common practice is to factor
positive integers into a product of positive primes in increasing order.
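
Definition 1.3.1 can be turned into a direct, if slow, test. The Python sketch below is illustrative only: the names `divides` and `is_prime` are hypothetical, and the search is cut off at |p| on the assumption that a divisor of a non-zero integer cannot exceed it in absolute value.

```python
def divides(a, b):
    """Return True if a | b, i.e. b = a*c for some integer c."""
    if a == 0:
        return b == 0          # 0*c = 0 for every c, so 0 divides only 0
    return b % a == 0

def is_prime(p):
    """Test primality in the sense of Definition 1.3.1 (negative primes allowed)."""
    if p in (1, -1):
        return False           # units are excluded by definition
    if p == 0:
        return False           # every integer divides 0, e.g. 2 | 0
    # Any divisor d with 1 < d < |p| is neither ±1 nor ±p.
    return not any(divides(d, p) for d in range(2, abs(p)))

print([p for p in range(-7, 8) if is_prime(p)])
# [-7, -5, -3, -2, 2, 3, 5, 7]
```

Only positive candidate divisors are tried, since d divides p exactly when −d does.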
The most important fact about factoring integers is that each integer
can be written as a product of primes in exactly one way (up to signs and
permutation of the factors). This is known as the Fundamental Theorem of
Arithmetic. It is not particularly easy to prove. Indeed, it would be quite
an accomplishment to do this properly without having seen a proof yourself.
We will prove this theorem in this book, but it will take some preparation.
In this section, we content ourselves with something easier—existence of
a factorization into primes.

1.3.2. Lemma. If n = ab with a, b ∈ N, then a ≤ n. In particular, if
b ≠ 1, then a < n.

Proof. Since b ∈ N, we have 1 ≤ b. Thus, a ≤ ab = n. Furthermore, if
b ≠ 1, then 1 < b and so a < ab = n. □

1.3.3. Theorem. Every integer n > 1 is the product of finitely many
primes.

Proof. Let S be the set of all integers n > 1 which are not the product of
finitely many primes. We want to prove that S is empty. If it is not empty,
then by applying the Well Ordering Principle, we obtain a least integer n
which cannot be factored into primes. If n were prime, it would be the
product of one prime, namely itself. So, n cannot be prime and we may
write n = ab where a and b are positive integers, neither of which is 1. By
Lemma 1.3.2, we see a and b are less than n, so they cannot belong to S.
Therefore both can be factored into primes, say
$$a = p_1 p_2 \cdots p_k \quad\text{and}\quad b = q_1 q_2 \cdots q_l.$$
Then n can be factored as $n = p_1 p_2 \cdots p_k q_1 q_2 \cdots q_l$. This contradicts the
fact that n is in S, and so S must be empty. □


Second Proof. This is the same proof, but using induction rather
than the Well Ordering Principle. Notice that we need the full generality of
the Principle of Induction.
Let P (n) for n ≥ 2 be the statement that n factors as the product of
primes. Check by hand that 2 is prime, and so P (2) holds. See Exercise 5.
(This is our starting point, since there is no statement P (1).) Now suppose
that P (k) holds for all k < n. If n is prime, then it is the product of one
prime, so P (n) holds. On the other hand, if n is not prime, factor n = ab
for a, b > 1. As above, 1 < a, b < n. By the induction hypothesis, P (a) and
P (b) are true. (Here is where the full strength of the induction hypothesis
is required.) Therefore, we can factor a and b into products of primes. As
above, we can multiply them together to obtain a factorization of n into a
product of primes. By induction, the theorem is true for all n ≥ 2. □
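
The second proof is constructive, and its structure translates into a recursive procedure. The following Python sketch mirrors the argument rather than aiming for efficiency; `smallest_proper_divisor` and `factor` are hypothetical names.

```python
def smallest_proper_divisor(n):
    """Return the least a with 1 < a < n and a | n, or None if n is prime."""
    for a in range(2, n):
        if n % a == 0:
            return a
    return None

def factor(n):
    """Return a list of primes whose product is n, for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    a = smallest_proper_divisor(n)
    if a is None:
        return [n]                   # n is prime: a product of one prime
    b = n // a                       # n = ab with 1 < a, b < n
    return factor(a) + factor(b)     # both calls are on smaller integers

print(factor(360))  # [2, 2, 2, 3, 3, 5]
print(factor(97))   # [97]
```

The recursion terminates for the same reason the induction works: each call is made on an integer strictly smaller than n. Nothing here shows that a different choice of a and b would lead to the same primes.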

Why is this not enough for the Fundamental Theorem mentioned above?
Because we do not know if the product of two different sets of primes can be
the same! For small numbers, you know from experience that there is only
one way to factor them. This property is known as Unique Factorization.
But how many 1000 digit numbers have you tested? If the answer is more
than one, what about numbers with $10^{10^{10}}$ digits? We need an argument
that goes beyond this common experience. The tool we need is the Euclidean
algorithm, which we develop in a later section.

Exercises

1. Let a, b, c, r, s be integers. Show that if a|b and a|c, then a|(br + cs).
2. Show that if a|b and b|c, then a|c.
3. Show that if c and d are integers such that c|d and d|c, then d = ±c.

4. Show that if 1 < a ∈ N has no divisor p with 1 < p ≤ √a, then a is
prime.
5. Use Lemma 1.3.2 to show that 2 is prime.
6. Show that if a product of integers a = a1 a2 · · · an is even, then at least
one of the factors ai must be even.
7. Show that if a product of integers a = a1 a2 · · · an is a multiple of 3, then
at least one of the factors ai is a multiple of 3.
8. Does your method of proof in the previous question give any insight into
what happens when we replace 3 by 1049?
9. (Sieve of Eratosthenes) Imagine that you have listed all of the inte-
gers from 1 to 10000. Cross out 1. Now 2 is the first remaining number.
Cross out every second number following 2, i.e., 4, 6, 8, . . . . Now 3 is
the next remaining number. Cross out every third number following 3,
i.e., 6, 9, 12, . . . . Some numbers like 6 and 12 are crossed out more than
once. Show that after you have crossed out all multiples of 97, what
remains is a list of all primes less than 10000.
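The crossing-out procedure of Exercise 9 can be tried on a computer. The sketch below is only an illustration under our own naming (`sieve` and its optional parameter `stop_after` are hypothetical, not from the text); it does not replace the proof the exercise asks for.

```python
def sieve(limit, stop_after=None):
    """List the numbers from 1 to limit (limit >= 1) that are never crossed out.

    If stop_after is given, no multiples of numbers larger than
    stop_after are crossed out.
    """
    crossed = [False] * (limit + 1)
    crossed[0] = crossed[1] = True  # 0 is not on the list; "cross out 1"
    for n in range(2, limit + 1):
        if crossed[n]:
            continue  # n was removed earlier, so it is skipped
        if stop_after is not None and n > stop_after:
            break
        # cross out every n-th number following n: 2n, 3n, 4n, ...
        for multiple in range(2 * n, limit + 1, n):
            crossed[multiple] = True
    return [n for n in range(limit + 1) if not crossed[n]]


print(sieve(30))
print(sieve(10000, stop_after=97) == sieve(10000))
```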

1.4. Many Primes


In this section, we will give two proofs that there are infinitely many primes.
The first proof of this fact is credited to Euclid, and dates from about 200
BCE (see Exercise 2). Our first proof is slightly easier. Our second proof is
much harder, and you may skip it without loss. But it gives some indication
that primes are quite plentiful, whereas with the first proof, primes could
still be very rare.
In fact, the famous Prime Number Theorem shows that the number π(n)
of primes less than or equal to n is approximately n/ log(n) in the sense that
$$\lim_{n\to\infty} \frac{\pi(n)}{n/\log n} = 1.$$
Because the log function grows very slowly, this means that primes are
quite common. The prime number theorem was conjectured by Legendre
and Gauss about 200 years ago. They used extensive tables of primes to
test the conjecture, but were not able to prove it. Riemann introduced the
famous Riemann zeta function, and established important relationships be-
tween the properties of this function and the distribution of the primes. One
of the most important outstanding mathematical problems, known as the
“Riemann hypothesis”, asks about the location of the zeros of this function.
In 1896, Hadamard and de la Vallée Poussin finally proved the prime number
theorem, independently of each other, by obtaining partial information
about the zeros of the zeta function.
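To get a feeling for the statement of the prime number theorem, one can tabulate π(n) against n/ log n for a few values of n. This is a rough numerical sketch only; the helper name `prime_count_table` is our own, and the computation proves nothing about the limit.

```python
import math


def prime_count_table(limit):
    """Return a list pi with pi[n] = number of primes <= n, for 0 <= n <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    pi, count = [], 0
    for n in range(limit + 1):
        count += is_prime[n]
        pi.append(count)
    return pi


pi = prime_count_table(10**6)
for exponent in range(1, 7):
    n = 10**exponent
    ratio = pi[n] / (n / math.log(n))  # natural logarithm
    print(n, pi[n], round(ratio, 3))
```

In this range the ratio stays within about 20% of 1, which is consistent with the theorem but also shows that the convergence is slow.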

1.4.1. Theorem. There are infinitely many primes.

Proof. Let n ≥ 1. By Theorem 1.3.3, there is a prime, say pn , which
divides n! + 1. If pn ≤ n, then pn |n! as well, and hence it would divide
(n! + 1) − n! = 1, which is absurd. Thus pn > n. Therefore the set of prime
numbers is unbounded, and thus is infinite. □
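The proof can be checked on small cases. The following sketch (with a helper name, `smallest_prime_factor`, of our own choosing) finds a prime factor of n! + 1 by trial division and lets us observe that it exceeds n.

```python
import math


def smallest_prime_factor(m):
    """Return the smallest prime factor of m >= 2 by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m), so m itself is prime


for n in range(1, 11):
    m = math.factorial(n) + 1
    print(n, m, smallest_prime_factor(m))
```

Note that n! + 1 need not be prime itself; for instance 4! + 1 = 25. The argument only needs one prime factor of it.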
One way to gauge the density of the primes is the following result which
says that the sum of the reciprocals of the primes diverges. For a quickly
growing series like the powers of 2, the sum of the reciprocals converges
quickly. For the set of perfect squares, one verifies that the sum of the
reciprocals converges by the integral test from calculus. In fact, even for
a series like $n \log n (\log \log n)^2$, the sum of the reciprocals converges. So
prime numbers occur more frequently in some sense. Indeed, this gives
some credence to the prime number theorem.

1.4.2. Theorem. $\sum_{p\ \mathrm{prime}} \frac{1}{p}$ diverges.

Proof. Let us number the primes in increasing order as p1 < p2 < . . ..
Again the proof proceeds by obtaining a contradiction if $\sum_i \frac{1}{p_i} < \infty$. In this
case, the ‘tail’ of the series is small. So we can choose an integer k so that
$$\sum_{i>k} \frac{1}{p_i} < \frac{1}{2}.$$

Fix the large integer $N = 4^{k+1}$. We will count the set {1, 2, . . . , N } in
a different way. The first step is to count the numbers from 1 to N which
have a big prime factor pi for i > k. There are at most N/pi numbers in
this range which are multiples of pi . Adding this up over all i > k, we find
that there are at most
$$n = \sum_{i>k} \frac{N}{p_i} < \frac{N}{2}$$
numbers in {1, . . . , N } which have any of these primes as a factor. (This is
a rather crude estimate because any multiple of more than one large prime
is counted more than once; and if pi > N , there are no multiples at all.)
Now the remaining numbers all have the form
$$a = p_1^{n_1} \cdot p_2^{n_2} \cdots p_k^{n_k}.$$
To count these, we factor out the biggest square possible. That is, we write
$a = b^2 c$ where
$$b = p_1^{m_1} \cdot p_2^{m_2} \cdots p_k^{m_k} \quad\text{and}\quad c = p_1^{e_1} \cdot p_2^{e_2} \cdots p_k^{e_k},$$
where if $n_j = 2m_j$ is even, then $e_j = 0$ and if $n_j = 2m_j + 1$ is odd, then
$e_j = 1$. There are at most $\sqrt{N}$ ways of choosing b since $1 \le b \le \sqrt{a} \le \sqrt{N}$.
Since there are only two choices for each $e_j$, there are at most $2^k$ ways of
choosing c. So altogether, there are at most $m = 2^k \sqrt{N}$ ways of obtaining
numbers of this form in {1, . . . , N }. (This estimate is crude too, but uses a
trick that makes it pretty good.)
Combining these two estimates, we have counted all numbers from 1 to
N at least once. So

$$4^{k+1} = N \le n + m < N/2 + 2^k \sqrt{N} = 2^{2k+1} + 2^k\, 2^{k+1} = 4^{k+1}.$$
This is an absurd statement, contradicting our hypothesis that the recipro-
cals of the primes converged. So the series must diverge. 
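The counting trick in this proof, writing a = b²c with c square free, is easy to experiment with. The sketch below is our own illustration (the names `square_part` and `count_small_prime_numbers` are hypothetical); it computes the decomposition and counts the numbers up to N built only from a given list of small primes, so that the count can be compared with the bound $2^k \sqrt{N}$.

```python
def square_part(a):
    """Write a = b*b*c with c square free, for a >= 1; return (b, c)."""
    b, c = 1, a
    d = 2
    while d * d <= c:
        while c % (d * d) == 0:
            c //= d * d  # move one factor d*d from c into b*b
            b *= d
        d += 1
    return b, c


def count_small_prime_numbers(primes, N):
    """Count the a in {1, ..., N} all of whose prime factors lie in primes."""
    count = 0
    for a in range(1, N + 1):
        rest = a
        for p in primes:
            while rest % p == 0:
                rest //= p
        count += (rest == 1)
    return count


print(square_part(72), square_part(45))
# k = 2 primes and N = 4**(k+1) = 64: compare the count with 2**k * sqrt(N) = 32
print(count_small_prime_numbers([2, 3], 64))
```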

Exercises

1. Show that there are arbitrarily long strings of consecutive composite
numbers.
2. (Euclid’s proof) Suppose that the list of primes is finite: p1 , p2 , . . . , pn .
Consider a prime factor of p1 p2 · · · pn + 1. Conclude that there are
infinitely many primes.
3. The Fermat numbers are the integers $F_n = 2^{2^n} + 1$.
(a) Show that x + 1 divides $x^{2s} - 1$ for any positive integer s. Hence
show that Fn divides Fm − 2 for all m > n.
(b) Show that the Fn have no common prime factors. Hence give another
proof that there are infinitely many primes.
4. Suppose that p1 , . . . , pr is a list of distinct primes. Let N = p1 p2 . . . pr
and qi = N/pi . Define $M = \sum_{i=1}^{r} q_i$. Show that no pi can divide M .
Conclude that there are infinitely many primes.
5. Show that there are infinitely many primes of the form 4n + 3.
Hint: for n ≥ 4, show that n! − 1 has a prime factor pn of this form.
6. Show that if n > 1 and $a^n - 1$ is prime, then a = 2 and n is prime.
Hint: factor the polynomial $x^n - 1$.
7. This is an exercise to see that the prime number theorem is plausible.
(a) Use the integral test to show that the following series converge:
$$\sum_{n=2}^{\infty} \frac{1}{n(\log n)^2} \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{1}{n^2}.$$
(b) Show that $\pi(n) > \frac{n}{(\log n)^2}$ infinitely often.
Hint: show that the sum of the reciprocals of the primes between
$2^{k-1}$ and $2^k$ is at most $2^{1-k}\pi(2^k)$. Use this to estimate the sum of
the reciprocals of all primes.

1.5. Euclidean Algorithm


Long division is an algorithm usually taught in elementary school that allows
one to divide a (usually smaller) number into another (usually larger) one,
and obtain an integer quotient and remainder. This is actually quite a strong
result, as it is the key to establishing the Euclidean algorithm in the next
section. Yet because it is familiar since childhood, we take it for granted.
This is formalized as follows:

1.5.1 Division Algorithm. Suppose a ∈ N and b ∈ Z. Then there are
unique integers q and r such that
b = aq + r and 0 ≤ r < a.

Proof. We will apply the Well Ordering Principle to the set of all non-negative
remainders to obtain the smallest one. Let
S = {s : s = b − aq ≥ 0, and q ∈ Z}.
First note that S is non-empty. For if b ≥ 0, take q = 0 and obtain b ∈ S.
And if b < 0, take q = b to obtain
s = b − ab = b(1 − a) ≥ 0.
Let r be the least element of S (whose existence is guaranteed by the Well
Ordering Principle), and let q be the integer so that r = b − aq. If r ≥ a, then
s = b − (q + 1)a = r − a ≥ 0
is a smaller element of S, contradicting the minimality of r. Therefore 0 ≤ r < a.
It remains to verify uniqueness. Suppose that
b = aq1 + r1 = aq2 + r2 and 0 ≤ ri < a for i = 1, 2.
Subtracting yields a(q1 − q2 ) = r2 − r1 . But −a < r2 − r1 < a, so the only
multiple of a in this range is 0. Hence r2 = r1 , and thus q1 = q2 . 
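In Python, the built-in `divmod` performs floor division, so for a positive divisor it returns exactly the quotient and remainder described in 1.5.1, including when b is negative. The wrapper below is a small sketch; the name `divide` is our own.

```python
def divide(b, a):
    """Return (q, r) with b == a*q + r and 0 <= r < a, for an integer a >= 1."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    # floor division: the remainder takes the sign of a, hence 0 <= r < a
    q, r = divmod(b, a)
    return q, r


print(divide(901, 636))
print(divide(-7, 3))
```

Many other languages truncate toward zero instead, which would give a negative remainder for b = −7 and a = 3; the statement of 1.5.1 rules this out by requiring 0 ≤ r < a.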

1.5.2. Definition. The greatest common divisor of a pair of non-zero
integers a and b is the largest number d, denoted gcd(a, b), which divides
both of them. Two integers a and b are called relatively prime if
gcd(a, b) = 1.

The notion of largest common divisor, in terms of the natural order on
Z, is not directly compatible with divisibility. In other words, small numbers
need not divide big ones. So one cannot say, without some additional
argument, that the largest common divisor of two integers is related to other
divisors in any multiplicative way. In fact, the property that all common divisors
of two numbers divide the greatest common divisor is the basis for proving that
factoring numbers into primes is unique.
The theoretical and computational importance of the greatest common
divisor lies in the fact that there is a simple algorithm for computing it,
which, at the same time, reveals some of the deeper structure. This al-
gorithm is known as the Euclidean algorithm. It is best seen through an
example. But first, we describe the basic idea.
Start with two positive integers a and b, and say that a > b. Divide b into
a to obtain a remainder r1 and quotient q1 . From the division algorithm,
we have 0 ≤ r1 < b, and r1 = a − q1 b. Now divide r1 into b to obtain a
remainder r2 . Notice that r2 can be expressed in terms of b and r1 , and
hence in terms of a and b. Repeat this operation by now dividing r2 into
r1 , etc. Eventually, this process ends because the remainders are decreasing
and must eventually reach zero. The last non-zero remainder will be the
gcd(a, b). As we go along, we keep track of how to express all the remainders
in terms of integer combinations of a and b.
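The basic idea just described, without the bookkeeping of integer combinations, can be sketched as follows. The function names `remainders` and `gcd_by_remainders` are our own and are not part of the text.

```python
def remainders(a, b):
    """Return the successive remainders r1, r2, ..., ending with 0, for a > b > 0."""
    seq = []
    while b != 0:
        a, b = b, a % b  # divide b into a and keep the remainder
        seq.append(b)
    return seq


def gcd_by_remainders(a, b):
    """Return the last non-zero term of the sequence a, b, r1, r2, ..., 0."""
    seq = [a, b] + remainders(a, b)
    return seq[-2]


print(remainders(901, 636))
print(gcd_by_remainders(901, 636))
```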

1.5.3. Example. Consider the algorithm for gcd(901, 636). Now 636 goes
into 901 q1 = 1 times with remainder r1 = 265. So 265 = 901(1) + 636(−1);
we write s1 = 1 and t1 = −1. Next 265 goes into 636 q2 = 2 times with
remainder r2 = 106. Therefore,
106 = 636 − 265(2)
= 636(1) − (901(1) + 636(−1))(2)
= 901(−2) + 636(3).
We write s2 = −2 and t2 = 3. Repeating this procedure, we see that 106
goes into 265 q3 = 2 times with remainder r3 = 53. And
53 = 265 − 106(2)
= (901(1) + 636(−1)) − (901(−2) + 636(3))(2)
= 901(5) + 636(−7).
We set s3 = 5 and t3 = −7. Finally, 53 divides into 106 exactly 2 times with
0 remainder. The following chart helps to keep track of this information.

    r     q     s     t
  901           1     0
  636           0     1
  265     1     1    -1
  106     2    -2     3
   53     2     5    -7
    0     2   -12    17

Notice that one obtains the value of s and t in a given row by subtracting
q times the row above from the row above that.
Now you should notice that 53 divides 901 and 636. The reason this
happens is explained recursively. First, the fact that the next remainder is
0 means that 53 exactly divides 106. The equation 265 = 106(2) + 53 shows
that 53 divides 265. Next, one has 636 = (2)265 + 106, so that 53 divides
636. Finally, since 901 = 636 + 265, it is also a multiple of 53. Thus 53 is a
common divisor of 636 and 901.
Next, suppose that d divides both 636 and 901. Then the equation
53 = 901(5) − 636(7) implies that d divides 53. In particular, 53 must be
the biggest divisor because all common divisors of 636 and 901 divide it.

It seems worthwhile to try to set down the main ideas of the proof here
in general. However, if you do not think that you already have the basic idea
of how it goes, stop now and work out a couple of examples on your own.
Then look over the example above again to see if the arguments make more
sense. Experience shows that trying to understand the general argument
before understanding the concrete example is often futile.

1.5.4 Euclidean Algorithm. Given two positive integers a > b, use
the division algorithm repeatedly to obtain a sequence of remainders $r_i$ for
$1 \le i \le k + 1$ until the last remainder $r_{k+1} = 0$. Then $\gcd(a, b) = r_k$, and
there are integers s and t so that gcd(a, b) = as+bt. Moreover, every divisor
of both a and b divides gcd(a, b).

Proof. For convenience of notation, we will write $r_{-1} = a$ and $r_0 = b$.
Notice that a and b are combinations of themselves. That is, a = a(1) + b(0)
and b = a(0) + b(1). So we define $s_{-1} = 1$, $t_{-1} = 0$, $s_0 = 0$, and $t_0 = 1$.
Now we proceed with our algorithm by induction. At each stage, we have
each remainder $r_i = a s_i + b t_i$. If $r_i \ne 0$, divide it into $r_{i-1}$ to obtain
$r_{i-1} = r_i q_{i+1} + r_{i+1}$ with remainder $0 \le r_{i+1} < r_i$. We have the equation
$$\begin{aligned}
r_{i+1} &= r_{i-1} - r_i q_{i+1} \\
&= (a s_{i-1} + b t_{i-1}) - (a s_i + b t_i) q_{i+1} \\
&= a (s_{i-1} - s_i q_{i+1}) + b (t_{i-1} - t_i q_{i+1}).
\end{aligned}$$
This writes $r_{i+1}$ in the form $a s_{i+1} + b t_{i+1}$, and in fact yields the explicit
expressions $s_{i+1} = s_{i-1} - s_i q_{i+1}$ and $t_{i+1} = t_{i-1} - t_i q_{i+1}$. Since $r_i$ is a
strictly decreasing sequence of non-negative integers, this process eventually
stops with a zero remainder $r_{k+1}$.
Now we work our way back up the list, proving that $r_k$ divides all of the
$r_i$. To begin, $r_k$ divides itself; and the identity $r_{k-1} = r_k q_{k+1}$ shows that
$r_k$ divides $r_{k-1}$. Suppose that we have shown that $r_k$ divides $r_{i+1}$ and $r_i$.
The identity $r_{i-1} = r_i q_{i+1} + r_{i+1}$ holds. Since $r_k$ divides the right hand side
of the equation, it must also divide $r_{i-1}$. Continue this process until it is
shown that $r_k$ divides both $r_0 = b$ and $r_{-1} = a$.
Lastly, it must be shown that every divisor d of both a and b divides $r_k$.
Now $r_k = a s_k + b t_k$. It is clear that d divides the right-hand side, hence d
divides $r_k$. Thus $r_k = \gcd(a, b)$. □
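The recursion for the remainders and for the coefficients s and t in this proof translates directly into a short program. The following is one possible sketch, assuming a > b > 0 as in the theorem; the function name `extended_euclid` and the row format (r, q, s, t) are our own choices.

```python
def extended_euclid(a, b):
    """Return (g, s, t, rows) with g == gcd(a, b) == a*s + b*t.

    rows lists (r, q, s, t) for every step, starting with the two
    initial rows (a, None, 1, 0) and (b, None, 0, 1).
    """
    rows = [(a, None, 1, 0), (b, None, 0, 1)]
    while rows[-1][0] != 0:
        r_prev, _, s_prev, t_prev = rows[-2]
        r_cur, _, s_cur, t_cur = rows[-1]
        q, r_next = divmod(r_prev, r_cur)
        # new row = (row above that) - q * (row above)
        rows.append((r_next, q, s_prev - q * s_cur, t_prev - q * t_cur))
    g, _, s, t = rows[-2]  # the last non-zero remainder and its coefficients
    return g, s, t, rows


g, s, t, rows = extended_euclid(901, 636)
print(g, s, t)
for row in rows:
    print(row)
```

For a = 901 and b = 636 the printed rows agree with the chart in Example 1.5.3.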
Extending 1.5.4 slightly further, we can give an alternative characteriza-
tion of the gcd: while the gcd is defined to be the greatest common divisor, it
turns out that it is also the least positive solution to a certain Diophantine
equation. A Diophantine equation is an equation with integer coefficients
for which we seek only integer solutions. This will be explored in greater
depth in Chapter 3.

1.5.5. Corollary. Let a and b be positive integers. Then gcd(a, b) is the
least positive integer d for which there exist x, y ∈ Z with
ax + by = d.

Proof. Let d′ = gcd(a, b) and let d be the least positive integer for which
the equation ax + by = d has integer solutions. The Euclidean Algorithm
1.5.4 shows that there exist x0 , y0 ∈ Z such that ax0 + by0 = d′. Therefore,
d ≤ d′. On the other hand, d′ | a and d′ | b, so we see d′ divides ax + by = d,
and hence d′ ≤ d. Therefore, d = d′. □
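Corollary 1.5.5 can be tested by brute force on small examples. The sketch below searches a finite box of coefficients, which is our own simplifying assumption (the corollary itself allows all integers x and y); the name `least_positive_combination` and the parameter `bound` are hypothetical.

```python
from math import gcd


def least_positive_combination(a, b, bound):
    """Smallest positive value of a*x + b*y with |x|, |y| <= bound (bound >= 1)."""
    values = [a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1)]
    return min(v for v in values if v > 0)


print(least_positive_combination(901, 636, 10), gcd(901, 636))
print(least_positive_combination(12, 18, 5), gcd(12, 18))
```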

Exercises

1. Prove that the remainder on division by 9 is obtained by the “casting
out nines” algorithm. The method is to add the decimal digits of the
given number. If the total is more than 9, repeat the procedure until
the sum is a single digit. Replace 9 by 0. This result is the remainder
after dividing by 9. Explain.
2. Find the gcd of each of the following pairs of numbers, and express it as
an integer combination of these numbers.
(a) 31463 and 9782.
(b) 65778 and 52507.
(c) 5564737 and 5574221.
(d) 2452548 and 2943234.
3. Define lcm(a, b) = ab/ gcd(a, b).
(a) Show that lcm(a, b) is a multiple of a and a multiple of b.
(b) Show that if a|n and b|n, then lcm(a, b)|n.
4. Prove the following formulae for integers a, b, d and k.
(a) gcd(a, b + ka) = gcd(a, b).
(b) gcd(ka, kb) = |k| gcd(a, b).
(c) gcd(a/d, b/d) = 1 when d = gcd(a, b).
5. Write a computer program to implement the Euclidean algorithm. The
input is a, b ∈ N. The output should be gcd(a, b) together with s, t ∈ Z
so that gcd(a, b) = as + bt.
6. The last step of the Euclidean algorithm yields $0 = a s_{k+1} + b t_{k+1}$. Show
that $s_{k+1} = \pm b/d$ and $t_{k+1} = \mp a/d$, where d = gcd(a, b).
Hint: use induction to show that $s_i t_{i-1} - s_{i-1} t_i = \pm 1$.
7. Find all strictly increasing functions f : N → N such that f (2) = 2, and
whenever gcd(m, n) = 1, then f (mn) = f (m)f (n).

1.6. Factoring Integers


In this section, we will prove the Fundamental Theorem of Arithmetic.
This simply states that every number factors into primes in exactly one way.
This is very important, and without the aid of the Euclidean algorithm, it
would be very difficult to prove. In fact, we will see in section 3.3 that there
are number systems which do not have this unique factorization property
while others much like them do. So unique factorization is a special property
which relies on important structural properties of the integers which are not
immediately obvious.
The key to the proof is the following lemma, which follows quickly from
the tools we have now. Try to prove it without the Euclidean algorithm.

1.6.1. Lemma. If gcd(a, b) = 1 and a|bc, then a|c.

Proof. From the Euclidean algorithm, we obtain integers s and t such that
as + bt = 1. Since a|bc, there is an integer d so that ad = bc. Thus,
c = (as + bt)c = a(sc + dt).
Therefore c is a multiple of a. 

1.6.2. Corollary. Suppose that a prime p divides the product a1 a2 . . . ak .
Then there is an index j so that p|aj .

Proof. We proceed by induction on k. This is evident for k = 1. For k = 2,
this will follow from the lemma. For gcd(p, a1 ) divides p, and thus is 1 or p.
In the first case, the lemma yields p|a2 ; while the latter yields p|a1 .
Now suppose that we have verified the result for k − 1. By hypothesis,
p|(a1 . . . ak−1 )ak .
Applying the result for k = 2, we obtain p|ak or p|a1 . . . ak−1 . If it is this
second case, the induction hypothesis provides the desired conclusion. 
Note that this is not the most basic type of induction. As well as needing
the result for k − 1, we also need the k = 2 result. The following corollary
is an immediate consequence of the one above, so no proof is needed. Make
sure that you understand why this is the case.

1.6.3. Corollary. If a prime p divides $a^k$, then p|a.

The numbers ±1 are units of Z, meaning that they are invertible ele-
ments; namely 1 · 1 = 1 = (−1)(−1). Any factorization into primes can be
modified by multiplying each prime by a unit, provided that the product of
all of the units used is 1. By convention, we consider a unit to be a product
of no primes.

1.6.4 Fundamental Theorem of Arithmetic. Every non-zero integer
factors uniquely as a product of primes. More precisely, suppose n ≥ 2
is an integer, and two factorizations into products of positive primes
n = p1 · · · pr = q1 · · · qs
are given. If the factors are arranged so that p1 ≤ p2 ≤ . . . ≤ pr and
q1 ≤ q2 ≤ . . . ≤ qs , then r = s and pi = qi for 1 ≤ i ≤ r.

Proof. Let us prove this by induction on n. Let P (n) be the statement
that n factors uniquely into a product of positive primes in increasing order.
First suppose n = 2. We know that 2 is prime, and thus has a unique
factorization 2 = 2 as a product of primes; hence P (2) holds.
Next suppose that the result holds for all 2 ≤ m < n. Furthermore,
there is no harm in assuming that we listed our two factorizations of n so
that p1 ≤ q1 . Since
p1 |n = q1 . . . qs ,
Corollary 1.6.2 above implies that p1 divides some qj . Since qj is prime, this
means p1 = qj . However p1 ≤ q1 ≤ qj = p1 , so we see that p1 = q1 . Let
m = n/p1 . Then
p2 . . . pr = m = q2 . . . qs .
If m = 1, then p2 . . . pr = 1 which implies that p2 . . . pr is the empty product,
i.e. r = 1; and similarly s = 1. Therefore, p1 = n = q1 and the result is
proven. If m > 1, then since m < n, by induction, r−1 = s−1 and pi = qi
for 2 ≤ i ≤ r. Hence the result is also established for n. 

Exercises

1. Factor into primes the number
n = (5564737)(5541307) = (5574221)(5531879).
You may assume that n has no factors less than 50.
2. Find $\gcd(100!, 3^{100})$. Why was this question not in the previous section?
3. How many terminal zeros are there in the decimal number 250!?
4. (a) Count the number of positive integer divisors of a number n with
prime factorization $n = p^2 q^6$ where p and q are distinct primes.
(b) Find a general formula for the number of divisors of $p^a q^b$.
5. Show that $\gcd(a^3, b^3) = \gcd(a, b)^3$.
6. A number is called perfect if it is equal to the sum of all of its proper
positive integer divisors. For example, 6 = 1 + 2 + 3. Show that if p and
$2^p - 1$ are both prime, then $2^{p-1}(2^p - 1)$ is a perfect number.
7. If you have a symbolic manipulation program, factor n given that it is
the product of
4609068862978065342371213044512378636389457901495069208081
and
4609068862978065342371213053881215673426353463259338798251
and is also the product of
4609068862978065342371213050758269994414054942671248930813
and
4609068862978065342371213047635324315401756422083159071287.
Hint: factoring such large numbers is slow, but gcd’s are fast.
8. Let the set of all primes be listed in order as p1 , p2 , p3 , . . .. Suppose that
$n = p_1^{a_1} \cdots p_k^{a_k}$ and $m = p_1^{b_1} \cdots p_k^{b_k}$, where the exponents $a_i$ and $b_i$ may
be 0. Find the formula for gcd(n, m).
9. Suppose that p and q are consecutive odd primes. Prove that p + q has
at least three prime factors (not necessarily distinct).
10. Suppose that a, b, c, d ∈ N and gcd(a, b) = 1. Show that if $ab = c^d$, then
a and b are both dth powers.
11. Suppose that a, b, c ∈ N and gcd(a, b) = 1. Show that if c|ab, then there
is a unique factorization c = c1 c2 in N such that c1 |a and c2 |b.

1.7. Irrational Numbers


An irrational number is a real number which cannot be expressed as a
quotient of two integers. This may seem to be unrelated to the subject just
covered. But in fact, many of the proofs of irrationality depend on unique
factorization.
Let us look at the argument that $\sqrt{3}$ is irrational. It is proved by assuming
that $\sqrt{3} = \frac{a}{b}$, where a and b are integers, and obtaining a contradiction.
We may suppose that gcd(a, b) = 1. Squaring and cross multiplying yields

$$3b^2 = a^2.$$

So 3 divides $a^2$, and hence by Corollary 1.6.3, 3 divides a. If we write a = 3c
and substitute into our equation, we obtain

$$3b^2 = 9c^2 \quad\text{and hence}\quad b^2 = 3c^2.$$

Repeating the argument, we see that 3 divides b. But then 3 divides
gcd(a, b). This is absurd, and therefore $\sqrt{3}$ must be irrational.
There is some controversy about who first proved the irrationality of
certain numbers. It was the school of Pythagoras who first showed that $\sqrt{2}$
was irrational. A number a is called square free if there is no integer b > 1
such that $b^2 | a$. Plato credits his teacher Theodorus with the irrationality of
the square roots of the square free numbers from 3 to 17. Scholars speculate
that the reason for stopping at 17 is because the Fundamental Theorem of
Arithmetic was not known. See [15, pp. 50–51.] We will see in Proposition
1.7.1 that these proofs hold in much more generality. Indeed, later in the
section on polynomials, even stronger irrationality results can be obtained.
Here is a generalization of this fact.

1.7.1. Proposition. Suppose that n and k are positive integers such that
$\sqrt[k]{n}$ is rational. Then $\sqrt[k]{n}$ is an integer.

Proof. Again let us write $\sqrt[k]{n} = \frac{a}{b}$ with gcd(a, b) = 1. Taking the k-th
power and cross multiplying, we obtain

$$n b^k = a^k.$$

If b ≠ 1, let p be any prime factor of b. Then p divides $a^k$, and hence divides
a. Therefore, p divides gcd(a, b). Since gcd(a, b) = 1, b cannot have any
prime factors; that is, b = 1. Hence $\sqrt[k]{n} = a$ is an integer. □
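Proposition 1.7.1 gives a purely integer test for whether a k-th root is rational: it is rational exactly when n is a perfect k-th power. The sketch below uses exact integer arithmetic; the function names `integer_root` and `root_is_rational` are our own.

```python
def integer_root(n, k):
    """Return the largest integer a >= 0 with a**k <= n, for n >= 0 and k >= 1."""
    lo, hi = 0, n + 1  # invariant: lo**k <= n < hi**k
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid
    return lo


def root_is_rational(n, k):
    """By Proposition 1.7.1, the k-th root of n is rational iff it is an integer."""
    a = integer_root(n, k)
    return a ** k == n


print(root_is_rational(27, 3), root_is_rational(3, 2), root_is_rational(1024, 10))
```

Floating point roots are deliberately avoided here, since rounding errors could misclassify large perfect powers.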

Ad hoc methods can be used to prove that various algebraic expressions
are irrational. (See the exercises.) Later in this book, there will be more
sophisticated ways of proving irrationality. For other important numbers
such as π and e, one needs an analytic expression that defines these numbers
in order to prove irrationality. It is much more difficult to show that these
numbers do not satisfy any algebraic equation at all. Such numbers are
called transcendental. It is possible to give an elementary proof of the
irrationality of e. In chapter 6 we will give a much more devious proof that
e is indeed transcendental.

1.7.2. Proposition. e is irrational.

Proof. We need an expression for e. A useful expression from calculus is

$$e = \sum_{n=0}^{\infty} \frac{1}{n!}.$$

Suppose that e = a/k where a and k are positive integers. Compute

$$a(k-1)! = k!\,e = \sum_{n=0}^{k} \frac{k!}{n!} + \sum_{n \ge k+1} \frac{k!}{n!}.$$

The first sum on the RHS is an integer. Hence there is an integer

$$a(k-1)! - \sum_{n=0}^{k} \frac{k!}{n!} = \frac{1}{k+1} + \frac{1}{(k+1)(k+2)} + \frac{1}{(k+1)(k+2)(k+3)} + \cdots.$$

Estimate the size of this ‘integer’, say b, by summing a geometric series:

$$0 < b < \sum_{m=1}^{\infty} (k+1)^{-m} = \frac{(k+1)^{-1}}{1 - (k+1)^{-1}} = \frac{1}{k} \le 1.$$

There are no integers in this range, and so we have a contradiction. Hence
e must be irrational. □
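The key estimate in this proof, that the tail lies strictly between 0 and 1/k, can be observed with exact rational arithmetic. The sketch below only sums finitely many terms of the tail, which is our own simplification, so it illustrates the inequality rather than proving it; the name `tail` and its parameter `terms` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def tail(k, terms=40):
    """Partial sum of k!/n! over n = k+1, ..., k+terms, as an exact fraction."""
    return sum(Fraction(factorial(k), factorial(n))
               for n in range(k + 1, k + 1 + terms))


for k in range(1, 8):
    b = tail(k)
    print(k, float(b), float(Fraction(1, k)))
```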

Exercises

1. Show that $\beta := \sqrt{2} + \sqrt{3}$ is irrational. Hint: if $\beta = \frac{p}{q}$ with gcd(p, q) = 1,
do algebraic manipulations to eliminate the square roots, and deduce
that q|p.
2. Show that $\gamma = \sqrt{2} + \sqrt[3]{5}$ is irrational.
Hint: if $\gamma = \frac{p}{q}$ with gcd(p, q) = 1, get rid of the cube root first; then
eliminate the square root.
3. Show that $\log_{10} 7$ is irrational.
4. Let $a_n \in \{1, 2, \ldots, 9\}$ for n ≥ 1. Show that $\sum_{n \ge 1} \frac{a_n}{10^{n!}}$ is irrational.

5. Let α be a root of a polynomial $p(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0$,
where $c_i \in \mathbb{Z}$ and $c_0 \ne 0$. Show that α is either an integer or is irrational.
Hint: if $\alpha = \frac{a}{b}$ with gcd(a, b) = 1, compute $b^n p(\alpha)$ in two ways, and
deduce that $b | a^n$.
6. Find a monic polynomial with integer coefficients with $\sqrt{2} + \sqrt{3}$ as a
root.
7. Find a monic polynomial with integer coefficients with $\sqrt{2} + \sqrt[3]{5}$ as a
root.
8. Show that if k is not a power of any other integer, then $\log_k a$ is either
an integer or irrational for each positive integer a.
9. In this exercise we show there exist irrational numbers q and r such that
$q^r$ is rational. Prove that one may take $r = \sqrt{2}$ with either $q = \sqrt{2}$ or
$q = \sqrt{2}^{\sqrt{2}}$.

1.8. Unique Factorization in More General Rings


This section has a much greater level of abstraction than the rest of this
chapter. It could be put off until a later point. However since the proof
is fresh in our minds, it makes sense to do it here. Otherwise we will find
ourselves providing the same proof repeatedly in various contexts.
Having now proved the Fundamental Theorem of Arithmetic 1.6.4, it
is worthwhile to figure out the level of generality in which our proof is
valid. You will notice that the Fundamental Theorem of Arithmetic relied
on Euclid’s algorithm 1.5.4, which in turn relied on the Division algorithm
1.5.1. We will see that any ring where an appropriate analogue of the division
algorithm holds will satisfy a type of Euclidean algorithm. This will then
be used to prove a version of the Fundamental Theorem of Arithmetic for
any such ring.

To begin, a basic property that Z enjoys is that two non-zero integers
cannot multiply to be zero. We are interested in rings in general that satisfy
this constraint.

1.8.1. Definition. An element a of a ring R is a zero divisor if a ≠ 0
and there exists a non-zero element b ∈ R with ab = 0. A commutative ring
R with no zero divisors is called an integral domain.

As we show in the next result, integral domains satisfy a familiar cancellation
property. You will notice that this cancellation property for Z is
used throughout the last few sections. So, in order to make the proof of
the Fundamental Theorem of Arithmetic work in greater generality, it is
important that we restrict attention to integral domains.

1.8.2. Lemma. Let R be an integral domain. If a, b, c ∈ R and ab = ac,
then a = 0 or b = c.

Proof. We see a(b − c) = 0 and since R has no zero divisors, we must have
a = 0 or b − c = 0. 

Since our ultimate goal in this section is to prove an analogue of the
Fundamental Theorem of Arithmetic, we need a suitable notion of units
and prime numbers. In general rings, primes are called irreducibles.

1.8.3. Definition. A unit of a ring R is an element x which has a
multiplicative inverse y, i.e. there is an element y satisfying xy = yx = 1.
We often write $y = x^{-1}$. The set of units of R is denoted by $R^*$.

1.8.4. Remark. In Exercise 5 you will prove that the y in Definition 1.8.3
is uniquely determined. Thus, the notation $x^{-1}$ is unambiguous.

1.8.5. Example. In Z, the units are ±1. In Q, every non-zero element is
a unit. See Exercise 2 for some information on the units of $\mathbb{Z}[\sqrt{2}]$.

1.8.6. Definition. Let R be an integral domain. An element p ∈ R is
irreducible if $p \notin R^*$ and whenever p = ab for a, b ∈ R, either a or b is a
unit.

We next axiomatize what it means for a ring to have a division algorithm.
The key property of the division algorithm 1.5.1 is that when we divide b
into a, the absolute value of the remainder r is smaller than that of b. We
will be interested in rings which, unlike Z, may not have a useful ordering.
(See Exercise 9.) Thus, we cannot literally require in our division algorithm
that r < b. However, we can look for an auxiliary function f which measures
“how big” an element is and we can require that f (r) < f (b).

1.8.7. Definition. An integral domain R is a Euclidean domain if there
is a Euclidean function f : R → N0 satisfying the following properties:
(1) f (a) ≤ f (ab) for all a, b ∈ R with b ≠ 0. (order)
(2) for all a, b ∈ R with b ≠ 0, there exist q, r ∈ R with
a = bq + r
and f (r) < f (b). (division)
When we wish to emphasize the function f , we will say that (R, f ) is a
Euclidean domain.

1.8.8. Lemma. Let (R, f ) be a Euclidean domain. Then
(1) if a ∈ R \ {0}, then f (0) < f (1) ≤ f (a).
(2) if a, b ∈ R \ {0}, then f (a) = f (ab) if and only if $b \in R^*$.
(3) if b ∈ R \ {0}, then f (b) = f (1) if and only if b is a unit.

Proof. If a ≠ 0, then by the order property,
f (1) ≤ f (1 · a) = f (a).
Now take a = b = 1 and use the division property to write 1 = 1 · q + r with
f (r) < f (1). This must mean that r = 0 (otherwise f (1) ≤ f (r)), and so
f (0) < f (1). So (1) holds.
If $b \in R^*$, then $a = (ab)b^{-1}$, and so f (ab) ≤ f (a) ≤ f (ab); whence
f (a) = f (ab). Conversely, suppose f (ab) = f (a). By the division property,
there exist q, r ∈ R such that a = (ab)q +r with f (r) < f (ab) = f (a). Hence
r = a(1 − bq). If r ≠ 0, we would get f (a) ≤ f (r) < f (a), a contradiction.
So, we must have 0 = r = a(1 − bq). Since a ≠ 0, Lemma 1.8.2 implies that
1 − bq = 0. Thus $b \in R^*$. So (2) holds.
The third statement now follows by taking a = 1 in (2). □

1.8.9. Remarks. In Exercise 6, you will show that if R has a function f
satisfying the division property, then R has a Euclidean function.
In Exercise 8, you will show that if f is a Euclidean function and g :
Ran f → N0 is strictly increasing, then g ◦ f is also a Euclidean function.
Thus there are many different choices for the Euclidean function; so
this function is not unique. It means that we can always choose g so that
g(f (0)) = 0 and g(f (1)) = 1. Thus we may suppose that f (0) = 0 and
f (b) = 1 if and only if b is a unit.

1.8.10. Example. The ring of integers Z is a Euclidean domain, where we take
f (n) := |n|. Notice that this particular choice of f has a lot of structure: for
example, |ab| = |a| |b|. Also if a | b, then |a| ≤ |b| and we have equality if
and only if a = ±b.

We will see many other examples of Euclidean domains throughout the
course, such as the Gaussian integers (Section 3.5), other quadratic number
domains (see Section 3.3 and Exercise 3 of Section 3.5), and polynomial
rings over a field (Section 6.2). In this last example, the function f is the
degree of the polynomial.

1.8.11. Example. We will show that $\mathbb{Z}[\sqrt{2}]$ is a Euclidean domain for the
function $f(x) = |N(x)|$, where N is the norm function defined in Exercise 1.
That is, if $x = x_1 + x_2\sqrt{2} \in \mathbb{Q}[\sqrt{2}]$, then $N(x) = x_1^2 - 2x_2^2$. Exercise 1 shows
that f is multiplicative. Since f maps $\mathbb{Z}[\sqrt{2}]$ into $\mathbb{N}_0$, and $f(b) \neq 0$ for $b \neq 0$
by Exercise 1 (c),
$$f(ab) = f(a)f(b) \geq f(a) \quad \text{for all } a, b \in \mathbb{Z}[\sqrt{2}] \setminus \{0\}.$$
Suppose that $a = a_1 + a_2\sqrt{2}$ and $b = b_1 + b_2\sqrt{2} \neq 0$ are given. Let
$$x = \frac{a}{b} = \frac{a_1 + a_2\sqrt{2}}{b_1 + b_2\sqrt{2}} \cdot \frac{b_1 - b_2\sqrt{2}}{b_1 - b_2\sqrt{2}} = \frac{a_1 b_1 - 2a_2 b_2}{N(b)} + \frac{a_2 b_1 - a_1 b_2}{N(b)}\sqrt{2} =: x_1 + x_2\sqrt{2} \in \mathbb{Q}[\sqrt{2}].$$
That is, $x_1$ and $x_2$ are rational. Choose integers $c_1, c_2$ so that $|x_1 - c_1| \leq \frac{1}{2}$
and $|x_2 - c_2| \leq \frac{1}{2}$. Define $c = c_1 + c_2\sqrt{2} \in \mathbb{Z}[\sqrt{2}]$. Then let
$$r = r_1 + r_2\sqrt{2} = a - bc = b(x - c) = b\big((x_1 - c_1) + (x_2 - c_2)\sqrt{2}\big).$$
Note that $r \in \mathbb{Z}[\sqrt{2}]$. However the norm is defined on $\mathbb{Q}[\sqrt{2}]$ and is
multiplicative by Exercise 1. It follows that
$$N(r) = N(b)\big((x_1 - c_1)^2 - 2(x_2 - c_2)^2\big).$$
Now $(x_1 - c_1)^2 \in [0, \frac{1}{4}]$ and $(x_2 - c_2)^2 \in [0, \frac{1}{4}]$, so that
$$(x_1 - c_1)^2 - 2(x_2 - c_2)^2 \in \big[-\tfrac{1}{2}, \tfrac{1}{4}\big].$$
Therefore $f(r) = |N(r)| \leq \frac{1}{2}|N(b)| = \frac{1}{2}f(b)$. Thus $\mathbb{Z}[\sqrt{2}]$ has a division
algorithm, and f is a Euclidean function.
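
The division step of Example 1.8.11 can be sketched in Python. This is only an illustration under stated assumptions, not part of the text: an element $u + v\sqrt{2}$ is stored as the pair `(u, v)`, and the names `norm` and `divmod_sqrt2` are hypothetical. Exact rational arithmetic is used so that the rounding step is the one described above.

```python
from fractions import Fraction

def norm(u, v):
    """Norm N(u + v*sqrt(2)) = u^2 - 2*v^2."""
    return u * u - 2 * v * v

def divmod_sqrt2(a, b):
    """Divide a by b in Z[sqrt(2)].

    a and b are pairs (u, v) standing for u + v*sqrt(2) (an assumed encoding).
    Returns (c, r) with a = b*c + r and |N(r)| <= |N(b)| / 2.
    """
    a1, a2 = a
    b1, b2 = b
    nb = norm(b1, b2)
    if nb == 0:
        raise ZeroDivisionError("b must be non-zero")
    # x = a/b = a * conjugate(b) / N(b), computed exactly in Q[sqrt(2)]
    x1 = Fraction(a1 * b1 - 2 * a2 * b2, nb)
    x2 = Fraction(a2 * b1 - a1 * b2, nb)
    # nearest integers, so |x1 - c1| <= 1/2 and |x2 - c2| <= 1/2
    c1, c2 = round(x1), round(x2)
    # r = a - b*c, where b*c = (b1*c1 + 2*b2*c2) + (b1*c2 + b2*c1)*sqrt(2)
    r1 = a1 - (b1 * c1 + 2 * b2 * c2)
    r2 = a2 - (b1 * c2 + b2 * c1)
    return (c1, c2), (r1, r2)

# Dividing 7 + 5*sqrt(2) by 3 + sqrt(2)
quotient, remainder = divmod_sqrt2((7, 5), (3, 1))
print(quotient, remainder)
```

Here the remainder has norm of absolute value at most half that of the divisor, which is the inequality $f(r) \leq \frac{1}{2} f(b)$ from the example.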

Our next result shows that Euclidean domains satisfy a type of Eu-
clidean algorithm. Since R is not necessarily ordered, we cannot speak of
the greatest common divisor of a and b. However, the properties of rk listed
in the theorem below capture the fact that rk behaves like the gcd of a and
b. Indeed, the first property says rk is a common divisor of a and b; and
the third property says that if e is any other common divisor, rk must be
“greater than” e in the sense that e divides rk . Notice that in the case when
R = Z, this reduces to saying that rk = ± gcd(a, b).

1.8.12 Euclidean Algorithm for Euclidean Domains. Let (R, f )
be a Euclidean domain and a, b ∈ R with b ≠ 0. Then using the division
algorithm repeatedly yields a sequence
$$\begin{aligned}
a &= b q_1 + r_1 \\
b &= r_1 q_2 + r_2 \\
r_1 &= r_2 q_3 + r_3 \\
&\;\;\vdots \\
r_{k-1} &= r_k q_{k+1} + r_{k+1}
\end{aligned}$$
with $r_1, r_2, \dots, r_k$ non-zero, $r_{k+1} = 0$, and $f(b) > f(r_1) > \dots > f(r_k)$.
Furthermore, $r_k$ satisfies the following properties:
(1) $r_k \mid a$ and $r_k \mid b$,
(2) there exist s, t ∈ R such that $as + bt = r_k$,
(3) for any e ∈ R, if e | a and e | b, then $e \mid r_k$.

Proof. For notational convenience, we let $r_{-1} = a$ and $r_0 = b$, so that the
equations above read $r_{i-1} = r_i q_{i+1} + r_{i+1}$ for $i \geq 0$. Let us
first show that the process terminates; i.e. there exists k with $r_{k+1} = 0$.
Otherwise the process would define $r_i \neq 0$ for all i ≥ 1. Consider the set
$$\{f(r_i) : r_i \neq 0,\ i \geq 1\}$$
with the $r_i$ defined as in the statement of the theorem. Since all $f(r_i)$ are
non-negative integers, by the well-ordering principle, there must be a least
element $f(r_k)$. If $r_{k+1} \neq 0$, then we would have $f(r_{k+1}) < f(r_k)$, contradicting
the fact that $f(r_k)$ is minimal. Thus, $r_{k+1} = 0$.
We now prove (1). Since $r_{k+1} = 0$, we have $r_{k-1} = r_k q_{k+1}$ and so $r_k \mid r_{k-1}$.
Now, inductively assume $r_k \mid r_{i+1}$ and $r_k \mid r_{i+2}$. Since $r_i = r_{i+1} q_{i+2} + r_{i+2}$,
we see $r_k \mid r_i$ as well. This proves that $r_k$ divides all $r_i$; in particular it
divides $r_{-1} = a$ and $r_0 = b$.
For (2), we prove by induction that there exist $s_i, t_i \in R$ with $a s_i + b t_i = r_i$.
For the base case of the induction, we have $a = r_{-1} \cdot 1 + r_0 \cdot 0$ and
$b = r_{-1} \cdot 0 + r_0 \cdot 1$. We may therefore take $s_{-1} = 1$, $t_{-1} = 0$, $s_0 = 0$ and
$t_0 = 1$. Now assume that there exist $s_{i-1}, t_{i-1}, s_i, t_i \in R$ with
$$a s_{i-1} + b t_{i-1} = r_{i-1} \quad \text{and} \quad a s_i + b t_i = r_i.$$
We will show the existence of $s_{i+1}, t_{i+1} \in R$ with $a s_{i+1} + b t_{i+1} = r_{i+1}$. By definition,
we have
$$\begin{aligned}
r_{i+1} &= r_{i-1} - r_i q_{i+1} \\
&= (a s_{i-1} + b t_{i-1}) - (a s_i + b t_i) q_{i+1} \\
&= a(s_{i-1} - q_{i+1} s_i) + b(t_{i-1} - q_{i+1} t_i),
\end{aligned}$$
so we may take $s_{i+1} = s_{i-1} - q_{i+1} s_i$ and $t_{i+1} = t_{i-1} - q_{i+1} t_i$. We have therefore
shown that every $r_j$ is of the form $a s_j + b t_j$ for some $s_j, t_j \in R$. In particular,
the statement is true when j = k.

Now (3) follows from (2) since if as + bt = rk , then any common divisor
of a and b must also divide rk . 
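
For R = Z with f(n) = |n|, the recurrences from the proof of 1.8.12 can be traced in a short Python sketch. It is an illustration only; the function name `extended_euclid` and the variable names are assumptions made here, and Python's built-in `divmod` plays the role of the division algorithm.

```python
def extended_euclid(a, b):
    """Run the Euclidean algorithm on integers a and b (b non-zero).

    Returns (r, s, t), where r is the last non-zero remainder
    and a*s + b*t == r, following the recurrences in the proof.
    """
    if b == 0:
        raise ValueError("b must be non-zero")
    r_prev, r_cur = a, b      # r_{-1} = a, r_0 = b
    s_prev, s_cur = 1, 0      # s_{-1} = 1, s_0 = 0
    t_prev, t_cur = 0, 1      # t_{-1} = 0, t_0 = 1
    while True:
        q, r_next = divmod(r_prev, r_cur)   # r_prev = r_cur*q + r_next
        if r_next == 0:
            return r_cur, s_cur, t_cur
        r_prev, r_cur = r_cur, r_next
        s_prev, s_cur = s_cur, s_prev - q * s_cur
        t_prev, t_cur = t_cur, t_prev - q * t_cur

r, s, t = extended_euclid(17, 12)
print(r, s, t)   # 1 5 -7, since 17*5 + 12*(-7) = 1
```

The returned r may be negative for negative inputs, which matches the statement that in Z the last non-zero remainder is ± gcd(a, b).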

1.8.13. Definition. Let a, b ∈ R with R an integral domain. We say a
and b are relatively prime if for every e ∈ R, e | a and e | b implies e ∈ R∗ .

1.8.14. Example. When R = Z, Definition 1.8.13 agrees with the usual
notion of relative primality since Z∗ = {±1}. In Q, any two non-zero
elements are relatively prime.

1.8.15. Corollary. Let (R, f ) be a Euclidean domain. Then a, b ∈ R are
relatively prime if and only if there exist s, t ∈ R such that as + bt = 1.

Proof. Suppose as + bt = 1. If d | a and d | b, then d | 1 so d ∈ R∗ . This
shows a and b are relatively prime.
Conversely, applying Euclid’s algorithm 1.8.12, we see there exist $s, t, r_k$
in R with $as + bt = r_k$, $r_k \mid a$ and $r_k \mid b$. Since a and b are relatively prime,
$r_k \in R^*$. Hence, $a(s r_k^{-1}) + b(t r_k^{-1}) = 1$.

We next show that every non-zero non-unit can be factored into a prod-
uct of finitely many irreducibles. This gives an analogue of Theorem 1.3.3.

1.8.16. Proposition. Let (R, f ) be a Euclidean domain. Then every
non-zero non-unit a ∈ R is a product of finitely many irreducible elements.

Proof. We do induction on f (a). By Lemma 1.8.8 (1), f (a) ≥ f (1) for all
a ≠ 0. Let us begin with the base case of the induction, namely f (a) = f (1).
By Lemma 1.8.8 (3), we have a ∈ R∗ and so there is nothing to show.
Next, fix a number n > 1 and assume that the statement is true for all
b ∈ R with 1 ≤ f (b) < n. Then we will prove the statement for all a with
f (a) = n. If a is irreducible then we are done. So, we may assume a is not
irreducible, in which case, by definition, we have a = bc with b, c ∉ R∗ . Then
Lemma 1.8.8 (2) shows f (b) < f (a) since c ∉ R∗ . Similarly, f (c) < f (a). By
our inductive hypothesis, we know both b and c are products of finitely many
irreducible elements. Since a = bc, we can multiply these two factorizations
together to obtain a as a product of finitely many irreducible elements.

We next prove an analogue of Corollary 1.6.2, which was the key input
to showing the Fundamental Theorem of Arithmetic.

1.8.17. Proposition. Let R be a Euclidean domain and suppose p ∈ R
is irreducible. If p divides the product $a_1 a_2 \cdots a_k$, then there is an index j
so that $p \mid a_j$.

Proof. We proceed by induction on k. For k = 1, there is nothing to prove.
Now let k = 2. First suppose that p and $a_1$ are relatively prime. By
Corollary 1.8.15, there exist s, t ∈ R such that $1 = ps + a_1 t$. Therefore
$a_2 = p a_2 s + a_1 a_2 t$. Since $p \mid a_1 a_2$, we get $p \mid a_2$. On the other hand, suppose
p and $a_1$ are not relatively prime. Thus, there exists d ∉ R∗ such that d | p
and $d \mid a_1$. By the definition of an irreducible element, we see d = up where
u ∈ R∗ . Then $up = d \mid a_1$, so $p \mid a_1$. This completes the k = 2 case.
We now consider k > 2. We have p | bak , where b = a1 a2 . . . ak−1 . If
p | ak , we are done. Otherwise we may assume p does not divide ak . Then
by the k = 2 case, we see p | a1 a2 . . . ak−1 . By induction, there exists j such
that p | aj . 

We now come to the main result of this section: in every Euclidean do-
main, we can uniquely factor elements as a product of irreducible elements.

1.8.18 Unique Factorization for Euclidean Domains. Let (R, f )
be a Euclidean domain. Then every non-zero non-unit a ∈ R can be written
as a product of finitely many irreducible elements. Moreover, if
a = p1 . . . pr = q1 . . . qs
with all pi , qj irreducible, then r = s and after reordering the q’s, we have
qi = ui pi for some ui ∈ R∗ .

Proof. By Proposition 1.8.16, we know that every non-zero non-unit a can
be factored into a product of finitely many irreducible elements. To prove
the unique factorization statement, we proceed by induction on f (a). That
is, we let P (n) be the statement that the conclusion of the theorem is valid
for every a ∈ R with f (a) = n.
By Lemma 1.8.8 (1), we know f (a) ≥ f (1) for all non-zero a. Let us
begin with the base case of the induction, namely f (a) = f (1). By Lemma
1.8.8 (3), we have a ∈ R∗ . Then if p1 . . . pr = a, we see pi | 1, so pi ∈ R∗
which contradicts the definition of an irreducible element. Therefore, a has
no factorization into irreducible elements, and hence the statement P (f (1))
is vacuously true.
Next, assume that f (a) = n > f (1) and P (k) is true for 1 ≤ k < n. We
will prove the statement for a. Assume that a = p1 . . . pr = q1 . . . qs are two
factorizations into irreducibles. Then
p1 | a = p1 . . . pr = q1 . . . qs .
By Proposition 1.8.17, p1 | qj for some j. After reordering the q’s, we may
suppose that $p_1 \mid q_1$. Since $p_1 \notin R^*$, using the definition of an irreducible
element applied to $q_1$, we see $q_1 = u_1 p_1$ for some $u_1 \in R^*$. Therefore,
$$p_2 p_3 \cdots p_r = \frac{a}{p_1} = q_2' q_3 \cdots q_s,$$
where $q_2' = u_1 q_2$. Directly from the definition, we have that $q_2'$ is also
irreducible. Since $\frac{a}{p_1} \mid a$ and $p_1 \notin R^*$, we have from Lemma 1.8.8 (2) that
$f(\frac{a}{p_1}) < f(a)$. By the induction hypothesis, we conclude that r − 1 = s − 1
(i.e. r = s) and after reordering the $p_i$, we have $q_2' = v p_2$ and $q_i = u_i p_i$ for
some $v, u_i \in R^*$ for 3 ≤ i ≤ r. Thus $q_2 = (u_1^{-1} v) p_2$; so we set $u_2 = u_1^{-1} v \in R^*$.
This establishes P (n). By induction, the proof is complete.

Exercises
1. Let $\mathbb{Q}[\sqrt{2}] = \{r + s\sqrt{2} : r, s \in \mathbb{Q}\}$. For $x = r + s\sqrt{2}$ in $\mathbb{Q}[\sqrt{2}]$, define the
norm of x to be $N(x) = r^2 - 2s^2 \in \mathbb{Q}$.
(a) Show that $r + s\sqrt{2} = t + u\sqrt{2}$ for r, s, t, u ∈ Q implies r = t and
s = u.
(b) Show that if $x, y \in \mathbb{Q}[\sqrt{2}]$ and y ≠ 0, then $x/y \in \mathbb{Q}[\sqrt{2}]$.
(c) Show that N (x) = 0 implies that x = 0 for x in $\mathbb{Q}[\sqrt{2}]$.
(d) Show that N (xy) = N (x)N (y) for all x, y in $\mathbb{Q}[\sqrt{2}]$.

2. (a) Show that $1 + \sqrt{2}$ and $17 + 12\sqrt{2}$ are units in $\mathbb{Z}[\sqrt{2}]$.
(b) Prove that $x \in \mathbb{Z}[\sqrt{2}]$ is a unit if and only if N (x) = ±1.
Hint: N (x) is an integer.
(c) Prove that there are infinitely many units in $\mathbb{Z}[\sqrt{2}]$.
Hint: find a way to make other units from $1 + \sqrt{2}$.

3. (a) Show that 2 and 7 are not irreducible in $\mathbb{Z}[\sqrt{2}]$.
(b) Show that $x = 5 - 2\sqrt{2}$ is irreducible in $\mathbb{Z}[\sqrt{2}]$.
Hint: compute N (x).
(c) Show that 3 is irreducible in $\mathbb{Z}[\sqrt{2}]$.
Hint: What are the possible remainders after dividing a square by
8? Show that N (x) = ±3 is impossible.

4. Find a Euclidean function for $\mathbb{Z}[\sqrt{3}]$.
Hint: modify Example 1.8.11.
5. Let R be a ring. Suppose that x ∈ R and there are elements y1 , y2 ∈ R
such that y1 x = 1 = xy2 . Prove that y1 = y2 ; and so x is a unit. In
particular, if x is a unit, then it has a unique inverse.
6. Let f be a function on a ring R satisfying the division property of a
Euclidean domain. Define g(a) = min{f (ab) : b ≠ 0}. Prove that g is a
Euclidean function for R.
7. (a) Prove that (Q, f ) is a Euclidean domain, where f (0) = 0 and f (a) =
1 for 0 ≠ a ∈ Q.
(b) More generally, let F be a commutative ring such that F ∗ = F \ {0};
such a ring F is called a field. Set f (0) = 0 and f (a) = 1 for all
a ≠ 0. Prove that f is a Euclidean function for F .

8. Let (R, f ) be a Euclidean domain. Let g : Ran f → N0 be any strictly
increasing function. Prove that g ◦ f is also a Euclidean function for R.

9. (a) Show that $\mathbb{Z}[\sqrt{2}]$ is ‘dense’ in R, meaning that if x < y in R, then
there are integers a, b so that $x < a + b\sqrt{2} < y$.
Hint: for each n ∈ N, there is some $a_n \in \mathbb{Z}$ so that $a_n + n\sqrt{2} \in (0, 1)$.
Choose k so that $\frac{1}{k} < y - x$. Use the pigeonhole principle to find
two numbers m < n so that $0 < |(a_n + n\sqrt{2}) - (a_m + m\sqrt{2})| < \frac{1}{k}$.
(b) Explain why the order on $\mathbb{Z}[\sqrt{2}]$ induced from R cannot be used to
define a Euclidean function.

Notes on Chapter 1
Presumably numbers arose from counting. Once civilizations developed
some mode of writing, they also developed ways to record numbers. The
ancient Egyptians had a system for writing numbers up to a million. The
ancient Chinese had a base 10 system of numbers. The Babylonians developed
a base 60 system.
The notion of zero came later, first as a placeholder for writing numbers
in base 10. For example, the Chinese just left a blank space for a zero in
a base 10 number. The Babylonians first left it to context, but eventually
adopted a symbol to indicate a blank space around 400 BCE. The Greeks
however did not adopt the concept. The symbol zero apparently comes from
India, possibly as early as 200 CE. It was brought back to Europe by the
Arabs, who adopted it. Around 700 CE, Brahmagupta gave arithmetic rules
for working with 0 as a number in its own right. This spread to China, with
records from 1247 CE. Around this time, Fibonacci was proposing the use of
0. It wasn’t until the 1600s that 0 came into more common usage in Europe.
Negative numbers were not generally accepted in ancient times. There
is a record of the use of negative numbers for solving equations in China
around 100 BCE–50 CE. In Greece, in the third century, Diophantus made
use of negative numbers as ‘a number to be subtracted’ for use in solving
equations. However he apparently did not accept them as numbers on their
own. In the 7th century, Brahmagupta used negative numbers to reduce
the solution of a quadratic equation to a single case. (Diophantus had
three cases.) Records from China show negative numbers in use by the
13th century. In 1545, Cardano used negative numbers in his formulae
for roots of cubics and quartics. In the 17th century, Descartes partially
accepted negative numbers, although he considered them as false solutions
to equations. In the 18th century, Euler discussed operations with positive
and negative numbers. Yet still in the 19th century, Hamilton attempted
‘to put negative numbers on a firm theoretical footing’. By this time, it was
becoming more accepted—a surprisingly long time!
Euclid wrote a 13 volume treatise on mathematics around 300 BCE. It
contains the Euclidean algorithm and the proof of an infinitude of primes. It
also contains Lemma 1.6.1 and Corollary 1.6.2. As we saw, it is a small step
from these results to the Fundamental Theorem of Arithmetic, but it does
not appear in Euclid. The first precise statement of the FTA is by Gauss in
1801.
The Euclidean algorithm for the Gaussian integers was known to Gauss
(see Section 3.5). Generally people only considered Euclidean algorithms for
the norm function until 1950. The abstract notion of a Euclidean domain
was implicit in work by Hasse in 1928.
Hardy and Wright [15] is a classic book on number theory that is still
relevant today. It differs from many number theory books in that it often
discusses different proofs, and it contains many historical notes. The 6th
edition has updated notes that reference many more recent results. Riben-
boim’s The little book of bigger primes [31] is, as the title suggests, all
about primes. There are many proofs of the infinitude of primes in Chapter
1. Stark [37] is a more modern number theory book whose introduction, in
particular, is well worth reading by readers of our book. Silverman [34] is
another nice introduction to number theory.
Alaca and Williams [2] is an algebraic number theory book which treats
Euclidean domains in general. In particular, they give many results about
the quadratic number domains $\mathbb{Z}[\sqrt{d}]$ for d ≡ 2, 3 (mod 4) and $\mathbb{Z}[\frac{1+\sqrt{d}}{2}]$
when d ≡ 1 (mod 4). We explain at the end of Section 3.3 why we use
$\mathbb{Z}[\frac{1+\sqrt{d}}{2}]$ rather than $\mathbb{Z}[\sqrt{d}]$ when d ≡ 1 (mod 4). Stark [37, Section 8.4]
also has interesting material about when quadratic number domains are
Euclidean or UFD (unique factorization domains), which is a strictly larger
class. Stark himself made important contributions to this problem.
Chapter 2

Modular Arithmetic
In this chapter, we discuss computations ‘modulo n’, meaning that we only
keep track of the remainder on division by n. We discuss solving systems of
equations in several interesting contexts.

2.1. Linear Equations


In this section, we look for integer solutions of the simplest type of equations.
An equation in which one searches for integer solutions is called a Diophan-
tine equation, after the Greek mathematician Diophantus. Consider the
equation
ax + by = c
where a, b and c are given integers. For example, 5x+7y = 1 has the solution
x = 3 and y = −2. But 6x + 10y = 15 has no solutions because the left side
is even, and 15 is odd. In general, ax+by is always divisible by d = gcd(a, b).
Thus a necessary condition for a solution is
gcd(a, b)|c.
This is also sufficient. It follows from the Euclidean algorithm that
there are integers s and t so that as + bt = d. So if c = dz, a solution of our
equation is given by x = sz and y = tz. Therefore we have proved most of
the following theorem.

2.1.1. Theorem. The Diophantine equation ax + by = c has a solution
if and only if d = gcd(a, b) divides c. Moreover, if $\{x_0, y_0\}$ is one solution,
then all solutions are given by
$$x = x_0 + \frac{b}{d}k, \qquad y = y_0 - \frac{a}{d}k \qquad \text{for } k \in \mathbb{Z}.$$

Proof. The first part has been done. So suppose that $\{x_0, y_0\}$ and {x, y}
are solutions of ax + by = c. Then $X = x - x_0$ and $Y = y - y_0$ satisfy
$$aX + bY = (ax + by) - (ax_0 + by_0) = 0.$$
Hence aX = −bY . Dividing by d = gcd(a, b) yields $\frac{a}{d}X = -\frac{b}{d}Y$. But
$\gcd(\frac{a}{d}, \frac{b}{d}) = 1$. Thus by Lemma 1.6.1, $\frac{a}{d} \mid Y$ and $\frac{b}{d} \mid X$. Set $k = \frac{Xd}{b}$. So
$$x = x_0 + X = x_0 + k\frac{b}{d}.$$
It follows that $Y = -\frac{a}{b}X = -k\frac{a}{d}$, and thus
$$y = y_0 + Y = y_0 - k\frac{a}{d}.$$
Conversely, it is clear that every pair {x, y} of this form is a solution.
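
Theorem 2.1.1 translates into a short procedure. The following Python sketch is one possible way to carry it out, not a definitive implementation; the names `ext_gcd` and `solve_linear_diophantine` are hypothetical, and the sketch assumes a and b are not both zero.

```python
def ext_gcd(a, b):
    """Return (d, s, t) with a*s + b*t == d and d == gcd(a, b) >= 0."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:  # normalise the sign so that d is non-negative
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

def solve_linear_diophantine(a, b, c):
    """Solve a*x + b*y == c over the integers.

    Returns None if there is no solution; otherwise ((x0, y0), (dx, dy)),
    meaning that all solutions are (x0 + k*dx, y0 + k*dy) for integers k.
    """
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero in this sketch")
    d, s, t = ext_gcd(a, b)
    if c % d != 0:
        return None               # gcd(a, b) does not divide c
    z = c // d
    return (s * z, t * z), (b // d, -(a // d))

print(solve_linear_diophantine(5, 7, 1))     # ((3, -2), (7, -5))
print(solve_linear_diophantine(6, 10, 15))   # None
```

The first call recovers the solution x = 3, y = −2 of 5x + 7y = 1 mentioned at the start of the section, and the second reflects that gcd(6, 10) = 2 does not divide 15.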
Now we can handle more variables with a simple induction argument.
If $a_1, \dots, a_n$ are integers which are not all 0, then we denote the greatest
common divisor of a set $\{a_1, \dots, a_n\}$ by $\gcd(a_1, \dots, a_n)$. We define
gcd(0, . . . , 0) = 0. Like the Euclidean algorithm (1.5.4), Corollary 2.1.2
gives a constructive method for finding solutions to the Diophantine
equation $\sum_{i=1}^{n} a_i x_i = c$.

2.1.2. Corollary. Let $a_1, \dots, a_n \in \mathbb{Z}$. The Diophantine equation
$$\sum_{i=1}^{n} a_i x_i = c$$
has a solution if and only if $\gcd(a_1, \dots, a_n) \mid c$.



Proof. If $a_1 = \dots = a_n = 0$, then there is a solution to $\sum_{i=1}^{n} a_i x_i = c$ if and
only if c = 0. We see c = 0 if and only if 0 | c, and since gcd(0, . . . , 0) = 0,
the corollary holds in this case.
Hence for the remainder of the proof, we may assume some $a_i \neq 0$, in
which case $d = \gcd(a_1, \dots, a_n)$ is the greatest common divisor of the set
$\{a_1, \dots, a_n\}$.
The case n = 1 is trivial, and the n = 2 case is a consequence of Theorem
2.1.1. Proceeding by induction, we suppose that n ≥ 3 and that the result holds
for equations in n − 1 variables (and in 2 variables). Consider the equation
$$\sum_{i=1}^{n} a_i x_i = c.$$
Since d divides the left-hand side of this equation, the condition d|c is
necessary.
Suppose that d|c. Let $b = \gcd(a_1, \dots, a_{n-1})$, and note that $\gcd(b, a_n) = d$.
By the n = 2 case, the equation $by + a_n x_n = c$ has a solution, say y = Y
and $x_n = X_n$. Now using the case of n − 1 variables, since b|bY , solve the equation
$$\sum_{i=1}^{n-1} a_i x_i = bY.$$
Call this solution $x_i = X_i$ for i = 1, . . . , n − 1. It is clear that $X_1, \dots, X_n$ is
a solution to our original equation.
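
The induction in the proof of Corollary 2.1.2 is constructive, and can be sketched as a recursive Python function. This is an illustration under assumptions: the names `ext_gcd` and `solve_many` are made up for this sketch, and the helper `ext_gcd` is just one standard form of the extended Euclidean algorithm.

```python
from math import gcd

def ext_gcd(a, b):
    """Return (d, s, t) with a*s + b*t == d and d == gcd(a, b) >= 0."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

def solve_many(coeffs, c):
    """Return a list xs with sum(a*x for a, x in zip(coeffs, xs)) == c, or None."""
    n = len(coeffs)
    if n == 0:
        return [] if c == 0 else None
    if n == 1:
        a = coeffs[0]
        if a == 0:
            return [0] if c == 0 else None
        return [c // a] if c % a == 0 else None
    # b = gcd of the first n-1 coefficients, as in the proof
    b = 0
    for a in coeffs[:-1]:
        b = gcd(b, a)
    d, s, t = ext_gcd(b, coeffs[-1])
    if d == 0:                       # all coefficients are zero
        return [0] * n if c == 0 else None
    if c % d != 0:
        return None
    z = c // d
    y, x_last = s * z, t * z         # solves b*y + a_n*x_n == c
    head = solve_many(coeffs[:-1], b * y)   # solvable because b divides b*y
    return head + [x_last]

xs = solve_many([30, 42, 70, 105], 1)
print(xs, sum(a * x for a, x in zip([30, 42, 70, 105], xs)))
```

The solutions produced this way are valid but usually far from the smallest possible ones.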

2.1.3. Example. Consider the problem of measuring exactly 3 cups of
water using two containers, one which holds 12 cups and one which holds 17
cups, but neither has any markings for smaller units. This is really a matter
of solving the equation 12x + 17y = 3. From the Euclidean algorithm, we
get 5(17) − 7(12) = 1. (See the table, in which each row satisfies n = 17s + 12t.)

   n    q    s    t
  17         1    0
  12         0    1
   5    1    1   -1
   2    2   -2    3
   1    2    5   -7

Hence, 3 = 15(17) − 21(12) = 3(17) − 4(12). To implement this solution,
fill the 17 cup container. Fill the 12 cup container from the 17 cupper.
Dump out the 12 cup container and add the remaining 5 cups. Refill the 17
cup container, and continue filling and emptying the 12 cup container. It
takes another 7 cups to fill it. Empty the 12 cup container again, and add
the remaining 10 cups. Fill the 17 cupper a third time. Two more cups fill
the 12 cupper, leaving 15 cups in the 17 cup container. Pour out another 12
cups, leaving the 17 cup container holding exactly 3 cups. In other words,
we have filled the 17 cup container 3 times, and emptied out 4 lots of 12
cups using the 12 cup container. This leaves 3(17) − 4(12) = 3 cups.

Exercises

1. Solve 615x + 243y = 21.


2. Solve 2491x + 1113y = 212.
3. Using a 16 cL measure and a 27 cL measure and (approximately) half a
litre of milk in a jug, how can you measure out exactly 30 cL? What is
the most efficient way?
4. Find a solution of 30w + 42x + 70y + 105z = 1.
5. An experimental robot may move forward in small steps of 27cm and
in large steps of 75cm. It cannot turn or move backwards. It is at the
beginning of a track of length exactly 10m. How does the robot get as
close as possible to the other end of the track?
6. A revised version of the robot above is able to move backwards as well
as forwards the same distances. How much better can it do on a short
track of length 1m than the earlier model robot?

2.2. Congruences
A rather useful notion in number theory is that of modular arithmetic,
which means, working only with the remainders after division by some fixed
integer. For example, working modulo 2, a number is either even or odd.
To determine the parity of the sum of two numbers, one need only know
the parity of the two numbers, not their actual values. Similarly, their
product will be even if either number is even, and odd only if both are odd.
Assign the number 0 to all even numbers (as this is the remainder after
dividing by 2), and assign the number 1 to all odd numbers. The ‘addition’
and ‘multiplication’ tables for these remainders are the ones given in Section 1.1
for the ring Z2 .

   + | 0  1          · | 0  1
   0 | 0  1          0 | 0  0
   1 | 1  0          1 | 0  1
Another familiar situation is clock arithmetic. If the time now is 7
o’clock, then in 19 hours it will be 2 o’clock. This calculation amounts to
adding 19 to 7, and then throwing away all multiples of 12 until the result
lies in the range of 1 to 12.
We will see that a similar situation holds for every positive integer n.
We say that a is congruent to b modulo n provided that n divides a − b,
and write
a ≡ b (mod n).
For example,
752 ≡ 968352 (mod 100)
−98743 ≡ 57 (mod 16)
but
99998 ≢ 22 (mod 3)
For every integer a, the Division algorithm shows that there is exactly
one number b in {0, 1, . . . , n − 1} so that a ≡ b (mod n). Conversely, for each
remainder b, any integer a with a ≡ b (mod n) is called a representative
of b. The important property to recognize is that addition and multiplication
of remainders do not depend on which representative is used. More
precisely:

2.2.1. Proposition. Let n be a positive integer. Suppose that $a_1 \equiv a_2$
(mod n) and $b_1 \equiv b_2$ (mod n). Then,

a1 + b1 ≡ a2 + b2 (mod n),

and
a1 b1 ≡ a2 b2 (mod n).

Proof. The hypotheses say that n divides both $a_1 - a_2$ and $b_1 - b_2$. Adding
shows that n divides
(a1 − a2 ) + (b1 − b2 ) = (a1 + b1 ) − (a2 + b2 ),
which is to say, a1 + b1 ≡ a2 + b2 (mod n).
For multiplication, consider the calculation
a1 b1 − a2 b2 = (a1 − a2 )b1 + a2 (b1 − b2 ).
Since a1 − a2 and b1 − b2 are multiples of n, this shows that a1 b1 − a2 b2 is a
multiple of n. In other words, a1 b1 ≡ a2 b2 (mod n). 
For example, consider the problem of determining the last 2 digits of
$311^{1243}$. Since 311 ≡ 11 (mod 100), it suffices to consider powers of 11.
These powers are computed modulo 100 as 11, 21, 31, 41, . . . . It is not
necessary to compute $11^3$, for example, because
$$11^3 \equiv 21 \cdot 11 = 231 \equiv 31 \pmod{100}.$$
In particular, $11^{10} \equiv 1 \pmod{100}$. Thus,
$$311^{1243} \equiv (11^{10})^{124}(11^3) \equiv 31 \pmod{100}.$$
Later on, we will derive computational tools that will make this exercise
even easier.
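
The calculation above can be checked numerically. The Python sketch below is only a sanity check and not part of the text's argument; the function name `power_mod` is an assumption, while `pow` with three arguments is Python's built-in modular exponentiation.

```python
def power_mod(base, exponent, n):
    """Compute base**exponent modulo n by reducing after every multiplication.

    By Proposition 2.2.1, replacing intermediate products by their
    remainders modulo n does not change the final remainder.
    """
    result = 1
    base %= n
    for _ in range(exponent):
        result = (result * base) % n
    return result

# Powers of 11 modulo 100: 11, 21, 31, ..., 91, 1
print([power_mod(11, k, 100) for k in range(1, 11)])

# Last two digits of 311**1243, two ways
print(power_mod(311, 1243, 100), pow(311, 1243, 100))   # 31 31
```

The loop performs one multiplication per unit of the exponent, which is fine for this example; the built-in `pow` uses far fewer multiplications.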

Exercises

1. Compute the remainder modulo 7 of $2222^{5555}$.


2. What are the possible squares modulo 4? Hence show that 1234567 is
not the sum of two squares.
3. Suppose that a1 ≡ a2 (mod n) and b1 ≡ b2 (mod n). Show that
a1 − b1 ≡ a2 − b2 (mod n).
4. Suppose that a ≡ b (mod n). If p(x) is a polynomial with integer coef-
ficients, show that
p(a) ≡ p(b) (mod n).
Hint: First prove this for the monomials $x^k$.

5. Let $n = \sum_{i=0}^{\ell} a_i 10^i$ where the $a_i$ are integers in {0, 1, . . . , 9},
i.e., when written in base-10 expansion, n has digits $a_\ell, \dots, a_0$.
(a) Prove that 3 | n if and only if $3 \mid \sum_{i=0}^{\ell} a_i$.
(b) Prove that 9 | n if and only if $9 \mid \sum_{i=0}^{\ell} a_i$.
(c) Prove that 11 | n if and only if $11 \mid \sum_{i=0}^{\ell} (-1)^i a_i$.
(d) Give a criterion in terms of the digits ai for when 7 divides n.
Hint: 7|1001.

6. (Josephus Problem) Let n be a positive integer and write the numbers


from 1 through n in a circle. Starting at 1, continue going around
the circle removing every other number until only one number remains.
Determine the values of n for which 1 is the last remaining number. For
example, if n = 7, we start by crossing off 2, then 4, then 6, then 1, then
5, then 3, so the last remaining number is 7.

2.3. The Ring Zn


Proposition 2.2.1 allows us to define a ring called Zn . The elements of the
ring are [0], [1], . . . , [n − 1] corresponding to the remainders {0, . . . , n − 1}.
Addition is defined by setting [a] + [b] to be the remainder [c] such that
a + b ≡ c (mod n). Similarly, multiplication is defined by setting [a][b] to be
the remainder [c] such that ab ≡ c (mod n).

2.3.1. Example. Here are addition and multiplication tables for Z4 and
Z5 :

Z4 :
   + | 0  1  2  3          · | 0  1  2  3
   0 | 0  1  2  3          0 | 0  0  0  0
   1 | 1  2  3  0          1 | 0  1  2  3
   2 | 2  3  0  1          2 | 0  2  0  2
   3 | 3  0  1  2          3 | 0  3  2  1

Z5 :
   + | 0  1  2  3  4          · | 0  1  2  3  4
   0 | 0  1  2  3  4          0 | 0  0  0  0  0
   1 | 1  2  3  4  0          1 | 0  1  2  3  4
   2 | 2  3  4  0  1          2 | 0  2  4  1  3
   3 | 3  4  0  1  2          3 | 0  3  1  4  2
   4 | 4  0  1  2  3          4 | 0  4  3  2  1

Alternatively, we can use all the integers to represent elements [k] of Zn
with the rule that [j] = [k] if and only if j − k ≡ 0 (mod n). Then the rules
for addition and multiplication become
[j] + [k] = [j + k] and [j][k] = [jk].
This appears easier, but it raises a new difficulty. Before, there was only one
definition of addition and multiplication for each pair {[a], [b]}. Now there
are many such definitions, one for each pair of integers which represent the
same two elements. It is important that all these definitions agree. For
example, consider [2] + [3] = [5] in Z7 . One might instead have chosen
the representative 16 in place of 2 and −18 in place of 3. For their sum,
we get [16] + [−18] = [−2]. Since [−2] = [5] in Z7 , these two definitions are
the same. Proposition 2.2.1 shows that we get the same result regardless of
which representative is chosen.
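
As an illustration only, the following Python sketch checks on examples that the rules [j] + [k] = [j + k] and [j][k] = [jk] do not depend on the chosen representatives. The helper names `cls`, `add`, `mul` and `table` are hypothetical, and Python's `%` operator (with a positive modulus) stands in for taking the remainder in {0, . . . , n − 1}.

```python
def cls(k, n):
    """Standard representative in {0, ..., n-1} of the class [k] in Z_n."""
    return k % n

def add(j, k, n):
    """[j] + [k] = [j + k], computed from arbitrary representatives j and k."""
    return cls(j + k, n)

def mul(j, k, n):
    """[j][k] = [jk], computed from arbitrary representatives j and k."""
    return cls(j * k, n)

def table(op, n):
    """Full operation table of Z_n as a list of rows."""
    return [[op(a, b, n) for b in range(n)] for a in range(n)]

# The example from the text: [2] + [3] and [16] + [-18] agree in Z_7
print(add(2, 3, 7), add(16, -18, 7))   # 5 5

# Multiplication table of Z_4
for row in table(mul, 4):
    print(row)
```

Changing a representative by a multiple of n never changes the output, which is exactly the content of Proposition 2.2.1.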

Using these tables, we can painstakingly verify all the laws of a com-
mutative ring. However, a bit of thought shows that Z5 inherits all these
properties from the integers. For example, consider the associative law for
addition. For any elements [a], [b], [c] in Z5 ,
[a] + ([b] + [c]) = [a] + [b + c] = [a + (b + c)]
= [(a + b) + c] = [a + b] + [c] = ([a] + [b]) + [c].
Proposition 2.2.1 shows that it did not matter which choice of representatives
was made. So the formula is verified. Similarly, all the properties of a
commutative ring can be verified. So we obtain:

2.3.2. Proposition. Zn is a commutative ring.

If you study the multiplication table for Z5 above, you will see that
every non-zero element has an inverse; for example, [2] · [3] = [1]. (That is,
2(3) = 6 ≡ 1 (mod 5).) This is a property which Z5 has but the integers do
not. A commutative ring in which every non-zero element has an inverse is
called a field. These fields will play a very important role in algebra. Two
well known fields are the rational numbers Q and the real numbers R.
In Definition 1.8.1, we defined integral domains and zero divisors. Fields
are examples of integral domains, but Z is an example of an integral domain
which is not a field. The ring Z6 provides an example of a ring which is
not an integral domain since [2] and [3] are zero divisors; this is because
[2] · [3] = [6] = [0] but [2] ≠ [0] ≠ [3].
In order to determine when Zn is a field, we need the following simple
consequence of the Euclidean algorithm.

2.3.3. Lemma. Suppose that a, b and n are integers with gcd(a, n) = 1.
Then the equation
ax ≡ b (mod n)
has exactly one integer solution modulo n. In other words, [a][x] = [b] has
exactly one solution in Zn .

Proof. Define a function f : Zn → Zn by f ([x]) = [ax] for all [x] in Zn .
First, let us verify that f is one-to-one. Suppose that [x] and [y] are elements
of Zn . Pick representatives x and y in Z for [x] and [y]. If f ([x]) = f ([y]), we
can interpret this as saying ax ≡ ay (mod n). This is equivalent to saying
that n divides a(x − y). Since gcd(a, n) = 1, Lemma 1.6.1 shows that n divides
x − y. This of course means that x ≡ y (mod n). So, [x] = [y].
The set Zn has exactly n elements. The function f is one-to-one, and so
takes each of these n elements to n distinct elements of Zn . It follows that f
is onto. Thus there is exactly one element [x0 ] such that [b] = f ([x0 ]) = [ax0 ].
In other words, x0 is the unique solution mod n of the congruence equation
ax ≡ b (mod n). 
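
Lemma 2.3.3 can be illustrated with a small Python sketch. The function names `solutions_by_search` and `solve_congruence` are assumptions made here; the second one relies on `pow(a, -1, n)`, which computes a modular inverse in Python 3.8 or later.

```python
from math import gcd

def solutions_by_search(a, b, n):
    """All x in {0, ..., n-1} with a*x congruent to b modulo n."""
    return [x for x in range(n) if (a * x - b) % n == 0]

def solve_congruence(a, b, n):
    """The unique solution modulo n of a*x = b (mod n) when gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("this sketch requires gcd(a, n) == 1")
    return (pow(a, -1, n) * b) % n

# 3x = 4 (mod 7) has exactly one solution modulo 7
print(solutions_by_search(3, 4, 7), solve_congruence(3, 4, 7))   # [6] 6
```

The exhaustive search mirrors the counting argument in the proof: the map [x] → [ax] hits each class exactly once, so the list always has length one when gcd(a, n) = 1.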

2.3.4. Corollary. For integers a and n, there is an integer b so that
ab ≡ 1 (mod n) if and only if gcd(a, n) = 1.

Proof. The ‘if’ direction is immediate from the lemma. On the other hand,
if gcd(a, n) = d > 1, then ab + kn is a multiple of d for every choice of b and
k; and so can never equal 1. 

The invertible elements of a ring are called units. The set Z∗n of all units
of Zn is called the group of units of Zn . Z∗n is closed under multiplication
(i.e. if [a] and [b] are units, then [ab] is a unit). It has an identity [1], every
element has an inverse, and multiplication is commutative and associative.
An algebraic object with these properties is called an abelian group. (The
word abelian is derived from the name Abel, who was an eminent algebraist.
It means commutative.)
This corollary shows that [a] is an invertible element, or unit of Zn
exactly when gcd(a, n) = 1. We record this as a separate result.

2.3.5. Corollary. The units of Zn are $\mathbb{Z}_n^* = \{[a] : \gcd(a, n) = 1\}$.
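
Corollary 2.3.5 gives a direct way to list the units of Zn. The following Python sketch is illustrative only; the names `units` and `inverse_by_search` are hypothetical.

```python
from math import gcd

def units(n):
    """Representatives in {0, ..., n-1} of the units of Z_n."""
    return [a for a in range(n) if gcd(a, n) == 1]

def inverse_by_search(a, n):
    """Find b in {0, ..., n-1} with a*b = 1 (mod n), or None if there is none."""
    for b in range(n):
        if (a * b) % n == 1 % n:
            return b
    return None

print(units(12))                                     # [1, 5, 7, 11]
print([inverse_by_search(a, 12) for a in units(12)])
```

Every element returned by `units(n)` has an inverse found by the search, and no other element does, in line with Corollary 2.3.4.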

Now we can show that Zn is a field if and only if n is a prime.

2.3.6. Theorem. If p is a prime, then Zp is a field. On the other hand,
if n is composite, Zn has zero divisors and hence is not an integral domain.

Proof. Suppose p is prime. Then every non-zero element of Zp has an
inverse by Corollary 2.3.5. Hence Zp is a field.
Conversely, if n is composite, factor n = ab so that neither a nor b is ±n.
Then n does not divide either a or b. So they represent non-zero elements
[a] and [b] in Zn satisfying [a][b] = [0]. Therefore Zn has zero divisors. 
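
Theorem 2.3.6 can be observed experimentally for small n. In this Python sketch, which is an illustration and not a proof, the function name `zero_divisor_pairs` is an assumption.

```python
def zero_divisor_pairs(n):
    """Pairs (a, b) with 1 <= a <= b < n and a*b = 0 (mod n)."""
    return [(a, b)
            for a in range(1, n)
            for b in range(a, n)
            if (a * b) % n == 0]

print(zero_divisor_pairs(6))   # [(2, 3), (3, 4)]
print(zero_divisor_pairs(7))   # []

# For small n, the list is empty exactly when n is prime.
for n in range(2, 30):
    is_prime = all(n % d != 0 for d in range(2, n))
    assert (zero_divisor_pairs(n) == []) == is_prime
```

For composite n = ab, the pair of proper factors always appears in the list, which is the argument used in the proof.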

The final result of this section gives a bound on the number of roots of
a polynomial in Zp . We prove this after a preliminary lemma.

2.3.7. Lemma. Let a ∈ Z and
$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$$
with $a_0, \dots, a_n \in \mathbb{Z}$. Then there is a polynomial q(x) and r ∈ Z so that
p(x) = (x − a)q(x) + r. Moreover, r = p(a).

Proof. We prove the existence of q(x) and r by induction on n. If n = 0,
we may take q = 0 and $r = a_0$. For n > 0, we achieve the result by “long
division”. We have that $(x - a)a_n x^{n-1} = a_n x^n - a a_n x^{n-1}$ is a multiple of x − a.
Subtracting this from p(x) leaves
$$p_1(x) = (a_{n-1} + a a_n)x^{n-1} + a_{n-2}x^{n-2} + \dots + a_1 x + a_0.$$
By our inductive hypothesis, we have a polynomial $q_1(x)$ in Z[x] and r ∈ Z
such that $p_1(x) = (x - a)q_1(x) + r$. Then
$$p(x) = p_1(x) + (x - a)a_n x^{n-1} = (x - a)(q_1(x) + a_n x^{n-1}) + r,$$
so we may take $q(x) = q_1(x) + a_n x^{n-1}$. Since p(x) = (x − a)q(x) + r,
substituting x = a yields r = p(a).
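
The “long division” in the proof of Lemma 2.3.7 amounts to what is often called synthetic division. A minimal Python sketch follows; the function name `divide_by_linear` and the convention of listing coefficients from the highest degree down are assumptions made for this illustration.

```python
def divide_by_linear(coeffs, a):
    """Divide p(x) by (x - a).

    coeffs lists the integer coefficients [a_n, ..., a_1, a_0] of p.
    Returns (q_coeffs, r) with p(x) = (x - a)*q(x) + r, where q_coeffs
    lists the coefficients of q in the same order and r = p(a).
    """
    if not coeffs:
        raise ValueError("coeffs must contain at least the constant term")
    partial = []
    carry = 0
    for c in coeffs:
        # each step absorbs one leading term, as in the inductive step
        carry = carry * a + c
        partial.append(carry)
    r = partial.pop()          # the last value is the remainder p(a)
    return partial, r

# p(x) = x^3 - 6x^2 + 11x - 6
print(divide_by_linear([1, -6, 11, -6], 1))   # ([1, -5, 6], 0)
print(divide_by_linear([1, -6, 11, -6], 4))   # ([1, -2, 3], 6)
```

In the first call the remainder is 0 because 1 is a root of p; in the second, the remainder 6 equals p(4).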
A polynomial $p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$ is monic if $a_n = 1$.

2.3.8. Corollary. If q(x) is a monic polynomial of degree d with integer
coefficients, and p is a prime, then the congruence equation
q(x) ≡ 0 (mod p)
has at most d solutions modulo p.

Proof. This will follow by induction on the degree d. For d = 1, this
follows from Lemma 2.3.3. Assume that the result holds for all monic polynomials
of degree less than d. If q(x) ≡ 0 (mod p) has no solutions, the corollary
holds trivially. So assume that a is a solution. By Lemma 2.3.7, we have
$$q(x) = (x - a)q_1(x) + q(a) \equiv (x - a)q_1(x) \pmod{p},$$
where $q_1$ is monic of degree d − 1. If b ≢ a (mod p) is any other solution, then
$$0 \equiv q(b) \equiv (b - a)q_1(b) \pmod{p}.$$
Since b − a ≢ 0 (mod p) and Zp has no zero divisors, it follows that $q_1(b) \equiv 0$
(mod p). In other words, all roots of q other than a are roots of $q_1$. By
the induction hypothesis, $q_1(x) \equiv 0$ (mod p) has at most d − 1 solutions.
Therefore q(x) ≡ 0 (mod p) has at most d solutions.
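
The bound in Corollary 2.3.8 can be checked by brute force for small primes. The Python sketch below is only an experiment; the function name `roots_mod` and the coefficient order (highest degree first) are assumptions made here.

```python
def roots_mod(coeffs, p):
    """All x in {0, ..., p-1} with q(x) = 0 (mod p).

    coeffs lists the coefficients of q from the highest degree down.
    """
    def value(x):
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % p   # Horner's rule, reduced modulo p
        return acc
    return [x for x in range(p) if value(x) == 0]

print(roots_mod([1, 0, -1, 0], 5))   # x^3 - x modulo 5: [0, 1, 4]
print(roots_mod([1, 0, 1], 5))       # x^2 + 1 modulo 5: [2, 3]
print(roots_mod([1, 0, 1], 7))       # x^2 + 1 modulo 7: []
```

In each case the number of roots is at most the degree, as the corollary predicts for a prime modulus.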

Exercises

1. Write down the addition and multiplication tables for Z6 .


2. Solve the equation $x^2 + 4x + 2 \equiv 0 \pmod{7}$ by completing the square.
3. Solve the equation $x^2 + x + 7 \equiv 0 \pmod{13}$ by completing the square.
In this case, it helps to add a linear polynomial which is congruent to 0
modulo 13.
4. Show by example that Corollary 2.3.8 is false if p is not prime.
5. Show by example that Corollary 2.3.8 is false if q is not monic. (We thank
Anton Mosunov for suggesting this exercise.)
6. Show that every finite integral domain is a field.
Hint: modify the proof of Lemma 2.3.3.

2.4. Equivalence Relations


In this section, we will discuss an important mathematical notion which was
used implicitly in the last two sections. This topic could be skipped by those
keen to get on with the number theory. However, it is a notion that will
recur frequently in your mathematical studies.

2.4.1. Definition. An equivalence relation on a set S is a relation ≈


satisfying the three properties:

(1) reflexivity: a ≈ a for all a ∈ S.


(2) symmetry: a ≈ b implies b ≈ a for all a, b ∈ S.
(3) transitivity: a ≈ b and b ≈ c imply a ≈ c for all a, b, c ∈ S.

2.4.2. Example. Let S be any set, and consider the equality relation.
That is, a is related to b if and only if a = b. This is easily seen to be an
equivalence relation.

2.4.3. Example. Consider the relation on Z given by congruence modulo


n. It is clear that the reflexivity property a ≡ a (mod n) holds since n|0.
Also, if a ≡ b (mod n), then n|b − a. Thus, n|a − b and so b ≡ a (mod n).
This verifies symmetry. Finally, if a ≡ b (mod n) and b ≡ c (mod n), then
n|b − a and n|c − b, so n|(c − b) + (b − a) = c − a. Thus, a ≡ c (mod n). So
the relation is also transitive. This is an equivalence relation.

2.4.4. Example. Consider the relation ≤ on R. Since a ≤ a, we see ≤


is reflexive. If a ≤ b and a ≠ b, then b ≰ a. So ≤ is not symmetric. It is
transitive, since a ≤ b and b ≤ c implies a ≤ c. This is not an equivalence
relation.

2.4.5. Example. Consider a relation on Z given by n ≈ m if n and m


have the same sign, meaning +,−, or 0. Now, n has the same sign as itself.
If n and m have the same sign, then m and n have the same sign. Finally, if
n and m have the same sign, and m and k have the same sign, then n and
k have the same sign. So, this is an equivalence relation.

If ≈ is an equivalence relation on a set S, then each element a of S


belongs to the equivalence class [a] = {b ∈ S | b ≈ a}. Every element
of S belongs to exactly one equivalence class. So S is partitioned into a
disjoint union of these equivalence classes. Conversely, if S is partitioned
into a disjoint union of sets Eα for α ∈ A, then define a relation a ≈ b if
and only if a and b belong to the same set Eα . One can check that this is
an equivalence relation. In fact, this is essentially what occurs in Example
2.4.5 above. One denotes the set of equivalence classes by
{[a] : a ∈ S} = S/≈.
Equivalence relations arise naturally in many mathematical situations.
Often, as is the case for modular arithmetic, one wants to define some al-
gebraic operation on the equivalence classes which is compatible with the
corresponding operation on the original set. Consider congruence modulo n
again. The equivalence class for an integer a is [a] = {a + kn | k ∈ Z}. When
addition is defined on these equivalence classes by
[a] + [b] = [a + b],
it is important that we can choose any representative from each class and
add them in order to determine the class of the sum. This is known as
showing that the definition of addition is well defined. This is the content
of Proposition 2.2.1. In other words,
{a + jn | j ∈ Z} + {b + kn | k ∈ Z} = {a + b + tn | t ∈ Z}.
This same proposition shows that multiplication is well defined. In set terms,
{a + jn | j ∈ Z} · {b + kn | k ∈ Z} ⊂ {ab + tn | t ∈ Z}.
For contrast, consider defining addition in example 2.4.5. Let us call the
three equivalence classes [+], [−] and [0]. When we try to define [a] + [b] =
[a + b], the sign of a + b is ambiguous. For if a = 1 and b = −2, the sum
is negative which suggests that [+] + [−] = [−]. But if a = 2 and b = −1,
then a + b > 0 which suggests [+] + [−] = [+]. Likewise, if a = 3 and
b = −3, then a + b = 0 which suggests that [+] + [−] should be [0]. So it
is not possible to define an addition on these equivalence classes which is
compatible with addition on the integers. Such a definition only works for
certain equivalence relations. For this reason, when one defines an operation
on equivalence classes, it is very important to check that the definition is
well defined.
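
The contrast between the two examples can be tested on a finite sample. This is a rough sketch under stated assumptions: the function names are hypothetical, and a sample check that returns True is only evidence, not a proof, whereas a False result exhibits a genuine clash of representatives.

```python
from itertools import product

def sign(a):
    """Class of a in Example 2.4.5: +1, -1 or 0."""
    return (a > 0) - (a < 0)

def is_well_defined(class_of, op, sample):
    """Check on a finite sample that the class of op(a, b) depends only
    on the classes of a and b, not on the representatives chosen."""
    seen = {}
    for a, b in product(sample, repeat=2):
        key = (class_of(a), class_of(b))
        result = class_of(op(a, b))
        # setdefault records the first result seen for this pair of classes
        if seen.setdefault(key, result) != result:
            return False
    return True

sample = range(-20, 21)
add = lambda a, b: a + b
mul = lambda a, b: a * b

print(is_well_defined(lambda a: a % 6, add, sample))   # congruence classes mod 6
print(is_well_defined(sign, add, sample))              # sign classes under addition
print(is_well_defined(sign, mul, sample))              # sign classes under multiplication
```

On this sample, addition passes for congruence modulo 6 and fails for the sign classes, in line with the discussion above; multiplication of sign classes passes.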

Exercises

1. Which of the following relations are equivalence relations? If not, deter-


mine which of the three properties do hold.
(a) For all x, y ∈ R, say x ≈ y if x − y is rational.
(b) For all a, b ∈ Z, say a ≈ b if gcd (a, b) = 1.
(c) For all continuous, positive functions f, g on R, say f ≈ g if
$\lim_{x \to \infty} f(x)/g(x) = 1$.

(d) For all a, b ∈ Z, say a ≈ b if 3|(a + b).


(e) For all a, b ∈ N, say a ≈ b if a|b.

2. Say that two continuous functions on [0, 1] are equivalent (f ≈ g) pro-


vided that f (0) = g(0) and f (1) = g(1). Show that addition is well
defined on the equivalence classes.
3. Put a relation on N by setting n ≈ m if n/ gcd(n, m) and m/ gcd(n, m)
are both odd.
(a) Show that this is an equivalence relation, and describe the equiva-
lence classes.
(b) Show that the multiplication [n][m] = [nm] is well defined.
(c) Show that the addition [n] + [m] = [n + m] is not well defined.
4. (Construction of the rational numbers) Put a relation on
S = Z × (Z ∖ {0}) given by (a, b) ≈ (c, d) if ad = bc.
(a) Show that ≈ is an equivalence relation and let Q = S/ ≈.
(b) Show that multiplication [(a, b)][(c, d)] = [(ac, bd)] is well defined.
(c) Show that addition [(a, b)] + [(c, d)] = [(ad + bc, bd)] is well defined.
(d) Prove that Q is a field with the above addition and multiplication
operations.
(e) Prove that the map
ϕ : Q → ℚ, ϕ([(a, b)]) = a/b,
from Q to the usual rational numbers ℚ, is an isomorphism.
5. (Construction of fraction fields) Let R be any integral domain and
put a relation on S = R × (R ∖ {0}) given by (a, b) ≈ (c, d) if ad = bc.
(a) Show that ≈ is an equivalence relation and let Frac(R) = S/ ≈.
(b) Show that multiplication [(a, b)][(c, d)] = [(ac, bd)] is well defined.
(c) Show that addition [(a, b)] + [(c, d)] = [(ad + bc, bd)] is well defined.
(d) Prove that Frac(R) is a field with the above addition and multiplica-
tion operations. This is referred to as the fraction field (or quotient
field ) of R.

2.5. Chinese Remainder Theorem


In this section, we will study systems of linear congruences of a very special
form. Problems of this type were studied in many ancient civilizations. A
full solution was obtained first in China by Yih-hing in 717. It is thought
to have been used as a method of representing numbers, and doing large
computations.
To illustrate the method, consider the following example.

2.5.1. Example. Consider the system


x ≡ 3 (mod 4)
x ≡ 12 (mod 25)
x ≡ 1 (mod 3)
First, let us solve the first pair of equations. This requires integers x, y and
z such that
x = 3 + 4y = 12 + 25z.
Hence, 4y − 25z = 9. By inspection, y = −4 and z = −1 is a solution. Since
gcd(4, 25) = 1, the most general solution is
y = −4 + 25m and z = −1 + 4m.
Hence x = 3 + 4(−4 + 25m) = −13 + 100m. Now combine this with the
third equation x = 1 + 3n. This yields
100m − 3n = 14.
Since 100(1) − 3(33) = 1, there is a solution m = 14 and n = 14(33) = 462;
hence, m = 14 − 4(3) = 2 and n = 462 − 4(100) = 62 is a solution. The
most general solution is given by
m = 2 + 3k and n = 62 + 100k,
which gives x = 3(62 + 100k) + 1 = 187 + 300k. In other words, x ≡ 187
(mod 300). Notice that 300 = (4)(25)(3).

Now we consider the problem in general.

2.5.2. Lemma. Suppose that m and n are relatively prime positive inte-
gers. Then the system of congruences
x ≡ a (mod m)
x ≡ b (mod n)
has a unique solution (mod mn).

Proof. An integer x is a solution if and only if there are integers y and z


satisfying
x = a + my = b + nz.
Therefore, y and z form a solution of
my − nz = b − a.
By Theorem 2.1.1, this has a solution y0 , z0 , and the most general solution
is
$y = y_0 + nk$ and $z = z_0 + mk$.
Substituting back in yields
x = a + my0 + mnk = b + nz0 + mnk.
It is readily apparent that such an x solves our system of equations, so we
have found a complete solution. From the form of this solution, x is unique
modulo mn. 
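
The proof of the lemma is constructive. The sketch below is one possible rendering, with hypothetical function names `extended_gcd` and `crt_pair`: it uses the extended Euclidean algorithm to find $y_0$ with $my_0 - nz_0 = b - a$ and returns $x = a + my_0$ reduced modulo mn.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = a*s + b*t."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def crt_pair(a, m, b, n):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for relatively prime m and n.
    Returns the unique solution in range(m * n)."""
    g, s, t = extended_gcd(m, n)       # m*s + n*t = g
    if g != 1:
        raise ValueError("moduli must be relatively prime")
    y0 = s * (b - a)                   # m*y0 - n*z0 = b - a, with z0 = -t*(b - a)
    return (a + m * y0) % (m * n)

# The two steps of Example 2.5.1
step1 = crt_pair(3, 4, 12, 25)         # x ≡ 3 (mod 4), x ≡ 12 (mod 25)
step2 = crt_pair(step1, 100, 1, 3)     # combine with x ≡ 1 (mod 3)
print(step1, step2)                    # 87 187
```

Here 87 ≡ −13 (mod 100), which matches the intermediate answer x = −13 + 100m of Example 2.5.1.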
Now we can prove the Chinese Remainder Theorem.

2.5.3 Chinese Remainder Theorem. Suppose that m1 , . . . , mn are


pairwise relatively prime positive integers (i.e. $\gcd(m_i, m_j) = 1$ for i ≠ j).
Then the system of congruence equations
$x \equiv a_1 \pmod{m_1}$
$x \equiv a_2 \pmod{m_2}$
$\vdots$
$x \equiv a_n \pmod{m_n}$
has a unique solution modulo $m_1 m_2 \cdots m_n$.

Proof. The proof is an induction argument. The lemma did the n = 2


case. Suppose that the result holds for all k < n, where n ≥ 3. Consider the
first n − 1 equations. By the induction hypothesis, this system has a unique
solution b modulo $m_1 \cdots m_{n-1}$. In other words, the solution of this system
is the same as the solution of the equation
$x \equiv b \pmod{m_1 \cdots m_{n-1}}.$

So our original system has the same solutions as the system

$x \equiv b \pmod{m_1 \cdots m_{n-1}}$
$x \equiv a_n \pmod{m_n}$

Since $m_n$ is relatively prime to each of $m_1, \dots, m_{n-1}$, it is relatively prime
to their product. By the lemma, this has a unique solution (mod $m_1 \cdots m_n$).
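
The induction in this proof translates directly into a fold over the list of congruences. The following sketch assumes Python 3.8 or later, where the built-in `pow(m, -1, n)` returns the inverse of m modulo n; the function name `crt` is an illustrative choice.

```python
from functools import reduce

def crt(congruences):
    """congruences: list of pairs (a_i, m_i) with pairwise relatively
    prime moduli. Returns (x, M), where x is the unique solution
    modulo M = m_1 * ... * m_n."""
    def combine(first, second):
        b, m = first                   # x ≡ b (mod m) summarizes the earlier equations
        a, n = second                  # next equation x ≡ a (mod n)
        # Choose k so that b + m*k ≡ a (mod n); requires gcd(m, n) = 1.
        k = ((a - b) * pow(m, -1, n)) % n
        return (b + m * k, m * n)
    return reduce(combine, congruences)

print(crt([(3, 4), (12, 25), (1, 3)]))   # (187, 300), as in Example 2.5.1
```

If two moduli share a common factor, `pow` raises a `ValueError`, which corresponds to the hypothesis of the theorem failing.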

Exercises

1. Show that if m1 , . . . , mn are not relatively prime, then the conclusion of


the Chinese Remainder Theorem never holds.
2. Solve the system of equations
x ≡ 2 (mod 7)
x ≡ 5 (mod 11)
x ≡ 9 (mod 13).
3. Solve the system of equations
x ≡ 9 (mod 27)
x ≡ 4 (mod 5)
x ≡ 7 (mod 16).
4. Solve the equation $x^3 - x - 1 \equiv 0 \pmod{385}$.
5. For every positive integer n, find n consecutive integers none of which
are square-free.²
² This exercise was given on the 1955 Putnam competition.

2.6. Congruence Equations


Solving equations with congruences often yields useful information about the
solution in the integers. It is also of independent interest to solve equations
in Zn . Lemma 2.3.3 is an example of this kind of result. We will start by
giving a more general form of it.

2.6.1. Theorem. The congruence equation


ax ≡ b (mod n)
has a solution if and only if d = gcd(a, n) divides b. The solution is unique
(mod n/d).

Proof. Notice that ax ≡ b (mod n) if and only if there is an integer y such


that ax + ny = b. By Theorem 2.1.1, this has a solution if and only if
gcd(a, n)|b. In this case, let A = a/d, B = b/d and N = n/d. Dividing the
Diophantine equation by d reduces the problem to solving Ax + N y = B.
This is equivalent to solving Ax ≡ B (mod N ). Since gcd(A, N ) = 1,
Lemma 2.3.3 shows that the solution is unique (mod N ). 
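
The proof above is effectively an algorithm. A minimal sketch follows, assuming Python 3.8 or later for the modular inverse `pow(A, -1, N)`; the function name and the sample congruences are illustrative.

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """Solve a*x ≡ b (mod n).

    Returns (x0, N) with N = n // gcd(a, n), meaning x ≡ x0 (mod N),
    or None when gcd(a, n) does not divide b."""
    d = gcd(a, n)
    if b % d != 0:
        return None
    A, B, N = a // d, b // d, n // d   # reduced equation A*x ≡ B (mod N)
    x0 = (B * pow(A, -1, N)) % N       # A is invertible mod N since gcd(A, N) = 1
    return x0, N

print(solve_linear_congruence(6, 4, 10))   # (4, 5): x ≡ 4 (mod 5)
print(solve_linear_congruence(6, 3, 10))   # None, since gcd(6, 10) = 2 does not divide 3
```

In the first case the solution is unique modulo 5, so modulo 10 there are two solutions, x ≡ 4 and x ≡ 9.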

2.6.2. Example. Here is an example of a linear congruence equation with


two variables:
34x + 4y ≡ 3 (mod 47).
It might appear that the left-hand side is even and the right-hand side is
odd. But in fact the right-hand side is really 3 + 47k, which may be even if
k is odd. Since gcd(4, 47) = 1, one can write 1 as a combination of 4 and
47. For example, 1 = 4(12) − 47. So, 4(12) ≡ 1 (mod 47). If we multiply
the original equation by 12, we obtain
12(34)x + 12(4)y ≡ 12(3) (mod 47).
Since 12(34) ≡ 12(−13) ≡ −156 + 3(47) ≡ −15 (mod 47), this can be
rewritten as
y ≡ 36 + 15x (mod 47).
Thus there are 47 solutions (mod 47), one for each choice of x.

2.6.3. Example. Now consider an equation of higher degree


$x^2 + 1 \equiv 0 \pmod{65}.$
With a little luck, you might notice that x = 8 is a solution. Following
standard factorization techniques, you will be led to
$(x-8)(x+8) \equiv x^2 - 64 \equiv x^2 + 1 \equiv 0 \pmod{65}.$
If this were an exact equation over the integers or even the real numbers,
you could conclude that x = ±8 were the only solutions. However, in solving
this (mod 65), we are actually working in $\mathbb{Z}_{65}$. By Theorem 2.3.6, $\mathbb{Z}_{65}$ is
not an integral domain. The fact that it has zero divisors means that just
because the product of x − 8 and x + 8 is 0 does not mean that either of
these terms need be zero.
To deal with this problem, we use the Chinese Remainder Theorem but
in reverse. The point is that the equation $x^2 + 1 \equiv 0 \pmod{65}$ has the same
solutions as the system
$x^2 + 1 \equiv 0 \pmod{5}$
$x^2 + 1 \equiv 0 \pmod{13}$
The advantage of this is that $\mathbb{Z}_5$ and $\mathbb{Z}_{13}$ are both fields. So
$x^2 + 1 \equiv (x-8)(x+8) \equiv 0 \pmod{5}$
$x^2 + 1 \equiv (x-8)(x+8) \equiv 0 \pmod{13}$
do have exactly the obvious solutions. This is because in a field (or even
in an integral domain) the product of two numbers is 0 only if one of the
factors is 0. Thus we obtain the system
x ≡ ±8 (mod 5)
x ≡ ±8 (mod 13)
This is really four sets of equations:
x ≡ 8 (mod 5) and x ≡ 8 (mod 13);
x ≡ −8 (mod 5) and x ≡ −8 (mod 13);
x ≡ 8 (mod 5) and x ≡ −8 (mod 13);
x ≡ −8 (mod 5) and x ≡ 8 (mod 13).
Each of these sets of equations has a unique solution (mod 65) due to the
Chinese Remainder Theorem again. The first two sets have the solutions
x ≡ ±8 (mod 65) that we are already aware of. The last two sets have the
solutions x ≡ ±18 (mod 65). So two surprising solutions turned up.
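
The four solutions can be confirmed by brute force. The sketch below (helper names are hypothetical) finds the roots modulo 5 and modulo 13 separately, glues each pair together, and compares the result with a direct search modulo 65.

```python
from itertools import product

def roots_mod(n):
    """Brute-force roots of x^2 + 1 modulo n."""
    return [x for x in range(n) if (x * x + 1) % n == 0]

def glue(r5, r13):
    """The unique x in range(65) with x ≡ r5 (mod 5) and x ≡ r13 (mod 13)."""
    return next(x for x in range(65) if x % 5 == r5 and x % 13 == r13)

combined = sorted(glue(r5, r13)
                  for r5, r13 in product(roots_mod(5), roots_mod(13)))

print(roots_mod(5), roots_mod(13))   # [2, 3] [5, 8]
print(combined)                      # [8, 18, 47, 57]
print(roots_mod(65))                 # [8, 18, 47, 57]
```

Since 47 ≡ −18 and 57 ≡ −8 (mod 65), these are exactly the solutions ±8 and ±18 found above.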

2.6.4. Example. Let us look at the problem of determining how many


square roots of 1 there are modulo n. Working as above, we can factor n
into a product of prime powers and solve a system of easier equations. Let
us first solve the equation
$x^2 - 1 \equiv 0 \pmod{p^d}$
where p is prime. Now, $x^2 - 1$ factors as (x − 1)(x + 1) so that x = ±1
are roots. Can there be any other roots? If there are, then x − 1 and
x + 1 must both be divisible by some positive power of p. Hence p divides
gcd(x − 1, x + 1), and thus divides (x + 1) − (x − 1) = 2. So when p is any odd
prime, $x^2 - 1 \equiv 0 \pmod{p^d}$ has exactly two solutions, $x \equiv \pm 1 \pmod{p^d}$.
We must consider p = 2 separately. Following our argument above, we
see that it may be possible that $2^a \mid x - 1$ and $2^b \mid x + 1$. The gcd(x − 1, x + 1)
is at least $2^{\min\{a,b\}}$ and divides 2. Thus min{a, b} ≤ 1. The new solutions
occur when min{a, b} = 1, namely a = 1, b = d − 1 or a = d − 1, b = 1. This
yields solutions
$x \equiv 2^{d-1} \pm 1 \pmod{2^d}.$
Hence $x^2 \equiv 1 \pmod{2^d}$ if and only if $x \equiv \pm 1 \pmod{2^{d-1}}$. Thus there are
4 solutions modulo $2^d$ if d ≥ 3. By inspection, there is 1 solution modulo 2
and 2 solutions modulo 4.
To describe the number of solutions of $x^2 \equiv 1 \pmod{n}$, let us write the
factorization of n as
$n = 2^{d_0} p_1^{d_1} \cdots p_k^{d_k},$
where the $p_i$ are distinct odd primes and $d_i > 0$ for i ≥ 1, but $d_0 = 0$ is allowed.
Let $e = \max\{d_0 - 1, 0\}$. The problem reduces to solving the system
$x \equiv \pm 1 \pmod{2^e}$
$x \equiv \pm 1 \pmod{p_1^{d_1}}$
$\vdots$
$x \equiv \pm 1 \pmod{p_k^{d_k}}.$

For each i ≥ 1 there are two choices modulo $p_i^{d_i}$, and for i = 0, there are
$s_0 = 1$, 2 or 4 choices modulo $2^{d_0}$ depending on whether $d_0 - 2$ is negative,
0 or positive. Altogether this yields $s = 2^k s_0$ different systems of equations.
By the Chinese Remainder Theorem, each system has a unique solution
modulo n. So there are s square roots of 1 modulo n.
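
As a sanity check, the count $s = 2^k s_0$ can be compared with a direct search. This is a sketch with illustrative function names; the trial-division factoring is only meant for small n.

```python
def count_by_formula(n):
    """Number of solutions of x^2 ≡ 1 (mod n), computed as s = 2^k * s0."""
    d0 = 0
    while n % 2 == 0:                  # strip the power of 2, recording d0
        n //= 2
        d0 += 1
    k = 0                              # number of distinct odd prime factors
    p = 3
    while n > 1:
        if n % p == 0:
            k += 1
            while n % p == 0:
                n //= p
        p += 2
    s0 = 1 if d0 < 2 else (2 if d0 == 2 else 4)
    return 2 ** k * s0

def count_by_search(n):
    """Count square roots of 1 modulo n by trying every residue."""
    return sum(1 for x in range(n) if (x * x - 1) % n == 0)

for n in [2, 4, 8, 15, 65, 120]:
    print(n, count_by_formula(n), count_by_search(n))
```

The two counts agree for every n listed; for instance both give 4 for n = 8 and for n = 65.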

Unlike the case of real numbers, where it is not hard to solve degree 2
equations, solving quadratic equations in Zn is a subject with considerable
depth. Indeed, if p and q are odd primes, there is a surprising relationship
between whether $x^2 \equiv p \pmod{q}$ is solvable and whether $x^2 \equiv q \pmod{p}$
is solvable. Known as Quadratic Reciprocity, this is a cornerstone result in
Elementary Number Theory; see Section 3.6.

It is worth pointing out that our example of solving $x^2 + 1 \equiv 0 \pmod{65}$
illustrates another interesting phenomenon. We see
$(x-8)(x+8) \equiv x^2 + 1 \equiv (x-18)(x+18) \pmod{65}.$
We have therefore obtained two different factorizations of $x^2 + 1$ into “primes”
(i.e. irreducible polynomials). This shows the failure of unique factorization
for polynomials with coefficients in $\mathbb{Z}_{65}$.

Exercises

1. Find all solutions of 1713x ≡ 871 (mod 2000).


2. Solve 64x ≡ 84 (mod 66) completely.
3. Solve completely the equation 3x + 7y ≡ 11 (mod 95).
4. Solve $x^2 \equiv 8x \pmod{437}$.
5. What are the cube roots of unity mod 91? In other words, solve the
equation $x^3 - 1 \equiv 0 \pmod{91}$.
6. Solve $x^3 + x^2 + x + 1 \equiv 0 \pmod{91}$.
7. Solve the congruence system
2x + 5y ≡ 7 (mod 82)
7x + 13y ≡ 10 (mod 82).

2.7. Fermat’s Little Theorem


The theorem to be proven in this section does not deserve the title ‘little’.
Indeed, it is a very important fact. However, Fermat’s most famous non-
theorem has so overshadowed all his other work that this lovely result is
‘belittled’.

2.7.1 Fermat’s Little Theorem. Let p be a prime, and let a be an


integer which is not a multiple of p. Then
$a^{p-1} \equiv 1 \pmod{p}.$
Thus, $n^p \equiv n \pmod{p}$ for every integer n.

Proof. Consider the function f mapping Zp into itself used in the proof
of Lemma 2.3.3:
f ([x]) = [ax].
Since gcd(a, p) = 1, this function is one-to-one and onto. We have f ([0]) =
[0]. So f gives a bijection of the non-zero elements of Zp . In other words,
{[a], [2a], . . . , [(p − 1)a]} is just the set {[1], [2], . . . , [p − 1]} possibly in some
other order. Hence
$a(2a)(3a) \cdots ((p-1)a) \equiv 1(2)(3) \cdots (p-1) \pmod{p}.$
Simplifying both sides, we obtain
$(p-1)!\, a^{p-1} \equiv (p-1)! \pmod{p}.$
The element [(p − 1)!] is not zero (i.e. p does not divide (p − 1)!), and since
$\mathbb{Z}_p$ is a field, we can cancel out the (p − 1)! on each side of the equation.
(Alternatively, use Theorem 2.6.1 to justify the cancellation.) Thus,
$a^{p-1} \equiv 1 \pmod{p}.$

This can be reformulated as a result about Zp .

2.7.2. Corollary. Let p be a prime. If [a] is a non-zero element of Zp ,


then $[a]^{p-1} = [1]$. For all elements [n], one has $[n]^p = [n]$.

2.7.3. Corollary. Let p be a prime. If [a] is a non-zero element of $\mathbb{Z}_p$,
then
$[a]^{-1} = [a]^{p-2}.$
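
Corollary 2.7.3 gives a practical way to invert elements of $\mathbb{Z}_p$. A small sketch, using Python's built-in three-argument `pow` for modular exponentiation; the function name `inverse_mod_prime` and the choice p = 61 are illustrative.

```python
def inverse_mod_prime(a, p):
    """Inverse of [a] in Z_p computed as [a]^(p-2).
    Assumes that p is prime and that p does not divide a."""
    if a % p == 0:
        raise ZeroDivisionError("[0] has no inverse")
    return pow(a, p - 2, p)

p = 61
# Fermat's Little Theorem: a^(p-1) ≡ 1 for every non-zero residue a
print(all(pow(a, p - 1, p) == 1 for a in range(1, p)))                  # True
# Each computed inverse really is an inverse
print(all(a * inverse_mod_prime(a, p) % p == 1 for a in range(1, p)))   # True
print(inverse_mod_prime(11, p))                                         # 50
```

The last line reflects 11 · 50 = 550 = 9 · 61 + 1.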

This theorem has many uses.

2.7.4. Example. One immediate use is in simplifying congruence equa-


tions. Very high powers can be replaced by lower ones. Consider the equa-
tion

$x^{600} + 29x^{543} - 19x^{482} + 199x^{301} + 82x^{182} - 75x^{121} + 34x^{63} - 60 \equiv 0 \pmod{61}.$

It is immediately clear that x ≡ 0 (mod 61) is not a solution. For every
other x, we have $x^{60} \equiv 1 \pmod{61}$. So the equation reduces to

$1 + 29x^3 - 19x^2 + 199x + 82x^2 - 75x + 34x^3 - 60 \equiv 0 \pmod{61}.$

This reduces to
$2x^3 + 2x^2 + 2x + 2 \equiv 0 \pmod{61}.$
After cancelling the 2 and pulling out the factor x + 1, this becomes

$(x+1)(x^2+1) \equiv 0 \pmod{61}.$

Trial and error finds the solutions x = ±11 of $x^2 + 1 \equiv 0 \pmod{61}$. This means the cubic factors as

$x^3 + x^2 + x + 1 \equiv (x+1)(x-11)(x+11) \pmod{61}.$

Since 61 is a prime, this is zero only if one of the three factors is zero. So
the complete solution is x ≡ 11, 50 or 60 (mod 61).
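
The reduction can be double-checked by evaluating the original high-degree polynomial at every residue modulo 61. This is only a verification sketch; the names `TERMS`, `evaluate` and `reduced` are hypothetical.

```python
P = 61
# (coefficient, exponent) pairs of the original polynomial
TERMS = [(1, 600), (29, 543), (-19, 482), (199, 301),
         (82, 182), (-75, 121), (34, 63), (-60, 0)]

def evaluate(x):
    """Value of the original polynomial at x, reduced modulo 61."""
    return sum(c * pow(x, e, P) for c, e in TERMS) % P

def reduced(x):
    """Value of the reduced form (x + 1)(x^2 + 1) modulo 61."""
    return (x + 1) * (x * x + 1) % P

full_roots = [x for x in range(P) if evaluate(x) == 0]
print(full_roots)                                            # [11, 50, 60]
print(full_roots == [x for x in range(P) if reduced(x) == 0])   # True
```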

The number (p − 1)! comes up in the proof of Fermat’s Little Theorem.


It is an interesting fact that (p − 1)! (mod p) can be computed.

2.7.5 Wilson’s Theorem. If p is a prime, (p − 1)! ≡ −1 (mod p).

Proof. The result is trivial for p = 2. So without loss of generality, p is an


odd prime. The idea is to evaluate the product [1][2] · · · [p − 1] by pairing off
each element [a] with its inverse $[a]^{-1}$. There is a slight problem because [a]
might be its own inverse. This happens only if [a] is a root of $x^2 = [1]$, which
factors as (x − [1])(x + [1]) = [0]. Since $\mathbb{Z}_p$ is a field, the only solutions are
[±1].
Hence the non-zero elements pair off into (p − 3)/2 pairs of inverses
$\{[a], [a]^{-1}\}$ and two singletons [1] and [−1]. Multiplying together all the
non-zero elements of $\mathbb{Z}_p$ results in a product of (p − 3)/2 ones and [1][−1] = [−1].
That is, (p − 1)! ≡ −1 (mod p). 
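
Both the statement and the pairing argument are easy to observe for small primes. A brief sketch with illustrative helper names:

```python
from math import factorial

def wilson_residue(p):
    """(p - 1)! reduced modulo p."""
    return factorial(p - 1) % p

def self_inverse_elements(p):
    """Non-zero residues a with a * a ≡ 1 (mod p)."""
    return [a for a in range(1, p) if a * a % p == 1]

for p in [3, 5, 7, 11, 13]:
    # The residue p - 1 represents -1; only 1 and p - 1 are self-inverse.
    print(p, wilson_residue(p), self_inverse_elements(p))
```

For p = 13, for example, this prints the residue 12 ≡ −1 and the self-inverse elements [1, 12].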

Exercises

1. Compute $2^{17^{1513}} \pmod{13}$.
2. Find all solutions of
$35x^{360} + 99x^{290} + 51x^{220} - 47x^{217} + 23x^{148} + 39x^{147} + 24x^{144} + 34x^{75} - 23x^{74} + 120x + 16 \equiv 0 \pmod{73}.$
3. Solve $x^{39} + x^{25} + x^{14} + 1 \equiv 0 \pmod{91}$.
4. Suppose that p is a prime of the form p = 4n + 1. Prove that ±(2n)! are
roots of the equation $x^2 + 1 \equiv 0 \pmod{p}$.
5. Let a > 1 be any positive integer, and let p and q be primes. Show that
if q divides $a^p - 1$, then either q divides a − 1 or q ≡ 1 (mod p).
6. Use the previous exercise to test whether $2^{13} - 1$ and $2^{37} - 1$ are prime.
This cuts down significantly on the number of prime divisors that need
to be tested.
7. Suppose that n is the product of k distinct primes $p_1, \dots, p_k$. Show that
$\sum_{i=1}^{k} \left(\frac{n}{p_i}\right)^{p_i - 1} \equiv 1 \pmod{n}.$

8. The Fermat numbers have the form $F_j = 2^{2^j} + 1$. The first few, $F_0 = 3$,
$F_1 = 5$, $F_2 = 17$, $F_3 = 257$, and $F_4 = 65537$ are prime. However,
$F_5 = 641(6700417)$, and p = 6700417 is prime. Let
a = 2935363331541925531.
You may assume (correctly) that
$a \equiv 1 \pmod{F_0 F_1 F_2 F_3 F_4\, p}$ and $a \equiv -1 \pmod{641}$.
Show that $2^k a + 1$ is never prime for k ≥ 1.
9. Define a function f on {(n, m) : n, m ∈ N, n ≥ 2} as follows:
$k = k(n, m) := (n-1)! + 1 - mn$
$f(n, m) := \frac{n-2}{2}\bigl[\,|k^2 - 1| - (k^2 - 1)\bigr] + 2.$
Compute the range of f.

2.8. Euler’s Theorem


In this section, we generalize Fermat’s Little Theorem from primes to arbi-
trary integers. The problem is to figure out what the right generalization
is. In order for $a^d \equiv 1 \pmod{n}$, it is necessary that ax ≡ 1 (mod n) have
a solution. By Theorem 2.6.1, this means that gcd(a, n) = 1. It turns out
that this is also sufficient for some power of a to be congruent to 1 modulo
n. In terms of the ring $\mathbb{Z}_n$, this is just the condition that [a] has an inverse
because $a(a^{d-1}) = 1$.

2.8.1. Definition. The Euler totient or phi function is the cardinality


ϕ(n) of $\mathbb{Z}_n^*$. That is, ϕ(n) is the cardinality of
{a : 1 ≤ a ≤ n, gcd(a, n) = 1}.

For example, ϕ(12) = |{1, 5, 7, 11}| = 4.

2.8.2. Example. If p is prime, it is clear that ϕ(p) = p − 1. More


generally, if $n = p^d$, then gcd(a, n) ≠ 1 if and only if p | a. The multiples of
p between 1 and n are given by $p, 2p, 3p, \dots, p^d$, i.e. $p \cdot 1, p \cdot 2, \dots, p \cdot p^{d-1}$.
We see there are $p^{d-1}$ such numbers, so $\varphi(p^d) = p^d - p^{d-1} = p^{d-1}(p-1)$.
We will obtain a formula for an arbitrary ϕ(n) in the next section.

You should notice that the proof of the following theorem is exactly the
same as the proof of Fermat’s Little Theorem.

2.8.3 Euler’s Theorem. If gcd(a, n) = 1, then $a^{\varphi(n)} \equiv 1 \pmod{n}$.

Proof. Fix an integer a such that gcd(a, n) = 1. Consider the function


on $\mathbb{Z}_n$ given by f([x]) = [ax]. By Lemma 2.3.3, f is one-to-one and onto.
As we have noted, if [a] and [x] are units, then so is [ax]. So, f maps $\mathbb{Z}_n^*$
onto itself. Multiplying all the units together yields the equation
$\prod_{[x] \in \mathbb{Z}_n^*} [x] = \prod_{[x] \in \mathbb{Z}_n^*} [ax] = [a]^{\varphi(n)} \prod_{[x] \in \mathbb{Z}_n^*} [x].$

Since $\prod_{[x] \in \mathbb{Z}_n^*} [x]$ is a unit, it can be cancelled off leaving

$[a]^{\varphi(n)} = [1].$
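
The two ingredients of this proof, that multiplication by a unit permutes the units and that $a^{\varphi(n)} \equiv 1$, can be observed directly. A minimal sketch with illustrative names, computing ϕ(n) by counting as in Definition 2.8.1:

```python
from math import gcd

def units(n):
    """Representatives in 1..n of the units of Z_n."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]

def phi(n):
    """Euler's phi function, computed by counting units."""
    return len(units(n))

n = 12
print(units(n), phi(n))                        # [1, 5, 7, 11] 4

a = 5
# Multiplication by the unit a permutes the units of Z_n
print(sorted(a * x % n for x in units(n)) == units(n))    # True
# Euler's Theorem for every unit of Z_n
print(all(pow(u, phi(n), n) == 1 for u in units(n)))      # True
```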

Exercises

1. If gcd(a, 561) = 1, show that $a^{80} \equiv 1 \pmod{561}$. Calculate ϕ(561).
2. Let $n = p_1 p_2 p_3$ be the product of three distinct primes. Let
$d = \operatorname{lcm}\{p_1 - 1, p_2 - 1, p_3 - 1\}.$
Prove that if gcd(a, n) = 1, then $a^d \equiv 1 \pmod{n}$. Generalize.

3. (a) Suppose that n is the product of k distinct primes. Use the Chinese
remainder theorem to show that if m ≡ 1 (mod ϕ(n)), then $a^m \equiv a$
(mod n) for all integers a.
(b) Show by example that this is false for n = 49.
4. Before reading the next section, compute a few examples such as ϕ(30),
ϕ(72), ϕ(225) in order to conjecture a formula for ϕ(n).

5. Compute $\prod_{[x] \in \mathbb{Z}_n^*} [x]$.
Hint: Use the information about square roots of 1 in $\mathbb{Z}_n$ to show that
if n is odd with k distinct prime factors, then $\prod_{[x] \in \mathbb{Z}_n^*} [x] = [-1]^{2^{k-1}}$. Then
find the general formula.

2.9. More on Euler’s Phi Function


First we obtain a formula for ϕ(n). The key tool is the Chinese Remainder
Theorem.

2.9.1. Lemma. If gcd(n, m) = 1, then ϕ(nm) = ϕ(n)ϕ(m).

Proof. It is clear that gcd(x, nm) = 1 if and only if gcd(x, n) = 1 and


gcd(x, m) = 1. Let
$S_n = \{a : 1 \le a \le n,\ \gcd(a, n) = 1\}$ and $S_m = \{b : 1 \le b \le m,\ \gcd(b, m) = 1\}$.
For each $a \in S_n$ and $b \in S_m$, consider the system
x ≡ a (mod n)
x ≡ b (mod m)

By the Chinese Remainder Theorem, this has a unique solution (mod nm).
Thus for each choice of $a \in S_n$ and $b \in S_m$, we obtain one element in
$S_{nm}$. Conversely, if $x \in S_{nm}$, then a ≡ x (mod n) belongs to $S_n$ and b ≡ x
(mod m) belongs to $S_m$. Thus,
$\varphi(nm) = |S_{nm}| = |S_n||S_m| = \varphi(n)\varphi(m).$


2.9.2. Theorem. If $n = p_1^{d_1} \cdots p_k^{d_k}$ where the $p_i$ are distinct primes, then
$\varphi(n) = n\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_k}\right).$

Proof. We prove the result by induction on k. When k = 1, the number


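n is of the form $n = p^d$ where p is prime. Then Example 2.8.2 shows
$\varphi(n) = p^{d-1}(p-1) = n\left(1 - \frac{1}{p}\right).$
For k = 2, Lemma 2.9.1 applies directly to give
$\varphi(p^d q^e) = \varphi(p^d)\varphi(q^e)$
$= p^d\left(1 - \frac{1}{p}\right) q^e\left(1 - \frac{1}{q}\right)$
$= n\left(1 - \frac{1}{p}\right)\left(1 - \frac{1}{q}\right).$

Suppose that the result is true for j < k, and consider $n = p_1^{d_1} \cdots p_k^{d_k}$. Let
$m = p_1^{d_1} \cdots p_{k-1}^{d_{k-1}}$. By hypothesis, $\varphi(m) = m\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_{k-1}}\right)$. Since
$n = m p_k^{d_k}$ and $\gcd(m, p_k^{d_k}) = 1$, the lemma applies to show that
$\varphi(n) = \varphi(m)\varphi(p_k^{d_k})$
$= m\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_{k-1}}\right) p_k^{d_k}\left(1 - \frac{1}{p_k}\right)$
$= n\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_k}\right).$
Therefore the theorem follows by induction.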
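
The product formula can be compared against the counting definition of ϕ. The sketch below uses hypothetical helper names and simple trial division, which is adequate only for small n.

```python
from math import gcd

def prime_factors(n):
    """Distinct prime factors of n, found by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                          # whatever remains is a prime factor
        factors.append(n)
    return factors

def phi_formula(n):
    """phi(n) = n * (1 - 1/p_1) * ... * (1 - 1/p_k), in exact integer arithmetic."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)   # p divides result at this point
    return result

def phi_count(n):
    """phi(n) by counting integers in 1..n relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

print(phi_formula(12), phi_count(12))      # 4 4
print(phi_formula(100), phi_count(100))    # 40 40
```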
The following result is a very useful property of the Euler phi function.

2.9.3. Theorem. $\sum_{d \mid n} \varphi(d) = n.$

Proof. Let Sd = {k : 1 ≤ k ≤ n, gcd(k, n) = d} for divisors d of n. Since


the only possibilities for gcd(k, n) are divisors of n, it is clear that this
provides a partition of {1, . . . , n} into disjoint sets. Notice that if k ∈ Sd ,
then gcd(k/d, n/d) = 1 and 1 ≤ k/d ≤ n/d. Conversely, if gcd(j, n/d) = 1
and 1 ≤ j ≤ n/d, then k = jd belongs to Sd . Hence there is a bijection
between $S_d$ and the units of $\mathbb{Z}_{n/d}$. So $|S_d| = \varphi(n/d)$. Therefore
$n = \sum_{d \mid n} |S_d| = \sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right).$
Since n/d runs over all of the divisors of n when d does, the desired formula
follows.
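
As an illustration of this partition (a worked example added here, not taken from the text), take n = 12. The sets $S_d$ of the proof are:

```latex
\begin{align*}
S_1    &= \{1, 5, 7, 11\}, & |S_1|    &= \varphi(12) = 4,\\
S_2    &= \{2, 10\},       & |S_2|    &= \varphi(6)  = 2,\\
S_3    &= \{3, 9\},        & |S_3|    &= \varphi(4)  = 2,\\
S_4    &= \{4, 8\},        & |S_4|    &= \varphi(3)  = 2,\\
S_6    &= \{6\},           & |S_6|    &= \varphi(2)  = 1,\\
S_{12} &= \{12\},          & |S_{12}| &= \varphi(1)  = 1,
\end{align*}
\[
\sum_{d \mid 12} \varphi(d) = 1 + 1 + 2 + 2 + 2 + 4 = 12 .
\]
```

Every integer from 1 to 12 appears in exactly one of the six sets, which is the partition used in the proof.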

Exercises

1. (a) Prove that if p is prime and n is divisible by p, then ϕ(pn) = pϕ(n).


(b) Show that in general if m divides n, the quantities ϕ(nm) and mϕ(n)
need not be equal.
2. Prove that for every positive integer k, there are only finitely many n
for which ϕ(n) = k.
3. Find all n with ϕ(n) = 12.

4. (a) Prove there are infinitely many positive integers n with ϕ(n) = n/2.
(b) Prove that there are also infinitely many positive integers n with
ϕ(n) = n/3.
5. Suppose that gcd(n, m) = 1, and d|nm. Show that there is a unique
factorization d = ab so that a|n and b|m.
6. Verify Theorem 2.9.3 directly for $n = p^k$. Then use Exercise 5 to prove
it for products of distinct prime powers.

2.10. Primitive Roots


In this section, we show that for every prime p, one may always find an
integer a such that $\{1, a, a^2, \dots, a^{p-1}\}$ is a permutation of {1, 2, . . . , p − 1}
mod p. This is often useful when one wishes to study problems that are
multiplicative in nature, rather than additive.

2.10.1. Definition. If a is an element of $\mathbb{Z}_n^*$, its order is the smallest
positive integer $d = \operatorname{ord}_n(a)$ such that $a^d \equiv 1 \pmod{n}$. Furthermore, say
that a is a primitive root (mod n) if the set of powers
$\{a^k \pmod{n} : 1 \le k \le d\}$
coincides with the set of all of $\mathbb{Z}_n^*$.

2.10.2. Proposition. If $a^b \equiv a^c \equiv 1 \pmod{n}$, then d = gcd(b, c) satisfies
$a^d \equiv 1 \pmod{n}$ also. Hence $a^b \equiv 1 \pmod{n}$ if and only if $\operatorname{ord}_n(a) \mid b$.

Proof. By the Euclidean algorithm, there are integers s and t so that d =
bs + ct. Hence
$a^d \equiv \left(a^b\right)^s \left(a^c\right)^t \equiv 1 \pmod{n}.$
In particular, $e = \gcd(b, \operatorname{ord}_n(a))$ satisfies $a^e \equiv 1 \pmod{n}$. Since $\operatorname{ord}_n(a)$
is the smallest such positive integer, and $e \mid \operatorname{ord}_n(a)$, we conclude that $e = \operatorname{ord}_n(a)$.
Hence $\operatorname{ord}_n(a) \mid b$.

2.10.3. Corollary. If gcd(a, n) = 1, then $\operatorname{ord}_n(a) \mid \varphi(n)$.

Proof. By Euler’s theorem, $a^{\varphi(n)} \equiv 1 \pmod{n}$. Hence by Proposition
2.10.2, $\operatorname{ord}_n(a) \mid \varphi(n)$.

The set of invertible elements $\mathbb{Z}_n^*$ of $\mathbb{Z}_n$ consists of the (equivalence classes
of) elements relatively prime to n, and so has cardinality ϕ(n). One sees that
the powers of a belong to exactly $\operatorname{ord}_n(a)$ different classes (mod n). For if
$a^k \equiv a^l \pmod{n}$, with k < l, then $a^{l-k} \equiv 1 \pmod{n}$. Thus $\operatorname{ord}_n(a) \mid (l-k)$,
and so $l > \operatorname{ord}_n(a)$. Conversely, if $\operatorname{ord}_n(a) \mid (l-k)$, it follows that $a^k \equiv a^l$
(mod n). So the distinct powers of a are precisely
$\{a^k \pmod{n} : 1 \le k \le \operatorname{ord}_n(a)\}.$
In particular, a is a primitive root of $\mathbb{Z}_n$ exactly when $\operatorname{ord}_n(a) = \varphi(n)$. So
we obtain:

2.10.4. Proposition. If gcd(a, n) = 1 and $\operatorname{ord}_n(a) = n - 1$, then n is


prime.

When n is composite, there is frequently no primitive root. For example,


modulo 15, the elements {2, 7, 8, 13} have order 4, {4, 11, 14} have order 2,
and 1 has order 1. Since $\mathbb{Z}_{15}^*$ has 8 elements, there is no primitive root.
However, for a prime p, it will be shown that a primitive root always exists.
For example, modulo 17, the elements {3, 5, 6, 7, 10, 11, 12, 14} are all prim-
itive roots. The proof is based on a counting argument, and properties of
the Euler phi function.
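
The orders quoted for the moduli 15 and 17 can be reproduced by direct computation. A small sketch with illustrative names; `order` assumes n ≥ 2 and gcd(a, n) = 1.

```python
from math import gcd

def order(a, n):
    """Smallest d >= 1 with a^d ≡ 1 (mod n); assumes n >= 2 and gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be relatively prime to n")
    d, power = 1, a % n
    while power != 1:
        power = power * a % n
        d += 1
    return d

def group_by_order(n):
    """Map each occurring order to the list of units of Z_n having that order."""
    groups = {}
    for a in range(1, n):
        if gcd(a, n) == 1:
            groups.setdefault(order(a, n), []).append(a)
    return groups

print(group_by_order(15))       # {1: [1], 4: [2, 7, 8, 13], 2: [4, 11, 14]}
print(group_by_order(17)[16])   # [3, 5, 6, 7, 10, 11, 12, 14]
```

No element modulo 15 reaches order 8, while modulo 17 exactly eight elements have order 16.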

2.10.5. Lemma. Let p be a prime. For each divisor d of p − 1, let f(d)
denote the number of elements of $\mathbb{Z}_p^*$ of order d. Then
$\sum_{e \mid d} f(e) = d$
for every divisor d of p − 1.

Proof. By Fermat’s Theorem, every element $a \in \mathbb{Z}_p^*$ satisfies $a^{p-1} = 1$. In
other words, the congruence equation $x^{p-1} - 1 \equiv 0 \pmod{p}$ has exactly
p − 1 solutions modulo p. For each divisor d of p − 1, one has that $\operatorname{ord}_p(a) \mid d$
if and only if $a^d \equiv 1 \pmod{p}$ (i.e. exactly when a is a root of $x^d - 1 \equiv 0 \pmod{p}$).
Thus the number of roots of $x^d - 1$ is $\sum_{e \mid d} f(e)$. Also, one can factor
$x^{p-1} - 1 \equiv (x^d - 1)\,p_d(x) \pmod{p}$
where
$p_d(x) = 1 + x^d + x^{2d} + \cdots + x^{p-1-d} = \sum_{0 \le k < (p-1)/d} x^{kd}.$
By Corollary 2.3.8, $p_d(x) \equiv 0 \pmod{p}$ has at most p − 1 − d distinct solutions
modulo p, and $x^d - 1 \equiv 0 \pmod{p}$ has at most d solutions. But together,
they have exactly p − 1 distinct solutions. So both equations must have their
full complement of solutions. In particular, $x^d \equiv 1 \pmod{p}$ has exactly d
solutions modulo p. Therefore $\sum_{e \mid d} f(e) = d$.

Notice that by Theorem 2.9.3, the Euler phi function satisfies exactly
the same set of equations as the function f of the lemma. That is the key
to this theorem.

2.10.6. Theorem. The function f of Lemma 2.10.5 coincides with the
Euler phi function on the divisors of p − 1. In particular, the field $\mathbb{Z}_p$ for
p prime always has ϕ(p − 1) primitive roots. Therefore there is an element
$a \in \mathbb{Z}_p^*$ so that the set of powers $\{a^k : 1 \le k \le p-1\}$ coincides with the set
{1, 2, . . . , p − 1} modulo p.

Proof. We prove the result by induction on the size of the divisor d of p − 1.


For d = 1, there is, of course, exactly 1 solution of x ≡ 1 (mod p). Thus

f (1) = 1 = ϕ(1).

Suppose that f(e) = ϕ(e) for all divisors e of p − 1 which are less than d.
In particular, this is true for all proper divisors of d. Hence by the previous lemma
and Theorem 2.9.3,
$f(d) = d - \sum_{e \mid d,\ e < d} f(e) = d - \sum_{e \mid d,\ e < d} \varphi(e) = \varphi(d).$

Therefore the number of primitive roots is ϕ(p − 1), which is non-zero. 

There are many interesting unsolved questions concerning primitive


roots. For example, in 1927, Artin conjectured that if a ∈ Z is not a perfect
square and not −1, then there exist infinitely many primes p for which a is
a primitive root in Zp . In particular, Artin’s conjecture would imply that 2
is a primitive root in Zp for infinitely many primes p. Currently, there is no
value of a for which Artin’s conjecture is known. In 1967, Hooley [18] did
however give a conditional proof of Artin’s conjecture assuming the gener-
alized Riemann hypothesis. Unconditionally, Heath-Brown [17] proved in
1986 that at least one of 2, 3, or 5 must be a primitive root in Zp for infinitely
many primes p.
Now let us return to the problem of proving that a number p is defi-
nitely prime. By the previous discussion, it is sufficient to find some a with
ordp (a) = p − 1. However, it defeats the purpose if we must compute all
p − 1 powers. This is not necessary if p − 1 can be factored. A method for
factoring is described in the next section. It may be the case that p − 1
has a lot of small factors. This will make factoring it substantially easier.

The idea is this: factor $p - 1 = \prod q_i^{d_i}$, then verify that $a^{p-1} \equiv 1 \pmod{p}$
and compute $a^{(p-1)/q_i} \pmod{p}$. If any of these is 1, then a is not a primitive
root. But if they are all different from 1, then all powers of a up to
p − 1 are different, and a is a primitive root. Moreover, this shows that
$\operatorname{ord}_p(a) = p - 1$, so p is definitely prime. To see this, suppose that $a^k \equiv a^\ell$
(mod p) with 1 ≤ k < ℓ < p. Then if m = ℓ − k, $a^m \equiv 1 \pmod{p}$. We also
know that $a^{p-1} \equiv 1 \pmod{p}$. Let d = gcd(m, p − 1). By Proposition 2.10.2,
$a^d \equiv 1 \pmod{p}$. Since d ≤ m < p − 1, d is a proper divisor of p − 1. Thus d divides
$(p-1)/q_i$ for some i, and so $a^{(p-1)/q_i} \equiv 1 \pmod{p}$.

2.10.7. Example. Consider the example p = 113. Factor $p - 1 = 112 = 2^4 \cdot 7$.
By hand, compute mod 113

$2^7 \equiv 128 \equiv 15$
$2^{14} \equiv 225 \equiv -1$
$2^{28} \equiv 1$

So 2 is not a primitive root. Try 3,

$3^7 \equiv 2187 \equiv 40$
$3^{14} \equiv 1600 \equiv 18$
$3^{28} \equiv 324 \equiv -15$
$3^{56} \equiv 225 \equiv -1$
$3^{112} \equiv 1$

So 3 is looking good so far.

$3^8 \equiv 120 \equiv 7$
$3^{16} \equiv 49$

Thus we see that $3^{112} \equiv 1 \pmod{113}$, and $3^{56} \not\equiv 1 \pmod{113}$, and $3^{16} \not\equiv 1$
(mod 113). So 3 is a primitive root, and 113 is prime.
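
The test described before this example takes only a few lines with modular exponentiation. In this sketch the function name is hypothetical, and the list of distinct prime factors of p − 1 is supplied by the caller rather than computed.

```python
def is_primitive_root(a, p, prime_factors_of_p_minus_1):
    """Apply the test from the text: a^(p-1) ≡ 1 (mod p), and
    a^((p-1)/q) ≢ 1 (mod p) for every prime q dividing p - 1.
    The caller supplies the distinct prime factors of p - 1."""
    if pow(a, p - 1, p) != 1:
        return False
    return all(pow(a, (p - 1) // q, p) != 1
               for q in prime_factors_of_p_minus_1)

# p - 1 = 112 = 2^4 * 7, so the relevant exponents are 56 and 16
print(pow(3, 56, 113), pow(3, 16, 113))    # 112 49
print(is_primitive_root(3, 113, [2, 7]))   # True
print(is_primitive_root(2, 113, [2, 7]))   # False
```

Here 112 represents −1 modulo 113, in agreement with the hand computation above. The element 2 fails because its order is 28, so $2^{56} \equiv 1 \pmod{113}$.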

Of course, this method is not interesting for such small numbers. Try
some of the following exercises with a symbolic computation program.

Exercises

1. Show that 19 is a primitive root for p = 191.


2. Show that 2 is a primitive root for p = 2549.
3. Let $p$ be prime and let $a \in \mathbb{Z}$ be a primitive root mod $p$. Prove that $a$ is
a primitive root mod $p^2$ if and only if $a^{p-1} \not\equiv 1 \pmod{p^2}$.
4. Let $p \ne q$ be odd primes. Prove that there are no primitive roots mod
$pq$.
5. Let p, q, r be pairwise distinct primes which are not necessarily odd.
Prove that there are no primitive roots mod pqr.
6. Find a primitive element of $\mathbb{Z}_{27943}^*$. Give a short list of congruences that
prove that it is a primitive root, and hence that 27943 is prime. You
can use computer software.
7. Find a primitive root for p = 1423554023 using computer software. Give
a short list of congruences that prove that it is a primitive root, and
hence that p is prime.

Notes on Chapter 2
Linear Diophantine equations were discussed by the Greek mathematician
Diophantus in the 3rd century CE, though he did not have a complete
solution. The Hindu school in India studied these equations in the 6th and
7th century CE, and Brahmagupta had a method for finding a solution.
It was in the 16th and 17th centuries that the Europeans wrote about it.
Euler gave a complete solution in the modern style in 1734. It was Gauss
who introduced the modern notation of congruence modulo n.
The abstract notion of a ring was given by Fraenkel in 1914 and extended
by Sono in 1917. However many concrete examples such as Zn were well
known much earlier. The first non-commutative example was the ring of
quaternions due to Hamilton in 1843. Cayley considered the space of n × n
matrices as a ring in 1855. See [19] for more on this history.
The Chinese remainder problem, as the name suggests, first arose in
Chinese writings from the first century CE. The Greek and the Indian schools
also studied this problem. A complete solution was provided by Yih-hing
in 717 CE. The Arab school has writings on it from about 1000 CE. The
Italians wrote about partial solutions in the late 12th century. A German
manuscript from the 15th century produced the same solution as Yih-hing.
The modern solution in complete generality was given by Euler, and also
Gauss, in the mid-18th century.
Fermat’s little theorem was stated by Fermat in 1640. Euler gave a proof
of it in 1736, and the generalization to Euler’s theorem in 1760.
Much information about this history can be found in the volume by
Dickson [9, Vol.II]. Kleiner [20] is another source worth reading. Cooke [8]
contains a lot of information about mathematics before the modern era.
See Hardy and Wright [15] for all of this material and many extensions,
plus many historical notes. Stark [37] is also an excellent source for this
material.
Chapter 3

Diophantine Equations and Quadratic Number Domains

Diophantine equations refer to equations or systems of equations in which
both the coefficients and the unknowns are integers. Generally, there are
more unknowns than equations. But since we are interested in integer so-
lutions, it is often difficult to decide if there are any solutions at all. The
most famous Diophantine equation is Fermat’s equation

$$x^n + y^n = z^n$$

for n ≥ 3. Fermat wrote in the pages of a book (circa 1637) that he had a
truly marvelous proof that there are no solutions, but it was too long to fit
in the margin. However, there is no way to know for certain if he really had
such a proof. Fermat never published anything in mathematics, nor did he
often communicate his methods to others. It is revealing, however, that he
wrote to others that he had a proof for the case n = 4, but never claimed
to have a general proof in his correspondence.
Euler solved the case n = 3 in 1770. Legendre and Dirichlet indepen-
dently solved the case n = 5 around 1825. Sophie Germain was a self-taught
French mathematician in the late 18th century, a time when women were not
welcomed into academic circles. She corresponded with Lagrange, Legendre
and Gauss under a pseudonym. She did some important work on Fermat’s
problem which was unpublished, but was mentioned by Legendre. Some of
her results were still being reproved by others in the 20th century.
The early development of abstract algebra, especially rings and fields,
was in part motivated by an attempt to solve Fermat’s problem. Several
‘proofs’ were found to be incorrect because they falsely assumed unique
factorization in certain number domains. Kummer was the first to provide
a solution for infinitely many primes in 1847, based on an analysis of the
failure of unique factorization. His proof works for regular primes, which
includes all primes less than 100 except 37, 59 and 67.

Exciting news reached the mathematical community in June 1993 when
Andrew Wiles announced the final dramatic step to the solution of this
350-year-old problem at a conference in Cambridge. The statement of his actual
results does not immediately look like it applies to Fermat's question, as the results
refer to some advanced notions about elliptic curves. Indeed, his results are
much more far reaching than a single equation such as Fermat’s. It turned
out that there was a gap in part of his proof. He and Richard Taylor
worked on the gap and eventually completed the argument. In particular,
these results combine with known work to finally resolve the most famous
mathematical conundrum of our time.
In this chapter, we will look at a few special cases of Diophantine equa-
tions, and will see a variety of techniques for solving them. We also will
take an excursion into some other number systems to see that the theorems
we proved in the last chapter are indeed special. The quadratic number
domains have a nice theory which imitates, yet varies from, the integers.
Several of these domains have applications to the number theory of the inte-
gers themselves. We finish the chapter with a proof of Gauss’s famous Law
of Quadratic Reciprocity, which allows one to calculate whether a number
a is a square modulo a prime p.

3.1. Pythagorean Triples


In this section, we will study the well known problem of determining all of
the integer solutions of the Pythagorean equation

$$x^2 + y^2 = z^2.$$

Of course, if $(x, y, z)$ is a solution, then $(ax, ay, az)$ is also a solution. So
it is natural to insist that $\gcd(x, y, z) = 1$. Moreover, any integer which
divides any two of $x, y, z$ divides the third as well. So, it suffices to say
$\gcd(x, y) = 1$.
We will give two characterizations of such (x, y, z). The first uses an
algebraic approach, while the second uses a geometric method.

Algebraic approach. The first observation is obtained by looking at
squares of even and odd numbers. These squares are congruent to 0
and 1 modulo 4 respectively. Thus the sum of two odd squares is congruent
to 2 (mod 4), and no square has this form. Since we have ruled out the case
of $x$ and $y$ both being even by assuming that they are relatively prime, it
follows that one, say $x$, is even, and the other, $y$, is odd. Hence, $z$ is also
odd.
Now consider the equation

$$x^2 = z^2 - y^2 = (z + y)(z - y).$$

Since $x$, $z + y$, and $z - y$ are all even, there are positive integers
$a, b, c$ so that

$$x = 2a, \quad z + y = 2b \quad\text{and}\quad z - y = 2c.$$

Our equation becomes

$$4a^2 = 4bc \quad\text{or}\quad a^2 = bc.$$

Now $\gcd(b, c)$ divides $\gcd(b + c, b - c) = \gcd(z, y) = 1$. Thus $b$ and $c$ are
relatively prime. But $bc$ is a perfect square, meaning each prime factor
occurs an even number of times. As $b$ and $c$ have no common factors, they
must both be squares. Let $u$ and $v$ be positive integers such that $b = u^2$
and $c = v^2$, and thus $a = uv$. Substituting back in yields

$$x = 2uv, \quad y = u^2 - v^2, \quad z = u^2 + v^2.$$

Furthermore, $\gcd(u, v) = \gcd(b, c) = 1$. Since $y$ is odd, exactly one of $u$
and $v$ is odd.
On the other hand, if $u > v$ are relatively prime, one even and one odd,
then $x = 2uv$, $y = u^2 - v^2$ and $z = u^2 + v^2$ are relatively prime, and satisfy

$$x^2 + y^2 = 4u^2v^2 + u^4 - 2u^2v^2 + v^4 = u^4 + 2u^2v^2 + v^4 = z^2.$$

This solves the problem completely.


For the general solution of Pythagorean triples, one must put the common
factors back in. So the most general solution is given by

$$x = 2kuv, \quad y = k(u^2 - v^2), \quad z = k(u^2 + v^2)$$

for arbitrary integers $u$, $v$, and $k$. Note however that to get $\gcd(x, y) = k$,
we need to specify that exactly one of $u, v$ is even and $\gcd(u, v) = 1$.
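
As a small illustration of this parameterization, the following Python sketch enumerates the relatively prime triples with $z$ up to a chosen bound. The function name `primitive_triples` and the bound parameter are assumptions made for this example only.

```python
from math import gcd


def primitive_triples(limit):
    """Yield relatively prime triples (x, y, z) with x even, y odd and z <= limit."""
    u = 2
    while u * u + 1 <= limit:
        for v in range(1, u):
            # Require u > v >= 1, gcd(u, v) = 1, and exactly one of u, v even.
            if (u - v) % 2 == 1 and gcd(u, v) == 1:
                z = u * u + v * v
                if z <= limit:
                    yield (2 * u * v, u * u - v * v, z)
        u += 1
```

With `limit = 30` this produces the five triples (4, 3, 5), (12, 5, 13), (8, 15, 17), (24, 7, 25) and (20, 21, 29). Multiplying any of them by a common factor $k$ gives the remaining solutions.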

Geometric approach. Observe that $(x, y, z)$ is a solution if and only if the
point $(ax, ay, az)$ is a rational solution for all $a \ne 0$ in $\mathbb{Q}$. If we take $a = \frac{1}{z}$,
we obtain a rational solution $\left(\frac{x}{z}, \frac{y}{z}, 1\right)$. Conversely, if $(x, y, 1)$ is a rational
solution, then clearing the denominator yields integer solutions. Therefore,
it is enough to classify $(x, y) \in \mathbb{Q}^2$ with $x^2 + y^2 = 1$. We see that $Q = (0, 1)$
is such a solution.

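Consider the line $\ell_t$ through $(0, 1)$ with slope $t$, and let $P_t$ be the other
point of intersection of $\ell_t$ with the circle $x^2 + y^2 = 1$.

[Figure: the line $\ell_t$ through $(0, 1)$ meeting the circle again at $P_t$.]

Let's express $P_t$ in terms of $t$. The line $\ell_t$ is given by $y = tx + 1$. Substituting
this expression for $y$ into $x^2 + y^2 = 1$, we find
$$1 = x^2 + (tx + 1)^2 = (t^2 + 1)x^2 + 2tx + 1.$$
The two solutions to this equation are $x = 0$ and
$$x = -\frac{2t}{t^2 + 1}. \tag{3.1.1}$$
Plugging back into $y = tx + 1$, we have
$$y = tx + 1 = \frac{1 - t^2}{t^2 + 1}. \tag{3.1.2}$$
Therefore,
$$P_t = \left(-\frac{2t}{t^2 + 1}, \frac{1 - t^2}{t^2 + 1}\right).$$
Notice that if $t \in \mathbb{Q}$, then $P_t \in \mathbb{Q}^2$. Conversely, suppose $P_t \in \mathbb{Q}^2$ and
$P_t \ne (0, 1)$. Then since $t$ is the slope of the line between $P_t$ and $(0, 1)$, we
see $t \in \mathbb{Q}$. The one rational point that lies on no line $\ell_t$ is $(0, -1)$, which
sits on the vertical line through $(0, 1)$. Hence, we have shown
$$\{(x, y) \in \mathbb{Q}^2 : x^2 + y^2 = 1\} = \{P_t : t \in \mathbb{Q}\} \cup \{(0, 1)\} \cup \{(0, -1)\}$$
$$= \left\{\left(-\frac{2t}{t^2 + 1}, \frac{1 - t^2}{t^2 + 1}\right) : t \in \mathbb{Q}\right\} \cup \{(0, -1)\}.$$
Note that setting $t = 0$ yields the point $(0, 1)$.
Now, set $t = \frac{a}{b}$ with $a, b \in \mathbb{Z}$ relatively prime and $b \ne 0$. Then
$$1 = \left(\frac{-2t}{t^2 + 1}\right)^2 + \left(\frac{1 - t^2}{t^2 + 1}\right)^2 = \left(\frac{2ab}{a^2 + b^2}\right)^2 + \left(\frac{a^2 - b^2}{a^2 + b^2}\right)^2.$$
Multiplying through by $(a^2 + b^2)^2$, we see that all of the integer solutions of
$x^2 + y^2 = z^2$ (up to scalar) are given by
$$(2ab, a^2 - b^2, a^2 + b^2).$$
The exceptional point $(0, -1)$ corresponds, up to sign, to the choice $b = 0$.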
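
The rational parameterization can be tried out with exact arithmetic. The sketch below is an illustration only: the function name `circle_point` is hypothetical, and it uses Python's `fractions.Fraction` to evaluate the point $P_t$ for a rational slope $t$.

```python
from fractions import Fraction


def circle_point(t):
    """Return the rational point P_t on x^2 + y^2 = 1 for a rational slope t."""
    t = Fraction(t)
    denom = t * t + 1
    return (-2 * t / denom, (1 - t * t) / denom)


# Example: slope t = 1/2 gives the point (-4/5, 3/5).
x, y = circle_point(Fraction(1, 2))
```

Clearing denominators in $(-4/5, 3/5)$ recovers the familiar triple $(4, 3, 5)$ up to sign.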

The key geometric trick which made this argument work was to find one
rational solution $Q$ of $x^2 + y^2 = 1$ and then parameterize all other solutions
by intersecting our equation with a rationally sloped line through $Q$. The
reader may wonder if Diophantine equations other than $x^2 + y^2 = z^2$ may
also be solved using this method. This question forms part of a beautiful
subject known as Arithmetic Geometry. Equations of degree 3 in $x, y, z$
are objects known as elliptic curves; the study of rational points on elliptic
curves is a subject of active research and has deep connections to the Fermat
equation mentioned at the beginning of this chapter. For equations of degree at least
4 in $x, y, z$, a theorem of Faltings shows that there are only finitely many
rational solutions. Faltings was awarded the Fields Medal for his seminal
work on this subject.

Exercises

1. Show that there are infinitely many relatively prime solutions of
$x^2 + y^2 = z^4$.
2. Find all solutions of $x^2 + 3y^2 = z^2$.
3. Find all relatively prime solutions of $x^2 + 2y^2 = z^2$.
4. Use point $Q = (1, 1)$ to find all rational points on the circle $x^2 + y^2 = 2$.
5. Solve the Diophantine equation x2 + 442 = z 6 .

3.2. Fermat’s Equation for n = 4


The complete solution of the Pythagorean triple problem allows us to analyze
the Diophantine equation
$$x^4 + y^4 = z^2.$$
It will be shown that this has no solutions in positive integers. Hence the Fermat equation
$$x^4 + y^4 = z^4$$
has no positive integer solutions either.
The method of proof is called Fermat’s method of infinite descent. The
basic idea is to start with the smallest possible solution (if it exists), meaning
that z is as small as possible. Then using the given solution, construct a
smaller solution. Of course, this is a contradiction which implies that the
assumption that there were any solutions at all was wrong. This is called
infinite descent because one can construct an infinite sequence of smaller
and smaller solutions, which is not possible.

3.2.1. Theorem. The equation $x^4 + y^4 = z^2$ has no positive integer
solutions.

Proof. So, let us assume that there are solutions of $x^4 + y^4 = z^2$ in positive
integers. Among all solutions, we choose $x$, $y$ and $z$ so that $z$ is minimal. In
particular, $\gcd(x, y) = 1$. Since $x^2$, $y^2$, and $z$ form a Pythagorean triple, there
are relatively prime positive integers $u$ and $v$ such that

$$x^2 = 2uv, \quad y^2 = u^2 - v^2, \quad z = u^2 + v^2.$$

(It may be necessary to interchange $x$ and $y$ so that $x$ is even, and $y$ is odd.)
This produces another Pythagorean triple $v^2 + y^2 = u^2$. Thus, $v$ must
be even, as $y$ is odd. Consider the equation $x^2 = (2v)u$. As $\gcd(u, 2v) = 1$,
it follows that $u$ and $2v$ are squares. Hence there are positive integers $a$ and
$b$ so that $u = a^2$ and $v = 2b^2$.
Using the solution for the Pythagorean triple $v^2 + y^2 = u^2$, we obtain
relatively prime positive integers $c$ and $d$ so that

$$v = 2cd, \quad y = c^2 - d^2, \quad u = c^2 + d^2.$$

Hence, $b^2 = cd$ and $a^2 = c^2 + d^2$. Once again, since $b^2 = cd$ and $\gcd(c, d) = 1$,
it follows that $c$ and $d$ are perfect squares, say $c = m^2$ and $d = n^2$.
Substituting back in yields

$$m^4 + n^4 = a^2.$$

Finally, $a \le a^2 = u < u^2 + v^2 = z$.
So, we have succeeded in producing a smaller solution of our equation,
contrary to the hypothesis that we started with the smallest one. This must
imply that there are no solutions at all. □
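
A brute-force search is no substitute for the descent argument, but it makes a quick sanity check of the statement. The sketch below, with the hypothetical name `fermat4_solutions`, looks for positive solutions of $x^4 + y^4 = z^2$ with $x, y$ up to a bound, and the theorem predicts that the list is always empty.

```python
from math import isqrt


def fermat4_solutions(bound):
    """List all (x, y, z) with 1 <= x <= y <= bound and x^4 + y^4 = z^2."""
    found = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            s = x ** 4 + y ** 4
            z = isqrt(s)  # exact integer square root
            if z * z == s:
                found.append((x, y, z))
    return found
```

The use of `math.isqrt` avoids floating-point rounding, so the perfect-square test is exact for arbitrarily large integers.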

Exercises

1. Show that there are no positive integer solutions to $x^4 + 4y^4 = z^2$.
2. Show that there are no positive integer solutions to $x^4 - y^4 = z^2$.
3. Show that there is no right angle triangle with sides of integer lengths
whose area is a perfect square.
4. Solve $x^2 + 12 = y^4$ for $x, y \in \mathbb{N}$.
5. Show that if $x, y, p \in \mathbb{N}$ with $p$ prime and $x^3 + y^3 = p$, then $p = 2$.
What changes if we allow $x, y \in \mathbb{Z}$? Find a few solutions.
6. Find all integer solutions of $x^2 - 11y^2 = 3$.
Hint: solve it mod 3 first.
7. Find all integer solutions of $x^4 + y^4 = 13z^4$.
Hint: try to solve it modulo some primes.

3.3. Quadratic Number Domains


A number $d$ is called square free if it has no repeated prime factor. Let $d$
be a square free integer (except 1). Define
$$\mathbb{Z}[\sqrt{d}] = \{n + m\sqrt{d} : n, m \in \mathbb{Z}\}.$$
One may check directly that this set is closed under addition and multiplication;
and thus is a commutative ring. It also has the important property

if $x, y \in \mathbb{Z}[\sqrt{d}]$ and $xy = 0$ then $x = 0$ or $y = 0$.

This follows since $\mathbb{Z}[\sqrt{d}]$ is contained in the real numbers $\mathbb{R}$ (when $d > 0$)
or the complex numbers $\mathbb{C}$ (when $d < 0$), both of which have this property.
In other words, $\mathbb{Z}[\sqrt{d}]$ is an integral domain.
In fact, $\mathbb{Z}[\sqrt{d}]$ sits inside a field smaller than $\mathbb{R}$ or $\mathbb{C}$, namely
$$\mathbb{Q}[\sqrt{d}] = \{r + s\sqrt{d} : r, s \in \mathbb{Q}\}.$$
One checks that $\mathbb{Q}[\sqrt{d}]$ is an integral domain. To see that non-zero elements
have inverses, notice that
$$\left(r + s\sqrt{d}\right) \frac{r - s\sqrt{d}}{r^2 - ds^2} = 1.$$
It is a simple exercise based on the irrationality of $\sqrt{d}$ to see that
$$r + s\sqrt{d} = a + b\sqrt{d}$$
implies that $a = r$ and $b = s$ for all rational numbers $a$, $b$, $r$ and $s$. In
particular, $r^2 - ds^2 \ne 0$ unless $r = s = 0$. Now we will introduce an
important function which will make computations possible.

3.3.1. Definition. For $x = r + s\sqrt{d} \in \mathbb{Q}[\sqrt{d}]$, define the conjugate of $x$
to be $\tilde{x} = r - s\sqrt{d}$. Let the norm of $x$ be $N(x) = x\tilde{x} = r^2 - ds^2$.

Note that if $x \in \mathbb{Q}[\sqrt{d}]$, then $N(x)$ is rational. If $x \in \mathbb{Z}[\sqrt{d}]$, then $N(x)$
is an integer.

3.3.2. Lemma. For $x, y \in \mathbb{Q}[\sqrt{d}]$,
(1) $\widetilde{x + y} = \tilde{x} + \tilde{y}$.
(2) $\widetilde{xy} = \tilde{x}\tilde{y}$.
(3) $N(xy) = N(x)N(y)$.
(4) $N(x) = 0$ if and only if $x = 0$.

Proof. The proof consists of straightforward computations, and will be left
to the exercises. □

Recall from Definition 1.8.3 that a unit $x$ of a commutative ring $R$ is an
element with an inverse $y$, i.e., $xy = 1$. In $\mathbb{Z}[\sqrt{2}]$, a simple calculation shows
that
$$(17 + 12\sqrt{2})(17 - 12\sqrt{2}) = 1.$$
So $17 + 12\sqrt{2}$ is a unit in $\mathbb{Z}[\sqrt{2}]$. We need a criterion to decide when
something is a unit.

3.3.3. Proposition. An element $x \in \mathbb{Z}[\sqrt{d}]$ is a unit if and only if $N(x) = \pm 1$.

Proof. If $xy = 1$, then $N(x)N(y) = N(1) = 1$. But $N(x)$ and $N(y)$ are
integers, so they are both $\pm 1$. Conversely, if $N(x) = \pm 1$, $y = N(x)\tilde{x}$ satisfies
$xy = N(x)^2 = 1$. So $x$ is a unit. □

This proposition shows that the units of $\mathbb{Z}[\sqrt{d}]$ correspond exactly to
integer solutions of Pell's equation
$$n^2 - dm^2 = \pm 1.$$
When $d$ is positive, there are always infinitely many solutions. We will look
at a few special cases in the next section. When $d \le -2$, only $\pm 1$ are units.
The case $d = -1$ is special. See section 3.5.
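
The norm criterion is easy to experiment with. In the following sketch an element $n + m\sqrt{d}$ is stored as the pair `(n, m)`; this representation and the names `norm`, `multiply` and `is_unit` are choices made for the illustration, not notation from the text.

```python
def norm(x, d):
    """Norm n^2 - d*m^2 of the element x = (n, m) representing n + m*sqrt(d)."""
    n, m = x
    return n * n - d * m * m


def multiply(x, y, d):
    """Product of two elements of Z[sqrt(d)], each given as a pair (n, m)."""
    (n1, m1), (n2, m2) = x, y
    return (n1 * n2 + d * m1 * m2, n1 * m2 + m1 * n2)


def is_unit(x, d):
    """An element is a unit exactly when its norm is +1 or -1."""
    return abs(norm(x, d)) == 1


# 17 + 12*sqrt(2) has norm 289 - 288 = 1, and its conjugate is its inverse.
assert multiply((17, 12), (17, -12), 2) == (1, 0)
```

The same helpers confirm that the norm is multiplicative on any examples one cares to try, in line with Lemma 3.3.2.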

3.3.4. Definition. In a quadratic number domain, an element $x$ is called
a prime if (i) $x$ is not a unit, and (ii) whenever $x$ factors as $x = ab$, either
$a$ or $b$ is a unit.

3.3.5. Remark. Notice that this definition is a special case of the one
given in Definition 1.8.6. For rings more general than quadratic number
domains, the term “irreducible” is used instead of “prime.”

One can factor 2 in $\mathbb{Z}$ as
$$2 = (1)(2) = (2)(1) = (-1)(-2) = (-2)(-1).$$
We consider these to be trivial factorizations because one factor is always a
unit. The primes in Z by this definition are just the ordinary primes and
their negatives.
The following lemma gives us a simple test for primes. However, the
converse is not true; so be careful how you use it.

3.3.6. Lemma. If N (x) is prime, then x is a prime.

Proof. If x = ab, then N (x) = N (a)N (b). If N (x) is prime, then either
N (a) or N (b) equals ±1. Hence either a or b is a unit by Proposition 3.3.3.
Therefore x is prime. 
3.3.7. Example. Consider $\mathbb{Z}[\sqrt{2}]$ again. Since $N(2 + \sqrt{2}) = 2$ is prime,
$2 + \sqrt{2}$ is a prime. The number 2 itself is not prime! It factors as $2 = \sqrt{2}\sqrt{2}$,
and $N(\sqrt{2}) = -2 \ne \pm 1$. Also 7 is not prime because $7 = (3 - \sqrt{2})(3 + \sqrt{2})$.
The integer 5 is prime in $\mathbb{Z}[\sqrt{2}]$, even though $N(5) = 25$ is not prime.
If 5 were not prime, it would factor as $5 = xy$, where neither $x$ nor $y$ is a
unit. Then $25 = N(5) = N(x)N(y)$. Since $x$ and $y$ are not units, neither
$N(x)$ nor $N(y)$ equals $\pm 1$. Thus, one must have $N(x) = N(y) = \pm 5$. Let
us write $x = n + m\sqrt{2}$. Then
$$n^2 - 2m^2 = \pm 5.$$
This is impossible. To see this, consider this equation modulo 5. One obtains
$$n^2 \equiv 2m^2 \pmod 5.$$
However, the squares modulo 5 are congruent to 0, 1 or 4. Thus the only
solution occurs when
$$n \equiv m \equiv 0 \pmod 5.$$
Thus the only way that $n^2 - 2m^2$ can be a multiple of 5 is if both $n$ and $m$
are multiples of 5. Then $n^2 - 2m^2$ is a multiple of 25; and so never equals
$\pm 5$. We conclude that 5 is prime in $\mathbb{Z}[\sqrt{2}]$.
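
The congruence step in this example can be checked by listing residues. The sketch below uses hypothetical helper names; it finds all residue pairs $(n, m)$ modulo 5 with $n^2 - 2m^2 \equiv 0$, and separately searches a small box for elements of a given norm.

```python
def residues_with_norm_divisible_by_5():
    """Residue pairs (n, m) mod 5 for which n^2 - 2*m^2 is divisible by 5."""
    return [(n, m) for n in range(5) for m in range(5)
            if (n * n - 2 * m * m) % 5 == 0]


def small_elements_of_norm(target, bound):
    """Pairs (n, m) with |n|, |m| <= bound and n^2 - 2*m^2 == target."""
    return [(n, m)
            for n in range(-bound, bound + 1)
            for m in range(-bound, bound + 1)
            if n * n - 2 * m * m == target]
```

The first function returns only `(0, 0)`, matching the argument above, and the search for norm $\pm 5$ comes back empty, whereas norm 7 is attained by $3 \pm \sqrt{2}$. The finite search is of course only a spot check; the congruence argument is what rules out every $n$ and $m$.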

Let us show that every element of $\mathbb{Z}[\sqrt{d}]$ has at least one factorization
into primes. Later, we will discuss what unique factorization should mean.

3.3.8. Lemma. Every non-zero element of $\mathbb{Z}[\sqrt{d}]$ factors as the product
of a unit and finitely many primes.

Proof. The proof is basically the same as the proof we gave for the integers.
The size of elements of $\mathbb{Z}[\sqrt{d}]$ will be measured by the norm function.
Consider the set
$$S = \{x \in \mathbb{Z}[\sqrt{d}] : x \text{ does not factor as a finite product of primes}\}.$$
If this set is empty, the lemma is true. Otherwise, the set
$$\{|N(x)| : x \in S\}$$
has a smallest element. Let $x$ be an element of $S$ for which $|N(x)|$ is as
small as possible. If $x$ were prime, it would factor as the product of one
prime and so would not belong to $S$. Hence $x$ factors as $x = ab$ so that
$|N(a)| < |N(x)|$ and $|N(b)| < |N(x)|$. Therefore, both $a$ and $b$ must factor
as products of primes, say
$$a = up_1 \cdots p_k \quad\text{and}\quad b = vq_1 \cdots q_l,$$
where $u$ and $v$ are units and $p_i$ and $q_j$ are all primes. But then
$$x = (uv)p_1 \cdots p_k q_1 \cdots q_l$$
is the desired factorization of $x$. This contradicts the fact that $x$ belongs to
$S$. We conclude that $S$ is empty and the lemma is true. □
What does unique factorization mean in this context? Consider
$$11 = (5\sqrt{3} + 8)(5\sqrt{3} - 8) = (2\sqrt{3} - 1)(2\sqrt{3} + 1).$$
Notice that
$$N(5\sqrt{3} \pm 8) = N(2\sqrt{3} \pm 1) = -11.$$
So the factors are prime. Are they really two different factorizations of 11
in $\mathbb{Z}[\sqrt{3}]$? No, they're not. Notice that $2 - \sqrt{3}$ is a unit with inverse $2 + \sqrt{3}$.
Now
$$(5\sqrt{3} + 8)(2 - \sqrt{3}) = 2\sqrt{3} + 1.$$
So these two primes are in the same relationship here as $\pm 5$ are in $\mathbb{Z}$. Two
primes $p$ and $q$ are called associates if there is a unit $u$ such that $q = up$.
So we can compute
$$11 = (5\sqrt{3} + 8)(5\sqrt{3} - 8)$$
$$= \left[(5\sqrt{3} + 8)(2 - \sqrt{3})\right]\left[(2 + \sqrt{3})(5\sqrt{3} - 8)\right]$$
$$= (2\sqrt{3} + 1)(2\sqrt{3} - 1)$$
$$= (2\sqrt{3} - 1)(2\sqrt{3} + 1).$$
These two factorizations are essentially the same because the only difference
is obtained by multiplying primes by units, and permuting the factors.
On the other hand, consider the following situation in $\mathbb{Z}[\sqrt{10}]$.
$$6 = (2)(3) = (4 + \sqrt{10})(4 - \sqrt{10}).$$
We compute that $N(2) = 4$, $N(3) = 9$, and $N(4 \pm \sqrt{10}) = 6$. If these
numbers factor non-trivially in $\mathbb{Z}[\sqrt{10}]$, then there would be elements
$x = n + m\sqrt{10}$ with $N(x) = n^2 - 10m^2 = \pm 2$ or $N(x) = \pm 3$. However,
reducing modulo 10, this requires that $n^2 \equiv 2, 3, 7$ or $8 \pmod{10}$. But a
perfect square is congruent to 0, 1, 4, 5, 6, or 9 (mod 10). Therefore 2, 3
and $4 \pm \sqrt{10}$ are primes in $\mathbb{Z}[\sqrt{10}]$. Neither 2 nor 3 is an associate of $4 \pm \sqrt{10}$
because their norms are different. So the domain $\mathbb{Z}[\sqrt{10}]$ does not have the
unique factorization property.
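
The residue computation behind this argument takes one line each way. The following sketch, with variable names chosen for this example, lists the squares modulo 10 and the residues that a norm of $\pm 2$ or $\pm 3$ would force on $n^2$.

```python
# All possible values of n^2 modulo 10.
squares_mod_10 = sorted({(n * n) % 10 for n in range(10)})

# Residues that n^2 = 10*m^2 + t would need, for t = +-2 or +-3.
needed_residues = sorted({t % 10 for t in (2, -2, 3, -3)})

# The two lists have nothing in common, so no element has norm +-2 or +-3.
no_overlap = not set(squares_mod_10) & set(needed_residues)
```

Here `squares_mod_10` is `[0, 1, 4, 5, 6, 9]` and `needed_residues` is `[2, 3, 7, 8]`, in agreement with the text.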
A domain in which every element has exactly one factorization into
primes up to permutations and multiplication by units is called a Unique
Factorization Domain or UFD. The key is the analogue of Lemma 1.6.1.
Some of these domains have a Euclidean algorithm, which is easily deduced
if there is a division algorithm. See Section 1.8 for an introduction to these
ideas. Try it out with $x = 2$ and $y = 4 + \sqrt{10}$ to see that this does not
hold in $\mathbb{Z}[\sqrt{10}]$. It is an interesting and difficult problem in number theory
to determine which quadratic number domains are Euclidean, and which
are UFDs. There are only finitely many Euclidean domains. There are
more UFDs, and it is conjectured that there are infinitely many of them.

The interested reader should consult a book on number theory to get more
information. We recommend Stark [37].
There is one more subtle point. The ring $\mathbb{Z}[\sqrt{5}]$ is not a UFD. To see this,
notice that $4 = (\sqrt{5} + 1)(\sqrt{5} - 1) = (2)(2)$. Also, all the factors have norm $\pm 4$.
We see that $n^2 - 5m^2 = \pm 2$ has no solutions by looking at this equation mod
5, so all of these factors are prime. Clearly, 2 and $\sqrt{5} + 1$ are not associates.
Thus factorization is not unique in $\mathbb{Z}[\sqrt{5}]$. However, in this case, the reason
is that we left some important elements out of our ring. All the numbers
$x = n + m\sqrt{5}$ satisfy a monic quadratic equation with integer coefficients, namely
$$X^2 - 2nX + (n^2 - 5m^2) = 0.$$
However, the element $(1 + \sqrt{5})/2$ belongs to $\mathbb{Q}[\sqrt{5}]$, and is a root of
$X^2 - X - 1$. The collection of all numbers in $\mathbb{Q}[\sqrt{5}]$ satisfying such an equation
turns out to be all numbers of the form $(n + m\sqrt{5})/2$ where $n$ and $m$ are
integers such that $n \equiv m \pmod 2$. In this case, $N(x) = \frac{n^2 - 5m^2}{4} \in \mathbb{Z}$. In
the larger ring $\mathbb{Z}\left[\frac{1 + \sqrt{5}}{2}\right]$, there are the units $(1 \pm \sqrt{5})/2$. It is known as the
ring of integers in $\mathbb{Q}[\sqrt{5}]$ because it is the set of all elements of $\mathbb{Q}[\sqrt{5}]$
that are roots of monic quadratic polynomials with integer coefficients. Now 2 and
$1 \pm \sqrt{5}$ are associates in $\mathbb{Z}\left[\frac{1 + \sqrt{5}}{2}\right]$. In fact,
$\mathbb{Z}\left[\frac{1 + \sqrt{5}}{2}\right]$ is a Euclidean Domain.
It can be shown that the ring of integers in $\mathbb{Q}[\sqrt{d}]$ is $\mathbb{Z}[\sqrt{d}]$ when
$d \not\equiv 1 \pmod 4$, and $\mathbb{Z}\left[\frac{1 + \sqrt{d}}{2}\right]$ when $d \equiv 1 \pmod 4$. Moreover, when
$d \equiv 1 \pmod 4$, $\mathbb{Z}[\sqrt{d}]$ can never be a UFD. To see this, let $d = 4k + 1$. Notice that
$$2 \mid 4k = (1 + \sqrt{d})(-1 + \sqrt{d}).$$
We claim that 2 is prime in $\mathbb{Z}[\sqrt{d}]$. It has norm $N(2) = 4$, so any proper
factor must have norm $\pm 2$. Consider the equation
$$\pm 2 = n^2 - dm^2 \equiv n^2 - m^2 \pmod 4.$$
The left-hand side is congruent to 2 (mod 4), which can never be the difference
of two squares. Now the prime 2 divides $(1 + \sqrt{d})(-1 + \sqrt{d})$, but does
not divide either $\pm 1 + \sqrt{d}$. So there is no unique factorization.
The list of the rings of integers of $\mathbb{Q}[\sqrt{d}]$ which are Euclidean domains
with respect to the norm function is finite: $d =$
−11, −7, −3, −2, −1, 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73.
The list of Euclidean domains for some other function includes $d = 14$, and
may be infinite. The list of UFDs is larger, and is almost surely infinite. The
negative values of $d$ are all known though, and there are only finitely many.
In addition to the norm Euclidean domains, there are −163, −67, −43, −19.
The additional positive ones with $d < 100$ are
14, 22, 23, 31, 38, 43, 46, 47, 53, 59, 61, 62, 71, 77, 83, 86, 89, 93, 94, 97.

Exercises
1. Show that $n + m\sqrt{d} = k + l\sqrt{d}$ for $k, l, m, n \in \mathbb{Q}$ implies that $k = n$ and
$l = m$.
2. Verify Lemma 3.3.2.
3. (a) Show that 2 and 3 are not prime in $\mathbb{Z}[\sqrt{3}]$.
(b) Show that $5 - 2\sqrt{3}$ is prime in $\mathbb{Z}[\sqrt{3}]$.
(c) Show that 5 is prime in $\mathbb{Z}[\sqrt{3}]$.
4. Show that there is no division algorithm for $\mathbb{Z}[\sqrt{10}]$ with $f(x) = |N(x)|$
by showing that any remainder on dividing $4 + \sqrt{10}$ by 2 has norm with
absolute value at least 6.
Hint: consider the norm of the remainder modulo 20.
5. Show that there are infinitely many integer solutions of $n^2 - 3m^2 = 1$.
Find an explicit recursion formula that generates your set of solutions.
6. Show that $n^2 - 5m^2 = 2$ has no solutions.

3.4. Pell’s Equation


The units (invertible elements) of $\mathbb{Z}[\sqrt{d}]$ are of the form $x + y\sqrt{d}$ such that
$$x^2 - dy^2 = \pm 1.$$
For $d$ positive, one might suspect that there are non-trivial solutions. In
fact, there are always infinitely many solutions for every positive square free
$d$. The proof of this is beyond the scope of this book. If you are interested,
consult [37]. The proof is based on the theory of continued fractions. Brute
force is not likely to succeed with this problem because some fairly small
numbers have very large smallest solutions. For example, for $d = 109$, the
smallest solution is
$$x = 158\,070\,671\,986\,249, \quad y = 15\,140\,424\,455\,100.$$
This problem has a long history, and it was completely solved in 1150
by Bhaskara. Fermat solved it for $d \le 150$ and challenged a group of British
mathematicians to solve certain larger numbers. This was done by Brouncker,
but later falsely attributed to Pell by Euler. It seems that Pell was not
responsible for either the problem or its solution, but his name has stuck.
In this section, we will solve the special case
$$x^2 - 5y^2 = \pm 1.$$
We see $x = 2$ and $y = 1$ gives a non-trivial solution. This means that $2 + \sqrt{5}$
is a unit in $\mathbb{Z}[\sqrt{5}]$ of norm $-1$. Thus any power of it is a unit (with norms
alternating $\pm 1$). That is, the pairs $\{\pm x_n, \pm y_n\}$ obtained from
$$x_n + y_n\sqrt{5} = (2 + \sqrt{5})^n$$
are solutions. The even pairs $\{\pm x_{2n}, \pm y_{2n}\}$ are solutions of $x^2 - 5y^2 = 1$,
and the odd pairs $\{\pm x_{2n+1}, \pm y_{2n+1}\}$ are solutions of $x^2 - 5y^2 = -1$. The
method of descent can now be used to show that this list of solutions is
complete. Indeed, this idea can be used for any $d$ to show that if Pell's
equation has one non-trivial solution, then it has infinitely many. See the
exercises.

3.4.1. Theorem. All solutions of the equation $x^2 - 5y^2 = \pm 1$ are given
by the pairs $\{\pm x_n, \pm y_n\}$ for $n \ge 0$ obtained from the identities
$$x_n + y_n\sqrt{5} = (2 + \sqrt{5})^n.$$
This leads to the recursive formulae
$$x_0 = 1, \; y_0 = 0 \quad\text{and}\quad x_{n+1} = 2x_n + 5y_n, \; y_{n+1} = x_n + 2y_n \quad\text{for } n \ge 0.$$

Proof. First note that from the previous discussion, the pairs $\{\pm x_n, \pm y_n\}$
for $n \ge 0$ are indeed solutions. From this formula, we obtain
$$x_{n+1} + y_{n+1}\sqrt{5} = (x_n + y_n\sqrt{5})(2 + \sqrt{5}) = (2x_n + 5y_n) + (x_n + 2y_n)\sqrt{5}.$$
So the recursive equations for $x_{n+1}$ and $y_{n+1}$ follow immediately.
Suppose that the set $S$ of non-negative integer solutions which are not in
this list is non-empty. We can then choose the solution $\{x, y\}$ in $S$ so that $y$ is as
small as possible. The plan is to use Fermat's method of infinite descent to
show that there is a smaller solution in $S$, a contradiction, and hence $S = \emptyset$
and our list must be complete. The idea is that $x + y\sqrt{5}$ is a unit in $\mathbb{Z}[\sqrt{5}]$,
as is $(2 + \sqrt{5})^{-1} = \sqrt{5} - 2$. Hence,
$$(x + y\sqrt{5})(\sqrt{5} - 2) = (5y - 2x) + (x - 2y)\sqrt{5}$$
is a unit. Thus, $\{5y - 2x, x - 2y\}$ is a solution.
The rest of the proof is just a computation to show that this is indeed a
smaller non-negative solution that is not in our list. Since
$$4y^2 \le 5y^2 \pm 1 = x^2 \le 6y^2,$$
it follows that $2y \le x < \sqrt{6}\,y$. Hence,
$$0 < (5 - 2\sqrt{6})y < 5y - 2x \le 5y - 4y = y,$$
and
$$0 = 2y - 2y \le x - 2y < (\sqrt{6} - 2)y < y.$$
Consequently, we have obtained a smaller non-negative solution than we
started with. This solution cannot be $\{x_n, y_n\}$ from our list. For then,
$$x + y\sqrt{5} = \left((5y - 2x) + (x - 2y)\sqrt{5}\right)(2 + \sqrt{5})$$
$$= (x_n + y_n\sqrt{5})(2 + \sqrt{5})$$
$$= x_{n+1} + y_{n+1}\sqrt{5}.$$
Hence $(5y - 2x, x - 2y) \in S$, and $0 \le x - 2y < y$, contradicting the fact that
$(x, y)$ had the smallest 2nd coordinate in S. Therefore we have obtained the
desired contradiction. □
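
Both the recursion of Theorem 3.4.1 and the descent step used in its proof are short enough to run directly. The sketch below is illustrative only; the names `pell5_solutions` and `descend` are hypothetical.

```python
def pell5_solutions(count):
    """First `count` pairs (x_n, y_n) with x_n + y_n*sqrt(5) = (2 + sqrt(5))^n."""
    x, y = 1, 0
    pairs = []
    for _ in range(count):
        pairs.append((x, y))
        # Multiply by 2 + sqrt(5): the recursion from Theorem 3.4.1.
        x, y = 2 * x + 5 * y, x + 2 * y
    return pairs


def descend(x, y):
    """Multiply x + y*sqrt(5) by sqrt(5) - 2, the inverse of 2 + sqrt(5)."""
    return 5 * y - 2 * x, x - 2 * y
```

The first five pairs are (1, 0), (2, 1), (9, 4), (38, 17) and (161, 72), with $x^2 - 5y^2$ alternating between $1$ and $-1$. Applying `descend` to (161, 72) gives (38, 17), one step back down the list, which is the mechanism the proof exploits.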

Exercises

1. Find all solutions of $x^2 - 2y^2 = \pm 1$.
2. Show by induction that the positive solutions of $x^2 - 5y^2 = \pm 1$ obtained
above are given by the formulae
$$x_n = \frac{(2 + \sqrt{5})^n + (2 - \sqrt{5})^n}{2} \quad\text{and}\quad y_n = \frac{(2 + \sqrt{5})^n - (2 - \sqrt{5})^n}{2\sqrt{5}}.$$
The notation $\lceil x \rceil$ and $\lfloor x \rfloor$ mean the least integer $n \ge x$ and the greatest
integer $m \le x$ respectively. Deduce that
 √   √ √ 
xn = (2 + 5)n /2 and yn = (2 + 5)n /2 5 .

3. (a) Show that the elements of $\mathbb{Z}\left[\frac{1 + \sqrt{5}}{2}\right]$ are all elements of the form
$\frac{a + b\sqrt{5}}{2}$ where $a \equiv b \pmod 2$.
(b) Show that the units of $\mathbb{Z}\left[\frac{1 + \sqrt{5}}{2}\right]$ have the form $\frac{\pm u_n \pm v_n\sqrt{5}}{2}$
where
$$\frac{u_n + v_n\sqrt{5}}{2} = \left(\frac{1 + \sqrt{5}}{2}\right)^n \quad\text{for } n \ge 0.$$
(c) By Theorem 3.4.1, $2 + \sqrt{5}$ is a unit. Where does it fit into this list?
4. Show that there are infinitely many Pythagorean triples with $y = x + 1$;
i.e., solutions of the form $(x, x + 1, z)$.
Hint: reduce it to Pell's equation for $d = 2$. Hence find the smallest
solution larger than $696^2 + 697^2 = 985^2$; i.e., $z > 985$.
5. Prove that if $x^2 - dy^2 = 1$ has one positive solution, then it has infinitely
many. If $x^2 - dy^2 = -1$ has one positive solution, then it and $x^2 - dy^2 = 1$
have infinitely many solutions.
6. Show that $n^2 - 5m^2 = 11$ has infinitely many solutions.
Hint: this is the norm of an element in $\mathbb{Z}[\sqrt{5}]$.
7. Show that there are infinitely many positive integers $a$ such that both
$a + 1$ and $3a + 1$ are perfect squares.
Hint: reduce this to a question of elements in $\mathbb{Z}[\sqrt{3}]$ with specified
norm.

3.5. The Gaussian Integers



When $d < 0$, the ring $\mathbb{Z}[\sqrt{d}]$ lies in the complex numbers $\mathbb{C}$, not the reals.
For this section, some familiarity with complex numbers will be assumed.
The ideas of complex numbers will be formally introduced in Chapter 5. We
use the notation $i = \sqrt{-1}$ for one (fixed) square root of $-1$. The Gaussian
integers $\mathbb{Z}[\sqrt{-1}]$ consist of all complex numbers of the form $n + mi$ for
integers $n$ and $m$. The norm function is $N(n + mi) = n^2 + m^2$, and this is
always a non-negative integer.
Let us find all of the units. If $u = n + mi$ is a unit, then $n^2 + m^2 = 1$.
Hence one of $n$ or $m$ is 0 and the other is $\pm 1$. So the units are $\pm 1$ and $\pm i$.
We wish to establish unique factorization in this domain. By Theorem
1.8.18 and Remark 1.8.9, it is enough to show that the Gaussian integers
are a Euclidean domain for the norm function, i.e. they have a division
algorithm.

3.5.1. Proposition. Suppose that $a, b \in \mathbb{Z}[\sqrt{-1}]$, and $a \ne 0$. Then there
are elements $q, r \in \mathbb{Z}[\sqrt{-1}]$ such that $b = aq + r$ and $0 \le N(r) < N(a)$.

Proof. Since $b/a \in \mathbb{Q}[\sqrt{-1}]$, it can be written as $b/a = u + iv$ where $u$ and $v$
are rational. Pick integers $n$ and $m$ so that $|u - n| \le 1/2$ and $|v - m| \le 1/2$.
Set $q = n + im$, and
$$r = b - aq = a(u + iv) - a(n + im) = a\left((u - n) + i(v - m)\right).$$
Then using the fact that $N(x)$ is defined on $\mathbb{Q}[\sqrt{-1}]$,
$$N(r) = N(a)\left(|u - n|^2 + |v - m|^2\right) \le \frac{N(a)}{2}.$$
Thus the remainder $r$ is sufficiently small. □
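
The proof is constructive, and the following Python sketch carries it out in exact integer arithmetic. Gaussian integers are stored as pairs `(real, imag)`; this representation and the name `gauss_divmod` are assumptions of the example. The quotient $b/a$ is computed as $b\bar{a}/N(a)$, and each coordinate is rounded to a nearest integer.

```python
def gauss_divmod(b, a):
    """Divide b by a != 0 in the Gaussian integers; return (q, r) with b = a*q + r."""
    (br, bi), (ar, ai) = b, a
    n_a = ar * ar + ai * ai          # N(a)
    # Numerators of b/a = b * conj(a) / N(a).
    re = br * ar + bi * ai
    im = bi * ar - br * ai
    # Round each coordinate to a nearest integer: floor(t / N(a) + 1/2).
    qr = (2 * re + n_a) // (2 * n_a)
    qi = (2 * im + n_a) // (2 * n_a)
    # Remainder r = b - a*q.
    rr = br - (ar * qr - ai * qi)
    ri = bi - (ar * qi + ai * qr)
    return (qr, qi), (rr, ri)
```

For example, dividing $27 + 23i$ by $8 + i$ gives quotient $4 + 2i$ and remainder $-3 + 3i$, whose norm 18 is well below $N(8 + i) = 65$. Repeating this division yields a Euclidean algorithm for greatest common divisors of Gaussian integers.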

3.5.2. Theorem. Unique Factorization for Gaussian Integers.
Suppose that $a$ is a non-zero element of $\mathbb{Z}[\sqrt{-1}]$, and that it factors in two
ways:
$$a = up_1 \cdots p_k = vq_1 \cdots q_l,$$
where $u$ and $v$ are units and $p_i$ and $q_j$ are all primes. Then $k = l$ and there
is a permutation $\pi$ so that $p_i$ and $q_{\pi(i)}$ are associates for $1 \le i \le k$.

Proof. By Proposition 3.5.1 and Remark 1.8.9, the hypotheses of Theorem
1.8.18 hold. So, the Gaussian integers have unique factorization. □

In this ring, it is possible to describe all the primes. The argument
will be split into two parts. The first theorem is of independent interest.
The reader should notice that if item (5) were omitted from the list of
equivalences, it would not appear to have anything to do with the Gaussian
integers. However, the Unique Factorization theorem for this ring is crucial
to the proof.

3.5.3. Theorem. Let $p$ be an odd prime. Then the following are equivalent:
(1) $p \equiv 1 \pmod 4$.
(2) $x^2 + 1 \equiv 0 \pmod p$ has a solution.
(3) There are integers $n$ and $m$ which are not multiples of $p$ so that
$p \mid n^2 + m^2$.
(4) $p$ is the sum of two squares: $p = a^2 + b^2$.
(5) $p$ factors as $p = (a + ib)(a - ib)$ in $\mathbb{Z}[\sqrt{-1}]$.

Proof. Suppose that (1) holds, and write $p = 4n + 1$. Let $a = (2n)!$. Then
by Wilson’s Theorem,

$$ a^2 \equiv \prod_{j=1}^{2n} j \prod_{j=1}^{2n} (-j)\,(-1)^{2n} \equiv \prod_{j=1}^{2n} j(4n + 1 - j) \equiv (4n)! \equiv -1 \pmod p. $$

So $a$ is a solution of (2).
If $n$ is a solution of $x^2 + 1 \equiv 0 \pmod p$ and $m = 1$, then $n^2 + m^2$ is a
multiple of $p$, so (3) holds.
Suppose that (3) holds. Notice that in $\mathbb{Z}[\sqrt{-1}]$, it is possible to factor
$n^2 + m^2$ as $(n + im)(n - im)$. If $p$ were prime in $\mathbb{Z}[\sqrt{-1}]$, it would divide one
of $n \pm im$. This then implies that $p$ divides both $n$ and $m$, contrary to fact.
Hence $p$ has a proper divisor $x \in \mathbb{Z}[\sqrt{-1}]$. It follows that $N(x)$ is a proper
divisor of $N(p) = p^2$. That is, $N(x) = p$. If $x = a + ib$, then $p = a^2 + b^2$.
This proves (4) and (5). Since $a^2 + b^2 = (a + ib)(a - ib)$, we see (5) implies
(4).
Finally, since $p$ is odd, one of $a$, $b$ is even and the other is odd. Therefore
$p = a^2 + b^2 \equiv 1 \pmod 4$. So (4) implies (1). □
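
As a small numerical check of the step from (1) to (2) and of the conclusion (4), the sketch below computes the Wilson-type root $a = (2n)!$ modulo $p$ and searches for a representation as a sum of two squares. The function names are hypothetical, and the search is brute force rather than the factorization argument used in the proof.

```python
from math import factorial, isqrt

def wilson_root(p):
    """For a prime p = 4n + 1, return a = (2n)! mod p; the proof shows a^2 = -1 (mod p)."""
    return factorial((p - 1) // 2) % p

def two_squares(p):
    """Search for 0 < a <= b with a^2 + b^2 = p; return None if no such pair exists."""
    for a in range(1, isqrt(p // 2) + 1):
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return (a, b)
    return None

for p in (5, 13, 17, 29):
    a = wilson_root(p)
    print(p, a, (a * a + 1) % p, two_squares(p))
```

For primes congruent to 3 (mod 4), such as 7 or 11, `two_squares` returns `None`, in line with the equivalence of (1) and (4).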

3.5.4. Theorem. The primes in $\mathbb{Z}[\sqrt{-1}]$ are:
(1) The elements of prime norm: the primes $\pm 1 \pm i$ of norm 2; and
the elements $x$ with $N(x) = p$, where $p$ is a prime congruent to 1
(mod 4).
(2) The elements $\pm p$ and $\pm ip$ where $p$ is a prime integer with $p \equiv 3 \pmod 4$.

Proof. By Lemma 3.3.6, it follows that if $N(x)$ is prime, then $x$ is prime.
For $N(x)$ to be prime, $x$ cannot be an integer or $i$ times an integer (these
elements have square norms). So $x = a + ib$, and $a$ and $b$ are not both even
(because 2 does not divide $x$). Hence $p = N(x) = a^2 + b^2$ is the sum of
squares, not both even. Thus, it must be congruent to 1 or 2 modulo 4.
Now 2 is the only prime congruent to 2 (mod 4), and one checks that
$\pm 1 \pm i$ are the only elements of norm 2. The others have odd prime norm.
Suppose that $p$ is an integer prime congruent to 3 mod 4. If this were not
prime in $\mathbb{Z}[\sqrt{-1}]$, it would factor as $p = xy$, say. But then
$$ p^2 = N(p) = N(x)N(y). $$
Neither $N(x)$ nor $N(y)$ is 1, so $N(x) = N(y) = p$. But $p \equiv 3 \pmod 4$,
and this is impossible for a norm which is a sum of two squares. Hence $p$ is
prime in $\mathbb{Z}[\sqrt{-1}]$. Its associates $\pm p$ and $\pm ip$ are then also prime.
It remains to show that there are no other primes. Let $x = n + mi$ be
a prime in $\mathbb{Z}[\sqrt{-1}]$. Its conjugate $\tilde{x} = n - mi$ is also a prime. To see this,
notice that $x = ab$ if and only if $\tilde{x} = \tilde{a}\tilde{b}$. So any factorization of $\tilde{x}$ into
proper factors implies that $x$ also factors, contrary to fact.
Consider $N(x) = x\tilde{x}$. If this is prime, it falls into case (1). Otherwise,
$N(x)$ factors non-trivially in the integers as
$$ x\tilde{x} = N(x) = pq. $$
Now we can apply the Unique Factorization Theorem. The left-hand side
is the product of two primes. So the right-hand side must also be a
factorization into primes. Furthermore, $x$ is the associate of one, say $p$, and
$\tilde{x}$ is the associate of the other, $q$. But if $u$ is a unit so that $x = up$, then
$\tilde{x} = \tilde{u}\tilde{p} = \tilde{u}p$. Hence $p$ is an associate of $\tilde{x}$, and hence also an associate of $q$.
This means that $p = q$ is a prime.
There are two cases. If $x = \pm p$ or $\pm ip$, this falls into case (2). Otherwise,
$x = n + im$, where $n$ and $m$ are not multiples of $p$, but $n^2 + m^2 = p^2$ is
divisible by $p$. So by Theorem 3.5.3, $p \equiv 1 \pmod 4$. But then, by the same
theorem, we find out that $p$ (and so also $x$) is not prime in $\mathbb{Z}[\sqrt{-1}]$. That
eliminates this final possibility. □
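
The classification in Theorem 3.5.4 translates directly into a test. The following is a minimal sketch, assuming the element $n + mi$ is passed as two ordinary integers; `is_prime` and `is_gaussian_prime` are hypothetical names, and primality of ordinary integers is checked by plain trial division.

```python
def is_prime(k):
    """Trial-division primality test for ordinary integers."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(n, m):
    """Decide whether n + mi is prime in Z[i], following the two cases of the theorem."""
    if n == 0 or m == 0:
        k = abs(n + m)                    # the element is +-k or +-ik for an integer k >= 0
        return is_prime(k) and k % 4 == 3
    return is_prime(n * n + m * m)        # prime norm: norm 2, or a prime p = 1 (mod 4)

print([(n, m) for n in range(4) for m in range(4) if is_gaussian_prime(n, m)])
```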
A pretty application of this is a complete description of which numbers
can be expressed as the sum of two squares. The key additional piece of
information needed is the following computation. The proof is left to the
reader.

3.5.5. Lemma. Let $a$, $b$, $x$, and $y$ be integers. Then

$$ (a^2 + b^2)(x^2 + y^2) = (ax + by)^2 + (ay - bx)^2. $$

3.5.6. Theorem. Let $n$ be a positive integer. Factor $n$ as $n = ab^2$ where
$a$ is square free. Then $n$ can be expressed as the sum of two squares if and
only if $a$ has no prime factors congruent to 3 (mod 4).

Proof. First suppose that $a$ has no prime factors congruent to 3 (mod 4).
By Theorem 3.5.3, each odd prime factor of $a$ is the sum of two squares, and
$2 = 1^2 + 1^2$. Repeated application of Lemma 3.5.5 shows that their product
is also the sum of two squares. Finally, multiplying by $b^2$ preserves this as
the sum of two squares.
Conversely, suppose that $n = x^2 + y^2$, and that $p$ is a prime factor of $a$. Let
$p^k$ be the largest power of $p$ which divides both $x$ and $y$. Set $X = x/p^k$,
$Y = y/p^k$, and $N = n/p^{2k} = X^2 + Y^2$. Then since an odd power of $p$ divides
$n$, $N$ is still a multiple of $p$, but $X$ and $Y$ are not both multiples of $p$, and
hence neither is. By Theorem 3.5.3, $p = 2$ or $p \equiv 1 \pmod 4$. □
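
The theorem and the lemma behind it can be checked numerically. The sketch below compares the criterion with a brute-force search and shows one application of the identity from Lemma 3.5.5; `compose`, `brute_force` and `criterion` are hypothetical names chosen for this illustration.

```python
from math import isqrt

def compose(u, v):
    """Lemma 3.5.5: from u = (a, b) and v = (x, y) build (ax + by, ay - bx)."""
    (a, b), (x, y) = u, v
    return (a * x + b * y, a * y - b * x)

def brute_force(n):
    """True if n = x^2 + y^2 for some integers x, y >= 0."""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))

def criterion(n):
    """True if the square-free part of n has no prime factor congruent to 3 mod 4."""
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e % 2 == 1 and p % 4 == 3:
            return False
        p += 1
    return n % 4 != 3      # what remains is 1 or a single prime factor

print(compose((1, 2), (2, 3)))     # 5 = 1^2 + 2^2 and 13 = 2^2 + 3^2 combine to 65
print([n for n in range(1, 30) if criterion(n)])
```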

3.5.7. Example. As a second application, let us consider a Diophantine
equation:
$$ x^2 + 4 = z^3. $$
It is convenient to work in $\mathbb{Z}[\sqrt{-1}]$ rather than in the integers because
$x^2 + 4 = z^3$ factors to obtain
$$ z^3 = (x + 2i)(x - 2i). $$
First suppose that each of $x \pm 2i$ is a cube, so that there are integers $a$ and
$b$ with
$$ x + 2i = (a + bi)^3 = (a^3 - 3ab^2) + i(3a^2 b - b^3). $$
Hence,
$$ (3a^2 - b^2)b = 2. $$
Since $b$ divides 2, it must be $\pm 1$ or $\pm 2$. Checking each case provides the
solutions $a = \pm 1$, $b = 1$ or $-2$. Therefore $x = a^3 - 3ab^2 = \pm(1 - 3b^2) \in
\{\pm 2, \pm 11\}$. This yields the two positive solutions
$$ 2^2 + 4 = 2^3 \quad\text{and}\quad 11^2 + 4 = 5^3. $$
Let us show that these are the only solutions. Suppose that $(x, z)$ is a
positive solution. Let $p$ be a prime in $\mathbb{Z}[\sqrt{-1}]$ which divides $z$. If $p^m$ is the
greatest power of $p$ which divides $z$, then $p^{3m}$ is the greatest power of $p$
which divides $z^3$. If $p$ divides only one of $x \pm 2i$, say $x + 2i$, then $p^{3m}$,
which is a perfect cube, divides $x + 2i$.
However, $p$ might divide both $x \pm 2i$. In that case, it divides
$$ \gcd(x + 2i, x - 2i) = \gcd(x + 2i, 4). $$
Since $4 = -(1 + i)^4$, this means $p = 1 + i$. Now $p$ is associated to
$-i(1 + i) = 1 - i = \tilde{p}$. Thus if $p^k$ divides $x + 2i$, then $\tilde{p}^k$ divides
$(x + 2i)^\sim = x - 2i$. That is, the multiplicities of $p$ as a factor of $x + 2i$
and of $x - 2i$ are equal. Thus, $3m$ is even, say $3m = 6n$. Hence, $x \pm 2i$ are
both multiples of $p^{3n}$, which is also a perfect cube. It follows that except
for the factor of a unit, both $x \pm 2i$ are perfect cubes. But the units, $\pm i$
and $\pm 1$, are all perfect cubes. So, $x \pm 2i$ are both cubes. Therefore we
have found all the solutions.
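
A quick search confirms the conclusion of this example over a finite range. This is only a sanity check under an arbitrary bound on $z$, not a substitute for the argument above; `cube` and `solutions` are hypothetical helper names.

```python
from math import isqrt

def cube(a, b):
    """Real and imaginary parts of (a + bi)^3."""
    return (a**3 - 3 * a * b**2, 3 * a**2 * b - b**3)

def solutions(zmax):
    """All pairs (x, z) with x > 0, 1 <= z <= zmax and x^2 + 4 = z^3."""
    found = []
    for z in range(1, zmax + 1):
        x2 = z**3 - 4
        if x2 > 0 and isqrt(x2) ** 2 == x2:
            found.append((isqrt(x2), z))
    return found

print(cube(-1, 1), cube(-1, -2))   # the cubes that give x + 2i for x = 2 and x = 11
print(solutions(2000))
```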

Exercises

1. Factor 1105 completely in $\mathbb{Z}[\sqrt{-1}]$.
2. Solve the Diophantine equation $x^2 + 44^2 = z^6$.
3. Show that $\mathbb{Z}[\sqrt{-2}]$ has a division algorithm. Hence deduce that $\mathbb{Z}[\sqrt{-2}]$
has unique factorization.
4. Find all solutions of $x^2 + 2 = y^3$.
5. Give another argument to find all irreducible Pythagorean triples $(x, y, z)$,
i.e. $x^2 + y^2 = z^2$ and $\gcd(x, y) = 1$, as follows.
(a) Assume that $x$ is odd. Factor $x^2 + y^2 = (x + iy)(x - iy) = z^2$ in
$\mathbb{Z}[\sqrt{-1}]$. Prove that $x + iy$ is a square.
(b) Hence find a formula for $x$, $y$ and $z$.
(c) Verify that every triple of this form yields an irreducible Pythagorean
triple.
6. (Zagier) Let $p$ be a prime with $p \equiv 1 \pmod 4$. Define
$$ S = \{(x, y, z) \in \mathbb{N}^3 : x^2 + 4yz = p\}. $$
Also define $T : S \to S$ by
$$ T(x, y, z) = \begin{cases} (x + 2z,\ z,\ y - x - z) & \text{if } x < y - z \\ (2y - x,\ y,\ x - y + z) & \text{if } y - z < x < 2y \\ (x - 2y,\ x - y + z,\ y) & \text{if } x > 2y. \end{cases} $$
(a) Prove that $S$ is finite, $T(S) \subset S$ and $T \circ T = \mathrm{id}$.
(b) Prove that $T$ has a unique fixed point $(x_0, y_0, z_0) = T(x_0, y_0, z_0)$.
Deduce that $|S|$ is odd.
Hint: Note that a fixed point has the form $(x, x, z)$, which forces
$x \mid p$.
(c) Let $J(x, y, z) = (x, z, y)$. Show that $J(S) = S$. Using that $|S|$ is
odd, prove that $J$ has a fixed point.
(d) Deduce that $p$ is a sum of two squares.

7. Find all solutions of $x^2 + 11 = y^3$. You must work in $\mathbb{Z}[\sqrt{-11}]$, which is
a Euclidean domain.

3.6. Quadratic Reciprocity


Primitive roots can be used to analyze simple congruence equations. Recall
that $a$ is a primitive root modulo a prime $p$ if $\{a^k : 1 \le k \le p - 1\}$ represent
all $p - 1$ distinct non-zero equivalence classes (mod $p$). Thus every $x \in \mathbb{Z}_p^*$
has the form $x = a^k$ for some $k$. We can use this to solve certain congruence
equations.

3.6.1. Example. Consider the equation
$$ (\ddagger) \qquad x^6 \equiv 13 \pmod{17}. $$
Of course, trial and error works for such a small number. However, let
us instead make use of the fact that 3 is a primitive root of $\mathbb{Z}_{17}$ (because
$3^8 \equiv -1 \pmod{17}$). A calculation shows that $3^4 \equiv 13 \pmod{17}$. Equation
(‡) has a solution $x = 3^k$ if and only if
$$ x^6 \equiv 3^{6k} \equiv 3^4 \pmod{17}. $$
Hence $3^{6k-4} \equiv 1 \pmod{17}$. This occurs exactly when $6k \equiv 4 \pmod{16}$.
Since $\gcd(6, 16) = 2$ divides 4, this congruence has the solutions $k \equiv 6 \pmod 8$,
that is, $k \equiv 6, 14 \pmod{16}$. Thus the solutions of (‡) are $x \equiv 3^6 \equiv 15 \pmod{17}$ and
$x \equiv 3^6 \cdot 3^8 \equiv 2 \pmod{17}$.
On the other hand, consider the equation
$$ x^6 \equiv 3 \pmod{17}. $$
Again if we set $x = 3^k$, the equation becomes
$$ x^6 \equiv 3^{6k} \equiv 3^1 \pmod{17}. $$
A solution $x = 3^k$ would have to satisfy $6k \equiv 1 \pmod{16}$. This has no solutions
because $\gcd(6, 16) = 2$ does not divide 1.
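
The method of this example can be mirrored in a few lines. The sketch below assumes a primitive root `g` is supplied by the caller, builds a table of discrete logarithms by brute force (reasonable only for small primes), and then solves the linear congruence on exponents; the function name is hypothetical.

```python
def solve_power_congruence(n, b, p, g):
    """Solve x^n = b (mod p) for a prime p, given a primitive root g modulo p."""
    # Table of discrete logarithms: log[g^k mod p] = k for 1 <= k <= p - 1.
    log = {pow(g, k, p): k for k in range(1, p)}
    m = log[b % p]
    # x = g^k is a solution exactly when n*k = m (mod p - 1).
    return sorted(pow(g, k, p) for k in range(1, p) if (n * k - m) % (p - 1) == 0)

print(solve_power_congruence(6, 13, 17, 3))   # the two solutions found in the text
print(solve_power_congruence(6, 3, 17, 3))    # no solutions
```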

The general result along these lines is proved in the same way. The
added twist is that we obtain a condition that does not use primitive roots!
However, the existence of primitive roots is used in the proof.

3.6.2. Theorem. Let $p$ be a prime, let $n$ be a positive integer, and suppose
that $\gcd(b, p) = 1$. Set $s = \gcd(n, p - 1)$ and $t = (p - 1)/s$. Then the
congruence equation $x^n \equiv b \pmod p$ has solutions if and only if $b^t \equiv 1 \pmod p$. In
this case, there are $s$ distinct solutions modulo $p$.

Proof. Let $a$ be a primitive root mod $p$. Let $m$ be chosen so that $b \equiv
a^m \pmod p$. Then $x^n \equiv b \pmod p$ has a solution $x \equiv a^k$ if and only if
$x^n \equiv a^{nk} \equiv a^m$, which happens if and only if $nk \equiv m \pmod{p - 1}$. By
Theorem 2.6.1, this has solutions exactly when $s = \gcd(n, p - 1)$ divides $m$.
But $s \mid m$ if and only if $p - 1 \mid tm$. Since $a$ is a primitive root, $a^e \equiv 1 \pmod p$
exactly when $e$ is a multiple of $p - 1$. Thus our equation has a solution if
and only if
$$ 1 \equiv a^{tm} \equiv (a^m)^t \equiv b^t \pmod p. $$
Moreover, the solution of $nk \equiv m \pmod{p - 1}$ is unique modulo $t$; so that
there are exactly $s$ solutions modulo $p - 1$. Thus, when solutions exist, there
are exactly $s$ distinct solutions. □
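
The criterion of Theorem 3.6.2 avoids primitive roots entirely, so it is easy to compare against a brute-force count of roots. A minimal sketch, with hypothetical function names:

```python
from math import gcd

def solvable(b, n, p):
    """Criterion of the theorem: x^n = b (mod p) is solvable iff b^t = 1 (mod p)."""
    s = gcd(n, p - 1)
    t = (p - 1) // s
    return pow(b, t, p) == 1

def roots(b, n, p):
    """All solutions of x^n = b (mod p) with 1 <= x < p, found by trial."""
    return [x for x in range(1, p) if pow(x, n, p) == b % p]

p, n = 17, 6
for b in range(1, p):
    print(b, solvable(b, n, p), roots(b, n, p))
```

In each row, the list of roots is either empty or has exactly $\gcd(n, p - 1)$ entries.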
We apply this result for $n = 2$ and $p > 2$. Note that $s = \gcd(2, p - 1) = 2$;
whence $t = \frac{p-1}{2}$.

3.6.3. Corollary. A number $b$ not divisible by an odd prime $p$ is a square
modulo $p$ if and only if
$$ b^{(p-1)/2} \equiv 1 \pmod p. $$

3.6.4. Corollary. $x^2 \equiv -1 \pmod p$ has a solution if and only if $p = 2$
or $p \equiv 1 \pmod 4$.

Proof. For $p = 2$, $1^2 = 1 \equiv -1 \pmod 2$. For $p > 2$, write $p = 4n + e$ where
$e \in \{1, 3\}$. The previous corollary shows that $-1$ is a square modulo $p$ if
and only if $(-1)^{(p-1)/2} \equiv 1 \pmod p$. However
$$ (-1)^{(p-1)/2} \equiv (-1)^{(e-1)/2} \equiv \begin{cases} 1 \pmod p & \text{if } e = 1 \\ -1 \pmod p & \text{if } e = 3. \end{cases} $$
That is, $-1$ is a square if and only if $p = 2$ or $p \equiv 1 \pmod 4$. □
Gauss was interested in the problem of deciding when a number b was
congruent to a square modulo a prime p. He gave an elegant solution which
allows the calculations to be carried out easily by hand. The key result
became known as the Law of Quadratic Reciprocity. This was one of Gauss’s
most celebrated theorems.

3.6.5. Definition. The quadratic residue of $a$ modulo a prime $p$ is 1
if $a$ is a square modulo $p$, and $-1$ if it is not. It is denoted by $\left(\frac{a}{p}\right)$.
The corollary above shows that $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod p$. Hence it
follows that
$$ \left(\frac{ab}{p}\right) \equiv (ab)^{(p-1)/2} \equiv a^{(p-1)/2} b^{(p-1)/2} \equiv \left(\frac{a}{p}\right)\left(\frac{b}{p}\right) \pmod p. $$
In other words, the quadratic residue is multiplicative. So in order to do
computations, it suffices to know $\left(\frac{q}{p}\right)$ when $p$ and $q$ are primes. This is the
content of Gauss’s famous theorem, which we prove below.
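
The congruence $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod p$ gives a direct way to compute the symbol. The sketch below assumes $p$ is an odd prime and $a$ is not divisible by $p$; `legendre` is a hypothetical name, and the multiplicativity is simply checked for one prime.

```python
def legendre(a, p):
    """The symbol (a/p) for an odd prime p and a not divisible by p, via a^((p-1)/2) mod p."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

p = 23
squares = {x * x % p for x in range(1, p)}
print(sorted(a for a in range(1, p) if legendre(a, p) == 1) == sorted(squares))
print(all(legendre(a * b, p) == legendre(a, p) * legendre(b, p)
          for a in range(1, p) for b in range(1, p)))
```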

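3.6.6 Law of Quadratic Reciprocity. Suppose that $p$ and $q$ are distinct odd
primes. Then
$$ (1) \qquad \left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod 8 \\ -1 & \text{if } p \equiv \pm 3 \pmod 8. \end{cases} $$
$$ (2) \qquad \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}} = \begin{cases} 1 & \text{if } p \equiv 1 \pmod 4 \text{ or } q \equiv 1 \pmod 4 \\ -1 & \text{if } p \equiv q \equiv 3 \pmod 4. \end{cases} $$

The quantity $\frac{p^2-1}{8}$ appears here. If $p = 8a \pm 1$, then $\frac{p^2-1}{8} = \frac{64a^2 \pm 16a}{8}$ is
even; and if $p = 8a \pm 3$, then $\frac{p^2-1}{8} = \frac{64a^2 \pm 48a + 8}{8}$ is odd.
The following computational lemma will calculate $\left(\frac{a}{p}\right)$ in a different
way. The proof is tricky.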

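3.6.7. Lemma. Let $p$ be an odd prime and $a$ be relatively prime to $p$. Let
$0 < r_i < p$ be such that $ai \equiv r_i \pmod p$ for $1 \le i \le \frac{p-1}{2}$. Let
$$ n = \left|\left\{ i : 1 \le i \le \tfrac{p-1}{2} \text{ and } r_i > \tfrac{p}{2} \right\}\right| \quad\text{and}\quad N = \sum_{i=1}^{(p-1)/2} \left\lfloor \frac{ia}{p} \right\rfloor. $$
Then
$$ \left(\frac{a}{p}\right) = (-1)^n. $$
Furthermore, $N \equiv n + (a - 1)\frac{p^2-1}{8} \pmod 2$. In particular, if $a$ is odd, then
$$ \left(\frac{a}{p}\right) = (-1)^N. $$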

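Proof. Let $b_1, \dots, b_m$ be the $r_i < \frac{p}{2}$, and let $c_1, \dots, c_n$ be the $r_i > \frac{p}{2}$. Then
$m + n = \frac{p-1}{2}$. Observe that if $1 \le i, j \le \frac{p-1}{2}$,
$$ r_i \pm r_j \equiv a(i \pm j) \equiv 0 \pmod p \iff i \pm j \equiv 0 \pmod p \implies i = j. $$
Therefore $b_1, \dots, b_m, p - c_1, \dots, p - c_n$ are distinct. Since $m + n = \frac{p-1}{2}$ and
the $b_i$ and $p - c_j$ all lie between 1 and $\frac{p-1}{2}$, we see that
$$ \{b_1, \dots, b_m, p - c_1, \dots, p - c_n\} = \{1, \dots, \tfrac{p-1}{2}\}. $$
Thus,
$$ \left(\tfrac{p-1}{2}\right)! = \prod_{i=1}^{m} b_i \cdot \prod_{j=1}^{n} (p - c_j) \equiv (-1)^n \prod_{i=1}^{m} b_i \cdot \prod_{j=1}^{n} c_j \equiv (-1)^n \prod_{i=1}^{(p-1)/2} (ia) = (-1)^n a^{(p-1)/2} \left(\tfrac{p-1}{2}\right)! \pmod p, $$
and hence
$$ \left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \equiv (-1)^n \pmod p. $$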
 
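We now turn to the computation of $N$. First observe that $ia = p\left\lfloor \frac{ia}{p} \right\rfloor + r_i$.
Thus, we have
$$ \sum_{i=1}^{(p-1)/2} r_i = \sum_{i=1}^{(p-1)/2} \left( ia - p\left\lfloor \frac{ia}{p} \right\rfloor \right) = a \cdot \frac{1}{2} \cdot \frac{p-1}{2}\left(\frac{p-1}{2} + 1\right) - Np = a\,\frac{p^2 - 1}{8} - Np. $$
On the other hand, working mod 2, we have
$$ \sum_{i=1}^{(p-1)/2} r_i \equiv \sum_{i=1}^{m} b_i + \sum_{j=1}^{n} \bigl((p - c_j) - p\bigr) = -np + \sum_{i=1}^{(p-1)/2} i = -np + \frac{p^2 - 1}{8} \pmod 2. $$
Therefore, the two quantities just computed are equal modulo 2; whence
$$ N - n \equiv (N - n)p \equiv (a - 1)\frac{p^2 - 1}{8} \pmod 2. $$
When $a$ is odd, $N \equiv n \pmod 2$; and so $\left(\frac{a}{p}\right) = (-1)^N$. □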
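
Both counts in Lemma 3.6.7 are easy to compute for small primes, which gives a concrete check of the two parity statements. A minimal sketch with hypothetical names, using the power $a^{(p-1)/2}$ modulo $p$ as the reference value of the symbol:

```python
def euler_symbol(a, p):
    """Reference value of (a/p): +1 if a^((p-1)/2) = 1 (mod p), else -1."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def gauss_counts(a, p):
    """Return (n, N) from the lemma for an odd prime p and 1 <= a < p."""
    half = (p - 1) // 2
    residues = [(a * i) % p for i in range(1, half + 1)]    # the r_i
    n = sum(1 for r in residues if 2 * r > p)               # how many r_i exceed p/2
    N = sum((a * i) // p for i in range(1, half + 1))       # sum of floor(ia/p)
    return n, N

for a in (2, 3, 5):
    n, N = gauss_counts(a, 13)
    print(a, n, N, euler_symbol(a, 13), (-1) ** n)
```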

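3.6.8. Theorem. Let $p$ be an odd prime. Then
$$ \left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod 8 \\ -1 & \text{if } p \equiv \pm 3 \pmod 8. \end{cases} $$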

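Proof. By Lemma 3.6.7 with $a = 2$ (so that $r_i = 2i$), we must count the number
$n$ of elements in the set $\{2, 4, 6, \dots, p - 1\}$ which are greater than $\frac{p}{2}$. If
$p \equiv 3 \pmod 4$, then the smallest such even integer is $\frac{p+1}{2}$; and if $p \equiv 1 \pmod 4$,
then the smallest such element is $\frac{p+3}{2}$.
We first consider the case $p \equiv 1 \pmod 4$. Then
$$ n = 1 + \frac{p - 1 - \frac{p+3}{2}}{2} = \frac{p - 1}{4} \equiv \begin{cases} 0 \pmod 2 & \text{if } p \equiv 1 \pmod 8 \\ 1 \pmod 2 & \text{if } p \equiv 5 \pmod 8. \end{cases} $$
Similarly, when $p \equiv 3 \pmod 4$,
$$ n = 1 + \frac{p - 1 - \frac{p+1}{2}}{2} = \frac{p + 1}{4} \equiv \begin{cases} 1 \pmod 2 & \text{if } p \equiv 3 \pmod 8 \\ 0 \pmod 2 & \text{if } p \equiv 7 \pmod 8. \end{cases} $$
Therefore $\left(\frac{2}{p}\right) = 1$ if $p \equiv \pm 1 \pmod 8$ and $\left(\frac{2}{p}\right) = -1$ if $p \equiv \pm 3 \pmod 8$.
Thus, $\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8}$. □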
We are now ready to prove the Law of Quadratic Reciprocity.

Proof of Theorem 3.6.6. The first statement was established above in
Theorem 3.6.8. For the second statement, let
$$ N = \sum_{i=1}^{(p-1)/2} \left\lfloor \frac{iq}{p} \right\rfloor \quad\text{and}\quad M = \sum_{j=1}^{(q-1)/2} \left\lfloor \frac{jp}{q} \right\rfloor. $$
Since $p$ and $q$ are odd, Lemma 3.6.7 shows that
$$ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{M+N}. $$
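Consider the rectangle
$$ R = \{(x, y) \in \mathbb{Z}^2 : 1 \le x \le \tfrac{p}{2},\ 1 \le y \le \tfrac{q}{2}\}. $$
Notice that
$$ |R| = \left\lfloor \frac{p}{2} \right\rfloor \left\lfloor \frac{q}{2} \right\rfloor = \frac{p-1}{2} \cdot \frac{q-1}{2} = \frac{(p-1)(q-1)}{4}. $$
By counting $|R|$ in a different way, we will see it is also equal to the quantity
$M + N$. Consider the line $L \subset \mathbb{R}^2$ defined by the equation $y = \frac{q}{p}x$. Since $p$
and $q$ are distinct primes, we see that $L \cap R = \emptyset$. Divide $R$ into two triangles
$$ T_1 = \{(x, y) \in R : y < \tfrac{q}{p}x\} \quad\text{and}\quad T_2 = \{(x, y) \in R : x < \tfrac{p}{q}y\}. $$
Then $|R| = |T_1| + |T_2|$. For each $1 \le i \le \frac{p-1}{2}$,
$$ \left|\{(i, y) : 1 \le y \le \tfrac{q}{2}\} \cap T_1\right| = \left|\{y \in \mathbb{Z} : 1 \le y < \tfrac{iq}{p}\}\right| = \left\lfloor \frac{iq}{p} \right\rfloor. $$
Hence
$$ |T_1| = \sum_{i=1}^{(p-1)/2} \left\lfloor \frac{iq}{p} \right\rfloor = N. $$
Similarly, for each $1 \le j \le \frac{q-1}{2}$,
$$ \left|\{(x, j) : 1 \le x \le \tfrac{p}{2}\} \cap T_2\right| = \left|\{x \in \mathbb{Z} : 1 \le x < \tfrac{jp}{q}\}\right| = \left\lfloor \frac{jp}{q} \right\rfloor. $$
Thus, $|T_2| = M$. Therefore $N + M = |R| = \frac{(p-1)(q-1)}{4}$, and so
$$ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{M+N} = (-1)^{\frac{(p-1)(q-1)}{4}}. $$
This finishes the proof. □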
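
The lattice-point count at the heart of this proof can be verified directly for small primes. In the sketch below, `floor_sum` computes the sums called $N$ and $M$ in the proof, and `legendre` is a hypothetical helper that evaluates the symbol through the power $a^{(p-1)/2}$ modulo $p$.

```python
def floor_sum(p, q):
    """Sum of floor(i*q/p) for 1 <= i <= (p-1)/2: lattice points of R below y = (q/p)x."""
    return sum((i * q) // p for i in range(1, (p - 1) // 2 + 1))

def legendre(a, p):
    """The symbol (a/p) for an odd prime p not dividing a."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

for p, q in [(3, 5), (3, 7), (7, 11), (13, 17)]:
    N, M = floor_sum(p, q), floor_sum(q, p)
    print(p, q, N + M, (p - 1) * (q - 1) // 4, legendre(p, q) * legendre(q, p))
```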

Exercises

1. Determine if 107 is a quadratic residue modulo 1009.


2. Determine if 20964 is a quadratic residue modulo 1987.
3. Find all solutions to the equation $x^5 \equiv 29 \pmod{61}$.
4. Without using Theorem 3.6.6, show that for every prime p, at least one
of −1, 2 and −2 is a square modulo p.
5. Let p be a prime and let a, b ∈ N with p not dividing a or b. Show that
exactly 1 or 3 of a, b, ab are squares modulo p.
6. For prime $p \ge 7$, show that there are always two consecutive quadratic
residues mod $p$ neither of which is zero.
7. Let $p$ be a prime in $\mathbb{Z}$ and suppose 5 is not prime in $\mathbb{Z}[\sqrt{p}]$. Prove that
$p = 5$ or $p \equiv \pm 1 \pmod 5$.
8. (a) If $p$ is an odd prime, show that
$$ \left(\frac{3}{p}\right) = \begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod{12} \\ -1 & \text{if } p \equiv \pm 5 \pmod{12}. \end{cases} $$
(b) Find a similar formula for $\left(\frac{5}{p}\right)$.
9. (a) Let $p$ be an odd prime. Consider the equation
$$ ax^2 + bx + c \equiv 0 \pmod p $$
where $p$ does not divide $a$. Let $d = b^2 - 4ac$ and $y = 2ax + b$. Reduce
this equation to $y^2 \equiv d \pmod p$ and hence obtain a quadratic
formula modulo $p$.
(b) Find a necessary and sufficient condition for this quadratic to have
a root when $p = 2$.

Notes on Chapter 3
Diophantine equations are named after the Greek mathematician Diophan-
tus of the 3rd century CE. Linear Diophantine equations were discussed in
the notes in Chapter 2.
Much earlier, in the 6th century BCE, Pythagoras gave examples of
integral Pythagorean triples and produced an infinite family. Later Plato
found another non-trivial infinite family. Independently the Hindu scholars
also found similar families. Around 300 BCE, Euclid gave more general
families of solutions in his Elements. Many schools of mathematics around
the world eventually solved this problem.
Fermat studied ways of representing numbers as sums of squares, cubes,
etc. He showed that a prime p ≡ 1 (mod 4) is a sum of two squares in a
unique way. He also knew that if n has a prime factor p ≡ 3 (mod 4) to an
odd power, then it is not a sum of two squares. The final form was due to
Euler.
Fermat wrote about his equation xn + y n = z n in his notes. However in
communications with others, he did not claim a solution. He did show the
impossibility of x4 + y 4 = z 2 . He may also have had a solution for n = 3,
since he challenged other mathematicians to solve it, although there is no
record of his solution. Euler solved n = 3. Legendre and Dirichlet solved
n = 5. The case n = 7 was due to Lamé, and was simplified by Lebesgue.
The first significant general theorem was due to Sophie Germain. Kummer
developed ideas of modern ring theory in order to analyze the failure of
unique factorization in various number domains. He used this to provide a
proof for all regular primes. The smallest cases remaining open after that
were 37 and 59.
Mordell made a conjecture in the 1920’s which, if true, would imply that
equations like Fermat’s for n ≥ 3 could have at most finitely many solutions.
This was proved by Faltings in 1983, and he received the Fields medal for this
work. By 1993, computers had been used to show that Fermat’s equation
had no solutions for n ≤ 4 000 000. In 1955, Shimura and Taniyama proposed
a conjecture concerning elliptic curves and modular forms. It was later
shown by work of Ribet that this conjecture implies the truth of Fermat’s
claim. In 1993, Wiles announced a solution to a major case of this conjecture
which implied Fermat’s last theorem. It turned out that there was a non-
trivial gap which was later fixed by Wiles and Taylor. Breuil, Conrad,
Diamond and Taylor proved the full Shimura–Taniyama Conjecture in 2001.
Pell’s equation also has a long history back to antiquity. Bhaskara gave
a general method for solution in 1150. He explicitly solved $x^2 - 61y^2 = 1$,
giving the smallest solution x = 1 766 319 049 and y = 226 153 980.
Lagrange proved that Bhaskara’s method worked in 1768. He later developed
a complete solution using continued fractions. Fermat found a solution for
d ≤ 150 and challenged British mathematicians to solve the cases d = 151 or
d = 313. This was done by Brouncker. However Euler mistakenly attributed
this to Pell, and his name has stuck in spite of it being incorrect.
The quadratic reciprocity laws were conjectured by Euler and Legendre.
Legendre made substantial progress on the problem and introduced the
Legendre symbol $\left(\frac{p}{q}\right)$. Gauss published a complete solution in his treatise
Disquisitiones Arithmeticae, written in 1798 and published in 1801.
See [9, Vol.II] for an extensive history of these problems, or consult the
books by Cooke [8] and Kleiner [20]. See the article [22] for more informa-
tion about the work of Sophie Germain. See Ribenboim’s book Fermat’s last
theorem for amateurs [30] for more information on Fermat’s last theorem.
See Stark [37] for the solution to Pell’s equation using continued fractions.
Hardy and Wright [15] also contains much historic information, as well as
the mathematics including a proof of the law of quadratic reciprocity.
Chapter 4

Codes and Factoring


In this chapter, we will look at a code based on the number theoretic prop-
erties that we have developed. Since this code depends on the fact that it
is a lot easier to find big primes than to factor large numbers, we will also
study how this is done.

4.1. Codes
Codes are a way of encrypting a message so that it is very difficult or
impossible to read the message unless you have knowledge of the key which
unlocks the message. The most familiar codes are simple substitution codes.
This means that each letter is replaced with another one. For example,
consider the permutation of the alphabet given by
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
5936071842KSRIUFHQPOWELJGYTADZMVXNBC
A message like ‘Houston airport, noon, Jan 22’ would become
‘QGMDZGJKPAYGAZJGGJOKJ33’.
This kind of code is very easy to break with the aid of a computer. In fact,
with a longer message, it can be done by hand and is a popular pastime for
many people. For example, this message has 5 G’s, so one might think this
is a vowel.
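
The substitution above is short enough to try out directly. The sketch below assumes that characters outside the 36-letter alphabet (spaces, commas) are simply dropped before encoding, which matches the printed example; the function names are hypothetical.

```python
from collections import Counter

PLAIN = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODED = "5936071842KSRIUFHQPOWELJGYTADZMVXNBC"

def encode(message):
    """Apply the substitution to the letters and digits of the message."""
    kept = "".join(ch for ch in message.upper() if ch in PLAIN)
    return kept.translate(str.maketrans(PLAIN, CODED))

def decode(coded):
    """Undo the substitution by applying the inverse permutation."""
    return coded.translate(str.maketrans(CODED, PLAIN))

secret = encode("Houston airport, noon, Jan 22")
print(secret)
print(Counter(secret).most_common(3))    # letter frequencies, the usual first line of attack
print(decode(secret))
```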
Actually, computers routinely use codes all the time, not for secrecy,
but because computers can only store numbers (base 2). The ASCII code
provides a number from 0 to 127 (0 to 255 in its 8-bit extensions) for all
digits, upper and lower case letters, and many other symbols. This is how
the computer can store text, and how word processors can manipulate it. The
modern UTF-8 system extends ASCII and encodes over 1,000,000 characters
containing all major alphabets in the world. A character uses up to 4 bytes
(32 bits), so there are at most $2^{32}$ bit patterns. Since there is extra room
in this system, certain bits are reserved as markers, which makes many
transmission errors detectable. This is another important use of codes that
helps ensure accurate transmission of digital data.

It is much more difficult to break the code known as a ‘one time pad’.
The idea is to code your message by using another message known to the
encoder and the intended recipient. First we need a simple way to combine
two letters into one. Let us use a 36 letter alphabet consisting of 26 letters
and 10 decimal digits. We can think of each letter as representing an element
in $\mathbb{Z}_{36}$. That is,
0 1 2 3 4 5 6 7 8 9 A B C D E F G H I
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

J K L M N O P Q R S T U V W X Y Z
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
Then two letters can be combined by addition modulo 36. Let us use the
message ‘The quick brown fox jumped over the lazy dog’ to encode our
message ‘Pizza 5:30 tonight’. Consider
P I Z Z A 5 3 0 T O N I G H T
T H E Q U I C K B R O W N F O
I Z D P 4 N F K 4 F B E 3 W H
To decode this, one needs to know the coding message. If this message is
changed every time, for example using different pages of War and Peace
each time, this is virtually impossible to break without stealing the code.
However, if both the sender and the recipient always have their copy of War
and Peace with them, it might be a giveaway.
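
Addition modulo 36 is simple to implement. The sketch below assumes, as in the worked example, that characters outside the 36-letter alphabet are discarded and that the key text is at least as long as the message; the function names are hypothetical. Note that the fourteenth symbol is H + F = 17 + 15 = 32, which is W.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def clean(text):
    """Keep only the 36 letters and digits, in upper case."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)

def pad_encode(message, key):
    """Add message and key symbols position by position in Z_36."""
    return "".join(ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % 36]
                   for m, k in zip(clean(message), clean(key)))

def pad_decode(coded, key):
    """Subtract the key symbols to recover the message."""
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 36]
                   for c, k in zip(coded, clean(key)))

key = "The quick brown fox jumped over the lazy dog"
coded = pad_encode("Pizza 5:30 tonight", key)
print(coded)
print(pad_decode(coded, key))
```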
One thing these two codes have in common with each other and most
other codes is that encoding and decoding use the same information. An-
other kind of code has been invented which is of quite a different character.
Known as public key cryptography, the interesting thing about these codes
is that the method for encryption can be made public. For example, the
code can be published in the New York Times or be listed on an electronic
bulletin board. Anyone can send you a coded message. The important point
is that knowing how to encode does not tell you how to decode!

4.2. The Rivest-Shamir-Adleman Scheme

The public key code that we will study was developed at MIT by Rivest,
Shamir and Adleman. The key point that makes their code secure is that
it is very easy to find large primes (say 200–300 decimal digits), but very
difficult to factor large numbers that have a small number of large prime
factors. The reason for this will be discussed in the next section.
Here is how it works. Pick two large primes p and q, with 200–300
digits. Set n = pq, and notice that ϕ(n) = (p − 1)(q − 1). Now pick another
number r (say 6–10 digits) which is relatively prime to ϕ(n). Publish the
two numbers (n, r).
Anyone who wishes to send you a message does the following. First, the
sender turns the message into a number $M$ by some standard scheme such as
ASCII or any simple scheme that encodes the 36 characters 0–9 and A–Z as a
two digit number, possibly also including a–z and some punctuation marks. If
necessary, the message is split into blocks so the numbers encoded are all
less than $n$. The coded message is
$$ C \equiv M^r \pmod n. $$
This message can now be published in the personals section of the New York
Times, or posted somewhere online.
The presumption is that all interested parties know the method of
encoding and the coded message sent. Nevertheless, it is secure! Only you
can break the code. To do this, you must know $p$ and $q$. And you must know the
Chinese Remainder Theorem. First, solve the equation
$$ rs \equiv 1 \pmod{\varphi(n)}. $$
This has a unique solution by Lemma 2.3.3. Of course, to find $s$ it is
necessary to know $\varphi(n)$, and to find it, one must factor $n$. The key is Euler’s
Theorem, which tells us that $M^{\varphi(n)} \equiv 1 \pmod n$ when $\gcd(M, n) = 1$. In
fact, since $n$ is the product of two distinct primes, it turns out that our
decoding method works for every $M$ in the interval $0 < M < n$. Since $rs \equiv 1
\pmod{\varphi(n)}$, it can be written as $rs = 1 + k\varphi(n)$. Now compute $C^s \pmod n$
using the Chinese Remainder Theorem.
$$ C^s \equiv M^{rs} \equiv M^1 (M^{p-1})^{k(q-1)} \equiv M \pmod p $$
$$ C^s \equiv M^{rs} \equiv M^1 (M^{q-1})^{k(p-1)} \equiv M \pmod q $$
By the Chinese Remainder Theorem, $C^s \equiv M \pmod n$. Of course this only
finds $M$ up to a multiple of $n$. That is why we begin with a message such
that $0 < M < n$.
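
The whole scheme fits in a few lines once the primes are chosen. The sketch below uses deliberately tiny primes (61 and 53) so the numbers can be followed by hand; these values and the function names are illustrative assumptions, and real keys use primes with hundreds of digits. The modular inverse via `pow(r, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd

def make_keys(p, q, r):
    """Return (n, s): the public modulus and the private decoding exponent."""
    n, phi = p * q, (p - 1) * (q - 1)
    assert gcd(r, phi) == 1, "r must be relatively prime to phi(n)"
    s = pow(r, -1, phi)            # solves r*s = 1 (mod phi(n))
    return n, s

def encode(M, r, n):
    """C = M^r mod n, using only the published pair (n, r)."""
    return pow(M, r, n)

def decode(C, s, n):
    """M = C^s mod n, using the private exponent s."""
    return pow(C, s, n)

n, s = make_keys(61, 53, 17)
C = encode(65, 17, n)
print(n, s, C, decode(C, s, n))
```

Decoding recovers every $M$ with $0 \le M < n$, including multiples of $p$ or $q$, as claimed in the text.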
If you have access to symbolic manipulation software, try to design
your own codes. Exchange messages with a friend, and decode them. Try
using these same programs to break your friend’s code.
The message is as secure as the difficulty of factoring large numbers
(not practical) and the security of the storage location of the key s. (It
is not necessary to remember p and q.) The latter consideration does not
have anything to do with coding though. Of course, if everyone knows your
encoding procedure, what prevents them from sending you a message and
signing another name? How can you be sure that the message is really from
your friend? The trick is for the sender to use his own code to give the
message a signature.
It works like this: suppose your code is $(n, r)$ and the sender has a
published code $(N, R)$. Let us also suppose that $N < n$. Only the sender
knows the decoding key $S$ for the $(N, R)$ code. The sender computes
$$ Q \equiv M^S \pmod N \quad\text{and}\quad 0 \le Q < N. $$
Then this is encoded by the $(n, r)$ code by
$$ C \equiv Q^r \pmod n. $$
Again $C$ is sent. To decode, you compute
$$ Q \equiv C^s \pmod n. $$
Fortunately, since $N < n$, we know that $0 \le Q < N < n$ without any ambiguity.
Now using the sender’s published code, compute
$$ M \equiv Q^R \pmod N. $$
This message must be from our friend because only he/she knows $S$ which
enabled the encoding in the first place.
What happens if $N > n$? Try it out on a computer. You will end up
with garbage. In this case you must encode using $n$ first:
$$ Q \equiv M^r \pmod n, \qquad C \equiv Q^S \pmod N. $$
This is decoded in the same basic way.
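
The ordering rule (apply the smaller modulus first) can be sketched as follows. The keys are toy values chosen for illustration only: a sender code $(N, R) = (3233, 17)$ with private exponent 2753, and a recipient code $(n, r) = (8633, 5)$ built from the primes 89 and 97. The function names are hypothetical.

```python
def sign_then_encrypt(M, S, N, r, n):
    """Sender side: S is the sender's private exponent, (n, r) the recipient's public code."""
    if N < n:
        return pow(pow(M, S, N), r, n)    # sign with S first, then encode with r
    return pow(pow(M, r, n), S, N)        # otherwise encode with r first, then sign

def decrypt_then_verify(C, s, n, R, N):
    """Recipient side: s is the recipient's private exponent, (N, R) the sender's public code."""
    if N < n:
        return pow(pow(C, s, n), R, N)
    return pow(pow(C, R, N), s, n)

N, R, S = 3233, 17, 2753                  # sender's code, primes 61 and 53
n, r = 8633, 5                            # recipient's code, primes 89 and 97
s = pow(r, -1, 88 * 96)                   # recipient's private exponent (Python 3.8+)

C = sign_then_encrypt(1234, S, N, r, n)
print(C, decrypt_then_verify(C, s, n, R, N))
```

The message must be smaller than the smaller of the two moduli for the round trip to be unambiguous.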
We end this section by discussing two practical aspects of the Rivest-
Shamir-Adleman scheme. First, the encryption scheme involves raising $M$
to a potentially large power $r$. If one computes $M^r$ by naively multiplying
$M$ with itself $r$ times, this requires about $r$ computations, which is a large
number. Instead, the way one performs this computation in practice is to
expand $r$ in base 2, namely write
$$ r = a_0 + 2a_1 + 4a_2 + \dots + 2^k a_k $$
where each $a_i \in \{0, 1\}$. Then, by repeatedly squaring, one computes $M$,
$M^2$, $M^4 = (M^2)^2$, $M^8 = (M^4)^2$, $\dots$, $M^{2^k} = (M^{2^{k-1}})^2$ mod $n$. One then
computes the product
$$ M^r \equiv M^{a_0} (M^2)^{a_1} \cdots (M^{2^k})^{a_k} \pmod n. $$
In total, this requires very few computations. To obtain all powers $M^{2^j}$
for $j \le k$ involves $k$ squarings, and then obtaining $M^r$ involves
$a_0 + \dots + a_k - 1 \le k$ products. Thus, this is on the order of $2k \approx 2\log_2(r)$
computations, which is substantially faster than performing $r$ computations.
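
Repeated squaring can be written as a short loop over the binary digits of the exponent. The sketch below also counts the multiplications it performs, so the bound of roughly $2\log_2(r)$ can be observed; `power_mod` is a hypothetical name, and Python's built-in three-argument `pow` does the same job in practice.

```python
def power_mod(M, r, n):
    """Compute M^r mod n by repeated squaring; also return the number of multiplications."""
    result, square, count = 1 % n, M % n, 0
    while r > 0:
        if r & 1:                          # binary digit a_j = 1: multiply in M^(2^j)
            result = result * square % n
            count += 1
        r >>= 1
        if r:                              # more digits remain: square once more
            square = square * square % n
            count += 1
    return result, count

value, count = power_mod(7, 42385687, 10**9 + 7)
print(value == pow(7, 42385687, 10**9 + 7), count)
```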
Second, the Rivest-Shamir-Adleman scheme relies on choosing $n = pq$
with $p$ and $q$ large primes, which raises the question of how one obtains
large primes in practice. At the beginning of Section 1.4, we mentioned the
famous Prime Number Theorem, which asserts that for large $N$, there are
roughly $\frac{N}{\log(N)}$ prime numbers less than or equal to $N$. Said differently, if
we fix a large number $N$, the probability that a randomly chosen number
$m \in \{1, \dots, N\}$ is prime is roughly $\frac{1}{\log(N)}$. Therefore, if we choose $\log(N)$
numbers in $\{1, \dots, N\}$, there is a good chance that at least one of them is
prime. The chances are much higher if you avoid multiples of small primes.
In practice, one can test primality using the deterministic algorithm by
Agrawal-Kayal-Saxena developed in 2004 or the older Miller-Rabin
probabilistic test. Notice that even if $N$ is a large number with 500 digits,
$\log(N)$ is only about 1150, so finding large primes with this method is quite
practical for a computer.
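
The density estimate from the Prime Number Theorem can be compared with an exact count for moderate $N$. The sketch below uses a simple sieve of Eratosthenes; `prime_count` is a hypothetical name, and the range is far smaller than the sizes used for actual keys.

```python
from math import isqrt, log

def prime_count(N):
    """Number of primes <= N, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"
    for d in range(2, isqrt(N) + 1):
        if sieve[d]:
            # Cross out the multiples of d, starting at d*d.
            sieve[d * d :: d] = bytearray(len(range(d * d, N + 1, d)))
    return sum(sieve)

for N in (10**3, 10**4, 10**5):
    print(N, prime_count(N), round(N / log(N)))
```

The estimate $N/\log(N)$ undercounts somewhat in this range, but the ratio slowly approaches 1 as $N$ grows.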

Exercises

1. Show that s works as a key to decode the RSA encoded message provided
that
rs ≡ 1 (mod lcm(p − 1, q − 1)).
2. Use computer software to check that r = 42385687 and a number 2 lines
long:
n = 9187532068491850238012987000740627489892542940\
1183797214111268335816454459464037326759995364752417
has a decoder
s = 5697037877032797156343521223137628208530547872\
5834255953360930453245246857516891597701705638306003
3. Use computer software to choose two primes p, q with 40–45 decimal
digits, and construct an RSA code.
4. Exchange messages with a friend. Code your student id number or a
simple message with your code ‘signature’, then code it up using your
friend’s code.
5. Try to break your friend’s code.

4.3. Primality Testing


How do you tell if a large number is composite or prime without doing a lot
of trial divisions? It turns out that you may be able to show that a number
is composite without knowing any factors! In 2004, Agrawal-Kayal-Saxena
gave a groundbreaking efficient algorithm to determine if a number is prime.
Their algorithm uses the fact that if $n \ge 2$ and $\gcd(a, n) = 1$, then $n$ is prime
if and only if $(X + a)^n \equiv X^n + a^n \pmod n$, see Exercise 1. Checking this
particular congruence is not efficient. However they modify it in a way that
makes the problem tractable: if $(X + a)^n \equiv X^n + a^n \pmod n$ holds, then it
is also true that for all $r$, there are polynomials $f(X)$, $g(X)$ such that
$$ (4.3.1) \qquad (X + a)^n = (X^n + a^n) + (X^r - 1)g(X) + nf(X). $$
Indeed, we can simply take $g = 0$ and $f$ an appropriate polynomial. Agrawal-
Kayal-Saxena show a converse result: they prove that, for a suitable $r$ and a
suitable set of values of $a$, if (4.3.1) holds for some $f$ and $g$ for each such
$a$, then $n$ is a prime power.
Moreover, they make these choices in such a way that this equation can be
checked efficiently. Although the details of their algorithm are beyond the
scope of this course, in this section we highlight some other methods to test
primality.
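
The polynomial congruence underlying this algorithm can be checked coefficient by coefficient for small $n$. This sketch only illustrates the inefficient starting point, not the Agrawal-Kayal-Saxena algorithm itself; the function name is hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from math import comb, gcd

def binomial_congruence_holds(n, a=1):
    """True if (X + a)^n = X^n + a^n (mod n), compared coefficient by coefficient."""
    assert n >= 2 and gcd(a, n) == 1
    # The coefficient of X^k in (X + a)^n is C(n, k) * a^(n - k); the middle ones must vanish mod n.
    return all(comb(n, k) * pow(a, n - k, n) % n == 0 for k in range(1, n))

print([n for n in range(2, 60) if binomial_congruence_holds(n)])
```

The loop touches all $n - 1$ middle coefficients, which is exactly why this form of the test is impractical for large $n$.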

One guaranteed way to test if a number $p$ is prime is based on the results
of Section 2.10. We showed in Theorem 2.10.6 that if $p$ is prime, then there
is a number $a$ such that the set of powers $\{a, a^2, a^3, \dots, a^{p-1}\}$ modulo $p$ is
a permutation of the list $\{1, 2, 3, \dots, p - 1\}$. Conversely, the existence of
such an element guarantees that there are $p - 1$ different numbers relatively
prime to $p$. This means that $p$ is prime. Indeed, when $p$ is prime there are
$\varphi(p - 1)$ such generators. So chances of finding one by trial and error are
quite good. The problem, however, with this test is that if $p$ is large, it is
time-intensive to compute all powers of a number $a$. In this section, we
discuss more efficient algorithms to test primality.
Like the Rivest-Shamir-Adleman code, the key to a more efficient algorithm
comes from Fermat’s Theorem. Let us suppose that a large number
n is given. We know that if n is prime and a is not a multiple of n, then
$a^{n-1} \equiv 1 \pmod n$. So if $a^{n-1} \not\equiv 1 \pmod n$ for some such a, then n is
definitely composite. For example, if n = 2096004487, we can compute
$2^{n-1} \equiv 1992692247 \pmod n$. Hence n is composite. This does not tell us
much about how to factor it, however.
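The Fermat test above is easy to try with software. The following is a minimal sketch in Python, assuming only the built-in three-argument `pow` for modular exponentiation; the function name `fermat_test` is our own and not part of the text.

```python
def fermat_test(n: int, a: int = 2) -> bool:
    """Return True if a^(n-1) ≡ 1 (mod n), i.e. n passes the Fermat test for base a."""
    return pow(a, n - 1, n) == 1

# A prime always passes. A composite that fails is proved composite
# without any factor being found.
print(fermat_test(97))   # 97 is prime
print(fermat_test(91))   # 91 = 7 * 13
print(fermat_test(561))  # 561 is composite but still passes for a = 2

# The residue for the example n in the text can be recomputed directly.
n = 2096004487
print(pow(2, n - 1, n))
```

Note that a `True` answer proves nothing by itself, as the case 561 shows.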
There are some composite numbers which pass this test for all choices
of a which are relatively prime to n. Such numbers are called Carmichael
numbers. They are much less common than primes. For example, Erdős
showed that the sum of their reciprocals converges. However, recent results
have shown that they are nevertheless quite plentiful. An example is n =
561 = (3)(11)(17). Notice that if gcd(a, 561) = 1,
\[
\begin{aligned}
a^{560} &\equiv (a^{2})^{280} \equiv 1 \pmod{3} \\
a^{560} &\equiv (a^{10})^{56} \equiv 1 \pmod{11} \\
a^{560} &\equiv (a^{16})^{35} \equiv 1 \pmod{17}
\end{aligned}
\]
By the Chinese Remainder Theorem, one sees that $a^{560} \equiv 1 \pmod{561}$ for
all a relatively prime to 561. In fact, $a^{80}$ would suffice.
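The claim about 561 is small enough to confirm by brute force. This sketch (Python, standard library only; the variable names are ours) checks every a relatively prime to 561, and also checks that the exponent 80 works while 40 does not.

```python
from math import gcd

n = 561
units = [a for a in range(1, n) if gcd(a, n) == 1]

# Every unit passes the Fermat test, so 561 is a Carmichael number.
all_pass_560 = all(pow(a, 560, n) == 1 for a in units)
# 80 = lcm(2, 10, 16) already suffices, but 40 does not.
all_pass_80 = all(pow(a, 80, n) == 1 for a in units)
all_pass_40 = all(pow(a, 40, n) == 1 for a in units)

print(len(units), all_pass_560, all_pass_80, all_pass_40)
```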
Still, without any additional computation, it is possible to improve this
test. In our example, 560 = 16(35). Consider the computations
\[
\begin{aligned}
2^{35} &\equiv 263 \pmod{561} \\
2^{70} = (2^{35})^2 &\equiv 166 \pmod{561} \\
2^{140} = (2^{70})^2 &\equiv 67 \pmod{561} \\
2^{280} = (2^{140})^2 &\equiv 1 \pmod{561} \\
2^{560} = (2^{280})^2 &\equiv 1 \pmod{561}
\end{aligned}
\]
We see from this sequence of equations that 67 is a square root of 1 modulo
561. If 561 were a prime, there would be only two square roots, namely
±1. So this shows conclusively that 561 is composite. In this case, however,
information about the factors is revealed because
\[ 0 \equiv 67^2 - 1 = (67 - 1)(67 + 1) \pmod{561}. \]
So gcd(66, 561) = 33 and gcd(68, 561) = 17 are factors of 561.
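The chain of squarings for 561 can be reproduced in a few lines. This is an illustrative sketch in Python (the names `chain` and `root` are ours), using only `pow` and `math.gcd`.

```python
from math import gcd

n, a, m = 561, 2, 35          # 560 = 2**4 * 35
chain = [pow(a, m, n)]        # start with 2^35 mod 561
for _ in range(4):            # square four times to reach 2^560
    chain.append(chain[-1] ** 2 % n)
print(chain)                  # [263, 166, 67, 1, 1]

# 67 is a square root of 1 other than ±1, so it exposes factors of 561.
root = chain[2]
print(gcd(root - 1, n), gcd(root + 1, n))   # 33 17
```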
The general procedure, known as the Miller-Rabin test, uses this approach.
Moreover, it does not involve any more computation than it requires
to get $a^{n-1} \pmod n$. Pull out all factors of 2 from n − 1, say $n - 1 = 2^d m$ with m odd.
Now compute $a^m \pmod n$, and then successively square it to compute $a^{2m}$
(mod n), $a^{4m}$ (mod n), $a^{8m}$ (mod n), . . . , $a^{n-1}$ (mod n). If 1 does not occur
in this list, then n fails our earlier primality test. However, if $a^{n-1} \equiv 1 \pmod n$
and $a^m \not\equiv 1 \pmod n$, then there is a last term in this list
which is not 1. This will be a square root of 1 in $\mathbb{Z}_n$. If it is not −1, then n is
definitely composite. This is because $x^2 = 1$ has only the solutions x = ±1
in a field, but can have more solutions when n is composite.
It is also easy to check whether n has any small prime factors. The
computer can store the product P of all primes less than 1000. Compute
gcd(n, P ). If this is not 1, then n is composite. The composite numbers
which pass the Miller-Rabin test for half a dozen random choices of a are
quite rare. Indeed, a large number n which passes this test and has no
small prime factors is almost surely prime. Such numbers have been called
industrial grade primes. They are likely to be very hard to factor.
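The procedure of this section can be sketched in Python as follows. This is only an illustration under our own naming (`passes_miller_rabin`, `is_industrial_grade_prime`, `SMALL_PRIMES`, `P`); the choice of six random bases follows the "half a dozen" suggested above, and a `True` answer means "almost surely prime", not a proof.

```python
import random
from math import gcd

# Product P of all primes below 1000, used to screen for small prime factors.
SMALL_PRIMES = [p for p in range(2, 1000)
                if all(p % q for q in range(2, int(p ** 0.5) + 1))]
P = 1
for p in SMALL_PRIMES:
    P *= p

def passes_miller_rabin(n: int, a: int) -> bool:
    """One round for odd n > 2. False means n is certainly composite."""
    d, m = 0, n - 1
    while m % 2 == 0:              # write n - 1 = 2**d * m with m odd
        d, m = d + 1, m // 2
    x = pow(a, m, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(d - 1):         # a^(2m), a^(4m), ..., a^((n-1)/2)
        x = x * x % n
        if x == n - 1:             # the last term different from 1 is -1
            return True
    return False                   # a^(n-1) is not 1, or a bad square root of 1 occurs

def is_industrial_grade_prime(n: int, rounds: int = 6) -> bool:
    if n < 2:
        return False
    if n in SMALL_PRIMES:
        return True
    if gcd(n, P) != 1:             # n has a prime factor below 1000
        return False
    return all(passes_miller_rabin(n, random.randrange(2, n - 1))
               for _ in range(rounds))

print(passes_miller_rabin(561, 2))           # the Carmichael number is caught
print(is_industrial_grade_prime(2**61 - 1))  # a known Mersenne prime
```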

Exercises

1. Recall that $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is a positive integer. Prove that if n ≥ 2, then
n is prime if and only if $\binom{n}{k} \equiv 0 \pmod n$ for all 1 ≤ k ≤ n − 1. This
result plays an important role in the Agrawal-Kayal-Saxena algorithm.
2. Show that 3053 is not prime by finding a congruence identity that con-
tradicts primality. Do not factor it.
3. Show that 3876721 is not prime by finding a congruence identity that
contradicts primality. Do not factor it.
4. Show that 1729 is a Carmichael number. Find a congruence identity that
proves that n is not prime. (A factorization of n is not a satisfactory
substitute.)
5. Show that 5755495201 is a Carmichael number. Find a congruence iden-
tity which proves that n is not prime. (A factorization of n is not a
satisfactory substitute.) You may use computer software.
6. Show that if p = 6k + 1, q = 12k + 1, and r = 18k + 1 are all prime,
then pqr is a Carmichael number.
7. Korselt showed that a composite integer n is a Carmichael number if and
only if it is square free and for every prime p|n, one has (p − 1)|(n − 1).
Prove that if n has this form, then it is a Carmichael number.

4.4. Factoring Algorithms


If you wish to factor a large number using a computer, there are various
tricks you can try. No method known today can factor the product of two
primes with 200–300 digits before the end of the universe. Nevertheless,
methods and computers will continue to improve. However, experts feel
that it will always be significantly easier to find large primes than to factor
the product of two of them. So the security of our code is guaranteed if we
make our primes stay ahead of the factoring game.
However, most random numbers have small prime factors as well as large
ones. Any sensible factoring algorithm starts by taking the gcd of n with
the product of the first few primes. In this way, all factors less than, say
1000, may be pulled out. Then test what is left to see if it is composite. If
it seems prime, it almost surely is. So now you try to prove that it is prime.
If it is composite, the hard work begins. Unfortunately, it is known that on
average, numbers do not have very many factors (relative to their size). So
most factors are very big.
Most factoring schemes use quite sophisticated mathematics. Here is an
elementary idea that goes back to Lagrange. The idea is simple: try to find
non-trivial solutions of
\[ x^2 \equiv y^2 \pmod n. \]
By non-trivial solution, we mean $x \not\equiv \pm y \pmod n$. If n is composite, say
n = ab, then the solutions of
x − y ≡ a (mod n)
x + y ≡ b (mod n)
provide non-trivial solutions. Conversely, if x and y form a non-trivial solu-
tion, then gcd(x ± y, n) yield proper factors of n.
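As a small illustration of the last two sentences, here is a sketch in Python. The helper name `factors_from_squares` and the example n = 91 are our own choices, not taken from the text.

```python
from math import gcd

def factors_from_squares(x: int, y: int, n: int) -> tuple:
    """Given x^2 ≡ y^2 (mod n) with x not ≡ ±y (mod n), return two proper factors of n."""
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 are not congruent modulo n")
    if (x - y) % n == 0 or (x + y) % n == 0:
        raise ValueError("trivial solution: x ≡ ±y (mod n)")
    return gcd(x - y, n), gcd(x + y, n)

# For n = 91 = 7 * 13, solving x - y = 7 and x + y = 13 gives x = 10, y = 3,
# and indeed 10^2 - 3^2 = 91.
print(factors_from_squares(10, 3, 91))   # (7, 13)
```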
Lenstra and Pomerance have added some important new ideas to this
method. They hope that it will prove to be a better method than others
presently known. Their plan is to look for solutions of x ≡ y (mod n) so
that x and y are both products of only small primes. If enough solutions
are found, they can be used to construct a solution of $x^2 \equiv y^2 \pmod n$. Let
us illustrate this with a small example.
Let n = 493. A few trials yield the following equivalences (mod 493).
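\[
\begin{aligned}
-3 &\equiv 490 = 2 \cdot 5 \cdot 7^2 \\
7 &\equiv 500 = 2^2 \cdot 5^3 \\
32 = 2^5 &\equiv 525 = 3 \cdot 5^2 \cdot 7 \\
-7 &\equiv 486 = 2 \cdot 3^5
\end{aligned}
\]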
All of these equations contain only powers of −1, 2, 3, 5, and 7. Using
only the total parity of the exponents, we can represent these equations by
vectors. For example, the first equation has one − sign, and odd powers
of 2, 3, and 5; but an even power of 7. This yields the vector (1, 1, 1, 1, 0).
Altogether we obtain
(1, 1, 1, 1, 0)
(0, 0, 0, 1, 1)
(0, 1, 1, 0, 1)
(1, 1, 1, 0, 1)
In order to get squares, we wish to combine them so that all the parities are
even. Combining the first, second and fourth achieves this. Multiplying the
three equations together yields
\[ 3 \cdot 7^2 \equiv 2^4 \cdot 3^5 \cdot 5^4 \cdot 7^2. \]
Cancellation yields $1 \equiv 2^4 3^4 5^4$. This provides a solution to $x^2 \equiv y^2$ with
x = 1 and y = 900. Computing gcd(493, 901) = 17 and gcd(493, 899) = 29
provides a complete factorization.
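The bookkeeping in this example can be automated. The sketch below (Python; all names are ours) rebuilds the parity vectors, searches the subsets of relations for one with all parities even, and then extracts the factors. Instead of cancelling common factors as in the text, it uses the equivalent observation that if L ≡ R (mod n) and LR is a perfect square s², then L² ≡ s² (mod n); this is a design choice of the sketch and gives a different pair (x, y) but the same factors.

```python
from itertools import combinations
from math import gcd, isqrt, prod

n = 493
base = [2, 3, 5, 7]
relations = [(-3, 490), (7, 500), (32, 525), (-7, 486)]   # lhs ≡ rhs (mod n)

def parity_vector(lhs: int, rhs: int) -> list:
    """Parity of the sign and of the total exponent of each base prime in lhs * rhs."""
    value = lhs * rhs
    vec = [1 if value < 0 else 0]
    value = abs(value)
    for p in base:
        exponent = 0
        while value % p == 0:
            value //= p
            exponent += 1
        vec.append(exponent % 2)
    if value != 1:
        raise ValueError("relation does not factor over the base")
    return vec

def even_subset(rels):
    """Brute-force search for a subset whose parity vectors sum to zero mod 2."""
    for size in range(1, len(rels) + 1):
        for subset in combinations(rels, size):
            columns = zip(*(parity_vector(l, r) for l, r in subset))
            if all(sum(col) % 2 == 0 for col in columns):
                return subset
    return None

subset = even_subset(relations)
L = prod(l for l, _ in subset)
R = prod(r for _, r in subset)
s = isqrt(L * R)                  # L * R is a perfect square by construction
x, y = L % n, s % n
factors = sorted({gcd(x - y, n), gcd(x + y, n)})
print(subset, factors)
```

For large n one would replace the subset search by Gaussian elimination over the integers modulo 2.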

Exercises

1. Use computer software to find enough congruences to factor 1643 by the
method described in this section.
2. (Factoring algorithms and primality testing) Using computer software
commands for the gcd and mod, but not a complete factoring
command, interactively factor n = 21760197701640956578295160, and
report on the steps as you go along.
(i) Test for prime factors up to 1000 and factor them out. Let the large
factor remaining be called m.
(ii) Compute $3^{m-1} \pmod m$. What does this tell you?
(iii) One must use a brute force method to factor m. You may use that
999983 is a factor. Let the other factor be called q.
(iv) Repeat (i) for q − 1. This yields a prime factorization. Why?
(v) Prove that q is prime by finding a primitive root, say r.

Notes on Chapter 4
The use of codes for the purpose of secure transfer of information has a
long history. The primary uses were for military and political purposes, at
least initially, as these parties had great resources. During World War II,
codemaking and codebreaking were crucial parts of the war effort. This was
the beginning of the use of calculating machines, and led to the computer
revolution. The advent of computers has made the need for security in the
transmission of messages something that is of importance to all of us.
Computers also provided the means to use more sophisticated methods
both for encryption and the breaking of these codes. A central issue was
always how two parties could share information about a code that was safe
from prying eyes. A major breakthrough was made by Diffie, Hellman and
Merkle [10] which allowed a public exchange between two parties to agree
on a common key without revealing it to any eavesdropper. Diffie proposed
that one could develop an asymmetrical code with a public key for encryption
that only the constructor could decode. This was accomplished by Rivest,
Shamir and Adleman [32] at MIT in 1978. It has since come out that the
codebreaking division of GCHQ, the British signals intelligence agency, came
up with a similar method to that of Diffie-Hellman-Merkle almost a decade
earlier, but it was kept secret until recently. Since then, other methods have
been developed for public key codes.
Simon Singh’s book [36] is an interesting, non-technical introduction to
codes and codebreaking.
The use of codes to allow for accurate transmission over noisy signals
goes back to work of Hamming [16] in 1950. Nowadays, when large data files
such as computer operating systems and other software are routinely down-
loaded over the internet, the accuracy of transmission becomes as important
as security.
Computing the list of prime numbers goes back to ancient times. How-
ever the testing of large integers to decide if they are prime, primality testing,
is a modern idea relying on computers. The Miller-Rabin test [26, 29] dates
from the mid-1970’s. The first deterministic polynomial-time algorithm to test
for primality is due to Agrawal, Kayal and Saxena [1]. Carmichael numbers were
introduced by Carmichael in 1910. There are infinitely many such numbers [3],
but the sum of their reciprocals is finite [11]; so they are rare compared to
prime numbers.
Factoring of composite numbers is considered to be very difficult, which
is why the RSA scheme is thought to be secure. The modification of La-
grange’s ideas from Section 4.4 is due to Lenstra and Pomerance [23]. The
possibility of quantum computers and an algorithm of Peter Shor [33] would
make factoring practical, and would break the RSA code. Other algorithms
for encryption that are secure against quantum computation have been de-
veloped, but are not yet in widespread use.
Chapter 5

Real and Complex Numbers


In this chapter, we will learn about the fields of real and complex numbers.
In particular, we will prove the famous Fundamental Theorem of Algebra
which asserts that every polynomial with complex coefficients factors into a
product of linear terms.

5.1. Real Numbers


We learn in calculus that the rational numbers are not sufficient for the study
of functions. For example, a nice function like $x^2 - 2$ does not have any zeros
if it is only defined on the rationals. Nor, from the point of view of algebra,
are the rationals adequate because this polynomial does not factor. The
‘natural’ domain of this function should include $\sqrt{2}$. Similarly, the function
$x^4 - 8x$ does not achieve its minimum value at any rational number. It also
turns out that the study of simple differential equations like $y' = y$ leads to
the solution $f(x) = e^x$ where e is an even stranger ‘number’. Similarly, the
integral
\[ \int_1^x \frac{1}{t}\, dt = \ln(x) \]
introduces another transcendental function, meaning a function that does not
satisfy an algebraic equation. Of course, you have already learned about the
trigonometric functions sin(x), cos(x), and so on which rely on the magic
number π. So for various reasons, we find that the rational numbers are
inadequate for the analysis of functions.
The answer is to allow these other ‘numbers’ which seem called for to
fill the gaps between the rationals. There are a number of ways to define
what these real numbers R should be. One of the simplest descriptions is
to make use of the decimal system. We describe the set of real numbers as
all possible infinite decimal expansions:
\[ x = a_k a_{k-1} \dots a_1 a_0 . a_{-1} a_{-2} a_{-3} \dots \]
where the $a_i$ belong to the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Such expansions are
already familiar for rational numbers such as $\frac{1}{3} = 0.33333\ldots$ and
$\frac{22}{7} = 3.14285714\ldots$. Every such expansion gives us a real number. One problem
with this definition is that different decimal expansions may yield the same
number. For example,
1.000 . . . = 0.999 . . . .
This is a fairly minor problem, but you have to deal with this ambiguity of
names whenever you talk about the operations of addition, multiplication,
inverses, and even equality. A more important problem with this definition
is that it assumes implicitly that all of these symbols represent a number
and that we can define addition and multiplication. If we consider them
as infinite series, then that helps define these operations as limits, but the
whole notion of limits creates new issues.
The discovery of the nature of the real numbers was intimately con-
nected with the search for a good understanding of convergence and of the
nature of sets. All these notions were formalized in the middle of the nine-
teenth century. See Manheim [25] for a history of topology. There were two
different approaches.
Bolzano and later Cauchy introduced the notion of a Cauchy sequence,
which is the criterion used to decide if a sequence is convergent without the
need for any mention of the limit point itself. So one can consider the set
of all possible ‘limits’ of sequences of rational numbers. For example, the
sequence $e_n = \sum_{k=0}^{n} \frac{1}{k!}$ can be shown to converge very rapidly to the real
number e. Even though e does not belong to the rational numbers, we can
manipulate it in the same way by using these rational approximations. We
are able to extend the notion of addition and multiplication because these
operations are continuous.
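To see how rapidly these rational approximations settle down, one can compute them exactly. This is a small sketch in Python using `fractions.Fraction`; the function name `e_n` simply mirrors the notation above, and the comparison with `math.e` is only a floating-point sanity check.

```python
from fractions import Fraction
from math import e, factorial

def e_n(n: int) -> Fraction:
    """The rational number 1/0! + 1/1! + ... + 1/n!."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

for n in (2, 5, 10, 15):
    approx = e_n(n)
    print(n, approx, float(approx) - e)

# Consecutive terms differ by 1/(n+1)!, which is why the sequence is Cauchy.
print(e_n(6) - e_n(5))
```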
We have omitted an important part of the definition. For each real
number, there are many different convergent sequences with it as a limit.
So in reality, one must put an equivalence relation on the set of Cauchy
sequences. Two sequences are equivalent if they have the same limit. But
since all this must be done without reference to a limit point, say that two
Cauchy sequences $\{u_n\}$ and $\{v_n\}$ are equivalent if the sequence
$u_1, v_1, u_2, v_2, \dots$ is also a Cauchy sequence. Alternatively, two Cauchy
sequences are equivalent if $\lim_{n\to\infty} (u_n - v_n) = 0$. The real numbers are the set of equivalence
classes of Cauchy sequences of rational numbers. Each rational number r
corresponds to the class containing the constant sequence {r, r, r, . . .}.
Another solution was proposed by Dedekind. He suggested a more al-
gebraic approach. Consider all proper subsets A ⊂ Q such that A has no
largest element and if a ∈ A and b < a, then b ∈ A. These objects are called
Dedekind ‘cuts’, because they correspond to cutting the rational numbers
into two at some ‘real’ point. For example,
\[ A = \{ r \in \mathbb{Q} : r \le 0 \text{ or } r^2 < 2 \} \]
represents $\sqrt{2}$. Addition can be defined on the sets themselves:
A + B := {r + s : r ∈ A, s ∈ B}.
Multiplication requires a little more ingenuity (try to define it), but is done
in a similar manner. One may then verify all the axioms of a field.
We will not carry out the construction of the real numbers in this book.
This discussion is for the purpose of making you aware that there were some
significant problems involved in the definition of the real numbers that took
many years to resolve. It took about 2000 years, from the Pythagorean
school to the middle 1800’s, to realize that one needed an abstract, non-
geometric, definition of real numbers.
The real numbers have an important completeness property. This prop-
erty was known before the real numbers were properly defined. Indeed, it
was the realization that one needed to prove this completeness property that
led to the more modern approach to mathematical proof. One way of stating
this property is known as the:

5.1.1 Least upper bound property. If X is a non-empty set of real
numbers with an upper bound, then there is a least upper bound s. That is,
every a ∈ X satisfies a ≤ s; and if every a ∈ X satisfies a ≤ t, then s ≤ t.

Let us look at how one can prove this using Dedekind’s definition. Each
x ∈ X corresponds to a cut $A_x$. Define $S = \bigcup_{x \in X} A_x$. Let us now verify
that S is a cut which represents the least upper bound s of X. Note that S
is a proper subset of Q because it has an upper bound. If a ∈ S, then there
is some $x_0 \in X$ so that $a \in A_{x_0}$. Hence any b < a belongs to $A_{x_0}$ and thus
to S; and there is some $c \in A_{x_0} \subset S$ with a < c. Thus S is a cut. Now S
is an upper bound for X because S contains $A_x$ for each x ∈ X, and thus
x ≤ s for all x ∈ X. On the other hand, if x ≤ t for all x ∈ X and t is
represented by a cut T, then T must contain $A_x$ for every x ∈ X. Hence
S ⊂ T, and therefore s ≤ t.
The least upper bound property can be used to prove other basic properties
of the real numbers, for example the Intermediate Value Theorem
and the fact that every Cauchy sequence of real numbers converges to a
real number. This latter property is known as completeness. Other well
known theorems such as the Heine-Borel Theorem and the Extreme Value
Theorem depend crucially on this completeness property. We will require
the Extreme Value Theorem in our proof of the Fundamental Theorem of
Algebra. This is usually proven in a course on calculus or real analysis.

Exercises

1. Define multiplication using Dedekind’s definition of the real numbers.
Hint: do it first for two positive numbers.
2. Show that the associative law for addition holds in R using Dedekind
cuts.
3. Prove the Intermediate Value Theorem: If f is a continuous function on
[0, 1] such that f(0) < 0 and f(1) > 0, then there is a real number s
such that f(s) = 0.
Hint: use the set {x : f(x) < 0} to help define a Dedekind cut.
4. Prove that every polynomial of odd degree with real coefficients has a
real root.
5. As discussed in this section, R is constructed from Q by taking limits
with respect to the absolute value. In this exercise we discuss another
type of absolute value one may construct on Q that depends on a choice
of prime p. One can also take limits of rational numbers with respect to
this so-called p-adic norm and the result is a field known as the p-adic
numbers.

Let p be a prime. Let $|0|_p = 0$. For any $0 \ne a \in \mathbb{Z}$, let $|a|_p = p^{-k}$,
where $a = p^k u$ with k ≥ 0 and u ∈ Z is relatively prime to p. For any
$0 \ne q \in \mathbb{Q}$, write $q = \frac{a}{b}$ with a, b ∈ Z non-zero and gcd(a, b) = 1. Then
let $|q|_p = |a|_p |b|_p^{-1}$.
(a) Prove that for all q, r ∈ Q we have $|qr|_p = |q|_p |r|_p$ and
\[ |q + r|_p \le \max\{|q|_p, |r|_p\} \le |q|_p + |r|_p. \]
Show that $|q + r|_p = \max\{|q|_p, |r|_p\}$ if $|q|_p \ne |r|_p$.
(b) Prove that the following series converges with respect to $|\cdot|_p$:
\[ \sum_{n=0}^{\infty} p^n = -\frac{1}{p-1}; \]
i.e., show that $\lim_{m\to\infty} \left| 1 + (p-1) \sum_{n=0}^{m} p^n \right|_p = 0$.

5.2. Complex Numbers


From the point of view of algebra, the real numbers still are deficient. One
would like to be able to completely factor all polynomials. But a polynomial
like $x^2 + 1$ has no real roots. The solution is to invent a root which we call
i. In other words, one constructs a larger number system which contains an
element i such that $i^2 = -1$. Nothing prevents us from introducing such a
symbol as long as we verify that our new system makes sense.
Define the set of complex numbers C to be the collection of all ‘numbers’
of the form a + ib where a and b are real. It is often convenient to
associate the number a + ib with the vector (a, b) in the plane $\mathbb{R}^2$. Addition
is defined by vector addition:

(a + ib) + (c + id) = (a + c) + i(b + d).


Multiplication is defined by extending real multiplication using the distributive
law and the identity $i^2 = -1$. This forces the rule:
\[ (a + bi)(c + di) = ac + i\,ad + i\,bc + i^2 bd = (ac - bd) + i(ad + bc). \]

5.2.1. Theorem. The complex numbers form a field. That is, addition is
commutative and associative, has the zero element 0 = 0 + i0, and additive
inverses −(a + ib) = (−a) + i(−b). Multiplication is commutative and as-
sociative and distributes over addition, has the identity element 1 = 1 + i0,
and non-zero elements have (multiplicative) inverses.

The proof of this theorem will not be written out in detail. A few
comments will suffice here. The interested reader can carry out the rest of
the argument. First, the properties of addition are valid because they are
valid for vector addition. Commutativity of multiplication follows directly
from the definition and commutativity of real multiplication. The associative
law is a simple computation. Distributivity is also a routine computation.
We will carry it out in detail to give the flavour of the proofs.
Let u = a + ib, v = c + id and w = e + if be three complex numbers.
We have to verify the identity (u + v)w = uw + vw. Compute:
(u + v)w = ((a + c) + i(b + d)) (e + f i)
= (ae + ce − bf − df ) + i(af + cf + be + de)
= ((ae − bf ) + i(af + be)) + ((ce − df ) + i(cf + de))
= uw + vw
The astute reader may notice that a special case of the distributive law is
used in the proof. Multiplication by i does distribute over multiplication
and addition of real numbers. This follows from the definition of complex
addition and multiplication, and is not a circular proof.
Multiplicative inverses are worth investigating more closely. First, define
the complex conjugate of a complex number z = x + iy by $\bar{z} = x - iy$.
Notice that $z\bar{z} = x^2 + y^2$ is a non-negative real number which is strictly
positive except when z = 0. So it follows that, for z ≠ 0,
\[ z^{-1} = \frac{\bar{z}}{z\bar{z}} = \frac{x}{x^2 + y^2} - i\,\frac{y}{x^2 + y^2}. \]
This verifies all the properties of a field for C.
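The inverse formula is easy to test numerically. The following sketch uses Python's built-in `complex` type; the function name `complex_inverse` is our own, and floating-point arithmetic means the product with z is only approximately 1.

```python
def complex_inverse(z: complex) -> complex:
    """Inverse of a non-zero z = x + iy, computed as conj(z) / (z * conj(z))."""
    x, y = z.real, z.imag
    norm_sq = x * x + y * y          # z times its conjugate, a non-negative real
    if norm_sq == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return complex(x / norm_sq, -y / norm_sq)

z = 3 + 4j
w = complex_inverse(z)               # 3/25 - (4/25) i
print(w, z * w)
```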
Let us collect together some simple properties of the conjugate function.
The proofs of all of these properties are left to the reader.

5.2.2. Proposition. Complex conjugation is an involution that preserves
all the field operations:
(1) Involution: $\overline{\overline{z}} = z$.
(2) Addition: $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$.
(3) Multiplication: $\overline{z_1 z_2} = \overline{z_1}\,\overline{z_2}$.

There is an important geometric interpretation of the quantity
$z\bar{z} = x^2 + y^2$. This represents the square of the Euclidean length of the vector
(x, y) in the plane. So one introduces the notion of absolute value or
modulus for z = x + iy:
\[ |z| = (z\bar{z})^{1/2} = \sqrt{x^2 + y^2}. \]
We also introduce the real and imaginary parts of z = x + iy defined by
\[ \operatorname{Re} z = x = \frac{z + \bar{z}}{2}, \qquad \operatorname{Im} z = y = \frac{z - \bar{z}}{2i}. \]
The following proposition summarizes the basic properties of absolute value.

5.2.3. Proposition. Let z = x + yi and w = u + vi be two complex
numbers. Then
(1) $|\bar{z}| = |z|$.
(2) |zw| = |z| |w|.
(3) |z| ≥ 0. Moreover, |z| = 0 implies that z = 0.
(4) | Re z| ≤ |z| and | Im z| ≤ |z|.
(5) (Triangle Inequality) |w + z| ≤ |w| + |z|.

Proof. The proofs of (1) and (3) are routine. For (2), notice that
\[ |zw|^2 = zw\,\overline{zw} = z\bar{z}\,w\bar{w} = |z|^2 |w|^2. \]
Property (4) is immediate from
\[ |z|^2 = (\operatorname{Re} z)^2 + (\operatorname{Im} z)^2. \]
Finally, the most important property is the triangle inequality. This
name comes from the fact that the vectors w, z, and w + z form the three
sides of a triangle. The triangle inequality states that the length of one side
is no longer than the sum of the lengths of the other two sides.
\[
\begin{aligned}
|w + z|^2 &= (w + z)\overline{(w + z)} \\
&= w\bar{w} + w\bar{z} + z\bar{w} + z\bar{z} \\
&= |w|^2 + 2\operatorname{Re}(w\bar{z}) + |z|^2 \\
&\le |w|^2 + 2|w\bar{z}| + |z|^2 \\
&= |w|^2 + 2|w|\,|z| + |z|^2 = (|w| + |z|)^2.
\end{aligned}
\]
Taking square roots establishes the inequality. □

In Chapter 7, we introduce a general method for building a larger field
in which a given irreducible polynomial has a root. In this language, we
will see that starting with R and the polynomial $x^2 + 1$, we construct a field
isomorphic to C by adding a root i of $x^2 + 1$.

Exercises

1. Prove that |w − z| ≥ |w| − |z| for all complex numbers w and z.
2. Show that $\bar{z}$ and $z^{-1}$ lie on a straight line through 0.
Show that if w ∈ C is a root of a polynomial p(x) with real coefficients,
then $p(\bar{w}) = 0$ as well.
3. Prove that one cannot define an order < on the field of complex numbers
(see properties [O1] and [O2] from Section 1.1).
4. Show that there is no intermediate value theorem for polynomials with
complex coefficients.
5. (Products of sums of two squares) Use complex numbers to prove
that if a, b, c, d ∈ Z, then there exist x, y ∈ Z such that
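$(a^2 + b^2)(c^2 + d^2) = x^2 + y^2$. (Compare with Lemma 3.5.5.)
6. Show that the set S of 2 × 2 matrices of the form
$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$, with a and b real, forms a field
under matrix addition and multiplication. Prove that the map
\[ \varphi : \mathbb{C} \to S, \qquad \varphi(a + ib) = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \]
is an isomorphism of fields.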
7. If you are familiar with the properties of determinants, use the represen-
tation of the complex numbers in Exercise 6 to prove that |wz| = |w||z|
by computing the determinants of the corresponding matrices.

5.3. Polar Form


Every point in the plane can be described by its Cartesian coordinates (x, y).
It can also be described by its polar coordinates, (r, θ), where $r = (x^2 + y^2)^{1/2}$
is the length of the vector (x, y) and θ is the (oriented) angle in radians
between the positive real axis and the ray determined by positive multiples
of the vector (x, y). The Cartesian coordinates are determined from the
polar form by the equations
x = r cos(θ) and y = r sin(θ).
Conversely, the polar coordinates are obtained from the Cartesian form by
solving these equations. Of course, the angle θ is determined only up to a
multiple of 2π.
This can be applied to the complex plane via its identification with $\mathbb{R}^2$.
The argument of a complex number z = x + iy is the angle Arg(z) = θ
in the polar form (r, θ) of the vector (x, y). Again, this argument is only
determined as a real number modulo 2π. Of course, r = |z|. Let us introduce
a notation which will be used only for the next two sections:
cis(θ) := cos(θ) + i sin(θ).
This complex number lies on the circle of radius 1, centre 0, known as the
unit circle. Conversely, every point on the unit circle has this form. So every
complex number can be represented as z = r cis(θ). The significance of this
is that the argument of a product is the sum of the arguments.

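5.3.1. Proposition. $r_1 \operatorname{cis}(\theta_1)\, r_2 \operatorname{cis}(\theta_2) = (r_1 r_2) \operatorname{cis}(\theta_1 + \theta_2)$.

Proof. Calculate
\[
\begin{aligned}
\operatorname{cis}(\theta_1)\operatorname{cis}(\theta_2) &= (\cos(\theta_1) + i\sin(\theta_1))(\cos(\theta_2) + i\sin(\theta_2)) \\
&= \bigl(\cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2)\bigr) + i\bigl(\cos(\theta_1)\sin(\theta_2) + \sin(\theta_1)\cos(\theta_2)\bigr) \\
&= \cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2) = \operatorname{cis}(\theta_1 + \theta_2).
\end{aligned}
\]
The fact that the absolute values multiply is a consequence of Proposition
5.2.3 (2). □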

An immediate consequence of this is known as de Moivre’s Theorem.

5.3.2. Corollary. $(\cos(\theta) + i\sin(\theta))^n = \cos(n\theta) + i\sin(n\theta)$ for n ≥ 1.
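Both the product rule and de Moivre's Theorem can be checked numerically. The sketch below is in Python; the helper `cis` mirrors the notation of this section, and the particular angles and moduli are arbitrary choices of ours. Equality holds only up to floating-point rounding.

```python
import math

def cis(theta: float) -> complex:
    """The point cos(theta) + i sin(theta) on the unit circle."""
    return complex(math.cos(theta), math.sin(theta))

# Proposition 5.3.1: moduli multiply and arguments add.
product = (2 * cis(0.3)) * (3 * cis(1.1))
product_error = abs(product - 6 * cis(0.3 + 1.1))

# de Moivre's Theorem for theta = 0.7 and n = 5.
theta, n = 0.7, 5
de_moivre_error = abs(cis(theta) ** n - cis(n * theta))

print(product_error, de_moivre_error)
```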

This formula is quite useful for calculations of certain sines and cosines.
For example, consider this identity for n = 5.
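\[
\begin{aligned}
\cos(5\theta) + i\sin(5\theta) &= (\cos(\theta) + i\sin(\theta))^5 \\
&= \cos^5(\theta) - 10\cos^3(\theta)\sin^2(\theta) + 5\cos(\theta)\sin^4(\theta) \\
&\quad + i\bigl(5\cos^4(\theta)\sin(\theta) - 10\cos^2(\theta)\sin^3(\theta) + \sin^5(\theta)\bigr)
\end{aligned}
\]
By using the identity $\cos^2(\theta) + \sin^2(\theta) = 1$, we obtain
\[
\begin{aligned}
\sin(5\theta) &= 5\bigl(1 - \sin^2(\theta)\bigr)^2 \sin(\theta) - 10\bigl(1 - \sin^2(\theta)\bigr)\sin^3(\theta) + \sin^5(\theta) \\
&= 16\sin^5(\theta) - 20\sin^3(\theta) + 5\sin(\theta)
\end{aligned}
\]
In particular, apply this when θ = π/5. Since sin(π) = 0, sin(π/5) is a root of the
polynomial equation $16x^5 - 20x^3 + 5x = x(16(x^2)^2 - 20x^2 + 5) = 0$. Since
$\sin(\pi/5) \ne 0$, it follows that $\sin^2(\pi/5)$ is a root of the quadratic equation
\[ 16y^2 - 20y + 5 = 0. \]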

This equation has roots $\frac{5 \pm \sqrt{5}}{8}$. To decide which root equals $\sin^2(\pi/5)$, notice
that 0 < π/5 < π/4. Thus this angle lies in the first quadrant, on which
sin(x) is monotone increasing. So
\[ 0 < \sin(\pi/5) < \sin(\pi/4) = 1/\sqrt{2}. \]
Clearly, $\frac{5 - \sqrt{5}}{8} < \frac{1}{2} < \frac{5 + \sqrt{5}}{8}$. So,
\[ \sin(\pi/5) = \sqrt{\frac{5 - \sqrt{5}}{8}}, \qquad \cos(\pi/5) = \sqrt{\frac{3 + \sqrt{5}}{8}} = \frac{1 + \sqrt{5}}{4}. \]
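These closed forms can be compared with the library values of sine and cosine. This is a quick floating-point check in Python, with variable names of our own choosing; it confirms the values to rounding error and is not a proof.

```python
import math

s = math.sqrt((5 - math.sqrt(5)) / 8)    # claimed value of sin(pi/5)
c = (1 + math.sqrt(5)) / 4               # claimed value of cos(pi/5)

sin_error = abs(s - math.sin(math.pi / 5))
cos_error = abs(c - math.cos(math.pi / 5))
# s should also be a root of 16x^5 - 20x^3 + 5x.
poly_value = 16 * s**5 - 20 * s**3 + 5 * s

print(sin_error, cos_error, poly_value)
```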
We can also solve other simple polynomial equations. For example,
consider
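\[ z^4 + 4 = 0. \]
Writing z = r cis(θ), the equation becomes
\[ r^4 \operatorname{cis}(4\theta) = -4 = 4\operatorname{cis}(\pi). \]
Hence $r = \sqrt{2}$ and θ is an angle such that 4θ ≡ π (mod 2π). So
\[ \theta = \frac{\pi + 2k\pi}{4} = \frac{\pi}{4} + \frac{\pi}{2}k \]
for some integer k. Only the values k = 0, 1, 2, 3 are important, for after
that, the values repeat modulo 2π. Hence the roots are
\[
\begin{aligned}
z_1 &= \sqrt{2}\operatorname{cis}(\pi/4) = 1 + i \\
z_2 &= \sqrt{2}\operatorname{cis}(3\pi/4) = -1 + i \\
z_3 &= \sqrt{2}\operatorname{cis}(5\pi/4) = -1 - i \\
z_4 &= \sqrt{2}\operatorname{cis}(7\pi/4) = 1 - i
\end{aligned}
\]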
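The same recipe works for the n-th roots of any non-zero complex number. Here is a sketch in Python using `cmath.polar` and `cmath.rect` from the standard library; the function name `nth_roots` is ours, and the results are floating-point approximations of the exact roots.

```python
import cmath
import math

def nth_roots(w: complex, n: int) -> list:
    """All n-th roots of a non-zero w, from the polar form w = r cis(phi)."""
    r, phi = cmath.polar(w)
    return [cmath.rect(r ** (1 / n), (phi + 2 * math.pi * k) / n)
            for k in range(n)]

roots = nth_roots(-4, 4)
for z in roots:
    print(z, abs(z**4 + 4))
```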

Exercises

1. Find the Cartesian form of all cube roots of 8i.
2. Find the exact values of sin(π/12) and cos(π/12) by using the identity
cis(π/3) cis(−π/4) = cis(π/12).
3. Find all complex roots of the polynomial $z^{10} + z^5 + 1 = 0$. Express at
least one of them in Cartesian form.
4. Use de Moivre’s theorem to obtain a formula for cos 4θ and sin 4θ.
5. Find all the 6th roots of −1. Graph them on the plane.
6. Calculate $(1 + i)^{2023}$.
7. Prove the quadratic formula for a quadratic with complex coefficients.
Deduce that every quadratic in C[x] has two complex roots.
Hint: complete the square.
8. (a) Solve $z^4 + 16 = 0$.
(b) Hence factor $p(x) = x^4 + 16$ as a product of two real quadratic
polynomials.

5.4. The Exponential Function


In this section, we will extend the definition of the exponential function to
all complex numbers. To do this, we will search for a differentiable function
E : C → C such that E(w + z) = E(w)E(z) for all w and z in C and
$E(x) = e^x$ for all x ∈ R. Once we have established the existence of this
function, we will write $e^z$ for E(z).
Let us calculate some simple properties that such a function must have.
First,
\[ E(x + iy) = e^x E(iy). \]
And using the differentiability, we get
\[
\begin{aligned}
E'(z) &= \lim_{h \to 0} \frac{E(z + h) - E(z)}{h} \\
&= E(z) \lim_{h \to 0} \frac{E(h) - E(0)}{h} \\
&= E(z) \lim_{x \to 0} \frac{e^x - 1}{x} = E(z).
\end{aligned}
\]
Equality from line 2 to line 3 follows because we have assumed that the first
limit exists, so it may be evaluated by letting h approach 0 through real values x.
Now concentrate on the function f(y) = E(iy). Split it into its real
and imaginary parts as f(y) = E(iy) = A(y) + iB(y). Differentiating with
respect to y yields
\[
\begin{aligned}
f'(y) &= A'(y) + iB'(y) \\
&= E'(iy)\,\frac{d(iy)}{dy} \\
&= iE(iy) = -B(y) + iA(y).
\end{aligned}
\]
So we arrive at the system of differential equations
\[ A'(y) = -B(y), \qquad B'(y) = A(y). \]
This leads to the second order differential equation $A''(y) = -A(y)$. From
the identity 1 = E(0) = A(0) + iB(0), we also get the initial conditions
A(0) = 1 and $A'(0) = -B(0) = 0$. From calculus, we know that this system
has a unique solution A(y) = cos(y) and B(y) = sin(y).
Thus we arrive at a unique solution E(iy) = cos(y) + i sin(y) = cis(y).
So
\[ E(x + iy) = e^x(\cos(y) + i\sin(y)) = e^x \operatorname{cis}(y). \]
For this reason, we will usually write $e^{i\theta} = \cos(\theta) + i\sin(\theta)$ instead of cis(θ)
from now on.
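The formula just derived can be compared with a library implementation of the complex exponential. The following Python sketch defines `E` exactly as above (the name follows the text; the test points are arbitrary choices of ours) and checks it against `cmath.exp` and against the multiplicative property, up to floating-point rounding.

```python
import cmath
import math

def E(z: complex) -> complex:
    """E(x + iy) = e^x (cos y + i sin y), as derived in this section."""
    z = complex(z)
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

w, z = 0.5 - 1.25j, -2 + 3j
print(abs(E(z) - cmath.exp(z)))          # agreement with the library function
print(abs(E(w + z) - E(w) * E(z)))       # multiplicative property
print(E(1j * math.pi))                   # approximately -1
```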

Let us verify that this function indeed has the properties that we searched
for.
\[
\begin{aligned}
E(x + iy)E(u + iv) &= e^x \operatorname{cis}(y)\, e^u \operatorname{cis}(v) \\
&= e^{x+u} \operatorname{cis}(y + v) = E((x + iy) + (u + iv)).
\end{aligned}
\]
So E satisfies the multiplicative property.
The derivative property is a bit more delicate. The hard part is to show
that $E'(0) = 1$. For then, as above, we obtain
\[ E'(z) = E(z)E'(0) = E(z). \]
To verify that $E'(0) = 1$, we must show that
\[ \lim_{h \to 0} \frac{|E(h) - 1 - h|}{|h|} = 0. \]
The complication comes from the fact that h takes all small complex values,
not just real values, as it approaches 0. However, we need only facts from
the calculus of real functions to verify this limit. The major tool for making
estimates is the mean value theorem. Let us write h = x + iy, so that
\[ |h| = \sqrt{x^2 + y^2}. \]
We may assume that |h| < 1, so in particular, |x| < 1. Calculate
E(h) − 1 − h = ex cos(y) + iex sin(y) − 1 − x − iy
= ex (cos(y) − 1) + (ex − 1 − x) + iex (sin(y) − y) + iy(ex − 1)
Each of these terms can be estimated by the mean value theorem. First,
since $f(y) = \cos(y)$ has derivative $f'(y) = -\sin(y)$, it follows that there is a
value $c$ between $0$ and $y$ such that
\[ |\cos(y) - 1| = |f(y) - f(0)| = |f'(c)||y| = |{-\sin(c)}||y| \le |c||y| \le |y|^2. \]
So $e^x|\cos(y) - 1| \le e|y|^2 \le e|h|^2$ provided that $|x| \le 1$.
A similar treatment of the function $e^x$ shows that $|e^x - 1| \le e|x|$ for
$|x| \le 1$. Now repeat the argument for the function $g(x) = e^x - 1 - x$, which
has derivative $g'(x) = e^x - 1$. Again by the mean value theorem, there is a
point $c$ between $0$ and $x$ so that
\[ |e^x - 1 - x| = |g(x) - g(0)| = |x g'(c)| = |x||e^c - 1| \le e|x||c| \le e|x|^2 \le e|h|^2. \]
A third application with the function $k(y) = \sin(y) - y$ and derivative
$k'(y) = \cos(y) - 1$ yields a point $c$ between $0$ and $y$ so that
\[ |\sin(y) - y| = |y||\cos(c) - 1| \le |y||c|^2 \le |y|^3. \]
Together with the inequality $|e^x| \le e$ for $|x| \le 1$, this yields
\[ |ie^x(\sin(y) - y)| \le e|y|^3 \le e|h|^3. \]

Finally, the fourth term is handled by $2|xy| \le x^2 + y^2 = |h|^2$, so
\[ |y(e^x - 1)| \le e|y||x| \le 2|h|^2. \]
Putting it all together yields, for $|h| \le 1$,
\[ |E(h) - 1 - h| \le e|h|^2 + e|h|^2 + e|h|^3 + 2|h|^2 = (2e + 2 + e|h|)|h|^2. \]
Thus
\[ \lim_{h \to 0} \frac{|E(h) - 1 - h|}{|h|} = 0. \]
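The quadratic bound on $|E(h) - 1 - h|$ obtained from the four estimates can be spot-checked on random points of the unit disc. This is only a sketch: the helper names `E` and `bound`, the random seed, and the sample size are illustrative assumptions, and a finite sample is a sanity check rather than a proof.

```python
import math
import random

def E(h: complex) -> complex:
    """E(x + iy) = e^x (cos y + i sin y)."""
    return math.exp(h.real) * complex(math.cos(h.imag), math.sin(h.imag))

def bound(h: complex) -> float:
    """Right-hand side (2e + 2 + e|h|)|h|^2 of the estimate for |h| <= 1."""
    r = abs(h)
    return (2 * math.e + 2 + math.e * r) * r * r

random.seed(0)  # fixed seed so the run is reproducible
samples = []
for _ in range(1000):
    r = random.uniform(1e-3, 1.0)          # modulus in (0, 1]
    t = random.uniform(0.0, 2 * math.pi)   # arbitrary direction
    samples.append(complex(r * math.cos(t), r * math.sin(t)))

print(all(abs(E(h) - 1 - h) <= bound(h) for h in samples))
```

Dividing both sides by $|h|$ shows why the bound forces the limit to be 0: the quotient is at most a constant times $|h|$.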

Exercises

1. (a) Graph the image of a line parallel to the y-axis under the exponential
map.
(b) Graph the image of a line parallel to the x-axis under the exponential
map.
(c) Show that the strip {z = x + iy : 0 ≤ y < 2π} is mapped by the
exponential function one to one and onto the whole complex plane.
2. (Sum Angle Formula for sin and cos) Use the formulae $\cos(z) = (e^{iz} + e^{-iz})/2$
and $\sin(z) = (e^{iz} - e^{-iz})/(2i)$.
(a) Prove that sin(w + z) = sin(w) cos(z) + cos(w) sin(z).
(b) Prove that cos(w + z) = cos(w) cos(z) − sin(w) sin(z).
3. Find all solutions of sin(z) = 2.
4. Let $f(z) = w_1\sin(z) + w_2\cos(z)$, where $w_1, w_2 \in \mathbb{C}$. Compute $f'(z)$.

5.5. Fundamental Theorem of Algebra


In this section, we will prove the famous Fundamental Theorem of Algebra
that states that every polynomial with complex coefficients factors into a
product of linear terms. It is not easy to prove this theorem in a strictly
algebraic way. Indeed, one can argue that it is really the analytic properties
of polynomials that make this result transparent. There are several acces-
sible proofs. They all rely on some property of functions that depends on
the completeness properties of the real and complex numbers. This proof
depends on the Extreme Value Theorem: a continuous real valued function
on a closed bounded subset of the plane achieves its maximum value at some
point. If you have only seen this for functions on an interval, see Exercise 4.
We begin with a preliminary lemma.

5.5.1. Lemma. Let F be a field and assume that every non-constant polynomial
with coefficients in F has a root in F. Then every polynomial with coefficients in
F of degree d ≥ 1 factors into a product of d linear terms.

Proof. We will use induction on the degree d of a polynomial with coefficients
in F. For d = 1, the result is clear. Now for d > 1, if p is a polynomial of
degree d, by hypothesis it has a root r ∈ F. So, p(z) = (z − r)q(z) with q
a degree d − 1 polynomial. By the induction hypothesis, q(z) factors as a
product of d − 1 linear factors. Hence p(z) factors as the product of d linear
factors.

5.5.2 Fundamental Theorem of Algebra. Every polynomial with
complex coefficients of degree d ≥ 1 factors into a product of d linear terms.

Proof. Let $p(z) = \sum_{i=0}^{d} a_i z^i$ be a polynomial of degree $d \ge 1$, so that
$a_d \ne 0$. By Lemma 5.5.1, it is enough to show $p$ has a complex root.
Assume, to the contrary, that $p(z)$ is never $0$. In particular $a_0 = p(0) \ne 0$.
The proof will be divided into 3 main steps:
(1) Find a global minimum for |p|.
(2) Normalize $p$ to obtain a polynomial $q$ with $\min |q(z)| = 1 = q(0)$.
Then we may write $q(z) = 1 + q_0(z)$.
(3) Show $q_0(z)$ achieves a small negative value, contradicting the fact
that the minimum of $|q|$ is 1.
A key point to observe here is that Steps 1 and 2 work over R, so it is only
in Step 3 where we make use of C.
Step 1. Notice that
\[
\lim_{|z| \to \infty} |p(z)| = \lim_{|z| \to \infty} |z|^d \left| a_d + \frac{a_{d-1}}{z} + \frac{a_{d-2}}{z^2} + \dots + \frac{a_0}{z^d} \right| = \infty
\]
since the second factor tends to the finite non-zero limit $|a_d|$ and $|z|^d$ tends
to infinity. Thus there is a large real number $R$ so that $|p(z)| > |a_0|$ for all
$|z| > R$.
By the Extreme Value Theorem applied to the continuous real valued
function $f(z) = -|p(z)|$ on the closed bounded set $\{z \in \mathbb{C} : |z| \le R\}$, there
is a point $z_0$ so that
\[ |p(z_0)| \le |p(z)| \quad \text{for all } z \in \mathbb{C},\ |z| \le R. \]
But for $|z| > R$, one has $|p(z)| > |a_0| = |p(0)| \ge |p(z_0)|$. So $|p(z)|$ achieves
its global minimum at $z_0$.

Step 2. To simplify the computations, replace $p(z)$ by the polynomial
\[ q(z) = \frac{p(z + z_0)}{p(z_0)}. \]
Notice that $q(z)$ is also a polynomial of degree $d$ which is never $0$, and $|q|$
takes its minimum value 1 at $z = 0$. That is,
\[ 1 = q(0) \le |q(z)| \quad \text{for all } z \in \mathbb{C}. \]

The constant term of $q$ is 1. Let $b$ be the next non-zero coefficient, so that
\[ q(z) = 1 + bz^k + \text{higher order terms} = 1 + bz^k r(z) \]
where $r$ is another polynomial such that $r(0) = 1$.

Step 3. Since $r(z)$ is continuous, there is a positive real number $\varepsilon$ so that
\[ |r(z) - 1| < \tfrac{1}{2} \quad \text{for } |z| \le \varepsilon. \]
Choose an angle $\theta$ so that $be^{ik\theta}$ is a negative real number. Indeed, one can
take $\theta = (\pi - \operatorname{Arg}(b))/k$. Set $w = \varepsilon e^{i\theta}$, and note that because of the choice
of $\theta$, one has $bw^k = -|b|\varepsilon^k$. By replacing $\varepsilon$ by an even smaller positive
number if necessary, we can also suppose that $|bw^k| < 1$. Let us also write
$r(w) = 1 + u$, where $|u| < \tfrac{1}{2}$. Therefore,
\[ q(w) = 1 + bw^k r(w) = 1 - |b|\varepsilon^k(1 + u) = (1 - |b|\varepsilon^k) - |b|\varepsilon^k u. \]
Hence
\[ |q(w)| \le 1 - |b|\varepsilon^k + |b|\frac{\varepsilon^k}{2} = 1 - |b|\frac{\varepsilon^k}{2} < 1. \]
This contradicts the fact that q has minimum modulus 1. So the as-
sumption that q and p have no roots is false. Hence p has a root. Therefore
by Lemma 5.5.1, the proof is complete. 
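Step 3 is constructive enough to try on a computer. The sketch below assumes a polynomial $q$ of degree at least 1 with $q(0) = 1$, stored as a coefficient list with the constant term first; it picks the direction $\theta$ for which $bw^k$ is a negative real number and halves $\varepsilon$ until $|q(w)| < 1$. The function names and the sample polynomial are hypothetical, and the halving loop stands in for the continuity argument of the proof.

```python
import cmath

def poly_eval(coeffs, z):
    """Evaluate sum(coeffs[i] * z**i) by Horner's rule (constant term first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc

def point_below_one(coeffs):
    """Given q with q(0) = 1 and degree >= 1, return w with |q(w)| < 1."""
    assert coeffs[0] == 1
    # k indexes the first non-zero coefficient b after the constant term.
    k = next(i for i in range(1, len(coeffs)) if coeffs[i] != 0)
    b = coeffs[k]
    # With this angle, b * e^{i k theta} equals -|b|.
    theta = (cmath.pi - cmath.phase(b)) / k
    eps = 1.0
    while True:
        w = eps * cmath.exp(1j * theta)
        if abs(poly_eval(coeffs, w)) < 1:
            return w
        eps /= 2  # shrink eps, as the proof allows

q = [1, 0, 2j, 3]          # q(z) = 1 + 2i z^2 + 3 z^3 (illustrative)
w = point_below_one(q)
print(w, abs(poly_eval(q, w)))
```

So a polynomial normalised to have value 1 at the origin can never have minimum modulus 1 there, which is the contradiction used in the proof.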

Exercises

1. Let $f(x) = x^d + a_{d-1}x^{d-1} + \dots + a_0$ with $a_i \in \mathbb{C}$. Prove that every root
$\alpha$ of $f$ satisfies
\[ |\alpha| \le \max\Big\{1, \sum_{j=0}^{d-1} |a_j|\Big\}. \]
2. (Cauchy’s bound) Let $p(x) = x^n + a_{n-1}x^{n-1} + \dots + a_0$ be a monic
polynomial, and let $r$ be a root. Prove that $|r| \le 1 + A$ where $A =
\max\{|a_i| : 0 \le i < n\}$.
Hint: if $|r| > 1$, use $r^n = -\sum_{i=0}^{n-1} a_i r^i$ to bound $|r|^n$.
3. (Partial fraction decomposition) Let $f$ and $g$ be polynomials with
complex coefficients. We may write $g(x) = \prod_{i=1}^{n} (x - r_i)^{m_i}$ with $r_1, \dots,
r_n \in \mathbb{C}$ distinct. Prove that there is a polynomial $h(x)$ and constants
$a_{ij} \in \mathbb{C}$ such that
\[ \frac{f(x)}{g(x)} = h(x) + \sum_{i=1}^{n} \sum_{j=1}^{m_i} \frac{a_{ij}}{(x - r_i)^j}. \]

4. (EVT for a rectangle) Assume the Extreme Value Theorem for a con-
tinuous function on an interval [a, b] ⊂ R, and prove it for a continuous
function f (x, y) on a rectangle [a, b] × [c, d].

Hint: for $x \in [a, b]$, let $f_x(y) = f(x, y)$ be a function on $[c, d]$. Find $y(x)$
so that $f_x$ attains its maximum at $y(x)$. Let $g(x) = \max f_x$. Show that
$g$ is continuous on $[a, b]$.
5. Let $f(x) = x^n + a_{n-1}x^{n-1} + \dots + a_0$ with $a_i \in \mathbb{Z}$ and suppose that
\[ |a_{n-1}| > 1 + |a_{n-2}| + \dots + |a_1| + |a_0|. \]
Let $z_1, \dots, z_n \in \mathbb{C}$ be the roots of $f$. Prove there is a unique $i$ such that
$|z_i| > 1$, and $|z_j| < 1$ for all $j \ne i$. (See footnote 1.)
6. (Gershgorin Disc Theorem) Let $A = (a_{ij})_{1 \le i,j \le n}$ be an $n \times n$ matrix
with the $a_{ij} \in \mathbb{C}$. For $1 \le i \le n$, let $R_i = \sum_{j \ne i} |a_{ij}|$. Let $\lambda \in \mathbb{C}$ be
an eigenvalue of $A$, i.e., there exists an $n \times 1$ matrix $v \ne 0$ such that
$Av = \lambda v$. Let
\[ D(a_{ii}, R_i) = \{z \in \mathbb{C} : |z - a_{ii}| \le R_i\}, \]
known as a Gershgorin disc. Prove that $\lambda$ lies in a Gershgorin disc.
Hint: pick $i_0$ so that $|v_{i_0}| = \max\{|v_i| : 1 \le i \le n\}$ and look at the $i_0$th
coefficient of $Av - \lambda v$.

5.6. Real Polynomials


The theory for real polynomials is not quite as simple as for complex polyno-
mials because certain real polynomials do not factor into real linear factors.
However, we may use the Fundamental Theorem of Algebra to figure out
what happens.

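5.6.1. Lemma. Let $p(x)$ be a polynomial with real coefficients. If $a$
is a complex root of $p$, then $\bar{a}$ is also a root.

Proof. Let $p(z) = \sum_{i=0}^{d} p_i z^i$. This is immediate from the observation
\[ p(\bar{a}) = \sum_{i=0}^{d} p_i \bar{a}^i = \overline{\sum_{i=0}^{d} p_i a^i} = \overline{p(a)} = 0, \]
where the second equality uses the fact that each $p_i$ is real.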

5.6.2. Theorem. Every real polynomial factors into a product of linear
and quadratic factors, in which the quadratic factors have no real roots.

Proof. By the Fundamental Theorem of Algebra, the polynomial $p$ can be
factored into linear complex terms. By factoring out the leading coefficient,
\[ p(x) = c(x - a_1)(x - a_2) \cdots (x - a_d). \]

Footnote 1: This is due to Panaitopol. See https://yufeizhao.com/olympiad/intpoly.pdf.

Now $c$ is real. Whenever $a_i$ is real, $x - a_i$ is a factor of $p$ over the real
field. When $a_i$ is not real, the lemma shows that there is an integer $j$ so
that $a_j = \bar{a}_i$. In this case, write $a_i = u + iv$ and $a_j = u - iv$. Then
\[ (x - a_i)(x - a_j) = x^2 - 2ux + (u^2 + v^2). \]


This is a real polynomial.
It remains to show that all the roots come in pairs. This is seen by
induction on the degree of $p$. Indeed, this is true for degree $d = 1$. If the
result holds for all real polynomials of lower degree, consider the case for
$p$ above. If $p$ has a real root $a_1$, then $p$ factors as $p(x) = (x - a_1)p_1(x)$.
Moreover, it is clear that division of $p$ by $x - a_1$ uses only real coefficients.
So $p_1$ is real. Similarly, if $p$ has a pair of non-real roots $a_1 = u + iv$ and
$a_2 = u - iv$, then $p$ factors as $p(x) = (x^2 - 2ux + u^2 + v^2)p_1(x)$. Again,
division of a real polynomial by another leaves a real quotient. In either
case, the induction hypothesis applies to $p_1(x)$ and it factors as a product
of linear terms and quadratic terms with non-real roots. Hence the result
follows for $p$ as well.
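The key computation in the proof, that a conjugate pair of roots produces a real quadratic with no real roots, can be illustrated as follows. This is a small sketch; the helper names `real_quadratic` and `eval_quadratic` and the sample root are assumptions made for the example.

```python
def real_quadratic(a: complex):
    """Coefficients (1, -2u, u^2 + v^2) of (x - a)(x - conj(a)) for a = u + iv."""
    u, v = a.real, a.imag
    return (1.0, -2.0 * u, u * u + v * v)

def eval_quadratic(coeffs, x):
    """Evaluate c2*x^2 + c1*x + c0 for coeffs = (c2, c1, c0)."""
    c2, c1, c0 = coeffs
    return (c2 * x + c1) * x + c0

a = complex(1.5, -2.0)            # illustrative non-real root
quad = real_quadratic(a)          # x^2 - 3x + 6.25

# The product of the two complex linear factors matches the real quadratic.
for x in (-3.0, 0.0, 0.25, 7.0):
    lhs = (x - a) * (x - a.conjugate())
    print(abs(lhs - eval_quadratic(quad, x)) < 1e-12)

# The discriminant is -4v^2, which is negative whenever v != 0.
print(quad[1] ** 2 - 4 * quad[0] * quad[2])
```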

We get the following immediate corollary about real polynomials of odd


degree because at least one of the factors must be of odd degree (hence
degree 1). This is also an immediate consequence of the Intermediate Value
Theorem.

5.6.3. Corollary. A real polynomial of odd degree has at least one real
root.

Somehow we have managed all this discussion of factorization without


any discussion of uniqueness. Of course, this is a crucial issue. Because
the factorization over the real or complex numbers is intimately connected
with roots, this question can be handled here by special ad hoc arguments.
However, we will see in the next chapter that the polynomials over any
field always have unique factorization into ‘primes’, known as irreducible
polynomials.

Exercises

1. Show that a quadratic $p(x) = ax^2 + bx + c$ with real coefficients has two
(possibly equal) real roots if and only if the discriminant $\Delta(p) = b^2 - 4ac$
is non-negative.
2. (Partial fraction decompositions, again) Let $f$ and $g$ be polynomials
with real coefficients. Write
\[ g(x) = \prod_{i=1}^{n} (x - r_i)^{m_i} \prod_{j=1}^{N} q_j(x)^{M_j} \]
with $r_1, \dots, r_n \in \mathbb{R}$ distinct and the $q_j(x)$ quadratic polynomials with no
real roots. Prove that there is a polynomial $h(x)$, constants $a_{ik} \in \mathbb{R}$,
and linear polynomials $\ell_{jk}(x) \in \mathbb{R}[x]$ so that
\[ \frac{f(x)}{g(x)} = h(x) + \sum_{i=1}^{n} \sum_{k=1}^{m_i} \frac{a_{ik}}{(x - r_i)^k} + \sum_{j=1}^{N} \sum_{k=1}^{M_j} \frac{\ell_{jk}(x)}{q_j(x)^k}. \]

3. (Descartes’s Rule of Signs) Let $p(x)$ be a polynomial with real coefficients.
Write $p(x) = a_{i_1}x^{i_1} + \dots + a_{i_m}x^{i_m}$ where $i_1 > i_2 > \dots > i_m \ge 0$
and all $a_{i_j} \ne 0$. Let $s$ be the number of sign changes in the coefficients
of $p$, i.e., $s$ is the total number of indices $j$ for which $a_{i_j}a_{i_{j+1}} < 0$. Let $t$ be
the number of positive roots of $p$, counting multiplicity (e.g., a factor of
$(x - r)^k$ counts as $k$ roots). In this exercise, we prove that $t \le s$ and
$s - t$ is even.
(a) Reduce to the case in which $a_{i_1} = 1$ and $i_m = 0$.
(b) Using Calculus, show that if $a_{i_1}a_0 > 0$, then there are an even
number of positive roots, and if $a_{i_1}a_0 < 0$, then there are an odd
number of positive roots by comparing the behaviour of $p$ near $0$
and $\infty$.
(c) Conclude that $s - t$ is even.
(d) Let $r > 0$. Show that if $a_{i_j}a_{i_{j+1}} < 0$, then the coefficient of $x^{i_{j+1}+1}$
in $(x - r)p(x)$ agrees in sign with $a_{i_{j+1}}$.
(e) Combine these facts to show that multiplying by $x - r$ increases the
number of sign changes and changes its parity.
(f) Using induction on the number of positive roots, prove Descartes’s
Rule of Signs.

Notes on Chapter 5
A precise notion of the real and complex numbers as we know it is a rather
modern idea. To the ancient civilizations, numbers were positive integers.
(See the notes to Chapter 1.) Positive rationals were considered as ratios
between two positive integers. The discovery that certain square roots such
as $\sqrt{2}$ were not ‘commensurable’ with the integers was disturbing. Some,
like the Babylonians, considered successive rational approximations of these
numbers. Nevertheless, no notion of an extended number system developed
at that time.
Stevin proposed the use of finite decimals to represent numbers in 1585.
He recognized that arbitrary quantities could be approximated by his decimals.
But since a value like $\frac{1}{3}$ did not have an exact representation, most
others rejected the idea. Even 200 years later, Euler considered the real
numbers as the set of all ‘magnitudes’, and apparently no definition was
considered necessary. However, Euler introduced the notion of a variable $x$
which could take any magnitude. Since equations were by this time known
to have roots that may not be real, this raised the issue of what a real number was.
Bolzano, in 1817, had a notion of real numbers and completeness using
Cauchy sequences of rationals, but never published it. Cauchy also considered

the notion of convergent sequences of rationals, but did not propose a proper
theory. In 1858, Dedekind published his theory of the reals using cuts. In
1869, Meray published a construction using Cauchy sequences. Weierstrass,
Cantor and Heine also had related approaches. In 1900, Hilbert developed
an axiomatic approach: axioms for an ordered field together with two critical
axioms, the Archimedean property (there are no numbers $x$ such that $0 < x$
and $x < \frac{1}{n}$ for all $n \ge 1$) and a completeness property. He established the
uniqueness of such a field, thereby showing that different constructions such
as Dedekind’s and Meray’s must yield identical objects.
Surprisingly quadratic equations did not lead to the discovery of complex
numbers, because a simple check of the discriminant determines whether
there are (real) solutions or not. In the early 1500’s, del Ferro and Tartaglia
found the formula for the roots of a cubic. This involved square roots of
negative numbers even when the roots are real. Cardano, who found the
formula for the roots of a quartic, considered numbers of the form $a + \sqrt{-b}$.
He was not convinced that they were bona fide quantities, but they worked.
Descartes, in 1637, coined the term ‘imaginary number’. Euler introduced
the use of $i = \sqrt{-1}$, as well as the polar form. Argand came up with
the notion of representing complex numbers in the plane in 1806. In 1831,
Hamilton described the complex numbers as ordered pairs of reals, (a, b),
with vector sums and product (a, b)(c, d) = (ac − bd, bc + ad). Gauss was
aware of the geometric representation of complex numbers in 1796, but did
not publish it until 1831. In 1847, Cauchy constructed the complex numbers
as an extension of the reals, $\mathbb{R}[x]/(x^2 + 1)$. Cauchy was also responsible
for the beginnings of complex function theory (calculus for complex valued
functions).
The fundamental theorem of algebra was proposed by Roth and later
Girard in the early 1600’s, both stating that a (real) polynomial of degree
n may have n roots. D’Alembert had a proof in 1746, but it had a gap.
Euler, Lagrange and others made attempts, but implicitly assumed that
there was a field extension in which the polynomial already has n roots.
Wood in 1798 and Gauss in 1799 published proofs, that also had gaps. In
1806, Argand published the first rigorous proof. Moreover he was the first
to allow complex coefficients for his polynomials. Gauss published two other
proofs in 1816. The proof that we give uses the extreme value theorem. A
proof of this was found by Bolzano in 1830, but never published. It was
later proven by Weierstrass in 1860. The extreme value theorem depends
on the completeness property of the real numbers.
Various introductory books on real analysis provide some construction
of the real numbers. The uniqueness is more subtle. A treatment of both
can be found in Garling [12, Ch.2-3]. The fundamental theorem of algebra
is fundamentally a result in analysis. Standard comprehensive treatises on
algebra usually assume that polynomials of odd degree have a real root. This
is basically assuming the Intermediate value theorem, which is a consequence
of the completeness of the reals. Other proofs using complex analysis can

be found in many texts; for example Simon [35] has three proofs. The
proof given in our book is perhaps the simplest if one knows about the
completeness of the real numbers.
Chapter 6

The Ring of Polynomials


In this chapter, we investigate the algebraic properties of polynomials. The
reader should notice the parallels between the structure of the integers and
the structure of the polynomials. Most of the ideas that have been devel-
oped for integers, such as primes, modular arithmetic, and so on, have a
polynomial version.

6.1. Preliminaries on Polynomials


We use the notation $R[x]$ to denote the set of all polynomials with coefficients
in a ring $R$. That is, an element of $R[x]$ is an expression of the form
\[ r_d x^d + r_{d-1}x^{d-1} + \dots + r_1 x + r_0 \]
where $x$ is a formal symbol and the coefficients $r_i$ belong to $R$. In particular,
we are especially interested in the case when $R$ is a field. So we will use the
symbol $\mathbb{F}$ whenever we mean the result works for any field. The fields of
interest to us at the moment are the rationals $\mathbb{Q}$, the reals $\mathbb{R}$, the complex
numbers $\mathbb{C}$, and the fields $\mathbb{Z}_p$ for $p$ prime. So $\mathbb{F}[x]$ will indicate any of $\mathbb{Q}[x]$,
$\mathbb{R}[x]$, $\mathbb{C}[x]$ or $\mathbb{Z}_p[x]$, whereas $R[x]$ may indicate $\mathbb{Z}[x]$ or $\mathbb{Z}_n[x]$ for composite
$n$ as well.
Addition of polynomials is defined as follows:
\[ \sum_{i=0}^{n} r_i x^i + \sum_{i=0}^{n} s_i x^i = \sum_{i=0}^{n} (r_i + s_i)x^i. \]
Multiplication is defined by the rule
\[ (rx^m)(sx^n) = (rs)x^{m+n} \]
together with the consequences of the distributive law, i.e.
\[ \Big(\sum_{i=0}^{m} r_i x^i\Big)\Big(\sum_{j=0}^{n} s_j x^j\Big) = \sum_{k=0}^{m+n} \Big(\sum_{i+j=k} r_i s_j\Big) x^k. \]
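These two definitions translate directly into code. The sketch below represents a polynomial by its list of coefficients with the constant term first, which is an assumption made for the example; the names `poly_add` and `poly_mul` are illustrative.

```python
from itertools import zip_longest

def poly_add(r, s):
    """Coefficientwise sum; the shorter list is padded with zeros."""
    return [a + b for a, b in zip_longest(r, s, fillvalue=0)]

def poly_mul(r, s):
    """The coefficient of x^k is the sum of r_i * s_j over all i + j = k."""
    if not r or not s:
        return []
    out = [0] * (len(r) + len(s) - 1)
    for i, ri in enumerate(r):
        for j, sj in enumerate(s):
            out[i + j] += ri * sj
    return out

# (x + 1)(x - 1) = x^2 - 1, and (1 + 2x) + (3 + 4x + 5x^2) = 4 + 6x + 5x^2.
print(poly_mul([1, 1], [-1, 1]))
print(poly_add([1, 2], [3, 4, 5]))
```

The same code works for coefficients in any ring whose elements support `+` and `*` in Python, for example integers or `fractions.Fraction`.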


The zero element is the constant zero polynomial $0 \in R \subset R[x]$, and the
multiplicative identity is the constant polynomial $1 \in R \subset R[x]$. One checks
that with these operations, $R[x]$ is a ring. If $R$ is a commutative ring, then
we see $R[x]$ is as well.
The degree of a non-zero polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ is the largest
integer $\deg(p) = d$ so that $p_d \ne 0$. There is no natural degree for the 0
polynomial, but it is convenient to define $\deg(0) = -\infty$ since it makes the
following lemma work.

6.1.1. Lemma. Let R be an integral domain. Then R[x] is an integral


domain. Furthermore, if p, q ∈ R[x], then
deg(pq) = deg(p) + deg(q).

Proof. If $p = 0$, then $pq = 0$ and we see
\[ \deg(pq) = -\infty = -\infty + \deg(q) = \deg(p) + \deg(q). \]
Therefore, we may assume both $p$ and $q$ are non-zero. Let $\deg(p) = d$ and
$\deg(q) = e$ with $d, e \ge 0$. Then
\[
\begin{aligned}
p(x) &= p_d x^d + \text{lower order terms} = \sum_{i=0}^{d} p_i x^i, \\
q(x) &= q_e x^e + \text{lower order terms} = \sum_{j=0}^{e} q_j x^j.
\end{aligned}
\]
Thus a computation shows that
\[ pq(x) = (p_d q_e)x^{d+e} + \text{lower order terms}. \]
Since $p_d, q_e \ne 0$ and $R$ is an integral domain, $p_d q_e \ne 0$. Therefore, $pq \ne 0$
and
\[ \deg(pq) = d + e = \deg(p) + \deg(q), \]
as desired.

Observe that if $R$ is not an integral domain, then Lemma 6.1.1 fails.
For example, if $R = \mathbb{Z}_6$, $p = 2x^3 + 1$ and $q = 3x$, then $pq = 6x^4 + 3x = 3x$, so
\[ \deg(pq) = 1 \ne 4 = \deg(p) + \deg(q). \]
Even when $\mathbb{F}$ is a field, the ring $\mathbb{F}[x]$ is not a field. The element $x$ never
has an inverse in $\mathbb{F}[x]$, as the following lemma shows.

6.1.2. Lemma. If $R$ is an integral domain, then the units
\[ R[x]^* = R^*. \]
In particular, for a field $\mathbb{F}$, the group of units $\mathbb{F}[x]^* = \mathbb{F}^* = \mathbb{F} \setminus \{0\}$.

Proof. If $r \in R^*$, then there exists $s \in R^*$ such that $rs = 1$. This equality
persists in $R[x]$, so $r \in R[x]^*$.
Conversely, if $p \in R[x]^*$, then there exists $q \in R[x]$ such that $pq = 1$.
Applying Lemma 6.1.1, we have
\[ 0 = \deg(1) = \deg(p) + \deg(q). \]
Since $p$ and $q$ are non-zero, $\deg(p), \deg(q) \ge 0$. Thus, $\deg(p) = \deg(q) = 0$,
i.e. $p, q \in R$, so $p \in R^*$.
Again, this may be false if $R$ has zero divisors. For example, consider
$\mathbb{Z}_4$.
\[ (2x + 1)^2 = 4x^2 + 4x + 1 \equiv 1 \pmod 4. \]
So $2x + 1$ is a unit of $\mathbb{Z}_4[x]$. Notice how the degree of the product turned
out to be smaller than expected.
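Both counterexamples, the degree drop over $\mathbb{Z}_6$ and the non-constant unit over $\mathbb{Z}_4$, can be reproduced with a few lines of modular arithmetic. In this sketch polynomials are coefficient lists with the constant term first; the helper names are illustrative, and `degree` returns negative infinity for the zero polynomial to match the convention $\deg(0) = -\infty$.

```python
def poly_mul_mod(p, q, n):
    """Multiply two coefficient lists, reducing every coefficient mod n."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] = (out[i + j] + pi * qj) % n
    return out

def degree(p):
    """Index of the last non-zero coefficient; -infinity for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return float("-inf")

# Over Z_6: (2x^3 + 1)(3x) = 3x, so the degree is 1 rather than 3 + 1 = 4.
pq = poly_mul_mod([1, 0, 0, 2], [0, 3], 6)
print(pq, degree(pq))

# Over Z_4: (2x + 1)^2 = 1, so 2x + 1 is a unit of positive degree.
square = poly_mul_mod([1, 2], [1, 2], 4)
print(square, degree(square))
```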
We end this section with two useful lemmas whose proofs we leave as
exercises. Lemma 6.1.4 explains why the notation $\sum_{i=0}^{n} r_i x^i$ is particularly
helpful: we can plug in elements of $R$ in place of $x$.

6.1.3. Lemma. Let $n \ge 2$ be an integer and let
\[ \pi : \mathbb{Z}[x] \to \mathbb{Z}_n[x] \]
be the function defined by
\[ \pi\Big(\sum_{i=0}^{n} r_i x^i\Big) = \sum_{i=0}^{n} [r_i] x^i \]
where $[r]$ is the equivalence class of $r$ in $\mathbb{Z}_n$. Then $\pi$ is a ring homomorphism,
i.e. $\pi(1) = 1$, $\pi(r + s) = \pi(r) + \pi(s)$, and $\pi(rs) = \pi(r)\pi(s)$ for all $r, s \in \mathbb{Z}[x]$.

6.1.4. Lemma. Let $R$ be a commutative ring and let $a \in R$. Consider
the evaluation map
\[ \mathrm{ev}_a : R[x] \to R \]
defined by
\[ \mathrm{ev}_a\Big(\sum_{i=0}^{n} r_i x^i\Big) = \sum_{i=0}^{n} r_i a^i. \]
Then $\mathrm{ev}_a$ is a ring homomorphism.
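The homomorphism property of the evaluation map can be tested on examples before it is proved. The sketch below works over the integers, with coefficient lists stored constant term first; the names `ev`, `poly_add`, `poly_mul` and the sample polynomials are assumptions made for illustration.

```python
from itertools import zip_longest

def poly_add(r, s):
    return [a + b for a, b in zip_longest(r, s, fillvalue=0)]

def poly_mul(r, s):
    out = [0] * (len(r) + len(s) - 1)
    for i, ri in enumerate(r):
        for j, sj in enumerate(s):
            out[i + j] += ri * sj
    return out

def ev(a, p):
    """Evaluate p at a by Horner's rule: ev_a(p) = sum of p_i * a^i."""
    acc = 0
    for c in reversed(p):
        acc = acc * a + c
    return acc

p = [2, -1, 3]   # 3x^2 - x + 2
q = [-5, 4]      # 4x - 5
a = 7

# ev_a respects sums and products, and sends the constant 1 to 1.
print(ev(a, poly_add(p, q)) == ev(a, p) + ev(a, q))
print(ev(a, poly_mul(p, q)) == ev(a, p) * ev(a, q))
print(ev(a, [1]) == 1)
```

Checking one value of $a$ proves nothing in general, but a failure here would point to a mistake in the definitions.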

Exercises

1. Prove Lemma 6.1.3.


2. Prove Lemma 6.1.4.
3. Let F and G be fields with F ⊂ G. Prove that if p, q ∈ F[x], then p | q
in F[x] if and only if p | q in G[x].

4. Show that Exercise 3 is false if F and G are allowed to be rings instead


of fields.
5. Show that if R is an integral domain, then R[x] is also an integral do-
main.
6. (Field generated by an element) Let $\mathbb{F}$ and $\mathbb{G}$ be fields with $\mathbb{F} \subset \mathbb{G}$.
Let $\alpha \in \mathbb{G}$ and let
\[ \mathbb{F}(\alpha) = \Big\{ \frac{f(\alpha)}{g(\alpha)} : f, g \in \mathbb{F}[x],\ g(\alpha) \ne 0 \Big\}. \]
Prove that $\mathbb{F}(\alpha)$ is a field with $\mathbb{F} \subset \mathbb{F}(\alpha) \subset \mathbb{G}$. It is referred to as the
field generated by $\alpha$.
7. (Multivariate polynomial rings) Let $R$ be a ring. Define $R[x_1, \dots, x_n]$
to be the set of formal sums $\sum_{i_1=0}^{d_1} \cdots \sum_{i_n=0}^{d_n} r_{i_1,\dots,i_n} x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$ with
coefficients $r_{i_1,\dots,i_n} \in R$. Define addition and multiplication analogously to
how it was defined for $R[x]$. Prove that $R[x_1, \dots, x_n]$ is a ring.
8. Let $R$ be a ring. Show $(R[x])[y]$, $(R[y])[x]$, and $R[x, y]$ are isomorphic.

6.2. Unique Factorization for Polynomials


In this section, we will prove the division algorithm for polynomials, and
show, as for the integers, that this leads to a Euclidean algorithm and unique
factorization in F[x], where F is a field.

6.2.1. Definition. If $R$ is a commutative ring, a non-constant polynomial
$p$ in $R[x]$ is called irreducible if for every factorization $p = qr$ in $R[x]$, either
$q(x)$ or $r(x)$ is a unit.

Notice that this definition is a special case of the one given in Definition
1.8.6. In particular, it coincides with the definition of a prime in $\mathbb{Z}$ or
$\mathbb{Z}[\sqrt{d}]$. The term irreducible is used instead of prime for historical reasons.
We are primarily interested in polynomials over a field. However, we will
have reason to consider polynomials in $\mathbb{Z}[x]$.
The (long) division algorithm for polynomials is often taught in high school.
To divide $p$ by $q$, the technique is to divide the leading term of $q$ into the
leading term of $p$. Subtracting the corresponding multiple of $q$ leaves a
remainder of lower degree. Proceed iteratively until a remainder of degree
less than $\deg(q)$ is achieved. This can easily be done by hand, or by computer.

6.2.2. Proposition (Division algorithm for polynomials). Suppose
that $q \ne 0$ and $p$ belong to $\mathbb{F}[x]$. Then there is a unique quotient $a$ and
remainder $r$ in $\mathbb{F}[x]$ so that
\[ p = aq + r \quad \text{and} \quad \deg(r) < \deg(q). \]

Proof. Proceed by induction on the degree of $p$. If $d := \deg(p) < \deg(q)$,
take $a = 0$ and $r = p$. Otherwise, $d \ge \deg(q) =: n$. Suppose that the result
holds for all polynomials of degree less than $d$. Let
\[ q = q_n x^n + \text{lower order terms} \quad \text{and} \quad p = p_d x^d + \text{lower order terms}, \]
where $q_n$ and $p_d$ are non-zero. Consider the polynomial
\[
\begin{aligned}
p_1(x) &= p(x) - (p_d q_n^{-1})x^{d-n}q(x) \\
       &= (p_d x^d + \text{lower order terms}) - (p_d x^d + \text{lower order terms}) \\
       &= \text{lower order terms}.
\end{aligned}
\]
It follows that $\deg(p_1) < d = \deg(p)$. So by the induction hypothesis, the
polynomial $p_1$ can be written as $p_1 = a_1 q + r$ where $a_1$ and $r$ belong to $\mathbb{F}[x]$
and $\deg(r) < \deg(q)$. Therefore,
\[
\begin{aligned}
p(x) &= p_1(x) + (p_d q_n^{-1})x^{d-n}q(x) \\
     &= \big((p_d q_n^{-1})x^{d-n} + a_1(x)\big)q(x) + r(x).
\end{aligned}
\]
This establishes existence.
For uniqueness, notice that if $q \mid p$ and $\deg(p) < \deg(q)$, then $p = 0$. This
is because the identity $p = aq$ implies that
\[ \deg(p) = \deg(a) + \deg(q). \]
Only $\deg(p) = \deg(a) = -\infty$ makes this possible, for otherwise the right-hand
side is strictly larger. So $p = a = 0$.
Now suppose that $p = a_1 q + r_1 = a_2 q + r_2$ where both remainders have
degree less than $\deg(q)$. Then $q$ divides $(a_1 - a_2)q = r_2 - r_1$. Since
\[ \deg(r_2 - r_1) \le \max\{\deg(r_1), \deg(r_2)\} < \deg(q), \]
the previous argument shows that $r_2 - r_1 = 0$. Therefore we obtain $r_1 = r_2$
and $a_1 = a_2$.
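The existence half of the proof is an algorithm: repeatedly cancel the leading term of the dividend with a multiple of the divisor. The sketch below carries this out over $\mathbb{Q}$ using exact `fractions.Fraction` arithmetic. It is written as a loop rather than by induction, and the coefficient-list representation (constant term first) and the function names are assumptions made for the example.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients so the last entry is the leading one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(p, q):
    """Return (a, r) with p = a*q + r and deg(r) < deg(q), over Q."""
    p = trim(Fraction(c) for c in p)
    q = trim(Fraction(c) for c in q)
    if not q:
        raise ZeroDivisionError("division by the zero polynomial")
    a = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    r = p
    while len(r) >= len(q):
        shift = len(r) - len(q)      # the exponent d - n
        coeff = r[-1] / q[-1]        # the scalar p_d * q_n^{-1}
        a[shift] = coeff
        for i, qc in enumerate(q):   # subtract coeff * x^shift * q
            r[shift + i] -= coeff * qc
        r = trim(r)                  # the leading term has cancelled
    return trim(a), r

# x^3 - 2x + 5 = (x - 2)(x^2 + 2x + 2) + 9
quotient, remainder = poly_divmod([5, -2, 0, 1], [-2, 1])
print(quotient, remainder)
```

Exact rational arithmetic matters here: with floating point coefficients the leading term might fail to cancel exactly and the loop would not be a faithful model of the proof.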

6.2.3. Corollary. The linear polynomial x − c divides a polynomial p if


and only if p(c) = 0.

Proof. Divide x − c into p by the division algorithm to obtain a quotient a


and leave a remainder r of degree at most 0. So r is a constant. Then
p(c) = a(c)(c − c) + r = r.
So x − c divides p if and only if the remainder p(c) equals 0. 
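For a divisor of the form $x - c$, the division can be organised as Horner's rule (synthetic division), and the final accumulated value is both the remainder and $p(c)$. A minimal sketch, assuming a non-empty coefficient list with the constant term first; the function name is illustrative.

```python
def divide_by_linear(p, c):
    """Divide p by x - c; return (quotient, remainder) with remainder = p(c)."""
    acc = 0
    partial = []
    for coeff in reversed(p):    # from the leading coefficient down
        acc = acc * c + coeff
        partial.append(acc)
    remainder = partial.pop()    # the last Horner value is p(c)
    return partial[::-1], remainder

# x^3 - 2x + 5 divided by x - 2 leaves remainder p(2) = 9.
print(divide_by_linear([5, -2, 0, 1], 2))
# x^2 + x - 6 has the root 2, so x - 2 divides it exactly.
print(divide_by_linear([-6, 1, 1], 2))
```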

6.2.4 Euclidean algorithm for Polynomials. If p and q are non-zero


elements of F[x], then there exists a greatest common divisor d in F[x] with
the properties:

(1) d|p and d|q,


(2) there are polynomials s and t such that d = ps + qt,
(3) if b|p and b|q, then b|d.

Proof. By Proposition 6.2.2, F[x] is a Euclidean domain. So, the result


follows from Theorem 1.8.12. 
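The proof above appeals to the general theory of Euclidean domains. As an illustration of what that theory computes, here is a sketch of the extended Euclidean algorithm in $\mathbb{Q}[x]$, returning a monic $d$ together with $s$ and $t$ satisfying $d = ps + qt$. The representation (coefficient lists with the constant term first, exact `Fraction` arithmetic) and all helper names are assumptions made for this example, not notation from the text.

```python
from fractions import Fraction

def trim(p):
    """Convert to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def lin(p, q, sign):
    """Return p + sign*q as a trimmed coefficient list."""
    n = max(len(p), len(q))
    return trim((p[i] if i < len(p) else 0) + sign * (q[i] if i < len(q) else 0)
                for i in range(n))

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p, q):
    """Return (a, r) with p = a*q + r and deg(r) < deg(q); q must be non-zero."""
    a, r, q = [], trim(p), trim(q)
    while len(r) >= len(q):
        shift = len(r) - len(q)
        term = [Fraction(0)] * shift + [r[-1] / q[-1]]
        a = lin(a, term, 1)
        r = lin(r, mul(term, q), -1)
    return a, r

def ext_gcd(p, q):
    """Return (d, s, t) with d monic and d = p*s + q*t, for non-zero p, q."""
    r0, r1 = trim(p), trim(q)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:  # invariant: r0 = p*s0 + q*t0 and r1 = p*s1 + q*t1
        a, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, lin(s0, mul(a, s1), -1)
        t0, t1 = t1, lin(t0, mul(a, t1), -1)
    lead = r0[-1]
    return ([c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0])

p = [-2, 1, 1]   # x^2 + x - 2 = (x - 1)(x + 2)
q = [3, -4, 1]   # x^2 - 4x + 3 = (x - 1)(x - 3)
d, s, t = ext_gcd(p, q)
print(d, s, t)   # d represents x - 1
```

Normalising $d$ to be monic is a convention chosen here; the gcd is only determined up to a non-zero scalar.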
It follows that we obtain the important consequence of unique factoriza-
tion for polynomials over a field.

6.2.5 Unique Factorization for Polynomials. Every polynomial in
$\mathbb{F}[x]$ factors uniquely into a product of irreducibles. That is, if $r(x)$ factors
into irreducible terms as
\[ r = \prod_{i=1}^{m} p_i = \prod_{j=1}^{n} q_j, \]
then $m = n$, and there is a permutation $\pi$ and non-zero scalars $c_i \in \mathbb{F}^*$ so
that $q_{\pi(i)} = c_i p_i$.

Proof. By Proposition 6.2.2 and Remark 1.8.9, the hypotheses of Theorem


1.8.18 hold. So, F[x] has unique factorization. 

Exercises

1. Find $\gcd(f, g)$ and express it as a polynomial combination of $f$ and $g$
for the following examples in $\mathbb{Q}[x]$.
(a) $f(x) = x^4 + 7x^3 + 18x^2 + 20x + 8$ and
$g(x) = x^4 + 6x^3 + 7x^2 - 6x - 8$.
(b) $f(x) = 2x^4 + 3x^3 + 2x^2 + 3x + 2$ and $g(x) = x^4 + x^3 - x - 1$.
2. Factor $p(x) = x^4 + 1$ completely into irreducibles in each of the following:
(a) (i) $\mathbb{Q}[x]$ (ii) $\mathbb{R}[x]$ (iii) $\mathbb{C}[x]$.
(b) (i) $\mathbb{Z}_2[x]$ (ii) $\mathbb{Z}_5[x]$ (iii) $\mathbb{Z}_7[x]$.
3. (a) Show that a polynomial $p \in \mathbb{F}[x]$ of degree 2 or 3 is irreducible if
and only if it has no roots in $\mathbb{F}$.
(b) Give an example that shows that this is false for degree 4.
4. Let $f \in \mathbb{Q}[x]$, and let $f'$ be its derivative. Suppose that $p(x)$ is an
irreducible polynomial. Show that $p \mid \gcd(f, f')$ if and only if $p^2 \mid f$.
5. Show by example that Proposition 6.2.2 is false if $\mathbb{F}$ is replaced by an
arbitrary commutative ring.
6. Let $\mathbb{F}$ and $\mathbb{G}$ be fields with $\mathbb{F} \subset \mathbb{G}$. Let $p, q \in \mathbb{F}[x]$. Let $f$ be $\gcd(p, q)$
computed in $\mathbb{F}$ and let $g$ be $\gcd(p, q)$ computed in $\mathbb{G}$. Prove that $f = g$.

6.3. Irreducible Polynomials in Z[x]


Any polynomial in Q[x] can be multiplied by a large integer to clear the
denominators and leave a polynomial with integer coefficients. It is a con-
venient fact, proven by Gauss, that a polynomial in Z[x] factors in Q[x]
only if it factors in Z[x]. In other words, it is not necessary to use fractions
to factor integer polynomials over the rationals. This makes it possible to
obtain certain simple tests providing sufficient conditions for irreducibility.

6.3.1. Definition. A polynomial in Z[x] is called primitive if the gcd of


its set of coefficients is equal to 1.

6.3.2. Lemma. If r and s are primitive polynomials in Z[x], then rs is


also primitive.

Proof. Suppose the coefficients of $rs$ have gcd not equal to 1. Then there
exists a prime $p$ that divides all of the coefficients of $rs$. Given a polynomial
$f \in \mathbb{Z}[x]$, let $\pi(f) \in \mathbb{Z}_p[x]$ denote the polynomial obtained by reducing the
coefficients mod $p$, as in Lemma 6.1.3. Then $\pi(r)\pi(s) = \pi(rs) = 0$. By
Lemma 6.1.1, $\mathbb{Z}_p[x]$ is an integral domain, so either $\pi(r) = 0$ or $\pi(s) = 0$.
Without loss of generality, $\pi(r) = 0$. In other words, $p$ divides all of the
coefficients of $r$, and therefore $r$ is not primitive.
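A quick experiment makes the lemma concrete. The sketch below computes the gcd of the coefficients (called `content` here, a name chosen for this example) and checks that a product of two primitive polynomials is again primitive. Coefficient lists are stored constant term first, and the sample polynomials are arbitrary.

```python
from functools import reduce
from math import gcd

def content(p):
    """gcd of the coefficients of an integer polynomial (0 for the zero polynomial)."""
    return reduce(gcd, p, 0)

def poly_mul(r, s):
    out = [0] * (len(r) + len(s) - 1)
    for i, ri in enumerate(r):
        for j, sj in enumerate(s):
            out[i + j] += ri * sj
    return out

r = [2, 3]       # 3x + 2, primitive
s = [5, 0, 7]    # 7x^2 + 5, primitive
print(content(r), content(s), content(poly_mul(r, s)))

# Scaling r by 6 and s by 10 scales the content of the product by 60.
print(content(poly_mul([6 * c for c in r], [10 * c for c in s])))
```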

6.3.3 Gauss’s Lemma. A polynomial $p \in \mathbb{Z}[x]$ factors in $\mathbb{Q}[x]$ only if it
factors in $\mathbb{Z}[x]$. Furthermore, if $p$ factors as $p = rs$ in $\mathbb{Q}[x]$, then there are
rational multiples $r'$ of $r$ and $s'$ of $s$ such that $r', s' \in \mathbb{Z}[x]$ and $p = r's'$.

Proof. Suppose that $p$ factors as $p = rs$ in $\mathbb{Q}[x]$. Choose integers $M$ and
$N$ so that $Mr$ and $Ns$ have integer coefficients. Let $m$ be the gcd of the
coefficients of $Mr$, so that $Mr = mr_1$ where $r_1$ is a primitive polynomial
in $\mathbb{Z}[x]$. Similarly, let $n$ be the gcd of the coefficients of $Ns$, and factor
$Ns = ns_1$ where $s_1$ is also primitive.
Compute
\[ MNp = (Mr)(Ns) = mn(r_1 s_1). \tag{6.3.4} \]
By the lemma above, the polynomial $r_1 s_1$ is primitive. So the gcd of the
coefficients of this product is $mn$. Let $d$ be the gcd of the coefficients of $p$.
We obtain the equation
\[ MNd = mn. \]
Thus, dividing equation (6.3.4) by $MN = \frac{mn}{d}$ yields
\[ p = dr_1 s_1, \]
which is a factorization in $\mathbb{Z}[x]$. Taking $r' = dr_1 = \frac{Md}{m}r$ and $s' = s_1 = \frac{N}{n}s$,
we have completed the proof.

The following corollary of Gauss’s Lemma characterizes irreducible poly-


nomials in Z[x].

6.3.5. Corollary. Let p ∈ Z[x]. Then p is irreducible in Z[x] if and only


if p is primitive and irreducible in Q[x].

Proof. First suppose p(x) is irreducible in Z[x]. Gauss’s Lemma 6.3.3 shows
that p(x) remains irreducible in Q[x]. Now, let d be the greatest common
divisor of the coefficients of p(x). Then p(x) = dq(x) with q(x) ∈ Z[x]. By
irreducibility of p, we must have d = 1 and so p is primitive.
Conversely, suppose p(x) is reducible in Z[x]. Then we may factor p(x) =
q(x)r(x) with q(x), r(x) non-zero non-units in Z[x]. If both q and r have
positive degree, then we see p(x) is reducible in Q[x]. If on the other hand,
$\deg(q) = 0$, then $q \in \mathbb{Z}$ is a common factor of the coefficients of $p(x)$. Since
$q$ is not a unit in $\mathbb{Z}[x]$, we have $q \ne \pm 1$, showing $p(x)$ is not primitive.

The next result is a well known criterion for finding rational roots of
polynomials in $\mathbb{Z}[x]$.

6.3.6 Rational Root Theorem. If $\gcd(a, b) = 1$, and $\frac{a}{b}$ is a root of
$p(x) = \sum_{i=0}^{m} p_i x^i \in \mathbb{Z}[x]$, then $b \mid p_m$ and $a \mid p_0$.

Proof. By Corollary 6.2.3, $x - \frac{a}{b}$ is a factor of $p$ in $\mathbb{Q}[x]$. The rational
multiple of $x - \frac{a}{b}$ which is primitive is precisely $bx - a$. From Gauss’s
Lemma, $bx - a$ must be a factor of $p$ in $\mathbb{Z}[x]$. That is,
\[ p(x) = (bx - a)q(x) = bq_{m-1}x^m + \dots - aq_0, \]
where $q(x) = \sum_{i=0}^{m-1} q_i x^i \in \mathbb{Z}[x]$.
So $p_m = bq_{m-1}$ and $p_0 = -aq_0$.

For example, consider the polynomial $x^3 + x + 1$. By the criterion above,
the only possible rational roots are $\pm 1$. Substituting $\pm 1$ into the above
polynomial, we see neither is a root. So this cubic has no linear factors.
Therefore, it is irreducible.
Similarly, consider $p = x^4 + 2x^3 + 4x^2 + 4x + 4$. By the Rational Root
Theorem, the only possibilities for rational roots are $\pm 1$, $\pm 2$, and $\pm 4$. Trial
shows that none are roots. (Clearly, $p$ has no positive roots. This cuts down on the
number of trials.) However, this only means that $p$ has no linear factors. It
does not imply that $p$ is irreducible. And, in fact, it is not. It factors as
\[
\begin{aligned}
p(x) &= (x^2 + x + 2)^2 - x^2 \\
     &= (x^2 + 2)(x^2 + 2x + 2).
\end{aligned}
\]
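The search for rational roots in these two examples is mechanical, so it is easy to sketch in code. The following assumes an integer polynomial with non-zero constant and leading coefficients, stored as a coefficient list with the constant term first; the function names are illustrative, and the brute-force divisor search is only meant for small coefficients.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def evaluate(p, x):
    """Horner evaluation of p at x."""
    acc = 0
    for c in reversed(p):
        acc = acc * x + c
    return acc

def rational_roots(p):
    """Test every candidate a/b with a | p_0 and b | p_m; return the actual roots."""
    candidates = {Fraction(sign * a, b)
                  for a in divisors(p[0])
                  for b in divisors(p[-1])
                  for sign in (1, -1)}
    return sorted(x for x in candidates if evaluate(p, x) == 0)

cubic = [1, 1, 0, 1]          # x^3 + x + 1
quartic = [4, 4, 4, 2, 1]     # x^4 + 2x^3 + 4x^2 + 4x + 4
print(rational_roots(cubic))
print(rational_roots(quartic))
print(rational_roots([1, -3, 2]))   # 2x^2 - 3x + 1 = (2x - 1)(x - 1)
```

An empty list only rules out linear factors; as the quartic above shows, it says nothing about factors of higher degree.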

Exercises

1. Factor $8x^3 - 6x + 1$ in $\mathbb{Z}[x]$ or prove that it is irreducible.
2. Factor $x^4 - 5x^2 + 6x + 1$ in $\mathbb{Z}[x]$ or prove that it is irreducible.
3. Let $f(x) = x^5 + 3x^4 + 2x^3 + x^2 + x - 2$ and $g(x) = x^5 + 2x^4 + 2x^3 - x^2 -
4x + 2$.
(a) Find $\gcd(f, g)$.
(b) Hence factor $f$ and $g$ completely in $\mathbb{Z}[x]$.
4. Find a quartic polynomial in $\mathbb{Z}[x]$ with $\sqrt{5} - 2\sqrt{3}$ as a root. Factor it
completely in $\mathbb{R}[x]$. Then prove that it is irreducible in $\mathbb{Q}[x]$.
5. Prove the following generalization of Gauss’s Lemma 6.3.3. Let $R$ be
any UFD and let $K$ be its fraction field, as defined in Exercise 5 of
Section 2.4. Prove that a polynomial $p \in R[x]$ factors in $K[x]$ only if it
factors in $R[x]$. Furthermore, prove that if $p$ factors as $p = rs$ in $K[x]$,
then there exist $a, b \in K^*$ such that $r' = ar \in R[x]$, $s' = bs \in R[x]$, and
$p = r's'$.
6. (a) Prove that Z[x] is a UFD.
Hint: Use that Q[x] is a UFD.
(b) (Gauss’s Theorem on UFDs) Prove that if R is a UFD, then
R[x] is a UFD.

6.4. Eisenstein’s Criterion


In this section, we develop another test for irreducibility that carries these
ideas a little further.
6.4.1 Eisenstein’s Criterion. Let $p = \sum_{k=0}^{d} p_k x^k \in \mathbb{Z}[x]$. Suppose that q is a prime integer such that $q \mid p_i$ for 0 ≤ i < d, q does not divide $p_d$, and $q^2$ does not divide $p_0$. Then p is irreducible in Q[x].

Proof. Suppose to the contrary that p is reducible in Q[x]. Then by Gauss’s Lemma, we may write p = rs in Z[x] with deg(r), deg(s) > 0. Write $r(x) = \sum_{i=0}^{I} r_i x^i$ and $s(x) = \sum_{j=0}^{J} s_j x^j$ with $r_I, s_J \neq 0$ and I, J ≥ 1. Then I, J < d. The hypothesis tells us that q does not divide $p_d = r_I s_J$, hence q does not divide $r_I$ and does not divide $s_J$. Since q does divide $p_0 = r_0 s_0$ but $q^2$ does not, it follows that q divides one of $r_0$ or $s_0$, but not the other. Without loss of generality, $q \mid r_0$ and q does not divide $s_0$. Let $i_0 \le I$ be the least integer for which q does not divide $r_{i_0}$. Then

$$p_{i_0} = (r_0 s_{i_0} + \ldots + r_{i_0 - 1} s_1) + r_{i_0} s_0.$$

From the choice of $i_0$, it follows that q divides each term in the bracketed sum, but does not divide $r_{i_0} s_0$. Thus q does not divide $p_{i_0}$. Since $i_0 \le I < d$, this is contrary to the hypotheses. Therefore p must be irreducible. ∎

6.4.2. Example. Let us find an irreducible polynomial with $\sin(\frac{2\pi}{7})$ as a root using de Moivre’s Theorem. Let us write $c := \cos(\frac{2\pi}{7})$ and $s := \sin(\frac{2\pi}{7})$. Using the formula

$$1 = \cos(2\pi) + i\sin(2\pi) = (c + is)^7,$$

and taking the imaginary part of both sides, one obtains

$$\begin{aligned}
0 &= 7c^6 s - 35c^4 s^3 + 21c^2 s^5 - s^7 \\
  &= 7(1 - s^2)^3 s - 35(1 - s^2)^2 s^3 + 21(1 - s^2)s^5 - s^7 \\
  &= -s(64s^6 - 112s^4 + 56s^2 - 7).
\end{aligned}$$

Since $s = \sin(\frac{2\pi}{7}) \neq 0$, it is a root of the polynomial

$$p(x) = 64x^6 - 112x^4 + 56x^2 - 7.$$

This polynomial is a perfect candidate for Eisenstein’s criterion. Note that gcd(64, 7) = 1, but 7 divides −112, 56 and −7. Since $7^2$ does not divide −7, it follows that p is irreducible in Z[x] and hence in Q[x]. In particular, p has no rational roots. So $\sin(\frac{2\pi}{7})$ is irrational.
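The two facts used in this example, that the hypotheses of Eisenstein’s criterion hold for the prime 7 and that $\sin(\frac{2\pi}{7})$ really is a root of p, can be checked with a short Python sketch. The function name `eisenstein_applies` is an illustrative choice, and the root check is only a floating-point sanity test, not a proof.

```python
import math

def eisenstein_applies(coeffs, q):
    """True if the prime q meets the hypotheses of Eisenstein's criterion
    for the integer polynomial p_0 + p_1 x + ... + p_d x^d (constant first)."""
    *lower, lead = coeffs
    return (all(c % q == 0 for c in lower)    # q | p_i for 0 <= i < d
            and lead % q != 0                 # q does not divide p_d
            and coeffs[0] % (q * q) != 0)     # q^2 does not divide p_0

p = [-7, 0, 56, 0, -112, 0, 64]   # 64x^6 - 112x^4 + 56x^2 - 7

s = math.sin(2 * math.pi / 7)
residual = sum(c * s**i for i, c in enumerate(p))

print(eisenstein_applies(p, 7))   # hypotheses hold for q = 7
print(abs(residual) < 1e-9)       # p(s) vanishes up to rounding error
```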

6.4.3. Example. Sometimes, one has to be clever to find a way to use Eisenstein’s criterion. Let q be an integer prime. Let

$$p(x) = \frac{x^q - 1}{x - 1} = x^{q-1} + x^{q-2} + \ldots + x + 1.$$

There is no obvious way to use the method here. However, sometimes a substitution helps. Notice that p(x) factors as p = rs if and only if p(x + 1) factors as p(x + 1) = r(x + 1)s(x + 1). So compute

$$\begin{aligned}
p(x + 1) &= \frac{(x + 1)^q - 1}{(x + 1) - 1}
          = x^{-1} \sum_{k=1}^{q} \binom{q}{k} x^k
          = \sum_{k=1}^{q} \binom{q}{k} x^{k-1} \\
         &= x^{q-1} + q x^{q-2} + \frac{q(q-1)}{2} x^{q-3} + \ldots + \frac{q(q-1)}{2} x + q.
\end{aligned}$$

Notice that the leading coefficient is $\binom{q}{q} = 1$, the constant coefficient is $\binom{q}{1} = q$, and the other coefficients are $\binom{q}{k} = \frac{q!}{k!(q-k)!}$ for 2 ≤ k ≤ q − 1. This is always an integer. Now q divides the numerator, but not the denominator. Thus each is divisible by q, while the constant coefficient is not divisible by $q^2$. So p(x + 1) satisfies Eisenstein’s criterion, and thus is irreducible. So p is irreducible as well.

The roots of p are the q − 1 q-th roots of unity other than 1, namely $e^{2k\pi i/q}$ for 1 ≤ k ≤ q − 1.
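The substitution can be carried out by machine for a few small primes. The sketch below computes p(x + 1) by Horner’s rule, compares the result with the binomial coefficients from the expansion above, and checks the divisibility conditions; `shift_by_one` is a name chosen for illustration, and coefficients are listed with the constant term first.

```python
from math import comb

def shift_by_one(coeffs):
    """Coefficients of p(x + 1) from those of p (constant term first),
    using Horner's rule: repeatedly multiply by (x + 1) and add."""
    result = [0]
    for c in reversed(coeffs):
        # result * (x + 1) = result * x + result
        result = [a + b for a, b in zip([0] + result, result + [0])]
        result[0] += c
    while len(result) > 1 and result[-1] == 0:
        result.pop()                      # drop padding zeros at the top
    return result

for q in (2, 3, 5, 7, 11):
    shifted = shift_by_one([1] * q)       # p(x) = 1 + x + ... + x^(q-1)
    # the coefficient of x^(k-1) should be the binomial coefficient C(q, k)
    assert shifted == [comb(q, k) for k in range(1, q + 1)]
    # q divides every coefficient except the leading one; q^2 does not divide p_0
    assert all(c % q == 0 for c in shifted[:-1]) and shifted[0] % q**2 != 0
    print(q, shifted)
```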

Exercises

1. Prove that $x^5 - 210x^4 - 903x^3 + 168x - 315$ is irreducible in Z[x].
2. If n > 1 is a square free integer, show that $x^d - n$ is irreducible in Z[x].
3. Prove that $x^5 - 22x^4 + 196x^3 - 887x^2 + 2036x - 1886$ is irreducible in Z[x].
Hint: substitute x − 1 for x.
4. Prove that $x^7 - 14x^6 + 84x^5 - 280x^4 + 560x^3 - 672x^2 + 459x - 29$ is irreducible in Z[x].
Hint: find a substitution that helps.
5. Prove that if n is composite, the polynomial $x^{n-1} + x^{n-2} + \cdots + x + 1$ is reducible.
6. (Schur) Prove the following special case of a result of Schur: if p is prime and $a_1, \ldots, a_{p-1} \in \mathbb{Z}$, the polynomial $1 + \sum_{k=1}^{p-1} \frac{a_k}{k!} x^k + \frac{x^p}{p!}$ is irreducible in Q[x].
7. Prove the following generalization of Eisenstein’s Criterion 6.4.1. Let R be any UFD and let K be its fraction field, as defined in Exercise 5 of Section 2.4. Let q ∈ R be irreducible and let $p = \sum_{k=0}^{d} p_k x^k \in R[x]$. Suppose that $q \mid p_i$ for 0 ≤ i < d, q does not divide $p_d$, and $q^2$ does not divide $p_0$. Prove p is irreducible in K[x].
8. Prove $x^n + y^n - 1$ is irreducible in Q[x, y] for all n ≥ 1.
Hint: consider this as an element of (Q[x])[y] and note that $x^n - 1$ has a linear factor.

6.5. Factoring Modulo Primes


Another simple test for irreducibility is to study the factorization of f (x)
modulo p for various small primes p.

6.5.1. Lemma. If f ∈ Z[x] is reducible in Q[x], then it is reducible modulo p for every prime p relatively prime to the leading coefficient of f.

The reason for the condition on p in Lemma 6.5.1 is so that the degree of f does not decrease when moving to $\mathbb{Z}_p$. For example, the reducible polynomial $f(x) = 2x^2 + 3x + 1 = (2x + 1)(x + 1)$ reduces to f ≡ x + 1 (mod 2), which is irreducible.
Proof of Lemma 6.5.1. Fix a prime p relatively prime to the leading coefficient of f. For any h ∈ Z[x], let π(h) ∈ $\mathbb{Z}_p[x]$ be as in Lemma 6.1.3. Since f is reducible in Q[x], by Gauss’s Lemma we may write f = rs in Z[x] with deg(r), deg(s) > 0. Then π(f) = π(r)π(s). The product of the leading coefficients of r and s is the leading coefficient of f, and hence the leading coefficients of r and s are relatively prime to p. So
deg(π(r)) = deg(r) and deg(π(s)) = deg(s).
Hence both π(r) and π(s) are non-trivial factors of π(f) in $\mathbb{Z}_p[x]$. ∎
We state the contrapositive form as a corollary.

6.5.2. Corollary. If f ∈ Z[x] has leading coefficient coprime to p, and f


is irreducible modulo p, then f is irreducible in Q[x].

6.5.3. Example. Let $f(x) = x^5 + 5x^4 + 6x + 1$. By the Rational Root Theorem, the only possible rational roots are ±1, neither of which works. So, if f factors at all, it must be into a product of a cubic and a quadratic polynomial. Modulo 3, this polynomial is

$$f(x) \equiv x^5 - x^4 + 1 \pmod 3.$$

Substituting 0, 1 and −1 shows that it has no roots in $\mathbb{Z}_3$, so it has no linear factors there. The simplest approach is to find all the irreducible monic quadratic polynomials in $\mathbb{Z}_3[x]$, and test them. The reducible monic quadratics are the ones with zero constant coefficient, and the three products (x ± 1)(x ± 1); namely, $x^2$, $x^2 \pm x$, $x^2 - 1$, and $x^2 \pm x + 1$. That leaves $x^2 + 1$ and $x^2 \pm x - 1$ as the three irreducible monic quadratic polynomials in $\mathbb{Z}_3[x]$. A calculation shows that none of them divide f(x). Hence f is irreducible in $\mathbb{Z}_3[x]$. Therefore, by Corollary 6.5.2, it is irreducible over the rationals as well.
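The "calculation" in this example is a handful of polynomial divisions over $\mathbb{Z}_3$, which can be sketched in Python as follows. Rather than listing the irreducible quadratics first, this version simply tries every monic divisor of degree 1 and 2; the helper names `poly_rem` and `has_monic_factor` are illustrative, and coefficients are listed with the constant term first.

```python
from itertools import product

def poly_rem(f, g, p):
    """Remainder of f on division by the monic polynomial g over Z_p."""
    f = [c % p for c in f]
    while len(f) >= len(g):
        lead = f[-1]
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] = (f[shift + i] - lead * c) % p
        f.pop()                  # the leading coefficient is now zero
    return f

def has_monic_factor(f, degree, p):
    """True if some monic polynomial of the given degree divides f mod p."""
    for lower in product(range(p), repeat=degree):
        g = list(lower) + [1]
        if not any(poly_rem(f, g, p)):
            return True
    return False

f = [1, 6, 0, 0, 5, 1]           # x^5 + 5x^4 + 6x + 1

# A quintic with no factor of degree 1 or 2 cannot factor at all.
print(has_monic_factor(f, 1, 3), has_monic_factor(f, 2, 3))
```

Both values come out `False`, matching the conclusion that f is irreducible modulo 3.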

6.5.4. Example. This method can also be used to factor polynomials, by using the Chinese Remainder Theorem. Consider $f(x) = x^5 - 12x^3 + 17x^2 - 10x + 2$. The only possible rational roots are ±1 and ±2, none of which work. So if this factors, it is into the product of a cubic and a quadratic. Suppose that we have factored it mod 3 and mod 5 into irreducible factors.

$$f(x) \equiv (x^3 - x - 1)(x^2 + 1) \pmod 3$$
$$f(x) \equiv (x^3 + 3x^2 + x + 2)(x + 1)^2 \pmod 5$$

The cubic factor g(x), if it exists, is congruent to $x^3 - x - 1$ (mod 3) and congruent to $x^3 + 3x^2 + x + 2$ (mod 5). Solving this system of congruences coefficient by coefficient, we find that

$$g(x) \equiv x^3 + 3x^2 + 11x + 2 \pmod{15}.$$

Moreover, the leading coefficient divides 1, and hence may be taken to be 1; and the constant coefficient must divide 2. Since it is congruent to 2 modulo 15, it must be 2. Let us write $g(x) = x^3 + ax^2 + bx + 2$.

This forces the constant coefficient of the quadratic factor h(x) to be 1. Since the coefficient of $x^4$ in f(x) is 0, the coefficient of x in h must be −a. Let’s write $h(x) = x^2 - ax + 1$. Trial of small choices for a and b now yields the factorization

$$x^5 - 12x^3 + 17x^2 - 10x + 2 = (x^3 + 3x^2 - 4x + 2)(x^2 - 3x + 1).$$

This kind of search can be carried out with reasonable efficiency on a


computer. However, this is not the standard algorithm used on computers
to factor polynomials. The methods used will be discussed at the end of the
next chapter.
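As a small illustration of the computation in Example 6.5.4 (and not of the standard factoring algorithms mentioned above), the following Python sketch combines the two cubic factors coefficient by coefficient with the Chinese Remainder Theorem and then confirms the final factorization by multiplying it out. The helper names `crt_pair` and `poly_mul` are made up for this sketch; coefficients are listed with the constant term first.

```python
def crt_pair(r3, r5):
    """The residue mod 15 that is r3 mod 3 and r5 mod 5 (brute force)."""
    return next(x for x in range(15) if x % 3 == r3 % 3 and x % 5 == r5 % 5)

def poly_mul(a, b):
    """Product of two integer polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

g_mod3 = [-1, -1, 0, 1]     # x^3 - x - 1
g_mod5 = [2, 1, 3, 1]       # x^3 + 3x^2 + x + 2
g_mod15 = [crt_pair(a, b) for a, b in zip(g_mod3, g_mod5)]
print(g_mod15)              # coefficients of x^3 + 3x^2 + 11x + 2

g = [2, -4, 3, 1]           # x^3 + 3x^2 - 4x + 2
h = [1, -3, 1]              # x^2 - 3x + 1
f = [2, -10, 17, -12, 0, 1] # x^5 - 12x^3 + 17x^2 - 10x + 2

print(poly_mul(g, h) == f)                                   # g * h equals f
print(all((c - d) % 15 == 0 for c, d in zip(g, g_mod15)))    # g matches mod 15
```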

Exercises

1. Reduce the polynomial in Section 6.4, Exercise 3 modulo 3 and factor it completely. Use this to show that the polynomial is irreducible in Z[x].
2. Reduce the polynomial in Section 6.4, Exercise 4 modulo 2. Use this to show that the polynomial is irreducible in Z[x].
3. Decide if $x^5 + 2x + 4$ is irreducible in Z[x] by reducing mod 3.
4. Prove that $p(x) = x^4 + 1$ is reducible in $\mathbb{Z}_p[x]$ for every prime p. (Compare with Section 6.2, Exercise 2.)
Hint: you need to know when −1, ±2 are squares mod p.
5. The polynomial $q(x) = x^6 - 6x^4 + 14x^3 + 12x^2 + 84x + 41$ factors as

$$q \equiv (x^2 - x - 1)^3 \pmod 3 \quad\text{and}\quad q \equiv (x - 3)^3 (x + 3)^3 \pmod 7.$$

Show that q is irreducible.
6. Show that if n is odd and p is prime, then $f(x) = x^n - p^2$ is irreducible in Z[x].
Hint: if f = gh, then $g(x^2)h(x^2) = (x^n - p)(x^n + p)$.
7. Show that $x^4 + 12x^2 + 18x + 6$ is irreducible in (Z[i])[x]. Remember that 2 is not a prime in the Gaussian integers.
8. (Perron’s irreducibility criterion)
Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ with $a_i \in \mathbb{Z}$ and suppose

$$|a_{n-1}| > 1 + |a_{n-2}| + \cdots + |a_1| + |a_0|.$$

Prove that f is irreducible in Z[x].
Hint: Use Section 5.5, Exercise 5.
9. (A variant on Cohn’s irreducibility criterion)
Let $f(x) = a_d x^d + a_{d-1}x^{d-1} + \cdots + a_0$ with $a_i \in \mathbb{Z}$ and $a_d \neq 0$. Suppose there is n ∈ Z with f(n) prime and

$$n \ge 2 + \max_{0 \le i < d} \left| \frac{a_i}{a_d} \right|.$$

Prove that f(x) is irreducible in Z[x].¹
Hint: Use Section 5.5, Exercise 2 to bound the roots of f. Show that if f = gh, then |g(n)| > 1.

6.6. Algebraic Numbers


6.6.1. Definition. A complex number w is called algebraic if it is the
root of a polynomial in Q[x]. A monic polynomial p in Q[x] of least degree
such that p(w) = 0 is called the minimal polynomial of w.

We will establish that the minimal polynomial is unique. No particular properties of the field of rational numbers are used here. Indeed, if F is any field contained in a larger field G and w ∈ G is a root of a polynomial in F[x], then the minimal polynomial of w is the monic polynomial of least degree in F[x] with w as a root. The following result is valid in this greater generality without any change in the proof.

6.6.2. Theorem. The minimal polynomial p of an algebraic number w is


unique. Moreover, p is irreducible, and if q is another polynomial such that
q(w) = 0, then p divides q.

Proof. If q and r are two polynomials such that q(w) = r(w) = 0, let
s = gcd(q, r). By the Euclidean algorithm for Q[x], there are polynomials a
and b in Q[x] so that s = aq + br. Hence
s(w) = a(w)q(w) + b(w)r(w) = 0.
In particular, the monic polynomial t = gcd(p, q) satisfies t(w) = 0.
Thus deg(t) ≥ deg(p). Since t|p, it follows that t and p are scalar multiples
of one another. Since p and t are monic, it follows that t = p. Hence, p also
divides q.
Suppose that p is not irreducible in Q[x], say p = qr where q and r are non-constant polynomials in Q[x]. But then
0 = p(w) = q(w)r(w).
So either q(w) = 0 or r(w) = 0. But this is impossible, as they have smaller degree than p, which is the polynomial of smallest degree in Q[x] with w as a root. Hence p must be irreducible. ∎
This immediately yields a powerful test for irrationality of algebraic
numbers.
¹ This variant of Cohn’s irreducibility criterion is due to Murty [27].

6.6.3. Corollary. If w is a root of an irreducible polynomial p in Z[x] of


degree at least 2, then w is irrational.

Proof. From the hypothesis, it follows that p is the minimal polynomial of w (up to a scalar). The minimal polynomial of a rational number r is x − r, which has degree 1. So w is irrational. ∎

6.6.4. Example. If |n| > 1 is square free, the polynomial $x^k - n$ is irreducible by Eisenstein’s criterion. Just take any prime p dividing n, and note that p divides all the zero coefficients, and $p^2$ does not divide n. So $x^k - n$ is the minimal polynomial of $\sqrt[k]{n}$. For k ≥ 2, this gives another proof of the irrationality of $\sqrt[k]{n}$.
6.6.5. Example. Let $w = \sqrt[3]{3} - \sqrt{2}$. Notice that

$$3 = (w + \sqrt{2})^3 = w^3 + 3\sqrt{2}\,w^2 + 6w + 2\sqrt{2}.$$

Hence we may compute

$$\begin{aligned}
(w^3 + 6w - 3)^2 &= (3\sqrt{2}\,w^2 + 2\sqrt{2})^2 \\
w^6 + 12w^4 - 6w^3 + 36w^2 - 36w + 9 &= 18w^4 + 24w^2 + 8 \\
w^6 - 6w^4 - 6w^3 + 12w^2 - 36w + 1 &= 0.
\end{aligned}$$

From the rational roots theorem, the only possible rational roots of the polynomial $p(x) = x^6 - 6x^4 - 6x^3 + 12x^2 - 36x + 1$ are ±1, neither of which works.

In fact, p is irreducible in Q[x]. By Gauss’s Lemma, it suffices to show that p is irreducible in Z[x]. To see this, reduce it mod 3. The polynomial factors as

$$p \equiv (x^2 + 1)^3 \pmod 3$$

and $x^2 + 1$ is irreducible in $\mathbb{Z}_3[x]$ since it has no roots in $\mathbb{Z}_3$. So if p factors in Z[x], it factors as a quadratic times a quartic. The quadratic must be $x^2 + 3ax + 1$. There are two ways to proceed, and both are computational. One is to write down a general quartic, multiply it by $x^2 + 3ax + 1$, and set it equal to p. Then a calculation shows that the equations can’t be solved. Since the coefficients of x and $x^3$ are forced, simple conditions on a and the coefficient b of $x^2$ lead to a contradiction. Alternatively, we can factor p mod 7 as

$$p(x) \equiv (x^3 - 2x^2 - x - 2)(x^3 + 2x^2 - x + 3) \pmod 7.$$

You can check quickly that neither cubic has a root in $\mathbb{Z}_7$, and thus they are irreducible. This shows that any factorization in Z[x] must be into cubics. This is incompatible with the factorization mod 3. So p is irreducible.
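The computational claims in Example 6.6.5 are easy to confirm by machine. The following Python sketch checks, under the convention that coefficients are listed with the constant term first, that w is numerically a root of p, that the two stated factorizations hold mod 3 and mod 7, and that neither cubic has a root in $\mathbb{Z}_7$. The helper `poly_mul_mod` is an illustrative name, and the floating-point root check is a sanity test only.

```python
def poly_mul_mod(a, b, m):
    """Product of two integer polynomials with coefficients reduced mod m."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % m
    return out

p = [1, -36, 12, -6, -6, 0, 1]   # x^6 - 6x^4 - 6x^3 + 12x^2 - 36x + 1

# w = cube root of 3 minus square root of 2 should be a root (up to rounding).
w = 3 ** (1 / 3) - 2 ** 0.5
print(abs(sum(c * w**i for i, c in enumerate(p))) < 1e-9)

# p = (x^2 + 1)^3 mod 3
quad = [1, 0, 1]
cube = poly_mul_mod(poly_mul_mod(quad, quad, 3), quad, 3)
print(cube == [c % 3 for c in p])

# p = (x^3 - 2x^2 - x - 2)(x^3 + 2x^2 - x + 3) mod 7
r, s = [-2, -1, -2, 1], [3, -1, 2, 1]
print(poly_mul_mod(r, s, 7) == [c % 7 for c in p])

# Values of each cubic at 0, 1, ..., 6 mod 7; no zero means no root.
for cubic in (r, s):
    print([sum(c * x**i for i, c in enumerate(cubic)) % 7 for x in range(7)])
```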
It is a non-trivial fact that if u and v are algebraic numbers, then u + v, uv and (when v ≠ 0) u/v are all algebraic numbers. This will be proven in Theorem 6.9.3.

Exercises

1. Show that $\sin(1^\circ) = \sin(\frac{\pi}{180})$ is algebraic.
2. (a) Find a polynomial p in Z[x] with $\sqrt{3} + \sqrt[3]{5}$ as a root.
(b) Hence prove that $\sqrt{3} + \sqrt[3]{5}$ is irrational.
(c) Suppose you have calculated that p factors modulo 3 as $(x + 1)^6$, and modulo 5 as $(x^2 + 2)^3$. Show that p is irreducible.
3. Find the minimal polynomial of $\sqrt{2} - \sqrt[3]{7}$.
4. Let F be a field and let $p(x) = a_0 + a_1 x + \ldots + a_n x^n$ and $q(x) = a_n + a_{n-1} x + \ldots + a_0 x^n$ belong to F[x]. If $a_0 a_n \neq 0$, what is the relationship between the roots of p and the roots of q? Hence conclude that if α is algebraic over F, then so is 1/α.
5. (Primitive Element Theorem) Let α and β be algebraic numbers.
Recall the definition of a field generated by an element given in Exercise
6 of Section 6.1. Let f (x) ∈ Q[x] be the minimal polynomial of α and
let g(x) ∈ Q[x] be the minimal polynomial of β.
(a) Prove there exists c ∈ Q such that β is the only common complex
root of g(x) and h(x) = f (α + c(β − x)).
(b) Let γ = α + cβ. Prove that gcd(g, h) = w(x − β) ∈ Q(γ).
Hint: Use Exercise 6 from Section 6.2.
(c) Prove that α, β ∈ Q(γ) and conclude that Q(α)(β) = Q(γ).
(d) Prove that if α1 , . . . , αn are algebraic numbers, then there exists δ
such that Q(α1 )(α2 ) · · · (αn ) = Q(δ).

6.7. Transcendental Numbers


A complex number which is not algebraic is called transcendental. In this
section, we will establish that various complex numbers are transcendental.
This problem has a long history. Liouville showed that certain numbers were transcendental in 1851. However, his methods did not apply to many naturally occurring numbers, such as π and e. In 1873, Hermite showed that e was transcendental. And in 1882, Lindemann generalized his argument to show that any non-trivial sum

$$\sum_{i=1}^{n} \alpha_i e^{\beta_i}$$

is never 0 if the $\alpha_i \neq 0$ are algebraic, and the $\beta_i$ are distinct algebraic numbers. This means that π is not algebraic because

$$e^0 + e^{i\pi} = 0.$$

As 0, 1, and i are all algebraic, π must be transcendental. In 1934, Gelfond and Schneider proved that $\alpha^\beta$ is always transcendental if α ≠ 0, 1 is algebraic, and β is an irrational algebraic number. In general, these results are very difficult. We will take a look at the results of Liouville and Hermite.
Liouville’s result is based on the fact that irrational algebraic numbers
cannot be approximated too quickly by rational numbers. This is made
precise in the following theorem.

6.7.1. Theorem. Suppose that w is a real root of an irreducible polynomial

$$p(x) = \sum_{i=0}^{d} p_i x^i$$

in Q[x] of degree d > 1. Then there is a positive number δ > 0 so that for every rational number $\frac{a}{b}$ with a, b ∈ Z relatively prime, we have

$$\left| w - \frac{a}{b} \right| \ge \frac{\delta}{|b|^d}.$$

Proof. We will assume that p has integer coefficients, because this can easily be achieved by multiplying p by a large integer. Let

$$M = \max_{|x - w| \le 1} |p'(x)| \le \sum_{i=1}^{d} i\,|p_i|\,(|w| + 1)^{i-1} < \infty.$$

The next observation is the key idea. The number $b^d p(\frac{a}{b})$ is a non-zero integer. It is an integer because

$$b^d\, p\!\left(\frac{a}{b}\right) = \sum_{i=0}^{d} p_i a^i b^{d-i}.$$

Since p is irreducible of degree greater than 1, it has no rational roots. Thus $b^d p(\frac{a}{b}) \neq 0$. A non-zero integer has modulus at least 1. Hence

$$\left| p\!\left(\frac{a}{b}\right) \right| \ge |b|^{-d}.$$

Now apply the mean value theorem. Suppose that $\left| \frac{a}{b} - w \right| \le 1$. Then there is a real number c between w and $\frac{a}{b}$ so that

$$p\!\left(\frac{a}{b}\right) = p\!\left(\frac{a}{b}\right) - p(w) = p'(c)\left(\frac{a}{b} - w\right).$$

Hence

$$\left| w - \frac{a}{b} \right| = \frac{|p(\frac{a}{b})|}{|p'(c)|} \ge \frac{1}{M |b|^d}.$$

If we set $\delta = \min\{1, M^{-1}\}$, the desired formula holds. The hard part was done in the previous paragraph for fractions close to w. The remaining case, when $\left| w - \frac{a}{b} \right| > 1$, follows since $1 \ge \delta / |b|^d$. ∎

6.7.2. Example. (Liouville numbers) Let q > 1 be an integer, and define

$$w = \sum_{k \ge 1} q^{-k!}.$$

Then w is transcendental. To see this, first observe that the base-q expansion of w is given by a sequence of 1’s and 0’s with a 1 in the k!-th place after the point; since this is a non-repeating sequence, w must be irrational. To prove w is transcendental, we may therefore apply Theorem 6.7.1. Let $b_n = q^{n!}$ and

$$a_n = q^{n!} \sum_{k=1}^{n} q^{-k!} = \sum_{k=1}^{n} q^{n! - k!}.$$

Notice that

$$\left| w - \frac{a_n}{b_n} \right| = \sum_{k \ge n+1} q^{-k!} < q^{-(n+1)!} \sum_{j \ge 0} q^{-j} \le 2 q^{-(n+1)!}.$$

Consider any positive integer d. Then for all n ≥ d,

$$b_n^d \left| w - \frac{a_n}{b_n} \right| < 2 q^{-n!(n+1-d)} \le 2 q^{-n!}.$$

Since this tends to 0 as n tends to infinity, there is no integer d and positive δ such that

$$\left| w - \frac{a}{b} \right| \ge \frac{\delta}{|b|^d}$$

for all fractions. By Theorem 6.7.1, w cannot be algebraic.

For example, $w = \sum_{n \ge 1} 10^{-n!} = 0.1100010000000000000000010\ldots$ is transcendental.
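The estimate in Example 6.7.2 can be watched in exact arithmetic. The sketch below takes q = 10 and d = 3 (an arbitrary choice) and uses the partial sum up to k = 6 as a stand-in for w, which is an assumption of the sketch: the neglected tail is smaller than $2 \cdot 10^{-5040}$, far below the quantities being compared. The function name `partial_sum` is illustrative.

```python
from fractions import Fraction
from math import factorial

def partial_sum(q, n):
    """a_n / b_n = sum_{k=1}^{n} q^(-k!) as an exact fraction."""
    return sum(Fraction(1, q ** factorial(k)) for k in range(1, n + 1))

q, d = 10, 3
w_approx = partial_sum(q, 6)      # stand-in for w (see the lead-in)

for n in range(d, 6):
    b_n = q ** factorial(n)
    scaled_error = b_n**d * abs(w_approx - partial_sum(q, n))
    bound = 2 * Fraction(1, q ** (factorial(n) * (n + 1 - d)))
    # b_n^d |w - a_n/b_n| should stay below 2 q^(-n!(n+1-d))
    print(n, scaled_error < bound)
```

The scaled errors shrink extremely fast as n grows, which is exactly what rules out a bound of the form $\delta / |b|^d$.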

Now let us consider the much more difficult task of showing that e is
transcendental. This proof has been simplified over the years, but perhaps
it will seem rather mysterious because so much of the ‘scaffolding’ has been
removed in order to make it short. The proof uses calculus, not surprisingly,
since it is in calculus that properties of e are developed. In particular, we use the fact that $\frac{d}{dx}(e^x) = e^x$.

6.7.3. Theorem. e is transcendental.

Proof. Suppose, to the contrary, that there are integers $a_0, \ldots, a_n$ so that

$$a_0 + a_1 e + a_2 e^2 + \ldots + a_n e^n = 0.$$

We may assume that $a_n a_0 \neq 0$. For any large prime $p > \max\{|a_0|, n\}$, consider the polynomial

$$\begin{aligned}
f(x) &= \frac{1}{(p-1)!}\, x^{p-1} (1 - x)^p (2 - x)^p \cdots (n - x)^p \\
     &= \frac{(n!)^p}{(p-1)!}\, x^{p-1} + \text{higher order terms}
      = \frac{1}{(p-1)!} \sum_{k=p-1}^{K} f_k x^k
\end{aligned}$$

where K = (n + 1)p − 1 is the degree of f. Notice that the coefficients $f_k$ are integers.
We need information about the values $f^{(j)}(i)$ for integers j ≥ 0 and 0 ≤ i ≤ n, where $f^{(j)}$ means the j-th derivative of f. Notice that for j ≥ p,

$$\begin{aligned}
f^{(j)}(x) &= \sum_{k=j}^{K} \frac{k(k-1)(k-2) \cdots (k+1-j)}{(p-1)!}\, f_k x^{k-j} \\
           &= \sum_{k=j}^{K} \binom{k}{j}\, j(j-1) \cdots (p)\, f_k x^{k-j}.
\end{aligned}$$

This polynomial has integer coefficients which are multiples of p. Hence

$$f^{(j)}(i) \equiv 0 \pmod p \quad \text{for } j \ge p,\ i \in \mathbb{Z}.$$

Now f has a zero of order p at each integer 1 ≤ i ≤ n. So each i is also a root of $f^{(j)}$ for 0 ≤ j ≤ p − 1. (See the exercises.) Hence

$$f^{(j)}(i) = 0 \quad \text{for } 0 \le j \le p-1,\ 1 \le i \le n.$$

Similarly, since f has a zero of order p − 1 at 0,

$$f^{(j)}(0) = 0 \quad \text{for } 0 \le j \le p-2.$$

Finally, there is one term which is not a multiple of p (because p > n),

$$f^{(p-1)}(0) = (n!)^p \not\equiv 0 \pmod p.$$

The next trick is to introduce the polynomial

$$F(x) = \sum_{j=0}^{K} f^{(j)}(x).$$

From the previous paragraph, we see that

$$F(i) \equiv 0 \pmod p \quad \text{for } 1 \le i \le n$$

and, because $p > |a_0|$,

$$a_0 F(0) \equiv a_0 (n!)^p \not\equiv 0 \pmod p.$$
Since $a_0 = -a_1 e - a_2 e^2 - \ldots - a_n e^n$,

$$\begin{aligned}
0 \not\equiv \sum_{i=0}^{n} a_i F(i) &= \sum_{i=1}^{n} a_i \left( F(i) - e^i F(0) \right) \\
 &= \sum_{i=1}^{n} a_i e^i \left( e^{-i} F(i) - e^0 F(0) \right) \pmod p.
\end{aligned}$$

Now it remains to estimate the size of this non-zero integer. Since deg(f) = K, we have $f^{(K+1)} = 0$. A routine calculation shows that

$$\frac{d}{dx}\left( e^{-x} F(x) \right) = -e^{-x} \sum_{j=0}^{K} f^{(j)}(x) + e^{-x} \sum_{j=0}^{K} f^{(j+1)}(x) = -e^{-x} f(x).$$

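By the mean value theorem, there are real numbers $c_i \in (0, i)$ so that

$$\begin{aligned}
\left| e^{-i} F(i) - e^0 F(0) \right| &= i \left| \frac{d}{dx}\left( e^{-x} F(x) \right)(c_i) \right| = i\, e^{-c_i} |f(c_i)| \\
 &\le n \max_{0 \le x \le n} |f(x)| \le \frac{n^{K+1}}{(p-1)!}.
\end{aligned}$$

The last estimate comes from $(p-1)!\,|f(x)| = |x^{p-1}(1 - x)^p \cdots (n - x)^p| \le n^K$ for 0 ≤ x ≤ n.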
Let $A = \max_{0 \le j \le n} |a_j|$. Then one can estimate

$$\begin{aligned}
\left| \sum_{i=1}^{n} a_i e^i \left( e^{-i} F(i) - e^0 F(0) \right) \right|
 &\le \sum_{i=1}^{n} A e^n \frac{n^{K+1}}{(p-1)!} \\
 &= \frac{A e^n n^{K+2}}{(p-1)!} = \frac{A n e^n (n^{n+1})^p}{(p-1)!}.
\end{aligned}$$

So the idea is to choose a prime p so large that this fraction is less than 1. If this fraction is denoted as $B_p$, we see that for $p > 2n^{n+1}$,

$$\frac{B_{p+1}}{B_p} = \frac{n^{n+1}}{p} < \frac{1}{2}.$$

Thus, by the ratio test,

$$\lim_{p \to \infty} B_p = 0.$$

Choose the prime p so large that $B_p < 1$. However the left-hand side represents a non-zero integer. Clearly, this is contradictory.
Therefore e does not satisfy any algebraic equation over Q. ∎

Exercises

1. Show that $\sum_{n \ge 1} 2^{-n^n}$ is transcendental.
2. (a) Prove that if α is transcendental and $q \in \mathbb{Q} \setminus \{0, 1\}$, then $\alpha^q$ is transcendental.
(b) Give an example of a transcendental number α and an irrational number q such that $\alpha^q$ is algebraic.
3. (a) Show that if $p(x) = (x - a)^d q(x)$ is a polynomial in R[x], then $p^{(j)}$ has the form $(x - a)^{d-j} r(x)$ for all j ≤ d.
(b) Moreover, if q(a) ≠ 0, show that r(a) ≠ 0.
(c) Show that a root a of p(x) is simple if and only if gcd(p, p′)(a) ≠ 0.
4. Show that if α is transcendental and β ≠ 0 is algebraic, then α + β, αβ and $\alpha^{-1}$ are all transcendental.
5. If $0 \le a_k \le 9$ are integers for k ≥ 1 and infinitely many are non-zero, then $w = \sum_{k \ge 1} a_k 10^{-k!}$ is transcendental.
6. (a) Show that if gcd(a, b) = 1, then $\left| \sqrt{15} - \frac{a}{b} \right| > \frac{1}{9b^2}$.
(b) If n is a positive square free integer, find a C so that $\left| \sqrt{n} - \frac{a}{b} \right| > \frac{1}{Cb^2}$.

6.8. Sturm’s Algorithm


Recall the factorization theorem 5.6.2 for real polynomials. Define the discriminant of a quadratic polynomial $p(x) = ax^2 + bx + c$ by $\Delta(p) = b^2 - 4ac$. This theorem can be restated as:

6.8.1. Theorem. The irreducible polynomials in R[x] are the linear polynomials, and the quadratic polynomials with negative discriminant. The roots of irreducible quadratic polynomials are a conjugate pair $\{a, \bar{a}\}$ of non-real complex numbers.

As in Exercise 3 of Section 6.7 or Lemma 7.8.6, we may test for multiple roots by computing gcd(p, p′). Moreover, all the roots are simple roots exactly when gcd(p, p′) = 1. (That lemma may be read independently of the rest of Chapter 7. It is easier in the case of the reals, and other fields of characteristic 0, because the derivative of a non-constant polynomial must be non-zero.)

We now describe an algorithm known as Sturm’s Algorithm for counting the number of real roots of a real polynomial with simple roots in any interval. The key is the Euclidean algorithm with a special sign convention.

Start with a real polynomial p(x) with simple roots. Set $p_0 = p$ and $p_1 = p'$. Apply the Euclidean algorithm by repeated use of the division algorithm. Recall that dividing $p_i$ into $p_{i-1}$ yields a quotient $a_i$ and a remainder which we call $-p_{i+1}$, so that

$$p_{i-1} = a_i p_i - p_{i+1}.$$

Since gcd(p, p′) = 1, this procedure eventually terminates with the relation

$$p_{n-1} = a_n p_n - 0$$

where $p_n$ is a non-zero scalar (since it is a scalar multiple of gcd(p, p′) = 1).

For each real number a, consider the sequence

$$p_0(a),\ p_1(a),\ \ldots,\ p_{n-1}(a),\ p_n(a).$$

We say a sign change occurs at $p_i$ and a if $p_i(a)p_{i+1}(a) < 0$; i.e., $p_i(a)$ is positive and $p_{i+1}(a)$ is negative, or vice versa. We also say a sign change occurs at $p_i$ and a if $p_{i-1}(a) > 0$, $p_i(a) = 0$, and $p_{i+1}(a) < 0$, or if $p_{i-1}(a) < 0$, $p_i(a) = 0$, and $p_{i+1}(a) > 0$. In fact, the proof below will show that if $p_i(a) = 0$, then $p_{i-1}(a)p_{i+1}(a) < 0$; so there is always a sign change at $p_i$ and a.

If a sign change occurs at $p_i$ and a, we write $\chi_i(a) = 1$; otherwise we write $\chi_i(a) = 0$. Let

$$\chi(a) = \chi_0(a) + \cdots + \chi_{n-1}(a);$$

in other words, χ(a) is the total number of sign changes in the sequence $p_0(a), p_1(a), \ldots, p_n(a)$.

6.8.2 Sturm’s Theorem. Let p(x) ∈ R[x] be a polynomial with simple


roots. Then the number of real roots in the interval [a, b] is χ(a) − χ(b).

Proof. Since $\gcd(p_i, p_{i+1}) = 1$, the polynomials $p_i$ and $p_{i+1}$ have no common roots. If $p_k(t) = 0$, then

$$p_{k-1}(t) = a_k(t) p_k(t) - p_{k+1}(t) = -p_{k+1}(t).$$

From this, we can deduce that if $p_k(t) = 0$, then $p_{k \pm 1}$ are non-zero and of opposite signs in a neighbourhood of t. The constant function $p_n$ never changes sign. Moreover, the roots of $p_0$ are simple, so $p_0$ changes sign at each root.

Consider the effect on the function $\chi_0$ near a root t of $p_0$. Note that a sign change in $p_0$ does not affect $\chi_k$ for k ≥ 1, as these quantities do not depend on $p_0$. Since t is a simple root, $p_0$ changes sign at t. Suppose that the sign change of $p_0$ is from positive to negative. Then $p_0$ is decreasing near t, and thus the derivative $p_1$ is negative near t. So there is a sign change from positive to negative between $p_0$ and $p_1$ on the interval (t − ε, t), but no change (from negative to negative) on the interval (t, t + ε) for small ε > 0. In other words, the function $\chi_0$ decreases by one at t:

$$\lim_{\varepsilon \to 0^+} \left( \chi_0(t + \varepsilon) - \chi_0(t - \varepsilon) \right) = -1.$$

Similarly, if $p_0$ changes sign from negative to positive at t, then $p_0$ is increasing, and $p_1$ is positive near t. So again there is a sign change between $p_0$ and $p_1$ on the interval (t − ε, t), but no change on the interval (t, t + ε) for small ε > 0. So again the function $\chi_0$ decreases by one at t.
Next consider the effect of a zero t of $p_k$ for 1 ≤ k < n. The resulting (possible) change of sign of $p_k$ may affect both $\chi_{k-1}$ and $\chi_k$. As shown above, $p_{k \pm 1}$ are of opposite signs in a neighbourhood of t. Now

$$\gcd(p_{k-1}, p_k) = 1 = \gcd(p_k, p_{k+1}),$$

and thus $p_k$ has no roots in common with $p_{k \pm 1}$. Hence there exists ε > 0 for which $p_{k \pm 1}$ are non-zero on [t − ε, t + ε] and $p_k$ has only t as a root in this interval. So, we may assume without loss of generality that $p_{k-1} < 0$ and $p_{k+1} > 0$ on [t − ε, t + ε]. Observe that a change of signs is possible for $p_k$ at t. We make the following table

         p_{k−1}   p_k   p_{k+1}
t − ε       −       ?       +
t           −       0       +
t + ε       −       ?       +

where we do not know the signs of $p_k(t \pm \varepsilon)$. Changing the sign from − to + results in increasing $\chi_{k-1}(t + \varepsilon)$ by 1 and decreasing $\chi_k(t + \varepsilon)$ by 1, leaving $\chi_{k-1}(t \pm \varepsilon) + \chi_k(t \pm \varepsilon)$ the same. Similarly a change from + to − results in decreasing $\chi_{k-1}(t + \varepsilon)$ by 1 and increasing $\chi_k(t + \varepsilon)$ by 1, again leaving $\chi_{k-1}(t \pm \varepsilon) + \chi_k(t \pm \varepsilon)$ the same. Of course, if the sign of $p_k$ does not change, this also has no effect on χ(t ± ε). A sign change in $p_k$ does not affect $\chi_j$ except for j = k − 1 and k. Therefore regardless of these signs, we see χ(t − ε) = χ(t + ε).

The theorem now follows. Our above analysis proves that $\chi(a) - \chi(b) = \chi_0(a) - \chi_0(b)$. So if χ(a) − χ(b) = n, this must be a result of a decrease of 1 in the value of $\chi_0$ at each of n zeros of $p_0$ between a and b. ∎

6.8.3. Example. Consider the polynomial $p(x) = x^5 - 3x - 1$. One checks that $\gcd(p, p') = \gcd(x^5 - 3x - 1,\ 5x^4 - 3) = 1$, so that p has simple roots. Then

$$\begin{aligned}
p_1(x) &= 5x^4 - 3 \\
p_2(x) &= \frac{x}{5}\, p_1(x) - p(x) = \frac{12}{5} x + 1 \\
p_3(x) &= \left( \frac{5^2}{12} x^3 - \frac{5^3}{12^2} x^2 + \frac{5^4}{12^3} x - \frac{5^5}{12^4} \right) p_2(x) - p_1(x)
        = \frac{4^4 3^5 - 5^5}{12^4} > 0.
\end{aligned}$$

Consider the signs recorded in Table 6.8.1, where $p_2$ and $p_3$ have been replaced by the positive multiples 12x + 5 and 1. This chart shows that there are three real roots. One lies in each of the intervals (−2, −1), (−1, 0) and (1, 2). We could refine this by checking the points −1.9, −1.8, . . . , −1.1, etc., to get more detail.
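The whole procedure of Example 6.8.3 fits in a short program. The following Python sketch builds the sequence $p_0, p_1, \ldots$ with exact rational arithmetic and counts sign changes at a point, skipping zero values (which agrees with the convention above that a zero flanked by opposite signs counts as one change). It assumes, as in the text, that p has simple roots; the function names are illustrative, and coefficients are listed with the constant term first.

```python
from fractions import Fraction

def poly_rem(f, g):
    """Remainder of f on division by g (g has nonzero leading coefficient)."""
    f = list(f)
    while len(f) >= len(g):
        factor = f[-1] / g[-1]
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] -= factor * c
        f.pop()                        # leading coefficient is now zero
    while f and f[-1] == 0:
        f.pop()
    return f

def sturm_sequence(p):
    """p_0 = p, p_1 = p', and p_{i+1} = -(remainder of p_{i-1} by p_i)."""
    p = [Fraction(c) for c in p]
    seq = [p, [i * c for i, c in enumerate(p)][1:]]
    while len(seq[-1]) > 1:
        seq.append([-c for c in poly_rem(seq[-2], seq[-1])])
    return seq

def sign_changes(seq, a):
    """Number of sign changes in p_0(a), p_1(a), ..., ignoring zeros."""
    values = [sum(c * a**i for i, c in enumerate(f)) for f in seq]
    signs = [v > 0 for v in values if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))

seq = sturm_sequence([-1, -3, 0, 0, 0, 1])       # x^5 - 3x - 1
chi = [sign_changes(seq, a) for a in (-2, -1, 0, 1, 2)]
print(chi)                                        # sign-change counts χ(a)
```

Differences of consecutive entries give the number of roots in each unit interval, so the drop between a = −2 and a = 2 counts the real roots found in the example.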

Exercises

1. Use Sturm’s algorithm to find the number of zeros of $x^7 - 7x^3 + 8$.

         p               p1          p2        p3
 x       x^5 − 3x − 1    5x^4 − 3    12x + 5   1     χ
 −∞      −               +           −         +     3
 −2      −               +           −         +     3
 −1      +               +           −         +     2
 0       −               −           +         +     1
 1       −               +           +         +     1
 2       +               +           +         +     0
 +∞      +               +           +         +     0

Table 6.8.1. Sign changes

2. Use Sturm’s algorithm to show that $x^3 + ax + b$ has three real roots (counting multiplicity) if and only if

$$\Delta := -4a^3 - 27b^2 \ge 0.$$

Remember to deal with the case of repeated roots separately.
3. Solve the previous two exercises using calculus. (For simple polynomials like these, calculus is easier.)
4. Locate all 7 roots of $x^7 - 259x^5 - 510x^4 + 2x^3 - 518x - 1020$ within 0.5 using Sturm’s algorithm.
5. Use Sturm’s algorithm to locate all real roots of $x^6 - 5x^3 + 2x - 1$ up to an error of 0.1.
6. If f ∈ R[x] has repeated roots, explain how to factor f into a product
of polynomials with simple roots.

6.9. Symmetric Functions


Consider the polynomial

$$\begin{aligned}
\prod_{i=1}^{n} (x - y_i) &= x^n - (y_1 + y_2 + \ldots + y_n) x^{n-1} + \ldots \pm y_1 y_2 \cdots y_n \\
 &= x^n - P_1(y_1, y_2, \ldots, y_n) x^{n-1} + \ldots \pm P_n(y_1, y_2, \ldots, y_n) \\
 &= x^n + \sum_{i=1}^{n} (-1)^i P_i(y_1, y_2, \ldots, y_n) x^{n-i}.
\end{aligned}$$

The coefficients of the powers of x are special polynomials in $\{y_1, y_2, \ldots, y_n\}$.

$$\begin{aligned}
P_1 &= \sum_{i=1}^{n} y_i = y_1 + y_2 + \ldots + y_n \\
P_2 &= \sum_{i<j} y_i y_j = y_1 y_2 + y_1 y_3 + \ldots + y_{n-1} y_n \\
 &\;\;\vdots \\
P_k &= \sum_{i_1 < i_2 < \ldots < i_k} y_{i_1} y_{i_2} \cdots y_{i_k} \\
 &\;\;\vdots \\
P_n &= \prod_{i=1}^{n} y_i = y_1 y_2 \cdots y_n
\end{aligned}$$
The values of these polynomials are not changed if the $y_i$’s are permuted. In general, a function of several variables is called symmetric if it is invariant under permutation of the variables. That is to say, for every permutation π of {1, 2, . . . , n},

$$f(y_{\pi(1)}, y_{\pi(2)}, \ldots, y_{\pi(n)}) = f(y_1, y_2, \ldots, y_n).$$

Moreover, each of these polynomials is homogeneous. A polynomial $p \in F[y_1, \ldots, y_n]$ is called homogeneous of degree k if

$$p(ty_1, ty_2, \ldots, ty_n) = t^k p(y_1, y_2, \ldots, y_n) \quad \text{for } t \in F.$$

Notice that $P_k$ is homogeneous of degree k for 1 ≤ k ≤ n.


The functions P1 , P2 , . . ., Pn are called elementary symmetric poly-
nomials. The rather surprising fact is that every symmetric polynomial in
n variables can be expressed uniquely as a polynomial in P1 , . . . , Pn . Let us
look at an example.

6.9.1. Example. For n = 3, the elementary symmetric polynomials are

$$\begin{aligned}
P_1 &= y_1 + y_2 + y_3 \\
P_2 &= y_1 y_2 + y_1 y_3 + y_2 y_3 \\
P_3 &= y_1 y_2 y_3.
\end{aligned}$$

Consider the symmetric polynomial

$$\begin{aligned}
p &= 2 \sum_{i=1}^{3} y_i^3 - 3 \sum_{i \neq j} y_i^2 y_j + 12 y_1 y_2 y_3 \\
  &= 2(y_1^3 + y_2^3 + y_3^3) - 3(y_1^2 y_2 + y_1^2 y_3 + y_2^2 y_1 + y_2^2 y_3 + y_3^2 y_1 + y_3^2 y_2) + 12 y_1 y_2 y_3.
\end{aligned}$$

This is perhaps the natural way to write down a symmetric polynomial, by collecting together all monomials of the same type. So for n = 3 and polynomials homogeneous of degree 3, there are the three polynomials

$$q_1 = \sum_{i=1}^{3} y_i^3, \qquad q_2 = \sum_{i \neq j} y_i^2 y_j, \qquad q_3 = y_1 y_2 y_3.$$

So $p = 2q_1 - 3q_2 + 12q_3$.

Let us compute the symmetric polynomials homogeneous of degree 3 which can be obtained as monomials in $P_1$, $P_2$ and $P_3$. They are

$$\begin{aligned}
P_1^3 &= (y_1 + y_2 + y_3)^3 = q_1 + 3q_2 + 6q_3 \\
P_1 P_2 &= (y_1 + y_2 + y_3)(y_1 y_2 + y_1 y_3 + y_2 y_3) = q_2 + 3q_3 \\
P_3 &= y_1 y_2 y_3 = q_3.
\end{aligned}$$

Notice that only $P_1^3$ contains the term $y_1^3$, and so is the only one which requires $q_1$ in its expression. After subtracting $2P_1^3$ from p, the polynomial is a combination of $q_2$ and $q_3$. Of the remaining two terms, only $P_1 P_2$ contains the term $y_1^2 y_2$, and hence requires $q_2$ in its expression. So a multiple of $P_1 P_2$ can be subtracted off leaving a multiple of $q_3 = P_3$.

We can use vector notation to simplify the calculation involved. Since

$$(2, -3, 12) = 2(1, 3, 6) - 9(0, 1, 3) + 27(0, 0, 1),$$

we obtain the relation

$$p = 2P_1^3 - 9P_1 P_2 + 27P_3.$$
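The relation just obtained can be spot-checked numerically. The Python sketch below evaluates p both from its definition and from the expression in $P_1$, $P_2$, $P_3$ at a few hundred random integer points; agreement at sample points is evidence for the identity rather than a proof. The function names are illustrative.

```python
from itertools import permutations
import random

def p_direct(y1, y2, y3):
    """p = 2*sum(y_i^3) - 3*sum over i != j of y_i^2 y_j + 12*y1*y2*y3."""
    ys = (y1, y2, y3)
    q1 = sum(y**3 for y in ys)
    q2 = sum(a * a * b for a, b in permutations(ys, 2))
    return 2 * q1 - 3 * q2 + 12 * y1 * y2 * y3

def p_elementary(y1, y2, y3):
    """The same polynomial written as 2*P1^3 - 9*P1*P2 + 27*P3."""
    P1 = y1 + y2 + y3
    P2 = y1 * y2 + y1 * y3 + y2 * y3
    P3 = y1 * y2 * y3
    return 2 * P1**3 - 9 * P1 * P2 + 27 * P3

random.seed(0)
points = [tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(200)]
print(all(p_direct(*pt) == p_elementary(*pt) for pt in points))
```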

6.9.2. Theorem. Every symmetric polynomial in n variables with coeffi-


cients in a field F can be expressed uniquely as a polynomial with coefficients
in F in the n elementary symmetric polynomials.

Proof. The example basically explains how to proceed in general. We pro-


ceed by induction. Given a symmetric polynomial p(y1 , y2 , . . . , yn ), let m be
the largest degree of any monomial in p. Choose the term of degree m so
that the power of y1 is as large as possible, and after that, the power of y2
is as large as possible, and so on. Thus p contains a term
ay1k1 y2k2 . . . ynkn
where k1 ≥ k2 ≥ . . . > kn and k1 + k2 + . . . + kn = m. Call this the ‘largest’
term in p.
We assume the induction hypothesis that the theorem holds for symmetric polynomials of lower degree, and for polynomials of the same degree such that the largest term
\[
b\, y_1^{j_1} y_2^{j_2} \cdots y_n^{j_n}
\]
precedes that of p in the lexicographic order on the exponents. That is, we say that $(j_1, \dots, j_n) \prec (k_1, \dots, k_n)$ if $j_1 < k_1$, or if there is an index $i_0$ such that $j_i = k_i$ for $1 \le i < i_0$ and $j_{i_0} < k_{i_0}$.
The idea is to write down the monomial in $P_1, \dots, P_n$ which has the same largest term and subtract off an appropriate multiple. It is not too hard to see that this monomial is precisely
\[
P = P_1^{k_1 - k_2} P_2^{k_2 - k_3} \cdots P_{n-1}^{k_{n-1} - k_n} P_n^{k_n} .
\]
This is because the ‘largest’ term of P is the product of the ‘largest’ terms of each factor, namely
\[
y_1^{k_1 - k_2} (y_1 y_2)^{k_2 - k_3} \cdots (y_1 y_2 \cdots y_n)^{k_n} .
\]
Indeed, the exponent of $y_i$ in this product is
\[
(k_i - k_{i+1}) + (k_{i+1} - k_{i+2}) + \dots + (k_{n-1} - k_n) + k_n = k_i .
\]
Now the polynomial p − aP has a smaller ‘largest’ term. So by the
induction hypothesis, it can be expressed uniquely as a polynomial in the
elementary symmetric polynomials. Adding the monomial aP to this yields
a polynomial expression in P1 , . . ., Pn for p as well. This expression is unique
since there was a unique choice, aP , of a symmetric function with the same
largest term as p and having removed that, there is a unique expression for
the remainder. 
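The proof is constructive, and the reduction it describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: polynomials are stored as dictionaries mapping exponent tuples to coefficients, and the helper names (`poly_mul`, `elementary`, `express_in_elementary`) are hypothetical, not taken from the text.

```python
from itertools import combinations, permutations

def poly_mul(f, g):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    h = {}
    for ea, ca in f.items():
        for eb, cb in g.items():
            e = tuple(i + j for i, j in zip(ea, eb))
            h[e] = h.get(e, 0) + ca * cb
    return {e: c for e, c in h.items() if c != 0}

def elementary(n, k):
    """The elementary symmetric polynomial P_k in n variables."""
    return {tuple(1 if i in s else 0 for i in range(n)): 1
            for s in combinations(range(n), k)}

def express_in_elementary(p, n):
    """Return {(e_1, ..., e_n): c}, meaning p = sum of c * P_1^e_1 ... P_n^e_n."""
    p = dict(p)
    answer = {}
    while p:
        # 'largest' term: highest degree first, then lexicographic order
        k = max(p, key=lambda e: (sum(e), e))
        a = p[k]
        exps = tuple(k[i] - (k[i + 1] if i + 1 < n else 0) for i in range(n))
        if min(exps) < 0:
            raise ValueError("p is not symmetric")
        # build P = P_1^(k1-k2) ... P_n^(kn), which has the same largest term
        P = {(0,) * n: 1}
        for i, m in enumerate(exps):
            for _ in range(m):
                P = poly_mul(P, elementary(n, i + 1))
        # replace p by p - a*P, whose largest term is smaller
        for e, c in P.items():
            p[e] = p.get(e, 0) - a * c
            if p[e] == 0:
                del p[e]
        answer[exps] = answer.get(exps, 0) + a
    return answer

# The example p = 2*q1 - 3*q2 + 12*q3 in three variables.
example = {}
for e, c in [((3, 0, 0), 2), ((2, 1, 0), -3), ((1, 1, 1), 12)]:
    for perm in set(permutations(e)):
        example[perm] = c
```

Applied to `example`, the function should reproduce the relation p = 2P_1^3 − 9P_1P_2 + 27P_3 found above, with keys read as exponent vectors of P_1, P_2, P_3.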
The most important use of symmetric functions is based on the fact
that the coefficients of a polynomial are precisely the elementary symmetric
functions of the roots. This should be clear from their definition, but it
bears repeating. The monic polynomial with roots $r_1, \dots, r_n$ is
\[
p(x) = \prod_{i=1}^{n} (x - r_i) = x^n + \sum_{i=1}^{n} (-1)^i P_i(r_1, r_2, \dots, r_n)\, x^{n-i} .
\]

In particular, if $q = x^n + q_{n-1} x^{n-1} + \dots + q_1 x + q_0$ is an irreducible polynomial in Q[x], so that $r_1, \dots, r_n$ are all algebraic conjugates, we see that the elementary symmetric functions of the roots are rational:
\[
P_i(r_1, \dots, r_n) = (-1)^i q_{n-i} .
\]
Thus Theorem 6.9.2 implies that every symmetric polynomial in these roots with coefficients in Q takes a rational value.
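As a small illustration of the relation between coefficients and elementary symmetric functions, the sketch below expands a product of linear factors and compares the result with $(-1)^i P_i$. The function names are illustrative only, and the sample roots 2, −3, 1 are those of the cubic treated in Example 6.10.1.

```python
from itertools import combinations
from math import prod

def coefficients_from_roots(roots):
    """Expand prod (x - r); coefficients are listed from x^n down to x^0."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                    # current polynomial times x
        scaled = [0] + [-r * c for c in coeffs]   # current polynomial times -r
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def elementary_value(roots, i):
    """P_i evaluated at the given roots (P_0 = 1)."""
    return sum(prod(s) for s in combinations(roots, i))

roots = [2, -3, 1]
coeffs = coefficients_from_roots(roots)   # coeffs[i] is the coefficient of x^(n-i)
```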
This provides one way of proving the following result.

6.9.3. Theorem. The algebraic numbers form a field.

Proof. It must be shown that if α and β are algebraic numbers, then so are α + β, αβ and (when α ≠ 0) 1/α. It was shown in section 6.6, exercise 4 that the reciprocal of a non-zero algebraic number is algebraic. The methods for sums and products are similar, so only sums will be done here.
Let p and q be irreducible polynomials in Q[x] with α and β as roots. Let $\alpha_1, \dots, \alpha_m$ be the roots of p; and let $\beta_1, \dots, \beta_n$ be the roots of q. It is enough to show that the polynomial with roots $\alpha_i + \beta_j$ for 1 ≤ i ≤ m and 1 ≤ j ≤ n has rational coefficients. However, we know that if $P_0 = 1, P_1, \dots, P_{nm}$ are the elementary symmetric polynomials in nm variables, then
\[
r(x) = \prod_{1 \le i \le m} \prod_{1 \le j \le n} (x - \alpha_i - \beta_j) = \sum_{k=0}^{nm} (-1)^k P_k(\alpha_i + \beta_j)\, x^{nm-k}
\]
where $P_k(\alpha_i + \beta_j)$ is a symmetric function of the mn roots $\alpha_i + \beta_j$. Thus, thinking of this as a function of the $\beta_j$, it is a symmetric polynomial with coefficients that are symmetric functions of the $\alpha_i$ with rational coefficients. Therefore, these coefficients are themselves rational. Thus $P_k(\alpha_i + \beta_j)$ is reduced to a symmetric polynomial in the $\beta_j$ with rational coefficients. So it is a rational number.
We conclude that r ∈ Q[x], and hence its roots are all algebraic. In particular, α + β is algebraic.
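For a concrete instance of this argument, take α = √2 and β = √3 (an example chosen here for illustration, not one from the text). The product over all conjugate sums should then have rational coefficients. The sketch below checks this in floating point, so it is a numerical illustration rather than a proof; exact arithmetic gives $r(x) = (x^2 - 5)^2 - 24 = x^4 - 10x^2 + 1$.

```python
from itertools import product
from math import sqrt

def poly_from_roots(roots):
    """Expand prod (x - r) with float coefficients, leading coefficient first."""
    coeffs = [1.0]
    for r in roots:
        coeffs = [s + t for s, t in
                  zip(coeffs + [0.0], [0.0] + [-r * c for c in coeffs])]
    return coeffs

alphas = [sqrt(2), -sqrt(2)]   # roots of x^2 - 2
betas = [sqrt(3), -sqrt(3)]    # roots of x^2 - 3

# r(x) = product of (x - alpha_i - beta_j) over all i, j
r = poly_from_roots([a + b for a, b in product(alphas, betas)])
rounded = [round(c) for c in r]
```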

Exercises

1. Express $x_1^4 + x_2^4 + x_3^4$ as a polynomial in the three elementary symmetric polynomials in three variables.
2. Verify that if α and β are algebraic numbers, then so is αβ.
3. Let $\alpha = \sqrt{3}$ and $\beta = \sqrt[3]{7}$.
(a) Find a monic polynomial q of degree 6 in Q[x] with α + β as a root.
(b) Show that $\gamma_{j,k} := (-1)^j \alpha + \omega^k \beta$ are also roots of q for j ∈ {0, 1}, k ∈ {0, 1, 2} and $\omega = \frac{-1 + i\sqrt{3}}{2}$.
(c) Check that $P_2(\gamma_{0,0}, \dots, \gamma_{1,2})$ is the coefficient of $x^4$ in q.
(c) Check that P2 (γ00 , . . . , γ1,2 ) is the coefficient of x2 in q.
4. (Newton–Girard identities) Fix an integer n ≥ 2. For each k ≥ 0, let $q_k = \sum_{i=1}^{n} x_i^k$, which is known as a power sum. Let $P_0, \dots, P_n$ denote the elementary symmetric functions in $x_1, \dots, x_n$. Since the $q_k$ are symmetric, they are expressible in terms of the $P_i$. Prove the following explicit formula:
\[
q_k = (-1)^k\, k \sum_{r_1 + 2r_2 + \dots + k r_k = k} \frac{(r_1 + \dots + r_k - 1)!}{r_1! \cdots r_k!} \prod_{i=1}^{k} (-P_i)^{r_i} .
\]

5. Let the complete Bell polynomials be defined recursively by $B_0 = 1$ and
\[
B_{k+1}(x_1, \dots, x_{k+1}) = \sum_{i=0}^{k} \binom{k}{i} B_{k-i}(x_1, \dots, x_{k-i})\, x_{i+1} .
\]
Prove
\[
B_k(x_1, \dots, x_k) = \left( \frac{d}{dt} \right)^{k} \exp\Big( \sum_{i=1}^{k} x_i \frac{t^i}{i!} \Big) \Big|_{t=0} .
\]

6. (Express elementary symmetric polynomials using power sums) Use the notation of Exercise 4.
(a) Prove that the elementary symmetric functions are expressible as polynomials with rational coefficients in the power sums. Specifically, prove
\[
P_k = \frac{(-1)^k}{k!}\, B_k\big( -q_1, -(1!)q_2, -(2!)q_3, \dots, -(k-1)!\, q_k \big) .
\]
(b) Conclude that every symmetric polynomial with rational coefficients
is expressible as a polynomial in the power sums.
7. Prove that the elementary symmetric polynomials are not expressible as
polynomials with integer coefficients in the power sums.

6.10. Cubic Polynomials


In this section, we will show how to use the power of symmetric polynomials to factor cubic polynomials in C[x]. It is nice to know that there is such a formula, although it is too complicated to be of much practical use. In particular, even when all three roots are real, the formula still requires complex numbers.
To illustrate the idea in a simpler setting, first consider a quadratic polynomial $x^2 + ax + b$. Let the two roots be $r_1$ and $r_2$. We know that
\[
r_1 + r_2 = P_1(r_1, r_2) = -a, \qquad r_1 r_2 = P_2(r_1, r_2) = b .
\]
The symmetric function $(r_1 - r_2)^2$ of the roots is given by
\[
(r_1 - r_2)^2 = (r_1 + r_2)^2 - 4 r_1 r_2 = a^2 - 4b .
\]
Hence, labelling the roots so that $r_1 - r_2 = \sqrt{a^2 - 4b}$,
\[
\begin{aligned}
r_1 &= \tfrac12 \big( (r_1 + r_2) + (r_1 - r_2) \big) = \frac{-a + \sqrt{a^2 - 4b}}{2} \\
r_2 &= \tfrac12 \big( (r_1 + r_2) - (r_1 - r_2) \big) = \frac{-a - \sqrt{a^2 - 4b}}{2}
\end{aligned}
\]
For cubics, the same kind of technique works, although it is a fair bit more complicated. The first simplifying step is to make a change of variables to eliminate the coefficient of $x^2$. This is analogous to completing the square in the quadratic case. Suppose we are given a cubic polynomial
\[
w^3 + Aw^2 + Bw + C .
\]
Make the substitution w = x − A/3. Then we obtain a polynomial
\[
(x - A/3)^3 + A(x - A/3)^2 + B(x - A/3) + C = x^3 + ax + b
\]
where $a = B - A^2/3$ and $b = C - AB/3 + 2A^3/27$. If we can find the roots $x_1$, $x_2$ and $x_3$ of this cubic, then the roots of the original are $w_i = x_i - A/3$.
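The change of variables is easy to carry out mechanically. Here is a minimal sketch using exact rational arithmetic; the function name `depress` and the sample cubic $w^3 + 3w^2 - 4w - 12 = (w+3)(w-2)(w+2)$ are chosen for illustration and do not come from the text.

```python
from fractions import Fraction

def depress(A, B, C):
    """Return (a, b) such that substituting w = x - A/3 into
    w^3 + A w^2 + B w + C gives x^3 + a x + b."""
    A, B, C = Fraction(A), Fraction(B), Fraction(C)
    a = B - A**2 / 3
    b = C - A * B / 3 + 2 * A**3 / 27
    return a, b

# w^3 + 3w^2 - 4w - 12 has roots w = -3, 2, -2, so the depressed cubic
# should have roots x = w + A/3 = w + 1.
a, b = depress(3, -4, -12)
shifted_roots = [w + 1 for w in (-3, 2, -2)]
```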
The elementary symmetric functions in the $x_i$ are
\[
\begin{aligned}
P_1 &= x_1 + x_2 + x_3 = 0 \\
P_2 &= x_1 x_2 + x_1 x_3 + x_2 x_3 = a \\
P_3 &= x_1 x_2 x_3 = -b
\end{aligned}
\]

The idea is to look for some ‘almost symmetric’ functions $y_i$ of the roots which are roots of $y^3 = d$ for some d. We investigate the properties such a y must have. Let D represent a cube root of d and let $\omega = e^{2\pi i/3}$ be a primitive cube root of 1. Then
\[
y^3 - D^3 = (y - D)(y - \omega D)(y - \omega^2 D) .
\]
This suggests writing down the following functions of the roots $x_1$, $x_2$ and $x_3$. (This change of coordinates is known as a discrete Fourier transform.)
\[
\begin{aligned}
y_1 &= x_1 + \omega x_2 + \omega^2 x_3 \\
y_2 &= \omega y_1 = \omega x_1 + \omega^2 x_2 + x_3 \\
y_3 &= \omega^2 y_1 = \omega^2 x_1 + x_2 + \omega x_3 \\
z_1 &= x_1 + \omega^2 x_2 + \omega x_3 \\
z_2 &= \omega^2 z_1 = \omega^2 x_1 + \omega x_2 + x_3 \\
z_3 &= \omega z_1 = \omega x_1 + x_2 + \omega^2 x_3
\end{aligned}
\]

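We find that
\[
y_1^3 = y_2^3 = y_3^3 = y_1 y_2 y_3, \qquad z_1^3 = z_2^3 = z_3^3 = z_1 z_2 z_3
\]
and
\[
y_1 z_1 = y_2 z_2 = y_3 z_3 .
\]
For convenience, write the subscripts mod 3 (so that $x_4$ means $x_1$). A computation shows that
\[
\begin{aligned}
y_1^3 &= \sum_{i=1}^{3} x_i^3 + 3\omega \sum_{i=1}^{3} x_i^2 x_{i+1} + 3\omega^2 \sum_{i=1}^{3} x_i x_{i+1}^2 + 6 x_1 x_2 x_3 \\
z_1^3 &= \sum_{i=1}^{3} x_i^3 + 3\omega^2 \sum_{i=1}^{3} x_i^2 x_{i+1} + 3\omega \sum_{i=1}^{3} x_i x_{i+1}^2 + 6 x_1 x_2 x_3
\end{aligned}
\]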
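Neither of these is symmetric, but their sum is. Since $\omega + \omega^2 = -1$,
\[
\begin{aligned}
y_1^3 + z_1^3 &= 2 \sum_{i=1}^{3} x_i^3 - 3 \sum_{i=1}^{3} x_i^2 x_{i+1} - 3 \sum_{i=1}^{3} x_i x_{i+1}^2 + 12 x_1 x_2 x_3 \\
&= 2 \Big[ \Big( \sum_{i=1}^{3} x_i \Big)^3 - 3 \sum_{i \ne j} x_i^2 x_j - 6 x_1 x_2 x_3 \Big] - 3 \sum_{i \ne j} x_i^2 x_j + 12 P_3 \\
&= 2 (P_1^3 - 6 P_3) - 9 \Big[ \Big( \sum_{i=1}^{3} x_i \Big) \Big( \sum_{i<j} x_i x_j \Big) - 3 x_1 x_2 x_3 \Big] + 12 P_3 \\
&= 2 P_1^3 - 9 P_1 P_2 + 27 P_3 = -27b .
\end{aligned}
\]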


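Notice that this formula simplifies greatly because $P_1 = 0$. Similarly, compute
\[
\begin{aligned}
y_1 z_1 &= \sum_{i=1}^{3} x_i^2 + (\omega + \omega^2) \sum_{1 \le i < j \le 3} x_i x_j \\
&= \sum_{i=1}^{3} x_i^2 - \sum_{1 \le i < j \le 3} x_i x_j \\
&= \Big( \sum_{i=1}^{3} x_i \Big)^2 - 3 \sum_{1 \le i < j \le 3} x_i x_j \\
&= P_1^2 - 3 P_2 = -3a .
\end{aligned}
\]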


Hence $y_1$, $y_2$, $y_3$, $z_1$, $z_2$ and $z_3$ are the roots of
\[
(X^3 - y_1^3)(X^3 - z_1^3) = X^6 - (y_1^3 + z_1^3) X^3 + (y_1 z_1)^3 = X^6 + 27b X^3 - 27a^3 .
\]
This is a quadratic in $X^3$, and thus it can be solved:
\[
X^3 = \frac{-27b \pm \sqrt{27(27b^2 + 4a^3)}}{2} .
\]
Because of the symmetry involved, we can let $y_1$ be any cube root
\[
y_1 = \sqrt[3]{\frac{-27b + \sqrt{27(27b^2 + 4a^3)}}{2}} .
\]
Then $z_1 = -3a/y_1$. So from the equations for $y_1$ and $z_1$, we obtain
\[
y_1 + z_1 = 2x_1 - x_2 - x_3 = 3x_1 - P_1 = 3x_1 .
\]
Thus the roots of the cubic $x^3 + ax + b$ are
\[
\begin{aligned}
x_1 &= y_1/3 - a/y_1 \\
x_2 &= \omega y_1/3 - \omega^2 a/y_1 \\
x_3 &= \omega^2 y_1/3 - \omega a/y_1 .
\end{aligned}
\]

To get the roots of the original cubic, add −A/3.
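The formulas above translate directly into a short floating-point routine. The following is a sketch under stated assumptions, not a robust solver: the function name is illustrative, the cube root taken is Python's principal complex root, and the sign of the square root is chosen to keep $y_1 \ne 0$ (a case the derivation does not discuss, since it divides by $y_1$).

```python
import cmath

def depressed_cubic_roots(a, b):
    """Roots of x^3 + a x + b, computed as x = w*y1/3 - a/(w*y1)
    for the three cube roots of unity w."""
    omega = cmath.exp(2j * cmath.pi / 3)
    disc = cmath.sqrt(27 * (27 * b**2 + 4 * a**3))
    # Either sign of the square root is allowed; take the one with the
    # larger |y1^3| so that y1 is non-zero unless a = b = 0.
    y1_cubed = max([(-27 * b + disc) / 2, (-27 * b - disc) / 2], key=abs)
    if y1_cubed == 0:
        return [0j, 0j, 0j]            # the cubic is x^3
    y1 = y1_cubed ** (1 / 3)           # principal complex cube root
    # note a/(omega*y1) = omega^2 * a/y1 and a/(omega^2*y1) = omega * a/y1
    return [w * y1 / 3 - a / (w * y1) for w in (1, omega, omega**2)]

roots = depressed_cubic_roots(-7, 6)   # the cubic x^3 - 7x + 6
```

Up to rounding error, `roots` should contain the three real numbers 2, −3 and 1, even though the intermediate quantity $y_1$ is not real.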

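6.10.1. Example. Consider the polynomial $x^3 - 7x + 6$, which can be factored by hand using the rational root theorem, but which has the virtue that our formula can also be evaluated by hand. We have
\[
\begin{aligned}
y_1 &= \sqrt[3]{\frac{-27(6) + \sqrt{27\big(27(6)^2 + 4(-7)^3\big)}}{2}} \\
&= \sqrt[3]{-81 + 3\sqrt{729 - 1029}} \\
&= \sqrt[3]{-81 + 30\sqrt{3}\, i} = 3 + 2\sqrt{3}\, i .
\end{aligned}
\]
Now, for future convenience, compute
\[
-\frac{a}{y_1} = \frac{7\, \overline{y_1}}{|y_1|^2} = \frac{3 - 2\sqrt{3}\, i}{3} .
\]
Plugging this in to the formulae, we obtain
\[
\begin{aligned}
x_1 &= \frac{3 + 2\sqrt{3}\, i + 3 - 2\sqrt{3}\, i}{3} = 2 \\
x_2 &= \frac{(-1 + \sqrt{3}\, i)(3 + 2\sqrt{3}\, i) + (-1 - \sqrt{3}\, i)(3 - 2\sqrt{3}\, i)}{6} = -3 \\
x_3 &= \frac{(-1 - \sqrt{3}\, i)(3 + 2\sqrt{3}\, i) + (-1 + \sqrt{3}\, i)(3 - 2\sqrt{3}\, i)}{6} = 1
\end{aligned}
\]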

Even for such a nice cubic, the calculations are daunting. However, this
formula has the virtue of providing a closed form, algebraic expression for the
roots. For finding approximate values of the roots, numerical methods based
on calculus are much superior. Those methods, however, do not provide
exact solutions.

Exercises

1. Redo Exercise 2 from Section 6.8 without using Sturm’s algorithm.


2. Find the roots of $x^3 - 6x + 9$.
3. Find the roots of $x^3 - 15x^2 + 60x - 54$.
4. Show that $\sqrt[3]{\sqrt{5} + 2} - \sqrt[3]{\sqrt{5} - 2} = 1$.
5. A sphere with outer radius r which is 1 cm. thick has the same volume
in the shell as in the interior hole. Find r.
6. (Cubic resolvent) If f is a degree n polynomial with roots $r_1, \dots, r_n$, its discriminant is defined to be
\[
\Delta(f) = \prod_{i<j} (r_i - r_j)^2 .
\]
The cubic resolvent of a quartic polynomial $x^4 + ax^3 + bx^2 + cx + d$ is defined to be the polynomial $x^3 - bx^2 + (ac - 4d)x - (a^2 d + c^2 - 4bd)$.
The cubic resolvent plays an important role in solving quartic equations.
Prove that a quartic polynomial and its cubic resolvent have the same
discriminant.

Notes on Chapter 6
The formula for the roots of a cubic was discovered by the Italian mathematicians del Ferro and Tartaglia in the early 16th century. Cardano and his student Ferrari learned Tartaglia’s method, and found a solution for the quartic. The formula for quartics is considerably more complicated than the cubic case. It was long an open problem whether such a solution could be obtained for arbitrary polynomial equations. In order for this to be the case, every algebraic number would have to be expressible as a combination of various k-th roots. However, in 1826, the Norwegian algebraist Abel published the first rigorous argument showing that there are polynomials of degree 5 which cannot be solved by repeated extraction of roots.
Remarkable progress was made shortly after, in 1831, by the young
mathematician Galois, who died in a duel when he was 20. Galois showed
that one can study the roots of a polynomial by looking at the structure
of the field obtained by adding all of the roots of this polynomial to the
rationals. The set of all isomorphisms of this field onto itself forms a group.
The structure of this group can be used to decide if a polynomial can be
‘solved by radicals’, meaning that the roots can be expressed by extraction
of roots. This is a very beautiful theory, and one of the landmarks of modern
algebra.
It was not until Viète in the late 16th century and Descartes in the early 17th century that a good notation for polynomials was proposed. Stevin proved the intermediate value theorem for polynomials, thereby showing that real polynomials of odd degree have a root. Descartes considered the graphs of polynomials. He found the rational root theorem and formulated his rule of signs, but did not publish his proof (as was common). He also observed that a polynomial of degree n has at most n roots. Newton showed that complex roots of real polynomials come in conjugate pairs. He also studied the symmetric functions of the roots and related them to the coefficients of a polynomial.
The fundamental theorem of algebra was discussed in the notes to the
previous chapter.
Gauss’s lemma comes from his early work in 1801. Eisenstein’s criterion dates from 1850. Sturm published his algorithm in 1829. It was the first effective algebraic algorithm for locating roots of a polynomial to any accuracy. In 1901, Kronecker published a set of lectures which includes a statement and proof of unique factorization of (rational or integer) polynomials into irreducibles.
Liouville was the first to construct transcendental numbers in 1851. Hermite showed that e is transcendental in 1873, and Lindemann showed that π is transcendental in 1882.
The general algebra of polynomials can be found in various introductions to abstract algebra such as Artin [5]. Sturm’s theorem and Descartes’ rule of signs can be found in [38] and [28]. Hardy and Wright [15] is a good source for information on algebraic and transcendental numbers; also see Stark [37] and Silverman [34]. See Gray [14] for more about the history.
Chapter 7

Finite Fields
This chapter contains a detailed study of finite fields. It tries to empha-
size the dramatic parallels between the arithmetic of the integers modulo
a prime and the corresponding arithmetic of polynomials modulo an irre-
ducible polynomial. At the end of the chapter, we will obtain an algorithm
for factoring integer polynomials efficiently on a computer, in contrast to
the (apparent) difficulty of factoring large integers.

7.1. Arithmetic Modulo a Polynomial


If p is a polynomial in F[x], then it is possible to do calculations modulo
p. As in the integer case, say that polynomials a(x) and b(x) in F[x] are
congruent mod p if p divides a − b :

a≡b (mod p) if and only if p|(a − b).

This yields a ring of equivalence classes, analogous to the rings Zn , called


F[x]/(p). The point is that addition and multiplication of equivalence classes
are well defined because of the following proposition. The proof is left as an
exercise. (Compare with Proposition 2.1.1.)

7.1.1. Proposition. Let p, ai and bi be polynomials in F[x] such that


a1 ≡ a2 (mod p) and b1 ≡ b2 (mod p).

Then
(1) a1 + b1 ≡ a2 + b2 (mod p).
(2) a1 b1 ≡ a2 b2 (mod p).

This means that addition and multiplication of equivalence classes can


be defined by
[a] + [b] = [a + b] and [a][b] = [ab].

One may verify the various properties of a commutative ring, such as as-
sociativity of addition and multiplication, and the distributive law, because
these properties hold for the ring F[x].

7.1.2. Example. Consider the ring $S = \mathbb{R}[x]/(x^2 + 1)$. By the division algorithm, every polynomial q is equivalent modulo $x^2 + 1$ to its remainder after division by $x^2 + 1$, which is a linear polynomial a + bx. Since the only linear polynomial divisible by $x^2 + 1$ is 0, each linear polynomial belongs to a different equivalence class. Thus
\[
S = \{ [a + bx] : a, b \in \mathbb{R} \} .
\]
Addition and multiplication are given by
\[
\begin{aligned}
[a + bx] + [c + dx] &= [(a + c) + (b + d)x] \\
[a + bx][c + dx] &= [ac + (ad + bc)x + bd\,x^2] \\
&= [(ac - bd) + (ad + bc)x] .
\end{aligned}
\]
A moment’s reflection will show that this corresponds to the rules of multi-
plication in the complex numbers C.
This correspondence is not a coincidence. Notice that ±i are the two roots of the irreducible polynomial $x^2 + 1$. In S, the equation $X^2 + 1 = 0$ has the solution [x]. That is why [x] takes the place of i. We can define a map $\varphi$ from R[x] into C by $\varphi(q) = q(i)$. One may check that $\varphi$ preserves addition and multiplication. Moreover, $\varphi(q) = 0$ if and only if i is a root of q. By Theorem 6.6.2, this happens exactly when q is divisible by the minimal polynomial of i, namely $x^2 + 1$. Thus $\varphi(q) = 0$ if and only if [q] = 0 in S. So there is an induced map $\tilde\varphi : S \to \mathbb{C}$ given by $\tilde\varphi([q]) = q(i)$ as in Lemma 6.1.4. The point of the previous discussion is two-fold. First $\tilde\varphi$ is well defined because $q_1 \equiv q_2 \pmod{x^2 + 1}$ implies that $q_1(i) = q_2(i)$. Secondly, $\tilde\varphi$ is one-to-one because $q_1(i) = q_2(i)$ implies that $x^2 + 1 \mid (q_1 - q_2)$; i.e. $q_1 \equiv q_2 \pmod{x^2 + 1}$. It is onto because $\tilde\varphi([a + bx]) = a + bi$. So $\tilde\varphi$ maps S one-to-one and onto C, and preserves all the operations (addition, multiplication, 0, 1). Therefore $\tilde\varphi$ is a ring isomorphism. (Recall Definition 1.1.2.) This means that S and C represent the same mathematical object.
The complex numbers form a field. The ring isomorphism $\tilde\varphi$ can be used to show that S is also a field. For any s ≠ 0 in S, let $z = \tilde\varphi(s)$. Since $\tilde\varphi$ is one-to-one, z ≠ 0. So $z^{-1} \in \mathbb{C}$. Since $\tilde\varphi$ is onto, $t = \tilde\varphi^{-1}(z^{-1}) \in S$ and
\[
st = \tilde\varphi^{-1}(z)\, \tilde\varphi^{-1}(z^{-1}) = \tilde\varphi^{-1}(1) = 1 .
\]
That is, $t = s^{-1}$. Therefore S is a field.
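The correspondence between S and C can be made concrete with a few lines of code. In this sketch, a class [a + bx] is stored as the pair (a, b); the function names are illustrative only. Integer inputs are used in the checks so that the comparison with Python's built-in complex numbers is exact.

```python
def mul_mod_x2_plus_1(u, v):
    """Multiply [a + bx][c + dx] in R[x]/(x^2 + 1), using x^2 = -1."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

def phi(u):
    """The map [a + bx] -> a + bi from the example."""
    a, b = u
    return complex(a, b)

u, v = (2, 3), (4, -5)
lhs = phi(mul_mod_x2_plus_1(u, v))   # multiply in S, then map to C
rhs = phi(u) * phi(v)                # map to C, then multiply
```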

The polynomial x2 + 1 is irreducible in R[x], and the quotient ring S


turned out to be a field. This is completely analogous to the fact that Zn is
a field if and only if n is prime.

7.1.3. Proposition. F[x]/(p) is a field if and only if p is irreducible. If


p is reducible, then F[x]/(p) has zero divisors.

Proof. If p is not irreducible in F[x], then it factors as p = ab where both


a and b have positive degree. Since p does not divide either a or b, the
equivalence classes [a] and [b] in F[x]/(p) are non-zero. However,
[a][b] = [p] = [0].
So F[x]/(p) has zero divisors.
On the other hand, if p is irreducible, and [a] ≠ [0], then gcd(a, p) = 1. Thus by the Euclidean algorithm for polynomials 6.2.4, there are polynomials s and t in F[x] so that 1 = as + pt. Hence
\[
[a][s] = [1 - pt] = [1] .
\]
Therefore, all non-zero elements of F[x]/(p) are units, and so it is a field.
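The second half of the proof is effectively an algorithm for computing inverses. Below is a minimal sketch for the case F = Z_p, assuming polynomials are stored as coefficient lists with the constant term first; all function names are hypothetical, and `pow(c, -1, p)` (modular inverse of an integer) requires Python 3.8 or later.

```python
def trim(f):
    """Drop leading zero coefficients in place; [] is the zero polynomial."""
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g, p):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0)
                  + (g[i] if i < len(g) else 0)) % p for i in range(n)])

def mul(f, g, p):
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return trim(h)

def divmod_poly(f, g, p):
    """Quotient and remainder of f by g in Z_p[x] (g non-zero, p prime)."""
    f = f[:]
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv_lead = pow(g[-1], -1, p)
    while len(f) >= len(g):
        c = f[-1] * inv_lead % p
        shift = len(f) - len(g)
        q[shift] = c
        for i, b in enumerate(g):
            f[i + shift] = (f[i + shift] - c * b) % p
        trim(f)
    return trim(q), f

def inverse_mod(a, q, p):
    """Inverse of [a] in Z_p[x]/(q) via the extended Euclidean algorithm."""
    # invariant: r0 = s0*a and r1 = s1*a modulo q
    r0, s0 = q[:], []
    r1, s1 = divmod_poly(a, q, p)[1], [1]
    while r1:
        quo, rem = divmod_poly(r0, r1, p)
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, mul([(-c) % p for c in quo], s1, p), p)
    if len(r0) != 1:
        raise ValueError("a is not invertible modulo q")
    c = pow(r0[0], -1, p)              # scale the constant gcd to 1
    return divmod_poly([x * c % p for x in s0], q, p)[1]
```

For instance, with p = 2 and q = 1 + x + x^3, the call `inverse_mod([1, 1], [1, 1, 0, 1], 2)` returns the coefficient list of x + x^2.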
The significance of this construction comes from the fact that it provides a method for constructing a bigger field containing F in which p has a root. Let us record this as a theorem.

7.1.4. Theorem. If p ∈ F[x] is irreducible, then the field G = F[x]/(p)


contains F as a subfield, and p has a root in G.

Proof. Notice that F sits inside G as the classes [a] of the constant polynomials a ∈ F. The element [x] is a root of p in G because, writing $p(x) = \sum_{i=0}^{d} p_i x^i$,
\[
p([x]) = \sum_{i=0}^{d} p_i [x]^i = [p(x)] = [0] .
\]

You may have noticed that modding out by p makes [x] a root by fiat.
This is precisely the rationale for doing this operation at all.

Exercises

1. Prove Proposition 7.1.1.


2. (a) Show that $x^5 + 7x^2 - 7 \in \mathbb{Z}[x]$ is irreducible.
(b) Find the inverse of $[x^2 + 3x - 1]$ in $\mathbb{Q}[x]/(x^5 + 7x^2 - 7)$.
3. (a) Show that $x^2 + 1$ is irreducible in $\mathbb{Z}_7[x]$.
(b) Find the smallest positive integer k so that $[2x]^k = 1$ in $\mathbb{Z}_7[x]/(x^2 + 1)$.
(c) How many elements are there in $\mathbb{Z}_7[x]/(x^2 + 1)$?
4. (a) Show that if d is a square-free positive integer, then $\mathbb{Q}[\sqrt{d}] = \{ r + s\sqrt{d} : r, s \in \mathbb{Q} \}$ is a field.
(b) Express this field as a quotient ring of Q[x].
5. (a) Find an irreducible polynomial p ∈ Z[x] with $\sqrt{3} + \sqrt[3]{7}$ as a root.
(b) Show that Z[x]/(p) is a ring contained in the field Q[x]/(p).
6. Show that Zp [x]/(x4 + x3 + x + 1) is not a field for any prime p.
7. (a) Show that if a1 , a2 , p1 , p2 ∈ F[x] and gcd(p1 , p2 ) = 1, then the system
q ≡ a1 (mod p1 )
q ≡ a2 (mod p2 )
has a unique solution (mod p1 p2 ).
(b) Prove the Chinese Remainder Theorem for arithmetic modulo polynomials; i.e., if $\gcd(p_i, p_j) = 1$ whenever 1 ≤ i < j ≤ n, show that
\[
\begin{aligned}
q &\equiv a_1 \pmod{p_1} \\
&\;\;\vdots \\
q &\equiv a_n \pmod{p_n}
\end{aligned}
\]
has a unique solution modulo $p_1 p_2 \cdots p_n$.
8. Suppose that $a_1, a_2, p_1, p_2, d \in F[x]$ and $\gcd(p_1, p_2) = d$ with d ∉ F (that is, d is not a constant). When does the system
\[
\begin{aligned}
q &\equiv a_1 \pmod{p_1} \\
q &\equiv a_2 \pmod{p_2}
\end{aligned}
\]
have solutions? What can you say about these solutions?

7.2. An Eight-Element Field


Consider the polynomial $p(x) = x^3 + x + 1$ in $\mathbb{Z}_2[x]$. It is irreducible because it has no roots, and has degree 3. Let us investigate the field $\mathbb{F}_8 = \mathbb{Z}_2[x]/(x^3 + x + 1)$. By the division algorithm, the different equivalence classes are again given by all polynomials of degree less than the degree of p, namely the polynomials $a + bx + cx^2$ of degree at most 2. There are 2 choices for each a, b, c, so there are $8 = 2^3$ elements in $\mathbb{F}_8$. The multiplication rules are given by Table 7.2.1.
It is apparent from this table that every non-zero element has an inverse. For example, $[x + 1][x^2 + x] = [1]$. It would be difficult to find the compatible addition and multiplication tables for a field of 8 elements without this construction.
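The multiplication rule is simple enough to compute mechanically. In the sketch below, an element $a + bx + cx^2$ is encoded as the 3-bit integer with bits a, b, c (lowest bit first); this encoding and the function name `f8_mul` are assumptions made for illustration, not notation from the text.

```python
MODULUS = 0b1011  # x^3 + x + 1

def f8_mul(u, v):
    """Multiply two elements of Z_2[x]/(x^3 + x + 1), encoded as 3-bit ints."""
    product = 0
    for i in range(3):
        if (v >> i) & 1:
            product ^= u << i              # addition in Z_2[x] is XOR
    for i in (4, 3):                       # remove the x^4 and x^3 terms
        if (product >> i) & 1:
            product ^= MODULUS << (i - 3)
    return product

# Row of the multiplication table for x + 1 (encoded as 0b011).
row_x_plus_1 = [f8_mul(0b011, v) for v in range(8)]

# Every non-zero element has an inverse.
inverses = {a: next(b for b in range(1, 8) if f8_mul(a, b) == 1)
            for a in range(1, 8)}
```

Here `inverses[0b011] == 0b110` corresponds to $[x + 1][x^2 + x] = [1]$.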
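By Theorem 7.1.4, we see that the polynomial $X^3 + X + 1$ has a root [x] in $\mathbb{F}_8$. In fact, it has three roots. A calculation using Table 7.2.1 shows that
\[
\begin{aligned}
([x]^2)^3 + [x]^2 + 1 &= [x^2][x^2 + x] + [x^2] + 1 \\
&= [x^2 + 1 + x^2 + 1] = [0],
\end{aligned}
\]
and a similar calculation shows that $[x^2 + x]$ is also a root.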
·        | 0   1        x        x+1      x^2      x^2+1    x^2+x    x^2+x+1
---------+-------------------------------------------------------------------
0        | 0   0        0        0        0        0        0        0
1        | 0   1        x        x+1      x^2      x^2+1    x^2+x    x^2+x+1
x        | 0   x        x^2      x^2+x    x+1      1        x^2+x+1  x^2+1
x+1      | 0   x+1      x^2+x    x^2+1    x^2+x+1  x^2      1        x
x^2      | 0   x^2      x+1      x^2+x+1  x^2+x    x        x^2+1    1
x^2+1    | 0   x^2+1    1        x^2      x        x^2+x+1  x+1      x^2+x
x^2+x    | 0   x^2+x    x^2+x+1  1        x^2+1    x+1      x        x^2
x^2+x+1  | 0   x^2+x+1  x^2+1    x        1        x^2+x    x^2      x+1

Table 7.2.1. Multiplication table for F8

So this yields the factorization
\[
X^3 + X + 1 = (X - [x])(X - [x^2])(X - [x^2 + x]) .
\]
Now consider the powers of [x] in $\mathbb{F}_8$. We have $[x]$, $[x^2]$, $[x^3] = [x + 1]$, $[x^4] = [x^2 + x]$, $[x^5] = [x^2 + x + 1]$, $[x^6] = [x^2 + 1]$, and $[x^7] = 1$. So the powers of [x] run through all the 7 non-zero elements of $\mathbb{F}_8$. This is a primitive root! Notice that for any non-zero $a \in \mathbb{F}_8$, there is a k so that $a = [x^k]$. So
\[
a^7 = [x^7]^k = 1 .
\]
So a is a root of $X^7 - 1 = 0$. Since 7 = 8 − 1, this is a variant of Fermat’s little theorem for $\mathbb{F}_8$. We will establish this for all finite fields.
This means that $X^8 - X$ has 8 distinct roots in $\mathbb{F}_8$. So it factors into linear terms in $\mathbb{F}_8[X]$:
\[
X^8 - X = \prod_{a \in \mathbb{F}_8} (X - a) .
\]
Let us factor it in $\mathbb{Z}_2[X]$. A simple calculation shows that
\[
X^8 - X = X(X - 1)(X^3 + X + 1)(X^3 + X^2 + 1) .
\]
The two cubics are irreducible in $\mathbb{Z}_2[X]$ because they have no roots in $\mathbb{Z}_2$. We saw above that $X^3 + X + 1$ factors into three linear terms in $\mathbb{F}_8[X]$. We now also can factor
\[
X^3 + X^2 + 1 = (X - [x + 1])(X - [x^2 + 1])(X - [x^2 + x + 1]) .
\]
It turns out that there is only one field of order 8. This may seem
surprising since there is a second irreducible polynomial of degree 3, namely
x3 + x2 + 1. It turns out that this other choice leads to an equivalent
field, in the sense that there is an isomorphism of one onto the other, as in
Example 7.1.2. Consider the other 8 element field, G = Z2 [y]/(y 3 + y 2 + 1).

As an exercise, write out the multiplication table for G. Notice that [y] is
a root of X 3 + X 2 + 1 in G. But F8 also has roots of this polynomial; for
example, [x + 1] is a root.
Consider the map from G to F8 given by
ϕ([a + by + cy 2 ]) = [a + b(x + 1) + c(x + 1)2 ] = [(a + b + c) + bx + cx2 ].
This map is easily seen to be a bijection, for
ϕ([a1 + b1 y + c1 y 2 ]) = ϕ([a2 + b2 y + c2 y 2 ])
implies that b1 = b2 , c1 = c2 and a1 + b1 + c1 = a2 + b2 + c2 , whence a1 = a2 .
More significantly, ϕ preserves addition and multiplication.
\[
\begin{aligned}
&\varphi([a_1 + b_1 y + c_1 y^2]) + \varphi([a_2 + b_2 y + c_2 y^2]) \\
&\quad= [(a_1 + b_1 + c_1) + b_1 x + c_1 x^2] + [(a_2 + b_2 + c_2) + b_2 x + c_2 x^2] \\
&\quad= [(a_1 + a_2 + b_1 + b_2 + c_1 + c_2) + (b_1 + b_2)x + (c_1 + c_2)x^2] \\
&\quad= \varphi([(a_1 + a_2) + (b_1 + b_2)y + (c_1 + c_2)y^2]) .
\end{aligned}
\]
This shows that $\varphi$ preserves addition. Multiplication is more subtle, and uses the fact that [y] and [x + 1] have the same minimal polynomial $q(X) = X^3 + X^2 + 1$. Hence $[y^3] = [y^2 + 1]$ and $[y^4] = [y^2 + y + 1]$ and likewise $[(x + 1)^3] = [(x + 1)^2 + 1]$ and $[(x + 1)^4] = [(x + 1)^2 + (x + 1) + 1]$. Thus
\[
\begin{aligned}
&\varphi([a_1 + b_1 y + c_1 y^2])\, \varphi([a_2 + b_2 y + c_2 y^2]) \\
&\quad= [a_1 + b_1(x + 1) + c_1(x + 1)^2]\, [a_2 + b_2(x + 1) + c_2(x + 1)^2] \\
&\quad= [a_1 a_2 + (a_1 b_2 + a_2 b_1)(x + 1) + (b_1 b_2 + a_1 c_2 + a_2 c_1)(x + 1)^2 \\
&\qquad\quad + (b_1 c_2 + b_2 c_1)(x + 1)^3 + c_1 c_2 (x + 1)^4] \\
&\quad= [a_1 a_2 + (a_1 b_2 + a_2 b_1)(x + 1) + (b_1 b_2 + a_1 c_2 + a_2 c_1)(x + 1)^2 \\
&\qquad\quad + (b_1 c_2 + b_2 c_1)((x + 1)^2 + 1) + c_1 c_2 ((x + 1)^2 + (x + 1) + 1)] \\
&\quad= \varphi([a_1 a_2 + (a_1 b_2 + a_2 b_1)y + (b_1 b_2 + a_1 c_2 + a_2 c_1)y^2 \\
&\qquad\quad + (b_1 c_2 + b_2 c_1)(y^2 + 1) + c_1 c_2 (y^2 + y + 1)]) \\
&\quad= \varphi([a_1 a_2 + (a_1 b_2 + a_2 b_1)y + (b_1 b_2 + a_1 c_2 + a_2 c_1)y^2 + (b_1 c_2 + b_2 c_1)y^3 + c_1 c_2 y^4]) \\
&\quad= \varphi([a_1 + b_1 y + c_1 y^2][a_2 + b_2 y + c_2 y^2]) .
\end{aligned}
\]
So we see that ϕ is an isomorphism between these two fields of order 8.
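Because both fields have only 8 elements, the claims about $\varphi$ can also be confirmed by exhaustive checking. The sketch below assumes the same 3-bit encoding for both fields (bit i is the coefficient of $y^i$ or $x^i$); `mul_mod` and `phi` are illustrative names.

```python
F_MOD = 0b1011   # x^3 + x + 1, defining F_8
G_MOD = 0b1101   # y^3 + y^2 + 1, defining G

def mul_mod(u, v, modulus):
    """Multiply in Z_2[t]/(modulus) for a cubic modulus; elements are 3-bit ints."""
    prod = 0
    for i in range(3):
        if (v >> i) & 1:
            prod ^= u << i
    for i in (4, 3):                  # clear the t^4 term first, then t^3
        if (prod >> i) & 1:
            prod ^= modulus << (i - 3)
    return prod

def phi(g):
    """[a + b y + c y^2] -> [(a + b + c) + b x + c x^2]."""
    a, b, c = g & 1, (g >> 1) & 1, (g >> 2) & 1
    return ((a + b + c) % 2) | (b << 1) | (c << 2)

is_bijection = sorted(phi(g) for g in range(8)) == list(range(8))
preserves_add = all(phi(g ^ h) == phi(g) ^ phi(h)
                    for g in range(8) for h in range(8))
preserves_mul = all(phi(mul_mod(g, h, G_MOD)) == mul_mod(phi(g), phi(h), F_MOD)
                    for g in range(8) for h in range(8))
```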

Exercises

1. Construct the multiplication table for G = Z2 [y]/(y 3 + y 2 + 1).


2. Construct the multiplication table for F = Z3 [x]/(x2 + x − 1).
3. (a) Factor X 9 − X in Z3 [X].
(b) Factor X 9 − X in Q[x]/(x5 + 7x2 − 7).

4. Show that F = Z3 [x]/(x2 + x − 1) is isomorphic to G = Z3 [y]/(y 2 + 1).


5. (a) Show that x2 + 1 and x2 + x + 4 are irreducible in Z11 [x].
(b) Factor X 2 + X + 4 in F = Z11 [x]/(x2 + 1).
(c) Construct an explicit isomorphism from G = Z11 [x]/(x2 + x + 4)
onto the field F.
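6. (a) Find all irreducible quadratics in $\mathbb{Z}_2[x]$.
(b) Construct a 4-element field.
(c) Show that this list of four matrices with coefficients in $\mathbb{Z}_2$
\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
\]
forms a field under the usual addition and multiplication of matrices, modulo 2.
(d) Find an isomorphism between the fields that you constructed in parts (b) and (c).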

7.3. Fermat’s Little Theorem for Finite Fields


In this section, we will show that certain results about modular arithmetic
for Zp are valid for all finite fields. Moreover, the proofs in many cases are
almost unchanged from the integer case. This will lead to strong structural
results for finite fields.

7.3.1. Proposition. Let p be prime, and let q(x) be an irreducible poly-


nomial of degree d in Zp [x]. Then the field Zp [x]/(q) has cardinality pd .

Proof. This is just the observation that each [a] agrees with [r] where r is
its remainder on dividing a by q. This remainder has degree at most d − 1.
Conversely, two distinct polynomials r1 and r2 of degree at most d − 1 must
represent different equivalence classes. This is because r1 ≡ r2 (mod q) if
and only if q divides r1 − r2 , a polynomial of degree at most d − 1. Since q
has larger degree, this can happen only when r1 − r2 = 0. So r1 = r2 .
It remains to count the number of polynomials of degree at most d − 1 in $\mathbb{Z}_p[x]$. They can be written as $a_0 + a_1 x + \dots + a_{d-1} x^{d-1}$ where each $a_i$ is an arbitrary element of $\mathbb{Z}_p$. There are p choices for each of the d coefficients $a_i$. Hence there are $p^d$ choices for the different equivalence classes.

In order to show that all finite fields are of this type, we must develop
various properties of finite fields. The first result is the analogue of Fermat’s
little theorem.

7.3.2. Theorem. Let F be a finite field of cardinality n. Then $a^{n-1} = 1$ for every a ≠ 0 in F.

Proof. The proof is the same as in $\mathbb{Z}_p$. Fix a ≠ 0 and define a map f : F → F by f(x) = ax. This map is one-to-one. To see this, notice that if f(x) = f(y), then
\[
0 = f(x) - f(y) = a(x - y) .
\]
Since a ≠ 0 and F has no zero divisors, it follows that x = y. Also, f(0) = 0. Thus f maps $F^* = F \setminus \{0\}$ into itself. A one-to-one function of a finite set into itself must also be onto. So multiplication by a merely permutes the units. Therefore,
\[
\prod_{x \in F^*} x = \prod_{x \in F^*} ax = a^{n-1} \prod_{x \in F^*} x .
\]
Dividing by the product of the units, we get $a^{n-1} = 1$.
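Both steps of this proof can be observed in a small example. The sketch below uses the 9-element field $\mathbb{Z}_3[x]/(x^2 + 1)$ (the polynomial $x^2 + 1$ has no roots in $\mathbb{Z}_3$, so it is irreducible); a class [a + bx] is stored as the pair (a, b), and the names are illustrative.

```python
from itertools import product

P = 3
ELEMENTS = list(product(range(P), repeat=2))   # (a, b) stands for [a + bx]
UNITS = [e for e in ELEMENTS if e != (0, 0)]

def mul(u, v):
    """Multiply in Z_3[x]/(x^2 + 1), using x^2 = -1."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(u, k):
    result = (1, 0)
    for _ in range(k):
        result = mul(result, u)
    return result

# Multiplication by any unit permutes the units ...
permutes = all(sorted(mul(a, x) for x in UNITS) == sorted(UNITS) for a in UNITS)
# ... and hence a^(n-1) = 1 with n = 9.
fermat = all(power(a, len(ELEMENTS) - 1) == (1, 0) for a in UNITS)
```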

7.3.3. Corollary. Let F be a finite field of cardinality n. Then one can factor the polynomial $X^n - X$ in F[X] as $X^n - X = \prod_{a \in F} (X - a)$.

Proof. By the previous theorem, every $a \in F^*$ is a root of $X^{n-1} - 1$. Thus every element of F is a root of $X^n - X$. This provides n distinct roots for this polynomial of degree n. Hence it is a scalar multiple of $\prod_{a \in F} (X - a)$. Since the leading coefficient of both polynomials is 1, they are equal.

7.3.4. Corollary. Let q be an irreducible polynomial of degree d in $\mathbb{Z}_p[x]$, and form the field $F = \mathbb{Z}_p[x]/(q)$. Then q divides $x^{p^d} - x$ in $\mathbb{Z}_p[x]$; and q(X) factors into linear terms in F[X].

Proof. Let a = [x] be the known root of q in F. Thinking of a as an algebraic element over $\mathbb{Z}_p$, we see that q must be the minimal polynomial of a in $\mathbb{Z}_p[X]$ because it is irreducible. Now by Theorem 7.3.2 for F and Proposition 7.3.1, we see that a is a root of $X^{p^d} - X$. So by Theorem 6.6.2, it follows that q(X) divides $X^{p^d} - X$ in $\mathbb{Z}_p[X]$, say $X^{p^d} - X = q(X) r(X)$.
By the previous corollary, $X^{p^d} - X$ factors into a product of linear terms in F[X]. By unique factorization into irreducible polynomials in F[X], it follows that q(X) factors into a product of d linear terms in F[X].
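The divisibility statement is easy to test for p = 2 and d = 3 by polynomial long division. In the sketch below, a polynomial over $\mathbb{Z}_2$ is encoded as an integer whose bit i is the coefficient of $x^i$; this encoding and the function name are assumptions made for illustration.

```python
def poly_mod2_rem(f, g):
    """Remainder of f divided by g in Z_2[x] (g non-zero)."""
    while f and f.bit_length() >= g.bit_length():
        # cancel the leading term of f with a shifted copy of g
        f ^= g << (f.bit_length() - g.bit_length())
    return f

X8_MINUS_X = (1 << 8) | (1 << 1)      # x^8 + x, which equals x^8 - x over Z_2

rem_cubic_1 = poly_mod2_rem(X8_MINUS_X, 0b1011)   # x^3 + x + 1
rem_cubic_2 = poly_mod2_rem(X8_MINUS_X, 0b1101)   # x^3 + x^2 + 1
rem_quadratic = poly_mod2_rem(X8_MINUS_X, 0b111)  # x^2 + x + 1, degree 2
```

Both irreducible cubics leave remainder 0. The irreducible quadratic $x^2 + x + 1$ does not divide $x^8 - x$, which is consistent with the corollary, since that only concerns irreducible polynomials of degree d = 3 here.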

Exercises

1. Find the analogue of Wilson’s Theorem for the product of all the units
of any finite field.
2. Factor x16 − x into irreducibles in Z2 [x].
3. Let R be a finite integral domain. Prove that R is a field.
Hint: Take a ≠ 0 in R and find 0 ≤ m < n such that $a^m = a^n$.
4. Let p be an odd prime, and let $q(x) = x^{p-1} - 1 - \prod_{k=1}^{p-1} (x - k)$ in $\mathbb{Z}_p[x]$. Show that deg q < p − 1 but q has at least p − 1 roots. Hence deduce Wilson’s theorem for $\mathbb{Z}_p$.

7.4. Characteristic
Now it is possible to count the number of elements in a finite field. The prime integer p in the following theorem is called the characteristic of the field. This is the smallest positive integer p such that the sum of p ones in F equals 0. If such a sum is never 0, say that the field has characteristic zero. Examples of fields of characteristic zero are Q, R and C.

7.4.1. Theorem. Let F be a finite field. Then


(1) There is a prime p so that pa = 0 for every a ∈ F.
(2) F contains a copy of Zp .
(3) There is an integer d so that |F| = pd .

Proof. First consider the elements of F given by 0, 1, 2 = 1 + 1, 3 = 1 + 1 + 1, and so on. This is an infinite list, and since these are all elements of the finite set F, the list must repeat itself. So there are integers k < m with
\[
k = \underbrace{1 + \dots + 1}_{k \text{ ones}} = \underbrace{1 + \dots + 1}_{m \text{ ones}} = m .
\]
Subtracting yields
\[
0 = m - k = \underbrace{1 + \dots + 1}_{m - k \text{ ones}} .
\]
Let p be the smallest positive integer such that the sum of p ones equals 0. It must be shown that p is prime. If it isn’t prime, factor p = jk where 1 < j, k < p. Then
\[
0 = \underbrace{1 + \dots + 1}_{p \text{ ones}} = \big( \underbrace{1 + \dots + 1}_{j \text{ ones}} \big) \big( \underbrace{1 + \dots + 1}_{k \text{ ones}} \big) = jk .
\]
Neither of these factors is 0, by the minimality of p, so F contains zero divisors, which is absurd. Therefore p is prime. Now for any element a ∈ F,
\[
pa = \underbrace{a + \dots + a}_{p\ a\text{’s}} = a \big( \underbrace{1 + \dots + 1}_{p \text{ ones}} \big) = a(0) = 0 .
\]

Let S = {0, 1, . . . , p − 1} be the set of all possible sums of ones in F.


Notice that this set is closed under addition and multiplication (because the
product of two sums of ones is a sum of ones). Moreover, by the paragraph
above, addition is calculated mod p. Clearly then, multiplication is also
calculated mod p. So S is a copy of Zp in F. From now on, we will write an
integer n to mean n (mod p) as an element of F.
The next problem is to find a way to represent the elements of F which will allow us to count them. The idea is to find a minimal list $a_1 = 1, a_2, \dots, a_d$ in F so that every a ∈ F can be expressed as a sum
\[
\sum_{i=1}^{d} n_i a_i, \qquad n_i \in \mathbb{Z}_p .
\]
This is done recursively. If F is larger than $\mathbb{Z}_p$, choose some element $a_2 \in F$ not in $\mathbb{Z}_p$. Then if $\{ n_1 + n_2 a_2 : n_i \in \mathbb{Z}_p \}$ is not all of F, choose $a_3 \in F$ not in this set. Since F is finite, this process must stop. Repeat this until a set $a_1 = 1, a_2, \dots, a_d$ is chosen so that
• $a_j \notin \big\{ \sum_{i=1}^{j-1} n_i a_i : n_i \in \mathbb{Z}_p \big\}$ for 2 ≤ j ≤ d.
• Every a ∈ F can be expressed as $\sum_{i=1}^{d} n_i a_i$ for some $n_i \in \mathbb{Z}_p$.
The important point of this representation is that every a ∈ F can be represented as such a sum in exactly one way. If this were not the case, there would be two different sums with the same total:
\[
\sum_{i=1}^{d} m_i a_i = \sum_{i=1}^{d} n_i a_i .
\]
Subtracting yields
\[
\sum_{i=1}^{d} (m_i - n_i) a_i = 0 .
\]
It suffices to show that if $\sum_{i=1}^{d} k_i a_i = 0$, then all the coefficients $k_i$ are 0. If this were not so, let $i_0$ be the largest integer so that $k_{i_0} \ne 0$. Then rearranging the equation and dividing by $k_{i_0}$ yields
\[
a_{i_0} = -k_{i_0}^{-1} \sum_{i=1}^{i_0 - 1} k_i a_i .
\]

This contradicts the fact that no aj can be written as a combination of the


earlier ai ’s.
It follows that each coefficient $n_i$ can be any element of $\mathbb{Z}_p$. Since
different choices yield different sums, there are $p^d$ such sums. Thus $\mathbb{F}$ has
$p^d$ elements. ∎

It is worth remarking that the last part of this proof is not really a mys-
terious one. If the reader is familiar with vector spaces and linear algebra,
then the proof may be shortened considerably. Once $\mathbb{F}$ contains a copy of
the field $\mathbb{Z}_p$, it follows that $\mathbb{F}$ is a vector space over $\mathbb{Z}_p$. If $d$ is the
dimension of $\mathbb{F}$, then $|\mathbb{F}| = p^d$. Indeed, the set $a_1, \dots, a_d$ is a basis for
$\mathbb{F}$ as a vector space over $\mathbb{Z}_p$.

Exercises

1. Show that if F is a field of characteristic 0, then F contains a copy of


the rational numbers.
2. (a) For the field $\mathbb{Z}_3[X]/(x^4 + x^3 - 1)$, show that the set $\{1, [x], [x^2], [x^3]\}$
       serves the role of $\{a_1, a_2, a_3, a_4\}$ in the proof of Theorem 7.4.1.
   (b) If you know some linear algebra, find the matrix for the linear
       transformation $T[p] = [xp]$ with respect to this basis.
3. Let $\mathbb{F}$ be a field of characteristic $p > 0$.
   (a) Prove that $(x + a)^p = x^p + a^p$ for $a \in \mathbb{F}$.
       Hint: use the binomial theorem. What is $\binom{p}{k} \pmod{p}$?
   (b) Deduce that if $a, b \in \mathbb{F}$, then $(a + b)^{p^k} = a^{p^k} + b^{p^k}$ for every $k \ge 1$.
4. Let $\mathbb{F}$ be a field of characteristic $p$.
   (a) Prove that $\binom{p-1}{k} \equiv (-1)^k \pmod{p}$ for $0 \le k \le p - 1$.
   (b) Hence show that if $a, b \in \mathbb{F}$, then $(a - b)^{p-1} = \sum_{k=0}^{p-1} a^k b^{p-1-k}$.
5. Suppose that $\mathbb{G} \subsetneq \mathbb{F}$ is a strict inclusion of finite fields of characteristic
   $p > 0$. Let $|\mathbb{G}| = p^d$ and $|\mathbb{F}| = p^e$. Modify the proof of Theorem 7.4.1
   (3) to show that $d \mid e$.

7.5. Algebraic Elements


If $a$ is an element of a field $\mathbb{F}$ of cardinality $p^d$, then by Fermat's Little
Theorem, $a$ is a root of the polynomial $X^{p^d} - X$ in $\mathbb{Z}_p[X]$. Thus there is an
irreducible factor $q(X)$ of $X^{p^d} - X$ such that $q(a) = 0$. This is the minimal
polynomial of $a$, which is algebraic over $\mathbb{Z}_p$. Theorem 6.6.2 is valid for
$\mathbb{Z}_p$ as well as $\mathbb{Q}$, and one can use the same proof verbatim replacing $\mathbb{Q}$ by
$\mathbb{Z}_p$. Thus we conclude that if $r(a) = 0$, then $q \mid r$. We state this for future
reference.

7.5.1. Proposition. If $a$ is an element of a field $\mathbb{F}$ of cardinality $p^d$, then
$a$ has a minimal polynomial $q \in \mathbb{Z}_p[X]$. The polynomial $q$ is a factor of
$X^{p^d} - X$. If $r \in \mathbb{Z}_p[X]$ satisfies $r(a) = 0$, then $q$ divides $r$.

Starting with this element a in F, consider the set Zp [a] of all polyno-
mials of a. This set is a subset of F which is closed under addition and
multiplication because
r(a) + s(a) = (r + s)(a) and r(a)s(a) = (rs)(a)
for all r and s in Zp [X]. Say that a is a generator of F if F = Zp [a]. The
following theorem explains what this subset is.

7.5.2. Theorem. Let $a$ be an element of a field $\mathbb{F}$ of cardinality $p^d$,
with minimal polynomial $q \in \mathbb{Z}_p[X]$. Then $\mathbb{Z}_p[a]$ is a field isomorphic to
$\mathbb{Z}_p[X]/(q)$.

Proof. Define a map ϕ : Zp [X] → Zp [a] by ϕ(r) = r(a). We have just seen
that
ϕ(r + s) = (r + s)(a) = r(a) + s(a) = ϕ(r) + ϕ(s)
and
ϕ(rs) = (rs)(a) = r(a)s(a) = ϕ(r)ϕ(s).
So ϕ preserves addition and multiplication.
The map ϕ is not one to one. Indeed, ϕ(r) = 0 if and only if r(a) = 0
which occurs if and only if q|r by Proposition 7.5.1. Hence ϕ(r1 ) = ϕ(r2 )
if and only if q|(r1 − r2 ) which holds if and only if r1 ≡ r2 (mod q). So we
may define a map ϕ̃ : Zp [X]/(q) → F by
ϕ̃([r]) = r(a).
The value of ϕ̃([r]) is independent of choice of representative r, so ϕ̃ is well
defined on equivalence classes mod q. Therefore this definition makes sense.
Moreover, our calculation also shows that ϕ̃ is one to one, and by the
definition of Zp [a] its range is all of Zp [a]. Thus the map ϕ̃ is a bijection
of Zp [X]/(q) onto Zp [a].
Next notice that ϕ̃ preserves addition and multiplication because ϕ does.
In other words,
ϕ̃([r]) + ϕ̃([s]) = ϕ(r) + ϕ(s) = ϕ(r + s) = ϕ̃([r + s]).
and
ϕ̃([r])ϕ̃([s]) = ϕ(r)ϕ(s) = ϕ(rs) = ϕ̃([rs]).
So ϕ̃ is a bijection between Zp [X]/(q) and Zp [a] which preserves all the field
operations, i.e., it is an isomorphism. 

Since the cardinality of $\mathbb{Z}_p[a]$ is at most $p^d$, we obtain the following
consequence.

7.5.3. Corollary. Let $a$ be an element of a field $\mathbb{F}$ of cardinality $p^d$ with
minimal polynomial $q \in \mathbb{Z}_p[X]$. Then $\deg q \le d$, and $\mathbb{Z}_p[a] = \mathbb{F}$ if and only
if $\deg q = d$.

7.5.4. Example. Consider the field $\mathbb{F} = \mathbb{Z}_3[x]/(x^3 + x^2 + 2)$, and the
element $a = [x^2 + x + 2]$. Compute $a^2 = [x^2 + 2x + 2]$ and $a^3 = [x^2 + x]$.
So we observe that $a$ is a root of $q(X) = X^3 + 2X + 2$. This is irreducible
because it is a cubic with no roots in $\mathbb{Z}_3$. To compute $a^{-1}$ in $\mathbb{F}$, we notice
that
\[ 0 = a^{-1}(a^3 + 2a + 2) = a^2 + 2 - a^{-1}. \]
Thus $a^{-1} = a^2 - 1$. Now we may factor $q = (X - a)(X^2 + aX + a^{-1})$ in
$\mathbb{F}[X]$. The quadratic factor will have roots
\[ \frac{-a \pm \sqrt{a^2 - 4a^{-1}}}{2} = \frac{-a \pm \sqrt{a^2 - 1(a^2 - 1)}}{2} = \frac{2a \pm 2}{2} = a \pm 1. \]
So
\[ q(X) = (X - a)(X - a - 1)(X - a - 2). \]
We also might notice that since $a^3 = a + 1$, the element $a^3$ has the same
minimal polynomial as $a$. Also
\[ a^9 = (a^3)^3 = (a + 1)^3 = a^3 + 3a^2 + 3a + 1 = a^3 + 1 = a + 2. \]
Hence $a^9$ is the third root. Finally, notice that
\[ a^{27} = (a + 2)^3 = a^3 + 6a^2 + 12a + 8 = a^3 + 2 = a. \]
This is foreshadowing of a general phenomenon that we will study in the
section on automorphisms.
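
The arithmetic in Example 7.5.4 can be checked mechanically. The following is a minimal Python sketch, not part of the original text: polynomials are stored as coefficient lists with the constant term first, and the helper names `reduce_mod`, `mul` and `power` are illustrative assumptions.

```python
# Arithmetic in F = Z_3[x]/(x^3 + x^2 + 2).
# A polynomial is a list of coefficients, constant term first,
# so [2, 1, 1] stands for x^2 + x + 2.
P = 3
MODULUS = [2, 0, 1, 1]  # x^3 + x^2 + 2 (monic)

def reduce_mod(poly):
    """Reduce a coefficient list modulo MODULUS and modulo P."""
    poly = [c % P for c in poly]
    deg_m = len(MODULUS) - 1
    while len(poly) > deg_m:
        lead = poly.pop()                  # coefficient of the top power
        if lead:
            shift = len(poly) - deg_m
            # replace lead * x^(shift + deg_m) using the monic modulus
            for i in range(deg_m):
                poly[shift + i] = (poly[shift + i] - lead * MODULUS[i]) % P
    return poly + [0] * (deg_m - len(poly))

def mul(a, b):
    """Multiply two elements of F."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    return reduce_mod(prod)

def power(a, n):
    """Compute a^n in F by repeated multiplication."""
    result = [1, 0, 0]
    for _ in range(n):
        result = mul(result, a)
    return result

a = [2, 1, 1]                      # a = [x^2 + x + 2]
print(power(a, 2), power(a, 3))    # a^2 and a^3
print(mul(a, [1, 2, 1]))           # a * (a^2 - 1), expected to be 1
print(power(a, 9), power(a, 27))   # a + 2 and a
```

Repeated multiplication is used instead of fast exponentiation because the exponents here are tiny; for larger fields one would square and multiply.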

Exercises

1. In the field $\mathbb{F} = \mathbb{Z}_2[x]/(x^4 + x^3 + 1)$, find the minimal polynomial of the
   element $a = [x^2 + 1]$.
   Hint: compute the first four powers of $a$ and find a linear relationship
   among $\{1, a, a^2, a^3, a^4\}$.
2. What is the cardinality of the subfield $\mathbb{Z}_2[b] \subset \mathbb{F}$ in the previous exercise
   for $b = [x^3 + x]$?
3. In the field $\mathbb{F} = \mathbb{Z}_{19}[x]/(x^2 - 2)$, show that every element of $\mathbb{F} \setminus \mathbb{Z}_{19}$ has
   a minimal polynomial of degree 2.
4. Use Section 7.4 Exercise 5 to show that if $a \in \mathbb{F}$ with minimal polynomial
   $q(x) \in \mathbb{Z}_p[x]$ and $|\mathbb{F}| = p^d$, then $\deg q$ divides $d$.

7.6. Finite Fields


We will see now that all finite fields of characteristic $p$ arise from
arithmetic modulo an irreducible polynomial over $\mathbb{Z}_p$. To get finer detail about
the structure of $\mathbb{F}$, we will need to know about primitive roots. Recall
that a primitive root of $\mathbb{F}$ is a unit $a$ such that the set of powers of $a$,
$\{a, a^2, \dots, a^{n-1}\}$ with $n = |\mathbb{F}|$, is the full set of units $\mathbb{F}^*$. In particular,
primitive roots are generators of $\mathbb{F}$.

7.6.1. Theorem. Every finite field has a primitive root.

Proof. Again, the proof is the same as for $\mathbb{Z}_p$. Let $n = |\mathbb{F}| = p^d$. The
order of a unit $a$ is defined to be the least positive integer $d = \operatorname{ord}(a)$ such
that $a^d = 1$. As in Proposition 2.10.2, it follows that if $a^k = 1 = a^\ell$, then
$a^{\gcd(k,\ell)} = 1$ as well. Since $a^{n-1} = 1$, it then follows that $\operatorname{ord}(a) \mid n - 1$ as in
Corollary 2.10.3.
Following the proof of Lemma 2.10.5, let $f(d)$ count the number of
elements $a \in \mathbb{F}$ with $\operatorname{ord}(a) = d$ for each $d$ which divides $n - 1$. As before,
notice that $\operatorname{ord}(a) \mid d$ if and only if $a$ is a root of $X^d - 1$ in $\mathbb{F}$. This polynomial
has at most $d$ roots. On the other hand,
\[ X^{n-1} - 1 = (X^d - 1)(X^{n-1-d} + X^{n-1-2d} + \cdots + X^d + 1) \]
has exactly $n - 1$ roots by Corollary 7.3.3. The second factor has at most
$n - 1 - d$ roots. Thus each factor must have its full complement of roots.
This yields the formula
\[ \sum_{e \mid d} f(e) = d. \]

As in the proof of Theorem 2.10.6, observe that this set of equations is


also satisfied by the Euler ϕ function. So as in that proof, we deduce that
f (d) = ϕ(d) for every divisor d of n − 1. In particular, there are ϕ(n − 1)
elements of order n − 1. These are the primitive roots. 

We can use primitive roots to provide a familiar criterion for when an


element of F is a square.

7.6.2. Proposition. Let $p$ be an odd prime, and let $\mathbb{F}$ be a field of
cardinality $p^d$. A non-zero element $a \in \mathbb{F}$ is a square in $\mathbb{F}$ if and only if
$a^{(p^d-1)/2} = 1$.

Proof. Let $c = a^{(p^d-1)/2}$. Then by Fermat's little theorem for $\mathbb{F}$,
$c^2 = a^{p^d-1} = 1$. Thus $c$ is a root of $x^2 - 1 = 0$; whence $c \in \{\pm 1\}$.
Let $b$ be a primitive root for $\mathbb{F}$. Then $b^{(p^d-1)/2} = -1$ since it is distinct
from $b^{p^d-1} = 1$. If $a = b^k$ for $0 \le k < p^d - 1$, then
\[ a^{(p^d-1)/2} = \bigl(b^{(p^d-1)/2}\bigr)^k = (-1)^k. \]
This equals 1 if and only if $k$ is even.
If $a = d^2$ for some $d \in \mathbb{F}$ and $d = b^l$ for $0 \le l < p^d - 1$, then $a = b^{2l}$. So
$k \equiv 2l \pmod{p^d - 1}$. Since $2l$ and $p^d - 1$ are both even, this forces $k$ to be
even. Conversely, if $k = 2l$, then $a = (b^l)^2$ is a square. ∎

Now we have the necessary tools to prove the main theorem about finite
fields.

7.6.3. Theorem. Let $\mathbb{F}$ be a finite field of cardinality $p^n$. There is an
irreducible polynomial $q \in \mathbb{Z}_p[x]$ of degree $n$ so that $\mathbb{F}$ is isomorphic to $\mathbb{Z}_p[x]/(q)$.
Moreover, $X^{p^n} - X = \prod_{a \in \mathbb{F}} (X - a)$ factors into linear terms with $p^n$ distinct
roots.

Proof. Let $a$ be a primitive root of $\mathbb{F}$. Let $q$ be the minimal polynomial of
$a$. The subfield $\mathbb{Z}_p[a]$ contains $a^k$ for $0 \le k \le p^n - 1$. As this is a list of all
the non-zero elements of $\mathbb{F}$, we obtain $\mathbb{Z}_p[a] = \mathbb{F}$. By Theorem 7.5.2, there
is an isomorphism of $\mathbb{Z}_p[X]/(q)$ onto $\mathbb{F}$. Since
\[ p^{\deg(q)} = |\mathbb{Z}_p[X]/(q)| = |\mathbb{F}| = p^n, \]
we see that $q$ has degree exactly $n$.
By Corollary 7.3.3, $X^{p^n} - X = \prod_{a \in \mathbb{F}} (X - a)$ in $\mathbb{F}[X]$. This has degree $p^n$
and has $p^n$ distinct roots. ∎

Since there are different irreducible factors of degree d, it is possible that


there are many different finite fields of each cardinality. However, this is not
the case.

7.6.4. Corollary. There is only one field $\mathbb{F}$ of cardinality $p^n$ up to
isomorphism.

Proof. Suppose that $\mathbb{F}$ and $\mathbb{G}$ are finite fields of cardinality $p^n$. By Theorem
7.6.3, there is an irreducible polynomial $q$ of degree $n$ so that $\mathbb{F}$ is isomorphic
to $\mathbb{Z}_p[X]/(q)$. Moreover, $q$ is a factor of $X^{p^n} - X$, so we obtain a factorization
\[ X^{p^n} - X = q(X) r(X). \]
By Corollary 7.3.3 applied to $\mathbb{G}$, we see that $X^{p^n} - X$ factors into linear
terms in $\mathbb{G}[X]$. As we have seen before, the polynomial $q(X)$ must have
exactly $n$ roots in $\mathbb{G}$. Let $b$ be such a root. Then the minimal polynomial
of $b$ in $\mathbb{Z}_p[X]$ is $q$ since $q$ is irreducible. By Theorem 7.5.2, $\mathbb{Z}_p[X]/(q)$ is
isomorphic to $\mathbb{Z}_p[b]$. In particular, $\mathbb{Z}_p[b]$ has $p^n$ elements, and thus is all of
$\mathbb{G}$. So $\mathbb{F}$ and $\mathbb{G}$ are both isomorphic to $\mathbb{Z}_p[X]/(q)$, and thus
to each other. ∎

Because of this corollary, there is at most one field of cardinality $p^n$ for
each prime $p$ and positive integer $n$. We will call it $\mathbb{F}_{p^n}$. We still need to
show that $\mathbb{F}_{p^n}$ always exists.

7.6.5. Corollary. Every irreducible polynomial of degree $n$ in $\mathbb{Z}_p[x]$ splits
into a product of linear terms in $\mathbb{F}_{p^n}$.

Proof. This is an immediate corollary of Corollary 7.3.4 and the uniqueness
of $\mathbb{F}_{p^n}$ established above. ∎

7.6.6. Example. Consider $p(x) = x^4 + x^2 + x + 1$ in $\mathbb{Z}_3[x]$. This is
irreducible. To see this, first notice that it has no roots in $\mathbb{Z}_3$. So if it factors,
it is into a product of two quadratics. There are only three monic irreducible
quadratics in $\mathbb{Z}_3[x]$, namely $x^2 + 1$, $x^2 - x - 1$ and $x^2 + x - 1$. None of these
divide $p$, so $p$ is irreducible. Form the field $\mathbb{F} = \mathbb{Z}_3[x]/(p)$ with 81 elements.
To find a primitive root, we require an element of order 80. As for prime
integers, it suffices to show that $\operatorname{ord}(a)$ is not a proper divisor of $80 = 2^4 \cdot 5$.
Thus an element $a$ such that $a^{40} \ne 1$ and $a^{16} \ne 1$ must be a primitive
root. Using computer software, we compute $[x]^{40} = 1$. A second try is
$[x + 1]^{40} = -1$ and $[x + 1]^{16} = [x^3 - 1]$. Thus $[x + 1]$ is a primitive root.
Going back to the element $[x]$, we compute $[x]^{20} = [-x^3 - x^2 - x + 1]$ and
$[x]^8 = [x^3 + x^2 - x]$. So $\operatorname{ord}([x]) = 40$.

Exercises

1. Check by division that p(x) in Example 7.6.6 is not divisible by any


irreducible quadratic polynomial, as claimed.
2. (a) Factor $X^{16} - X$ into irreducibles in $\mathbb{Z}_2[X]$.
   (b) Show that $X^3 + X + 1$ is irreducible over $\mathbb{F} = \mathbb{Z}_2[x]/(p(x))$ where
       $p(x) = x^4 + x^3 + x^2 + x + 1$.
3. Show that for any polynomial $q \in \mathbb{Z}_p[x]$ (where $p$ is prime), the polynomial
   $q^{p^d} - q$ is divisible by $x^{p^d} - x$.
   Hint: consider its roots in $\mathbb{F}_{p^d}$.
4. (a) Find $\operatorname{ord}([x])$ in $\mathbb{F} = \mathbb{Z}_2[x]/(x^4 + x^3 + x^2 + x + 1)$. Notice that $[x]$ is
       a generator of $\mathbb{F}$ but not a primitive root.
   (b) Find a primitive root for $\mathbb{F}$.
   (c) Factor $X^4 + X^3 + 1$ in $\mathbb{F}[X]$.
5. If $p \ne 3$ is prime, find a criterion for $a \in \mathbb{F}_{p^n}$ to be a perfect cube.

7.7. Automorphisms of $\mathbb{F}_{p^d}$


In the study of fields, the set of isomorphisms of the field onto itself (which
are called automorphisms) is very important. It is a crucial idea of Galois
theory. Galois theory can be used to explain why certain polynomials of de-
gree at least 5 cannot be solved by repeated kth roots, k ≥ 2. It is also used
to show that certain angles cannot be trisected by a procedure using only
a straight-edge and a compass. In the case of finite fields, we may analyze
these automorphisms more concretely. The key is the following observa-
tion showing that there is a special automorphism called the Frobenius
automorphism for each finite field.

7.7.1. Lemma. Let $\mathbb{F}_{p^d}$ be a finite field. The map $\varphi : \mathbb{F}_{p^d} \to \mathbb{F}_{p^d}$ given by
\[ \varphi(a) = a^p \]
is an isomorphism. Moreover, $\varphi(a) = a$ if and only if $a \in \mathbb{Z}_p$.

Proof. We see $\varphi(0) = 0$ and $\varphi(1) = 1$. Also,
\[ \varphi(ab) = (ab)^p = a^p b^p = \varphi(a)\varphi(b). \]
So $\varphi$ is multiplicative. The key is that it is also additive. Note that if
$1 \le i < p$, then $\binom{p}{i} = \frac{p!}{i!\,(p-i)!}$ is a multiple of $p$ because $p$ divides the
numerator but not the denominator. Thus because computations in $\mathbb{F}$ are
done modulo $p$,
\[ \varphi(a + b) = (a + b)^p = \sum_{i=0}^{p} \binom{p}{i} a^i b^{p-i} = a^p + b^p = \varphi(a) + \varphi(b). \]
Hence we see that $\varphi$ preserves all the field operations. Next let us check
that $\varphi$ is a bijection. If
\[ 0 = \varphi(a) - \varphi(b) = \varphi(a - b) = (a - b)^p, \]
then it follows that $a - b = 0$ or $a = b$. So $\varphi$ is a one-to-one map of $\mathbb{F}$ into
itself. As $\mathbb{F}$ is finite, it is also onto. Hence $\varphi$ is a bijection. Therefore it is
an automorphism.
Finally, notice that $\varphi(a) = a$ if and only if $a$ is a root of $X^p - X$. By
Fermat's little theorem, every element of $\mathbb{Z}_p$ is a root. This accounts for $p$
roots of this polynomial of degree $p$. Hence there are no others. ∎
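
The divisibility of the interior binomial coefficients, which is what makes the Frobenius map additive, is easy to test numerically. This is a small illustrative Python sketch; the function name `interior_binomials_vanish` is an assumption introduced here, not notation from the text.

```python
from math import comb

def interior_binomials_vanish(p):
    """True if C(p, i) is divisible by p for every 1 <= i < p."""
    return all(comb(p, i) % p == 0 for i in range(1, p))

# Holds for primes, which is the case used in the proof above.
print([p for p in range(2, 20) if interior_binomials_vanish(p)])

# The argument needs p to be prime: C(4, 2) = 6 is not divisible by 4.
print(interior_binomials_vanish(4))
```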

7.7.2. Example. Consider the field of 8 elements $\mathbb{F}_8$. The Frobenius
automorphism is $\varphi(a) = a^2$. So $\varphi^2(a) = \varphi(\varphi(a)) = a^4$ is also an automorphism
of $\mathbb{F}_8$. Similarly, $\varphi^3(a) = a^8$ is an automorphism. However, by Fermat's
little theorem for finite fields, $a^8 = a$ for every $a \in \mathbb{F}$. So $\varphi^3$ is the identity
map. Using the multiplication table 7.2.1, we can construct the following
table.

    a          ϕ(a)       ϕ^2(a)     ϕ^3(a)

    0          0          0          0
    1          1          1          1
    x          x^2        x^2+x      x
    x+1        x^2+1      x^2+x+1    x+1
    x^2        x^2+x      x          x^2
    x^2+1      x^2+x+1    x+1        x^2+1
    x^2+x      x          x^2        x^2+x
    x^2+x+1    x+1        x^2+1      x^2+x+1

Figure 7.7.1. Automorphisms of $\mathbb{F}_8$
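
The table can be regenerated by squaring each element. The sketch below is illustrative Python, assuming (as the factorization of $X^3 + X + 1$ in this example indicates) that $\mathbb{F}_8$ is realized as $\mathbb{Z}_2[x]/(x^3 + x + 1)$; elements are stored as 3-bit integers whose bits are the coefficients of $x^2$, $x$ and $1$, and the names `mul`, `frobenius` and `show` are hypothetical helpers.

```python
MOD = 0b1011  # x^3 + x + 1 over Z_2

def mul(a, b):
    """Multiply two elements of F_8 given as 3-bit coefficient masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a & 0b1000:         # degree reached 3: reduce by the modulus
            a ^= MOD
    return result

def frobenius(a):
    """The Frobenius automorphism a -> a^2 of F_8."""
    return mul(a, a)

def show(a):
    """Render a bit mask as a polynomial in x."""
    terms = [name for bit, name in ((4, "x^2"), (2, "x"), (1, "1")) if a & bit]
    return "+".join(terms) if terms else "0"

for a in range(8):
    row = [a]
    for _ in range(3):
        row.append(frobenius(row[-1]))   # phi, phi^2, phi^3 in turn
    print("  ".join(show(v).ljust(9) for v in row))
```

Each printed row lists $a$, $\varphi(a)$, $\varphi^2(a)$ and $\varphi^3(a)$, so the last column should repeat the first.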

Recall that we showed that in $\mathbb{F}_8[X]$, we can factor
\[ X^3 + X + 1 = (X - x)(X - x^2)(X - (x^2 + x)). \]
Observe that $x$, $\varphi(x) = x^2$ and $\varphi(x^2) = x^2 + x$ are the three roots of this
polynomial. Also $\varphi(x^2 + x) = x$; so $\varphi$ just permutes the roots.

This demonstrates a useful property of automorphisms of F for the pur-


pose of studying polynomials in Zp [X]. Every automorphism of F must
permute the roots of these polynomials in Zp [X].

7.7.3. Lemma. Let $\psi$ be an automorphism of $\mathbb{F}_{p^d}$. Then $\psi(a) = a$ for all
$a \in \mathbb{Z}_p$. If $q \in \mathbb{Z}_p[X]$ and $a \in \mathbb{F}_{p^d}$ is a root of $q$, then $\psi(a)$ is also a root of
$q$.

Proof. First, since $\psi(1) = 1$, we have
\[ \psi(k) = \psi(\underbrace{1 + \cdots + 1}_{k\ \text{terms}}) = \underbrace{\psi(1) + \cdots + \psi(1)}_{k\ \text{terms}} = \underbrace{1 + \cdots + 1}_{k\ \text{terms}} = k. \]
This shows that $\psi$ is the identity on $\mathbb{Z}_p$.
Now let $q(X) = q_0 + q_1 X + \cdots + q_n X^n$ be a polynomial with coefficients
$q_i \in \mathbb{Z}_p$. If $a$ is a root, then
\[ 0 = \psi(q(a)) = \sum_{i=0}^{n} \psi(q_i)\psi(a^i) = \sum_{i=0}^{n} q_i \psi(a)^i = q(\psi(a)). \]
So $\psi(a)$ is also a root. Indeed, applying $\psi$ to all the roots of $q$ yields a
permutation of the roots. ∎

7.7.4. Corollary. Let $a$ be a primitive root of $\mathbb{F}_{p^d}$, and let $q \in \mathbb{Z}_p[X]$
be its minimal polynomial. Then $q$ has $d$ distinct roots: $\varphi^k(a) = a^{p^k}$ for
$0 \le k \le d - 1$, where $\varphi$ is the Frobenius automorphism.

Proof. By the previous lemma, since $a$ is a root of $q$, then so are
\[ a_1 = \varphi(a) = a^p, \quad a_2 = \varphi(a_1) = \varphi^2(a) = a^{p^2}, \quad a_3 = \varphi(a_2) = a^{p^3}, \]
and so on. Indeed, each $a_k = \varphi^k(a) = a^{p^k}$ must be a root of $q$ for all $k \ge 0$.
For $0 \le k \le d - 1$, these are all different roots because $a$ is a primitive root.
This accounts for all $d$ roots of $q$. Of course, by Fermat's little theorem for
finite fields, $\varphi^d(a) = a^{p^d} = a$. So the sequence starts repeating itself at that
point. ∎

7.7.5. Lemma. Let $\mathbb{F}_{p^d}$ be a finite field, and let $a$ be a generator of $\mathbb{F}_{p^d}$. If
$\psi_1$ and $\psi_2$ are automorphisms of $\mathbb{F}$ such that $\psi_1(a) = \psi_2(a)$, then $\psi_1 = \psi_2$.

Proof. Since the $\psi_i$ are isomorphisms which fix $\mathbb{Z}_p$ by Lemma 7.7.3, it follows that
\[ \psi_1(r(a)) = r(\psi_1(a)) = r(\psi_2(a)) = \psi_2(r(a)) \]
for every polynomial $r \in \mathbb{Z}_p[X]$. Since $a$ is a generator, this accounts for
every element of $\mathbb{F}$. So $\psi_1 = \psi_2$. ∎
This brings us to the main theorem of this section.

7.7.6. Theorem. Let $\mathbb{F}_{p^d}$ be a finite field, and let $\varphi$ be the Frobenius
automorphism. Then $d$ is the smallest positive integer $k$ such that $\varphi^k = \mathrm{id}$.
Moreover, the set of all automorphisms of $\mathbb{F}_{p^d}$ is given by
\[ \{\mathrm{id}, \varphi, \varphi^2, \dots, \varphi^{d-1}\}. \]

Proof. Notice that $\varphi^k(a) = a^{p^k}$. Hence the fixed point set
\[ \{a \in \mathbb{F}_{p^d} : \varphi^k(a) = a\} \]
consists of the roots of the polynomial $X^{p^k} - X$. For $k < d$, this is a proper
subset of $\mathbb{F}_{p^d}$ because the polynomial has at most $p^k$ roots. So $\varphi^k \ne \mathrm{id}$. But
every element of $\mathbb{F}_{p^d}$ is a root of $X^{p^d} - X$ by Fermat's little theorem for
finite fields. Thus $\varphi^d = \mathrm{id}$.
Let $\psi$ be any automorphism of $\mathbb{F}_{p^d}$. Fix a primitive root $a$ in $\mathbb{F}_{p^d}$, and
let $q$ be its minimal polynomial in $\mathbb{Z}_p[X]$. By Lemma 7.7.3, $\psi(a)$ is another
root of $q$. And by Corollary 7.7.4, there is an integer $k$ so that $\psi(a) = \varphi^k(a)$.
By Lemma 7.7.5, $\psi = \varphi^k$. Therefore every automorphism of $\mathbb{F}_{p^d}$ is a power
of the Frobenius automorphism. ∎

Exercises

1. Let $\mathbb{F} = \mathbb{Z}_5[x]/(x^4 + x^2 + x + 1)$. Show that $q(X) = X^4 + X^2 + X + 1$
   factors as
   \[ q(X) = (X - x)(X - x^5)(X - x^{25})(X - x^{125}). \]
2. With $\mathbb{F}$ as above, use the fact that $x^2 + 1$ is a root of the irreducible
   polynomial $X^4 - 2X^3 - 2X^2 + 2X + 2$ to find the other roots.
3. Show that every $a \in \mathbb{F}_{p^n}$ has a unique $p$th root.
4. Let $p$ be prime and $n \in \mathbb{N}$. Show that $n$ divides the Euler number
   $\varphi(p^n - 1)$.
   Hint: this is the number of primitive roots. Show that they split into
   disjoint subsets $S_a = \{\varphi^k(a) : 0 \le k < n\}$ of size $n$ for primitive roots $a$.
5. (a) For any divisor $d$ of $n$, show that the roots of $X^{p^d} - X$ in $\mathbb{F}_{p^n}$ form
       a subfield isomorphic to $\mathbb{F}_{p^d}$.
       Hint: Use the fact that $\varphi^d$ is an automorphism to show that this
       set of roots forms a field.
   (b) Deduce that this is the unique subfield of cardinality $p^d$.
   (c) Show that every automorphism of $\mathbb{F}_{p^n}$ maps this subfield onto itself.
6. If $a \in \mathbb{F}_{p^n}^*$, its conjugates are $\{\varphi^k(a) : 0 \le k < n\} = \{a = a_1, a_2, \dots, a_d\}$.
   Let $q$ be the minimal polynomial for $a$.
   (a) Show that the conjugates of $a$ are roots of $q$.
   (b) Show that the polynomial $p(x) = \prod_{i=1}^{d} (x - a_i)$ has coefficients which
       are fixed by $\varphi$.
   (c) Deduce that $p = q$. So the roots of $q$ are exactly the conjugates of
       $a$.
   (d) Show that $d \mid n$.
       Hint: the smallest $e > 0$ such that $\varphi^e(a) = a$ divides $n$.
7. Define the trace on $\mathbb{F}_{p^n}$ by $\operatorname{Tr}(a) = \sum_{k=0}^{n-1} \varphi^k(a)$.
   (a) Show that $\operatorname{Tr}(a) \in \mathbb{F}_p$.
   (b) Show that $\operatorname{Tr}(a + b) = \operatorname{Tr}(a) + \operatorname{Tr}(b)$ for $a, b \in \mathbb{F}_{p^n}$.
   (c) Show that $\operatorname{Tr}(\beta a) = \beta \operatorname{Tr}(a)$ for $\beta \in \mathbb{F}_p$ and $a \in \mathbb{F}_{p^n}$.
   (d) Show that $\operatorname{Tr}(\beta) = n\beta$ for $\beta \in \mathbb{F}_p$.
   (e) Show that $\operatorname{Tr}(a^p) = \operatorname{Tr}(a)$ for $a \in \mathbb{F}_{p^n}$.

7.8. Irreducible polynomials of all degrees


We have made the implicit assumption in the preceding discussion that
irreducible polynomials exist in abundance. In this section, we will show that
there are irreducible polynomials in Zp [X] of every degree for every prime
p. First let us take note of something we already know.

7.8.1. Lemma. Let $q \in \mathbb{Z}_p[X]$ be an irreducible polynomial of degree $d$.
Then $q$ is a factor of $X^{p^d} - X$.

Proof. Form the field $\mathbb{Z}_p[X]/(q)$. This has $p^d$ elements, and the element
$[x]$ is a root of $q$. Since $q$ is irreducible, it is the minimal polynomial of $[x]$.
By Fermat's little theorem, $[x]$ is a root of $X^{p^d} - X$. Therefore, $q$ divides
$X^{p^d} - X$. ∎

A converse of sorts requires some more sophisticated argument. First


we need an elementary, yet rather clever, calculation.

7.8.2. Lemma. $\gcd(X^m - 1, X^n - 1) = X^d - 1$ where $d = \gcd(m, n)$.

Proof. If $m = dk$, then
\[ X^m - 1 = (X^d - 1)(1 + X^d + X^{2d} + \cdots + X^{(k-1)d}). \]
Thus $X^d - 1$ divides both $X^m - 1$ and $X^n - 1$. By the Euclidean algorithm,
there are positive integers $s$ and $t$ so that $d = ms - nt$. So if we define
\[ S(X) = 1 + X^m + X^{2m} + \cdots + X^{(s-1)m}, \qquad T(X) = (1 + X^n + X^{2n} + \cdots + X^{(t-1)n}) X^d, \]
then
\[ (X^m - 1)S(X) - (X^n - 1)T(X) = (X^{ms} - 1) - (X^{nt} - 1)X^d = X^d - 1. \]
So any common divisor of $X^m - 1$ and $X^n - 1$ divides $X^d - 1$. Thus the gcd
of $X^m - 1$ and $X^n - 1$ equals $X^d - 1$. ∎

7.8.3. Corollary. $\gcd(p^m - 1, p^n - 1) = p^d - 1$ where $d = \gcd(m, n)$.

Proof. Substituting $p$ for $X$ shows that $p^d - 1$ divides both $p^m - 1$ and
$p^n - 1$. The proof of the previous lemma shows that $\gcd(p^m - 1, p^n - 1)$
divides
\[ (p^m - 1)S(p) - (p^n - 1)T(p) = p^d - 1. \]
Hence $\gcd(p^m - 1, p^n - 1) = p^d - 1$. ∎
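
Corollary 7.8.3 is easy to spot-check with integer arithmetic. The following is a brief Python sketch for illustration only; the function name `gcd_identity_holds` is an assumption made here.

```python
from math import gcd

def gcd_identity_holds(p, m, n):
    """Check gcd(p^m - 1, p^n - 1) == p^gcd(m, n) - 1 for given integers."""
    return gcd(p**m - 1, p**n - 1) == p**gcd(m, n) - 1

# The instance used later in Example 7.8.12: gcd(7^3 - 1, 7^5 - 1) = 7 - 1.
print(gcd(7**3 - 1, 7**5 - 1))

# A small sweep over primes and exponents.
print(all(gcd_identity_holds(p, m, n)
          for p in (2, 3, 5, 7)
          for m in range(1, 13)
          for n in range(1, 13)))
```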

7.8.4. Lemma. Let $q \in \mathbb{Z}_p[X]$ be an irreducible polynomial of degree $d$.
Then $q$ is a factor of $X^{p^n} - X$ if and only if $d \mid n$.

Proof. The case $q = X$ is trivial, so suppose that $q \ne X$.
Suppose that $d \mid n$. Then by Lemma 7.8.1, we have $q \mid X^{p^d-1} - 1$. By
Corollary 7.8.3, $p^d - 1$ divides $p^n - 1$. Hence by Lemma 7.8.2, $X^{p^d-1} - 1$
divides $X^{p^n-1} - 1$. So $q$ divides $X^{p^d} - X$ which divides $X^{p^n} - X$.
Conversely, suppose that $q$ divides $X^{p^n} - X$. Since $X^{p^n} - X$ factors
into linear terms in $\mathbb{F}_{p^n}$, so does $q$. Let $a \in \mathbb{F}_{p^n}$ be a root of $q$. Since $q$ is
irreducible, this is the minimal polynomial of $a$. Hence $\mathbb{Z}_p[a]$ is isomorphic
to $\mathbb{Z}_p[x]/(q)$, which has cardinality $p^d$. By Corollary 7.3.3,
\[ \prod_{b \in \mathbb{Z}_p[a]} (X - b) = X^{p^d} - X. \]
This divides $X^{p^n} - X$ in $\mathbb{F}_{p^n}[X]$. Because both have coefficients in $\mathbb{Z}_p$, the
quotient also lies in $\mathbb{Z}_p[X]$. So $X^{p^d-1} - 1$ divides $X^{p^n-1} - 1$. By Lemma 7.8.2,
$p^d - 1$ divides $p^n - 1$. And by Corollary 7.8.3, $d$ divides $n$. ∎

Our next goal is to show that if $q \in \mathbb{Z}_p[X]$ is an irreducible polynomial
of degree $d$ and $d \mid n$, then $q^2$ does not divide $X^{p^n} - X$. To prove this, we
need a method to identify repeated roots. The key tool we use is the formal
derivative.
7.8.5. Definition. Let $\mathbb{F}$ be any field and let $q(x) = \sum_{i=0}^{d} q_i x^i$ be an element
of $\mathbb{F}[x]$. Then its formal derivative is given by
\[ q'(x) = \sum_{i=1}^{d} i q_i x^{i-1}. \]

7.8.6. Lemma. For a polynomial $q \in \mathbb{F}[X]$, all irreducible factors of $q$ are
simple if and only if $\gcd(q, q') = 1$. Moreover, if there are repeated roots,
this gcd provides a proper factor except when $\mathbb{F}$ has characteristic $p$ and $q$ is
a perfect $p$-th power. In either case, this yields a factorization of $q$.

Proof. In Exercise 3, the reader will verify the product rule
\[ (qr)' = q'r + qr'. \]
If $q$ has a repeated factor $u$, then we can write $q = u^2 v$ for some $v \in \mathbb{F}[X]$.
Calculate
\[ q' = (u^2 v)' = 2uu'v + u^2 v' = u(2u'v + uv'). \]
Hence $u$ divides $\gcd(q, q')$.
This gcd provides a proper factor of $q$ except in the special case in which
$\gcd(q, q') = q$. But since $\deg(q') < \deg(q)$, this can only occur when $q' = 0$.
This can never happen over the rationals, or any field of characteristic 0.
However, in a field of characteristic $p$, this can happen if $i q_i \equiv 0 \pmod{p}$ for
every index $i$. Clearly this means that $q_i$ is non-zero only when $i \equiv 0 \pmod{p}$.
In this case
\[ q(X) = \sum_{j=0}^{m} a_j X^{jp}. \]
Let $u = \sum_{j=0}^{m} a_j X^j$. By Lemma 7.7.1 above, the $p$-th power of a sum
is the sum of the $p$-th powers in any field of characteristic $p$. In particular,
$q = u^p$. This yields a factorization of $q$.
Conversely, suppose that $u$ is an irreducible factor of $q$ which is simple,
so that $q = uv$ where $v \in \mathbb{F}[X]$ satisfies $\gcd(u, v) = 1$. Then
\[ q' = (uv)' = u'v + uv' \equiv u'v \pmod{u}. \]
Now $u' \ne 0$ since $u$ is irreducible (and thus is not a $p$-th power), and $u'$ is
of lower degree than $u$. So both $u'$ and $v$ are relatively prime to $u$. By the
unique factorization theorem, the product $u'v$ is also relatively prime to $u$.
Hence $u$ is not a factor of $q'$.
Consequently, if $q$ has only simple factors, it can have no factor in common
with $q'$. Therefore $\gcd(q, q') = 1$. ∎
We can now describe the factorization of $X^{p^n} - X$ into irreducibles in
$\mathbb{Z}_p[X]$.

7.8.7. Corollary. $X^{p^n} - X$ factors in $\mathbb{Z}_p[X]$ as the product of all monic
irreducible polynomials $q$ of degree $d$ as $d$ runs over all divisors of $n$.

Proof. Let $f(X) = X^{p^n} - X$. The formal derivative is
\[ f'(X) = p^n X^{p^n - 1} - 1 = -1 \]
and so $\gcd(f, f') = 1$. Since $f(X)$ is not a perfect $p$-th power, by Lemma
7.8.6, all of the irreducible factors of $f(X)$ are simple. The result now follows
from Lemma 7.8.4. ∎
We are finally ready to prove the main result of this section.

7.8.8. Theorem. There are irreducible polynomials in Zp [X] of degree n


for every n.

Proof. Let $r_d(X)$ denote the product of all monic irreducible polynomials
of $\mathbb{Z}_p[X]$ of degree $d$. From Corollary 7.8.7, we obtain that
\[ X^{p^n} - X = \prod_{d \mid n} r_d(X). \]
Therefore
\[ p^n = \deg(X^{p^n} - X) = \sum_{d \mid n} \deg(r_d(X)). \]
We will show that $r_n$ is non-constant by showing that the sum of the degrees
of the other factors of $X^{p^n} - X$ is strictly less than $p^n$. Note that since $r_d$
divides $X^{p^d} - X$, it has $\deg(r_d) \le p^d$. Thus a crude estimate shows
\[ \sum_{d \mid n,\ d \ne n} \deg r_d \le \sum_{d \mid n,\ d \ne n} p^d \le \sum_{i=1}^{n-1} p^i = \frac{p^n - p}{p - 1} < p^n. \]
So $r_n$ must have non-zero degree. ∎

7.8.9. Remark. We are able to prove Theorem 7.8.8 by crudely bounding


the number of irreducible polynomials of a given degree. In Exercise 7, we
prove a formula giving the exact number of irreducible polynomials in Zp [x]
of degree n. It is actually rather large.

Here are two easy consequences of this theorem.

7.8.10. Corollary. There is a finite field $\mathbb{F}_{p^n}$ of cardinality $p^n$ for every
prime $p$ and integer $n \ge 1$.

7.8.11. Corollary. There are irreducible polynomials of every degree in


Z[X] using only 0’s and 1’s as coefficients.

Proof. Take an irreducible polynomial of degree $n$ in $\mathbb{Z}_2[X]$. Then the
corresponding polynomial in $\mathbb{Z}[X]$ is irreducible by Corollary 6.5.2. ∎

7.8.12. Example. Consider the polynomial $X^{31} - 1$ in $\mathbb{Z}_7[X]$. First look
for the smallest integer $d$ so that $X^{31} - 1$ divides $X^{7^d - 1} - 1$. By Lemma
7.8.2, this occurs precisely when 31 divides $7^d - 1$; that is, when $7^d \equiv 1 \pmod{31}$.
So we are interested in $\operatorname{ord}_{31}(7)$. By Fermat's little theorem, this
is a divisor of 30. A calculation shows that
\[ 7^3 \equiv 2 \pmod{31} \quad\text{and}\quad 7^5 \equiv 5 \pmod{31}. \]
Therefore, $7^6 \equiv 4 \pmod{31}$, $7^{10} \equiv -6 \pmod{31}$ and $7^{15} \equiv 1 \pmod{31}$.
Hence, $\operatorname{ord}_{31}(7) = 15$.
So $X^{31} - 1$ divides $X^{7^{15} - 1} - 1$. Since 31 is prime, Lemma 7.8.2 yields
that
\[ \gcd(X^{31} - 1, X^{7^3 - 1} - 1) = X - 1 = \gcd(X^{31} - 1, X^{7^5 - 1} - 1). \]
Consequently, it follows from Lemma 7.8.4 that $X^{31} - 1$ has one linear factor,
$X - 1$, and no irreducible factors of degree 3 or 5. Therefore it must factor
as the product of $X - 1$ and two irreducible polynomials $p_1$, $p_2$ of degree 15.
Symbolic computation software such as MAPLE or MATHEMATICA can find
these factors easily. In particular, $p_1$ equals
\[ X^{15} - 2X^{14} + X^{13} - 3X^{12} - X^{11} - 3X^{10} + 3X^9 - 2X^7 - X^6 + 3X^5 - 3X^4 + X^3 + X^2 - 3X - 1. \]
Consider the field $\mathbb{F} = \mathbb{Z}_7[x]/(p_1)$ of order $7^{15}$. The element $[x]$ is a
root of $p_1$, and thus is a root of $X^{31} - 1$. Hence $\operatorname{ord}([x])$ divides 31. Since
$[x] \ne [1]$ and 31 is prime, we find that $\operatorname{ord}([x]) = 31$. In particular, $[x]$ is not
a primitive root. However, it is clearly a generator of $\mathbb{F}$.
Let us try to count the irreducible factors of $X^{7^{15}} - X$. From the theory
we have developed, it factors as
\[ X^{7^{15}} - X = r_1(X) r_3(X) r_5(X) r_{15}(X) \]
where $r_d(X)$ is the product of all monic irreducible factors of degree $d$. We
also know that
\[ r_1(X) = X^7 - X = X(X - 1)(X - 2)(X - 3)(X + 3)(X + 2)(X + 1) \]
\[ r_3(X) = (X^{7^3 - 1} - 1)/(X^6 - 1) \]
\[ r_5(X) = (X^{7^5 - 1} - 1)/(X^6 - 1) \]
\[ r_{15}(X) = \bigl((X^{7^{15} - 1} - 1)(X^6 - 1)\bigr) / \bigl((X^{7^3 - 1} - 1)(X^{7^5 - 1} - 1)\bigr). \]
So we see that there are 7 irreducible polynomials of degree 1. The degree
of $r_3$ is $7^3 - 7 = 336$. So there are 112 irreducible polynomials of degree 3
over $\mathbb{Z}_7$. Similarly, the degree of $r_5$ is $7^5 - 7$. So there are $(7^5 - 7)/5 = 3360$
irreducible polynomials of degree 5 over $\mathbb{Z}_7$. Finally, we calculate the degree
of $r_{15}$ to be $7^{15} - 7^5 - 7^3 + 7$. Dividing by 15 yields the number 316504099520
of irreducible polynomials of degree 15. There are $\varphi(7^{15} - 1) = 1450340640000$
primitive roots of $\mathbb{F}$. These come in groups of 15 corresponding to the roots
of 96689376000 of these irreducible polynomials of degree 15.
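
The order computation and the counts in Example 7.8.12 can be reproduced with a few lines of integer arithmetic. The sketch below is illustrative Python that uses the counting formula stated in Exercise 7 of this section; the function names `mobius`, `count_irreducible` and `multiplicative_order` are assumptions introduced here.

```python
def mobius(n):
    """Moebius function: 0 if n has a square factor, else (-1)^(number of primes)."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d^2 divided the original n
                return 0
            result = -result
        d += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def count_irreducible(p, n):
    """Number of monic irreducible polynomials of degree n over Z_p,
    computed as (1/n) * sum over d | n of mu(n/d) * p^d."""
    total = sum(mobius(n // d) * p**d for d in range(1, n + 1) if n % d == 0)
    return total // n

def multiplicative_order(a, m):
    """Least k >= 1 with a^k = 1 (mod m); assumes gcd(a, m) = 1."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

print(multiplicative_order(7, 31))                        # order of 7 modulo 31
print([count_irreducible(7, d) for d in (1, 3, 5, 15)])   # counts for degrees 1, 3, 5, 15
```

The same function gives the count asked for in Exercise 2 below by calling `count_irreducible(2, 6)`.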

Exercises

1. Find an irreducible polynomial of degree 6 in $\mathbb{Z}_2[x]$.
2. How many irreducible monic polynomials of degree 6 are there in $\mathbb{Z}_2[x]$?
   How many of these have roots which are primitive roots in $\mathbb{F}_{64}$?
3. Verify the product rule for the formal derivative of polynomials in any
   field.
4. Show that the only subfields of $\mathbb{F}_{p^n}$ are the fields $\mathbb{F}_{p^d}$ for $d \mid n$.
   Hint: combine Corollary 7.8.7 and Section 7.7 Exercise 5.
5. Show that the fixed point set of $\varphi^k$ on $\mathbb{F}_{p^d}$ is the subfield $\mathbb{F}_{p^e}$ where
   $e = \gcd(k, d)$.
6. In this exercise, we prove the Möbius inversion formula. Let $\mu : \mathbb{Z}^+ \to \mathbb{Z}$
   be defined as follows. Let $\mu(1) = 1$, $\mu(n) = 0$ if $n$ is not square-free,
   and otherwise $\mu(n) = (-1)^k$, where $n$ is a product of $k$ distinct primes.
   (a) Prove $\sum_{d \mid n} \mu(d)$ is 1 if $n = 1$, and 0 otherwise.
   (b) For any functions $F, G : \mathbb{Z}^+ \to \mathbb{Z}$, let
       \[ (F * G)(n) = \sum_{d \mid n} F(d)\, G(n/d). \]
       Prove $*$ is an associative commutative binary operation on functions.
   (c) Find the function $H$ which is the identity for $*$.
   (d) Suppose $f : \mathbb{Z}^+ \to \mathbb{Z}$ and $g : \mathbb{Z}^+ \to \mathbb{Z}$ are functions, and that
       \[ g(n) = \sum_{d \mid n} f(d). \]
       Prove that
       \[ f(n) = \sum_{d \mid n} \mu(d)\, g(n/d). \]
7. Let $p$ be a prime. Prove that there are
   \[ \frac{1}{n} \sum_{d \mid n} \mu(n/d)\, p^d \]
   monic irreducible degree $n$ polynomials in $\mathbb{F}_p[x]$.

7.9. Factoring Algorithms for Polynomials


In this section, we take a brief look at one method for factoring polynomials.
It turns out that it is much easier to factor a polynomial of degree d in Zp [x]
than to factor a number with d digits in base p. This seems, on the surface,
to be a surprising fact because the number and the polynomial have the
same complexity. However, it turns out that the structure of finite fields is
the key.
The first step in factoring polynomials is to reduce the problem to the
case in which the polynomial $q$ has no repeated factors, which may be done
using Lemma 7.8.6. Compute $\gcd(q, q')$ and use this to factor $q$. Repeat as
necessary until it is factored into terms with no repeated factors.
We are now ready to study the main factoring algorithm of this sec-
tion. It is the preferred method used in the symbolic computation program
MAPLE. Also, it is perhaps the simplest and most effective way to factor
polynomials in Z[x]. The main idea is to factor polynomials modulo p based
on the Euclidean algorithm and Lemma 7.8.4. Then Hensel’s Lemma, which
will be discussed in the next section, is used to increase the information
about the possible integer factorizations.
Lemma 7.8.4 shows that if $q \in \mathbb{Z}_p[x]$ is an irreducible polynomial of
degree $d$, then $q$ divides $x^{p^d} - x$ but does not divide $x^{p^k} - x$ for $k < d$.
We first compute $\gcd(q, x^p - x) = r_1$. Since $x^p - x = \prod_{a \in \mathbb{Z}_p} (x - a)$, this
will produce a factor $r_1$ which we will later factor into a product of linear
terms. Replace $q$ with $q_1 = q/r_1$. Next compute $\gcd(q_1, x^{p^2} - x) = r_2$.
Since $r_2$ has no linear factors, all of its irreducible factors will be quadratic.
Set $q_2 = q_1/r_2$ and define $\gcd(q_2, x^{p^3} - x) = r_3$. Then all of the irreducible
factors of $r_3$ have degree 3. Proceed until the degree of $q$ is reached (although
this will end sooner if factors are found). For this reason, this method is
known as the distinct degree algorithm.
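
The procedure just described can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the code used by any particular computer algebra system: polynomials are coefficient lists with the constant term first, the input is assumed monic with no repeated factors, and all function names (`trim`, `sub`, `divmod_poly`, `mul_mod`, `pow_mod`, `gcd_poly`, `distinct_degree`) are hypothetical helpers. It requires Python 3.8 or later for `pow(x, -1, p)`.

```python
def trim(f, p):
    """Reduce coefficients mod p and drop leading zeros ([] is the zero polynomial)."""
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def sub(f, g, p):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim([a - b for a, b in zip(f, g)], p)

def divmod_poly(f, g, p):
    """Quotient and remainder of f by a non-zero g over Z_p."""
    f, g = trim(f, p), trim(g, p)
    inv = pow(g[-1], -1, p)                 # inverse of the leading coefficient
    quot = [0] * max(len(f) - len(g) + 1, 0)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        coef = f[-1] * inv % p
        quot[shift] = coef
        # subtract coef * x^shift * g, which cancels the leading term of f
        f = trim([c - coef * g[i - shift] if i >= shift else c
                  for i, c in enumerate(f)], p)
    return quot, f

def mul_mod(f, g, m, p):
    if not f or not g:
        return []
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % p
    return divmod_poly(prod, m, p)[1]

def pow_mod(f, e, m, p):
    """f^e modulo m over Z_p by square-and-multiply (deg m >= 1)."""
    result, f = [1], divmod_poly(f, m, p)[1]
    while e:
        if e & 1:
            result = mul_mod(result, f, m, p)
        f = mul_mod(f, f, m, p)
        e >>= 1
    return result

def gcd_poly(f, g, p):
    """Monic gcd over Z_p by the Euclidean algorithm."""
    f, g = trim(f, p), trim(g, p)
    while g:
        f, g = g, divmod_poly(f, g, p)[1]
    inv = pow(f[-1], -1, p) if f else 0
    return [c * inv % p for c in f]

def distinct_degree(q, p):
    """Return [(d, r_d)], r_d being the product of the degree-d irreducible factors."""
    q, factors, d = trim(q, p), [], 0
    while len(q) > 1:
        d += 1
        if 2 * d > len(q) - 1:              # what remains must be irreducible
            factors.append((len(q) - 1, q))
            break
        xp = pow_mod([0, 1], p ** d, q, p)  # x^(p^d) mod q
        r = gcd_poly(q, sub(xp, [0, 1], p), p)
        if len(r) > 1:
            factors.append((d, r))
            q = divmod_poly(q, r, p)[0]
    return factors

# (x + 1)(x + 2)(x^2 + 2) = x^4 + 3x^3 + 4x^2 + x + 4 over Z_5
print(distinct_degree([4, 1, 4, 3, 1], 5))
```

Computing $x^{p^d}$ modulo the current $q$, rather than forming $x^{p^d} - x$ explicitly, keeps every intermediate polynomial below the degree of $q$; this is a design choice of the sketch, and the gcd is unchanged by the reduction.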
Now, these factors can be distinguished by using quadratic residues in
finite fields. When $p \ne 2$, half of the non-zero elements of a finite field are
perfect squares. So a polynomial $t$ of degree at most $d - 1$ will be a square
modulo $r_i$ about half the time. When $t$ is a square in $\mathbb{Z}_p[x]/(r_i)$, then by
Proposition 7.6.2,
\[ t^{(p^d - 1)/2} \equiv 1 \pmod{r_i}. \]
And when $t$ is not a square,
\[ t^{(p^d - 1)/2} \equiv -1 \pmod{r_i}. \]
So it suffices to compute
\[ \gcd(r, t^{(p^d - 1)/2} - 1) \]
for several random choices of $t$ to obtain various proper factors of $r$.
We won't work out exactly what happens when $p = 2$. Let
$f(x) = \sum_{i=1}^{d-1} x^{2^i}$. Compute $\gcd(r, f \circ t)$ for random choices of $t \in \mathbb{Z}_2[x]$ of degree
less than $d$.

7.9.1. Example. We demonstrate this algorithm via an explicit example.


Consider the polynomial $q(x)$ in $\mathbb{Z}_5[x]$ given by
\begin{align*}
q(x) ={}& x^{19} + 3x^{18} + x^{17} + x^{16} - x^{15} + x^{14} + x^{12} + x^{11} - 2x^{10} \\
        &+ x^9 - 2x^8 - 2x^6 + 2x^5 + x^4 - x^3 + 2x^2 + 2x - 2.
\end{align*}
First a computation shows that $\gcd(q, q') = 1$. Then we compute
\[ \gcd(q, x^5 - x) = x^2 + 3x + 2 = (x + 1)(x + 2). \]
Factoring this out leaves $q_1 = q/(x^2 + 3x + 2)$. Continue
\[ \gcd(q_1, x^{25} - x) = 1 \]
showing that there are no quadratic factors. Then
\[ \gcd(q_1, x^{125} - x) = x^9 + 2x^7 - x^6 - 2x^4 - x^3 + x^2 + 1. \]
This must be the product of three irreducible polynomials of degree 3. Since
$\frac{5^3 - 1}{2} = 62$, we compute
\[ \gcd(x^9 + 2x^7 - x^6 - 2x^4 - x^3 + x^2 + 1,\; x^{62} - 1) = x^3 + 2x - 1. \]
So
\[ x^9 + 2x^7 - x^6 - 2x^4 - x^3 + x^2 + 1 = (x^3 + 2x - 1)(x^6 - 2x - 1). \]
And
\[ \gcd(x^6 - 2x - 1,\; (x + 1)^{62} - 1) = x^3 - x^2 - 2. \]
Hence
\[ x^9 + 2x^7 - x^6 - 2x^4 - x^3 + x^2 + 1 = (x^3 + 2x - 1)(x^3 - x^2 - 2)(x^3 + x^2 + x - 2). \]
The remaining term is
\begin{align*}
q_3 &= q_1/(x^9 + 2x^7 - x^6 - 2x^4 - x^3 + x^2 + 1) \\
    &= x^8 + 2x^6 - 1.
\end{align*}
This is either irreducible, or the product of two irreducible factors of degree
4. We try
\[ \gcd(q_3, x^{625} - x) = q_3. \]
So $q_3$ is a product of quartics. Since $\frac{5^4 - 1}{2} = 312$, we compute
\begin{align*}
\gcd(q_3, x^{312} - 1) &= 1, \\
\gcd(q_3, (x + 1)^{312} - 1) &= x^4 + x^2 - 2x + 2.
\end{align*}
Thus
\[ x^8 + 2x^6 - 1 = (x^4 + x^2 - 2x + 2)(x^4 + x^2 + 2x + 2). \]
This provides a complete factorization
\begin{align*}
q(x) ={}& (x + 1)(x + 2)(x^3 + 2x - 1)(x^3 - x^2 - 2)(x^3 + x^2 + x - 2) \\
        &\times (x^4 + x^2 - 2x + 2)(x^4 + x^2 + 2x + 2).
\end{align*}
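As a quick check of the last splitting step (our own verification, not part of the algorithm), note that the two quartic factors differ only in the sign of the linear term, so their product is a difference of squares:

```latex
\begin{align*}
(x^4 + x^2 - 2x + 2)(x^4 + x^2 + 2x + 2)
  &= (x^4 + x^2 + 2)^2 - (2x)^2 \\
  &= x^8 + 2x^6 + 5x^4 + 4 \\
  &\equiv x^8 + 2x^6 - 1 \pmod 5 .
\end{align*}
```

The degrees also add up as they should: $1 + 1 + 3 \cdot 3 + 2 \cdot 4 = 19 = \deg q$.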
176 7. FINITE FIELDS

Exercises

1. Use the distinct degree algorithm to factor $q \in \mathbb{Z}_7[x]$ given by
\[ x^{12} + 3x^{11} + 3x^{10} + x^9 + 2x^8 + + 6x^6 + 6x^5 + x^4 + 3x^3 + x^2 + 4x + 3. \]
2. Use the distinct degree algorithm to factor the polynomial $q \in \mathbb{Z}_5[x]$
given by
\[ q(x) = x^8 + x^7 + 3x^6 + 2x^5 + 4x^4 + 4x^3 + 3x + 4. \]
3. Factor in $\mathbb{Z}_3[x]$:
\[ q(x) = x^{16} + x^{14} + x^{12} - x^8 - x^6 + x^2 + 1. \]
4. Let $f(x) = \sum_{i=1}^{d-1} x^{2^i} \in \mathbb{Z}_2[x]$. Show that $f(f(x) + 1) = x^{2^d} - 1$.
5. Factor in $\mathbb{Z}_2[x]$ the polynomial
\[ q(x) = x^{17} + x^{14} + x^{13} + x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^7 + x^5 + x^4 + x + 1. \]
Remember to check for repeated factors.

7.10. Factoring Rational Polynomials


Now let us reconsider the problem of factoring polynomials with integer
coefficients. This can now be done in a routine algorithmic way. The first step is
to pick a prime $p$ relatively prime to the leading coefficient of the polynomial
$q$. For a computer, a good choice is reasonably large but still manageable
for exact integer arithmetic. (MAPLE picks one near $10^4$.) Then use the
distinct degree algorithm to factor $q(x) \pmod p$. Finally, use an algorithm
we explain in this section known as Hensel’s Lemma to recursively improve
this factorization mod $p$ to a factorization mod $p^k$ until $p^k$ is large enough
to bound the coefficients of the factors. This either yields a factorization or
shows that one does not exist.
This kind of search can be carried out efficiently on a computer. Moreover,
it is not very difficult to get crude bounds on the size of the coefficients
of possible factors. For example, if $q = \sum_{i=0}^{d} q_i x^i$ is a factor of
$p = \sum_{i=0}^{n} p_i x^i$, then
\[ \sum_{i=0}^{d} |q_i| \le 2^d \Bigl( \sum_{i=0}^{n} |p_i|^2 \Bigr)^{1/2}. \]
We will not prove such an estimate here. However, it means that normally
only a few applications of Hensel’s Lemma will do the job. Once $k$ is
sufficiently large, we either find an integer factorization or realize that none
exists.
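To get a feeling for the size of this bound, here is a small Python sketch that evaluates the right-hand side of the estimate. The function names `factor_coefficient_bound` and `lifting_exponent` are our own, and the stopping rule $p^k > 2 \cdot \text{bound}$ is an assumption about what "large enough" should mean when coefficients are represented symmetrically about 0; it is not taken from the text.

```python
import math

def factor_coefficient_bound(coeffs, d):
    """2^d * sqrt(sum of squared coefficients): bounds the sum of the
    absolute values of the coefficients of any degree-d factor."""
    return 2 ** d * math.sqrt(sum(c * c for c in coeffs))

def lifting_exponent(coeffs, d, p):
    """Smallest k with p^k > 2 * bound (assumed stopping rule)."""
    bound = factor_coefficient_bound(coeffs, d)
    k = 1
    while p ** k <= 2 * bound:
        k += 1
    return k

# q(x) = 6x^7 + 53x^6 - 174x^5 + 300x^4 - 33x^3 - 293x^2 + 453x - 81,
# listed from the constant term up, and a factor of degree 4.
q = [-81, 453, -293, -33, 300, -174, 53, 6]
print(factor_coefficient_bound(q, 4))
print(lifting_exponent(q, 4, 5))
```

The estimate is crude: for this $q$ the bound for a quartic factor is a little over $10^4$, whereas the coefficients of the quartic factor found in Example 7.10.2 have absolute values summing to 49. In practice one can test for an integer factorization after each lifting step rather than waiting for the bound to be exceeded.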
We make two simplifying assumptions that are easily achieved. Choose
the prime $p$ so that it is relatively prime to the leading coefficient of our given
polynomial $q \in \mathbb{Z}[x]$. Also, assume that $q$ factors in $\mathbb{Z}_p[x]$ into a product $uv$
where $u$ and $v$ are relatively prime. Of course, if $q$ is irreducible in $\mathbb{Z}_p[x]$,
then $q$ is irreducible in $\mathbb{Z}[x]$ by Corollary 6.5.2 and thus in $\mathbb{Q}[x]$ by Gauss’s
Lemma 6.3.3.

7.10.1 Hensel’s Lemma. Suppose that $q(x) = \sum_{i=0}^{d} q_i x^i$ factors as
$q \equiv uv \pmod p$. Furthermore, assume that $\gcd(q_d, p) = 1$ and that $u$ and $v$ are
relatively prime in $\mathbb{Z}_p[x]$. Then there is an algorithm to calculate polynomials
$u_k$ and $v_k$ in $\mathbb{Z}[x]$ so that
\[ q \equiv u_k v_k \pmod{p^k} \]
with $\deg(u_k) = \deg(u)$ and $\deg(v_k) = \deg(v)$.

Proof. When the leading coefficient of $q$ isn’t 1, there is a slight problem
because the leading coefficients of the factors $u$ and $v$ aren’t determined.
However, they must be divisors of $q_d$. So a simple trick deals with this
problem. Replace $q(x)$ by $q_d\, q(x)$ and multiply $u$ and $v$ by the appropriate
factor so that their leading coefficient is also $q_d$. Since the identity $q \equiv uv$ is
only mod $p$, this adjustment can be made mod $p$, and then fixed up in the
integers by adding some multiple of $p x^m$.
We will also assume that $m = \deg(u) \le \deg(v) = n$. By hypothesis,
$\gcd(u, v) = 1$ in $\mathbb{Z}_p[x]$. Thus by the Euclidean algorithm there are
polynomials $s$ and $t$ in $\mathbb{Z}[x]$ so that
\[ su + tv \equiv 1 \pmod p. \]
Let $u_1 = u$ and $v_1 = v$ and define $r_1 = (q - u_1 v_1)/p$ which has integer
coefficients by hypothesis. In fact, we only need $r_1 \pmod p$. Now find
integer polynomials $s_1$ and $t_1$ so that
\[ s_1 u_1 + t_1 v_1 \equiv r_1 \pmod p \]
such that $\deg(s_1) < n$ and $\deg(t_1) < m$. To obtain this, notice that
\[ (r_1 s) u_1 + (r_1 t) v_1 \equiv r_1 (su + tv) \equiv r_1 \pmod p. \]
Divide $u_1$ into $r_1 t$ to obtain quotient $a_1$ and remainder $t_1$ with $\deg(t_1) < m$.
Set $s_1 = r_1 s + a_1 v_1 \pmod p$. We see that $(s_1, t_1)$ is a solution with control
on $\deg(t_1)$. The point is that $t_1 v_1$ is a polynomial of degree
\[ \deg(t_1) + \deg(v_1) < m + n. \]
Also $\deg(r_1) < m + n$, because $q$ and $u_1 v_1$ have the same leading term.
By the identity $s_1 u_1 \equiv r_1 - t_1 v_1 \pmod p$, we see that the same is true for
$s_1 u_1$. Since $s_1$ was reduced mod $p$, we know that it has a leading coefficient
relatively prime to $p$. Thus its degree is the same as its degree mod $p$. So,
\[ \deg(s_1) = \deg(s_1 u_1) - \deg(u_1) < m + n - m = n. \]
Now we are ready to improve the factorization. Set
\[ u_2 := u_1 + p t_1, \qquad v_2 := v_1 + p s_1. \]
This does not affect the leading coefficients of the $u$’s or $v$’s. Then it is a
simple exercise to verify
\begin{align*}
q - u_2 v_2 &= (u_1 v_1 + p r_1) - \bigl(u_1 v_1 + p(s_1 u_1 + t_1 v_1) + p^2 s_1 t_1\bigr) \\
            &= p(r_1 - s_1 u_1 - t_1 v_1) - p^2 s_1 t_1 \equiv 0 \pmod{p^2}.
\end{align*}
This procedure repeats recursively. Indeed, if
\[ q \equiv u_k v_k \pmod{p^k}, \]
define $r_k = (q - u_k v_k)/p^k$. As above, set $t_k$ to be the remainder on dividing
$r_k t$ by $u_k$ with quotient $a_k$. Then set $s_k = r_k s + a_k v_k \pmod p$. The new
approximation is given by
\[ u_{k+1} := u_k + p^k t_k, \qquad v_{k+1} := v_k + p^k s_k. \]
Then
\begin{align*}
q - u_{k+1} v_{k+1} &= (u_k v_k + p^k r_k) - \bigl(u_k v_k + p^k(s_k u_k + t_k v_k) + p^{2k} s_k t_k\bigr) \\
                    &= p^k(r_k - s_k u_k - t_k v_k) - p^{2k} s_k t_k \equiv 0 \pmod{p^{k+1}}.
\end{align*}
Since this is accurate modulo $p^{k+1}$, reduce the coefficients mod $p^{k+1}$
symmetrically about 0 so that the coefficients have modulus at most $p^{k+1}/2$.
Repeating this procedure increases the ‘accuracy’ of the factorization by
a factor of $p$ at each stage. Moreover, every stage is a routine calculation.
The most complicated step, the Euclidean algorithm, is executed only once.
On a computer, this procedure is very efficient.
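The one-time Euclidean step, finding $s$ and $t$ with $su + tv \equiv 1 \pmod p$, can be sketched as follows. This is a minimal illustration with our own conventions (coefficient lists in $\mathbb{Z}_p[x]$, constant term first, hypothetical helper names), not a definitive implementation. Note that the pair $(s, t)$ is not unique, so the output need not coincide with a pair found by hand or by other software.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b, p):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([(x + y) % p for x, y in zip(a, b)])

def sub(a, b, p):
    return add(a, [(-y) % p for y in b], p)

def mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)

def divmod_poly(a, b, p):
    rem = list(a)
    quo = [0] * max(len(rem) - len(b) + 1, 0)
    inv = pow(b[-1], -1, p)
    while len(rem) >= len(b):
        c = rem[-1] * inv % p
        shift = len(rem) - len(b)
        quo[shift] = c
        for i, y in enumerate(b):
            rem[shift + i] = (rem[shift + i] - c * y) % p
        trim(rem)
    return trim(quo), rem

def ext_gcd(a, b, p):
    """Return (g, s, t) with s*a + t*b = g in Z_p[x] and g monic."""
    r0, r1 = a, b
    s0, s1 = [1], []
    t0, t1 = [], [1]
    while r1:
        quo, rem = divmod_poly(r0, r1, p)
        r0, r1 = r1, rem
        s0, s1 = s1, sub(s0, mul(quo, s1, p), p)
        t0, t1 = t1, sub(t0, mul(quo, t1, p), p)
    inv = pow(r0[-1], -1, p)               # normalize so that g is monic
    scale = lambda f: [c * inv % p for c in f]
    return scale(r0), scale(s0), scale(t0)

# u = x^3 + 2x^2 + 2x + 2 and v = x^4 + x^3 + 2x^2 + 2x + 2 in Z_5[x]
u = [2, 2, 2, 1]
v = [2, 2, 2, 1, 1]
g, s, t = ext_gcd(u, v, 5)
print(g, s, t)
print(add(mul(s, u, 5), mul(t, v, 5), 5))  # should reproduce g
```

When $u$ and $v$ are relatively prime, `g` is the constant polynomial 1 and the final line prints the same list, confirming the identity $su + tv \equiv 1 \pmod p$.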

7.10.2. Example. Let us work through an example. The calculations
were done by a computer, although with such a small example, it is almost
practical to do it by hand. Let
\[ q(x) = 6x^7 + 53x^6 - 174x^5 + 300x^4 - 33x^3 - 293x^2 + 453x - 81. \]
Suppose that we found the factorization
\[ q \equiv (x^3 + 2x^2 + 2x + 2)(x^4 + x^3 + 2x^2 + 2x + 2) \pmod 5. \]
Following our algorithm, we replace $q$ by $Q = 6q$ (note that $Q \equiv q \pmod 5$), and set
\begin{align*}
u_1 &= 6u \equiv 6x^3 + 2x^2 + 2x + 2 \pmod 5 \\
v_1 &= 6v \equiv 6x^4 + x^3 + 2x^2 + 2x + 2 \pmod 5
\end{align*}
By the Euclidean algorithm, solve $su_1 + tv_1 \equiv 1 \pmod 5$:
\[ s = -x^3 - x^2 + 3, \qquad t = x^2 + 2x. \]
The first step is to compute the remainder
\begin{align*}
r_1 &= (Q - u_1 v_1)/5 \\
    &= 42x^6 - 252x^5 + 288x^4 - 126x^3 - 438x^2 + 486x - 126 \\
    &\equiv 2x^6 + 3x^5 + 3x^4 + 4x^3 + 2x^2 + x + 4 \pmod 5
\end{align*}
Here the integer value of $r_1$ is the one obtained with $u_1 = 6u$ and $v_1 = 6v$
taken as integer polynomials; only its value mod 5 matters.
Then dividing $tr_1$ by $u_1$ mod 5 yields remainder $t_1 = x + 2$ and quotient $a_1$
which is used to compute
\[ s_1 = sr_1 + a_1 v_1 \equiv 2x^3 + 3x^2 \pmod 5. \]
Then we obtain
\begin{align*}
u_2 &\equiv u_1 + 5t_1 \equiv 6x^3 + 12x^2 - 8x - 3 \pmod{25} \\
v_2 &\equiv v_1 + 5s_1 \equiv 6x^4 - 9x^3 + 2x^2 + 12x + 12 \pmod{25}
\end{align*}
Check the remainder
\[ r_2 = (Q - u_2 v_2)/25 \equiv 2x^6 + 4x^5 + x^4 + 3x^3 + 3x^2 + 4x + 2 \pmod 5. \]
This isn’t 0, so continue on. Dividing $tr_2$ by $u_2$ mod 5 gives remainder
$t_2 = 4x^2 + x$ and quotient $a_2$. And
\[ s_2 \equiv sr_2 + a_2 v_2 \equiv 3x^3 + 3x^2 + 1 \pmod 5. \]
Then the next approximants are
\begin{align*}
u_3 &\equiv u_2 + 25t_2 \equiv 6x^3 - 13x^2 + 17x - 3 \pmod{125} \\
v_3 &\equiv v_2 + 25s_2 \equiv 6x^4 - 59x^3 - 48x^2 + 12x + 37 \pmod{125}
\end{align*}
This time the remainder is
\[ r_3 = (Q - u_3 v_3)/125 \equiv x^6 + 2x^5 + 2x^4 + 3x^3 + 2x^2 + 2x + 2 \pmod 5. \]
This still isn’t 0, so continue on. Dividing $tr_3$ by $u_3$ mod 5 gives remainder
$t_3 = 0$ and quotient $a_3$. And
\[ s_3 \equiv sr_3 + a_3 v_3 \equiv x^3 + 1 \pmod 5. \]
Then the next approximants are
\begin{align*}
u_4 &\equiv u_3 + 125t_3 \equiv 6x^3 - 13x^2 + 17x - 3 \pmod{625} \\
v_4 &\equiv v_3 + 125s_3 \equiv 6x^4 + 66x^3 - 48x^2 + 12x + 162 \pmod{625}
\end{align*}
This time we have found the factorization
\[ Q = (6x^3 - 13x^2 + 17x - 3)(6x^4 + 66x^3 - 48x^2 + 12x + 162) \]
whence
\[ q = (6x^3 - 13x^2 + 17x - 3)(x^4 + 11x^3 - 8x^2 + 2x + 27). \]
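The lifting steps of this example can be replayed with a short Python sketch. It is a hedged illustration under our own assumptions: integer polynomials are coefficient lists with the constant term first, the helper names (`trim`, `add`, `scale`, `mul`, `mod_coeffs`, `symmetric`, `divmod_mod`, `hensel_step`) are hypothetical, the prime $p$ is odd, and the pair $s$, $t$ is supplied by hand from the example rather than computed.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def scale(a, c):
    return trim([c * x for x in a])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def mod_coeffs(a, m):
    """Reduce every coefficient into the range 0 .. m-1."""
    return trim([c % m for c in a])

def symmetric(a, m):
    """Reduce coefficients mod m symmetrically about 0 (m odd)."""
    half = (m - 1) // 2
    return trim([(c + half) % m - half for c in a])

def divmod_mod(a, b, p):
    """Quotient and remainder of a by b, both read in Z_p[x]."""
    rem, b = mod_coeffs(a, p), mod_coeffs(b, p)
    quo = [0] * max(len(rem) - len(b) + 1, 0)
    inv = pow(b[-1], -1, p)
    while len(rem) >= len(b):
        c = rem[-1] * inv % p
        shift = len(rem) - len(b)
        quo[shift] = c
        for i, y in enumerate(b):
            rem[shift + i] = (rem[shift + i] - c * y) % p
        trim(rem)
    return trim(quo), rem

def hensel_step(Q, u, v, s, t, p, k):
    """From Q = u v (mod p^k) and s u + t v = 1 (mod p), return
    (u', v') with Q = u' v' (mod p^(k+1))."""
    pk = p ** k
    diff = add(Q, scale(mul(u, v), -1))
    r = mod_coeffs([c // pk for c in diff], p)     # r_k mod p, exact division
    a, tk = divmod_mod(mul(r, t), u, p)            # r_k t = a_k u_k + t_k
    sk = mod_coeffs(add(mul(r, s), mul(a, v)), p)  # s_k = r_k s + a_k v_k
    return (symmetric(add(u, scale(tk, pk)), pk * p),
            symmetric(add(v, scale(sk, pk)), pk * p))

q = [-81, 453, -293, -33, 300, -174, 53, 6]
Q = scale(q, 6)
u, v = [12, 12, 12, 6], [12, 12, 12, 6, 6]         # 6u and 6v as integers
s, t = [3, 0, -1, -1], [0, 2, 1]                   # s = -x^3-x^2+3, t = x^2+2x
for k in range(1, 4):
    u, v = hensel_step(Q, u, v, s, t, 5, k)
    print(5 ** (k + 1), u, v)
print(mul(u, v) == Q)
```

After three steps the symmetric representatives mod 625 multiply to $Q$ exactly, so the loop can stop there. A complete factoring routine would run this exactness check after every step and would also give up once $p^k$ exceeds the coefficient bound; both are left out of this sketch.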

Exercises

1. Using computer software, follow the above procedure with $p = 7$ to
factor
\[ q(x) = 6x^7 + 43x^6 - 363x^5 - 301x^4 + 527x^3 - 15x^2 - 387x + 76. \]
2. Factor in $\mathbb{Z}[x]$ the polynomial
\begin{align*}
q(x) ={}& x^{14} + 31x^{13} - 2x^{11} - 63x^{10} + 31x^9 + 27x^8 + 897x^7 \\
        &+ 33x^6 + 4x^5 + 54x^4 + 3x^3 - 58x^2 + 27x + 1
\end{align*}
given that
\[ q(x) \equiv (x^7 + x^6 + 2x^3 + 3x + 1)(x^7 + 3x^4 + x^3 - x + 1) \pmod 5. \]

Notes on Chapter 7
It was Galois who first realized that $\mathbb{Z}_p[x]/(q)$ formed a field for any
irreducible polynomial $q$. He introduced the idea of adjoining a root of a
polynomial to build a larger field. Dedekind was the first to suggest that
there should be a general definition of field, although for him, a field was
always a subset of $\mathbb{C}$. This was the beginning of a deeper understanding of
the relationship between algebra and number theory. Kronecker’s work was
very influential: he allowed for a more abstract extension of a field by roots
of polynomials. E. H. Moore classified finite fields in 1893.
Dedekind and Weber constructed certain fields of analytic functions
associated to Riemann surfaces, and Hensel constructed the field of $p$-adic
numbers. These very different types of fields paved the way to a general
abstract definition of fields due to Steinitz in 1910. Many general theorems
in field theory and Galois theory were proven by Weber and Steinitz.
Subsequent work of Emil Artin modernized the treatment of Galois theory.
See Kleiner’s short monograph [19] for a brief history of modern algebra.
Emil Artin’s book [4] on Galois theory is a nice introduction to field theory.
See Lidl and Niederreiter [24] for more detailed information about finite
fields. Michael Artin’s comprehensive book on algebra [5] covers field theory
including a chapter on quadratic number fields.
The discovery of algorithms for the factorization of polynomials is more
recent. See Knuth [21, Section 4.6.2] for an overview of various methods.
Berlekamp [6] found the first general algorithm for factoring polynomials in
$\mathbb{Z}_p[x]$ by reducing the problem to a large system of linear equations. The
method discussed in these notes is due to D. Cantor and H. Zassenhaus [7]
in 1981. Hensel’s lemma dates back to 1904 in the same paper in which he
introduced $p$-adic number fields.
Bibliography
[1] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160
(2004), no. 2, 781–793, DOI 10.4007/annals.2004.160.781. MR2123939
[2] Ş. Alaca and K. S. Williams, Introductory algebraic number theory, Cambridge Uni-
versity Press, Cambridge, 2004. MR2031707
[3] W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many Carmichael
numbers, Ann. of Math. (2) 139 (1994), no. 3, 703–722, DOI 10.2307/2118576.
MR1283874
[4] E. Artin, Galois theory, 2nd ed., Dover Publications, Inc., Mineola, NY, 1998. Edited
and with a supplemental chapter by Arthur N. Milgram. MR1616156
[5] M. Artin, Algebra, Prentice Hall, Inc., Englewood Cliffs, NJ, 1991.
[6] E. R. Berlekamp, Factoring polynomials over finite fields, Bell System Tech. J. 46
(1967), 1853–1859, DOI 10.1002/j.1538-7305.1967.tb03174.x. MR219231
[7] D. G. Cantor and H. Zassenhaus, A new algorithm for factoring polynomials over
finite fields, Math. Comp. 36 (1981), no. 154, 587–592, DOI 10.2307/2007663.
MR606517
[8] R. L. Cooke, The history of mathematics: A brief course, 3rd ed., John Wiley & Sons,
Inc., Hoboken, NJ, 2013. MR3236642
[9] L. E. Dickson, History of the theory of numbers. Vol. I: Divisibility and primality. Vol.
II: Diophantine analysis. Vol. III: Quadratic and higher forms., Chelsea Publishing
Co., New York, 1966.
[10] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Trans. Inform.
Theory IT-22 (1976), no. 6, 644–654, DOI 10.1109/tit.1976.1055638. MR437208
[11] P. Erdös, On pseudoprimes and Carmichael numbers, Publ. Math. Debrecen 4 (1956),
201–206, DOI 10.5486/pmd.1956.4.3-4.16. MR79031
[12] D. J. H. Garling, A course in mathematical analysis. Vol. I: Foundations and
elementary real analysis, Cambridge University Press, Cambridge, 2013, DOI
10.1017/CBO9781139424493. MR3087523
[13] W. J. Gilbert and S. A. Vanstone, An introduction to mathematical thinking: Al-
gebra and number systems, Pearson Prentice Hall, Upper Saddle River, NJ, 2005.
MR2128503
[14] J. Gray, A history of abstract algebra: From algebraic equations to modern al-
gebra, Springer Undergraduate Mathematics Series, Springer, Cham, 2018, DOI
10.1007/978-3-319-94773-0. MR3823206
[15] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 6th ed.,
Oxford University Press, Oxford, 2008. Revised by D. R. Heath-Brown and J. H.
Silverman; With a foreword by Andrew Wiles. MR2445243


[16] R. W. Hamming, Error detecting and error correcting codes, Bell System Tech. J. 29
(1950), 147–160, DOI 10.1002/j.1538-7305.1950.tb00463.x. MR35935
[17] D. R. Heath-Brown, Artin’s conjecture for primitive roots, Quart. J. Math. Oxford
Ser. (2) 37 (1986), no. 145, 27–38, DOI 10.1093/qmath/37.1.27. MR830627
[18] C. Hooley, On Artin’s conjecture, J. Reine Angew. Math. 225 (1967), 209–220, DOI
10.1515/crll.1967.225.209. MR207630
[19] I. Kleiner, A history of abstract algebra, Birkhäuser Boston, Inc., Boston, MA, 2007,
DOI 10.1007/978-0-8176-4685-1. MR2347309
[20] I. Kleiner, Excursions in the history of mathematics, Birkhäuser/Springer, New York,
2012, DOI 10.1007/978-0-8176-8268-2. MR3222782
[21] D. E. Knuth, The art of computer programming. Vol. 2: Seminumerical algorithms,
Addison-Wesley, Reading, MA, 1998. Third edition [of MR0286318]. MR3077153
[22] R. Laubenbacher and D. Pengelley, “Voici ce que j’ai trouvé:” Sophie Germain’s grand
plan to prove Fermat’s last theorem (English, with English and French summaries),
Historia Math. 37 (2010), no. 4, 641–692, DOI 10.1016/j.hm.2009.12.002. MR2735899
[23] H. W. Lenstra Jr. and C. Pomerance, Primality testing with Gaussian periods, J.
Eur. Math. Soc. (JEMS) 21 (2019), no. 4, 1229–1269, DOI 10.4171/JEMS/861.
MR3941463
[24] R. Lidl and H. Niederreiter, Introduction to finite fields and their applications, Cam-
bridge University Press, Cambridge, 1986. MR860948
[25] J. H. Manheim, The genesis of point set topology, Pergamon Press, Oxford-Paris-
Frankfurt; The Macmillan Company, New York, 1964. MR0226976
[26] G. L. Miller, Riemann’s hypothesis and tests for primality, J. Comput. System Sci.
13 (1976), no. 3, 300–317, DOI 10.1016/S0022-0000(76)80043-8. MR480295
[27] M. Ram Murty, Prime numbers and irreducible polynomials, Amer. Math. Monthly
109 (2002), no. 5, 452–458, DOI 10.2307/2695645. MR1901498
[28] Q. I. Rahman and G. Schmeisser, Analytic theory of polynomials, London Mathemati-
cal Society Monographs. New Series, vol. 26, The Clarendon Press, Oxford University
Press, Oxford, 2002. MR1954841
[29] M. O. Rabin, Probabilistic algorithm for testing primality, J. Number Theory 12
(1980), no. 1, 128–138, DOI 10.1016/0022-314X(80)90084-0. MR566880
[30] P. Ribenboim, Fermat’s last theorem for amateurs, Springer-Verlag, New York, 1999.
MR1719329
[31] P. Ribenboim, The little book of bigger primes, 2nd ed., Springer-Verlag, New York,
2004. MR2028675
[32] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signa-
tures and public-key cryptosystems, Comm. ACM 21 (1978), no. 2, 120–126, DOI
10.1145/359340.359342. MR700103
[33] P. W. Shor, Algorithms for quantum computation: discrete logarithms and fac-
toring, 35th Annual Symposium on Foundations of Computer Science (Santa Fe,
NM, 1994), IEEE Comput. Soc. Press, Los Alamitos, CA, 1994, pp. 124–134, DOI
10.1109/SFCS.1994.365700. MR1489242
[34] J. H. Silverman, A Friendly Introduction to Number Theory, 4th ed., Pearson Edu-
cation, Inc., Upper Saddle River, NJ, 2012.
[35] B. Simon, Basic complex analysis, A Comprehensive Course in Analysis, Part 2A,
American Mathematical Society, Providence, RI, 2015, DOI 10.1090/simon/002.1.
MR3443339
[36] S. Singh, The code book: The Secret History of Codes and Code Breaking, Fourth
Estate and Doubleday, 1999.
[37] H. M. Stark, An introduction to number theory, MIT Press, Cambridge, Mass.-
London, 1978. MR514402
[38] J. V. Uspensky, Theory of equations, McGraw-Hill Book Co., New York, 1948.
Published Titles in This Series
31 Kenneth R. Davidson and Matthew Satriano, Integer and Polynomial Algebra, 2023
30 Jonathan K. Hodge and Richard E. Klima, The Mathematics of Voting and
Elections: A Hands-On Approach, Second Edition, 2018
29 Margaret Cozzens and Steven J. Miller, The Mathematics of Encryption, 2013
28 David Wright, Mathematics and Music, 2009
27 Jacques Sesiano, An Introduction to the History of Algebra, 2009
26 A. V. Akopyan and A. A. Zaslavsky, Geometry of Conics, 2007
25 Anne L. Young, Mathematical Ciphers, 2006
24 Burkard Polster, The Shoelace Book, 2006
23 Koji Shiga and Toshikazu Sunada, A Mathematical Gift, III, 2005
22 Jonathan K. Hodge and Richard E. Klima, The Mathematics of Voting and
Elections: A Hands-On Approach, 2005
21 Gilles Godefroy, The Adventure of Numbers, 2004
20 Kenji Ueno, Koji Shiga, and Shigeyuki Morita, A Mathematical Gift, II, 2004
19 Kenji Ueno, Koji Shiga, and Shigeyuki Morita, A Mathematical Gift, I, 2003
18 Timothy G. Feeman, Portraits of the Earth, 2002
17 Serge Tabachnikov, Editor, Kvant Selecta: Combinatorics, I, 2002
16 V. V. Prasolov, Essays on Numbers and Figures, 2000
15 Serge Tabachnikov, Editor, Kvant Selecta: Algebra and Analysis, II, 1999
14 Serge Tabachnikov, Editor, Kvant Selecta: Algebra and Analysis, I, 1999
13 Saul Stahl, A Gentle Introduction to Game Theory, 1999
12 V. S. Varadarajan, Algebra in Ancient and Modern Times, 1998
11 Kunihiko Kodaira, Editor, Basic Analysis: Japanese Grade 11, 1996
10 Kunihiko Kodaira, Editor, Algebra and Geometry: Japanese Grade 11, 1996
9 Kunihiko Kodaira, Editor, Mathematics 2: Japanese Grade 11, 1997
8 Kunihiko Kodaira, Editor, Mathematics 1: Japanese Grade 10, 1996
7 Dmitri Fomin, Sergey Genkin, and Ilia V. Itenberg, Mathematical Circles, 1996
6 David W. Farmer and Theodore B. Stanford, Knots and Surfaces, 1996
5 David W. Farmer, Groups and Symmetry: A Guide to Discovering Mathematics, 1996
4 V. V. Prasolov, Intuitive Topology, 1994
3 L. E. Sadovskiı̆ and A. L. Sadovskiı̆, Mathematics and Sports, 1993
2 Yu. A. Shashkin, Fixed Points, 1991
1 V.M. Tikhomirov, Stories about Maxima and Minima, 1991
This book is a concrete introduction to abstract algebra
and number theory. Starting from the basics, it develops
the rich parallels between the integers and polynomials,
covering topics such as Unique Factorization, arithmetic
over quadratic number fields, the RSA encryption scheme,
and finite fields.
In addition to introducing students to the rigorous foundations
of mathematical proofs, the authors cover several
specialized topics, giving proofs of the Fundamental
Theorem of Algebra, the transcendentality of e,
and the Quadratic Reciprocity Law. The book is aimed at
incoming undergraduate students with a strong passion
for mathematics.

For additional information and updates on this book, visit
www.ams.org/bookpages/mawrld-31

MAWRLD/31
www.ams.org
