Intro Abstract Algebra
(c) 1997-8, Paul Garrett, garrett@math.umn.edu
http://www.math.umn.edu/~garrett/
Contents
(1) Basic Algebra of Polynomials
(2) Induction and the Well-ordering Principle
(3) Sets
(4) Some counting principles
(5) The Integers
(6) Unique factorization into primes
(7) (*) Prime Numbers
(8) Sun Ze's Theorem
(9) Good algorithm for exponentiation
(10) Fermat's Little Theorem
(11) Euler's Theorem, Primitive Roots, Exponents, Roots
(12) (*) Public-Key Ciphers
(13) (*) Pseudoprimes and Primality Tests
(14) Vectors and matrices
(15) Motions in two and three dimensions
(16) Permutations and Symmetric Groups
(17) Groups: Lagrange's Theorem, Euler's Theorem
(18) Rings and Fields: definitions and first examples
(19) Cyclotomic polynomials
(20) Primitive roots
(21) Group Homomorphisms
(22) Cyclic Groups
(23) (*) Carmichael numbers and witnesses
(24) More on groups
(25) Finite fields
(26) Linear Congruences
(27) Systems of Linear Congruences
(28) Abstract Sun Ze Theorem
(29) (*) The Hamiltonian Quaternions
(30) More about rings
(31) Tables
1. Basic Algebra of Polynomials
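Completing the square to solve a quadratic equation is perhaps the first really good trick in elementary algebra. It depends upon appreciating the form of the square of the binomial $x + y$:
$$(x + y)^2 = x^2 + xy + yx + y^2 = x^2 + 2xy + y^2$$
Thus, running this backwards,
$$x^2 + ax = x^2 + 2\left(\frac{a}{2}\right)x = x^2 + 2\left(\frac{a}{2}\right)x + \left(\frac{a}{2}\right)^2 - \left(\frac{a}{2}\right)^2 = \left(x + \frac{a}{2}\right)^2 - \left(\frac{a}{2}\right)^2$$
Then for $a \neq 0$,
$$ax^2 + bx + c = 0$$
can be rewritten as
$$0 = \frac{0}{a} = x^2 + 2\left(\frac{b}{2a}\right)x + \frac{c}{a} = \left(x + \frac{b}{2a}\right)^2 + \frac{c}{a} - \left(\frac{b}{2a}\right)^2$$
Thus,
$$\left(x + \frac{b}{2a}\right)^2 = \left(\frac{b}{2a}\right)^2 - \frac{c}{a}$$
$$x + \frac{b}{2a} = \pm\sqrt{\left(\frac{b}{2a}\right)^2 - \frac{c}{a}}$$
$$x = -\frac{b}{2a} \pm \sqrt{\left(\frac{b}{2a}\right)^2 - \frac{c}{a}}$$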
from which the usual Quadratic Formula is easily obtained.
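As a rough illustration of the completing-the-square steps, here is a minimal Python sketch. The function name `solve_quadratic` is hypothetical, and the use of `cmath.sqrt` (so that a negative right-hand side still yields roots) is an assumption of this sketch rather than something the text prescribes.

```python
import cmath

def solve_quadratic(a, b, c):
    """Solve a*x^2 + b*x + c = 0 (a != 0) by completing the square."""
    if a == 0:
        raise ValueError("a must be non-zero")
    half = b / (2 * a)              # the quantity b/(2a)
    rhs = half * half - c / a       # (b/2a)^2 - c/a
    root = cmath.sqrt(rhs)          # complex square root, so rhs < 0 is allowed
    return (-half + root, -half - root)

roots = solve_quadratic(1, -3, 2)   # x^2 - 3x + 2 = (x - 1)(x - 2)
```

The two returned values correspond to the two choices of sign in the last displayed equation.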
For positive integers $n$, we have the factorial function defined:
$$n! = 1 \cdot 2 \cdot 3 \cdots (n-2) \cdot (n-1) \cdot n$$
The fundamental property is that
$$(n + 1)! = (n + 1) \cdot n!$$
And there is the separate definition that $0! = 1$. The latter convention has the virtue that it works out in practice, in the patterns in which factorials are most often used.
The binomial coefficients are numbers with a special notation
$$\binom{n}{k} = \frac{n!}{k! \, (n-k)!}$$
The name comes from the fact that these numbers appear in the binomial expansion (expansion of powers of the binomial $(x + y)$):
$$(x + y)^n = x^n + \binom{n}{1} x^{n-1} y + \binom{n}{2} x^{n-2} y^2 + \ldots + \binom{n}{n-2} x^2 y^{n-2} + \binom{n}{n-1} x y^{n-1} + y^n = \sum_{0 \le i \le n} \binom{n}{i} x^{n-i} y^i$$
Notice that
$$\binom{n}{n} = \binom{n}{0} = 1$$
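The factorial formula for binomial coefficients and the binomial expansion can be spot-checked numerically. The following is only a sketch; the helper names `binomial` and `expand_binomial` are made up for this illustration.

```python
from math import comb, factorial

def binomial(n, k):
    """n! / (k! (n-k)!), computed directly from the factorial formula."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def expand_binomial(x, y, n):
    """Sum of C(n, i) * x^(n-i) * y^i over 0 <= i <= n."""
    return sum(binomial(n, i) * x ** (n - i) * y ** i for i in range(n + 1))

row = [binomial(5, k) for k in range(6)]   # coefficients of (x + y)^5
value = expand_binomial(2, 3, 5)           # should agree with (2 + 3)^5
```

The standard-library function `math.comb` computes the same numbers and can serve as an independent check.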
There are standard identities which are useful in anticipating factorization of special polynomials and
special forms of numbers:
$$x^2 - y^2 = (x - y)(x + y)$$
$$x^3 - y^3 = (x - y)(x^2 + xy + y^2)$$
$$x^3 + y^3 = (x + y)(x^2 - xy + y^2)$$
$$x^4 - y^4 = (x - y)(x^3 + x^2 y + x y^2 + y^3)$$
$$x^5 - y^5 = (x - y)(x^4 + x^3 y + x^2 y^2 + x y^3 + y^4)$$
$$x^5 + y^5 = (x + y)(x^4 - x^3 y + x^2 y^2 - x y^3 + y^4)$$
and so on. Note that for odd exponents there are two identities while for even exponents there is just one.
#1.1 Factor $x^6 - y^6$ in two different ways.
#1.2 While we mostly know that $x^2 - y^2$ has a factorization, that $x^3 - y^3$ has a factorization, that $x^3 + y^3$ has, and so on, there is a factorization that seldom appears in `high school': $x^4 + 4y^4$ has a factorization into two quadratic pieces, each with 3 terms! Find this factorization. Hint:
$$x^4 + 4y^4 = (x^4 + 4x^2 y^2 + 4y^4) - 4x^2 y^2 = (x^2 + 2y^2)^2 - (2xy)^2$$
2. Induction and the Well-ordering Principle
The meaning of the word `induction' within mathematics is very different from the colloquial sense!
First, let P (n) be a statement involving the integer n, which may be true or false. That is, at this point
we have a grammatically correct sentence, but are making no general claims about whether the sentence is
true, true for one particular value of n, true for all values of n, or anything. It's just a sentence.
Now we introduce some notation that is entirely compatible with our notion of function, even if the
present usage is a little surprising. If the sentence P (n) is true of a particular integer n, write
P (n) = true
and if the sentence asserts a false thing for a particular n, write
P (n) = false
That is, we can view P as a function, but instead of producing numbers as output it produces either `true'
or `false' as values. Such functions are called boolean.
This style of writing, even if it is not what you already knew or learned, is entirely parallel to ordinary
English, is parallel to programming language usage, and has many other virtues.
Caution: There is another, older tradition of notation in mathematics which is somewhat different, and which is harder to read and write unless you know the trick, since it is not like ordinary English at all. In that other tradition, to write `P(n)' is to assert that the sentence `P(n)' is true. In the other tradition, to say that the sentence is false you write `$\neg P(n)$' or `$\sim P(n)$'.
So, yes, these two ways of writing are not compatible with each other. Too bad. We need to make a choice, though, and while I once would have chosen what I call the `older' tradition, now I like the first way better, for several reasons. In any case, you should be alert to the possibility that other people may choose one or the other of these writing styles, and you have to figure it out from context!
Principle of Induction
If P (1) = true, and
if P (n) = true implies P (n + 1) = true for every positive integer n,
then P (n) = true for every positive integer n.
Caution: The second condition does not directly assert that P (n) = true, nor does it directly assert
that P (n + 1) = true. Rather, it only asserts a relative thing. That is, more generally, with some sentences
A and B (involving n or not), an assertion of the sort
(A implies B ) = true
does not assert that A = true nor that B = true, but rather can be re-written as the conditional assertion
if (A = true) then B = true
In other words we prove that an implication is true.
That is, pushing this notation style a little further, we usually prove
((A = true) implies (B = true)) = true
In the more traditional notation, the assertion of Mathematical Induction is
If P (1), and
if P (n) implies P (n + 1) for every positive integer n,
then P (n) for every positive integer n.
Even though I am accustomed to this style of writing, in the end I think it is less clear!
Another Caution: Whatever the notation we use, the statements above do not indicate the way that
we usually go about proving something by induction. Rather, what we use is a Practical Paraphrase of the
`Principle of Induction':
First, prove P (1) = true.
Second, assume P (n) = true and using this prove P (n + 1) = true (for every positive integer n)
Then conclude P (n) = true for every positive integer n.
The second item in this procedure is what is usually called the induction step. Our paraphrase makes
it look a little different than the more official version: in the official version, it looks like we have to prove
that an implication is correct, whereas by contrast in our modified version we instead assume something true
and see if we can then prove something else.
The most popular traditional example is to prove by induction that
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n = \frac{1}{2} n(n+1)$$
Let P (n) = true be the assertion that this formula holds for a particular integer n. So the assertion
P (1) = true is just the assertion that
$$1 = \frac{1}{2} \cdot 1 \cdot (1 + 1)$$
which is indeed true. To do the induction step, we assume that
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n = \frac{1}{2} n(n+1)$$
is true and try to prove from it that
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n + (n+1) = \frac{1}{2} (n+1)((n+1)+1)$$
is true.
Well, if we add n + 1 to both sides of the assumed equality
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n = \frac{1}{2} n(n+1)$$
then we have
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n + (n+1) = \frac{1}{2} n(n+1) + (n+1)$$
The left-hand side is just what we want, but the right-hand side is not. But we hope that it secretly is what
we want; that is, we hope that
$$\frac{1}{2} n(n+1) + (n+1) = \frac{1}{2} (n+1)((n+1)+1)$$
We have to check that this is true.
This raises an auxiliary question, which is easy enough to answer once we make it explicit: how would a person go about proving that two polynomials are equal? The answer is that both of them should be simplified and rearranged in descending (or ascending) powers of the variable, and then one checks that corresponding coefficients are equal. (And this description definitely presumes that we have polynomials in just one variable.)
In the present example it's not very hard to do this rearranging: first, one side of the desired equality simplifies and rearranges to
$$\frac{1}{2} n(n+1) + (n+1) = \frac{1}{2} n^2 + \frac{1}{2} n + n + 1 = \frac{1}{2} n^2 + \frac{3}{2} n + 1$$
On the other hand, the other side of the desired equality simplifies and rearranges to
$$\frac{1}{2} (n+1)((n+1)+1) = \frac{1}{2} (n+1)(n+2) = \frac{1}{2} (n^2 + 3n + 2) = \frac{1}{2} n^2 + \frac{3}{2} n + 1$$
so the two sides have the same coefficients.
Or we can try to be a little lucky and just directly rearrange one side of the desired equality of polynomials into the other: in simple situations, and with some luck, this works, but it is not the general approach. Still, we can manage it in this example:
$$\frac{1}{2} n(n+1) + (n+1) = \left(\frac{1}{2} n + 1\right)(n+1) = \frac{1}{2} (n+2)(n+1) = \frac{1}{2} (n+1)(n+2) = \frac{1}{2} (n+1)((n+1)+1)$$
Thus, we can conclude that if
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n = \frac{1}{2} n(n+1)$$
then
$$1 + 2 + 3 + 4 + \ldots + (n-2) + (n-1) + n + (n+1) = \frac{1}{2} (n+1)((n+1)+1)$$
which is the implication we must prove to complete the induction step. Thus, we conclude that this formula
really does hold for all positive integers n.
In this example, we used the fact that we knew what we were supposed to be getting to help us do the
elementary algebra to complete the induction step. We certainly needed to know what the right formula was
before attempting to prove it! This is typical of this sort of argument!
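A finite computer check is not a proof, but it is a cheap way to confirm that a conjectured formula is worth trying to prove. The sketch below tests both the summation formula and the polynomial identity used in the induction step; the function names and the bound 200 are arbitrary choices made for this illustration.

```python
def sum_formula_holds(n):
    """Check 1 + 2 + ... + n == n(n+1)/2 for one particular n."""
    return sum(range(1, n + 1)) == n * (n + 1) // 2

def induction_step_identity_holds(n):
    """Check n(n+1)/2 + (n+1) == (n+1)((n+1)+1)/2 for one particular n.

    Both sides are doubled so that only integer arithmetic is used.
    """
    return n * (n + 1) + 2 * (n + 1) == (n + 1) * ((n + 1) + 1)

# Try every n from 1 to 200; a single failure would make this False.
checked = all(sum_formula_holds(n) and induction_step_identity_holds(n)
              for n in range(1, 201))
```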
In some circumstances, a seemingly different proof concept works better:
Well-Ordering Principle Every non-empty subset of the positive integers has a least element.
This Well-Ordering Principle sounds completely innocuous, but it is provably logically equivalent to the
Principle of Induction. Another logically equivalent variant is:
Let P be a property that an integer may or may not have. If P (1) = true, and if P (m) = true for all
m < n implies that P (n) = true, then P holds for all positive integers.
In the other notation, this would be written
Let P be a property that an integer may or may not have. If P (1), and if P (m) for all m < n implies
that P (n), then P holds for all positive integers.
#2.3 Prove by induction that
$$1 + 2 + 3 + \ldots + n = \frac{1}{2} n(n+1)$$
#2.4 Prove by induction on n that
$$x^n - 1 = (x - 1)(x^{n-1} + x^{n-2} + x^{n-3} + \ldots + x^2 + x + 1)$$
Hint: To do the induction step, notice that
$$x^{n+1} - 1 = x^{n+1} - x + x - 1 = x(x^n - 1) + (x - 1)$$
#2.5 Prove by induction that
$$1^2 + 2^2 + 3^2 + \ldots + n^2 = \frac{1}{3} n^3 + \frac{1}{2} n^2 + \frac{1}{6} n$$
#2.6 Prove by induction the following relation among binomial coefficients:
$$\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$$
for integers $0 < k \le n$.
#2.7 (*) Prove by induction that
$$(1 + 2 + 3 + \ldots + n)^2 = 1^3 + 2^3 + 3^3 + \ldots + n^3$$
#2.8 (**) How would one systematically obtain the "formula" for $1^k + 2^k + 3^k + \ldots + (n-1)^k + n^k$ for a fixed positive integer exponent $k$?
3. Sets
Sets and functions
Equivalence relations
3.1 Sets
Here we review some relatively elementary but very important terminology and concepts about sets and
functions, in a slightly abstract setting. We use the word map as a synonym for "function", as is very often
done.
Naively, a set is supposed to be a collection of `things' (?) described by `listing' them or prescribing
them by a `rule'. Please note that this is not a terribly precise description, but will be adequate for most of
our purposes. We can also say that a set is an unordered list of different things.
There are standard symbols for some often-used sets:
$\emptyset = \{\}$ = set with no elements
Z = the integers
Q = the rational numbers
R = the real numbers
C = the complex numbers
A set described by a list is something like
$$S = \{1, 2, 3, 4, 5, 6, 7, 8\}$$
which is the set of integers bigger than 0 and less than 9. This set can also be described by a rule by
$$S = \{1, 2, 3, 4, 5, 6, 7, 8\} = \{x : x \text{ is an integer and } 1 \le x \le 8\}$$
This follows the general format and notation
$$\{x : x \text{ has some property}\}$$
If $x$ is in a set $S$, then write $x \in S$ or $S \ni x$, and say that $x$ is an element of $S$. Thus, a set is the collection of all its elements (although this remark only explains the language). It is worth noting that the ordering of a listing has no effect on a set, and if in the listing of elements of a set an element is repeated, this has no effect. For example,
$$\{1, 2, 3\} = \{1, 1, 2, 3\} = \{3, 2, 1\} = \{1, 3, 2, 1\}$$
A subset $T$ of a set $S$ is a set all of whose elements are elements of $S$. This is written $T \subset S$ or $S \supset T$. So always $S \subset S$ and $\emptyset \subset S$. If $T \subset S$ and $T \neq \emptyset$ and $T \neq S$, then $T$ is a proper subset of $S$. Note that the empty set is a subset of every set. For a subset $T$ of a set $S$, the complement of $T$ (inside $S$) is
$$T^c = S - T = \{s \in S : s \notin T\}$$
Sets can also be elements of other sets. For example, $\{Q, Z, R, C\}$ is the set with 4 elements, each of which is a familiar set of numbers. Or, one can check that
$$\{\{1, 2\}, \{1, 3\}, \{2, 3\}\}$$
is the set of two-element subsets of $\{1, 2, 3\}$.
The intersection of two sets $A, B$ is the collection of all elements which lie in both sets, and is denoted $A \cap B$. Two sets are disjoint if their intersection is $\emptyset$. If the intersection is not empty, then we may say that the two sets meet. The union of two sets $A, B$ is the collection of all elements which lie in one or the other of the two sets, and is denoted $A \cup B$.
Note that, for example, $1 \neq \{1\}$, and $\{\{1\}\} \neq \{1\}$. That is, the set $\{a\}$ with sole element $a$ is not the same thing as the item $a$ itself.
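Many programming languages have a built-in finite-set type that follows these conventions closely. The sketch below uses Python's `set` and `frozenset`; the particular sets `S`, `T`, `A`, `B` are arbitrary examples chosen here, and `frozenset` is needed only because Python requires elements of a set to be hashable.

```python
S = {1, 2, 3, 4, 5, 6, 7, 8}
T = {2, 4, 6}

# Order and repetition in a listing do not change the set.
same = {1, 2, 3} == {1, 1, 2, 3} == {3, 2, 1} == {1, 3, 2, 1}

# T is a subset of S, and a proper one (non-empty and not all of S).
is_subset = T <= S
is_proper = bool(T) and T < S

# Complement of T inside S: the elements of S that are not in T.
complement = S - T

# Intersection, union, and disjointness.
A, B = {1, 2, 3}, {3, 4}
meet = A & B
union = A | B
disjoint = A.isdisjoint({7, 8})

# Sets as elements of a set: the two-element subsets of {1, 2, 3}.
pairs = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}

# The set {1} is not the item 1, and {{1}} is not {1}.
differs = ({1} != 1) and ({frozenset({1})} != {1})
```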
An ordered pair $(x, y)$ is just that, a list of two things in which there is a first thing, here $x$, and a second thing, here $y$. Two ordered pairs $(x, y)$ and $(x', y')$ are equal if and only if $x = x'$ and $y = y'$.
The (cartesian) product of two sets $A, B$ is the set of ordered pairs $(a, b)$ where $a \in A$ and $b \in B$. It is denoted $A \times B$. Thus, while $\{a, b\} = \{b, a\}$ might be thought of as an unordered pair, for ordered pairs $(a, b) \neq (b, a)$ unless by chance $a = b$.
In case $A = B$, the cartesian power $A \times B$ is often denoted $A^2$. More generally, for a fixed positive integer $n$, the $n$th cartesian power $A^n$ of a set is the set of ordered $n$-tuples $(a_1, a_2, \ldots, a_n)$ of elements $a_i$ of $A$.
Some very important examples of cartesian powers are those of R or Q or C, which arise in other contexts as well: for example, $R^2$ is the collection of ordered pairs of real numbers, which we use to describe points in the plane. And $R^3$ is the collection of ordered triples of real numbers, which we use to describe points in three-space.
The power set of a set $S$ is the set of subsets of $S$. This is sometimes denoted by $\mathcal{P}S$. Thus,
$$\mathcal{P}\emptyset = \{\emptyset\}$$
$$\mathcal{P}\{1, 2\} = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$$
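For a small finite set the power set can simply be listed by machine. This is a sketch under the assumption that subsets are represented as `frozenset` objects; the function name `power_set` is chosen for this illustration.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    # Collect the subsets of size 0, 1, ..., len(items) in turn.
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}

p_empty = power_set(set())      # the power set of the empty set
p_12 = power_set({1, 2})        # the power set of {1, 2}
```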
Intuitively, a function $f$ from one set $A$ to another set $B$ is supposed to be a `rule' which assigns to each element $a \in A$ an element $b = f(a) \in B$. This is written as
$$f : A \to B$$
although the latter notation gives no information about the nature of $f$ in any detail.
More rigorously, but less intuitively, we can define a `function' by really telling its graph: the formal definition is that a function $f : A \to B$ is a subset of the product $A \times B$ with the property that for every $a \in A$ there is a unique $b \in B$ so that $(a, b) \in f$. Then we would write $f(a) = b$.
This formal definition is worth noting at least because it should make clear that there is absolutely no requirement that a function be described by any recognizable or simple `formula'.
As a silly example of the formal definition of function, let $f : \{1, 2\} \to \{2, 4\}$ be the function `multiply-by-two', so that $f(1) = 2$ and $f(2) = 4$. Then the `official' definition would say that really $f$ is the subset of the product set $\{1, 2\} \times \{2, 4\}$ consisting of the ordered pairs $(1, 2), (2, 4)$. That is, formally the function $f$ is the set
$$f = \{(1, 2), (2, 4)\}$$
Of course, one seldom really operates this way.
A function $f : A \to B$ is surjective (or onto) if for every $b \in B$ there is $a \in A$ so that $f(a) = b$. A function $f : A \to B$ is injective (or one-to-one) if $f(a) = f(a')$ implies $a = a'$. That is, $f$ is injective if for every $b \in B$ there is at most one $a \in A$ so that $f(a) = b$. A map is a bijection if it is both injective and surjective.
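For finite sets, the formal definition of a function as a set of ordered pairs, together with the definitions of injective and surjective, can be tested mechanically. The following is a minimal sketch; the function names and the examples `g` and `not_f` are invented for this illustration.

```python
def is_function(pairs, A, B):
    """True if pairs is a subset of A x B with exactly one (a, b) for each a in A."""
    if any(a not in A or b not in B for (a, b) in pairs):
        return False
    return all(sum(1 for (x, _) in pairs if x == a) == 1 for a in A)

def is_surjective(pairs, A, B):
    """Every b in B is hit by at least one a."""
    return {b for (_, b) in pairs} == set(B)

def is_injective(pairs, A, B):
    """Every b in B is hit by at most one a."""
    outputs = [b for (_, b) in pairs]
    return len(outputs) == len(set(outputs))

A, B = {1, 2}, {2, 4}
f = {(1, 2), (2, 4)}        # the `multiply-by-two' example
g = {(1, 2), (2, 2)}        # a function, but neither injective nor surjective
not_f = {(1, 2), (1, 4)}    # not a function: 1 has two outputs and 2 has none
```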
The number of elements in a set is its cardinality. Two sets are said to have the same cardinality
if there is a bijection between them. Thus, this is a trick so that we don't have to actually count two sets
to see whether they have the same number of elements. Rather, we can just pair them up by a bijection to
achieve this purpose.
Since we can count the elements in a finite set in a traditional way, it is clear that a finite set has no
bijection to a proper subset of itself. After all, a proper subset has fewer elements.
By contrast, for infinite sets it is easily possible that proper subsets have bijections to the whole set. For
example, the set A of all natural numbers and the set E of even natural numbers have a bijection between
them given by
$$n \to 2n$$
But certainly E is a proper subset of A! Even more striking examples can be arranged. In the end, we take
as definition that a set is infinite if it has a bijection to a proper subset of itself.
Let $f : A \to B$ be a function from a set $A$ to a set $B$, and let $g : B \to C$ be a function from the set $B$ to a set $C$. The composite function $g \circ f$ is defined to be
$$(g \circ f)(a) = g(f(a))$$
for $a \in A$.
The identity function on a non-empty set $S$ is the function $f : S \to S$ so that $f(a) = a$ for all $a \in S$. Often the identity function on a set $S$ is denoted by $\mathrm{id}_S$.
Let $f : A \to B$ be a function from a set $A$ to a set $B$. An inverse function $g : B \to A$ for $f$ (if such $g$ exists at all) is a function so that $(f \circ g)(b) = b$ for all $b \in B$, and also $(g \circ f)(a) = a$ for all $a \in A$. That is, the inverse function (if it exists) has the two properties
$$f \circ g = \mathrm{id}_B \qquad g \circ f = \mathrm{id}_A$$
An inverse function to $f$, if it exists at all, is usually denoted $f^{-1}$. (This is not at all the same as $1/f$!)
Proposition: A function $f : A \to B$ from a set $A$ to a set $B$ has an inverse if and only if $f$ is a bijection. In that case, the inverse is unique (that is, there is only one inverse function).
Proof: We define a function $g : B \to A$ as follows. Given $b \in B$, let $a \in A$ be an element so that $f(a) = b$. Then define $g(b) = a$. Do this for each $b \in B$ to define $g$. Note that we use the surjectivity to know that there exists an $a$ for each $b$, and the injectivity to be sure of its uniqueness.
To check that $g \circ f = \mathrm{id}_A$, compute: first, for any $a \in A$, $f(a) \in B$. Then $g(f(a))$ is, by definition, an element $a' \in A$ so that $f(a') = f(a)$. Since $f$ is injective, it must be that $a' = a$. To check that $f \circ g = \mathrm{id}_B$, take $b \in B$ and compute: by definition of $g$, $g(b)$ is an element of $A$ so that $f(g(b)) = b$. But that is (after all) just what we want. Done.
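The construction of $g$ in the proof can be carried out literally when $A$ and $B$ are finite. In this sketch a function is stored as a Python `dict` from inputs to outputs; that representation and the name `inverse` are assumptions made for the illustration.

```python
def inverse(f, B):
    """Inverse of a bijection f : A -> B, with f given as a dict on A.

    Raises ValueError when f is not a bijection onto B.
    """
    g = {}
    for a, b in f.items():
        if b in g:                       # two inputs with the same output
            raise ValueError("f is not injective")
        g[b] = a                         # define g(b) = a whenever f(a) = b
    if set(g) != set(B):                 # some element of B is never hit
        raise ValueError("f is not surjective")
    return g

f = {1: "x", 2: "y", 3: "z"}
g = inverse(f, {"x", "y", "z"})

g_after_f = {a: g[f[a]] for a in f}      # should be the identity on A
f_after_g = {b: f[g[b]] for b in g}      # should be the identity on B
```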
3.2 Equivalence Relations
The idea of equivalence relation (defined below) is an important extension and generalization of the traditional idea of equality, and occurs throughout mathematics. The associated idea of equivalence class (also defined just below) is equally important.
The goal here is to make precise both the idea and the notation in writing something like "$x \sim y$" to mean that $x$ and $y$ have some specified common feature. We can set up a general framework for this without worrying about the specifics of what the features might be.
Recall the "formal" definition of a function $f$ from a set $S$ to a set $T$: while we think of $f$ as being some sort of rule which to an input $s \in S$ "computes" or "associates" an output $f(s) \in T$, this way of talking is inadequate, for many reasons.
Rather, the formal (possibly non-intuitive) definition of function $f$ from a set $S$ to a set $T$ is that it is a subset $G$ of the cartesian product $S \times T$ with the property
For each $s \in S$ there is exactly one $t \in T$ so that $(s, t) \in G$.
Then connect this to the usual notation by
$$f(s) = t \quad \text{if} \quad (s, t) \in G$$
(Again, this $G$ would be the graph of $f$ if $S$ and $T$ were simply the real line, for example).
In this somewhat formal context, first there is the primitive general notion of relation $R$ on a set $S$: a relation $R$ on a set $S$ is simply a subset of $S \times S$. Write
$$x \, R \, y$$
if the ordered pair $(x, y)$ lies in the subset $R$ of $S \times S$.
This definition of "relation" compared to the formal definition of "function" makes it clear that every
function is a relation. But most relations do not meet the condition to be functions. This definition of
"relation" is not very interesting except as set-up for further development.
An equivalence relation $R$ on a set $S$ is a special kind of relation, satisfying
Reflexivity: $x \, R \, x$ for all $x \in S$
Symmetry: If $x \, R \, y$ then $y \, R \, x$
Transitivity: If $x \, R \, y$ and $y \, R \, z$ then $x \, R \, z$
The fundamental example of an equivalence relation is ordinary equality of numbers. Or equality of sets. Or any other version of `equality' to which we are accustomed. It should also be noted that a very popular notation for an equivalence relation is
$$x \sim y$$
(that is, with a tilde rather than an `R'). Sometimes this is simply read as $x$ tilde $y$, but also sometimes as $x$ is equivalent to $y$ with only implicit reference to the equivalence relation.
A simple example of an equivalence relation on the set $R^2$ can be defined by
$$(x, y) \sim (x', y') \quad \text{if and only if} \quad x = x'$$
That is, in terms of analytic geometry, two points are equivalent if and only if they lie on the same vertical line. Verification of the three required properties in this case is easy, and should be carried out by the reader.
Let $\sim$ be an equivalence relation on a set $S$. For $x \in S$, the $\sim$-equivalence class $\bar{x}$ containing $x$ is the subset
$$\bar{x} = \{x' \in S : x' \sim x\}$$
The set of equivalence classes of $\sim$ on $S$ is denoted by
$$S/\sim$$
(as if we were taking a quotient of some sort). Every element $z \in S$ is certainly contained in an equivalence class, namely the equivalence class of all $s \in S$ so that $s \sim z$.
Note that in general an equality $\bar{x} = \bar{y}$ of equivalence classes $\bar{x}, \bar{y}$ is no indication whatsoever that $x = y$. While it is always true that $x = y$ implies $\bar{x} = \bar{y}$, in general there are many other elements in $\bar{x}$ than just $x$ itself.
Proposition: Let $\sim$ be an equivalence relation on a set $S$. If two equivalence classes $\bar{x}, \bar{y}$ have any common element $z$, then $\bar{x} = \bar{y}$.
Proof: If $z \in \bar{x} \cap \bar{y}$, then $z \sim x$ and $z \sim y$. Then for any $x' \in \bar{x}$, we have
$$x' \sim x \sim z \sim y$$
so $x' \sim y$ by transitivity of $\sim$. Thus, every element $x' \in \bar{x}$ actually lies in $\bar{y}$. That is, $\bar{x} \subset \bar{y}$. A symmetrical argument, reversing the roles of $x$ and $y$, shows that $\bar{y} \subset \bar{x}$. Therefore, $\bar{x} = \bar{y}$. Done.
It is important to realize that while we tend to refer to an equivalence class in the notational style $\bar{x}$ for some $x$ in the class, there is no requirement to do so. Thus, it is legitimate to say "an equivalence class $A$ for the equivalence relation $\sim$ on the set $S$".
But of course, given an equivalence class $A$ inside $S$, it may be convenient to find $x$ in the set $S$ so that $\bar{x} = A$. Such an $x$ is a representative for the equivalence class. Any element of the subset $A$ is a representative, so in general we certainly should not imagine that there is a unique representative for an equivalence class.
Proposition: Let $\sim$ be an equivalence relation on a set $S$. Then the equivalence classes of $\sim$ on $S$ are mutually disjoint sets, and their union is all of $S$.
Proof: The fact that the union of the equivalence classes is the whole thing is not so amazing: given $x \in S$, $x$ certainly lies inside the equivalence class
$$\{y \in S : y \sim x\}$$
Now let $A$ and $B$ be two equivalence classes. Suppose that $A \cap B \neq \emptyset$, and show that then $A = B$ (as sets). Since the intersection is non-empty, there is some element $y \in A \cap B$. Then, by the definition of "equivalence class", for all $a \in A$ we have $a \sim y$, and likewise for all $b \in B$ we have $b \sim y$. By symmetry and transitivity, $a \sim b$. This is true for all $a \in A$ and $b \in B$, so (since $A$ and $B$ are equivalence classes) we have $A = B$. Done.
A set $\mathcal{S}$ of non-empty subsets of a set $S$ whose union is the whole set $S$, and which are mutually disjoint, is called a partition of $S$. The previous proposition can be run the other direction, as well:
Proposition: Let $S$ be a set, and let $\mathcal{S}$ be a set of subsets of $S$, so that $\mathcal{S}$ is a partition of $S$. Define a relation $\sim$ on $S$ by $x \sim y$ if and only if there is $X \in \mathcal{S}$ so that $x \in X$ and $y \in X$. That is, $x \sim y$ if and only if they both lie in the same element of $\mathcal{S}$. Then $\sim$ is an equivalence relation, and its equivalence classes are the elements of $\mathcal{S}$.
Proof: Since the union of the sets in $\mathcal{S}$ is the whole set $S$, each element $x \in S$ is contained in some $X \in \mathcal{S}$. Thus, we have the reflexivity property $x \sim x$. If $x \sim y$ then there is $X \in \mathcal{S}$ containing both $x$ and $y$, and certainly $y \sim x$, so we have symmetry.
Finally, the mutual disjointness of the sets in $\mathcal{S}$ assures that each $y \in S$ lies in just one of the sets from $\mathcal{S}$. For $y \in S$, let $X$ be the unique set from $\mathcal{S}$ which contains $y$. If $x \sim y$ and $y \sim z$, then it must be that $x \in X$ and $z \in X$, since $y$ lies in no other subset from $\mathcal{S}$. Then $x$ and $z$ both lie in $X$, so $x \sim z$, and we have transitivity.
Verification that the equivalence classes are the elements of $\mathcal{S}$ is left as an exercise. Done.
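On a finite set, both directions of this correspondence between equivalence relations and partitions can be tried out by machine. The sketch below assumes the relation is supplied as a boolean function of two arguments and that it really is an equivalence relation; all function names are invented for this illustration. The example relation is the `same vertical line' relation, restricted to a small grid of points.

```python
def equivalence_classes(S, related):
    """Group the elements of the finite set S into equivalence classes."""
    classes = []
    for x in S:
        for cls in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})          # x starts a new class
    return {frozenset(cls) for cls in classes}

def is_partition(classes, S):
    """Non-empty, mutually disjoint subsets whose union is all of S."""
    union = set().union(*classes)
    total = sum(len(c) for c in classes)
    return all(len(c) > 0 for c in classes) and union == set(S) and total == len(set(S))

def relation_from_partition(classes):
    """The relation 'x and y lie in the same member of the partition'."""
    def related(x, y):
        return any(x in c and y in c for c in classes)
    return related

def same_vertical_line(p, q):
    return p[0] == q[0]

S = {(x, y) for x in range(3) for y in range(2)}
classes = equivalence_classes(S, same_vertical_line)
rebuilt = equivalence_classes(S, relation_from_partition(classes))
```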
#3.9 How many elements in the set $\{1, 2, 2, 3, 3, 4, 5\}$? How many in the set $\{1, 2, \{2\}, 3, \{3\}, 4, 5\}$? In $\{1, 2, \{2, 3\}, 3, 4, 5\}$?
#3.10 Let $A = \{1, 2, 3, 4, 5\}$ and $B = \{3, 4, 5, 6, 7\}$. List (without repetition) the elements of the sets $A \cup B$, $A \cap B$, and of $\{x \in A : x \notin B\}$.
#3.11 List all the elements of the power set (set of subsets) of $\{1, 2, 3\}$.
#3.12 Let $A = \{1, 2, 3\}$ and $B = \{2, 3\}$. List (without repetition) all the elements of the cartesian product set $A \times B$.
#3.13 How many functions are there from the set $\{1, 2, 3\}$ to the set $\{2, 3, 4, 5\}$?
#3.14 How many injective functions are there from $\{1, 2, 3\}$ to $\{1, 2, 3, 4\}$?
#3.15 How many surjective functions are there from $\{1, 2, 3, 4\}$ to $\{1, 2, 3\}$?
#3.16 Show that if $f : A \to B$ and $g : B \to C$ are functions with inverses, then $g \circ f$ has an inverse, and this inverse is $f^{-1} \circ g^{-1}$.
#3.17 Show that for a surjective function $f : A \to B$ there is a right inverse $g$, meaning a function $g : B \to A$ so that $f \circ g = \mathrm{id}_B$ (but not necessarily $g \circ f = \mathrm{id}_A$.)
#3.18 Show that for an injective function $f : A \to B$ there is a left inverse $g$, meaning a function $g : B \to A$ so that $g \circ f = \mathrm{id}_A$ (but not necessarily $f \circ g = \mathrm{id}_B$.)
#3.19 Give a bijection from the collection 2Z of even integers to the collection Z of all integers.
#3.20 (*) Give a bijection from the collection of all integers to the collection of non-negative integers.
#3.21 (**) Give a bijection from the collection of all positive integers to the collection of all rational numbers.
#3.22 (**) This illustrates a hazard in a too naive notion of "rule" for forming a set. Let $S$ be the set of all sets which are not an element of themselves. That is, let
$$S = \{\text{sets } x : x \notin x\}$$
Is $S \in S$ or is $S \notin S$? (Hint: Assuming either that $S$ is or isn't an element of itself leads to a contradiction. What's going on?)
4. Some counting principles
Here we go through some important but still relatively elementary examples of counting.
First example: Suppose we have $n$ distinct things, for example the integers from 1 to $n$ inclusive. The question is how many different orderings or ordered listings
$$i_1, i_2, i_3, \ldots, i_{n-1}, i_n$$
of these numbers are there? (Note that this is in contrast to the unordered listing in a set). The answer is obtained by noting that there are $n$ choices for the first thing $i_1$, then $n - 1$ remaining choices for the second thing $i_2$ (since we can't reuse whatever $i_1$ was!), $n - 2$ remaining choices for $i_3$ (since we can't reuse $i_1$ nor $i_2$, whatever they were!), and so on down to 2 remaining choices for $i_{n-1}$ and then just one choice for $i_n$. Thus, there are
$$n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1 = n!$$
possible orderings of $n$ distinct things.
Second example: How many subsets of $k$ elements are there in a set of $n$ things? There are $n$ possibilities for the first choice, $n - 1$ remaining choices for the second (since the first item is removed), $n - 2$ for the third (since the first and second items are no longer available), and so on down to $n - (k - 1)$ choices for the $k$th. This number is $n!/(n-k)!$, but is not what we want, since it includes a count of all different orders of choices. That is,
$$\frac{n!}{(n-k)!} = k! \cdot \text{the actual number}$$
since we saw in the previous example that there are $k!$ possible orderings of $k$ distinct things. Thus, there are
$$\binom{n}{k} = \frac{n!}{k! \, (n-k)!}$$
choices of subsets of $k$ elements in a set with $n$ elements. This appearance of a binomial coefficient is typical.
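The counts in the first two examples can be confirmed by brute-force enumeration for small cases. This sketch uses the standard-library generators `itertools.permutations` and `itertools.combinations`; the values $n = 5$ and $k = 3$ are arbitrary.

```python
from itertools import combinations, permutations
from math import factorial

n, k = 5, 3
things = range(1, n + 1)

# All orderings of n distinct things: expect n!.
orderings = sum(1 for _ in permutations(things))

# Ordered choices of k things without reuse: expect n!/(n-k)!.
ordered_choices = sum(1 for _ in permutations(things, k))

# Subsets of k elements: expect n!/(k!(n-k)!).
subsets = sum(1 for _ in combinations(things, k))
```

Each subset is counted $k!$ times among the ordered choices, which is the division step in the argument.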
Third example: How many disjoint pairs of subsets of $k$ elements each are there in a set with $n$ elements, where $2k \le n$? We just saw that there are $\binom{n}{k}$ choices for the first subset with $k$ elements. Then the remaining part of the original set has just $n - k$ elements, so there are $\binom{n-k}{k}$ choices for the second subset of $k$ elements. But our counting so far accidentally takes into account a first subset and a second one, which is not what the question is. By now we know that there are $2! = 2$ choices of ordering of two things (subsets, for example). Therefore, there are
$$\frac{1}{2} \binom{n}{k} \binom{n-k}{k} = \frac{1}{2} \cdot \frac{n!}{(n-k)! \, k!} \cdot \frac{(n-k)!}{k! \, (n-2k)!} = \frac{n!}{2 \, k! \, k! \, (n-2k)!}$$
pairs of disjoint subsets of $k$ elements each inside a set with $n$ elements.
Generalizing the previous: For integers $n, \ell, k$ with $n \ge k\ell$, we could ask how many families of $\ell$ disjoint subsets of $k$ elements each are there inside a set of $n$ elements? There are
$$\binom{n}{k}$$
choices for the first subset,
$$\binom{n-k}{k}$$
for the second,
$$\binom{n-2k}{k}$$
for the third, up to
$$\binom{n-(\ell-1)k}{k}$$
for the $\ell$th subset. But since ordering of these subsets is accidentally counted here, we have to divide by $\ell!$ to have the actual number of families. There is some cancellation among the factorials, so that the actual number is
$$\frac{n!}{\ell! \, (k!)^{\ell} \, (n - \ell k)!}$$
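The closed formula for families of disjoint subsets can be compared against a direct enumeration when $n$ is small. This is only a sketch; the function names are hypothetical, the parameter `l` stands for $\ell$, and the brute-force count is practical only for small inputs.

```python
from itertools import combinations
from math import factorial

def families_formula(n, k, l):
    """n! / (l! (k!)^l (n - l*k)!), the count derived in the text."""
    return factorial(n) // (factorial(l) * factorial(k) ** l * factorial(n - l * k))

def families_brute_force(n, k, l):
    """Count unordered families of l mutually disjoint k-element subsets of {1..n}."""
    subsets = [frozenset(c) for c in combinations(range(1, n + 1), k)]
    count = 0
    for family in combinations(subsets, l):
        # The l subsets are mutually disjoint exactly when their union has l*k elements.
        if len(frozenset().union(*family)) == l * k:
            count += 1
    return count

pairs_in_8 = families_formula(8, 3, 2)   # disjoint pairs of 3-element subsets
```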
#4.23 How many different ways are there to reorder the set $\{1, 2, 3, 4\}$?
#4.24 How many choices of 3 things from the list $1, 2, 3, \ldots, 9, 10$ (without replacement)?
#4.25 How many subsets of $\{1, 2, 3, 4, 5, 6, 7\}$ with exactly 4 elements?
#4.26 How many different choices are there of an unordered pair of distinct numbers from the set $\{1, 2, \ldots, 9, 10\}$? How many choices of ordered pair?
#4.27 How many different choices are there of an ordered triple of numbers from the set $\{1, 2, \ldots, 9, 10\}$?
#4.28 How many subsets of all sizes are there of a set $S$ with $n$ elements? (Hint: Go down the list of all elements in the set: for each one you have 2 choices, to include it or to exclude it. Altogether how many choices?)
#4.29 How many pairs of disjoint subsets $A, B$ each with 3 elements inside the set $\{1, 2, 3, 4, 5, 6, 7, 8\}$?
5. The Integers
Divisibility
The division/reduction algorithm
Euclidean algorithm
Unique factorization
Multiplicative inverses modulo m
Integers modulo m
5.1 The integers
For two integers $d, n$, the integer $d$ divides $n$ (or is a divisor of $n$) if $n/d$ is an integer. This is equivalent to there being another integer $k$ so that $n = kd$. As equivalent terminology, we may also say that $n$ is a multiple of $d$ if $d$ divides $n$.
A divisor $d$ of $n$ is proper if it is neither $\pm n$ nor $\pm 1$. A multiple $N$ of $n$ is proper if it is not $\pm n$. The notation
$$d \mid n$$
is read as `$d$ divides $n$'. Notice that any integer $d$ divides 0, since $d \cdot 0 = 0$. On the other hand, the only integer 0 divides is itself.
A positive integer is prime if it has no proper divisors. That is, it has no divisors but itself, its negative, and $\pm 1$. Usually we only pay attention to positive primes.
The following is the simplest but far from most efficient test for primality. It does have the virtue that if a number is not prime then this process finds the smallest divisor $d > 1$ of the number.
Proposition: A positive integer $n$ is prime if and only if it is not divisible by any of the integers $d$ with $1 < d \le \sqrt{n}$.
Proof: First, if $d \mid n$ and $1 < d \le \sqrt{n}$, then the integer $n/d$ satisfies
$$\sqrt{n} \le \frac{n}{d} \le \frac{n}{2}$$
(where we are looking at inequalities among real numbers!). Therefore, neither of the two factors $d$ nor $n/d$ is 1 nor $n$. So $n$ is not prime.
On the other hand, suppose that $n$ has a proper factorization $n = d \cdot e$, where $e$ is the larger of the two factors. Then
$$d = \frac{n}{e} \le \frac{n}{d}$$
gives $d^2 \le n$, so $d \le \sqrt{n}$. Done.
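The test described by the proposition is usually called trial division. Here is a minimal Python sketch of it; the function names are invented for this illustration, and the code assumes the usual convention that 1 is not counted as a prime.

```python
from math import isqrt

def smallest_divisor(n):
    """Smallest divisor d of n with 1 < d <= sqrt(n), or None if there is none."""
    for d in range(2, isqrt(n) + 1):     # isqrt(n) is the integer part of sqrt(n)
        if n % d == 0:
            return d
    return None

def is_prime(n):
    """Trial-division primality test; by convention 1 is not counted as prime."""
    return n >= 2 and smallest_divisor(n) is None

small_primes = [p for p in range(1, 30) if is_prime(p)]
```

Because the candidates $d$ are tried in increasing order, the first one found is the smallest divisor greater than 1.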
Two integers $m, n$ are relatively prime or coprime if for every integer $d$, if $d \mid m$ and $d \mid n$ then $d = \pm 1$. Also
we may say that $m$ is prime to $n$ if they are relatively prime. For a positive integer $n$, the number of positive
integers less than $n$ and relatively prime to $n$ is denoted by $\varphi(n)$. This is called the Euler phi function.
(The trial-and-error approach to computing $\varphi(n)$ is suboptimal. We'll get a better method shortly.)
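The trial-and-error approach just mentioned can be sketched as follows. This is only an illustration of the definition as stated here (positive integers less than $n$ and prime to $n$); the name `phi_naive` is hypothetical, and the common-divisor test is delegated to Python's standard `math.gcd`.

```python
from math import gcd


def phi_naive(n: int) -> int:
    """Count the positive integers i < n with gcd(i, n) == 1, straight from the definition."""
    if n < 1:
        raise ValueError("n must be positive")
    return sum(1 for i in range(1, n) if gcd(i, n) == 1)
```

The loop does about $n$ gcd computations, which is why a better method is worth having.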
Proposition:
If $a \mid b$ and $b \mid c$ then $a \mid c$.
If $d \mid x$ and $d \mid y$, then for any integers $a, b$ we have $d \mid (ax + by)$.
Proof: If $a \mid b$ then there is an integer $k$ so that $ak = b$. If $b \mid c$ then there is an integer $\ell$ so that $b\ell = c$. Then,
replacing $b$ by $ak$ in the latter equation, we have
$$c = b\ell = (ak) \cdot \ell = a \cdot (k\ell)$$
so $a \mid c$.
If $d \mid x$ then there is an integer $m$ so that $dm = x$. If $d \mid y$ then there is an integer $n$ so that $dn = y$. Then
$$ax + by = a(md) + b(nd) = (am + bn) \cdot d$$
Thus, $ax + by$ is a multiple of $d$.
Done.
5.2 The division/reduction algorithm
For a non-zero integer m, there is the process of reduction modulo m, which can be applied to
arbitrary integers N . At least if m and N are positive, this is exactly the division-with-remainder process
of elementary arithmetic, with the quotient discarded: the reduction modulo m of N is the remainder
when N is divided by m. This procedure is also called the Division Algorithm, for that reason.
More precisely, the reduction modulo $m$ of $N$ is the unique integer $r$ so that $N$ can be written as
$$N = q \cdot m + r$$
with an integer $q$ and with
$$0 \le r < |m|$$
(Very often the word `modulo' is abbreviated as `mod'.) The non-zero integer $m$ is the modulus. We
will use the notation
r % m = reduction of r modulo m
For example,
10 % 7 = 3
10 % 5 = 0
12 % 2 = 0
15 % 7 = 1
100 % 7 = 2
1000 % 2 = 0
1001 % 2 = 1
In some sources, and sometimes for brevity, this terminology is abused by replacing the phrase `N
reduced mod m' by `N mod m'. This is not so terrible, but there is also a related but significantly different
meaning that `N mod m' has, as we will see later. Usually the context will make clear what the phrase
`N mod m' means, but watch out. We will use a notation which is fairly compatible with many computer
languages: write
x%m
for
x reduced modulo m
Reductions mod m can be computed by hand by the familiar long division algorithm. For m and N
both positive, even a simple hand calculator can be used to easily compute reductions. For example: divide
N by m, obtaining a decimal. Remove (by subtracting) the integer part of the decimal, and multiply back
by m to obtain the reduction mod m of N.
The process of reduction mod m can also be applied to negative integers. For example,
-10 % 7 = 4 since $-10 = (-2) \cdot 7 + 4$
-10 % 5 = 0 since $-10 = (-2) \cdot 5 + 0$
-15 % 7 = 6 since $-15 = (-3) \cdot 7 + 6$
But neither the hand algorithm nor the calculator algorithm mentioned above gives the correct output directly:
for one thing, it is not true that the reduction mod $m$ of $-N$ is the negative of the reduction mod $m$ of $N$.
And all our reductions mod $m$ are supposed to be non-negative, besides. For example,
$$10 = 1 \cdot 7 + 3$$
shows that the reduction of 10 mod 7 is 3, but if we simply negate both sides of this equation we get
$$-10 = (-1) \cdot 7 + (-3)$$
That `-3' does not fit our requirements. The trick is to add another multiple of 7 to that `-3', while subtracting
it from the $(-1) \cdot 7$, getting
$$-10 = (-1 - 1) \cdot 7 + (-3 + 7)$$
or finally
$$-10 = (-2) \cdot 7 + 4$$
And there is one last `gotcha': in case the remainder is 0, as in
$$14 = 2 \cdot 7 + 0$$
when we negate to get
$$-14 = (-2) \cdot 7 + 0$$
nothing further needs to be done, since that 0 is already in the right range. (If we did add another 7 to it,
we'd be in the wrong range). Thus, in summary, let $r$ be the reduction of $N$ mod $m$. Then the reduction of
$-N$ mod $m$ is $m - r$ if $r \ne 0$, and is 0 if $r = 0$.
The modulus can be negative, as well: however, it happens that always the reduction of $N$ modulo $m$
is just the reduction of $N$ mod $|m|$, so this introduces nothing new.
Note that by our definition the reduction mod $m$ of any integer is always non-negative. This is at
variance with several computer languages, where the reduction of a negative integer $-N$ is the negative of
the reduction of $N$. This difference has to be remembered when writing code.
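To make the difference concrete, here is a small sketch in Python. It relies on the documented behavior that Python's `%` on integers returns a result with the sign of the divisor; the helper names `reduce_mod` and `truncated_remainder` are illustrative, and the second one only imitates the convention of languages such as C, where the remainder takes the sign of the dividend.

```python
def reduce_mod(n: int, m: int) -> int:
    """Reduction of n modulo m in the sense of the text: the r with 0 <= r < |m|."""
    if m == 0:
        raise ValueError("the modulus must be non-zero")
    # Dividing by |m| > 0 makes Python's % land in the range 0 <= r < |m|.
    return n % abs(m)


def truncated_remainder(n: int, m: int) -> int:
    """Remainder carrying the sign of n, as C's % computes it (for comparison only)."""
    if m == 0:
        raise ValueError("the modulus must be non-zero")
    r = abs(n) % abs(m)
    return -r if n < 0 else r
```

With these, `reduce_mod(-10, 7)` is 4 as in the examples above, while `truncated_remainder(-10, 7)` is -3, which is the value that needs the correction described in the text.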
Last, let's prove existence and uniqueness of the quotient and remainder in the assertion of the Reduction/Division Algorithm:
Proposition: Given a non-zero integer $m$ and arbitrary integer $n$, there are unique integers $q$ and $r$ so that
$0 \le r < |m|$ and
$$n = q \cdot m + r$$
Proof: For simplicity, we'll do the proof just for $m > 0$. The case that $m < 0$ is very similar. For fixed $n$
and $m$, let $X$ be the collection of all integers of the form $n - x \cdot m$. Since $x$ can be positive or negative, and
since $m$ is not 0, $X$ contains both positive and negative integers. Let $r$ be the least non-negative integer in $X$,
and let $q$ be the corresponding `x', so that $n - qm = r$.
First, we claim that $0 \le r < |m|$. If $r \ge |m|$, then $r - m \ge 0$. Since $r - m$ is writeable as $n - (q + 1)m$,
it is in the collection $X$. But $r - m < r$, contradicting the fact that $r$ is the smallest non-negative integer in $X$.
Thus, it could not have been that $r \ge |m|$, and we conclude that $r < |m|$, as desired.
Next, we prove uniqueness of the $q$ and $r$. Suppose that
$$qm + r = q'm + r'$$
with $0 \le r < |m|$ and $0 \le r' < |m|$. By symmetry, we can suppose that $r \le r'$ (if not, reverse the roles of $r$ and
$r'$ in the discussion). Then
$$(q - q') \cdot m = r' - r$$
and $r' - r \ge 0$. If $r' - r \ne 0$ then necessarily $q - q' \ne 0$, but if so then
$$r' - r = |r' - r| = |q - q'| \cdot |m| \ge 1 \cdot |m|$$
(Again, $r' - r = |r' - r|$ since $r' - r \ge 0$). But
$$r' - r \le r' < |m|$$
Putting these together, we get the impossible
$$|m| \le r' - r < |m|$$
This contradicts the supposition that $r \ne r'$. Therefore, $r = r'$. Then, from $(q - q')m = r' - r = 0$ (and
$m \ne 0$) we get $q = q'$, as well. This proves the uniqueness. Done.
Remark: The assertion that any (non-empty) collection of non-negative integers has a least element is the
Well-Ordering Principle, applied here to the non-negative integers.
Proposition: Let $n$ and $N$ be two integers, with $n \mid N$. Then for any integer $x$
(x % N) % n = x % n
Proof: Write $N = kn$ for some integer $k$, and let $x = Q \cdot N + R$ with $0 \le R < |N|$. This $R$ is the reduction
of $x$ mod $N$. Further, let $R = q \cdot n + r$ with $0 \le r < |n|$. This $r$ is the reduction of $R$ mod $n$. Then
$$x = QN + R = Q(kn) + qn + r = (Qk + q) \cdot n + r$$
So $r$ is also the reduction of $x$ modulo $n$.
Done.
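A quick numerical sanity check of this proposition, as a sketch: the function name `nested_reduction_agrees` is hypothetical, and the moduli are assumed positive so that Python's `%` agrees with the reduction defined above.

```python
def nested_reduction_agrees(n: int, N: int, xs) -> bool:
    """Check (x % N) % n == x % n for every x in xs, with positive moduli n and N."""
    if n <= 0 or N <= 0:
        raise ValueError("this sketch assumes positive moduli")
    return all((x % N) % n == x % n for x in xs)
```

The hypothesis that $n$ divides $N$ matters: with $N = 5$, $n = 3$ and $x = 7$, reducing first mod 5 gives 2, which stays 2 mod 3, whereas 7 reduced mod 3 directly is 1.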
5.3 Greatest common divisors, least common multiples
An integer $d$ is a common divisor of a family of integers $n_1, \ldots, n_m$ if $d$ divides each one of the integers
$n_i$. An integer $N$ is a common multiple of a family of integers $n_1, \ldots, n_m$ if $N$ is a multiple of each of the
integers $n_i$.
Theorem: Let $m, n$ be integers, not both zero. Among all common divisors of $m, n$ there is a unique one,
call it $d$, so that for every other common divisor $e$ of $m, n$ we have $e \mid d$, and also $d > 0$. This divisor $d$ is the
greatest common divisor or gcd of $m, n$. The greatest common divisor of two integers $m, n$ (not both zero)
is the least positive integer of the form $xm + yn$ with $x, y \in \mathbf{Z}$.
Remark: The theorem gives a strange and possibly very counter-intuitive characterization of the greatest
common divisor of two integers. However, it is this characterization which is necessary to prove things,
not the more intuitive picture one might have of the gcd in terms of factorization into primes. A strange
situation.
Remark: The greatest common divisor of $m, n$ is denoted $\gcd(m, n)$. Two integers are relatively prime or
coprime if their greatest common divisor is 1. Also we may say that m is prime to n if they are relatively
prime.
Proof: Let $D = x_0 m + y_0 n$ be the least positive integer expressible in the form $xm + yn$. First, we show that
any divisor $d$ of both $m$ and $n$ surely divides $D$. Write $m = m'd$ and $n = n'd$ with $m', n' \in \mathbf{Z}$. Then
$$D = x_0 m + y_0 n = x_0 (m'd) + y_0 (n'd) = (x_0 m' + y_0 n') \cdot d$$
which certainly presents $D$ as a multiple of $d$.
On the other hand, apply the Division Algorithm to write $m = qD + r$ with $0 \le r < D$. Then
$$0 \le r = m - qD = m - q(x_0 m + y_0 n) = (1 - q x_0) \cdot m + (-q y_0) \cdot n$$
That is, this $r$ is also expressible as $x'm + y'n$ for integers $x', y'$. Since $r < D$, and since $D$ is the smallest
positive integer so expressible, it must be that $r = 0$. Therefore, $D \mid m$. Similarly, $D \mid n$. Done.
A companion or `dual' notion concerning multiples instead of divisors is:
Corollary: Let $m, n$ be integers, not both zero. Among all common multiples of $m, n$ there is a unique one,
call it $N$, so that for every other common multiple $M$ of $m, n$ we have $N \mid M$, and also $N > 0$. This multiple
$N$ is the least common multiple or lcm of $m, n$.
Remark: If we already have the prime factorizations of two numbers $m, n$, then we can easily find the
greatest common divisor and least common multiple. Specifically, for each prime number $p$, the power of $p$
dividing the gcd is the minimum of the powers of $p$ dividing $m$ and dividing $n$. Since this is true for each
prime, we know the prime factorization of the greatest common divisor. For example,
$$\gcd(2^3 \cdot 3^5 \cdot 5^2 \cdot 11,\; 3^2 \cdot 5^3 \cdot 7^2 \cdot 11^2) = 3^2 \cdot 5^2 \cdot 11$$
since $2^0$ is the smaller of the two powers of 2 occurring, $3^2$ is the smaller of the two powers of 3 occurring,
$5^2$ is the smaller of the two powers of 5 occurring, $7^0$ is the smaller of the two powers of 7 occurring, and
$11^1$ is the smaller of the two powers of 11 occurring.
Similarly, the least common multiple is obtained by taking the larger of the two powers of each prime
occurring in the factorizations of $m, n$.
However, we will see that this approach to computing greatest common divisors or least common multiples
(by way of prime factorizations) is very inefficient.
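The min/max-of-exponents recipe can be sketched in Python as below. Representing a factorization as a dictionary from primes to exponents is an assumption made for this illustration, as are the function names; `math.prod` (Python 3.8 and later) is used only to multiply a factorization back out.

```python
from math import prod


def gcd_from_factorizations(f: dict, g: dict) -> dict:
    """Exponent of each prime in the gcd: the minimum of the two exponents."""
    # A prime missing from one factorization has exponent 0 there, so it drops out.
    return {p: min(f[p], g[p]) for p in f.keys() & g.keys()}


def lcm_from_factorizations(f: dict, g: dict) -> dict:
    """Exponent of each prime in the lcm: the maximum of the two exponents."""
    return {p: max(f.get(p, 0), g.get(p, 0)) for p in f.keys() | g.keys()}


def value(f: dict) -> int:
    """Multiply out a factorization {prime: exponent}."""
    return prod(p ** e for p, e in f.items())
```

Applied to the example above, with `{2: 3, 3: 5, 5: 2, 11: 1}` and `{3: 2, 5: 3, 7: 2, 11: 2}`, the gcd factorization comes out as $3^2 \cdot 5^2 \cdot 11$.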
5.4 Euclidean Algorithm
The Euclidean Algorithm is a very important and non-obvious systematic procedure to find the
greatest common divisor $d$ of two integers $m, n$, and also to find integers $x, y$ so that
xm + yn = d
As we'll see just below, each step in the Euclidean Algorithm is an instance of the Division algorithm.
One important aspect of the Euclidean Algorithm is that it avoids factorization of integers into primes,
and at the same time is a reasonably fast algorithm to accomplish its purpose. This is true at the level of
hand calculations and for machine calculations, too.
Unlike the Division Algorithm, for which we didn't bother to describe the actual procedure but just the
outcome, the Euclidean Algorithm needs description, which we do now, in examples.
To perform the Euclidean Algorithm for the two integers 513; 614:
$614 - 1 \cdot 513 = 101$ (reduction of 614 mod 513)
$513 - 5 \cdot 101 = 8$ (reduction of 513 mod 101)
$101 - 12 \cdot 8 = 5$ (reduction of 101 mod 8)
$8 - 1 \cdot 5 = 3$ (reduction of 8 mod 5)
$5 - 1 \cdot 3 = 2$ (reduction of 5 mod 3)
$3 - 1 \cdot 2 = 1$ (reduction of 3 mod 2)
Notice that the first step is reduction of the larger of the given numbers modulo the smaller of the two.
The second step is reduction of the smaller of the two modulo the remainder from the first step. At each
step, the `modulus' of the previous step becomes the `dividend' for the next step, and the `remainder' from
the previous step becomes the `modulus' for the next step.
In this example, since we obtained a 1 as a remainder, we know that the greatest common divisor of 614
and 513 is just 1, that is, that 614; 513 are relatively prime. By the time we got close to the end, it could
have been clear that we were going to get 1 as the gcd, but we carried out the procedure to the bitter end.
Notice that we did not need to find prime factorizations in order to use the Euclidean Algorithm to find
the greatest common divisor. Since it turns out to be a time-consuming task to factor numbers into primes,
this fact is worth something.
As another example, let's find the gcd of 1024 and 888:
$1024 - 1 \cdot 888 = 136$ (reduction of 1024 mod 888)
$888 - 6 \cdot 136 = 72$ (reduction of 888 mod 136)
$136 - 1 \cdot 72 = 64$ (reduction of 136 mod 72)
$72 - 1 \cdot 64 = 8$ (reduction of 72 mod 64)
$64 - 8 \cdot 8 = 0$ (reduction of 64 mod 8)
In this case, since we got a remainder 0, we must look at the remainder on the previous line: 8. The
conclusion is that 8 is the greatest common divisor of 1024 and 888.
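The procedure in these two examples can be written as a short loop. The sketch below (the name `euclid_steps` is an illustrative choice) records each line as a tuple `(dividend, quotient, modulus, remainder)` and always runs until the remainder is 0, so for 614 and 513 it produces one more line than the hand computation above, which stopped at remainder 1.

```python
def euclid_steps(m: int, n: int):
    """Return (gcd, steps) where each step is (dividend, quotient, modulus, remainder)."""
    a, b = max(abs(m), abs(n)), min(abs(m), abs(n))
    steps = []
    while b != 0:
        q, r = divmod(a, b)          # a = q*b + r with 0 <= r < b
        steps.append((a, q, b, r))
        a, b = b, r                  # old modulus becomes dividend, remainder becomes modulus
    return a, steps
```

The gcd returned is the last non-zero remainder, which is the value of `a` when the loop stops.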
So far we've only seen how to find $\gcd(m, n)$. For small numbers we might feel that it's not terribly hard
to do this just by factoring $m, n$ into primes and comparing factorizations, as mentioned above. However,
the problem of finding integers $x, y$ so that
$$\gcd(m, n) = xm + yn$$
is much more of a hassle even for relatively small integers $m, n$.
The Euclidean Algorithm provides means to find these $x, y$ with just a little more trouble, requiring that
we have kept track of all the numbers occurring in the Euclidean Algorithm, and that we run it backward,
as follows.
In the case of 614 and 513:
$1 = 3 - 1 \cdot 2$ (last line of algorithm)
$= 3 - 1 \cdot (5 - 1 \cdot 3)$ (replacing 2 by its expression from the previous line)
$= -1 \cdot 5 + 2 \cdot 3$ (rearranging as sum of 5's and 3's)
$= -1 \cdot 5 + 2 \cdot (8 - 1 \cdot 5)$ (replacing 3 by its expression from the previous line)
$= 2 \cdot 8 - 3 \cdot 5$ (rearranging as sum of 8's and 5's)
$= 2 \cdot 8 - 3 \cdot (101 - 12 \cdot 8)$ (replacing 5 by its expression from the previous line)
$= -3 \cdot 101 + 38 \cdot 8$ (rearranging as sum of 101's and 8's)
$= -3 \cdot 101 + 38 \cdot (513 - 5 \cdot 101)$ (replacing 8 by its expression from the previous line)
$= 38 \cdot 513 - 193 \cdot 101$ (rearranging as sum of 513's and 101's)
$= 38 \cdot 513 - 193 \cdot (614 - 513)$ (replacing 101 by its expression from the previous line)
$= 231 \cdot 513 - 193 \cdot 614$ (rearranging as sum of 614's and 513's)
That is, we have achieved our goal: we now know that
$$1 = 231 \cdot 513 - 193 \cdot 614$$
In order to successfully execute this algorithm, it is important to keep track of which numbers are mere
coefficients, and which are the numbers to be replaced by more complicated expressions coming from the
earlier part of the algorithm. Thus, there is considerable reason to write it out as done just here, with the
coefficients first, with the numbers to be substituted-for second.
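For machine computation, the bookkeeping is often organized differently from the back-substitution shown here: the coefficients are carried forward alongside the remainders, so nothing has to be stored and run backward. The following sketch uses that forward variant (a swapped-in technique, not the hand method of the text); the name `extended_gcd` is illustrative, and non-negative inputs, not both zero, are assumed.

```python
def extended_gcd(m: int, n: int):
    """Return (d, x, y) with x*m + y*n == d == gcd(m, n), for m, n >= 0 not both zero."""
    # Invariants: old_r == old_x*m + old_y*n  and  r == x*m + y*n.
    old_r, r = m, n
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y
```

Called as `extended_gcd(513, 614)`, this returns `(1, 231, -193)`, matching the identity obtained by hand.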
5.5 Unique factorization: introduction
The fact that integers factor uniquely as products of primes is probably well-known to all of us from
our experience with integers. And it is provably true. This is a very special case of the unique factorization
we'll prove later for Euclidean rings. In this subsection we'll just make a precise statement of facts.
Theorem: Unique Factorization Every integer $n$ can be written in an essentially unique way as $\pm$ a
product of primes:
$$n = \pm\, p_1^{e_1} p_2^{e_2} \cdots p_m^{e_m}$$
with positive integer exponents and distinct primes $p_1, \ldots, p_m$.
Remark: The `essentially unique' means that of course writing the product in a different order does not
count as truly `different'. The use of the word `distinct' is typical of mathematics usage: it means `no two
of them are the same'. (This is a sharpening of the more colloquial use of `different'.) The $\pm$ in the theorem
is necessary since $n$ might be negative but prime numbers themselves are positive.
The proof of the theorem starts from the following key lemma, which may feel obvious, but is not.
Lemma: Let $p$ be a prime number, and suppose that $a$ and $b$ are integers, with $p \mid (ab)$. Then either $p \mid a$ or
$p \mid b$ (or both).
Proof: This proof is surely one of the least-expected arguments in elementary number theory! Suppose that
$p \mid ab$ but $p \nmid a$, and show that $p \mid b$. Let $ab = mp$ for some integer $m$. Since $p$ is prime and does not divide $a$,
$\gcd(p, a) = 1$. Thus, there are integers $s, t$ so that $sp + ta = 1$. Then
$$b = b \cdot 1 = b \cdot (sp + ta) = bsp + bta = bsp + tmp = p \cdot (bs + tm)$$
Visibly $b$ is a multiple of $p$. Done.
Corollary: (of Lemma) If a prime $p$ divides a product $a_1 a_2 \cdots a_n$ then necessarily $p$ divides at least one of
the factors $a_i$.
5.6 Multiplicative inverses modulo m
This notion of `inverse' has no concrete connection to the elementary idea of inverse, but abstractly it
is very similar. The Euclidean algorithm also gives an efficient method for computation of inverses modulo
$m$.
A multiplicative inverse mod $m$ of an integer $N$ is another integer $t$ so that $N \cdot t$ % $m = 1$. It is
important to realize that this new notion of `inverse' has no tangible relation to more elementary notions of
`inverse'.
For example, since $2 \cdot 3 = 6$ which reduces mod 5 to 1, we can say that 3 is a multiplicative inverse
mod 5 to 2. This is not to say that `$3 = \frac{1}{2}$' or `$3 = 0.5$' or any such thing. As another example, 143 is a
multiplicative inverse to 7 modulo 100, since $7 \cdot 143 = 1001$, which reduces mod 100 to 1. On the other
hand, we can anticipate that, for example, 2 has no multiplicative inverse modulo 10, because any multiple
$2 \cdot t$ is an even number, but all expressions $q \cdot 10 + 1$ are odd.
Theorem: Fix a non-zero modulus $m$. An integer $x$ has a multiplicative inverse modulo $m$ if and only if
$\gcd(x, m) = 1$. If $\gcd(x, m) = 1$, let $s, t$ be integers so that $sx + tm = 1$. Then $s$ is a multiplicative inverse
of $x$ modulo $m$.
Remark: The Euclidean algorithm provides an efficient method to find expressions $\gcd(x, m) = sx + tm$,
so thereby provides an efficient method to find multiplicative inverses.
Proof: If $x$ has a multiplicative inverse $y$ modulo $m$, then
$$xy = 1 + \ell m$$
for some integer $\ell$. Rearranging, this is
$$1 = xy - \ell m$$
Thus, if $d \mid x$ and $d \mid m$ then $d \mid 1$, from which follows that $x$ and $m$ are relatively prime.
On the other hand, suppose that $\gcd(x, m) = 1$. From above, we know that the gcd is expressible as
$$1 = \gcd(x, m) = sx + tm$$
for some $s, t \in \mathbf{Z}$. Rearranging this equation, we have
$$sx = 1 + (-t)m$$
which shows that $sx \equiv 1 \bmod m$. Thus, this $s$ is a multiplicative inverse of $x$ modulo $m$.
Done.
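The theorem gives a recipe for computing inverses: find $s$ with $sx + tm = 1$ and reduce $s$ mod $m$. A minimal sketch follows, under the assumption that only the coefficient $s$ of $x$ is tracked (the coefficient $t$ of $m$ is not needed); the name `inverse_mod` is illustrative.

```python
def inverse_mod(x: int, m: int) -> int:
    """Return the reduced multiplicative inverse of x modulo m, if gcd(x, m) == 1."""
    if m == 0:
        raise ValueError("the modulus must be non-zero")
    m = abs(m)
    # Invariants (as congruences mod m): old_r = old_s * x  and  r = s * x.
    old_r, r = x % m, m
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("no inverse: gcd(x, m) is not 1")
    return old_s % m
```

For example, `inverse_mod(7, 100)` gives 43, which is the reduction mod 100 of the inverse 143 mentioned earlier, while `inverse_mod(2, 10)` raises an error, as anticipated.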
5.7 Integers modulo m
If two integers $x, y$ differ by a multiple of a non-zero integer $m$, we say that $x$ is congruent to $y$ modulo
$m$, written
$$x \equiv y \bmod m$$
Any relation such as the latter is called a congruence modulo $m$, and $m$ is the modulus.
Equivalently, $x \equiv y \bmod m$ if and only if $m \mid (x - y)$.
The idea of thinking of integers modulo m as necessarily having something to do with reduction modulo
m is seductive, but is a trap. If for no other reason than this, a somewhat richer vocabulary of concepts is
necessary in order to discuss more sophisticated things.
For example, $3 \equiv 18 \bmod 5$ because $5 \mid (18 - 3)$. Yes, indeed, this is `just' a different way of writing a
divisibility assertion. But this notation (due to Gauss, almost 200 years ago) is meant to cause us to think
of congruence as a variant of equality, with comparable features. That congruences really do have properties
similar to equality requires some proof, even though the proofs are not so hard. In giving the statements of
these properties the corresponding terminology is also introduced.
Proposition: For a fixed integer $m$, congruence modulo $m$ is an equivalence relation. That is,
Reflexivity: Always $x \equiv x \bmod m$ for any $x$.
Symmetry: If $x \equiv y \bmod m$ then $y \equiv x \bmod m$.
Transitivity: If $x \equiv y \bmod m$ and $y \equiv z \bmod m$ then $x \equiv z \bmod m$.
Proof: Since $x - x = 0$ and always $m \mid 0$, we have reflexivity. If $m \mid (x - y)$ then $m \mid (y - x)$ since $y - x = -(x - y)$.
Thus, we have symmetry. Suppose that $m \mid (x - y)$ and $m \mid (y - z)$. Then there are integers $k, \ell$ so that $mk = x - y$
and $m\ell = y - z$. Then
$$x - z = (x - y) + (y - z) = mk + m\ell = m \cdot (k + \ell)$$
This proves the transitivity. Done.
For any integer x, the collection of all integers y congruent to x modulo m is the congruence class of
x modulo m. This is also called the residue class of x modulo m, or equivalence class of x with respect
to the equivalence relation of congruence modulo m.
The integers mod $m$ is the collection of congruence classes of integers with respect to the equivalence
relation congruence modulo $m$. It is denoted $\mathbf{Z}/m$ (or sometimes $\mathbf{Z}_m$).
Given an integer $x$ and a modulus $m$, the equivalence class
$$\{y \in \mathbf{Z} : y \equiv x \bmod m\}$$
of $x$ modulo $m$ is often denoted $\bar{x}$, and is also called the congruence class or residue class of $x$ mod $m$.
On other occasions, the bar notation is not used at all, so that x-mod-m may be written simply as `x' with
only the context to make clear that this means x-mod-m and not the integer x.
Thus, for example, modulo 12 we have
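$$\bar{0} = \overline{12} = \overline{-12} = \overline{2400}$$
$$\bar{7} = \bar{7} = \overline{-5} = \overline{2407}$$
$$\bar{1} = \overline{13} = \overline{-11} = \overline{2401}$$
or, equivalently,
0-mod-12 = 12-mod-12 = -12-mod-12 = 2400-mod-12
7-mod-12 = 7-mod-12 = -5-mod-12 = 2407-mod-12
1-mod-12 = 13-mod-12 = -11-mod-12 = 2401-mod-12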
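These equalities of classes are just divisibility statements, so they can be checked mechanically. A small sketch, where the predicate name `congruent` is an illustrative choice:

```python
def congruent(x: int, y: int, m: int) -> bool:
    """True when x and y are congruent modulo m, that is, when m divides x - y."""
    if m == 0:
        raise ValueError("the modulus must be non-zero")
    return (x - y) % m == 0
```

For instance, `congruent(7, -5, 12)` holds because $7 - (-5) = 12$, and `congruent(3, 18, 5)` holds because 5 divides 15.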
Remark: There is one traditionally popular collection of representatives for the equivalence classes modulo
$m$, namely
$$\{0, 1, 2, \ldots, m - 2, m - 1\}$$
In fact, some flawed sources define integers-mod-m as being this set of things, but this is too naive an
understanding of what kind of thing integers-mod-m really is. We should distinguish the set of integers
reduced mod $m$ (which really is $\{0, 1, 2, \ldots, m - 1\}$ !) from the set of integers modulo $m$, which is the set of
equivalence classes of integers modulo $m$. The latter is a more abstract object.
So while it is certainly true that (for example)
$$\mathbf{Z}/3 = \{\bar{0}, \bar{1}, \bar{2}\}$$
it is also true that
$$\mathbf{Z}/3 = \{\overline{10}, \overline{30}, \overline{-1}\}$$
and that there are many other ways of describing it as well.
Again: $\mathbf{Z}/m$ is not the set of integers $\{0, 1, 2, 3, \ldots, m - 1\}$. Rather, $\mathbf{Z}/m$ is the set of equivalence classes
modulo $m$. The set $\{0, 1, 2, 3, \ldots, m - 1\}$ is the set of integers reduced modulo $m$ (for which there is no special
symbol). Still, we do have:
Proposition: Fix two integers $x, x'$. Let $x = qm + r$ and $x' = q'm + r'$ with integers $q, q', r, r'$ and $0 \le r < |m|$
and $0 \le r' < |m|$. Then $x \equiv x' \bmod m$ if and only if $r \equiv r' \bmod m$.
Proof: If $x \equiv x' \bmod m$ then there is an integer $k$ so that $x' = x + km$. Then
$$r' = x' - q'm = (x + km) - q'm = x + m \cdot (k - q') = qm + r + m \cdot (k - q')$$
$$= r + m \cdot (q + k - q')$$
This proves that $r \equiv r' \bmod m$. The opposite direction of argument is similar. Done.
Beyond being just an equivalence relation, congruences behave very nicely with respect to the basic
arithmetic operations of addition, subtraction, and multiplication:
Proposition: For fixed modulus $m$, if $x \equiv x'$ then for all $y$
$$x + y \equiv x' + y \bmod m$$
$$xy \equiv x'y \bmod m$$
In fact, if $y \equiv y'$, then
$$x + y \equiv x' + y' \bmod m$$
$$x \cdot y \equiv x' \cdot y' \bmod m$$
Proof: It suffices to prove only the more general assertions. Since $x' \equiv x \bmod m$, $m \mid (x' - x)$, so there is an
integer $k$ so that $mk = x' - x$. That is, we have $x' = x + mk$. Similarly, we have $y' = y + \ell m$ for integer $\ell$.
Then
$$x' + y' = (x + mk) + (y + m\ell) = x + y + m \cdot (k + \ell)$$
Thus, $x' + y' \equiv x + y \bmod m$. And
$$x' \cdot y' = (x + mk) \cdot (y + m\ell) = x \cdot y + xm\ell + mky + mk \cdot m\ell = x \cdot y + m \cdot (x\ell + ky + mk\ell)$$
Thus, $x'y' \equiv xy \bmod m$. Done.
As a corollary of this last proposition, congruences immediately inherit some properties from ordinary
arithmetic, simply because $x = y$ implies $x \equiv y \bmod m$:
Distributivity: $x(y + z) \equiv xy + xz \bmod m$
Associativity of addition: $(x + y) + z \equiv x + (y + z) \bmod m$
Associativity of multiplication: $(xy)z \equiv x(yz) \bmod m$
Property of 1: $1 \cdot x \equiv x \cdot 1 \equiv x \bmod m$
Property of 0: $0 + x \equiv x + 0 \equiv x \bmod m$
We should feel reassured by these observations that we can do arithmetic `mod m' without anything
messing up. As a matter of notation, we write
$$\mathbf{Z}/m$$
for the integers mod $m$, viewing two integers $x, y$ as `the same' if $x \equiv y \bmod m$. Thus, there are only $m$
`things' in $\mathbf{Z}/m$, since there are only $m$ possibilities for what an integer can be congruent to mod $m$. Very
often, a person thinks of $0, 1, 2, \ldots, m - 2, m - 1$ as being the `things' in $\mathbf{Z}/m$, but this is not quite good
enough for all purposes.
And there are some more practical observations which also deserve emphasis:
$m \equiv 0 \bmod m$, and generally $km \equiv 0 \bmod m$ for any integer $k$.
$x + (-x) \equiv 0 \bmod m$
$x \pm m \equiv x \bmod m$, and generally $x + km \equiv x \bmod m$ for any integer $k$
Note that in all this discussion we only look at one modulus m at a time.
Corollary: For a fixed modulus $m$ in each residue class there is exactly one integer which is reduced mod
$m$. Therefore, $x \equiv y \bmod m$ if and only if $x$ and $y$ have the same reduction mod $m$, that is, have the same
remainder when divided by $m$ as in the Division/Reduction Algorithm.
Proof: Fix an integer $x$. Invoking the Reduction algorithm, there is a unique $0 \le r < |m|$ and an integer $q$
so that $x = qm + r$. Then $x - r = qm$ is divisible by $m$, so $x$ and $r$ are in the same residue class. Since $r$ is
reduced, this proves that there is at least one reduced representative for each residue class.
On the other hand, (reproving the uniqueness part of the Reduction Algorithm!), suppose that $x \equiv r'$
for $r'$ in the range $0 \le r' < |m|$. If $0 \le r < r'$, then
$$0 < r' - r = (r' - x) - (r - x)$$
is a multiple of $m$. Yet also $0 < r' - r \le r' < |m|$. But a multiple of $m$ cannot be $> 0$ and $< |m|$, so it cannot
be that $0 \le r < r'$. Or, supposing that $0 \le r' < r$, by a symmetrical argument we would again reach a
contradiction. Thus, $r = r'$. This proves the uniqueness. Done.
Corollary: Fix a modulus $m$, and integers $x$ and $y$. For brevity write
x % m
for $x$ reduced modulo $m$. Then
(x + y) % m = ((x % m) + (y % m)) % m
and
(x · y) % m = ((x % m) · (y % m)) % m
Proof: The residue class of $x' = (x \,\%\, m)$ is the same as the residue class of $x$ itself. Therefore, modulo $m$,
we have
$$((x \,\%\, m) + (y \,\%\, m)) \,\%\, m \equiv (x \,\%\, m) + (y \,\%\, m) \equiv x + y$$
since we proved that $x \equiv x'$ and $y \equiv y'$ gives $x + y \equiv x' + y'$. Further, similarly,
$$x + y \equiv (x + y) \,\%\, m$$
Thus, by transitivity,
$$((x \,\%\, m) + (y \,\%\, m)) \,\%\, m \equiv (x + y) \,\%\, m$$
Both sides are reduced mod $m$, and each residue class contains exactly one reduced integer, so the two sides are equal.
The same argument works for multiplication. Done.
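In practice this corollary is what allows intermediate results to be reduced at every step, keeping the numbers small. A minimal sketch, assuming a positive modulus so that Python's `%` coincides with the reduction used here; the names `add_mod` and `mul_mod` are illustrative.

```python
def add_mod(x: int, y: int, m: int) -> int:
    """(x + y) % m computed from the reductions of x and y."""
    return ((x % m) + (y % m)) % m


def mul_mod(x: int, y: int, m: int) -> int:
    """(x * y) % m computed from the reductions of x and y."""
    return ((x % m) * (y % m)) % m
```

With $m = 100$, for example, `mul_mod(1001, 143, 100)` only ever multiplies 1 by 43, yet agrees with reducing the full product $1001 \cdot 143$.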
One would correctly get the impression that all properties of congruences follow from properties of
ordinary equality together with properties of elementary arithmetic.
We return again to multiplicative inverses modulo $m$. That is, to find a multiplicative inverse mod $m$
for $a$, we want to solve for $x$ in the equation
$$ax \equiv 1 \bmod m$$
where the integer $a$ is given. Unless $a = \pm 1$ the solution $x = \frac{1}{a}$ of the equation $ax = 1$ is not an integer. But
that's not what's going on here. Rather, recall that if $\gcd(a, m) = 1$ then there are integers $x, y$ so that
$$ax + ym = \gcd(a, m) = 1$$
Then $ax - 1 = -ym$ is a multiple of $m$, so with this value of $x$
$$ax \equiv 1 \bmod m$$
Unless $a = \pm 1$, this $x$ can't possibly be $\frac{1}{a}$, if only because $\frac{1}{a}$ is not an integer. We are doing something new.
Recall that we did prove that $a$ has a multiplicative inverse if and only if $\gcd(a, m) = 1$, in which case
the Euclidean Algorithm is an effective means to actually find the inverse.
In light of the last observation, we have a separate notation for the integers-mod-m which are relatively
prime to m and hence have inverses:
$$(\mathbf{Z}/m)^\times$$
The superscript is not an `x', but is a `times', making a reference to multiplication and multiplicative inverses,
but mod m.
Proposition: The product xy of two integers x and y both prime to m is again prime to m.
Proof: One way to think about this would be in terms of prime factorizations, but let's do without that.
Rather, let's use the fact that the gcd of two integers $a, b$ can be expressed as
$$\gcd(a, b) = sa + tb$$
for some integers $s, t$. Thus, there are integers $a, b, c, d$ so that
$$1 = ax + bm$$
$$1 = cy + dm$$
Then
$$1 = 1 \cdot 1 = (ax + bm)(cy + dm) = (ac)(xy) + (bcy + axd + bdm)m$$
Thus, 1 is expressible in the form $A(xy) + Bm$, so (by the sharp form of this principle!) necessarily $xy$ and
$m$ are relatively prime. Done.
So in the batch of things denoted $(\mathbf{Z}/m)^\times$ we can multiply and take inverses (so, effectively, divide).
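To see this closure under multiplication in a concrete case, one can list the reduced representatives prime to $m$ and multiply them pairwise. The sketch below assumes $m \ge 1$ and uses the standard `math.gcd`; the names `units` and `is_closed_under_multiplication` are illustrative.

```python
from math import gcd


def units(m: int) -> list:
    """Reduced representatives x (0 <= x < m) with gcd(x, m) == 1."""
    return [x for x in range(m) if gcd(x, m) == 1]


def is_closed_under_multiplication(m: int) -> bool:
    """Check that the product of any two units mod m is again a unit mod m."""
    u = set(units(m))
    return all((a * b) % m in u for a in u for b in u)
```

For $m = 10$ the representatives are 1, 3, 7, 9, and for instance $3 \cdot 7 = 21$ reduces to 1, so 3 and 7 are inverses of each other mod 10.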
#5.30 Prove directly, from the very definition of divisibility, that if $d \mid x$ and $d \mid y$ then $d \mid (x - y)$ and $d \mid (x + y)$.
#5.31 Observe that 121, 1331, and 14641 cannot be prime, without computation.
#5.32 Find the greatest common divisor of 6, 10, 15.
#5.33 Find the least common multiple of 6, 10, 15.
#5.34 Find the greatest common divisor of 2, 4, 8, 16, 32, 64, 128.
#5.35 Find the least common multiple of 2, 4, 8, 16, 32, 64, 128.
#5.36 Show that for any integer $n$ if $d \mid n$ and $d \mid (n + 2)$ then $d \mid 2$.
#5.37 Show that for any integer $n$ the two integers $n$ and $n + 1$ are invariably relatively prime.
#5.38 Show that for any integer $n$ exactly one of $n, n + 2, n + 4$ is divisible by 3. In particular, except for
3, 5, 7, there are no triples of primes occurring in the pattern $n, n + 2, n + 4$.
#5.39 Show that for any integer $n$, the integers $n$ and $n^2 + 1$ are relatively prime.
#5.40 Prove that for any two integers $m, n$, the least common multiple $\mathrm{lcm}(m, n)$ exists, and $\mathrm{lcm}(m, n) =
m \cdot n / \gcd(m, n)$.
#5.41 Find the reduction mod 99 of 1000.
#5.42 Find the reduction mod 88 of -1000.
#5.43 Prove that the reduction mod 10 of a positive integer $N$ is simply the ones' place digit of $N$ in
decimal notation.
#5.44 Prove that the reduction mod 100 of a positive integer $N$ is the two-digit number made up of the
tens' and ones' place digits of $N$.
#5.45 Let $m$ be any non-zero integer. Prove that the reduction mod $-m$ of $N$ is the same as the reduction
mod $m$ of $N$.
#5.46 Prove in general that if $r$ is the reduction of $N$ mod $m$, and if $r \ne 0$, then $m - r$ is the reduction of
$-N$ mod $m$.
#5.47 Find $\gcd(1236, 4323)$ and express it in the form $1236x + 4323y$ for some integers $x, y$, by hand
computation.
#5.48 Find $\gcd(12367, 24983)$, and express it in the form $12367x + 24983y$, by hand computation.
#5.49 Find a proper factor of 111,111,111,111,111 without using a calculator.
#5.50 Prove/observe that the ones'-place digit of a decimal number is not sufficient information (by
itself) to determine whether the number is divisible by 3, or by 7.
#5.51 Explain why $2^m + 1$ cannot possibly be a prime number unless $m$ is a power of 2. (If it is prime then
it's called a Fermat prime.)
#5.52 Explain why $2^m - 1$ cannot possibly be a prime number unless $m$ is prime. (If it is prime then it's
called a Mersenne prime.)
#5.53 How many elements does the set $\mathbf{Z}/n$ have?
#5.54 How many elements does the set $\mathbf{Z}/30$ have?
#5.55 Find the multiplicative inverse of 3 modulo 100.
#5.56 Find the multiplicative inverse of 1001 modulo 1234.
6. Unique factorization into primes
We now can prove the unique factorization of integers into primes. This very possibly may already
seem `intuitively true', since after all our experience with small integers bears witness to the truth of the
assertion. And it is true, after all. But it is worth paying attention to how such a thing can be proven,
especially since we will later want to try to prove unique factorization for fancier kinds of `numbers', for
which our intuition is not adequate. Since it is not true in general that `all kinds' of numbers can be factored
uniquely into primes, we must be alert.
While we're here, we also give a formula for Euler's phi-function $\varphi(n)$, whose definition is
$\varphi(n)$ = number of integers $i$ in the range $0 < i < n$ relatively prime to $n$
We also look at the most naive primality test, as well as the most naive algorithm to obtain the
factorization of an integer into primes. To obtain the list of all primes less than a given bound, we mention
Eratosthenes' sieve, which is reasonably efficient for what it does.
Also, we can take this occasion to review some algebraic identities which occasionally provide shortcuts
in the otherwise potentially laborious task of ascertaining whether a given number is prime, and/or factoring
it into primes.
Theorem: Unique Factorization Every integer $n$ can be written in an essentially unique way as $\pm$ a
product of primes:
$$n = \pm\, p_1^{e_1} p_2^{e_2} \cdots p_m^{e_m}$$
with positive integer exponents and distinct primes $p_1, \ldots, p_m$.
Remark: The `essentially unique' means that of course writing the product in a different order does not
count as truly `different'. The use of the word `distinct' is typical of mathematics usage: it means `no two
of them are the same'. (This is a sharpening of the more colloquial use of `different'.) The $\pm$ in the theorem
is necessary since $n$ might be negative but prime numbers themselves are positive.
Corollary: Let $N$ be a positive integer factored into primes as
$$N = p_1^{e_1} p_2^{e_2} \cdots p_n^{e_n}$$
where $p_1, \ldots, p_n$ are distinct primes, and the exponents $e_i$ are all positive integers. Then the Euler
phi-function of $N$ has the value
$$\varphi(N) = (p_1 - 1)p_1^{e_1 - 1} (p_2 - 1)p_2^{e_2 - 1} \cdots (p_n - 1)p_n^{e_n - 1}$$
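The formula can be tried out numerically once a factorization is available. The sketch below obtains the factorization by the naive trial-division method and then applies the product formula; the function names `factor` and `phi` are illustrative, and a positive integer input is assumed. Note that for $N = 1$ the empty product gives 1.

```python
def factor(n: int) -> dict:
    """Naive trial-division factorization of n >= 1, returned as {prime: exponent}."""
    if n < 1:
        raise ValueError("n must be positive")
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:            # strip out every power of d
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                        # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi(n: int) -> int:
    """Euler phi-function via the product of (p - 1) * p**(e - 1) over the prime powers."""
    result = 1
    for p, e in factor(n).items():
        result *= (p - 1) * p ** (e - 1)
    return result
```

For example, $360 = 2^3 \cdot 3^2 \cdot 5$, so the formula gives $(1 \cdot 4)(2 \cdot 3)(4 \cdot 1) = 96$.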
The proof of the theorem starts from the following key lemma, which may feel obvious, but is not.
Lemma: Let $p$ be a prime number, and suppose that $a$ and $b$ are integers, with $p \mid (ab)$. Then either $p \mid a$ or
$p \mid b$, or both.
Proof: (of Lemma) If $p \mid a$ we are done. So suppose that $p$ does not divide $a$. Then the greatest common
divisor $\gcd(p, a)$ can't be $p$. But this greatest common divisor is also a divisor of $p$, and is positive. Since $p$
is prime, the only positive divisor of $p$ other than $p$ itself is just 1. Therefore, $\gcd(p, a) = 1$. We saw that
there exist integers $x, y$ so that $xp + ya = 1$.
Since $p \mid (ab)$, we can write $ab = hp$ for some integer $h$. Then
$$b = b \cdot 1 = b \cdot (xp + ya) = bxp + yba = (bx + yh) \cdot p$$
This shows that $b$ is a multiple of $p$. Done.
Corollary: (of Lemma) If a prime $p$ divides a product $a_1 a_2 \ldots a_n$ then necessarily $p$ divides at least one of
the factors $a_i$.
Proof: (of Corollary) This is by induction on $n$. The Lemma is the assertion for $n = 2$. Suppose $p \mid (a_1 \ldots a_n)$.
Then write the latter product as
$$a_1 \ldots a_n = (a_1 \ldots a_{n-1}) \cdot a_n$$
By the lemma, either $p$ divides $a_n$ or $p$ divides $a_1 a_2 \ldots a_{n-1}$. If $p \mid a_n$ we are done. If not, then $p \mid (a_1 \ldots a_{n-1})$.
By induction, this implies that $p$ divides one of the factors $a_1, a_2, \ldots, a_{n-1}$. Altogether, we conclude that in
any case $p$ divides one of the factors $a_1, \ldots, a_n$. Done.
Proof: (of Theorem) First we prove that for every integer there exists a factorization, and then that it is
unique. It certainly suffices to treat only factorizations of positive integers, since factorizations for $-n$ and
$n$ are obviously related.
For existence, suppose that some integer $n > 1$ did not have a factorization into primes, and let $n$ be the
smallest such integer. Then $n$ cannot be prime itself, or just "$n = n$" is a factorization into primes. Therefore
$n$ has a proper factorization $n = xy$ with $x, y > 1$. Since the factorization is proper, both $x$ and $y$ are strictly
smaller than $n$. Thus, by the minimality of $n$, both $x$ and $y$ can be factored into primes. Putting together the two
factorizations gives the factorization of $n$. This contradicts the assumption that there exist any integers
lacking prime factorizations.
Now prove uniqueness. Suppose we have
$$q_1^{e_1} \cdots q_m^{e_m} = N = p_1^{f_1} \cdots p_n^{f_n}$$
where (without loss of generality)
$$q_1 < q_2 < \ldots < q_m$$
are primes, and also
$$p_1 < p_2 < \ldots < p_n$$
are all primes. And the exponents $e_i$ and $f_i$ are positive integers. We must show that $m = n$, $q_i = p_i$ for all
$i$, and $e_i = f_i$ for all $i$.
Since q1 divides the left-hand side of the equality, it must divide the right-hand side. Therefore, by the
corollary to the lemma just above, q1 must divide one of the factors on the right-hand side. So q1 must
divide some pi . Since pi is prime, it must be that q1 = pi .
We claim that $i = 1$. Indeed, if $i > 1$ then $p_1 < p_i$. And $p_1$ divides the left-hand side, so divides one of
the $q_j$, so is equal to some $q_j$. But then $p_1 = q_j \ge q_1 = p_i > p_1$, which is impossible. Therefore, $q_1 = p_1$.
Further, by dividing through by e1 factors q1 = p1 , we see that the corresponding exponents e1 and f1
must also be equal.
The rest of the argument about uniqueness is by induction on N . First, 1 has a unique factorization (of
sorts), namely the empty product. In any case, since 2 is prime it has the factorization 2 = 2. This begins
the induction. Suppose that all integers $N' < N$ have unique factorizations into primes (and prove that $N$
likewise has a unique factorization).
From the equation
$$q_1^{e_1} \cdots q_m^{e_m} = N = p_1^{f_1} \cdots p_n^{f_n}$$
by dividing by $q_1^{e_1} = p_1^{f_1}$ we obtain
$$q_2^{e_2} \cdots q_m^{e_m} = \frac{N}{q_1^{e_1}} = p_2^{f_2} \cdots p_n^{f_n}$$
We had supposed that all the exponents $e_i$ were positive, so $N/q_1^{e_1} < N$. Thus, by induction, $N/q_1^{e_1}$ has
unique factorization, and we conclude that all the remaining factors must match up. This finishes the proof
of the unique factorization theorem. Done.
Now we prove the corollary, giving the formula for Euler's phi-function:
$$\varphi(N) = (p_1 - 1)p_1^{e_1 - 1} \, (p_2 - 1)p_2^{e_2 - 1} \cdots (p_n - 1)p_n^{e_n - 1}$$
where $N = p_1^{e_1} \cdots p_n^{e_n}$ is the factorization into distinct prime factors $p_i$, and all exponents are positive integers.
The argument is by counting: we'll count the number of numbers $x$ in the range from 0 through $N - 1$ which
do have a common factor with $N$, and subtract. And, by unique factorization, if $x$ has a common factor
with $N$ then it has a common prime factor with $N$. There are exactly $N/p_i$ numbers in that range divisible
by $p_i$, so we would be tempted to say that the number of numbers in that range with no common factor
with $N$ would be
$$N - \frac{N}{p_1} - \frac{N}{p_2} - \ldots - \frac{N}{p_n}$$
However, this is not correct in general: we have accounted for numbers divisible by two different $p_i$'s twice,
so we should add back in all the expressions $N/p_i p_j$ with $i < j$. But then we've added back in too many
things, and have to subtract all the expressions $N/p_i p_j p_k$ with $i < j < k$. And so on:
$$\varphi(N) = N - \sum_{i} \frac{N}{p_i} + \sum_{i < j} \frac{N}{p_i p_j} - \sum_{i < j < k} \frac{N}{p_i p_j p_k} + \ldots$$
$$= N \cdot \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_n}\right)$$
$$= p_1^{e_1} \left(1 - \frac{1}{p_1}\right) \cdot p_2^{e_2} \left(1 - \frac{1}{p_2}\right) \cdots p_n^{e_n} \left(1 - \frac{1}{p_n}\right)$$
$$= (p_1 - 1)p_1^{e_1 - 1} \, (p_2 - 1)p_2^{e_2 - 1} \cdots (p_n - 1)p_n^{e_n - 1}$$
This is the desired formula. Done.
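As a sanity check on the corollary, the product formula can be compared with a direct count of the integers in
the range 0 through $N - 1$ having no common factor with $N$. The following is only a minimal sketch in Python;
the function names (`prime_factorization`, `phi_by_formula`, `phi_by_counting`) are illustrative choices, not
notation from these notes.

```python
from math import gcd

def prime_factorization(n):
    """Return {prime: exponent} for an integer n >= 1, by naive trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                      # whatever is left over is a prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi_by_formula(n):
    """Euler phi as the product of (p - 1) * p**(e - 1) over the prime powers in n."""
    result = 1
    for p, e in prime_factorization(n).items():
        result *= (p - 1) * p ** (e - 1)
    return result

def phi_by_counting(n):
    """Count the x in 0..n-1 having no common factor with n."""
    return sum(1 for x in range(n) if gcd(x, n) == 1)

# The two computations should agree for every N tried.
assert all(phi_by_formula(n) == phi_by_counting(n) for n in range(1, 300))
```

The counting version takes time proportional to $N$, while the formula needs only the factorization of $N$.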
The most obvious (but not most efficient) means to obtain the prime factorization and simultaneously to
test primality of a positive integer $N$ is as follows. Attempt division by integers $d = 2, 3, 4, 5, 6, 7, \ldots \le \sqrt{N}$
until either the smallest divisor $d_1 > 1$ of $N$ is found, or it is determined that $N$ has no proper
divisors $\le \sqrt{N}$. In the latter case, $N$ is prime. In the former case, attempt division by integers
$d = d_1, d_1 + 1, d_1 + 2, \ldots \le \sqrt{N/d_1}$ until either the smallest divisor $d_2 > 1$ of $N/d_1$ is found, or it is determined
that $N/d_1$ has no proper divisors $\le \sqrt{N/d_1}$. In the latter case, $N/d_1$ is prime. In the former case, attempt
division by integers $d = d_2, d_2 + 1, d_2 + 2, \ldots \le \sqrt{N/d_1 d_2}$ until either the smallest divisor $d_3 > 1$ of $N/d_1 d_2$
is found, or it is determined that $N/d_1 d_2$ has no proper divisors $\le \sqrt{N/d_1 d_2}$. In the latter case $N/d_1 d_2$ is
prime. In the former case...
This recursive procedure ends when some $N/d_1 d_2 \ldots d_m$ is prime. At the same time, if $N$ has no divisor
$d$ in the range $1 < d \le \sqrt{N}$ then $N$ is prime.
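The procedure just described can be sketched in a few lines of Python. This is a hedged illustration rather than
an efficient implementation; the name `trial_division` is an arbitrary choice, and the sketch tries every $d$ in turn
without skipping any candidates.

```python
def trial_division(n):
    """Return the prime factors of an integer n >= 2, smallest first, with repetition."""
    factors = []
    d = 2
    while d * d <= n:          # only divisors up to the square root need testing
        if n % d == 0:
            factors.append(d)  # d is the smallest divisor > 1 of the current n
            n //= d            # continue with the quotient, starting again from d
        else:
            d += 1
    factors.append(n)          # no divisor up to sqrt(n): what remains is prime
    return factors

def is_prime(n):
    """n >= 2 is prime exactly when the procedure finds no proper divisor."""
    return trial_division(n) == [n]

print(trial_division(6409))    # the modulus 13 * 17 * 29 used later in these notes
```

Note that the same loop both tests primality and produces the factorization, as in the text.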
Remark: It is possible to make the procedure slightly more economical in an obvious way: in attempting
division by d in the manner indicated, there is no reason to use non-primes, since if d = ab with a; b > 1,
then we would already have detected divisibility by both a and b earlier and divided out by them. On the
other hand, the effort required to identify all the non-primes $d$ may be more effort than it is worth.
Some sort of compromise approach is reasonable: for example, there is no reason to attempt division by
even numbers other than 2, nor by multiples of 5 other than 5 itself (nor by numbers divisible by 10). The
point is that for integers represented as decimals, divisibility by 2 or 5 (or 10) is very easy to identify.
Addressing a slightly different question, we might wish to find all primes less than a given bound $N$.
A reasonable procedure for this is Eratosthenes' Sieve, described as follows. List all the integers from 2
through N .
Starting with 2 + 2, mark every 2nd integer on the list. (This marks all even numbers bigger than 2.)
The next integer (after 2) on the list which hasn't been marked is 3. Starting with 3 + 3, mark every
3rd integer (counting those already marked). (This marks all multiples of 3 bigger than 3 itself.)
The next integer (after 3) on the list which hasn't been marked is 5. Starting with 5 + 5, mark every
5th integer (counting those already marked). (This marks all multiples of 5 bigger than 5 itself.)
...
Take the next integer $n$ on the list which has not yet been crossed off. This $n$ is prime. Starting with
$n + n$, cross off every $n$th integer (counting those already marked). (This marks all multiples of $n$ bigger
than $n$ itself.)
...
Stop when you've marked all multiples of the largest prime not exceeding $\sqrt{N}$.
For example, looking at the list of integers from 2 through 31 and executing this procedure, we first
have the list
2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21
22 23 24 25 26 27 28 29 30 31
Marking multiples of 2 after 2 itself gives (marked integers are shown in brackets)
2 3 [4] 5 [6] 7 [8] 9 [10] 11
[12] 13 [14] 15 [16] 17 [18] 19 [20] 21
[22] 23 [24] 25 [26] 27 [28] 29 [30] 31
Marking multiples of 3 after 3 itself gives
2 3 [4] 5 [6] 7 [8] [9] [10] 11
[12] 13 [14] [15] [16] 17 [18] 19 [20] [21]
[22] 23 [24] 25 [26] [27] [28] 29 [30] 31
Marking multiples of 5 after 5 itself gives
2 3 [4] 5 [6] 7 [8] [9] [10] 11
[12] 13 [14] [15] [16] 17 [18] 19 [20] [21]
[22] 23 [24] [25] [26] [27] [28] 29 [30] 31
By this point, the next unmarked integer is 7, which is larger than $\sqrt{31}$, so all the integers in the list
unmarked by this point are prime.
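The sieve translates almost line for line into code. The sketch below is one possible rendering in Python, with
the illustrative name `eratosthenes`; it returns the primes up to and including the bound, which matches the example
with the list from 2 through 31.

```python
def eratosthenes(bound):
    """Return the list of primes p with 2 <= p <= bound."""
    marked = [False] * (bound + 1)     # marked[k] is True once k has been crossed off
    n = 2
    while n * n <= bound:              # stop once n exceeds the square root of the bound
        if not marked[n]:              # the next unmarked integer is prime
            for multiple in range(n + n, bound + 1, n):
                marked[multiple] = True
        n += 1
    return [k for k in range(2, bound + 1) if not marked[k]]

print(eratosthenes(31))
```

For the bound 31 only the primes 2, 3 and 5 are used for marking, as in the worked example.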
There are standard identities which are useful in anticipating factorization of special polynomials and
special forms of numbers:
$$x^2 - y^2 = (x - y)(x + y)$$
$$x^3 - y^3 = (x - y)(x^2 + xy + y^2) \qquad x^3 + y^3 = (x + y)(x^2 - xy + y^2)$$
$$x^4 - y^4 = (x - y)(x^3 + x^2 y + xy^2 + y^3)$$
$$x^5 - y^5 = (x - y)(x^4 + x^3 y + x^2 y^2 + xy^3 + y^4)$$
$$x^5 + y^5 = (x + y)(x^4 - x^3 y + x^2 y^2 - xy^3 + y^4)$$
and so on. Note that for odd exponents there are two identities while for even exponents there is just one.
Thus, for example, we might be curious whether there are infinitely-many primes of the form $n^3 - 1$ for
integers $n$. To address this, use
$$n^3 - 1 = (n - 1) \cdot (n^2 + n + 1)$$
Therefore, if both factors $n - 1$ and $n^2 + n + 1$ fall strictly between 1 and $n^3 - 1$, then this is a proper
factorization of $n^3 - 1$, so $n^3 - 1$ could not be prime. In fact, it suffices to show that one of the factors is
both $> 1$ and $< n^3 - 1$. Note that for $n = 2$ the expression $n^3 - 1$ has value 7, which is prime, so we'd better
not try to prove that this expression is never prime.
For $n > 2$ certainly $n - 1 > 2 - 1 = 1$. This is one comparison. On the other hand, also for $n > 2$,
$$n - 1 < 2 \cdot 2 \cdot n - 1 < n \cdot n \cdot n - 1$$
Thus, $1 < n - 1 < n^3 - 1$ if $n > 2$. This shows that $n^3 - 1$ is never prime for $n > 2$.
One special algebraic form for numbers, which was historically of recreational interest, but is now also
of practical interest, is $2^n - 1$. If such a number is prime, then it is called a Mersenne prime. It is not
known whether or not there are infinitely-many Mersenne primes.
Another special form is $2^m + 1$. If such a number is prime, it is called a Fermat prime. It is not known
whether there are infinitely-many primes of this form. Fermat evidently thought that every expression
$$2^{2^n} + 1$$
might be prime, but this was disproved by Euler about 100 years later.
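Euler's counterexample can be recovered by the naive divisor search described earlier. The sketch below (in
Python, with the illustrative names `fermat_number` and `smallest_proper_divisor`) checks that the expressions
$2^{2^n} + 1$ for $n = 0, 1, 2, 3, 4$ have no proper divisor, while the one for $n = 5$ does.

```python
def fermat_number(n):
    """The number 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1

def smallest_proper_divisor(N):
    """Smallest d with 1 < d <= sqrt(N) dividing N, or None if N is prime."""
    d = 2
    while d * d <= N:
        if N % d == 0:
            return d
        d += 1
    return None

for n in range(6):
    F = fermat_number(n)
    print(n, F, smallest_proper_divisor(F))
```

The search for $n = 5$ stops quickly, since the smallest proper divisor turns out to be small compared with the
square root of $2^{32} + 1$.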
#6.57 Factor the integers 1028 and 2057 into primes.
#6.58 Find a proper factor of 111,111,111,111,111 without using a calculator.
#6.59 Find a proper factor of 101,010,101,010,101 without using a calculator.
#6.60 Prove/observe that the one's-place digit of a decimal number is not sufficient information (by
itself) to determine whether the number is divisible by 3, or by 7.
#6.61 Explain why $n^2 - 1$ cannot be prime for any $n > 2$.
#6.62 Explain why $3^n - 1$ cannot possibly be a prime number if $n > 1$.
#6.63 Explain why $2^m + 1$ cannot possibly be a prime number unless $m$ is a power of 2.
#6.64 While we mostly know that $x^2 - y^2$ has a factorization, that $x^3 - y^3$ has a factorization, that $x^3 + y^3$
has, and so on, there is a factorization that seldom appears in `high school': $x^4 + 4y^4$ has a factorization into
two quadratic pieces, each with 3 terms! Find this factorization. Hint:
$$x^4 + 4y^4 = (x^4 + 4x^2 y^2 + 4y^4) - 4x^2 y^2$$
#6.65 Can $n^4 + 4$ be a prime if the integer $n$ is bigger than 1?
#6.66 Factor $x^6 - y^6$ in two different ways.
#6.67 (*) Euclid's proof of the infinitude of primes Suppose there were only finitely-many primes $p_1, p_2, \ldots, p_n$.
Consider the number $N = 2 p_1 \ldots p_n + 1$. Show that none of the $p_i$ can divide $N$. Conclude that there must
be some other prime than those on this list, contradiction.
7. (*) Prime Numbers
Euclid's Theorem: infinitude of primes
The Prime Number Theorem
Chebycheff's Theorem
Sharpest known asymptotics
The Riemann Hypothesis
7.1 Euclid's Theorem: infinitude of primes
Our experience probably already suggested that integers have unique factorization into primes, but it is
less intuitive that there are infinitely many primes. Euclid's 2000-year-old proof of this is not only ingenious,
but also is a good example of an indirect proof ("by contradiction").
For this discussion we grant that integers have unique factorizations into primes. (This is a special case
of our later result that all Euclidean rings have unique factorization.)
Theorem: (Euclid) There are infinitely-many prime numbers.
Proof: This is a proof by contradiction. Suppose that there were only finitely many primes. Then we
could list all of them: $p_1, \ldots, p_n$. Then consider the number
$$N = p_1 \cdot p_2 \cdots p_{n-1} \cdot p_n + 1$$
That is, $N$ is the product of all the primes, plus 1. Since $N > 1$, and since $N$ has a factorization into primes,
we can say that there is a prime $p$ dividing $N$. Then $p$ cannot be in the list $p_1, \ldots, p_n$, since if it were in
that list, then $p$ would divide
$$N - (p_1 \cdots p_n) = 1$$
which it does not. But the fact that $p$ is not on the list contradicts the hypothesis that we had listed them
all. That is, assuming that there were only finitely-many primes leads to a contradiction. Thus, there are
infinitely-many primes.
Note that this gives no substantial idea of what integers are or are not primes, nor "how many" primes
there may be.
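The construction in Euclid's proof can be tried on any finite list of primes. The sketch below is only an
illustration in Python (the names `smallest_prime_factor` and `euclid_new_prime` are made up for this purpose):
it forms $N$ as the product of the listed primes plus 1 and finds a prime dividing $N$, which by the argument above
cannot be on the list. Note that $N$ itself need not be prime.

```python
def smallest_prime_factor(n):
    """Smallest prime dividing n >= 2, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n                       # n has no proper divisor, so n itself is prime

def euclid_new_prime(primes):
    """Return (N, p) with N = (product of the given primes) + 1 and p a prime dividing N."""
    product = 1
    for q in primes:
        product *= q
    N = product + 1
    return N, smallest_prime_factor(N)

listed = [2, 3, 5, 7, 11, 13]
N, p = euclid_new_prime(listed)
print(N, p, p in listed)           # p divides N and is not one of the listed primes
```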
7.2 The Prime Number Theorem
Around 1800 Gauss and Legendre had made conjectures about the distribution of prime numbers, from
looking at lists of primes, but were unable to prove anything very precise. It was not until 1896 that
Hadamard and de la Vallée-Poussin independently proved the result described just below.
The standard counting function for primes is
$$\pi(x) = \text{number of primes less than } x$$
We use the standard notation that
$$f(x) \sim g(x)$$
means
$$\lim_{x \to +\infty} \frac{f(x)}{g(x)} = 1$$
Prime Number Theorem: As $x \to +\infty$
$$\pi(x) \sim \frac{x}{\ln x}$$
The proof of this is a bit difficult.
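Although the proof is out of reach here, the statement is easy to explore numerically. The following sketch
(Python; `prime_count_below` is an illustrative name) counts primes below $x$ with a sieve and prints the ratio of
$\pi(x)$ to $x / \ln x$ for a few powers of 10. The ratio should drift slowly toward 1, which illustrates but of course
does not prove the theorem.

```python
from math import log

def prime_count_below(x):
    """pi(x): the number of primes strictly less than the integer x."""
    is_composite = [False] * max(x, 2)
    count = 0
    for n in range(2, x):
        if not is_composite[n]:
            count += 1                          # n is prime
            for multiple in range(n * n, x, n):
                is_composite[multiple] = True
    return count

for x in (10**3, 10**4, 10**5, 10**6):
    pi_x = prime_count_below(x)
    print(x, pi_x, round(pi_x / (x / log(x)), 4))
```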
7.3 Chebycheff's Proof
In 1851 Chebycheff made a breakthrough toward proving the Prime Number Theorem. Although what
he proved was weaker than the conjectured result, it was the first real progress beyond collecting statistics
and making lists. His proof is more-or-less accessible in terms of things we know, so we'll do it here:
Theorem: (Chebycheff) There are positive constants $c$ and $C$ so that eventually (for large-enough $x$)
$$c \cdot \frac{x}{\ln x} \le \pi(x) \le C \cdot \frac{x}{\ln x}$$
Proof: We need to define standard auxiliary functions
$$\theta(x) = \sum_{p \text{ prime}:\ p < x} \ln p$$
$$\psi(x) = \sum_{p \text{ prime},\ k \in \mathbf{Z},\ k \ge 1:\ p^k < x} \ln p$$
That is, in words, $\theta(x)$ is the sum of the natural logarithms of all primes less than $x$, and $\psi(x)$ is the sum
of $\ln p$ for every prime power $p^k$ less than $x$. The easiest estimates arise in terms of $\theta$ and $\psi$, so at the end
we will return to see what these say about the prime-counting function $\pi$. The first thing necessary is to see
that, for purposes of our asymptotic estimates, $\theta$ and $\psi$ are not far apart. After that come two rather clever
lemmas due to Chebycheff.
Lemma:
$$0 \le \psi(x) - \theta(x) \le x^{1/2} (\ln x)^2$$
(Proof left to the reader: there's nothing delicate about this comparison!)
Lemma: (Chebycheff) $\theta(x) = O(x)$.
Proof of Lemma: For $m = 2^e$ with positive integer $e$, consider the binomial coefficient
$$N = \binom{m}{m/2}$$
Since
$$2^m = (1 + 1)^m = \sum_{0 \le k \le m} \binom{m}{k} 1^{m-k} \cdot 1^k = \sum_{0 \le k \le m} \binom{m}{k}$$
it is clear that $\binom{m}{m/2}$ is a positive integer, and is less than $2^m$. On the other hand, from the expression
$$\binom{m}{m/2} = \frac{m!}{(m/2)! \, (m/2)!}$$
we can see that each prime $p$ in the range $\frac{m}{2} < p \le m$ divides $\binom{m}{m/2}$. Thus,
$$\prod_{(m/2) < p \le m} p \le \binom{m}{m/2} \le 2^m$$
The natural logarithm function is monotone increasing, meaning that $x < y$ implies $\ln(x) < \ln(y)$.
Therefore, taking natural logarithms of both sides of the last displayed inequality, we have
$$\theta(m) - \theta(m/2) \le m \ln 2$$
That is,
$$\theta(2^e) - \theta(2^{e-1}) \le m \ln 2 = 2^e \ln 2$$
Therefore, applying this repeatedly, we have
$$\theta(2^e) = (\theta(2^e) - \theta(2^{e-1})) + \theta(2^{e-1})$$
$$\le 2^e \ln 2 + (\theta(2^{e-1}) - \theta(2^{e-2})) + \theta(2^{e-2})$$
$$\le 2^e \ln 2 + 2^{e-1} \ln 2 + \theta(2^{e-2})$$
$$\le 2^e \ln 2 + 2^{e-1} \ln 2 + 2^{e-2} \ln 2 + \theta(2^{e-3})$$
which, by repeating further, is
$$\le 2^e \ln 2 + 2^{e-1} \ln 2 + 2^{e-2} \ln 2 + \ldots + 2 \ln 2 + \ln 2$$
$$= (2^{e+1} - 1) \ln 2 \le 2^{e+1} \ln 2$$
So for $2^{e-1} \le x \le 2^e$, we have
$$\theta(x) \le \theta(2^e) \le 2^{e+1} \ln 2 = 2 \cdot 2^e \ln 2 \le 4 \cdot x \ln 2 = (4 \ln 2) \cdot x$$
This proves the lemma.
Lemma: (Chebycheff) There are positive constants $c, C$ so that
$$c x \le \psi(x) \le C x$$
Proof: Consider
$$I = \int_0^1 x^n (1 - x)^n \, dx$$
Multiplying out the $(1 - x)^n$ and integrating term-by-term, no term has a denominator larger than $2n + 1$,
so if we multiply out by the least common multiple of $1, 2, 3, \ldots, 2n + 1$ the result is an integer:
$$I \cdot \mathrm{lcm}(1, 2, 3, \ldots, 2n, 2n + 1) \in \mathbf{Z}$$
Thus, since $I > 0$,
$$1 \le I \cdot \mathrm{lcm}(1, 2, 3, \ldots, 2n, 2n + 1)$$
or
$$\frac{1}{I} \le \mathrm{lcm}(1, 2, 3, \ldots, 2n, 2n + 1)$$
On the other hand, the maximum of $x(1 - x)$ on the interval $[0, 1]$ is $1/4$, so the integrand is at most $(1/4)^n$,
and
$$I \le \left(\frac{1}{4}\right)^n$$
which can be rearranged to
$$4^n \le \frac{1}{I}$$
Thus, putting these inequalities together, we have
$$4^n \le \mathrm{lcm}(1, 2, 3, \ldots, 2n, 2n + 1)$$
Taking logarithms,
$$(\ln 4) \cdot n \le \ln \mathrm{lcm}(1, 2, 3, \ldots, 2n + 1)$$
Now we are happy, because of the following essentially elementary observation (which of course was the real
reason that Chebycheff introduced $\psi$ in the first place):
Lemma: (recall that $\psi(x)$ counts prime powers strictly less than $x$)
$$\psi(2n + 2) = \ln \mathrm{lcm}(1, 2, 3, \ldots, 2n + 1)$$
(Proof left to the reader: it's not too hard!)
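Both the inequality $4^n \le \mathrm{lcm}(1, 2, \ldots, 2n + 1)$ and the relation between the least common multiple and the sum
of $\ln p$ over prime powers can be checked numerically for small $n$. The sketch below is a rough illustration in
Python; `lcm_up_to` and `psi` are illustrative names, and `psi(x)` is taken, as an assumption matching the words
of the definition above, to sum $\ln p$ over the prime powers $p^k$ with $k \ge 1$ strictly less than $x$.

```python
from math import gcd, log

def lcm_up_to(m):
    """Least common multiple of 1, 2, ..., m."""
    result = 1
    for k in range(1, m + 1):
        result = result * k // gcd(result, k)
    return result

def psi(x):
    """Sum of ln p over all prime powers p**k (k >= 1) strictly less than the integer x."""
    total = 0.0
    for p in range(2, x):
        if all(p % d for d in range(2, int(p ** 0.5) + 1)):   # p is prime
            power = p
            while power < x:
                total += log(p)
                power *= p
    return total

for n in range(1, 8):
    L = lcm_up_to(2 * n + 1)
    print(n, 4 ** n, L, round(log(L), 6), round(psi(2 * n + 2), 6))
```

In each printed row the second entry should not exceed the third, and the last two entries should agree up to
rounding.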
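Finally we can return to the counting function $\pi(x)$ rather than the auxiliary functions $\theta$ and $\psi$. We
have a Riemann-Stieltjes integral
$$\pi(x) = \int_{3/2}^{x} \frac{1}{\ln t} \, d\theta(t)$$
Integrating by parts, this gives
$$\pi(x) = \frac{\theta(x)}{\ln x} + \int_{3/2}^{x} \frac{\theta(t)}{t \ln^2 t} \, dt$$
Using the fact that $\theta(x) = O(x)$, this gives
$$\pi(x) = \frac{\theta(x)}{\ln x} + \int_{3/2}^{x} \frac{O(t)}{t \ln^2 t} \, dt = \frac{\theta(x)}{\ln x} + O\left(\frac{x}{\ln^2 x}\right)$$
Since we know by now that
$$c x \le \theta(x) \le C x$$
for some positive constants $c, C$, we obtain (after adjusting the constants to absorb the error term, for
large-enough $x$)
$$\frac{c x}{\ln x} \le \pi(x) \le \frac{C x}{\ln x}$$
This finishes the proof of the theorem.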
7.4 Sharpest known asymptotics
The best known assertion about asymptotic distribution of primes is somewhat sharper than the simple
statement of the Prime Number Theorem, since it gives an error term. This result comes from work of I.
Vinogradov and Korobov, but was finished in all details by A. Walfisz and H.-E. Richert. See A. Walfisz,
Weylsche Exponentialsummen in der neueren Zahlentheorie, VEB Deutscher Verlag der Wissenschaften,
Berlin, 1963. A reasonable exposition in English of this and related results is in A. A. Karatsuba, The
distribution of prime numbers, Russian Math. Surveys 45 (1990), pp. 99-171.
The logarithmic integral is defined to be
$$\mathrm{li}(x) = \int_2^x \frac{dt}{\ln t} \sim \frac{x}{\ln x}$$
The sharpest assertion proven concerning the distribution of primes (in 1997) seems to be: there exists a
positive constant $c$ so that
$$\pi(x) = \mathrm{li}(x) + O\left(\frac{x}{e^{c (\ln x)^{3/5} (\ln \ln x)^{-1/5}}}\right)$$
To simplify a little for clarity, we can weaken this statement to assert
$$\pi(x) = \frac{x}{\ln x} + O\left(\frac{x}{\ln^2 x}\right)$$
Since $\mathrm{li}(x)$ is monotone increasing, it has an inverse function. Write
$$\mathrm{li}^{-1}(x)$$
for the inverse function (not for $1/\mathrm{li}(x)$). Then the $n$th prime $p_n$ is estimated by
$$p_n = \mathrm{li}^{-1}(n) + O\left(\frac{n}{e^{(\ln n)^{3/5} (\ln \ln n)^{-1/5}}}\right) \quad (\sim n \ln n)$$
7.5 The Riemann Hypothesis
Even though the Prime Number Theorem was not proven until 1896, already by 1858 B. Riemann had
seen the connection between error terms in the distribution of primes and the subtle behavior of a special
function, the zeta function defined below.
For a complex number $s$ with real part $> 1$, the series
$$\zeta(s) = \sum_{1 \le n} \frac{1}{n^s}$$
is absolutely convergent, and defines a function of $s$. This is the zeta function, often called Riemann's
because B. Riemann (about 1858) was the first to see that analytical properties of $\zeta(s)$ are intimately related
to delicate details concerning the distribution of primes. Other people (for example, L. Euler) had seen that
there were general connections. Already Euler had observed the Euler product expansion: for complex
$s$ with real part $> 1$
$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}}$$
To give this function meaning when the real part of s is less than or equal to 1 is already an issue, but
this was resolved more than 140 years ago by Riemann, if not already by Euler.
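The Euler product expansion can at least be made plausible numerically for a real $s > 1$. The sketch below
(Python, with illustrative names `zeta_partial_sum` and `euler_partial_product`) compares a partial sum of the
series with a partial product over primes at $s = 2$. As an outside fact not derived in these notes, the value of
the series at $s = 2$ is $\pi^2/6$, which is used here only as a reference value.

```python
from math import pi

def zeta_partial_sum(s, terms):
    """Sum of 1/n**s for n = 1, ..., terms."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))

def euler_partial_product(s, bound):
    """Product of 1/(1 - p**(-s)) over the primes p <= bound."""
    result = 1.0
    for p in range(2, bound + 1):
        if all(p % d for d in range(2, int(p ** 0.5) + 1)):   # p is prime
            result *= 1.0 / (1.0 - p ** (-s))
    return result

print(zeta_partial_sum(2, 100000))
print(euler_partial_product(2, 10000))
print(pi ** 2 / 6)
```

Both truncations approach the same value from below, consistent with the expansion holding for real part $> 1$.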
For a real number $r$ in the range $\frac{1}{2} < r < 1$, let $PNT_r$ be the statement
$$\pi(x) = \mathrm{li}(x) + O(x^{r + \varepsilon}) \text{ for all } \varepsilon > 0$$
It is important to realize that there is presently no proof that any such assertion is true: the error term in
this assertion is asymptotically smaller than any error term that anyone has proven to hold. (See above.)
On the other hand, again for a real number $r$ in the range $\frac{1}{2} < r < 1$, let $RH_r$ be the statement
$$\zeta(s) \ne 0 \text{ when the real part of } s \text{ is} > r$$
No one has been able to prove any such statement for any $r < 1$. At the same time, it is known that there
are infinitely-many complex numbers $\rho$ with real part $\frac{1}{2}$ so that $\zeta(\rho) = 0$.
Theorem: (sketched by Riemann) For each $\frac{1}{2} < r < 1$, the assertion $PNT_r$ is equivalent to $RH_r$.
In particular, the Riemann Hypothesis is that $\zeta(s) \ne 0$ for complex $s$ with real part $> \frac{1}{2}$.
Thus, the best possible error term in the description of the asymptotic distribution of primes would
be obtained if the Riemann Hypothesis were known to be true. But essentially nothing is known in this
direction, although the accumulation of numerical evidence strongly supports the truth of the Riemann
Hypothesis. If the Riemann Hypothesis is true, in fact (as H. von Koch has proven)
$$\pi(x) = \mathrm{li}(x) + O(\sqrt{x} \, \ln x)$$
$$n\text{th prime} = \mathrm{li}^{-1}(n) + O(\sqrt{n} \, (\ln n)^{5/2})$$
Then there is the Extended Riemann Hypothesis which is a similar assertion about the zeros of a
wider class of functions than just the zeta function. And the Generalized Riemann Hypothesis is a
comparable assertion about the zeros of a yet wider class. All these hypotheses, if true, would give the
best possible error estimates on the distribution of primes and generalizations of primes. Unfortunately,
essentially nothing is known about these things, apart from numerical evidence in favor of all of them.
8. Sun Ze's Theorem
Now we start developing some standard classical number theory, on the way to understanding (for
example) the structure of Z/n, and the differences in this structure depending upon whether or not $n$ is
prime.
Sun Ze's theorem
Special systems of linear congruences
Congruences with composite moduli
Hensel's lemma for prime-power moduli
8.1 Sun Ze's Theorem
The result of this section is sometimes known as the Chinese Remainder Theorem, mainly because
the earliest results (including and following Sun Ze's) were obtained in China. Sun Ze's result was obtained
before 450, and the statement below was obtained by Chin Chiu Shao about 1250. Such results, with
virtually the same proofs, apply to much more general "numbers" than the integers Z.
Let $m_1, \ldots, m_n$ be non-zero integers such that for any pair of indices $i, j$ with $i \ne j$ the integers $m_i$ and
$m_j$ are relatively prime. We say that the integers $m_i$ are mutually relatively prime. Let
$$\mathbf{Z}/m_1 \times \mathbf{Z}/m_2 \times \ldots \times \mathbf{Z}/m_n$$
denote (as usual) the collection of ordered $n$-tuples with the $i$th item lying in $\mathbf{Z}/m_i$. Define a map
$$f : \mathbf{Z}/(m_1 \ldots m_n) \to \mathbf{Z}/m_1 \times \mathbf{Z}/m_2 \times \ldots \times \mathbf{Z}/m_n$$
by
$$f(x\text{-mod-}(m_1 \ldots m_n)) = (x\text{-mod-}m_1, \ x\text{-mod-}m_2, \ \ldots, \ x\text{-mod-}m_n)$$
Theorem: (Sun-Ze) For $m_1, \ldots, m_n$ mutually relatively prime, this map
$$f : \mathbf{Z}/(m_1 \ldots m_n) \to \mathbf{Z}/m_1 \times \mathbf{Z}/m_2 \times \ldots \times \mathbf{Z}/m_n$$
is a bijection.
Proof: First, we consider the case that there are just two relatively prime moduli $m, n$, and
show that the corresponding map
$$f : \mathbf{Z}/mn \to \mathbf{Z}/m \times \mathbf{Z}/n$$
given by
$$f(x\text{-mod-}mn) = (x\text{-mod-}m, \ x\text{-mod-}n)$$
is a bijection. First, we prove injectivity: if $f(x) = f(y)$, then $x \equiv y \bmod m$ and $x \equiv y \bmod n$. That is,
$m \mid x - y$ and $n \mid x - y$. Since $m, n$ are relatively prime (!), this implies that $mn \mid x - y$, so $x \equiv y \bmod mn$.
At this point, since $\mathbf{Z}/mn$ and $\mathbf{Z}/m \times \mathbf{Z}/n$ are finite sets with the same number of elements (namely
$mn$), any injective map must be surjective. So we could stop now and say that we know that $f$ is surjective
(hence bijective).
But it is worthwhile to understand the surjectivity more tangibly, to see once more where the relative
prime-ness of $m, n$ enters. Since $m, n$ are relatively prime, there are integers $s, t$ so that
$$sm + tn = 1$$
(We can find these $s, t$ via the Euclidean Algorithm if we want, but that's not the point just now.) Then we
claim that given integers $a$ and $b$,
$$f((b(sm) + a(tn))\text{-mod-}mn) = (a\text{-mod-}m, \ b\text{-mod-}n)$$
Indeed,
$$b(sm) + a(tn) \equiv b(sm) + a(1 - sm) \equiv a \bmod m$$
and similarly
$$b(sm) + a(tn) \equiv b(1 - tn) + a(tn) \equiv b \bmod n$$
This proves the surjectivity, and thus the bijectivity of the function $f$ in the case of just two moduli.
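For two small moduli the bijection and the explicit inverse can be checked exhaustively. The following Python
sketch is only an illustration under the assumption that the moduli are positive; the names `sun_ze_map`,
`extended_gcd`, and `sun_ze_inverse` are made up here. The inverse uses integers $s, t$ with $sm + tn = 1$ exactly as
in the formula above.

```python
def sun_ze_map(x, m, n):
    """The map f: x-mod-mn goes to (x-mod-m, x-mod-n)."""
    return (x % m, x % n)

def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b), for non-negative a, b."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = extended_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def sun_ze_inverse(a, b, m, n):
    """Return x mod mn with x = a mod m and x = b mod n, for relatively prime m, n."""
    g, s, t = extended_gcd(m, n)
    assert g == 1, "the moduli must be relatively prime"
    return (b * (s * m) + a * (t * n)) % (m * n)

m, n = 11, 13
images = {sun_ze_map(x, m, n) for x in range(m * n)}
print(len(images) == m * n)                       # injective, hence bijective
print(all(sun_ze_map(sun_ze_inverse(a, b, m, n), m, n) == (a, b)
          for a in range(m) for b in range(n)))   # the formula inverts f

# For moduli with a common factor the map is no longer injective.
print(len({sun_ze_map(x, 4, 6) for x in range(24)}))
```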
Now consider an arbitrary number of (mutually relatively prime) moduli $m_1, \ldots, m_n$. We'll do induction
on the number $n$ of moduli involved. The case $n = 2$ was just treated, and if $n = 1$ there is nothing to prove.
So take $n > 2$. By induction on $n$, the map
$$f_o : \mathbf{Z}/m_2 \ldots m_n \to \mathbf{Z}/m_2 \times \mathbf{Z}/m_3 \times \ldots \times \mathbf{Z}/m_n$$
defined by
$$f_o(x\text{-mod-}m_2 \ldots m_n) = (x\text{-mod-}m_2, \ x\text{-mod-}m_3, \ \ldots, \ x\text{-mod-}m_n)$$
is a bijection. Thus, the map
$$f_1 : \mathbf{Z}/m_1 \times \mathbf{Z}/m_2 \ldots m_n \to \mathbf{Z}/m_1 \times \mathbf{Z}/m_2 \times \mathbf{Z}/m_3 \times \ldots \times \mathbf{Z}/m_n$$
defined by
$$f_1(x\text{-mod-}m_1, \ y\text{-mod-}m_2 \ldots m_n) = (x\text{-mod-}m_1, \ y\text{-mod-}m_2, \ y\text{-mod-}m_3, \ \ldots, \ y\text{-mod-}m_n)$$
is a bijection.
At the same time, invoking unique factorization (!), $m_1$ and the product $m_2 m_3 \ldots m_n$ are relatively
prime, so the case $n = 2$ gives the bijectivity of the map
$$f_2 : \mathbf{Z}/m_1 (m_2 \ldots m_n) \to \mathbf{Z}/m_1 \times \mathbf{Z}/m_2 \ldots m_n$$
defined by
$$f_2(x\text{-mod-}m_1 (m_2 \ldots m_n)) = (x\text{-mod-}m_1, \ x\text{-mod-}m_2 \ldots m_n)$$
Therefore, the composite map
$$f = f_1 \circ f_2$$
(apply $f_2$ first, then $f_1$) is also a bijection.
8.2 Special systems of linear congruences
Now we paraphrase the theorem above in terms of solving several congruences simultaneously. There
are some similarities to the more elementary discussion of systems of linear equations, but there are critical
differences, as well.
To start with, let's take the smallest non-trivial systems, of the form
$$\begin{cases} x \equiv a \bmod m \\ x \equiv b \bmod n \end{cases}$$
where $m, n$ are relatively prime, $a, b$ are arbitrary integers, and we are to find all integers $x$ which satisfy this
system.
Notice that there are two congruences but just one unknown, which in the case of equations would
probably lead to non-solvability immediately. But systems of congruences behave slightly differently. Our
only concession is: We'll only consider the case that the moduli $m$ and $n$ are relatively prime, that
is, that $\gcd(m, n) = 1$.
Using the Euclidean algorithm again, there are integers $s, t$ so that
$$sm + tn = 1$$
since we supposed that $\gcd(m, n) = 1$. And this can be rearranged to
$$tn = 1 - sm$$
for example. Here comes the trick: the claim is that the single congruence
$$x \equiv x_o \bmod mn \quad \text{where} \quad x_o = a(tn) + b(sm)$$
is equivalent to (has the same set of solutions as) the system of congruences above.
Let's check: modulo $m$, we have
$$x_o \equiv (a(tn) + b(sm)) \bmod m \equiv a(tn) + 0 \bmod m$$
$$\equiv a(tn) \bmod m \equiv a(1 - sm) \bmod m$$
$$\equiv a(1) \bmod m \equiv a \bmod m$$
The discussion of the congruence modulo $n$ is nearly identical, with roles reversed. Let's do it:
$$x_o \equiv (a(tn) + b(sm)) \bmod n \equiv 0 + b(sm) \bmod n$$
$$\equiv b(sm) \bmod n \equiv b(1 - tn) \bmod n$$
$$\equiv b(1) \bmod n \equiv b \bmod n$$
Thus, anything congruent to this $x_o$ modulo $mn$ is a solution to the system.
On the other hand, suppose $x$ is a solution to the system, and let's prove that it is congruent to $x_o$
modulo $mn$. Since $x \equiv a \bmod m$ and $x \equiv b \bmod n$, we have
$$x - x_o \equiv a - a \equiv 0 \bmod m$$
and
$$x - x_o \equiv b - b \equiv 0 \bmod n$$
That is, both $m$ and $n$ divide $x - x_o$. Since $m$ and $n$ are relatively prime, we can conclude that $mn$ divides
$x - x_o$, as desired.
Note the process of sticking the solutions together via the goofy formula above uses the Euclidean
Algorithm in order to be computationally effective (rather than just theoretically possible).
For example, let's solve the system
$$\begin{cases} x \equiv 2 \bmod 11 \\ x \equiv 7 \bmod 13 \end{cases}$$
To `glue' these congruences together, we execute the Euclidean Algorithm on 11 and 13, to find
$$6 \cdot 11 - 5 \cdot 13 = 1$$
Thus, using the goofy formula above, the single congruence
$$x \equiv 2(-5 \cdot 13) + 7(6 \cdot 11) \bmod 11 \cdot 13$$
is equivalent to the given system. In particular, this gives the solution
$$x \equiv -2 \cdot 5 \cdot 13 + 7 \cdot 6 \cdot 11 \equiv 332 \equiv 46 \bmod 11 \cdot 13$$
Quite generally, consider a system
$$\begin{cases} x \equiv b_1 \bmod m_1 \\ x \equiv b_2 \bmod m_2 \\ x \equiv b_3 \bmod m_3 \\ \ldots \\ x \equiv b_n \bmod m_n \end{cases}$$
We'll only consider the scenario that $m_i$ and $m_j$ are relatively prime (for $i \ne j$). We solve it in
steps: first, just look at the subsystem
$$\begin{cases} x \equiv b_1 \bmod m_1 \\ x \equiv b_2 \bmod m_2 \end{cases}$$
and use the method above to turn this into a single (equivalent!) congruence of the form
$$x \equiv c_2 \bmod m_1 m_2$$
Then look at the system
$$\begin{cases} x \equiv c_2 \bmod m_1 m_2 \\ x \equiv b_3 \bmod m_3 \end{cases}$$
and use the method above to combine these two congruences into a single equivalent one, say
$$x \equiv c_3 \bmod m_1 m_2 m_3$$
and so on.
Remark: Yes, this procedure is just a paraphrase of the proof of the previous section.
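The step-by-step gluing can be written as a short loop. The sketch below is a minimal Python illustration,
assuming positive and pairwise relatively prime moduli; the names `extended_gcd`, `glue`, and `solve_system` are
hypothetical. Each step combines the congruence found so far with the next one, using the formula
$x_o = a(tn) + b(sm)$ from above.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b), for non-negative a, b."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = extended_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def glue(a, m, b, n):
    """Combine x = a mod m and x = b mod n into one congruence; return (x_o, m*n)."""
    g, s, t = extended_gcd(m, n)
    if g != 1:
        raise ValueError("the moduli must be relatively prime")
    return (a * (t * n) + b * (s * m)) % (m * n), m * n

def solve_system(congruences):
    """congruences is a list of pairs (b_i, m_i); return (c, M) meaning x = c mod M."""
    c, M = 0, 1                      # 'x = 0 mod 1' holds for every integer x
    for b, m in congruences:
        c, M = glue(c, M, b, m)
    return c, M

print(solve_system([(2, 11), (7, 13)]))          # the example above, reduced mod 143
print(solve_system([(2, 3), (3, 5), (2, 7)]))
```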
8.3 Congruences with composite moduli
In general, to solve a congruence such as $x^2 \equiv b \bmod m$ with composite modulus $m = m_1 m_2$ (with $m_1$
and $m_2$ relatively prime), it is faster to solve the congruence modulo $m_1$ and $m_2$ separately and use Sun
Ze's theorem to glue the solutions together into a solution modulo $m$, rather than trying to solve modulo $m$.
This is especially true if the prime factorization of $m$ is known.
For example, let's try to solve
$$x^2 \equiv -1 \bmod 13 \cdot 17 \cdot 29$$
by hand (so that a brute force search is unreasonable, since we would not want to search through any
significant fraction of $13 \cdot 17 \cdot 29 = 6409$ possibilities by hand!). We observe that Sun Ze's theorem asserts
that the collection of integers $x$ modulo 6409 satisfying $x^2 \equiv -1 \bmod 6409$ is in bijection with the set of
triples $(x_1, x_2, x_3)$ where $x_1 \in \mathbf{Z}/13$, $x_2 \in \mathbf{Z}/17$, and $x_3 \in \mathbf{Z}/29$ and
$$x_1^2 \equiv -1 \bmod 13 \qquad x_2^2 \equiv -1 \bmod 17 \qquad x_3^2 \equiv -1 \bmod 29$$
The bijection is
$$x\text{-mod-}6409 \to (x\text{-mod-}13, \ x\text{-mod-}17, \ x\text{-mod-}29)$$
Further, the discussion above tells how to go in the other direction, that is, to get back from
$\mathbf{Z}/13 \times \mathbf{Z}/17 \times \mathbf{Z}/29$.
In this example, since the numbers 13, 17, 29 are not terribly large, a brute force search for square roots
of $-1$ modulo 13, 17, and 29 won't take very long. Let's describe such a search in the case of modulus 29.
First, $-1 = 28$ modulo 29, but 28 is not a square. Next, add 29 to 28: 57 is not a square. Add 29 to 57: 86
is not a square. Add 29 to 86: 115 is not a square. Add 29 to 115: $144 = 12^2$. Thus, $\pm 12$ are square roots
of $-1$ modulo 29. Similarly, we find that $\pm 5$ are square roots of $-1$ modulo 13, and $\pm 4$ are square roots of $-1$
modulo 17.
To use Sun Ze's theorem to get a solution modulo $6409 = 13 \cdot 17 \cdot 29$ from this, we first need integers $s, t$
so that $s \cdot 13 + t \cdot 17 = 1$. The theoretical results about gcd's guarantee that there are such $s, t$, and Euclid's
algorithm finds them:
$$17 - 1 \cdot 13 = 4$$
$$13 - 3 \cdot 4 = 1$$
Going backwards:
$$1 = 13 - 3 \cdot 4$$
$$= 13 - 3 \cdot (17 - 1 \cdot 13)$$
$$= 4 \cdot 13 - 3 \cdot 17$$
Therefore, from the square root 5 of $-1$ modulo 13 and square root 4 of $-1$ modulo 17 we get a square root
of $-1$ modulo $13 \cdot 17$:
$$4(4 \cdot 13) - 5(3 \cdot 17) = -47 \bmod 13 \cdot 17$$
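Proceeding further, now we need integers $s, t$ so that
$$s \cdot (13 \cdot 17) + t \cdot 29 = 1$$
Apply Euclid's algorithm, noting that $13 \cdot 17 = 221$:
$$221 - 7 \cdot 29 = 18$$
$$29 - 1 \cdot 18 = 11$$
$$18 - 1 \cdot 11 = 7$$
$$11 - 1 \cdot 7 = 4$$
$$7 - 1 \cdot 4 = 3$$
$$4 - 1 \cdot 3 = 1$$
Going back, we get
$$1 = 4 - 1 \cdot 3$$
$$= 4 - 1 \cdot (7 - 4)$$
$$= 2 \cdot 4 - 1 \cdot 7$$
$$= 2 \cdot (11 - 7) - 7$$
$$= 2 \cdot 11 - 3 \cdot 7$$
$$= 2 \cdot 11 - 3 \cdot (18 - 11)$$
$$= 5 \cdot 11 - 3 \cdot 18$$
$$= 5 \cdot (29 - 18) - 3 \cdot 18$$
$$= 5 \cdot 29 - 8 \cdot 18$$
$$= 5 \cdot 29 - 8 \cdot (221 - 7 \cdot 29)$$
$$= 61 \cdot 29 - 8 \cdot 221$$
Therefore, from the square root $-47$ of $-1$ modulo $221 = 13 \cdot 17$ and the square root 12 of $-1$ modulo 29 we
get the square root
$$12(-8 \cdot 221) + (-47)(61 \cdot 29) = -104359 = 4594 \bmod 6409$$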
Further, the most tedious part of the above procedure doesn't need to be repeated to find the other 7
(!) square roots of $-1$ modulo $13 \cdot 17 \cdot 29$, since we already have the numbers "$s, t$" in our possession.
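All eight square roots can be produced by gluing each choice of signs, reusing the same pairs $s, t$. The sketch
below is an illustrative Python version (the helper names `extended_gcd`, `glue`, and `roots_of_minus_one` are
assumptions made for this example); it finds the roots modulo each prime by brute force and then combines them.

```python
from itertools import product

def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b), for non-negative a, b."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = extended_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def glue(a, m, b, n):
    """Return x mod m*n with x = a mod m and x = b mod n (m, n relatively prime)."""
    g, s, t = extended_gcd(m, n)
    assert g == 1, "the moduli must be relatively prime"
    return (a * (t * n) + b * (s * m)) % (m * n)

def roots_of_minus_one(p):
    """Brute-force search for the square roots of -1 modulo p."""
    return [x for x in range(p) if (x * x + 1) % p == 0]

roots = sorted(
    glue(glue(x1, 13, x2, 17), 13 * 17, x3, 29)
    for x1, x2, x3 in product(roots_of_minus_one(13),
                              roots_of_minus_one(17),
                              roots_of_minus_one(29))
)
print(roots)
print(all((x * x + 1) % 6409 == 0 for x in roots))
```

The value 4594 found by hand above should appear in the printed list, together with its negative modulo 6409.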
8.4 Hensel's Lemma for prime-power moduli
In many cases, solving a polynomial equation $f(x) \equiv 0 \bmod p$ modulo a prime $p$ suffices to assure that
there are solutions modulo $p^n$ for powers $p^n$ of $p$, and also to find such solutions efficiently. And, funnily
enough, the procedure to do so is exactly parallel to Newton's method for numerical approximation to roots,
from calculus. In particular, we will use a purely algebraic form of Taylor expansions to prove the result.
First we'll do a numerical example to illustrate the idea of the process to which Hensel's Lemma refers.
Suppose we want to find $x$ so that $x^2 \equiv 2 \bmod 7^3$. Noting that a solution mod $7^3$ certainly must give a
solution mod 7, we'll start by finding a solution mod 7. This is much easier, since there are only 7 things in
$\mathbf{Z}/7$, and by a very quick trial and error hunt we see that $(\pm 3)^2 = 9 = 2 \bmod 7$.
Now comes the trick: being optimists, we imagine that we can simply adjust the solution 3 mod 7 to
obtain a solution mod $7^2$ by adding (or subtracting) some multiple of 7 to it. That is, we imagine that for
some $y \in \mathbf{Z}$
$$(3 + 7 \cdot y)^2 \equiv 2 \bmod 49$$
Multiplying out, we have
$$9 + 42y + 49y^2 \equiv 2 \bmod 49$$
Happily, the $y^2$ term disappears (modulo 49), because its coefficient is divisible by 49. Rearranging, this is
$$7 + 42y \equiv 0 \bmod 49$$
Dividing through by 7 gives
$$1 + 6y \equiv 0 \bmod 7$$
Since the coefficient (namely, 6) of $y$ is invertible modulo 7, with inverse 6, we find a solution
$y = (6^{-1})(-1) = 6 \cdot (-1) = 1 \bmod 7$. Thus,
$$3 + 7 \cdot 1 = 10$$
is a square root of 2 modulo $7^2$.
Continuing in our optimism: Now we hope that we can adjust the solution 10 mod $7^2$ by adding some
multiple of $7^2$ to it in order to get a solution mod $7^3$. That is, we hope to find $y$ so that
$$(10 + 7^2 y)^2 \equiv 2 \bmod 7^3$$
Multiplying out and simplifying, this is
$$294 y \equiv -98 \bmod 7^3$$
Dividing through by $7^2$ gives
$$6y \equiv -2 \bmod 7$$
Again, the inverse of 6 mod 7 is just 6 again, so this is
$$y \equiv 6 \cdot (-2) \equiv 2 \bmod 7$$
Therefore,
$$10 + 7^2 \cdot 2 = 108$$
satisfies
$$108^2 \equiv 2 \bmod 7^3$$
This was considerably faster than brute-force hunting for a square root of 2 mod $7^3$ directly.
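The two adjustment steps of this example can be replayed in a few lines of code. This is only a sketch and not part of the original notes: the function name `lift_square_root` is made up here, and instead of solving the linear congruence for $y$ it simply tries each of the $p$ possible values of $y$ mod $p$.

```python
def lift_square_root(x, a, p, n):
    """Given x with x*x == a (mod p**n), return x + y*p**n whose square is a (mod p**(n+1)).

    Hypothetical helper for illustration: it tries y = 0, 1, ..., p-1 in turn.
    """
    pn = p ** n
    for y in range(p):                      # only y mod p matters
        candidate = x + y * pn
        if (candidate * candidate - a) % (pn * p) == 0:
            return candidate
    raise ValueError("no adjustment of this form works")


x1 = 3                                      # 3*3 = 9 = 2 mod 7
x2 = lift_square_root(x1, 2, 7, 1)          # square root of 2 mod 7**2
x3 = lift_square_root(x2, 2, 7, 2)          # square root of 2 mod 7**3
print(x2, x3)
```

Trying all $p$ values of $y$ is harmless for small primes; for large primes one would solve the linear congruence for $y$ directly, as in the text.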
To prepare for a more general assertion of Hensel's Lemma, we need to give a purely algebraic description
of the derivative of a polynomial. That is, we don't want the definition to require taking any limits. Let
$$f(x) = c_n x^n + \ldots + c_0$$
with the coefficients in $\mathbf{Z}$. Simply define another polynomial $f'$ by
$$f'(x) = n c_n x^{n-1} + (n-1) c_{n-1} x^{n-2} + \ldots + 2 c_2 x + c_1 + 0$$
Remark: Of course, we have defined this derivative by the formula that we know is "correct" if it were
defined as a limit.
Proposition: Using the purely algebraic definition of derivative, for polynomials $f, g$ with coefficients in $\mathbf{Z}$,
and for $r \in \mathbf{Z}$, we have
$(rf)' = r f'$ (constant-multiple rule)
$(f + g)' = f' + g'$ (sum rule)
$(fg)' = f'g + fg'$ (product rule)
$(f \circ g)' = (f' \circ g) \cdot g'$ (chain rule)
Proof: We know from calculus that these assertions hold, even though we didn't mention limits here.
|
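The limit-free definition of the derivative is easy to experiment with. The sketch below is not from the original notes: it assumes a polynomial is stored as a list `coeffs` with `coeffs[i]` the coefficient of $x^i$, and the helper names are invented for the illustration. It checks the product rule on one pair of polynomials.

```python
def derivative(coeffs):
    """Formal derivative: the coefficient c_i of x**i becomes i*c_i on x**(i-1)."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_add(f, g):
    """Coefficientwise sum, padding the shorter list with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]


def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


f = [-2, 0, 1]        # x^2 - 2
g = [1, 3, 0, 5]      # 5x^3 + 3x + 1
lhs = derivative(poly_mul(f, g))
rhs = poly_add(poly_mul(derivative(f), g), poly_mul(f, derivative(g)))
print(lhs == rhs)
```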
Theorem: (Hensel's Lemma) Let $f$ be a polynomial with coefficients in $\mathbf{Z}$. Let $p$ be a prime number, and
suppose that $x_n \in \mathbf{Z}$ satisfies
$$f(x_n) \equiv 0 \bmod p^n$$
with $n > 0$. Suppose that $f'(x_1) \not\equiv 0 \bmod p$. Let $f'(x_1)^{-1}$ be an integer which is a multiplicative inverse to
$f'(x_1)$ modulo $p$. Then
$$x_{n+1} = x_n - f(x_n) f'(x_1)^{-1}$$
satisfies
$$f(x_{n+1}) \equiv 0 \bmod p^{n+1}$$
Further, from this construction,
$$x_{n+1} = x_n \bmod p^n$$
In particular, for every index $n$,
$$x_n = x_1 \bmod p$$
Remark: Note that the quantity $f'(x_1)^{-1} \bmod p$ does not need to be recomputed each cycle of the iteration,
but only once at the beginning.
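The iteration in the theorem can be sketched as follows. This is an illustration under stated assumptions rather than anything in the original notes: polynomials are coefficient lists with `coeffs[i]` the coefficient of $x^i$, the names `evaluate`, `derivative` and `hensel_lift` are invented here, and the modular inverse uses Python's three-argument `pow` with exponent `-1` (Python 3.8 or later).

```python
def evaluate(coeffs, x, modulus):
    """Horner evaluation of the polynomial at x, reduced modulo `modulus`."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % modulus
    return value


def derivative(coeffs):
    """Formal (limit-free) derivative of a coefficient list."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def hensel_lift(coeffs, x1, p, n):
    """Lift a root x1 of f mod p to a root of f mod p**n.

    Assumes f(x1) == 0 mod p and f'(x1) != 0 mod p; pow(..., -1, p)
    raises ValueError when f'(x1) is not invertible mod p.
    """
    inverse = pow(evaluate(derivative(coeffs), x1, p), -1, p)  # computed once
    x = x1
    for k in range(1, n):
        modulus = p ** (k + 1)
        x = (x - evaluate(coeffs, x, modulus) * inverse) % modulus
    return x


root = hensel_lift([-2, 0, 1], 3, 7, 3)   # f(x) = x^2 - 2, starting from 3 mod 7
print(root)
```

Starting from 3 mod 7, this passes through 10 mod $7^2$ and reaches 108 mod $7^3$, the same values as in the numerical example.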
Proof: First, let's check that if $f$ has integer coefficients, then for every positive integer $k$ the quotient
$f^{(k)}/k!$ has integer coefficients, where $f^{(k)}$ is the $k$-th derivative of $f$. To prove this it suffices to look at
$f(x) = x^n$, since every polynomial with integer coefficients is a sum of multiples of such things. In this case,
$f^{(k)}/k! = \binom{n}{k} x^{n-k}$. Since $\binom{n}{k}$ appears as a coefficient in $(x+1)^n$, which has integer coefficients (!), this
proves what we wanted.
Now we can almost prove the theorem. Let $y = -f(x_n) f'(x_n)^{-1} \bmod p^{n+1}$. Note that this expression
uses $f'(x_n)^{-1} \bmod p^{n+1}$ instead of $f'(x_1)^{-1} \bmod p$. We'll have to come back at the end and take care of
this adjustment.
We have $y \equiv 0 \bmod p^n$. Since $f$ is a polynomial, a Taylor expansion for it about any point is finite, and
converges to $f$. Thus,
$$f(x_n + y) = f(x_n) + \frac{f'(x_n)}{1!} y + \frac{f''(x_n)}{2!} y^2 + \frac{f^{(3)}(x_n)}{3!} y^3 + \ldots$$
(The sum is finite!) Each $f^{(i)}(x_n)/i!$ is an integer, and $p^{2n}$ divides $y^2, y^3, y^4, \ldots$, so $p^{2n}$ divides
$$\frac{f''(x_n)}{2!} y^2 + \frac{f^{(3)}(x_n)}{3!} y^3 + \ldots$$
Since $2n \ge n+1$, these higher terms vanish modulo $p^{n+1}$. And
$$f(x_n) + \frac{f'(x_n)}{1!} y = f(x_n) + \frac{f'(x_n)}{1!} \left( -f(x_n) f'(x_n)^{-1} \right)$$
Since $f'(x_n)^{-1}$ is a multiplicative inverse of $f'(x_n)$ modulo $p$, there is an integer $t$ so that
$$f'(x_n) \cdot f'(x_n)^{-1} = 1 + tp$$
Then
$$f(x_n) + \frac{f'(x_n)}{1!} y = f(x_n) + \frac{f'(x_n)}{1!} \left( -f(x_n) f'(x_n)^{-1} \right) = f(x_n) - f(x_n)(1 + tp) = -f(x_n) \cdot tp$$
Since $f(x_n) \equiv 0 \bmod p^n$, and we have picked up one further factor of $p$, this is 0 modulo $p^{n+1}$, as claimed.
Further, regarding the last assertions of the theorem, note the quantity $f(x_n)/f'(x_n)$ by which we adjust
$x_n$ to get $x_{n+1}$ is a multiple of $p^n$.
Finally, we need to check that
$$f(x_n) f'(x_n)^{-1} = f(x_n) f'(x_1)^{-1} \bmod p^{n+1}$$
where $f'(x_1)^{-1}$ is just an inverse mod $p$, not mod $p^{n+1}$. Since $x_n = x_1 \bmod p$, and since $f'$ has integer
coefficients, it is not so hard to check that
$$f'(x_n) = f'(x_1) \bmod p$$
Therefore,
$$f'(x_n)^{-1} = f'(x_1)^{-1} \bmod p$$
Further, by hypothesis $p^n$ divides $f(x_n)$, so, multiplying through by $f(x_n)$ gives
$$f(x_n) f'(x_n)^{-1} = f(x_n) f'(x_1)^{-1} \bmod p \cdot p^n$$
This verifies that we don't need to compute $f'(x_n)^{-1} \bmod p^{n+1}$, but just the single quantity $f'(x_1)^{-1}$. |
Remark: We could give purely algebraic proofs of the differentiation formulas and the representability of
polynomials by their Taylor expansions, but this can be done later in greater generality anyway, so we'll be
content with the calculus-based argument here.
#8.68 Find an integer $x$ so that $x \equiv 3 \bmod 5$ and $x \equiv 4 \bmod 7$.
#8.69 Find an integer $x$ so that $3x \equiv 2 \bmod 5$ and $4x \equiv 5 \bmod 7$.
#8.70 Find four integers $x$ which are distinct modulo $5 \cdot 7$ and so that $x^2 \equiv 1 \bmod 5$ and $x^2 \equiv 1 \bmod 7$.
That is, find 4 different square roots of 1 modulo 35.
#8.71 Find four different square roots of 2 modulo $7 \cdot 23$.
#8.72 Explain why there are 8 different square roots of 1 modulo $3 \cdot 5 \cdot 7 = 105$.
#8.73 Find $\sqrt{2} \bmod 7^5$ via Hensel's Lemma.
#8.74 Find $\sqrt{-1} \bmod 5^6$ via Hensel's Lemma.
#8.75 (*) Discuss the failure of the Quadratic Formula to solve the equation $x^2 + x + 1 \equiv 0 \bmod 2$.
9. Good algorithm for exponentiation
Fast exponentiation
9.1 Fast exponentiation
The most naive version of exponentiation, in which to compute $x^n$ one computes $x^2$, then $x^3 = x \cdot x^2$,
then $x^4 = x \cdot x^3$, ..., $x^n = x \cdot x^{n-1}$, is very inefficient. Here we note a very simple but much faster
improvement upon this, which has been known for at least 3000 years. This improvement is especially
relevant for exponentiation modulo $m$.
The idea is that to compute $x^e$ we express $e$ as a binary integer
$$e = e_0 + e_1 \cdot 2^1 + e_2 \cdot 2^2 + \ldots + e_n \cdot 2^n$$
with each $e_i$ equal to 0 or 1, and compute power-of-two powers of $x$ by squaring:
$x^2 = x \cdot x$
$x^4 = (x^2)^2$
$x^8 = (x^4)^2$
$x^{2^4} = (x^8)^2$
$x^{2^5} = (x^{2^4})^2$
...
Then
$$x^e = x^{e_0} (x^2)^{e_1} (x^4)^{e_2} (x^8)^{e_3} (x^{2^4})^{e_4} \ldots (x^{2^n})^{e_n}$$
Again, the $e_k$'s are just 0 or 1, so in fact this notation is clumsy: we omit the factor $x^{2^k}$ if $e_k = 0$ and include
the factor $x^{2^k}$ if $e_k = 1$.
A fairly good way of implementing this is the following. To compute $x^e$, we will keep track of a triple
$(X, E, Y)$ which initially is $(X, E, Y) = (x, e, 1)$. At each step of the algorithm:
If $E$ is odd then replace $Y$ by $X \cdot Y$ and replace $E$ by $E - 1$.
If $E$ is even then replace $X$ by $X \cdot X$ and replace $E$ by $E/2$.
When $E = 0$ the value of $Y$ at that time is $x^e$.
This algorithm takes at most about $2 \log_2 e$ steps (although of course the numbers involved grow considerably!)
For our purposes, this pretty fast exponentiation algorithm is of special interest when combined with
reduction modulo $m$. The rewritten algorithm is: to compute $x^e$ % $m$, we will keep track of a triple $(X, E, Y)$
which initially is $(X, E, Y) = (x, e, 1)$. At each step of the algorithm:
If $E$ is odd then replace $Y$ by $X \cdot Y$ % $m$ and replace $E$ by $E - 1$.
If $E$ is even then replace $X$ by $X \cdot X$ % $m$ and replace $E$ by $E/2$.
When $E = 0$ the value of $Y$ at that time is $x^e$ % $m$.
Again, this algorithm takes at most about $2 \log_2 e$ steps. When the exponentiation is done modulo $m$, the
numbers involved stay below $m^2$, as well.
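The triple-based procedure just described translates almost line for line into code. The following is a minimal sketch, not the author's own implementation; the function name `fast_pow_mod` is chosen here, and the starting value $x$ is reduced mod $m$ first so that no intermediate product exceeds $m^2$.

```python
def fast_pow_mod(x, e, m):
    """Compute x**e % m by tracking the triple (X, E, Y) from the text."""
    X, E, Y = x % m, e, 1 % m
    while E > 0:
        if E % 2 == 1:          # E odd: fold X into the output, decrement E
            Y = (X * Y) % m
            E -= 1
        else:                   # E even: square X, halve E
            X = (X * X) % m
            E //= 2
    return Y


print(fast_pow_mod(2, 1000, 89))
```

Python's built-in three-argument `pow(x, e, m)` computes the same quantity and is what one would use in practice; the loop above is only meant to make the steps visible.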
Note that in the fast exponentiation modulo $m$, no number larger than $m^2$ will arise. Thus, for example,
to compute something like
$2^{1000}$ % 1000001
would require no more than $2 \log_2 1000 \approx 2 \cdot 10 = 20$ multiplications of numbers of at most 7 digits. Generally, we have
Proposition: The above algorithm for evaluation of $x^e$ % $m$ uses $O(\log e \cdot \log^2 m)$ bit operations.
For example, let's directly evaluate $2^{1000} \bmod 89$. Setting this up as indicated just above, we have
  'X'    'E'  'output'
    2   1000      1     initial state
    4    500      1     'E' was even: square 'X' mod 89
   16    250      1     'E' was even: square 'X' mod 89
   78    125      1     'E' was even: square 'X' mod 89
   78    124     78     'E' was odd: multiply 'output' by 'X' mod 89
   32     62     78     'E' was even: square 'X' mod 89
   45     31     78     'E' was even: square 'X' mod 89
   45     30     39     'E' was odd: multiply 'output' by 'X' mod 89
   67     15     39     'E' was even: square 'X' mod 89
   67     14     32     'E' was odd: multiply 'output' by 'X' mod 89
   39      7     32     'E' was even: square 'X' mod 89
   39      6      2     'E' was odd: multiply 'output' by 'X' mod 89
    8      3      2     'E' was even: square 'X' mod 89
    8      2     16     'E' was odd: multiply 'output' by 'X' mod 89
   64      1     16     'E' was even: square 'X' mod 89
   64      0     45     'E' was odd: multiply 'output' by 'X' mod 89
We conclude that
$2^{1000}$ % 89 = 45
#9.76 Estimate the number of steps necessary to compute $2^{100}$ by the fast exponentiation algorithm.
#9.77 Compute $2^{33} \bmod 19$.
#9.78 Compute $2^{129} \bmod 19$.
#9.79 Compute $2^{127} \bmod 19$.
10. Fermat's Little Theorem
More than 350 years ago Pierre de Fermat made many astute observations regarding prime numbers,
factorization into primes, and related aspects of number theory (not to mention other parts of mathematics
and science as well.) About 300 years ago, Leonhard Euler systematically continued Fermat's work. Most
of these things were prototypes for "modern" mathematical ideas, and at the same time remain very much
relevant to contemporary number theory and its applications.
Fermat's Little Theorem
Factoring $b^n - 1$
Examples: factoring Mersenne numbers
Examples: factoring $3^n - 1$
A formula for square roots mod p
A formula for nth roots mod p
10.1 Fermat's Little Theorem
This little result is over 350 years old. It is basic in elementary number theory itself, and is the origin
of the first probabilistic primality test. It is possible to prove Fermat's Little Theorem with very minimal
prerequisites, as we'll do now.
Theorem: Let $p$ be a prime number. Then for any integer $x$
$$x^p \equiv x \bmod p$$
Proof:
We will first prove that prime $p$ divides the binomial coefficients
$$\binom{p}{i}$$
with $1 \le i \le p - 1$, keeping in mind that the "extreme" cases $i = 0$ and $i = p$ can't possibly also have this
property, since
$$\binom{p}{0} = \binom{p}{p} = 1$$
Indeed, from its definition,
$$\binom{p}{i} = \frac{p!}{i! \, (p-i)!}$$
Certainly $p$ divides the numerator. Since $0 < i < p$, the prime $p$ divides none of the factors in the factorials
in the denominator. By unique factorization into primes, this means that $p$ does not divide the denominator
at all.
From the Binomial Theorem,
$$(x + y)^p = \sum_{0 \le i \le p} \binom{p}{i} x^i y^{p-i}$$
In particular, since the coefficients of the left-hand side are integers the same must be true of the right-hand
side. Thus, all the binomial coefficients are integers. (We did not use the fact that $p$ is prime to reach this
conclusion.)
Thus, the binomial coefficients with $0 < i < p$ are integers expressed as fractions whose numerators
are divisible by $p$ and whose denominators are not divisible by $p$. Thus, when all cancellation is done in
the fraction, there must remain a factor of $p$ in the numerator. This proves the desired fact about binomial
coefficients.
Now we prove Fermat's Little Theorem (for positive $x$) by induction on $x$. First, certainly $1^p \equiv 1 \bmod p$.
For the induction step, suppose that we already know for some particular $x$ that
$$x^p \equiv x \bmod p$$
Then
$$(x + 1)^p = \sum_{0 \le i \le p} \binom{p}{i} x^i 1^{p-i} = x^p + \sum_{0 < i < p} \binom{p}{i} x^i + 1$$
All the coefficients in the sum in the middle of the last expression are divisible by $p$. Therefore,
$$(x + 1)^p \equiv x^p + 0 + 1 \equiv x + 1 \bmod p$$
since our induction hypothesis is that $x^p \equiv x \bmod p$. This proves the theorem for positive $x$.
The case $x = 0$ is immediate. To prove the theorem for $x < 0$ we use the fact that $-x$ is then positive. For $p = 2$ we can just treat
the two cases $x \equiv 0 \bmod 2$ and $x \equiv 1 \bmod 2$ separately and directly. For $p > 2$ we use the fact that such a
prime is odd. Thus,
$$x^p = -(-x)^p \equiv -(-x) \bmod p = x \bmod p$$
by using the result for positive integers.
|
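Both ingredients of the proof are easy to spot-check numerically. The sketch below is an illustration only (the function names are invented here); it uses `math.comb` from the Python standard library (Python 3.8 or later) for the binomial coefficients.

```python
from math import comb


def middle_binomials_divisible(p):
    """True if p divides every binomial coefficient C(p, i) with 0 < i < p."""
    return all(comb(p, i) % p == 0 for i in range(1, p))


def fermat_holds(p, xs):
    """True if x**p and x agree mod p for every x in xs (negative x allowed)."""
    return all(pow(x, p, p) == x % p for x in xs)


for p in (2, 3, 5, 7, 11, 13):
    print(p, middle_binomials_divisible(p), fermat_holds(p, range(-20, 21)))
```

For a composite modulus the first property can fail: for instance the binomial coefficient $\binom{4}{2} = 6$ is not divisible by 4.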
10.2 Factoring $b^n - 1$
Using Fermat's Little Theorem, we can follow in his footsteps and speed up certain special factorizations
by a significant factor. First we prove a Lemma that looks too good to be true:
Lemma: Let $b > 1$. Then for any two positive integers $m, n$,
$$\gcd(b^m - 1, b^n - 1) = b^{\gcd(m, n)} - 1$$
Remark: From elementary algebra we should remember the identity
$$x^N - 1 = (x - 1)(x^{N-1} + x^{N-2} + \ldots + x^2 + x + 1)$$
for positive integers $N$. For a positive divisor $d$ of $n$, letting $x = b^d$ and $N = n/d$, we obtain
$$b^n - 1 = (b^d)^N - 1 = (b^d - 1)\left( (b^d)^{N-1} + (b^d)^{N-2} + \ldots + (b^d)^2 + (b^d) + 1 \right)$$
Thus, for simple reasons $b^d - 1$ divides $b^n - 1$ for $d \mid n$.
Proof: First, note that if $m = n$ then the assertion of the lemma is certainly true. The rest of the proof
is by induction on the larger of $m, n$. We may suppose that $m \le n$ (reversing the roles of $m, n$ if necessary).
In the case that $n = 1$, the assertion would be that the gcd of $b - 1$ and $b - 1$ is $b - 1$, which is certainly
true. Now the induction step. We may suppose that $m < n$, since the $m = n$ case has been treated already.
Note that
$$(b^n - 1) - b^{n-m}(b^m - 1) = b^{n-m} - 1$$
We claim that
$$\gcd(b^m - 1, b^n - 1) = \gcd(b^m - 1, b^{n-m} - 1)$$
On one hand, if $d \mid b^n - 1$ and $d \mid b^m - 1$, then $d \mid (b^n - 1) - b^{n-m}(b^m - 1)$, and then $d \mid b^{n-m} - 1$. Thus, any
common divisor $d$ of $b^n - 1$ and $b^m - 1$ also is a divisor of $b^{n-m} - 1$. On the other hand, from the rearranged
expression
$$b^n - 1 = b^{n-m}(b^m - 1) + b^{n-m} - 1$$
any common divisor of $b^m - 1$ and $b^{n-m} - 1$ divides the right-hand side, so divides $b^n - 1$. This proves the
claim.
Thus, invoking the induction hypothesis, we have
$$\gcd(b^m - 1, b^n - 1) = \gcd(b^m - 1, b^{n-m} - 1) = b^{\gcd(m, n-m)} - 1$$
So we should show that $\gcd(m, n) = \gcd(m, n - m)$: this follows the same standard idea as the proof of the
last claim: On one hand, certainly if $d$ is a common divisor of $m$ and $n$, then $d \mid n - m$. On the other hand,
using $n = m + (n - m)$, if $d$ is a common divisor of $m$ and $n - m$ then $d \mid n$ as well.
|
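Since the Lemma "looks too good to be true", a brute-force check over small cases may be reassuring. This is just a sanity check with an invented function name, not a substitute for the proof.

```python
from math import gcd


def lemma_holds(b, m, n):
    """Check gcd(b^m - 1, b^n - 1) == b^gcd(m, n) - 1 for one triple (b, m, n)."""
    return gcd(b**m - 1, b**n - 1) == b**gcd(m, n) - 1


print(all(lemma_holds(b, m, n)
          for b in range(2, 6)
          for m in range(1, 13)
          for n in range(1, 13)))
```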
Corollary: Fix a positive integer $b$. Let $n$ be a positive integer. If a prime $p$ divides $b^n - 1$, then either
$p \mid b^d - 1$ for some divisor $d$ of $n$ with $d < n$, or $p \equiv 1 \bmod n$.
Proof: Suppose that $p$ divides $b^n - 1$. Then $p$ does not divide $b$, so by Fermat's Little Theorem $b^{p-1} \equiv 1 \bmod p$, and $p$ divides $b^{p-1} - 1$.
Therefore, by the lemma, $p$ divides $b^{\gcd(n, p-1)} - 1$. If $d = \gcd(n, p - 1) < n$, then certainly $d < n$ is a positive
divisor of $n$ with $p \mid b^d - 1$. If $\gcd(n, p - 1) = n$, then $n \mid p - 1$, which is to say that $p \equiv 1 \bmod n$.
|
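The Corollary can also be tested directly on small cases. In the sketch below, which is illustrative only, `prime_factors` is a naive trial-division helper written for this check and `corollary_holds` is an invented name; neither comes from the original notes.

```python
def prime_factors(N):
    """Prime factors of N (with repetition) by naive trial division."""
    factors = []
    d = 2
    while d * d <= N:
        while N % d == 0:
            factors.append(d)
            N //= d
        d += 1
    if N > 1:
        factors.append(N)
    return factors


def corollary_holds(b, n):
    """Each prime p dividing b^n - 1 divides b^d - 1 for a proper divisor d of n, or p = 1 mod n."""
    for p in set(prime_factors(b**n - 1)):
        from_smaller = any((b**d - 1) % p == 0
                           for d in range(1, n) if n % d == 0)
        if not (from_smaller or p % n == 1 % n):
            return False
    return True


print(all(corollary_holds(b, n) for b in (2, 3, 5) for n in range(1, 17)))
```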
Remark: The latter corollary shows that divisors of numbers of the form $b^n - 1$ are considerably restricted.
Further, for odd primes $p$ and odd $n$, since $\gcd(n, 2) = 1$, if $n \mid p - 1$ then from $2 \mid p - 1$ we can conclude that
$2n \mid p - 1$, so $p \equiv 1 \bmod 2n$.
10.3 Examples: factoring Mersenne numbers
The restriction on the possible prime factors of numbers of the form $b^n - 1$ noted above reduces by a
significant factor the time to factor (by otherwise naive methods) Mersenne numbers $2^n - 1$.
Example: Factor $127 = 2^7 - 1$: Since 7 is prime, the corollary shows that the only possible prime factors $p$
of this number must satisfy $p \equiv 1 \bmod 14$. On the other hand, $\sqrt{127} < 12$, so we need only attempt division
by primes under 12. But there aren't any such things that are also congruent to 1 modulo 14, so $2^7 - 1$ must
be prime.
Remark: Even though we could easily test primality of 127 by hand anyway, it is pretty cute that we can
also do it "by pure thought" (meaning without computing very much).
Example: Factor $255 = 2^8 - 1$: The composite exponent yields many factors: from $2^2 - 1 = 3$ we get 3,
from $2^4 - 1 = 15 = 3 \cdot 5$ we get 5. Dividing, we have
$$255/(3 \cdot 5) = 17$$
which is prime. So $2^8 - 1 = 3 \cdot 5 \cdot 17$.
Example: Factor $511 = 2^9 - 1$: The composite exponent gives a factor $2^3 - 1 = 7$. Then $511/7 = 73$, which
is prime. So $2^9 - 1 = 7 \cdot 73$.
Example: Factor $1023 = 2^{10} - 1$: First, since the exponent is composite, this Mersenne number is certainly
composite. We note that the (positive) divisors of 10 less than 10 are 1, 2, 5, so we have divisors $3 = 2^2 - 1$
and $31 = 2^5 - 1$ of $2^{10} - 1$. The corollary then tells us that any other primes dividing $2^{10} - 1$ must be
congruent to 1 modulo 10. First try 11: indeed
$$1023/11 = 93 = 3 \cdot 31$$
So
$$1023 = 3 \cdot 11 \cdot 31$$
Of course, since $2^{10} - 1$ has the small factor 3, already $1023/3 = 341$ is small enough that we might not mind
continuing its factorization by hand. Especially after being handed the factor of $31 = 2^5 - 1$, and computing
$341/31 = 11$, there's nothing left to the imagination.
Example: Factor $2047 = 2^{11} - 1$: Now we don't have any "cheap" factors, since 11 is prime. If this number
were to turn out to be prime, then it would be a Mersenne prime. The corollary above assures us that
any prime $p$ dividing 2047 must satisfy $p \equiv 1 \bmod 11$. Since $p$ also must be odd, as noted above we can in
fact assert that such $p$ satisfies $p \equiv 1 \bmod 22$. So we attempt division of 2047 by $23 = 22 + 1$, and find that
$2047/23 = 89$. (Since $89 < 100$, trial division by merely 2, 3, 5, 7 shows that 89 is prime.) So $2047 = 23 \cdot 89$.
Example: Factor $4095 = 2^{12} - 1$: The exponent is so composite that we have many easy prime factors
arising from the factors $2^d - 1$ with $d < 12$ dividing 12. That is, we can first look at the prime factors of
$2^2 - 1 = 3$, $2^3 - 1 = 7$, $2^4 - 1 = 15 = 3 \cdot 5$, and $2^6 - 1 = 63 = 3^2 \cdot 7$. Thus, 4095 is divisible by $3^2 \cdot 5 \cdot 7$.
Dividing, we are left with
$$4095/(3^2 \cdot 5 \cdot 7) = 13$$
So the whole factorization is $4095 = 3^2 \cdot 5 \cdot 7 \cdot 13$.
Example: Factor $8191 = 2^{13} - 1$. The exponent 13 is prime, so there are no obvious factors. If this number
were to turn out to be prime, then it would be a Mersenne prime. Since $\sqrt{8191} \approx 90.5$, we need only do
trial division by primes under 90. The corollary above says that we need only consider primes $p \equiv 1 \bmod 26$.
First, $26 + 1 = 27$ is not prime, so we need not attempt division by it. Second, $2 \cdot 26 + 1 = 53$ is prime, but
8191 % 53 = 29. Then $3 \cdot 26 + 1 = 79$ is prime, but 8191 % 79 = 54. So 8191 is prime.
Example: Factor $16383 = 2^{14} - 1$. We look at $2^2 - 1 = 3$ and $2^7 - 1 = 127$ first. We saw above that 127 is
prime. So we can take out prime factors of 3 and 127, leaving
$$16383/(3 \cdot 127) = 43$$
which we recognize as being prime.
Example: Factor $32767 = 2^{15} - 1$: From $2^3 - 1 = 7$ and $2^5 - 1 = 31$ we find prime factors 7 and 31. Dividing
out, we have
$$32767/(7 \cdot 31) = 151$$
We could attack this by hand, or invoke the corollary to restrict our attention to primes $p$ with $p \equiv 1 \bmod 30$
which are less than $\sqrt{151} < 13$. Since there aren't any such primes, we can conclude for qualitative reasons
that 151 is prime. Therefore, the prime factorization is $2^{15} - 1 = 7 \cdot 31 \cdot 151$.
Example: Factor $65535 = 2^{16} - 1$: From $2^2 - 1 = 3$, $2^4 - 1 = 15 = 3 \cdot 5$, $2^8 - 1 = 255 = 3 \cdot 5 \cdot 17$ (from
above), we obtain prime factors 3, 5, and 17. Dividing, we get
$$65535/(3 \cdot 5 \cdot 17) = 257$$
Now we invoke the corollary to restrict our attention to potential prime factors $p$ with $p \equiv 1 \bmod 16$. At the
same time, $p < \sqrt{257} < 17$. This excludes all candidates, so 257 is prime, and the prime factorization is
$$2^{16} - 1 = 3 \cdot 5 \cdot 17 \cdot 257$$
Example: Factor $131071 = 2^{17} - 1$: Since 17 is prime, we only look for prime factors $p$ with $p \equiv 1 \bmod 34$,
and also $p < \sqrt{131071} \approx 362.038$. First, $34 + 1 = 35$ is not prime. Next, $2 \cdot 34 + 1 = 69$ is divisible by 3, so
is not prime. Next, $3 \cdot 34 + 1 = 103$ is prime, but 131071 % 103 = 55. Next, $4 \cdot 34 + 1 = 137$ is prime, but
131071 % 137 = 99. Next, $5 \cdot 34 + 1 = 171$, which is divisible by 3. Next, $6 \cdot 34 + 1 = 205$, visibly divisible
by 5. Next, $7 \cdot 34 + 1 = 239$, which is prime (testing prime divisors 2, 3, 5, 7, 11, 13, all $< \sqrt{239} < 16$). But
131071 % 239 = 99. Next, $8 \cdot 34 + 1 = 273$, which is divisible by 3. Next, $9 \cdot 34 + 1 = 307$, which is prime
(testing prime divisors $< \sqrt{307} < 18$). But 131071 % 307 = 289. Next, $10 \cdot 34 + 1 = 341$, which is divisible
by 11. Since this is the last candidate prime below the bound 362, it must be that $131071 = 2^{17} - 1$ is prime.
Example: It turns out that $524287 = 2^{19} - 1$ is prime. To verify this in the most naive way, we would
have to look for possible prime divisors $p < \sqrt{524287} \approx 725$. Doing this directly would
require about $725/2 \approx 362$ trial divisions to verify the primality. But if we invoke the corollary and restrict
our attention to primes $p$ with $p \equiv 1 \bmod 38$, we'll only need about $725/38 \approx 19$ trial divisions.
Example: Factor $8388607 = 2^{23} - 1$: By the corollary, we need only look at primes $p$ with $p \equiv 1 \bmod 46$.
And, by luck 47 divides this number. Divide, to obtain $8388607/47 = 178481$. Anticipating (!?) that this
is a prime, we note that we must attempt division by primes $p < \sqrt{178481} \approx 422.5$. Looking only at these
special primes, we have about $422/46 \approx 9$ trial divisions to do, rather than the $422/2 \approx 211$ in the most
naive approach.
Example: Factor $536870911 = 2^{29} - 1$: Any possible prime factors $p$ satisfy $p \equiv 1 \bmod 58$, and if this
number is not prime then it has such a factor $p < \sqrt{536870911} \approx 23170$. Looking at 59, 117 (not prime), 175
(not prime), 233, ..., by luck it happens that 233 divides 536870911. Divide:
$$536870911/233 = 2304167$$
The latter is not divisible by 233. We know that if 2304167 is not prime then it has a prime divisor
$p < \sqrt{2304167} \approx 1518$. After 14 more trial divisions, we would find that the prime $19 \cdot 58 + 1 = 1103$ divides
2304167. Dividing, we have $2304167/1103 = 2089$. If 2089 were not prime, then it would have a prime factor
$p < \sqrt{2089} < 46$, but also $p \equiv 1 \bmod 58$. There aren't any such things, so 2089 is prime. Therefore,
$$536870911 = 233 \cdot 1103 \cdot 2089$$
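The hand computations in these examples follow one pattern when the exponent is prime: try only divisors of the form $2nk + 1$. A minimal sketch of that pattern follows; it is not from the original notes, the name `factor_mersenne` is invented here, and the routine assumes the exponent $n$ is an odd prime. Candidates $2nk + 1$ that are not prime are tried as well, which is harmless because smaller prime factors have already been divided out.

```python
def factor_mersenne(n):
    """Factor 2**n - 1 for an odd prime exponent n.

    By the corollary (with the remark on odd primes), every prime factor
    is congruent to 1 mod 2n, so only d = 2n*k + 1 is tried.
    """
    N = 2**n - 1
    factors = []
    step = 2 * n
    d = step + 1
    while d * d <= N:
        if N % d == 0:          # found a factor; try the same d again
            factors.append(d)
            N //= d
        else:
            d += step
    if N > 1:                   # whatever is left is prime
        factors.append(N)
    return factors


for n in (7, 11, 13, 17, 23, 29):
    print(n, factor_mersenne(n))
```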
10.4 Factoring $3^n - 1$
We continue with more examples using Fermat's observation about factors of special numbers of the
form $b^n - 1$.
Every number $3^n - 1$ (for $n > 1$) has the obvious factor $3 - 1$, so is not prime. But this is a rather
weak statement, since we might want the whole prime factorization, or at least be curious whether or not
$(3^n - 1)/(3 - 1)$ is prime. Fermat's trick is helpful in investigating this, in the same way that it was helpful
in looking at Mersenne numbers.
The trick we have in mind here asserts that if a prime $p$ divides $b^n - 1$ then either $p \mid (b^d - 1)$ for some
$d \mid n$ with $d < n$, or else $p \equiv 1 \bmod n$. And in case $n$ is odd (and $p$ is odd) then $p \equiv 1 \bmod 2n$. Thus, in rough terms, the
number of primes to attempt to divide into $b^n - 1$ is reduced by a factor of $n$ or $2n$.
First, $3^2 - 1 = 8 = 2^3$.
Next, $(3^3 - 1)/2 = 26/2 = 13$. Fermat's trick indicates that a prime dividing $3^3 - 1$ and not dividing
$3^1 - 1$ should be congruent to 1 mod $2 \cdot 3 = 6$, which is the case with 13.
Next, $3^4 - 1 = (3^2 - 1)(3^2 + 1)$. We factored $3^2 - 1$ above. The factor $3^2 + 1$ is still certainly divisible
by 2, and then $(3^2 + 1)/2 = 5$, which is indeed a prime congruent to 1 modulo 4.
Next, $3^5 - 1 = 242$. Taking out the factor $3 - 1$ leaves 121. By Fermat's trick, any prime dividing this
must be congruent to 1 modulo $2 \cdot 5 = 10$. Trying $10 + 1 = 11$, we see that in fact (as we really probably
knew all along) $121 = 11^2$.
Next, $3^6 - 1 = (3^3 - 1)(3^3 + 1) = (3^3 - 1)(3 + 1)(3^2 - 3 + 1)$. We already understand all the factors
except $3^2 - 3 + 1 = 7$. Indeed, this is a prime congruent to 1 modulo 6.
Next, $3^7 - 1 = 2186$. Taking out the factor of $3 - 1 = 2$ leaves 1093. By Fermat's trick, we know that
any prime factor of this must be congruent to 1 mod $2 \cdot 7 = 14$. Since $14 + 1 = 15$ is not prime, the first prime
we try to divide into 1093 is $2 \cdot 14 + 1 = 29$, and we find that 1093 % 29 = 20. Now $\sqrt{1093} \approx 33.06 < 34$,
so we know in advance that we need not test potential divisors for 1093 larger than 33. But 29 is the only
prime $< 34$ and congruent to 1 modulo 14, so we're done: 1093 is prime.
Next, $3^8 - 1 = (3^4 - 1)(3^4 + 1)$. After taking the factor of $3 - 1 = 2$ out of $3^4 + 1$ we have $(3^4 + 1)/2 = 41$.
Fermat's trick assures us that any prime not dividing $3^4 - 1 = 80 = 2^4 \cdot 5$ and dividing this will be congruent
to 1 modulo 8. Since there are no primes congruent to 1 mod 8 and $< \sqrt{41} < 7$, it follows that 41 is prime.
(Yes, we already knew that anyway.)
Next, $(3^9 - 1)/(3^3 - 1) = 3^6 + 3^3 + 1 = 757$. From Fermat, 757 must be divisible only by primes
congruent to 1 modulo $2 \cdot 9 = 18$. Also, if 757 is not prime then it will have a prime divisor $< \sqrt{757} < 28$.
The only prime in this range is $18 + 1 = 19$, but 757 % 19 = 16, so 757 is prime.
Next, $(3^{10} - 1)/(3^5 - 1) = 244$. Taking out the factor of $2^2$, this is 61. This has no prime factors dividing
$3^2 - 1 = 8$ nor $(3^5 - 1)/2 = 11^2$, so any prime factors of it must be congruent to 1 modulo 10. There are no
such primes $< \sqrt{61} < 8$, so 61 is prime. (Yes, we knew that already.)
Next, $(3^{11} - 1)/2 = 88573$. Any prime dividing this must be congruent to 1 modulo 22. Attempting
division by the first such, 23, gives a quotient of 3851 with no remainder. Trying again, 3851 % 23 = 10,
so 23 does not divide 3851. If 3851 is not prime, it has a prime divisor $< \sqrt{3851} \approx 62.06 < 63$. The next
candidate, $23 + 22 = 45$, is not prime. The next candidate is $45 + 22 = 67$, which is prime, but is too large
already. Therefore, 3851 is prime.
It turns out that $(3^{13} - 1)/2 = 797161$ is prime, but even with Fermat's speed-up, this would still take
about 35 trial divisions. This is plausible to do "by hand", certainly better than the over 400 trial divisions
that the most naive primality test would require.
Skipping ahead a little, let's look at $3^{15} - 1$. The quotient $(3^{15} - 1)/(3^5 - 1) = 59293$ has prime factors
either dividing $3^3 - 1 = 2 \cdot 13$ or congruent to 1 modulo $2 \cdot 15 = 30$. Trying 13, we get $59293/13 = 4561$.
(And 13 does not divide 4561.) The only prime divisors of this are congruent to 1 modulo 30. Also, if 4561
is not prime it must have a factor $< \sqrt{4561} \approx 67.54 < 68$. Trying 31, we find 4561 % 31 = 4. Trying 61, we
find 4561 % 61 = 47. Thus, 4561 is prime.
10.5 A formula for some square roots
In the case that a prime $p$ satisfies $p \equiv 3 \bmod 4$ we can also give a formula for the square root of a
square modulo prime $p$. Since we have a good algorithm for exponentiation, this formula should be viewed as
reasonably good for finding square roots. Note that it only applies if the prime modulus $p$ is congruent to 3
modulo 4, and only if the given number really is a square mod $p$. (Otherwise, the formula can be evaluated,
but the output is garbage.)
Theorem: Let $p$ be a prime satisfying $p \equiv 3 \bmod 4$. Then for an integer $y$ which is a square-modulo-$p$,
$$x = y^{(p+1)/4} \bmod p$$
is a square-root-mod-$p$ of $y$. That is, $x^2 \equiv y \bmod p$.
Remark: Unfortunately, if $y$ is not a square modulo $p$, the formula can be evaluated, but does not give
a square root of $y$ modulo $p$. Also, unfortunately, for $p \equiv 1 \bmod 4$ there is no formula for square roots
analogous to this.
Proof: First note that the expression $(p + 1)/4$ is not an integer unless $p \equiv 3 \bmod 4$. Suppose that $y = x^2$.
Let's check that $z = y^{(p+1)/4}$ has the property that $z^2 = y \bmod p$. (Note that we do not assert that $z = x$!)
Then
$$(y^{(p+1)/4})^2 = y^{(p+1)/2} = (x^2)^{(p+1)/2} = x^{p+1} = x^p \cdot x^1 = x \cdot x = y$$
where we get $x^p = x \bmod p$ from Fermat's Little Theorem.
|
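The square-root formula for primes congruent to 3 mod 4 is a one-line computation once fast exponentiation is available. The sketch below is illustrative only: the name `sqrt_mod_p` is invented here, Python's built-in `pow(y, e, p)` stands in for the fast exponentiation algorithm, and the final check guards against the "garbage" output that arises when $y$ is not a square.

```python
def sqrt_mod_p(y, p):
    """Square root of y mod a prime p with p % 4 == 3, or None if y is not a square."""
    if p % 4 != 3:
        raise ValueError("formula applies only to primes congruent to 3 mod 4")
    x = pow(y, (p + 1) // 4, p)
    if (x * x - y) % p != 0:    # the formula gave garbage: y is not a square
        return None
    return x


x = sqrt_mod_p(2, 103)
print(x, (x * x) % 103)
print(sqrt_mod_p(3, 7))         # 3 is not a square mod 7
```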
10.6 A formula for nth roots mod p
Generalizing the square root case above, in certain circumstances we have a formula to find nth roots
modulo $p$.
A non-zero $y$ modulo $p$ is an nth power (or, in archaic terminology, nth-power residue) modulo $p$ if
there is $x$ so that $x^n \equiv y \bmod p$. (If there is no such $x$, then $y$ is an nth-power non-residue.)
Theorem: Let $p$ be a prime. If $n$ is relatively prime to $p - 1$, then every $y$ has an nth root modulo $p$. In
particular, letting $r$ be a multiplicative inverse for $n$ modulo $p - 1$,
$$\text{nth root of } y \bmod p = y^r$$
Proof: We check that $(y^r)^n = y \bmod p$. For $p \mid y$, so that $y = 0 \bmod p$, this is easy. So now suppose that $p$
does not divide $y$. Since $rn \equiv 1 \bmod p - 1$ there is a non-negative integer $\ell$ so that $rn = 1 + \ell(p - 1)$. Then
$$(y^r)^n = y^{rn} = y^{1 + \ell(p-1)} = y \cdot (y^{p-1})^\ell = y \cdot 1^\ell = y$$
since Fermat's Little Theorem gives $y^{p-1} = 1 \bmod p$.
|
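For the case $\gcd(n, p - 1) = 1$ the recipe is: invert $n$ modulo $p - 1$, then raise $y$ to that power. A minimal sketch, assuming Python 3.8 or later for `pow(n, -1, p - 1)`; the function name `nth_root_coprime` is invented for this illustration.

```python
from math import gcd


def nth_root_coprime(y, n, p):
    """An nth root of y mod the prime p, assuming gcd(n, p - 1) == 1."""
    if gcd(n, p - 1) != 1:
        raise ValueError("this formula needs gcd(n, p - 1) = 1")
    r = pow(n, -1, p - 1)       # multiplicative inverse of n modulo p - 1
    return pow(y, r, p)


root = nth_root_coprime(2, 11, 101)
print(root, pow(root, 11, 101))
```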
Having treated the case that $\gcd(n, p - 1) = 1$, we will ignore the intermediate cases where $1 < \gcd(n, p - 1) < n$
and treat the other extreme where $\gcd(n, p - 1) = n$: We have a computationally effective
way to compute nth roots modulo primes $p$ with $n \mid (p - 1)$ as long as $\gcd(n, \frac{p-1}{n}) = 1$:
Theorem: Let $p$ be a prime so that $p \equiv 1 \bmod n$, but so that $\gcd(n, \frac{p-1}{n}) = 1$. Let $r$ be a multiplicative
inverse of $n$ modulo $(p - 1)/n$. If $y$ is an nth power then
$$\text{nth root of } y \bmod p = y^r$$
Proof: The basic mechanism of the argument is the same as the previous proof, with a few complications.
We check that $(y^r)^n = y \bmod p$. For $p \mid y$, so that $y = 0 \bmod p$, this is easy. So now suppose that $p$ does not
divide $y$. Since $rn \equiv 1 \bmod (p - 1)/n$ there is a non-negative integer $\ell$ so that $rn = 1 + \ell (p - 1)/n$. Also, since
we are assuming that $y$ has an nth root, we can express $y$ as $y = x^n \bmod p$. Then
$$(y^r)^n = ((x^n)^r)^n = x^{n \cdot rn} = x^{n(1 + \ell (p-1)/n)} = x^n \cdot (x^{p-1})^\ell = x^n \cdot 1^\ell = x^n = y \bmod p$$
where we invoke Fermat's Little Theorem to know that $x^{p-1} = 1 \bmod p$.
|
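The second formula differs from the first only in the modulus used to invert $n$. The sketch below is illustrative (the name `nth_root_dividing` is invented here, and `pow(n, -1, q)` needs Python 3.8 or later). Because the formula yields garbage when $y$ is not an nth power, the result is verified and `None` is returned in that case.

```python
from math import gcd


def nth_root_dividing(y, n, p):
    """An nth root of y mod the prime p when n divides p - 1 and gcd(n, (p-1)/n) == 1.

    Returns None when y turns out not to be an nth power mod p.
    """
    if (p - 1) % n != 0:
        raise ValueError("this formula needs p = 1 mod n")
    q = (p - 1) // n
    if gcd(n, q) != 1:
        raise ValueError("this formula needs gcd(n, (p - 1)/n) = 1")
    r = pow(n, -1, q)           # multiplicative inverse of n modulo (p - 1)/n
    root = pow(y, r, p)
    return root if pow(root, n, p) == y % p else None


y = pow(3, 11, 199)             # certainly an 11th power mod 199
root = nth_root_dividing(y, 11, 199)
print(y, root, pow(root, 11, 199))
```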
Remark: The formula in the latter theorem yields garbage if $y$ is not an nth power!
Remark: If $\gcd(n, \frac{p-1}{n}) > 1$ then computation of roots is more complicated.
Remark: The difference between the two formulas for the nth roots, for the two cases in the two theorems
just above is very important. Application of either one in the other situation yields garbage.
#10.80 Factor $5^n - 1$ into primes for $1 \le n \le 11$.
#10.81 Find a square root of 2 mod 103.
#10.82 Find 11th roots of 2 and 3 modulo 101.
#10.83 Find 11th roots of 141 and 162 modulo 199.
#10.84 Show that 2 is not an 11th power mod 199.
11. Euler's Theorem, Primitive Roots, Exponents, Roots
The direct successor to Fermat, Leonhard Euler systematically continued Fermat's work in number
theory and its applications.
Some of the proofs in this section are incomplete, since they depend on (for example) the existence
of primitive roots modulo primes, which we can only prove later. Nevertheless, these results illustrate the
relevance of the later more abstract results.
Euler's Theorem
Facts about primitive roots
Euler's criterion for nth roots mod p
11.1 Euler's Theorem
Here we state Euler's Theorem generalizing Fermat's Little Theorem. An intelligent proof of Euler's
Theorem is best given as a corollary of some basic group theory, so we postpone the proof till later.
For a positive integer $n$, the Euler phi-function $\varphi(n)$ is the number of integers $b$ so that $0 < b < n$
and $\gcd(b, n) = 1$.
Theorem: (Euler) For $x$ relatively prime to a positive integer $n$,
$$x^{\varphi(n)} = 1 \bmod n$$
(Proof later.)
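Although the proof is postponed, the statement is easy to test. The sketch below computes the phi-function straight from the counting definition above (so it is slow, and only meant for small $n$) and checks Euler's Theorem for every $x$ relatively prime to $n$; the function names are invented for this illustration.

```python
from math import gcd


def phi(n):
    """Number of integers b with 0 < b < n and gcd(b, n) == 1 (the definition above)."""
    return sum(1 for b in range(1, n) if gcd(b, n) == 1)


def euler_holds(n):
    """Check x**phi(n) == 1 mod n for every x in 1..n-1 relatively prime to n."""
    return all(pow(x, phi(n), n) == 1
               for x in range(1, n) if gcd(x, n) == 1)


print([phi(n) for n in range(2, 13)])
print(all(euler_holds(n) for n in range(2, 60)))
```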
The special case that $n$ is prime is just Fermat's Little Theorem, since for prime $p$ we easily see that
$\varphi(p) = p - 1$.
11.2 Facts about primitive roots
In this section we simply explain what a primitive root is supposed to be, and state what is true.
The existence of primitive roots modulo primes will be used just below to prove the "hard half" of Euler's
criteria for whether or not things have square roots (or nth roots) modulo primes. The proofs of existence
(and non-existence) of primitive roots require more preparation.
Let $n$ be a positive integer. An integer $g$ is a primitive root modulo $n$ if the smallest positive integer
$\ell$ so that $g^\ell = 1 \bmod n$ is $\varphi(n)$.
Note that Euler's theorem assures us that in any case for $g$ relatively prime to $n$ no exponent $\ell$ larger
than $\varphi(n)$ is necessary.
For "most" integers $n$ there is no primitive root modulo $n$. The precise statement about when there is
or is not a primitive root modulo $n$ is
Theorem: The only integers $n$ for which there is a primitive root modulo $n$ are those of the forms
$n = p^e$ with an odd prime $p$, and $e \ge 1$
$n = 2p^e$ with an odd prime $p$, and $e \ge 1$
$n = 2, 4$
This will be proven later. In particular, the most important case is that there do exist primitive roots
modulo primes.
It is useful to make clear one important property of primitive roots:
Proposition: Let $g$ be a primitive root modulo $n$. Let $\ell$ be an integer so that
$$g^\ell = 1 \bmod n$$
Then $\varphi(n)$ divides $\ell$.
Proof: Using the Division Algorithm, we may write $\ell = q \cdot \varphi(n) + r$ with $0 \le r < \varphi(n)$. Then
$$1 = g^\ell = g^{q \varphi(n) + r} = (g^{\varphi(n)})^q \cdot g^r = 1^q \cdot g^r = g^r \bmod n$$
Since $g$ is a primitive root, $\varphi(n)$ is the least positive exponent so that $g$ raised to that power is 1 mod $n$.
Thus, since $1 = g^r \bmod n$, it must be that $r = 0$. That is, $\varphi(n) \mid \ell$.
|
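The definition of primitive root and the classification theorem stated above can be explored by brute force for small moduli. This is only an experiment, not a proof; the helper names `phi`, `order` and `primitive_roots` are invented for the sketch.

```python
from math import gcd


def phi(n):
    """Number of integers b with 0 < b < n and gcd(b, n) == 1."""
    return sum(1 for b in range(1, n) if gcd(b, n) == 1)


def order(g, n):
    """Smallest positive exponent l with g**l == 1 mod n (g relatively prime to n)."""
    if gcd(g, n) != 1:
        raise ValueError("g must be relatively prime to n")
    value, l = g % n, 1
    while value != 1 % n:
        value = (value * g) % n
        l += 1
    return l


def primitive_roots(n):
    """All primitive roots modulo n among 1, ..., n-1."""
    return [g for g in range(1, n)
            if gcd(g, n) == 1 and order(g, n) == phi(n)]


print(primitive_roots(7))
print([n for n in range(2, 31) if primitive_roots(n)])
```

The second line of output lists the moduli up to 30 that have a primitive root, which can be compared with the three forms in the theorem.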
11.3 Euler's criterion for square roots mod p
Fermat's Little Theorem gave some information about square roots modulo primes $p$. Now we will
explain Euler's criterion, which gives a way to efficiently determine whether or not a given integer $y$ has
a square root modulo a prime $p$ (presuming that we are acquainted with a fast exponentiation algorithm).
To prove Euler's criterion we must grant the existence of primitive roots modulo primes, which will be
proven only later.
Given $y$, a square root of $y$ modulo $m$ (with $m$ not necessarily prime) is an integer $x$ so that
$$x^2 \equiv y \bmod m$$
If there is such an $x$, then $y$ is a square mod $m$, or, in archaic terminology, a quadratic residue mod $m$.
If there is no such $x$, then $y$ is a non-square mod $m$, or, in archaic terminology, a quadratic non-residue
mod $m$.
Remark: As with multiplicative inverses, there is essentially no tangible connection between these square
roots and square roots which may exist in the real or complex numbers. Thus, the expressions '$\sqrt{y}$' or '$y^{1/2}$'
have no intrinsic meaning for $y \in \mathbf{Z}/p$.
Example: Since 22 = 4 = ,1 mod 5, 2 is a square root of ,1 modulo 5. We would write
p
2 = ,1 mod 5
Note that the fact that there is no real number which is a square root of ,1 is no argument against the
existence of a square root of ,1 modulo 5.
Example: Since 42 = 16 = 5 mod 11,
p
4 = 5 mod 11
Example: There is no $\sqrt{2}$ modulo 5: to be sure of this, we compute 5 cases:
$$0^2 = 0 \ne 2 \bmod 5$$
$$1^2 = 1 \ne 2 \bmod 5$$
$$2^2 = 4 \ne 2 \bmod 5$$
$$3^2 = 9 = 4 \ne 2 \bmod 5$$
$$4^2 = 16 = 1 \ne 2 \bmod 5$$
Since $\mathbb{Z}/5$ consists of just the 5 congruence classes 0, 1, 2, 3, 4, we don't need to check any further to know
that there is no square root of 2 modulo 5.
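The exhaustive check in the last example can be sketched in a few lines of code. The helper names squares_mod and has_square_root are hypothetical, chosen only for this illustration; the approach is the naive one and takes time proportional to the modulus.

```python
def squares_mod(m):
    """Sorted list of all values x^2 % m as x runs over Z/m."""
    return sorted({(x * x) % m for x in range(m)})

def has_square_root(y, m):
    """Brute-force test of whether y is a square modulo m."""
    return (y % m) in squares_mod(m)

print(squares_mod(5))           # every square modulo 5
print(has_square_root(2, 5))    # 2 is not among them
print(has_square_root(-1, 5))   # 2^2 = 4 = -1 mod 5
print(has_square_root(5, 11))   # 4^2 = 16 = 5 mod 11
```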
From a naive viewpoint, it would appear that the only way to check whether the square root of $y$ modulo
$m$ exists is by brute force, squaring each element of $\mathbb{Z}/m$ in turn to see if by chance the value $y$ appears
among the squares. From this viewpoint, it would be especially laborious to be sure that something had
no square root, since all of $\mathbb{Z}/m$ would have to be searched. But Fermat's Little Theorem (together with
the fast exponentiation algorithm) gives some help here. At present, though, we can only prove half of the following
theorem:
11.4 Euler's criterion for roots mod p
Just as in the case of square roots, there is a criterion (due to Euler) for whether or not an integer $y$
can be an $n$th power modulo a prime $p$, when $p \equiv 1 \bmod n$. By contrast, we've already seen that when
$\gcd(n, p-1) = 1$, everything is an $n$th power mod $p$. In the case of square roots a little more can be said,
which is important in later discussion of quadratic symbols and the theorem on Quadratic Reciprocity.
Again, a non-zero $y$ modulo $p$ is an $n$th power (or, in archaic terminology, $n$th-power residue) modulo
$p$ if there is $x$ so that $x^n \equiv y \bmod p$. (If there is no such $x$, then $y$ is an $n$th-power non-residue.)
Theorem: (Euler's Criterion) Let $p$ be a prime with $p \equiv 1 \bmod n$. Let $y$ be relatively prime to $p$. Then
$y$ is an $n$th power mod $p$ if and only if $y^{(p-1)/n} \equiv 1 \bmod p$. As a special case, for odd primes $p$, for $p$ not
dividing $y$, $y$ is a non-zero square mod $p$ if and only if $y^{(p-1)/2} \equiv 1 \bmod p$.
Remark: This is a reasonable test for not being a square mod $p$. Apparently Euler was the first to observe
this, about 300 years ago. Also, later Quadratic Reciprocity will give another mechanism to test whether or
not something is a square modulo $p$. The criterion for $n$th powers has no replacement.
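To see the criterion at work, here is a minimal sketch that compares it with an exhaustive search. The function names are illustrative, and Python's built-in three-argument pow is assumed as the fast exponentiation routine.

```python
def is_nth_power_mod_p(y, n, p):
    """Euler's criterion; assumes p is prime, p = 1 mod n, and p does not divide y."""
    if (p - 1) % n != 0 or y % p == 0:
        raise ValueError("need p = 1 mod n and y prime to p")
    return pow(y, (p - 1) // n, p) == 1

def is_nth_power_brute_force(y, n, p):
    """Exhaustive search over all non-zero x modulo p."""
    return any(pow(x, n, p) == y % p for x in range(1, p))

# The non-zero cubes modulo 13, found with one exponentiation per candidate.
p, n = 13, 3
cubes = [y for y in range(1, p) if is_nth_power_mod_p(y, n, p)]
print(cubes)
```

The criterion costs one modular exponentiation per test, while the exhaustive search may need up to $p - 1$ of them.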
Proof: Easy half: Suppose that $y = x^n \bmod p$. Then, invoking Fermat's Little Theorem,
$$y^{(p-1)/n} = (x^n)^{(p-1)/n} = x^{p-1} = 1 \bmod p$$
as claimed.
Hard half: Now suppose that $y^{(p-1)/n} = 1 \bmod p$, and show that $y$ is an $n$th power. Let $g$ be a primitive
root modulo $p$, and let $\ell$ be a positive integer so that $g^\ell = y \bmod p$. We have
$$(g^\ell)^{(p-1)/n} = 1 \bmod p$$
From the discussion of primitive roots above, this implies that
$$(p-1) \mid \ell \cdot (p-1)/n$$
(since $\varphi(p) = p - 1$ for prime $p$). By unique factorization in the ordinary integers, the only way that this
can happen is that $\ell$ be divisible by $n$, say $\ell = kn$ for some integer $k$. Then
$$y = g^\ell = g^{kn} = (g^k)^n \bmod p$$
That is, $y$ is the $n$th power of $g^k$.
|
Corollary: (Euler's Criterion) Let $p$ be an odd prime. Let $y$ be relatively prime to $p$. Then
$$y^{(p-1)/2} \equiv 1 \bmod p \quad \text{if } y \text{ is a square mod } p$$
and
$$y^{(p-1)/2} \equiv -1 \bmod p \quad \text{if } y \text{ is a non-square mod } p$$
Proof: The only new thing to prove is that if $y$ is a non-square then $y^{(p-1)/2} \equiv -1 \bmod p$. Since by Fermat's
Little Theorem $y^{p-1} = 1 \bmod p$, certainly
$$(y^{(p-1)/2})^2 = 1 \bmod p$$
That is, $y^{(p-1)/2} \bmod p$ satisfies $x^2 = 1 \bmod p$, and (by the theorem, since $y$ is a non-square) it is not 1. Certainly $-1$ is one other solution to the
equation $x^2 = 1 \bmod p$. If we can show that there are no other solutions to this equation than $\pm 1$, then
we'll be done. Suppose $x$ is an integer so that $x^2 = 1 \bmod p$. Then, by definition, $p \mid (x-1)(x+1)$.
We recall that since $p$ is prime, if $p \mid ab$ then either $p \mid a$ or $p \mid b$. It might be good to review why this is
true: suppose that $p \mid ab$ but $p$ does not divide $a$, and aim to show that $p \mid b$. Let $ab = kp$. Since $p$ is prime
and does not divide $a$, $\gcd(p, a) = 1$, so there are integers $s, t$ so that $sp + ta = 1$. Then
$$b = b \cdot 1 = b \cdot (sp + ta) = bsp + tab = bsp + tkp = p \cdot (bs + tk)$$
Thus, $p \mid b$, as claimed.
Therefore, in the case at hand, if $p \mid (x-1)(x+1)$ then either $p \mid (x-1)$ or $p \mid (x+1)$. That is, as claimed,
$x = \pm 1 \bmod p$. This completes the proof that $y^{(p-1)/2} = -1 \bmod p$ if and only if $y$ is a non-square mod $p$
(and relatively prime to $p$).
|
#11.85 Is 2 a square modulo 101?
#11.86 Is 3 a square modulo 101?
#11.87 Is 2 a cube modulo 103?
#11.88 Is 3 a cube modulo 103?
#11.89 Is 2 a 7th power modulo 113?
#11.90 Is 15 a 7th power modulo 113?
12. (*) Public-Key Ciphers
A little history
Trapdoors
The RSA cipher
The ElGamal cipher
12.1 A little history
Until about 1975, the only kinds of ciphers in existence were symmetric ciphers, meaning that knowledge of the encryption key would easily give knowledge of the decryption key, and vice-versa. These are also
commonly called secret-key ciphers, since all of the keys involved have to be kept secret.
By contrast, a public-key or asymmetric cipher system is one in which knowledge of the encryption
key gives essentially no clue as to the decryption key. Looking at all the classical symmetric ciphers certainly
gives no inkling that a public-key cipher is even possible.
After some highly original (and unappreciated) work by Ralph Merkle, the general idea of a public-key
cipher was first proposed in W. Diffie and M. Hellman, New directions in cryptography, IEEE Transactions
on Information Theory IT-22 (1976), pp. 644-654. A public-key system based on the knapsack problem
appeared in R. Merkle and M. Hellman, Hiding information and signatures in trapdoor knapsacks, IEEE
Transactions on Information Theory IT-24 (1978), pp. 525-530. The latter system was "cracked", and
even though it has now been "fixed", the loss of confidence in the knapsack problem seems irreversible. One
of the most popular public-key systems is RSA, named after the authors: R.L. Rivest, A. Shamir, and L.
Adleman: A method for obtaining digital signatures and public-key cryptosystems, Comm. ACM 21 (1978),
pp. 120-126. The security of RSA is based upon the difficulty of prime factorizations. The ElGamal system
is a relative latecomer to the scene, appearing in T. El Gamal, A public key cryptosystem and signature
scheme based on discrete logarithms, IEEE Transactions on Information Theory IT-31 (1985), pp. 469-473.
The possibility of a public-key cipher system also gives rise to many applications that were previously
inconceivable.
A simple example of the use of public key ciphers is in a communications network. For $N$ people
to communicate among each other using an asymmetric cipher such as RSA requires only $N$ triples $(e, d, n)$:
each individual publishes their public key, so to communicate securely with them anyone simply encrypts
with the corresponding public key. That is, the whole communication network only requires one batch of
information $(e, d, n)$ per person. By contrast, a symmetric cipher would require a key per pair of people, which
would require $N(N-1)/2$ keys for $N$ people. Thus, using asymmetric ciphers greatly reduces the number of keys
required to maintain a communications network. Using the $N(N-1)/2$ keys for a symmetric cipher set-up for
a communication network involving $N$ people, in which every pair of people has an encryption/decryption
key pair, is not good. First, each person in the network must remember $N - 1$ keys. Second, there are
altogether $N(N-1)/2$ key pairs, which have to be created and distributed.
Because the encryption and decryption algorithms for asymmetric ciphers are considerably slower than
those for symmetric ciphers, in practice the asymmetric ciphers are used to securely exchange a session key
for a symmetric cipher to be used for the actual communication. That is, the only plaintext encrypted with
the asymmetric cipher is the key for a symmetric cipher, and then the faster-running symmetric cipher is
used for encryption of the actual message.
The trick of using session keys is by now very common in real uses of cryptography. Thus, in encryption
for secrecy, the advantages of public-key ciphers can be realized while at the same time benefiting from the
speed of symmetric ciphers. Further, in applications to new "exotic" protocols there is no replacement for
public-key ciphers.
12.2 Trapdoors
Each asymmetric (public-key) cipher depends upon the practical irreversibility of some process, usually
referred to as the trapdoor. At present, all the asymmetric ciphers believed to be reasonably secure make
use of tasks from number theory, although in principle there are many other possibilities.
The RSA cipher uses the fact that, while it is not hard to compute the product $n = pq$ of two large
primes $p, q$ (perhaps $\approx 10^{80}$ or larger), to factor a very large integer $n \approx 10^{160}$ into its prime factors seems
to be essentially impossible.
The El Gamal cipher uses the fact that, while exponentiation modulo large moduli is not hard,
computation of discrete logs is prohibitively difficult. That is, given $x, e, p$ (with $p$ prime) all $\approx 10^{140}$ or
so, to compute $y = x^e \% p$ is not too hard. But, going the other direction, to compute the exponent $e$ (the
discrete logarithm of $y$ base $x$ modulo $p$) given $y, x, p$ seems to be hard.
Note the qualifications in the last two paragraphs: we say that the tasks seem to be hard. At this time
there is no proof that factorization into primes is intrinsically terribly hard. On the other hand, there is
a great deal of practical evidence that this is a hard task: people have been thinking about this issue for
hundreds of years, and more recently more intensely so because of its relevance to cryptology. The same is
true of the discrete logarithm problem.
The earliest public-key cipher, that of Hellman-Merkle, had the opposite problem: while based upon
a provably hard problem, the so-called knapsack problem, the modification necessary to make decoding
possible fatally altered the problem so as to make it no longer hard!
In 1978, McEliece proposed a cipher based on algebraic coding theory. This used a Goppa code made
to appear as a general linear code. The decoding problem for general linear codes is provably difficult ("NP-complete"), while the decoding problem for Goppa codes is "easy". This cipher does not seem to have been
broken, but it is not as popular as RSA and ElGamal. The idea of "hiding" an easier problem inside an
NP-complete problem is similar to the trick in Hellman-Merkle, which seems to have made some people
nervous and suspicious of the security of the McEliece cipher.
So, in the end, although the problems haven't been proven hard, the apparent practical difficulty (after
wide attention!) of the problems of factoring and taking discrete logs makes RSA and ElGamal the most
popular public-key ciphers.
The recently-publicized elliptic-curve ciphers are technical variants of these others. The description
requires considerable further preparation.
12.3 The RSA cipher
The idea of this cipher is due to R.L. Rivest, A. Shamir, and L. Adleman: A method for obtaining
digital signatures and public-key cryptosystems, Comm. ACM 21 (1978), pp. 120-126. The key point is
that factoring large numbers into primes is difficult. Perhaps surprisingly, merely testing large numbers for
primality is much easier.
The hard task
Description of encryption and decryption
Elementary aspects of security of RSA
Speed of encryption/decryption algorithms
Key generation and management
Export regulations?!
The hard task
The hard task here is factorization of large integers into primes. Essential tasks which are relatively
easy are:
exponentiation $x^e \% n$ modulo $n$ for $n > 10^{160}$ and for large exponents $e$.
finding many large primes $p > 10^{80}$
As we will see, the contrast in apparent difficulties of these tasks is the basis for the security of the RSA
cipher.
The difficulty of factoring large integers into primes is intuitively clear, although this itself is no proof
of its difficulty. By contrast, it should be surprising that we can test large numbers for primality without
looking for their factors.
The issue of efficiently evaluating large powers $x^e$ of large integers $x$ reduced modulo large integers $n$ is
more elementary.
And keep in mind that the relevant sizes $n > 10^{160}$ and $p > 10^{80}$ will have to be increased somewhat as
computing speeds increase, even if no improvements in algorithms occur.
Description of encryption and decryption
There are two keys, $e$ and $d$. Auxiliary information, which is not secret, consists of a large-ish integer
$n$. (The nature of $n$, and the relation of $e, d$ to each other and to $n$ will be described below). A plaintext
$x$ is encoded first as a positive integer which we still call $x$, and for present purposes we require that
$x < n$.
Then the encoding step is
$$E_{n,e}(x) = x^e \% n$$
where $z \% n$ denotes the reduction of $z$ modulo $n$. This produces a ciphertext $y = x^e \% n$ which is also a
positive integer in the range $0 < y < n$. The decryption step is
$$D_{n,d}(y) = y^d \% n$$
That's it!
Of course, for the decryption step to really decrypt, the two keys $e, d$ must have the property that
$$(x^e)^d \equiv x \bmod n$$
for all integers $x$ (at least in the range $0 < x < n$). Euler's Theorem (below) asserts that if $\gcd(x, n) = 1$
then
$$x^{\varphi(n)} \equiv 1 \bmod n$$
where $\varphi(n)$ is the Euler phi-function evaluated at $n$, defined to be the number of integers $\ell$ in the range
$0 < \ell \le n$ with $\gcd(\ell, n) = 1$. Thus, the relation between $e$ and $d$ is that they are mutually multiplicative
inverses modulo $\varphi(n)$, meaning that
$$d \cdot e \equiv 1 \bmod \varphi(n)$$
In that case, we can verify that the encryption-decryption really works for $\gcd(x, n) = 1$.
$$D_{n,d}(E_{n,e}(x)) = (x^e \% n)^d \% n = (x^e)^d \% n$$
since by now we know that reduction modulo $n$ can be done whenever we feel like it, or not, in the course of
an arithmetic calculation whose answer will be reduced modulo $n$ at the end. By properties of exponents,
$$(x^e)^d \% n = x^{e \cdot d} \% n$$
Since $ed \equiv 1 \bmod \varphi(n)$, there is an integer $\ell$ so that
$$ed = 1 + \ell \varphi(n)$$
Then
$$x^{ed} = x^{1 + \ell \varphi(n)} = x^1 \cdot (x^{\varphi(n)})^\ell \equiv x \cdot 1^\ell \equiv x \bmod n$$
by invoking Euler's theorem.
Note that we must assume that the plaintext $x$ is prime to $n$. Since $n$ is the product of the two large
primes $p$ and $q$, being relatively prime to $n$ simply means not being divisible by either $p$ or $q$. The probability
that a "random" integer $x$ in the range $0 \le x < n$ would be divisible by $p$ or $q$ is
$$\frac{1}{p} + \frac{1}{q} - \frac{1}{pq}$$
This is a very tiny number, so we just ignore this possibility.
The encryption exponent $e$ (and decryption exponent $d$) must be prime to $\varphi(n) = (p-1)(q-1)$ so
that it will have a multiplicative inverse modulo $\varphi(n)$, which will be the decryption exponent $d$.
A common chain of events is the following. Alice picks two large primes $p$ and $q$ (with $p \ne q$), and puts
$n = pq$. The primes $p$ and $q$ must be kept secret. She then further picks the encryption and decryption
exponents $e$ and $d$ so that $e \cdot d \equiv 1 \bmod \varphi(n)$. She publishes the encryption exponent $e$ on her web page,
along with the modulus $n$. Her decryption exponent $d$ is kept secret also. Then anyone who wants to send
email to Alice encrypted so that only Alice can read it can encrypt plaintext $x$ by
$$E_{n,e}(x) = x^e \% n$$
Alice is the only person who knows the decryption exponent $d$, so she is the only one who can recover the
plaintext by
$$x = D_{n,d}(E_{n,e}(x))$$
Since in this situation she can make the encryption key public, often the encryption key e is called the
public key and the decryption key d is called the private key.
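The whole chain of events can be tried out with toy numbers. The sketch below is only an illustration: the primes 61 and 53 and the exponent 17 are arbitrary small values chosen for readability (far too small for any security), and the function names are hypothetical. The decryption exponent is obtained with the extended Euclidean algorithm.

```python
from math import gcd

def inverse_mod(a, m):
    """Multiplicative inverse of a modulo m via the extended Euclidean algorithm."""
    old_r, r = a % m, m
    old_s, s = 1, 0          # invariant: old_s * a = old_r mod m, s * a = r mod m
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
    if old_r != 1:
        raise ValueError("a is not invertible modulo m")
    return old_s % m

def rsa_keys(p, q, e):
    """Return (n, e, d) with d * e = 1 mod (p - 1)(q - 1); p, q are assumed prime."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be prime to (p - 1)(q - 1)")
    return n, e, inverse_mod(e, phi)

def rsa_encrypt(x, n, e):
    return pow(x, e, n)      # E(x) = x^e % n

def rsa_decrypt(y, n, d):
    return pow(y, d, n)      # D(y) = y^d % n

n, e, d = rsa_keys(61, 53, 17)
x = 65
y = rsa_encrypt(x, n, e)
print(n, e, d)
print(y, rsa_decrypt(y, n, d))   # the second value is the original plaintext
```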
Elementary aspects of security of RSA
The security of RSA more or less depends upon the difficulty of factorization of integers into primes.
This seems to be a genuinely difficult problem. But, more precisely, security of RSA depends upon a much
more special problem, the difficulty of factoring numbers of the special form $n = pq$ (with $p, q$ prime) into
primes. It is conceivable that the more special problem could be solved by special methods not applicable
to the general one. But for now the special-ness of the problem seems not to have allowed any particularly
good specialized factorization attacks.
The reason that difficulty of factorization makes RSA secure is that for $n$ the product of two big primes
$p, q$ (with the primes kept secret), it seems hard to compute $\varphi(n)$ when only $n$ is given. Of course, once the
prime factorization $n = p \cdot q$ is known, then it is easy to compute $\varphi(n)$ via the standard formula
$$\varphi(n) = \varphi(pq) = (p-1)(q-1)$$
If an attacker learns $\varphi(n)$, then the decryption exponent $d$ can be relatively easily computed from the
encryption exponent $e$, by using the Euclidean Algorithm, since the decryption exponent is just the
multiplicative inverse of $e$ modulo $\varphi(n)$.
In fact, we can prove that for numbers $n$ of this special form, knowing both $n$ and $\varphi(n)$ gives the
factorization $n = p \cdot q$ (with very little computation). The trick is based on the fact that $p, q$ are the roots
of the equation
$$x^2 - (p+q)x + pq = 0$$
Already $pq = n$, so if we can express $p + q$ in terms of $n$ and $\varphi(n)$, we will have the coefficients of this
equation expressed in terms of $n$ and $\varphi(n)$, giving an easy route to $p$ and $q$ separately.
Since
$$\varphi(n) = (p-1)(q-1) = pq - (p+q) + 1 = n - (p+q) + 1$$
we can rearrange to get
$$p + q = n - \varphi(n) + 1$$
Therefore, $p$ and $q$ are the roots of the equation
$$x^2 - (n - \varphi(n) + 1)x + n = 0$$
Therefore, the two roots
$$\frac{(n - \varphi(n) + 1) \pm \sqrt{(n - \varphi(n) + 1)^2 - 4n}}{2}$$
are $p$ and $q$.
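The quadratic-formula trick translates directly into code. This is a sketch under the stated assumption that $n$ is a product of two primes and that the supplied value really is $\varphi(n)$; the name factor_from_phi is made up for the example, and math.isqrt (Python 3.8 or later) is used for the exact integer square root.

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover p <= q from n = p*q and phi = (p - 1)(q - 1)."""
    s = n - phi + 1              # s = p + q
    disc = s * s - 4 * n         # discriminant, equal to (p - q)^2
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("inputs are not of the form n = p*q, phi = (p-1)(q-1)")
    return (s - r) // 2, (s + r) // 2

print(factor_from_phi(3233, 3120))   # toy modulus 3233 = 53 * 61
```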
And we must note that it is conceivable that there is some other way to obtain the plaintext, or some
portion of it, without factoring n.
It might seem that knowledge of the encryption and decryption exponents $e, d$ would not yield the prime
factorization $n = p \cdot q$. Thus, it might seem that even if the pair $e, d$ is compromised, in the sense that both
numbers become known to an adversary, the utility of the number $n = p \cdot q$ is not gone. However, in fact
disclosure of the private (decryption) key compromises the cipher. Specifically, there is a Las Vegas algorithm
that runs "quickly" which will yield the factorization $n = pq$.
Note that none of the users of a system with modulus $n$ (the product of two secret primes $p, q$), public
key $e$, and private key $d$ need to know the primes $p, q$. Therefore, it would be possible for a central
agency to use the same modulus $n = pq$ over and over. However, as just noted, compromise of one key pair
compromises the others.
Speed of encryption/decryption algorithms
If done naively, raising large numbers to large powers takes a long time. Such exponentiation is required by both encryption and decryption in the RSA, so from a naive viewpoint it may be unclear why
the algorithms themselves are any easier to execute than a hostile attack. But, in fact, the required exponentiation can be arranged to be much faster than prime factorizations for numbers in the relevant range
(with a hundred or more digits). Even so, at this time it seems that the RSA encryption and decryption
algorithms (and most asymmetric cipher algorithms) run considerably more slowly than the best symmetric
cipher algorithms.
Typically the primes $p, q$ are chosen to have a hundred digits or so. Therefore, even if the encryption
exponent $e$ is chosen to be relatively small, perhaps just a few decimal digits, the multiplicative inverse (the
decryption key) will be about as large as $n$. Thus, the task of computing large powers of integers, modulo a
large $n$, must be executable relatively quickly by comparison to the task of factoring $n$.
There is an important elementary speed-up of exponentiation we'll describe below, which allows us to
consider exponentiation "easy". This algorithm is useful for computing powers of numbers or other algebraic
objects even more generally. That is, to compute $x^e$ we do not compute $x^1, x^2, x^3, x^4, x^5, \ldots, x^{e-1}, x^e$.
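One standard way to avoid computing all the intermediate powers is repeated squaring, driven by the binary digits of the exponent. The sketch below is one common formulation, not necessarily the exact form used elsewhere in these notes; the name power_mod is illustrative, and Python's built-in pow(x, e, n) computes the same value.

```python
def power_mod(x, e, n):
    """Compute x^e % n for e >= 0 with about 2*log2(e) multiplications."""
    result = 1 % n               # 1 % n handles the degenerate modulus n = 1
    base = x % n
    while e > 0:
        if e & 1:                # current binary digit of e is 1
            result = (result * base) % n
        base = (base * base) % n # square for the next binary digit
        e >>= 1
    return result

print(power_mod(65, 17, 3233) == pow(65, 17, 3233))
```

For an exponent of a hundred decimal digits this takes a few hundred squarings and multiplications, rather than roughly $10^{100}$ multiplications.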
Key generation and management
To set up a modulus $n = pq$ from secret primes $p, q$, and to determine a key pair $e, d$ with $ed \equiv 1 \bmod \varphi(n)$,
requires first of all two large primes $p, q$, each larger than $10^{80}$, for example. Since the security of RSA is based
upon the intractability of factoring, it is very lucky that primality testing is much easier than factorization
into primes. That is, we are able to obtain many "large" primes $p, q > 10^{80}$ cheaply, despite the fact that
we cannot generally factor "large" numbers $n = pq > 10^{160}$ into primes (even with good algorithms).
The decryption (or private) key $d$ can be chosen first, after $p, q$. For there to be a corresponding
encryption key $e$ it must be that $d$ is relatively prime to $(p-1)(q-1)$, and then the Euclidean Algorithm
gives an efficient means to compute $e$.
One way to obtain $d$ relatively prime to $(p-1)(q-1)$ is simply by guessing and checking, as follows.
Note that since $p - 1$ and $q - 1$ themselves are large we may not have their prime factorizations! We pick
a random large prime $d$, and then use the Euclidean Algorithm to find the greatest common divisor of this
$d$ and $(p-1)(q-1)$. If the gcd is $> 1$ we just guess again. Since $d$ was a large random prime the heuristic
probability is very high that the first guess itself will already be relatively prime to $(p-1)(q-1)$.
Further technical notes:
In many implementations, to make encryption easy, the encryption exponent is always taken to be just
3, and the primes $p, q$ not congruent to 1 modulo 3. This certainly offers further simplifications.
For technical reasons, some people have more recently recommended $2^{16} + 1 = 65537$
(which is prime) as encryption exponent. Then take the primes $p, q$ not congruent to 1 modulo 65537.
Both $p - 1$ and $q - 1$ should have at least one very large prime factor, since there are factorization
attacks against $n = pq$ that are possible if $p - 1$ or $q - 1$ have only smallish prime factors.
The primes $p$ and $q$ should not be "close" to each other, since there are factorization attacks on $n$ that
succeed in this case (Fermat, Pollard's rho, etc).
Don't want the ratio $p/q$ to be "close" to a rational number with smallish numerator and denominator,
since then D.H. Lehmer's Continued Fraction factorization attack on $n = pq$ will succeed.
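To illustrate why close primes are dangerous, here is a minimal sketch of Fermat's factorization method, which looks for a representation $n = a^2 - b^2 = (a - b)(a + b)$ starting from $a$ just above $\sqrt{n}$. The function name and the max_steps cutoff are assumptions made for this example; $n$ is assumed odd.

```python
from math import isqrt

def fermat_factor(n, max_steps=10**6):
    """Try to write odd n as (a - b)(a + b); fast when the two factors are close."""
    a = isqrt(n)
    if a * a < n:
        a += 1                   # smallest a with a^2 >= n
    for _ in range(max_steps):
        b_squared = a * a - n
        b = isqrt(b_squared)
        if b * b == b_squared:   # a^2 - n is a perfect square
            return a - b, a + b
        a += 1
    return None                  # gave up; the factors are not close enough

print(fermat_factor(101 * 103))  # succeeds at the very first value of a
print(fermat_factor(53 * 61))
```

The number of steps grows with the gap between the two factors, which is why the search is hopeless when $p$ and $q$ are large and far apart.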
U.S. export regulations
As of this writing (December 1997), export of RSA software with any keysize is allowed for authentication purposes, although it must be demonstrated that the product in question cannot easily be converted to
use for encryption. For RSA used for encryption, evidently the keysize must be limited to 512 bits. Export
of encryption for financial use with larger key sizes is sometimes allowed: for example, Cybercash has been
allowed to export 768-bit keys for financial transactions.
Probably the RSA FAQ is the best reference for the current status of export and other practical questions
about RSA:
http://www.rsa.com/rsalabs/newfaq/
12.4 The ElGamal cipher
This idea appeared in T. El Gamal, A public key cryptosystem and signature scheme based on discrete
logarithms, IEEE Transactions on Information Theory IT-31 (1985), pp. 469-473. This idea is a little
more complicated than RSA, but still essentially elementary. The idea also lends itself to certain technical
generalizations more readily than does RSA. For example, the elliptic curve cryptosystems are analogues of
the ElGamal cipher.
The hard task
Description of encryption and decryption
Elementary aspects of security of ElGamal
Speed of encryption/decryption algorithms
Key generation and management
The hard task
The hard task here is computation of discrete logs. This means the following. Fix a modulus $m$,
and integers $b, c$. An integer solution $x$ to the equation
$$b^x \equiv c \bmod m$$
is a (discrete) logarithm base $b$ of $c$ modulo $m$. It is important to know that for random $m$, $b$, and $c$
there may be no such $x$. But for prime modulus $p$ and good choice of base $b$ there will exist a discrete logarithm
for any $c$ not divisible by $p$. (These theoretical aspects will be clarified later.)
Fix a large prime $p > 10^{150}$. For two integers $b, c$ suppose that we know that
$$b^x = c \bmod p$$
for some $x$. The difficult task is to compute $x$ given only $b, c, p$.
For any positive integer $m$, an integer $b$ is usually called a primitive root modulo $m$ if every integer
$c$ relatively prime to $m$ may be expressed in the form
$$c = b^x \bmod m$$
We will see later that there exist primitive roots mod $m$ only for special sorts of integers, including mainly
prime moduli. For prime modulus $p$, we will see that a primitive root $b$ mod $p$ has the property that the
smallest positive exponent $k$ with $b^k$ congruent to 1 modulo $p$ is $p - 1$. That is,
$$b^{p-1} \equiv 1 \bmod p$$
and no smaller positive power will do. More generally, for arbitrary $x$ relatively prime to a modulus
$m$ (not necessarily prime), the order or exponent of $x$ mod $m$ is the smallest positive integer exponent $n$ so that
$$x^n \equiv 1 \bmod m$$
We will see that primitive roots have maximal order.
For El Gamal it is not strictly necessary to have a primitive root modulo $p$, since we only need the
configuration $b^\ell \equiv c \bmod p$, but we must require that the order of $b$ mod $p$ is close to the maximum possible,
or else the cipher can be too easily broken.
Both the idea of "discrete logarithm" and the use of the difficulty of computing discrete logarithms to
construct a cryptosystem admit further abstraction. The most popular example of such a generalization is
the elliptic curve cipher, which needs considerable preparation even to describe. We will do this later.
Description of encryption and decryption
Fix a large prime $p > 10^{150}$, a primitive root $b$ modulo $p$ (meaning that any $y$ prime to $p$ can be expressed as
$y = b^L \bmod p$), and an integer $c$ in the range $1 < c < p$. The secret is the power $\ell$ (the discrete logarithm)
so that $b^\ell = c \bmod p$. Only the decryptor knows $\ell$.
The encryption step is as follows. A plaintext $x$ is encoded as an integer in the range $0 < x < p$. The
encryptor chooses an auxiliary random integer $r$, which is a temporary secret known only to the encryptor,
and encrypts the plaintext $x$ as
$$y = E_{b,c,p,r}(x) = (x \cdot c^r) \% p$$
Along with this encrypted message is sent the "header" $b^r$. Note that the encryptor only needs to know
$b, c, p$ and chooses random $r$, but does not know the discrete logarithm $\ell$.
The decryption step requires knowledge of the discrete logarithm $\ell$, but not the random integer $r$. First,
from the "header" $b^r$ the decryptor computes
$$(b^r)^\ell = b^{r \cdot \ell} = (b^\ell)^r = c^r \bmod p$$
Then the plaintext is recovered by multiplying by the multiplicative inverse $(c^r)^{-1}$ of $c^r$ modulo $p$:
$$D_{b,c,p,r,\ell}(y) = (c^r)^{-1} \cdot y \% p = (c^r)^{-1} \cdot c^r \cdot x \bmod p = x \% p$$
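The encryption and decryption steps can be sketched with toy parameters. Everything specific here is an assumption made for illustration: the prime 467, the base 2 (which is a primitive root modulo 467), the secret exponent 127, and the function names. The inverse of $c^r$ is computed as $(c^r)^{p-2}$, which works by Fermat's Little Theorem.

```python
import secrets

def elgamal_encrypt(x, b, c, p):
    """Return (header, y) = (b^r % p, x * c^r % p) for a fresh random r."""
    r = secrets.randbelow(p - 2) + 1       # temporary secret, 1 <= r <= p - 2
    header = pow(b, r, p)
    y = (x * pow(c, r, p)) % p
    return header, y

def elgamal_decrypt(header, y, ell, p):
    """Recover x from the header b^r and ciphertext y using the secret ell."""
    c_r = pow(header, ell, p)              # (b^r)^ell = c^r mod p
    c_r_inverse = pow(c_r, p - 2, p)       # inverse via Fermat's Little Theorem
    return (c_r_inverse * y) % p

p, b = 467, 2          # toy prime and base, far too small for real use
ell = 127              # the decryptor's secret discrete logarithm
c = pow(b, ell, p)     # published along with b and p
x = 100
header, y = elgamal_encrypt(x, b, c, p)
print(elgamal_decrypt(header, y, ell, p))  # recovers the plaintext 100
```

Because $r$ is chosen afresh each time, encrypting the same plaintext twice will usually give different ciphertexts.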
Elementary aspects of security of ElGamal
The security of this cipher depends upon the difficulty of computing the discrete logarithm $\ell$ of an
integer $c$ base $b$, modulo prime $p$. Again, this is an integer so that
$$b^\ell = c \bmod p$$
There seems to be little tangible connection between this notion of logarithm and logarithms of real or
complex numbers, although they do share some abstract properties.
The naive algorithm to compute a discrete logarithm is simply trial and error. Better algorithms to
compute discrete logarithms, to attack the El Gamal cipher for example, require more understanding of
integers modulo p.
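The trial-and-error method is easy to write down, which also makes its cost plain: up to $p - 1$ multiplications. The following sketch uses a hypothetical function name and a toy prime.

```python
def discrete_log_naive(b, c, p):
    """Smallest x >= 0 with b^x = c mod p, or None if there is none."""
    target = c % p
    power = 1                      # b^0
    for x in range(p - 1):         # exponents 0, 1, ..., p - 2 suffice mod a prime
        if power == target:
            return x
        power = (power * b) % p
    return None

p = 467
print(discrete_log_naive(2, pow(2, 127, p), p))   # recovers the exponent 127
print(discrete_log_naive(4, 2, p))                # no solution: 2 is not a power of 4 mod 467
```

For $p$ of 150 digits the loop would need on the order of $10^{150}$ steps, which is the practical basis of the cipher's security against this particular attack.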
To avoid logarithm computation attacks, we would also want to choose $p$ so that $p - 1$ does not
have "too many" small prime factors. Since $p - 1$ is even, it will always have a factor of 2, but beyond this
we hope to avoid small factors. This can be made more precise later.
Speed of encryption/decryption algorithms
As with RSA, the speed of encryption and decryption is mainly dependent upon speed of exponentiation
modulo $p$, which can be made reasonably fast.
A feature of ElGamal (and related algorithms) is that encryptors need a good supply of random numbers.
Thus, availability of high-quality pseudo-random number generators is relevant to ElGamal.
Comments on key generation and management
The most obvious requirement for this cipher system is a generous supply of large primes $p$, meaning
$p > 10^{160}$ or so. The naive trial division primality test is completely inadequate for this.
Whoever creates the configuration
$$b^\ell \equiv c \bmod p$$
will presumably first choose a large prime $p$, most likely meeting some further technical conditions. An
especially nice kind of prime, for this and many other purposes, is of the form
$$p = 2p' + 1$$
where $p'$ is another prime. In that case, about half the numbers $b$ in the range $1 < b < p - 1$ are primitive
roots (so have order $2p'$) and the other half have order $p'$, which is still not so bad. Thus, random selection of
such $b$ gives a good candidate. Random choice of the exponent $\ell$, and computation of $c = b^\ell \% p$ completes
the preparations.
In the case that the prime $p$ is of the special form $2p' + 1$ with $p'$ prime, it is easy to find primitive roots
in any case, since (as we will see later) the elements of order $p'$ (rather than $2p'$, which a primitive root
would have) are all squares in $\mathbb{Z}/p$. The property of being a square or not is easily computable by using
Legendre and Jacobi quadratic symbols, as we will see. Thus, it is quite feasible to require that the number
$b$ used in El Gamal be a primitive root for primes $p = 2p' + 1$.
It is plausible to use a single prime modulus $p$ for several key configurations $b^\ell = c \bmod p$ since there
are many different primitive roots modulo $p$, but we will see that compromise of one such configuration
compromises others.
Also, any encryptor will need a good supply of (pseudo-) random integers for the encryption process.
This is an issue in itself.
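For primes of the form $p = 2p' + 1$ the primitive-root test reduces to a single squareness check. The sketch below uses Euler's criterion for that check instead of the quadratic symbols mentioned above (a substitution made here only because Euler's criterion has already been stated); the function names are illustrative, and the search takes the smallest candidate rather than a random one so that the output is reproducible.

```python
def is_primitive_root_safe_prime(b, p):
    """For p = 2p' + 1 with p, p' odd primes and 1 < b < p - 1:
    b is a primitive root exactly when b is a non-square mod p."""
    if not 1 < b < p - 1:
        raise ValueError("need 1 < b < p - 1")
    return pow(b, (p - 1) // 2, p) == p - 1   # Euler's criterion: -1 means non-square

def find_primitive_root_safe_prime(p):
    """Smallest primitive root modulo a prime p of the form 2p' + 1."""
    for b in range(2, p - 1):
        if is_primitive_root_safe_prime(b, p):
            return b
    return None

print(find_primitive_root_safe_prime(23))    # 23 = 2 * 11 + 1
print(find_primitive_root_safe_prime(467))   # 467 = 2 * 233 + 1
```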
13. (*) Pseudoprimes and Primality Tests
The simplest test for primality, the trial division method, may require roughly $\sqrt{n}$ steps to prove that
$n$ is prime. This already takes several minutes on a 200 MHz machine when $n \approx 10^{18}$, so would take about
$10^{16}$ years for $n \approx 10^{60}$. Yet modern ciphersystems need many primes of size at least $10^{60}$, if not larger.
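For reference, trial division itself is only a few lines; the sketch below (with an illustrative function name) tries every divisor up to $\sqrt{n}$, which is exactly what makes it hopeless for numbers of cryptographic size.

```python
def is_prime_trial_division(n):
    """Deterministic primality test using about sqrt(n) trial divisions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:        # only divisors up to sqrt(n) need to be tried
        if n % d == 0:
            return False
        d += 1
    return True

print([n for n in range(30) if is_prime_trial_division(n)])
```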
The first compromise is that we can gain enormously in speed if we sacrifice certainty. That is, we can
quickly prove that very large numbers are likely to be prime, but will not have the absolute certainty of
primality that traditional computations would give. But, since those traditional computations could never
be completed, perhaps the idea that something is being "sacrificed" is incorrect.
Numbers which are not truly known to be prime, but which have passed various probabilistic tests
for primality, are called pseudoprimes (of various sorts). Sometimes the word "pseudoprime" is used to
indicate a non-prime which has nevertheless passed a probabilistic test for primality. For us, though, a
pseudoprime is simply a number (which may or may not really be prime) which has passed some sort of
probabilistic primality test.
Each of these yet-to-be-specified probabilistic primality tests to be performed upon a number $n$ makes
use of one or more auxiliary numbers $b$, chosen "at random" from the range $1 < b < n$. If a particular
auxiliary $b$ tells us that "$n$ is likely prime", then $b$ is a witness to the primality of $n$. The problem is that
a significant fraction of the numbers $b$ in the range $1 < b < n$ may be false witnesses (sometimes called
liars), meaning that they tell us that $n$ is prime when it's not.
Thus, part of the issue is to be sure that a large fraction of the numbers b in the range 1 < b < n are
("true") witnesses to either the primality or compositeness of n.
The fatal flaw in the Fermat pseudoprime test is that there are composite numbers $n$ for which there
are no witnesses. These are called Carmichael numbers. The other two primality tests have no such flaw.
In all cases, the notion of probability that we use in saying something such as "$n$ is prime with probability
$2^{-10}$" is a fundamentally heuristic one, based on the doubtful hypothesis that among $n$ possibilities which
we don't understand each has probability $1/n$ of occurring. (This sort of pseudo-probabilistic reasoning has
been rightfully disparaged for over 200 years.)
On the other hand, these probabilistic primality tests can be converted to deterministic tests if the
Extended Riemann Hypothesis is true. Many mathematicians believe that the Extended Riemann
Hypothesis is true, and there is no simple evidence to the contrary (as of early 1998), but it seems that
no one has any idea how to prove it, either. The question has been open for about 140 years, with no real
progress on it. Assuming that the Extended Riemann Hypothesis is true, it would follow that there is a
universal constant C so that for any number n, if n is composite then there is an Euler witness (and also a
strong witness) b with
1 < b < C (log n)^2
That is, we wouldn't have to look too far to find a (truthful) witness if n is composite.
Fermat pseudoprimes
Euler pseudoprimes
Strong pseudoprimes
Miller-Rabin primality test
13.1 Fermat pseudoprimes
This section gives a heuristic test for primality. It has several weaknesses, and in the end is not what
we will use, but it illustrates two very interesting points: first, that probabilistic algorithms may run much
faster than deterministic ones, and, second, that we simply can't expect to provide proofs for everything
which seems to be true.
On one hand, Fermat's so-called Little Theorem asserts that for any prime number p and integer b
b^p = b mod p
Equivalently, for p not dividing b, we have
b^{p-1} = 1 mod p
This is a special case of Euler's theorem which asserts that for b prime to an integer n,
b^{φ(n)} = 1 mod n
where φ is Euler's phi-function. (Euler's theorem is best proven using a little group theory, and we will do
this later).
An integer n is called a Fermat pseudoprime or ordinary pseudoprime or simply pseudoprime
if
2^{n-1} = 1 mod n
Now, there is no assurance that n's being a Fermat pseudoprime implies that n is prime, since there is no
converse to Fermat's little theorem. Yet, in practice it is "very unusual" that 2^n = 2 mod n and yet n is
not prime. Specifically, 341 = 11 · 31 is not prime, and is the first non-prime number which is a Fermat
pseudoprime: 2^{341} = 2 mod 341. The next few non-prime Fermat pseudoprimes are
561, 645, 1105, 1387, 1729, 1905, 2047, 2465
There are only 5597 non-prime Fermat pseudoprimes below 10^9.
It is true that if an integer n fails the Fermat test, meaning that 2^{n-1} ≠ 1 mod n, then n is certainly
not a prime (since, if it were prime, then 2^{n-1} = 1 mod n, after all!).
Since the fast exponentiation algorithm provides an economical method for computing b^{n-1} mod n, we
can test whether an integer n is a Fermat pseudoprime much faster than we can test it for primality by trial
division.
We can make a more stringent condition: an integer n is a Fermat pseudoprime base b if
b^{n-1} = 1 mod n
If b^{n-1} ≠ 1 mod n, then n is certainly not a prime. On the other hand, even if b^{n-1} = 1 mod n for all b
relatively prime to n, we have no assurance that n is prime.
An integer b in the range 1 < b < n - 1 so that b^{n-1} = 1 mod n is a (Fermat) witness for the primality
of n. If n isn't actually prime, then that b is a (Fermat) false witness or (Fermat) liar.
And the behavior of a non-prime can be different with respect to different bases. For example, the
non-prime 91 = 7 · 13 is not a Fermat pseudoprime (base 2), but is a Fermat pseudoprime base 3.
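The Fermat test is short enough to try directly. What follows is a minimal Python sketch, not part of the original notes: the function names are made up for illustration, and Python's built-in three-argument pow stands in for the fast exponentiation algorithm.

```python
def is_fermat_pseudoprime(n, b=2):
    """True if b^(n-1) = 1 mod n, that is, n passes the Fermat test base b."""
    return pow(b, n - 1, n) == 1   # built-in fast modular exponentiation


def is_prime_by_trial_division(n):
    """Slow but certain primality test, used only to expose false witnesses."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Non-primes below 2000 that nevertheless pass the Fermat test base 2
false_primes = [n for n in range(3, 2000)
                if is_fermat_pseudoprime(n, 2)
                and not is_prime_by_trial_division(n)]
print(false_primes)   # [341, 561, 645, 1105, 1387, 1729, 1905]
print(is_fermat_pseudoprime(91, 2), is_fermat_pseudoprime(91, 3))   # False True
```

Trial division appears here only as a slow reference check. The Fermat test itself needs just one modular exponentiation per base.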
Again, in practice, it is very unusual for an integer to be a pseudoprime base b for one or more bases b
and yet fail to be a prime. For example, among integers under 46 000, the only Fermat pseudoprimes base
2 which are not prime are the 52 numbers 341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821,
3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911, 10261, 10585, 11305, 12801, 13741, 13747,
13981, 14491, 15709, 15841, 16705, 18705, 18721, 19951, 23001, 23377, 25761, 29341, 30121, 30889, 31417,
31609, 31621, 33153, 34945, 35333, 39865, 41041, 41665, 42799.
Compare this to the fact that there are 4761 primes under 46 000. Thus, the failure rate for Fermat
pseudoprime base 2 is only about one percent.
In that same range, the only non-prime Fermat pseudoprimes base 3 are the 35 numbers 91, 121, 671,
703, 949, 1541, 1891, 2665, 3281, 3367, 3751, 4961, 5551, 7107, 7381, 8205, 8401, 11011, 12403, 14383, 15203,
15457, 16471, 16531, 19345, 23521, 24661, 24727, 28009, 29161, 30857, 31697, 32791, 38503, 44287.
The only non-primes under 46 000 which are Fermat pseudoprimes both base 2 and base 3 are the 14
numbers 561, 1105, 1729, 2465, 2701, 2821, 6601, 8911, 10585, 15841, 18721, 29341, 31621, 41041. Thus, the
failure rate here is less than a third of one percent.
Nevertheless, there are infinitely many integers which are pseudoprimes to all bases (relatively prime
to them), and yet are not prime. These are called Carmichael numbers. Under 46 000, there are only
11 Carmichael numbers: 561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341, 41041. (The numbers
2701 = 37 · 73, 18721 = 97 · 193, and 31621 = 103 · 307 each have only two prime factors, so they are not
Carmichael numbers.)
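The definition of Carmichael number can be checked by brute force for small n. The following Python sketch is an illustration only (the helper names are hypothetical, and this is far too slow for large n): it tests every base b prime to n, exactly as in the definition above.

```python
from math import gcd


def is_prime(n):
    """Primality by trial division (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_carmichael(n):
    """True if n is composite and b^(n-1) = 1 mod n for every b prime to n."""
    if n < 3 or is_prime(n):
        return False
    return all(pow(b, n - 1, n) == 1
               for b in range(2, n) if gcd(b, n) == 1)


print([n for n in range(3, 2000) if is_carmichael(n)])   # [561, 1105, 1729]
```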
There are "only" 2163 Carmichael numbers below 25 · 10^9, and 8241 Carmichael numbers below 10^12, 19 279
up to 10^13, 44 706 up to 10^14, and 105 212 up to 10^15. (See R.G.E. Pinch, The Carmichael numbers up to
10^15, Math. Comp. 61 (1993), pp. 381-391.)
Later, we will show that a Carmichael number must be odd, square-free, and be divisible by at least 3
primes. This result is also necessary to understand why the better versions of "pseudoprime" and corresponding probabilistic primality tests (below) do not have failings analogous to the presence of Carmichael
numbers.
It was an open problem for more than 80 years to determine whether there are or are not infinitely
many Carmichael numbers. Rather recently, it was proven that there are infinitely many: in fact, there is
a constant C so that the number of Carmichael numbers less than x is at least C x^{2/7}. See W.R. Alford, A.
Granville, and C. Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. 140 (1994),
pp. 703-722.
13.2 Strong pseudoprimes
Continuing with the use of square roots to test primality, we can go a bit further than Euler's criterion.
Here the underlying idea is that if p is a prime then Z/p should have only 2 square roots of 1, namely ±1.
Let n be an odd number, and factor
n - 1 = 2^s · ℓ
with ℓ odd. Then n is a strong pseudoprime base b if
either b^ℓ = 1 mod n
or b^{2^r ℓ} = -1 mod n for some 0 ≤ r < s
On the face of it, it is certainly hard to see how this is related to primality. And despite the remark
just above about this test being related to the presence of "false" square roots of 1, it's certainly not so clear
why or how that works, either.
Nevertheless, granting fast exponentiation, the algorithm runs pretty fast. We'll address the how and
why issues later.
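The definition translates almost line by line into code. The following Python sketch is only an illustration under the notation above (n - 1 = 2^s · ℓ with ℓ odd); the function name is made up, and the built-in pow plays the role of fast exponentiation.

```python
def is_strong_pseudoprime(n, b):
    """Strong pseudoprime test base b for an odd integer n > 2."""
    # Factor n - 1 = 2^s * l with l odd.
    s, l = 0, n - 1
    while l % 2 == 0:
        s += 1
        l //= 2
    x = pow(b, l, n)
    if x == 1:                 # first alternative: b^l = 1 mod n
        return True
    for _ in range(s):         # r = 0, 1, ..., s - 1
        if x == n - 1:         # second alternative: b^(2^r l) = -1 mod n
            return True
        x = x * x % n
    return False


print(is_strong_pseudoprime(341, 2))    # False
print(is_strong_pseudoprime(2047, 2))   # True
```

So the Fermat pseudoprime 341 is exposed as composite by this test base 2, while the non-prime 2047 = 23 · 89 still slips through.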
13.3 Miller-Rabin primality test
The Miller-Rabin probabilistic primality test hunts for strong pseudoprimes. When applied to a number
n it tests whether n is a strong pseudoprime base b for several different bases b. This test is easy to
implement, so gets used in real life. The fact that makes it reasonably fast is that exponentiation modulo n
is reasonably fast.
The idea of the test is that for non-prime n there will be at least 2 elements x of Z/n with the property
that x^2 = 1 but x ≠ ±1. As with the Solovay-Strassen test, much further explanation is needed to see why it
works, etc. The operation of the Miller-Rabin test itself is quite simple, though, even simpler than that of
the Solovay-Strassen test.
As with the Solovay-Strassen test, this test can prove compositeness with certainty, but only proves
primality with a certain probability.
Let n be a positive odd integer. Find the largest power 2^r dividing n - 1, and write n - 1 = 2^r · s. In
order to discover either
n is composite with certainty or
n is prime with probability > 1 - (1/4)^k
choose k "random" integers b in the range 1 < b < n - 1.
For each b in the list, compute b_1 = b^s % n, b_2 = b_1^2 % n, b_3 = b_2^2 % n, ..., b_{r+1} = b_r^2 % n. For the first
index i so that b_i = 1 (if there is one), look at b_{i-1}. If b_{i-1} ≠ -1 mod n, then n is surely composite.
(If already b_1 = 1, there is no b_0 to look at, and this b gives no evidence of compositeness.)
If for every b in the list the first index i with b_i = 1 has the property that b_{i-1} = -1 mod n, then we
imagine that n is prime with probability > 1 - (1/4)^k.
If no b_i = 1, then n is surely composite.
The idea is that if n is composite then at least 3/4 of the integers b in the range 1 < b < n - 1 are witnesses
to this fact.
Note that if none of the b_i is 1, then b^{n-1} ≠ 1, so n is not a Fermat pseudoprime base b.
As with the Solovay-Strassen test, to demonstrate the presence of so many witnesses requires preparation,
so we postpone it.
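The procedure just described can be sketched in Python as follows. This is an illustration rather than a definitive implementation: the function name, the default k = 20, and the optional rng parameter (any object with a randrange method) are assumptions made here, and the built-in pow stands in for fast exponentiation.

```python
import random


def miller_rabin(n, k=20, rng=random):
    """Return False if n is surely composite, True if n survived k random bases."""
    if n < 5:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    # Write n - 1 = 2^r * s with s odd.
    r, s = 0, n - 1
    while s % 2 == 0:
        r += 1
        s //= 2
    for _ in range(k):
        b = rng.randrange(2, n - 1)          # random b with 1 < b < n - 1
        seq = [pow(b, s, n)]                 # b_1 = b^s % n
        for _step in range(r):
            seq.append(seq[-1] ** 2 % n)     # next term is the square of the last
        if 1 not in seq:
            return False                     # no b_i = 1: surely composite
        i = seq.index(1)                     # first position where b_i = 1
        if i > 0 and seq[i - 1] != n - 1:
            return False                     # a square root of 1 other than +-1
    return True


print(miller_rabin(7919))   # True (7919 is prime, and a prime never fails)
print(miller_rabin(561))    # False, except with negligible probability
```

Keeping the whole list b_1, ..., b_{r+1} mirrors the description above; a tighter version could stop squaring as soon as 1 or -1 appears.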
#13.91 The numbers 341, 561, 645, 1105 are all Fermat pseudoprimes base 2, but are not prime. Which
false primes are detected by Fermat's test base 3? Base 5?
#13.92 Find the smallest Euler (Solovay-Strassen) witnesses to the compositeness of the Carmichael numbers 561, 1105, 1729.
#13.93 Find the smallest strong (Miller-Rabin) witnesses to the compositeness of the Carmichael numbers
2465, 2821.
14. Vectors and matrices
The first version of vector one sees is that a vector is a row or column of numbers:
( x_1  x_2  ...  x_n )
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
Vectors of the same size have a vector addition
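( x_1  x_2  ...  x_n ) + ( y_1  y_2  ...  y_n ) = ( x_1 + y_1  x_2 + y_2  ...  x_n + y_n )
and
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}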
There is also scalar multiplication
s \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} s x_1 \\ s x_2 \\ \vdots \\ s x_n \end{pmatrix}
These operations in themselves are unremarkable and easy to execute.
Likewise, an m-by-n matrix is just an m-by-n block of numbers
x = \begin{pmatrix} x_{11} & x_{12} & x_{13} & \ldots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \ldots & x_{2n} \\ x_{31} & x_{32} & x_{33} & \ldots & x_{3n} \\ & & \ldots & & \\ x_{m1} & x_{m2} & x_{m3} & \ldots & x_{mn} \end{pmatrix}
The entry in the ith row and jth column is called the (i, j)th entry. If x is the whole matrix, often its (i, j)th
entry is denoted by subscripts
x_{ij} = (i, j)th entry of the matrix x
The matrix addition is the straightforward entry-by-entry addition of two matrices of the same size:
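\begin{pmatrix} x_{11} & x_{12} & x_{13} & \ldots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \ldots & x_{2n} \\ x_{31} & x_{32} & x_{33} & \ldots & x_{3n} \\ & & \ldots & & \\ x_{m1} & x_{m2} & x_{m3} & \ldots & x_{mn} \end{pmatrix} + \begin{pmatrix} y_{11} & y_{12} & y_{13} & \ldots & y_{1n} \\ y_{21} & y_{22} & y_{23} & \ldots & y_{2n} \\ y_{31} & y_{32} & y_{33} & \ldots & y_{3n} \\ & & \ldots & & \\ y_{m1} & y_{m2} & y_{m3} & \ldots & y_{mn} \end{pmatrix}
= \begin{pmatrix} x_{11} + y_{11} & x_{12} + y_{12} & x_{13} + y_{13} & \ldots & x_{1n} + y_{1n} \\ x_{21} + y_{21} & x_{22} + y_{22} & x_{23} + y_{23} & \ldots & x_{2n} + y_{2n} \\ x_{31} + y_{31} & x_{32} + y_{32} & x_{33} + y_{33} & \ldots & x_{3n} + y_{3n} \\ & & \ldots & & \\ x_{m1} + y_{m1} & x_{m2} + y_{m2} & x_{m3} + y_{m3} & \ldots & x_{mn} + y_{mn} \end{pmatrix}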
The additive identity or zero matrix 0_{m,n} of size m-by-n is the m-by-n matrix with all entries 0. It
has the obvious property that if it is added to any other matrix x of that same shape we just get
x + 0_{m,n} = x = 0_{m,n} + x
There is also matrix multiplication, whose meaning and computation is more complicated: for a
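k-by-m matrix x and an m-by-n matrix y the (i, j)th entry of the product xy depends only upon the ith
row of x and the jth column of y: it is
\begin{pmatrix} & \ldots & \\ \ldots & (xy)_{ij} & \ldots \\ & \ldots & \end{pmatrix} = \begin{pmatrix} & & \ldots & & \\ x_{i1} & x_{i2} & x_{i3} & \ldots & x_{im} \\ & & \ldots & & \end{pmatrix} \begin{pmatrix} \ldots & y_{1j} & \ldots \\ \ldots & y_{2j} & \ldots \\ \ldots & y_{3j} & \ldots \\ \ldots & \vdots & \ldots \\ \ldots & y_{mj} & \ldots \end{pmatrix}
= \begin{pmatrix} & \ldots & \\ \ldots & x_{i1} y_{1j} + x_{i2} y_{2j} + \ldots + x_{im} y_{mj} & \ldots \\ & \ldots & \end{pmatrix}
That is, the (i, j)th entry of the product is
(xy)_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j} + \ldots + x_{im} y_{mj} = \sum_{1 \le \ell \le m} x_{i\ell} y_{\ell j}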
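The formula for the (i, j)th entry is all that is needed to compute products by machine. As a small Python sketch (matrices are represented as lists of rows, which is a choice made here purely for illustration, and the name mat_mul is hypothetical):

```python
def mat_mul(x, y):
    """Product of a k-by-m matrix x and an m-by-n matrix y (lists of rows)."""
    k, m, n = len(x), len(y), len(y[0])
    assert all(len(row) == m for row in x), "inner sizes must agree"
    # (xy)_ij = sum over l of x_il * y_lj
    return [[sum(x[i][l] * y[l][j] for l in range(m)) for j in range(n)]
            for i in range(k)]


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
print(mat_mul(a, b))   # [[2, 1], [4, 3]]
print(mat_mul(b, a))   # [[3, 4], [1, 2]]
```

The two printed products differ, so even for 2-by-2 matrices the order of the factors matters.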
The (multiplicative) identity matrix 1_n of size n-by-n is the matrix
1_n = \begin{pmatrix} 1 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \ldots & 0 & 1 \end{pmatrix}
with all 1's on the diagonal (from upper left to lower right) and 0's everywhere else. This has the property
that its name suggests: for any m-by-n matrix x and/or n-by-m matrix y, we have
x 1_n = x
1_n y = y
Since the product of a k-by-m matrix x and an m-by-n matrix is k-by-n, we note that for each positive
integer n the collection of n-by-n square matrices has both addition and multiplication which give outcomes
back in the same collection.
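Some n-by-n matrices x have a multiplicative inverse x^{-1}, meaning that
x x^{-1} = 1_n = x^{-1} x
Such square matrices are said to be invertible.
#14.94 Show that the inverse of the matrix \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} is \begin{pmatrix} 1 & -x \\ 0 & 1 \end{pmatrix}.
#14.95 Show that the inverse of the matrix \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix} is \begin{pmatrix} 1 & 0 \\ -x & 1 \end{pmatrix}.
#14.96 Show that the inverse of the matrix
\begin{pmatrix} 1 & x & x^2/2 \\ 0 & 1 & x \\ 0 & 0 & 1 \end{pmatrix}
is
\begin{pmatrix} 1 & -x & x^2/2 \\ 0 & 1 & -x \\ 0 & 0 & 1 \end{pmatrix}
#14.97 Show that the inverse of the matrix \begin{pmatrix} r & 0 \\ 0 & 1 \end{pmatrix} is \begin{pmatrix} r^{-1} & 0 \\ 0 & 1 \end{pmatrix}.
#14.98 Show that the matrix
\begin{pmatrix} 1 & 3 \\ 2 & 6 \end{pmatrix}
does not lie in GL(2, R), that is, has no multiplicative inverse.
#14.99 Find two 2-by-2 integer matrices A, B which do not commute, that is, so that AB ≠ BA.
#14.100 Prove by induction that for any positive integer N
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^N = \begin{pmatrix} 1 & N \\ 0 & 1 \end{pmatrix}
#14.101 Let
x = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
Determine a formula for x^N for positive integers N, and prove (by induction) that it is correct.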
15. Motions in two and three dimensions
One very important use of vectors and matrices is to give analytical and quantitative descriptions of
basic manipulations of two-dimensional and three-dimensional objects. This makes possible computation
by hand or by machine, rather than seeming to require precise drawing or visualization. In fact, these
computations are what a person or machine has to do in order to create graphics.
First, recall that in analytic geometry an ordered pair of real numbers (x, y) refers to a point in the
plane, while an ordered triple (x, y, z) refers to a point in three-space. For present purposes, in all matrix
and vector computations we will write vectors as "column vectors" (rather than "row vectors"):
(x, y) = \begin{pmatrix} x \\ y \end{pmatrix}
(x, y, z) = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
We will write the inner product or scalar product or dot product of two two-dimensional vectors
v_1 = (x_1, y_1) and v_2 = (x_2, y_2) as
⟨v_1, v_2⟩ = x_1 x_2 + y_1 y_2
We will use the same notation for the inner product of two three-dimensional vectors v_1 = (x_1, y_1, z_1) and
v_2 = (x_2, y_2, z_2) as
⟨v_1, v_2⟩ = x_1 x_2 + y_1 y_2 + z_1 z_2
And the length of a two-dimensional vector v = (x, y) is
||v|| = \sqrt{⟨v, v⟩} = \sqrt{x^2 + y^2}
and of a three-dimensional vector v = (x, y, z) is
||v|| = \sqrt{⟨v, v⟩} = \sqrt{x^2 + y^2 + z^2}
The distance between two points v_1 and v_2 is
distance from v_1 to v_2 = ||v_1 - v_2||
Consider now two line segments with a common vertex: for three points v_o, v_1, v_2, let s_1 be the line
segment connecting v_o to v_1, and let s_2 be the line segment connecting v_o to v_2. Then
cosine of angle between s_1 and s_2 = \frac{⟨v_1 - v_o, v_2 - v_o⟩}{||v_1 - v_o|| · ||v_2 - v_o||}
In particular, the two line segments are perpendicular (orthogonal) if and only if ⟨v_1 - v_o, v_2 - v_o⟩ = 0.
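These formulas are exactly what a machine would evaluate. A minimal Python sketch, with vectors as tuples and with function names chosen here only for illustration:

```python
from math import sqrt


def inner(v, w):
    """Inner product of two vectors of the same dimension."""
    return sum(a * b for a, b in zip(v, w))


def length(v):
    """Length ||v|| = sqrt(<v, v>)."""
    return sqrt(inner(v, v))


def cos_angle(vo, v1, v2):
    """Cosine of the angle at vo between the segments vo-v1 and vo-v2."""
    a = [p - q for p, q in zip(v1, vo)]
    b = [p - q for p, q in zip(v2, vo)]
    return inner(a, b) / (length(a) * length(b))


print(length((3, 4)))                      # 5.0
print(cos_angle((0, 0), (1, 0), (0, 2)))   # 0.0
```

The second result is 0 because the two segments lie along the coordinate axes and so are perpendicular.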
All the "motions" we consider will be functions from the plane (two-space) to itself, or from three-space to itself. Thus, to describe a "motion" is to describe where each point goes, probably by a formula.
The first example is rotations in the plane. We wish to give an analytical or formulaic description
of the operation of rotation counter-clockwise by amount θ, with the rotation being around the origin. Let
R_θ be the matrix
R_θ = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}
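Define
f(x, y) = (x', y')
where
\begin{pmatrix} x' \\ y' \end{pmatrix} = R_θ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
If one draws a picture (!) it will be visible that this function rotates everything by angle θ counter-clockwise
around the origin.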
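In place of a picture, one can also simply compute. The following Python sketch (the function name rotate is an assumption made here) applies the rotation matrix to a point, with the angle in radians:

```python
from math import cos, sin, pi, hypot


def rotate(point, theta):
    """Rotate a point of the plane counter-clockwise by theta about the origin."""
    x, y = point
    return (cos(theta) * x - sin(theta) * y,
            sin(theta) * x + cos(theta) * y)


x1, y1 = rotate((1.0, 0.0), pi / 2)
print(round(x1, 12), round(y1, 12))   # 0.0 1.0
```

A quarter turn sends (1, 0) to (0, 1), up to floating-point rounding, as a counter-clockwise rotation should.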
The second example is translations in the plane. The translation by amount (h, k) is
f(x, y) = (x + h, y + k)
If f(x, y) is written as a column vector, then this can be rewritten as
f(x, y) = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} h \\ k \end{pmatrix}
Thus, representing a point v = (x, y) as a column vector, a rotation by angle θ is the function v → R_θ v
with R_θ as above. Representing (x_o, y_o) as a column vector v_o, translation by v_o is the function v → v + v_o.
In three-space, translations are still easy to describe: for a fixed amount (vector) v_o = (x_o, y_o, z_o) to
translate, the translation-by-v_o function sends a vector v = (x, y, z) to
f(x, y, z) = v + v_o = \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} x_o \\ y_o \\ z_o \end{pmatrix}
In three-space, there are many more possible axes about which to rotate, although all rotations fixing
the origin are of the form
f(x, y, z) = R \begin{pmatrix} x \\ y \\ z \end{pmatrix}
for some suitable 3-by-3 matrix R. In particular, for a rotation by angle θ around the z-axis we use
R = R^{(z)}_θ = \begin{pmatrix} \cos θ & -\sin θ & 0 \\ \sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{pmatrix}
For a rotation around the y-axis we use
R = R^{(y)}_θ = \begin{pmatrix} \cos θ & 0 & -\sin θ \\ 0 & 1 & 0 \\ \sin θ & 0 & \cos θ \end{pmatrix}
And for a rotation around the x-axis we use
R = R^{(x)}_θ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos θ & -\sin θ \\ 0 & \sin θ & \cos θ \end{pmatrix}
Without quite giving a general definition of rotation in three-space, we nevertheless can state an important
fact: every "rotation" in three-space fixing the origin (0, 0, 0) can be (essentially uniquely) expressed as a
composite
R^{(x)}_α R^{(y)}_β R^{(z)}_γ
The angles α, β, γ are the Euler angles of the rotation.
#15.102 From the fact that left multiplication of a vector \begin{pmatrix} x \\ y \end{pmatrix} by the matrix
\begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}
corresponds to rotation of the point (x, y) by angle θ, prove the Addition Laws (so-called) for sine and
cosine:
cos(θ + φ) = cos θ cos φ - sin θ sin φ
sin(θ + φ) = cos θ sin φ + sin θ cos φ
#15.103 In two-space (the plane), find a translation f_1 and a rotation f_2 so that (f_1 ∘ f_2)(1, 0) =
(3, 4) and (f_1 ∘ f_2)(0, 1) = (4, 5).
#15.104 In two-space (the plane), find a rotation f_1 and a translation f_2 so that (f_1 ∘ f_2)(1, 0) =
(3, 4) and (f_1 ∘ f_2)(0, 1) = (4, 5).
#15.105 Show that rotations in two-space preserve distances between points, and preserve angles between
line segments sharing a common vertex.
#15.106 Show that translations in two-space preserve distances between points, and preserve angles between
line segments sharing a common vertex.
#15.107 Let v_o be a vector in two-space, and let θ be an angle. Show that counter-clockwise rotation
by θ, followed by translation by v_o, followed by counter-clockwise rotation by -θ, is another translation.
Determine it explicitly.
#15.108 Let f_1 be rotation of the plane by an angle θ_1, let f_2 be translation of the plane by vector v_2, and
let f_3 be rotation of the plane by an angle θ_3. Show that
f_3 ∘ f_2 ∘ f_1
is expressible more briefly as F ∘ G where F is translation by some vector, and G is rotation by some angle.
#15.109 Let f_1 be translation of the plane by a vector v_1, let f_2 be rotation of the plane by angle θ_2, and
let f_3 be translation of the plane by a vector v_3. Show that
f_3 ∘ f_2 ∘ f_1
is expressible more briefly as F ∘ G where F is translation by some vector, and G is rotation by some angle.
#15.110 Let R^{(z)}_γ be a rotation around the z-axis by angle γ, R^{(y)}_β a rotation around the y-axis by angle β.
Show that, given any vector v = (x, y, z) with ||v|| = 1 there are angles β, γ so that
R^{(z)}_γ R^{(y)}_β (1, 0, 0) = (x, y, z)
#15.111 (*) Prove the assertion that left multiplication by the matrix
\begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}
really does rotate points in the plane by the angle θ around the origin.
#15.112 (*) Let R^{(z)}_γ be rotation around the z-axis by γ, R^{(y)}_β rotation around the y-axis by β, and R^{(x)}_α
rotation around the x-axis by α. Given two vectors v, w so that ||v|| = ||w|| = 1 and ⟨v, w⟩ = 0, find angles
α, β, γ so that
R^{(x)}_α R^{(y)}_β R^{(z)}_γ (1, 0, 0) = v
R^{(x)}_α R^{(y)}_β R^{(z)}_γ (0, 1, 0) = w
#15.113 (*) Let R^{(z)}_γ be rotation around the z-axis by γ, R^{(y)}_β rotation around the y-axis by β, and R^{(x)}_α
rotation around the x-axis by α. For given θ_1, θ_2, find α, β, γ so that
R^{(z)}_{θ_1} R^{(y)}_{θ_2} = R^{(x)}_α R^{(y)}_β R^{(z)}_γ
16. Permutations and Symmetric Groups
Another important fundamental idea is that of permutation of a set X. A permutation of a set X is
defined to be a bijection of X to itself.
The crudest question we can ask about permutations of X is how many are there? If X has n (distinct)
elements x_1, x_2, ..., x_n and f : X → X is a permutation of X, then there are n choices for what f(x_1) can
be, n - 1 remaining choices for what f(x_2) can be (since it can't be whatever f(x_1) was), and so on. Thus,
there are n! permutations of a set with n elements.
To study permutations themselves it doesn't matter much exactly what the elements of the set are so
long as we can tell them apart, so let's just look at the set
{1, 2, 3, ..., n - 1, n}
as a good prototype of a set with n (distinct) elements. The standard notation is to write S_n for the group
of permutations of n things. This S_n is also called the symmetric group on n things.
Despite the name `symmetric group', these groups are not directly related to `groups of symmetries' of
geometric objects. At the same time, it can happen that a group of symmetries turns out to be a symmetric
group. The point is that the terminology is a little delicate here, so be careful.
A standard way to write permutations f of {1, 2, ..., n} in order to describe in detail what f does is to
effectively graph f but in the form of a list: write
f = \begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ f(1) & f(2) & f(3) & \ldots & f(n) \end{pmatrix}
Thus, altering the notation just slightly, the permutation
g = \begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ i_1 & i_2 & i_3 & \ldots & i_n \end{pmatrix}
is the one so that g(ℓ) = i_ℓ.
Always we have the trivial permutation
e = \begin{pmatrix} 1 & 2 & 3 & \ldots & n \\ 1 & 2 & 3 & \ldots & n \end{pmatrix}
which does not `move' any element of the set. That is, for all i, e(i) = i.
Of course, one permutation may be applied after another. If g, h are two permutations, write
g ∘ h
for the permutation that we get by first applying h, and then applying g. This is the composition or
product of the two permutations. It is important to appreciate that, in general
g ∘ h ≠ h ∘ g
We'll see examples of this below. But in any case this notation is indeed compatible with the notation for
(and the idea of) composition of functions. Thus, for 1 ≤ i ≤ n, by definition
(g ∘ h)(i) = g(h(i))
It is a consequence of the definition of permutations as (bijective) functions from a set to itself that
composition of permutations is associative: for all permutations g, h, k of a set,
(g ∘ h) ∘ k = g ∘ (h ∘ k)
Indeed, for any element i of the set, the definition of composition of permutations gives
((g ∘ h) ∘ k)(i) = (g ∘ h)(k(i))    definition of (g ∘ h) ∘ k, applied to i
= g(h(k(i)))    definition of g ∘ h, applied to k(i)
= g((h ∘ k)(i))    definition of h ∘ k, applied to i
= (g ∘ (h ∘ k))(i)    definition of g ∘ (h ∘ k), applied to i
(This even works for infinite sets).
And for any permutation g there is the inverse permutation g^{-1} which has the effect of reversing the
permutation performed by g. That is,
g ∘ g^{-1} = g^{-1} ∘ g = e
Often the little circle indicating composition is suppressed, and we just write
g ∘ h = gh
as if it were ordinary multiplication. The hazard is that we cannot presume that gh = hg, so a little care is
required.
The graph-list notation for permutations is reasonably effective in computing the product of two permutations: to compute, for example,
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}
we see what this composite does to each of 1, 2, 3. The permutation on the right is applied first. It sends
1 to 3, which is sent to 1 by the second permutation (the one on the left). Similarly, 2 is sent to 2 (by the
permutation on the right) which is sent to 3 (by the permutation on the left). Similarly, 3 is sent to 1 (by
the permutation on the right) which is sent to 2 (by the permutation on the left). Listing-graphing this
information, we have
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}
If we multiply (compose) in the opposite order, we get something different:
\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}
This is the simplest example of the non-commutativity of the `multiplication' of permutations, that is,
that gh ≠ hg in general.
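The same computation can be done mechanically. In the following Python sketch a permutation is stored as a dictionary sending each i to its image, which is just the graph-list notation in another form; this representation and the name compose are choices made here for illustration.

```python
def compose(g, h):
    """The product g o h: apply h first, then g."""
    return {i: g[h[i]] for i in h}


g = {1: 2, 2: 3, 3: 1}
h = {1: 3, 2: 2, 3: 1}
print(compose(g, h))   # {1: 1, 2: 3, 3: 2}
print(compose(h, g))   # {1: 2, 2: 1, 3: 3}
```

The two outputs reproduce the two products computed by hand above, and they differ.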
It is certainly true that permutations, especially of big sets, can be very complicated things which are
hard to visualize. Still, they can be broken up into simple pieces, as we'll see just below.
First, the simplest permutations are the cycles of various lengths. A k-cycle is a permutation f so
that (for some numbers i_1, ..., i_k)
f(i_1) = i_2, f(i_2) = i_3, f(i_3) = i_4, ..., f(i_{k-1}) = i_k, f(i_k) = i_1
and so that f(j) = j for any number j not in the list i_1, ..., i_k. Note that i_k is sent back to i_1. Thus, as the
name suggests, f cycles the i_1, ..., i_k among themselves. A more abbreviated notation is used for this: write
(i_1 i_2 ... i_{k-1} i_k)
for this k-cycle.
For example, comparing with the more general notation,
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix} = (1 2)
\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} = (1 3)
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} = (1 2 3)
These are, in order, two 2-cycles and a 3-cycle.
Unlike the more general notation, there is some ambiguity in the cycle notation: for example,
(1 2 3) = (2 3 1) = (3 1 2)
Generally, there are k different ways to write a k-cycle in this cycle notation. In a similar vein, it is pretty
clear that
If g is a k-cycle, then g^k = e
meaning that applying g to the set k times has the net effect of moving nothing.
How do cycles interact with each other? Well, generally not very well, but if g = (i_1 ... i_k) and
h = (j_1 ... j_ℓ) are a k-cycle and an ℓ-cycle with disjoint lists {i_1, ..., i_k} and {j_1, ..., j_ℓ} they interact nicely:
they commute with each other, meaning that
gh = hg
in this special scenario. Such cycles are called (reasonably enough) disjoint cycles. Pursuing this idea, we
have
Any permutation can be written as a product of disjoint cycles, and in essentially just one way.
The `essentially' means that writing the same cycles in a different order is not to be considered different,
since after all they commute. This is called a decomposition into disjoint cycles.
Knowing the decomposition into disjoint cycles of a permutation g is the closest we can come to understanding the nature of g. Happily, this decomposition can be determined in a systematic way (effectively
giving an explicit proof of this assertion). For example, consider
g = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 3 & 2 & 5 & 7 & 6 & 1 \end{pmatrix}
We just trace the `path' of elements under repeated applications of g. To start, let's see what happens to 1
under repeated application of g: first 1 goes to 4, which then goes to 5, which then goes to 7, which then
goes to 1. Since we have returned to 1, we have completed the cycle: we see that one cycle occurring inside
g is
(1 4 5 7)
Next, look at any number which didn't already occur in this cycle, for example 2. First 2 goes to 3, which
then goes to 2, which already completes another cycle. Thus, there is also the 2-cycle
(2 3)
inside g. The only number which hasn't yet appeared in either of these cycles is 6, which is not moved by g.
Thus, we have obtained the decomposition into disjoint cycles:
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 3 & 2 & 5 & 7 & 6 & 1 \end{pmatrix} = (1 4 5 7)(2 3) = (2 3)(1 4 5 7)
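The path-tracing procedure just carried out by hand is easy to mechanize. A Python sketch, again with a permutation stored as a dictionary (an assumption of this illustration, as is the function name), leaving out fixed points just as the cycle notation does:

```python
def disjoint_cycles(g):
    """Decompose a permutation (dict i -> g(i)) into disjoint cycles."""
    seen, cycles = set(), []
    for start in sorted(g):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:          # follow start -> g(start) -> ... until we return
            seen.add(i)
            cycle.append(i)
            i = g[i]
        if len(cycle) > 1:            # elements not moved are omitted
            cycles.append(tuple(cycle))
    return cycles


g = {1: 4, 2: 3, 3: 2, 4: 5, 5: 7, 6: 6, 7: 1}
print(disjoint_cycles(g))   # [(1, 4, 5, 7), (2, 3)]
```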
And the decomposition into disjoint cycles tells how many times a permutation must be repeated in
order to have no net effect: the least common multiple of the lengths of the disjoint cycles appearing in its
decomposition.
The order of a permutation is the number of times it must be applied in order to have no net effect.
(Yes, there is possibility of confusion with other uses of the word `order'). Thus,
The order of a k-cycle is k. The order of a product of disjoint cycles is the least common multiple of
the lengths.
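As a small check of this rule, the following Python sketch (names hypothetical, permutations as dictionaries) computes the order both ways: as the least common multiple of the cycle lengths, and by brute-force repetition until nothing is moved.

```python
from math import gcd


def order_by_lcm(g):
    """Least common multiple of the cycle lengths of g (dict i -> g(i))."""
    seen, result = set(), 1
    for start in g:
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = g[i]
        if length:                                   # a new cycle was traced
            result = result * length // gcd(result, length)
    return result


def order_by_repetition(g):
    """Smallest m >= 1 so that applying g m times moves nothing."""
    power, m = dict(g), 1
    while any(power[i] != i for i in g):
        power = {i: g[power[i]] for i in g}          # one more application of g
        m += 1
    return m


g = {1: 4, 2: 3, 3: 2, 4: 5, 5: 7, 6: 6, 7: 1}      # (1 4 5 7)(2 3)
print(order_by_lcm(g), order_by_repetition(g))       # 4 4
```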
We might imagine that permutations with larger orders `mix better' than permutations with smaller
orders, since more repetitions are necessary before the mixing effect is `cancelled'. In this context, it may be
amusing to realize that if a card shuffle is done perfectly, then after some number of repetitions the cards
will be returned to their original order! But the number is pretty large with a 52-card deck, and it's not
easy to do perfect shuffles anyway.
As an example, let's examine all the elements of S_7, determining their structure as products of disjoint
cycles, counting the number of each kind, and noting their order.
First, let's count the 7-cycles (i_1 ... i_7): there are 7 choices for i_1, 6 for i_2, and so on, but there are 7
different ways to write each 7-cycle, so there are 7!/7 distinct 7-cycles altogether.
Next, 6-cycles (i_1 ... i_6): there are 7 choices for i_1, 6 for i_2, and so on down to 2 choices for i_6, but there
are 6 different ways to write each 6-cycle, so there are 7!/6 distinct 6-cycles altogether.
Next, 5-cycles (i_1 ... i_5): there are 7 choices for i_1, 6 for i_2, and so on down to 3 choices for i_5, but there
are 5 different ways to write each 5-cycle, so there are 7!/(2!·5) distinct 5-cycles altogether.
For variety, let's count the number of permutations writeable as a product of disjoint 5-cycle and 2-cycle.
We just counted that there are 7!/(2!·5) distinct 5-cycles. But each choice of 5-cycle leaves just one choice for
2-cycle disjoint from it, so there are again 7!/(2!·5) distinct products of disjoint 5-cycle and 2-cycle. And we
note that the order of a product of disjoint 5 and 2-cycle is lcm(2, 5) = 10.
There are 7!/(3!·4) distinct 4-cycles, by reasoning similar to previous examples.
There are 7!/(3!·4) · 3!/2 choices of disjoint 4-cycle and 2-cycle. The order of the product of such is
lcm(2, 4) = 4.
There are 7!/(3!·4) · 3!/3 choices of disjoint 4-cycle and 3-cycle. The order of the product of such is
lcm(3, 4) = 12.
There are 7!/(4!·3) distinct 3-cycles, by reasoning similar to previous examples.
There are 7!/(4!·3) · 4!/(2!·2) choices of disjoint 3-cycle and 2-cycle. The order of the product of such is
lcm(2, 3) = 6.
The number of disjoint 3-cycle, 2-cycle, and 2-cycle is slightly subtler, since the two 2-cycles are indistinguishable. Thus, there are
7!/(4!·3) · 4!/(2!·2) · 2!/(0!·2) · 1/2!
where the last division by 2! is to take into account the 2! different orderings of the two 2-cycles, which make
only a notational difference, not a difference in the permutation itself. The order of such a permutation is
lcm(2, 2, 3) = 6.
The number of disjoint pairs of 3-cycle and 3-cycle is similar: the two 3-cycles are not actually ordered
although our "choosing" of them gives the appearance that they are ordered. There are
7!/(4!·3) · 4!/(1!·3) · 1/2!
such pairs, where the last division by 2! is to take into account the 2! different orderings of the two 3-cycles,
which make only a notational difference, not a difference in the permutation itself. The order of such a
permutation is lcm(3, 3, 1) = 3.
There are 7!/(5!·2) distinct 2-cycles, each of order 2.
There are 7!/(5!·2) · 5!/(3!·2) · 1/2! pairs of disjoint 2-cycles, where the last division by 2! is to take into
account the possible orderings of the two 2-cycles, which affect the notation but not the permutation itself.
Finally, there are
7!/(5!·2) · 5!/(3!·2) · 3!/(1!·2) · 1/3!
triples of disjoint 2-cycles, where the last division by 3! is to account for the possible orderings of the 3
2-cycles, which affect the notation but not the permutation itself. The order of such a permutation is just
lcm(2, 2, 2) = 2.
As a by-product of this discussion, we see that the largest order of any permutation of 7 things is 12,
which is obtained by taking the product of disjoint 3 and 4-cycles.
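Since S_7 has only 7! = 5040 elements, these counts can be confirmed by listing every permutation and tallying cycle types. The following Python sketch does this by brute force; the helper names are hypothetical, and permutations are taken from itertools.permutations, so they act on 0, ..., 6 rather than 1, ..., 7, which does not affect the counts.

```python
from collections import Counter
from itertools import permutations
from math import gcd


def cycle_type(p):
    """Sorted tuple of the cycle lengths > 1 of a permutation p of range(len(p))."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = p[i]
        if length > 1:
            lengths.append(length)
    return tuple(sorted(lengths))


def lcm_all(lengths):
    """Least common multiple of a tuple of positive integers (1 if empty)."""
    result = 1
    for n in lengths:
        result = result * n // gcd(result, n)
    return result


counts = Counter(cycle_type(p) for p in permutations(range(7)))
print(counts[(7,)], counts[(2, 5)], counts[(3, 4)])   # 720 504 420
print(max(lcm_all(t) for t in counts))                # 12
```

For instance 720 = 7!/7 and 504 = 7!/(2!·5), in agreement with the formulas above.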
As a more extreme example of the counting issues involved, let's count the disjoint products of three
2-cycles and three 5-cycles in S_24. As above, this is
24!/(22!·2) · 22!/(20!·2) · 20!/(18!·2) · 1/3! · 18!/(13!·5) · 13!/(8!·5) · 8!/(3!·5) · 1/3!
where both of the divisions by 3! come from discounting the possible orderings of the 2-cycles, and the
possible orderings of the 5-cycles. Note that since 2-cycles are distinguishable from 5-cycles, there is no
further accounting necessary for the ordering of the 2-cycles relative to the 5-cycles, etc.
And we can break down any permutation into a product of 2-cycles (likely not disjoint). The procedure
to do this is as follows. First, write the permutation as a product of cycles. This reduces the problem to
that of writing any cycle as a product of 2-cycles. It's not hard to check that
(i_1 i_2 ... i_k) = (i_1 i_2)(i_2 i_3) ... (i_{k-1} i_k)
That is, a k-cycle is a (certainly not disjoint) product of k - 1 two-cycles.
Let f be a permutation. The number n of 2-cycles needed to express f as a product of n two-cycles
is not uniquely determined, but n modulo 2 is uniquely determined.
A permutation written as a product of an odd number of 2-cycles is an odd permutation, while a
permutation written as a product of an even number of two-cycles is an even permutation.
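Since a k-cycle is a product of k - 1 two-cycles, the parity of a permutation can be read off from its disjoint cycle decomposition by adding up (length - 1) over the cycles. A Python sketch of this bookkeeping (the function name and the dictionary representation are assumptions of the illustration):

```python
def parity(g):
    """0 for an even permutation, 1 for an odd one (g is a dict i -> g(i))."""
    seen, two_cycles = set(), 0
    for start in g:
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = g[i]
        if length:
            two_cycles += length - 1   # a k-cycle is a product of k - 1 two-cycles
    return two_cycles % 2


print(parity({1: 2, 2: 1, 3: 3}))   # 1, since (1 2) is odd
print(parity({1: 2, 2: 3, 3: 1}))   # 0, since (1 2 3) = (1 2)(2 3) is even
```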
The collection A_n of all even permutations in the symmetric group S_n is the alternating group on n
things. The composition of two even permutations is again even.
#16.114 Express The following as products of disjoint cycles, as products of two-cycles, and determine their
order.
1 2 3 4 5 ;
1
2 5 4 3 1
2
1 2 3 4
2 3 4 7
#16.115 Compute the product
1 2 3 4 5 6 7
2 5 4 7 1 3 6
2
5
5
1
3
4
6
5
4 5 6 7 ;
7 1 3 6
7
6
1 2 3 4 5 6 7
2 3 4 7 1 5 6
#16.116 How many distinct 3-cycles are there in S5? (Hint: In cycle notation, a 3 cycle is speci ed by a
notation using 3 distinct elements of the set being permuted. And now order really does matter. But there
is a little redundancy: there are still k di erent ways to write the same k-cycle.)
#16.117 Count the number of elements of S4 of each possible order, by identifying them as products of
disjoint cycles of various orders.
#16.118 Count the number of elements of A5 of all possible orders, by identifying them as products of
disjoint cycles of various orders, and excluding those which are odd permutations.
#16.119 For any g; h 2 Sn, show that the commutator ghg,1h,1 is an even permutation.
17. Groups: Lagrange's Theorem, Euler's Theorem
Here we encounter the first instance of abstract algebra rather than the tangible algebra studied in high
school.
One way to think of the point of this is that it is an attempt to study the structure of things directly,
without reference to irrelevant particular details.
This also achieves amazing efficiency (in the long run, anyway), since it turns out that the same underlying structures occur over and over again in mathematics. Thus, a careful study of these basic structures is
amply repaid by allowing a much simpler and more unified mental picture of otherwise seemingly-different
phenomena.
Groups
Subgroups
Lagrange's theorem
Index of a subgroup
Laws of Exponents
Cyclic subgroups, orders, exponents
Euler's theorem
Exponents of groups
17.1 Groups
The simplest (but maybe not most immediately intuitive) object in abstract algebra is a group. This
idea is pervasive in modern mathematics. Many seemingly elementary issues seem to be merely secret
manifestations of facts about groups. This is especially true in elementary number theory, where it is possible
to give "elementary" proofs of many results, but only at the cost of having everything be complicated and
so messy that it can't be remembered.
A group $G$ is a set with an operation $g * h$, with a special element $e$ called the identity, and with
properties:
The property of the identity: for all $g \in G$, $e * g = g * e = g$.
Existence of inverses: for all $g \in G$ there is $h \in G$ (the inverse of $g$) so that $h * g = g * h = e$.
Associativity: for all $x, y, z \in G$, $x * (y * z) = (x * y) * z$.
If the operation $g * h$ is commutative, that is, if
$g * h = h * g$
then the group is said to be abelian (named after N.H. Abel, born on my birthday but 150 years earlier).
In that case, often, but not always, the operation is written as addition. And if the operation is written as
addition, then the identity is often written as 0 instead of e.
And in many cases the group operation is written as multiplication
$g * h = g \cdot h = gh$
This does not preclude the operation being abelian, but rather suggests only that there is no presumption
that the operation is abelian. If the group operation is written as multiplication, then often the identity is
written as 1 rather than e. Especially when the operation is written simply as multiplication, the inverse
of an element g in the group is written as
inverse of g = $g^{-1}$
If the group operation is written as addition, then the inverse is written as
inverse of g = $-g$
In each of the following examples, it is easy to verify the properties necessary for the things to qualify
as groups: we need an identity and we need inverses, not to mention associativity.
The integers $\mathbf{Z}$ with operation the usual addition +. The identity is 0 and the inverse of x is $-x$. This
group is abelian.
The even integers $2\mathbf{Z}$ with the usual addition +. The identity is 0 and the inverse of x is $-x$. This
group is abelian.
The set $7\mathbf{Z}$ of multiples of 7 among integers, with the usual addition +. The identity is 0 and the inverse
of x is $-x$. This group is abelian.
The set $\mathbf{Z}/m$ of integers-mod-m, with addition-mod-m as the operation. The identity is 0-mod-m and
the inverse of x-mod-m is $(-x)$-mod-m. This group is abelian.
The set $(\mathbf{Z}/m)^\times$ of integers mod m relatively prime to m, with multiplication-mod-m as the operation.
The identity is 1-mod-m. In this example, a person unacquainted with arithmetic mod m would not
realize that there are multiplicative inverses. We can compute them via the Euclidean algorithm. So
this is the first `non-trivial' example. This group is abelian.
The collection of vectors in real n-space $\mathbf{R}^n$, with operation vector addition. The identity is just the 0
vector. Inverses are just negatives. (Note that we are literally forgetting the fact that there is a scalar
multiplication).
The set $GL(2, \mathbf{R})$ of invertible two-by-two real matrices, with group law matrix multiplication. Here
the identity is the matrix
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
The existence of inverses is just part of the definition. The fact that matrix multiplication is associative
is not obvious from the definition, but this can either be checked by hand or inferred from `higher
principles'. The fact that the product of two invertible matrices is invertible is interesting: suppose that
$g, h$ both have inverses, $g^{-1}$ and $h^{-1}$, respectively. Then you can check that $h^{-1} g^{-1}$ is an inverse to $gh$.
This group is certainly not abelian.
Permutations of a set form a group, with operation being composition (as functions) of permutations.
The do-nothing permutation is the identity. The associativity follows because permutations are mappings. If there are more than two things, these permutation groups are certainly non-abelian.
The collection of all bijective functions from a set S to itself form a group, with the operation being
composition of functions. The identity is the function e which maps every element just back to itself,
that is, $e(s) = s$ for all $s \in S$. (This example is just a more general paraphrase of the previous one
about permutations!)
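For small finite examples, the group properties can simply be checked by brute force. The following Python sketch does this for the integers mod m under addition and for the integers mod m prime to m under multiplication. The helper name is_group and the choice m = 12 are assumptions made only for illustration.

```python
from itertools import product
from math import gcd


def is_group(elements, op, identity):
    """Brute-force check of closure, identity, inverses and associativity."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    has_identity = all(op(identity, a) == a == op(a, identity) for a in elements)
    has_inverses = all(
        any(op(a, b) == identity == op(b, a) for b in elements) for a in elements
    )
    associative = all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    )
    return closed and has_identity and has_inverses and associative


m = 12  # hypothetical modulus for the example
residues = range(m)
units = [x for x in range(1, m) if gcd(x, m) == 1]

print(is_group(residues, lambda a, b: (a + b) % m, 0))  # Z/m with addition
print(is_group(units, lambda a, b: (a * b) % m, 1))     # (Z/m)^x with multiplication
print(is_group(range(1, 6), lambda a, b: (a * b) % 6, 1))  # fails: not even closed
```

The last call shows how the check fails when inverses (and even closure) are missing.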
17.2 Subgroups
Subgroups are subsets of groups which are groups "in their own right".
A subset H of a group G is said to be a subgroup if, with the same operation as that used in G, it is
a group.
That is, if H contains the identity element $e \in G$, if H contains inverses of all elements in it, and if H
contains products of any two elements in it, then H is a subgroup. (The associativity of the operation is
assured since the operation was assumed associative for G itself to be a group).
Another paraphrase: if $e \in H$, and if for all $h \in H$ the inverse $h^{-1}$ is also in H, and if for all $h_1, h_2 \in H$
the product $h_1 h_2$ is again in H, then H is a subgroup of G.
Another cute paraphrase is: if $e \in H$, and if for all $h_1, h_2 \in H$ the product $h_1 h_2^{-1}$ is again in H, then
H is a subgroup of G. (If we take $h_1 = e$, then the latter condition assures the existence of inverses! And so
on).
In any case, one usually says that H is closed under inverses and closed under the group operation.
For example, the collection of all even integers is a subgroup of the additive group of integers. More
generally, for fixed integer m, the collection H of all multiples of m is a subgroup of the additive group of
integers. To check this: first, the identity 0 is a multiple of m, so $0 \in H$. And for any two integers x, y
divisible by m, write x = ma and y = mb for some integers a, b. Then using the `cute' paraphrase, we see
that
$x - y = ma - mb = m(a - b) \in H$
so H is closed under inverses and under the group operation. Thus, it is a subgroup of Z.
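The `cute' paraphrase translates directly into a finite test. As a small sketch (the function name is_subgroup and the ambient group Z/12 with addition are assumptions for this example, standing in for the infinite group Z), one can check candidate subsets as follows.

```python
def is_subgroup(H, G, op, inv, identity):
    """One-step test: e in H, and h1 * h2^{-1} in H for all h1, h2 in H."""
    H, G = set(H), set(G)
    if not H <= G or identity not in H:
        return False
    return all(op(h1, inv(h2)) in H for h1 in H for h2 in H)


m = 12  # hypothetical modulus
G = range(m)
add = lambda a, b: (a + b) % m
neg = lambda a: (-a) % m

print(is_subgroup({0, 3, 6, 9}, G, add, neg, 0))  # multiples of 3
print(is_subgroup({0, 1}, G, add, neg, 0))        # 0 - 1 = 11 is missing
```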
17.3 Lagrange's Theorem
The theorem of this section is the simplest example of the use of group theory as structured counting.
Although the discussion of this section is completely abstract, it gives the easiest route to (the very tangible)
Euler's theorem proven as a corollary below.
A finite group is simply a group which is also finite. The order of a finite group is the number of
elements in it. Sometimes the order of a group G is written as $|G|$. Throughout this section we will write
the group operation simply as though it were ordinary multiplication.
Theorem: (Lagrange) Let G be a finite group. Let H be a subgroup of G. Then the order of H divides
the order of G.
For the proof we need some other ideas which themselves will be reused later. For subgroup H of a
group G, and for $g \in G$, the left coset of H by g or left translate of H by g is
$gH = \{gh : h \in H\}$
The notation gH is simply shorthand for the right-hand side. Likewise, the right coset of H by g, or right
translate of H by g is
$Hg = \{hg : h \in H\}$
Proof: First, we will prove that the collection of all left cosets of H is a partition of G, meaning that
every element of G lies in some left coset of H , and if two left cosets xH and yH have non-empty intersection
then actually xH = yH . (Note that this need not imply x = y.)
Certainly $x = x \cdot e \in xH$, so every element of G lies in a left coset of H.
Now suppose that $xH \cap yH \neq \emptyset$ for $x, y \in G$. Then for some $h_1, h_2 \in H$ we have $xh_1 = yh_2$. Multiply
both sides of this equality on the right by $h_2^{-1}$ to obtain
$(xh_1)h_2^{-1} = (yh_2)h_2^{-1}$
The right-hand side of this is
$(yh_2)h_2^{-1} = y(h_2 h_2^{-1})$ (by associativity)
$= y \cdot e$ (by property of inverse)
$= y$ (by property of e)
Let $z = h_1 h_2^{-1}$ for brevity. By associativity in G,
$y = (xh_1)h_2^{-1} = x(h_1 h_2^{-1}) = xz$
Since H is a subgroup, $z \in H$.
Then
$yH = \{yh : h \in H\} = \{(xz)h : h \in H\} = \{x(zh) : h \in H\}$
Since H is closed under multiplication, for each $h \in H$ the product zh is in H. Therefore,
$yH = \{x(zh) : h \in H\} \subset \{xh' : h' \in H\} = xH$
Thus, $yH \subset xH$. But the relationship between x and y is completely symmetrical, so also $xH \subset yH$.
Therefore xH = yH. (In other words, we have shown that the left cosets of H in G really do partition G.)
Next, we will show that the cardinalities of the left cosets of H are all the same. To do this, we show
that there is a bijection from H to xH for any $x \in G$. In particular, define
$f(g) = xg$
(It is clear that this really does map H to xH.) Second, we prove injectivity: if $f(g) = f(g')$, then
$xg = xg'$
Left multiplying by $x^{-1}$ gives
$x^{-1}(xg) = x^{-1}(xg')$
Using associativity gives
$(x^{-1}x)g = (x^{-1}x)g'$
Using the property $x^{-1}x = e$ of the inverse $x^{-1}$ gives
$eg = eg'$
Since $eg = g$ and $eg' = g'$, by the defining property of the identity e, this is
$g = g'$
which is the desired injectivity. For surjectivity, we simply note that by its very definition the function f
was arranged so that
$f(h) = xh$
Thus, any element in xH is hit by an element from H. Thus, we have the bijectivity of f, and all left cosets
of H have the same number of elements as does H itself.
So G is the union of all the different left cosets of H (no two of which overlap). Let i be the number of
different cosets of H. We just showed that every left coset of H has $|H|$ elements. Then we can count the
number of elements in G as
$|G|$ = sum of cardinalities of cosets $= i \cdot |H|$
Both sides of this equation are integers, so $|H|$ divides $|G|$, as claimed.
|
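The counting in this proof can be watched in a small case. The following Python sketch (with hypothetical names, and with permutations of {0, 1, 2} stored as tuples) lists the left cosets of a two-element subgroup H of the symmetric group on three things, and confirms that they are disjoint, all of size $|H|$, and cover G.

```python
from itertools import permutations


def compose(p, q):
    """Composition of permutations as functions: (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))


def left_cosets(G, H):
    """The set of distinct left cosets gH, each stored as a frozenset."""
    return {frozenset(compose(g, h) for h in H) for g in G}


G = list(permutations(range(3)))   # all 6 permutations of {0, 1, 2}
H = [(0, 1, 2), (1, 0, 2)]         # identity and the swap of 0 and 1
cosets = left_cosets(G, H)

for c in cosets:
    print(sorted(c))
print(len(G), "=", len(cosets), "*", len(H))
```

Here the number of cosets is the quantity called i in the proof.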
17.4 Index of a subgroup
Having introduced the idea of a coset in the proof of Lagrange's theorem, we can now define the index
of a subgroup.
Let G be a group, and H a subgroup of G. The index of H in G, denoted
$[G : H]$
is the number of (left) cosets of H in G.
Corollary: (of Lagrange's theorem) For a finite group G and subgroup H,
$|G| = [G : H] \cdot |H|$
Proof: This is just a recapitulation of the counting done in proving Lagrange's theorem: we show that G is
the disjoint union of the left cosets of H, and that each such coset has $|H|$ elements. Thus, the statement of
this corollary is an assertion that counting the elements in G in two ways gives the same result.
|
A closely related counting or divisibility principle is the following multiplicative property of indices
of subgroups:
Proposition: Let G be a finite group, let H, I be subgroups of G, and suppose that $H \supset I$. Then
$[G : I] = [G : H] \cdot [H : I]$
Proof: The group G is a disjoint union of [G : I ] left cosets of I . Also, G is the disjoint union of [G : H ] left
cosets of H . If we can show that any left coset of H is a disjoint union of [H : I ] left cosets of I , then the
assertion of the proposition will follow.
Let
$gH = \{gh : h \in H\}$
be a left coset of H. And express H as a (disjoint) union of [H : I] left cosets of I by
$H = h_1 I \cup h_2 I \cup \ldots \cup h_{[H:I]} I$
Then
$gH = g\left(h_1 I \cup h_2 I \cup \ldots \cup h_{[H:I]} I\right) = gh_1 I \cup gh_2 I \cup \ldots \cup gh_{[H:I]} I$
which is certainly a union of left cosets of I. We might want to check that $h_i I \cap h_j I = \emptyset$ (for $i \neq j$) implies
that
$gh_i I \cap gh_j I = \emptyset$
Suppose that $x \in gh_i I \cap gh_j I$. Then for some $i_1 \in I$ and $i_2 \in I$ we have
$gh_i i_1 = x = gh_j i_2$
Left multiplying by $g^{-1}$ gives
$h_i i_1 = h_j i_2$
The left-hand side is (by hypothesis) an element of $h_i I$, and the right-hand side is an element of $h_j I$. But
we had assumed that $h_i I \cap h_j I = \emptyset$, so this is impossible. That is, we have proven that $gh_i I \cap gh_j I = \emptyset$ if
$h_i I \cap h_j I = \emptyset$. This certainly finishes the proof of the multiplicative property of subgroup indices.
|
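A quick numerical check of the multiplicative property, as a sketch only: take G to be Z/24 with addition, H the multiples of 2, and I the multiples of 8 (all hypothetical choices for this example), and count cosets directly.

```python
def index(G, H, op):
    """Number of distinct left cosets gH of H in G."""
    return len({frozenset(op(g, h) for h in H) for g in G})


n = 24  # hypothetical modulus
add = lambda a, b: (a + b) % n
G = list(range(n))
H = [x for x in G if x % 2 == 0]   # multiples of 2
I = [x for x in G if x % 8 == 0]   # multiples of 8, contained in H

print(index(G, I, add), index(G, H, add), index(H, I, add))
```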
17.5 Laws of Exponents
It should be emphasized that the so-called Laws of Exponents are not "laws" at all, but are provable
properties of the exponential notation. And the exponential notation itself is basically nothing more than
an abbreviation for repeated multiplication.
Of course, we must be sure to be explicit about this exponential notation $g^n$ for integer n, where g is an
element of a group G. This is, after all, merely an abbreviation: first,
$g^0 = e$
and
$g^n = \underbrace{g \cdot g \cdots g}_{n}$ (for $n \geq 0$)
$g^n = \underbrace{g^{-1} \cdot g^{-1} \cdots g^{-1}}_{|n|}$ (for $n \leq 0$)
A more precise though perhaps less intuitive way of defining $g^n$ is by recursive definitions:
$g^n = \begin{cases} e & \text{for } n = 0 \\ g \cdot g^{n-1} & \text{for } n > 0 \\ g^{-1} \cdot g^{n+1} & \text{for } n < 0 \end{cases}$
These are the definitions that lend themselves both to computation and to proving things.
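The recursive definition can be transcribed almost word for word. In the following Python sketch the names group_power, mul, inv and power are hypothetical, and the group used for the demonstration is the set of non-zero integers mod 7 under multiplication, with inverses found by brute-force search.

```python
def group_power(g, n, op, inv, identity):
    """g^n by the recursive definition: e, g * g^(n-1), or g^(-1) * g^(n+1)."""
    if n == 0:
        return identity
    if n > 0:
        return op(g, group_power(g, n - 1, op, inv, identity))
    return op(inv(g), group_power(g, n + 1, op, inv, identity))


p = 7  # hypothetical prime modulus for the example
mul = lambda a, b: (a * b) % p
inv = lambda a: next(b for b in range(1, p) if (a * b) % p == 1)
power = lambda g, n: group_power(g, n, mul, inv, 1)

print([power(3, n) for n in range(-3, 7)])
```

This direct recursion uses about |n| multiplications, so it is meant to mirror the definition rather than to be fast.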
While we're here, maybe we should check that the so-called Laws of Exponents really do hold:
Proposition: (Laws of Exponents) For g in a group G, for integers m, n
$g^{m+n} = g^m \cdot g^n$
$g^{mn} = (g^m)^n$
Proof: The least obvious thing to prove is that
$(g^{-1})^{-1} = g$
Note that we absolutely cannot simply pretend to invoke "laws of exponents" to prove this! Instead, to prove
this, we must realize that the way that one checks that y is an inverse of x is to compute xy and yx and see
that they are both just e. So to prove that x is the inverse of $x^{-1}$, we must compute both $x^{-1}x$ and $xx^{-1}$.
And, indeed, by the property of $x^{-1}$ these both are e.
The rest of the proof is an exercise in induction, and is a bit tedious. And nothing really exciting
happens.
Let's prove that
$g^{m+n} = g^m \cdot g^n$
for m and n non-negative integers. We prove this by induction on n. For n = 0 the assertion is true, since
$g^{m+0} = g^m = g^m \cdot e = g^m \cdot g^0$
Then for n > 0,
$g^{m+n} = g^{(m+n-1)+1} = g^{m+n-1} \cdot g$
by the recursive definition of $g^{m+n}$. By induction,
$g^{m+n-1} = g^m \cdot g^{n-1}$
Therefore,
$g^{m+n-1} \cdot g = (g^m \cdot g^{n-1}) \cdot g = g^m \cdot ((g^{n-1}) \cdot g)$
by associativity. Now from the recursive definition of $g^n$ we obtain
$g^m \cdot ((g^{n-1}) \cdot g) = g^m \cdot g^n$
This proves this "Law" for $m, n \geq 0$.
|
17.6 Cyclic subgroups, orders, exponents
For an element g of a group G, let
$\langle g \rangle = \{g^n : n \in \mathbf{Z}\}$
This is called the cyclic subgroup of G generated by g.
The smallest positive integer n (if it exists!) so that
$g^n = e$
is the order or exponent of g. The order of a group element g is often denoted by $|g|$. Yes, we are reusing
the terminology "order", but it will turn out that these uses are compatible (just below).
Corollary: (of Laws of Exponents) For g in a group G, the subset $\langle g \rangle$ of G really is a subgroup of G.
Proof: The associativity is inherited from G. The closure under the group operation and the closure
under taking inverses both follow immediately from the Laws of Exponents, as follows. First, the inverse of
$g^n$ is just $g^{-n}$, since
$g^n \cdot g^{-n} = g^{n+(-n)} = g^0 = e$
And closure under multiplication is
$g^m \cdot g^n = g^{m+n}$
|
Theorem: Let g be an element of a finite group G. Let n be the order of g. Then the order of g (as
group element) is equal to the order of $\langle g \rangle$ (as subgroup). Specifically,
$\langle g \rangle = \{g^0, g^1, g^2, \ldots, g^{n-1}\}$
Generally, for arbitrary integers i, j,
$g^i = g^j$ if and only if $i \equiv j \bmod n$
Proof: The last assertion easily implies the first two, so we'll just prove the last assertion. On one hand,
if $i \equiv j \bmod n$, then write $i = j + \ell n$ and compute (using Laws of Exponents):
$g^i = g^{j+\ell n} = g^j \cdot (g^n)^\ell = g^j \cdot e^\ell = g^j \cdot e = g^j$
On the other hand, suppose that $g^i = g^j$. Without loss of generality, exchanging the roles of i and j if
necessary, we may suppose that $i \leq j$. Then $g^i = g^j$ implies $e = g^{j-i}$. Using the Reduction/Division
algorithm, write
$j - i = q \cdot n + r$
where $0 \leq r < n$. Then
$e = g^{j-i} = g^{qn+r} = (g^n)^q \cdot g^r = e^q \cdot g^r = e \cdot g^r = g^r$
Therefore, since n is the least positive integer so that $g^n = e$, it must be that r = 0. That is, $n \mid j - i$, which
is to say that $i \equiv j \bmod n$ as claimed.
|
Corollary: (of Lagrange's theorem) The order $|g|$ of an element g of a finite group G divides the order
of G.
Proof: We just proved that $|g| = |\langle g \rangle|$. By Lagrange's theorem, $|\langle g \rangle|$ divides $|G|$, which yields this
corollary.
|
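To see these statements in a concrete case, the following Python sketch computes orders of elements and cyclic subgroups in the multiplicative group of integers mod n prime to n. The function names and the choice n = 20 are assumptions for illustration, and order assumes that g is prime to n and that n is at least 2 (otherwise its loop would not terminate).

```python
from math import gcd


def units(n):
    """Residues mod n which are relatively prime to n (for n >= 2)."""
    return [x for x in range(1, n) if gcd(x, n) == 1]


def order(g, n):
    """Smallest positive k with g^k = 1 mod n; assumes gcd(g, n) = 1 and n >= 2."""
    k, power = 1, g % n
    while power != 1:
        power = (power * g) % n
        k += 1
    return k


def cyclic_subgroup(g, n):
    """The subgroup <g> = {g^0, g^1, ..., g^(|g|-1)} as a sorted list."""
    return sorted({pow(g, k, n) for k in range(order(g, n))})


n = 20  # hypothetical modulus
for g in units(n):
    print(g, order(g, n), cyclic_subgroup(g, n))
```

In each printed row the length of the subgroup equals the order of the element, and that order divides the number of units.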
17.7 Euler's Theorem
Now we return to number theory, and give a clean and conceptual proof of Euler's identity, as a corollary
of Lagrange's theorem and the discussion of Laws of Exponents and cyclic subgroups. Further, we can give
a slightly refined form of it.
Let $\varphi(n)$ be Euler's phi-function, counting the number of integers $\ell$ in the range $0 < \ell \leq n$ which are
relatively prime to n. The proof we give of this is simply the abstracted version of Euler's original argument.
Theorem: Let n be a positive integer. For $x \in \mathbf{Z}$ relatively prime to n,
$x^{\varphi(n)} \equiv 1 \bmod n$
Proof: The set $(\mathbf{Z}/n)^\times$ of integers-mod-n which are relatively prime to n has $\varphi(n)$ elements. By Lagrange's
theorem and its corollaries just above, this implies that the order k of $g \in (\mathbf{Z}/n)^\times$ divides $\varphi(n)$. Therefore,
$\varphi(n)/k$ is an integer, and
$g^{\varphi(n)} = (g^k)^{\varphi(n)/k} = e^{\varphi(n)/k} = e$
Applied to x-mod-n this is the desired result.
|
Remark: This approach also gives another proof of Fermat's theorem, dealing with the case that n is
prime, without mention of binomial coefficients.
Further, keeping track of what went into the proof of Euler's theorem in the first place, we have
Theorem: Let n be a positive integer. For $x \in \mathbf{Z}$ relatively prime to n, the smallest positive exponent $\ell$ so that
$x^\ell \equiv 1 \bmod n$
is a divisor of $\varphi(n)$. That is, the order of x in the multiplicative group $(\mathbf{Z}/n)^\times$ is a divisor of $\varphi(n)$.
Proof: The proof is really the same: the order of x is equal to the order of the subgroup $\langle x \rangle$, which by Lagrange's
theorem is a divisor of the order of the whole group $(\mathbf{Z}/n)^\times$.
|
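Euler's theorem is easy to test numerically. The sketch below (function names are hypothetical) computes Euler's phi-function by direct counting, exactly as in the definition above, and checks the congruence for every x prime to n using Python's built-in three-argument pow for modular exponentiation.

```python
from math import gcd


def euler_phi(n):
    """Count the integers k with 0 < k <= n which are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def check_euler(n):
    """True if x^phi(n) = 1 mod n for every x in 1..n prime to n."""
    phi = euler_phi(n)
    return all(pow(x, phi, n) == 1 % n
               for x in range(1, n + 1) if gcd(x, n) == 1)


print(euler_phi(12), check_euler(12))
print(all(check_euler(n) for n in range(1, 200)))
```

The comparison is against 1 % n rather than 1 only so that the degenerate case n = 1 is handled uniformly.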
17.8 Exponents of groups
The idea of Euler's theorem can be made more precise and abstracted.
For a group G, the smallest positive integer $\ell$ so that for every $g \in G$
$g^\ell = e$
is the exponent of the group G. It is not clear from the definition that there really is such a positive integer
$\ell$. Indeed, for infinite groups G there may not be. But for finite groups the mere finiteness allows us to
characterize the exponent:
Proposition: Let G be a finite group. Then the exponent of G exists, and in particular
exponent of G = least common multiple of $|g|$ for $g \in G$
Proof: If $g^k = e$, then we know from discussion of cyclic subgroups above that $|g|$ divides k. And, on the
other hand, if $k = m \cdot |g|$ then
$g^k = g^{m \cdot |g|} = (g^{|g|})^m = e^m = e$
Since G is finite, every element of it is of finite order. And, since there are only finitely-many elements in G,
the least common multiple M of their orders exists. From what we've just seen, surely $g^M = e$ for any g.
Thus, G does have an exponent. And if $g^k = e$ for all $g \in G$ then k is divisible by the orders of all elements
of G, so by their least common multiple. Thus, the exponent of G really is the least common multiple of the
orders of its elements.
|
And Lagrange's theorem gives a limitation on what we can expect the exponent to be:
Corollary: (of Lagrange's theorem) Let G be a finite group. Then the exponent of G divides the order $|G|$
of G.
Proof: From the proposition, the exponent is the least common multiple of the orders of the elements of G.
From Lagrange's theorem, each such order is a divisor of $|G|$. The least common multiple of any collection
of divisors of a fixed number is certainly a divisor of that number.
|
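As a small illustration of the proposition and its corollary, the following Python sketch computes the exponent of the multiplicative group of integers mod n prime to n as the least common multiple of the orders of its elements. The helper names are hypothetical, and n is assumed to be at least 2.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def units(n):
    """Residues mod n which are relatively prime to n (for n >= 2)."""
    return [x for x in range(1, n) if gcd(x, n) == 1]


def order(g, n):
    """Smallest positive k with g^k = 1 mod n; assumes gcd(g, n) = 1 and n >= 2."""
    k, power = 1, g % n
    while power != 1:
        power = (power * g) % n
        k += 1
    return k


def exponent(n):
    """Least common multiple of the orders of all units mod n."""
    return reduce(lcm, (order(g, n) for g in units(n)), 1)


for n in (7, 8, 15):
    print(n, len(units(n)), exponent(n))
```

For n = 8 and n = 15 the exponent is strictly smaller than the order of the group, while for n = 7 the two agree.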
#17.120 Prove that in any group G for any elements $h, x, y \in G$ we have
$h(xy)h^{-1} = (hxh^{-1})(hyh^{-1})$
#17.121 Prove (by induction) that in any group G for any elements $g, h \in G$ and for any integer n
$h g^n h^{-1} = (hgh^{-1})^n$
#17.122 Make an addition table for $\mathbf{Z}/4$ and a multiplication table for $(\mathbf{Z}/5)^\times$.
#17.123 Why isn't $\{1, 2, 3, 4, 5\}$ with operation multiplication modulo 6 a group?
#17.124 Prove by induction that in an abelian group G we have
$(gh)^n = g^n h^n$
for all $g, h \in G$, and for all positive integers n.
#17.125 Show that
$(gh)^2 = g^2 h^2$
in a group if and only if $gh = hg$.
#17.126 Prove that $(gh)^{-1} = h^{-1} g^{-1}$.
#17.127 Prove that $(gh)^{-1} = g^{-1} h^{-1}$ if and only if $gh = hg$.
#17.128 Prove that the intersection $H \cap K$ of two subgroups H, K of a group G is again a subgroup of G.
#17.129 Show that in an abelian group G, for a fixed positive integer n the set $X_n$ of elements g of G so
that $g^n = e$ is a subgroup of G.
#17.130 There are 8 subgroups of the group $\mathbf{Z}/30$. Find them all. (List each subgroup only once!)
#17.131 Check that the collection of matrices g in $GL(2, \mathbf{Q})$ of the form $g = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}$ (that is, with lower
left and upper right entries 0) is a subgroup of $GL(2, \mathbf{Q})$.
#17.132 Check that the collection of matrices g in $GL(2, \mathbf{Q})$ of the form $g = \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ (that is, with lower
left entry 0) is a subgroup of $GL(2, \mathbf{Q})$.
#17.133 (Casting out nines:) Show that
$123456789123456789 + 234567891234567891 \neq 358025680358025680$
(Hint: Look at things modulo 9: if two things are not equal mod 9 then they certainly aren't equal. And
notice the funny general fact that, for example,
$1345823416 \equiv 1 + 3 + 4 + 5 + 8 + 2 + 3 + 4 + 1 + 6 \bmod 9$
since $10 \equiv 1 \bmod 9$, and $100 \equiv 1 \bmod 9$, and so on. The assertion is that a decimal number is congruent to
the sum of its digits modulo 9! This is casting out nines, which allows detection of some errors in arithmetic).
#17.134 By casting out 9's, show that
$123456789123456789 \times 234567891234567891 \neq 28958998683279996179682996625361999$
Certainly in this case it's not possible to check directly by hand, and probably most calculators would
overflow.
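The idea of casting out nines from the hint above can be sketched in a few lines of Python. The function names and the small sample sums are hypothetical, chosen so as not to give away the exercises. Note that the test can only ever detect an error, never certify that an answer is right.

```python
def digit_sum_mod9(n):
    """Sum of the decimal digits of n, reduced mod 9."""
    return sum(int(d) for d in str(abs(n))) % 9


def sum_passes_nines(a, b, claimed):
    """False means 'a + b = claimed' is certainly wrong; True proves nothing."""
    return (digit_sum_mod9(a) + digit_sum_mod9(b)) % 9 == digit_sum_mod9(claimed)


print(digit_sum_mod9(1345823416), 1345823416 % 9)   # the two agree
print(sum_passes_nines(1234, 5678, 6812))   # wrong answer, detected
print(sum_passes_nines(1234, 5678, 6912))   # correct answer passes
print(sum_passes_nines(1234, 5678, 6921))   # wrong answer, NOT detected
```

The last line shows the limitation: swapping two digits does not change the digit sum, so such an error slips through.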
#17.135 Prove that a group element and its inverse have the same order.
#17.136 Without computing, show that in the group $\mathbf{Z}/100$ (with addition) the elements 1, 99 have the
same order, as do 11, 89.
#17.137 Find the orders of the following elements g, h of $GL(2, \mathbf{R})$:
$g = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$
$h = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}$
Compute the product gh, compute $(gh)^n$ for integers n, and then show that gh is necessarily of infinite order
in the group.
#17.138 Let G be a finite group. Let N be the least common multiple of the orders of the elements of G.
Show that for all $g \in G$ we have $g^N = e$.
#17.139 (*) Let G be an abelian group. Let m, n be relatively prime positive integers. Let g be an element
of order m and let h be an element of order n. Show that $|gh| = mn$. More generally, without any
relative primeness hypothesis on the orders of g, h, show that $|gh|$ is the least common multiple of $|g|$, $|h|$.
#17.140 Let x be an element of a group G and suppose that $x^{3 \cdot 5} = e$ and $x^3 \neq e$. Show that the order of
x is either 5 or 15.
#17.141 Show that any integer i so that $1 \leq i < 11$ is a generator for the additive group $\mathbf{Z}/11$ of integers
modulo 11.
#17.142 Check that $(\mathbf{Z}/8)^\times$ cannot be generated by a single element.
#17.143 Find all 5 of the distinct subgroups of the group $\mathbf{Z}/16$ (with addition). (List each subgroup only
once!)
#17.144 Prove that if an element g of a group G has order n and if d is a divisor of n then $g^{n/d}$ has order
d. (Equivalently, $g^d$ has order n/d).
18. Rings and Fields: definitions and first examples
Rings, fields
Divisibility in rings
Polynomial rings
Euclidean algorithm in polynomial rings
Euclidean rings
18.1 Rings, fields
The idea of ring generalizes the idea of `numbers', among other things, so maybe is a little more intuitive
than the idea of group.
A ring R is a set with two operations, + and $\cdot$, and with a special element 0 (additive identity) with
most of the usual properties we expect or demand of `addition' and `multiplication':
The addition is associative: $a + (b + c) = (a + b) + c$ for all $a, b, c \in R$.
The addition is commutative: $a + b = b + a$ for all $a, b \in R$.
For every $a \in R$ there is an additive inverse denoted $-a$, with the property that $a + (-a) = 0$.
The zero has the property that $0 + a = a + 0 = a$ for all $a \in R$.
The multiplication is associative: $a(bc) = (ab)c$ for all $a, b, c \in R$.
The multiplication and addition have left and right distributive properties: $a(b + c) = ab + ac$ and
$(b + c)a = ba + ca$ for all $a, b, c \in R$.
When we write this multiplication, just as in high school algebra, very often the dot will be omitted,
and just write
$ab = a \cdot b$
Very often, a particular ring has some additional special features or properties:
If there is an element 1 in a ring R with the property that $1 \cdot a = a \cdot 1 = a$ for all $a \in R$, then 1 is said to be
the (multiplicative) identity or unit in the ring, and the ring is said to have an identity or have
a unit or be a ring with unit. And 1 is the unit in the ring. We also demand that $1 \neq 0$ in a ring.
If $ab = ba$ for all $a, b$ in a ring R, that is, if multiplication is commutative, then the ring is said to be
a commutative ring.
Most often, but not always, our rings of interest will have units `1'. The condition of commutativity of
multiplication is often met, but, for example, matrix multiplication is not commutative.
In a ring R with 1, for a given element $a \in R$, if there is $a^{-1} \in R$ so that $a \cdot a^{-1} = a^{-1} \cdot a = 1$, then $a^{-1}$ is
said to be a multiplicative inverse for a. If $a \in R$ has a multiplicative inverse, then a is called a unit
in R. The collection of all units in a ring R is denoted $R^\times$, and is called the group of units in R.
A commutative ring in which every non-zero element is a unit is called a field.
A not-necessarily commutative ring in which every non-zero element is a unit is called a division ring.
In a ring R an element r so that $r \cdot s = 0$ or $s \cdot r = 0$ for some non-zero $s \in R$ is called a zero divisor.
A commutative ring without non-zero zero-divisors is an integral domain.
A commutative ring R has the cancellation property if for any $r \neq 0$ in R if rx = ry for $x, y \in R$
then x = y. Most rings with which we're familiar have this property.
Comment on terminology: There is indeed an inconsistency in the use of the word unit. But that's the
way the word is used. So the unit is 1, while a unit is merely something which has a multiplicative inverse.
Of course, there are no multiplicative inverses unless there is a unit (meaning that there is a 1!). It is almost
always possible to tell from context what is meant.
It is very important to realize that the notation $-a$ for an additive inverse and $a^{-1}$ for multiplicative
inverse are meant to suggest `minus a' and `divide-by-a', but that at the moment we are not justified in
believing any of the `usual' high school algebra properties. We have to prove that all the `usual' things really
do still work in this abstract situation.
If we take a ring R with 0 and with its addition, then we get an abelian group, called the additive
group of R.
The group of units $R^\times$ in a ring with unit certainly is a group. Its identity is the unit 1. This group is
abelian if R is commutative.
In somewhat more practical terms: as our examples above show, very often a group really is just the
additive group of a ring, or is the group of units in a ring. There are many examples where this is not really
so, but many fundamental examples are of this nature.
The integers $\mathbf{Z}$ with usual addition and multiplication form a ring. This ring is certainly commutative
and has a multiplicative identity `1'. The group of units $\mathbf{Z}^\times$ is just $\{\pm 1\}$. This ring is an integral domain.
The even integers $2\mathbf{Z}$ with the usual addition and multiplication form a commutative ring without unit.
Just as this example suggests, very often the lack of a unit in a ring is somewhat artificial, because there is
a `larger' ring it sits inside which does have a unit. There are no units in this ring.
The `integers mod m' $\mathbf{Z}/m$ form a commutative ring with identity. As the notation suggests, the group of
units really is $(\mathbf{Z}/m)^\times$: notice that we used the group-of-units notation in this case before we even introduced
the terminology.
Take p a prime. The ring of integers mod p, $\mathbf{Z}/p$, is a field, since all positive integers less
than p have a multiplicative inverse modulo p (computable by the Euclidean Algorithm!). The
group of units really is $(\mathbf{Z}/p)^\times$.
The collection of n-by-n real matrices (for fixed n) is a ring, with the usual matrix addition and
multiplication. Except for the silly case n = 1, this ring is non-commutative. The group of units is the group
$GL(n, \mathbf{R})$.
The rational numbers $\mathbf{Q}$, the real numbers $\mathbf{R}$, and the complex numbers $\mathbf{C}$ are all examples of fields,
because all their non-zero elements have multiplicative inverses.
Let p be a prime number. Then $\mathbf{Z}/p$ with addition and multiplication modulo p is a field, because (by
use of the Euclidean algorithm, for example) any $x \not\equiv 0 \bmod p$ has a multiplicative inverse modulo p.
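For the rings Z/m these notions can be explored by brute force. In the following Python sketch the function names are hypothetical, and inverses are found by exhaustive search rather than by the Euclidean algorithm, purely to keep the example short. Following the definition above, 0 itself counts as a zero divisor.

```python
def units(m):
    """Elements of Z/m having a multiplicative inverse."""
    return [a for a in range(m) if any((a * b) % m == 1 for b in range(m))]


def zero_divisors(m):
    """Elements r of Z/m with r*s = 0 mod m for some non-zero s."""
    return [r for r in range(m) if any((r * s) % m == 0 for s in range(1, m))]


def is_field(m):
    """Z/m is a field when every non-zero element is a unit."""
    return m > 1 and len(units(m)) == m - 1


print(units(12))
print(zero_divisors(12))
print([m for m in range(2, 20) if is_field(m)])
```

The last line lists exactly the primes below 20, in agreement with the examples above.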
Just as in the beginning of our discussion of groups, there are some things which we might accidentally
take for granted about how rings behave, and reasonably so, after all, based on all our previous experience
with numbers, etc. But it is certainly better to give the `easy' little proofs of these things and to be conscious
of what we believe, rather than to be unconscious.
Let R be a ring. We will prove the following fundamental properties:
Uniqueness of additive identity: If there is an element $z \in R$ and another $r \in R$ so that $r + z = r$, then
$z = 0$. (Note that we only need this condition for one other $r \in R$, not for all $r \in R$).
Uniqueness of additive inverses: Fix $r \in R$. If there is $r' \in R$ so that $r + r' = 0$, then actually $r' = -r$,
the additive inverse of r.
Uniqueness of multiplicative identity: Suppose that R has a unit 1. If there is $u \in R$ so that for all
$r \in R$ we have $u \cdot r = r$, then $u = 1$. Or, if for all $r \in R$ we have $r \cdot u = r$, then $u = 1$. Actually, all we
need is that either $1 \cdot u = 1$ or $u \cdot 1 = 1$ to assure that $u = 1$.
Uniqueness of multiplicative inverses: If $r \in R$ has a multiplicative inverse $r^{-1}$, and if $r' \in R$ is such
that $r \cdot r' = 1$, then $r' = r^{-1}$. Or, assuming instead that $r' \cdot r = 1$, we still conclude that $r' = r^{-1}$.
For $r \in R$, we have $-(-r) = r$. That is, the additive inverse of the additive inverse of r is just r.
Proof of uniqueness of additive identity: If there is an element $z \in R$ and $r \in R$ so that $r + z = r$, add
$-r$ to both sides of this equation to obtain
$(r + z) - r = r - r = 0$
by definition of additive inverse. Using the commutativity and associativity of addition, the left-hand side
of this is
$(r + z) - r = (z + r) - r = z + (r - r) = z + 0 = z$
also using the property of the 0. That is, putting this together, $z = 0$, proving what we wanted.
Proof of uniqueness of additive inverses: Fix $r \in R$. If there is $r' \in R$ so that $r + r' = 0$, then add $-r$
to both sides to obtain
$(r + r') - r = 0 + (-r)$
Using the commutativity and associativity of addition, the left-hand side of this is
$(r + r') - r = (r' + r) - r = r' + (r - r) = r' + 0 = r'$
Since the right hand side is $0 + (-r) = -r$, we have $r' = -r$, as claimed.
Proof of uniqueness of multiplicative identity: Suppose that either $1 \cdot u = 1$ or $u \cdot 1 = 1$; we claim that
$u = 1$. Well, let's just do one case, since the other is identical apart from writing things in the opposite
order. Suppose that $u \cdot 1 = 1$. Then since $u \cdot 1 = u$ by the property of the multiplicative identity 1, we have
$u = 1$. Done.
Proof of uniqueness of multiplicative inverses: Assume that $r \in R$ has a multiplicative inverse $r^{-1}$, and
that $r' \in R$ is such that $r \cdot r' = 1$. Then multiply that latter equation by $r^{-1}$ on the left to obtain
$r^{-1} \cdot (r \cdot r') = r^{-1} \cdot 1 = r^{-1}$
by the property of 1. Using the associativity of multiplication, the left-hand side is
$r^{-1} \cdot (r \cdot r') = (r^{-1} \cdot r) \cdot r' = 1 \cdot r' = r'$
by property of multiplicative inverses and of the identity. Putting this together, we have $r' = r^{-1}$ as desired.
The proof that $-(-r) = r$, that is, that the additive inverse of the additive inverse of r is just r, is
identical to the argument given for groups that the inverse of the inverse is the original thing.
There are several `slogans' that we all learned in high school or earlier, such as `minus times minus is
plus', and `zero times anything is zero'. It may be interesting to see that from the axioms for a ring we can
prove those things. (We worried over the so-called `Laws of Exponents' already a little earlier).
These things are a little subtler than the `obvious' things above, insofar as they involve the interaction
of the multiplication and addition.
And these little proofs are good models for how to prove simple general results about rings.
Let R be a ring.
For any $r \in R$, $0 \cdot r = r \cdot 0 = 0$.
Suppose that there is a 1 in R. Let $-1$ be the additive inverse of 1. Then for any $r \in R$ we have
$(-1) \cdot r = r \cdot (-1) = -r$, where as usual $-r$ denotes the additive inverse of r.
Let $-x$, $-y$ be the additive inverses of $x, y \in R$. Then $(-x) \cdot (-y) = xy$.
Proofs: Throughout this discussion, keep in mind that to prove that $b = -a$ means to prove just that
$a + b = 0$.
Let's prove that `zero times anything is zero': Let $r \in R$. Then
$0 \cdot r = (0 + 0) \cdot r$ (since 0 + 0 = 0)
$= 0 \cdot r + 0 \cdot r$ (distributivity)
Then, adding $-(0 \cdot r)$ to both sides, we have
$0 = 0 \cdot r - 0 \cdot r = 0 \cdot r + 0 \cdot r - 0 \cdot r = 0 \cdot r + 0 = 0 \cdot r$
That is, $0 \cdot r = 0$. The proof that $r \cdot 0 = 0$ is nearly identical.
Let's show that $(-1) \cdot r = -r$. That is, we are asserting that $(-1) \cdot r$ is the additive inverse of r, which
by now we know is unique. So all we have to do is check that
$r + (-1) \cdot r = 0$
We have
$r + (-1) \cdot r = 1 \cdot r + (-1) \cdot r = (1 - 1) \cdot r = 0 \cdot r = 0$
by using the property of 1, using distributivity, and using the result we just proved, that $0 \cdot r = 0$. We're
done.
Last, to show that $(-x)(-y) = xy$, we prove that $(-x)(-y) = -(-(xy))$, since we know generally that
$-(-r) = r$. We can get halfway to the desired conclusion right now: we claim that $-(xy) = (-x)y$: this
follows from the computation
$(-x)y + xy = (-x + x)y = 0 \cdot y = 0$
Combining these two things, what we want to show is that
$(-x)(-y) + (-x)y = 0$
Well,
$(-x)(-y) + (-x)y = (-x)(-y + y) = (-x) \cdot 0 = 0$
using distributivity and the property $r \cdot 0 = 0$. This proves that $(-x)(-y) = xy$.
18.2 Divisibility in rings
Before trying to prove that various commutative rings `have unique factorization', we should make clear
what this should mean. To make this clear, we need to talk about divisibility again. In this section we
presume that any ring in question is commutative and has a unit.
The very first thing to understand is the potential failure of the cancellation property, and its connection with the presence of non-zero zero divisors:
Theorem: A commutative ring $R$ has the cancellation property if and only if it is an integral domain.
Proof: Suppose that $R$ has the cancellation property, and suppose that $r \cdot s = 0$. Since $r \cdot 0 = 0$ for any $r \in R$, we can write $r \cdot s = r \cdot 0$. For $r \neq 0$ we can cancel the $r$ and obtain $s = 0$. Similarly, if $s \neq 0$ then $r = 0$. This shows that the cancellation property implies that there are no non-zero zero divisors.
On the other hand, suppose that $R$ has no non-zero zero divisors. Suppose that $ra = rb$ with $r \neq 0$. Then, subtracting, $r(a - b) = 0$. Since $r \neq 0$, it must be that $a - b = 0$, or $a = b$. This is the desired cancellation property.
∎
In a commutative ring $R$, say that $x \in R$ divides $y \in R$ if there is $z \in R$ so that $y = zx$. And also say then that $y$ is a multiple of $x$. And just as for the ordinary integers we may write
$$x \mid y$$
to say that $x$ divides $y$. And then $x$ is a divisor of $y$. If $xz = y$ and neither $x$ nor $z$ is a unit, then say that $x$ is a proper divisor of $y$.
Keep in mind that since $r \cdot 0 = 0$ anything divides $0$. But for the same reason $0$ only divides itself and nothing else. On the other hand, if $u \in R$ is a unit in $R$ (meaning that it has a multiplicative inverse in $R$) then $u$ divides everything: let $r \in R$ be anything, and then we see that
$$r = r \cdot 1 = r \cdot (u^{-1} \cdot u) = (r \cdot u^{-1}) \cdot u$$
making clear that $r$ is a multiple of $u$.
An element $p$ in $R$ is prime or irreducible if $p$ itself is not a unit in $R$, but if $p = xy$ (with both $x, y \in R$) then either $x$ or $y$ is a unit in $R$.
A paraphrase of this is: an element is prime if and only if it is not a unit and has no proper divisors.
If $d$ is a proper divisor of a non-zero element $r$ in an integral domain $R$, then
$$R \cdot d \supset R \cdot r \quad \text{but} \quad R \cdot d \neq R \cdot r$$
Proof: Since $d$ is a divisor of $r$, there is $x \in R$ so that $xd = r$. Then
$$R \cdot r = R \cdot (xd) = (Rx) \cdot d \subset R \cdot d$$
since $R$ is closed under multiplication, after all. Suppose that $R \cdot r = R \cdot d$. Then
$$d = 1 \cdot d \in R \cdot d = R \cdot r$$
so there is $s \in R$ so that $d = s \cdot r$. But then
$$r = xd = x(sr) = (xs)r$$
which gives $(1 - xs)r = 0$. Since $r \neq 0$ and $R$ is an integral domain, $1 - xs = 0$ or $xs = 1$. That is, $x$ is a unit, contradicting the assumption that $r = xd$ is a proper factorization of $r$. Done.
Two prime elements $p, q \in R$ are associate if there is a unit $u$ in $R$ so that $q = up$. (Since the inverse of a unit is of course a unit as well, this condition is symmetrical, being equivalent to the existence of a unit $v$ so that $p = vq$.)
The idea is that, for purposes of factorization into primes, two associate prime elements will be viewed as being essentially the same thing.
18.3 Polynomial rings
Another important and general construction of rings is polynomial rings: let $R$ be a commutative ring with unit, and define
$$R[x] = \{\text{polynomials with coefficients in } R\}$$
Then we use the usual addition and multiplication of polynomials. We will be especially interested in polynomials whose coefficients lie in a field $k$. By default, we might imagine that $k$ is $\mathbf{Q}$ or $\mathbf{R}$ or $\mathbf{C}$, although we should also admit the possibility that the field $k$ is a finite field such as $\mathbf{Z}/p$ for $p$ prime.
Let $R$ be a commutative ring with unit $1$. Let $x$ be the thing we usually think of as a `variable' or `indeterminate'. The ring of polynomials in $x$ with coefficients in $R$ is what it sounds like: the collection of all polynomials using indeterminate $x$ and whose coefficients are in the ring $R$. This is also called the ring of polynomials in $x$ over $R$. The notation for this is very standard, using square brackets:
$$R[x]$$
denotes the ring of polynomials over $R$. With the usual addition and multiplication of polynomials, this $R[x]$ is a ring.
We are most accustomed to polynomials with real numbers or complex numbers as coefficients, but there is nothing special about this.
When a polynomial with indeterminate $x$ is written out as
$$P(x) = c_n x^n + c_{n-1} x^{n-1} + \ldots + c_3 x^3 + c_2 x^2 + c_1 x + c_0$$
the coefficients are the `numbers' $c_n, \ldots, c_0$ in the ring $R$. The constant coefficient is $c_0$. If $c_n \neq 0$, then $c_n x^n$ is called the highest-order term or leading term and $c_n$ is the highest-order coefficient or leading coefficient.
We refer to the summand $c_i x^i$ as the degree $i$ term. Also sometimes $i$ is called the order of the summand $c_i x^i$. The order of the highest non-zero coefficient is the degree of the polynomial.
Remark: Sometimes people write a polynomial in the form above but forget to say whether $c_n$ is definitely non-zero or not. Sometimes, also, people presume that if a polynomial is written in this fashion then $c_n$ is non-zero, but that's not safe at all.
A polynomial is said to be monic if its leading or highest-order coefficient is $1$.
Remark: Sometimes polynomials are thought of as simply being a kind of function, but that is too naive generally. Polynomials give rise to functions, but they are more than just that. It is true that a polynomial
$$f(x) = c_n x^n + c_{n-1} x^{n-1} + \ldots + c_1 x + c_0$$
with coefficients in a ring $R$ gives rise to functions on the ring $R$, writing as usual
$$f(a) = c_n a^n + c_{n-1} a^{n-1} + \ldots + c_1 a + c_0$$
for $a \in R$. That is, as usual, we imagine that the `indeterminate' $x$ is replaced by $a$ everywhere (or ``$a$ is substituted for $x$''). This procedure gives functions from $R$ to $R$.
But polynomials themselves have features which may become invisible if we mistakenly think of them as just being functions. For example, suppose that we look at the polynomial $f(x) = x^2 + x$ in the polynomial ring $(\mathbf{Z}/2)[x]$, that is, with coefficients in $\mathbf{Z}/2$. Then
$$f(0) = 0^2 + 0 = 0$$
$$f(1) = 1^2 + 1 = 0$$
That is, the function attached to the polynomial is the $0$-function, but the polynomial is visibly not the zero polynomial.
As another example, consider $f(x) = x^3 - x$ as a polynomial with coefficients in $\mathbf{Z}/3$. Once again, $f(0)$, $f(1)$, $f(2)$ are all $0$, but the polynomial is certainly not the zero polynomial.
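The distinction between a polynomial and the function it defines can be checked by machine. The following is only a small sketch: the helper names `evaluate` and `is_zero_polynomial`, and the convention of listing coefficients from the constant term upward, are assumptions made for this illustration and are not notation from the text.

```python
def evaluate(coeffs, a, p):
    """Evaluate a polynomial at a in Z/p by Horner's rule.

    coeffs lists the coefficients from the constant term upward,
    so [0, -1, 0, 1] stands for x^3 - x (a convention chosen here).
    """
    value = 0
    for c in reversed(coeffs):
        value = (value * a + c) % p
    return value


def is_zero_polynomial(coeffs, p):
    """A polynomial is the zero polynomial only if every coefficient is 0 in Z/p."""
    return all(c % p == 0 for c in coeffs)


cubic = [0, -1, 0, 1]                       # x^3 - x, viewed in (Z/3)[x]
values = [evaluate(cubic, a, 3) for a in range(3)]
print(values)                               # values of the attached function on Z/3
print(is_zero_polynomial(cubic, 3))         # whether the polynomial itself is zero
```

The list of values consists entirely of zeros, while the coefficient test reports that the polynomial is not zero, which is exactly the phenomenon described in the remark.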
18.4 Euclidean algorithm for polynomials
In a polynomial ring $k[x]$ with $k$ a field, there is a Division Algorithm and (therefore) there is a Euclidean Algorithm nearly identical in form to the analogous algorithms in the ordinary integers $\mathbf{Z}$.
The division algorithm is just the usual division of one polynomial by another, with remainder, as we all learned in high school or earlier. It takes just a moment's reflection to see that the procedure we all learned does not depend upon the nature of the field that the coefficients are in, and that the degree of the remainder is indeed less than the degree of the divisor!
For example: let's reduce $x^3 + 1$ modulo $x^2 + 1$:
$$(x^3 + 1) - x \cdot (x^2 + 1) = -x + 1$$
We're done with the reduction because the degree of $-x + 1$ is (strictly) less than the degree of $x^2 + 1$. Reduce $x^5 + 1$ modulo $x^2 + 1$, in stages:
$$(x^5 + 1) - x^3 \cdot (x^2 + 1) = -x^3 + 1$$
$$(-x^3 + 1) + x \cdot (x^2 + 1) = x + 1$$
which, summarized, gives the reduction
$$(x^5 + 1) - (x^3 - x) \cdot (x^2 + 1) = x + 1$$
Next, since the division algorithm works for polynomials with coefficients in a field, it is merely a corollary that we have a `Euclidean algorithm'! If we think about it, the crucial thing in having the Euclidean algorithm work was that the division algorithm gave us progressively smaller numbers at each step. (And, indeed, each step of the Euclidean Algorithm is just the Division Algorithm!)
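The division and Euclidean algorithms for polynomials can be sketched in a few lines for coefficients in $\mathbf{Z}/p$. This is an illustration under stated assumptions, not a definitive implementation: the function names `trim`, `poly_divmod` and `poly_gcd` are invented here, polynomials are stored as coefficient lists from the constant term upward, $p$ is assumed prime, and `pow(a, -1, p)` (modular inverse) needs Python 3.8 or later.

```python
def trim(f):
    """Drop zero coefficients of highest degree; the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(f, g, p):
    """Return (q, r) with f = q*g + r and deg r < deg g, coefficients in Z/p."""
    f = trim(c % p for c in f)
    g = trim(c % p for c in g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(g[-1], -1, p)            # inverse of the leading coefficient of g
    q = [0] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)
        factor = (r[-1] * inv_lead) % p     # kills the current leading term of r
        q[shift] = factor
        for i, c in enumerate(g):
            r[i + shift] = (r[i + shift] - factor * c) % p
        r = trim(r)
    return trim(q), r


def poly_gcd(f, g, p):
    """Monic gcd of f and g (not both zero) by repeated division with remainder."""
    f = trim(c % p for c in f)
    g = trim(c % p for c in g)
    while g:
        f, g = g, poly_divmod(f, g, p)[1]
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]


# Reduce x^5 + 1 modulo x^2 + 1 with coefficients in Z/7.
quotient, remainder = poly_divmod([1, 0, 0, 0, 0, 1], [1, 0, 1], 7)
print(quotient, remainder)
```

With coefficients in $\mathbf{Z}/7$ the quotient comes out as $x^3 + 6x$, which is $x^3 - x$ since $6 = -1$, and the remainder as $x + 1$, matching the reduction worked by hand above.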
18.5 Euclidean rings
Based on our most important examples of rings with a Division Algorithm and therefore with a Euclidean Algorithm, the ordinary integers and polynomials over a field, we now abstract the crucial property which makes this work. The goal is only to prove that Euclidean rings have the Unique Factorization property.
This whole line of argument applies to the ordinary integers as well, so we finally will have proven what we perhaps had been taking for granted all along, namely that the ordinary integers really do have unique factorizations into primes.
An absolute value on a commutative ring $R$ is a function, usually denoted $|r|$, of elements $r \in R$, taking non-negative real values and having the properties
Multiplicativity: For all $r, s \in R$ we have $|rs| = |r| \cdot |s|$.
Triangle inequality: For all $r, s \in R$ we have $|r + s| \le |r| + |s|$.
Positivity: If $|r| = 0$ then $r = 0$.
If an absolute value on a ring $R$ has the property that any non-empty subset $S$ of $R$ has an element of least positive absolute value (among the collection of absolute values of elements of $S$), then we say that the absolute value is discrete.
A commutative ring $R$ with unit is Euclidean if there is a discrete absolute value on it, denoted $|r|$, so that for any $x \in R$ and for any $0 \neq y \in R$ there are $q, r \in R$ so that
$$x = yq + r \quad \text{with} \quad |r| < |y|$$
The idea is that we can divide and get a remainder strictly smaller than the divisor.
The hypothesis that the absolute value be discrete in the above sense is critical. Sometimes it is easy to see that this requirement is fulfilled. For example, if $|\cdot|$ is integer-valued, as is the case with the usual absolute value on the ordinary integers, then the usual Well-Ordering Principle assures the `discreteness'.
The most important examples of Euclidean rings are the ordinary integers $\mathbf{Z}$ and any polynomial ring $k[x]$ where $k$ is a field. The absolute value in $\mathbf{Z}$ is just the usual one, while we have to be a tiny bit creative in the case of polynomials, and define
$$|P(x)| = 2^{\text{degree } P}$$
And $|0| = 0$. Here the number $2$ could be replaced by any other number bigger than $1$, and the absolute value obtained would work just as well.
A Euclidean ring $R$ is an integral domain.
Proof: We must show that $R$ has no zero-divisors, that is, we must show that if $xy = 0$ then either $x$ or $y$ is $0$. Well, if $xy = 0$ then
$$0 = |0| = |xy| = |x| \cdot |y|$$
by the multiplicative property of the norm. Now $|x|$ and $|y|$ are non-negative real numbers, so for their product to be $0$ one or the other of $|x|$ and $|y|$ must be $0$. And then by the positivity property of the norm it must be that one of $x, y$ themselves is $0$, as claimed.
∎
In a Euclidean ring $R$, if $r \in R$ has $|r| < 1$ then $r = 0$.
Proof: Since $|ab| = |a| \cdot |b|$, we have $|r^n| = |r|^n$. If $r \neq 0$, then $0 < |r| < 1$ and the powers $|r^n|$ form a set of values of the absolute value which have no least value: they form a decreasing sequence with limit $0$, but the sequence does not contain $0$. This contradicts discreteness. Thus, $r = 0$.
∎
In a Euclidean ring $R$, an element $u \in R$ is a unit (that is, has a multiplicative inverse) if and only if $|u| = 1$. In particular, $|1| = 1$.
Proof: First, since $1 \cdot 1 = 1$, by taking absolute values and using the multiplicative property of the absolute value, we have
$$|1| = |1 \cdot 1| = |1| \cdot |1|$$
The only real numbers $z$ with the property that $z = z^2$ are $0$ and $1$, and since $1 \neq 0$ we have $|1| \neq 0$, so necessarily $|1| = 1$. If $uv = 1$, then, by taking absolute values, we have $1 = |uv| = |u| \cdot |v|$. Since the only ring element with absolute value strictly smaller than $1$ is $0$ (from just above), we conclude that both $|u|$ and $|v|$ are at least $1$. Therefore, since their product is $1$, they must both be $1$. So the absolute value of a unit is $1$. On the other hand, suppose $|u| = 1$. Then, applying the division/reduction algorithm, we reduce $1$ itself to get
$$1 = q \cdot u + r$$
with $|r| < |u|$. Since $|u| = 1$, $|r| < 1$. But from above we know that this implies $r = 0$. So $1 = qu$, and $q$ is the multiplicative inverse to $u$. So anything with absolute value $1$ is a unit.
∎
Theorem: For $x, y$ in a Euclidean ring $R$, an element of the form $sx + ty$ (for $s, t \in R$) with smallest non-zero absolute value is a gcd of $x, y$.
Proof: The discreteness hypothesis on the absolute value assures that among the non-zero values $|sx + ty|$ there is at least one which is minimal. Let $g = sx + ty$ be such. We must show that $g \mid x$ and $g \mid y$. Using the division/reduction algorithm, we have
$$x = q(sx + ty) + r$$
with $|r| < |sx + ty|$. Rearranging the equation, we obtain
$$r = (1 - qs)x + (-qt)y$$
So $r$ itself is of the form $s'x + t'y$ with $s', t' \in R$. Since $sx + ty$ had the smallest non-zero absolute value of any such thing, and $|r| < |sx + ty|$, it must be that $r = 0$. So $sx + ty$ divides $x$. Similarly, $sx + ty$ must divide $y$. This proves that $sx + ty$ is a divisor of both $x$ and $y$. On the other hand, if $d \mid x$ and $d \mid y$, then certainly $d \mid sx + ty$.
∎
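For the ordinary integers the theorem can be illustrated numerically. The sketch below is an illustration only: the function name `extended_gcd`, the sample values 240 and 46, and the search window of 50 are all choices made here, not taken from the text. It finds $s, t$ with $sx + ty = \gcd(x, y)$ by the usual back-substitution through the Euclidean algorithm, and then compares with a brute-force search for the smallest non-zero $|sx + ty|$ over a small window.

```python
def extended_gcd(x, y):
    """Return (g, s, t) with g = s*x + t*y, where g >= 0 is a gcd of x and y."""
    old_r, r = x, y
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # Each step is one division with remainder; the invariant
        # old_r = old_s*x + old_t*y (and likewise for r) is preserved.
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:                           # normalize the sign of the gcd
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


x, y = 240, 46
g, s, t = extended_gcd(x, y)

# Brute force: smallest non-zero |a*x + b*y| with a, b in a small window.
window = range(-50, 51)
smallest = min(abs(a * x + b * y)
               for a in window for b in window
               if a * x + b * y != 0)
print(g, s * x + t * y, smallest)
```

All three printed numbers agree: the gcd, the combination $sx + ty$ produced by the algorithm, and the smallest non-zero value found by brute force.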
Proposition: In a Euclidean ring $R$, an element $p \in R$ is prime if $d \mid p$ implies $|d| = 1$ or $|d| = |p|$. That is, a proper divisor $d$ of $r \in R$ has the property that $1 < |d| < |r|$.
Proof: Recall that the definition of a prime element $p$ in a commutative ring $R$ is that if $ab = p$ then either $a$ or $b$ is a unit. To prove both statements of the proposition, it suffices to prove that if $ab = n$ with neither $a$ nor $b$ a unit, then $1 < |a| < |n|$. On one hand, if $1 < |a| < |n|$ then, because $|n| = |ab| = |a||b|$, also $|n| > |b| > 1$. Thus, since the units of $R$ are exactly those elements with absolute value $1$, neither $a$ nor $b$ is a unit. On the other hand, if $ab = n$ and neither $a$ nor $b$ is a unit, then $1 < |a|$ and $1 < |b|$. Since $|n| = |ab| = |a| \cdot |b|$, it follows that also $|a| < |n|$ and $|b| < |n|$.
∎
Key Lemma: Let $p$ be a prime element in a Euclidean ring $R$. If $p \mid ab$ then $p \mid a$ or $p \mid b$. Generally, if a prime $p$ divides a product $a_1 \ldots a_n$ then $p$ must divide one of the factors $a_i$.
Proof: It suffices to prove that if $p \mid ab$ and $p \nmid a$ then $p \mid b$. Since $p \nmid a$, and since $p$ is prime, the gcd of $p$ and $a$ is just $1$. Therefore, there are $s, t \in R$ so that
$$1 = sa + tp$$
Then
$$b = b \cdot 1 = b \cdot (sa + tp) = s(ab) + (bt)p$$
Since $p \mid ab$, surely $p$ divides the right-hand side. Therefore, $p \mid b$, as claimed.
Generally, if $p$ divides $a_1 \ldots a_n$, rewrite this as $(a_1)(a_2 \ldots a_n)$. By the first part, either $p \mid a_1$ or $p \mid a_2 \ldots a_n$. In the former case we're done. In the latter case, we continue: rewrite $a_2 \ldots a_n = (a_2)(a_3 \ldots a_n)$. So either $p \mid a_2$ or $p \mid a_3 \ldots a_n$. Continuing (induction!), we find that $p$ divides at least one of the factors $a_i$.
∎
Theorem: In a Euclidean ring $R$, every non-zero element $r \in R$ can be factored into primes as
$$r = u \, p_1^{e_1} \ldots p_m^{e_m}$$
where $u$ is a unit, the $p_i$ are distinct primes, and the $e_i$ are positive integers. If
$$r = v \, q_1^{f_1} \ldots q_n^{f_n}$$
is another such factorization, with unit $v$ and primes $q_i$, then $m = n$, and we can reorder and relabel the $q_i$'s so that
$$q_i = p_i \cdot u_i$$
for some unit $u_i$, for all indices $i$. And $e_i = f_i$. That is, the factorization into primes is essentially unique.
Proof: First we prove the existence of factorizations into primes. Suppose that some non-zero $r \in R$ did not have a factorization. Then, invoking the discreteness, there is an $r \in R$ without a factorization and with $|r|$ smallest among all elements lacking a prime factorization. If $r$ is prime, then of course it has a factorization, so such $r$ can't be prime. But then $r$ has a proper factorization $r = ab$. Just above, we saw that this means that $1 < |a| < |r|$ and $1 < |b| < |r|$. Since $|a| < |r|$ and $|b| < |r|$, by the minimality of $|r|$ it must be that both $a$ and $b$ have prime factorizations. Then a prime factorization of $r$ would be obtained by multiplying together the prime factorizations for $a$ and $b$. (The product of two units is again a unit!)
Now we prove uniqueness of the factorization. Suppose that
$$r = u \, p_1^{e_1} \ldots p_m^{e_m}$$
and also
$$r = v \, q_1^{f_1} \ldots q_n^{f_n}$$
with primes $p_i$ and $q_i$. By induction, we could assume that $m$ is the smallest integer for which there is a different factorization. Since $p_1$ divides $v \, q_1^{f_1} \ldots q_n^{f_n}$ and $p_1$ is prime, by the Key Lemma above $p_1$ must divide one of the $q_i$. By relabelling the $q_i$'s, we may suppose that $p_1 \mid q_1$. Since these are both prime, they must differ by a unit, that is, there is a unit $u_1$ so that $q_1 = u_1 p_1$. Replacing $q_1$ by $u_1 p_1$, we get
$$u \, p_1^{e_1} \ldots p_m^{e_m} = v \, u_1^{f_1} p_1^{f_1} q_2^{f_2} q_3^{f_3} \ldots q_n^{f_n}$$
Note that $v u_1^{f_1}$ is still a unit. Since $e_1 \ge 1$ and $f_1 \ge 1$, we can cancel at least one factor of $p_1$ from both sides. (We have already proven that a Euclidean ring is an integral domain.)
But by induction, since we assumed that $m$ was the smallest integer occurring in an expression of some $r \in R$ in two different ways, after removing the common factor of $p_1$ the remaining factorizations must be essentially the same. (That is, after adjusting the primes by units if necessary, they and their exponents all match.)
∎
A person might notice that we didn't use the triangle inequality at all in these proofs. That is indeed
so, but in practice anything which is a reasonable candidate for an `absolute value' in the axiomatic sense
suggests itself mostly because it does behave like an absolute value in a more down-to-earth sense, which
includes a triangle inequality.
#18.145 Let $k[x]$ be the polynomial ring in one variable $x$ over the field $k$. What is the group of units $k[x]^\times$?
#18.146 Find the greatest common divisor of $x^5 + x^4 + x^3 + x^2 + x + 1$ and $x^4 + x^2 + 1$, viewed as elements in the ring $\mathbf{Q}[x]$ of polynomials over $\mathbf{Q}$.
#18.147 Find the greatest common divisor of $x^6 + x^3 + 1$ and $x^2 + x + 1$, viewed as elements in the ring $k[x]$ of polynomials over the finite field $k = \mathbf{Z}/3$ with 3 elements.
#18.148 Find the greatest common divisor of $x^6 + x^4 + x^2 + 1$ and $x^8 + x^6 + x^4 + x^2 + 1$, viewed as elements in the ring $k[x]$ of polynomials over the finite field $k = \mathbf{Z}/2$ with 2 elements.
#18.149 Find the greatest common divisor of $x^4 + 5x^3 + 6x^2 + 5x + 1$ and $x^4 + 1$, viewed as having coefficients in $\mathbf{Z}/7$.
#18.150 Find the greatest common divisor of $x^4 + 2x^2 + x + 2$ and $x^4 + 1$, viewed as having coefficients in $\mathbf{Z}/3$.
#18.151 Even though $x^6 + 3x^5 + 3x^4 + x^3 + 3x^2 + 3x + 4$ has no roots in $\mathbf{Z}/5$, it has a repeated factor. Find it.
#18.152 Even though $x^6 + 4x^5 + 6x^4 + 3x^3 + 2x + 4$ has no roots in $\mathbf{Z}/7$, it has a repeated factor. Find it.
#18.153 Let $u$ be a unit in a commutative ring $R$. Show that no non-unit in $R$ can divide $u$.
#18.154 Show that in a ring if $x = yu$ with a unit $u$ then $y = xu'$ for some unit $u'$.
19. Cyclotomic polynomials
Characteristics of fields
Multiple factors in polynomials
Cyclotomic polynomials
Primitive roots in finite fields
19.1 Characteristics of fields
Let $k$ be a field. The characteristic $\mathrm{char}\, k$ of $k$ is the smallest positive integer $n$ (if there is one) so that
$$\underbrace{1_k + 1_k + \ldots + 1_k}_{n} = 0_k$$
where $1_k$ is the unit in $k$ and $0_k$ is the zero. As usual, we abbreviate
$$\ell \cdot 1_k = \underbrace{1_k + 1_k + \ldots + 1_k}_{\ell}$$
for positive integers $\ell$.
If there is no such positive integer $n$, then the characteristic is said to be $0$. Thus,
$$\mathrm{char}\, \mathbf{Q} = 0$$
By contrast,
$$\mathrm{char}\, \mathbf{Z}/p = p$$
Proposition: The characteristic of a field is a prime number, if it is non-zero. For a field $k$ of characteristic $p$ with $p$ prime, if for some positive integer $n$
$$\underbrace{1_k + 1_k + \ldots + 1_k}_{n} = 0_k$$
then $p$ divides $n$.
Proof: Suppose that
$$\underbrace{1_k + 1_k + \ldots + 1_k}_{n} = 0_k$$
with $n$ minimal to achieve this effect, and that $n$ had a factorization
$$n = a \cdot b$$
with positive integers $a$ and $b$. Then
$$(\underbrace{1_k + 1_k + \ldots + 1_k}_{a}) \cdot (\underbrace{1_k + 1_k + \ldots + 1_k}_{b}) = \underbrace{1_k + 1_k + \ldots + 1_k}_{n} = 0_k$$
Since a field has no proper zero-divisors, it must be that either $a \cdot 1_k = 0$ or $b \cdot 1_k = 0$. By the hypothesis that $n$ was minimal, if $a \cdot 1_k = 0$ then $a = n$, and similarly for $b$. Thus, the factorization $n = a \cdot b$ was not proper. Since $n$ has no proper factorization, it is prime.
Suppose that $n \cdot 1_k = 0_k$. By the division algorithm, we have $n = qp + r$ with $0 \le r < p$. Then
$$0_k = n \cdot 1_k = q(p \cdot 1_k) + r \cdot 1_k = 0_k + r \cdot 1_k$$
From this, $r \cdot 1_k = 0_k$. Since $r < p$ and $p$ was the least positive integer with $p \cdot 1_k = 0_k$, it follows that $r = 0$ and $p$ divides $n$.
∎
Fields with positive characteristic have a peculiarity which is at first counter-intuitive, but which plays an important role in both theory and applications:
Proposition: Let $k$ be a field of positive characteristic $p$. Then for any polynomial
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_2 x^2 + a_1 x + a_0$$
in $k[x]$ we have
$$f(x)^p = a_n^p x^{pn} + a_{n-1}^p x^{p(n-1)} + \ldots + a_2^p x^{2p} + a_1^p x^p + a_0^p$$
Proof: Recall that $p$ divides binomial coefficients $\binom{p}{i}$ with $0 < i < p$. Therefore, for $0 < i < p$,
$$\binom{p}{i} \cdot 1_k = 0_k$$
Thus, for $a_n \in k$ and any polynomial $g(x)$ with coefficients in $k$,
$$(a_n x^n + g(x))^p = (a_n x^n)^p + \sum_{0 < i < p} \binom{p}{i} (a_n x^n)^{p-i} g(x)^i + g(x)^p$$
All the middle terms have a coefficient
$$\binom{p}{i} \cdot 1_k = 0_k$$
so they disappear. Thus,
$$(a_n x^n + g(x))^p = a_n^p x^{pn} + g(x)^p$$
The same assertion applies to $g(x)$ itself. Take
$$g(x) = a_{n-1} x^{n-1} + h(x)$$
Then
$$g(x)^p = a_{n-1}^p x^{p(n-1)} + h(x)^p$$
Continuing (that is, doing an induction), we obtain the result for $f$.
∎
For example, with coefficients in $k = \mathbf{Z}/p$ with $p$ prime, we have
$$(x + 1)^p = x^p + \sum_{0 < i < p} \binom{p}{i} x^i + 1 = x^p + 1$$
Also
$$(x^2 + 1)^p = x^{2p} + 1$$
$$(x^2 + x + 1)^p = x^{2p} + x^p + 1$$
and such things.
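These identities are easy to test by machine for a particular prime. The following sketch takes $p = 5$ as an arbitrary sample. The helper names `poly_mul` and `poly_pow` and the coefficient-list convention (constant term first) are assumptions of this illustration, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def poly_mul(f, g, p):
    """Product of two polynomials with coefficients reduced modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def poly_pow(f, e, p):
    """e-th power of f by repeated multiplication (fine for small e)."""
    result = [1]
    for _ in range(e):
        result = poly_mul(result, f, p)
    return result


p = 5
# The middle binomial coefficients C(p, i), 0 < i < p, all vanish modulo p.
middle = [comb(p, i) % p for i in range(1, p)]

lhs = poly_pow([1, 1, 1], p, p)             # (x^2 + x + 1)^p in (Z/p)[x]
rhs = [0] * (2 * p + 1)
rhs[0] = rhs[p] = rhs[2 * p] = 1            # x^(2p) + x^p + 1
print(middle, lhs == rhs)
```

The same check can be repeated with any other prime in place of 5; with a composite modulus the middle binomial coefficients need not all vanish, and the identity generally fails.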
19.2 Multiple factors in polynomials
There is a very simple device to detect repeated occurrence of a factor in a polynomial (with coefficients in a field). This is very useful both theoretically and in computational situations.
Let $k$ be a field. For a polynomial
$$f(x) = c_n x^n + \ldots + c_1 x + c_0$$
with coefficients $c_i$ in $k$, we define
$$f'(x) = n c_n x^{n-1} + (n-1) c_{n-1} x^{n-2} + \ldots + 3 c_3 x^2 + 2 c_2 x + c_1$$
Remark: Note that we simply define a ``derivative'' this way, purely algebraically, without taking any limits. Of course (!) this formula is still supposed to yield a thing with familiar properties, such as the product rule. So we've simply used our calculus experience to make a ``good guess''.
Lemma: For two polynomials $f, g$ in the ring $k[x]$ of polynomials in $x$ with coefficients in $k$, and for $r \in k$,
$$(r \cdot f)' = r \cdot f'$$
$$(f + g)' = f' + g'$$
$$(fg)' = f'g + fg'$$
Proof: The first assertion is easy: let $f(x) = a_m x^m + \ldots + a_0$, and compute
$$(r \cdot (a_m x^m + \ldots + a_0))' = (r a_m x^m + r a_{m-1} x^{m-1} + \ldots + r a_0)'$$
$$= m \cdot (r a_m) x^{m-1} + (m-1) \cdot (r a_{m-1}) x^{m-2} + \ldots + r a_1 + 0$$
$$= r (m \cdot a_m x^{m-1} + (m-1) \cdot a_{m-1} x^{m-2} + \ldots + a_1 + 0) = r \cdot f'(x)$$
The second assertion is also not hard: let $f(x) = a_m x^m + \ldots + a_0$ and $g(x) = b_n x^n + \ldots + b_0$. Padding the one of smaller degree with terms of the form $0 \cdot x^\ell$, we can suppose without loss of generality that $m = n$. (This simplifies notation considerably!) Then
$$(f(x) + g(x))' = ((a_n + b_n) x^n + \ldots + (a_1 + b_1) x + (a_0 + b_0) x^0)'$$
$$= n (a_n + b_n) x^{n-1} + (n-1)(a_{n-1} + b_{n-1}) x^{n-2} + \ldots + 1 \cdot (a_1 + b_1) x^0 + 0 \cdot x^0$$
$$= \left( n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + \ldots + 1 \cdot a_1 x^0 \right) + \left( n b_n x^{n-1} + (n-1) b_{n-1} x^{n-2} + \ldots + 1 \cdot b_1 x^0 \right)$$
$$= f'(x) + g'(x)$$
For the third property, let's first see what happens when $f$ and $g$ are monomials, that is, are simply $f(x) = a x^m$, $g(x) = b x^n$. On one hand, we have
$$(fg)' = (a x^m \cdot b x^n)' = (ab \, x^{m+n})' = ab(m+n) x^{m+n-1}$$
On the other hand,
$$f'g + fg' = a m x^{m-1} \cdot b x^n + a x^m \cdot b n x^{n-1} = ab(m+n) x^{m+n-1}$$
after simplifying. This proves the product rule for monomials.
To approach the general product rule, let
$$f(x) = a_m x^m + \ldots + a_0$$
$$g(x) = b_n x^n + \ldots + b_0$$
The coefficient of $x^\ell$ in the product $f(x)g(x)$ is
$$\sum_{i+j=\ell} a_i b_j$$
Then the coefficient of $x^{\ell-1}$ in the derivative of the product is
$$\ell \sum_{i+j=\ell} a_i b_j$$
On the other hand, the coefficient of $x^{\ell-1}$ in $f'g$ is
$$\sum_{i+j=\ell} (i a_i) \cdot b_j$$
and the coefficient of $x^{\ell-1}$ in $fg'$ is
$$\sum_{i+j=\ell} a_i \cdot (j b_j)$$
Adding these two together, we find that the coefficient of $x^{\ell-1}$ in $f'g + fg'$ is
$$\sum_{i+j=\ell} a_i b_j (i + j) = \ell \sum_{i+j=\ell} a_i b_j$$
which matches the coefficient in $(fg)'$. This proves the product rule.
∎
Proposition: Let $f$ be a polynomial with coefficients in a field $k$. Let $P$ be an irreducible polynomial with coefficients in $k$. Then $P^2$ divides $f$ if and only if $P$ divides $\gcd(f, f')$.
Proof: On one hand, suppose $f = P^2 \cdot g$. Then, using the product rule,
$$f' = 2PP' \cdot g + P^2 \cdot g' = P \cdot (2P'g + Pg')$$
which is certainly a multiple of $P$. This half of the argument did not use the irreducibility of $P$.
On the other hand, suppose that $P$ divides both $f$ and $f'$ (and show that actually $P^2$ divides $f$). Dividing $f/P$ by $P$, we obtain
$$f/P = Q \cdot P + R$$
with the degree of $R$ less than that of $P$. Then $f = QP^2 + RP$. Taking the derivative, we have
$$f' = Q'P^2 + 2QPP' + R'P + RP'$$
By hypothesis $P$ divides $f'$. All the terms on the right-hand side except possibly $RP'$ are divisible by $P$, so $P$ divides $RP'$. Since $P$ is irreducible and it divides the product $RP'$, it must divide either $R$ or $P'$. If it divides $R$, then, since the degree of $R$ is less than that of $P$, we have $R = 0$, so $P^2$ divides $f$ and we're done.
If $P$ fails to divide $R$ then $P$ must divide $P'$. Since $P'$ is of lower degree than $P$, if $P$ divides it then $P'$ must be the zero polynomial. Let's see that this is impossible for $P$ irreducible. Let
$$P(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_2 x^2 + a_1 x + a_0$$
Then
$$P'(x) = n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + \ldots + 2 a_2 x^1 + a_1 + 0$$
For this to be the zero polynomial it must be that
$$\ell \cdot a_\ell = 0$$
for all indices $\ell$. That is, for any index $\ell$ with $a_\ell \neq 0$ it must be that $\ell \cdot 1_k = 0_k$. Since at least one coefficient of $P$ in positive degree is non-zero, this implies that the characteristic of $k$ is not $0$, so from above is some prime $p$. From above, $\ell \cdot 1_k = 0_k$ implies that $p$ divides $\ell$. That is, the characteristic $p$ divides $\ell$ if the coefficient $a_\ell$ is non-zero. So we can write
$$P(x) = a_{pm} x^{pm} + a_{p(m-1)} x^{p(m-1)} + a_{p(m-2)} x^{p(m-2)} + \ldots + a_{2p} x^{2p} + a_p x^p + a_0$$
From above, when $k = \mathbf{Z}/p$ (so that $a^p = a$ for every $a \in k$, by Fermat's Little Theorem) we recognize this as the $p$th power of
$$a_{pm} x^m + a_{p(m-1)} x^{m-1} + a_{p(m-2)} x^{m-2} + \ldots + a_{2p} x^2 + a_p x + a_0$$
(For a general finite field $k$ the same conclusion holds after replacing each coefficient by its $p$th root, which exists because $a \mapsto a^p$ is an injective map of the finite set $k$ to itself.) But if $P$ is a $p$th power it is certainly not irreducible. Therefore, for $P$ irreducible it cannot be that $P'$ is the zero polynomial. Therefore, above it must have been that $R = 0$, which is to say that $P^2$ divides $f$, as claimed.
∎
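The criterion gives a practical test for repeated factors: compute $\gcd(f, f')$ by the Euclidean algorithm. The sketch below works with coefficients in $\mathbf{Z}/p$ and is only an illustration. The function names, the coefficient-list convention (constant term first), and the sample polynomial $f = (x^2+1)^2 (x+1)$ over $\mathbf{Z}/3$ are choices made here, not examples from the text, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
def trim(f):
    """Drop zero coefficients of highest degree; the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_mod(f, g, p):
    """Remainder of f on division by a non-zero g, coefficients in Z/p."""
    r, g = trim(c % p for c in f), trim(c % p for c in g)
    inv_lead = pow(g[-1], -1, p)
    while len(r) >= len(g):
        shift, factor = len(r) - len(g), (r[-1] * inv_lead) % p
        for i, c in enumerate(g):
            r[i + shift] = (r[i + shift] - factor * c) % p
        r = trim(r)
    return r


def poly_gcd(f, g, p):
    """Monic gcd of f and g (assumed not both zero) by the Euclidean algorithm."""
    f, g = trim(c % p for c in f), trim(c % p for c in g)
    while g:
        f, g = g, poly_mod(f, g, p)
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]


def derivative(f, p):
    """Formal derivative: the coefficient of x^(i-1) is i * c_i."""
    return trim([(i * c) % p for i, c in enumerate(f)][1:])


p = 3
f = [1, 1, 2, 2, 1, 1]      # x^5 + x^4 + 2x^3 + 2x^2 + x + 1 = (x^2 + 1)^2 (x + 1)
repeated = poly_gcd(f, derivative(f, p), p)
print(repeated)             # monic gcd of f and f'
```

Here the gcd comes out as $x^2 + 1$, the factor that occurs twice, while for a polynomial without repeated factors such as $x^2 + 1$ itself the gcd with its derivative is $1$.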
19.3 Cyclotomic polynomials
For $b$ in a field $k$, the exponent of $b$ is the smallest positive integer $n$ (if it exists at all) so that $b^n = 1$. That is, $b^n = 1$ but $b^d \neq 1$ for $0 < d < n$. In other words, $b$ is a root of the polynomial $x^n - 1$ but not of $x^d - 1$ for any smaller $d$. What we'll do here is describe the polynomial $\varphi_n$, the $n$th cyclotomic polynomial, of which $b$ must be a root in order to have exponent $n$.
Fix a field $k$, and an integer $n$ not divisible by the characteristic of $k$. (If the characteristic is $0$ then this is no condition at all.)
Lemma: For $m$, $n$ two positive integers (divisible by the characteristic or not)
$$\gcd(x^m - 1, x^n - 1) = x^{\gcd(m,n)} - 1$$
Also, $\mathrm{lcm}(x^m - 1, x^n - 1)$ divides $x^{\mathrm{lcm}(m,n)} - 1$.
Proof: We do induction on the maximum of $m$ and $n$. First, if by chance $m = n$, then $x^m - 1 = x^n - 1$ and we are certainly done. Second, if $m > n$, doing a fragment of a division, we have
$$x^m - 1 - x^{m-n} \cdot (x^n - 1) = x^{m-n} - 1$$
So if $D$ is a polynomial dividing both $x^m - 1$ and $x^n - 1$ then $D$ divides $x^{m-n} - 1$ as well. By induction,
$$\gcd(x^{m-n} - 1, x^n - 1) = x^{\gcd(m-n,n)} - 1$$
But
$$\gcd(m, n) = \gcd(m - n, n)$$
and
$$x^m - 1 = x^{m-n} \cdot (x^n - 1) + x^{m-n} - 1$$
so
$$\gcd(x^m - 1, x^n - 1) = \gcd(x^{m-n} - 1, x^n - 1)$$
If $m < n$ we reverse the roles of $m$ and $n$: let's repeat the argument. Doing a fragment of a division:
$$x^n - 1 - x^{n-m} \cdot (x^m - 1) = x^{n-m} - 1$$
So if $D$ is a polynomial dividing both $x^m - 1$ and $x^n - 1$ then $D$ divides $x^{n-m} - 1$ as well. By induction,
$$\gcd(x^{n-m} - 1, x^m - 1) = x^{\gcd(n-m,m)} - 1$$
But
$$\gcd(m, n) = \gcd(n - m, m)$$
and
$$x^n - 1 = x^{n-m} \cdot (x^m - 1) + x^{n-m} - 1$$
so
$$\gcd(x^m - 1, x^n - 1) = \gcd(x^{n-m} - 1, x^m - 1)$$
This completes the induction step. (The assertion about the least common multiple follows from this discussion: $x^d - 1$ divides $x^N - 1$ whenever $d$ divides $N$, so both $x^m - 1$ and $x^n - 1$, and hence their least common multiple, divide $x^{\mathrm{lcm}(m,n)} - 1$.)
∎
Lemma: Let $n$ be a positive integer not divisible by the characteristic of the field $k$. Then the polynomial $x^n - 1$ has no repeated factors.
Proof: From above, it suffices to check that the gcd of $x^n - 1$ and its derivative $n x^{n-1}$ is $1$. Since the characteristic of the field does not divide $n$, $n \cdot 1_k$ has a multiplicative inverse $t$ in $k$. Then, doing a division with remainder,
$$(x^n - 1) - (tx) \cdot (n x^{n-1}) = -1$$
Thus, the gcd is $1$.
∎
Now suppose that $n$ is not divisible by the characteristic of the field $k$, and define the $n$th cyclotomic polynomial $\varphi_n(x)$ (with coefficients in $k$) by
$$\varphi_n(x) = \frac{x^n - 1}{\text{lcm of all } x^d - 1 \text{ with } 0 < d < n,\ d \text{ dividing } n}$$
where the least common multiple is taken to be monic.
Theorem: Let $m \neq n$ be positive integers neither of which is divisible by the characteristic of the field $k$. Then
$\varphi_n$ is monic.
$\gcd(\varphi_m, \varphi_n) = 1$.
The degree of $\varphi_n$ is $\varphi(n)$ (Euler's phi-function).
There is a more efficient description of $\varphi_n(x)$:
$$\varphi_n(x) = \frac{x^n - 1}{\prod_{1 \le d < n,\ d \mid n} \varphi_d(x)}$$
The polynomial $x^n - 1$ factors as
$$x^n - 1 = \prod_{1 \le d \le n,\ d \mid n} \varphi_d(x)$$
Proof: First, we really should check that the least common multiple of the $x^d - 1$ with $d < n$ and $d \mid n$ divides $x^n - 1$. We know that $d \mid n$ (and $d > 0$) implies that $x^d - 1$ divides $x^n - 1$ (either by high school algebra or from the lemma above). Therefore, using unique factorization of polynomials with coefficients in a field, it follows that the least common multiple of a collection of things each dividing $x^n - 1$ will also divide $x^n - 1$.
Next, the assertion that $\varphi_n$ is monic follows from its definition, since it is the quotient of the monic polynomial $x^n - 1$ by the monic lcm of polynomials.
Next, to determine the gcd of $\varphi_m$ and $\varphi_n$, first observe that $\varphi_m$ divides $x^m - 1$ and $\varphi_n$ divides $x^n - 1$, so
$$\gcd(\varphi_m, \varphi_n) \text{ divides } \gcd(x^m - 1, x^n - 1)$$
In the lemma above we computed that
$$\gcd(x^m - 1, x^n - 1) = x^{\gcd(m,n)} - 1$$
Since $m \neq n$, $\gcd(m,n)$ is a proper divisor of at least one of $m$, $n$; by symmetry we may suppose that $\gcd(m,n) < n$. Then from its definition, $\varphi_n$ divides
$$\frac{x^n - 1}{x^{\gcd(m,n)} - 1}$$
so $\gcd(\varphi_m, \varphi_n)$ also divides this. Since $n$ is not divisible by the characteristic, the lemma above shows that $x^n - 1$ has no repeated factors. Therefore, from the fact that $\gcd(\varphi_m, \varphi_n)$ divides $x^{\gcd(m,n)} - 1$ and also divides $(x^n - 1)/(x^{\gcd(m,n)} - 1)$ we conclude that $\gcd(\varphi_m, \varphi_n) = 1$.
Next, we use induction to prove that
$$x^n - 1 = \prod_{1 \le d \le n,\ d \mid n} \varphi_d(x)$$
For $n = 1$ the assertion is true. From the definition of $\varphi_n$, we have
$$x^n - 1 = \varphi_n(x) \cdot \mathrm{lcm}\{x^d - 1 : d \mid n,\ 0 < d < n\}$$
By induction, for $d < n$
$$x^d - 1 = \prod_{0 < e \le d,\ e \mid d} \varphi_e(x)$$
Since we have already shown that for $m \neq n$ the gcd of $\varphi_m$ and $\varphi_n$ is $1$, we have
$$\mathrm{lcm}\{x^d - 1 : d \mid n,\ 0 < d < n\} = \prod_{d \mid n,\ d < n} \varphi_d(x)$$
Thus,
$$x^n - 1 = \varphi_n(x) \cdot \prod_{d \mid n,\ d < n} \varphi_d(x)$$
as claimed.
The assertion about the degree of $\varphi_n$ follows from the identity proven earlier for Euler's phi-function:
$$\sum_{d \mid n,\ d > 0} \varphi(d) = n$$
This completes the proof of the theorem.
∎
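The `more efficient description' in the theorem is directly usable as a recursive recipe: divide $x^n - 1$ by the product of the $\varphi_d$ over the proper divisors $d$ of $n$. The sketch below carries this out with integer coefficients, that is, in characteristic $0$, which is an assumption of this illustration; the function names and the coefficient-tuple convention (constant term first) are likewise invented here.

```python
from functools import lru_cache
from math import gcd


def poly_mul(f, g):
    """Product of two integer polynomials given as coefficient sequences."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def exact_div(f, g):
    """Quotient f / g for monic g, raising an error if g does not divide f."""
    f = list(f)
    q = [0] * (len(f) - len(g) + 1)
    for k in range(len(q) - 1, -1, -1):
        q[k] = f[k + len(g) - 1]            # g is monic, so no division is needed
        for i, c in enumerate(g):
            f[k + i] -= q[k] * c
    if any(f):
        raise ValueError("division was not exact")
    return q


@lru_cache(maxsize=None)
def cyclotomic(n):
    """n-th cyclotomic polynomial: (x^n - 1) over the product of phi_d, d | n, d < n."""
    numerator = [-1] + [0] * (n - 1) + [1]  # x^n - 1
    denominator = [1]
    for d in range(1, n):
        if n % d == 0:
            denominator = poly_mul(denominator, cyclotomic(d))
    return tuple(exact_div(numerator, denominator))


def euler_phi(n):
    """Euler's phi-function by direct count."""
    return sum(1 for d in range(1, n + 1) if gcd(d, n) == 1)


print(cyclotomic(6), cyclotomic(12))
print(all(len(cyclotomic(n)) - 1 == euler_phi(n) for n in range(1, 31)))
```

The first line of output gives the coefficients of $x^2 - x + 1$ and $x^4 - x^2 + 1$, and the second confirms, for $n$ up to 30, that the degree of $\varphi_n$ agrees with Euler's $\varphi(n)$.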
19.4 Primitive roots in finite fields
Now we can prove that the multiplicative group $k^\times$ of any finite field $k$ is a cyclic group. A generator of $k^\times$ is sometimes called a primitive root for $k$. This property of $k^\times$ is essential for the working of modern primality tests and modern factorization algorithms.
Theorem: Let $k$ be a finite field. Then $k^\times$ is a cyclic group.
Proof: Let $q$ be the number of elements in $k$. Since $k$ is a field, any $b \ne 0$ has a multiplicative inverse in $k$, so the group of units $k^\times$ consists of all the non-zero elements, and the order of $k^\times$ is $q - 1$. Thus, by corollaries to Lagrange's theorem, for $b \ne 0$,
$$b^{q-1} = 1$$
That is, any non-zero element of $k$ is a root of the polynomial $f(x) = x^{q-1} - 1$. On the other hand, by the Fundamental Theorem of Algebra (a polynomial of degree $n$ with coefficients in a field has at most $n$ roots in that field), this polynomial has at most $q - 1$ roots in $k$. Therefore, it has exactly $q - 1$ (distinct) roots in $k$.
Let $p$ be the characteristic of $k$. Certainly $p$ cannot divide $q - 1$, since if it did then the derivative of $f(x) = x^{q-1} - 1$ would be zero, so $\gcd(f, f') = f$ and $f$ would have multiple roots. We have just noted that $f$ has $q - 1$ distinct roots, so this doesn't happen.
Since the characteristic of $k$ does not divide $q - 1$, we can apply the results from just above concerning cyclotomic polynomials. Thus,
$$x^{q-1} - 1 = \prod_{d \mid q-1} \varphi_d(x)$$
Since $x^{q-1} - 1$ has $q - 1$ roots in $k$, and since the $\varphi_d$'s here are relatively prime to each other, each $\varphi_d$ with $d \mid q - 1$ must have number of roots (in $k$) equal to its degree. Thus, $\varphi_d$ for $d \mid q - 1$ has $\varphi(d) > 0$ roots in $k$ (Euler's phi-function).
Finally, the roots of $\varphi_{q-1}(x)$ are those field elements $b$ so that $b^{q-1} = 1$ and no smaller positive power than $q - 1$ has this property. The primitive roots are exactly the roots of $\varphi_{q-1}(x)$. The cyclotomic polynomial $\varphi_{q-1}$ has $\varphi(q - 1)$ roots. Therefore, there are $\varphi(q - 1) > 0$ primitive roots. That is, the group $k^\times$ has a generator, that is, is cyclic.
|
#19.155 Determine the cyclotomic polynomials $\varphi_2$, $\varphi_3$, $\varphi_4$, $\varphi_5$, $\varphi_6$, $\varphi_8$, $\varphi_9$, $\varphi_{12}$.
#19.156 (*) Find a cyclotomic polynomial that has coefficients other than $0, +1, -1$.
20. Primitive roots
Primitive roots in Z/p
Primitive roots in Z/p^e
Counting primitive roots
Non-existence of primitive roots
20.1 Primitive roots in Z/p
Now we can verify that the multiplicative group $(\mathbf{Z}/p)^\times$ of the finite field $\mathbf{Z}/p$ with $p$ elements is a cyclic group. Any generator of it is called a primitive root for $\mathbf{Z}/p$. This property of $\mathbf{Z}/p$ (and other finite fields) is essential in primality tests and factorization algorithms.
Theorem: Let $k$ be the finite field $\mathbf{Z}/p$ with $p$ prime. Then $(\mathbf{Z}/p)^\times$ is a cyclic group.
Proof: As corollary of our study of cyclotomic polynomials, we've already proven that the multiplicative group $k^\times$ of any finite field $k$ is cyclic. Therefore, all we need do is check that $\mathbf{Z}/p$ is a field. That is, we must check that any non-zero element $b \in \mathbf{Z}/p$ has a multiplicative inverse.
Let's repeat the explanation of why there is a multiplicative inverse, even though we've given it before in other contexts. Indeed, since $p$ is prime, if $b \ne 0 \bmod p$, then $\gcd(p, b) = 1$. Thus, there are integers $s, t$ so that $sp + tb = 1$. Then, looking at the latter equation modulo $p$, we see that $t$ is a multiplicative inverse to $b$ modulo $p$.
|
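The integers $s, t$ with $sp + tb = 1$ can be found by the extended Euclidean algorithm, which makes the inverse computable in practice. Here is a minimal Python sketch of that computation. The function names extended_gcd and inverse_mod are our own, chosen for illustration.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def inverse_mod(b, p):
    """Multiplicative inverse of b modulo p, from s*p + t*b = 1."""
    g, s, t = extended_gcd(p, b % p)
    if g != 1:
        raise ValueError("b is not invertible modulo p")
    return t % p  # reduce t into the range 0..p-1
```

For a prime modulus every non-zero b is accepted. For a composite modulus the function raises an error exactly when b shares a factor with the modulus.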
20.2 Primitive roots in Z/p^e
To prove that there is a primitive root in $\mathbf{Z}/p^e$ for $p$ an odd prime is not difficult, once we know that there is a primitive root for $\mathbf{Z}/p$. A minor adaption of this applies as well to $\mathbf{Z}/2p^e$.
Theorem: For an odd prime $p$, $\mathbf{Z}/p^e$ and $\mathbf{Z}/2p^e$ have primitive roots. That is, the multiplicative groups $(\mathbf{Z}/p^e)^\times$ and $(\mathbf{Z}/2p^e)^\times$ are cyclic.
Corollary: (of proof) In fact, for an integer $g$ which is a primitive root mod $p$, either $g$ is a primitive root mod $p^e$ and mod $2p^e$ for all $e \ge 1$, or else $(1 + p)g$ is. In particular, if $g^{p-1} \ne 1 \bmod p^2$, then $g$ is a primitive root mod $p^e$ and mod $2p^e$ for all $e \ge 1$. Otherwise, $(1 + p)g$ is. (For the modulus $2p^e$ the representative must be odd in order to be a unit; if it is even, add $p^e$ to it, which does not change its class mod $p^e$.)
The following proposition is of interest in its own right, and is necessary to prove the theorem on primitive roots. Its point is that understanding the order of certain types of elements in $(\mathbf{Z}/p^e)^\times$ is much more elementary than the trouble we went through to show that $\mathbf{Z}/p$ has a primitive root. We'll prove this proposition before proving the theorem and corollary on primitive roots.
Proposition: Let $p$ be an odd prime. For integers $1 \le k \le e$, and for an integer $x$ with $p \nmid x$, the order of an element $1 + p^k x$ in $(\mathbf{Z}/p^e)^\times$ is $p^{e-k}$. In particular, for $p \nmid x$ and $k \ge 1$,
$$(1 + p^k x)^{p^\ell} = 1 + p^{k+\ell} y$$
with $y = x \bmod p$.
Proof: (of proposition). The main trick here is that a prime $p$ divides the binomial coefficients
$$\binom{p}{1},\ \binom{p}{2},\ \ldots,\ \binom{p}{p-2},\ \binom{p}{p-1}$$
Also, the hypothesis that $p > 2$ is essential.
Let's first compute
$$(1 + p^k x)^p = 1 + \binom{p}{1} p^k x + \binom{p}{2} p^{2k} x^2 + \ldots + \binom{p}{p-1} p^{(p-1)k} x^{p-1} + p^{pk} x^p$$
$$= 1 + p^{k+1} \underbrace{\left( x + \binom{p}{2} p^{2k-(k+1)} x^2 + \ldots + \binom{p}{p-1} p^{(p-1)k-(k+1)} x^{p-1} + p^{pk-(k+1)} x^p \right)}_{y}$$
Since $p$ divides those binomial coefficients, the expression $y$ differs from $x$ by a multiple of $p$. Looking at the very last term, $p^{pk-(k+1)} x^p$, we see that it is necessary that $pk - (k + 1) \ge 1$ for this to work. Since all we know about $k$ is that $k \ge 1$, it must be that $p > 2$ or this inequality could fail. This explains why the argument fails for the prime 2. So we have proven that
$$(1 + p^k x)^p = 1 + p^{k+1} y$$
with $y = x \bmod p$. Repeating this argument (that is, doing an induction), we get
$$(1 + p^k x)^{p^\ell} = 1 + p^{k+\ell} y$$
with $y = x \bmod p$. This is the formula asserted in the proposition.
Now let's see that this formula gives the assertion about orders. First we must see what the order in $(\mathbf{Z}/p^e)^\times$ of elements of the form $1 + px$ can be. To do this we will invoke Lagrange's theorem. So we have to count the number of elements of $(\mathbf{Z}/p^e)^\times$ expressible as $1 + px$. In the first place, for any integer $x$ the integer $1 + px$ is relatively prime to $p$, so gives an element of $(\mathbf{Z}/p^e)^\times$. On the other hand, if
$$1 + px = 1 + px' \bmod p^e$$
then $p^e \mid (1 + px - 1 - px')$. That is, $p^{e-1} \mid x - x'$. So the integers $1 + px$ and $1 + px'$ give the same element of $(\mathbf{Z}/p^e)^\times$ only if $x = x' \bmod p^{e-1}$. Thus, the $p^{e-1}$ integers $x = 0, 1, 2, \ldots, p^{e-1} - 1$ give all the elements of $(\mathbf{Z}/p^e)^\times$ expressible as $1 + px$. These elements are closed under multiplication and inverses, so they form a subgroup of order $p^{e-1}$.
By Lagrange's theorem, the order of any element $1 + px$ in $(\mathbf{Z}/p^e)^\times$ must divide $p^{e-1}$.
This limitation allows our computation of $(1 + p^k x)^{p^\ell}$ to give a definitive answer to the question of order: for $p \nmid x$,
$$(1 + p^k x)^{p^\ell} = 1 + p^{k+\ell} y$$
with $y = x \bmod p$, so this is not $1 \bmod p^e$ unless $k + \ell \ge e$. (And if $k + \ell \ge e$ it is $1 \bmod p^e$.) Thus,
$$\text{(multiplicative) order of } 1 + p^k x \bmod p^e \ \text{is}\ p^{e-k}$$
This proves the proposition.
|
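The proposition is easy to test numerically for small primes. The following Python sketch computes orders by brute force and compares them with the predicted value $p^{e-k}$. It also shows the prediction failing for the prime 2. The names order_mod and check_proposition are our own, and the range of $x$ tried is an arbitrary small sample.

```python
def order_mod(b, n):
    """Multiplicative order of b modulo n (b must be coprime to n)."""
    k, x = 1, b % n
    while x != 1:
        x = (x * b) % n
        k += 1
    return k


def check_proposition(p, e):
    """True if 1 + p^k x has order p^(e-k) modulo p^e for every
    1 <= k <= e and every sampled x not divisible by p."""
    n = p ** e
    for k in range(1, e + 1):
        for x in range(1, 2 * p):
            if x % p == 0:
                continue
            if order_mod(1 + p ** k * x, n) != p ** (e - k):
                return False
    return True
```

With $p = 2$, $e = 3$ the check fails already at $1 + 2 \cdot 1 = 3$, whose order modulo 8 is 2 rather than 4, in line with the remark that the hypothesis $p > 2$ is essential.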
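Proof: (of theorem and corollary) The assertion of the corollary is stronger than the theorem, so it certainly suffices to prove the more specific assertion of the corollary in order to prove the theorem.
Before the most serious part of the proof, let's see why an odd integer $g$ which is a primitive root for $\mathbf{Z}/p^e$ will also be a primitive root for $\mathbf{Z}/2p^e$. (If $g$ is even, replace it by $g + p^e$, which is odd and gives the same element of $\mathbf{Z}/p^e$.) The main point is that for an odd prime $p$
$$\varphi(2p^e) = (2 - 1)(p - 1)p^{e-1} = (p - 1)p^{e-1} = \varphi(p^e)$$
Let $g$ be an odd primitive root modulo $p^e$. Then $g$ is relatively prime to $2p^e$, and $\ell = \varphi(p^e)$ is the smallest positive exponent so that $g^\ell = 1 \bmod p^e$. Thus, surely there is no smaller exponent $\ell$ so that $g^\ell = 1 \bmod 2p^e$, since $p^e \mid 2p^e$. Since the order of $g$ mod $2p^e$ divides $\varphi(2p^e) = \varphi(p^e)$, it is exactly $\varphi(2p^e)$. Therefore, an odd primitive root mod $p^e$ also serves as a primitive root mod $2p^e$.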
Now the central case, that of primitive roots for $\mathbf{Z}/p^e$. That is, we want to show that the multiplicative group $(\mathbf{Z}/p^e)^\times$ is of the form $\langle g \rangle$ for some $g$. Let $g_1$ be a primitive root mod $p$, which we already know exists for other reasons. The plan is to "adjust" $g_1$ suitably to obtain a primitive root mod $p^e$, somewhat in the spirit of Hensel's lemma. But it turns out that at most a single adjustment is necessary altogether, so in some regards the situation is simpler than a Hensel's lemma application.
If (by good luck?)
$$g_1^{p-1} = 1 + px$$
with $p \nmid x$, then let's show that $g_1$ is already a primitive root mod $p^e$ for any $e \ge 1$. By Lagrange's theorem, the order of $g_1$ in $(\mathbf{Z}/p^e)^\times$ is a divisor of $\varphi(p^e) = (p - 1)p^{e-1}$. Since $p - 1$ is the smallest positive exponent $\ell$ so that $g_1^\ell = 1 \bmod p$, $p - 1$ divides the order of $g_1$ in $(\mathbf{Z}/p^e)^\times$ (from our discussion of cyclic subgroups). Thus, the order of $g_1$ is in the list
$$p - 1,\ (p - 1)p,\ (p - 1)p^2,\ \ldots,\ (p - 1)p^{e-1}$$
Thus, the question is to find the smallest $\ell \ge 0$ so that
$$g_1^{(p-1)p^\ell} = 1 \bmod p^e$$
We are assuming that
$$g_1^{p-1} = 1 + px$$
with $p \nmid x$, so the question is to find the smallest $\ell \ge 0$ so that
$$(1 + px)^{p^\ell} = 1 \bmod p^e$$
From the proposition, the smallest $\ell$ with this property is $\ell = e - 1$. That is, we have proven that $g_1$ is a primitive root mod $p^e$ for every $e \ge 1$.
Now suppose that
$$g_1^{p-1} = 1 + px$$
with $p \mid x$. Then consider
$$g = (1 + p) g_1$$
Certainly $g$ is still a primitive root mod $p$, because $g = g_1 \bmod p$. And we compute
$$(1 + p)^{p-1} = 1 + \binom{p-1}{1} p + \binom{p-1}{2} p^2 + \ldots + \binom{p-1}{p-2} p^{p-2} + p^{p-1}$$
$$= 1 + p \underbrace{\left( \binom{p-1}{1} + \binom{p-1}{2} p + \binom{p-1}{3} p^2 + \ldots \right)}_{y} = 1 + py$$
Since
$$\binom{p-1}{1} = p - 1$$
we see that
$$y = p - 1 \bmod p$$
so $p \nmid y$. Thus,
$$g^{p-1} = ((1 + p) g_1)^{p-1} = (1 + py)(1 + px) = 1 + p(y + x + pxy)$$
Since $p \mid x$, we have
$$y + x + pxy = y \bmod p$$
In particular, $p \nmid y + x + pxy$. Thus, by adjusting the primitive root a bit, we have returned to the first case above, that $g^{p-1}$ is of the form $g^{p-1} = 1 + pz$ with $p \nmid z$. In that case we already saw that such $g$ is a primitive root mod $p^e$ for any $e \ge 1$.
This finishes the proof of existence of primitive roots in $\mathbf{Z}/p^e$ for $p$ an odd prime.
|
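The proof is constructive: test whether $g_1^{p-1} = 1 \bmod p^2$, and multiply by $1 + p$ only if it is. A minimal Python sketch of this recipe follows. The names order_mod, lift_primitive_root and make_odd are our own, and make_odd reflects the observation that a unit modulo $2p^e$ must be odd. The starting value $g_1 = 7$ for $p = 5$ is chosen because $7^4 = 2401 = 1 \bmod 25$, so it exercises the adjustment step.

```python
def order_mod(b, n):
    """Multiplicative order of b modulo n (b must be coprime to n)."""
    k, x = 1, b % n
    while x != 1:
        x = (x * b) % n
        k += 1
    return k


def lift_primitive_root(g1, p):
    """Given a primitive root g1 modulo the odd prime p, return an integer
    that is a primitive root modulo p^e for every e >= 1."""
    if pow(g1, p - 1, p * p) != 1:
        return g1            # first case: g1^(p-1) = 1 + px with p not dividing x
    return (1 + p) * g1      # second case: a single adjustment suffices


def make_odd(g, p, e):
    """Odd representative of g modulo p^e, for use modulo 2 * p^e."""
    return g if g % 2 == 1 else g + p ** e
```

The brute-force order_mod is only meant for checking small cases; it plays no role in the lifting itself.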
20.3 Counting primitive roots
After proving existence of primitive roots, it is at least equally interesting to have an idea how many
there are.
Theorem: If $\mathbf{Z}/n$ has a primitive root, then there are exactly
$$\varphi(\varphi(n))$$
primitive roots mod $n$. (Yes, that is Euler's phi of Euler's phi of $n$.) For example, there are
$$\varphi(\varphi(p^e)) = \varphi(p - 1) \cdot (p - 1)p^{e-2}$$
primitive roots mod $p^e$ for an odd prime $p$ and $e \ge 2$.
Proof: The hypothesis that $\mathbf{Z}/n$ has a primitive root is that the multiplicative group $(\mathbf{Z}/n)^\times$ is cyclic. That is, for some element $g$ (the "primitive root")
$$(\mathbf{Z}/n)^\times = \langle g \rangle$$
Of course, the order $|g|$ of $g$ must be the order $\varphi(n)$ of $(\mathbf{Z}/n)^\times$. From general discussion of cyclic subgroups, we know that
$$g^0,\ g^1,\ g^2,\ g^3,\ \ldots,\ g^{\varphi(n)-1}$$
is a complete list of all the different elements of $\langle g \rangle$. And
$$\text{order of } g^k = \frac{\text{order of } g}{\gcd(k, |g|)}$$
So the generators for $\langle g \rangle$ are exactly the elements
$$g^k \ \text{with}\ 1 \le k < |g| \ \text{and}\ k \ \text{relatively prime to}\ |g|$$
By definition of Euler's $\varphi$-function, there are $\varphi(|g|)$ of these. Thus, since $|g| = \varphi(n)$, there are $\varphi(\varphi(n))$ primitive roots.
|
Corollary: For an odd prime $p$ and $e \ge 2$, the fraction $\varphi(p - 1)/p$ of the elements of $(\mathbf{Z}/p^e)^\times$ consists of primitive roots.
Proof: From the theorem just proven the ratio of primitive roots to all elements is
$$\frac{\varphi(\varphi(p^e))}{\varphi(p^e)} = \frac{\varphi(p - 1) \cdot (p - 1)p^{e-2}}{(p - 1)p^{e-1}} = \frac{\varphi(p - 1)}{p}$$
as claimed.
Remark: Thus, there are relatively many primitive roots modulo $p^e$.
|
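For small moduli the count $\varphi(\varphi(n))$ can be confirmed by listing the primitive roots directly. The following Python sketch does this by brute force. The names euler_phi, order_mod and primitive_roots are our own, and the method is only sensible for small $n$.

```python
from math import gcd


def euler_phi(n):
    """Euler's phi-function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def order_mod(b, n):
    """Multiplicative order of b modulo n (b must be coprime to n)."""
    k, x = 1, b % n
    while x != 1:
        x = (x * b) % n
        k += 1
    return k


def primitive_roots(n):
    """All b in 1..n-1 whose order modulo n equals phi(n)."""
    phi = euler_phi(n)
    return [b for b in range(1, n)
            if gcd(b, n) == 1 and order_mod(b, n) == phi]
```

For example, primitive_roots(9) lists the two primitive roots modulo 9, matching $\varphi(\varphi(9)) = \varphi(6) = 2$, while primitive_roots(8) is empty.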
20.4 Non-existence of primitive roots
For generic integers $n$, there is no primitive root in $\mathbf{Z}/n$.
Theorem: If $n > 1$ is not 2, 4, nor of the forms $p^e$, $2p^e$ for $p$ an odd prime (and $e$ a positive integer), then there is no primitive root modulo $n$.
Proof: First, let's look at $\mathbf{Z}/2^e$ with $e \ge 3$. Any $b \in (\mathbf{Z}/2^e)^\times$ can be written as $b = 1 + 2x$ for integer $x$. Then
$$(1 + 2x)^2 = 1 + 4x + 4x^2 = 1 + 4x(x + 1)$$
The peculiar feature here is that for any integer $x$, the expression $x(x + 1)$ is divisible by 2. Indeed, if $x$ is even surely $x(x + 1)$ is even, and if $x$ is odd then $x + 1$ is even and $x(x + 1)$ is again even. Thus,
$$(1 + 2x)^2 = 1 \bmod 8$$
(rather than merely modulo 4). And from the pattern
$$(1 + 2^k x)^2 = 1 + 2^{k+1} x + 2^{2k} x^2$$
we can prove by induction that
$$(1 + 8x)^{2^{e-3}} = 1 \bmod 2^e$$
Putting this together, we see that
$$(1 + 2x)^{2^{e-2}} = 1 \bmod 2^e$$
But $2^{e-2} < 2^{e-1} = \varphi(2^e)$. That is, there cannot be a primitive root modulo $2^e$ with $e > 2$.
Now consider $n$ not a power of 2. Then write $n = p^e m$ with $p$ an odd prime not dividing $m$. By Euler's theorem, we know that
$$b^{\varphi(p^e)} = 1 \bmod p^e$$
$$b^{\varphi(m)} = 1 \bmod m$$
Let $M = \mathrm{lcm}(\varphi(p^e), \varphi(m))$. Then (as usual)
$$b^M = (b^{\varphi(p^e)})^{M/\varphi(p^e)} = 1^{M/\varphi(p^e)} = 1 \bmod p^e$$
and
$$b^M = (b^{\varphi(m)})^{M/\varphi(m)} = 1^{M/\varphi(m)} = 1 \bmod m$$
Thus, certainly
$$b^M = 1 \bmod p^e m$$
But a primitive root $g$ would have the property that no smaller exponent $\ell$ than $\varphi(p^e m)$ has the property that $g^\ell = 1 \bmod p^e m$. Therefore, unless $\gcd(\varphi(p^e), \varphi(m)) = 1$ we'll have
$$\mathrm{lcm}(\varphi(p^e), \varphi(m)) < \varphi(p^e) \cdot \varphi(m) = \varphi(p^e m)$$
which would deny the possibility that there be a primitive root.
Thus, we need $\varphi(m)$ relatively prime to $\varphi(p^e) = (p - 1)p^{e-1}$. Since $p - 1$ is even, this means that $\varphi(m)$ must be odd. If an odd prime $q$ divides $m$, then $q - 1$ divides $\varphi(m)$, which would make $\varphi(m)$ even, which is impossible. Thus, no odd prime can divide $m$. Further, if any power of 2 greater than just 2 itself divides $m$, again $\varphi(m)$ would be even, and no primitive root could exist. Hence $m$ is 1 or 2, that is, $n = p^e$ or $n = 2p^e$.
Thus, except for the cases where we've already proven that a primitive root does exist, there is no primitive root mod $n$.
|
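Together with the existence results, this says that for $n > 1$ a primitive root exists exactly when $n$ is 2, 4, $p^e$ or $2p^e$ with $p$ an odd prime. The following Python sketch compares that prediction against a brute-force search for small $n$. The function names are our own, and the bound used in any check is an arbitrary choice.

```python
from math import gcd


def order_mod(b, n):
    """Multiplicative order of b modulo n (b must be coprime to n)."""
    k, x = 1, b % n
    while x != 1:
        x = (x * b) % n
        k += 1
    return k


def has_primitive_root(n):
    """Brute force: does some unit modulo n have order phi(n)?"""
    units = [b for b in range(1, n) if gcd(b, n) == 1]
    return any(order_mod(b, n) == len(units) for b in units)


def is_odd_prime_power(m):
    """True if m = p^e for an odd prime p and e >= 1."""
    if m < 3 or m % 2 == 0:
        return False
    # the smallest divisor >= 3 of an odd m is necessarily prime
    p = next(d for d in range(3, m + 1, 2) if m % d == 0)
    while m % p == 0:
        m //= p
    return m == 1


def predicted(n):
    """Prediction for n > 1: n is 2, 4, p^e or 2 p^e with p an odd prime."""
    if n in (2, 4):
        return True
    if n % 2 == 0:
        n //= 2
    return is_odd_prime_power(n)
```

Comparing has_primitive_root(n) with predicted(n) over a range such as 2 through 99 is a quick way to see the pattern, for example that 8, 12, 15 and 16 all fail.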
#20.157 Find primitive roots modulo 11 and 13.
#20.158 Determine the order of all elements of the multiplicative groups $(\mathbf{Z}/12)^\times$, $(\mathbf{Z}/15)^\times$, $(\mathbf{Z}/17)^\times$.
21. Group Homomorphisms
Group homomorphisms, isomorphisms
21.1 Group homomorphisms, isomorphisms
A function (or map)
$$f : G \to H$$
from one group $G$ to another one $H$ is a group homomorphism if
$$f(g_1 g_2) = f(g_1)\, f(g_2)$$
for all $g_1, g_2 \in G$. Let $e_G$ be the identity in $G$ and $e_H$ the identity in $H$. The kernel of such a group homomorphism $f$ is
$$\text{kernel of } f = \ker f = \{ g \in G : f(g) = e_H \}$$
The image of $f$ is just like the image of any function:
$$\text{image of } f = \mathrm{im}\, f = \{ h \in H : \text{there is } g \in G \text{ so that } f(g) = h \}$$
Let $f : G \to H$ be a group homomorphism. Let $e_G$ be the identity in $G$ and let $e_H$ be the identity in $H$.
Necessarily $f$ carries the identity of $G$ to the identity of $H$: $f(e_G) = e_H$.
For $g \in G$, $f(g^{-1}) = f(g)^{-1}$.
The kernel of $f$ is a subgroup of $G$.
The image of $f$ is a subgroup of $H$.
A group homomorphism $f : G \to H$ is injective if and only if the kernel is trivial (that is, is the trivial subgroup $\{e_G\}$).
Proof: The image $f(e_G)$ under $f$ of the identity $e_G$ in $G$ has the property
$$f(e_G) = f(e_G \cdot e_G) = f(e_G) \cdot f(e_G)$$
using the property of the identity in $G$ and the group homomorphism property. Left multiplying by $f(e_G)^{-1}$ (whatever this may be!), we get
$$f(e_G)^{-1} \cdot f(e_G) = f(e_G)^{-1} \cdot (f(e_G) \cdot f(e_G))$$
Simplifying and rearranging a bit, this is
$$e_H = (f(e_G)^{-1} \cdot f(e_G)) \cdot f(e_G) = e_H \cdot f(e_G) = f(e_G)$$
This proves that the identity in $G$ is mapped to the identity in $H$.
To check that the image of an inverse is the inverse of the image, we simply compute
$$f(g^{-1}) \cdot f(g) = f(g^{-1} \cdot g)$$
by the homomorphism property, and this is
$$= f(e_G) = e_H$$
by the inverse property and by the fact (just proven) that the identity in $G$ is mapped to the identity in $H$ by a group homomorphism. Likewise, we also compute that
$$f(g) \cdot f(g^{-1}) = e_H$$
so the image of an inverse is the inverse of the image, as claimed.
To prove that the kernel of a group homomorphism $f : G \to H$ is a subgroup of $G$, we must prove three things. First, we must check that the identity lies in the kernel: this follows immediately from the fact just proven that $f(e_G) = e_H$. Next, we must show that if $g$ is in the kernel then $g^{-1}$ is also. Happily (by luck?) we just showed that $f(g^{-1}) = f(g)^{-1}$, so indeed if $f(g) = e_H$ then
$$f(g^{-1}) = f(g)^{-1} = e_H^{-1} = e_H$$
Finally, suppose both $x, y$ are in the kernel of $f$. Then
$$f(xy) = f(x) \cdot f(y) = e_H \cdot e_H = e_H$$
so the "product" is also in the kernel.
Now let $X$ be a subgroup of $G$. Let
$$f(X) = \{ f(x) : x \in X \}$$
To show that $f(X)$ is a subgroup of $H$, we must check the usual three things: presence of the identity, closure under taking inverses, and closure under products. Again, we just showed that $f(e_G) = e_H$, so the image of a subgroup contains the identity. Also, we showed that $f(g)^{-1} = f(g^{-1})$, so the image of a subgroup is closed under inverses. And $f(xy) = f(x) \cdot f(y)$ by the defining property of a group homomorphism, so the image is closed under multiplication.
Finally, let's prove that a homomorphism $f : G \to H$ is injective if and only if its kernel is trivial. First, if $f$ is injective, then at most one element can be mapped to $e_H \in H$. Since we know that at least $e_G$ is mapped to $e_H$ by such a homomorphism, it must be that only $e_G$ is mapped to $e_H$. Thus, the kernel is trivial.
On the other hand, suppose that the kernel is trivial. We will suppose that $f(x) = f(y)$, and show that $x = y$. Left multiply the equality $f(x) = f(y)$ by $f(x)^{-1}$ to obtain
$$e_H = f(x)^{-1} \cdot f(x) = f(x)^{-1} \cdot f(y)$$
By the homomorphism property, this gives
$$e_H = f(x)^{-1} \cdot f(y) = f(x^{-1} y)$$
Thus, $x^{-1} y$ is in the kernel of $f$, so (by assumption) $x^{-1} y = e_G$. Left multiplying this equality by $x$ and simplifying, we get $y = x$. This proves the injectivity.
|
If a group homomorphism $f : G \to H$ is surjective, then $H$ is said to be a homomorphic image of $G$.
If a group homomorphism $f : G \to H$ is a bijection, then $f$ is said to be an isomorphism, and $G$ and $H$ are said to be isomorphic.
Remark: At least from a theoretical viewpoint, two groups that are isomorphic are considered to be "the same", in the sense that any intrinsic group-theoretic assertion about one is also true of the other. In practical terms, however, the transfer of structure via the isomorphism may be difficult to compute.
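For finite groups the homomorphism property, the kernel, and injectivity can all be checked by exhaustive computation. Here is a small Python sketch using the additive group $\mathbf{Z}/12$ and the maps $x \to 3x$ and $x \to 5x$. The helper names is_homomorphism, kernel and is_injective, and the choice of these two maps, are our own illustrations.

```python
def is_homomorphism(f, elements, op_G, op_H):
    """Check f(x * y) == f(x) * f(y) for all pairs of elements."""
    return all(f(op_G(x, y)) == op_H(f(x), f(y))
               for x in elements for y in elements)


def kernel(f, elements, e_H):
    """Elements of the domain that f sends to the identity e_H."""
    return [g for g in elements if f(g) == e_H]


def is_injective(f, elements):
    """True if f takes distinct values on distinct elements."""
    images = [f(g) for g in elements]
    return len(set(images)) == len(images)


N = 12
G = list(range(N))                     # Z/12 with addition modulo 12


def add(x, y):
    return (x + y) % N


def triple(x):
    return (3 * x) % N                 # not injective


def quintuple(x):
    return (5 * x) % N                 # injective, since gcd(5, 12) = 1
```

Here kernel(triple, G, 0) is a subgroup with three elements and triple is not injective, while quintuple has trivial kernel and is injective, as the criterion above predicts.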
#21.159 What is the kernel of the homomorphism
$$x \to x \bmod N$$
from $\mathbf{Z}$ (with addition) to $\mathbf{Z}/N$ (with addition modulo $N$)? (Hint: This may be easier than you think!)
#21.160 Let $M, N$ be positive integers, and suppose that $N \mid M$. What is the kernel of the map
$$x \bmod M \to x \bmod N$$
from $\mathbf{Z}/M$ (with addition modulo $M$) to $\mathbf{Z}/N$ (with addition modulo $N$)?
#21.161 Let
$$\det : GL(2, \mathbf{Q}) \to \mathbf{Q}^\times$$
be the usual determinant map
$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$
Show by direct computation that $\det$ is a group homomorphism.
#21.162 Show that the map
$$t \to \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$$
is an isomorphism from $\mathbf{Q}$ (with addition) to a subgroup of $GL(2, \mathbf{Q})$.
#21.163 Show that the map
$$\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \to a$$
is a homomorphism from the group of all matrices $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ in which $a, d$ are non-zero rational numbers and $b$ is any rational number, to the multiplicative group $\mathbf{Q}^\times$ of non-zero rational numbers. What is its kernel?
#21.164 Show that
$$\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \to b$$
is not a homomorphism.
#21.165 Define a map $E : \mathbf{Q} \to GL(2, \mathbf{Q})$ by
$$x \to \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$$
Show that $E$ is a group homomorphism from $\mathbf{Q}$ with addition to a subgroup of $GL(2, \mathbf{Q})$.
#21.166 Define a map $E : \mathbf{Q} \to GL(3, \mathbf{Q})$ by
$$x \to \begin{pmatrix} 1 & x & \frac{x^2}{2} \\ 0 & 1 & x \\ 0 & 0 & 1 \end{pmatrix}$$
Show that $E$ is a group homomorphism from $\mathbf{Q}$ with addition to a subgroup of $GL(3, \mathbf{Q})$.
#21.167 Define a map $r : \mathbf{R} \to GL(2, \mathbf{R})$ by
$$x \to \begin{pmatrix} \cos x & \sin x \\ -\sin x & \cos x \end{pmatrix}$$
Show that $r$ is a group homomorphism from $\mathbf{R}$ with addition to a subgroup of $GL(2, \mathbf{R})$. What is its kernel?
#21.168 Let $n$ be an integer. Show that $f : \mathbf{Z} \to \mathbf{Z}$ defined by $f(x) = nx$ is a homomorphism.
#21.169 Show that a homomorphism $f : G \to H$ always has the property that $f(g^{-1}) = f(g)^{-1}$ for $g \in G$.
22. Cyclic Groups
Finite cyclic groups
Infinite cyclic groups
Roots and powers
22.1 Finite cyclic groups
A finite group $G$ is cyclic if there is $g \in G$ so that $\langle g \rangle = G$. And such a $g$ is a generator of $G$, and $G$ is said to be generated by $g$. (The case of infinite cyclic groups will be considered in the next section.)
Finite cyclic groups are the simplest of all groups, and can be readily understood as follows.
Let $N = |G|$. Since $G = \langle g \rangle$, also $N = |g|$. It is important to remember that (as proven a bit earlier)
The elements $e = g^0, g^1, g^2, \ldots, g^{N-2}, g^{N-1}$ form a complete list of the distinct elements of $G = \langle g \rangle$.
With arbitrary integers $i, j$, we have $g^i = g^j$ if and only if $i \equiv j \bmod N$.
Given an integer $j$, let $i$ be the reduction of $j$ mod $N$. Then $g^i = g^j$.
Then the collections of all subgroups and of all generators can be completely understood in terms of elementary arithmetic:
The distinct subgroups of $G$ are exactly the subgroups $\langle g^d \rangle$ for all divisors $d$ of $N$.
For $d \mid N$ the order of the subgroup $\langle g^d \rangle$ is the order of $g^d$, which is just $N/d$.
The order of $g^k$ with arbitrary integer $k \ne 0$ is $N/\gcd(k, N)$.
For any integer $n$ we have
$$\langle g^n \rangle = \langle g^{\gcd(n,N)} \rangle$$
The distinct generators of $G$ are the elements $g^r$ where $1 \le r < N$ and $\gcd(r, N) = 1$. Thus, there are $\varphi(N)$ of them, where $\varphi$ is Euler's phi function.
The number of elements of order $n$ in a finite cyclic group of order $N$ is 0 unless $n \mid N$, in which case it is $\varphi(n)$ (these are the generators of the unique subgroup of order $n$).
Remark: Some aspects of this can be paraphrased nicely in words: for example, Every subgroup of a finite cyclic group is again a finite cyclic group, with order dividing the order of the group. Conversely, for every divisor of the order of the group, there is a unique subgroup of that order.
Proof: Let's prove that the order of $g^k$ is $N/\gcd(k, N)$. First, if $(g^k)^\ell = e = g^0$, then $k\ell \equiv 0 \bmod N$, from the simpler facts recalled above. That is, $N \mid k\ell$. That is, there is an integer $m$ so that $k\ell = mN$. Then divide both sides of this equality by $\gcd(k, N)$, obtaining
$$\frac{k}{\gcd(k, N)} \cdot \ell = m \cdot \frac{N}{\gcd(k, N)}$$
Since now $N/\gcd(k, N)$ and $k/\gcd(k, N)$ are relatively prime, by unique factorization we conclude that
$$\frac{N}{\gcd(k, N)} \ \Big|\ \ell$$
Therefore, the actual order of $g^k$ is a multiple of $N/\gcd(k, N)$. On the other hand,
$$(g^k)^{N/\gcd(k,N)} = (g^N)^{k/\gcd(k,N)} = e^{k/\gcd(k,N)} = e$$
Note that we use the fact that $N/\gcd(k, N)$ and $k/\gcd(k, N)$ are both integers, so that all the expressions here have genuine content and sense. This finishes the proof that the order of $g^k$ is $N/\gcd(k, N)$.
As a special case of the preceding, if $k \mid N$ then the order of $g^k$ is $N/\gcd(k, N) = N/k$, as claimed above.
Since we know by now that $|\langle h \rangle| = |h|$ for any $h$, certainly
$$|\langle g^k \rangle| = |g^k| = N/\gcd(k, N)$$
Given integer $k$, let's show that
$$\langle g^k \rangle = \langle g^{\gcd(k,N)} \rangle$$
Let $d = \gcd(k, N)$, and let $s, t$ be integers so that
$$d = sk + tN$$
Then
$$g^d = g^{sk+tN} = (g^k)^s \cdot (g^N)^t = (g^k)^s \cdot (e)^t = (g^k)^s \cdot e = (g^k)^s$$
so $g^d \in \langle g^k \rangle$. On the other hand,
$$g^k = (g^d)^{k/d}$$
since $d \mid k$. Thus, $g^k \in \langle g^d \rangle$. Therefore, since the subgroups $\langle g^k \rangle$ and $\langle g^d \rangle$ are closed under multiplication and under inverses, for any integer $\ell$
$$(g^k)^\ell \in \langle g^d \rangle$$
and
$$(g^d)^\ell \in \langle g^k \rangle$$
But $\langle g^d \rangle$ is just the set of all integer powers of $g^d$ (and similarly for $g^k$), so we have shown that
$$\langle g^d \rangle \subset \langle g^k \rangle$$
and vice-versa, so we find at last that
$$\langle g^d \rangle = \langle g^k \rangle$$
Therefore, all the cyclic subgroups of $\langle g \rangle = G$ are of the form $\langle g^d \rangle$ for some positive $d$ dividing $N = |G| = |g|$. And different divisors $d$ give different subgroups.
Let $H$ be an arbitrary subgroup of $G$. We must show that $H$ is generated by some $g^k$ (so is in fact cyclic). Let $k$ be the smallest positive integer so that $g^k \in H$. We claim that $\langle g^k \rangle = H$. For any other $g^m \in H$, we can write
$$m = q \cdot k + r$$
with $0 \le r < k$. Then
$$g^r = g^{m - q \cdot k} = g^m \cdot (g^k)^{-q} \in H$$
since $H$ is a subgroup. Since $k$ was the smallest positive integer so that $g^k \in H$, and $0 \le r < k$, it must be that $r = 0$. Therefore, $m$ is a multiple of $k$, and $g^k$ generates $H$.
As another particular case, notice that $\langle g^k \rangle = \langle g \rangle$ if and only if $\gcd(k, N) = 1$. And we may as well only consider $0 < k < N$, since otherwise we start repeating elements. That is, the distinct generators of $\langle g \rangle$ are the elements $g^k$ with $0 < k < N$ and $\gcd(k, N) = 1$. So there certainly are $\varphi(N)$ of them.
Likewise, since
$$|g^k| = |\langle g^k \rangle| = |\langle g^{\gcd(k,N)} \rangle| = |g^{\gcd(k,N)}|$$
it is not hard to count the number of elements of a given order in $\langle g \rangle$.
|
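These facts can be checked concretely in the cyclic group $(\mathbf{Z}/13)^\times$, which has order $N = 12$ and is generated by $g = 2$. The following Python sketch verifies the order formula, the equality of subgroups, and the count of elements of each order. The choice of 13 and 2, and the function names order, generated and euler_phi, are our own assumptions for illustration.

```python
from math import gcd

p, g = 13, 2        # 2 generates the multiplicative group modulo 13
N = p - 1           # order of the group


def order(h):
    """Multiplicative order of h modulo p."""
    k, x = 1, h % p
    while x != 1:
        x = (x * h) % p
        k += 1
    return k


def generated(h):
    """The cyclic subgroup <h> of the units modulo p, as a set."""
    out, x = {1}, h % p
    while x != 1:
        out.add(x)
        x = (x * h) % p
    return out


def euler_phi(n):
    """Euler's phi-function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def elements_of_order(n):
    """All elements of the group whose order is exactly n."""
    return [h for h in range(1, p) if order(h) == n]
```

In this example the number of elements of order $n$ comes out to euler_phi(n) for each divisor $n$ of 12, and to zero otherwise.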
A homomorphic image of a finite cyclic group is finite cyclic.
Proof: This follows by checking that the image of a generator is a generator for the image.
|
A finite cyclic group of order $N$ is isomorphic to $\mathbf{Z}/N$. Specifically, for any choice of generator $g$ of the cyclic group $G$, the map
$$f : n \to g^n$$
describes an isomorphism $f : \mathbf{Z}/N \to G$.
Proof: This is just a paraphrase of some of the other properties above.
A possibly disturbing issue here is that of proving that the map $f$ as described above is well-defined. That is, we have some sort of formula which appears to describe a map, but there are hidden pitfalls. What we must show is that if $m = n \bmod N$ then $f(m) = f(n)$. (This has nothing to do with injectivity!) Well, it turns out that everything is ok, because we've already shown (in discussion of cyclic subgroups) that $g^m = g^n$ if and only if $m = n \bmod N$.
The crucial property which must be demonstrated is the homomorphism property
$$f(m + n) = f(m) \cdot f(n)$$
Indeed,
$$f(m + n) = f((m + n) \,\%\, N) = g^{(m+n) \,\%\, N} = g^{m+n}$$
since we proved (in the discussion of cyclic subgroups) that $g^i = g^j$ whenever $i = j \bmod N$. And then this is
$$= g^m \cdot g^n = f(m) \cdot f(n)$$
as desired.
To see that $f$ is injective, suppose that $f(m) = f(n)$ for integers $m, n$. Then $g^m = g^n$. Again, this implies that $m = n \bmod N$, which says that $m \,\%\, N = n \,\%\, N$, as desired. So $f$ is injective.
The surjectivity is easy: given $g^n \in \langle g \rangle$, $f(n) = g^n$.
Therefore, the map $f$ is a bijective homomorphism, so by definition is an isomorphism.
|
22.2 Infinite cyclic groups
There are non-finite cyclic groups, as well, whose nature is also very simple, though somewhat different from the finite cyclic groups.
Dropping the assumption that a cyclic group is finite creates a few complications, but things are still tractable. And we can't overlook this possibility, since for example $\mathbf{Z}$ with addition is an infinite cyclic group.
A group $G$ is infinite cyclic if $G$ is an infinite group and if there is $g \in G$ so that $\langle g \rangle = G$. Such a $g$ is a generator of $G$, and $G$ is said to be generated by $g$.
It is important to understand the assertions for infinite cyclic groups analogous to those for finite cyclic groups above:
The elements $\ldots, g^{-3}, g^{-2}, g^{-1}, g^0 = e, g^1, g^2, g^3, \ldots$ are all distinct elements of $G = \langle g \rangle$.
With integers $i, j$, we have $g^i = g^j$ if and only if $i = j$.
An infinite cyclic group is isomorphic to $\mathbf{Z}$. Specifically, for any choice of generator $g$ of the infinite cyclic group $G$, the map
$$g^n \to n$$
describes an isomorphism $G \to \mathbf{Z}$. Thus, with hindsight, we realize that an infinite cyclic group has just two generators, since that is true of $\mathbf{Z}$.
Then the collections of all subgroups and of all generators can be completely understood in elementary terms:
The distinct subgroups of $G$ are exactly the subgroups $\langle g^d \rangle$ for all non-negative integers $d$.
Any subgroup $\langle g^d \rangle$ is infinite cyclic, except for the trivial group $\{e\} = \{g^0\} = \langle g^0 \rangle$.
Each subgroup $\langle g^d \rangle$ with $d > 0$ has exactly two generators, $g^d$ and $g^{-d}$.
Some aspects of this can be paraphrased nicely in words: Every non-trivial subgroup of an infinite cyclic group is again an infinite cyclic group.
Also, about the number of elements of various orders: all elements of an infinite cyclic group are of infinite order except $e = g^0$, which is of order 1.
22.3 Roots, powers
In a cyclic group $G = \langle g \rangle$ of order $n$ it is possible to reach very clear conclusions about the solvability of the equation $x^r = y$.
Let $G$ be a cyclic group of order $n$ with generator $g$. Fix an integer $r$, and define
$$f : G \to G$$
by
$$f(x) = x^r$$
Theorem: This map $f$ is a group homomorphism of $G$ to itself. If $\gcd(r, n) = 1$, then $f$ is an isomorphism. That is, if $\gcd(r, n) = 1$, then every $y \in G$ has an $r$-th root, and has exactly one such root. Generally,
$$\text{order of kernel of } f = \gcd(r, n)$$
$$\text{order of image of } f = n/\gcd(r, n)$$
If an element $y$ has an $r$-th root, then it has exactly $\gcd(r, n)$ of them. There are exactly $n/\gcd(r, n)$ $r$-th powers in $G$.
Proof: Certainly
$$f(x \cdot y) = (xy)^r = x^r y^r \ \text{(since $G$ is abelian)}\ = f(x) \cdot f(y)$$
which shows that $f$ is a homomorphism.
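Before continuing, the counts asserted in the theorem can be observed directly in a small example. The following Python sketch uses the cyclic group $(\mathbf{Z}/13)^\times$ of order $n = 12$ and tabulates, for a given exponent $r$, the size of the kernel of $x \to x^r$, the size of its image, and the number of $r$-th roots of each element that has one. The choice of modulus and the name power_map_stats are our own assumptions.

```python
from math import gcd

p, n = 13, 12       # the units modulo 13 form a cyclic group of order 12


def power_map_stats(r):
    """Return (kernel size, image size, set of root counts) for x -> x^r."""
    G = range(1, p)
    kernel = [x for x in G if pow(x, r, p) == 1]
    image = {pow(x, r, p) for x in G}
    # for each y in the image, collect all of its r-th roots
    roots = {y: [x for x in G if pow(x, r, p) == y] for y in image}
    return len(kernel), len(image), {len(v) for v in roots.values()}
```

For each $r$ tried, the three values agree with $\gcd(r, n)$, $n/\gcd(r, n)$, and the single root count $\gcd(r, n)$. For instance, power_map_stats(5) describes a bijection, while power_map_stats(4) has a kernel of size 4.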
We may as well use the fact that $G$ is isomorphic to $\mathbf{Z}/n$ with addition (proven just above). This
allows us to directly use things we know about $\mathbf{Z}/n$ and the relatively simple behavior of addition mod $n$ to
prove things about arbitr