Integer Factorization - Master Thesis
Integer Factorization - Master Thesis
Integer Factorization - Master Thesis
Master Thesis
D I K U
ii Integer Factorization
Per Leslie Jensen
Abstract
1 Preface 1
2 Mathematical Prerequisites 5
2.1 The Theory of Numbers . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 The Set Z . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 Polynomials in Z, Q and C . . . . . . . . . . . . . . . . 7
2.2 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Fields and Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Galois Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6 Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6.2 The Weierstra Form . . . . . . . . . . . . . . . . . . . . 13
5 Factorization Methods 41
5.1 Special Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.1.1 Trial division . . . . . . . . . . . . . . . . . . . . . . . . 42
5.1.2 Pollards p 1 method . . . . . . . . . . . . . . . . . . . 43
5.1.3 Pollards method . . . . . . . . . . . . . . . . . . . . . 44
5.1.4 Elliptic Curve Method (ECM) . . . . . . . . . . . . . . . 45
5.2 General Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.1 Congruent Squares . . . . . . . . . . . . . . . . . . . . . 47
5.2.2 Continued Fractions (CFRAC) . . . . . . . . . . . . . . 48
Integer Factorization v
Per Leslie Jensen
CONTENTS
7 Implementing GNFS 75
7.1 pGNFS - A GNFS Implementation . . . . . . . . . . . . . . . . 76
7.1.1 Input/Output . . . . . . . . . . . . . . . . . . . . . . . . 77
7.1.2 Polynomial Selection . . . . . . . . . . . . . . . . . . . . 77
7.1.3 Factor Bases . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.1.4 Sieving . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.1.5 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . 78
7.1.6 Square Root . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.1.7 Example of Input . . . . . . . . . . . . . . . . . . . . . . 79
7.2 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
vi Integer Factorization
Per Leslie Jensen
CHAPTER 1
Preface
Unfortunately what is little recognized is that the most worthwhile scientific
books are those in which the author clearly indicates what he does not
know; for an author most hurts his readers by concealing difficulties.
- Evariste Galois(1811-1832)
In 1801 Gauss identified primality testing and integer factorization as the two
most fundamental problems in his Disquisitiones Arithmeticae[29] and they
have been the prime subject of many mathematicians work ever since and
with the introduction of public key cryptography in the late 1970s it is more
important than ever.
This thesis is a guide to integer factorization and especially the fastest gen-
eral purpose algorithm known: the number field sieve. I will describe the num-
ber field sieve algorithm so it can be implemented from this description. To
convince the reader I have implemented the algorithm from my description
and have made the implementation public available on the Internet1 so it can
be used as a reference and maybe be improved further by anyone interested.
Even before Gauss, the problem of factoring integers and verifying primes
were a hot topic, but it seemed intractable and no one could find a feasible
way to solve these problems and to this day it still takes a large amount of
computational power.
Till this day the problem of factoring integers is still not solved, which
means there is no deterministic polynomial time algorithm for factoring inte-
gers. The primality testing problem has just recently been solved by Agrawal,
Kayal and Saxena[4].
Integer factorization has since the introduction of public key cryptosys-
tems in 1977 been even more important because many cryptosystems rely on
the difficulty of factoring integers (large integers). This means that a very
1
Implementation available from http://www.pgnfs.org
Integer Factorization 1
Per Leslie Jensen
CHAPTER 1. PREFACE
fast factoring algorithm would make for example our home banking and en-
crypted communication over the Internet insecure.
The fastest known algorithm for the type of integers which cryptosystems
rely upon is the general number field sieve, it is an extremely complex algorithm
and to understand why and how it works one needs to be very knowledge-
able of the mathematical subjects of algebra and number theory.
This means that the algorithm is not accessible to a large audience and to
this point is has only been adopted by a small group of people who have been
involved in the development of the algorithm, that is: until now. . .
This thesis has the first algorithmic description of all the steps of the num-
ber field sieve and it has resulted in a working implementation of the algo-
rithm made publicly available.
The reader should know that it is a large task to implement all the steps
from scratch, and here you should benefit from my work by having a refer-
ence implementation and maybe start by trying to implement a part of the
algorithm and put it into my framework. My implementation is probably not
as fast and optimal as it could be but it is readable and modular so it is not to
complex to interchange some of the steps with other implementations.
The implementation here has been developed over several months but it
helped the understanding of the algorithm and have given me the knowledge
to write the algorithms in a way so a computer scientist can implement it
without being a math wizard.
The thesis consists of the following Chapters:
Chapter 2 gives the fundamental algebra for the rest of the paper, this
Chapter can be skipped if you know your algebra or it can be used as a dic-
tionary for the algebraic theory used.
In Chapter 3 I will take a trip trough the history of integer factorization
and pinpoint the periods where the development took drastic leaps, to un-
derstand the ideas behind todays algorithms you need to know the paths the
ideas has travelled on.
The motivation for this paper is the link to cryptography and in Chapter
4 I will give an introduction to public-key cryptography and take a more
detailed look on the RSA scheme, which is based on the difficulty of factoring
integers and it is still the most used public-key cryptosystem today.
In Chapter 5 I will take a closer look at some of the algorithms mentioned
in Chapter 3 and describe the algorithms leading to the number field sieve.
The main contribution in this paper is Chapter 6 and 7. In Chapter 6 I
will describe the number field sieve algorithm step by step, and write it on
algorithmic form usable for implementation. In Chapter 7 I will describe
my implementation of the number field sieve, and give some pointers on
implementation specific issues of the algorithm.
Acknowledgments
I would like to thank Stig Skelboe and Jens Damgaard Andersen for taking the
chance and letting me write this paper.
2 Integer Factorization
Per Leslie Jensen
I am grateful for valuable help from Prof. Daniel J. Bernstein with the square
root step.
Thanks to my parents and friends for supporting me through the long pe-
riod I have worked on this, and for understanding my strange work process.
Thanks to Bue Petersen for helping with the printing and binding of this
thesis.
Special thanks to Jacob Grue Simonsen for valuable discussions on- and off-
topic and for proofreading the paper.
And a very special thanks to Katrine Hommelhoff Jensen for keeping me
company many a Wednesday and for keeping my thoughts from wandering
to far from the work at hand.
Integer Factorization 3
Per Leslie Jensen
CHAPTER 1. PREFACE
4 Integer Factorization
Per Leslie Jensen
CHAPTER 2
Mathematical Prerequisites
God made the integers, all else is the work of man.
- Leopold Kronecker(1823-1891)
The main purpose of this chapter is to present the algebra and number theory
used in this paper.
The theorems are presented without proofs. For proofs see [5] and [69].
The reader is expected to have some insight into the world of algebra, con-
cepts like groups and fields are refreshed here, but the theory behind should
be well known to the reader. For a complete understanding of the inner work-
ings of the algebra, a somewhat deeper mathematical insight is needed.
If you have a mathematical background this chapter can be skipped.
Integer Factorization 5
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
The next definition with the accompanying lemmas are the work of Euler,
Leonhard Euler and they help us get to Eulers theorem and Fermats little theorem.
(1707-1783) is without
comparison the most
productive Definition 2.4 The Euler phi function
mathematician ever. Let n be a positive integer. (n) equals the number of positive integers less
He worked in every than or equal to n, that are relative prime to n.
field of mathematics
and in the words of
Laplace: Read Euler, Lemma 2.1 If p is a positive prime number, then (p) = p 1
Read Euler, he is the
master of us all.
Lemma 2.2 For p and q distinct primes, (pq) = (p 1)(q 1)
For a view of the man
and his mathematics,
Lemma 2.3 If p is a prime and k is a positive integer, then (pk ) = pk pk1
see [24].
Pierre de Fermat Lemma 2.4 If n > 1 has prime factorization n = pk11 pk22 pkr r , then (n) =
(1601-1665) is
considered to be the
(pk11 pk11 1 )(pk22 pk22 1 ) (pkr r pkr r 1 )
biggest hobby
mathematicians of all Lemma 2.5 a is an integer and p 6= q are primes and gcd(a, p) = gcd(a, q) = 1
times, he was a lawyer
then a(p1)(q1) 1 (mod pq)
of education but used
his spare time on
mathematics. Fermats Theorem 2.2 B-smooth
last theorem is one of If n is a positive integer and all its prime factors are smaller than B then n is
the most famous
theorems ever, it
called B-smooth.
remained unproven
for more than 300 By now we have the theory behind Eulers Theorem and Fermats little
years. theorem, and these can now be stated.
The story of Fermats
last theorem is told Theorem 2.3 Eulers Theorem
beautifully in [70]. If n is a positive integer with gcd(a, n) = 1, then a(n) 1 (mod n)
6 Integer Factorization
Per Leslie Jensen
2.2. GROUPS
2.2 Groups
Until now we have looked at elements from sets, and then applied opera-
tions to them, both addition and multiplication. But when we look at ele-
ments from a set and only are interested in one operation, then were actually
looking at a group.
A group G is a set with an operation which satisfies
a, b, c G : (a b) c = a (b c)
e G a G : e a = a e = a
Existence of inverses:
a G a1 : a a1 = a1 a = e
Integer Factorization 7
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
a, b G : a b = b a
A finite group is a group with a finite order. Unless otherwise stated the
groups we are working with are finite.
xHx1 = H
2.3 Mappings
A mapping : G 7 H takes an element g G and maps it to (g) H.
Mappings can be defined for groups, fields and rings.
8 Integer Factorization
Per Leslie Jensen
2.4. FIELDS AND RINGS
Zn
= Zn1 Zn1 Znm
When we look upon the properties for a field, we can see that the set of
rational numbers Q is a field, and so is C. One of the fields of great interest
for cryptography over the years is the field Z/Zp of integers modulo a prime
number p, which is denoted Fp .
Another interesting property of a field is that we can use it to create a
finite group, if we have the field Fp = Z/Zp, we can create the group G =
Fp = (Z/Zp) , which is the group of integers relatively prime with p (the set
1, . . . , p 1) and with multiplication modulo p as operator.
Often these properties are not fulfilled and therefore there is another strong
object in abstract algebra, and that is a ring.
A ring F is a set with all the same properties as a field, except one or more
of the following properties are not fulfilled:
Integer Factorization 9
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
1. (a + b) = (a) (b)
2. (a b) = (a) (b)
From the definition above it is clear that the splitting field G of f over the
field F equals F(r1 r2 . . . rn ), where r1 . . . rn are the roots of f in F.
The next few theorems appear to belong in Section but they are used on
elements from fields, so it is justified to have them here instead. The reci-
procity laws are considered among the greatest laws in number theory and
Gauss even called them the golden theorems. For a very thorough look at the
history and theory of reciprocity laws see [38].
10 Integer Factorization
Per Leslie Jensen
2.5. GALOIS THEORY
A note on the last two definitions. In typographic terms the Jacobi and
Legendre symbols are the same, but remember that they are not the same.
Legendre is modulo a prime and Jacobi is modulo a composite.
The reader have probably noticed that none of these symbols works for
quadratic residues modulo 2. This is because 2 is a special case and all num-
bers on the form 8n + 1 and 8n 1 for n Z are quadratic residues modulo
2.
Integer Factorization 11
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
As a consequence of Galois work, it became clear that all finite fields con-
tains pn elements, where p is a prime and n some integer. These fields can
all be constructed as a splitting field for tn t over Zp . This discovery is the
reason for naming finite fields after the great Evariste Galois.
The definition of a Galois Field adds another name for a finite field, so we
actually have three names for a finite field: Zp , Fp and GF(p) in this thesis a
finite field is denoted GF(p).
Although the most known terms are Galois group and Galois Field, Galois
main work was on the theory of radical extensions and solvable groups.
1 = G0 G1 G n = G
Although some of these definitions looks simple, they are very hard to
work with. One of the computational difficult subjects in mathematics is solv-
ing the Inverse Galois Problem, that is verifying that a given group is a Galois
group for some polynomial in a field extension. This is not relevant to this
thesis, but the interested reader should take a look at [34].
12 Integer Factorization
Per Leslie Jensen
2.6. ELLIPTIC CURVES
2.6.1 Introduction
Elliptic curves should not be confused with ellipses, even though they share
the same name. Elliptic curves are the result of a special kind of functions,
namely the elliptic functions.
0 y
-1.5 -1 -0.5 0 0.5 1
x
1
0.5
-1
y 0
-2 -1 0 1 2
x
-0.5
-1 -2
Figure 2.1: An ellipse is shown on the left and a projection of the elliptic
curve: y 2 + x3 x on the right
Integer Factorization 13
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
In the Weierstra form, the x and y cover a plane, and they can belong to
any kind of algebraic closure.
Interesting properties of a Weierstra equation is its discriminant and
its j-invariant j. In most literature on the subject, it is common practice to
use a few intermediate results to express and j, these are
b2 = a21 + 4a2
b4 = 2a4 + a1 a3
b6 = a23 + 4a6
b8 = a21 a6 + 4a2 a6 a1 a3 a4 + a2 a23 a24
c4 = b22 24b4
Definition 2.32 A curve is called singular if it has one or more singular points,
otherwise it is called non-singular.
Definition 2.33 Two elliptic curves defined over the same base field are iso-
morphic if their j-invariant are the same.
When using an elliptic curve over a Galois Field, we can estimate the num-
ber of points by using Hasses theorem, which was originally conjectured by
Artin in his thesis for general curves and proved later by Hasse for elliptic
curves. A proof can be found in [69].
14 Integer Factorization
Per Leslie Jensen
2.6. ELLIPTIC CURVES
q + 1 2 q |E/GF(q)| q + 1 + 2 q
t = |E/GF(q)| q 1
Rene Schoof created an algorithm for computing the points on elliptic cur-
ves[67], which uses Hasses theorem along with the Trace of Frobenius. The
Schoof algorithm have been improved later by Atkin.
Describing the algorithm is out of the scope of this paper, but the fact
that there exists algorithms for determining the number of points on ellip-
tic curves is important.
Corollary 2.2 Let E be an elliptic curve defined over GF(q), then E is super-
singular if and only if
t2 = 0 t2 = q t2 = 2q t2 = 3q t2 = 4q
Elliptic curves over a Galois Field work differently depending on the char-
acteristic of the field. The case of characteristic 2,3 and p > 3 have to be
treated separately.
2.6.2.1 Elliptic Curves Over GF(2n )
All elliptic curves over GF(2n ) are isomorphic to one of the two following
curves:
y 2 + y = x3 + a4 x + a6
or
y 2 + xy = x3 + a2 x2 + a6
Integer Factorization 15
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
E/GF(2n ) : y 2 + xy = x3 + a2 x2 + a6
= a6
1
j=
a6
E/GF(2n ) : y + a3 y = x3 + a4 x + a6
2
= a43
j=0
The latter curve is supersingular due to the well known result[27], that a
curve over GF(2n ) is supersingular when the j-invariant is 0.
2.6.2.2 Elliptic Curves Over GF(3n )
All elliptic curves over GF(3n ) are isomorphic to one of the two following
curves:
y 2 = x3 + a2 x2 + a6
or
y 2 = x3 + a4 x + a6
These curves lead to simplification of and the j-invariant
E/GF(3n ) : y 2 = x3 + a2 x2 + a6
= a32 a6
a2
j= 3
a6
E/GF(3n ) : y 2 = x3 + a4 x + a6
= a34
j=0
The latter curve is supersingular due to the well known result[27], that a
curve over GF(3n ) is supersingular when the j-invariant is 0.
2.6.2.3 Elliptic Curves Over GF(pn )
When looking at elliptic curves over prime fields, the equation can be reduced
even further. All elliptic curves over GF(pn ) for p > 3 a prime are isomorphic
to the following curve:
y 2 = x3 + a4 x + a6
This curve leads to simplification of the discriminant and the j-invariant
E/GF(pn ) : y 2 = x3 + a4 x + a6
= 16 4a34 + 27a26
4a34
j = 1728
4a34 + 27a26
16 Integer Factorization
Per Leslie Jensen
2.6. ELLIPTIC CURVES
Theorem 2.10 Points on an elliptic curve, E defined over GF(pn ), along with
a special addition, , forms an Abelian group G = (E, ), with O as identity
element.
This special addition is defined as follows. Let P = (x1 , y1 ),Q = (x2 , y2 )
E, where E is defined as in definition 2.30 and let O be the point at infinity,
then
OP =P O =P
O = O
P = (x1 , y1 a1 x1 a3 )
if Q = P then P Q = O
y 2 = x3 + a4 x + a6 (2.1)
= 4a34 + 27a26
Integer Factorization 17
Per Leslie Jensen
CHAPTER 2. MATHEMATICAL PREREQUISITES
Points on elliptic curves over finite rings behave similarly to elliptic curves
over fields, except for the finite number of points which lead to failure of the
addition operation and actually reveals a non trivial factor of n. This is the
reason elliptic curves over rings are used in primality proving algorithms.
When n = pq where p and q are primes of approximately the same sizes,
then the pseudo addition described above fails only if we actually have
factored n which we find as an unlikely situation. Therefore the points on an
elliptic curve defined over the ring Zpq , along with will define a pseudo
Abelian group.
The set of points on an elliptic curve defined over the ring Zpq will consti-
tute a group because it will be the direct product of two groups, and from The
Chinese Remainder Theorem we get:
GE/Zpq
= GE/Zp GE/Zq = GE/GF (p) GE/GF (q)
18 Integer Factorization
Per Leslie Jensen
CHAPTER 3
I believe in the importance of studying the origins of the theory one uses. I
will give an overview of the turning points in the history of integer factoriza-
tion which eventually lead to the number field sieve. For a complete reference
on the history of primality testing and integer factorization I refer to [77] and
[62] where references to the original articles can be found.
The history reflects that integer factorization requires great computational
powers compared to primality testing which does not require the same com-
putations, and the history of primality testing starts earlier and in 1876 Lucas
was able to verify the primality of a 39 digit number in comparison it was not
until 1970 and by using computers that Morrison and Brillhart were able to
factor a 39 digit composite integer.
Before Fermat
The concept of factoring numbers into primes has been around since Euclid
defined what primes are and the idea of unique factorization with the funda-
mental theorem of arithmetic 300 B.C.
Early development of mathematics was mainly driven by its use in busi-
ness or in general life and factoring of integers do not have a use in neither
business nor everyday life so its development have always been driven by
theoretical interest but since the late 1970s it is not driven by theoretical in-
terest but by security interest.
There is no indications that any methods other than trial division existed
before the time of Fermat and so my recap of the history of factoring integers
starts with Pierre de Fermat.
Integer Factorization 19
Per Leslie Jensen
CHAPTER 3. THE HISTORY OF INTEGER FACTORIZATION
Fermat ( 1640)
In 1643 Fermat showed some interesting ideas for factoring integers, ideas
that can be traced all the way to the fastest methods today. Fermats idea was
to write an integer as the difference between two square numbers, i.e. he tried
to find integers x and y such that the composite integer n could be written as
n = x2 y 2 thereby revealing the factors (x + y) and (x y). Fermats method
is very fast if the factors of n is close otherwise it is very inefficient.
Euler ( 1750)
As the most productive mathematician ever Euler of course also had some
thoughts on integer factorization. He only looked at integers on special forms.
One of his methods is only applicable to integers that can be written on the
form n = a2 + Db2 in two different ways with the same D.
The method proceeds as Fermats method for finding the numbers that
are representable in the way described. He used the method to successfully
factor some large numbers for that time.
And ironically he also used his method to disprove one of Fermats theo-
rems.
Legendre ( 1798)
Legendre presented the idea that eventually would evolutionize factoring al-
gorithms.
Legendres theory of congruent squares which I will describe in details
in Chapter 5 is the core of all modern general factorization algorithms. At
the time of Legendre the lack of computational power limited the size of the
numbers that could be factored but Legendres idea would prove itself to
be the best method when computational power would be made available by
various machines and at the end by computers.
Gauss ( 1800)
At the same time that Legendre published his ideas was Gauss working on
the most important work on number theory. And in 1801 was Disquisitiones
Arithmeticae published which had several idea and methods for factoring in-
tegers.
Gauss method is complicated but can be compared to the sieve of Eras-
tothenes which means that it works by finding more and more quadratic
residues mod n and thereby excluding more and more primes for being pos-
sible factors, when there is a few elements left they are trial divided.
The idea of sieving elements is an important element of all modern factor-
ization method and I will show this in Chapter 5.
In the years following Legendre and Gauss no new ideas emerged. Even
20 Integer Factorization
Per Leslie Jensen
with the methods of Gauss and Legendre it was still hard to factor integers
with more than 10-15 digits and even that took days and a lot of paper and
ink, so mathematicians of the time did not waste their time doing manual
computations. It was not until someone decided to build a machine to do the
sieving that the problem of factoring was revisited.
Integer Factorization 21
Per Leslie Jensen
CHAPTER 3. THE HISTORY OF INTEGER FACTORIZATION
original method in 1984, and in 1987 Lenstra took a completely different road
and developed a new method by using elliptic curves in his ECM algorithm,
which is specialized for composites with small factors.
31. August 1988 John Pollard sent a letter to A. M. Odlyzko with copies
to Richard P. Brent, J. Brillhart, H. W. Lenstra, C. P. Schnorr and H. Suyama,
outlining an idea of factoring a special class of numbers by using algebraic
number fields. Shortly after the number field sieve was implemented and
was generalized to be a general purpose algorithm.
The number field sieve is the most complex factoring algorithm but it is
also the fastest and has successfully factored a 512 bit composite.
Since the number field sieve arrived around 1990 there has not been any
big new ideas, only optimizations to the number field sieve.
22 Integer Factorization
Per Leslie Jensen
CHAPTER 4
4.1 Introduction
The word cipher has been used in many different contexts with many different
meanings. In this paper it means a function describing the encryption and
decryption process. A cryptosystem consists of a cipher along with one or
two keys. Cipher: any method of
transforming a message
Cryptosystems can be categorized into symmetric and asymmetric: to conceal its meaning.
The term is also used
symmetric cryptosystems use a shared secret in the encryption process synonymously with
as well as in the decryption process. Cryptosystems in this category cipher text or
cryptogram in reference
includes DES[74], AES[22], and Blowfish[66]. to the encrypted form of
the message.
asymmetric cryptosystems uses a publicly available key for encryption Encyklopdia Britannica
and a private key for decryption, this means that there is no need for
a shared secret. Asymmetric cryptosystems are also known as public-
key cryptosystems. Cryptosystems in this category includes RSA[64],
ElGamal[25], and Diffie-Hellman[23].
In this paper I will only look at public-key cryptography because they are
prone to number theoretic attacks like factoring, (for an introduction to all
kinds of cryptography see [74] or [43]).
4.1.1 How It All Started
Throughout history a lot of importance has been given to cryptography and
the breaking of these systems, a lot of historic researchers give cryptanalysts
a great deal of credit for the development of the second world war[81].
Integer Factorization 23
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
The history of public-key cryptosystems has been driven by the need for
distribution of common secrets, like keys for symmetric cryptosystems. The
increasing computational power has also had its share in the progress of these
systems. The history of public-key cryptography began in 1976, or so it was
believed until 1997 where a number of classified papers were made public
which showed that different people have had the same ideas.
One of the classified papers was [26], where James Ellis had stumbled upon
an old Bell Labs paper from 1944, which described a method for securing tele-
phone communication without no prior exchanging of keys.
James Ellis wrote in the paper of the possibility of finding a mathematical
formula which could be used. Three years later Clifford Cocks discovered the
first workable mathematical formula in [18], which was a special case of the
later discovered RSA, but like James Ellis paper, it was also classified.
A few months after Cocks paper, Malcolm Williamson made a report [80]
where he discovered a mathematical expression and described its usage as
very similar to the later discovered key exchange method by Diffie and Hell-
man.
Ellis, Cocks and Williamson didnt get the honor of being the inventors
of public-key cryptography because of the secrecy of their work, and who
knows if there was someone before them ?
It was from the academic world the first publicly available ideas came.
Ralph Merkle from Berkeley defined the possibility of public-key encryption
and the concepts were refined by Whitfield Diffie and Martin Hellman from
Stanford University in 1976 with their ground-breaking paper [23].
The paper revealed ideas like key exchanging which is still widely used
today along with some mathematical ideas which could lead to usable func-
tions.
Three researchers at MIT read the paper by Diffie and Hellman and de-
cided to find a practical mathematical formula to use in public-key cryptosys-
tems and they found it. The researchers where Ron Rivest, Idi Shamir and Len
Adleman and they invented the RSA cryptosystem with [64]. At the same
time, Robert James McEliece invented another public-key cryptosystem based
on algebraic error codes equivalent to the subset sum problem in the paper [41].
The two papers ([64],[41]) started a wave of public-key cryptosystems. In
1985 ElGamal defined , in the paper [25], a public-key cryptosystem based on
the discrete logarithm problem.
Since the first paper on public-key cryptography a lot of different systems
have been proposed, a lot of the proposed systems are similar but some of
them present really unique ideas. The only thing they all have in common is
their algebraic structure.
For a complete, and quite entertaining, view at the birth of public-key
cryptography, the interested reader should turn to the book [40], which takes
the reader into the minds of the creators of the systems that changed private
communication forever.
24 Integer Factorization
Per Leslie Jensen
4.1. INTRODUCTION
A cipher, K are the needed functions and parameters, like public and private
key.
Integer Factorization 25
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
4.2 RSA
The RSA paper in 1978[64] proposed a working system displaying the ideas
of Whit Diffie and Martin Hellman.
RSA is based on a very simple number theoretic problem and so are a lot
of the public-key cryptosystems known today.
In general the public-key cryptosystems known today based on number
theoretic problems can be categorized into three groups:
26 Integer Factorization
Per Leslie Jensen
4.2. RSA
We are mainly interested in the ones based on factoring, but in Section 4.3
I will give a short introduction to discrete logarithm based schemes because
the problem can be solved by factoring aswell.
I will use RSA as an example of a factoring based scheme afterwards I
will briefly discuss other factoring based algorithms by formalizing the RSA
scheme.
To use RSA encryption, one has to have a public and a private key, these
are created before using the scheme.
Setting up RSA:
2. calculate n = pq
3. calculate (n) = (p 1) (q 1)
The pair (e, n) is the public key and (d, n) is the private key
To encrypt a plain text M into the cipher text C:
To make this scheme more clear let us look at a short example: Alice wants
to talk secretly with Bob , she obtains Bob s public key (eB , nB ) from some
authority. She then calculates C = MeB (mod nB ), where M is the message
she wants to give him. She sends the encrypted message C to Bob .
Bob then decrypts the message from Alice using his private key (dB , nB )
in M = C dB (mod nB ) and can now read the secret message from Alice.
Integer Factorization 27
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
4.2.1 Correctness
The reader can surely see the beauty in this simple system. The RSA scheme
consists of a permutation polynomial and its inverse. We have just seen how
these polynomials look like, now we will verify that they work as expected.
The selection of e as the multiplicative inverse of d and gcd(d(n)) = 1
ensure us that
ed 1 (mod (n))
ed = k(n) + 1
Me (mod n)
(M e )d (mod n) = M ed (mod n)
k(n)+1
= M (mod n)
Fermats Little Theorem (Theorem 2.4) implies that for any prime p and for
all M where p - M
0 0
M k (p1) 1 (mod p) so M k (p1)+1 M (mod p)
for any positive integer k 0 , the last one even works for all M .
using both p and q we get
0
M k (p1)+1 M (mod p)
k0 (q1)+1
M M (mod q)
28 Integer Factorization
Per Leslie Jensen
4.2. RSA
Integer Factorization 29
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
We will take a look at some of the schemes proposed since the original pa-
per. We will not cover all RSA-like systems but look at some representatives
of these.
The RSA-like cryptosystems known today can be categorized in the obvi-
ous two categories:
30 Integer Factorization
Per Leslie Jensen
4.2. RSA
E(x) = e x
D(x) = d x
Where x En (0, b) and multiplying this with an integer i means using the
operation i times on x.
The KMOV scheme can certainly be called RSA-like, it works with the
group of points on the elliptic curve along with a point at infinity.
The encryption and decryption functions are indeed permutation polyno-
mials as they are bijective. The KMOV scheme is a good example of variation
of the RSA scheme, and using elliptic curves in RSA-like encryption.
In 1993 Nicholas Demytko presented another RSA variant using elliptic curves,
it extended the ideas of the KMOV people and the KMOV scheme can be ob-
tained as a special case of the Demytko scheme.
Demytko generalized the KMOV scheme so that different types of elliptic
curves can be used. The scheme is more complex than the KMOV scheme,
and it doesnt have the property of defining a bijective encryption and de-
cryption function.
The Demytko scheme is more similar to the Williams scheme in the sense
that the decryption key is dependent of the message. It does contain a method
for selecting the correct decryption key. And the general Demytko scheme
can therefore not be called RSA-like.
It is possible to construct a message independent decryption in the De-
mytko scheme, but this would be identical to the KMOV scheme.
There have been more attempts to use elliptic curves in RSA-like schemes
but many of these have been reduced to the KMOV or Demytko scheme. In
Section 4.2.4 we will take a look at the advantages/disadvantages in using
elliptic curves in this setting.
4.2.3.3 Dickson polynomials and Lucas sequences
In 1981 Nobauer and Muller
suggested using Dickson polynomials in public-
key cryptography[50]. In 1986 they proposed a scheme based on Dickson
polynomials called the Dickson scheme[51].
In 1993 Peter J. Smith and Michael J. J. Lennon designed a new public-key
scheme, named LUC, based on Lucas sequences[72].
Apparently it was not until late in the development they saw the work of
Nobauer and Muller.
The Lucas functions used by Smith and Lennon in LUC,
are identical to the Dickson polynomials used by Nobauer and Muller
in the
Dickson scheme.
A Dickson polynomial is a polynomial over Z on the form:
bk/2c
X k ki
gk (a, x) = (a)i xk2i (4.1)
ki i
i=0
Integer Factorization 31
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
Nobauer and Muller showed that Dickson polynomials with certain param-
eters are permutation polynomials. The LUC scheme utilizes this by using
polynomials with a = 1.
At first sight LUC has message dependent decryption keys, but it is possi-
ble to construct a message independent key at the cost of longer key length.
The LUC scheme works on elements from the ring Zn . The scheme seems
somewhat more computational expensive, but in an implementation a lot of
tricks can be applied so the overhead is minimal compared to the original
RSA scheme.
4.2.4 Security
A cryptosystem is obviously unusable if it does not provide some kind of
security. The concept of security is not easy to define and depends on the
usage and environment.
The security of a public-key cryptosystem lies in the impossibility for a
person to decrypt a message not intended for him/her. In other words there
should be no way to discover the plain text without the private key.
In public-key cryptography this is assured by using some well known
hard to solve problem as a base. Some cryptosystems are provably as
hard to break as some known problem, and others are supposedly as hard
to break as some other known problem. RSA is in the last category because
it is believed that breaking RSA is as hard as factoring large integers, but not
proved.
Like RSA not being proved to be as hard as factoring integers, neither is
factoring integers proved to lie in some hard complexity class like NP, which
leads to the question:
One important test that RSA has passed with flying colors is the test of
time. It was the first usable public-key cryptosystem implemented and has
for many years been the de facto standard of public-key cryptography. Be-
cause it is so widespread it has been the subject of many studies.
RSA has not been broken or compromised in any way, research has found
properties that users and implementers should be aware of but no way to
recover the plain text of a correctly RSA encrypted cipher text.
Now we will take a look at some of the attacks and weaknesses discovered
through the last 25 years of cryptanalysis on the RSA and RSA-like schemes.
The attacks/weaknesses can be categorized into four groups
implementation
implementations can have security holes for a possible attacker!
homomorphic
utilizing the homomorphic structure of the scheme!
parameters
some bad choices of parameters can compromise the cryptosystem!
32 Integer Factorization
Per Leslie Jensen
4.2. RSA
factoring
factoring the modulus and thereby breaking the system!
Lets take a look at some of the attacks in these groups.
4.2.4.1 Implementation attacks
When implementing an algorithm on small and simple devices there are
some attacks to be aware of.
Implementation attacks use knowledge of the implementation to find weak-
nesses. In this category are timing attacks and power analysis attacks.
Many of the timing attacks proposed work on the signature scheme of
RSA, and since we are not investigating the signature scheme in this paper
we will not go into details of these attacks, but give an overview of how they
work.
Timing attacks
Timing attacks are attacks based on measuring the time a given operation
takes. Most of the attacks proposed on RSA use signature generation to attack
the implementation. As we are not concerned with signatures in this paper
we will not go into details with these attack but give an overview of them.
Kocher[36] showed that by measuring the time it takes a smart card to per-
form a RSA operation (decryption or signature), an attacker can discover the
private key.
The attack works by the attacker choosing some plain texts which is then
signed or encrypted. The corresponding time for a signing/decryption is
measured. Kocher showed that an attacker can use these pairs to find some of
the bits in the decryption key (the private key), starting from the least signif-
icant bit.
The obvious way to prevent against a timing attack is making all oper-
ations take the same time, this is done by inserting delays assuring that all
operations take the same time, but making software running in fixed time is
difficult because of the different key sizes.
Defending against timing attacks in this way is far from optimal, especially
because of the difficulty in assuring that operations takes fixed time. And
attempts of implementing a fixed time implementation often results in very
slow software and will also be open for power attacks.
There is another method for defending against timing attacks and that is
blinding.
Blinding is a technique used for blinding signatures proposed by Chaum in
his 82 paper [17].
The method works on a smart card doing decryption of the cipher text C
with public key (e, n) and private key (d, n) obtaining the plain text M in the
following way:
1. the smart card chooses some random r Zn
2. the smart card then computes C 0 = C re mod n
3. the smart card then computes M0 = (C 0 )d mod n
Integer Factorization 33
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
In this way the attack is not possible because the decryption works on a
text unknown to the attacker because of the transformation C 0 = C re mod n.
Power analysis
Instead of measuring the time a given operation takes, one can measure the
power used. This is called power analysis. Like timing attacks one can figure
out the bits of the private key when measuring the power consumption under
some given operation.
Power analysis can only be used on units where it is possible to measure
power consumption like smart cards and other small units which do not do
any other computations at the same time, it is not possible to do power anal-
ysis on a PC with a modern operating system at it runs many processes si-
multaneously. See [35] for details on power analysis and especially differential
power analysis.
An obvious prevention against power analysis of all sorts is to prevent an
attacker in gaining close enough access to the device, this may not be possible
to assure for example in smart cards it would also involve an increase in costs.
Another way to protect against an attacker is to add signal noise to the
power channel so that an attacker would need infeasibly many samples and
not be able to make an attempt to break the code.
To really defend against this attack the algorithms have to be constructed
with the target architecture in mind.
Other
There are other measuring techniques, like electromagnetic radiation and su-
per conducting quantum imaging devices. They can all threaten an imple-
mentation if the implementer is not aware of the possible holes an attacker
can use.
All of the measuring attacks can only be applied when the attacker has
physical access to the working devices. The easiest way to prevent those
attacks is to make sure that it is not possible to gain physical access to the
operating machine.
4.2.4.2 Attacking the homomorphic structure of RSA
The permutation polynomial used in the RSA scheme has the following prop-
erty
(m1 m2 )e me1 me2 c1 c2 (mod n)
which is called the homomorphic structure of RSA.
This property can be used to mount an adaptive chosen cipher text attack
which has a lot in common with the blinding technique described previously.
The adaptive chosen cipher text attack utilizes the homomorphic property by
blinding a cipher text and getting the intended user to decrypt it.
The attacker Marvin wants to decrypt the cipher text C intended for Alice.
Alice will not decrypt C but is willing to decrypt other messages(for example
for using as a signature). Marvin then computes C 0 = C xe mod n where
34 Integer Factorization
Per Leslie Jensen
4.2. RSA
x Zn is randomly chosen and (e, n) is Alices public key. Marvin then gets
Alice to decrypt C 0 returning M0 = (C 0 )d mod n. Marvin can then get the
plain text of C by computing M = C 0 /x mod n, since
M0 (C 0 )d C d (xe )d Mx (mod n)
This attack can be prevented by being more restrictive with what is de-
crypted and returned, in practice this attack is only possible on very naive
users and a badly implemented RSA algorithm.
4.2.4.3 Choosing Parameters
Some of the first attacks on RSA were on systems based on the wrong selec-
tion of parameters. There is some properties of the parameters that an user
of RSA has to be aware of.
In the key generation phase the user has to make sure that the keys gen-
erated satisfy some certain properties. This is because a series of attacks over
the years have revealed that certain keys are easier to break than others.
We will now give an overview of attacks on the key structure and give
a guide of how to generate keys for RSA that are not vulnerable for these
attacks.
Small decryption exponent d
In 1989 Martin Wiener showed that using a small decryption exponent was
not wise[76]. Wiener showed that using continued fractions it was possible to
reveal a small decryption exponent. He showed that decryption exponents
smaller than n1/4 , where n is the modulus was unsafe, he also proposed that
this bound was not minimal.
In 2000 Boneh and Durfee changed the bound to n0.292 in [9]. They used a
similar approach. In the paper they suggested that the precise bound could
be upwards n1/2 .
Because of the Wiener attack one should choose d > n1/2 , where n is the
modulus.
The Wiener attach has been extended to the elliptic variants of RSA as well
as the variants utilizing Dickson polynomials, like LUC, in [54].
Small encryption exponent e
When it is possible to attack the use of a small decryption exponent, it seems
obvious that an attack on a small encryption exponent should be possible.
Attacks on a small encryption exponent is not as forwarded as the use of
continued fractions in the attack on a small decryption exponent.
An attack on a small encryption exponent seems only to be possible when
the attacker can obtain some messages that are linear dependent.
Hastad suggested an attack on a RSA protocol using random data pad-
ding[33]. Many RSA protocol implementations use some padding of the data
with random data, but when padding some messages in the same way, ie.
with the same amount of randomness in the same position, it results in a
series of polynomial related messages which can be put through standard
linear algebra giving the decryption key.
Integer Factorization 35
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
This theorem can, for example, be used on a series of messages, where only
a part of the messages change, like messages starting with the text: The tasks
of today is:. . . . Then the attacker solves the polynomial congruences of the
known text T and some unknown text x in the following way: (T +x)e C 0
(mod n), where (e, n) is the public key and C is the cipher text.
As seen is attacks on small encryption exponent dependent on protocol
implementation, and can be avoided by avoiding the protocol properties
mentioned, but for greater security one should use a not to small encryption
exponent.
Common parameters
One may never use the same parameters for cryptosystems. The use of the
same modulus for two different people with two different encryption expo-
nent enables an attacker to recover the plain text of a message sent to the two
people with the same modulus.
Let C1 and C2 be the two cipher texts of the plain text M. The two encryp-
tion exponents e1 and e2 will be relative prime and the attacker can then find
u, v such that ue1 + ve2 1 (mod n), with u, v the attacker can then recover
M:
C1u C2v Mue1 +ve2 M (mod n)
Analogous to this the use of a common encryption exponent will give the
attacker the possibility to recover the plain text by using the The Chinese Re-
mainder Theorem.
The important thing to notice is that none of these attacks enables the at-
tacker to discover the decryption exponent, he can only discover the original
plain text.
Strong Primes
There has been many discussions of using so called strong primes in the
key generation phase.
strong prime is widely accepted as being defined as follows:
The argumentation for using strong primes in the key generation phase is
to prevent factoring of the modulus with specialized algorithms like Pollards
36 Integer Factorization
Per Leslie Jensen
4.3. DISCRETE LOGARITHMS
Integer Factorization 37
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
given some element Q in the cyclic subgroup generated by P , find the integer
x, such that xP = Q.
Definition 4.5 is the base for algorithms like ECDSA and Elliptic ElGamal.
Alice obtains Bobs public key from some public key server.
Encryption: Alice wants to encrypt a plain text M with the cipher
eB = (e, d, n). She starts by choosing a random integer k where
1 k p 2 And then encrypting the plain text M into the cipher
text C:
EKB (M) = C = (, ) = k , M (a )k mod p
The reason for choosing an elliptic version rather than the original Zp ver-
sion, is that the ECDLP is harder than DLP, and thereby being able to provide
the same security with smaller keys.
The difficulty of solving the discrete logarithm problem in some group,
depends on the underlying group.
General Attacks on the DLP
Because the Diffie-Hellman key exchange and other cryptographic schemes
based on the difficulty of the discrete logarithm problem is used in many
applications, there has always been a lot of research and development in al-
gorithms for solving the discrete logarithm problem.
This has resulted in subexponential algorithms for solving the DLP in
cyclic subgroups of finite fields.
The introduction of groups based on points on elliptic curves, resulted in
38 Integer Factorization
Per Leslie Jensen
4.3. DISCRETE LOGARITHMS
the elliptic curve discrete logarithm problem which has proven to be harder
to solve than its equivalent in cyclic groups of finite fields.
Algorithms for solving the discrete logarithm over an arbitrary group G
of order |G| are:
Daniel Shanks developed an algorithm known p as the Baby step -pGiant
Step method[68]. The algorithm requires O( |G|) steps and O( |G|)
workspace. The algorithm works without knowledge of the group or-
der. It can be improved slightly when working over elliptic curves, but
the algorithm is far to slow and requires a lot of storage when used on
problems over group with the size of order that is used in cryptography.
John M. Pollard is one of the most productive people in the area of fac-
torization and solving the discrete logarithm problem. He has devel-
oped many algorithms for factoring and solving discrete logarithms.
He is the developer of the Pollards method of solving the discrete loga-
rithm problem and the generalized Pollards method both presented in
[58]. Both algorithms are Monte-Carlo methods, meaning that they use
some random walk which can yield a solution, in case it does not the
algorithm takes another walk. The algorithms uses roughly the same
time as Shankss algorithm but they do not use as much space. This is
one of the methods used to solve instances of ECDLP, and of 2003 the
largest ECDLP instance solved with Pollards algorithm is for an el-
liptic curve over a 109 bit field. The Pollard method for factoring is
described in the next Chapter.
In 1978 Pohlig and Hellman proposed an algorithm[55] after the success
and acceptance of the Diffie-Hellman key exchange protocol. The algo-
rithm is fast but only when the group order has small prime divisors.
The algorithm only works when the group order is known in advance.
To avoid an attack with the Pohlig-Hellman algorithm one should make
sure that the group order has large prime factors or is of prime order.
Pohlig and Hellman has an analogues method for factoring.
Solving DLP over a Multiplicative Group
Solving the discrete logarithm problem of a multiplicative group of a finite
field, can be done in subexponential time by the algorithm called index calcu-
lus.
The index calculus is a method based upon the ideas of Western and Miller
[75]. The algorithm was developed independently by Adleman[3], Merkle[44]
and Pollard[58]. The algorithm exists in many versions, some generalized to
special fields like the very fast version by Coppersmith[19, 20] that is special-
ized to the fields GF(2n ).
The index calculus algorithms are identically to the modern general pur-
pose factoring algorithms which is presented in the next Chapter. It consists
of the same 3 basic steps.
The development of algorithms for solving discrete logarithms follows the
development of factoring algorithms and a fast method for factoring integers
Integer Factorization 39
Per Leslie Jensen
CHAPTER 4. PUBLIC KEY CRYPTOGRAPHY
40 Integer Factorization
Per Leslie Jensen
CHAPTER 5
Factorization Methods
The obvious mathematical breakthrough, would be development
of an easy way to factor large prime numbers.
- Bill Gates(1955 - )
Integer Factorization 41
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
are useless for our purpose, but for completeness I will give an idea of how
the most known special methods works some of them uses ideas that are in
use in general algorithms as well.
5.1.1 Trial division
Trial division is a fast method for small composites, and as its name indicates
it trial divides possible factors to see if the remainder is zero.
It does not fail for hard composites, it just takes a long time.
There is one major decision to make before implementing a trial division
algorithm, should we use primes or not ?
This means if we should use a list of primes for the divisions or we should
generate pseudo-primes on-the-fly. It takes a lot of time and storage to create
and store a list of primes, instead we can use the following numbers instead
2, 3, 6k 1 k Z, k > 0
This sequence covers all the primes[62], but also includes a few composites
which means that we may do a few more divisions than needed, but that is a
fair price for avoiding to create and store a list of primes.
There is different methods to generate the sequence described above, we
can start by 5 and then alternately add 2 and 4. I will use a variation of this in
the algorithm below. The algorithm described returns the first factor p found,
for a complete factorization it should be run again on n/p.
1. if n 0 mod 2 return p = 2
2. if n 0 mod 3 return p = 3
3. set p = 3
set b = 2
4. while p < n
(a) set p = p + b
(b) if n mod p = 0 return p
(c) set b = 6 b
Trial division is useless for composites with only large factors, but for
smaller composites it is definitely a usable algorithm and it is a good exer-
cise to implement it.
The time complexity of trial division is O( n), where n is the number to
be factored.
42 Integer Factorization
Per Leslie Jensen
5.1. SPECIAL ALGORITHMS
and therefore
ak(p1) 1 0 (mod p)
for any integer a and prime p, where p - a.
The algorithm can be described as
algorithm: Pollards p 1
2. select integer a
4. set p = gcd(a 1, n)
The Pollard p 1 method is not that effective unless one uses some smart
strategy for choosing as.
Integer Factorization 43
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
2. Find the period of the sequence, i.e. find i and j such that ai aj
mod p
algorithm: Pollards
1. set a = b = 2,c = 1
4. set d = gcd(a b, n)
6. if d = 1 goto 3
In 1980 Richard P. Brent refined the algorithm[11] and it is his version that
is commonly used. It is about 25% faster than the original Pollard version,
and it has been applied to some large composites like the 8th Fermat number2
in 1980, it took 2 hours on a UNIVAC.
The difference from the original version is in the way it detects cycles in
the sequence.
1
It gets the name because the iteration of the elements looks like a circle with a tail when
sketched.
n
2
The nth Fermat number is defined as Fn = 22 + 1, and is named after Pierre de Fermat
that postulated that Fn is prime for all n which it is not. So far only the first 5 Fermat numbers
(F0 , F1 , F2 , F3 , F4 ) are primes but it is not proved that they are the only Fermat primes. The
Fermat numbers F7 , F8 , F9 , F10 , F11 have shown to be hard composites and a worthy oppo-
nent for modern day factoring algorithms.
44 Integer Factorization
Per Leslie Jensen
5.1. SPECIAL ALGORITHMS
algorithm: Brent-Pollards
1. set r = q = p = 1,c = 1
4. while p = 1
(a) set x = y
(b) for i = 1 to r
i. set y = f (y)
(c) set k = 0
(d) while k 6= r and p = 1
i. set ys = y
ii. for i = 1 to min(m, r k)
A. set y = f (y)
set q = (q|x y|) mod n
iii. set p = gcd(q, n)
set k = k + m
(e) set r = 2r
5. if g = n
(a) while p = 1
i. set ys = f (ys )
set p = gcd(x ys , n)
Integer Factorization 45
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
The ECM method is a lot like Pollards p 1 method, actually it is the same,
but Lenstra generalized it and applied it to the group of points on an elliptic
curve.
Pollard worked in the multiplicative group of the finite field K, and Lenstra
uses a group based on the points of an elliptic curve, as described in Section
2.6.
The group operations of an elliptic curve group is more expensive, but it
gives the opportunity to use several groups at the same time. ECM have been
used in some big factorizations, see [12] for some results of the ECM.
I will give the algorithm here; for details of arithmetic with elliptic curves
and notation used see Section 2.6.
where
3x21 +a
(
d1
d2 = 2y1 mod n if P1 = P2
= d1 y1 y2
d2 = x1 x2 mod n otherwise
5. if c = O set p = gcd(d2 , n)
else goto 2
As with all other of the special algorithms one can add another phase
based on some continuation principle, i.e. what to do if it does not work
on the first run, and the ECM have been developed and different continua-
tion strategies have been proposed, for a summary of continuation strategies
see [12].
46 Integer Factorization
Per Leslie Jensen
5.2. GENERAL ALGORITHMS
x2 y 2 mod n , 0 x y n, x 6= y, x + y 6= n
x2 y 2 mod n
This implies
x2 y 2 mod n n|x2 y 2
n|(x y)(x + y)
x2 y 2 mod pq pq|x2 y 2
pq|(x y)(x + y)
p|(x y) or p|(x + y)
q|(x y) or q|(x + y)
Integer Factorization 47
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
Assuming that the combinations in the table are equally likely to happen,
we have a probability of 2/3 of getting a nontrivial factor of n. This means
that we probably only need to have a small number of pairs (x, y) that satisfy
Legendres Congruence in order to factor n.
Modern factoring methods all consists of the 3 basic steps
1. Identify a set of relations that are smooth over some factor base.
2. Solve the system of linear equations and find the relations that yields
squares.
These are the steps that the modern factoring methods uses, the differ-
ence lies only in how they find the pairs of integers that satisfy Legendres
Congruence.
5.2.2 Continued Fractions (CFRAC)
Legendre himself had a method for factoring using the idea of congruent
squares, but it was Morrison and Brillhart that gave us the first modern meth-
od[48]. For many years it was the fastest algorithm for large composites, it
was first surpassed when the quadratic sieve emerged and later the number
field sieve.
The continued fractions method (CFRAC) is interesting because it uses a
completely different method for finding the desired squares than any of the
later methods.
CFRAC as it names indicates use continued fractions. A continued frac-
tion is a number representation and is used to represent real numbers with
integers.
Any real number x can be represented by a general continued fraction on
the form
b1
x = a0 + b2
a1 + b3
a2 + a +...
3
48 Integer Factorization
Per Leslie Jensen
5.2. GENERAL ALGORITHMS
x = [a0 , a1 , a2 , a3 , . . .]
Here is an algorithm for finding the continued fraction of a real number.
x0 = x
1
q0 = bx0 c x1 = x0 q 0
1
q1 = bx1 c x2 = x1 q1
.. ..
. .
1
qi = bxi c xi+1 = xi q i
.. ..
. .
Theorem 5.1 The square root n of a square free integer n has a periodic
continued fraction of the form
n = [a0 , a1 , . . . , an , 2a0 ]
Furthermore we have 0 ai 2 n
I need one more tool before I can describe the CFRAC algorithm.
Integer Factorization 49
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
algorithm: CFRAC
The period of the continued fraction of n can beso short that enough
relations cannot be found. A way around this is to use kn instead, see more
in [60].
50 Integer Factorization
Per Leslie Jensen
5.2. GENERAL ALGORITHMS
We then compute Q(x1 ), Q(x2 ), . . . , Q(xk ) for some xi defined later. From
the evaluations of Q(xi ) we want a subset Q(xi1 )Q(xi2 ) Q(xir ) which is a
square y 2 Z.
From 5.2 we have Q(x) x 2 (mod n) so we get the wanted congruence
algorithm: QS
The quadratic sieve was for many years the fastest factoring algorithm for
large composites but has been superceeded by the number field sieve, but for
integers in the 60-110 digit range it still seems to be the fastest method.
Integer Factorization 51
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
52 Integer Factorization
Per Leslie Jensen
5.2. GENERAL ALGORITHMS
Y
y2 = (a + bm) : y2 Z
(a,b)S
Y
2 = (a + b) : 2 Z[]
(a,b)S
()2 = ()()
= ( 2 )
Y
= (a + b)
(a,b)S
Y
= ((a + b))
(a,b)S
Y
= (a + bm)
(a,b)S
= y2
N (a, b) = a + bm
Integer Factorization 53
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
()2 = ()()
= ( 2 )
Y
= f 0 ()2 (a + b)
(a,b)S
Y
= (f 0 ()2 ) ((a + b))
(a,b)S
Y
0 2
= f (m) (a + bm)
(a,b)S
2
= y
see [14] for further details.
One other consequence of sieving the way explained and assuming that
Z[] is a UFD is that we are not guaranteed that S actually is a square in Z[].
The AFB contains first degree prime ideals of Z[] and although the siev-
ing assures us that the resulting elements are AFB-smooth, the product is not
necessarily a perfect square in Z[]. To get past that obstacle the QCB is used.
Quadratic characters are introduced in the linear algebra step and they
give a higher probability of a product of elements from S being a perfect
square in Z[]. The higher the number of quadratic characters used the more
likely we are to arrive at a perfect square in Z[]. See [7] for a discussion of
the usage of quadratic characters where it is implied that 50-100 quadratic
chracters should suffice for any factorization.
Actually, there are a few more theoretic obstacles in assuming that Z[] is
UFD and the use of first degree prime ideals, but they are all solved by the
use of quadratic characters.
The reader should at this point be aware that one do not necessarily get the
wanted result from using the number field sieve, but using it correctly with
the right parameters has a very high probability of succeeding as described
in 5.2.1.
The next Chapter will take you through the algorithm step by step and will
also answer some of the unanswered questions from the Sections on CFRAC
and QS.
54 Integer Factorization
Per Leslie Jensen
5.3. FACTORING STRATEGY
1
3
3 2
Ln [ , c] = O ec+O(1)) log n (log log n)
3
q q
32
where c = 3
9 for the SNFS and c = 3 649 for the GNFS.
Integer Factorization 55
Per Leslie Jensen
CHAPTER 5. FACTORIZATION METHODS
This means that the security of RSA relies solely on the ability of factoring
large integers.
We can not say with certainty that choosing a 4096 bit key for signatures
would give enough security for x years. The past has shown a slow increase
in the bits of factorable numbers, so choosing 2048+ bit key sizes should give
some margin of security.
5.4.2 Factoring in The Future
What will the future of integer factorization bring ? This is impossible to pre-
dict but if we look at current development there is some research in Factor-
ization Circuits. Research has been done for many years on designing circuits
for factoring and the idea seems to be a hot topic. The main problem is the
price of such a circuit compared to mainstream PC architectures, D. J. Bern-
stein has hinted in a recent conference that he is going to present a new circuit
design with ECM as smoothness test on sieving elements in a number field
sieve implementation, so it is still a hot topic.
Computational power is increasing day for day and although there is far
to a PC being able to factor a 512 bit modulus in reasonable time there is still
the threat that an organization can build a large farm of cheap PCs and start
a code breaking service for the highest bidder. This would probably not be
used to invade the privacy of an individual but to obtain financial gains or in
a war or terror situation.
The last 10+ years, advances in factorization algorithms have all been
based on using smoothness and factor bases and it is very likely that one
have to choose another strategy to find even faster algorithms, hopefully a
polynomial time algorithm.
The polynomial time algorithm for testing primality[4] used some elemen-
tary number theory and is quite different from other primality testing algo-
rithms known. This could be what is needed for integer factorization unless
it is found to be NP-complete.
56 Integer Factorization
Per Leslie Jensen
CHAPTER 6
The reader is now aware of the various types of algorithms for factoring, and
know that the fastest algorithms are based on sieving over a well-defined
domain; actually one can refine it to say that the fastest algorithms calculates
congruences and solve systems of linear equations.
The number field sieve variants have the smallest time-complexity com-
pared to other factoring algorithms, and have also proved themselves to be
the best algorithms to use in real life for large composites.
The number field sieve shares many similarities with the quadratic sieve
described in Section 5.2.3, and can be seen as an expansion of the QS to utilize
polynomials of degree > 2.
The two main variants are the special number field sieve (SNFS) and the gen-
eral number field sieve (GNFS). The SNFS works on a special type of compos-
ites, namely integers on the form: re s, for small integers r, s and integer
e and the GNFS works on all types of composites. The difference between
SNFS and GNFS is in the polynomial selection part of the algorithm, where
the special numbers which SNFS can be applied to, make a special class of
polynomials especially attractive and the work in the square root step is also
more complex for the GNFS.
Integer Factorization 57
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
flow of the algorithm more digestable, but behind each step is a lot of calcu-
lations and different algorithms in interaction.
gcd(a, b) = 1
58 Integer Factorization
Per Leslie Jensen
6.2. IMPLEMENTING THE GNFS ALGORITHM
The outline above gives an idea of the flow in the algorithm. The steps
described above are nowhere near a description of the algorithm that can
be used for implementation. For implementation purposes the steps have
to be unfolded to a number of algorithms without any steps concealed in
mathematical properties and assumptions.
The steps described above are the ones I will refer to in the rest of this
chapter.
Definition 6.1 A polynomial f (x) has a good size property if the values taken
by f (x) are small. This can be achieved by selecting a polynomial with small
coefficients. The smaller the size the better the size property.
Definition 6.2 A polynomial f (x) is said to have good root properties if f (x)
has many roots modulo small primes.
There is no direct way to choose a good polynomial. The best method is
to assure some level of size and root properties and then create candidates
which are then tested and the one with the best yield is chosen.
The problem lies in finding good polynomials among the usable candi-
dates. Brian Antony Murphy has in his thesis[49] done a lot of work in poly-
nomial yield.
The polynomial f (x) has to have the following properties
1. is irreducible over Z[x]
Integer Factorization 59
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
The size property can be obtained by making the coefficients smaller, for
example by making the replacements
ai = ai m
ai+1 = ai+1 + 1
60 Integer Factorization
Per Leslie Jensen
6.2. IMPLEMENTING THE GNFS ALGORITHM
1 1
1. choose m0 such that bn d+1 c m0 dn d e
|ad |
2. choose interval of suitable ad s such that 1 < m0 < 2 , where
0 < 1 < 2 < 0.5.
6. return f (x).
The algebraic factor base AFB consists of pairs (p, r) such that f (r) 0
mod p, ie.
Integer Factorization 61
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
gcd(a, b) = 1
a + bm is smooth over the rational factor base
bdeg(f ) f (a/b) is smooth over the algebraic factor base
62 Integer Factorization
Per Leslie Jensen
6.2. IMPLEMENTING THE GNFS ALGORITHM
The elements (a, b) with algebraic norm divisible by element (p, r) from
AFB are those with a on the form a = br + kp for k Z.
With these properties in mind, we can take each element from the factor
base and remove its contribution from the elements that have it as a factor,
and this is the idea of the line sieving algorithm given here.
1. b = 0
2. rels = []
(a) set b = b + 1
(b) set a[i] = i + bm for i [C; C]
(c) foreach (p, r) RFB
4. return rels.
Integer Factorization 63
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
n
log = log(n) log(p)
p
So we avoid the many divisions of large integers, but there are some issues
one should be aware of under this optimization
The first problem can to some extent be handled by adding a fuzz factor to
the logarithms. If a norm is not RFB-smooth then it must have a factor which
is greater than the largest prime in RFB, so if we subtract log(max(RFB)) from
all the entries, we should catch more of the elements which are divisible by
prime powers without approving more wrong ones.
Both of the issues mentioned above imply that the elements found should
be trial divided to assure that they all are RFB- and AFB-smooth.
6.2.4 Linear Algebra
From the sieving step we have a list of (a, b)s which are RFB- and AFB-smooth
and we want to find a subset of these which yields a square, ie. we have to
find a combination of elements from the relation set which has a product that
is a square.
For a number to be a true square, the elements in its unique factorization
must have an even power, and this property is what we use to find elements
that are squares.
To clarify we can simply say that we have a list of numbers:
64 Integer Factorization
Per Leslie Jensen
6.2. IMPLEMENTING THE GNFS ALGORITHM
with the sign of the norm. Since we are only interested in the parity of the
power of the elements in the factorization (even/odd), we put a 1 if the el-
ement appears as an odd power in the factorization and a 0 if its even or
zero. This should be more clear from the algorithm below and the extended
example in 6.3.
This results in a matrix M over GF(2) with dimensions (#relations)
(#RFB + #AFB + #QCB + 1).
3. return M.
This matrix can then be put on reduced echelon form and from this we can
derive solutions which yield a square.
The process of solving linear systems by reducing a matrix of the system
has been thoroughly studied since the birth of algebra and the most com-
monly used method is Gaussian Elimination. It is fast and easy to implement
but has one drawback in this case; as the matrix is quite large, memory-usage
becomes a problem.
There exist variations of Gaussian Elimination which utilize block partion-
ing of the matrix, but often a completely different method is used, namely the
Block Lanczos method specialized for the GF(2) case by Peter Montgomery
in [46]. It is probabilistic but in both theory and practice the algorithm is
Integer Factorization 65
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
input: n m matrix M
output: M on reduced echelon form
1. set i = 1
2. while i n
4. return M.
66 Integer Factorization
Per Leslie Jensen
6.3. AN EXTENDED EXAMPLE
6. set k = 0
7. set k = k + 1
Rk (3SRk2 )
8. compute R2k = 2 mod p2k
Integer Factorization 67
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
I want to factor the number n = 3218147, which should be tested for pri-
mality before proceding with the factorization.
Even though I do not need the quadratic character base for the sieving
step, I construct it here. It consists of the same type of elements as the AFB,
but the elements are larger than the largest in the AFB. I choose to have 6
elements in the QCB
QCB = (233, 166) (233, 205) (233, 211) (281, 19) (281, 272) (281, 130)
68 Integer Factorization
Per Leslie Jensen
6.3. AN EXTENDED EXAMPLE
n = 3218147
m = 117
6.3.2 Sieving
Now I want to find elements that are RFB-smooth and AFB-smooth. This is
done by sieving, as described in 6.2.3. I start by selecting a linewidth for the
sieving, I choose to sieve with a [200; 200], then I follow the algorithm
from 6.2.3, which gives me the number of smooth elements wanted, e.g. the
pair (13, 7) has rational norm
Nrational (a, b) = a + bm
Nrational (13, 7) = 13 + 7 117
= 832
= 26 13
a
Nalgebraic (a, b) = (b)d f ( )
b
13
Nalgebraic (13, 7) = (7)3 f ( )
7
= 11685
= (1) 3 5 19 41
i.e. it is also smooth over the AFB, so (13, 7) is one of the desired elements. In
total I need at least 70 elements (#RFB+#AFB+#QCB+1). I find the following
72 elements
Integer Factorization 69
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
n = 3218147
m = 117
M[63] = 0 . . .
The next 16 entries are the factorization of Nrational (13, 7) over RFB, and we
only store the parity of the power, so I get
M[63] = 0000000010000000000 . . .
The next 47 entries are the factorization of Nalgebraic (13, 7) over AFB, and I
store only the parity of the power, so I get
70 Integer Factorization
Per Leslie Jensen
6.3. AN EXTENDED EXAMPLE
M[63] = 000000001000000000000101000001000010000000000000000000000000000000 . . .
M[63] = 000000001000000000000101000001000010000000000000000000000000000000100100
1010000001000000010001000100000000000000000100000000000000000000100101
2 3
6
6 1100000010000000000100001000110000000000000000000000000001000000100111 7
7
6
6 1101000000000000000010010000000110000000000010000000000000000000111101 7
7
6
6 1000000000000000010000000010100000000000000000000000000000100000110110 7
7
6
6 1110000000000000000000100001000000000000000000000010000000000000111100 7
7
6
6 0000000000000000000101010100000000000000000000000000000000000000100110 7
7
6
6 0110000000000000000000100000010000000100000000000000000000000100101111 7
7
6
6 0000000000000000010100100000100000000000000000000000000000000000011110 7
7
6
6 0010000100000000010001000001000000010000001000000000000000000000011110 7
7
6 .. 7
.
6 7
6 7
M=6
6 .. 7
7
6
6 . 7
7
6
6 0000001000000000000101000001000010000000000000000000000000000000100100 7
7
6
6 0000010010000000000010000000100000000010000000000000000000000000011111 7
7
6
6 0010000000010000010000000000000000000000010000000000000000000000001100 7
7
6
6 0001000000000000010010000000000001000000000001000000000000000000000111 7
7
6
6 0001000000000010000010000100000100000000000100000000000000000000010111 7
7
6
6 0000000001000010010001010000000000000000010010000000000000000000010100 7
7
6
6 0000100000001000000000100000000000000000000000000010001000000000100100 7
7
6
6 0100000000000000000000000010000010000010000000000000000000000000001111 7
7
4 0100100010000000000001000000000000000100000000000000100000000000000100 5
0000000000000000000000000000000000000000000000000000000000000000000000
Integer Factorization 71
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
100000000000000000000000000000000000000000000000000000000000000000100000
2 3
6 010000000000000000000000000000000000000000000000000000000000000000100000 7
001000000000000000000000000000000000000000000000000000001000010100100100
6 7
6 7
000100000000000000000000000000000000000000000000000000001000101101110100
6 7
6 7
6 7
6 000010000000000000000000000000000000000000000000000000000000111001010000 7
6 7
6
6 000001000000000000000000000000000000000000000000000000000001010010110000 7
7
6
6 000000100000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000010000000000000000000000000000000000000000000000000000111000010100 7
7
6
6 000000001000000000000000000000000000000000000000000000000001111110000100 7
7
6
6 000000000100000000000000000000000000000000000000000000000001111011101100 7
7
6
6 000000000010000000000000000000000000000000000000000000000001110101000000 7
7
6 .. 7
6
6 . 7
7
6
6 .. 7
7
6
6 . 7
7
6 000000000000000000000000000000000000000000000000000001000001110101111000 7
Me = 6
6 7
6 000000000000000000000000000000000000000000000000000000100001111101010000 7
7
6
6 000000000000000000000000000000000000000000000000000000010000000111001000 7
7
6
6 000000000000000000000000000000000000000000000000000000000100111001011000 7
7
6
6 000000000000000000000000000000000000000000000000000000000011111011101100 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000010 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
6
6 000000000000000000000000000000000000000000000000000000000000000000000000 7
7
4 000000000000000000000000000000000000000000000000000000000000000000000000 5
000000000000000000000000000000000000000000000000000000000000000000000000
From this I get a vector of free variables, i.e. elements which lead to a
solution by setting one or more of them to 1.
Vfree = [000000000000000000000000000000000000000000000000000000001001111111111101]
n = 3218147
m = 117
Matrix Me
72 Integer Factorization
Per Leslie Jensen
6.3. AN EXTENDED EXAMPLE
Vsol = [001100000000000100010101100000001111100110000011010110001000000000000000]
Now I apply the algorithm from 6.2.5, Ind I take it step by step.
I start by computing the product S(x) in Z[x]
p
S(m) f 0 (m)2 = 204996970067909038034618021120435457884160000 6786134884
= 1391137089692141371945604152740435826018211007037440000
= 1179464747117157958147891200
p
Y = S(m) f 0 (m)2 mod n
= 1179464747117157958147891200 mod 3218147
= 2860383
From now we need the product S(x) in Z[x]/f (x) instead of Z[x]
Integer Factorization 73
Per Leslie Jensen
CHAPTER 6. THE NUMBER FIELD SIEVE
p = 42 1042 + 43
I then choose a random r(x) Zp [x]/f (x)
r(x) = 33143395517908901720508889678332774413150964 x2 +
9935116049822899034547246528762346304828856 x +
37523378477365408863809265257916410422084120
As described in 6.2.5 I then calculate R0 + R1 y (Zp [x]/f (x))[y]/(y 2 S)
and get
R1 (x) = 40438443742646682162976058010107146767850928 x2 +
10975873976574477160776917631060091343250704 x +
25602452817775159059194687644564881010593683
This is usable, i.e. SR12 = 1 so I compute R2 and get
2
R2 (x) = 1484280452534851932191188732252856860031306910058907052137946073002617221365360609425453x +
1012438783385021395408772861725005923451102945520342680286858174520561778089965352712171x +
33707643386048967064886978071322595680303104670451605589553615208517742239145583274137
So I get
Salg (x) = S(x)R2 (x) mod p2
17805551987379270x2 + 460901612569132380 + 14073072352402140x
=
262144
ndivisor 1 = gcd(n, X Y)
= gcd(3218147, 484534 2860383)
= 17771
ndivisor 2 = gcd(n, Y X )
= gcd(3218147, 2860383 484534)
= 18112
So I have computed that n is the product of 1777 and 1811, and this is in
fact the prime factorization.
1
Birth year of the great Carl Friedrich Gauss(1777-1855)
2
Birth year of the legendary Evariste Galois(1811-1832)
74 Integer Factorization
Per Leslie Jensen
CHAPTER 7
Implementing GNFS
Machines take me by surprise with great frequency.
- Alan Turing(1912-1954)
In Chapter 6 I described the number field sieve algorithm in details and this
description has been used to implement the algorithm and the implementa-
tion has been used to correct the algorithms.
Implementing all the steps of the algorithm is a large task and it seems like
the few people who have made a working version have felt that their hard
work should not be accessible to everyone, and have thus kept the imple-
mentation to themselves. This is far from optimal as some interesting views
on the different parts could have emerged from making it publicly available.
There exist few public implementations, and all except one are unusable
for large factorizations because they are implemented with a lot of limitations
for example by predefining the degree of the polynomial and limiting the size
of the factor bases and the size of the numbers that can be tried.
There exists one public workable implementation1 written in C, but the
code is somewhat incomprehensible and is only used as-is instead of being
further developed by other than Chris himself. The code lacks an underly-
ing framework to make it more modularized and the interfaces between the
different steps are much to complex.
My implementation of the number field sieve has taken months to develop
and consulting previous work done on the number field sieve did not always
help me past the obstacles and especially the square root step involved much
more work than expected. As the days passed it was more and more clear
to me that this thesis was needed to make the number field sieve more di-
gestible.
My implementation has the following purposes
To show that the algorithms for the different steps in Chapter 6 work in
practice.
To be a general framework for number field sieve implementations, by
providing a clear interface between the various steps.
1
http://www.math.ttu.edu/cmonico/software/ggnfs/
Integer Factorization 75
Per Leslie Jensen
CHAPTER 7. IMPLEMENTING GNFS
kept a secret.
76 Integer Factorization
Per Leslie Jensen
7.1. PGNFS - A GNFS IMPLEMENTATION
7.1.1 Input/Output
As seen in Chapter 6 there are some variables to set and this mean that the
number field sieve is not a black box factoring algorithm. It is dependent
on choosing the right parameters and in Section 7.2 I will give an overview
of selecting parameters for the algorithm. The parameters cannot be found
algorithmic but only by emperically testing different values.
I have chosen to have one input file with values for all of the different
steps this makes it more easy to use and to move an instance of a problem to
another machine somewhere in between steps.
The values are put in a text file denoted input in the rest of this Chapter,
it is the only parameter the programs are called with.
The different steps results in some more data these are creates in new files
which are named after the input, i.e. if pgnfs polyselector is called with
the text file mycomposite.txt it results in a file mycomposite.txt.poly.
input holds the following values:
n The integer to be factored
afactor A desired co-factor of the leading coefficient
of the polynomial
pbound Prime bound for sieving experiments in the polynomial step
RFB bound Bound for primes in RFB.
AFB bound Bound for primes in AFB.
QCB num Number of desired elements in QCB.
pfile Absolute path of the file containing
at least all primes up to AFBbound
C Defining the sieving interval, i.e. sieving
(ai , bi ) where ai [C; C]
Summary
Program: pgnfs polyselector
Use file(s) : input
Generate file(s) : input.poly
Integer Factorization 77
Per Leslie Jensen
CHAPTER 7. IMPLEMENTING GNFS
which means that they are storage consuming but can easily be compressed
if they were to be distributed.
Summary
Program: pgnfs makebases
Use file(s) : input, input.poly
Generate file(s) : input.rfb,input.afb,input.qcb
7.1.4 Sieving
Line sieving as described in 6.2.3 is implemented in pgnfs sieve. It ap-
pends found relations to the file input.rel and current sieve line is stored
in input.line so the program can be terminated at any given time and
started again continuing from where it was stopped.
This also enables a primitive way of distributing the workload to multiple
machines as they can be given a different sieving line to start from and the
different results can then be merged.
Summary
Program: pgnfs sieve
Use file(s) : input, input.poly,input.rfb,
input.afb
Generate file(s) : input.line,input.rel
Summary
Program: pgnfs linearalg
Use file(s) : input,input.poly,input.rfb,
input.afb,input.qcb,input.rel
Generate file(s) : input.dep,input.mat
78 Integer Factorization
Per Leslie Jensen
7.2. PARAMETERS
The calculations use integer arithmetic with very large integers and should
continuously be reduced to avoid unnecessary large computations.
Summary
Program: pgnfs sqrt
Use file(s) : input,input.poly,input.rel,
input.mat,input.dep
Generate file(s) : stdout
3218147
12
1000
60
198
6
/home/pleslie/PROJECTS/Thesis/SRC/TEST/primes.txt
200
7.2 Parameters
Th number field sieve is, as mentioned, not a black box factoring machine.
It needs the right parameters, and these are not intuitively selected.
The values are empirically studied and guidelines for selecting parame-
ters are also dependent on the implementation. As described previously it is
the polynomial selected that dictates the success. A bad polynomial would
making the sieving step complicated as it does not generate smooth norms.
In the table below I have given parameters for successful factorizations
with pGNFS and parameters used for record factorizations by others. These
can be used as rough estimates for choosing the right parameters. The num-
bers are not necessarily the smallest that can be used, but they works.
name digits of RFB bound AFB bound QCB num C
n
ext-example 7 60 200 5 200
mytest10 10 100 300 10 1000
mytest20 20 400 1300 10 10000
mytest40 40 40000 100000 40 50000
mytest60 60 250000 700000 50 200000
RSA-140[16] 140 8000000 17000000 100 9000000000
RSA-155[15] 155 17000000 17000000 100 12000000000
Integer Factorization 79
Per Leslie Jensen
CHAPTER 7. IMPLEMENTING GNFS
80 Integer Factorization
Per Leslie Jensen
Bibliography
My method to overcome a difficulty is to go round it.
- George Polya(1887-1985)
Integer Factorization 81
Per Leslie Jensen
BIBLIOGRAPHY
[15] Stefania Cavallar, Bruce Dodson, Arjen K. Lenstra, Walter M. Lioen, Pe-
ter L. Montgomery, Brian Murphy, Herman te Riele, Karen Aardal,
Jeff Gilchrist, Gerard Guillerm, Paul C. Leyland, Joel Marchand,
Francois Morain, Alec Muffett, Chris Putnam, Craig Putnam, and
Paul Zimmermann, Factorization of a 512-bit RSA modulus, Theory and
Application of Cryptographic Techniques, 2000, pp. 118.
[16] Stefania Cavaller, Bruce Dodson, Arjen Lenstra, Paul Leyland, Walter
Lioen, Peter L. Montgomery, Brian Murphy, Herman te Riele, and
Paul Zimmermann, Factorization of rsa-140 using the number field sieve,
Proceedings of Asiacrypt 99 (1999).
[17] D. Chaum, Blind signatures for untraceable payments., CRYPTO 82 (1983),
199203.
[18] Clifford C. Cocks, A note on non-secret encryption, CESG report (classified
until december 1997) (1973).
[19] D. Coppersmith, Evaluating logarithms in gf(2n), Proc. 16th ACM Symp.
Theory of Computing (1984), 201207.
[20] , Fast evaluation of logarithms in fields of characteristic two, IEEE
Transactions on Information Theory 30 (1984), 587594.
[21] , Small solutions to polynomial equations, and low exponent rsa vul-
nerabilities., Journal of Cryptology 10 (1997), 223260.
[22] Joan Daemen and Vincent Rijmen, Rijndael for aes., AES Candidate Con-
ference, 2000, pp. 343348.
[23] Whitfield Diffie and Martin E. Hellman, New directions in cryptography,
IEEE Transactions on Information Theory IT-22 (1976), no. 6, 644
654.
[24] William Dunham, Euler - the master of us all, The Mathematical Associa-
tion Of America, 1999.
[25] T. ElGamal, A public-key cryptosystem and a signature scheme based on
discrete logarithms, IEEE Transactions on Information Theory IT-31
(1985), 1018.
[26] J. H. Ellis, The possibility of secure non-secret digital encryption, CESG report
(classified until december 1997) (1970).
[27] Andreas Enge, Elliptic curves and their applications to cryptography, an in-
troduction, Kluwer Academic Publishers, 1999.
[28] Evariste Galois, M`emoire sur la division des fonctions elliptiques de premi`ere
esp`ece, Manuscrits et papiers inedits de Galois par M.J. Tannery
(1907).
[29] Carl Friedrich Gauss, Disquisitiones arithmeticae (english translation by
arthur a. clarke), Yale University Press, 1801.
[30] Jr. H. W. Lenstra, Elliptic curve factorization, personal communication via
samuel wagstaff jr., 1985.
82 Integer Factorization
Per Leslie Jensen
BIBLIOGRAPHY
[31] Darrel Hankerson, Alfred Menezes, and Scott Vanstone, Guide to elliptic
curve cryptography, Springer-Verlag NY, 2004.
[32] R. M. Huizing, An implementation of the number field sieve, CWI Report
NM-R9511 (1995).
[33] J. Hastad, Solving simultaneous modular equations of low degree, SIAM Jour-
nal of Computing 17 (1988), 336341.
[34] Christian U. Jensen, Arne Ledet, and Noriko Yui, Generic polynomials -
constructive aspects of the inverse galois problem, Cambridge University
Press, 2002.
[35] Paul Kocher, Joshua Jaffe, and Benjamin Jun, Differential power analysis,
Lecture Notes in Computer Science 1666 (1999), 388397.
[36] Paul C. Kocher, Timing attacks on implementations of diffie-hellman, rsa, dss,
and other systems., Advances in Cryptology - CRYPTO 96 - Lecture
Notes in Computer Science 1109 (1996), 104113.
[37] Kenji Koyama, Ueli M. Maurer, Tatsuaki Okamoto, and Scott A. Van-
stone, New public-key schemes based on elliptic curves over the ring zn ,
CRYPTO, 1991, pp. 252266.
[38] Franz Lemmermeyer, Reciprocity laws: From euler to eisenstein, Springer-
Verlag Berlin and Heidelberg GmbH & Co., 2000.
[39] Arjen K. Lenstra and Jr. Hendrik W. Lenstra (eds.), The development of the
number field sieve, Lecture Notes in Mathematics, vol. 1554, Springer-
Verlag, Berlin, 1993.
[40] Steven Levy, Crypto - secrecy and privacy in the new code war, Penguin
Books, 2000.
[41] Robert James McEliece, A public-key cryptosystem based on algebraic coding
theory, JPL DSN Progress Report 42-44 (1978), 114116.
[42] Alfred J. Menezes, Elliptic curve public key cryptosystems, Kluwer
Aca-
demic Publishers, 1993.
[43] Alfred J Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Hand-
book of applied cryptography, CRC Press, 1996.
[44] Ralph E. Merkle, Secrecy, authentication, and public key systems, Ph.D. the-
sis, Stanford University, Dept. of Electrical Engineering, 1979.
[45] Peter L. Montgomery, Square roots of products of algebraic numbers, Math-
ematics of Computation 19431993: a half-century of computational
mathematics (Walter Gautschi, ed.), American Mathematical Society,
1994, pp. 567571.
[46] , A block lanczos algorithm for finding dependencies over gf(2)., EU-
ROCRYPT, 1995, pp. 106120.
[47] Francois Morain, Implementation of the atkin-goldwasser-kilian primal-
ity testing algorithm, Tech. Report RR-0911, Departement de
Mathematiques, Universite de Limoges, 1988.
Integer Factorization 83
Per Leslie Jensen
BIBLIOGRAPHY
[48] Michael A. Morrison and John Brillhart, A method of factoring and the fac-
torization of F7 , Mathematics of Computation 29 (1975), 183205.
[49] Brian Antony Murphy, Polynomial selection for the number field sieve integer
factorisation algorithm, PhD - thesis (1999).
[50] Winfried B Muller
and Rupert Nobauer, Some remarks on public-key cryp-
tosystems, Sci. Math. Hungar 16 (1981), 7176.
[51] , Cryptanalysis of the dickson-scheme, Advances in Cryptology - EU-
RORYPT 85 - Lecture Notes in Computer Science 219 (1986), 5061.
[52] Phong Nguyen, A Montgomery-like square root for the number field sieve,
Lecture Notes in Computer Science 1423 (1998), 151??
[53] Tatsuaki Okamoto and Shigenori Uchiyama, A new public-key cryptosys-
tem as secure as factoring, Lecture Notes in Computer Science 1403
(1998), 308318. MR 1 729 059
[54] R. G. E. Pinch, Extending the Wiener attack to RSA-type cryptosystems, Elec-
tronics Letters 31 (1995), no. 20, 17361738.
[55] S. Pohlig and Martin E. Hellman, An improved algorithm for computing log-
arithms over gf(p) and its cryptographic significance, IEEE Transactions
on Information Theory 24 (1978), 106110.
[56] John M. Pollard, Theorems on factorization and primality testing., Proceed-
ings of the Cambridge Philosophical Society 76 (1974), 521528.
[57] , Monte carlo method for factorization., BIT 15 (1975), 331334.
[58] , Monte carlo methods for index computation (mod p), Mathematics
of Computation 32 (1978), 918924.
[59] , Lattice sieving, Lecture Notes in Mathematics 1554 (1991), 4349.
[60] Carl Pomerance, The quadratic sieve factoring algorithm, Advances in cryp-
tology: EUROCRYPT 84 (Berlin) (Thomas Beth, Norbert Cot, and In-
gemar Ingemarsson, eds.), Lecture Notes in Computer Science, vol.
209, Springer-Verlag, 1985, pp. 169182.
[61] Martin O. Rabin, Digitalized signatures and public-key functions as in-
tractable as factorization, MIT Technical Report TR-212 (1979).
[62] Hans Riesel, Prime numbers and computer methods for factorization (2nd ed.),
Birkhauser Verlag, Basel, Switzerland, Switzerland, 1994.
[63] Laura Toti Rigatelli, Evariste galois 1811-1832 (english translation by john
denton), Birkhauser Verlag, 1996.
[64] Ron L. Rivest, Adi Shamir, and Len Adleman, A method for obtaining
digital signatures and public-key cryptosystems, Communications of the
ACM 21 (1978), 120126.
[65] Ronald L. Rivest and Robert D. Silverman, Are strong primes needed for
rsa?, 1998.
[66] Bruce Schneier, The Blowfish encryption algorithm, Dr. Dobbs Journal of
Software Tools 19 (1994), no. 4, 38, 40, 98, 99.
84 Integer Factorization
Per Leslie Jensen
BIBLIOGRAPHY
[67] Rene Schoof, Counting points on elliptic curves over finite fields, Journal de
Theorie des Nombres de Bordeaux 7 (1995), 219254.
[68] Daniel Shanks, Class number, a theory of factorization, and genera, Proceed-
ings of Symposia in Pure Mathematics 20 (1969), 415440.
[69] Joseph H. Silverman and John Tate, Rational point on elliptic curves,
Springer-Verlag, 1992.
[70] Simon Singh, Fermats last theorem, Fourth Estate, 1997.
[71] Nigel P. Smart, The discrete logarithm problem on elliptic curves of trace one,
Journal of Cryptology 12 (1999), no. 3, 193196.
[72] Peter Smith and Michael J. J. Lennon, LUC: A new public key system, Tech.
report, 1993.
[73] Ian Stewart, Galois theory, second edition, Chapman & Hall/CRC, 1989.
[74] Douglas R. Stinson, Cryptography, theory and practice, CRC Press, 1995.
[75] A. E. Western and J. C. P. Miller, Tables of indices and primitive roots, Royal
Society Mathematical Tables 9 (1968), xxxviilxii.
[76] M. Wiener, Cryptanalysis of short rsa secret exponents, IEEE Transactions
on Information Theory 36 (1990), 553558.
[77] H. C. Williams and J. O. Shallit, Factoring Integers Before Computers, Math-
ematics of Computation 19431993: a half-century of computational
mathematics (Providence) (Walter Gautschi, ed.), American Mathe-
matical Society, 1991, pp. 481531.
[78] Hugh C. Williams, A modification of the rsa public-key encryption procedure,
IEEE Transactions On Information Theory IT-26 No. 6 (1980), 726
729.
[79] , A p + 1 method of factoring, Mathematics of Computation 39
(1982), 225234.
[80] Malcolm J. Williamson, Non-secret encryption using a finite field, CESG re-
port (classified until december 1997) (1974), 12.
[81] F. W. Winterbotham, The ultra secret: The inside story of operation ultra,
bletchley park and enigma, Orion mass market paperback, 2000.
Integer Factorization 85
Per Leslie Jensen