A Course in Cryptography
lattice-based and code-based cryptosystems.
Many examples, figures and exercises, as well as SageMath (Python) computer
code, help the reader to understand the concepts and applications of modern
cryptography. A special focus is on algebraic structures, which are used in many
cryptographic constructions and also in post-quantum systems. The essential
mathematics and the modern approach to cryptography and security prepare
the reader for more advanced studies.
The text requires only a first-year course in mathematics (calculus and linear
algebra) and is also accessible to computer scientists and engineers. This book is
suitable as a textbook for undergraduate and graduate courses in cryptography
as well as for self-study.
Heiko Knospe
Pure and Applied
Gerald B. Folland (Chair) Steven J. Miller
Jamie Pommersheim Serge Tabachnikov
Contents
The aim of this book is to explain the current cryptographic primitives and schemes
and to outline the essential mathematics required to understand the building blocks
and assess their security. We cover the widespread schemes, but we also want to ad-
dress some of the recent developments in post-quantum cryptography. The mathemat-
ical and, in particular, algebraic and number-theoretical foundations of cryptography
are explained in detail. The mathematical theory is presented with a focus on crypto-
graphic applications and we do not strive for maximal generality. We look at a selection
of cryptographic algorithms according to their current and supposed future relevance,
while leaving out several historic schemes. Since cryptography is a very active field,
some uncertainty regarding future developments will of course remain.
Why write yet another textbook on cryptography? We hope to convince potential
readers by listing some of the unique features of this book:
• The fundamentals of cryptography are presented with rigorous definitions while
being accessible to undergraduate students in science, engineering and mathematics;
• Formal definitions of security as used in the modern literature on cryptography;
• Focus on widely used methods and on prospective cryptographic schemes;
• Introduction to quantum computing and post-quantum cryptography;
• Numerical examples and SageMath (Python) code.
Cryptography can easily be underestimated by mathematicians. Several textbooks
contain excellent descriptions of the mathematical theory, but fall short of explaining
how to use these algorithms in practice. In fact, the main purpose of cryptography is
to achieve security objectives such as confidentiality and integrity in the presence of
powerful adversaries. Well-known schoolbook algorithms like RSA can be insecure
without adaptations, for example, by incorporating random data.
This book follows the provable security approach which is adopted in the modern
literature. Well-defined experiments (games) are used in which the success probability
of potential attackers determines the security. Secure schemes have the property that
an adversary with restricted resources can do little better than randomly guess the se-
cret information. Using this approach, the security is reduced to standard assumptions
that are generally believed to be true. In this book, we give exact security definitions
and some proofs, but refer to the literature for more advanced proofs and techniques,
for example, using the sequence of games approach.
We find that examples are very helpful and include computations using the open
source mathematics software SageMath (aka Sage) [Sag18]. SageMath contains many
algebraic and number theoretic functions which can be easily used and extended. Al-
though the software might be better known among mathematicians than scientists
and engineers, it is easily accessible and very suitable for cryptographic computations.
SageMath is based on Python and contains other open source software such as
Singular, Maxima, PARI, GAP, NumPy, SciPy, SymPy and R. In recent years, Python
has gained immense popularity among scientists. One of its advantages is that results
can be obtained quickly without much programming overhead. In this book, we opt
for SageMath instead of plain Python since SageMath has much better support for alge-
braic computations, which are often needed in modern cryptography. SageMath also
has a convenient user interface and supports the popular Jupyter browser notebooks.
Numerical examples can be used to help understand cryptographic constructions
and their underlying theory. Toy examples, in which the numbers and bit-lengths are
too small for any real-world security, can still be useful in this respect. The reader is
encouraged to perform computations and to write their own SageMath functions. We
also provide exercises with both theoretical and numerical problems.
The book should be accessible to mathematics, science or engineering students af-
ter completing a first year’s undergraduate course in mathematics (calculus and linear
algebra). The material originates from several courses on cryptography for computer
scientists and communication engineers which the author has taught. Since the previ-
ous knowledge can be quite heterogeneous, we decided to include several elementary
topics. In the author’s teaching experience, abstract algebra as well as linear algebra
over general fields deserves special attention. Linear maps over finite fields play an
important role in many cryptographic constructions. This book should be largely self-
contained and requires no previous knowledge of discrete mathematics, algebra, num-
ber theory or cryptography. We do not strive for greatest generality and frequently refer
to more specialized textbooks or articles.
Cryptography can be taught at different levels and to different audiences. This
book can be used in bachelor’s and master’s courses, as well as by practitioners, and
is suitable for a general audience wanting to understand the fundamentals of modern
cryptography. Many mathematics and computer science students may already have the
necessary background in discrete mathematics, elementary number theory and prob-
ability and can therefore skip Chapters 1 and 3. Chapter 4 provides the necessary al-
gebraic constructions and is recommended to all readers without solid knowledge of
abstract algebra. From my teaching experience, algebra can be a major stumbling block
and should not be underestimated. Chapters 1, 3 and 4 thus provide the mathematical
background of cryptography.
We decided to begin with the core cryptographic content as early as possible, so
Chapter 2 deals with encryption schemes and the modern definitions of security. This
chapter requires only basic discrete mathematics, complexity and probability theory
and is recommended for most readers, even if they have some prior knowledge of cryp-
tography. Understanding the provable security approach is crucial for the subsequent
chapters of this book. Chapter 5 deals with block ciphers and AES in particular, which
is a crucial part of every modern course on cryptography. Chapter 6 explores stream
ciphers, which form a natural complement, but it is also possible to omit this chapter
if you are short on time. We have already mentioned that modern cryptography goes
beyond encryption. Integrity protection is another major objective, and hash functions
and message authentication codes play a crucial role in this. These topics are addressed
in Chapters 7 and 8. Chapters 9, 10 and 11, which are on public-key encryption, key
establishment and signatures, introduce the fundamentals of public-key cryptography.
We explain RSA and Diffie-Hellman in particular and discuss their security, which is
based on hard number-theoretic problems.
We therefore think that Chapters 2, 5, 7, 8, 9, 10 and 11, along with the neces-
sary mathematical preparations (Chapters 1, 3 and 4), should be covered in every first
course on cryptography. A one-semester bachelor’s module might end after Chapter
11, but whenever possible, we recommend including Chapter 12 on elliptic curve cryp-
tography. This has been the topic of intensive research in the last few decades but has
now become part of well-established cryptography and is implemented by every In-
ternet browser, for example. We believe the basics of elliptic curves are accessible to
readers after the preparatory work in Chapters 3 and 4. There are, however, more ad-
vanced topics in elliptic curves that are not treated here.
Chapters 13, 14 and 15 provide an introduction to the new field of post-quantum
cryptography. In Chapter 13, we explore the basics of quantum computing and explain
why quantum computers can break classic public-key schemes like RSA. Chapters 14
and 15 deal with two major types of post-quantum systems that are based on lattices
and error-correcting codes, respectively. We focus on the foundations and several se-
lected encryption schemes. Note that there are other post-quantum systems, for exam-
ple, cryptosystems from isogenies of elliptic curves or multivariate-quadratic-equations
signatures, which are not covered in this book. Chapters 13–15 are more challenging
with respect to the level of calculus and abstract algebra. However, we spend some
time on examples (many of them using SageMath) and we hope that the content of
these three chapters is accessible for master’s or advanced bachelor’s students. We ex-
pect that quantum computing and post-quantum schemes will become increasingly
important in the future.
I would be happy to receive feedback and suggestions for improvement. Please
email your comments to heiko.knospe@th-koeln.de. Updates and additional mate-
rial, for example, solutions to selected exercises and SageMath code, are available on
the following website: https://github.com/cryptobook.
Finally, I would like to thank my colleagues and my students for their valuable
feedback on my cryptography course and on earlier versions of the manuscript.
[Figure: diagram of Chapters 1–12]
SageMath (aka Sage) is open source mathematics software that is ideally suited for
cryptography. SageMath supports many algebraic constructions used in cryptography,
and results can be achieved with relatively few lines of code. Many experts in the field
use SageMath for their research and for prototyping before finally switching to faster
programming languages like C and C++.
This book contains a large number of examples and exercises which use SageMath,
and readers are encouraged to do their own experiments. The aim of this chapter is to
give a brief introduction to SageMath. Further information and links to online docu-
mentation can be found on http://www.sagemath.org/help.html. We also recom-
mend the book [Bar15].
0.1. Installation
The installation of SageMath is easy using pre-built binaries which are available from
http://www.sagemath.org/download.html for several types of CPU and operating
systems including Linux (Ubuntu, Debian), macOS and Microsoft Windows. A Docker
container is also available. Since Sage 8.0, Windows users can use a binary installer
(instead of a virtual machine) and run either the SageMath shell or the browser-based
interface by clicking on the respective icon. macOS users can install the app.dmg pack-
age, move the software to the Applications folder and start SageMath either from the
command line or by clicking on the app icon. On Linux, the downloaded package has
to be uncompressed (using bunzip2 and tar). The SageMath directory contains an ex-
ecutable file that can be started from the command line. The directory can be moved if
desired. It is advisable to add the directory to the local PATH variable or otherwise spec-
ify the full path when sage is executed. The sage executable starts a shell and sage
-notebook=jupyter runs the Jupyter notebook server and opens a browser window.
The packages and the installed software require several gigabytes of free disk space.
The SageMath distribution includes a long list of open source software and it is not
usually necessary to install additional packages.
In the following example, we define a 3 × 3 matrix over the integers and compute
the determinant and the inverse matrix.
sage: A = matrix([[1,2,3],[-1,3,4],[2,2,3]])
sage: det(A)
-1
sage: 1/A
[ -1   0   1]
[-11   3   7]
[  8  -2  -5]
window. In the following, we use Jupyter notebooks which are very popular in the
Python community. Alternatively, a legacy SageMath notebook server can be started
with sage -notebook=sagenb.
A new notebook is created by clicking on the ‘New’ button in the upper right corner
and choosing SageMath (see Figure 0.1). The commands and the code are written in
input cells and a cell is evaluated by pressing ‘Shift + Enter’ or by clicking on the play
symbol in the toolbar. The ‘Enter’ key does not interpret the code, but rather creates
a new line. It is a good practice not to write too much into a single cell, although a
cell can contain several lines as well as multiple commands in one line (separated by a
semicolon). Do not forget to rename an untitled notebook and to save (and checkpoint)
your work via the ‘File’ menu.
The code shown in Figure 0.2 implements a loop. For each 1 ≤ 𝑛 < 20 the factorial
𝑛! is printed out using the Python format specification. You may be unfamiliar with
the way Python structures the code, since it differs from languages like C and Java. The
colon in the first line and the indentation of the second line are very important.
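A minimal sketch of such a loop in plain Python, which should also run in a SageMath cell. The exact code of Figure 0.2 may differ; the list name `lines` and the format string are illustrative assumptions.

```python
from math import factorial

# For each 1 <= n < 20, format n and n! using Python's format specification.
lines = []
for n in range(1, 20):      # the colon opens the loop body
    # The indentation marks this statement as part of the loop body.
    lines.append("{:2d}! = {:d}".format(n, factorial(n)))

print("\n".join(lines))
```

Removing the indentation of the `lines.append` statement would raise an `IndentationError`, since Python uses indentation instead of braces to delimit blocks.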
Now we perform the same computation in the polynomial ring 𝑅 = 𝐺𝐹(2)[𝑡] over
the binary field 𝐺𝐹(2). The reader is advised to refer to Chapters 3 and 1 for the math-
ematical background of the following examples.
sage: R.<t> = PolynomialRing(GF(2))
sage: (1+t)^10
t^10 + t^8 + t^2 + 1
We verify the result by multiplying the residue classes. Note the difference to the mul-
tiplication in 𝑅 = 𝐺𝐹(2)[𝑡].
sage: (a+1)*(a^7 + a^6 + a^5 + a^4 + a^2 + a)
sage: (t+1)*(t^7 + t^6 + t^5 + t^4 + t^2 + t)
t^8 + t^4 + t^3 + t
We define a 4 × 4 matrix over 𝐺𝐹(256) and let SageMath compute the inverse:
sage: M = matrix(F, [[a,a+1,1,1],[1,a,a+1,1],[1,1,a,a+1],
....:                [a+1,1,1,a]])
sage: 1/M
[a^3 + a^2 + a a^3 + a + 1 a^3 + a^2 + 1 a^3 + 1]
[ a^3 + 1 a^3 + a^2 + a a^3 + a + 1 a^3 + a^2 + 1]
[a^3 + a^2 + 1 a^3 + 1 a^3 + a^2 + a a^3 + a + 1]
[ a^3 + a + 1 a^3 + a^2 + 1 a^3 + 1 a^3 + a^2 + a]
In Chapter 5, we will see that the field 𝐺𝐹(256) and the matrix 𝑀 are used in the AES
block cipher.
Chapter 1
Modern cryptography relies on mathematical structures and methods, and this chap-
ter contains the mathematical background from discrete mathematics, computational
complexity and probability theory. We recapitulate elementary structures like sets, re-
lations, equivalence classes and functions in Section 1.1. Fundamental combinatorial
facts are outlined in Section 1.2 and the asymptotic notation is explained. Section 1.3
discusses complexity and the Big-O notation. Section 1.4 then deals with basic prob-
ability theory. Random numbers and the birthday problem are addressed in Section
For a general introduction to undergraduate mathematics the reader may, for ex-
ample, refer to the textbook [WJW+ 14]. Discrete mathematics and its applications are
discussed in [Ros12].
Definition 1.3. The cardinality or size of a finite set 𝑋 is the number of its elements
and is denoted by |𝑋|. ♢
8 1. Fundamentals
Note that there are other notions of size (see Warning 1.34 below): the size of an
integer is the number of bits needed to represent it and the size of a binary string is its length.
Sets can be defined explicitly, for example by enumeration or by intervals of real
numbers, or implicitly by formulas.
Example 1.4. $M = \{x \in \mathbb{Z} \mid x^4 < 50\}$ implicitly describes the set of integers $x$ for
which $x^4 < 50$ holds. This set can also be described explicitly:
𝑀 = {−2, −1, 0, 1, 2}. ♢
sage: 2^128
340282366920938463463374607431768211456
Remark 1.6. It is useful to understand the difference between small, big and
inaccessible numbers in practical computations. For example, one can easily store one
terabyte ($10^{12}$ bytes, i.e., around $2^{43}$ bits) of data. On the other hand, a large amount
of resources is required to store one exabyte (one million terabytes), or around $2^{63}$ bits, and
more than $2^{100}$ bits are out of reach.
The number of computing steps is also bounded: fewer than $2^{40}$ steps (say CPU
clock cycles) are easily possible, $2^{60}$ operations require a lot of computing resources and
take a significant amount of time, and more than $2^{100}$ operations are infeasible. It
is, for example, impossible to test $2^{128}$ different keys with conventional (non-quantum)
computers.
Definition 1.7. A function, mapping or map 𝑓 ∶ 𝑋 → 𝑌 consists of two sets (the
domain 𝑋 and the codomain 𝑌 ) and a rule which assigns an output element (an image)
𝑦 = 𝑓(𝑥) ∈ 𝑌 to each input element 𝑥 ∈ 𝑋 . The set of all 𝑓(𝑥) is a subset of 𝑌 called
the range or the image 𝑖𝑚(𝑓). Any 𝑥 ∈ 𝑋 with 𝑓(𝑥) = 𝑦 is called a preimage of 𝑦. Let
𝐵 ⊂ 𝑌; then we say that $f^{-1}(B) = \{x \in X \mid f(x) \in B\}$ is the preimage or inverse image
of 𝐵 under 𝑓.
Example 1.8. Let $f \colon \{0,1\}^4 \to \{0,1\}$ be defined by
\[ f(b_1, b_2, b_3, b_4) = b_1 \oplus b_2 \oplus (b_3 \cdot b_4) = b_1 \oplus b_2 \oplus b_3 b_4. \]
1.1. Sets, Relations and Functions 9
Refer to Table 1.1 for the definition of XOR (⊕) and AND (⋅). For example, (1, 1, 1, 1)
is a preimage of 1 and (0, 1, 0, 0) is another preimage of 1. (0, 0, 0, 0) is a preimage of 0
and the image of 𝑓 is 𝑖𝑚(𝑓) = {0, 1}. The function 𝑓 is surjective, but not injective (see
Definition 1.10 below).
Table 1.1. XOR (⊕) and AND (⋅).

 ⊕ | 0  1        ⋅ | 0  1
---+------      ---+------
 0 | 0  1        0 | 0  0
 1 | 1  0        1 | 0  1
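The claims of Example 1.8 can be checked by enumerating all 16 inputs. The following is a small sketch in plain Python; the helper names are chosen for illustration, and the operators `^` and `&` play the roles of ⊕ and ⋅ on single bits.

```python
from itertools import product

def f(b1, b2, b3, b4):
    """f(b1, b2, b3, b4) = b1 XOR b2 XOR (b3 AND b4), as in Example 1.8."""
    return b1 ^ b2 ^ (b3 & b4)

inputs = list(product((0, 1), repeat=4))       # the domain {0,1}^4
image = {f(*b) for b in inputs}                # im(f)
preimages_of_1 = [b for b in inputs if f(*b) == 1]

surjective = image == {0, 1}                   # every output value is hit
injective = len(image) == len(inputs)          # would need 16 distinct outputs
```

Since the domain has 16 elements and the codomain only 2, injectivity fails for counting reasons alone.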
Every set 𝑋 has an identity function 𝑖𝑑𝑋 ∶ 𝑋 → 𝑋, which maps each 𝑥 ∈ 𝑋 to itself.
Functions can be composed if the range of the first function lies within the domain of
the second function. Let 𝑓 ∶ 𝑋 → 𝑌 and 𝑔 ∶ 𝑌 → 𝑍 be functions. Then there is a
composite function 𝑔 ∘ 𝑓 ∶ 𝑋 → 𝑍 with (𝑔 ∘ 𝑓)(𝑥) = 𝑔(𝑓(𝑥)) (see Figure 1.1).
[Figure 1.1: composition of the functions 𝑓 and 𝑔]
\[ f(x_1) = f(x_2) \Rightarrow x_1 = x_2. \]
Note that the above conditions are only necessary and not sufficient.
Remark 1.13. The contraposition of Lemma 1.12 (1) is called the pigeonhole principle:
if |𝑋| > |𝑌| then 𝑓 is not injective. Suppose 𝑋 is a set of pigeons and 𝑌 a set of holes. If
there are more pigeons than holes, then at least one hole contains more than one pigeon.
Definition 1.14. A set 𝑆 is said to be countably infinite if there is a bijective map
from 𝑆 to the set of natural numbers. We say that a set 𝑆 is countable if it is finite or
countably infinite.
Example 1.15. ℕ and ℤ are countably infinite sets. The sets $\mathbb{Z}^n$ (for 𝑛 ∈ ℕ) and ℚ are
also countable (see Exercise 3). However, a famous result of Cantor says that the set ℝ
of all real numbers is uncountable. ♢
The floor, ceiling and rounding functions are often used in numerical computations.
Definition 1.16. Let 𝑥 ∈ ℝ.
(1) ⌊𝑥⌋ is the greatest integer less than or equal to 𝑥.
(2) ⌈𝑥⌉ is the least integer greater than or equal to 𝑥.
(3) $\lfloor x \rceil = \lfloor x + \tfrac{1}{2} \rfloor$ is rounding 𝑥 to the nearest integer (round half up).
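The three functions of Definition 1.16 can be sketched in plain Python as follows. The helper `round_half_up` is a name introduced here for illustration; it implements floor(x + 1/2). Note that Python's built-in `round` uses a different convention for ties (round half to even), so it is not used here.

```python
import math

def round_half_up(x):
    """Round x to the nearest integer, ties rounded up: floor(x + 1/2)."""
    return math.floor(x + 0.5)

examples = [2.5, -2.5, 3.2, -3.7]
# Each row: (x, floor(x), ceil(x), round_half_up(x))
table = [(x, math.floor(x), math.ceil(x), round_half_up(x)) for x in examples]
```

For 𝑥 = 2.5 this gives floor 2, ceiling 3 and rounded value 3, whereas the built-in `round(2.5)` returns 2.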
Now we have 𝑛 different equivalence classes and the quotient set ℤ/ ∼ has 𝑛 elements.
We call this set the residue classes modulo 𝑛 or integers modulo 𝑛 and denote it by ℤ𝑛 or
ℤ/(𝑛). Each residue class has a standard representative in the set {0, 1, … , 𝑛 − 1} and
elements in the same residue class are called congruent modulo 𝑛. ♢
Two integers are congruent modulo 𝑛 if they have the same remainder when they
are divided by 𝑛. In many programming languages, the remainder of the integer divi-
sion 𝑎 ∶ 𝑛 is computed by 𝑎 % 𝑛, but note that the result may be negative for 𝑎 < 0,
whereas the standard representative of 𝑎 modulo 𝑛 is non-negative.
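The difference between the two remainder conventions can be illustrated in plain Python. This is only a sketch: Python's own `%` operator already returns a non-negative result for positive 𝑛, so the C-style behaviour is emulated here with `math.fmod`, and `standard_representative` is a hypothetical helper name.

```python
import math

a, n = -14, 11
python_remainder = a % n                   # takes the sign of n: 8
c_style_remainder = int(math.fmod(a, n))   # takes the sign of a: -3

def standard_representative(a, n):
    """Map an integer a to its representative in {0, ..., n-1} (n > 0).

    math.fmod works on floats, so this sketch is only exact for |a| < 2^53.
    """
    r = int(math.fmod(a, n))               # may be negative, as in C or Java
    return r + n if r < 0 else r
```

In languages with the C-style convention, the correction step `r + n` for negative `r` is needed to obtain the standard representative.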
Example 1.22. Let 𝑛 = 11. Then ℤ11 = {0, 1, … , 10} has 11 elements. One has $\overline{-14} = \overline{8}$
since $-14 - 8 = -22$ is a multiple of 11. The integers 8 and −14 are congruent modulo
11 and one writes −14 ≡ 8 mod 11. The standard representative of this residue class
is 8, and −3, −14, … as well as 19, 30, … are other representatives of the same residue
class. Here is an example using SageMath:
sage: mod(-892342322327, 11)
6
Definition 1.23. A map $f \colon \{0,1\}^n \to \{0,1\}$ is called an 𝑛-variable Boolean function
and a map $f \colon \{0,1\}^n \to \{0,1\}^m$ is called an (𝑛, 𝑚)-vectorial Boolean function. A vectorial
Boolean function can be written as an 𝑚-tuple of 𝑛-variable Boolean functions:
$f = (f_1, f_2, \dots, f_m)$. ♢
Boolean functions can be represented by their truth table. Since the table of an 𝑛-variable
Boolean function has $2^n$ entries, this is only reasonable for small 𝑛. Another
important representation is the algebraic normal form (ANF). This form uses XOR (⊕)
and AND (⋅) combinations of the binary variables.
An 𝑛-variable Boolean function has a unique representation as a polynomial in 𝑛
variables, say 𝑥1 , 𝑥2 , … , 𝑥𝑛 :
\[ f(x_1, x_2, \dots, x_n) = \bigoplus_{I \subset \{1,\dots,n\}} a_I \cdot \prod_{i \in I} x_i. \]
The coefficients 𝑎𝐼 are either 0 or 1 and 𝑓 is a sum (XOR) of products (AND) of the
variables, for example
\[ f(x_1, x_2, x_3) = x_1 \oplus x_1 x_2 \oplus x_2 x_3 \oplus x_1 x_2 x_3, \]
which can also be written as
\[ f(x_1, x_2, x_3) = x_1 + x_1 x_2 + x_2 x_3 + x_1 x_2 x_3 \bmod 2. \]
The algebraic degree of 𝑓 is the maximal length of the products which appear (with
nonzero coefficient) in the above representation. If 𝑓 is a constant function, then the
degree is 0. In the above example, the degree is 3.
We note that higher powers of a variable $x_i$ are not needed since $x_i^k = x_i$ for all
$k \geq 1$ and $x_i \in \{0,1\}$. Boolean functions of degree ≤ 1 are called affine. If the degree
is ≤ 1 and the constant part is 0, then the function is linear. An 𝑛-variable linear
Boolean function is a linear mapping from $GF(2)^n$ to $GF(2)$. Linear maps are discussed
in Section 4.4.
The degree of a vectorial Boolean function is the maximal degree of its component
functions. A vectorial Boolean function is called affine if all component functions are
affine.
Example 1.24. (1) 𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = 𝑥1 𝑥2 +𝑥2 𝑥3 +𝑥1 +1 mod 2 is a 3-variable Boolean
function of algebraic degree 2.
(2) $f(x_1, x_2, x_3, x_4) = (x_3,\; x_1 + x_2,\; x_1 + x_4) \bmod 2$ is a (4, 3)-vectorial Boolean function.
The algebraic degree is 1 and the function is linear since all constants are zero.
(3) Let 𝑓 = 𝑓(𝑥0 , 𝑥1 ) be a 2-variable Boolean function given by the following table:
𝑥1 𝑥0 𝑓(𝑥)
0 0 1
0 1 1
1 0 0
1 1 1
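The algebraic normal form of a function given by a truth table, such as the one in Example 1.24 (3), can be computed with the binary Möbius transform. The following plain-Python sketch is not taken from the text; the function names and the indexing convention (table index 𝑗 = 𝑥0 + 2𝑥1, so that bit 𝑖 of 𝑗 is the value of 𝑥𝑖) are assumptions made for this illustration.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients.

    Index j encodes the input (bit i of j is x_i). The returned entry a[j]
    is the coefficient of the monomial that multiplies all x_i with bit i
    set in j; a[0] is the constant term.
    """
    a = list(truth_table)
    n = len(a).bit_length() - 1            # number of variables
    assert len(a) == 1 << n, "table length must be a power of two"
    for i in range(n):
        for j in range(len(a)):
            if j & (1 << i):
                a[j] ^= a[j ^ (1 << i)]    # add the value with x_i = 0
    return a

def algebraic_degree(coeffs):
    """Largest number of variables in a monomial with nonzero coefficient."""
    return max((bin(j).count("1") for j, c in enumerate(coeffs) if c), default=0)

# Truth table of Example 1.24 (3), ordered by j = x0 + 2*x1.
tt = [1, 1, 0, 1]
coeffs = anf_coefficients(tt)              # [1, 0, 1, 1]
degree = algebraic_degree(coeffs)          # 2
```

Under this convention the coefficients correspond to the monomials 1, 𝑥0, 𝑥1 and 𝑥0𝑥1, so the table describes 𝑓(𝑥0, 𝑥1) = 1 ⊕ 𝑥1 ⊕ 𝑥0𝑥1, a function of algebraic degree 2.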
1.2. Combinatorics
Combinatorics investigates finite or countable discrete structures. We are interested in
properties of finite sets like {1 , 2, … , 𝑛} or {0, 1}𝑛 which often occur in cryptographic
Definition 1.25. The factorial of a non-negative integer 𝑛 is defined as follows:
\[ n! = \begin{cases} 1 & \text{for } n = 0, \\ 1 \cdot 2 \cdots n & \text{for } n \geq 1. \end{cases} \]
\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \quad \text{for integers } 0 \leq k \leq n. \] ♢
\[ \binom{n}{k} = \binom{n}{n-k} = \frac{n(n-1)\cdots(n-(k-1))}{k!}. \]
Example 1.26. One has
\[ \binom{15}{8} = \binom{15}{7} = \frac{15 \cdot 14 \cdot 13 \cdot 12 \cdot 11 \cdot 10 \cdot 9}{7!} = \frac{32432400}{5040} = 6435. \]
The following SageMath code computes all binomials $\binom{15}{n}$ for $0 \leq n \leq 15$:
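A minimal sketch of such a computation in plain Python, assuming the intended values are the binomial coefficients of 15 over 𝑘 for 0 ≤ 𝑘 ≤ 15. It uses `math.comb` from the standard library (Python 3.8 or later) rather than SageMath's own `binomial` function, and the name `row` is chosen for illustration.

```python
from math import comb

# Binomial coefficients "15 choose k" for 0 <= k <= 15.
row = [comb(15, k) for k in range(16)]

for k, value in enumerate(row):
    print("binomial(15, {:2d}) = {:d}".format(k, value))
```

The list is symmetric, and its entries sum to 2^15, the number of subsets of a set with 15 elements.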
The binomial coefficients appear for example in the number of subsets of a given
finite set and in expansions of terms like (𝑎 + 𝑏)𝑛 .
Note that there is a difference between 𝑘-tuples and subsets of cardinality 𝑘. In the
first case, the order of elements is important and in the second case it is not.
Example 1.28. There are $\binom{128}{2} = \frac{128 \cdot 127}{2} = 8128$ different binary words of length 128
with exactly two ones and 126 zeros. Indeed, each subset of 𝑋 = {1, 2, … , 128} with
two elements gives the two positions where the digit is equal to 1. ♢
Permutations of a finite set 𝑆 can be written using a two-line matrix notation. The
first row lists the elements of 𝑆 and the image of each element is given below in the
second row. The first row can be omitted if 𝑆 is given and the elements are naturally
ordered.
Example 1.30. Let 𝑆 = {1, 2, 3, 4, 5, 6, 7, 8}; then the following row describes a permu-
tation of 𝑆:
(5 7 1 2 8 6 3 4).
Proposition 1.31. Let 𝑆 be a finite set and |𝑆| = 𝑛; then there are 𝑛! permutations of 𝑆.
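The one-line notation of Example 1.30 and the count in Proposition 1.31 can be illustrated with a short plain-Python sketch. The variable names are assumptions for this illustration, and the count is only checked for a small set with four elements.

```python
from itertools import permutations
from math import factorial

S = [1, 2, 3, 4, 5, 6, 7, 8]
sigma_row = [5, 7, 1, 2, 8, 6, 3, 4]       # second row from Example 1.30
sigma = dict(zip(S, sigma_row))            # sigma(1) = 5, sigma(2) = 7, ...

# A map S -> S is a permutation if every element occurs exactly once as image.
is_permutation = sorted(sigma.values()) == S

# Count all permutations of a 4-element set and compare with 4!.
count_small = sum(1 for _ in permutations(range(4)))
```

Here `count_small` equals `factorial(4)`, in line with the 𝑛! permutations of a set with 𝑛 elements.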
Example 1.33. (1) Let $f(n) = 2n^3 + n^2 + 7n + 2$. Since $n^2 \leq n^3$, $n \leq n^3$ and $1 \leq n^3$ for
$n \geq 1$, one has $f(n) \leq (2 + 1 + 7 + 2)\,n^3$. Set $C = 12$ and $n_0 = 1$. Thus $f = O(n^3)$
and 𝑓 has cubic growth in 𝑛.
(2) Let $f(n) = 100 + \frac{20}{n+1}$. Set $C = 101$ and $n_0 = 19$. Since $\frac{20}{n+1} \leq 1$ for $n \geq 19$, we
have $f = O(1)$. Hence 𝑓 is asymptotically bounded by a constant.
(3) Let $f(n) = 5\sqrt{2^{n+3}} + n^2 - 2n$; then $f = O(2^{n/2})$ so that 𝑓 grows exponentially in
𝑛. ♢
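The bounds of Example 1.33 can be spot-checked numerically. The sketch below is plain Python and tests the inequality 𝑓(𝑛) ≤ 𝐶𝑔(𝑛) only on a finite range, so it illustrates the bounds but does not prove them. The constants for (1) and (2) are those of the example; the constant 𝐶 = 20 for (3) is an assumption chosen for this check.

```python
import math

def f1(n):
    return 2 * n**3 + n**2 + 7 * n + 2

def f2(n):
    return 100 + 20 / (n + 1)

def f3(n):
    return 5 * math.sqrt(2 ** (n + 3)) + n**2 - 2 * n

# (1) f1(n) <= 12 * n^3 for n >= 1
ok1 = all(f1(n) <= 12 * n**3 for n in range(1, 1000))
# (2) f2(n) <= 101 for n >= 19
ok2 = all(f2(n) <= 101 for n in range(19, 1000))
# (3) f3(n) <= C * 2^(n/2); C = 20 is an assumed constant, not from the text
ok3 = all(f3(n) <= 20 * 2 ** (n / 2) for n in range(1, 200))
```

In (1) and (2) the bounds are attained at 𝑛 = 𝑛0, which shows that the chosen constants cannot be lowered without increasing 𝑛0.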
The Big-O notation is often used to assess the running time of an algorithm in a
worst-case scenario. An asymptotic upper bound does not depend on the measuring
unit or the platform, since the values would differ only by a multiplicative constant.
If the running time function $f(n)$ has polynomial growth in the input length 𝑛, i.e.,
$f = O(n^k)$, then $f(n)$ is bounded by $C n^k$ for constants 𝐶, 𝑘 and large 𝑛. For example,
if 𝑛 is doubled, then the upper bound is multiplied only by the constant $2^k$.
Algorithms with polynomial running time in terms of the input size are considered
to be efficient or fast. Many standard algorithms, for example adding or multiplying
two numbers, are polynomial. On the other hand, an algorithm that loops over every
instance of a set with $2^n$ elements is exponential in 𝑛. A problem is called hard if no
efficient, i.e., polynomial-time, algorithm exists that solves the problem.
A decision problem has only two possible answers, yes or no. Decision problems
for which a polynomial time algorithm exists are said to belong to the complexity class
P. The class NP (nondeterministic polynomial) is the set of decision problems for which
a proposed solution can be verified in polynomial time. Checking the correctness of a proof is apparently
easier than solving a problem. Whether the class NP is strictly larger than P is a major
unresolved problem in computer science.
In computer science, one is usually interested in the worst-case complexity of an al-
gorithm which solves a certain problem. However, the worst-case complexity is hardly
relevant for attacks against cryptographic systems. A cryptosystem is certainly inse-
cure if attacks are inefficient in certain bad cases but efficient in other cases. Instead,
the average-case complexity of algorithms that break a scheme should be large. A se-
cure scheme should provide protection in almost all cases and the probability that a
polynomial time algorithm breaks the cryptosystem should be very small.
Note that there is not only the time complexity but also the space complexity of
an algorithm. The space complexity measures the memory or storage requirements in
terms of the input size.
Warning 1.34. The complexity is measured in terms of the input size, not the input
value! The following formula gives the relation between a positive integer 𝑛 and its
size:
\[ \mathrm{size}(n) = \lfloor \log_2(n) \rfloor + 1. \]
The size of an integer is the number of bits that is needed to represent it. For a given
binary string 𝑚, the bit-length is also called size and denoted by |𝑚|.
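The size formula of Warning 1.34 can be compared with Python's built-in `int.bit_length`, as in the following sketch. The function name `size` mirrors the notation of the text. Since `math.log2` works with floating-point numbers, the formula-based version is only reliable for moderately sized integers, whereas `bit_length` is exact for arbitrarily large ones.

```python
import math

def size(n):
    """Bit size of a positive integer: floor(log2(n)) + 1."""
    return math.floor(math.log2(n)) + 1

# Compare with the exact built-in method on all integers below 2^16.
checks = all(size(n) == n.bit_length() for n in range(1, 1 << 16))

# A 128-bit key has size 128, although its value can be close to 2^128.
largest_128_bit_value = 2**128 - 1
key_size = largest_128_bit_value.bit_length()
```

An algorithm that loops over all values up to 𝑛 therefore runs in time exponential in size(𝑛), although the number of steps is linear in the value 𝑛.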
One should understand the limitations of the asymptotic notation. First, the upper
bound applies only if 𝑛 is larger than some unknown initial value 𝑛0 . Furthermore, the
multiplicative constant 𝐶 can be large. For a given input value, an algorithm with poly-
nomial running time may even be slower than an exponential-time algorithm! How-
ever, for large 𝑛 the polynomial-time algorithm will eventually be faster.
Bounded functions are of type $O(1)$ and functions of type $O(\frac{1}{n})$ converge to 0. Negligible
functions approach zero faster than $\frac{1}{n}$ and any other inverse polynomial:
Definition 1.38. Let $f \colon \mathbb{N} \to \mathbb{R}$ be a function. One says that 𝑓 is negligible in 𝑛 if
$f = O\!\left(\frac{1}{q(n)}\right)$ for all polynomials 𝑞, or equivalently if $f = O\!\left(\frac{1}{n^c}\right)$ for all $c > 0$. ♢
Hence negligible functions are eventually smaller than any inverse polynomial.
This means that $f(n)$ approaches zero faster than any of the functions $\frac{1}{n}$, $\frac{1}{n^2}$, $\frac{1}{n^3}$, etc.
1.4. Discrete Probability 19
Example 1.39. $f(n) = 10e^{-n}$ and $2^{-\sqrt{n}}$ are negligible in 𝑛, whereas $f(n) = \frac{1}{n^2+3n}$ is
not negligible since $f(n) = O\!\left(\frac{1}{n^2}\right)$, but $f \neq O\!\left(\frac{1}{n^3}\right)$. ♢
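A numerical illustration of Example 1.39, as a plain-Python sketch: if 𝑓 is negligible, then 𝑛^𝑐 ⋅ 𝑓(𝑛) stays bounded (and in fact tends to 0) for every fixed 𝑐, whereas for an inverse polynomial the product grows once 𝑐 exceeds its degree. The function names, the exponent 𝑐 = 3 and the sample points are assumptions chosen for this illustration.

```python
import math

def f_exp(n):
    return 10 * math.exp(-n)        # negligible in n

def f_poly(n):
    return 1 / (n**2 + 3 * n)       # O(1/n^2), but not negligible

c = 3
sample_points = (10, 100, 500)
scaled_exp = [n**c * f_exp(n) for n in sample_points]    # tends to 0
scaled_poly = [n**c * f_poly(n) for n in sample_points]  # grows roughly like n
```

A finite table like this only suggests the asymptotic behaviour; the definition quantifies over all exponents 𝑐 and all sufficiently large 𝑛.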
Example 1.41. (1) Let $f_1(n) = \log_2(n)$. Then $f_1 = \tilde{O}(1)$.
(2) Let $f_2(n) = \log(n)^3\, n^2$. Then $f_2 = \tilde{O}(n^2)$.
\[ Pr\Big[\bigcup_i A_i\Big] = \sum_i Pr[A_i]. \]
The triple (Ω, 𝒮, 𝑃𝑟) is called a discrete probability space. Ω is called the sample space
and we say that 𝑃𝑟 is a discrete probability distribution on Ω. The subsets 𝐴 ⊂ Ω are
said to be events and 𝑃𝑟[𝐴] is the probability of 𝐴. If Ω is finite, then (Ω, 𝒮, 𝑃𝑟) is called
a finite probability space. ♢
Note that the family of sets in (2) is either finite or countably infinite. Since all
probabilities are non-negative and the sum is bounded by 1, the series converges and
is also invariant under a reordering of terms.
Remark 1.43. In measure theory, a triple (Ω, 𝒮, 𝑃𝑟), where 𝑃𝑟 is a 𝜎-additive function
on a set 𝒮 ⊂ 𝒫(Ω) of measurable sets such that 𝑃𝑟[Ω] = 1, is called a probability space.
We only consider the case of a countable sample space Ω and assume that all subsets
(events) are measurable, i.e., 𝒮 = 𝒫(Ω). A discrete probability distribution is fully
determined by the values on the singletons {𝜔} (the elementary events). We define
\[ p(\omega) = Pr[\{\omega\}] \]
and obtain for all events 𝐴 ⊂ Ω:
\[ Pr[A] = \sum_{\omega \in A} p(\omega). \]
Example 1.44. Let Ω = ℕ and 0 < 𝑝 < 1. Define a discrete probability distribution
𝑃𝑟 on ℕ by
\[ Pr\Big[\bigcap_{i \in I} A_i\Big] = \prod_{i \in I} Pr[A_i]. \] ♢
Definition 1.47. Let 𝐴 and 𝐵 be two events in a probability space and suppose that
Note that 𝑋 induces a discrete probability distribution 𝑃𝑟𝑋 on the countable subset
𝑋(Ω) ⊂ ℝ. The difference to the original distribution 𝑃𝑟 is that the sample space of 𝑃𝑟𝑋
is now a subset of ℝ. If the sample space Ω is already a subset of ℝ, then 𝑋 is usually
the inclusion map.
Example 1.49. Suppose two dice are rolled and the random variable 𝑋 gives the sum
of numbers on the dice. Then $X^{-1}(2) = \{(1,1)\}$ and $X^{-1}(3) = \{(1,2),(2,1)\}$, so that
\[ p_X(2) = Pr[X = 2] = \frac{1}{36}, \qquad p_X(3) = Pr[X = 3] = \frac{1}{36} + \frac{1}{36} = \frac{1}{18}. \]
Furthermore, $F(x) = 0$ for $x < 2$, $F(2) = \frac{1}{36}$, $F(3) = \frac{1}{36} + \frac{1}{18} = \frac{1}{12}$, etc., and
$F(x) = 1$ for $x \geq 12$.
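The values in Example 1.49 can be reproduced by enumerating the 36 outcomes. The following plain-Python sketch uses exact fractions; the names `pmf` and `cdf` are chosen here for the probability mass function 𝑝𝑋 and the cumulative distribution function 𝐹.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))    # sample space: 36 pairs
counts = Counter(a + b for a, b in outcomes)       # how often each sum occurs
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

def cdf(x):
    """F(x) = Pr[X <= x] for the sum X of two dice."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))
```

For example, `pmf[3]` is 1/18 and `cdf(3)` is 1/12, in agreement with the example.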
Definition 1.50. Let 𝑃𝑟 be a discrete probability distribution and 𝑋 ∶ Ω → ℝ a random
variable with countable range 𝑋(Ω) ⊂ ℝ. One defines the expected value (also called
expectation, mean or average) 𝐸[𝑋] and the variance 𝑉[𝑋] if the sums given below are
either finite or the corresponding series converge absolutely:
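\[ E[X] = \sum_{x \in X(\Omega)} x \cdot Pr[X = x] = \sum_{x \in X(\Omega)} x \cdot p_X(x), \]
\[ V[X] = E[(X - E[X])^2] = \sum_{x \in X(\Omega)} (x - E[X])^2 \cdot p_X(x). \]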
The square root 𝜎 = √𝑉[𝑋] of the variance is called the standard deviation. It measures
the quadratic deviation from the mean 𝐸[𝑋].
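As a small worked instance of Definition 1.50, the following plain-Python sketch computes the expected value and the variance of a single fair die with exact fractions. The die and the function names are assumptions chosen for illustration, and the variance is computed as the mean squared deviation from 𝐸[𝑋].

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum of x * p_X(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V[X] = sum of (x - E[X])^2 * p_X(x) over all values x."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}   # uniform pmf on {1, ..., 6}
mean, var = expectation(die), variance(die)
```

This gives 𝐸[𝑋] = 7/2 and 𝑉[𝑋] = 35/12, so the standard deviation is about 1.71.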
Example 1.51. (1) Let 𝑃𝑟 be a uniform distribution on a finite set Ω. Assume that
the random variable 𝑋 maps Ω to the set {0, 1, … , 𝑛 − 1}. The pmf is 𝑝𝑋 (𝑥) =
Example 1.53. Let 𝑋1 and 𝑋2 be two binary random variables (values 0 or 1) that
are given by tossing two perfect coins so that 𝑋1 and 𝑋2 are independent. Now set
𝑋3 = 𝑋1 ⊕ 𝑋2 . Then 𝑋1 , 𝑋2 , 𝑋3 are pairwise independent and each of them has a
uniform distribution, but they are not mutually independent. We have
𝑃𝑟[𝑋1 = 1 ∧ 𝑋2 = 1 ∧ 𝑋3 = 1] = 0,
since 𝑋3 must be zero if 𝑋1 = 𝑋2 = 1, but
\[ Pr[X_1 = 1] \cdot Pr[X_2 = 1] \cdot Pr[X_3 = 1] = \left(\frac{1}{2}\right)^3 = \frac{1}{8}. \]
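The statement of Example 1.53 can be verified by enumerating the four equally likely outcomes of the two coin tosses. In this plain-Python sketch, the helper `pr` and the variable names are introduced for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin tosses; X3 = X1 XOR X2 is determined by them.
omega = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def pr(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Pairwise independence: check all pairs of variables and all pairs of values.
pairwise = all(
    pr(lambda w: w[i] == a and w[j] == b)
    == pr(lambda w: w[i] == a) * pr(lambda w: w[j] == b)
    for i, j in ((0, 1), (0, 2), (1, 2))
    for a, b in product((0, 1), repeat=2)
)

# Mutual independence fails: compare the joint probability with the product.
joint = pr(lambda w: w == (1, 1, 1))
product_of_marginals = (
    pr(lambda w: w[0] == 1) * pr(lambda w: w[1] == 1) * pr(lambda w: w[2] == 1)
)
```

The check yields `pairwise == True`, while `joint` is 0 and `product_of_marginals` is 1/8.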
Example 1.54. Let $\Omega = \{0,1\}^8$ be a space of plaintexts and ciphertexts. Suppose the
plaintexts 𝑋 are uniformly distributed. Let 𝜎 ∶ Ω → Ω be a random bit permutation
(see Section 1.2). Then the ciphertexts 𝑌 = 𝜎(𝑋) are also uniformly distributed. 𝑋 and
𝑌 are independent if
\[ Pr[X = m \wedge Y = c] = Pr[X = m] \cdot Pr[Y = c] \]
holds for all plaintexts 𝑚 and ciphertexts 𝑐. The right side of the equation gives $\frac{1}{2^8} \cdot \frac{1}{2^8} = \frac{1}{2^{16}}$
for all 𝑚 and 𝑐. If 𝑚 and 𝑐 possess a different number of ones, then the left side is 0,
because such a combination is impossible for a bit permutation. This shows that 𝑋 and
𝑌 are not independent. Later we will see that bit permutations are not secure, since
the ciphertext leaks information about the plaintext.
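The leak described in Example 1.54 can be made concrete: a bit permutation never changes the number of ones in a word. The plain-Python sketch below checks this for all 256 words of length 8. The function `permute_bits` and the fixed seed are assumptions for this illustration, and Python's `random` module is used only to pick a sample permutation; it is not a cryptographically secure generator.

```python
import random

def permute_bits(bits, sigma):
    """Move bit i of the input word to position sigma[i]."""
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        out[sigma[i]] = b
    return tuple(out)

rng = random.Random(1)            # fixed seed, for reproducibility only
sigma = list(range(8))
rng.shuffle(sigma)                # a sample permutation of the 8 bit positions

words = [tuple((m >> i) & 1 for i in range(8)) for m in range(256)]
weight_preserved = all(sum(permute_bits(w, sigma)) == sum(w) for w in words)
is_bijection = len({permute_bits(w, sigma) for w in words}) == 256
```

An adversary who sees a ciphertext with, say, three ones therefore knows that the plaintext also has exactly three ones, whichever permutation was chosen.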
1.5. Random Numbers 23
Figure 1.3. Probability mass functions of i) the binomial distribution 𝐵(20, ) (•) and
ii) the uniform distribution (×) on {0, 1, 2, … , 20}.
Example 1.55. Let 𝑃𝑟 be a probability distribution on a sample space Ω with two el-
ements. Suppose 𝑋 ∶ Ω → {0, 1} is a random variable with 𝑃𝑟[𝑋 = 1] = 𝑝 (success)
and 𝑃𝑟[𝑋 = 0] = 1 − 𝑝 (failure). This is called a Bernoulli trial. Furthermore, let
𝑋1, …, 𝑋𝑛 be 𝑛 independent and identically distributed (i.i.d.) random variables, each
distributed like 𝑋, and define
𝑌 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛.
The new random variable 𝑌 follows a binomial distribution 𝐵(𝑛, 𝑝) and gives the number
of successes in 𝑛 independent Bernoulli trials. For 𝑘 ∈ {0, 1, …, 𝑛} one has
𝑃𝑟[𝑌 = 𝑘] = (𝑛 choose 𝑘) 𝑝^𝑘 (1 − 𝑝)^{𝑛−𝑘},
since there are (𝑛 choose 𝑘) combinations of 𝑛 trials with 𝑘 successes and 𝑛 − 𝑘 failures. The
probability of each combination is 𝑝^𝑘 (1 − 𝑝)^{𝑛−𝑘}. We have 𝐸[𝑋] = 𝑝, 𝐸[𝑌] = 𝑛𝑝,
𝑉[𝑋] = 𝑝(1 − 𝑝) and 𝑉[𝑌] = 𝑛𝑝(1 − 𝑝).
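The formulas for the binomial distribution can be verified numerically. The sketch below is an illustration in plain Python; the function name `binomial_pmf` and the parameter values 𝑛 = 20, 𝑝 = 0.25 are assumptions chosen here, not values taken from the text.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the list Pr[Y = k] for k = 0, ..., n, where Y follows B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 20, 0.25  # illustrative parameters (assumption)
pmf = binomial_pmf(n, p)

# Expected value and variance computed directly from their definitions.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(sum(pmf), mean, n * p, variance, n * p * (1 - p))
```

Up to floating-point rounding, the computed mean and variance agree with 𝑛𝑝 and 𝑛𝑝(1 − 𝑝).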
by deterministic algorithms, which take a short random seed as input and generate
a long output sequence that appears to be random. Pseudorandom generators are
discussed in Section 2.8.
Definition 1.56. A random bit generator (RBG) is a mechanism or device which gen-
erates a sequence of random bits, such that the corresponding sequence of binary ran-
dom variables 𝑋1 , 𝑋2 , 𝑋3 , … has the following properties:
(1) 𝑃𝑟[𝑋𝑛 = 0] = 𝑃𝑟[𝑋𝑛 = 1] = 1/2 for all 𝑛 ∈ ℕ (uniform distribution) and
(2) 𝑋1 , 𝑋2 , … , 𝑋𝑛 are mutually independent for all 𝑛 ∈ ℕ.
Example 1.57. If at least one output bit is a combination of the other bits, for example
if 𝑋3 = 𝑋1 ⊕ 𝑋2 (see Example 1.53), then this does not give a random bit sequence.
This demonstrates that the obvious constructions to ‘stretch’ a given sequence cannot
be used. ♢
Random bits or numbers can be produced manually (for example coin tossing, die
rolling or mouse movements) or with hardware random number generators, which use
physical phenomena like thermal noise, electrical noise or nuclear decay. Unfortu-
nately, these mechanisms or devices tend to be slow, elaborate and/or costly. Fast
all-digital random bit generators on current processor chips use thermal noise, but
whether such generators can be trusted and do not have any weaknesses or even con-
tain backdoors is disputed.
Remark 1.58. The required uniform distribution of the output of a bit generator can
be achieved by de-skewing a possibly biased generator (see Example 1.59 below), but
the statistical independence of the output bits is hard to achieve and difficult to prove.
Example 1.59. Von Neumann proposed the following de-skewing technique (von Neu-
mann extractor): group the output bits into pairs, then turn 01 into 0 and 10 into 1. If
the bits are independent, then the pairs 01 and 10 must have the same probability. The
pairs 00 and 11 are discarded. The derived generator is slower but unbiased. ♢
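The de-skewing technique of Example 1.59 can be sketched in a few lines of Python. The function name `von_neumann_extract`, the bias of 0.8 and the fixed seed are assumptions made for this illustration; the simulated source produces independent bits, as the technique requires.

```python
import random


def von_neumann_extract(bits):
    """De-skew a bit sequence: the pair 01 gives 0, 10 gives 1, 00 and 11 are discarded."""
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)  # 01 -> 0 and 10 -> 1
    return out


# Simulated biased source (assumption): independent bits with Pr[1] = 0.8.
rng = random.Random(2024)
biased = [1 if rng.random() < 0.8 else 0 for _ in range(100_000)]
unbiased = von_neumann_extract(biased)

print(len(unbiased), sum(unbiased) / len(unbiased))
```

The derived sequence is much shorter than the input, since every pair 00 or 11 is dropped, but the relative frequency of ones is close to 1/2.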
that the same input can have a different output value, and this can be a desirable prop-
erty for data encryption.
Random numbers match surprisingly often. This is known as the Birthday Problem
or Birthday Paradox.
Example 1.60. Assume that 23 people are in a certain place. Then the probability
that at least two of them have their birthday on the same day of the year is above 50%.
Intuitively, one would expect that around people would be needed for a probable
birthday match. ♢
The explanation of this ‘paradox’ is quite simple: the probability 𝑝 that no colli-
sion occurs (i.e., all birthdays are different) decreases exponentially with the number
𝑛 of persons. We assume that birthdays are uniformly distributed. For 𝑛 = 2, one has
𝑝 = 364/365. For 𝑛 = 3, one gets 𝑝 = 364/365 ⋅ 363/365, and each increment of 𝑛 yields another
factor. We write a SageMath function and obtain 𝑝 ≈ 0.493 for 𝑛 = 23. The comple-
mentary probability 1−𝑝 for a birthday collision with 23 people therefore lies above 0.5.
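A minimal sketch of such a function, written here in plain Python rather than SageMath, could look as follows. The name `no_collision_probability` and the parameter `days` are assumptions introduced for this illustration.

```python
def no_collision_probability(n, days=365):
    """Probability that n uniformly distributed birthdays are pairwise different."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days  # the (i+1)-th person avoids the i birthdays already taken
    return p


for n in (2, 3, 22, 23):
    print(n, no_collision_probability(n))
```

The value drops below 1/2 for the first time at 𝑛 = 23.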
The running time of finding a collision among binary strings of length 𝑙 is 𝑂(2^{𝑙/2}).
Unfortunately, a large amount of space is also required, since all 𝑂(2^{𝑙/2}) strings have to
be stored to detect a collision.
An optimization is possible if the samples are defined recursively by a function:
𝑥𝑖 = 𝑓(𝑥𝑖−1 ) for 𝑖 ≥ 1,
where 𝑥0 is some initial value. Now the problem is to find a cycle in a sequence of
iterated function values. Floyd’s cycle-finding algorithm uses very little memory and is
based on the following observation:
Proposition 1.63. Let 𝑓 ∶ 𝑋 → 𝑋 be a function on some set 𝑋, 𝑥0 ∈ 𝑋 and 𝑥𝑖 = 𝑓(𝑥𝑖−1 )
for 𝑖 ≥ 1. Suppose there exist 𝑖, 𝑗 ∈ ℕ such that 𝑖 < 𝑗 and 𝑥𝑖 = 𝑥𝑗 . Then there exists an
integer 𝑘 < 𝑗 such that
𝑥_𝑘 = 𝑥_{2𝑘}.
Proof. Let Δ = 𝑗 − 𝑖; then 𝑥_𝑖 = 𝑥_{𝑖+Δ} and hence 𝑥_𝑘 = 𝑥_{𝑘+Δ} = 𝑥_{𝑘+𝑚Δ} for all integers
𝑘 ≥ 𝑖 and 𝑚 ≥ 1. Now let 𝑘 = 𝑚Δ, where 𝑚Δ is the smallest multiple of Δ that is
also greater than or equal to 𝑖. The sequence 𝑖, 𝑖 + 1, …, 𝑗 − 1 of Δ consecutive integers
contains the required number 𝑘 = 𝑚Δ. Therefore, 𝑥_𝑘 = 𝑥_{𝑘+𝑚Δ} = 𝑥_{2𝑘} and 𝑘 = 𝑚Δ < 𝑗. □
Note that a collision must exist if 𝑋 is a finite set. The above Proposition 1.63
implies that a collision in 𝑥_0, 𝑥_1, …, 𝑥_𝑗 yields a collision of the special form 𝑥_𝑘 = 𝑥_{2𝑘}
for some 𝑘 < 𝑗. The least period of the sequence divides 𝑘. It is therefore sufficient to
compute the pairs (𝑥_𝑖, 𝑥_{2𝑖}) for 𝑖 = 1, 2, … until a collision occurs. These values can be
recursively calculated:
𝑥_𝑖 = 𝑓(𝑥_{𝑖−1}) and 𝑥_{2𝑖} = 𝑓(𝑓(𝑥_{2(𝑖−1)})).
Assuming that the sequence 𝑥0 , 𝑥1 , … is uniformly distributed and |𝑋| = 𝑛, the run-
ning time is still 𝑂(√𝑛), but now it is sufficient to store only two values. This approach
is used in birthday attacks against hash functions and in Pollard’s 𝜌 algorithms for fac-
toring and discrete logarithms. The sequence 𝑥0 , 𝑥1 , … can be depicted by an initial
tail and a cycle so that it looks like the Greek letter 𝜌.
Remark 1.64. We only consider the part of Floyd’s algorithm which finds a collision.
The algorithm can also compute the least period, i.e., the length of the shortest cycle,
and find the beginning of the cycle.
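The additional steps mentioned in Remark 1.64 can be sketched in plain Python as follows. This is a common textbook formulation of Floyd's algorithm and not necessarily the variant the author has in mind; the function name `floyd` and the return convention are assumptions. The demonstration uses the function 𝑓(𝑥) = 𝑥^2 + 26 mod 107 with 𝑥_0 = 1 from Example 1.65.

```python
def floyd(f, x0):
    """Return (k, mu, lam): the first k >= 1 with x_k == x_2k, the tail length mu
    (index of the first element on the cycle) and the least period lam."""
    # Phase 1: the tortoise holds x_k, the hare holds x_2k.
    tortoise, hare, k = f(x0), f(f(x0)), 1
    while tortoise != hare:
        tortoise, hare, k = f(tortoise), f(f(hare)), k + 1

    # Phase 2: restart the tortoise at x_0 and move both by single steps.
    # Since k is a multiple of the period, they first meet at x_mu.
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare, mu = f(tortoise), f(hare), mu + 1

    # Phase 3: walk once around the cycle to measure its length.
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare, lam = f(hare), lam + 1
    return k, mu, lam


print(floyd(lambda x: (x * x + 26) % 107, 1))  # prints (9, 8, 9)
```

Only a constant number of sequence values is stored at any time, which is the point of the method.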
Example 1.65. Let 𝑋 = ℤ_107 be the set of residue classes modulo 107 and let
𝑓(𝑥) = 𝑥^2 + 26 mod 107.
Set 𝑥0 ≡ 1 mod 107 and let 𝑥𝑖 = 𝑓(𝑥𝑖−1 ) for 𝑖 ≥ 1. We want to find a collision within
the sequence 𝑥0 , 𝑥1 , 𝑥2 , … and implement Floyd’s cycle finding algorithm:
sage: def f(x):
....:     return x*x + 26
sage: x = mod(1, 107)
sage: y = mod(1, 107)
sage: x = f(x)        # x holds x_k
sage: y = f(f(y))     # y holds x_2k
sage: k = 1
sage: while x != y:
....:     x = f(x)
....:     y = f(f(y))
....:     k = k + 1
sage: print("k =", k, " x =", x)
k = 9  x = 39
Hence 𝑥9 = 𝑥18 = 39 is a collision. Let’s compute the first few elements of the
sequence and verify the result:
sage: x = mod(1, 107)
sage: for i in range(44):
....:     x = f(x)
....:     print("{:2}".format(int(x)), end=" ")
27  6 62 18 29 11 40 21 39 49 73  5 51 59 83 67 21 39 49 73  5 51
59 83 67 21 39 49 73  5 51 59 83 67 21 39 49 73  5 51 59 83 67 21
The first seven printed elements 𝑥_1, …, 𝑥_7 form the initial segment (tail) of the sequence. The
beginning of the cycle is 𝑥_8 = 21, and the sequence 21, 39, 49, … is cyclic of period 9.
1.6. Summary
1. Let 𝑋 = ([−1, 1] ∩ ℤ) × {0, 1}. Enumerate the elements of 𝑋 and determine |𝑋|. Let
𝑌 = {1, 2, … , |𝑋|}. Give a bijection from 𝑋 to 𝑌 .
2. Which of the following maps are injective, surjective or bijective? Determine the
image 𝑖𝑚(𝑓) and give the inverse map 𝑓−1 , if possible.
(a) 𝑓1 ∶ ℕ → ℕ, 𝑓1 (𝑛) = 2𝑛 + 1.
(b) 𝑓2 ∶ ℤ → ℕ, 𝑓2 (𝑘) = |𝑘| + 1.
(c) 𝑓3 ∶ {0, 1}8 → {0, 1}8 , 𝑓3 (𝑏) = 𝑏 ⊕ (01101011).
(d) 𝑓4 ∶ {0, 1}8 → {0, 1}8 , 𝑓4 (𝑏) = 𝑏 AND (01101011).
3. Show that the following sets are countable:
(a) ℤ.
(b) ℤ2 .
(c) ℚ.
Hint: It is sufficient to construct an injective function into ℕ.
4. Let 𝑓 ∶ 𝑋 → 𝑌 be a map between finite sets and suppose that |𝑋| = |𝑌 |. Show the
following equivalences:
𝑓 is injective ⟺ 𝑓 is surjective ⟺ 𝑓 is bijective.
5. Let 𝑓 ∶ 𝑋 → 𝑌 be a function.
(a) Let 𝐵 ⊂ 𝑌 . Show that 𝑓(𝑓−1 (𝐵)) ⊂ 𝐵 with equality occurring if 𝑓 is surjective.
(b) Let 𝐴 ⊂ 𝑋. Show that 𝐴 ⊂ 𝑓−1 (𝑓(𝐴)) with equality occurring if 𝑓 is injective.
6. Enumerate the integers modulo 26. Find the standard representative of the follow-
ing integers in ℤ26 :
−1000, −30, −1, 15, 2001, 293829329302932398231.
7. Consider the following relation 𝑆 on ℝ:
𝑆 = {(𝑥, 𝑦) ∈ ℝ × ℝ | 𝑥 − 𝑦 ∈ ℤ}.
Show that 𝑆 is an equivalence relation on ℝ. Determine the equivalence classes 0,
−2 and . Can you give an interval 𝐼 such that there is a bijection between 𝐼 and
the quotient set ℝ/ ∼ ?
8. Find an asymptotic upper bound of the following functions in 𝑛. Which of them
are polynomial and which are negligible?
(a) 𝑓1 = 2𝑛^3 − 3𝑛^2 + 𝑛.
(b) 𝑓2 = 3 ⋅ 2𝑛 − 2𝑛 + 1.
(c) 𝑓3 = √2𝑛 + 1.
(d) 𝑓4 = 𝑛/2 .
(e) 𝑓5 = (5𝑛^2 − 𝑛) / (2𝑛^2 + 3𝑛 + 1).
(f) 𝑓6 = 2 3 .
(g) 𝑓7 = log2 (𝑛)2 𝑛.
9. Let 𝑓 = 𝑓(𝑥0, 𝑥1, 𝑥2) be a 3-variable Boolean function with the following truth table:
𝑥2 𝑥1 𝑥0 𝑓(𝑥)
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1
16. (Birthday Paradox) Let 𝑃𝑟 be a uniform distribution on the sample space Ω with
|Ω| = 𝑛. If 𝑘 ≤ 𝑛 samples are independently chosen, then the probability 𝑝 that
all 𝑘 values are different (i.e., no collision occurs) is
𝑝 = ∏_{𝑖=1}^{𝑘−1} (1 − 𝑖/𝑛).
(a) Show that the probability 1 − 𝑝 of a collision satisfies
1 − 𝑝 ≥ 1 − 𝑒^{−𝑘(𝑘−1)/(2𝑛)}.
(b) Determine the smallest number 𝑘 such that 𝑝 ≈ 1/2.
Hint: Use the inequality 1 − 𝑥 ≤ 𝑒^{−𝑥} for 0 ≤ 𝑥 ≤ 1 and replace the factors 1 − 𝑖/𝑛
by 𝑒^{−𝑖/𝑛}. Compute the product and obtain a sum in the exponent. Use the formula
∑_{𝑖=1}^{𝑘−1} 𝑖 = 𝑘(𝑘−1)/2. For part (b), set 𝑝 = 1/2 and determine 𝑘 using the quadratic
formula. You may also approximate 𝑘(𝑘 − 1) by 𝑘2 . This gives the approximate
number of samples needed for a probable collision.
Chapter 2. Encryption Schemes and Definitions of Security
This chapter contains the fundamental definitions of encryption schemes and their
security. We look at security under different types of attacks and make assumptions
about the computing power of adversaries. Then we study pseudorandom generators,
functions and permutations, which are important primitives and the basis of many
cryptographic constructions.
The definition of cryptosystems and some basic examples are given in Section
2.1. The following three sections deal with different types of security of encryption
schemes. Perfect secrecy is the strongest type of security and is covered in Section 2.2.
Since perfectly secure schemes can rarely be used in practice, we relax the require-
ments and consider computational security in Section 2.3. For the formal definition
of security, we look at the success probability of polynomial-time adversaries in well-
defined games or experiments in Section 2.4. We then explain the important defini-
tions of eavesdropping (EAV) security, security against chosen plaintext attacks (CPA)
and security against chosen ciphertext attacks (CCA). A secure scheme should have in-
distinguishable encryptions: challenged with a ciphertext and two possible plaintexts,
a polynomial-time adversary fails to find the correct plaintext better than a random
We then turn to the construction of secure encryption schemes. Pseudorandom
generators and families of pseudorandom functions and permutations are important
building blocks of secure ciphers and are covered in Sections 2.8 and 2.9.
The combination of a family of functions or permutations and an operation mode
defines an encryption scheme. In Section 2.10, we discuss ECB, CBC and CTR modes
and their properties. The security of CBC or CTR mode encryption can be reduced to
the pseudorandomness of the underlying block cipher.
The presentation of this chapter is heavily influenced by [KL15]. Other recom-
mended references are [BR05], [GB08] and [Gol01].
Remark 2.2. The security parameter 𝑛 controls the security of the scheme and the dif-
ficulty to break it, as well as the run-time of key generation, encryption and decryption
algorithms. The security parameter is closely related or even equal to the key length,
and quite often the key generation algorithm 𝐺𝑒𝑛(1^𝑛) outputs a uniform random key
of length 𝑛. ♢
The scheme is said to be symmetric-key if encryption and decryption use the same
secret key. In contrast, public-key (asymmetric-key) encryption schemes use key pairs
𝑘 = (𝑝𝑘, 𝑠𝑘), where 𝑝𝑘 is public and 𝑠𝑘 is private; encryption takes the public key 𝑝𝑘 as
input and decryption the private key 𝑠𝑘 (see Definition 9.1). The encryption algorithm
ℰ𝑝𝑘 must be carefully chosen so that the inversion (decryption) is computationally hard
if only 𝑝𝑘 is known.
Until the 1970s, only symmetric-key schemes were known, but subsequently
public-key methods became part of standard cryptography. We will see later that both
schemes have their own field of application. Public-key encryption is studied in Chap-
ter 9, while this chapter only deals with symmetric encryption schemes.
Remark 2.3. A cryptosystem should be secure under the assumption that an attacker
knows the encryption and decryption algorithms. This is known as Kerckhoffs’s Principle.
The security should be solely based on a secret key, not on the details of the system (see
Exercise 2). ♢
In the past, the plaintexts, ciphertexts and keys were often constructed using the
alphabet of letters. Now only the binary alphabet is relevant and
ℳ, 𝒞, 𝒦 are subsets of {0, 1}^∗ = ⋃_{𝑛∈ℕ} {0, 1}^𝑛.
Modern symmetric encryption schemes support key lengths between 128 and 256
bits. In contrast, public-key algorithms (with the exception of elliptic curve schemes)
use longer keys consisting of more than 1000 bits. Most modern symmetric schemes
are able to encrypt plaintexts of arbitrary length. If however the message length is fixed
by the security parameter, then we speak of a fixed-length encryption scheme.
Example 2.4. The one-time pad is an example of a simple but very powerful fixed-
length symmetric encryption scheme. It uses the binary alphabet, and the key length is
equal to the message length. The security parameter 𝑛 defines the length of plaintexts,
ciphertexts and keys:
ℳ = 𝒞 = 𝒦 = {0, 1}^𝑛.
The key generation algorithm 𝐺𝑒𝑛(1^𝑛) outputs a uniform random key 𝑘 ← {0, 1}^𝑛. A
key 𝑘 of length 𝑛 is used only for one message, 𝑚 ∈ {0, 1}^𝑛. Encryption ℰ𝑘 and decryption
𝒟𝑘 are identical and defined by a simple vectorial XOR operation:
𝑐 = ℰ𝑘 (𝑚) = 𝑚 ⊕ 𝑘, 𝑚 = 𝒟𝑘 (𝑐) = 𝑐 ⊕ 𝑘.
We will see below that this scheme provides perfect security, but since the key has the
same length as the plaintext, the one-time pad is impractical. Much shorter keys (say
several hundred bits), which can be used for a large amount of data (say megabytes or
gigabytes), are preferable.
Example 2.5. The Vigenère cipher of (key) length 𝑛 is a classical example of a symmet-
ric variable-length scheme over the alphabet of letters. One sets
ℳ = 𝒞 = Σ∗ and 𝒦 = Σ𝑛 , where Σ = {𝐴, 𝐵, … , 𝑍} ≅ ℤ26 .
The letters A to Z can be represented by integers modulo 26 (see Example 1.21). The
letter A corresponds to the residue class 0, B to 1, … , Z to 25. A cyclic shift then becomes
addition or subtraction of residue classes.
𝐺𝑒𝑛(1^𝑛) generates a uniform random key string 𝑘 ← Σ^𝑛 of length 𝑛. For encryption
and decryption, the message and the ciphertext are split into blocks of length 𝑛, although
the last block can be shorter. Each letter in a plaintext block is transformed by a cyclic
shift, where the number of positions is determined by the corresponding key letter.
For encryption the shifting is in the positive direction, and for decryption it is in the
opposite direction.
𝑐 = ℰ𝑘 (𝑚) = ℰ𝑘 (𝑚1 ‖𝑚2 ‖ … ) = (𝑚1 + 𝑘 ‖ 𝑚2 + 𝑘 ‖ … ) mod 26,
𝑚 = 𝒟𝑘 (𝑐) = 𝒟𝑘 (𝑐1 ‖𝑐2 ‖ … ) = (𝑐1 − 𝑘 ‖ 𝑐2 − 𝑘 ‖ … ) mod 26.
For 𝑛 = 1, one obtains a monoalphabetic substitution cipher, for example the so-called
Caesar cipher, where 𝑘 = 3: each letter is shifted by three positions, the letter A maps
to D, B maps to E, etc. The Vigenère cipher of length 𝑛 > 1 is an example of a polyal-
phabetic substitution cipher. Although the key can be long, each ciphertext letter only
depends on a single plaintext character.
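The Vigenère cipher of Example 2.5 can be sketched in Python as follows. The function name `vigenere`, the `decrypt` flag and the sample key LEMON are assumptions made for this illustration; the text is expected to consist of the upper-case letters A to Z only.

```python
def vigenere(text, key, decrypt=False):
    """Shift each letter of text by the corresponding key letter (A = 0, ..., Z = 25)."""
    base = ord("A")
    sign = -1 if decrypt else 1
    out = []
    for i, letter in enumerate(text):
        shift = ord(key[i % len(key)]) - base  # the key is repeated block by block
        out.append(chr((ord(letter) - base + sign * shift) % 26 + base))
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                   # prints LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # prints ATTACKATDAWN
print(vigenere("ABC", "D"))                         # Caesar cipher, k = 3: prints DEF
```

A key of length 1 reproduces the monoalphabetic shift cipher, and the key D corresponds to the Caesar cipher with 𝑘 = 3.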
The above examples all use linear or affine transformations. Later we will see that
affine ciphers are often vulnerable to known plaintext attacks which only require stan-
dard linear algebra (see Proposition 4.91). There are also classical nonlinear mono- or
polyalphabetic substitution ciphers. If the plaintext contains text from a known lan-
guage, then such ciphers can often be broken by a frequency analysis. It is also possible
to reveal an unknown length of a polyalphabetic cipher.
2.2. Perfect Secrecy 35
Perfect secrecy means that all plaintexts have the same probability for a given ci-
phertext. This property is also called perfect indistinguishability: without the secret key,
it is impossible to find out which plaintext was encrypted. An eavesdropper truly learns
nothing about the plaintext from the ciphertext, provided that any key is possible. In
other words: given any ciphertext 𝑐, every plaintext message 𝑚 is exactly as likely to
be the underlying plaintext. Note that the definition requires a fixed plaintext length
since encryption usually does not hide the length of a plaintext message.
The following lemma provides an alternative definition of perfect secrecy.
Lemma 2.9. An encryption scheme is perfectly secret if and only if for every probability
distribution over ℳ, every plaintext 𝑚 and every ciphertext 𝑐 for which 𝑃𝑟[𝑐] > 0, the
probability of 𝑚 and the conditional probability of 𝑚 given 𝑐 coincide:
𝑃𝑟[𝑚 | 𝑐] = 𝑃𝑟[𝑚]. ♢
Although perfect secrecy is a very strong requirement, there is a very simple cipher
which achieves this level of security: the one-time pad (see Example 2.4).
Theorem 2.10. The one-time pad is perfectly secret if the key is generated by a random
bit generator (see Definition 1.56) and is only used once.
Proof. Let 𝑛 be a security parameter of the one-time pad. Suppose 𝑚0, 𝑚1 are plaintexts
and 𝑐 is a ciphertext of length 𝑛. Then there is exactly one key 𝑘0 of length 𝑛 which
encrypts 𝑚0 into 𝑐, namely 𝑘0 = 𝑚0 ⊕ 𝑐. Since we assumed a uniform distribution
of keys, we have 𝑃𝑟[ℰ𝑘(𝑚0) = 𝑐] = 1/2^𝑛. The same holds true for 𝑚1, which proves the
theorem. □
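For a small security parameter, the counting argument in the proof of Theorem 2.10 can be checked exhaustively. The following Python sketch uses 𝑛 = 3; the helper names `encrypt` and `prob_ciphertext` are assumptions introduced for this illustration.

```python
from fractions import Fraction
from itertools import product

n = 3  # small security parameter, so that all keys can be enumerated
strings = list(product((0, 1), repeat=n))  # all bit strings of length n


def encrypt(key, message):
    """One-time pad encryption: bitwise XOR of message and key."""
    return tuple(m ^ k for m, k in zip(message, key))


def prob_ciphertext(message, ciphertext):
    """Pr[E_k(message) = ciphertext] for a uniformly chosen key k."""
    hits = sum(1 for key in strings if encrypt(key, message) == ciphertext)
    return Fraction(hits, len(strings))


all_equal = all(
    prob_ciphertext(m, c) == Fraction(1, 2**n) for m in strings for c in strings
)
print(all_equal)
```

Every pair of plaintext and ciphertext is hit by exactly one key, so the probability does not depend on the plaintext.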
Example 2.11. Suppose a Vigenère cipher of key length 3 is used to encrypt four char-
acters. Let 𝑐 = (𝑦1 , 𝑦2 , 𝑦3 , 𝑦4 ) be any ciphertext of length four, 𝑚 = (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) the
corresponding plaintext and 𝑘 = (𝑘1 , 𝑘2 , 𝑘3 ) the key. Then
𝑦1 ≡ 𝑥1 + 𝑘1 , 𝑦2 ≡ 𝑥2 + 𝑘2 , 𝑦3 ≡ 𝑥3 + 𝑘3 , 𝑦4 ≡ 𝑥4 + 𝑘1 mod 26.
The difference between the fourth ciphertext and plaintext character is congruent to
the difference between the first ciphertext and plaintext character:
𝑘1 ≡ 𝑦1 − 𝑥1 ≡ 𝑦4 − 𝑥4 mod 26.
This forms a condition for all valid plaintext/ciphertext pairs. If 𝑐 = (𝑦1, 𝑦2, 𝑦3, 𝑦4) is
given, then there are many infeasible plaintexts 𝑚0, i.e., ℰ𝑘(𝑚0) ≠ 𝑐 for all 𝑘 ∈ 𝒦, and
𝑃𝑟[ℰ𝑘(𝑚0) = 𝑐] = 0.
On the other hand, plaintexts 𝑚1 = (𝑥1, 𝑥2, 𝑥3, 𝑥4) that satisfy the congruence 𝑦1 − 𝑥1 ≡
𝑦4 − 𝑥4 mod 26 are possible and their probability is
𝑃𝑟[ℰ𝑘(𝑚1) = 𝑐] = 1/26^3
if 𝑘 ∈ ℤ_26^3 is chosen uniformly at random. Hence the cipher does not have perfect
secrecy. ♢
If the key is shorter than the message, then an encryption scheme cannot be per-
fectly secret: a known ciphertext message changes the posterior probability of a plain-
text message. On the other hand, adversaries with limited resources might not be able
to exploit this situation so that the scheme is still computationally secure. This is dis-
cussed in the next section.
Example 2.13. Assume that the best-known attack against a scheme is exhaustive key
search (brute force) and that the key has length 𝑛. If testing a single key takes 𝑐 CPU
cycles and in total 𝑁 CPU cycles are executed, then 𝑁/𝑐 keys can be tested and the
probability of success is approximately 𝑁/(𝑐 ⋅ 2^𝑛), if 𝑁/𝑐 ≪ 2^𝑛. Hence the scheme is
(𝑁, 𝑁/(𝑐 ⋅ 2^𝑛))-secure.
Suppose an adversary uses a computer with one 2 GHz CPU and performs a brute force
attack against a scheme with 128-bit key length over the course of a year. Let’s assume
that 𝑐 = 1. Then roughly 2^55 keys can be tested and the scheme is (2^55, 2^−73)-secure.
Note that an event with a probability of 2^−73 will never occur in practice. ♢
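The order of magnitude in Example 2.13 can be reproduced with a short calculation. The Python sketch below is an illustration only; the function name `brute_force_budget` and the year length of 365 days are assumptions.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a year of 365 days


def brute_force_budget(clock_hz, cycles_per_key, key_bits, seconds=SECONDS_PER_YEAR):
    """Return (log2 of the number of keys tested, log2 of the success probability)."""
    keys_tested = clock_hz * seconds / cycles_per_key
    log2_keys = math.log2(keys_tested)
    return log2_keys, log2_keys - key_bits


# One 2 GHz CPU, c = 1 cycle per key, 128-bit key, one year.
log2_keys, log2_success = brute_force_budget(2e9, 1, 128)
print(log2_keys, log2_success)
```

The exponent for the number of tested keys lies between 55 and 56, which matches the rough figures used in the example.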
We see that concrete values depend on the hardware used, e.g., the type of CPU, as
well as on the implementation of attacks. Now we give an asymptotic version of a defi-
nition of security. In this approach, the running time and the probability of breaking a
scheme are considered as functions of the security parameter 𝑛 (see Remark 2.2), and
one analyzes the behavior for sufficiently large values of 𝑛.
Definition 2.14. An encryption scheme is called computationally secure if every
probabilistic algorithm with polynomial running time can only break the scheme with
negligible probability in the security parameter 𝑛. ♢
choose plaintexts (Chosen Plaintext Attack, CPA). If the adversary can even choose ci-
phertexts and obtain the corresponding plaintexts, then we call this a Chosen Ciphertext
Attack (CCA).
We consider experiments (or games) between two algorithms, a polynomial-time
adversary and a challenger. We denote the adversary by 𝐴 and the challenger by 𝐶.
The challenger takes as input a security parameter and sets up the experiment, for ex-
ample by generating parameters and keys. 𝐶 runs the experiment and interacts with
𝐴. In the experiment, 𝐴 has certain choices and capabilities. Finally, 𝐴 has to answer
a challenge and outputs a single bit. The challenger verifies the answer and outputs 1
(𝐴 was successful and won the game) or 0 (𝐴 failed). Obviously, 𝐴 has a 50% chance
of randomly guessing the correct answer, but 𝐴 might also use a more effective strat-
egy. The game is repeated many times so that a success probability and an advantage
(compared to random guesses) can be computed.
Such experiments may look artificial at first, but they answer the question as to
whether an adversary can obtain at least one bit of secret information by applying an
efficient algorithm. A scheme is considered broken if the probability of success is sig-
nificantly higher than 50%.
In many security experiments, 𝐶 chooses a uniform random secret bit 𝑏 and 𝐴
obtains a challenge that depends on 𝑏. Finally, 𝐴 outputs a bit 𝑏′ and wins the game if
𝑏 = 𝑏′ . Since the experiment is repeated many times, both 𝑏 and 𝑏′ can be considered
as random variables. The following Table 2.1 contains the four combinations of 𝑏 and
𝑏′ and their joint probabilities:
          𝑏′ = 0                    𝑏′ = 1
𝑏 = 0     𝑃𝑟[𝑏′ = 0 ∧ 𝑏 = 0]     𝑃𝑟[𝑏′ = 1 ∧ 𝑏 = 0]
𝑏 = 1     𝑃𝑟[𝑏′ = 0 ∧ 𝑏 = 1]     𝑃𝑟[𝑏′ = 1 ∧ 𝑏 = 1]
If the adversary randomly guesses 𝑏′, then all four probabilities are close to 1/4. On
the other hand, if 𝐴 is doing a good job, then the diagonal entries are greater than 1/4
and the other two are smaller than 1/4.
We define 𝐴’s advantage over random guesses as the difference between the probability
of success (output of the experiment is 1) and the probability of failure (output of
the experiment is 0).
2.5. Eavesdropping Attacks 39
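[Figure: EAV indistinguishability experiment between adversary and challenger; both receive 1^𝑛.]
(1) Challenger: 𝑘 ← 𝐺𝑒𝑛(1^𝑛), 𝑏 ← {0, 1} (random choices).
(2) Adversary → Challenger: 𝑚0, 𝑚1 with |𝑚0| = |𝑚1|.
(3) Challenger → Adversary: 𝑐 ← ℰ𝑘(𝑚𝑏).
(4) Adversary: selects 𝑚0 (𝑏′ = 0) or 𝑚1 (𝑏′ = 1).
(5) Challenger: compares 𝑏 and 𝑏′, outputs 1 or 0.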
The definition can also be used to show that a particular scheme does not have EAV
security. In this case, it suffices to give one example of a polynomial-time algorithm
which achieves a non-negligible advantage in the EAV experiment.
Example 2.19. Suppose a scheme does not encrypt the first bit, i.e., the first plaintext
bit and the first ciphertext bit coincide. Then the scheme does not have EAV security:
an adversary could choose two plaintexts that differ in their first bit. In this way they
are able to identify the correct plaintext from the challenge ciphertext.
Remark 2.20. It may seem surprising that the adversary can choose the plaintexts in
an eavesdropping attack. It would be conceivable to let the challenger select the plain-
texts. However, eavesdropping security should ensure that encryption protects every
plaintext, not just selected or random plaintexts.
Furthermore, there are real-world situations where the plaintext space is rather
small, say {0, 1} or {YES, NO}. Such plaintexts also deserve protection. The EAV exper-
iment is perfect for modeling this situation. ♢
We may also adopt a concrete approach to eavesdropping security (see Section 2.3):
Definition 2.21. An encryption scheme is (𝑡, 𝜖)-secure in the presence of an eaves-
dropper if for every probabilistic adversary 𝐴 running in time 𝑡, the advantage of 𝐴 is
less than 𝜖:
Adv^eav(𝐴) < 𝜖. ♢
2.6. Chosen Plaintext Attacks 41
Figure 2.3. Indistinguishability: no efficient algorithm can tell the two cases apart if
𝑘 and 𝑏 are secret.
Remark 2.22. This is the first in a series of similar experiments and the reader is
invited to think through the definitions. The adversary 𝐴 chooses two plaintexts of the
same length. The challenger encrypts one of the plaintexts and gives the ciphertext
to the adversary. Then 𝐴 has to distinguish between the two cases, i.e., find out which
plaintext was encrypted (see Figure 2.3). The question is whether 𝐴 finds a clever
way to tackle the challenge, or else falls back on random guesses. The advantage is
finally computed over many games with sample keys 𝑘, random bits 𝑏, ciphertexts 𝑐
and other randomness used by the adversary.
In practice, one would not really conduct a large number of such experiments. In
particular, modeling the possible strategies of an adversary seems rather difficult, but
the above definition is useful since it clearly states the requirements for eavesdropping
security and defines under which condition a scheme is considered broken.
Remark 2.23. Our security definition is based on indistinguishability. One can show
that this is equivalent to semantic security: an adversary cannot learn any partial infor-
mation about the plaintext from the ciphertext. This means that any function of the
plaintext, say extracting one bit, is hard to compute and polynomial-time adversaries
cannot do any better than random guessing. We refer to the literature for more details
on semantic security ([KL15], [Gol01]).
chooses a uniform random bit 𝑏 ← {0, 1}. A probabilistic polynomial-time adversary 𝐴
is given 1𝑛 , but 𝑘 and 𝑏 are not known to 𝐴. The adversary can choose arbitrary plain-
texts and get the corresponding ciphertext from an encryption oracle. The adversary
then chooses two different plaintexts 𝑚0 and 𝑚1 of the same length. The challenger
returns the ciphertext ℰ𝑘 (𝑚𝑏 ) of one of them. The adversary 𝐴 continues to have access
to the encryption oracle. Finally, 𝐴 tries to guess 𝑏 and outputs a bit 𝑏′ . The challenger
outputs 1 if 𝑏 = 𝑏′ , and 0 otherwise. The IND-CPA advantage of 𝐴 is defined as
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The probability is taken over all random variables in this experiment, i.e., the key 𝑘, bit
𝑏, encryption ℰ𝑘 and randomness of 𝐴.
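Adversary and Challenger/Oracle (both receive 1^𝑛):
(1) Challenger: 𝑘 ← 𝐺𝑒𝑛(1^𝑛), 𝑏 ← {0, 1} (random choices).
(2) Adversary → Oracle: chosen plaintext 𝑚′. Oracle → Adversary: 𝑐′ ← ℰ𝑘(𝑚′).
(3) Adversary → Challenger: 𝑚0, 𝑚1 with |𝑚0| = |𝑚1|.
(4) Challenger → Adversary: 𝑐 ← ℰ𝑘(𝑚𝑏).
(5) Adversary → Oracle: chosen plaintext 𝑚″. Oracle → Adversary: 𝑐″ ← ℰ𝑘(𝑚″).
(6) Adversary: selects 𝑚0 (𝑏′ = 0) or 𝑚1 (𝑏′ = 1).
(7) Challenger: compares 𝑏 and 𝑏′, outputs 1 or 0.
Figure 2.4. CPA indistinguishability experiment. The adversary may repeatedly ask
for the encryption of chosen plaintexts 𝑚′, 𝑚″.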
2.7. Chosen Ciphertext Attacks 43
Security under chosen plaintext attack is quite a strong condition. It basically says
that an adversary is not able to obtain a single bit of information from a given ciphertext,
even if they can ask for the encryption of arbitrary plaintexts.
Remark 2.26. The definition immediately implies that a deterministic encryption
scheme cannot be secure under IND-CPA. An adversary can, in fact, ask the oracle for
the encryption of the chosen plaintexts 𝑚0 and 𝑚1 . Then they only need to compare
the returned ciphertext with the challenge ciphertext. If the scheme is deterministic,
the IND-CPA advantage is equal to 1, but for a non-deterministic scheme, each encryp-
tion of a fixed plaintext can yield another ciphertext. Therefore, a simple comparison
cannot be used in the non-deterministic case.
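The attack described in Remark 2.26 can be simulated with a toy experiment. In the Python sketch below, the class `Challenger`, its methods and the deterministic toy scheme (XOR with a repeated fixed key) are hypothetical and serve only to illustrate the game; they are not a scheme from the text.

```python
import secrets


class Challenger:
    """Challenger and encryption oracle of a toy IND-CPA experiment."""

    def __init__(self, n_bytes=16):
        self.key = secrets.token_bytes(n_bytes)
        self.b = secrets.randbelow(2)  # secret challenge bit

    def oracle(self, message):
        # Hypothetical deterministic scheme: XOR with the repeated key, no randomness.
        return bytes(m ^ self.key[i % len(self.key)] for i, m in enumerate(message))

    def challenge(self, m0, m1):
        if len(m0) != len(m1):
            raise ValueError("plaintexts must have the same length")
        return self.oracle((m0, m1)[self.b])

    def verify(self, guess):
        return 1 if guess == self.b else 0


def adversary(challenger):
    """Ask the oracle for the encryption of m0 and compare with the challenge."""
    m0, m1 = b"YES", b"NO!"
    c = challenger.challenge(m0, m1)
    return 0 if challenger.oracle(m0) == c else 1


def play():
    challenger = Challenger()
    return challenger.verify(adversary(challenger))


wins = sum(play() for _ in range(1000))
print(wins)
```

The adversary wins every run of the experiment, so the advantage is 1. With a randomized encryption algorithm, the comparison in `adversary` would no longer identify the plaintext.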
Example 2.27. Suppose an encryption scheme is probabilistic, but the first ciphertext
bit depends only on the plaintext and the key and not on the randomness of the encryp-
tion algorithm. If the first ciphertext bit is not constant, then the scheme is not secure
under IND-CPA, even if the method is very strong otherwise. An adversary would gen-
erate multiple plaintexts of the same length and ask for encryption, until they find two
plaintexts 𝑚0 , 𝑚1 such that the corresponding ciphertexts differ in their first bit. Then
they choose 𝑚0 and 𝑚1 in the CPA experiment. The first bit of the challenge ciphertext
reveals which of the two plaintexts was encrypted by the challenger.
Remark 2.28. Constant patterns in the ciphertext do not violate IND-CPA (or IND-
EAV) security, and it is not required that the ciphertext looks like a random sequence.
In fact, an adversary cannot leverage constant parts of the ciphertext in order to obtain
information about the plaintext. ♢
In practice, one wants to encrypt multiple messages with the same key. This is not
directly addressed in the EAV and CPA experiments, where an adversary has to find the
plaintext for a single ciphertext message. The generalization of these games for multiple
encryptions allows the adversary to provide multiple pairs of plaintext messages. A left-
or-right oracle encrypts either the left or the right plaintext of each pair (depending on
the secret bit 𝑏) and returns the ciphertexts.
In fact, EAV security for multiple messages is stronger than EAV security for a sin-
gle message (see Exercise 11). On the other hand, CPA-secure schemes remain secure
when the adversary has access to a left-or-right encryption oracle:
Proposition 2.29. If an encryption scheme is CPA-secure (see Definition 2.25), then it is
CPA-secure for multiple encryptions.
Definition 2.30. Suppose a symmetric encryption scheme is given. Consider the fol-
lowing experiment (see Figure 2.5). On input 1𝑛 a challenger generates a random key
𝑘 ∈ 𝒦 and a random bit 𝑏 ← {0, 1}. A probabilistic polynomial-time adversary 𝐴
is given 1𝑛 , but 𝑘 and 𝑏 are not known to 𝐴. The adversary can ask an oracle to en-
crypt arbitrary plaintexts and to decrypt ciphertexts. The adversary chooses two dif-
ferent plaintexts 𝑚0 and 𝑚1 of the same length. The challenger returns the ciphertext
𝑐 = ℰ𝑘 (𝑚𝑏 ) of one of them. The adversary 𝐴 continues to have access to the encryption
and decryption oracle; only decryption of the challenge ciphertext 𝑐 is not permitted.
Finally, 𝐴 tries to guess 𝑏 and outputs a bit 𝑏′ . The challenger outputs 1 if 𝑏 = 𝑏′ , and
0 otherwise. Then the IND-CCA advantage of 𝐴 is defined by
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The probability is taken over all random variables in this experiment, i.e., the key 𝑘, bit
𝑏, encryption ℰ𝑘 and randomness of 𝐴. ♢
Adversary and Challenger/Oracle (both receive 1^𝑛):
(1) Challenger: 𝑘 ← 𝐺𝑒𝑛(1^𝑛), 𝑏 ← {0, 1} (random choices).
(2) Adversary → Oracle: chosen plaintext 𝑚′ or ciphertext 𝑐′. Oracle → Adversary: 𝑐′ ← ℰ𝑘(𝑚′) or 𝑚′ ← 𝒟𝑘(𝑐′).
(3) Adversary → Challenger: 𝑚0, 𝑚1 with |𝑚0| = |𝑚1|.
(4) Challenger → Adversary: 𝑐 ← ℰ𝑘(𝑚𝑏).
(5) Adversary → Oracle: chosen plaintext 𝑚″ or ciphertext 𝑐″ ≠ 𝑐. Oracle → Adversary: 𝑐″ ← ℰ𝑘(𝑚″) or 𝑚″ ← 𝒟𝑘(𝑐″).
(6) Adversary: selects 𝑚0 (𝑏′ = 0) or 𝑚1 (𝑏′ = 1).
(7) Challenger: compares 𝑏 and 𝑏′, outputs 1 or 0.
Figure 2.5. CCA2 indistinguishability experiment. The adversary may repeatedly ask
for the encryption of chosen plaintexts 𝑚′, 𝑚″ and for the decryption of chosen
ciphertexts 𝑐′, 𝑐″ except the challenge 𝑐.
2.8. Pseudorandom Generators 45
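[Figure: prg distinguishability experiment between adversary and challenger; both receive 1^𝑛.]
(1) Challenger: 𝑠 ← {0, 1}^𝑛, 𝑏 ← {0, 1}, 𝑟 ← {0, 1}^{𝑙(𝑛)} (random choices).
(2) Challenger → Adversary: 𝑐 = 𝐺(𝑠) if 𝑏 = 1, and 𝑐 = 𝑟 if 𝑏 = 0.
(3) Adversary: tries to distinguish the two cases and outputs 𝑏′.
(4) Challenger: compares 𝑏 and 𝑏′, outputs 1 or 0.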
Remark 2.33. A pseudorandom generator is never a truly random bit generator! Be-
cause of the expansion from length 𝑛 to 𝑙(𝑛) > 𝑛, the distribution of output values
cannot be uniform. In fact, many strings of length 𝑙(𝑛) do not occur in the image of 𝐺
since the domain {0, 1}𝑛 is too small, but with limited resources, the output of 𝐺 looks
random and cannot be distinguished from a truly random sequence (see Figure 2.7).♢
Figure 2.7. Pseudorandomness: adversaries cannot tell the two cases apart if 𝑏 and 𝑠
are secret.
The output of a pseudorandom generator can be used as a keystream. Like the one-
time pad, XORing the plaintext and the keystream defines a fixed-length encryption
scheme. This type of scheme is called a stream cipher. Further details on stream ciphers
can be found in Chapter 6.
Definition 2.36. Let 𝑙(.) be a polynomial. A pseudorandom generator 𝐺, which on
input $k \in \{0,1\}^n$ produces an output sequence $G(k) \in \{0,1\}^{l(n)}$, defines a fixed-length
encryption scheme by the following construction:
The key generation algorithm $Gen(1^n)$ takes the security parameter $1^n$ as input
and outputs a uniform random key $k \leftarrow \{0,1\}^n$ of length 𝑛. Set $\mathcal{M} = \mathcal{C} = \{0,1\}^{l(n)}$.
Encryption ℰ𝑘 and decryption 𝒟𝑘 are identical and are defined by XORing the output
stream 𝐺(𝑘) with the input data:
𝑐 = ℰ𝑘 (𝑚) = 𝑚 ⊕ 𝐺(𝑘) and 𝑚 = 𝒟𝑘 (𝑐) = 𝑐 ⊕ 𝐺(𝑘). ♢
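The construction in Definition 2.36 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the extendable-output function SHAKE-256 from the standard library stands in for the generator 𝐺, the names `G`, `xor_bytes`, `encrypt` and `decrypt` are hypothetical, and lengths are counted in bytes rather than bits.

```python
import hashlib
import secrets

def G(k: bytes, out_len: int) -> bytes:
    # Stand-in for a pseudorandom generator (an assumption of this sketch):
    # expand the key k to out_len bytes with SHAKE-256.
    return hashlib.shake_256(k).digest(out_len)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two strings of equal length.
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(k: bytes, m: bytes) -> bytes:
    # c = m XOR G(k)
    return xor_bytes(m, G(k, len(m)))

# Decryption is the same operation: m = c XOR G(k).
decrypt = encrypt

k = secrets.token_bytes(16)   # uniform random key
m = b"fixed-length msg"
c = encrypt(k, m)
assert decrypt(k, c) == m
```

Since encryption is deterministic, the same key must not be used for a second message.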
Proof. We only sketch a proof by reduction. For more details we refer the reader to
Suppose the encryption scheme is not EAV-secure; then there is a polynomial-time
algorithm 𝐴 with a non-negligible advantage $\mathrm{Adv}^{\mathrm{eav}}(A)$ in the EAV experiment (see
Definition 2.18). We construct an adversary 𝐵 in the prg distinguishability experiment
which uses 𝐴 as a subroutine. 𝐵 obtains a challenge string 𝑐 of length 𝑙(𝑛) and has to
determine whether 𝑐 was generated by 𝐺 or chosen uniformly. Now 𝐵 runs 𝐴, chooses
a uniform bit 𝑏 ← {0, 1} and obtains a pair 𝑚0 , 𝑚1 of messages of length 𝑙(𝑛) from 𝐴.
Subsequently, 𝐵 gives the challenge 𝑐 ⊕ 𝑚𝑏 to 𝐴. Finally, 𝐴 outputs 𝑏′ and 𝐵 observes
whether or not 𝐴 succeeds. Remember that we assumed that 𝐴 does a good job in the
48 2. Encryption Schemes and Definitions of Security
EAV experiment, and so a correct output of 𝐴 indicates that 𝑐 was produced by the
generator 𝐺. Therefore, 𝐵 outputs 1, i.e., 𝐵 guesses that 𝑐 = 𝐺(𝑠) if 𝐴 succeeds (𝑏 = 𝑏′ ).
Otherwise, 𝐵 outputs 0, i.e., 𝐵 guesses that 𝑐 is a random string.
If $\mathrm{Adv}^{\mathrm{eav}}(A)$ is non-negligible, then $\mathrm{Adv}^{\mathrm{prg}}(B)$ is non-negligible, too. Furthermore,
𝐵 runs in polynomial time. This contradicts our assumption that 𝐺 is a pseudorandom
generator and proves the theorem. □
Now we know that the construction described in Definition 2.36 has a security
guarantee, but a disadvantage is that the message length 𝑙(𝑛) is fixed for a given security
parameter 𝑛. Furthermore, the above encryption scheme is not CPA-secure (since it is
deterministic) and not EAV-secure for multiple encryptions with the same seed or key
(see Exercise 11). Like the one-time pad, the same keystream must not be re-used for
two or more plaintexts.
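The danger of keystream re-use can be made concrete with a short sketch. SHAKE-256 is again used as a stand-in for the generator, and the key and messages are made-up values for illustration. XORing two ciphertexts that were produced with the same keystream cancels the keystream and reveals the XOR of the two plaintexts.

```python
import hashlib

def keystream(k: bytes, n: int) -> bytes:
    # Stand-in generator (assumption): SHAKE-256 expands k to n bytes.
    return hashlib.shake_256(k).digest(n)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

k = b"\x01" * 16                      # the same key is used twice
m1 = b"attack at dawn!!"
m2 = b"retreat at noon!"
c1 = xor_bytes(m1, keystream(k, len(m1)))
c2 = xor_bytes(m2, keystream(k, len(m2)))

# An eavesdropper who sees c1 and c2 learns m1 XOR m2 without knowing k.
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(m1, m2)
```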
In Chapter 6, we will deal with variable-length pseudorandom generators and
stream ciphers. They take a seed (or key) and an initialization vector as input, use an
internal state and recursively generate as many output bits as required.
If the adversary is not able to distinguish between these functions, then the advan-
tage is close to 0. Pseudorandom functions have the property that the prf-advantage is
negligible in 𝑛.
Definition 2.39. A keyed function family 𝐹 as described above is called a pseudorandom
function (prf) if, for every probabilistic polynomial time adversary 𝐴, the
prf-advantage $\mathrm{Adv}^{\mathrm{prf}}(A)$ is negligible in 𝑛. ♢
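[Figure, prf distinguishability experiment between the adversary and the challenger/oracle:
(1) Setup: both parties receive $1^n$; the challenger chooses $k \leftarrow \{0,1\}^n$ and $b \leftarrow \{0,1\}$ uniformly at random and sets $f = F_k$ if $b = 1$, or $f$ = random function if $b = 0$.
(2) Queries: the adversary repeatedly chooses an input $m$ and receives $c = f(m)$.
(3) Guess: the adversary tries to distinguish the two cases and returns $b'$; the challenger compares $b$ and $b'$ and outputs 1 or 0.]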
where 𝑐𝑡𝑟 is viewed as an integer and addition is done modulo $2^n$. Incrementing the
counter allows you to generate output of (almost) arbitrary length. The opposite, i.e.,
constructing a pseudorandom function from a pseudorandom generator, is also pos-
sible but more elaborate. In practice, one prefers pseudorandom functions as a basic
building block. ♢
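The counter construction of a generator from a pseudorandom function might look as follows in Python. The sketch makes several assumptions that are not part of the text: HMAC-SHA256 stands in for the function family 𝐹, counters are encoded as 16-byte big-endian integers (so addition is done modulo $2^{128}$), the counter starts at `ctr + 0`, and the names `F` and `G` are chosen for illustration only.

```python
import hashlib
import hmac

def F(k: bytes, x: int) -> bytes:
    # Stand-in pseudorandom function (assumption): HMAC-SHA256 keyed with k,
    # applied to the 16-byte big-endian encoding of the integer x.
    return hmac.new(k, x.to_bytes(16, "big"), hashlib.sha256).digest()

def G(s: bytes, out_len: int, ctr: int = 0) -> bytes:
    # Concatenate F_s(ctr), F_s(ctr + 1), ... until out_len bytes are available.
    out = b""
    i = 0
    while len(out) < out_len:
        out += F(s, (ctr + i) % 2**128)   # counter addition modulo 2^128
        i += 1
    return out[:out_len]

seed = b"\x00" * 16
stream = G(seed, 100)
assert len(stream) == 100
```

Requesting a longer output only appends further blocks, so a shorter output is always a prefix of a longer one for the same seed and counter.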
We may also give the adversary additional access to the inverse permutation.
Definition 2.42. A family of permutations 𝐹, as described above, is called a pseudorandom
permutation (prp) if, for every probabilistic polynomial time adversary 𝐴, the
prp-cpa-advantage $\mathrm{Adv}^{\mathrm{prp-cpa}}(A)$ is negligible in 𝑛. If adversaries also have oracle access
to the inverse function $f^{-1}$ (see Figure 2.9) and the advantage is negligible, then
𝐹 is said to be a strong pseudorandom permutation. ♢
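[Figure 2.9, prp distinguishability experiment with access to the inverse permutation:
(1) Setup: both parties receive $1^n$; the challenger chooses $k \leftarrow \{0,1\}^n$ and $b \leftarrow \{0,1\}$ uniformly at random and sets $f = F_k$ if $b = 1$, or $f$ = random if $b = 0$.
(2) Queries: the adversary repeatedly chooses an input $m$ and receives $c = f(m)$, or chooses $c'$ and receives $m' = f^{-1}(c')$.
(3) Guess: the adversary tries to distinguish the two cases and returns $b'$; the challenger compares $b$ and $b'$ and outputs 1 or 0.]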
have different outputs), an adversary might distinguish the permutation from a random
function by computing many input-output pairs and searching for collisions. Suppose
that the domain is $D = \{0,1\}^l$. By the Birthday Paradox (see Proposition 1.61), a
random function will probably have collisions after around $2^{l/2}$ samples. However, a
polynomial-time adversary is not able to check an exponential number of values. In
fact, the prp/prf switching lemma [BR06] states that the difference between the prp and
prf advantages is negligible.
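The birthday estimate can be checked numerically for small parameters. The following sketch computes the exact probability that 𝑞 uniform samples from a set of $2^l$ values contain at least one collision; the function name `collision_probability` is hypothetical.

```python
def collision_probability(q: int, l: int) -> float:
    # Probability that q independent uniform samples from a set of
    # size 2^l contain at least one collision.
    size = 2 ** l
    p_no_collision = 1.0
    for i in range(q):
        p_no_collision *= 1 - i / size
    return 1 - p_no_collision

# For l = 16, about 2^(l/2) = 256 samples already give a sizeable
# collision probability, while 16 samples almost never collide.
p_many = collision_probability(256, 16)
p_few = collision_probability(16, 16)
assert p_few < 0.01 < p_many
```

For realistic block lengths such as 𝑙 = 128, a polynomial number of queries stays far below $2^{l/2}$, which is the intuition behind the switching lemma.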
Remark 2.44. In the experiments for pseudorandom functions and permutations, the
secret key 𝑘 is fixed and an adversary can only obtain input-output examples for that
key. Now, under a related-key attack (RKA) they can also study the input-output be-
havior for keys 𝑘1 , 𝑘2 , … related to 𝑘. In an RKA experiment one has to specify which
keys are related. Examples include keys with a fixed difference Δ to 𝑘, i.e. 𝑘 + Δ, or
the complemented key $\overline{k}$, where all key bits are flipped. Note that 𝑘 and its related keys
are still unknown to an adversary, but now they can ask the oracle to use a related key
instead of 𝑘. This gives the adversary more power and pseudorandomness with respect
to a related-key attack is stronger than the usual notion of a prp or prf. We refer the
reader to [BK03] for details.
Example 2.45. Suppose a function family has the following complementation property
for all keys 𝑘 and inputs 𝑚:
$$F_{\overline{k}}(\overline{m}) = \overline{F_k(m)}.$$
If the oracle accepts the complemented key $\overline{k}$ as related to 𝑘, then an adversary can
easily distinguish 𝐹 from a random function simply by checking the above equation.
It is known that the former Data Encryption Standard (DES) has the complemen-
tation property and the cipher is thus vulnerable to related-key attacks. ♢
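A toy example shows why Feistel-type ciphers such as DES can have the complementation property: if the round function only sees the XOR of the right half and the round key, then complementing both key and input leaves every round function input unchanged. The 16-bit cipher below is a made-up illustration (the names `g`, `round_keys` and `toy_feistel` are hypothetical); it is not DES and offers no security.

```python
MASK8 = 0xFF

def g(x: int) -> int:
    # Arbitrary fixed 8-bit round function (illustration only).
    return (x * 167 + 13) & MASK8

def round_keys(k: int, rounds: int = 4) -> list:
    # Round key i is the 8-bit key rotated left by i bits. Rotating the
    # complemented key gives the complemented round key.
    return [((k << i) | (k >> (8 - i))) & MASK8 for i in range(rounds)]

def toy_feistel(k: int, m: int) -> int:
    # 16-bit block, 8-bit key; g is applied to (right half XOR round key).
    left, right = (m >> 8) & MASK8, m & MASK8
    for rk in round_keys(k):
        left, right = right, left ^ g(right ^ rk)
    return (left << 8) | right

# Complementation property: F_{~k}(~m) = ~F_k(m).
k, m = 0x3C, 0xBEEF
assert toy_feistel(k ^ 0xFF, m ^ 0xFFFF) == toy_feistel(k, m) ^ 0xFFFF
```

An adversary with access to the related key $\overline{k}$ only needs one query under each key to test this equation.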
𝑙 = 𝑙(𝑛) and a padding mechanism is applied, if the message length is not a multiple
of 𝑙. A message of length 𝐿 can be padded by appending a 1 followed by the necessary
number of zeros:
𝑥1 𝑥2 … 𝑥𝐿 10 … 0.
We write 𝑚 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 , where each 𝑚𝑖 is a block of length 𝑙.
• (ECB mode) Each block is encrypted separately using $E_k$, so that
$c_i = E_k(m_i)$ for $i = 1, 2, \dots, N$, and $c = \mathcal{E}_k(m) = c_1 \| c_2 \| \dots \| c_N$.
Decryption works in a similar way:
$m_i = E_k^{-1}(c_i)$ for $i = 1, 2, \dots, N$, and $m = \mathcal{D}_k(c) = m_1 \| m_2 \| \dots \| m_N$.
• (Randomized CBC mode) An initialization vector $IV \leftarrow \{0,1\}^l$ is chosen uniformly
at random for each message. Then define
$c_0 = IV$,
$c_i = E_k(m_i \oplus c_{i-1})$ for $i = 1, 2, \dots, N$, and
$c = \mathcal{E}_k(m) = c_0 \| c_1 \| c_2 \| \dots \| c_N$.
Decryption is defined by
$m_i = E_k^{-1}(c_i) \oplus c_{i-1}$ for $i = 1, 2, \dots, N$ and
$m = \mathcal{D}_k(c) = m_1 \| m_2 \| \dots \| m_N$
(see Figure 2.10). ♢
We can easily verify that the CBC mode has correct decryption:
$$E_k^{-1}(c_i) \oplus c_{i-1} = E_k^{-1}(E_k(m_i \oplus c_{i-1})) \oplus c_{i-1} = m_i \oplus c_{i-1} \oplus c_{i-1} = m_i.$$
The ECB mode has a straightforward definition, but the mode turns out to be in-
secure and should be avoided. The scheme is deterministic and thus cannot be CPA-
secure. Neither is the scheme EAV-secure, since plaintext patterns are preserved. Sup-
pose, for example, that 𝑚 = 𝑚1 ‖𝑚2 ‖𝑚1 ‖𝑚2 ; then the ciphertext has the same pattern:
𝑐 = 𝑐1 ‖𝑐2 ‖𝑐1 ‖𝑐2 .
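Both modes can be tried out with a toy block cipher. In the following sketch, the keyed permutation `E` on 8-bit blocks (XOR with the key followed by a bit rotation) is a made-up example without any security; blocks are represented as integers and all function names are hypothetical.

```python
def E(k: int, x: int) -> int:
    # Toy keyed permutation on 8-bit blocks: XOR the key, rotate left by 3.
    y = x ^ k
    return ((y << 3) | (y >> 5)) & 0xFF

def E_inv(k: int, y: int) -> int:
    # Inverse permutation: rotate right by 3, then XOR the key.
    return (((y >> 3) | (y << 5)) & 0xFF) ^ k

def ecb_encrypt(k, blocks):
    return [E(k, m) for m in blocks]

def ecb_decrypt(k, blocks):
    return [E_inv(k, c) for c in blocks]

def cbc_encrypt(k, blocks, iv):
    out = [iv]                             # c_0 = IV
    for m in blocks:
        out.append(E(k, m ^ out[-1]))      # c_i = E_k(m_i XOR c_{i-1})
    return out

def cbc_decrypt(k, cblocks):
    # m_i = E_k^{-1}(c_i) XOR c_{i-1}
    return [E_inv(k, c) ^ prev for prev, c in zip(cblocks, cblocks[1:])]

k = 0x5A
m = [0x11, 0x22, 0x11, 0x22]               # pattern m1 || m2 || m1 || m2
c_ecb = ecb_encrypt(k, m)
assert c_ecb[0] == c_ecb[2] and c_ecb[1] == c_ecb[3]   # pattern preserved
assert ecb_decrypt(k, c_ecb) == m
assert cbc_decrypt(k, cbc_encrypt(k, m, iv=0x3C)) == m
```

In a real application the IV would be chosen uniformly at random for each message; it is fixed here only to keep the example reproducible.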
54 2. Encryption Schemes and Definitions of Security
CBC is a popular mode, which can be CPA-secure when properly applied (see The-
orem 2.52). The additional computational load, compared to the ECB mode, is very
low. On the other hand, encryption in CBC mode cannot be parallelized.
Encryption schemes can also be based on a family of functions which are not nec-
essarily bijective. The counter mode described below is often used in practice and ef-
fectively turns a block cipher into a stream cipher (see Chapter 6).
Definition 2.48. (Randomized CTR mode) Let
$$F : \{0,1\}^n \times \{0,1\}^l \to \{0,1\}^l$$
be a family of functions. We define a variable-length symmetric encryption scheme
based on 𝐹. The key generation algorithm $Gen(1^n)$ outputs a uniform random key
$k \leftarrow \{0,1\}^n$ of length 𝑛. A plaintext message 𝑚 is split into blocks of length 𝑙 = 𝑙(𝑛),
where the last block can be shorter:
$m = m_1 \| m_2 \| \dots \| m_N$.
A uniform random counter $ctr \leftarrow \{0,1\}^l$ is chosen and viewed as an integer. For each
block in the message, the counter is incremented (where addition is done modulo $2^l$)
and $F_k$ is applied to the counter. The output is used as a keystream and the ciphertext
is obtained by XORing plaintext and the keystream (see Figure 2.11):
$c_i = F_k(ctr + i) \oplus m_i$ for $i = 1, 2, \dots, N$ and
$c = \mathcal{E}_k(m) = ctr \| c_1 \| c_2 \| \dots \| c_N$.
Decryption is defined by XORing the ciphertext and the same keystream:
$m_i = F_k(ctr + i) \oplus c_i$ for $i = 1, 2, \dots, N$ and
$m = \mathcal{D}_k(c) = m_1 \| m_2 \| \dots \| m_N$.
Only the first (most significant) bits of $F_k(ctr + N)$ are used for the XOR operation if
the last block is shorter than 𝑙 bits. ♢
Figure 2.11. Encryption in CTR mode. Decryption is almost identical, with plaintext
and ciphertext swapped.
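A minimal sketch of the randomized CTR mode, under assumptions that go beyond the definition: HMAC-SHA256 truncated to 16 bytes stands in for the function family 𝐹, the block length is 𝑙 = 128 bits, counters are encoded as 16-byte big-endian integers, and the names `F`, `ctr_encrypt` and `ctr_decrypt` are hypothetical.

```python
import hashlib
import hmac
import secrets

L_BYTES = 16  # block length l = 128 bits

def F(k: bytes, x: int) -> bytes:
    # Stand-in pseudorandom function (assumption): truncated HMAC-SHA256.
    mac = hmac.new(k, x.to_bytes(L_BYTES, "big"), hashlib.sha256).digest()
    return mac[:L_BYTES]

def ctr_encrypt(k: bytes, m: bytes, ctr=None) -> bytes:
    if ctr is None:
        ctr = secrets.randbits(8 * L_BYTES)      # uniform random counter
    out = bytearray()
    for pos in range(0, len(m), L_BYTES):
        block = m[pos:pos + L_BYTES]
        i = pos // L_BYTES + 1                   # block index i = 1, ..., N
        ks = F(k, (ctr + i) % 2 ** (8 * L_BYTES))
        # zip stops at the shorter input, so a short last block only
        # uses the leading bytes of the keystream block.
        out += bytes(a ^ b for a, b in zip(block, ks))
    return ctr.to_bytes(L_BYTES, "big") + bytes(out)

def ctr_decrypt(k: bytes, c: bytes) -> bytes:
    ctr = int.from_bytes(c[:L_BYTES], "big")
    # XOR with the same keystream, then drop the prepended counter.
    return ctr_encrypt(k, c[L_BYTES:], ctr)[L_BYTES:]

k = secrets.token_bytes(16)
m = b"CTR mode handles messages of any length."
assert ctr_decrypt(k, ctr_encrypt(k, m)) == m
```

Note that 𝐹 does not need to be invertible, since only the forward direction is used for both encryption and decryption.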
The above CBC and CTR modes define randomized encryption schemes. The IV
used in the CBC mode must be chosen uniformly, whereas the 𝑐𝑡𝑟 value in the CTR
2.10. Block Ciphers and Operation Modes 55
mode can also be a nonce value (number used once). We note that IV and 𝑐𝑡𝑟 values are
not secret and (without encryption) become part of the ciphertext. The full plaintext
can only be recovered with the IV or 𝑐𝑡𝑟 value.
Remark 2.49. The CTR mode turns a block cipher into a synchronous stream cipher
(see Definition 6.1). The encrypted counter values are used as a keystream. For
encryption and decryption, the plaintext or the ciphertext, respectively, is XORed with the keystream.
There are two other operation modes, the Cipher Feedback Mode (CFB) and the Output
Feedback Mode (OFB) that also generate a keystream. These two modes and stream
ciphers in general are discussed in Chapter 6. ♢
The following two Theorems 2.50 and 2.52 state that CTR and CBC modes achieve
security under CPA if the scheme uses a pseudorandom family of functions or permutations.
Theorem 2.50. If 𝐹 is a pseudorandom function family, then the randomized CTR mode
has indistinguishable encryptions under a chosen plaintext attack, i.e., the encryption
scheme is IND-CPA secure.
Proof. The assertion has a proof by reduction. Let 𝐴 be an adversary in the CPA
indistinguishability experiment (see Definition 2.24). We want to show that $\mathrm{Adv}^{\mathrm{ind-cpa}}(A)$
is negligible if 𝐹 is a pseudorandom function family.
We construct an algorithm 𝐵 running in the prf experiment (see Definition 2.38),
which uses 𝐴 as a subroutine. Similarly as in the proof of Theorem 2.37, 𝐵 takes the
role of 𝐴’s challenger in the CPA experiment. 𝐵 generates a random bit 𝑏 ← {0, 1}
and responds to 𝐴’s encryption queries. So 𝐵 needs to encrypt messages in CTR mode.
Now, since 𝐵 runs in the prf experiment, 𝐵 has access to a function $f : \{0,1\}^l \to \{0,1\}^l$
which is either the pseudorandom function $F_k$ or a random function. 𝐵 uses 𝑓
to encrypt messages of arbitrary length in CTR mode, and we denote the associated
randomized encryption function by ℰ𝑓 . If 𝐴 sends the oracle query 𝑚, then 𝐵 responds
with ℰ𝑓 (𝑚). During the experiment, 𝐴 sends a pair (𝑚0 , 𝑚1 ) of plaintexts and 𝐵 returns
the challenge ciphertext ℰ𝑓 (𝑚𝑏 ). Finally, 𝐴 outputs 𝑏′ and 𝐵 observes whether or not
𝐴 succeeds. The result is used to answer the challenge in the prf experiment: 𝐵 outputs
1 if 𝑏 = 𝑏′ , and 0 otherwise.
The running time of 𝐵 is polynomial and is given by the sum of 𝐴’s running time
and the time to encrypt the plaintexts chosen by 𝐴. Note that 𝐵’s strategy is to observe
𝐴 and to output 1, i.e., to guess that 𝑓 is the pseudorandom function if 𝐴 was successful.
Now we want to prove that 𝐵’s advantage is closely related to 𝐴’s advantage, so that
$\mathrm{Adv}^{\mathrm{ind-cpa}}(A)$ is negligible if $\mathrm{Adv}^{\mathrm{prf}}(B)$ is negligible.
Since 𝐵 is an adversary in the prf experiment, 𝐵 does not know whether its chal-
lenger has chosen 𝑓 = random or 𝑓 = 𝐹𝑘 . We consider both cases:
(1) If 𝑓 is a random function, then 𝐵 succeeds, i.e., outputs 0 if and only if 𝐴 fails. The
sign is not relevant and the advantages of 𝐴 and 𝐵 are identical. The CTR mode
encryption is similar to a one-time pad and the advantage of 𝐴 and 𝐵 is 0, unless
the counter values overlap, i.e., a counter value used to compute the challenge ci-
phertext overlaps with at least one counter in the responses to 𝐴’s queries. In this
case, the adversary knows a keystream block and can easily answer the challenge.
Let 𝑞 be an upper bound on the number of blocks in 𝐴’s chosen plaintexts 𝑚0 and
𝑚1 , the number of queries and the number of blocks in the queries. Let 𝑐𝑡𝑟 be
the initial counter value used to compute the challenge ciphertext. For a single
query and a chosen plaintext of at most 𝑞 blocks, only initial counter values 𝑐𝑡𝑟′
with |𝑐𝑡𝑟 − 𝑐𝑡𝑟′ | < 𝑞 can result in an overlap. This inequality is satisfied for 2𝑞 − 1
values of 𝑐𝑡𝑟′ . If the number of queries is at most 𝑞, then an overlap occurs for
less than $2q^2$ values. Since the initial counter values are chosen uniformly from
$\{0,1\}^l$, we obtain
$$\Pr[\text{Overlap}] < \frac{2q^2}{2^l}$$
(compare [KL15] and [BR05]). Since 𝐴 runs in polynomial time, 𝑞 is polynomial
in 𝑛 and 𝑃𝑟[Overlap] is negligible. If an overlap occurs, then we use the trivial
bound 1 for 𝐵’s advantage. However, the probability of an overlap is less than $\frac{2q^2}{2^l}$,
and otherwise the advantage of 𝐵 is 0. We conclude that
$$\bigl|\, \underbrace{\Pr[Out(B) = 0 \mid f = \text{random}] - \Pr[Out(B) = 1 \mid f = \text{random}]}_{x} \,\bigr| < \frac{2q^2}{2^l}.$$
A uniform random bit, chosen by 𝐵’s challenger in the prf experiment, determines
whether 𝑓 is a random function or 𝑓 = 𝐹𝑘 . Hence
$$\Pr[f = \text{random}] = \Pr[f = F_k] = \frac{1}{2}.$$
The definition of 𝐵’s advantage in the prf experiment and the definition of 𝑥 and 𝑦 in
(1) and (2) yield
$$\mathrm{Adv}^{\mathrm{prf}}(B) = \left| \frac{1}{2} x + \frac{1}{2} y \right|.$$
Combining (1) and (2), we obtain
$$\mathrm{Adv}^{\mathrm{ind-cpa}}(A) = |y| = |y + x - x| \le |x + y| + |x| < 2\, \mathrm{Adv}^{\mathrm{prf}}(B) + \frac{2q^2}{2^l}.$$
𝐹 is a pseudorandom function family, and hence the advantage $\mathrm{Adv}^{\mathrm{prf}}(B)$ is negligible.
Furthermore, $\frac{1}{2^l}$ is negligible in 𝑙 as well as in 𝑛, and $\frac{2q^2}{2^l}$ is negligible in 𝑛. It follows
that $\mathrm{Adv}^{\mathrm{ind-cpa}}(A)$ is negligible, which completes the proof by reduction. □
Remark 2.51. The security guarantee provided by this type of proof can easily be mis-
understood, and there have been some controversies about this topic (see [KM07] and
the subsequent discussion).
• Is encryption in CTR mode unconditionally secure?
No, since the CPA security depends on the security of the underlying pseudoran-
dom function family. The scheme is secure under the condition that the function
family is pseudorandom. In other words, one has to assume that no polynomial-
time algorithm can distinguish between the function family and a random func-
tion, or at least that no one can find such an algorithm.
• If pseudorandomness is given, is the CTR mode then secure against all polynomial-
time attacks?
No, since we made assumptions about the capabilities of adversaries. They are
only looking at plaintexts and ciphertexts and have no information about the se-
cret key. Side-channel attacks, where an adversary analyzes the power consump-
tion or the timing of an implementation are not considered. Furthermore, one
assumes that the secret key is uniformly random. This is not necessarily true in
practice, for example if the key is derived from a password.
• If all assumptions are satisfied, does the CTR mode have a concrete security guarantee?
The CTR mode is asymptotically CPA-secure. Concrete security depends on the
security parameter, the security of 𝐹 and the number and length of queries made
by an adversary.
We give an example and estimate the concrete advantage of an adversary 𝐴
given in the proof of the above Theorem. Suppose that 𝑙 = 128, 𝐴 makes $q = 2^{32}$
queries and the advantage of the corresponding adversary 𝐵 in the prf experiment
is $\mathrm{Adv}^{\mathrm{prf}}(B) = 2^{-64}$. We assume that the number of blocks in 𝐴’s chosen
plaintexts and in their queries is less than 𝑞. Then
$$\mathrm{Adv}^{\mathrm{ind-cpa}}(A) \le 2\, \mathrm{Adv}^{\mathrm{prf}}(B) + \frac{2q^2}{2^l} = 2 \cdot 2^{-64} + \frac{2 \cdot 2^{64}}{2^{128}} = 2^{-62}.$$
The concrete security guarantee under the above assumptions is quite strong since
the CPA advantage is very small. ♢
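The arithmetic of this example can be reproduced exactly with rational numbers. The helper name `cpa_bound` is hypothetical; it simply evaluates the bound $2\,\mathrm{Adv}^{\mathrm{prf}}(B) + 2q^2/2^l$ from the proof of Theorem 2.50.

```python
from fractions import Fraction

def cpa_bound(adv_prf: Fraction, q: int, l: int) -> Fraction:
    # Upper bound on the IND-CPA advantage: 2 * Adv_prf(B) + 2 q^2 / 2^l
    return 2 * adv_prf + Fraction(2 * q * q, 2 ** l)

bound = cpa_bound(Fraction(1, 2 ** 64), q=2 ** 32, l=128)
assert bound == Fraction(1, 2 ** 62)
```

The same helper shows how quickly the guarantee degrades for short blocks: with 𝑙 = 64 and the same number of queries, the second term alone exceeds 1, so the bound becomes meaningless.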
The randomized CBC mode also has a security guarantee, and we refer to [KL15]
or [BR05] for a proof by reduction.
Theorem 2.52. If 𝐸 is a pseudorandom permutation, then the randomized CBC mode
is IND-CPA secure.
Remark 2.53. It is easy to see that neither the CBC nor the CTR modes yield a CCA2-
secure encryption scheme. Security against chosen ciphertext attacks requires non-
malleability; a controlled modification of the ciphertext should be impossible and the
decryption of a modified ciphertext should not be related to the original plaintext. For
example, let’s consider the CTR mode and suppose a single bit of the ciphertext is
flipped. Then only the corresponding plaintext bit changes, so that the CTR mode
is clearly malleable.
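The malleability of the CTR mode is easy to demonstrate. In the sketch below, truncated HMAC-SHA256 is an assumed stand-in for the keystream function, and the key, counter and message are made-up values. The adversary modifies one ciphertext byte without knowing the key and thereby changes the corresponding plaintext byte in a controlled way.

```python
import hashlib
import hmac

def keystream(k: bytes, ctr: int, n: int) -> bytes:
    # Keystream F_k(ctr + 1) || F_k(ctr + 2) || ..., truncated to n bytes.
    out = b""
    i = 1
    while len(out) < n:
        block = hmac.new(k, (ctr + i).to_bytes(16, "big"), hashlib.sha256)
        out += block.digest()[:16]
        i += 1
    return out[:n]

def ctr_xor(k: bytes, ctr: int, data: bytes) -> bytes:
    # Encryption and decryption are the same XOR with the keystream.
    return bytes(a ^ b for a, b in zip(data, keystream(k, ctr, len(data))))

k, ctr = b"K" * 16, 42
m = b"pay 100 EUR"
c = bytearray(ctr_xor(k, ctr, m))

# The adversary knows (or guesses) that position 4 holds the digit "1"
# and XORs the difference between "1" and "9" into the ciphertext.
c[4] ^= ord("1") ^ ord("9")

assert ctr_xor(k, ctr, bytes(c)) == b"pay 900 EUR"
```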
CCA2 security of encryption schemes can be achieved by incorporating a message
authentication code. This is, for example, implemented by the Galois Counter Mode
(GCM), which is explained in Section 8.4.
2.11. Summary
• Encryption schemes are defined by plaintext, ciphertext and key spaces as well
as key generation, encryption and decryption algorithms. The encryption al-
gorithm can be probabilistic.
• Perfect secrecy is a very strong requirement, but the one-time pad is perfectly
secret. The key must be at least as long as the plaintext and cannot be re-used.
• Security against eavesdropping (EAV), chosen plaintext attacks (CPA) or adap-
tive chosen ciphertext attacks (CCA2) are common security goals of encryption
schemes. The security definition is based on experiments and the performance
of adversaries.
• Pseudorandom generators (prg) are deterministic algorithms. Polynomial-time
adversaries cannot distinguish the output of a prg from a random string if the
seed (or key) is secret.
• Pseudorandom function families (prf) and permutations appear to be random
transformations to polynomial-time adversaries if the key is secret.
• Block ciphers are keyed permutations. The combination of a block cipher and
an operation mode defines a variable-length encryption scheme.
• The randomized CBC and CTR modes give CPA-secure encryption schemes if
the underlying block cipher is a pseudorandom permutation.
1. Show that the Vigenère cipher is perfectly secure if the key is randomly chosen, it
is only used once and the plaintext has the same length as the key.
2. Find reasons for Kerckhoffs’ principle and discuss possible counter-arguments.
3. Show that the one-time pad is not perfectly secret if a key is used twice.
4. Let ℳ be the plaintext space and 𝒦 the key space of a perfectly secure encryption
scheme. Show that |𝒦| ≥ |ℳ|.
Exercises 59
Hint: Suppose ℰ𝑘 (𝑚0 ) = 𝑐. How many different plaintext-key pairs give the ci-
phertext 𝑐 ?
5. Is a bit permutation of block length 𝑛 perfectly secure if it is used only once to
encrypt a string of length 𝑛?
6. Show the formulas for the advantage of adversaries in Remark 2.16.
7. Explain the differences in the definitions of EAV-secure and IND-CPA secure en-
cryption schemes.
8. Prove that a perfectly secure scheme is EAV-secure. Show that $\mathrm{Adv}^{\mathrm{eav}}(A)$ is 0 for
any adversary 𝐴. Why is perfect security much stronger than EAV security?
9. Does the Vigenère cipher define an EAV-secure encryption scheme?
10. Suppose we want to expand a fixed output-length pseudorandom generator
$G : \{0,1\}^n \to \{0,1\}^{n+1}$ by one extra bit. Which of the following generators
$G^{+} : \{0,1\}^n \to \{0,1\}^{n+2}$ could be pseudorandom?
(a) $G(s) = (y_1 \| y_2 \| \dots \| y_{n+1})$, $G^{+}(s) = (y_1 \| y_2 \| \dots \| y_{n+1} \| y_1 \oplus y_2 \oplus \dots \oplus y_{n+1})$.
(b) $s_0 = s$, for $i = 1, \dots, n+2$: $(s_i \| y_i) = G(s_{i-1})$. $G^{+}(s) = (y_1 \| y_2 \| \dots \| y_{n+2})$.
11. Suppose 𝐺 is a pseudorandom generator with fixed output-length and associated
encryption scheme defined by ℰ𝑘 (𝑚) = 𝑚 ⊕ 𝐺(𝑘). Show that this scheme is not
EAV-secure for multiple encryptions.
12. Prove that a pseudorandom generator is unpredictable in polynomial time, i.e.,
passes the next-bit test.
13. Explain why a malleable encryption scheme cannot be CCA2-secure.
14. Let 𝐹 be a family of bit permutations. Is 𝐹 a pseudorandom permutation?
15. Show that a block cipher in ECB mode is not EAV-secure.
16. Consider a block cipher in CBC mode. Suppose that the IV is initially set to 0 and
then incremented for every new encryption. Can this variant of the CBC mode be
17. Show that a block cipher in CTR mode is not secure against ciphertext-only attacks, if
the counter is re-used.
18. Can an encryption scheme that is based on a block cipher be perfectly secure?
19. Let 𝐸 be a block cipher of block length 4 and suppose that 𝐸𝑘 (𝑏1 𝑏2 𝑏3 𝑏4 ) = (𝑏2 𝑏3 𝑏4 𝑏1 ).
Encrypt 𝑚 = 1011 0001 0100 and decrypt the ciphertext with the following oper-
ation modes:
(a) ECB mode,
(b) CBC mode with 𝐼𝑉 = 1010,
(c) CTR mode with 𝑐𝑡𝑟 = 1010.
20. Consider a block cipher in CBC mode. The ciphertext is sent to a receiver. What
are the consequences, if:
(a) the receiver misses the initialization vector (IV), or
(b) a single ciphertext block is changed due to transmission errors, or
(c) a ciphertext bit is flipped by an adversary during the transmission, or
(d) a bit error occurs during the ciphering operation ?
21. Suppose a block cipher of block length 128 in CTR mode is used to encrypt 300
bits of plaintext. Which bits cannot be correctly decrypted if one of the following
errors occurs?
(a) The first bit of the counter value is flipped during transmission.
(b) The first ciphertext bit is flipped.
(c) The first ciphertext block 𝑐1 is changed due to transmission errors.
(d) The last ciphertext bit is flipped.
22. Compare ECB, CBC and CTR modes with respect to message expansion, error
propagation, pre-computations and parallelization of encryption and decryption.
23. Explain why neither CBC nor CTR modes achieve CCA2 security.
Chapter 3
This chapter covers several topics from elementary number theory. Section 3.1 deals
with integers, factorizations, prime numbers and the Euclidean algorithm. In Section
3.2, we discuss residue classes and modulo operations. The modular exponentiation
and the associated algorithms play an important role in several cryptographic schemes
and are dealt with in Section 3.3.
We refer to [Ros12] and [Sho09] for detailed expositions of the modular arithmetic
and the elementary number theory used in cryptography.
3.1. Integers
The set of integers {… , −2, −1, 0, 1, 2, 3, … } is denoted by ℤ. Integer numbers can be
added, subtracted and multiplied. The fractions $\frac{a}{b}$ with 𝑎, 𝑏 ∈ ℤ and 𝑏 ≠ 0 define the
rational numbers ℚ. We say that 𝑏 divides 𝑎, 𝑎 is divisible by 𝑏 or 𝑎 is a multiple of 𝑏, if
$\frac{a}{b}$ ∈ ℤ and we write 𝑏 ∣ 𝑎. Otherwise, we write 𝑏 ∤ 𝑎.
Example 3.1. 2 ∣ 224236, but 2 ∤ −71. One has 1 ∣ 𝑛 and (−1) ∣ 𝑛 for all 𝑛 ∈ ℤ.
Although 7 ∣ 14, note that 14 ∤ 7. ♢
62 3. Elementary Number Theory
(2) (−17) ∶ 5 = −4 remainder 3 and −17 = 5 ⋅ (−4) + 3. Rather, one might expect
that the integer quotient is −3 in this example, but then the remainder would be
negative. Hence −17 ≡ 3 mod 5, although −17 ≡ −2 mod 5 is also correct. ♢
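Python (and therefore SageMath) follows the same convention for integer division with a positive divisor: the quotient is rounded down and the remainder is non-negative. A quick check of the example:

```python
# Integer division of -17 by 5 with a non-negative remainder.
q, r = divmod(-17, 5)
assert (q, r) == (-4, 3)
assert -17 == 5 * q + r

# The % operator returns the same remainder.
assert (-17) % 5 == 3
```

Several other languages, for example C and Java, truncate the quotient towards zero instead and would return quotient −3 and remainder −2 here, so care is needed when porting modular arithmetic code.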
Proof. Use a base representation of 𝑎 and 𝑏 (and a sign) and analyze the complexity
of the well-known “paper-and-pencil” methods. □
Definition 3.4. A positive integer 𝑝 ∈ ℕ with 𝑝 ≥ 2 is called prime if 𝑝 is only divisible
by ±1 and ±𝑝. Integers 𝑛 ≥ 2 that are not prime numbers are called composite. ♢
The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, and large prime numbers
play an important role in cryptography. The following famous theorem describes the
asymptotic density of primes (see [Edw74]):
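Theorem 3.5. (Prime Number Theorem) Let 𝜋(𝑥) be the number of primes less than 𝑥. Then
$$\lim_{x \to \infty} \frac{\pi(x)}{\left( \frac{x}{\ln(x)} \right)} = 1.$$
Therefore, the prime density $\frac{\pi(x)}{x}$ is asymptotically $\frac{1}{\ln(x)}$.
Example 3.6. Let $x = 2^{2048}$. The prime density is approximately
$$\frac{\pi(x)}{x} \approx \frac{1}{\ln(x)} = \frac{1}{2048 \ln(2)} \approx \frac{1}{1420}.$$
Hence the prime density of odd integers with at most 2048 binary digits is
$\approx \frac{2}{2048 \ln(2)} \approx \frac{1}{710} \approx 0.0014$.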
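The estimate in Example 3.6 can be recomputed with a few lines of Python. The function name `prime_density` is hypothetical; it evaluates the asymptotic density 1/ln(𝑥) for 𝑥 = 2^bits.

```python
import math

def prime_density(bits: int) -> float:
    # Asymptotic prime density 1 / ln(x) for x = 2^bits,
    # using ln(2^bits) = bits * ln(2).
    return 1 / (bits * math.log(2))

density_all = prime_density(2048)      # density among all integers
density_odd = 2 * density_all          # density among odd integers only

assert round(1 / density_all) == 1420
assert round(density_odd, 4) == 0.0014
```

In other words, roughly one in 710 random odd integers of this size is prime, which is why random search for large primes is practical.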
Theorem 3.7. (Fundamental Theorem of Arithmetic) Every nonzero integer can be de-
composed into a product of primes and a sign ( factor 1 or −1). The decomposition is
unique up to the order of the factors.
Example 3.8. $-60 = -2^2 \cdot 3 \cdot 5$.
Example 3.9. We can use SageMath to check the primality or to obtain the factoriza-
tion of an integer.
sage: is_prime(267865461)
False
sage: factor(267865461)
3^5 * 337 * 3271
The largest prime number known at the time of writing is the Mersenne prime
$$2^{82589933} - 1.$$
This prime number has more than 24 million decimal digits. The primality of Mersenne
numbers $2^n - 1$ is tested with the Lucas-Lehmer test, which we do not deal with in this book.
Definition 3.10. Let 𝑎 and 𝑏 be nonzero integers. Then the greatest positive integer
that divides both 𝑎 ∈ ℤ and 𝑏 ∈ ℤ is called the greatest common divisor of 𝑎 and 𝑏 and
is denoted by gcd(𝑎, 𝑏). We say that 𝑎 and 𝑏 are relatively prime if gcd(𝑎, 𝑏) = 1. ♢
Proposition 3.12. Let 𝑎 and 𝑏 be nonzero integer numbers. Then there exist 𝑥, 𝑦 ∈ ℤ
such that
gcd(𝑎, 𝑏) = 𝑎𝑥 + 𝑏𝑦 . ♢
The greatest common divisor and the above numbers 𝑥 and 𝑦 can be efficiently
computed with the Extended Euclidean Algorithm (see Algorithm 3.1). The algorithm
plays an important role in elementary number theory and in cryptography.
We can assume that the input parameters of the algorithm satisfy 𝑎 > 𝑏, since
otherwise the first iteration of the algorithm swaps 𝑎 and 𝑏.
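A common iterative formulation of the Extended Euclidean Algorithm is sketched below. It is not necessarily identical to Algorithm 3.1 line by line, and the function name `extended_gcd` is chosen for illustration. The loop maintains the invariants `old_r = a*old_x + b*old_y` and `r = a*x + b*y`.

```python
def extended_gcd(a: int, b: int):
    # Returns (g, x, y) with g = gcd(a, b) = a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                     # current integer quotient
        old_r, r = r, old_r - q * r        # remainder sequence
        old_x, x = x, old_x - q * x        # coefficients of a
        old_y, y = y, old_y - q * y        # coefficients of b
    return old_r, old_x, old_y

g, x, y = extended_gcd(240, 46)
assert g == 2 and 240 * x + 46 * y == 2
```

If 𝑎 < 𝑏, the first quotient is 0 and the first iteration merely swaps the two inputs, as noted above.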
The Extended Euclidean Algorithm is very efficient:
Proposition 3.14. The running time of the Extended Euclidean Algorithm on input
𝑎, 𝑏 ∈ ℕ is 𝑂(size(𝑎) size(𝑏)). ♢
is the current integer quotient. Since the product of all quotients 𝑞 is less than or equal
to 𝑎, the algorithm runs in time $O(\mathrm{size}(a)^2)$. A more refined argument shows that the
running time is 𝑂(size(𝑎) size(𝑏)).
3.2. Congruences
Congruences and residue classes modulo 𝑛 were already dealt with in Chapter 1 (see
Example 1.21). Let 𝑛 ≥ 2 be a positive integer. Recall the definition of the following
equivalence relation on ℤ:
𝑅𝑛 = {(𝑥, 𝑦) ∈ ℤ × ℤ | 𝑥 − 𝑦 ∈ 𝑛 ℤ}.
Hence (𝑥, 𝑦) ∈ 𝑅𝑛 if the difference 𝑥 − 𝑦 is divisible by 𝑛. Equivalent elements 𝑥
and 𝑦 are called congruent modulo 𝑛 and we write $\overline{x} = \overline{y}$ or 𝑥 ≡ 𝑦 mod 𝑛. The set
of equivalence classes is denoted by ℤ𝑛 or ℤ/(𝑛) and contains 𝑛 elements. ℤ𝑛 is also
called the set of residue classes mod 𝑛 or integers mod 𝑛 (see Figure 3.1).
Two integers 𝑥 and 𝑦 are congruent mod 𝑛 if they have the same remainder when
they are divided by 𝑛. Obviously, if 𝑥 = 𝑞1 𝑛 + 𝑟1 and 𝑦 = 𝑞2 𝑛 + 𝑟2 with 𝑟1 , 𝑟2 ∈
{0, 1, … , 𝑛 − 1}, then 𝑥 − 𝑦 = (𝑞1 − 𝑞2 )𝑛 + (𝑟1 − 𝑟2 ). Hence 𝑥 and 𝑦 are congruent mod
𝑛 if and only if 𝑟1 − 𝑟2 = 0. In other words, 𝑥 ≡ 𝑦 mod 𝑛 holds if the integer divisions
𝑥 ∶ 𝑛 and 𝑦 ∶ 𝑛 have the same remainder.
Note that an integer 𝑥 ∈ ℤ is only one representative of the residue class $\overline{x} \in \mathbb{Z}_n$,
which contains infinitely many congruent elements. The standard representatives of
ℤ𝑛 are 0, 1, … , 𝑛 − 1, but other representatives are also permitted. Elements in ℤ𝑛 can
(3) How would you compute 782637846 mod 8927289 with a simple pocket calcula-
tor? The real division 782637846 ∶ 8927289 gives approximately 87.668, and so
the integer quotient is 87. We compute 782637846 − 87 ⋅ 8927289 and get the re-
mainder 5963703. Alternatively, we multiply the fractional part 0.668 by 8927289
and also obtain 5963703, up to a small rounding error. ♢
When does the multiplicative inverse modulo 𝑛 exist and how can we efficiently
compute it?
Proposition 3.17. Let 𝑛 ≥ 2 be a positive integer and 𝑎 ∈ ℤ a nonzero integer. Then
𝑎 mod 𝑛 has a multiplicative inverse if and only if gcd(𝑎, 𝑛) = 1. A representative of
(𝑎 mod 𝑛)−1 can be efficiently computed using the Extended Euclidean Algorithm.
Proof. Running the Extended Euclidean Algorithm on input 𝑎 and 𝑛 gives the output
integers gcd(𝑎, 𝑛), 𝑥 and 𝑦 such that
gcd(𝑎, 𝑛) = 𝑎𝑥 + 𝑛𝑦.
3.3. Modular Exponentiation 67
Definition 3.18. The invertible integers modulo 𝑛 are called units mod 𝑛. The subset
of units of ℤ𝑛 is denoted by ℤ∗𝑛 . Euler’s 𝜑-function (or 𝜙-function) is defined by the
cardinality of the units mod 𝑛, i.e., 𝜑(𝑛) = |ℤ∗𝑛 |.
Example 3.19. (1) $\mathbb{Z}_{10}^* = \{1, 3, 7, 9\}$ and 𝜑(10) = 4. The inverse elements are as follows:
$$1^{-1} = 1, \quad 3^{-1} = 7, \quad 7^{-1} = 3, \quad 9^{-1} = 9.$$
The inverses can be computed with the Extended Euclidean Algorithm. If we take
the input values 𝑎 = 3 and 𝑛 = 10, then we obtain gcd(3, 10) = 1, 𝑥 = −3 and
𝑦 = 1, satisfying the equation 1 = 3 ⋅ (−3) + 10 ⋅ 1. Hence $(3 \bmod 10)^{-1} \equiv -3 \equiv 7 \bmod 10$.
(2) Let 𝑝 be a prime number; then $\mathbb{Z}_p^* = \{1, 2, \dots, p-1\}$ and 𝜑(𝑝) = 𝑝 − 1.
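The units modulo 10 and their inverses can be verified with plain Python. The helper names `units` and `phi` are hypothetical, and the brute-force enumeration is only suitable for small 𝑛. The built-in `pow` with exponent −1 computes modular inverses in Python 3.8 and later.

```python
from math import gcd

def units(n: int) -> list:
    # Standard representatives of the invertible residue classes mod n.
    return [a for a in range(1, n) if gcd(a, n) == 1]

def phi(n: int) -> int:
    # Euler's phi-function as the number of units mod n (brute force).
    return len(units(n))

assert units(10) == [1, 3, 7, 9]
assert phi(10) == 4

# Modular inverses via the built-in pow (Python 3.8+).
inverses = {a: pow(a, -1, 10) for a in units(10)}
assert inverses == {1: 1, 3: 7, 7: 3, 9: 9}
```

Calling `pow(a, -1, n)` for an integer that is not relatively prime to 𝑛 raises a `ValueError`, in line with Proposition 3.17.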
Warning 3.20. $x^{2^k}$ is not the same as $x^{2k}$. In fact, $x^{2^k} = x^{(2^k)}$, and this is different
from $(x^2)^k = x^{2k}$. ♢
If the exponent is not a power of 2, then it can still be written as a sum of powers
of 2. This gives a product of factors $x^{2^k}$. The binary representation of the exponent
determines whether or not a factor $x^{2^k}$ is present in the product.
Example 3.21. $6^{41} \bmod 59$. We have $41 = 2^5 + 2^3 + 2^0$ and compute the following
sequence of squares:
$6^2 \equiv 36 \bmod 59$,
$6^4 \equiv 36^2 \equiv 57 \bmod 59$,
$6^8 \equiv 57^2 \equiv 4 \bmod 59$,
$6^{16} \equiv 4^2 \equiv 16 \bmod 59$,
$6^{32} \equiv 16^2 \equiv 20 \bmod 59$.
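The square-and-multiply idea can be written as a short Python function. This is a sketch with a hypothetical function name; it scans the binary digits of the exponent from the least significant bit and multiplies the current square into the result whenever the bit is 1.

```python
def square_and_multiply(x: int, a: int, n: int) -> int:
    # Computes x^a mod n for a >= 0 using repeated squaring.
    result = 1
    square = x % n                     # x^(2^0) mod n
    while a > 0:
        if a & 1:                      # current binary digit of a is 1
            result = (result * square) % n
        square = (square * square) % n # next power x^(2^(k+1)) mod n
        a >>= 1
    return result

# 41 = 2^5 + 2^3 + 2^0, so 6^41 = 6^32 * 6^8 * 6 = 20 * 4 * 6 = 480 = 8 mod 59.
assert square_and_multiply(6, 41, 59) == 8
assert square_and_multiply(6, 41, 59) == pow(6, 41, 59)
```

Python's built-in `pow(x, a, n)` implements fast modular exponentiation and should be preferred in practice.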
Warning 3.23. When computing $x^a \bmod n$, the exponent must not be reduced mod 𝑛.
For example $2^6 = 64 \equiv 4 \bmod 5$, which is different from $2^1 = 2 \bmod 5$. However,
we will later see (Proposition 4.16) that a reduction mod 𝜑(𝑛) is possible (i.e., mod 4 in
this example so that $2^6 \equiv 2^2 = 4 \bmod 5$) and helps to reduce the size of the exponent.
Furthermore, a reduction of the base 𝑥 mod 𝑛 is allowed. For example, $6^6 \equiv 1^6 = 1 \bmod 5$. ♢
3.4. Summary
2. Enumerate the elements of $\mathbb{Z}_{22}^*$ and give 𝜑(22). Find the inverse of each element
in $\mathbb{Z}_{22}^*$.
3. Perform some elementary modular operations with SageMath.
Let 𝑛 = 123456789012345, 𝑎 = 5377543210987654321 and 𝑏 = 12345678914335.
(a) Find the prime factor decomposition of 𝑛, 𝑎 and 𝑏. Use the factor command.
(b) Compute $a + b \bmod n$, $ab \bmod n$, $a^b \bmod n$, using mod(..,..) and
(c) Are 𝑎 or 𝑏 invertible modulo 𝑛 ? Why or why not? Compute $(a \bmod n)^{-1}$ or
$(b \bmod n)^{-1}$, using mod(1/..,..).
(d) Are 𝑎 and 𝑏 relatively prime? Why or why not?
4. Run the Extended Euclidean Algorithm on input 𝑎 = 1234 and 𝑏 = 6789.
5. Use the Extended Euclidean Algorithm to compute the multiplicative inverse of
32 ∈ ℤ∗897 .
6. Let 𝑝 be a prime number and 𝑚 ∈ ℕ. Find 𝜑(2𝑝), $\varphi(2^m)$ and $\varphi(p^m)$.
7. Write a function which examines the primality of all Mersenne numbers
$M_n = 2^n - 1$
Algebraic Structures
Modern cryptography uses not only discrete mathematics and elementary number the-
ory, but also algebraic structures such as abelian groups, polynomial rings, quotient
rings and finite fields. Understanding finite abelian groups and finite fields is crucial,
and these algebraic topics should not be underestimated.
Section 4.1 deals with groups and, in particular, finite abelian groups. We will see
that finite abelian groups are products of cyclic groups. Rings and fields are discussed in
Section 4.2. Finite fields are used in many cryptographic constructions, and in Section
4.3, we construct fields with $p^n$ elements. We add a recapitulation of linear and affine
maps in Section 4.4.
The contents of this chapter can be found in any textbook on abstract algebra, and
we refer the reader for example to [Sho09].
4.1. Groups
Groups are among the most fundamental mathematical structures. A group is a set
with a binary operation which satisfies several properties.
∘∶𝐺×𝐺 →𝐺
74 4. Algebraic Structures
The above definition uses ∘ for the composition of elements. In our applications,
the composition is either addition or multiplication, and we write + (plus) or ⋅ (dot).
The identity element is denoted by 0 (additive case) or 1 (multiplicative case). Accordingly,
the inverse element of 𝑔 is denoted by −𝑔 in the additive case and by $g^{-1}$ in the
multiplicative case.
We want to relate different groups and consider maps that respect the group structure.
It is easy to show that the inverse map 𝑓−1 of an isomorphism 𝑓 is also a group
homomorphism and hence an isomorphism.
Warning 4.5. A bijection between two groups does not imply that they are isomorphic!
For example, there is a bijection between ℤ2 × ℤ2 and ℤ4 (since both groups have 4
elements), but they are not isomorphic (see Example 4.28 below).
Example 4.6. (1) The natural projection map 𝑓 ∶ ℤ → ℤ𝑛 with 𝑓(𝑘) = 𝑘 mod 𝑛 is
a group homomorphism, since
𝑓(𝑘1 + 𝑘2 ) = (𝑘1 + 𝑘2 ) mod 𝑛 = (𝑘1 mod 𝑛) + (𝑘2 mod 𝑛).
Obviously, 𝑓 is not injective and therefore not an isomorphism.
(2) The reverse map 𝑓 ∶ ℤ𝑛 → ℤ with 𝑓(𝑘) = 𝑘, where 𝑘 ∈ {0, 1, … , 𝑛 − 1} is the
standard representative, is not a homomorphism, since for example
𝑓(1) + 𝑓(𝑛 − 1) = 1 + (𝑛 − 1) = 𝑛,
but 𝑓(1 + 𝑛 − 1) = 𝑓(0) = 0.
(3) Let 𝐺1 = (ℤ4 , +) = {0, 1, 2, 3} and 𝐺2 = (ℤ_5^*, ⋅) = {1, 2, 3, 4}. The map
𝑓 ∶ 𝐺1 → 𝐺2 , 𝑓(𝑘 mod 4) = 2^𝑘 mod 5
is well defined, since 2^4 = 16 ≡ 1 mod 5 so that the result does not depend on the
choice of a representative of 𝑘 modulo 4. It defines a homomorphism, since
𝑓((𝑘1 mod 4) + (𝑘2 mod 4)) = 2^{𝑘1 + 𝑘2} mod 5 = (2^{𝑘1} mod 5) ⋅ (2^{𝑘2} mod 5).
The explicit mapping is given by 𝑓(0) = 1, 𝑓(1) = 2, 𝑓(2) = 4 and 𝑓(3) = 3.
Hence 𝑓 is a bijection and defines a group isomorphism (ℤ4 , +) ≅ (ℤ_5^*, ⋅).
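The claims of part (3) can be checked by brute force, since both groups have only four elements. The following Python sketch is our own illustration (the names f, images, is_hom and is_bij are not taken from the text); it tests the homomorphism property for all pairs and the bijectivity of the explicit mapping.

```python
# Brute-force check of f(k mod 4) = 2^k mod 5 as a map (Z_4, +) -> (Z_5^*, *).
def f(k):
    return pow(2, k, 5)

Z4 = range(4)
images = [f(k) for k in Z4]                     # explicit mapping f(0), ..., f(3)

# Homomorphism: f(k1 + k2 mod 4) must equal f(k1) * f(k2) mod 5 for all pairs.
is_hom = all(f((k1 + k2) % 4) == (f(k1) * f(k2)) % 5 for k1 in Z4 for k2 in Z4)

# Bijection: the images are exactly the four units modulo 5.
is_bij = sorted(images) == [1, 2, 3, 4]

print(images, is_hom, is_bij)
```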
Definition 4.7. Let 𝐺 be a group. A subgroup 𝐻 of 𝐺 is a subset of 𝐺, which contains
the identity element and is closed under the law of composition and inverse. ♢
The subgroup ⟨𝑔⟩ is in fact a cyclic group (see Definition 4.18 below). Next, we
define the order of a group and the order of elements:
Definition 4.12. Let 𝐺 be a group; then the order of the group 𝐺 is defined to be the
number |𝐺| of elements in 𝐺 (or infinity) and denoted by ord(𝐺). Now let 𝑔 ∈ 𝐺. Then
the order of the element 𝑔 is defined by the order of the subgroup ⟨𝑔⟩ generated by 𝑔, i.e.,
ord(𝑔) = ord(⟨𝑔⟩). ♢
Proof. Let 𝑛 be the smallest positive integer such that 𝑔^𝑛 = 𝑒. It follows that 𝑛 =
ord(𝑔), since ⟨𝑔⟩ has exactly ord(𝑔) elements. Hence
(𝑔^𝑛)^𝑘 = 𝑔^{ord(𝑔) ⋅ 𝑘} = 𝑒
for any 𝑘 ∈ ℕ. Then the assertion follows from Corollary 4.14. □
Euler’s Theorem is often stated for 𝐺 = ℤ_𝑛^*. Since ord(ℤ_𝑛^*) = 𝜑(𝑛) (see Definition
3.18), we obtain
𝑥^{𝜑(𝑛)} ≡ 1 mod 𝑛
for any 𝑥 ∈ ℤ with gcd(𝑥, 𝑛) = 1. If 𝑛 is a prime number 𝑝, then one has
𝑥^{𝑝−1} ≡ 1 mod 𝑝
for any integer 𝑥 which is not a multiple of 𝑝. This implies Fermat’s Little Theorem:
𝑥^𝑝 ≡ 𝑥 mod 𝑝.
This modular equation holds for any prime number 𝑝 and integer 𝑥.
Euler’s Theorem shows how the exponent can be reduced in a modular exponentiation.
Note that the exponent can be reduced modulo 𝜑(𝑛), but not modulo 𝑛 (compare
Warning 3.23).
Example 4.17. Calculate 7^22 mod 11. Since 𝜑(11) = 10 and 22 ≡ 2 mod 10, we
obtain 7^22 ≡ 7^2 = 49 ≡ 5 mod 11. ♢
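The reduction of the exponent can be verified numerically. In the following Python sketch, phi is a naive helper of our own (counting units directly, which is only suitable for small 𝑛); the built-in pow(x, e, n) performs the modular exponentiation. The last line illustrates why the exponent must not be reduced modulo 𝑛.

```python
from math import gcd

def phi(n):
    """Euler's phi function by direct counting (small n only)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

n, x, e = 11, 7, 22                  # requires gcd(x, n) = 1
reduced = e % phi(n)                 # 22 mod 10 = 2

print(pow(x, e, n))                  # 7^22 mod 11
print(pow(x, reduced, n))            # 7^2 mod 11, the same residue class
print(pow(x, e % n, n))              # exponent reduced mod n: a different result
```

The first two values coincide (both are 5), whereas the third one is 7^0 = 1.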
[Figure: the elements 𝑒 = 𝑔^0, 𝑔, … , 𝑔^{𝑛−1} of a cyclic group of order 𝑛.]
In fact, one can show that all cyclic groups are isomorphic to either (ℤ𝑛 , +) or
(ℤ, +).
Proposition 4.20. Let 𝐺 be a cyclic group. If ord(𝐺) = 𝑛, then 𝐺 is isomorphic to ℤ𝑛 . If
ord(𝐺) = ∞, then 𝐺 is isomorphic to ℤ.
Proof. Let 𝐺 = ⟨𝑔⟩. We use the multiplicative notation for 𝐺. If 𝑔 has infinite order,
then 𝑓 ∶ ℤ → 𝐺 with 𝑓(𝑘) = 𝑔^𝑘 defines an isomorphism from the additive group (ℤ, +)
to 𝐺. If ord(𝑔) = 𝑛, then 𝑓 ∶ ℤ𝑛 → 𝐺, 𝑓(𝑘 mod 𝑛) = 𝑔^𝑘 gives an isomorphism. □
Example 4.21. (ℤ_5^*, ⋅) is generated by 2 mod 5 and is cyclic of order 4. We have seen
above (Example 4.6) that 𝑓(𝑘 mod 4) = 2^𝑘 mod 5 gives an isomorphism between
the additive group ℤ4 and the multiplicative group ℤ_5^*. Hence two groups that look
different may still be isomorphic. ♢
How can one find or verify generators of a finite cyclic group? If the group is large,
then it is inefficient or even computationally infeasible to compute the sequence of
powers 𝑔, 𝑔^2, 𝑔^3, … and to check whether all elements of 𝐺 occur. We can use the
following observation to give a more efficient method: suppose that ord(𝐺) = 𝑛. If 𝑔 is
not a generator, then ord(𝑔) is strictly less than 𝑛 and divides 𝑛/𝑞 for some prime factor
𝑞 of 𝑛. In this case, we have 𝑔^{𝑛/𝑞} = 1. Hence we only need to check the powers 𝑔^{𝑛/𝑞}
for all prime factors 𝑞 of 𝑛. If all powers are different from the identity element, then
ord(𝑔) = 𝑛 and 𝑔 is a generator.
The following Algorithm 4.1 takes a finite cyclic group 𝐺, the group order 𝑛 and
an element 𝑔 ∈ 𝐺 as input and outputs TRUE if 𝑔 is a generator and otherwise FALSE.
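The test can be sketched in Python for the special case 𝐺 = ℤ_𝑝^* with a prime 𝑝, so that 𝑛 = 𝑝 − 1. This is our own minimal illustration of the observation above and not a transcription of Algorithm 4.1; the helper prime_factors uses trial division and is therefore only suitable for small group orders.

```python
def prime_factors(n):
    """Distinct prime factors of n by trial division (small n only)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_generator(g, p):
    """True if g generates Z_p^* for a prime p (group order n = p - 1).

    Assumes that g is not a multiple of p, i.e. g is an element of the group.
    """
    n = p - 1
    # g is a generator iff g^(n/q) != 1 for every prime factor q of n
    return all(pow(g, n // q, p) != 1 for q in prime_factors(n))

print([g for g in range(1, 11) if is_generator(g, 11)])   # primitive roots mod 11
```

For 𝑝 = 11 only the two powers 𝑔^5 and 𝑔^2 have to be computed for each candidate, since 10 = 2 ⋅ 5.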
The following Theorem states that primitive roots modulo prime numbers exist.
Theorem 4.24. Let 𝑝 be a prime; then ℤ_𝑝^* is a cyclic group of order 𝑝 − 1. The number
of primitive roots is 𝜑(𝑝 − 1). ♢
There are several (non-trivial) proofs of this theorem. Note that there are certain
composite numbers, for example 𝑛 = 12, such that ℤ_𝑛^* does not possess a primitive root.
Example 4.25. Let 𝑝 = 2535301200456458802993406412663. We use SageMath to
compute element orders in ℤ_𝑝^* and to find a primitive root. First, we verify that 𝑝 is
prime and factorize 𝑝 − 1.
sage: p = 2535301200456458802993406412663; is_prime(p); factor(p - 1)
2 * 1267650600228229401496703206331
𝑓(𝑘 mod 𝑛) = (𝑘 mod 𝑎, 𝑘 mod 𝑏), is well defined and gives an isomorphism of ad-
ditive groups:
ℤ𝑛 ≅ ℤ𝑎 × ℤ𝑏 .
The restriction of this map to the group of units yields an isomorphism of multiplicative
groups:
ℤ_𝑛^* ≅ ℤ_𝑎^* × ℤ_𝑏^*.
Proof. We prove the isomorphism of the additive groups. Since 𝑎 and 𝑏 divide 𝑛, the
map is well defined. It follows from the definition of 𝑓 that the map is a homomor-
phism. Since ℤ𝑛 and ℤ𝑎 × ℤ𝑏 both contain 𝑛 = 𝑎𝑏 elements, it suffices to prove the sur-
jectivity. Let (𝑘1 mod 𝑎, 𝑘2 mod 𝑏) ∈ ℤ𝑎 ×ℤ𝑏 . We need to find an element 𝑘 ∈ ℤ such
that 𝑘 ≡ 𝑘1 mod 𝑎 and 𝑘 ≡ 𝑘2 mod 𝑏. Since gcd(𝑎, 𝑏) = 1, the Extended Euclidean
Algorithm gives 𝑥, 𝑦 ∈ ℤ such that 𝑎𝑥 + 𝑏𝑦 = 1. This equation implies 𝑎𝑥 ≡ 1 mod 𝑏
and 𝑏𝑦 ≡ 1 mod 𝑎. Now we set
𝑘 = 𝑘1 𝑏𝑦 + 𝑘2 𝑎𝑥.
Then 𝑘 ≡ 𝑘1 𝑏𝑦 ≡ 𝑘1 mod 𝑎 and 𝑘 ≡ 𝑘2 𝑎𝑥 ≡ 𝑘2 mod 𝑏, as desired. □
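The proof is constructive and translates directly into code. The following Python sketch uses our own function names ext_gcd and crt; it computes 𝑥, 𝑦 with 𝑎𝑥 + 𝑏𝑦 = 1 and then forms 𝑘 = 𝑘1 𝑏𝑦 + 𝑘2 𝑎𝑥 as in the proof, reduced modulo 𝑛 = 𝑎𝑏.

```python
def ext_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y."""
    if b == 0:
        return a, 1, 0
    d, x, y = ext_gcd(b, a % b)
    return d, y, x - (a // b) * y

def crt(k1, a, k2, b):
    """Solve k = k1 mod a and k = k2 mod b for coprime a, b; result in [0, a*b)."""
    d, x, y = ext_gcd(a, b)
    if d != 1:
        raise ValueError("moduli must be relatively prime")
    # a*x = 1 mod b and b*y = 1 mod a, hence k = k1*b*y + k2*a*x
    return (k1 * b * y + k2 * a * x) % (a * b)

k = crt(2, 3, 3, 5)
print(k, k % 3, k % 5)
```

For the toy input above, the solution modulo 15 is 𝑘 = 8, and indeed 8 ≡ 2 mod 3 and 8 ≡ 3 mod 5.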
Cyclic groups form the main building block in the classification of arbitrary finite
abelian groups.
Theorem 4.29. (Fundamental Theorem of Abelian Groups) Let 𝐺 be a finite abelian
group. Then 𝐺 is isomorphic to a direct product of cyclic groups ℤ_{𝑝^𝑘} of order 𝑝^𝑘, where 𝑝
is a prime number and 𝑘 ∈ ℕ. The same prime 𝑝 can appear in several factors.
It remains to show that every finite abelian group is a product of cyclic groups (see for
instance [Sho09]). □
Example 4.30. (1) Let 𝐺 be an abelian group of order 77. Then 𝐺 ≅ ℤ7 × ℤ11 . 𝐺 is
isomorphic to ℤ77 and is cyclic.
(2) Suppose 𝐺 is an abelian group of order 18. Then 𝐺 is either isomorphic to ℤ2 × ℤ9
or to ℤ2 × ℤ3 × ℤ3 . Note that these two groups are not isomorphic. The first group
is cyclic and the second group is not cyclic.
Example 4.32. The integers ℤ and ℤ𝑛 , the integers modulo 𝑛, form a ring with respect
to addition and multiplication of integers and residue classes, respectively. ♢
Maps between rings that are compatible with addition and multiplication are
called ring homomorphisms.
Definition 4.33. Let 𝑓 ∶ 𝑅1 → 𝑅2 be a map between the rings 𝑅1 and 𝑅2 . Then 𝑓 is
called a ring homomorphism if
(1) 𝑓(𝑥 + 𝑦) = 𝑓(𝑥) + 𝑓(𝑦) for all 𝑥, 𝑦 ∈ 𝑅1 ,
(2) 𝑓(𝑥 ⋅ 𝑦) = 𝑓(𝑥) ⋅ 𝑓(𝑦) for all 𝑥, 𝑦 ∈ 𝑅1 , and
(3) 𝑓(1) = 1.
A bijective ring homomorphism is called an isomorphism, and one writes 𝑅1 ≅ 𝑅2 .
Example 4.34. Let 𝑎, 𝑏 ∈ ℕ be relatively prime and 𝑛 = 𝑎𝑏. Then the Chinese Re-
mainder Theorem 4.26 gives a ring isomorphism
ℤ𝑛 ≅ ℤ𝑎 × ℤ𝑏 .
Definition 4.35. Let 𝑅 be a ring. Then the subset of invertible elements with respect to
multiplication is called the units of 𝑅 and is denoted by 𝑅∗ . The units form an abelian
group. ♢
Definition 4.35 generalizes Definition 3.18 where we defined the units ℤ∗𝑛 of the
integers modulo 𝑛.
Example 4.36. ℤ∗ = {−1, 1}. This group is isomorphic to the additive group ℤ2 . ♢
It is evident that a field extension 𝐹 of 𝐾 is also a vector space over 𝐾 (see Section
4.4 on vector spaces).
Definition 4.40. Let 𝐹 be a field extension of 𝐾. If the dimension of 𝐹 over 𝐾 (as a
𝐾-vector space) is finite and equal to 𝑛, then 𝑛 is called the degree of the field extension
and we write [𝐹 ∶ 𝐾] = 𝑛.
Example 4.41. (1) ℝ is a subfield of ℂ. Furthermore, ℂ is a vector space over ℝ and
[ℂ ∶ ℝ] = 2. A basis is given by 1 and 𝑖, where 𝑖 ∈ ℂ is the imaginary unit.
(2) ℝ is a field extension of ℚ, but the degree is infinite. ♢
The following section deals with finite fields and their extensions.
Definition 4.44. Let 𝑝 be a prime number; then we write 𝐺𝐹(𝑝) or 𝔽𝑝 for the field
(ℤ𝑝 , +, ⋅) with 𝑝 elements.
Example 4.45. The binary digits {0, 1} form a field which is isomorphic to 𝐺𝐹(2),
where addition is XOR and multiplication is AND (see Table 1.1 in Chapter 1). ♢
There are rings of any finite order (for example ℤ𝑛 ), but this is not the case for fields.
Proposition 4.46. Let 𝐾 be a finite field of order 𝑛; then 𝑛 is a prime number or a prime power.
However, 𝐺𝐹(𝑝) is not the only field of characteristic 𝑝, and in the following we
construct fields of order 𝑝^𝑛 for primes 𝑝 and integers 𝑛 ≥ 2. Unfortunately, the naive
constructions do not work:
• ℤ_{𝑝^𝑛} is a ring with 𝑝^𝑛 elements, but not a field (compare Proposition 4.42). For
example, 𝑝 is nonzero but not invertible modulo 𝑝^𝑛 since gcd(𝑝, 𝑝^𝑛) = 𝑝.
• (ℤ𝑝)^𝑛 = ℤ𝑝 × ⋯ × ℤ𝑝 with component-wise addition and multiplication is a ring
with 𝑝^𝑛 elements, but not a field (see Exercise 12). ♢
In fact, the construction of a field 𝐺𝐹(𝑝^𝑛) of order 𝑝^𝑛 is a bit more involved and
requires polynomial rings.
Definition 4.49. Let 𝐾 be a field. Then 𝐾[𝑥] is called the set (or ring) of polynomials
over 𝐾 and consists of all formal expressions
𝑓(𝑥) = ∑_{𝑖=0}^{𝑛} 𝑎_𝑖 𝑥^𝑖 = 𝑎_0 + 𝑎_1 𝑥 + 𝑎_2 𝑥^2 + ⋯ + 𝑎_𝑛 𝑥^𝑛,
Note that 𝐾[𝑥] is not a field, since polynomials of degree ≥ 1 cannot be inverted in 𝐾[𝑥].
But we have a division with remainder. Let 𝑓(𝑥), 𝑔(𝑥) ∈ 𝐾[𝑥] with 𝑔(𝑥) ≠ 0. Then
the division 𝑓(𝑥) ∶ 𝑔(𝑥) gives a quotient 𝑞(𝑥) ∈ 𝐾[𝑥] and a remainder 𝑟(𝑥) ∈ 𝐾[𝑥]
such that
𝑓(𝑥) = 𝑞(𝑥)𝑔(𝑥) + 𝑟(𝑥) and deg(𝑟) < deg(𝑔).
Obviously, 𝑔(𝑥) divides 𝑓(𝑥) if and only if the remainder is 0.
Definition 4.53. Let 𝑓(𝑥), 𝑔(𝑥) ∈ 𝐾[𝑥] be nonzero polynomials. Then the greatest
common divisor gcd(𝑓, 𝑔) is the monic polynomial of highest possible degree that di-
vides 𝑓(𝑥) and 𝑔(𝑥). ♢
The greatest common divisor (gcd) of polynomials can be efficiently computed us-
ing the Extended Euclidean Algorithm, analogous to the gcd of integers (see Algorithm
3.1 in Chapter 3). The integer division is replaced by the division of polynomials with
remainder. The Extended Euclidean Algorithm takes two polynomials 𝑓 and 𝑔 as input
and outputs gcd(𝑓, 𝑔) along with two polynomials 𝑎(𝑥) and 𝑏(𝑥) such that
gcd(𝑓, 𝑔) = 𝑎(𝑥)𝑓(𝑥) + 𝑏(𝑥)𝑔(𝑥).
The gcd of 𝑓 and 𝑔 has the following property: if ℎ(𝑥) divides 𝑓(𝑥) and 𝑔(𝑥), then ℎ(𝑥)
divides gcd(𝑓, 𝑔).
Example 4.54. We compute gcd(𝑥^3 + 1, 𝑥^2 + 1) over 𝐺𝐹(2) (see Table 4.1) and obtain
gcd(𝑥^3 + 1, 𝑥^2 + 1) = 𝑥 + 1 = 1 ⋅ (𝑥^3 + 1) − 𝑥 ⋅ (𝑥^2 + 1).
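The computation can be reproduced with a few lines of plain Python. The sketch below is our own illustration and rests on one assumption about the representation: a polynomial over 𝐺𝐹(2) is stored as an integer bitmask, where bit 𝑖 is the coefficient of 𝑥^𝑖 (so 0b1001 stands for 𝑥^3 + 1). Addition and subtraction are then both XOR.

```python
def deg(f):
    """Degree of a GF(2) polynomial stored as a bitmask (deg(0) = -1)."""
    return f.bit_length() - 1

def poly_divmod(f, g):
    """Quotient and remainder of f divided by g over GF(2)."""
    if g == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q = 0
    while deg(f) >= deg(g):
        shift = deg(f) - deg(g)
        q ^= 1 << shift          # add x^shift to the quotient
        f ^= g << shift          # subtract x^shift * g
    return q, f

def poly_mul(f, g):
    """Carry-less product of f and g over GF(2)."""
    result = 0
    while g:
        if g & 1:
            result ^= f
        f <<= 1
        g >>= 1
    return result

def poly_ext_gcd(f, g):
    """Return (d, a, b) with d = gcd(f, g) = a*f + b*g over GF(2)."""
    a0, b0, a1, b1 = 1, 0, 0, 1
    while g:
        q, r = poly_divmod(f, g)
        f, g = g, r
        a0, a1 = a1, a0 ^ poly_mul(q, a1)
        b0, b1 = b1, b0 ^ poly_mul(q, b1)
    return f, a0, b0

d, a, b = poly_ext_gcd(0b1001, 0b101)       # x^3 + 1 and x^2 + 1
print(bin(d), bin(a), bin(b))               # 0b11 0b1 0b10
```

The output encodes 𝑑 = 𝑥 + 1, 𝑎 = 1 and 𝑏 = 𝑥, in agreement with the example (the signs do not matter over 𝐺𝐹(2)).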
Example 4.56. Let 𝑛 ∈ ℕ and 𝑓(𝑥) = 𝑥^{2^𝑛} + 𝑥 ∈ 𝐺𝐹(2)[𝑥]. Then
𝐷(𝑓) = 2^𝑛 𝑥^{2^𝑛 − 1} + 1 = 1,
since 2^𝑛 = 0 in 𝐺𝐹(2). ♢
One can show that the derivative satisfies the product rule (see Exercise 13):
𝐷(𝑓 ⋅ 𝑔) = 𝑓 ⋅ 𝐷(𝑔) + 𝐷(𝑓) ⋅ 𝑔.
Note that 𝐷 does not have a geometric interpretation as in the real case. However,
the derivative can detect double roots of polynomials.
Proposition 4.57. Let 𝑓(𝑥) ∈ 𝐾[𝑥] and assume that gcd(𝑓, 𝐷(𝑓)) = 1. Then 𝑓(𝑥) is
square-free, i.e., it is not divisible by the square of any polynomial of degree at least 1. In
particular, 𝑓(𝑥) is not divisible by (𝑥 − 𝑎)^2 for any 𝑎 ∈ 𝐾 and does not have double roots.
Definition 4.59. Let 𝑔 ∈ 𝐾[𝑥] be a polynomial with deg(𝑔) ≥ 1. Then 𝑔(𝑥) defines an
equivalence relation on 𝐾[𝑥]:
𝑓1 (𝑥) ∼ 𝑓2 (𝑥) if 𝑓1 (𝑥) − 𝑓2 (𝑥) = 𝑞(𝑥)𝑔(𝑥) for some 𝑞(𝑥) ∈ 𝐾[𝑥].
Equivalent polynomials 𝑓1 and 𝑓2 are called congruent modulo 𝑔(𝑥) and we write 𝑓1 (𝑥) ≡
𝑓2 (𝑥) mod 𝑔(𝑥). The set of equivalence classes or residue classes modulo 𝑔(𝑥) is denoted
by 𝐾[𝑥]/(𝑔(𝑥)). ♢
Two polynomials 𝑓1 and 𝑓2 are congruent modulo 𝑔 if and only if they have the
same remainder when divided by 𝑔(𝑥). Note that the definition is similar to residue
classes modulo an integer 𝑛, but here the construction is based on the polynomial ring
𝐾[𝑥] instead of the ring of integers ℤ.
The residue classes modulo 𝑔(𝑥) form not only a set, but also a ring:
Proposition 4.60. Let 𝑔 ∈ 𝐾[𝑥] and 𝑛 = deg(𝑔) ≥ 1. Then 𝐾[𝑥]/(𝑔(𝑥)) is again a ring
called a quotient ring, factor ring or residue class ring, with the operations induced by
𝐾[𝑥]. Each residue class has a unique standard representative of degree less than 𝑛.
Proof. The ring structure can be easily verified. The standard representative can be
found by division with remainder. Let 𝑓(𝑥) ∈ 𝐾[𝑥] be any representative of a residue
class. We divide 𝑓(𝑥) by 𝑔(𝑥) and obtain polynomials 𝑞(𝑥), 𝑟(𝑥) such that
𝑓(𝑥) = 𝑞(𝑥)𝑔(𝑥) + 𝑟(𝑥),
where deg(𝑟) < 𝑛. The equation implies 𝑓(𝑥) ≡ 𝑟(𝑥) mod 𝑔(𝑥), where 𝑟(𝑥) is the
standard representative. □
Example 4.61. We continue Example 4.52. The division with remainder implies
𝑥^6 + 𝑥^5 + 𝑥^3 + 𝑥^2 + 𝑥 + 1 ≡ 𝑥^3 + 𝑥 + 1 mod 𝑥^4 + 𝑥^3 + 1.
Therefore, the classes of 𝑥^6 + 𝑥^5 + 𝑥^3 + 𝑥^2 + 𝑥 + 1 and 𝑥^3 + 𝑥 + 1 are equal in the residue
class ring 𝐺𝐹(2)[𝑥]/(𝑥^4 + 𝑥^3 + 1).
Remark 4.62. The construction of residue classes can be studied in a more general
context. Let 𝑅 be a ring. An ideal 𝐼 ⊂ 𝑅 is an additive subgroup with the property that
𝑟 ⋅ 𝑥 ∈ 𝐼 for all 𝑟 ∈ 𝑅 and 𝑥 ∈ 𝐼. For any ideal 𝐼 of a ring 𝑅 one has the quotient ring
𝑅/𝐼. Two elements 𝑥, 𝑦 ∈ 𝑅 are equivalent and identified in 𝑅/𝐼 if 𝑥 − 𝑦 ∈ 𝐼.
We considered polynomial rings 𝑅 = 𝐾[𝑥] and principal ideals 𝐼 = (𝑔(𝑥)) gener-
ated by a single polynomial 𝑔(𝑥) ∈ 𝐾[𝑥]. If 𝑅 = ℤ and 𝐼 = (𝑛), then the quotient 𝑅/𝐼
defines the integers modulo 𝑛, i.e., ℤ/(𝑛) = ℤ𝑛 .
The polynomial ring 𝐾[𝑥] has similar properties to ℤ with respect to factorization.
Polynomials can be decomposed into a product of polynomials and the factorization is
essentially unique. The prime numbers in ℤ correspond to irreducible polynomials in 𝐾[𝑥].
Definition 4.64. A polynomial 𝑓(𝑥) ∈ 𝐾[𝑥] is called irreducible, if it cannot be fac-
tored into two polynomials of smaller degree. Otherwise, the polynomial is called re-
ducible. ♢
Degree   Polynomials
2        𝑥^2 + 𝑥 + 1
3        𝑥^3 + 𝑥 + 1
         𝑥^3 + 𝑥^2 + 1
4        𝑥^4 + 𝑥 + 1
         𝑥^4 + 𝑥^3 + 𝑥^2 + 𝑥 + 1
         𝑥^4 + 𝑥^3 + 1
5        𝑥^5 + 𝑥^2 + 1
         𝑥^5 + 𝑥^3 + 𝑥^2 + 𝑥 + 1
         𝑥^5 + 𝑥^3 + 1
         𝑥^5 + 𝑥^4 + 𝑥^3 + 𝑥 + 1
         𝑥^5 + 𝑥^4 + 𝑥^3 + 𝑥^2 + 1
         𝑥^5 + 𝑥^4 + 𝑥^2 + 𝑥 + 1
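For small degrees, such a list can be reproduced by brute force. The following Python sketch is our own illustration and not an efficient irreducibility test: it stores a polynomial over 𝐺𝐹(2) as an integer bitmask (bit 𝑖 is the coefficient of 𝑥^𝑖, an assumption of this sketch) and tries all possible divisors of degree at most half the degree of 𝑓.

```python
def poly_mod(f, g):
    """Remainder of f divided by g over GF(2); polynomials are int bitmasks."""
    dg = g.bit_length() - 1
    while f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)
    return f

def is_irreducible(f):
    """Trial division by all polynomials of degree 1 .. deg(f) // 2."""
    n = f.bit_length() - 1
    if n < 1:
        return False
    return all(poly_mod(f, g) != 0 for g in range(2, 1 << (n // 2 + 1)))

def irreducibles(n):
    """All irreducible polynomials of degree n over GF(2)."""
    return [f for f in range(1 << n, 1 << (n + 1)) if is_irreducible(f)]

def poly_str(f):
    """Readable form of a bitmask, e.g. 0b1011 -> 'x^3 + x + 1'."""
    terms = []
    for i in range(f.bit_length() - 1, -1, -1):
        if (f >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms)

for n in range(2, 6):
    print(n, [poly_str(f) for f in irreducibles(n)])
```

The loop prints one, two, three and six polynomials for the degrees 2 to 5, matching the table above up to the order of the entries.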
This result might look surprising, since 𝐾[𝑥] is far from being a field: no polyno-
mial of degree ≥ 1 is multiplicatively invertible in 𝐾[𝑥]. However, an inversion mod-
ulo 𝑔(𝑥) is often possible, since two representatives 𝑓1 and 𝑓2 are multiplicative inverses
mod 𝑔(𝑥) if
𝑓1 (𝑥)𝑓2 (𝑥) = 1 + 𝑞(𝑥)𝑔(𝑥)
for some 𝑞(𝑥) ∈ 𝐾[𝑥].
The proof of Proposition 4.67 uses the Extended Euclidean Algorithm for polynomi-
als. We briefly sketch the proof: let 𝑓(𝑥) be a nonzero polynomial of degree less than
deg(𝑔). Then there are polynomials ℎ1 and ℎ2 such that
1 = ℎ1 (𝑥)𝑓(𝑥) + ℎ2 (𝑥)𝑔(𝑥).
This shows that 1 ≡ ℎ1 (𝑥)𝑓(𝑥) mod 𝑔(𝑥) so that 𝑓(𝑥) is invertible modulo 𝑔(𝑥).
Definition 4.68. Let 𝑔(𝑥) ∈ 𝐺𝐹(𝑝)[𝑥] be an irreducible polynomial of degree 𝑛. Then
the residue field 𝐺𝐹(𝑝)[𝑥]/(𝑔(𝑥)) defines the Galois Field 𝐺𝐹(𝑝^𝑛) = 𝔽_{𝑝^𝑛} of order 𝑝^𝑛.
It follows from Proposition 4.63 that the field 𝐺𝐹(𝑝^𝑛) indeed contains 𝑝^𝑛 elements;
each residue class has a unique representative of degree less than 𝑛, i.e., each class is
represented by a polynomial 𝑓(𝑥) = 𝑎_0 + 𝑎_1 𝑥 + ⋯ + 𝑎_{𝑛−1} 𝑥^{𝑛−1} with 𝑎_𝑖 ∈ 𝐺𝐹(𝑝).
Note that the definition of 𝐺𝐹(𝑝^𝑛) depends on an irreducible polynomial of degree
𝑛 and it is not clear whether such a polynomial exists. We want to show that a finite
field of order 𝑝^𝑛 exists and is essentially unique.
If 𝐺𝐹(𝑝^𝑛) exists, then ord(𝐺𝐹(𝑝^𝑛)^*) = 𝑝^𝑛 − 1 and thus
𝑎^{𝑝^𝑛 − 1} = 1
𝐺𝐹(𝑝). Now we show that the splitting field of 𝑓 has 𝑝^𝑛 elements, which proves that
𝐺𝐹(𝑝^𝑛) exists (for all 𝑛 ∈ ℕ) and is unique up to isomorphism.
Proposition 4.73. Let 𝑓(𝑥) = 𝑥^{𝑝^𝑛} − 𝑥 ∈ 𝐺𝐹(𝑝)[𝑥]. The splitting field of 𝑓(𝑥) over
𝐺𝐹(𝑝) has 𝑝^𝑛 elements and defines the field 𝐺𝐹(𝑝^𝑛).
Proof. Firstly, we show that 𝑓(𝑥) does not have multiple roots. The formal derivative
is 𝐷(𝑓) = 𝑝^𝑛 𝑥^{𝑝^𝑛 − 1} − 1 = −1 and thus gcd(𝑓, 𝐷(𝑓)) = 1. By Proposition 4.57, 𝑓 does
not have multiple roots and so the splitting field 𝐹 of 𝑓(𝑥) over 𝐺𝐹(𝑝) contains at least
the 𝑝^𝑛 distinct roots of 𝑥^{𝑝^𝑛} − 𝑥. However, 𝐹 could contain more elements. Let 𝑆 =
{𝑎_1, … , 𝑎_{𝑝^𝑛}} be the set of roots. Note that 𝐺𝐹(𝑝) ⊂ 𝑆 since 𝑎^𝑝 ≡ 𝑎 mod 𝑝 for all
𝑎 ∈ 𝐺𝐹(𝑝).
Next, we show that 𝑆 forms a field which must be equal to 𝐹, since 𝐹 is the smallest
field extension of 𝐺𝐹(𝑝) where 𝑓(𝑥) splits into linear factors. We thus need to prove
that 𝑎 − 𝑏 ∈ 𝑆 for all 𝑎, 𝑏 ∈ 𝑆 and 𝑎𝑏^{−1} ∈ 𝑆 for all 𝑎, 𝑏 ∈ 𝑆 ⧵ {0} (see Proposition 4.8
on conditions for a subgroup). We show that 𝑓(𝑎 − 𝑏) = 0 and 𝑓(𝑎𝑏^{−1}) = 0 if 𝑓(𝑎) = 0
and 𝑓(𝑏) = 0. To this end, we observe that
(𝑎 − 𝑏)^{𝑝^𝑛} = 𝑎^{𝑝^𝑛} + (−𝑏)^{𝑝^𝑛} = 𝑎 − 𝑏
since the other terms given by the Binomial Theorem are multiples of 𝑝 and therefore
zero in 𝐺𝐹(𝑝). We note that (−1)^{𝑝^𝑛} = −1 if 𝑝 ≠ 2 and −1 = 1 for 𝑝 = 2. This gives
𝑓(𝑎 − 𝑏) = 0. Furthermore,
(𝑎𝑏^{−1})^{𝑝^𝑛} = 𝑎^{𝑝^𝑛} (𝑏^{−1})^{𝑝^𝑛} = 𝑎^{𝑝^𝑛} (𝑏^{𝑝^𝑛})^{−1} = 𝑎𝑏^{−1},
and so 𝑓(𝑎𝑏^{−1}) = 0. Summarizing, the roots of 𝑓(𝑥) = 𝑥^{𝑝^𝑛} − 𝑥 form the splitting field
𝐹 = 𝐺𝐹(𝑝^𝑛), and this field of 𝑝^𝑛 elements is unique up to isomorphism. □
Example 4.74. 𝐺𝐹(4) is the splitting field of 𝑓(𝑥) = 𝑥^4 − 𝑥 over 𝐺𝐹(2). This polynomial
factorizes into 𝑥(𝑥 + 1)(𝑥^2 + 𝑥 + 1) over 𝐺𝐹(2). The first two factors correspond to the
elements 0 and 1 of the base field 𝐺𝐹(2). The polynomial 𝑥^2 + 𝑥 + 1 is irreducible over
𝐺𝐹(2) and
𝐺𝐹(4) = 𝐺𝐹(2)[𝑥]/(𝑥^2 + 𝑥 + 1).
𝐺𝐹(4) is represented by the polynomials {0, 1, 𝑥, 𝑥 + 1}.
Addition is obvious (modulo 2), and multiplication also follows the usual rules,
but the result is reduced modulo 𝑥^2 + 𝑥 + 1. For example:
𝑥(𝑥 + 1) = 𝑥^2 + 𝑥 ≡ 1 mod (𝑥^2 + 𝑥 + 1).
Table 4.3 shows addition and multiplication in 𝐺𝐹(4).
The multiplicative inverses are 1^{−1} = 1, 𝑥^{−1} ≡ 𝑥 + 1 mod (𝑥^2 + 𝑥 + 1) and
(𝑥 + 1)^{−1} ≡ 𝑥 mod (𝑥^2 + 𝑥 + 1).
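The arithmetic of 𝐺𝐹(4) is small enough to be tabulated by a short program. In the following Python sketch (our own illustration; the names gf4_add, gf4_mul and NAMES are hypothetical), the four elements 0, 1, 𝑥, 𝑥 + 1 are encoded as the 2-bit integers 0, 1, 2, 3, and a product of degree 2 is reduced with the relation 𝑥^2 = 𝑥 + 1.

```python
MODULUS = 0b111                       # x^2 + x + 1 as a bitmask
NAMES = ["0", "1", "x", "x+1"]        # element i has coefficient bits i

def gf4_add(a, b):
    """Addition in GF(4) is coefficient-wise XOR."""
    return a ^ b

def gf4_mul(a, b):
    """Multiply two elements of GF(4) and reduce modulo x^2 + x + 1."""
    product = 0
    for i in range(2):
        if (b >> i) & 1:
            product ^= a << i
    if product & 0b100:               # a term x^2 is present: subtract the modulus
        product ^= MODULUS
    return product

add_table = [[NAMES[gf4_add(a, b)] for b in range(4)] for a in range(4)]
mul_table = [[NAMES[gf4_mul(a, b)] for b in range(4)] for a in range(4)]

for name, row_add, row_mul in zip(NAMES, add_table, mul_table):
    print(name.rjust(3), "|", " ".join(s.rjust(3) for s in row_add),
          "||", " ".join(s.rjust(3) for s in row_mul))
```

Each printed row contains the addition results followed by the multiplication results for one element.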
Proposition 4.75. Let 𝑛, 𝑚 ∈ ℕ. Then 𝐺𝐹(𝑝^𝑚) ⊂ 𝐺𝐹(𝑝^𝑛) if and only if 𝑚 ∣ 𝑛 (see
Figure 4.2).
Table 4.3. Addition and multiplication in 𝐺𝐹(4).
  +   |  0    1    𝑥    𝑥+1          ⋅   |  0    1    𝑥    𝑥+1
  0   |  0    1    𝑥    𝑥+1          0   |  0    0    0    0
  1   |  1    0    𝑥+1  𝑥            1   |  0    1    𝑥    𝑥+1
  𝑥   |  𝑥    𝑥+1  0    1            𝑥   |  0    𝑥    𝑥+1  1
  𝑥+1 |  𝑥+1  𝑥    1    0            𝑥+1 |  0    𝑥+1  1    𝑥
Proof. Suppose that 𝐺𝐹(𝑝^𝑚) ⊂ 𝐺𝐹(𝑝^𝑛) and the degree of the field extension is
[𝐺𝐹(𝑝^𝑛) ∶ 𝐺𝐹(𝑝^𝑚)] = 𝑘. It follows that the order of 𝐺𝐹(𝑝^𝑛) is 𝑝^𝑛 = (𝑝^𝑚)^𝑘 = 𝑝^{𝑚𝑘}
and thus 𝑚 ∣ 𝑛. To prove the opposite direction, let 𝑎 ∈ 𝐺𝐹(𝑝^𝑚) and 𝑛 = 𝑚𝑘 for some
𝑘 ∈ ℕ. Then
𝑎^{𝑝^𝑛} = 𝑎^{(𝑝^𝑚)^𝑘} = (((𝑎^{𝑝^𝑚})^{𝑝^𝑚}) … )^{𝑝^𝑚} (𝑘-fold exponentiation).
Since 𝑎^{𝑝^𝑚} = 𝑎 in each step, we obtain 𝑎^{𝑝^𝑛} = 𝑎 and therefore 𝑎 ∈ 𝐺𝐹(𝑝^𝑛). □
[Figure 4.2: inclusion diagram of the fields 𝐺𝐹(𝑝^𝑚); the recoverable labels include 𝐺𝐹(𝑝^8), 𝐺𝐹(𝑝^9) and 𝐺𝐹(𝑝^6).]
notation. Addition on the 8-bit words is given by a simple XOR operation, but the
multiplication is less obvious and defined by a multiplication of polynomials, followed
by a reduction modulo 𝑔(𝑥).
We can use SageMath for computations in 𝐺𝐹(256). Suppose we want to compute
𝑥^7 (𝑥 + 1) mod 𝑔(𝑥) and the multiplicative inverse of 𝑥 + 1 mod 𝑔(𝑥).
sage: R.<x> = PolynomialRing(GF(2), x)
sage: g = x^8 + x^4 + x^3 + x + 1
sage: K.<a> = R.quotient_ring(g)
sage: a^7 * (a + 1); 1/(a + 1)
a^7 + a^4 + a^3 + a + 1
a^7 + a^6 + a^5 + a^4 + a^2 + a
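The same two results can be obtained without SageMath. The following plain Python sketch is our own illustration: a byte is read as a polynomial over 𝐺𝐹(2) (bit 𝑖 is the coefficient of 𝑥^𝑖), the product is reduced modulo 𝑔(𝑥) = 𝑥^8 + 𝑥^4 + 𝑥^3 + 𝑥 + 1 (bitmask 0x11B), and the inverse is computed as 𝑎^254, which works because 𝑎^255 = 1 for every nonzero 𝑎. The function names gf256_mul and gf256_inv are hypothetical.

```python
AES_MODULUS = 0x11B                   # x^8 + x^4 + x^3 + x + 1

def gf256_mul(a, b):
    """Multiply two bytes as elements of GF(2)[x]/(x^8 + x^4 + x^3 + x + 1)."""
    result = 0
    while b:
        if b & 1:
            result ^= a               # add the current multiple of a
        a <<= 1                       # multiply a by x
        if a & 0x100:
            a ^= AES_MODULUS          # reduce as soon as degree 8 is reached
        b >>= 1
    return result

def gf256_inv(a):
    """Inverse of a nonzero byte via square-and-multiply: a^(-1) = a^254."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        e >>= 1
    return result

print(hex(gf256_mul(0x80, 0x03)), hex(gf256_inv(0x03)))   # 0x9b 0xf6
```

The byte 9B = 1001 1011 corresponds to 𝑥^7 + 𝑥^4 + 𝑥^3 + 𝑥 + 1 and F6 = 1111 0110 to 𝑥^7 + 𝑥^6 + 𝑥^5 + 𝑥^4 + 𝑥^2 + 𝑥, in agreement with the SageMath output above.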
In the following, we assume that the reader knows the definitions of linear inde-
pendence, basis and dimension of a vector space (see [WJW+ 14] or any other textbook
on linear algebra).
Example 4.79. (1) Let 𝐾 be any field; then 𝐾^𝑛 is the standard example of an
𝑛-dimensional 𝐾-vector space.
(2) The binary strings of length 𝑛 ∈ ℕ form the 𝐺𝐹(2)-vector space 𝐺𝐹(2)^𝑛. The
group operation is defined by bitwise XORing. The scalar multiplication is trivial,
since the only factors are 0 and 1. Note that there is no natural ring structure on
(3) Let 𝑝 be a prime number and 𝑛 ∈ ℕ. Then 𝐺𝐹(𝑝^𝑛) is a vector space over 𝐺𝐹(𝑝).
4.4. Linear and Affine Maps 93
Maps between vector spaces that are compatible with addition and scalar multi-
plication are called linear:
Definition 4.80. Let 𝑓 ∶ 𝑉 → 𝑊 be a map between two 𝐾-vector spaces. Then 𝑓 is a
𝐾-linear map if:
(1) 𝑓(𝑣1 + 𝑣2 ) = 𝑓(𝑣1 ) + 𝑓(𝑣2 ) for all 𝑣1 , 𝑣2 ∈ 𝑉 and
(2) 𝑓(𝜆 ⋅ 𝑣) = 𝜆 ⋅ 𝑓(𝑣) for all 𝜆 ∈ 𝐾, 𝑣 ∈ 𝑉.
Remark 4.82. One should understand that linearity is a very strong requirement,
and random maps are mostly far from linear! However, linear maps play an important
role in many applications as well as in cryptography.
Matrices are a key tool for the description of linear maps. We recapitulate the fol-
lowing fact from linear algebra:
Proposition 4.83. There is a one-to-one correspondence between 𝑛 × 𝑚 matrices over 𝐾
and linear maps 𝑓 ∶ 𝐾^𝑚 → 𝐾^𝑛. Any matrix 𝐴 over 𝐾 with 𝑛 rows and 𝑚 columns gives
a linear map 𝑓 ∶ 𝐾^𝑚 → 𝐾^𝑛 by setting 𝑓(𝑣) = 𝐴𝑣, where we view 𝑣 as a column vector.
Conversely, a linear map 𝑓 ∶ 𝐾^𝑚 → 𝐾^𝑛 defines a matrix by writing the images of the
standard basis, i.e., 𝑓(𝑒1 ), 𝑓(𝑒2 ), … , 𝑓(𝑒𝑚 ), into the columns of an 𝑛 × 𝑚 matrix:
𝐴 = (𝑓(𝑒1 ) 𝑓(𝑒2 ) … 𝑓(𝑒𝑚 )), where each 𝑓(𝑒𝑖 ) is written as a column. ♢
The above construction can be generalized from the standard basis to an arbitrary
basis. In fact, a linear map is completely determined by its values on a basis.
Definition 4.84. A 𝐾-linear map 𝑓 ∶ 𝑉 → 𝑊 is said to be an isomorphism if 𝑓 is
invertible, i.e., if there is an inverse 𝐾-linear map 𝑓^{−1} ∶ 𝑊 → 𝑉. ♢
Cryptography primarily considers maps over finite fields, but more recent advances
(lattice-based cryptography and quantum computing) also require real and complex
vector spaces.
Definition 4.86. An 𝑛 × 𝑛 matrix 𝐴 over ℝ is called orthogonal if 𝐴^𝑇 𝐴 = 𝐼_𝑛. ♢
Here 𝐴^𝑇 denotes the transpose matrix and 𝐼_𝑛 is the 𝑛 × 𝑛 identity matrix. Orthogonal
matrices are invertible, the inverse matrix is 𝐴^{−1} = 𝐴^𝑇 and det(𝐴) is either 1 or
−1. The rows and the columns are orthonormal vectors. The associated linear map
𝑓(𝑥) = 𝐴𝑥 of real vector spaces is also called orthogonal and preserves lengths and angles.
Example 4.87. The rotation of two-dimensional vectors by 𝛼 around the origin is
described by the following orthogonal matrix:
    ⎛ cos(𝛼)  −sin(𝛼) ⎞
𝐴 = ⎝ sin(𝛼)   cos(𝛼) ⎠.
One easily verifies that 𝐴^𝑇 𝐴 = 𝐼_2. The columns of 𝐴 are obtained by rotating the
standard unit vectors 𝑒1 and 𝑒2 by 𝛼 counter-clockwise around the origin. ♢
A slight generalization of linear maps are affine maps. They differ from linear maps
only by a constant translation. We remark that affine maps are sometimes also called
linear, although this is not fully precise. Furthermore, nonlinear usually means that a
map is neither linear nor affine.
More examples of linear, affine and nonlinear Boolean functions are given in Ex-
ample 1.24.
Linear and affine maps play an important role in cryptography. This has several reasons:
• They can be efficiently described by matrices and vectors, even for large dimensions.
• Matrix computations are efficient, and the running time is polynomial in the
number of rows and columns.
• The kernel and the image of a linear map as well as the preimage of any element
can be efficiently computed by Gaussian elimination. Also, it can be easily veri-
fied whether a linear or affine map is bijective. If an inverse map exists, then it
can be efficiently computed and the inverse map is also linear or affine.
• Linear and affine maps over 𝐺𝐹(2) can produce diffusion and the so-called
avalanche effect: changing one input bit changes many output bits, if the matrix
is appropriately chosen. In fact, flipping the 𝑘-th input bit adds the 𝑘-th column
vector 𝐴 𝑒𝑘 to the output.
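The last observation can be checked experimentally. The following Python sketch is our own illustration with a randomly chosen 8 × 8 matrix 𝐴 and translation vector 𝑏 over 𝐺𝐹(2) (the dimension, the seed and all names are arbitrary choices); it flips the 𝑘-th input bit and compares the output difference with the 𝑘-th column of 𝐴.

```python
import random

def mat_vec_gf2(A, v):
    """Matrix-vector product over GF(2); A is a list of rows of bits."""
    return [sum(a * x for a, x in zip(row, v)) % 2 for row in A]

def xor(u, v):
    """Vector addition over GF(2)."""
    return [a ^ b for a, b in zip(u, v)]

random.seed(1)                            # fixed seed, only for reproducibility
n = 8
A = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
b = [random.randint(0, 1) for _ in range(n)]
v = [random.randint(0, 1) for _ in range(n)]

def f(v):
    """Affine map f(v) = A v + b over GF(2)."""
    return xor(mat_vec_gf2(A, v), b)

k = 3
e_k = [1 if i == k else 0 for i in range(n)]
diff = xor(f(v), f(xor(v, e_k)))          # output change caused by flipping bit k
column_k = [row[k] for row in A]          # k-th column of A, i.e. A e_k
print(diff == column_k)                   # True
```

The number of changed output bits is therefore the number of ones in the 𝑘-th column, which is why the matrix has to be chosen appropriately to obtain good diffusion.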
However, linear maps also have a major drawback when used in cryptography:
there are efficient attacks against encryption schemes that are solely based on linear or
affine operations. They do not protect against chosen plaintext attacks. For this reason,
linear and nonlinear operations are combined in the construction of secure ciphers.
Proposition 4.91. Let 𝑓 ∶ 𝐾^𝑚 → 𝐾^𝑛 be an affine map. Suppose the parameters of 𝑓 (i.e.,
the corresponding matrix and possibly a translation vector) are secret, but an adversary
knows 𝑚+1 input vectors 𝑣0 , 𝑣1 , … , 𝑣𝑚 and the corresponding output vectors 𝑤𝑖 = 𝑓(𝑣𝑖 ),
Proof. Let 𝑓(𝑣) = 𝐴𝑣 + 𝑏, where 𝐴 and 𝑏 are unknown. Since 𝐴𝑣𝑖 + 𝑏 = 𝑤𝑖 , one has
𝐴(𝑣𝑖 − 𝑣0 ) = 𝑤𝑖 − 𝑤0
for all 𝑖 = 1, 2, … , 𝑚. Now write the vectors 𝑣𝑖 − 𝑣0 and 𝑤𝑖 − 𝑤0 into the columns of
matrices 𝑉 and 𝑊, respectively. The 𝑚 × 𝑚 matrix 𝑉 is regular since we assumed that
the vectors 𝑣𝑖 − 𝑣0 are linearly independent. Hence
𝐴𝑉 = 𝑊 ⟹ 𝐴 = 𝑊𝑉^{−1},
and this provides an efficient matrix formula for 𝐴. It remains to compute the trans-
lation vector 𝑏; to this end we use the equation 𝑏 = 𝑤0 − 𝐴𝑣0 . This proves the asser-
tion. □
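The steps of the proof can be turned into a small known-plaintext attack over 𝐾 = 𝐺𝐹(2). The following Python sketch is our own illustration: the secret map, the dimensions 𝑚 = 𝑛 = 4 and all function names are hypothetical, and the matrix inverse is computed by Gauss-Jordan elimination. If the differences 𝑣𝑖 − 𝑣0 happen to be linearly dependent, the sketch simply draws new input vectors.

```python
import random

def mat_mul(A, B):
    """Matrix product over GF(2) (matrices are lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]

def mat_inv(M):
    """Inverse over GF(2) by Gauss-Jordan elimination; None if M is singular."""
    m = len(M)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(M)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def recover_affine(vs, ws):
    """Given m+1 pairs (v_i, w_i = A v_i + b), return (A, b) or None."""
    v0, w0 = vs[0], ws[0]
    # the columns of V and W are v_i - v_0 and w_i - w_0 (subtraction is XOR)
    V_cols = [[x ^ y for x, y in zip(v, v0)] for v in vs[1:]]
    W_cols = [[x ^ y for x, y in zip(w, w0)] for w in ws[1:]]
    V = [list(r) for r in zip(*V_cols)]
    W = [list(r) for r in zip(*W_cols)]
    V_inv = mat_inv(V)
    if V_inv is None:
        return None                               # differences are dependent
    A = mat_mul(W, V_inv)                         # A = W V^(-1)
    Av0 = [sum(a * x for a, x in zip(row, v0)) % 2 for row in A]
    b = [x ^ y for x, y in zip(w0, Av0)]          # b = w_0 - A v_0
    return A, b

random.seed(0)
m = n = 4
A_secret = [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]
b_secret = [random.randint(0, 1) for _ in range(n)]

def f(v):
    """The secret affine map, used here only as an oracle."""
    return [(sum(a * x for a, x in zip(row, v)) + c) % 2
            for row, c in zip(A_secret, b_secret)]

result = None
while result is None:                             # retry on dependent inputs
    vs = [[random.randint(0, 1) for _ in range(m)] for _ in range(m + 1)]
    result = recover_affine(vs, [f(v) for v in vs])
A, b = result
print(A == A_secret and b == b_secret)            # True
```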
Example 4.92. Let 𝐸𝑘 ∶ {0, 1}^128 → {0, 1}^128 be the ciphering function of a block
cipher 𝐸 with block length 128. We identify {0, 1} with 𝐺𝐹(2) and suppose that 𝐸𝑘 is
affine. Then Proposition 4.91 shows that an adversary, who does not know 𝑘, can
find the matrix, the translation vector and hence 𝐸𝑘 and 𝐸_𝑘^{−1} with only 129 known
plaintext/ciphertext pairs, if the above independence condition is satisfied. Hence
128 ⋅ 129 = 16,512 plaintext/ciphertext bits are sufficient, or slightly more if the vectors
are linearly dependent. Note that we made no assumptions concerning 𝑘 and how the
key determines the matrix and the translation vector. This could even be a nonlinear
relationship. In fact, the key is not computed during this attack.
Proposition 4.93. Let 𝐹 be a keyed family of functions and suppose that all maps 𝐹𝑘
are affine; then 𝐹 is not a pseudorandom function. Similarly, if 𝐸 is a keyed family of
permutations and all maps 𝐸𝑘 are affine, then 𝐸 is not a pseudorandom permutation.
Proof. Proposition 4.91 shows how an adversary can explicitly compute the param-
eters of an affine map, i.e., the matrix and the translation vector, using a number of
known input/output values. In the distinguishability experiments (see Definitions 2.38
and 2.41), an adversary can then predict 𝑓(𝑚) for any input 𝑚 and compare the result
with the response 𝑐 they obtain from the challenger. If they coincide, then the function
𝑓 is probably affine and the adversary outputs 𝑏′ = 1. Otherwise, 𝑓 is random and the
adversary outputs 𝑏′ = 0.
Alternatively, an adversary can test whether 𝑓 is affine, by choosing input values
𝑚1 and 𝑚2 and asking for 𝑓(0), 𝑓(𝑚1 ), 𝑓(𝑚2 ), 𝑓(𝑚1 + 𝑚2 ). If 𝑓 is affine, then
𝑓(𝑚1 + 𝑚2 ) + 𝑓(0) = 𝑓(𝑚1 ) + 𝑓(𝑚2 ).
The adversary outputs 𝑏′ = 1 if this equation is satisfied, and 𝑏′ = 0 otherwise. Their
advantage is close to 1, and so affine functions cannot be pseudorandom. □
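The second test in the proof needs only four oracle queries. The following Python sketch is our own illustration on 16-bit words (addition in 𝐺𝐹(2)^16 is XOR); the two oracles make_affine and make_random_function are hypothetical stand-ins for the two worlds of the distinguishability experiment, and the test is repeated a few times, which the proof does not require.

```python
import random

N_BITS = 16

def affine_test(f, trials=8):
    """True if f(m1 + m2) + f(0) = f(m1) + f(m2) holds for all sampled inputs."""
    for _ in range(trials):
        m1, m2 = random.getrandbits(N_BITS), random.getrandbits(N_BITS)
        if f(m1 ^ m2) ^ f(0) != f(m1) ^ f(m2):
            return False
    return True

def make_affine():
    """Random affine map: output bit i is parity(masks[i] & m), XORed with const."""
    masks = [random.getrandbits(N_BITS) for _ in range(N_BITS)]
    const = random.getrandbits(N_BITS)
    def f(m):
        bits = sum((bin(mask & m).count("1") & 1) << i for i, mask in enumerate(masks))
        return bits ^ const
    return f

def make_random_function():
    """Lazily sampled random function on N_BITS-bit words."""
    table = {}
    def f(m):
        if m not in table:
            table[m] = random.getrandbits(N_BITS)
        return table[m]
    return f

print(affine_test(make_affine()))             # always True
print(affine_test(make_random_function()))    # False except with tiny probability
```

A random function satisfies the equation only by chance, so the adversary distinguishes the two cases with an advantage close to 1.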
Exercises 97
Remark 4.94. The above attack would essentially still work if 𝐹𝑘 can be approximated
by a linear or affine map 𝑓, i.e., if 𝐹𝑘 and 𝑓 coincide significantly more often than by
chance. Therefore, pseudorandom functions and permutations must be highly nonlinear.
4.5. Summary
• Finite cyclic groups of order 𝑛 are isomorphic to the additive group of integers
modulo 𝑛.
• Finite abelian groups can be decomposed into a product of cyclic groups of
prime-power order.
• The integers modulo a prime number 𝑝 define the field 𝐺𝐹(𝑝).
• The polynomials over a field 𝐾 form the ring 𝐾[𝑥].
• The field 𝐺𝐹(𝑝^𝑛) with 𝑝^𝑛 elements is an extension field of 𝐺𝐹(𝑝). It can be
defined as the quotient of the polynomial ring over 𝐺𝐹(𝑝) modulo an irreducible
polynomial of degree 𝑛.
• 𝐺𝐹(𝑝^𝑛) is the splitting field of the polynomial 𝑥^{𝑝^𝑛} − 𝑥 over 𝐺𝐹(𝑝).
• Linear maps between finite-dimensional vector spaces over an arbitrary field
can be described by matrices.
• Affine maps are defined by a linear map plus a constant translation.
• Keyed function or permutation families of linear or affine maps cannot be pseudorandom.
9. Determine the decompositions of ℤ_12^* and ℤ_23^* as a product of additive cyclic groups.
10. Which residue classes are generators of the additive group ℤ𝑛 ?
11. Let 𝑛 = 247 = 𝑝𝑞. Find the factors 𝑝 and 𝑞 and solve the simultaneous congru-
ences 𝑘 = 7 mod 𝑝 and 𝑘 = 2 mod 𝑞 using the Chinese Remainder Theorem.
12. Let 𝑅1 and 𝑅2 be rings. Why is the product ring 𝑅1 × 𝑅2 never a field, even if 𝑅1
and 𝑅2 are fields?
Tip: Consider the idempotent elements (1, 0) and (0, 1).
13. Let 𝑓, 𝑔 ∈ 𝐾[𝑥]. Show the product rule
𝐷(𝑓 ⋅ 𝑔) = 𝐷(𝑓) ⋅ 𝑔 + 𝑓 ⋅ 𝐷(𝑔).
Tip: Use the linearity of the derivative 𝐷 to reduce to the case 𝑓 = 𝑥^𝑛 and 𝑔 = 𝑥^𝑚.
14. Determine the number of elements of the following residue class rings. Which of
the rings are fields?
(a) 𝐺𝐹(2)[𝑥]/(𝑥^4 + 𝑥^2 + 1),
(b) 𝐺𝐹(3)[𝑥]/(𝑥^2 + 1),
(c) 𝐺𝐹(2)[𝑥]/(𝑥^𝑛 − 1), where 𝑛 ∈ ℕ.
15. Let 𝐺𝐹(8) = 𝐺𝐹(2)[𝑥]/(𝑥^3 + 𝑥 + 1). Find representatives of 𝑥^3, 𝑥^4, 𝑥^5, 𝑥^6, 𝑥^7 in
𝐺𝐹(8) of degree less than 3.
16. Find an irreducible polynomial over 𝐺𝐹(2) of degree 6.
17. 𝐺𝐹(2^8) is the splitting field of 𝑓(𝑥) = 𝑥^256 − 𝑥. Use SageMath to factor 𝑓(𝑥) over
𝐺𝐹(2) and identify the irreducible factor 𝑔(𝑥) = 𝑥^8 + 𝑥^4 + 𝑥^3 + 𝑥 + 1 used to define
the AES field.
18. Find explicit descriptions of all subfields of 𝐺𝐹(256).
19. Define 𝐺𝐹(2^8) using 𝑔(𝑥) as above. Which polynomial 𝑓(𝑥) corresponds to the
byte 02 (hexadecimal notation)? Determine a polynomial ℎ(𝑥) which is inverse to
𝑓(𝑥) mod 𝑔(𝑥) and give its hexadecimal representation.
20. Consider the bit permutation 𝑓 ∶ 𝐺𝐹(2)^8 → 𝐺𝐹(2)^8 described by (3 1 8 2 5 4 6 7).
Determine the inverse bit permutation 𝑓^{−1} and the matrices which represent 𝑓
and 𝑓^{−1}.
21. Show that the following matrix 𝐴 is unitary and find the inverse matrix 𝐴^{−1}:
        ⎛ 1+𝑖  1−𝑖 ⎞
𝐴 = 1/2 ⎝ 1−𝑖  1+𝑖 ⎠.
22. Let 𝑓 be an affine map given by 𝑓(𝑥) = 𝐴𝑥 + 𝑏. Give a necessary and sufficient
condition for 𝑓 being bijective and a formula for 𝑓^{−1}.
23. Why is the following map 𝑓 ∶ 𝐺𝐹(2)^3 → 𝐺𝐹(2)^3 affine and invertible:
𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = (𝑥1 + 𝑥2 + 𝑥3 + 1, 𝑥1 + 𝑥2 , 𝑥2 + 𝑥3 + 1)?
Determine the matrix and the translation vector. Compute the inverse map 𝑓^{−1}.
24. Let 𝑓 ∶ 𝐺𝐹(2)^3 → 𝐺𝐹(2)^3 be a linear map with the following input vectors 𝑣𝑖 and
output vectors 𝑤𝑖 = 𝑓(𝑣𝑖 ). Determine the matrix which corresponds to 𝑓.
𝑣1 = (0, 1, 0)^𝑇, 𝑤1 = (1, 0, 0)^𝑇, 𝑣2 = (0, 0, 1)^𝑇, 𝑤2 = (1, 0, 1)^𝑇, 𝑣3 = (1, 1, 1)^𝑇, 𝑤3 = (1, 1, 1)^𝑇.
25. Let 𝑉 = 𝐺𝐹(2^8). How can you describe a) 𝐺𝐹(2^8)-linear maps and b) 𝐺𝐹(2)-linear
maps on 𝑉? How many different maps exist in case a) and in case b)?
26. Suppose all maps 𝐹𝑘 of a keyed function family 𝐹 are linear. How can an adversary
easily win the prf distinguishability experiment? This shows that 𝐹 is not a pseu-
dorandom function.
Tip: Choose an all-zero input.
Chapter 5
Block Ciphers
linear or affine mixing maps are also used. The required properties (in particular
bijectivity) can easily be checked and the computations are very efficient. Fur-
thermore, linear and affine maps can achieve diffusion if the map is appropriately
chosen: small input changes, say only one bit, affect a whole block and result in
large output changes. Note that diffusion is a necessary property of pseudoran-
dom permutations: the output should completely change, even if only a few input
bits are modified. Otherwise, an adversary could distinguish 𝐸𝑘 from a random permutation.
• S-Boxes are nonlinear, random looking maps which are applied in parallel to short
segments of a block. For a small number of input values, for example 8 bits with
2^8 = 256 values, the S-Box transformation can be defined explicitly by a table.
The S-Box needs to be carefully defined and must be highly nonlinear.
The combination of linear mixing maps and nonlinear S-Boxes, applied in several
rounds, can achieve confusion, which makes the relationship between the ciphertext
and the key complex and involved. Confusion makes it very hard to find the key or
the decryption function, even if many plaintext/ciphertext pairs are known to an adversary.
The properties of confusion and diffusion and their role in the construction of se-
cure ciphers were first described by Claude Shannon in 1949 [Sha49].
Confusion and diffusion can be achieved by a substitution-permutation network.
Such a network consists of a number of rounds in which a plaintext block is trans-
formed into a ciphertext block. Each round consists of the following operations (see
Figure 5.1):
(1) Add a round key to the data block. The round key is derived from the encryption
key and ensures that the transformation depends on the key.
(2) Split the block into smaller segments and apply a nonlinear S-Box (substitution)
to each of the segments.
(3) Apply a bit permutation or, more generally, a linear or affine mixing map to the
full data block.
Figure 5.1. Substitution-permutation network: add a round key 𝑘𝑖 , apply the S-Box 𝑆
and the permutation 𝑃 in 𝑛 rounds.
After 𝑟 rounds and a final permutation one obtains the ciphertext of the Feistel cipher:
𝐸𝑘 (𝐿0 , 𝑅0 ) = (𝑅𝑟 , 𝐿𝑟 ).
Now let (𝑅𝑟 , 𝐿𝑟 ) be a ciphertext block. Then define
(𝑅𝑖−1 , 𝐿𝑖−1 ) = (𝐿𝑖 , 𝑅𝑖 ⊕ 𝑓𝑘𝑖 (𝐿𝑖 )) for 𝑖 = 𝑟, 𝑟 − 1, … , 1.
Note that encryption and decryption use the same transformation. Applying 𝑟 rounds
of the Feistel network and a final permutation recovers the plaintext:
𝐷𝑘 (𝑅𝑟 , 𝐿𝑟 ) = (𝐿0 , 𝑅0 ).
The round function 𝑓 depends on a round key 𝑘𝑖 , and 𝑓𝑘𝑖 is usually defined by S-
boxes and bit permutations (or affine operations), similar to a substitution-permutation
network. However, the round function 𝑓 only operates on one half of a block and, due
to the Feistel network construction, 𝑓 does not have to be bijective. One can show that
Many block ciphers are based on Feistel networks, for example the former encryp-
tion standard DES, but also modern block ciphers such as Twofish.
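The Feistel formulas above can be tried out with a small Python sketch. It assumes the usual forward round 𝐿𝑖 = 𝑅𝑖−1 and 𝑅𝑖 = 𝐿𝑖−1 ⊕ 𝑓𝑘𝑖 (𝑅𝑖−1 ), which is consistent with the decryption rule given in the text. The round function (a truncated SHA-256 hash) and all function names are hypothetical choices for illustration and do not belong to any real cipher.

```python
import hashlib


def round_function(round_key: bytes, half: bytes) -> bytes:
    """Toy round function f_k (an assumption of this sketch).

    It hashes the round key together with the half block and truncates the
    digest. The map is not bijective, which a Feistel network does not need.
    """
    return hashlib.sha256(round_key + half).digest()[:len(half)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(left: bytes, right: bytes, round_keys):
    """Apply one Feistel round per key, then swap the halves: (R_r, L_r)."""
    for key in round_keys:
        left, right = right, xor_bytes(left, round_function(key, right))
    return right, left


def feistel_decrypt(c_left: bytes, c_right: bytes, round_keys):
    """Decryption is the same transformation with the round keys reversed."""
    return feistel_encrypt(c_left, c_right, list(reversed(round_keys)))


if __name__ == "__main__":
    keys = [bytes([i]) * 4 for i in range(1, 9)]   # eight toy round keys
    block = (b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd")
    cipher = feistel_encrypt(*block, keys)
    print(cipher, feistel_decrypt(*cipher, keys) == block)
```

The only difference between the two directions is the order of the round keys, which is why hardware and software implementations can share one code path.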
5.2. Advanced Encryption Standard
\[
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}.
\]
𝐺𝐹(2^8) = 𝐺𝐹(2)[𝑥]/(𝑥^8 + 𝑥^4 + 𝑥^3 + 𝑥 + 1)
(compare Example 4.77). The byte (𝑏7 … 𝑏1 𝑏0 ) corresponds to the residue class 𝑏7 𝑥^7 +
⋯ + 𝑏1 𝑥 + 𝑏0 mod (𝑥^8 + 𝑥^4 + 𝑥^3 + 𝑥 + 1).
Example 5.1. Let 𝑚 = 10 … 0 be a 128-bit input block. The byte 80 = 1000 0000
corresponds to 𝑥^7 mod 𝑥^8 + 𝑥^4 + 𝑥^3 + 𝑥 + 1. Obviously, the zero byte corresponds to
the zero polynomial. Thus, 𝑚 is represented by the following 4 × 4 matrix over 𝐺𝐹(2^8):
\[
\begin{pmatrix}
x^7 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}. \quad ♢
\]
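The identification of bytes with residue classes can be made concrete by a shift-and-add multiplication. The following Python sketch represents a polynomial by an integer whose bit 𝑖 is the coefficient of 𝑥^𝑖; the function name gf256_mul and the constant name are assumptions of this illustration.

```python
AES_MODULUS = 0x11B  # bit pattern of x^8 + x^4 + x^3 + x + 1


def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(2^8) = GF(2)[x]/(AES_MODULUS)."""
    result = 0
    while b:
        if b & 1:             # current coefficient of b is 1: add a
            result ^= a
        a <<= 1               # multiply a by x
        if a & 0x100:         # degree 8 reached: reduce modulo the polynomial
            a ^= AES_MODULUS
        b >>= 1
    return result


if __name__ == "__main__":
    # x^7 * x = x^8, which reduces to x^4 + x^3 + x + 1 = 0x1B
    print(hex(gf256_mul(0x80, 0x02)))
```

Addition in 𝐺𝐹(2^8) is the bitwise XOR of the two bytes, so no separate helper is needed for it.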
The AES encryption function 𝑓𝑘 takes the plaintext as input state and transforms
the state in successive rounds. Each round consists of several steps: the nonlinear sub-
stitution step (SubBytes), two linear mixing steps (ShiftRows,
MixColumns) and the affine AddRoundKey step. The final state is output. Each step
is invertible and the decryption function 𝑓𝑘^{−1} is given by composing the inverse steps
in reverse order.
The following pseudocode gives a high-level description of 𝑓𝑘 . The SubBytes,
ShiftRows, MixColumns operations and the KeyExpansion step are described below.
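Rijndael(State, CipherKey) {
  KeyExpansion(CipherKey, ExpandedKey); AddRoundKey(State, ExpandedKey[0]);
  for (i = 1; i < Nr; i++) { // Nr is either 10, 12 or 14
    // Round i
    SubBytes(State); ShiftRows(State); MixColumns(State); AddRoundKey(State, ExpandedKey[i]);
  }
  // Final Round (no MixColumns)
  SubBytes(State); ShiftRows(State); AddRoundKey(State, ExpandedKey[Nr]);
}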
First, we consider the S-Box SubBytes which is the only non-affine component of
AES. The S-Box function 𝑆𝑅𝐷 ∶ 𝐺𝐹(2^8) → 𝐺𝐹(2)^8 is applied to each byte of the state
Figure 5.3. The nonlinear S-Box operates on each byte of the state array individually.
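\[
S_{RD}(a) = \begin{cases} A a^{-1} + b & \text{for } a \neq 0, \\ b & \text{for } a = 0, \end{cases}
\]
where
\[
A = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}, \quad
b = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}.
\]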
Since 0 ∈ 𝐺𝐹(2^8) is not invertible, one extends the inversion by mapping 0 to 0. The
extended inversion map is a bijection on 𝐺𝐹(2^8) and can also be described by the
monomial map 𝑖(𝑎) = 𝑎^254. In fact, Euler’s Theorem can be applied to the multiplicative
group 𝐺𝐹(2^8)^∗ of units. This yields 𝑎^255 = 1 and hence 𝑎^{−1} = 𝑎^254 for all 𝑎 ≠ 0. The
composition of the inversion map 𝑖 and the affine transformation 𝑓(𝑎) = 𝐴𝑎 + 𝑏 can
be represented by a polynomial over 𝐺𝐹(2^8) (see [DR02]):
\[
\begin{aligned}
S_{RD}(a) = f(i(a)) ={}& \mathtt{05} \cdot a^{254} + \mathtt{09} \cdot a^{253} + \mathtt{F9} \cdot a^{251} + \mathtt{25} \cdot a^{247} + \mathtt{F4} \cdot a^{239} \\
&+ \mathtt{01} \cdot a^{223} + \mathtt{B5} \cdot a^{191} + \mathtt{8F} \cdot a^{127} + \mathtt{63}.
\end{aligned}
\]
This shows that 𝑆𝑅𝐷 has a complex algebraic expression over 𝐺𝐹(2^8). One can also
consider 𝑆𝑅𝐷 as an (8, 8)-vectorial Boolean function and compute the algebraic normal
form (see Section 1.1) of each of its components. The algebraic degree of the Boolean
functions is 7, which demonstrates a high algebraic complexity over 𝐺𝐹(2).
Note that implementations of AES do not use the algebraic definition of 𝑆𝑅𝐷 , but
a lookup table instead. This requires only 256 bytes of memory.
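The lookup table can be generated from the algebraic definition. The following Python sketch computes 𝑎^254 by repeated multiplication and then applies 𝐴 and 𝑏 as given above, with the most significant bit in the first vector component. The names gf256_mul, gf256_inv, affine and SBOX are assumptions of this illustration, not part of any library.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Extended inversion i(a) = a^254; maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


# Rows of the matrix A, written as bit masks (leftmost column = bit 7).
A_ROWS = [0b11111000, 0b01111100, 0b00111110, 0b00011111,
          0b10001111, 0b11000111, 0b11100011, 0b11110001]
B_CONST = 0x63  # the vector b = (0,1,1,0,0,0,1,1)^T read as a byte


def affine(v: int) -> int:
    """Compute A*v + b over GF(2)."""
    out = 0
    for r, row in enumerate(A_ROWS):
        parity = bin(row & v).count("1") & 1   # scalar product of row and v
        out |= parity << (7 - r)               # first row gives the top bit
    return out ^ B_CONST


SBOX = [affine(gf256_inv(a)) for a in range(256)]

if __name__ == "__main__":
    print(" ".join("{:02X}".format(s) for s in SBOX[:8]))
```

The first values can be compared with the table printed in Example 5.2.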
Example 5.2. We use SageMath, construct an AES object called sr and print out the
hexadecimal S-Box values:
sage: sr = mq.SR(10, 4, 4, 8, star=True, allow_zero_inversions=True, aes_mode=True)
sage: S = sr.sbox()
sage: for i in range(0, 256):
....:     print("{:02X}".format(S[i]), end=" ")
63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76 CA 82 C9 7D FA 59 47 F0
AD D4 A2 AF 9C A4 72 C0 B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15
04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75 09 83 2C 1A 1B 6E 5A A0
52 3B D6 B3 29 E3 2F 84 53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF
D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8 51 A3 40 8F 92 9D 38 F5
BC B6 DA 21 10 FF F3 D2 CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73
60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB E0 32 3A 0A 49 06 24 5C
C2 D3 AC 62 91 95 E4 79 E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08
BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A 70 3E B5 66 48 03 F6 0E
61 35 57 B9 86 C1 1D 9E E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF
8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16
One of the main design criteria of the S-Box is its nonlinearity. We have seen above
that 𝑆𝑅𝐷 is nonlinear and its algebraic degree is high. In addition, 𝑆𝑅𝐷 cannot be
approximated well by affine functions. One can show that any linear combination (XOR)
of input bits agrees with any nonzero linear combination of output bits of the S-Box
for at least 112 and at most 144 of the 256 input values. Note that the expected
number of matches for a random XOR combination is 128. The correlation between the
S-Box and all affine functions is therefore low, which protects the cipher against
linear cryptanalysis.
Another design aspect is the differential behavior of the S-Box. It can be shown
that for any fixed pair of input and output differences with nonzero input difference,
at most 4 out of 256 input values propagate the given differences. This protects the
cipher against differential cryptanalysis.
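Both bounds can be checked by exhaustive computation. The following Python sketch rebuilds the S-Box (using the well-known formulation of the affine map by byte rotations), counts the matches of all linear approximations with a fast Walsh-Hadamard transform and computes the largest entry of the difference distribution table. All function names are assumptions of this illustration.

```python
def gf256_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def aes_sbox():
    """S-Box as inversion a^254 followed by the affine map (rotation form)."""
    def rot(v, k):
        return ((v << k) | (v >> (8 - k))) & 0xFF
    sbox = []
    for a in range(256):
        inv = 1
        for _ in range(254):
            inv = gf256_mul(inv, a)
        sbox.append(inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63)
    return sbox


def linear_match_range(sbox):
    """Smallest and largest number of inputs x with <in_mask, x> = <out_mask, S(x)>."""
    lo, hi = 256, 0
    for out_mask in range(1, 256):
        # Signs (-1)^f(x) of the component function f(x) = <out_mask, S(x)>.
        w = [1 - 2 * (bin(out_mask & sbox[x]).count("1") & 1) for x in range(256)]
        step = 1
        while step < 256:                      # in-place Walsh-Hadamard transform
            for i in range(0, 256, 2 * step):
                for j in range(i, i + step):
                    u, v = w[j], w[j + step]
                    w[j], w[j + step] = u + v, u - v
            step *= 2
        for value in w:                        # value = matches - mismatches
            matches = (256 + value) // 2
            lo, hi = min(lo, matches), max(hi, matches)
    return lo, hi


def max_differential_count(sbox):
    """Largest number of inputs x with S(x) + S(x + dx) = dy, over all dx != 0."""
    best = 0
    for dx in range(1, 256):
        counts = [0] * 256
        for x in range(256):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        best = max(best, max(counts))
    return best


if __name__ == "__main__":
    S = aes_sbox()
    print(linear_match_range(S), max_differential_count(S))
```

The transform replaces the naive loop over all input masks; the counts it yields are the same, only the running time is much shorter.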
Next, we look at the diffusion layer, which is implemented by the linear mixing
maps ShiftRows and MixColumns. Both are 𝐺𝐹(2^8)-linear operations on the state matrix.
ShiftRows is a byte permutation and rotates the bytes in the second, third and fourth
row to the left. The first row is left unchanged, the bytes in the second row are rotated
by one position, bytes in the third row are rotated by two positions and bytes in the
fourth row are rotated by three positions (see Figure 5.4). Clearly, ShiftRows can be
inverted by a corresponding circular right shift.
Figure 5.4. ShiftRows rotates the bytes in the rows by zero, one, two and three posi-
tions, respectively.
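\[
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
\xrightarrow{\text{ShiftRows}}
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_5 & p_9 & p_{13} & p_1 \\
p_{10} & p_{14} & p_2 & p_6 \\
p_{15} & p_3 & p_7 & p_{11}
\end{pmatrix}.
\]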
MixColumns transforms the columns of the state matrix by a 𝐺𝐹(2^8)-linear map.
Each column vector of the state is multiplied from the left by a constant 4 × 4 matrix 𝑀
over 𝐺𝐹(2^8) (see Figure 5.5). The matrix is regular so that the operation can be inverted
(see Section 0.4 where the inverse matrix is computed). The MixColumns matrix was carefully
chosen to have good diffusion properties. If 𝑣 ∈ 𝐺𝐹(2^8)^4 is a nonzero column vector,
then the number of nonzero bytes of 𝑣 plus the number of nonzero bytes of 𝑀𝑣 is at
least 5. This can be shown using linear codes (see Example 15.19 (2)).
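\[
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
\xrightarrow{\text{MixColumns}}
\underbrace{\begin{pmatrix}
\mathtt{02} & \mathtt{03} & \mathtt{01} & \mathtt{01} \\
\mathtt{01} & \mathtt{02} & \mathtt{03} & \mathtt{01} \\
\mathtt{01} & \mathtt{01} & \mathtt{02} & \mathtt{03} \\
\mathtt{03} & \mathtt{01} & \mathtt{01} & \mathtt{02}
\end{pmatrix}}_{M}
\cdot
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}.
\]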
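The two diffusion steps can be sketched in a few lines of Python. The state is assumed to be stored as a list of 16 bytes in column order, so that the byte in row 𝑟 and column 𝑐 has index 𝑟 + 4𝑐, matching the numbering 𝑝0 , … , 𝑝15 used above. The function names are assumptions of this illustration.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def shift_rows(state):
    """Rotate row r of the state to the left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


M = [[2, 3, 1, 1],
     [1, 2, 3, 1],
     [1, 1, 2, 3],
     [3, 1, 1, 2]]


def mix_column(column):
    """Multiply the matrix M by one column vector over GF(2^8)."""
    out = []
    for row in M:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf256_mul(coeff, byte)   # addition in GF(2^8) is XOR
        out.append(acc)
    return out


def mix_columns(state):
    """Apply mix_column to each of the four columns of the state."""
    out = []
    for c in range(4):
        out.extend(mix_column(state[4 * c:4 * c + 4]))
    return out


def weight(column):
    """Number of nonzero bytes of a column."""
    return sum(1 for byte in column if byte)


if __name__ == "__main__":
    print(shift_rows(list(range(16))))
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

For a column with a single nonzero byte, weight(v) + weight(mix_column(v)) equals 5, which illustrates the diffusion bound stated above.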
In the AddRoundKey step, every bit of the state matrix is XORed with the round
key 𝑘𝑖 . The round keys have the same length as the state (128 bits) and are computed
in the KeyExpansion step, as explained below.
Figure 5.5. The MixColumns operation: each column is transformed by a fixed matrix 𝑀.
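\[
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
\xrightarrow{\text{AddRoundKey}}
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
\oplus k_i.
\]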
The operations were designed such that two Rijndael rounds (SubBytes,
ShiftRows, MixColumns, AddRoundKey) already provide sufficient diffusion. After
two rounds, every output bit depends on all input bits, and a change in one input bit
changes about half of all output bits.
Finally, we explain AES key scheduling. The main design criteria for the key ex-
pansion step were efficiency, symmetry elimination, diffusion of the key and nonlinearity.
The nonlinearity is intended to protect the cipher against related-key attacks (compare
Remark 2.44).
We begin with 128-bit keys (see Figure 5.6). In this case, the AES algorithm has
ten rounds, and eleven 128-bit round keys 𝑘0 , 𝑘1 , … , 𝑘10 are required. The subkeys are
stored in 44 words 𝑊0 , 𝑊1 , … , 𝑊43 ∈ 𝐺𝐹(2^8)^4 of length 32 bits. Let
𝑠ℎ ∶ 𝐺𝐹(2^8)^4 → 𝐺𝐹(2^8)^4
be the rotation by one byte position to the left, i.e., 𝑠ℎ(𝑝0 , 𝑝1 , 𝑝2 , 𝑝3 ) = (𝑝1 , 𝑝2 , 𝑝3 , 𝑝0 ).
We write 𝐒 for the function which applies the S-Box 𝑆𝑅𝐷 to all four components of
a vector in 𝐺𝐹(2^8)^4 . This function ensures that the key schedule is nonlinear. The
symmetry of 𝐒 is eliminated by round constants:
𝑅𝐶𝑗 = 𝑥^{𝑗−1} mod 𝑥^8 + 𝑥^4 + 𝑥^3 + 𝑥 + 1 ∈ 𝐺𝐹(2^8) for 𝑗 ≥ 1.
The 128-bit AES key 𝑘 defines the initial round key 𝑘 = 𝑘0 = 𝑊0 ‖𝑊1 ‖𝑊2 ‖𝑊3 . The next
round key 𝑘1 = 𝑊4 ‖𝑊5 ‖𝑊6 ‖𝑊7 is computed as follows:
\[
\begin{aligned}
W_4 &= W_0 \oplus \mathbf{S}(sh(W_3)) \oplus (RC_1, 0, 0, 0), \\
W_5 &= W_1 \oplus W_4, \\
W_6 &= W_2 \oplus W_5, \\
W_7 &= W_3 \oplus W_6.
\end{aligned}
\]
Figure 5.6. The first two rounds of 128-bit AES key scheduling. 𝑇 maps the word
𝑊_{4𝑖−1} to 𝐒(𝑠ℎ(𝑊_{4𝑖−1})) ⊕ (𝑅𝐶𝑖 , 0, 0, 0). This is basically a byte-wise SubBytes operation,
but involves an additional rotation and a translation by a round constant.
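The following round keys are constructed analogously (increment the index of 𝑅𝐶
by 1 and increase all other indices by 4). For 𝑖 = 2, … , 10 one defines:
\[
\begin{aligned}
W_{4i} &= W_{4i-4} \oplus \mathbf{S}(sh(W_{4i-1})) \oplus (RC_i, 0, 0, 0), \\
W_{4i+1} &= W_{4i-3} \oplus W_{4i}, \\
W_{4i+2} &= W_{4i-2} \oplus W_{4i+1}, \\
W_{4i+3} &= W_{4i-1} \oplus W_{4i+2}.
\end{aligned}
\]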
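A 256-bit AES key 𝑘 defines the first eight words 𝑊0 , 𝑊1 , … , 𝑊7 . The next eight
words 𝑊8 , 𝑊9 , … , 𝑊15 are computed as follows:
\[
\begin{aligned}
W_8 &= W_0 \oplus \mathbf{S}(sh(W_7)) \oplus (RC_1, 0, 0, 0), & W_{12} &= W_4 \oplus \mathbf{S}(W_{11}), \\
W_9 &= W_1 \oplus W_8, & W_{13} &= W_5 \oplus W_{12}, \\
W_{10} &= W_2 \oplus W_9, & W_{14} &= W_6 \oplus W_{13}, \\
W_{11} &= W_3 \oplus W_{10}, & W_{15} &= W_7 \oplus W_{14}.
\end{aligned}
\]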
The following eight words are defined analogously (increment the index of 𝑅𝐶 by
1 and increase all other indices by 8), until all 60 words have been defined. Again, the
first word of each round key is defined by a nonlinear operation, which in turn affects
all subsequent words.
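The key schedule for 128-bit and 256-bit keys can be sketched in Python as follows. The recurrence is the one of the AES standard (FIPS 197): every word is the XOR of the word one key length earlier and the (possibly transformed) preceding word. The names key_expansion and SBOX, and the compact S-Box construction by byte rotations, are assumptions of this illustration.

```python
def gf256_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def _sbox():
    def rot(v, k):
        return ((v << k) | (v >> (8 - k))) & 0xFF
    table = []
    for a in range(256):
        inv = 1
        for _ in range(254):
            inv = gf256_mul(inv, a)              # a^254, with 0 mapped to 0
        table.append(inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63)
    return table


SBOX = _sbox()


def key_expansion(key: bytes):
    """Return the words W_0, W_1, ... (each a list of 4 bytes)."""
    if len(key) not in (16, 32):
        raise ValueError("this sketch covers 128-bit and 256-bit keys only")
    nk = len(key) // 4                  # number of key words: 4 or 8
    rounds = nk + 6                     # 10 or 14 rounds
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rc = 1                              # RC_1 = x^0
    for i in range(nk, 4 * (rounds + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]              # rotation sh
            temp = [SBOX[b] for b in temp]          # byte-wise S-Box
            temp[0] ^= rc                           # add (RC_j, 0, 0, 0)
            rc = gf256_mul(rc, 2)                   # RC_{j+1} = x * RC_j
        elif nk == 8 and i % nk == 4:
            temp = [SBOX[b] for b in temp]          # extra S-Box step for 256-bit keys
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return words


if __name__ == "__main__":
    for word in key_expansion(bytes(16))[:8]:
        print(bytes(word).hex())
```

Round key 𝑘𝑖 is the concatenation of the words with indices 4𝑖, … , 4𝑖 + 3 for both key lengths.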
5.3. Summary
1. Consider Feistel ciphers. Use the formulas in Section 5.1 to show that 𝐷𝑘 recovers
the plaintext.
2. Verify the following inverses in 𝐺𝐹(2^8) (in hexadecimal notation):
01^{−1} = 01, 02^{−1} = 8D, 03^{−1} = F6.
Then compute 𝑆𝑅𝐷 (00), 𝑆𝑅𝐷 (01), 𝑆𝑅𝐷 (02) and 𝑆𝑅𝐷 (03).
3. Let 𝑛 ∈ ℕ. Show that 𝑓(𝑥) = 𝑥^(2^𝑛) is a 𝐺𝐹(2)-linear map on 𝐺𝐹(2^8), whereas
𝑓(𝑥) = 𝑥^254 is not linear.
4. Describe the inverse S-Box 𝑆𝑅𝐷^{−1}.
5. How can the multiplication of 8-bit strings by 01, 02 and 03 be efficiently imple-
mented? What is an advantage of the MixColumns matrix? What can be said about
its inverse matrix?
6. Show that the MixColumns matrix and all its square submatrices are nonsingular over 𝐺𝐹(2^8).
One can show that this ensures good diffusion properties (see Example 15.19 (2)).
7. Give a high-level (pseudocode) description of the AES decryption function 𝑓𝑘^{−1}.
8. Suppose a 128-bit AES key 𝑘 and a plaintext 𝑚 are given:
𝑘 = 01 00 00 00 00 00 00 00 00, 𝑚 = 80 00 00 00 00 00 00 00 00.
(a) Find the round keys 𝑘0 and 𝑘1 .
(b) Use SageMath to compute all round keys and encrypt the input block 𝑚.
Tip: The following SageMath function may be used:
sage: sr = mq.SR(10, 4, 4, 8, star=True,
....:            allow_zero_inversions=True, aes_mode=True)
sage: def aesenc(p, k):
....:     p = p + k  # Add k=key0
....:     print(sr.hex_str_vector(k))
....:     # Rounds 1-9
....:     for i in range(1, 10):
....:         p = sr.sub_bytes(p)
....:         p = sr.shift_rows(p)
....:         p = sr.mix_columns(p)
....:         k = sr.key_schedule(k, i)
....:         p = p + k  # Add round key i
....:         print(sr.hex_str_vector(k))
....:     # Round 10
....:     p = sr.sub_bytes(p)
....:     p = sr.shift_rows(p)
....:     k = sr.key_schedule(k, 10)
....:     p = p + k  # Add final round key
....:     print(sr.hex_str_vector(k))
....:     print("Output " + sr.hex_str_vector(p))
....:     return p
Define 𝐾 = 𝐺𝐹(2^8) and initialize 4 × 4 matrices M and Key. Only the upper
left entries of these matrices are nonzero in this exercise.
sage: K.<a> = GF(2^8, name='a', modulus=x^8+x^4+x^3+x+1)
sage: M = sr.state_array(); M[0,0] = a^7
sage: Key = sr.state_array(); Key[0,0] = 1
9. Assume that a modified AES block cipher lacks all ShiftRows and MixColumns
operations. Can this cipher be a pseudorandom permutation? What if only one of
these operations is missing?
10. Suppose that the multiplicative inversion is omitted in the S-Box of a modified AES
block cipher. Can this cipher be a pseudorandom permutation?
11. What is more important for a cipher: the nonlinearity of encryption or the nonlin-
earity of the key schedule?
12. Suppose a 256-bit AES key 𝑘 is all-zero. Find the round keys 𝑘0 , 𝑘1 , 𝑘2 and 𝑘3 .
Chapter 6
Stream Ciphers
Symmetric ciphers can be divided into block ciphers and stream ciphers. Some opera-
tion modes turn block ciphers into stream ciphers, for example the counter mode, but
this chapter focuses on dedicated stream ciphers that are constructed as keystream gen-
erators. Stream ciphers are usually very fast, even on restricted hardware. They were
used a lot in the past, for example to protect network communication, but in many
cases have been replaced by the AES block cipher.
Section 6.1 deals with synchronous stream ciphers and self-synchronizing stream
ciphers and presents the block cipher modes OFB and CFB. We introduce two classical
stream ciphers in Sections 6.2 and 6.3, linear feedback shift registers (LFSRs) and the
RC4 cipher, and outline their vulnerabilities. In Section 6.4, we provide an example of
a new stream cipher family, Salsa20, and the related ChaCha family.
Stream ciphers are covered in most cryptography textbooks, for example [PP10]
and [KL15]. We also recommend the handbook [MvOV97]. Further details on the
design of new stream ciphers can be found in [RB08] and at the eSTREAM project
i.e., by bitwise addition of the plaintext and the keystream. Other encryption functions
which combine several plaintext and keystream bits to produce ciphertext are also
possible. Note the difference from block ciphers (Chapter 5), which process larger blocks of
plaintext (e.g., 128 bits). Stream ciphers process small plaintext blocks, e.g., only one
bit, and the keystream varies as the plaintext is processed. Two types of stream ciphers
can be distinguished. In synchronous stream ciphers, the keystream depends only on
the key and the internal state of the generator. Self-synchronizing stream ciphers, on
the other hand, use the previous ciphertext bits to generate the keystream. Below we
assume that encryption and decryption are given by binary addition.
Definition 6.1. A synchronous stream cipher is an encryption scheme defined by the
following spaces and polynomial-time algorithms:
• The plaintext space and the ciphertext space are ℳ = 𝒞 = {0, 1}^∗ .
• The key generation algorithm 𝐺𝑒𝑛(1^𝑛) takes 1^𝑛 as input and outputs a key 𝑘 ∈
{0, 1}^𝑛 as well as an initialization vector IV.
• The initialization algorithm Init(𝑘, 𝐼𝑉) takes 𝑘 and IV as input and outputs an
initial state 𝑠𝑡1 .
• The keystream generator 𝐺 = 𝐺(𝑘, 𝑠𝑡) takes 𝑘 and 𝑠𝑡 as input and recursively
computes 𝑙-bit output words 𝑦1 , 𝑦2 , … called a keystream. The next state function
𝑓(𝑘, 𝑠𝑡) takes 𝑘 and 𝑠𝑡 as input and updates the state 𝑠𝑡:
𝑦𝑖 = 𝐺(𝑘, 𝑠𝑡𝑖 ) and 𝑠𝑡_{𝑖+1} = 𝑓(𝑘, 𝑠𝑡𝑖 ) for 𝑖 ≥ 1.
• Encryption of a plaintext (𝑚1 , 𝑚2 , … ) and decryption of a ciphertext
(𝑐1 , 𝑐2 , … ) are defined by XORing each input word of length 𝑙 with the correspond-
ing keystream word (see Figure 6.1).
𝑐𝑖 = 𝑚𝑖 ⊕ 𝑦𝑖 and 𝑚𝑖 = 𝑐𝑖 ⊕ 𝑦𝑖 for 𝑖 ≥ 1.
If the last plaintext or ciphertext word is shorter than 𝑙 bits, then only the first
(most significant) bits of the keystream word are used. ♢
Figure 6.1. Keystream generation and encryption using a synchronous stream cipher.
The keystream of a synchronous stream cipher does not depend on the plaintext
or the ciphertext. The sender and receiver must be synchronized and use the same state
for the decryption to be successful.
Example 6.2. The Output Feedback (OFB) mode (see [Dwo01]) turns a block cipher
into a synchronous stream cipher. Let 𝐹 ∶ {0, 1}^𝑛 × {0, 1}^𝑙 → {0, 1}^𝑙 be a keyed family
of functions, for example a block cipher. A uniform random key 𝑘 ← {0, 1}^𝑛 and a
uniform initialization vector 𝐼𝑉 ← {0, 1}^𝑙 are chosen. The initial state is 𝑠𝑡1 = 𝑦0 = 𝐼𝑉,
and we recursively generate keystream words of length 𝑙 by applying 𝐹𝑘 to the state (see
Figure 6.2). The keystream is also used to update the state:
𝑦𝑖 = 𝐹𝑘 (𝑠𝑡𝑖 ) = 𝐹𝑘 (𝑦𝑖−1 ) and 𝑠𝑡_{𝑖+1} = 𝑦𝑖 for 𝑖 ≥ 1.
Figure 6.2. OFB mode encryption. The cipher recursively generates keystream words.
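The OFB recursion can be sketched in Python. Since the standard library does not contain a block cipher, the keyed function 𝐹 is replaced here by a truncated HMAC-SHA256, which is only a stand-in assumed for this illustration; the function names and the word length of 16 bytes are hypothetical choices as well.

```python
import hashlib
import hmac

BLOCK = 16  # word length l in bytes (assumption of this sketch)


def F(key: bytes, block: bytes) -> bytes:
    """Stand-in keyed function F_k; OFB does not require F_k to be invertible."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ofb_keystream(key: bytes, iv: bytes, nbytes: int) -> bytes:
    """Generate y_1, y_2, ... with y_i = F_k(y_{i-1}) and y_0 = IV."""
    stream = b""
    state = iv
    while len(stream) < nbytes:
        state = F(key, state)      # the keystream word is also the next state
        stream += state
    return stream[:nbytes]         # a short last word uses the first bytes only


def ofb_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encryption and decryption are the same XOR with the keystream."""
    keystream = ofb_keystream(key, iv, len(data))
    return bytes(d ^ y for d, y in zip(data, keystream))


if __name__ == "__main__":
    key, iv = b"k" * 16, bytes(16)
    ciphertext = ofb_xor(key, iv, b"synchronous stream cipher")
    print(ciphertext.hex())
    print(ofb_xor(key, iv, ciphertext))
```

Because the keystream does not depend on the message, it can be precomputed before the plaintext is available.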
Example 6.4. A block cipher in Cipher Feedback (CFB) mode (see [Dwo01]) gives rise
to a self-synchronizing stream cipher. Let 𝐹 ∶ {0, 1}^𝑛 × {0, 1}^𝑙 → {0, 1}^𝑙 be a keyed
family of functions, for example a block cipher. Choose a uniform key 𝑘 ← {0, 1}^𝑛 and
a uniform initialization vector 𝐼𝑉 ← {0, 1}^𝑙 . The initial state is 𝑠𝑡1 = 𝑐0 = 𝐼𝑉. Let 𝑚𝑖 be
the 𝑖-th plaintext word of length 𝑙. Define the keystream and the ciphertext words by
𝑦𝑖 = 𝐹𝑘 (𝑐𝑖−1 ) and 𝑐𝑖 = 𝑚𝑖 ⊕ 𝑦𝑖 for 𝑖 ≥ 1.
The keystream depends on the preceding ciphertext word (𝑡 = 1, see Figure 6.4). There
are also 𝑠-bit CFB modes where words of length 𝑠 ≤ 𝑙 are processed. ♢
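The following Python sketch illustrates full-block CFB and its self-synchronizing behavior. As a stand-in for 𝐹 it uses a truncated HMAC-SHA256 (an assumption of this illustration, since the standard library has no block cipher), and the function names are hypothetical. The messages are assumed to consist of full 16-byte words, except possibly the last one.

```python
import hashlib
import hmac

BLOCK = 16  # word length l in bytes (assumption of this sketch)


def F(key: bytes, block: bytes) -> bytes:
    """Stand-in keyed function F_k."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    previous, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        y = F(key, previous)                              # y_i = F_k(c_{i-1})
        c = bytes(m ^ k for m, k in zip(plaintext[i:i + BLOCK], y))
        out += c
        previous = c                                      # feed the ciphertext back
    return out


def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    previous, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        y = F(key, previous)
        block = ciphertext[i:i + BLOCK]
        out += bytes(c ^ k for c, k in zip(block, y))
        previous = block
    return out


if __name__ == "__main__":
    key, iv, msg = bytes(range(16)), bytes(16), bytes(range(64))
    damaged = bytearray(cfb_encrypt(key, iv, msg))
    damaged[16] ^= 1                                      # flip one bit of the second word
    recovered = cfb_decrypt(key, iv, bytes(damaged))
    # Compare word by word: only the damaged word and its successor are affected.
    print([recovered[i:i + 16] == msg[i:i + 16] for i in range(0, 64, 16)])
```

A transmission error therefore disturbs only the current and the following word, after which the receiver is synchronized again.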
For an IV-dependent stream cipher 𝐺(𝑘, 𝐼𝑉), the minimum requirement would
be the pseudorandomness of the keystream (as above) for random IVs, where the IV is
known to an adversary.
For a stronger security definition, we let the adversary choose an IV and the out-
put length. They are given either the associated keystream of the chosen length or a
random bit sequence of the same length. The adversary’s task is to distinguish between
the two cases.
The cipher is secure if 𝐺(𝑘, 𝐼𝑉) is a pseudorandom function (see Definitions 2.38
and 2.39). An IV-dependent stream cipher can in fact be viewed as a family of functions,
which is parametrized by a key and maps an IV to a keystream. Refer to [BG07] and
[Zen07] for a discussion of this topic.
Related-key attacks (see Remark 2.44) against stream ciphers are yet another security
issue that is not addressed here.
Definition 6.5. A linear feedback shift register (LFSR) of degree 𝑛 (or length 𝑛) is
defined by feedback coefficients 𝑐1 , 𝑐2 , … , 𝑐𝑛 ∈ 𝐺𝐹(2). The initial state is an 𝑛-bit
word 𝑠𝑡 = (𝑠𝑛−1 , … , 𝑠1 , 𝑠0 ) and new bits are generated by the recursion
\[
s_j = c_1 s_{j-1} + c_2 s_{j-2} + \dots + c_n s_{j-n} \bmod 2 \quad \text{for } j \geq n.
\]
At each iteration step (clock tick), the state 𝑠𝑡 is updated from (𝑠𝑗−1 , … , 𝑠𝑗−𝑛 ) to
(𝑠𝑗 , 𝑠𝑗−1 , … , 𝑠𝑗−𝑛+1 ), i.e., by shifting the register to the right. The rightmost bit 𝑠𝑗−𝑛
is output. The output of an LFSR is called a linear recurring sequence. ♢
Obviously, an LFSR first outputs the initial state 𝑠0 , 𝑠1 , … , 𝑠𝑛−1 and subsequently
the new feedback bits 𝑠𝑛 , 𝑠𝑛+1 , … . At each step, the state vector 𝑠𝑡 is updated to 𝐴 ⋅ 𝑠𝑡,
where 𝐴 is an 𝑛 × 𝑛 matrix over 𝐺𝐹(2) and 𝑠𝑡 is viewed as a column vector. The initial
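state is 𝑠𝑡 = (𝑠𝑛−1 , … , 𝑠1 , 𝑠0 )^𝑇 .
\[
A = \begin{pmatrix}
c_1 & c_2 & \cdots & c_{n-1} & c_n \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\quad \text{and} \quad
A \cdot \begin{pmatrix} s_{j-1} \\ s_{j-2} \\ s_{j-3} \\ \vdots \\ s_{j-n} \end{pmatrix}
= \begin{pmatrix} s_j \\ s_{j-1} \\ s_{j-2} \\ \vdots \\ s_{j-n+1} \end{pmatrix}
\quad \text{for } j \geq n.
\]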
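The update rule can be turned directly into a few lines of Python. The sketch below follows the conventions of Definition 6.5 (state written as (𝑠𝑛−1 , … , 𝑠0 ), shift to the right, rightmost bit output); the function name lfsr_output is an assumption of this illustration. The parameters in the demonstration are those of Example 6.9.

```python
def lfsr_output(coeffs, state, count):
    """Return the first `count` output bits of an LFSR.

    coeffs = [c_1, ..., c_n] are the feedback coefficients and
    state  = [s_{n-1}, ..., s_1, s_0] is the initial state.
    """
    state = list(state)
    output = []
    for _ in range(count):
        output.append(state[-1])             # the rightmost bit is output
        feedback = 0
        for c, s in zip(coeffs, state):      # c_1*s_{j-1} + ... + c_n*s_{j-n} mod 2
            feedback ^= c & s
        state = [feedback] + state[:-1]      # shift the register to the right
    return output


if __name__ == "__main__":
    print(lfsr_output([1, 0, 0, 1], [1, 1, 0, 1], 20))
```

Multiplying the state by the matrix 𝐴 and calling the loop body once are the same operation.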
Remark 6.6. The literature has not adopted a unique notation of shift registers and
their parameters. LFSRs can also be shifted to the left so that the leftmost bit is output.
In this case, the state vector (from left to right) has increasing indices. The initial state is
𝑠𝑡 = (𝑠0 , 𝑠1 , … , 𝑠𝑛−1 ), and the recursion formula as well as the above transition matrix
look slightly different. We adopt the notation used by [MvOV97]. ♢
It is easy to see that every linear recurring sequence must ultimately be periodic:
the feedback coefficients are fixed and the output depends only on the state vector. The
state is a binary word of length 𝑛, and so there are 2^𝑛 possible states.
Definition 6.7. Let 𝑠0 , 𝑠1 , … be a linear recurring sequence. The (least) period of the
sequence is the smallest integer 𝑁 ≥ 1 such that
𝑠𝑗+𝑁 = 𝑠𝑗
for all sufficiently large values of 𝑗.
Proof. Consider the sequence of state vectors. If the all-zero state occurs, then the
following output is constantly 0 and the period is 1. Otherwise, all state vectors are
nonzero. There are 2^𝑛 − 1 nonzero states, and so the period is bounded by this number.
Example 6.9. Consider an LFSR of degree 4 with feedback coefficients 𝑐1 = 1, 𝑐2 = 0,
𝑐3 = 0 and 𝑐4 = 1. Suppose the initial state is 𝑠𝑡 = (1, 1, 0, 1) (see Figure 6.5). The state
is shifted to the right and a new bit is generated by XORing the first and fourth bit of
the state. The updated state is 𝑠𝑡 = (0, 1, 1, 0). We continue in this fashion and check
that the LFSR assumes all 15 nonzero states. The following output bits are generated:
1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0 (in this order). The sequence recurs after 15 output
bits.
SageMath can also compute the output of LFSRs. The key array contains the feed-
back coefficients and the fill array the initial state (in reverse order, so that the left-
most bit is the first output bit of the generator). We generate 20 bits and observe that
the output repeats after 15 bits.
sage: o = GF(2)(0); l = GF(2)(1)
sage: key = [l,o,o,l]; fill = [l,o,l,l]
sage: s = lfsr_sequence(key, fill, 20); s
[1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
In the above Example 6.9, the period of the sequence is maximal. In general, the
period depends on the initial state and the parameters of an LFSR.
Definition 6.10. Let 𝑐1 , 𝑐2 , … , 𝑐𝑛 be the feedback coefficients of an LFSR of degree 𝑛.
Then 𝑐(𝑥) = 1 + 𝑐1 𝑥 + 𝑐2 𝑥^2 + ⋯ + 𝑐𝑛 𝑥^𝑛 ∈ 𝐺𝐹(2)[𝑥] is called the connection polynomial or
feedback polynomial of the LFSR.
where 𝐺𝐹(2)[𝐴] is the commutative subring of matrices over 𝐺𝐹(2) that can be written
as a sum 𝑎0 𝐼𝑛 + 𝑎1 𝐴 + 𝑎2 𝐴^2 + ⋯ + 𝑎𝑚 𝐴^𝑚 with 𝑚 ∈ ℕ and 𝑎0 , 𝑎1 , … , 𝑎𝑚 ∈ 𝐺𝐹(2).
Proof. We leave it to the reader to prove (1). If 𝐴 is nonsingular, then 𝐴^{−1} exists and
\[
A^{-1} \cdot \begin{pmatrix} s_j \\ s_{j-1} \\ s_{j-2} \\ \vdots \\ s_{j-n+1} \end{pmatrix}
= \begin{pmatrix} s_{j-1} \\ s_{j-2} \\ s_{j-3} \\ \vdots \\ s_{j-n} \end{pmatrix}.
\]
This shows (2). Since any LFSR is ultimately periodic and nonsingular LFSRs can run
in the reverse direction, all output bits of such LFSRs are periodic which proves (3).
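We now turn to part (4). Since 𝑐(𝑥) = 1 + 𝑐1 𝑥 + 𝑐2 𝑥^2 + ⋯ + 𝑐𝑛 𝑥^𝑛 we have
\[
x^n c\left(\frac{1}{x}\right) = x^n + c_1 x^{n-1} + \dots + c_{n-1} x + c_n .
\]
We prove by induction that this gives the characteristic polynomial of 𝐴. If 𝑛 = 1 then
𝐴 = (𝑐1 ), so that 𝑐1 − 𝑥 = 𝑥 + 𝑐1 is the characteristic polynomial of 𝐴 over 𝐺𝐹(2). Using
the hypothesis for LFSRs of degree 𝑛 − 1, we compute the characteristic polynomial
𝑝(𝑥) of 𝐴:
\[
\begin{aligned}
\det(A - x I_n) &= \begin{vmatrix}
c_1 + x & c_2 & \cdots & c_{n-1} & c_n \\
1 & x & \cdots & 0 & 0 \\
0 & 1 & \ddots & 0 & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & x
\end{vmatrix} && \text{(expansion along the last column)} \\
&= c_n \begin{vmatrix}
1 & x & \cdots & 0 \\
0 & 1 & \ddots & \vdots \\
\vdots & & \ddots & x \\
0 & 0 & \cdots & 1
\end{vmatrix}
+ x \begin{vmatrix}
c_1 + x & c_2 & \cdots & c_{n-1} \\
1 & x & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & x
\end{vmatrix} && \text{(use hypothesis)} \\
&= c_n + x \cdot (x^{n-1} + c_1 x^{n-2} + \dots + c_{n-2} x + c_{n-1}) \\
&= x^n + c_1 x^{n-1} + \dots + c_{n-1} x + c_n .
\end{aligned}
\]
This shows (4). Now we derive (5) from (4):
\[
c(x) = c\left(\frac{1}{1/x}\right) = x^n p\left(\frac{1}{x}\right) = x^n \det\left(A - \frac{1}{x} I_n\right) = \det(xA - I_n).
\]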
It remains to prove (6). In general, the minimal polynomial of a matrix divides the
characteristic polynomial (Cayley-Hamilton theorem). Now one can easily see from
the definition of 𝐴 that the first unit vector 𝑒1 is a cyclic vector of 𝐴: the vectors 𝑒1 ,
𝐴 ⋅ 𝑒1 , … , 𝐴^{𝑛−1} 𝑒1 span 𝐺𝐹(2)^𝑛 . If the minimal polynomial is of degree less than 𝑛,
then 𝐴^{𝑛−1} is a linear combination of 𝐼𝑛 , 𝐴, … , 𝐴^{𝑛−2}, and there can be no cyclic vector.
This shows that the minimal polynomial is of degree 𝑛 and equals the characteristic
polynomial 𝑝(𝑥). The surjective ring homomorphism 𝐺𝐹(2)[𝑥] → 𝐺𝐹(2)[𝐴] maps a
polynomial 𝑓(𝑥) to 𝑓(𝐴). By definition of the minimal polynomial, 𝑓(𝐴) is the zero
matrix if and only if 𝑓(𝑥) is a multiple of 𝑝(𝑥). This completes the proof. □
In the following, we assume that 𝑐𝑛 = 1 so that the LFSR and the matrix 𝐴 are
nonsingular. This assumption is reasonable, since one could otherwise omit the last
register bit and obtain an LFSR of lower degree that generates essentially the same
sequence.
The following Proposition relates the period of a linear recurring sequence to the
order of the associated matrix.
Proposition 6.12. Let 𝐴 be the matrix associated to a nonsingular LFSR of degree 𝑛 with
characteristic polynomial 𝑝(𝑥) and let ord(𝐴) be the order of 𝐴 in the multiplicative group
of invertible matrices over 𝐺𝐹(2), i.e., the smallest exponent 𝑁 ≥ 1 such that 𝐴^𝑁 = 𝐼𝑛 .
(1) The period of any output sequence divides ord(𝐴).
(2) If the initial state is 𝑠𝑡 = (1, 0, … , 0)^𝑇 , then the period of the associated output
sequence is equal to ord(𝐴).
(3) If 𝑝(𝑥) is irreducible, then the period of any nonzero sequence equals ord(𝐴).
Proof. Let 𝑠𝑡 be an initial state (viewed as a column vector). Then the sequence of
subsequent states is
𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, 𝐴^2 𝑠𝑡, 𝐴^3 𝑠𝑡, … .
Let 𝑚 be the least period of that sequence. Since 𝐴^{ord(𝐴)} = 𝐼𝑛 and hence 𝐴^{ord(𝐴)} 𝑠𝑡
= 𝑠𝑡, we see that 𝑚 ∣ ord(𝐴), which proves (1). If 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴^{𝑛−1} 𝑠𝑡 form a basis
of 𝐺𝐹(2)^𝑛 , then the period of the output sequence and the order of 𝐴 in the group of
invertible matrices coincide. For 𝑠𝑡 = (1, 0, … , 0)^𝑇 , one can easily check that the vectors
𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴^{𝑛−1} 𝑠𝑡 are linearly independent, and (2) is proved. If 𝑝(𝑥) is irreducible,
then Propositions 4.67 and 6.11 (6) imply an isomorphism of fields
𝐺𝐹(2)[𝑥]/(𝑝(𝑥)) ≅ 𝐺𝐹(2)[𝐴].
Therefore, any non-trivial linear combination of the matrices 𝐼𝑛 , 𝐴, … , 𝐴^{𝑛−1} is invertible,
which shows that the vectors 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴^{𝑛−1} 𝑠𝑡 are linearly independent for any
nonzero state 𝑠𝑡. This proves (3). □
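Proposition 6.12 can be checked numerically for small registers. The following Python sketch builds the matrix 𝐴 from the feedback coefficients, computes ord(𝐴) by repeated multiplication and compares it with the periods of the state sequences. All function names are assumptions of this illustration, and the code presupposes a nonsingular LFSR (𝑐𝑛 = 1), since the loops would not terminate otherwise.

```python
from itertools import product


def companion_matrix(coeffs):
    """Matrix A of an LFSR with feedback coefficients c_1, ..., c_n."""
    n = len(coeffs)
    rows = [list(coeffs)]
    for i in range(n - 1):
        rows.append([1 if j == i else 0 for j in range(n)])
    return rows


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] & Y[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    return tuple(sum(a & x for a, x in zip(row, v)) & 1 for row in A)


def matrix_order(A):
    """Smallest N >= 1 with A^N = I (requires A to be nonsingular)."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    power, order = A, 1
    while power != identity:
        power = mat_mul(power, A)
        order += 1
    return order


def state_period(A, state):
    """Smallest m >= 1 with A^m * state = state."""
    current, period = mat_vec(A, state), 1
    while current != state:
        current = mat_vec(A, current)
        period += 1
    return period


if __name__ == "__main__":
    for coeffs in ([1, 0, 0, 1], [0, 1, 0, 1]):
        A = companion_matrix(coeffs)
        periods = {state_period(A, s) for s in product((0, 1), repeat=4) if any(s)}
        print(coeffs, matrix_order(A), sorted(periods))
```

The first register is the one of Example 6.9. For the second one the characteristic polynomial 𝑥^4 + 𝑥^2 + 1 = (𝑥^2 + 𝑥 + 1)^2 is reducible, so shorter periods that divide ord(𝐴) can occur.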
The above Proposition shows that the maximal period of a nonsingular LFSR with
matrix 𝐴 is ord(𝐴). Under which conditions is the period equal to the maximal value
2^𝑛 − 1?
It is not difficult to see that the order of 𝑓 is well defined: consider the quotient
ring 𝐺𝐹(𝑝)[𝑥]/(𝑓(𝑥)) and its group of units 𝑈 = (𝐺𝐹(𝑝)[𝑥]/(𝑓(𝑥)))^∗ , which has at
most 𝑝^𝑛 − 1 elements. If 𝑓 is irreducible then ord(𝑈) = 𝑝^𝑛 − 1. Since 𝑓(0) ≠ 0, 𝑥 is
invertible modulo 𝑓(𝑥) and Euler’s Theorem 4.15 implies that
𝑥^{ord(𝑈)} ≡ 1 mod 𝑓(𝑥).
In other words, 𝑓(𝑥) divides 𝑥^{ord(𝑈)} − 1. In fact, we have
ord(𝑓) = ord(𝑥) ∣ ord(𝑈),
where ord(𝑥) is the order of 𝑥 in 𝑈.
Proof. Let 𝐴 be the matrix associated to the LFSR and suppose 𝑝(𝑥) is primitive. Prop-
osition 6.11 (6) shows that
(𝐺𝐹(2)[𝑥]/(𝑝(𝑥)))^∗ ≅ 𝐺𝐹(2)[𝐴]^∗ ,
and hence ord(𝑥) = ord(𝐴) = ord(𝑝(𝑥)) = 2^𝑛 − 1. It remains to prove that the period
of all nonzero sequences is maximal. But this follows from Proposition 6.12 (3). □
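The order of a polynomial over 𝐺𝐹(2) can be computed by multiplying by 𝑥 modulo 𝑓(𝑥) until 1 is reached. In the following Python sketch a polynomial is an integer whose bit 𝑖 is the coefficient of 𝑥^𝑖; the function names are assumptions of this illustration. A polynomial of degree 𝑛 is reported as primitive if the order equals 2^𝑛 − 1, which forces the quotient ring to be a field.

```python
def poly_mulmod(a: int, b: int, f: int) -> int:
    """Product of a and b in GF(2)[x]/(f); a must already be reduced."""
    degree = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1                      # multiply a by x
        if (a >> degree) & 1:        # reduce if the degree of f is reached
            a ^= f
    return result


def poly_order(f: int) -> int:
    """Order of x in the unit group of GF(2)[x]/(f), assuming f(0) != 0."""
    if f & 1 == 0:
        raise ValueError("x is not invertible modulo f")
    x = poly_mulmod(1, 0b10, f)      # residue class of x
    power, order = x, 1
    while power != 1:
        power = poly_mulmod(power, x, f)
        order += 1
    return order


def is_primitive(f: int) -> bool:
    degree = f.bit_length() - 1
    return poly_order(f) == 2 ** degree - 1


if __name__ == "__main__":
    for f in (0b11001, 0b11111, 0b10101):
        print(bin(f), poly_order(f), is_primitive(f))
```

The first polynomial, 𝑥^4 + 𝑥^3 + 1, is the characteristic polynomial of the LFSR in Example 6.9. The second one, 𝑥^4 + 𝑥^3 + 𝑥^2 + 𝑥 + 1, is irreducible but not primitive.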
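Proof. Let 𝐴 be the unknown matrix associated to the LFSR. We reconstruct 𝑛 state
vectors of the LFSR from the output bits:
\[
st = \begin{pmatrix} y_n \\ \vdots \\ y_1 \end{pmatrix}, \quad
A \cdot st = \begin{pmatrix} y_{n+1} \\ \vdots \\ y_2 \end{pmatrix}, \quad \dots, \quad
A^{n-1} st = \begin{pmatrix} y_{2n-1} \\ \vdots \\ y_n \end{pmatrix}.
\]
Let 𝑥 denote the vector of unknown feedback coefficients. Then
\[
\begin{aligned}
y_{n+1} &= x \cdot st, \\
y_{n+2} &= x \cdot (A \cdot st), \\
&\;\;\vdots \\
y_{2n} &= x \cdot (A^{n-1} st).
\end{aligned}
\]
We obtain a linear system of equations 𝑦 = 𝑀𝑥, where the rows of 𝑀 are formed
by the vectors 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴^{𝑛−1} 𝑠𝑡. Since 𝑝(𝑥) is irreducible, we obtain as in the proof
of Proposition 6.12 (3) that 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴^{𝑛−1} 𝑠𝑡 are linearly independent. Therefore,
the linear system of equations 𝑦 = 𝑀𝑥 has a unique solution, the vector of feedback
coefficients. □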
Example 6.17. Suppose the following output bits of an LFSR of degree 4 are known
(in this order):
0, 1, 1, 1, 1, 0, 1, 0.
We want to reconstruct the feedback coefficients and a state. The first four bits (in
reverse order) give the state vector:
𝑠𝑡 = (1, 1, 1, 0)^𝑇 .
The subsequent states are 𝐴 ⋅ 𝑠𝑡 = (1, 1, 1, 1)^𝑇 , 𝐴^2 𝑠𝑡 = (0, 1, 1, 1)^𝑇 and 𝐴^3 𝑠𝑡 = (1, 0, 1, 1)^𝑇 .
This yields four linear equations in the unknown feedback coefficients 𝑥1 , 𝑥2 , 𝑥3 and
𝑥4 . The left side of the equations is given by the last four output bits.
1 = 1𝑥1 + 1𝑥2 + 1𝑥3 + 0𝑥4 mod 2,
0 = 1𝑥1 + 1𝑥2 + 1𝑥3 + 1𝑥4 mod 2,
1 = 0𝑥1 + 1𝑥2 + 1𝑥3 + 1𝑥4 mod 2,
0 = 1𝑥1 + 0𝑥2 + 1𝑥3 + 1𝑥4 mod 2.
This 4 × 4 system of linear equations over 𝐺𝐹(2) is regular, and its unique solution is
given by the feedback coefficients 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0 and 𝑥4 = 1. In fact, we have taken
eight output bits from the LFSR in Example 6.9.
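The attack of Example 6.17 can be automated with Gaussian elimination over 𝐺𝐹(2). The following Python sketch builds the system 𝑦 = 𝑀𝑥 from 2𝑛 consecutive output bits and solves it; the function names solve_gf2 and recover_feedback are assumptions of this illustration.

```python
def solve_gf2(M, y):
    """Solve M*x = y over GF(2) by Gauss-Jordan elimination (M must be regular)."""
    n = len(M)
    rows = [list(M[i]) + [y[i]] for i in range(n)]      # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("the matrix is singular over GF(2)")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def recover_feedback(bits, n):
    """Recover c_1, ..., c_n from 2n consecutive output bits y_1, ..., y_2n."""
    if len(bits) < 2 * n:
        raise ValueError("2n output bits are required")
    # Row i is the state (y_{n+i}, ..., y_{1+i}); the right side is y_{n+1+i}.
    M = [[bits[i + n - 1 - k] for k in range(n)] for i in range(n)]
    y = [bits[n + i] for i in range(n)]
    return solve_gf2(M, y)


if __name__ == "__main__":
    print(recover_feedback([0, 1, 1, 1, 1, 0, 1, 0], 4))
```

With the recovered coefficients and any reconstructed state, an adversary can compute the complete keystream.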
Remark 6.18. The Berlekamp-Massey algorithm (see [MvOV97]) finds the shortest
LFSR that generates a given finite sequence. The degree of the shortest LFSR is called
the linear complexity of a sequence. Suppose the characteristic polynomial 𝑝(𝑥) of a
nonsingular LFSR is irreducible and deg(𝑝(𝑥)) = 𝑛. Then each nonzero state pro-
duces an output sequence of period ord(𝑝(𝑥)) and linear complexity 𝑛. With 2𝑛 given
output bits, the Berlekamp-Massey algorithm can compute the feedback coefficients
more efficiently than solving a system of linear equations. ♢
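A compact version of the Berlekamp-Massey algorithm over 𝐺𝐹(2) is sketched below in Python. It returns the linear complexity 𝐿 and the coefficients of a connection polynomial 1 + 𝑐1 𝑥 + ⋯ + 𝑐𝐿 𝑥^𝐿 that generates the given bits. The function name and the list representation are assumptions of this illustration; see [MvOV97] for the algorithm itself.

```python
def berlekamp_massey(bits):
    """Return (L, [1, c_1, ..., c_L]) for the shortest LFSR generating `bits`."""
    n = len(bits)
    c = [1] + [0] * n          # current connection polynomial
    b = [1] + [0] * n          # polynomial before the last length change
    length, m = 0, -1          # linear complexity, position of the last length change
    for pos in range(n):
        # Discrepancy between bits[pos] and the prediction of the current LFSR.
        d = bits[pos]
        for i in range(1, length + 1):
            d ^= c[i] & bits[pos - i]
        if d:
            previous = c[:]
            shift = pos - m
            for i in range(n + 1 - shift):
                c[i + shift] ^= b[i]           # c(x) <- c(x) + x^shift * b(x)
            if 2 * length <= pos:
                length = pos + 1 - length
                m = pos
                b = previous
    return length, c[:length + 1]


if __name__ == "__main__":
    print(berlekamp_massey([1, 0, 1, 1, 0, 0, 1, 0]))
```

Applied to the first eight output bits of Example 6.9, the sketch recovers an LFSR of length 4 with 𝑐1 = 𝑐4 = 1.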
We have seen that LFSRs are very efficient and can have a large period, but make
weak stream ciphers. The problem is, of course, the linear structure of LFSRs.
One possible approach is to use filter generators. A nonlinear function 𝑓 is applied
to the entire state of an LFSR and defines the keystream:
𝑦𝑗 = 𝑓(𝑠𝑗−1 , 𝑠𝑗−2 , … , 𝑠𝑗−𝑛 ).
The multiplications of state bits (AND) can be used along with additions (XOR).
Furthermore, one can use combination generators, which combine several LFSRs.
A linear or nonlinear Boolean function takes the output bits of each register as input
and combines them into a single keystream bit.
Example 6.19. The stream cipher Trivium [DCP08], which belongs to the portfolio
of the eSTREAM project, combines three shift registers of degree 93, 84 and 111, re-
spectively. The output of each register is defined by a nonlinear filter function, and
the input is the XOR-sum of one feedback bit and the output of another register. The
keystream at each clock tick is the XOR-sum of the output bits of the three registers. ♢
Yet another approach is to use irregular clocking: several LFSRs are combined and
a nonlinear function determines whether or not a register is clocked (shifted to the
right). If a register is not clocked, then the previous bit is output again.
Figure 6.6. The A5/1 cipher combines three LFSRs and uses irregular clocking.
Example 6.20. The A5/1 cipher that is used in GSM mobile networks (2G) combines
three LFSRs of degree 19, 22 and 23 (see Table 6.1 and Figure 6.6). In each register, one
clocking bit is fixed and a register is clocked if its clocking bit agrees with the majority
of the three clocking bits. Therefore, either all three LFSRs or two of the LFSRs are
clocked. The probability that a register is clocked is 3/4 (see Exercise 5).
Table 6.1. Feedback polynomials and clocking bit of the A5/1 cipher.
Initially, all registers are set to zero. A 64-bit ciphering key (where only 54 bits are
secret) and a 22-bit frame number are mixed in. Then the irregular clocking starts: the
first 100 output bits are discarded and the next 228 bits are used as the keystream (the
first 114 bits for the downlink from the base station to the cellular phone and then 114
bits for the uplink).
The irregular clocking forms the nonlinear component of the cipher. Nevertheless,
with current computing resources and large precomputed tables A5/1 is now broken.
Better GSM ciphers (A5/3 and A5/4) are available, but whether they are used depends
on the network and the mobile phone.
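The majority rule can be examined by enumerating the eight possible combinations of the three clocking bits. The following Python sketch assumes that the clocking bits are independent and uniformly distributed; the function names are hypothetical, and the sketch models only the clocking decision, not the A5/1 registers themselves.

```python
from fractions import Fraction
from itertools import product


def clocked_registers(clocking_bits):
    """Return, for each register, whether it is clocked under the majority rule."""
    majority = 1 if sum(clocking_bits) >= 2 else 0
    return [bit == majority for bit in clocking_bits]


def clocking_probability(register: int = 0) -> Fraction:
    """Probability that the given register is clocked (uniform clocking bits)."""
    hits = sum(clocked_registers(bits)[register]
               for bits in product((0, 1), repeat=3))
    return Fraction(hits, 8)


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, clocked_registers(bits))
    print(clocking_probability())
```

A register stands still only if its clocking bit differs from both other clocking bits, which happens in two of the eight cases.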
6.3. RC4
The synchronous stream cipher RC4 (Rivest Cipher 4) was very popular for many years
and was often used to encrypt network traffic (for example in the TLS protocol or for
the encryption of Wi-Fi traffic). RC4 is based on permutations of the integers (or bytes)
0, 1, … , 255 and recursively generates output bytes. The key-scheduling algorithm (ini-
tialization) takes a key (between one and 256 bytes long) and sets up the state array
𝑆[0], 𝑆[1], … , 𝑆[255]. The pseudorandom generation algorithm recursively computes
one output byte and updates the state.
RC4 is ideal for software implementations and is very efficient. Unfortunately, the
output of RC4 is biased and can be distinguished from a random sequence of bytes.
For the remainder of this section, all additions (+) are modulo 256. First, we con-
sider the key scheduling (see Algorithm 6.1).
In the first iteration of the for loop, one has 𝑖 = 0 and 𝑗 = 0 + 𝑆[0] + 𝐾[0] = 𝐾[0].
The values of 𝑆[0] and 𝑆[𝐾[0]] are swapped so that 𝑆[0] = 𝐾[0] and 𝑆[𝐾[0]] = 0. In
the next iteration, one has 𝑖 = 1 and 𝑗 = 𝐾[0] + 𝑆[1] + 𝐾[1], so that 𝑆[1] and 𝑆[𝑗] are swapped.
Since usually 𝑆[1] = 1 and 𝑆[𝑗] = 𝑗 before the swap, 𝑆[1] is equal to 𝐾[0] + 𝐾[1] + 1, unless
𝐾[0] = 1 and 𝑆[1] = 0 as a result of the first iteration. In this case, 𝑆[1] = 𝐾[0] + 𝐾[1] = 1 + 𝐾[1].
In the next step, 𝑖 = 2 and (if 𝐾[0] ≠ 1) one gets 𝑗 = 𝐾[0] + 𝐾[1] + 1 + 𝑆[2] + 𝐾[2].
Therefore, it is likely that 𝑆[2] = 𝐾[0] + 𝐾[1] + 𝐾[2] + 3. Note that 𝑆[0], 𝑆[1] and 𝑆[2]
may change later in the loop if one of the 𝑗-values for 𝑖 > 2 becomes 0, 1 or 2.
By continuing in this fashion, one can show that the most likely value for the 𝑖-th
state byte after the key scheduling algorithm is
𝑆[𝑖] = 𝐾[0] + 𝐾[1] + ⋯ + 𝐾[𝑖] + 𝑖(𝑖 + 1)/2 mod 256.
For the first nine state values, it can be shown that the above formula holds with
more than 30% probability (see [PM07]), which is a very significant bias compared to
a random permutation.
Now consider the pseudorandom generator (see Algorithm 6.2). The first output
byte is 𝑆[𝑆[1] + 𝑆[𝑆[1]]] and the second byte is
𝑆[𝑆[2] + 𝑆[𝑆[1] + 𝑆[2]]].
One can show that the output of RC4 is biased and reveals information about the key.
Below, we discuss a famous attack which reveals the key byte 𝐾[3].
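The two phases of RC4 can be sketched in a few lines of Python. This follows the standard public description of RC4 (key scheduling, then one swap and one output byte per step); the function names are ours, and the code is meant for studying the algorithm, not for protecting data.

```python
def rc4_key_schedule(key):
    """Key scheduling: start with the identity permutation and mix in the key."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S

def rc4_keystream(key, n):
    """Return the first n keystream bytes for the given key (a bytes object)."""
    S = rc4_key_schedule(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def rc4_encrypt(key, data):
    """XOR the data with the keystream; decryption is the same operation."""
    keystream = rc4_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

ciphertext = rc4_encrypt(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4_encrypt(b"Key", ciphertext))
```

Since encryption only XORs the keystream onto the data, applying `rc4_encrypt` twice with the same key returns the plaintext.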
By construction, RC4 does not use an initialization vector (IV), and thus the
keystream must not be re-used with the same key. In practice, the secret key is often
re-used and an IV is incorporated into the RC4 key. In the former Wi-Fi encryption
standard WEP (Wired Equivalent Privacy), a three-byte IV is prepended to the key:
𝐾[0] = 𝐼𝑉[0], 𝐾[1] = 𝐼𝑉[1], 𝐾[2] = 𝐼𝑉[2].
It turned out that this construction is insecure (Fluhrer, Mantin and Shamir attack
[FMS01]): an adversary waits until the first two bytes of the IV are
𝐼𝑉[0] = 𝐾[0] = 3, 𝐼𝑉[1] = 𝐾[1] = 255.
Then the first two iterations of the key scheduling algorithm give
𝑆[0] = 𝐾[0] = 3 and 𝑗 = 𝐾[0] + 𝑆[1] + 𝐾[1] = 3 + 1 + 255 ≡ 3 mod 256, so that 𝑆[1] and 𝑆[3] are swapped in the second iteration.
Due to the swapping operations, the first few bytes of the state array are
𝑆[0] = 3, 𝑆[1] = 0, 𝑆[2] = 2, 𝑆[3] = 1.
The next two iterations yield:
𝑆[2] = 3 + 2 + 𝐼𝑉[2] = 5 + 𝐼𝑉[2] and
𝑆[3] = 5 + 𝐼𝑉[2] + 1 + 𝐾[3] = 6 + 𝐼𝑉[2] + 𝐾[3].
If we now assume that 𝑆[0] = 3, 𝑆[1] = 0 and 𝑆[3] = 6 + 𝐼𝑉[2] + 𝐾[3] are not subsequently
modified in the key scheduling algorithm, then the first keystream byte is
𝐵 = 𝑆[𝑆[1] + 𝑆[𝑆[1]]] = 𝑆[0 + 𝑆[0]] = 𝑆[3] = 6 + 𝐼𝑉[2] + 𝐾[3] mod 256.
Since 𝐼𝑉[2] is known, the first secret key byte 𝐾[3] can be computed from 𝐵. In practice,
the first plaintext and ciphertext byte and thus the first keystream byte 𝐵 are often
known, for example in Wi-Fi communication.
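The first four key scheduling iterations for such a weak IV can be traced with a short Python sketch. The key bytes and the helper names below are hypothetical and chosen only for illustration; the sketch stops after four iterations and therefore shows the idealized case in which 𝑆[0], 𝑆[1] and 𝑆[3] are not modified later.

```python
def partial_key_schedule(key, steps):
    """Run only the first `steps` iterations of the RC4 key scheduling."""
    S = list(range(256))
    j = 0
    for i in range(steps):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S

def fms_guess(iv2, first_keystream_byte):
    """Candidate for K[3], derived from B = 6 + IV[2] + K[3] mod 256."""
    return (first_keystream_byte - 6 - iv2) % 256

# Hypothetical WEP-style key: weak IV (3, 255, iv2) followed by secret bytes.
iv2 = 10
secret = bytes([100, 17, 42, 99, 7])
key = bytes([3, 255, iv2]) + secret

S = partial_key_schedule(key, 4)
print(S[:4])                          # first state bytes after four iterations
B = S[(S[1] + S[S[1]]) % 256]         # first output byte if the state stayed like this
print(fms_guess(iv2, B) == secret[0])
```

In the full key scheduling, the remaining iterations disturb these state bytes in most cases, so a single IV gives the correct candidate only with modest probability. An attacker therefore collects many frames with weak IVs and takes the most frequent candidate.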
In the updated RC4-based Wi-Fi security protocol TKIP, the mixing of IV and key
was improved, but now TKIP is also deprecated. The RC4 cipher should no longer be
used because of serious weaknesses.
(1) Profile 1: Stream ciphers with excellent throughput when implemented in soft-
ware: HC-128, Rabbit, Salsa20/12 and SOSEMANUK.
(2) Profile 2: Stream ciphers which are very efficient - in terms of the physical resources
required - when implemented in hardware: Grain v1, MICKEY 2.0 and Trivium.
Profile 1 ciphers use 128-bit keys and profile 2 ciphers 80-bit keys. Extended key
lengths are provided by the software ciphers HC-256 and Salsa20/20 (256-bit keys) and
the hardware cipher MICKEY-128 2.0 (128-bit key).
However, it should be noted that the eSTREAM portfolio is not a standard.
The project wants to draw attention to these ciphers and to encourage further cryptanalysis.
In the following, we describe the stream cipher Salsa20/20 (i.e., Salsa20 with 20
rounds and a 256-bit key) [Ber08b] and its variant ChaCha20 [Ber08a], which has
been adopted as a replacement of RC4 in the TLS protocol (see RFC 7905 [LCM+ 16]).
Salsa20 is based on three simple operations on 32-bit words: addition modulo 2^32 (+), bitwise XOR (⊕) and left rotation (⋘).
The Salsa20/20 cipher takes a 256-bit key, a 64-bit nonce and a 64-bit counter. The
state array 𝑆 of Salsa20 is a 4 × 4 matrix of sixteen 32-bit words. Strings are interpreted
in little-endian notation, i.e., the least significant byte of each word is stored first.
\[
S = \begin{pmatrix}
y_0 & y_1 & y_2 & y_3 \\
y_4 & y_5 & y_6 & y_7 \\
y_8 & y_9 & y_{10} & y_{11} \\
y_{12} & y_{13} & y_{14} & y_{15}
\end{pmatrix}.
\]
𝑏 = 𝑏 ⊕ ((𝑎 + 𝑑) ⋘ 7),
𝑐 = 𝑐 ⊕ ((𝑏 + 𝑎) ⋘ 9),
𝑑 = 𝑑 ⊕ ((𝑐 + 𝑏) ⋘ 13),
𝑎 = 𝑎 ⊕ ((𝑑 + 𝑐) ⋘ 18).
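The four update equations translate directly into Python. The sketch below is a minimal illustration with helper names of our choosing; words are kept in the range 0 to 2^32 − 1 by masking after each addition.

```python
MASK = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate the 32-bit word x left by r positions."""
    return ((x << r) & MASK) | (x >> (32 - r))

def salsa20_quarter_round(a, b, c, d):
    """Update the words in the order b, c, d, a; each step uses the
    values that have already been updated in the previous steps."""
    b ^= rotl32((a + d) & MASK, 7)
    c ^= rotl32((b + a) & MASK, 9)
    d ^= rotl32((c + b) & MASK, 13)
    a ^= rotl32((d + c) & MASK, 18)
    return a, b, c, d

words = salsa20_quarter_round(0x01234567, 0x89ABCDEF, 0x0F1E2D3C, 0x4B5A6978)
print([f"{w:08x}" for w in words])
```

Each word is modified exactly once, and only additions, XORs and rotations by fixed distances are used, so the running time does not depend on the data.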
The column-round function is the transpose of the row-round function: the words
in the columns are permuted, the quarter-round map is applied to each of the columns
and the permutation is reversed.
Definition 6.24. Let 𝑆 be a state matrix as above; then
column-round(𝑆) = (row-round(𝑆^𝑇))^𝑇. ♢
𝑏 = (𝑏1 , 𝑏2 ) is initially set to zero. The initialization algorithm copies 𝑘, 𝑛, 𝑏 and the
four 32-bit constants
𝑦0 = 61707865, 𝑦5 = 3320646E, 𝑦10 = 79622D32, 𝑦15 = 6B206574
into the sixteen 32-bit words of the Salsa20 state matrix:
\[
S = \begin{pmatrix}
y_0 & k_1 & k_2 & k_3 \\
k_4 & y_5 & n_1 & n_2 \\
b_1 & b_2 & y_{10} & k_5 \\
k_6 & k_7 & k_8 & y_{15}
\end{pmatrix}.
\]
The keystream generator computes the output state by ten double-round iterations
and a final addition mod 2^32 of the initial state matrix:
Salsa20𝑘(𝑛, 𝑏) = 𝑆 + double-round^10(𝑆).
The block counter 𝑏 is incremented and the state is newly initialized for additional
64-byte output blocks. The Salsa20 keystream is the serialization of a sequence of 64-
byte output blocks:
Salsa20𝑘 (𝑛, 0), Salsa20𝑘 (𝑛, 1), Salsa20𝑘 (𝑛, 2), … .
Remark 6.27. Salsa20 treats strings as little-endian integers. For example, if the first
four key bytes are 01, 02, 03 and 04, then the corresponding integer is 𝑦1 = 04030201
in hexadecimal notation. Output words are serialized; the integer 04030201 yields the
output bytes 01, 02, 03 and 04 (in this order).
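The little-endian convention is easy to check in Python with `int.from_bytes` and the `struct` module. The short sketch below converts the four key bytes of the remark and serializes the four Salsa20 constants; the variable names are ours.

```python
import struct

key_bytes = bytes([0x01, 0x02, 0x03, 0x04])
word = int.from_bytes(key_bytes, "little")
print(hex(word))                          # 0x4030201
print(word.to_bytes(4, "little").hex())   # 01020304

# The four Salsa20 constants, serialized as little-endian 32-bit words.
constants = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)
print(struct.pack("<4I", *constants))     # b'expand 32-byte k'
```

Read as little-endian bytes, the constants spell the ASCII string "expand 32-byte k".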
Example 6.28. Salsa20𝑘 (𝑛, 0) is a zero block if 𝑘 and 𝑛 are zero. This should not hap-
pen when Salsa20 is used as a stream cipher, since the nonce 𝑛 must only be used once.
Remark 6.29. Note that the state 𝑆 is re-initialized for each 64-byte output block and
there is no chaining from one block to another. Hence the Salsa20 keystream can be
accessed randomly and the computation of 64-byte blocks can be done in parallel. ♢
We turn to the ChaCha family of ciphers [Ber08a] and consider the ChaCha20 variant
specified in RFC 8439 [NL18]. ChaCha20 is a modification of Salsa20, and we
explain the differences from Salsa20.
Definition 6.30. Let 𝑦 = (𝑎, 𝑏, 𝑐, 𝑑) be a sequence of four 32-bit words. Then a ChaCha
quarter-round updates (𝑎, 𝑏, 𝑐, 𝑑) as follows:
1) 𝑎=𝑎+𝑏 ; 𝑑 =𝑑⊕𝑎 ; 𝑑 ⋘ 16;
2) 𝑐=𝑐+𝑑 ; 𝑏=𝑏⊕𝑐 ; 𝑏 ⋘ 12;
3) 𝑎=𝑎+𝑏 ; 𝑑 =𝑑⊕𝑎 ; 𝑑 ⋘ 8;
4) 𝑐=𝑐+𝑑 ; 𝑏=𝑏⊕𝑐 ; 𝑏 ⋘ 7.
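The four steps of the definition can be written as a small Python function. This is a minimal sketch with helper names of our choosing; the input words in the usage example are the quarter-round test values given in RFC 8439.

```python
MASK = 0xFFFFFFFF

def rotl32(x, r):
    """Rotate the 32-bit word x left by r positions."""
    return ((x << r) & MASK) | (x >> (32 - r))

def chacha_quarter_round(a, b, c, d):
    """One ChaCha quarter-round: add, XOR, rotate, four times."""
    a = (a + b) & MASK; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK; b = rotl32(b ^ c, 7)
    return a, b, c, d

result = chacha_quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([f"{w:08x}" for w in result])
# ['ea2a92f4', 'cb1cf8ce', '4581472e', '5881c4bb']
```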
A ChaCha quarter-round updates each word twice and uses different rotation dis-
tances than Salsa20. ChaCha20 also runs ten double-rounds. However, a ChaCha
double-round consists of a column-round and a diagonal-round, which changes words
along the main and secondary diagonals.
Definition 6.31. A ChaCha double-round is defined by the eight ChaCha quarter-
rounds in Table 6.2.
Table 6.2. A column-round and diagonal-round form a ChaCha double-round.
The RFC version of ChaCha20 described below uses a 12-byte nonce and a 4-byte
block counter. The original cipher takes an 8-byte nonce and an 8-byte counter.
Definition 6.32. The ChaCha20 stream cipher takes a 256-bit key 𝑘 = (𝑘1 , … , 𝑘8 )
and a unique 96-bit message number 𝑛 = (𝑛1 , 𝑛2 , 𝑛3 ) (nonce) as input. A 32-bit block
counter 𝑏 is initially set to zero. The initialization algorithm copies 𝑘, 𝑛, 𝑏 and the four
32-bit constants
𝑦0 = 61707865, 𝑦1 = 3320646E, 𝑦2 = 79622D32, 𝑦3 = 6B206574
into the sixteen 32-bit words of the ChaCha20 state matrix:
\[
S = \begin{pmatrix}
y_0 & y_1 & y_2 & y_3 \\
k_1 & k_2 & k_3 & k_4 \\
k_5 & k_6 & k_7 & k_8 \\
b & n_1 & n_2 & n_3
\end{pmatrix}.
\]
The ChaCha20 keystream generator works analogously to the Salsa20 generator, but
uses ChaCha double-rounds:
ChaCha𝑘(𝑛, 𝑏) = 𝑆 + double-round^10(𝑆).
The block counter 𝑏 is incremented and the state is newly initialized for each 64-byte
output block. The ChaCha20 keystream is the serialization of a sequence of 64-byte
output blocks:
ChaCha𝑘 (𝑛, 0), ChaCha𝑘 (𝑛, 1), ChaCha𝑘 (𝑛, 2), … .
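The block function can be sketched in Python as follows. The word indices of the column-round and the diagonal-round are taken from RFC 8439, and the state layout is the one of Definition 6.32; the function names are ours. The sketch is intended for study and is neither optimized nor hardened against side channels.

```python
import struct

MASK = 0xFFFFFFFF
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)

def rotl32(x, r):
    return ((x << r) & MASK) | (x >> (32 - r))

def quarter_round(s, a, b, c, d):
    """Apply a ChaCha quarter-round to the words s[a], s[b], s[c], s[d]."""
    s[a] = (s[a] + s[b]) & MASK; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = rotl32(s[b] ^ s[c], 7)

def double_round(s):
    # Column-round: the four columns of the 4 x 4 state (row-major indices).
    quarter_round(s, 0, 4, 8, 12)
    quarter_round(s, 1, 5, 9, 13)
    quarter_round(s, 2, 6, 10, 14)
    quarter_round(s, 3, 7, 11, 15)
    # Diagonal-round: the main diagonal and the three shifted diagonals.
    quarter_round(s, 0, 5, 10, 15)
    quarter_round(s, 1, 6, 11, 12)
    quarter_round(s, 2, 7, 8, 13)
    quarter_round(s, 3, 4, 9, 14)

def chacha20_block(key, nonce, counter):
    """64-byte output block for a 32-byte key, a 12-byte nonce and a 32-bit counter."""
    initial = (list(CONSTANTS) + list(struct.unpack("<8I", key))
               + [counter & MASK] + list(struct.unpack("<3I", nonce)))
    state = initial.copy()
    for _ in range(10):
        double_round(state)
    # Final addition of the initial state, word by word modulo 2^32.
    output = [(x + y) & MASK for x, y in zip(state, initial)]
    return struct.pack("<16I", *output)

key = bytes(range(32))
nonce = bytes.fromhex("000000090000004a00000000")
print(chacha20_block(key, nonce, 1).hex())
```

Because each block depends only on the key, the nonce and the counter, the blocks for different counter values can be computed independently of each other.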
Remark 6.33. Salsa20/20 and ChaCha20 are very fast (also in comparison with AES)
and encryption requires less than 5 CPU cycles per byte on modern processors.
6.5. Summary
Exercises
1. Suppose the length of the IV of a synchronous stream cipher is 24 bits. Discuss the
security of the cipher.
2. Check whether 𝑝(𝑥) = 𝑥^4 + 𝑥^3 + 𝑥^2 + 𝑥 + 1 ∈ 𝐺𝐹(2)[𝑥] is a primitive polynomial.
Suppose 𝑝(𝑥) is the characteristic polynomial of an LFSR. Find the periods
of output sequences generated by this LFSR.
3. Let 𝑐(𝑥) be the connection polynomial of a nonsingular LFSR and let 𝑝(𝑥) be the
corresponding characteristic polynomial. Show that 𝑝(𝑥) is irreducible if and only
if 𝑐(𝑥) is irreducible. Furthermore, show that 𝑝(𝑥) is primitive if and only if 𝑐(𝑥) is primitive.
4. Suppose an LFSR of degree 5 is used as a stream cipher and the following plaintext
𝑚 and ciphertext 𝑐 are known:
𝑚 = 00100 11000, 𝑐 = 10110 01110.
Compute the feedback polynomial, the characteristic polynomial, the period and
the complete keystream.
Hint: The first five bits of 𝑚 ⊕ 𝑐 give a state (reverse the order). The next five bits
yield linear equations in the unknown feedback coefficients.
5. Verify that the majority function of three bits 𝑥1 , 𝑥2 , 𝑥3 is given by
𝑋 = 𝑚𝑎𝑗(𝑥1 , 𝑥2 , 𝑥3 ) = (𝑥1 ∧ 𝑥2 ) ⊕ (𝑥1 ∧ 𝑥3 ) ⊕ (𝑥2 ∧ 𝑥3 ).
Show that 𝑃𝑟[𝑋 = 𝑥𝑖] = 3/4 for 𝑖 = 1, 2, 3 if 𝑥1, 𝑥2, 𝑥3 are independent and uniformly
distributed.
6. Use SageMath to verify that the feedback polynomials of the A5/1 LFSRs (see
Example 6.20) are primitive. Give an upper bound for the period of the A5/1
keystream generator.
7. Suppose an RC4 key satisfies 𝐾[0] + 𝐾[1] ≡ 0 mod 256. Show that with increased
probability the first output byte is 𝐾[2] + 3 mod 256.
8. Show that the quarter-round operation in Salsa20 is invertible. Give a description
of the inverse map.
9. Give an explicit description of the column-round operation in Salsa20 using the
quarter-round map.
10. Apply a Salsa20 quarter-round to (1, 0, 0, 0), (0, 1, 0, 0) and (0, 0, 0, 1), where 1 =
00 00 00 01.
11. Salsa20 can be seen as a map on the vector space 𝐺𝐹(2)^512. Which Salsa20 operations
are not 𝐺𝐹(2)-linear? Explain your answer.
12. In Salsa20 and ChaCha20, the initial state matrix is added to the resulting state
matrix after performing ten double-rounds. Why is this final step important for
the security of the cipher?
13. Suppose the diagonal rounds in ChaCha20 are omitted. Discuss the consequences
of this modification on the security of the cipher.
Chapter 7
Hash Functions
If the domain 𝐷 is larger than the range 𝑅, then 𝐻 cannot be injective and colli-
sions must therefore exist. However, the probability of finding collisions with limited
computing resources may be very small.
There are two related requirements which are weaker than collision resistance.
We only give informal definitions:
• Second-preimage resistance or weak collision resistance means that an adversary,
who is given a uniform 𝑥 ∈ 𝐷, is not able to find a second preimage 𝑥′ ∈ 𝐷 with
𝑥 ≠ 𝑥′ such that 𝐻(𝑥) = 𝐻(𝑥′ ).
• Preimage resistance or one-wayness means that an adversary, who is given a uni-
form 𝑦 ∈ 𝑅, is not able to find a preimage 𝑥 ∈ 𝐷 such that 𝐻(𝑥) = 𝑦.
One can show that collision resistance implies second-preimage resistance and
preimage resistance (see Exercise 1).
In practice, hash functions are usually unkeyed or the key is fixed. Unkeyed hash
functions 𝐻 ∶ {0, 1}∗ → {0, 1}𝑙 have a theoretical disadvantage: they are fixed functions
and a collision can, in principle, be output in constant time. However, such a collision may be practically inaccessible if 𝑙 is
large. Therefore, one requires that it is computationally infeasible to produce a collision.
In particular, not even a single collision should be known.
Remark 7.2. An ideal unkeyed hash function is called a random oracle. The output of
a random oracle is uniformly random, unless the same input is queried twice, in which
case the oracle returns the same output. One can construct a pseudorandom generator
(see Definition 2.32) and a pseudorandom function (Definition 2.39) from a random
oracle (see [KL15]). However, implementations of a random oracle are impossible: it
must have some compact description, and hence the output of any real-world instance
is deterministic and not random.
The random oracle model is used in some security proofs, and one hopes that con-
crete instantiations of hash functions are sufficiently close to that assumption. A se-
curity guarantee in the random oracle model can only be relative: a scheme is secure
assuming that the hash function has no weaknesses and produces uniform output. ♢
The output length of a hash function should not be too short. In fact, the Birthday
Paradox shows that collisions occur surprisingly often (see Proposition 1.61):
Proposition 7.3. Let 𝑘 be the number of independent samples drawn from a uniform
distribution on a set of size 𝑁. If 𝑘 ≈ 1.2√𝑁, then the probability of a collision is around
50%. ♢
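The statement can be checked numerically by multiplying the probabilities that each new sample avoids all previous ones. The following Python sketch does this for the classical birthday setting 𝑁 = 365; the function name is ours.

```python
from math import sqrt

def collision_probability(k, N):
    """Probability that k independent uniform samples from a set of
    size N contain at least one collision."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (N - i) / N
    return 1.0 - p_distinct

N = 365
k = round(1.2 * sqrt(N))      # 23 samples
print(k, collision_probability(k, N))
```

For a hash function with 𝑙 output bits one has 𝑁 = 2^𝑙, so about 2^(𝑙/2) hash computations are expected to produce a collision. This is why the output length should be at least twice the desired security level.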
The construction of a secure hash function is not easy, and many obvious defini-
tions do not give collision-resistant functions (see Exercises 2 and 3).
The hashes of transactions 𝑇1 , 𝑇2 , 𝑇3 , … form the leaves of a binary tree called the
Merkle tree. The nodes further up are hashes of their two child nodes and the root of the
Merkle tree is the top hash value (see Figure 7.2).
This works with any even number of transactions. The root hash forms an identi-
fier for all transactions in a block, and changing a single transaction would completely
change the root hash. Individual transactions can be verified by their hash path from
the leaf to the root.
Suppose we want to prove that a transaction 𝑇3′ is included in the blockchain; then
we only need to provide the hashes 𝐻4 = 𝐻(𝑇4) and 𝐻12 along with 𝑇3′. The verifier
checks the hash path by computing 𝐻(𝑇3′), 𝐻34′ = 𝐻(𝐻(𝑇3′)‖𝐻4) and 𝐻root′ = 𝐻(𝐻12‖𝐻34′).
Finally, they verify whether 𝐻root′ coincides with the root hash 𝐻root which is
stored in the blockchain. This is very efficient, even for larger trees with thousands of
leaves, and Merkle trees have many applications beyond blockchains.
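The hash path check for four transactions can be sketched with Python's `hashlib`. The sketch makes simplifying assumptions that are ours: a single SHA-256 call serves as 𝐻, the number of leaves is a power of two, and the transactions are placeholder byte strings. Real blockchains use their own encodings and hashing conventions.

```python
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash for a list of leaf hashes whose length is a power of two."""
    level = list(leaves)
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_path(tx, path, root):
    """Check a hash path; `path` lists (sibling_hash, sibling_is_left) pairs
    from the leaf level up to the level below the root."""
    h = H(tx)
    for sibling, sibling_is_left in path:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == root

transactions = [b"T1", b"T2", b"T3", b"T4"]
leaves = [H(t) for t in transactions]
root = merkle_root(leaves)

H4 = leaves[3]
H12 = H(leaves[0] + leaves[1])
path_T3 = [(H4, False), (H12, True)]
print(verify_path(b"T3", path_T3, root))          # True
print(verify_path(b"T3 forged", path_T3, root))   # False
```

The path contains one hash per tree level, so its length grows only logarithmically with the number of leaves.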
Blockchains are used by many cryptocurrencies. The blockchain records the trans-
actions of previously unspent cybercoins from one or more input addresses to one or
more output addresses. Each new block contains a proof-of-work; by adapting the
nonce value, a miner has to find a hash value of the new block that is smaller than
the network’s difficulty target. This may require a huge number of hashing operations
and consume significant computing resources as well as a lot of energy. The miner
is rewarded with new cybercoins. The proof of work protects the blockchain against
manipulations and complicates forks.
Definition 7.5. Let 𝑛, 𝑙 ∈ ℕ and let 𝑓 ∶ {0, 1}𝑛+𝑙 → {0, 1}𝑛 be a compression function.
Let 𝐼𝑉 ∈ {0, 1}𝑛 be an initialization vector. An input message 𝑚 of arbitrary length
is padded by a 1, a sequence of zero bits and the length 𝐿 = |𝑚|, encoded as a 64-bit
binary string. The padded message is
𝑚′ = 𝑚‖1‖0 … 0‖𝐿.
The number of zeros is chosen such that the length of 𝑚′ is a multiple of 𝑙. We split 𝑚′
into blocks of length 𝑙:
𝑚′ = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
The Merkle-Damgård hash function 𝐻 = 𝐻𝐼𝑉 ∶ {0, 1}∗ → {0, 1}𝑛 is defined by recur-
sive application of the compression function 𝑓 (see Figure 7.3). The last output value
defines the hash:
ℎ0 = 𝐼𝑉,
ℎ𝑖 = 𝑓(ℎ𝑖−1 , 𝑚𝑖 ) for 𝑖 = 1, … , 𝑁,
𝐻(𝑚) = 𝐻𝐼𝑉 (𝑚) = ℎ𝑁 .
The initialization vector IV can be regarded as a key, but in practice, IV is a pre-defined
constant. ♢
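Figure 7.3. Merkle-Damgård construction: the blocks 𝑚1, 𝑚2, …, 𝑚𝑁 of 𝑚′ are processed by iterating 𝑓, starting from ℎ0 = 𝐼𝑉; the intermediate values are ℎ1, ℎ2, ℎ3, … and the last output is 𝐻(𝑚).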
The compression function can be based on a block cipher, although this construc-
tion is rarely used in practice.
Definition 7.7. (Davies-Meyer) Let 𝐸 be the encryption function of a block cipher
with key length 𝑛 and block length 𝑙. Then a compression function
𝑓 ∶ {0, 1}𝑛+𝑙 → {0, 1}𝑙
can be defined as follows:
𝑓(𝑘, 𝑚) = 𝐸𝑘 (𝑚) ⊕ 𝑚. ♢
One can show that this construction defines a collision-resistant compression func-
tion in the ideal cipher model. A block cipher that is chosen uniformly at random from
all block ciphers with 𝑛-bit keys and 𝑙-bit input/output strings is called an ideal ci-
pher. An ideal cipher is a family of independent permutations. This is stronger than
the standard notion of pseudorandomness and includes protection against related-key
attacks (see Remark 2.44). Although ideal ciphers cannot be implemented and it is unclear
whether real-world block ciphers (for example AES) behave like an ideal cipher,
security proofs in the ideal cipher model can still be useful: a scheme can be proven
to be secure (for example, the above Davies-Meyer construction), unless an adversary
exploits weaknesses of the underlying block cipher.
7.4. SHA-1
Until recently, SHA-1 was a widely used Merkle-Damgård hash function, and in the
following we describe its compression function 𝑓. As input, the function takes a 160-
bit status vector and a 512-bit message block and outputs an updated 160-bit status:
𝑓 ∶ {0, 1}160+512 → {0, 1}160 .
𝐻1 = 67452301,
𝐻2 = EFCDAB89,
𝐻3 = 98BADCFE,
𝐻4 = 10325476,
𝐻5 = C3D2E1F0.
The 160-bit input vector ℎ = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 is subdivided into five 32-bit words
and copied to the initial status vector:
𝐴‖𝐵‖𝐶‖𝐷‖𝐸 ← 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 .
Figure 7.4. One round of the SHA-1 compression function 𝑓. The 32-bit status words
𝐴, 𝐵, 𝐶, 𝐷, 𝐸 are updated. In each round, a 32-bit chunk 𝑊 of the input message is
processed. 𝐹 is a nonlinear bit-function, 𝐾 a 32-bit constant (both depending on the
round number) and + denotes addition modulo 2^32.
Then, 80 rounds of the SHA-1 compression function are performed (see Figure 7.4),
which update the status words 𝐴‖𝐵‖𝐶‖𝐷‖𝐸. In round 𝑗, the 32-bit message word 𝑊𝑗
is processed. A bit-function 𝐹 (defined by AND, OR, NOT and XOR operations) and a
constant 𝐾 are used. The function 𝐹 and the constant 𝐾 change every 20 rounds (see
Table 7.1).
Table 7.1. Keys and bit functions in SHA-1.
𝑗 𝐾 𝐹
0 ≤ 𝑗 ≤ 19 5A827999 𝐶ℎ(𝐵, 𝐶, 𝐷) = (𝐵 ∧ 𝐶) ⊕ (¬𝐵 ∧ 𝐷)
20 ≤ 𝑗 ≤ 39 6ED9EBA1 𝑃𝑎𝑟𝑖𝑡𝑦(𝐵, 𝐶, 𝐷) = 𝐵 ⊕ 𝐶 ⊕ 𝐷
40 ≤ 𝑗 ≤ 59 8F1BBCDC 𝑀𝑎𝑗(𝐵, 𝐶, 𝐷) = (𝐵 ∧ 𝐶) ⊕ (𝐵 ∧ 𝐷) ⊕ (𝐶 ∧ 𝐷)
60 ≤ 𝑗 ≤ 79 CA62C1D6 𝑃𝑎𝑟𝑖𝑡𝑦(𝐵, 𝐶, 𝐷) = 𝐵 ⊕ 𝐶 ⊕ 𝐷
value, and impressive examples have been published. 𝑀 (1) and 𝑀 (2) each consist of
two 512-bit blocks. The Merkle-Damgård iteration that takes the first block of 𝑀 (𝑖) as
input produces a near collision, and the second block then gives a full collision. Since
both messages have the same length, appending the padding data including the length
preserves the collision.
Example 7.8. We check the collision found by [SBK+ 17]. First, define the prefix and
the messages.
sage: prefix = ('255044462d312e330a25e2e3cfd30a0a0a312030206f626a0a3c3c2f57696474'
....:     '682032203020522f4865696768742033203020522f547970652034203020522f53756274'
....:     '7970652035203020522f46696c7465722036203020522f436f6c6f725370616365203720'
....:     '3020522f4c656e6774682038203020522f42697473506572436f6d706f6e656e7420383e'
....:     '3e0a73747265616d0affd8fffe00245348412d3120697320646561642121212121852fec'
....:     '092339759c39b1a1c63c4c97e1fffe01')
SHA-1 is now deprecated, and it is recommended to use SHA-2 or the new standard
hash function SHA-3 (see below).
7.5. SHA-2
The SHA-2 hash functions SHA-224, SHA-256, SHA-384 and SHA-512 are constructed
in a similar way to SHA-1, but use an extended internal state of 256 bits (eight 32-bit
words) for SHA-224 and SHA-256 or 512 bits (eight 64-bit words) for SHA-384 and SHA-512,
and larger digests. It is assumed that SHA-2 offers better protection against
collision-finding attacks, and at the time of this writing SHA-2 is widely used in security
protocols and applications. The SHA-2 family is specified in the standard [FIP15a].
Since SHA-2 is a Merkle-Damgård hash function, we only need to define the compression
function 𝑓 and the initial status. In the following, we describe SHA-256.
The compression function 𝑓 takes as input a 256-bit status vector and a 512-bit
message block and outputs an updated 256-bit status:
𝑓 ∶ {0, 1}256+512 → {0, 1}256 .
The initial 256-bit status vector ℎ0 = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 ‖𝐻6 ‖𝐻7 ‖𝐻8 is defined by:
𝐻1 = 6A09E667,
𝐻2 = BB67AE85,
𝐻3 = 3C6EF372,
𝐻4 = A54FF53A,
𝐻5 = 510E527F,
𝐻6 = 9B05688C,
𝐻7 = 1F83D9AB,
𝐻8 = 5BE0CD19.
A 512-bit message block 𝑚 = 𝑊0 ‖𝑊1 ‖ … ‖𝑊15 is split into 16 words of length 32 bits.
The functions 𝜎0 and 𝜎1 transform 32-bit words by a combination of XOR, right-rotate
(⋙) and right-shift (≫) operations:
𝜎0 (𝑤) = (𝑤 ⋙ 7) ⊕ (𝑤 ⋙ 18) ⊕ (𝑤 ≫ 3),
𝜎1 (𝑤) = (𝑤 ⋙ 17) ⊕ (𝑤 ⋙ 19) ⊕ (𝑤 ≫ 10).
Now 48 additional words 𝑊16 , … , 𝑊63 are generated:
𝑊𝑗 = 𝜎1 (𝑊𝑗−2 ) + 𝑊𝑗−7 + 𝜎0 (𝑊𝑗−15 ) + 𝑊𝑗−16 for 16 ≤ 𝑗 ≤ 63.
The 256-bit input vector ℎ = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 ‖𝐻6 ‖𝐻7 ‖𝐻8 is subdivided into
eight 32-bit words and copied to the initial status vector:
𝐴‖𝐵‖𝐶‖𝐷‖𝐸‖𝐹‖𝐺‖𝐻 ← 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 ‖𝐻6 ‖𝐻7 ‖𝐻8 .
Figure 7.5. One round of the SHA-2 compression function 𝑓. The 32-bit status words
𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹, 𝐺, 𝐻 are updated. In each round, a 32-bit chunk 𝑊 of the input
message is processed. 𝐾 is a 32-bit constant, which depends on the round number.
Then, 64 rounds of the SHA-2 compression function 𝑓 are performed (see Figure 7.5),
which update the status words 𝐴‖𝐵‖𝐶‖𝐷‖𝐸‖𝐹‖𝐺‖𝐻. In round 𝑗, the 32-bit word 𝑊𝑗
and the 32-bit constant 𝐾𝑗 are processed. The numbers 𝐾𝑗 represent the first 32 bits of
the fractional parts of the cube roots of the first 64 prime numbers.
In each round, four functions 𝑀𝑎𝑗, 𝐶ℎ, Σ0 and Σ1 are used which operate on 32-bit words:
𝑀𝑎𝑗(𝑥, 𝑦, 𝑧) = (𝑥 ∧ 𝑦) ⊕ (𝑥 ∧ 𝑧) ⊕ (𝑦 ∧ 𝑧),
𝐶ℎ(𝑥, 𝑦, 𝑧) = (𝑥 ∧ 𝑦) ⊕ (¬𝑥 ∧ 𝑧),
Σ0 (𝑤) = (𝑤 ⋙ 2) ⊕ (𝑤 ⋙ 13) ⊕ (𝑤 ⋙ 22),
Σ1 (𝑤) = (𝑤 ⋙ 6) ⊕ (𝑤 ⋙ 11) ⊕ (𝑤 ⋙ 25).
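After the 64 rounds, the status words are added to the words of the input vector, which gives the output of the compression function:
𝑓(ℎ, 𝑚) = (𝐴 + 𝐻1 ‖𝐵 + 𝐻2 ‖𝐶 + 𝐻3 ‖𝐷 + 𝐻4 ‖𝐸 + 𝐻5 ‖𝐹 + 𝐻6 ‖𝐺 + 𝐻7 ‖𝐻 + 𝐻8 ),
where + denotes addition modulo 2^32.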
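The pieces of SHA-256 described in this section can be assembled into a short Python sketch and compared with `hashlib`. The round update follows the standard FIPS 180-4 description; the helper names are ours. The constants are computed with exact integer arithmetic: the 𝐾𝑗 from cube roots, and the initial status words from square roots of the first eight primes, as specified in the standard.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF
PRIMES = [p for p in range(2, 312) if all(p % q for q in range(2, p))][:64]

def icbrt(n):
    """Largest integer x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# First 32 bits of the fractional parts of the cube roots and square roots.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]

def rotr(x, r):
    return (x >> r) | ((x << (32 - r)) & MASK)

def compress(h, block):
    """SHA-256 compression function on eight status words and a 64-byte block."""
    W = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for j in range(16, 64):
        s0 = rotr(W[j - 15], 7) ^ rotr(W[j - 15], 18) ^ (W[j - 15] >> 3)
        s1 = rotr(W[j - 2], 17) ^ rotr(W[j - 2], 19) ^ (W[j - 2] >> 10)
        W.append((s1 + W[j - 7] + s0 + W[j - 16]) & MASK)
    a, b, c, d, e, f, g, hh = h
    for j in range(64):
        S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & MASK & g)
        t1 = (hh + S1 + ch + K[j] + W[j]) & MASK
        S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (S0 + maj) & MASK
        a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    # Final addition of the input status words.
    return [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]

def sha256(message):
    """Merkle-Damgård iteration with padding 1, zeros and the 64-bit length."""
    length = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64) + length
    h = H0
    for i in range(0, len(padded), 64):
        h = compress(h, padded[i:i + 64])
    return b"".join(x.to_bytes(4, "big") for x in h)

print(sha256(b"abc").hex())
print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

The comparison with `hashlib` is a convenient sanity check: a single wrong rotation distance or constant changes the digest completely.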
7.6. SHA-3
Since collisions of MD5 have been found and weaknesses of SHA-1 were already
known, in 2007 the American NIST announced a competition to design a new hash
function called SHA-3. After narrowing down the candidates in three public rounds,
Keccak was selected as the winner of the competition in 2012. The main evaluation cri-
teria were security, performance, flexibility and simplicity of the design. Keccak is not of
Merkle-Damgård type, but rather based on a sponge construction (see Figure 7.6). The
design and the security claim are explained in [Ber11]. The construction is modeled to
behave like a random oracle.
In 2015, the Keccak variants SHA3-224, SHA3-256, SHA3-384, SHA3-512 with out-
put lengths between 224 and 512 bits were standardized [FIP15b]. The SHA-3 instance
of Keccak uses a three-dimensional state array of 5 × 5 × 64 = 1600 bits. The unkeyed
Keccak-𝑓[1600] permutation operates on the 1600-bit state array and it is assumed that
𝑓 behaves like a random permutation. In each step, 𝑟 < 1600 message bits are pro-
cessed. 𝑟 is called the rate, and the remaining number of 𝑐 = 1600 − 𝑟 bits is called the
capacity. The Keccak-𝑓[1600] permutation
is parametrized by 𝑟 and 𝑐. SHA-3 specifies the combinations which are shown in Ta-
ble 7.2.
supports output lengths 𝑙 ∈ {224, 256, 384, 512}. Depending on 𝑙, the rate 𝑟 and the
capacity 𝑐 are fixed (see Table 7.2). First, the input message 𝑚 is padded such that the
length of 𝑚′ is a multiple of 𝑟. The padding rule of the SHA-3 family is to append the
pattern 0110 … 01. The padded message 𝑚′ is split into blocks 𝑚1 , 𝑚2 , … , 𝑚𝑁 of length 𝑟:
𝑚′ = 𝑚‖0110 … 01 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
The state 𝑠 = 𝑠1 ‖𝑠2 is initialized by the zero vector 0𝑟 ‖0𝑐 . During the absorbing phase,
the message blocks are XORed into the leftmost 𝑟 bits of the state and the permutation
Keccak-𝑓[1600] is applied to the full state of 𝑟 + 𝑐 = 1600 bits. The state is updated for
each message block:
𝑠 ← 𝑓((𝑠1 ⊕ 𝑚𝑖 )‖𝑠2 ) for 𝑖 = 1, … , 𝑁, where 𝑓 denotes the Keccak-𝑓[1600] permutation.
Finally, the SHA-3 hash value is computed using a single squeezing operation; 𝐻(𝑚) is
defined by the leftmost 𝑙 bits of the resulting state vector 𝑠1 (see Figure 7.6). ♢
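The relation between output length, rate and capacity can be tabulated with a few lines of Python. The sketch assumes the SHA-3 convention that the capacity is twice the output length, 𝑐 = 2𝑙, and that the padding 0110 … 01 adds at least four bits; the function names are ours, and `hashlib` is only used to confirm the digest lengths.

```python
import hashlib

STATE_BITS = 1600

def sha3_parameters(l):
    """Rate r and capacity c (in bits) for output length l, assuming c = 2l."""
    c = 2 * l
    return STATE_BITS - c, c

def absorbed_blocks(message_bits, l):
    """Number of r-bit blocks after padding; the padding pattern
    0110...01 has a minimum length of four bits (0111)."""
    r, _ = sha3_parameters(l)
    return (message_bits + 4 + r - 1) // r

for l in (224, 256, 384, 512):
    r, c = sha3_parameters(l)
    digest_bits = 8 * hashlib.new(f"sha3_{l}", b"").digest_size
    print(l, r, c, digest_bits)
```

A larger capacity leaves fewer bits per call of the permutation for message data, so the variants with longer digests process long messages more slowly.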
Figure 7.6. Absorbing message blocks of length 𝑟 into the state and finally squeezing
out the SHA-3 hash of length 𝑙 < 𝑟.
7.7. Summary
• Hash functions take messages of arbitrary length as input and output a short
message digest.
• Hash functions should be collision-resistant; although collisions must exist, it
should be very hard to find one.
• Many hash functions (in particular SHA-1 and SHA-2) are based on the Merkle-
Damgård construction. A compression function is recursively applied to the
input blocks.
• Collisions of SHA-1 have been found using significant computing resources.
The SHA-2 variants SHA-224, SHA-256, SHA-384 and SHA-512 are constructed
in a similar way to SHA-1, but they generate longer digests and are assumed to
be more secure.
• SHA-3 is the new standard hash function. SHA-3 is not a Merkle-Damgård
hash function, but it is based on a sponge construction. The internal state array
has 1600 bits and the Keccak-𝑓[1600] permutation operates on the state.
𝑓(𝑘, 𝑚) = 𝐸𝑘 (𝑚).
Show that 𝑓 – in contrast to the Davies-Meyer construction – is not collision-resistant.
8. Give a table of values of the Boolean functions 𝐶ℎ and 𝑀𝑎𝑗 used by SHA-1 and
SHA-2. Show that the XOR (⊕) operations in these functions can be replaced by
OR (∨).
9. Suppose 𝑚 is a message of length 10^9 bits. How many calls to the SHA-1 compression
function, the SHA-2 compression function and the SHA-3 permutation
Keccak-𝑓[1600], respectively, are required to compute the hash value 𝐻(𝑚)?
10. Explain how the rate 𝑟 and the capacity 𝑐 are related to the performance and the
security level of SHA-3.
11. Suppose Keccak-𝑓[1600] was a linear map on 𝐺𝐹(2)^1600. Fix a SHA-3 output
length. Show that a collision in the associated SHA-3 hash function can be con-
structed using an efficient algorithm.
12. Suppose Keccak-𝑓[1600] was not a permutation and a collision had been found.
Can this be used to construct a collision in SHA-3?
Chapter 8
Message Authentication Codes
A message authentication code (MAC) is a cryptographic tag which protects the in-
tegrity and the origin of a message. A correct tag shows that the data has not been
tampered with by an adversary, and it also protects against accidental errors. MACs
are widely used (for example in network security protocols), since encryption alone is
not sufficient to protect the data. In fact, most encryption schemes cannot prevent the
manipulation of messages. Stream ciphers (or block ciphers in CTR, OFB and CFB
mode) are particularly vulnerable, since an adversary can change selected bits.
Message authentication codes use a symmetric secret key for tag generation and
verification. This constitutes a major difference to signatures (see Chapter 11), where
messages are signed with a private key and verification is performed with a public key.
The computation of MACs is usually very fast and they can efficiently protect the in-
tegrity of mass data.
We outline the definition of message authentication codes and their security re-
quirements in Section 8.1. Practical constructions of MACs, based on block ciphers
in CBC mode and on hash functions, are covered in Sections 8.2 and 8.3, respectively.
The combination of encryption and message authentication as well as authenticated
encryption schemes are discussed in Section 8.4.
For additional reading, we refer to [KL15] and [GB08].
Definition 8.1. A message authentication code is given by the following spaces and
polynomial-time algorithms:
• A message space ℳ.
• A key space 𝒦.
• A key generation algorithm 𝐺𝑒𝑛(1^𝑛) that takes a security parameter 1^𝑛 as input
and outputs a key 𝑘.
• A tag generation algorithm, which may be randomized. It takes a message 𝑚 and
a key 𝑘 as input and outputs a tag MAC𝑘 (𝑚).
• A deterministic verification algorithm that takes a key 𝑘, a message 𝑚 and a tag 𝑡
and outputs 1 if the tag is valid, or otherwise 0.
Canonical verification means to re-compute MAC𝑘 (𝑚) and to output 1 if
𝑡 = MAC𝑘 (𝑚), and 0 otherwise. Canonical verification is only possible if the tag gen-
eration is deterministic. ♢
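The three algorithms and canonical verification can be illustrated with Python's standard library. In the sketch below, HMAC-SHA-256 serves as a stand-in for a deterministic tag generation algorithm; this choice and the function names `gen`, `mac` and `verify` are ours and are not part of the definition.

```python
import hashlib
import hmac
import secrets

def gen(n=32):
    """Key generation: a uniformly random key of n bytes."""
    return secrets.token_bytes(n)

def mac(key, message):
    """Deterministic tag generation (HMAC-SHA-256 as a stand-in MAC)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key, message, tag):
    """Canonical verification: recompute the tag and compare."""
    return 1 if hmac.compare_digest(mac(key, message), tag) else 0

k = gen()
t = mac(k, b"transfer 100 EUR to Alice")
print(verify(k, b"transfer 100 EUR to Alice", t))   # 1
print(verify(k, b"transfer 900 EUR to Alice", t))   # 0
```

The comparison uses `hmac.compare_digest`, which runs in time independent of the position of the first differing byte, so that the verification does not leak information about the correct tag through timing.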
A message authentication tag is usually short and does not include the message.
Verification therefore requires the message, the tag and the key. Note that hash values
can also be leveraged to verify the integrity of data. However, the verifier needs to
access the authentic hash value, which is impossible in many applications.
The security of message authentication codes is determined by the difficulty of
forging a valid tag without knowing the key. We assume that an adversary can choose
messages and obtain a valid MAC. This is called a chosen message attack and corre-
sponds to a situation in practice, where many messages and their MACs are known.
The scheme is considered to be insecure if an adversary can generate a new message
and an associated valid MAC in polynomial time.
Definition 8.2. Suppose a message authentication code is given. Consider the following
experiment (see Figure 8.1): a challenger takes 1^𝑛 as input and generates a key
𝑘 ← 𝐺𝑒𝑛(1^𝑛). An adversary 𝐴 is given 1^𝑛. They can choose multiple messages 𝑚 and
obtain the tags MAC𝑘 (𝑚) from an oracle. The adversary succeeds if they can produce
a message 𝑚, which they did not query previously, and a valid tag 𝑡 of that message. In
this case, the challenger outputs 1, and otherwise 0.
The scheme is called existentially unforgeable under an adaptive chosen-message
attack (EUF-CMA secure), or just secure, if for all probabilistic polynomial-time adver-
saries, the probability of success is negligible in 𝑛.
Remark 8.3. The above experiment can be slightly modified by accepting a valid mes-
sage/tag pair, where only the tag is new and the message might have been queried
before. Unforgeable MACs in this experiment are called strongly secure. If canonical
verification is used, then secure MACs are automatically strongly secure, since in this
case a message uniquely determines the tag.
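Figure 8.1. MAC forgery experiment: the challenger generates a key 𝑘 ← 𝐺𝑒𝑛(1^𝑛). The adversary is given 1^𝑛, chooses messages 𝑚 and obtains the tags 𝑡 = MAC𝑘(𝑚) from the oracle. Finally, the adversary chooses 𝑚′, forges a tag 𝑡′ and sends (𝑚′, 𝑡′); the challenger verifies (𝑚′, 𝑡′) and outputs 1 or 0.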
In the above game, the adversary cannot ask the oracle to verify a tag. Now one can
change the experiment by allowing verification queries. Since this makes the adversary
more powerful, the associated definition of security could be stronger. However, one
can show that a strongly secure MAC (for example a MAC with canonical verification)
is also secure in this experiment. An adversary can verify a message and a tag by run-
ning the original experiment. Since the number of verification queries is polynomial,
the security definitions are equivalent.
Remark 8.4. MACs do not protect against the replay of messages and tags. If replay
protection is required, for example in network protocols, then an additional counter (or
a timestamp) should be used. The counter is added to the message and integrated into
the tag computation so that the counter cannot be forged. The sender increments the
counter for every new message. The receiver keeps track of the counter and discards a
message if a counter is re-used or if the tag is invalid. ♢
The above Theorem has a proof by reduction (see [KL15]). An adversary, who can
forge valid MACs, is also able to distinguish 𝐹 from a random function.
Since the prf-construction only takes messages of fixed length as input, it is rarely
used in practice. The construction can be extended to messages of arbitrary length,
for example by a sequence of tags (see Exercise 5), but this is not very efficient. In the
following two sections, we describe two widely used MAC constructions, CBC MAC
and HMAC. They are based on block ciphers and hash functions, respectively.
The CBC MAC computation is similar to encryption in CBC mode. However, the
initialization vector is a zero string and only the last ciphertext block is output. One
can show that the basic CBC MAC is secure for fixed-length messages (see [KL15]).
Theorem 8.7. If 𝐸 is a pseudorandom permutation, then the basic CBC MAC is EUF-
CMA secure for messages of fixed length 𝑁𝑙.
Remark 8.8. The bijectivity of 𝐸𝑘 is not required in Definition 8.6, and Theorem 8.7
remains true for a pseudorandom function family. ♢
The above basic CBC MAC is not secure for messages of arbitrary length: suppose
𝑚 is a message of length 𝑙 so that 𝑡 = MAC𝑘 (𝑚) = 𝐸𝑘 (𝑚). Now an adversary constructs
the message 𝑚′ = 𝑚 ‖ (𝑡 ⊕ 𝑚) of length 2𝑙. Since
MAC𝑘 (𝑚′ ) = 𝐸𝑘 (𝑡 ⊕ (𝑡 ⊕ 𝑚)) = 𝐸𝑘 (𝑚) = 𝑡,
the same tag is also valid for 𝑚′ . This shows that the basic CBC MAC needs to be
modified for messages of variable length. One approach is to prepend the length of the
message which prevents this attack (see Exercise 6). Another option is to transform the
last input block using a secret key, which prevents the fabrication of valid tags. Below,
8.2. CBC MAC 155
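The forgery against the basic CBC MAC for variable-length messages can be reproduced in a few lines. The sketch below rests on an assumption: since the Python standard library has no block cipher, the function `E` is a stand-in pseudorandom function built from truncated HMAC-SHA256 (Remark 8.8 states that bijectivity is not needed). All names are illustrative.

```python
import hashlib
import hmac

BLOCK = 16  # block length l in bytes


def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_k: a PRF from truncated HMAC-SHA256 (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_mac(key: bytes, message: bytes) -> bytes:
    """Basic CBC MAC: zero IV, only the last chaining value is output."""
    if len(message) == 0 or len(message) % BLOCK != 0:
        raise ValueError("message must consist of full blocks")
    state = bytes(BLOCK)
    for i in range(0, len(message), BLOCK):
        state = E(key, xor(state, message[i:i + BLOCK]))
    return state


key = b"k" * 16
m = b"sixteen byte msg"   # one block of length l
t = cbc_mac(key, m)       # tag obtained from the MAC oracle
forged = m + xor(t, m)    # m' = m || (t XOR m), built without the key
```

The two-block message `forged` was never queried, yet `cbc_mac(key, forged)` equals `t`, exactly as in the computation above.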
8.3. HMAC
Another widely used MAC construction is based on hash functions. Hash functions
are usually faster than encryption algorithms. However, hash functions are unkeyed in
practice, so they cannot be used directly as MACs. But note that the general Merkle-
Damgård transform (see Section 7.3) takes an initialization vector (or key) 𝐼𝑉 as input.
The obvious prefix construction 𝐻𝑘 (𝑚) = 𝐻(𝑘, 𝑚) (with 𝑘 = 𝐼𝑉) or 𝐻𝑘 (𝑚) = 𝐻(𝑘‖𝑚)
(for an unkeyed hash function with fixed 𝐼𝑉) is insecure for messages of variable length
if 𝐻 is a Merkle-Damgård hash function (length extension attack; see Exercise 8). Note
that the SHA-3 family is not vulnerable to this attack.
The Hash-based Message Authentication Code (HMAC) is based on two nested
hashing operations and protects against length extension attacks. HMAC is described
in RFC 2104 [HK97] and standardized in [FIP08].
Definition 8.13. Let 𝐻 be a Merkle-Damgård hash function and suppose 𝑏 is the input
block length in bytes of the underlying compression function. For SHA-1 and SHA-
256, one has 𝑙 = 512 bits and thus 𝑏 = 64 bytes. The message space is ℳ = {0, 1}∗
and HMAC keys 𝑘 ← {0, 1}𝑛 are chosen uniformly at random. We assume that the
byte length of 𝑘 is, at most, 𝑏. Define ipad and opad strings by repeating the bytes 36
and 5C (in hexadecimal notation), respectively, 𝑏 times. The key 𝑘 is padded by zeros such that the byte length
of 𝑘̄ = (𝑘 ‖ 0 … 0) is 𝑏. Then the HMAC message authentication tag of a message 𝑚 is
defined as
HMAC(𝑘, 𝑚) = 𝐻((𝑘̄ ⊕ opad) ‖ 𝐻((𝑘̄ ⊕ ipad) ‖ 𝑚)).
The verification of a message 𝑚 and a tag 𝑡 is canonical: compute HMAC(𝑘, 𝑚) and
compare the result with 𝑡. The tag is valid if 𝑡 = HMAC𝑘 (𝑚). ♢
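Definition 8.13 translates almost literally into code. The following sketch assumes SHA-256 as the hash function (so 𝑏 = 64) and a key of at most 𝑏 bytes; the function names are illustrative. The result can be compared with Python's `hmac` module.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC as in Definition 8.13, with H = SHA-256 and b = 64 bytes."""
    b = 64  # input block length of the compression function in bytes
    if len(key) > b:
        raise ValueError("this sketch assumes a key of at most b bytes")
    padded = key + bytes(b - len(key))             # key padded by zeros
    inner_key = bytes(x ^ 0x36 for x in padded)    # padded key XOR ipad
    outer_key = bytes(x ^ 0x5C for x in padded)    # padded key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Canonical verification: recompute the tag and compare."""
    return hmac.compare_digest(hmac_sha256(key, message), tag)


tag = hmac_sha256(b"secret key", b"message")
reference = hmac.new(b"secret key", b"message", hashlib.sha256).digest()
```

The comparison uses `hmac.compare_digest` so that the running time does not depend on the position of the first differing byte.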
The above security guarantee for NMAC is quite strong. Does it also apply to
HMAC ? The main differences are: a) HMAC uses an unkeyed hash function and is
keyed via the data input, b) length padding is applied, and c) the HMAC keys 𝑘 ⊕ opad
and 𝑘 ⊕ ipad are not independent. Nevertheless, one has the following result [Bel06]:
8.4. Authenticated Encryption 157
Theorem 8.15. Let 𝑓 ∶ {0, 1}^𝑛 × {0, 1}^𝑙 → {0, 1}^𝑛 be a compression function and let
𝑓̄ ∶ {0, 1}^𝑛 × {0, 1}^𝑙 → {0, 1}^𝑛 be the dual function with the same values as 𝑓, but keyed
via the second component. Let 𝐻 be the Merkle-Damgård hash function associated with
𝑓. If 𝑓 is a prf and 𝑓̄ is a prf under restricted related-key attacks, then HMAC
is a pseudorandom function and an EUF-CMA secure MAC for messages of arbitrary
length. ♢
The above Theorem reduces the security of HMAC to the pseudorandomness of
𝑓 and 𝑓̄. The related-key attack against 𝑓̄ can be restricted to two keys (𝑘 ⊕ opad and
𝑘 ⊕ ipad) and two oracle queries. Note that only pseudorandomness is required, so that
HMAC could still be secure when used with hash functions whose collision resistance
is compromised.
Remark 8.16. HMAC is widely used in practice, not only as a message authentication
code, but also as a pseudorandom function and as a building block in key derivation
functions. For example, an HMAC-based Extract-and-Expand Key Derivation Function
(HKDF) is described in RFC 5869 [KE10]. Multiple HMAC calls with the same key and
different input data can generate the desired number of pseudorandom output bits. ♢
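The idea of Remark 8.16, several HMAC calls with the same key and different input data, can be illustrated by an extract-and-expand construction in the style of HKDF. This is a sketch written from the general structure of RFC 5869 with SHA-256; it is not a vetted implementation, and the function names are illustrative.

```python
import hashlib
import hmac

HASH_LEN = 32  # output length of SHA-256 in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract step: condense the input key material into a key prk."""
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand step: chained HMAC calls with the same key prk."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # each block depends on the previous block, info and a counter byte
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


prk = hkdf_extract(b"salt", b"input key material")
okm = hkdf_expand(prk, b"context", 80)  # 80 pseudorandom output bytes
```

Here 80 output bytes require three HMAC calls, all keyed with `prk`.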
Truncated versions of HMAC are often used for message authentication, for ex-
ample HMAC-SHA1-80. These variants are defined by the leftmost 𝑡 bits of the HMAC
value. It is recommended that 𝑡 should not be less than 80.
Remark 8.17. HMAC was designed for Merkle-Damgård hash functions, for example
MD5 and SHA-1, which are vulnerable to length extension attacks. SHA-3 (Keccak)
does not need the nested approach and a MAC can be defined by prepending the key to
the message. The NIST publication [KCP16] describes the Keccak Message Authenti-
cation Code (KMAC). Keccak can also be used as a pseudorandom function and a key
derivation function.
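[Figure: ciphertext forgery experiment between adversary and challenger/oracle. The challenger takes 1^𝑛 and chooses 𝑘 ← {0, 1}^𝑛. The adversary chooses plaintexts 𝑚 and receives the ciphertexts 𝑐 = ℰ𝑘 (𝑚). Finally, the adversary forges a ciphertext 𝑐′ ≠ 𝑐; the challenger checks whether 𝒟𝑘 (𝑐′ ) ≠ ⟂ and outputs 1 or 0.]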
Note that the adversary must produce a valid ciphertext 𝑐. Depending on the
scheme, the decryption of a given string 𝑐 can be invalid, i.e., 𝒟𝑘 (𝑐) = ⟂.
Definition 8.19. An encryption scheme is called an authenticated encryption scheme
if it is CCA2-secure and unforgeable. ♢
We already know (see Remark 2.53) that block ciphers in CBC or CTR mode are
malleable and therefore forgeable. An obvious approach to obtaining an authenticated
encryption scheme is to combine a CPA-secure encryption scheme and a secure MAC.
Several combinations are possible, and it turns out that the encrypt-then-authenticate
construction is the best choice.
Definition 8.20. Suppose a symmetric-key encryption scheme and a message authentication
code are given. We assume that the key generation algorithms choose uniform keys
of length 𝑛. Then define a combined encryption and message authentication scheme
as follows: on input 1^𝑛, choose two uniform keys 𝑘𝐸 ← {0, 1}^𝑛 and 𝑘𝑀 ← {0, 1}^𝑛. Encryption
of a plaintext 𝑚 with a key (𝑘𝐸 , 𝑘𝑀 ) is defined by
ℰ(𝑘𝐸 ,𝑘𝑀 ) (𝑚) = (𝑐, 𝑡),
where 𝑐 ← ℰ𝑘𝐸 (𝑚) and 𝑡 = MAC𝑘𝑀 (𝑐). For decryption of (𝑐, 𝑡), one first verifies the
tag and outputs 𝒟𝑘𝐸 (𝑐) if the tag is valid. If the tag is not valid or 𝒟𝑘𝐸 (𝑐) = ⟂, then
output ⟂. ♢
Note that the tag is computed from the ciphertext, not from the plaintext. The next
Theorem states that the above definition gives a CCA2-secure and unforgeable scheme
if the underlying encryption scheme and the MAC are secure.
Theorem 8.21. Consider the encrypt-then-authenticate construction defined above.
Suppose that the encryption scheme is CPA-secure and the message authentication code is
a strongly secure MAC ( for example a secure MAC with canonical verification). Then the
encrypt-then-authenticate construction gives an authenticated encryption scheme. ♢
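The encrypt-then-authenticate construction can be sketched with standard-library tools only. The encryption scheme below is an assumption made for illustration: a randomized counter-mode stream cipher whose key stream comes from HMAC-SHA256 used as a pseudorandom function. The MAC is HMAC-SHA256 with an independent key, and `None` plays the role of the error symbol ⟂.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16


def keystream(k_e: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode key stream from a PRF (HMAC-SHA256, an assumption)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(k_e, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(k_e: bytes, k_m: bytes, plaintext: bytes):
    nonce = os.urandom(NONCE_LEN)  # fresh randomness for each encryption
    stream = keystream(k_e, nonce, len(plaintext))
    c = nonce + bytes(p ^ s for p, s in zip(plaintext, stream))
    t = hmac.new(k_m, c, hashlib.sha256).digest()  # tag over the ciphertext
    return c, t


def decrypt(k_e: bytes, k_m: bytes, c: bytes, t: bytes):
    expected = hmac.new(k_m, c, hashlib.sha256).digest()
    if len(c) < NONCE_LEN or not hmac.compare_digest(expected, t):
        return None  # error symbol: tag invalid, nothing is decrypted
    nonce, body = c[:NONCE_LEN], c[NONCE_LEN:]
    stream = keystream(k_e, nonce, len(body))
    return bytes(x ^ s for x, s in zip(body, stream))


k_e, k_m = os.urandom(32), os.urandom(32)  # two independent uniform keys
c, t = encrypt(k_e, k_m, b"attack at dawn")
tampered = c[:-1] + bytes([c[-1] ^ 1])     # flip one ciphertext bit
```

The tag is verified before any decryption takes place, so the manipulated ciphertext `tampered` is rejected without revealing anything about its plaintext.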
In our description below, we assume that the additional authenticated data (AAD)
is 128 bits long at most. AAD may also be empty.
Definition 8.24. (GCM mode) Let 𝐸 be a block cipher with 128-bit block length. For
each encryption, a uniform initialization vector (or a nonce) 𝐼𝑉 ← {0, 1}^96 is chosen.
The plaintext message 𝑚 is split into blocks of length 128 bits where the last block can
be shorter. We write 𝑚 = 𝑚1 ‖ 𝑚2 ‖ … ‖ 𝑚𝑁 . Define a 128-bit counter by 𝑐𝑡𝑟 = 𝐼𝑉 ‖ 0^31 ‖ 1.
Applying the CTR mode (see Definition 2.48) gives:
𝑐𝑖 = 𝐸𝑘 (𝑐𝑡𝑟 + 𝑖) ⊕ 𝑚𝑖 for 𝑖 = 1, 2, … , 𝑁 and 𝑐 = 𝐼𝑉 ‖ 𝑐1 ‖ 𝑐2 ‖ … ‖ 𝑐𝑁 .
Define the 128-bit hash key 𝐻 = 𝐸𝑘 (0^128 ) and let 𝐴 = 𝐴𝐴𝐷. Then define
𝑋1 = 𝐴 ⋅ 𝐻,
𝑋𝑖 = (𝑋𝑖−1 ⊕ 𝑐𝑖−1 ) ⋅ 𝐻 for 𝑖 = 2, … , 𝑁 + 1, and
𝑋𝑁+2 = (𝑋𝑁+1 ⊕ (|𝐴| ‖ |𝑐|)) ⋅ 𝐻.
𝐴 and 𝑐𝑁 are padded by zeros, if necessary. The multiplication by 𝐻 is defined in the
field 𝐺𝐹(2^128 ) as described above, and the bit lengths |𝐴| and |𝑐| are represented by 64-
bit integers under the big-endian convention. Then the authentication tag 𝑡 is defined by
𝑡 = 𝑋𝑁+2 ⊕ 𝐸𝑘 (𝑐𝑡𝑟)
(see Figure 8.3), and the complete authenticated ciphertext is given by (𝑐, 𝑡, 𝐴𝐴𝐷).
For the decryption of (𝑐, 𝑡, 𝐴𝐴𝐷), the authentication tag associated to 𝑐 and 𝐴𝐴𝐷
is computed using the same formulas as above. If the result is not equal to the given tag
𝑡, then output the error symbol ⟂. Otherwise the plaintext is computed by decrypting
𝑐 in CTR mode:
𝑚𝑖 = 𝐸𝑘 (𝑐𝑡𝑟 + 𝑖) ⊕ 𝑐𝑖 for 𝑖 = 1, 2, … , 𝑁 and 𝑚 = 𝒟𝑘 (𝑐) = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 . ♢
[Figure: chain of multiplications by 𝐻 and XOR operations that ends in the tag 𝑡.]
Figure 8.3. Computation of the GCM tag 𝑡 from the counter mode ciphertext 𝑐 =
𝐼𝑉 ‖ 𝑐1 ‖ … ‖ 𝑐𝑁 . 𝐻 = 𝐸𝑘 (0^128 ) is the hash key, and the additional authenticated data is
denoted by 𝐴.
Note that the GCM mode does not strictly follow the encrypt-then-authenticate
approach, because the same key and counter are used for encryption and message
authentication. The security of GCM is proved in [MV04].
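The hash part of the GCM tag computation can be sketched without a block cipher. The code below assumes the bit ordering and the reduction polynomial 𝑥^128 + 𝑥^7 + 𝑥^2 + 𝑥 + 1 of the NIST GCM specification, represents blocks as Python integers (most significant bit first), and takes |𝑐| as the bit length of 𝑐1 ‖ … ‖ 𝑐𝑁. The hash key `h` would be 𝐸𝑘 (0^128 ) in a real implementation; here it is an arbitrary example value, and the final XOR with 𝐸𝑘 (𝑐𝑡𝑟) is omitted.

```python
R = 0xE1 << 120  # encodes x^7 + x^2 + x + 1 in the reflected bit order


def gf_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) (GCM bit convention)."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:  # i-th bit of x, leftmost bit first
            z ^= v
        # multiply v by the polynomial x and reduce if necessary
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def blocks(data: bytes):
    """Split into 128-bit blocks; the last block is padded by zeros."""
    for i in range(0, len(data), 16):
        yield int.from_bytes(data[i:i + 16].ljust(16, b"\x00"), "big")


def ghash(h: int, aad: bytes, ciphertext: bytes) -> int:
    """Computes X_{N+2}: absorb A, then c_1..c_N, then the length block."""
    x = 0
    for block in list(blocks(aad)) + list(blocks(ciphertext)):
        x = gf_mul(x ^ block, h)
    lengths = ((8 * len(aad)) << 64) | (8 * len(ciphertext))  # |A| || |c|
    return gf_mul(x ^ lengths, h)


ONE = 1 << 127  # the field element 1 in this representation
h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E  # example hash key (assumption)
digest = ghash(h, b"header", b"some ciphertext bytes")
```

Only field multiplications and XOR operations are needed, which is why the authentication part of GCM is cheap compared to the block cipher calls.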
8.5. Summary
• A message authentication code (MAC) is a tag that protects the integrity and
the authenticity of a message. The computation of a MAC requires a secret key
and the message.
• A MAC is secure if it is unforgeable under a chosen message attack.
• The CBC-MAC and CMAC constructions output the last ciphertext block in
CBC mode as a tag. CMAC modifies the last plaintext block before the
message is encrypted in CBC mode in order to prevent length extension attacks.
• HMAC is based on a nested hash computation and takes a key and a message
as input.
• CMAC and HMAC are secure under certain assumptions.
• Authenticated encryption schemes are CCA2-secure and unforgeable.
• The encrypt-then-authenticate combination of a CPA-secure encryption
scheme and a strongly secure MAC gives an authenticated encryption scheme.
• The Galois Counter Mode (GCM) extends the CTR mode and provides encryp-
tion as well as message authentication using a single secret key.
This chapter deals with public-key encryption schemes and the RSA cryptosystem. Sec-
tion 9.1 introduces public-key encryption schemes and defines their security require-
ments. Section 9.2 explains the widely used RSA encryption algorithm. The security
of RSA and the necessary assumptions are covered in Section 9.3. RSA (and other
cryptographic schemes) require large prime numbers and Section 9.4 deals with the
generation of such primes. The efficiency of RSA and possible optimizations are dis-
cussed in Section 9.5. We will see that there are some pitfalls in the application of RSA.
This leads to a randomized and padded version of RSA, which is explained in Section
9.6. The security of RSA is closely related to the factoring assumption, and Section 9.7
outlines different factoring algorithms and their complexity.
RSA is a major public-key scheme and is dealt with in all cryptography textbooks,
for example in [PP10]. For the provable security approach we refer to [KL15], [BR05],
[GB08], [Gol01].
164 9. Public-Key Encryption and the RSA Cryptosystem
Historically, the idea of using public encryption keys is relatively new. A major ad-
vantage of this approach is that public keys can be openly exchanged. In addition, one
key pair suffices to receive messages from many communication partners. Obviously,
it is crucial that an adversary is not able to derive the private decryption key from the
public encryption key. Furthermore, the authenticity of public keys can represent a
Diffie and Hellman, influenced by Merkle’s work, were the first to publish a pa-
per [DH76] on public-key cryptography in 1976. They invented a mechanism for se-
cure key distribution over an insecure channel (see Chapter 10). Furthermore, they
described the fundamentals of public-key cryptography. Rivest, Shamir and Adleman
then published the first public-key encryption scheme (RSA) in 1978 [RSA78]. Al-
though Ellis, Cocks and Williamson had already invented public key mechanisms sev-
eral years before, they were not allowed to publish their results because they worked
for the British secret service.
Definition 9.1. (compare Definition 2.1) A public-key encryption scheme (public-key
cryptosystem) is given by:
• A plaintext space ℳ.
• A ciphertext space 𝒞.
• A key space 𝒦 = 𝒦𝑝𝑘 × 𝒦𝑠𝑘 (pairs of public and private keys).
• A randomized key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes a security parameter
𝑛 as input and outputs a pair of keys (𝑝𝑘, 𝑠𝑘).
• An encryption algorithm ℰ = {ℰ𝑝𝑘 | 𝑝𝑘 ∈ 𝒦𝑝𝑘 } which may be randomized. It
takes a public key and a plaintext as input, and outputs the ciphertext or an error
symbol ⟂ if the plaintext is invalid.
• A deterministic decryption algorithm 𝒟 = {𝒟𝑠𝑘 | 𝑠𝑘 ∈ 𝒦𝑠𝑘 } that takes a private
key and a ciphertext as input and outputs the plaintext or an error symbol ⟂ if the
input is invalid.
All algorithms must run in polynomial time. The scheme provides correct decryption
if 𝒟𝑠𝑘 (ℰ𝑝𝑘 (𝑚)) = 𝑚 for each key pair (𝑝𝑘, 𝑠𝑘) ∈ 𝒦 and all plaintexts 𝑚 ∈ ℳ (see
Figure 9.1). ♢
[Figure: 𝑚 → ℰ (with 𝑝𝑘) → 𝑐 → 𝒟 (with 𝑠𝑘) → 𝑚]
Figure 9.1. Encryption uses a public key 𝑝𝑘 and decryption a private key 𝑠𝑘.
9.1. Public-Key Cryptosystems 165
Definition 9.2. Suppose a public-key encryption scheme is given. Consider the fol-
lowing experiment (see Figure 9.2): a challenger takes the security parameter 1𝑛 as
input, generates a key pair (𝑝𝑘, 𝑠𝑘) ∈ 𝒦 by running 𝐺𝑒𝑛(1𝑛 ) and chooses a random bit
𝑏 ← {0, 1}. An adversary 𝐴 is given the public key 𝑝𝑘 and 1𝑛 . The private key 𝑠𝑘 and 𝑏
are not known to the adversary, who can encrypt arbitrary messages using the public
key 𝑝𝑘. The adversary chooses two messages 𝑚0 , 𝑚1 ∈ ℳ of the same length. Then
the challenger encrypts one of the messages, and the ciphertext 𝑐 = ℰ𝑝𝑘 (𝑚𝑏 ) is given
to 𝐴. The adversary tries to guess 𝑏 and outputs a bit 𝑏′ . The challenger outputs 1 if
𝑏 = 𝑏′ , and 0 otherwise. The IND-CPA advantage of the adversary 𝐴 is defined as
Adv (𝐴) = |𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏]|.
The scheme has indistinguishable encryptions under a chosen plaintext attack (IND-
CPA secure or CPA-secure) if for every probabilistic polynomial time adversary 𝐴, the
advantage Adv (𝐴) is negligible in 𝑛. ♢
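[Figure: the challenger runs (𝑝𝑘, 𝑠𝑘) ← 𝐺𝑒𝑛(1^𝑛) and chooses 𝑏 ← {0, 1}. The adversary receives 𝑝𝑘 and 1^𝑛, chooses 𝑚0 , 𝑚1 with |𝑚0 | = |𝑚1 | and sends them to the challenger. The challenger returns 𝑐 ← ℰ𝑝𝑘 (𝑚𝑏 ). The adversary selects 𝑚0 (𝑏′ = 0) or 𝑚1 (𝑏′ = 1); the challenger compares 𝑏 and 𝑏′ and outputs 1 or 0.]
Figure 9.2. Public-key EAV and CPA experiment. The adversary can also encrypt any
chosen plaintext.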
Since an adversary can encrypt 𝑚0 and 𝑚1 and compare the result with the chal-
lenge ciphertext 𝑐, it is obvious that a public-key scheme with deterministic encryption
cannot be IND-CPA secure.
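The distinguishing attack on deterministic public-key encryption takes only a few lines. The sketch uses a toy deterministic scheme (modular exponentiation with the small public key 𝑁 = 667, 𝑒 = 3) purely as an assumption for illustration; any deterministic public encryption function would do.

```python
import secrets

N, e = 667, 3  # toy public key, far too small for real use


def encrypt(m: int) -> int:
    """Deterministic public-key encryption: anyone can compute it."""
    return pow(m, e, N)


def adversary_guess(m0: int, m1: int, challenge: int) -> int:
    """Re-encrypt m0 and compare the result with the challenge ciphertext."""
    return 0 if encrypt(m0) == challenge else 1


# Challenger side of the IND-CPA experiment
b = secrets.randbelow(2)            # secret random bit
m0, m1 = 100, 200                   # messages chosen by the adversary
challenge = encrypt((m0, m1)[b])

guess = adversary_guess(m0, m1, challenge)
```

The adversary's guess is always correct here, so the advantage is 1 instead of negligible.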
Remark 9.3. A more powerful adversary is able to perform an adaptive chosen cipher-
text attack (CCA2). In the CCA2 experiment, the adversary can additionally request
the decryption of arbitrary ciphertexts (before and after choosing two plaintext mes-
sages), except that the challenge ciphertext 𝑐 cannot be queried (compare Figure 2.5 in
the secret-key case). ♢
The construction of secure public-key encryption schemes is far from trivial, since
encryption is public but decryption must be hard without the private key. The construction
can be based on a family of trapdoor one-way permutations. Such permutations
are easy to compute, but hard to invert without the trapdoor information,
which corresponds to the private key. It should be mentioned that hardness of
inversion is only required for uniform random input. Obviously, an adversary can pre-
pare a list of input values, compute the associated output and use that list for inversion.
We refer to [Gol01] and [KL15] on how to construct a secure public-key encryption
scheme from a family of trapdoor permutations.
All known constructions of public-key schemes are based on hard number-theoretic
problems, which also provide a security guarantee for these schemes. This represents
an advantage over secret-key schemes, where such guarantees do not exist. However,
public-key schemes are much less efficient, and, in practice, such schemes are only ap-
plied to a few blocks.
The RSA algorithm, which is explained in the next section, uses exponentiation
modulo a public composite number 𝑁 as its one-way permutation. The prime factors
of 𝑁 represent the private trapdoor information that permits the inversion.
for 𝑝 ≠ 𝑞 and ord(ℤ∗𝑁 ) = 𝜑(𝑁) = (𝑝 − 1)(𝑞 − 1). One chooses 𝑒 ∈ ℤ such that
Then 𝑝𝑘 = (𝑒, 𝑁) forms the public key and 𝑠𝑘 = (𝑑, 𝑁) the private key. 𝑁 is called the
RSA modulus, 𝑒 is the encryption exponent and 𝑑 is the decryption exponent. The factors
𝑝, 𝑞 and 𝜑(𝑁) must remain secret, since 𝑑 can be efficiently derived from any of these
numbers. For encryption, the plaintext is raised to the power 𝑒 and reduced modulo 𝑁.
Decryption works similarly, but raises the ciphertext to the power 𝑑.
𝑒𝑑 ≡ 1 mod (𝑝 − 1)(𝑞 − 1)
are chosen as explained above. 𝐺𝑒𝑛(1𝑛 ) outputs the public key 𝑝𝑘 = (𝑒, 𝑁) and
the private key 𝑠𝑘 = (𝑑, 𝑁).
• The plaintext and the ciphertext space is ℤ∗𝑁 .
• The deterministic encryption algorithm takes a plaintext 𝑚 ∈ ℤ∗𝑁 and the public
key 𝑝𝑘 as input and outputs
𝑐 = ℰ𝑝𝑘 (𝑚) = 𝑚^𝑒 mod 𝑁.
• The decryption algorithm takes a ciphertext 𝑐 ∈ ℤ∗𝑁 and the private key 𝑠𝑘 as
input and outputs
𝑚 = 𝒟𝑠𝑘 (𝑐) = 𝑐^𝑑 mod 𝑁.
The scheme is only defined for messages of fixed length. ♢
For the correctness of the RSA scheme one has to show that
(𝑚^𝑒 )^𝑑 ≡ 𝑚 mod 𝑁
for all 𝑚 ∈ ℤ∗𝑁 . But this follows from Euler’s Theorem (see Theorem 4.15 and Exercise
4.4): let 𝑚 ∈ ℤ∗𝑁 ; then we have
𝑚^𝜑(𝑁) ≡ 1 mod 𝑁.
Since 𝑒𝑑 = 1 + 𝑘 𝜑(𝑁) for some integer 𝑘, it follows that 𝑚^(𝑒𝑑) = 𝑚 ⋅ (𝑚^𝜑(𝑁) )^𝑘 ≡ 𝑚 mod 𝑁.
Example 9.5. Suppose Bob’s RSA key is given by 𝑝 = 29, 𝑞 = 23, 𝑁 = 𝑝𝑞 = 667, 𝑒 = 3,
𝑑 = 411. This defines an admissible RSA cryptosystem, since 𝑒 = 3 is relatively prime
to 𝜑(𝑁) = (𝑝 − 1)(𝑞 − 1) = 616. The multiplicative inverse 𝑑 = 411 can be computed
using the Extended Euclidean Algorithm: 616 = 205 ⋅ 3 + 1, so that 1 = 616 − 205 ⋅ 3.
Hence 𝑑 = (3 mod 616)^(−1) ≡ −205 ≡ 411. The public key 𝑝𝑘 = (3, 667) is published
by Bob. If Alice wants to send him the message 𝑚 = 108, she will encrypt it as follows:
𝑐 = ℰ𝑝𝑘 (𝑚) = 𝑚^𝑒 = 108^3 mod 667 ≡ 416.
She sends 𝑐 = 416 to Bob, who is able to decrypt the ciphertext using his private key
𝑠𝑘 = (411, 667):
𝑚 = 𝒟𝑠𝑘 (𝑐) = 𝑐^𝑑 = 416^411 mod 667 ≡ 108. ♢
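The numbers of Example 9.5 can be checked with Python's built-in modular arithmetic. The sketch assumes Python 3.8 or later, where `pow(e, -1, phi)` computes a modular inverse; the variable names are illustrative.

```python
from math import gcd

# Key generation with the toy primes of Example 9.5
p, q, e = 29, 23, 3
N = p * q
phi = (p - 1) * (q - 1)
if gcd(e, phi) != 1:
    raise ValueError("e must be relatively prime to phi(N)")
d = pow(e, -1, phi)       # decryption exponent: e * d = 1 mod phi(N)

# Plain RSA encryption and decryption
m = 108
c = pow(m, e, N)          # c = m^e mod N
recovered = pow(c, d, N)  # m = c^d mod N
```

Fast modular exponentiation keeps the computation of 416^411 mod 667 to a handful of multiplications.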
𝑝² − (𝑁 − 𝜑(𝑁) + 1)𝑝 + 𝑁 = 0.
The roots of the quadratic equation 𝑥² − (𝑁 − 𝜑(𝑁) + 1)𝑥 + 𝑁 = 0 are 𝑝 and 𝑞. Hence
𝜑(𝑁) must also be kept secret.
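Solving this quadratic equation is a matter of integer arithmetic. The following sketch uses 𝑝 + 𝑞 = 𝑁 − 𝜑(𝑁) + 1 and an integer square root; the function name `factor_from_phi` is illustrative.

```python
from math import isqrt


def factor_from_phi(N: int, phi: int):
    """Recover the primes p and q from N = p*q and phi(N) = (p-1)(q-1)."""
    s = N - phi + 1            # s = p + q
    disc = s * s - 4 * N       # discriminant, equals (p - q)^2
    if disc < 0:
        raise ValueError("inconsistent input")
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("inconsistent input")
    return (s - r) // 2, (s + r) // 2


factors = factor_from_phi(667, 616)
```

For 𝑁 = 667 and 𝜑(𝑁) = 616 the function returns the pair (23, 29).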
Example 9.6. Consider Example 9.5 and suppose 𝑁 = 667 and 𝜑(𝑁) = 616 are known.
Then the roots of the equation 𝑥² − 52𝑥 + 667 = 0 are 𝑥 = 26 ± 3, i.e., the prime factors 𝑞 = 23 and 𝑝 = 29. ♢
The factoring of integer numbers has been intensively studied and it is generally
assumed that finding large prime factors of a composite number is a hard problem.
Suppose a modulus generation algorithm takes the security parameter 1𝑛 as input
and outputs two primes 𝑝, 𝑞. Let 𝑁 = 𝑝𝑞. A probabilistic polynomial time adversary
𝐴 is given 𝑁 and has to find the factors 𝑝 and 𝑞. Now the factoring assumption states
that the modulus can be efficiently generated in such a way that an adversary has only
a negligible chance of finding the correct factors 𝑝 and 𝑞.
Since no polynomial-time factoring algorithms have been found so far, this as-
sumption is generally believed to be true and forms the basis of major cryptographic
schemes. Factoring algorithms are discussed in Section 9.7.
9.3. RSA Security 169
Example 9.7. Suppose the primes 𝑝 and 𝑞 are chosen such that the difference 𝑝 − 𝑞
is small. In this case, factoring 𝑁 = 𝑝𝑞 is not hard, even if 𝑝 and 𝑞 are large prime
numbers (see Fermat’s factorization method in Section 9.7). In fact, the primes should
be chosen independently. ♢
Factoring 𝑁 breaks RSA, but the opposite statement is not necessarily true. The
security of RSA is in fact based on the RSA assumption, which states that encryption is
a one-way permutation. However, the RSA assumption is stronger than the factoring
assumption, since an adversary might attack RSA without factoring the modulus.
Definition 9.8. Consider the following RSA experiment: run the RSA key generation
algorithm 𝐺𝑒𝑛(1𝑛 ) on input 1𝑛 to obtain the parameters 𝑝, 𝑞, 𝑒, 𝑑 and 𝑁. A uniform
ciphertext 𝑐 ← ℤ∗𝑁 is chosen and an adversary obtains 1𝑛 , 𝑒, 𝑁 and 𝑐. The adversary
has to find 𝑚 ∈ ℤ∗𝑁 such that 𝑚𝑒 mod 𝑁 ≡ 𝑐. The RSA problem is hard relative to
𝐺𝑒𝑛, if for every probabilistic polynomial-time adversary, the probability of finding the
correct plaintext 𝑚 is negligible in 𝑛.
The RSA assumption states that there is a key generation algorithm such that the
RSA problem is hard. ♢
The RSA assumption means that it is hard to recover the plaintext from a randomly
chosen ciphertext, but this does not imply the security of the plain RSA scheme. In fact,
the plain RSA encryption scheme is deterministic and thus cannot be CPA-secure. This
is critical in situations where the possible plaintexts are known or the number of plain-
texts is small. Then an adversary can easily find the plaintext simply by encrypting the
plaintext candidates. But if the plaintext messages are chosen uniformly at random
from a large space, then one might expect that the scheme is secure under the RSA as-
sumption. However, there are a number of pitfalls, which are discussed below. Further
details can be found in the survey article [Bon99].
(1) Encryption of a short plaintext message 𝑚 with a small encryption exponent 𝑒 is
insecure. If 𝑐 = 𝑚^𝑒 < 𝑁, then 𝑐 is computed without modular reduction, and
hence the plaintext 𝑚 can be recovered by computing the real 𝑒-th root 𝑐^(1/𝑒) . If
𝑒 = 3 then this low-exponent attack can be applied to all messages of length < 𝑛/3,
where 𝑛 = size(𝑁). In practice, one often chooses the public exponent 𝑒 = 2^16 + 1,
which is large enough to prevent this attack.
(2) If a fixed message 𝑚 (not necessarily short) is encrypted for 𝑒 recipients with dif-
ferent RSA moduli, then the Chinese Remainder Theorem allows 𝑚 to be recov-
ered by computing a real 𝑒-th root (Håstad’s broadcast attack; see Exercise 7).
(3) The modulus 𝑁 must not be shared among different users, even if individual exponents
𝑒 and 𝑑 are used. Each user can factorize 𝑁 with the help of their own exponents and
therefore compute the private exponents of all users who share this modulus. Furthermore,
one can show that sharing the modulus is insecure, even if the users trust each other.
(4) The prime factors of the modulus must not be re-used. If 𝑁1 = 𝑝𝑞1 and 𝑁2 = 𝑝𝑞2 ,
then 𝑝 = gcd(𝑁1 , 𝑁2 ) can be efficiently computed.
(5) It was shown that small decryption exponents 𝑑 < 𝑁^(1/4) can be efficiently recov-
ered (Wiener attack). This attack can be improved to 𝑑 < 𝑁^0.292 . Such 𝑑 should
therefore be avoided. However, if the public exponent 𝑒 is chosen first, then the
probability that 𝑑 satisfies this condition is very small.
(6) The plaintext of two related messages 𝑚1 and 𝑚2 , for example
𝑚2 = 𝑎𝑚1 + 𝑏 mod 𝑁,
can be recovered from their ciphertexts 𝑐1 and 𝑐2 if 𝑎 and 𝑏 are known and the
public exponent 𝑒 is small (Franklin-Reiter attack).
(7) The unknown part of a partially known plaintext can be recovered from the ci-
phertext if the encryption exponent 𝑒 is small (Coppersmith attack).
(8) The private exponent 𝑑 can be reconstructed if the least significant ⌈𝑛/4⌉ bits
of 𝑑 are known, where 𝑛 = size(𝑁) (partial key-exposure attack).
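Two of the pitfalls above, the low-exponent attack (1) and re-used prime factors (4), can be demonstrated with small numbers. The primes below are well-known primes chosen for illustration only, and `integer_root` is a hypothetical helper that computes an exact integer root by binary search.

```python
from math import gcd


def integer_root(n: int, k: int) -> int:
    """Largest integer r with r**k <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Pitfall (1): short message and e = 3, so that m^3 < N
p, q1, e = 1_000_000_007, 998_244_353, 3
N1 = p * q1
m = 123_456
c = pow(m, e, N1)                 # no modular reduction takes place
recovered = integer_root(c, e)    # integer cube root, no private key needed

# Pitfall (4): two moduli that share the prime factor p
q2 = 1_000_000_009
N2 = p * q2
shared = gcd(N1, N2)              # reveals p, and hence q1 and q2
```

Both attacks run in polynomial time and need neither the private exponent nor a factoring algorithm.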
Furthermore, plain RSA does not provide protection against ciphertext manipula-
tions and chosen ciphertext attacks:
(1) Plain RSA encryption is malleable and the ciphertext can be easily manipulated.
If an adversary replaces the ciphertext 𝑐 = 𝑚𝑒 mod 𝑁 with 𝑠𝑒 𝑐 mod 𝑁, then
the corresponding plaintext becomes 𝑠𝑚 mod 𝑁. Similarly, the ciphertext of a
product of plaintexts is congruent to the product of ciphertexts mod 𝑁; plain
RSA encryption is a multiplicative homomorphism.
(2) A chosen ciphertext attack against plain RSA is easily performed: if an adversary
is given the challenge ciphertext 𝑐, then they may ask for the decryption of an
unsuspicious-looking ciphertext 𝑐′ = 𝑐𝑠𝑒 mod 𝑁, where 𝑠 ∈ ℤ∗𝑁 and 𝑠 ≢ 1. If 𝑚
and 𝑚′ are the plaintexts corresponding to 𝑐 and 𝑐′ , then 𝑚′ = 𝑚𝑠 mod 𝑁. Hence
an adversary can easily compute the plaintext 𝑚 = 𝑚′ 𝑠−1 mod 𝑁 (see Exercise
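The chosen ciphertext attack on plain RSA can be replayed with toy parameters. The sketch assumes the small key 𝑁 = 667, 𝑒 = 3, 𝑑 = 411 and models the decryption oracle by the hypothetical function `decryption_oracle`, which refuses the challenge ciphertext itself; `pow(s, -1, N)` requires Python 3.8 or later.

```python
N, e, d = 667, 3, 411  # toy RSA key; d is known to the oracle only

m = 108
c = pow(m, e, N)       # challenge ciphertext given to the adversary


def decryption_oracle(ciphertext: int) -> int:
    """Decrypts any ciphertext except the challenge ciphertext."""
    if ciphertext == c:
        raise ValueError("challenge ciphertext must not be queried")
    return pow(ciphertext, d, N)


# Adversary: blind the challenge with s^e, query, and unblind the answer
s = 2
c_prime = (c * pow(s, e, N)) % N       # c' = c * s^e mod N
m_prime = decryption_oracle(c_prime)   # m' = m * s mod N
recovered = (m_prime * pow(s, -1, N)) % N
```

The same computation shows the malleability of plain RSA: the modified ciphertext `c_prime` decrypts to 𝑠𝑚 mod 𝑁.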
Example 9.9. The density is small, but not too small, since ln(𝑥) increases slowly. For
example, the density of primes among odd random numbers less than 2^2048 is approximately
2 / (2048 ln(2)) ≈ 0.0014. The expected number of trials is 1/0.0014 ≈ 710. ♢
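The estimate of Example 9.9 is quickly reproduced; the variable names in this short sketch are illustrative.

```python
from math import log

bits = 2048
density_odd = 2 / (bits * log(2))    # density of primes among odd numbers < 2^2048
expected_trials = 1 / density_odd    # expected number of odd candidates to test
```

Doubling the bit length halves the density and therefore doubles the expected number of candidates.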
To generate a large prime, choose an odd random number of the required size and
test its primality. Usually, this requires testing of several hundred candidates, as in
Example 9.9 above. Rather surprisingly, a deterministic primality test (AKS) that runs
in polynomial time has been found [AKS04]. In practice, however, the AKS test is not
fast enough, so the probabilistic Miller-Rabin test is preferable. This test is based on
the Proposition below. Note that in this section 𝑛 represents any natural number, not
a security parameter.
Proposition 9.10. Let 𝑛 ∈ ℕ be odd and 1 ≤ 𝑎 < 𝑛 an integer. If 𝑔𝑐𝑑(𝑎, 𝑛) ≠ 1, then 𝑛
is composite. Otherwise, write
𝑛 − 1 = 2^𝑠 𝑑
with 𝑠 ∈ ℕ being maximal. If 𝑛 is prime, then either
𝑎^𝑑 ≡ ±1 mod 𝑛
or 𝑎^(2^𝑟 𝑑) ≡ −1 mod 𝑛 for some 𝑟 ∈ {1, … , 𝑠 − 1}. ♢
The Miller-Rabin test checks whether 𝑛 satisfies the implication of the above
Proposition 9.10. If it does not, then 𝑛 is composite. Hence all 𝑛 satisfying the following
condition (COMP) must be composite:
𝑎^𝑑 ≢ ±1 mod 𝑛 and 𝑎^(2^𝑟 𝑑) ≢ −1 mod 𝑛 for all 𝑟 ∈ {1, … , 𝑠 − 1}.
If the Miller-Rabin algorithm outputs that a number is composite, this result must
be correct. However, there are bases 𝑎 such that a composite number 𝑛 is incorrectly
identified as a prime in a run of the Miller-Rabin test.
Proposition 9.11. Let 𝑛 ∈ ℕ be composite and 𝑛 − 1 = 2^𝑠 𝑑, where 𝑠 ∈ ℕ is maximal.
Then the number of bases 𝑎 ∈ {1, 2, … , 𝑛 − 1} such that
𝑎^𝑑 ≡ ±1 mod 𝑛 or 𝑎^(2^𝑟 𝑑) ≡ −1 mod 𝑛 for some 𝑟 ∈ {1, … , 𝑠 − 1}
is at most (𝑛 − 1)/4. ♢
We refer to [Sho09] for a proof of this statement. The probability that one run of the
Miller-Rabin test identifies a composite number as prime is therefore at most 1/4. One
can reduce the error probability to at most (1/4)^𝑘 by 𝑘 independent runs of the
Miller-Rabin test. Note that Proposition 9.11 holds for all composite 𝑛. One can show that the
error probability for randomly selected odd numbers 𝑛 is much lower, and in practice
fewer than 10 runs are sufficient. The test is also efficient for large numbers, and we note
that a full factorization of 𝑛 − 1 is not required in order to find the maximal exponent
𝑠 of the factor 2. The exponent 𝑠 and hence the necessary number of exponentiations
is usually small, and the running time is 𝑂(size(𝑛)^3 ).
Example 9.12. (1) 𝑛 = 561. We choose 𝑎 = 2 and have gcd(2, 561) = 1. Then
𝑛 − 1 = 560 = 2^4 ⋅ 35, so that 𝑑 = 35 and 𝑠 = 4. One computes 𝑎^𝑑 = 2^35 ≡
263 ≢ ±1 mod 561, so the test continues. The next steps are 𝑎^(2𝑑) = 2^70 ≡ 166,
𝑎^(4𝑑) = 2^140 ≡ 67 and finally 𝑎^(8𝑑) = 2^280 ≡ 1. The sequence does not contain the
residue class −1, and thus the Miller-Rabin test shows that 561 is composite. In
fact, 561 = 3 ⋅ 11 ⋅ 17.
(2) 𝑛 = 1009. We choose 𝑎 = 3, so that gcd(3, 1009) = 1. We have 𝑛 − 1 = 1008 =
2^4 ⋅ 63, hence 𝑑 = 63 and 𝑠 = 4. We compute 𝑎^𝑑 = 3^63 ≡ 192 ≢ ±1 mod 1009,
so the test continues. Then 𝑎^(2𝑑) ≡ 540 and 𝑎^(4𝑑) ≡ 1008 ≡ −1 mod 1009. Hence
𝑛 = 1009 could be a prime and another base 𝑎 is chosen. Every run of the test will
confirm the result, since 1009 is in fact a prime number.
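The Miller-Rabin test and the generation of random primes can be sketched as follows. The function names are illustrative, the default of 10 runs follows the remark above, and random bases are drawn with the `secrets` module; this is an illustration and not a hardened implementation.

```python
import secrets
from math import gcd


def is_witness(a: int, n: int) -> bool:
    """True if the base a proves that the odd number n > 2 is composite."""
    if gcd(a, n) != 1:
        return True
    s, d = 0, n - 1
    while d % 2 == 0:          # write n - 1 = 2^s * d with d odd
        s += 1
        d //= 2
    x = pow(a, d, n)
    if x == 1 or x == n - 1:   # a^d = +-1 mod n
        return False
    for _ in range(s - 1):     # a^(2^r d) for r = 1, ..., s - 1
        x = pow(x, 2, n)
        if x == n - 1:
            return False
    return True                # condition (COMP) holds


def is_probable_prime(n: int, runs: int = 10) -> bool:
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for _ in range(runs):
        a = 2 + secrets.randbelow(n - 3)  # random base in {2, ..., n - 2}
        if is_witness(a, n):
            return False
    return True


def random_prime(bits: int, runs: int = 10) -> int:
    """Test random odd candidates of the given size until one passes."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, runs):
            return candidate
```

With the values of Example 9.12, `is_witness(2, 561)` returns `True`, whereas `is_witness(3, 1009)` returns `False`.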
Let 𝑐 be a ciphertext. First, the ciphertext and the private exponent are reduced:
𝑐𝑝 = 𝑐 mod 𝑝, 𝑐𝑞 = 𝑐 mod 𝑞,
𝑑𝑝 = 𝑑 mod (𝑝 − 1), 𝑑𝑞 = 𝑑 mod (𝑞 − 1).
Note that the exponent 𝑑 is reduced modulo 𝑝−1 and 𝑞−1, respectively, and not modulo
𝑝 and 𝑞 (see Proposition 4.16). In fact, one has ord(ℤ∗𝑝 ) = 𝑝 − 1 and ord(ℤ∗𝑞 ) = 𝑞 − 1.
In the next step, the decryption is done more efficiently modulo 𝑝 and 𝑞:
𝑚𝑝 = 𝑐𝑝^(𝑑𝑝 ) mod 𝑝, 𝑚𝑞 = 𝑐𝑞^(𝑑𝑞 ) mod 𝑞;
𝑚 is finally computed using the Chinese Remainder Theorem. Consider the equation
1 = 𝑥𝑝 + 𝑦𝑞.
𝑥, 𝑦 ∈ ℤ can be computed using the Extended Euclidean Algorithm on input 𝑝 and 𝑞.
It follows that 𝑥𝑝 ≡ 1 mod 𝑞 and 𝑦𝑞 ≡ 1 mod 𝑝. Now we obtain
𝑚 = 𝑚𝑞 𝑥𝑝 + 𝑚𝑝 𝑦𝑞 mod 𝑁.
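The decryption with the Chinese Remainder Theorem can be sketched directly from these formulas. The helper `egcd` is a plain recursive Extended Euclidean Algorithm, and the toy key 𝑝 = 29, 𝑞 = 23, 𝑑 = 411 is used only as an illustrative assumption.

```python
def egcd(a: int, b: int):
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y


def decrypt_crt(c: int, d: int, p: int, q: int) -> int:
    N = p * q
    c_p, c_q = c % p, c % q                # reduce the ciphertext
    d_p, d_q = d % (p - 1), d % (q - 1)    # reduce the exponent
    m_p = pow(c_p, d_p, p)                 # decryption modulo p
    m_q = pow(c_q, d_q, q)                 # decryption modulo q
    _, x, y = egcd(p, q)                   # 1 = x*p + y*q
    return (m_q * x * p + m_p * y * q) % N


m = decrypt_crt(416, 411, 29, 23)
```

For 𝑝 = 29 and 𝑞 = 23 the Extended Euclidean Algorithm gives 1 = 4 ⋅ 29 − 5 ⋅ 23, and the result agrees with the direct computation 416^411 mod 667.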
29 ∶ 23 = 1 rem. 6 29 = 23 + 6 6 = 29 − 23
23 ∶ 6 = 3 rem. 5 23 = 3 ⋅ 6 + 5 5 = 23 − 3 ⋅ 6
6 ∶ 5 = 1 rem. 1 6=5+1 1=6−5
Let 𝑚 be the plaintext message. The maximum byte length of 𝑚 is 𝑘−2ℎ−2, where
𝑘 is the length of the modulus in bytes. Firstly, 𝑚 is transformed into a data block 𝐷𝐵
of length 𝑘 − ℎ − 1 bytes. One may add a label 𝐿 or otherwise leave 𝐿 empty. 𝑃𝑆 is a
zero padding string of the required length. Then set
𝐷𝐵 = 𝐻(𝐿) ‖ 𝑃𝑆 ‖ 01 ‖ 𝑚.
The data block 𝐷𝐵 can be viewed as a padded combination of the message and a hashed
Now the next step is to randomize the message. A random seed 𝑟 of length ℎ is
generated, and 𝑑𝑏𝑀𝑎𝑠𝑘 = 𝑀𝐺𝐹(𝑟, 𝑘 − ℎ − 1) gives a pseudorandom output string of
length 𝑘 − ℎ − 1 bytes. Define
𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 = 𝐷𝐵 ⊕ 𝑑𝑏𝑀𝑎𝑠𝑘,
𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 = 𝑟 ⊕ 𝐻(𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵).
𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 defines the randomized message, and 𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 is needed during de-
cryption to undo the masking of 𝐷𝐵. The encoded message 𝐸𝑀 is given by
𝐸𝑀 = 00 ‖ 𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 ‖ 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵.
𝐶 = (𝐸𝑀)^𝑒 mod 𝑁.
𝑟 = 𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 ⊕ 𝐻(𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵),
𝑑𝑏𝑀𝑎𝑠𝑘 = 𝑀𝐺𝐹(𝑟, 𝑘 − ℎ − 1),
𝐷𝐵 = 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 ⊕ 𝑑𝑏𝑀𝑎𝑠𝑘.
The expected structure of 𝐷𝐵 and the label is verified, and finally the plaintext 𝑚 is
extracted. It is important that only one type of decryption error message is given for the
different error conditions. Furthermore, the running time of OAEP implementations
should not be correlated to the type of error. Otherwise, an adversary may obtain useful
information and perform a chosen ciphertext attack.
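The encoding and decoding steps described above can be sketched as follows. Several points are assumptions: 𝐻 is SHA-256 (so ℎ = 32), the mask generation function is an MGF1-style construction based on 𝐻, and the seed mask is computed as 𝐻(𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵) as in the formulas above. The RSA exponentiation of the encoded message is omitted, and the comparisons are not constant-time, so this is an illustration only.

```python
import hashlib
import os

H_LEN = 32  # h: output length of the hash function in bytes


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def mgf(seed: bytes, length: int) -> bytes:
    """MGF1-style mask generation: hash the seed with a running counter."""
    out, counter = b"", 0
    while len(out) < length:
        out += H(seed + counter.to_bytes(4, "big"))
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def oaep_encode(m: bytes, k: int, label: bytes = b"") -> bytes:
    if len(m) > k - 2 * H_LEN - 2:
        raise ValueError("message too long")
    ps = bytes(k - 2 * H_LEN - 2 - len(m))  # zero padding string PS
    db = H(label) + ps + b"\x01" + m         # data block of k - h - 1 bytes
    r = os.urandom(H_LEN)                    # random seed
    masked_db = xor(db, mgf(r, k - H_LEN - 1))
    masked_seed = xor(r, H(masked_db))
    return b"\x00" + masked_seed + masked_db


def oaep_decode(em: bytes, k: int, label: bytes = b"") -> bytes:
    error = ValueError("decryption error")   # one message for all conditions
    if len(em) != k or em[0] != 0:
        raise error
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    r = xor(masked_seed, H(masked_db))
    db = xor(masked_db, mgf(r, k - H_LEN - 1))
    rest = db[H_LEN:]
    sep = rest.find(b"\x01")                 # end of the zero padding
    if db[:H_LEN] != H(label) or sep < 0 or any(rest[:sep]):
        raise error
    return rest[sep + 1:]


em = oaep_encode(b"hello", 128)  # k = 128 bytes, i.e. a 1024-bit modulus
```

Two encodings of the same message differ because of the random seed, which is what makes the padded scheme randomized.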
Remark 9.14. A major result is that RSA-OAEP is secure against adaptive chosen ciphertext
attacks (CCA2-secure) under the RSA assumption and in the random oracle
model [FOPS01]. However, the CCA2 security was proven for the original OAEP ver-
sion of Bellare and Rogaway, not for the standardized version, which has a leading zero
byte in the encoded message 𝐸𝑀. Care must also be taken to ensure that an adversary
cannot distinguish between the different error conditions. ♢
Loosely speaking, this result means that an adversary, who knows the public key
and has access to a decryption oracle, cannot gain any information from a given cipher-
text or tamper with a ciphertext.
9.7. Factoring
Factoring algorithms have been studied since the times of Ancient Greece, and many
ideas have been contributed over the centuries, but no polynomial-time algorithm has
been found and factoring is still assumed to be a hard problem, at least on conven-
tional computers (see Chapter 13 on quantum computing). Obviously, if the factoring
assumption turns out to be wrong, then RSA is broken.
In the following, we give an overview of different approaches to factoring and dis-
cuss their algorithmic complexity. We assume that a large positive integer
𝑁 = 𝑝𝑞 is given, where the prime factors 𝑝 and 𝑞 are unknown to an adversary.
Lists of primes up to a specified bound can be generated by the ancient sieve of
Eratosthenes. The idea is to successively filter out all multiples of primes. There are
faster modern algorithms, for example the sieve of Atkin. The sieve algorithm generates
a list of prime numbers, but this is only efficient for relatively small primes.
Trial division is an elementary factoring method. It suffices to test numbers ≤ √𝑁.
A list of small primes is useful (sieve method), and otherwise all odd numbers (or per-
haps all numbers not divisible by 2, 3 or 5) need to be tested. The worst-case complexity
is 𝑂(√𝑁) and the running time is exponential in size(𝑁).
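The sieve of Eratosthenes and trial division with a list of small primes can be sketched as follows; the function names are illustrative.

```python
from math import isqrt


def primes_up_to(bound: int) -> list:
    """Sieve of Eratosthenes: filter out all multiples of primes."""
    if bound < 2:
        return []
    flags = bytearray([1]) * (bound + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(bound) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = 0
    return [i for i, flag in enumerate(flags) if flag]


def trial_division(N: int) -> int:
    """Smallest prime factor of N >= 2; only primes <= sqrt(N) are tested."""
    for p in primes_up_to(isqrt(N)):
        if N % p == 0:
            return p
    return N  # no divisor found, so N itself is prime


factor = trial_division(667)
```

The list of primes up to √𝑁 grows exponentially in size(𝑁), so this approach is limited to small numbers.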
We obtain the factor 𝑝 = 307 after 18 iterations. The algorithm finds the collision
𝑥18 = 𝑥36 mod 𝑝 and we verify that 21473 ≡ 67523 ≡ 290 mod 307. ♢
To find 𝑥 and 𝑦, one begins with the integer 𝑥 = ⌈√𝑁 ⌉ and increases 𝑥 by 1 until 𝑥² − 𝑁
is a square, say 𝑦², so that 𝑁 = 𝑥² − 𝑦². Fermat factorization always works, since 𝑁 can
be written as a difference of two squares:
𝑝𝑞 = (½ (𝑝 + 𝑞))² − (½ (𝑝 − 𝑞))² = 𝑥² − 𝑦².
However, Fermat’s method is only efficient if the prime factors are close to one another,
i.e., if 𝑦 is small. In general, the running time is 𝑂(√𝑁).
Example 9.16. Let 𝑁 = 14317; then √𝑁 ≈ 119.7. We begin with 𝑥 = 120 and obtain
𝑥² − 𝑁 = 83, which is not a square. Next, let 𝑥 = 121 and now
𝑥² − 𝑁 = 324 = 18²
is a square, so that 𝑁 = (121 − 18)(121 + 18) = 103 ⋅ 139. ♢
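Fermat's method can be sketched as follows. This is a minimal illustration for odd composite 𝑁; the function name `fermat_factor` is our own, and `math.isqrt` is used to test whether 𝑥² − 𝑁 is a perfect square.

```python
from math import isqrt

def fermat_factor(n):
    """Write odd n as (x - y)(x + y) by searching for x with x^2 - n a square."""
    x = isqrt(n)
    if x * x < n:
        x += 1                      # start at ceil(sqrt(n))
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:             # x^2 - n is a perfect square
            return x - y, x + y
        x += 1

print(fermat_factor(14317))         # N from Example 9.16
```

For 𝑁 = 14317 the search stops at 𝑥 = 121, the second candidate. If the factors are far apart, the loop needs many more iterations.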
The quadratic sieve generalizes Fermat factorization and is currently the fastest
algorithm for numbers with fewer than around 100 decimal digits. One looks for integers
𝑥 and 𝑦 such that 𝑥² ≡ 𝑦² mod 𝑁, but 𝑥 ≢ ±𝑦 mod 𝑁. This implies
𝑁 divides 𝑥² − 𝑦² = (𝑥 + 𝑦)(𝑥 − 𝑦),
but 𝑁 divides neither 𝑥 + 𝑦 nor 𝑥 − 𝑦. Hence gcd(𝑥 − 𝑦, 𝑁) must be a non-trivial divisor
of 𝑁 and equal either 𝑝 or 𝑞. The quadratic sieve tries to find suitable numbers 𝑥 and
𝑦. It is reasonable to choose integers 𝑥 close to √𝑁, so that 𝑥² − 𝑁 is relatively small.
Fermat factorization requires that 𝑥² − 𝑁 is a square number, but this is usually not the
case. Now the idea is to multiply several (non-square) numbers 𝑥² − 𝑁 with small
prime factors (smooth over a factor base). The difficult task of the quadratic sieve is to
find smooth numbers. Then one looks for a subset of smooth numbers such that their
product is a square. A solution can be found using linear algebra over 𝐺𝐹(2), since a
number is a square if the exponent of each prime factor is zero modulo 2.
The running time of the quadratic sieve is
\[ O\left(e^{(1+o(1))\sqrt{\ln(N)\ln(\ln(N))}}\right) \]
(see [Pom96]), where 𝑜(1) converges to 0 as 𝑁 → ∞. The algorithm is sub-exponential,
but not polynomial.
Although 𝑥² − 𝑁 is not a square for any of these 𝑥 and Fermat’s method cannot be
applied, their product is a square:
At the time of writing, the number field sieve is the most efficient algorithm for
factoring large integers. With massive computing resources, numbers with more than
200 digits and for example the RSA Challenge with 768 bits could be factored using this
method. Algebraic number fields are extension fields ℚ(𝛼) of ℚ, where 𝛼 is a root of a
polynomial over ℚ. The number field sieve uses the rings ℤ[𝛼] instead of the integers
ℤ, but this topic goes beyond the scope of this book.
The heuristic complexity of the number field sieve is
\[ O\left(e^{(c+o(1))\,\ln(N)^{1/3}\,\ln(\ln(N))^{2/3}}\right), \]
where 𝑐 = ∛(64/9) ≈ 1.92 (see [Pom96]).
Example 9.18. Suppose the sieve algorithm requires
\[ f(N) = e^{c\,\ln(N)^{1/3}\,\ln(\ln(N))^{2/3}} \]
steps. Then the effective key length (the bit strength) of RSA is log₂(𝑓(𝑁)). For 1024-bit RSA,
i.e., for 𝑁 ≈ 2¹⁰²⁴ and 𝑐 ≈ 1.92, one obtains ‘only’ log₂(𝑓(𝑁)) ≈ 86.7 bits. ♢
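The estimate in Example 9.18 can be reproduced numerically. The sketch below evaluates log₂(𝑓(𝑁)) for 𝑁 ≈ 2^bits, ignoring the 𝑜(1) term as the example does. The function name `nfs_bit_strength` and the additional modulus sizes are our own choices.

```python
from math import log

def nfs_bit_strength(modulus_bits, c=(64 / 9) ** (1 / 3)):
    """log2 of f(N) = exp(c * ln(N)^(1/3) * ln(ln(N))^(2/3)) for N ~ 2^modulus_bits."""
    ln_n = modulus_bits * log(2)
    exponent = c * ln_n ** (1 / 3) * log(ln_n) ** (2 / 3)
    return exponent / log(2)        # convert from base e to base 2

for bits in (1024, 2048, 3072):
    print(bits, round(nfs_bit_strength(bits), 1))
```

For 1024 bits the result is close to 87 bits. The last decimal depends on how 𝑐 is rounded, and the figures are only rough guides because the 𝑜(1) term is dropped.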
𝑎ᵏ ≡ 1 mod 𝑝
for all integers 𝑎 with gcd(𝑎, 𝑝) = 1. One chooses a small integer 𝑎 > 1 and computes
𝑎ᵏ mod 𝑁. Since 𝑘 can be very large, fast exponentiation or the square-and-multiply
algorithm should be used. Finally, gcd(𝑎ᵏ − 1, 𝑁) gives either 𝑝 (method successful)
or 𝑁 (failure).
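The computation described above can be sketched as follows. The modulus 𝑁 = 547 ⋅ 1019 is a hypothetical example chosen for illustration: 547 − 1 = 2 ⋅ 3 ⋅ 7 ⋅ 13 is a product of small primes, whereas 1019 − 1 = 2 ⋅ 509 is not. The function name is also our own.

```python
from math import gcd

def pollard_p_minus_1(n, k, a=2):
    """Return gcd(a^k - 1, n); a non-trivial result is a factor of n."""
    return gcd(pow(a, k, n) - 1, n)    # pow(a, k, n) is fast modular exponentiation

# Hypothetical modulus: only the first prime factor has a smooth p - 1.
N = 547 * 1019
k = 2**3 * 3**2 * 5 * 7 * 11 * 13      # k = 360360 is a multiple of 546
print(pollard_p_minus_1(N, k))
```

Python's built-in three-argument `pow` uses fast modular exponentiation, so even very large exponents 𝑘 are unproblematic.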
𝑘 = 2³ ⋅ 3² ⋅ 5 ⋅ 7 ⋅ 11 ⋅ 13 = 360360.
Finally, we have
𝑝 − 1 = 546 = 2 ⋅ 3 ⋅ 7 ⋅ 13
The elliptic curve factorization method (ECM) is another interesting factoring algorithm
with sub-exponential running time. ECM is suitable for finding prime factors
with up to about 80 decimal digits, but it is less efficient than the quadratic sieve or the
number field sieve method for larger divisors. We outline ECM in Section 12.4.
Since no polynomial-time algorithm is known, the factoring assumption is cur-
rently well-founded, but in the future, quantum computers will probably be able to fac-
torize large integers. Quantum computing and Shor’s factoring algorithm are explored
in Chapter 13.
The relative success of the known factoring algorithms shows that standard key
lengths of symmetric ciphers, i.e., 128 to 256 bits, are not sufficient for RSA (see Example
9.18). With large resources, a modulus with up to around 1000 bits can be factored.
At the time of this writing, the use of 2048-bit integers 𝑁 is recommended for long-term
security against (non-quantum computing) attacks (see [BSI18]). The prime factors 𝑝
and 𝑞 should have around the same size (1024 bits) and their difference 𝑝 − 𝑞 should
be large.
Furthermore, the use of strong primes is sometimes recommended. A prime 𝑝 is
called strong if it is sufficiently large and satisfies additional conditions − in particu-
lar that 𝑝 − 1 and 𝑝 + 1 contain a large prime factor. This should provide protection
against certain factoring methods, for example Pollard’s 𝑝 − 1 method. However, the
size, randomness and independence of primes are more important and it is currently
assumed that tests on strong primes do not significantly increase the security of RSA.
9.8. Summary
• Public-key cryptosystems use a public key for encryption and a private key
for decryption. Indistinguishable encryptions under a chosen plaintext attack
(CPA security) or under an adaptive chosen ciphertext attack (CCA2 security)
are important requirements.
• The plain RSA cryptosystem uses the product of two large prime numbers and
the security relies on the difficulty of factorizing a given product.
• The probabilistic Miller-Rabin algorithm can efficiently test the primality of
large integers.
• The plain RSA cryptosystem has weaknesses and the padded and randomized
RSA-OAEP scheme should be used instead. OAEP can achieve CCA2 security
under certain assumptions.
• Factoring algorithms with sub-exponential runtime exist, but no polynomial-
time algorithms are known. RSA is considered to be secure against non-
quantum computers, if the prime factors are randomly chosen and are more
than 1000 bits long.
(b) Mallory eavesdrops two ciphertexts 𝑐₁ = 26 and 𝑐₂ = 213, which were sent
to Bob, but he does not know the plaintexts 𝑚₁ and 𝑚₂. How can Mallory
compute the ciphertexts corresponding to the plaintexts 𝑚₁𝑚₂ mod 𝑁 and
𝑚₁𝑚₂⁻¹ mod 𝑁 without carrying out an attack?
(c) Mallory chooses 𝑠 = 5 and computes 𝑦 = 𝑠ᵉ mod 𝑁 ≡ 23. He wants to find
out the plaintext 𝑚 corresponding to the ciphertext 𝑐 = 104. He asks Bob to
decrypt the ‘innocent’ ciphertext 𝑐′ = 𝑦𝑐 mod 𝑁 ≡ 131 and gets the plaintext
𝑚′ = 142. Why is Mallory now able to determine 𝑚 without computing the
private exponent 𝑑 ? Determine the plaintext 𝑚.
(d) Now conduct an attack against this RSA key. Factorize 𝑁 and compute 𝑑.
7. A plaintext 𝑚 is encrypted with three different RSA moduli 𝑁₁ = 901,
𝑁₂ = 2581 and 𝑁₃ = 4141 using the public exponent 𝑒 = 3. The ciphertexts are
𝑐₁ = 98, 𝑐₂ = 974, 𝑐₃ = 2199. Conduct Håstad’s broadcast attack and determine
the plaintext 𝑚.
Tip: Set 𝑁 = 𝑁₁𝑁₂𝑁₃ and find 𝑐 mod 𝑁 such that 𝑐 ≡ 𝑐ᵢ mod 𝑁ᵢ for
𝑖 = 1, 2, 3; then compute 𝑚 = ∛𝑐.
8. Side-channel attacks against RSA use the power consumption of an implemen-
tation to derive the private key. Suppose a microprocessor uses the square-and-
multiply algorithm to decrypt a ciphertext with a private key 𝑑. An attacker an-
alyzes the power trace and concludes that the decryption uses the following se-
quence of modular squarings (SQ) and multiplications (MULT): SQ, SQ, SQ, SQ,
(a) Determine the private key 𝑑.
Tip: Use the construction of the square-and-multiply algorithm given in Chap-
ter 3.
(b) The public key is (𝑒 = 11, 𝑁 = 8051). Calculate 𝜑(𝑁), 𝑝 and 𝑞 from 𝑑, 𝑒 and
𝑁 and verify your result.
9. The Fermat primality test of 𝑛 ∈ ℕ chooses a uniform random integer
𝑎 ∈ {1, … , 𝑛 − 1}, computes 𝑎ⁿ⁻¹ mod 𝑛 and outputs ‘𝑛 is composite’ if the result
is not congruent to 1. Otherwise, the test outputs ‘𝑛 is probably prime’. Show
that the test is correct. However, there are composite numbers 𝑛 which are identified
as possible primes for all 𝑎 ∈ ℤ∗𝑛 . They are called Carmichael numbers. Show
that 𝑛 = 561 is a Carmichael number.
10. Check the primality of 𝑛 = 263 using the Miller-Rabin algorithm and 𝑎 = 3 as well
as 𝑎 = 5.
11. Encrypt 𝑚 = 2314 with the plain RSA cipher and the public key (𝑒 = 5, 𝑁 =
10573). Factorize 𝑁 using Fermat’s method. Why is 𝑒 = 5 an admissible exponent,
whereas 𝑒 = 3 is not permitted? Determine the corresponding private key 𝑑. De-
crypt the ciphertext and check the result. Use the Chinese Remainder Theorem to
reduce the size of the exponents.
12. Two RSA moduli are given: 𝑁1 = 101400931 and 𝑁2 = 110107021. They have
a common prime factor. Show that both RSA keys are insecure and compute the
factorization of 𝑁1 and 𝑁2 .
13. An adversary is able to modify an RSA ciphertext. They want to square the unknown
plaintext modulo 𝑁. Why is this attack possible for plain RSA, but not if RSA-OAEP
is used?
14. Let (𝑒 = 5, 𝑁 = 10057) be the public key of an RSA cryptosystem. Encrypt the mes-
sage 𝑚 = 2090 using the plain RSA scheme. Factorize 𝑁 and find the decryption
exponent 𝑑.
15. Assume that RSA with a modulus of length 1024 bits and the encryption exponent
𝑒 = 2¹⁶ + 1 is used. How many modular multiplications are needed, at most, for
encryption and for decryption?
16. Factorize 𝑁 = 2041 using the quadratic sieve method.
Remark: This example is discussed in [Pom96].
17. Factorize 𝑁 = 10573 with Pollard’s 𝑝 − 1 method. Choose 𝑎 = 2 and try 𝑘 = 2³ ⋅ 3³;
then give reasons why this attack is successful for the given integer 𝑁.
18. Describe an attack against RSA encryption with random padding if the padding
string is short and the number of possible plaintexts is small.
Chapter 10
Key Establishment
Keys play a crucial role in cryptography and the establishment of secret keys between
two (or more) parties is a non-trivial task. A key establishment method should prefer-
ably not require a secure channel and provide protection against adversaries.
Key distribution by a trusted authority is briefly discussed in Section 10.1. Key ex-
change or key agreement is a method where the parties exchange messages and jointly
generate a secret key. We explain key exchange protocols and discuss their security re-
quirements in Section 10.2. A widely used method is the Diffie-Hellman key exchange,
which is dealt with in Section 10.3. Diffie-Hellman is a public-key scheme, which uses
a large cyclic group in which the discrete logarithm is hard to compute. The most im-
portant example is the multiplicative group ℤ∗𝑝 of integers modulo a prime number, and
this is explained in Section 10.4. Another possibility is the group of points on an elliptic
curve over a finite field which is discussed in Section 12.2. In Section 10.5, we present
algorithms to solve the discrete logarithm problem and discuss their complexity.
Key encapsulation forms an alternative to key exchange and is covered in Section
10.6. There is also an encapsulation variant of the Diffie-Hellman key exchange. The
combination of key encapsulation and symmetric encryption gives hybrid public-key
encryption schemes, which are outlined in Section 10.7.
Key establishment, the Diffie-Hellman key exchange and the discrete logarithm
problem are standard topics in many cryptography textbooks, for example [PP10]. A
discussion of various methods for key distribution and key management can be found
in [GB08]. Refer to [KL15] for further details on security definitions and proofs of key
exchange, key encapsulation and hybrid encryption schemes.
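[Figure: key distribution with the long-term keys 𝑘_𝐴 and 𝑘_𝐵. A key 𝑘 ← {0, 1}ⁿ is chosen, transmitted as 𝐸_𝑘𝐴(𝑘) and 𝐸_𝑘𝐵(𝑘), and then used to send 𝐸_𝑘(𝑚).]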
We note that the basic protocol described above is only secure against eavesdrop-
ping and not against active attacks. Kerberos is an example of a more advanced and
widely used key distribution protocol (see [GB08] and RFC 4120 [NYHR05]).
The question of how to bootstrap the key distribution and establish long-term se-
cret keys remains. In the following section, we define the security requirements of
key exchange protocols that do not assume a secure channel ahead of time. In Section
10.3, we will see that the Diffie-Hellman key exchange is an important example of such
a protocol.
not assume a pre-distribution of keys or a secure channel between the parties. Never-
theless, the protocol should be secure against eavesdropping attacks.
Definition 10.1. Suppose a key exchange protocol is given. Consider the following
experiment (see Figure 10.2). Two communication parties (Alice and Bob) hold 1ⁿ,
exchange the messages 𝑚 and derive a key 𝑘 of length 𝑛. A challenger chooses a uniform
random bit 𝑏 ← {0, 1}. If 𝑏 = 1 set 𝑘′ = 𝑘, and otherwise 𝑘′ ← {0, 1}ⁿ is chosen uniformly at
random. An adversary 𝐴 is given 1ⁿ, the transcript 𝑚 and the challenge 𝑘′. They try to
guess 𝑏, i.e., to distinguish between the secret key 𝑘 and a random string, and output
a bit 𝑏′. The challenger outputs 1 if 𝑏 = 𝑏′, and 0 otherwise. The key exchange (KE)
advantage of 𝐴 is defined as
Adv(𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The key exchange protocol is secure in the presence of an eavesdropper (EAV-secure) if,
for every probabilistic polynomial time adversary 𝐴, the advantage Adv(𝐴) is
negligible in 𝑛. ♢
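[Figure 10.2: Alice and Bob exchange messages and each derive 𝑘. The challenger chooses 𝑏 ← {0, 1} and 𝑟 ← {0, 1}ⁿ and sets 𝑘′ = 𝑘 if 𝑏 = 1 and 𝑘′ = 𝑟 if 𝑏 = 0. The adversary receives 1ⁿ, 𝑚 and 𝑘′, tries to distinguish and returns 𝑏′. The challenger compares 𝑏 and 𝑏′ and outputs 1 or 0.]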
The above definition of EAV security requires that the protocol messages 𝑚 do
not reveal a single bit of information on the key 𝑘 to an eavesdropper. Otherwise, the
adversary would be able to distinguish between 𝑘 and a random string.
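The advantage in Definition 10.1 can be rewritten using 𝑃𝑟[𝑏′ ≠ 𝑏] = 1 − 𝑃𝑟[𝑏′ = 𝑏]. The following short derivation is our own restatement of the definition:

```latex
\mathrm{Adv}(A) = \bigl|\Pr[b' = b] - \Pr[b' \neq b]\bigr|
               = \bigl|\Pr[b' = b] - (1 - \Pr[b' = b])\bigr|
               = \bigl|2\Pr[b' = b] - 1\bigr|
```

An adversary who guesses at random succeeds with probability 1/2 and has advantage 0, whereas an adversary who always guesses correctly has advantage 1. EAV security therefore means that no efficient adversary does noticeably better than random guessing.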
Remark 10.2. Note that the above experiment assumes a passive attacker who is un-
able to change or inject any messages. The presence of active adversaries requires an
authenticated key exchange (AKE) protocol, where the communication partners are
able to verify the authenticity of messages. Yet another topic is perfect forward secrecy
(PFS), which guarantees the security of past session keys if long-lived keys are exposed.
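[Figure: Diffie-Hellman key exchange with public parameters 𝐺, 𝑞, 𝑔. Alice chooses 𝑎 ← ℤ𝑞 and computes 𝐴 = 𝑔ᵃ; Bob chooses 𝑏 ← ℤ𝑞 and computes 𝐵 = 𝑔ᵇ. Alice derives 𝑘 = 𝐵ᵃ and Bob derives 𝑘 = 𝐴ᵇ.]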
Note that the result of the Diffie-Hellman key exchange is a group element, not a
binary string. In practice, one applies a key derivation function to 𝑘, which transforms
the group element into a binary string.
The security of the Diffie-Hellman key exchange is closely related to the discrete
logarithm (DL) problem. If 𝑔 is a generator of 𝐺 and ord(𝐺) = ord(𝑔) = 𝑞, then
𝐺 = {𝑒, 𝑔¹, … , 𝑔^(𝑞−1)}.
If the DL problem or the CDH problem is easy, then the DDH problem is easy, too,
but the converse is not known. The DDH assumption is therefore stronger than the DL
Theorem 10.4. If the DDH problem is hard relative to the generation of group parame-
ters, then the Diffie-Hellman key exchange protocol is secure in the presence of an eaves-
dropper (EAV-secure).
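[Figure: DDH experiment. The challenger chooses 𝑎, 𝑏 ← ℤ𝑞 and computes 𝐴 = 𝑔ᵃ, 𝐵 = 𝑔ᵇ. It then chooses a bit 𝑏 ← {0, 1} and 𝑟 ← 𝐺 and sets 𝑘′ = 𝑘 if 𝑏 = 1 and 𝑘′ = 𝑟 if 𝑏 = 0. The adversary receives 𝐺, 𝑞, 𝑔, 𝐴, 𝐵 and 𝑘′, tries to distinguish and returns 𝑏′. The challenger compares 𝑏 and 𝑏′ and outputs 1 or 0.]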
Remark 10.5. It is not difficult to show that solving the above DDH experiment and
distinguishing a Diffie-Hellman shared secret 𝑘 from a uniform random element are
equivalent problems (see [KL15]). Note that 𝑘 is an element of group 𝐺. The above
Theorem therefore requires a slightly modified key distinguishability experiment: the
key and the random element are from group 𝐺 instead of an 𝑛-bit string. Alternatively,
one applies a key derivation function to transform the shared secret key into a binary string.
Remark 10.6. It is important to observe that the plain Diffie-Hellman protocol does
not protect against active adversaries. If an attacker is able to replace 𝐴 and 𝐵 with
their own parameters, then they can perform a Man-in-the-Middle attack. The prob-
lem of the plain Diffie-Hellman protocol is the lack of authenticity (see Remark 10.2).
In practice, one often signs the public keys in order to prove their authenticity. Sig-
natures are covered in Chapter 11. However, some issues remain, because at some
point a trusted public key (a trust anchor) is needed.
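The Man-in-the-Middle attack from Remark 10.6 can be illustrated with toy numbers. The sketch uses the small group from Example 10.7 (𝑝 = 59, 𝑔 = 4). The exponent 𝑚 chosen by the attacker (called Mallory here) is a hypothetical value.

```python
p, g, q = 59, 4, 29            # toy parameters from Example 10.7 (far too small for real use)

a, b = 7, 24                   # secret exponents of Alice and Bob
m = 11                         # Mallory's secret exponent (hypothetical value)

A, B, M = pow(g, a, p), pow(g, b, p), pow(g, m, p)

# Mallory intercepts A and B and forwards M to both parties instead.
k_alice = pow(M, a, p)         # Alice believes this key is shared with Bob
k_bob = pow(M, b, p)           # Bob believes this key is shared with Alice

# Mallory can compute both keys from the intercepted values A and B.
assert k_alice == pow(A, m, p)
assert k_bob == pow(B, m, p)
print(k_alice, k_bob)
```

Alice and Bob each end up sharing a key with Mallory rather than with each other, and neither can notice this without authenticating the exchanged values.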
a product of small primes. For the hardness of the discrete logarithm and the DDH
problem, it is advisable for 𝑞 to be a large prime.
Suppose that ℎ is a generator of ℤ∗𝑝 , i.e., ord(ℎ) = 𝑝 − 1, and 𝑝 − 1 = 𝑟𝑞, where 𝑞 is
a large prime. Then ord(ℎʳ) = (𝑝 − 1)/𝑟 = 𝑞 and 𝐺 = ⟨ℎʳ⟩ is a cyclic group of prime order
𝑞. We let 𝑔 = ℎʳ and thus obtain Diffie-Hellman parameters.
Furthermore, safe primes are useful in the generation of Diffie-Hellman parameters.
A prime 𝑝 is called safe if 𝑞 = (𝑝 − 1)/2 is prime. Then 𝑝 − 1 = 2𝑞 and the order of any
element 𝑔 ∈ ℤ∗𝑝 with 𝑔 ≢ ±1 is either 𝑝 − 1 or 𝑞.
Example 10.7. Let 𝑝 = 59. Of course, 𝑝 is much too small to be secure. One has
ord(ℤ∗𝑝 ) = 𝑝 − 1 = 2 ⋅ 29, so 59 is a safe prime. We are looking for a generator of ℤ∗𝑝
and try ℎ = 2 mod 59. In fact, ord(ℎ) = 58, i.e., ℎ is a primitive root mod 59, since
ℎ² = 4 ≢ 1 and ℎ²⁹ ≡ 58 ≢ 1 mod 59
(see Algorithm 4.1). Hence ℎ generates the full multiplicative group of order 58 and
𝑔 = ℎ² generates a subgroup of prime order 29.
(1) We perform a Diffie-Hellman key exchange with the parameters 𝐺 = ℤ∗59 , ℎ =
2 mod 59 and ord(ℎ) = 58. Alice selects 𝑎 = 7 and sends 𝐴 = 2⁷ mod 59 ≡ 10
to Bob. Bob chooses 𝑏 = 24 and transmits 𝐵 = 2²⁴ mod 59 ≡ 35 to Alice. Alice
computes 𝑘 = 𝐵ᵃ = 35⁷ mod 59 ≡ 12 and Bob obtains the same key 𝑘 = 𝐴ᵇ =
10²⁴ mod 59 ≡ 12.
(2) Set 𝑔 = ℎ² = 4 mod 59 and 𝐺 = ⟨4⟩ ⊂ ℤ∗59 and perform a Diffie-Hellman
exchange using 𝐺, 𝑔 and 𝑞 = ord(𝑔) = 29. Alice selects 𝑎 = 7 and sends 𝐴 =
4⁷ mod 59 ≡ 41 to Bob. Bob chooses 𝑏 = 24 and sends 4²⁴ mod 59 ≡ 45 to Alice.
Alice computes 𝑘 = 45⁷ mod 59 ≡ 26 and Bob gets 𝑘 = 41²⁴ mod 59 ≡ 26. ♢
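The numbers in Example 10.7 can be checked with a few lines of plain Python. The helper `diffie_hellman` is our own naming; it returns both public values and the key as computed by each party.

```python
def diffie_hellman(p, g, a, b):
    """Return (A, B, k_alice, k_bob) for a Diffie-Hellman exchange in Z_p^*."""
    A = pow(g, a, p)               # Alice's public value
    B = pow(g, b, p)               # Bob's public value
    return A, B, pow(B, a, p), pow(A, b, p)

print(diffie_hellman(59, 2, 7, 24))   # part (1): full group, generator h = 2
print(diffie_hellman(59, 4, 7, 24))   # part (2): subgroup of order 29, g = 4
```

The two calls print (10, 35, 12, 12) and (41, 45, 26, 26), in agreement with parts (1) and (2) of the example.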
The group order is 𝑞 = . ♢
for 0 ≤ 𝑟 < 𝑚 are called babysteps and have to be stored. If one of the babysteps equals
1, then set 𝑎 = 𝑟 and the problem is solved. Otherwise, set 𝑇 = 𝑔ᵐ, compute the
giantsteps 𝑇ˢ for 0 < 𝑠 ≤ 𝑚 and compare them to the babysteps. If the giantstep 𝑇ˢ
is equal to the babystep 𝐴𝑔⁻ʳ, then the solution to the DL problem is 𝑎 = 𝑚𝑠 + 𝑟. In
the worst case, all babysteps and giantsteps have to be computed, which requires 2𝑚
exponentiations. Furthermore, 𝑚 babysteps have to be stored. Hence the running time
and the space complexity is 𝑂(2^(𝑛/2)).
Example 10.9. Let 𝑝 = 59 and 𝑔 = 4 ∈ ℤ∗𝑝 ; then 𝑞 = ord(𝑔) = 29. Suppose an
adversary eavesdrops 𝐴 = 41 (see Example 10.7). They compute the discrete logarithm
using the Babystep-Giantstep algorithm: 𝑚 = ⌊√29⌋ = 5, 𝑔⁻¹ = (4 mod 59)⁻¹ ≡ 15.
The babysteps are:
𝐴 = 41, 𝐴𝑔⁻¹ = 25, 𝐴𝑔⁻² = 21, 𝐴𝑔⁻³ = 20, 𝐴𝑔⁻⁴ = 5.
Furthermore, 𝑇 = 𝑔ᵐ = 4⁵ mod 59 ≡ 21. Hence the first giantstep matches the
babystep 𝐴𝑔⁻² and the solution is 𝑎 = 1 ⋅ 5 + 2 = 7. In fact, 𝑔ᵃ = 4⁷ mod 59 ≡ 41. ♢
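The Babystep-Giantstep computation in Example 10.9 can be sketched as follows. The function name and the dictionary used to store the babysteps are our own choices. As a small deviation from the text, the giantstep loop runs up to 𝑠 = ⌊𝑞/𝑚⌋, so that every exponent below 𝑞 is covered even when 𝑞 is not close to a square; for 𝑞 = 29 this gives the same range 0 < 𝑠 ≤ 5 as in the example. The call `pow(g, -1, p)` requires Python 3.8 or later.

```python
from math import isqrt

def babystep_giantstep(g, A, q, p):
    """Solve g^a = A in the subgroup of Z_p^* generated by g, where q = ord(g)."""
    m = isqrt(q)                          # m = floor(sqrt(q)), as in Example 10.9
    g_inv = pow(g, -1, p)                 # modular inverse of g
    baby = {}
    value = A % p
    for r in range(m):                    # babysteps A * g^(-r) for 0 <= r < m
        baby.setdefault(value, r)
        value = value * g_inv % p
    if 1 in baby:                         # A * g^(-r) = 1 means a = r
        return baby[1]
    T = pow(g, m, p)
    giant = 1
    for s in range(1, q // m + 1):        # giantsteps T^s
        giant = giant * T % p
        if giant in baby:
            return m * s + baby[giant]
    return None                           # A is not in the subgroup generated by g

print(babystep_giantstep(4, 41, 29, 59))  # discrete logarithm of A = 41
```

The dictionary lookup makes each comparison of a giantstep with the stored babysteps a constant-time operation on average, which matches the 𝑂(√𝑞) running time stated above.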
Pollard’s 𝜌 method for logarithms has about the same running time 𝑂(√𝑞) = 𝑂(2^(𝑛/2))
as the Babystep-Giantstep algorithm, but requires much less storage. We briefly outline
The Index-Calculus algorithm fixes a factor base 𝐵 consisting of small primes and
first computes log_𝑔(𝑝_𝑖) for all 𝑝_𝑖 ∈ 𝐵. Then random integers 0 < 𝑥 < 𝑝 are chosen until
𝑔ˣ𝐴 mod 𝑝 is a product of primes in the factor base. If this is the case, then log_𝑔(𝑔ˣ𝐴)
can be determined using the pre-computed discrete logarithms log_𝑔(𝑝_𝑖). Finally, one obtains
\[ \log_g(A) = \log_g(g^x A) - x \bmod (p-1). \]
The algorithm can only be applied to the multiplicative groups of finite fields (and to
some families of elliptic curves). The expected running time is
\[ O\left(e^{(\sqrt{2}+o(1))\sqrt{\ln(p)\ln(\ln(p))}}\right). \]
The Number Field Sieve for Discrete Logarithms is currently the best available algorithm
for the multiplicative group, and its heuristic sub-exponential running time is
\[ O\left(e^{(c+o(1))\,\ln(p)^{1/3}\,\ln(\ln(p))^{2/3}}\right), \]
where 𝑐 = (64/9)^(1/3) ≈ 1.92 (see [JOP14]). Note that the
running time depends on the size of 𝑝 and not on the size of the group order 𝑞. The
effective key length for a 1024-bit prime 𝑝 is only around 86 bits, and at the time of this
writing primes of at least 2000 bits are recommended.
Polynomial-time algorithms are not known, but the discrete logarithm problem
can be efficiently solved with quantum computers (see Chapter 13).
a key of length 𝑙(𝑛). We write (𝑐, 𝑘) ← 𝐸𝑛𝑐𝑎𝑝𝑠_𝑝𝑘(1ⁿ), where 𝑐 is public and 𝑘 is
• The decapsulation algorithm 𝐷𝑒𝑐𝑎𝑝𝑠 takes a private key 𝑠𝑘 and a ciphertext 𝑐 as
input. It outputs a key 𝑘 = 𝐷𝑒𝑐𝑎𝑝𝑠_𝑠𝑘(𝑐) or an error symbol ⟂.
The KEM provides correct encapsulation, if for (𝑐, 𝑘) ← 𝐸𝑛𝑐𝑎𝑝𝑠_𝑝𝑘(1ⁿ), one has
𝐷𝑒𝑐𝑎𝑝𝑠_𝑠𝑘(𝑐) = 𝑘. ♢
[Figure: key encapsulation mechanism. 𝐸𝑛𝑐𝑎𝑝𝑠 takes 1ⁿ and 𝑝𝑘 as input and outputs 𝑘; 𝐷𝑒𝑐𝑎𝑝𝑠 takes 𝑠𝑘 as input and outputs 𝑘.]
A stronger notion is security against adaptive chosen ciphertext attacks (CCA2 se-
curity). The corresponding experiment gives the adversary additional access to a de-
capsulation oracle (before and after obtaining the challenge), but they may not request
the decapsulation of the challenge ciphertext 𝑐.
We construct a KEM based on RSA encryption, which can be shown to be CCA2-
• The key generation algorithm 𝐺𝑒𝑛(1ⁿ) is identical to the RSA key generation (see
Definition 9.4) and outputs a public key 𝑝𝑘 = (𝑒, 𝑁) and a private key 𝑠𝑘 = (𝑑, 𝑁).
Furthermore, a hash function 𝐻 ∶ ℤ∗𝑁 → {0, 1}ⁿ is fixed.
• The encapsulation algorithm 𝐸𝑛𝑐𝑎𝑝𝑠 takes the public key 𝑝𝑘 as input and chooses
a uniform random element 𝑠 ∈ ℤ∗𝑁 . It outputs the ciphertext
𝑐 = 𝑠ᵉ mod 𝑁
𝑠 = 𝑐ᵈ mod 𝑁
We infer from the RSA construction (see Definition 9.4) that the above encapsula-
tion mechanism is correct. If the RSA assumption holds and the hash function behaves
like a random oracle, then CPA security follows from the fact that an adversary is un-
able to derive 𝑠 from 𝑐. But if 𝑠 is unknown then 𝐻(𝑠) is uniform random. Note that
padding schemes like OAEP are not required here since 𝑠 is uniform in ℤ∗𝑁 . Further-
more, the RSA key encapsulation mechanism even turns out to be CCA2-secure if the
hash function has no weaknesses. An adversary with access to a decapsulation oracle
only gets the hash value 𝑘′ = 𝐻((𝑐′)ᵈ mod 𝑁) on input 𝑐′. However, this does not reveal
any information about 𝑘 = 𝐻(𝑐ᵈ mod 𝑁) if 𝑐 ≠ 𝑐′, since hashes of different input
values are uncorrelated. We refer to [KL15] for a proof of the following Theorem:
Theorem 10.15. If the RSA assumption holds and 𝐻 is modeled as a random oracle,
then the RSA key encapsulation mechanism is CCA2-secure. ♢
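The RSA key encapsulation mechanism can be sketched in plain Python. This is an illustration under stated assumptions and not a secure implementation: SHA-256 applied to the decimal string of 𝑠 stands in for the hash function 𝐻, and the key (𝑁 = 61 ⋅ 53, 𝑒 = 17, 𝑑 = 2753) is a toy example of our own choosing.

```python
import hashlib
import secrets
from math import gcd

def H(s):
    """Stand-in for the hash function H: SHA-256 of the integer s (an assumption)."""
    return hashlib.sha256(str(s).encode()).digest()

def encaps(e, N):
    while True:
        s = secrets.randbelow(N - 1) + 1        # uniform in {1, ..., N-1}
        if gcd(s, N) == 1:                      # keep only units, i.e. s in Z_N^*
            return pow(s, e, N), H(s)           # ciphertext c = s^e mod N, key k = H(s)

def decaps(d, N, c):
    return H(pow(c, d, N))                      # recover s = c^d mod N, then k = H(s)

# Toy key for illustration only: N = 61 * 53 and e * d = 1 mod phi(N) = 3120.
N, e, d = 3233, 17, 2753
c, k = encaps(e, N)
assert decaps(d, N, c) == k
```

Only the ciphertext 𝑐 is transmitted. Both sides derive the same key 𝑘 = 𝐻(𝑠), and the value 𝑠 itself is never used directly as a key.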
The Diffie-Hellman key exchange protocol can also be turned into a key encapsulation
mechanism. The Diffie-Hellman KEM can be viewed as an adaptation of the
ElGamal encryption scheme (see Exercise 12).
• The key generation algorithm 𝐺𝑒𝑛 takes 1ⁿ as input and outputs a cyclic group
𝐺 of order 𝑞 with 𝑛 = size(𝑞), a generator 𝑔 ∈ 𝐺, a uniform random element
𝑏 ∈ ℤ𝑞 and 𝐵 = 𝑔ᵇ. The public key is 𝑝𝑘 = (𝐺, 𝑞, 𝑔, 𝐵) and the private key is
𝑠𝑘 = (𝐺, 𝑞, 𝑔, 𝑏). Also fix a function 𝐻 ∶ 𝐺 → {0, 1}ⁿ.
• The encapsulation algorithm takes 𝑝𝑘 as input, chooses a uniform random element
𝑎 ∈ ℤ𝑞 and outputs the ciphertext 𝑐 = 𝐴 = 𝑔ᵃ and the key 𝑘 = 𝐻(𝐵ᵃ).
• The decapsulation algorithm 𝐷𝑒𝑐𝑎𝑝𝑠 takes 𝑠𝑘 and 𝑐 as input and outputs the key
𝑘 = 𝐻(𝑐ᵇ). ♢
The encapsulated key is 𝐻(𝑘′), where 𝑘′ is the shared Diffie-Hellman secret 𝑔ᵃᵇ.
Since 𝑘′ = 𝑔ᵃᵇ = 𝐴ᵇ = 𝐵ᵃ, the encapsulation method is correct. The security depends
on standard assumptions about the Diffie-Hellman problem and on properties of 𝐻.
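The Diffie-Hellman key encapsulation mechanism can be sketched in the same way. The group is the toy subgroup of order 29 in ℤ∗59 from Example 10.7, and SHA-256 again stands in for the function 𝐻; both choices are assumptions made for illustration only.

```python
import hashlib
import secrets

p, q, g = 59, 29, 4                         # toy group from Example 10.7 (insecure size)

def H(x):
    """Stand-in for H: G -> {0,1}^n using SHA-256 (an assumption)."""
    return hashlib.sha256(str(x).encode()).digest()

def gen():
    b = secrets.randbelow(q)                # private exponent b in Z_q
    return pow(g, b, p), b                  # public value B = g^b and private b

def encaps(B):
    a = secrets.randbelow(q)                # ephemeral exponent a in Z_q
    return pow(g, a, p), H(pow(B, a, p))    # c = A = g^a and k = H(B^a)

def decaps(b, c):
    return H(pow(c, b, p))                  # k = H(c^b)

B, b = gen()
c, k = encaps(B)
assert decaps(b, c) == k
```

Correctness follows from 𝐵ᵃ = 𝑔ᵃᵇ = 𝐴ᵇ, as noted above.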
Theorem 10.17. Suppose the computational Diffie-Hellman (CDH) problem is hard rel-
ative to the generation of group parameters and 𝐻 is modeled as a random oracle. Then
the Diffie-Hellman key encapsulation mechanism is CPA-secure. ♢
The proof can be found in [KL15]. There is also a security guarantee without the
use of a random oracle. Under the stronger gap-CDH assumption the Diffie-Hellman
key encapsulation mechanism can be shown to be CCA2-secure. The gap-CDH problem
(see [OP01]) gives the adversary access to a Decisional Diffie-Hellman oracle that
answers whether (𝑔, 𝐴, 𝐵, 𝑘′) is a valid Diffie-Hellman quadruple, i.e., whether or not
\[ k' = A^{\log_g(B)}. \]
• Run the key generation algorithm of the KEM on input 1ⁿ and output the keys 𝑝𝑘
and 𝑠𝑘.
• The hybrid encryption algorithm takes the public key 𝑝𝑘 and a message 𝑚 ∈ {0, 1}∗
as input. The encapsulation algorithm 𝐸𝑛𝑐𝑎𝑝𝑠 computes
(𝑐, 𝑘) ← 𝐸𝑛𝑐𝑎𝑝𝑠_𝑝𝑘(1ⁿ).
Then the symmetric encryption algorithm ℰ takes 𝑘 and the plaintext 𝑚 as input
and computes 𝑐′ = ℰ_𝑘(𝑚). Finally, output the ciphertext (𝑐, 𝑐′).
• The hybrid decryption algorithm takes the private key 𝑠𝑘 and the ciphertext (𝑐, 𝑐′)
as input. First, the symmetric key is retrieved by computing
𝑘 = 𝐷𝑒𝑐𝑎𝑝𝑠_𝑠𝑘(𝑐).
Then decrypt 𝑐′ and output the plaintext 𝑚 = 𝒟_𝑘(𝑐′). If 𝑐 or 𝑐′ is invalid then
output ⟂. ♢
(1) If the KEM is CPA-secure and the symmetric scheme is EAV-secure, then the corre-
sponding hybrid scheme is CPA-secure.
(2) If the KEM and the symmetric scheme are both CCA2-secure, then the corresponding
hybrid scheme is CCA2-secure. ♢
Note that EAV security of the symmetric scheme is sufficient for (1). In fact, a
hybrid scheme is public-key, and so EAV and CPA security are equivalent.
Corollary 10.20. The hybrid encryption scheme that combines RSA key encapsulation
and an authenticated encryption scheme (see Definition 8.19) is CCA2-secure if the RSA
assumption holds and the hash function is modeled as a random oracle. ♢
• The encryption algorithm takes a plaintext 𝑚 and the public key 𝑝𝑘 as input,
chooses a uniform random 𝑎 ∈ ℤ𝑞 and sets 𝑘_𝐸 ‖ 𝑘_𝑀 = 𝐻(𝐵ᵃ). Then compute
𝐴 = 𝑔ᵃ, 𝑐 ← ℰ_{𝑘_𝐸}(𝑚), 𝑡 = MAC_{𝑘_𝑀}(𝑐) and output the ciphertext (𝐴, 𝑐, 𝑡).
• The decryption algorithm takes the ciphertext (𝐴, 𝑐, 𝑡) and the private key 𝑠𝑘 as input.
Compute 𝑘_𝐸 ‖ 𝑘_𝑀 = 𝐻(𝐴ᵇ), verify the tag 𝑡 using 𝑘_𝑀 and output the plaintext
𝑚 = 𝒟_{𝑘_𝐸}(𝑐). If 𝐴 ∉ 𝐺 or if the verification of 𝑡 fails then output ⟂. ♢
Note that the scheme derives both the symmetric encryption key and the message
authentication key from the shared Diffie-Hellman secret.
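The structure of the scheme can be sketched with the Python standard library. Several components below are assumptions made for illustration: the group is the toy subgroup from Example 10.7, SHA-512 stands in for 𝐻 and is split into 𝑘_𝐸 and 𝑘_𝑀, HMAC-SHA256 is used as the MAC, and the helper `keystream_xor` is a simple placeholder for the symmetric cipher ℰ rather than a vetted encryption scheme.

```python
import hashlib
import hmac
import secrets

p, q, g = 59, 29, 4                         # toy group from Example 10.7 (insecure size)

def keystream_xor(key, data):
    """Toy stand-in for the symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = b"".join(hashlib.sha256(key + i.to_bytes(4, "big")).digest()
                      for i in range(len(data) // 32 + 1))
    return bytes(x ^ y for x, y in zip(data, stream))

def derive_keys(shared):
    h = hashlib.sha512(str(shared).encode()).digest()
    return h[:32], h[32:]                   # k_E || k_M = H(shared secret)

def encrypt(B, m):
    a = secrets.randbelow(q)
    k_e, k_m = derive_keys(pow(B, a, p))    # H(B^a)
    c = keystream_xor(k_e, m)
    t = hmac.new(k_m, c, hashlib.sha256).digest()
    return pow(g, a, p), c, t               # ciphertext (A, c, t)

def decrypt(b, A, c, t):
    if pow(A, q, p) != 1:                   # reject A outside the subgroup G
        return None
    k_e, k_m = derive_keys(pow(A, b, p))    # H(A^b)
    if not hmac.compare_digest(t, hmac.new(k_m, c, hashlib.sha256).digest()):
        return None                         # tag verification failed
    return keystream_xor(k_e, c)

b = secrets.randbelow(q)                    # private key
B = pow(g, b, p)                            # public key
A, c, t = encrypt(B, b"hello")
assert decrypt(b, A, c, t) == b"hello"
```

The tag is computed over the ciphertext 𝑐 (encrypt-then-authenticate), and `None` plays the role of the error symbol ⟂.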
DHIES usually refers to Diffie-Hellman using subgroups of ℤ∗𝑝 , the multiplicative
group of integers modulo a prime number (see Section 10.4). However, if a group of
points on an elliptic curve is used (see Section 12.2), then the scheme is called the Elliptic
Curve Integrated Encryption Scheme (ECIES).
Diffie-Hellman integrated encryption schemes are CCA2-secure under certain assumptions:
Theorem 10.22. Consider DHIES or ECIES and suppose the underlying symmetric en-
cryption scheme is CPA-secure, the message authentication code is strongly secure, the
gap-CDH assumption holds for the Diffie-Hellman group and the hash function 𝐻 is mod-
eled as a random oracle. Then DHIES and ECIES are CCA2-secure encryption schemes.
The above Theorem follows from the CCA2 security of the Diffie-Hellman key en-
capsulation method and the CCA2 security of the encrypt-then-authenticate construc-
tion (see Section 8.4) for CPA-secure symmetric encryption schemes.
Remark 10.23. Several of the above statements, in particular on the CCA2 security
of key encapsulation and hybrid encryption schemes, require the assumption that the
hash function is modeled as a random oracle. We refer to the literature for security
guarantees without the use of the random oracle model ([KL15], [HK07]).
10.8. Summary
1. Show that the discrete-logarithm problem is easy in the additive group (ℤ𝑝 , +).
2. How can you efficiently generate Diffie-Hellman parameters 𝑝, 𝑞 and 𝑔 for the
multiplicative group ℤ∗𝑝 with given bit lengths 𝑛𝑝 and 𝑛𝑞 for 𝑝 and 𝑞 ?
3. Let 𝑝 = 89, 𝑔 = 2 mod 89 and 𝐺 = ⟨𝑔⟩. How many different shared keys 𝑘 are
possible in a Diffie-Hellman key exchange with these parameters?
4. You perform a Diffie-Hellman key exchange with Alice and you agreed on the pa-
rameters 𝑝 = 43, 𝐺 = ⟨𝑔⟩ ⊂ ℤ∗𝑝 and 𝑔 = 3 mod 43.
(a) Determine 𝑞 = ord(𝑔).
(b) Alice sends you 𝐴 = 14 and you choose the secret exponent 𝑏 = 26. Which
value do you send to Alice? Compute the shared secret key 𝑘.
5. Let 𝑔 ≡ 3 be an element of the group ℤ∗107 .
(a) Show that 𝑔 generates a group 𝐺 of prime order.
(b) How many exponentiations at most are necessary to compute a discrete loga-
rithm in 𝐺 using the Babystep-Giantstep algorithm?
(c) Compute log3 (12) in 𝐺.
6. Show that the following parameters (a 2048-bit MODP group given in RFC 5114
[LK08]) can be used in a Diffie-Hellman key exchange, i.e., show that 𝑝 and 𝑞 are
prime numbers and ord(𝑔) = 𝑞.
Tip: Use SageMath. Remove the line breaks and define strings. The corresponding
hexadecimal numbers can be constructed with ZZ( ..., 16). Use the function
is_pseudoprime( ) to check the primality.
p = 87A8E61D B4B6663C FFBBD19C 65195999 8CEEF608 660DD0F2
5D2CEED4 435E3B00 E00DF8F1 D61957D4 FAF7DF45 61B2AA30
16C3D911 34096FAA 3BF4296D 830E9A7C 209E0C64 97517ABD
5A8A9D30 6BCF67ED 91F9E672 5B4758C0 22E0B1EF 4275BF7B
6C5BFC11 D45F9088 B941F54E B1E59BB8 BC39A0BF 12307F5C
4FDB70C5 81B23F76 B63ACAE1 CAA6B790 2D525267 35488A0E
F13C6D9A 51BFA4AB 3AD83477 96524D8E F6A167B5 A41825D9
67E144E5 14056425 1CCACB83 E6B486F6 B3CA3F79 71506026
C0B857F6 89962856 DED4010A BD0BE621 C3A3960A 54E710C3
75F26375 D7014103 A4B54330 C198AF12 6116D227 6E11715F
693877FA D7EF09CA DB094AE9 1E1A1597
g = 3FB32C9B 73134D0B 2E775066 60EDBD48 4CA7B18F 21EF2054
07F4793A 1A0BA125 10DBC150 77BE463F FF4FED4A AC0BB555
BE3A6C1B 0C6B47B1 BC3773BF 7E8C6F62 901228F8 C28CBB18
A55AE313 41000A65 0196F931 C77A57F2 DDF463E5 E9EC144B
777DE62A AAB8A862 8AC376D2 82D6ED38 64E67982 428EBC83
1D14348F 6F2F9193 B5045AF2 767164E1 DFC967C1 FB3F2E55
A4BD1BFF E83B9C80 D052B985 D182EA0A DB2A3B73 13D3FE14
C8484B1E 052588B9 B7D2BBD2 DF016199 ECD06E15 57CD0915
B3353BBB 64E0EC37 7FD02837 0DF92B52 C7891428 CDC67EB6
184B523D 1DB246C3 2F630784 90F00EF8 D647D148 D4795451
5E2327CF EF98C582 664B4C0F 6CC41659
q = 8CF83642 A709A097 B4479976 40129DA2 99B1A47D 1EB3750B
A308B0FE 64F5FBD3
7. Let 𝑝 = 59, 𝑔 ≡ 4 ∈ ℤ∗𝑝 , 𝑞 = ord(𝑔) = 29, 𝐺 = ⟨𝑔⟩ and 𝐴 = 𝑔ᵃ ≡ 9. Compute the
discrete logarithm 𝑎 = log_𝑔(𝐴) in 𝐺 with Pollard’s 𝜌 method.
Hint: Use the collision 𝑔²𝐴¹ = 𝑔⁵𝐴⁵ in 𝐺.
8. Show that 𝑔 = 11 generates the group 𝐺 = ℤ∗109 . Apply the Pohlig-Hellman algo-
rithm to compute the discrete logarithm log11 (54).
9. Why is RSA key encapsulation (see Definition 10.14) not CPA-secure without the
hashing operation?
10. Discuss the consequences of re-using one or both of the secret Diffie-Hellman keys
𝑎 and 𝑏.
11. Explain a Man-in-the-Middle attack against the Diffie-Hellman protocol.
12. The ElGamal public-key encryption scheme uses the same parameters as the Diffie-
Hellman key-exchange, i.e., a cyclic group 𝐺, a generator 𝑔 and 𝑞 = ord(𝑔). Choose
a uniform number 𝑎 ∈ ℤ𝑞 and set 𝐴 = 𝑔ᵃ ∈ 𝐺. The message space is 𝐺, the
ciphertext space is 𝐺 × 𝐺, the public key is 𝑝𝑘 = (𝐺, 𝑞, 𝑔, 𝐴) and the private key
Digital Signatures
Digital signatures are asymmetric cryptographic schemes which aim at data integrity
and authenticity. There are some similarities to message authentication codes, but digi-
tal signatures are verified using a public key. Successful verification shows that the data
is authentic and has not been tampered with. Since the private key is exclusively con-
trolled by the signer, digital signatures achieve not only data integrity and authenticity,
but also non-repudiation. Signatures have applications beyond integrity protection, for
example in entity authentication protocols.
In Section 11.1, we define digital signature schemes and discuss their security: sig-
natures should be unforgeable. Section 11.2 deals with the definition of the plain RSA
signature, which is based on the same parameters as the RSA cryptosystem. The plain
RSA signature is forgeable, and hashing of the data is advisable for security and effi-
ciency reasons. Furthermore, we present the probabilistic RSA-PSS scheme in Section
11.3. Other signature schemes (ElGamal and DSA) are briefly discussed in the exercises
at the end of this chapter.
We refer the reader to [PP10], [KL15] and [GB08] for additional reading.
Definition 11.1. A digital signature scheme is given by the following spaces and poly-
nomial-time algorithms:
• A message space ℳ.
• A space of key pairs 𝒦 = 𝒦𝑝𝑘 × 𝒦𝑠𝑘 .
Note that verification of a signature also requires the message. The signature is
usually short and does not include the message.
Similar to a message authentication code, the security of a signature scheme is
determined by the hardness of computing a valid signature without the private key. We
assume that an adversary knows the public key and is thus able to verify signatures.
Furthermore, we assume that the adversary can choose arbitrary messages to be signed.
This is called a chosen message attack and corresponds to a situation in practice where
many signature values are known and legitimate or innocent messages are routinely signed.
Definition 11.2. Suppose a signature scheme is given. Consider the following experiment
(see Figure 11.1): a challenger takes the security parameter 1^𝑛 as input and generates
a key pair (𝑝𝑘, 𝑠𝑘) by running 𝐺𝑒𝑛(1^𝑛). An adversary 𝐴 is given 1^𝑛 and the public
key 𝑝𝑘. The adversary can choose messages 𝑚 and obtains the signature 𝑠 = 𝑠𝑖𝑔𝑛_𝑠𝑘(𝑚)
from an oracle. 𝐴 can also verify signatures using the public key 𝑝𝑘. The adversary tries
to forge a signature of a new message 𝑚′ and outputs (𝑚′, 𝑠′). The challenger outputs
1 if the signature is valid and 𝑚′ has not been queried before, and 0 otherwise.
The scheme is called existentially unforgeable under an adaptive chosen message
attack (EUF-CMA secure or just secure) if for all probabilistic polynomial-time adver-
saries, the probability of success is negligible in 𝑛. ♢
The definition of a secure scheme requires that the length of signature values is
not too short, since an adversary might otherwise guess valid signatures.
Digital signatures protect the integrity and authenticity of messages and can also
achieve non-repudiation. Since the signer alone controls the private key, they can-
not deny the signature afterwards. Note that symmetric schemes cannot achieve non-
repudiation, since the secret key is known to two (or more) parties.
Remark 11.3. The verification of a digital signature requires the authentic public key
of the signer. Although public keys can be openly shared, authenticity is not evident.
A man-in-the-middle might replace the message, the signature and the public key with
his own data. A verifier is not able to detect this attack, unless they can check the
authenticity of the public key. In practice, public keys are often signed by other parties
(in a Web of Trust) or by a Certification Authority (CA). A signed certificate binds the
identity of a subject to its public key. This shifts the problem of authentic public keys
to a third party that is hopefully trustworthy.
[Figure 11.1: Signature forgery experiment. The challenger generates (𝑝𝑘, 𝑠𝑘) ← 𝐺𝑒𝑛(1^𝑛) and
gives 1^𝑛 and 𝑝𝑘 to the adversary. The adversary chooses messages 𝑚 and obtains 𝑠 = 𝑠𝑖𝑔𝑛_𝑠𝑘(𝑚)
from the oracle, then chooses 𝑚′, forges a signature 𝑠′ and sends (𝑚′, 𝑠′). The challenger
verifies (𝑚′, 𝑠′) and outputs 1 or 0.]
The correctness follows in the same way as for the RSA encryption algorithm.
The efficiency of RSA was discussed in Section 9.5. The complexity of signature
verification is 𝑂(𝑛²) if the public exponent 𝑒 is short, for example 𝑒 = 2^16 + 1. The
running time of RSA signature generation is 𝑂(𝑛³) because the size of the private exponent 𝑑
is 𝑛. The exponentiation can be accelerated by a factor of around 4 using the Chinese
Remainder Theorem (compare Example 9.13). Digital signatures are not as efficient
as message authentication codes (see Chapter 8), and one avoids carrying out a large
number of signatures or verifications.
Example 11.5. Alice’s RSA parameters are 𝑝 = 11, 𝑞 = 23, 𝑁 = 𝑝𝑞 = 253, 𝑒 = 3
and 𝑑 = 147. She signs the message 𝑚 = 111 and computes 𝑠 = 111^147 mod 253 = 89.
Bob uses her public key 𝑝𝑘 = (3, 253) and verifies the signature by computing
89³ mod 253 = 111. ♢
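The computation in Example 11.5 can be reproduced with Python's built-in modular exponentiation. This is only a sketch with the toy parameters of the example; the function names `sign` and `verify` are illustrative and not part of any library.

```python
# Toy parameters from Example 11.5 (far too small for real use).
p, q, e = 11, 23, 3
N = p * q
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)          # private exponent, inverse of e modulo phi(N)


def sign(m):
    """Plain RSA signature: s = m^d mod N."""
    return pow(m, d, N)


def verify(m, s):
    """Accept if s^e mod N equals the message."""
    return pow(s, e, N) == m % N


s = sign(111)
print(d, s, verify(111, s))
```

The call `pow(e, -1, phi)` requires Python 3.8 or later.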
Unfortunately, this scheme is both impractical and insecure. Firstly, the message
length is limited by the size of the RSA modulus 𝑁, but in practice, one needs to sign
messages of arbitrary length and not only several hundred bytes, the usual RSA modu-
lus length.
Furthermore, the plain RSA signature scheme is insecure, because the signature is
multiplicative and new signature values can be easily forged. If 𝑠₁ and 𝑠₂ are signatures
of 𝑚₁ and 𝑚₂, then 𝑠₁𝑠₂ mod 𝑁 is a valid signature of 𝑚₁𝑚₂ mod 𝑁. Similarly, valid
signature values can be generated by taking powers. An adversary can even choose
any 𝑠 ∈ ℤ*_𝑁 and compute 𝑚 = 𝑠^𝑒 mod 𝑁. Then 𝑠 is a valid signature of the message
𝑚. This attack is called existential forgery. Note that the adversary only controls the
signature value 𝑠 and not the message 𝑚.
Example 11.6. Assume Alice’s RSA parameters are the same as in Example 11.5 above.
Mallory generates a forged signature value 𝑠 = 123 and computes
𝑚 = 𝑠³ ≡ 52 mod 253. Bob successfully verifies the signature 𝑠 of 𝑚:
𝑠^𝑒 = 123³ ≡ 52 mod 253.
But Alice has never signed 𝑚 = 52. ♢
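Both forgeries against plain RSA signatures can be checked numerically. The following sketch reuses the toy key of Example 11.5; the second message 𝑚₂ = 2 is an arbitrary choice made here for illustration.

```python
# Toy key from Example 11.5; d is used only to produce the legitimate signatures.
N, e, d = 253, 3, 147


def sign(m):
    return pow(m, d, N)


def verify(m, s):
    return pow(s, e, N) == m % N


# Existential forgery: pick s first, then derive the message m = s^e mod N.
s_forged = 123
m_forged = pow(s_forged, e, N)

# Multiplicative forgery: combine two legitimate signatures.
m1, m2 = 111, 2
s1, s2 = sign(m1), sign(m2)
s12 = (s1 * s2) % N

print(m_forged, verify(m_forged, s_forged))
print(verify((m1 * m2) % N, s12))
```

Neither forgery uses the private exponent: the first needs only the public key, the second only two signatures obtained from the signer.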
We conclude that the plain RSA signature scheme is not EUF-CMA secure. This
is analogous to the fact that plain RSA encryption is malleable and insecure under a
chosen ciphertext attack (see Chapter 9.2).
Closer analysis shows that the hash function should be collision-resistant and have
the properties of a random oracle. Furthermore, the range of the hash function should
be the full RSA message space ℤ∗𝑁 . The latter poses a problem in practice, since the
RSA modulus is usually more than 2000 bits long, whereas the digests of well-known
hash functions are much shorter − only between 160 and 512 bits.
The RSA-FDH (Full Domain Hash) signature is similar to the plain RSA scheme,
but leverages a hash function 𝐻 ∶ {0, 1}* → ℤ*_𝑁. A message 𝑚 is first hashed and then signed:
𝑠 = 𝑠𝑖𝑔𝑛_𝑠𝑘(𝑚) = 𝐻(𝑚)^𝑑 mod 𝑁.
In the verification step, 𝐻(𝑚) is computed and then compared to 𝑠^𝑒 mod 𝑁. The signature
is valid if 𝐻(𝑚) = 𝑠^𝑒 mod 𝑁.
Obviously, the collision-resistance of 𝐻 is crucial, since a collision 𝐻(𝑚1 ) = 𝐻(𝑚2 )
with 𝑚1 ≠ 𝑚2 can be used for an existential forgery.
Theorem 11.7. If 𝐻 has range ℤ∗𝑁 and is modeled as a random oracle, the RSA-FDH
scheme is EUF-CMA secure under the RSA assumption.
Remark 11.8. The above theorem has a proof by reduction (see [BR05], [KL15]). If we
assume that the hash function behaves like a random oracle, then forging a signature
is only possible by inverting the RSA function 𝑓(𝑥) = 𝑥^𝑒 mod 𝑁 on uniform random
integers modulo 𝑁. But under the RSA assumption, the probability of successfully
inverting 𝑓 is negligible. ♢
Since the length of cryptographic hashes is usually smaller than the size of the
RSA modulus, one stretches the hash by randomized padding. The result should still
be indistinguishable from a random integer in ℤ𝑁 . A standard method is defined in
PKCS #1 version 2.2 (see RFC 8017 [MKJR16]) and is called the Probabilistic Signature
Scheme (RSA-PSS) or RSASSA-PSS (RSA signature scheme with appendix).
Similar to RSA-OAEP (see Chapter 9.6), the PSS encoding requires a hash function
𝐻 with output byte length ℎ and a mask generating function 𝑀𝐺𝐹 with input length ℎ
and variable output length. For example, 𝐻 could be SHA-2 and 𝑀𝐺𝐹 is based on this
hash function.
In the following, we describe RSA-PSS signing and verification (see
Figure 11.2). Let 𝑚 be the message. We will derive an encoded message 𝐸𝑀 of byte
length 𝑘 = ⌈(size(𝑁) − 1)/8⌉ and sign 𝐸𝑀.
First, 𝑚′ is defined by concatenating eight zero padding bytes, the hashed message
𝐻(𝑚) and a random salt string. A typical salt length is ℎ bytes, but an empty salt is also
permitted:
𝑚′ = 00⁸ ‖ 𝐻(𝑚) ‖ salt.
The result is hashed again and we obtain 𝐻(𝑚′). A data block 𝐷𝐵 of length 𝑘 − ℎ − 1
is formed by concatenating the necessary number of zero padding bytes, one byte 01
and the salt:
𝐷𝐵 = 00 … 00 ‖ 01 ‖ salt.
We mask the data block,
𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 = 𝐷𝐵 ⊕ 𝑀𝐺𝐹(𝐻(𝑚′), 𝑘 − ℎ − 1),
and define the encoded message 𝐸𝑀 by concatenating 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵, the hash 𝐻(𝑚′) and
the byte BC:
𝐸𝑀 = 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 ‖ 𝐻(𝑚′) ‖ BC.
The signature of 𝑚 is 𝑠 = 𝐸𝑀^𝑑 mod 𝑁.
To verify a signature 𝑠 of 𝑚, one first recovers the encoded message:
𝐸𝑀 = 𝑠^𝑒 mod 𝑁.
The rightmost byte of 𝐸𝑀 should be BC. Then the byte strings 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 of length
𝑘 − ℎ − 1 and 𝐻 ′ of length ℎ are extracted and
𝐷𝐵 = 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 ⊕ 𝑀𝐺𝐹(𝐻 ′ , 𝑘 − ℎ − 1)
is computed. The leftmost bytes of 𝐷𝐵 should be 00, followed by 01. The remaining
bytes of 𝐷𝐵 form the salt. Set
𝑚′ = 00⁸ ‖ 𝐻(𝑚) ‖ salt
and compute 𝐻(𝑚′). If 𝐻(𝑚′) = 𝐻′ then the signature is valid. Otherwise, the signature
is invalid. It is important that only one failure message is given for all format or
verification errors.
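The PSS encoding and the corresponding checks can be sketched in Python. The sketch below makes several assumptions that are not fixed by the text: 𝐻 is SHA-256 (so ℎ = 32), 𝑀𝐺𝐹 is the MGF1 construction of RFC 8017, and the leftmost bits of 𝐸𝑀 are cleared as RFC 8017 prescribes so that 𝐸𝑀 is smaller than 𝑁. The RSA exponentiation itself is omitted, and the function names are illustrative only.

```python
import hashlib
import os

H_LEN = 32  # output length h of SHA-256 in bytes (assumed hash function)


def H(data):
    return hashlib.sha256(data).digest()


def mgf1(seed, length):
    """Mask generating function MGF1 (RFC 8017) built from H."""
    out, counter = b"", 0
    while len(out) < length:
        out += H(seed + counter.to_bytes(4, "big"))
        counter += 1
    return out[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pss_encode(m, mod_bits, salt=None):
    """Return the encoded message EM of byte length k."""
    em_bits = mod_bits - 1
    k = (em_bits + 7) // 8
    salt = os.urandom(H_LEN) if salt is None else salt
    if k < H_LEN + len(salt) + 2:
        raise ValueError("modulus too short for this hash and salt length")
    h = H(bytes(8) + H(m) + salt)                        # H(m')
    db = bytes(k - H_LEN - len(salt) - 2) + b"\x01" + salt
    masked = bytearray(xor(db, mgf1(h, k - H_LEN - 1)))
    masked[0] &= 0xFF >> (8 * k - em_bits)               # RFC 8017: clear top bits
    return bytes(masked) + h + b"\xbc"


def pss_verify(m, em, mod_bits, salt_len=H_LEN):
    """Check EM against m; a single boolean covers all failure cases."""
    em_bits = mod_bits - 1
    k = (em_bits + 7) // 8
    if len(em) != k or k < H_LEN + salt_len + 2 or em[-1] != 0xBC:
        return False
    masked, h = em[:k - H_LEN - 1], em[k - H_LEN - 1:-1]
    top_mask = 0xFF >> (8 * k - em_bits)
    if masked[0] & ~top_mask & 0xFF:
        return False
    db = bytearray(xor(masked, mgf1(h, k - H_LEN - 1)))
    db[0] &= top_mask
    pad_len = k - H_LEN - salt_len - 2
    if db[:pad_len] != bytes(pad_len) or db[pad_len] != 0x01:
        return False
    salt = bytes(db[pad_len + 1:])
    return H(bytes(8) + H(m) + salt) == h


em = pss_encode(b"hello", 2048)
print(len(em), pss_verify(b"hello", em, 2048))
```

A complete signature would additionally compute 𝐸𝑀^𝑑 mod 𝑁 on the integer represented by `em`, and verification would start from 𝑠^𝑒 mod 𝑁.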
Remark 11.9. If the salt is randomly chosen and is sufficiently long, then the RSA-PSS
signature is randomized and signing the same message twice using the same key will
give different signature values. An adversary who compares different signature values
cannot see whether any of the underlying messages are identical. ♢
Theorem 11.10. The RSA-PSS signature scheme is EUF-CMA secure in the random or-
acle model under the RSA assumption. ♢
11.4. Summary
9. Suppose we want to sign and to encrypt data (signcryption). Analogous to the
encrypt-then-authenticate approach in symmetric cryptography, Alice encrypts a secret
message with Bob’s public key and then signs the ciphertext using her private key.
(a) How can an adversary produce their own signature of the same message and
thus mislead Bob?
(b) Does this combination of encryption and signature provide non-repudiation?
10. The ElGamal signature scheme is based on the discrete logarithm problem and uses
a cyclic subgroup 𝐺 of ℤ*_𝑝 of order 𝑞 and a generator 𝑔 ∈ 𝐺. One chooses a secret
uniform 𝑎 ∈ ℤ_𝑞 and computes 𝐴 = 𝑔^𝑎. Then 𝑝𝑘 = (𝑝, 𝑔, 𝑞, 𝐴) forms the public key
and 𝑠𝑘 = (𝑝, 𝑔, 𝑞, 𝑎) is the secret key. The signature generation is randomized; one
chooses a random uniform 𝑘 ∈ ℤ*_𝑞 and computes the signature value by
𝑠𝑖𝑔𝑛_𝑠𝑘(𝑚) = (𝑟, 𝑠) with 𝑟 ≡ 𝑔^𝑘 mod 𝑝 and 𝑠 ≡ 𝑘⁻¹(𝐻(𝑚) − 𝑎𝑟) mod 𝑞.
To verify the signature (𝑟, 𝑠) of a message 𝑚, one computes 𝐴^𝑟 𝑟^𝑠 mod 𝑝 and compares
the result with 𝑔^𝐻(𝑚) mod 𝑝. If both residue classes coincide, then the signature
is valid.
The Digital Signature Algorithm (called DSA or DSS) is a standardized variant
of the ElGamal signature scheme [FIP13]. The bit lengths of 𝑝 and 𝑞 are specified,
e.g., size(𝑝) = 2048 and size(𝑞) = 224. Unlike ElGamal, both DSA signature parts
𝑟 and 𝑠 are reduced modulo 𝑞 and the verification is also performed modulo 𝑞.
Although 𝑞 is much smaller than 𝑝, the existing sub-exponential attacks against
the discrete logarithm problem do not run faster in subgroups of ℤ∗𝑝 . Other attacks,
e.g., Babystep-Giantstep and Pollard’s 𝜌-algorithm, can use the subgroup, but their
running time is 𝑂(2^{size(𝑞)/2}) and thus out of reach if 𝑞 has more than 200 bits.
(a) Show that the verification is correct.
(b) Assume that 𝑝 = 59, 𝑔 ≡ 4, 𝑞 = 29 and 𝑎 = 20 form Alice’s ElGamal key.
Compute the public parameter 𝐴.
(c) Alice wants to sign a message 𝑚 with hash value 𝐻(𝑚) = 8. She chooses the
secret parameter 𝑘 = 5. Compute the ElGamal signature (𝑟, 𝑠).
(d) Check that the signature (𝑟, 𝑠) is valid.
(e) The ElGamal signature is randomized by 𝑘. Explain why 𝑘 must remain secret
and should not be re-used for different signatures.
Chapter 12
Elliptic Curve Cryptography
In this chapter, we outline the basics of Elliptic Curve Cryptography (ECC). In several
public-key schemes, the additive group of points on elliptic curves over finite fields
forms an alternative to the multiplicative group of integers modulo a prime number.
The addition of points on an elliptic curve is slightly more complex than the multipli-
cation of residue classes, but elliptic curves offer a similar level of security as the mul-
tiplicative group with shorter keys. ECC is now widely used because of its efficiency
and accepted security.
In Section 12.1, Weierstrass equations and elliptic curves are introduced and the
addition of points on a cubic curve is explained. We present the Elliptic Curve Diffie-
Hellman algorithm in Section 12.2 and discuss the efficiency and security of ECC in
Section 12.3. Finally, in Section 12.4 we show how elliptic curves can be leveraged to
factor integers. The Elliptic Curve Digital Signature Algorithm (ECDSA) is discussed
in the exercises at the end of this chapter.
Elliptic curves over different fields are an interesting and challenging mathemati-
cal topic and we refer to [Sil09] for an in-depth treatment. There are several textbooks
on cryptographic applications of elliptic curves and we recommend [Was08], [TW06],
[Gal12] and [She17] for additional reading.
From now on, we assume that 𝑐ℎ𝑎𝑟(𝐾) is neither 2 nor 3, although this excludes
the binary fields 𝐾 = 𝐺𝐹(2^𝑚). The theory of elliptic curves also works over these fields,
and such curves are used in cryptography, but there are some technical differences, for example
with respect to the Weierstrass equation.
We want to add points at infinity to the 𝑛-dimensional space 𝐾 𝑛 .
Definition 12.4. Let 𝐾 be a field and 𝑛 ∈ ℕ. Then the 𝑛-dimensional projective space
ℙ𝑛 (𝐾) is defined as the set of all lines in 𝐾 𝑛+1 passing through the origin. Points in
ℙ𝑛 (𝐾) are given by 𝑛 + 1 projective or homogeneous coordinates and are denoted by
[𝑥1 ∶ 𝑥2 ∶ ⋯ ∶ 𝑥𝑛+1 ].
Two points are equivalent and give the same element in ℙ𝑛 (𝐾) if they are on the same
line, i.e., if they differ only by a nonzero factor 𝜆 ∈ 𝐾. Hence the projective space ℙ𝑛 (𝐾)
is a set of equivalence classes of 𝐾 𝑛+1 ⧵ {0𝑛+1 }, where
[𝑥1 ∶ 𝑥2 ∶ ⋯ ∶ 𝑥𝑛+1 ] ∼ [𝑦1 ∶ 𝑦2 ∶ ⋯ ∶ 𝑦𝑛+1 ]
if there exists a 𝜆 ∈ 𝐾 ∗ such that 𝑦1 = 𝜆𝑥1 , 𝑦2 = 𝜆𝑥2 , … , 𝑦𝑛+1 = 𝜆𝑥𝑛+1 . ♢
The points [𝑥1 ∶ ⋯ ∶ 𝑥𝑛 ∶ 0] are said to be points at infinity. The usual space 𝐾 𝑛
of 𝑛-dimensional vectors is called the affine space and we will sometimes write 𝔸𝑛 (𝐾)
instead of 𝐾 𝑛 . We have an injection
𝔸𝑛 (𝐾) ↪ ℙ𝑛 (𝐾), (𝑥1 , … 𝑥𝑛 ) ↦ [𝑥1 ∶ ⋯ ∶ 𝑥𝑛 ∶ 1]
and the complement of the image under this map consists of the points at infinity.
Example 12.5. a) Points in ℙ¹(𝐾) are lines in the plane 𝐾² passing through the origin.
If 𝑦 ≠ 0 then [𝑥 ∶ 𝑦] is equivalent to [𝑥/𝑦 ∶ 1], which lies in the image of 𝔸¹(𝐾).
Otherwise, [𝑥 ∶ 0] ∼ [1 ∶ 0] is the point at infinity. Hence
ℙ1 (𝐾) = 𝔸1 (𝐾) ∪ {[1 ∶ 0]}.
b) In the two-dimensional projective space ℙ2 (𝐾), all points are either equivalent
to [𝑥 ∶ 𝑦 ∶ 1] or to [𝑥 ∶ 𝑦 ∶ 0]. We have a decomposition
ℙ2 (𝐾) = 𝔸2 (𝐾) ∪ 𝔸1 (𝐾) ∪ {[1 ∶ 0 ∶ 0]},
where (𝑥, 𝑦) ∈ 𝐾 2 corresponds to [𝑥 ∶ 𝑦 ∶ 1] and 𝑢 ∈ 𝐾 to the point [𝑢 ∶ 1 ∶ 0]. ♢
A curve in the affine space 𝔸2 (𝐾) can be extended to a projective curve in ℙ2 (𝐾).
Suppose the curve is given by the Weierstrass equation 𝑦² = 𝑥³ + 𝑎𝑥 + 𝑏. We set 𝑥 = 𝑋/𝑍
and 𝑦 = 𝑌/𝑍 and obtain
𝑌²/𝑍² = 𝑋³/𝑍³ + 𝑎𝑋/𝑍 + 𝑏.
Multiplying both sides by 𝑍³ yields the Weierstrass equation in projective (or homogeneous)
coordinates:
𝑌²𝑍 = 𝑋³ + 𝑎𝑋𝑍² + 𝑏𝑍³.
Proposition 12.6. The points on the projective curve 𝑌²𝑍 = 𝑋³ + 𝑎𝑋𝑍² + 𝑏𝑍³ are either
equivalent to [𝑥 ∶ 𝑦 ∶ 1], where (𝑥, 𝑦) ∈ 𝐾² satisfies the affine Weierstrass equation
𝑦² = 𝑥³ + 𝑎𝑥 + 𝑏, or to the point [0 ∶ 1 ∶ 0] at infinity.
over a field 𝐾. Let 𝐾̄ be the algebraic closure of 𝐾 (see Remark 4.72). Then 𝐶 is called
nonsingular (or smooth) if for all points 𝑃 = (𝑥, 𝑦) ∈ 𝐶(𝐾̄) the partial derivatives 𝐷_𝑥𝑓
and 𝐷_𝑦𝑓 do not simultaneously vanish at 𝑃:
((𝐷_𝑥𝑓)(𝑃), (𝐷_𝑦𝑓)(𝑃)) ≠ (0, 0).
𝐷_𝑥𝑓 = 𝐷_𝑥(𝑓) and 𝐷_𝑦𝑓 = 𝐷_𝑦(𝑓) are the formal derivatives of 𝑓 with respect to 𝑥 and
𝑦, respectively (see Definition 4.55). If both derivatives vanish at 𝑃, then 𝑃 is called
a singular point. A point 𝑃 on the corresponding projective curve 𝑓(𝑋, 𝑌, 𝑍) = 0 is
nonsingular if
((𝐷_𝑋𝑓)(𝑃), (𝐷_𝑌𝑓)(𝑃), (𝐷_𝑍𝑓)(𝑃)) ≠ (0, 0, 0).
Example 12.8. Let 𝐶 be the Weierstrass curve 𝑦² = 𝑥³ over a field 𝐾. Then 𝑓(𝑥, 𝑦) =
−𝑦² + 𝑥³ and
𝐷_𝑥𝑓 = 3𝑥² and 𝐷_𝑦𝑓 = −2𝑦.
We assumed that 2 and 3 are nonzero in 𝐾. Then the equations 3𝑥² = 0 and −2𝑦 = 0
give (𝑥, 𝑦) = (0, 0), a point on the curve. Therefore, 𝐶 is singular at the point (0, 0) and
nonsingular at all other points, including the point 𝑂 = [0 ∶ 1 ∶ 0] at infinity, since
𝑓(𝑋, 𝑌, 𝑍) = −𝑌²𝑍 + 𝑋³ gives 𝐷_𝑍𝑓 = −𝑌² and (𝐷_𝑍𝑓)(𝑂) = −1. ♢
Proof. The curve is defined by the equation 𝑓(𝑥, 𝑦) = −𝑦² + 𝑥³ + 𝑎𝑥 + 𝑏 = 0. One has
𝐷_𝑥𝑓 = 3𝑥² + 𝑎 and 𝐷_𝑦𝑓 = −2𝑦.
Suppose that (𝑥, 𝑦) ∈ 𝐾̄ × 𝐾̄ lies on the Weierstrass curve and both formal partial
derivatives vanish; then 𝑦 = 0 and 𝑎 = −3𝑥². Since 𝑓(𝑥, 𝑦) = 0 we also have 𝑥³ +
𝑎𝑥 + 𝑏 = 0, which gives 𝑥³ − 3𝑥³ + 𝑏 = 0 and 𝑏 = 2𝑥³. The equations 𝑎 = −3𝑥²
and 𝑏 = 2𝑥³ imply
−4𝑎³ = 27𝑏² = 108𝑥⁶.
If Δ ≠ 0 then −4𝑎³ ≠ 27𝑏², and so the affine curve does not have singular points.
It remains to be shown that the point 𝑂 = [0 ∶ 1 ∶ 0] at infinity is also nonsingular.
In projective coordinates, we have
𝑓(𝑋, 𝑌, 𝑍) = −𝑌²𝑍 + 𝑋³ + 𝑎𝑋𝑍² + 𝑏𝑍³.
The partial derivative with respect to 𝑍 is
𝐷_𝑍𝑓 = −𝑌² + 2𝑎𝑋𝑍 + 3𝑏𝑍²,
and thus (𝐷_𝑍𝑓)(𝑂) = −1, which shows that 𝑂 is a nonsingular point. □
The above proof also shows that a curve defined by a short Weierstrass equation is
nonsingular if and only if the cubic 𝑥3 + 𝑎𝑥 + 𝑏 does not have a double root.
We have seen that 𝐸(𝐾) ⊂ ℙ2 (𝐾) consists of all points satisfying the affine Weier-
strass equation and one additional point 𝑂 = [0 ∶ 1 ∶ 0] at infinity, i.e.,
𝐸(𝐾) = {(𝑥, 𝑦) ∈ 𝐾 × 𝐾 | 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏} ∪ {𝑂}.
A very important fact is that points in 𝐸(𝐾) can be added. However, addition is not the
usual vector addition in 𝐾 2 , since the sum would not lie on the curve. Instead, we will
show that a line through two nonsingular points 𝑃 and 𝑄 intersects the elliptic curve
at a third point 𝑅 (see Figure 12.1), and use this property to define the addition. Let
𝐸 be an elliptic curve defined by the equation 𝑓(𝑥, 𝑦) = −𝑦2 + 𝑥3 + 𝑎𝑥 + 𝑏 = 0. It is
necessary to consider different cases:
(1) Firstly, we suppose that 𝑃 = (𝑥₁, 𝑦₁), 𝑄 = (𝑥₂, 𝑦₂) ∈ 𝐸(𝐾) are points with 𝑃, 𝑄 ≠ 𝑂
and different 𝑥-coordinates. Then a straight line through 𝑃 and 𝑄 intersects the
Weierstrass curve of degree 3 at a third point 𝑅 = (𝑥₃, 𝑦₃) ∈ 𝐸(𝐾). In fact, the
equation of the line through 𝑃 and 𝑄 is
𝑦 = 𝑙(𝑥) = 𝑚(𝑥 − 𝑥₁) + 𝑦₁, where 𝑚 = (𝑦₂ − 𝑦₁)/(𝑥₂ − 𝑥₁).
In order to find the 𝑥-coordinate of the third intersection point, we replace 𝑦 by
𝑙(𝑥) in the Weierstrass equation −𝑦² + 𝑥³ + 𝑎𝑥 + 𝑏 = 0. This gives a cubic equation
in the variable 𝑥:
𝑓(𝑥, 𝑙(𝑥)) = −(𝑚(𝑥 − 𝑥₁) + 𝑦₁)² + 𝑥³ + 𝑎𝑥 + 𝑏 = 𝑥³ − 𝑚²𝑥² + ⋯ = 0.
Since this equation already has two zeros in 𝐾 (the 𝑥-coordinates of 𝑃 and 𝑄), it
must have a third zero 𝑥₃ ∈ 𝐾, and hence the cubic polynomial can be factorized:
𝑓(𝑥, 𝑙(𝑥)) = (𝑥 − 𝑥₁)(𝑥 − 𝑥₂)(𝑥 − 𝑥₃) = 𝑥³ − (𝑥₁ + 𝑥₂ + 𝑥₃)𝑥² + … .
By comparing the 𝑥²-terms in the above expressions, we find that
𝑥₁ + 𝑥₂ + 𝑥₃ = 𝑚².
Thus the coordinates of 𝑅 = (𝑥₃, 𝑦₃) are
𝑥₃ = 𝑚² − 𝑥₁ − 𝑥₂ and 𝑦₃ = 𝑚(𝑥₃ − 𝑥₁) + 𝑦₁.
(2) If 𝑃, 𝑄 ∈ 𝐸(𝐾) with 𝑃, 𝑄 ≠ 𝑂 have the same 𝑥-coordinate, then the line through
𝑃 and 𝑄 is a vertical line and, as we saw above, the point 𝑂 at infinity lies on all
vertical lines. Hence the third point is 𝑅 = 𝑂.
(3) Now let 𝑃 = 𝑄 = (𝑥₁, 𝑦₁) ∈ 𝐸(𝐾) and 𝑃 ≠ 𝑂. If 𝑦₁ ≠ 0 then we take the tangent
line at 𝑃. The tangent at 𝑃 is the line of equation
(𝑥 − 𝑥₁)(𝐷_𝑥𝑓)(𝑃) + (𝑦 − 𝑦₁)(𝐷_𝑦𝑓)(𝑃) = 0,
and rearranging gives the equation
𝑦 = 𝑡(𝑥) = 𝑚(𝑥 − 𝑥₁) + 𝑦₁, where 𝑚 = −(𝐷_𝑥𝑓)(𝑃)/(𝐷_𝑦𝑓)(𝑃) = (3𝑥₁² + 𝑎)/(2𝑦₁).
Replacing 𝑦 with 𝑡(𝑥) in the Weierstrass equation 𝑓(𝑥, 𝑦) = −𝑦² + 𝑥³ + 𝑎𝑥 + 𝑏 = 0
gives a cubic equation with a double root at 𝑥₁. We denote the other root by 𝑥₃.
This is the 𝑥-coordinate of the point 𝑅 = (𝑥₃, 𝑦₃), the intersection of the tangent
with the elliptic curve. Factorization of the cubic polynomial gives
𝑓(𝑥, 𝑡(𝑥)) = (𝑥 − 𝑥₁)²(𝑥 − 𝑥₃) = 𝑥³ − (𝑥₃ + 2𝑥₁)𝑥² + … .
On the other hand, we have as above 𝑓(𝑥, 𝑡(𝑥)) = 𝑥³ − 𝑚²𝑥² + … , and comparing
the 𝑥²-terms yields
𝑥₃ = 𝑚² − 2𝑥₁.
The corresponding 𝑦-coordinate of 𝑅 is
𝑦₃ = 𝑚(𝑥₃ − 𝑥₁) + 𝑦₁.
We obtain almost the same formulas as in the case 𝑃 ≠ 𝑄, but the slope 𝑚 is
defined differently.
(4) If 𝑃 = 𝑄 = (𝑥1 , 𝑦1 ) ∈ 𝐸(𝐾) and 𝑦1 = 0, then 𝑥 = 𝑥1 is a vertical tangent line and
𝑅 = 𝑂 lies on that line.
(5) If 𝑃 = 𝑂 and 𝑄 = (𝑥1 , 𝑦1 ) ∈ 𝐸(𝐾), then the line through 𝑃 and 𝑄 is the vertical
line 𝑥 = 𝑥1 , which intersects the elliptic curve in 𝑅 = (𝑥1 , −𝑦1 ). Accordingly, if
𝑃 = (𝑥1 , 𝑦1 ) and 𝑄 = 𝑂 then 𝑅 = (𝑥1 , −𝑦1 ).
(6) Finally, if 𝑃 = 𝑄 = 𝑂 then 𝑅 = 𝑂.
In summary, given two points 𝑃, 𝑄 ∈ 𝐸(𝐾), there is a unique third point 𝑅 ∈ 𝐸(𝐾)
such that 𝑃, 𝑄 and 𝑅 (with multiplicities) lie on the same line. We can therefore define
the addition of points on 𝐸 by letting
𝑃 + 𝑄 + 𝑅 = 𝑂 ⟺ 𝑃 + 𝑄 = −𝑅
(see Figure 12.1). −𝑅 is the reflection of 𝑅 = (𝑥1 , 𝑦1 ) across the 𝑥-axis, i.e.,
−𝑅 = (𝑥1 , −𝑦1 ).
Note that the line through 𝑅 and −𝑅 is the vertical line 𝑥 = 𝑥1 and the third point on
that line is 𝑂, so that 𝑅 + (−𝑅) + 𝑂 = 𝑂, as expected.
In our computations above, we derived explicit formulas for the inverse point and
the addition of two points:
Proposition 12.11. Let 𝐾 be a field with 𝑐ℎ𝑎𝑟(𝐾) ≠ 2, 3 and 𝐸 an elliptic curve over 𝐾,
defined by the Weierstrass equation 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏.
Figure 12.1. The elliptic curve 𝐸 ∶ 𝑦² + 𝑦 = 𝑥³ − 𝑥 over the real numbers. The line
through 𝑃 and 𝑄 intersects the curve in 𝑅 and 𝑃 + 𝑄 + 𝑅 = 𝑂.
𝑚 = (3 ⋅ 1² + 3)/(2 ⋅ 3) = 1, 𝑥₃ = 1 − 1 − 1 ≡ 18 mod 19, 𝑦₃ = (1 − 18) − 3 ≡ 18 mod 19, and so
2𝑃 = (18, 18). Furthermore, −𝑃 = (1, −3) = (1, 16). ♢
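The case distinction for the addition of points translates directly into code. The following plain Python sketch works on the curve 𝑦² = 𝑥³ + 3𝑥 + 5 over 𝐺𝐹(19) used in the examples; representing 𝑂 by `None` and the names `neg`, `add` and `mul` are choices made here for illustration.

```python
# Curve y^2 = x^3 + a*x + b over GF(p); the point O at infinity is None.
p, a, b = 19, 3, 5
O = None


def neg(P):
    """Inverse point: reflection across the x-axis."""
    return O if P is O else (P[0], -P[1] % p)


def add(P, Q):
    """Add two points following the cases of the geometric construction."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                         # vertical line
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p          # secant slope
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x1 - x3) - y1) % p                        # reflected third point
    return (x3, y3)


def mul(k, P):
    """Double-and-add computation of k*P for k >= 0."""
    R = O
    while k > 0:
        if k & 1:
            R = add(R, P)
        k >>= 1
        if k:
            P = add(P, P)
    return R


P = (1, 3)
print(mul(2, P), neg(P), mul(13, P), mul(26, P))
```

Note that `add` returns the sum 𝑃 + 𝑄 = −𝑅 directly, which is why the 𝑦-coordinate is 𝑚(𝑥₁ − 𝑥₃) − 𝑦₁ rather than the 𝑦-coordinate of the third intersection point 𝑅.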
Theorem 12.13. 𝐸(𝐾) forms an abelian group with the above addition of points and the
identity element 𝑂 = [0 ∶ 1 ∶ 0].
Proof. We saw above that 𝑂 is the identity element, that every point 𝑃 has an inverse
point −𝑃 and that the addition is by construction commutative. It remains to be proven
that the addition is associative, i.e., (𝑃 + 𝑄) + 𝑅 = 𝑃 + (𝑄 + 𝑅) holds. This can be
shown by tedious computations using the explicit formulas in Proposition 12.11, by
geometric arguments (see [Was08]) or with more advanced results on algebraic curves
(see [Sil09]). □
Example 12.14. Consider the elliptic curve 𝐸 ∶ 𝑦² = 𝑥³ + 3𝑥 + 5 over 𝐾 = 𝐺𝐹(19)
(see Example 12.12). We let SageMath find all points on this curve.
sage: E = EllipticCurve(GF(19), [3, 5])
sage: E.points()
[(0 : 1 : 0), (0 : 9 : 1), (0 : 10 : 1), (1 : 3 : 1), (1 : 16 : 1),
(2 : 0 : 1), (4 : 9 : 1), (4 : 10 : 1), (6 : 7 : 1), (6 : 12 : 1),
(8 : 3 : 1), (8 : 16 : 1), (9 : 1 : 1), (9 : 18 : 1), (10 : 3 : 1),
(10 : 16 : 1), (11 : 1 : 1), (11 : 18 : 1), (14 : 6 : 1),
(14 : 13 : 1), (15 : 9 : 1), (15 : 10 : 1), (16 : 8 : 1),
(16 : 11 : 1), (18 : 1 : 1), (18 : 18 : 1)]
We see that 𝐸(𝐾) is an abelian group of order 26. By Theorem 4.29, 𝐸(𝐾) is isomorphic
to ℤ₂₆ ≅ ℤ₁₃ × ℤ₂. The points in 𝐸(𝐾) are depicted in Figure 12.2. Note that
there is one extra point 𝑂 = [0 ∶ 1 ∶ 0] at infinity. All points must have order 1, 2, 13
or 26. Let 𝑃 = (1, 3) ∈ 𝐸(𝐾). We use SageMath to compute 2𝑃 and 13𝑃.
sage: E = EllipticCurve(GF(19), [3, 5]); P = E(1, 3)
sage: 2*P
(18 : 18 : 1)
sage: 13*P
(2 : 0 : 1)
Since 2𝑃 ≠ 𝑂 and 13𝑃 ≠ 𝑂, the point 𝑃 must have maximal order 26 and therefore
generates 𝐸(𝐾). ♢
For cryptographic use, one chooses a finite field 𝐾 = 𝐺𝐹(𝑝) or 𝐾 = 𝐺𝐹(2𝑚 ), an
elliptic curve 𝐸 over 𝐾 and a base point 𝑔 ∈ 𝐸(𝐾). The point 𝑔 generates a cyclic subgroup
𝐺 = ⟨𝑔⟩ ⊂ 𝐸(𝐾) of order 𝑛 = ord(𝑔). The order 𝑛 should be a large prime or at least
contain a large prime factor. The cofactor is defined as ℎ = ord(𝐸(𝐾))/𝑛 and usually ℎ is
small or equal to 1.
Determining the order of a point 𝑔 and the order of 𝐸(𝐾) is a non-trivial task, but
there are efficient algorithms (see [Was08]). Hasse’s Theorem provides the approxi-
mate number of points on an elliptic curve over a finite field:
Theorem 12.15. Let 𝐸 be an elliptic curve over a finite field 𝐾 of order 𝑞. Then
| 𝑞 + 1 − ord(𝐸(𝐾)) | ≤ 2√𝑞. ♢
Note that the obvious estimate based on the Weierstrass equation only gives 1 ≤
ord(𝐸(𝐾)) ≤ 2𝑞 + 1.
Example 12.16. Let 𝐸 be any elliptic curve over 𝐺𝐹(19). Then 𝑞 + 1 = 20 and 2√𝑞 ≈
8.7. Hence 𝐸(𝐺𝐹(19)) must have between 12 and 28 points. In Example 12.14, we saw
that the order of 𝐸(𝐺𝐹(19)) is 26.
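For a small field, the number of points can be found by brute force and compared with Hasse's bound. The sketch below is only feasible for tiny 𝑞; the helper name `count_points` is an illustrative choice, and the efficient point-counting algorithms mentioned above are not shown.

```python
import math


def count_points(a, b, p):
    """Count the points of y^2 = x^3 + a*x + b over GF(p), including O."""
    n = 1                                    # the point O at infinity
    for x in range(p):
        rhs = (x ** 3 + a * x + b) % p
        for y in range(p):
            if (y * y) % p == rhs:
                n += 1
    return n


q = 19
order = count_points(3, 5, q)
hasse_ok = abs(q + 1 - order) <= 2 * math.sqrt(q)
print(order, hasse_ok)
```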
Definition 12.17. Let 𝐸 be an elliptic curve over a finite field 𝐾, 𝑔 ∈ 𝐸(𝐾) a base point,
𝐺 = ⟨𝑔⟩, 𝑛 = ord(𝐺) and 𝐴 ∈ 𝐺. Then the unique positive integer 𝑎 < 𝑛 such that
𝑎 ⋅ 𝑔 = 𝐴 is called the discrete logarithm log𝑔 (𝐴) of 𝐴 ∈ 𝐺. ♢
Note that we use the term discrete logarithm although the group operation on 𝐸(𝐾)
is written additively.
The security of elliptic curve cryptography relies on the hardness of the discrete
logarithm (DL) problem (see Section 10.3) in the group 𝐺 ⊂ 𝐸(𝐾). The elliptic curve
and the parameters must be carefully chosen, since there are less secure curves where
the computation of discrete logarithms can be reduced to an easier DL problem (see
Section 12.3 below). The construction of secure elliptic curves and their domain param-
eters is beyond the scope of this book.
Elliptic curve cryptography is widely standardized by national and international
organizations (e.g., ISO, ANSI, NIST, IEEE, IETF), and one of the proposed curves is
usually chosen.
p = A9FB57DBA1EEA9BC3E660A909D838D726E3BF623D52620282013481D1F6E5377
a = 7D5A0975FC2C3057EEF67530417AFFE7FB8055C126DC5C6CE94A4B44F330B5D9
b = 26DC5C6CE94A4B44F330B5D9BBD77CBF958416295CF7E1CE6BCCDC18FF8C07B6
g = (xg,yg)
xg= 8BD2AEB9CB7E57CB2C4B482FFC81B7AFB9DE27E1E3BD23C23A4453BD9ACE3262
yg= 547EF835C3DAC4FD97F8461A14611DC9C27745132DED8E545C1D54C72F046997
n = A9FB57DBA1EEA9BC3E660A909D838D718C397AA3B561A6F7901E0E82974856A7
h = 1
An eavesdropper, who only knows 𝐴 and/or 𝐵 as well as the elliptic curve and its
domain parameters, should not be able to derive 𝑎, 𝑏 or 𝑘 if the computational Diffie-
Hellman (CDH) problem (see Section 10.3) is hard in 𝐺.
Example 12.19. Alice and Bob agree on the elliptic curve 𝑦² = 𝑥³ + 3𝑥 + 5 over 𝐺𝐹(19),
and the base point is 𝑔 = 2 ⋅ (1, 3) = (18, 18). The point 𝑔 has order 13. Alice chooses
the secret key 𝑎 = 2 and computes 𝐴 = 𝑎 ⋅ 𝑔 = 2 ⋅ (18, 18) = (11, 18). Bob chooses the
secret key 𝑏 = 4 and computes 𝐵 = 𝑏 ⋅ 𝑔 = 4 ⋅ (18, 18) = (8, 3). They exchange 𝐴 and 𝐵.
Alice obtains the shared secret key by computing 𝑘 = 𝑎 ⋅ 𝐵 = 2 ⋅ (8, 3) = (0, 10) and
Bob computes 𝑘 = 𝑏 ⋅ 𝐴 = 4 ⋅ (11, 18) = (0, 10).
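The key exchange on this toy curve can be recomputed with a few lines of plain Python. The sketch below uses the standard addition formulas with `None` for the point 𝑂; the function names are illustrative. Evaluating the formulas gives 𝐴 = (11, 18) and 𝐵 = (8, 3), and both parties arrive at the shared point 8𝑔 = (0, 10).

```python
# Curve y^2 = x^3 + 3x + 5 over GF(19); the point O is represented by None.
p, a = 19, 3


def add(P, Q):
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)


def mul(k, P):
    R = None
    while k > 0:
        if k & 1:
            R = add(R, P)
        k >>= 1
        if k:
            P = add(P, P)
    return R


g = (18, 18)                      # base point of order 13
sk_alice, sk_bob = 2, 4           # secret keys of the example
A = mul(sk_alice, g)              # Alice's public point
B = mul(sk_bob, g)                # Bob's public point
k_alice = mul(sk_alice, B)        # Alice's view of the shared secret
k_bob = mul(sk_bob, A)            # Bob's view of the shared secret
print(A, B, k_alice, k_bob)
```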
Remark 12.20. Elliptic curve Diffie-Hellman can also be used as a key encapsulation
mechanism (KEM) as explained in Section 10.6. The encapsulated key is 𝐻(𝑘), where
𝐻 ∶ 𝐺 → {0, 1}^𝑙 is a key derivation (or hash) function on 𝐺 (or on 𝐾 if the 𝑥-coordinate
is used). The KEM also gives a hybrid elliptic curve encryption scheme called ECIES
(see Section 10.7).
The formulas for addition and doubling of points in Proposition 12.11 (2) also hold over
ℤ𝑁 if the point is not equal to 𝑂 modulo 𝑝 or 𝑞. We choose an elliptic curve 𝐸 and a
point 𝑃 ∈ 𝐸(ℤ𝑁 ). Then compute 𝑘𝑃, for example 𝑘 = 𝐵!, and check whether the result
exists as an affine point modulo 𝑁. This fails if and only if a denominator of a slope 𝑚
in the computation of 𝑘𝑃 is not invertible modulo 𝑁. In this case, the greatest common
divisor (gcd) of the denominator of 𝑚 and 𝑁 is not equal to 1, and it is very likely that
the gcd is either 𝑝 or 𝑞, and not 𝑁.
One may proceed as follows to choose an elliptic curve 𝐸 and a point 𝑃 over ℤ𝑁 :
choose random integers 𝑎, 𝑢 and 𝑣 between 0 and 𝑁 − 1. Let
𝑃 = (𝑢, 𝑣) mod 𝑁,
𝑏 = 𝑣2 − 𝑢3 − 𝑎𝑢 mod 𝑁,
𝐸 ∶ 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏.
By construction we have 𝑃 ∈ 𝐸(ℤ_𝑁). Then define 𝑘 = 𝐵! for some bound 𝐵 and compute
𝑘𝑃. If 𝑘𝑃 does not exist modulo 𝑁 (as an affine point, see Remark 12.22 below),
then the gcd of the critical denominator and 𝑁 is a factor of 𝑁 greater than 1, i.e., either
𝑝, 𝑞 or 𝑁. If 𝑘𝑃 exists, or if the
factor is 𝑁, then choose a new random curve 𝐸 and point 𝑃 or increase 𝐵 and start over.
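The procedure above can be sketched in plain Python. The sketch makes some assumptions of its own: the bound 𝐵 = 20, the number of curves and the seeded random generator are arbitrary choices, the multiple (𝐵!)𝑃 is built up step by step as 2𝑃, 3 ⋅ (2𝑃), and so on, and a non-invertible denominator is signalled by a hypothetical exception class `FactorFound`. Whether a factor is found depends on the random curves, so the function may return `None`.

```python
import math
import random


class FactorFound(Exception):
    """Raised when a slope denominator is not invertible modulo N."""

    def __init__(self, divisor):
        self.divisor = divisor


def inverse(d, N):
    g = math.gcd(d % N, N)
    if g != 1:
        raise FactorFound(g)          # g is p, q or N
    return pow(d, -1, N)


def add(P, Q, a, N):
    """Affine addition modulo N; None stands for the point O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * inverse(2 * y1, N) % N
    else:
        m = (y2 - y1) * inverse(x2 - x1, N) % N
    x3 = (m * m - x1 - x2) % N
    return (x3, (m * (x1 - x3) - y1) % N)


def mul(k, P, a, N):
    R = None
    while k > 0:
        if k & 1:
            R = add(R, P, a, N)
        k >>= 1
        if k:
            P = add(P, P, a, N)
    return R


def ecm(N, B=20, curves=200, seed=1):
    """Try random curves; return a non-trivial factor of N or None."""
    rng = random.Random(seed)
    for _ in range(curves):
        a, u, v = (rng.randrange(N) for _ in range(3))
        # b = v^2 - u^3 - a*u mod N is implied; the formulas do not need it.
        P = (u, v)
        try:
            for j in range(2, B + 1):     # after step j, P equals (j!) times the start point
                P = mul(j, P, a, N)
                if P is None:
                    break
        except FactorFound as hit:
            if hit.divisor < N:
                return hit.divisor
    return None


print(ecm(1211809))   # 1211809 = 1201 * 1009
```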
Remark 12.22. We say that the point 𝑘𝑃 does not exist modulo 𝑁 if it cannot be com-
puted modulo 𝑁 (because of a non-invertible denominator). However, the correspond-
ing projective point exists and the reduction modulo 𝑝 or modulo 𝑞 is equal to the point
𝑂 at infinity. The aim of ECM is to produce such a non-affine point, since it yields a
non-trivial factor of 𝑁. This is analogous to Pollard’s 𝑝 − 1 method, where one is looking
for a power 𝑎^𝑘 that is congruent to 1 modulo 𝑝 or modulo 𝑞.
𝐸 ∶ 𝑦² = 𝑥³ + 10𝑥 − 126
and stop if any of these multiples does not exist as an affine point modulo 𝑁. The reader
can reproduce the following computations using SageMath.
We omit some intermediate steps and obtain
We denote this point by 𝑄 = (𝑥₁, 𝑦₁). The next step is computing (7!)𝑃 = 7𝑄. Using
the double-and-add algorithm and the formulas in Proposition 12.11, we compute
2𝑄 = (884483, 792125), 3𝑄 = 2𝑄 + 𝑄 = (179208, 246408) and 6𝑄 = 2 ⋅ (3𝑄) =
(1011121, 433793) = (𝑥₂, 𝑦₂). Finally, 7𝑄 = 6𝑄 + 𝑄, but now the denominator of the slope
(𝑦₂ − 𝑦₁)/(𝑥₂ − 𝑥₁)
has a common factor with 𝑁. In fact, gcd(𝑥₂ − 𝑥₁, 𝑁) = 1201 = 𝑝, and thus we have
found a factor of 𝑁. The second factor is 𝑞 = 1009.
We see that the elliptic curve factoring method is successful for 𝑁 and the chosen
curve 𝐸, point 𝑃 and multiple 𝑘 = 7!. After having found 𝑝 = 1201, it is not difficult
to understand why the method is successful: the group 𝐸(𝐺𝐹(1201)) has order 1176 =
2³ ⋅ 3 ⋅ 7², which contains only prime factors less than or equal to 7 so that (7!)𝑃 =
𝑂 mod 𝑝.
On the other hand, the order of 𝐸(𝐺𝐹(𝑞)) is 1041 = 3 ⋅ 347, and hence (7! )𝑃 ≠
𝑂 mod 𝑞. So we obtain only the factor 𝑝 and not the product 𝑝𝑞 = 𝑁. ♢
ECM is very successful in factoring integers 𝑁 with less than around 80 decimal
digits. For larger values of 𝑁, the sieve methods (see Section 9.7) are more efficient.
12.5. Summary
5. Use the parameters in Example 12.18 for an elliptic curve Diffie-Hellman key ex-
change. Assume that Alice and Bob choose the following secret parameters:
Compute 𝐴, 𝐵, 𝑏𝐴, 𝑎𝐵 and 𝑘.
6. Factorize 𝑁 = 6227327 using the elliptic curve factoring method and SageMath.
(a) Choose 𝑎 = 4, 𝑢 = 6 and 𝑣 = 2. Give the associated elliptic curve over ℤ𝑁 and
the point 𝑃 ∈ 𝐸(ℤ𝑁 ).
Tip: E=EllipticCurve(IntegerModRing(N),[a,b]) and P=E(u,v).
(b) Let 𝐵 = 13!. Show that (12!)𝑃 exists (as an affine point), but not (13!)𝑃.
(c) Find the critical denominator and compute the gcd with 𝑁.
(d) Give the factors 𝑝 and 𝑞 of 𝑁 and explain why the method is successful with
the chosen parameters.
7. The Elliptic Curve Digital Signature Algorithm (ECDSA) is the elliptic curve ana-
logue of DSA (see Exercise 11.10) and also standardized in [FIP13]. The scheme
uses an elliptic curve 𝐸 over a finite field 𝐾 and a base point 𝑔 of prime order 𝑛.
Choose a uniform secret key 𝑎 ∈ ℤ𝑛 and compute the point 𝐴 = 𝑎 𝑔 ∈ 𝐸(𝐾). The
domain parameters of the elliptic curve and 𝐴 form the public key. Similarly to
ElGamal and DSA, the signature is randomized and also requires a hash function
𝐻. In order to sign a message 𝑚, a secret uniform integer 𝑘 with 1 ≤ 𝑘 ≤ 𝑛 − 1 is
chosen and 𝑘 𝑔 is computed. Let 𝑟 be the 𝑥-coordinate modulo 𝑛 of the point 𝑘 𝑔.
If 𝑟 = 0 then choose a new value 𝑘. Otherwise, let
𝑠 = 𝑘⁻¹(𝐻(𝑚) + 𝑎𝑟) mod 𝑛.
If 𝑠 = 0 then start again with a new value 𝑘. Otherwise, the pair (𝑟, 𝑠) is the signa-
ture of 𝑚.
To verify the signature (𝑟, 𝑠) of a message 𝑚, one checks that 1 ≤ 𝑟 ≤ 𝑛 − 1 and
1 ≤ 𝑠 ≤ 𝑛 − 1. Then compute 𝑠⁻¹ mod 𝑛, 𝑠⁻¹𝐻(𝑚) mod 𝑛, 𝑠⁻¹𝑟 mod 𝑛 and the point
𝑅 = 𝑠⁻¹𝐻(𝑚)𝑔 + 𝑠⁻¹𝑟𝐴 ∈ 𝐸(𝐾).
Since ord(𝑔) = 𝑛, the result does not depend on the representatives of 𝑠⁻¹𝐻(𝑚)
and 𝑠⁻¹𝑟 modulo 𝑛. If 𝑅 = 𝑂 then the signature is invalid. Otherwise, reduce the
𝑥-coordinate of 𝑅 modulo 𝑛. The signature is valid if the result is 𝑟.
(a) Prove that the verification is correct.
(b) Use the elliptic curve and the parameters of Example 12.19; the base point is
𝑔 = (18, 18) and 𝑛 = ord(𝑔) = 13. Alice’s secret key is 𝑎 = 2. She wants to
sign a message 𝑚 with 𝐻(𝑚) ≡ 11 mod 𝑛 and chooses 𝑘 = 3. Compute the
signature (𝑟, 𝑠) and verify the signature using her public key.
(c) Show that 𝑘 must remain secret.
Chapter 13
Quantum Computing
Quantum bits can also be used for a secure key distribution, and we explain the
BB84 quantum cryptographic protocol in Section 13.6.
Two recommended textbooks for further reading on quantum computing are
[NC00] and [RP11].
Example 13.1. The measurement of a qubit having the state |0⟩ = 1 ⋅ |0⟩ + 0 ⋅ |1⟩ always
gives 0, and the measurement of the state |1⟩ always gives 1. On the other hand, if a
qubit has the state
(1/√2)|0⟩ + (1/√2)|1⟩,
then the probability of both 0 and 1 is 1/2, and a measurement outputs a uniform random
bit. This state is quite useful and we denote it by |+⟩. ♢
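The measurement probabilities of a single qubit follow from the squared absolute values of its two coefficients. A small numerical sketch, where the function name `probabilities` and the normalization step are illustrative choices:

```python
import math


def probabilities(a, b):
    """Probabilities of measuring 0 and 1 for the state a|0> + b|1>."""
    norm = abs(a) ** 2 + abs(b) ** 2      # equals 1 for a valid state
    return abs(a) ** 2 / norm, abs(b) ** 2 / norm


plus = (1 / math.sqrt(2), 1 / math.sqrt(2))   # the state |+>
p0, p1 = probabilities(*plus)
print(probabilities(1, 0), (p0, p1))
```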
A geometric representation of the state of a single qubit is given by the Bloch sphere
(see Figure 13.1). The points on a two-dimensional unit sphere in the three-dimension-
al space ℝ3 are given by two angles, the polar angle 𝜃 ∈[0, 𝜋] and the azimuth 𝜑 ∈]−𝜋, 𝜋].
In geography, the polar angle and the azimuth are called colatitude and longitude, respectively.
[Figure 13.1: Bloch sphere with the marked points 𝐳 = |0⟩, 𝐱 = |+⟩ and −𝐳 = |1⟩.]
How can we represent a state |𝜓⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ by a point on the Bloch sphere?
We use the fact that the state of a qubit does not depend on a global phase 𝛾:
$$|\psi\rangle \sim e^{i\gamma}\,|\psi\rangle.$$
We may thus assume that the phase of the first coefficient 𝑎 is zero, so that 𝑎 is a non-negative
real number. We express the second coefficient 𝑏 ∈ ℂ in polar form:
$$b = r e^{i\varphi},$$
where 𝑟 ≥ 0 and 𝜑 ∈ ]−𝜋, 𝜋]. The condition $|a|^2 + |b|^2 = 1$ implies $a^2 + r^2 = 1$,
since $|e^{i\varphi}| = 1$. The non-negative parameters 𝑎 and 𝑟 thus lie on a unit circle in the
first quadrant and we can write
$$a = \cos\left(\tfrac{\theta}{2}\right) \quad\text{and}\quad r = \sin\left(\tfrac{\theta}{2}\right),$$
Note that a global phase does not change the state, e.g., 𝑖 |+⟩ ∼ |+⟩, but the relative
phase 𝜑 is important! For example, |+⟩ and |−⟩ are different states. ♢
is very useful in order to produce a balanced superposition of the basis states. Loosely
speaking, a 0-bit is turned into a qubit that is simultaneously 0 and 1. Measuring 𝐻 |0⟩
gives a uniform random bit (see Figure 13.3).
[Figure: the Hadamard gate 𝐻 maps $a|0\rangle + b|1\rangle$ to $\frac{a+b}{\sqrt{2}}|0\rangle + \frac{a-b}{\sqrt{2}}|1\rangle$; measuring 𝐻|0⟩ gives a bit 𝑏.]
Since 𝑌 = −𝑖𝑍𝑋, Pauli-𝑌 is a composition of Pauli-𝑋 and Pauli-𝑍. The gates Pauli-𝑍,
Phase and 𝜋/8 change the relative phase by 𝜋, 𝜋/2 and 𝜋/4, respectively. On the Bloch
sphere, these gates give rotations of 𝜋, 𝜋/2 and 𝜋/4 around the 𝑧-axis. For example, one has
$$Z|\psi\rangle = Z\left(\cos\left(\tfrac{\theta}{2}\right)|0\rangle + e^{i\varphi}\sin\left(\tfrac{\theta}{2}\right)|1\rangle\right) = \cos\left(\tfrac{\theta}{2}\right)|0\rangle - e^{i\varphi}\sin\left(\tfrac{\theta}{2}\right)|1\rangle.$$
This state can be written as
$$Z|\psi\rangle = \cos\left(\tfrac{\theta}{2}\right)|0\rangle + e^{i(\varphi+\pi)}\sin\left(\tfrac{\theta}{2}\right)|1\rangle.$$
Hence the 𝑍 gate adds 𝜋 to the azimuth 𝜑 on the Bloch sphere.
The reason for the historical name 𝜋/8 gate is the fact that 𝑇 is (up to an unimportant
global phase) equal to a matrix with $e^{\pm i\pi/8}$ on its diagonal:
$$T = e^{i\pi/8}\begin{pmatrix} e^{-i\pi/8} & 0 \\ 0 & e^{i\pi/8} \end{pmatrix}.$$
|𝑥1 𝑥2 … 𝑥𝑛 ⟩ ,
where $x_i \in \{0, 1\}$. The general state of an 𝑛-qubit system is a superposition of the $2^n$
basis states. Such a system is not the same as 𝑛 individual qubits!
Remark 13.6. The state of a system of 𝑛 qubits is represented by the 𝑛-fold tensor product
of $\mathbb{C}^2$:
$$\mathbb{C}^2 \otimes \cdots \otimes \mathbb{C}^2 = (\mathbb{C}^2)^{\otimes n}.$$
The general construction of tensor products is beyond our scope, but in this case
$(\mathbb{C}^2)^{\otimes n}$ is a ℂ-vector space of dimension $2^n$. The elements in $(\mathbb{C}^2)^{\otimes n}$ are linear
combinations of vectors $v_1 \otimes v_2 \otimes \cdots \otimes v_n = |v_1, v_2, \dots, v_n\rangle$, where $v_i \in \mathbb{C}^2$. The tensor
product is linear in each component. The standard basis states are given by
$$x_1 \otimes x_2 \otimes \cdots \otimes x_n = |x_1 x_2 \dots x_n\rangle,$$
where $x_i = |0\rangle$ or $|1\rangle$. A general vector in $(\mathbb{C}^2)^{\otimes n}$ can be written as
$$v = \sum_{x \in \{0,1\}^n} a_x\, |x_1 x_2 \dots x_n\rangle.$$
The normalization condition for multiple qubit systems requires that ‖𝑣‖ = 1. Unitary
operators $U_{f_1}, \dots, U_{f_n}$ on $\mathbb{C}^2$ induce the unitary operator
$$U_f = U_{f_1} \otimes \cdots \otimes U_{f_n}$$
on $(\mathbb{C}^2)^{\otimes n}$, but there are additional operators that are not of this type.
As with single qubits, a multiple qubit state does not depend on a global phase and
A two qubit system is represented by a state in ℂ2 ⊗ ℂ2 . The four basis states are |00⟩,
|01⟩, |10⟩, |11⟩, and a general state is a superposition of the four basis states:
One can show that the Bell states are not the product of any two single qubit states
(see Exercise 1). In fact, two entangled qubits behave differently from two single qubits.
The basis states of a system of 𝑛 qubits are $|x_1 x_2 \dots x_n\rangle$ where $x_i \in \{0, 1\}$. A general
state is a superposition
$$|\psi\rangle = \sum_{x \in \{0,1\}^n} a_x\, |x\rangle \quad\text{with}\quad \sum_{x \in \{0,1\}^n} |a_x|^2 = 1.$$
An obvious generalization of the Bloch sphere for multiple qubits is not known. Note
that the full state of a system of 𝑛 qubits involves $2^n$ complex amplitudes (and $2^n - 1$
degrees of freedom, since the amplitude vector is normalized and multiplication by a
global phase is unimportant). This is a huge number, say for 𝑛 > 100. We emphasize
that a measurement outputs only one binary word 𝑥 of length 𝑛 and the probability of
𝑥 being measured is $|a_x|^2$.
qubits. This is a major difference to classical circuits, which are often not reversible,
for example the elementary AND or OR gates. However, this does not pose a serious
restriction, since there are invertible analogues of the classical gates. It turns out that
every classical circuit has a quantum analogue, giving the same output on the basis
states, but also processing superpositions of the basis states.
Definition 13.8. The controlled-NOT operation 𝐶𝑁𝑂𝑇 on two input qubits with basis
states |𝑥⟩ and |𝑦⟩ is given by
𝐶𝑁𝑂𝑇 |𝑥, 𝑦⟩ = |𝑥, 𝑥 ⊕ 𝑦⟩
(see Figure 13.4). This transformation acts on the basis states as follows: the first bit
(the control bit) is unchanged and the second bit (the target bit) is flipped if the control
bit is 1. The states |00⟩ and |01⟩ are thus unchanged, |10⟩ is mapped to |11⟩ and |11⟩ to
|10⟩. The CNOT gate is represented by the unitary matrix
$$U = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$
since the first two basis states remain unchanged, while the third and the fourth basis
state are swapped. ♢
[Figure 13.4: CNOT gate.]
CNOT can produce entangled states and this two-qubit gate is not the tensor product
of two single-qubit gates. For example, two qubits are transformed into the Bell state:
$$CNOT\left(\frac{1}{\sqrt{2}}\,(|00\rangle + |10\rangle)\right) = \frac{1}{\sqrt{2}}\,(|00\rangle + |11\rangle).$$
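As a numerical sanity check, the following Python sketch applies the CNOT matrix to the amplitude vector of (|00⟩ + |10⟩)/√2. The helper names `apply` and `is_product` are illustrative choices that do not appear in the text. The product test assumes the standard fact that a two-qubit state with amplitudes c00, c01, c10, c11 is a product of single-qubit states exactly when c00·c11 = c01·c10.

```python
import math

# CNOT as a 4x4 permutation matrix in the basis |00>, |01>, |10>, |11>
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def apply(matrix, state):
    """Multiply a matrix (list of rows) with an amplitude vector."""
    return [sum(row[j] * state[j] for j in range(len(state))) for row in matrix]

def is_product(state, tol=1e-12):
    """Product-state test for two qubits: c00*c11 - c01*c10 must vanish."""
    c00, c01, c10, c11 = state
    return abs(c00 * c11 - c01 * c10) < tol

h = 1 / math.sqrt(2)
product_state = [h, 0, h, 0]            # (|00> + |10>)/sqrt(2)
bell_state = apply(CNOT, product_state)  # amplitudes of (|00> + |11>)/sqrt(2)

print(bell_state)
print(is_product(product_state), is_product(bell_state))
```

The input passes the product test, whereas the output does not, which matches the statement that CNOT can produce entangled states.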
CNOT is an example of a two qubit controlled gate (see Figure 13.5): if the first (control)
qubit is |1⟩, then a single qubit operation 𝑄 is performed on the second (target) qubit.
If the control qubit is |0⟩, then the target qubit is unchanged. The
controlled 𝑄 gate is represented by a 4 × 4 matrix of four 2 × 2 blocks, where $I_2$ is the
2 × 2 identity matrix:
$$\begin{pmatrix} I_2 & 0 \\ 0 & Q \end{pmatrix}.$$
Theorem 13.9. Single qubit gates and the CNOT gate are sufficient to implement an
arbitrary unitary operation on 𝑛 qubits.
Proof. We refer to [NC00] for a proof. Firstly, one shows that an arbitrary unitary
matrix can be decomposed into a product of two-level unitary matrices, where at most
two coordinates are changed. Secondly, circuits from two-level unitary matrices are
built from single qubit and CNOT gates. □
Remark 13.10. At the time of writing, IBM offers quantum computers and simu-
lators for public use (https://quantum-computing.ibm.com). You can create and
run your own algorithms on quantum devices using a graphical composer or a Quan-
tum Assembly Language Code (QASM) editor. Circuits are built from a set of ele-
mentary gates (𝑋, 𝑌 , 𝑍, 𝐻, 𝑆, 𝑇, CNOT, … ), and combinations of them can express
more complicated unitary transformations. Also look at the open source framework
Qiskit (https://qiskit.org) for working with noisy quantum computers.
Many quantum algorithms take as input a superposition of the basis states. The
single qubit Hadamard gate 𝐻 (see Section 13.1) transforms the basis state |0⟩ into the
superposition
$$\frac{1}{\sqrt{2}}\,(|0\rangle + |1\rangle).$$
The Walsh-Hadamard transformation generalizes this to a system of 𝑛 qubits. It can
be implemented by 𝑛 parallel Hadamard gates.
Definition 13.11. The Walsh-Hadamard transformation 𝑊 acts on a system of 𝑛 qubits
and is defined by
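$$W = H \otimes H \otimes \cdots \otimes H = H^{\otimes n}.$$ ♢
[Figure: the Walsh-Hadamard transformation 𝑊 maps $|0^n\rangle$ to $\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle$.]
The second equation follows from the fact that the tensor product is linear in each component.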
Quantum algorithms can use this superposition to simultaneously compute all values
of a vectorial Boolean function $f \colon \{0, 1\}^n \to \{0, 1\}^m$. Since 𝑓 may not be invertible
(which is always the case if 𝑛 ≠ 𝑚), but quantum circuits must be invertible, 𝑓 has to
be tweaked. Given 𝑓, we define $F \colon \{0, 1\}^{n+m} \to \{0, 1\}^{n+m}$ by
$$F(x, y) = (x, y \oplus f(x)).$$
Obviously, 𝐹 is invertible and $F^{-1} = F$. The corresponding unitary transformation on
a system of 𝑛 + 𝑚 qubits is given by
$$U_f(|x, y\rangle) = |x, y \oplus f(x)\rangle$$
(see Figure 13.7). If 𝑓 can be computed efficiently by a classical circuit, then such a
transformation can be efficiently implemented by combining elementary gates.
[Figure 13.7: the transformation $U_f$ maps the inputs $|x\rangle$, $|y\rangle$ to $|x\rangle$, $|y \oplus f(x)\rangle$.]
Example 13.12. Let $f(x_1, x_2) = x_1 x_2$ be the classical AND operation on two input bits
(see Table 1.1). The corresponding invertible function is $F \colon \{0, 1\}^3 \to \{0, 1\}^3$, where
$F(x_1, x_2, y) = (x_1, x_2, y \oplus x_1 x_2)$. The quantum transformation is called a Toffoli gate
and is given by
$$U_f(|x_1, x_2, y\rangle) = |x_1, x_2, y \oplus x_1 x_2\rangle$$
on three input qubits with the basis states $|x_1\rangle$, $|x_2\rangle$ and $|y\rangle$. ♢
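The construction 𝐹(𝑥, 𝑦) = (𝑥, 𝑦 ⊕ 𝑓(𝑥)) can be tried out on bit tuples. The sketch below builds 𝐹 for the AND function of Example 13.12 and checks on all eight inputs that 𝐹 is a bijection and its own inverse. The function names `make_F` and `and_gate` are hypothetical, and the code only works on classical basis states; it does not simulate superpositions.

```python
from itertools import product

def make_F(f):
    """Return F(x, y) = (x, y XOR f(x)), where x, y and f(x) are bit tuples."""
    def F(x, y):
        return x, tuple(yi ^ fi for yi, fi in zip(y, f(x)))
    return F

def and_gate(x):
    """Classical AND on two bits, returned as a 1-bit tuple."""
    return (x[0] & x[1],)

F = make_F(and_gate)  # classical action of the Toffoli gate on basis states

inputs = [(x, y) for x in product((0, 1), repeat=2) for y in product((0, 1), repeat=1)]
images = [F(x, y) for x, y in inputs]

is_bijection = sorted(images) == sorted(inputs)
is_involution = all(F(*F(x, y)) == (x, y) for x, y in inputs)
print(is_bijection, is_involution)
```

Setting the last component to 0 recovers the AND of the first two bits, for example `F((1, 1), (0,))` returns `((1, 1), (1,))`.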
One can leverage 𝐹 to compute 𝑓 by setting the second input component 𝑦 to the
zero string $0^m$. Then $F(x, 0^m) = (x, f(x))$ and thus $U_f |x, 0^m\rangle = |x, f(x)\rangle$. Furthermore,
$U_f$ maps a superposition $\sum_x a_x |x, 0^m\rangle$ to
$$\sum_x a_x\, |x, f(x)\rangle.$$
In particular, for the uniform superposition $W|0^n\rangle$:
$$U_f(W|0^n\rangle, 0^m) = U_f\left(\frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x, 0^m\rangle\right) = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x, f(x)\rangle.$$
[Figure 13.8. Input of the circuit is the zero state and output is the superposition of all
values of 𝑓.]
[Figure: circuit of the Deutsch-Jozsa algorithm with the inputs $|0^n\rangle$ and $|1\rangle$.]
the Deutsch-Jozsa quantum algorithm is polynomial in 𝑛. The first step of the Deutsch-Jozsa
algorithm is to apply the Walsh-Hadamard transformation $H^{\otimes(n+1)}$ to the input
state $|0^n, 1\rangle$. Since $H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, we obtain the state
$$|\psi\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle \otimes \frac{1}{\sqrt{2}}\,(|0\rangle - |1\rangle).$$
Suppose |𝑥⟩ is a basis state $|x_1 \dots x_n\rangle$. If 𝑓(𝑥) = 0 then |0⟩ − |1⟩ remains unchanged. If
𝑓(𝑥) = 1 then |0⟩ − |1⟩ is mapped to |1⟩ − |0⟩ = −(|0⟩ − |1⟩). In both cases, we have
$$U_f\big(|x\rangle \otimes (|0\rangle - |1\rangle)\big) = (-1)^{f(x)}\, |x\rangle \otimes (|0\rangle - |1\rangle).$$
In the next step, $H^{\otimes n} \otimes Id$ is applied to $U_f|\psi\rangle$, so that |𝑥⟩ is mapped to $H^{\otimes n}|x\rangle$. One
can check the expansion
$$H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} (-1)^{x \cdot z}\, |z\rangle$$
(see Exercise 8), where 𝑥 ⋅ 𝑧 is the dot product modulo 2. This yields
$$(H^{\otimes n} \otimes Id)(U_f|\psi\rangle) = \frac{1}{2^n} \sum_{z \in \{0,1\}^n} \sum_{x \in \{0,1\}^n} (-1)^{f(x) + x \cdot z}\, |z\rangle \otimes \frac{1}{\sqrt{2}}\,(|0\rangle - |1\rangle).$$
Finally, we measure the first 𝑛 qubits. The coefficient of the basis state $|0^n\rangle$ is
$$\frac{1}{2^n} \sum_{x \in \{0,1\}^n} (-1)^{f(x)},$$
since $z = 0^n$ obviously yields 𝑥 ⋅ 𝑧 = 0 for all $x \in \{0, 1\}^n$. If 𝑓 is constant, then the
above coefficient is either 1 or −1. Hence the probability of measuring $0^n$ is equal to 1
and any measurement must give $0^n$. On the other hand, if 𝑓 is balanced then positive
and negative terms cancel and the probability of measuring $0^n$ is 0. The Deutsch-Jozsa
algorithm outputs “𝑓 is constant” if a measurement gives $0^n$, and otherwise “𝑓 is balanced”.
This result is always correct and the quantum algorithm runs in polynomial time.
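The decision rule can be illustrated by computing the coefficient of $|0^n\rangle$ directly. The following sketch is a classical simulation that evaluates 𝑓 on all $2^n$ inputs, so it has exponential cost and is not the quantum algorithm itself; the function names and the two sample functions are assumptions made for illustration.

```python
from itertools import product

def amplitude_of_zero(f, n):
    """Coefficient (1/2^n) * sum over x of (-1)^f(x) of the basis state |0^n>."""
    return sum((-1) ** f(x) for x in product((0, 1), repeat=n)) / 2 ** n

def deutsch_jozsa(f, n):
    """Decide 'constant' or 'balanced', assuming f is promised to be one of the two."""
    probability_zero = amplitude_of_zero(f, n) ** 2
    return "constant" if probability_zero > 0.5 else "balanced"

def constant_one(x):
    return 1

def parity(x):
    # the parity of the input bits is a balanced function
    return sum(x) % 2

print(amplitude_of_zero(constant_one, 3), deutsch_jozsa(constant_one, 3))
print(amplitude_of_zero(parity, 3), deutsch_jozsa(parity, 3))
```

For a constant function the coefficient has absolute value 1, and for a balanced function it vanishes, as derived above.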
13.4. Quantum Fourier Transform
$$y_k = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} a_x\, e^{-2\pi i x k/N}, \quad\text{where } k = 0, 1, \dots, N-1.$$
We note that the above unitary DFT differs from other variants by a normalization
factor. The computation can be accelerated using the Fast Fourier Transform (FFT).
The DFT is given by the following unitary 𝑁 × 𝑁 matrix, where $\omega = e^{-2\pi i/N}$:
$$U = \frac{1}{\sqrt{N}} \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & \omega & \omega^2 & \dots & \omega^{N-1} \\ 1 & \omega^2 & \omega^4 & \dots & \omega^{2(N-1)} \\ \vdots & & & & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \dots & \omega^{(N-1)(N-1)} \end{pmatrix}.$$
The corresponding matrix of the inverse DFT is $U^{-1} = \overline{U}^T$, the conjugate transpose of 𝑈.
The DFT computes the discrete spectrum of the input data. If the data of length 𝑁
is 𝑟-periodic and 𝑁 is divisible by 𝑟, then the Fourier coefficients $y_k$ are nonzero only
for multiples of 𝑁/𝑟. In general, a Fourier amplitude $|y_k| \gg 0$ indicates that 𝑘 is an
approximate multiple of 𝑁/𝑟.
Example 13.13. Let $a = (1, 2, 1, 2)^T$ be a data vector of length 𝑁 = 4. The data is
𝑟 = 2-periodic. The unitary 4 × 4 matrix that describes the DFT is
$$U = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}.$$
We compute the Fourier coefficients $y = Ua = (3, 0, -1, 0)^T$. Only the coefficients $y_0$
and $y_2$ are nonzero and hence 𝑎 is 𝑁/2 = 2-periodic. The input vector 𝑎 can be recovered
from the Fourier coefficients by computing $a = U^{-1} y = \overline{U}^T y$. ♢
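The numbers of this example can be reproduced with a few lines of Python. The sketch below implements the unitary DFT with the factor 1/√𝑁 and its inverse directly from the defining sums; the function names are illustrative and no FFT is used.

```python
import cmath

def unitary_dft(a):
    """y_k = (1/sqrt(N)) * sum_x a_x * exp(-2*pi*i*x*k/N)."""
    N = len(a)
    return [sum(a[x] * cmath.exp(-2j * cmath.pi * x * k / N) for x in range(N)) / N ** 0.5
            for k in range(N)]

def inverse_unitary_dft(y):
    """a_x = (1/sqrt(N)) * sum_k y_k * exp(+2*pi*i*x*k/N)."""
    N = len(y)
    return [sum(y[k] * cmath.exp(2j * cmath.pi * x * k / N) for k in range(N)) / N ** 0.5
            for x in range(N)]

a = [1, 2, 1, 2]
y = unitary_dft(a)
recovered = inverse_unitary_dft(y)

print([round(abs(c), 6) for c in y])        # amplitudes |y_k|
print([round(c.real, 6) for c in recovered])  # the original data
```

Up to rounding errors, only the coefficients with index 0 and 2 are nonzero, and the inverse transform returns the original vector.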
In a quantum setting, we want to compute the DFT of a large input vector $(a_0, a_1, \dots, a_{N-1})$
of length $N = 2^s$ and find a hidden period of the data. The basic idea is that the
indices 𝑘 with Fourier coefficients $|y_k|^2 \gg 0$ reveal the period. The measurement of a
state vector of Fourier amplitudes will give such indices 𝑘 with a significant probability.
The Quantum Fourier Transform (QFT) does a DFT on the amplitudes of the quantum
state and outputs a superposition of Fourier coefficients:
$$|\psi\rangle = C \sum_{x=0}^{N-1} a_x\, |x\rangle \;\mapsto\; U|\psi\rangle = C \sum_{k=0}^{N-1} y_k\, |k\rangle.$$
The scaling factor 𝐶 ensures that the coefficient vector of |𝜓⟩ is normalized. The Fourier
coefficients are
$$y_k = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} a_x\, e^{2\pi i x k/N}.$$
Note that the above QFT uses the primitive 𝑁-th root of unity $\omega = e^{2\pi i/N}$ and not the
conjugate value $e^{-2\pi i/N}$. One can show that the QFT has an efficient circuit and runs in
time $O(\operatorname{size}(N)^2)$. In the simplest case (𝑁 = 2), the QFT is given by a single Hadamard gate.
Now, suppose that the input vector $(a_0, a_1, \dots, a_{N-1})$ is 𝑟-periodic, where 𝑟 is an
unknown period between 1 and 𝑁. One prepares the state $|\psi\rangle = C \sum_{x=0}^{N-1} a_x |x\rangle$ of an
𝑠-qubit system, applies the QFT and measures the state $U|\psi\rangle = C \sum_{k=0}^{N-1} y_k |k\rangle$. With
a high probability, the measured output 𝑘 is an approximate multiple of 𝑁/𝑟.
that runs in polynomial time ([Sho94]). This algorithm is a major application of quan-
tum computing. It has been successfully implemented for toy examples (like factoring
21 = 7 ⋅ 3) and will certainly be applied to real-world problems as soon as quantum
computers with thousands of qubits become available.
Shor’s algorithm finds a hidden period of a function and is based on the Quantum
Fourier Transform.
Firstly, we explain how to derive the unknown factors of a composite number 𝑛
from the multiplicative order of an element $a \in \mathbb{Z}_n^*$. Note that the multiplicative order
of 𝑎 is also the least period of the function $f(x) = a^x \bmod n$.
Suppose that 𝑛 = 𝑝𝑞, where 𝑛 is known and 𝑝, 𝑞 are unknown. Choose a uniform
random integer 𝑎 with 1 < 𝑎 < 𝑛. If gcd(𝑎, 𝑛) ≠ 1, then gcd(𝑎, 𝑛) is either 𝑝 or 𝑞 and
the unknown factors are found. However, the probability of gcd(𝑎, 𝑛) ≠ 1 is very small
if the prime factors are large and 𝑎 is randomly chosen.
Now assume that gcd(𝑎, 𝑛) = 1; then $a \bmod n \in \mathbb{Z}_n^*$ and
$$r = \operatorname{ord}(a) \mid \operatorname{ord}(\mathbb{Z}_n^*) = (p - 1)(q - 1)$$
(see Corollary 4.14). By definition, $a^r \equiv 1 \bmod n$. If 𝑟 is even, then
$$a^r - 1 = (a^{r/2} - 1)(a^{r/2} + 1) \equiv 0 \bmod n,$$
and thus $n \mid (a^{r/2} - 1)(a^{r/2} + 1)$. Since $\operatorname{ord}(a) \neq r/2$, we have $n \nmid (a^{r/2} - 1)$ and there are
two possibilities:
(1) 𝑝 divides one of the two factors and 𝑞 divides the other. In this case $\gcd(a^{r/2} + 1, n)$
gives 𝑝 or 𝑞.
(2) $n \mid (a^{r/2} + 1)$, then the algorithm fails and one has to choose another base 𝑎.
The algorithm is successful if 𝑟 is even and $n \nmid (a^{r/2} + 1)$. Closer analysis shows
that the probability of success is at least 50% (compare [NC00]): let $r_p$ and $r_q$ be the
order of 𝑎 in $\mathbb{Z}_p^*$ and $\mathbb{Z}_q^*$, respectively. Then 𝑟 is odd if and only if $r_p$ and $r_q$ are odd (see
Exercise 10). Now suppose that 𝑟 is even and $2^d$ is the maximal power of 2 that divides
𝑟. We have $a^{r/2} + 1 \equiv 0 \bmod n$ if and only if $a^{r/2} \equiv -1 \bmod p$ and $a^{r/2} \equiv -1 \bmod q$.
This requires $r_p \nmid r/2$ and $r_q \nmid r/2$. Since $r_p \mid r$ and $r_q \mid r$, we obtain $2^d \mid r_p$ and $2^d \mid r_q$.
Summarizing, the algorithm fails if either $2 \nmid r_p$ and $2 \nmid r_q$ or $2^d \mid r_p$ and $2^d \mid r_q$.
If 𝑎 is chosen uniformly at random, then the probability for this to happen is at most 1/2.
Example 13.14. Let 𝑛 = 77 and 𝑎 = 3; then gcd(𝑎, 𝑛) = 1. Suppose we know the
order of 𝑎 mod 𝑛 in the multiplicative group $\mathbb{Z}_n^*$: the order is 𝑟 = ord(3 mod 77) = 30
and this is an even number. We obtain
$$a^{r/2} + 1 = 3^{15} + 1 \equiv 35 \bmod 77.$$
We compute $\gcd(a^{r/2} + 1, n) = \gcd(35, 77) = 7$ and obtain one of the prime factors of
𝑛 = 77. Note that $\gcd(a^{r/2} - 1, n) = \gcd(33, 77) = 11$ gives the other factor of 𝑛.
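The classical reduction from factoring to order finding can be sketched in Python. In the code below, the order is found by brute force, which is only feasible for toy moduli and stands in for the quantum period-finding step; the function names are illustrative assumptions.

```python
from math import gcd

def multiplicative_order(a, n):
    """Brute-force order of a modulo n, assuming gcd(a, n) == 1 (toy sizes only)."""
    r, power = 1, a % n
    while power != 1:
        power = power * a % n
        r += 1
    return r

def factor_from_order(a, n):
    """Return a nontrivial factor of n derived from ord(a), or None if this base fails."""
    d = gcd(a, n)
    if d != 1:
        return d                      # lucky case: a shares a factor with n
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None                   # odd order: choose another base
    half = pow(a, r // 2, n)          # a^(r/2) mod n
    if half == n - 1:
        return None                   # n divides a^(r/2) + 1: choose another base
    return gcd(half + 1, n)

print(multiplicative_order(3, 77), factor_from_order(3, 77))
```

With 𝑛 = 77 and 𝑎 = 3 this reproduces the order 30 and the factor 7 of the example.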
[Figure 13.10. Shor’s algorithm uses the Quantum Fourier Transform and finds a hidden
period of a function 𝑓.]
Now, the quantum part of Shor’s algorithm is to compute the unknown order 𝑟
of a given residue class $a \in \mathbb{Z}_n^*$. For that purpose, one prepares a superposition of
input values 𝑥 = 0, 1, … , 𝑁 − 1 and simultaneously computes all $a^x \bmod n$. The
values are 𝑟-periodic, i.e., $a^x \equiv a^{x+r} \bmod n$. The QFT is applied to the state and we will see
that a measurement reveals the period with high probability. The sequence must be
significantly longer than the period and it turns out that $N = 2^s$ with $n^2 \le N \le 2n^2$ is
a reasonable choice.
We view $f(x) = a^x \bmod n$ as a Boolean function. The corresponding unitary
transformation on quantum bits is $U_f |x, y\rangle = |x, y \oplus f(x)\rangle$. The first register has 𝑠
qubits and the second has 𝑚 = size(𝑛) qubits.
The Walsh-Hadamard transformation maps $|x\rangle = |0^s\rangle$ to a superposition of all
basis states. We set $|y\rangle = |0^m\rangle$, apply $U_f$ and obtain the state
$$|\psi\rangle = U_f(W|0^s\rangle, |0^m\rangle) = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x, f(x)\rangle.$$
Next, the QFT operator 𝑈 is applied to the first register while the second remains unchanged:
$$(U \otimes Id)\,|\psi\rangle = \frac{1}{N} \sum_{x=0}^{N-1} \sum_{k=0}^{N-1} e^{2\pi i x k/N}\, |k, f(x)\rangle.$$
Finally, the first register is measured (see Figure 13.10). The second register takes a
random value $u \in im(f)$ and the probability of measuring |𝑘, 𝑢⟩ is
$$\left| \frac{1}{N} \sum_{x \colon f(x) = u} e^{2\pi i x k/N} \right|^2.$$
There is a high probability that 𝑘 is an approximate multiple of 𝑁/𝑟 and the inequalities
hold for some 𝑗. One can show that at most one fraction 𝑗/𝑟 with 0 < 𝑗 < 𝑛 and 0 < 𝑟 < 𝑛
can satisfy this inequality. The fraction 𝑗/𝑟 and the requested period 𝑟 can be efficiently
determined using the continued fraction expansion (see Example 13.16 below).
The reader might be surprised that the QFT of the first register gives anything interesting,
since the amplitudes of the first register of |𝜓⟩ are constant. However, the
amplitudes are partitioned by the second register, i.e., by different values of
$u = f(x) = a^x \bmod n$. We can rewrite |𝜓⟩ as
$$|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{u \in im(f)} \; \sum_{f(x) = u} |x, u\rangle.$$
The amplitudes with different 𝑢 in the second register do not interfere with each other
when the QFT is applied to the first register. Now, for a fixed second register 𝑢, the
first register is 𝑟-periodic and applying the QFT gives peaks at multiples of 𝑁/𝑟. If 𝑁 is
divisible by 𝑟, the Fourier amplitudes are zero outside multiples of 𝑁/𝑟.
Remark 13.15. Shor’s algorithm requires around 3 size(𝑛) qubits and uses
$O(\operatorname{size}(n)^3)$ operations; optimizations are known.
Note that we need 13 qubits for the first register and 7 qubits for the second. We can
reorder the terms with respect to the second register $u = 3^x \bmod 77$. Suppose for
example that 𝑢 = 59; then the corresponding terms in |𝜓⟩ are
$$\frac{1}{\sqrt{8192}}\,(|19, 59\rangle + |49, 59\rangle + |79, 59\rangle + \dots + |8179, 59\rangle).$$
The expansions for other values 𝑢 in the second register look similar and the period 𝑟 =
30 is clearly visible, but the state is not directly accessible to an observer. Instead, we
apply the QFT to the first register and measure it. The second register takes a random
value $u \in im(f)$, for example 𝑢 = 59. The amplitudes $|y_k|$ of |𝑘, 𝑢⟩ for 𝑘 = 0, 1, … , 8191
and 𝑢 = 59 are shown in Figure 13.11. The squares $|y_k|^2$ give the probability that |𝑘, 59⟩
is measured. The probabilities for any other value $u \in im(f)$ are identical. Closer
inspection shows peaks at all multiples of 𝑁/𝑟 ≈ 273. Table 13.1 lists some amplitudes
around 𝑘 = 273.
Again, we remark that these amplitudes are not accessible to an observer, but the
measured value 𝑘 is likely to be close to a multiple of 𝑁/𝑟. Suppose that 𝑘 = 7100. We expect
that 𝑘/𝑁 = 7100/8192 is close to a fraction 𝑗/𝑟 where 𝑗 and 𝑟 are integers less than 𝑛 = 77. In
this toy example, we could simply try out the possible values for 𝑗 and 𝑟, but in general,
the efficient method of continued fraction expansions is used.
[Figure 13.11. Fourier amplitudes of |𝑘, 59⟩. The spectrum has peaks at multiples of
𝑁/𝑟 ≈ 273.]
The idea is to approximate a real number 𝑥 by continued fractions of integer numbers.
The number is split into its integer part ⌊𝑥⌋ and its fractional part $\epsilon_0$:
$$x = \lfloor x \rfloor + \epsilon_0 = \lfloor x \rfloor + \frac{1}{\left(\frac{1}{\epsilon_0}\right)}.$$
Next, $\frac{1}{\epsilon_0}$ is split into an integer and a fractional part and we obtain
$$x = \lfloor x \rfloor + \epsilon_0 = \lfloor x \rfloor + \frac{1}{\left\lfloor \frac{1}{\epsilon_0} \right\rfloor + \epsilon_1}.$$
We continue in the same fashion, write $\epsilon_1 = \frac{1}{\left(\frac{1}{\epsilon_1}\right)}$ and split $\frac{1}{\epsilon_1}$ into an integer and a
fractional part. The method terminates after a finite number of steps if 𝑥 is a rational
number, and otherwise approximates 𝑥.
For our example, we let SageMath compute the sequence of integer parts:
sage: (7100/8192).continued_fraction()
[0; 1, 6, 1, 1, 136]
$$\frac{7100}{8192} \approx \cfrac{1}{1 + \cfrac{1}{6 + \cfrac{1}{1 + \cfrac{1}{1}}}} = \frac{13}{15}.$$
We obtain 𝑗/𝑟 = 13/15 and therefore the period must be a multiple of 15 less than 𝑛 = 77.
One checks that $3^{15} \equiv 34 \not\equiv 1 \bmod 77$ and $3^{30} \equiv 1 \bmod 77$. We conclude that
𝑟 = ord(3) = 30. Finally, we verify that 𝑁/𝑟 = 8192/30 ≈ 273.07, which explains the peaks
of the spectrum at multiples of this number. ♢
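The post-processing of this example can also be reproduced in plain Python without SageMath. The sketch below is one possible way to do it: it computes the continued fraction terms of 𝑘/𝑁 with exact rational arithmetic, forms the convergents, keeps the last one whose denominator is smaller than 𝑛, and then tests the multiples of that denominator. The function names are illustrative assumptions.

```python
from fractions import Fraction

def continued_fraction(x):
    """Integer parts [a0, a1, ...] of the continued fraction of a nonnegative rational x."""
    terms = []
    while True:
        a = x.numerator // x.denominator   # integer part
        terms.append(a)
        x -= a                             # fractional part
        if x == 0:
            return terms
        x = 1 / x

def convergents(terms):
    """Successive rational approximations obtained by truncating the expansion."""
    h_prev, h = 1, terms[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in terms[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result

N, k, n, a = 8192, 7100, 77, 3
terms = continued_fraction(Fraction(k, N))
candidate = [c for c in convergents(terms) if c.denominator < n][-1]
# the period is a multiple of the denominator that is less than n
period = next(r for r in range(candidate.denominator, n, candidate.denominator)
              if pow(a, r, n) == 1)

print(terms, candidate, period)
```

The terms agree with the SageMath output above, the selected convergent is 13/15, and the first multiple of 15 that passes the test is 30.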
We have seen above that Shor’s algorithm can efficiently solve the factoring prob-
lem. The other major cryptographic problem, the discrete logarithm problem, can also
be solved with a period-finding algorithm. This can be applied to the multiplicative
group of integers modulo a prime number 𝑝 and also to the group of points on an el-
liptic curve over a finite field.
Suppose that $A = g^a$ holds in a cyclic group 𝐺 = ⟨𝑔⟩ of order 𝑛 and 𝑎 is unknown.
Consider the map
$$f \colon \mathbb{Z} \times \mathbb{Z} \to G, \quad f(x, y) = A^x g^{-y}.$$
This map is (1, 𝑎)-periodic since
$$f(x + 1, y + a) = A^{x+1} g^{-y-a} = g^{ax+a} g^{-y-a} = A^x g^{-y} = f(x, y).$$
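The periodicity can be checked numerically in a small multiplicative group. In the sketch below, the prime 𝑝 = 23, the base 𝑔 = 5 and the exponent 𝑎 = 7 are hypothetical toy parameters chosen only for illustration; the inverse of 𝑔 is computed with Fermat's little theorem, which assumes that 𝑝 is prime.

```python
p, g, a = 23, 5, 7            # hypothetical toy parameters, not from the text
A = pow(g, a, p)              # public value A = g^a mod p
g_inv = pow(g, p - 2, p)      # inverse of g modulo the prime p

def f(x, y):
    """f(x, y) = A^x * g^(-y) mod p for nonnegative integers x and y."""
    return pow(A, x, p) * pow(g_inv, y, p) % p

# shifting the argument by (1, a) does not change the value
periodic = all(f(x + 1, y + a) == f(x, y) for x in range(30) for y in range(30))
print(periodic)
```

A period-finding algorithm that recovers the shift (1, 𝑎) would therefore reveal the discrete logarithm 𝑎.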
Let 𝛿 be a positive integer constant which determines the probability of success of the
protocol. Alice chooses a uniform random key string 𝑘 of length 4𝑛 + 𝛿 bits. Furthermore,
Alice and Bob each choose 4𝑛 + 𝛿 bits that will determine whether $B_0$ or $B_1$ is used:
$$k \xleftarrow{\$} \{0, 1\}^{4n+\delta}, \quad b_A \xleftarrow{\$} \{0, 1\}^{4n+\delta}, \quad b_B \xleftarrow{\$} \{0, 1\}^{4n+\delta}.$$
Now Alice sends the key 𝑘 as a sequence of single qubits to Bob. She uses the basis 𝐵0
if the corresponding bit of 𝑏𝐴 is 0, or otherwise 𝐵1 .
Bob receives the sequence of qubits and measures them with respect to 𝐵0 or 𝐵1 ,
depending on the corresponding bit of 𝑏𝐵 . If the 𝑖-th bit of 𝑏𝐴 and 𝑏𝐵 are equal, then
Bob measures the correct 𝑖-th bit of 𝑘. Otherwise, the probability of a correct key bit is
only 50%. For example, suppose that Alice is using 𝐵0 and sending the bit 1; then the
transmitted qubit is |1⟩. If Bob chooses the basis 𝐵1 , then
$$|1\rangle = \frac{1}{\sqrt{2}}\,|+\rangle - \frac{1}{\sqrt{2}}\,|-\rangle.$$
Hence the probability of measuring |−⟩, which corresponds to the bit 1, is only 50%.
After Bob has received and measured the key bits, both partners exchange their
bases $b_A$ and $b_B$ via the conventional public channel. They discard the key bits that
were measured in a different basis and restart the key exchange if fewer than 2𝑛 bits
remain. The constant 𝛿 controls the probability of a successful exchange: the larger 𝛿,
the more likely it is that at least 2𝑛 bits remain.
They keep the first 2𝑛 key bits. The following step aims to reveal the interference of
an adversary. Alice chooses a subset of 𝑛 key bits and sends Bob the selected positions.
They exchange the associated key bits via the public channel. They compare the bits
and abort the protocol if the number of errors is higher than expected. The remaining
𝑛 bits are used as a secret key, which needs to be further transformed (information
reconciliation and privacy amplification), in order to reduce the effects of errors and
undetected interference by adversaries.
We now discuss the security of the BB84 protocol. Firstly, the non-quantum
communication channel between Alice and Bob can be public, but integrity is impor-
tant. Furthermore, Alice has to generate and transmit single qubits, for example single
polarized photons, since an eavesdropper could otherwise use any extra particles with
the same state for a measurement.
It may seem surprising that the key can be sent without any protection. How-
ever, an eavesdropper has to measure the qubits in order to get any information. This
requires choosing a basis, i.e., 𝐵0 or 𝐵1 , which is incorrect in about half of the cases.
Remember that the correct basis is only known to Alice during the transmission of the
qubits. At first, Alice and Bob are not aware of an interception, but the error rate of the
check bits will increase significantly. In fact, Bob will measure around 25% incorrect
check bits, since the error rate is 50% if an eavesdropper used the wrong basis. There-
fore, Alice and Bob can detect an adversary who has intercepted a sufficient number of
quantum bits. Assuming that Alice and Bob accept a maximum error rate of 2.5%, an
adversary can only eavesdrop around 10% of the bits if they want to remain undetected.
Privacy amplification methods reduce an adversary’s partial information on the key by
producing a new, shorter key.
Table 13.2. Quantum key distribution example.
Position        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Alice’s key     0  1  0  0  1  1  0  1  0  0  1  0  1  0  1  1  0
Alice’s basis   1  0  1  1  0  0  0  1  1  1  0  0  0  1  1  1  0
Alice sends     +  1  +  +  1  1  0  –  +  +  1  0  1  +  –  –  0
Bob’s basis     1  0  0  0  0  1  0  1  0  1  1  1  1  1  1  0  1
Bob measures    +  1  1  0  1  –  0  –  1  +  –  +  +  +  –  0  +
Same basis      ✓  ✓        ✓     ✓  ✓     ✓           ✓  ✓
Shared key      0  1        1     0  1     0           0  1
Check bits         1        1              0           0
Key bits        0                 0  1                    1
Example 13.17. (See Table 13.2) Suppose 𝑛 = 4 and 𝛿 = 1. Alice generates the random key
𝑘 = 0100 1101 0010 1011 0
of length 17. Alice and Bob’s choice of bases is given by the binary strings
$b_A$ = 1011 0001 1100 0111 0,
$b_B$ = 1000 0101 0111 1110 1,
where 0 represents the basis $B_0 = \{|0\rangle, |1\rangle\}$ and 1 the basis $B_1 = \{|+\rangle, |-\rangle\}$. Alice sends
the following qubits:
+ 1 + + 1 1 0 − + + 1 0 1 + − − 0.
Alice and Bob exchange 𝑏𝐴 and 𝑏𝐵 . The following positions coincide: 1, 2, 5, 7, 8, 10,
14, 15. Hence Alice and Bob used the same basis for these positions and 8 shared key
bits remain. Alice chooses positions 2, 5, 10, 14 to check for eavesdropping, which
would probably change at least one bit. They exchange the check bits and verify them.
If the check bits match, then they accept the key exchange. The resulting key 0011 is
defined by the remaining four positions 1, 7, 8 and 15. ♢
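The sifting step of this example can be replayed with a short Python sketch. It only models the classical bookkeeping and assumes a noise-free channel without an eavesdropper, so that Bob's bits agree with Alice's bits wherever the bases coincide; the variable names are illustrative.

```python
key = "01001101001010110"   # Alice's key k
b_A = "10110001110001110"   # Alice's bases (0 for B0, 1 for B1)
b_B = "10000101011111101"   # Bob's bases

# (key bit, basis bit) -> transmitted state, written as 0, 1, +, -
symbols = {("0", "0"): "0", ("1", "0"): "1", ("0", "1"): "+", ("1", "1"): "-"}
sent = "".join(symbols[(k, b)] for k, b in zip(key, b_A))

# positions (counted from 1) where Alice and Bob used the same basis
same_basis = [i + 1 for i, (x, y) in enumerate(zip(b_A, b_B)) if x == y]
shared = {pos: key[pos - 1] for pos in same_basis}

check_positions = [2, 5, 10, 14]   # Alice's choice in the example
check_bits = "".join(shared[pos] for pos in check_positions)
final_key = "".join(shared[pos] for pos in same_basis if pos not in check_positions)

print(sent)
print(same_basis, check_bits, final_key)
```

The output reproduces the rows of Table 13.2: eight positions survive the sifting, four of them are used as check bits and the remaining four form the key 0011.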
13.7. Summary
Exercises
1. Show that the Bell state $|\psi\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ cannot be written as the product of
two single qubit states $(a_1|0\rangle + b_1|1\rangle) \otimes (a_2|0\rangle + b_2|1\rangle)$.
2. Show that applying the CNOT gate to 𝐻 |0⟩ ⊗ |0⟩ gives the above Bell state. Depict
the corresponding circuit. Give the other three Bell states 𝐻 |0⟩ ⊗ |1⟩, 𝐻 |1⟩ ⊗ |0⟩
and 𝐻 |1⟩ ⊗ |1⟩.
3. Describe the transformations of Pauli-𝑋, 𝑌, 𝑍, 𝑆 (Phase), 𝑇 (𝜋/8) and 𝐻 gates on the
Bloch sphere.
4. Prove that there is a bijection between the state space for a single-qubit system and
the complex projective space ℙ1 (ℂ).
5. Give the matrix of the Walsh-Hadamard transformation of two qubits.
6. Determine the matrix of the Toffoli gate on three input qubits.
7. Show that the Toffoli gate gives a quantum analogue of the classical NAND operation.
8. Let $x \in \{0, 1\}^n$. Prove the formula
$$H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} (-1)^{x \cdot z}\, |z\rangle.$$
Lattice-based Cryptography
The following two chapters deal with public-key cryptosystems based on lattices and
codes, respectively. A key motivation is the emergence of quantum computers which
are able to break RSA, Diffie-Hellman and elliptic curve cryptosystems. Fortunately,
symmetric schemes like AES are less affected by the effects of quantum algorithms.
Now the relatively new field of post-quantum cryptography studies encryption and
signature schemes which are believed to be secure in the presence of quantum computers.
We focus on lattices and codes used in many proposals and look at encryption
This chapter introduces the basics of lattices and their applications in cryptogra-
phy. Lattices are discrete subgroups of ℝ𝑛 and there are computational problems, for
example finding the shortest vector in a lattice, which are believed to be hard. Solv-
ing a system of linear equations is easy, but solving a random system of noisy linear
equations modulo an integer (learning with errors) is hard. Lattice-based cryptogra-
phy offers strong security guarantees and is believed to resist quantum attacks.
We outline the fundamentals of lattices in Section 14.1. Lattice algorithms and in
particular the LLL algorithm are studied in Section 14.2. The following Sections 14.3,
14.4 and 14.5 deal with the public-key encryption schemes GGH, NTRU and LWE,
respectively. There are also promising lattice-based signature schemes, but they are
not covered in this book.
We refer the reader to the textbooks [HPS08], [Gal12] and the articles [MR09],
[Pei14] for additional reading on lattice-based cryptography.
14.1. Lattices
A lattice Λ is a discrete subgroup of ℝ𝑛 . The trivial lattice is Λ = {0} and a standard
example is the lattice Λ = ℤ𝑛 of vectors with integer coordinates. In mathematics, a
lattice can also refer to a partially ordered set in which any two elements have a least upper
bound and a greatest lower bound, but this is not used in our context.
Definition 14.1. A subset $\Lambda \subset \mathbb{R}^n$ is called discrete if every point 𝑣 ∈ Λ possesses a
neighborhood $U = \{w \in \mathbb{R}^n \mid \|v - w\| < \epsilon\}$, i.e., an open ball of radius 𝜖 > 0, such that
𝑈 ∩ Λ = {𝑣}, i.e., 𝑣 is the only lattice point in 𝑈. A discrete subgroup of $\mathbb{R}^n$ is called a
lattice. ♢
It follows from the above definition that every bounded set and in particular every
ball {𝑣 ∈ ℝ𝑛 | ‖𝑣‖ < 𝑑} only contains a finite number of lattice points. All nontrivial
lattices are infinite sets, but they have a finite basis.
Let 𝑉 ⊂ ℝ𝑛 be the real vector space generated by Λ. We define the rank of Λ to
be the dimension of 𝑉. The following Proposition shows that a lattice Λ of rank 𝑟 has a
ℤ-basis {𝑣1 , … , 𝑣𝑟 } ⊂ Λ.
Proposition 14.2. Let Λ be a nontrivial lattice of rank 𝑟. Then there is a basis 𝐵 =
{𝑣1 , 𝑣2 , … , 𝑣𝑟 } ⊂ Λ, i.e., a set of linearly independent vectors such that
Λ = {𝑥1 𝑣1 + 𝑥2 𝑣2 + ⋯ + 𝑥𝑟 𝑣𝑟 | 𝑥1 , 𝑥2 , … , 𝑥𝑟 ∈ ℤ}.
Proof. Our proof follows [Gal12]. Let $B = \{v_1, v_2, \dots, v_r\}$ be a set of linearly independent
vectors in Λ. We want to transform 𝐵 into a basis 𝐵′ of Λ. For 𝑑 ≤ 𝑟 we let $V_d$ be
the real vector space generated by $v_1, \dots, v_d$. The lattice $\Lambda_d = V_d \cap \Lambda$ has rank 𝑑 ≤ 𝑟 and
$\Lambda_r = \Lambda$. We now prove the claim by induction. For 𝑑 = 1, we can easily find a basis of
$\Lambda_1$: we replace $v_1$ by the shortest nonzero multiple $v_1' = \alpha v_1$ such that $v_1' \in \Lambda_1$. Now
we assume that $\Lambda_{d-1}$ has a basis $\{v_1', \dots, v_{d-1}'\}$. Consider the bounded and discrete set
$$S = \Lambda_d \cap \{\alpha_1 v_1' + \dots + \alpha_{d-1} v_{d-1}' + \alpha_d v_d \mid \alpha_1, \dots, \alpha_{d-1} \in [0, 1[ \text{ and } \alpha_d \in [0, 1]\}.$$
Let $v_d' = \alpha_1 v_1' + \dots + \alpha_{d-1} v_{d-1}' + \alpha_d v_d$ be the element in 𝑆 with smallest nonzero
coefficient $\alpha_d$. It is obvious that $B' = \{v_1', \dots, v_{d-1}', v_d'\}$ is linearly independent and it
remains to show that 𝐵′ is a basis of $\Lambda_d$. To this end, given any vector
$v = \beta_1 v_1' + \dots + \beta_{d-1} v_{d-1}' + \beta_d v_d \in \Lambda_d$, there are integer coefficients $x_i \in \mathbb{Z}$ such that
$$w = v - x_1 v_1' - \dots - x_{d-1} v_{d-1}' - x_d v_d' \in S$$
and $\beta_d - x_d \alpha_d \in [0, \alpha_d[$. Note that $\beta_d - x_d \alpha_d$ is the coefficient of $v_d$ in 𝑤. Since $v_d'$ is
the element in 𝑆 with the smallest nonzero coefficient of $v_d$, we obtain $\beta_d - x_d \alpha_d = 0$
and hence $w \in \Lambda_{d-1}$. Since the coefficients of 𝑤 with respect to the basis $\{v_1', \dots, v_{d-1}'\}$
of $\Lambda_{d-1}$ are integers in [0, 1[, we get 𝑤 = 0. This shows that 𝑣 is an integer linear
combination of the vectors in 𝐵′. □
In this chapter we assume that the rank is maximal, i.e., $\Lambda \subset \mathbb{R}^n$ and rk(Λ) = 𝑛,
as the more general case is not substantially different. We write lattice vectors as
columns, but row vectors are also used in the literature. Writing the basis vectors into
the columns defines a regular 𝑛 × 𝑛 matrix. By abuse of notation, we will use the
same letter for a basis and the associated 𝑛 × 𝑛 matrix of column vectors. Two bases
$B_1 = \{v_1, v_2, \dots, v_n\}$ and $B_2 = \{w_1, w_2, \dots, w_n\}$ of the same lattice Λ are connected by a
unimodular 𝑛 × 𝑛 matrix 𝑈 over ℤ:
$$B_2 = B_1 U.$$
A matrix 𝑈 is called unimodular if all entries are integers and det(𝑈) = ±1. Indeed, for
each $w_i$ there are $x_1, \dots, x_n \in \mathbb{Z}$ such that $w_i = x_1 v_1 + \dots + x_n v_n$ and the coefficients
$x_1, \dots, x_n$ form the 𝑖-th column of 𝑈. Conversely, each $v_i$ can be represented by an
integer linear combination of $w_1, \dots, w_n$. Therefore, we have $B_1 = B_2 \tilde{U}$ for some integer
matrix $\tilde{U}$, from which we conclude that 𝑈 is invertible, $\tilde{U} = U^{-1}$ and det(𝑈) = ±1.
Example 14.3. Let $B_1 = \left\{ \begin{pmatrix} 4 \\ -1 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \end{pmatrix} \right\}$ and assume that Λ is a lattice generated by $B_1$.
The lattice is depicted in Figure 14.1, where $B_1$ is shown with continuous lines.
$B_2 = \left\{ \begin{pmatrix} 8 \\ -7 \end{pmatrix}, \begin{pmatrix} 10 \\ -10 \end{pmatrix} \right\}$ is another basis of Λ, and the basis $B_2$ is shown with dashed
lines in Figure 14.1. One has $B_2 = B_1 U$, where 𝑈 is the unimodular matrix
$$U = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}.$$
Intuitively, the first basis is ‘better’ than the second, since the vectors are shorter and
closer to being orthogonal.
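The relation between the two bases can be checked numerically. The following is a minimal sketch in plain Python (the helper names `matmul` and `det2` are chosen here for illustration and are not part of the text); it multiplies the basis matrix of Example 14.3 by the unimodular matrix and compares the determinants.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Columns of B1 are the basis vectors (4, -1) and (2, 2).
B1 = [[4, 2], [-1, 2]]
U = [[3, 4], [-2, -3]]

B2 = matmul(B1, U)
print(B2)                              # columns form the second basis
print(det2(U))                         # +1 or -1 for a unimodular matrix
print(abs(det2(B1)), abs(det2(B2)))    # both equal the determinant of the lattice
```

Since det(𝑈) = ±1, both basis matrices have the same absolute determinant, which is the basis-independent quantity used in the definition of the lattice determinant.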
Definition 14.4. Let Λ be a lattice and 𝐵 any basis of Λ. Then the determinant of Λ is
defined by the absolute value
det(Λ) = | det(𝐵)|.
The determinant of Λ does not depend on the chosen basis. ♢
Here we give a description of the dual lattice in terms of matrices. Remember that
we assumed that our lattices have full rank. Let 𝐵 be a basis of Λ. We have 𝑦 ∈ Λ∗ if
and only if 𝐵 𝑇 𝑦 ∈ ℤ𝑛 or, equivalently, 𝑦 = (𝐵 𝑇 )−1 𝑥 for some 𝑥 ∈ ℤ𝑛 . This implies that
the dual lattice is generated by the columns of (𝐵𝑇 )−1 . Furthermore, it follows that
$$\det(\Lambda^*) = |\det((B^T)^{-1})| = \frac{1}{|\det(B)|} = \frac{1}{\det(\Lambda)}.$$
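Example 14.7. Consider the lattice $\Lambda$ in Example 14.3. The dual lattice $\Lambda^*$ is given by the columns of
$$(B_1^T)^{-1} = \begin{pmatrix} \frac{1}{5} & \frac{1}{10} \\ -\frac{1}{5} & \frac{2}{5} \end{pmatrix}.$$
The covolume of $\Lambda^*$ is $\frac{1}{\det(\Lambda)} = \frac{1}{10}$. ♢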
For cryptographic applications, one usually considers 𝑞-ary lattices, which are de-
fined by integers and modular congruences.
The 𝑛-dimensional lattice Λ𝑞 (𝐴) is defined by the rows of 𝐴 and 𝑞ℤ𝑛 . Note that here
we have linear combinations of rows of 𝐴, not columns of 𝐴, as above. Furthermore,
the kernel of 𝐴 defines a lattice:
Λ⟂𝑞 (𝐴) ∶= {𝑦 ∈ ℤ𝑛 | 𝐴𝑦 ≡ 0 mod 𝑞}.
The lattices Λ𝑞 (𝐴) and Λ⟂𝑞 (𝐴) have full rank since they contain 𝑞ℤ𝑛 .
Λ𝑞 (𝐴) and Λ⟂𝑞 (𝐴) can also be viewed as linear codes over ℤ𝑞 , defined by the rows
of 𝐴 and the parity check matrix 𝐴, respectively (see Section 15.1). These two lattices
are dual to each other, up to normalization (see Exercise 5):
Λ⟂𝑞 (𝐴) = 𝑞Λ𝑞 (𝐴)∗ and Λ𝑞 (𝐴) = 𝑞Λ⟂𝑞 (𝐴)∗ .
Example 14.10. Consider the 10-ary lattice $\Lambda$ from Example 14.3. The columns of $B_1$ are $(4, -1)$ and $(2, 2)$. Since $8 \cdot (4, -1) = (32, -8) \equiv (2, 2) \bmod 10$, we discard the second vector and define the $1 \times 2$ matrix $A = (4 \;\; {-1})$. Thus $\Lambda = \Lambda_{10}(A)$. The lattice $\Lambda_{10}^{\perp}(A)$ is defined by all solutions $(x, y) \in \mathbb{Z}^2$ of the modular equation $4x - y \equiv 0 \bmod 10$. We have $\Lambda_{10}^{\perp}(A) = 10 \cdot \Lambda_{10}(A)^*$ and the lattice is defined by the columns of the matrix $\begin{pmatrix} 2 & 1 \\ -2 & 4 \end{pmatrix}$. ♢
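The duality between the two 𝑞-ary lattices of this example can be verified with exact rational arithmetic. The sketch below is illustrative only (the function name `inverse_transpose_2x2` is an assumption of this sketch): it computes the dual basis of the lattice from Example 14.3, scales it by 𝑞 = 10 and checks that the resulting columns solve 4𝑥 − 𝑦 ≡ 0 mod 10.

```python
from fractions import Fraction


def inverse_transpose_2x2(B):
    """Return (B^T)^{-1} for a 2x2 integer matrix B given as a list of rows."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    # B^{-1} = 1/det * [[d, -b], [-c, a]]; the transpose swaps the off-diagonal entries.
    return [[d / det, -c / det], [-b / det, a / det]]


q = 10
B1 = [[4, 2], [-1, 2]]                 # columns (4, -1) and (2, 2)
dual = inverse_transpose_2x2(B1)       # columns generate the dual lattice
scaled = [[int(q * x) for x in row] for row in dual]

# Every column (x, y) of the scaled dual basis should satisfy 4x - y = 0 mod 10.
columns = list(zip(*scaled))
residues = [(4 * x - y) % q for x, y in columns]
print(dual, scaled, residues)
```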
(2) Shortest Independent Vector Problem (SIVP): Find linearly independent vectors
𝑣1 , … , 𝑣𝑛 in Λ such that max𝑖 ‖𝑣𝑖 ‖ = 𝜆𝑛 (Λ).
(3) Closest Vector Problem (CVP): given any target vector 𝑤 ∈ ℝ𝑛 , find the closest
lattice point 𝑣 ∈ Λ to 𝑤. ♢
One also considers approximation variants of these problems. Let 𝛾 ≥ 1. In SVP𝛾 ,
one has to find a vector 𝑣 with ‖𝑣‖ ≤ 𝛾 𝜆1 (Λ). Similarly, the SIVP𝛾 problem is to find
linearly independent vectors 𝑣1 , … , 𝑣𝑛 such that max𝑖 ‖𝑣𝑖 ‖ ≤ 𝛾 𝜆𝑛 (Λ).
In CVP𝛾 , the goal is to find a vector 𝑣 such that the distance to a target vector 𝑤 is
at most 𝛾 times the distance of the closest lattice vector to 𝑤.
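Example 14.13. In Example 14.3 (see Figure 14.1), the shortest vectors are $v = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$ and $-v$, and thus $\lambda_1(\Lambda) = \sqrt{8}$. The basis $B_1$ is not a solution to SIVP, but the following basis is:
$$\left\{ \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 3 \end{pmatrix} \right\}.$$
Hence $\lambda_2(\Lambda) = \sqrt{13}$. However, $B_1$ is a solution to SIVP$_\gamma$ for $\gamma \ge \sqrt{17/13}$, since the longest vector of $B_1$ has norm $\sqrt{17}$.
The closest lattice vector to the target $w = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$ is $v = \begin{pmatrix} -2 \\ 3 \end{pmatrix}$. ♢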
It is known that SVP is no harder than CVP, since there is a reduction from SVP to
CVP. Both are considered to be hard problems and CVP is NP-hard.
The classical Minkowski Theorem (see [HPS08]) gives an upper bound to the norm
of the shortest nonzero vector:
Theorem 14.14. Let $\Lambda$ be a lattice and $S \subset \mathbb{R}^n$ a convex centrally symmetric set. If the volume of $S$ is greater than $2^n \det(\Lambda)$, then $S$ contains a nonzero lattice point. ♢
A set 𝑆 is centrally symmetric if 𝑥 ∈ 𝑆 implies −𝑥 ∈ 𝑆. 𝑆 is called convex if 𝑥, 𝑦 ∈ 𝑆
implies 𝑥 + 𝑡(𝑦 − 𝑥) ∈ 𝑆 for 𝑡 ∈ [0, 1], i.e., if the line segment between two points
𝑥, 𝑦 ∈ 𝑆 is contained in 𝑆. For example, balls or cubes with center 0 are centrally
symmetric and convex.
Corollary 14.15. $\lambda_1(\Lambda) \le \sqrt{n}\, \det(\Lambda)^{1/n}$.
Proof. Let $S$ be a ball with center 0 and radius $\sqrt{n}\, \det(\Lambda)^{1/n}$. Then
$$v = \det(\Lambda)^{1/n}\, (1, 1, \dots, 1) \in S$$
since $\|v\| = \det(\Lambda)^{1/n} \sqrt{n}$. Furthermore, any vector $w = \det(\Lambda)^{1/n} \cdot (x_1, x_2, \dots, x_n)$ with $|x_i| \le 1$ for all $i = 1, \dots, n$ lies in $S$. Hence
$$\left[ -\det(\Lambda)^{1/n},\ \det(\Lambda)^{1/n} \right]^n \subset S.$$
The volume of $S$ is thus greater than $2^n \det(\Lambda)$, the volume of the above $n$-dimensional cube. $S$ satisfies the prerequisite of Minkowski’s Theorem 14.14 and must contain a nonzero lattice vector. □
Remark 14.16. The above upper bound for $\lambda_1(\Lambda)$ can be improved to
$$\sqrt{\gamma_n}\, \det(\Lambda)^{1/n}$$
using Hermite’s constant $\gamma_n$. For a given dimension $n$, the constant $\gamma_n$ is the smallest number such that every lattice of rank $n$ contains a nonzero vector $v$ with
$$\|v\| \le \sqrt{\gamma_n}\, \det(\Lambda)^{1/n}.$$
For example, $\gamma_2 = \frac{2}{\sqrt{3}}$, but the exact value of $\gamma_n$ is known only for a few values of $n$. The expected length of the shortest vector of a random lattice is much smaller. Heuristically, the approximate length is the radius of an $n$-dimensional ball with volume $\det(\Lambda)$. Stirling’s asymptotic formula for the volume of an $n$-dimensional ball of radius $r$ is
$$V_n(r) \approx \frac{1}{\sqrt{n\pi}} \left( \sqrt{\frac{2\pi e}{n}}\; r \right)^n.$$
Now assume that the covolume of a lattice with $\lambda_1(\Lambda) = r$ is approximately the volume of a ball of radius $r$. Rearranging the above equation gives for large values of $n$:
$$r \approx \sqrt{\frac{n}{2\pi e}}\; \det(\Lambda)^{1/n}.$$
This is the Gaussian heuristic for randomly chosen lattices of dimension $n$. ♢
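To get a feeling for the three estimates, they can be evaluated numerically. The sketch below uses hypothetical helper names and assumes the known value 2/√3 for Hermite's constant in dimension 2; it compares the bounds for the lattice of Example 14.3 (𝑛 = 2, det(Λ) = 10, 𝜆1(Λ) = √8) and for a large dimension. Note that the Gaussian heuristic is an asymptotic estimate and is not meaningful for 𝑛 = 2.

```python
import math


def minkowski_bound(n, det):
    """Upper bound sqrt(n) * det^(1/n) from Corollary 14.15."""
    return math.sqrt(n) * det ** (1.0 / n)


def hermite_bound(gamma_n, n, det):
    """Upper bound sqrt(gamma_n) * det^(1/n) using Hermite's constant."""
    return math.sqrt(gamma_n) * det ** (1.0 / n)


def gaussian_heuristic(n, det):
    """Heuristic length sqrt(n / (2*pi*e)) * det^(1/n) for random lattices."""
    return math.sqrt(n / (2 * math.pi * math.e)) * det ** (1.0 / n)


# Lattice of Example 14.3: n = 2, det = 10, shortest vector (2, 2).
lambda_1 = math.sqrt(8)
hermite = hermite_bound(2 / math.sqrt(3), 2, 10)
minkowski = minkowski_bound(2, 10)
print(lambda_1, hermite, minkowski)

# In high dimension the heuristic is far below the Minkowski bound.
print(gaussian_heuristic(400, 1.0), minkowski_bound(400, 1.0))
```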
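We consider $B_1$ and $B_2$ (see Example 14.3). For the ‘good’ basis $B_1$ we obtain
$$B_1^{-1} v = \frac{1}{10} \begin{pmatrix} 2 & -2 \\ 1 & 4 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 6 \end{pmatrix} = \begin{pmatrix} -\frac{4}{5} \\ \frac{13}{5} \end{pmatrix}.$$
The rounded coordinates are $(-1, 3)$ and the resulting lattice vector is
$$B_1 \begin{pmatrix} -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 7 \end{pmatrix},$$
which is in fact the closest vector. Now we use the ‘bad’ basis $B_2$:
$$B_2^{-1} v = \frac{1}{10} \begin{pmatrix} 10 & 10 \\ -7 & -8 \end{pmatrix} \begin{pmatrix} 2 \\ 6 \end{pmatrix} = \begin{pmatrix} 8 \\ -\frac{31}{5} \end{pmatrix}.$$
The rounded coordinates are $(8, -6)$ and the corresponding lattice vector is
$$B_2 \begin{pmatrix} 8 \\ -6 \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \end{pmatrix},$$
and this is only the second-best solution. ♢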
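The rounding procedure of this example is easy to reproduce. The following sketch (function names are illustrative assumptions) solves 𝐵𝑥 = 𝑣 exactly over the rationals, rounds the coordinates and maps them back to a lattice vector, for both bases and the target (2, 6). Python's `round` resolves exact ties to the even integer, which does not matter for these inputs.

```python
from fractions import Fraction


def solve_2x2(B, v):
    """Solve B x = v over the rationals; B is a list of rows, its columns are the basis."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    return [(d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det]


def babai_round(B, v):
    """Round the basis coordinates of v and return (coordinates, lattice vector)."""
    coords = [round(x) for x in solve_2x2(B, v)]
    point = [B[0][0] * coords[0] + B[0][1] * coords[1],
             B[1][0] * coords[0] + B[1][1] * coords[1]]
    return coords, point


B1 = [[4, 2], [-1, 2]]       # 'good' basis
B2 = [[8, 10], [-7, -10]]    # 'bad' basis of the same lattice
target = [2, 6]

good = babai_round(B1, target)
bad = babai_round(B2, target)
print(good, bad)
```

With the good basis the result has distance 1 from the target, whereas the bad basis yields a lattice vector at distance √8.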
Finding short vectors in random 𝑞-ary lattices is assumed to be intractable for large
dimensions, say several hundreds. Below, we will see that public-key cryptosystems
can be based on lattices, where a ‘good’ basis with short vectors forms the private key
and only a ‘bad’ basis is public.
(4) In each row of 𝐻, the unique maximum coefficient lies on the diagonal. ♢
HNFs also exist for general 𝑛 × 𝑚 matrices, i.e., without the condition 𝑟𝑘(𝐴) = 𝑛.
Since we only consider lattices of full rank, this is not needed here.
Every integer matrix can be transformed into a matrix in HNF form:
Proof. The matrix 𝐻 can be computed by Gaussian elimination. The following oper-
ations are sufficient: swapping two columns, multiplying a column by −1 and adding
an integer multiple of a column to another column. These operations are given by
a multiplication with a unimodular matrix on the right. We leave the details to the
reader. □
Remark 14.20. Above, we considered the column-style HNF. There is also a row-style
HNF 𝐻 of matrix 𝐴 in upper-triangular form where 𝐻 = 𝑈𝐴. Both HNFs are transposes
of each other.
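Example 14.21. (1) We compute the HNF of $B_1$ and $B_2$ in Example 14.3. Let $B_1 = \{v_1, v_2\} = \left\{ \begin{pmatrix} 4 \\ -1 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \end{pmatrix} \right\}$. Set $v_1 \leftarrow (-v_1) + 2v_2$ and swap $v_1$ and $v_2$. We obtain the HNF
$$H = \begin{pmatrix} 2 & 0 \\ 2 & 5 \end{pmatrix}.$$
Now consider the second basis $B_2 = \left\{ \begin{pmatrix} 8 \\ -7 \end{pmatrix}, \begin{pmatrix} 10 \\ -10 \end{pmatrix} \right\}$. Set $v_1 \leftarrow (-v_1) + v_2$, giving $\begin{pmatrix} 2 & 10 \\ -3 & -10 \end{pmatrix}$. Now set $v_2 \leftarrow (-5v_1) + v_2$ and $v_1 \leftarrow v_2 + v_1$. This gives $H = \begin{pmatrix} 2 & 0 \\ 2 & 5 \end{pmatrix}$ as above. In fact, $B_1$ and $B_2$ generate the same lattice.
(2) Consider a slightly more complicated example:
$$A = \begin{pmatrix} 2 & -6 & 12 & 4 \\ 10 & -30 & 11 & 6 \\ 2 & -6 & 4 & 5 \end{pmatrix}.$$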
Although many lattices do not possess an orthogonal basis, a good basis should
have almost orthogonal vectors.
Definition 14.22. The orthogonality defect of a basis $b_1, \dots, b_n$ of a lattice $\Lambda$ is defined by
$$\frac{\|b_1\| \cdots \|b_n\|}{\det(\Lambda)}.$$ ♢
The orthogonality defect is always ≥ 1. It is close to 1 for a ‘good’ basis and equal
to 1 for an orthogonal basis.
Example 14.23. The orthogonality defect of the ‘good’ basis 𝐵1 (see Example 14.3) is
1.17 and that of the ‘bad’ basis 𝐵2 is 15.03. ♢
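The two values can be reproduced with a few lines of code. This is a small sketch restricted to dimension 2 (the function name is an assumption of the sketch); it divides the product of the column norms by the absolute determinant.

```python
import math


def orthogonality_defect(B):
    """Orthogonality defect of a 2x2 basis matrix whose columns are the basis vectors."""
    (a, b), (c, d) = B
    det = abs(a * d - b * c)
    norm1 = math.hypot(a, c)   # length of the first column
    norm2 = math.hypot(b, d)   # length of the second column
    return norm1 * norm2 / det


defect_good = orthogonality_defect([[4, 2], [-1, 2]])
defect_bad = orthogonality_defect([[8, 10], [-7, -10]])
print(round(defect_good, 2), round(defect_bad, 2))
```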
The numbers $\mu_{i,j}$ are called GSO coefficients. The summand $\mu_{i,j} b_j^*$ gives the projection of $b_i$ onto $b_j^*$ and their sum is the projection of $b_i$ onto the hyperplane $\langle b_1^*, \dots, b_{i-1}^* \rangle$. The difference of $b_i$ and the projection gives the vector $b_i^*$, which is orthogonal to all vectors $b_1^*, \dots, b_{i-1}^*$. The vector $b_i^*$ is also called the projection of $b_i$ onto the orthogonal complement of $b_1^*, \dots, b_{i-1}^*$.
𝑏𝑖∗ can also be computed by successive projections of 𝑏𝑖 onto the orthogonal com-
plement of 𝑏𝑗∗ , where 𝑗 runs from 𝑖 − 1 to 1. Initially set 𝑏1∗ = 𝑏1 , … , 𝑏𝑛∗ = 𝑏𝑛 and update
each vector 𝑏2∗ , … , 𝑏𝑛∗ recursively:
𝑏𝑖∗ ← 𝑏𝑖∗ − 𝜇𝑖,𝑗 𝑏𝑗∗ , where 𝑗 = 𝑖 − 1, … , 1.
We write 𝐵𝑖 for the square norm ‖𝑏𝑖∗ ‖2 of vectors in the GSO basis.
The standard GSO algorithm needs to be modified for lattices, since the GSO basis
is not contained in the lattice unless all GSO coefficients are integers. Now, the obvious
approach is to round the GSO coefficients 𝜇𝑖,𝑗 . Let ⌊𝜇𝑖,𝑗 ⌉ be the closest integer to 𝜇𝑖,𝑗 .
Then set
𝑏𝑖 = 𝑏𝑖 − ⌊𝜇𝑖,𝑗 ⌉𝑏𝑗 for 𝑗 = 𝑖 − 1, … , 1.
Figure 14.2. Projection of $b_i$ onto the orthogonal complement of $b_j$ and lifting it back to a lattice vector $b_i'$ (dashed). In this example, one has $\lfloor \mu_{i,j} \rceil = 2$. The new basis $b_i', b_j$ is size-reduced.
A size-reduced basis cannot be further reduced, since all rounded GSO coefficients
are zero. But note that this property depends on the order of the vectors. Even in
dimension 2, it may happen that 𝑏1 , 𝑏2 is size-reduced while 𝑏2 , 𝑏1 is not.
A size-reduced basis can be computed with the following integer variant of the
GSO algorithm (see Algorithm 14.2).
We note that the size reduction algorithm does not change the GSO basis $b_1^*, \dots, b_n^*$ and their square norms $B_1, \dots, B_n$. A size-reduced basis $b_1, \dots, b_n$ may be further improved by changing the order of the vectors. Consider the GSO Algorithm 14.1: if $b_i$ and $b_{i+1}$ are swapped, then the reduction algorithm leaves $b_1^*, \dots, b_{i-1}^*$ and $b_{i+2}^*, \dots, b_n^*$ unchanged and the new value for $b_i^*$ is
$$b_{i+1} - \sum_{j=1}^{i-1} \mu_{i+1,j}\, b_j^*.$$
Definition 14.25. Let $b_1, \dots, b_n$, $b_1^*, \dots, b_n^*$, $B_1, \dots, B_n$ be as above and $\delta \in\ ]\tfrac{1}{4}, 1[$. Then the Lovász condition with factor $\delta$ is defined by
$$\delta B_i \le B_{i+1} + \mu_{i+1,i}^2 B_i$$
for $i = 1, \dots, n-1$. The condition is equivalent to $(\delta - \mu_{i+1,i}^2) B_i \le B_{i+1}$. An ordered basis $b_1, \dots, b_n$ is called $\delta$-LLL-reduced if it is size-reduced and the Lovász condition holds with factor $\delta$. ♢
A typical choice is $\delta = \frac{3}{4}$, which ensures that the algorithm terminates in polynomial time (in contrast to $\delta = 1$). Now we can give the basic version of the famous LLL (Lenstra-Lenstra-Lovász) algorithm (see Algorithm 14.3).
Remark 14.26. Algorithm 14.3 can be optimized: it is not necessary to leave the loop
and to re-run the size-reduction Algorithm 14.2, if 𝑏𝑖 and 𝑏𝑖+1 are swapped (step 5).
Instead, it is sufficient to update the GSO basis and several GSO coefficients and to
decrease the loop index 𝑖 by 1.
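Example 14.27. We consider the following HNF basis of a 3-dimensional lattice (see Example 14.21 (2)):
$$b_1 = \begin{pmatrix} 2 \\ 3 \\ 14 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0 \\ 7 \\ 11 \end{pmatrix}, \quad b_3 = \begin{pmatrix} 0 \\ 0 \\ 23 \end{pmatrix}.$$
We apply the LLL lattice reduction Algorithm 14.3. First, we compute the GSO coefficients:
$$\mu_{21} = \frac{b_2 \cdot b_1}{b_1 \cdot b_1} = \frac{175}{209}, \quad \mu_{31} = \frac{b_3 \cdot b_1}{b_1 \cdot b_1} = \frac{322}{209}, \quad \mu_{32} = \frac{b_3 \cdot b_2^*}{b_2^* \cdot b_2^*} = -\frac{3473}{4905}.$$
Note that $\mu_{32}$ is computed using the updated vector
$$b_2^* = b_2 - \mu_{21} b_1 = (-350/209,\ 938/209,\ -151/209)^T.$$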
Now the GSO algorithm gives $B_1 = 29$, $B_2 = \frac{1501}{29}$ and $B_3 = \frac{103684}{1501}$, and the updated GSO coefficients are
$$\mu_{21} = -\frac{6}{29}, \quad \mu_{31} = -\frac{5}{29}, \quad \mu_{32} = \frac{2087}{1501}.$$
Again, the size-reduction algorithm is applied. $b_2$ does not change since $\lfloor \mu_{21} \rceil = 0$. We update $b_3$ by $b_3 - \lfloor \mu_{32} \rceil b_2 = b_3 - b_2 = (4, 6, 5)^T$, $\mu_{31}$ by $\mu_{31} - \lfloor \mu_{32} \rceil \mu_{21} = \frac{1}{29}$ and $\mu_{32}$ by $\mu_{32} - 1 = \frac{586}{1501}$. Now the basis $b_1, b_2, b_3$ is size-reduced and both Lovász conditions are satisfied:
$$b_1 = \begin{pmatrix} -2 \\ 4 \\ -3 \end{pmatrix}, \quad b_2 = \begin{pmatrix} -4 \\ 1 \\ 6 \end{pmatrix}, \quad b_3 = \begin{pmatrix} 4 \\ 6 \\ 5 \end{pmatrix}.$$
The LLL-reduced basis is shorter than the original basis and one can show (for example, by testing shorter vectors with integer coefficients) that $b_1 = (-2, 4, -3)^T$ is the shortest vector of the lattice $\Lambda$. ♢
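The reduction of this example can be replayed with exact rational arithmetic. The sketch below is a compact textbook-style LLL variant and not a transcription of Algorithm 14.3: it size-reduces one vector at a time, recomputes the GSO from scratch after every change (simple but inefficient), and steps back after a swap. All function names are assumptions of this sketch.

```python
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * b for a, b in zip(u, v))


def gso(basis):
    """Return the GSO vectors b*_i and the coefficients mu[i][j] as exact rationals."""
    n = len(basis)
    star = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], star[j]) / dot(star[j], star[j])
            w = [a - mu[i][j] * b for a, b in zip(w, star[j])]
        star.append(w)
    return star, mu


def lll(basis, delta=Fraction(3, 4)):
    """Return a delta-LLL-reduced basis; vectors are given as rows."""
    b = [list(v) for v in basis]
    n = len(b)
    k = 1
    while k < n:
        # Size-reduce b_k against b_{k-1}, ..., b_0 (0-based indices).
        for j in range(k - 1, -1, -1):
            _, mu = gso(b)
            r = round(mu[k][j])
            if r:
                b[k] = [x - r * y for x, y in zip(b[k], b[j])]
        star, mu = gso(b)
        B_prev = dot(star[k - 1], star[k - 1])
        B_cur = dot(star[k], star[k])
        # Lovász condition in the form (delta - mu^2) * B_{k-1} <= B_k.
        if B_cur >= (delta - mu[k][k - 1] ** 2) * B_prev:
            k += 1
        else:
            b[k - 1], b[k] = b[k], b[k - 1]
            k = max(k - 1, 1)
    return b


reduced = lll([[2, 3, 14], [0, 7, 11], [0, 0, 23]])
star, mu = gso(reduced)
square_norms = [dot(s, s) for s in star]
print(reduced)
print(square_norms)
```

For this input the sketch should end with the same basis and the same square norms 𝐵1, 𝐵2, 𝐵3 as the worked example; for other inputs, different LLL variants may return different reduced bases of the same lattice.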
One can show that the LLL algorithm always terminates and runs in polynomial time. The number of swaps and hence the number of executions of the main loop is bounded by $O(n^2 \ln(X))$, where $X$ is an upper bound on the norms of the input vectors. We refer to [Gal12] and [HPS08] for a proof of the following statement:
The next Proposition relates the norms of the LLL-reduced basis to the norms of
the GSO basis.
Proposition 14.29. Let $b_1, \dots, b_n$ be an LLL-reduced basis with $\delta = \frac{3}{4}$. Let $b_1^*, \dots, b_n^*$ be the corresponding GSO basis and $B_i = \|b_i^*\|^2$ as above. Then:
(1) $B_i \le 2B_{i+1}$ for $1 \le i < n$ and $B_j \le 2^{i-j} B_i$ for $1 \le j \le i \le n$.
(2) $B_i \le \|b_i\|^2 \le \left( \frac{1}{2} + 2^{i-2} \right) B_i$ for $1 \le i \le n$.
(3) $\|b_j\| \le 2^{(i-1)/2} \|b_i^*\|$ for $1 \le j \le i \le n$.
(4) $\lambda_1(\Lambda) \ge \min_{1 \le i \le n} \|b_i^*\|$.
Proof. Since the basis is reduced, one has $\mu_{i+1,i}^2 \le \frac{1}{4}$. The Lovász condition for $\delta = \frac{3}{4}$ implies (1). The GSO construction gives
$$b_i = b_i^* + \sum_{j=1}^{i-1} \mu_{i,j}\, b_j^*.$$
Since the GSO vectors are orthogonal, one obtains $\|b_i^*\| \le \|b_i\|$ and
$$\|b_i\|^2 = B_i + \sum_{j=1}^{i-1} \mu_{i,j}^2 B_j.$$
Furthermore, $\mu_{i,j}^2 B_j \le \frac{1}{4} B_j \le \frac{1}{4}\, 2^{i-j} B_i$ by (1). This gives part (2), since
$$\|b_i\|^2 \le B_i \left( 1 + \frac{1}{4} \sum_{j=1}^{i-1} 2^{i-j} \right) = B_i \left( 1 + \frac{1}{4} (2^i - 2) \right) = B_i \left( \frac{1}{2} + 2^{i-2} \right).$$
For $j \ge 1$ we have $\frac{1}{2} + 2^{j-2} \le 2^{j-1}$. Thus (2) implies $\|b_j\|^2 \le 2^{j-1} B_j$. Since $B_j \le 2^{i-j} B_i$ by (1), we obtain $\|b_j\|^2 \le 2^{j-1}\, 2^{i-j} B_i = 2^{i-1} B_i$. Taking square roots proves part (3).
Suppose $v$ is a shortest nonzero lattice vector and $v = \sum_{i=1}^{n} x_i b_i$ where $x_i \in \mathbb{Z}$; then:
$$v = \sum_{i=1}^{n} \left( x_i b_i^* + \sum_{j=1}^{i-1} x_i \mu_{i,j}\, b_j^* \right) = \sum_{i=1}^{n} \left( x_i + \mu_{i+1,i}\, x_{i+1} + \dots + \mu_{n,i}\, x_n \right) b_i^*.$$
Now, let $i$ be the largest index such that $x_i \ne 0$. The above formula and the orthogonality of the GSO basis imply $\|v\| \ge |x_i|\, \|b_i^*\|$ and hence part (4). □
The next Proposition shows how effective the LLL algorithm is (in the worst case)
with respect to computing a short vector and an almost orthogonal basis. The algo-
rithm is good for small values of 𝑛, but the bounding factors grow exponentially in 𝑛.
Proposition 14.30. Let $b_1, \dots, b_n$ be an LLL-reduced basis with $\delta = \frac{3}{4}$. Then:
Proof. We use the previous Proposition 14.29. Part (1) gives $\|b_i^*\| \ge 2^{(1-i)/2} \|b_1^*\|$. Part (4) and $b_1 = b_1^*$ yield the first inequality:
$$\lambda_1(\Lambda) \ge \min_{1 \le i \le n} \|b_i^*\| \ge \min_{1 \le i \le n} 2^{(1-i)/2} \|b_1^*\| = 2^{(1-n)/2} \|b_1\|.$$
We have $\det(\Lambda) = \prod_{i=1}^{n} \|b_i^*\|$. Inequality (2) follows from $\|b_i^*\| \le \|b_i\|$ and part (3) of Proposition 14.29:
$$\|b_i^*\| \le \|b_i\| \le 2^{(i-1)/2} \|b_i^*\|.$$
Furthermore, $\|b_1\| \le 2^{(i-1)/2} \|b_i^*\|$ gives
$$\|b_1\|^n \le \prod_{i=1}^{n} 2^{(i-1)/2} \|b_i^*\| = 2^{n(n-1)/4} \det(\Lambda),$$
The ciphertext is close to the lattice point 𝐻𝑚 and the assumption is that finding
this vector given 𝑐 is hard. The following Proposition shows that decryption is correct
if the noise vector 𝑟 is small.
Proposition 14.32. Let 𝐵, 𝐻, Λ and 𝑟 be as above. If ⌊𝐵−1 𝑟⌉ = 0 then GGH decryption
is correct.
However, if an adversary tries to decrypt 𝑐 using the public basis 𝐻, then the result 𝑚′
differs from 𝑚:
$$m' = \lfloor H^{-1} c \rceil = \left\lfloor \begin{pmatrix} 2 \\ -3 \\ 2 \\ \frac{24}{7} \end{pmatrix} \right\rceil = \begin{pmatrix} 2 \\ -3 \\ 2 \\ 3 \end{pmatrix}.$$ ♢
It can be shown that the GGH encryption scheme has inherent weaknesses. A
major problem is that the noise vector 𝑟 has to be short for correct decryption.
As a consequence, GGH ciphertexts are not uniformly distributed and can be dis-
tinguished from random data, but then the closest vector problem is much easier than
in the general case. Practical attacks could be mounted for 𝑛 < 400, and larger dimen-
sions are impractical because of the key size that grows quadratically in 𝑛. There are
proposals for improvements of GGH that require further cryptanalysis.
14.4. NTRU
The NTRU cryptosystem was invented in the 1990s by Hoffstein, Pipher and Silverman
[HPS98]. The classical definition of NTRU uses polynomials in the ring
$$R = \mathbb{Z}[x]/(x^N - 1),$$
where $N$ is fixed, for example $N = 743$. Furthermore, a large modulus $q$ and a small modulus $p$ are needed with $\gcd(p, q) = 1$, for example $q = 2048$ and $p = 3$. We begin with the classical definition of NTRU and outline the relation to lattices later in this section.
Multiplication in the ring 𝑅 can be viewed as a convolution product, since
$$(a_0 + a_1 x + \dots + a_{N-1} x^{N-1})(b_0 + b_1 x + \dots + b_{N-1} x^{N-1}) \equiv c_0 + \dots + c_{N-1} x^{N-1},$$
where $c_k = \sum_{i+j \equiv k \bmod N} a_i b_j$. The convolution product is often denoted by a ‘∗’, but
since it is also the usual multiplication in the quotient ring 𝑅 (see Proposition 4.60) we
will not use a special notation.
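The convolution formula translates directly into code. The following sketch (the function name `convolve` and the list representation with the constant coefficient first are choices made here) multiplies two elements of 𝑅 and optionally reduces the coefficients modulo 𝑞. As a check, it recomputes the ciphertext 𝑐 = 𝑝 𝑟ℎ + 𝑚 with the values 𝑁 = 5, 𝑝 = 3, 𝑞 = 29 used in the worked NTRU example of this section.

```python
def convolve(a, b, N, q=None):
    """Multiply two polynomials in Z[x]/(x^N - 1), given as coefficient lists a_0, ..., a_{N-1}."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % N] += ai * bj     # x^(i+j) wraps around since x^N = 1
    if q is not None:
        c = [x % q for x in c]
    return c


N, p, q = 5, 3, 29
h = [-5, 13, 11, 2, 8]    # 8x^4 + 2x^3 + 11x^2 + 13x - 5
r = [0, -1, 0, 0, 1]      # x^4 - x
m = [0, 1, 0, 1, 0]       # x^3 + x

rh = convolve(r, h, N)
c = [(p * x + y) % q for x, y in zip(rh, m)]
print(c)                  # coefficients c_0, ..., c_4 of the ciphertext
```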
NTRU uses polynomials with small coefficients. We define 𝒯(𝑑1 , 𝑑2 ) ⊂ 𝑅 to be the
subset of ternary polynomials, where 𝑓 ∈ 𝒯(𝑑1 , 𝑑2 ) if a representative 𝑓 of degree < 𝑁
has 𝑑1 coefficients equal to 1, 𝑑2 coefficients equal to −1 and the remaining coefficients
equal to zero (see [HPS08]).
Note that the plain NTRU cryptosystem is not CPA-secure, since it leaks a small
part of the plaintext (see Exercise 10).
Remark 14.35. There are recommendations for $N$, $p$ and $q$, and the distribution of coefficients in the polynomials $f$, $g$ and $r$ can be more general than explained above. It is estimated that NTRU encryption with $N = 743$, $p = 3$, $q = 2^{11} = 2048$ achieves a very high level of security [HPS+17].
The last equation is true since $x^5 = 1$ in $R$. We have generated the public key $pk = (N, p, q, h)$, and the private key is $sk = f$.
Suppose the plaintext is encoded in the polynomial $m = x^3 + x$. A random polynomial $r = x^4 - x \in \mathcal{T}(1, 1)$ is chosen for encryption and the ciphertext is
$$c = p\, rh + m = 3(x^4 - x)(8x^4 + 2x^3 + 11x^2 + 13x - 5) + x^3 + x = 8x^4 + 21x^3 + 25x^2 + 20x + 15 \bmod 29.$$
sage: h = fq*Rq(g)
sage: m = x^3+x; r = x^4-x
sage: c = Rq(p)*Rq(h)*Rq(r) + Rq(m); c
8*xbar^4 + 21*xbar^3 + 25*xbar^2 + 20*xbar + 15
sage: a = p*R(r)*R(g) + R(f)*R(m); a
-2*xbar^4 + 2*xbar^3 + 4*xbar^2 - 3*xbar + 1
sage: Rp(fp)*Rp(a)
xbar^3 + xbar
Now we describe the lattice representation of NTRU. It is easy to see that elements in $R = \mathbb{Z}[x]/(x^N - 1)$ correspond to vectors in $\mathbb{Z}^N$: a polynomial $f = a_0 + a_1 x + \dots + a_{N-1} x^{N-1} \in R$ is mapped to the vector $\tilde{f} = (a_0, a_1, \dots, a_{N-1}) \in \mathbb{Z}^N$ of coefficients. This is clearly a group isomorphism, but how does the multiplication in $R$ translate to $\mathbb{Z}^N$? We define the circulant matrix of $f$ as
$$C_f = \begin{pmatrix} a_0 & a_1 & \dots & a_{N-1} \\ a_{N-1} & a_0 & \dots & a_{N-2} \\ \vdots & \vdots & & \vdots \\ a_1 & a_2 & \dots & a_0 \end{pmatrix}.$$
𝑓ℎ = 𝑓𝑓𝑞 𝑔 = 𝑔 mod 𝑞.
$$A = \begin{pmatrix} I_N & C_h \\ 0 & qI_N \end{pmatrix},$$
where 𝐼𝑁 is the 𝑁 × 𝑁 identity matrix. The matrix 𝐴 is derived from the public key of
the NTRU cryptosystem. Since 𝑔 = 𝑓ℎ + 𝑞𝑢 for some polynomial 𝑢 ∈ 𝑅, we have
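$$(\tilde{f}, \tilde{u}) A = \left( (\tilde{f}, \tilde{u}) \begin{pmatrix} I_N \\ 0 \end{pmatrix},\ (\tilde{f}, \tilde{u}) \begin{pmatrix} C_h \\ qI_N \end{pmatrix} \right) = (\tilde{f}, \tilde{g}),$$
and hence $(\tilde{f}, \tilde{g}) \in \Lambda$. Note that $f$ forms the private key. Since $f$ and $g$ have small coefficients, $(\tilde{f}, \tilde{g})$ is a short vector in $\Lambda$. NTRU can therefore be attacked by finding short vectors in the lattice $\Lambda$. However, this is assumed to be intractable if the dimension is large enough.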
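The membership of the secret vector in the lattice can be checked on the small example of this section (𝑁 = 5, 𝑞 = 29). In the sketch below, the helper names are hypothetical, and the polynomials 𝑓 = 𝑥⁴ + 𝑥³ − 1 and 𝑔 = 𝑥³ − 𝑥² are an assumption: they are read off from the negated first row of the LLL-reduced basis in the worked example. The code builds the circulant matrix of ℎ, the block matrix 𝐴, and verifies that a row vector (𝑓, 𝑢) with integer entries is mapped to (𝑓, 𝑔).

```python
def circulant(a):
    """Circulant matrix whose i-th row is the coefficient vector a shifted i places to the right."""
    N = len(a)
    return [[a[(k - i) % N] for k in range(N)] for i in range(N)]


def row_times_matrix(v, M):
    return [sum(v[i] * M[i][k] for i in range(len(v))) for k in range(len(M[0]))]


N, q = 5, 29
h = [-5, 13, 11, 2, 8]     # public key, constant coefficient first
f = [-1, 0, 0, 1, 1]       # assumed private polynomial x^4 + x^3 - 1
g = [0, 0, -1, 1, 0]       # assumed polynomial x^3 - x^2

C_h = circulant(h)
fh = row_times_matrix(f, C_h)                  # coefficients of f*h over the integers
u = [(gi - x) // q for gi, x in zip(g, fh)]    # g = f*h + q*u

identity = [[1 if i == k else 0 for k in range(N)] for i in range(N)]
A = [identity[i] + C_h[i] for i in range(N)] + \
    [[0] * N + [q * x for x in identity[i]] for i in range(N)]

point = row_times_matrix(f + u, A)
print(fh, u, point)
```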
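$$A = \begin{pmatrix} I_5 & C_h \\ 0 & 29 I_5 \end{pmatrix} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & -5 & 13 & 11 & 2 & 8 \\
0 & 1 & 0 & 0 & 0 & 8 & -5 & 13 & 11 & 2 \\
0 & 0 & 1 & 0 & 0 & 2 & 8 & -5 & 13 & 11 \\
0 & 0 & 0 & 1 & 0 & 11 & 2 & 8 & -5 & 13 \\
0 & 0 & 0 & 0 & 1 & 13 & 11 & 2 & 8 & -5 \\
0 & 0 & 0 & 0 & 0 & 29 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 29 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 29 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 29 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 29
\end{pmatrix}.$$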
The lattice is given by a ‘bad’ basis of long vectors. We want to attack this NTRU cryptosystem by finding a ‘good’ basis and the short secret vector $(\tilde{f}, \tilde{g}) \in \Lambda$. We use SageMath to compute the LLL-reduced basis of the lattice (see Section 14.2):
$$\begin{pmatrix}
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 & -1 & 0 \\
-1 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 1 & -1 \\
-1 & -1 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & -1 & -1 & 1 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & -1 & -1 & 1 & 0 & 1 & -1 & 0 & 0 \\
-4 & -4 & 5 & 0 & 5 & 5 & 5 & 5 & -5 & -10 \\
0 & -5 & 9 & -5 & 0 & 10 & 0 & -5 & 0 & -5 \\
0 & -5 & 5 & 1 & 0 & 10 & 9 & 5 & 5 & 0 \\
-5 & 9 & -5 & 0 & 0 & 0 & -5 & 0 & -5 & 10 \\
0 & 5 & -5 & -5 & 6 & -5 & -9 & 4 & 5 & 5
\end{pmatrix}.$$
Since the dimension is small, our attack is successful: the negative of the first row is
Remark 14.38. The security of NTRU relies on the assumption that, given $h = f^{-1} g \bmod q$, it is hard to recover $f$ and $g$. There are many potential attacks and NTRU has been investigated for two decades. It is generally assumed that improved NTRU schemes are secure and can even achieve IND-CCA2 security if the parameter recommendations are observed. Some doubts remain with respect to the cyclotomic structure of the ring $R = \mathbb{Z}[x]/(x^N - 1)$ and there are proposals [BCLvV18] replacing $x^N - 1$ with $x^N - x - 1$ and choosing a prime $q$ instead of a power of 2 (compare Remark 14.35).
Figure 14.3. Probability mass function of the discrete Gaussian distribution $D_{\mathbb{Z},s}$ with standard deviation $\sigma = 3$ and width $s = \sqrt{2\pi}\, \sigma \approx 7.5$.
It is easy to see that solving the Search-LWE problem also solves the decision prob-
lem, and it can be shown that the decision and the search version of LWE are equivalent
if 𝑞 is bounded by a polynomial in 𝑛 (see [Reg09]).
LWE can be viewed as a lattice problem: the matrix 𝐴 defines a 𝑞-ary lattice Λ𝑞 (𝐴𝑇 )
of dimension 𝑚. The search problem is to find the closest lattice vector 𝑣 to a given noisy
vector 𝑣 + 𝑒 where 𝑒 is chosen according to 𝜒. This is also called a Bounded Distance
Decoding (BDD) problem. The decision problem is to distinguish between a uniform
random vector 𝑏 and a noisy lattice vector 𝑣 + 𝑒.
The following theorem ([Reg09]) is one of the key results and explains why the
LWE problem is believed to be hard:
Theorem 14.41. Let 𝑛 ∈ ℕ be the security parameter, let 𝑚, 𝑞 ∈ ℕ be polynomial in
𝑛 and let 𝜒 = 𝐷ℤ,𝑠 be a discrete Gaussian of parameter 𝑠 such that 𝑠 = 𝛼𝑞 > 2√𝑛 and
0 < 𝛼 < 1. Then solving the LWE decision problem is at least as hard as quantumly
solving SIVP𝛾 on arbitrary 𝑛-dimensional lattices, where 𝛾 = 𝑂(𝑛/𝛼). ♢
SIVP𝑛/𝛼 problem. Since the approximation of the shortest independent vector prob-
lem to within polynomial factors is assumed to be a hard problem even for quantum
computers, the LWE problem is probably hard. The worst-case to average-case reduc-
tion of Theorem 14.41 is particularly interesting, because most other cryptographic
constructions are based on average-case hardness.
Now we define a public-key cryptosystem that is based on LWE and has the same
strong security guarantee.
Definition 14.42. Let 𝑛, 𝑚, 𝑞 ∈ ℕ, 𝑚 ≥ 𝑛, 𝑞 ≥ 2 and 𝜒 an error distribution on ℤ. Let
𝑙 be the plaintext length. Then the LWE public-key cryptosystem is defined by:
• The plaintext space ℳ = {0, 1}𝑙 ≅ ℤ𝑙2 .
• The ciphertext space 𝒞 = ℤ𝑛𝑞 × ℤ𝑙𝑞 .
• For key generation, one chooses an 𝑛×𝑙 matrix 𝑆 and an 𝑚×𝑛 matrix 𝐴 uniformly
at random and an 𝑚 × 𝑙 matrix 𝐸 according to 𝜒. All matrices are defined over ℤ𝑞 .
The private key is 𝑠𝑘 = 𝑆 and the public key is 𝑝𝑘 = (𝐴, 𝑃), where 𝑃 = 𝐴𝑆 + 𝐸.
• To encrypt a plaintext 𝑣 ∈ {0, 1}𝑙 , one chooses a vector 𝑎 ∈ {0, 1}𝑚 uniformly at
random. The ciphertext is given by
$$(u, c) = \mathcal{E}_{pk}(v) = \left( A^T a,\ P^T a + \lfloor \tfrac{q}{2} \rceil v \right) \in \mathbb{Z}_q^n \times \mathbb{Z}_q^l.$$
The security of LWE encryption relies on the hardness of the LWE problem (see
Theorem 14.43. If the LWE decision problem with parameters 𝑛, 𝑚, 𝑞 and 𝜒 is hard,
then the LWE encryption scheme has indistinguishable encryption under chosen plaintext
attack (IND-CPA secure). ♢
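$$A = \begin{pmatrix}
9 & 5 & 11 & 13 \\
13 & 6 & 6 & 2 \\
6 & 21 & 17 & 18 \\
22 & 19 & 20 & 8 \\
2 & 17 & 10 & 21 \\
10 & 8 & 17 & 11 \\
5 & 16 & 12 & 2 \\
5 & 7 & 11 & 7
\end{pmatrix}, \quad
S = \begin{pmatrix}
5 & 2 & 9 & 1 \\
6 & 8 & 19 & 1 \\
19 & 18 & 9 & 18 \\
9 & 2 & 14 & 18
\end{pmatrix}.$$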
The secret matrix 𝐸 is chosen according to 𝐷ℤ,𝑠 . The matrix 𝑃 = 𝐴𝑆 + 𝐸 is public.
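$$E = \begin{pmatrix}
0 & 22 & 1 & 21 \\
0 & 22 & 22 & 22 \\
22 & 22 & 22 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 2 \\
1 & 0 & 0 & 1 \\
1 & 22 & 1 & 22 \\
22 & 0 & 0 & 1
\end{pmatrix}, \quad
P = \begin{pmatrix}
10 & 5 & 21 & 7 \\
3 & 1 & 13 & 1 \\
19 & 15 & 6 & 13 \\
9 & 20 & 0 & 16 \\
8 & 17 & 13 & 4 \\
15 & 21 & 20 & 17 \\
0 & 12 & 3 & 19 \\
16 & 2 & 7 & 15
\end{pmatrix}.$$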
We want to encrypt 𝑣 = (1, 0, 1, 1)𝑇 and choose a random vector 𝑎 ∈ {0, 1}8 :
𝑎 = (1, 1, 0, 1, 0, 0, 0, 1)𝑇 .
We have $\lfloor \tfrac{q}{2} \rceil v = (12, 0, 12, 12)^T$ and compute the ciphertext
$$(u, c) = \mathcal{E}_{pk}(v) = \left( A^T a,\ P^T a + \lfloor \tfrac{q}{2} \rceil v \right) = ((3, 14, 2, 7)^T, (4, 5, 7, 5)^T) \bmod 23.$$
For decryption, we use the secret matrix 𝑆 and obtain
𝑐 − 𝑆 𝑇 𝑢 = (4, 5, 7, 5)𝑇 − 𝑆 𝑇 (3, 14, 2, 7)𝑇 = (11, 21, 12, 10)𝑇 mod 23.
Coefficients close to 0 mod 23 give the bit 0 and coefficients close to $\lfloor \tfrac{23}{2} \rceil = 12$ give the bit 1. Hence we recover the plaintext $v = (1, 0, 1, 1)^T$. ♢
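The numbers of this example can be recomputed with a short script. The sketch below is a toy illustration in plain Python, with function names chosen here; it uses the matrices 𝐴, 𝑆 and 𝐸 of the example with 𝑞 = 23, derives 𝑃 = 𝐴𝑆 + 𝐸, encrypts 𝑣 with the fixed vector 𝑎 and decrypts by deciding whether each coefficient is closer to 0 or to 12. A real implementation would sample 𝑆, 𝐸 and 𝑎 randomly and use much larger parameters.

```python
q = 23
A = [[9, 5, 11, 13], [13, 6, 6, 2], [6, 21, 17, 18], [22, 19, 20, 8],
     [2, 17, 10, 21], [10, 8, 17, 11], [5, 16, 12, 2], [5, 7, 11, 7]]
S = [[5, 2, 9, 1], [6, 8, 19, 1], [19, 18, 9, 18], [9, 2, 14, 18]]
E = [[0, 22, 1, 21], [0, 22, 22, 22], [22, 22, 22, 0], [0, 0, 0, 0],
     [0, 0, 1, 2], [1, 0, 0, 1], [1, 22, 1, 22], [22, 0, 0, 1]]


def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_vec(M, x):
    """Matrix-vector product modulo q."""
    return [sum(m * xi for m, xi in zip(row, x)) % q for row in M]


def mat_mul(X, Y):
    """Matrix product modulo q."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) % q for col in Yt] for row in X]


# Public matrix P = A*S + E mod q.
P = [[(x + e) % q for x, e in zip(row_as, row_e)]
     for row_as, row_e in zip(mat_mul(A, S), E)]
half = (q + 1) // 2      # q/2 rounded to the nearest integer for odd q, here 12


def encrypt(v, a):
    u = mat_vec(transpose(A), a)
    c = [(x + half * bit) % q for x, bit in zip(mat_vec(transpose(P), a), v)]
    return u, c


def decrypt(u, c):
    d = [(ci - si) % q for ci, si in zip(c, mat_vec(transpose(S), u))]
    # Bit 1 if the coefficient is closer to q/2 than to 0 modulo q.
    bits = [1 if abs(x - half) < min(x, q - x) else 0 for x in d]
    return d, bits


u, c = encrypt([1, 0, 1, 1], [1, 1, 0, 1, 0, 0, 0, 1])
d, bits = decrypt(u, c)
print(u, c, d, bits)
```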
The lattice Λ is generated by the columns of 𝐴 (where the coefficients are lifted to ℤ)
and 23ℤ8 . The first eight (nonzero) columns of the HNF 𝐻 form a basis of Λ. Now we
construct the 9-dimensional lattice Λ′ :
sage: B = H[:,0:8].augment(P[:,0]).stack(vector([0,0,0,0,0,0,0,0,2]))
We choose the embedding factor 𝑀 = 2, but the reader may check that 𝑀 = 1 and
𝑀 = 3 also work in this example. Finally, we compute the LLL-reduced basis.
sage: B.transpose().LLL().transpose()
[ 0 1 -2 0 2 -4 -2 4 -1]
[ 0 0 -2 -2 -3 0 0 1 -3]
[-1 0 0 -1 0 -3 -3 -1 1]
[ 0 -2 -1 -1 1 0 -1 2 0]
[ 0 1 -3 -2 0 0 0 1 3]
[ 1 -3 -1 1 0 -2 2 -2 -2]
[ 1 1 0 -4 2 0 2 0 -3]
[-1 -1 0 -1 1 -1 3 1 0]
[ 2 0 2 0 -2 0 0 2 2]
Since the dimension is small, the first column $(0, 0, -1, 0, 0, 1, 1, -1, 2)^T$ is the shortest nonzero vector of $\Lambda'$. We have successfully found the vector $\begin{pmatrix} e \\ M \end{pmatrix}$, where $e \bmod 23$ is the secret error vector (the first column of $E$; see Example 14.44). ♢
A disadvantage of LWE is that the key size and the number of operations are at least quadratic in the main security parameter 𝑛. Optimizations of Regev’s LWE encryption scheme exist, for example the more compact Lindner-Peikert scheme [LP11].
The key size and the efficiency of the system were also the motivation behind the development of an algebraic variant of LWE called Ring-LWE (R-LWE). We mention R-LWE only briefly and refer to the literature for a detailed discussion and applications to encryption and key exchange (see [LPR13] and subsequent works).
LWE is based on the hardness of finding a vector $s$ given the matrix $A$ and a noisy product $As + e$ over $\mathbb{Z}_q$. Ring-LWE leverages the cyclotomic ring
$$R = \mathbb{Z}_q[x]/(x^n + 1),$$
where $n$ is a power of 2. The matrix $A$ and the secret vectors $s$ and $e$ are replaced by elements in $R$. Since all elements of $R$ can be uniquely represented by polynomials of degree less than $n$, we have $R \cong \mathbb{Z}_q^n$. Any element $a \in R$ generates a principal ideal $(a) = \{a x \mid x \in R\} \subset R$. If $a \ne 0$, then $(a)$ corresponds to an $n$-dimensional $q$-ary ideal lattice.
The R-LWE problem is to find 𝑠 ∈ 𝑅 given 𝑎 ∈ 𝑅 and 𝑏 = 𝑎𝑠 + 𝑒 ∈ 𝑅. The ring
element 𝑒 is ‘small’ and chosen according to an error distribution. Note that 𝑎𝑠 is an
element of the ideal lattice and only the noisy element 𝑏 is given to an adversary. An
encryption scheme can be defined in a similar way to Definition 14.42 above. The main
advantage is that the key length and the number of operations are now linear in 𝑛.
Ring-LWE also has an asymptotic security guarantee: there is a reduction from
a worst-case lattice problem SVP𝛾 to R-LWE, i.e., solving R-LWE is at least as hard as
quantumly solving the SVP𝛾 problem on arbitrary ideal lattices. Note that the reduction
is based on ideal lattices in 𝑅 instead of general 𝑞-ary lattices. Such ideal lattices have
additional structure and it might be possible that efficient attacks will be found that
exploit the algebraic structure and cannot be applied to general lattices. However, such
attacks are not yet known and may not exist.
14.6. Summary
$$b_1 = \begin{pmatrix} -13 \\ 31 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0 \\ 47 \end{pmatrix}.$$
(a) Compute the determinant of Λ and the orthogonality defect of the basis
{𝑏1 , 𝑏2 }.
(b) Apply the LLL-algorithm: First, run the GSO and the size reduction algorithm.
(c) Check the Lovász condition. Swap the basis vectors and again apply the size reduction algorithm.
(d) Check the Lovász condition again and output the LLL-reduced basis. Compute the orthogonality defect of this basis.
(e) Give the shortest nonzero vector of Λ.
8. A lattice $\Lambda$ is generated by the columns of the following matrix:
$$H = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 14 & 18 & 63 \end{pmatrix}.$$
(a) Encrypt 𝑚 = (−2, −3, 1) with the GGH encryption scheme. Choose the noise
vector 𝑟 = (1, −1, −1).
(b) Try to decrypt the ciphertext 𝑐 using the matrix 𝐻. Why does this attempt fail?
(c) Apply the LLL-algorithm to $H$ using SageMath and show that $\Lambda$ has the following short basis:
$$B = \begin{pmatrix} -2 & -1 & 4 \\ -2 & 1 & -3 \\ -1 & 4 & 2 \end{pmatrix}.$$
(f) Compare the result with the original plaintext, for example by printing out
𝑣 − 𝑤.
(g) Print out 1002𝑣 and 𝑐 − 𝑆 𝑇 𝑢 and interpret the result.
(h) Explain why or why not any decryption errors occurred.
(i) Give the sizes of the public key, the private key, the plaintext and the cipher-
Chapter 15
Code-based Cryptography
Error correction codes play an important role when data is sent over noisy channels, for
example over wireless links, or stored on potentially unreliable media. Channel cod-
ing deals with random errors and not with manipulations by adversaries, but integrity
protection is a common objective of channel coding and cryptography. However, codes
also aim to ensure error correction, which goes beyond error detection.
A channel encoder takes an information word as input and generates a codeword
that is transmitted over a channel. The ratio between the lengths of the original data
word and the codeword determines the information rate of the code. Decoding is a
potentially complex task, where received words are transformed into codewords and
the original information is (hopefully) recovered. Codes with good error-correction
capabilities, a high information rate and efficient decoding algorithms are available
and widely used in practice.
For cryptographic applications, one can use very long codes with a secret structure.
In this case, decoding should be hard for an adversary without access to the hidden
In Section 15.1, we give a short introduction to linear codes. Bounds on the pa-
rameters of codes are given in Section 15.2. In Section 15.3, we explain classical Goppa
codes. The McEliece cryptosystem is based on Goppa codes and represents one of
the promising candidates for post-quantum cryptography. We explore the McEliece
scheme and the related Niederreiter cryptosystem in Section 15.4.
There are a number of similarities between lattice-based and code-based cryptog-
raphy. Lattices and codes are linear subspaces of high-dimensional spaces, and for a
given target vector, finding the closest vector in the subspace can be a hard problem.
Both use a secret structure which allows for an efficient solution to the problem. How-
ever, lattices and codes use a different metric.
Two recommended textbooks on coding theory are [Rot06] and [Bla03]. A short
introduction to error correcting codes and the McEliece cryptosystem is given
in [TW06]. More details on code-based cryptography can be found in [OS09].
[Figure: channel coding. 𝑥 → Channel Encoder → 𝑐 → Channel → 𝑦 → Decoder → 𝑥′]
Example 15.1. The most elementary example is repetition codes, where an information
symbol (for example one bit) is repeated 𝑛 times, say 𝑛 = 3:
𝑠1 𝑠2 ⋯ ⟶ 𝑠1 𝑠1 𝑠1 𝑠2 𝑠2 𝑠2 … .
Error detection is straightforward: if the message received does not have this pattern,
an error has occurred. The error detection does not always work since a codeword
could accidentally change into another codeword. Although the probability of this
happening is small, it is not impossible (unless 𝑛 is very large, which makes the code
impractical). Note the difference to hash values or message authentication codes which
(almost) always detect changes.
The repetition code also allows for error correction within certain limits: by choosing the symbol with the highest frequency (maximum likelihood) in a received block, up to $\lfloor \frac{n-1}{2} \rfloor$ errors can be corrected. For example, one error can be corrected with $n = 3$, two errors with $n = 5$, etc.
A major disadvantage of a repetition code is the message extension by a factor
of 𝑛. ♢
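Encoding and majority decoding of a repetition code take only a few lines. The following is a minimal sketch for binary symbols and odd 𝑛 (the function names are chosen for illustration); it encodes three bits with 𝑛 = 3, flips one bit in each block and recovers the original message.

```python
from collections import Counter


def encode(bits, n=3):
    """Repeat every information symbol n times."""
    return [b for b in bits for _ in range(n)]


def decode(received, n=3):
    """Split into blocks of length n and pick the most frequent symbol of each block."""
    blocks = [received[i:i + n] for i in range(0, len(received), n)]
    return [Counter(block).most_common(1)[0][0] for block in blocks]


message = [1, 0, 1]
codeword = encode(message)

# One error per block: still within the correction capability for n = 3.
received = [1, 0, 1, 0, 0, 1, 1, 1, 0]
print(codeword, decode(received))
```

With two errors in the same block of length 3, the majority vote would return the wrong symbol, which matches the stated correction limit.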
Below, we assume that messages and codewords are vectors over a finite field of
characteristic 2, i.e., over a binary field 𝐺𝐹(𝑞), where 𝑞 = 2 or 𝑞 = 2𝑚 . Coding theory
can be studied over arbitrary finite fields, but in practice fields of characteristic 2 are
the most important case.
Definition 15.2. A block code of length $n$ over $GF(q)$ is a subset $C$ of $GF(q)^n$. The elements of $C$ are called codewords. The code size is $M = |C|$ and the dimension of $C$ is $k = \log_q(|C|)$. A linear code over $GF(q)$ is a linear subspace $C \subset GF(q)^n$. The code size of a linear code is $|C| = q^k$ and $C$ is called a linear $[n, k]$ code. The information rate of $C$ is given by $R = \frac{k}{n}$.
Example 15.3. The codewords of a repetition code of length $n$ over $GF(q)$ form the one-dimensional subspace $C \subset GF(q)^n$ generated by the vector $(1, 1, \dots, 1)$. $C$ is a linear $[n, 1]$ code over $GF(q)$ and the information rate is $\frac{1}{n}$. ♢
Nearest-codeword decoding of a received word 𝑦 picks the closest codeword 𝑐 with re-
spect to the Hamming distance. For a binary symmetric channel with error probability
less than one half, maximum-likelihood decoding is equivalent to nearest-codeword
decoding. We refer to textbooks on coding theory for a more detailed discussion and
now use nearest-codeword decoding.
The following theorem provides bounds for error detection and correction:
Theorem 15.8. A code 𝐶 can detect up to 𝑑(𝐶) − 1 errors and correct up to ⌊(𝑑(𝐶) − 1)/2⌋ errors.
Proof. By definition of the minimum distance, adding fewer than 𝑑(𝐶) errors to a codeword
cannot give another codeword and hence up to 𝑑(𝐶) − 1 errors can be detected. Now let 𝑦
be a received word having at most ⌊(𝑑(𝐶) − 1)/2⌋ errors; then a codeword 𝑐 ∈ 𝐶 exists with
𝑑(𝑦, 𝑐) ≤ ⌊(𝑑(𝐶) − 1)/2⌋. Suppose there is another codeword 𝑐′ ∈ 𝐶 such that
𝑑(𝑦, 𝑐′) ≤ ⌊(𝑑(𝐶) − 1)/2⌋. Then the triangle inequality implies
𝑑(𝑐, 𝑐′) ≤ 𝑑(𝑐, 𝑦) + 𝑑(𝑦, 𝑐′) ≤ 2 ⌊(𝑑(𝐶) − 1)/2⌋ ≤ 𝑑(𝐶) − 1,
a contradiction. □
Since linear codes are linear subspaces, they can be represented by the span of a set
of linearly independent vectors. Writing these vectors into the rows of a matrix gives
the generator matrix 𝐺 of a code. If 𝐶 is a linear [𝑛, 𝑘] code, then 𝐺 is a 𝑘 × 𝑛 matrix
over 𝐺𝐹(𝑞). The set of codewords can be computed by 𝑥𝐺, where 𝑥 runs over all vectors
𝑥 ∈ 𝐺𝐹(𝑞)^𝑘.
The generator matrix of a code is not uniquely determined: adding the multiple of
one row to another row or swapping two rows does not change the subspace. Swapping
two columns gives an equivalent code, where only the coordinates are permuted. By
applying elementary row operations (Gauss-Jordan elimination) and column permu-
tations (if necessary), one can find a generator matrix in systematic form:
\[ G = (I_k \mid P) = \left(\begin{array}{ccc|c} 1 & & 0 & \\ & \ddots & & P \\ 0 & & 1 & \end{array}\right). \]
𝐼𝑘 is the 𝑘 × 𝑘 identity matrix and 𝑃 is a 𝑘 × (𝑛 − 𝑘) matrix. The corresponding code is
called systematic and (by Gauss-Jordan elimination) all codes are equivalent to a sys-
tematic code. Systematic codes have the advantage that codewords contain the origi-
nal data as their first 𝑘 symbols. Otherwise, the information word 𝑥 must be recovered
from a codeword 𝑥𝐺 by solving a linear system of equations.
How can we verify that a received vector 𝑦 is a codeword without comparing the
vector to a list of all codewords? Using Gaussian elimination, one can check whether
𝑦 ∈ 𝐶, i.e., whether 𝑦 is a linear combination of the rows of 𝐺. A more direct way is
using a parity-check matrix.
Definition 15.9. Let 𝐶 be a code with generator matrix 𝐺. Then 𝐻 is called a parity-check
matrix of 𝐶 if
𝑦𝐻^𝑇 = 0 ⟺ 𝑦 ∈ 𝐶.
For a received word 𝑦 ∈ 𝐺𝐹(𝑞)^𝑛, the vector 𝑦𝐻^𝑇 is called the syndrome of 𝑦.
Proposition 15.10. Let 𝐺 = (𝐼_𝑘 | 𝑃) be the generator matrix of a systematic [𝑛, 𝑘] code.
Then the (𝑛 − 𝑘) × 𝑛 matrix
\[ H = (-P^T \mid I_{n-k}) = \left(\begin{array}{c|ccc} & 1 & & 0 \\ -P^T & & \ddots & \\ & 0 & & 1 \end{array}\right) \]
is the associated parity-check matrix.
Proof. Let
(𝑣, 𝑤) ∈ 𝐺𝐹(𝑞)^𝑘 × 𝐺𝐹(𝑞)^(𝑛−𝑘)
be a row vector of length 𝑛; then
\[ (v, w) H^T = \left( H \begin{pmatrix} v^T \\ w^T \end{pmatrix} \right)^T = (-P^T v^T + w^T)^T = -vP + w. \]
This is the zero vector if and only if 𝑤 = 𝑣𝑃, which is equivalent to (𝑣, 𝑤) = 𝑣𝐺 and to
(𝑣, 𝑤) being a codeword. □
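Proposition 15.10 translates directly into code. The following sketch in plain Python (the helper name `parity_check_from_systematic` is our own) builds 𝐻 = (𝑃^𝑇 | 𝐼_(𝑛−𝑘)) from a binary systematic generator matrix, where −𝑃^𝑇 = 𝑃^𝑇 over 𝐺𝐹(2). As input we take the generator matrix of the [7, 4] Hamming code from Example 15.11, check 𝐺𝐻^𝑇 = 0 and determine the minimum weight by listing all 16 codewords.

```python
from itertools import product

def parity_check_from_systematic(G, k):
    """Return H = (P^T | I_{n-k}) for a binary systematic G = (I_k | P)."""
    n = len(G[0])
    P = [row[k:] for row in G]
    return [[P[i][j] for i in range(k)] + [int(j == c) for c in range(n - k)]
            for j in range(n - k)]

# Generator matrix of the [7, 4] Hamming code (Example 15.11).
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
k, n = 4, 7
H = parity_check_from_systematic(G, k)

# Every row of G must be orthogonal to every row of H over GF(2).
orthogonal = all(sum(g * h for g, h in zip(g_row, h_row)) % 2 == 0
                 for g_row in G for h_row in H)

# Enumerate all codewords xG and take the minimum nonzero weight.
codewords = [[sum(x[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
             for x in product([0, 1], repeat=k)]
min_weight = min(sum(c) for c in codewords if any(c))
print(H, orthogonal, min_weight)
```

Enumerating all codewords is only feasible for tiny codes; it is used here merely to confirm the parameters of the example.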
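Example 15.11. The [7, 4] Hamming code over 𝐺𝐹(2) can be defined by the following
generator matrix:
\[ G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}. \]
The information rate is 4/7 and the parity-check matrix is
\[ H = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}. \]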
We want to show that the minimum distance is 𝑑 = 3. Firstly, the minimum distance
cannot be greater than 3, since codewords of weight 3 exist (see Proposition 15.5), for
example 𝑐 = (1, 0, 0, 0, 1, 1, 0).
Now assume that 𝑑(𝑣, 𝑤) = 1 for codewords 𝑣 and 𝑤. Then 𝑤𝑡(𝑣 − 𝑤) = 1 and
(𝑣 − 𝑤)𝐻^𝑇 = 0 is a column of 𝐻, so 𝐻 would have a zero column, a contradiction. If
𝑑(𝑣, 𝑤) = 2 then (𝑣 − 𝑤)𝐻^𝑇, a sum of two columns of 𝐻, is zero, and hence two columns
of 𝐻 are linearly dependent. However, this is not the case.
The [7, 4, 3] Hamming code can correct one error. For example, suppose that the
vector 𝑦 = (1, 1, 1, 0, 0, 1, 1) is received. The syndrome is 𝑦𝐻 𝑇 = (0, 1, 1) and hence 𝑦 is
In the above example, one could guess the nearest codeword. A better method is
syndrome decoding.
Definition 15.12. Let 𝐶 ⊂ 𝐺𝐹(𝑞)𝑛 be a linear code and 𝑦 ∈ 𝐺𝐹(𝑞)𝑛 any vector. The
set 𝑦 + 𝐶 = {𝑦 + 𝑐 | 𝑐 ∈ 𝐶} is called a coset of 𝐶. A vector having minimum Hamming
weight in a coset is called a coset leader. ♢
Note that all vectors in a coset 𝑦 + 𝐶 have the same syndrome 𝑦𝐻^𝑇, since 𝑐𝐻^𝑇 = 0
for all codewords 𝑐 ∈ 𝐶 and hence
(𝑦 + 𝑐)𝐻^𝑇 = 𝑦𝐻^𝑇.
Furthermore, all vectors in a given coset can be decoded with the coset leader as their
error vector. In fact, the coset leader represents the least change that transforms a
vector into a codeword.
Proposition 15.13. Let 𝐶 be a linear code and suppose one wants to decode a received
word 𝑦. Let 𝑒 be the coset leader of the coset 𝑦 + 𝐶, i.e., the coset leader with the syndrome
𝑦𝐻 𝑇 . Then the nearest codeword to 𝑦 is 𝑦 − 𝑒 ∈ 𝐶.
Proof. Since 𝑦 and 𝑒 have the same syndrome, the syndrome of 𝑦 − 𝑒 is zero and 𝑦 − 𝑒 is
a codeword. Furthermore, the coset leader 𝑒 has minimum weight among all vectors
𝑒′ with 𝑦 − 𝑒′ ∈ 𝐶, so no codeword is closer to 𝑦 than 𝑦 − 𝑒. □
Example 15.14. We continue Example 15.11. The syndrome is 𝑦𝐻 𝑇 = (0, 1, 1) and
we need to find the coset leader which has that syndrome. In general, one would use a
table that contains the coset leader for each syndrome. In our example, we suspect that
𝑦 only has a single bit error. Hence the coset leader is a unit vector and its syndrome
is a column of the parity-check matrix 𝐻. The syndrome (0, 1, 1) appears in the third
column. The coset leader is 𝑒 = (0, 0, 1, 0, 0, 0, 0) and we decode 𝑦 to the codeword
𝑦 − 𝑒 = (1, 1, 0, 0, 0, 1, 1).
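The table-based approach mentioned in Example 15.14 can be sketched as follows in plain Python. The names `syndrome`, `leaders` and `decode` are our own; the table is filled by listing error vectors in order of increasing weight and keeping the first vector seen for each syndrome, which is then a coset leader.

```python
from itertools import combinations

# Parity-check matrix of the [7, 4] Hamming code (Example 15.11).
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
n = 7

def syndrome(v):
    """Compute v H^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(v, row)) % 2 for row in H)

# Map each syndrome to a coset leader (a vector of minimum weight).
leaders = {}
for weight in range(n + 1):
    for support in combinations(range(n), weight):
        e = [int(i in support) for i in range(n)]
        leaders.setdefault(syndrome(e), e)
    if len(leaders) == 2 ** len(H):   # all syndromes are covered
        break

def decode(y):
    """Subtract the coset leader that has the same syndrome as y."""
    e = leaders[syndrome(y)]
    return [a ^ b for a, b in zip(y, e)]

y = [1, 1, 1, 0, 0, 1, 1]
print(syndrome(y), leaders[syndrome(y)], decode(y))
```

For the Hamming code the table has only 2^3 = 8 entries, and every nonzero syndrome belongs to a unit vector. For codes with many parity checks, such a table quickly becomes too large to store.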
Remark 15.15. It is known that the nearest-codeword problem for random codes is
hard (NP-complete). This suggests that large codes can be used for cryptographic pur-
Theorem 15.16. (Singleton Bound) Let 𝐶 be a block code of length 𝑛, minimum distance
𝑑 and size 𝑀; then
𝑀 ≤ 𝑞^(𝑛−𝑑+1).
In particular, if 𝐶 is a linear [𝑛, 𝑘, 𝑑] code, then
𝑘 ≤ 𝑛 − 𝑑 + 1 ⟺ 𝑑 ≤ 𝑛 − 𝑘 + 1.
For example, the parity code is MDS (see Exercise 1), but the [7, 4, 3] Hamming
code (see Example 15.11) is not MDS. Reed-Solomon codes are a well-known class of
MDS codes, for which we refer to textbooks on coding theory.
One can use the generator matrix or the parity-check matrix to check whether a
code is MDS. We refer to [Rot06] for the following fact:
Proposition 15.18. Let 𝐶 be a linear [𝑛, 𝑘] code with generator matrix 𝐺 and parity-
check matrix 𝐻. Then 𝐶 is MDS if and only if one of the following conditions is satisfied:
(1) Every set of 𝑛 − 𝑘 columns of 𝐻 is linearly independent.
(2) Every set of 𝑘 columns of 𝐺 is linearly independent.
(3) If 𝐺 = (𝐼_𝑘 | 𝑃) is in systematic form, then every square submatrix of 𝑃 is
nonsingular.
Example 15.19. (1) We again consider the [7, 4] Hamming code (see Example 15.11).
We see that the first three columns of 𝐻 are linearly dependent. The last four
columns of 𝐺 are also linearly dependent. One can show that no non-trivial lin-
ear code over 𝐺𝐹(2) can be MDS.
(2) We construct a systematic [8, 4] code over 𝐺𝐹(2^8) with generator matrix 𝐺 =
(𝐼_4 | 𝑃), where
\[ P = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix}. \]
𝑃 defines the MixColumns operation of the AES block cipher (see Section 5.2). All
square submatrices of 𝑃 are nonsingular over 𝐺𝐹(2^8) (see Exercise 6 of Chapter
5), so the code is MDS.
There is also a lower bound for the size of at least one (not necessarily linear) code
with a given length and minimum distance. First, we count the number of vectors in
a ball of radius 𝑟:
Proposition 15.20. Let 𝑟 ∈ ℕ and 𝑣 ∈ 𝐺𝐹(𝑞)^𝑛. Then the number of vectors 𝑤 ∈ 𝐺𝐹(𝑞)^𝑛
such that 𝑑(𝑣, 𝑤) ≤ 𝑟 is
\[ V_q(n, r) = \sum_{i=0}^{r} \binom{n}{i} (q-1)^i. \]
Proof. Let 𝑣 ∈ 𝐺𝐹(𝑞)^𝑛 and 𝑖 ≤ 𝑛. Then the number of vectors 𝑤 such that 𝑑(𝑣, 𝑤) = 𝑖
is (𝑛 choose 𝑖)(𝑞 − 1)^𝑖, because there are (𝑛 choose 𝑖) possible index sets where exactly
𝑖 coordinates of 𝑤 differ from 𝑣, and 𝑞 − 1 possible values for each of these coordinates.
Adding these numbers for 0 ≤ 𝑖 ≤ 𝑟 gives 𝑉_𝑞(𝑛, 𝑟), the number of vectors 𝑤 ∈ 𝐺𝐹(𝑞)^𝑛 with
𝑑(𝑣, 𝑤) ≤ 𝑟. This is the same as the number of vectors in a ball of radius 𝑟 around the center 𝑣. □
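The formula of Proposition 15.20 is easy to evaluate. The sketch below (plain Python, with a function name `ball_volume` of our own choosing) computes 𝑉_𝑞(𝑛, 𝑟) and, for a small binary case, compares it with a direct count of all vectors within Hamming distance 𝑟 of the zero vector.

```python
from itertools import product
from math import comb

def ball_volume(q, n, r):
    """Number of vectors in GF(q)^n within Hamming distance r of a fixed vector."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

# Direct count for q = 2, n = 7, r = 1: vectors of weight at most 1.
brute_force = sum(1 for w in product([0, 1], repeat=7) if sum(w) <= 1)

print(ball_volume(2, 7, 1), brute_force, ball_volume(3, 4, 2))
```

Since the Hamming distance is invariant under translation, counting around the zero vector gives the same number as counting around any other center 𝑣.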
Definition 15.21. Let 𝑑, 𝑛 ∈ ℕ and 𝑑 ≤ 𝑛. Then we define 𝐴𝑞 (𝑛, 𝑑) to be the largest
integer 𝑀 such that a code 𝐶 over 𝐺𝐹(𝑞) of size 𝑀, length 𝑛 and minimum distance
≥ 𝑑 exists. ♢
Proof. Let 𝐶 be a code of length 𝑛, minimum distance of at least 𝑑 and |𝐶| = 𝐴𝑞 (𝑛, 𝑑).
We can assume that, for each vector 𝑣 ∈ 𝐺𝐹(𝑞)𝑛 , there is at least one codeword 𝑐 such
that 𝑑(𝑣, 𝑐) < 𝑑. Otherwise, we could add 𝑣 as a codeword to the code while preserving
the length 𝑛 and the minimum distance 𝑑. Hence the union of balls of radius 𝑑 − 1
having their center at some codeword covers the whole of 𝐺𝐹(𝑞)𝑛 . The number of
vectors in that union is at most 𝐴𝑞 (𝑛, 𝑑)⋅𝑉𝑞 (𝑛, 𝑑−1), which implies the sphere-covering
bound. □
Note that the above argument does not show equality, since vectors can be con-
tained in several balls.
The following theorem gives a bound for the existence of a linear code of dimension
𝑘 and minimum distance 𝑑.
Theorem 15.23. (Gilbert-Varshamov bound) Let 𝑛 ≥ 2, 𝑘 ≤ 𝑛 and 𝑑 ≥ 2 be integers
such that
𝑉_𝑞(𝑛 − 1, 𝑑 − 2) < 𝑞^(𝑛−𝑘).
Then there exists a linear [𝑛, 𝑘] code over 𝐺𝐹(𝑞) with minimum distance ≥ 𝑑.
Proof. The assumption 𝑞^(𝑛−𝑘) > 𝑉_𝑞(𝑛 − 1, 𝑑 − 2) ensures that we can find 𝑛 vectors in
𝐺𝐹(𝑞)^(𝑛−𝑘) such that any 𝑑 − 1 of them are linearly independent (see [Rot06] for more
details). We write these vectors into the columns of a (𝑛 − 𝑘) × 𝑛 parity-check matrix
𝐻. The dimension of the associated code is at least 𝑛 − (𝑛 − 𝑘) = 𝑘. Furthermore,
the distance between two different codewords cannot be less than 𝑑, since otherwise
a codeword 𝑐 ∈ 𝐶 exists with 𝑤𝑡(𝑐) ≤ 𝑑 − 1, and so 𝑑 − 1 columns of 𝐻 are linearly
dependent, a contradiction. □
Figure 15.2. Asymptotic bounds on the information rate 𝑅 = 𝑘/𝑛 against the relative
minimum distance 𝛿 = 𝑑/𝑛 for codes over 𝐺𝐹(2). Singleton and Hamming are upper
bounds and Gilbert-Varshamov is a lower bound.
and hence there is a code with at least 5 codewords and the above parameters. Next,
we consider the Gilbert-Varshamov bound:
\[ V_2(6, 1) = \binom{6}{0} + \binom{6}{1} = 7 < 2^3 = 2^{7-4}. \]
This implies that a linear [7, 4] code with minimum distance ≥ 3 exists. Since the
[7, 4, 3] Hamming code has 16 codewords, the code attains the Gilbert-Varshamov
bound. The Hamming bound is
\[ \frac{2^7}{V_2(7, 1)} = \frac{128}{\binom{7}{0} + \binom{7}{1}} = 16. \]
Therefore, the maximum size of a code of length 7 over 𝐺𝐹(2) with a minimum distance
of at least 3 is 16. We conclude that the Hamming [7, 4, 3] code is perfect. ♢
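The numbers of this example can be reproduced with a short computation. The following plain-Python sketch assumes the bounds in the form used above: the sphere-covering bound 𝐴_𝑞(𝑛, 𝑑) ≥ 𝑞^𝑛 / 𝑉_𝑞(𝑛, 𝑑 − 1), the Gilbert-Varshamov condition 𝑉_𝑞(𝑛 − 1, 𝑑 − 2) < 𝑞^(𝑛−𝑘), the Hamming bound 𝐴_𝑞(𝑛, 𝑑) ≤ 𝑞^𝑛 / 𝑉_𝑞(𝑛, ⌊(𝑑 − 1)/2⌋) and the Singleton bound 𝑀 ≤ 𝑞^(𝑛−𝑑+1). The variable names are our own.

```python
from math import comb

def ball_volume(q, n, r):
    """V_q(n, r): number of vectors in a Hamming ball of radius r."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

q, n, k, d = 2, 7, 4, 3

# Sphere-covering lower bound, rounded up to the next integer.
sphere_covering = -(-q ** n // ball_volume(q, n, d - 1))

# Gilbert-Varshamov condition for a linear [n, k] code with distance >= d.
gilbert_varshamov_holds = ball_volume(q, n - 1, d - 2) < q ** (n - k)

# Hamming (sphere-packing) upper bound and Singleton upper bound.
hamming_bound = q ** n // ball_volume(q, n, (d - 1) // 2)
singleton_bound = q ** (n - d + 1)

print(sphere_covering, gilbert_varshamov_holds, hamming_bound, singleton_bound)
```

The Hamming code meets the Hamming bound (16 codewords) but not the Singleton bound (32), which agrees with the statements that the code is perfect but not MDS.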
We return to the problem at the beginning of this section on good codes of length 𝑛.
The relative minimum distance 𝛿 = 𝑑/𝑛 and the information rate 𝑅 = 𝑘/𝑛 cannot be close
to 1 at the same time. The bounds for 𝑛 → ∞ are shown in Figure 15.2.
15.3. Goppa Codes
that sends (𝑐_1, … , 𝑐_𝑛) to ∑_𝑖 𝑐_𝑖/(𝑥 − 𝑎_𝑖) mod 𝑔. We may represent the syndrome by a
polynomial of degree less than 𝑡. This yields 𝑡 parity-check equations in 𝑛 variables and
𝐶 is therefore an [𝑛, 𝑛 − 𝑡] code over 𝐺𝐹(2^𝑚), assuming that the equations are linearly
independent (see Remark 15.28 below).
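Remark 15.28. One can show ([Rot06] Section 5.1 and Problem 5.11) that 𝐶 is a Generalized
Reed-Solomon (GRS) code with the following parity-check matrix over 𝐺𝐹(2^𝑚):
\[ H = \begin{pmatrix} 1 & \cdots & 1 \\ a_1 & \cdots & a_n \\ \vdots & & \vdots \\ a_1^{t-1} & \cdots & a_n^{t-1} \end{pmatrix} \begin{pmatrix} \frac{1}{g(a_1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{g(a_n)} \end{pmatrix} = \begin{pmatrix} \frac{1}{g(a_1)} & \cdots & \frac{1}{g(a_n)} \\ \frac{a_1}{g(a_1)} & \cdots & \frac{a_n}{g(a_n)} \\ \vdots & & \vdots \\ \frac{a_1^{t-1}}{g(a_1)} & \cdots & \frac{a_n^{t-1}}{g(a_n)} \end{pmatrix}. \]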
The first 𝑡 columns of the first matrix have a Vandermonde form and are thus nonsin-
gular. The second matrix is a nonsingular diagonal matrix. This shows that the rows
of 𝐻 are linearly independent, so that 𝐶 is an [𝑛, 𝑛 − 𝑡] code. Using Proposition 15.18,
one can also show that 𝐶 is MDS, i.e., an [𝑛, 𝑛 − 𝑡, 𝑡 + 1] code. ♢
In principle, GRS codes can be used for encryption. However, several proposals to
use GRS codes turned out to be insecure while Goppa codes are still unbroken.
Definition 15.29. Let 𝐶 be a code over 𝐺𝐹(2^𝑚) as in Definition 15.27. We define the
corresponding classical irreducible binary Goppa code Γ over 𝐺𝐹(2) to be the subfield
code of 𝐶, i.e.,
\[ \Gamma = \Big\{ (c_1, \dots, c_n) \in GF(2)^n \;\Big|\; \sum_{i=1}^{n} \frac{c_i}{x - a_i} \equiv 0 \bmod g \Big\}. \] ♢
Since ℎ(𝑥) = ∏_{𝑖=1}^{𝑛} (𝑥 − 𝑎_𝑖) is invertible modulo 𝑔, one has 𝑐 ∈ Γ if and only if
\[ \sum_{i=1}^{n} c_i \frac{h}{x - a_i} \equiv 0 \bmod g. \]
The parity-check matrix of Γ is essentially the matrix 𝐻 in Remark 15.28, but now each
element is viewed as a column of 𝑚 elements of 𝐺𝐹(2).
Proposition 15.30. Let Γ be a classical irreducible binary Goppa code with the above
parameters. Then Γ is a [𝑛, ≥ 𝑛 − 𝑚𝑡, ≥ 2𝑡 + 1] code.
Note that the degree of 𝑓 is equal to the weight of 𝑐. Let 𝐷(𝑓) be the formal derivative
of 𝑓 (see Definition 4.55). One has
𝐷(𝑓) = 𝐷((𝑥 − 𝑎𝑖 )𝑓𝑖 ) = 𝑓𝑖 + (𝑥 − 𝑎𝑖 )𝐷(𝑓𝑖 )
for 𝑖 ∈ {1, … , 𝑛} with 𝑐𝑖 = 1. A recursive application gives
𝐷(𝑓) = ∑_{𝑖: 𝑐_𝑖=1} 𝑓_𝑖.
Multiplying this equation with the polynomial ℎ/𝑓 yields
\[ \frac{h\,D(f)}{f} = \sum_{i:\, c_i = 1} \frac{h}{x - a_i} = \sum_{i=1}^{n} c_i \frac{h}{x - a_i}. \]
By assumption, we have 𝑐 ∈ Γ, and hence ∑_{𝑖=1}^{𝑛} 𝑐_𝑖 ℎ/(𝑥 − 𝑎_𝑖) ≡ 0 mod 𝑔. Therefore,
ℎ𝐷(𝑓)/𝑓 is a multiple of 𝑔. Since 𝑔(𝑎_𝑖) ≠ 0 for all 𝑖 = 1, … , 𝑛, the polynomials 𝑔 and ℎ/𝑓
are relatively prime, which implies 𝑔 ∣ 𝐷(𝑓). Now, polynomials over 𝐺𝐹(2^𝑚) have a surprising
property (see Exercise 7):
• All elements in 𝐺𝐹(2^𝑚) are squares and
• All polynomials over 𝐺𝐹(2^𝑚) can be written as 𝛼^2 + 𝑥𝛽^2 with 𝛼, 𝛽 ∈ 𝐺𝐹(2^𝑚)[𝑥].
Write 𝑓 = 𝛼^2 + 𝑥𝛽^2 with 𝛼, 𝛽 ∈ 𝐺𝐹(2^𝑚)[𝑥]. By construction, 𝑓 has only simple roots
and is not a square. Hence 𝛽 ≠ 0 and
𝐷(𝑓) = 2𝛼𝐷(𝛼) + 𝛽^2 + 2𝑥𝛽𝐷(𝛽) = 𝛽^2.
Since 𝑔 is irreducible and 𝑔 ∣ 𝐷(𝑓) = 𝛽^2 (see above), we obtain 𝑔 ∣ 𝛽. We conclude that the
degree of 𝛽 is at least 𝑡 and the degree of 𝑓 = 𝛼^2 + 𝑥𝛽^2 is at least 2𝑡 + 1. This proves
𝑤𝑡(𝑐) = deg(𝑓) ≥ 2𝑡 + 1. □
Example 15.31. Let 𝑚 = 4, 𝑡 = 2 and 𝑛 = 16; then 𝑛 − 𝑚𝑡 = 8 and 𝑑 = 2𝑡 + 1 = 5.
We want to construct a [16, 8, 5] Goppa code over 𝐺𝐹(2) and use SageMath for the
computations. The field 𝐺𝐹(16) is given by
𝐺𝐹(2)[𝑧]/(𝑧^4 + 𝑧 + 1)
and its elements 𝑎_1, … , 𝑎_16 ∈ 𝐺𝐹(16) are represented by binary polynomials in the
variable 𝑧 of degree < 4. We choose an irreducible polynomial 𝑔 ∈ 𝐺𝐹(16)[𝑥] of degree 2:
𝑔(𝑥) = 𝑥^2 + 𝑧^2 𝑥 + 𝑧.
The elements
1/(𝑥 − 𝑎_𝑖) mod 𝑔, 𝑎_𝑖 ∈ 𝐺𝐹(16),
can be represented by polynomials in 𝐺𝐹(16)[𝑥] of degree ≤ 1.
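sage: # K = GF(16), the variable x and the quotient ring Rmodg = K[x]/(g) are assumed to be defined
sage: arr = []
sage: for a in K.list():
....:     arr.append(1/Rmodg(x-a))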
The array arr contains all elements 1/(𝑥 − 𝑎) mod 𝑔 with 𝑎 ∈ 𝐺𝐹(16). Their coefficients
with respect to the standard basis {1, 𝑥} define a 2 × 16 parity-check matrix 𝐻_16 of the
code 𝐶 over 𝐺𝐹(16):
sage: H16 = matrix(K, 2, 16)
sage: for i in range(0, 2):
....:     for j in range(0, 16):
....:         H16[i,j] = list(arr[j])[i]
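\[ H = \left(\begin{array}{*{16}{c}}
0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0
\end{array}\right). \]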
𝐻 is the parity-check matrix of a Goppa code Γ. By solving the linear system of equa-
tions 𝑣𝐻 𝑇 = 0 and performing elementary row operations, we get the generator matrix
𝐺 of Γ in systematic form.
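\[ G = \left(\begin{array}{*{16}{c}}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0
\end{array}\right). \]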
Proposition 15.30 shows that Γ is a [16, ≥ 8, ≥ 5] code. Since the rank of 𝐺 is 8 and
codewords of weight 5 exist, Γ is a [16, 8, 5] code and can decode 16-bit words having at
most two errors. There are 2^8 = 256 different syndromes and the coset leader of each
syndrome gives the error vector. ♢
Suppose that all of the parameters of a Goppa code are known and a word 𝑤 =
(𝑤1 , … , 𝑤𝑛 ) ∈ 𝐺𝐹(2)𝑛 with at most 𝑡 errors is received. We want to decode 𝑤 and
proceed similarly as in the proof of Proposition 15.30.
Let 𝑓 = ∏_{𝑖: 𝑤_𝑖=1} (𝑥 − 𝑎_𝑖) and 𝑓_𝑖 = 𝑓/(𝑥 − 𝑎_𝑖). Then 𝐷(𝑓) = ∑_{𝑖: 𝑤_𝑖=1} 𝑓_𝑖 and
\[ Syn(w) = \sum_{i:\, w_i = 1} \frac{1}{x - a_i} = \frac{D(f)}{f} \bmod g. \]
𝛼 = 𝛽𝑅 mod 𝑔.
We lift the residue class 𝑅 to a polynomial over 𝐺𝐹(2^𝑚) of degree < 𝑡. The requested
solution (𝛼, 𝛽) can be found by performing several iterations of the Extended Euclidean
Algorithm in 𝐺𝐹(2^𝑚)[𝑥] on inputs 𝑔 and 𝑅 until deg(𝛼) ≤ 𝑡/2 and deg(𝛽) ≤ (𝑡 − 1)/2. The
algorithm outputs polynomials 𝛼, 𝛽 and 𝑦 in 𝐺𝐹(2^𝑚)[𝑥] such that
𝛼 = 𝛽𝑅 + 𝑦𝑔.
It is necessary to stop the Euclidean Algorithm midway as soon as the degree of a
remainder is ≤ 𝑡/2. Further iterations decrease the degree of 𝛼 (until deg(𝛼) = 0), but
increase the degree of 𝛽 above the limit. Then the error polynomial is
𝜎 = 𝛼^2 + 𝑥𝛽^2.
Example 15.32. Consider the Goppa code from Example 15.31. We encode
(1, 1, 0, 1, 0, 0, 1, 0) and obtain the codeword
𝑐 = (1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0).
Adding two errors at the third and seventh positions yields the word
𝑤 = (1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0).
We re-use our array arr of elements 1/(𝑥 − 𝑎) mod 𝑔 (see Example 15.31).
Note that SageMath writes xbar for the residue class 𝑥 mod 𝑔. We obtain
𝑆𝑦𝑛(𝑤) = (𝑧^2 + 𝑧)𝑥 + 1 mod 𝑔
and let
𝑇 = 1/𝑆𝑦𝑛(𝑤) ≡ 𝑧^3 𝑥 + (𝑧^3 + 𝑧 + 1) mod 𝑔.
Next, we have to compute the square root of 𝑇 + 𝑥 = (𝑧^3 + 1)𝑥 + (𝑧^3 + 𝑧 + 1). Since
𝐺𝐹(16)[𝑥]/(𝑔(𝑥)) is a binary field with 256 elements, one has (see Exercise 6)
𝑅 = √(𝑇 + 𝑥) = (𝑇 + 𝑥)^128 = ((𝑧^3 + 1)𝑥 + (𝑧^3 + 𝑧 + 1))^128 ≡ (𝑧^3 + 𝑧^2)𝑥 + (𝑧^2 + 𝑧 + 1) mod 𝑔.
Note that the root 𝑅 can be computed more efficiently (see Exercise 8).
sage: T = 1/(w*vector(arr))
sage: T - Rmodg(x)
(z^3 + 1)*xbar + z^3 + z + 1
sage: R = (T - Rmodg(x))^128; R
(z^3 + z^2)*xbar + z^2 + z + 1
Finally, we have to find 𝛼, 𝛽 with 𝛼 = 𝛽𝑅 mod 𝑔. In our example, the degrees have
to satisfy deg(𝛼) ≤ 1 and deg(𝛽) = 0. So we simply define 𝛼 as the lift of 𝑅 to 𝐺𝐹(16)[𝑥]
and set 𝛽 = 1. The error polynomial is
𝜎 = 𝛼^2 + 𝑥𝛽^2 = ((𝑧^3 + 𝑧^2)𝑥 + (𝑧^2 + 𝑧 + 1))^2 + 𝑥 = (𝑧^3 + 𝑧^2 + 𝑧 + 1)𝑥^2 + 𝑥 + (𝑧^2 + 𝑧).
The error locator polynomial 𝜎 ∈ 𝐺𝐹(16)[𝑥] is of degree 𝑡 = 2 and its roots are 𝑧^2 and
𝑧^3 + 𝑧^2.
sage: a=(z^3+z^2)*x+z^2+z+1; b=1
sage: sigma=a*a+x*b*b; sigma
(z^3 + z^2 + z + 1)*x^2 + x + z^2 + z
sage: sigma.factor()
(z^3 + z^2 + z + 1) * (x + z^2) * (x + z^3 + z^2)
We fixed an ordering of the elements in 𝐺𝐹(16), and in our example, the roots are the
third and the seventh field elements. Hence the word 𝑤 has errors at positions 3 and 7
and we recover the codeword by adding the error vector 𝑒 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0).
sage: i=0; e=vector(GF(2),16)
sage: for k in list(K):
....:     if sigma.subs(x=k) == 0:
....:         e[i] = 1      # position i belongs to a root of sigma
....:     i = i+1
sage: print(e)
(0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
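The root search can also be reproduced without SageMath. The following sketch in plain Python encodes the elements of 𝐺𝐹(16) = 𝐺𝐹(2)[𝑧]/(𝑧^4 + 𝑧 + 1) as 4-bit integers (bit 𝑖 is the coefficient of 𝑧^𝑖). The helper `gf16_mul` is our own, and we assume that the field elements are ordered as 0, 𝑧, 𝑧^2, … , 𝑧^14, 𝑧^15 = 1, which is consistent with the roots being the third and the seventh element.

```python
MODULUS = 0b10011  # z^4 + z + 1

def gf16_mul(a, b):
    """Multiply two elements of GF(16), encoded as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:      # reduce modulo z^4 + z + 1
            a ^= MODULUS
    return result

# Assumed ordering of the field elements: 0, z, z^2, ..., z^14, z^15 = 1.
elements = [0]
power = 1
for _ in range(15):
    power = gf16_mul(power, 0b0010)   # multiply by z
    elements.append(power)

def sigma(x):
    """Evaluate sigma = (z^3 + z^2 + z + 1) x^2 + x + (z^2 + z) at x."""
    return gf16_mul(0b1111, gf16_mul(x, x)) ^ x ^ 0b0110

# The error positions are the (1-based) indices of the roots of sigma.
error_positions = [i + 1 for i, a in enumerate(elements) if sigma(a) == 0]
error_vector = [int(sigma(a) == 0) for a in elements]
print(error_positions, error_vector)
```

Trying all field elements is the simplest way to find the roots of the error locator polynomial; it costs 𝑛 evaluations of a polynomial of degree 𝑡.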
Since 𝑛 is small, classical syndrome decoding, i.e., without using the Goppa code
structure, would also work in this example. One finds that the coset leader of 𝑤 + Γ
is the error vector 𝑒 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), since 𝑤𝑡(𝑒) = 2 and the
syndrome of both 𝑤 and 𝑒 is
𝑤𝐻 𝑇 = 𝑒𝐻 𝑇 = (0, 0, 0, 1, 0, 1, 1, 0).
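Classical syndrome decoding of 𝑤 can be sketched in plain Python as a search for a vector of weight at most 𝑡 = 2 with the same syndrome as 𝑤. The matrix is the binary parity-check matrix 𝐻 of Example 15.31; the function names `syndrome` and `find_error` are our own.

```python
from itertools import combinations

# Binary parity-check matrix H of the [16, 8, 5] Goppa code (Example 15.31).
H = [[0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
     [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1],
     [1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
     [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1],
     [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0]]

def syndrome(v):
    """Compute v H^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(v, row)) % 2 for row in H)

def find_error(w, max_weight=2):
    """Return a vector of weight <= max_weight with the syndrome of w, or None."""
    target = syndrome(w)
    n = len(w)
    for weight in range(max_weight + 1):
        for support in combinations(range(n), weight):
            e = [int(i in support) for i in range(n)]
            if syndrome(e) == target:
                return e
    return None

w = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
e = find_error(w)
codeword = [a ^ b for a, b in zip(w, e)]
print(syndrome(w), e, codeword)
```

The search visits at most 1 + 16 + 120 = 137 candidate vectors here. The number of candidates grows like (𝑛 choose 𝑡), which is why this approach does not scale to the parameters used in cryptography.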
For decryption, we use the matrices 𝑃, 𝑆 and the Goppa code. First, compute
𝑦_1 = 𝑦𝑃^(−1) = (0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1).
Now we use the Goppa code structure to find the error vector 𝑒𝑃^(−1) and decode 𝑦_1 to
the codeword
𝑐 = (0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1)
(Exercise 10). Finally, one solves the linear system of equations 𝑥𝑆𝐺 = 𝑐 and finds the
plaintext 𝑥 = (0, 1, 1, 1, 0, 0, 1, 1). ♢
𝑥_𝐼 = 𝑦_𝐼 𝐺_𝐼^(−1).
If the vector 𝑦 coincides with the codeword 𝑐 at the index positions 𝐼, i.e., if all errors
are outside 𝐼, then 𝑥_𝐼 = 𝑥 and we have successfully decoded 𝑦. This follows from
basic linear algebra: the linear system of equations 𝑥𝐺_1 = 𝑐 is overdetermined with
𝑘 variables and 𝑛 > 𝑘 equations. Choosing 𝑘 linearly independent equations, i.e., an
invertible 𝑘 × 𝑘 submatrix 𝐺_𝐼 of 𝐺_1, suffices to find 𝑥.
However, it is unlikely that 𝑦 is error-free on a random index set 𝐼: the probability
that a randomly selected index set is not affected by any of the 𝑡 errors is
(𝑛 − 𝑡 choose 𝑘) / (𝑛 choose 𝑘).
Hence the number of attempts necessary to find a suitable information set 𝐼 is expo-
nential in 𝑘. There are many improvements to this basic scheme, but the number of
guesses remains exponential in the number of errors added.
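To get a feeling for the numbers, the success probability of a single guess can be computed directly. The sketch below is in plain Python and assumes the standard counting argument: an index set of size 𝑘 is chosen uniformly at random and is error-free with probability (𝑛 − 𝑡 choose 𝑘)/(𝑛 choose 𝑘). The function name `error_free_probability` is our own; the parameters 𝑛 = 16, 𝑘 = 8 and 𝑡 = 2 are those of the small running example.

```python
from fractions import Fraction
from math import comb

def error_free_probability(n, k, t):
    """Probability that a random k-subset of n positions avoids all t error positions."""
    return Fraction(comb(n - t, k), comb(n, k))

p = error_free_probability(16, 8, 2)
expected_attempts = 1 / p   # mean number of guesses until an error-free set is hit

print(p, float(expected_attempts))
```

For the toy parameters, roughly one guess in four succeeds, so the attack is trivial. The probability shrinks rapidly once 𝑛, 𝑘 and 𝑡 reach cryptographic sizes.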
Example 15.35. We return to Example 15.34 and want to decode the ciphertext 𝑦 with-
out the Goppa code structure, using information-set decoding. We choose the index set
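𝐼 = {4, 5, 6, 7, 8, 9, 10, 11}, extract columns 4 – 11 from 𝐺_1 and invert the matrix:
\[ G_I = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}, \quad
G_I^{-1} = \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 & 1 & 1 & 1 & 0
\end{pmatrix}. \]
Now we hope to compute the plaintext:
𝑥_𝐼 = 𝑦_𝐼 𝐺_𝐼^(−1) = (1, 1, 0, 1, 0, 1, 0, 1) 𝐺_𝐼^(−1) = (0, 1, 1, 1, 0, 0, 1, 1).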
Indeed, we have 𝑥 = 𝑥𝐼 and have successfully computed the plaintext. The attack
works because the error positions (2 and 14) lie outside the information set 𝐼.
However, if an adversary chooses the index set 𝐼 = {1, 2, 3, 4, 5, 6, 7, 8}, the result is
𝑥_𝐼 = (0, 0, 0, 1, 1, 1, 0, 1). They can verify whether 𝑥_𝐼 is correct by computing 𝑥_𝐼 𝐺_1 and
subtracting (i.e., adding modulo 2) the ciphertext 𝑦:
𝑥𝐼 𝐺1 + 𝑦 = (0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1).
Since the weight is 6, the result is not a valid error vector and the attack has failed. ♢
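The successful guess of this example can be checked with a few lines of plain Python. The sketch inverts the 8 × 8 submatrix 𝐺_𝐼 over 𝐺𝐹(2) by Gauss-Jordan elimination and multiplies 𝑦_𝐼 by the inverse. The helper names `gf2_inverse` and `vec_mat` are our own; the matrix and the vector 𝑦_𝐼 are taken from the example.

```python
def gf2_inverse(M):
    """Invert a square binary matrix by Gauss-Jordan elimination over GF(2)."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                A[r] = [a ^ b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % 2 for j in range(len(M[0]))]

# Columns 4 to 11 of the public generator matrix G_1 (Example 15.35).
G_I = [[0, 1, 0, 0, 0, 1, 1, 1],
       [0, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 1, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0, 0, 1],
       [0, 0, 1, 1, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 1, 0, 1],
       [0, 1, 1, 0, 0, 0, 1, 1]]
y_I = [1, 1, 0, 1, 0, 1, 0, 1]

G_I_inv = gf2_inverse(G_I)
x_I = vec_mat(y_I, G_I_inv)
print(x_I)
```

An attacker would repeat this step for randomly chosen index sets and accept a candidate 𝑥_𝐼 as soon as 𝑥_𝐼 𝐺_1 + 𝑦 has weight at most 𝑡.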
There are several improvements of McEliece’s original system. Firstly, the public
generator matrix 𝐺1 = 𝑆𝐺𝑃 can be transformed into the systematic form 𝐺1′ = (𝐼𝑘 | 𝐺2 )
by elementary row operations and (possibly) column permutations. These operations
correspond to matrix multiplications from the left (row operations) and from the right
(column permutations). One obtains 𝐺1′ = 𝑆 ′ 𝐺𝑃 ′ , so the underlying Goppa code with
the generator matrix 𝐺 remains unchanged.
The advantage of the systematic form (𝐼𝑘 | 𝐺2 ) is that the public key can be com-
pressed to the 𝑘 × (𝑛 − 𝑘) submatrix 𝐺2 , since the identity matrix is a constant part of
any matrix of this form.
Now, suppose the public generator matrix is in systematic form (𝐼𝑘 | 𝐺2 ). Then
encryption of a plaintext 𝑥 yields the ciphertext
𝑦 = 𝑥(𝐼_𝑘 | 𝐺_2) + 𝑒 = (𝑥 ‖ 𝑥𝐺_2) + 𝑒.
We observe that the plaintext 𝑥 – only slightly disturbed by at most 𝑡 error bits – is a
part of the ciphertext! Although this is not a problem (or even desirable) for channel
coding, it looks alarming with respect to the security of the McEliece cryptosystem. In
fact, the original system is not EAV-, CPA- or CCA2-secure.
However, it is assumed that the McEliece encryption function is one-way and it is
hard to recover all bits of a uniform random plaintext from a given ciphertext. Further-
more, there are techniques to reduce the use of partial information, and a modification
of the original McEliece cryptosystem can achieve CCA2 security in the random ora-
cle model. The idea of Pointcheval’s generic conversion [Poi00] is to encrypt a uniform
random string 𝑟 of length 𝑘 instead of the plaintext 𝑥. The error vector is obtained
by applying a cryptographic hash function 𝐻 to (𝑥‖𝑟′ ), where 𝑟′ is a uniform random
string of length 𝑘′ :
𝑒 = 𝐻(𝑥‖𝑟′ ), 𝑦1 = 𝑟𝐺1 + 𝑒.
𝑒 needs to be transformed into an error vector of length 𝑛 and weight 𝑡, but we can skip
the details here. Let 𝑅 be a pseudorandom generator that takes a key (or seed) of length
𝑘 as input and outputs a string of length 𝑘 + 𝑘′ . Then set
𝑦2 = (𝑥‖𝑟′ ) ⊕ 𝑅(𝑟).
(𝑥‖𝑟′ ) = 𝑦2 ⊕ 𝑅(𝑟).
Before outputting the plaintext 𝑥, the integrity is checked using 𝑟′ . For this purpose, the
resulting error vector 𝑟𝐺1 + 𝑦1 is compared to the error vector derived from 𝐻(𝑥‖𝑟′ ). If
they do not match, then an error code is returned. Otherwise, the plaintext 𝑥 is output.
An adversary can still obtain large parts of 𝑟 from the ciphertext 𝑦1 . However, they
cannot exploit this information, unless they have the complete key 𝑟, which is very
unlikely. Furthermore, access to a decryption oracle does not help decrypt a given
challenge ciphertext.
Finally, we want to explain the Niederreiter cryptosystem, which uses syndromes
instead of erroneous codewords. An advantage is that the Niederreiter cryptosystem
has shorter ciphertexts than the McEliece cryptosystem.
Definition 15.36. Suppose the Goppa code parameters 𝑛, 𝑚 and 𝑡 are given such that
𝑚 ≥ 3, 𝑛 ≤ 2^𝑚 and 2 ≤ 𝑡 < . Set 𝑘 = 𝑛 − 𝑚𝑡 and 𝑑 = 2𝑡 + 1. The Niederreiter
cryptosystem is defined as follows:
• The plaintext space contains the binary strings of length 𝑛 and weight 𝑡. The
ciphertext space is 𝒞 = {0, 1}^(𝑛−𝑘).
• The secret key is chosen uniformly at random and consists of an invertible
(𝑛 − 𝑘) × (𝑛 − 𝑘) matrix 𝑆 over 𝐺𝐹(2), an 𝑛 × 𝑛 permutation matrix 𝑃 over
𝐺𝐹(2), distinct elements 𝑎_1, … , 𝑎_𝑛 of the field 𝐺𝐹(2^𝑚), an irreducible polynomial
𝑔 ∈ 𝐺𝐹(2^𝑚)[𝑥] of degree 𝑡 and the (𝑛 − 𝑘) × 𝑛 parity-check matrix 𝐻 of the
associated Goppa code Γ. A new key is chosen if the dimension of Γ is not 𝑛 − 𝑚𝑡.
• The public key is the (𝑛 − 𝑘) × 𝑛 matrix 𝐻_1 = 𝑆𝐻𝑃.
• The encryption algorithm takes a plaintext 𝑥 ∈ {0, 1}^𝑛 of weight 𝑡 as input and
outputs the ciphertext
𝑦 = ℰ_𝑝𝑘(𝑥) = 𝐻_1 𝑥^𝑇.
• The decryption algorithm takes a ciphertext 𝑦 ∈ {0, 1}^(𝑛−𝑘) as input and computes
the column vector 𝑠𝑦𝑛 = 𝑆^(−1)𝑦. Find a vector 𝑧 such that 𝐻𝑧^𝑇 = 𝑠𝑦𝑛 and decode 𝑧
using Patterson’s algorithm. This gives the error vector 𝑒 of weight 𝑡 with 𝐻𝑒^𝑇 =
𝑠𝑦𝑛. The plaintext 𝑥 is recovered by
𝑥^𝑇 = 𝑃^(−1) 𝑒^𝑇. ♢
The public key 𝐻1 = 𝑆𝐻𝑃 is transformed into the systematic form (𝐼𝑛−𝑘 |𝐻2 ). This
reduces the size of the public key to 𝑘(𝑛 − 𝑘) bits.
We explain the correctness of the Niederreiter cryptosystem: 𝐻(𝑃𝑥^𝑇) is the syndrome
of 𝑃𝑥^𝑇. Thus the ciphertext 𝑦 = 𝑆𝐻𝑃𝑥^𝑇 is a transformed syndrome. Computing
𝑆^(−1)𝑦 gives the syndrome 𝐻(𝑃𝑥^𝑇). Syndrome decoding recovers the error vector
𝑒^𝑇 = 𝑃𝑥^𝑇, and the plaintext is 𝑥^𝑇 = 𝑃^(−1)𝑒^𝑇.
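The correctness argument can be followed on a toy instance. The sketch below is plain Python and deliberately departs from Definition 15.36 in two ways: it uses the [7, 4] Hamming code with 𝑡 = 1 instead of a Goppa code, and it replaces Patterson's algorithm by a lookup of the syndrome among the columns of 𝐻. The matrices `S`, `S_INV` and the permutation `PERM` are hypothetical values chosen only for illustration.

```python
# Parity-check matrix of the [7, 4] Hamming code (toy stand-in for a Goppa code).
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

# Hypothetical secret key: invertible S with its inverse, and a permutation.
S = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
S_INV = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
PERM = [2, 0, 3, 6, 1, 5, 4]   # (P x^T)[i] = x[PERM[i]]

def mat_vec(M, v):
    """Matrix times column vector over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

def encrypt(x):
    """y = S H P x^T for a plaintext x of weight 1."""
    px = [x[PERM[i]] for i in range(len(x))]
    return mat_vec(S, mat_vec(H, px))

def decrypt(y):
    """Undo S, decode the syndrome to a weight-1 error, undo P."""
    syn = mat_vec(S_INV, y)
    columns = [list(col) for col in zip(*H)]
    j = columns.index(syn)          # the syndrome of a unit vector is a column of H
    x = [0] * len(PERM)
    x[PERM[j]] = 1
    return x

x = [0, 0, 0, 0, 1, 0, 0]
y = encrypt(x)
print(y, decrypt(y))
```

The ciphertext has only 𝑛 − 𝑘 = 3 bits, whereas a McEliece ciphertext for the same code would have 𝑛 = 7 bits, which illustrates the shorter ciphertexts mentioned above. The toy parameters offer no security.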
Example 15.37. Consider the Goppa code and the matrices 𝐻, 𝑆 and 𝑃 in Examples
15.31 and 15.34. Suppose we want to encrypt the plaintext
𝑥 = (0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
of weight 𝑡 = 2 with the Niederreiter cryptosystem. The ciphertext is
𝑦 = 𝑆𝐻𝑃𝑥^𝑇 = (0, 0, 1, 0, 1, 1, 1, 0)^𝑇.
For decryption, we compute the syndrome
𝑠𝑦𝑛 = 𝑆^(−1)𝑦 = (0, 0, 1, 1, 1, 1, 0, 1)^𝑇.
The linear system of equations 𝐻𝑧^𝑇 = 𝑠𝑦𝑛 is underdetermined. The affine solution
space is of dimension 8 and one of the solutions is the vector
𝑧 = (1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0).
Syndrome decoding (see Exercise 12) yields the error vector
𝑒^𝑇 = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)^𝑇.
Finally, we recover the plaintext
𝑥^𝑇 = 𝑃^(−1)𝑒^𝑇 = (0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^𝑇. ♢
Like the McEliece system, the plain Niederreiter cryptosystem is not CCA2 secure.
This can be resolved by applying a CCA2-secure conversion, for example Pointcheval’s
generic conversion [Poi00] (see above), the Fujisaki-Okamoto transform [FO13] or the
Kobara-Imai conversion [NC12].
Remark 15.38. The McEliece and the Niederreiter cryptosystems look different, but
they are based on the same decoding problem. Let 𝐺 be the generator matrix and 𝐻 the
parity-check matrix of a Goppa code. Let 𝐺_1 = 𝑆𝐺𝑃 be the public key of the McEliece
system and let 𝑆′ be any invertible (𝑛 − 𝑘) × (𝑛 − 𝑘) matrix. Then 𝐻_1 = 𝑆′𝐻(𝑃^(−1))^𝑇
satisfies 𝐺_1𝐻_1^𝑇 = 0, and 𝐻_1 is the parity-check matrix and the public key of the Niederreiter
system of the same code. The McEliece ciphertext is an erroneous codeword
and the Niederreiter ciphertext is a syndrome. Finding the nearest codeword is essen-
tially the same as finding the coset leader of a syndrome. Therefore, the McEliece and
the Niederreiter cryptosystems offer the same level of security. An adversary who can
break one of them is also able to break the other. ♢
15.5. Summary
• Codes are used to detect and to correct errors when data is sent over noisy chan-
nels or stored on potentially unreliable media.
• Information words are encoded to codewords. Decoding of an erroneous code-
word means finding the error vector, restoring the codeword and recovering
the original data.
• There are bounds on the maximum number of codewords when the length and
minimum distance of the code are given.
• Syndrome decoding works in many practical applications, but decoding is a
hard problem for random codes of large dimension.
• Goppa codes have an efficient decoding algorithm that also works for large
• The McEliece cryptosystem is based on a code with a secret Goppa code structure.
The ciphertexts are erroneous codewords and the plaintext is recovered
by decoding. The Niederreiter cryptosystem is similar to the McEliece scheme,
but uses a parity-check matrix and syndromes for encryption.
• Code-based encryption with appropriate parameters is thought to be secure
against attacks by quantum computers.
Exercises
1. The codewords of the parity code 𝐶 of length 𝑛 over 𝐺𝐹(𝑞) are the words (𝑥1 , … ,
𝑥𝑛−1 , 𝑥𝑛 ) that satisfy 𝑥𝑛 = 𝑥1 + ⋯ + 𝑥𝑛−1 . Give the generator and the parity-check
matrix of 𝐶. Show that 𝐶 is a linear [𝑛, 𝑛 − 1, 2] MDS code.
2. Find the relationship between 𝑞-ary lattices and linear codes over 𝐺𝐹(𝑞) for a prime
3. Let 𝐶 be the linear [8, 4] code with the following generator matrix over 𝐺𝐹(2):
\[ G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}. \]
(a) Show that 𝐺 is also the parity-check matrix of 𝐶. Such a code is called self-
(b) Show that the minimum distance of 𝐶 is 𝑑 = 4.
Hint: It is sufficient to show that every set of three columns of the parity-
check matrix is linearly independent.
(c) Decode the word received 𝑦 = (0, 1, 0, 0, 1, 1, 0, 0) using syndrome decoding.
4. Give the sphere-covering bound, the Gilbert-Varshamov bound and the Hamming
bound for 𝑛 = 16 and 𝑑 = 5. Show that the Goppa code in Example 15.31 has
maximal dimension.
5. Why is the formula (𝑎 + 𝑏)^2 = 𝑎^2 + 𝑏^2 (teacher’s nightmare) true over binary fields
𝐺𝐹(2^𝑚)? What can be said about (𝑎 + 𝑏)^𝑛 if 𝑛 is a power of 2?
6. Why does 𝑎^(2^𝑚) = 𝑎 hold for every 𝑎 ∈ 𝐺𝐹(2^𝑚)? Prove that
√𝑎 = 𝑎^(2^(𝑚−1)).
√𝑓 = 𝛼 + √𝑥 𝛽 mod 𝑔(𝑥).
Now suppose that 𝑔 = 𝑔_1^2 + 𝑥𝑔_2^2 with 𝑔_1, 𝑔_2 ∈ 𝐺𝐹(2^𝑚)[𝑥] and 1 = 𝑣_1𝑔_1 + 𝑣_2𝑔_2 with
𝑣_1, 𝑣_2 ∈ 𝐺𝐹(2^𝑚)[𝑥]. Show Huber’s formulas [Hub96], [Hub03]:
√𝑥 = 𝑔_1 𝑔_2^(−1) mod 𝑔(𝑥),
