A Course in Cryptography

Pure and Applied
AMSTEXT
40
Sally
The UNDERGRADUATE TEXTS 40
SERIES
This book provides a compact course in modern cryptography. The mathemat-

ical foundations in algebra, number theory and probability are presented with a
focus on their cryptographic applications. The text provides rigorous definitions
and follows the provable security approach. The most relevant cryptographic
schemes are covered, including block ciphers, stream ciphers, hash functions,
message authentication codes, public-key encryption, key establishment, digital
signatures and elliptic curves. The current developments in post-quantum
cryptography are also explored, with separate chapters on quantum computing,
A Course in
A Course in Cryptography
lattice-based and code-based cryptosystems.
Many examples, figures and exercises, as well as SageMath (Python) computer
code, help the reader to understand the concepts and applications of modern
Cryptography
cryptography. A special focus is on algebraic structures, which are used in many
cryptographic constructions and also in post-quantum systems. The essential
mathematics and the modern approach to cryptography and security prepare
the reader for more advanced studies.
The text requires only a first-year course in mathematics (calculus and linear
algebra) and is also accessible to computer scientists and engineers. This book is Heiko Knospe
suitable as a textbook for undergraduate and graduate courses in cryptography
as well as for self-study.
For additional information

Knospe
and updates on this book, visit
www.ams.org/bookpages/amstext-40
AMSTEXT/40
Sally
The This series was founded by the highly respected
SERIES
mathematician and educator, Paul J. Sally, Jr.
2-color cover: PMS 432 (Gray) and PMS 300 (Blue) Not yet adjusted page # 336 pages • Backspace 1 3/8" • Trim Size: 7" x 10"
A Course in
Cryptography
Heiko Knospe
Pure and Applied
Sally
The UNDERGRADUATE TEXTS • 40
SERIES
A Course in
Cryptography
Heiko Knospe
EDITORIAL COMMITTEE
Gerald B. Folland (Chair) Steven J. Miller
Jamie Pommersheim Serge Tabachnikov
2010 Mathematics Subject Classification. Primary 94A60;

Secondary 68P25, 81P94, 11T71.
For additional information and updates on this book, visit

Library of Congress Cataloging-in-Publication Data

Names: Knospe, Heiko, 1966– author.
Title: A course in cryptography / Heiko Knospe.
Description: Providence, Rhode Island: American Mathematical Society, [2019] | Series: Pure and
applied undergraduate texts; volume 40 | Includes bibliographical references and index.
Identifiers: LCCN 2019011732 | ISBN 9781470450557 (alk. paper)
Subjects: LCSH: Cryptography–Textbooks. | Coding theory–Textbooks. | Ciphers–Textbooks. |
AMS: Information and communication, circuits – Communication, information – Cryptogra-
phy. msc | Computer science – Theory of data – Data encryption. msc | Quantum theory –
Axiomatics, foundations, philosophy – Quantum cryptography. msc | Number theory – Finite
fields and commutative rings (number-theoretic aspects) – Algebraic coding theory; cryptog-
raphy. msc
Classification: LCC QA268 .K5827 2019 | DDC 005.8/24–dc23
LC record available at https://lccn.loc.gov/2019011732
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.

c 2019 by the author. All rights reserved.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
10 9 8 7 6 5 4 3 2 1 24 23 22 21 20 19
To my parents, Anne Anita and Karlheinz
Contents
Preface xiii
Getting Started with SageMath 1

0.1. Installation 1
0.2. SageMath Command Line 2
0.3. Browser Notebooks 2
0.4. Computations with SageMath 3
Chapter 1. Fundamentals 7
1.1. Sets, Relations and Functions 7
1.2. Combinatorics 14
1.3. Computational Complexity 16
1.4. Discrete Probability 19
1.5. Random Numbers 23
1.6. Summary 27
Exercises 28
Chapter 2. Encryption Schemes and Definitions of Security 31

2.1. Encryption Schemes 32
2.2. Perfect Secrecy 35
2.3. Computational Security 36
2.4. Indistinguishable Encryptions 37
2.5. Eavesdropping Attacks 39
vii
viii Contents
2.6. Chosen Plaintext Attacks 41

2.7. Chosen Ciphertext Attacks 43
2.8. Pseudorandom Generators 45
2.9. Pseudorandom Functions 48
2.10. Block Ciphers and Operation Modes 52
2.11. Summary 58
Exercises 58
Chapter 3. Elementary Number Theory 61

3.1. Integers 61
3.2. Congruences 65
3.3. Modular Exponentiation 67
3.4. Summary 69
Exercises 69
Chapter 4. Algebraic Structures 73

4.1. Groups 73
4.2. Rings and Fields 81
4.3. Finite Fields 82
4.4. Linear and Affine Maps 92
4.5. Summary 97
Exercises 97
Chapter 5. Block Ciphers 101

5.1. Constructions of Block Ciphers 101
5.2. Advanced Encryption Standard 104
5.3. Summary 111
Exercises 111
Chapter 6. Stream Ciphers 115

6.1. Definition of Stream Ciphers 115
6.2. Linear Feedback Shift Registers 119
6.3. RC4 128
6.4. Salsa20 and ChaCha20 130
6.5. Summary 135
Exercises 135
Contents ix
Chapter 7. Hash Functions 137

7.1. Definitions and Security Requirements 137
7.2. Applications of Hash Functions 139
7.3. Merkle-Damgård Construction 140
7.4. SHA-1 142
7.5. SHA-2 145
7.6. SHA-3 146
7.7. Summary 149
Exercises 149
Chapter 8. Message Authentication Codes 151

8.2. CBC MAC 154
8.3. HMAC 156
8.4. Authenticated Encryption 157
8.5. Summary 161
Exercises 161
Chapter 9. Public-Key Encryption and the RSA Cryptosystem 163

9.1. Public-Key Cryptosystems 163
9.2. Plain RSA 166
9.3. RSA Security 168
9.4. Generation of Primes 170
9.5. Efficiency of RSA 173
9.6. Padded RSA 175
9.7. Factoring 177
9.8. Summary 182
Exercises 182
Chapter 10. Key Establishment 185

10.1. Key Distribution 186
10.2. Key Exchange Protocols 186
10.3. Diffie-Hellman Key Exchange 188
10.4. Diffie-Hellman using Subgroups of ℤ∗𝑝 190
10.5. Discrete Logarithm 192
10.6. Key Encapsulation 194
10.7. Hybrid Encryption 197
x Contents
10.8. Summary 200

Exercises 200
Chapter 11. Digital Signatures 203

11.2. Plain RSA Signature 205
11.3. Probabilistic Signature Scheme 206
11.4. Summary 210
Exercises 210
Chapter 12. Elliptic Curve Cryptography 213

12.1. Weierstrass Equations and Elliptic Curves 213
12.2. Elliptic Curve Diffie-Hellman 222
12.3. Efficiency and Security of Elliptic Curve Cryptography 223
12.4. Elliptic Curve Factoring Method 224
12.5. Summary 227
Exercises 227
Chapter 13. Quantum Computing 229

13.1. Quantum Bits 230
13.2. Multiple Qubit Systems 234
13.3. Quantum Algorithms 235
13.4. Quantum Fourier Transform 241
13.5. Shor’s Factoring Algorithm 242
13.6. Quantum Key Distribution 248
13.7. Summary 251
Exercises 251
Chapter 14. Lattice-based Cryptography 253

14.1. Lattices 254
14.2. Lattice Algorithms 260
14.3. GGH Cryptosystem 269
14.4. NTRU 271
14.5. Learning with Errors 276
14.6. Summary 282
Exercises 282
Contents xi
Chapter 15. Code-based Cryptography 285

15.1. Linear Codes 286
15.2. Bounds on Codes 290
15.3. Goppa Codes 295
15.4. McEliece Cryptosystem 303
15.5. Summary 309
Exercises 310
Bibliography 313
Index 319
Preface
Why is cryptography interesting? Firstly, cryptography is a classical subject with a fas-

cinating history. Encryption techniques have been used since ancient times, but the
protection against exposure was sometimes dubious and many ciphers were broken.
The design of new ciphers and the capability to analyze and break them co-evolved
over time. Since then, the techniques have been adapted to progress in cryptanalysis
and to the computing power available. Modern cryptography goes beyond confiden-
tiality, also addressing aspects such as data integrity, authentication, non-repudiation
and other security objectives. The subject now includes all mathematical techniques
relating to information security.
Secondly, cryptography is closely connected to several fields of mathematics and
computer science, providing interesting applications for many theoretical results and
stimulating mathematical research. Cryptography, which we use as an umbrella term
for the field including cryptanalysis, is linked to aspects of discrete mathematics, num-
ber theory, algebra, probability theory, statistics, information and coding theory.
Thirdly, cryptography has become a key technology that is used ubiquitously in
computer systems from small devices to large servers and networks. The multitude of
security threats is driving the trend to protect data whenever possible, be it for stor-
age or during transmission over networks. The reader should be warned, however,
that real-world security is a complex process in which cryptography contributes “only”
the primitives, algorithms and schemes. In practice, attacks often exploit protocol or
implementation flaws, weak passwords or negligent users. Furthermore, the security
guarantees offered by cryptography cannot be unconditional. The security provided
depends on the power of adversaries, their computing resources and the time available
for an attack, as well as on underlying computational problems which are believed to
be hard.
xiii
xiv Preface
The aim of this book is to explain the current cryptographic primitives and schemes
and to outline the essential mathematics required to understand the building blocks
and assess their security. We cover the widespread schemes, but we also want to ad-
dress some of the recent developments in post-quantum cryptography. The mathemat-
ical and, in particular, algebraic and number-theoretical foundations of cryptography
are explained in detail. The mathematical theory is presented with a focus on crypto-
graphic applications and we do not strive for maximal generality. We look at a selection
of cryptographic algorithms according to their current and supposed future relevance,
while leaving out several historic schemes. Since cryptography is a very active field,
some uncertainty regarding future developments will of course remain.
Why write yet another textbook on cryptography? We hope to convince potential
readers by listing some of the unique features of this book:
• The fundamentals of cryptography are presented with rigorous definitions while
being accessible to undergraduate students in science, engineering and mathe-
matics;
• Formal definitions of security as used in the modern literature on cryptography;
• Focus on widely used methods and on prospective cryptographic schemes;
• Introduction to quantum computing and post-quantum cryptography;
• Numerical examples and SageMath (Python) code.
Cryptography can easily be underestimated by mathematicians. Several textbooks
contain excellent descriptions of the mathematical theory, but fall short of explaining
how to use these algorithms in practice. In fact, the main purpose of cryptography is
to achieve security objectives such as confidentiality and integrity in the presence of
powerful adversaries. Well-known schoolbook algorithms like RSA can be insecure
without adaptations, for example, by incorporating random data.
This book follows the provable security approach which is adopted in the modern
literature. Well-defined experiments (games) are used in which the success probability
of potential attackers determines the security. Secure schemes have the property that
an adversary with restricted resources can do little better than randomly guess the se-
cret information. Using this approach, the security is reduced to standard assumptions
that are generally believed to be true. In this book, we give exact security definitions
and some proofs, but refer to the literature for more advanced proofs and techniques,
for example, using the sequence of games approach.
We find that examples are very helpful and include computations using the open
source mathematics software SageMath (aka Sage) [Sag18]. SageMath contains many
algebraic and number theoretic functions which can be easily used and extended. Al-
though the software might be better known among mathematicians than scientists
and engineers, it is easily accessible and very suitable for cryptographic computations.
SageMath is based on Python and contains other open source software as, for example,
Singular, Maxima, PARI, GAP, NumPy, SciPy, SymPy and R. In recent years, Python
Preface xv
has gained immense popularity among scientists. One of its advantages is that results
can be obtained quickly without much programming overhead. In this book, we opt
for SageMath instead of plain Python since SageMath has much better support for alge-
braic computations, which are often needed in modern cryptography. SageMath also
has a convenient user interface and supports the popular Jupyter browser notebooks.
Numerical examples can be used to help understand cryptographic constructions
and their underlying theory. Toy examples, in which the numbers and bit-lengths are
too small for any real-world security, can still be useful in this respect. The reader is
encouraged to perform computations and to write their own SageMath functions. We
also provide exercises with both theoretical and numerical problems.
The book should be accessible to mathematics, science or engineering students af-
ter completing a first year’s undergraduate course in mathematics (calculus and linear
algebra). The material originates from several courses on cryptography for computer
scientists and communication engineers which the author has taught. Since the previ-
ous knowledge can be quite heterogeneous, we decided to include several elementary
topics. In the author’s teaching experience, abstract algebra as well as linear algebra
over general fields deserves special attention. Linear maps over finite fields play an
important role in many cryptographic constructions. This book should be largely self-
contained and requires no previous knowledge of discrete mathematics, algebra, num-
ber theory or cryptography. We do not strive for greatest generality and frequently refer
to more specialized textbooks or articles.
Cryptography can be taught at different levels and to different audiences. This
book can be used in bachelor’s and master’s courses, as well as by practitioners, and
is suitable for a general audience wanting to understand the fundamentals of modern
cryptography. Many mathematics and computer science students may already have the
necessary background in discrete mathematics, elementary number theory and prob-
ability and can therefore skip Chapters 1 and 3. Chapter 4 provides the necessary al-
gebraic constructions and is recommended to all readers without solid knowledge of
abstract algebra. From my teaching experience, algebra can be a major stumbling block
and should not be underestimated. Chapters 1, 3 and 4 thus provide the mathematical
background of cryptography.
We decided to begin with the core cryptographic content as early as possible, so
Chapter 2 deals with encryption schemes and the modern definitions of security. This
chapter requires only basic discrete mathematics, complexity and probability theory
and is recommended for most readers, even if they have some prior knowledge of cryp-
tography. Understanding the provable security approach is crucial for the subsequent
chapters of this book. Chapter 5 deals with block ciphers and AES in particular, which
is a crucial part of every modern course on cryptography. Chapter 6 explores stream
ciphers, which form a natural complement, but it is also possible to omit this chapter
if you are short on time. We have already mentioned that modern cryptography goes
beyond encryption. Integrity protection is another major objective, and hash functions
xvi Preface
and message authentication codes play a crucial role in this. These topics are addressed
in Chapters 7 and 8. Chapters 9, 10 and 11, which are on public-key encryption, key
establishment and signatures, introduce the fundamentals of public-key cryptography.
We explain RSA and Diffie-Hellman in particular and discuss their security, which is
based on hard number-theoretic problems.
We therefore think that Chapters 2, 5, 7, 8, 9, 10 and 11, along with the neces-
sary mathematical preparations (Chapters 1, 3 and 4), should be covered in every first
course on cryptography. A one-semester bachelor’s module might end after Chapter
11, but whenever possible, we recommend including Chapter 12 on elliptic curve cryp-
tography. This has been the topic of intensive research in the last few decades but has
now become part of well-established cryptography and is implemented by every In-
ternet browser, for example. We believe the basics of elliptic curves are accessible to
readers after the preparatory work in Chapters 3 and 4. There are, however, more ad-
vanced topics in elliptic curves that are not treated here.
Chapters 13, 14 and 15 provide an introduction to the new field of post-quantum
cryptography. In Chapter 13, we explore the basics of quantum computing and explain
why quantum computers can break classic public-key schemes like RSA. Chapters 14
and 15 deal with two major types of post-quantum systems that are based on lattices
and error-correcting codes, respectively. We focus on the foundations and several se-
lected encryption schemes. Note that there are other post-quantum systems, for exam-
ple, cryptosystems from isogenies of elliptic curves or multivariate-quadratic-equations
signatures, which are not covered in this book. Chapters 13–15 are more challenging
with respect to the level of calculus and abstract algebra. However, we spend some
time on examples (many of them using SageMath) and we hope that the content of
these three chapters is accessible for master’s or advanced bachelor’s students. We ex-
pect that quantum computing and post-quantum schemes will become increasingly
important in the future.
I would be happy to receive feedback and suggestions for improvement. Please
email your comments to heiko.knospe@th-koeln.de. Updates and additional mate-
rial, for example, solutions to selected exercises and SageMath code, are available on
the following website: https://github.com/cryptobook.
Finally, I would like to thank my colleagues and my students for their valuable
feedback on my cryptography course and on earlier versions of the manuscript.
Preface xvii
Chapter 1
Chapter 3 Chapter 2
Chapter 4
Chapter 5
Chapter 6 Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 14 Chapter 13 Chapter 15
Dependence relationship between the chapters.

Getting Started with
SageMath
SageMath (aka Sage) is an open source mathematics software which is ideally suited for
cryptography. SageMath supports a lot of algebraic constructions used in cryptography,
and results can be achieved with relatively few lines of code. Many experts in the field
use SageMath for their research and for prototyping before finally switching to faster
programming languages like C and C++.
This book contains a large number of examples and exercises which use SageMath,
and readers are encouraged to do their own experiments. The aim of this chapter is to
give a brief introduction to SageMath. Further information and links to online docu-
mentation can be found on http://www.sagemath.org/help.html. We also recom-
mend the book [Bar15].
0.1. Installation
The installation of SageMath is easy using pre-built binaries which are available from
http://www.sagemath.org/download.html for several types of CPU and operating
systems including Linux (Ubuntu, Debian), macOS and Microsoft Windows. A Docker
container is also available. Since Sage 8.0, Windows users can use a binary installer
(instead of a virtual machine) and run either the SageMath shell or the browser-based
interface by clicking on the respective icon. macOS users can install the app.dmg pack-
age, move the software to the Applications folder and start SageMath either from the
command line or by clicking on the app icon. On Linux, the downloaded package has
to be uncompressed (using bunzip2 and tar). The SageMath directory contains an ex-
ecutable file that can be started from the command line. The directory can be moved if
1
2 Getting Started with SageMath
desired. It is advisable to add the directory to the local PATH variable or otherwise spec-
ify the full path when sage is executed. The sage executable starts a shell and sage
-notebook=jupyter runs the Jupyter notebook server and opens a browser window.
The packages and the installed software require several gigabytes of free disk space.
The SageMath distribution includes a long list of open source software and it is not
usually necessary to install additional packages.
0.2. SageMath Command Line

SageMath has several user interfaces: the SageMath command line shell and browser-
based notebook interfaces. The sage command (on macOS and Linux) and the Sage-
Math shell icon (for Windows) starts the command line interface with access to all of
its capabilities, including graphics.
sage$ /Applications/SageMath-8.5.app/sage
SageMath version 8.5, Release Date: 2018-12-22
Using Python 2.7.15. Type "help()" for help.
sage:
SageMath can be used as a calculator which supports many algebraic and symbolic
computations. By default, SageMath works over the integers ℤ, but the many other
base rings (for example polynomial rings) and fields (in particular ℚ, ℝ, ℂ and finite
fields) are also supported.
SageMath can factorize relatively large numbers:
sage: factor (2^128 -1)
3 * 5 * 17 * 257 * 641 * 65537 * 274177 * 6700417 * 67280421310721
In the following example, we define a 3 × 3 matrix over the integers and compute
the determinant and the inverse matrix.
sage: A= matrix ([[1 ,2 ,3] ,[ -1 ,3 ,4] ,[2 ,2 ,3]])
sage: det(A)
-1
sage: 1/A
[ -1 0 1]
[-11 3 7]
[ 8 -2 -5]
0.3. Browser Notebooks

Browser-based notebooks are a convenient user interface and are recommended for
working on the examples and exercises in this book. The command
sage -notebook=jupyter
(or clicking the SageMath icon) starts a local notebook server and opens a browser
window. In the following, we use Jupyter notebooks which are very popular in the
Python community. Alternatively, a legacy SageMath notebook server can be started
with sage -notebook=sagenb.
Figure 0.1. Creating a new browser notebook.
A new notebook is created by clicking on the ‘New’ button in the upper right corner
and choosing SageMath (see Figure 0.1). The commands and the code are written in
input cells and a cell is evaluated by pressing ‘Shift + Enter’ or by clicking on the play
symbol in the toolbar. The ‘Enter’ key does not interpret the code, but rather creates
a new line. It is a good practice not to write too much into a single cell, although a
cell can contain several lines as well as multiple commands in one line (separated by a
semicolon). Do not forget to rename an untitled notebook and to save (and checkpoint)
your work via the ‘File’ menu.
The code shown in Figure 0.2 implements a loop. For each 1 ≤ 𝑛 < 20 the factorial
𝑛! is printed out using the Python format specification. You may be unfamiliar with
the way Python structures the code, since it differs from languages like C and Java. The
colon in the first line and the indentation of the second line are very important.
0.4. Computations with SageMath

SageMath can perform all kinds of computations, but its symbolic and algebraic capa-
bilities are particularly useful for cryptography. SageMath can work with variables: 𝑥
is predefined and other variables can be declared by var('...'). Some constructions,
for example PolynomialRing, name the variable used. The following example shows
a computation in the polynomial ring 𝑆 = ℤ[𝑥].
sage: S.<x> = PolynomialRing (ZZ)
sage: (1+x)^10
x^10 + 10*x^9 + 45*x^8 + 120*x^7 + 210*x^6 + 252*x^5 +
210*x^4 + 120*x^3 + 45*x^2 + 10*x + 1
4 Getting Started with SageMath
Figure 0.2. SageMath code to print out the factorials.
Now we perform the same computation in the polynomial ring 𝑅 = 𝐺𝐹(2)[𝑡] over
the binary field 𝐺𝐹(2). The reader is advised to refer to Chapters 3 and 1 for the math-
ematical background of the following examples.
sage: R.<t> = PolynomialRing (GF (2))
sage: (1+t)^10
t^10 + t^8 + t^2 + 1
We can also construct the residue class ring

𝐹 = 𝑅/(𝑡8 + 𝑡4 + 𝑡3 + 𝑡 + 1) = 𝐺𝐹(2)[𝑡]/(𝑡8 + 𝑡4 + 𝑡3 + 𝑡 + 1)
which defines the field 𝐺𝐹(256) with 28 = 256 elements. The variable 𝑎 represents the
residue class of 𝑡 modulo 𝑡8 + 𝑡4 + 𝑡3 + 𝑡 + 1. All nonzero elements in 𝐺𝐹(256) are
invertible and we compute the multiplicative inverse of 𝑎 + 1.
sage: F.<a>=R. quotient_ring (t^8+t^4+t^3+t+1)
sage: 1/(a+1)
a^7 + a^6 + a^5 + a^4 + a^2 + a
We verify the result by multiplying the residue classes. Note the difference to the mul-
tiplication in 𝑅 = 𝐺𝐹(2)[𝑡].
sage: (a+1)*( a^7 + a^6 + a^5 + a^4 + a^2 + a)
1
sage: (t+1)*( t^7 + t^6 + t^5 + t^4 + t^2 + t)
t^8 + t^4 + t^3 + t
We define a 4 × 4 matrix over 𝐺𝐹(256) and let SageMath compute the inverse:
sage: M= matrix (F ,[[a,a+1 ,1 ,1] ,[1 ,a,a+1 ,1] ,[1 ,1 ,a,a+1],
[a+1,1,1,a]])
sage: 1/M
[a^3 + a^2 + a a^3 + a + 1 a^3 + a^2 + 1 a^3 + 1]
[ a^3 + 1 a^3 + a^2 + a a^3 + a + 1 a^3 + a^2 + 1]
[a^3 + a^2 + 1 a^3 + 1 a^3 + a^2 + a a^3 + a + 1]
[ a^3 + a + 1 a^3 + a^2 + 1 a^3 + 1 a^3 + a^2 + a]
In Chapter 5, we will see that the field 𝐺𝐹(256) and the matrix 𝑀 are used in the AES
block cipher.
Chapter 1
Fundamentals
Modern cryptography relies on mathematical structures and methods, and this chap-
ter contains the mathematical background from discrete mathematics, computational
complexity and probability theory. We recapitulate elementary structures like sets, re-
lations, equivalence classes and functions in Section 1.1. Fundamental combinatorial
facts are outlined in Section 1.2 and the asymptotic notation is explained. Section 1.3
discusses complexity and the Big-O notation. Section 1.4 then deals with basic prob-
ability theory. Random numbers and the birthday problem are addressed in Section
1.5.
For a general introduction to undergraduate mathematics the reader may, for ex-
ample, refer to the textbook [WJW+ 14]. Discrete mathematics and its applications are
discussed in [Ros12].
1.1. Sets, Relations and Functions

The most elementary mathematical structure is sets.
Definition 1.1 (Cantor’s definition of sets). A set 𝑀 is a well-defined collection of
distinct objects. If the object 𝑥 is a member of 𝑀, write 𝑥 ∈ 𝑀 or otherwise 𝑥 ∉ 𝑀. One
writes 𝑁 ⊂ 𝑀 if 𝑁 is a subset of 𝑀, i.e., if all elements of 𝑁 are also elements of 𝑀.
Example 1.2. Basic examples of finite sets are { } = ∅ (the empty set), {0} (a set with
one element), {0, 1} (the binary numbers) and {𝐴, 𝐵, 𝐶, … , 𝑍} (the set of capital letters).
The standard sets of numbers ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ (positive integers, integers, rational
numbers, real and complex numbers) are examples of infinite sets.
Definition 1.3. The cardinality or size of a finite set 𝑋 is the number of its elements
and is denoted by |𝑋|. ♢
7
8 1. Fundamentals
Note that there are other notions of size (see Warning 1.34 below): the size of an
integer is the number of bits needed to represent it and the size of a binary string is its
length.
Sets can be defined explicitly, for example by enumeration or by intervals of real
numbers, or implicitly by formulas.
Example 1.4. 𝑀 = {𝑥 ∈ ℤ | 𝑥4 < 50} implicitly describes the set of integers 𝑥 for
which 𝑥4 < 50 holds. This set can also be described explicitly:
𝑀 = {−2, −1, 0, 1, 2}. ♢
There are several elementary set operations: 𝑀 ∪ 𝑁 (union), 𝑀 ∩ 𝑁 (intersection),

𝑀 ⧵𝑁 (set difference), 𝑀 ×𝑁 (Cartesian product), 𝑀 𝑛 (𝑛-ary product) and 𝒫(𝑀) (power
set).
Example 1.5. 𝑀 = {0, 1}128 is the set of binary strings of length 128. Elements in
𝑀 can be written in the form 𝑏1 𝑏2 … 𝑏128 or (𝑏1 , 𝑏2 , … , 𝑏128 ) in vectorial notation. An
element of 𝑀 could, for example, represent one block of plaintext or ciphertext data.
The cardinality of 𝑀 is very large:
|𝑀| = 2128 ≈ 3.4 ⋅ 1038
sage: 2^128
340282366920938463463374607431768211456
Remark 1.6. It is useful to help understand the difference between small, big and
inaccessible numbers in practical computations. For example, one can easily store one
terabyte (1012 bytes, i.e., around 243 bits) of data. On the other hand, a large amount
of resources are required to store one exabyte (one million terabytes) or 263 bits, and
more than 2100 bits are out of reach.
The number of computing steps is also bounded: less than 240 steps (say CPU
clocks) are easily possible, 260 operations require a lot of computing resources and
take a significant amount of time, and more than 2100 operations are unfeasible. It
is for example impossible to test 2128 different keys with conventional (non-quantum)
computers.
Definition 1.7. A function, mapping or map 𝑓 ∶ 𝑋 → 𝑌 consists of two sets (the
domain 𝑋 and the codomain 𝑌 ) and a rule which assigns an output element (an image)
𝑦 = 𝑓(𝑥) ∈ 𝑌 to each input element 𝑥 ∈ 𝑋 . The set of all 𝑓(𝑥) is a subset of 𝑌 called
the range or the image 𝑖𝑚(𝑓). Any 𝑥 ∈ 𝑋 with 𝑓(𝑥) = 𝑦 is called a preimage of 𝑦. Let
𝐵 ⊂ 𝑌 ; then we say that 𝑓−1 (𝐵) = {𝑥 ∈ 𝑋 | 𝑓(𝑥) ∈ 𝐵} is the preimage or inverse image
of 𝐵 under 𝑓.
Example 1.8. Let 𝑓 ∶ {0, 1}4 → {0, 1} be defined by
𝑓(𝑏1 , 𝑏2 , 𝑏3 , 𝑏4 ) = 𝑏1 ⊕ 𝑏2 ⊕ (𝑏3 ⋅ 𝑏4 ) = 𝑏1 ⊕ 𝑏2 ⊕ 𝑏3 𝑏4 .
Refer to Table 1.1 for the definition of XOR (⊕) and AND (⋅). For example, (1, 1, 1, 1)
is a preimage of 1 and (0, 1, 0, 0) is another preimage of 1. (0, 0, 0, 0) is a preimage of 0
and the image of 𝑓 is 𝑖𝑚(𝑓) = {0, 1}. The function 𝑓 is surjective, but not injective (see
Definition 1.10 below).
Table 1.1. XOR and AND operations.
⊕ 0 1 ⋅ 0 1
0 0 1 0 0 0
1 1 0 1 0 1
Warning 1.9. A mathematical function is not the same as an algorithm. An algo-

rithm is a step-by-step and possibly recursive procedure which produces output from
given input. There can be different algorithms (or no known algorithm at all) for
a given mathematical function. Furthermore, algorithms can also use random data
as input and can have different behaviors on different runs. In computer science, a
function is a piece of code in some programming language that performs a specific
task.
Every set 𝑋 has an identity function 𝑖𝑑𝑋 ∶ 𝑋 → 𝑋, which maps each 𝑥 ∈ 𝑋 to itself.
Functions can be composed if the range of the first function lies within the domain of
the second function. Let 𝑓 ∶ 𝑋 → 𝑌 and 𝑔 ∶ 𝑌 → 𝑍 be functions. Then there is a
composite function 𝑔 ∘ 𝑓 ∶ 𝑋 → 𝑍 with (𝑔 ∘ 𝑓)(𝑥) = 𝑔(𝑓(𝑥)) (see Figure 1.1).
𝑔∘𝑓
𝑓 𝑔
𝑋 𝑌 𝑍
Figure 1.1. Composition of functions.
Definition 1.10. Let 𝑓 ∶ 𝑋 → 𝑌 be a function.

• 𝑓 is injective if different elements of the domain map to different elements of the
range: for all 𝑥1 , 𝑥2 ∈ 𝑋 with 𝑥1 ≠ 𝑥2 we have 𝑓(𝑥1 ) ≠ 𝑓(𝑥2 ). Equivalently, 𝑓 is
injective if for all 𝑥1 , 𝑥2 ∈ 𝑋:
𝑓(𝑥1 ) = 𝑓(𝑥2 ) ⇒ 𝑥1 = 𝑥2 .
10 1. Fundamentals
• 𝑓 is surjective or onto if every element of the codomain 𝑌 is contained in the image

of 𝑓, i.e., for every 𝑦 ∈ 𝑌 there exists an 𝑥 ∈ 𝑋 (a preimage) with 𝑓(𝑥) = 𝑦. In
other words, 𝑓 is surjective if
𝑖𝑚(𝑓) = 𝑌 .
• 𝑓 is bijective if it is both injective and surjective. Bijective functions possess an
inverse map 𝑓−1 ∶ 𝑌 → 𝑋 such that 𝑓−1 ∘ 𝑓 = 𝑖𝑑𝑋 and 𝑓 ∘ 𝑓−1 = 𝑖𝑑𝑌 (see Figure
1.2). Such functions are also called invertible.
𝑓−1 ∘ 𝑓 = 𝑖𝑑𝑋 𝑓 ∘ 𝑓−1 = 𝑖𝑑𝑌
𝑓
𝑋 𝑌
𝑓−1
Figure 1.2. Inverse map and its properties.
Example 1.11. Let 𝑓 ∶ {0, 1}2 → {0, 1}2 be defined by

𝑓(𝑏1 , 𝑏2 ) = (𝑏1 ⊕ 𝑏2 , 𝑏1 ).
Since 𝑓(0, 0) = (0, 0), 𝑓(0, 1) = (1, 0), 𝑓(1, 0) = (1, 1) and 𝑓(1, 1) = (0, 1), the map is
bijective. The inverse map 𝑓−1 ∶ {0, 1}2 → {0, 1}2 is given by
𝑓−1 (𝑏1 , 𝑏2 ) = (𝑏2 , 𝑏1 ⊕ 𝑏2 ).
One has 𝑓−1 (0, 0) = (0, 0), 𝑓−1 (0, 1) = (1, 1), 𝑓−1 (1, 0) = (0, 1) and 𝑓−1 (1, 1) = (1, 0).
♢
Invertible (bijective) maps play an important role in ciphering. An encryption map

𝐸 must have a corresponding decryption map 𝐷 such that 𝐷 ∘ 𝐸 = 𝑖𝑑 (the identity
map on the plaintext space). Encryption maps are often bijective, although a slightly
weaker property is sufficient (𝐸 has to be injective and 𝐷 must be surjective). Ciphering
is often defined as a composition of several operations. These operations are not nec-
essarily bijective, even if their composition is ultimately bijective. On the other hand,
a composition of bijective maps is of course bijective.
Lemma 1.12. Let 𝑓 ∶ 𝑋 → 𝑌 be a map between finite sets; then:
(1) If 𝑓 is injective then |𝑋| ≤ |𝑌 |.
(2) If 𝑓 is surjective then |𝑋| ≥ |𝑌 |.
(3) If 𝑓 is bijective then |𝑋| = |𝑌 |. ♢
Note that the above conditions are only necessary and not sufficient.
Remark 1.13. The contraposition of Lemma 1.12 (1) is called the pigeonhole principle:
if |𝑋| > |𝑌 | then 𝑓 is not injective. Suppose 𝑋 is a set of pigeons and 𝑌 a set of holes. If
there are more pigeons than holes, then one hole has more than one pigeon.
Definition 1.14. A set 𝑆 is said to be countably infinite if there is a bijective map
𝑓∶𝑆→ℕ
from 𝑆 to the set of natural numbers. We say that a set 𝑆 is countable if it is finite or
countably infinite.
Example 1.15. ℕ and ℤ are countably infinite sets. The sets ℤ𝑛 (for 𝑛 ∈ ℕ) and ℚ are
also countable (see Exercise 3). However, a famous result of Cantor says that the set ℝ
of all real numbers is uncountable. ♢
The floor, ceiling and rounding functions are often used in numerical computa-
tions.
Definition 1.16. Let 𝑥 ∈ ℝ.
(1) ⌊𝑥⌋ is the greatest integer less than or equal to 𝑥.
(2) ⌈𝑥⌉ is the least integer greater than or equal to 𝑥.
1
(3) ⌊𝑥⌉ = ⌊𝑥 + ⌋ is rounding 𝑥 to the nearest integer (round half up).
2
Example 1.17. ⌊1.7⌋ = 1, ⌈−2.4⌉ = −2 and ⌊1.5⌉ = 2. ♢
Functions from 𝑋 to 𝑌 are particular cases of relations between 𝑋 and 𝑌 :

Definition 1.18. A binary relation (or relation) between 𝑋 and 𝑌 is a subset
𝑅 ⊂ 𝑋 × 𝑌.
Hence a relation is a set of pairs (𝑥, 𝑦) with 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌 . If 𝑋 = 𝑌 then 𝑅 is called
a (binary) relation on 𝑋. ♢
A function 𝑓 ∶ 𝑋 → 𝑌 defines a relation between 𝑋 and 𝑌 :

𝑅 = {(𝑥, 𝑦) ∈ 𝑋 × 𝑌 | 𝑦 = 𝑓(𝑥)}.
𝑅 contains all tuples (𝑥, 𝑓(𝑥)) with 𝑥 ∈ 𝑋. This relation is also called the graph of
𝑓. However, a relation 𝑅 between 𝑋 and 𝑌 does not necessarily define a function. A
relation 𝑅 defines a function if and only if each 𝑥 ∈ 𝑋 occurs exactly once as a first
component in the relation 𝑅.
Equivalence relations on a set 𝑋 are of special importance. An equivalence relation
induces a segmentation of a set into disjoint subsets. The collection of these subsets
defines a new set and elements in the same subset are said to be equivalent.
12 1. Fundamentals
Definition 1.19. Let 𝑅 be a relation on 𝑋. Then 𝑅 is called an equivalence relation if it

satisfies the following conditions:
(1) 𝑅 is reflexive, i.e., (𝑥, 𝑥) ∈ 𝑅 for all 𝑥 ∈ 𝑋, and
(2) 𝑅 is symmetric, i.e., if (𝑥, 𝑦) ∈ 𝑅 then (𝑦, 𝑥) ∈ 𝑅, and
(3) 𝑅 is transitive, i.e., if (𝑥, 𝑦) ∈ 𝑅 and (𝑦, 𝑧) ∈ 𝑅 then (𝑥, 𝑧) ∈ 𝑅. ♢
If 𝑅 is an equivalence relation and (𝑥, 𝑦) ∈ 𝑅, then 𝑥 and 𝑦 are called equivalent

and we write 𝑥 ∼ 𝑦. For 𝑥 ∈ 𝑋, the subset 𝑥 = {𝑦 ∈ 𝑋 | 𝑥 ∼ 𝑦} ⊂ 𝑋 is called the
equivalence class of 𝑥 and all elements in 𝑥 are representatives of that class. The above
conditions of an equivalence relation ensure that the equivalence classes are disjoint
and their union is 𝑋. The set of equivalence classes is said to be the quotient set 𝑋/ ∼ .
Note: the elements of 𝑋/ ∼ are sets. By abuse of notation, the same 𝑥 is sometimes
viewed as an element of 𝑋 and of 𝑋/ ∼ .
Example 1.20. We define an equivalence relation 𝑅2 on 𝑋 = ℤ:
𝑅2 = {(𝑥, 𝑦) ∈ ℤ × ℤ | 𝑥 − 𝑦 ∈ 2 ℤ}.
Pairs (𝑥, 𝑦) ∈ 𝑅2 have the property that their difference 𝑥 − 𝑦 is even, i.e., divisible by
2. For example, the tuples (2, 4), (3, 1) and (−1, 5) are all elements of 𝑅2 , but (0, 1) ∉
𝑅2 and (−3, 4) ∉ 𝑅2 . One can easily check that 𝑅2 is an equivalence relation. The
equivalence class of 𝑥 ∈ ℤ is
𝑥 = {… , 𝑥 − 4, 𝑥 − 2, 𝑥, 𝑥 + 2, 𝑥 + 4 , … }.
There are only two different equivalence classes:
0 = {… , −4, −2, 0, 2, 4, … } and 1 = {… , −3, −1, 1, 3, 5, … }.
For example, one has −2 = 0 = 2 and −1 = 1 = 3. Hence the quotient set ℤ/ ∼ only
has two elements. This set is denoted by ℤ2 , ℤ/(2), 𝐺𝐹(2) or 𝔽2 , and we call it the field
of residue classes modulo 2. It can be identified with the binary set {0, 1}. ♢
A generalization of the above example yields the residue classes modulo 𝑛:

Example 1.21. Let 𝑛 ∈ ℕ and 𝑛 ≥ 2. Consider the following equivalence relation 𝑅𝑛
on 𝑋 = ℤ:
𝑅𝑛 = {(𝑥, 𝑦) ∈ ℤ × ℤ | 𝑥 − 𝑦 ∈ 𝑛 ℤ}.
Pairs (𝑥, 𝑦) ∈ 𝑅𝑛 have the property that their difference 𝑥 − 𝑦 is divisible by 𝑛, i.e., 𝑥 − 𝑦
is a multiple of 𝑛. For example, let 𝑛 = 11. Then the tuples (2, 13) and (−1, 10) are
elements of 𝑅11 , but (0, 10) ∉ 𝑅11 and (−2, 13) ∉ 𝑅11 .
Similar to the example above, 𝑅𝑛 is an equivalence relation. The equivalence class
of 𝑥 ∈ ℤ is the set
𝑥 = {… , 𝑥 − 2𝑛, 𝑥 − 𝑛, 𝑥, 𝑥 + 𝑛, 𝑥 + 2𝑛, … }.
Now we have 𝑛 different equivalence classes and the quotient set ℤ/ ∼ has 𝑛 elements.
We call this set the residue classes modulo 𝑛 or integers modulo 𝑛 and denote it by ℤ𝑛 or
ℤ/(𝑛). Each residue class has a standard representative in the set {0, 1, … , 𝑛 − 1} and
elements in the same residue class are called congruent modulo 𝑛. ♢
Two integers are congruent modulo 𝑛 if they have the same remainder when they
are divided by 𝑛. In many programming languages, the remainder of the integer divi-
sion 𝑎 ∶ 𝑛 is computed by 𝑎 % 𝑛, but note that the result may be negative for 𝑎 < 0,
whereas the standard representative of 𝑎 modulo 𝑛 is non-negative.
Example 1.22. Let 𝑛 = 11. Then ℤ11 = {0, 1, … , 10} has 11 elements. One has −14 =
8 since −14−8 = −22 is a multiple of 11. The integers 8 and −14 are congruent modulo
11 and one writes −14 ≡ 8 mod 11. The standard representative of this residue class
is 8, and −3, −14, … as well as 19, 30, … are other representatives of the same residue
class. Here is an example using SageMath:
sage: mod ( -892342322327 ,11)
6
The discussion of residue classes will be continued in Section 3.2.
Definition 1.23. A map 𝑓 ∶ {0, 1}𝑛 → {0, 1} is called an 𝑛-variable Boolean function
and a map 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 is called an (𝑛, 𝑚)-vectorial Boolean function. A vec-
torial Boolean function can be written as an 𝑚-tuple of 𝑛-variable Boolean functions:
𝑓 = (𝑓1 , 𝑓2 , … , 𝑓𝑚 ). ♢
Boolean functions can be represented by their truth table. Since the table of an 𝑛-
variable Boolean function has 2𝑛 entries, this is only reasonable for small 𝑛. Another
important representation is the algebraic normal form (ANF). This form uses XOR (⊕)
and AND (⋅) combinations of the binary variables.
An 𝑛-variable Boolean function has a unique representation as a polynomial in 𝑛
variables, say 𝑥1 , 𝑥2 , … , 𝑥𝑛 :
𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑛 ) = 𝑎𝐼 ⋅ ∏ 𝑥𝑖 .
⨁
𝐼⊂{1,…,𝑛} 𝑖∈𝐼
The coefficients 𝑎𝐼 are either 0 or 1 and 𝑓 is a sum (XOR) of products (AND) of the
variables, for example
𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = 𝑥1 ⊕ 𝑥1 𝑥2 ⊕ 𝑥2 𝑥3 ⊕ 𝑥1 𝑥2 𝑥3 .
One can view 𝑓 as a polynomial in 𝑛 variables with coefficients in ℤ2 (residue classes

modulo 2) and write + instead of ⊕, e.g.,
𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = 𝑥1 + 𝑥1 𝑥2 + 𝑥2 𝑥3 + 𝑥1 𝑥2 𝑥3 mod 2.
14 1. Fundamentals
The algebraic degree of 𝑓 is the maximal length of the products which appear (with
nonzero coefficient) in the above representation. If 𝑓 is a constant function, then the
degree is 0. In the above example, the degree is 3.
We note that higher powers of a variable 𝑥𝑖 are not needed since 𝑥𝑖𝑘 = 𝑥𝑖 for all
𝑘 ≥ 1 and 𝑥𝑖 ∈ {0, 1}. Boolean functions of degree ≤ 1 are called affine. If the degree
is ≤ 1 and the constant part is 0, then the function is linear. An 𝑛-variable linear
Boolean function is a linear mapping from 𝐺𝐹(2)𝑛 to 𝐺𝐹(2). Linear maps are discussed
in Section 4.4.
The degree of a vectorial Boolean function is the maximal degree of its component
functions. A vectorial Boolean function is called affine if all component functions are
affine.
Example 1.24. (1) 𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = 𝑥1 𝑥2 +𝑥2 𝑥3 +𝑥1 +1 mod 2 is a 3-variable Boolean
function of algebraic degree 2.
(2) 𝑓(𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) = (𝑥3 , 𝑥1 + 𝑥2 , 𝑥1 + 𝑥4 ) mod 2 is a (4, 3)-vectorial Boolean func-
tion. The algebraic degree is 1 and the function is linear since all constants are
0.
(3) Let 𝑓 = 𝑓(𝑥0 , 𝑥1 ) be a 2-variable Boolean function given by the following table:
𝑥1 𝑥0 𝑓(𝑥)
0 0 1
0 1 1
1 0 0
1 1 1
We use SageMath to determine the algebraic normal form:

sage: B = BooleanFunction ([1 ,1 ,0 ,1])
sage: B. algebraic_normal_form ()
x0*x1 + x1 + 1
We find that 𝑓(𝑥0 , 𝑥1 ) = 𝑥0 𝑥1 + 𝑥1 + 1 mod 2. The algebraic degree is 2 and 𝑓 is

neither linear nor affine.
1.2. Combinatorics
Combinatorics investigates finite or countable discrete structures. We are interested in
properties of finite sets like {1 , 2, … , 𝑛} or {0, 1}𝑛 which often occur in cryptographic
operations.
Definition 1.25. The factorial of a non-negative integer 𝑛 is defined as follows:
1 for 𝑛 = 0,
𝑛! = {
1 ⋅ 2⋯𝑛 for 𝑛 ≥ 1.
1.2. Combinatorics 15
The binomial coefficients have the following formula:
𝑛 𝑛!
( )= for integers 0 ≤ 𝑘 ≤ 𝑛 . ♢
𝑘 𝑘! (𝑛 − 𝑘)!
The factorial increases very fast, for example
25! = 15511210043330985984000000 and 50! ≈ 3.04 ⋅ 1064 .
An efficient way to compute the binomial coefficients is the reduced formula
𝑛 𝑛 𝑛(𝑛 − 1) ⋯ (𝑛 − (𝑘 − 1))
( )=( )= .
𝑘 𝑛−𝑘 𝑘!
15⋅14⋅13⋅12⋅11⋅10⋅9 32432400
Example 1.26. One has (15) = (15) = = = 6435. The follow-
8 7 7! 5040
15
ing SageMath code computes all binomials ( ) for 0 ≤ 𝑛 ≤ 15:
𝑛
sage: for n in range (0 ,16):

print binomial (15,n),
1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1
The binomial coefficients appear for example in the number of subsets of a given
finite set and in expansions of terms like (𝑎 + 𝑏)𝑛 .
Proposition 1.27. Let 𝑋 and 𝑌 be finite sets. Then
(1) |𝑋 × 𝑌 | = |𝑋| ⋅ |𝑌 | and |𝑋 𝑘 | = |𝑋|𝑘 for 𝑘 ∈ ℕ.

(2) Let 𝑛 = |𝑋| be the cardinality of 𝑋 and 𝑘 ≤ 𝑛. Then the number of subsets of 𝑋 of
cardinality 𝑘 is (𝑛). ♢
𝑘
Note that there is a difference between 𝑘-tuples and subsets of cardinality 𝑘. In the
first case, the order of elements is important and in the second case it is not.
128⋅127
Example 1.28. There are (128) = = 8128 different binary words of length 128
2 2
with exactly two ones and 126 zeros. Indeed, each subset of 𝑋 = {1, 2, … , 128} with
two elements gives the two positions where the digit is equal to 1. ♢
Permutations of finite sets are of great importance in cryptography.
Definition 1.29. Let 𝑆 be a finite set. A permutation 𝜎 of 𝑆 is a bijective map 𝜎 ∶ 𝑆 →

𝑆. ♢
16 1. Fundamentals
Permutations of a finite set 𝑆 can be written using a two-line matrix notation. The
first row lists the elements of 𝑆 and the image of each element is given below in the
second row. The first row can be omitted if 𝑆 is given and the elements are naturally
ordered.
Example 1.30. Let 𝑆 = {1, 2, 3, 4, 5, 6, 7, 8}; then the following row describes a permu-
tation of 𝑆:
(5 7 1 2 8 6 3 4).
Proposition 1.31. Let 𝑆 be a finite set and |𝑆| = 𝑛; then there are 𝑛! permutations of 𝑆.
Proof. Write 𝑆 = {𝑥1 , 𝑥2 , … , 𝑥𝑛 } and let 𝜎 be a permutation of 𝑆. There are 𝑛 different

choices for 𝜎(𝑥1 ). Since 𝜎 is injective, there remain 𝑛 − 1 possible images for 𝜎(𝑥2 ),
then 𝑛 − 2 possibilities for 𝜎(𝑥3 ), etc. The last image 𝜎(𝑥𝑛 ) is fixed by the other choices.
Hence one has 𝑛(𝑛 − 1)(𝑛 − 2) ⋯ 1 = 𝑛! different permutations for 𝜎 ∶ 𝑆 → 𝑆. □
Cryptographic operations are often based on permutations. Indeed, a randomly

chosen family of permutations of a set like 𝑀 = {0, 1}128 would constitute an ideal block
cipher. Unfortunately, it is impossible to write down or to store a general permutation
since 𝑀 has 2128 elements (see Example 1.5). Much simpler are linear maps (see Section
4.4) and in particular bit permutations, which permute only the position of the bits. For
example, we might take the permutation (5 7 1 2 8 6 3 4) of 𝑆 = {1, 2, 3, 4, 5, 6, 7, 8} in
Example 1.30 to define a permutation on the set 𝑋 = {0, 1}8 . A byte (𝑏1 , 𝑏2 , … , 𝑏8 ) ∈ 𝑋
is mapped to (𝑏5 , 𝑏7 , 𝑏1 , 𝑏2 , 𝑏8 , 𝑏6 , 𝑏3 , 𝑏4 ) ∈ 𝑋. Such bit permutations are used within
ciphering operations, but one has to take into account the fact that bit permutations
are (in contrast to general permutations) linear maps.
1.3. Computational Complexity

One often needs to analyze the computational complexity of algorithms, i.e., the nec-
essary time and space relative to the input size. The precise values, for example the
number of steps or CPU-clocks and the required memory in bytes, depend on the plat-
form, the implementation and the chosen measurement units, but the growth of time
or space (in terms of the input size) is often independent of the operational environ-
ment and the parameters chosen. The Big-O notation describes the growth of functions
and is used to estimate the complexity of algorithms in terms of the input size.
Definition 1.32 (Big-O notation). Let 𝑓, 𝑔 ∶ ℕ → ℝ be two functions on ℕ. Then we
say that 𝑔 is an asymptotic upper bound for 𝑓 if there exists a real number 𝐶 ∈ ℝ and
an integer 𝑛0 ∈ ℕ such that
|𝑓(𝑛)| ≤ 𝐶 |𝑔(𝑛)| for all 𝑛 ≥ 𝑛0 ,
and one writes 𝑓 = 𝑂(𝑔) or 𝑓 ∈ 𝑂(𝑔). ♢
1.3. Computational Complexity 17
An asymptotic upper bound 𝑔(𝑛) is usually chosen to be as simple and as small

as possible. We say that 𝑓 has linear, quadratic, cubic or polynomial growth in 𝑛 if
𝑓 = 𝑂(𝑛), 𝑓 = 𝑂(𝑛2 ), 𝑓 = 𝑂(𝑛3 ) or 𝑓 = 𝑂(𝑛𝑘 ) for some 𝑘 ∈ ℕ, respectively.
Example 1.33. (1) Let 𝑓(𝑛) = 2𝑛3 +𝑛2 +7𝑛+2. Since 𝑛2 ≤ 𝑛3 , 𝑛 ≤ 𝑛3 and 1 ≤ 𝑛3 for
𝑛 ≥ 1, one has 𝑓(𝑛) ≤ (2 + 1 + 7 + 2)𝑛3 . Set 𝐶 = 12 and 𝑛0 = 1. Thus 𝑓 = 𝑂(𝑛3 )
and 𝑓 has cubic growth in 𝑛.
20 20
(2) Let 𝑓(𝑛) = 100 + . Set 𝐶 = 101 and 𝑛0 = 19. Since ≤ 1 for 𝑛 ≥ 19, we
𝑛+1 𝑛+1
have 𝑓 = 𝑂(1). Hence 𝑓 is asymptotically bounded by a constant.
(3) Let 𝑓(𝑛) = 5√2𝑛+3 + 𝑛2 − 2𝑛; then 𝑓 = 𝑂(2𝑛/2 ) so that 𝑓 grows exponentially in
𝑛. ♢
The Big-O notation is often used to assess the running time of an algorithm in a
worst-case scenario. An asymptotic upper bound does not depend on the measuring
unit or the platform, since the values would differ only by a multiplicative constant.
If the running time function 𝑓(𝑛) has polynomial growth in the input length 𝑛, i.e.,
𝑓 = 𝑂(𝑛𝑘 ), then 𝑓(𝑛) is bounded by 𝐶𝑛𝑘 for constants 𝐶, 𝑘 and large 𝑛. For example,
if 𝑛 is doubled, then the upper bound is multiplied only by the constant 2𝑘 .
Algorithms with polynomial running time in terms of the input size are considered
to be efficient or fast. Many standard algorithms, for example adding or multiplying
two numbers, are polynomial. On the other hand, an algorithm that loops over every
instance of a set with 2𝑛 elements is exponential in 𝑛. A problem is called hard if no
efficient, i.e., polynomial-time, algorithm exists that solves the problem.
A decision problem has only two possible answers, yes or no. Decision problems
for which a polynomial time algorithm exists are said to belong to the complexity class
P. The class NP (nondeterministic polynomial) is the set of decision problems which
can be verified in polynomial-time. Checking the correctness of a proof is apparently
easier than solving a problem. Whether the class NP is strictly larger than P is a major
unresolved problem in computer science.
In computer science, one is usually interested in the worst-case complexity of an al-
gorithm which solves a certain problem. However, the worst-case complexity is hardly
relevant for attacks against cryptographic systems. A cryptosystem is certainly inse-
cure if attacks are inefficient in certain bad cases but efficient in other cases. Instead,
the average-case complexity of algorithms that break a scheme should be large. A se-
cure scheme should provide protection in almost all cases and the probability that a
polynomial time algorithm breaks the cryptosystem should be very small.
Note that there is not only the time complexity but also the space complexity of
an algorithm. The space complexity measures the memory or storage requirements in
terms of the input size.
18 1. Fundamentals
Warning 1.34. The complexity is measured in terms of the input size, not the input
value! The following formula gives the relation between a positive integer 𝑛 and its
size:
size(𝑛) = ⌊log2 (𝑛)⌋ + 1.
The size of an integer is the number of bits that is needed to represent it. For a given
binary string 𝑚, the bit-length is also called size and denoted by |𝑚|.
In cryptographic applications, the key size is usually bounded by several hundred

or thousand bits, but the numbers represented can be very large.
Example 1.35. Let 𝑛 = 1024 . Then size(𝑛) = ⌊24 ⋅ log2 (10)⌋ + 1 = 80. Hence 80 bits
are needed for the binary representation of 1024 .
Example 1.36. Let 𝑛 ∈ ℕ and 1𝑛 = 1 … 1 be the unary string of 𝑛 ones. Obviously,
size(1𝑛 ) = 𝑛. An algorithm that takes 1𝑛 as input is polynomial if the running time is
polynomial in 𝑛. Of course, 1𝑛 is a very inefficient way to represent a positive integer
𝑛, but it allows us to set the input size to 𝑛. This will be used in Chapter 2, where
algorithms are often expected to be polynomial in a security parameter 𝑛, for example
the key length.
Example 1.37. Suppose we want to compute the prime divisors of an integer 𝑛. A

related decision problem is to determine whether or not 𝑛 is a prime number.
Our algorithm uses trial divisions by all odd positive integers ≤ √𝑛. The number
1
1 log2 𝑛
of divisions (i.e., a multiple of the running time) is at most ⌊ √𝑛⌋. Since √𝑛 = 2 2 ,
2
1
size(𝑛)
the worst-case running time function is 𝑂(2 2 ) and therefore exponential. On the
other hand, our algorithm has polynomial space complexity, since we only need to store
1
𝑛 and a small number of auxiliary variables. Note that the factor in the exponent of
2
the running time function must not be omitted in the Big-O notation! ♢
One should understand the limitations of the asymptotic notation. First, the upper
bound applies only if 𝑛 is larger than some unknown initial value 𝑛0 . Furthermore, the
multiplicative constant 𝐶 can be large. For a given input value, an algorithm with poly-
nomial running time may even be slower than an exponential-time algorithm! How-
ever, for large 𝑛 the polynomial-time algorithm will eventually be faster.
1
Bounded functions are of type 𝑂(1) and functions of type 𝑂( ) converge to 0. Neg-
𝑛
1
ligible functions approach zero faster than and any other inverse polynomial:
𝑛
Definition 1.38. Let 𝑓 ∶ ℕ → ℝ be a function. One says that 𝑓 is negligible in 𝑛 if
1 1
𝑓 = 𝑂( ) for all polynomials 𝑞, or equivalently if 𝑓 = 𝑂( 𝑐 ) for all 𝑐 > 0. ♢
𝑞(𝑛) 𝑛
Hence negligible functions are eventually smaller than any inverse polynomial.
1 1 1
This means that 𝑓(𝑛) approaches zero faster than any of the functions , 2 , 3 , etc.
𝑛 𝑛 𝑛
1
Example 1.39. 𝑓(𝑛) = 10𝑒−𝑛 and 2−√𝑛 are negligible in 𝑛, whereas 𝑓(𝑛) = is
𝑛2 +3𝑛
1 1
not negligible since 𝑓(𝑛) = 𝑂( ), but 𝑓 ≠ 𝑂( ). ♢
𝑛2 𝑛3
Finally, we define the soft-O notation which ignores logarithmic factors.

̃
Definition 1.40. Let 𝑓, 𝑔 ∶ ℕ → ℝ be functions. Then 𝑓 = 𝑂(𝑔) if there exists 𝑘 ∈ ℕ
such that 𝑓 = 𝑂(ln(𝑛)𝑘 𝑔(𝑛)).
̃
Example 1.41. (1) Let 𝑓1 (𝑛) = log2 (𝑛). Then 𝑓1 = 𝑂(1).
̃ 2 ).
(2) Let 𝑓2 (𝑛) = log (𝑛)3 𝑛2 . Then 𝑓2 = 𝑂(𝑛
10
1.4. Discrete Probability

Probability theory is an important foundation of modern cryptography. We only con-
sider discrete probability spaces.
Definition 1.42. Let Ω be a countable set, 𝒮 = 𝒫(Ω) the power set of Ω and 𝑃𝑟 ∶ 𝒮 →
[0, 1] a function with the following properties:
(1) 𝑃𝑟[Ω] = 1.
(2) If 𝐴1 , 𝐴2 , 𝐴3 , … are pairwise disjoint subsets of 𝐴, then
𝑃𝑟 [ 𝐴𝑖 ] = ∑ 𝑃𝑟[𝐴𝑖 ].
⋃
𝑖 𝑖
The triple (Ω, 𝒮, 𝑃𝑟) is called a discrete probability space. Ω is called the sample space
and we say that 𝑃𝑟 is a discrete probability distribution on Ω. The subsets 𝐴 ⊂ Ω are
said to be events and 𝑃𝑟[𝐴] is the probability of 𝐴. If Ω is finite, then (Ω, 𝒮, 𝑃𝑟) is called
a finite probability space. ♢
Note that the family of sets in (2) is either finite or countably infinite. Since all
probabilities are non-negative and the sum is bounded by 1, the series converges and
is also invariant under a reordering of terms.
Remark 1.43. In measure theory, a triple (Ω, 𝒮, 𝑃𝑟), where 𝑃𝑟 is a 𝜎-additive function
on a set 𝒮 ⊂ 𝒫(Ω) of measurable sets such that 𝑃𝑟[Ω] = 1, is called a probability space.
We only consider the case of a countable sample space Ω and assume that all subsets
(events) are measurable, i.e., 𝒮 = 𝒫(Ω). A discrete probability distribution is fully
determined by the values on the singletons {𝜔} (the elementary events). We define the
function
𝑝(𝜔) = 𝑃𝑟[{𝜔}]
and obtain for all events 𝐴 ⊂ Ω:
𝑃𝑟[𝐴] = ∑ 𝑝(𝜔).
𝜔∈𝐴
20 1. Fundamentals
Hence a discrete probability distribution is fully determined by a function 𝑝 ∶

Ω → [0, 1] with ∑𝜔∈Ω 𝑝(𝜔) = 1.
Example 1.44. Let Ω = ℕ and 0 < 𝑝 < 1. Define a discrete probability distribution
𝑃𝑟 on ℕ by
𝑝(𝑘) = 𝑃𝑟[{𝑘}] = (1 − 𝑝)𝑘−1 𝑝.

The formula for the geometric series implies that 𝑃𝑟[ℕ] = 1:
∞ ∞
1
∑ 𝑝(𝑘) = 𝑝 ∑ (1 − 𝑝)𝑘−1 = 𝑝 = 1.
𝑘=1 𝑘=1
1 − (1 − 𝑝)
𝑃𝑟 is called a geometric distribution. Suppose the probability of success in an experiment

is equal to 𝑝. Then the geometric distribution gives the probability that the first success
requires 𝑘 independent trials.
Definition 1.45. Let 𝑃𝑟 be a probability distribution on a finite sample space Ω. 𝑃𝑟 is

a uniform distribution if all elementary events have equal probability, i.e., if
1
𝑝(𝜔) = 𝑃𝑟[{𝜔}] = for all 𝜔 ∈ Ω. ♢
|Ω|
Examples of uniform distributions include a fair coin or a dice. Random number

generators should produce uniformly distributed numbers. A uniform distribution is
sometimes assumed when other information is not available. Cryptographic schemes
(see Chapter 2) often assume that a key is chosen uniformly at random so that all pos-
sible keys have the same probability.
Definition 1.46. Let 𝐴, 𝐵 and 𝐴1 , … , 𝐴𝑛 be events in a probability space.

(1) 𝐴 and 𝐵 are said to be independent if the joint probability equals the product of
probabilities:
𝑃𝑟[𝐴 ∩ 𝐵] = 𝑃𝑟[𝐴] ⋅ 𝑃𝑟[𝐵].
(2) 𝐴1 , 𝐴2 , … , 𝐴𝑛 are mutually independent if for every non-empty subset of indices
𝐼 ⊂ {1, … , 𝑛}, one has
𝑃𝑟 [ 𝐴𝑖 ] = ∏ 𝑃𝑟[𝐴𝑖 ] . ♢
⋂
𝑖∈𝐼 𝑖∈𝐼
Note that mutual independence of 𝑛 events is stronger than pairwise independence

of all pairs (see Example 1.53 below).
Definition 1.47. Let 𝐴 and 𝐵 be two events in a probability space and suppose that
𝑃𝑟[𝐴] ≠ 0. Then the conditional probability 𝑃𝑟[𝐵|𝐴] of 𝐵 given 𝐴 is defined by

𝑃𝑟[𝐴 ∩ 𝐵]
𝑃𝑟[𝐵|𝐴] = . ♢
𝑃𝑟[𝐴]
Obviously, 𝑃𝑟[𝐵|𝐴] = 𝑃𝑟[𝐵] holds if and only if 𝐴 and 𝐵 are independent.

The sample space Ω of a probability space is often mapped to a standard space and
in particular to the real numbers. Such a map on Ω is called a random variable. For
example, the outcome of rolling two dice can be mapped to {1, 2, … , 6}2 or to {2, 3, … , 12}
if the numbers on the dice are added.
Definition 1.48. Let 𝑃𝑟 ∶ Ω → [0, 1] be a discrete probability distribution; then a
function 𝑋 ∶ Ω → ℝ is called a real random variable. For any 𝑥 ∈ ℝ, one obtains
an event 𝑋 −1 (𝑥) ⊂ Ω and the probability 𝑃𝑟[𝑋 = 𝑥] is defined by 𝑃𝑟[𝑋 −1 (𝑥)]. The
function 𝑝𝑋 ∶ ℝ → [0, 1] defined by 𝑝𝑋 (𝑥) = 𝑃𝑟[𝑋 = 𝑥] is called the probability mass
function (pmf) of 𝑋. Furthermore, the cumulative distribution function (cdf) 𝐹 ∶ ℝ →
ℝ is defined by 𝐹(𝑥) = 𝑃𝑟[𝑋 ≤ 𝑥]. ♢
Note that 𝑋 induces a discrete probability distribution 𝑃𝑟𝑋 on the countable subset
𝑋(Ω) ⊂ ℝ. The difference to the original distribution 𝑃𝑟 is that the sample space of 𝑃𝑟𝑋
is now a subset of ℝ. If the sample space Ω is already a subset of ℝ, then 𝑋 is usually
the inclusion map.
Example 1.49. Suppose two dice are rolled and the random variable 𝑋 gives the sum
of numbers on the dice. Then 𝑋 −1 (2) = {(1, 1)} and 𝑋 −1 (3) = {(1, 2), (2, 1)}, so that
1 1 1 1
𝑝𝑋 (2) = 𝑃[𝑋 = 2] = , 𝑝𝑋 (3) = 𝑃[𝑋 = 3] = + = .
36 36 36 18
1 1 1 1
Furthermore, 𝐹(𝑥) = 0 for 𝑥 < 2, 𝐹(2) = , 𝐹(3) = + = , etc., and
36 36 18 12
𝐹(𝑥) = 1 for 𝑥 ≥ 12.
Definition 1.50. Let 𝑃𝑟 be a discrete probability distribution and 𝑋 ∶ Ω → ℝ a random
variable with countable range 𝑋(Ω) ⊂ ℝ. One defines the expected value (also called
expectation, mean or average) 𝐸[𝑋] and the variance 𝑉[𝑋] if the sums given below are
either finite or the corresponding series converge absolutely:
𝐸[𝑋] = ∑ 𝑥 ⋅ 𝑃𝑟[𝑋 = 𝑥] = ∑ 𝑥 ⋅ 𝑝𝑋 (𝑥),
𝑥∈𝑋(Ω) 𝑥∈𝑋(Ω)
𝑉[𝑋] = 𝐸[(𝑋 − 𝐸[𝑋])2 ] = 𝐸[𝑋 2 ] − (𝐸[𝑋])2 = ( ∑ 𝑥2 ⋅ 𝑝𝑋 (𝑥)) − (𝐸[𝑋])2 .

𝑥∈𝑋(Ω)
The square root 𝜎 = √𝑉[𝑋] of the variance is called the standard deviation. It measures
the quadratic deviation from the mean 𝐸[𝑋].
Example 1.51. (1) Let 𝑃𝑟 be a uniform distribution on a finite set Ω. Assume that
1
the random variable 𝑋 maps Ω to the set {0, 1, … , 𝑛 − 1}. The pmf is 𝑝𝑋 (𝑥) =
𝑛
22 1. Fundamentals
for 𝑥 = 0, 1, … , 𝑛 − 1, and zero otherwise. The expectation is

𝑛−1
1 𝑛(𝑛 − 1) 1 𝑛−1
𝐸[𝑋] = ∑ 𝑘 ⋅ = ⋅ = .
𝑘=0
𝑛 2 𝑛 2
The variance is
𝑛−1
1 𝑛−1 2
𝑉[𝑋] = ( ∑ 𝑘2 ⋅ )−( )
𝑘=0
𝑛 2
𝑛(𝑛 − 1)(2𝑛 − 1) 1 𝑛2 − 2𝑛 + 1 𝑛2 − 1
= ⋅ − = .
6 𝑛 4 12
(2) Consider the experiment of tossing a perfect coin and letting 𝑋 be the associated
random variable having values 0 and 1. It is hardly surprising that the expected
1 1 1
value is 𝐸[𝑋] = . The variance is 𝑉[𝑋] = and the standard deviation is 𝜎 = .
2 4 2
Definition 1.52. Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be random variables on a discrete probability space

with probability mass functions 𝑝𝑋1 , 𝑝𝑋2 , … , 𝑝𝑋𝑛 . The random variables are called
mutually independent if for any sequence 𝑥1 , 𝑥2 , … , 𝑥𝑛 of values
𝑃𝑟[𝑋1 = 𝑥1 ∧ 𝑋2 = 𝑥2 ∧ ⋯ ∧ 𝑋𝑛 = 𝑥𝑛 ] = 𝑝𝑋1 (𝑥1 ) ⋅ 𝑝𝑋2 (𝑥2 ) ⋯ ⋅ 𝑝𝑋𝑛 (𝑥𝑛 ).
Example 1.53. Let 𝑋1 and 𝑋2 be two binary random variables (values 0 or 1) that
are given by tossing two perfect coins so that 𝑋1 and 𝑋2 are independent. Now set
𝑋3 = 𝑋1 ⊕ 𝑋2 . Then 𝑋1 , 𝑋2 , 𝑋3 are pairwise independent and each of them has a
uniform distribution, but they are not mutually independent. We have
𝑃𝑟[𝑋1 = 1 ∧ 𝑋2 = 1 ∧ 𝑋3 = 1] = 0,
since 𝑋3 must be zero if 𝑋1 = 𝑋2 = 1, but
1 3 1
𝑃𝑟[𝑋1 = 1] ⋅ 𝑃𝑟[𝑋2 = 1] ⋅ 𝑃𝑟[𝑋3 = 1] = ( ) = .
2 8
Example 1.54. Let Ω = {0, 1}8 be a space of plaintext and ciphertexts. Suppose the
plaintexts 𝑋 are uniformly distributed. Let 𝜎 ∶ Ω → Ω be a random bit permutation
(see Section 1.2). Then the ciphertexts 𝑌 = 𝜎(𝑋) are also uniformly distributed. 𝑋 and
𝑌 are independent if
𝑃𝑟[𝑋 = 𝑚 ∧ 𝑌 = 𝑐] = 𝑃𝑟[𝑋 = 𝑚] ⋅ 𝑃𝑟[𝑌 = 𝑐]
1 1
holds for all plaintexts 𝑚 and ciphertexts 𝑐. The right side of the equation gives ⋅ =
28 28
1
for all 𝑚 and 𝑐. If 𝑚 and 𝑐 possess a different number of ones, then the left side is 0,
216
because such a combination is impossible for a bit permutation. This shows that 𝑋 and
𝑌 are not independent. Later we will see that bit permutations are not secure, since
the ciphertext leaks information about the plaintext.
1
Figure 1.3. Probability mass functions of i) the binomial distribution 𝐵(20, ) (•) and
2
ii) the uniform distribution (×) on {0, 1, 2, … , 20}.
Example 1.55. Let 𝑃𝑟 be a probability distribution on a sample space Ω with two el-
ements. Suppose 𝑋 ∶ Ω → {0, 1} is a random variable with 𝑃𝑟[𝑋 = 1] = 𝑝 (success)
and 𝑃𝑟[𝑋 = 0] = 1 − 𝑝 (failure). This is called a Bernoulli trial. Furthermore, let
𝑋1 , … , 𝑋𝑛 be 𝑛 independent identical distributed (i.i.d.) random variables with 𝑋𝑖 = 𝑋,
and define
𝑌 = 𝑋 1 + 𝑋2 + ⋯ + 𝑋 𝑛 .
The new random variable 𝑌 follows a binomial distribution 𝐵(𝑛, 𝑝) and gives the num-
ber of successes in 𝑛 independent Bernoulli trials. For 𝑘 ∈ {0, 1, … , 𝑛} one has
𝑛
𝑃𝑟[𝑌 = 𝑘] = ( )𝑝𝑘 (1 − 𝑝)𝑛−𝑘 ,
𝑘
since there are (𝑛) combinations of 𝑛 trials with 𝑘 successes and 𝑛 − 𝑘 failures. The
𝑘
probability of each combination is 𝑝𝑘 (1 − 𝑝)𝑛−𝑘 . We have 𝐸[𝑋] = 𝑝, 𝐸[𝑌 ] = 𝑛𝑝,
𝑉[𝑋] = 𝑝(1 − 𝑝) and 𝑉[𝑌 ] = 𝑛𝑝(1 − 𝑝).
1.5. Random Numbers

Random numbers or random bits play an important role in cryptography. Keys are
often chosen uniformly at random, and many cryptographic algorithms are probabilis-
tic and require random input. One distinguishes between true random numbers in the
sense of probability theory and pseudorandom numbers or bits. The latter are produced
24 1. Fundamentals
by deterministic algorithms, which take a short random input seed as input and gener-
ate a long output sequence that appears to be random. Pseudorandom generators are
discussed in Section 2.8.
Definition 1.56. A random bit generator (RBG) is a mechanism or device which gen-
erates a sequence of random bits, such that the corresponding sequence of binary ran-
dom variables 𝑋1 , 𝑋2 , 𝑋3 , … has the following properties:
1
(1) 𝑃𝑟[𝑋𝑛 = 0] = 𝑃𝑟[𝑋𝑛 = 1] = for all 𝑛 ∈ ℕ (uniform distribution) and
2
(2) 𝑋1 , 𝑋2 , … , 𝑋𝑛 are mutually independent for all 𝑛 ∈ ℕ.
Example 1.57. If at least one output bit is a combination of the other bits, for example
if 𝑋3 = 𝑋1 ⊕ 𝑋2 (see Example 1.53), then this does not give a random bit sequence.
This demonstrates that the obvious constructions to ‘stretch’ a given sequence cannot
be used. ♢
Random bits or numbers can be produced manually (for example coin tossing, die
rolling or mouse movements) or with hardware random number generators, which use
physical phenomena like thermal noise, electrical noise or nuclear decay. Unfortu-
nately, these mechanisms or devices tend to be slow, elaborate and/or costly. Fast
all-digital random bit generators on current processor chips use thermal noise, but
whether such generators can be trusted and do not have any weaknesses or even con-
tain backdoors is disputed.
Remark 1.58. The required uniform distribution of the output of a bit generator can
be achieved by de-skewing a possibly biased generator (see Example 1.59 below), but
the statistical independence of the output bits is hard to achieve and difficult to prove.
Example 1.59. Von Neumann proposed the following de-skewing technique (von Neu-
mann extractor): group the output bits into pairs, then turn 01 into 0 and 10 into 1. If
the bits are independent, then the pairs 01 and 10 must have the same probability. The
pairs 00 and 11 are discarded. The derived generator is slower but unbiased. ♢
The generation of random bits is basically equivalent to the generation of ran-

dom integer numbers. In cryptography, random data is needed quite often. Random
bits are used to seed deterministic pseudorandom generators (see Definition 2.32). In
practice, the generators are often based on hash functions, HMAC or block ciphers in
counter mode (see [BK15]). These cryptographic primitives are explained in the fol-
lowing chapters. The output of such generators appears totally unpredictable to an
observer who does not know the seed, although deterministic functions are used to
derive the output bits.
Random bits are needed to generate keys. Another application are probabilistic al-
gorithms which take random data as auxiliary input. This may seem odd, but suppose
an algorithm sometimes gives an incorrect result. Repeated application with fresh ran-
dom input then reduces the failure rate. Furthermore, probabilistic algorithms ensure
that the same input can have a different output value, and this can be a desirable prop-
erty for data encryption.
Random numbers match surprisingly often. This is known as the Birthday Problem
or Birthday Paradox.
Example 1.60. Assume that 23 people are in a certain place. Then the probability
that at least two of them have their birthday on the same day of the year is above 50%.
365
Intuitively, one would expect that around people would be needed for a probable
2
birthday match. ♢
The explanation of this ‘paradox’ is quite simple: the probability 𝑝 that no colli-
sion occurs (i.e., all birthdays are different) decreases exponentially with the number
𝑛 of persons. We assume that birthdays are uniformly distributed. For 𝑛 = 2, one has
364 364 363
𝑝 = . For 𝑛 = 3, one gets 𝑝 = ⋅ , and each increment of 𝑛 yields another
365 365 365
factor. We write a SageMath function and obtain 𝑝 ≈ 0.493 for 𝑛 = 23. The comple-
mentary probability 1−𝑝 for a birthday collision with 23 people therefore lies above 0.5.
sage: def birthday (k,n):

p=1
for i in range (1,k):
p=p*(1-i/n)
print RR(p)
sage: birthday (23 ,365)
0.492702765676015
Proposition 1.61. Let 𝑃𝑟 be a uniform distribution on a set Ω of cardinality 𝑛. If we

draw 𝑘 = ⌈√2 ln(2)𝑛 ⌉ ≈ 1.2 √𝑛 independent samples from Ω, then the probability of a
collision is around 50%.
Proof. See Exercise 16. □

Example 1.62. Consider collisions of binary random strings of length 𝑙. The above
Proposition 1.61 shows that collisions are expected after around √2𝑙 = 2𝑙/2 values.
If the strings are sufficiently long and uniformly distributed, then random collisions
should almost never occur. For example, for 256-bit hash values with a uniform dis-
tribution, about 2128 hash values are required for a collision. On the other hand, only
around 100,000 32-bit strings will probably have a collision. ♢
The running time of finding a collision among binary strings of length 𝑙 is 𝑂(2𝑙/2 ).
Unfortunately, a large amount of space is also required, since all 𝑂(2𝑙/2 ) strings have to
be stored to detect a collision.
An optimization is possible if the samples are defined recursively by a function:
𝑥𝑖 = 𝑓(𝑥𝑖−1 ) for 𝑖 ≥ 1,
26 1. Fundamentals
where 𝑥0 is some initial value. Now the problem is to find a cycle in a sequence of
iterated function values. Floyd’s cycle-finding algorithm uses very little memory and is
based on the following observation:
Proposition 1.63. Let 𝑓 ∶ 𝑋 → 𝑋 be a function on some set 𝑋, 𝑥0 ∈ 𝑋 and 𝑥𝑖 = 𝑓(𝑥𝑖−1 )
for 𝑖 ≥ 1. Suppose there exist 𝑖, 𝑗 ∈ ℕ such that 𝑖 < 𝑗 and 𝑥𝑖 = 𝑥𝑗 . Then there exists an
integer 𝑘 < 𝑗 such that
𝑥𝑘 = 𝑥2𝑘 .
Proof. Let Δ = 𝑗 − 𝑖; then 𝑥𝑖 = 𝑥𝑖+∆ and hence 𝑥𝑘 = 𝑥𝑘+∆ = 𝑥𝑘+𝑚∆ for all integers
𝑘 ≥ 𝑖 and 𝑚 ≥ 1. Now let 𝑘 = 𝑚Δ, where 𝑚Δ is the smallest multiple of Δ that is
also greater than or equal to 𝑖. The sequence 𝑖, 𝑖 + 1, … , 𝑗 − 1 of Δ consecutive integers
contains the required number 𝑘 = 𝑚Δ. Therefore, 𝑥𝑘 = 𝑥2𝑘 and 𝑘 = 𝑚Δ < 𝑗. □
Note that a collision must exist if 𝑋 is a finite set. The above Proposition 1.63
implies that a collision in 𝑥0 , 𝑥1 , … , 𝑥𝑗 yields a collision of the special form 𝑥𝑘 = 𝑥2𝑘
for some 𝑘 < 𝑗. The least period of the sequence divides 𝑘. It is therefore sufficient to
compute the pairs (𝑥𝑖 , 𝑥2𝑖 ) for 𝑖 = 1, 2, … until a collision occurs. These values can be
recursively calculated:
𝑥𝑖 = 𝑓(𝑥𝑖−1 ) and 𝑥2𝑖 = 𝑓(𝑓(𝑥2(𝑖−1) )).
Assuming that the sequence 𝑥0 , 𝑥1 , … is uniformly distributed and |𝑋| = 𝑛, the run-
ning time is still 𝑂(√𝑛), but now it is sufficient to store only two values. This approach
is used in birthday attacks against hash functions and in Pollard’s 𝜌 algorithms for fac-
toring and discrete logarithms. The sequence 𝑥0 , 𝑥1 , … can be depicted by an initial
tail and a cycle so that it looks like the greek letter 𝜌.
Remark 1.64. We only consider the part of Floyd’s algorithm which finds a collision.
The algorithm can also compute the least period, i.e., the length of the shortest cycle,
and find the beginning of the cycle.
Example 1.65. Let 𝑋 = ℤ107 be the set of residue classes modulo 107 and let
𝑓(𝑥) = 𝑥2 + 26 mod 107.
Set 𝑥0 ≡ 1 mod 107 and let 𝑥𝑖 = 𝑓(𝑥𝑖−1 ) for 𝑖 ≥ 1. We want to find a collision within
the sequence 𝑥0 , 𝑥1 , 𝑥2 , … and implement Floyd’s cycle finding algorithm:
sage: def f(x):
return (x*x+26)
sage: x=mod (1 ,107)
sage: y=mod (1 ,107)
sage: x=f(x)
sage: y=f(f(y))
sage: k=1
sage: while x!=y:
x=f(x)
1.6. Summary 27
y=f(f(y))
k=k+1
print "k =",k," x =",x
k = 9 x = 39
Hence 𝑥9 = 𝑥18 = 39 is a collision. Let’s compute the first few elements of the
sequence and verify the result:
sage: x=mod (1 ,107)
sage: for i in range (46):
x=f(x)
print ("{:2}". format (x)),
27 6 62 18 29 11 40 21 39 49 73 5 51 59 83 67 21 39 49 73 5 51
59 83 67 21 39 49 73 5 51 59 83 67 21 39 49 73 5 51 59 83 67 21
The first seven elements form the initial segment (tail) of the sequence. The be-
ginning of the cycle is 𝑥8 = 21, and the sequence 21, 39, 49, … is cyclic of period
9.
1.6. Summary
• Sets, relations and functions are fundamental in mathematics and cryptogra-

phy.
• Residue classes are defined by an equivalence relation on the integers. There
are 𝑛 different residue classes modulo 𝑛 and the standard representatives are
0, 1, … , 𝑛 − 1.
• Permutations are bijective functions on a finite set.
• The Big-O notation is used to give an asymptotic upper bound of the growth of
a function in one variable.
• Algorithms with polynomial running time in terms of the input size are con-
sidered as efficient. The decision problems that can be solved in polynomial
time form the complexity class P.
• Negligible functions are eventually smaller than any inverse polynomial.
• A discrete probability space consists of a countable sample space Ω and a prob-
ability distribution 𝑃𝑟 on Ω. A real random variable 𝑋 is a mapping from Ω to
ℝ and gives a probability distribution on ℝ.
• A true random bit generator can be described by a sequence of mutually inde-
pendent binary and uniformly distributed random variables.
• The birthday paradox states that collisions of uniformly distributed random
numbers occur surprisingly often.
28 1. Fundamentals
Exercises
1. Let 𝑋 = ([−1, 1] ∩ ℤ) × {0, 1}. Enumerate the elements of 𝑋 and determine |𝑋|. Let
𝑌 = {1, 2, … , |𝑋|}. Give a bijection from 𝑋 to 𝑌 .
2. Which of the following maps are injective, surjective or bijective? Determine the
image 𝑖𝑚(𝑓) and give the inverse map 𝑓−1 , if possible.
(a) 𝑓1 ∶ ℕ → ℕ, 𝑓1 (𝑛) = 2𝑛 + 1.
(b) 𝑓2 ∶ ℤ → ℕ, 𝑓2 (𝑘) = |𝑘| + 1.
(c) 𝑓3 ∶ {0, 1}8 → {0, 1}8 , 𝑓3 (𝑏) = 𝑏 ⊕ (01101011).
(d) 𝑓4 ∶ {0, 1}8 → {0, 1}8 , 𝑓4 (𝑏) = 𝑏 AND (01101011).
3. Show that the following sets are countable:
(a) ℤ.
(b) ℤ2 .
(c) ℚ.
Hint: It is sufficient to construct an injective function into ℕ.
4. Let 𝑓 ∶ 𝑋 → 𝑌 be a map between finite sets and suppose that |𝑋| = |𝑌 |. Show the
following equivalences:
𝑓 is injective ⟺ 𝑓 is surjective ⟺ 𝑓 is bijective.
5. Let 𝑓 ∶ 𝑋 → 𝑌 be a function.
(a) Let 𝐵 ⊂ 𝑌 . Show that 𝑓(𝑓−1 (𝐵)) ⊂ 𝐵 with equality occurring if 𝑓 is surjective.
(b) Let 𝐴 ⊂ 𝑋. Show that 𝐴 ⊂ 𝑓−1 (𝑓(𝐴)) with equality occurring if 𝑓 is injective.
6. Enumerate the integers modulo 26. Find the standard representative of the follow-
ing integers in ℤ26 :
−1000, −30, −1, 15, 2001, 293829329302932398231.
7. Consider the following relation 𝑆 on ℝ:
𝑆 = {(𝑥, 𝑦) ∈ ℝ × ℝ | 𝑥 − 𝑦 ∈ ℤ}.
Show that 𝑆 is an equivalence relation on ℝ. Determine the equivalence classes 0,
4
−2 and . Can you give an interval 𝐼 such that there is a bijection between 𝐼 and
3
the quotient set ℝ/ ∼ ?
8. Find an asymptotic upper bound of the following functions in 𝑛. Which of them
are polynomial and which are negligible?
(a) 𝑓1 = 2𝑛3 − 3𝑛2 + 𝑛.
(b) 𝑓2 = 3 ⋅ 2𝑛 − 2𝑛 + 1.
(c) 𝑓3 = √2𝑛 + 1.
2𝑛
(d) 𝑓4 = 𝑛/2 .
2
5𝑛2 −𝑛
(e) 𝑓5 = .
2𝑛2 +3𝑛+1
Exercises 29
1
𝑛+3
(f) 𝑓6 = 2 3 .
(g) 𝑓7 = log2 (𝑛)2 𝑛.
9. Let 𝑓 = 𝑓(𝑥0 , 𝑥1 , 𝑥2 ) be a 3-variable Boolean function with the following truth
table:
𝑥2 𝑥1 𝑥0 𝑓(𝑥)
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1
Determine the algebraic normal form and the algebraic degree of 𝑓.

10. Suppose the number of operations of the most efficient attack against a cipher is
1/3 )
𝑓(𝑛) = 𝑒(2𝑛 ,
where 𝑛 is the key length. Why is 𝑓(𝑛) sub-exponential, but not polynomial in 𝑛 ?
Compute effective key length log2 (𝑓(𝑛)) for 𝑛 = 128, 𝑛 = 1024 and 𝑛 = 2048.
11. Let Ω = {0, 1}8 be a probability space with a uniform probability distribution. Com-
pute the probability that a randomly chosen byte is balanced, i.e., contains four
zeros and four ones.
12. Verify the expected values and variances in Example 1.55.
13. Two perfect dice are rolled and the random variables which give the numbers on
the dice are called 𝑋 and 𝑌 . Compute
𝑃𝑟[𝑋 + 𝑌 ≥ 10] and 𝑃𝑟[𝑋 + 𝑌 ≥ 10 | 𝑋 − 𝑌 = 0] .
Are the random variables 𝑋 + 𝑌 and 𝑋 − 𝑌 independent? Determine the expecta-
tion, the variance and the standard deviation of 𝑋 + 𝑌 and 𝑋 − 𝑌 .
14. Suppose that 100 bits are generated uniformly at random and that additional bits
are produced by XORing the preceding 100 bits. Are the new bits uniformly dis-
tributed? Does this construction give a random bit generator?
15. Let 𝑋 = ℤ56807 and define a sequence by 𝑥0 ≡ 1 and
2
𝑥𝑖 = 𝑥𝑖−1 + 1 mod 56807
for 𝑖 ≥ 1. Use SageMath to find a collision and determine the least period of the
sequence.
30 1. Fundamentals
16. (Birthday Paradox) Let 𝑃𝑟 be a uniform distribution on the sample space Ω with
|Ω| = 𝑛. If 𝑘 ≤ 𝑛 samples are independently chosen, then the probability 𝑝 that
all 𝑘 values are different (i.e., no collision occurs) is
𝑘−1
𝑖
𝑝 = ∏ (1 − ).
𝑖=1
𝑛
(a) Show that the probability 1 − 𝑝 of a collision satisfies
𝑘(𝑘−1)
−
1−𝑝≥1−𝑒 2𝑛 .
1
(b) Determine the smallest number 𝑘 such that 𝑝 ≈ .
2
𝑖
Hint: Use the inequality 1 − 𝑥 ≤ 𝑒−𝑥 for 0 ≤ 𝑥 ≤ 1 and replace the factors 1 −
𝑛
𝑖
−
by 𝑒 𝑛 . Compute the product and obtain a sum in the exponent. Use the formula
𝑘−1 𝑘(𝑘−1) 1
∑𝑖=1 𝑖 = . For part (b), set 𝑝 = and determine 𝑘 using the quadratic
2 2
formula. You may also approximate 𝑘(𝑘 − 1) by 𝑘2 . This gives the approximate
number of samples needed for a probable collision.
Chapter 2
Encryption Schemes and

Definitions of Security
This chapter contains the fundamental definitions of encryption schemes and their
security. We look at security under different types of attacks and make assumptions
about the computing power of adversaries. Then we study pseudorandom generators,
functions and permutations, which are important primitives and the basis of many
cryptographic constructions.
The definition of cryptosystems and some basic examples are given in Section
2.1. The following three sections deal with different types of security of encryption
schemes. Perfect secrecy is the strongest type of security and is covered in Section 2.2.
Since perfectly secure schemes can rarely be used in practice, we relax the require-
ments and consider computational security in Section 2.3. For the formal definition
of security, we look at the success probability of polynomial-time adversaries in well-
defined games or experiments in Section 2.4. We then explain the important defini-
tions of eavesdropping (EAV) security, security against chosen plaintext attacks (CPA)
and security against chosen ciphertext attacks (CCA). A secure scheme should have in-
distinguishable encryptions: challenged with a ciphertext and two possible plaintexts,
a polynomial-time adversary fails to find the correct plaintext better than a random
guesser.
We then turn to the construction of secure encryption schemes. Pseudorandom
generators and families of pseudorandom functions and permutations are important
building blocks of secure ciphers and are covered in Sections 2.8 and 2.9.
The combination of a family of functions or permutations and an operation mode
defines an encryption scheme. In Section 2.10, we discuss ECB, CBC and CTR modes
31
32 2. Encryption Schemes and Definitions of Security
and their properties. The security of CBC or CTR mode encryption can be reduced to
the pseudorandomness of the underlying block cipher.
The presentation of this chapter is heavily influenced by [KL15]. Other recom-
mended references are [BR05], [GB08] and [Gol01].
2.1. Encryption Schemes

An encryption scheme essentially transforms plaintext into ciphertext and conversely.
The algorithms are parametrized by keys and encryption is either deterministic or ran-
domized, i.e., the ciphertext may depend on random input data. We use the symbols ∶=
$
or = for a deterministic assignment, ← for any type of assignment and ← for a random-
$
ized operation. For example, 𝑐 ← ℰ𝑘 (𝑚) denotes an operation where an encryption
algorithm ℰ takes 𝑘, 𝑚 and additional random data as input and outputs 𝑐. The symbol
$ $
← is also used for a uniform random choice from a set; for example 𝑘 ← {0, 1}𝑛 gives a
binary uniform random string 𝑘 of length 𝑛.
Definition 2.1. An encryption scheme or cryptosystem consists of
• A plaintext space ℳ, the set of plaintext or clear-text messages.
• A ciphertext space 𝒞, the set of ciphertext messages.
• A key space 𝒦, the set of keys.
• A randomized key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes the security parameter
𝑛 (see Remark 2.2 below) in unary form (compare Example 1.36) as input and
returns a key 𝑘 ∈ 𝒦.
• An encryption algorithm ℰ = {ℰ𝑘 | 𝑘 ∈ 𝒦}, which is possibly randomized. It takes
a key and a plaintext message as input and returns the ciphertext or an error, if
$
the plaintext is invalid. We write 𝑐 = ℰ𝑘 (𝑚) or 𝑐 ← ℰ𝑘 (𝑚), and 𝑐 ← ℰ𝑘 (𝑚) if the
algorithm is randomized. The error output is denoted by ⟂.
• A deterministic decryption algorithm 𝒟 = {𝒟𝑘 | 𝑘 ∈ 𝒦} takes a key and a ci-
phertext message as input and returns the plaintext or an error symbol ⟂ if the
ciphertext is invalid. We write 𝑚 = 𝒟𝑘 (𝑐) or 𝑚 ← 𝒟𝑘 (𝑐).
We require that all algorithms are polynomial with respect to the input size. Since 𝐺𝑒𝑛
takes a unary string 1𝑛 = 1 … 1 of length 𝑛 as input, the key generation algorithm is
polynomial in 𝑛.
The scheme provides correct decryption if for each key 𝑘 ∈ 𝒦 and all plaintexts
𝑚 ∈ ℳ one has 𝒟𝑘 (ℰ𝑘 (𝑚)) = 𝑚 (see Figure 2.1). ♢
Remark 2.2. The security parameter 𝑛 controls the security of the scheme and the dif-
ficulty to break it, as well as the run-time of key generation, encryption and decryption
algorithms. The security parameter is closely related or even equal to the key length,
2.1. Encryption Schemes 33
𝑘 𝑘
𝑚 ℰ 𝑐 𝒟 𝑚
Figure 2.1. Encryption and decryption algorithms.
and quite often the key generation algorithm 𝐺𝑒𝑛(1𝑛 ) outputs a uniform random key
of length 𝑛. ♢
The scheme is said to be symmetric-key if encryption and decryption use the same
secret key. In contrast, public-key (asymmetric-key) encryption schemes use key pairs
𝑘 = (𝑝𝑘, 𝑠𝑘), where 𝑝𝑘 is public and 𝑠𝑘 is private; encryption takes the public key 𝑝𝑘 as
input and decryption the private key 𝑠𝑘 (see Definition 9.1). The encryption algorithm
ℰ𝑝𝑘 must be carefully chosen so that the inversion (decryption) is computationally hard
if only 𝑝𝑘 is known.
Until the 1970s, only symmetric-key schemes were known, but subsequently
public-key methods became part of standard cryptography. We will see later that both
schemes have their own field of application. Public-key encryption is studied in Chap-
ter 9, while this chapter only deals with symmetric encryption schemes.
Remark 2.3. A cryptosystem should be secure under the assumption that an attacker
knows the encryption and decryption algorithms. This is known as Kerkhoff’s Principle.
The security should be solely based on a secret key, not on the details of the system (see
Exercise 2). ♢
In the past, the plaintexts, ciphertexts and keys were often constructed using the
alphabet of letters. Now only the binary alphabet is relevant and
ℳ, 𝒞, 𝒦 are subsets of {0, 1}∗ = {0, 1}𝑛 .
⋃
𝑛∈ℕ
Modern symmetric encryption schemes support key lengths between 128 and 256
bits. In contrast, public-key algorithms (with the exception of elliptic curve schemes)
use longer keys consisting of more than 1000 bits. Most modern symmetric schemes
are able to encrypt plaintexts of arbitrary length. If however the message length is fixed
by the security parameter, then we speak of a fixed-length encryption scheme.
Example 2.4. The one-time pad is an example of a simple but very powerful fixed-
length symmetric encryption scheme. It uses the binary alphabet, and the key length is
equal to the message length. The security parameter 𝑛 defines the length of plaintexts,
ciphertexts and keys:
ℳ = 𝒞 = 𝒦 = {0, 1}𝑛 .
$
The key generation algorithm 𝐺𝑒𝑛(1𝑛 ) outputs a uniform random key 𝑘 ← {0, 1}𝑛 . A
key 𝑘 of length 𝑛 is used only for one message, 𝑚 ∈ {0, 1}𝑛 . Encryption ℰ𝑘 and decryption
𝒟𝑘 are identical and defined by a simple vectorial XOR operation:
𝑐 = ℰ𝑘 (𝑚) = 𝑚 ⊕ 𝑘, 𝑚 = 𝒟𝑘 (𝑐) = 𝑐 ⊕ 𝑘.
We will see below that this scheme provides perfect security, but since the key has the
same length as the plaintext, the one-time pad is impractical. Much shorter keys (say
several hundreds bits), which can be used for a large amount of data (say megabytes or
gigabytes), are preferable.
Example 2.5. The Vigenère cipher of (key) length 𝑛 is a classical example of a symmet-
ric variable-length scheme over the alphabet of letters. One sets
ℳ = 𝒞 = Σ∗ and 𝒦 = Σ𝑛 , where Σ = {𝐴, 𝐵, … , 𝑍} ≅ ℤ26 .
The letters A to Z can be represented by integers modulo 26 (see Example 1.21). The
letter A corresponds to the residue class 0, B to 1, … , Z to 25. A cyclic shift then becomes
addition or subtraction of residue classes.
$
𝐺𝑒𝑛(1𝑛 ) generates a uniform random key string 𝑘 ← Σ𝑛 of length 𝑛. For encryption
and decryption, the message and the ciphertext is split into blocks of length 𝑛, although
the last block can be shorter. Each letter in a plaintext block is transformed by a cyclic
shift, where the number of positions is determined by the corresponding key letter.
For encryption the shifting is in the positive direction, and for decryption it is in the
opposite direction.
𝑐 = ℰ𝑘 (𝑚) = ℰ𝑘 (𝑚1 ‖𝑚2 ‖ … ) = (𝑚1 + 𝑘 ‖ 𝑚2 + 𝑘 ‖ … ) mod 26,
𝑚 = 𝒟𝑘 (𝑐) = 𝒟𝑘 (𝑐1 ‖𝑐2 ‖ … ) = (𝑐1 − 𝑘 ‖ 𝑐2 − 𝑘 ‖ … ) mod 26.
For 𝑛 = 1, one obtains a monoalphabetic substitution cipher, for example the so-called
Caesar cipher, where 𝑘 = 3: each letter is shifted by three positions, the letter A maps
to D, B maps to E, etc. The Vigenère cipher of length 𝑛 > 1 is an example of a polyal-
phabetic substitution cipher. Although the key can be long, each ciphertext letter only
depends on a single plaintext character.
Example 2.6. A Transposition Cipher over an arbitrary alphabet encrypts a plaintext

block of length 𝑛 by reordering the characters. Keys are given by a random permutation
of {1, 2, … , 𝑛}. Bit permutations are transposition ciphers over the binary alphabet. We
observe that the frequency of characters is preserved by transposition ciphers. ♢
The above examples all use linear or affine transformations. Later we will see that
affine ciphers are often vulnerable to known plaintext attacks which only require stan-
dard linear algebra (see Proposition 4.91). There are also classical nonlinear mono- or
polyalphabetic substitution ciphers. If the plaintext contains text from a known lan-
guage, then such ciphers can often be broken by a frequency analysis. It is also possible
to reveal an unknown length of a polyalphabetic cipher.
2.2. Perfect Secrecy 35
Example 2.7. Another example of a polyalphabetic substitution cipher is the Enigma

rotor cipher machine that was used since the 1920s and in particular by Nazi Germany
during the Second World War. The Enigma is an electro-mechanical cipher machine
that produces one ciphertext letter per keystroke. The key space is quite large and de-
pends on initial rotor positions, ring settings and plug connections of the machine. The
rotor positions change with each character. With considerable effort and cryptoanalyt-
ical machines, the Enigma was eventually broken. We recommend the book [CM13]
for more details about the Enigma system.
2.2. Perfect Secrecy

One might want a cryptosystem that is unbreakable for any type of adversary, even if
they have unlimited computing power. Such a scheme would provide unconditional
security. Claude Shannon defined the notion of perfect secrecy [Sha49].
Definition 2.8. An encryption scheme is perfectly secret if for all plaintexts 𝑚0 , 𝑚1 ∈
ℳ and all ciphertexts 𝑐 ∈ 𝒞:
𝑃𝑟[ℰ𝑘 (𝑚0 ) = 𝑐 ] = 𝑃𝑟[ℰ𝑘 (𝑚1 ) = 𝑐 ].
The probabilities are computed over randomly generated keys 𝑘 ∈ 𝒦. ♢
Perfect secrecy means that all plaintexts have the same probability for a given ci-
phertext. This property is also called perfect indistinguishability: without the secret key,
it is impossible to find out which plaintext was encrypted. An eavesdropper truly learns
nothing about the plaintext from the ciphertext, provided that any key is possible. In
other words: given any ciphertext 𝑐, every plaintext message 𝑚 is exactly as likely to
be the underlying plaintext. Note that the definition requires a fixed plaintext length
since encryption usually does not hide the length of a plaintext message.
The following lemma provides an alternative definition of perfect secrecy.
Lemma 2.9. An encryption scheme is perfectly secret if and only if for every probability
distribution over ℳ, every plaintext 𝑚 and every ciphertext 𝑐 for which 𝑃𝑟[𝑐] > 0, the
probability of 𝑚 and the conditional probability of 𝑚 given 𝑐 coincide:
𝑃𝑟[𝑚 | 𝑐] = 𝑃𝑟[𝑚]. ♢
Although perfect secrecy is a very strong requirement, there is a very simple cipher
which achieves this level of security: the one-time pad (see Example 2.4).
Theorem 2.10. The one-time pad is perfectly secret if the key is generated by a random
bit generator (see Definition 1.56) and is only used once.
Proof. Let 𝑛 be a security parameter of the one-time pad. Suppose 𝑚0 , 𝑚1 are plain-
texts and 𝑐 is a ciphertext of length 𝑛. Then there is exactly one key 𝑘0 of length 𝑛 which
encrypts 𝑚0 into 𝑐 and in fact 𝑘0 = 𝑚0 ⊕ 𝑐. Since we assumed an uniform distribution
1
of keys, we have 𝑃𝑟[ℰ𝑘 (𝑚0 ) = 𝑐] = . The same holds true for 𝑚1 , which proves the
2𝑛
Theorem. □
Example 2.11. Suppose a Vigenère cipher of key length 3 is used to encrypt four char-
acters. Let 𝑐 = (𝑦1 , 𝑦2 , 𝑦3 , 𝑦4 ) be any ciphertext of length four, 𝑚 = (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) the
corresponding plaintext and 𝑘 = (𝑘1 , 𝑘2 , 𝑘3 ) the key. Then
𝑦1 ≡ 𝑥1 + 𝑘1 , 𝑦2 ≡ 𝑥2 + 𝑘2 , 𝑦3 ≡ 𝑥3 + 𝑘3 , 𝑦4 ≡ 𝑥4 + 𝑘1 mod 26.
The difference between the fourth ciphertext and plaintext character is congruent to
the difference between the first ciphertext and plaintext character:
𝑘1 ≡ 𝑦1 − 𝑥1 ≡ 𝑦4 − 𝑥4 mod 26.
This forms a condition for all valid plaintext/ciphertext pairs. If 𝑐 = (𝑦1 , 𝑦2 , 𝑦3 , 𝑦4 ) is
given, then there are many unfeasible plaintexts 𝑚0 , i.e., ℰ𝑘 (𝑚0 ) ≠ 𝑐 for all 𝑘 ∈ 𝒦, and
therefore
𝑃𝑟[ℰ𝑘 (𝑚0 ) = 𝑐] = 0.
On the other hand, plaintexts 𝑚1 = (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) that satisfy the congruence 𝑦1 −𝑥1 ≡
𝑦4 − 𝑥4 mod 26 are possible and their probability is
1
𝑃𝑟[ℰ𝑘 (𝑚1 ) = 𝑐] = 3
26
if 𝑘 ∈ ℤ326 is chosen uniformly at random. Hence the cipher does not have perfect
secrecy. ♢
If the key is shorter than the message, then an encryption scheme cannot be per-
fectly secret: a known ciphertext message changes the posterior probability of a plain-
text message. On the other hand, adversaries with limited resources might not be able
to exploit this situation so that the scheme is still computationally secure. This is dis-
cussed in the next section.
2.3. Computational Security

Practical security should take the computing power of an adversary into account, rather
than assuming unlimited resources. Furthermore, it is reasonable to accept a small
chance of success, since an adversary might guess the correct plaintext or key. The
computational approach estimates the probability of success in terms of time and com-
puting resources.
We consider a scheme to be broken if an adversary can learn something about the
plaintext from the ciphertext, i.e., if they obtain the output of any function of the plain-
text, for example a plaintext bit or a sum of several plaintext bits. In the next section,
breaking a scheme is defined in the context of an experiment, where an adversary has
to answer a challenge.
Definition 2.12. A scheme is (𝑡, 𝜖)-secure if any adversary running in time 𝑡 (measured
in CPU cycles) can break the scheme with probability of 𝜖 at most.
2.4. Indistinguishable Encryptions 37
Example 2.13. Assume that the best-known attack against a scheme is exhaustive key
search (brute force) and that the key has length 𝑛. If testing a single key takes 𝑐 CPU
𝑁
cycles and in total 𝑁 CPU cycles are executed, then keys can be tested and the proba-
𝑐
𝑁 𝑁 𝑁
bility of success is approximately 𝑛 , if ≪ 2𝑛 . Hence the scheme is (𝑁, 𝑛 )-secure.
𝑐2 𝑐 𝑐2
Suppose an adversary uses a computer with one 2 GHz CPU and performs a brute force
attack against a scheme with 128-bit key length over the course of a year. Let’s assume
that 𝑐 = 1. Then roughly 255 keys can be tested and the scheme is (255 , 2−73 )-secure.
Note that an event with a probability of 2−73 will never occur in practice. ♢
We see that concrete values depend on the hardware used, e.g., the type of CPU, as
well as on the implementation of attacks. Now we give an asymptotic version of a defi-
nition of security. In this approach, the running time and the probability of breaking a
scheme are considered as functions of the security parameter 𝑛 (see Remark 2.2), and
one analyzes the behavior for sufficiently large values of 𝑛.
Definition 2.14. An encryption scheme is called computational secure if every proba-
bilistic algorithm with polynomial running time can only break the scheme with neg-
ligible probability in the security parameter 𝑛. ♢
Computational security only provides an asymptotic security guarantee, i.e., if the

security parameter (see Remark 2.2) is sufficiently large.
Example 2.15. If the best possible attack is a brute-force search of a key of length 𝑛
(see Example 2.13) and the running time is bounded by 𝑁 = 𝑝(𝑛), where 𝑝 is a polyno-
𝑝(𝑛) 𝑝(𝑛)
mial, then the scheme is (𝑝(𝑛), 𝑛 )-secure, where 𝑐 is a constant. The probability 𝑛
𝑐⋅2 𝑐⋅2
decreases exponentially to zero as 𝑛 goes to infinity and is thus negligible. Therefore,
the scheme is computationally secure.
2.4. Indistinguishable Encryptions

In Section 2.2 we saw that perfect secrecy requires perfect indistinguishability: for a
given ciphertext, an adversary can only guess which plaintext was encrypted and all
plaintexts (of a given length) occur with exactly the same probability. Now we want
to relax the requirements of perfect secrecy for a more practical definition. Firstly, we
consider only efficient adversaries running in polynomial time. Secondly, we allow a
very small, i.e., negligible advantage over random guesses when an adversary tries to
distinguish between two cases. Indistinguishability (IND) means that efficient adver-
saries are unable to find the correct plaintext out of two possibilities if the ciphertext is
given. The performance of adversaries is not noticeably better than random guessing.
Furthermore, we want to define security under different types of attacks. We spec-
ify a threat model and consider attackers with certain capabilities: it is usually assumed
that adversaries are able to eavesdrop on ciphertext messages, but we also look at ad-
versaries who have access to plaintext/ciphertext pairs (Known Plaintext Attack) or can
choose plaintexts (Chosen Plaintext Attack, CPA). If the adversary can even choose ci-
phertexts and obtain the corresponding plaintexts, then we call this a Chosen Ciphertext
Attack (CCA).
We consider experiments (or games) between two algorithms, a polynomial-time
adversary and a challenger. We denote the adversary by 𝐴 and the challenger by 𝐶.
The challenger takes as input a security parameter and sets up the experiment, for ex-
ample by generating parameters and keys. 𝐶 runs the experiment and interacts with
𝐴. In the experiment, 𝐴 has certain choices and capabilities. Finally, 𝐴 has to answer
a challenge and outputs a single bit. The challenger verifies the answer and outputs 1
(𝐴 was successful and won the game) or 0 (𝐴 failed). Obviously, 𝐴 has a 50% chance
of randomly guessing the correct answer, but 𝐴 might also use a more effective strat-
egy. The game is repeated many times so that a success probability and an advantage
(compared to random guesses) can be computed.
Such experiments may look artificial at first, but they answer the question as to
whether an adversary can obtain at least one bit of secret information by applying an
efficient algorithm. A scheme is considered broken if the probability of success is sig-
nificantly higher than 50%.
In many security experiments, 𝐶 chooses a uniform random secret bit 𝑏 and 𝐴
obtains a challenge that depends on 𝑏. Finally, 𝐴 outputs a bit 𝑏′ and wins the game if
𝑏 = 𝑏′ . Since the experiment is repeated many times, both 𝑏 and 𝑏′ can be considered
as random variables. The following Table 2.1 contains the four combinations of 𝑏 and
𝑏′ and their joint probabilities:
Table 2.1. Joint probabilities of 𝑏 and 𝑏′ .
𝑏′ = 0 𝑏′ = 1
𝑏=0 𝑃𝑟[𝑏′ = 0 ∧ 𝑏 = 0] 𝑃𝑟[𝑏′ = 1 ∧ 𝑏 = 0]
𝑏=1 𝑃𝑟[𝑏′ = 0 ∧ 𝑏 = 1] 𝑃𝑟[𝑏′ = 1 ∧ 𝑏 = 1]
1
If the adversary randomly guesses 𝑏′ , then all four probabilities are close to . On
2
1
the other hand, if 𝐴 is doing a good job, then the diagonal entries are greater than
2
1
and the other two are smaller than .
2
We define 𝐴’s advantage over random guesses as the difference between the prob-
ability of success (output of the experiment is 1) and the probability of failure (output of
2.5. Eavesdropping Attacks 39
the experiment is 0):

Adv(𝐴) = | 𝑃𝑟[𝑏′ = 0 ∧ 𝑏 = 0] + 𝑃𝑟[𝑏′ = 1 ∧ 𝑏 = 1]
− 𝑃𝑟[𝑏′ = 1 ∧ 𝑏 = 0] − 𝑃𝑟[𝑏′ = 0 ∧ 𝑏 = 1] |
= | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] |
= | 𝑃𝑟[𝑂𝑢𝑡(𝐶) = 1] − 𝑃𝑟[𝑂𝑢𝑡(𝐶) = 0] | .
The difference could be negative, so we take the absolute value. A negative advan-
tage would anyway imply a positive advantage for an inverse adversary 𝐴′ who outputs
1 if 𝐴 outputs 0 and vice versa.
1 1
The advantage is close to − = 0 if 𝐴 randomly guesses 𝑏′ and close to 1 − 0 = 1
2 2
if 𝐴 (or the inverse adversary) often succeeds in finding the correct answer.
Remark 2.16. The following definitions are also used in the literature:
1 1
Adv(𝐴) = 2 ⋅ |||𝑃𝑟[𝑏′ = 𝑏] − ||| = 2 ⋅ |||𝑃𝑟[𝑂𝑢𝑡(𝐶) = 1] − ||| ,
2 2
Adv(𝐴) = | 𝑃𝑟[𝑏′ = 1|𝑏 = 1] − 𝑃𝑟[𝑏′ = 1|𝑏 = 0] |.
These formulas give the same advantage Adv(𝐴) as above (see Exercise 6). The first
formula is also used in the literature without the scaling factor 2.
2.5. Eavesdropping Attacks

Firstly, we assume a passive eavesdropper who only observes the ciphertext. This can
be formalized by an experiment, where an adversary chooses two plaintexts and gets
the ciphertext of one of them. Can the adversary find the correct plaintext and get an
advantage over random guesses?
Definition 2.17. Suppose a symmetric encryption scheme is given and consider the
following indistinguishability experiment (see Figure 2.2). A challenger takes the secu-
rity parameter 1𝑛 as input, generates a key 𝑘 ∈ 𝒦 by running 𝐺𝑒𝑛(1𝑛 ) and chooses a
$
random bit 𝑏 ← {0, 1}. A probabilistic polynomial-time adversary 𝐴 is given 1𝑛 , but nei-
ther 𝑘 nor 𝑏 are known to 𝐴. The adversary chooses two plaintexts 𝑚0 and 𝑚1 that are
equal in length. The challenger returns the ciphertext ℰ𝑘 (𝑚𝑏 ) of one of them. 𝐴 tries
to guess 𝑏 (i.e., tries to find out which of the two plaintexts was encrypted) and outputs
a bit 𝑏′ . The challenger outputs 1 if 𝑏 = 𝑏′ , and 0 otherwise. The EAV advantage of 𝐴
is defined as
eav
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The probability is taken over all random variables in this experiment, i.e., the key 𝑘, bit
𝑏, encryption ℰ𝑘 and randomness of 𝐴. ♢
A scheme is insecure under an EAV attack if the advantage of a smart adversary

is not negligible, so that an adversary can successfully derive some information about
the plaintext from the ciphertext.
Adversary Challenger
1𝑛 $ $
𝑘 ← 𝐺𝑒𝑛(1𝑛 ), 𝑏 ← {0, 1}
𝑚0 , 𝑚1
Choose 𝑚0 , 𝑚1
|𝑚0 | = |𝑚1 |
𝑐
𝑐 ← ℰ𝑘 (𝑚𝑏 )
𝑏′
Select 𝑚0 (𝑏′ = 0) Compare 𝑏 and 𝑏′ ,
or 𝑚1 (𝑏′ = 1) output 1 or 0
Figure 2.2. Indistinguishability experiment in the presence of an eavesdropper.
Definition 2.18. An encryption scheme has indistinguishable encryptions in the pres-

ence of an eavesdropper (IND-EAV secure or EAV-secure) if for every probabilistic
eav
polynomial-time adversary 𝐴, the advantage Adv (𝐴) is negligible in the security pa-
rameter 𝑛. ♢
The definition can also be used to show that a particular scheme does not have EAV
security. In this case, it suffices to give one example of a polynomial-time algorithm
which achieves a non-negligible advantage in the EAV experiment.
Example 2.19. Suppose a scheme does not encrypt the first bit, i.e., the first plaintext
bit and the first ciphertext bit coincide. Then the scheme does not have EAV security:
an adversary could choose two plaintexts that differ in their first bit. In this way they
are able to identify the correct plaintext from the challenge ciphertext.
Remark 2.20. It may seem surprising that the adversary can choose the plaintexts in
an eavesdropping attack. It would be conceivable to let the challenger select the plain-
texts. However, eavesdropping security should ensure that encryption protects every
plaintext, not just selected or random plaintexts.
Furthermore, there are real-world situations where the plaintext space is rather
small, say {0, 1} or {YES, NO}. Such plaintexts also deserve protection. The EAV exper-
iment is perfect for modeling this situation. ♢
We may also adopt a concrete approach to eavesdropping security (see Section 2.3):
Definition 2.21. An encryption scheme is (𝑡, 𝜖)-secure in the presence of an eaves-
dropper if for every probabilistic adversary 𝐴 running in time 𝑡, the advantage of 𝐴 is
less than 𝜖:
eav
Adv (𝐴) < 𝜖. ♢
2.6. Chosen Plaintext Attacks 41
𝑘 𝑏
𝑚0
ℰ 𝑐
𝑚1
Figure 2.3. Indistinguishability: no efficient algorithm can tell the two cases apart if
𝑘 and 𝑏 are secret.
Remark 2.22. This is the first in a series of similar experiments and the reader is
invited to think through the definitions. The adversary 𝐴 chooses two plaintexts of the
same length. The challenger encrypts one of the plaintexts and gives the ciphertext
to the adversary. Then 𝐴 has to distinguish between the two cases, i.e., find out which
plaintext was encrypted (see Figure 2.3). The question is whether 𝐴 finds a clever
way to tackle the challenge, or else falls back on random guesses. The advantage is
finally computed over many games with sample keys 𝑘, random bits 𝑏, ciphertexts 𝑐
and other randomness used by the adversary.
In practice, one would not really conduct a large number of such experiments. In
particular, modeling the possible strategies of an adversary seems rather difficult, but
the above definition is useful since it clearly states the requirements for eavesdropping
security and defines under which condition a scheme is considered broken.
Remark 2.23. Our security definition is based on indistinguishability. One can show
that this is equivalent to semantic security: an adversary cannot learn any partial infor-
mation about the plaintext from the ciphertext. This means that any function of the
plaintext, say extracting one bit, is hard to compute and polynomial-time adversaries
cannot do any better than random guessing. We refer to the literature for more details
on semantic security ([KL15], [Gol01]).
2.6. Chosen Plaintext Attacks

Now we turn to a more powerful adversary who has flexible access to an encryption or-
acle: they can freely choose plaintexts and obtain the corresponding ciphertexts. This
might help to decrypt a challenge ciphertext, at least if there are only two plaintext can-
didates. Can they do better than randomly guessing the correct plaintext? This is mea-
sured by the IND-CPA advantage, which is defined analogously to the EAV-advantage
above.
Definition 2.24. Suppose a symmetric encryption scheme is given and an adversary
𝐴 has access to an encryption oracle. Consider the following experiment (see Figure
2.4). A challenger takes the security parameter 1𝑛 as input, generates a key 𝑘 ∈ 𝒦 and
$
chooses a uniform random bit 𝑏 ← {0, 1}. A probabilistic polynomial-time adversary 𝐴
is given 1𝑛 , but 𝑘 and 𝑏 are not known to 𝐴. The adversary can choose arbitrary plain-
texts and get the corresponding ciphertext from an encryption oracle. The adversary
then chooses two different plaintexts 𝑚0 and 𝑚1 of the same length. The challenger
returns the ciphertext ℰ𝑘 (𝑚𝑏 ) of one of them. The adversary 𝐴 continues to have access
to the encryption oracle. Finally, 𝐴 tries to guess 𝑏 and outputs a bit 𝑏′ . The challenger
outputs 1 if 𝑏 = 𝑏′ , and 0 otherwise. The IND-CPA advantage of 𝐴 is defined as
ind−cpa
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
𝑏, encryption ℰ𝑘 and randomness of 𝐴.
Definition 2.25. A scheme has indistinguishable encryptions under chosen plaintext

attack (IND-CPA secure or CPA-secure) if for every probabilistic polynomial-time ad-
ind−cpa
versary 𝐴, the advantage Adv (𝐴) is negligible in 𝑛. ♢
Adversary Challenger/Oracle
1𝑛 $ $
𝑘 ← 𝐺𝑒𝑛(1𝑛 ), 𝑏 ← {0, 1}
𝑚′
Choose plaintext 𝑚′
𝑐′
𝑐′ ← ℰ𝑘 (𝑚′ )
⋮
𝑚0 , 𝑚1
|𝑚0 | = |𝑚1 |
𝑐
𝑚″
Choose plaintext 𝑚″
𝑐″
𝑐″ ← ℰ𝑘 (𝑚″ )
⋮
𝑏′
Figure 2.4. CPA indistinguishability experiment. The adversary may repeatedly ask
for the encryption of chosen plaintexts 𝑚′ , 𝑚″ .
2.7. Chosen Ciphertext Attacks 43
Security under chosen plaintext attack is quite a strong condition. It basically says
that an adversary is not able to obtain a single bit of information from a given ciphertext,
even if they can ask for the encryption of arbitrary plaintexts.
Remark 2.26. The definition immediately implies that a deterministic encryption
scheme cannot be secure under IND-CPA. An adversary can, in fact, ask the oracle for
the encryption of the chosen plaintexts 𝑚0 and 𝑚1 . Then they only need to compare
the returned ciphertext with the challenge ciphertext. If the scheme is deterministic,
the IND-CPA advantage is equal to 1, but for a non-deterministic scheme, each encryp-
tion of a fixed plaintext can yield another ciphertext. Therefore, a simple comparison
cannot be used in the non-deterministic case.
Example 2.27. Suppose an encryption scheme is probabilistic, but the first ciphertext
bit depends only on the plaintext and the key and not on the randomness of the encryp-
tion algorithm. If the first ciphertext bit is not constant, then the scheme is not secure
under IND-CPA, even if the method is very strong otherwise. An adversary would gen-
erate multiple plaintexts of the same length and ask for encryption, until they find two
plaintexts 𝑚0 , 𝑚1 such that the corresponding ciphertexts differ in their first bit. Then
they choose 𝑚0 and 𝑚1 in the CPA experiment. The first bit of the challenge ciphertext
reveals which of the two plaintexts was encrypted by the challenger.
Remark 2.28. Constant patterns in the ciphertext do not violate IND-CPA (or IND-
EAV) security, and it is not required that the ciphertext looks like a random sequence.
In fact, an adversary cannot leverage constant parts of the ciphertext in order to obtain
information about the plaintext. ♢
In practice, one wants to encrypt multiple messages with the same key. This is not
directly addressed in the EAV and CPA experiments, where an adversary has to find the
plaintext for a single ciphertext message. The generalization of these games for multiple
encryptions allows the adversary to provide multiple pairs of plaintext messages. A left-
or-right oracle encrypts either the left or the right plaintext of each pair (depending on
the secret bit 𝑏) and returns the ciphertexts.
In fact, EAV security for multiple messages is stronger than EAV security for a sin-
gle message (see Exercise 11). On the other hand, CPA-secure schemes remain secure
when the adversary has access to a left-or-right encryption oracle:
Proposition 2.29. If an encryption scheme is CPA-secure (see Definition 2.25), then it is
CPA-secure for multiple encryptions.
2.7. Chosen Ciphertext Attacks

After discussing EAV and CPA security, we now consider even more powerful attack-
ers. Security under a chosen ciphertext attack is defined in a similar way to IND-CPA
security, but additionally gives the adversary the power to request the decryption of
any chosen ciphertext except the challenge ciphertext.
Definition 2.30. Suppose a symmetric encryption scheme is given. Consider the fol-
lowing experiment (see Figure 2.5). On input 1𝑛 a challenger generates a random key
$
𝑘 ∈ 𝒦 and a random bit 𝑏 ← {0, 1}. A probabilistic polynomial-time adversary 𝐴
is given 1𝑛 , but 𝑘 and 𝑏 are not known to 𝐴. The adversary can ask an oracle to en-
crypt arbitrary plaintexts and to decrypt ciphertexts. The adversary chooses two dif-
ferent plaintexts 𝑚0 and 𝑚1 of the same length. The challenger returns the ciphertext
𝑐 = ℰ𝑘 (𝑚𝑏 ) of one of them. The adversary 𝐴 continues to have access to the encryption
and decryption oracle, only decryption of the challenge ciphertext 𝑐 is not permitted.
Finally, 𝐴 tries to guess 𝑏 and outputs a bit 𝑏′ . The challenger outputs 1 if 𝑏 = 𝑏′ , and
0 otherwise. Then the IND-CCA advantage of 𝐴 is defined by
ind−cca
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
𝑏, encryption ℰ𝑘 and randomness of 𝐴. ♢
1𝑛 $ $
𝑘 ← 𝐺𝑒𝑛(1𝑛 ), 𝑏 ← {0, 1}
𝑚′ or 𝑐′
Choose plaintext 𝑚′ 𝑐′ ← ℰ𝑘 (𝑚′ )
𝑐′ or 𝑚′
or ciphertext 𝑐′ or 𝑚′ ← 𝒟𝑘 (𝑐′ )
⋮
𝑚0 , 𝑚1
|𝑚0 | = |𝑚1 |
𝑐
𝑚″ or 𝑐″
Choose plaintext 𝑚″ 𝑐″ ← ℰ𝑘 (𝑚″ )
𝑐″ or 𝑚″
or ciphertext 𝑐″ ≠ 𝑐 or 𝑚″ ← 𝒟𝑘 (𝑐″ )
⋮
𝑏′
Figure 2.5. CCA2 indistinguishability experiment. The adversary may repeatedly ask
for the encryption of chosen plaintexts 𝑚′ , 𝑚″ and for the decryption of chosen cipher-
texts 𝑐′ , 𝑐″ except the challenge 𝑐.
Definition 2.31. A scheme has indistinguishable encryptions under adaptive chosen

ciphertext attack (IND-CCA2 secure or CCA2-secure) if for every probabilistic poly-
ind−cca
nomial-time adversary 𝐴, the advantage Adv (𝐴) is negligible in 𝑛. ♢
In the literature, one distinguishes between IND-CCA1 and IND-CCA2. In the

IND-CCA1 experiment, the attack is not adaptive and the adversary may use the de-
cryption oracle only before being given the challenge. In contrast, the above IND-CCA2
experiment allows the adversary to adapt their queries to the challenge. CCA2 security
is stronger than CCA1 security, and CCA security mostly refers to the CCA2 experi-
ment.
CCA2-security requires non-malleability. We say a ciphertext 𝑐 is malleable if an
adversary can produce a valid ciphertext 𝑐′ ≠ 𝑐 such that the corresponding plaintexts
𝑚 and 𝑚′ are related. Note that tampering with the ciphertext does not require de-
cryption. Standard examples of malleability are encryption schemes where flipping
one ciphertext bit changes only one plaintext bit, e.g., a block cipher in CTR mode (see
Definition 2.48 below).
2.8. Pseudorandom Generators

Randomness and pseudorandomness play a fundamental role in cryptography, espe-
cially for encryption, but also for other cryptographic operations. A pseudorandom
string looks like a uniform string without being the output of a random bit generator
(compare Section 1.5). A pseudorandom generator is a deterministic algorithm that
stretches a short truly random seed and outputs a long output sequence that appears
to be uniform random. Pseudorandom generators are basic building blocks for many
cryptographic constructions.
Definition 2.32. Let 𝐺 be a deterministic polynomial-time algorithm that takes an
input seed 𝑠 ∈ {0, 1}𝑛 and outputs a string 𝐺(𝑠) ∈ {0, 1}𝑙(𝑛) , where 𝑙(.) is a polynomial
and 𝑙(𝑛) > 𝑛 for all 𝑛 ∈ ℕ. Then 𝐺 is called a pseudorandom generator (prg) if the
output cannot be distinguished from a uniform random sequence in polynomial time.
$
Consider the following experiment (see Figure 2.6): on input 1𝑛 a seed 𝑠 ← {0, 1}𝑛
$
and a bit 𝑏 ← {0, 1} are chosen uniformly at random. A probabilistic polynomial-time
adversary 𝐴 obtains 1𝑛 , but knows neither 𝑠 nor 𝑏. A challenger chooses a uniform
$
random string 𝑟 ← {0, 1}𝑙(𝑛) . If 𝑏 = 0 then 𝑐 = 𝑟 is given to 𝐴. Otherwise, 𝐴 receives
𝑐 = 𝐺(𝑠). The adversary tries to distinguish between the two cases and outputs a bit
𝑏′ . The challenger outputs 1 if 𝑏 = 𝑏′ , and 0 otherwise. Then the prg-advantage of 𝐴 is
defined as
prg
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The probability is taken over all random variables in this experiment, i.e., the seed 𝑠,
bit 𝑏, string 𝑟 and randomness of 𝐴.
𝐺 is called a pseudorandom generator if for all probabilistic polynomial-time dis-

tinguisher 𝐴, the prg-advantage is negligible in 𝑛.
1𝑛 $ $
𝑠 ← {0, 1}𝑛 , 𝑏 ← {0, 1}
𝑐 $ 𝐺(𝑠) if 𝑏 = 1
𝑟 ← {0, 1}𝑙(𝑛) , 𝑐 = {
𝑟 if 𝑏 = 0
𝑏′
Distinguish Compare 𝑏 and 𝑏′ ,
output 1 or 0
Figure 2.6. Distinguishability experiment for a pseudorandom generator 𝐺.
Remark 2.33. A pseudorandom generator is never a truly random bit generator! Be-
cause of the expansion from length 𝑛 to 𝑙(𝑛) > 𝑛, the distribution of output values
cannot be uniform. In fact, many strings of length 𝑙(𝑛) do not occur in the image of 𝐺
since the domain {0, 1}𝑛 is too small, but with limited resources, the output of 𝐺 looks
random and cannot be distinguished from a truly random sequence (see Figure 2.7).♢
The construction of a pseudorandom generator is a non-trivial task. Obvious con-

structions, like XORing some seed bits in order to produce new bits, do not work since
the generated sequence can easily be distinguished from a truly random sequence. The
output must be unpredictable, i.e., a polynomial-time adversary who is given the first
𝑖 bits of the output, is unable to predict the (𝑖 + 1)st bit with a probability noticeably
higher than 50%. In other words, the generator must pass the next-bit test. One can
show the following Theorem (see [Gol01], Theorem 3.3.7):
Theorem 2.34. A generator is pseudorandom if and only if it is unpredictable in poly-

nomial time. ♢
Proving that a pseudorandom generator is unpredictable is straightforward (see

Exercise 12), but the opposite direction is harder.
Remark 2.35. An unpredictable generator is also called a cryptographically secure

pseudorandom number generator (CSPRNG or CPRNG). There is a statistical test suite
for pseudorandom generators produced by the American NIST [RSN+ 10], but note
that passing the complete test battery does not prove the pseudorandomness. On the
other hand, failing one of the tests indicates that the generator is not pseudorandom
(unless the deviation from the expected result is in turn random). ♢
𝑏 𝑠
𝐺 𝐺 (𝑠)
True
𝑐
𝑏 = 1?
False 𝑟
$
𝑟 ← {0, 1}𝑙(𝑛)
Figure 2.7. Pseudorandomness: adversaries cannot tell the two cases apart if 𝑏 and 𝑠
are secret.
The output of a pseudorandom generator can be used as a keystream. Like the one-
time pad, XORing the plaintext and the keystream defines a fixed-length encryption
scheme. This type of scheme is called a stream cipher. Further details on stream ciphers
can be found in Chapter 6.
Definition 2.36. Let 𝑙(.) be a polynomial. A pseudorandom generator 𝐺, which on
input 𝑘 ∈ {0, 1}𝑛 produces an output sequence 𝐺(𝑘) ∈ {0, 1}𝑙(𝑛) , defines a fixed-length
encryption scheme by the following construction:
The key generation algorithm 𝐺𝑒𝑛(1𝑛 ) takes the security parameter 1𝑛 as input
$
and outputs a uniform random key 𝑘 ← {0, 1}𝑛 of length 𝑛. Set ℳ = 𝒞 = {0, 1}𝑙(𝑛) .
Encryption ℰ𝑘 and decryption 𝒟𝑘 are identical and are defined by XORing the output
stream 𝐺(𝑠) with the input data:
𝑐 = ℰ𝑘 (𝑚) = 𝑚 ⊕ 𝐺(𝑘) and 𝑚 = 𝒟𝑘 (𝑐) = 𝑐 ⊕ 𝐺(𝑘). ♢
Theorem 2.37. If 𝐺 is a pseudorandom generator, then the associated encryption scheme

has indistinguishable encryptions in the presence of an eavesdropper (EAV-secure).
Proof. We only sketch a proof by reduction. For more details we refer the reader to
[KL15].
Suppose the encryption scheme is not EAV-secure; then there is a polynomial-time
eav
algorithm 𝐴 with a non-negligible advantage Adv (𝐴) in the EAV experiment (see
Definition 2.18). We construct an adversary 𝐵 in the prg distinguishability experiment
which uses 𝐴 as a subroutine. 𝐵 obtains a challenge string 𝑐 of length 𝑙(𝑛) and has to
determine whether 𝑐 was generated by 𝐺 or chosen uniformly. Now 𝐵 runs 𝐴, chooses
$
a uniform bit 𝑏 ← {0, 1} and obtains a pair 𝑚0 , 𝑚1 of messages of length 𝑙(𝑛) from 𝐴.
Subsequently, 𝐵 gives the challenge 𝑐 ⊕ 𝑚𝑏 to 𝐴. Finally, 𝐴 outputs 𝑏′ and 𝐵 observes
whether or not 𝐴 succeeds. Remember that we assumed that 𝐴 does a good job in the
EAV experiment, and so a correct output of 𝐴 indicates that 𝑐 was produced by the
generator 𝐺. Therefore, 𝐵 outputs 1, i.e., 𝐵 guesses that 𝑐 = 𝐺(𝑠) if 𝐴 succeeds (𝑏 = 𝑏′ ).
Otherwise, 𝐵 outputs 0, i.e., 𝐵 guesses that 𝑐 is a random string.
eav prg
If Adv (𝐴) is non-negligible, then Adv (𝐵) is non-negligible, too. Furthermore,
𝐵 runs in polynomial time. This contradicts our assumption that 𝐺 is a pseudorandom
generator and proves the theorem. □
Now we know that the construction described in Definition 2.36 has a security
guarantee, but a disadvantage is that the message length 𝑙(𝑛) is fixed for a given security
parameter 𝑛. Furthermore, the above encryption scheme is not CPA-secure (since it is
deterministic) and not EAV-secure for multiple encryptions with the same seed or key
(see Exercise 11). Like the one-time pad, the same keystream must not be re-used for
two or more plaintexts.
In Chapter 6, we will deal with variable-length pseudorandom generators and
stream ciphers. They take a seed (or key) and an initialization vector as input, use an
internal state and recursively generate as many output bits as required.
2.9. Pseudorandom Functions

CPA-secure schemes are often based on pseudorandom functions and permutations. Be-
low, we look at the definition of such a family of functions. Constructing pseudoran-
dom functions and permutations is a difficult task and we will have to wait until Section
5.2 for a concrete instance (AES).
We consider a family of functions:
𝐹 = 𝐹(𝑛) ∶ 𝒦𝑛 × 𝐷𝑛 → 𝑅𝑛 ,
where 𝑛 is a security parameter, 𝒦𝑛 = {0, 1}𝑛 the set of keys, 𝐷𝑛 = {0, 1}𝑙1 (𝑛) the domain,
𝑅𝑛 = {0, 1}𝑙2 (𝑛) the range and 𝑙1 (.), 𝑙2 (.) are polynomials in 𝑛. We suppose that 𝐹 can
be computed in polynomial time. Fixing 𝑘 ∈ 𝒦𝑛 yields a function 𝐹𝑘 ∶ 𝐷𝑛 → 𝑅𝑛 .
The family 𝐹 is called pseudorandom if the functions 𝐹𝑘 ∶ 𝐷𝑛 → 𝑅𝑛 appear to be
random for a polynomial-time adversary who knows the input-output behavior of 𝐹𝑘 ,
but is not given the secret key 𝑘. In other words, a polynomial-time adversary is not
able to distinguish a pseudorandom function from a truly random function.
To simplify the notation we assume that the key length is 𝑛 (the security parameter)
and that the input and output lengths are equal. We write 𝑙1 (𝑛) = 𝑙2 (𝑛) = 𝑙 and bear in
mind that 𝑙 depends on 𝑛. Then 𝐷𝑛 = 𝑅𝑛 = {0, 1}𝑙 and
𝐹 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙 .
Definition 2.38. Let 𝐹 be a public family of functions as outlined above. Consider the
$
following experiment (see Figure 2.8): on input 1𝑛 a random key 𝑘 ← 𝒦 of length 𝑛
$
and a random bit 𝑏 ← {0, 1} are chosen. A polynomial-time adversary 𝐴 obtains 1𝑛 , but
knows neither 𝑘 nor 𝑏. Furthermore, a challenger chooses a uniform random function

𝑓0 ∶ {0, 1}𝑙 → {0, 1}𝑙 and sets 𝑓 = 𝑓0 if 𝑏 = 0 and 𝑓 = 𝐹𝑘 if 𝑏 = 1. The adversary
has oracle access to 𝑓, i.e., 𝐴 can choose input values and obtain the output of 𝑓. The
adversary tries to find out which function was used and outputs a bit 𝑏′ . The challenger
outputs 1 if 𝑏 = 𝑏′ , and 0 otherwise. The prf-advantage of 𝐴 is defined as
prf
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] |.
𝑏, function 𝑓0 and randomness of 𝐴. ♢
If the adversary is not able to distinguish between these functions, then the advan-
tage is close to 0. Pseudorandom functions have the property that the prf-advantage is
negligible in 𝑛.
Definition 2.39. A keyed function family 𝐹 as described above is called a pseudo-
random function (prf) if, for every probabilistic polynomial time adversary 𝐴, the prf-
prf
advantage Adv (𝐴) is negligible in 𝑛. ♢
$ $
1𝑛 𝑘 ← {0, 1}𝑛 , 𝑏 ← {0, 1}
𝐹 if 𝑏 = 1
𝑓={ 𝑘
random if 𝑏 = 0
𝑚
Choose 𝑚
𝑐
𝑐 = 𝑓(𝑚)
⋮
𝑏′
output 1 or 0
Figure 2.8. Distinguishability experiment for a pseudorandom function. The adver-

sary can ask for multiple values of 𝑓.
Pseudorandomness is a strong property that cannot be achieved by simple con-

structions. In Chapter 4 we will see that linear or affine families of functions cannot
be pseudorandom (Proposition 4.93).
Remark 2.40. A pseudorandom generator 𝐺 can be derived from a pseudorandom
function 𝐹 by the following construction: choose a uniform random counter 𝑐𝑡𝑟 and
set
𝐺(𝑘) = 𝐹𝑘 (𝑐𝑡𝑟 + 1) ‖ 𝐹𝑘 (𝑐𝑡𝑟 + 2) ‖ 𝐹𝑘 (𝑐𝑡𝑟 + 3) ‖ … ,
where 𝑐𝑡𝑟 is viewed as an integer and addition is done modulo 2𝑛 . Incrementing the
counter allows you to generate output of (almost) arbitrary length. The opposite, i.e.,
constructing a pseudorandom function from a pseudorandom generator, is also pos-
sible but more elaborate. In practice, one prefers pseudorandom functions as a basic
building block. ♢
Pseudorandom permutations are an important special case of pseudorandom func-

tions. They are given by a keyed family of permutations:
𝐹 = 𝐹(𝑛) ∶ 𝒦𝑛 × 𝐷𝑛 → 𝐷𝑛 .
For any 𝑘 ∈ 𝒦, the function 𝐹𝑘 ∶ 𝐷𝑛 → 𝐷𝑛 is a permutation, i.e., 𝐹𝑘 is bijective. As
above, we assume that the key length is 𝑛 and the strings in 𝐷 are of length 𝑙(𝑛) = 𝑙,
i.e. 𝒦𝑛 = {0, 1}𝑛 , 𝐷𝑛 = {0, 1}𝑙 and
𝐹 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙 .
A family 𝐹 of permutations is called pseudorandom if polynomial-time adversaries

are not able to distinguish 𝐹 from a truly random permutation. The following defini-
tion is analogous to Definition 2.38.
Definition 2.41. Let 𝐹 be a public family of permutations. Consider the following
$ $
experiment: on input 1𝑛 a random key 𝑘 ← 𝒦 of length 𝑛 and a random bit 𝑏 ← {0, 1}
are generated. A polynomial-time adversary 𝐴 obtains 1𝑛 , but knows neither 𝑘 nor
𝑏. Furthermore, a challenger chooses a uniform random permutation 𝑓0 of {0, 1}𝑛 and
sets 𝑓 = 𝑓0 if 𝑏 = 0 and 𝑓 = 𝐹𝑘 if 𝑏 = 1. The adversary has oracle access to 𝑓, i.e.,
𝐴 can choose input values and obtain the output of 𝑓. The adversary tries to find out
which function was used and outputs a bit 𝑏′ . The challenger outputs 1 if 𝑏 = 𝑏′ , and
0 otherwise. The prp-cpa-advantage of 𝐴 is defined as
prp−cpa
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
𝑏, permutation 𝑓0 and randomness of 𝐴. ♢
We may also give the adversary additional access to the inverse permutation.
Definition 2.42. A family of permutations 𝐹, as described above, is called a pseudo-
random permutation (prp) if, for every probabilistic polynomial time adversary 𝐴, the
prp−cpa
prp-cpa-advantage Adv (𝐴) is negligible in 𝑛. If adversaries also have oracle ac-
cess to the inverse function 𝑓−1 (see Figure 2.9) and the advantage is negligible, then
𝐹 is said to be a strong pseudorandom permutation. ♢
Remark 2.43. A pseudorandom permutation is also a pseudorandom function. The

prp and prf advantages are almost identical if domain and range coincide, but there
is one issue: suppose we want to use a pseudorandom permutation as a pseudoran-
dom function. Since permutations do not have any collisions (different inputs must
$ $
1𝑛 𝑘 ← {0, 1}𝑛 , 𝑏 ← {0, 1}
𝐹 if 𝑏 = 1
𝑓={ 𝑘
random if 𝑏 = 0
𝑚
Choose 𝑚
𝑐
𝑐 = 𝑓(𝑚)
𝑐′
Choose 𝑐′
𝑚′
𝑚′ = 𝑓−1 (𝑐′ )
⋮
𝑏′
output 1 or 0
Figure 2.9. Distinguishability experiment for a strong pseudorandom permutation.

The adversary can ask for multiple values of 𝑓 and 𝑓−1 .
have different outputs), an adversary might distinguish the permutation from a ran-
dom function by computing many input-output pairs and searching for collisions. Sup-
pose that the domain is 𝐷 = {0, 1}𝑙 . By the Birthday Paradox (see Proposition 1.61), a
random function will probably have collisions after around 2𝑙/2 samples. However, a
polynomial-time adversary is not able to check an exponential number of values. In
fact, the prp/prf switching lemma [BR06] states that the difference between the prp and
prf advantages is negligible.
Remark 2.44. In the experiments for pseudorandom functions and permutations, the
secret key 𝑘 is fixed and an adversary can only obtain input-output examples for that
key. Now, under a related-key attack (RKA) they can also study the input-output be-
havior for keys 𝑘1 , 𝑘2 , … related to 𝑘. In an RKA experiment one has to specify which
keys are related. Examples include keys with a fixed difference Δ to 𝑘, i.e. 𝑘 + Δ, or
the complemented key 𝑘, where all key bits are flipped. Note that 𝑘 and its related keys
are still unknown to an adversary, but now they can ask the oracle to use a related key
instead of 𝑘. This gives the adversary more power and pseudorandomness with respect
to a related-key attack is stronger than the usual notion of a prp or prf. We refer the
reader to [BK03] for details.
Example 2.45. Suppose a function family has the following complementation prop-
erty for all keys 𝑘 and input 𝑚:
𝐹𝑘 (𝑚) = 𝐹𝑘 (𝑚).
If the oracle accepts the complemented key 𝑘 as related to 𝑘, then an adversary can
easily distinguish 𝐹 from a random function simply by checking the above equation.
It is known that the former Data Encryption Standard (DES) has the complemen-
tation property and the cipher is thus vulnerable to related-key attacks. ♢
In the next section, we explain how a pseudorandom permutation can be turned

into an encryption scheme. In practice, one selects a block cipher, for example AES (see
Section 5.2), and combines it with an operation mode in order to obtain an encryption
scheme. The block cipher is modeled to behave as a pseudorandom permutation. It
should be noted however that there is no hard proof or guarantee (based on well-known
assumptions) that such a block cipher really is a pseudorandom permutation. Number-
theoretic one-way permutations can be used to define pseudorandom permutations, but
they are less efficient and are only used in public-key cryptography (see Chapter 9).
2.10. Block Ciphers and Operation Modes

Block ciphers are a very important building block of modern encryption schemes.
Definition 2.46. A family of permutations
𝐸 = 𝐸(𝑛) ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙
is said to be a block cipher. 𝑛 is the security parameter and the key length, and {0, 1}𝑛 is
the key space. 𝑙 = 𝑙(𝑛) is called the block length. ♢
Block ciphers can be thought of as concrete instances of families of pseudorandom

permutations. Each key 𝑘 ∈ {0, 1}𝑛 yields a bijective function
𝐸𝑘 ∶ {0, 1}𝑙 → {0, 1}𝑙 ,
i.e., a permutation of {0, 1}𝑙 . Key and block lengths are often fixed or restricted to a few
values, for example 𝑙 = 128 and 𝑛 ∈ {128, 192, 256}. Nevertheless, secure block ciphers
should behave as pseudorandom permutations. A specific example of an important
block cipher (AES) is studied in Section 5.2.
Block ciphers only encrypt messages of a fixed block length 𝑙, but the combination
of a block cipher and an operation mode defines a variable-length encryption scheme.
Below, we present ECB, CBC and Counter (CTR) modes (see [Dwo01]). Additional
modes are dealt with in subsequent chapters: OFB and CFB modes in Section 6.1 and
the GCM mode in Section 8.4.
Definition 2.47. (ECB and CBC mode) Let
𝐸 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙
be a block cipher. We define two variable-length symmetric encryption schemes based
on 𝐸. In both cases, the key generation algorithm 𝐺𝑒𝑛(1𝑛 ) outputs a uniform random
$
key 𝑘 ← {0, 1}𝑛 of length 𝑛. A binary plaintext message 𝑚 is split into blocks of length
𝑙 = 𝑙(𝑛) and a padding mechanism is applied, if the message length is not a multiple
of 𝑙. A message of length 𝐿 can be padded by appending a 1 followed by the necessary
number of zeros:
𝑥1 𝑥2 … 𝑥𝐿 10 … 0.
We write 𝑚 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 , where each 𝑚𝑖 is a block of length 𝑙.
• (ECB mode) Each block is encrypted separately using 𝐸𝑘 , so that
𝑐𝑖 = 𝐸𝑘 (𝑚𝑖 ) for 𝑖 = 1, 2, … , 𝑁, and 𝑐 = ℰ𝑘 (𝑚) = 𝑐1 ‖𝑐2 ‖ … ‖𝑐𝑁 .
Decryption works in a similar way:
𝑚𝑖 = 𝐸𝑘−1 (𝑐𝑖 ) for 𝑖 = 1, 2, … , 𝑁, and 𝑚 = 𝒟𝑘 (𝑐) = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
$
• (Randomized CBC mode) An initialization vector 𝐼𝑉 ← {0, 1}𝑙 is chosen uni-
formly at random for each message. Then define
𝑐0 = 𝐼𝑉,
𝑐𝑖 = 𝐸𝑘 (𝑚𝑖 ⊕ 𝑐𝑖−1 ) for 𝑖 = 1, 2, … , 𝑁, and
𝑐 = ℰ𝑘 (𝑚) = 𝑐0 ‖𝑐1 ‖𝑐2 ‖ … ‖𝑐𝑁 .
Decryption is defined by
𝑚𝑖 = 𝐸𝑘−1 (𝑐𝑖 ) ⊕ 𝑐𝑖−1 for 𝑖 = 1, 2, … , 𝑁 and
𝑚 = 𝒟𝑘 (𝑐) = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁
(see Figure 2.10). ♢
Figure 2.10. Encryption and decryption in CBC mode.
We can easily verify that the CBC mode has correct decryption:
𝐸𝑘−1 (𝑐𝑖 ) ⊕ 𝑐𝑖−1 = 𝐸𝑘−1 (𝐸𝑘 (𝑚𝑖 ⊕ 𝑐𝑖−1 )) ⊕ 𝑐𝑖−1 = 𝑚𝑖 ⊕ 𝑐𝑖−1 ⊕ 𝑐𝑖−1 = 𝑚𝑖 .
The ECB mode has a straightforward definition, but the mode turns out to be in-
secure and should be avoided. The scheme is deterministic and thus cannot be CPA-
secure. Neither is the scheme EAV-secure, since plaintext patterns are preserved. Sup-
pose, for example, that 𝑚 = 𝑚1 ‖𝑚2 ‖𝑚1 ‖𝑚2 ; then the ciphertext has the same pattern:
𝑐 = 𝑐1 ‖𝑐2 ‖𝑐1 ‖𝑐2 .
CBC is a popular mode, which can be CPA-secure when properly applied (see The-
orem 2.52). The additional computational load, compared to the ECB mode, is very
low. On the other hand, encryption in CBC mode cannot be parallelized.
Encryption schemes can also be based on a family of functions which are not nec-
essarily bijective. The counter mode described below is often used in practice and ef-
fectively turns a block cipher into a stream cipher (see Chapter 6).
Definition 2.48. (Randomized CTR mode) Let
𝐹 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙
be a family of functions. We define a variable-length symmetric encryption scheme
based on 𝐹. The key generation algorithm 𝐺𝑒𝑛(1𝑛 ) outputs a uniform random key
$
𝑘 ← {0, 1}𝑛 of length 𝑛. A plaintext message 𝑚 is split into blocks of length 𝑙 = 𝑙(𝑛),
where the last block can be shorter:
𝑚 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
$
A uniform random counter 𝑐𝑡𝑟 ← {0, 1}𝑙 is chosen and viewed as an integer. For each
block in the message, the counter is incremented (where addition is done modulo 2𝑙 )
and 𝐹𝑘 is applied to the counter. The output is used as a keystream and the ciphertext
is obtained by XORing plaintext and the keystream (see Figure 2.11):
𝑐𝑖 = 𝐹𝑘 (𝑐𝑡𝑟 + 𝑖) ⊕ 𝑚𝑖 for 𝑖 = 1, 2, … , 𝑁 and
𝑐 = ℰ𝑘 (𝑚) = 𝑐𝑡𝑟‖𝑐1 ‖𝑐2 ‖ … ‖𝑐𝑁 .
Decryption is defined by XORing the ciphertext and the same keystream:
𝑚𝑖 = 𝐹𝑘 (𝑐𝑡𝑟 + 𝑖) ⊕ 𝑐𝑖 for 𝑖 = 1, 2, … , 𝑁 and
𝑚 = 𝒟𝑘 (𝑐) = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
Only the first (most significant) bits of 𝐹𝑘 (𝑐𝑡𝑟 + 𝑁) are used for the XOR operation if
the last block is shorter than 𝑙 bits. ♢
Figure 2.11. Encryption in CTR mode. Decryption is almost identical, with plaintext
and ciphertext swapped.
The above CBC and CTR modes define randomized encryption schemes. The IV
used in the CBC mode must be chosen uniformly, whereas the 𝑐𝑡𝑟 value in the CTR
mode can also be a nonce value (number used once). We note that IV and 𝑐𝑡𝑟 values are
not secret and (without encryption) become part of the ciphertext. The full ciphertext
can only be recovered with the IV or 𝑐𝑡𝑟 value.
Remark 2.49. The CTR mode turns a block cipher into a synchronous stream cipher
(see Definition 6.1). The encrypted counter values are used as a keystream. For en-
cryption and decryption, the plaintext and the ciphertext is XORed with the keystream.
There are two other operation modes, the Cipher Feedback Mode (CFB) and the Output
Feedback Mode (OFB) that also generate a keystream. These two modes and stream
ciphers in general are discussed in Chapter 6. ♢
The following two Theorems 2.50 and 2.52 state that CTR and CBC modes achieve
security under CPA if the scheme uses a pseudorandom family of functions or permu-
tations.
Theorem 2.50. If 𝐹 is a pseudorandom function family, then the randomized CTR mode
has indistinguishable encryptions under a chosen plaintext attack, i.e., the encryption
scheme is IND-CPA secure.
Proof. The assertion has a proof by reduction. Let 𝐴 be an adversary in the CPA indis-
ind−cpa
tinguishability experiment (see Definition 2.24). We want to show that Adv (𝐴)
is negligible if 𝐹 is a pseudorandom permutation.
We construct an algorithm 𝐵 running in the prf experiment (see Definition 2.38),
which uses 𝐴 as a subroutine. Similarly as in the proof of Theorem 2.37, 𝐵 takes the
$
role of 𝐴’s challenger in the CPA experiment. 𝐵 generates a random bit 𝑏 ← {0, 1}
and responds to 𝐴’s encryption queries. So 𝐵 needs to encrypt messages in CTR mode.
Now, since 𝐵 runs in the prf experiment, 𝐵 has access to a function 𝑓 ∶ {0, 1}𝑙 →
{0, 1}𝑙 which is either the pseudorandom function 𝐹𝑘 or a random function. 𝐵 uses 𝑓
to encrypt messages of arbitrary length in CTR mode, and we denote the associated
randomized encryption function by ℰ𝑓 . If 𝐴 sends the oracle query 𝑚, then 𝐵 responds
with ℰ𝑓 (𝑚). During the experiment, 𝐴 sends a pair (𝑚0 , 𝑚1 ) of plaintexts and 𝐵 returns
the challenge ciphertext ℰ𝑓 (𝑚𝑏 ). Finally, 𝐴 outputs 𝑏′ and 𝐵 observes whether or not
𝐴 succeeds. The result is used to answer the challenge in the prf experiment: 𝐵 outputs
1 if 𝑏 = 𝑏′ , and 0 otherwise.
The running time of 𝐵 is polynomial and is given by the sum of 𝐴’s running time
and the time to encrypt the plaintexts chosen by 𝐴. Note that 𝐵’s strategy is to observe
𝐴 and to output 1, i.e., to guess that 𝑓 is the pseudorandom function if 𝐴 was successful.
Now we want to prove that 𝐵’s advantage is closely related to 𝐴’s advantage, so that
ind−cpa prf
Adv (𝐴) is negligible if Adv (𝐵) is negligible.
Since 𝐵 is an adversary in the prf experiment, 𝐵 does not know whether its chal-
lenger has chosen 𝑓 = random or 𝑓 = 𝐹𝑘 . We consider both cases:
(1) If 𝑓 is a random function, then 𝐵 succeeds, i.e., outputs 0 if and only if 𝐴 fails. The
sign is not relevant and the advantages of 𝐴 and 𝐵 are identical. The CTR mode
encryption is similar to a one-time pad and the advantage of 𝐴 and 𝐵 is 0, unless
the counter values overlap, i.e., a counter value used to compute the challenge ci-
phertext overlaps with at least one counter in the responses to 𝐴’s queries. In this
case, the adversary knows a keystream block and can easily answer the challenge.
Let 𝑞 be an upper bound on the number of blocks in 𝐴’s chosen plaintexts 𝑚0 and
𝑚1 , the number of queries and the number of blocks in the queries. Let 𝑐𝑡𝑟 be
the initial counter value used to compute the challenge ciphertext. For a single
query and a chosen plaintext of at most 𝑞 blocks, only initial counter values 𝑐𝑡𝑟′
with |𝑐𝑡𝑟 − 𝑐𝑡𝑟′ | < 𝑞 can result in an overlap. This inequality is satisfied for 2𝑞 − 1
values of 𝑐𝑡𝑟′ . If the number of queries is at most 𝑞, then an overlap occurs for
less than 2𝑞2 values. Since the initial counter values are chosen uniformly from
{0, 1}𝑙 , we obtain
2𝑞2
𝑃𝑟[Overlap] < 𝑙
2
(compare [KL15] and [BR05]). Since 𝐴 runs in polynomial time, 𝑞 is polynomial
in 𝑛 and 𝑃𝑟[Overlap] is negligible. If an overlap occurs, then we use the trivial
2𝑞2
bound 1 for 𝐵’s advantage. However, the probability of an overlap is less than ,
2𝑙
and otherwise the advantage of 𝐵 is 0. We conclude that
2𝑞2
| 𝑃𝑟[𝑂𝑢𝑡(𝐵) = 0 | 𝑓 = random ] − 𝑃𝑟[𝑂𝑢𝑡(𝐵)
⏟⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⏟⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⏟= 1 | 𝑓 = random ] | < .
2𝑙
=∶𝑥
(2) If 𝑓 = 𝐹𝑘 , then 𝐵 succeeds, i.e., outputs 1 if and only if 𝐴 is successful. 𝐴 runs in

the CPA experiment with CTR mode encryption using the function 𝐹𝑘 . Therefore,
ind−cpa
Adv (𝐴) = | 𝑃𝑟[𝑂𝑢𝑡(𝐵) = 1 | 𝑓 = 𝐹𝑘 ] − 𝑃𝑟[𝑂𝑢𝑡(𝐵) = 0 | 𝑓 = 𝐹𝑘 ] |.
⏟⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⏟⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⎵⏟
=∶𝑦
A uniform random bit, chosen by 𝐵’s challenger in the prf experiment, determines
whether 𝑓 is a random function or 𝑓 = 𝐹𝑘 . Hence
1
𝑃𝑟[𝑓 = random ] = 𝑃𝑟[𝑓 = 𝐹𝑘 ] =.
2
The definition of 𝐵’s advantage in the prf experiment and the definition of 𝑥 and 𝑦 in
(1) and (2) yield
1 1
Adv (𝐵) = ||| 𝑥 + 𝑦||| .
prf
2 2
Combining (1) and (2), we obtain
ind−cpa prf 2𝑞2
Adv (𝐴) = |𝑦| = |𝑦 + 𝑥 − 𝑥| ≤ |𝑥 + 𝑦| + |𝑥| < 2 Adv (𝐵) + .
2𝑙
prf
𝐹 is a pseudorandom function family, and hence the advantage Adv (𝐵) is negligible.
1 2𝑞2
Furthermore, is negligible in 𝑙 as well as in 𝑛, and is negligible in 𝑛. It follows
2𝑙 2𝑙
ind−cpa
that Adv (𝐴) is negligible, which completes the proof by reduction. □
Remark 2.51. The security guarantee provided by this type of proof can easily be mis-
understood, and there have been some controversies about this topic (see [KM07] and
the subsequent discussion).
• Is encryption in CTR mode unconditionally secure?
No, since the CPA security depends on the security of the underlying pseudoran-
dom function family. The scheme is secure under the condition that the function
family is pseudorandom. In other words, one has to assume that no polynomial-
time algorithm can distinguish between the function family and a random func-
tion, or at least that no one can find such an algorithm.
• If pseudorandomness is given, is the CTR mode then secure against all polynomial-
time attacks?
No, since we made assumptions about the capabilities of adversaries. They are
only looking at plaintexts and ciphertexts and have no information about the se-
cret key. Side-channel attacks, where an adversary analyzes the power consump-
tion or the timing of an implementation are not considered. Furthermore, one
assumes that the secret key is uniformly random. This is not necessarily true in
practice, for example if the key is derived from a password.
• If all assumptions are satisfied, does the CTR mode have a concrete security guaran-
tee?
The CTR mode is asymptotically CPA-secure. Concrete security depends on the
security parameter, the security of 𝐹 and the number and length of queries made
by an adversary.
We given an example and estimate the concrete advantage of an adversary 𝐴
given in the proof of the above Theorem. Suppose that 𝑙 = 128, 𝐴 makes 𝑞 = 232
queries and the advantage of the corresponding adversary 𝐵 in the prf experiment
prf
is Adv (𝐵) = 2−64 . We assume that the number of blocks in 𝐴’s chosen plain-
texts and in his queries is less than 𝑞. Then
ind−cpa 2𝑞2
prf
Adv (𝐴) ≤ 2 Adv = 2−62 .
(𝐵) +
2𝑙
The concrete security guarantee under the above assumptions is quite strong since
the CPA advantage is very small. ♢
The randomized CBC mode also has a security guarantee, and we refer to [KL15]
or [BR05] for a proof by reduction.
Theorem 2.52. If 𝐸 is a pseudorandom permutation, then the randomized CBC mode
is IND-CPA secure.
Remark 2.53. It is easy to see that neither the CBC nor the CTR modes yield a CCA2-
secure encryption scheme. Security against chosen ciphertext attacks requires non-
malleability; a controlled modification of the ciphertext should be impossible and the
decryption of a modified ciphertext should not be related to the original plaintext. For
example, let’s consider the CTR mode and suppose a single bit of the ciphertext is
flipped. Then only the corresponding plaintext bit changes, so that the CTR mode
is clearly malleable.
CCA2 security of encryption schemes can be achieved by incorporating a message
authentication code. This is, for example, implemented by the Galois Counter Mode
(GCM), which is explained in Section 8.4.
2.11. Summary
• Encryption schemes are defined by plaintext, ciphertext and key spaces as well
as key generation, encryption and decryption algorithms. The encryption al-
gorithm can be probabilistic.
• Perfect secrecy is a very strong requirement, but the one-time pad is perfectly
secret. The key must be at least as long as the plaintext and cannot be re-used.
• Security against eavesdropping (EAV), chosen plaintext attacks (CPA) or adap-
tive chosen ciphertext attacks (CCA2) are common security goals of encryption
schemes. The security definition is based on experiments and the performance
of adversaries.
• Pseudorandom generators (prg) are deterministic algorithms. Polynomial-time
adversaries cannot distinguish the output of a prg from a random string if the
seed (or key) is secret.
• Pseudorandom function families (prf) and permutations appear to be random
transformations to polynomial-time adversaries if the key is secret.
• Block ciphers are keyed permutations. The combination of a block cipher and
an operation mode defines a variable-length encryption scheme.
• The randomized CBC and CTR modes give CPA-secure encryption schemes if
the underlying block cipher is a pseudorandom permutation.
Exercises
1. Show that the Vigenère cipher is perfectly secure if the key is randomly chosen, it
is only used once and the plaintext has the same length as the key.
2. Find reasons for Kerkhoff’s principle and discuss possible counter-arguments.
3. Show that the one-time pad is not perfectly secret if a key is used twice.
4. Let ℳ be the plaintext space and 𝒦 the key space of a perfectly secure encryption
scheme. Show that |𝒦| ≥ |ℳ|.
Exercises 59
Hint: Suppose ℰ𝑘 (𝑚0 ) = 𝑐. How many different plaintext-key pairs give the ci-
phertext 𝑐 ?
5. Is a bit permutation of block length 𝑛 perfectly secure if it is used only once to
encrypt a string of length 𝑛?
6. Show the formulas for the advantage of adversaries in Remark 2.16.
7. Explain the differences in the definitions of EAV-secure and IND-CPA secure en-
cryption schemes.
eav
8. Prove that a perfectly secure scheme is EAV-secure. Show that Adv (𝐴) is 0 for
any adversary 𝐴. Why is perfect security much stronger than EAV security?
9. Does the Vigenère cipher define an EAV-secure encryption scheme?
10. Suppose we want to expand a fixed output-length pseudorandom generator
𝐺 ∶ {0, 1}𝑛 → {0, 1}𝑛+1 by one extra bit. Which of the following generators 𝐺+ ∶
{0, 1}𝑛 → {0, 1}𝑛+2 could be pseudorandom?
(a) 𝐺(𝑠) = (𝑦1 ‖𝑦2 ‖ … ‖𝑦𝑛+1 ), 𝐺+ (𝑠) = (𝑦1 ‖𝑦2 ‖ … ‖𝑦𝑛+1 ‖ 𝑦1 ⊕ 𝑦2 ⋯ ⊕ 𝑦𝑛+1 ).
(b) 𝑠0 = 𝑠, for 𝑖 = 1, … , 𝑛 + 2: ( 𝑠𝑖 ‖ 𝑦𝑖 ) = 𝐺(𝑠𝑖−1 ). 𝐺+ (𝑠) = (𝑦1 ‖𝑦2 ‖ … ‖𝑦𝑛+2 ).
11. Suppose 𝐺 is a pseudorandom generator with fixed output-length and associated
encryption scheme defined by ℰ𝑘 (𝑚) = 𝑚 ⊕ 𝐺(𝑘). Show that this scheme is not
EAV-secure for multiple encryptions.
12. Prove that a pseudorandom generator is unpredictable in polynomial time, i.e.,
passes the next-bit test.
13. Explain why a malleable encryption scheme cannot be CCA2-secure.
14. Let 𝐹 be a family of bit permutations. Is 𝐹 a pseudorandom permutation?
15. Show that a block cipher in ECB mode is not EAV-secure.
16. Consider a block cipher in CBC mode. Suppose that the IV is initially set to 0 and
then incremented for every new encryption. Can this variant of the CBC mode be
CPA-secure?
17. Show that a block cipher in CTR is not secure against ciphertext-only attacks, if
the counter is re-used.
18. Can an encryption scheme that is based on a block cipher be perfectly secure?
19. Let 𝐸 be a block cipher of block length 4 and suppose that 𝐸𝑘 (𝑏1 𝑏2 𝑏3 𝑏4 ) = (𝑏2 𝑏3 𝑏4 𝑏1 ).
Encrypt 𝑚 = 1011 0001 0100 and decrypt the ciphertext with the following oper-
ation modes:
(a) ECB mode,
(b) CBC mode with 𝐼𝑉 = 1010,
(c) CTR mode with 𝑐𝑡𝑟 = 1010.
20. Consider a block cipher in CBC mode. The ciphertext is sent to a receiver. What
are the consequences, if:
(a) the receiver misses the initialization vector (IV), or
(b) a single ciphertext block is changed due to transmission errors, or
(c) a ciphertext bit is flipped by an adversary during the transmission, or
(d) a bit error occurs during the ciphering operation ?
21. Suppose a block cipher of block length 128 in CTR mode is used to encrypt 300
bits of plaintext. Which bits cannot be correctly decrypted if one of the following
errors occurs?
(a) The first bit of the counter value is flipped during transmission.
(b) The first ciphertext bit is flipped.
(c) The first ciphertext block 𝑐1 is changed due to transmission errors.
(d) The last ciphertext bit is flipped.
22. Compare ECB, CBC and CTR modes with respect to message expansion, error
propagation, pre-computations and parallelization of encryption and decryption.
23. Explain why neither CBC nor CTR modes achieve CCA2 security.
Chapter 3
Elementary Number Theory
This chapter covers several topics from elementary number theory. Section 3.1 deals
with integers, factorizations, prime numbers and the Euclidean algorithm. In Section
3.2, we discuss residue classes and modulo operations. The modular exponentiation
and the associated algorithms play an important role in several cryptographic schemes
and are dealt with in Section 3.3.
We refer to [Ros12] and [Sho09] for detailed expositions of the modular arithmetic
and the elementary number theory used in cryptography.
3.1. Integers
The set of integers {… , −2, −1, 0, 1, 2, 3, … } is denoted by ℤ. Integer numbers can be
𝑎
added, subtracted and multiplied. The fractions with 𝑎, 𝑏 ∈ ℤ and 𝑏 ≠ 0 define the
𝑏
rational numbers ℚ. We say that 𝑏 divides 𝑎, 𝑎 is divisible by 𝑏 or 𝑎 is a multiple of 𝑏, if
𝑎
∈ ℤ and we write 𝑏 ∣ 𝑎. Otherwise, we write 𝑏 ∤ 𝑎.
𝑏
Example 3.1. 2 ∣ 224236, but 2 ∤ −71. One has 1 ∣ 𝑛 and (−1) ∣ 𝑛 for all 𝑛 ∈ ℤ.
Although 7 ∣ 14, note that 14 ∤ 7. ♢
The result of an integer division 𝑎 ∶ 𝑏 for 𝑎, 𝑏 ∈ ℤ, 𝑏 ≠ 0 consists of an integer

𝑎
quotient 𝑞 = ⌊ ⌋ ∈ ℤ and a remainder 𝑟 ∈ ℤ such that 𝑎 = 𝑏𝑞 + 𝑟 and 0 ≤ 𝑟 < 𝑏 holds.
𝑏
Then 𝑎 ≡ 𝑟 mod 𝑏 since 𝑎 and 𝑟 are congruent modulo 𝑏, i.e., elements of the same
residue class. If 𝑏 divides 𝑎, then the remainder is 0 and 𝑎 ≡ 0 mod 𝑏.
Example 3.2. (1) 23 ∶ 3 = 7 remainder 2 and 23 = 3 ⋅ 7 + 2. Thus 23 ≡ 2 mod 3.
61
62 3. Elementary Number Theory
(2) (−17) ∶ 5 = −4 remainder 3 and −17 = 5 ⋅ (−4) + 3. Rather, one might expect
that the integer quotient is −3 in this example, but then the remainder would be
negative. Hence −17 ≡ 3 mod 5, although −17 ≡ −2 mod 5 is also correct. ♢
These elementary operations have efficient algorithms:

Proposition 3.3. Let 𝑎, 𝑏 ∈ ℤ. Then 𝑎 + 𝑏 and 𝑎 − 𝑏 can be computed in time
𝑂(max(size(𝑎), size(𝑏))) and the product 𝑎 ⋅ 𝑏 in time 𝑂(size(𝑎) size(𝑏)). We can com-
pute the integer quotient 𝑞 and the remainder 𝑟 of the integer division 𝑎 ∶ 𝑏 in time
𝑂(size(𝑏) size(𝑞)).
Proof. Use a base representation of 𝑎 and 𝑏 (and a sign) and analyze the complexity
of the well-known “paper-and-pencil” methods. □
Definition 3.4. A positive integer 𝑝 ∈ ℕ with 𝑝 ≥ 2 is called prime if 𝑝 is only divisible
by ±1 and ±𝑝. Integers 𝑛 ≥ 2 that are not prime numbers are called composite. ♢
The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, and large prime numbers
play an important role in cryptography. The following famous theorem describes the
asymptotic density of primes (see [Edw74]):
Theorem 3.5. (Prime Number Theorem) Let 𝜋(𝑥) be the number of primes less than 𝑥.
Then
𝜋(𝑥)
lim = 1.
𝑥→∞ 𝑥
( )
ln(𝑥)
𝜋(𝑥) 1
Therefore, the prime density is asymptotically .
𝑥 ln(𝑥)
𝜋(𝑥) 1 1
Example 3.6. Let 𝑥 = 22048 . The prime density is approximately =
𝑥 ln(𝑥) 2048 ln(2)
1
≈ . Hence the prime density of odd integers with at most 2048 binary digits is
1386
2
≈ ≈ 0.0014.
1386
Theorem 3.7. (Fundamental Theorem of Arithmetic) Every nonzero integer can be de-
composed into a product of primes and a sign ( factor 1 or −1). The decomposition is
unique up to the order of the factors.
Example 3.8. −60 = −22 ⋅ 3 ⋅ 5.
Example 3.9. We can use SageMath to check the primality or to obtain the factoriza-
tion of an integer.
sage: is_prime (267865461)
False
sage: factor (267865461)
3^5 * 337 * 3271
3.1. Integers 63
Primality testing for large numbers can be time-consuming, although a polynom-

ial-time algorithm (the AKS primality test [AKS04]) is known. In practice, probabilis-
tic algorithms are used (see Section 9.4).
sage: is_pseudoprime (2^1279 -1)

True
The largest prime number known at the time of writing is the Mersenne prime
282589933 − 1.
This prime number has more than 24 million decimal digits. The primality of Mersenne
numbers 2𝑛 − 1 is tested with the Lucas-Lehmer test, which we do not deal with in this
book.
Definition 3.10. Let 𝑎 and 𝑏 be nonzero integers. Then the greatest positive integer
that divides both 𝑎 ∈ ℤ and 𝑏 ∈ ℤ is called the greatest common divisor of 𝑎 and 𝑏 and
is denoted by gcd(𝑎, 𝑏). We say that 𝑎 and 𝑏 are relatively prime if gcd(𝑎, 𝑏) = 1. ♢
Example 3.11. (1) gcd(12, 140) = 4, since 12 = 22 ⋅ 3 and 140 = 22 ⋅ 5 ⋅ 7.

(2) 9 and 26 are relatively prime since gcd(32 , 2 ⋅ 13) = 1, although neither 9 nor 26
are prime numbers.
(3) If 𝑝 is a prime number, then all integers 1 ≤ 𝑛 < 𝑝 are relatively prime to 𝑝.
(4) An example using SageMath:
sage: gcd (267865461 ,236749299182338)

3271
Proposition 3.12. Let 𝑎 and 𝑏 be nonzero integer numbers. Then there exist 𝑥, 𝑦 ∈ ℤ
such that
gcd(𝑎, 𝑏) = 𝑎𝑥 + 𝑏𝑦 . ♢
The greatest common divisor and the above numbers 𝑥 and 𝑦 can be efficiently
computed with the Extended Euclidean Algorithm (see Algorithm 3.1). The algorithm
plays an important role in elementary number theory and in cryptography.
Algorithm 3.1 Extended Euclidean Algorithm

Input: 𝑎, 𝑏 ∈ ℕ
Output: gcd(𝑎, 𝑏), 𝑥, 𝑦 ∈ ℤ such that gcd(𝑎, 𝑏) = 𝑎𝑥 + 𝑏𝑦
Initialisation: 𝑥0 = 1, 𝑥1 = 0, 𝑦0 = 0, 𝑦1 = 1, 𝑠𝑖𝑔𝑛 = 1
1: while 𝑏 ≠ 0 do
2: 𝑟 = 𝑎 mod 𝑏 // remainder of the integer division 𝑎 ∶ 𝑏
3: 𝑞 = 𝑎/𝑏 // integer quotient
4: 𝑎=𝑏
5: 𝑏=𝑟
6: 𝑥𝑥 = 𝑥1
7: 𝑦𝑦 = 𝑦1
8: 𝑥1 = 𝑞 ⋅ 𝑥1 + 𝑥0
9: 𝑦1 = 𝑞 ⋅ 𝑦1 + 𝑦0
10: 𝑥0 = 𝑥𝑥
11: 𝑦0 = 𝑦𝑦
12: 𝑠𝑖𝑔𝑛 = −𝑠𝑖𝑔𝑛
13: end while
14: 𝑥 = 𝑠𝑖𝑔𝑛 ⋅ 𝑥0
15: 𝑦 = −𝑠𝑖𝑔𝑛 ⋅ 𝑦0
16: 𝑔𝑐𝑑 = 𝑎
17: return gcd, 𝑥, 𝑦
Example 3.13. We compute gcd(845, 117) (see Table 3.1). We have
Table 3.1. Computation of gcd(845, 117) = 13.
845 ∶ 117 = 7 rem. 26 845 = 7 ⋅ 117 + 26 26 = 845 − 7 ⋅ 117

117 ∶ 26 = 4 rem. 13 117 = 4 ⋅ 26 + 13 13 = 117 − 4 ⋅ 26
26 ∶ 13 = 2 rem. 0 26 = 2 ⋅ 13 + 0
𝟏𝟑 = 117 − 4 ⋅ 26 = 117 − 4 ⋅ (845 − 7 ⋅ 117) = −𝟒 ⋅ 845 + 𝟐𝟗 ⋅ 117.

Hence gcd(845, 117) = 13, 𝑥 = −4, 𝑦 = 29. ♢
We can assume that the input parameters of the algorithm satisfy 𝑎 > 𝑏, since
otherwise the first iteration of the algorithm swaps 𝑎 and 𝑏.
The Extended Euclidean Algorithm is very efficient:
Proposition 3.14. The running time of the Extended Euclidean Algorithm on input
𝑎, 𝑏 ∈ ℕ is 𝑂(size(𝑎) size(𝑏)). ♢
We refer to [Sho09] for a detailed proof of this statement. Suppose that

𝑎 > 𝑏 > 0. Each step of the while loop has running time 𝑂(size(𝑎) size(𝑞)), where 𝑞
3.2. Congruences 65
is the current integer quotient. Since the product of all quotients 𝑞 is less than or equal
to 𝑎, the algorithm runs in time 𝑂(size(𝑎)2 ). A more refined argument shows that the
running time is 𝑂(size(𝑎) size(𝑏)).
3.2. Congruences
Congruences and residue classes modulo 𝑛 were already dealt with in Chapter 1 (see
Example 1.21). Let 𝑛 ≥ 2 be a positive integer. Recall the definition of the following
equivalence relation on ℤ:
𝑅𝑛 = {(𝑥, 𝑦) ∈ ℤ × ℤ | 𝑥 − 𝑦 ∈ 𝑛 ℤ}.
Hence (𝑥, 𝑦) ∈ 𝑅𝑛 if the difference 𝑥 − 𝑦 is divisible by 𝑛. Equivalent elements 𝑥
and 𝑦 are called congruent modulo 𝑛 and we write 𝑥 = 𝑦 or 𝑥 ≡ 𝑦 mod 𝑛. The set
of equivalence classes is denoted by ℤ𝑛 or ℤ/(𝑛) and contains 𝑛 elements. ℤ𝑛 is also
called the set of residue classes mod 𝑛 or integers mod 𝑛 (see Figure 3.1).
0
𝑛−1 1
Figure 3.1. Residue classes modulo 𝑛.
Two integers 𝑥 and 𝑦 are congruent mod 𝑛 if they have the same remainder when
they are divided by 𝑛. Obviously, if 𝑥 = 𝑞1 𝑛 + 𝑟1 and 𝑦 = 𝑞2 𝑛 + 𝑟2 with 𝑟1 , 𝑟2 ∈
{0, 1, … , 𝑛 − 1}, then 𝑥 − 𝑦 = (𝑞1 − 𝑞2 )𝑛 + (𝑟1 − 𝑟2 ). Hence 𝑥 and 𝑦 are congruent mod
𝑛 if and only if 𝑟1 − 𝑟2 = 0. In other words, 𝑥 ≡ 𝑦 mod 𝑛 holds if the integer divisions
𝑥 ∶ 𝑛 and 𝑦 ∶ 𝑛 have the same remainder.
Note that an integer 𝑥 ∈ ℤ is only one representative of the residue class 𝑥 ∈ ℤ𝑛 ,
which contains infinitely many congruent elements. The standard representatives of
ℤ𝑛 are 0, 1, … , 𝑛 − 1, but other representatives are also permitted. Elements in ℤ𝑛 can
be added, subtracted and multiplied by choosing an arbitrary representative in ℤ. It is

often reasonable to choose the standard representative before and after each step of a
computation in order to reduce the size of numbers.
Example 3.15. (1) 234577 ⋅ 2328374 ⋅ 2837289374 mod 3 ≡ 1 ⋅ 2 ⋅ 2 ≡ 1 mod 3.
(2) An example using SageMath:
sage: mod (782637846 ,8927289)
5963703
(3) How would you compute 782637846 mod 8927289 with a simple pocket calcula-
tor? The real division 782637846 ∶ 8927289 gives approximately 87.668, and so
the integer quotient is 87. We compute 782637846 − 87 ⋅ 8927289 and get the re-
mainder 5963703. Alternatively, we multiply the fractional part 0.668 by 8927289
and also obtain 5963703, up to a small rounding error. ♢
Modular addition, subtraction and multiplication is straightforward using any in-

teger representative, and then showing that the result does not depend on the chosen
𝑏
representative. The division of residue classes is more tricky. Since mod 𝑛 might be
𝑎
𝑏
confused with the rational number , we rather write
𝑎
(𝑏 mod 𝑛) ⋅ (𝑎 mod 𝑛)−1
if (𝑎 mod 𝑛)−1 exists. A nonzero integer 𝑎 is invertible modulo 𝑛 if the equation
𝑎𝑥 ≡ 1 mod 𝑛
has a solution 𝑥 ∈ ℤ. In this case, 𝑥 is a representative of (𝑎 mod 𝑛)−1 . One can show
that the modular inverse, if it exists, is uniquely determined.
Example 3.16. Does the residue class (3 mod 10)−1 exist ? In other words, is 3 invert-
ible modulo 10 ? Since the equation
3𝑥 ≡ 1 mod 10
is solved by 𝑥 = 7, we find that (3 mod 10)−1 ≡ 7 mod 10. Note that the result is
1
neither ∈ ℚ nor the integer quotient or remainder of 1 ∶ 3. ♢
3
When does the multiplicative inverse modulo 𝑛 exist and how can we efficiently
compute it?
Proposition 3.17. Let 𝑛 ≥ 2 be a positive integer and 𝑎 ∈ ℤ a nonzero integer. Then
𝑎 mod 𝑛 has a multiplicative inverse if and only if gcd(𝑎, 𝑛) = 1. A representative of
(𝑎 mod 𝑛)−1 can be efficiently computed using the Extended Euclidean Algorithm.
Proof. Running the Extended Euclidean Algorithm on input 𝑎 and 𝑛 gives the output
integers gcd(𝑎, 𝑛), 𝑥 and 𝑦 such that
gcd(𝑎, 𝑛) = 𝑎𝑥 + 𝑛𝑦.
3.3. Modular Exponentiation 67
If gcd(𝑎, 𝑛) = 1, then 1 ≡ 𝑎𝑥 mod 𝑛, and so 𝑥 is a representative of (𝑎 mod 𝑛)−1 .

Conversely, if 𝑎 is invertible modulo 𝑛, then 𝑎𝑥 = 1 + 𝑛𝑦 ⇔ 𝑎𝑥 − 𝑛𝑦 = 1 for some
𝑥, 𝑦 ∈ ℤ. This implies gcd(𝑎, 𝑛) ∣ 1, which is only possible if gcd(𝑎, 𝑛) = 1. □
Definition 3.18. The invertible integers modulo 𝑛 are called units mod 𝑛. The subset
of units of ℤ𝑛 is denoted by ℤ∗𝑛 . Euler’s 𝜑-function (or 𝜙-function) is defined by the
cardinality of the units mod 𝑛, i.e., 𝜑(𝑛) = |ℤ∗𝑛 |.
Example 3.19. (1) ℤ∗10 = {1, 3, 7, 9} and 𝜑(10) = 4. The inverse elements are as
follows:
−1 −1 −1 −1
1 = 1, 3 = 7, 7 = 3, 9 = 9.
The inverses can be computed with the Extended Euclidean Algorithm. If we take
the input values 𝑎 = 3 and 𝑛 = 10, then we obtain gcd(3, 10) = 1, 𝑥 = −3 and
𝑦 = 1, satisfying the equation 1 = 3 ⋅ (−3) + 10 ⋅ 1. Hence (3 mod 10)−1 ≡ −3 ≡
7 mod 10.
(2) Let 𝑝 be a prime number; then ℤ∗𝑝 = {1, 2, … , 𝑝 − 1} and 𝜑(𝑝) = 𝑝 − 1.
3.3. Modular Exponentiation

The exponentiation of residue classes plays an important role in several cryptographic
algorithms. The exponent can be very large and may have in practice more than 2000
binary digits. A straightforward computation of modular powers is inefficient or even
completely impossible.
Below, we explain the fast exponentiation algorithm for residue classes. The algo-
rithm is based on the following observation: let 𝑎 = 2𝑘 be a power of 2. Then
𝑥𝑎 mod 𝑛 = ((((𝑥2 mod 𝑛)2 mod 𝑛)2 mod 𝑛)2 … )2 mod 𝑛,

1024
where the squaring is iterated 𝑘 times. For example, 𝑥(2 ) mod 𝑛 can be computed
with only 1024 modular squaring operations, even though 21024 is a huge number. Af-
ter each squaring, one takes the result modulo 𝑛 in order to reduce the size of integers.
𝑘 𝑘 𝑘
Warning 3.20. 𝑥2 is not the same as 𝑥2𝑘 . In fact, 𝑥2 = 𝑥(2 ) , and this is different
from (𝑥2 )𝑘 = 𝑥2𝑘 . ♢
If the exponent is not a power of 2, then it can still be written as a sum of powers
𝑘
of 2. This gives a product of factors 𝑥2 . The binary representation of the exponent
𝑘
determines whether or not a factor 𝑥2 is present in the product.
Example 3.21. 641 mod 59. We have 41 = 25 + 23 + 20 and compute the following
sequence of squares:
62 ≡ 36 mod 59,
64 ≡ 362 ≡ 57 mod 59,
68 ≡ 572 ≡ 4 mod 59,
616 ≡ 42 ≡ 16 mod 59,
632 ≡ 162 ≡ 20 mod 59.
The intermediate result 68 ≡ 4 should be stored when computing 632 . Then
641 = 632 ⋅ 68 ⋅ 6 ≡ 20 ⋅ 4 ⋅ 6 ≡ 8 mod 59.
Hence five quadratures and two modular multiplications are needed.

SageMath uses the function power_mod:
sage: power_mod (6 ,41 ,59)

8
Proposition 3.22. Let 𝑥, 𝑎, 𝑛 ∈ ℕ. The computation of the modular power 𝑥𝑎 mod 𝑛

requires at most size(𝑛) − 1 quadratures and size(𝑛) − 1 multiplications modulo 𝑛, if
the fast exponentiaton algorithm is used. The total number of modular multiplications is
bounded above by 2 size(𝑛). The complexity of modular exponentiations is 𝑂(size(𝑛)3 ).
♢
Since the complexity of a single modular multiplication is 𝑂(size(𝑛)2 ), the com-

plexity of a modular exponentiation is 𝑂(size(𝑛)3 ). This is polynomial in size(𝑛), but
significantly slower than algorithms with linear or quadratic running time. Therefore,
one tries to avoid too many exponentiations in practice. Algorithms that use modular
powers (for example RSA and Diffie-Hellman) are not applied to bulk data since they
are too slow.
Warning 3.23. When computing 𝑥𝑎 mod 𝑛, the exponent must not be reduced mod 𝑛.
For example 26 = 64 ≡ 4 mod 5, which is different from 21 = 2 mod 5. However,
we will later see (Proposition 4.16) that a reduction mod 𝜑(𝑛) is possible (i.e., mod 4 in
this example so that 26 ≡ 22 = 4 mod 5) and helps to reduce the size of the exponent.
Furthermore, a reduction of the base 𝑥 mod 𝑛 is allowed. For example, 66 ≡ 16 =
1 mod 5. ♢
The fast exponentiation can be slightly optimized with the square-and-multiply

algorithm (see Algorithm 3.2).
Exercises 69
Example 3.24. 641 = ((((62 )2 ⋅ 6)2 )2 )2 ⋅ 6. The computation is a sequence of squarings

(SQ) and multiplications (MULT): SQ, SQ, MULT, SQ, SQ, SQ, MULT. As above, we
need 5 quadratures and 2 multiplications, but in a different order. The exact order
depends on the binary representation of the exponent. An advantage of this algorithm
is that intermediate results do not need to be stored.
Algorithm 3.2 Square-and-Multiply Algorithm

𝑠
Input: Base 𝑥 ∈ ℕ and exponent 𝑎 = ∑𝑖=0 ℎ𝑖 2𝑖 with ℎ𝑖 ∈ {0, 1} and 𝑎 ≥ 2.
Output: 𝑥𝑎 mod 𝑛
Initialisation: 𝑟 = 𝑥
1: for 𝑖 = 𝑠 − 1 downto 0 do
2: 𝑟 = 𝑟2 mod 𝑛
3: if ℎ𝑖 = 1 then
4: 𝑟 = 𝑟 ⋅ 𝑥 mod 𝑛
5: end if
6: end for
7: return 𝑟
3.4. Summary
• Every nonzero integer can be decomposed into a product of prime numbers

and a sign.
• The Extended Euclidean Algorithm takes two integers as input and outputs
their greatest common divisor and two other useful integers.
• There are 𝑛 different residue classes modulo 𝑛. They are also called integers
modulo 𝑛.
• Residue classes can be added, subtracted and multiplied. An integer can be
inverted modulo 𝑛 if the greatest common divisor of that integer and 𝑛 is equal
to 1.
• There exist efficient algorithms for the elementary arithmetic operations (ad-
dition, multiplication, division, etc.) on integers and residue classes.
• Fast exponentiation and square-and-multiply are efficient algorithms to com-
pute a modular power.
Exercises
1. Which of the following residue classes are identical in ℤ26 :
0, 3, −49, 49, 104?

2. Enumerate the elements of ℤ∗22 and give 𝜑(22). Find the inverse of each element
in ℤ∗22 .
3. Perform some elementary modular operations with SageMath.
Let 𝑛 = 123456789012345, 𝑎 = 5377543210987654321 and 𝑏 = 12345678914335.
(a) Find the prime factor decomposition of 𝑛, 𝑎 and 𝑏. Use the factor command.
(b) Compute 𝑎 + 𝑏 mod 𝑛, 𝑎𝑏 mod 𝑛, 𝑎𝑏 mod 𝑛, using mod(..,..) and
power_mod(..,..,..).
(c) Are 𝑎 or 𝑏 invertible modulo 𝑛 ? Why or why not? Compute (𝑎 mod 𝑛)−1 or
(𝑏 mod 𝑛)−1 , using mod(1/..,..).
(d) Are 𝑎 and 𝑏 relatively prime? Why or why not?
4. Run the Extended Euclidean Algorithm on input 𝑎 = 1234 and 𝑏 = 6789.
5. Use the Extended Euclidean Algorithm to compute the multiplicative inverse of
32 ∈ ℤ∗897 .
6. Let 𝑝 be a prime number and 𝑚 ∈ ℕ. Find 𝜑(2𝑝), 𝜑(2𝑚 ) and 𝜑(𝑝𝑚 ).
7. Write a function which examines the primality of all Mersenne numbers
𝑀𝑛 = 2 𝑛 − 1
for 𝑛 < 2000. Use the SageMath function is_pseudoprime(. . . ).

8. This exercise explores the plain RSA encryption scheme (see Section 9.2).
(a) Let 𝑝 and 𝑞 be the largest Mersenne prime numbers with less than 2000 binary
digits (see Exercise 7).
Compute 𝑛 = 𝑝𝑞 and 𝜑(𝑛) = (𝑝 − 1)(𝑞 − 1).
(b) Find all integers 1 < 𝑒 < 100 such that gcd(𝑒, 𝜑(𝑛)) = 1. We will later see that
(𝑒, 𝑛) can be used as a public RSA key. Now choose 𝑒 > 1 as small as possible
such that 𝑒 and 𝜑(𝑛) are relatively prime.
(c) Compute the private RSA key 𝑑 = (𝑒 mod 𝜑(𝑛))−1 .
(d) Suppose that 𝑚 = 21500 + 2500 + 1 is a given plaintext. Compute the RSA
ciphertext 𝑐 = 𝑚𝑒 mod 𝑛.
(e) Decrypt the ciphertext by computing 𝑐𝑑 mod 𝑛 and compare the result with
the plaintext 𝑚.
Tip: SageMath distinguishes between residue classes and integers. Use ZZ(d) to
transform 𝑑 from a residue class into an integer.
9. Write a SageMath function which generates 100,000 odd random integers less than
21024 , checks their primality and counts the number of primes. Use the SageMath
function ZZ.random_element(N) to generate a positive random integer less than
𝑁. Compare the number of primes with the expected number using the Prime
Number Theorem.
Tip: An odd random integer less than 21024 can be generated by
2∗(ZZ.random_element(21023 ))+1.
Test the primality with the function is_pseudoprime.
Exercises 71
10. Compute 255 mod 61

(a) using fast exponentiation and
(b) using the square-and-multiply algorithm.
How many modular squarings and multiplications are necessary?
11. Determine the maximum number of modular squarings and multiplications that
are needed to compute 𝑥𝑘 mod 𝑛 if size(𝑛) = size(𝑘) = 2048.
Chapter 4
Algebraic Structures
Modern cryptography uses not only discrete mathematics and elementary number the-
ory, but also algebraic structures such as abelian groups, polynomial rings, quotient
rings and finite fields. Understanding finite abelian groups and finite fields is crucial,
and these algebraic topics should not be underestimated.
Section 4.1 deals with groups and, in particular, finite abelian groups. We will see
that finite abelian groups are products of cyclic groups. Rings and fields are discussed in
Section 4.2. Finite fields are used in many cryptographic constructions, and in Section
4.3, we construct fields with 𝑝𝑛 elements. We add a recapitulation of linear and affine
maps in Section 4.4.
The contents of this chapter can be found in any textbook on abstract algebra, and
we refer the reader for example to [Sho09].
4.1. Groups
Groups are among the most fundamental mathematical structures. A group is a set
with a binary operation which satisfies several properties.
Definition 4.1. A group 𝐺 is a set together with a law of composition
∘∶𝐺×𝐺 →𝐺
such that the following properties are satisfied:

• For all 𝑎, 𝑏, 𝑐 ∈ 𝐺 one has (𝑎 ∘ 𝑏) ∘ 𝑐 = 𝑎 ∘ (𝑏 ∘ 𝑐) (associative law).
• There is an identity element 𝑒 ∈ 𝐺 such that 𝑒 ∘ 𝑔 = 𝑔 ∘ 𝑒 = 𝑔 for all 𝑔 ∈ 𝐺 (identity
element).
73
74 4. Algebraic Structures
• For every 𝑔 ∈ 𝐺 there is an inverse element 𝑥 ∈ 𝐺 with 𝑔 ∘ 𝑥 = 𝑥 ∘ 𝑔 = 𝑒 (inverse

elements).
The group is called abelian or commutative if for all 𝑎, 𝑏 ∈ 𝐺, one has 𝑎 ∘ 𝑏 = 𝑏 ∘ 𝑎
(commutative law). ♢
The above definition uses ∘ for the composition of elements. In our applications,
the composition is either addition or multiplication, and we write + (plus) or ⋅ (dot).
The identity element is denoted by 0 (additive case) or 1 (multiplicative case). Accord-
ingly, the inverse element of 𝑔 is denoted by −𝑔 in the additive case and by 𝑔−1 in the
multiplicative case.
Example 4.2. (1) (ℤ, +) is an additive abelian group.

(2) (ℝ ⧵ {0}, ⋅) is a multiplicative abelian group.
(3) (ℤ𝑛 , +) is an additive abelian group with 𝑛 elements.
(4) (ℤ∗𝑛 , ⋅) is a multiplicative abelian group with 𝜑(𝑛) elements (see Definition 3.18).
We removed all non-units from ℤ𝑛 to ensure that all elements are invertible. Note
that the inverses are again residue classes, not rational numbers !
(5) Let 𝑝 be a prime. Then (ℤ∗𝑝 , ⋅) is a multiplicative abelian group with 𝑝−1 elements
(see Example 3.19).
(6) Let 𝐺1 and 𝐺2 be groups; then the direct product 𝐺1 ×𝐺2 is also a group, consisting
of all tuples (𝑔1 , 𝑔2 ) with 𝑔1 ∈ 𝐺1 and 𝑔2 ∈ 𝐺2 . The group operation is defined
component-wise. More generally, one has the product groups
𝐺1 × ⋯ × 𝐺𝑛 and 𝐺𝑛 = 𝐺 × ⋯ × 𝐺.
Remark 4.3. Examples of non-commutative groups include the group of permutations

of a set 𝑆 (with composition of mappings) and the group of invertible matrices (with
matrix multiplication). The structure of these groups can be very complicated. ♢
We want to relate different groups and consider maps that respect the group struc-
ture.
Definition 4.4. Let 𝑓 ∶ 𝐺1 → 𝐺2 be a map between two groups 𝐺1 , 𝐺2 . Then 𝑓 is

called a group homomorphism if 𝑓(𝑔 ∘ 𝑔′ ) = 𝑓(𝑔) ∘ 𝑓(𝑔′ ) for all 𝑔, 𝑔′ ∈ 𝐺1 . A bijective
group homomorphism is called an isomorphism. Then we say that 𝐺1 is isomorphic to
𝐺2 and write 𝐺1 ≅ 𝐺2 . ♢
It is easy to show that the inverse map 𝑓−1 of an isomorphism 𝑓 is also a group
homomorphism and hence an isomorphism.
Warning 4.5. A bijection between two groups does not imply that they are isomorphic!
For example, there is a bijection between ℤ2 × ℤ2 and ℤ4 (since both groups have 4
elements), but they are not isomorphic (see Example 4.28 below).
4.1. Groups 75
Example 4.6. (1) The natural projection map 𝑓 ∶ ℤ → ℤ𝑛 with 𝑓(𝑘) = 𝑘 mod 𝑛 is
a group homomorphism, since
𝑓(𝑘1 + 𝑘2 ) = (𝑘1 + 𝑘2 ) mod 𝑛 = (𝑘1 mod 𝑛) + (𝑘2 mod 𝑛).
Obviously, 𝑓 is not injective and therefore not an isomorphism.
(2) The reverse map 𝑓 ∶ ℤ𝑛 → ℤ with 𝑓(𝑘) = 𝑘, where 𝑘 ∈ {0, 1, … , 𝑛 − 1} is the
standard representative, is not a homomorphism, since for example
𝑓(1) + 𝑓(𝑛 − 1) = 1 + (𝑛 − 1) = 𝑛,
but 𝑓(1 + 𝑛 − 1) = 𝑓(0) = 0.
(3) Let 𝐺1 = (ℤ4 , +) = {0, 1, 2, 3} and 𝐺2 = (ℤ∗5 , ⋅) = {1, 2, 3, 4}. The map
𝑓 ∶ 𝐺1 → 𝐺2 , 𝑓(𝑘 mod 4) = 2𝑘 mod 5
is well defined, since 24 ≡ 16 ≡ 1 mod 5 so that the result does not depend on a
representative of 𝑘 modulo 4. It defines a homomorphism, since
𝑓((𝑘1 mod 4) + (𝑘2 mod 4)) = 2𝑘1 +𝑘2 mod 5 = (2𝑘1 mod 5) ⋅ (2𝑘2 mod 5).
The explicit mapping is given by 𝑓(0) = 1, 𝑓(1) = 2, 𝑓(2) = 4 and 𝑓(3) = 3.
Hence 𝑓 is a bijection and defines a group isomorphism (ℤ4 , +) ≅ (ℤ∗5 , ⋅).
Definition 4.7. Let 𝐺 be a group. A subgroup 𝐻 of 𝐺 is a subset of 𝐺, which contains
the identity element and is closed under the law of composition and inverse. ♢
This can be combined into one equivalent condition:

Proposition 4.8. 𝐻 is a subgroup of a group 𝐺 if and only if 𝑎 𝑏−1 ∈ 𝐻 for all 𝑎, 𝑏 ∈ 𝐻.
Example 4.9. (1) Let 𝐺 = (ℝ ⧵ {0}, ⋅). Then the set of strictly positive numbers 𝐻 =
(ℝ+ , ⋅) forms a subgroup of 𝐺 since the multiplicative inverse of a positive element
is positive and the product of two positive numbers is again positive. The nonzero
integers (ℤ⧵{0}, ⋅) do not form a subgroup of 𝐺, because the inverse elements (with
the exception of 1 and −1) are not integers.
(2) Let 𝐺 = (ℤ, +) and 𝐻 = 26 ℤ be the set of all integer multiples of 26. Then 𝐻 is
an additive subgroup of 𝐺. ♢
The following statement is easy to show.

Proposition 4.10. Let 𝑓 ∶ 𝐺1 → 𝐺2 be a group homomorphism. The kernel of 𝑓, i.e.,
the set {𝑔1 ∈ 𝐺1 | 𝑓(𝑔1 ) = 𝑒2 }, where 𝑒2 is the identity element in 𝐺2 , is a subgroup of 𝐺1 .
Furthermore, the image 𝑖𝑚(𝑓) is a subgroup of 𝐺2 . ♢
Each group element generates a subgroup:

Definition 4.11. Let 𝐺 be a group and 𝑔 ∈ 𝐺; then the set ⟨𝑔⟩ = {𝑔𝑘 | 𝑘 ∈ ℤ} is called
the subgroup generated by 𝑔. Note that we used the multiplicative notation. For an
additive group, we would write ⟨𝑔⟩ = {𝑘 ⋅ 𝑔 | 𝑘 ∈ ℤ} instead. ♢
The subgroup ⟨𝑔⟩ is in fact a cyclic group (see Definition 4.18 below). Next, we
define the order of a group and the order of elements:
Definition 4.12. Let 𝐺 be a group; then the order of the group 𝐺 is defined to be the
number |𝐺| of elements in 𝐺 (or infinity) and denoted by ord(𝐺). Now let 𝑔 ∈ 𝐺. Then
the order of the element 𝑔 is defined by the order of the subgroup ⟨𝑔⟩ generated by 𝑔, i.e.,
ord(𝑔) = ord(⟨𝑔⟩). ♢
We refer to [Sho09] for a proof of the following theorem.

Theorem 4.13. (Lagrange) Suppose 𝐺 is a finite group and 𝐻 ⊂ 𝐺 is a subgroup. Then
the order of 𝐻 divides the order of 𝐺:
ord(𝐻) ∣ ord(𝐺). ♢
If 𝐻 = ⟨𝑔⟩, then the Theorem of Lagrange implies:

Corollary 4.14. Let 𝐺 be a finite group and 𝑔 ∈ 𝐺; then the group order is divisible by
the element order:
ord(𝑔) ∣ ord(𝐺). ♢
We obtain Euler’s Theorem:

Theorem 4.15. (Euler’s Theorem) Let 𝐺 be a finite group and 𝑔 ∈ 𝐺; then
𝑔ord(𝐺) = 𝑒.
Proof. Let 𝑛 be the smallest positive integer such that 𝑔𝑛 = 𝑒. It follows that 𝑛 =
ord(𝑔), since ⟨𝑔⟩ has exactly ord(𝑔) elements. Hence
(𝑔𝑛 )𝑘 = 𝑔ord(𝑔) 𝑘 = 𝑒
for any 𝑘 ∈ ℕ. Then the assertion follows from Corollary 4.14. □
Euler’s Theorem is often stated for 𝐺 = ℤ∗𝑛 . Since ord(ℤ∗𝑛 ) = 𝜑(𝑛) (see Definition
3.18), we obtain
𝑥𝜑(𝑛) ≡ 1 mod 𝑛
for any 𝑥 ∈ ℤ with gcd(𝑥, 𝑛) = 1. If 𝑛 is a prime number 𝑝, then one has
𝑥𝑝−1 ≡ 1 mod 𝑝
for any integer 𝑥 which is not a multiple of 𝑝. This implies Fermat’s Little Theorem:
𝑥𝑝 ≡ 𝑥 mod 𝑝.
This modular equation holds for any prime number 𝑝 and integer 𝑥.
Euler’s Theorem shows how the exponent can be reduced in a modular exponen-
tiation.
4.1. Groups 77
Proposition 4.16. Let 𝑥 ∈ ℤ, 𝑛, 𝑎1 , 𝑎2 ∈ ℕ and assume that 𝑎1 ≡ 𝑎2 mod 𝜑(𝑛). Then

𝑥𝑎1 ≡ 𝑥𝑎2 mod 𝑛 .
Proof. By assumption, 𝑎1 − 𝑎2 is a multiple of 𝜑(𝑛). Then Euler’s Theorem implies

𝑥𝑎1 −𝑎2 ≡ 1 mod 𝑛. □
Note that the exponent can be reduced modulo 𝜑(𝑛), but not modulo 𝑛 (compare
Warning 3.23).
Example 4.17. Calculate 722 mod 11. Since 𝜑(11) = 10 and 22 ≡ 2 mod 10, we
obtain 722 ≡ 72 = 49 ≡ 5 mod 11. ♢
Now we will study an important type of group, the cyclic groups.

Definition 4.18. Let 𝐺 be a group and 𝑔 ∈ 𝐺. If ⟨𝑔⟩ = 𝐺, then 𝐺 is called a cyclic
group and we say 𝑔 is a generator of 𝐺. ♢
The elements of a cyclic group with generator 𝑔 are

⟨𝑔⟩ = {… , 𝑔−2 , 𝑔−1 , 𝑒, 𝑔, 𝑔2 , 𝑔3 , … }.
If a cyclic group has finite order 𝑛, then 𝑔𝑛 = 𝑒, 𝑔𝑛+1 = 𝑔, 𝑔𝑛+2 = 𝑔2 , … and also
𝑔−1 = 𝑔𝑛−1 , 𝑔−2 = 𝑔𝑛−2 , … (see Figure 4.1). Hence
⟨𝑔⟩ = {𝑒, 𝑔, 𝑔2 , … , 𝑔𝑛−1 }.
𝑒 = 𝑔0
𝑛−1 𝑔
𝑔
𝑔2
𝑔3
Figure 4.1. Cyclic group of order 𝑛 generated by 𝑔.

Example 4.19. (1) (ℤ𝑛 , +) is generated by 1, i.e., ℤ𝑛 = ⟨1⟩. Hence ℤ𝑛 is a cyclic

group of order 𝑛.
(2) (ℤ, +) is a cyclic group of infinite order and ℤ = ⟨1⟩ = ⟨−1⟩. ♢
In fact, one can show that all cyclic groups are isomorphic to either (ℤ𝑛 , +) or
(ℤ, +).
Proposition 4.20. Let 𝐺 be a cyclic group. If ord(𝐺) = 𝑛, then 𝐺 is isomorphic to ℤ𝑛 . If
ord(𝐺) = ∞, then 𝐺 is isomorphic to ℤ.
Proof. Let 𝐺 = ⟨𝑔⟩. We use the multiplicative notation for 𝐺. If 𝑔 has infinite order,
then 𝑓 ∶ ℤ → 𝐺 with 𝑓(𝑘) = 𝑔𝑘 defines an isomorphism from the additive group (ℤ, +)
to 𝐺. If ord(𝑔) = 𝑛, then 𝑓 ∶ ℤ𝑛 → 𝐺, 𝑓(𝑘 mod 𝑛) = 𝑔𝑘 gives an isomorphism. □
Example 4.21. (ℤ∗5 , ⋅) is generated by 2 mod 5 and is cyclic of order 4. We have seen
above (Example 4.6) that 𝑓(𝑘 mod 4) = 2𝑘 mod 5 gives an isomorphism between
the additive group ℤ4 and the multiplicative group ℤ∗5 . Hence two groups that look
different may still be isomorphic. ♢
How can one find or verify generators of a finite cyclic group? If the group is large,
then it is inefficient or even computationally impossible to compute the sequence of
powers 𝑔, 𝑔2 , 𝑔3 , … and to check whether all elements of 𝐺 occur. We can use the
following observation to give a more efficient method: suppose that ord(𝐺) = 𝑛. If 𝑔 is
𝑛
not a generator, then ord(𝑔) is strictly less than 𝑛 and divides for some prime factor
𝑞
𝑞 of 𝑛. In this case, we have 𝑔𝑛/𝑞 = 1. Hence we only need to check the powers 𝑔𝑛/𝑞
for all prime factors 𝑞 of 𝑛. If all powers are different from the identity element, then
ord(𝑔) = 𝑛 and 𝑔 is a generator.
The following Algorithm 4.1 takes a finite cyclic group 𝐺, the group order 𝑛 and
an element 𝑔 ∈ 𝐺 as input and outputs TRUE if 𝑔 is a generator and otherwise FALSE.
Algorithm 4.1 Generator of a Cyclic Group

Input: Cyclic group 𝐺 of order 𝑛 ∈ ℕ, identity element 1 and 𝑔 ∈ 𝐺.
Output: TRUE, if ord(𝑔) = 𝑛. Otherwise, return FALSE.
Initialisation: result=TRUE
1: for each prime factor 𝑞 of 𝑛 do
2: 𝑏 = 𝑔𝑛/𝑞
3: if 𝑏 = 1 then
4: result=FALSE
5: break
6: end if
7: end for
8: return result
4.1. Groups 79
Example 4.22. Let 𝐺 = ℤ∗53 . We want to check whether 𝑔 = 2 mod 53 is a generator

of 𝐺. Since 53 is a prime, we have ord(𝐺) = 52. The factorization 52 = 22 ⋅ 13 yields
the prime factors 2 and 13. One computes 𝑔52/13 = 24 = 16 mod 53 and 𝑔52/2 = 226 ≡
52 mod 53. We conclude that that 𝑔 = 2 is a generator of 𝐺. It also follows that 𝑔2 = 4
has order 26 and the order of 𝑔4 = 16 is 13.
Now consider ℎ = 7 mod 53; then 74 ≡ 16 mod 53 and ℎ26 ≡ 1 mod 53. Hence
7 is not a generator of 𝐺. Since ord(ℎ) divides ord(𝐺) = 52, the order of ℎ must be 1, 2,
13 or 26. Because ℎ2 ≡ 49 mod 53 and ℎ13 ≡ 52 mod 53 are not congruent to 1, we
conclude that ord(ℎ) = 26.
Definition 4.23. A generator of the multiplicative group ℤ∗𝑛 is called a primitive root
modulo 𝑛. ♢
The following Theorem states that primitive roots modulo prime numbers exist.
Theorem 4.24. Let 𝑝 be a prime; then ℤ∗𝑝 is a cyclic group of order 𝑝 − 1. The number
of primitive roots is 𝜑(𝑝 − 1). ♢
There are several (non-trivial) proofs of this theorem. Note that there are certain
composite numbers, for example 𝑛 = 12, such that ℤ∗𝑛 does not possess a primitive
root.
Example 4.25. Let 𝑝 = 2535301200456458802993406412663. We use SageMath to
compute element orders in ℤ∗𝑝 and to find a primitive root. First, we verify that 𝑝 is
prime and factorize 𝑝 − 1.
sage: p =2535301200456458802993406412663; is_prime (p); factor (p -1)
True
2 * 1267650600228229401496703206331
The prime factors of 𝑝 − 1 are 2 and 𝑞 = 1267650600228229401496703206331. Hence

ord(𝑔) ∈ {1, 2, 𝑞, 𝑝 − 1} for all 𝑔 ∈ ℤ∗𝑝 . Obviously, only 𝑔 ≡ 1 has order 1 and only
𝑔 ≡ −1 has order 2. The other residue classes have either prime order 𝑞 or maximal
order 2𝑞 = 𝑝 − 1. If 𝑔𝑞 ≡ 1 mod 𝑝, then ord(𝑔) = 𝑞. Otherwise, ord(𝑞) = 𝑝 − 1 and 𝑔
is a primitive root. For example, ord(3 mod 𝑝) = 𝑞 and ord(7 mod 𝑝) = 𝑝 − 1.
sage: q =1267650600228229401496703206331
power_mod (3,q,p)
1
sage: power_mod (7,q,p)
2535301200456458802993406412662
The Chinese Remainder Theorem provides a decomposition of cyclic groups.

Theorem 4.26. (Chinese Remainder Theorem) Let 𝑎, 𝑏 ∈ ℕ be relatively prime, i.e.,
gcd(𝑎, 𝑏) = 1. Set 𝑛 = 𝑎𝑏. Then the natural map 𝑓 ∶ ℤ𝑛 → ℤ𝑎 × ℤ𝑏 , given by
𝑓(𝑘 mod 𝑛) = (𝑘 mod 𝑎, 𝑘 mod 𝑏), is well defined and gives an isomorphism of ad-
ditive groups:
ℤ𝑛 ≅ ℤ𝑎 × ℤ𝑏 .
The restriction of this map to the group of units yields an isomorphism of multiplicative
groups:
ℤ∗𝑛 ≅ ℤ∗𝑎 × ℤ∗𝑏 .
Proof. We prove the isomorphism of the additive groups. Since 𝑎 and 𝑏 divide 𝑛, the
map is well defined. It follows from the definition of 𝑓 that the map is a homomor-
phism. Since ℤ𝑛 and ℤ𝑎 × ℤ𝑏 both contain 𝑛 = 𝑎𝑏 elements, it suffices to prove the sur-
jectivity. Let (𝑘1 mod 𝑎, 𝑘2 mod 𝑏) ∈ ℤ𝑎 ×ℤ𝑏 . We need to find an element 𝑘 ∈ ℤ such
that 𝑘 ≡ 𝑘1 mod 𝑎 and 𝑘 ≡ 𝑘2 mod 𝑏. Since gcd(𝑎, 𝑏) = 1, the Extended Euclidean
Algorithm gives 𝑥, 𝑦 ∈ ℤ such that 𝑎𝑥 + 𝑏𝑦 = 1. This equation implies 𝑎𝑥 ≡ 1 mod 𝑏
and 𝑏𝑦 ≡ 1 mod 𝑎. Now we set
𝑘 = 𝑘1 𝑏𝑦 + 𝑘2 𝑎𝑥.
Then 𝑘 ≡ 𝑘1 𝑏𝑦 ≡ 𝑘1 mod 𝑎 and 𝑘 ≡ 𝑘2 𝑎𝑥 ≡ 𝑘2 mod 𝑏, as desired. □
If 𝑝 and 𝑞 are different prime numbers and 𝑛 = 𝑝𝑞, then

ℤ𝑛 ≅ ℤ𝑝 × ℤ𝑞 and ℤ∗𝑛 ≅ ℤ∗𝑝 × ℤ𝑞 .
Furthermore, we obtain 𝜑(𝑛) = 𝜑(𝑝)𝜑(𝑞) = (𝑝 − 1)(𝑞 − 1). These groups play an
important role in the RSA algorithm (see Section 9.2).
Remark 4.27. The Chinese Remainder Theorem also holds true for more than two
factors if the factors are pairwise relatively prime.
Example 4.28. Let 𝑛 = 60 = 22 ⋅ 3 ⋅ 5. Then the Chinese Remainder Theorem gives
the following decompositions of ℤ60 :
ℤ60 ≅ ℤ4 × ℤ15 ≅ ℤ4 × ℤ3 × ℤ5 .
But note that ℤ4 is not isomorphic to ℤ2 × ℤ2 . Both groups have order 4, but the first
group is cyclic while the second is not. ♢
Cyclic groups form the main building block in the classification of arbitrary finite
abelian groups.
Theorem 4.29. (Fundamental Theorem of Abelian Groups) Let 𝐺 be a finite abelian
group. Then 𝐺 is isomorphic to a direct product of cyclic groups ℤ𝑝𝑘 of order 𝑝𝑘 , where 𝑝
is a prime number and 𝑘 ∈ ℕ. The same prime 𝑝 can appear in several factors.
Proof. According to Proposition 4.20, cyclic groups of order 𝑛 are isomorphic to ℤ𝑛 .

Suppose
𝑘 𝑘
𝑛 = 𝑝1 1 ⋯ 𝑝𝑛𝑛
4.2. Rings and Fields 81
is the prime factorization of 𝑛. Then the Chinese Remainder Theorem implies

ℤ𝑛 ≅ ℤ 𝑘 × ⋯ × ℤ 𝑝 𝑘𝑛 .
𝑝1 1 𝑛
It remains to show that every finite abelian group is a product of cyclic groups (see for
instance [Sho09]). □
Example 4.30. (1) Let 𝐺 be an abelian group of order 77. Then 𝐺 ≅ ℤ7 × ℤ11 . 𝐺 is
isomorphic to ℤ77 and is cyclic.
(2) Suppose 𝐺 is an abelian group of order 18. Then 𝐺 is either isomorphic to ℤ2 × ℤ9
or to ℤ2 × ℤ3 × ℤ3 . Note that these two groups are not isomorphic. The first group
is cyclic and the second group is not cyclic.
4.2. Rings and Fields

Rings are sets with two operations:
Definition 4.31. A ring (or, more precisely, a commutative ring with unity) is a set 𝑅
with two operations (addition + and multiplication ⋅) and the following properties:
• (𝑅, +) is an abelian group with the additive identity element 0.
• (𝑅, ⋅) satisfies the associative law, is commutative and has an identity element
denoted by 1. The existence of an inverse element is not required.
• 𝑥 ⋅ (𝑦 + 𝑧) = (𝑥 ⋅ 𝑦) + (𝑥 ⋅ 𝑧) for all 𝑥, 𝑦, 𝑧 ∈ 𝑅 (distributivity).
Example 4.32. The integers ℤ and ℤ𝑛 , the integers modulo 𝑛, form a ring with respect
to addition and multiplication of integers and residue classes, respectively. ♢
Maps between rings that are compatible with addition and multiplication are
called ring homomorphisms.
Definition 4.33. Let 𝑓 ∶ 𝑅1 → 𝑅2 be a map between the rings 𝑅1 and 𝑅2 . Then 𝑓 is
called a ring homomorphism if
(1) 𝑓(𝑥 + 𝑦) = 𝑓(𝑥) + 𝑓(𝑦) for all 𝑥, 𝑦 ∈ 𝑅, and
(2) 𝑓(𝑥 ⋅ 𝑦) = 𝑓(𝑥) ⋅ 𝑓(𝑦) for all 𝑥, 𝑦 ∈ 𝑅, and
(3) 𝑓(1) = 1.
A bijective ring homomorphism is called an isomorphism, and one writes 𝑅1 ≅ 𝑅2 .
Example 4.34. Let 𝑎, 𝑏 ∈ ℕ be relatively prime and 𝑛 = 𝑎𝑏. Then the Chinese Re-
mainder Theorem 4.26 gives a ring isomorphism
ℤ𝑛 ≅ ℤ𝑎 × ℤ𝑏 .
Definition 4.35. Let 𝑅 be a ring. Then the subset of invertible elements with respect to
multiplication is called the units of 𝑅 and is denoted by 𝑅∗ . The units form an abelian
group. ♢
Definition 4.35 generalizes Definition 3.18 where we defined the units ℤ∗𝑛 of the
integers modulo 𝑛.
Example 4.36. ℤ∗ = {−1, 1}. This group is isomorphic to the additive group ℤ2 . ♢
Clearly, the additive identity element 0 ∈ 𝑅 is not multiplicatively invertible, but

if all other elements of 𝑅 are invertible, then 𝑅 is called a field.
Definition 4.37. A field 𝐾 is a ring where all nonzero elements are units, i.e., 𝐾 ∗ =
𝐾 ⧵ {0}.
Example 4.38. ℚ, ℝ and ℂ are fields, but ℤ is not a field.
Definition 4.39. Let 𝐹 be a field and 𝐾 ⊂ 𝐹. If 𝐾 is a field with respect to the field
operations inherited from 𝐹, then we say that 𝐾 is a subfield of 𝐹 or 𝐹 is a field extension
of 𝐾. ♢
It is evident that a field extension 𝐹 of 𝐾 is also a vector space over 𝐾 (see Section
4.4 on vector spaces).
Definition 4.40. Let 𝐹 be a field extension of 𝐾. If the dimension of 𝐹 over 𝐾 (as a
𝐾-vector space) is finite and equal to 𝑛, then 𝑛 is called the degree of the field extension
and we write [𝐹 ∶ 𝐾] = 𝑛.
Example 4.41. (1) ℝ is a subfield of ℂ. Furthermore, ℂ is a vector space over ℝ and
[ℂ ∶ ℝ] = 2. A basis is given by 1 and 𝑖, where 𝑖 ∈ ℂ is the imaginary unit.
(2) ℝ is a field extension of ℚ, but the degree is infinite. ♢
The following section deals with finite fields and their extensions.
4.3. Finite Fields

We are particularly interested in finite fields. A natural candidate is the ring of integers
modulo 𝑛, but this only gives a field if 𝑛 is a prime number:
Proposition 4.42. Let 𝑛 ≥ 2. Then (ℤ𝑛 , +, ⋅) is a field if and only if 𝑛 is a prime number.
Proof. Suppose 𝑛 is prime. Then the integers 1, 2, … , 𝑛 − 1 are relatively prime to

𝑛 and hence invertible modulo 𝑛 (see Proposition 3.17). Conversely, if these numbers
are invertible and therefore relatively prime to 𝑛, then 𝑛 has no non-trivial divisors and
must be a prime number. □
Warning 4.43. In algebraic number theory, the ring of 𝑝-adic integers is denoted by
ℤ𝑝 . This infinite ring is the 𝑝-adic completion of ℤ and not needed in our context.
In cryptography, ℤ𝑛 stands for the finite cyclic group of order 𝑛 and ℤ𝑝 is the finite
cyclic group of prime order 𝑝. However, ℤ/(𝑛) = ℤ/𝑛ℤ and ℤ/(𝑝) = ℤ/𝑝ℤ is a clearer
notation.
Definition 4.44. Let 𝑝 be a prime number; then we write 𝐺𝐹(𝑝) or 𝔽𝑝 for the field
(ℤ𝑝 , +, ⋅) with 𝑝 elements.
Example 4.45. The binary digits {0, 1} form a field which is isomorphic to 𝐺𝐹(2),
where addition is XOR and multiplication is AND (see Table 1.1 in Chapter 1). ♢
There are rings of any finite order (for example ℤ𝑛 ), but this is not the case for
fields:
Proposition 4.46. Let 𝐾 be a finite field of order 𝑛; then 𝑛 is a prime number or a prime-
power.
Proof. Let 1 be the multiplicative identity element in 𝐾. By assumption, ord(𝐾) = 𝑛

and therefore 𝑛 ⋅ 1 = 0 in 𝐾. At least one prime factor 𝑝 of 𝑛 must be zero in 𝐾, i.e.,
𝑝 ⋅ 1 = 0 in 𝐾, because otherwise all prime factors are invertible, which is impossible
since their product is 0. This implies that 𝐺𝐹(𝑝) is a subfield of 𝐾. In fact, 𝐺𝐹(𝑝) is the
image of the ring homomorphism ℤ → 𝐾 that sends 1 ∈ ℤ to 1 ∈ 𝐾. By Definition
4.39, 𝐾 is a field extension and thus a vector space over 𝐺𝐹(𝑝). Since 𝐾 is a finite field,
the degree of 𝐾 over 𝐺𝐹(𝑝) must be finite, say [𝐾 ∶ 𝐺𝐹(𝑝)] = 𝑚. Hence ord(𝐾) = 𝑝𝑚 ,
which proves the assertion. □
Definition 4.47. Let 𝐾 be a field. The characteristic of 𝐾 is defined to be the smallest
positive integer 𝑛 such that 𝑛 ⋅ 1 = 0 in 𝐾. If 𝑛 ⋅ 1 ≠ 0 in 𝐾 for all nonzero integers
𝑛, then 𝐾 is said to have characteristic 0. The characteristic of a field 𝐾 is denoted by
𝑐ℎ𝑎𝑟(𝐾).
Example 4.48. 𝑐ℎ𝑎𝑟(ℚ) = 𝑐ℎ𝑎𝑟(ℝ) = 𝑐ℎ𝑎𝑟(ℂ) = 0. For a prime number 𝑝, 𝑐ℎ𝑎𝑟(𝐺𝐹(𝑝))

= 𝑝. ♢
However, 𝐺𝐹(𝑝) is not the only field of characteristic 𝑝, and in the following we
construct fields of order 𝑝𝑛 for primes 𝑝 and integers 𝑛 ≥ 2. Unfortunately, the naive
constructions do not work:
• ℤ𝑝𝑛 is a ring with 𝑝𝑛 elements, but not a field (compare Proposition 4.42). For
example, 𝑝 is nonzero but not invertible modulo 𝑝𝑛 since gcd(𝑝, 𝑝𝑛 ) = 𝑝.
• ℤ𝑛𝑝 = ℤ𝑝 × ⋯ × ℤ𝑝 with component-wise addition and multiplication is a ring
with 𝑝𝑛 elements, but not a field (see Exercise 12). ♢
In fact, the construction of a field 𝐺𝐹(𝑝𝑛 ) of order 𝑝𝑛 is a bit more involved and
requires polynomial rings.
Definition 4.49. Let 𝐾 be a field. Then 𝐾[𝑥] is called the set (or ring) of polynomials
over 𝐾 and consists of all formal expressions
𝑛
𝑓(𝑥) = ∑ 𝑎𝑖 𝑥𝑖 = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 + ⋯ + 𝑎𝑛 𝑥𝑛 ,
𝑖=0
where 𝑎𝑖 ∈ 𝐾 and 𝑛 ≥ 0 is an integer. The degree deg(𝑓) of 𝑓 is equal to 𝑛 if 𝑎𝑛 ≠ 0.

The degree of constant polynomials is 0. A polynomial is called monic if 𝑎𝑛 = 1. ♢
Polynomials can be added and multiplied in the obvious way.
Proposition 4.50. The polynomials (𝐾[𝑥], +, ⋅) over 𝐾 form a ring.
Example 4.51. Let 𝑓, 𝑔 ∈ 𝐺𝐹(2)[𝑥] with 𝑓(𝑥) = 1 + 𝑥 + 𝑥3 and 𝑔(𝑥) = 𝑥 + 𝑥2 . We

compute
𝑓(𝑥) + 𝑔(𝑥) = (1 + 𝑥 + 𝑥3 ) + (𝑥 + 𝑥2 ) = 1 + 𝑥2 + 𝑥3 ,
𝑓(𝑥) ⋅ 𝑔(𝑥) = (1 + 𝑥 + 𝑥3 )(𝑥 + 𝑥2 )
= 𝑥 + 𝑥2 + 𝑥2 + 𝑥3 + 𝑥4 + 𝑥5 = 𝑥 + 𝑥3 + 𝑥4 + 𝑥5 . ♢
Note that 𝐾[𝑥] is not a field, since polynomials of degree ≥ 1 cannot be inverted
multiplicatively.
But we have a division with remainder. Let 𝑓(𝑥), 𝑔(𝑥) ∈ 𝐾[𝑥] with 𝑔(𝑥) ≠ 0. Then
the division 𝑓(𝑥) ∶ 𝑔(𝑥) gives a quotient 𝑞(𝑥) ∈ 𝐾[𝑥] and a remainder 𝑟(𝑥) ∈ 𝐾[𝑥]
such that
𝑓(𝑥) = 𝑞(𝑥)𝑔(𝑥) + 𝑟(𝑥) and deg(𝑟) < deg(𝑔).
Obviously, 𝑔(𝑥) divides 𝑓(𝑥) if and only if the remainder is 0.
Example 4.52. Let 𝑓(𝑥) = 𝑥6 + 𝑥5 + 𝑥3 + 𝑥2 + 𝑥 + 1 and 𝑔(𝑥) = 𝑥4 + 𝑥3 + 1 be

polynomials in 𝐺𝐹(2)[𝑥]. The quotient of 𝑓(𝑥) ∶ 𝑔(𝑥) is 𝑞(𝑥) = 𝑥2 , the remainder is
𝑟(𝑥) = 𝑥3 + 𝑥 + 1 and we have an equation
𝑥6 + 𝑥5 + 𝑥3 + 𝑥2 + 𝑥 + 1 = 𝑥2 (𝑥4 + 𝑥3 + 1) + (𝑥3 + 𝑥 + 1).
SageMath can do these calculations:
sage: R.<x> = PolynomialRing (GF(2),x)
sage: f = x^6+x^5+x^3+x^2+x+1
sage: g = x^4+x^3+1
sage: q= f // g
sage: r= f % g
sage: print q, r, q*g+r
x^2 x^3 + x + 1 x^6 + x^5 + x^3 + x^2 + x + 1
Definition 4.53. Let 𝑓(𝑥), 𝑔(𝑥) ∈ 𝐾[𝑥] be nonzero polynomials. Then the greatest
common divisor gcd(𝑓, 𝑔) is the monic polynomial of highest possible degree that di-
vides 𝑓(𝑥) and 𝑔(𝑥). ♢
The greatest common divisor (gcd) of polynomials can be efficiently computed us-
ing the Extended Euclidean Algorithm, analogous to the gcd of integers (see Algorithm
3.1 in Chapter 3). The integer division is replaced by the division of polynomials with
remainder. The Extended Euclidean Algorithm takes two polynomials 𝑓 and 𝑔 as input
and outputs 𝑔𝑐𝑑(𝑓, 𝑔) along with two polynomials 𝑎(𝑥) and 𝑏(𝑥) such that
𝑔𝑐𝑑(𝑓, 𝑔) = 𝑎(𝑥)𝑓(𝑥) + 𝑏(𝑥)𝑔(𝑥).
The gcd of 𝑓 and 𝑔 has the following property: if ℎ(𝑥) divides 𝑓(𝑥) and 𝑔(𝑥), then ℎ(𝑥)
divides gcd(𝑓, 𝑔).
Example 4.54. We compute gcd(𝑥3 + 1, 𝑥2 + 1) over 𝐺𝐹(2) (see Table 4.1) and obtain
gcd(𝑥3 + 1, 𝑥2 + 1) = 𝑥 + 1 = 1 ⋅ (𝑥3 + 1) − 𝑥 ⋅ (𝑥2 + 1).
Table 4.1. Computation of gcd(𝑥3 + 1, 𝑥2 + 1) = 𝑥 + 1.
(𝑥3 + 1) ∶ (𝑥2 + 1) = 𝑥 rem. 𝑥 + 1 (𝑥3 + 1) = 𝑥 ⋅ (𝑥2 + 1) + (𝑥 + 1)

(𝑥2 + 1) ∶ (𝑥 + 1) = (𝑥 + 1) rem. 0 (𝑥2 + 1) = (𝑥 + 1) ⋅ (𝑥 + 1) + 0
There is a formal derivative of polynomials:

Definition 4.55. The formal derivative is a map 𝐷 ∶ 𝐾[𝑥] → 𝐾[𝑥] defined by 𝐷(𝑥𝑛 ) =
𝑛 𝑥𝑛−1 , and a 𝐾-linear extension to arbitrary polynomials:
𝐷(𝑎𝑛 𝑥𝑛 + 𝑎𝑛−1 𝑥𝑛−1 + ⋯ + 𝑎1 𝑥 + 𝑎0 ) = 𝑛𝑎𝑛 𝑥𝑛−1 + (𝑛 − 1)𝑎𝑛−1 𝑥𝑛−2 + ⋯ + 𝑎1 .
𝑛
Example 4.56. Let 𝑛 ∈ ℕ and 𝑓(𝑥) = 𝑥2 + 𝑥 ∈ 𝐺𝐹(2)[𝑥]. Then
𝑛 −1
𝐷(𝑓) = 2𝑛 𝑥2 + 1 = 1,
since 2𝑛 = 0 in 𝐺𝐹(2). ♢
One can show that the derivative satisfies the product rule (see Exercise 13):
𝐷(𝑓 ⋅ 𝑔) = 𝑓 ⋅ 𝐷(𝑔) + 𝐷(𝑓) ⋅ 𝑔.
Note that 𝐷 does not have a geometric interpretation as in the real case. However,
the derivative can detect double roots of polynomials.
Proposition 4.57. Let 𝑓(𝑥) ∈ 𝐾[𝑥] and assume that gcd(𝑓, 𝐷(𝑓)) = 1. Then 𝑓(𝑥) is
square-free, i.e., it is not divisible by the square of any polynomial of degree at least 1. In
particular, 𝑓(𝑥) is not divisible by (𝑥 − 𝑎)2 for any 𝑎 ∈ 𝐾 and does not have double roots.
Proof. Suppose that 𝑓 is not square-free and 𝑓 = 𝑔2 ℎ for polynomials 𝑔, ℎ ∈ 𝐾[𝑥]

with deg(𝑔) ≥ 1; then
𝐷(𝑓) = 𝐷(𝑔2 )ℎ + 𝑔2 𝐷(ℎ) = 2𝑔𝐷(𝑔)ℎ + 𝑔2 𝐷(ℎ).
Hence 𝑔 is a common factor of 𝑓 and 𝐷(𝑓), a contradiction to gcd(𝑓, 𝐷(𝑓)) = 1. □
2𝑛
Example 4.58. Let 𝑛 ∈ ℕ and 𝑓(𝑥) = 𝑥 + 𝑥 ∈ 𝐺𝐹(2)[𝑥] (see above). Since
gcd(𝑓, 𝐷(𝑓)) = 1, the polynomial 𝑓 does not have multiple roots. ♢
Now we define residue classes of polynomial rings:
Definition 4.59. Let 𝑔 ∈ 𝐾[𝑥] be a polynomial with deg(𝑔) ≥ 1. Then 𝑔(𝑥) defines an
equivalence relation on 𝐾[𝑥]:
𝑓1 (𝑥) ∼ 𝑓2 (𝑥) if 𝑓1 (𝑥) − 𝑓2 (𝑥) = 𝑞(𝑥)𝑔(𝑥) for some 𝑞(𝑥) ∈ 𝐾[𝑥].
Equivalent polynomials 𝑓1 and 𝑓2 are called congruent modulo 𝑔(𝑥) and we write 𝑓1 (𝑥) ≡
𝑓2 (𝑥) mod 𝑔(𝑥). The set of equivalence classes or residue classes modulo 𝑔(𝑥) is denoted
by 𝐾[𝑥]/(𝑔(𝑥)). ♢
Two polynomials 𝑓1 and 𝑓2 are congruent modulo 𝑔 if and only if they have the
same remainder when divided by 𝑔(𝑥). Note that the definition is similar to residue
classes modulo an integer 𝑛, but here the construction is based on the polynomial ring
𝐾[𝑥] instead of the ring of integers ℤ.
The residue classes modulo 𝑔(𝑥) form not only a set, but also a ring:
Proposition 4.60. Let 𝑔 ∈ 𝐾[𝑥] and 𝑛 = deg(𝑔) ≥ 1. Then 𝐾[𝑥]/(𝑔(𝑥)) is again a ring
called a quotient ring, factor ring or residue class ring, with the operations induced by
𝐾[𝑥]. Each residue class has a unique standard representative of degree less than 𝑛.
Proof. The ring structure can be easily verified. The standard representative can be
found by division with remainder. Let 𝑓(𝑥) ∈ 𝐾[𝑥] be any representative of a residue
class. We divide 𝑓(𝑥) by 𝑔(𝑥) and obtain polynomials 𝑞(𝑥), 𝑟(𝑥) such that
𝑓(𝑥) = 𝑞(𝑥)𝑔(𝑥) + 𝑟(𝑥),
where deg(𝑟) < 𝑛. The equation implies 𝑓(𝑥) ≡ 𝑟(𝑥) mod 𝑔(𝑥), where 𝑟(𝑥) is the
standard representative. □
Example 4.61. We continue Example 4.52. The division with remainder implies
𝑥6 + 𝑥5 + 𝑥3 + 𝑥2 + 𝑥 + 1 ≡ 𝑥3 + 𝑥 + 1 mod 𝑥4 + 𝑥3 + 1.
Therefore, the classes of 𝑥6 + 𝑥5 + 𝑥3 + 𝑥2 + 𝑥 + 1 and 𝑥3 + 𝑥 + 1 are equal in the residue
class ring 𝐺𝐹(2)[𝑥]/(𝑥4 + 𝑥3 + 1).
Remark 4.62. The construction of residue classes can be studied in a more general
context. Let 𝑅 be a ring. An ideal 𝐼 ⊂ 𝑅 is an additive subgroup with the property that
𝑟 ⋅ 𝑥 ∈ 𝐼 for all 𝑟 ∈ 𝑅 and 𝑥 ∈ 𝐼. For any ideal 𝐼 of a ring 𝑅 one has the quotient ring
𝑅/𝐼. Two elements 𝑥, 𝑦 ∈ 𝑅 are equivalent and identified in 𝑅/𝐼 if 𝑥 − 𝑦 ∈ 𝐼.
We considered polynomial rings 𝑅 = 𝐾[𝑥] and principal ideals 𝐼 = (𝑔(𝑥)) gener-
ated by a single polynomial 𝑔(𝑥) ∈ 𝐾[𝑥]. If 𝑅 = ℤ and 𝐼 = (𝑛), then the quotient 𝑅/𝐼
defines the integers modulo 𝑛, i.e., ℤ/(𝑛) = ℤ𝑛 .
Proposition 4.63. Let 𝑝 be a prime and 𝑔 ∈ 𝐺𝐹(𝑝)[𝑥] a polynomial of degree 𝑛; then

the quotient ring 𝐺𝐹(𝑝)[𝑥]/(𝑔(𝑥)) has 𝑝𝑛 elements.
Proof. The elements of 𝐺𝐹(𝑝)[𝑥]/(𝑔(𝑥)) are in bijection with the polynomials in

𝐺𝐹(𝑝)[𝑥] of degree less than 𝑛 (see Proposition 4.60). Such polynomials are given by 𝑛
coefficients in 𝐺𝐹(𝑝). Hence the number of elements in the quotient ring is 𝑝𝑛 . □
The polynomial ring 𝐾[𝑥] has similar properties to ℤ with respect to factorization.
Polynomials can be decomposed into a product of polynomials and the factorization is
essentially unique. The prime numbers in ℤ correspond to irreducible polynomials in
𝐾[𝑥].
Definition 4.64. A polynomial 𝑓(𝑥) ∈ 𝐾[𝑥] is called irreducible, if it cannot be fac-
tored into two polynomials of smaller degree. Otherwise, the polynomial is called re-
ducible. ♢
Irreducible polynomials in 𝐾[𝑥] do not possess any zeros 𝑎 ∈ 𝐾, since otherwise a

linear factor (𝑥 − 𝑎) can be split off. The converse statement only holds for degree ≤ 3.
There are polynomials of degree ≥ 4 without zeros which are reducible (see Example
4.65 (4) below).
Example 4.65. (1) 𝑓(𝑥) = 𝑥2 + 1 is irreducible in ℝ[𝑥], but reducible in ℂ[𝑥]:
𝑥2 + 1 = (𝑥 − 𝑖)(𝑥 + 𝑖).
𝑓(𝑥) is also reducible over 𝐺𝐹(2), since 𝑥2 + 1 = (1 + 𝑥)2 in 𝐺𝐹(2)[𝑥].
(2) 𝑓(𝑥) = 𝑥2 + 𝑥 + 1 is irreducible in 𝐺𝐹(2)[𝑥]. Assume that 𝑓(𝑥) is reducible. Then
𝑓(𝑥) factorizes into two linear factors of type (𝑥 − 𝑎), which is impossible since
𝑓(𝑥) does not have a root in 𝐺𝐹(2) (note that 𝑓(0) = 1 and 𝑓(1) = 1).
(3) 𝑓(𝑥) = 𝑥2 + 1 is irreducible over 𝐺𝐹(3)[𝑥], since 𝑓(0), 𝑓(1) and 𝑓(2) are nonzero
modulo 3.
(4) 𝑔(𝑥) = 𝑥4 + 𝑥2 + 1 has no zeros over 𝐺𝐹(2), but 𝑔(𝑥) = (𝑥2 + 𝑥 + 1)2 in 𝐺𝐹(2)[𝑥].
Hence 𝑔(𝑥) is reducible.
Table 4.2 lists the irreducible polynomials over 𝐺𝐹(2) of degree up to 5.

Example 4.66. Let 𝑔(𝑥) = 𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1 ∈ 𝐺𝐹(2)[𝑥]. This polynomial is
used by the AES block cipher (see Section 5.2). We use SageMath to verify that 𝑔(𝑥) is
irreducible:
sage: g=x^8+x^4+x^3+x+1
sage: g. is_irreducible ()
True
An important fact is that polynomial rings modulo irreducible polynomials are

fields:
Proposition 4.67. Let 𝑔(𝑥) ∈ 𝐾[𝑥] be an irreducible polynomial. Then the quotient ring
𝐾[𝑥]/(𝑔(𝑥)) is a field. ♢
Table 4.2. Irreducible polynomials over 𝐺𝐹(2).
Degree Polynomials
2 𝑥2 + 𝑥 + 1
3 𝑥3 + 𝑥 + 1
𝑥3 + 𝑥2 + 1
4 𝑥4 + 𝑥 + 1
4
𝑥 + 𝑥3 + 𝑥2 + 𝑥 + 1
𝑥4 + 𝑥3 + 1
5 𝑥5 + 𝑥2 + 1
𝑥 + 𝑥3 + 𝑥2 + 𝑥 + 1
5
𝑥5 + 𝑥3 + 1
𝑥 + 𝑥4 + 𝑥3 + 𝑥 + 1
5
𝑥5 + 𝑥4 + 𝑥3 + 𝑥2 + 1
𝑥5 + 𝑥4 + 𝑥2 + 𝑥 + 1
This result might look surprising, since 𝐾[𝑥] is far from being a field: no polyno-
mial of degree ≥ 1 is multiplicatively invertible in 𝐾[𝑥]. However, an inversion mod-
ulo 𝑔(𝑥) is often possible, since two representatives 𝑓1 and 𝑓2 are multiplicative inverses
mod 𝑔(𝑥) if
𝑓1 (𝑥)𝑓2 (𝑥) = 1 + 𝑞(𝑥)𝑔(𝑥)
for some 𝑞(𝑥) ∈ 𝐾[𝑥].
The proof of Proposition 4.67 uses the Extended Euclidean Algorithm for polynomi-
als. We briefly sketch the proof: let 𝑓(𝑥) be a nonzero polynomial of degree less than
deg(𝑔). Then there are polynomials ℎ1 and ℎ2 such that
1 = ℎ1 (𝑥)𝑓(𝑥) + ℎ2 (𝑥)𝑔(𝑥).
This shows that 1 ≡ ℎ1 (𝑥)𝑓(𝑥) mod 𝑔(𝑥) so that 𝑓(𝑥) is invertible modulo 𝑔(𝑥).
Definition 4.68. Let 𝑔(𝑥) ∈ 𝐺𝐹(𝑝)[𝑥] be an irreducible polynomial of degree 𝑛. Then
the residue field 𝐺𝐹(𝑝)[𝑥]/(𝑔(𝑥)) defines the Galois Field 𝐺𝐹(𝑝𝑛 ) = 𝔽𝑝𝑛 of order 𝑝𝑛 .
♢
It follows from Proposition 4.63 that the field 𝐺𝐹(𝑝𝑛 ) indeed contains 𝑝𝑛 elements;
each residue class has a unique representative of degree less than 𝑛, i.e., each class is
represented by a polynomial 𝑓(𝑥) = 𝑎0 + 𝑎1 𝑥 + ⋯ + 𝑎𝑛−1 𝑥𝑛−1 with 𝑎𝑖 ∈ 𝐺𝐹(𝑝).
Note that the definition of 𝐺𝐹(𝑝𝑛 ) depends on an irreducible polynomial of degree
𝑛 and it is not clear whether such a polynomial exists. We want to show that a finite
field of order 𝑝𝑛 exists and is essentially unique.
If 𝐺𝐹(𝑝𝑛 ) exists, then ord(𝐺𝐹(𝑝𝑛 )∗ ) = 𝑝𝑛 − 1 and thus
𝑛 −1
𝑎𝑝 =1
for all 𝑎 ∈ 𝐺𝐹(𝑝𝑛 )∗ . This implies

𝑛
𝑎𝑝 = 𝑎
for all 𝑎 ∈ 𝐺𝐹(𝑝𝑛 ), i.e., including the zero element. In other words, the elements of
𝐺𝐹(𝑝𝑛 ) are exactly the roots of the polynomial
𝑛
𝑓(𝑥) = 𝑥𝑝 − 𝑥.
Let 𝑎1 , 𝑎2 , … , 𝑎𝑝𝑛 denote the elements of 𝐺𝐹(𝑝𝑛 ). The polynomial 𝑓(𝑥) splits into linear
factors in 𝐺𝐹(𝑝𝑛 )[𝑥], i.e.,
𝑛
𝑥𝑝 − 𝑥 = (𝑥 − 𝑎1 ) ⋅ (𝑥 − 𝑎2 ) ⋯ (𝑥 − 𝑎𝑝𝑛 ).
This observation can be used to construct 𝐺𝐹(𝑝𝑛 ). A general theorem in field theory
states that a field can be extended such that any given polynomial of degree ≥ 1 com-
pletely factorizes over the extension field.
Theorem 4.69. Let 𝐾 be a field and 𝑓(𝑥) ∈ 𝐾[𝑥] a polynomial of positive degree. There
exists an extension field 𝐹 ⊃ 𝐾 such that 𝑓(𝑥) splits into linear factors over 𝐹. The smallest
field with this property is called the splitting field of 𝑓 over 𝐾. Any two splitting fields are
isomorphic.
Remark 4.70. We refer to the literature (for example [Sho09]) for a proof of the above
Theorem. It is tempting to construct 𝐹 as the quotient field of 𝐾[𝑥] modulo 𝑓(𝑥), but
this does not work in general. Firstly, splitting fields exist for any polynomial and not
only for irreducible ones. Secondly, even if 𝑓(𝑥) is irreducible so that 𝐹 = 𝐾[𝑥]/(𝑓(𝑥))
gives an extension field of 𝐾, the field 𝐹 might not contain all roots of 𝑓(𝑥). In this case,
further extensions of 𝐹 are necessary in order to obtain the splitting field.
Example 4.71. (1) The field ℂ of complex numbers is the splitting field of 𝑓(𝑥) =
𝑥2 + 1 over ℝ.
(2) Let 𝐾 = ℚ and 𝑓(𝑥) = 𝑥3 − 2. The polynomial is irreducible over ℚ and 𝐹 =
ℚ[𝑥]/(𝑥3 − 2) = ℚ( √ 3
2) is an extension field of degree 3 over ℚ, but 𝐹 is not the
splitting field of 𝑓(𝑥). In fact, 𝐹 is a subfield of ℝ and 𝑓(𝑥) has two complex roots
3
beside the real root √ 2. One can show that the degree of the splitting field of 𝑓(𝑥)
over ℚ is 6.
Remark 4.72. The algebraic closure 𝐾 of a field 𝐾 has the property that all polynomials
completely factor over 𝐾. One can show that an algebraic closure of a field exists and
is unique up to isomorphism. For example, ℂ is the algebraic closure of ℝ. In this
case, an extension of degree 2 suffices, but in general, the algebraic closure of a field
has infinite degree. This is, for example, true for the fields ℚ and 𝐺𝐹(𝑝). We note
that there are also transcendent extensions which are not algebraic, i.e., not defined by
adjoining roots of polynomials, for example the field ℚ(𝑥) of rational functions in one
variable over ℚ. ♢
We return to the construction of a finite field of order 𝑝𝑛 . At this point we know

𝑛
that if 𝐺𝐹(𝑝𝑛 ) exists, then it is isomorphic to the splitting field of 𝑓(𝑥) = 𝑥𝑝 − 𝑥 over
𝐺𝐹(𝑝). Now we show that the splitting field of 𝑓 has 𝑝𝑛 elements, which proves that
𝐺𝐹(𝑝𝑛 ) exists (for all 𝑛 ∈ ℕ) and is unique up to isomorphism.
𝑛
Proposition 4.73. Let 𝑓(𝑥) = 𝑥𝑝 − 𝑥 ∈ 𝐺𝐹(𝑝)[𝑥]. The splitting field of 𝑓(𝑥) over
𝐺𝐹(𝑝) has 𝑝𝑛 elements and defines the field 𝐺𝐹(𝑝𝑛 ).
Proof. Firstly, we show that 𝑓(𝑥) does not have multiple roots. The formal derivative
𝑛
is 𝐷(𝑓) = 𝑝𝑛 𝑥𝑝 −1 − 1 = −1 and thus gcd(𝑓, 𝐷(𝑓)) = 1. By Proposition 4.57, 𝑓 does
not have multiple roots and so the splitting field 𝐹 of 𝑓(𝑥) over 𝐺𝐹(𝑝) contains at least
𝑛
the 𝑝𝑛 distinct roots of 𝑥𝑝 − 𝑥. However, 𝐹 could contain more elements. Let 𝑆 =
{𝑎1 , … , 𝑎𝑝𝑛 } be the set of roots. Note that 𝐺𝐹(𝑝) ⊂ 𝑆 since 𝑎𝑝 ≡ 𝑎 mod 𝑝 for all
𝑎 ∈ 𝐺𝐹(𝑝).
Next, we show that 𝑆 forms a field which must be equal to 𝐹, since 𝐹 is the smallest
field extension of 𝐺𝐹(𝑝) where 𝑓(𝑥) splits into linear factors. We thus need to prove
that 𝑎 − 𝑏 ∈ 𝑆 for all 𝑎, 𝑏 ∈ 𝑆 and 𝑎𝑏−1 ∈ 𝑆 for all 𝑎, 𝑏 ∈ 𝑆 ⧵ {0} (see Proposition 4.8
on conditions for a subgroup). We show that 𝑓(𝑎 − 𝑏) = 0 and 𝑓(𝑎𝑏−1 ) = 0 if 𝑓(𝑎) = 0
and 𝑓(𝑏) = 0. To this end, we observe that
𝑚 𝑛 𝑛
(𝑎 − 𝑏)𝑝 = 𝑎𝑝 + (−𝑏)𝑝 = 𝑎 − 𝑏
since the other terms given by the Binomial Theorem are multiples of 𝑝 and therefore
𝑛
zero in 𝐺𝐹(𝑝). We note that (−1)𝑝 = −1 if 𝑝 ≠ 2 and −1 = 1 for 𝑝 = 2. This gives
𝑓(𝑎 − 𝑏) = 0. Furthermore,
𝑝𝑛 𝑛 𝑛 𝑛 𝑛
(𝑎(𝑏−1 )) = 𝑎𝑝 (𝑏−1 )𝑝 = 𝑎𝑝 (𝑏𝑝 )−1 = 𝑎𝑏−1 ,
𝑛
and so 𝑓(𝑎𝑏−1 ) = 0. Summarizing, the roots of 𝑓(𝑥) = 𝑥𝑝 − 𝑥 form the splitting field
𝐹 = 𝐺𝐹(𝑝𝑛 ), and this field of 𝑝𝑛 elements is unique up to isomorphism. □
Example 4.74. 𝐺𝐹(4) is the splitting field of 𝑓(𝑥) = 𝑥4 −𝑥 over 𝐺𝐹(2). This polynomial
factorizes into 𝑥(𝑥 + 1)(𝑥2 + 𝑥 + 1) over 𝐺𝐹(2). The first two factors correspond to the
elements 0 and 1 of the base field 𝐺𝐹(2). The polynomial 𝑥2 + 𝑥 + 1 is irreducible over
𝐺𝐹(2) and
𝐺𝐹(4) = 𝐺𝐹(2)[𝑥]/(𝑥2 + 𝑥 + 1).
𝐺𝐹(4) is represented by the polynomials {0, 1, 𝑥, 𝑥 + 1}.
Addition is obvious (modulo 2), and multiplication also follows the usual rules,
but the result is reduced modulo 𝑥2 + 𝑥 + 1. For example:
𝑥(𝑥 + 1) = 𝑥2 + 𝑥 ≡ 1 mod (𝑥2 + 𝑥 + 1).
Table 4.3 shows addition and multiplication in 𝐺𝐹(4).
The multiplicative inverses are 1−1 = 1, 𝑥−1 ≡ 𝑥 + 1 mod (𝑥2 + 𝑥 + 1) and
(𝑥 + 1)−1 ≡ 𝑥 mod (𝑥2 + 𝑥 + 1).
Proposition 4.75. Let 𝑛, 𝑚 ∈ ℕ. Then 𝐺𝐹(𝑝𝑚 ) ⊂ 𝐺𝐹(𝑝𝑛 ) if and only if 𝑚 ∣ 𝑛 (see
Figure 4.2).
Table 4.3. Addition and multiplication tables for 𝐺𝐹(4).
+ 0 1 𝑥 𝑥+1 ⋅ 0 1 𝑥 𝑥+1
0 0 1 𝑥 𝑥+1 0 0 0 0 0
1 1 0 𝑥+1 𝑥 1 0 1 𝑥 𝑥+1
𝑥 𝑥 𝑥+1 0 1 𝑥 0 𝑥 𝑥+1 1
𝑥+1 𝑥+1 𝑥 1 0 𝑥+1 0 𝑥+1 1 𝑥
Proof. Suppose that 𝐺𝐹(𝑝𝑚 ) ⊂ 𝐺𝐹(𝑝𝑛 ) and the degree of the field extension is
[𝐺𝐹(𝑝𝑛 ) ∶ 𝐺𝐹(𝑝𝑚 )] = 𝑘. It follows that the order of 𝐺𝐹(𝑝𝑛 ) is 𝑝𝑛 = (𝑝𝑚 )𝑘 = 𝑝𝑚𝑘
and thus 𝑚 ∣ 𝑛. To prove the opposite direction, let 𝑎 ∈ 𝐺𝐹(𝑝𝑚 ) and 𝑛 = 𝑚𝑘 for some
𝑘 ∈ ℕ. Then
𝑛 𝑚 )𝑘 𝑚 𝑚 𝑚
𝑎𝑝 = 𝑎(𝑝 = (((𝑎𝑝 )𝑝 ) … )𝑝 (𝑘-fold exponentiation).
𝑚 𝑛
Since 𝑎𝑝 = 𝑎 in each step, we obtain 𝑎𝑝 = 𝑎 and therefore 𝑎 ∈ 𝐺𝐹(𝑝𝑛 ). □
𝐺𝐹(𝑝8 ) 𝐺𝐹(𝑝9 )
𝐺𝐹(𝑝6 )
4
𝐺𝐹(𝑝 )
𝐺𝐹(𝑝2 ) 𝐺𝐹(𝑝3 ) 𝐺𝐹(𝑝5 ) 𝐺𝐹(𝑝7 )
𝐺𝐹(𝑝)
Figure 4.2. Subfield relation for several extension fields of 𝐺𝐹(𝑝).
Remark 4.76. As an additive group, 𝐺𝐹(𝑝𝑛 ) is isomorphic to

(ℤ𝑝 )𝑛 = ℤ𝑝 × ⋯ × ℤ𝑝 .
However, the multiplication in 𝐺𝐹(𝑝𝑛 ) is not defined component-wise (compare Exer-
cise 12), but rather by a multiplication of polynomials. Fixing an irreducible polyno-
mial of degree 𝑛 yields a unique representation of elements in 𝐺𝐹(𝑝𝑛 ) as 𝑛-dimensional
vectors over 𝐺𝐹(𝑝).
Example 4.77. Let 𝑔(𝑥) = 𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1 ∈ 𝐺𝐹(2)[𝑥] (see Example 4.66).
Then 𝐺𝐹(2)[𝑥]/(𝑔(𝑥)) defines the field 𝐺𝐹(28 ) with 256 elements. By fixing 𝑔(𝑥), the
elements in 𝐺𝐹(28 ) are in bijection to polynomials of degree less than 8, which in turn
correspond the 8-bit words. The first bit (most significant bit, MSB) corresponds to the
coefficient of 𝑥7 , the second bit to 𝑥6 , etc., and the last bit (least significant bit, LSB) to
𝑥0 = 1. The block cipher AES (see Section 5.2) uses this representation of elements in
𝐺𝐹(256) as 8-bit words. The elements of this field can also be written in hexadecimal
notation. Addition on the 8-bit words is given by a simple XOR operation, but the
multiplication is less obvious and defined by a multiplication of polynomials, followed
by a reduction modulo 𝑔(𝑥).
We can use SageMath for computations in 𝐺𝐹(256). Suppose we want to compute
𝑥7 (𝑥 + 1) mod 𝑔(𝑥) and the multiplicative inverse of 𝑥 + 1 mod 𝑔(𝑥).
sage: g=x^8+x^4+x^3+x+1
sage: K.<a>=R. quotient_ring (g)
sage: a^7 * (a+1) ; 1/(a+1)
a^7 + a^4 + a^3 + a + 1
a^7 + a^6 + a^5 + a^4 + a^2 + a
In hexadecimal notation, the above computations can be written as 80 ⋅ 03 = 9B and

03−1 = F6.
The SageMath variable a represents the residue class of 𝑥 mod 𝑔(𝑥). We could
also define the quotient ring by K=R.quotient_ring(g), i.e., without reference to a
new variable a. In this case, SageMath writes xbar for the residue class of 𝑥.
4.4. Linear and Affine Maps

Vector spaces, linear maps and affine maps over finite fields play an important role in
cryptography, and we recapitulate some facts from linear algebra.
Definition 4.78. Let 𝐾 be a field. Then a 𝐾-vector space 𝑉 is an abelian group together
with a scalar multiplication, which assigns a vector 𝜆 ⋅ 𝑣 ∈ 𝑉 to a tuple (𝜆, 𝑣) ∈ 𝐾 × 𝑉
such that the following rules are satisfied:
(1) 𝜆 ⋅ (𝑣 + 𝑤) = 𝜆 ⋅ 𝑣 + 𝜆 ⋅ 𝑤 for all 𝜆 ∈ 𝐾, 𝑣, 𝑤 ∈ 𝑉.
(2) (𝜆 + 𝜇) ⋅ 𝑣 = 𝜆 ⋅ 𝑣 + 𝜇 ⋅ 𝑣 for all 𝜆, 𝜇 ∈ 𝐾, 𝑣 ∈ 𝑉.
(3) 𝜆 ⋅ (𝜇 ⋅ 𝑣) = (𝜆𝜇) ⋅ 𝑣 for all 𝜆, 𝜇 ∈ 𝐾, 𝑣 ∈ 𝑉.
(4) 1 ⋅ 𝑣 = 𝑣 for all 𝑣 ∈ 𝑉. ♢
In the following, we assume that the reader knows the definitions of linear inde-
pendence, basis and dimension of a vector space (see [WJW+ 14] or any other textbook
on linear algebra).
Example 4.79. (1) Let 𝐾 be any field; then 𝐾 𝑛 is the standard example of a 𝑛-dimen-
sional 𝐾-vector space.
(2) The binary strings of length 𝑛 ∈ ℕ form the 𝐺𝐹(2)-vector space 𝐺𝐹(2)𝑛 . The
group operation is defined by bitwise XORing. The scalar multiplication is trivial,
since the only factors are 0 and 1. Note that there is no natural ring structure on
𝑉.
(3) Let 𝑝 be a prime number and 𝑛 ∈ ℕ. Then 𝐺𝐹(𝑝𝑛 ) is a vector space over 𝐺𝐹(𝑝).
(4) 𝐺𝐹(256)𝑛 is a 𝐺𝐹(256)-vector space of dimension 𝑛. Here the scalar multipli-

cation is defined over the field 𝐺𝐹(256). It is also a 𝐺𝐹(2)-vector space, but of
dimension 8𝑛.
Maps between vector spaces that are compatible with addition and scalar multi-
plication are called linear:
Definition 4.80. Let 𝑓 ∶ 𝑉 → 𝑊 be a map between two 𝐾-vector spaces. Then 𝑓 is a
𝐾-linear map if:
(1) 𝑓(𝑣1 + 𝑣2 ) = 𝑓(𝑣1 ) + 𝑓(𝑣2 ) for all 𝑣1 , 𝑣2 ∈ 𝑉 and
(2) 𝑓(𝜆 ⋅ 𝑣) = 𝜆 ⋅ 𝑓(𝑣) for all 𝜆 ∈ 𝐾, 𝑣 ∈ 𝑉.
Example 4.81. Let 𝑉 and 𝑊 be 𝐺𝐹(2)-vector spaces; then 𝑓 ∶ 𝑉 → 𝑊 is linear if

𝑓(𝑣1 + 𝑣2 ) = 𝑓(𝑣1 ) + 𝑓(𝑣2 ) for all 𝑣1 , 𝑣2 ∈ 𝑉. The second condition is automatically
satisfied since the only scalars are 0 and 1.
Remark 4.82. One should understand that linearity is a very strong requirement,
and random maps are mostly far from linear! However, linear maps play an important
role in many applications as well as in cryptography.
Matrices are a key tool for the description of linear maps. We recapitulate the fol-
lowing fact from linear algebra:
Proposition 4.83. There is a one-to-one correspondence between 𝑛 × 𝑚 matrices over 𝐾
and linear maps 𝑓 ∶ 𝐾 𝑚 → 𝐾 𝑛 . Any matrix 𝐴 over 𝐾 with 𝑛 rows and 𝑚 columns gives
a linear map 𝑓 ∶ 𝐾 𝑚 → 𝐾 𝑛 by setting 𝑓(𝑣) = 𝐴𝑣, where we view 𝑣 as a column vector.
Conversely, a linear map 𝑓 ∶ 𝐾 𝑚 → 𝐾 𝑛 defines a matrix by writing the images of the
standard basis, i.e., 𝑓(𝑒1 ), 𝑓(𝑒2 ), … , 𝑓(𝑒𝑚 ), into the columns of a 𝑛 × 𝑚 matrix:
| | |
𝐴 = (𝑓(𝑒1 ) 𝑓(𝑒2 ) … 𝑓(𝑒𝑚 )) . ♢
| | |
The above construction can be generalized from the standard basis to an arbitrary
basis. In fact, a linear map is completely determined by its values on a basis.
Definition 4.84. A 𝐾-linear map 𝑓 ∶ 𝑉 → 𝑊 is said to be an isomorphism if 𝑓 is
invertible, i.e., if there is an inverse 𝐾-linear map 𝑓−1 ∶ 𝑊 → 𝑉. ♢
An isomorphism 𝑓 between 𝐾 𝑚 and 𝐾 𝑛 implies 𝑚 = 𝑛, but there are linear maps

𝑓 ∶ 𝐾 𝑛 → 𝐾 𝑛 which are neither injective nor surjective. 𝑓 is an isomorphism if and
only if the corresponding 𝑛 × 𝑛 matrix 𝐴 is invertible, i.e., if an inverse 𝑛 × 𝑛 matrix 𝐴−1
exists. There are several equivalent conditions for 𝐴 to be invertible, for example that
the determinant det(𝐴) is nonzero in 𝐾.
Example 4.85. Let 𝑓 ∶ 𝐺𝐹(2)3 → 𝐺𝐹(2)3 be defined by

𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = (𝑥1 + 𝑥3 , 𝑥1 + 𝑥2 , 𝑥2 + 𝑥3 ).
𝑓 is linear and the corresponding matrix is
1 0 1
𝐴 = (1 1 0) .
0 1 1
The determinant is det(𝐴) = 2 ≡ 0 mod 2. Hence 𝑓 is not invertible over 𝐺𝐹(2). On
the other hand, the corresponding map over the real numbers ℝ is invertible. ♢
Cryptography primarily considers maps over finite fields, but more recent advances
(lattice-based cryptography and quantum computing) also require real and complex
vector spaces.
Definition 4.86. An 𝑛 × 𝑛 matrix 𝐴 over ℝ is called orthogonal if 𝐴𝑇 𝐴 = 𝐼𝑛 . ♢
Here 𝐴𝑇 denotes the transpose matrix and 𝐼𝑛 is the 𝑛 × 𝑛 identity matrix. Orthog-
onal matrices are invertible, the inverse matrix is 𝐴−1 = 𝐴𝑇 and det(𝐴) is either 1 or
−1. The rows and the columns are orthonormal vectors. The associated linear map
𝑓(𝑥) = 𝐴 𝑥 of real vector spaces is also called orthogonal and preserves lengths and
angles.
Example 4.87. The rotation of two-dimensional vectors by 𝛼 around the origin is de-
scribed by the following orthogonal matrix:
cos(𝛼) − sin(𝛼)
𝐴=( ).
sin(𝛼) cos(𝛼)
One easily verifies that 𝐴𝑇 𝐴 = 𝐼2 . The columns of 𝐴 are obtained by rotating the
standard unit vectors 𝑒1 and 𝑒2 by 𝛼 counter-clockwise around the origin. ♢
The corresponding notion for complex matrices is unitary.

Definition 4.88. A complex 𝑛 × 𝑛 matrix 𝐴 over ℂ is said to be unitary if 𝐴∗ 𝐴 = 𝐼𝑛 ,
𝑇
where 𝐴∗ denotes the conjugate transpose of 𝐴, i.e., 𝐴∗ = 𝐴 . ♢
The inverse of a unitary matrix 𝐴 is 𝐴∗ and | det(𝐴)| = | det(𝐴∗ )| = 1. The rows

and columns of 𝐴 form an orthonormal basis of ℂ𝑛 with respect to the Hermitian inner
product. The associated linear map over ℂ is also called unitary.
A slight generalization of linear maps are affine maps. They differ from linear maps
only by a constant translation. We remark that affine maps are sometimes also called
linear, although this is not fully precise. Furthermore, nonlinear usually means that a
map is neither linear nor affine.
Definition 4.89. A map 𝑓 ∶ 𝑉 → 𝑊 between 𝐾-vector spaces 𝑉 and 𝑊 is called affine

if there exists a linear map 𝑙 ∶ 𝑉 → 𝑊 and a vector 𝑏 ∈ 𝑊 such that for all 𝑣 ∈ 𝑉
𝑓(𝑣) = 𝑙(𝑣) + 𝑏. ♢
Affine maps 𝑓 ∶ 𝐾 𝑚 → 𝐾 𝑛 can be described by an 𝑛 × 𝑚 matrix 𝐴 and a vector

𝑏 ∈ 𝐾 𝑛:
𝑓(𝑣) = 𝐴 𝑣 + 𝑏.
Affine maps from 𝐺𝐹(2)𝑚 to 𝐺𝐹(2)𝑛 correspond to (𝑚, 𝑛)-vectorial Boolean maps of
algebraic degree 1 (see Definition 1.23).
Example 4.90. Let 𝑓(𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 ) = (𝑥1 + 𝑥2 + 𝑥3 + 1, 𝑥3 + 𝑥4 ) be a map over 𝐺𝐹(2).
Then
1 1 1 0 1
𝐴=( ) , 𝑏=( )
0 0 1 1 0
describes the affine map 𝑓, since 𝑓(𝑥) = 𝐴𝑥 + 𝑏 for all 𝑥 ∈ 𝐺𝐹(2)4 . ♢
More examples of linear, affine and nonlinear Boolean functions are given in Ex-
ample 1.24.
Linear and affine maps play an important role in cryptography. This has several
reasons:
• They can be efficiently described by a matrices and vectors, even for large dimen-
sions.
• Matrix computations are efficient, and the running time is polynomial in the
number of rows and columns.
• The kernel and the image of a linear map as well as the preimage of any element
can be efficiently computed by Gaussian elimination. Also, it can be easily veri-
fied whether a linear or affine map is bijective. If an inverse map exists, then it
can be efficiently computed and the inverse map is also linear or affine.
• Linear and affine maps over 𝐺𝐹(2) can produce diffusion and the so-called
avalanche effect: changing one input bit changes many output bits, if the matrix
is appropriately chosen. In fact, flipping the 𝑘-th input bit adds the 𝑘-th column
vector 𝐴 𝑒𝑘 to the output.
However, linear maps also have a major drawback when used in cryptography:
there are efficient attacks against encryption schemes that are solely based on linear or
affine operations. They do not protect against chosen plaintext attacks. For this reason,
linear and nonlinear operations are combined in the construction of secure ciphers.
Proposition 4.91. Let 𝑓 ∶ 𝐾 𝑚 → 𝐾 𝑛 be an affine map. Suppose the parameters of 𝑓 (i.e.,
the corresponding matrix and possibly a translation vector) are secret, but an adversary
knows 𝑚+1 input vectors 𝑣0 , 𝑣1 , … , 𝑣𝑚 and the corresponding output vectors 𝑤𝑖 = 𝑓(𝑣𝑖 ),
where 𝑖 = 0, 1, … , 𝑚. If 𝑣1 − 𝑣0 , 𝑣2 − 𝑣0 , … , 𝑣𝑚 − 𝑣0 are linearly independent, then the

adversary can efficiently compute the matrix 𝐴 and the translation vector 𝑏 such that
𝑓(𝑣) = 𝐴𝑣 + 𝑏 holds for all 𝑣 ∈ 𝐾 𝑚 .
Proof. Let 𝑓(𝑣) = 𝐴𝑣 + 𝑏, where 𝐴 and 𝑏 are unknown. Since 𝐴𝑣𝑖 + 𝑏 = 𝑤𝑖 , one has
𝐴(𝑣𝑖 − 𝑣0 ) = 𝑤𝑖 − 𝑤0
for all 𝑖 = 1, 2, … , 𝑚. Now write the vectors 𝑣𝑖 − 𝑣0 and 𝑤𝑖 − 𝑤0 into the columns of
matrices 𝑉 and 𝑊, respectively. The 𝑚 × 𝑚 matrix 𝑉 is regular since we assumed that
the vectors 𝑣𝑖 − 𝑣0 are linearly independent. Hence
𝐴𝑉 = 𝑊 ⟹ 𝐴 = 𝑊𝑉 −1 ,
and this provides an efficient matrix formula for 𝐴. It remains to compute the trans-
lation vector 𝑏; to this end we use the equation 𝑏 = 𝑤0 − 𝐴𝑣0 . This proves the asser-
tion. □
Example 4.92. Let 𝐸𝑘 ∶ {0, 1}128 → {0, 1}128 be the ciphering function of a block ci-
pher 𝐸 with block length 128. We identify {0, 1} with 𝐺𝐹(2) and suppose that 𝐸𝑘 is
affine. Then Proposition 4.91 shows that an adversary, who does not know 𝑘, can
find the matrix, the translation vector and hence 𝐸𝑘 and 𝐸𝑘−1 with only 129 known
plaintext/ciphertext pairs, if the above independency condition is satisfied. Hence
128 ⋅ 129 = 15,512 plaintext/ciphertext bits are sufficient, or slightly more if the vectors
are linearly dependent. Note that we made no assumptions concerning 𝑘 and how the
key determined the matrix and the translation vector. This could even be a nonlinear
relationship. In fact, the key is not computed during this attack.
Proposition 4.93. Let 𝐹 be a keyed family of functions and suppose that all maps 𝐹𝑘
are affine; then 𝐹 is not a pseudorandom function. Similarly, if 𝐸 is a keyed family of
permutations and all maps 𝐸𝑘 are affine, then 𝐸 is not a pseudorandom permutation.
Proof. Proposition 4.91 shows how an adversary can explicitly compute the param-
eters of an affine map, i.e., the matrix and the translation vector, using a number of
known input/output values. In the distinguishability experiments (see Definitions 2.38
and 2.41), an adversary can then predict 𝑓(𝑚) for any input 𝑚 and compare the result
with the response 𝑐 they obtain from the challenger. If they coincide, then the function
𝑓 is probably affine and the adversary outputs 𝑏′ = 1. Otherwise, 𝑓 is random and the
adversary outputs 𝑏′ = 0.
Alternatively, an adversary can test whether 𝑓 is affine, by choosing input values
𝑚1 and 𝑚2 and asking for 𝑓(0), 𝑓(𝑚1 ), 𝑓(𝑚2 ), 𝑓(𝑚1 + 𝑚2 ). If 𝑓 is affine, then
𝑓(𝑚1 + 𝑚2 ) + 𝑓(0) = 𝑓(𝑚1 ) + 𝑓(𝑚2 ).
′
An adversary outputs 𝑏 = 1 if this equation is satisfied, and 0 otherwise. Their advan-
tage is close to 1, and so affine functions cannot be pseudorandom. □
Exercises 97
Remark 4.94. The above attack would essentially still work if 𝐹𝑘 can be approximated
by a linear or affine map 𝑓, i.e., if 𝐹𝑘 and 𝑓 coincide significantly more often than by
chance. Therefore, pseudorandom functions and permutations must be highly nonlin-
ear.
4.5. Summary
• Finite cyclic groups of order 𝑛 are isomorphic to the additive group of integers
modulo 𝑛.
• Finite abelian groups can be decomposed into a product of cyclic groups of
prime-power order.
• The integers modulo a prime number 𝑝 define the field 𝐺𝐹(𝑝).
• The polynomials over a field 𝐾 form the ring 𝐾[𝑥].
• The field 𝐺𝐹(𝑝𝑛 ) with 𝑝𝑛 elements is an extension field of 𝐺𝐹(𝑝). It can be de-
fined as the quotient of the polynomial ring over 𝐺𝐹(𝑝) modulo an irreducible
polynomial of degree 𝑛.
𝑛
• 𝐺𝐹(𝑝𝑛 ) is the splitting field of the polynomial 𝑥𝑝 − 𝑥 over 𝐺𝐹(𝑝).
• Linear maps between finite-dimensional vector spaces over an arbitrary field
can be described by matrices.
• Affine maps are defined by a linear map plus a constant translation.
• Keyed function or permutation families of linear or affine maps cannot be pseu-
dorandom.
Exercises
1. Find all subgroups of (ℤ10 , +), (ℤ11 , +) and (ℤ∗11 , ⋅).

2. Let 𝐺 be a group of order 54. List the possible orders of elements in 𝐺.
3. Consider the map 𝑓 ∶ ℤ19 → ℤ19 defined by 𝑓(𝑥) = 5𝑥 mod 19. Show that 𝑓 is a
group homomorphism and even an isomorphism.
4. Let 𝑝 and 𝑞 be different prime numbers and 𝑛 = 𝑝𝑞. Let 𝑒 and 𝑑 be integers such
that 𝑒𝑑 ≡ 1 mod (𝑝 − 1)(𝑞 − 1). Show that 𝑚𝑒𝑑 ≡ 𝑚 mod 𝑛 for any 𝑚 ∈ ℤ𝑛 .
Tip: Use Euler’s Theorem to show the statement for 𝑚 ∈ ℤ∗𝑛 .
5. Consider the multiplicative group 𝐺 = ℤ∗23 . Compute the order of 2 mod 23. What
is the maximum order of elements in 𝐺? Find a generator of 𝐺.
6. Check whether 2 mod 19 and 5 mod 19 are generators of the multiplicative group
ℤ∗19 and determine their order.
7. Show that ℤ𝑛 × ℤ𝑛 is not cyclic for 𝑛 ≥ 2.
Tip: Verify that ord(𝑔) ∣ 𝑛 for all 𝑔 ∈ ℤ𝑛 × ℤ𝑛 .
8. Find all abelian groups of order 8, up to isomorphism. Which of them are cyclic?
9. Determine the decompositions of ℤ∗12 and ℤ∗23 as a product of additive cyclic groups.
10. Which residue classes are generators of the additive group ℤ𝑛 ?
11. Let 𝑛 = 247 = 𝑝𝑞. Find the factors 𝑝 and 𝑞 and solve the simultaneous congru-
ences 𝑘 = 7 mod 𝑝 and 𝑘 = 2 mod 𝑞 using the Chinese Remainder Theorem.
12. Let 𝑅1 and 𝑅2 be rings. Why is the product ring 𝑅1 × 𝑅2 never a field, even if 𝑅1
and 𝑅2 are fields?
Tip: Consider the idempotent elements (1, 0) and (0, 1).
13. Let 𝑓, 𝑔 ∈ 𝐾[𝑥]. Show the product rule
𝐷(𝑓 ⋅ 𝑔) = 𝐷(𝑓) ⋅ 𝑔 + 𝑓 ⋅ 𝐷(𝑔).
Tip: Use the linearity of the derivative 𝐷 to reduce to the case 𝑓 = 𝑥𝑛 and 𝑔 = 𝑥𝑚 .
14. Determine the number of elements of the following residue class rings. Which of
the rings are fields?
(a) 𝐺𝐹(2)[𝑥]/(𝑥4 + 𝑥2 + 1),
(b) 𝐺𝐹(3)[𝑥]/(𝑥2 + 1),
(c) 𝐺𝐹(2)[𝑥]/(𝑥𝑛 − 1), where 𝑛 ∈ ℕ.
15. Let 𝐺𝐹(8) = 𝐺𝐹(2)[𝑥]/(𝑥3 + 𝑥 + 1). Find representatives of 𝑥3 , 𝑥4 , 𝑥5 , 𝑥6 , 𝑥7 in
𝐺𝐹(8) of degree less than 3.
16. Find an irreducible polynomial over 𝐺𝐹(2) of degree 6.
17. 𝐺𝐹(28 ) is the splitting field of 𝑓(𝑥) = 𝑥256 − 𝑥. Use SageMath to factor 𝑓(𝑥) over
𝐺𝐹(2) and identity the irreducible factor 𝑔(𝑥) = 𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1 used to define
the AES field.
18. Find explicit descriptions of all subfields of 𝐺𝐹(256).
19. Define 𝐺𝐹(28 ) using 𝑔(𝑥) as above. Which polynomial 𝑓(𝑥) corresponds to the
byte 02 (hexadecimal notation)? Determine a polynomial ℎ(𝑥) which is inverse to
𝑓(𝑥) mod 𝑔(𝑥) and give its hexadecimal representation.
20. Consider the bit permutation 𝑓 ∶ 𝐺𝐹(2)8 → 𝐺𝐹(2)8 described by (3 1 8 2 5 4 6 7).
Determine the inverse bit-permutation 𝑓−1 and the matrices which represent 𝑓
and 𝑓−1 .
21. Show that the following matrix 𝐴 is unitary and find the inverse matrix 𝐴−1 :
1 1+𝑖 1−𝑖
𝐴= ( ).
2 1−𝑖 1+𝑖
22. Let 𝑓 be an affine map given by 𝑓(𝑥) = 𝐴𝑥 + 𝑏. Give a necessary and sufficient
condition for 𝑓 being bijective and a formula for 𝑓−1 .
23. Why is the following map 𝑓 ∶ 𝐺𝐹(2)3 → 𝐺𝐹(2)3 affine and invertible:
𝑓(𝑥1 , 𝑥2 , 𝑥3 ) = (𝑥1 + 𝑥2 + 𝑥3 + 1, 𝑥1 + 𝑥2 , 𝑥2 + 𝑥3 + 1)?
Determine the matrix and the translation vector. Compute the inverse map 𝑓−1 .
Exercises 99
24. Let 𝑓 ∶ 𝐺𝐹(2)3 → 𝐺𝐹(2)3 be a linear map with the following input vectors 𝑣𝑖 and
output vectors 𝑤𝑖 = 𝑓(𝑣𝑖 ). Determine the matrix which corresponds to 𝑓.
0 1 0 1 1 1
𝑣1 = (1) , 𝑤1 = (0) , 𝑣2 = (0) , 𝑤2 = (0) , 𝑣3 = (1) , 𝑤3 = (1) .
0 0 1 1 1 1
25. Let 𝑉 = 𝐺𝐹(28 ). How can you describe a) 𝐺𝐹(28 )-linear maps and b) 𝐺𝐹(2)-linear
maps on 𝑉? How many different maps exist in case a) and in case b) ?
26. Suppose all maps 𝐹𝑘 of a keyed function family 𝐹 are linear. How can an adversary
easily win the prf distinguishability experiment? This shows that 𝐹 is not a pseu-
dorandom function.
Tip: Choose an all-zero input.
Chapter 5
Block Ciphers
A block cipher is a family of permutations which is designed to behave like a pseu-

dorandom permutation. Block ciphers operate on binary strings of fixed length, but in
combination with an operation mode they define a variable-length encryption scheme.
In Section 5.1 of this chapter, we study two important construction methods of
block ciphers: substitution-permutation networks and Feistel ciphers. Section 5.2 deals
with the Rijndael block cipher, which has been adopted as the Advanced Encryption
Standard (AES).
Block ciphers and the AES algorithm are dealt with in all modern cryptography
textbooks, for example [PP10]. A detailed description of the design of AES can be
found in [DR02].
5.1. Constructions of Block Ciphers

A block cipher is a keyed family of permutations (see Definition 2.46):
𝐸 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙 .
Currently, a block length of 𝑙 = 128 bits and key lengths between 𝑛 = 128 and 𝑛 =
256 bits are widely used. The key space is large enough to prevent brute-force attacks
against uniform random keys. For each key 𝑘, a permutation 𝐸𝑘 of {0, 1}𝑙 has to be
defined. Because of computing and storage limitations, it is impossible to use a table
of values and 𝐸𝑘 should rather be defined by an efficient algorithm. In practice, one
combines the following two types of operations: linear or affine mixing maps on full
blocks of length 𝑙 and nonlinear S-Boxes on short segments of a block.
• Mixing maps permute all bits of a block. Typically, bit permutations are used
which are very efficient and spread input changes throughout a block. General
101
102 5. Block Ciphers
linear or affine mixing maps are also used. The required properties (in particular
bijectivity) can easily be checked and the computations are very efficient. Fur-
thermore, linear and affine maps can achieve diffusion if the map is appropriately
chosen: small input changes, say only one bit, affect a whole block and result in
large output changes. Note that diffusion is a necessary property of pseudoran-
dom permutations: the output should completely change, even if only a few input
bits are modified. Otherwise, an adversary could distinguish 𝐸𝑘 from a random
permutation.
• S-Boxes are nonlinear, random looking maps which are applied in parallel to short
segments of a block. For a small number of input values, for example 8 bits with
28 = 256 values, the S-Box transformation can be defined explicitly by a table.
The S-Box needs to be carefully defined and must be highly nonlinear.
The combination of linear mixing maps and nonlinear S-Boxes, applied in several
rounds, can achieve confusion, which makes the relationship between the ciphertext
and the key complex and involved. Confusion makes it very hard to find the key or
the decryption function, even if many plaintext/ciphertext pairs are known to an ad-
versary.
The properties of confusion and diffusion and their role in the construction of se-
cure ciphers were first described by Claude Shannon in 1949 [Sha49].
Confusion and diffusion can be achieved by a substitution-permutation network.
Such a network consists of a number of rounds in which a plaintext block is trans-
formed into a ciphertext block. Each round consists of the following operations (see
Figure 5.1):
(1) Add a round key to the data block. The round key is derived from the encryption
key and ensures that the transformation depends on the key.
(2) Split the block into smaller segments and apply a nonlinear S-Box (substitution)
to each of the segments.
(3) Apply a bit permutation or, more generally, a linear or affine mixing map to the
full data block.
Each operation in a substitution-permutation network has to be invertible so that

a unique decryption map exists. Typical networks have at least ten rounds.
Another widely used construction is Feistel networks. They are also based on non-
linear S-boxes and linear mixing operations, but split a block into a left and right half.
Feistel networks use a round function that depends on a round key and operates on
one half of the data block (see Figure 5.2).
Let (𝐿0 , 𝑅0 ) be plaintext block of length 2𝑡. Define
(𝐿𝑖 , 𝑅𝑖 ) = (𝑅𝑖−1 , 𝐿𝑖−1 ⊕ 𝑓𝑘𝑖 (𝑅𝑖−1 )) for 𝑖 = 1, 2, … , 𝑟.

5.1. Constructions of Block Ciphers 103
Figure 5.1. Substitution-permutation network: add a round key 𝑘𝑖 , apply the S-Box 𝑆
and the permutation 𝑃 in 𝑛 rounds.
After 𝑟 rounds and a final permutation one obtains the ciphertext of the Feistel cipher:
𝐸𝑘 (𝐿0 , 𝑅0 ) = (𝑅𝑟 , 𝐿𝑟 ).
Now let (𝑅𝑟 , 𝐿𝑟 ) be a ciphertext block. Then define
(𝑅𝑖−1 , 𝐿𝑖−1 ) = (𝐿𝑖 , 𝑅𝑖 ⊕ 𝑓𝑘𝑖 (𝐿𝑖 )) for 𝑖 = 𝑟, 𝑟 − 1, … , 1.
Note that encryption and decryption use the same transformation. Applying 𝑟 rounds
of the Feistel network and a final permutation recovers the plaintext:
𝐷𝑘 (𝑅𝑟 , 𝐿𝑟 ) = (𝐿0 , 𝑅0 ).
The round function 𝑓 depends on a round key 𝑘𝑖 , and 𝑓𝑘𝑖 is usually defined by S-
boxes and bit permutations (or affine operations), similar to a substitution-permutation
network. However, the round function 𝑓 only operates on one half of a block and, due
to the Feistel network construction, 𝑓 does not have to be bijective. One can show that
a Feistel cipher with at least 3 rounds is a pseudorandom permutation if the underlying

round function is a pseudorandom function. Typical Feistel ciphers have 16 rounds.
Figure 5.2. Feistel network with 𝑟 rounds.
Many block ciphers are based on Feistel networks, for example the former encryp-
tion standard DES, but also modern block ciphers such as Twofish.
5.2. Advanced Encryption Standard

After a 4-year standardization process, the American National Institute of Standards
and Technology (NIST) adopted the block cipher Rijndael (designed by J. Daemen and
V. Rijmen) as the Advanced Encryption Standard (AES) in 2001 [FIP01]. The cipher
is now widely used by security applications and network protocols. Rijndael is fast on
both software and hardware (also on 8-bit processors), easy to implement and requires
little memory. Throughputs of several hundred MBit/s in software and several GBit/s in
hardware are easily possible. There are a number of alternatives, for example Twofish,
Serpent, Camellia and others, but at the time of this writing AES is the dominating
block cipher.
The standardized AES cipher has a block length of 128 bits and a 128-, 192- or 256-
bit key length. The original Rijndael cipher supports additional block lengths and key
lengths. The cipher is a substitution-permutation network, and the number of rounds
depends on the key size; the cipher has either 10 (for 128-bit keys), 12 (for 192-bit keys),
or 14 rounds (for 256-bit keys). These settings allow for a very efficient operation. Fur-
thermore, the block and key lengths prevent simple brute-force attacks and lookup
tables.
We denote the AES encryption function by 𝑓𝑘 ∶ {0, 1}128 → {0, 1}128 . The encryp-
tion function operates on the 128-bit state. The state is arranged in a 4 × 4 matrix over
𝐺𝐹(28 ) by writing the bytes 𝑝0 , 𝑝1 , … , 𝑝16 into the columns of a matrix:
𝑝 𝑝4 𝑝8 𝑝12
⎛ 0 ⎞
𝑝 𝑝5 𝑝9 𝑝13
⎜ 1 ⎟.
𝑝
⎜ 2 𝑝6 𝑝10 𝑝14 ⎟
⎝𝑝3 𝑝7 𝑝11 𝑝15 ⎠
Each byte is interpreted as an element of the field
𝐺𝐹(28 ) = 𝐺𝐹(2)[𝑥]/(𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1)
(compare Example 4.77). The byte (𝑏7 … 𝑏1 𝑏0 ) corresponds to the residue class 𝑏7 𝑥7 +
⋯ + 𝑏1 𝑥 + 𝑏0 mod (𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1).
Example 5.1. Let 𝑚 = 10 … 0 be a 128-bit input block. The byte 80 = 1000 0000
corresponds to 𝑥7 mod 𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1. Obviously, the zero byte corresponds to
the zero polynomial. Thus, 𝑚 is represented by the following 4 × 4 matrix over 𝐺𝐹(28 ):
𝑥7 0 0 0
⎛ ⎞
0 0 0 0
⎜ ⎟. ♢
⎜0 0 0 0⎟
⎝0 0 0 0⎠
The AES encryption function 𝑓𝑘 takes the plaintext as input state and transforms
the state in successive rounds. Each round consists of several steps: the nonlinear sub-
stitution step (SubBytes), two linear mixing steps (ShiftRows,
MixColumns) and the affine AddRoundKey step. The final state is output. Each step
is invertible and the decryption function 𝑓𝑘−1 is given by composing the inverse steps
in reverse order.
The following pseudocode gives a high-level description of 𝑓𝑘 . The SubBytes,
ShiftRows, MixColumns operations and the KeyExpansion step are described below.
Rijndael(State, CipherKey)
{
KeyExpansion(CipherKey, ExpandedKey)
AddRoundKey(State,ExpandedKey[0])
for(i = 1; i < Nr ; i++) { // Nr is either 10, 12 or 14
// Round i
SubBytes(State)
ShiftRows(State)
MixColumns(State)
AddRoundKey(State,ExpandedKey[i])
}
// Final Round
SubBytes(State)
ShiftRows(State)
AddRoundKey(State,ExpandedKey[Nr])
}
First, we consider the S-Box SubBytes which is the only non-affine component of
AES. The S-Box function 𝑆𝑅𝐷 ∶ 𝐺𝐹(28 ) → 𝐺𝐹(2)8 is applied to each byte of the state
individually (see Figure 5.3).

𝑝 𝑝4 𝑝8 𝑝12 𝑆 (𝑝 ) 𝑆𝑅𝐷 (𝑝4 ) 𝑆𝑅𝐷 (𝑝8 ) 𝑆𝑅𝐷 (𝑝12 )
⎛ 0 ⎞ SubBytes ⎛ 𝑅𝐷 0 ⎞
𝑝 𝑝5 𝑝9 𝑝13 𝑆 (𝑝 ) 𝑆𝑅𝐷 (𝑝5 ) 𝑆𝑅𝐷 (𝑝9 ) 𝑆𝑅𝐷 (𝑝13 )
⎜ 1 ⎟ −−−−−→ ⎜ 𝑅𝐷 1 ⎟
⎜𝑝2 𝑝6 𝑝10 𝑝14 ⎟ ⎜𝑆𝑅𝐷 (𝑝2 ) 𝑆𝑅𝐷 (𝑝6 ) 𝑆𝑅𝐷 (𝑝10 ) 𝑆𝑅𝐷 (𝑝14 )⎟
⎝𝑝3 𝑝7 𝑝11 𝑝15 ⎠ ⎝𝑆𝑅𝐷 (𝑝3 ) 𝑆𝑅𝐷 (𝑝7 ) 𝑆𝑅𝐷 (𝑝11 ) 𝑆𝑅𝐷 (𝑝15 )⎠
𝑆𝑅𝐷 𝑆𝑅𝐷 𝑆𝑅𝐷 𝑆𝑅𝐷
Figure 5.3. The nonlinear S-Box operates on each byte of the state array individually.
𝑆𝑅𝐷 is defined by multiplicative inversion in the field 𝐺𝐹(28 ) followed by an affine

transformation of the vector space 𝐺𝐹(2)8 . The definition
𝐺𝐹(28 ) = 𝐺𝐹(2)[𝑥]/(𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1)
gives a 𝐺𝐹(2)-linear isomorphism between 𝐺𝐹(28 ) and 𝐺𝐹(2)8 , so that elements in
𝐺𝐹(28 ) correspond to a binary word of length 8 (one byte).
𝐴𝑎−1 + 𝑏 for 𝑎 ≠ 0,
𝑆𝑅𝐷 (𝑎) = {
𝑏 for 𝑎 = 0,
1 1 1 1 1 0 0 0 0
⎛ ⎞ ⎛ ⎞
0 1 1 1 1 1 0 0 1
⎜ ⎟ ⎜ ⎟
⎜0 0 1 1 1 1 1 0⎟ ⎜1⎟
⎜0 0 0 1 1 1 1 1⎟ ⎜0⎟
𝐴=⎜ , 𝑏 = ⎜ ⎟.
1 0 0 0 1 1 1 1⎟ 0
⎜ ⎟ ⎜ ⎟
⎜1 1 0 0 0 1 1 1⎟ ⎜0⎟
⎜1 1 1 0 0 0 1 1⎟ ⎜1⎟
⎝1 1 1 1 0 0 0 1⎠ ⎝1⎠
Since 0 ∈ 𝐺𝐹(28 ) is not invertible, one extends the inversion by mapping 0 to 0. The ex-
tended inversion map is a bijection on 𝐺𝐹(28 ) and can also be described by the mono-
mial map 𝑖(𝑎) = 𝑎254 . In fact, Euler’s Theorem can be applied to the multiplicative
group 𝐺𝐹(28 )∗ of units. This yields 𝑎255 = 1 and hence 𝑎−1 = 𝑎254 for all 𝑎 ≠ 0. The
composition of the inversion map 𝑖 and the affine transformation 𝑓(𝑎) = 𝐴𝑎 + 𝑏 can
be represented by a polynomial over 𝐺𝐹(28 ) (see [DR02]):
𝑆𝑅𝐷 (𝑎) = 𝑓(𝑖(𝑎)) = 05 ⋅ 𝑎254 + 09 ⋅ 𝑎253 + F9 ⋅ 𝑎251 + 25 ⋅ 𝑎247 + F4 ⋅ 𝑎239
+ 01 ⋅ 𝑎223 + B5 ⋅ 𝑎191 + 8F ⋅ 𝑎127 + 63.
This shows that 𝑆𝑅𝐷 has a complex algebraic expression over 𝐺𝐹(28 ). One can also
consider 𝑆𝑅𝐷 as an (8, 8)-vectorial Boolean function and compute the algebraic normal
form (see Section 1.1) of each of its components. The algebraic degree of the Boolean
functions is 7, which demonstrates a high algebraic complexity over 𝐺𝐹(2).
Note that implementations of AES do not use the algebraic definition of 𝑆𝑅𝐷 , but
a lookup table instead. This requires only 256 bytes of memory.
Example 5.2. We use SageMath, construct an AES object called sr and print out the
hexadecimal S-Box values:
sage: sr = mq.SR(10, 4, 4, 8, star=True , allow_zero_inversions =True , aes_mode=True)
sage: S=sr.sbox ()
sage: for i in range (0 ,256):
print "{:02X}".format(S[i]),
63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76 CA 82 C9 7D FA 59 47 F0
AD D4 A2 AF 9C A4 72 C0 B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15
04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75 09 83 2C 1A 1B 6E 5A A0
52 3B D6 B3 29 E3 2F 84 53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF
D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8 51 A3 40 8F 92 9D 38 F5
BC B6 DA 21 10 FF F3 D2 CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73
60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB E0 32 3A 0A 49 06 24 5C
C2 D3 AC 62 91 95 E4 79 E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08
BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A 70 3E B5 66 48 03 F6 0E
61 35 57 B9 86 C1 1D 9E E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF
8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16
For example, 𝑆𝑅𝐷 (00) = 𝑏 = 63 and 𝑆𝑅𝐷 (01) = 7C. ♢
One of the main design criteria of the S-Box is its nonlinearity. We have seen above
that 𝑆𝑅𝐷 is nonlinear and its algebraic degree is high. In addition, it is impossible to
approximate 𝑆𝑅𝐷 by affine functions. One can show that any linear combination (XOR)
of input and output bits of the S-Box gives the correct value for at least 112 and at
most 144 of 256 input values. Note that the expected number of matches for a random
XOR combination is 128. The correlation between the S-Box and all affine functions is
therefore low, which protects the cipher against a linear cryptanalysis.
Another design aspect is the differential properties of the S-Box. It can be shown
that for any fixed pair of input-output differences, at most 4 out of 256 values propagate
the given differences. This prevents a differential cryptanalysis of the cipher.
Next, we look at the diffusion layer, which is implemented by the linear mixing
maps ShiftRows and MixColumns. Both are 𝐺𝐹(28 )-linear operations on the state ma-
trix.
ShiftRows is a bit permutation and rotates the bytes in the second, third and fourth
row to the left. The first row is left unchanged, the bytes in the second row are rotated
by one position, bytes in the third row are rotated by two positions and bytes in the
fourth row are rotated by three positions (see Figure 5.4). Clearly, ShiftRows can be
inverted by a corresponding circular right shift.
Figure 5.4. ShiftRows rotates the bytes in the rows by zero, one, two and three posi-
tions, respectively.
𝑝 𝑝4 𝑝8 𝑝12 𝑝 𝑝4 𝑝8 𝑝12
⎛ 0 ⎞ ⎛ 0 ⎞
𝑝 𝑝5 𝑝9 𝑝13 ShiftRows 𝑝5 𝑝9 𝑝13 𝑝1
⎜ 1 ⎟ −−−−−−→ ⎜ ⎟.
⎜𝑝2 𝑝6 𝑝10 𝑝14 ⎟ ⎜𝑝10 𝑝14 𝑝2 𝑝6 ⎟
⎝𝑝3 𝑝7 𝑝11 𝑝15 ⎠ ⎝𝑝15 𝑝3 𝑝7 𝑝11 ⎠
MixColumns transforms the columns of the state matrix by a 𝐺𝐹(28 )-linear map.
One multiplies a constant 4×4 matrix 𝑀 over 𝐺𝐹(28 ) by the column vectors of the state
(see Figure 5.5). The matrix is regular so that the operation can be inverted (see Sec-
tion 0.4 where the inverse matrix is computed). The MixColumns matrix was carefully
chosen to have good diffusion properties. If 𝑣 ∈ 𝐺𝐹(28 )4 is a nonzero column vector,
then the number of nonzero bytes of 𝑣 plus the number of nonzero bytes of 𝑀𝑣 is at
least 5. This can be shown using linear codes (see Example 15.19 (2)).
𝑝 𝑝4 𝑝8 𝑝12 02 03 01 01 𝑝 𝑝4 𝑝8 𝑝12
⎛ 0 ⎞ ⎛ ⎞ ⎛ 0 ⎞
𝑝 𝑝5 𝑝9 𝑝13 MixColumns 01 02 03 01 𝑝 𝑝5 𝑝9 𝑝13
⎜ 1 ⎟ −−−−−−−−→ ⎜ ⎟⋅⎜ 1 ⎟
𝑝
⎜ 2 𝑝6 𝑝10 𝑝14 ⎟ ⎜ 01 01 02 03 𝑝
⎟ ⎜ 2 𝑝6 𝑝10 𝑝14 ⎟
⎝𝑝3 𝑝7 𝑝11 𝑝15 ⎠ 03⎵⎵⎵
⎝⏟⎵ 01 01
⎵⏟⎵ 02⎠ ⎝𝑝3
⎵⎵⎵⎵⏟ 𝑝7 𝑝11 𝑝15 ⎠
𝑀
In the AddRoundKey step, every bit of the state matrix is XORed with the round
key 𝑘𝑖 . The round keys have the same length as the state (128 bits) and are computed
in the KeyExpansion step, as explained below.
𝑝0 𝑝4 𝑝8 𝑝12
𝑝1 𝑝5 𝑝9 𝑝13
𝑀⋅ 𝑀⋅ 𝑀⋅ 𝑀⋅
𝑝2 𝑝6 𝑝10 𝑝14
𝑝3 𝑝7 𝑝11 𝑝15
Figure 5.5. The MixColumns operation: each column is transformed by a fixed matrix 𝑀.
𝑝 𝑝4 𝑝8 𝑝12 𝑝 𝑝4 𝑝8 𝑝12
⎛ 0 ⎞ ⎛ 0 ⎞
𝑝 𝑝5 𝑝9 𝑝13 AddRoundKey 𝑝1 𝑝5 𝑝9 𝑝13
⎜ 1 ⎟ −−−−−−−−−→ ⎜ ⎟ 𝑘.
𝑝
⎜ 2 𝑝6 𝑝10 𝑝14 ⎟ ⎜𝑝2 𝑝6 𝑝10 𝑝14 ⎟ ⨁ 𝑖
⎝𝑝3 𝑝7 𝑝11 𝑝15 ⎠ ⎝𝑝3 𝑝7 𝑝11 𝑝15 ⎠
The operations were designed such that two Rijndael rounds (SubBytes,
ShiftRows, MixColumns, AddRoundKey) already provide sufficient diffusion. After
two rounds, every output bit depends on all input bits, and a change in one input bit
changes about half of all output bits.
Finally, we explain AES key scheduling. The main design criteria for the key ex-
pansion step were efficiency, symmetry elimination, diffusion of the key and nonlinearity.
The nonlinearity is intended to protect the cipher against related-key attacks (compare
Remark 2.44).
We begin with 128-bit keys (see Figure 5.6). In this case, the AES algorithm has
ten rounds, and eleven 128-bit round keys 𝑘0 , 𝑘1 , … , 𝑘10 are required. The subkeys are
stored in 44 words 𝑊0 , 𝑊1 , … , 𝑊43 ∈ 𝐺𝐹(28 )4 of length 32 bits. Let
𝑠ℎ ∶ 𝐺𝐹(28 )4 → 𝐺𝐹(28 )4
be the rotation by one byte position to the left, i.e., 𝑠ℎ(𝑝0 , 𝑝1 , 𝑝2 , 𝑝3 ) = (𝑝1 , 𝑝2 , 𝑝3 , 𝑝0 ).
We write 𝐒 for the function which applies the S-Box 𝑆𝑅𝐷 to all four components of
a vector in 𝐺𝐹(28 )4 . This function ensures that the key schedule is nonlinear. The
symmetry of 𝐒 is eliminated by round constants:
𝑅𝐶𝑗 = 𝑥𝑗−1 mod 𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1 ∈ 𝐺𝐹(28 ) for 𝑗 ≥ 1.
The 128-bit AES key 𝑘 defines the initial round key 𝑘 = 𝑘0 = 𝑊0 ‖𝑊1 ‖𝑊2 ‖𝑊3 . The next
round key 𝑘1 = 𝑊4 ‖𝑊5 ‖𝑊6 ‖𝑊7 is computed as follows:
𝑊4 = 𝑊0 ⊕ 𝐒(𝑠ℎ(𝑊3 )) ⊕ (𝑅𝐶1 , 0, 0, 0), 𝑊5 = 𝑊1 ⊕ 𝑊4 ,

𝑊6 = 𝑊2 ⊕ 𝑊5 , 𝑊7 = 𝑊3 ⊕ 𝑊6 .
W0 W1 W2 W3 k
0
W4 W5 W6 W7 k1
W8 W9 W 10 W 11 k
2
Figure 5.6. The first two rounds of 128-bit AES key scheduling. 𝑇 maps the word
𝑊4𝑖−1 to 𝐒(𝑠ℎ(𝑊4𝑖−1 )) ⊕ (𝑅𝐶𝑖 , 0, 0, 0). This is basically a byte-wise SubBytes operation,
but involves an additional rotation and a translation by a round constant.
The following round keys are constructed analogously (increment the index of 𝑅𝐶
by 1 and increase all other indices by 4). For 𝑖 = 2, … , 10 one defines:
𝑊4𝑖 = 𝑊4𝑖−4 ⊕ 𝐒(𝑠ℎ(𝑊4𝑖−1 )) ⊕ (𝑅𝐶𝑖 , 0, 0, 0), 𝑊4𝑖+1 = 𝑊4𝑖−3 ⊕ 𝑊4𝑖 ,

𝑊4𝑖+2 = 𝑊4𝑖−2 ⊕ 𝑊4𝑖+1 , 𝑊4𝑖+3 = 𝑊4𝑖−1 ⊕ 𝑊4𝑖+2 .
The round keys are given by

𝑘𝑖 = 𝑊4𝑖 ‖ 𝑊4𝑖+1 ‖ 𝑊4𝑖+2 ‖ 𝑊4𝑖+3 for 𝑖 = 0, 1, … , 10.
Note that the construction of the first word of each round key uses the nonlinear S-Box,
while the other three words are defined by XORing two preceding words.
We skip 192-bit keys and look at 256-bit keys. In this case, fifteen 128-bit subkeys
𝑘0 , 𝑘1 , … , 𝑘14 are needed. The key schedule generates 60 words 𝑊0 , 𝑊1 , … , 𝑊59 in
𝐺𝐹(28 )4 and the round keys are given by
𝑘𝑖 = 𝑊4𝑖 ‖ 𝑊4𝑖+1 ‖ 𝑊4𝑖+2 ‖ 𝑊4𝑖+3 for 𝑖 = 0, 1, … , 14.
Exercises 111
A 256-bit AES key 𝑘 defines the first eight words 𝑊0 , 𝑊1 , … , 𝑊7 . The next eight
words 𝑊8 , 𝑊9 , … , 𝑊15 are computed as follows:
𝑊8 = 𝑊0 ⊕ 𝐒(𝑠ℎ(𝑊7 )) ⊕ (𝑅𝐶1 , 0, 0, 0), 𝑊9 = 𝑊1 ⊕ 𝑊8 ,

𝑊10 = 𝑊2 ⊕ 𝑊9 , 𝑊11 = 𝑊3 ⊕ 𝑊10 ,
𝑊12 = 𝑊4 ⊕ 𝐒(𝑊11 ), 𝑊13 = 𝑊5 ⊕ 𝑊12 ,
𝑊14 = 𝑊6 ⊕ 𝑊13 , 𝑊15 = 𝑊7 ⊕ 𝑊14 .
The following eight words are defined analogously (increment the index of 𝑅𝐶 by
1 and increase all other indices by 8), until all 60 words have been defined. Again, the
first word of each round key is defined by a nonlinear operation, which in turn affects
all subsequent words.
5.3. Summary
• A block cipher is a family of permutations. It should have good diffusion and

confusion properties.
• The main building blocks of block ciphers are linear or affine mixing maps and
nonlinear S-Boxes.
• Substitution-permutation networks and Feistel networks are the most impor-
tant methods for constructing block ciphers.
• The Rijndael block cipher was adopted as the Advanced Encryption Standard
(AES) and is widely used in practice.
• AES is a substitution-permutation network with a block length of 128 bits. The
cipher operates on a 4× 4 state matrix over 𝐺𝐹(28 ) and is defined by a sequence
of KeyExpansion, AddRoundKey, SubBytes, ShiftRows and MixColumns steps.
• The AES SubBytes S-Box is the only nonlinear operation. It is defined by a
multiplicative inversion in 𝐺𝐹(28 ), followed by an affine map over 𝐺𝐹(2).
• The AES key schedule takes a 128-, 192- or 256-bit key as input and generates
128-bit round keys by linear and nonlinear operations.
• AES is resistant to various attacks and modeled as a family of pseudorandom
permutations.
Exercises
1. Consider Feistel ciphers. Use the formulas in Section 5.1 to show that 𝐷𝑘 recovers
the plaintext.
2. Verify the following inverses in 𝐺𝐹(28 ) (in hexadecimal notation):
01−1 = 01, 02−1 = 8D, 03−1 = F6.
Then compute 𝑆𝑅𝐷 (00), 𝑆𝑅𝐷 (01), 𝑆𝑅𝐷 (02) and 𝑆𝑅𝐷 (03).
𝑛)
3. Let 𝑛 ∈ ℕ. Show that 𝑓(𝑥) = 𝑥(2 is a 𝐺𝐹(2)-linear map on 𝐺𝐹(28 ), whereas
𝑓(𝑥) = 𝑥254 is not linear.
−1
4. Describe the inverse S-Box 𝑆𝑅𝐷 .
5. How can the multiplication of 8-bit strings by 01, 02 and 03 be efficiently imple-
mented? What is an advantage of the MixColumns matrix? What can be said about
its inverse matrix?
6. Show that the MixColumn matrix and all submatrices are nonsingular over 𝐺𝐹(28 ).
One can show that this ensures good diffusion properties (see Example 15.19 (2)).
7. Give a high-level (pseudocode) description of the AES decryption function 𝑓𝑘−1 .
8. Suppose a 128-bit AES key 𝑘 and a plaintext 𝑚 are given:
𝑘 = 01 00 00 00 00 00 00 00 00, 𝑚 = 80 00 00 00 00 00 00 00 00.
(a) Find the round keys 𝑘0 and 𝑘1 .
(b) Use SageMath to compute all round keys and encrypt the input block 𝑚.
Tip: The following SageMath function may be used:
sage: sr = mq.SR(10, 4, 4, 8, star=True ,
allow_zero_inversions =True , aes_mode =True)
sage: def aesenc (p,k):
# Add k=key0
print sr. hex_str_vector (k)
p=p+k;
# Rounds 1-9
for i in range (1 ,10):
p=sr. sub_bytes (p)
p=sr. shift_rows (p)
p=sr. mix_columns (p)
k=sr. key_schedule (k, i)
p=p+k
# Round 10
p=sr. sub_bytes (p)
p=sr. shift_rows (p)
k=sr. key_schedule (k, 10)
p=p+k
print " Output " + sr. hex_str_vector (p)
return p
Define 𝐾 = 𝐺𝐹(28 ) and initialize 4 × 4 matrices M and Key. Only the upper
left entries of these matrices are nonzero in this exercise.
sage: K.<a>=GF (2^8 , name='a', modulus =x^8+x^4+x^3+x+1)
sage: M=sr. state_array (); M[0 ,0]=a^7
sage: Key=sr. state_array (); Key [0 ,0]=1
Exercises 113
9. Assume that a modified AES block cipher lacks all ShiftRows and MixColumns
operations. Can this cipher be a pseudorandom permutation? What if only one of
these operations is missing?
10. Suppose that the multiplicative inversion is omitted in the S-Box of a modified AES
block cipher. Can this cipher be a pseudorandom permutation?
11. What is more important for a cipher: the nonlinearity of encryption or the nonlin-
earity of the key schedule?
12. Suppose a 256-bit AES key 𝑘 is all-zero. Find the round keys 𝑘0 , 𝑘1 , 𝑘2 and 𝑘3 .
Chapter 6
Stream Ciphers
Symmetric ciphers can be divided into block ciphers and stream ciphers. Some opera-
tion modes turn block ciphers into stream ciphers, for example the counter mode, but
this chapter focuses on dedicated stream ciphers that are constructed as keystream gen-
erators. Stream ciphers are usually very fast, even on restricted hardware. They were
used a lot in the past, for example to protect network communication, but in many
cases have been replaced by the AES block cipher.
Section 6.1 deals with synchronous stream ciphers and self-synchronizing stream
ciphers and presents the block cipher modes OFB and CFB. We introduce two classical
stream ciphers in Sections 6.2 and 6.3, linear feedback shift registers (LFSRs) and the
RC4 cipher, and outline their vulnerabilities. In Section 6.4, we provide an example of
a new stream cipher family, Salsa20, and the related ChaCha family.
Stream ciphers are contained in most cryptography textbooks, for example [PP10]
and [KL15]. We also recommend the handbook [MvOV97]. Further details on the
design of new stream ciphers can be found in [RB08] and at the eSTREAM project
(http://www.ecrypt.eu.org/stream/).
6.1. Definition of Stream Ciphers

In Section 2.8 we introduced pseudorandom generators of fixed output length. A stream
cipher is essentially a pseudorandom generator that outputs keystream bits, but our
definition differs in two ways: the output length is not fixed and the keystream is com-
puted recursively using an internal state and the key. The initial state is derived from
a key and an initialization vector.
We use the term stream cipher for an encryption scheme that is based on a
keystream generator. Encryption is defined by XORing the plaintext with the keystream,
115
116 6. Stream Ciphers
i.e., by bitwise addition of the plaintext and the keystream. Other encryption functions
which combine several plaintext and keystream bits to produce ciphertext are also pos-
sible. Note the difference to block ciphers (Chapter 5), which process larger blocks of
plaintext (e.g., 128 bits). Stream ciphers process small plaintext blocks, e.g., only one
bit, and the keystream varies as the plaintext is processed. Two types of cipher streams
can be distinguished. In synchronous stream ciphers, the keystream depends only on
the key and the internal state of the generator. Self-synchronizing stream ciphers, on
the other hand, use the previous ciphertext bits to generate the keystream. Below we
assume that encryption and decryption is given by binary addition.
Definition 6.1. A synchronous stream cipher is an encryption scheme defined by the
following spaces and polynomial-time algorithms:
• The plaintext space and the ciphertext space is ℳ = 𝒞 = {0, 1}∗ .
• The key generation algorithm 𝐺𝑒𝑛(1𝑛 ) takes 1𝑛 as input and outputs a key 𝑘 ∈
{0, 1}𝑛 as well as an initialization vector IV.
• The initialization algorithm Init (𝑘, 𝐼𝑉) takes 𝑘 and IV as input and outputs an
initial state 𝑠𝑡1 .
• The keystream generator 𝐺 = 𝐺(𝑘, 𝑠𝑡) takes 𝑘 and 𝑠𝑡 as input and recursively
computes 𝑙-bit output words 𝑦1 , 𝑦2 , … called a keystream. The next state function
𝑓(𝑘, 𝑠𝑡) takes 𝑘 and 𝑠𝑡 as input and updates the state 𝑠𝑡.
𝑦𝑖 = 𝐺(𝑘, 𝑠𝑡𝑖 ) and 𝑠𝑡𝑖+1 = 𝑓(𝑘, 𝑠𝑡𝑖 ) for 𝑖 ≥ 1.
• Encryption of a plaintext (𝑚1 , 𝑚2 , … ) and decryption of a ciphertext
(𝑐1 , 𝑐2 , … ) are defined by XORing each input word of length 𝑙 with the correspond-
ing keystream word (see Figure 6.1).
𝑐𝑖 = 𝑚𝑖 ⊕ 𝑦𝑖 and 𝑚𝑖 = 𝑐𝑖 ⊕ 𝑦𝑖 for 𝑖 ≥ 1.
If the last plaintext or ciphertext word is shorter than 𝑙 bits, then only the first
(most significant) bits of the keystream word are used. ♢
Figure 6.1. Keystream generation and encryption using a synchronous stream cipher.
6.1. Definition of Stream Ciphers 117
The keystream of a synchronous stream cipher does not depend on the plaintext
or the ciphertext. The sender and receiver must be synchronized and use the same state
for the decryption to be successful.
Example 6.2. The Output Feedback (OFB) mode (see [Dwo01]) turns a block cipher
into a synchronous stream cipher. Let 𝐹 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙 be a keyed family
$
of functions, for example a block cipher. A uniform random key 𝑘 ← {0, 1}𝑛 and a
$
uniform initialization vector 𝐼𝑉 ← {0, 1}𝑙 are chosen. The initial state is 𝑠𝑡1 = 𝑦0 = 𝐼𝑉,
and we recursively generate keystream words of length 𝑙 by applying 𝐹𝑘 to the state (see
Figure 6.2). The keystream is also used to update the state.
𝑦𝑖 = 𝐹𝑘 (𝑠𝑡𝑖 ) = 𝐹𝑘 (𝑦𝑖−1 ) and 𝑠𝑡𝑖+1 = 𝑦𝑖 for 𝑖 ≥ 1.
Figure 6.2. OFB mode encryption. The cipher recursively generates keystream words.
Another example of a block cipher turned into a synchronous stream cipher is

given by the CTR (counter) mode (see Definition 2.48). In that case, the counter value
represents the state. The counter is incremented after each output block. ♢
Self-synchronizing stream ciphers (also called asynchronous stream ciphers or ci-

phertext autokey) are less common. For example, only two out of 34 submissions to
the eSTREAM project were of this type.
Definition 6.3. A self-synchronizing stream cipher is an encryption scheme that differs
from a synchronous stream cipher (see Definition 6.1) in the following way:
The state 𝑠𝑡𝑖 is represented by 𝑡 ≥ 1 preceding ciphertext words (𝑐𝑖−𝑡 , … , 𝑐𝑖−1 ),
where the initial state is the initialization vector 𝐼𝑉 = 𝑠𝑡1 = (𝑐1−𝑡 , … , 𝑐0 ).
𝑦𝑖 = 𝐺(𝑘, 𝑠𝑡𝑖 ) = 𝐺(𝑘, (𝑐𝑖−𝑡 , … , 𝑐𝑖−1 )), 𝑐𝑖 = 𝑚𝑖 ⊕ 𝑦𝑖 , 𝑠𝑡𝑖+1 = (𝑐𝑖+1−𝑡 , … , 𝑐𝑖 )
for 𝑖 ≥ 1 (see Figure 6.3 for 𝑡 = 1). ♢
Note that a synchronization is possible after receiving 𝑡 correct ciphertext words.

This does not work for a synchronous stream cipher since its state is recursively up-
dated and cannot be derived from the ciphertext.
Figure 6.3. Self-synchronizing stream cipher.
Example 6.4. A block cipher in Cipher Feedback (CFB) mode (see [Dwo01]) gives rise
to a self-synchronizing stream cipher. Let 𝐹 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙 be a keyed
$
family of functions, for example a block cipher. Choose a uniform key 𝑘 ← {0, 1}𝑛 and
$
a uniform initialization vector 𝐼𝑉 ← {0, 1}𝑙 . The initial state is 𝑠𝑡1 = 𝑐0 = 𝐼𝑉. Let 𝑚𝑖 be
the 𝑖-th plaintext word of length 𝑙. Define the keystream and the ciphertext words by
𝑦𝑖 = 𝐹𝑘 (𝑐𝑖−1 ) and 𝑐𝑖 = 𝑚𝑖 ⊕ 𝑦𝑖 for 𝑖 ≥ 1.
The keystream depends on the preceding ciphertext word (𝑡 = 1, see Figure 6.4). There
are also 𝑠-bit CFB modes where words of length 𝑠 ≤ 𝑙 are processed. ♢
Figure 6.4. CFB mode encryption and decryption.
There is no obvious security definition for stream ciphers. In the following, we

discuss possible definitions for synchronous stream ciphers.
If the cipher does not depend on an IV, then we can refer to Definition 2.32 of
a pseudorandom generator. The generated keystream 𝑦1 , 𝑦2 , 𝑦3 , … should be pseudo-
random; i.e., a probabilistic polynomial-time adversary, who does not know the secret
key, should not be able to distinguish the keystream of polynomial length from a truly
random bit sequence.
For an IV-dependent stream cipher 𝐺(𝑘, 𝐼𝑉), the minimum requirement would
be the pseudorandomness of the keystream (as above) for random IVs, where the IV is
known to an adversary.
For a stronger security definition, we let the adversary choose an IV and the out-
put length. They are given either the associated keystream of the chosen length or a
random bit sequence of the same length. The adversary’s task is to distinguish between
the two cases.
The cipher is secure if 𝐺(𝑘, 𝐼𝑉) is a pseudorandom function (see Definitions 2.38
and 2.39). An IV-dependent stream cipher can in fact be viewed as a family of functions,
which is parametrized by a key and maps an IV to a keystream. Refer to [BG07] and
[Zen07] for a discussion of this topic.
Yet another security issue not addressed here are related-key attacks (see Remark
2.44) against stream ciphers.
6.2. Linear Feedback Shift Registers

A classical and very efficient way to generate output bits from an initial state is to use a
linear feedback shift register (LFSR). An LFSR of degree 𝑛 consists of 𝑛 binary registers
that form the state. The state is updated by shifting all registers to the right (see Figure
6.5). The rightmost bit leaves the state and becomes an output bit. The new leftmost bit
is an XOR combination of state bits, and fixed feedback coefficients determine whether
or not a particular position of the state is tapped.
Definition 6.5. A linear feedback shift register (LFSR) of degree 𝑛 (or length 𝑛) is
defined by feedback coefficients 𝑐1 , 𝑐2 , … , 𝑐𝑛 ∈ 𝐺𝐹(2). The initial state is an 𝑛-bit
word 𝑠𝑡 = (𝑠𝑛−1 , … , 𝑠1 , 𝑠0 ) and new bits are generated by the recursion
𝑠𝑗 = 𝑐1 𝑠𝑗−1 + 𝑐2 𝑠𝑗−2 + ⋯ + 𝑐𝑛 𝑠𝑗−𝑛 mod 2 for 𝑗 ≥ 𝑛.
At each iteration step (clock tick), the state 𝑠𝑡 is updated from (𝑠𝑗−1 , … , 𝑠𝑗−𝑛 ) to
(𝑠𝑗 , 𝑠𝑗−1 , … , 𝑠𝑗−𝑛+1 ), i.e., by shifting the register to the right. The rightmost bit 𝑠𝑗−𝑛
is output. The output of an LFSR is called a linear recurring sequence. ♢
Obviously, an LFSR first outputs the initial state 𝑠0 , 𝑠1 , … , 𝑠𝑛−1 and subsequently
the new feedback bits 𝑠𝑛 , 𝑠𝑛+1 , … . At each step, the state vector 𝑠𝑡 is updated to 𝐴 ⋅ 𝑠𝑡,
where 𝐴 is an 𝑛 × 𝑛 matrix over 𝐺𝐹(2) and 𝑠𝑡 is viewed as a column vector. The initial
state is 𝑠𝑡 = (𝑠𝑛−1 , … , 𝑠1 , 𝑠0 )𝑇 .
𝑐 𝑐2 … 𝑐𝑛 𝑠 𝑠
⎛ 1 ⎞ ⎛ 𝑗−1 ⎞ ⎛ 𝑗 ⎞
1 0 0 0 𝑠𝑗−2 𝑠𝑗−1
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
𝐴 = ⎜0 1 0 0 ⎟ and 𝐴 ⋅ ⎜ 𝑠𝑗−3 ⎟ = ⎜ 𝑠𝑗−2 ⎟ for 𝑗 ≥ 𝑛.
⎜ … ⎟ ⎜ … ⎟ ⎜ … ⎟
⎝0 0 1 0⎠ ⎝𝑠𝑗−𝑛 ⎠ ⎝𝑠𝑗−𝑛+1 ⎠
Figure 6.5. Clocking an LFSR of degree 4.
Remark 6.6. The literature has not adopted a unique notation of shift registers and
their parameters. LFSRs can also be shifted to the left so that the leftmost bit is output.
In this case, the state vector (from left to right) has increasing indices. The initial state is
𝑠𝑡 = (𝑠0 , 𝑠1 , … , 𝑠𝑛−1 ), and the recursion formula as well as the above transition matrix
look slightly different. We adopt the notation used by [MvOV97]. ♢
It is easy to see that every linear recurring sequence must ultimately be periodic:
the feedback coefficients are fixed and the output depends only on the state vector. The
state is a binary word of length 𝑛, and so there are 2𝑛 possible states.
Definition 6.7. Let 𝑠0 , 𝑠1 , … be a linear recurring sequence. The (least) period of the
sequence is the smallest integer 𝑁 ≥ 1 such that
𝑠𝑗+𝑁 = 𝑠𝑗
for all sufficiently large values of 𝑗.
Proposition 6.8. The period of a sequence generated by an LFSR of degree 𝑛 is at most

2𝑛 − 1.
Proof. Consider the sequence of state vectors. If the all-zero state occurs, then the
following output is constantly 0 and the period is 1. Otherwise, all state vectors are
nonzero. There are 2𝑛 − 1 nonzero states, and so the period is bounded by this number.
□
Example 6.9. Consider an LFSR of degree 4 with feedback coefficients 𝑐1 = 1, 𝑐2 = 0,
𝑐3 = 0 and 𝑐4 = 1. Suppose the initial state is 𝑠𝑡 = (1, 1, 0, 1) (see Figure 6.5). The state
is shifted to the right and a new bit is generated by XORing the first and fourth bit of
the state. The updated state is 𝑠𝑡 = (0, 1, 1, 0). We continue in this fashion and check
that the LFSR assumes all 15 nonzero states. The following output bits are generated:
1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0 (in this order). The sequence recurs after 15 output
bits.
SageMath can also compute the output of LFSRs. The key array contains the feed-
back coefficients and the fill array the initial state (in reverse order, so that the left-
most bit is the first output bit of the generator). We generate 20 bits and observe that
the output repeats after 15 bits.
sage: o = GF (2)(0); l = GF (2)(1)
sage: key = [l,o,o,l]; fill = [l,o,l,l]
sage: s = lfsr_sequence (key ,fill ,20); s
[1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
We used the symbols o and l for 0 ∈ 𝐺𝐹(2) and 1 ∈ 𝐺𝐹(2), respectively. ♢
In the above Example 6.9, the period of the sequence is maximal. In general, the
period depends on the initial state and the parameters of an LFSR.
Definition 6.10. Let 𝑐1 , 𝑐2 , … , 𝑐𝑛 be the feedback coefficients of an LFSR of degree 𝑛.
Then 𝑐(𝑥) = 1 + 𝑐1 𝑥 + 𝑐2 𝑥2 + ⋯ + 𝑐𝑛 𝑥𝑛 ∈ 𝐺𝐹(2)[𝑥] is called connection polynomial or
feedback polynomial of the LFSR.
Proposition 6.11. Let 𝑐(𝑥) = 1 + 𝑐1 𝑥 + 𝑐2 𝑥2 + ⋯ + 𝑐𝑛 𝑥𝑛 be the connection polynomial

of an LFSR with associated 𝑛 × 𝑛 matrix
𝑐 𝑐2 … 𝑐𝑛
⎛ 1 ⎞
1 0 0 0
⎜ ⎟
𝐴 = ⎜0 1 0 0⎟.
⎜ … ⎟
⎝0 0 1 0⎠
Then:
(1) If 𝑐𝑛 = 1 then det(𝐴) = 1 and 𝐴 is nonsingular. In this case, the LFSR is called
nonsingular.
(2) A nonsingular LFSR is invertible, i.e., can run in the reverse direction, and a state
uniquely determines the preceding bits.
(3) If the LFSR is nonsingular, then an output sequence of period 𝑁 is purely periodic,
i.e., 𝑠𝑗+𝑁 = 𝑠𝑗 for all 𝑗 ∈ ℕ.
(4) The characteristic polynomial of the matrix 𝐴 is
1
𝑝(𝑥) = det(𝐴 − 𝑥𝐼𝑛 ) = 𝑥𝑛 𝑐 ( ) = 𝑥𝑛 + 𝑐1 𝑥𝑛−1 + ⋯ + 𝑐𝑛−1 𝑥 + 𝑐𝑛 .
𝑥
(5) The connection polynomial of a nonsingular LFSR satisfies
𝑐(𝑥) = det(𝑥𝐴 − 𝐼𝑛 ).
(6) The characteristic polynomial 𝑝(𝑥) is equal to the minimal polynomial of 𝐴 and
𝐺𝐹(2)[𝑥]/(𝑝(𝑥)) ≅ 𝐺𝐹(2)[𝐴],
where 𝐺𝐹(2)[𝐴] is the commutative subring of matrices over 𝐺𝐹(2) that can be writ-
ten as a sum 𝑎0 𝐼𝑛 +𝑎1 𝐴+𝑎2 𝐴2 +⋯+𝑎𝑚 𝐴𝑚 with 𝑚 ∈ ℕ and 𝑎0 , 𝑎1 , … , 𝑎𝑚 ∈ 𝐺𝐹(2).
Proof. We leave it to the reader to prove (1). If 𝐴 is nonsingular, then 𝐴−1 exists and
𝑠 𝑠
⎛ 𝑗 ⎞ ⎛ 𝑗−1 ⎞
𝑠 𝑠
⎜ 𝑗−1 ⎟ ⎜ 𝑗−2 ⎟
𝐴−1 ⋅ ⎜ 𝑠𝑗−2 ⎟ = ⎜ 𝑠𝑗−3 ⎟ .
⎜ … ⎟ ⎜ … ⎟
⎝𝑠𝑗−𝑛+1 ⎠ ⎝𝑠𝑗−𝑛 ⎠
This shows (2). Since any LFSR is ultimately periodic and nonsingular LFSRs can run
in the reverse direction, all output bits of such LFSRs are periodic which proves (3).
We now turn to part (4). Since 𝑐(𝑥) = 1 + 𝑐1 𝑥 + 𝑐2 𝑥2 + ⋯ + 𝑐𝑛 𝑥𝑛 we have
1
𝑥𝑛 𝑐 ( ) = 𝑥𝑛 + 𝑐1 𝑥𝑛−1 + ⋯ + 𝑐𝑛−1 𝑥 + 𝑐𝑛 .
𝑥
We prove by induction that this gives the characteristic polynomial of 𝐴. If 𝑛 = 1 then
𝐴 = (𝑐1 ), so that 𝑐1 − 𝑥 = 𝑥 + 𝑐1 is the characteristic polynomial of 𝐴 over 𝐺𝐹(2). Using
the hypothesis for LFSRs of degree 𝑛 − 1, we compute the characteristic polynomial
𝑝(𝑥) of 𝐴:
|𝑐1 + 𝑥 𝑐2 … 𝑐𝑛 |
| 1 𝑥 0 0 ||
|
det(𝐴 − 𝑥𝐼𝑛 ) = | 0 1 𝑥 0 | (expansion along the last column)
| … |
| |
| 0 0 1 𝑥|
|1 𝑥 0 0| |𝑐1 + 𝑥 𝑐2 … 𝑐𝑛−1 |
|0 1 𝑥 0|| | 1 𝑥 0 0 ||
= 𝑐𝑛 || |
|+𝑥| | (use hypothesis)
| … | | … |
|0 0 0 1| | 0 0 1 𝑥 |
𝑛−1 𝑛−2
= 𝑐𝑛 + 𝑥 ⋅ (𝑥 + 𝑐1 𝑥 + ⋯ + 𝑐𝑛−2 𝑥 + 𝑐𝑛−1 )
𝑛 𝑛−1
= 𝑥 + 𝑐1 𝑥 + ⋯ + 𝑐𝑛−1 𝑥 + 𝑐𝑛 .
This shows (4). Now we derive (5) from (4):
1 1 1
𝑐(𝑥) = 𝑐 ( ) = 𝑥𝑛 𝑝 ( ) = 𝑥𝑛 det (𝐴 − 𝐼𝑛 ) = det(𝑥𝐴 − 𝐼𝑛 ).
1/𝑥 𝑥 𝑥
It remains to prove (6). In general, the minimal polynomial of a matrix divides the
characteristic polynomial (Cayley-Hamilton theorem). Now one can easily see from
the definition of 𝐴 that the first unit vector 𝑒1 is a cyclic vector of 𝐴: the vectors 𝑒1 ,
𝐴 ⋅ 𝑒1 , … , 𝐴𝑛−1 𝑒1 span 𝐺𝐹(2)𝑛 . If the minimal polynomial is of degree less than 𝑛,
then 𝐴𝑛−1 is a linear combination of 𝐼𝑛 , 𝐴, … , 𝐴𝑛−2 , and there can be no cyclic vector.
This shows that the minimal polynomial is of degree 𝑛 and equals the characteristic
polynomial 𝑝(𝑥). The surjective ring homomorphism 𝐺𝐹(2)[𝑥] → 𝐺𝐹(2)[𝐴] maps a
polynomial 𝑓(𝑥) to 𝑓(𝐴). By definition of the minimal polynomial, 𝑓(𝐴) is the zero
matrix if and only if 𝑓(𝑥) is a multiple of 𝑝(𝑥). This completes the proof. □
In the following, we assume that 𝑐𝑛 = 1 so that the LFSR and the matrix 𝐴 are
nonsingular. This assumption is reasonable, since one could otherwise omit the last
register bit and obtain an LFSR of lower degree that generates essentially the same
output.
The following Proposition relates the period of a linear recurring sequence to the
order of the associated matrix.
Proposition 6.12. Let 𝐴 be the matrix associated to a nonsingular LFSR of degree 𝑛 with
characteristic polynomial 𝑝(𝑥) and let ord(𝐴) be the order of 𝐴 in the multiplicative group
of invertible matrices over 𝐺𝐹(2), i.e., the smallest exponent 𝑁 ≥ 1 such that 𝐴𝑁 = 𝐼𝑛 .
Then:
(1) The period of any output sequence divides ord(𝐴).
(2) If the initial state is 𝑠𝑡 = (1, 0, … , 0)𝑇 , then the period of the associated output
sequence is equal to ord(𝐴).
(3) If 𝑝(𝑥) is irreducible, then the period of any nonzero sequence equals ord(𝐴).
Proof. Let 𝑠𝑡 be an initial state (viewed as a column vector). Then the sequence of
subsequent states is
𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, 𝐴2 𝑠𝑡, 𝐴3 𝑠𝑡, … .
Let 𝑚 be the least period of that sequence. Since 𝐴ord(𝐴) = 𝐼𝑛 and hence 𝐴ord(𝐴) 𝑠𝑡
= 𝑠𝑡, we see that 𝑚 ∣ ord(𝐴), which proves (1). If 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴𝑛−1 𝑠𝑡 form a basis
of 𝐺𝐹(2)𝑛 , then the period of the output sequence and the period of 𝐴 in the group of
invertible matrices coincide. For 𝑠𝑡 = (1, 0, … , 0)𝑇 , one can easily check that the vectors
𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴𝑛−1 𝑠𝑡 are linearly independent, and (2) is proved. If 𝑝(𝑥) is irreducible,
then Propositions 4.67 and 6.11 (6) imply an isomorphism of fields
𝐺𝐹(2)[𝑥]/(𝑝(𝑥)) ≅ 𝐺𝐹(2)[𝐴].
Therefore, any non-trivial linear combination of the matrices 𝐼𝑛 , 𝐴, … , 𝐴𝑛−1 is invert-
ible, which shows that the vectors 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴𝑛−1 𝑠𝑡 are linearly independent for any
nonzero state 𝑠𝑡. This proves (3). □
The above Proposition shows that the maximal period of a nonsingular LFSR with
matrix 𝐴 is ord(𝐴). Under which conditions is the period equal to the maximal value
2𝑛 − 1?
Definition 6.13. Let 𝑓(𝑥) ∈ 𝐺𝐹(𝑝)[𝑥] be a polynomial of degree 𝑛 ≥ 1 with 𝑓(0) ≠ 0.

Then ord(𝑓) (the order of 𝑓) is defined to be the smallest integer 𝑁 ≥ 1 such that
𝑓(𝑥) ∣ 𝑥𝑁 − 1. ♢
It is not difficult to see that the order of 𝑓 is well defined: consider the quotient
ring 𝐺𝐹(𝑝)[𝑥]/(𝑓(𝑥)) and its group of units 𝑈 = (𝐺𝐹(𝑝)[𝑥]/(𝑓(𝑥)))∗ , which has at
most 𝑝𝑛 − 1 elements. If 𝑓 is irreducible then ord(𝑈) = 𝑝𝑛 − 1. Since 𝑓(0) ≠ 0, 𝑥 is
invertible modulo 𝑓(𝑥) and Euler’s Theorem 4.15 implies that
𝑥ord(𝑈) ≡ 1 mod 𝑓(𝑥).
In other words, 𝑓(𝑥) divides 𝑥ord(𝑈) − 1. In fact, we have
ord(𝑓) = ord(𝑥) ∣ ord(𝑈),
where ord(𝑥) is the order of 𝑥 in 𝑈.
Proposition 6.14. Let 𝑝(𝑥) = 𝑥𝑛 + 𝑐1 𝑥𝑛−1 + ⋯ + 𝑐𝑛−1 𝑥 + 𝑐𝑛 ∈ 𝐺𝐹(2)[𝑥] be the char-

acteristic polynomial of a nonsingular LFSR of degree 𝑛 and suppose 𝑝(𝑥) is primitive,
i.e.,
(1) 𝑝(𝑥) is irreducible and
(2) ord(𝑝(𝑥)) = 2𝑛 − 1.
Then all output sequences with nonzero initial state have maximal period 2𝑛 − 1 and the
LFSR is called maximum-length.
Proof. Let 𝐴 be the matrix associated to the LFSR and suppose 𝑝(𝑥) is primitive. Prop-
osition 6.11 (6) shows that
(𝐺𝐹(2)[𝑥]/(𝑝(𝑥)))∗ ≅ 𝐺𝐹(2)[𝐴]∗ ,
and hence ord(𝑥) = ord(𝐴) = ord(𝑝(𝑥)) = 2𝑛 − 1. It remains to prove that the period
of all nonzero sequences is maximal. But this follows from Proposition 6.12 (3). □
Example 6.15. Let 𝑐(𝑥) = 𝑥4 + 𝑥 + 1 be the connection polynomial of an LFSR (see

Example 6.9). Then 𝑝(𝑥) = 𝑥4 + 𝑥3 + 1 is the corresponding characteristic polynomial
(see Proposition 6.11). It is easy to check that 𝑝(𝑥) is irreducible and we show that 𝑝(𝑥)
is primitive:
We have ord(𝑝(𝑥)) = ord(𝑥), where the latter is the order of 𝑥 in the group of
units of 𝐺𝐹(2)[𝑥]/(𝑥4 + 𝑥3 + 1). Since the order of the group of units is 15 and thus
ord(𝑥) ∣ 15, the order of 𝑥 must be either 1, 3, 5 or 15. Obviously, 𝑥 ≢ 1 and 𝑥3 ≢ 1.
Since 𝑥5 ≡ 𝑥3 + 𝑥 + 1 mod 𝑥4 + 𝑥3 + 1 one has 𝑥5 ≢ 1. Hence ord(𝑥) = ord(𝑝(𝑥)) = 15.
For every nonzero initial state, the LFSR generates a sequence of maximal length.
We can also use SageMath to do the work:
sage: R.<x> = PolynomialRing (GF (2))
sage: (x^4+x ^3+1). is_primitive ()
True
We have seen that a maximum-length LFSR of degree 𝑛 generates an output se-

quence of length 2𝑛 − 1. In principle, a synchronous stream cipher can be based on an
LFSR: first, the initialization algorithm loads the cipher key and the IV into the initial
state of the LFSR. The LFSR is clocked a number of times and the output discarded in
order to protect the secret key. The following output bits are then used as a keystream.
Although LFSRs have good statistical properties, the output of a linear feedback
shift register should not be used as a keystream. Having 𝑛 successive output bits (for
example in a known plaintext attack) suffices to reconstruct the state vector. If the
feedback coefficients are known, then all subsequent and previous output bits can be
computed. Even if the coefficients are kept secret (in contradiction to Kerkhoff’s prin-
ciple; see Remark 2.3), everything can be reconstructed from 2𝑛 output bits, as the
following proposition shows:
Proposition 6.16. Consider a nonsingular LFSR of degree 𝑛 with unknown feedback

coefficients. If the characteristic polynomial is irreducible, then the feedback coefficients
𝑐1 , … , 𝑐𝑛 can be uniquely determined from 2𝑛 consecutive output bits 𝑦1 , 𝑦2 , … , 𝑦2𝑛 (not
all zero) of the LFSR.
Proof. Let 𝐴 be the unknown matrix associated to the LFSR. We reconstruct 𝑛 state
vectors of the LFSR from the output bits:
𝑦𝑛 𝑦𝑛+1 𝑦2𝑛−1
𝑠𝑡 = ( ⋮ ) , 𝐴 ⋅ 𝑠𝑡 = ( ⋮ ) , … , 𝐴𝑛−1 𝑠𝑡 = ( ⋮ ) .
𝑦1 𝑦2 𝑦𝑛
Let 𝑥 = (𝑐1 𝑐2 … 𝑐𝑛 ) be the first row of 𝐴. Then:
𝑦𝑛+1 = 𝑥 ⋅ 𝑠𝑡,
𝑦𝑛+2 = 𝑥 ⋅ (𝐴 ⋅ 𝑠𝑡)
⋮
𝑦2𝑛 = 𝑥 ⋅ (𝐴𝑛−1 𝑠𝑡).
We obtain a linear system of equations 𝑦 = 𝑀𝑥, where the rows of 𝑀 are formed
by the vectors 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴𝑛−1 𝑠𝑡. Since 𝑝(𝑥) is irreducible, we obtain as in the proof
of Proposition 6.12 (3) that 𝑠𝑡, 𝐴 ⋅ 𝑠𝑡, … , 𝐴𝑛−1 𝑠𝑡 are linearly independent. Therefore,
the linear system of equations 𝑦 = 𝑀𝑥 has a unique solution, the vector of feedback
coefficients. □
Example 6.17. Suppose the following output bits of an LFSR of degree 4 are known
(in this order):
0, 1, 1, 1, 1, 0, 1, 0.
We want to reconstruct the feedback coefficients and a state. The first four bits (in
reverse order) give the state vector:
1
⎛ ⎞
1
𝑠𝑡 = ⎜ ⎟ .
⎜1⎟
⎝0⎠
The subsequent states are 𝐴⋅𝑠𝑡 = (1, 1, 1, 1)𝑇 , 𝐴2 𝑠𝑡 = (0, 1, 1, 1)𝑇 and 𝐴3 𝑠𝑡 = (1, 0, 1, 1)𝑇 .
This yields four linear equations in the unknown feedback coefficients 𝑥1 , 𝑥2 , 𝑥3 and
𝑥4 . The left side of the equations is given by the last four output bits.
1 = 1𝑥1 + 1𝑥2 + 1𝑥3 + 0𝑥4 mod 2,
0 = 1𝑥1 + 1𝑥2 + 1𝑥3 + 1𝑥4 mod 2,
1 = 0𝑥1 + 1𝑥2 + 1𝑥3 + 1𝑥4 mod 2,
0 = 1𝑥1 + 0𝑥2 + 1𝑥3 + 1𝑥4 mod 2.
This 4 × 4 system of linear equations over 𝐺𝐹(2) is regular and the unique solution are
the feedback coefficients 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0 and 𝑥4 = 1. In fact, we have taken
eight output bits from the LFSR in Example 6.9.
Remark 6.18. The Berlekamp-Massey algorithm (see [MvOV97]) finds the shortest
LFSR that generates a given finite sequence. The degree of the shortest LFSR is called
the linear complexity of a sequence. Suppose the characteristic polynomial 𝑝(𝑥) of a
nonsingular LFSR is irreducible and deg(𝑝(𝑥)) = 𝑛. Then each nonzero state pro-
duces an output sequence of period ord(𝑝(𝑥)) and linear complexity 𝑛. With 2𝑛 given
output bits, the Berlekamp-Massey algorithm can compute the feedback coefficients
more efficiently than solving a system of linear equations. ♢
We have seen that LFSRs are very efficient and can have a large period, but make
weak stream ciphers. The problem is, of course, the linear structure of LFSRs.
One possible approach is to use filter generators. A nonlinear function 𝑓 is applied
to the entire state of an LFSR and defines the keystream:
𝑦𝑗 = 𝑓(𝑠𝑗−1 , 𝑠𝑗−2 , … , 𝑠𝑗−𝑛 ).
The multiplications of state bits (AND) can be used along with additions (XOR).
Furthermore, one can use combination generators, which combine several LFSRs.
A linear or nonlinear Boolean function takes the output bits of each register as input
and combines them into a single keystream bit.
Example 6.19. The stream cipher Trivium [DCP08], which belongs to the portfolio
of the eSTREAM project, combines three shift registers of degree 93, 84 and 111, re-
spectively. The output of each register is defined by a nonlinear filter function, and
the input is the XOR-sum of one feedback bit and the output of another register. The
keystream at each clock tick is the XOR-sum of the output bits of the three registers. ♢
Yet another approach is to use irregular clocking: several LFSRs are combined and
a nonlinear function determines whether or not a register is clocked (shifted to the
right). If a register is not clocked, then the previous bit is output again.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Figure 6.6. The A5/1 cipher combines three LFSRs and uses irregular clocking.
Example 6.20. The A5/1 cipher that is used in GSM mobile networks (2G) combines
three LFSRs of degree 19, 22 and 23 (see Table 6.1 and Figure 6.6). In each register, one
clocking bit is fixed and a register is clocked, if the clocking bit agrees with the majority
of the three clocking bits. Therefore, either all three LFSRs or two of the LFSRs are
3
clocked. The probability that a register is clocked is (see Exercise 5).
4
Table 6.1. Feedback polynomials and clocking bit of the A5/1 cipher.
LFSR number Feedback polynomial Clocking bit (leftmost bit is 1)

19 18 17 14
1 𝑥 +𝑥 +𝑥 +𝑥 +1 9
22 21
2 𝑥 +𝑥 +1 11
23 22 21 8
3 𝑥 +𝑥 +𝑥 +𝑥 +1 11
Initially, all registers are set to zero. A 64-bit ciphering key (where only 54 bits are
secret) and a 22-bit frame number are mixed in. Then the irregular clocking starts: the
first 100 output bits are discarded and the next 228 bits are used as the keystream (the
first 114 bits for the downlink from the base station to the cellular phone and then 114
bits for the uplink).
The irregular clocking forms the nonlinear component of the cipher. Nevertheless,
with current computing resources and large precomputed tables A5/1 is now broken.
Better GSM ciphers (A5/3 and A5/4) are available, but whether they are used depends
on the network and the mobile phone.
6.3. RC4
The synchronous stream cipher RC4 (Rivest Cipher 4) was very popular for many years
and was often used to encrypt network traffic (for example in the TLS protocol or for
the encryption of Wi-Fi traffic). RC4 is based on permutations of the integers (or bytes)
0, 1, … , 255 and recursively generates output bytes. The key-scheduling algorithm (ini-
tialization) takes a key (between one and 256 bytes long) and sets up the state array
𝑆[0], 𝑆[1], … , 𝑆[255]. The pseudorandom generation algorithm recursively computes
one output byte and updates the state.
RC4 is ideal for software implementations and is very efficient. Unfortunately, the
output of RC4 is biased and can be distinguished from a random sequence of bytes.
Algorithm 6.1 RC4 Key Scheduling Algorithm (KSA)

Input: Key array 𝐾[0], 𝐾[1], … , 𝐾[𝑛 − 1] of 𝑛 bytes, 1 ≤ 𝑛 ≤ 255
Output: State array 𝑆[0], 𝑆[1], … , 𝑆[255]
1: for 𝑖 = 0 to 255 do
2: 𝑆[𝑖] = 𝑖
3: end for
4: 𝑗 = 0
5: for 𝑖 = 0 to 255 do
6: 𝑗 = (𝑗 + 𝑆[𝑖] + 𝐾[𝑖 mod 𝑛]) mod 256
7: Swap the values of 𝑆[𝑖] and 𝑆[𝑗]
8: end for
For the remainder of this section, all additions (+) are modulo 256. First, we con-
sider the key scheduling (see Algorithm 6.1).
In the first iteration of the for loop, one has 𝑖 = 0 and 𝑗 = 0 + 𝑆[0] + 𝐾[0] = 𝐾[0].
The values of 𝑆[0] and 𝑆[𝐾[0]] are swapped so that 𝑆[0] = 𝐾[0] and 𝑆[𝐾[0]] = 0. In
the next iteration, one has 𝑖 = 1 and 𝑗 = 𝐾[0] + 𝑆[1] + 𝐾[1] so that
𝑆[1] = 𝐾[0] + 𝑆[1] + 𝐾[1] mod 256.
Hence 𝑆[1] is equal to 𝐾[0] + 𝐾[1] + 1, unless 𝐾[0] = 1 and 𝑆[1] = 0 as a result of the
first iteration. In this case, 𝑆[1] = 𝐾[0] + 𝐾[1] = 1 + 𝐾[1].
In the next step, 𝑖 = 2 and (if 𝐾[0] ≠ 1) one gets 𝑗 = 𝐾[0] + 𝐾[1] + 1 + 𝑆[2] + 𝐾[2].
Therefore, it is likely that 𝑆[2] = 𝐾[0] + 𝐾[1] + 𝐾[2] + 3. Note that 𝑆[0], 𝑆[1] and 𝑆[2]
may change later in the loop if one of the 𝑗-values for 𝑖 > 2 becomes 0, 1 or 2.
By continuing in this fashion, one can show that the most likely value for the 𝑖-th
state byte after the key scheduling algorithm is
𝑖(𝑖 + 1)
𝑆[𝑖] = 𝐾[0] + 𝐾[1] + ⋯ + 𝐾[𝑖] + mod 256.
2
6.3. RC4 129
For the first nine state values, it can be shown that the above formula holds with
more than 30% probability (see [PM07]), which is a very significant bias compared to
a random permutation.
Algorithm 6.2 RC4 Pseudorandom generation algorithm (PRGA)

Input: State array 𝑆[0], 𝑆[1], … , 𝑆[255]
Output: Output bytes 𝐵
1: 𝑖 = 0
2: 𝑗 = 0
3: while Keystream is generated do
4: 𝑖 =𝑖+1
5: 𝑗 = (𝑗 + 𝑆[𝑖]) mod 256
6: Swap the values of 𝑆[𝑖] and 𝑆[𝑗]
7: 𝐵 = 𝑆[(𝑆[𝑖] + 𝑆[𝑗]) mod 256]
8: Output 𝐵
9: end while
Now consider the pseudorandom generator (see Algorithm 6.2). The first output
byte is 𝑆[𝑆[1] + 𝑆[𝑆[1]]] and the second byte is
𝑆[𝑆[2] + 𝑆[𝑆[1] + 𝑆[2]]].
One can show that the output of RC4 is biased and reveals information about the key.
Below, we discuss a famous attack which reveals the key byte 𝐾[3].
By construction, RC4 does not use an initialization vector (IV), and thus the
keystream must not be re-used with the same key. In practice, the secret key is often
re-used and an IV is incorporated into the RC4 key. In the former Wi-Fi encryption
standard WEP (Wired Equivalent Privacy), a three-byte IV is prepended to the key:
𝐾[0] = 𝐼𝑉[0], 𝐾[1] = 𝐼𝑉[1], 𝐾[2] = 𝐼𝑉[2].
It turned out that this construction is insecure (Fluhrer, Mantin and Shamir attack
[FMS01]): an adversary waits until the first two bytes of the IV are
𝐼𝑉[0] = 𝐾[0] = 3, 𝐼𝑉[1] = 𝐾[1] = 255.
Then the first two iterations of the key scheduling algorithm give
𝑆[0] = 𝐾[0] = 3, 𝑆[1] = 𝐾[0] + 𝐾[1] + 1 = 3 + 255 + 1 ≡ 3 mod 256.
Due to the swapping operations, the first few bytes of the state array are
𝑆[0] = 3, 𝑆[1] = 0, 𝑆[2] = 2, 𝑆[3] = 1.
The next two iterations yield:
𝑆[2] = 3 + 2 + 𝐼𝑉[2] = 5 + 𝐼𝑉[2] and
𝑆[3] = 5 + 𝐼𝑉[2] + 1 + 𝐾[3] = 6 + 𝐼𝑉[2] + 𝐾[3].
If we now assume that 𝑆[0] = 3, 𝑆[1] = 0 and 𝑆[3] = 6 + 𝐼𝑉[2] + 𝐾[3] are not subse-
quently modified in the key scheduling algorithm, then the first keystream byte is
𝐵 = 𝑆[𝑆[1] + 𝑆[𝑆[1]]] = 𝑆[0 + 𝑆[0]] = 𝑆[3] = 6 + 𝐼𝑉[2] + 𝐾[3] mod 256.
Since 𝐼𝑉[2] is known, the first secret key byte 𝐾[3] can be computed from 𝐵. In practice,
the first plaintext and ciphertext byte and thus the first keystream byte 𝐵 are often
known, for example in Wi-Fi communication.
In the updated RC4-based Wi-Fi security protocol TKIP, the mixing of IV and key
was improved, but now TKIP is also deprecated. The RC4 cipher should no longer be
used because of serious weaknesses.
6.4. Salsa20 and ChaCha20

The eSTREAM project aimed to develop new secure, efficient and compact stream ci-
phers suitable for widespread adoption. The project ended in 2008, and in 2012 the
reviewed eSTREAM portfolio contained seven algorithms in two profiles [CR12]:
(1) Profile 1: Stream ciphers with excellent throughput when implemented in soft-
ware: HC-128, Rabbit, Salsa20/12 and SOSEMANUK.
(2) Profile 2: Stream ciphers which are very efficient - in terms of the physical re-
sources required - when implemented in hardware: Grain v1, MICKEY 2.0 and
Trivium.
Profile 1 ciphers use 128-bit keys and profile 2 ciphers 80-bit keys. Extended key
lengths are provided by the software ciphers HC-256 and Salsa 20/20 (256-bit keys) and
the hardware cipher MICKEY-128 2.0 (128-bit key).
However, it should be noted that the eSTREAM portfolio is not a standardization.
The project wants to draw attention to these ciphers and to encourage further crypt-
analysis.
In the following, we describe the stream cipher Salsa20/20 (i.e., Salsa20 with 20
rounds and a 256-bit key) [Ber08b] and its variant ChaCha20 [Ber08a], which has
been adopted as a replacement of RC4 in the TLS protocol (see RFC 7905 [LCM+ 16]).
Salsa20 is based on three simple operations:
• modular addition of 32-bit words 𝑎 and 𝑏 mod 232 , denoted by 𝑎 + 𝑏,

• XOR-sum of 32-bit words 𝑎 and 𝑏, denoted by 𝑎 ⊕ 𝑏,
• circular left shift of a 32-bit word 𝑎 by 𝑡 positions, denoted by 𝑎 ⋘ 𝑡.
The Salsa20/20 cipher takes a 256-bit key, a 64-bit nonce and a 64-bit counter. The
state array 𝑆 of Salsa20 is a 4 × 4 matrix of sixteen 32-bit words. Strings are interpreted
in little-endian notation, i.e., the least significant bit of each word is stored first.
𝑦 𝑦1 𝑦2 𝑦3
⎛ 0 ⎞
𝑆=⎜ 4 ⎟.
⎜ 𝑦8 𝑦9 𝑦10 𝑦11 ⎟
⎝𝑦12 𝑦13 𝑦14 𝑦15 ⎠
The definition of Salsa20 is based on quarter-rounds, row-rounds and column-

rounds. The quarter-rounds operate on four words, the row-rounds transform the four
rows and the column-rounds transform the four columns of the state matrix.
Definition 6.21. Let 𝑦 = (𝑎, 𝑏, 𝑐, 𝑑) be a 4-word sequence. Then a Salsa20 quarter-

round updates (𝑎, 𝑏, 𝑐, 𝑑) as follows (see Figure 6.7):
𝑏 = 𝑏 ⊕ ((𝑎 + 𝑑) ⋘ 7),
𝑐 = 𝑐 ⊕ ((𝑏 + 𝑎) ⋘ 9),
𝑑 = 𝑑 ⊕ ((𝑐 + 𝑏) ⋘ 13),
𝑎 = 𝑎 ⊕ ((𝑑 + 𝑐) ⋘ 18).
Figure 6.7. A Salsa20 quarter-round.

Example 6.22. (1) quarter-round (0, 0, 0, 0) = (0, 0, 0, 0).

(2) We compute (𝑎, 𝑏, 𝑐, 𝑑) = quarter-round (0, 0, 1, 0), where 1 = 00 00 00 01.
Then 𝑏 = 0, 𝑐 = 1, 𝑑 = (1 ⋘ 13) = 00 00 20 00 and 𝑎 = (𝑑 + 𝑐) ⋘ 18 =
(00 00 20 01 ⋘ 18) = 80 04 00 00. The result is
(𝑎, 𝑏, 𝑐, 𝑑) = (80 04 00 00, 00 00 00 01, 00 00 00 01, 00 00 20 00). ♢
A row-round changes the state matrix by applying a quarter-round to each of the

rows. The words are permuted before the quarter-round is applied. After the quarter-
round operation, the permutation is reversed.
⎛ 0 ⎞
Definition 6.23. Let 𝑆 = ⎜ 4 ⎟. Then
⎜ 𝑦8 𝑦9 𝑦10 𝑦11 ⎟
⎝𝑦12 𝑦13 𝑦14 𝑦15 ⎠
𝑧 𝑧1 𝑧2 𝑧3
⎛ 0 ⎞
𝑧4 𝑧5 𝑧6 𝑧7
row-round (𝑆) = ⎜ ⎟ , where
⎜ 𝑧8 𝑧9 𝑧10 𝑧11 ⎟
⎝𝑧12 𝑧13 𝑧14 𝑧15 ⎠
(𝑧0 , 𝑧1 , 𝑧2 , 𝑧3 ) = quarter-round (𝑦0 , 𝑦1 , 𝑦2 , 𝑦3 ),
(𝑧15 , 𝑧12 , 𝑧13 , 𝑧14 ) = quarter-round (𝑦15 , 𝑦12 , 𝑦13 , 𝑦14 ).
♢
The column-round function is the transpose of the row-round function: the words
in the columns are permuted, the quarter-round map is applied to each of the columns
and the permutation is reversed.
Definition 6.24. Let 𝑆 be a state matrix as above; then
column-round (𝑆) = (row-round (𝑆 𝑇 ))𝑇 . ♢
A double-round is the composition of a column-round and a row-round.

Definition 6.25. Let 𝑆 be a state matrix as above; then
double-round (S) = row-round (column-round (𝑆)). ♢
Salsa20 runs 10 successive double-rounds, i.e., 20 quarter-rounds, in order to gen-

erate 64 bytes of output. The initial state depends on the key, a nonce and a counter.
Definition 6.26. The Salsa20/20 stream cipher takes a 256-bit key 𝑘 = (𝑘1 , … , 𝑘8 ) and
a unique 64-bit message number 𝑛 = (𝑛1 , 𝑛2 ) (nonce) as input. A 64-bit block counter
𝑏 = (𝑏1 , 𝑏2 ) is initially set to zero. The initialization algorithm copies 𝑘, 𝑛, 𝑏 and the
four 32-bit constants
𝑦0 = 61707865, 𝑦5 = 3320646E, 𝑦10 = 79622D32, 𝑦15 = 6B206574
into the sixteen 32-bit words of the Salsa20 state matrix:
𝑦 𝑘1 𝑘2 𝑘3
⎛ 0 ⎞
𝑘4 𝑦5 𝑛1 𝑛2
𝑆=⎜ ⎟.
⎜ 𝑏1 𝑏2 𝑦10 𝑘5 ⎟
⎝𝑘6 𝑘7 𝑘8 𝑦15 ⎠
The keystream generator computes the output state by ten double-round iterations
and a final addition mod 232 of the initial state matrix:
10
Salsa20𝑘 (𝑛, 𝑏) = 𝑆 + double-round (𝑆).
The block counter 𝑐 is incremented and the state is newly initialized for additional
64-byte output blocks. The Salsa20 keystream is the serialization of a sequence of 64-
byte output blocks:
Salsa20𝑘 (𝑛, 0), Salsa20𝑘 (𝑛, 1), Salsa20𝑘 (𝑛, 2), … .
Remark 6.27. Salsa20 treats strings as little-endian integers. For example, if the first
four key bytes are 01, 02, 03 and 04, then the corresponding integer is 𝑦1 = 04030201
in hexadecimal notation. Output words are serialized; the integer 04030201 yields the
output bytes 01, 02, 03 and 04 (in this order).
Example 6.28. Salsa20𝑘 (𝑛, 0) is a zero block if 𝑘 and 𝑛 are zero. This should not hap-
pen when Salsa20 is used as a stream cipher, since the nonce 𝑛 must only be used once.
Remark 6.29. Note that the state 𝑆 is re-initialized for each 64-byte output block and
there is no chaining from one block to another. Hence the Salsa20 keystream can be
accessed randomly and the computation of 64-byte blocks can be done in parallel. ♢
We turn to the ChaCha family of ciphers [Ber08a] and describe the ChaCha20 vari-
ant described in RFC 8439 [NL18]. ChaCha20 is a modification of Salsa20 and we
explain the differences to Salsa20.
Definition 6.30. Let 𝑦 = (𝑎, 𝑏, 𝑐, 𝑑) be a sequence of four 32-bit words. Then a ChaCha
quarter-round updates (𝑎, 𝑏, 𝑐, 𝑑) as follows:
1) 𝑎=𝑎+𝑏 ; 𝑑 =𝑑⊕𝑎 ; 𝑑 ⋘ 16;
2) 𝑐=𝑐+𝑑 ; 𝑏=𝑏⊕𝑐 ; 𝑏 ⋘ 12;
3) 𝑎=𝑎+𝑏 ; 𝑑 =𝑑⊕𝑎 ; 𝑑 ⋘ 8;
4) 𝑐=𝑐+𝑑 ; 𝑏=𝑏⊕𝑐 ; 𝑏 ⋘ 7.
♢
A ChaCha quarter-round updates each word twice and uses different rotation dis-
tances than Salsa20. ChaCha20 also runs ten double-rounds. However, a ChaCha
double-round consists of a column-round and a diagonal-round, which changes words
along the main and secondary diagonals.
Definition 6.31. A ChaCha double-round is defined by the eight ChaCha quarter-
rounds in Table 6.2.
Table 6.2. A column-round and diagonal-round form a ChaCha double-round.
column-round quarter-round(𝑦0 , 𝑦4 , 𝑦8 , 𝑦12 )

quarter-round(𝑦1 , 𝑦5 , 𝑦9 , 𝑦13 )
diagonal-round quarter-round(𝑦0 , 𝑦5 , 𝑦10 , 𝑦15 )
The RFC version of ChaCha20 described below uses a 12-byte nonce and a 4-byte
block counter. The original cipher takes an 8-byte nonce and an 8-byte counter.
Definition 6.32. The ChaCha20 stream cipher takes a 256-bit key 𝑘 = (𝑘1 , … , 𝑘8 )
and a unique 96-bit message number 𝑛 = (𝑛1 , 𝑛2 , 𝑛3 ) (nonce) as input. A 32-bit block
counter 𝑏 is initially set to zero. The initialization algorithm copies 𝑘, 𝑛, 𝑏 and the four
32-bit constants
𝑦0 = 61707865, 𝑦1 = 3320646E, 𝑦2 = 79622D32, 𝑦3 = 6B206574
into the sixteen 32-bit words of the ChaCha20 state matrix:
⎛ 0 ⎞
𝑘1 𝑘2 𝑘3 𝑘4
𝑆=⎜ ⎟.
⎜𝑘5 𝑘6 𝑘7 𝑘8 ⎟
⎝ 𝑏 𝑛1 𝑛2 𝑛3 ⎠
The ChaCha20 keystream generator works analogously to the Salsa20 generator, but
uses ChaCha double-rounds:
10
ChaCha𝑘 (𝑛, 𝑏) = 𝑆 + double-round (𝑆).
The block counter 𝑏 is incremented and the state is newly initialized for each 64-byte
output block. The ChaCha20 keystream is the serialization of a sequence of 64-byte
output blocks:
ChaCha𝑘 (𝑛, 0), ChaCha𝑘 (𝑛, 1), ChaCha𝑘 (𝑛, 2), … .
Remark 6.33. Salsa20/20 and ChaCha20 are very fast (also in comparison with AES)
and encryption requires less than 5 CPU cycles per byte on modern processors.
Exercises 135
6.5. Summary
• A stream cipher is a symmetric encryption scheme that uses an internal state

and recursively generates keystream. The plaintext is encrypted and the ci-
phertext is decrypted, respectively, by XORing input words and keystream.
• There are synchronous and self-synchronizing stream ciphers.
• Block ciphers can be turned into stream ciphers by applying the CTR, OFB or
CFB mode.
• Linear Feedback Shift Registers (LFSRs) generate output bits by a linear re-
cursion on the state bits. They make weak stream ciphers because of recon-
struction attacks, but one can improve the security by using nonlinear filter
functions and combinations of several shift registers.
• The stream cipher RC4 was in widespread use over many years, but is now
deprecated because of statistical weaknesses.
• Salsa20 and ChaCha20 are examples of promising new stream ciphers.
Exercises
1. Suppose the length of the IV of a synchronous stream cipher is 24 bits. Discuss the
security of the cipher.
2. Check whether 𝑝(𝑥) = 𝑥4 + 𝑥3 + 𝑥2 + 𝑥 + 1 ∈ 𝐺𝐹(2)[𝑥] is a primitive polyno-
mial. Suppose 𝑝(𝑥) is the characteristic polynomial of an LFSR. Find the periods
of output sequences generated by this LFSR.
3. Let 𝑐(𝑥) be the connection polynomial of a nonsingular LFSR and let 𝑝(𝑥) be the
corresponding characteristic polynomial. Show that 𝑝(𝑥) is irreducible if and only
if 𝑐(𝑥) is irreducible. Furthermore, show that 𝑝(𝑥) is primitive if and only if 𝑐(𝑥) is
primitive.
4. Suppose an LFSR of degree 5 is used as a stream cipher and the following plaintext
𝑚 and ciphertext 𝑐 is known:
𝑚 = 00100 11000, 𝑐 = 10110 01110.
Compute the feedback polynomial, the characteristic polynomial, the period and
the complete keystream.
Hint: The first five bits of 𝑚 ⊕ 𝑐 give a state (reverse the order). The next five bits
yield linear equations in the unknown feedback coefficients.
5. Verify that the majority function of three bits 𝑥1 , 𝑥2 , 𝑥3 is given by
𝑋 = 𝑚𝑎𝑗(𝑥1 , 𝑥2 , 𝑥3 ) = (𝑥1 ∧ 𝑥2 ) ⊕ (𝑥1 ∧ 𝑥3 ) ⊕ (𝑥2 ∧ 𝑥3 ).
3
Show that 𝑃𝑟[𝑋 = 𝑥𝑖 ] = for 𝑖 = 1, 2, 3 if 𝑥1 , 𝑥2 , 𝑥3 are independent and uniformly
4
distributed.
6. Use SageMath to verify that the feedback polynomials of the A5/1 LFSRs (see
Example 6.20) are primitive. Give an upper bound for the period of the A5/1
keystream generator.
7. Suppose an RC4 key satisfies 𝐾[0] + 𝐾[1] ≡ 0 mod 256. Show that with increased
probability the first output byte is 𝐾[2] + 3 mod 256.
8. Show that the quarter-round operation in Salsa20 is invertible. Give a description
of the inverse map.
9. Give an explicit description of the column-round operation in Salsa20 using the
quarter-round map.
10. Apply a Salsa20 quarter-round to (1, 0, 0, 0), (0, 1, 0, 0) and (0, 0, 0, 1), where 1 =
00 00 00 01.
11. Salsa20 can be seen as a map on the vector space 𝐺𝐹(2)512 . Which Salsa20 opera-
tions are not 𝐺𝐹(2)-linear? Explain your answer.
12. In Salsa20 and ChaCha20, the initial state matrix is added to the resulting state
matrix after performing ten double-rounds. Why is this final step important for
the security of the cipher?
13. Suppose the diagonal rounds in ChaCha20 are omitted. Discuss the consequences
of this modification on the security of the cipher.
Chapter 7
Hash Functions
Hash functions form an important cryptographic primitive. They take a message of

arbitrary length as input and output a short digest, which should uniquely identify the
input data. In Section 7.1, we discuss the main requirements and properties of hash
functions. The widely used Merkle-Damgård construction is explained in Section 7.3,
while Sections 7.4, 7.5 and 7.6 deal with the SHA-1 hash function, the SHA-2 family
and the new standard hash function SHA-3, respectively.
We refer to [MvOV97], [PP10], [KL15] and [Ber11] for additional reading on hash
functions.
7.1. Definitions and Security Requirements

In general, a cryptographic hash function consists of a polynomial-time key generator
and a hash algorithm. The key generation algorithm takes a security parameter 1𝑛 as
input and outputs a key 𝑘. A keyed hash function
𝐻𝑘 ∶ {0, 1}∗ → {0, 1}𝑙(𝑛)
takes a key and a binary string as input and outputs a hash value of length 𝑙(𝑛). If the
input length is restricted to 𝑙′ (𝑛) > 𝑙(𝑛), then 𝐻𝑘 is called a compression function.
Since hash values are used as message digests or unique identifiers, their main re-
quirement is collision resistance. A collision is given by two input values 𝑥 ≠ 𝑥′ such
that 𝐻(𝑥) = 𝐻(𝑥′ ).
Definition 7.1. A function 𝐻 = 𝐻𝑘 ∶ 𝐷 → 𝑅, where 𝐻, 𝑘, 𝐷 and 𝑅 depend on a
security parameter 𝑛, is called collision resistant if the probability that a probabilistic
polynomial-time adversary finds a collision 𝐻(𝑥) = 𝐻(𝑥′ ), where 𝑥, 𝑥′ ∈ 𝐷 and 𝑥 ≠ 𝑥′ ,
is negligible in 𝑛. ♢
137
138 7. Hash Functions
If the domain 𝐷 is larger than the range 𝑅, then 𝐻 cannot be injective and colli-
sions must therefore exist. However, the probability of finding collisions with limited
computing resources may be very small.
There are two related requirements which are weaker than collision resistance.
We only give informal definitions:
• Second-preimage resistance or weak collision resistance means that an adversary,
who is given a uniform 𝑥 ∈ 𝐷, is not able to find a second preimage 𝑥′ ∈ 𝐷 with
𝑥 ≠ 𝑥′ such that 𝐻(𝑥) = 𝐻(𝑥′ ).
• Preimage resistance or one-wayness means that an adversary, who is given a uni-
form 𝑦 ∈ 𝑅, is not able to find a preimage 𝑥 ∈ 𝐷 such that 𝐻(𝑥) = 𝑦.
One can show that collision resistance implies second-preimage resistance and
preimage resistance (see Exercise 1).
In practice, hash functions are usually unkeyed or the key is fixed. Unkeyed hash
functions 𝐻 ∶ {0, 1}∗ → {0, 1}𝑙 have a theoretical disadvantage: they are fixed functions
and a collision can be found in constant time. However, this can be inaccessible if 𝑙 is
large. Therefore, one requires that it is computationally infeasible to produce a collision.
In particular, not even a single collision should be known.
Remark 7.2. An ideal unkeyed hash function is called a random oracle. The output of
a random oracle is uniformly random, unless the same input is queried twice, in which
case the oracle returns the same output. One can construct a pseudorandom generator
(see Definition 2.32) and a pseudorandom function (Definition 2.39) from a random
oracle (see [KL15]). However, implementations of a random oracle are impossible: it
must have some compact description, and hence the output of any real-world instance
is deterministic and not random.
The random oracle model is used in some security proofs, and one hopes that con-
crete instantiations of hash functions are sufficiently close to that assumption. A se-
curity guarantee in the random oracle model can only be relative: a scheme is secure
assuming that the hash function has no weaknesses and produces uniform output. ♢
The output length of a hash function should not be too short. In fact, the Birthday
Paradox shows that collisions occur surprisingly often (see Proposition 1.61):
Proposition 7.3. Let 𝑘 be the number of independent samples drawn from a uniform
distribution on a set of size 𝑁. If 𝑘 ≈ 1.2√𝑁, then the probability of a collision is around
50%. ♢
If we consider hash values of length 𝑙 and assume a uniform distribution, then

collisions occur after hashing around √2𝑙 messages. For example, if 𝑙 = 80 then around
240 hash values are likely to have a collision, and this can be done on a standard PC. In
order to minimize the risk of random collisions, hash values should be at least 200 bits
long.
7.2. Applications of Hash Functions 139
The construction of a secure hash function is not easy, and many obvious defini-
tions do not give collision-resistant functions (see Exercises 2 and 3).
7.2. Applications of Hash Functions

Hash functions have many applications. Firstly, the hash value can be used as a short
identifier of data. The identifier is unique as long as the hash function is collision resis-
tant. Hashes can be used to verify the integrity of messages and files if the verifier has
access to the authentic message digest. Hash tables can speed up searching. Hashes
can also protect passwords: many operating systems and applications store password
hashes instead of plaintext passwords. When a user logs in with their username and
password, the password is hashed and compared to the stored password hash for that
user. Often, the password and a salt value is hashed in order to impede the use of
lookup tables.
Hashes play an important role in signature schemes, which are explored in Chap-
ter 11. Furthermore, message authentication codes, pseudorandom functions and key
derivation functions are often based on hash functions (see Chapter 8).
Hashes can also serve as an identifier of a sequence 𝑥1 , 𝑥2 , … , 𝑥𝑡 of messages. How-
ever, the obvious approach to compute
𝐻(𝑥1 ‖𝑥2 ‖ … ‖𝑥𝑡 )
is not very efficient, since the verification requires all data blocks. In the example
below, we will see that Merkle trees are more efficient at handling a large number of
blocks.
Example 7.4. A blockchain is a sequence of linked blocks. Each block contains the
hash value of the previous block (see Figure 7.1). A blockchain can be used as a dis-
tributed ledger, which records transactions in an efficient and verifiable way. Transac-
tions in a block cannot be modified without changing all subsequent hash values.
Figure 7.1. Three linked blocks of a blockchain.

The hashes of transactions 𝑇1 , 𝑇2 , 𝑇3 , … form the leaves of a binary tree called the
Merkle tree. The nodes further up are hashes of two children nodes and the root of the
Merkle is the top hash value (see Figure 7.2).
𝐻𝑟𝑜𝑜𝑡 = 𝐻(𝐻12 ‖𝐻34 )
𝐻12 = 𝐻(𝐻(𝑇1 )‖𝐻(𝑇2 )) 𝐻34 = 𝐻(𝐻(𝑇3 )‖𝐻(𝑇4 ))
𝐻(𝑇1 ) 𝐻(𝑇2 ) 𝐻(𝑇3 ) 𝐻(𝑇4 )
Figure 7.2. Merkle tree.
This works with any even number of transactions. The root hash forms an identi-
fier for all transactions in a block, and changing a single transaction would completely
change the root hash. Individual transactions can be verified by their hash path from
the leaf to the root.
Suppose we want to prove that a transaction 𝑇3′ is included in the blockchain; then
we only need to provide the hashes 𝐻4 = 𝐻(𝑇4 ) and 𝐻12 along with 𝑇3′ . The ver-
ifier checks the hash path by computing 𝐻(𝑇3′ ), 𝐻34 ′
= 𝐻(𝐻(𝑇3′ )‖𝐻4 ) and 𝐻𝑟𝑜𝑜𝑡
′
=
′ ′
𝐻(𝐻12 ‖𝐻34 ). Finally, they verify if 𝐻𝑟𝑜𝑜𝑡 coincides with the root hash 𝐻𝑟𝑜𝑜𝑡 which is
stored in the blockchain. This is very efficient, even for larger trees with thousands of
leaves, and Merkle trees have many applications beyond blockchains.
Blockchains are used by many cryptocurrencies. The blockchain records the trans-
actions of previously unspent cybercoins from one or more input addresses to one or
more output addresses. Each new block contains a proof-of-work; by adapting the
nonce value, a miner has to find a hash value of the new block that is smaller than
the network’s difficulty target. This may require a huge number of hashing operations
and consume significant computing resources as well as a lot of energy. The miner
is rewarded with new cybercoins. The proof of work protects the blockchain against
manipulations and complicates forks.
7.3. Merkle-Damgård Construction

A common approach to the construction of hash functions is the Merkle-Damgård
transform. This construction has found widespread use, including the MD-SHA fam-
ily. The Merke-Damgård transform is based on a compression function, which maps
𝑛 + 𝑙 input bits to 𝑛 output bits. The compression function is applied recursively. First,
the message is padded and segmented into blocks of length 𝑙. Then the first block is
compressed. The output and the next input block are compressed again and this is re-
peated until the last block is processed. The hash value is defined by the output of the
last compression operation (see Figure 7.3).
7.3. Merkle-Damgård Construction 141
Definition 7.5. Let 𝑛, 𝑙 ∈ ℕ and let 𝑓 ∶ {0, 1}𝑛+𝑙 → {0, 1}𝑛 be a compression function.
Let 𝐼𝑉 ∈ {0, 1}𝑛 be an initialization vector. An input message 𝑚 of arbitrary length
is padded by a 1, a sequence of zero bits and the length 𝐿 = |𝑚|, encoded as a 64-bit
binary string. The padded message is
𝑚′ = 𝑚‖1‖0 … 0‖𝐿.
The number of zeros is chosen such that the length of 𝑚′ is a multiple of 𝑙. We split 𝑚′
into blocks of length 𝑙:
𝑚′ = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
The Merkle-Damgård hash function 𝐻 = 𝐻𝐼𝑉 ∶ {0, 1}∗ → {0, 1}𝑛 is defined by recur-
sive application of the compression function 𝑓 (see Figure 7.3). The last output value
defines the hash:
ℎ0 = 𝐼𝑉,
ℎ𝑖 = 𝑓(ℎ𝑖−1 , 𝑚𝑖 ) for 𝑖 = 1, … , 𝑁,
𝐻(𝑚) = 𝐻𝐼𝑉 (𝑚) = ℎ𝑁 .
The initialization vector IV can be regarded as a key, but in practice, IV is a pre-defined
constant. ♢
𝑚′ = 𝑚1 𝑚2 𝑚3 𝑚4 𝑚𝑁
ℎ0 = 𝐼𝑉
𝑓 𝑓 𝑓 𝑓 ⋯ 𝑓 𝐻(𝑚)
ℎ1 ℎ2 ℎ3
Figure 7.3. Merkle-Damgård construction
The security of a Merkle-Damgård hash function can be reduced to the collision-

resistance of the underlying compression function. A collision in a Merkle-Damgård
hash function yields a collision in the compression function (see [KL15], [BR05]).
Theorem 7.6. If the compression function 𝑓 is collision-resistant, then so is the associated
Merkle-Damgård hash function 𝐻. ♢
The compression function can be based on a block cipher, although this construc-
tion is rarely used in practice.
Definition 7.7. (Davies-Meyer) Let 𝐸 be the encryption function of a block cipher
with key length 𝑛 and block length 𝑙. Then a compression function
𝑓 ∶ {0, 1}𝑛+𝑙 → {0, 1}𝑙
can be defined as follows:
𝑓(𝑘, 𝑚) = 𝐸𝑘 (𝑚) ⊕ 𝑚. ♢
One can show that this construction defines a collision-resistant compression func-
tion in the ideal cipher model. A block cipher that is chosen uniformly at random from
all block ciphers with 𝑛-bit keys and 𝑙-bit input/output strings is called an ideal ci-
pher. An ideal cipher is a family of independent permutations. This is stronger than
the standard notion of pseudorandomness and includes protection against related-key
attacks (see Remark 2.44). Although ideal ciphers cannot be implemented and it is un-
clear whether real-word block ciphers (for example AES) behave like an ideal cipher,
security proofs in the ideal cipher model can still be useful: a scheme can be proven
to be secure (for example, the above Davies-Meyer construction), unless an adversary
exploits weaknesses of the underlying block cipher.
7.4. SHA-1
Until recently, SHA-1 was a widely used Merkle-Damgård hash function, and in the
following we describe its compression function 𝑓. As input, the function takes a 160-
bit status vector and a 512-bit message block and outputs an updated 160-bit status:
𝑓 ∶ {0, 1}160+512 → {0, 1}160 .
As described in the Merkle-Damgård construction, the message is split into 512-bit

blocks. The last block is padded by a binary 1, a sequence of zeros and the binary 64-bit
representation of the length of the message.
The initial 160-bit status vector is ℎ0 = 𝐼𝑉 = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 , where:
𝐻1 = 67452301,
𝐻2 = EFCDAB89,
𝐻3 = 98BADCFE,
𝐻4 = 10325476,
𝐻5 = C3D2E1F0.
The compression function 𝑓(ℎ, 𝑚) is defined by a combination of very efficient lin-

ear and nonlinear operations (XOR, bit-rotations, NOT, OR, AND, modular additions),
which are applied in 80 consecutive rounds.
A 512-bit message block 𝑚 = 𝑊0 ‖𝑊1 ‖ … ‖𝑊15 is subdivided into 16 words of length

32 bits. By XOR operations and a circular left shift by one position, 64 additional words
𝑊16 , … , 𝑊79 are generated:
𝑊𝑗 = (𝑊𝑗−16 ⊕ 𝑊𝑗−14 ⊕ 𝑊𝑗−8 ⊕ 𝑊𝑗−3 ) ⋘ 1 for 16 ≤ 𝑗 ≤ 79.
The 160-bit input vector ℎ = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 is subdivided into five 32-bit words
and copied to the initial status vector:
𝐴‖𝐵‖𝐶‖𝐷‖𝐸 ← 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 .
7.4. SHA-1 143
Figure 7.4. One round of the SHA-1 compression function 𝑓. The 32-bit status words
𝐴, 𝐵, 𝐶, 𝐷, 𝐸 are updated. In each round, a 32-bit chunk 𝑊 of the input message is
processed. 𝐹 is a nonlinear bit-function, 𝐾 a 32-bit constant (both depending on the
round number) and + denotes addition modulo 232 .
Then, 80 rounds of the SHA-1 compression function are performed (see Figure 7.4),
which update the status words 𝐴‖𝐵‖𝐶‖𝐷‖𝐸. In round 𝑗, the 32-bit message word 𝑊𝑗
is processed. A bit-function 𝐹 (defined by AND, OR, NOT and XOR operations) and a
constant 𝐾 are used. The function 𝐹 and the constant 𝐾 change every 20 rounds (see
Table 7.1).
Table 7.1. Keys and bit functions in SHA-1.
𝑗 𝐾 𝐹
0 ≤ 𝑗 ≤ 19 5A827999 𝐶ℎ(𝐵, 𝐶, 𝐷) = (𝐵 ∧ 𝐶) ⊕ (¬𝐵 ∧ 𝐷)
20 ≤ 𝑗 ≤ 39 6ED9EBA1 𝑃𝑎𝑟𝑖𝑡𝑦(𝐵, 𝐶, 𝐷) = 𝐵 ⊕ 𝐶 ⊕ 𝐷
40 ≤ 𝑗 ≤ 59 8F1BBCDC 𝑀𝑎𝑗(𝐵, 𝐶, 𝐷) = (𝐵 ∧ 𝐶) ⊕ (𝐵 ∧ 𝐷) ⊕ (𝐶 ∧ 𝐷)
60 ≤ 𝑗 ≤ 79 CA62C1D6 𝑃𝑎𝑟𝑖𝑡𝑦(𝐵, 𝐶, 𝐷) = 𝐵 ⊕ 𝐶 ⊕ 𝐷
After completing 80 rounds, the compression function outputs

𝑓(ℎ, 𝑚) = (𝐴 + 𝐻1 ‖ 𝐵 + 𝐻2 ‖ 𝐶 + 𝐻3 ‖ 𝐷 + 𝐻4 ‖ 𝐸 + 𝐻5 ),
where + denotes addition modulo 232 .
The security of SHA-1 has been intensively studied and it has been anticipated
that collisions can be found with available computing resources. In February 2017, a
first collision was announced [SBK+ 17]. The attack required 263 SHA-1 calls and took
approximately 6500 CPU years and 100 GPU years. They chose a prefix 𝑃 and found
two different 1024-bit messages 𝑀 (1) and 𝑀 (2) such that
𝐻(𝑃‖𝑀 (1) ) = 𝐻(𝑃‖𝑀 (2) ).
Since 𝑃 was chosen to be a preamble for PDF documents (see Example 7.8), the colli-
sion makes it possible to fabricate two different PDF files with the same SHA-1 hash
value, and impressive examples have been published. 𝑀 (1) and 𝑀 (2) each consist of
two 512-bit blocks. The Merkle-Damgård iteration that takes the first block of 𝑀 (𝑖) as
input produces a near collision, and the second block then gives a full collision. Since
both messages have the same length, appending the padding data including the length
preserves the collision.
Example 7.8. We check the collision found by [SBK+ 17]. First, define the prefix and
the messages.
sage: prefix='255044462 d312e330a25e2e3cfd30a0a0a312030206f626a0a3c3c2f57696474
682032203020522 f4865696768742033203020522f547970652034203020522f53756274
7970652035203020522 f46696c7465722036203020522f436f6c6f725370616365203720
3020522 f4c656e6774682038203020522f42697473506572436f6d706f6e656e7420383e
3 e0a73747265616d0affd8fffe00245348412d3120697320646561642121212121852fec
092339759 c39b1a1c63c4c97e1fffe01 '
Here is the prefix 𝑃 in ASCII characters:

%PDF-1.3.%.......1 0 obj.<</Width 2 0 R/Height 3 0 R/Type 4 0 R/
Subtype 5 0 R/Filter 6 0 R/ColorSpace 7 0 R/Length 8 0 R/BitsPer
Component 8>>.stream......$SHA-1 is dead!!!!!./..#9u.9...<L.....
The messages 𝑀 (1) and 𝑀 (2) are as follows:
sage: m1= '7 f46dc93a6b67e013b029aaa1db2560b45ca67d688c7f84b8c4c791fe02b3df614f
86 db1690901c56b45c1530afedfb76038e972722fe7ad728f0e4904e046c230570fe9d41
398 abe12ef5bc942be33542a4802d98b5d70f2a332ec37fac3514e74ddc0f2cc1a874cd0
c78305a21566461309789606bd0bf3f98cda8044629a1 '
sage: m2= '7346 dc9166b67e118f029ab621b2560ff9ca67cca8c7f85ba84c79030c2b3de218f
86 db3a90901d5df45c14f26fedfb3dc38e96ac22fe7bd728f0e45bce046d23c570feb141
398 bb552ef5a0a82be331fea48037b8b5d71f0e332edf93ac3500eb4ddc0decc1a864790
c782c76215660dd309791d06bd0af3f98cda4bc4629b1 '
We verify the collision:

sage: import hashlib
sage: h1 = hashlib .sha1 (); h1. update (( prefix +m1). decode ("hex"))
sage: h1. hexdigest ()
'f92d74e3874587aaf443d1db961d4e26dde13e9c '
sage: h2 = hashlib .sha1 (); h2. update (( prefix +m2). decode ("hex"))
'f92d74e3874587aaf443d1db961d4e26dde13e9c '
Appending any string 𝑚 yields another collision:

sage: m = 'deadbeef '
sage: h1 = hashlib .sha1 (); h1. update (( prefix +m1+m). decode ("hex"))
'811 b0fd07f6109f08da740a72b45d5818455b35e '
sage: h2 = hashlib .sha1 (); h2. update (( prefix +m2+m). decode ("hex"))
'811 b0fd07f6109f08da740a72b45d5818455b35e '
7.5. SHA-2 145
SHA-1 is now deprecated, and it is recommended to use SHA-2 or the new standard
hash function SHA-3 (see below).
7.5. SHA-2
The SHA-2 hash functions SHA-224, SHA-256, SHA-384 and SHA-512 are constructed
in a similar way to SHA-1, but use an extended internal state of 256 bits (eight 32-
bit words) and larger digests. It is assumed that SHA-2 offers better protection against
collision-finding attacks, and at the time of this writing SHA-2 is widely used in security
protocols and applications. The SHA-2 family is specified in the standard [FIP15a].
Since SHA-2 is a Merkle-Damgård hash function, we only need to define the com-
pression function 𝑓 and the initial status. In the following, we describe the SHA-256
variant.
The compression function 𝑓 takes as input a 256-bit status vector and a 512-bit
message block and outputs an updated 256-bit status:
𝑓 ∶ {0, 1}256+512 → {0, 1}256 .
The initial 256-bit status vector ℎ0 = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 ‖𝐻6 ‖𝐻7 ‖𝐻8 is defined by:
𝐻1 = 6A09E667,
𝐻2 = BB67AE85,
𝐻3 = 3C6EF372,
𝐻4 = A54FF53A,
𝐻5 = 510E527F,
𝐻6 = 9B05688C,
𝐻7 = 1F83D9AB,
𝐻8 = 5BE0CD19.
A 512-bit message block 𝑚 = 𝑊0 ‖𝑊1 ‖ … ‖𝑊15 is split into 16 words of length 32 bits.
The functions 𝜎0 and 𝜎1 transform 32-bit words by a combination of XOR, right-rotate
(⋙) and right-shift (≫) operations:
𝜎0 (𝑤) = (𝑤 ⋙ 7) ⊕ (𝑤 ⋙ 18) ⊕ (𝑤 ≫ 3),
𝜎1 (𝑤) = (𝑤 ⋙ 17) ⊕ (𝑤 ⋙ 19) ⊕ (𝑤 ≫ 10).
Now 48 additional words 𝑊16 , … , 𝑊63 are generated:
𝑊𝑗 = 𝜎1 (𝑊𝑗−2 ) + 𝑊𝑗−7 + 𝜎0 (𝑊𝑗−15 ) + 𝑊𝑗−16 for 16 ≤ 𝑗 ≤ 63.
The 256-bit input vector ℎ = 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 ‖𝐻6 ‖𝐻7 ‖𝐻8 is subdivided into
eight 32-bit words and copied to the initial status vector:
𝐴‖𝐵‖𝐶‖𝐷‖𝐸‖𝐹‖𝐺‖𝐻 ← 𝐻1 ‖𝐻2 ‖𝐻3 ‖𝐻4 ‖𝐻5 ‖𝐻6 ‖𝐻7 ‖𝐻8 .
Figure 7.5. One round of the SHA-2 compression function 𝑓. The 32-bit status words
𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹, 𝐺, 𝐻 are updated. In each round, a 32-bit chunk 𝑊 of the input
message is processed. 𝐾 is a 32-bit constant, which depends on the round number.
Then, 64 rounds of the SHA-2 compression function 𝑓 are performed (see Figure 7.5),
which update the status words 𝐴‖𝐵‖𝐶‖𝐷‖𝐸‖𝐹‖𝐺‖𝐻. In round 𝑗, the 32-bit word 𝑊𝑗
and the 32-bit constant 𝐾𝑗 is processed. The numbers 𝐾𝑗 represent the first 32 bits of
the fractional parts of the cube roots of the first 64 prime numbers.
In each round, four functions 𝑀𝑎𝑗, 𝐶ℎ, Σ0 and Σ1 are used which operate on 32-bit
words:
𝑀𝑎𝑗(𝑥, 𝑦, 𝑧) = (𝑥 ∧ 𝑦) ⊕ (𝑥 ∧ 𝑧) ⊕ (𝑦 ∧ 𝑧),
𝐶ℎ(𝑥, 𝑦, 𝑧) = (𝑥 ∧ 𝑦) ⊕ (¬𝑥 ∧ 𝑧),
Σ0 (𝑤) = (𝑤 ⋙ 2) ⊕ (𝑤 ⋙ 13) ⊕ (𝑤 ⋙ 22),
Σ1 (𝑤) = (𝑤 ⋙ 6) ⊕ (𝑤 ⋙ 11) ⊕ (𝑤 ⋙ 25).
After completing 64 rounds, the compression function outputs
𝑓(ℎ, 𝑚) = (𝐴 + 𝐻1 ‖𝐵 + 𝐻2 ‖𝐶 + 𝐻3 ‖𝐷 + 𝐻4 ‖𝐸 + 𝐻5 ‖𝐹 + 𝐻6 ‖𝐺 + 𝐻7 ‖𝐻 + 𝐻8 ),
where + denotes addition modulo 232 .
7.6. SHA-3
Since collisions of MD5 have been found and weaknesses of SHA-1 were already
known, in 2007 the American NIST announced a competition to design a new hash
function called SHA-3. After narrowing down the candidates in three public rounds,
Keccak was selected as the winner of the competition in 2012. The main evaluation cri-
teria were security, performance, flexibility and simplicity of the design. Keccak is not of
Merkle-Damgård type, but rather based on a sponge construction (see Figure 7.6). The
design and the security claim is explained in [Ber11]. The construction is modeled to
behave like a random oracle.
7.6. SHA-3 147
In 2015, the Keccak variants SHA3-224, SHA3-256, SHA3-384, SHA3-512 with out-
put lengths between 224 and 512 bits were standardized [FIP15b]. The SHA-3 instance
of Keccak uses a three-dimensional state array of 5 × 5 × 64 = 1600 bits. The unkeyed
Keccak-𝑓[1600] permutation operates on the 1600-bit state array and it is assumed that
𝑓 behaves like a random permutation. In each step, 𝑟 < 1600 message bits are pro-
cessed. 𝑟 is called the rate, and the remaining number of 𝑐 = 1600 − 𝑟 bits is called the
capacity. The Keccak-𝑓[1600] permutation
𝑓 ∶ {0, 1}1600 → {0, 1}1600
is parametrized by 𝑟 and 𝑐. SHA-3 specifies the combinations which are shown in Ta-
ble 7.2.
Table 7.2. Combinations of output length, rate and capacity.
Output length 𝑙 Rate 𝑟 Capacity 𝑐 State length 𝑟 + 𝑐

224 1152 448 1600
256 1088 512 1600
384 832 768 1600
512 576 1024 1600
Definition 7.9. The SHA-3 family of hash functions
𝐻 ∶ {0, 1}∗ → {0, 1}𝑙
supports output lengths 𝑙 ∈ {224, 256, 384, 512}. Depending on 𝑙, the rate 𝑟 and the
capacity 𝑐 are fixed (see Table 7.2). First, the input message 𝑚 is padded such that the
length of 𝑚′ is a multiple of 𝑟. The padding rule of the SHA-3 family is to append the
pattern 0110 … 01. The padded message 𝑚′ is split into blocks 𝑚1 , 𝑚2 , … , 𝑚𝑁 of length
𝑟:
𝑚′ = 𝑚‖0110 … 01 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 .
The state 𝑠 = 𝑠1 ‖𝑠2 is initialized by the zero vector 0𝑟 ‖0𝑐 . During the absorbing phase,
the message blocks are XORed into the leftmost 𝑟 bits of the state and the permutation
Keccak-𝑓[1600] is applied to the full state of 𝑟 + 𝑐 = 1600 bits. The state is updated for
each message block:
𝑠1 ‖𝑠2 ← 𝑓(𝑠1 ⊕ 𝑚𝑖 ‖ 𝑠2 ) for 1 ≤ 𝑖 ≤ 𝑁.
Finally, the SHA-3 hash value is computed using a single squeezing operation; 𝐻(𝑚) is
defined by the leftmost 𝑙 bits of the resulting state vector 𝑠1 (see Figure 7.6). ♢
Figure 7.6. Absorbing message blocks of length 𝑟 into the state and finally squeezing
out the SHA-3 hash of length 𝑙 < 𝑟.
An advantage of the sponge construction – in comparison to the Merkle-Damgård

transform – is that the hash value does not reveal the full state, which prevents length
extension attacks.
The SHA-3 standard [FIP15b] also defines two extendable-output functions (XOF)
called SHAKE128 and SHAKE256, with which the output can be extended to any de-
sired length. In this case, the Keccak-𝑓 function is applied multiple times during the
squeezing phase to obtain the required number of output bits.
Exercises 149
7.7. Summary
• Hash functions take messages of arbitrary length as input and output a short
message digest.
• Hash functions should be collision-resistant; although collisions must exist, it
should be very hard to find one.
• Many hash functions (in particular SHA-1 and SHA-2) are based on the Merkle-
Damgård construction. A compression function is recursively applied to the
input blocks.
• Collisions of SHA-1 have been found using significant computing resources.
The SHA-2 variants SHA-224, SHA-256, SHA-384 and SHA-512 are constructed
in a similar way to SHA-1, but they generate longer digests and are assumed to
be more secure.
• SHA-3 is the new standard hash function. SHA-3 is not a Merkle-Damgård
hash function, but it is based on a sponge construction. The internal state array
has 1600 bits and the Keccak-𝑓[1600] permutation operates on the state.
Exercises
1. Consider a hash function 𝐻. Explain why collision resistance implies second-

preimage resistance and the second-preimage resistance implies preimage resis-
tance.
2. Why are linear or affine hash functions not collision-resistant?
3. Why must the output of a collision-resistant hash function depend on every input
bit?
4. Assume that a collision-resistant hash function is modified as described below. Is
it still a collision-resistant hash function?
(a) The low bit of the message is set to 1. The resulting message is hashed.
(b) The low bit of the message is flipped and then the message is hashed.
(c) All bits are flipped and then the message is hashed.
(d) The message is split into blocks of a fixed length, the blocks are XORed and
the result is hashed.
5. Consider a Merkle-Damgård hash function 𝐻. Show that an adversary, who knows
a hash value 𝐻(𝑚) but not the input 𝑚, can generate messages 𝑚′ and compute the
hash 𝐻(𝑚‖𝑚′ ). This is called a length extension attack.
6. Find a reason why in the Merkle-Damgård construction the padded message in-
cludes the encoded length.
7. Let 𝐸 be a block cipher satisfying all necessary security assumptions (pseudoran-
domness and even an ideal cipher) and consider the compression function
𝑓(𝑘, 𝑚) = 𝐸𝑘 (𝑚).
Show that 𝑓 – in contrast to the Davies-Miller construction – is not collision-resis-
tant.
8. Give a table of values of the Boolean functions 𝐶ℎ and 𝑀𝑎𝑗 used by SHA-1 and
SHA-2. Show that the XOR (⊕) operations in these functions can be replaced by
OR (∨).
9. Suppose 𝑚 is a message of length 109 bits. How many calls to the SHA-1 com-
pression function, the SHA-2 compression function and the SHA-3 permutation
Keccak-𝑓[1600], respectively, are required to compute the hash value 𝐻(𝑚)?
10. Explain how the rate 𝑟 and the capacity 𝑐 are related to the performance and the
security level of SHA-3.
11. Suppose Keccak-𝑓[1600] was a linear map on 𝐺𝐹(2)1600 . Fix a SHA-3 output
length. Show that a collision in the associated SHA-3 hash function can be con-
structed using an efficient algorithm.
12. Suppose Keccak-𝑓[1600] was not a permutation and a collision had been found.
Can this be used for a collision in SHA-3 ?
Chapter 8
Message Authentication
Codes
A message authentication code (MAC) is a cryptographic tag which protects the in-
tegrity and the origin of a message. A correct tag shows that the data has not been
tampered with by an adversary, and it also protects against accidental errors. MACs
are widely used (for example in network security protocols), since encryption alone is
not sufficient to protect the data. In fact, most encryption schemes cannot prevent the
manipulation of messages. Streams ciphers (or blocks ciphers in CTR, OFB and CFB
mode) are particularly vulnerable, since an adversary can change selected bits.
Message authentication codes use a symmetric secret key for tag generation and
verification. This constitutes a major difference to signatures (see Chapter 11), where
messages are signed with a private key and verification is performed with a public key.
The computation of MACs is usually very fast and they can efficiently protect the in-
tegrity of mass data.
We outline the definition of message authentication codes and their security re-
quirements in Section 8.1. Practical constructions of MACs, based on block ciphers
in CBC mode and on hash functions, are covered in Sections 8.2 and 8.3, respectively.
The combination of encryption and message authentication as well as authenticated
encryption schemes are discussed in Section 8.4.
For additional reading, we refer to [KL15] and [GB08].

Message authentication codes consist of key generation, tag generation and a verifica-
tion algorithm.
151
152 8. Message Authentication Codes
Definition 8.1. A message authentication code is given by the following spaces and
polynomial-time algorithms:
• A message space ℳ.
• A key space 𝒦.
• A key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes a security parameter 1𝑛 as input
and outputs a key 𝑘.
• A tag generation algorithm, which may be randomized. It takes a message 𝑚 and
a key 𝑘 as input and outputs a tag MAC𝑘 (𝑚).
• A deterministic verification algorithm that takes a key 𝑘, a message 𝑚 and a tag 𝑡
and outputs 1 if the tag is valid, or otherwise 0.
Canonical verification means to re-compute MAC𝑘 (𝑚) and to output 1 if
𝑡 = MAC𝑘 (𝑚), and 0 otherwise. Canonical verification is only possible if the tag gen-
eration is deterministic. ♢
A message authentication tag is usually short and does not include the message.
Verification therefore requires the message, the tag and the key. Note that hash values
can also be leveraged to verify the integrity of data. However, the verifier needs to
access the authentic hash value, which is impossible in many applications.
The security of message authentication codes is determined by the difficulty to
forge a valid tag without knowing the key. We assume that an adversary can choose
messages and obtain a valid MAC. This is called a chosen message attack and corre-
sponds to a situation in practice, where many messages and their MACs are known.
The scheme is considered to be insecure if an adversary can generate a new message
and an associated valid MAC in polynomial time.
Definition 8.2. Suppose a message authentication code is given. Consider the fol-
lowing experiment (see Figure 8.1): a challenger takes 1𝑛 as input and generates a key
$
𝑘 ← 𝐺𝑒𝑛(1𝑛 ). An adversary 𝐴 is given 1𝑛 . They can choose multiple messages 𝑚 and
obtain the tags MAC𝑘 (𝑚) from an oracle. The adversary succeeds if they can produce
a message 𝑚, which they did not query previously, and a valid tag 𝑡 of that message. In
this case, the challenger outputs 1, and otherwise 0.
The scheme is called existentially unforgeable under an adaptive chosen-message
attack (EUF-CMA secure), or just secure, if for all probabilistic polynomial-time adver-
saries, the probability of success is negligible in 𝑛.
Remark 8.3. The above experiment can be slightly modified by accepting a valid mes-
sage/tag pair, where only the tag is new and the message might have been queried
before. Unforgeable MACs in this experiment are called strongly secure. If canonical
verification is used, then secure MACs are automatically strongly secure, since in this
case a message uniquely determines the tag.
1𝑛 $
𝑘 ← 𝐺𝑒𝑛(1𝑛 )
𝑚
Choose 𝑚
𝑡
𝑡 = MAC𝑘 (𝑚)
⋮
Choose 𝑚′ ,
(𝑚′ , 𝑡′ )
forge a tag 𝑡′ Verify (𝑚′ , 𝑡′ ),
output 1 or 0
Figure 8.1. MAC forgery experiment.
In the above game, the adversary cannot ask the oracle to verify a tag. Now one can
change the experiment by allowing verification queries. Since this makes the adversary
more powerful, the associated definition of security could be stronger. However, one
can show that a strongly secure MAC (for example a MAC with canonical verification)
is also secure in this experiment. An adversary can verify a message and a tag by run-
ning the original experiment. Since the number of verification queries is polynomial,
the security definitions are equivalent.
Remark 8.4. MACs do not protect against the replay of messages and tags. If replay
protection is required, for example in network protocols, then an additional counter (or
a timestamp) should be used. The counter is added to the message and integrated into
the tag computation so that the counter cannot be forged. The sender increments the
counter for every new message. The receiver keeps track of the counter and discards a
message if a counter is re-used or if the tag is invalid. ♢
How can we construct a secure MAC? To begin with, a pseudorandom function

(prf) family
𝐹 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙
(see Section 2.9) yields a secure MAC.
Theorem 8.5. Let 𝐹 be a pseudorandom function family with key length 𝑛 and block
length 𝑙. We define an associated MAC: The key generation algorithm 𝐺𝑒𝑛(1𝑛 ) takes a
$
security parameter 1𝑛 as input and outputs a uniform key 𝑘 ← {0, 1}𝑛 . The MAC takes a
a message 𝑚 of length 𝑙 and a key of length 𝑛 as input and outputs
MAC𝑘 (𝑚) = 𝐹𝑘 (𝑚).
The verification is canonical. This construction defines a secure MAC for messages of
length 𝑙. ♢
The above Theorem has a proof by reduction (see [KL15]). An adversary, who can
forge valid MACs, is also able to distinguish 𝐹 from a random function.
Since the prf-construction only takes messages of fixed length as input, it is rarely
used in practice. The construction can be extended to messages of arbitrary length,
for example by a sequence of tags (see Exercise 5), but this is not very efficient. In the
following two sections, we describe two widely used MAC constructions, CBC MAC
and HMAC. They are based on block ciphers and hash functions, respectively.
8.2. CBC MAC

We want to construct a MAC for messages of arbitrary length based on a block cipher. A
popular construction is the CBC MAC (Cipher Block Chaining Message Authentication
Code). The message is encrypted in CBC mode (see Section 2.10) and the last ciphertext
block is used as an authentication tag.
Definition 8.6. Let 𝐸 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑙 be a block cipher and
$
𝑘 ← {0, 1}𝑛 a uniform key. Fix a number 𝑁 of input blocks and set ℳ = {0, 1}𝑁𝑙 .
The basic CBC MAC of a message 𝑚 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 of fixed length 𝑁𝑙 is defined by
encrypting 𝑚 in CBC mode and outputting the last ciphertext block:
𝑐0 = 0𝑙 ,
𝑐𝑖 = 𝐸𝑘 (𝑚𝑖 ⊕ 𝑐𝑖−1 ) for 𝑖 = 1, 2, … , 𝑁,
MAC𝑘 (𝑚) = 𝑐𝑁 .
The CBC MAC is deterministic, and verification is canonical. ♢
The CBC MAC computation is similar to encryption in CBC mode. However, the
initialization vector is a zero string and only the last ciphertext block is output. One
can show that the basic CBC MAC is secure for fixed-length messages (see [KL15]).
Theorem 8.7. If 𝐸 is a pseudorandom permutation, then the basic CBC MAC is EUF-
CMA secure for messages of fixed length 𝑁𝑙.
Remark 8.8. The bijectivity of 𝐸𝑘 is not required in Definition 8.6, and Theorem 8.7
remains true for a pseudorandom function family. ♢
The above basic CBC MAC is not secure for messages of arbitrary length: suppose
𝑚 is a message of length 𝑙 so that 𝑡 = MAC𝑘 (𝑚) = 𝐸𝑘 (𝑚). Now an adversary constructs
the message 𝑚′ = 𝑚 ‖ (𝑡 ⊕ 𝑚) of length 2𝑙. Since
MAC𝑘 (𝑚′ ) = 𝐸𝑘 (𝑡 ⊕ (𝑡 ⊕ 𝑚)) = 𝐸𝑘 (𝑚) = 𝑡,
the same tag is also valid for 𝑚′ . This shows that the basic CBC MAC needs to be
modified for messages of variable length. One approach is to prepend the length of the
message which prevents this attack (see Exercise 6). Another option is to transform the
last input block using a secret key, which prevents the fabrication of valid tags. Below,
8.2. CBC MAC 155
we describe the CMAC (Cipher-based Message Authentication Code), which is also

called OMAC1 (see [Dwo16] and RFC 4493 [SPLI06]).
The CMAC computation requires the multiplication of 128-bit blocks, which is de-
fined in the field
𝐺𝐹(2128 ) = 𝐺𝐹(2)[𝑥]/(𝑥128 + 𝑥7 + 𝑥2 + 𝑥 + 1).
A 128-bit block (𝑏127 , … , 𝑏1 , 𝑏0 ) corresponds to the residue class
𝑏127 𝑥127 + ⋯ + 𝑏1 𝑥 + 𝑏0 mod 𝑥128 + 𝑥7 + 𝑥2 + 𝑥 + 1.
Example 8.9. Let 𝑏 = (𝑏127 … 𝑏1 𝑏0 ) and 𝑐 = (0126 10). We want to compute 𝑏 ⋅ 𝑐.
The corresponding polynomials are 𝑏127 𝑥127 + ⋯ + 𝑏1 𝑥 + 𝑏0 and 𝑥. Their product is
𝑚(𝑥) = 𝑏127 𝑥128 + ⋯ + 𝑏1 𝑥2 + 𝑏0 𝑥. If 𝑏127 = 0, then the degree of 𝑚(𝑥) is less than
128, and therefore 𝑏 ⋅ 𝑐 = (𝑏126 … 𝑏1 𝑏0 0) = (𝑏 ≪ 1), a left-shift of 𝑏 by one position. If
𝑏127 = 1, then we have to use the congruence
𝑥128 ≡ 𝑥7 + 𝑥2 + 𝑥 + 1.
The polynomial 𝑥7 + 𝑥2 + 𝑥 + 1 corresponds to the binary string 𝑅 = (0120 10000111),
and thus 𝑏 ⋅ 𝑐 = (𝑏126 … 𝑏1 𝑏0 0) ⊕ 𝑅.
Definition 8.10. Let 𝐸 be a block cipher with 128-bit block length, 𝑘 a secret key and
𝑚 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 a sequence of 128-bit message blocks, where the last block 𝑚𝑁
may be shorter. Then the CMAC of 𝑚 is defined as the CBC MAC of 𝑚′ = 𝑚1 ‖𝑚2 ‖ …
‖𝑚𝑁−1 ‖𝑚′𝑁 , where the last block is tweaked in the following way: first, two 128-bit
subkeys 𝑘1 and 𝑘2 derived:
𝑘1 = 𝐸𝑘 (0128 ) ⋅ (0126 10) and 𝑘2 = 𝐸𝑘 (0128 ) ⋅ (0125 100).
The multiplication of 128-bit blocks is defined in the field 𝐺𝐹(2128 ) (see above).
Then set
𝑚 ⊕ 𝑘1 if |𝑚𝑁 | = 128,
𝑚′𝑁 = { 𝑁
(𝑚𝑁 ‖10 … 0) ⊕ 𝑘2 if |𝑚𝑁 | < 128.
Hence the subkey 𝑘1 is used if the last block is full length. Otherwise, the last block is
padded and the subkey 𝑘2 is used. The verification of the CMAC is canonical. ♢
An adversary cannot produce a valid CMAC without knowing 𝑘1 or 𝑘2 , but these

subkeys depend on the secret MAC key 𝑘. Furthermore, 𝑘1 or 𝑘2 cannot be recovered
from valid tags, since 𝑚′𝑁 is protected by 𝐸𝑘 . One can show that the CMAC construction
is secure (see [IK03]):
Theorem 8.11. If 𝐸 is a pseudorandom permutation or function, then the CMAC con-
struction defines an EUF-CMA secure MAC for messages of variable length.
Remark 8.12. There are truncated versions of the CMAC, for example AES-CMAC-96
(see RFC 4494 [SPL06]), where the tag is defined by the leftmost 96 bits of the CMAC.
Of course, the tag must not be too short, say less than 80 bits, since collisions may
otherwise be produced and valid MACs forged.
8.3. HMAC
Another widely used MAC construction is based on hash functions. Hash functions
are usually faster than encryption algorithms. However, hash functions are unkeyed in
practice, so they cannot be used directly as MACs. But note that the general Merkle-
Damgård transform (see Section 7.3) takes an initialization vector (or key) 𝐼𝑉 as input.
The obvious prefix construction 𝐻𝑘 (𝑚) = 𝐻(𝑘, 𝑚) (with 𝑘 = 𝐼𝑉) or 𝐻𝑘 (𝑚) = 𝐻(𝑘‖𝑚)
(for an unkeyed hash function with fixed 𝐼𝑉) is insecure for messages of variable length
if 𝐻 is a Merkle-Damgård hash function (length extension attack; see Exercise 8). Note
that the SHA-3 family is not vulnerable to this attack.
The Hash-based Message Authentication Code (HMAC) is based on two nested
hashing operations and protects against length extension attacks. HMAC is described
in RFC 2104 [HK97] and standardized in [FIP08].
Definition 8.13. Let 𝐻 be a Merkle-Damgård hash function and suppose 𝑏 is the input
block length in bytes of the underlying compression function. For SHA-1 and SHA-
256, one has 𝑙 = 512 bits and thus 𝑏 = 64 bytes. The message space is ℳ = {0, 1}∗
$
and HMAC keys 𝑘 ← {0, 1}𝑛 are chosen uniformly at random. We assume that the
byte length of 𝑘 is, at most, 𝑏. Define ipad and opad strings by repeating the bytes 36
and 5C, respectively, 𝑏 times. The key 𝑘 is padded by zeros such that the byte length
of 𝑘 = (𝑘 ‖ 0 … 0) is 𝑏. Then the HMAC message authentication tag of a message 𝑚 is
defined as
HMAC(𝑘, 𝑚) = 𝐻(𝑘 ⊕ opad ‖ 𝐻(𝑘 ⊕ ipad ‖ 𝑚) ).
The verification of a message 𝑚 and a tag 𝑡 is canonical: compute HMAC(𝑘, 𝑚) and
compare the result with 𝑡. The tag is valid if 𝑡 = HMAC𝑘 (𝑚). ♢
The HMAC construction is a special case of a nested MAC (NMAC). Let 𝐻 be a

keyed hash function and suppose 𝑘1 , 𝑘2 are two secret keys. Then define
NMAC𝑘1 ‖𝑘2 (𝑚) = 𝐻𝑘2 (𝐻𝑘1 (𝑚)).
𝑘1 is called the inner key and 𝑘2 the outer key. One can show that NMAC is a secure
MAC if 𝐻 is based on a pseudorandom compression function ([Bel06]):
Theorem 8.14. Let 𝑓 ∶ {0, 1}𝑛+𝑙 → {0, 1}𝑛 be a compression function that takes a key of
length 𝑛 and a message block of length 𝑙 as input. Let 𝐻 be the associated keyed Merkle-
Damgård hash function (without length padding). The NMAC function defined above
takes a key of length 2𝑛 and a message of arbitrary length as input and outputs a tag of
length 𝑛. If 𝑓 is a pseudorandom function for messages of fixed length, then NMAC is a
pseudorandom function and a secure MAC for messages of arbitrary length. ♢
The above security guarantee for NMAC is quite strong. Does it also apply to
HMAC ? The main differences are: a) HMAC uses an unkeyed hash function and is
keyed via the data input, b) length padding is applied, and c) the HMAC keys 𝑘 ⊕ opad
and 𝑘 ⊕ ipad are not independent. Nevertheless, one has the following result [Bel06]:
Theorem 8.15. Let 𝑓 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑛 be a compression function and let
𝑓 ∶ {0, 1}𝑛 × {0, 1}𝑙 → {0, 1}𝑛 be the dual function with the same values as 𝑓, but keyed
via the second component. Let 𝐻 be the Merkle-Damgård hash function associated with
𝑓. If 𝑓 is a prf and 𝑓 is a prf under restricted related-key attacks, respectively, then HMAC
is a pseudorandom function and an EUF-CMA secure MAC for messages of arbitrary
length. ♢
The above Theorem reduces the security of HMAC to the pseudorandomness of
𝑓 and 𝑓. The related-key attack against 𝑓 can be restricted to two keys (𝑘 ⊕ opad and
𝑘⊕ipad) and two oracle queries. Note that only pseudorandomness is required, so that
HMAC could still be secure when used with hash functions whose collision resistance
is compromised.
Remark 8.16. HMAC is widely used in practice, not only as a message authentication
code, but also as a pseudorandom function and as a building block in key derivation
functions. For example, an HMAC-based Extract-and-Expand Key Derivation Function
(HKDF) is described in RFC 5869 [KE10]. Multiple HMAC calls with the same key and
different input data can generate the desired number of pseudorandom output bits. ♢
Truncated versions of HMAC are often used for message authentication, for ex-
ample HMAC-SHA1-80. These variants are defined by the leftmost 𝑡 bits of the HMAC
value. It is recommended that 𝑡 should not be less than 80.
Remark 8.17. HMAC was designed for Merkle-Damgård hash functions, for example
MD5 and SHA-1, which are vulnerable to length extension attacks. SHA-3 (Keccak)
does not need the nested approach and a MAC can be defined by prepending the key to
the message. The NIST publication [KCP16] describes the Keccak Message Authenti-
cation Code (KMAC). Keccak can also be used as a pseudorandom function and a key
derivation function.
8.4. Authenticated Encryption

In practice, encryption and message authentication are often combined in order to
protect the confidentiality and the integrity. This should also prevent tampering with
ciphertext messages. We are thus looking for a CCA-secure and unforgeable encryption
scheme.
Definition 8.18. Suppose an encryption scheme is given. Consider the following ex-
$
periment (see Figure 8.2): a challenger takes 1𝑛 as input and generates a key 𝑘 ←
𝐺𝑒𝑛(1𝑛 ). An adversary 𝐴 is given 1𝑛 , can repeatedly choose messages 𝑚 and obtains
the ciphertext 𝑐 = ℰ𝑘 (𝑚) from an oracle. The adversary succeeds if they can produce a
valid ciphertext 𝑐′ , which they did not previously obtain from the oracle. In this case,
the challenger outputs 1 and otherwise 0. The encryption scheme is called unforge-
able if the probability of success is negligible in 𝑛 for all probabilistic polynomial-time
adversaries. ♢
1𝑛 $
𝑘 ← {0, 1}𝑛
𝑚
Choose 𝑚
𝑐
𝑐 = ℰ𝑘 (𝑚)
⋮
𝑐′
Forge a ciphertext 𝑐′ ≠ 𝑐 𝒟𝑘 (𝑐′ ) ≠ ⟂ ?
Output 1 or 0
Figure 8.2. Ciphertext forgery experiment.
Note that the adversary must produce a valid ciphertext 𝑐. Depending on the
scheme, the decryption of a given string 𝑐 can be invalid, i.e., 𝒟𝑘 (𝑐) = ⟂.
Definition 8.19. An encryption scheme is called an authenticated encryption scheme
if it is CCA2-secure and unforgeable. ♢
We already know (see Remark 2.53) that block ciphers in CBC or CTR mode are
malleable and therefore forgeable. An obvious approach to obtaining an authenticated
encryption scheme is to combine a CPA-secure encryption scheme and a secure MAC.
Several combinations are possible, and it turns out that the encrypt-then-authenticate
construction is the best choice.
Definition 8.20. Suppose a symmetric-key encryption scheme and a message authen-
tication code is given. We assume that key generation algorithms choose uniform keys
of length 𝑛. Then define a combined encryption and message authentication scheme
$ $
as follows: on input 1𝑛 choose two uniform keys 𝑘𝐸 ← {0, 1}𝑛 and 𝑘𝑀 ← {0, 1}𝑛 . En-
cryption of a plaintext 𝑚 with a key (𝑘𝐸 , 𝑘𝑀 ) is defined by
′
ℰ(𝑘 𝐸 ,𝑘𝑀 )
(𝑚) = (𝑐, 𝑡),
$
where 𝑐 ← ℰ𝑘𝐸 (𝑚) and 𝑡 = MAC𝑘𝑀 (𝑐). For decryption of (𝑐, 𝑡), one first verifies the
tag and outputs 𝒟𝑘𝐸 (𝑐) if the tag is valid. If the tag is not valid or 𝒟𝑘𝐸 (𝑐) = ⟂, then
output ⟂. ♢
Note that the tag is computed from the ciphertext, not from the plaintext. The next
Theorem states that the above definition gives a CCA2-secure and unforgeable scheme
if the underlying encryption scheme and the MAC are secure.
Theorem 8.21. Consider the encrypt-then-authenticate construction defined above.
Suppose that the encryption scheme is CPA-secure and the message authentication code is
a strongly secure MAC ( for example a secure MAC with canonical verification). Then the
encrypt-then-authenticate construction gives an authenticated encryption scheme. ♢
Refer to [KL15] for a proof. Note that the encrypt-then-authenticate construction

turns a CPA-secure encryption scheme into a CCA2-secure scheme.
Remark 8.22. The IP security protocol ESP (Encapsulating Security Payload; see RFC
4303 [Ken05]) protects IP packets with the above encrypt-then-authenticate approach.
On the other hand, the TLS (Transport Layer Security) 1.2 protocol (see RFC 5246
[DR08]), when used with stream ciphers or block ciphers in CBC mode, first com-
putes a message authentication code and then encrypts the plaintext and the MAC. The
main problem with this approach is that an error can occur after decryption (padding
error) or after tag computation (invalid MAC). If two different error messages are re-
turned or if the computing time is different, then this may leak information about the
plaintext. Implementations should therefore only return one error message ⟂ and com-
pute the MAC even if padding is incorrect. Now the TLS protocol 1.3 (see RFC 8446
[Res18]) supports only authenticated encryption, for example the GCM mode, which
is explained below. ♢
The above encrypt-then-authenticate construction requires two separate

algorithms and two independent keys. Now we want to present an operation mode for
block ciphers that provides authenticated encryption in one pass. The Galois Counter
Mode (GCM) is an extension of the CTR mode and gained increasing popularity. The
GCM mode outputs ciphertext and an authentication tag. The tag not only authen-
ticates the plaintext but also additional authenticated data (AAD). A variant of GCM
is the message authentication code GMAC, which only outputs the authentication tag.
GCM and GMAC are described in the NIST publication [Dwo07].
GCM is defined for block ciphers with 128-bit block length, for example AES. The
computation of the tag involves addition (XOR) and multiplication of 128-bit blocks.
Similar to the CMAC construction (see Section 8.2), the multiplication is defined in
the field
𝐺𝐹(2128 ) = 𝐺𝐹(2)[𝑥]/(𝑥128 + 𝑥7 + 𝑥2 + 𝑥 + 1).
GCM interprets binary strings as polynomials, but – in contrast to CMAC – in little-
endian order. A 128-bit block (𝑏0 𝑏1 … 𝑏127 ) thus corresponds to the residue class of
the polynomial 𝑏0 + 𝑏1 𝑥 + ⋯ + 𝑏127 𝑥127 mod 𝑥128 + 𝑥7 + 𝑥2 + 𝑥 + 1.
Example 8.23. The following is similar to Example 8.9, but uses the little-endian
convention. Let 𝑏 = (𝑏0 𝑏1 … 𝑏127 ) and 𝑐 = (010126 ). We want to compute 𝑏 ⋅ 𝑐.
The corresponding polynomials are 𝑏0 + 𝑏1 𝑥 + ⋯ + 𝑏127 𝑥127 and 𝑥. Their product
is 𝑚(𝑥) = 𝑏0 𝑥 + 𝑏1 𝑥2 + ⋯ + 𝑏127 𝑥128 . If 𝑏127 = 0, then the degree of 𝑚(𝑥) is less than
128, and therefore 𝑏 ⋅ 𝑐 = (0 𝑏0 𝑏1 … 𝑏126 ) = (𝑏 ≫ 1), a right shift by one position. If
𝑏127 = 1, then we have to use the congruence 𝑥128 ≡ 𝑥7 + 𝑥2 + 𝑥 + 1. The polynomial
1 + 𝑥 + 𝑥2 + 𝑥7 corresponds to 𝑅 = (111000010120 ), and thus 𝑏 ⋅ 𝑐 = (0 𝑏0 𝑏1 … 𝑏126 ) ⊕ 𝑅.
This computation can be generalized to any factor 𝑐 (see Exercise 10). ♢
In our description below, we assume that the additional authenticated data (AAD)
is 128 bits long at most. AAD may also be empty.
Definition 8.24. (GCM mode) Let 𝐸 be a block cipher with 128-bit block length. For
$
each encryption, a uniform initialization vector (or a nonce) 𝐼𝑉 ← {0, 1}96 is chosen.
The plaintext message 𝑚 is split into blocks of length 128 bits where the last block can
be shorter. We write 𝑚 = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 . Define a 128-bit counter by 𝑐𝑡𝑟 = 𝐼𝑉‖031 ‖1.
Applying the CTR mode (see Definition 2.48) gives:
𝑐𝑖 = 𝐸𝑘 (𝑐𝑡𝑟 + 𝑖) ⊕ 𝑚𝑖 for 𝑖 = 1, 2, … , 𝑁 and 𝑐 = 𝐼𝑉‖𝑐1 ‖𝑐2 ‖ … ‖𝑐𝑁 .
Define the 128-bit hash key 𝐻 = 𝐸𝑘 (0128 ) and let 𝐴 = 𝐴𝐴𝐷. Then define
𝑋1 = 𝐴 ⋅ 𝐻,
𝑋𝑖 = (𝑋𝑖−1 ⊕ 𝑐𝑖−1 ) ⋅ 𝐻 for 𝑖 = 2, … , 𝑁 + 1, and
𝑋𝑁+2 = (𝑋𝑁+1 ⊕ (|𝐴| ‖ |𝑐|)) ⋅ 𝐻.
𝐴 and 𝑐𝑁 are padded by zeros, if necessary. The multiplication by 𝐻 is defined in the
field 𝐺𝐹(2128 ) as described above, and the bit lengths |𝐴| and |𝑐| are represented by 64-
bit integers under the big-endian convention. Then the authentication tag 𝑡 is defined
by
𝑡 = 𝑋𝑁+2 ⊕ 𝐸𝑘 (𝑐𝑡𝑟)
(see Figure 8.3), and the complete authenticated ciphertext is given by (𝑐, 𝑡, 𝐴𝐴𝐷).
For the decryption of (𝑐, 𝑡, 𝐴𝐴𝐷), the authentication tag associated to 𝑐 and 𝐴𝐴𝐷
is computed using the same formulas as above. If the result is not equal to the given tag
𝑡, then output the error symbol ⟂. Otherwise the plaintext is computed by decrypting
𝑐 in CTR mode:
𝑚𝑖 = 𝐸𝑘 (𝑐𝑡𝑟 + 𝑖) ⊕ 𝑐𝑖 for 𝑖 = 1, 2, … , 𝑁 and 𝑚 = 𝒟𝑘 (𝑐) = 𝑚1 ‖𝑚2 ‖ … ‖𝑚𝑁 . ♢
𝐴 𝑐1 𝑐2 … 𝑐𝑁 |𝐴| ‖ |𝑐| 𝐸𝑘 (𝑐𝑡𝑟)
⋅𝐻 ⨁ ⋅𝐻 ⨁ ⋅𝐻 ⨁ ⋅𝐻 ⨁ ⋅𝐻 ⨁ 𝑡
Figure 8.3. Computation of the GCM tag 𝑡 from the counter mode ciphertext 𝑐 =
𝐼𝑉‖𝑐1 ‖ … ‖𝑐𝑁 . 𝐻 = 𝐸𝑘 (0128 ) is the hash key, and the additional authenticated data is
denoted by 𝐴.
Note that the GCM mode does not strictly follow the encrypt-then-authenticate
approach, because the same key and counter is used for encryption and message au-
thentication. The security of GCM is proved in [MV04].
Exercises 161
8.5. Summary
• A message authentication code (MAC) is a tag that protects the integrity and
the authenticity of a message. The computation of a MAC requires a secret key
and the message.
• A MAC is secure if it is unforgeable under a chosen message attack.
• The CBC-MAC and CMAC constructions output the last ciphertext block in
CBC mode as a tag. CMAC modifies the last the plaintext block before the
message is encrypted in CBC mode in order to prevent length extension attacks.
• HMAC is based on a nested hash computation and takes a key and a message
as input.
• CMAC and HMAC are secure under certain assumptions.
• Authenticated encryption schemes are CCA2-secure and unforgeable.
• The encrypt-then-authenticate combination of a CPA-secure encryption
scheme and a strongly secure MAC gives an authenticated encryption scheme.
• The Galois Counter Mode (GCM) extends the CTR mode and provides encryp-
tion as well as message authentication using a single secret key.
Exercises
1. What are possible reasons why a MAC is invalid?

2. Compare the properties of hashes and MACs when used for integrity protection.
3. Show that MAC𝑘 (𝑚) = 𝑚 ⊕ 𝑘 defines a secure MAC if 𝑘 is generated uniformly at
random and only used once. Why is this scheme impractical?
4. Suppose a MAC tag does not depend on the first bit of the message. Why is such a
MAC insecure under a chosen-message attack?
5. Extend the fixed-length MAC based on a pseudorandom function (see Theorem
8.5) to messages of arbitrary length (see [KL15]).
Tip: Split the message into blocks. A sequence of tags would be a first approach,
but this does not protect against re-ordering, truncation and mixing attacks. Tweak
the construction in order to prevent such attacks.
6. Show that appending the length 𝐿 of a message does not turn the basic CBC MAC
into a secure MAC for messages of variable length.
Remark: In contrast, prepending a message with its length is secure.
7. Is it advisable to use the same key for CBC mode encryption and for CBC message
authentication?
8. Show that 𝐹𝑘 (𝑚) = 𝐻(𝑘 ‖ 𝑚) is not a secure MAC for Merkle-Damgård hash func-
tions and arbitrary length messages. Why is the HMAC construction not affected
by this attack ?
9. Use Sage to verify that the polynomial 𝑥128 + 𝑥7 + 𝑥2 + 𝑥 + 1 is irreducible in

𝐺𝐹(2)[𝑥].
10. Example 8.23 describes how to multiply a 128-bit block 𝑏 by 𝑐 = (010 … 0).
(a) Give 𝑏 ⋅ 𝑐 for 𝑐 = (0 … 0) and for 𝑐 = (10 … 0).
(b) How can you compute 𝑏 ⋅ 𝑐 for 𝑐 = 𝑒𝑖 , i.e., the 𝑖-th bit of 𝑐 is one and the other
bits are zero.
Hint: 𝑏 ⋅ 𝑋 𝑖 = 𝑏 ⋅ 𝑋 ⋯ ⋅ 𝑋.
(c) Now describe the computation of 𝑏 ⋅ 𝑐 for a general 128-bit block 𝑐.
Hint: 𝑏 ⋅ (𝑐 ⊕ 𝑐′ ) = (𝑏 ⋅ 𝑐) ⊕ (𝑏 ⋅ 𝑐′ ).
11. Describe the computation of the GCM tag if the plaintext consists of one 128-bit
block and AAD is empty.
12. A block cipher in CTR mode can only achieve CPA security. Describe a chosen
ciphertext attack against the CTR mode and show that this attack is not possible if
the CTR mode is combined with a secure MAC.
Chapter 9
Public-Key Encryption and

the RSA Cryptosystem
This chapter deals with public-key encryption schemes and the RSA cryptosystem. Sec-
tion 9.1 introduces public-key encryption schemes and defines their security require-
ments. Section 9.2 explains the widely used RSA encryption algorithm. The security
of RSA and the necessary assumptions are covered in Section 9.3. RSA (and other
cryptographic schemes) require large prime numbers and Section 9.4 deals with the
generation of such primes. The efficiency of RSA and possible optimizations are dis-
cussed in Section 9.5. We will see that there are some pitfalls in the application of RSA.
This leads to a randomized and padded version of RSA, which is explained in Section
9.6. The security of RSA is closely related to the factoring assumption, and Section 9.7
outlines different factoring algorithms and their complexity.
RSA is a major public-key scheme and is dealt with in all cryptography textbooks,
for example in [PP10]. For the provable security approach we refer to [KL15], [BR05],
[GB08], [Gol01].
9.1. Public-Key Cryptosystems

One of the main principles of symmetric encryption is that the encryption and decryp-
tion functions are closely related and both use a secret key. For a long time, the pre-
establishment of secret keys was assumed to be a prerequisite for cryptosystems. Obvi-
ously, a major problem of this type of cryptography is the exchange of secret keys over
potentially insecure channels. Furthermore, separate secret keys are needed for each
𝑛(𝑛−1)
pair of users, so that (𝑛) = keys are required in a network with 𝑛 nodes.
2 2
163
164 9. Public-Key Encryption and the RSA Cryptosystem
Historically, the idea of using public encryption keys is relatively new. A major ad-
vantage of this approach is that public keys can be openly exchanged. In addition, one
key pair suffices to receive messages from many communication partners. Obviously,
it is crucial that an adversary is not able to derive the private decryption key from the
public encryption key. Furthermore, the authenticity of public keys can represent a
problem.
Diffie and Hellman, influenced by Merkle’s work, were the first to publish a pa-
per [DH76] on public-key cryptography in 1976. They invented a mechanism for se-
cure key distribution over an insecure channel (see Chapter 10). Furthermore, they
described the fundamentals of public-key cryptography. Rivest, Shamir and Adleman
then published the first public-key encryption scheme (RSA) in 1978 [RSA78]. Al-
though Ellis, Cocks and Williamson had already invented public key mechanisms sev-
eral years before, they were not allowed to publish their results because they worked
for the British secret service.
Definition 9.1. (compare Definition 2.1) A public-key encryption scheme (public-key
cryptosystem) is given by:
• A plaintext space ℳ.
• A ciphertext space 𝒞.
• A key space 𝒦 = 𝒦𝑝𝑘 × 𝒦𝑠𝑘 (pairs of public and private keys).
• A randomized key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes a security parameter
𝑛 as input and outputs a pair of keys (𝑝𝑘, 𝑠𝑘).
• An encryption algorithm ℰ = {ℰ𝑝𝑘 | 𝑝𝑘 ∈ 𝒦𝑝𝑘 } which may be randomized. It
takes a public key and a plaintext as input, and outputs the ciphertext or an error
symbol ⟂ if the plaintext is invalid.
• A deterministic decryption algorithm 𝒟 = {𝒟𝑠𝑘 | 𝑠𝑘 ∈ 𝒦𝑠𝑘 } that takes a private
key and a ciphertext as input and outputs the plaintext or an error symbol ⟂ if the
input is invalid.
All algorithms must run in polynomial time. The scheme provides correct decryption
if 𝒟𝑠𝑘 (ℰ𝑝𝑘 (𝑚)) = 𝑚 for each key pair (𝑝𝑘, 𝑠𝑘) ∈ 𝒦 and all plaintexts 𝑚 ∈ ℳ (see
Figure 9.1). ♢
𝑝𝑘 𝑠𝑘
𝑚 ℰ 𝑐 𝒟 𝑚
Figure 9.1. Encryption uses a public key 𝑝𝑘 and decryption a private key 𝑠𝑘.
9.1. Public-Key Cryptosystems 165
Now we want to define a public-key counterpart of eavesdropping and CPA-secur-

ity. We can assume that the public key is known so that an adversary is able to encrypt
any chosen plaintext, even without asking an encryption oracle. This implies that secu-
rity against eavesdropping (EAV) and chosen plaintext attacks (CPA) are equivalent for
public-key schemes. The following Definition 9.2 is similar to the previous Definitions
2.24 and 2.25 in the secret-key case.
Definition 9.2. Suppose a public-key encryption scheme is given. Consider the fol-
lowing experiment (see Figure 9.2): a challenger takes the security parameter 1𝑛 as
input, generates a key pair (𝑝𝑘, 𝑠𝑘) ∈ 𝒦 by running 𝐺𝑒𝑛(1𝑛 ) and chooses a random bit
$
𝑏 ← {0, 1}. An adversary 𝐴 is given the public key 𝑝𝑘 and 1𝑛 . The private key 𝑠𝑘 and 𝑏
are not known to the adversary. They can encrypt arbitrary messages using the public
key 𝑝𝑘. The adversary chooses two messages 𝑚0 , 𝑚1 ∈ ℳ of the same length. Then
the challenger encrypts one of the messages, and the ciphertext 𝑐 = ℰ𝑝𝑘 (𝑚𝑏 ) is given
to 𝐴. The adversary tries to guess 𝑏 and outputs a bit 𝑏′ . The challenger outputs 1 if
𝑏 = 𝑏′ , and 0 otherwise. The IND-CPA advantage of the adversary 𝐴 is defined as
ind−cpa
Adv (𝐴) = |𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏]|.
The scheme has indistinguishable encryptions under a chosen plaintext attack (IND-
CPA secure or CPA-secure) if for every probabilistic polynomial time adversary 𝐴, the
ind−cpa
𝑝𝑘, 1𝑛 $
(𝑝𝑘, 𝑠𝑘) ← 𝐺𝑒𝑛(1𝑛 )
$
𝑏 ← {0, 1}
𝑚0 , 𝑚1
|𝑚0 | = |𝑚1 |
𝑐
𝑐 ← ℰ𝑝𝑘 (𝑚𝑏 )
𝑏′
Figure 9.2. Public-key EAV and CPA experiment. The adversary can also encrypt any
chosen plaintext.
Since an adversary can encrypt 𝑚0 and 𝑚1 and compare the result with the chal-
lenge ciphertext 𝑐, it is obvious that a public-key scheme with deterministic encryption
cannot be IND-CPA secure.
Remark 9.3. A more powerful adversary is able to perform an adaptive chosen cipher-
text attack (CCA2). In the CCA2 experiment, the adversary can additionally request
the decryption of arbitrary ciphertexts (before and after choosing two plaintext mes-
sages), except that the challenge ciphertext 𝑐 cannot be queried (compare Figure 2.5 in
the secret-key case). ♢
The construction of secure public-key encryption schemes is far from trivial, since
encryption is public but decryption must be hard without the private key. The con-
struction can be based on a family of trapdoor one-way permutations. Such permuta-
tions are one-way, i.e., easy to compute, but hard to invert without a trapdoor informa-
tion, which corresponds to the private key. It should be mentioned that hardness of
inversion is only required for uniform random input. Obviously, an adversary can pre-
pare a list of input values, compute the associated output and use that list for inversion.
We refer to [Gol01] and [KL15] on how to construct a secure public-key encryption
scheme from a family of trapdoor permutations.
All known constructions of public-key schemes are based on hard number-theoretic
problems, which also provide a security guarantee for these schemes. This represents
an advantage over secret-key schemes, where such guarantees do not exist. However,
public-key schemes are much less efficient, and, in practice, such schemes are only ap-
plied to a few blocks.
The RSA algorithm, which is explained in the next section, uses exponentiation
modulo a public composite number 𝑁 as its one-way permutation. The prime factors
of 𝑁 represent the private trapdoor information that permit the inversion.
9.2. Plain RSA

The plain RSA encryption scheme (schoolbook RSA) requires two large prime num-
bers; one takes a security parameter 1𝑛 as input, chooses two random primes 𝑝 and 𝑞
of size 𝑛 and sets 𝑁 = 𝑝𝑞. By the Chinese Remainder Theorem 4.26,
ℤ𝑁 ≅ ℤ𝑝 × ℤ𝑞 and ℤ∗𝑁 ≅ ℤ∗𝑝 × ℤ∗𝑞
for 𝑝 ≠ 𝑞 and ord(ℤ∗𝑁 ) = 𝜑(𝑁) = (𝑝 − 1)(𝑞 − 1). One chooses 𝑒 ∈ ℤ such that
1 < 𝑒 < (𝑝 − 1)(𝑞 − 1) and gcd(𝑒, (𝑝 − 1)(𝑞 − 1)) = 1,
and computes a modular inverse 𝑑 ∈ ℤ with
1 < 𝑑 < (𝑝 − 1)(𝑞 − 1) and 𝑒𝑑 ≡ 1 mod (𝑝 − 1)(𝑞 − 1).

9.2. Plain RSA 167
Then 𝑝𝑘 = (𝑒, 𝑁) forms the public key and 𝑠𝑘 = (𝑑, 𝑁) the private key. 𝑁 is called the
RSA modulus, 𝑒 is the encryption exponent and 𝑑 is the decryption exponent. The factors
𝑝, 𝑞 and 𝜑(𝑁) must remain secret, since 𝑑 can be efficiently derived from any of these
numbers. For encryption, the plaintext is raised to the power 𝑒 and reduced modulo 𝑁.
Decryption works similarly, but raises the ciphertext to the power 𝑑.
Definition 9.4. The plain RSA encryption scheme is defined by:

• A polynomial-time key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes the security pa-
rameter 1𝑛 as input, generates two random 𝑛-bit primes 𝑝 and 𝑞, and sets 𝑁 = 𝑝𝑞.
Furthermore, two integers 𝑒 and 𝑑 with
𝑒𝑑 ≡ 1 mod (𝑝 − 1)(𝑞 − 1)
are chosen as explained above. 𝐺𝑒𝑛(1𝑛 ) outputs the public key 𝑝𝑘 = (𝑒, 𝑁) and
the private key 𝑠𝑘 = (𝑑, 𝑁).
• The plaintext and the ciphertext space is ℤ∗𝑁 .
• The deterministic encryption algorithm takes a plaintext 𝑚 ∈ ℤ∗𝑁 and the public
key 𝑝𝑘 as input and outputs
𝑐 = ℰ𝑝𝑘 (𝑚) = 𝑚𝑒 mod 𝑁.
• The decryption algorithm takes a ciphertext 𝑐 ∈ ℤ∗𝑁 and the private key 𝑠𝑘 as
input and outputs
𝑚 = 𝒟𝑠𝑘 (𝑐) = 𝑐𝑑 mod 𝑁.
The scheme is only defined for messages of fixed length. ♢
For the correctness of the RSA scheme one has to show that
(𝑚𝑒 )𝑑 ≡ 𝑚 mod 𝑁
for all 𝑚 ∈ ℤ∗𝑁 . But this follows from Euler’s Theorem (see Theorem 4.15 and Exercise
4.4): let 𝑚 ∈ ℤ∗𝑁 ; then we have
𝑚𝜑(𝑁) ≡ 1 mod 𝑁.
Since 𝑒𝑑 = 1 + 𝑘𝜑(𝑁) for some 𝑘 ∈ ℤ, we obtain 𝑚𝑒𝑑 = 𝑚(𝑚𝜑(𝑁) )𝑘 ≡ 𝑚 mod 𝑁.
Example 9.5. Suppose Bob’s RSA key is given by 𝑝 = 29, 𝑞 = 23, 𝑁 = 𝑝𝑞 = 667, 𝑒 = 3,
𝑑 = 411. This defines an admissible RSA cryptosystem, since 𝑒 = 3 is relatively prime
to 𝜑(𝑁) = (𝑝 − 1)(𝑞 − 1) = 616. The multiplicative inverse 𝑑 = 411 can be computed
using the Extended Euclidean Algorithm:
616 ∶ 3 = 205 remainder 1 ⟹ 616 = 3 ⋅ 205 + 1 ⟹ 1 = 616 − 205 ⋅ 3.

Hence 𝑑 = (3 mod 616)−1 ≡ −205 ≡ 411. The public key 𝑝𝑘 = (3, 667) is published
by Bob. If Alice wants to send him the message 𝑚 = 108, she will encrypt it as follows:
𝑐 = 𝐸𝑒 (𝑚) = 𝑚𝑒 = 1083 mod 667 ≡ 416.
She sends 𝑐 = 416 to Bob, who is able to decrypt the ciphertext using his private key
𝑠𝑘 = (411, 667):
𝑚 = 𝐷𝑑 (𝑐) = 𝑐𝑑 = 416411 mod 667 ≡ 108. ♢
9.3. RSA Security

It is obvious that the security of RSA relies on the hardness of factoring. If an adversary
can compute the factors 𝑝 and 𝑞, then they know 𝜑(𝑁) = (𝑝 − 1)(𝑞 − 1) and easily find
the private exponent 𝑑 ≡ (𝑒 mod 𝜑(𝑁))−1 . Furthermore, the factors 𝑝 and 𝑞 can be
computed from 𝜑(𝑁):
𝑁
𝜑(𝑁) = (𝑝 − 1) ( − 1) .
𝑝
Multiplication by 𝑝 yields a quadratic equation in 𝑝:
𝑝2 − (𝑁 − 𝜑(𝑁) + 1)𝑝 + 𝑁 = 0.
The roots of the quadratic equation 𝑥2 + (𝑁 − 𝜑(𝑁) + 1)𝑥 + 𝑁 = 0 are 𝑝 and 𝑞. Hence
𝜑(𝑁) must also be kept secret.
Example 9.6. Consider Example 9.5 and suppose 𝑁 = 667 and 𝜑(𝑁) = 616 are known.
Then the roots of the equation
𝑥2 − (667 − 616 + 1)𝑥 + 667 = 𝑥2 − 52𝑥 + 667 = 0
give 𝑥1 = 𝑝 = 29 and 𝑥2 = 𝑞 = 23. ♢
The factoring of integer numbers has been intensively studied and it is generally
assumed that finding large prime factors of a composite number is a hard problem.
Suppose a modulus generation algorithm takes the security parameter 1𝑛 as input
and outputs two primes 𝑝, 𝑞. Let 𝑁 = 𝑝𝑞. A probabilistic polynomial time adversary
𝐴 is given 𝑁 and has to find the factors 𝑝 and 𝑞. Now the factoring assumption states
that the modulus can be efficiently generated in such a way that an adversary has only
a negligible chance of finding the correct factors 𝑝 and 𝑞.
Since no polynomial-time factoring algorithms have been found so far, this as-
sumption is generally believed to be true and forms the basis of major cryptographic
schemes. Factoring algorithms are discussed in Section 9.7.
9.3. RSA Security 169
Example 9.7. Suppose the primes 𝑝 and 𝑞 are chosen such that the difference 𝑝 − 𝑞
is small. In this case, factoring 𝑁 = 𝑝𝑞 is not hard, even if 𝑝 and 𝑞 are large prime
numbers (see Fermat’s factorization method in Section 9.7). In fact, the primes should
be chosen independently. ♢
Factoring 𝑁 breaks RSA, but the opposite statement is not necessarily true. The
security of RSA is in fact based on the RSA assumption, which states that encryption is
a one-way permutation. However, the RSA assumption is stronger than the factoring
assumption, since an adversary might attack RSA without factoring the modulus.
Definition 9.8. Consider the following RSA experiment: run the RSA key generation
algorithm 𝐺𝑒𝑛(1𝑛 ) on input 1𝑛 to obtain the parameters 𝑝, 𝑞, 𝑒, 𝑑 and 𝑁. A uniform
$
ciphertext 𝑐 ← ℤ∗𝑁 is chosen and an adversary obtains 1𝑛 , 𝑒, 𝑁 and 𝑐. The adversary
has to find 𝑚 ∈ ℤ∗𝑁 such that 𝑚𝑒 mod 𝑁 ≡ 𝑐. The RSA problem is hard relative to
𝐺𝑒𝑛, if for every probabilistic polynomial-time adversary, the probability of finding the
correct plaintext 𝑚 is negligible in 𝑛.
The RSA assumption states that there is a key generation algorithm such that the
RSA problem is hard. ♢
The RSA assumption means that it is hard to recover the plaintext from a randomly
chosen ciphertext, but this does not imply the security of the plain RSA scheme. In fact,
the plain RSA encryption scheme is deterministic and thus cannot be CPA-secure. This
is critical in situations where the possible plaintexts are known or the number of plain-
texts is small. Then an adversary can easily find the plaintext simply by encrypting the
plaintext candidates. But if the plaintext messages are chosen uniformly at random
from a large space, then one might expect that the scheme is secure under the RSA as-
sumption. However, there are a number of pitfalls, which are discussed below. Further
details can be found in the survey article [Bon99].
(1) Encryption of a short plaintext message 𝑚 with a small encryption exponent 𝑒 is
insecure. If 𝑐 = 𝑚𝑒 < 𝑁, then 𝑐 is computed without modular reduction, and
hence the plaintext 𝑚 can be recovered by computing the real 𝑒-th root 𝑐1/𝑒 . If
𝑛
𝑒 = 3 then this low-exponent attack can be applied to all messages of length < ,
3
where 𝑛 = size(𝑁). In practice, one often chooses the public exponent 𝑒 = 216 +1,
which is large enough to prevent this attack.
(2) If a fixed message 𝑚 (not necessarily short) is encrypted for 𝑒 recipients with dif-
ferent RSA moduli, then the Chinese Remainder Theorem allows 𝑚 to be recov-
ered by computing a real 𝑒-th root (Hastad’s broadcast attack; see Exercise 7).
(3) The modulus 𝑁 must not be shared among different users, even if individual ex-
ponents 𝑒 and 𝑑 are used. They can factorize 𝑁 and therefore compute the private
exponents of all users who share this modulus. Furthermore, one can show that
sharing the modulus is insecure, even if the users trust each other.
(4) The prime factors of the modulus must not be re-used. If 𝑁1 = 𝑝𝑞1 and 𝑁2 = 𝑝𝑞2 ,
then 𝑝 = gcd(𝑁1 , 𝑁2 ) can be efficiently computed.
1
(5) It was shown that small decryption exponents 𝑑 < 𝑁 1/4 can be efficiently recov-
3
ered (Wiener attack). This attack can be improved to 𝑑 < 𝑁 0.292 . Such 𝑑 should
therefore be avoided. However, if the public exponent 𝑒 is chosen first, then the
probability that 𝑑 satisfies this condition is very small.
(6) The plaintext of two related messages 𝑚1 and 𝑚2 , for example
𝑚2 = 𝑎𝑚1 + 𝑏 mod 𝑁,
can be recovered from their ciphertexts 𝑐1 and 𝑐2 if 𝑎 and 𝑏 are known and the
public exponent 𝑒 is small (Franklin-Reiter attack).
(7) The unknown part of a partially known plaintext can be recovered from the ci-
phertext if the encryption exponent 𝑒 is small (Coppersmith attack).
size(𝑁)
(8) The private exponent 𝑑 can be reconstructed if the least significant ⌈ ⌉ bits
4
of 𝑑 are known (partial key-exposure attack).
Furthermore, plain RSA does not provide protection against ciphertext manipula-
tions and chosen ciphertext attacks:
(1) Plain RSA encryption is malleable and the ciphertext can be easily manipulated.
If an adversary replaces the ciphertext 𝑐 = 𝑚𝑒 mod 𝑁 with 𝑠𝑒 𝑐 mod 𝑁, then
the corresponding plaintext becomes 𝑠𝑚 mod 𝑁. Similarly, the ciphertext of a
product of plaintexts is congruent to the product of ciphertexts mod 𝑁; plain
RSA encryption is a multiplicative homomorphism.
(2) A chosen ciphertext attack against plain RSA is easily performed: if an adversary
is given the challenge ciphertext 𝑐, then they may ask for the decryption of an
unsuspicious-looking ciphertext 𝑐′ = 𝑐𝑠𝑒 mod 𝑁, where 𝑠 ∈ ℤ∗𝑁 and 𝑠 ≢ 1. If 𝑚
and 𝑚′ are the plaintexts corresponding to 𝑐 and 𝑐′ , then 𝑚′ = 𝑚𝑠 mod 𝑁. Hence
an adversary can easily compute the plaintext 𝑚 = 𝑚′ 𝑠−1 mod 𝑁 (see Exercise
6(c)).
9.4. Generation of Primes

RSA and other public key algorithms require large random prime numbers. The Prime
Number Theorem 3.5 tells us that the asymptotic density of primes among the first 𝑁
1
integer numbers is . Fortunately, the density only decreases logarithmically.
ln(𝑁)
9.4. Generation of Primes 171
Example 9.9. The density is small, but not too small, since ln(𝑥) increases slowly. For
example, the density of primes among odd random numbers less than 22048 is approx-
2 1
imately ≈ 0.0014. The expected number of trials is ≈ 710. ♢
2048 ln(2) 0.0014
To generate a large prime, choose an odd random number of the required size and
test its primality. Usually, this requires testing of several hundred candidates, as in
Example 9.9 above. Rather surprisingly, a deterministic primality test (AKS) that runs
in polynomial time has been found [AKS04]. In practice, however, the AKS test is not
fast enough, so the probabilistic Miller-Rabin test is preferable. This test is based on
the Proposition below. Note that in this section 𝑛 represents any natural number, not
a security parameter.
Proposition 9.10. Let 𝑛 ∈ ℕ be odd and 1 ≤ 𝑎 < 𝑛 an integer. If 𝑔𝑐𝑑(𝑎, 𝑛) ≠ 1, then 𝑛
is composite. Otherwise, write
𝑛 − 1 = 2𝑠 𝑑
with 𝑠 ∈ ℕ being maximal. If 𝑛 is prime, then either
𝑎𝑑 ≡ ±1 mod 𝑛
or there is an integer 𝑟 ∈ {1, … , 𝑠 − 1} such that

𝑟𝑑
𝑎2 ≡ −1 mod 𝑛.
Proof. Suppose that 𝑛 is a prime and 𝑦 = 𝑎𝑑 ≢ 1 mod 𝑛. Consider the sequence

𝑠−1 𝑠
𝑦, 𝑦2 , … , 𝑦2 , 𝑦2 mod 𝑛.
Euler’s Theorem 4.15 gives
𝑠 𝑠
𝑎𝑛−1 = (𝑎𝑑 )2 = 𝑦2 ≡ 1 mod 𝑛.
Hence the above sequence of squares starts with 𝑦 ≢ 1 and ends with 1, but since we
assumed that 𝑛 is a prime so that ℤ𝑛 is a field, the equation 𝑥2 ≡ 1 mod 𝑛 implies 𝑥 ≡ 1
or 𝑥 ≡ −1. Therefore, one of the elements in the above sequence must be congruent to
−1. □
The Miller-Rabin test checks whether 𝑛 satisfies the implication of the above
Proposition 9.10. If it does not, then 𝑛 is composite. Hence all 𝑛 satisfying the fol-
lowing condition (COMP) must be composite:
(COMP) There exists a base 𝑎 ∈ ℕ with 1 ≤ 𝑎 < 𝑛, such that

𝑟𝑑
𝑎𝑑 ≢ ±1 mod 𝑛 and 𝑎2 ≢ −1 mod 𝑛 for all 1 ≤ 𝑟 ≤ 𝑠 − 1.
This is used in Algorithm 9.1.
Algorithm 9.1 Miller-Rabin Algorithm with 𝑘 runs

Input: 𝑛 ∈ ℕ, 𝑘 ∈ ℕ
Output: ‘𝑛 is probably prime’ or ‘𝑛 is composite’
1: for 𝑖 = 1 to 𝑘 do
$
2: 𝑎 ← {1, 2, … , 𝑛 − 1} // choose a uniform random base 𝑎.
3: if gcd(𝑎, 𝑛) ≠ 1 then
4: return ’𝑛 is composite’
5: end if
6: 𝑦 ≡ 𝑎𝑑 mod 𝑛
7: if 𝑦 ≡ 1 or 𝑦 ≡ −1 then
8: do next // 𝑛 could be a prime, test another base 𝑎.
9: end if
10: for 𝑟 = 1 to 𝑠 − 1 do
11: 𝑦 = 𝑦2 mod 𝑛
12: if 𝑦 ≡ 1 then
13: return ‘𝑛 is composite’ // 𝑛 cannot be a prime, stop.
14: end if
15: if 𝑦 ≡ −1 then
16: do next // 𝑛 could be a prime, test another base 𝑎.
17: end if
18: end for
19: return ‘𝑛 is composite’ // 𝑛 cannot be a prime, stop.
20: end for
21: return ‘𝑛 is probably prime’
If the Miller-Rabin algorithm outputs that a number is composite, this result must
be correct. However, there are bases 𝑎 such that a composite number 𝑛 is incorrectly
identified as a prime in a run of the Miller-Rabin test.
Proposition 9.11. Let 𝑛 ∈ ℕ be composite and 𝑛 − 1 = 2𝑠 𝑑, where 𝑠 ∈ ℕ is maximal.
Then the number of bases 𝑎 ∈ {1, 2, … , 𝑛 − 1} such that
𝑟𝑑
𝑎𝑑 ≡ ±1 mod 𝑛 or 𝑎2 ≡ −1 mod 𝑛 for 𝑟 ∈ {1, … , 𝑠 − 1}
𝑛−1
is at most . ♢
4
We refer to [Sho09] for a proof of this statement. The probability that one run of the
1
Miller-Rabin test identifies a composite number as prime is therefore less than . One
4
1
can reduce the error probability to less than 𝑘 by 𝑘 independent runs of the Miller-
4
Rabin test. Note that Proposition 9.11 holds for all composite 𝑛. One can show that the
error probability for randomly selected odd numbers 𝑛 is much lower, and in practice
less than 10 runs are sufficient. The test is also efficient for large numbers, and we note
9.5. Efficiency of RSA 173
that a full factorization of 𝑛 − 1 is not required in order to find the maximal exponent
𝑠 of the factor 2. The exponent 𝑠 and hence the necessary number of exponentiations
is usually small, and the running time is 𝑂(size(𝑛)3 ).
Example 9.12. (1) 𝑛 = 561. We choose 𝑎 = 2 and have gcd(2, 561) = 1. Then
𝑛 − 1 = 560 = 24 ⋅ 35, so that 𝑑 = 35 and 𝑠 = 4. One computes 𝑎𝑑 = 235 ≡
263 ≢ ±1 mod 561, so the test continues. The next steps are 𝑎2𝑑 = 270 ≡ 166,
𝑎4𝑑 = 2140 ≡ 67 and finally 𝑎8𝑑 = 2280 ≡ 1. The sequence does not contain the
residue class −1, and thus the Miller-Rabin test shows that 561 is composite. In
fact, 561 = 3 ⋅ 11 ⋅ 17.
(2) 𝑛 = 1009. We choose 𝑎 = 3, so that gcd(3, 1009) = 1. We have 𝑛 − 1 = 1008 =
24 ⋅ 63, hence 𝑑 = 63 and 𝑠 = 4. We compute 𝑎𝑑 = 363 ≡ 192 ≢ ±1 mod 1009,
so the test continues. Then 𝑎2𝑑 ≡ 540 and 𝑎4𝑑 ≡ 1008 ≡ −1 mod 1009. Hence
𝑛 = 1009 could be a prime and another base 𝑎 is chosen. Every run of the test will
confirm the result, since 1009 is in fact a prime number.
9.5. Efficiency of RSA

In general, RSA encryption and decryption requires one exponentiation modulo 𝑁.
Assuming that the size of the base, exponent and modulus is 𝑛, the time complexity is
𝑂(𝑛3 ), if fast exponentiation or the square-and-multiple algorithm is used (see Section
3.3). Exponentiation is an efficient operation, but the running time is nonetheless sig-
nificant. Such an operation can take some time on hardware with restricted resources,
for example on smart cards, but a large number of modular exponentiations should
also be avoided on standard computer systems.
In practice, the public exponent is often chosen to be 𝑒 = 216 + 1. This is rela-
tively small, but still large enough to protect against low-exponent attacks. An encryp-
tion 𝑚𝑒 mod 𝑁 only requires 16 modular squarings and one additional multiplication.
Hence 17 multiplications modulo 𝑁 are sufficient, and the complexity is 𝑂(𝑛2 ), which
is much better than the general case. At the time of this writing, it is recommended to
choose a modulus of at least 2048 bits.
The private exponent 𝑑 has full length even if 𝑒 is short. In fact, a short exponent
𝑑 would be insecure (Wiener attack). However, the decryption 𝑐𝑑 mod 𝑁 can be ac-
celerated by a factor of around 4 by using the Chinese Remainder Theorem 4.26 (CRT).
Recall that the CRT gives a decomposition ℤ𝑁 ≅ ℤ𝑝 × ℤ𝑞 (Theorem 4.26). The
idea is to perform separate decryptions in ℤ𝑝 and ℤ𝑞 and map the resulting tuple back
to ℤ𝑁 . Since the size of 𝑝 and 𝑞 is around half the size of 𝑁 and the running time is
cubic, an exponentiation modulo 𝑝 or 𝑞 is approximately 23 = 8 times faster than an
exponentiation modulo 𝑁. Because exponentiations modulo 𝑝 and 𝑞 are necessary,
the speed-up is a factor of 4. The running time of the CRT computation is insignificant
compared to the exponentiations.
Let 𝑐 be a ciphertext. First, the ciphertext and the private exponent are reduced:
𝑐𝑝 = 𝑐 mod 𝑝, 𝑐𝑞 = 𝑐 mod 𝑞,
𝑑𝑝 = 𝑑 mod (𝑝 − 1), 𝑑𝑞 = 𝑑 mod (𝑞 − 1).
Note that the exponent 𝑑 is reduced modulo 𝑝−1 and 𝑞−1, respectively, and not modulo
𝑝 and 𝑞 (see Proposition 4.16). In fact, one has ord(ℤ∗𝑝 ) = 𝑝 − 1 and ord(ℤ∗𝑞 ) = 𝑞 − 1.
In the next step, the decryption is done more efficiently modulo 𝑝 and 𝑞:
𝑑𝑝 𝑑𝑞
𝑚𝑝 = 𝑐𝑝 mod 𝑝, 𝑚𝑞 = 𝑐𝑞 mod 𝑞;
𝑚 is finally computed using the Chinese Remainder Theorem. Consider the equation
1 = 𝑥𝑝 + 𝑦𝑞.
𝑥, 𝑦 ∈ ℤ can be computed using the Extended Euclidean Algorithm on input 𝑝 and 𝑞.
It follows that 𝑥𝑝 ≡ 1 mod 𝑞 and 𝑦𝑞 ≡ 1 mod 𝑝. Now we obtain
𝑚 = 𝑚𝑞 𝑥𝑝 + 𝑚𝑝 𝑦𝑞 mod 𝑁.
Example 9.13. Let 𝑝 = 29, 𝑞 = 23, 𝑁 = 𝑝𝑞 = 667, 𝑒 = 3, 𝑑 = 411 be RSA parameters

and 𝑐 = 416 a ciphertext (see Example 9.5). We want to accelerate the decryption
𝑚 = 𝑐𝑑 by using the Chinese Remainder Theorem. First, we reduce the ciphertext 𝑐
and the exponent 𝑑:
𝑐𝑝 = 416 mod 29 ≡ 10, 𝑐𝑞 = 416 mod 23 ≡ 2,
𝑑𝑝 = 411 mod 28 ≡ 19, 𝑑𝑞 = 411 mod 22 ≡ 15.
Then the plaintext is computed modulo 𝑝 and modulo 𝑞. The exponents 𝑑𝑝 and 𝑑𝑞 are
much smaller than 𝑑 = 411:
𝑚𝑝 = 1019 mod 29 ≡ 21, 𝑚𝑞 = 215 mod 23 ≡ 16.
We use the Extended Euclidean Algorithm to find 𝑥, 𝑦 ∈ ℤ such that 1 = 𝑥 ⋅ 29 + 𝑦 ⋅ 23
(see Table 9.1).
Table 9.1. Extended Euclidean Algorithm on input 29 and 23.
29 ∶ 23 = 1 rem. 6 29 = 23 + 6 6 = 29 − 23
23 ∶ 6 = 3 rem. 5 23 = 3 ⋅ 6 + 5 5 = 23 − 3 ⋅ 6
6 ∶ 5 = 1 rem. 1 6=5+1 1=6−5
We obtain 1 = (29 − 23) − (23 − 3 ⋅ (29 − 23)) = 4 ⋅ 29 − 5 ⋅ 23, so that 𝑥 = 4 and

𝑦 = −5. Finally, we compute the plaintext:
𝑚 = 𝑚𝑞 𝑥𝑝 + 𝑚𝑝 𝑦𝑞 = 16 ⋅ 4 ⋅ 29 + 21 ⋅ (−5) ⋅ 23 = −559 ≡ 108 mod 667.
9.6. Padded RSA 175
9.6. Padded RSA

We saw in Section 9.3 that plain RSA is not CPA-secure, even if the parameters 𝑝, 𝑞,
𝑒, 𝑑 are appropriately chosen, because the scheme is deterministic. Furthermore, the
ciphertext is malleable and chosen ciphertext attacks are possible.
An obvious approach is to add random data before encrypting. In the Public Key
Cryptography Standard (PKCS) #1 version 1.5 [MKJR16], a message 𝑚 is transformed
into a padded message 𝑀 that has the byte length of the modulus 𝑁:
𝑀 = 00‖02‖𝑟‖00‖𝑚;
𝑟 is a random padding string of at least 8 nonzero bytes. The extra padding bytes reduce
the maximum length of the plaintext 𝑚 by 11 bytes. This is not critical since 𝑁 is usually
at least 128 bytes long. The padded message 𝑀 is encrypted using plain RSA:
𝑐 = 𝑀 𝑒 mod 𝑁.
For decryption, one computes 𝑀 = 𝑐𝑑 mod 𝑁 and checks whether the first two bytes
of 𝑀 are equal to 00‖02. Otherwise, an error is returned. The plaintext 𝑚 is then recov-
ered by stripping off the padding data. Unfortunately, this scheme can be attacked with
an adaptive chosen ciphertext attack. The adversary only needs to know whether or
not specifically crafted ciphertexts have correct padding after decryption [Bon99]. This
information can be exploited in order to decrypt a given ciphertext and, for example, to
attack the handshake of the TLS protocol, if RSA encryption of the Pre-Master-Secret
is used (Bleichenbacher’s attack). The error messages or the response time of a TLS
server can leak the information whether or not the padding is correct. To prevent this
attack, TLS servers should not inform clients about padding errors and continue with
the handshake.
Furthermore, the scheme is not CPA-secure if the random string 𝑟 is short, for ex-
ample 8 bytes long. In this case, an adversary could recover a plaintext if parts of it are
known, for example leading zeros. If 𝑟 is longer, say half the length of 𝑁, then it is con-
jectured that the scheme is CPA-secure, but since it is does not protect against chosen
ciphertexts attacks (CCA), another more complex scheme is recommended: Optimal
Asymmetric Encryption Padding (OAEP). We describe the OAEP variant standardized
in PKCS #1 version 2.2 and in RFC 8017 [MKJR16] (see Figure 9.3).
The scheme requires two hash functions in order to generate pseudorandom out-
put: a hash function 𝐻 with output byte length ℎ and a so-called mask generating func-
tion 𝑀𝐺𝐹 with input length ℎ and variable output length. RFC 8017 proposes a mask
generating function called MGF1 based on a hash function 𝐻, for example SHA-2 (see
Chapter 7). MGF1 is defined as
𝑀𝐺𝐹1(𝑠𝑒𝑒𝑑, 𝑙𝑒𝑛) = 𝐻(𝑠𝑒𝑒𝑑‖𝑐𝑡𝑟) ‖ 𝐻(𝑠𝑒𝑒𝑑‖𝑐𝑡𝑟 − 1) ‖ … ‖ 𝐻(𝑠𝑒𝑒𝑑‖0),
𝑙𝑒𝑛
where 𝑐𝑡𝑟 = ⌈ ⌉ − 1. The integers 0, … , 𝑐𝑡𝑟 are represented as binary strings of length
ℎ
four bytes. MGF1 only outputs the leading 𝑙𝑒𝑛 octets.
Figure 9.3. Encryption of a plaintext 𝑚 using RSA-OAEP.
Let 𝑚 be the plaintext message. The maximum byte length of 𝑚 is 𝑘−2ℎ−2, where
𝑘 is the length of the modulus in bytes. Firstly, 𝑚 is transformed into a data block 𝐷𝐵
of length 𝑘 − ℎ − 1 bytes. One may add a label 𝐿 or otherwise leave 𝐿 empty. 𝑃𝑆 is a
zero padding string of the required length. Then set
𝐷𝐵 = 𝐻(𝐿) ‖ 𝑃𝑆 ‖ 01 ‖ 𝑚.
The data block 𝐷𝐵 can be viewed as a padded combination of the message and a hashed
label.
Now the next step is to randomize the message. A random seed 𝑟 of length ℎ is
generated, and 𝑑𝑏𝑀𝑎𝑠𝑘 = 𝑀𝐺𝐹(𝑟, 𝑘 − ℎ − 1) gives a pseudorandom output string of
length 𝑘 − ℎ − 1 bytes. Define
𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 = 𝐷𝐵 ⊕ 𝑑𝑏𝑀𝑎𝑠𝑘,
𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 = 𝑟 ⊕ 𝐻(𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵).
𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 defines the randomized message, and 𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 is needed during de-
cryption to undo the masking of 𝐷𝐵. The encoded message 𝐸𝑀 is given by
𝐸𝑀 = 00 ‖ 𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 ‖ 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵.
The byte length of 𝐸𝑀 is 𝑘. Finally, the RSA-OAEP ciphertext is defined as
𝐶 = (𝐸𝑀)𝑒 mod 𝑁.
9.7. Factoring 177
For decryption, 𝐸𝑀 = 𝐶 𝑑 mod 𝑁 is computed and the length is checked. 𝐸𝑀 gives

𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 and 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵. Then compute
𝑟 = 𝑚𝑎𝑠𝑘𝑒𝑑𝑆𝑒𝑒𝑑 ⊕ 𝐻(𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵),
𝑑𝑏𝑀𝑎𝑠𝑘 = 𝑀𝐺𝐹(𝑟, 𝑘 − ℎ − 1),
𝐷𝐵 = 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 ⊕ 𝑑𝑏𝑀𝑎𝑠𝑘.
The expected structure of 𝐷𝐵 and the label is verified, and finally the plaintext 𝑚 is
extracted. It is important that only one type of decryption error message is given for the
different error conditions. Furthermore, the running time of OAEP implementations
should not be correlated to the type of error. Otherwise, an adversary may obtain useful
information and perform a chosen ciphertext attack.
Remark 9.14. A major result is that RSA-OAEP is secure against adaptive chosen ci-
phertexts attacks (CCA2-secure) under the RSA assumption and in the random oracle
model [FOPS01]. However, the CCA2 security was proven for the original OAEP ver-
sion of Bellare and Rogaway, not for the standardized version, which has a leading zero
byte in the encoded message 𝐸𝑀. Care must also be taken to ensure that an adversary
cannot distinguish between the different error conditions. ♢
Loosely speaking, this result means that an adversary, who knows the public key
and has access to a decryption oracle, cannot gain any information from a given cipher-
text or tamper with a ciphertext.
9.7. Factoring
Factoring algorithms have been studied since the times of Ancient Greece, and many
ideas have been contributed over the centuries, but no polynomial-time algorithm has
been found and factoring is still assumed to be a hard problem, at least on conven-
tional computers (see Chapter 13 on quantum computing). Obviously, if the factoring
assumption turns out to be wrong, then RSA is broken.
In the following, we give an overview of different approaches to factoring and dis-
cuss their algorithmic complexity. We assume that a large positive integer
𝑁 = 𝑝𝑞 is given, where the prime factors 𝑝 and 𝑞 are unknown to an adversary.
Lists of primes up to a specified bound can be generated by the ancient sieve of
Eratosthenes. The idea is to successively filter out all multiples of primes. There are
faster modern algorithms, for example the sieve of Atkin. The sieve algorithm generates
a list of prime numbers, but this is only efficient for relatively small primes.
Trial division is an elementary factoring method. It suffices to test numbers ≤ √𝑁.
A list of small primes is useful (sieve method), and otherwise all odd numbers (or per-
haps all numbers not divisible by 2, 3 or 5) need to be tested. The worst-case complexity
is 𝑂(√𝑁) and the running time is exponential in size(𝑁).
Pollard’s 𝜌 algorithm searches for an integer 𝑥 such that gcd(𝑥, 𝑁) is either 𝑝 or 𝑞,

e.g., 𝑥 ≡ 0 mod 𝑝, but 𝑥 ≢ 0 mod 𝑞. The idea is to generate a pseudorandom sequence
𝑥𝑖 = 𝑓(𝑥𝑖−1 ) of integers modulo 𝑁 and to find a collision modulo 𝑝 or 𝑞 using Floyd’s
cycle finding algorithm (see Proposition 1.63). Note that
𝑥𝑘 ≡ 𝑥2𝑘 mod 𝑝 ⟺ 𝑥𝑘 − 𝑥2𝑘 ≡ 0 mod 𝑝.
The algorithm computes pairs 𝑥𝑘 , 𝑥2𝑘 of integers modulo 𝑁 and checks whether
gcd(𝑥𝑘 − 𝑥2𝑘 , 𝑁) > 1. By the birthday paradox, around 𝑂(√𝑝) = 𝑂(𝑁 1/4 ) iterations
should be sufficient to find a collision modulo 𝑝.
Example 9.15. Let 𝑁 = 108371. We choose the function 𝑓(𝑥) = 𝑥2 + 1 mod 𝑁 and
set 𝑥0 = 1.
sage: N =108371
sage: def f(x):
return (x*x+1)
sage: x=mod (1,N); y=mod (1,N)
x=f(x)
y=f(f(y))
k=1
while (gcd(x-y,N )==1):
x=f(x)
y=f(f(y))
k=k+1
print "k =",k,"x= ",x,"y= ",y,"p =",gcd(x-y,N)
k = 18 x= 21473 y= 67523 p = 307
We obtain the factor 𝑝 = 307 after 18 iterations. The algorithm finds the collision
𝑥18 = 𝑥36 mod 𝑝 and we verify that 21473 ≡ 67523 ≡ 290 mod 307. ♢
Fermat factorization uses a representation of 𝑁 as a difference of squares:

𝑁 = 𝑥2 − 𝑦2 = (𝑥 + 𝑦)(𝑥 − 𝑦).
To find 𝑥 and 𝑦, you begin with the integer 𝑥 = ⌈√𝑁 ⌉ and increase 𝑥 by 1 until 𝑥2 − 𝑁
is square, say 𝑦2 , so that 𝑁 = 𝑥2 − 𝑦2 . Fermat factorization always works, since 𝑁 can
be written as a difference of two squares:
2 2
1 1
𝑝𝑞 = ( (𝑝 + 𝑞)) − ( (𝑝 − 𝑞)) = 𝑥2 − 𝑦2 .
2 2
However, Fermat’s method is only efficient if the prime factors are close to one another,
i.e., if 𝑦 is small. In general, the running time is 𝑂(√𝑁).
Example 9.16. Let 𝑁 = 14317; then √𝑁 ≈ 119.7. We begin with 𝑥 = 120 and obtain
𝑥2 − 𝑁 = 83, which is not a square. Next, let 𝑥 = 121 and now
𝑥2 − 𝑁 = 324 = 182
9.7. Factoring 179
is a square. Thus 𝑦 = 18 and

𝑁 = (𝑥 + 𝑦)(𝑥 − 𝑦) = (121 − 18)(121 − 18) = 103 ⋅ 139. ♢
The quadratic sieve generalizes Fermat factorization and is currently the fastest
algorithm for numbers with less than around 100 decimal digits. One looks for integers
𝑥 and 𝑦 such that 𝑥2 ≡ 𝑦2 mod 𝑁, but 𝑥 ≢ ±𝑦 mod 𝑁. This implies
𝑁 divides 𝑥2 − 𝑦2 = (𝑥 + 𝑦)(𝑥 − 𝑦),
but 𝑁 divides neither 𝑥 + 𝑦 nor 𝑥 − 𝑦. Hence gcd(𝑥 − 𝑦, 𝑁) must be a non-trivial divisor
of 𝑁 and equal either 𝑝 or 𝑞. The quadratic sieve tries to find suitable numbers 𝑥 and
𝑦. It is reasonable to choose integers 𝑥 close to √𝑁, so that 𝑥2 − 𝑁 is relatively small.
Fermat factorization requires that 𝑥2 − 𝑁 is a square number, but this is usually not the
case. Now the idea is to multiply several (non-quadratic) numbers 𝑥2 − 𝑁 with small
prime factors (smooth over a factor base). The difficult task of the quadratic sieve is to
find smooth numbers. Then one looks for a subset of smooth numbers such that their
product is a square. A solution can be found using linear algebra over 𝐺𝐹(2), since a
number is a square if the exponent of each prime factor is zero modulo 2.
The running time of the quadratic sieve is
𝑂(𝑒(1+𝑜(1))√ln(𝑁) ln(ln(𝑁))
)
(see [Pom96]), where 𝑜(1) is converging to 0 as 𝑛 → ∞. The algorithm is sub-exponen-
tial, but not polynomial.
Example 9.17. Let 𝑁 = 10441; then √𝑁 ≈ 102.2. We compute 𝑥2 − 𝑁 for a cou-

ple of integers 𝑥 ≥ 103 and factorize the result. In general, complete factorization is
inefficient and sieving should be used instead.
Suppose our factor base contains the primes 2, 3, 5, 7. Then 𝑥 = 103, 104, 107, 109
are interesting for this factor base, since 𝑥2 − 𝑁 is smooth, i.e., divisible only by primes
in the factor base.
sage: N =10441
sage: for x in range (103 ,110):
s= factor (x^2-N)
print x, x^2 - N, "=", s
103 168 = 2^3 * 3 * 7

104 375 = 3 * 5^3
105 584 = 2^3 * 73
106 795 = 3 * 5 * 53
107 1008 = 2^4 * 3^2 * 7
108 1223 = 1223
109 1440 = 2^5 * 3^2 * 5
Although 𝑥2 − 𝑁 is not a square for any of these 𝑥 and Fermat’s method cannot be
applied, their product is a square:
(1032 − 𝑁) ⋅ (1042 − 𝑁) ⋅ (1072 − 𝑁) ⋅ (1092 − 𝑁) = 212 ⋅ 36 ⋅ 54 ⋅ 72 .
Now we set 𝑥 = 103⋅104⋅107⋅109 and 𝑦 = 26 ⋅33 ⋅52 ⋅7 and obtain 𝑥2 ≡ 𝑦2 mod 𝑁. We

have 𝑥 ≡ 7491, 𝑦 ≡ 10052 modulo 𝑁 and get gcd(7491 − 10052, 10441) = 197, which
is in fact a divisor of 𝑁 = 53 ⋅ 197. ♢
At the time of writing, the number field sieve is the most efficient algorithm for
factoring large integers. With massive computing resources, numbers with more than
200 digits and for example the RSA Challenge with 768 bits could be factored using this
method. Algebraic number fields are extension fields ℚ(𝛼) of ℚ, where 𝛼 is a root of a
polynomial over ℚ. The number field sieve uses the rings ℤ[𝛼] instead of the integers
ℤ, but this topic goes beyond the scope of this book.
The heuristic complexity of the number field sieve is
1/3 ln(ln(𝑁))2/3
𝑂(𝑒(𝑐+𝑜(1)) ln(𝑁) ),
3 64
where 𝑐 = √ ≈ 1.92 (see [Pom96]).
9
1/3 2/3
Example 9.18. Suppose the sieve algorithm requires 𝑓(𝑁) = 𝑒𝑐 ln(𝑁) ln(ln(𝑁)) steps.
Then the effective key length (the bit strength) of RSA is log2 (𝑓(𝑁)). For 1024-bit RSA,
i.e., for 𝑁 ≈ 21024 and 𝑐 ≈ 1.92, one obtains ‘only’ log2 (𝑓(𝑁)) ≈ 86.7 bits. ♢
Pollard’s 𝑝 − 1 method can be applied for factoring 𝑁 = 𝑝𝑞, if 𝑝 − 1 or 𝑞 − 1

decompose into a product of small primes. In this case, we can guess multiples 𝑘 of
𝑝−1. One defines 𝑘 as a product of sufficiently large powers of small primes. If (𝑝−1) ∣
𝑘, then Euler’s Theorem 4.15 implies
𝑎𝑘 ≡ 1 mod 𝑝
for all integers 𝑎 with gcd(𝑎, 𝑝) = 1. One chooses a small integer 𝑎 > 1 and computes
𝑎𝑘 mod 𝑁. Since 𝑘 can be very large, fast exponentiation or the square-and-multiply
algorithm should be used. Finally, gcd(𝑎𝑘 − 1, 𝑁) gives either 𝑝 (method successful)
or 𝑁 (failure).
9.7. Factoring 181
Example 9.19. Let 𝑁 = 1241143. Set 𝑎 = 2 and try
𝑘 = 23 ⋅ 32 ⋅ 5 ⋅ 7 ⋅ 11 ⋅ 13 = 360360.
Of course, other products are also possible. We compute
2𝑘 mod 𝑁 = 2360360 ≡ 861526 mod 1241143.
Finally, we have
gcd(2𝑘 − 1, 𝑁) = gcd(861525, 1241143) = 547.
The method is successful and we find 𝑁 = 547 ⋅ 2269.

The prime 𝑝 = 547 is vulnerable to Pollard’s 𝑝 − 1 method since
𝑝 − 1 = 546 = 2 ⋅ 3 ⋅ 7 ⋅ 13
is a product of small primes. ♢
The Elliptic curve factorization method (ECM) is another interesting factoring al-
gorithm with sub-exponential running time. ECM is suitable for finding prime factors
with up to about 80 decimal digits, but it is less efficient than the quadratic sieve or the
number field sieve method for larger divisors. We outline ECM in Section 12.4.
Since no polynomial-time algorithm is known, the factoring assumption is cur-
rently well-founded, but in the future, quantum computers will probably be able to fac-
torize large integers. Quantum computing and Shor’s factoring algorithm are explored
in Chapter 13.
The relative success of the known factoring algorithms show that standard key
lengths of symmetric ciphers, i.e., 128 to 256 bits, are not sufficient for RSA (see Exam-
ple 9.18). With large resources, a modulus with up to around 1000 bits can be factored.
At the time of this writing, the use of 2048-bit integers 𝑁 is recommended for long-term
security against (non-quantum computing) attacks (see [BSI18]). The prime factors 𝑝
and 𝑞 should have around the same size (1024 bits) and their difference 𝑝 − 𝑞 should
be large.
Furthermore, the use of strong primes is sometimes recommended. A prime 𝑝 is
called strong if it is sufficiently large and satisfies additional conditions − in particu-
lar that 𝑝 − 1 and 𝑝 + 1 contain a large prime factor. This should provide protection
against certain factoring methods, for example Pollard’s 𝑝 − 1 method. However, the
size, randomness and independence of primes are more important and it is currently
assumed that tests on strong primes do not significantly increase the security of RSA.
9.8. Summary
• Public-key cryptosystems use a public key for encryption and a private key
for decryption. Indistinguishable encryptions under a chosen plaintext attack
(CPA security) or under an adaptive chosen ciphertext attack (CCA2 security)
are important requirements.
• The plain RSA cryptosystem uses the product of two large prime numbers and
the security relies on the difficulty to factorize a given product.
• The probabilistic Miller-Rabin algorithm can efficiently test the primality of
large integers.
• The plain RSA cryptosystem has weaknesses and the padded and randomized
RSA-OAEP scheme should be used instead. OAEP can achieve CCA2 security
under certain assumptions.
• Factoring algorithms with sub-exponential runtime exist, but no polynomial-
time algorithms are known. RSA is considered to be secure against non-
quantum computers, if the prime factors are randomly chosen and are more
than 1000 bits long.
Exercises
1. Explain why a deterministic public-key encryption scheme is insecure if the num-

ber of possible plaintexts is small.
2. Suppose that 𝑝 is a prime. Define a public-key scheme with the encryption func-
tion ℰ𝑘 (𝑚) = 𝑚𝑒 mod 𝑝 for a public key 𝑘 = (𝑒, 𝑝) and a plaintext 𝑚 ∈ ℤ∗𝑝 . Give
the private key and the decryption function. Show that this scheme is insecure.
3. Consider a plain RSA cryptosystem with modulus 𝑁 = 437 and public exponent
𝑒 = 5.
(a) Encrypt 𝑚 = 100.
(b) Factorize 𝑁 and determine the private key 𝑑.
(c) Decrypt the ciphertext and check that the result is 𝑚 = 100.
4. Suppose that 𝑚 ∈ ℤ𝑁 is chosen uniformly at random, where 𝑁 = 𝑝𝑞 and size(𝑝) =
size(𝑞) = 𝑛. Show that the probability of 𝑚 ∉ ℤ∗𝑁 is negligible.
5. Consider Example 9.5. Which ciphertexts can be decrypted by a low-exponent at-
tack? Now suppose that size(𝑁) = 2048 and 𝑒 = 216 + 1. Which plaintexts are
vulnerable to a low-exponent attack?
6. Bob’s public RSA key is (𝑒 = 35, 𝑁 = 323). Apply the plain RSA encryption
scheme:
(a) Encrypt the plaintext 𝑚 = 66 with Bob’s public key. Use the fast exponentia-
tion method.
Exercises 183
(b) Mallory eavesdrops two ciphertexts 𝑐1 = 26 and 𝑐2 = 213, which were sent
to Bob, but he does not know the plaintexts 𝑚1 and 𝑚2 . How can Mallory
compute the ciphertexts corresponding to the plaintexts 𝑚1 𝑚2 mod 𝑁 and
𝑚1 𝑚−1
2 mod 𝑁 without carrying out an attack?
(c) Mallory chooses 𝑠 = 5 and computes 𝑦 = 𝑠𝑒 mod 𝑁 ≡ 23. He wants to find
out the plaintext 𝑚 corresponding to the ciphertext 𝑐 = 104. He asks Bob to
decrypt the ‘innocent’ ciphertext 𝑐′ = 𝑦𝑐 mod 𝑁 ≡ 131 and gets the plaintext
𝑚′ = 142. Why is Mallory now able to determine 𝑚 without computing the
private exponent 𝑑 ? Determine the plaintext 𝑚.
(d) Now conduct an attack against this RSA key. Factorize 𝑁 and compute 𝑑.
7. A plaintext 𝑚 is encrypted with three different RSA moduli 𝑁1 = 901,
𝑁2 = 2581 and 𝑁3 = 4141 using the public exponent 𝑒 = 3. The ciphertexts are
𝑐1 = 98, 𝑐2 = 974, 𝑐3 = 2199. Conduct Hastad’s broadcast attack and determine
the plaintext 𝑚.
Tip: Set 𝑁 = 𝑁1 𝑁2 𝑁3 and find 𝑐 mod 𝑁 such that 𝑐 = 𝑐𝑖 mod 𝑁𝑖 for
3
𝑖 = 1, 2, 3; then compute 𝑚 = √𝑐.
8. Side-channel attacks against RSA use the power consumption of an implemen-
tation to derive the private key. Suppose a microprocessor uses the square-and-
multiply algorithm to decrypt a ciphertext with a private key 𝑑. An attacker an-
alyzes the power trace and concludes that the decryption uses the following se-
quence of modular squarings (SQ) and multiplications (MULT): SQ, SQ, SQ, SQ,
SQ, MULT, SQ, MULT, SQ, SQ, SQ, SQ, MULT, SQ, MULT.
(a) Determine the private key 𝑑.
Tip: Use the construction of the square-and-multiply algorithm given in Chap-
ter 3.
(b) The public key is (𝑒 = 11, 𝑁 = 8051). Calculate 𝜑(𝑁), 𝑝 and 𝑞 from 𝑑, 𝑒 and
𝑁 and verify your result.
9. The Fermat primality test of 𝑛 ∈ ℕ chooses a uniform random integer
𝑎 ∈ {1, … , 𝑛 − 1}, computes 𝑎𝑛−1 mod 𝑛 and outputs 𝑛 is composite, if the re-
sult is not congruent to 1. Otherwise, the test outputs 𝑛 is probably prime. Show
that the test is correct. However, there are composite numbers 𝑛 which are identi-
fied as possible primes for all 𝑎 ∈ ℤ∗𝑛 . They are called Carmichael numbers. Show
that 𝑛 = 561 is a Carmichael number.
10. Check the primality of 𝑛 = 263 using the Miller-Rabin algorithm and 𝑎 = 3 as well
as 𝑎 = 5.
11. Encrypt 𝑚 = 2314 with the plain RSA cipher and the public key (𝑒 = 5, 𝑁 =
10573). Factorize 𝑁 using Fermat’s method. Why is 𝑒 = 5 an admissible exponent,
whereas 𝑒 = 3 is not permitted? Determine the corresponding private key 𝑑. De-
crypt the ciphertext and check the result. Use the Chinese Remainder Theorem to
reduce the size of the exponents.
12. Two RSA moduli are given: 𝑁1 = 101400931 and 𝑁2 = 110107021. They have
a common prime factor. Show that both RSA keys are insecure and compute the
factorization of 𝑁1 and 𝑁2 .
13. An adversary is able to modify a RSA ciphertext. They want to square the unknown
plaintext modulo 𝑁. Why is this attack possible for plain RSA, but not if RSA-OAEP
is used?
14. Let (𝑒 = 5, 𝑁 = 10057) be the public key of an RSA cryptosystem. Encrypt the mes-
sage 𝑚 = 2090 using the plain RSA scheme. Factorize 𝑁 and find the decryption
exponent 𝑑.
15. Assume that RSA with a modulus of length 1024 bits and the encryption exponent
𝑒 = 216 + 1 is used. How many modular multiplications are needed, at most, for
encryption and for decryption?
16. Factorize 𝑁 = 2041 using the quadratic sieve method.
Remark: This example is discussed in [Pom96].
17. Factorize 𝑁 = 10573 with Pollard’s 𝑝 − 1 method. Choose 𝑎 = 2 and try 𝑘 = 23 33 ;
then give reasons why this attack is successful for the given integer 𝑁.
18. Describe an attack against RSA encryption with random padding if the padding
string is short and the number of possible plaintexts is small.
Chapter 10
Key Establishment
Keys play a crucial role in cryptography and the establishment of secret keys between
two (or more) parties is a non-trivial task. A key establishment method should prefer-
ably not require a secure channel and provide protection against adversaries.
Key distribution by a trusted authority is briefly discussed in Section 10.1. Key ex-
change or key agreement is a method where the parties exchange messages and jointly
generate a secret key. We explain key exchange protocols and discuss their security re-
quirements in Section 10.2. A widely used method is the Diffie-Hellman key exchange,
which is dealt with in Section 10.3. Diffie-Hellman is a public-key scheme, which uses
a large cyclic group in which the discrete logarithm is hard to compute. The most im-
portant example is the multiplicative group ℤ∗𝑝 of integers modulo a prime number, and
this is explained in Section 10.4. Another possibility is the group of points on an elliptic
curve over a finite field which is discussed in Section 12.2. In Section 10.5, we present
algorithms to solve the discrete logarithm problem and discuss their complexity.
Key encapsulation forms an alternative to key exchange and is covered in Section
10.6. There is also an encapsulation variant of the Diffie-Hellman key exchange. The
combination of key encapsulation and symmetric encryption gives hybrid public-key
encryption schemes, which are outlined in Section 10.7.
Key establishment, the Diffie-Hellman key exchange and the discrete logarithm
problem are standard topics in many cryptography textbooks, for example [PP10]. A
discussion of various methods for key distribution and key management can be found
in [GB08]. Refer to [KL15] for further details on security definitions and proofs of key
exchange, key encapsulation and hybrid encryption schemes.
185
186 10. Key Establishment
10.1. Key Distribution

The secure establishment of secret keys is a major prerequisite of symmetric cryptog-
raphy. Until the 1970s, it was assumed that an initial confidential and authenticated
channel was necessary for key distribution. The channel is used to establish a long-
term secret key between two parties in a key predistribution scheme. After the initial
setup, the long-term key can be used to encrypt a short-term session key that is sent
over a channel that may be insecure.
The distribution of keys can be delegated to a trusted third party or a central in-
stance. With appropriate protocols, it is sufficient to share a long-term secret key with
a Key Distribution Center (KDC). Basically, the KDC generates and encrypts a random
session key under the long-term secret key of the involved parties. Then the KDC sends
the encrypted session key to the destination (see Figure 10.1). The parties decrypt the
session key and leverage it to protect their communication.
𝑘𝐴 𝑘𝐵
n
pr
io
e-
ut
di
ib
str
str
ib
KDC
di
ut
e-
$
io
𝑘 ← {0, 1}𝑛
pr
𝐸𝑘
n
(𝑘) 𝐵 (𝑘)
𝐸 𝑘𝐴
𝐸𝑘 (𝑚)
𝐴 𝐵
Figure 10.1. Key distribution using a trusted third party.
We note that the basic protocol described above is only secure against eavesdrop-
ping and not against active attacks. Kerberos is an example of a more advanced and
widely used key distribution protocol (see [GB08] and RFC 4120 [NYHR05]).
The question of how to bootstrap the key distribution and establish long-term se-
cret keys remains. In the following section, we define the security requirements of
key exchange protocols that do not assume a secure channel ahead of time. In Section
10.3, we will see that the Diffie-Hellman key exchange is an important example of such
a protocol.
10.2. Key Exchange Protocols

A key exchange (or key agreement) protocol is a distributed algorithm between two (or
more) parties who exchange messages and finally compute a secret key. Now, we do
10.2. Key Exchange Protocols 187
not assume a pre-distribution of keys or a secure channel between the parties. Never-
theless, the protocol should be secure against eavesdropping attacks.
Definition 10.1. Suppose a key exchange protocol is given. Consider the following
experiment (see Figure 10.2). Two communication parties (Alice and Bob) hold 1𝑛 ,
exchange the messages 𝑚 and derive a key 𝑘 of length 𝑛. A challenger chooses a random
$ $
bit 𝑏 ← {0, 1}. If 𝑏 = 1 set 𝑘′ = 𝑘, and otherwise 𝑘′ ← {0, 1}𝑛 is chosen uniformly at
random. An adversary 𝐴 is given 1𝑛 , the transcript 𝑚 and the challenge 𝑘′ . They try to
guess 𝑏, i.e., to distinguish between the secret key 𝑘 and a random string, and output
a bit 𝑏′ . The challenger outputs 1 if 𝑏 = 𝑏′ , and 0 otherwise. The key exchange (KE)
advantage of 𝐴 is defined as
KE−eav
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The key exchange protocol is secure in the presence of an eavesdropper (EAV-secure) if,
KE−eav
for every probabilistic polynomial time adversary 𝐴, the advantage Adv (𝐴) is
negligible in 𝑛. ♢
Alice Bob
𝑚
Derive 𝑘 Derive 𝑘
1𝑛 , 𝑚 $ $
𝑏 ← {0, 1}, 𝑟 ← {0, 1}𝑛
𝑘′ 𝑘 if 𝑏 = 1
𝑘′ = {
𝑟 if 𝑏 = 0
𝑏′
output 1 or 0
Figure 10.2. Key distinguishability experiment.
The above definition of EAV security requires that the protocol messages 𝑚 do
not reveal a single bit of information on the key 𝑘 to an eavesdropper. Otherwise, the
adversary would be able to distinguish between 𝑘 and a random string.
Remark 10.2. Note that the above experiment assumes a passive attacker who is un-
able to change or inject any messages. The presence of active adversaries requires an
authenticated key exchange (AKE) protocol, where the communication partners are
able to verify the authenticity of messages. Yet another topic is perfect forward secrecy
(PFS), which guarantees the security of past session keys if long-lived keys are exposed.
10.3. Diffie-Hellman Key Exchange

The Diffie-Hellman protocol was a breakthrough in cryptography, because it solved the
problem of secure key exchange over an insecure channel without pre-distribution of
secret keys. The protocol was the first published public-key algorithm [DH76] and the
starting point of a new type of cryptography.
In this section, we explain the protocol for an arbitrary cyclic group 𝐺. A standard
choice of 𝐺 are (subgroups of) the multiplicative group ℤ∗𝑝 of integers modulo a prime
number 𝑝 (see Section 10.4). The multiplicative groups 𝐺𝐹(2𝑚 )∗ can also be used, but
research has shown that they are less secure than groups ℤ∗𝑝 of the same size. Another
option is the group of points on an elliptic curve over a finite field. We explore the
elliptic curve Diffie-Hellman key exchange in Section 12.2.
The Diffie-Hellman key exchange protocol requires a cyclic group 𝐺 of order 𝑞
and a generator 𝑔 ∈ 𝐺. The parameters (𝐺, 𝑞, 𝑔) are public and have to be exchanged
between the communication parties Alice and Bob if they are not known in advance.
Alice generates a private uniform random number 𝑎 ∈ ℤ𝑞 , i.e., a positive integer less
than 𝑞, and sends 𝐴 = 𝑔𝑎 ∈ 𝐺 to Bob. Bob also chooses a private uniform element
𝑏 ∈ ℤ𝑞 and sends 𝐵 = 𝑔𝑏 ∈ 𝐺 to Alice. The communication channel between Alice
and Bob can be public. Alice derives the shared secret key by computing 𝑘 = 𝐵 𝑎 ∈ 𝐺
and Bob computes 𝑘 = 𝐴𝑏 ∈ 𝐺 (compare Figure 10.3). The scheme is correct, since 𝐵 𝑎
and 𝐴𝑏 are both equal to 𝑔𝑎𝑏 ∈ 𝐺.
Alice 𝐺, 𝑞, 𝑔 Bob
$ $
𝑎 ← ℤ 𝑞 , 𝐴 = 𝑔𝑎 𝑏 ← ℤ 𝑞 , 𝐵 = 𝑔𝑏
𝐴
𝐵
𝑘 = 𝐵𝑎 𝑘 = 𝐴𝑏
Figure 10.3. Diffie-Hellman key exchange between Alice and Bob.
Note that the result of the Diffie-Hellman key exchange is a group element, not a
binary string. In practice, one applies a key derivation function to 𝑘, which transforms
the group element into a binary string.
The security of the Diffie-Hellman key exchange is closely related to the discrete
logarithm (DL) problem. If 𝑔 is a generator of 𝐺 and ord(𝐺) = ord(𝑔) = 𝑞, then
𝐺 = {𝑒, 𝑔1 , … , 𝑔𝑞−1 }.
10.3. Diffie-Hellman Key Exchange 189
We therefore have a bijection between the elements of 𝐺 and the exponents

0, 1, … , 𝑞 − 1. For each ℎ ∈ 𝐺, we call the corresponding exponent the discrete loga-
rithm of ℎ to the base 𝑔 and write log𝑔 (ℎ). One has
log𝑔 (ℎ)
𝑔 = ℎ.
In the Diffie-Hellman protocol, the exponents 𝑎 = log𝑔 (𝐴) and 𝑏 = log𝑔 (𝐵) are kept
secret. One assumes that an eavesdropper is not able to efficiently compute discrete
logarithms of 𝐴. Of course, some discrete logarithms, for example log𝑔 (𝑒) = 0 and
log𝑔 (𝑔) = 1, are obvious and pre-computed tables of powers 𝑔𝑎 can also be used. The
discrete logarithm problem is called hard relative to the generated parameters if for
any probabilistic polynomial-time adversary 𝐴, there is only a negligible probability
(in terms of the group size) that an adversary finds the discrete logarithm log𝑔 (𝐴) of a
uniform group element 𝐴.
The Computational Diffie-Hellman (CDH) problem is to compute the Diffie-
Hellman shared secret 𝑘 = 𝑔𝑎𝑏 = 𝐴𝑏 = 𝐵 𝑎 for given uniform elements 𝐴, 𝐵 ∈ 𝐺.
The CDH problem is not harder than the discrete logarithm (DL) problem. In other
words, a solution to the DL problem also solves the CDH problems.
Now, the security of the Diffie-Hellman protocol relies on the Decisional Diffie-
Hellman (DDH) problem defined below. The DDH problem is not harder than the
CDH and the DL problem.
Definition 10.3. Consider the following experiment (see Figure 10.4): on input 1𝑛 ,
a cyclic group 𝐺 = ⟨𝑔⟩ of order 𝑞 is generated where size(𝑞) = 𝑛. Choose random
$ $
numbers 𝑎 ← ℤ𝑞 , 𝑏 ← ℤ𝑞 , compute 𝐴 = 𝑔𝑎 , 𝐵 = 𝑔𝑏 , 𝑘 = 𝑔𝑎𝑏 and choose a random
$
bit 𝑏 ← {0, 1}. An adversary obtains 𝐺, 𝑞, 𝑔, 𝐴, 𝐵 and 𝑘 (if 𝑏 = 1) or a uniform random
$
element 𝑟 ← 𝐺 (if 𝑏 = 0). The adversary tries to guess 𝑏 and outputs a bit 𝑏′ . The
output of the experiment is 1 if 𝑏′ = 𝑏, and 0 otherwise.
The DDH-advantage of 𝐴 is
DDH
Adv (𝐴) = |𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] |.
The DDH problem is hard relative to the generation of group parameters, if for every
DDH
probabilistic polynomial time adversary 𝐴, the advantage Adv (𝐴) is negligible in
terms of 𝑛. ♢
If the DL problem or the CDH problem is easy, then the DDH problem is easy, too,
but the converse is not known. The DDH assumption is therefore stronger than the DL
assumption.
Theorem 10.4. If the DDH problem is hard relative to the generation of group parame-
ters, then the Diffie-Hellman key exchange protocol is secure in the presence of an eaves-
dropper (EAV-secure).
𝐺, 𝑞, 𝑔, 𝐴, 𝐵 $
𝑎, 𝑏 ← ℤ𝑞 , 𝐴 = 𝑔𝑎 , 𝐵 = 𝑔𝑏
$ $
𝑏 ← {0, 1}, 𝑟 ← 𝐺
𝑘′ 𝑘 if 𝑏 = 1
𝑘′ = {
𝑟 if 𝑏 = 0
𝑏′
output 1 or 0
Figure 10.4. Decisional Diffie-Hellman (DDH) experiment.
Remark 10.5. It is not difficult to show that solving the above DDH experiment and
distinguishing a Diffie-Hellman shared secret 𝑘 from a uniform random element are
equivalent problems (see [KL15]). Note that 𝑘 is an element of group 𝐺. The above
Theorem therefore requires a slightly modified key distinguishability experiment: the
key and the random element are from group 𝐺 instead of an 𝑛-bit string. Alternatively,
one applies a key derivation function to transform the shared secret key into a binary
string.
Remark 10.6. It is important to observe that the plain Diffie-Hellman protocol does
not protect against active adversaries. If an attacker is able to replace 𝐴 and 𝐵 with
their own parameters, then they can perform a Man-in-the-Middle attack. The prob-
lem of the plain Diffie-Hellman protocol is the lack of authenticity (see Remark 10.2).
In practice, one often signs the public keys in order to prove their authenticity. Sig-
natures are covered in Chapter 11. However, some issues remain, because at some
point a trusted public key (a trust anchor) is needed.
10.4. Diffie-Hellman using Subgroups of ℤ∗𝑝

The multiplicative group ℤ∗𝑝 and its subgroups are often used in a Diffie-Hellman key
exchange. In fact, the original construction in [DH76] was based on this type of group.
The idea is that the modular power function 𝑓(𝑥) = 𝑔𝑥 mod 𝑝 is a one-way permu-
tation, which is easy to compute but is hard to invert if the parameters 𝑝 and 𝑔 are
properly chosen.
If 𝑝 is a prime, then ℤ∗𝑝 = 𝐺𝐹(𝑝)∗ is a cyclic group of order 𝑝 − 1 (see Theorem
4.24). Any 𝑔 ∈ ℤ∗𝑝 generates a cyclic subgroup 𝐺 of order 𝑞 = ord(𝑔) ∣ 𝑝 − 1 and
could be used in the protocol, but since the discrete logarithm problem must be hard,
𝑞 should be large. However, this condition is not sufficient if 𝑞 can be factorized into
10.4. Diffie-Hellman using Subgroups of ℤ∗𝑝 191
a product of small primes. For the hardness of the discrete logarithm and the DDH
problem, it is advisable for 𝑞 to be a large prime.
Suppose that ℎ is a generator of ℤ∗𝑝 , i.e., ord(ℎ) = 𝑝 − 1, and 𝑝 − 1 = 𝑟𝑞, where 𝑞 is
𝑝−1
a large prime. Then ord(ℎ𝑟 ) = = 𝑞 and 𝐺 = ⟨ℎ𝑟 ⟩ is a cyclic group of prime order
𝑟
𝑟
𝑞. We let 𝑔 = ℎ and thus obtain Diffie-Hellman parameters.
Furthermore, safe primes are useful in the generation of Diffie-Hellman parame-
𝑝−1
ters. A prime 𝑝 is called safe if 𝑞 = is prime. Then 𝑝 − 1 = 2𝑞 and the order of any
2
∗
element 𝑔 ∈ ℤ𝑝 with 𝑔 ≢ ±1 is either 𝑝 − 1 or 𝑞.
Example 10.7. Let 𝑝 = 59. Of course, 𝑝 is much too small to be secure. One has
ord(ℤ∗𝑝 ) = 𝑝 − 1 = 2 ⋅ 29, so 59 is a safe prime. We are looking for a generator of ℤ∗𝑝
and try ℎ = 2 mod 59. In fact, ord(ℎ) = 58, i.e., ℎ is a primitive root mod 59, since
ℎ2 = 4 ≢ 1 and ℎ29 ≡ 58 ≢ 1 mod 59
(see Algorithm 4.1). Hence ℎ generates the full multiplicative group of order 58 and
𝑔 = ℎ2 generates a subgroup of prime order 29.
(1) We perform a Diffie-Hellman key exchange with the parameters 𝐺 = ℤ∗59 , ℎ =
2 mod 59 and ord(ℎ) = 58. Alice selects 𝑎 = 7 and sends 𝐴 = 27 mod 59 ≡ 10
to Bob. Bob chooses 𝑏 = 24 and transmits 𝐵 = 224 mod 59 ≡ 35 to Alice. Alice
computes 𝑘 = 𝐵 𝑎 = 357 mod 59 ≡ 12 and Bob obtains the same key 𝑘 = 𝐴𝑏 =
1024 mod 59 ≡ 12.
(2) Set 𝑔 = ℎ2 = 4 mod 59 and 𝐺 = ⟨4⟩ ⊂ ℤ∗59 and perform a Diffie-Hellman
exchange using 𝐺, 𝑔 and 𝑞 = ord(𝑔) = 29. Alice selects 𝑎 = 7 and sends 𝐴 =
47 mod 59 ≡ 41 to Bob. Bob chooses 𝑏 = 24 and sends 424 mod 59 ≡ 45 to Alice.
Alice computes 𝑘 = 457 mod 59 ≡ 26 and Bob gets 𝑘 = 4124 mod 59 ≡ 26. ♢
In practice, standardized parameters 𝐺, 𝑔 and 𝑞 are used. A set of pre-defined

parameters is called a Diffie-Hellman group. Note that – in contrast to RSA – the Diffie-
Hellman parameters 𝐺, 𝑔 and 𝑞 can be re-used. However, every key exchange should
use fresh uniform random exponents 𝑎 and 𝑏 (see Exercise 10).
Example 10.8. RFC 7919 [Gil16] defines a 2048-bit Diffie-Hellman group:
p = FFFFFFFF FFFFFFFF ADF85458 A2BB4A9A AFDC5620 273D3CF1
D8B9C583 CE2D3695 A9E13641 146433FB CC939DCE 249B3EF9
7D2FE363 630C75D8 F681B202 AEC4617A D3DF1ED5 D5FD6561
2433F51F 5F066ED0 85636555 3DED1AF3 B557135E 7F57C935
984F0C70 E0E68B77 E2A689DA F3EFE872 1DF158A1 36ADE735
30ACCA4F 483A797A BC0AB182 B324FB61 D108A94B B2C8E3FB
B96ADAB7 60D7F468 1D4F42A3 DE394DF4 AE56EDE7 6372BB19
0B07A7C8 EE0A6D70 9E02FCE1 CDF7E2EC C03404CD 28342F61
9172FE9C E98583FF 8E4F1232 EEF28183 C3FE3B1B 4C6FAD73
3BB5FCBC 2EC22005 C58EF183 7D1683B2 C6F34A26 C1B2EFFA
886B4238 61285C97 FFFFFFFF FFFFFFFF
g = 2
𝑝−1
The group order is 𝑞 = . ♢
2
Finally, we briefly discuss the efficiency of Diffie-Hellman. The Diffie-Hellman

key exchange requires two exponentiations by each communication partner. The public
keys 𝐴 = 𝑔𝑎 and 𝐵 = 𝑔𝑏 can be pre-computed if the group and the generator are known
in advance. Then the shared secret 𝑘 can be determined with only one exponentiation
by each party. The modulus is 𝑝, but the exponent is bounded by 𝑞. Therefore, the
running time of the modular exponentiations is 𝑂(size(𝑝)2 size(𝑞)).
10.5. Discrete Logarithm

We have seen that solving the discrete-logarithm (DL) problem breaks the Diffie-
Hellman key exchange. In the following, we give a brief overview of existing algo-
rithms to compute discrete logarithms.
Suppose that a cyclic group 𝐺 (not necessarily a subgroup of ℤ∗𝑝 ), a generator 𝑔 of
order 𝑞, 𝑛 = size(𝑞) and 𝐴 = 𝑔𝑎 are given. We are looking for algorithms computing
the exponent 𝑎. Note that we use the multiplicative notation for the group 𝐺.
Obviously, one can compute 𝑔𝑥 ∈ 𝐺 for all exponents 0 ≤ 𝑥 < 𝑞 and compare
the result with 𝐴. The complexity of this approach is 𝑂(𝑞) = 𝑂(2𝑛 ) (in the worst case)
and hence exponential. This can be reduced to 𝑂(√𝑞) = 𝑂(2𝑛/2 ), which is still expo-
nential, using Shank’s Babystep-Giantstep algorithm: let 𝑚 = ⌊√𝑞⌋; then the unknown
exponent 𝑎 can be written as
𝑎 = 𝑚𝑠 + 𝑟, where 𝑟 < 𝑚.
Since 𝐴 = 𝑔 , one has 𝐴 = 𝑔 𝑔 or, equivalently, 𝐴𝑔−𝑟 = (𝑔𝑚 )𝑠 . The elements 𝐴𝑔−𝑟
𝑎 𝑚𝑠 𝑟
for 0 ≤ 𝑟 < 𝑚 are called babysteps and have to be stored. If one of the babysteps equals
1, then set 𝑎 = 𝑟 and the problem is solved. Otherwise, set 𝑇 = 𝑔𝑚 , compute the
giantsteps 𝑇 𝑠 for 0 < 𝑠 ≤ 𝑚 and compare them to the babysteps. If the giantstep 𝑇 𝑠
is equal to the babystep 𝐴𝑔−𝑟 , then the solution to the DL problem is 𝑎 = 𝑚𝑠 + 𝑟. In
the worst case, all babysteps and giantsteps have to be computed, which requires 2𝑚
exponentiations. Furthermore, 𝑚 babysteps have to be stored. Hence the running time
and the space complexity is 𝑂(2𝑛/2 ).
Example 10.9. Let 𝑝 = 59 and 𝑔 = 4 ∈ ℤ∗𝑝 ; then 𝑞 = ord(𝑔) = 29. Suppose an
adversary eavesdrops 𝐴 = 41 (see Example 10.7). They compute the discrete logarithm
using the Babystep-Giantstep algorithm: 𝑚 = ⌊√29⌋ = 5, 𝑔−1 = (4 mod 59)−1 ≡ 15.
The babysteps are:
𝐴 = 41, 𝐴𝑔−1 = 25, 𝐴𝑔−2 = 21, 𝐴𝑔−3 = 20, 𝐴𝑔−4 = 5.
Furthermore, 𝑇 = 𝑔𝑚 = 45 mod 59 ≡ 21. Hence the first giantstep matches the
second babystep and the solution is 𝑎 = 1⋅5+2 = 7. In fact, 𝑔𝑎 = 47 mod 59 ≡ 41. ♢
Pollard’s 𝜌 method for logarithms has about the same running time 𝑂(√𝑞) = 𝑂(2𝑛/2 )
as the Babystep-Giantstep algorithm, but requires much less storage. We briefly outline
10.5. Discrete Logarithm 193
the approach: Generate a sequence of elements 𝑔𝑥 𝐴𝑦 ∈ 𝐺, where (𝑥, 𝑦) ∈ ℤ𝑞 × ℤ𝑞 , and

use Floyd’s cycle finding algorithm (see Proposition 1.63) to find a collision
𝑔𝑥1 𝐴𝑦1 = 𝑔𝑥2 𝐴𝑦2 .
Since 𝐺 is cyclic of order 𝑞, the exponents satisfy 𝑥1 + 𝑎𝑦1 ≡ 𝑥2 + 𝑎𝑦2 mod 𝑞, so that 𝑎
is equal to
𝑥 − 𝑥1
𝑎= 2 mod 𝑞.
𝑦1 − 𝑦2
The Pohlig-Hellman algorithm can be applied if 𝑞 = ord(𝐺) is a product of powers

of small primes. By Theorem 4.29, every finite abelian group is isomorphic to a prod-
uct of cyclic groups of prime-power order. The discrete logarithm can be computed
separately for each of the factors, and the Chinese Remainder Theorem combines this
into a result for the full group. Furthermore, one can speed up the computation of dis-
crete logarithms in groups of prime-power order. We refer to [Sho09] for more details.
Therefore, 𝑞 should be a prime or at least contain a large prime divisor.
Example 10.10. Let 𝑝 = 73 and 𝑔 = 11 mod 73. One can easily show that 𝑔 is a
generator of 𝐺 = ℤ∗𝑝 . Suppose 𝐴 = 62 mod 73 is given and we want to compute
log𝑔 (𝐴). Since ord(𝐺) = 72 = 8 ⋅ 9, one has an isomorphism
𝐺 ≅ 𝐺1 × 𝐺2 ,
where 𝐺1 and 𝐺2 are cyclic subgroups of order 8 and 9, respectively. The isomorphism
maps ℎ ∈ 𝐺 to (ℎ9 , ℎ8 ) ∈ 𝐺1 × 𝐺2 . The image of 𝑔 in 𝐺1 × 𝐺2 is given by
𝑔9 = 119 mod 73 ≡ 22 , 𝑔8 = 118 mod 73 ≡ 2,
and the image of 𝐴 in 𝐺1 × 𝐺2 is given by
𝐴9 = 629 mod 73 ≡ 51 and 𝐴8 = 628 mod 73 ≡ 2.
Now we can compute the discrete logarithm in the smaller subgroups 𝐺1 = ⟨22⟩ and
𝐺2 = ⟨2⟩:
log22 (51) = 5 and log2 (2) = 1.
The discrete logarithm in the full group 𝐺 is a positive integer 𝑎 which satisfies
𝑎 ≡ 5 mod 8 and 𝑎 ≡ 1 mod 9.
By the Chinese Remainder Theorem 4.26, these congruences have a unique solution
modulo 72. The Extended Euclidean Algorithm on input 8 and 9 yields the equation
1 = −8 + 9, and thus
𝑎 = 5 ⋅ 9 + 1 ⋅ (−8) = 37.
This is indeed the discrete logarithm of 𝐴 = 62, since 𝑔𝑎 ≡ 1137 mod 73 ≡ 62. ♢
The Index-Calculus algorithm fixes a factor base 𝐵 consisting of small primes and
first computes log𝑔 (𝑝𝑖 ) for all 𝑝𝑖 ∈ 𝐵. Then random integers 0 < 𝑥 < 𝑝 are chosen until
𝑔𝑥 𝐴 mod 𝑝 is a product of primes in the factor base. If this is the case, then log𝑔 (𝑔𝑥 𝐴)
can be determined using the pre-computed discrete logarithms log𝑔 (𝑝𝑖 ). Finally, one
obtains
log𝑔 (𝐴) = log𝑔 (𝑔𝑥 𝐴) − 𝑥 mod 𝑝 − 1.
The algorithm can only be applied to the multiplicative groups of finite fields (and to
some families of elliptic curves). The expected running time is
𝑂(𝑒(√2+𝑜(1))√ln(𝑝) ln(ln(𝑝)) ).
The Number Field Sieve for Discrete Logarithms is currently the best available al-
gorithm for the multiplicative group, and its heuristic sub-exponential running time
1 2
64
is 𝑂(𝑒(𝑐+𝑜(1)) ln(𝑝) 3 ln(ln(𝑝)) 3 ), where 𝑐 = ( )1/3 ≈ 1.92 (see [JOP14]). Note that the
9
running time depends on the size of 𝑝 and not on the size of the group order 𝑞. The
effective key length for a 1024-bit prime 𝑝 is only around 86 bits, and at the time of this
writing primes of at least 2000 bits are recommended.
Polynomial-time algorithms are not known, but the discrete logarithm problem
can be efficiently solved with quantum computers (see Chapter 13).
10.6. Key Encapsulation

Key encapsulation is a mechanism where a public-key scheme is leveraged to establish a
secret key over an insecure channel. The sender encapsulates a randomly chosen secret
key using the public key of the receiver and the receiver decapsulates the symmetric
key using their private asymmetric key (see Figure 10.5).
Example 10.11. Suppose a public-key encryption scheme is given. Bob possesses a key
pair (𝑝𝑘, 𝑠𝑘) and Alice has a copy of his public key 𝑝𝑘. Now Alice generates a uniform
random symmetric key 𝑘 and encrypts 𝑘 using 𝑝𝑘. The ciphertext 𝑐 = ℰ𝑝𝑘 (𝑘) is sent to
Bob, who decrypts 𝑐 and recovers 𝑘 using his private key 𝑠𝑘. ♢
This is an example of a Key Encapsulation Mechanism (KEM), but the definition

given below will be more general. Encapsulation and decapsulation is not necessarily
identical to encryption and decryption, respectively. Furthermore, padding and ran-
domization in encryption schemes such as RSA-OAEP may not be required for key
encapsulation purposes.
Definition 10.12. A key encapsulation mechanism (KEM) consists of the following
probabilistic polynomial-time algorithms (see Figure 10.5):
• A key generation algorithm 𝐺𝑒𝑛(1𝑛 ) takes a security parameter 1𝑛 as input and
outputs a key pair (𝑝𝑘, 𝑠𝑘).
• The encapsulation algorithm 𝐸𝑛𝑐𝑎𝑝𝑠 takes a public key 𝑝𝑘 and a security param-
eter 1𝑛 as input. It outputs a ciphertext 𝑐 and a key 𝑘 ∈ {0, 1}𝑛 or, more generally,
10.6. Key Encapsulation 195
$
a key of length 𝑙(𝑛). We write (𝑐, 𝑘) ← 𝐸𝑛𝑐𝑎𝑝𝑠𝑝𝑘 (1𝑛 ), where 𝑐 is public and 𝑘 is
secret.
• The decapsulation algorithm 𝐷𝑒𝑐𝑎𝑝𝑠 takes a private key 𝑠𝑘 and a ciphertext 𝑐 as
input. It outputs a key 𝑘 = 𝐷𝑒𝑐𝑎𝑝𝑠𝑠𝑘 (𝑐) or an error symbol ⟂.
$
The KEM provides correct encapsulation, if for (𝑐, 𝑘) ← 𝐸𝑛𝑐𝑎𝑝𝑠𝑝𝑘 (1𝑛 ), one has
𝐷𝑒𝑐𝑎𝑝𝑠𝑠𝑘 (𝑐) = 𝑘. ♢
1𝑛 𝑝𝑘 𝑠𝑘
𝑐
𝐸𝑛𝑐𝑎𝑝𝑠 𝐷𝑒𝑐𝑎𝑝𝑠
𝑘 𝑘
Figure 10.5. Key encapsulation and decapsulation.
A key encapsulation mechanism naturally defines a key establishment method:

Alice takes Bob’s public key 𝑝𝑘, runs the encapsulation algorithm and keeps 𝑘. She
sends the ciphertext 𝑐 to Bob, who obtains 𝑘 by running the decapsulation algorithm.
A KEM is secure under chosen plaintexts attacks (CPA-secure) if an adversary,
who has access to 𝑝𝑘 and 𝑐, cannot distinguish between the encapsulated key 𝑘 and
a random string of the same length. CPA security means that an adversary does not
learn a single bit of 𝑘 from the ciphertext 𝑐.
Definition 10.13. Suppose a key encapsulation mechanism is given. Consider the
following experiment: on input 1𝑛 , the algorithm 𝐺𝑒𝑛(1𝑛 ) generates a key pair (𝑝𝑘, 𝑠𝑘).
$
Then 𝐸𝑛𝑐𝑎𝑝𝑠𝑝𝑘 (1𝑛 ) is run and outputs (𝑐, 𝑘). A uniform random bit 𝑏 ← {0, 1} and a
$
uniform random string 𝑟 ← {0, 1}𝑛 are chosen. If 𝑏 = 1 then set 𝑘′ = 𝑘, and otherwise
𝑘′ = 𝑟. An adversary 𝐴 is given (𝑝𝑘, 𝑐, 𝑘′ ), but the private key 𝑠𝑘 and 𝑏 are not known to
the adversary. 𝐴 has to distinguish between the cases 𝑘′ = 𝑘 and 𝑘′ = 𝑟. The adversary
outputs 𝑏′ = 1 if they suspect that they obtained the key 𝑘 and otherwise output 𝑏′ = 0.
The output of the experiment is 1 if 𝑏′ = 𝑏, and 0 otherwise.
The IND-CPA advantage of the adversary 𝐴 is defined as
ind−cpa
Adv (𝐴) = | 𝑃𝑟[𝑏′ = 𝑏] − 𝑃𝑟[𝑏′ ≠ 𝑏] | .
The scheme is CPA-secure if, for all probabilistic polynomial time adversaries 𝐴, the
ind−cpa
A stronger notion is security against adaptive chosen ciphertext attacks (CCA2 se-
curity). The corresponding experiment gives the adversary additional access to a de-
capsulation oracle (before and after obtaining the challenge), but they may not request
the decapsulation of the challenge ciphertext 𝑐.
We construct a KEM based on RSA encryption, which can be shown to be CCA2-
secure.
Definition 10.14. The RSA key encapsulation mechanism is defined as follows:
• The key generation algorithm 𝐺𝑒𝑛(1𝑛 ) is identical to the RSA key generation (see
Definition 9.4) and outputs a public key 𝑝𝑘 = (𝑒, 𝑁) and a private key 𝑠𝑘 = (𝑑, 𝑁).
Furthermore, a hash function 𝐻 ∶ ℤ∗𝑁 → {0, 1}𝑛 is fixed.
• The encapsulation algorithm 𝐸𝑛𝑐𝑎𝑝𝑠 takes the public key 𝑝𝑘 as input and chooses
a uniform random element 𝑠 ∈ ℤ∗𝑁 . It outputs the ciphertext
𝑐 = 𝑠𝑒 mod 𝑁
and the key 𝑘 = 𝐻(𝑠).

• The decapsulation algorithm 𝐷𝑒𝑐𝑎𝑝𝑠 takes 𝑐 and the private key 𝑠𝑘 as input. It
computes
𝑠 = 𝑐𝑑 mod 𝑁
and outputs 𝑘 = 𝐻(𝑠). ♢
We infer from the RSA construction (see Definition 9.4) that the above encapsula-
tion mechanism is correct. If the RSA assumption holds and the hash function behaves
like a random oracle, then CPA security follows from the fact that an adversary is un-
able to derive 𝑠 from 𝑐. But if 𝑠 is unknown then 𝐻(𝑠) is uniform random. Note that
padding schemes like OAEP are not required here since 𝑠 is uniform in ℤ∗𝑁 . Further-
more, the RSA key encapsulation mechanism even turns out to be CCA2-secure if the
hash function has no weaknesses. An adversary with access to a decapsulation oracle
only gets the hash value 𝑘′ = 𝐻((𝑐′ )𝑑 mod 𝑁) on input 𝑐′ . However, this does not re-
veal any information about 𝑘 = 𝐻(𝑐𝑑 mod 𝑁) if 𝑐 ≠ 𝑐′ , since hashes of different input
values are uncorrelated. We refer to [KL15] for a proof of the following Theorem:
Theorem 10.15. If the RSA assumption holds and 𝐻 is modeled as a random oracle,
then the RSA key encapsulation mechanism is CCA2-secure. ♢
The Diffie-Hellman key exchange protocol can also be turned into a key encap-
sulation mechanism. The Diffie-Hellman KEM can be viewed as an adaption of the
ElGamal encryption scheme (see Exercise 12).
Definition 10.16. The Diffie-Hellman key encapsulation mechanism is defined by the

following algorithms:
• The key generation algorithm 𝐺𝑒𝑛 takes 1𝑛 as input and outputs a cyclic group
𝐺 of order 𝑞 with 𝑛 = size(𝑞), a generator 𝑔 ∈ 𝐺, a uniform random element
𝑏 ∈ ℤ𝑞 and 𝐵 = 𝑔𝑏 . The public key is 𝑝𝑘 = (𝐺, 𝑞, 𝑔, 𝐵) and the private key is
𝑠𝑘 = (𝐺, 𝑞, 𝑔, 𝑏). Also fix a function 𝐻 ∶ 𝐺 → {0, 1}𝑛 .
• The encapsulation algorithm takes 𝑝𝑘 as input, chooses a uniform random ele-
ment 𝑎 ∈ ℤ𝑞 and outputs the ciphertext 𝑐 = 𝐴 = 𝑔𝑎 and the key 𝑘 = 𝐻(𝐵 𝑎 ).
• The decapsulation algorithm 𝐷𝑒𝑐𝑎𝑝𝑠 takes 𝑠𝑘 and 𝑐 as input and outputs the key
𝑘 = 𝐻(𝑐𝑏 ). ♢
The encapsulated key is 𝐻(𝑘′ ), where 𝑘′ is the shared Diffie-Hellman secret 𝑔𝑎𝑏 .
Since 𝑘′ = 𝑔𝑎𝑏 = 𝐴𝑏 = 𝐵 𝑎 , the encapsulation method is correct. The security depends
on standard assumptions about the Diffie-Hellman problem and on properties of 𝐻.
Theorem 10.17. Suppose the computational Diffie-Hellman (CDH) problem is hard rel-
ative to the generation of group parameters and 𝐻 is modeled as a random oracle. Then
the Diffie-Hellman key encapsulation mechanism is CPA-secure. ♢
The proof can be found in [KL15]. There is also a security guarantee without the
use of a random oracle. Under the stronger gap-CDH assumption the Diffie-Hellman
key encapsulation mechanism can be shown to be CCA2-secure. The gap-CDH prob-
lem (see [OP01]) gives the adversary access to a Decisional Diffie-Hellman oracle that
answers whether (𝑔, 𝐴, 𝐵, 𝑘′ ) is a valid Diffie-Hellman quadruple, i.e., whether or not
log (𝐵)
𝑘′ = 𝐴 𝑔 .
10.7. Hybrid Encryption

You may have noticed that we have not discussed public-key encryption of messages of
arbitrary length. Multiple encryptions are of course possible, but are rarely used in prac-
tice because public-key schemes are not fast enough to process mass data. The stan-
dard approach is to combine public-key and symmetric-key encryption. The public-key
scheme is only used for key establishment and the data is encrypted with a symmetric
algorithm. This method is called hybrid encryption.
Definition 10.18. Suppose a key encapsulation mechanism (KEM) and a symmetric-

key encryption scheme for messages of arbitrary length are given. Then a hybrid en-
cryption scheme can be defined as follows:
• Run the key generation algorithm of the KEM on input 1𝑛 and output the keys 𝑝𝑘
and 𝑠𝑘.
• The hybrid encryption algorithm takes the public key 𝑝𝑘 and a message 𝑚 ∈ {0, 1}∗
as input. The encapsulation algorithm 𝐸𝑛𝑐𝑎𝑝𝑠 computes
$
(𝑐, 𝑘) ← 𝐸𝑛𝑐𝑎𝑝𝑠𝑝𝑘 (1𝑛 ).
Then the symmetric encryption algorithm ℰ takes 𝑘 and the plaintext 𝑚 as input
and computes 𝑐′ = ℰ𝑘 (𝑚). Finally, output the ciphertext (𝑐, 𝑐′ ).
• The hybrid decryption algorithm takes the private key 𝑠𝑘 and the ciphertext (𝑐, 𝑐′ )
as input. First, the symmetric key is retrieved by computing
𝑘 = 𝐷𝑒𝑐𝑎𝑝𝑠𝑠𝑘 (𝑐).
Then decrypt 𝑐′ and output the plaintext 𝑚 = 𝒟𝑘 (𝑐′ ). If 𝑐 or 𝑐′ are invalid then
output ⟂. ♢
The advantage of hybrid schemes is that the computationally expensive public-

key algorithm is only run once during encryption and decryption, independent of the
length of the plaintext and the ciphertext.
We also want to give a security guarantee for a hybrid encryption scheme (see
[KL15]).
Theorem 10.19. Consider a hybrid encryption scheme as defined above.
(1) If the KEM is CPA-secure and the symmetric scheme is EAV-secure, then the corre-
sponding hybrid scheme is CPA-secure.
(2) If the KEM and the symmetric scheme are both CCA2-secure, then the corresponding
hybrid scheme is CCA2-secure. ♢
Note that EAV security of the symmetric scheme is sufficient for (1). In fact, a
hybrid scheme is public-key, and so EAV and CPA security are equivalent.
Corollary 10.20. The hybrid encryption scheme that combines RSA key encapsulation
and an authenticated encryption scheme (see Definition 8.19) is CCA2-secure if the RSA
assumption holds and the hash function is modeled as a random oracle. ♢
The Diffie-Hellman key encapsulation mechanism (see Definition 10.16) can be

combined with a symmetric encryption scheme and a message authentication code.
This defines the Diffie-Hellman Integrated Encryption Scheme (DHIES).
Definition 10.21. Suppose a symmetric encryption scheme and a message authentica-

tion code are given. Then the Diffie-Hellman Integrated Encryption Scheme (DHIES)
is defined as follows.
• On input 1𝑛 the key generation algorithm outputs a cyclic group 𝐺 of order 𝑞

with 𝑛 = size(𝑞), a generator 𝑔 ∈ 𝐺, a uniform random element 𝑏 ∈ ℤ𝑞 and
𝐵 = 𝑔𝑏 . The public key is 𝑝𝑘 = (𝐺, 𝑞, 𝑔, 𝐵) and the private key is 𝑠𝑘 = (𝐺, 𝑞, 𝑔, 𝑏).
Furthermore, a function 𝐻 ∶ 𝐺 → {0, 1}2𝑛 is fixed.
• The encryption algorithm takes a plaintext 𝑚 and the public key 𝑝𝑘 as input,
chooses a uniform random 𝑎 ∈ ℤ𝑞 and sets 𝑘𝐸 ‖𝑘𝑀 = 𝐻(𝐵𝑎 ). Then compute
$
𝐴 = 𝑔𝑎 , 𝑐 ← ℰ𝑘𝐸 (𝑚), 𝑡 = MAC𝑘𝑀 (𝑐) and output the ciphertext (𝐴, 𝑐, 𝑡).
• The decryption algorithm takes the ciphertext (𝐴, 𝑐, 𝑡) and the private key 𝑠𝑘 as in-
put. Compute 𝑘𝐸 ‖𝑘𝑀 = 𝐻(𝐴𝑏 ), verify the tag 𝑡 using 𝑘𝑀 and output the plaintext
𝑚 = 𝒟𝑘𝐸 (𝑐). If 𝐴 ∉ 𝐺 or if the verification of 𝑡 fails then output ⟂. ♢
Note that the scheme derives both the symmetric encryption key and the message
authentication key from the shared Diffie-Hellman secret.
DHIES usually refers to Diffie-Hellman using subgroups of ℤ∗𝑝 , the multiplicative
group of integers modulo a prime number (see Section 10.4). However, if a group of
points on an elliptic curve is used (see Section 12.2), then the scheme is called the Elliptic
Curve Integrated Encryption Scheme (ECIES).
Diffie-Hellman integrated encryption schemes are CCA2-secure under certain as-
sumptions:
Theorem 10.22. Consider DHIES or ECIES and suppose the underlying symmetric en-
cryption scheme is CPA-secure, the message authentication code is strongly secure, the
gap-CDH assumption holds for the Diffie-Hellman group and the hash function 𝐻 is mod-
eled as a random oracle. Then DHIES and ECIES are CCA2-secure encryption schemes.
♢
The above Theorem follows from the CCA2 security of the Diffie-Hellman key en-
capsulation method and the CCA2 security of the encrypt-then-authenticate construc-
tion (see Section 8.4) for CPA-secure symmetric encryption schemes.
Remark 10.23. Several of the above statements, in particular on the CCA2 security
of key encapsulation and hybrid encryption schemes, require the assumption that the
hash function is modeled as a random oracle. We refer to the literature for security
guarantees without the use of the random oracle model ([KL15], [HK07]).
10.8. Summary
• The distribution of secret keys can be delegated to a Key Distribution Center.

This requires the pre-distribution of long-term keys. Kerberos is a widely used
example of a key distribution protocol.
• In a key exchange protocol, the communication partners exchange messages
over an insecure channel and derive a shared secret key.
• The Diffie-Hellman key exchange uses a large cyclic group and is secure against
eavesdropping under the decisional Diffie-Hellman assumption. The plain
Diffie-Hellman protocol does not protect against active attacks.
• A standard choice for Diffie-Hellman are subgroups of the multiplicative group
of integers modulo a prime number. The subgroup should have a large prime
order.
• Diffie-Hellman can be broken by computing discrete logarithms, but this is as-
sumed to be a hard problem if the group is appropriately chosen.
• Key encapsulation mechanisms are based on a public-key scheme, for example
RSA or Diffie-Hellman. A secret key is encapsulated using the public key of
the recipient.
• Hybrid public-key encryption schemes combine key encapsulation, symmet-
ric encryption and message authentication. Such schemes can have a security
guarantee and can be used to protect large data sets.
Exercises
1. Show that the discrete-logarithm problem is easy in the additive group (ℤ𝑝 , +).
2. How can you efficiently generate Diffie-Hellman parameters 𝑝, 𝑞 and 𝑔 for the
multiplicative group ℤ∗𝑝 with given bit lengths 𝑛𝑝 and 𝑛𝑞 for 𝑝 and 𝑞 ?
3. Let 𝑝 = 89, 𝑔 = 2 mod 89 and 𝐺 = ⟨𝑔⟩. How many different shared keys 𝑘 are
possible in a Diffie-Hellman key exchange with these parameters?
4. You perform a Diffie-Hellman key exchange with Alice and you agreed on the pa-
rameters 𝑝 = 43, 𝐺 = ⟨𝑔⟩ ⊂ ℤ∗𝑝 and 𝑔 = 3 mod 43.
(a) Determine 𝑞 = ord(𝑔).
(b) Alice sends you 𝐴 = 14 and you choose the secret exponent 𝑏 = 26. Which
value do you send to Alice? Compute the shared secret key 𝑘.
5. Let 𝑔 ≡ 3 be an element of the group ℤ∗107 .
(a) Show that 𝑔 generates a group 𝐺 of prime order.
(b) How many exponentiations at most are necessary to compute a discrete loga-
rithm in 𝐺 using the Babystep-Giantstep algorithm?
(c) Compute log3 (12) in 𝐺.
Exercises 201
6. Show that the following parameters (a 2048-bit MODP group given in RFC 5114
[LK08]) can be used in a Diffie-Hellman key exchange, i.e., show that 𝑝 and 𝑞 are
prime numbers and ord(𝑔) = 𝑞.
Tip: Use SageMath. Remove the line breaks and define strings. The corresponding
hexadecimal numbers can be constructed with ZZ( ..., 16). Use the function
is_pseudoprime( ) to check the primality.
p = 87A8E61D B4B6663C FFBBD19C 65195999 8CEEF608 660DD0F2
5D2CEED4 435E3B00 E00DF8F1 D61957D4 FAF7DF45 61B2AA30
16C3D911 34096FAA 3BF4296D 830E9A7C 209E0C64 97517ABD
5A8A9D30 6BCF67ED 91F9E672 5B4758C0 22E0B1EF 4275BF7B
6C5BFC11 D45F9088 B941F54E B1E59BB8 BC39A0BF 12307F5C
4FDB70C5 81B23F76 B63ACAE1 CAA6B790 2D525267 35488A0E
F13C6D9A 51BFA4AB 3AD83477 96524D8E F6A167B5 A41825D9
67E144E5 14056425 1CCACB83 E6B486F6 B3CA3F79 71506026
C0B857F6 89962856 DED4010A BD0BE621 C3A3960A 54E710C3
75F26375 D7014103 A4B54330 C198AF12 6116D227 6E11715F
693877FA D7EF09CA DB094AE9 1E1A1597
g = 3FB32C9B 73134D0B 2E775066 60EDBD48 4CA7B18F 21EF2054
07F4793A 1A0BA125 10DBC150 77BE463F FF4FED4A AC0BB555
BE3A6C1B 0C6B47B1 BC3773BF 7E8C6F62 901228F8 C28CBB18
A55AE313 41000A65 0196F931 C77A57F2 DDF463E5 E9EC144B
777DE62A AAB8A862 8AC376D2 82D6ED38 64E67982 428EBC83
1D14348F 6F2F9193 B5045AF2 767164E1 DFC967C1 FB3F2E55
A4BD1BFF E83B9C80 D052B985 D182EA0A DB2A3B73 13D3FE14
C8484B1E 052588B9 B7D2BBD2 DF016199 ECD06E15 57CD0915
B3353BBB 64E0EC37 7FD02837 0DF92B52 C7891428 CDC67EB6
184B523D 1DB246C3 2F630784 90F00EF8 D647D148 D4795451
5E2327CF EF98C582 664B4C0F 6CC41659
q = 8CF83642 A709A097 B4479976 40129DA2 99B1A47D 1EB3750B
A308B0FE 64F5FBD3
7. Let 𝑝 = 59, 𝑔 ≡ 4 ∈ ℤ∗𝑝 , 𝑞 = ord(𝑔) = 29, 𝐺 = ⟨𝑔⟩ and 𝐴 = 𝑔𝑎 ≡ 9. Compute the
discrete logarithm 𝑎 = log𝑔 (𝐴) in 𝐺 with Pollard’s 𝜌 method.
Hint: Use the collision 𝑔2 𝐴1 = 𝑔5 𝐴5 in 𝐺.
8. Show that 𝑔 = 11 generates the group 𝐺 = ℤ∗109 . Apply the Pohlig-Hellman algo-
rithm to compute the discrete logarithm log11 (54).
9. Why is RSA key encapsulation (see Definition 10.14) not CPA-secure without the
hashing operation?
10. Discuss the consequences of re-using one or both of the secret Diffie-Hellman keys
𝑎 and 𝑏.
11. Explain a Man-in-the-Middle attack against the Diffie-Hellman protocol.
12. The ElGamal public-key encryption scheme uses the same parameters as the Diffie-
Hellman key-exchange, i.e., a cyclic group 𝐺, a generator 𝑔 and 𝑞 = ord(𝑔). Choose
a uniform number 𝑎 ∈ ℤ𝑞 and set 𝐴 = 𝑔𝑎 ∈ 𝐺. The message space is 𝐺, the
ciphertext space is 𝐺 × 𝐺, the public key is 𝑝𝑘 = (𝐺, 𝑞, 𝑔, 𝐴) and the private key
is 𝑠𝑘 = (𝐺, 𝑞, 𝑔, 𝑎). Encryption is randomized; to encrypt 𝑚 ∈ 𝐺, one chooses a

uniform number 𝑏 ∈ ℤ𝑞 and defines the ciphertext as
ℰ𝑝𝑘 (𝑚) = (𝑔𝑏 , 𝐴𝑏 𝑚).
Decryption is given by
𝒟𝑠𝑘 (𝑐1 , 𝑐2 ) = 𝑐1−𝑎 𝑐2 .
(a) Show that the scheme provides correct decryption.
(b) Assume that 𝑝 = 59, 𝐺 = ℤ∗𝑝 , 𝑔 ≡ 4 and 𝑎 = 20 define Alice’s ElGamal key.
She obtains the ciphertext 𝑐 = (17, 16). Compute the plaintext 𝑚.
(c) The ElGamal encryption is randomized by a parameter 𝑏. Explain why 𝑏 has
to remain secret and should not be re-used for different encryptions.
Chapter 11
Digital Signatures
Digital signatures are asymmetric cryptographic schemes which aim at data integrity
and authenticity. There are some similarities to message authentication codes, but digi-
tal signatures are verified using a public key. Successful verification shows that the data
is authentic and has not been tampered with. Since the private key is exclusively con-
trolled by the signer, digital signatures achieve not only data integrity and authenticity,
but also non-repudiation. Signatures have applications beyond integrity protection, for
example in entity authentication protocols.
In Section 11.1, we define digital signature schemes and discuss their security: sig-
natures should be unforgeable. Section 11.2 deals with the definition of the plain RSA
signature, which is based on the same parameters as the RSA cryptosystem. The plain
RSA signature is forgeable, and hashing of the data is advisable for security and effi-
ciency reasons. Furthermore, we present the probabilistic RSA-PSS scheme in Section
11.3. Other signatures schemes (ElGamal and DSA) are briefly discussed in the exer-
cises at the end of this chapter.
We refer the reader to [PP10], [KL15] and [GB08] for additional reading.

Signature schemes consist of key generation, signing and a verification algorithm.
Definition 11.1. A digital signature scheme is given by the following spaces and poly-
nomial-time algorithms:
• A message space ℳ.
• A space of key pairs 𝒦 = 𝒦𝑝𝑘 × 𝒦𝑠𝑘 .
203
204 11. Digital Signatures
• A randomized key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes a security parameter

1𝑛 as input and outputs a pair of keys (𝑝𝑘, 𝑠𝑘).
• A signing algorithm, which may be randomized. It takes a message 𝑚 and a pri-
vate key 𝑠𝑘 as input and outputs a signature 𝑠 ← 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚).
• A deterministic verification algorithm that takes a public key 𝑝𝑘, a message 𝑚
and a signature 𝑠 as input and outputs 1 if the signature is valid, and 0 otherwise.
♢
Note that verification of a signature also requires the message. The signature is
usually short and does not include the message.
Similar to a message authentication code, the security of a signature scheme is
determined by the hardness of computing a valid signature without the private key. We
assume that an adversary knows the public key and is thus able to verify signatures.
Furthermore, we assume that the adversary can choose arbitrary messages to be signed.
This is called a chosen message attack and corresponds to a situation in practice where
many signature values are known and legitimate or innocent messages are routinely
signed.
Definition 11.2. Suppose a signature scheme is given. Consider the following experi-
ment (see Figure 11.1): a challenger takes the security parameter 1𝑛 as input and gen-
erates a key pair (𝑝𝑘, 𝑠𝑘) by running 𝐺𝑒𝑛(1𝑛 ). An adversary 𝐴 is given 1𝑛 and the public
key 𝑝𝑘. The adversary can choose messages 𝑚 and obtains the signature 𝑠 = 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚)
from an oracle. 𝐴 can also verify signatures using the public key 𝑝𝑘. The adversary tries
to forge a signature of a new message 𝑚′ and outputs (𝑚′ , 𝑠′ ). The challenger outputs
1 if the signature is valid and has not been queried before, and 0 otherwise.
The scheme is called existentially unforgeable under an adaptive chosen message
attack (EUF-CMA secure or just secure) if for all probabilistic polynomial-time adver-
saries, the probability of success is negligible in 𝑛. ♢
The definition of a secure scheme requires that the length of signature values is
not too short, since an adversary might otherwise guess valid signatures.
Digital signatures protect the integrity and authenticity of messages and can also
achieve non-repudiation. Since the signer alone controls the private key, they can-
not deny the signature afterwards. Note that symmetric schemes cannot achieve non-
repudiation, since the secret key is known to two (or more) parties.
Remark 11.3. The verification of a digital signature requires the authentic public key
of the signer. Although public keys can be openly shared, authenticity is not evident.
A man-in-the-middle might replace the message, the signature and the public key with
his own data. A verifier is not able to detect this attack, unless they can check the
authenticity of the public key. In practice, public keys are often signed by other parties
(in a Web of Trust) or by a Certification Authority (CA). A signed certificate binds the
11.2. Plain RSA Signature 205
1𝑛 , 𝑝𝑘 $
(𝑝𝑘, 𝑠𝑘) ← 𝐺𝑒𝑛(1𝑛 )
𝑚
Choose 𝑚
𝑠
𝑠 = 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚)
⋮
Choose 𝑚′ ,
(𝑚′ , 𝑠′ )
forge a signature 𝑠′ Verify (𝑚′ , 𝑠′ ),
output 1 or 0
Figure 11.1. Signature forgery experiment.
identity of a subject to its public key. This shifts the problem of authentic public keys
to a third party that is hopefully trustworthy.
11.2. Plain RSA Signature

The plain RSA signature scheme is defined analogously to the plain RSA encryption
scheme (see Chapter 9.2). Signing corresponds to decryption and requires the private
RSA key; verification corresponds to encryption and uses the public RSA key.
Definition 11.4. The plain RSA signature scheme is defined by:
• A key generation algorithm 𝐺𝑒𝑛(1𝑛 ) that takes a security parameter 1𝑛 as input,
generates two random primes 𝑝 and 𝑞 of length 𝑛 and sets 𝑁 = 𝑝𝑞. Furthermore,
integers 𝑒 and 𝑑 with
𝑒𝑑 ≡ 1 mod (𝑝 − 1)(𝑞 − 1)
are chosen as in the RSA encryption scheme. 𝐺𝑒𝑛(1𝑛 ) outputs the public key
𝑝𝑘 = (𝑒, 𝑁) and the private key 𝑠𝑘 = (𝑑, 𝑁).
• The message space is ℳ = ℤ∗𝑁 .
• The deterministic signature algorithm takes the secret key 𝑠𝑘 = (𝑑, 𝑁) and a mes-
sage 𝑚 ∈ ℳ as input and outputs the signature
𝑠 = 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚) = 𝑚𝑑 mod 𝑁.
• The verification algorithm takes the public key 𝑝𝑘 = (𝑒, 𝑁), a message 𝑚 ∈ ℤ𝑁
and a signature 𝑠. It computes
𝑠𝑒 mod 𝑁
and outputs 1 (signature is valid) if 𝑚 = 𝑠𝑒 mod 𝑁, and 0 otherwise. ♢
The correctness follows in the same way as for the RSA encryption algorithm.
The efficiency of RSA was discussed in Section 9.5. The complexity of signature
verification is 𝑂(𝑛2 ) if the public exponent 𝑒 is short, for example 𝑒 = 216 + 1. The
running time of the RSA signature is 𝑂(𝑛3 ) because the size of the private exponent 𝑑
is 𝑛. The exponentiation can be accelerated by a factor of around 4 using the Chinese
Remainder Theorem (compare Example 9.13). Digital signatures are not as efficient
as message authentication codes (see Chapter 8), and one avoids carrying out a large
number of signatures or verifications.
Example 11.5. Alice’s RSA parameters are 𝑝 = 11, 𝑞 = 23, 𝑁 = 𝑝𝑞 = 253, 𝑒 = 3
and 𝑑 = 147. She signs the message 𝑚 = 111 and computes 𝑠 = 111147 mod 253 ≡
89. Bob uses her public key 𝑝𝑘 = (3, 253) and verifies the signature by computing
893 mod 253 ≡ 111. ♢
Unfortunately, this scheme is both impractical and insecure. Firstly, the message
length is limited by the size of the RSA modulus 𝑁, but in practice, one needs to sign
messages of arbitrary length and not only several hundred bytes, the usual RSA modu-
lus length.
Furthermore, the plain RSA signature scheme is insecure, because the signature is
multiplicative and new signature values can be easily forged. If 𝑠1 and 𝑠2 are signatures
of 𝑚1 and 𝑚2 , then 𝑠1 𝑠2 mod 𝑁 is a valid signature of 𝑚1 𝑚2 mod 𝑁. Similarly, valid
signature values can be generated by taking powers. An adversary can even choose
any 𝑠 ∈ ℤ∗𝑁 and compute 𝑚 = 𝑠𝑒 mod 𝑁. Then 𝑠 is a valid signature of the message
𝑚. This attack is called existential forgery. Note that the adversary only controls the
signature value 𝑠 and not the message 𝑚.
Example 11.6. Assume Alice’s RSA parameters are the same as in Example 11.5 above.
Mallory generates a forged signature value 𝑠 = 123 and computes
𝑚 = 𝑠3 ≡ 52 mod 253. Bob successfully verifies the signature 𝑠 of 𝑚:
𝑠𝑒 = 1233 ≡ 52 mod 253.
But Alice has never signed 𝑚 = 52. ♢
We conclude that the plain RSA signature scheme is not EUF-CMA secure. This
is analogous to the fact that plain RSA encryption is malleable and insecure under a
chosen ciphertext attack (see Chapter 9.2).
11.3. Probabilistic Signature Scheme

An obvious improvement of the plain RSA signature scheme is to apply a hash func-
tion before signing a message. If the output length of the hash function is at most the
length of the RSA modulus, then messages of arbitrary length can be signed. Further-
more, hashing prevents attacks that exploit the multiplicative structure of the plain
RSA scheme.
Closer analysis shows that the hash function should be collision-resistant and have
the properties of a random oracle. Furthermore, the range of the hash function should
be the full RSA message space ℤ∗𝑁 . The latter poses a problem in practice, since the
RSA modulus is usually more than 2000 bits long, whereas the digests of well-known
hash functions are much shorter − only between 160 and 512 bits.
The RSA-FDH (Full Domain Hash) signature is similar to the plain RSA scheme,
but leverages a hash function 𝐻 ∶ {0, 1}∗ → ℤ∗𝑁 . A message 𝑚 is first hashed and then
signed:
𝑠 = 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚) = 𝐻(𝑚)𝑑 mod 𝑁.
In the verification step, 𝐻(𝑚) is computed and then compared to 𝑠𝑒 mod 𝑁. The sig-
nature is valid if 𝐻(𝑚) = 𝑠𝑒 mod 𝑁.
Obviously, the collision-resistance of 𝐻 is crucial, since a collision 𝐻(𝑚1 ) = 𝐻(𝑚2 )
with 𝑚1 ≠ 𝑚2 can be used for an existential forgery.
Theorem 11.7. If 𝐻 has range ℤ∗𝑁 and is modeled as a random oracle, the RSA-FDH
scheme is EUF-CMA secure under the RSA assumption.
Remark 11.8. The above theorem has a proof by reduction (see [BR05], [KL15]). If we
assume that the hash function behaves like a random oracle, then forging a signature
is only possible by inverting the RSA function 𝑓(𝑥) = 𝑥𝑒 mod 𝑁 on uniform random
integers modulo 𝑁. But under the RSA assumption, the probability of successfully
inverting 𝑓 is negligible. ♢
Since the length of cryptographic hashes is usually smaller than the size of the
RSA modulus, one stretches the hash by randomized padding. The result should still
be indistinguishable from a random integer in ℤ𝑁 . A standard method is defined in
PKCS #1 version 2.2 (see RFC 8017 [MKJR16]) and is called a Probabilistic Signature
Standard (RSA-PSS) or RSASSA-PSS (RSA signature scheme with appendix).
Similar to RSA-OAEP (see Chapter 9.6), the PSS encoding requires a hash function
𝐻 with output byte length ℎ and a mask generating function 𝑀𝐺𝐹 with input length ℎ
and variable output length. For example, 𝐻 could be SHA-2 and 𝑀𝐺𝐹 is based on this
hash function.
In the following, we describe RSA-PSS signing and verification (see
Figure 11.2). Let 𝑚 be the message. We will derive an encoded message 𝐸𝑀 of byte
size(𝑁)−1
length 𝑘 = ⌈ ⌉ and sign 𝐸𝑀.
8
First, 𝑚′ is defined by concatenating eight zero padding bytes, the hashed message
𝐻(𝑚) and a random salt string. A typical salt length is ℎ bytes, but an empty salt is also
permitted:
𝑚′ = 008 ‖ 𝐻(𝑚) ‖ salt.
The result is hashed again and we obtain 𝐻(𝑚′ ). A data block 𝐷𝐵 of length 𝑘 − ℎ − 1
is formed by concatenating the necessary number of zero padding bytes, one byte 01
and the salt string:

𝐷𝐵 = 00‖ … ‖00‖01‖ salt.
Then 𝑀𝐺𝐹(𝐻(𝑚′ ), 𝑘 − ℎ − 1) outputs a string of the same length as 𝐷𝐵. Set
𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 = 𝐷𝐵 ⊕ 𝑀𝐺𝐹(𝐻(𝑚′ ), 𝑘 − ℎ − 1)
and define the encoded message 𝐸𝑀 by concatenating 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵, the hash 𝐻(𝑚′ ) and
the byte BC:
𝐸𝑀 = 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 ‖ 𝐻(𝑚′ ) ‖ BC.
The byte length of 𝐸𝑀 is 𝑘. Finally, the signature is defined by
𝑠 = 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚) = 𝐸𝑀 𝑑 mod 𝑁.
The verification involves various format and length checks. Compute
𝐸𝑀 = 𝑠𝑒 mod 𝑁.
The rightmost byte of 𝐸𝑀 should be BC. Then the byte strings 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 of length
𝑘 − ℎ − 1 and 𝐻 ′ of length ℎ are extracted and
𝐷𝐵 = 𝑚𝑎𝑠𝑘𝑒𝑑𝐷𝐵 ⊕ 𝑀𝐺𝐹(𝐻 ′ , 𝑘 − ℎ − 1)
is computed. The leftmost bytes of 𝐷𝐵 should be 00, followed by 01. The remaining
bytes of 𝐷𝐵 form the salt. Set
𝑚′ = 008 ‖ 𝐻(𝑚) ‖ salt
and compute 𝐻(𝑚′ ). If 𝐻(𝑚′ ) = 𝐻 ′ then the signature is valid. Otherwise, the signa-
ture is invalid. It is important that only one failure message is given for all format or
verification errors.
Remark 11.9. If the salt is randomly chosen and is sufficiently long, then the RSA-PSS
signature is randomized and signing the same message twice using the same key will
give different signature values. An adversary who compares different signature values
cannot see whether any of the underlying messages are identical. ♢
The following Theorem on the security of RSA-PSS has a proof by reduction

([BR96], [BR05]). However, the original RSA-PSS scheme is not identical to the stan-
dardized version.
Theorem 11.10. The RSA-PSS signature scheme is EUF-CMA secure in the random or-
acle model under the RSA assumption. ♢
Figure 11.2. Signing a message 𝑚 using RSA-PSS.
So far we have only discussed RSA signatures. An alternative is signature schemes

which are based on the discrete logarithm problem in a cyclic group. In particular,
the units of a finite field and the points of an elliptic curve can be used, and the corre-
sponding signature schemes are ElGamal, DSA/DSS (Digital Signature Algorithm) and
ECDSA (Elliptic Curve Digital Signature Algorithm). ElGamal and the standardized
scheme DSA are explained in Exercise 10 and elliptic curves are explored in Chapter 12.
Signatures are also used in authenticated key exchange (AKE) and in entity au-
thentication protocols. If the public key is authentic, then a correct signature shows
that the signer controls the corresponding private key. This can be used as a proof of
identity.
Another interesting topic not covered in this book is hash-based signatures. In
1975, Lamport discovered a one-time signature system based on hashes without the use
of number-theoretic constructions [Lam79], and various improvements of the original
system exist. Hash-based signatures are also a promising candidate for post-quantum
cryptography.
11.4. Summary
• Digital signature schemes are based on public-key cryptography. The private

key is used to sign a message and the public key to verify a signature.
• Digital signatures can achieve integrity, authenticity and non-repudiation of
messages.
• The RSA signature algorithm uses the same parameters as the RSA cryptosys-
tem.
• The plain RSA signature scheme has serious weaknesses and the input data
should at least be hashed.
• RSA-PSS is a standardized signature scheme that applies hashing and padding
before signing a message using RSA.
• The Digital Signature Algorithm (DSA/DSS) is a standardized signature
scheme that is based on the discrete logarithm problem.
Exercises
1. Give possible reasons why a signature may be invalid.

2. Suppose a new signature scheme takes pairs (𝑚1 , 𝑚2 ) of messages of the same
length as input and defines the signature as follows: set 𝑚 = 𝑚1 ⊕ 𝑚2 and set
𝑠 = 𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚) using a secure signature scheme. Is the new scheme secure or would
you rather change this scheme?
3. Let 𝑒 = 5, 𝑁 = 437 be the parameters of a public RSA key. Verify the following
plain RSA signatures:
(a) 𝑚 = 102, 𝑠 = 416,
(b) 𝑚 = 101, 𝑠 = 416,
(c) 𝑚 = 100, 𝑠 = 86.
Conduct an existential forgery attack using the signature value 𝑠 = 99.
4. Compute the RSA-FDH signature of a message 𝑚 with hash value 𝐻(𝑚) = 11111
using the RSA parameters 𝑁 = 28829 and 𝑒 = 5, where 𝑑 has to be determined.
Verify the signature with the public RSA key. Hint: 𝑠 = 7003.
5. Consider the verification of an RSA-FDH signature. It is sufficient to give only the
hash and the signature value to a verifier and not the message?
6. Why does the existential forgery attack almost certainly fail for RSA-PSS?
7. Discuss the consequences of a fixed and known salt value in RSA-PSS.
8. Show that the plain RSA signature scheme is secure under the RSA assumption if
one considers the following experiment: the adversary succeeds if they output a
valid signature of a challenge message which is chosen uniformly at random.
Exercises 211
9. Suppose we want to sign and to encrypt data (signcryption). Analogue to the en-
crypt-then-authenticate approach in symmetric cryptography, Alice encrypts a se-
cret message with Bob’s public key and then signs the ciphertext using her private
key.
(a) How can an adversary produce their own signature of the same message and
thus mislead Bob?
(b) Does this combination of encryption and signature provide non-repudiation?
10. The ElGamal signature scheme is based on the discrete logarithm problem and uses
a cyclic subgroup 𝐺 of ℤ∗𝑝 of order 𝑞 and a generator 𝑔 ∈ 𝐺. One chooses a secret
uniform 𝑎 ∈ ℤ𝑞 and computes 𝐴 = 𝑔𝑎 . Then 𝑝𝑘 = (𝑝, 𝑔, 𝑞, 𝐴) forms the public key
and 𝑠𝑘 = (𝑝, 𝑔, 𝑞, 𝑎) is the secret key. The signature generation is randomized; one
chooses a random uniform 𝑘 ∈ ℤ∗𝑞 and computes the signature value by
𝑠𝑖𝑔𝑛𝑠𝑘 (𝑚) = (𝑟, 𝑠) with 𝑟 ≡ 𝑔𝑘 mod 𝑝 and 𝑠 ≡ 𝑘−1 (𝐻(𝑚) − 𝑎𝑟) mod 𝑞.
To verify the signature (𝑟, 𝑠) of a message 𝑚, one computes 𝐴𝑟 𝑟𝑠 mod 𝑝 and com-
pares the result with 𝑔𝐻(𝑚) mod 𝑝. If both residue classes coincide, then the sig-
nature is valid.
The Digital Signature Algorithm (called DSA or DSS) is a standardized variant
of the ElGamal signature scheme [FIP13]. The bit lengths of 𝑝 and 𝑞 are specified,
e.g., size(𝑝) = 2048 and size(𝑞) = 224. Unlike ElGamal, both DSA signature parts
𝑟 and 𝑠 are reduced modulo 𝑞 and the verification is also performed modulo 𝑞.
Although 𝑞 is much smaller than 𝑝, the existing sub-exponential attacks against
the discrete logarithm problem do not run faster in subgroups of ℤ∗𝑝 . Other attacks,
e.g., Babystep-Giantstep and Pollard’s 𝜌-algorithm, can use the subgroup, but their
running time is 𝑂(2size(𝑞)/2 ) and thus out of reach if 𝑞 has more than 200 bits.
(a) Show that the verification is correct.
(b) Assume that 𝑝 = 59, 𝑔 ≡ 4, 𝑞 = 29 and 𝑎 = 20 form Alice’s ElGamal key.
Compute the public parameter 𝐴.
(c) Alice wants to sign a message 𝑚 with hash value 𝐻(𝑚) = 8. She chooses the
secret parameter 𝑘 = 5. Compute the ElGamal signature (𝑟, 𝑠).
(d) Check that the signature (𝑟, 𝑠) is valid.
(e) The ElGamal signature is randomized by 𝑘. Explain why 𝑘 must remain secret
and should not be re-used for different signatures.
Chapter 12
Elliptic Curve Cryptography
In this chapter, we outline the basics of Elliptic Curve Cryptography (ECC). In several
public-key schemes, the additive group of points on elliptic curves over finite fields
forms an alternative to the multiplicative group of integers modulo a prime number.
The addition of points on an elliptic curve is slightly more complex than the multipli-
cation of residue classes, but elliptic curves offer a similar level of security as the mul-
tiplicative group with shorter keys. ECC is now widely used because of its efficiency
and accepted security.
In Section 12.1, Weierstrass equations and elliptic curves are introduced and the
addition of points on a cubic curve is explained. We present the Elliptic Curve Diffie-
Hellman algorithm in Section 12.2 and discuss the efficiency and security of ECC in
Section 12.3. Finally, in Section 12.4 we show how elliptic curves can be leveraged to
factor integers. The Elliptic Curve Digital Signature Algorithm (ECDSA) is discussed
in the exercises at the end of this chapter.
Elliptic curves over different fields are an interesting and challenging mathemati-
cal topic and we refer to [Sil09] for an in-depth treatment. There are several textbooks
on cryptographic applications of elliptic curves and we recommend [Was08], [TW06],
[Gal12] and [She17] for additional reading.
12.1. Weierstrass Equations and Elliptic Curves

The multiplicative group ℤ∗𝑝 of integers modulo a prime number 𝑝 is used in several
cryptographic algorithms, for example in the Diffie-Hellman key exchange. A well-
proven alternative is the group of points on an elliptic curve over a finite field. Elliptic
curves are defined by Weierstrass equations.
213
214 12. Elliptic Curve Cryptography
Definition 12.1. Let 𝐾 be a field and 𝑎1 , … , 𝑎6 ∈ 𝐾. Then the equation

𝑦2 + 𝑎1 𝑥𝑦 + 𝑎3 𝑦 = 𝑥3 + 𝑎2 𝑥2 + 𝑎4 𝑥 + 𝑎6
is called a (generalized) Weierstrass equation. The equation defines a cubic curve called
a Weierstrass curve. ♢
In many cases, the generalized Weierstrass equation can be transformed into a

shorter equation.
Definition 12.2. Let 𝐾 be a field and 𝑎, 𝑏 ∈ 𝐾. Then the equation
𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏
is called a short Weierstrass equation.
Proposition 12.3. Let 𝐾 be a field with 𝑐ℎ𝑎𝑟(𝐾) ≠ 2, 3. Then a Weierstrass curve over
𝐾 is up to an affine transformation given by a short Weierstrass equation.
Proof. By assumption, we can divide by 2 and 3. Following [Was08], we write the

generalized Weierstrass equation as
𝑎1 𝑎 2 𝑎2 𝑎 𝑎 𝑎2
(𝑦 + 𝑥 + 3 ) = 𝑥3 + (𝑎2 + 1 ) 𝑥2 + (𝑎4 + 1 3 ) 𝑥 + ( 3 + 𝑎6 ) .
2 2 4 2 4
𝑎1 𝑎3
The affine transformation 𝑦′ = 𝑦 + 𝑥+ yields an equation
2 2
(𝑦′ )2 = 𝑥3 + 𝑎′2 𝑥2 + 𝑎′4 𝑥 + 𝑎′6
𝑎′2
with constants 𝑎′2 , 𝑎′4 , 𝑎′6 ∈ 𝐾. Finally, replacing 𝑥 by 𝑥′ − removes the 𝑥2 term and
3
gives the desired equation
(𝑦′ )2 = (𝑥′ )3 + 𝑎𝑥′ + 𝑏
for some constants 𝑎, 𝑏 ∈ 𝐾. □
From now on, we assume that 𝑐ℎ𝑎𝑟(𝐾) is neither 2 nor 3, although this excludes
the binary fields 𝐾 = 𝐺𝐹(2𝑚 ). The theory of elliptic curves also works over these fields
and are used in cryptography, but there are some technical differences, for example
with respect to the Weierstrass equation.
We want to add points at infinity to the 𝑛-dimensional space 𝐾 𝑛 .
Definition 12.4. Let 𝐾 be a field and 𝑛 ∈ ℕ. Then the 𝑛-dimensional projective space
ℙ𝑛 (𝐾) is defined as the set of all lines in 𝐾 𝑛+1 passing through the origin. Points in
ℙ𝑛 (𝐾) are given by 𝑛 + 1 projective or homogeneous coordinates and are denoted by
[𝑥1 ∶ 𝑥2 ∶ ⋯ ∶ 𝑥𝑛+1 ].
Two points are equivalent and give the same element in ℙ𝑛 (𝐾) if they are on the same
line, i.e., if they differ only by a nonzero factor 𝜆 ∈ 𝐾. Hence the projective space ℙ𝑛 (𝐾)
is a set of equivalence classes of 𝐾 𝑛+1 ⧵ {0𝑛+1 }, where
[𝑥1 ∶ 𝑥2 ∶ ⋯ ∶ 𝑥𝑛+1 ] ∼ [𝑦1 ∶ 𝑦2 ∶ ⋯ ∶ 𝑦𝑛+1 ]
if there exists a 𝜆 ∈ 𝐾 ∗ such that 𝑦1 = 𝜆𝑥1 , 𝑦2 = 𝜆𝑥2 , … , 𝑦𝑛+1 = 𝜆𝑥𝑛+1 . ♢
The points [𝑥1 ∶ ⋯ ∶ 𝑥𝑛 ∶ 0] are said to be points at infinity. The usual space 𝐾 𝑛
of 𝑛-dimensional vectors is called the affine space and we will sometimes write 𝔸𝑛 (𝐾)
instead of 𝐾 𝑛 . We have an injection
𝔸𝑛 (𝐾) ↪ ℙ𝑛 (𝐾), (𝑥1 , … 𝑥𝑛 ) ↦ [𝑥1 ∶ ⋯ ∶ 𝑥𝑛 ∶ 1]
and the complement of the image under this map consists of the points at infinity.
Example 12.5. a) Points in ℙ1 (𝐾) are lines in the plane 𝐾 2 passing through the ori-
𝑥
gin. If 𝑦 ≠ 0 then [𝑥 ∶ 𝑦] is equivalent to [ ∶ 1], which lies in the image of 𝔸1 (𝐾).
𝑦
Otherwise, [𝑥 ∶ 0] ∼ [1 ∶ 0] is the point at infinity. Hence
ℙ1 (𝐾) = 𝔸1 (𝐾) ∪ {[1 ∶ 0]}.
b) In the two-dimensional projective space ℙ2 (𝐾), all points are either equivalent
to [𝑥 ∶ 𝑦 ∶ 1] or to [𝑥 ∶ 𝑦 ∶ 0]. We have a decomposition
ℙ2 (𝐾) = 𝔸2 (𝐾) ∪ 𝔸1 (𝐾) ∪ {[1 ∶ 0 ∶ 0]},
where (𝑥, 𝑦) ∈ 𝐾 2 corresponds to [𝑥 ∶ 𝑦 ∶ 1] and 𝑢 ∈ 𝐾 to the point [𝑢 ∶ 1 ∶ 0]. ♢
A curve in the affine space 𝔸2 (𝐾) can be extended to a projective curve in ℙ2 (𝐾).
𝑋
Suppose the curve is given by the Weierstrass equation 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏. We set 𝑥 =
𝑍
𝑌
and 𝑦 = and obtain
𝑍
𝑌2 𝑋3 𝑋
2
= 3
+ 𝑎 + 𝑏.
𝑍 𝑍 𝑍
Multiplying both sides by 𝑍 3 yields the Weierstrass equation in projective (or homoge-
neous) coordinates:
𝑌 2 𝑍 = 𝑋 3 + 𝑎𝑋𝑍 2 + 𝑏𝑍 3 .
Proposition 12.6. The points on the projective curve 𝑌 2 𝑍 = 𝑋 3 + 𝑎𝑋𝑍 2 + 𝑏𝑍 3 are either
equivalent to [𝑥 ∶ 𝑦 ∶ 1], where (𝑥, 𝑦) ∈ 𝐾 2 satisfies the affine Weierstrass equation
𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏, or to the point [0 ∶ 1 ∶ 0] at infinity.
Proof. We only need to consider points [𝑋 ∶ 𝑌 ∶ 0] on the projective Weierstrass

curve, but 𝑍 = 0 implies 𝑋 3 = 0 and thus 𝑋 = 0. The only remaining point is [0 ∶ 1 ∶
0], which in turn satisfies the Weierstrass equation. □
We denote the point [0 ∶ 1 ∶ 0] at infinity by 𝑂. This point lies on every vertical

line 𝑥 = 𝑐, since the corresponding equation in projective coordinates is
𝑋 = 𝑐𝑍
and 𝑂 = [0 ∶ 1 ∶ 0] satisfies this equation for every 𝑐 ∈ 𝐾.
Definition 12.7. Let 𝐶 be an affine curve given by a polynomial equation
𝑓(𝑥, 𝑦) = 0
over a field 𝐾. Let 𝐾 be the algebraic closure of 𝐾 (see Remark 4.72). Then 𝐶 is called
nonsingular (or smooth) if for all points 𝑃 = (𝑥, 𝑦) ∈ 𝐶(𝐾) the partial derivatives 𝐷𝑥 𝑓
and 𝐷𝑦 𝑓 do not simultaneously vanish at 𝑃:
((𝐷𝑥 𝑓)(𝑃), (𝐷𝑦 𝑓)(𝑃)) ≠ (0, 0).
𝐷𝑥 𝑓 = 𝐷𝑥 (𝑓) and 𝐷𝑦 𝑓 = 𝐷𝑦 (𝑓) are the formal derivatives of 𝑓 with respect to 𝑥 and
𝑦, respectively (see Definition 4.55). If both derivatives vanish at 𝑃, then 𝑃 is called
a singular point. A point 𝑃 on the corresponding projective curve 𝑓(𝑋, 𝑌 , 𝑍) = 0 is
nonsingular if
((𝐷𝑋 𝑓)(𝑃), (𝐷𝑌 𝑓)(𝑃), (𝐷𝑍 𝑓)(𝑃)) ≠ (0, 0, 0).
Example 12.8. Let 𝐶 be the Weierstrass curve 𝑦2 = 𝑥3 over a field 𝐾. Then 𝑓(𝑥, 𝑦) =
−𝑦2 + 𝑥3 and
𝐷𝑥 𝑓 = 3𝑥2 and 𝐷𝑦 𝑓 = −2𝑦.
We assumed that 2 and 3 are nonzero in 𝐾. Then the equations 3𝑥2 = 0 and −2𝑦 = 0
give (𝑥, 𝑦) = (0, 0), a point on the curve. Therefore, 𝐶 is singular at the point (0, 0) and
nonsingular at all other points, including the point 𝑂 = [0 ∶ 1 ∶ 0] at infinity, since
𝑓(𝑋, 𝑌 , 𝑍) = −𝑌 2 𝑍 + 𝑋 3 gives 𝐷𝑍 𝑓 = −𝑌 2 and (𝐷𝑍 𝑓)(𝑂) = −1. ♢
We want to give a condition under which a Weierstrass curve is nonsingular.

Proposition 12.9. Let 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏 be a Weierstrass curve over a field 𝐾. Then
Δ = −16(4𝑎3 + 27𝑏2 )
is called the discriminant of the curve. If Δ is nonzero in 𝐾, then the curve is nonsingular.
Proof. The curve is defined by the equation 𝑓(𝑥, 𝑦) = −𝑦2 + 𝑥3 + 𝑎𝑥 + 𝑏 = 0. One has
𝐷𝑥 𝑓 = 3𝑥2 + 𝑎 and 𝐷𝑦 𝑓 = −2𝑦.
Suppose that (𝑥, 𝑦) ∈ 𝐾 × 𝐾 lies on the Weierstrass curve and both formal partial
derivatives vanish; then 𝑦 = 0 and 𝑎 = −3𝑥2 . Since 𝑓(𝑥, 𝑦) = 0 we also have 𝑥3 +
𝑎𝑥 + 𝑏 = 0, which gives 𝑥3 − 3𝑥3 + 𝑏 = 0 and 𝑏 = 2𝑥3 . The equations for 𝑎 = −3𝑥2
and 𝑏 = 2𝑥3 imply
−4𝑎3 = 27𝑏2 = 108𝑥6 .
If Δ ≠ 0 then −4𝑎3 ≠ 27𝑏2 , and so the affine curve does not have singular points.
It remains to be shown that the point 𝑂 = [0 ∶ 1 ∶ 0] at infinity is also nonsingular.
In projective coordinates, we have
𝑓(𝑋, 𝑌 , 𝑍) = −𝑌 2 𝑍 + 𝑋 3 + 𝑎𝑋𝑍 2 + 𝑏𝑍 3 .
The partial derivative with respect to 𝑍 is
𝐷𝑍 𝑓 = −𝑌 2 + 2𝑎𝑋𝑍 + 3𝑏𝑍 2 ,
and thus (𝐷𝑍 𝑓)(𝑂) = −1, which shows that 𝑂 is a nonsingular point. □
The above proof also shows that a curve defined by a short Weierstrass equation is
nonsingular if and only if the cubic 𝑥3 + 𝑎𝑥 + 𝑏 does not have a double root.
Definition 12.10. An elliptic curve 𝐸 over a field 𝐾 is a nonsingular projective curve

defined by a Weierstrass equation. The set of points on 𝐸 with coordinates in 𝐾 is
denoted by 𝐸(𝐾). ♢
We have seen that 𝐸(𝐾) ⊂ ℙ2 (𝐾) consists of all points satisfying the affine Weier-
strass equation and one additional point 𝑂 = [0 ∶ 1 ∶ 0] at infinity, i.e.,
𝐸(𝐾) = {(𝑥, 𝑦) ∈ 𝐾 × 𝐾 | 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏} ∪ {𝑂}.
A very important fact is that points in 𝐸(𝐾) can be added. However, addition is not the
usual vector addition in 𝐾 2 , since the sum would not lie on the curve. Instead, we will
show that a line through two nonsingular points 𝑃 and 𝑄 intersects the elliptic curve
at a third point 𝑅 (see Figure 12.1), and use this property to define the addition. Let
𝐸 be an elliptic curve defined by the equation 𝑓(𝑥, 𝑦) = −𝑦2 + 𝑥3 + 𝑎𝑥 + 𝑏 = 0. It is
necessary to consider different cases:
(1) Firstly, we suppose that 𝑃 = (𝑥1 , 𝑦1 ), 𝑄 = (𝑥2 , 𝑦2 ) ∈ 𝐸(𝐾) are points with 𝑃, 𝑄 ≠ 𝑂
and different 𝑥-coordinates. Then a straight line through 𝑃 and 𝑄 intersects the
Weierstrass curve of degree 3 at a third point 𝑅 = (𝑥3 , 𝑦3 ) ∈ 𝐸(𝐾). In fact, the
equation of the line through 𝑃 and 𝑄 is
𝑦2 − 𝑦1
𝑦 = 𝑙(𝑥) = 𝑚(𝑥 − 𝑥1 ) + 𝑦1 , where 𝑚 = .
𝑥2 − 𝑥1
In order to find the 𝑥-coordinate of the third intersection point, we replace 𝑦 by
𝑙(𝑥) in the Weierstrass equation −𝑦2 +𝑥3 +𝑎𝑥 +𝑏 = 0. This gives a cubic equation
in the variable 𝑥:
𝑓(𝑥, 𝑙(𝑥)) = −(𝑚(𝑥 − 𝑥1 ) + 𝑦1 )2 + 𝑥3 + 𝑎𝑥 + 𝑏 = 𝑥3 − 𝑚𝑥2 + ⋯ = 0.
Since this equation already has two zeros in 𝐾 (the 𝑥-coordinates of 𝑃 and 𝑄), it
must have a third zero 𝑥3 ∈ 𝐾, and hence the cubic polynomial can be factorized:
𝑓(𝑥, 𝑙(𝑥)) = (𝑥 − 𝑥1 )(𝑥 − 𝑥2 )(𝑥 − 𝑥3 ) = 𝑥3 − (𝑥1 + 𝑥2 + 𝑥3 )𝑥2 + … .
By comparing the 𝑥2 -terms in the above expressions, we find that
𝑥1 + 𝑥2 + 𝑥3 = 𝑚2 .
Thus the coordinates of 𝑅 = (𝑥3 , 𝑦3 ) are
𝑥3 = 𝑚2 − 𝑥1 − 𝑥2 and 𝑦3 = 𝑚(𝑥3 − 𝑥1 ) + 𝑦1 .
(2) If 𝑃, 𝑄 ∈ 𝐸(𝐾) with 𝑃, 𝑄 ≠ 𝑂 have the same 𝑥-coordinate, then the line through
𝑃 and 𝑄 is a vertical line and, as we saw above, the point 𝑂 at infinity lies on all
vertical lines. Hence the third point is 𝑅 = 𝑂.
(3) Now let 𝑃 = 𝑄 = (𝑥1 , 𝑦1 ) ∈ 𝐸(𝐾) and 𝑃 ≠ 𝑂. If 𝑦1 ≠ 0 then we take the tangent
line at 𝑃. The tangent at 𝑃 is the line of equation
(𝑥 − 𝑥1 )(𝐷𝑥 𝑓)(𝑃) + (𝑦 − 𝑦1 )(𝐷𝑦 𝑓)(𝑃) = 0,
and rearranging gives the equation
(𝐷𝑥 𝑓)(𝑃) 3𝑥2 + 𝑎
𝑦 = 𝑡(𝑥) = 𝑚(𝑥 − 𝑥1 ) + 𝑦1 , where 𝑚 = − = 1 .
(𝐷𝑦 𝑓)(𝑃) 2𝑦1
Replacing 𝑦 with 𝑡(𝑥) in the Weierstrass equation 𝑓(𝑥, 𝑦) = −𝑦2 + 𝑥3 + 𝑎𝑥 + 𝑏 = 0
gives a cubic equation with a double root at 𝑥1 . We denote the other root by 𝑥3 .
This is the 𝑥-coordinate of the point 𝑅 = (𝑥3 , 𝑦3 ), the intersection of the tangent
with the elliptic curve. Factorization of the cubic polynomial gives
𝑓(𝑥, 𝑡(𝑥)) = (𝑥 − 𝑥1 )2 (𝑥 − 𝑥3 ) = 𝑥3 − (𝑥3 + 2𝑥1 )𝑥2 + … .
On the other hand, we have as above 𝑓(𝑥, 𝑡(𝑥)) = 𝑥3 − 𝑚2 𝑥2 + … , and comparing
the 𝑥2 -terms yields
𝑥3 = 𝑚2 − 2𝑥1 .
The corresponding 𝑦-coordinate of 𝑅 is
𝑦3 = 𝑚(𝑥3 − 𝑥1 ) + 𝑦1 .
We obtain almost the same formulas as in the case 𝑃 ≠ 𝑄, but the slope 𝑚 is
defined differently.
(4) If 𝑃 = 𝑄 = (𝑥1 , 𝑦1 ) ∈ 𝐸(𝐾) and 𝑦1 = 0, then 𝑥 = 𝑥1 is a vertical tangent line and
𝑅 = 𝑂 lies on that line.
(5) If 𝑃 = 𝑂 and 𝑄 = (𝑥1 , 𝑦1 ) ∈ 𝐸(𝐾), then the line through 𝑃 and 𝑄 is the vertical
line 𝑥 = 𝑥1 , which intersects the elliptic curve in 𝑅 = (𝑥1 , −𝑦1 ). Accordingly, if
𝑃 = (𝑥1 , 𝑦1 ) and 𝑄 = 𝑂 then 𝑅 = (𝑥1 , −𝑦1 ).
(6) Finally, if 𝑃 = 𝑄 = 𝑂 then 𝑅 = 𝑂.
In summary, given two points 𝑃, 𝑄 ∈ 𝐸(𝐾), there is a unique third point 𝑅 ∈ 𝐸(𝐾)
such that 𝑃, 𝑄 and 𝑅 (with multiplicities) lie on the same line. We can therefore define
the addition of points on 𝐸 by letting
𝑃 + 𝑄 + 𝑅 = 𝑂 ⟺ 𝑃 + 𝑄 = −𝑅
(see Figure 12.1). −𝑅 is the reflection of 𝑅 = (𝑥1 , 𝑦1 ) across the 𝑥-axis, i.e.,
−𝑅 = (𝑥1 , −𝑦1 ).
Note that the line through 𝑅 and −𝑅 is the vertical line 𝑥 = 𝑥1 and the third point on
that line is 𝑂, so that 𝑅 + (−𝑅) + 𝑂 = 𝑂, as expected.
In our computations above, we derived explicit formulas for the inverse point and
the addition of two points:
Proposition 12.11. Let 𝐾 be a field with 𝑐ℎ𝑎𝑟(𝐾) ≠ 2, 3 and 𝐸 an elliptic curve over 𝐾,
defined by the Weierstrass equation 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏.
Figure 12.1. The elliptic curve 𝐸 ∶ 𝑦2 + 𝑦 = 𝑥3 − 𝑥 over the real numbers. The line
through 𝑃 and 𝑄 intersects the curve in 𝑅 and 𝑃 + 𝑄 + 𝑅 = 𝑂.
(1) Let 𝑃 = (𝑥1 , 𝑦1 ) and 𝑃 ≠ 𝑂; then −𝑃 = (𝑥1 , −𝑦1 ).

(2) Let 𝑃 = (𝑥1 , 𝑦1 ), 𝑄 = (𝑥2 , 𝑦2 ) ∈ 𝐸(𝐾), 𝑃 ≠ 𝑂, 𝑄 ≠ 𝑂 and 𝑃 ≠ −𝑄, i.e., 𝑥1 ≠ 𝑥2 or
𝑦1 ≠ −𝑦2 . Then
𝑃 + 𝑄 = (𝑥3 , 𝑦3 ), 𝑥3 = 𝑚2 − 𝑥1 − 𝑥2 , 𝑦3 = 𝑚(𝑥1 − 𝑥3 ) − 𝑦1 ,
where 𝑚 is the slope of the line through 𝑃 and 𝑄 or the tangent line, if 𝑃 = 𝑄:
𝑦2 −𝑦1
⎧ 𝑥2 −𝑥1 if 𝑃 ≠ 𝑄,
⎪
𝑚=
⎨ 3𝑥2 +𝑎
⎪ 1 if 𝑃 = 𝑄.
⎩ 2𝑦1
(3) If 𝑃 = −𝑄 then 𝑃 + 𝑄 = 𝑂.
(4) If 𝑃 = 𝑂 then 𝑃 + 𝑄 = 𝑄, and if 𝑄 = 𝑂 then 𝑃 + 𝑄 = 𝑃.
Example 12.12. Let 𝐾 = 𝐺𝐹(19) and 𝐸 ∶ 𝑦2 = 𝑥3 + 3𝑥 + 5. The discriminant of 𝐸 is
Δ = −16(4 ⋅ 33 + 27 ⋅ 52 ) ≡ 12 mod 19, and so the Weierstrass curve is nonsingular.
We verify that 𝑃 = (1, 3) ∈ 𝐸(𝐾) and compute 2𝑃 using the above formulas. We obtain
3+3
𝑚= = 1, 𝑥3 = 1 − 1 − 1 ≡ 18 mod 19, 𝑦3 = (1 − 18) − 3 ≡ 18 mod 19, and so
2⋅3
2𝑃 = (18, 18). Furthermore, −𝑃 = (1, −3) = (1, 16). ♢
Theorem 12.13. 𝐸(𝐾) forms an abelian group with the above addition of points and the
identity element 𝑂 = [0 ∶ 1 ∶ 0].
Proof. We saw above that 𝑂 is the identity element, that every point 𝑃 has an inverse
point −𝑃 and that the addition is by construction commutative. It remains to be proven
that the addition is associative, i.e., (𝑃 + 𝑄) + 𝑅 = 𝑃 + (𝑄 + 𝑅) holds. This can be
shown by tedious computations using the explicit formulas in Proposition 12.11, by
geometric arguments (see [Was08]) or with more advanced results on algebraic curves
(see [Sil09]). □
Example 12.14. Consider the elliptic curve 𝐸 ∶ 𝑦2 = 𝑥3 + 3𝑥 + 5 over 𝐾 = 𝐺𝐹(19)
(see Example 12.12). We let SageMath find all points on this curve.
sage: E= EllipticCurve (GF (19) ,[3 ,5])
sage: E. points ()
[(0 : 1 : 0), (0 : 9 : 1), (0 : 10 : 1), (1 : 3 : 1), (1 : 16 : 1),
(2 :0 : 1), (4 : 9 : 1), (4 : 10 : 1), (6 : 7 : 1), (6 : 12 : 1),
(8 : 3 : 1), (8 : 16 : 1), (9 : 1 : 1), (9 : 18 : 1), (10 : 3 : 1),
(10 : 16 : 1), (11 : 1 : 1), (11 : 18 : 1), (14 : 6 : 1),
(14 : 13 : 1), (15 : 9 : 1), (15 : 10 : 1), (16 : 8 : 1),
(16 : 11 : 1), (18 : 1 : 1), (18 : 18 : 1)]
We see that 𝐸(𝐾) is an abelian group of order 26. By Theorem 4.29, 𝐸(𝐾) is iso-
morphic to ℤ26 ≅ ℤ13 × ℤ2 . The points in 𝐸(𝐾) are depicted in Figure 12.2. Note that
there is one extra point 𝑂 = [0 ∶ 1 ∶ 0] at infinity. All points must have order 1, 2, 13
or 26. Let 𝑃 = (1, 3) ∈ 𝐸(𝐾). We use SageMath to compute 2𝑃 and 13𝑃.
sage: E= EllipticCurve (GF (19) ,[3 ,5]); P=E(1 ,3)
sage: 2*P
(18 : 18 : 1)
sage: 13*P
(2 : 0 : 1)
Since 2𝑃 ≠ 𝑂 and 13𝑃 ≠ 𝑂, the point 𝑃 must have maximal order 26 and therefore
generates 𝐸(𝐾). ♢
For cryptographic use, one chooses a finite field 𝐾 = 𝐺𝐹(𝑝) or 𝐾 = 𝐺𝐹(2𝑚 ), an
elliptic curve 𝐸 over 𝐾 and a base point 𝑔 ∈ 𝐸(𝐾). The point 𝑔 generates a cyclic sub-
group 𝐺 = ⟨𝑔⟩ ⊂ 𝐸(𝐾) of order 𝑛 = ord(𝑔). 𝑛 should be a large prime or at least
ord(𝐸(𝐾))
contain a large prime factor. The cofactor is defined as ℎ = and usually ℎ is
𝑛
small or equal to 1.
Determining the order of a point 𝑔 and the order of 𝐸(𝐾) is a non-trivial task, but
there are efficient algorithms (see [Was08]). Hasse’s Theorem provides the approxi-
mate number of points on an elliptic curve over a finite field:
Figure 12.2. Points on the elliptic curve 𝑦2 = 𝑥3 + 3𝑥 + 5 over 𝐺𝐹(19).
Theorem 12.15. Let 𝐸 be an elliptic curve over a finite field 𝐾 of order 𝑞. Then
| 𝑞 + 1 − ord(𝐸(𝐾)) | ≤ 2√𝑞. ♢
Note that the obvious estimate based on the Weierstrass equation only gives 1 ≤
ord(𝐸(𝐾)) ≤ 2𝑞 + 1.
Example 12.16. Let 𝐸 be any elliptic curve over 𝐺𝐹(19). Then 𝑞 + 1 = 20 and 2√𝑞 ≈
8.7. Hence 𝐸(𝐺𝐹(19)) must have between 12 and 28 points. In Example 12.14, we saw
that the order of 𝐸(𝐺𝐹(19)) is 26.
Definition 12.17. Let 𝐸 be an elliptic curve over a finite field 𝐾, 𝑔 ∈ 𝐸(𝐾) a base point,
𝐺 = ⟨𝑔⟩, 𝑛 = ord(𝐺) and 𝐴 ∈ 𝐺. Then the unique positive integer 𝑎 < 𝑛 such that
𝑎 ⋅ 𝑔 = 𝐴 is called the discrete logarithm log𝑔 (𝐴) of 𝐴 ∈ 𝐺. ♢
Note that we use the term discrete logarithm although the group operation on 𝐸(𝐾)
is written additively.
The security of elliptic curve cryptography relies on the hardness of the discrete
logarithm (DL) problem (see Section 10.3) in the group 𝐺 ⊂ 𝐸(𝐾). The elliptic curve
and the parameters must be carefully chosen, since there are less secure curves where
the computation of discrete logarithms can be reduced to an easier DL problem (see
Section 12.3 below). The construction of secure elliptic curves and their domain param-
eters is beyond the scope of this book.
Elliptic curve cryptography is widely standardized by national and international
organizations (e.g., ISO, ANSI, NIST, IEEE, IETF), and one of the proposed curves is
usually chosen.
Example 12.18. We consider the domain parameters of the curve brainpoolP256r1

(RFC 5639 [LM10]). The curve is defined by the Weierstrass equation 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏
over a 256-bit field 𝐾 = 𝐺𝐹(𝑝). The base point 𝑔 = (𝑥𝑔 , 𝑦𝑔 ) generates the full group
𝐺 = 𝐸(𝐾) and 𝑛 = ord(𝑔) is a 256-bit prime number.
p = A9FB57DBA1EEA9BC3E660A909D838D726E3BF623D52620282013481D1F6E5377
a = 7D5A0975FC2C3057EEF67530417AFFE7FB8055C126DC5C6CE94A4B44F330B5D9
b = 26DC5C6CE94A4B44F330B5D9BBD77CBF958416295CF7E1CE6BCCDC18FF8C07B6
g = (xg,yg)
xg= 8BD2AEB9CB7E57CB2C4B482FFC81B7AFB9DE27E1E3BD23C23A4453BD9ACE3262
yg= 547EF835C3DAC4FD97F8461A14611DC9C27745132DED8E545C1D54C72F046997
n = A9FB57DBA1EEA9BC3E660A909D838D718C397AA3B561A6F7901E0E82974856A7
h = 1
These parameters can be verified using SageMath (see Exercise 4).
12.2. Elliptic Curve Diffie-Hellman

We know from Section 10.3 that the Diffie-Hellman key exchange is based on a cyclic
group 𝐺. The common choice are subgroups of the multiplicative group ℤ∗𝑝 , but over
the last decade, elliptic curves have become increasingly popular.
For a Diffie-Hellman key exchange with elliptic curves, the communication part-
ners (say Alice and Bob) have to agree on a finite field 𝐾, an elliptic curve 𝐸 over 𝐾 and
a base point 𝑔 generating a group 𝐺 of order 𝑛. Usually, they would choose a standard
curve and its domain parameters as in Example 12.18.
Alice and Bob generate uniform secret keys 𝑎 and 𝑏 ∈ ℤ𝑛 , respectively. Alice
computes 𝐴 = 𝑎 ⋅ 𝑔 and Bob 𝐵 = 𝑏 ⋅ 𝑔; then they exchange the public values 𝐴 and 𝐵.
Subsequently, Alice and Bob can both compute the shared secret key 𝑘 = (𝑎𝑏) ⋅ 𝑔 =
𝑎 ⋅ 𝐵 = 𝑏 ⋅ 𝐴 ∈ 𝐺. The 𝑥-coordinate of a key 𝑘 = (𝑥, 𝑦) can be used as a shared secret.
Note that 𝑥 determines a quadratic equation for 𝑦 and there are at most two solutions
for 𝑦. For a uniform output, the 𝑥-coordinate is taken as input of a key derivation or
hash function.
12.3. Efficiency and Security of Elliptic Curve Cryptography 223
An eavesdropper, who only knows 𝐴 and/or 𝐵 as well as the elliptic curve and its
domain parameters, should not be able to derive 𝑎, 𝑏 or 𝑘 if the computational Diffie-
Hellman (CDH) problem (see Section 10.3) is hard in 𝐺.
Example 12.19. Alice and Bob agree on the elliptic curve 𝑦2 = 𝑥3 +3𝑥+5 over 𝐺𝐹(19),
and the base point is 𝑔 = 2 ⋅ (1, 3) = (18, 18). The point 𝑔 has order 13. Alice chooses
the secret key 𝑎 = 2 and computes 𝐴 = 𝑎 ⋅ 𝑔 = 2 ⋅ (18, 18) = (11, 18). Bob chooses the
secret key 𝑏 = 4 and computes 𝐵 = 𝑏 ⋅ 𝑔 = 4 ⋅ (18, 18) = (8, 3). They exchange 𝐴 and 𝐵.
Alice obtains the shared secret key by computing 𝑘 = 𝑎 ⋅ 𝐵 = 2 ⋅ (8, 3) = (9, 1) and
Bob computes 𝑘 = 𝑏 ⋅ 𝐴 = 4 ⋅ (11, 18) = (9, 1).
Remark 12.20. Elliptic curve Diffie-Hellman can also be used as a key encapsulation
mechanism (KEM) as explained in Section 10.6. The encapsulated key is 𝐻(𝑘), where
𝐻 ∶ 𝐺 → {0, 1}𝑙 is a key derivation (or hash) function on 𝐺 (or on 𝐾 if the 𝑥-coordinate
is used). The KEM also gives a hybrid elliptic curve encryption scheme called ECIES
(see Section 10.7).
12.3. Efficiency and Security of Elliptic Curve Cryptography

Diffie-Hellman and other algorithms which use elliptic curves require the multiplica-
tion of points. Analogous to the square-and-multiply algorithm for a fast exponentia-
tion (see Section 3.3), multiplying a point by a factor can be done recursively by dou-
bling and adding points (double-and-add algorithm). Proposition 12.11 shows that the
doubling or addition of points requires a few additions and multiplications as well as
division over the base field 𝐾. For 𝐾 = 𝐺𝐹(𝑝), the running time of a single addition or
doubling of points is 𝑂(size(𝑝)2 ). Using the double-and-add algorithm, the complexity
of multiplying a point by a factor of 𝑎 is 𝑂(size(𝑎) size(𝑝)2 ).
Suppose that 𝐺 = 𝐸(𝐺𝐹(𝑝)) and 𝑛 = ord(𝐺). By Hasse’s Theorem 12.15, the
order of 𝐺 is about 𝑝 and hence the computational complexity of multiplications in
𝐺 is 𝑂(size(𝑝)3 ). This is the same cubic complexity as in other public-key algorithms,
for example RSA and Diffie-Hellman over ℤ∗𝑝 . However, 𝑝 can be relatively small in
ECC, for example 256 bits long, and achieve a similar level of security as a much larger
multiplicative group ℤ∗𝑝 , say with size(𝑝) = 3072.
The security of elliptic curve schemes relies on the hardness of the discrete-loga-
rithm (DL) problem. A major difference to the multiplicative group is that the known
sub-exponential algorithms (in particular Index-Calculus) cannot be applied to elliptic
curves. If 𝑛 = ord(𝐺) is a prime number, then the best known algorithms for com-
puting discrete logarithms on elliptic curves are Babystep-Giantstep and Pollard’s 𝜌-
method for logarithms (see Section 10.5). Both algorithms have exponential complex-
ity 𝑂(√𝑛). However, the elliptic curve needs to be carefully chosen in order to prevent
certain types of attacks.
Remark 12.21. Let 𝐺 = ⟨𝑔⟩ ⊂ 𝐸(𝐺𝐹(𝑝)) and 𝑛 = ord(𝑔) be as above and sup-
pose we have a group homomorphism 𝑓 ∶ 𝐺 → 𝐺′ . With 𝑚 = ord(𝑓(𝑔)) we have
𝑚 ∣ 𝑛. Let 𝑎 ∈ {0, 1, … , 𝑛 − 1} and 𝐴 = 𝑎 ⋅ 𝑔. We obtain 𝑓(𝐴) = 𝑎 ⋅ 𝑓(𝑔), and thus

log𝑔 (𝐴) ≡ log𝑓(𝑔) (𝑓(𝐴)) mod 𝑚. Now the approach is to compute the discrete loga-
rithm in the group 𝐺′ instead of 𝐺, which might be easier. In fact, using the Weil or
Tate-Lichtenbaum pairing, 𝐺 can be efficiently embedded into the multiplicative group
of an extension field of 𝐺𝐹(𝑝). We refer to [Was08] for pairings and the MOV attack,
named after Menezes, Okamoto and Vanstone [MOV93], against the DL problem for
elliptic curves. ♢
Below, we list some of the requirements (see RFC 5639 [LM10]):
• Using the Weil or Tate-Lichtenbaum pairing mentioned above, one can reduce
the DL problem in 𝐺 to the DL problem in the multiplicative group of the field
𝐺𝐹(𝑝𝑙 ), where 𝑛 ∣ 𝑝𝑙 −1, or equivalently, 𝑝𝑙 ≡ 1 mod 𝑛. However, this can only be
exploited if 𝑙 is small enough so the discrete logarithm in 𝐺𝐹(𝑝𝑙 )∗ can be efficiently
computed. In order to prevent this attack, one requires that the multiplicative
order of 𝑝 mod 𝑛 is large and close to the maximum possible value 𝜑(𝑛).
• Anomalous curves have exactly 𝑝 points over 𝐺𝐹(𝑝). Such curves must be
avoided, since the DL problem can be reduced to an additive group, where com-
putation of the discrete logarithm is a simple division and thus runs in polynomial
time.
• The group order 𝑛 = ord(𝐺) should be a prime number, so that the DL problem
cannot be reduced to subgroups of 𝐺 (Pohlig-Hellman algorithm).
• The generation of the elliptic curve and its domain parameters should be pseudo-
random.
Table 12.1 (see [Bar16]) shows comparable strengths for different algorithms and
key lengths (in bits). The table compares symmetric encryption schemes (key size),
RSA (size of the modulus 𝑁), Diffie-Hellman using a subgroup of order 𝑞 in ℤ∗𝑝 (size of
𝑝, size of 𝑞) and Diffie-Hellman with elliptic curves (size of 𝑛, where 𝑛 is the order of
the cyclic group 𝐺).
Table 12.1. Comparable algorithm strengths for different key lengths.
Symmetric Encryption RSA DH ECDH

80 1024 1024, 160 160 − 223
112 2048 2048, 224 224 − 255
128 3072 3072, 256 256 − 383
192 7680 7680, 384 384 − 511
256 15360 15360, 512 512+
12.4. Elliptic Curve Factoring Method

Surprisingly, elliptic curves cannot only be used to secure cryptographic operations,
but also to attack RSA by factoring large integers.
12.4. Elliptic Curve Factoring Method 225
Lenstra’s elliptic curve factoring method (ECM) is similar to Pollard’s 𝑝 − 1 method

(see Section 9.7). Suppose that 𝑁 = 𝑝𝑞 is given and the factors 𝑝, 𝑞 are unknown. If all
prime factors of 𝑝 − 1 are small, then one has a significant chance of finding a multiple
𝑘 of 𝑝 − 1, for example by setting 𝑘 = 𝐵! for some bound 𝐵. Euler’s Theorem implies
𝑎𝑝−1 ≡ 1 mod 𝑝 and hence 𝑎𝑘 ≡ 1 mod 𝑝 for any base 𝑎 that is not a multiple of 𝑝.
Hence gcd(𝑎𝑘 − 1, 𝑁) is either 𝑝 (success) or 𝑁 (failure). Pollard’s method is successful
if the image of 𝑎𝑘 in
ℤ∗𝑁 ≅ ℤ∗𝑝 × ℤ∗𝑞
is congruent to 1 in exactly one of the components. This attack can be prevented by
choosing primes 𝑝 such that 𝑝 − 1 contains a large prime factor (strong primes). Ran-
domly chosen primes 𝑝 usually satisfy this condition.
ECM replaces the multiplicative group ℤ∗𝑁 in Pollard’s method by the group of
points on an elliptic curve 𝐸 over ℤ𝑁 . Although ℤ𝑁 is only a ring and not a field, we
have the decompositions
ℤ𝑁 ≅ ℤ𝑝 × ℤ𝑞 and 𝐸(ℤ𝑁 ) ≅ 𝐸(𝐺𝐹(𝑝)) × 𝐸(𝐺𝐹(𝑞)).
The formulas for addition and doubling of points in Proposition 12.11 (2) also hold over
ℤ𝑁 if the point is not equal to 𝑂 modulo 𝑝 or 𝑞. We choose an elliptic curve 𝐸 and a
point 𝑃 ∈ 𝐸(ℤ𝑁 ). Then compute 𝑘𝑃, for example 𝑘 = 𝐵!, and check whether the result
exists as an affine point modulo 𝑁. This fails if and only if a denominator of a slope 𝑚
in the computation of 𝑘𝑃 is not invertible modulo 𝑁. In this case, the greatest common
divisor (gcd) of the denominator of 𝑚 and 𝑁 is not equal to 1, and it is very likely that
the gcd is either 𝑝 or 𝑞, and not 𝑁.
One may proceed as follows to choose an elliptic curve 𝐸 and a point 𝑃 over ℤ𝑁 :
choose random integers 𝑎, 𝑢 and 𝑣 between 0 and 𝑁 − 1. Let
𝑃 = (𝑢, 𝑣) mod 𝑁,
𝑏 = 𝑣2 − 𝑢3 − 𝑎𝑢 mod 𝑁,
𝐸 ∶ 𝑦2 = 𝑥3 + 𝑎𝑥 + 𝑏.
By construction we have 𝑃 ∈ 𝐸(ℤ𝑁 ). Then define 𝑘 = 𝐵! for some bound 𝐵 and com-
pute 𝑘𝑃. If 𝑘𝑃 does not exist modulo 𝑁 (as an affine point, see Remark 12.22 below),
then one has found a non-trivial factor of 𝑁, i.e., either 𝑝, 𝑞 or 𝑁. If 𝑘𝑃 exists, or if the
factor is 𝑁, then choose a new random curve 𝐸 and point 𝑃 or increase 𝐵 and start over.
Remark 12.22. We say that the point 𝑘𝑃 does not exist modulo 𝑁 if it cannot be com-
puted modulo 𝑁 (because of a non-invertible denominator). However, the correspond-
ing projective point exists and the reduction modulo 𝑝 or modulo 𝑞 is equal to the point
𝑂 at infinity. The aim of ECM is to produce such a non-affine point, since it yields a
non-trivial factor of 𝑁. This is analogous to Pollard’s 𝑝−1 method, where one is looking
for a power 𝑎𝑘 that is congruent to 1 modulo 𝑝 or modulo 𝑞.
Example 12.23. Let 𝑁 = 1211809. We choose 𝑎 = 10, 𝑢 = 5 and 𝑣 = 7 so that

𝑏 = −126. The corresponding elliptic curve is
𝐸 ∶ 𝑦2 = 𝑥3 + 10𝑥 − 126
and 𝑃 = (5, 7) ∈ 𝐸(ℤ𝑁 ). Now we compute multiples of 𝑃. We choose the bound

𝐵 = 10, set 𝑘 = 𝐵! and hope that the order of 𝑃 in 𝐸(𝐺𝐹(𝑝)) or 𝐸(𝐺𝐹(𝑞)) contains only
prime factors less than or equal to 𝐵. We compute
2𝑃, (3! )𝑃, (4! )𝑃, … , (10! )𝑃
and stop if any of these multiples does not exist as an affine point modulo 𝑁. The reader
can reproduce the following computations using SageMath.
We omit some intermediate steps and obtain
(6! )𝑃 = 720 ⋅ 𝑃 = (222064, 820051).
We denote this point by 𝑄 = (𝑥1 , 𝑦1 ). The next step is computing (7! )𝑃 = 7𝑄. Us-
ing the double-and-add algorithm and the formulas in Proposition 12.11, we compute
2𝑄 = (884483, 792125), 3𝑄 = 2𝑄 + 𝑄 = (179208, 246408) and 6𝑄 = 2 ⋅ (3𝑄) =
(1011121, 433793) = (𝑥2 , 𝑦2 ). Finally, 7𝑄 = 6𝑄 + 𝑄, but now the slope
𝑦2 − 𝑦1
𝑚=
𝑥2 − 𝑥 1
does not exist modulo 𝑁 since the denominator
𝑥2 − 𝑥1 = 1011121 − 222064 = 789057
has a common factor with 𝑁. In fact, gcd(𝑥2 − 𝑥1 , 𝑁) = 1201 = 𝑝, and thus we have
found a factor of 𝑁. The second factor is 𝑞 = 1009.
We see that the elliptic curve factoring method is successful for 𝑁 and the chosen
curve 𝐸, point 𝑃 and multiple 𝑘 = 7!. After having found 𝑝 = 1201, it is not difficult
to understand why the method is successful: the group 𝐸(𝐺𝐹(1201)) has order 1176 =
23 ⋅ 2 ⋅ 72 , which contains only prime factors less than or equal to 7 so that (7! )𝑃 =
𝑂 mod 𝑝.
On the other hand, the order of 𝐸(𝐺𝐹(𝑞)) is 1041 = 3 ⋅ 347, and hence (7! )𝑃 ≠
𝑂 mod 𝑞. So we obtain only the factor 𝑝 and not the product 𝑝𝑞 = 𝑁. ♢
ECM is very successful in factoring integers 𝑁 with less than around 80 decimal
digits. For larger values of 𝑁, the sieve methods (see Section 9.7) are more efficient.
Exercises 227
12.5. Summary
• An elliptic curve is a nonsingular projective curve defined by a cubic Weier-

strass equation in two variables. The points of an elliptic curve over a field
form an abelian group.
• The group of points on an elliptic curve over a finite field can be used in public-
key schemes. Common applications of elliptic curves are the Diffie-Hellman
key exchange, key encapsulation, hybrid encryption and digital signatures.
• The security of elliptic curve cryptography relies on the assumed hardness of
the discrete logarithm problem. The elliptic curve and the domain parameters
need to be carefully chosen.
• Elliptic curve schemes have shorter keys than classical public-key algorithms
with comparable strength.
• Elliptic curves can also be used to factorize large integers.
Exercises
1. Let 𝐸 be the elliptic curve defined by 𝑦2 = 𝑥3 + 3𝑥 + 5.

(a) Assume that 𝐸 is defined over 𝐾 = 𝐺𝐹(19). Compute the discriminant Δ.
Show that 𝑃 = (4, 9) ∈ 𝐸(𝐺𝐹(19)) and compute 2𝑃.
(b) Now assume that 𝐸 is defined over 𝐾 = ℚ. Compute the discriminant Δ.
Show that 𝑃 = (4, 9) ∈ 𝐸(ℚ) and compute 2𝑃.
2. Suppose 𝐸 is an elliptic curve over 𝐾 = 𝐺𝐹(23).
(a) Give the minimum and the maximum number of points on 𝐸(𝐾).
(b) How many point additions or doublings are at most needed to compute 𝑛𝑃 for
any 𝑛 ∈ ℤ and 𝑃 ∈ 𝐸(𝐾) ?
3. Consider the Weierstrass curve 𝑦2 = 𝑥3 and let 𝐸𝑛𝑠 (𝐾) be the set of nonsingular
points on the curve (see Example 12.8). Prove that the map
𝑥
𝑓 ∶ 𝐸𝑛𝑠 (𝐾) → (𝐾, +), (𝑥, 𝑦) ↦ , 𝑂 ↦ 0
𝑦
is bijective. Add-on: Use the addition formulas in Proposition 12.11 to show that
𝑓 is a homomorphism, i.e., 𝑓(𝑃1 + 𝑃2 ) = 𝑓(𝑃1 ) + 𝑓(𝑃2 ) holds. Therefore, 𝑓 is an
isomorphism.
𝑥 1 1
Tip: Let 𝑡 = . Verify that 𝑥 = 2 and 𝑦 = 3 .
𝑦 𝑡 𝑡
4. Show that the parameters in Example 12.18 are valid, i.e., that 𝑝 and 𝑛 are prime,
Δ ≠ 0 and ord(𝑔) = 𝑛. Furthermore, show that ord(𝑝 mod 𝑛) is large.
Hint: Use SageMath. Hexadecimal numbers can be defined using the prefix 0x.
Check the primality using is_prime(). Define the elliptic curve
E=EllipticCurve(GF(p),[a,b]) and the point g=E(xg,yg). The order of 𝐸(𝐾)
can be obtained by E.order().
5. Use the parameters in Example 12.18 for an elliptic curve Diffie-Hellman key ex-
change. Assume that Alice and Bob choose the following secret parameters:
a=81DB1EE100150FF2EA338D708271BE38300CB54241D79950F77B063039804F1D
b=55E40BC41E37E3E2AD25C3C6654511FFA8474A91A0032087593852D3E7D76BD3
Compute 𝐴, 𝐵, 𝑏𝐴, 𝑎𝐵 and 𝑘.
6. Factorize 𝑁 = 6227327 using the elliptic curve factoring method and SageMath.
(a) Choose 𝑎 = 4, 𝑢 = 6 and 𝑣 = 2. Give the associated elliptic curve over ℤ𝑁 and
the point 𝑃 ∈ 𝐸(ℤ𝑁 ).
Tip: E=EllipticCurve(IntegerModRing(N),[a,b]) and P=E(u,v).
(b) Let 𝐵 = 13!. Show that (12! )𝑃 exists (as an affine point), but not (13)! 𝑃.
(c) Find the critical denominator and compute the gcd with 𝑁.
(d) Give the factors 𝑝 and 𝑞 of 𝑁 and explain why the method is successful with
the chosen parameters.
7. The Elliptic Curve Digital Signature Algorithm (ECDSA) is the elliptic curve ana-
logue of DSA (see Exercise 11.10) and also standardized in [FIP13]. The scheme
uses an elliptic curve 𝐸 over a finite field 𝐾 and a base point 𝑔 of prime order 𝑛.
Choose a uniform secret key 𝑎 ∈ ℤ𝑛 and compute the point 𝐴 = 𝑎 𝑔 ∈ 𝐸(𝐾). The
domain parameters of the elliptic curve and 𝐴 form the public key. Similarly to
ElGamal and DSA, the signature is randomized and also requires a hash function
𝐻. In order to sign a message 𝑚, a secret uniform integer 𝑘 with 1 ≤ 𝑘 ≤ 𝑛 − 1 is
chosen and 𝑘 𝑔 is computed. Let 𝑟 be the 𝑥-coordinate modulo 𝑛 of the point 𝑘 𝑔.
If 𝑟 = 0 then choose a new value 𝑘. Otherwise, let
𝑠 = 𝑘−1 (𝐻(𝑚) + 𝑎𝑟) mod 𝑛.
If 𝑠 = 0 then start again with a new value 𝑘. Otherwise, the pair (𝑟, 𝑠) is the signa-
ture of 𝑚.
To verify the signature (𝑟, 𝑠) of a message 𝑚, one checks that 1 ≤ 𝑟 ≤ 𝑛 − 1 and
1 ≤ 𝑠 ≤ 𝑛 − 1. Then compute 𝑠−1 mod 𝑛, 𝑠−1 𝐻(𝑚) mod 𝑛, 𝑠−1 𝑟 mod 𝑛 and the
point
𝑅 = 𝑠−1 𝐻(𝑚)𝑔 + 𝑠−1 𝑟𝐴 ∈ 𝐸(𝐾).
Since ord(𝑔) = 𝑛, the result does not depend on the representatives of 𝑠−1 𝐻(𝑚)
and 𝑠−1 𝑟 modulo 𝑛. If 𝑅 = 𝑂 then the signature is invalid. Otherwise, reduce the
𝑥-coordinate of 𝑅 modulo 𝑛. The signature is valid if the result is 𝑟.
(a) Prove that the verification is correct.
(b) Use the elliptic curve and the parameters of Example 12.19; the base point is
𝑔 = (18, 18) and 𝑛 = ord(𝑔) = 13. Alice’s secret key is 𝑎 = 2. She wants to
sign a message 𝑚 with 𝐻(𝑚) ≡ 11 mod 𝑛 and chooses 𝑘 = 3. Compute the
signature (𝑟, 𝑠) and verify the signature using her public key.
(c) Show that 𝑘 must remain secret.
Chapter 13
Quantum Computing
This chapter provides an introduction to quantum computing. We focus on quantum

computation and cryptographic applications, not on physical realizations of quantum
computers, and we do not require any previous knowledge in quantum mechanics.
Quantum computers have amazing features: they can process a very large number
of input values simultaneously and solve certain hard problems. At the time of this
writing, quantum computers are not yet large and stable enough to break any real-
world cryptosystems, but this might change within a decade. Today’s quantum systems
are noisy and quantum error correction is an important topic. There is now a broad
range of research into post-quantum cryptography, and new schemes should be secure
in the presence of quantum computers. Chapters 14 and 15 give an introduction to
several post-quantum schemes.
The basic information unit of quantum computing is a quantum bit (qubit), which
can assume not only two, but infinitely many states. However, only one classical bit can
be extracted (or measured) from a qubit and the output is probabilistic. In Section 13.1,
qubits and operations on single qubits are introduced. Systems with multiple qubits are
dealt with in Section 13.2. Analogous to classical computing, quantum algorithms are
realized with quantum gates. Since the output is probabilistic, quantum algorithms
need to be carefully designed in order to obtain useful results. The Deutsch-Josza al-
gorithm is explained in Section 13.3 and demonstrates the possibilities of quantum
computing.
A major algorithm is the Quantum Fourier Transform (Section 13.4), which can
efficiently find periods in a large (exponential) set of numbers. In Section 13.5, we show
that Shor’s algorithm can efficiently factorize integers and break the RSA cryptosystem.
229
230 13. Quantum Computing
Quantum bits can also be used for a secure key distribution, and we explain the
BB84 quantum cryptographic protocol in Section 13.6.
Two recommended textbooks for further reading on quantum computing are
[NC00] and [RP11].
13.1. Quantum Bits

A classical computer processes the input data sequentially in order to solve a problem.
Turing machines or logical circuits can be used to model classical computation. A
Boolean circuit is made up of elementary gates and performs a computational task,
where input bits are transformed into output bits. In each step, the system has a fixed
state and, depending on the complexity of an algorithm and the size of the input data,
the computation can be inefficient or infeasible: suppose that 𝑓 is a vectorial Boolean
function and an algorithm iterates over all 𝑥 ∈ {0, 1}𝑛 until 𝑓(𝑥) is equal to a fixed value
𝑦. The worst-case running time is exponential in 𝑛 and the algorithm is inefficient on
conventional computers, even if the individual computations of 𝑓(𝑥) run very fast.
Now, quantum computers can compute all 2𝑛 values 𝑓(𝑥) simultaneously. This
sounds fantastic, but there is a catch: the computation requires a system of 𝑛 qubits,
the internal state is inaccessible and only a single output value can be extracted from
a quantum system. Nevertheless, we will see that quantum algorithms can efficiently
solve many hard problems.
Quantum computing uses quantum circuits instead of Boolean circuits, and quan-
tum algorithms take qubits (quantum bits) instead of classical bits as input. A qubit is
a two-level quantum-mechanical system and can have different physical implementa-
tions, for example the polarization of a photon, the spin of an electron or the state of
an ion. The physical realization of quantum systems is an important field of research
and development, but it is beyond the scope of this book.
A qubit can assume infinitely many states between 0 or 1; the state is represented
by a normalized vector in the vector space ℂ2 . We fix an orthonormal basis of ℂ2 , for
example the standard basis 𝑒1 = (1, 0) and 𝑒2 = (0, 1), and denote the basis states by
|0⟩ and |1⟩. The Dirac or ket notation of a vector 𝜓 in the state space is |𝜓⟩. The state of
a qubit |𝜓⟩ is a superposition (linear combination) of the basis states |0⟩ and |1⟩:
|𝜓⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , where 𝑎, 𝑏 ∈ ℂ and |𝑎|2 + |𝑏|2 = 1.
The coefficients of |0⟩ and |1⟩ can be interpreted as probabilities, and a qubit as a ran-
dom variable. A measurement (observation) changes the state of a qubit and yields a
classical bit, i.e., either 0 or 1. The state of a qubit determines the probability of the
result of a measurement: the probability of 0 is |𝑎|2 and the probability of 1 is |𝑏|2 .
The original state of a qubit, and hence the amplitudes 𝑎 and 𝑏, are lost after the mea-
surement. Thus the quantum information (𝑎 and 𝑏) is hidden and cannot be directly
extracted. However, the quantum information can be used in quantum circuits that
transform the state of qubits.
Example 13.1. The measurement of a qubit having the state |0⟩ = 1 ⋅ |0⟩ + 0 ⋅ |1⟩ always
gives 0, and the measurement of the state |1⟩ always gives 1. On the other hand, if a
qubit has the state
1 1
|0⟩ + |1⟩ ,
√2 √2
1
then the probability of both 0 and 1 is , and a measurement outputs a uniform random
2
bit. This above state is quite useful and we denote it by |+⟩. ♢
A geometric representation of the state of a single qubit is given by the Bloch sphere
(see Figure 13.1). The points on a two-dimensional unit sphere in the three-dimension-
al space ℝ3 are given by two angles, the polar angle 𝜃 ∈[0, 𝜋] and the azimuth 𝜑 ∈]−𝜋, 𝜋].
In geography, the polar angle and the azimuth are called colatitude and longitude, re-
spectively.
𝐳 = |0⟩
|𝜓⟩
𝜃
𝐲
𝜑
𝐱 = |+⟩
−𝐳 = |1⟩
Figure 13.1. Bloch sphere.
How can we represent a state |𝜓⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ by a point on the Bloch sphere?
We use the fact that the state of a qubit does not depend on a global phase 𝛾:
|𝜓⟩ ∼ 𝑒𝑖𝛾 |𝜓⟩ .
We may thus assume that the phase of the first coefficient 𝑎 is zero, so that 𝑎 is a non-
negative real number. We express the second coefficient 𝑏 ∈ ℂ in polar form:
𝑏 = 𝑟𝑒𝑖𝜑 ,
where 𝑟 ≥ 0 and 𝜑 ∈ ] − 𝜋, 𝜋]. The condition |𝑎|2 + |𝑏|2 = 1 implies 𝑎2 + 𝑟2 = 1,
since |𝑒𝑖𝜑 | = 1. The non-negative parameters 𝑎 and 𝑟 thus lie on a unit circle in the
first quadrant and we can write
𝜃 𝜃
𝑎 = cos ( ) and 𝑟 = sin ( ) ,
2 2
where 𝜃 ∈ [0, 𝜋]. We can therefore rewrite |𝜓⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ as

𝜃 𝜃
|𝜓⟩ = cos ( ) |0⟩ + 𝑒𝑖𝜑 sin ( ) |1⟩ ,
2 2
where 𝜃 ∈ [0, 𝜋] and 𝜑 ∈ ] − 𝜋, 𝜋].
Example 13.2. Here are the representations of several states:
|0⟩ = cos(0) |0⟩ + 𝑒𝑖⋅0 sin(0) |1⟩ 𝜃=0 𝜑 = 0,
𝜋 𝜋
|1⟩ = cos ( ) |0⟩ + 𝑒𝑖⋅0 sin ( ) |1⟩ 𝜃=𝜋 𝜑 = 0,
2 2
𝜋 𝑖⋅0 𝜋 𝜋
|+⟩ = cos ( ) |0⟩ + 𝑒 sin ( ) |1⟩ 𝜃= 𝜑 = 0,
4 4 2
𝜋 𝑖⋅𝜋 𝜋 𝜋
|−⟩ = cos ( ) |0⟩ + 𝑒 sin ( ) |1⟩ 𝜃= 𝜑 = 𝜋.
4 4 2
Note that a global phase does not change the state, e.g., 𝑖 |+⟩ ∼ |+⟩, but the relative
phase 𝜑 is important! For example, |+⟩ and |−⟩ are different states. ♢
Next, we study operations on a single qubit by a quantum gate.

Definition 13.3. A quantum gate 𝑈𝑓 with a single input and output qubit is described
by a unitary 2 × 2 matrix
𝑐 𝑐
𝑈 = ( 11 12 ) .
𝑐21 𝑐22
A state |𝜓⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ is transformed into
𝑈𝑓 |𝜓⟩ = 𝑈𝑓 (𝑎 |0⟩ + 𝑏 |1⟩) = (𝑐11 𝑎 + 𝑐12 𝑏) |0⟩ + (𝑐21 𝑎 + 𝑐22 𝑏) |1⟩ . ♢
Surprisingly, there is more than one nontrivial operation on a single qubit.

Definition 13.4. (1) The quantum analogue of the classical NOT gate transforms the
state |𝜓⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ into |𝜓⟩ = 𝑏 |0⟩ + 𝑎 |1⟩. The swapping of coefficients is
given by the Pauli-𝑋 matrix
0 1
𝑋=( ).
1 0
(2) The Hadamard gate is described by the unitary matrix
1 1 1
𝐻= ( ). ♢
√2 1 −1
Figure 13.2 depicts the Pauli-𝑋 and Hadamard gates.

1 1 1 1 1
Since 𝐻 ⋅ ( ) = ( ), the state |0⟩ is transformed into |+⟩ = |0⟩ + |1⟩. This
0 √2 1 √2 √2
is very useful in order to produce a balanced superposition of the basis states. Loosely
speaking, a 0-bit is turned into a qubit that is simultaneously 0 and 1. Measuring 𝐻 |0⟩
gives a uniform random bit (see Figure 13.3).
𝑎 |0⟩ + 𝑏 |1⟩ 𝑋 𝑏 |0⟩ + 𝑎 |1⟩
𝑎+𝑏 𝑎−𝑏
𝑎 |0⟩ + 𝑏 |1⟩ 𝐻 |0⟩ + |1⟩
√2 √2
Figure 13.2. Pauli-𝑋 and Hadamard gates.
|0⟩ 𝐻 𝑏
Figure 13.3. Measuring 𝐻 |0⟩ gives a uniform random bit 𝑏.
Definition 13.5. Other commonly used single qubit gates are:

0 −𝑖
(1) Pauli-𝑌 gate with matrix 𝑌 = ( ).
𝑖 0
1 0
(2) Pauli-𝑍 gate with matrix 𝑍 = ( ).
0 −1
1 0
(3) Phase gate with matrix 𝑆 = ( ).
0 𝑖
𝜋 1 0
(4) gate with matrix 𝑇 = ( ). ♢
8 0 𝑒𝑖𝜋/4
Since 𝑌 = −𝑖𝑍𝑋, Pauli-𝑌 is a composition of Pauli-𝑋 and Pauli-𝑍. The gates Pauli-
𝜋 𝜋 𝜋
𝑍, Phase and change the relative phase by 𝜋, and , respectively. On the Bloch
8 2 4
𝜋 𝜋
sphere, these gates give rotations of 𝜋, and around the 𝑧-axis. For example, one
2 4
has
𝜃 𝜃 𝜃 𝜃
𝑍 |𝜓⟩ = 𝑍 (cos ( ) |0⟩ + 𝑒𝑖𝜑 sin ( ) |1⟩) = cos ( ) |0⟩ − 𝑒𝑖𝜑 sin ( ) |1⟩ .
2 2 2 2
This state can be written as
𝜃 𝜃
𝑍 |𝜓⟩ = cos ( ) |0⟩ + 𝑒𝑖(𝜑+𝜋) sin ( ) |1⟩ .
2 2
Hence the 𝑍 gate adds 𝜋 to the azimuth 𝜑 on the Bloch sphere.
𝜋
The reason for the historical name gate is the fact that 𝑇 is (up to an unimportant
8
𝜋
global phase) equal to a matrix with ± on its diagonals:
8
𝑒−𝑖𝜋/8 0
𝑇 = 𝑒𝑖𝜋/8 ( 𝑖𝜋/8 ) .
0 𝑒
13.2. Multiple Qubit Systems

More interesting quantum operations require systems of multiple qubits. A composite
system of 𝑛 qubits can represent 2𝑛 states simultaneously. The basis states are
|𝑥1 𝑥2 … 𝑥𝑛 ⟩ ,
where 𝑥𝑖 ∈ {0, 1}. The general state of an 𝑛-qubit system is a superposition of the 2𝑛
basis states. Such a system is not the same as 𝑛 individual qubits!
Remark 13.6. The state of system of 𝑛 qubits is represented by the 𝑛-fold tensor prod-
uct of ℂ2 :
ℂ2 ⊗ ⋯ ⊗ ℂ2 = (ℂ2 )⊗𝑛 .
The general construction of tensor products is beyond our scope, but in this case
(ℂ2 )⊗𝑛 is a ℂ-vector space of dimension 2𝑛 . The elements in (ℂ2 )⊗𝑛 are linear com-
binations of vectors 𝑣1 ⊗ 𝑣2 ⊗ ⋯ ⊗ 𝑣𝑛 = |𝑣1 , 𝑣2 , … , 𝑣𝑛 ⟩, where 𝑣𝑖 ∈ ℂ2 . The tensor
product is linear in each component. The standard basis states are given by
𝑥1 ⊗ 𝑥2 ⊗ ⋯ ⊗ 𝑥𝑛 = |𝑥1 𝑥2 … 𝑥𝑛 ⟩ ,
where 𝑥𝑖 = |0⟩ or |1⟩. A general vector in (ℂ2 )⊗𝑛 can be written as
𝑣= ∑ 𝑎𝑥 |𝑥1 𝑥2 … 𝑥𝑛 ⟩ .
𝑥∈{0,1}𝑛
The norm of 𝑣 is the non-negative real number

‖𝑣‖ = ∑ |𝑎𝑥 |2 .
𝑥∈{0,1}𝑛
The normalization condition for multiple qubit systems requires that ‖𝑣‖ = 1. Uni-
tary operators 𝑈𝑓1 , … , 𝑈𝑓𝑛 on ℂ2 induce the unitary operator
𝑈𝑓 = 𝑈𝑓1 ⊗ ⋯ ⊗ 𝑈𝑓𝑛
2 ⊗𝑛
on (ℂ ) , but there are additional operators that are not of this type.
As with single qubits, a multiple qubit state does not depend on a global phase and
|𝜓⟩ ∼ 𝑒𝑖𝛾 |𝜓⟩ .
A two qubit system is represented by a state in ℂ2 ⊗ ℂ2 . The four basis states are |00⟩,
|01⟩, |10⟩, |11⟩, and a general state is a superposition of the four basis states:
|𝜓⟩ = 𝑎00 |00⟩ + 𝑎01 |01⟩ + 𝑎10 |10⟩ + 𝑎11 |11⟩ .
The normalization condition requires |𝑎00 |2 + |𝑎01 |2 + |𝑎10 |2 + |𝑎11 |2 = 1. Measuring

a two qubit system gives 00 with probability |𝑎00 |2 , 01 with probability |𝑎01 |2 , 10 with
probability |𝑎10 |2 and 11 with probability |𝑎11 |2 .
Example 13.7. (1) Consider a two qubit system in the state

1 1 1 1
|𝜓⟩ = ( |0⟩ + |1⟩) ⊗ ( |0⟩ + |1⟩)
√2 √2 √2 √2
1 1 1 1
= |00⟩ + |01⟩ + |10⟩ + |11⟩ .
2 2 2 2
1 1
All four combinations have the same probability ( )2 = .
2 4
(2) One of the Bell states is
1 1
|𝜓⟩ = |00⟩ + |11⟩ .
√2 √2
After a measurement, the state is either 00 or 11. The combinations 01 and 10
cannot occur; a measurement of the two qubits must give the same result. The
qubits are entangled. In quantum mechanics, this is called the EPR (Einstein,
Podolsky, Rosen) paradox, where two particles are mysteriously correlated. The
other three Bell states are given in Exercise 2. ♢
One can show that the Bell states are not the product of any two single qubit states
(see Exercise 1). In fact, two entangled qubits behave differently from two single qubits.
The basis states of a system of 𝑛 qubits are |𝑥1 𝑥2 … 𝑥𝑛 ⟩ where 𝑥𝑖 ∈ {0, 1}. A general
state is a superposition
|𝜓⟩ = ∑ 𝑎𝑥 |𝑥⟩ with ∑ |𝑎𝑥 |2 = 1.
𝑥∈{0,1}𝑛 𝑥∈{0,1}𝑛
A basis state |𝑥1 𝑥2 … 𝑥𝑛 ⟩ can also be written as |𝑥⟩, where

𝑥 = 𝑥1 2𝑛−1 + 𝑥2 2𝑛−2 + ⋯ + 𝑥𝑛 ∈ {0, 1, … , 2𝑛 − 1}.
Then, a general state of an 𝑛-qubit system is
2𝑛 −1 2𝑛 −1
|𝜓⟩ = ∑ 𝑎𝑥 |𝑥⟩ with ∑ |𝑎𝑥 |2 = 1.
𝑥=0 𝑥=0
An obvious generalization of the Bloch sphere for multiple qubits is not known. Note
that the full state of a system of 𝑛 qubits involves 2𝑛 complex amplitudes (and 2𝑛 − 1
degrees of freedom, since the amplitude vector is normalized and multiplication by a
global phase is unimportant). This is a huge number, say for 𝑛 > 100. We emphasize
that a measurement outputs only one binary word 𝑥 of length 𝑛 and the probability of
𝑥 being measured is |𝑎𝑥 |2 .
13.3. Quantum Algorithms

Classical Boolean circuits transform input bits into output bits. Not surprisingly, quan-
tum circuits process qubits. The basic building blocks of quantum circuits are quantum
logic gates, which implement unitary (and therefore reversible) transformations on
qubits. This is a major difference to classical circuits, which are often not reversible,
for example the elementary AND or OR gates. However, this does not pose a serious
restriction, since there are invertible analogues of the classical gates. It turns out that
every classical circuit has a quantum analogue, giving the same output on the basis
states, but also processing superpositions of the basis states.
Definition 13.8. The controlled-NOT operation 𝐶𝑁𝑂𝑇 on two input qubits with basis
states |𝑥⟩ and |𝑦⟩ is given by
𝐶𝑁𝑂𝑇 |𝑥, 𝑦⟩ = |𝑥, 𝑥 ⊕ 𝑦⟩
(see Figure 13.4). This transformation acts on the basis states as follows: the first bit
(the control bit) is unchanged and the second bit (the target bit) is flipped if the control
bit is 1. The states |00⟩ and |01⟩ are thus unchanged, |10⟩ is mapped to |11⟩ and |11⟩ to
|10⟩. The CNOT gate is represented by the unitary matrix
1 0 0 0
⎛ ⎞
0 1 0 0
𝑈=⎜ ⎟,
⎜0 0 0 1⎟
⎝0 0 1 0⎠
since the first two basis states remain unchanged, while the third and the fourth basis
state are swapped. ♢
|𝑥⟩ |𝑥⟩
|𝑦⟩ |𝑥⟩ ⊕ |𝑦⟩
Figure 13.4. Controlled NOT gate.
CNOT can produce entangled states and this two-bit gate is not the tensor product
of two single-qubit gates. For example, two qubits are transformed into the Bell state:
1 1
𝐶𝑁𝑂𝑇 ( (|00⟩ + |10⟩)) = (|00⟩ + |11⟩) .
√2 √2
CNOT is an example of a two qubit controlled gate (see Figure 13.5): if the first (control)
qubit is |1⟩, then a single qubit operation 𝑄 is performed on the second (target) qubit.
If the control qubit is |0⟩, then the target qubit is unchanged (see Figure 13.5). The
controlled 𝑄 gate is represented by a 4 × 4 matrix of four 2 × 2 blocks, where 𝐼2 is the
2 × 2 identity matrix:
𝐼 0
(2 ).
0 𝑄
CNOT is a controlled 𝑄 gate with 𝑄 = 𝑋 (Pauli-𝑋). The CNOT gate is especially

important, since single qubit and CNOT gates are universal.
Figure 13.5. Controlled 𝑄 gate.
Theorem 13.9. Single qubit gates and the CNOT gate are sufficient to implement an
arbitrary unitary operation on 𝑛 qubits.
Proof. We refer to [NC00] for a proof. Firstly, one shows that an arbitrary unitary
matrix can be decomposed into a product of two-level unitary matrices, where at most
two coordinates are changed. Secondly, circuits from two-level unitary matrices are
built from single qubit and CNOT gates. □
Remark 13.10. At the time of writing, IBM offers quantum computers and simu-
lators for public use (https://quantum-computing.ibm.com). You can create and
run your own algorithms on quantum devices using a graphical composer or a Quan-
tum Assembly Language Code (QASM) editor. Circuits are built from a set of ele-
mentary gates (𝑋, 𝑌 , 𝑍, 𝐻, 𝑆, 𝑇, CNOT, … ), and combinations of them can express
more complicated unitary transformations. Also look at the open source framework
Qiskit (https://qiskit.org) for working with noisy quantum computers.
Many quantum algorithms take as input a superposition of the basis states. The
single qubit Hadamard gate 𝐻 (see Section 13.1) transforms the basis state |0⟩ into the
superposition
1
(|0⟩ + |1⟩) .
√2
The Walsh-Hadamard transformation generalizes this to a system of 𝑛 qubits. It can
be implemented by 𝑛 parallel Hadamard gates.
Definition 13.11. The Walsh-Hadamard transformation 𝑊 acts on a system of 𝑛 qubits
and is defined by
𝑊 = 𝐻 ⊗ 𝐻 ⊗ ⋯ ⊗ 𝐻 = 𝐻 ⊗𝑛 . ♢
The Walsh-Hadamard transformation 𝑊 transforms the zero state into a balanced

superposition of all 2𝑛 basis states (see Figure 13.6):
1 1 1
𝐻 ⊗𝑛 |00 … 0⟩ = (|0⟩ + |1⟩) ⊗ (|0⟩ + |1⟩) ⊗ ⋯ ⊗ (|0⟩ + |1⟩)
√2 √2 √2
1
= (|00 … 0⟩ + |00 … 1⟩ + ⋯ + |11 … 1⟩) .
√2𝑛
1
|0𝑛 ⟩ 𝑊 ∑ |𝑥⟩
√2𝑛 𝑥∈{0,1}𝑛
Figure 13.6. Walsh-Hadarmard transformation.
The second equation follows from the fact that the tensor product is linear in each
component.
Quantum algorithms can use this superposition to simultaneously compute all val-
ues of a vectorial Boolean function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 . Since 𝑓 may not be invertible
(which is always the case if 𝑛 ≠ 𝑚), but quantum circuits must be invertible, 𝑓 has to
be tweaked. Given 𝑓, we define 𝐹 ∶ {0, 1}𝑛+𝑚 → {0, 1}𝑛+𝑚 by
𝐹(𝑥, 𝑦) = (𝑥, 𝑦 ⊕ 𝑓(𝑥)).
Obviously, 𝐹 is invertible and 𝐹 −1 = 𝐹. The corresponding unitary transformation on
a system of 𝑛 + 𝑚 qubits is given by
𝑈𝑓 (|𝑥, 𝑦⟩) = |𝑥, 𝑦 ⊕ 𝑓(𝑥)⟩
(see Figure 13.7). Such a transformation can be efficiently implemented by combining
elementary gates.
|𝑥⟩ |𝑥⟩
𝑈𝑓
|𝑦⟩ |𝑦 ⊕ 𝑓(𝑥)⟩
Figure 13.7. Unitary transformation associated to a Boolean function 𝑓.
Example 13.12. Let 𝑓(𝑥1 , 𝑥2 ) = 𝑥1 𝑥2 be the classical AND operation on two input bits
(see Table 1.1). The corresponding invertible function is 𝐹 ∶ {0, 1}3 → {0, 1}3 , where
𝐹(𝑥1 , 𝑥2 , 𝑦) = (𝑥1 , 𝑥2 , 𝑦 ⊕ 𝑥1 𝑥2 ). The quantum transformation is called a Toffoli gate
and is given by
𝑈𝑓 (|𝑥1 , 𝑥2 , 𝑦⟩) = |𝑥1 , 𝑥2 , 𝑦 ⊕ 𝑥1 𝑥2 ⟩
on three input qubits with the basis states |𝑥1 ⟩, |𝑥2 ⟩ and |𝑦⟩. ♢
One can leverage 𝐹 to compute 𝑓 by setting the second input component 𝑦 to the
zero string 0𝑚 . Then 𝐹(𝑥, 0𝑚 ) = (𝑥, 𝑓(𝑥)) and thus 𝑈𝑓 |𝑥, 0𝑚 ⟩ = |𝑥, 𝑓(𝑥)⟩. Further-
more, 𝑈𝑓 maps a superposition ∑𝑥 𝑎𝑥 |𝑥, 0𝑚 ⟩ to
∑ 𝑎𝑥 |𝑥, 𝑓(𝑥)⟩ .
𝑥∈{0,1}𝑛
We combine the Walsh-Hadarmard transformation 𝑊 (on the first 𝑛 qubits) and 𝑈𝑓 .

This transforms the zero state into a superposition of all values of 𝑓 (see Figure 13.8):
1 1
𝑈𝑓 (𝑊 |0𝑛 ⟩ , 0𝑚 ) = 𝑈𝑓 ( ∑ |𝑥, 0𝑚 ⟩) = ∑ |𝑥, 𝑓(𝑥)⟩ .
√2𝑛 𝑥∈{0,1}𝑛 √2𝑛 𝑥∈{0,1}𝑛
1
0𝑛 𝑊 ∑ |𝑥⟩
√2𝑛 𝑥∈{0,1}𝑛
𝑈𝑓
1
0𝑚 ∑ |𝑓(𝑥)⟩
√2𝑛 𝑥∈{0,1}𝑛
Figure 13.8. Input of the circuit is the zero state and output is the superposition of all
values of 𝑓.
The simultaneous computation of all values of a function looks miraculous. How-

ever, the state is not directly accessible and a measurement only extracts a single value.
Furthermore, the output is probabilistic, and values with a very small coefficient, i.e.,
a small probability, almost never occur. One therefore has to construct a clever quan-
tum algorithm, which makes use of the superposition and outputs the requested value
with a significant probability. One of the most important algorithms is the Quantum
Fourier Transform, which we study in the following section.
Next, we explain the Deutsch-Josza Algorithm (see Figure 13.9). It illustrates the
capabilities of quantum algorithms, although the underlying problem is of limited in-
terest.
|0𝑛 ⟩ 𝑊 𝑊 𝑤
𝑈𝑓
|1⟩ 𝐻
Figure 13.9. Deutsch-Josza algorithm. The measurement produces a classical binary

string 𝑤 of length 𝑛. If 𝑤 = 0𝑛 is measured then 𝑓 must be constant, otherwise, 𝑓 is
balanced.
Suppose a Boolean function 𝑓 ∶ {0, 1}𝑛 → {0, 1} is either constant or balanced,

i.e., the number of input strings having the output 0 and 1 is equal. A classical non-
probabilistic algorithm requires (in the worst case) 2𝑛−1 + 1 function calls in order to
determine whether or not 𝑓 is balanced. This is only feasible for small 𝑛. In contrast,
the Deutsch-Josza quantum algorithm is polynomial in 𝑛. The first step of the Deutsch-
Josza algorithm is to apply the Walsh-Hadamard transformation 𝐻 ⊗(𝑛+1) to the input
1
state |0𝑛 , 1⟩. Since 𝐻 |1⟩ = (|0⟩ − |1⟩), we obtain the state
√2
|𝜓⟩ = (𝐻 ⊗𝑛 ⊗ 𝐻)(|0⟩ ⊗ ⋯ ⊗ |0⟩ ⊗ |1⟩)

= 𝐻 ⊗𝑛 |0𝑛 ⟩ ⊗ 𝐻 |1⟩
1
= ∑ |𝑥⟩ ⊗ (|0⟩ − |1⟩).
√2𝑛+1 𝑥∈{0,1}𝑛
Next, 𝑈𝑓 (|𝑥, 𝑦⟩) = |𝑥, 𝑦 ⊕ 𝑓(𝑥)⟩ is applied to the state 𝜓:

1
𝑈𝑓 |𝜓⟩ = ∑ |𝑥⟩ ⊗ ((|0⟩ − |1⟩) ⊕ 𝑓(𝑥)).
√2𝑛+1 𝑥∈{0,1}𝑛
Suppose |𝑥⟩ is a basis state |𝑥1 … 𝑥𝑛 ⟩. If 𝑓(𝑥) = 0 then |0⟩ − |1⟩ remains unchanged. If
𝑓(𝑥) = 1 then |0⟩ − |1⟩ is mapped to |1⟩ − |0⟩ = −(|0⟩ − |1⟩). In both cases, we have
(|0⟩ − |1⟩) ⊕ 𝑓(𝑥) = (−1)𝑓(𝑥) (|0⟩ − |1⟩),
and thus obtain

1
𝑈𝑓 |𝜓⟩ = ∑ (−1)𝑓(𝑥) |𝑥⟩ ⊗ (|0⟩ − |1⟩).
√2𝑛+1 𝑥∈{0,1}𝑛
In the next step, 𝐻 ⊗𝑛 ⊗ 𝐼𝑑 is applied to 𝑈𝑓 |𝜓⟩, so that |𝑥⟩ is mapped to 𝐻 ⊗𝑛 |𝑥⟩. One
can check the expansion
1
𝐻 ⊗𝑛 |𝑥⟩ = ∑ (−1)𝑥⋅𝑧 |𝑧⟩
√2𝑛 𝑧∈{0,1}𝑛
(see Exercise 8), where 𝑥 ⋅ 𝑧 is the dot product modulo 2. This yields
1 1
(𝐻 ⊗𝑛 ⊗ 𝐼𝑑) (𝑈𝑓 |𝜓⟩) = ∑ ∑ (−1)𝑓(𝑥)+𝑥⋅𝑧 |𝑧⟩ ⊗ (|0⟩ − |1⟩) .
𝑧∈{0,1}𝑛 𝑥∈{0,1}𝑛
2𝑛 √2
Finally, we measure the first 𝑛 qubits. The coefficient of the basis state |0𝑛 ⟩ is
1
∑ (−1)𝑓(𝑥) ,
𝑥∈{0,1}𝑛
2𝑛
since 𝑧 = 0𝑛 obviously yields 𝑥 ⋅ 𝑧 = 0 for all 𝑥 ∈ {0, 1}𝑛 . If 𝑓 is constant, then the
above coefficient is either 1 or −1. Hence the probability of measuring 0𝑛 is equal to 1
and any measurement must give 0𝑛 . On the other hand, if 𝑓 is balanced then positive
and negative terms cancel and the probability of measuring 0𝑛 is 0. The Deutsch-Josza
algorithm outputs 𝑓 is constant, if a measurement gives 0𝑛 , and otherwise 𝑓 is balanced.
This result is always correct and the quantum algorithm runs in polynomial time.
13.4. Quantum Fourier Transform 241
13.4. Quantum Fourier Transform

The Deutsch-Josza algorithm demonstrates that measuring a quantum state can pro-
vide useful information. The Quantum Fourier Transform is a key algorithm, which
efficiently finds a period of a large input vector. In Section 13.5 below we will see how
this can be leveraged to solve the factoring problem.
The classical Discrete Fourier Transform (DFT) is a bijective ℂ-linear map on ℂ𝑁 .
A sequence of 𝑁 complex numbers (in the discrete time domain) is mapped to the
frequency domain. The resulting vector of complex Fourier coefficients reveals the
periodic structure of the input data.
Suppose the input vector is the complex vector (𝑎0 , 𝑎1 , … , 𝑎𝑁−1 ). Then the DFT
is (𝑦0 , 𝑦1 , … , 𝑦𝑁−1 ) and the Fourier coefficients are defined by
𝑁−1
1
𝑦𝑘 = ∑ 𝑎𝑥 𝑒−2𝜋𝑖𝑥𝑘/𝑁 , where 𝑘 = 0, 1, … , 𝑁 − 1.
√𝑁 𝑥=0
We note that the above unitary DFT differs from other variants by a normalization
factor. The computation can be accelerated using the Fast Fourier Transform (FFT).
The DFT is given by the following unitary 𝑁 × 𝑁 matrix:
1 1 1 … 1
⎛ ⎞
1 𝜔 𝜔2 … 𝜔𝑁−1
1 ⎜ ⎟
𝑈= ⎜1 𝜔2 𝜔4 … 𝜔2(𝑁−1)
⎟,
√𝑁 …
⎜ ⎟
⎝1 𝜔𝑁−1 𝜔2(𝑁−1) … 𝜔(𝑁−1)(𝑁−1) ⎠
where 𝜔 = 𝑒−2𝜋𝑖/𝑁 is a primitive 𝑁-th root of unity.

The original data can be recovered from the Fourier coefficients using the inverse
DFT:
𝑁−1
1
𝑎𝑥 = ∑ 𝑦𝑘 𝑒2𝜋𝑖𝑥𝑘/𝑁 , where 𝑥 = 0, 1, … , 𝑁 − 1.
√𝑁 𝑘=0
𝑇
The corresponding matrix of the inverse DFT is 𝑈 −1 = 𝑈 , the conjugate transpose
matrix.
The DFT computes the discrete spectrum of the input data. If the data of length 𝑁
is 𝑟-periodic and 𝑁 is divisible by 𝑟, then the Fourier coefficients 𝑦𝑘 are nonzero only
𝑁 𝑁
for multiples of . In general, a Fourier amplitude |𝑦𝑘 | ≫ 0 indicates that is an
𝑟 𝑘
approximate multiple of the period.
Example 13.13. Let 𝑎 = (1, 2, 1, 2)𝑇 be a data vector of length 𝑁 = 4. The data is
𝑟 = 2-periodic. The unitary 4 × 4 matrix that describes the DFT is
1 1 1 1
⎛ ⎞
1 1 −𝑖 −1 𝑖
𝑈= ⎜ ⎟.
2 ⎜1 −1 1 −1⎟
⎝1 𝑖 −1 −𝑖 ⎠
We compute the Fourier coefficients 𝑦 = 𝑈𝑎 = (3, 0, −1, 0)𝑇 . Only the coefficients 𝑦0
4
and 𝑦2 are nonzero and hence 𝑎 is = 2-periodic. The input vector 𝑎 can be recovered
2
𝑇
from the Fourier coefficients by computing 𝑎 = 𝑈 −1 𝑦 = 𝑈 𝑦. ♢
In a quantum setting, we want to compute the DFT of a large input vector (𝑎0 , 𝑎1 ,
… , 𝑎𝑁−1 ) of length 𝑁 = 2𝑠 and find a hidden period of the data. The basic idea is that the
indices 𝑘 with Fourier coefficients |𝑦𝑘 |2 ≫ 0 reveal the period. The measurement of a
state vector of Fourier amplitudes will give such indices 𝑘 with a significant probability.
The Quantum Fourier Transform (QFT) does a DFT on the amplitudes of the quan-
tum state and outputs a superposition of Fourier coefficients:
𝑁−1 𝑁−1
|𝜓⟩ = 𝐶 ∑ 𝑎𝑥 |𝑥⟩ ↦ 𝑈 |𝜓⟩ = 𝐶 ∑ 𝑦𝑘 |𝑘⟩ .
𝑥=0 𝑘=0
The scaling factor 𝐶 ensures that the coefficient vector of |𝜓⟩ is normalized. The Fourier
coefficients are
𝑁−1
1
𝑦𝑘 = ∑ 𝑎𝑥 𝑒2𝜋𝑖𝑥𝑘/𝑁 .
√𝑁 𝑥=0
Note that the above QFT uses the primitive 𝑁-th root of unity 𝜔 = 𝑒2𝜋𝑖/𝑁 and not the
conjugate value 𝑒−2𝜋𝑖/𝑁 . One can show that the QFT has an efficient circuit and runs in
time 𝑂(size(𝑁)2 ). In the simplest case (𝑁 = 2), the QFT is given by a single Hadamard
gate.
Now, suppose that the input vector (𝑎0 , 𝑎1 , … , 𝑎𝑁−1 ) is 𝑟-periodic, where 𝑟 is an
𝑁−1
unknown period between 1 and 𝑁. One prepares the state |𝜓⟩ = 𝐶 ∑𝑥=0 𝑎𝑥 |𝑥⟩ of an
𝑁−1
𝑛-qubit system, applies the QFT and measures the state 𝑈 |𝜓⟩ = 𝐶 ∑𝑘=0 𝑦𝑘 |𝑘⟩. With
𝑁
a high probability, the measured output 𝑘 is an approximate multiple of .
𝑟
13.5. Shor’s Factoring Algorithm

The assumption that factoring is a hard problem plays an important role in public-key
cryptography, and especially for RSA encryption and signature schemes. The most
efficient classical factoring algorithm is the Number Field Sieve, which has subexpo-
nential complexity (see Section 9.7). In 1994, Peter Shor found a quantum algorithm
that runs in polynomial time ([Sho94]). This algorithm is a major application of quan-
tum computing. It has been successfully implemented for toy examples (like factoring
21 = 7 ⋅ 3) and will certainly be applied to real-world problems as soon as quantum
computers with thousands of qubits become available.
Shor’s algorithm finds a hidden period of a function and is based on the Quantum
Fourier Transform.
Firstly, we explain how to derive the unknown factors of a composite number 𝑛
from the multiplicative order of an element 𝑎 ∈ ℤ∗𝑛 . Note that the multiplicative order
of 𝑎 is also the least period of the function 𝑓(𝑥) = 𝑎𝑥 mod 𝑛.
Suppose that 𝑛 = 𝑝𝑞, where 𝑛 is known and 𝑝, 𝑞 are unknown. Choose a uniform
random integer 𝑎 with 1 < 𝑎 < 𝑛. If gcd(𝑎, 𝑛) ≠ 1, then gcd(𝑎, 𝑛) is either 𝑝 or 𝑞 and
the unknown factors are found. However, the probability of gcd(𝑎, 𝑛) ≠ 1 is very small
if the prime factors are large and 𝑎 is randomly chosen.
Now assume that gcd(𝑎, 𝑛) = 1; then 𝑎 mod 𝑛 ∈ ℤ∗𝑛 and
𝑟 = ord(𝑎) ∣ ord(ℤ∗𝑛 ) = (𝑝 − 1)(𝑞 − 1)
(see Corollary 4.14). By definition, 𝑎𝑟 ≡ 1 mod 𝑛. If 𝑟 is even, then
𝑎𝑟 − 1 = (𝑎𝑟/2 − 1)(𝑎𝑟/2 + 1) ≡ 0 mod 𝑛,
𝑟
and thus 𝑛 ∣ (𝑎𝑟/2 − 1)(𝑎𝑟/2 + 1). Since ord(𝑎) ≠ , we have 𝑛 ∤ (𝑎𝑟/2 − 1) and there are
2
two possibilities:
(1) 𝑝 divides one of the two factors and 𝑞 divides the other. In this case gcd(𝑎𝑟/2 +1, 𝑛)
gives 𝑝 or 𝑞.
(2) 𝑛 ∣ (𝑎𝑟/2 + 1), then the algorithm fails and one has to choose another base 𝑎.
The algorithm is successful if 𝑟 is even and 𝑛 ∤ (𝑎𝑟/2 + 1). Closer analysis shows
that the probability of success is at least 50% (compare [NC00]): let 𝑟𝑝 and 𝑟𝑞 be the
order of 𝑎 in ℤ∗𝑝 and ℤ∗𝑞 , respectively. Then 𝑟 is odd if and only if 𝑟𝑝 and 𝑟𝑞 are odd (see
Exercise 10). Now suppose that 𝑟 is even and 2𝑑 is the maximal power of 2 that divides
𝑟. We have 𝑎𝑟/2 + 1 ≡ 0 mod 𝑛 if and only if 𝑎𝑟/2 ≡ −1 mod 𝑝 and 𝑎𝑟/2 ≡ −1 mod 𝑞.
𝑟 𝑟
This requires 𝑟𝑝 ∤ and 𝑟𝑞 ∤ . Since 𝑟𝑝 ∣ 𝑟 and 𝑟𝑞 ∣ 𝑟, we obtain 2𝑑 ∣ 𝑟𝑝 and 2𝑑 ∣ 𝑟𝑞 .
2 2
Summarizing, the algorithm fails if either 2 ∤ 𝑟𝑝 and 2 ∤ 𝑟𝑞 or 2𝑑 ∣ 𝑟𝑝 and 2𝑑 ∣ 𝑟𝑞 .
If 𝑎 is chosen uniformly at random, then the probability for this to happen is at most
50%.
Example 13.14. Let 𝑛 = 77 and 𝑎 = 3; then gcd(𝑎, 𝑛) = 1. Suppose we know the
order of 𝑎 mod 𝑛 in the multiplicative group ℤ∗𝑛 : the order is 𝑟 = ord(3 mod 77) = 30
and this is an even number. We obtain
𝑎𝑟/2 + 1 = 315 + 1 ≡ 35 mod 77.
We compute gcd(𝑎𝑟/2 + 1, 𝑛) = gcd(35, 77) = 7 and obtain one of prime factors of
𝑛 = 77. Note that gcd(𝑎𝑟/2 − 1, 𝑛) = gcd(33, 77) = 11 gives the other factor of 𝑛.
In fact, we succeeded in this example since 𝑟𝑝 = ord(3 mod 7) = 6 is even and 𝑟𝑞 =

ord(3 mod 11) = 5 is odd. ♢
|0𝑠 ⟩ 𝑊 𝑄𝐹𝑇 𝑘
𝑈𝑓
|0𝑚 ⟩
Figure 13.10. Shor’s algorithm uses the Quantum Fourier Transform and finds a hid-
den period of a function 𝑓.
Now, the quantum part of Shor’s algorithm is to compute the unknown order 𝑟
of a given residue class 𝑎 ∈ ℤ∗𝑛 . For that purpose, one prepares a superposition of
input values 𝑥 = 0, 1, … , 𝑁 − 1 and simultaneously computes all 𝑎𝑥 mod 𝑛. The
values are 𝑟-periodic, i.e., 𝑎𝑥 ≡ 𝑎𝑥+𝑟 . The QFT is applied to the state and we will see
that a measurement reveals the period with high probability. The sequence must be
significantly longer than the period and it turns out that 𝑁 = 2𝑠 with 𝑛2 ≤ 𝑁 ≤ 2𝑛2 is
a reasonable choice.
We view 𝑓(𝑥) = 𝑎𝑥 mod 𝑛 as a Boolean function. The corresponding unitary
transformation on quantum bits is 𝑈𝑓 |𝑥, 𝑦⟩ = |𝑥, 𝑦 ⊕ 𝑓(𝑥)⟩. The first register has 𝑠
qubits and the second has 𝑚 = size(𝑛) qubits.
The Walsh-Hadamard transformation maps |𝑥⟩ = |0𝑠 ⟩ to a superposition of all
basis states. We set |𝑦⟩ = |0𝑚 ⟩, apply 𝑈𝑓 and obtain the state
𝑁−1
1
|𝜓⟩ = 𝑈𝑓 (𝑊 |0𝑠 ⟩ , |0𝑚 ⟩) = ∑ |𝑥, 𝑓(𝑥)⟩ .
√𝑁 𝑥=0
Next, the QFT operator 𝑈 is applied to the first register while the second remains un-
changed:
𝑁−1 𝑁−1
1
(𝑈 ⊗ 𝐼𝑑) |𝜓⟩ = ∑ ∑ 𝑒2𝜋𝑖𝑥𝑘/𝑁 |𝑘, 𝑓(𝑥)⟩ .
𝑁 𝑥=0 𝑘=0
Finally, the first register is measured (see Figure 13.10). The second register takes a
random value 𝑢 ∈ 𝑖𝑚(𝑓) and the probability of measuring |𝑘, 𝑢⟩ is
2
|1 |
| ∑ 2𝜋𝑖𝑥𝑘/𝑁
𝑒 | .
|𝑁 |
| 𝑥∶ 𝑓(𝑥)=ᵆ |
𝑁
There is a high probability that 𝑘 is an approximate multiple of and the inequalities
𝑟
||𝑘 − 𝑗 𝑁 || < 1 ⇒ || 𝑘 − 𝑗 || < 1 ⇒ || 𝑘 − 𝑗 || < 1

| 𝑟| 2 | 𝑁 𝑟 | 2𝑁 | 𝑁 𝑟 | 2𝑛2
𝑗
hold for some 𝑗. One can show that at most one fraction with 0 < 𝑗 < 𝑛 and 0 < 𝑟 < 𝑛
𝑟
can satisfy this inequality. The fraction and the requested period 𝑟 can be efficiently
determined using the continued fraction expansion (see Example 13.16 below).
The reader might be surprised that the QFT of the first register gives anything in-
teresting, since the amplitudes of the first register of |𝜓⟩ are constant. However, the
amplitudes are partitioned by the second register, i.e., by different values of 𝑢 = 𝑓(𝑥) =
𝑎𝑥 mod 𝑛. We can rewrite |𝜓⟩ as
1
|𝜓⟩ = ∑ ∑ |𝑥, 𝑢⟩ .
√𝑁 ᵆ∈𝑖𝑚(𝑓) 𝑓(𝑥)=ᵆ
The amplitudes with different 𝑢 in the second register do not interfere with each other
when the QFT is applied to the first register. Now, for a fixed second register 𝑢, the
𝑁
first register is 𝑟-periodic and applying the QFT gives peaks at multiples of . If 𝑁 is
𝑟
𝑁
divisible by 𝑟, the Fourier amplitudes are zero outside multiples of .
𝑟
Remark 13.15. Shor’s algorithm requires around 3 size(𝑛) qubits and uses
𝑂(size(𝑛)3 ) operations, and optimizations are known.
Example 13.16. We continue Example 13.14 and apply Shor’s algorithm to 𝑛 = 77

and 𝑎 = 3 ∈ ℤ∗𝑛 . We want to compute ord(𝑎). First, we have to define 𝑁 = 2𝑠 . Since
𝑛2 = 5929 and 2𝑛2 = 11858, we choose 𝑠 = 13 and 𝑁 = 8192. We create a superposition
of the basis states |𝑥, 3𝑥 mod 77⟩ for 𝑥 = 0, 1, … , 8191 and obtain
8191
1
|𝜓⟩ = ∑ |𝑥, 3𝑥 mod 77⟩ .
√8192 𝑥=0
Note that we need 13 qubits for the first register and 7 qubits for the second. We can
reorder the terms with respect to the second register 𝑢 = 3𝑥 mod 77. Suppose for
example that 𝑢 = 59; then the corresponding terms in |𝜓⟩ are
1
(|19, 59⟩ + |49, 59⟩ + |79, 59⟩ + ⋯ + |8179, 59⟩) .
√8192
The expansions for other values 𝑢 in the second register look similar and the period 𝑟 =
30 is clearly visible, but the state is not directly accessible to an observer. Instead, we
apply the QFT to the first register and measure it. The second register takes a random
value 𝑢 ∈ 𝑖𝑚(𝑓), for example 𝑢 = 59. The amplitudes |𝑦𝑘 | of |𝑘, 𝑢⟩ for 𝑘 = 0, 1, … , 8191
and 𝑢 = 59 are shown in Figure 13.11. The squares |𝑦𝑘 |2 give the probability that |𝑘, 59⟩
is measured. The probabilities for any other value 𝑢 ∈ 𝑖𝑚(𝑓) are identical. Closer
𝑁
inspection shows peaks at all multiples of ≈ 273. Table 13.1 lists some amplitudes
𝑟
around 𝑘 = 273.
Again, we remark that these amplitudes are not accessible to an observer, but the
𝑁
measured value 𝑘 is likely to be a multiple of . Suppose that 𝑘 = 7100. We expect
𝑟
Figure 13.11. Fourier amplitudes of |𝑘, 59⟩. The spectrum has peaks at multiples of
8192
≈ 273.
30
Table 13.1. Absolute values of selected amplitudes.
𝑘 270 271 272 273 274 275

|𝑦𝑘 | 0.00071 0.00106 0.002060 0.033082 0.002372 0.001149
𝑘 7100 𝑗
that = is close to a fraction where 𝑗 and 𝑟 are integers less than 𝑛 = 77. In
𝑁 8192 𝑟
this toy example, we could simply try out the possible values for 𝑗 and 𝑟, but in general,
the efficient method of continued fractions expansions is used.
The idea is to approximate a real number 𝑥 by continued fractions of integer num-
bers. The number is split into its integer part ⌊𝑥⌋ and its fractional part 𝜖0 :
1
𝑥 = ⌊𝑥⌋ + 𝜖0 = ⌊𝑥⌋ + .
1
( )
𝜖0
1
Next, is split into an integer and a fractional part and we obtain
𝜖0
1
𝑥 = ⌊𝑥⌋ + 𝜖0 = ⌊𝑥⌋ + 1
.
⌊ ⌋ + 𝜖1
𝜖0
1 1
We continue in the same fashion, write 𝜖1 = 1
and split into an integer and a
( ) 𝜖1
𝜖1
fractional part. The method terminates after a finite number of steps if 𝑥 is a rational
number, and otherwise approximates 𝑥.
For our example, we let SageMath compute the sequence of integer parts:
sage: (7100/8192). continued_fraction ()
[0; 1, 6, 1, 1, 136]
Hence the complete continued fractions expansion is

7100 1
=0+ 1 .
8192 1+ 1
6+ 1
1+
1
1+
136
However, we need an approximation with a denominator less than 𝑛 = 77 and so we

1
discard the last fraction . Then
136
7100 1 13
≈ 1 = .
8192 1 + 1
15
6+ 1
1+
1
𝑗 13
We obtain = and therefore the period must be a multiple of 15 less than 𝑛 = 77.
𝑟 15
One checks that 315 ≡ 34 ≢ 1 mod 77 and 330 ≡ 1 mod 77. We conclude that
𝑁 8192
𝑟 = ord(3) = 30. Finally, we verify that = ≈ 273.07, which explains the peaks
𝑟 30
of the spectrum at multiples of this number. ♢
We have seen above that Shor’s algorithm can efficiently solve the factoring prob-
lem. The other major cryptographic problem, the discrete logarithm problem, can also
be solved with a period-finding algorithm. This can be applied to the multiplicative
group of integers modulo a prime number 𝑝 and also to the group of points on an el-
liptic curve over a finite field.
Suppose that 𝐴 = 𝑔𝑎 holds in a cyclic group 𝐺 = ⟨𝑔⟩ of order 𝑛 and 𝑎 is unknown.
Consider the map
𝑓 ∶ ℤ × ℤ → 𝐺, 𝑓(𝑥, 𝑦) = 𝐴𝑥 𝑔−𝑦 .
This map is (1, 𝑎)-periodic since
𝑓(𝑥 + 1, 𝑦 + 𝑎) = 𝐴𝑥+1 𝑔−𝑦−𝑎 = 𝑔𝑎𝑥+𝑎 𝑔−𝑦−𝑎 = 𝐴𝑥 𝑔−𝑦 = 𝑓(𝑥, 𝑦).
Any period (𝑟, 𝑠) is a multiple of (1, 𝑎) mod 𝑛, since

𝑓(𝑥 + 𝑟, 𝑦 + 𝑠) = 𝑓(𝑥, 𝑦) ⟹ 𝐴𝑥+𝑟 𝑔−𝑦−𝑠 = 𝐴𝑥 𝑔−𝑦 .
This implies 𝑎(𝑥 + 𝑟) − 𝑦 − 𝑠 ≡ 𝑎𝑥 − 𝑦 mod 𝑛 and hence 𝑎𝑟 ≡ 𝑠 mod 𝑛. If we find
an approximate period (𝑟, 𝑠) using an efficient quantum algorithm, then we can also
compute the discrete logarithm 𝑎.
This means that classical public-key cryptography is broken once sufficiently large
and stable quantum computers become available!
On the other hand, it is expected that symmetric algorithms are less severely af-
fected by the capabilities of quantum computers. Grover’s algorithm [Gro96] provides
a speedup from 𝑛 to √𝑛 evaluations, which effectively halves the security level. As a
consequence, AES with 128-bit keys provides only 64 bits of post-quantum security,
but attacking 256-bit AES with 128-bit post-quantum security is still unfeasible.
13.6. Quantum Key Distribution

A remarkable cryptographic application of quantum mechanics is key distribution over
an insecure channel. There are popular, well-known key establishment protocols, for
example the Diffie-Hellman key exchange (see Section 10.3), but once large-scale quan-
tum computing becomes available, the conventional public-key protocols are broken.
Quantum Key Distribution (QKD) is using qubits to transfer a secret key. Unlike
the quantum algorithms discussed above, QKD does not require an 𝑛-qubit system, but
uses a sequence of single qubits or pairs of entangled qubits. This is much easier to re-
alize and QKD has already been successfully implemented in practice, where polarized
photons were sent over large distances.
In this section, we present the BB84 (Bennett and Brassard) quantum key distri-
bution protocol [BB84]. Other protocols (in particular Ekert’s E91 protocol [Eke91])
are also used in practical implementations.
The idea of QKD is that quantum bits cannot be measured without changing the
state. Furthermore, an unknown quantum state cannot be cloned, since the internal
state is unknown to an observer who does not know how the system was initialized.
Suppose that Alice and Bob want to exchange a key of length 𝑛 and they have
access to a public communication channel and a quantum channel of single polarized
photons. They use two different bases 𝐵0 and 𝐵1 (polarizations) of the single-qubit
space ℂ2 . Assume that the standard basis is
𝐵0 = {|0⟩ , |1⟩}.
Applying the Hadamard transformation yields the Hadamard basis 𝐵1 which is diago-
nal to 𝐵0 :
1 1 1 1
𝐵1 = 𝐻𝐵0 = { |0⟩ + |1⟩ , |0⟩ − |1⟩} = {|+⟩ , |−⟩}.
√2 √2 √2 √2
13.6. Quantum Key Distribution 249
Let 𝛿 be a positive integer constant which determines the probability of success of the
protocol. Alice chooses a uniform random key string 𝑘 of length 4𝑛 + 𝛿 bits. Further-
more, Alice and Bob choose 4𝑛 + 𝛿 bits that will determine whether 𝐵0 or 𝐵1 is used:
$ $ $
𝑘 ← {0, 1}4𝑛+𝛿 , 𝑏𝐴 ← {0, 1}4𝑛+𝛿 , 𝑏𝐵 ← {0, 1}4𝑛+𝛿 .
Now Alice sends the key 𝑘 as a sequence of single qubits to Bob. She uses the basis 𝐵0
if the corresponding bit of 𝑏𝐴 is 0, or otherwise 𝐵1 .
Bob receives the sequence of qubits and measures them with respect to 𝐵0 or 𝐵1 ,
depending on the corresponding bit of 𝑏𝐵 . If the 𝑖-th bit of 𝑏𝐴 and 𝑏𝐵 are equal, then
Bob measures the correct 𝑖-th bit of 𝑘. Otherwise, the probability of a correct key bit is
only 50%. For example, suppose that Alice is using 𝐵0 and sending the bit 1; then the
transmitted qubit is |1⟩. If Bob chooses the basis 𝐵1 , then
1 1
|1⟩ = |+⟩ − |−⟩ .
√2 √2
Hence the probability of measuring |−⟩, which corresponds to the bit 1, is only 50%.
After Bob has received and measured the key bits, both partners exchange their
bases 𝑏𝐴 and 𝑏𝐵 via the conventional public channel. They discard the key bits that
were measured in a different basis and restart the key exchange if less than 2𝑛 bits
remain. Obviously, the constant 𝛿 determines the probability of a successful exchange.
They keep the first 2𝑛 key bits. The following step aims to reveal the interference of
an adversary. Alice chooses a subset of 𝑛 key bits and sends Bob the selected positions.
They exchange the associated key bits via the public channel. They compare the bits
and abort the protocol if the number of errors is higher than expected. The remaining
𝑛 bits are used as a secret key, which needs to be further transformed (information
reconciliation and privacy amplification), in order to reduce the effects of errors and
undetected interference by adversaries.
We are now discussing the security of the BB84 protocol. Firstly, the non-quantum
communication channel between Alice and Bob can be public, but integrity is impor-
tant. Furthermore, Alice has to generate and transmit single qubits, for example single
polarized photons, since an eavesdropper could otherwise use any extra particles with
the same state for a measurement.
It may seem surprising that the key can be sent without any protection. How-
ever, an eavesdropper has to measure the qubits in order to get any information. This
requires choosing a basis, i.e., 𝐵0 or 𝐵1 , which is incorrect in about half of the cases.
Remember that the correct basis is only known to Alice during the transmission of the
qubits. At first, Alice and Bob are not aware of an interception, but the error rate of the
check bits will increase significantly. In fact, Bob will measure around 25% incorrect
check bits, since the error rate is 50% if an eavesdropper used the wrong basis. There-
fore, Alice and Bob can detect an adversary who has intercepted a sufficient number of
quantum bits. Assuming that Alice and Bob accept a maximum error rate of 2.5%, an
adversary can only eavesdrop around 10% of the bits if they want to remain undetected.
Privacy amplification methods reduce an adversary’s partial information on the key by
producing a new, shorter key.
Table 13.2. Quantum key distribution example.
Position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Alice’s key 0 1 0 0 1 1 0 1 0 0 1 0 1 0 1 1 0
Alice’s basis 1 0 1 1 0 0 0 1 1 1 0 0 0 1 1 1 0
Alice sends + 1 + + 1 1 0 – + + 1 0 1 + – – 0
Bob’s basis 1 0 0 0 0 1 0 1 0 1 1 1 1 1 1 0 1
Bob measures + 1 1 0 1 – 0 – 1 + – + + + – 0 +
Same basis ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Shared key 0 1 1 0 1 0 0 1
Check bits 1 1 0 0
Key bits 0 0 1 1
Example 13.17. (See Table 13.2) Suppose 𝑛 = 4 and 𝛿 = 1. Alice generates the random
key
𝑘 = 0100 1101 0010 1011 0
of length 17. Alice and Bob’s choice of bases is given by the binary strings
𝑏𝐴 = 1011 0001 1100 0111 0,
𝑏𝐵 = 1000 0101 0111 1110 1,
where 0 represents the basis 𝐵0 = {|0⟩ , |1⟩} and 1 the basis 𝐵1 = {|+⟩ , |−⟩}. Alice sends
the following qubits:
+1 + +110 − + + 101 + − − 0.
Alice and Bob exchange 𝑏𝐴 and 𝑏𝐵 . The following positions coincide: 1, 2, 5, 7, 8, 10,
14, 15. Hence Alice and Bob used the same basis for these positions and 8 shared key
bits remain. Alice chooses positions 2, 5, 10, 14 to check for eavesdropping, which
would probably change at least one bit. They exchange the check bits and verify them.
If the check bits match, then they accept the key exchange. The resulting key 0011 is
defined by the remaining four positions 1, 7, 8 and 15. ♢
Exercises 251
13.7. Summary
• Quantum computing relies on quantum mechanics and uses quantum bits

(qubits) instead of classical bits.
• The state of a qubit is a superposition of the basis states |0⟩ and |1⟩.
• A single qubit state can be represented by a point on the Bloch sphere.
• The measurement of a qubit gives a classical bit. The probability of measuring
0 or 1 is given by the square of the amplitude of |0⟩ and |1⟩, respectively.
• A system of 𝑛 qubits has 2𝑛 basis states and a measurement gives 𝑛 classical
bits.
• A quantum gate has 𝑛 input qubits and 𝑛 output qubits and the transformation
is described by a unitary matrix. Quantum algorithms use quantum gates and
measurements.
• The Quantum Fourier Transform (QFT) maps an input state to a superposition
of Fourier coefficients.
• Shor’s factoring algorithm leverages the QFT to solve the factoring problem
in polynomial time. Quantum computers can break conventional public-key
cryptography, but sufficiently large and stable systems are not yet available.
• The Quantum Key Distribution (QKD) protocol BB84 uses a sequence of single
qubits for key distribution. Since the internal state of a qubit is changed by a
measurement, the interference by an adversary can be detected.
Exercises
1 1
1. Show that the Bell state |𝜓⟩ = |00⟩ + |11⟩ cannot be written as the product of
√2 √2
two single qubit states (𝑎1 |0⟩ + 𝑏1 |1⟩) ⊗ (𝑎2 |0⟩ + 𝑏2 |1⟩).
2. Show that applying the CNOT gate to 𝐻 |0⟩ ⊗ |0⟩ gives the above Bell state. Depict
the corresponding circuit. Give the other three Bell states 𝐻 |0⟩ ⊗ |1⟩, 𝐻 |1⟩ ⊗ |0⟩
and 𝐻 |1⟩ ⊗ |1⟩.
𝜋
3. Describe the transformations of Pauli-𝑋, 𝑌 , 𝑍, 𝑆 (Phase), 𝑇 ( ) and 𝐻 gates on the
8
Bloch sphere.
4. Prove that there is a bijection between the state space for a single-qubit system and
the complex projective space ℙ1 (ℂ).
5. Give the matrix of the Walsh-Hadamard transformation of two qubits.
6. Determine the matrix of the Toffoli gate on three input qubits.
7. Show that the Toffoli gate gives a quantum analogue of the classical NAND oper-
ation.
8. Let 𝑥 ∈ {0, 1}𝑛 . Prove the formula
1
𝐻 ⊗𝑛 |𝑥⟩ = ∑ (−1)𝑥⋅𝑧 |𝑧⟩ .
√2𝑛 𝑧∈{0,1}𝑛
9. Give an explicit description of the Quantum Fourier Transform for 𝑁 = 2 and

𝑁 = 4.
10. Suppose that 𝑛 = 𝑝𝑞, where 𝑝 and 𝑞 are prime numbers. Let 𝑎 ∈ ℤ∗𝑛 and let 𝑟, 𝑟𝑝 ,
𝑟𝑞 be the order of 𝑎 in ℤ∗𝑛 , ℤ∗𝑝 and ℤ∗𝑞 , respectively. Show that 𝑟 is the least common
multiple of 𝑟𝑝 and 𝑟𝑞 .
11. Let 𝑛 = 47053 and 𝑎 = 11 ∈ ℤ∗𝑛 . Suppose that ord(𝑎) = 7770 is already known.
Find the factors 𝑝 and 𝑞 of 𝑛.
12. Alice and Bob use the BB84 protocol and Mallory intercepts 𝑚 key bits. Determine
the expected number of faulty bits, i.e., bits that differ from Alice’s original key.
13. Suppose 𝑛 = 2048 and 𝛿 = 128 in the BB84 protocol. Mallory intercepts 64 key
bits. How many of the check bits are likely to differ if no other transmission errors
occur?
Chapter 14
Lattice-based Cryptography
The following two chapters deal with public-key cryptosystems based on lattices and
codes, respectively. A key motivation is the emergence of quantum computers which
are able to break RSA, Diffie-Hellman and elliptic curve cryptosystems. Fortunately,
symmetric schemes like AES are less affected by the effects of quantum algorithms.
Now the relatively new field of post-quantum cryptography studies encryption and sig-
natures schemes which are believed to be secure in the presence of quantum comput-
ers. We focus on lattices and codes used in many proposals and look at encryption
schemes.
This chapter introduces the basics of lattices and their applications in cryptogra-
phy. Lattices are discrete subgroups of ℝ𝑛 and there are computational problems, for
example finding the shortest vector in a lattice, which are believed to be hard. Solv-
ing a system of linear equations is easy, but solving a random system of noisy linear
equations modulo an integer (learning with errors) is hard. Lattice-based cryptogra-
phy offers strong security guarantees and is believed to resist quantum attacks.
We outline the fundamentals of lattices in Section 14.1. Lattice algorithms and in
particular the LLL algorithm are studied in Section 14.2. The following Sections 14.3,
14.4 and 14.5 deal with the public-key encryption schemes GGH, NTRU and LWE,
respectively. There are also promising lattice-based signature schemes, but they are
not covered in this book.
We refer the reader to the textbooks [HPS08], [Gal12] and the articles [MR09],
[Pei14] for additional reading on lattice-based cryptography.
253
254 14. Lattice-based Cryptography
14.1. Lattices
A lattice Λ is a discrete subgroup of ℝ𝑛 . The trivial lattice is Λ = {0} and a standard
example is the lattice Λ = ℤ𝑛 of vectors with integer coordinates. In mathematics, a
lattice can also refer to a partially ordered set in which two elements have a least upper
bound and a greatest lower bound, but this not used in our context.
Definition 14.1. A subset Λ ⊂ ℝ𝑛 is called discrete if every point 𝑣 ∈ Λ possesses an
environment 𝑈 = {𝑤 ∈ ℝ𝑛 | ‖𝑣 − 𝑤‖ < 𝜖}, i.e., an open ball of radius 𝜖 > 0, such that
𝑈 ∩ Λ = {𝑣}, i.e., 𝑣 is the only lattice point in 𝑈. A discrete subgroup of ℝ𝑛 is called a
lattice. ♢
It follows from the above definition that every bounded set and in particular every
ball {𝑣 ∈ ℝ𝑛 | ‖𝑣‖ < 𝑑} only contains a finite number of lattice points. All nontrivial
lattices are infinite sets, but they have a finite basis.
Let 𝑉 ⊂ ℝ𝑛 be the real vector space generated by Λ. We define the rank of Λ to
be the dimension of 𝑉. The following Proposition shows that a lattice Λ of rank 𝑟 has a
ℤ-basis {𝑣1 , … , 𝑣𝑟 } ⊂ Λ.
Proposition 14.2. Let Λ be a nontrivial lattice of rank 𝑟. Then there is a basis 𝐵 =
{𝑣1 , 𝑣2 , … , 𝑣𝑟 } ⊂ Λ, i.e., a set of linearly independent vectors such that
Λ = {𝑥1 𝑣1 + 𝑥2 𝑣2 + ⋯ + 𝑥𝑟 𝑣𝑟 | 𝑥1 , 𝑥2 , … , 𝑥𝑟 ∈ ℤ}.
Proof. Our proof follows [Gal12]. Let 𝐵 = {𝑣1 , 𝑣2 , … , 𝑣𝑟 } be a set of linearly indepen-
dent vectors in Λ. We want to transform 𝐵 into a basis 𝐵 ′ of Λ. For 𝑑 ≤ 𝑟 we let 𝑉𝑑 be
the real vector space generated by 𝑣1 , … , 𝑣𝑑 . The lattice Λ𝑑 = 𝑉𝑑 ∩Λ has rank 𝑑 ≤ 𝑟 and
Λ𝑟 = Λ. We now prove the claim by induction. For 𝑑 = 1, we can easily find a basis of
Λ1 : we replace 𝑣1 by the shortest nonzero multiple 𝑣1′ = 𝛼𝑣1 such that 𝑣1′ ∈ Λ1 . Now
we assume that Λ𝑑−1 has a basis {𝑣1′ , … , 𝑣𝑑−1
′
}. Consider the bounded and discrete set
𝑆 = Λ𝑑 ∩ {𝛼1 𝑣1′ + ⋯ + 𝛼𝑑−1 𝑣𝑑−1
′
+ 𝛼𝑑 𝑣𝑑 | 𝛼1 , … , 𝛼𝑑−1 ∈ [0, 1[ and 𝛼𝑑 ∈ [0, 1]}.
Let 𝑣𝑑′ = 𝛼1 𝑣1′ + ⋯ + 𝛼𝑑−1 𝑣𝑑−1 ′
+ 𝛼𝑑 𝑣𝑑 be the element in 𝑆 with smallest nonzero
′
coefficient 𝛼𝑑 . It is obvious that 𝐵 ′ = {𝑣1′ , … , 𝑣𝑑−1 , 𝑣𝑑′ } is linearly independent and it
remains to show that 𝐵 is a basis of Λ𝑑 . To this end, given any vector 𝑣 = 𝛽1 𝑣1′ + ⋯ +
′
′
𝛽𝑑−1 𝑣𝑑−1 + 𝛽𝑑 𝑣𝑑 ∈ Λ𝑑 , there are integer coefficients 𝑥𝑖 ∈ ℤ such that
′
𝑤 = 𝑣 − 𝑥1 𝑣1′ − ⋯ − 𝑥𝑑−1 𝑣𝑑−1 − 𝑥𝑑 𝑣𝑑′ ∈ 𝑆
and 𝛽𝑑 − 𝑥𝑑 𝛼𝑑 ∈ [0, 𝛼𝑑 [ . Note that 𝛽𝑑 − 𝑥𝑑 𝛼𝑑 is the coefficient of 𝑣𝑑 in 𝑤. Since 𝑣𝑑′ is
the element in 𝑆 with the smallest nonzero coefficient of 𝑣𝑑 , we obtain 𝛽𝑑 − 𝑥𝑑 𝛼𝑑 = 0
and hence 𝑤 ∈ Λ𝑑−1 . This shows that 𝑣 ∈ Λ𝑑 . □
In this chapter we assume that the rank is maximal, i.e., Λ ⊂ ℝ𝑛 and 𝑟𝑘(Λ) =
𝑛, as the more general case is not substantially different. We write lattice vectors as
columns, but row vectors are also used in the literature. Writing the basis vectors into
14.1. Lattices 255
Figure 14.1. Lattice of dimension 2 and two different bases.
the columns defines a regular 𝑛 × 𝑛 matrix. By abuse of notation, we will use the
same letter for a basis and the associated 𝑛 × 𝑛 matrix of column vectors. Two bases
𝐵1 = {𝑣1 , 𝑣2 , … , 𝑣𝑛 } and 𝐵2 = {𝑤1 , 𝑤2 , … , 𝑤𝑛 } of the same lattice Λ are connected by a
unimodular 𝑛 × 𝑛 matrix 𝑈 over ℤ:
𝐵2 = 𝐵1 𝑈.
A matrix 𝑈 is called unimodular if all entries are integers and det(𝑈) = ±1. Indeed, for
each 𝑤𝑖 there are 𝑥1 , … , 𝑥𝑛 ∈ ℤ such that 𝑤𝑖 = 𝑥1 𝑣1 + ⋯ + 𝑥𝑛 𝑣𝑛 and the coefficients
𝑥1 , … , 𝑥𝑛 form the 𝑖-th column of 𝑈. Conversely, each 𝑣𝑖 can be represented by an
˜ for some integer
integer linear combination of 𝑤1 , … , 𝑤𝑛 . Therefore, we have 𝐵1 = 𝐵2 𝑈
˜ ˜ −1
matrix 𝑈, from which we conclude that 𝑈 is invertible, 𝑈 = 𝑈 and det(𝑈) = ±1.
4 2
Example 14.3. Let 𝐵1 = {( ) , ( )} and assume that Λ is a lattice generated by 𝐵1 .
−1 2
The lattice is depicted in Figure 14.1, where 𝐵1 is shown with continuous lines.
8 10
𝐵2 = {( ) , ( )} is another basis of Λ, and the basis 𝐵2 is shown with dashed
−7 −10
lines in Figure 14.1. One has 𝐵2 = 𝐵1 𝑈, where 𝑈 is the unimodular matrix
3 4
𝑈=( ).
−2 −3
Intuitively, the first basis is ‘better’ than the second, since the vectors are shorter and
closer to being orthogonal.
Definition 14.4. Let Λ be a lattice and 𝐵 any basis of Λ. Then the determinant of Λ is
defined by the absolute value
det(Λ) = | det(𝐵)|.
The determinant of Λ does not depend on the chosen basis. ♢
A basis 𝐵 of a lattice Λ defines a fundamental parallelepiped

𝑃 = {𝑥1 𝑣1 + 𝑥2 𝑣2 + ⋯ + 𝑥𝑟 𝑣𝑟 | 𝑥1 , 𝑥2 , … , 𝑥𝑟 ∈ [0, 1[ }.
The determinant of a lattice can be described geometrically by the 𝑛-dimensional vol-
ume of the parallelepiped spanned by the basis vectors. We call this the covolume of Λ.
In the literature, this is also called the volume of Λ.
Example 14.5. Consider the lattice and bases of Example 14.3. The covolume of Λ is
det(Λ) = | det(𝐵1 )| = | det(𝐵2 )| = 10.
Definition 14.6. Let Λ be a lattice of dimension 𝑛. The dual lattice Λ∗ ⊂ ℝ𝑛 is defined

by the set of 𝑦 ∈ ℝ𝑛 such that the dot product satisfies 𝑥 ⋅ 𝑦 ∈ ℤ for all 𝑥 ∈ Λ. ♢
Here we give a description of the dual lattice in terms of matrices. Remember that
we assumed that our lattices have full rank. Let 𝐵 be a basis of Λ. We have 𝑦 ∈ Λ∗ if
and only if 𝐵 𝑇 𝑦 ∈ ℤ𝑛 or, equivalently, 𝑦 = (𝐵 𝑇 )−1 𝑥 for some 𝑥 ∈ ℤ𝑛 . This implies that
the dual lattice is generated by the columns of (𝐵𝑇 )−1 . Furthermore, it follows that
1 1
det(Λ∗ ) = det |(𝐵𝑇 )−1 | = = .
| det(𝐵)| det(Λ)
Example 14.7. Consider the lattice Λ in Example 14.3. The dual lattice Λ∗ is given by
the columns of
1 1
(𝐵1𝑇 )−1 =( 1 5 10 ) .
2
−
5 5
1 1
The covolume of Λ is∗
= . ♢
det(Λ) 10
For cryptographic applications, one usually considers 𝑞-ary lattices, which are de-
fined by integers and modular congruences.
14.1. Lattices 257
Definition 14.8. Let 𝑞 ∈ ℤ be a nonzero integer. A lattice Λ of dimension 𝑛 is called

𝑞-ary if 𝑞ℤ𝑛 ⊂ Λ ⊂ ℤ𝑛 . ♢
Below, we suppose that Λ is an integer lattice, i.e., Λ ⊂ ℤ𝑛 . Hence Λ is 𝑞-ary if and

only if the multiples 𝑞𝑒1 , … , 𝑞𝑒𝑛 of the unit vectors lie in Λ. Note that any integer lattice
Λ is 𝑞-ary for some 𝑞 ∈ ℤ.
Example 14.9. The lattice Λ in Example 14.3 is 10-ary since 10𝑒1 , 10𝑒2 ∈ Λ. The lattice
is 𝑞-ary for any integer multiple 𝑞 of 10. ♢
𝑞-ary lattices can be defined by any 𝑚 × 𝑛 matrix 𝐴 over ℤ𝑞 :

Λ𝑞 (𝐴) ∶= {𝑦 ∈ ℤ𝑛 | 𝑦 ≡ 𝐴𝑇 𝑥 mod 𝑞 for 𝑥 ∈ ℤ𝑚
𝑞 }.
The 𝑛-dimensional lattice Λ𝑞 (𝐴) is defined by the rows of 𝐴 and 𝑞ℤ𝑛 . Note that here
we have linear combinations of rows of 𝐴, not columns of 𝐴, as above. Furthermore,
the kernel of 𝐴 defines a lattice:
Λ⟂𝑞 (𝐴) ∶= {𝑦 ∈ ℤ𝑛 | 𝐴𝑦 ≡ 0 mod 𝑞}.
The lattices Λ𝑞 (𝐴) and Λ⟂𝑞 (𝐴) have full rank since they contain 𝑞ℤ𝑛 .
Λ𝑞 (𝐴) and Λ⟂𝑞 (𝐴) can also be viewed as linear codes over ℤ𝑞 , defined by the rows
of 𝐴 and the parity check matrix 𝐴, respectively (see Section 15.1). These two lattices
are dual to each other, up to normalization (see Exercise 5):
Λ⟂𝑞 (𝐴) = 𝑞Λ𝑞 (𝐴)∗ and Λ𝑞 (𝐴) = 𝑞Λ⟂𝑞 (𝐴)∗ .
Example 14.10. Consider the 10-ary lattice Λ from Example 14.3. The columns of
𝐵1 are (4, −1) and (2, 2). Since 8 ⋅ (4, −1) = (32, −8) ≡ (2, 2) mod 10, we discard
the second vector and define the 1 × 2 matrix 𝐴 = (4 − 1). Thus Λ = Λ10 (𝐴). The
lattice Λ⟂10 (𝐴) is defined by all solutions (𝑥, 𝑦) ∈ ℤ2 of the modular equation 4𝑥 − 𝑦 ≡
0 mod 10. We have Λ⟂10 (𝐴) = 10 ⋅ Λ10 (𝐴)∗ and the lattice is defined by the columns of
2 1
the matrix ( ). ♢
−2 4
Short lattice vectors play an important role in cryptographic applications.

Definition 14.11. Let Λ be a lattice and ‖ ‖ a norm on ℝ𝑛 (usually the Euclidean
norm).
(1) The norm of the shortest nonzero vector 𝑣 ∈ 𝐿 defines 𝜆1 (Λ) ∶= ‖𝑣‖.
(2) The 𝑖-th successive minimum is defined by 𝜆𝑖 (Λ) ∶= min𝑆 (max𝑣∈𝑆 ‖𝑣‖), where 𝑆
runs over all linearly independent sets 𝑆 ⊂ Λ with |𝑆| = 𝑖. ♢
We state several computational lattice problems:

Definition 14.12. Let Λ be a lattice and ‖ ‖ a norm on ℝ𝑛 (usually the Euclidean
norm).
(1) Shortest Vector Problem (SVP): Find a shortest nonzero vector 𝑣 ∈ Λ.
(2) Shortest Independent Vector Problem (SIVP): Find linearly independent vectors
𝑣1 , … , 𝑣𝑛 in Λ such that max𝑖 ‖𝑣𝑖 ‖ = 𝜆𝑛 (Λ).
(3) Closest Vector Problem (CVP): given any target vector 𝑤 ∈ ℝ𝑛 , find the closest
lattice point 𝑣 ∈ Λ to 𝑤. ♢
One also considers approximation variants of these problems. Let 𝛾 ≥ 1. In SVP𝛾 ,
one has to find a vector 𝑣 with ‖𝑣‖ ≤ 𝛾 𝜆1 (Λ). Similarly, the SIVP𝛾 problem is to find
linearly independent vectors 𝑣1 , … , 𝑣𝑛 such that max𝑖 ‖𝑣𝑖 ‖ ≤ 𝛾 𝜆𝑛 (Λ).
In CVP𝛾 , the goal is to find a vector 𝑣 such that the distance to a target vector 𝑤 is
at most 𝛾 times the distance of the closest lattice vector to 𝑤.
2
Example 14.13. In Example 14.3 (see Figure 14.1), the shortest vectors are 𝑣 = ( )
2
and −𝑣, and thus 𝜆1 (Λ) = √8. The basis 𝐵1 is not a solution to SIVP, but instead the
basis
2 −2
{( ) , ( )} .
2 3
√17
Hence 𝜆2 (Λ) = √13. However, 𝐵1 is a solution to SIVP𝛾 for 𝛾 ≥ since 𝜆2 (𝐵1 ) = √17.
√13
−1 −2
The closest lattice vector to the target 𝑤 = ( ) is 𝑣 = ( ). ♢
2 3
It is known that SVP is no harder than CVP, since there is a reduction from SVP to
CVP. Both are considered to be hard problems and CVP is NP-hard.
The classical Minkowski Theorem (see [HPS08]) gives an upper bound to the norm
of the shortest nonzero vector:
Theorem 14.14. Let Λ be a lattice and 𝑆 ⊂ ℝ𝑛 a convex centrally symmetric set. If the
volume of 𝑆 is greater than 2𝑛 det(Λ), then 𝑆 contains a nonzero lattice point. ♢
A set 𝑆 is centrally symmetric if 𝑥 ∈ 𝑆 implies −𝑥 ∈ 𝑆. 𝑆 is called convex if 𝑥, 𝑦 ∈ 𝑆
implies 𝑥 + 𝑡(𝑦 − 𝑥) ∈ 𝑆 for 𝑡 ∈ [0, 1], i.e., if the line segment between two points
𝑥, 𝑦 ∈ 𝑆 is contained in 𝑆. For example, balls or cubes with center 0 are centrally
symmetric and convex.
1
Corollary 14.15. 𝜆1 (Λ) ≤ √𝑛 (det(Λ)) 𝑛 .
1
Proof. Let 𝑆 be a ball with center 0 and radius √𝑛 (det(Λ)) 𝑛 . Then
1
𝑣 = (det(Λ)) 𝑛 (1, 1, … , 1) ∈ 𝑆
1 1
since ‖𝑣‖ = (det(Λ)) 𝑛 √𝑛. Furthermore, any vector 𝑤 = (det(𝐿)) 𝑛 ⋅ (𝑥1 , 𝑥2 , … , 𝑥𝑛 )
with |𝑥𝑖 | ≤ 1 for all 𝑖 = 1, … , 𝑛 lies in 𝑆. Hence
1 1 𝑛
[−(det(Λ)) 𝑛 , (det(Λ)) 𝑛 ] ⊂ 𝑆.
14.1. Lattices 259
The volume of 𝑆 is thus greater than 2𝑛 det(Λ), the volume of the above 𝑛-dimensional
cube. 𝑆 satisfies the prerequisite of Minkowski’s Theorem 14.14 and must contain a
nonzero lattice vector. □
Remark 14.16. The above upper bound for 𝜆1 (Λ) can be improved to
1
√𝛾𝑛 (det(Λ)) 𝑛
using Hermite’s constant 𝛾𝑛 . For a given dimension 𝑛, the constant 𝛾𝑛 is the smallest
number such that every lattice of rank 𝑛 contains a nonzero vector 𝑣 with
1
‖𝑣‖ ≤ √𝛾𝑛 (det(Λ)) 𝑛 .
2
For example, 𝛾2 = , but the exact value of 𝛾𝑛 is known only for a few values of 𝑛. The
√3
expected length of the shortest vector of a random lattice is much smaller. Heuristically,
the approximate length is the radius of an 𝑛-dimensional ball with volume det(Λ). Stir-
ling’s asymptotic formula for the volume of an 𝑛-dimensional ball of radius 𝑟 is
𝑛
1 2𝜋𝑒
𝑉𝑛 (𝑟) ≈ (√ 𝑟) .
√𝑛𝜋 𝑛
Now assume that the covolume of a lattice with 𝜆1 (Λ) = 𝑟 is approximately the volume
of a ball of radius 𝑟. Rearranging the above equation gives for large values of 𝑛:
𝑛 1
𝑟≈ (det(Λ)) 𝑛 .
√ 2𝜋𝑒
This is the Gaussian heuristic for randomly chosen lattices of dimension 𝑛. ♢
A first approach to SVP is to enumerate all nonzero points 𝑣 ∈ ℤ𝑛 with ‖𝑣‖ ≤ 𝐶

(using the bounds above) and to verify whether 𝑣 ∈ Λ. If 𝐵 is any basis of Λ, then 𝑣 is
a lattice point if and only if 𝑣 ∈ 𝐵 −1 𝑣 ∈ ℤ𝑛 . This algorithm has exponential running
time in 𝑛.
A similar enumeration method (with exponential running time) can be used to
solve CVP: simply test vectors with integer coordinates ‘near’ the target vector.
Another approach to CVP is Babai’s rounding method. The rational coordinates of
a given target vector 𝑤 with respect to 𝐵 are 𝐵−1 𝑤. Rounding the coordinates yields the
lattice vector 𝑣 = 𝐵⌊𝐵 −1 𝑤⌉. However, it is not guaranteed that 𝑣 is the closest lattice
vector to 𝑤.
2
Example 14.17. Consider Example 14.3 and let 𝑣 = ( ) be a target vector. Looking at
6
Figure 14.1 or by simply trying out vectors near 𝑣, it is easy to see that the closest lattice
2
vector is 𝑤 = ( ). However, this is only feasible in small dimensions.
7
We consider 𝐵1 and 𝐵2 (see Example 14.3). For the ‘good’ basis 𝐵1 we obtain
4
1 2 −2 2 −
𝐵1−1 𝑣 = ( ) ⋅ ( ) = ( 135 ) .
10 1 4 6
5
The rounded coordinates are (−1, 3) and the resulting lattice vector is
−1 2
𝐵1 ( ) = ( ) ,
3 7
which is in fact the closest vector. Now we use the ‘bad’ basis 𝐵2 :
1 10 10 2 8
𝐵2−1 𝑣 = ( ) ( ) = ( 31 ) .
10 −7 −4 6 −
5
The rounded coordinates are (8, −6) and the corresponding lattice vector is
8 4
𝐵2 ( ) = ( ) ,
−6 4
and this is only the second-best solution. ♢
Finding short vectors in random 𝑞-ary lattices is assumed to be intractable for large
dimensions, say several hundreds. Below, we will see that public-key cryptosystems
can be based on lattices, where a ‘good’ basis with short vectors forms the private key
and only a ‘bad’ basis is public.
14.2. Lattice Algorithms

A major task in lattice computations is to give a basis that is as good as possible, i.e.,
the basis vectors should be short and almost orthogonal. The LLL algorithm provides
a partial solution: it runs in polynomial time and improves the basis with regard to the
above objectives. However, for large dimensions, the LLL basis is far from being opti-
mal and the security of lattice-based cryptography relies on the relative ineffectiveness
of the LLL algorithm.
A much simpler task is to transform a set of generators into a basis in standard
echelon form. This also allows us to decide whether two bases (or two sets of generators)
give the same lattice.
Below, we define the column-style Hermite Normal Form (or echelon form) of an
𝑛 × 𝑚 matrix 𝐴 of rank 𝑛 over ℤ.
Definition 14.18. Let 𝐻 be an 𝑛 × 𝑚 matrix over ℤ such that 𝑟𝑘(𝐻) = 𝑛. We say 𝐻 is
in Hermite Normal Form (HNF), if the following conditions are satisfied:
(1) The rightmost 𝑚 − 𝑛 columns of 𝐻 are zero.
(2) 𝐻 is in lower-triangular form.
(3) All coefficients of 𝐻 are non-negative.
(4) In each row of 𝐻, the unique maximum coefficient lies on the diagonal. ♢
Hence, a matrix in Hermite Normal Form is of the following type:

>0 0 … 0 0 … 0
⎛ ⎞
≥0 >0 … 0 0 … 0
𝐻=⎜ ⎟.
⎜ ⋮ ⎟
⎝≥ 0 ≥0 … > 0 0 … 0⎠
HNFs also exist for general 𝑛 × 𝑚 matrices, i.e., without the condition 𝑟𝑘(𝐴) = 𝑛.
Since we only consider lattices of full rank, this is not needed here.
Every integer matrix can be transformed into a matrix in HNF form:
Proposition 14.19. Let 𝐴 be a 𝑛 × 𝑚 matrix over ℤ with 𝑟𝑘(𝐴) = 𝑛 and 𝑛 ≤ 𝑚. Then

there exists a unimodular 𝑚 × 𝑚 matrix 𝑈, such that 𝐻 = 𝐴𝑈 is in HNF. Furthermore,
the columns of 𝐴 and 𝐻 generate the same lattice Λ ⊂ ℝ𝑛 .
Proof. The matrix 𝐻 can be computed by Gaussian elimination. The following oper-
ations are sufficient: swapping two columns, multiplying a column by −1 and adding
an integer multiple of a column to another column. These operations are given by
a multiplication with a unimodular matrix on the right. We leave the details to the
reader. □
Remark 14.20. Above, we considered the column-style HNF. There is also a row-style
HNF 𝐻 of matrix 𝐴 in upper-triangular form where 𝐻 = 𝑈𝐴. Both HNFs are transposes
of each other.
Example 14.21. (1) We compute the HNF of 𝐵1 and 𝐵2 in Example 14.3. Let 𝐵1 =
4 2
{𝑣1 , 𝑣2 } = {( ) , ( )}. Set 𝑣1 ← (−𝑣1 ) + 2𝑣2 and swap 𝑣1 and 𝑣2 . We obtain the
−1 2
HNF
2 0
𝐻=( ).
2 5
8 10
Now consider the second basis 𝐵2 = {( ) , ( )}. Set 𝑣1 ← (−𝑣1 ) + 𝑣2 , giving
−7 −10
2 10 2 0
( ). Now set 𝑣2 ← (−5𝑣1 ) + 𝑣2 and 𝑣1 ← 𝑣2 + 𝑣1 . This gives 𝐻 = ( )
−3 −10 2 5
as above. In fact, 𝐵1 and 𝐵2 generate the same lattice.
(2) Consider a slightly more complicated example:
2 −6 12 4
𝐴=(10 −30 11 6) .
2 −6 4 5
We let SageMath compute the HNF:

sage: A= matrix (ZZ ,[[2 , -6 ,12 ,4] ,[10 , -30 ,11 ,6] ,[2 , -6 ,4 ,5]])
sage: H=A. transpose (). hermite_form (). transpose (); H
[ 2 0 0 0]
[ 3 7 0 0]
[14 11 23 0]
Since SageMath computes a row-style HNF, we have to add transpose()

operations. The columns of 𝐴 and 𝐻 generate the same lattice; the three nonzero
columns of 𝐻 form a basis of the lattice. ♢
Although many lattices do not possess an orthogonal basis, a good basis should
have almost orthogonal vectors.
Definition 14.22. The orthogonality defect of a basis 𝑏1 , … , 𝑏𝑛 of a lattice Λ is defined
by
‖𝑏1 ‖ ⋯ ‖𝑏𝑛 ‖
. ♢
det(Λ)
The orthogonality defect is always ≥ 1. It is close to 1 for a ‘good’ basis and equal
to 1 for an orthogonal basis.
Example 14.23. The orthogonality defect of the ‘good’ basis 𝐵1 (see Example 14.3) is
1.17 and that of the ‘bad’ basis 𝐵2 is 15.03. ♢
The main objective of lattice reduction is efficient transformation of an arbitrary

basis into an almost orthogonal basis.
For inner product spaces, for example ℝ𝑛 , the Gram-Schmidt Orthogonalization
(GSO) produces an orthogonal basis (see Algorithm 14.1). If 𝑏1 , … , 𝑏𝑛 is any given
basis, then one obtains an orthogonal basis by iterative projections: let 𝑏1∗ = 𝑏1 and
𝑖−1 𝑏𝑖 ⋅ 𝑏𝑗∗
𝑏𝑖∗ = 𝑏𝑖 − ∑ 𝜇𝑖,𝑗 𝑏𝑗∗ , where 𝜇𝑖,𝑗 = for 1 ≤ 𝑗 < 𝑖 ≤ 𝑛.
𝑗=1
𝑏𝑗∗ ⋅ 𝑏𝑗∗
The numbers 𝜇𝑖,𝑗 are called GSO coefficients. The summand 𝜇𝑖,𝑗 𝑏𝑗∗ gives the projection
of 𝑏𝑖 onto 𝑏𝑗∗ and their sum is the projection of 𝑏𝑖 onto the hyperplane ⟨𝑏1∗ , … , 𝑏𝑖−1
∗
⟩. The
∗
difference of 𝑏𝑖 and the projection gives the vector 𝑏𝑖 , which is orthogonal to all vectors
𝑏1∗ , … , 𝑏𝑖−1
∗
. The vector 𝑏𝑖∗ is also called the projection onto the orthogonal complement
∗ ∗
of 𝑏1 , … , 𝑏𝑖−1 .
𝑏𝑖∗ can also be computed by successive projections of 𝑏𝑖 onto the orthogonal com-
plement of 𝑏𝑗∗ , where 𝑗 runs from 𝑖 − 1 to 1. Initially set 𝑏1∗ = 𝑏1 , … , 𝑏𝑛∗ = 𝑏𝑛 and update
each vector 𝑏2∗ , … , 𝑏𝑛∗ recursively:
𝑏𝑖∗ ← 𝑏𝑖∗ − 𝜇𝑖,𝑗 𝑏𝑗∗ , where 𝑗 = 𝑖 − 1, … , 1.
We write 𝐵𝑖 for the square norm ‖𝑏𝑖∗ ‖2 of vectors in the GSO basis.
Algorithm 14.1 Gram-Schmidt Orthogonalization Algorithm (GSO)

Input: Basis 𝑏1 , … , 𝑏𝑛 of ℝ𝑛
Output: Orthogonal basis 𝑏𝑖∗ , … , 𝑏𝑛∗ , GSO coefficients 𝜇𝑖,𝑗
Initialisation: 𝑏1∗ = 𝑏1
1: for 𝑖 = 2 to 𝑛 do
2: 𝑏𝑖∗ = 𝑏𝑖
3: for 𝑗 = 𝑖 − 1 downto 1 do
𝑏𝑖 ⋅𝑏𝑗∗
4: 𝜇𝑖,𝑗 = // GSO coefficients
𝑏𝑗∗ ⋅𝑏𝑗∗
5: 𝑏𝑖∗ ← 𝑏𝑖∗ − 𝜇𝑖,𝑗 𝑏𝑗∗
6: end for
7: end for
8: return 𝑏𝑖∗ , 𝜇𝑖,𝑗
The standard GSO algorithm needs to be modified for lattices, since the GSO basis
is not contained in the lattice unless all GSO coefficients are integers. Now, the obvious
approach is to round the GSO coefficients 𝜇𝑖,𝑗 . Let ⌊𝜇𝑖,𝑗 ⌉ be the closest integer to 𝜇𝑖,𝑗 .
Then set
𝑏𝑖 = 𝑏𝑖 − ⌊𝜇𝑖,𝑗 ⌉𝑏𝑗 for 𝑗 = 𝑖 − 1, … , 1.
𝑏𝑖′ 𝑏𝑖
𝑏𝑗 2𝑏𝑗
Figure 14.2. Projection of 𝑏𝑖 onto the orthogonal complement of 𝑏𝑗 and lifting it back
to a lattice vector 𝑏𝑖′ (dashed). In this example, one has ⌊𝜇𝑖,𝑗 ⌉ = 2. The new basis 𝑏𝑖′ , 𝑏𝑗
is size-reduced.
This can be interpreted as projecting 𝑏𝑖 onto the orthogonal complement of 𝑏𝑗 and

lifting it back to a lattice vector (see Figure 14.2). Since we updated 𝑏𝑖 , we also need to
update the GSO coefficients 𝜇𝑖,1 , … , 𝜇𝑖,𝑗 after each projection. The new GSO coefficient
𝜇𝑖,𝑘 is
𝑏𝑖 ⋅ 𝑏𝑘∗ 𝑏𝑗 ⋅ 𝑏𝑘∗
= 𝜇 𝑖𝑘 − ⌊𝜇 𝑖,𝑗 ⌉ = 𝜇𝑖,𝑘 − ⌊𝜇𝑖,𝑗 ⌉𝜇𝑗,𝑘 for 𝑘 = 1, … , 𝑗.
𝑏𝑘∗ ⋅ 𝑏𝑘∗ 𝑏𝑘∗ ⋅ 𝑏𝑘∗
We set 𝜇𝑗,𝑗 = 1 and replace 𝜇𝑖𝑘 with 𝜇𝑖𝑘 − ⌊𝜇𝑖𝑗 ⌉𝜇𝑗𝑘 for 𝑘 = 1, … , 𝑗. The last iteration
(𝑘 = 𝑗) updates 𝜇𝑖𝑗 by 𝜇𝑖𝑗 − ⌊𝜇𝑖𝑗 ⌉. This is the error when replacing the GSO coefficient
𝜇𝑖𝑗 with the integer coefficient ⌊𝜇𝑖𝑗 ⌉.
Note that 𝑏𝑖 is not changed if ⌊𝜇𝑖,𝑗 ⌉ = 0.
Definition 14.24. An ordered basis 𝑏1 , … , 𝑏𝑛 of a lattice Λ is called size-reduced if all

1
GSO coefficients 𝜇𝑖,𝑗 satisfy |𝜇𝑖,𝑗 | ≤ . ♢
2
A size-reduced basis cannot be further reduced, since all rounded GSO coefficients
are zero. But note that this property depends on the order of the vectors. Even in
dimension 2, it may happen that 𝑏1 , 𝑏2 is size-reduced while 𝑏2 , 𝑏1 is not.
A size-reduced basis can be computed with the following integer variant of the
GSO algorithm (see Algorithm 14.2).
Algorithm 14.2 Size Reduction Algorithm

Input: Basis 𝑏1 , … , 𝑏𝑛 of lattice Λ
Output: Reduced basis 𝑏1 , … , 𝑏𝑛 , GSO coefficients 𝜇𝑖,𝑗
Initialisation: Run GSO Algorithm 14.1, get GSO basis 𝑏𝑖∗ and coefficients 𝜇𝑖,𝑗 .
1: for 𝑖 = 2 to 𝑛 do
2: for 𝑗 = 𝑖 − 1 downto 1 do
3: 𝑏𝑖 ← 𝑏𝑖 − ⌊𝜇𝑖,𝑗 ⌉𝑏𝑗
4: for 𝑘 = 1 to 𝑗 do
5: 𝜇𝑖,𝑘 ← 𝜇𝑖,𝑘 − ⌊𝜇𝑖,𝑗 ⌉𝜇𝑗,𝑘 // update GSO coefficients where 𝜇𝑗,𝑗 = 1
6: end for
7: end for
8: end for
∗
9: return 𝑏𝑖 , 𝑏𝑖 , 𝜇𝑖,𝑗
We note that the size reduction algorithm does not change the GSO Basis 𝑏1∗ , … , 𝑏𝑛∗
and their square norms 𝐵1 , … , 𝐵𝑛 . A size-reduced basis 𝑏1 , … , 𝑏𝑛 may be further im-
proved by changing the order of the vectors. Consider the GSO Algorithm 14.1: if 𝑏𝑖
and 𝑏𝑖+1 are swapped, then the reduction algorithm leaves 𝑏1∗ , … , 𝑏𝑖−1
∗ ∗
and 𝑏𝑖+2 , … , 𝑏𝑛∗
∗
unchanged and the new value for 𝑏𝑖 is
𝑖−1
𝑏𝑖+1 − ∑ 𝜇𝑖+1,𝑗 𝑏𝑗∗ .
𝑗=1
In terms of GSO vectors without swapping, this is the same as

∗
𝑏𝑖+1 + 𝜇𝑖+1,𝑖 𝑏𝑖∗ .
We compare ‖𝑏𝑖∗ ‖2 = 𝐵𝑖 (i.e., without swapping) with
∗
‖𝑏𝑖+1 + 𝜇𝑖+1,𝑖 𝑏𝑖∗ ‖2 = 𝐵𝑖+1 + 𝜇2𝑖+1,𝑖 𝐵𝑖
(with swapping). If 𝐵𝑖+1 + 𝜇2𝑖+1,𝑖 𝐵𝑖 < 𝐵𝑖 then swapping reduces the norm of 𝑏𝑖∗ and
should therefore be applied to the lattice basis. In this case, one swaps 𝑏𝑖 and 𝑏𝑖+1 and
again applies the size-reduction algorithm.
1
Definition 14.25. Let 𝑏1 , … , 𝑏𝑛 , 𝑏1∗ , … , 𝑏𝑛∗ , 𝐵1 , … , 𝐵𝑛 be as above and 𝛿 ∈ ] , 1[. Then
4
the Lovacz condition with factor 𝛿 is defined by
𝛿𝐵𝑖 ≤ 𝐵𝑖+1 + 𝜇2𝑖+1,𝑖 𝐵𝑖
for 𝑖 = 1, … , 𝑛 − 1. The condition is equivalent to (𝛿 − 𝜇2𝑖+1,𝑖 )𝐵𝑖 ≤ 𝐵𝑖+1 . An ordered
basis 𝑏1 , … , 𝑏𝑛 is called 𝛿-LLL-reduced if it is size-reduced and the Lovacz condition
holds with factor 𝛿. ♢
3
A typical choice is 𝛿 = , which ensures that the algorithm terminates in polyno-
4
mial time (in contrast to 𝛿 = 1). Now we can give the basic version of the famous LLL
(Lenstra-Lenstra-Lovasz) algorithm (see Algorithm 14.3).
Algorithm 14.3 LLL Lattice Reduction Algorithm

Input: Basis 𝑏1 , … , 𝑏𝑛 of lattice Λ
Output: LLL-reduced basis 𝑏1 , … , 𝑏𝑛
∗
1: Run Algorithm 14.2, get size-reduced basis 𝑏1 , … , 𝑏𝑛 , GSO basis 𝑏𝑖
and GSO coefficients 𝜇𝑖,𝑗 .
∗ 2
2: Compute 𝐵𝑖 = ‖𝑏𝑖 ‖ for 1 ≤ 𝑖 ≤ 𝑛
3: for 𝑖 = 1 to 𝑛 − 1 do
4: if (𝛿 − 𝜇2𝑖+1,𝑖 )𝐵𝑖 > 𝐵𝑖+1 then
5: Swap 𝑏𝑖 and 𝑏𝑖+1
6: Goto Step 1
7: end if
8: end for
9: return 𝑏1 , … , 𝑏𝑛
Remark 14.26. Algorithm 14.3 can be optimized: it is not necessary to leave the loop
and to re-run the size-reduction Algorithm 14.2, if 𝑏𝑖 and 𝑏𝑖+1 are swapped (step 5).
Instead, it is sufficient to update the GSO basis and several GSO coefficients and to
decrease the loop index 𝑖 by 1.
Example 14.27. We consider the following HNF basis of a 3-dimensional lattice (see
Example 14.21 (2)):
2 0 0
𝑏1 = ( 3 ) , 𝑏2 = ( 7 ) , 𝑏3 = ( 0 ) .
14 11 23
We apply the LLL lattice reduction Algorithm 14.3. First, we compute the GSO coeffi-
cients:
𝑏 ⋅𝑏 175 𝑏 ⋅𝑏 322 𝑏 ⋅ 𝑏∗ 3473
𝜇21 = 2 1 = , 𝜇31 = 3 1 = , 𝜇32 = 3∗ 2∗ = − .
𝑏1 ⋅ 𝑏1 209 𝑏1 ⋅ 𝑏1 209 𝑏2 ⋅ 𝑏2 4905
Note that 𝜇32 is computed using the updated vector
𝑏2∗ = 𝑏2 − 𝜇21 𝑏1 = (−350/209, 938/209, −151/209)𝑇 .
The size-reduction algorithm sets 𝑏2 = 𝑏2 − ⌊𝜇21 ⌉𝑏1 = 𝑏2 − 𝑏1 = (−2, 4, −3)𝑇 . Then,

34
𝜇21 is updated to 𝜇21 − ⌊𝜇21 ⌉ = − .
209
Now, 𝑏3 = 𝑏3 − ⌊𝜇32 ⌉𝑏2 = 𝑏3 + 𝑏2 = (−2, 4, 20)𝑇 . The GSO coefficient 𝜇31 changes
288 1432
to 𝜇31 − ⌊𝜇32 ⌉𝜇21 = 𝜇31 + 𝜇21 = and 𝜇32 is updated to 𝜇32 − ⌊𝜇32 ⌉ = 𝜇32 + 1 = .
209 4905
The last size-reduction step is 𝑏3 = 𝑏3 − ⌊𝜇31 ⌉𝑏1 = 𝑏3 − 𝑏1 = (−4, 1, 6)𝑇 . The GSO
79
coefficient 𝜇31 changes to 𝜇31 − ⌊𝜇31 ⌉ = 𝜇31 − 1 = . We have thus computed a new
209
size-reduced basis:
2 −2 −4
𝑏1 = ( 3 ) , 𝑏2 = ( 4 ) , 𝑏3 = ( 1 ) .
14 −3 6
3 4905
We check the Lovacz condition for 𝛿 = and 𝑖 = 1. We have 𝐵1 = 209, 𝐵2 = and
4 209
(𝛿 − 𝜇221 )𝐵1 ≈ 151 > 𝐵2 ≈ 23.

The Lovacz condition is not satisfied, and so we swap 𝑏1 and 𝑏2 :
−2 2 −4
𝑏1 = ( 4 ) , 𝑏2 = ( 3 ) , 𝑏3 = ( 1 ) .
−3 14 6
4905 103684
Apply the GSO algorithm and obtain 𝐵1 = 29, 𝐵2 = and 𝐵3 = for the new
29 4905
basis. The GSO coefficients are:
34 6 2087
𝜇21 = − , 𝜇31 = − , 𝜇32 = .
29 29 4905
Then size reduction is applied and we update 𝑏2 by 𝑏2 − ⌊𝜇21 ⌉𝑏1 = 𝑏2 + 𝑏1 = (0, 7, 11)𝑇
6
and 𝜇21 by 𝜇21 + 1 = − . The other two coefficients 𝜇31 and 𝜇32 are not changed and
29
are rounded to 0. Hence the new basis
−2 0 −4
𝑏1 = ( 4 ) , 𝑏2 = ( 7 ) , 𝑏3 = ( 1 )
3 11 6
is size-reduced. The Lovacz condition is satisfied for 𝑖 = 1:
(𝛿 − 𝜇221 )𝐵1 ≈ 21 < 𝐵2 ≈ 169,
but is not satisfied for 𝑖 = 2:
(𝛿 − 𝜇232 )𝐵2 ≈ 96 > 𝐵3 ≈ 21.
We have to swap 𝑏2 and 𝑏3 and obtain
−2 −4 0
𝑏1 = ( 4 ) , 𝑏2 = ( 1 ) , 𝑏3 = ( 7 ) .
−3 6 11
1501 103684
Now the GSO algorithm gives 𝐵1 = 29, 𝐵2 = and 𝐵3 = , and the updated
29 1501
GSO coefficients are
6 5 2087
𝜇21 = − , 𝜇 = − , 𝜇32 = .
29 31 29 1501
Again, the size-reduction algorithm is applied. 𝑏2 does not change since ⌊𝜇21 ⌉ = 0. We
1
update 𝑏3 by 𝑏3 − ⌊𝜇32 ⌉𝑏2 = 𝑏3 − 𝑏2 = (4, 6, 5)𝑇 , 𝜇31 by 𝜇31 − ⌊𝜇32 ⌉𝜇21 = and 𝜇32
29
586
by 𝜇32 − 1 = . Now the basis 𝑏1 , 𝑏2 , 𝑏3 is size-reduced and both Lovacz conditions
1501
are satisfied:
(𝛿 − 𝜇221 )𝐵1 ≈ 21 < 𝐵2 ≈ 52,

(𝛿 − 𝜇232 )𝐵2 ≈ 31 < 𝐵3 ≈ 69.
The LLL algorithm stops and outputs the LLL-reduced basis
−2 −4 4
𝑏1 = ( 4 ) , 𝑏2 = ( 1 ) , 𝑏3 = (6) .
−3 6 5
We verify our result with SageMath:
sage: A= matrix ([[2 ,0 ,0] ,[3 ,7 ,0] ,[14 ,11 ,23]])

sage: A. transpose (). LLL (). transpose ()
[-2 -4 4]
[ 4 1 6]
[-3 6 5]
The LLL-reduced basis is shorter than the original basis and one can show (for example,
−2
by testing shorter vectors with integer coefficients) that 𝑏1 = ( 4 ) is the shortest
−3
vector of the lattice Λ. ♢
One can show that the LLL algorithm always terminates and runs in polynomial
time. The number of swaps and hence the number of executions of the main loop is
bounded by 𝑂(𝑛2 ln(𝑋)), where 𝑋 is an upper bound on the norms of the input vectors.
We refer to [Gal12] and [HPS08] for a proof the following statement:
Theorem 14.28. Let Λ be a lattice in ℤ𝑛 with basis 𝑏1 , 𝑏2 , … , 𝑏𝑛 and ‖𝑏𝑖 ‖ ≤ 𝑋 for

1
𝑖 = 1, … , 𝑛. Let < 𝛿 < 1; then the LLL algorithm terminates and the running time is
4
𝑂(𝑛6 ln(𝑋)3 ). ♢
The next Proposition relates the norms of the LLL-reduced basis to the norms of
the GSO basis.
3
Proposition 14.29. Let 𝑏1 , … , 𝑏𝑛 be an LLL-reduced basis with 𝛿 = . Let 𝑏1∗ , … , 𝑏𝑛∗ be
4
the corresponding GSO basis and 𝐵𝑖 = ‖𝑏𝑖∗ ‖2 as above. Then:
(1) 𝐵𝑖 ≤ 2𝐵𝑖+1 for 1 ≤ 𝑖 < 𝑛 and 𝐵𝑗 ≤ 2𝑖−𝑗 𝐵𝑖 for 1 ≤ 𝑗 ≤ 𝑖 ≤ 𝑛.
1
(2) 𝐵𝑖 ≤ ‖𝑏𝑖 ‖2 ≤ ( + 2𝑖−2 )𝐵𝑖 for 1 ≤ 𝑖 ≤ 𝑛.
2
(3) ‖𝑏𝑗 ‖ ≤ 2(𝑖−1)/2 ‖𝑏𝑖∗ ‖ for 1 ≤ 𝑗 ≤ 𝑖 ≤ 𝑛.
(4) 𝜆1 (Λ) ≥ min1≤𝑖≤𝑛 ‖𝑏𝑖∗ ‖.
1 3
Proof. Since the basis is reduced, one has 𝜇2𝑖+1,𝑖 ≤ . The Lovacz condition for 𝛿 =
4 4
implies (1). The GSO construction gives
𝑖−1
𝑏𝑖 = 𝑏𝑖∗ + ∑ 𝜇𝑖,𝑗 𝑏𝑗∗ .
𝑗=1
Since the GSO vectors are orthogonal, one obtains ‖𝑏𝑖∗ ‖ ≤ ‖𝑏𝑖 ‖ and
𝑖−1
‖𝑏𝑖 ‖2 = 𝐵𝑖 + ∑ 𝜇2𝑖,𝑗 𝐵𝑗 .
𝑗=1
1 1 (𝑖−𝑗)
Furthermore, 𝜇2𝑖,𝑗 𝐵𝑗 ≤ 𝐵𝑗 ≤ 2 𝐵𝑖 by (1). This gives part (2), since
4 4
𝑖−1
1 1 1
‖𝑏𝑖 ‖2 ≤ 𝐵𝑖 (1 + ∑ 2𝑖−𝑗 ) = 𝐵𝑖 (1 + (2𝑖 − 2)) = 𝐵𝑖 ( + 2𝑖−2 ) .
4 𝑗=1 4 2
1
For 𝑗 ≥ 1 we have + 2𝑗−2 ≤ 2𝑗−1 . Thus (2) implies ‖𝑏𝑗 ‖2 ≤ 2𝑗−1 𝐵𝑗 . Since 𝐵𝑗 ≤ 2𝑖−𝑗 𝐵𝑖
2
by (1), we obtain ‖𝑏𝑗 ‖2 ≤ 2𝑗−1 2𝑖−𝑗 𝐵𝑖 = 2𝑖−1 𝐵𝑖 . Taking square roots proves part (3).
𝑛
Suppose 𝑣 is a shortest nonzero lattice vector and 𝑣 = ∑𝑖=1 𝑥𝑖 𝑏𝑖 where 𝑥𝑖 ∈ ℤ; then:
𝑛 𝑖−1 𝑛
𝑣 = ∑ (𝑥𝑖 𝑏𝑖∗ + ∑ 𝑥𝑖 𝜇𝑖,𝑗 𝑏𝑗∗ ) = ∑ (𝑥𝑖 + 𝜇𝑖+1,𝑖 𝑥𝑖+1 + ⋯ + 𝜇𝑛,𝑖 𝑥𝑛 ) 𝑏𝑖∗ .
𝑖=1 𝑗=1 𝑖=1
Now, let 𝑖 be the largest index such that 𝑥𝑖 ≠ 0. The above formula and the orthogo-
nality of the GSO basis implies ‖𝑣‖ ≥ |𝑥𝑖 | ‖𝑏𝑖∗ ‖ and hence part (4). □
The next Proposition shows how effective the LLL algorithm is (in the worst case)
with respect to computing a short vector and an almost orthogonal basis. The algo-
rithm is good for small values of 𝑛, but the bounding factors grow exponentially in 𝑛.
3
Proposition 14.30. Let 𝑏1 , … , 𝑏𝑛 be an LLL-reduced basis with 𝛿 = . Then:
4
(1) ‖𝑏1 ‖ ≤ 2(𝑛−1)/2 𝜆1 (Λ).

𝑛
(2) det(Λ) ≤ ∏𝑖=1 ‖𝑏𝑖 ‖ ≤ 2𝑛(𝑛−1)/4 det(Λ).
1
(3) ‖𝑏1 ‖ ≤ 2(𝑛−1)/4 det(Λ) 𝑛 .
14.3. GGH Cryptosystem 269
Proof. We use the previous Proposition 14.29. Part (1) gives ‖𝑏𝑖∗ ‖ ≥ 2(1−𝑖)/2 ‖𝑏1∗ ‖. Part
(4) and 𝑏1 = 𝑏1∗ yield the first inequality:
𝜆1 (Λ) ≥ min ‖𝑏𝑖∗ ‖ ≥ min 2(1−𝑖)/2 ‖𝑏1∗ ‖ = 2(1−𝑛)/2 ‖𝑏1 ‖.
1≤𝑖≤𝑛 1≤𝑖≤𝑛
𝑛
We have det(Λ) = ∏𝑖=1 ‖𝑏𝑖∗ ‖. Inequality (2) follows from ‖𝑏𝑖∗ ‖ ≤ ‖𝑏𝑖 ‖ and part (3) of
Proposition 14.29:
‖𝑏𝑖∗ ‖ ≤ ‖𝑏𝑖 ‖ ≤ 2(𝑖−1)/2 ‖𝑏𝑖∗ ‖.
Furthermore, ‖𝑏1 ‖ ≤ 2(𝑖−1)/2 ‖𝑏1∗ ‖ gives
𝑛
‖𝑏1 ‖𝑛 ≤ ∏ 2(𝑖−1)/2 ‖𝑏𝑖∗ ‖ = 2𝑛(𝑛−1)/4 det(Λ),
𝑖=1
which yields part (3). □
14.3. GGH Cryptosystem

In 1997, Goldreich, Goldwasser and Halevi introduced a public-key encryption and a
signature scheme based on lattices (GGH, [GGH97]). The security relies on the hard-
ness of the closest vector problem (CVP). Although the original scheme has flaws, and
it was shown that GGH is much easier than the general closest vector problem (CVP),
the underlying construction is still relevant today. Together with the Atjai-Dwork and
NTRU schemes, GGH is one of the classical lattice-based cryptosystems. Furthermore,
the GGH scheme is very efficient.
Definition 14.31. Let 𝑛 ∈ ℕ be the security parameter. Choose 𝑀 ∈ ℕ and a small
number 𝜎, for example 𝑀 = 𝑛 and 𝜎 = 3. The GGH encryption scheme is defined as
follows:
• The plaintext space is ℳ = {−𝑀, … , 0, 1, … , 𝑀}𝑛 and the ciphertext space is
𝒞 = ℤ𝑛 .
• For key generation, one chooses a uniform 𝑛×𝑛 matrix 𝐵 over ℤ with small entries,
say integers between −4 and 4. Verify that 𝐵 is invertible (over ℚ), and otherwise
generate a new matrix. 𝐵 is a ‘good’ basis of a lattice Λ ⊂ ℝ𝑛 (since the basis has
small coefficients) and forms the private key. Let 𝐻 be the HNF of 𝐵. This is a
‘bad’ basis of Λ and defines the public key. Alternatively, 𝐻 could be any random
basis of Λ. However, 𝐻 can be efficiently reduced to the HNF anyway.
• A plaintext 𝑚 ∈ {−𝑀, … , 0, 1, … , 𝑀}𝑛 is encrypted by choosing a uniform ran-
dom noise vector 𝑟 ∈ {−𝜎, 𝜎}𝑛 and computing the ciphertext
𝑐 = 𝐻𝑚 + 𝑟.
• For decryption, one uses Babai’s rounding method to recover the plaintext
𝑚 = 𝐻 −1 𝐵⌊𝐵−1 𝑐⌉. ♢
The ciphertext is close to the lattice point 𝐻𝑚 and the assumption is that finding
this vector given 𝑐 is hard. The following Proposition shows that decryption is correct
if the noise vector 𝑟 is small.
Proposition 14.32. Let 𝐵, 𝐻, Λ and 𝑟 be as above. If ⌊𝐵−1 𝑟⌉ = 0 then GGH decryption
is correct.
Proof. Let 𝑐 = 𝐻𝑚 + 𝑟. Since 𝐻𝑚 ∈ Λ, we have 𝐵 −1 𝐻𝑚 ∈ ℤ𝑛 and

⌊𝐵−1 𝑐⌉ = ⌊𝐵 −1 𝐻𝑚 + 𝐵 −1 𝑟⌉ = 𝐵 −1 𝐻𝑚 + ⌊𝐵 −1 𝑟⌉.
It follows that
𝐻 −1 𝐵⌊𝐵 −1 𝑐⌉ = 𝐻 −1 𝐵𝐵 −1 𝐻𝑚 + 𝐻 −1 𝐵⌊𝐵−1 𝑟⌉ = 𝑚 + 𝐻 −1 𝐵⌊𝐵−1 𝑟⌉.
The second summand is zero if ⌊𝐵−1 𝑟⌉ = 0. This proves the assertion. □
Example 14.33. Let 𝑛 = 4, 𝑀 = 4 and 𝜎 = 1. We choose a short basis 𝐵 of a lattice Λ
and compute its HNF:
2 −3 1 −4 1 0 0 0
⎛ ⎞ ⎛ ⎞
−2 1 0 4 0 1 0 0
𝐵=⎜ ⎟ , 𝐻=⎜ ⎟.
⎜ −1 3 2 1 ⎟ ⎜ 0 0 1 0 ⎟
⎝ −1 −4 3 −2 ⎠ ⎝ 44 18 4 49 ⎠
Suppose the plaintext is 𝑚 = (3, −4, 1, 3) and the noise vector 𝑟 = (−1, 1, 1, −1) is
chosen. The associated ciphertext is:
1 0 0 0 3 −1 3 −1 2
⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
0 1 0 0 −4 1 −4 1 −3
𝑐 = 𝐻𝑚 + 𝑟 = ⎜ ⎟⎜ ⎟ + ⎜ ⎟ = ⎜ ⎟ + ⎜ ⎟ = ⎜ ⎟.
⎜ 0 0 1 0 ⎟⎜ 1 ⎟ ⎜ 1 ⎟ ⎜ 1 ⎟ ⎜ 1 ⎟ ⎜ 2 ⎟
⎝ 44 18 4 49 ⎠ ⎝ 3 ⎠ ⎝−1⎠ ⎝211⎠ ⎝−1⎠ ⎝210⎠
Decryption uses the private basis 𝐵. We compute:
61 45 10 −27 −809 −116
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
1 −10 −13 8 −2 1 −55 −8
𝐵−1 = ⎜ ⎟ , 𝐵 −1 𝑐 = ⎜ ⎟ , ⌊𝐵 −1 𝑐⌉ = ⎜ ⎟.
49 ⎜ 29 23 16 −4 ⎟ 7 ⎜−117⎟ ⎜ −17 ⎟
⎝ 33 38 3 −13 ⎠ ⎝−396⎠ ⎝ −57 ⎠
We recover the plaintext:
1 0 0 0
⎛ ⎞
0 1 0 0 ⎟
𝐻 −1 = ⎜⎜ 0 0 1 0 ⎟,
⎜ 44 18 4 1 ⎟
⎝ − 49 − 49 −
49 49 ⎠
−116 3 3
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
−8 −3 −4
𝑚 = 𝐻 −1 𝐵⌊𝐵−1 𝑐⌉ = 𝐻 −1 𝐵 ⎜ ⎟ = 𝐻 −1 ⎜ ⎟ = ⎜ ⎟ .
⎜ −17 ⎟ ⎜ 1 ⎟ ⎜1⎟
⎝ −57 ⎠ ⎝211⎠ ⎝ 3 ⎠
14.4. NTRU 271
However, if an adversary tries to decrypt 𝑐 using the public basis 𝐻, then the result 𝑚′
differs from 𝑚:
⎢⎛ 2 ⎞⎤ ⎛ 2 ⎞
⎢ −3 ⎥
𝑚′ = ⌊𝐻 −1 𝑐⌉ = ⎢ ⎜ ⎟ = ⎜−3⎟ . ♢
⎢⎜ 2 ⎟⎥ ⎜2⎟
⎢⎜ 24 ⎟⎥
⎥
⎣⎝ 7 ⎠⎥ ⎝ 3 ⎠
It can be shown that the GGH encryption scheme has inherent weaknesses. A
major problem is that the noise vector 𝑟 has to be short for correct decryption.
As a consequence, GGH ciphertexts are not uniformly distributed and can be dis-
tinguished from random data, but then the closest vector problem is much easier than
in the general case. Practical attacks could be mounted for 𝑛 < 400, and larger dimen-
sions are impractical because of the key size that grows quadratically in 𝑛. There are
proposals for improvements of GGH that require further cryptanalysis.
14.4. NTRU
The NTRU cryptosystem was invented in the 1990s by Hoffstein, Pipher and Silverman
[HPS98]. The classical definition of NTRU uses polynomials in the ring
𝑅 = ℤ[𝑥]/(𝑥𝑁 − 1),
where 𝑁 is fixed, for example 𝑁 = 743. Furthermore, a large modulus 𝑞 and a small
modulus 𝑝 are needed with gcd(𝑝, 𝑞) = 1, for example 𝑞 = 2048 and 𝑝 = 3. We begin
with the classical definition of NTRU and outline the relation to lattices later in this
section.
Multiplication in the ring 𝑅 can be viewed as a convolution product, since
(𝑎0 + 𝑎1 𝑥 + ⋯ + 𝑎𝑁−1 𝑥𝑁−1 )(𝑏0 + 𝑏1 𝑥 + ⋯ + 𝑏𝑁−1 𝑥𝑁−1 ) ≡ 𝑐0 + ⋯ + 𝑐𝑁−1 𝑥𝑁−1 ,
where 𝑐𝑘 = ∑𝑖+𝑗≡𝑘 mod 𝑁 𝑎𝑖 𝑏𝑗 . The convolution product is often denoted by a ‘∗’, but
since it is also the usual multiplication in the quotient ring 𝑅 (see Proposition 4.60) we
will not use a special notation.
NTRU uses polynomials with small coefficients. We define 𝒯(𝑑1 , 𝑑2 ) ⊂ 𝑅 to be the
subset of ternary polynomials, where 𝑓 ∈ 𝒯(𝑑1 , 𝑑2 ) if a representative 𝑓 of degree < 𝑁
has 𝑑1 coefficients equal to 1, 𝑑2 coefficients equal to −1 and the remaining coefficients
equal to zero (see [HPS08]).
Definition 14.34. Let 𝑁, 𝑝 be prime numbers and 𝑞, 𝑑 ∈ ℕ such that gcd(𝑝, 𝑞) =

gcd(𝑁, 𝑞) = 1 and 𝑞 > (6𝑑 + 1)𝑝. Let 𝑅 = ℤ[𝑥]/(𝑥𝑁 − 1), 𝑅𝑝 = ℤ𝑝 [𝑥]/(𝑥𝑁 − 1) and
𝑅𝑞 = ℤ𝑞 [𝑥]/(𝑥𝑁 − 1). Then the NTRU cryptosystem is defined as follows:
• The plaintext space is ℳ = 𝑅𝑝 and the ciphertext space is 𝒞 = 𝑅𝑞 .
• For key generation, two polynomials 𝑓, 𝑔 ∈ 𝑅 are chosen such that

𝑓 ∈ 𝒯(𝑑 + 1, 𝑑) and 𝑔 ∈ 𝒯(𝑑, 𝑑). Moreover, 𝑓 must be invertible mod 𝑞 and
mod 𝑝, i.e., there exist polynomials 𝑓𝑝 , 𝑓𝑞 ∈ 𝑅 such that
𝑓 ⋅ 𝑓𝑝 ≡ 1 mod 𝑝 and 𝑓 ⋅ 𝑓𝑞 ≡ 1 mod 𝑞.
Then set
ℎ = 𝑓𝑞 ⋅ 𝑔 mod 𝑞.
The public key is 𝑝𝑘 = (𝑁, 𝑝, 𝑞, ℎ) and the private key is 𝑠𝑘 = 𝑓.
• For encryption, the plaintext is encoded into a polynomial 𝑚 of degree less than 𝑁
𝑝−1 𝑝−1
with integer coefficients between − and , i.e., a lifted element of the ring
2 2
𝑅𝑝 . A random polynomial 𝑟 ∈ 𝑅 is chosen such that 𝑟 ∈ 𝒯(𝑑, 𝑑). The ciphertext
is defined by
𝑐 = 𝑝 𝑟ℎ + 𝑚 mod 𝑞.
• For decryption, compute
𝑎 = 𝑓𝑐 mod 𝑞
𝑞 𝑞
and represent 𝑎 with integer coefficients between − and . The plaintext is re-
2 2
covered by
𝑚 = 𝑓𝑝 𝑎 mod 𝑝. ♢
We verify that decryption is correct:

𝑎 = 𝑓𝑐 = 𝑓(𝑝𝑟ℎ + 𝑚) mod 𝑞
= 𝑝𝑓𝑟𝑓𝑞 𝑔 + 𝑓𝑚 mod 𝑞
= 𝑝 𝑟 𝑔 + 𝑓𝑚 mod 𝑞.
The above restrictions on the coefficients 𝑟, 𝑔, 𝑓 and 𝑚 ensure that the magnitude of
the largest coefficient of 𝑝 𝑟 𝑔 + 𝑓𝑚 is at most
𝑝 1
𝑝 ⋅ 2𝑑 + (2𝑑 + 1) = (3𝑑 + ) 𝑝.
2 2
Now the assumption 𝑞 > (6𝑑 + 1)𝑝 implies that every coefficient of 𝑝 𝑟 𝑔 + 𝑓𝑚 has
𝑞
magnitude less than . Therefore, the equation 𝑎 = 𝑝 𝑟 𝑔 + 𝑓𝑚 holds over the integers
2
before reducing modulo 𝑞. This gives
𝑓𝑝 𝑎 = 𝑝𝑓𝑝 𝑟 𝑔 + 𝑓𝑝 𝑓𝑚 ≡ 0 + 𝑚 = 𝑚 mod 𝑝.
Note that the plain NTRU cryptosystem is not CPA-secure, since it leaks a small
part of the plaintext (see Exercise 10).
Remark 14.35. There are recommendations for 𝑁, 𝑝, 𝑞 and the distribution of coef-
ficients in the polynomials 𝑓, 𝑔 and 𝑟 can be more general than explained above. It is
estimated that NTRU encryption with 𝑁 = 743, 𝑝 = 3, 𝑞 = 211 = 2048 achieves a very
high level of security [HPS+ 17].
14.4. NTRU 273
Example 14.36. We look at a toy example. Let 𝑁 = 5, 𝑝 = 3, 𝑞 = 29, 𝑑 = 1. For key

generation, we choose
𝑓 = 𝑥4 + 𝑥3 − 1 ∈ 𝒯(2, 1) and 𝑔 = 𝑥3 − 𝑥2 ∈ 𝒯(1, 1).
We use SageMath to check that 𝑓 is invertible modulo 𝑝 and 𝑞 in 𝑅 = ℤ[𝑥]/(𝑥5 − 1).
sage: p=3; q=29
sage: mod=x^5 -1
sage: R= PolynomialRing (ZZ ,x). quotient_ring (mod)
sage: Rp= PolynomialRing ( Integers (p),x). quotient_ring (mod)
sage: Rq= PolynomialRing ( Integers (q),x). quotient_ring (mod)
sage: f = x^4 + x^3 - 1; g = x^3 - x^2
sage: fp =1/ Rp(f);fq =1/ Rq(f); fp;fq
2* xbar ^3 + 2* xbar ^2 + xbar + 2
24* xbar ^4 + 8* xbar ^3 + 3* xbar ^2 + 11* xbar + 13
Hence 𝑓𝑝 = −𝑥3 − 𝑥2 + 𝑥 − 1 and 𝑓𝑞 = −5𝑥4 + 8𝑥3 + 3𝑥2 + 11𝑥 + 13. We obtain

ℎ = 𝑓𝑞 𝑔 = (−5𝑥4 + 8𝑥3 + 3𝑥2 + 11𝑥 + 13)(𝑥3 − 𝑥2 )
= −5𝑥7 + 8𝑥6 + 3𝑥5 + 11𝑥4 + 13𝑥3 + 5𝑥6 − 8𝑥5 − 3𝑥4 − 11𝑥3 − 13𝑥2
= −5𝑥7 + 13𝑥6 − 5𝑥5 + 8𝑥4 + 2𝑥3 − 13𝑥2
= 8𝑥4 + 2𝑥3 + 11𝑥2 + 13𝑥 − 5 mod 29.
The last equation is true since 𝑥5 = 1 in 𝑅. We have generated the public key 𝑝𝑘 =
(𝑁, 𝑝, 𝑞, ℎ), and the private key is 𝑠𝑘 = 𝑓.
Suppose the plaintext is encoded in the polynomial 𝑚 = 𝑥3 + 𝑥. A random poly-
nomial 𝑟 = 𝑥4 − 𝑥 ∈ 𝒯(1, 1) is chosen for encryption and the ciphertext is
𝑐 = 𝑝 𝑟ℎ + 𝑚 = 3(𝑥4 − 𝑥)(8𝑥4 + 2𝑥3 + 11𝑥2 + 13𝑥 − 5) + 𝑥3 + 𝑥
= 8𝑥4 + 21𝑥3 + 25𝑥2 + 20𝑥 + 15 mod 29.
sage: h=fq*Rq(g)
sage: m=x^3+x; r = x^4-x
sage: c= Rq(p)*Rq(h)* Rq(r)+ Rq(m);c
Note that 𝑚(1) = 𝑐(1) mod 29.

In order to decrypt 𝑐, we use the secret key 𝑓 and compute
𝑎 = 𝑓𝑐 mod 𝑞 = (𝑥4 + 𝑥3 − 1)(8𝑥4 + 21𝑥3 + 25𝑥2 + 20𝑥 + 15) mod 29.
SageMath gives the following result:
sage: a=Rq(f)*c;a
Hence 𝑎 = −2𝑥4 + 2𝑥3 + 4𝑥2 − 3𝑥 + 1. We verify that 𝑎 = 𝑝 𝑟𝑔 + 𝑓𝑚 holds in 𝑅, without

reducing modulo 𝑞. Finally, 𝑓𝑝 𝑎 mod 𝑝 recovers the plaintext 𝑚 = 𝑥3 + 𝑥.
sage: a=p*R(r)*R(g)+R(f)*R(m); a
-2* xbar ^4 + 2* xbar ^3 + 4* xbar ^2 - 3* xbar + 1
sage: Rp(fp)* Rp(a)
xbar ^3 + xbar
Now we describe the lattice representation of NTRU. It is easy to see that elements
in 𝑅 = ℤ[𝑥]/(𝑋 𝑁 − 1) correspond to vectors in ℤ𝑁 : a polynomial 𝑓 = 𝑎0 + 𝑎1 𝑥 + ⋯ +
𝑎𝑁−1 𝑥𝑁−1 ∈ 𝑅 is mapped to the vector 𝑓˜ = (𝑎0 , 𝑎1 , … , 𝑎𝑁−1 ) ∈ ℤ𝑁 of coefficients.
This is clearly a group isomorphism, but how does the multiplication in 𝑅 translate to
ℤ𝑁 ? We define the circulant matrix of 𝑓 as
𝑎 𝑎1 … 𝑎𝑁−1
⎛ 0 ⎞
𝑎𝑁−1 𝑎0 … 𝑎𝑁−2
𝐶𝑓 = ⎜ ⎟.
⎜ … ⎟
⎝ 𝑎1 𝑎2 … 𝑎0 ⎠
Let 𝑔 = 𝑏0 + 𝑏1 𝑥 + ⋯ + 𝑏𝑁−1 𝑥𝑁−1 . Then (𝑏0 𝑏1 … 𝑏𝑁−1 ) 𝐶𝑓 gives the coefficients of

the convolution product 𝑓𝑔 = 𝑔𝑓 ∈ 𝑅, i.e.,
˜
(𝑏0 𝑏1 … 𝑏𝑁−1 ) 𝐶𝑓 = 𝑔̃ 𝐶𝑓 = 𝑓𝑔.
Suppose an NTRU cryptosystem as above is given; then
𝑓ℎ = 𝑓𝑓𝑞 𝑔 = 𝑔 mod 𝑞.
˜ ℎ = 𝑔̃ mod 𝑞. Now we define the associated

Hence the coefficient vectors satisfy 𝑓𝐶
NTRU lattice Λ of dimension 2𝑁. The lattice is generated by the rows of the matrix
𝐼 𝐶ℎ
𝐴=(𝑁 ),
0 𝑞𝐼𝑁
where 𝐼𝑁 is the 𝑁 × 𝑁 identity matrix. The matrix 𝐴 is derived from the public key of
the NTRU cryptosystem. Since 𝑔 = 𝑓ℎ + 𝑞𝑢 for some polynomial 𝑢 ∈ 𝑅, we have
˜ 𝑢)𝐴
(𝑓, ˜ 𝑢)̃ (𝐼𝑁 ) , (𝑓,
̃ = ((𝑓, ˜ 𝑢)̃ ( 𝐶ℎ )) = (𝑓,
˜ 𝑔),
̃
0 𝑞𝐼𝑁
and hence (𝑓, ˜ 𝑔)̃ ∈ Λ. Note that 𝑓 forms the private key. Since 𝑓 and 𝑔 have small coef-
˜ 𝑔)̃ is a short vector in Λ. NTRU can therefore be attacked by finding short
ficients, (𝑓,
vectors in the lattice Λ. However, this is assumed to be intractable if the dimension is
large enough.
14.4. NTRU 275
Example 14.37. We continue Example 14.36. The corresponding NTRU lattice Λ of

dimension 10 is generated by the rows of the following matrix:
1 0 0 0 0 −5 13 11 2 8
⎛ ⎞
0 1 0 0 0 8 −5 13 11 2
⎜ ⎟
⎜ 0 0 1 0 0 2 8 −5 13 11 ⎟
⎜ 0 0 0 1 0 11 2 8 −5 13 ⎟
𝐼 𝐶ℎ ⎜ 0 0 0 0 1 13 11 2 8 −5 ⎟
𝐴=(5 )=⎜ .
0 29𝐼5 0 0 0 0 0 29 0 0 0 0 ⎟
⎜ 0 0 0 0 0 0 29 0 0 0 ⎟
⎜ ⎟
⎜ 0 0 0 0 0 0 0 29 0 0 ⎟
⎜ 0 0 0 0 0 0 0 0 29 0 ⎟
⎝ 0 0 0 0 0 0 0 0 0 29 ⎠
The lattice is given by a ‘bad’ basis of long vectors. We want to attack this NTRU cryp-
tosystem by finding a ‘good’ basis and the short secret vector (𝑓, ˜ 𝑔)̃ ∈ Λ. We use
SageMath to compute the LLL-reduced basis of the lattice (see Section 14.2):
1 0 0 −1 −1 0 0 1 −1 0
⎛ ⎞
−1 1 0 0 −1 0 0 0 1 −1
⎜ ⎟
⎜ −1 −1 1 0 0 −1 0 0 0 1 ⎟
⎜ 0 −1 −1 1 0 1 −1 0 0 0 ⎟
⎜ 0 0 −1 −1 1 0 1 −1 0 0 ⎟
⎜ −4 −4 5 0 5 5 5 5 −5 −10 ⎟.
⎜ 0 −5 9 −5 0 10 0 −5 0 −5 ⎟
⎜ ⎟
⎜ 0 −5 5 1 0 10 9 5 5 0 ⎟
⎜ −5 9 −5 0 0 0 −5 0 −5 10 ⎟
⎝ 0 5 −5 −5 6 −5 −9 4 5 5 ⎠
Since the dimension is small, our attack is successful: the negative of the first row is
˜ 𝑔)̃ = (−1, 0, 0, 1, 1, 0, 0, −1, 1, 0).

(𝑓,
Thus we could derive the private key 𝑓 = −1 + 𝑥3 + 𝑥4 as well as 𝑔 = −𝑥2 + 𝑥3 from

the public key (𝑁, 𝑝, 𝑞, ℎ).
Remark 14.38. The security of NTRU relies on the assumption that, given ℎ =
𝑓−1 𝑔 mod 𝑞, it is hard to recover 𝑓 and 𝑔. There are many potential attacks and NTRU
has been investigated for two decades. It is generally assumed that improved NTRU
schemes are secure and can even achieve IND-CCA2 security if the parameter recom-
mendations are observed. Some doubts remain with respect to the cyclotomic structure
of the ring 𝑅 = ℤ[𝑥]/(𝑥𝑁 −1) and there are proposals [BCLvV18] replacing 𝑥𝑁 −1 with
𝑥𝑁 − 𝑥 − 1 and choosing a prime 𝑞 instead of a power of 2 (compare Remark 14.35).
14.5. Learning with Errors

Learning with Errors (LWE) is a mathematical problem with many applications in post-
quantum cryptography. LWE was introduced by O. Regev [Reg09] and is based on a
system of noisy (approximate) linear equations modulo an integer 𝑞. Suppose the row
vectors 𝑎𝑖 ∈ ℤ𝑛𝑞 are chosen uniformly at random, 𝑠 ∈ ℤ𝑛𝑞 is a secret vector and a linear
system of 𝑚 ≥ 𝑛 approximate equations is given:
𝑎1 ⋅ 𝑠 ≈ 𝑏1 mod 𝑞,
𝑎2 ⋅ 𝑠 ≈ 𝑏2 mod 𝑞,
⋮
𝑎𝑚 ⋅ 𝑠 ≈ 𝑏𝑚 mod 𝑞.
The errors 𝑒1 , 𝑒2 , … , 𝑒𝑚 of each equation are small and are drawn from a distribution
𝜒 on ℤ. The distribution is usually taken to be a discrete Gaussian (see Definition
14.39 below). The number of equations is typically chosen to be large enough that the
approximate linear system of equations has a unique solution with high probability.
The LWE search problem is to find (to learn) the unknown vector 𝑠 ∈ ℤ𝑛𝑞 given 𝑎𝑖 and
𝑏𝑖 for 𝑖 = 1, … , 𝑚. If all errors are zero, then the problem can be solved in polynomial
time by Gaussian elimination. However, this does not work for approximate equations
because even a small error vector can change the solution completely.
The distribution 𝜒 of error terms is quite important in LWE. If the error is zero
or too small then LWE is easy. On the other hand, if the error terms are large or even
uniform in ℤ𝑞 , then the LWE problem has more than one solution. It is also important
that the error is randomly chosen, because otherwise LWE is easy. A standard choice
of the error distribution 𝜒 is the Discrete Gaussian.
Definition 14.39. A discrete Gaussian is a probability distribution on ℤ obtained from

a continuous Gaussian. The probability mass function of a discrete Gaussian is propor-
tional to a continuous Gaussian with mean 0 and standard deviation 𝜎 at each integer
value (see Figure 14.3), i.e.,
2
1 − 𝑥2
𝑝(𝑥) = 𝑒 2𝜍 for 𝑥 ∈ ℤ,
𝑐
2 2
where 𝑐 = ∑𝑘∈ℤ 𝑒−𝑘 /(2𝜍 ) ≈ 𝜎√2𝜋. A discrete Gaussian is said to have parameter or
𝑠
width 𝑠, if 𝜎 = , and a discrete Gaussian distribution of width 𝑠 is denoted by 𝐷ℤ,𝑠 .
√2𝜋
2 /𝑠2
The probability mass function of 𝐷ℤ,𝑠 is proportional to 𝑒−𝜋𝑥 . ♢
Definition 14.40. Let 𝑛, 𝑚, 𝑞 ∈ ℕ, 𝑚 ≥ 𝑛, 𝑞 ≥ 2 and assume that the secret vector

𝑠 ∈ ℤ𝑛𝑞 and the 𝑚 × 𝑛 matrix 𝐴 over ℤ𝑞 are chosen uniformly. Let 𝜒 be a probability
distribution on ℤ and 𝑒 ∈ ℤ𝑚 𝑚
𝑞 a vector that is drawn from the distribution 𝜒 .
(1) The Search-LWE problem is to find 𝑠 given 𝐴 and 𝑏 = 𝐴𝑠 + 𝑒.

Figure 14.3. Probability mass function of the discrete Gaussian distribution 𝐷ℤ,𝑠 with
standard deviation 𝜎 = 3 and width 𝑠 = √2𝜋𝜎 ≈ 7.5.
(2) The Decision-LWE problem is to distinguish with a non-negligible advantage be-

tween the cases i) 𝑏 = 𝐴𝑠+𝑒, and ii) 𝑏 ∈ ℤ𝑚
𝑞 is chosen uniformly at random, given
𝐴 and 𝑏.
The LWE problem depends on the parameters 𝑛, 𝑚, 𝑞 and 𝜒. ♢
It is easy to see that solving the Search-LWE problem also solves the decision prob-
lem, and it can be shown that the decision and the search version of LWE are equivalent
if 𝑞 is bounded by a polynomial in 𝑛 (see [Reg09]).
LWE can be viewed as a lattice problem: the matrix 𝐴 defines a 𝑞-ary lattice Λ𝑞 (𝐴𝑇 )
of dimension 𝑚. The search problem is to find the closest lattice vector 𝑣 to a given noisy
vector 𝑣 + 𝑒 where 𝑒 is chosen according to 𝜒. This is also called a Bounded Distance
Decoding (BDD) problem. The decision problem is to distinguish between a uniform
random vector 𝑏 and a noisy lattice vector 𝑣 + 𝑒.
The following theorem ([Reg09]) is one of the key results and explains why the
LWE problem is believed to be hard:
Theorem 14.41. Let 𝑛 ∈ ℕ be the security parameter, let 𝑚, 𝑞 ∈ ℕ be polynomial in
𝑛 and let 𝜒 = 𝐷ℤ,𝑠 be a discrete Gaussian of parameter 𝑠 such that 𝑠 = 𝛼𝑞 > 2√𝑛 and
0 < 𝛼 < 1. Then solving the LWE decision problem is at least as hard as quantumly
̃
solving SIVP𝛾 on arbitrary 𝑛-dimensional lattices, where 𝛾 = 𝑂(𝑛/𝛼). ♢
In other words: if a polynomial-time algorithm achieves a non-negligible advan-

tage in the LWE decision problem (in the average case), then a polynomial-time al-
gorithm on a quantum computer can solve any instance (also the worst case) of the
SIVP𝑛/𝛼 problem. Since the approximation of the shortest independent vector prob-
lem to within polynomial factors is assumed to be a hard problem even for quantum
computers, the LWE problem is probably hard. The worst-case to average-case reduc-
tion of Theorem 14.41 is particularly interesting, because most other cryptographic
constructions are based on average-case hardness.
Now we define a public-key cryptosystem that is based on LWE and has the same
strong security guarantee.
Definition 14.42. Let 𝑛, 𝑚, 𝑞 ∈ ℕ, 𝑚 ≥ 𝑛, 𝑞 ≥ 2 and 𝜒 an error distribution on ℤ. Let
𝑙 be the plaintext length. Then the LWE public-key cryptosystem is defined by:
• The plaintext space ℳ = {0, 1}𝑙 ≅ ℤ𝑙2 .
• The ciphertext space 𝒞 = ℤ𝑛𝑞 × ℤ𝑙𝑞 .
• For key generation, one chooses an 𝑛×𝑙 matrix 𝑆 and an 𝑚×𝑛 matrix 𝐴 uniformly
at random and an 𝑚 × 𝑙 matrix 𝐸 according to 𝜒. All matrices are defined over ℤ𝑞 .
The private key is 𝑠𝑘 = 𝑆 and the public key is 𝑝𝑘 = (𝐴, 𝑃), where 𝑃 = 𝐴𝑆 + 𝐸.
• To encrypt a plaintext 𝑣 ∈ {0, 1}𝑙 , one chooses a vector 𝑎 ∈ {0, 1}𝑚 uniformly at
random. The ciphertext is given by
𝑞
(𝑢, 𝑐) = ℰ𝑝𝑘 (𝑣) = (𝐴𝑇 𝑎, 𝑃 𝑇 𝑎 + ⌊ ⌉𝑣) ∈ ℤ𝑛𝑞 × ℤ𝑙𝑞 .
2
• Decryption of a ciphertext (𝑢, 𝑐) is defined by

𝑞
𝒟𝑠𝑘 (𝑢, 𝑐) = ⌊⌊ ⌉−1 (𝑐 − 𝑆 𝑇 𝑢)⌉ mod 2.
2
In other words, each coordinate of 𝑐 − 𝑆𝑇 𝑢 is tested whether it is closer to 0 or to

𝑞
⌊ ⌉ modulo 𝑞. ♢
2
The decryption may have errors. One has

𝑞
𝒟𝑠𝑘 (ℰ𝑝𝑘 (𝑣)) = 𝒟𝑠𝑘 (𝐴𝑇 𝑎, 𝑃 𝑇 𝑎 + ⌊ ⌉𝑣)
2
𝑞 𝑞
= ⌊⌊ ⌉−1 ((𝐴𝑆 + 𝐸) 𝑎 + ⌊ ⌉𝑣 − 𝑆 𝑇 𝐴𝑇 𝑎)⌉ mod 2
𝑇
2 2
𝑞 −1 𝑇 𝑞
= ⌊⌊ ⌉ (𝐸 𝑎 + ⌊ ⌉𝑣)⌉ mod 2
2 2
𝑞 −1 𝑇
= ⌊⌊ ⌉ 𝐸 𝑎⌉ + 𝑣 mod 2.
2
𝑞 𝑞
We can assume that the coordinates of 𝐸𝑇 𝑎 are integers between − and . Then
2 2
an decryption error occurs if the magnitude of a coordinate of 𝐸𝑇 𝑎 is greater than or
𝑞
equal to . Each coordinate is a sum of at most 𝑚 coefficients of 𝐸𝑇 , and 𝐸 is chosen
4
according to 𝜒. Suppose that 𝜒 is a discrete Gaussian 𝐷ℤ,𝑠 of width 𝑠. Then the coor-
dinates of the vector 𝐸𝑇 𝑎 have magnitude less than √𝑚𝑠 with high probability. One
can show that for all 𝑛 there are choices for 𝑞, 𝑚 and 𝑠 such that the error probability
is very small and the LWE problem is hard (see [MR09] and [Pei14]).
The security of LWE encryption relies on the hardness of the LWE problem (see
[Reg09]):
Theorem 14.43. If the LWE decision problem with parameters 𝑛, 𝑚, 𝑞 and 𝜒 is hard,
then the LWE encryption scheme has indistinguishable encryption under chosen plaintext
attack (IND-CPA secure). ♢
A disadvantage of the above LWE cryptosystem is a substantial encryption blowup

factor, since 𝑙 plaintext bits are transformed into (𝑛 + 𝑙) size(𝑞) ciphertext bits. The
public key (𝐴, 𝑃) has (𝑚𝑛 + 𝑚𝑙) size(𝑞) bits, and the size of the secret key 𝑆 is 𝑛 𝑙 size(𝑞)
bits.
Example 14.44. We illustrate the LWE encryption scheme with a toy example. A more
5
realistic example can be found in Exercise 13. Let 𝑛 = 4, 𝑞 = 23, 𝑚 = 8, 𝛼 = and a
23
𝑠
discrete Gaussian distribution 𝐷ℤ,𝑠 of width 𝑠 = 5 and standard deviation 𝜎 = ≈ 2.
√2𝜋
We choose uniform random matrices 𝑆 (secret) and 𝐴 (public):
9 5 11 13
⎛ ⎞
13 6 6 2
⎜ ⎟
⎜ 6 21 17 18 ⎟ ⎛
5 2 9 1
⎞
⎜ 22 19 20 8 ⎟ 6 8 19 1
𝐴=⎜ 𝑆 = ⎜ ⎟.
2 17 10 21 ⎟ ⎜ 19 18 9 18 ⎟
⎜ ⎟
⎜ 10 8 17 11 ⎟ ⎝ 9 2 14 18 ⎠
⎜ 5 16 12 2 ⎟
⎝ 5 7 11 7 ⎠
The secret matrix 𝐸 is chosen according to 𝐷ℤ,𝑠 . The matrix 𝑃 = 𝐴𝑆 + 𝐸 is public.
0 22 1 21 10 5 21 7
⎛ ⎞ ⎛ ⎞
0 22 22 22 3 1 13 1
⎜ ⎟ ⎜ ⎟
⎜ 22 22 22 0 ⎟ ⎜ 19 15 6 13 ⎟
⎜ 0 0 0 0 ⎟ ⎜ 9 20 0 16 ⎟
𝐸=⎜ 𝑃=⎜ .
0 0 1 2 ⎟ 8 17 13 4 ⎟
⎜ ⎟ ⎜ ⎟
⎜ 1 0 0 1 ⎟ ⎜ 15 21 20 17 ⎟
⎜ 1 22 1 22 ⎟ ⎜ 0 12 3 19 ⎟
⎝ 22 0 0 1 ⎠ ⎝ 16 2 7 15 ⎠
We want to encrypt 𝑣 = (1, 0, 1, 1)𝑇 and choose a random vector 𝑎 ∈ {0, 1}8 :
𝑎 = (1, 1, 0, 1, 0, 0, 0, 1)𝑇 .
23
We have ⌊ 𝑣⌉ = (12, 0, 12, 12)𝑇 and compute the ciphertext
2
23
(𝑢, 𝑐) = ℰ𝑝𝑘 (𝑣) = (𝐴𝑇 𝑎, 𝑃 𝑇 𝑎 + ⌊ ⌉𝑣) = ((3, 14, 2, 7)𝑇 , (4, 5, 7, 5)𝑇 ) mod 23.
2
For decryption, we use the secret matrix 𝑆 and obtain
𝑐 − 𝑆 𝑇 𝑢 = (4, 5, 7, 5)𝑇 − 𝑆 𝑇 (3, 14, 2, 7)𝑇 = (11, 21, 12, 10)𝑇 mod 23.
23
Coefficients close to 0 mod 23 give the bit 0 and coefficients close to ⌊ ⌉ = 12 give the
2
bit 1. Hence we recover the plaintext 𝑣 = (1, 0, 1, 1)𝑇 . ♢
LWE encryption can be attacked by recovering the key 𝑆. Let 𝑠 be a column of 𝑆

and 𝑒, 𝑏 the corresponding columns of 𝐸 and 𝑃, respectively. Then
𝐴𝑠 + 𝑒 = 𝑏
and the LWE problem is to find 𝑠 (or 𝐴𝑠) given 𝐴 and 𝑏. This is a closest vector problem
(CVP) for the lattice Λ𝑞 (𝐴𝑇 ) and the target vector 𝑏. One can use Kannan’s embedding
technique and convert CVP into a shortest vector problem (SVP). Let 𝐻 be an 𝑚 × 𝑚
matrix over ℤ such that the columns form a basis of Λ𝑞 (𝐴𝑇 ). We construct a lattice Λ′
of dimension 𝑚 + 1 that is generated by the columns of the (𝑚 + 1) × (𝑚 + 1) matrix
𝐻 𝑏
𝐵=( ),
0 𝑀
where 𝑀 > 0 is an embedding factor. Now the equation −𝐴𝑠 + 𝑏 = 𝑒 mod 𝑞 implies
𝑒
that a linear combination of the columns of 𝐵 gives the vector 𝑒′ = ( ) ∈ Λ′ . It is
𝑀
likely that 𝑒′ is the shortest vector in the lattice Λ′ generated by 𝐵, if the parameter 𝑀
is properly chosen. Thus, we will probably find 𝑒 by solving the SVP problem for the
lattice Λ′ of dimension 𝑚 + 1. However, if the shortest vector in Λ𝑞 (𝐴𝑇 ) is smaller than
𝑒, then the method does not work.
Example 14.45. We continue Example 14.44 above and try to attack the LWE problem.
Our aim is to first find column 𝑠 of the private matrix 𝑆 given only 𝐴 and 𝑃. First we
compute the HNF of the 23-ary lattice Λ of dimension 8 generated by 𝐴.
sage: AZ=A.lift (). augment (23* identity_matrix (8))
H=AZ. transpose (). hermite_form (). transpose ()
The lattice Λ is generated by the columns of 𝐴 (where the coefficients are lifted to ℤ)
and 23ℤ8 . The first eight (nonzero) columns of the HNF 𝐻 form a basis of Λ. Now we
construct the 9-dimensional lattice Λ′ :
sage: B=H[: ,0:8].augment (P[: ,0]).stack( vector ([0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,2]))
We choose the embedding factor 𝑀 = 2, but the reader may check that 𝑀 = 1 and
𝑀 = 3 also work in this example. Finally, we compute the LLL-reduced basis.
sage: B. transpose (). LLL (). transpose ()
[ 0 1 -2 0 2 -4 -2 4 -1]
[ 0 0 -2 -2 -3 0 0 1 -3]
[-1 0 0 -1 0 -3 -3 -1 1]
[ 0 -2 -1 -1 1 0 -1 2 0]
[ 0 1 -3 -2 0 0 0 1 3]
[ 1 -3 -1 1 0 -2 2 -2 -2]
[ 1 1 0 -4 2 0 2 0 -3]
[-1 -1 0 -1 1 -1 3 1 0]
[ 2 0 2 0 -2 0 0 2 2]
Since the dimension is small, the first column (0, 0, −1, 0, 0, 1, 1, −1, 2)𝑇 is the shortest
𝑒
nonzero vector of Λ′ . We have successfully found the vector ( ), where 𝑒 mod 23 is
2
the secret error vector (the first column of 𝐸; see Example 14.44). ♢
A disadvantage of LWE is that the key size and the number of operations is at least
quadratic in the main security parameter 𝑛. Optimizations of Regev’s LWE encryption
scheme exist, for example the more compact Lindner-Peikert scheme [LP11].
The key size and the efficiency of the system was also the motivation behind the
development of an algebraic variant of LWE called Ring-LWE (R-LWE). We mention R-
LWE only briefly and refer to the literature for a detailed discussion and applications
to encryption and key exchange (see [LPR13] and subsequent works).
LWE is based on the hardness to find a vector 𝑠 given the matrix 𝐴 and a noisy
product 𝐴𝑠 + 𝑒 over ℤ𝑞 . Now, Ring-LWE leverages the cyclotomic ring
𝑅 = ℤ𝑞 [𝑥]/(𝑥𝑛 + 1),
where 𝑛 is a power of 2. The matrix 𝐴 and the secret vectors 𝑠 and 𝑒 are replaced by
elements in 𝑅. Since all elements of 𝑅 can be uniquely represented by polynomials of
degree less than 𝑛, we have 𝑅 ≅ ℤ𝑛𝑞 . Any element 𝑎 ∈ 𝑅 generates a principal ideal
(𝑎) = {𝑎 𝑥 | 𝑥 ∈ 𝑅} ⊂ 𝑅. If 𝑎 ≠ 0, then (𝑎) corresponds to an 𝑛-dimensional 𝑞-ary ideal
lattice.
The R-LWE problem is to find 𝑠 ∈ 𝑅 given 𝑎 ∈ 𝑅 and 𝑏 = 𝑎𝑠 + 𝑒 ∈ 𝑅. The ring
element 𝑒 is ‘small’ and chosen according to an error distribution. Note that 𝑎𝑠 is an
element of the ideal lattice and only the noisy element 𝑏 is given to an adversary. An
encryption scheme can be defined in a similar way to Definition 14.42 above. The main
advantage is that the key length and the number of operations are now linear in 𝑛.
Ring-LWE also has an asymptotic security guarantee: there is a reduction from
a worst-case lattice problem SVP𝛾 to R-LWE, i.e., solving R-LWE is at least as hard as
quantumly solving the SVP𝛾 problem on arbitrary ideal lattices. Note that the reduction
is based on ideal lattices in 𝑅 instead of general 𝑞-ary lattices. Such ideal lattices have
additional structure and it might be possible that efficient attacks will be found that
exploit the algebraic structure and cannot be applied to general lattices. However, such
attacks are not yet known and may not exist.
14.6. Summary
• Lattices are discrete subgroups of ℝ𝑛 . They have a finite basis over ℤ.

• Finding the shortest vector in a lattice (SVP), the shortest independent vectors
(SIVP) and the closest lattice vector to a given vector (CVP) are supposed to be
hard problems.
• The LLL-algorithm leverages the Gram-Schmidt Orthogonalization (GSO) al-
gorithm to reduce the size of a given basis.
• A short, almost orthogonal basis of a lattice can form the private key of a public-
key encryption scheme.
• GGH and NTRU are classical cryptosystems based on lattice problems, and im-
proved versions of NTRU are assumed to be secure.
• The LWE (Learning with Errors) decision problem is to distinguish between a
noisy lattice vector and a uniform random vector. LWE has a security guarantee
that is based on the worst-case hardness of an approximate SIVP problem. The
corresponding LWE encryption scheme is CPA-secure if the LWE problem is
hard.
Exercises
1. Consider Example 14.3. Find a unimodular matrix 𝑈̃ such that 𝐵1 = 𝐵2 𝑈.̃

2. Consider Example 14.10. Find the lattice Λ⟂10 (𝐴)∗ .
1
1 − 2
2
3. Use the lattice defined by the columns of ( √3 ) to show that 𝛾2 ≥ .
√3
0
2
Remark: In fact, equality holds.
4. Consider the lattice in Example 14.3. Use Babai’s rounding method to find the
closest lattice vector to the target vector 𝑣 = (42, 25). Can you also use the basis
𝐵2 ?
5. Let 𝐴 be an 𝑛 × 𝑚 matrix over ℤ𝑞 . Show that Λ⟂𝑞 (𝐴) = 𝑞Λ𝑞 (𝐴)∗ and Λ𝑞 (𝐴) =
𝑞Λ⟂𝑞 (𝐴)∗ .
6. Let Λ be a 100-dimensional lattice of covolume 2104 . Give an upper bound for 𝜆1 (Λ)
and the expected length of the shortest vector if Λ is randomly chosen.
7. Let Λ be a lattice with the following HNF basis:
−13 0
𝑏1 = ( ) , 𝑏2 = ( ) .
31 47
(a) Compute the determinant of Λ and the orthogonality defect of the basis
{𝑏1 , 𝑏2 }.
Exercises 283
(b) Apply the LLL-algorithm: First, run the GSO and the size reduction algo-
rithm.
(c) Check the Lovacz condition. Swap the basis vectors and again apply the size
reduction algorithm.
(d) Check the Lovacz condition again and output the LLL-reduced basis. Com-
pute the orthogonality defect of this basis.
(e) Give the shortest nonzero vector of Λ.
8. A lattice Λ is generated by the columns of the following matrix:
1 0 0
𝐻=( 0 1 0 ).
14 18 63
(a) Encrypt 𝑚 = (−2, −3, 1) with the GGH encryption scheme. Choose the noise
vector 𝑟 = (1, −1, −1).
(b) Try to decrypt the ciphertext 𝑐 using the matrix 𝐻. Why does this attempt fail?
(c) Apply the LLL-algorithm to 𝐻 using SageMath and show that Λ has the fol-
lowing short basis:
−2 −1 4
𝐵 = ( −2 1 −3 ) .
−1 4 2
(d) Decrypt 𝑐 using the private basis 𝐵.

9. Use the NTRU cryptosystem of Example 14.36 to encrypt 𝑚 = −𝑥4 −𝑥2 +1. Choose
the random polynomial 𝑟 = −𝑥3 + 𝑥2 , then decrypt the ciphertext and recover 𝑚.
10. Show that the NTRU cryptosystem is not CPA-secure. How could you fix this prob-
lem? Hint: The construction of 𝑟 implies 𝑟(1) = 0 mod 𝑞 and hence 𝑐(1) =
𝑚(1) mod 𝑞 (compare Example 14.36).
11. Use the key pair of the LWE Example 14.44.
(a) Choose a random vector 𝑎 and encrypt 𝑣 = (1, 0, 0, 1)𝑇 .
(b) Decrypt the ciphertext (𝑢, 𝑐).
(c) Can decryption errors occur with the chosen matrix 𝐸? Given an example of
𝐸 and 𝑎 such that the decryption is not correct.
12. Consider Example 14.45. Use Kannan’s embedding technique to recover the sec-
ond column of the secret error matrix 𝐸. Show that the embedding factors 𝑀 =
1, 2, 3 work. What happens if you set 𝑀 = 4?
13. Construct an example of LWE encryption with the parameters 𝑛 = 𝑙 = 136, 𝑞 =
2003, 𝑚 = 2008, 𝛼 = 0.0065, 𝑠 = 𝛼𝑞 and a discrete Gaussian distribution 𝐷ℤ,𝑠 with
𝑠
𝜎= ≈ 5.19 (see [MR09] Table 3).
√2𝜋
(a) Generate a key pair:

sage: Zq= IntegerModRing (2003)
sage: A= random_matrix (Zq ,2008 ,136)
sage: S= random_matrix (Zq ,136 ,136)
Make sure to write the import statement into one line.

sage: from sage.stats.distributions. discrete_gaussian_integer import
DiscreteGaussianDistributionIntegerSampler
sage: D = DiscreteGaussianDistributionIntegerSampler (sigma =5.19)
sage: numbers =[D() for _in range (2008*136)]

sage: E = Matrix (Zq , 2008 , numbers )
sage: P=A*S+E
(b) We choose a random plaintext 𝑣 of length 𝑙 = 136 and a random vector 𝑎 of

length 𝑚 = 2008.
sage: v=[ ZZ. random_element (0 ,2) for _in range (136)]
sage: v= vector (v)
sage: a=[ ZZ. random_element (0 ,2) for _in range (2008)]
sage: a= vector (a)
𝑞
(c) Compute the ciphertext (𝑢, 𝑐). Note that ⌊ ⌉ = 1002.
2
sage: u=A. transpose ()*a

sage: c=P. transpose ()*a +1002* v
𝑞
(d) We need a map that takes an integer modulo 𝑞 as input, multiplies it by ⌊ ⌉−1 ,
2
rounds to the nearest integer and outputs the result modulo 2.
sage: def transform (x):
y=RR(x.lift ())*1/1002
return ZZ( round (y))
(e) Decrypt the ciphertext:

sage: w= vector (map(transform , c-S. transpose ()*u))
(f) Compare the result with the original plaintext, for example by printing out
𝑣 − 𝑤.
(g) Print out 1002𝑣 and 𝑐 − 𝑆 𝑇 𝑢 and interpret the result.
(h) Explain why or why not any decryption errors occurred.
(i) Give the sizes of the public key, the private key, the plaintext and the cipher-
text.
Chapter 15
Code-based Cryptography
Error correction codes play an important role when data is sent over noisy channels, for
example over wireless links, or stored on potentially unreliable media. Channel cod-
ing deals with random errors and not with manipulations by adversaries, but integrity
protection is a common objective of channel coding and cryptography. However, codes
also aim to ensure error correction, which goes beyond error detection.
A channel encoder takes an information word as input and generates a codeword
that is transmitted over a channel. The ratio between the lengths of the original data
word and the codeword determines the information rate of the code. Decoding is a
potentially complex task, where received words are transformed into codewords and
the original information is (hopefully) recovered. Codes with good error-correction
capabilities, a high information rate and efficient decoding algorithms are available
and widely used in practice.
For cryptographic applications, one can use very long codes with a secret structure.
In this case, decoding should be hard for an adversary without access to the hidden
structure.
In Section 15.1, we give a short introduction to linear codes. Bounds on the pa-
rameters of codes are given in Section 15.2. In Section 15.3, we explain classical Goppa
codes. The McEliece cryptosystem is based on Goppa codes and represents one of
the promising candidates for post-quantum cryptography. We explore the McEliece
scheme and the related Niederreiter cryptosystem in Section 15.4.
There are a number of similarities between lattice-based and code-based cryptog-
raphy. Lattices and codes are linear subspaces of high-dimensional spaces, and for a
given target vector, finding the closest vector in the subspace can be a hard problem.
285
286 15. Code-based Cryptography
Both use a secret structure which allows for an efficient solution to the problem. How-
ever, lattices and codes use a different metric.
Two recommended textbooks on coding theory are [Rot06] and [Bla03]. A short
introduction to error correcting codes and the McEliece cryptosystem is given
in [TW06]. More details on code-based cryptography can be found in [OS09].
15.1. Linear Codes

The basic idea of channel coding is to send (or store) encoded data instead of the origi-
nal information. The data is replaced by a sequence of codewords and sent over a noisy
communication channel or stored on unreliable media (see Figure 15.1). We will only
consider error-correcting codes and focus on block codes, where each information word
is separately encoded into a codeword. We note that other error-correcting codes exist,
for example convolutional and Turbo codes. There are also other types of coding, in
particular source coding.
𝑐 𝑦
𝑥 Channel Encoder Channel Decoder 𝑥′
Channel
Figure 15.1. Channel coding.
Example 15.1. The most elementary example is repetition codes, where an information
symbol (for example one bit) is repeated 𝑛 times, say 𝑛 = 3:
𝑠1 𝑠2 ⋯ ⟶ 𝑠1 𝑠1 𝑠1 𝑠2 𝑠2 𝑠2 … .
Error detection is straightforward: if the message received does not have this pattern,
an error has occurred. The error detection does not always work since a codeword
could accidentally change into another codeword. Although the probability of this
happening is small, it is not impossible (unless 𝑛 is very large, which makes the code
impractical). Note the difference to hash values or message authentication codes which
(almost) always detect changes.
The repetition code also allows for error correction within certain limits: by choos-
ing the symbol with the highest frequency (maximum likelihood) in a received block,
𝑛−1
up to ⌊ ⌋ errors can be corrected. For example, one error can be corrected with 𝑛 = 3,
2
two errors with 𝑛 = 5, etc.
A major disadvantage of a repetition code is the message extension by a factor
of 𝑛. ♢
Below, we assume that messages and codewords are vectors over a finite field of
characteristic 2, i.e., over a binary field 𝐺𝐹(𝑞), where 𝑞 = 2 or 𝑞 = 2𝑚 . Coding theory
can be studied over arbitrary finite fields, but in practice fields of characteristic 2 are
the most important case.
Definition 15.2. A block code of length 𝑛 over 𝐺𝐹(𝑞) is a subset 𝐶 of 𝐺𝐹(𝑞)𝑛 . The
elements of 𝐶 are called codewords. The code size is 𝑀 = |𝐶| and the dimension of 𝐶 is
𝑘 = log𝑞 (|𝐶|). A linear code over 𝐺𝐹(𝑞) is a linear subspace 𝐶 ⊂ 𝐺𝐹(𝑞)𝑛 . The code size
of a linear code is |𝐶| = 𝑞𝑘 and 𝐶 is called a linear [𝑛, 𝑘] code. The information rate of
𝑘
𝐶 is given by 𝑅 = .
𝑛
Example 15.3. The codewords of a repetition code of length 𝑛 over 𝐺𝐹(𝑞) form the
one-dimensional subspace 𝐶 ⊂ 𝐺𝐹(𝑞)𝑛 generated by the vector (1, 1, … , 1). 𝐶 is a linear
1
[𝑛, 1] code over 𝐺𝐹(𝑞) and the information rate is . ♢
𝑛
The error-correction capability of a code is determined by the minimum distance

between the codewords.
Definition 15.4. Let 𝐶 ⊂ 𝐺𝐹(𝑞)𝑛 be a block code over 𝐺𝐹(𝑞).
(1) The Hamming weight 𝑤𝑡(𝑥) of 𝑥 ∈ 𝐺𝐹(𝑞)𝑛 is the number of nonzero coordinates
of 𝑥.
(2) The Hamming distance 𝑑(𝑥, 𝑦) = 𝑤𝑡(𝑥 − 𝑦) between two vectors 𝑥, 𝑦 ∈ 𝐺𝐹(𝑞)𝑛
is the number of coordinates where 𝑥 and 𝑦 differ.
(3) The minimum distance 𝑑 = 𝑑(𝐶) of a code 𝐶 is the minimum of all distances
𝑑(𝑐, 𝑐′ ) for 𝑐, 𝑐′ ∈ 𝐶 and 𝑐 ≠ 𝑐′ . A linear [𝑛, 𝑘] code with minimum distance 𝑑 is
also called an [𝑛, 𝑘, 𝑑] code.
Proposition 15.5. Let 𝐶 be linear code. The minimum distance 𝑑 = 𝑑(𝐶) is equal to the
minimum weight min𝑐∈𝐶 𝑤𝑡(𝑐).
Proof. Suppose 𝑐, 𝑐′ ∈ 𝐶 and 𝑑(𝑐, 𝑐′ ) = 𝑑; then 𝑐 − 𝑐′ ∈ 𝐶 and 𝑤𝑡(𝑐 − 𝑐′ ) = 𝑑.

Conversely, if 𝑐 ∈ 𝐶 and 𝑤𝑡(𝑐) = 𝑤, then 𝑑(𝑐, 0𝑛 ) = 𝑤. Note that the zero vector is a
codeword since 𝐶 is a linear code. □
Example 15.6. Consider the repetition code 𝐶 = {0000, 1111} of length 4 over 𝐺𝐹(2).
The minimum distance is 𝑑(𝐶) = 4 and 𝐶 is a [4, 1, 4] code. ♢
The following properties of the Hamming distance are easy to show:

Proposition 15.7. The Hamming distance 𝑑 is a metric on 𝐺𝐹(𝑞)𝑛 , i.e., for all 𝑥, 𝑦, 𝑧 ∈
𝐺𝐹(𝑞)𝑛 one has:
(1) 𝑑(𝑥, 𝑦) ≥ 0 with equality if and only if 𝑥 = 𝑦.
(2) 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥) (symmetry).
(3) 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧) (triangle inequality). ♢
A decoder maps a received word 𝑦 ∈ 𝐺𝐹(𝑞)𝑛 to a codeword 𝑐 ∈ 𝐶. Maximum-

likelihood decoding means to decode a word 𝑦 to the codeword 𝑐 ∈ 𝐶 that maximizes
the conditional probability
𝑃[ 𝑦 | 𝑐 ].
Nearest-codeword decoding of a received word 𝑦 picks the closest codeword 𝑐 with re-
spect to the Hamming distance. For a binary symmetric channel with error probability
less than one half, maximum-likelihood decoding is equivalent to nearest-codeword
decoding. We refer to textbooks on coding theory for a more detailed discussion and
now use nearest-codeword decoding.
The following Theorem provides bounds for error detection and correction:
𝑑(𝐶)−1
Theorem 15.8. A code can detect up to 𝑑(𝐶) − 1 errors and correct up to ⌊ ⌋ errors.
2
Proof. By definition of the minimum distance, adding less than 𝑑(𝐶) errors cannot
give another codeword and hence up to 𝑑(𝐶) − 1 errors can be detected. Now let 𝑦
𝑑(𝐶)−1
be a received word having at most ⌊ ⌋ errors; then a codeword 𝑐 ∈ 𝐶 exists with
2
𝑑(𝐶)−1
𝑑(𝑦, 𝑐) ≤ ⌊ ⌋. Suppose there is another codeword 𝑐′ ∈ 𝐶 such that 𝑑(𝑦, 𝑐′ ) ≤
2
𝑑(𝐶)−1
⌊ ⌋. Then the triangle inequality implies
2
𝑑(𝐶) − 1
𝑑(𝑐, 𝑐′ ) ≤ 𝑑(𝑐, 𝑦) + 𝑑(𝑦, 𝑐′ ) ≤ 2 ⌊ ⌋ ≤ 𝑑(𝐶) − 1,
2
a contradiction. □
Since linear codes are linear subspaces, they can be represented by the span of a set
of linearly independent vectors. Writing these vectors into the rows of a matrix gives
the generator matrix 𝐺 of a code. If 𝐶 is a linear [𝑛, 𝑘] code, then 𝐺 is a 𝑘 × 𝑛 matrix
over 𝐺𝐹(𝑞). The set of codewords can be computed by 𝑥𝐺, where 𝑥 runs over all vectors
𝑥 ∈ 𝐺𝐹(𝑞)𝑘 .
The generator matrix of a code is not uniquely determined: adding the multiple of
one row to another row or swapping two rows does not change the subspace. Swapping
two columns gives an equivalent code, where only the coordinates are permuted. By
applying elementary row operations (Gauss-Jordan elimination) and column permu-
tations (if necessary), one can find a generator matrix in systematic form:
1 0 |
𝐺 = (𝐼𝑘 | 𝑃) = ( … | 𝑃 ).
|
0 1 |
𝐼𝑘 is the 𝑘 × 𝑘 identity matrix and 𝑃 is a 𝑘 × (𝑛 − 𝑘) matrix. The corresponding code is
called systematic and (by Gauss-Jordan elimination) all codes are equivalent to a sys-
tematic code. Systematic codes have the advantage that codewords contain the origi-
nal data as their first 𝑘 symbols. Otherwise, the information word 𝑥 must be recovered
from a codeword 𝑥𝐺 by solving a linear system of equations.
How can we verify that a received vector 𝑦 is a codeword without comparing the
vector to a list of all codewords? Using Gaussian elimination, one can check whether
𝑦 ∈ 𝐶, i.e., whether 𝑦 is a linear combination of the rows of 𝐺. A more direct way is
using a parity-check matrix.
Definition 15.9. Let 𝐶 be a code with generator matrix 𝐺. Then 𝐻 is called a parity-
check matrix of 𝐶 if
𝑦𝐻 𝑇 = 0 ⟺ 𝑦 ∈ 𝐶.
For a received word 𝑦 ∈ 𝐺𝐹(𝑞)𝑛 , the vector 𝑦𝐻 𝑇 is called the syndrome of 𝑦.
Proposition 15.10. Let 𝐺 = (𝐼𝑘 |𝑃) be the generator matrix of a systematic [𝑛, 𝑘] code.
Then the (𝑛 − 𝑘) × 𝑛 matrix
| 1 0
𝐻 = (−𝑃 𝑇 | 𝐼𝑛−𝑘 ) = ( −𝑃𝑇 | … )
|
| 0 1
is the associated parity check matrix.
Proof. Let
(𝑣, 𝑤) ∈ 𝐺𝐹(𝑞)𝑘 × 𝐺𝐹(𝑞)𝑛−𝑘
be a row vector of length 𝑛; then
𝑇
𝑣𝑇 𝑇
𝑇
(𝑣, 𝑤)𝐻 = (𝐻 ( 𝑇 )) = (−𝑃𝑇 𝑣𝑇 + 𝑤𝑇 ) = −𝑣𝑃 + 𝑤.
𝑤
This is the zero vector if and only if 𝑤 = 𝑣𝑃, which is equivalent to (𝑣, 𝑤) = 𝑣𝐺 and to
(𝑣, 𝑤) being a codeword. □
Example 15.11. The [7, 4] Hamming code over 𝐺𝐹(2) can be defined by the following
generator matrix:
1 0 0 0 1 1 0
⎛ ⎞
0 1 0 0 1 0 1
𝐺=⎜ ⎟.
⎜0 0 1 0 0 1 1⎟
⎝0 0 0 1 1 1 1⎠
4
The information rate is and the parity-check matrix is
7
1 1 0 1 1 0 0
𝐻 = (1 0 1 1 0 1 0) .
0 1 1 1 0 0 1
We want to show that the minimum distance is 𝑑 = 3. Firstly, the minimum distance
cannot be greater than 3, since codewords of weight 3 exist (see Proposition 15.5), for
example 𝑐 = (1, 0, 0, 0, 1, 1, 0).
Now assume that 𝑑(𝑣, 𝑤) = 1 for codewords 𝑣 and 𝑤. Then 𝑤𝑡(𝑣 − 𝑤) = 1 and
(𝑣 − 𝑤)𝐻 𝑇 is a zero column of 𝐻, a contradiction. If 𝑑(𝑣, 𝑤) = 2 then (𝑣 − 𝑤)𝐻 𝑇 , a
sum of two columns of 𝐻, is zero, and hence two columns of 𝐻 are linearly dependent.
However, this is not the case.
The [7, 4, 3] Hamming code can correct one error. For example, suppose that the
vector 𝑦 = (1, 1, 1, 0, 0, 1, 1) is received. The syndrome is 𝑦𝐻 𝑇 = (0, 1, 1) and hence 𝑦 is
not a codeword. However, 𝑐 = (1, 1, 0, 0, 0, 1, 1) is a codeword because 𝑐𝐻 𝑇 = (0, 0, 0).

Furthermore, the Hamming distance of 𝑦 and 𝑐 is only 1. Therefore, the nearest-
codeword decoding of 𝑦 gives 𝑥 = (1, 1, 0, 0). ♢
In the above example, one could guess the nearest codeword. A better method is
syndrome decoding.
Definition 15.12. Let 𝐶 ⊂ 𝐺𝐹(𝑞)𝑛 be a linear code and 𝑦 ∈ 𝐺𝐹(𝑞)𝑛 any vector. The
set 𝑦 + 𝐶 = {𝑦 + 𝑐 | 𝑐 ∈ 𝐶} is called a coset of 𝐶. A vector having minimum Hamming
weight in a coset is called a coset leader. ♢
Note that all vectors in a coset 𝑦 + 𝐶 have the same syndrome 𝑦𝐻 𝑇 , since 𝑐𝐻 𝑇 = 0
for all codewords 𝑐 ∈ 𝐻 and hence
(𝑦 + 𝑐)𝐻 𝑇 = 𝑦𝐻 𝑇 .
Furthermore, all vectors in a given coset can be decoded with the coset leader as their
error vector. In fact, the coset leader represents the least change that transforms a
vector into a codeword.
Proposition 15.13. Let 𝐶 be a linear code and suppose one wants to decode a received
word 𝑦. Let 𝑒 be the coset leader of the coset 𝑦 + 𝐶, i.e., the coset leader with the syndrome
𝑦𝐻 𝑇 . Then the nearest codeword to 𝑦 is 𝑦 − 𝑒 ∈ 𝐶.
Proof. Since 𝑦 and 𝑒 have the same syndrome, the syndrome of 𝑦 −𝑒 is zero and 𝑦 −𝑒 is
a codeword. Furthermore, the coset leader 𝑒 has minimum weight among the vectors
𝑒 with 𝑦 − 𝑒 ∈ 𝐶. □
Example 15.14. We continue Example 15.11. The syndrome is 𝑦𝐻 𝑇 = (0, 1, 1) and
we need to find the coset leader which has that syndrome. In general, one would use a
table that contains the coset leader for each syndrome. In our example, we suspect that
𝑦 only has a single bit error. Hence the coset leader is a unit vector and its syndrome
is a column of the parity-check matrix 𝐻. The syndrome (0, 1, 1) appears in the third
column. The coset leader is 𝑒 = (0, 0, 1, 0, 0, 0, 0, 0) and we decode 𝑦 to the codeword
𝑦 − 𝑒 = (1, 1, 0, 0, 0, 1, 1).
Remark 15.15. It is known that the nearest-codeword problem for random codes is
hard (NP-complete). This suggests that large codes can be used for cryptographic pur-
poses.
15.2. Bounds on Codes

A good code 𝐶 ⊂ 𝐺𝐹(𝑞)𝑛 should have a large number of codewords and a large min-
imum distance, but given 𝑛, both parameters cannot be large at the same time. The
following Theorem provides an upper bound for the number of codewords in terms of
the minimum distance.
Theorem 15.16. (Singleton Bound) Let 𝐶 be a block code of length 𝑛, minimum distance
𝑑 and size 𝑀; then
𝑀 ≤ 𝑞𝑛−𝑑+1 .
In particular, if 𝐶 is a linear [𝑛, 𝑘, 𝑑] code, then
𝑘 ≤ 𝑛 − 𝑑 + 1 ⟺ 𝑑 ≤ 𝑛 − 𝑘 + 1.
Proof. Consider the map 𝑓 ∶ 𝐶 → 𝐺𝐹(𝑞)𝑛−𝑑+1 , which removes the first

𝑑 − 1 coordinates, i.e., 𝑓(𝑥1 , … , 𝑥𝑛 ) = (𝑥𝑑 , … , 𝑥𝑛 ). The map is injective, since two code-
words differ in at least 𝑑 coordinates. This shows the first assertion. For a linear code,
we have 𝑀 = 𝑞𝑘 , and therefore 𝑘 ≤ 𝑛 − 𝑑 + 1. □
Definition 15.17. A linear [𝑛, 𝑘, 𝑑] code that satisfies 𝑑 = 𝑛 − 𝑘 + 1 is called an MDS

code (maximum distance separable). ♢
For example, the parity code is MDS (see Exercise 1), but the [7, 4, 3] Hamming
code (see Example 15.11) is not MDS. Reed-Solomon codes are a well-known class of
MDS codes, for which we refer to textbooks on coding theory.
One can use the generator matrix or the parity-check matrix to check whether a
code is MDS. We refer to [Rot06] for the following fact:
Proposition 15.18. Let 𝐶 be a linear [𝑛, 𝑘] code with generator matrix 𝐺 and parity-
check matrix 𝐻. Then 𝐶 is MDS if and only if one of the following conditions is satisfied:
(1) Every set of 𝑛 − 𝑘 columns of 𝐻 is linearly independent.
(2) Every set of 𝑘 columns of 𝐺 is linearly independent.
(3) If 𝐺 = (𝐼𝑘 | 𝑃) is in systematic form, then 𝑃 and every square submatrix of 𝑃 is
nonsingular.
Example 15.19. (1) We again consider the [7, 4] Hamming code (see Example 15.11).
We see that the first three columns of 𝐻 are linearly dependent. The last four
columns of 𝐺 are also linearly dependent. One can show that no non-trivial lin-
ear code over 𝐺𝐹(2) can be MDS.
(2) We construct a systematic [8, 4] code over 𝐺𝐹(28 ) with generator matrix 𝐺 =
(𝐼4 | 𝑃), where
02 03 01 01
⎛ ⎞
01 02 03 01
𝑃=⎜ ⎟.
⎜01 01 02 03⎟
⎝03 01 01 02⎠
𝑃 defines the MixColumns operation of the AES block cipher (see Section 5.2). All
quadratic submatrices of 𝑃 are nonsingular over 𝐺𝐹(28 ) (see Exercise 6 of Chapter
5), so the code is MDS.
We now show that 𝑘 × 𝑘 matrices 𝑃 associated to [2𝑘, 𝑘, 𝑘 + 1] MDS codes

have good diffusion properties. The branch number, defined by
min(𝑤𝑡(𝑣𝑇 ) + 𝑤𝑡(𝑃𝑣𝑇 )),
𝑣≠0
where 𝑣 is a row vector of length 𝑘, measures the worst-case diffusion of 𝑃. The

weight of unit vectors is 1 and so the maximum branch number is 𝑘 + 1. Let
𝐺 = (𝐼𝑘 |𝑃) be the generator matrix of a [2𝑘, 𝑘, 𝑘 + 1] MDS code and 𝑣 ≠ 0. It
follows that
𝑘 + 1 ≤ 𝑤𝑡(𝑣𝐺) = 𝑤𝑡(𝑣‖𝑣𝑃) = 𝑤𝑡(𝑣) + 𝑤𝑡(𝑣𝑃) = 𝑤𝑡(𝑣𝑇 ) + 𝑤𝑡(𝑃 𝑇 𝑣𝑇 ),
and so the branch number of 𝑃 𝑇 is maximal. By Proposition 15.18 (3), the code
with generator matrix (𝐼𝑘 |𝑃 𝑇 ) is also MDS and hence the branch number of 𝑃 is
maximal, too. In this case, 𝑃 is called optimal. The MixColumns matrix of the
AES cipher is optimal and the branch number is 5. ♢
There is also a lower bound for the size of at least one (not necessarily linear) code
with a given length and minimum distance. First, we count the number of vectors in
a ball of radius 𝑟:
Proposition 15.20. Let 𝑟 ∈ ℕ and 𝑣 ∈ 𝐺𝐹(𝑞)𝑛 . Then the number of vectors 𝑤 ∈ 𝐺𝐹(𝑞)𝑛
such that 𝑑(𝑣, 𝑤) ≤ 𝑟 is
𝑟
𝑛
𝑉𝑞 (𝑛, 𝑟) = ∑ ( )(𝑞 − 1)𝑖 .
𝑖=0
𝑖
Proof. Let 𝑣 ∈ 𝐺𝐹(𝑞)𝑛 and 𝑖 ≤ 𝑛. Then the number of vectors 𝑤 such that 𝑑(𝑣, 𝑤) = 𝑖
is (𝑛)(𝑞 − 1)𝑖 , because there are (𝑛) possible index sets where exactly 𝑖 coordinates of
𝑖 𝑖
𝑤 differ from 𝑣, and 𝑞 − 1 possible values for each of these coordinates. Adding these
numbers for 0 ≤ 𝑖 ≤ 𝑟 gives 𝑉𝑞 (𝑛, 𝑟), the number of vectors 𝑤 ∈ 𝐺𝐹(𝑞)𝑛 with 𝑑(𝑣, 𝑤) ≤
𝑟. This is the same as the number of vectors in a ball of radius 𝑟 around the center 𝑣. □
Definition 15.21. Let 𝑑, 𝑛 ∈ ℕ and 𝑑 ≤ 𝑛. Then we define 𝐴𝑞 (𝑛, 𝑑) to be the largest
integer 𝑀 such that a code 𝐶 over 𝐺𝐹(𝑞) of size 𝑀, length 𝑛 and minimum distance
≥ 𝑑 exists. ♢
The following Theorem gives a lower bound for 𝐴𝑞 (𝑛, 𝑑):

Theorem 15.22. (Sphere-covering bound)
𝑞𝑛
𝐴𝑞 (𝑛, 𝑑) ≥ .
𝑉𝑞 (𝑛, 𝑑 − 1)
Proof. Let 𝐶 be a code of length 𝑛, minimum distance of at least 𝑑 and |𝐶| = 𝐴𝑞 (𝑛, 𝑑).
We can assume that, for each vector 𝑣 ∈ 𝐺𝐹(𝑞)𝑛 , there is at least one codeword 𝑐 such
that 𝑑(𝑣, 𝑐) < 𝑑. Otherwise, we could add 𝑣 as a codeword to the code while preserving
the length 𝑛 and the minimum distance 𝑑. Hence the union of balls of radius 𝑑 − 1
having their center at some codeword covers the whole of 𝐺𝐹(𝑞)𝑛 . The number of
vectors in that union is at most 𝐴𝑞 (𝑛, 𝑑)⋅𝑉𝑞 (𝑛, 𝑑−1), which implies the sphere-covering
bound. □
Note that the above argument does not show equality, since vectors can be con-
tained in several balls.
The following theorem gives a bound for the existence of a linear code of dimension
𝑘 and minimum distance 𝑑.
Theorem 15.23. (Gilbert-Varshamov bound) Let 𝑛 ≥ 2, 𝑘 ≤ 𝑛 and 𝑑 ≥ 2 be integers
such that
𝑉𝑞 (𝑛 − 1, 𝑑 − 2) < 𝑞𝑛−𝑘 .
Then there exists a linear [𝑛, 𝑘] code over 𝐺𝐹(𝑞) with minimum distance ≥ 𝑑.
Proof. The assumption 𝑞𝑛−𝑘 > 𝑉𝑞 (𝑛 − 1, 𝑑 − 2) ensures that we can find 𝑛 vectors in
𝐺𝐹(𝑞)𝑛−𝑘 such that any 𝑑 − 1 of them are linearly independent (see [Rot06] for more
details). We write these vectors into the columns of a (𝑛 − 𝑘) × 𝑛 parity-check matrix
𝐻. The dimension of the associated code is at least 𝑛 − (𝑛 − 𝑘) = 𝑘. Furthermore,
the distance between two different codewords cannot be less than 𝑑, since otherwise
a codeword 𝑐 ∈ 𝐶 exists with 𝑤𝑡(𝑐) ≤ 𝑑 − 1, and so 𝑑 − 1 columns of 𝐻 are linearly
dependent, a contradiction. □
An upper bound for 𝐴𝑞 (𝑛, 𝑑) is given by the Hamming bound:

Proposition 15.24. (Hamming bound or sphere-packing bound) For 𝑑 = 2𝑒 + 1, we
have
𝑞𝑛
𝐴𝑞 (𝑛, 𝑑) ≤ .
𝑉𝑞 (𝑛, 𝑒)
Proof. Suppose a code of length 𝑛, minimal distance 𝑑 = 2𝑒 + 1 and size 𝐴𝑞 (𝑛, 𝑑) is

given. Let 𝐵𝑒 (𝑣) be the set of vectors 𝑤 with 𝑑(𝑣, 𝑤) ≤ 𝑒, i.e., a ball of radius 𝑒 around
𝑣. The cardinality of 𝐵𝑒 (𝑐) is 𝑉𝑞 (𝑛, 𝑒). The union of the balls 𝐵𝑒 (𝑐) over all codewords 𝑐
is disjoint and has 𝐴𝑞 (𝑛, 𝑑) ⋅ 𝑉𝑞 (𝑛, 𝑒) elements. Therefore, we obtain
𝐴𝑞 (𝑛, 𝑑) ⋅ 𝑉𝑞 (𝑛, 𝑒) ≤ 𝑞𝑛 .
This yields the assertion. □
A code is called perfect if it attains the Hamming bound.

Example 15.25. Let 𝑞 = 2, 𝑛 = 7 and 𝑑 = 3. Then the sphere-covering bound is
27 27 128
= 7 7 7 = ≈ 4.4,
𝑉2 (7, 2) ( )+( )+( ) 29
0 1 2
𝑘
Figure 15.2. Asymptotic bounds on the information rate 𝑅 = against the relative
𝑛
𝑑
minimum distance 𝛿 = for codes over 𝐺𝐹(2). Singleton and Hamming are upper
𝑛
bounds and Gilbert-Varshamov is a lower bound.
and hence there is a code with at least 5 codewords and the above parameters. Next,
we consider the Gilbert-Varshamov bound:
6 6
𝑉2 (6, 1) = ( ) + ( ) = 7 < 23 = 27−4 .
0 1
This implies that a linear [7, 4] code with minimum distance ≥ 3 exists. Since the
[7, 4, 3] Hamming code has 16 codewords, the code attains the Gilbert-Varshamov
bound. The Hamming bound is
27 128
= 7 = 16.
𝑉2 (7, 1) ( ) + (7)
0 1
Therefore, the maximum size of a code of length 7 over 𝐺𝐹(2) with a minimum distance
of at least 3 is 16. We conclude that the Hamming [7, 4, 3] code is perfect. ♢
We return the problem at the beginning of this section on good codes of length 𝑛.
𝑑 𝑘
The relative minimum distance 𝛿 = and the information rate 𝑅 = cannot be close
𝑛 𝑛
to 1 at the same time. The bounds for 𝑛 → ∞ are shown in Figure 15.2.
15.3. Goppa Codes

Since our main objective is to explain the McEliece cryptosystem, we continue with
Goppa codes. Textbooks on coding theory would deal with (generalized) Reed-Solo-
mon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes first, but this is beyond the
scope of this book.
General Goppa codes are geometric codes which are constructed using curves over
finite fields. For our purposes, we only consider classical irreducible binary Goppa
codes. They are defined by a polynomial in one variable over 𝐺𝐹(2𝑚 ).
We fix an integer 𝑚 ≥ 3 for a code 𝐶 over 𝐺𝐹(2𝑚 ). Furthermore, we fix an integer
2𝑚 −1
𝑡 such that 2 ≤ 𝑡 ≤ . The code is designed to have a minimum distance of 𝑑(𝐶) =
𝑚
2𝑡 + 1 and to fix up to 𝑡 errors. It is common to set the length to 𝑛 = 2𝑚 , but 𝑚𝑡 + 1 ≤
𝑛 ≤ 2𝑚 gives greater flexibility (see [BLP08]). A classical Goppa code Γ over 𝐺𝐹(2) is
a subfield code of 𝐶, i.e., Γ = 𝐶 ∩ 𝐺𝐹(2)𝑛 .
Furthermore, let 𝑎1 , … , 𝑎𝑛 be distinct code locator elements of 𝐺𝐹(2𝑚 ). If 𝑛 = 2𝑚 ,
then all elements of the field are code locators. Finally, we fix a random irreducible
polynomial 𝑔 ∈ 𝐺𝐹(2𝑚 )[𝑥] of degree 𝑡.
Example 15.26. Typical choices for cryptographic applications are 10 ≤ 𝑚 ≤ 13 and
50 ≤ 𝑡 ≤ 150, for example 𝑚 = 10, 𝑛 = 2𝑚 = 1024 and 𝑡 = 50. ♢
Below, we consider residue classes in 𝐺𝐹(2𝑚 )[𝑥]/(𝑔(𝑥)). Since we assumed that 𝑔

is irreducible, the latter ring is a field. Hence polynomials 𝑓 ∈ 𝐺𝐹(2𝑚 )[𝑥] are invertible
modulo 𝑔 if 𝑓 is not zero or a multiple of 𝑔. In particular, the residue class
1
mod 𝑔
𝑥 − 𝑎𝑖
exists and can be represented by a polynomial in 𝐺𝐹(2𝑚 )[𝑥] of degree less than 𝑡.
Definition 15.27. Using the above parameters, define a linear [𝑛, 𝑡] code 𝐶 over
𝐺𝐹(2𝑚 ) by
𝑛
1
𝐶 = {(𝑐1 , … , 𝑐𝑛 ) ∈ 𝐺𝐹(2𝑚 )𝑛 | ∑ 𝑐𝑖 ≡ 0 mod 𝑔} .
𝑖=1
𝑥 − 𝑎𝑖
The definition of 𝐶 depends on the order of the code locator elements, but re-ordering
gives an equivalent code. ♢
𝑛
Multiplying the above equation by ℎ(𝑥) = ∏𝑖=1 (𝑥 − 𝑎𝑖 ) eliminates the denomina-
tors 𝑥 − 𝑎𝑖 . Note that ℎ(𝑥) is invertible modulo 𝑔. For 𝑛 = 2𝑚 , one has ℎ(𝑥) = 𝑥𝑛 − 𝑥,
since the roots of 𝑥𝑛 − 𝑥 are precisely the elements of the field 𝐺𝐹(2𝑚 ) (see Proposition
4.73).
The code 𝐶 is the kernel of the syndrome map
𝑆𝑦𝑛 ∶ 𝐺𝐹(2𝑚 )𝑛 → 𝐺𝐹(2𝑚 )[𝑥]/(𝑔(𝑥))
ℎ
that sends (𝑐1 , … , 𝑐𝑛 ) to ∑𝑖 𝑐𝑖 mod 𝑔. We may represent the syndrome by a poly-
𝑥−𝑎𝑖
nomial of degree less than 𝑡. This yields 𝑡 parity-check equations in 𝑛 variables and
𝐶 is therefore a [𝑛, 𝑛 − 𝑡] code over 𝐺𝐹(2𝑚 ), assuming that the equations are linearly
independent (see Remark 15.28 below).
Remark 15.28. One can show ([Rot06] Section 5.1 and Problem 5.11) that 𝐶 is a Gen-
eralized Reed-Solomon (GRS) code with the following parity-check matrix over 𝐺𝐹(2𝑚 ):
1 1 1
⎛ 𝑔(𝑎1 ) … 0 ⎞ …
1 … 1 ⎛ 𝑔(𝑎1 ) 𝑔(𝑎𝑛 ) ⎞
⎛ ⎞⎜ … ⎟ ⎜ 𝑎1 …
𝑎𝑛
⎟
𝑎1 … 𝑎𝑛 ⎜ ⎟ = ⎜ 𝑔(𝑎1 )
𝐻=⎜ ⎟ … 𝑔(𝑎𝑛 ) ⎟ .
⎜ … ⎟⎜ ⎟ ⎜ … ⎟
𝑡−1 … ⎜ 𝑎𝑡−1 ⎟
⎝𝑎1 … 𝑎𝑡−1
𝑛 ⎠⎜ 1 ⎟ 1
…
𝑎𝑡−1
𝑛
⎝ 0 …
𝑔(𝑎𝑛 ) ⎠
⎝ 𝑔(𝑎1 ) 𝑔(𝑎𝑛 ) ⎠
The first 𝑡 columns of the first matrix have a Vandermonde form and are thus nonsin-
gular. The second matrix is a nonsingular diagonal matrix. This shows that the rows
of 𝐻 are linearly independent, so that 𝐶 is an [𝑛, 𝑛 − 𝑡] code. Using Proposition 15.18,
one can also show that 𝐶 is MDS, i.e., an [𝑛, 𝑛 − 𝑡, 𝑡 + 1] code. ♢
In principle, GRS codes can be used for encryption. However, several proposals to
use GRS codes turned out to be insecure while Goppa codes are still unbroken.
Definition 15.29. Let 𝐶 be a code over 𝐺𝐹(2𝑚 ) as in Definition 15.27. We define the
corresponding classical irreducible binary Goppa code Γ over 𝐺𝐹(2) to be the subfield
code of 𝐶, i.e.,
𝑛
1
Γ = {(𝑐1 , … , 𝑐𝑛 ) ∈ 𝐺𝐹(2)𝑛 | ∑ 𝑐𝑖 ≡ 0 mod 𝑔} . ♢
𝑖=1
𝑥 − 𝑎𝑖
𝑛
Since ℎ(𝑥) = ∏𝑖=1 (𝑥 − 𝑎𝑖 ) is invertible modulo 𝑔, one has 𝑐 ∈ Γ if and only if
𝑛
ℎ
∑ 𝑐𝑖 mod 𝑔 ≡ 0.
𝑖=1
𝑥 − 𝑎𝑖
The parity-check matrix of Γ is essentially the matrix 𝐻 in Remark 15.28, but now each
element is viewed as a column of 𝑚 elements of 𝐺𝐹(2).
Proposition 15.30. Let Γ be a classical irreducible binary Goppa code with the above
parameters. Then Γ is a [𝑛, ≥ 𝑛 − 𝑚𝑡, ≥ 2𝑡 + 1] code.
Proof. 𝐶 is defined by 𝑡 parity-check equations over 𝐺𝐹(2𝑚 ), and this corresponds to

𝑚𝑡 equations over 𝐺𝐹(2). Since the equations may not be linearly independent, the
dimension of Γ is ≥ 𝑛 − 𝑚𝑡.
We follow [Ber11] and prove that the distance of codewords is at least 2𝑡 + 1. It is
sufficient to show that the weight of all codewords is ≥ 2𝑡 + 1. Let 𝑐 ∈ Γ be a nonzero
codeword. We define the following polynomials over 𝐺𝐹(2𝑚 ):

𝑓
𝑓 = ∏ (𝑥 − 𝑎𝑖 ) and 𝑓𝑖 = for 𝑐𝑖 = 1.
𝑖∶ 𝑐𝑖 =1
𝑥 − 𝑎𝑖
Note that the degree of 𝑓 is equal to the weight of 𝑐. Let 𝐷(𝑓) be the formal derivative
of 𝑓 (see Definition 4.55). One has
𝐷(𝑓) = 𝐷((𝑥 − 𝑎𝑖 )𝑓𝑖 ) = 𝑓𝑖 + (𝑥 − 𝑎𝑖 )𝐷(𝑓𝑖 )
for 𝑖 ∈ {1, … , 𝑛} with 𝑐𝑖 = 1. A recursive application gives
𝐷(𝑓) = ∑ 𝑓𝑖 .
𝑖∶ 𝑐𝑖 =1
ℎ
Multiplying this equation with the polynomial yields
𝑓
𝑛
ℎ 𝐷(𝑓) ℎ ℎ
= ∑ = ∑ 𝑐𝑖 .
𝑓 𝑖∶ 𝑐 =1
𝑥 − 𝑎 𝑖 𝑖=1
𝑥 − 𝑎𝑖
𝑖
𝑛 ℎ ℎ 𝐷(𝑓)
By assumption, we have 𝑐 ∈ Γ, and hence ∑𝑖=1 𝑐𝑖 ≡ 0 mod 𝑔. Therefore,
𝑥−𝑎𝑖 𝑓
ℎ
is a multiple of 𝑔. Since 𝑔(𝑎𝑖 ) ≠ 0 for all 𝑖 = 1, … , 𝑛, the polynomials 𝑔 and are rela-
𝑓
𝑚
tively prime, which implies 𝑔 ∣ 𝐷(𝑓). Now, polynomials over 𝐺𝐹(2 ) have a surprising
property (see Exercise 7):
• All elements in 𝐺𝐹(2𝑚 ) are squares and
• All polynomials over 𝐺𝐹(2𝑚 ) can be written as 𝛼2 + 𝑥𝛽 2 with 𝛼, 𝛽 ∈ 𝐺𝐹(2𝑚 )[𝑥].
Write 𝑓 = 𝛼2 + 𝑥𝛽 2 with 𝛼, 𝛽 ∈ 𝐺𝐹(2𝑚 )[𝑥]. By construction, 𝑓 has only simple roots
and is not a square. Hence 𝛽 ≠ 0 and
𝐷(𝑓) = 2𝛼𝐷(𝛼) + 𝛽 2 + 2𝑥𝛽𝐷(𝛽) = 𝛽 2 .
Since 𝑔 is irreducible and 𝑔 ∣ 𝐷(𝑓) (see above), we obtain 𝑔 ∣ 𝛽. We conclude that the
degree of 𝛽 is at least 𝑡 and the degree of 𝑓 = 𝛼2 + 𝑥𝛽 2 is at least 2𝑡 + 1. This proves
𝑤𝑡(𝑐) = deg(𝑓) ≥ 2𝑡 + 1. □
Example 15.31. Let 𝑚 = 4, 𝑡 = 2 and 𝑛 = 16; then 𝑛 − 𝑚𝑡 = 8 and 𝑑 = 2𝑡 + 1 = 5.
We want to construct a [16, 8, 5] Goppa code over 𝐺𝐹(2) and use SageMath for the
computations. The field 𝐺𝐹(16) is given by
𝐺𝐹(2)[𝑧]/(𝑧4 + 𝑧 + 1)
and its elements 𝑎1 , … , 𝑎16 ∈ 𝐺𝐹(16) are represented by binary polynomials in the
variable 𝑧 of degree < 4. We choose an irreducible polynomial 𝑔 ∈ 𝐺𝐹(16)[𝑥] of de-
gree 2:
𝑔(𝑥) = 𝑥2 + 𝑧2 𝑥 + 𝑧.
sage: K.<z>=GF (2^4 , name='z', modulus =x^4+x+1)

sage: PR.<x>=K[]
sage: g=x^2+z^2*x+z; g. is_irreducible ()
True
The factor ring 𝐺𝐹(16)[𝑥]/(𝑔(𝑥)) is a field with 162 = 256 elements.

sage: Rmodg=PR. quotient_ring (g)
The elements
1
mod 𝑔, 𝑎𝑖 ∈ 𝐺𝐹(16)
𝑥 − 𝑎𝑖
can be represented by polynomials in 𝐺𝐹(16)[𝑥] of degree ≤ 1.
sage: arr =[]
sage: for a in K.list ():
arr. append (1/ Rmodg (x-a))
1
The array arr contains all elements mod 𝑔 with 𝑎 ∈ 𝐺𝐹(16). Their coefficients
𝑥−𝑎
with respect to the standard basis {1, 𝑥} define a 2 × 16 parity check matrix 𝐻16 of the
code 𝐶 over 𝐺𝐹(16):
sage: H16= matrix (K ,2 ,16)
for j in range (0 ,16):
H16[i,j]= list(arr[j])[i]
In our example, one obtains

𝑧 𝑧3 + 𝑧 0 𝑧3 … … 𝑧3 + 𝑧2 + 1
𝐻16 = ( 3 3 3 2 ).
𝑧 +1 𝑧+1 𝑧 +1 𝑧 +𝑧 +𝑧+1 … … 𝑧2 + 𝑧
𝐻16 defines a [16, 14] code over 𝐺𝐹(16). The parity-check matrix of the subfield code
Γ over 𝐺𝐹(2), denoted by 𝐻, is obtained by replacing the coefficients in 𝐺𝐹(16) by a col-
umn vector of length 4 over 𝐺𝐹(2), using the standard basis
{𝑧3 , 𝑧2 , 𝑧, 1} of 𝐺𝐹(16) over 𝐺𝐹(2).
For example, the lower left entry 𝑧3 +1 is replaced by the column vector (1, 0, 0, 1)𝑇 .
sage: H= matrix (GF (2) ,8 ,16)
for j in range (0 ,16):
H16[i,j]= list(arr[j])[i]
hbin=bin(eval(H16[i,j]. _int_repr ()))[2:]
hbin='0'*(4 - len(hbin ))+ hbin; hbin=list(hbin );
H[4*i:4*(i+1),j] = vector (map(GF(2), hbin ));
0 1 0 1 1 0 0 1 0 1 1 1 1 0 0 1
⎛ ⎞
0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 1
⎜ ⎟
⎜ 1 1 0 0 0 1 0 1 1 1 1 1 1 1 0 0 ⎟
⎜ 0 0 0 0 0 0 1 1 0 0 1 1 1 0 1 1 ⎟
𝐻=⎜ ⎟.
1 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0
⎜ ⎟
⎜ 0 0 0 1 1 0 1 0 1 1 1 1 0 1 1 1 ⎟
⎜ 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 1 ⎟
⎝ 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 ⎠
𝐻 is the parity-check matrix of a Goppa code Γ. By solving the linear system of equa-
tions 𝑣𝐻 𝑇 = 0 and performing elementary row operations, we get the generator matrix
𝐺 of Γ in systematic form.
sage: G=H. right_kernel (). basis_matrix ()
1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1
⎛ ⎞
0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0
⎜ ⎟
⎜ 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 ⎟
⎜ 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0 0 ⎟
𝐺=⎜ ⎟.
0 0 0 0 1 0 0 0 0 1 0 1 1 1 0 0
⎜ ⎟
⎜ 0 0 0 0 0 1 0 0 0 1 0 1 1 0 1 1 ⎟
⎜ 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 1 ⎟
⎝ 0 0 0 0 0 0 0 1 0 1 1 1 0 0 1 0 ⎠
Proposition 15.30 shows that Γ is a [16, ≥ 8, ≥ 5] code. Since the rank of 𝐺 is 8 and
codewords of weight 5 exist, Γ is a [16, 8, 5] code and can decode 16-bit words having at
most two errors. There are 28 = 256 different syndromes and their coset leader gives
the error vector. ♢
Syndrome decoding of general [𝑛, 𝑘] codes is inefficient if 𝑛 is large. Although

there are more effective approaches (information-set decoding, see below), decoding
is a hard problem for general codes. Now, classical binary irreducible Goppa codes
have the useful property that they possess an efficient decoding algorithm, even for
large dimensions, because of their special structure. Below, we describe the Patterson
algorithm [Pat75]. In the next Section 15.4, we explain how Goppa codes are leveraged
to construct a public-key cryptosystem.
Algorithm 15.1 Decoding Goppa Codes using Patterson’s Algorithm

Input: Code locator elements 𝑎1 , … , 𝑎𝑛 , Goppa polynomial 𝑔 of degree 𝑡, erroneous
word 𝑤 = (𝑤1 , … , 𝑤𝑛 ) with up to 𝑡 errors.
Output: Codeword 𝑐
−1
1: Compute 𝑆𝑦𝑛(𝑤) mod 𝑔 and 𝑇 = (𝑆𝑦𝑛(𝑤) mod 𝑔) .
2: if 𝑇 ≡ 𝑥 mod 𝑔 then
3: 𝜎=𝑥
4: else
5: Compute 𝑅 = √𝑇 + 𝑥.
6: Run Extended Euclidean Algorithm on input 𝑔 and 𝑅.
𝑡 𝑡−1
7: Get (𝛼, 𝛽) such that 𝛼 = 𝛽𝑅 + 𝑦𝑔 and deg(𝛼) ≤ , deg(𝛽) ≤ .
2 2
8: 𝜎 = 𝛼2 + 𝑥𝛽 2
9: end if
10: Compute roots of 𝜎. Error vector 𝑒 has a 1 in coordinate 𝑖 if 𝑎𝑖 is a root of 𝜎.
11: 𝑐=𝑤+𝑒
12: return 𝑐
Suppose that all of the parameters of a Goppa code are known and a word 𝑤 =
(𝑤1 , … , 𝑤𝑛 ) ∈ 𝐺𝐹(2)𝑛 with at most 𝑡 errors is received. We want to decode 𝑤 and
proceed similarly as in the proof of Proposition 15.30.
𝑓
Let 𝑓 = ∏𝑖∶𝑤 =1 (𝑥 − 𝑎𝑖 ) and 𝑓𝑖 = . Then 𝐷(𝑓) = ∑𝑖∶ 𝑤 =1 𝑓𝑖 and
𝑖 𝑥−𝑖 𝑖
1 𝐷(𝑓)
𝑆𝑦𝑛(𝑤) = ∑ = mod 𝑔.
𝑖∶ 𝑤 =1
𝑥 − 𝑎𝑖 𝑓
𝑖
If 𝑆𝑦𝑛(𝑤) ≡ 0 mod 𝑔, then 𝑤 is already a codeword. Now assume that 𝑆𝑦𝑛(𝑤) ≢

0 mod 𝑔. We want to find the error locator polynomial 𝜎 = ∏𝑖∶𝑒 =1 (𝑥 − 𝑎𝑖 ) of degree
𝑖
≤ 𝑡 with the same syndrome as 𝑓:
𝐷(𝑓) 𝐷(𝜎)
𝑆𝑦𝑛(𝑤) = = mod 𝑔.
𝑓 𝜎
The roots of 𝜎 correspond to the nonzero entries of the error vector 𝑒. Set 𝜎 = 𝛼2 +
𝑥𝛽 2 with 𝛼, 𝛽 ∈ 𝐺𝐹(2𝑚 )[𝑥]. The polynomials 𝛼 and 𝛽 are uniquely determined and
𝑡 𝑡−1
deg(𝛼) ≤ and deg(𝛽) ≤ . We now compute the polynomials 𝛼 and 𝛽 from the
2 2
given syndrome 𝑆𝑦𝑛(𝑤). One has 𝐷(𝜎) = 𝛽 2 and hence
𝑆𝑦𝑛(𝑤)(𝛼2 + 𝑥𝛽 2 ) = 𝛽 2 mod 𝑔.
Letting 𝑇 = 𝑆𝑦𝑛(𝑤)−1 mod 𝑔 we obtain
(𝛼2 + 𝑥𝛽 2 ) = 𝑇𝛽 2 mod 𝑔 ⟺ 𝛼2 = 𝛽 2 (𝑇 − 𝑥) mod 𝑔.
If 𝑇 ≡ 𝑥 mod 𝑔, then set 𝛼 = 0, 𝛽 = 1 and
𝜎 = 𝛼2 + 𝑥𝛽 2 = 𝑥.
In this case 0 is the only root and the 𝑖-th bit is erroneous, where 𝑎𝑖 = 0 ∈ 𝐺𝐹(2𝑚 ).
Otherwise, compute 𝑅 = √𝑇 − 𝑥 mod 𝑔. We note that a unique square root ex-

ists, since 𝐺𝐹(2𝑚 )[𝑥]/(𝑔) is a finite field of characteristic 2 (see Exercise 6). Huber’s
observation ([Hub96], see Exercise 8) provides an efficient algorithm to compute the
square root in a binary field. This gives
𝛼 = 𝛽𝑅 mod 𝑔.
We lift the residue class 𝑅 to a polynomial over 𝐺𝐹(2𝑚 ) of degree < 𝑡. The requested
solution (𝛼, 𝛽) can be found by performing several iterations of the Extended Euclidean
𝑡 𝑡−1
Algorithm in 𝐺𝐹(2𝑚 )[𝑥] on inputs 𝑔 and 𝑅 until deg(𝛼) ≤ and deg(𝛽) ≤ . The
2 2
algorithm outputs polynomials 𝛼, 𝛽 and 𝑦 in 𝐺𝐹(2𝑚 )[𝑥] such that
𝛼 = 𝛽𝑅 + 𝑦𝑔.
It is necessary to stop the Euclidean Algorithm midway as soon as the degree of a re-
𝑡
mainder is ≤ . Further iterations decrease the degree of 𝛼 (until deg(𝛼) = 0), but
2
𝑡−1
increase the degree of 𝛽 above the limit. Then the error polynomial is
2
𝜎 = 𝛼2 + 𝑥𝛽 2 .
Finally, the roots of 𝜎 correspond to the error positions of 𝑤. A brute-force search in

𝐺𝐹(2𝑚 ) can be performed, but there are also other methods. The Patterson algorithm
is quite efficient and the root extraction turns out to be the most expansive part in the
decoding.
Example 15.32. Consider the Goppa code from Example 15.31. We encode
(1, 1, 0, 1, 0, 0, 1, 0) and obtain the codeword
𝑐 = (1, 1, 0, 1, 0, 0, 1, 0) 𝐺 = (1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0).
Adding two errors at the third and seventh positions yields the word
𝑤 = (1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0).
We apply Patterson’s algorithm to decode 𝑤. First, we compute the syndrome (which

depends on the ordering of 𝑎1 , … , 𝑎16 ∈ 𝐺𝐹(16)):
1
𝑆𝑦𝑛(𝑤) = ∑ mod 𝑔.
𝑖∶ 𝑤 =1
𝑥 − 𝑎𝑖
𝑖
1
We re-use our array arr of elements mod 𝑔 (see Example 15.31).
𝑥−𝑎𝑖
sage: c= vector (GF (2) ,[1 ,1 ,0 ,1 ,0 ,0 ,1 ,0])*G

sage: e= vector (GF (2) ,[0 ,0 ,1 ,0 ,0 ,0 ,1 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0])
sage: w=c+e
sage: syn=w* vector (arr ); syn
(z^2 + z)* xbar + 1
Note that SageMath writes xbar for the residue class 𝑥 mod 𝑔. We obtain
𝑆𝑦𝑛(𝑤) = (𝑧2 + 𝑧)𝑥 + 1 mod 𝑔
and let
1
𝑇= ≡ 𝑧3 𝑥 + (𝑧3 + 𝑧 + 1) mod 𝑔.
𝑆𝑦𝑛(𝑤)
Next, we have to compute the square root of 𝑇 + 𝑥 = (𝑧3 + 1)𝑥 + (𝑧3 + 𝑧 + 1). Since
𝐺𝐹(16)[𝑥]/(𝑔(𝑥)) is a binary field with 256 elements, one has (see Exercise 6)
𝑅 = √𝑇 + 𝑥 = (𝑇 + 𝑥)128 = ((𝑧3 + 1)𝑥 + (𝑧3 + 𝑧 + 1))128 ≡ (𝑧3 + 𝑧2 )𝑥 + (𝑧2 + 𝑧 + 1) mod 𝑔.
Note that the root 𝑅 can be computed more efficiently (see Exercise 8).
sage: T=1/(w* vector (arr ))
sage: T - Rmodg (x)
(z^3 + 1)* xbar + z^3 + z + 1
sage: R=(T - Rmodg(x ))^128; R
(z^3 + z^2)* xbar + z^2 + z + 1
Finally, we have to find 𝛼, 𝛽 with 𝛼 = 𝛽𝑅 mod 𝑔. In our example, the degrees have
to satisfy deg(𝛼) ≤ 1 and deg(𝛽) = 0. So we simply define 𝛼 as the lift of 𝑅 to 𝐺𝐹(16)[𝑥]
and set 𝛽 = 1. The error polynomial is
𝜎 = 𝛼2 + 𝑥𝛽 2 = ((𝑧3 + 𝑧2 )𝑥 + (𝑧2 + 𝑧 + 1))2 + 𝑥 = (𝑧3 + 𝑧2 + 𝑧 + 1)𝑥2 + 𝑥 + (𝑧2 + 𝑧).
The error locator polynomial 𝜎 ∈ 𝐺𝐹(16)[𝑥] is of degree 𝑡 = 2 and its roots are 𝑧2 and
𝑧3 + 𝑧2 .
sage: a=(z^3+z^2)*x+z^2+z+1; b=1
sage: sigma=a*a+x*b*b; sigma
(z^3 + z^2 + z + 1)*x^2 + x + z^2 + z
sage: sigma. factor ()
(z^3 + z^2 + z + 1) * (x + z^2) * (x + z^3 + z^2)
We fixed an ordering of the elements in 𝐺𝐹(16), and in our example, the roots are the
third and the seventh field elements. Hence the word 𝑤 has errors at positions 3 and 7
and we recover the codeword by adding the error vector 𝑒 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0).
sage: i=0; e= vector (GF (2) ,16)
sage: for k in list(K):
if (( sigma.subs(x=k ))==0):
e[i]=1
i=i+1
sage: print (e)
(0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Finally, we verify that 𝑒 has the same syndrome as 𝑤.

sage: e* vector (arr)

(z^2 + z)* xbar + 1
Since 𝑛 is small, classical syndrome decoding, i.e., without using the Goppa code
structure, would also work in this example. One finds that the coset leader of 𝑤 + Γ
is the error vector 𝑒 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), since 𝑤𝑡(𝑒) = 2 and the
syndrome of both 𝑤 and 𝑒 is
𝑤𝐻 𝑇 = 𝑒𝐻 𝑇 = (0, 0, 0, 1, 0, 1, 1, 0).
sage: w*H. transpose ()

(0, 0, 0, 1, 0, 1, 1, 0)
sage: e*H. transpose ()
(0, 0, 0, 1, 0, 1, 1, 0)
15.4. McEliece Cryptosystem

The idea of code-based encryption is quite simple: choose a code, encode the plaintext,
change the codeword at some positions and take the result as ciphertext. The challenge
is to make this a trapdoor one-way function and to provide an efficient decoding algo-
rithm that uses a secret structure of the code. The McEliece cryptosystem is based on
Goppa codes, but the public generator matrix is modified in order to conceal the Goppa
code structure. Decoding should in general be hard, but is easy when the underlying
secret Goppa code is known.
First, we explain the classical McEliece cryptosystem.
Definition 15.33. Suppose the Goppa code parameters 𝑛, 𝑚 and 𝑡 are given such that
𝑛
𝑚 ≥ 3, 𝑛 ≤ 2𝑚 and 2 ≤ 𝑡 < . Set 𝑘 = 𝑛 − 𝑚𝑡 and 𝑑 = 2𝑡 + 1. Then the McEliece
𝑚
cryptosystem is defined by:
• The plaintext space ℳ = {0, 1}𝑘 and the ciphertext space 𝒞 = {0, 1}𝑛 .
• A uniform random secret key consisting of an invertible 𝑘×𝑘 matrix 𝑆 over 𝐺𝐹(2),
an 𝑛×𝑛 permutation matrix 𝑃 over 𝐺𝐹(2), distinct elements 𝑎1 , … , 𝑎𝑛 ∈ 𝐺𝐹(2𝑚 ),
an irreducible polynomial 𝑔 ∈ 𝐺𝐹(2𝑚 )[𝑥] of degree 𝑡 and the 𝑘 × 𝑛 generator
matrix 𝐺 of the associated Goppa code Γ. A new key is chosen if the dimension
of Γ is not 𝑛 − 𝑚𝑡.
• A public key, defined by the 𝑘 × 𝑛 matrix 𝐺1 = 𝑆𝐺𝑃 over 𝐺𝐹(2).
• An encryption algorithm that takes a plaintext 𝑥 ∈ {0, 1}𝑘 as input, generates a
random binary string 𝑒 of length 𝑛 and weight 𝑡 and outputs the ciphertext
𝑦 = ℰ𝑝𝑘 (𝑥) = 𝑥𝐺1 + 𝑒.
• A decryption algorithm that takes a ciphertext 𝑦 ∈ {0, 1}𝑛 as input, computes 𝑦1 =
𝑦𝑃−1 and efficiently decodes 𝑦1 to the codeword 𝑐 by using the secret Goppa code
structure. The plaintext 𝑥 is obtained by solving the linear system of equations

𝑥𝑆𝐺 = 𝑐. ♢
We show that decryption is correct: let 𝑦 = 𝑥𝐺1 + 𝑒 = 𝑥𝑆𝐺𝑃 + 𝑒 be a ciphertext;

then
𝑦1 = 𝑦𝑃 −1 = 𝑥𝑆𝐺𝑃𝑃 −1 + 𝑒𝑃 −1 = 𝑥𝑆𝐺 + 𝑒𝑃−1 .
Since 𝑃 is a permutation, 𝑒𝑃−1 has weight 𝑡. Hence 𝑦1 can be decoded to the codeword
𝑐 = (𝑥𝑆) 𝐺 of the secret Goppa code having the generator matrix 𝐺. The original
information word that corresponds to the codeword 𝑐 is 𝑥𝑆, and multiplication by 𝑆−1
recovers the plaintext 𝑥.
Example 15.34. We use the Goppa code from Example 15.31 and choose random se-
cret matrices 𝑆 and 𝑃. 𝑆 is an invertible 8 × 8 matrix over 𝐺𝐹(2):
0 0 1 0 0 1 1 0
⎛ ⎞
1 0 1 0 1 0 1 0
⎜ ⎟
⎜ 0 0 0 0 0 1 0 0 ⎟
⎜ 1 1 0 0 0 0 0 0 ⎟
𝑆=⎜ ⎟.
0 1 0 1 0 1 0 1
⎜ ⎟
⎜ 1 1 1 1 1 0 0 0 ⎟
⎜ 0 0 0 1 0 1 0 1 ⎟
⎝ 0 0 1 1 1 1 1 0 ⎠
𝑃 describes a permutation of 𝐺𝐹(2)16 and its columns are unit vectors 𝑒𝑖 :
| | | | | | | | | | | | | | | |
𝑃 = ( 𝑒11 𝑒13 𝑒3 𝑒8 𝑒15 𝑒5 𝑒2 𝑒1 𝑒9 𝑒7 𝑒6 𝑒16 𝑒4 𝑒10 𝑒12 𝑒14 ) .
| | | | | | | | | | | | | | | |
We use the generator matrix 𝐺 of the Goppa code from Example 15.31. The matrix
𝐺1 = 𝑆𝐺𝑃 generates a [16, 8, 5] code and is public:
1 0 1 0 1 0 0 0 1 1 1 1 0 0 0 1
⎛ ⎞
1 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1
⎜ ⎟
⎜ 0 1 0 0 1 0 0 0 0 0 1 1 0 1 1 0 ⎟
⎜ 1 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 ⎟
𝐺1 = ⎜ ⎟.
1 0 0 1 1 0 1 0 0 0 1 1 1 1 0 1
⎜ ⎟
⎜ 0 0 1 0 0 1 1 1 0 0 0 0 1 1 0 1 ⎟
⎜ 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 0 ⎟
⎝ 0 0 1 0 1 1 0 0 0 1 1 1 1 0 1 0 ⎠
Suppose we want to encrypt the plaintext 𝑥 = (0, 1, 1, 1, 0, 0, 1, 1). We choose the fol-
lowing random error vector of weight 2:
𝑒 = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0).
The ciphertext is
𝑦 = 𝑥𝐺1 + 𝑒 = (0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1).
For decryption, we use the matrices 𝑃, 𝑆 and the Goppa code. First, compute
𝑦1 = 𝑦𝑃 −1 = (0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1).
Now we use the Goppa code structure to find the error vector 𝑒𝑃−1 and decode 𝑦1 to
the codeword
𝑐 = (0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1)
(Exercise 10). Finally, one solves the linear system of equations 𝑥𝑆𝐺 = 𝑐 and finds the
plaintext 𝑥 = (0, 1, 1, 1, 0, 0, 1, 1). ♢
The security of the McEliece cryptosystem depends on how hard it is to decode a

vector with 𝑡 errors for a code given by the generator matrix 𝐺1 = 𝑆𝐺𝑃. Since 𝑆 and 𝑃
are secret, the matrix 𝐺 and the Goppa code structure cannot be used by an adversary.
The best known attacks are based on information-set decoding. The basic idea is
quite simple: choose 𝑘 columns of the public 𝑘 × 𝑛 generator matrix 𝐺1 such that the
resulting 𝑘 × 𝑘 matrix is invertible. The indices of the chosen columns define an index
set 𝐼 and we denote the resulting 𝑘 × 𝑘 submatrix by 𝐺𝐼 . Suppose we want to decode a
vector 𝑦 of length 𝑛. The information word 𝑥 and the codeword 𝑐 = 𝑥𝐺1 are unknown.
Let 𝑦𝐼 be the subvector of length 𝑘 defined by the index set 𝐼. We compute
𝑥𝐼 = 𝑦𝐼 𝐺𝐼−1 .
If the vector 𝑦 coincides with the codeword 𝑐 at the index positions 𝐼, i.e., if all errors
are outside 𝐼, then 𝑥𝐼 = 𝑥 and we have successfully decoded 𝑦. This follows from
basic linear algebra: the linear system of equations 𝑥𝐺1 = 𝑐 is overdetermined with
𝑘 variables and 𝑛 > 𝑘 equations. Choosing 𝑘 linearly independent equations, i.e., an
invertible 𝑘 × 𝑘 submatrix 𝐺𝐼 of 𝐺1 , suffices to find 𝑥.
However, it is unlikely that 𝑦 is error-free on a random index set 𝐼: the probability
that a randomly selected index set is not affected by any errors is
𝑛−𝑡 𝑛−𝑡−1 𝑛−𝑡−𝑘+1 𝑛−𝑡 𝑘 𝑡 𝑘

⋅ ⋯ <( ) = (1 − ) .
𝑛 𝑛−1 𝑛−𝑘+1 𝑛 𝑛
Hence the number of attempts necessary to find a suitable information set 𝐼 is expo-
nential in 𝑘. There are many improvements to this basic scheme, but the number of
guesses remains exponential in the number of errors added.
Example 15.35. We return to Example 15.34 and want to decode the ciphertext 𝑦 with-
out the Goppa code structure, using information-set decoding. We choose the index set
𝐼 = {4, 5, 6, 7, 8, 9, 10, 11}, extract columns 4 – 11 from 𝐺1 and invert the matrix:
0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 1
⎛ ⎞ ⎛ ⎞
0 0 1 0 1 0 1 0 1 1 1 0 1 1 1 0
⎜ ⎟ ⎜ ⎟
⎜ 0 1 0 0 0 0 0 1 ⎟ ⎜ 1 1 1 1 1 0 1 0 ⎟
⎜ 0 1 0 1 1 0 0 0 ⎟ −1 ⎜ 1 0 0 1 1 1 1 1 ⎟
𝐺𝐼 = ⎜ ⎟ , 𝐺𝐼 = ⎜ 0 ⎟.
1 1 0 1 0 0 0 1 1 1 0 0 0 0 1
⎜ ⎟ ⎜ ⎟
⎜ 0 0 1 1 1 0 0 0 ⎟ ⎜ 0 1 1 1 1 0 1 1 ⎟
⎜ 1 0 0 0 0 1 0 1 ⎟ ⎜ 1 1 0 1 1 0 1 1 ⎟
⎝ 0 1 1 0 0 0 1 1 ⎠ ⎝ 1 1 0 0 1 1 1 0 ⎠
Now we hope to compute the plaintext:
𝑥𝐼 = 𝑦𝐼 𝐺𝐼−1 = (1, 1, 0, 1, 0, 1, 0, 1) 𝐺𝐼−1 = (0, 1, 1, 1, 0, 0, 1, 1).
Indeed, we have 𝑥 = 𝑥𝐼 and have successfully computed the plaintext. The attack
works because the error positions (2 and 14) lie outside the information set 𝐼.
However, if an adversary chooses the index set 𝐼 = {1, 2, 3, 4, 5, 6, 7, 8}, the result is
𝑥𝐼 = (0, 0, 0, 1, 1, 1, 0, 1). They can verify whether 𝑥𝐼 is correct by computing 𝑥𝐼 𝐺 and
subtracting (i.e., adding modulo 2) the ciphertext 𝑦:
𝑥𝐼 𝐺1 + 𝑦 = (0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1).
Since the weight is 6, the result is not a valid error vector and the attack has failed. ♢
There are several improvements of McEliece’s original system. Firstly, the public
generator matrix 𝐺1 = 𝑆𝐺𝑃 can be transformed into the systematic form 𝐺1′ = (𝐼𝑘 | 𝐺2 )
by elementary row operations and (possibly) column permutations. These operations
correspond to matrix multiplications from the left (row operations) and from the right
(column permutations). One obtains 𝐺1′ = 𝑆 ′ 𝐺𝑃 ′ , so the underlying Goppa code with
the generator matrix 𝐺 remains unchanged.
The advantage of the systematic form (𝐼𝑘 | 𝐺2 ) is that the public key can be com-
pressed to the 𝑘 × (𝑛 − 𝑘) submatrix 𝐺2 , since the identity matrix is a constant part of
any matrix of this form.
Now, suppose the public generator matrix is in systematic form (𝐼𝑘 | 𝐺2 ). Then
encryption of a plaintext 𝑥 yields the ciphertext
𝑦 = 𝑥(𝐼𝑘 | 𝐺2 ) + 𝑒 = (𝑥 ‖ 𝐺2 𝑥) + 𝑒.
We observe that the plaintext 𝑥 – only slightly disturbed by at most 𝑡 error bits – is a
part of the ciphertext! Although this is not a problem (or even desirable) for channel
coding, it looks alarming with respect to the security of the McEliece cryptosystem. In
fact, the original system is not EAV-, CPA- or CCA2-secure.
However, it is assumed that the McEliece encryption function is one-way and it is
hard to recover all bits of a uniform random plaintext from a given ciphertext. Further-
more, there are techniques to reduce the use of partial information, and a modification
of the original McEliece cryptosystem can achieve CCA2 security in the random ora-
cle model. The idea of Pointcheval’s generic conversion [Poi00] is to encrypt a uniform
random string 𝑟 of length 𝑘 instead of the plaintext 𝑥. The error vector is obtained
by applying a cryptographic hash function 𝐻 to (𝑥‖𝑟′ ), where 𝑟′ is a uniform random
string of length 𝑘′ :
𝑒 = 𝐻(𝑥‖𝑟′ ), 𝑦1 = 𝑟𝐺1 + 𝑒.
𝑒 needs to be transformed into an error vector of length 𝑛 and weight 𝑡, but we can skip
the details here. Let 𝑅 be a pseudorandom generator that takes a key (or seed) of length
𝑘 as input and outputs a string of length 𝑘 + 𝑘′ . Then set
𝑦2 = (𝑥‖𝑟′ ) ⊕ 𝑅(𝑟).
The ciphertext is defined by 𝑦 = (𝑦1 ‖𝑦2 ).

Decryption takes (𝑦1 ‖𝑦2 ) as input and recovers 𝑟 from 𝑦1 as in the original McEliece
cryptosystem. Then, one obtains 𝑥 and 𝑟′ :
(𝑥‖𝑟′ ) = 𝑦2 ⊕ 𝑅(𝑟).
Before outputting the plaintext 𝑥, the integrity is checked using 𝑟′ . For this purpose, the
resulting error vector 𝑟𝐺1 + 𝑦1 is compared to the error vector derived from 𝐻(𝑥‖𝑟′ ). If
they do not match, then an error code is returned. Otherwise, the plaintext 𝑥 is output.
An adversary can still obtain large parts of 𝑟 from the ciphertext 𝑦1 . However, they
cannot exploit this information, unless they have the complete key 𝑟, which is very
unlikely. Furthermore, access to a decryption oracle does not help decrypt a given
challenge ciphertext.
Finally, we want to explain the Niederreiter cryptosystem, which uses syndromes
instead of erroneous codewords. An advantage is that the Niederreiter cryptosystem
has shorter ciphertexts than the McEliece cryptosystem.
Definition 15.36. Suppose the Goppa code parameters 𝑛, 𝑚 and 𝑡 are given such that
𝑛
𝑚 ≥ 3, 𝑛 ≤ 2𝑚 and 2 ≤ 𝑡 < . Set 𝑘 = 𝑛 − 𝑚𝑡 and 𝑑 = 2𝑡 + 1. The Niederreiter
𝑚
cryptosystem is defined as follows:
• The plaintext space contains the binary strings of length 𝑛 and weight 𝑡. The
ciphertext space is 𝒞 = {0, 1}𝑛−𝑘 .
• The secret key is chosen uniformly at random and consists of an invertible
(𝑛 − 𝑘) × (𝑛 − 𝑘) matrix 𝑆 over 𝐺𝐹(2), an 𝑛 × 𝑛 permutation matrix 𝑃 over
𝐺𝐹(2), distinct elements 𝑎1 , … , 𝑎𝑛 of the field 𝐺𝐹(2𝑚 ), an irreducible polyno-
mial 𝑔 ∈ 𝐺𝐹(2𝑚 )[𝑥] of degree 𝑡 and the (𝑛 − 𝑘) × 𝑛 parity-check matrix 𝐻 of the
associated Goppa code Γ. A new key is chosen if the dimension of Γ is not 𝑛 − 𝑚𝑡.
• The public key is the (𝑛 − 𝑘) × 𝑛 matrix 𝐻1 = 𝑆𝐻𝑃.
• The encryption algorithm takes a plaintext 𝑥 ∈ {0, 1}𝑛 of weight 𝑡 as input and
outputs the ciphertext
𝑦 = ℰ𝑝𝑘 (𝑥) = 𝐻1 𝑥𝑇 .
• The decryption algorithm takes a ciphertext 𝑦 ∈ {0, 1}𝑛−𝑘 as input and computes
the column vector 𝑠𝑦𝑛 = 𝑆 −1 𝑦. Find a vector 𝑧 such that 𝐻𝑧𝑇 = 𝑠𝑦𝑛 and decode 𝑧
using Patterson’s algorithm. This gives the error vector 𝑒 of weight 𝑡 with 𝐻𝑒𝑇 =
𝑠𝑦𝑛. The plaintext 𝑥 is recovered by
𝑥𝑇 = 𝑃 −1 𝑒𝑇 . ♢
The public key 𝐻1 = 𝑆𝐻𝑃 is transformed into the systematic form (𝐼𝑛−𝑘 |𝐻2 ). This
reduces the size of the public key to 𝑘(𝑛 − 𝑘) bits.
We explain the correctness of the Niederreiter cryptosystem: 𝐻(𝑃𝑥𝑇 ) is the syn-
drome of 𝑃𝑥𝑇 . Thus the ciphertext 𝑦 = 𝑆𝐻𝑃𝑥𝑇 is a transformed syndrome. Comput-
ing 𝑆−1 𝑦 gives the syndrome 𝐻(𝑃𝑥𝑇 ). Syndrome decoding recovers the error vector
𝑒𝑇 = 𝑃𝑥𝑇 , and the plaintext is 𝑥𝑇 = 𝑃 −1 𝑒𝑇 .
Example 15.37. Consider the Goppa code and the matrices 𝐻, 𝑆 and 𝑃 in Examples
15.31 and 15.34. Suppose we want to encrypt the plaintext
𝑥 = (0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
of weight 𝑡 = 2 with the Niederreiter cryptosystem. The ciphertext is
𝑦 = 𝑆𝐻𝑃𝑥𝑇 = (0, 0, 1, 0, 1, 1, 1, 0)𝑇 .
For decryption, we compute the syndrome
𝑠𝑦𝑛 = 𝑆 −1 𝑦 = (0, 0, 1, 1, 1, 1, 0, 1)𝑇 .
The linear system of equations 𝐻𝑧𝑇 = 𝑠𝑦𝑛 is underdetermined. The affine solution
space is of dimension 8 and one of the solutions is the vector
𝑧 = (1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0).
Syndrome decoding (see Exercise 12) yields the error vector
𝑒𝑇 = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)𝑇 .
Finally, we recover the plaintext
𝑥𝑇 = 𝑃 −1 𝑒𝑇 = (0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)𝑇 . ♢
Like the McEliece system, the plain Niederreiter cryptosystem is not CCA2 secure.
This can be resolved by applying a CCA2-secure conversion, for example Pointcheval’s
generic conversion [Poi00] (see above), the Fujisaki-Okamoto transform [FO13] or the
Kobara-Imai conversion [NC12].
15.5. Summary 309
Remark 15.38. The McEliece and the Niederreiter cryptosystems look differently, but
they are based on the same decoding problem. Let 𝐺 be the generator matrix and 𝐻 the
parity-check matrix of a Goppa code. Let 𝐺1 = 𝑆𝐺𝑃 be the public key of the McEliece
system and let 𝑆′ be any invertible (𝑛 − 𝑘) × (𝑛 − 𝑘) matrix. Then 𝐻1 = 𝑆 ′ 𝐻(𝑃−1 )𝑇
satisfies 𝐺1 𝐻1𝑇 = 0, and 𝐻1 is the parity-check matrix and the public key of the Nieder-
reiter system of the same code. The McEliece ciphertext is an erroneous codeword
and the Niederreiter ciphertext is a syndrome. Finding the nearest codeword is essen-
tially the same as finding the coset leader of a syndrome. Therefore, the McEliece and
the Niederreiter cryptosystems offer the same level of security. An adversary who can
break one of them is also able to break the other. ♢
The McEliece and Niederreiter cryptosystems have been thoroughly investigated

and are among the most promising candidates for post-quantum cryptography.
McElices’s key sizes (𝑛 = 1024, 𝑘 = 524, 𝑡 = 50) were broken 30 years after the original
proposal with 260 CPU cycles [BLP08]. For high security, D.J. Bernstein and T. Lange
propose [6960, 5413] Goppa codes with degree 𝑡 = 119. A practical disadvantage is the
large public-key key size of 𝑘(𝑛 − 𝑘) = 8,373,911 bits.
15.5. Summary
• Codes are used to detect and to correct errors when data is sent over noisy chan-
nels or stored on potentially unreliable media.
• Information words are encoded to codewords. Decoding of an erroneous code-
word means finding the error vector, restoring the codeword and recovering
the original data.
• There are bounds on the maximum number of codewords when the length and
minimum distance of codewords is given.
• Syndrome decoding works in many practical applications, but decoding is a
hard problem for random codes of large dimension.
• Goppa codes have an efficient decoding algorithm that also works for large
dimensions.
• The McEliece cryptosystem is based on a code with a secret Goppa code struc-
ture. The ciphertext are erroneous codewords and the plaintext is recovered
by decoding. The Niederreiter cryptosystem is similar to the McEliece scheme,
but uses a parity-check matrix and syndromes for encryption.
• Code-based encryption with appropriate parameters is thought to be secure
against attacks by quantum computers.
Exercises
1. The codewords of the parity code 𝐶 of length 𝑛 over 𝐺𝐹(𝑞) are the words (𝑥1 , … ,
𝑥𝑛−1 , 𝑥𝑛 ) that satisfy 𝑥𝑛 = 𝑥1 + ⋯ + 𝑥𝑛−1 . Give the generator and the parity-check
matrix of 𝐶. Show that 𝐶 is a linear [𝑛, 𝑛 − 1, 2] MDS code.
2. Find the relationship between 𝑞-ary lattices and linear codes over 𝐺𝐹(𝑞) for a prime
𝑞.
3. Let 𝐶 be the linear [8, 4] code with the following generator matrix over 𝐺𝐹(2):
1 1 1 1 1 1 1 1
⎛ ⎞
0 0 0 0 1 1 1 1
𝐺=⎜ ⎟.
⎜0 0 1 1 0 0 1 1⎟
⎝0 1 0 1 0 1 0 1⎠
(a) Show that 𝐺 is also the parity-check matrix of 𝐶. Such a code is called self-
dual.
(b) Show that the minimum distance of 𝐶 is 𝑑 = 4.
Hint: It is sufficient to show that every set of three columns of the parity-
check matrix is linearly independent.
(c) Decode the word received 𝑦 = (0, 1, 0, 0, 1, 1, 0, 0) using syndrome decoding.
4. Give the sphere-covering bound, the Gilbert-Varshamov bound and the Hamming
bound for 𝑛 = 16 and 𝑑 = 5. Show that the Goppa code in Example 15.31 has
maximal dimension.
5. Why is the formula (𝑎 + 𝑏)2 = 𝑎2 + 𝑏2 (teacher’s nightmare) true over binary fields
𝐺𝐹(2𝑚 )? What can be said about (𝑎 + 𝑏)𝑛 if 𝑛 is a power of 2?
𝑚
6. Why hold 𝑎2 = 𝑎 for every 𝑎 ∈ 𝐺𝐹(2𝑚 )? Prove that
𝑚−1
√𝑎 = 𝑎2 .
Why is the square root unique? Show that
√𝑎 + 𝑏 = √𝑎 + √𝑏 for all 𝑎, 𝑏 ∈ 𝐺𝐹(2𝑚 ).
7. Show that any polynomial 𝑓 ∈ 𝐺𝐹(2𝑚 )[𝑥] can be written as 𝑓 = 𝛼2 + 𝑥𝛽 2 with

𝛼, 𝛽 ∈ 𝐺𝐹(2𝑚 )[𝑥]. How can 𝛼 and 𝛽 be computed?
8. Let 𝑓(𝑥) ∈ 𝐺𝐹(2𝑚 )[𝑥] and suppose that 𝑔(𝑥) ∈ 𝐺𝐹(2𝑚 )[𝑥] is irreducible. Why
does a unique square root √𝑓 mod 𝑔(𝑥) exist? Let 𝑓 = 𝛼2 + 𝑥𝛽2 mod 𝑔(𝑥) with
𝛼, 𝛽 ∈ 𝐺𝐹(2𝑚 )[𝑥]. Show that
√𝑓 = 𝛼 + √𝑥 𝛽 mod 𝑔(𝑥).
Exercises 311
Now suppose that 𝑔 = 𝑔21 + 𝑥𝑔22 with 𝑔1 , 𝑔2 ∈ 𝐺𝐹(2𝑚 )[𝑥] and 1 = 𝑣1 𝑔1 + 𝑣2 𝑔2 with
𝑣1 , 𝑣2 ∈ 𝐺𝐹(2𝑚 )[𝑥]. Show Huber’s formulas [Hub96], [Hub03]:
√𝑥 = 𝑔1 𝑔−1
2 mod 𝑔(𝑥),
√𝑥 = 𝑣2 𝑔1 + 𝑥𝑣1 𝑔2 mod 𝑔(𝑥).
9. Refer to Example 15.32. Find the nearest codeword to the vector

(1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0).
10. Verify the results in Example 15.34 using SageMath.
11. What happens when the same plaintext is encrypted twice using the original
McEliece cryptosystem? What information can an adversary gain from the cipher-
texts? Now suppose that Pointcheval’s generic conversion is used. What can now
be said about the ciphertexts?
12. Decode the syndrome (0, 0, 1, 1, 1, 1, 0, 1) using the Goppa code in Example 15.31.
Bibliography
[AKS04] Manindra Agrawal, Neeraj Kayal, and Nitin Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2, 781–793,
DOI 10.4007/annals.2004.160.781. MR2123939
[Bar15] Gregory V Bard, Sage for Undergraduates, Vol. 87, American Mathematical Soc., 2015.
[Bar16] Elain Barker, Recommendation for Key Management, Part 1: General, Technical Report SP 800-57 Part 1 Revision
4, National Institute of Standards of Technology, 2016.
[BK15] Elaine Barker and John Kelsey, Recommendation for Random Number Generation Using Deterministic Random
Bit Generators, Technical Report SP 800-90A Revision 1, National Institute of Standards and Technology, 2015.
[Bel06] Mihir Bellare, New proofs for NMAC and HMAC: security without collision-resistance, Advances in cryptology—
CRYPTO 2006, Lecture Notes in Comput. Sci., vol. 4117, Springer, Berlin, 2006, pp. 602–619, DOI
10.1007/11818175_36. MR2422187
[BK03] Mihir Bellare and Tadayoshi Kohno, A theoretical treatment of related-key attacks: RKA-PRPs, RKA-PRFs, and
applications, Advances in cryptology—EUROCRYPT 2003, Lecture Notes in Comput. Sci., vol. 2656, Springer,
Berlin, 2003, pp. 491–506, DOI 10.1007/3-540-39200-9_31. MR2090438
[BR96] Mihir Bellare and Phillip Rogaway, The exact security of digital signatures - How to sign with RSA and Rabin,
International conference on the theory and applications of cryptographic techniques, 1996, pp. 399–416.
[BR05] Mihir Bellare and Phillip Rogaway, Introduction to Modern Cryptography, UCSD CSE 207 (2005), 1–283.
[BR06] Mihir Bellare and Phillip Rogaway, Code-based game-playing proofs and the security of triple encryption, Ad-
vances in cryptology–eurocrypt. lncs, 2006, pp. 409–426.
[BB84] Charles H Bennett and Gilles Brassard, Quantum Cryptography: Public Key Distribution and Coin Tossing, Int.
conf. on computers, systems and signal processing, Bangalore, India, 1984, pp. 175–179.
[BG07] Côme Berbain and Henri Gilbert, On the Security of IV Dependent Stream Ciphers, International workshop on
fast software encryption, 2007, pp. 254–273.
[BLP08] Daniel J. Bernstein, Tanja Lange, and Christiane Peters, Attacking and defending the McEliece cryptosystem,
Post-quantum cryptography, Lecture Notes in Comput. Sci., vol. 5299, Springer, Berlin, 2008, pp. 31–46, DOI
10.1007/978-3-540-88403-3_3. MR2775645
[Ber08a] Daniel J Bernstein, ChaCha, a variant of Salsa20, Workshop record of sasc, 2008, pp. 3–5.
[Ber08b] Daniel J Bernstein, The Salsa20 family of stream ciphers, Lecture Notes in Computer Science 4986 (2008), 84–97.
[Ber11] Daniel J. Bernstein, List decoding for binary Goppa codes, Coding and cryptology, Lecture Notes in Comput. Sci.,
vol. 6639, Springer, Heidelberg, 2011, pp. 62–80, DOI 10.1007/978-3-642-20901-7_4. MR2834693
[BCLvV18] Daniel J. Bernstein, Chitchanok Chuengsatiansup, Tanja Lange, and Christine van Vredendaal, NTRU prime:
reducing attack surface at low cost, Selected areas in cryptography—SAC 2017, Lecture Notes in Comput. Sci.,
vol. 10719, Springer, Cham, 2018, pp. 235–260. MR3775587
[Ber11] Bertoni, Guido and Daemen, Joan and Peeters, Michael and Van Assche, Gilles, The Keccak reference, 2011.
313
314 Bibliography
[Bla03] Richard E Blahut, Algebraic codes for data transmission, Cambridge University Press, 2003.
[Bon99] Dan Boneh, Twenty years of attacks on the RSA cryptosystem, Notices Amer. Math. Soc. 46 (1999), no. 2, 203–213.
MR1673760
[BSI18] BSI, Kryptographische Verfahren: Empfehlungen und Schlüssellängen, Bundesamt für Sicherheit in der Inform-
tionstechnik, 2018.
[CR12] Carlos Cid and Matt Robshaw, The eSTREAM Portfolio in 2012, ECRYPT II, 2012.
[CM13] Margaret Cozzens and Steven J. Miller, The mathematics of encryption, Mathematical World, vol. 29, American
Mathematical Society, Providence, RI, 2013. An elementary introduction. MR3098499
[DR02] Joan Daemen and Vincent Rijmen, The design of Rijndael, Information Security and Cryptography, Springer-
Verlag, Berlin, 2002. AES—the advanced encryption standard. MR1986943
[DCP08] Christophe De Canniere and Bart Preneel, Trivium, New Stream Cipher Designs, 2008, pp. 244–266.
[DR08] Tim Dierks and Eric Rescorla, The Transport Layer Security (TLS) Protocol Version 1.2, Internet Request for
Comments, RFC Editor, Fremont, CA, USA, 2008.
[DH76] Whitfield Diffie and Martin E. Hellman, New directions in cryptography, IEEE Trans. Information Theory IT-22
(1976), no. 6, 644–654, DOI 10.1109/tit.1976.1055638. MR0437208
[Dwo01] Morris Dworkin, Recommendation for Block Cipher Modes of Operation, Technical Report SP 800-38A, National
Institute of Standards and Technology, 2001.
[Dwo07] Morris J Dworkin, Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and
GMAC, NIST Special Publication 800-38D, 2007.
[Dwo16] Morris J Dworkin, Recommendation for Block Cipher Modes of Operation: The CMAC Mode for Authentication,
NIST Special Publication 800-38B, 2016.
[Edw74] H. M. Edwards, Riemann’s zeta function, Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publish-
ers], New York-London, 1974. Pure and Applied Mathematics, Vol. 58. MR0466039
[Eke91] Artur K. Ekert, Quantum cryptography based on Bell’s theorem, Phys. Rev. Lett. 67 (1991), no. 6, 661–663, DOI
10.1103/PhysRevLett.67.661. MR1118810
[FIP01] FIPS, Federal Information Processing Standards Publication 197. Advanced Encryption Standard (AES), 2001.
[FIP08] FIPS, Federal Information Processing Standards Publication 198-1. The Keyed-Hash Message Authentication Code
(HMAC), 2008.
[FIP13] FIPS, Federal Information Processing Standards Publication 186-4: Digital Signature Standard (DSS), 2013.
[FIP15a] FIPS, Federal Information Processing Standards Publication 180-4. Secure Hash Standard, 2015.
[FIP15b] FIPS, Federal Information Processing Standards Publication 202. SHA-3 Standard: Permutation-Based Hash and
Extendable-Output Functions, 2015.
[FMS01] Scott Fluhrer, Itsik Mantin, and Adi Shamir, Weaknesses in the key scheduling algorithm of RC4, Selected areas
in cryptography, Lecture Notes in Comput. Sci., vol. 2259, Springer, Berlin, 2001, pp. 1–24, DOI 10.1007/3-540-
45537-X_1. MR2054424
[FO13] Eiichiro Fujisaki and Tatsuaki Okamoto, Secure integration of asymmetric and symmetric encryption schemes, J.
Cryptology 26 (2013), no. 1, 80–101, DOI 10.1007/s00145-011-9114-1. MR3016824
[FOPS01] Eiichiro Fujisaki, Tatsuaki Okamoto, David Pointcheval, and Jacques Stern, RSA-OAEP is secure under the
RSA assumption, Advances in cryptology—CRYPTO 2001 (Santa Barbara, CA), Lecture Notes in Comput. Sci.,
vol. 2139, Springer, Berlin, 2001, pp. 260–274, DOI 10.1007/3-540-44647-8_16. MR1931427
[Gal12] Steven D. Galbraith, Mathematics of public key cryptography, Cambridge University Press, Cambridge, 2012.
MR2931758
[Gil16] Daniel Kahn Gillmor, Negotiated Finite Field Diffie-Hellman Ephemeral Parameters for Transport Layer Security
(TLS), Internet Request for Comments, RFC Editor, Fremont, CA, USA, 2016.
[Gol01] Oded Goldreich, Foundations of cryptography, Cambridge University Press, Cambridge, 2001. Basic tools.
MR1881185
[GGH97] Oded Goldreich, Shafi Goldwasser, and Shai Halevi, Public-key cryptosystems from lattice reduction problems,
Advances in cryptology—CRYPTO ’97 (Santa Barbara, CA, 1997), Lecture Notes in Comput. Sci., vol. 1294,
Springer, Berlin, 1997, pp. 112–131, DOI 10.1007/BFb0052231. MR1630399
[GB08] Shafi Goldwasser and Mihir Bellare, Lecture Notes on Cryptography, Massachusetts Institute of Technology
(MIT), 2008.
[Gro96] Lov K. Grover, A fast quantum mechanical algorithm for database search, Proceedings of the Twenty-eighth
Annual ACM Symposium on the Theory of Computing (Philadelphia, PA, 1996), ACM, New York, 1996, pp. 212–
219, DOI 10.1145/237814.237866. MR1427516
Bibliography 315
[HPS+ 17] Jeff Hoffstein, Jill Pipher, John M. Schanck, Joseph H. Silverman, William Whyte, and Zhenfei Zhang, Choosing
parameters for 𝑁𝑇𝑅𝑈𝐸𝑛𝑐𝑟𝑦𝑝𝑡, Topics in cryptology—CT-RSA 2017, Lecture Notes in Comput. Sci., vol. 10159,
Springer, Cham, 2017, pp. 3–18. MR3630855
[HPS98] Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman, NTRU: a ring-based public key cryptosystem, Algorithmic
number theory (Portland, OR, 1998), Lecture Notes in Comput. Sci., vol. 1423, Springer, Berlin, 1998, pp. 267–
288, DOI 10.1007/BFb0054868. MR1726077
[HPS08] Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman, An introduction to mathematical cryptography, Under-
graduate Texts in Mathematics, Springer, New York, 2008. MR2433856
[HK07] Dennis Hofheinz and Eike Kiltz, Secure hybrid encryption from weakened key encapsulation, Advances in
cryptology—CRYPTO 2007, Lecture Notes in Comput. Sci., vol. 4622, Springer, Berlin, 2007, pp. 553–571, DOI
10.1007/978-3-540-74143-5_31. MR2423870
[Hub96] Klaus Huber, Note on decoding binary Goppa codes, Electronics Letters 32 (1996), no. 2, 102–103.
[Hub03] Klaus Huber, Taking 𝑝th roots modulo polynomials over finite fields, Des. Codes Cryptogr. 28 (2003), no. 3, 303–
311, DOI 10.1023/A:1024118322745. MR1976963
[HK97] Ran Canetti Hugo Krawczyk Mihir Bellare, HMAC: Keyed-Hashing for Message Authentication, Internet Request
for Comments, RFC Editor, Fremont, CA, USA, 1997.
[IK03] Tetsu Iwata and Kaoru Kurosawa, Stronger security bounds for OMAC, TMAC, and XCBC, Progress in
cryptology—INDOCRYPT 2003, Lecture Notes in Comput. Sci., vol. 2904, Springer, Berlin, 2003, pp. 402–415,
DOI 10.1007/978-3-540-24582-7_30. MR2092397
[JOP14] Antoine Joux, Andrew Odlyzko, and Cécile Pierrot, The past, evolving present, and future of the discrete loga-
rithm, Open problems in mathematics and computational science, Springer, Cham, 2014, pp. 5–36. MR3330876
[KL15] Jonathan Katz and Yehuda Lindell, Introduction to modern cryptography, 2nd ed., Chapman & Hall/CRC Cryp-
tography and Network Security, CRC Press, Boca Raton, FL, 2015. MR3287369
[KCP16] John Kelsey, Shu-jen Chang, and Ray Perlner, SHA-3 Derived Functions: cSHAKE, KMAC, TupleHash and Par-
allelHash, NIST Special Publication 800-185, 2016.
[Ken05] Stephan T. Kent, IP Encapsulating Security Payload (ESP), Internet Request for Comments, RFC Editor, Fre-
mont, CA, USA, 2005.
[KM07] Neal Koblitz and Alfred J. Menezes, Another look at “provable security”, J. Cryptology 20 (2007), no. 1, 3–37, DOI
10.1007/s00145-005-0432-z. MR2340187
[KE10] Hugo Krawczyk and Pasi Eronen, HMAC-based Extract-and-Expand Key Derivation Function (HKDF), Internet
Request for Comments, RFC Editor, Fremont, CA, USA, 2010.
[Lam79] Leslie Lamport, Constructing digital signatures from a one-way function, Technical Report CSL-98, SRI Interna-
tional Palo Alto, 1979.
[LCM+ 16] Adam Langley, WanTeh Chang, Nikos Mavrogiannopoulos, Joachim Strombergson, and Simon Josefsson,
ChaCha20-Poly1305 Cipher Suites for Transport Layer Security (TLS), Internet Request for Comments, RFC
Editor, Fremont, CA, USA, 2016.
[LK08] Matt Lepinski and Stephen T. Kent, Additional Diffie-Hellman Groups for Use with IETF Standards, Internet
[LP11] Richard Lindner and Chris Peikert, Better key sizes (and attacks) for LWE-based encryption, Topics in
cryptology—CT-RSA 2011, Lecture Notes in Comput. Sci., vol. 6558, Springer, Heidelberg, 2011, pp. 319–339,
DOI 10.1007/978-3-642-19074-2_21. MR2804770
[LM10] Manfred Lochter and Johannes Merkle, Elliptic Curve Cryptography (ECC) Brainpool Standard Curves and
Curve Generation, Internet Request for Comments, RFC Editor, Fremont, CA, USA, 2010.
[LPR13] Vadim Lyubashevsky, Chris Peikert, and Oded Regev, On ideal lattices and learning with errors over rings, J. ACM
60 (2013), no. 6, Art. 43, 35, DOI 10.1145/2535925. MR3144913
[MV04] David A. McGrew and John Viega, The security and performance of the Galois/counter mode (GCM) of operation,
Progress in cryptology—INDOCRYPT 2004, Lecture Notes in Comput. Sci., vol. 3348, Springer, Berlin, 2004,
pp. 343–355, DOI 10.1007/978-3-540-30556-9_27. MR2148337
[MOV93] Alfred J. Menezes, Tatsuaki Okamoto, and Scott A. Vanstone, Reducing elliptic curve logarithms to logarithms in
a finite field, IEEE Trans. Inform. Theory 39 (1993), no. 5, 1639–1646, DOI 10.1109/18.259647. MR1281712
[MvOV97] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of applied cryptography, CRC Press
Series on Discrete Mathematics and its Applications, CRC Press, Boca Raton, FL, 1997. With a foreword by
Ronald L. Rivest. MR1412797
[MR09] Daniele Micciancio and Oded Regev, Lattice-based cryptography, Post-quantum cryptography, Springer, Berlin,
2009, pp. 147–191, DOI 10.1007/978-3-540-88702-7_5. MR2590647
[MKJR16] Kathleen M. Moriarty, Burt Kaliski, Jakob Jonsson, and Andreas Rusch, PKCS #1: RSA Cryptography Specifica-
tions Version 2.2, Internet Request for Comments, RFC Editor, Fremont, CA, USA, 2016.
316 Bibliography
[NYHR05] Clifford Neuman, Tom Yu, Sam Hartman, and Kenneth Raeburn, The Kerberos Network Authentication Service
(V5), Internet Request for Comments, RFC Editor, Fremont, CA, USA, 2005.
[NC12] Robert Niebuhr and Pierre-Louis Cayrel, Broadcast attacks against code-based schemes, Research in cryptology,
Lecture Notes in Comput. Sci., vol. 7242, Springer, Heidelberg, 2012, pp. 1–17, DOI 10.1007/978-3-642-34159-
5_1. MR3021183
[NC00] Michael A. Nielsen and Isaac L. Chuang, Quantum computation and quantum information, Cambridge Univer-
sity Press, Cambridge, 2000. MR1796805
[NL18] Yoav Nir and Adam Langley, ChaCha20 and Poly1305 for IETF Protocols, Internet Request for Comments, RFC
[OP01] Tatsuaki Okamoto and David Pointcheval, The gap-problems: a new class of problems for the security of cryp-
tographic schemes, Public key cryptography (Cheju Island, 2001), Lecture Notes in Comput. Sci., vol. 1992,
Springer, Berlin, 2001, pp. 104–118, DOI 10.1007/3-540-44586-2_8. MR1898028
[OS09] Raphael Overbeck and Nicolas Sendrier, Code-based cryptography, Post-quantum cryptography, Springer,
Berlin, 2009, pp. 95–145, DOI 10.1007/978-3-540-88702-7_4. MR2590646
[PP10] Christof Paar and Jan Pelzl, Understanding Cryptography: A Textbook for Students and Practitioners, Springer,
2010.
[Pat75] N. J. Patterson, The algebraic decoding of Goppa codes, IEEE Trans. Information Theory IT-21 (1975), 203–207,
DOI 10.1109/tit.1975.1055350. MR0379009
[PM07] Goutam Paul and Subhamoy Maitra, Permutation after RC4 key scheduling reveals the secret key, International
workshop on selected areas in cryptography, 2007, pp. 360–377.
[Pei14] Chris Peikert, A decade of lattice cryptography, Found. Trends Theor. Comput. Sci. 10 (2014), no. 4, i—iii, 283–
424, DOI 10.1561/0400000074. MR3494162
[Poi00] David Pointcheval, Chosen-ciphertext security for any one-way cryptosystem, Public key cryptography
(Melbourne, 2000), Lecture Notes in Comput. Sci., vol. 1751, Springer, Berlin, 2000, pp. 129–146, DOI
10.1007/978-3-540-46588-1_10. MR1864776
[Pom96] Carl Pomerance, A tale of two sieves, Notices Amer. Math. Soc. 43 (1996), no. 12, 1473–1485. MR1416721
[Reg09] Oded Regev, On lattices, learning with errors, random linear codes, and cryptography, J. ACM 56 (2009), no. 6,
Art. 34, 40, DOI 10.1145/1568318.1568324. MR2572935
[Res18] Eric Rescorla, The Transport Layer Security (TLS) Protocol Version 1.3, Internet Request for Comments, RFC
[RP11] Eleanor Rieffel and Wolfgang Polak, Quantum computing, Scientific and Engineering Computation, MIT Press,
Cambridge, MA, 2011. A gentle introduction. MR2791092
[RSA78] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems,
Comm. ACM 21 (1978), no. 2, 120–126, DOI 10.1145/359340.359342. MR700103
[RB08] Matthew Robshaw and Olivier Billet, New stream cipher designs, LNCS, vol. 4986, Springer, 2008.
[Ros12] Kenneth H Rosen, Discrete Mathematics and its Applications, 7th ed., McGraw-Hill, 2012.
[Rot06] Ron Roth, Introduction to Coding Theory, Cambridge University Press, 2006.
[RSN+ 10] Andrew Rukhin, Juan Soto, James Nechvatal, Miles Smid, and Elaine Barker, A Statistical Test Suite for Random
and Pseudorandom Number Generators for Cryptographic Applications, Technical Report SP 800-22 Revision 1a,
National Institute of Standards of Technology, 2010.
[Sag18] Sage Developers, Sagemath, the Sage Mathematics Software System (Version 8.5), https://www.sagemath.org,
2018.
[Sha49] C. E. Shannon, Communication theory of secrecy systems, Bell System Tech. J. 28 (1949), 656–715, DOI
10.1002/j.1538-7305.1949.tb00928.x. MR0032133
[She17] Thomas R. Shemanske, Modern cryptography and elliptic curves, Student Mathematical Library, vol. 83, Amer-
ican Mathematical Society, Providence, RI, 2017. A beginner’s guide. MR3676088
[Sho94] Peter W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, 35th Annual Symposium
on Foundations of Computer Science (Santa Fe, NM, 1994), IEEE Comput. Soc. Press, Los Alamitos, CA, 1994,
pp. 124–134, DOI 10.1109/SFCS.1994.365700. MR1489242
[Sho09] Victor Shoup, A computational introduction to number theory and algebra, 2nd ed., Cambridge University Press,
Cambridge, 2009. MR2488898
[Sil09] Joseph H. Silverman, The arithmetic of elliptic curves, 2nd ed., Graduate Texts in Mathematics, vol. 106, Springer,
Dordrecht, 2009. MR2514094
[SPL06] Jun H. Song, Radha Poovendran, and Jicheol Lee, The AES-CMAC-96 Algorithm and Its Use with IPsec, Internet
Bibliography 317
[SPLI06] Jun H. Song, Radha Poovendran, Jicheol Lee, and Tetsu Iwata, The AES-CMAC Algorithm, Internet Request for
Comments, RFC Editor, Fremont, CA, USA, 2006.
[SBK+ 17] Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, and Yarik Markov, The first collision for full SHA-
1, Advances in cryptology—CRYPTO 2017. Part I, Lecture Notes in Comput. Sci., vol. 10401, Springer, Cham,
2017, pp. 570–596. MR3703211
[Was08] Lawrence C Washington, Elliptic curves: Number Theory and Cryptography, CRC Press, 2008.
[TW06] Wade Trappe and Lawrence C. Washington, Introduction to cryptography with coding theory, 2nd ed., Pearson
Prentice Hall, Upper Saddle River, NJ, 2006. MR2372272
[WJW+ 14] Klaus Weltner, Sebastian John, Wolfgang Weber, et al., Mathematics for Physicists and Engineers: Fundamentals
and Interactive Study Guide, Springer, 2014.
[Zen07] Erik Zenner, Why IV Setup for Stream Ciphers is Difficult, Dagstuhl seminar proceedings, 2007.
Index
Adaptive chosen ciphertext attack, 166 Cardinality, 7

Advanced Encryption Standard, 104 CBC MAC, 154
AES, 104 CBC mode, 53
Affine map, 14, 95 CCA, 38
AKE, 187 CCA-secure, 45, 177
Algebraic degree, 14 CCA1-secure, 45
Algebraic normal form, 13 CCA2-secure, 45, 158, 166, 196–199, 307,
Algorithm, 9 308
AND, 9 CDH problem, 189
Authenticated encryption scheme, 158 Ceiling function, 11
Average-case complexity, 17 CFB mode, 118
ChaCha, 133
Babai’s rounding method, 259 Characteristic of a field, 83
Babystep-Giantstep, 192 Chinese Remainder Theorem, 80, 173
BB84 protocol, 248 Chosen ciphertext attack, 38, 166
Bell state, 235 Chosen message attack, 152, 204
Big-O, 16 Chosen plaintext attack, 38, 165
Bijective, 10 CMAC, 155
Binomial coefficient, 15 CNOT gate, 236
Binomial distribution, 23 Code, 287
Birthday paradox, 25 Codeword, 287
Bit permutation, 16 Codomain, 8
Bloch sphere, 231 Collision, 137
Block cipher, 52 Combination generator, 126
Block code, 287 Compression function, 137, 140
Blockchain, 140 Computational security, 36
Boolean function, 13 Conditional probability, 21
Congruences, 65
Caesar cipher, 34 Connection polynomial, 121
Canonical verification, 152 Convolution product, 271
319
320 Index
Coset, 290 Enigma machine, 35

Coset leader, 290 Equivalence relation, 12
Countable set, 11 Euclidean Algorithm, 63, 85
Covolume, 256 EUF-CMA secure, 152, 204
CPA, 38 Euler’s phi function, 67
CPA-secure, 42, 43, 55, 57, 165, 195, 197, Euler’s Theorem, 76
198, 279 Event, 19
CPRNG, 46 Existential forgery, 206
Cryptocurrency, 140 Expectation, 21
Cryptosystem, 32 Experiment, 38
CSPRNG, 46 Extended Euclidean Algorithm, 63, 85
CTR mode, 54
Cumulative distribution function, 21 Factor ring, 86
CVP, 258 Factorial, 15
Cyclic group, 77 Factoring, 177
Factoring assumption, 168
Davies-Meyer construction, 141 Fast exponentiation, 67
DDH problem, 189 Feedback polynomial, 121
Decision problem, 17 Feistel network, 102
Degree of a field extension, 82 Fermat factorization, 178
Degree of a polynomial, 83 Fermat’s Little Theorem, 76
Derivative, 85 Field, 82
Deutsch-Josza algorithm, 240 Field extension, 82
DFT, 241 Filter generator, 126
DHIES, 198 Floor function, 11
Diffie-Hellman, 188, 190, 223 Floyd’s cycle finding algorithm, 26
Digital signature, 204 Formal derivative, 85
Discrete Fourier Transform, 241 Function, 8
Discrete Gaussian, 276 Galois Field, 88
Discrete logarithm, 189 Galois field, 83
Discriminant, 216 Game, 38
DL problem, 189 Gap-CDH problem, 197
Domain, 8 Gaussian heuristic, 259
DSA, 211 GCM mode, 159
DSS, 211 Geometric distribution, 20
GGH cryptosystem, 269
EAV-secure, 40, 187 Gilbert-Varshamov bound, 293
ECB mode, 53 Goppa code, 296
ECC, 213 Gram-Schmidt orthogonalization, 262
ECDH, 223 Greatest common divisor, 63, 85
ECDSA, 228 Group, 73
ECIES, 199 Grover’s algorithm, 248
ECM, 225 GSO, 262
ElGamal encryption, 202
ElGamal signature, 211 Hadamard gate, 232
Elliptic curve, 217 Hamming bound, 293
Elliptic curve cryptography, 213 Hamming distance, 287
Encryption scheme, 32 Hard problem, 17
Index 321
Hash function, 137 LWE, 277

Hermite normal form, 261
HMAC, 156 MAC, 152
HNF, 261 Malleable, 45, 58
Homomorphism, 74, 81 Man-in-the-Middle attack, 190
Hybrid encryption, 197 Matrix, 93
Maximum-likelihood decoding, 288
Ideal cipher model, 142 McEliece cryptosystem, 304
Image, 8 MDS code, 291
IND, 37 Merkle tree, 140
IND-CCA1, 45 Merkle-Damgård, 141
IND-CCA2, 45, 158, 166, 177, 196–199, Mersenne prime, 63
307, 308 Message authentication code, 152
IND-CPA, 42, 43, 55, 57, 165, 195, 197, 198, Metric, 287
279 Miller-Rabin test, 171
IND-EAV, 40 Minkowski Theorem, 258
Independent, 20, 22
Index-Calculus algorithm, 194 Nearest-codeword decoding, 288
Indistinguishability, 37 Negligible, 18
Information rate, 287 Next-bit test, 46
Information-set decoding, 305 Niederreiter cryptosystem, 308
Injective, 10 NMAC, 156
Integers, 61 Nonce, 55
Inverse map, 10 Nonsingular, 216
Irreducible, 87 NTRU cryptosystem, 272
Isomorphism, 74, 81 Number field sieve, 180, 194
Kannan’s embedding technqiue, 280 OAEP, 175

Keccak, 148 OFB mode, 117
KEM, 194 One-time pad, 34
Kerberos, 186 One-way permutation, 166, 190
Kerkhoff’s principle, 33 Operation mode, 53, 117
Kernel, 75 Order, 76
Ket notation, 230 Order of a polynomial, 124
Key distribution, 186 Orthogonal, 94
Key encapsulation mechanism, 194 Orthogonality defect, 262
KMAC, 157
KSA, 128 Parity-check matrix, 289
Patterson algorithm, 299
Lattice, 254 Pauli-X gate, 232
Learning with errors, 277 Pauli-Y gate, 233
LFSR, 119 Pauli-Z gate, 233
Linear code, 287 Perfect code, 293
Linear Feedback Shift Register, 119 Perfect secrecy, 35
Linear map, 14, 93 Period, 120
Linear recurring sequence, 119 Permutation, 15
LLL algorithm, 265 PFS, 187
LLL-reduced basis, 265 Phase gate, 233
Lovacz condition, 265 Pointcheval’s conversion, 307
322 Index
Pollard’s 𝑝 − 1 method, 180 RSA-FDH, 207

Pollard’s rho algorithm, 26, 178, 193 RSA-OAEP, 175
Polynomial growth, 17 RSA-PSS, 207
Polynomial ring, 83
Preimage, 8 Safe prime, 191
Preimage resistance, 138 SageMath, 1
PRF, 49 Salsa20, 130
PRG, 46 Self-synchronizing stream cipher, 117
PRGA, 129 Set, 7
Prime number, 62 SHA-1, 142
Primitive polynomial, 124 SHA-2, 145
Primitive root, 79 SHA-3, 148
Probability distribution, 19 Shor’s algorithm, 243
Probability mass function, 21 Signature, 204
Probability space, 19 Singular point, 216
Proof by reduction, 48, 57 SIVP, 258
Pseudorandom function, 49 Size, 18
Pseudorandom generator, 46 Size-reduced basis, 264
Pseudorandom permutation, 50 Smooth curve, 216
Public-key encryption, 164 Soft-O notation, 19
Sphere-covering bound, 292
q-ary lattice, 257 Sphere-packing bound, 293
QFT, 242 Sponge construction, 148
QKD, 248 Square-and-multiply, 68
Quadratic sieve, 179 Stream cipher, 116
Quantum bit, 230 Strong primes, 181
Quantum computing, 230 Strong pseudorandom permutation, 50
Quantum Fourier Transform, 242 Subgroup, 75
Quantum gate, 232 Substitution-permutation network, 102
Quantum key distribution, 248 Surjective, 10
Qubit, 230 SVP, 258
Quotient ring, 86 Symmetric-key encryption, 33
Quotient set, 12 Synchronous stream cipher, 116
Syndrome, 289
Random bit generator, 24
Random oracle model, 138 Toffoli gate, 238
Random variable, 21 Transposition cipher, 34
Range, 8 Trapdoor permutation, 166
RC4, 128
Related-key attacks, 51 Unary string, 18
Relation, 11 Uniform distribution, 20
Residue classes, 12, 65, 86 Unitary, 94
Ring, 81 Units, 67
Ring-LWE, 281
RKA, 51 Variance, 21
Rounding function, 11 Vector space, 92
RSA, 166, 206 Vigenère cipher, 34
RSA assumption, 169 Von Neumann extractor, 24
Index 323
Walsh-Hadamard, 237
Weierstrass equation, 214
Worst-case complexity, 17
XOR, 9
PUBLISHED TITLES IN THIS SERIES
40 Heiko Knospe, A Course in Cryptography, 2019

36 Cesar E. Silva, Invitation to Real Analysis, 2019
35 Álvaro Lozano-Robledo, Number Theory and Geometry, 2019
34 C. Herbert Clemens, Two-Dimensional Geometries, 2019
33 Brad G. Osgood, Lectures on the Fourier Transform and Its Applications, 2019
32 John M. Erdman, A Problems Based Course in Advanced Calculus, 2018
31 Benjamin Hutz, An Experimental Introduction to Number Theory, 2018
30 Steven J. Miller, Mathematics of Optimization: How to do Things Faster, 2017
29 Tom L. Lindstrøm, Spaces, 2017
28 Randall Pruim, Foundations and Applications of Statistics: An Introduction Using R,
Second Edition, 2018
27 Shahriar Shahriari, Algebra in Action, 2017
26 Tamara J. Lakins, The Tools of Mathematical Reasoning, 2016
25 Hossein Hosseini Giv, Mathematical Analysis and Its Inherent Nature, 2016
24 Helene Shapiro, Linear Algebra and Matrices, 2015
23 Sergei Ovchinnikov, Number Systems, 2015
22 Hugh L. Montgomery, Early Fourier Analysis, 2014
21 John M. Lee, Axiomatic Geometry, 2013
20 Paul J. Sally, Jr., Fundamentals of Mathematical Analysis, 2013
19 R. Clark Robinson, An Introduction to Dynamical Systems: Continuous and Discrete,
Second Edition, 2012
18 Joseph L. Taylor, Foundations of Analysis, 2012
17 Peter Duren, Invitation to Classical Analysis, 2012
16 Joseph L. Taylor, Complex Variables, 2011
15 Mark A. Pinsky, Partial Differential Equations and Boundary-Value Problems with
Applications, Third Edition, 1998
14 Michael E. Taylor, Introduction to Differential Equations, 2011
13 Randall Pruim, Foundations and Applications of Statistics, 2011
12 John P. D’Angelo, An Introduction to Complex Analysis and Geometry, 2010
11 Mark R. Sepanski, Algebra, 2010
10 Sue E. Goodman, Beginning Topology, 2005
9 Ronald Solomon, Abstract Algebra, 2003
8 I. Martin Isaacs, Geometry for College Students, 2001
7 Victor Goodman and Joseph Stampfli, The Mathematics of Finance, 2001
6 Michael A. Bean, Probability: The Science of Uncertainty, 2001
5 Patrick M. Fitzpatrick, Advanced Calculus, Second Edition, 2006
4 Gerald B. Folland, Fourier Analysis and Its Applications, 1992
3 Bettina Richmond and Thomas Richmond, A Discrete Transition to Advanced
Mathematics, 2004
2 David Kincaid and Ward Cheney, Numerical Analysis: Mathematics of Scientific
Computing, Third Edition, 2002
1 Edward D. Gaughan, Introduction to Analysis, Fifth Edition, 1998
Pure and Applied
AMSTEXT
40
Sally
The UNDERGRADUATE TEXTS 40
SERIES
This book provides a compact course in modern cryptography. The mathemat-

ical foundations in algebra, number theory and probability are presented with a
focus on their cryptographic applications. The text provides rigorous definitions
and follows the provable security approach. The most relevant cryptographic
schemes are covered, including block ciphers, stream ciphers, hash functions,
message authentication codes, public-key encryption, key establishment, digital
signatures and elliptic curves. The current developments in post-quantum
cryptography are also explored, with separate chapters on quantum computing,
A Course in
A Course in Cryptography
lattice-based and code-based cryptosystems.
Many examples, figures and exercises, as well as SageMath (Python) computer
code, help the reader to understand the concepts and applications of modern
Cryptography
cryptography. A special focus is on algebraic structures, which are used in many
cryptographic constructions and also in post-quantum systems. The essential
mathematics and the modern approach to cryptography and security prepare
the reader for more advanced studies.
The text requires only a first-year course in mathematics (calculus and linear
algebra) and is also accessible to computer scientists and engineers. This book is Heiko Knospe
suitable as a textbook for undergraduate and graduate courses in cryptography
as well as for self-study.
For additional information

Knospe
and updates on this book, visit
AMSTEXT/40
Sally
The This series was founded by the highly respected
SERIES
mathematician and educator, Paul J. Sally, Jr.
2-color cover: PMS 432 (Gray) and PMS 300 (Blue) 344 pages • 50# paper • Backspace 1 3/8" • Trim Size: 7" x 10"

A Course in Cryptography

Uploaded by

Copyright:

Available Formats

A Course in Cryptography

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Course in Cryptography

Uploaded by

Copyright:

Available Formats

Pure and Applied

This book provides a compact course in modern cryptography. The mathemat-

For additional information

2010 Mathematics Subject Classiﬁcation. Primary 94A60;

For additional information and updates on this book, visit

Library of Congress Cataloging-in-Publication Data

Getting Started with SageMath 1

Chapter 2. Encryption Schemes and Definitions of Security 31

2.6. Chosen Plaintext Attacks 41

Chapter 3. Elementary Number Theory 61

Chapter 4. Algebraic Structures 73

Chapter 5. Block Ciphers 101

Chapter 6. Stream Ciphers 115

Chapter 7. Hash Functions 137

Chapter 8. Message Authentication Codes 151

Chapter 9. Public-Key Encryption and the RSA Cryptosystem 163

Chapter 10. Key Establishment 185

10.8. Summary 200

Chapter 11. Digital Signatures 203

Chapter 12. Elliptic Curve Cryptography 213

Chapter 13. Quantum Computing 229

Chapter 14. Lattice-based Cryptography 253

Chapter 15. Code-based Cryptography 285

Why is cryptography interesting? Firstly, cryptography is a classical subject with a fas-

Chapter 14 Chapter 13 Chapter 15

Dependence relationship between the chapters.

0.2. SageMath Command Line

0.3. Browser Notebooks

Figure 0.1. Creating a new browser notebook.

0.4. Computations with SageMath

Figure 0.2. SageMath code to print out the factorials.

We can also construct the residue class ring

1.1. Sets, Relations and Functions

There are several elementary set operations: 𝑀 ∪ 𝑁 (union), 𝑀 ∩ 𝑁 (intersection),

Table 1.1. XOR and AND operations.

Warning 1.9. A mathematical function is not the same as an algorithm. An algo-

Figure 1.1. Composition of functions.

Definition 1.10. Let 𝑓 ∶ 𝑋 → 𝑌 be a function.

• 𝑓 is surjective or onto if every element of the codomain 𝑌 is contained in the image

𝑓−1 ∘ 𝑓 = 𝑖𝑑𝑋 𝑓 ∘ 𝑓−1 = 𝑖𝑑𝑌

Figure 1.2. Inverse map and its properties.

Example 1.11. Let 𝑓 ∶ {0, 1}2 → {0, 1}2 be defined by

Invertible (bijective) maps play an important role in ciphering. An encryption map

Example 1.17. ⌊1.7⌋ = 1, ⌈−2.4⌉ = −2 and ⌊1.5⌉ = 2. ♢

Functions from 𝑋 to 𝑌 are particular cases of relations between 𝑋 and 𝑌 :

A function 𝑓 ∶ 𝑋 → 𝑌 defines a relation between 𝑋 and 𝑌 :

Definition 1.19. Let 𝑅 be a relation on 𝑋. Then 𝑅 is called an equivalence relation if it

If 𝑅 is an equivalence relation and (𝑥, 𝑦) ∈ 𝑅, then 𝑥 and 𝑦 are called equivalent

A generalization of the above example yields the residue classes modulo 𝑛:

The discussion of residue classes will be continued in Section 3.2.

One can view 𝑓 as a polynomial in 𝑛 variables with coefficients in ℤ2 (residue classes

We use SageMath to determine the algebraic normal form:

We find that 𝑓(𝑥0 , 𝑥1 ) = 𝑥0 𝑥1 + 𝑥1 + 1 mod 2. The algebraic degree is 2 and 𝑓 is

The binomial coefficients have the following formula:

The factorial increases very fast, for example

25! = 15511210043330985984000000 and 50! ≈ 3.04 ⋅ 1064 .

An efficient way to compute the binomial coefficients is the reduced formula

sage: for n in range (0 ,16):

Proposition 1.27. Let 𝑋 and 𝑌 be finite sets. Then

(1) |𝑋 × 𝑌 | = |𝑋| ⋅ |𝑌 | and |𝑋 𝑘 | = |𝑋|𝑘 for 𝑘 ∈ ℕ.