Basic Quantum Algorithms: Renato Portugal

arXiv:2201.
10574v3 [quant-ph] 23 Feb 2022
Basic Quantum Algorithms
Renato Portugal
Full Researcher at the

National Laboratory of Scientific Computing
LNCC/MCTI
February 25, 2022

Abstract
Quantum computing is evolving so quickly that forces us to revisit, rewrite, and update the
basis of the theory. Basic Quantum Algorithms revisits the first quantum algorithms. It started
in 1985 with Deutsch trying to evaluate a function at two domain points simultaneously. Then,
Deutsch and Jozsa created in 1992 a quantum algorithm that determines whether a Boolean
function is constant or balanced. In the next year, Bernstein and Vazirani realized that the same
algorithm can be used to find a specific Boolean function in the set of linear Boolean functions.
In 1994, Simon presented a new quantum algorithm that determines whether a function is one-
to-one or two-to-one exponentially faster than any classical algorithm for the same problem. In
the same year, Shor created two new quantum algorithms for factoring integers and calculating
discrete logarithms, threatening the cryptography methods widely used nowadays. In 1995,
Kitaev described an alternative version for Shor’s algorithms that proved useful in many other
applications. In the following year, Grover created a quantum search algorithm quadratically
faster than its classical counterpart. In this work, all those remarkable algorithms are described
in detail with a focus on the circuit model.
Contents
1 Introduction 1
2 Quantum Circuits 3
2.1 Review of linear algebra using the Dirac notation . . . . . . . . . . . . . . . . . . 3
2.2 Qubit and superposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Single-qubit gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Quantum states and entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 Two-qubit quantum gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Multiqubit gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.7 Circuit of a Boolean function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.8 Quantum parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Deutsch’s Algorithm 26
3.1 Problem formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Analysis of the algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Analysis of the entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5 Who implements the oracle? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.6 Economical circuit of Deutsch’s algorithm . . . . . . . . . . . . . . . . . . . . . . 31
4 Deutsch-Jozsa Algorithm 33
4.2 The quantum algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.5 Implementing the oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Bernstein-Vazirani Algorithm 40
5.2 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4 Bernstein-Vazirani algorithm has no entanglement . . . . . . . . . . . . . . . . . 43
5.5 Circuit of the oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.6 Economical circuit of the Bernstein-Vazirani algorithm . . . . . . . . . . . . . . . 45
6 Simon’s Problem 47
6.2 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.3 Analysis of the quantum part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.4 Analysis of the classical part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.6 Circuit of the oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.7 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7 Shor’s Algorithm for Factoring Integers 56

7.2 Preliminaries on number theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.3 Quantum operator for modular exponentiation . . . . . . . . . . . . . . . . . . . 58
7.4 Fourier transform and its inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.5 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.6 Analysis of the quantum part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.7 Analysis of the classical part . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.8 Circuit of the Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.9 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8 Shor’s Algorithm for Discrete Logarithm 74

8.1 Preliminaries on number theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.2 Lattices in finite vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.3 Special case: The order of a is a power of 2 . . . . . . . . . . . . . . . . . . . . . 77
9 Grover’s Algorithm 80
9.1 Problem formulation in terms of an oracle . . . . . . . . . . . . . . . . . . . . . . 80
9.2 How to implement the oracle on a quantum computer . . . . . . . . . . . . . . . 81
9.3 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.4 Non-economical circuit of Grover’s algorithm . . . . . . . . . . . . . . . . . . . . 82
9.5 Economical circuit of Grover’s algorithm . . . . . . . . . . . . . . . . . . . . . . . 83
9.6 Analysis of Grover’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.7 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10 Phase Estimation and Applications 90

10.1 Phase estimation algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.2 Application to order finding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10.3 Application to discrete logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.4 Application to quantum counting . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11 Final Remarks 101
Bibliography 103
Chapter 1
Introduction
Quantum algorithm is a subarea of quantum computing that is evolving quickly not only in terms
of new algorithms but also in terms of applications and implementations. The basic algorithms
are the pillars of this new edifice. The construction started with a change in the rules of the
game. Instead of storing information in bits, which take either zero or one, we are allowed to
store information in qubits, the state of which is a superposition of zeroes and ones. The rules
based on classical mechanics were replaced by rules based on quantum mechanics.
The first insight came with Deutsch’s proposal in 1985 of evaluating a one-bit Boolean
function at two points simultaneously by exploiting the superposition of zeroes and ones, which
is known as quantum parallelism. At this point in time, a framework for creating new algorithms
was missing, which was only established in 1989 when Deutsch published a paper on quantum
circuits, introducing the quantum gates that take place of the well-known classical gates, such as
AND, OR, NOT. In 1992, the area of quantum algorithm gained momentum when Deutsch and
Jozsa created an algorithm to determine whether a Boolean function is balanced or constant.
This algorithm stimulated the development of oracle-based algorithms. We have a function at
our disposal and we need to reveal a hidden property of this function. We can evaluate the
function as many times as we want, but the goal is to find the property by querying as few as
possible.
Bernstein and Vazirani noted in 1993 that the Deutsch-Jozsa algorithm could be used to
find a specific Boolean function in the set of linear Boolean functions. The Bernstein-Vazirani
algorithm is faster than its classical counterpart without exploiting entanglement. The power
of the algorithm relies only on quantum parallelism.
The momentum increased even further when Simon in 1994 published a quantum algorithm
that distinguishes whether a function is one-to-one or two-to-one exponentially faster than clas-
sical algorithms for the same purpose. In this case, we have to extend the function type in order
to have an output with many bits. This algorithm exploits maximum entanglement. Simon’s
problem has an interesting application to find a hidden subgroup of a special class of groups.
Shor reached the summit when he created in 1994 two celebrated quantum algorithms for
factoring composite integers and calculating discrete logarithms, which have a strong impact
in breaking cryptography methods used in our daily life. Shor’s algorithms helped to bring
quantum computing into the spotlight. Since then, the area is increasing at an amazing rate.
Shor’s algorithm can also be formulated as an oracle-based algorithm with a function that is
periodic. The goal is to find the period by evaluating the function as few as possible. To find
period is a job for the Fourier transform, which is useful in classical algorithms despite being
1
O(N log N ), where N is the data size. In the quantum case, the Fourier transform can be
implemented using O(log2 N ) universal gates and the quantum superposition makes the magic.
Kitaev presented in 1995 another version of Shor’s algorithms after developing a quantum
algorithm for phase estimation. Given a unitary operator and one of its eigenvectors, the algo-
rithm efficiently finds the corresponding eigenvalue, which is totally characterized by its phase.
Kitaev’s algorithm proved useful for other applications such as quantum counting.
Grover put the focus on unsorted databases and in 1996 created a quantum algorithm that
finds an item quadratically faster than classical searching. Grover’s algorithm can also be for-
mulated as an oracle-based algorithm with a Boolean function that is constant except for a
single point in the domain. The goal is to find that point by evaluating the function as few as
possible. When it is written as a black-box algorithm, it is clear that Grover’s algorithm has
wide applicability.
Basic Quantum Algorithms describes in full detail the remarkable contributions outlined
above. There is no hope if we don’t use the correct language: Mathematics, more specifically,
linear algebra. Notions such as quantum superposition and entanglement have a precise defini-
tion when we use linear algebra. Measurements are described by projectors, gates by unitary
operators, and qubits by vectors. Projectors, unitary operators, and vectors are words of the
linear algebra language.
Basic Quantum Algorithms follows as much as possible the historical ordering, which is the
order of increasing complexity. We feel as climbing steps with increasing heights, strengthening
muscles, and preparing for the challenge of understanding complex quantum algorithms. Each
Chapter is as independent as possible to help readers that are familiar with some algorithms
skip some parts.
Last but not least, do not hesitate in contacting the author (portugal@lncc.br) if there
are errors or any problems in terms of imprecision or missing citations. Suggestions are also
welcome.
2
Chapter 2
Quantum Circuits
The goal of this Chapter is to define the concepts of qubit, logic gate, and quantum circuit.
Before that, we briefly review key facts of linear algebra [1, 59] using the Dirac notation from
the beginning. Part of this material was published in the tutorial [45]. References for this
Section are [49] and [62]. Additional references for linear algebra for quantum computing are
Appendix A of [44] and Section 2.1 of [40].
2.1 Review of linear algebra using the Dirac notation

There are several notations to show that a variable v is a vector, for example, ~v , v, and so on.
In quantum computing, the most used notation is Dirac’s: |vi. A sequence of vectors is denoted
by |v0 i, |v1 i, |v2 i and so on. It is very common to abuse this notation and denote the same
sequence as |0i, |1i, |2i and so on.
The canonical basis of a two-dimensional vector space has two vectors, which are denoted
by {|0i, |1i} in the Dirac notation, where |0i and |1i have the following representation

1 0
|0i = and |1i = .
0 1
These vectors have two entries or components, unit length, and are orthogonal. Then, this basis
is orthonormal. It is called canonical basis in linear algebra and computational basis in quantum
computing. Note that |0i is not the null vector, but the first vector of the canonical basis. All
entries of the null vector are equal to 0. In the two-dimensional case, it is

0
0
without any special denomination in the Dirac notation.

A generic vector in a two-dimensional vector space is obtained via the linear combination of
the basis vectors,
|ψi = α|0i + β|1i,
where α and β are complex numbers. These numbers are the entries of vector |ψi, as can be
seen from the notation
α
|ψi = .
β
3
The dual vector to vector |ψi is denoted by hψ| and is obtained by transposing |ψi and conju-
gating each entry. Using the previous equation, we obtain
hψ| = α∗ β ∗ ,

which can be written as

hψ| = α∗ h0| + β ∗ h1|,
where
h0| = 1 0 and h1| = 0 1 .
The dual vector hψ| is an 1 × 2 matrix and vector |ψi is a 2 × 1 matrix. At this point, we
introduce the dagger symbol, denoted by †, which is the notation for the conjugate transpose
vector (transpose the vector and then conjugate each entry or vice versa). Then, we may write
hψ| = |ψi† and |ψi = hψ|† . Two daggers one after the other work as the identity operation.
Suppose that |ψ1 i and |ψ2 i are two-dimensional vectors given by

α γ
|ψ1 i = and |ψ2 i = .
β δ

The inner product of two vectors |ψ1 i and |ψ2 i is a complex number denoted by ψ1 ψ2 and
defined as the matrix product of the dual vector hψ1 | by |ψ2 i, as follows

γ
∗ ∗ = α∗ γ + β ∗ δ.

ψ1 ψ2 = α β
δ
In the Dirac notation, the calculation of the inner product is performed by distributing the
matrix product over the sum of vectors, as follows
ψ1 ψ2 = α∗ h0| + β ∗ h1| · γ|0i + δ|1i = α∗ γ 00 + β ∗ δ 11 = α∗ γ + β ∗ δ.

The norm of vector |ψ1 i is denoted by k |ψ1 i k and defined as

q
p
k |ψ1 i k = ψ1 ψ1 = |α|2 + |β|2 ,
where |α| is the absolute value of α, that is

√
|α| = α · α∗ .
√
If α = a + b i, where i is the imaginary unit (i = −1), a is the real part and b is the imaginary
part, then p p
|α| = (a + b i) · (a − b i) = a2 + b2 .
A complex number α such that |α| = 1 is called a unit complex number and can be written as
eiθ = cos θ + i sin θ, where θ is an angle. In real vector spaces, the inner product is called scalar
product and is given by

ψ1 ψ2 = k |ψ1 ik k |ψ2 ik cos θ,
where θ is the angle between vectors |ψ1 i and |ψ2 i.
4
Using these definitions, we can show that the basis {|0i, |1i} is orthonormal, that is, vectors
|0i and |1i are orthogonal and have norm 1, that is

00 = 1, 01 = 0, 10 = 0, 11 = 1.
An algebraic way of denoting orthonormality and of compacting the last four equations into one
is

k ` = δk` ,
where k and ` are bits and δk` is the Kronecker delta, defined as
(
1, if k = `,
δk` =
0, if k 6= `.
2.2 Qubit and superposition

The basic memory unit of a classical computer is the bit, which assumes 0 or 1. Usually, the
bit is implemented using two distinct voltages, following the convention that null or low voltage
represents bit 0 and high voltage represents bit 1. To determine whether the output is bit 0 or
1 at the end of the computation, it is necessary to measure the voltage.
The basic memory unit of a quantum computer is the qubit, which also assumes, at the end
of the computation, 0 or 1. The qubit is implemented using an electric current in a small super-
conductor, following the convention that clockwise current represents 0 and counter-clockwise
current represents 1, or the other way around. The difference from the classical device happens
during the computation since the qubit admits the simultaneous coexistence of 0 and 1. Dur-
ing the computation, that is, before the measurement, the state of a qubit is represented by a
norm-1 two-dimensional vector and the states of a qubit corresponding to 0 and 1 are |0i and
|1i. The definition of state is a vector of norm 1 in a complex vector space endowed with the
inner product presented in the previous Section.1 The state can be thought of as the “value”
of the qubit before the measurement. Quantum coexistence is represented mathematically by a
linear combination of orthonormal vectors as follows
|ψi = α|0i + β|1i,
where α and β are complex numbers that obey the constraint
|α|2 + |β|2 = 1.
The state of the qubit is vector |ψi of norm 1 with the entries α and β. The complex numbers
α and β are the amplitudes of the state |ψi.
The coexistence of bits 0 and 1 cannot be implemented in a classical device, since it is not
possible to have low and high voltage at the same time, as everyone knows. In quantum mechan-
ics, it is hard to believe, it is possible to have a quantum system (usually microscopic) with low
and high voltage at the same time. This coexistence can only be fully maintained if the quantum
system is fully isolated from the surrounding macroscopic environment. When we measure the
quantum system to determine the value of the voltage, the measuring device inevitably affects
1
A finite-dimensional vector space with an inner product is a Hilbert space.
5
the voltage, outputting a stochastic result, which is a low or high voltage, similar to the classical
bit. In other words, coexistence is only maintained when no one (and no thing) is trying to de-
termine whether the voltage is high or low. Note that quantum mechanics is a scientific theory,
that is, its laws and results can be tested objectively in laboratories. In addition, unnecessary
laws and statements are summarily discarded. Therefore, the statement “coexistence is only
maintained when the system is isolated” has practical consequences and is a statement that has
been tested and re-tested for over 100 years in thousands of quantum mechanics laboratories
worldwide. On the other hand, alternative theories that are more classically palatable, which
could help in understanding or visualizing the quantum superposition, have been ruled out by
experimental tests.
From a computational point of view, we have a qubit in superposition and we use this feature
in a circuit. For example, the circuit
|ψi 0 or 1
tells us that the initial “value” of the qubit is |ψi and this information is conveyed unchanged
from left to right until a measurement is performed, as shown by the meter (the display of a
voltmeter). The measurement outputs 0 or 1. Classical information is conveyed by a double
wire. If the state of the qubit is |ψi = |0i, a measurement will necessarily output 0 and if
the state is |1i, a measurement will necessarily output 1. In the general case, if the state is
α|0i + β|1i, a measurement will return 0 with probability |α|2 or 1 with probability |β|2 , as
shown in the circuit (
0, with probability |α|2 ,
α|0i + β|1i
1, with probability |β|2 .
The output can be depicted by a histogram of the probability distribution. It is important to
repeat the fact that α and β are called amplitudes of the state α|0i + β|1i and are numbers that
can be negative and may have an imaginary part. On the other hand, |α|2 and |β|2 are positive
real numbers in the interval [0, 1] and are called probabilities. A careless interchange between
amplitudes and probabilities creates unforgivable errors.
z
0
θ
-
+i
-i
+ φ y
x

Figure 2.1: Bloch sphere and the location of states |0i, |1i, |±i, and |±ii. An arbitrary state
|ψi is shown with spherical angles θ and ϕ.
6
The state of a qubit can be characterized by two angles θ and ϕ as follows
θ θ
|ψi = cos |0i + eiϕ sin |1i,
2 2
where 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π. This notation shows that there is a one-to-one correspondence
between the set of states of a qubit and points on the surface of a sphere of radius 1, called Bloch
sphere. The angles θ and ϕ are spherical angles that describe the location of the state |ψi, as
shown in Fig. 2.1. A point on the Bloch sphere is described by a three-dimensional vector with
real entries  
sin θ cos ϕ
 sin θ sin ϕ  .
cos θ
The locations on the Bloch sphere of states |0i and |1i, whose spherical angles are (θ, ϕ) = (0, 0)
and (θ, ϕ) = (π, 0), are pinpointed in Fig. 2.1. The locations of states
|0i ± |1i
|±i = √ ,
2
|0i ± i|1i
|±ii = √
2
are determined after figuring out the spherical angles, which are (θ, ϕ) = (π/2, π/2 ∓ π/2) and
(θ, ϕ) = (π/2, ±π/2), respectively.2 Then, their locations are the intersections of the x-axis with
the Bloch sphere, and y-axis with the Bloch sphere.
If we have an arbitrary 1-qubit state α|0i + β|1i and we want to find the spherical angles θ
and ϕ, the first thing to do is to write α and β as r1 eiϕ1 and r2 eiϕ2 , respectively, where r1 = |α|
and r2 = |β|. Now we multiply the state by e−iϕ1 to obtain r1 |0i + ei(ϕ2 −ϕ1 ) r2 |1i. Then, we
take ϕ = ϕ2 − ϕ1 and θ = 2 arccos r1 . Note that r2 = sin(θ/2) because r12 + r22 = 1. There is no
problem in multiplying the state by a unit complex number such as e−iϕ1 because in quantum
mechanics two quantum states that differ by a global factor are considered the same. The global
factor must be a unit complex number and is usually called global phase.
2.3 Single-qubit gates

A single-qubit gate is a two-dimensional unitary square matrix. A matrix U is unitary if mul-
tiplying U by an arbitrary norm-1 vector results in a norm-1 vector. Formally, suppose that
|ψ 0 i = U |ψi, where |ψi is a norm-1 two-dimensional vector. If U is a unitary matrix, then |ψ 0 i
will have norm 1. For example, the Hadamard matrix

1 1 1
H=√
2 1 −1
is unitary. Therefore, the multiplication of H by the basis vectors has to result in norm-1 vectors.
In fact,
1 1 1 1 1
H|0i = H = √ = √ |0i + √ |1i.
0 2 1 2 2
2
Do not try to understand ± simultaneously because our mind enters in an undesired, superposed, entangled
state. Do please address first the upper sign of all expressions. Then, address the lower sign.
7
Note that the resulting vector has norm 1. We denote this vector by |+i, that is
1 1
|+i = √ |0i + √ |1i.
2 2
Multiplying H by |1i yields vector |−i defined as
1 1
|−i = √ |0i − √ |1i,
2 2
which also has norm 1. These calculations are important because we need to learn the output
of the gate. If the input is |0i, the output will be |+i. If the input is |1i, the output will be |−i.
If the input is a superposition α|0i + β|1i, the output will be the superposition of |+i and |−i
with the same amplitudes, α|+i + β|−i, because we use the linearity property of the gate, that
is, instead of thinking that H is a matrix, we use that H is a linear operator and if H is applied
to a linear combination of vectors |0i and |1i with amplitudes α and β, the result is a linear
combination of H|0i and H|1i with the same amplitudes α and β. We can avoid this abstract
point of view, but when we multiply a matrix to a sum of vectors, we have to distribute the
multiplication over the vectors.
To check that the multiplication of H by the vectors of an orthonormal basis results in
norm-1 vectors is not enough to show that H is a unitary matrix.
It is also necessary to show
that the resulting vectors are orthogonal, that is, to show that −+ = 0.
A quantum circuit is a graphic representation of a quantum algorithm. The input qubit is on
the left and the information (qubit’s state) is conveyed unchanged from left to right until it faces
a logical gate. The gate receives the input from the left, then processes the information, and
the result exits to the right and continues its course. The gate processing is done by multiplying
the unitary matrix that represents the gate by the vector that represents the state of the qubit.
For example, the expression |+i = H|0i is represented by the following circuit:
|0i H |+i.
The input is vector |0i, which is conveyed unchanged by the wire to H, which acts on the input
and transforms it into |+i, which is then conveyed to the right. The gate action is calculated
by multiplying H to |0i. Therefore, the result of the computation is |+i. If at the end of the
computation we perform a measurement, the circuit is
(
0, with probability 12 ,
|0i H
1, with probability 12 .
It shows that the output of the measurement of the qubit, whose state was |+i, is 0 with
probability 1/2 or 1 with the same probability. Fig. 2.2 shows the histogram of the probability
distribution generated in Qiskit3 with two iterations.
An example that is simpler than the previous one is the X gate, defined as

0 1
X= .
1 0
3
Qiskit is open-source software for running programs on IBM quantum computers.
8
Figure 2.2: Histogram of the probability distribution generated by measuring a qubit in state
|+i two times.
X is the quantum NOT gate because |1i = X|0i and |0i = X|1i. We can verify these equations
through the matrix multiplication of X with |0i and |1i. In a more compact form, we can write
|j ⊕ 1i = X|ji, where ⊕ is the XOR operation or sum modulo 2. Because of this, the gate X is
also represented as ⊕. A circuit using the X gate is
|0i X 1 with probability 1.
Now we can increase the complexity. How to generate a superposition such that the ampli-
tudes of α|0i + β|1i are different and non-null? For example, how to generate a state α|0i + β|1i
such that |α|2 = 25% and |β|2 = 75% before the measurement? The answer is to use the most
general 1-qubit gate, whose algebraic expression is
cos 2θ −eiλ sin 2θ

U (θ, φ, λ) = .
eiφ sin 2θ ei(λ+φ) cos 2θ
After applying U (θ, 0, 0) on |0i, we obtain

θ θ
U (θ, 0, 0)|0i = cos |0i + sin |1i.
2 2
We must choose θ = 2π/3, because cos2 (π/3) = 1/4. The previous example using the Hadamard
gate can be reproduced by taking θ = π/2 because H = U (π/2, 0, 0).
U (θ, φ, λ) is a wildcard gate because it replaces all other 1-qubit gates. For instance, three
useful gates obtained from U are
cos 2θ −i sin 2θ

U (θ, π/2, −π/2) = Rx (θ) = ,
−i sin 2θ cos 2θ
cos 2θ − sin 2θ

U (θ, 0, 0) = Ry (θ) = ,
sin 2θ cos 2θ

iλ/2 1 0
U (0, 0, λ) = e Rz (λ) = .
0 eiλ
where Rx , Ry , and Rz are the operators that rotate the Bloch sphere about the x, y, and z-axis,
respectively. Unfortunately, U is too perfect to be true for it would be possible to choose θ as
9
small as we want at the cost of one single gate. Errors would eventually prevent obtaining good
results after choosing very small angles.
To determine the computational complexity of an algorithm, we have to restrict ourselves to
a finite set of single-qubit gates. The most important 1-qubit gates are

1 0 0 1 0 −i 1 0
I2 = , X = , Y = , Z =
0 1 1 0 i 0 0 −1
known as Pauli matrices,

" # " #
1 1 1 1 0 1 0 1 0 1 0
H=√ , S= , S† = , T = iπ , T† = iπ
2 1 −1 0 i 0 −i 0 e4 0 e− 4
known as Hadamard gate, phase gate, its conjugate gate, π/8 gate or T gate and its conjugate.
iπ
The complex numbers e± 4 are equal to
iπ 1±i
e± 4 = √ .
2
Every quantum circuit without meters has an equivalent algebraic expression. For example, if
A, B, C are single-qubit gates, the circuit
|0i A B C |ψi
is equivalent to the algebraic expression
|ψi = C · B · A · |0i,
where · is the matrix product, which is usually omitted. Note that the algebraic expression
equivalent to the circuit has the reverse order. Therefore, the last circuit can be also be written
as
|0i CBA |ψi,
where CBA is a 2 × 2 unitary matrix. For example, the following circuits are equivalent:
H X H ≡ Z ,
because Z = HXH. The equivalent algebraic expression can be used to simplify the circuit and
to predict its output.
In quantum computing, we have to take advantage of classical computing notation. For
instance, think that ` in |ì is a classical bit. Then, |ì represents both |0i and |1i. This
notation is used in |` ⊕ 1i = X|ì. Similarly, we have (−1)` |ì = Z|ì and (−1)` i |` ⊕ 1i = Y |ì.
Those expressions are used to determine the output of gates X, Y , and Z when the input is a
vector of the computational basis. For the gate U (0, 0, λ), we have eiλ` |ì = U (0, 0, λ)|ì, which
includes S, T , and their conjugate gates as subcases. All those gates don’t create superposition.
The only one that does is the Hadamard gate, whose output is
|0i + (−1)` |1i

√ = H|ì.
2
10
Now we are ready to show that
|0i+(−1)`⊕1 |1i
|ì X H √
2
and
(−1)` |0i+|1i
|ì H X √ .
2
With a little more effort, we show that

√ √
2+1+i 2−1−i
|0i H T H √
2 2
|0i + √
2 2
|1i.
Implementing on IBM quantum computers.

At this point, it is a good idea to use IBM’s composer. After logging in IBM’s website4 (regis-
tration is needed), we have to launch the composer by clicking on Launch Composer . IBM’s
composer is friendly because we can drag the available gates into the circuit. Let us keep it
to a basic use at this moment. Fig. 2.3 shows a circuit with the Hadamard gate followed by
a measurement. We simply drag H and drop it on the first wire of the circuit, then we drag
the meter and drop it after H. The meter’s arrow shows that the output is re-directed to an
auxiliary classical register at the bottom of the circuit.
Figure 2.3: Example of a circuit with a Hadamard gate and a meter (Reprint Courtesy of IBM
Corporation ©)
After the circuit is ready, we click on Setup and run , and then we have two options: (1) Run
the circuit on a quantum computer by selecting one of the available quantum systems, or (2) sim-
ulate the circuit by selecting a simulator. It is usually better to start with the second option.
We select ibm qasm simulator as the provider, then we select the number of shots and then we
click on Run at the bottom. Fig. 2.4 shows the output of an execution. The result 000 was
obtained 503 times and 001 was obtained 521 times out of 1024 shots. The first two bits must
be discarded because they refer to qubits that have not been used (qubits q2 and q1 ). The
output uses the order q2 q1 q0 , different from the order adopted in most textbooks on quantum
computing. As we can see, the composer runs the experiment several times (up to 8192) and
shows the histogram of the cumulative frequency distribution. In the case of the Hadamard gate,
the most probable result is 50% each, but results close to 50% have non-negligible probabilities
and frequently occur. To obtain results closer to 50%, we have to increase the number of shots.
4
https://quantum-computing.ibm.com/
11
Figure 2.4: Output of the circuit with a Hadamard gate and a meter (Reprint Courtesy of IBM
Corporation ©)
If we choose to run on a quantum computer, the circuit will be queued and may take a long
time. We can check the size of the queue when we are selecting the system. The output of
the quantum computer is usually different from the simulator because errors degrade the result.
Increasing the number of shots doesn’t guarantee that the probability distribution tends towards
the correct distribution, and it is important to check the correctness of the circuit through the
simulator before running on the quantum system.
2.4 Quantum states and entanglement

The state of two qubits is described by a vector of norm 1 that belongs to a four-dimensional
vector space because there are four possible results after measuring the qubits: 00, 01, 10, 11.
The first bit refers to the first qubit and the second bit to the second qubit, as is usual in
textbooks on quantum computing, and the least significant bit is on the right, as usual.
Following the laws of quantum mechanics, there is a one-to-one correspondence between the
canonical basis and the possible measurement outcomes as follows:
       
1 0 0 0
 0   1   0   0 
|00i = 
 0  , |01i =  0  , |10i =  1  , |11i =  0  .
      
0 0 0 1
In the classical case, the state of two bits is either 00 or 01 or 10 or 11, exclusively. In the
quantum case, the state of two qubits is the linear combination
|ψi = a0 |00i + a1 |01i + a2 |10i + a3 |11i,
where a0 , a1 , a2 , and a3 are complex numbers. When the state of two qubits is |ψi, the
outcome of a measurement in the computational basis is either 00 or 01 or 10 or 11, exclusively
and stochastically; there is no way of predicting the measurement outcome even knowing |ψi.
However, if we know |ψi, then we know the probabilities of outcomes, which are

2
prob(00) = 00ψ = |a0 |2 ,

2
prob(01) = 01ψ = |a1 |2 ,

2
prob(10) = 10ψ = |a2 |2 ,

2
prob(11) = 11ψ = |a3 |2 .
12
The sum of those probabilities is 1. If we don’t know state |ψi, a single measurement doesn’t
allow the determination of |ψi, that is, we cannot find the amplitudes a0 , a1 , a2 , and a3 . There
is an important theorem in quantum mechanics known as non-cloning theorem [18, 42, 61].
Theorem 2.1. (No cloning) Using unitary operators, it is impossible to make an identical copy
of an arbitrary unknown quantum state that is available to us.
This theorem severely restricts any possibility of determining |ψi through measurements.
However, if we can generate |ψi again and again, for example, through a circuit, we can repeat
the whole process several times and obtain an approximation for prob(00), prob(01), prob(10),
and prob(11). For example, repeating 1000 times, we can determine these probabilities with two
digits. Unfortunately, we are still unable to determine |ψi exactly because knowing |a0 |2 doesn’t
allow us to determine a0 exactly. It may seem that this is unimportant—false. Let’s consider a
critical example. Suppose that
1 1
prob(00) = , prob(01) = 0, prob(10) = 0, prob(11) = .
2 2
We have at least two possibilities for |ψi:
|00i + |11i |00i − |11i
|ψ1 i = √ and |ψ2 i = √ .
2 2
Note that |ψ1 i and |ψ2 i are orthogonal. This shows that we can make a serious mistake. We
don’t know whether two circuits are equivalent when their probability distributions are the same.
At this point, the following question is relevant: Suppose that we know |ψi, is it possible to
determine the state of each qubit? The answer is “depends on |ψi”. If |ψi is one of the states of
the computational basis then we know the state of each qubit. For example, suppose |ψi = |10i.
We have to factorize |ψi as
|10i = |1i|0i = |1i ⊗ |0i,
where ⊗ is called Kronecker product. When factorization is successful, we know the state of each
qubit. In this case, the state of the first qubit is |1i and the state of the second is |0i. When we
write |1i|0i, the Kronecker product is implicitly assumed.
The Kronecker product5 of two vectors or two matrices is defined as follows. Let A be a
m × n matrix and B a p × q matrix. Then,
 
a11 B · · · a1n B
A⊗B =
 .. .

.
am1 B · · · amn B
The result is a mp × nq matrix. The Kronecker product of vectors |1i and |0i is calculated by
viewing these vectors as 2 × 1 matrices and is given by
 
1  
0 0  0
0 1   0
|1i ⊗ |0i = ⊗ =   =   = |10i.
1 0 
 1 
 1
1 0
0
5
There is a more abstract formulation of the Kronecker product called tensor product. In this work, we use
these terms interchangeably. The notation A ⊗ B reads “A tensor B”; however, the terms tensor and tensor
product have no relation to tensors (a generalization of matrices) or product of tensors.
13
Note that the Kronecker product is noncommutative. For example, |1i ⊗ |0i = 6 |0i ⊗ |1i. An
important hint is to never change the order of the Kronecker product.
From the laws of quantum mechanics, if the state of a qubit is |ψ1 i and the state of a second
one is |ψ2 i, then the state |ψi of the composite system, two interacting qubits, will initially be
|ψi = |ψ1 i ⊗ |ψ2 i.
We can always obtain the state of the composite system when we know the states of the parts.
However, the reverse process is not possible in general. For example, suppose that the state of
two qubits is
|00i + |11i
|ψi = √ .
2
We want to find 1-qubit states |ψ1 i = α|0i + β|1i and |ψ2 i = γ|0i + δ|1i such that
|00i + |11i
(α|0i + β|1i) ⊗ (γ|0i + δ|1i) = √ .
2
Expanding the left-hand side, we obtain the following system of equations:
1 1
αγ = √ , αδ = 0, βγ = 0, βδ = √ .
2 2
Since this system has no solution, the 2-qubit state |ψi cannot be written as the Kronecker
product of 1-qubit states. In quantum mechanics, a composite quantum system may have a
definite state |ψi while parts of the system have no definite state.
A quantum state of a composite system that cannot be factorized in terms of the Kronecker
product is called an entangled state. Entangled states are very important in quantum comput-
ing because without them the computational power of the quantum computer would be badly
impaired. However, note that the presence of entanglement in a quantum algorithm doesn’t
guarantee that this algorithm is more efficient than a classical algorithm.
The term “definite state” |ψi in quantum mechanics means pure state. A state of a quantum
system is called pure state if we are 100% sure that the system is described by a norm-1 vector
|ψi. On the other hand, if we are not 100% sure, that is, if we know that the state of the system
is |ψ1 i with probability 0 < p < 1 or |ψ2 i with probability 1 − p, then the state is mixed and is
represented by an ensemble or a matrix ρ such that Tr(ρ) = 1. When the state of a composite
quantum system is entangled, the state of any sub-system is always a mixed state.
We can generalize the discussion of this Section to n qubits, where n ≥ 1. The computational
basis has of 2n vectors, each vector with 2n entries,
     
1 0 0
 0   1   0 
|0 · · · 00i =  .  , |0 · · · 01i =  .  , · · · , |1 · · · 11i =  .  .
     
.
 .  .
 .   .. 
0 0 1
Note that the binary number inside the ket, for instance, 0 · · · 0 in |0 · · · 0i, has n bits and the
state itself is the Kronecker product of n 1-qubit states. The binary number 0 · · · 0 can be
written in the decimal notation as |0 · · · 0i → |0i. Each binary number inside the kets can be
written in the decimal notation as
|0 · · · 00i → |0i, |0 · · · 01i → |1i, |0 · · · 10i → |2i, ..., |1 · · · 11i → |2n − 1i.
14
A generic state |ψi belongs to a 2n -dimensional vector space. Then,
|ψi = a0 |0i + a1 |1i + a2 |2i + · · · + a2n −1 |2n − 1i,
where
|a0 |2 + |a1 |2 + |a2 |2 + · · · + |a2n −1 |2 = 1.
After a measurement of all qubits, we obtain a n-bit string stochastically. The outcome is either
the n-bit string 0 · · · 00 with probability |a0 |2 , or the n-bit string 0 · · · 01 with probability |a1 |2 ,
and so on. We have a sample space comprising those n-bit strings and a probability distribution
given by prob(`) = |a` |2 , where ` is a n-bit string. The measurement outcome is a random
variable that takes a value ` in this sample space with probability prob(`).
As we have said before, a vector of the computational basis of n qubits can be written as
the Kronecker product of n 1-qubit vectors. For example, for n = 3 qubits, we can obtain the
second vector of the computational basis using the Kronecker product as
 
0
1
 
0
 
1 1 0 0
|0i ⊗ |0i ⊗ |1i = ⊗ ⊗ =  0 = |001i.

0 0 1  
0
 
0
0
In the decimal notation, |0i can be confused with the state of 1 qubit. To avoid confusion, we
have to know what is the number of qubits. For example, if |0i refers to the state of 3 qubits in
the decimal notation, then in binary we have |000i.
In this Section, we have defined entangled states. It is not a good idea to try to under-
stand this concept without using mathematics at the outset. Learning the basic definition of
entanglement is similar to learning what is a prime number in basic arithmetic. Suppose we
know nothing and we are going to learn the basics of arithmetic starting with addition. After
attending classes, doing many exercises, memorizing tables, we master the concept of addition.
The next step is multiplication. We have to pass through the same ordeal of classes, exercises,
and table memorization, but eventually, we master the concept of multiplication. Then we are
ready to learn the reverse process, that is, factorization. We learn that the factors of 15 are 3
and 5. Now we try 17. We have to find two positive numbers strictly less than 17 so that when
we multiply those numbers we obtain 17. We check that there are no such numbers. Then we
are ready to understand the concept of prime numbers, which is very important in mathematics.
If we understand prime numbers, then we are ready to understand entangled states. But this
time we have to learn direct sum of vectors, and then Kronecker product of vectors, and then
we try to factorize a vector. That is, we have to find two vectors |ψ1 i and |ψ2 i that have fewer
entries than |ψi so that |ψi = |ψ1 i ⊗ |ψ2 i. The entangled states are the irreducible vectors
of composite quantum systems in terms of the Kronecker product. A single-qubit state is not
entangled because a single-qubit is not a composite system. We need at least two qubits. There
is a long road ahead, of course, but two qubits is a good starting point. In the next Section, we
see how to produce an entangled state in the quantum computer.
15
2.5 Two-qubit quantum gates
The most important 2-qubit gate is CNOT or controlled-NOT gate, also denoted by C(X) or
CX. It is defined as
CNOT |ki|ì = |kiX k |ì,
and is represented by the circuit
|ki • |ki
|ì X k |ì = |` ⊕ ki,
where k and ` are bits. The state of the first qubit (control ) doesn’t change after applying
CNOT. The state of the second qubit (target) changes only if bit k is 1. In this case the output
is X|ì = |` ⊕ 1i. If k = 0 then X 0 = I2 and I2 |ì = |ì, where I2 is the 2 × 2 identity matrix.
We have defined CNOT by showing its action on the vectors of the computational basis. In
linear algebra, this definition is complete, because to know the action of CNOT on an arbitrary
vector, which is a linear combination of vectors of the computational basis, we use the linearity
of this gate. For example, in the circuit below the first input is in superposition:
|0i+|1i
•

√
2  |00i+|11i
√
2
.

|0i
What is the output? The best way to determine the output is via algebraic calculations. After
using the distributive property of the Kronecker product over the sum of vectors, the input to
the circuit is
|0i + |1i 1 1 |00i + |10i
√ ⊗ |0i = √ |0i ⊗ |0i + √ |1i ⊗ |0i = √ .
2 2 2 2
To calculate the action of CNOT on a sum of vectors, we use the linearity of the matrix product,
that is,
|00i + |10i 1 1
CNOT · √ = √ CNOT · |00i + √ CNOT · |10i,
2 2 2
where CNOT · |00i denote the multiplication of the CNOT matrix by vector |00i. Using the
definition given at the beginning of the Section, we obtain CNOT|00i = |00i and CNOT|10i =
|11i, and we confirm that the output is
|00i + |11i
√ .
2
Since this result is an entangled state, we cannot factorize it and therefore we cannot write the
output for each qubit.
The same result is obtained if we use the matrix representation, which is
 
1 0 0 0
I 0 1 0 0
CNOT = 2 =0 0 0 1 ,

X
0 0 1 0
16
and the representation of |00i and |10i as 4-dimensional vectors, that is,
       
1 0 0 0 1 1 0 0 0 0 1
1 0 1 0 0 0
   1 0 1 0
 0 0
   1 0
√  +√  =√  .
2 0 0 0 1 0 2 0 0 0 1 1
   2 0
0 0 1 0 0 0 0 1 0 0 1
The complete circuit that implements the entangled state above when the initial state of the
quantum computer is |00i is
|0i •

H 

00 with probability 0.5,
11 with probability 0.5.

|0i
Since the first qubit is initially in state |0i, we have to use H to generate (|0i + |1i)/2. In fact,
we have
|00i + |11i
CNOT · (H ⊗ I) |00i = CNOT · H|0i ⊗ |0i) = √ .
2
The next example is simpler than the previous one. Consider the circuit without measure-
ments
|0i H |+i
|0i H |+i.
What is the output? Usually, to calculate the output, we convert the circuit to its equivalent
algebraic expression. For this circuit, we have (H ⊗ H)|00i. The calculation is performed in the
following way:
(H ⊗ H)|00i = (H ⊗ H) · (|0i ⊗ |0i) = (H|0i) ⊗ (H|0i) = |+i ⊗ |+i.
In the second equality, we use the following property of the Kronecker product:
(A ⊗ B) · (C ⊗ D) = (A · C) ⊗ (B · D),
for matrices A, B, C, D, or
(A ⊗ B) · (|ψ1 i ⊗ |ψ2 i) = (A|ψ1 i) ⊗ (B|ψ2 i),
which is valid for any matrices A and B and vectors |ψ1 i and |ψ2 i as long as the number of
entries of the vectors is equal to the corresponding number of columns of the matrices. So the
output of the circuit is
|0i + |1i |0i + |1i |0i + |1i + |2i + |3i

|+i ⊗ |+i = √ ⊗ √ = .
2 2 2
Let us consider another example. Take the following circuit without measurements:
|0i H |+i
|0i |0i.
17
How to calculate the output using the equivalent algebraic expression? The hint is to use the
following equivalent circuit:
|0i H |+i
|0i I2 |0i,
where I2 is the two-dimensional identity matrix. The algebraic calculation is done as follows:
|00i + |10i
(H ⊗ I2 )|00i = (H|0i) ⊗ (I2 |0i) = |+i ⊗ |0i = √ .
2
When we convert a quantum circuit to its equivalent algebraic expression, we must use the
Kronecker product for gates in the same column and the matrix product for gates in the same
wire or in sequence; however, we must reverse the order of the gates in the second case. For
example, the algebraic expression equivalent to the circuit
|0i

A B 
D |ψi,

|0i C
where A, B, and C are 1-qubit gates and D is a 2-qubit gate, is
|ψi = D · (B ⊗ C) · (A ⊗ I2 ) · |00i.
We can simplify this expression a little and write
|ψi = D (BA ⊗ C)|00i.
Only D can create or destroy entanglement. Operators A, B, and C cannot create or destroy
entanglement. The proof is as follows. Suppose that the input state is |ψ1 i ⊗ |ψ2 i (unentangled).
The action of A outputs (A|ψ1 i) ⊗ |ψ2 i, which is unentangled. Now suppose that the input is an
entangled state |ψi. The action of A outputs (A ⊗ I)|ψi. Suppose that this state is unentangled,
that is, there exist |ψ1 i and |ψ2 i such that (A ⊗ I)|ψi = |ψ1 i ⊗ |ψ2 i. We reach a contradiction
because the last equation is equivalent to |ψi = (A† |ψ1 i) ⊗ |ψ2 i.
CNOT is so important that we describe a variant that is CNOT activated by 0. It is defined
by the expression
|ki|ì −→ |kiX (1−k) |ì,
|ki |ki
|ì X (1−k) |ì = |` ⊕ k ⊕ 1i.

Note that the control qubit is denoted by the empty circle indicating that the control is only
activated if the qubit is in state |0i. This gate can be obtained from the usual CNOT by
multiplying (X ⊗ I2 ) on both sides, as shown in the following circuit equivalence:
X • X
≡
.
18
This gate is represented by a block matrix of the form

I2 I2 I2 X
(X ⊗ I2 ) · CN OT · (X ⊗ I2 ) = = .
I2 X I2 I2
An alternative way to obtain the matrix representation is by listing sequentially the output of
each vector of the computational basis
◦—⊕
|00i −−−→ |01i,
◦—⊕
|01i −−−→ |00i,
◦—⊕
|10i −−−→ |10i,
◦—⊕
|11i −−−→ |11i.
Then convert each output into the vector notation and join them side by side to form a matrix:
      
0 1 0 0 0 1 0 0
1 0 0 0 1 0 0 0
        −→  = X .
0 0 1 0 0 0 1 0 I2
0 0 0 1 0 0 0 1
The second most important 2-qubit gate is the controlled Z gate, denoted by C(Z) or CZ.
Its circuit representations are
• Z •
≡ ≡
Z • • .
Note that it doesn’t matter which qubit is the control or target, that is, Z may be controlled
by the first qubit, and target on the second, or the other way around. The third representation
is interesting because the qubits are on an equal footing. The matrix representation is unique
and is given by  
1 0 0 0
I 0 1 0 0 
C(Z) = 2 =0 0 1 0  .

Z
0 0 0 −1
The CNOT and C(Z) gates are connected as shown by the following circuit equivalence:
• •
≡
H H Z .
The equivalence follows from
(I ⊗ H) CNOT (I ⊗ H) = C(Z),
19
which can be shown using the matrix representation. There is an alternative way of showing
the equivalence by using the fact that H 2 = I and then I ⊗ H can be replaced by C(H) because
two H’s act trivially if the CNOT’s control is not activated. Then
(I ⊗ H) CNOT (I ⊗ H) = C(H)C(X)C(H) = C(HXH) = C(Z),
because HXH = Z and C(A)C(B) = C(BA) for any 1-qubit gates A and B.
Now we are ready to show that the control and target of a CNOT invert when we multiply
CNOT by H ⊗ H on both sides. In fact,
(H ⊗ H) CNOT (H ⊗ H) = (H ⊗ I)(I ⊗ H) CNOT (I ⊗ H)(H ⊗ I) = (H ⊗ I)C(Z)(H ⊗ I).
Consider the first qubit as the target of C(Z). Then we have HZH = X controlled by the
second qubit, and we are done. The circuit representation is
H • H
≡
H H • .
The third most important 2-qubit gate is the SWAP gate. Its circuit representations are
• • • ×
≡ ≡
• • • × .
The matrix representation is  

1 0 0 0
0 0 1 0
C(Z) = 
0
.
1 0 0
0 0 0 1
Implementing on IBM quantum computers.

The implementation of the circuit with gates H and CNOT on the IBM composer is shown in
Fig. 2.5. We have to drag and drop a H gate and a CNOT gate, so that the control of CNOT
is qubit q0 and the target is qubit q1 . We also have to drag and drop two meters. If we want to
see the exact simulation of the probability distribution in the composer, we have to remove the
meters. Note that in the composer, qubit q0 represents the rightmost qubit, that is, the order
of the qubits is |q2 q1 q0 i and the output refers to q2 q1 q0 . In this work and most papers and
books, the opposite order is assumed. The output of a simulation with 1024 repetitions is shown
in Fig. 2.6. The figure shows that the result 000 was obtained 507 times, 001 21 times, 010 51
times, and 011 445 times. We must discard the unused qubit q2 (leftmost qubit). Therefore, the
result shows that the algorithm returns 00 with probability 49,5% and 11 with probability 43,5%
approximately, which is unexpected based on the analytical calculation. The wrong results 01
and 10 were obtained because the quantum computers are error-prone.
20
Figure 2.5: Circuit with gates H and CNOT (Reprint Courtesy of IBM Corporation ©)
Figure 2.6: Output of the circuit in Fig. 2.5 using backend ibmq manila (Reprint Courtesy of
IBM Corporation ©)
2.6 Multiqubit gates

The most important 3-qubit gate is the Toffoli gate, denoted by CCNOT or C 2 (X), defined by
the expression
C 2 (X) |ji|ki|ì = |ji|kiX jk |ì,
and represented by the circuit
|ji • |ji
|ki • |ki
|ì X jk |ì = |` ⊕ (j AND k)i.

This gate has two controls and one target. X acts on the target qubit if and only if both control
qubits are set to one. If one control is set to zero, the target doesn’t change. This gate can be
seen as the quantum version of the classical AND gate because if ` = 0, the output of the third
qubit is (j AND k). The matrix representation of the Toffoli gate is a diagonal block matrix
given by  
I2
I2
C 2 (X) = 
 
.
 I2 
X
21
There are variants of the Toffoli gate activated when the control qubits are set to zero. For
example, the circuit
|ji |ji
|ki |ki
|ì X (1−j)(1−k) |ì.

implements a gate that applies X on the third qubit if and only if the first two control qubits
are set to zero. It can be implemented using a standard Toffoli gate and X gates, as shown in
the following circuit equivalence:
X • X
≡ X • X
.
The multiqubit Toffoli gate C n (X) is a (n + 1)-qubit gate with n control qubits and one
target. It is defined by the expression
C n (X)|q0 i|q1 i · · · |qn−1 i|qn i = |q0 i|q1 i · · · |qn−1 iX q0 q1 ···qn−1 |qn i,

|q0 i • |q0 i
|q1 i • |q1 i
.. .. ..
. . .
|qn−1 i • |qn−1 i
|qn i X q0 q1 ···qn−1 |qn i,
where q0 q1 · · · qn−1 is the product of bits q0 , q1 , ..., qn−1 . Therefore, the state of the target qubit
changes if and only if all control qubits are set to 1. The state of each control qubit doesn’t
change when we describe this gate acting on the computational basis. The action on a generic
vector is obtained by writing the vector as a linear combination of vectors of the computational
basis and then using linearity.
The simplest way to decompose the multiqubit Toffoli gate in terms of the usual Toffoli gate
is by using (n − 2) draft qubits called ancillas. The ancillary qubits are interlaced with the
control qubits, the first ancilla qubit being inserted between the second and third qubits. The
best way to explain the decomposition is to show an example. Consider the gate C 5 (X), whose
22
decomposition requires three ancillas, as shown in the following circuit equivalence:
• • •
• • •
− − −− |0i • • |0i
• • •
− − −− ≡ |0i • • |0i
• • •
− − −− |0i • |0i
• •
.
The multiqubit Toffoli gate can also be activated by zero. In this case, the control qubit is
shown as an empty circle. For n qubits, we have 2n multiqubit Toffoli gates that can implement
any Boolean function of n bits, as described in the next Section.
2.7 Circuit of a Boolean function

Let’s show how to obtain the quantum circuit of a truth table. We only need multiqubit Toffoli
gates. To show that the multiqubit Toffoli gates can implement any Boolean function on a
quantum computer, let’s take the 3-bit Boolean function f (a, b, c) defined by the following truth
table as an example:
a b c f (a, b, c)
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 0
After this example, it is evident how the general case is obtained. Since f has three input bits,
we use multiqubit Toffoli gates with three controls. The 4th qubit is the target. The output of
f is the output of a measurement of the target qubit. Since f has two clauses in the disjunctive
normal form, we use two multiqubit Toffoli gates.6 The first gate must be activated by the input
|001i and the second by |110i, which correspond to the rows of the truth table whose output is
6
In the disjunctive normal form, we only consider rows of the truth table whose output is 1. For each row with
output 1, we write a conjunction of three literals, using NOT for each input 0. Then we write a disjunction of the
conjunctions, like this, f (a, b, c) = (ā ∧ b̄ ∧ c) ∨ (a ∧ b ∧ c̄), where (ā ∧ b̄ ∧ c) is associated with 001 and (a ∧ b ∧ c̄)
with 110.
23
1. The following circuit implements f :
|0i • |0i
|0i • |0i
|1i • |1i
|0i |1i.
Note that if the input is |a, b, ci|0i then the output will be |a, b, ci|f (a, b, c)i. This shows that
the quantum computer can calculate any n-bit Boolean function using a multiqubit Toffoli gate
with (n + 1) qubits for each output 1 of the truth table. The goal here is not to implement
classical algorithms on quantum computers because it makes no sense to build a much more
expensive machine to run only classical algorithms. However, the implementation we have just
described can be used for inputs in superposition, which is not allowed on a classical computer.
Unfortunately, this quantum circuit construction technique for calculating truth tables is not
efficient, since the number of multiqubit Toffoli gates increases exponentially as a function of
the number of qubits in the worst case.
2.8 Quantum parallelism

The most standard model of quantum computing is described by the circuit
|0i H 0 or 1
.. .. .. ..
. . U . .
|0i H 0 or 1.
The initial state of each qubit is |0i. Then, the Hadamard gate is applied to each qubit, preparing
for the quantum parallelism. Then, the unitary matrix U is applied to all qubits. Finally, there
is a meter for each qubit, returning a bit string.
Quantum parallelism is the simultaneous execution of an algorithm with more than one
input to a single quantum processor. To execute the same task on a classical computer would
require an exponential number of classical processors. To understand this concept, we focus only
on U without considering the H gates. U and the meters can be interpreted as a randomized
algorithm in the classical sense. If x is a n-bit string and U is a 2n × 2n matrix, then U |xi is
a linear combination of 2n vectors of the computational basis. After the measurements, a n-bit
string y returned, where |yi is one of the terms of the linear combination. That is, if the input
to U is x, the output after the measurement is y. U is an “algorithm”, which is executed for a
single input x when the computation U |xi is performed. Now let us bring the H gates back.
The first step of the circuit is to perform the following computation: (H|0i) ⊗ · · · ⊗ (H|0i),
that is
|0i + |1i |0i + |1i
√ ⊗ ··· ⊗ √ .
2 2
24
After expanding this product, we obtain
1
√ |0 · · · 0i + |0 · · · 01i + |0 · · · 10i + · · · + |1 · · · 1i .
2n
Each n-bit string is inside a ket of the sum above. That is, all possible inputs of a classical
algorithm are represented by kets in the sum above. The next step in the standard model is
to apply U . Due to the linearity of linear algebra, U is applied to all kets of the sum above
simultaneously. Therefore, it is possible to perform 2n simultaneous computations on a n-qubit
quantum computer. Note, however, that the result of these computations is a superposition
state and, after a measurement, the final output is a single n-bit string.
We can refine the standard model by adding an extra register for draft calculations. Since
unitary gates are invertible, the whole calculation process without measurement is reversible.
This means that no information is erased. The number of output qubits must be equal to the
number of input qubits. We postpone the discussion about this refinement and present it after
describing and analyzing the basic quantum algorithms.
25
Chapter 3
Deutsch’s Algorithm
Deutsch’s algorithm is the first algorithm exploring quantum parallelism. It uses two qubits (only
one in the economical version) and has a very modest gain, but it has inspired the construction
of several new quantum algorithms that are more efficient than their classical counterpart.
Deutsch’s problem was posed in 1985 [15] without yet using the quantum circuit model. The
concepts of universal gates and quantum circuits were first presented in [16], approximately four
years later. A generalization of Deutsch’s algorithm was described in [17] and is known as the
Deutsch-Jozsa algorithm (see next Chapter). The version described here follows the modern
view of Deutsch’s algorithm [12, 40], which differs slightly from the original one. In the final
Section, we describe an implementation of Deutsch’s algorithm with only one qubit.
3.1 Problem formulation

Suppose we have a 1-bit Boolean function f : {0, 1} −→ {0, 1} without knowing the details of
the implementation of f . We want to determine whether this function is balanced or constant. A
1-bit Boolean function is balanced if f (0) 6= f (1); otherwise, the function is constant, in this case
f (0) = f (1). There are four Boolean functions f , whose truth tables are described in Fig. 3.1
with names f0 to f3 .
x f0 (x) f1 (x) f2 (x) f3 (x)

0 0 0 1 1
1 0 1 0 1
Figure 3.1: Truth tables of all 1-bit functions.
The disjunctive normal forms are
f0 (x) = False,
f1 (x) = x,
f2 (x) = x̄,
f3 (x) = x̄ ∨ x,
respectively, where x̄ = NOT x. The Boolean expression of f3 can be simplified since x̄ ∨ x =

True.
26
A classical algorithm that finds the solution to this problem needs to evaluate f twice, that
is, it is really necessary to evaluate f (0) and f (1). However, Deutsch’s algorithm uses a unitary
operator Uf that implements f and calls this operator only once. In this case, f (0) and f (1)
are also evaluated, but the difference is that the evaluations are performed simultaneously. This
idea is used recurrently in quantum algorithms.
In the quantum case, f is implemented through a 2-qubit unitary operator Uf defined as
Uf |xi|ji = |xi|j ⊕ f (x)i,
where ⊕ is the XOR operation or addition modulo 2. This is a recipe that can be used to
implement an arbitrary n-bit Boolean function. We have to take care that x is the input to the
first register, and to obtain f (x) we set j = 0 as the input to the second register and then we
look at the output of the second register. Now we use the technique described in Chapter 2 to
obtain the quantum circuit of functions f0 to f3 . We use their disjunctive normal forms and
we have to use one multiqubit Toffoli gate for each output 1 in the truth table. In the 2-qubit
case, a multiqubit Toffoli gate is a CNOT activated by either 0 or 1. There is no output 1 in
the truth table of f0 . Then,
Uf0 = I ⊗ I.
There is one output 1 in the truth table of f1 , which has input 1. We use the standard CNOT,
which is activated when the control is set to 1. Then,
Uf1 = CNOT.
There is one output 1 in the truth table of f2 , which has input 0. We use the CNOT that is
activated when the control is set to 0. Then,
Uf2 = (X ⊗ I) · CNOT · (X ⊗ I).
Finally, there are two outputs 1 in the truth table of f3 with input 0 and 1. We use the standard
CNOT and the CNOT that is activated when the control is set to 0. Then,
Uf3 = (X ⊗ I) · CNOT · (X ⊗ I) · CNOT.
It is clear that the general recipe to implement Boolean functions using their truth tables pro-
duces unnecessary large circuits. In this case, we need to simplify the Boolean expression before
building the circuit. There is no recipe at this point to guide us except our ability in handling
Boolean functions and making circuits. For f3 , we know that f3 (x) = True and the output must
be 1 regardless the input x to the first qubit. Since the input to the second qubit is 0, output 1
is obtained by using a X. Then, the simplified version of Uf3 is
Uf3 = I ⊗ X.
Note that Uf is unitary in all cases.
27
Algorithm 3.1: Deutsch’s algorithm
Input: A Boolean function f : {0, 1} −→ {0, 1}.
Output: 0 if f is constant, 1 if f is balanced.
1 Prepare the initial state |0i|1i;
2 Apply H ⊗ H;
3 Apply Uf ;
4 Apply H ⊗ H;
5 Measure the first qubit in the computational basis.
3.2 The algorithm

Deutsch’s algorithm is described in Algorithm 3.1 and the (non-economical) circuit is
|0i H H f (0) ⊕ f (1)

Uf
|1i H H |1i.
|ψ0 i |ψ1 i |ψ2 i |ψ3 i
We easily check that the circuit corresponds exactly to the steps of Algorithm 3.1. The output
f (0)⊕f (1) is 0 if f is constant, and 1 if f is balanced. Note that the last Hadamard gate applied
to the second qubit can be eliminated without affecting the algorithm. This gate is here because
the central part of the circuit is symmetric and the analysis of the algorithm is neater than
without it. The states at the bottom of the circuit are used in the analysis of the algorithm.
3.3 Analysis of the algorithm

After the first step, the state of the quantum computer is
|ψ0 i = |0i ⊗ |1i.
After the second step, the state of the quantum computer is
|ψ1 i = (H|0i) ⊗ (H|1i)

|0i + |1i
= √ ⊗ |−i,
2
√
where |−i = (|0i − |1i)/ 2. After the third step, the state of the quantum computer is
|ψ2 i = Uf |ψ1 i
Uf |0i|−i + Uf |1i|−i
= √ .
2
To simplify |ψ2 i, let us show the following proposition:
28
Proposition 3.1. Let f : {0, 1}n −→ {0, 1} be a n-bit Boolean function and define Uf as
X 1
X
Uf = |x, j ⊕ f (x)ihx, j|.
x∈{0,1}n j=0
Then
Uf |xi ⊗ |−i = (−1)f (x) |xi ⊗ |−i.

Proof. Using the definition of |−i, we obtain

Uf |xi|0i − Uf |xi|1i
Uf |xi ⊗ |−i = √ .
2
Using the definition of Uf , we obtain
|xi|f (x)i − |xi|1 ⊕ f (x)i

Uf |xi ⊗ |−i = √ .
2
If f (x) = 0, the right-hand side is |xi|−i and if f (x) = 1, the right-hand side is −|xi|−i. Then,
joining these results together, we have (−1)f (x) |xi ⊗ |−i.
Using the proposition above, we simplify |ψ2 i to
(−1)f (0) |0i + (−1)f (1) |1i

|ψ2 i = √ ⊗ |−i
2
(
± |+i ⊗ |−i, if f (0) = f (1),
=
± |−i ⊗ |−i, if f (0) 6= f (1).
In the last passage, we have simplified the state of the first qubit without specifying exactly
what is the sign of the amplitude. This sign has no effect on the output of the algorithm. After
the fourth step, the state of the quantum computer is
|ψ3 i = (H ⊗ H)|ψ2 i
(
± |0i ⊗ |1i, if f (0) = f (1),
=
± |1i ⊗ |1i, if f (0) 6= f (1),
where we have used the fact that H|+i = |0i and H|−i = |1i, which can be deducted using
|+i = H|0i, |−i = H|1i, and H 2 = I. The state |ψ3 i can be further simplified to
|ψ3 i = ± |f (0) ⊕ f (1)i ⊗ |1i.
After the fifth step, the measurement of the first qubit in the computational basis returns
f (0) ⊕ f (1), which is 0 if f is constant and 1 if f is balanced, concluding the analysis of the
algorithm.
Note that the sign of |ψ3 i has no influence on the measurement result because the probability
of obtaining f (0) ⊕ f (1) is | ± 1|2 = 1. It is easy to check, if relevant, that this sign is (−1)f (0) .
29
3.4 Analysis of the entanglement
Summarizing the set of states of the qubits after each step, we have
|ψ0 i = |0i ⊗ |1i,

|ψ1 i = |+i ⊗ |−i,
(
± |+i ⊗ |−i, if f (0) = f (1),
|ψ2 i =
± |−i ⊗ |−i, if f (0) 6= f (1),
|ψ3 i = ± |f (0) ⊕ f (1)i ⊗ |1i.
Regardless of f , the quantum computer does not go through an entangled state during the
execution of Deutsch’s algorithm because each state |ψi i for i from 1 to 3 is a tensor product
of 1-qubit pure states. The qubits are unentangled all the time. This means that Deutsch’s
algorithm is faster than classical algorithms making use of quantum parallelism only.
There is an alternate route to analyze entanglement. No 2-qubit operator A ⊗ B creates or
destroys entanglement. In Deutsch’s algorithm, only CNOT can create entanglement and it is
used at most one time. Then, for each function f we can simplify the whole algorithm in order
to obtain only one final operator, whose input is |00i. If we simplify the algorithm for each f ,
we obtain
(H ⊗ H) Uf0 (H ⊗ H) = (H ⊗ H) · (H ⊗ H) = I ⊗ I,
(H ⊗ H) Uf1 (H ⊗ H) = (H ⊗ H) · CNOT · (H ⊗ H) = CNOT10 ,
(H ⊗ H) Uf2 (H ⊗ H) = (Z ⊗ I) · CNOT10 · (Z ⊗ I),
(H ⊗ H) Uf3 (H ⊗ H) = I ⊗ Z,
where the control of CNOT10 is the second qubit and the target is the first qubit. We conclude
right away that no entanglement is created when f is f0 or f3 . For f1 and f2 , the CNOT’s
control qubit is set to 0 because the input is |00i, in these cases, no entanglement is created
either.
We can take advantage of the simplified version of the algorithm to check the output again
and reanalyze the algorithm. The states of the qubits before the measurement are

|ψ3 if0 = (I ⊗ I)|00i = |0, 1i,

|ψ3 if1 = CNOT10 |00i = |1, 1i,

|ψ3 if2 = (Z ⊗ I) · CNOT10 · (Z ⊗ I)|00i = −|1, 1i,

|ψ3 if3 = (I ⊗ Z)|00i = −|0, 1i.
The output of a measurement of the first qubit is 0 for f0 and f3 , which are the constant
functions, and 1 for f1 and f2 , which are the balanced functions. We can also check that the
sign is (−1)f (0) because f0 (0) = f1 (0) = +1 and f2 (0) = f3 (0) = −1.
3.5 Who implements the oracle?

Deutsch’s algorithm is said to be more efficient than its classical counterpart in a limited context
called “query complexity” [28]. In this context, we must compare the number of evaluations
30
of f in the classical case with the number of times Uf is used in the quantum case. This is a
rule of the game. Deutsch’s algorithm applies Uf only once and the classical algorithm needs to
evaluate f two times. Then, Deutsch’s is faster.
Note that the analysis of Deutsch’s algorithm shows that f is evaluated at two distinct points
in the domain simultaneously. This is not possible to be carried out using a sequential classical
algorithm. However, it is possible to be carried out in a classical computer with parallel proces-
sors if the number of simultaneous threads does not scale up. Since Deutsch’s algorithm uses a
fixed number of qubits, we cannot scale it up and it is not possible to perform an asymptotic
analysis. In terms of time complexity, the quantum version has no gain. The importance of
Deutsch’s algorithm lies in the fact that it stimulated the search for generalizations, such as
the Deutsch-Jozsa, Bernstein-Vazirani, and Simon algorithms, which inspired Shor in the elab-
oration of a quantum algorithm for factoring composite integers and a quantum algorithm for
calculating discrete logarithms.
Last but not least, we remark that it is not up to us to implement the oracle. It is someone
else’s job. Without this understanding, we face a contradiction—we have to know the answer
before even starting to implement the algorithm that will find the answer. It is allowed to query
function f implemented by someone without looking at its implementation details. We are given
what is called a black box quantum computer with Uf already implemented, which we can use,
add new gates, but cannot see inside. To add new gates is allowed by the rules of the game for
oracle-based algorithms.
3.6 Economical circuit of Deutsch’s algorithm

Deutsch’s algorithm can be implemented with only one qubit in the following way:
|0i H Uf0 H f (0) ⊕ f (1),
where
1
X
Uf0 = (−1)f (x) |xihx|.
x=0
From Proposition 3.1, we see that the second qubit is not necessary for the algorithm, although
it is necessary if we want to obtain f (x) in the classical sense, as shown ahead. By expanding
the sum, we obtain
(−1)f (0)

0 0
Uf = .
0 (−1)f (1)
Then, (
±I, if f (0) = f (1),
Uf0 =
±Z, if f (0) 6= f (1).
The analysis of the algorithm reduces to calculating
(
0 ± I|0i, if f (0) = f (1),
HUf H|0i =
± X|0i, if f (0) 6= f (1).
31
Using that f (0) ⊕ f (1) = 0 if f (0) = f (1), and f (0) ⊕ f (1) = 1 if f (0) 6= f (1), we obtain
HUf0 H|0i = ±|f (0) ⊕ f (1)i.
After a measurement, the output is f (0) ⊕ f (1) with probability | ± 1|2 = 1.

Now it is straightforward to check that there is no entanglement in Deutsch’s algorithm
because a qubit cannot entangle with itself. Entanglement requires at least two qubits.
Consulting the oracle

In the non-economical circuit, if the input to the first qubit of Uf is |xi, Uf returns f (x) when
we perform a measurement of the second qubit. In the economical circuit, we obtain f (x) in the
classical sense using the circuit
|xi Uf0 |xi
|0i H • H f (x).
In fact, the steps of this circuit are
Uf0 −•
I⊗H |xi|0i + (−1)f (x) |xi|1i I⊗H
|xi|0i −−−→ |xi|+i −−−−−→ √ −−−→ |xi|f (x)i.
2
To calculate the last step, we use that f (x) is either 0 or 1 for a fixed x. Firstly, we suppose
that f (x) = 0, then I ⊗ H is applied to |xi|+i resulting in |xi|f (x)i, and secondly we suppose
that f (x) = 1, then I ⊗ H is applied to |xi|−i resulting in |xi|f (x)i. After measuring the second
qubit in the computational basis, we obtain f (x) with probability 1.
32
Chapter 4
Deutsch-Jozsa Algorithm
The Deutsch-Jozsa algorithm is a deterministic quantum algorithm, a generalization of Deutsch’s

algorithm, and the first example that is exponentially faster than its equivalent classical deter-
ministic algorithm. It was published in 1992 [17] and revisited in 1998 [12]. Many books [28,
31, 33, 35, 40, 60] and papers [38, 47, 48] have reviewed and generalized this algorithm.

Let f : {0, 1}n −→ {0, 1} be a n-bit Boolean function, n ≥ 2, with the following property: Either
f is balanced or constant. A Boolean function is balanced if the inverse image of point 0 has
cardinality 2n−1 , and it is constant if the inverse image of point 0 has cardinality 0 or 2n . Suppose
we are able to evaluate this function at any point in the domain; however, we don’t have access to
the details of the implementation of f , that is, we are given a black-box quantum computer with
f already implemented. The Deutsch-Jozsa algorithm solves the following problem: Determine
whether f is balanced or constant using this black-box quantum computer.
There are only two constant functions, which are f (x) = False and f (x) = True for all n-bit
string x, but there are many balanced functions. The best classical deterministic algorithm
that solves this problem, given a black-box function f which is either balanced or constant and
implemented on a n-bit classical computer, is the following: Evaluate f at 2n−1 + 1 distinct
points in its domain and check whether the output is always the same (f is constant) or not (f
is balanced).
The Deutsch-Jozsa algorithm, on the other hand, is a deterministic quantum algorithm that
uses a black-box unitary operator Uf only once. The action of Uf on the computational basis is
where x ∈ {0, 1}n and j ∈ {0, 1}. The qubits are split into two quantum registers with sizes n
and 1.1 After making a superposition of vectors |xi for all x, the Deutsch-Jozsa algorithm is
able to compute f (x) for all x with a single application of Uf and after quick post-processing is
able to determine whether f is constant or balanced.
In the next proposition, we show that Uf is unitary. Then, since each entry of Uf is either
0 or 1, Uf is a 2n+1 -dimensional permutation matrix.2
1
A quantum register is a set of qubits.
2
A permutation matrix is a square binary matrix such that each row and each column has exactly one entry
33
Proposition 4.1. Uf is unitary for any n-bit Boolean function.
Proof. Let us show that Uf† Uf = I. Using the definition of Uf , we have

0
0 †
x j Uf Uf |xi|ji = x0 x j 0 ⊕ f (x0 )j ⊕ f (x)

= δxx0 j 0 ⊕ f (x0 )j ⊕ f (x) .

6 0 only if x = x0 , we have δxx0 j 0 ⊕ f (x0 )j ⊕ f (x) = δxx0 j 0 j . Then

Using that δxx0 =

0
0 †
x j Uf Uf |xi|ji = δxx0 δjj 0 .
Since j, j 0 are arbitrary bits and x, x0 arbitrary n-bit strings, the proof is done.
4.2 The quantum algorithm
Algorithm 4.1: Deutsch-Jozsa algorithm

Input: A black-box Uf implementing a n-bit Boolean function f : {0, 1}n −→ {0, 1},
which is either balanced or constant.
Output: 0 if f is constant; otherwise, f is balanced.
1 Prepare the initial state |0i⊗n |1i;
2 Apply H ⊗(n+1) ;
3 Apply Uf ;
4 Apply H ⊗(n+1) ;
5 Measure the first register in the computational basis.
The Deutsch-Jozsa algorithm is described in Algorithm 4.1 and the (n + 1)-qubit circuit is
|0i H H 0 or 1
.. .. .. ..
. . . .
Uf
|0i H H 0 or 1
|1i H H |1i.
|ψ0 i |ψ1 i |ψ2 i |ψ3 i
We easily check that the circuit corresponds exactly to the steps of Algorithm 4.1. The output
is the n-bit string 0 if f is constant, and different from 0 if f is balanced. Note that the last
Hadamard gate applied to the last qubit can be eliminated without affecting the algorithm. This
gate is here because the central part of the circuit is symmetric and the analysis of the algorithm
is neater than without it. The states at the bottom of the circuit are used in the analysis of the
algorithm.
equal to 1 and zeroes elsewhere.
34
|ψ0 i = |0i⊗n ⊗ |1i.
|ψ1 i = (H|0i)⊗n ⊗ (H|1i)

2 −1 n
1 X
= √ |xi|−i,
2n x=0
√
where |−i = (|0i − |1i)/ 2. After the third step, the state of the quantum computer is
|ψ2 i = Uf |ψ1 i
n
2 −1
1 X
= √ Uf |xi|−i .
2n x=0
To simplify |ψ2 i, we use Proposition 3.1 on Page 29, which states that
Uf |xi|−i = (−1)f (x) |xi|−i.

We obtain n
2 −1
1 X
|ψ2 i = √ (−1)f (x) |xi|−i.
n
2 x=0
After the fourth step, the state of the quantum computer is
|ψ3 i = H ⊗(n+1) |ψ2 i

2 −1n
1 X
(−1)f (x) H ⊗n |xi ⊗ H|−i .

= √
2n x=0
To simplify |ψ3 i, let us show the following proposition:
Proposition 4.2. Let x ∈ {0, 1}n be a n-bit string x0 · · · xn−1 . Then

2 −1 n
⊗n 1 X
H |xi = √ (−1)x·y |yi,
n
2 y=0
where x · y = x0 y0 + · · · + xn−1 yn−1 mod 2.
Proof. Using that x = (x0 · · · xn−1 )2 , we obtain
H ⊗n |xi = H|x0 i ⊗ · · · ⊗ H|xn−1 i

1 X 1 1 X 1
= √ (−1)x0 y0 |y0 i ⊗ · · · ⊗ √ (−1)xn−1 yn−1 |yn−1 i .
2 y0 =0 2 yn−1 =0
35
Putting all sums at the beginning, we obtain
1
⊗n 1 X
H |xi = √ (−1)(x0 y0 +···+xn−1 yn−1 ) |y0 i ⊗ · · · ⊗ |yn−1 i.
2n y0 ,...,yn−1 =0
Using the definition of x · y and converting to the decimal notation, we complete the proof.
Using the proposition above, we simplify |ψ3 i to

2 −1 n n −1
2X
!
1 X
|ψ3 i = n (−1)x·y+f (x) |yi ⊗ |1i.
2
y=0 x=0
The amplitude of state |0i|1i (0 in decimal) is

2 −1n
1 X
(−1)f (x) .
2n
x=0
The probability that a measurement of the first register returns y = 0 (in decimal) is
n −1 2
1 2X
f (x)
p(0) = n (−1) .

2
x=0
If f is constant, p(0) = 1 and we know with certainty that the output is y = 0. If f is balanced,
p(0) = 0 and we know with certainty that the output is y 6= 0.

There is no entanglement between the registers because |ψ0 i to |ψ3 i can be written either as
|ψi ⊗ |1i or |ψi ⊗ |−i, for some state |ψi. We have to check the first register only. From the
circuit of the algorithm, we realize that the only operator that creates or destroys entanglement
is Uf . Then, it is enough to analyze
2 −1 n
1 X
|ψi = √ (−1)f (x) |xi,
2n x=0
which is the state of the first register after applying Uf . We ask ourselves whether there are ai
and bi so that
a0 |0i + b0 |1i an−1 |0i + bn−1 |1i
|ψi = √ ⊗ ··· ⊗ √ ,
2 2
36
where each pair (ai , bi ) must obey |ai |2 +|bi |2 = 2. This is the only way of having no entanglement
at all. Equivalently, we ask whether the system of equations
a0 ...an−2 an−1 = (−1)f (0..00)

a0 ...an−2 bn−1 = (−1)f (0..01)
a0 ...bn−2 an−1 = (−1)f (0..10)
a0 ...bn−2 bn−1 = (−1)f (0..11)
..
.
b0 ...bn−2 bn−1 = (−1)f (1..11)
admits a solution or not. It is straightforward to check that ai 6= 0 and bi 6= 0 for all i. Besides,
we need not to worry about phases of ai . In fact, without loss of generality, we consider ai real
and positive because a global factor can be discarded. Then, by selecting equation
b0 a1 a2 ...an−2 an−1 = (−1)f (1..00) ,
we conclude that b0 is also real. The same applies to the other bi . Now, let us show that ai = 1
and bi = ±1. Let us start with a0 and b0 by selecting the following equations:
a0 a1 a2 ...an−2 an−1 = (−1)f (0..00) ,

b0 a1 a2 ...an−2 an−1 = (−1)f (1..00) .
Dividing them, we obtain a0 ± b0 = 0. This result together with the constraint a20 + b20 = 2
implies that a0 = 1 and b0 = ±1. The same applies to the other ai and bi . Now, counting the
number of unentangled states, we obtain at most 2n , up to a global sign. Or, at most 2n+1 .
There are only two constant functions: f (x) = 0 and f (x) = 1 for all x, which correspond
⊗n ⊗n
to |ψi = H|0i and |ψi = − H|0i , respectively. In both cases, there is no entanglement.
Then, let us count the number of balanced functions. The exact number is the number of subsets
with cardinality 2n−1 because as soon as we find such a subset S, we define a balanced function
taking f (x) = 0 if x ∈ S and f (x) = 1 otherwise. When we cover all such subsets, we have
obtained all balanced n
2n
3 functions. Since the domain has cardinality 2 , the number of balanced
functions is 2n−1 .
The number of balanced functions grows quicker than the number of unentangled states. In
fact, using the asymptotic approximation4
22p

2p
≈√
p πp
we obtain √ 2n
2n

2 2
≈√ √ .
2n−1
π 2n
We conclude that when n increases, there are more and more balanced functions f with Uf
creating entanglement.
3
An alternative way of counting the number of balanced functions is by considering the number of permutations
of a list with 2n−1 zeros and 2n−1 ones.
4
https://en.wikipedia.org/wiki/Binomial_coefficient
37
The discussion above shows that there is n0 ≥ 2, such that for all n ≥ n0 , Uf creates
2n
> 2n+1 for n ≥ 3, we can take n0 = 3. The only remaining case

entanglement. Since, 2n−1
that may have no entanglement is n = 2. For n = 2, there are 6 balanced functions, whose truth
tables are described in Fig. 4.4 with function names f0 to f5 .
x0 x1 f0 (x) f1 (x) f2 (x) f3 (x) f4 (x) f5 (x)

0 0 0 0 0 1 1 1
0 1 0 1 1 0 0 1
1 0 1 0 1 0 1 0
1 1 1 1 0 1 0 0
Figure 4.1: Truth tables of all 2-bit balanced functions.
It is enough to analyze f0 to f2 because the other functions are the complement of those
ones. For instance, f5 is the complement of f0 , which means that state |ψi created by Uf5 is
equal to the state created by Uf0 up to a global phase. States |ψi that correspond to f0 to f2
(without normalization) are

|00i + |01i − |10i − |11i = |0i − |1i ⊗ |0i + |1i ,

|00i − |01i + |10i − |11i = |0i + |1i ⊗ |0i − |1i ,

|00i − |01i − |10i + |11i = |0i − |1i ⊗ |0i − |1i ,
respectively. All those states are unentangled. We conclude that there is no entanglement in
the Deutsch-Jozsa algorithm when n = 2.
4.5 Implementing the oracle

We repeat here what has already been written before about the oracle. It is not up to us
to implement the oracle. It is someone else’s job. Without this understanding, we face a
contradiction—we have to know the answer before even starting to implement the algorithm
that will find the answer. It is allowed to us to evaluate function f implemented by someone
without looking at its implementation details. We are given what is called a black-box quantum
computer with Uf already implemented, which we can use, add new gates, but cannot see inside.
In the classical case, we have to count the number of evaluations of f . In the quantum case, we
have to count the number of applications of Uf . This is how we calculate the query complexity
of oracle-based algorithms. Note that it does not matter whether the evaluation of f is efficient
or not.
With this understanding, let us show how to implement oracle Uf for the following entangling
balanced function: f (x) = 0 if x ∈ {000, 010, 100, 101} and f (x) = 1 if x ∈ {001, 011, 110, 111}.
Using the disjunctive normal form, we add to the circuit one multiqubit Toffoli gate for each
38
point in {001, 011, 110, 111}, as follows
|x0 i • • |x0 i
|x1 i • • • |x1 i
|x2 i • • • |x2 i
|0i |f (x)i.
Since the first and second gates have opposite controls in the second qubit and identical controls
in the first and third qubits (the third and fourth gates have a similar feature), this circuit can
be simplified to
|x0 i • |x0 i
|x1 i • |x1 i
|x2 i • |x2 i
|0i |f (x)i.
This example helps to show how to implement any balanced oracle using n + 1 qubits. It
is possible to implement the Deutsch-Jozsa algorithm with only n qubits, but this discussion is
postponed and is fully addressed in Chapter 9.
4.6 Final remarks

There are efficient randomized classical algorithms that solve the Deutsch-Jozsa problem. It is
possible to find the correct solution with high probability by consulting the classical oracle a
few times. More formally, consider the following randomized algorithm: (1) Select uniformly at
random k ≥ 2 distinct points x0 , ..., xk−1 in the domain, (2) if f (x0 ) = · · · = f (xk−1 ), return
“f is constant”; otherwise, return “f is balanced”. This algorithm returns the output “f is
balanced” with certainty because as soon as we obtain two different values after evaluating f ,
we claim the promise that f is either balanced or constant—it must be balanced. The probability
that the output “f is constant” is correct is 1 − 1/2k−1 . The success probability quickly tends
to 1, for instance, take k = 10, the success probability is at least 99.8%.
39
Chapter 5
Bernstein-Vazirani Algorithm
The Bernstein-Vazirani algorithm was presented at a conference in 1993 [7], and the full paper
was published in 1997 [8], being the first deterministic quantum algorithm with linear gain over
the best deterministic or randomized classical algorithm. The Bernstein-Vazirani algorithm
explores quantum parallelism but has no entanglement at all. This algorithm is described in
some books [35, 39, 49].

Let s = s0 ...sn−1 be an unknown n-bit string. Although we don’t know s, we have at our
disposal a Boolean function f : {0, 1}n −→ {0, 1} defined as
f (x) = s · x = s0 x0 + ... + sn−1 xn−1 mod 2,
where x0 , ..., xn−1 are the bits of x. Note that f is a linear function and each linear Boolean
function is characterized by a specific s. Our goal is to find s by evaluating f without knowing
its implementation details. In quantum computing, the operator that implements this function
is called oracle, as if an oracle reveals f (x) without showing s explicitly, and f (x) can be used to
determine s. When we build the circuit of the algorithm, f is implemented by another person,
because we don’t know s. It is important to understand this; otherwise, we get the impression
that we need to know the answer to find the answer, which is absurd.
In the classical version of this problem, we have to consult the classical oracle at least n
times. In fact, we chose x = 10...0 and ask the oracle what is f (10...0). The answer is s0 . Next
we choose x = 010...0 and ask the oracle what is f (010...0). The answer is s1 . The last query
is f (0...01), whose answer is sn−1 . This shows that we need to consult the oracle n times and
there is no way to reduce this number without introducing an error in the algorithm. In the
quantum case, we consult the quantum oracle only once, which allows us to find all bits of s, as
described below.
In the quantum case, the function f (x) = s · x is implemented using the unitary operator Uf
of n + 1 qubits, defined as
where x ∈ {0, 1}n , j is a bit, and ⊕ is the XOR operation or sum modulo 2,. This operator
uses two registers, with sizes n and 1, respectively. We can use Uf as many times as we wish.
However, it is used only once in the Bernstein-Vazirani algorithm.
40
Summing up, the classical algorithm queries the classical oracle n times using a n-bit classical
computer. The quantum algorithm queries the quantum oracle only one time using a (n + 1)-
qubit quantum computer. In the last Section of this Chapter, we show that the algorithm can
be implemented on a n-qubit quantum computer.
5.2 The algorithm
Algorithm 5.1: Bernstein-Vazirani algorithm

Input: A Boolean function f : {0, 1}n −→ {0, 1} such that f (x) = s · x.
Output: s with probability equal to 1.
1 Prepare the initial state |0i⊗n |1i;
2 Apply H ⊗(n+1) ;
3 Apply Uf ;
4 Apply H ⊗(n+1) ;
The Bernstein-Vazirani algorithm is described in Algorithm 5.1 and the circuit is
|0i H H s0
.. .. .. .. ..
. . . . .
Uf
|0i H H sn−1
|1i H H |1i.
|ψ0 i |ψ1 i |ψ2 i |ψ3 i
Note that this circuit is equal to the circuit of the Deutsch-Jozsa algorithm, the only difference is
the function used in the Bernstein-Vazirani algorithm, which need not be constant nor balanced.
Besides, in the Deutsch-Jozsa algorithm we check whether all outputs are 0 to conclude that
the function is constant; otherwise, balanced. In the Bernstein-Vazirani algorithm, each output
is valuable to determine the unknown s. The states at the bottom of the circuit are used in the
analysis of the algorithm.

|ψ0 i = |0i⊗n |1i.
|ψ1 i = (H ⊗n |0i⊗n ) ⊗ (H|1i)

n
2 −1
1 X
= √ |xi ⊗ |−i,
2n x=0
41
√
where |−i = (|0i − |1i)/ 2 and x is written in the decimal notation. After the third step, the
state of the quantum computer is
|ψ2 i = Uf |ψ1 i
n
2 −1
1 X
= √ Uf |xi ⊗ |−i.
2n x=0
To simplify |ψ2 i, we use Proposition 3.1 on Page 29, which states that
Uf |xi|−i = (−1)f (x) |xi|−i.

We obtain
n
2 −1
1 X
|ψ2 i = √ (−1)s·x |xi ⊗ |−i.
n
2 x=0
After the fourth step, the state of the quantum computer is
|ψ3 i = H ⊗(n+1) |ψ2 i

n
2 −1
!
1 X
= H ⊗n √ (−1)s·x |xi ⊗ (H|−i). (5.1)
2n x=0
To simplify |ψ3 i, we use Proposition 4.2 on Page 35, which states that for any s = (s0 ...sn−1 )2
2 −1 n
⊗n 1 X
H |si = √ (−1)s·x |xi,
2n x=0
where s · x = s0 x0 + · · · + sn−1 xn−1 mod 2. Then we use H 2 = I to obtain

2n −1
!
⊗n 1 X s·x
|si = H √ (−1) |xi .
2n x=0
We replace this result in Eq. (5.1) to obtain
|ψ3 i = |si ⊗ |1i.
The output of the measurement of the first register is s0 , ..., sn−1 with probability 1 because
|si = |s0 i ⊗ · · · ⊗ |sn−1 i.
Uf is applied only one time, so we say that a single query was made to the quantum oracle.
Note that Uf is applied to a superposition of all vectors of the computational basis, therefore f is
evaluated simultaneously at all points in the domain—all n-bit strings. The result of this massive
evaluation is a superposition state, which is useless in many a case. Not in the Bernstein-Vazirani
algorithm because the application of H ⊗n to the first register at the end reveals s. Before the
last step, s was a phase. After, s went into the kernel of a “ket”. This makes all the difference
because a phase is associated with the probability of obtaining a certain result, while the kernel
of the “ket” is associated with the measurement result.
42
5.4 Bernstein-Vazirani algorithm has no entanglement
The analysis of the entanglement follows at the beginning the same track used in the Deutsch-
Jozsa algorithm. From the circuit of the algorithm, we realize that the only operator that
creates or destroys entanglement is Uf . Then, it is enough to analyze |ψ2 i. The Bernstein-
Vazirani algorithm has entanglement if and only if |ψ2 i is totally or partially entangled. State
|ψ2 i is given by
2n −1
1 X
|ψ2 i = √ (−1)s·x |xi ⊗ |−i.
2n x=0
Proposition 4.2 on Page 35 states that for any s = (s0 ...sn−1 )2
n
2 −1
⊗n 1 X
H |si = √ (−1)s·x |xi.
2n x=0
Then |ψ2 i can be written as

|ψ2 i = H ⊗n |si ⊗ |−i,

which can be fully factorized as

|ψ2 i = H|s0 i ⊗ · · · ⊗ H|sn−1 i ⊗ |−i.
Note that |ψ2 i has no entanglement at all because it is the Kronecker product of 1-qubit pure
states [36, 19].
5.5 Circuit of the oracle

In oracle-based algorithms, the oracle is not implemented by us—it is implemented by someone
else. However, it is important to know how it is done in order to understand the whole process.
Let us show an example with n = 4 and s = 1011 that will be enough to understand the general
case. In this particular case, f is
f (x) = s · x = x0 + x2 + x3 mod 2
and the action of Uf on an arbitrary vector of the computational basis |xi|ji is
Uf |x0 x1 x2 x3 i|ji = |x0 x1 x2 x3 i|j ⊕ f (x)i = |x0 x1 x2 x3 i|j ⊕ x0 ⊕ x2 ⊕ x3 i.
Let CNOT04 be the CNOT gate acting on qubits 0 and 4. Without showing qubits 1, 2, and 3,
we have
CNOT04 |x0 i|ji = |x0 iX x0 |ji = |x0 i|j ⊕ x0 i.
Note that if we use CNOT04 , CNOT24 , and CNOT34 , we generate the expected result j ⊕ x0 ⊕
x2 ⊕ x3 in the second register. Then,
Uf = CNOT34 · CNOT24 · CNOT04 ,
whose circuit is
43
|x0 i • |x0 i
|x1 i |x1 i
|x2 i • |x2 i
|x3 i • |x3 i
|ji |j ⊕ x0 ⊕ x2 ⊕ x3 i.
From this example, we can generalize the implementation to an arbitrary s = s0 · · · sn−1 . Just
consider
Uf = (CNOT0n )s0 (CNOT1n )s1 · · · (CNOTn−1,n )sn−1 ,
where CNOTij is controlled by qubit i and the target is qubit j, and (CNOTij )sk is the identity
operator if sk = 0, and the standard CNOTij if sk = 1. Note that the order of the CNOT gates
is irrelevant. There is a CNOT for each bit 1 of s.
Let us finish the circuit of the Bernstein-Vazirani algorithm for our example. The whole
circuit is
|0i H • H 1
|0i H H 0
|0i H • H 1
|0i H • H 1
|1i H H |1i,
which is equivalent to
|0i H • H 1
|0i 0
|0i H • H 1
|0i H • H 1
|1i H H H H H H |1i,
because H 2 = I. Using that (H ⊗ H) · CNOTij · (H ⊗ H) = CNOTji , the last circuit simplifies

to
|0i 1
|0i 0
|0i 1
|0i 1
|1i • • • |1i.
From this example, we can easily derive the generic case, which can always be expressed as
CNOT’s controlled by the last qubit and the targets are the qubits corresponding to the bits
of s that are equal to 1. This simplification helps to understand that the Bernstein-Vazirani
algorithm has no entanglement because Uf was multiplied by H ⊗(n+1) , which cannot create or
destroy entanglement. The depiction of Uf in the circuit of the algorithm gives us the impression
that Uf creates entanglement, but that is misleading.
44
5.6 Economical circuit of the Bernstein-Vazirani algorithm
The Bernstein-Vazirani algorithm can be implemented with n qubits instead of n + 1. In fact,
we have learned that Uf is a product of CNOT’s
Uf = (CNOT0n )s0 · · · (CNOTn−1,n )sm−1 ,
where CNOTij is controlled by qubit i and the target is qubit j, and (CNOTij )sk is the identity
if sk = 0, and CNOTij if sk = 1. In the algorithm, just before the action of Uf , the state of the
n-th qubit is |−i. The circuit that describes the action of CNOT0n is
|x0 i • (−1)x0 |x0 i
|−i |−i,
where we have depicted only the first and the last qubits and we have placed (−1)x0 in the first
qubit because this is mathematically allowed. This circuit is equivalent to
|x0 i Z (−1)x0 |x0 i.
We can convert all CNOT’s of Uf into Z’s and then
Uf0 = Z s0 ⊗ · · · ⊗ Z sn−1 .
Note that if a bit si of s is 0, Z si = I. Then
Uf0 |xi = (−1)s·x |xi.
Function f is the same as before. What changes is the way we implement f as a unitary operator.
The circuit of the economical version of the Bernstein-Vazirani algorithm is
|0i H H s0
.. .. .. .. ..
. . Uf0 . . .
|0i H H sn−1 .
As before, this circuit simplifies to
H ⊗n Uf0 H ⊗n = X s0 ⊗ · · · ⊗ X sn−1
because HZH = X. Now, it is straightforward to check that there is no entanglement in the

Bernstein-Vazirani algorithm. Uf0 is not a genuine n-qubit gate, but instead is the tensor product
of n 1-qubit gates. The depiction of the circuit is misleading.
Consulting the oracle

In the non-economical circuit, if the input to the first register of Uf is |xi, Uf returns f (x) =
(−1)s·x when we perform a measurement of the last qubit. In the economical circuit, we obtain
45
f (x) using the circuit
|x0 i |x0 i
.. Uf0 ..
. .
|xn−1 i |xn−1 i
|0i H • H f (x).
In fact, the steps of this circuit are
Uf0 −•
I⊗H |xi|0i + (−1)f (x) |xi|1i I⊗H
|xi|0i −−−→ |xi|+i −−−−−→ √ −−−→ |xi|f (x)i,
2
where I is the 2n -dimensional identity operator. To calculate the last step, there are only to
cases when we fix x: Either f (x) = 0 or f (x) = 1. Firstly, we suppose that f (x) = 0, then I ⊗ H
is applied to |xi|+i resulting in |xi|f (x)i, and secondly we suppose that f (x) = 1, then I ⊗ H
is applied to |xi|−i resulting in |xi|f (x)i. After performing a measurement of the last qubit in
the computational basis, we obtain f (x) with probability 1.
46
Chapter 6
Simon’s Problem
Simon’s problem was presented at a conference in 1994 [55] together with Shor’s algorithms [52],
and the full paper was published in 1997 [56]. It is a quantum algorithm exponentially faster
than the best deterministic or randomized equivalent classical algorithm. It is a remarkable but
underestimated scientific contribution to quantum computing. Simon’s algorithm explores not
only quantum parallelism but also maximal entanglement. This algorithm and its generalization
are described in some books [28, 35, 39, 49, 62] and papers [11, 37, 63].

Let f : {0, 1}n −→ {0, 1}n be a n-bit input to n-bit output function with the following property:
There is a non-zero bit string s ∈ {0, 1}n such that
f (x) = f (y) ⇐⇒ x ⊕ y ∈ {0, s}
for all x, y ∈ {0, 1}n . This means that each point in the image has exactly two associated domain
points, that is, the point f (x) in the image has associated x and x ⊕ s in the domain because
f (x) = f (x ⊕ s) for all x ∈ {0, 1}n . Because of that, function f is in the set of two-to-one
functions. Consider the following computational problem: Determine s by querying f as few as
possible.
Our goal is to find s by evaluating f without knowing its implementation details. In quantum
computing, the operator that implements this function is called oracle, as if an oracle reveals
f (x) without showing s explicitly, and f (x) can be used to determine s. When we build the
circuit of the algorithm, the part associated with f is implemented by another person, because
we don’t know s.
In the classical version of this problem, we have to consult a classical oracle, and to determine
s with high probability, the number of queries to the function f grows as an exponential function
on the number of bits n if we use a classical computer. In fact, a naive deterministic algorithm
would be to calculate f (x0 ) for a fixed x0 , then systematically search for x in the domain such that
f (x) = f (x0 ). We need 2n evaluations in the worst case. It is better to pick√x and x0 uniformly
at random and then check whether f (x) = f (x0 ). This method requires Ω( 2n ) evaluations to
achieve success probability O(1). The proof is the same as in the two-to-one collision problem
or in the birthday paradox [14].
47
In the quantum case, f is implemented using the unitary operator Uf of 2n qubits, defined
as
Uf |xi|yi = |xi|y ⊕ f (x)i,
where x, y, and f (x) are n-bit strings, and ⊕ is the bitwise XOR operation or bitwise sum
modulo 2. This operator uses two registers each one with n qubits. Uf is a 22n -dimensional
permutation matrix. The proof that Uf is unitary is an extension of the proof presented in
Proposition 3.1 on Page 29.
Simon’s algorithm has two parts. The quantum part returns a n-bit string x obeying x·s = 0,
where
x · s = x0 s0 + · · · + xn−1 sn−1 mod 2.
To know such x is not enough to determine s. It is necessary to run the quantum part many
times, collect many bit strings x obeying x · s = 0, and then run the classical part of the
algorithm, which reveals s with probability greater than 1/2.
6.2 The algorithm
Algorithm 6.1: Simon’s algorithm

Input: Function f : {0, 1}n −→ {0, 1}n with the promise that
f (x) = f (y) ⇐⇒ x ⊕ y ∈ {0, s}.
Output: s with probability greater than 1/2.
Run the quantum part n − 1 times (Algorithm 6.2, assume ouputs x(1) , ..., x(n−1) ;

1
2 Solve the system of linear equations {x(1) · s ≡ 0, ..., x(n−1) · s ≡ 0} mod 2 (assume
solution for s0 , ..., sn−2 );
3 Take sn−1 = 0 and check whether f (s) = f (0);
4 If True return s0 ...sn−2 0; otherwise, return s0 ...sn−2 1.
Simon’s algorithm is described in Algorithm 6.1 and the quantum part is described in Algo-
rithm 6.2. The circuit is of the quantum part is
|0i H H x0
.. .. .. .. ..
. . . . .
Uf xn−1
|0i H H
|ψ4 i
|0i⊗n /n z0 ...zn−1 .
|ψ0 i |ψ1 i |ψ2 i |ψ3 i
The states at the bottom of the circuit are used in the analysis of the algorithm. They describe
the state of the qubits after each step. Note that state |ψ4 i refers to the first register only. The
notation “/n ” over a wire denotes that it is a n-qubit register.
48
Algorithm 6.2: Quantum part of Simon’s algorithm
Input: A black box Uf implementing function f : {0, 1}n −→ {0, 1}n with the promise
that f (x) = f (y) ⇐⇒ x ⊕ y ∈ {0, s}.
Output: Point x ∈ {0, 1}n such that x · s = 0.
1 Prepare the initial state |0i⊗n |0i⊗n ;
2 Apply H ⊗n to the first register;
3 Apply Uf ;
4 Measure the second register in the computational basis (assume output z0 ...zn−1 );
5 Apply H ⊗n to the first register;
6.3 Analysis of the quantum part

|ψ0 i = |0i⊗n |0i⊗n .

⊗n
|ψ1 i = H|0i ⊗ |0i⊗n
n
2 −1
1 X
= √ |xi ⊗ |0i⊗n ,
2n x=0
where x is written in the decimal notation. After the third step, the state of the quantum
computer is
|ψ2 i = Uf |ψ1 i
n
2 −1
1 X
= √ Uf (|xi ⊗ |0 · · · 0i).
2n x=0
To simplify |ψ2 i, we use the definition of Uf to obtain

n
2 −1
1 X
|ψ2 i = √ |xi ⊗ |f (x)i,
2n x=0
because (0...0) ⊕ f (x) (bitwise XOR) is f (x). The fourth step is a measurement of each qubit of
the second register, which we assume has returned z0 ...zn−1 . State |ψ2 i collapses to a superposi-
tion of only two terms because there are only two points in the domain such that f (x) = z0 ...zn−1 .
Let x0 be one of those points. Then, f (x0 ) = f (x0 ⊕ s) = z0 ...zn−1 and state |ψ3 i is
0
|x i + |x0 ⊕ si

|ψ3 i = √ ⊗ |z0 ...zn−1 i.
2
Note that we have renormalized state |ψ3 i as demanded by the measurement postulate. Point
x0 is unknown. It is selected uniformly at random among the domain points. The information
we wish to acquire, s, is hidden because x0 ⊕ s is a random point in the domain.
49
In the fifth step, we only consider the first register. After applying H ⊗n to the state of the
first register, we obtain
1
√ H ⊗n x0 + H ⊗n x0 ⊕ s .

|ψ4 i =
2
To simplify |ψ4 i, we use Proposition 4.2 on Page 35, which states that for any x0 = (x00 ...x0n−1 )2
2 −1 n
⊗n 0 1 X 0
(−1)x ·x |xi,

H x = √
n
2 x=0
where x0 · x = x00 x0 + · · · + x0n−1 xn−1 mod 2. Then

n −1
2X
1 0 0

|ψ4 i = √ (−1)x ·x + (−1)(x ⊕s)·x |xi.
2n+1 x=0
Now we use the fact that

(x0 ⊕ s) · x = (x00 + s0 )x0 + · · · + (x0n−1 + sn−1 )xn−1 mod 2
= (x00 x0 + ··· + x0n−1 xn−1 ) + (s0 x0 + · · · + sn−1 xn−1 ) mod 2
0
= (x · x) + (s · x) mod 2.
We have used that x00 ⊕ s0 = x00 + s0 mod 2. State |ψ4 i simplifies to
n −1
2X
1 0

|ψ4 i = √ (−1)x ·x 1 + (−1)s·x |xi.
2n+1 x=0
Now we use (
s·x 2, if s · x = 0,
1 + (−1) =
0, otherwise,
to obtain
n −1
2X
1 0
|ψ4 i = √ (−1)x ·x |xi.
2n−1 x=0
s·x=0
The sum is over x such that x · s = 0. Then, the output of the measurement of the first register
0
is x such that x · s = 0 with probability |(−1)x ·x |2 = 1. When we run the quantum part of
Simon’s algorithm, we obtain partial information about s. This means that we have to run the
quantum part many times to collect enough information to determine s.
Note that the probability of obtaining a specific x such that x · s = 0 is 1/2n−1 , which is the
square of the absolute value of the amplitude of state |xi in |ψ4 i. The distribution is uniform
over all of n-bit strings x that satisfy x · s = 0. On the other hand, the probability of obtaining
an arbitrary x such that x · s = 0 is 1, or equivalently the probability of obtaining x such that
x · s 6= 0 is 0. That is, we are 100% sure that the output x satisfies x · s = 0. We learn partial
information about s unless x = (0...0)2 .
Point x0 , which obeys f (x0 ) = z, plays no role in the final result nor in the final calculation
of the success probability. This is good news because x0 was hiding s at some point. After the
fifth step, x0 becomes harmless.
50
6.4 Analysis of the classical part
Each time we run the quantum part of Simon’s algorithm, the output is a n-bit string x such
that x · s = 0. Suppose we have run two times and the outputs are x and x0 . This means that
we have obtained a homogeneous system of linear equations
x0 s0 + · · · + xn−1 sn−1 ≡ 0 mod 2,

x00 s0 + ··· + x0n−1 sn−1 ≡0 mod 2,
where s0 , ..., sn−1 are the variables (unknowns) and x, x0 are known binary coefficients. Let us
calculate probability p(2) that the system is independent. The probability that the first equation
is nontrivial is
1
p1 = 1 − n
2
n
because there are 2 strings x and only one is 0. To calculate the probability that the second
equation is independent, we think that x is a n-dimensional vector in a binary vector space with
2n vectors. The subspace spanned by x has two vectors, x itself and the null vector. There
are 2n − 2 vectors that are linearly independent of x. Then, the probability p2 that the second
equation is independent is
2
p2 = 1 − n .
2
Then, probability p(2) that the two equations are independent is

1 2
p(2) = p1 p2 = 1 − n 1− n .
2 2
The probability that the next equation added to the system is independent is calculated as
follows. Vectors x and x0 span a subspace with four vectors: x, x0 , x ⊕ x0 and the null vector.
There are 2n − 4 vectors that are linearly independent of x and x0 . The probability p3 that the
third equation is independent is
22
p3 = 1 − n .
2
Then, probability p(3) that the three equations are independent is
22

1 2
p(3) = p1 p2 p3 = 1 − n 1− n 1− n .
2 2 2
We proceed in this fashion until we obtain n − 1 independent equations with probability

n−1 n−2
Y 2i
Y
p(n − 1) = pi = 1− n .
2
i=1 i=0
This product is hard to calculate. The goal now is to find a nontrivial lower bound. If we expand
the product we obtain
n−2
2n−2 X 2i

1
1 − n ··· 1 − n =1− + ··· .
2 2 2n
i=0
51
The sum is calculated using the geometric series yielding (1/2 − 1/2n ). The (· · · ) part has
higher-order terms (1/(2n )2 , 1/(2n )3 , ...), and the next Proposition shows that it is positive.
Then,
1 1
p(n − 1) ≥ + n.
2 2
Proposition 6.1. Let n ≥ 2 be an integer. Then
n−2
Y 2i

1 1
1− n ≥ + .
2 2 2n
i=0
Proof. By induction on n. The base case follows after replacing n with 2. Let’s prove the
induction step. The left-hand expression for n + 1 is
n−1 n−1
Y 2i 20 2i
Y
1− = 1 − n+1 1 − n+1 .
2n+1 2 2
i=0 i=1
By manipulating the dummy index i, we obtain

n−1 n−2
Y 2i 2i
Y
1
1− = 1− 1− n .
2n+1 2n+1 2
i=0 i=0
Let us assume that the inequality is true for n. Then

n−1
Y 2i

1 1 1
1− ≥ 1− + .
2n+1 2n+1 2 2n
i=0
Expanding the right-hand side, we obtain

n−1
Y 2i

1 1 1 1 1 1 1
1− = + n+1 + n+1 − ≥ + .
2n+1 2 2 2 2 2n 2 2n+1
i=0
This completes the proof.
We are not done yet because with n − 1 independent equations we determine n − 1 bits of
s. The missing bit si is determined by guessing, for instance, by assuming that si = 0 and
then we use the classical oracle and ask whether f (s) = f (0)? If true, we have already found s;
otherwise, we set si = 1. The cost of running the classical part is basically the cost of solving a
system of n − 1 linear equations with n variables, which is the cost O(n2 ) of inverting a n × n
matrix.
The total cost of the algorithm is n − 1 calls of Uf and one call of f plus O(n2 ) steps to solve
the system of linear equations. The success probability is greater than 1/2.
52
From the circuit of the algorithm, we realize that the only operator that creates or destroys
entanglement is Uf . Then, it is enough to analyze |ψ2 i or |ψ3 i. It is simpler to analyze |ψ3 i.
Simon’s algorithm has entanglement if and only if |ψ3 i is totally or partially entangled. The
state of the first register of |ψ3 i is
|xi + |x ⊕ si
|ψi = √ ,
2
where x is a random n-bit string and s is a fixed nonzero n-bit string. If s = 1...1 and x = 1...1,
|ψi is the well-known Greenberger–Horne–Zeilinger state of n qubits, defined as
|0 · · · 0i + |1 · · · 1i
|GHZi = √ .
2
It is known that the GHZ state is maximally entangled.1 If s = 1...1, state |ψi is non-biseparable2
for any x because
|ψi = X x0 ⊗ · · · ⊗ X xn−1 |GHZi,
and X x0 ⊗ · · · ⊗ X xn−1 does not create or destroy entanglement.
On the other hand, we can factor state |ψi for each bit 0 of s. Suppose that s = 01...1, then
|x1 ...xn−1 i + |x̄1 ...x̄n−1 i

|ψi = |x0 i ⊗ √ ,
2
where x̄i = xi ⊕ 1. This state is not maximally entangled but it still has entanglement if n > 2.
If s = 0...01, then
|xn−1 i + |x̄n−1 i
|ψi = |x0 i ⊗ · · · |xn−2 i ⊗ √ ,
2
which has no entanglement at all.
Summing up, if the Hamming weight of s is larger than 1, Simon’s algorithm has entangle-
ment. The amount of entanglement increases with the Hamming weight of s and the state of
the first register before the measurement is maximally entangled if the Hamming weight of s is
n.
6.6 Circuit of the oracle

In oracle-based algorithms, the oracle is not implemented by us—it is implemented by someone
else. However, it is important to know how it is done in order to understand the whole process.
1
https://en.wikipedia.org/wiki/Greenberger-Horne-Zeilinger_state
2
A pure state |ψi of n qubits is called biseparable, if one can find a partition of the qubits in two registers A
and B such that |ψi = |ψA i ⊗ |ψB i.
53
Let us show an example with n = 3 and s = 110 that will be enough to understand the
general case. Let us take f as the following two-to-one function
x0 x1 x2 f (x)
000
000
110
001
001
111
010
010
100
011
100
101
To build the circuit we need to write down the explicit 3-output truth table, which is
x0 x1 x2 f0 f1 f2
000 000
001 001
010 010
011 100
100 010
101 100
110 000
111 001
Note that f (x) = f0 (x)f1 (x)f2 (x), where f0 to f2 are Boolean functions. The truth table of f0
is obtained by considering only the first column of the output, and the truth tables of f1 and f2
by considering the second and third columns, respectively. Now we focus on bits 1 in the first
column of the output denoted by f0 (x). There are two of them, corresponding to inputs 011
and 101. We add to the circuit two multiqubit Toffoli gates, the first with controls activated
by 011 and the second activated by 101, with target on the 4th qubit, as shown in Fig. 6.1.
Then, we focus on bits 1 in the second column of the output denoted by f1 (x). There are two
of them, corresponding to the inputs 010 and 100. The multiqubit Toffoli gates are activated by
010 and 100, respectively, with target on the 5th qubit, as shown in Fig. 6.1. The last column,
denoted by f2 (x), requires multiqubit Toffoli gates activated by 001 and 111 with target on the
6th qubit, as shown in Fig. 6.1.
|x0 i • • • |x0 i
|x1 i • • |x1 i
|x2 i • • • • |x2 i
|0i |f0 (x0 x1 x2 )i
|0i |f1 (x0 x1 x2 )i
|0i |f2 (x0 x1 x2 )i
Figure 6.1: Oracle of Simon’s algorithm.
The only trivial simplification that can be immediately seen is that the first two multiqubit
54
Toffoli gates can be simplified into only one Toffoli gate with empty control on qubit 2, full
control on qubit 3 and target on qubit 4.
6.7 Final remarks

The formulation of Simon’s problem in the original paper is slightly different from the one
presented here. Simon posed the problem of determining whether f is one-to-one (injective) or
a special kind of two-to-one characterized by a n-bit string s such that f (x0 ) = f (x) if and only
if x0 = x ⊗ s. In the latter case, we have to find s. This formulation goes along the line of
the Deutsch-Jozsa algorithm, in which we have the promise that the oracle is either balanced
or constant. We have to determine which is the case. Note that if we run the quantum part
of Simon’s algorithm with a one-to-one function, the output is a random n-bit string. Simon
used this fact to prove that there exists an algorithm for a quantum Turing machine that solves
Simon’s problem with zero error probability in expected time O(nTf (n) + G(n)), where Tf (n)
is the time required to compute f (x), and G(n) is the time required to solve an n × n linear
system of equations over Z2 [56].
55
Chapter 7
Shor’s Algorithm for Factoring

Integers
Shor’s algorithms were presented at a conference in 1994 [52], the full paper was published in
1997 [53], and reviewed in 1999 [54]. It describes two quantum algorithms for integer factoring
and discrete logarithm exponentially faster than the best-known classical algorithms. It is
a remarkable and celebrated scientific contribution to quantum computing. Shor’s algorithms
exploit not only quantum parallelism but also entanglement. The algorithm for factoring integers
is the focus of this Chapter and is described in many books [25, 28, 35, 39, 49, 51, 58, 62].

Let N be a composite natural number. The computational problem consists in finding a non-
trivial factor of N . If N is composite, there exist natural numbers p1 and p2 such that N = p1 p2 ,
where 1 < p1 , p2 < N . The goal is to find p1 and then p2 is obtained by calculating N/p1 .
Note that if we have an algorithm that finds a nontrivial factor of N efficiently then we are
able to find all prime factors of N efficiently because the number of prime factors of N is at
most log2 N .
7.2 Preliminaries on number theory

Although factoring integers is the main interest because it has a big impact in breaking cryp-
tography methods such as RSA and Diffie-Hellman key exchange, we can turn our attention to
the problem of finding the multiplicative order of a natural number a modulo N because if we
solve the latter problem efficiently then we also solve the factoring problem efficiently.
In number theory, the problem of finding the multiplicative order of a natural number a
modulo N aims to determine the smallest positive integer r such that
ar ≡ 1 mod N.
For example, let N = 21, which is the largest composite number factored on a quantum computer
so far using Shor’s algorithm [57]. Now pick at random a number a such that 1 < a < N . Let us
say a = 2. Then continue multiplying each line by a and simplifying using modular arithmetic
56
until obtaining 1:
a≡2 mod 21 a7 ≡ 2 mod 21

a2 ≡ 4 mod 21 a8 ≡ 4 mod 21
3 9
a ≡8 mod 21 a ≡8 mod 21
4 10
···
a ≡ 16 mod 21 a ≡ 16 mod 21
a5 ≡ 11 mod 21 a11 ≡ 11 mod 21
6 12
a ≡1 mod 21 a ≡1 mod 21
The multiplicative order of 2 modulo 21 is 6 because 6 is the smallest positive integer such that
26 ≡ 1 mod 21. If we continue this sequence, the results 2, 4, and so on will repeat again and
again because ar+1 ≡ a1 ≡ 2, ar+2 ≡ a2 ≡ 4, and so on. Then, we have a r-periodic sequence.
If r is even, then
r r
a 2 + 1 a 2 − 1 ≡ ar − 1 mod N

≡ 0 mod N.
We find two numbers ar/2 + 1 and ar/2 − 1 whose product is a multiple of N . If those numbers
are not zero nor multiple of N , then ar/2 + 1 and a r/2 − 1 must have factors whose product is
N . We conclude that in this case gcd ar/2 + 1, N > 1 and gcd ar/2 − 1, N > 1, where gcd
is the greatest common divisor. For example, when a = 2 and r = 6, we have ar/2 + 1 = 9
and ar/2 − 1 = 7, and in both cases the gcd returns a nontrivial factor of 21. The method fails
when a = 5 because the multiplicative order of 5 modulo 21 is r = 6 and ar/2 + 1 = 126 and
ar/2 − 1 = 124. In the first case, 126 is a multiple of 21, and in the second case, gcd(124, 21) = 1.
In Shor’s algorithm, we start with the number N , then we pick at random a number a such
that 1 < a < N . Before calculating the order of a, we check whether gcd(a, N ) > 1 because
(1) if gcd(a, N ) > 1 then there is no r such that ar ≡ 1 mod N and (2) if gcd(a, N ) > 1 then a
is a nontrivial factor of N , and then we are done. To better understand what is going on here,
we split the set of numbers {1, ..., N } into two subsets:
S1 = {a : 1 < a < N and gcd(a, N ) > 1},

S2 = {a : 1 ≤ a < N and gcd(a, N ) = 1}.
In our example with N = 21, we have
S1 = {3, 6, 7, 9, 12, 14, 15, 18},

S2 = {1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20}.
When we pick a number a at random, if a ∈ S1 , we quickly find a factor of N by calculating

gcd(a, N ) using the Euclidean algorithm.1 The question now is what is the cardinality of S1 ? Is
it O(N )? To answer this question we use the following facts about S2 for an arbitrary N , which
is denoted by Z∗N in number theory [23, 41]:
Fact 1 Z∗N is a finite multiplicative group modulo N .

1
https://en.wikipedia.org/wiki/Euclidean_algorithm
57
Fact 2 The cardinality of Z∗N is Euler’s totient function ϕ(N ).
Fact 1 is the theoretical basis that guarantees that there exists r such that ar ≡ 1 mod N for
any a ∈ Z∗N . It also guarantees that the function
f (`) = a` mod N
is r-periodic. On the other hand, Fact 2 is the definition of Euler’s totient function ϕ(N ), which
has been widely studied in number theory. It is known that ϕ(N ) is always nearly N (Hardy
and Wright [23]). Since the cardinality of S1 is N − ϕ(N ) − 1, the probability of selecting
a ∈ S1 is always much smaller than the probability of selecting a ∈ Z∗N . For instance, assume
that N = p1 p2 , where p1 and p2 are prime numbers such √ that the number of figures of p1 is
close to the number of figures of the integer part of N . The same is true for p2 . This is
the most interesting case in cryptography because this is the hardest case for classical factoring
algorithms. Those primes usually have 1024 bits, which is close to 308 decimal figures. For this
case, we have
S1 = {p1 , 2p1 , ..., (p2 − 1)p1 , p2 , 2p2 , ..., (p1 − 1)p2 }
√
and the cardinality of S1 is p1 + p2 − 2, which is close to 2 N . For large N , we see that there
are more odds of picking a from Z∗N .
Not all a’s in Z∗N are good because some of them have odd multiplicative order. If we
select a bad one, we have to discard a and pick at random another one. The question now is
how many a’s in Z∗N have even order? Not only that, besides even order we also need that
gcd ar/2 + 1, N > 1 or gcd ar/2 − 1, N > 1.

Fact 3 (Theorem A4.13 of [40]) Suppose N = pα1 1 · · · pαmm is the prime factorization of an odd
composite positive integer. Let a be chosen uniformly at random from Z∗N , and let r be the
order of a modulo N . Then prob(r is even and ar/2 + 1 6≡ 0 mod N ) ≥ 1 − 1/2m ≥ 3/4.
The theorem above states that the probability of selecting a good a is at least 3/4. Note that
the case ar/2 − 1 ≡ 0 mod N , which is problematic, never happens because by definition r is
the smallest integer such that ar − 1 ≡ 0 mod N .Then, if r is even and ar/2 + 1 6≡ 0 mod N ,
we have gcd ar/2 + 1, N > 1 and gcd ar/2 − 1, N > 1.
7.3 Quantum operator for modular exponentiation

(a)
The unitary operator UN , which calculates the exponentiation of integer a modulo N , is defined
as E
(a)
UN |ì|yi = |ìy ⊕ a` mod N , for 0 ≤ ` < q, 0 ≤ y < 2n ,

where n = dlog2 N e, ⊕ is the bitwise XOR operation or bitwise sum modulo 2, and q is the
(a)
smallest power of 2 such that N 2 < q < 2N 2 . UN acts on two registers with sizes log2 q and n,
and it is a permutation matrix of dimension 2n q. The inputs of the algorithm, a and N , come
(a) (a)
through UN . It replaces the oracle Uf in Simon’s algorithm, but UN is no oracle because it is
(a)
our task to implement it. In Shor’s algorithm, UN is used many times with a different a each
58
(a)
time. The proof that UN is unitary is an extension of the proof presented in Proposition 3.1
on Page 29.
(a)
To implement UN efficiently, it is necessary to use the repeated squaring method, which is an
efficient algorithm to calculate modular exponentiation. For instance, if we want to calculate 316
mod 7, we would naively multiply 3 × 3 × ... × 3 sixteen times, obtain a large number, and then
calculate the remainder after dividing by 7. In the exponentiation by squaring, we calculate the
square of 3 modulo 7, then we calculate the square of the result modulo 7, and so on four times,
which requires a logarithmic number of multiplications and each result is never too large. When
we calculate a` mod N , ` is not a power of 2 in general, but the repeated squaring method still
(a)
can be used. Using this method, it is possible to find an efficient circuit of UN by converting
classical irreversible circuits into reversible ones [39, 43].
7.4 Fourier transform and its inverse

The q-dimensional Fourier Transform Fq is defined as
q−1
1 X k`
Fq |ki = √ ω |ì,
q
`=0
where 0 ≤ k < q and ω = e2πi/q . Note that the (k, `)-entry of Fq is
ω k`
Fq k`
= √ .
q
Then, Fq is a symmetric matrix. To find Fq† , we simply take the complex conjugate of each
√
entry, which is ω −k` / q because the complex conjugate of ω is ω −1 . Then
q−1
1 X −k`
Fq† |ki = √ ω |ì,
q
`=0
where 0 ≤ k < q.
Let us show that Fq is unitary. Using the definitions of Fq and Fq† , we have
q−1 q−1
! !

0 † 1 X −k0 `0
0 1 X k`
k Fq Fq |ki = √ ω ` √ ω |ì
q 0 q
` =0 `=0
q−1
1 X 0
= ω (k−k )` .
q
`=0
Now we use the closed-form formula for the geometric series with q terms, which is
q−1
X 1 − sq
sk = ,
1−s
k=0
59
0
if s 6= 1. When s = 1, the left-hand sum is equal to q. In our case s = ω (k−k ) . From the
0
definition of ω, we have ω q = 1, and then ω (k−k )q = 1. Combining those results, we obtain
q−1
(
1 X (k−k0 )` 1, if k = k 0 ,
ω =
q 0, otherwise.
`=0
We have just shown that

0 †
k Fq Fq |ki = δkk0 ,
that is, Fq† Fq = I.

In Shor’s algorithm, given N , q is the smallest power of 2 such that q > N 2 . Fq is applied
only to the first register. The number of qubits of the first register is log2 q, which is close to
2n, while the number of qubits of the second register is exactly n = dlog2 N e. Fq plays a role
similar to the Hadamard gates at the end of Simon’s algorithm. The circuit of Fq in terms of
CNOT and one-qubit gates is described in Sec. 7.8.
7.5 The algorithm
Algorithm 7.1: Shor’s algorithm

Input: Composite integer N .
Output: A nontrivial factor of N .
1 Check whether N is a power of some prime number p;
2 If True, return p;
3 Pick uniformly at random an integer a such that 1 < a < N ;
4 If gcd(a, N ) > 1, return a;
5 Run the quantum part with inputs a and N Algorithm 7.2, assume ouput `0 , ..., `q−1 ;
6 Calculate b = `/q (the same q used in the quantum part);
7 Find the convergent of the continued fraction expansion of b with the largest
denominator r0 such that r0 < N ;
8 If r0 is odd, go to Step 3;
0
9 Check whether gcd(ar /2 + 1, N ) > 1;
0
10 If True, return gcd(ar /2 + 1, N ); otherwise, go to Step 3.
Shor’s algorithm is described in Algorithm 7.1 and the quantum part is described in Algo-
rithm 7.2. The circuit is of the quantum part is
|0i H `0
.. .. .. ..
. . Fq† . .
(a)
|0i H UN `q−1
|ψ4 i
|0i⊗n /n z0 ...zn−1 ,
|ψ0 i |ψ1 i |ψ2 i |ψ3 i
60
where N 2 < q < 2N 2 , n = dlog2 N e, and a is an integer such that gcd(a, N ) = 1. The first
register has around 2n qubits and the second has exactly n qubits. The states at the bottom
of the circuit are used in the analysis of the algorithm. They describe the states of the qubits
after each step. Note that state |ψ4 i refers to the first register only. The notation “/n ” over a
wire denotes that it is a n-qubit register.
Algorithm 7.2: Quantum part of Shor’s algorithm

Input: A composite integer N and integer 1 < a < N such that gcd(a, N ) = 1.
Output: q-bit string ` that is the nearest integer to a multiple of q/r with probability
greater than or equal to 4/π 2 , where q is the smallest power of 2 such that
N 2 < q < 2N 2 .
1 Prepare the initial state |0i⊗ log2 q |0i⊗n , where n = dlog2 N e;
2 Apply H ⊗ log2 q to the first register;
(a)
3 Apply UN to both registers;
4 Measure the second register in the computational basis (assume output z0 ...zn−1 );
5 Apply Fq† to the first register;
We have presented Shor’s algorithm as a Las Vegas algorithm, which means that the output
is always correct and the expected runtime is finite. With a small modification, it can be
presented as a Monte Carlo algorithm, which means that the output may be incorrect with a
certain probability.
7.6 Analysis of the quantum part

Calculation of |ψ0 i
|ψ0 i = |0i⊗ log2 q |0i⊗n ,
where q is the smallest power of 2 such that N 2 < q < 2N 2 .
⊗ log2 q
|ψ1 i = H|0i ⊗ |0i⊗n
q−1
1 X
= √ |ì ⊗ |0i⊗n ,
q
`=0
where ` is written in the decimal notation.
61
After the third step, the state of the quantum computer is
(a)
|ψ2 i = UN |ψ1 i
q−1
1 X (a)
= √ UN (|ì ⊗ |0 · · · 0i).
q
`=0
(a)
To simplify |ψ2 i, we use the definition of UN to obtain
q−1
1 X
`
E
|ψ2 i = √ |ì ⊗ a mod N ,
q
`=0
because (0...0) ⊕ a` (bitwise XOR) is a` . From now on, we drop the notation modulo N inside
the second ket because we have no XOR operation and there is no danger in failing to recognize
the correct arithmetic.
It is really important to understand the structure of |ψ2 i before proceeding. Expanding the
sum, we obtain
√
|2ia2 + · · · + |r − 1iar−1 +

q|ψ2 i = |0i|1i + |1i|ai +
|r + 2ia2 + · · · + |2r − 1iar−1 +

|ri|1i + |r + 1i|ai +
|2r + 2ia2 + · · · + |3r − 1iar−1 +

|2ri|1i + |2r + 1i|ai +

.. .. .. .. ..
. . . . .
2
|(c − 1)ri|1i + |(c − 1)r + 1i|ai + |(c − 1)r + 2ia + . . . ,
where c = dq/re. In fact, we have q = (c − 1)r + r0 , where r0 is the remainder of q divided by

r. The last line has r0 terms and is incomplete unless r0 = 0.2 The possible values inside the
second ket are 1, a, a2 , ..., ar−1 . We have split |ψ2 i into columns that have the same second
ket. The first columns have c terms and the last columns have c − 1 terms.
The fourth step is a measurement of each qubit of the second register, which we assume has
returned z0 , ..., zn−1 . Then there exists r1 such that ar1 = z, where 0 ≤ r1 < r. State |ψ2 i
collapses to a superposition of the first register with all terms |ì such that a` ≡ ar1 mod N ,
that is, ` = kr + r1 , for 0 ≤ k < r, yielding
c−1
!
1 X
|ψ3 i = √ |kr + r1 i |ar1 i,
c
k=0
where c = dq/re if ar1 belongs to one of the first columns (r1 ≤ r0 ), and c = bq/rc if ar1 belongs
to one of the last columns (r1 > r0 ). The analysis of the algorithm uses parameter c extensively.
It is important to memorize its definition and be aware that it is an integer.
2
The last line is complete if r | q. In this case, r is a power of 2.
62
Note that we have renormalized state |ψ3 i as demanded by the measurement postulate. We
don’t know r1 because it is selected at random from 0 to r − 1. The information we wish to
acquire, r, is hidden because kr+r1 is a random value from 0 to q−1. To perform a measurement
of the first register at this point is useless. But, the probability distribution of the first register
is a r-periodic function. This motivates the application of the inverse Fourier transform because
the result is another (almost) periodic function, whose period is close to q/r.
In the fifth step, we only consider the first register. After applying Fq† to the state of the first
register,3 we obtain
c−1
1 X †
|ψ4 i = √ Fq |kr + r1 i
c
k=0
q−1
c−1 X
1 X
= √ ω −`(kr+r1 ) |ì.
cq
k=0 `=0
Rearranging the order of the sums, we obtain

q−1 c−1
!
1 X −`r1 1 X −`kr
|ψ4 i = √ ω √ ω |ì.
q c
`=0 k=0
To calculate the sum inside the parentheses, we use again the closed-form formula for the geo-
metric series, which is
c−1
X 1 − sc
sk = ,
1−s
k=0
if s 6= 1. When s = 1, the left-hand sum is equal to c. For the sum inside the parentheses,
s = ω −`r = e−2πi`r/q . Then
c−1
(
X c, if q | (`r),
ω −`kr = 1−ω−`cr
k=0 1−ω −`r
, otherwise.
State |ψ4 i simplifies to

√ X q−1 q−1
c 1 X −`r1 1 − ω −`cr
|ψ4 i = √ |ì + √ ω |ì, (7.1)
q qc 1 − ω −`r
`=0 `=0
q|(`r) q-(`r)
where the notation q - (`r) means that q does not divide `r. The first sum is over ` such that q
divides (`r). The second sum is over the remaining `’s.
3
The algorithm uses Fq† and not Fq at this point because the goal is to convert a superposition state into a
state of the computational basis, not the other way around.
63
Calculation of the output
The probability of obtaining 0 ≤ ` < q is

c, if q | (`r),
q
p(`) = 2
 1 1−ω−`cr
q c 1−ω −`r , otherwise.

2
Using the definition of ω and 1 − e2πiθ = 4 sin2 (πθ), p(`) simplifies to

 qc , if q | (`r),
p(`) = sin2 π`rc
q
(7.2)

q c sin2 π`r
, otherwise.
q
The terms that contribute most to p(`) when q - (`r) are the ones with ` satisfying
π`r
sin2 ≈0
q
because p(`) is large when the denominator is close to zero. This implies that π`r/q must be
close to a multiple of π (let’s say kπ) and then the `’s that contribute most are
kq
`≈ ,
r
where k runs from 0 to r − 1. Note that if ` is zero or an exact multiple of q/r then p(`) = c/q ≈
1/r — see the definition of p(`) in Eq. (7.2). This analysis explain why the peaks of Fig. 7.1
correspond to `’s that are close to kq/r. The output of the quantum part is a q-bit string ` such
that ` is close to a multiple of q/r.
Figure 7.1: Probability distribution p(`) as a function of ` given by Eq. (7.2) when N = 21,
q = 29 , and r = 6. The dots at the top the peaks correspond to ` = 0, 85, 171, 256, 341, 527.
Details about the probability distribution

In the previous analysis, it is missing to show that the numerator of p(`) when q - (`r), the term
π`rc
sin2 ,
q
64
is not too small when ` ≈ kq/r for 0 ≤ k < r − 1. Since this analysis is too long when ` is a
discrete variable, we take an alternative route.
Let us look at p(`) as a continuous function in terms of ` in the domain [0, q]. By using
trigonometric identities and the fact that c is an integer, it is straightforward to show that
q
p `+ = p(`).
r
When we look at p(`) as a continuous function, we are able to show that it is a truly periodic
function, while the function depicted in Fig. 7.1 is not.
Now let us obtain the shape of p(`). Since p(`) is (q/r)-periodic, let us consider the interval
` ∈ [0, q/r], and besides let us restrict to `’s such that q - (`r). We use the expression
π`rc
sin2 q
p(`) = 2 π`r
.
qc sin q
The numerator of p(`), sin2 (π`rc/q), is the square of a sinusoidal function, which is (q/rc)-
periodic. Note that q/rc is exactly 1 if r | q and is close to 1 if r - q because c is either dq/re or
bq/rc. An example of the numerator of p(`) is depicted in Fig. 7.2(a) for ` ∈ [0, q/r].
Figure 7.2: (a) Numerator of p(`) as a continuous function of ` for ` ∈ [0, q/r] when N = 21,
q = 29 , and r = 6. (b) Denominator of p(`) as a continuous function of ` without qc. (c) p(`) as
a continuous function of ` for ` ∈ [0, q/r]. The plot (c) is obtained by dividing (a) by (b) by qc.
The denominator of p(`) (without qc), sin2 (π`r/q), is also the square of a sinusoidal function,
which is zero only at ` = 0 and ` = q/r. An example of the denominator of p(`) (without qc)
for the same values of N , q, and r is depicted in Fig. 7.2(b) for ` ∈ [0, q/r].
If we divide the plot shown in Fig. 7.2(a) by the plot shown in Fig. 7.2(b) and divide the
result by qc, we obtain the plot shown in Fig. 7.2(c), which is the continuous version of the plot
shown in Fig. 7.1 for 0 ≤ ` < q/r. It is straightforward to check two facts: (1) the largest values
of p(`) are close to ` = 0 and ` = q/r because the denominator of p(`) is close to zero; and
(2) p(`) is close to zero in the middle (` close to q/2r) because the numerator is upper bounded
by 1 and the denominator is close to qc, which is large. Many other facts about p(`) can be
obtained from this division, for instance, it is easy to find the number of zeros of p(`). Using
our knowledge of calculus, we check that
π`rc π`rc
sin2 q sin2 q
lim 2 π`r
= lim 2 π`r
= c2 .
`=0+ sin q `= rq − sin q
65
Then, p(0) = p(q/r) = c/q ≈ 1/r. With this analysis, we have obtained the shape of p(`) in the
whole domain because p(`) is (q/r)-periodic, that is, if we put 6 plots of Fig. 7.2(c) side by side,
we obtain the continuous version of the plot of Fig. 7.1.
Figure 7.3: First peak of the probability distribution p(`) as a function of ` when N = 21,
q = 29 , and r = 6. The red curve is the continuous version of p(`) and the height of the peak is
1/r. The values of p(`) when ` is integer are highlighted in blue. The blue plot is the stretched
version of the first peak of Fig. 7.1.
Let us further analyze the shape of the first peak of p(`). Fig. 7.3 depicts the first peak in
detail when N = 21, q = 29 , and r = 6 showing not only the continuous version in red but also
the discrete values in blue when ` is an integer. The points in blue are the same ones depicted in
the first peak of Fig. 7.1, and at least four of them are clearly recognizable. Note that the first
peak of Fig. 7.1 does not reach c/q because the peak is aborted before reaching the underlying
summit. The underlying summit is reached only for `’s such that q | (`r).
The particular case r | q (r is a power of 2) is remarkable. In fact, in this case, c is exactly
equal to q/r, which is integer, and p(`) reaches the summit, which is exactly 1/r, for all `’s that
are multiple of q/r and p(`) goes to the bottom of the valley (p(`) = 0) for all other values
of `. In Eq. (7.1), |ψ4 i has only the first sum because all terms of the second sum vanishes
since ω −`cr = ω −`q = 1. In this case, the discrete version of the probability distribution is truly
periodic with period q/r. It happens because we are using qubits and then the dimension of the
Hilbert space is a power of 2.
Success probability of the quantum part

The gist of the analysis is the calculation of the success probability after one round of the quan-
tum part of the algorithm. We have already amassed enough information about the probability
distribution; we are ready for the calculation. There is a trigonometric inequality
1
4α2 ≤ sin2 (πα) ≤ π 2 α2 for |α| ≤ ,
2
which can be proved using basic calculus. Using both sides of that inequality, we prove the
following proposition:
Proposition 7.1. If |α| ≤ 1/2c and c ≥ 1, then
sin2 παc 4c2

2 ≥ 2.
sin πα π
66
Using this proposition, we are able to find an interesting lower bound on the success prob-
ability. Let us start by calculating a lower bound on the probability p(`) when ` is the nearest
integer to a multiple of q/r. Suppose that q - (`r) and take ` = bkq/re for 1 ≤ k < r, where b e
is the notation for the nearest integer. Then,
j m

kq sin 2
π kq
r
rc
q
p = j m .
r q c sin2 π kq r r q
Now we use the trigonometric identity sin2 α = sin2 (πk 0 − α) valid for any integer k 0 to obtain
sin2 παc

kq
p = ,
r q c sin2 πα
where
kq r
α = k− .
r q
Using that
kq kq 1
r − r ≤ 2, (7.3)

we obtain
r 1
|α| ≤ ≤ q.
2q 2 r
Assuming that c = bq/rc and using Proposition 7.1, we obtain

kq 4c 4
p ≥ 2 ≈ 2 .
r π q π r
We can visualize this result by noticing that there are two blue bars inside the peak of p(`) in
Fig. 7.3. We have just shown that the height of the largest bar is never shorter than 4/π 2 r,
around 40.5% of the height of the peak.
In the quantum part of the algorithm, there are r possible k’s. Then, a lower bound on the
success probability is
4
psucc ≥ 2 ,
π
which means that the quantum part has at least 40.5% of chances of returning a (log2 q)-bit
string ` that is the nearest integer to a multiple of q/r.
7.7 Analysis of the classical part

Before running the quantum part of the algorithm, we need to do a classical checklist. It
is necessary to check whether N is a composite number, which can be done efficiently using
primality testing4 . Besides, there are efficient classical algorithms to check whether N is a
power of some prime number p [6]. After assuring ourselves that N is a composite number and
4
https://en.wikipedia.org/wiki/Primality_test
67
is not a power of a prime number, we select a random integer a such that 1 < a < N and check
whether gcd(a, N ) = 1, which can be done efficiently using the Euclidean algorithm.
Then, after running the quantum part of the algorithm, we assume that the output is a
(log2 q)-bit string ` such that ` is the nearest integer to a multiple of q/r, that is

kq
`=
r
for some k such that 0 ≤ k < r. Define

`
b= ,
q
which obeys 0 ≤ b < 1. Now we use the method of continued fraction approximation to obtain
the desired information r. A continued fraction expansion of a positive rational number b < 1 is
1
b= 1 ,
b0 + b1 + 1
..
.+ 1
bm
where b0 to bm are positive integers, and m is a natural number. The notation for the continued
fraction is [b0 , b1 , ..., bm ], and the successive convergents are [b0 ], [b0 , b1 ], [b0 , b1 , b2 ], and so on,
each one closer to b, until the last one, which is equal to b. Each convergent is converted into
a rational number by truncating the continued fraction expansion. In the algorithm, we have
to find the convergent [b0 , b1 , ..., bj ] that has the largest j such that the denominator of the
equivalent rational number is less than N . The denominator of this convergent is the candidate
for r.
Now we can see why we have to demand that q > N 2 . It is a consequence of a theorem proved
at the end of the chapter on continued fraction expansion in Hardy and Wright’s book [23].
Theorem 7.2. (Hardy and Wright) If k and r are positive integers and b is a positive real
number and
k
− b < 1 ,

r 2r2
then k/r is a convergent of the continued fraction expansion of b.
In Eq. (7.3), we have shown that the output ` of the quantum part obeys

k
− ≤ 1,
`
r (7.4)
q 2q
for some integer k such that 0 ≤ k < r, which is unknown to us as well as r. In order to use
Theorem 7.2, we have to demand that q > N 2 because N is an upper bound for r, that is,
1/q < 1/N 2 < 1/r2 . Then, we can use the theorem and be sure that k/r will be a convergent of
`/q.
To understand why we have to pick the convergent [b0 , b1 , ..., bj ] that have the largest j such
that the denominator of the equivalent rational number is less than N , we need to consider the
following facts. If we pick a convergent [b0 , b1 , ..., bj 0 ] = k 0 /r0 that obeys Eq. (7.4) and r0 < N
and it does not have the largest j 0 , then the next convergent, let us say [b0 , b1 , ..., bj+1 ] = k 00 /r00
68
also obeys Eq. (7.4) and r00 < N . We obtain a contradiction because on the one hand using
Eq. (7.4) twice we have
0 0
00
00
k
− = k −b − k −b ≤ 1 < 1 ,
k
r0 r 00 r 0 r 00 q N2
and on the other hand we have

0 00
0 00 00 0
− k = |k r − k r | > 1 .
k
r0 r00 r0 r00 N2
The last inequality follows from the inequalities |k 0 r00 − k 00 r0 | ≥ 1 and r0 < N and r00 < N .
In conclusion, there is only one convergent [b0 , b1 , ..., bj ] that obeys Eq. (7.4) such that the
denominator of equivalent fraction is less than N ; it is the one with the largest j.
We are not done yet because it may happen that gcd(k, r) > 1. In this case, when we look at
the denominator of k/r, we obtain a factor of r, not r itself. We have to discard those cases and
we ask how many k’s relatively prime with r are there. The proportion of good k’s is ϕ(r)/r,
where ϕ(r) is Euler’s totient function. There is a lower bound for ϕ(r) given by [23, 50]
r
ϕ(r) > , for r ≥ 7.
4 ln(ln(r))
Then, a lower bound for the probability that the output ` is the nearest integer to a multiple of
q/r and gcd(k, r) = 1 is

kq 1
p `= and gcd(k, r) = 1 > 2 ,
r π ln(ln(r))
for r ≥ 7.
If we think of the factoring algorithm as a Monte Carlo algorithm, which is obtained by
removing the “go to” statements of Algorithm 7.1, the algorithm will run only one time and a
lower bound on the overall success probability is
3
,
4π 2 ln(ln(r))
which is obtained
from the analysis of this section and Fact 3 r must be even and gcd(ar/2 +
1, N ) > 1 . If we think of the factoring algorithm as a Las Vegas algorithm, the way it was
described in Algorithm 7.1, the success probability is 1 but we have to calculate an upper bound
for the average number of times the quantum part of the algorithm will run until finding a factor
of N . Suppose that an algorithm has probability 0 < p < 1 of outputting the correct result in
one run of the algorithm. The probability of outputting the correct result after exactly n runs
is (1 − p)n−1 p because it must fail n − 1 times before succeeding. The average number of times
the algorithm will run is
∞
X 1
n (1 − p)n−1 p = .
p
n=1
Then, an upper bound on the average number of times that the quantum part will run in
Algorithm 7.1 is 4π 2 ln(ln(r))/3 for r ≥ 7.
69
7.8 Circuit of the Fourier transform
The first decomposition of the Fourier transform in terms of basic gates is in an IBM report of
1994, which became widely available in 2002 [13]. A description of this decomposition based on
the classical FFT is available in [34].
The basic block of the circuit of the Fourier transform F2n , where n is the number of qubits,
is the controlled gate C(Rk ) for k ≥ 0, where

1 0
Rk = .
0 exp 2πi 2k
The matrix representation of C(Rk ) is

 
1 0 0 0
I2 0 1 0 0 
C(Rk ) = =
0
.
Rk 0 1 0 
0 0 0 exp 2πi
2k
The set of gates Rk has many special subcases, which are
R 0 = I2 ,
R1 = Z,
R2 = S,
R3 = T,
where Z,√S, and T are the Pauli Z gate, phase gate, and π/8 or T gate, respectively. Note that,
Rk+1 = Rk . Then, the sequence above means the next gate pis the square root of the previous
one. The same idea applies to C(Rk ), that is, C(Rk+1 ) = C(Rk ), but in the latter case we
are calculating the square root of (4 × 4)-matrices.
Figure 7.4: Snapshot of a Jupyter notebook showing the decomposition of the Fourier transform
F25 using Python commands QFT and circuit plot of the sympy library.
Fig. 7.4 depicts the circuit of F2n in terms of the Hadamard, C(Rk ), and swap gates when
n = 5. The structure of the circuit can be easily grasped from this example. The circuit has
70
n + 1 blocks. From left to right, the first block has n gates, starting with H and then R2 , R3 ,...,
Rn acting on qubit 1 and controlled by qubit 2, 3,..., n, respectively. The next block starts
again with H and then R2 , R3 ,..., Rn−1 acting on qubit 2 and controlled by qubit 3, 4,..., n,
respectively. This goes on until we reach the last qubit, on which a single H (n-th block) is
applied. The last block is made of bn/2c swap gates and has a simple symmetric structure. If
n is odd, the central qubit does not swap.
Let us show that a circuit with the structure depicted in Fig. 7.4 implements the Fourier
transform when we have n qubits. Suppose that the input is |ì = |`1 i ⊗ · · · ⊗ |`n i. When the
input is a state of the computational basis, the output is an unentangled state |ψ1 i ⊗ · · · ⊗ |ψn i.
Let us start by calculating the output |ψ1 i of the first qubit. Since there is a swap gate inverting
the states of the first and last qubit, we have
`n `
|0i + e2πi 2 |1i |0i + e2πi 2 |1i
|ψ1 i = H|`n i = √ = √ .
2 2
The second equation follow from the fact that if `n is 0, the output is |+i, and if `n = 1,
the output is |−i (because eπi is −1). The last equation follows from the decomposition ` =
2n−1 `1 + 2n−2 `n−2 + · · · + 2`n−1 + `n and from e2πik = 1 if k is an integer.
The output of the second qubit is
  `
`
2πi n−1 2πi n−1 + `n2 `
`n `n  |0i + e 2 |1i |0i + e 2 2 |1i |0i + e2πi 22 |1i
|ψ2 i = R2 H|`n−1 i = R2 √  = √ = √ .
2 2 2
The first equation is obtained from Fig. 7.4 using that C(R2 )|`n i|`n−1 i = |`n i R2`n |`n−1 i . The

second equation uses the same calculation described before for the first qubit. The third equation
follows from
R2`n |0i = |0i,

`n
R2`n |1i = e2πi 22 |1i.
The last equation uses the same decomposition of ` described before.

The output of the last qubit is

`1 `
2πi + 22 +···+ 2`n `
|0i + e 2 2 n
|1i |0i + e2πi 2n |1i
|ψn i = Rn`n · · · R2`2 H|`1 i = √ = √ .
2 2
The first equation is obtained from Fig. 7.4 using that the output of the last qubit is obtained
from the action of H, R2`2 , ..., Rn`n on the first qubit. The second and third equations are
obtained with the same sort of calculations described before for the first and second qubits.
Then, the output |ψ1 i ⊗ · · · ⊗ |ψn i of the circuit of Fig. 7.4 with n qubits is
` ` `
|0i + e2πi 2 |1i |0i + e2πi 22 |1i |0i + e2πi 2n |1i
√ ⊗ √ ⊗ ··· ⊗ √ .
2 2 2
Now we convert each term into a sum
1 1 1
1 X 2πik1 ` 1 X 2πik2 `2 1 X 2πikn `n
√ e 2 |k i ⊗ √
1 e 2 |k2 i ⊗ · · · ⊗ √ e 2 |k i.
n
2 k =0 2 k =0 2 k =0
1 2 n
71
Pushing all sums to the right-hand side and combining the exponentials, we obtain
1
1 X 2πi`
k1
+···+ k2n
n
√ e 2
|k1 , ..., kn i,
2n k1 ,...,kn =0
which is equivalent to
2 −1 n
1 X 2πi`k
√ e 2n |ki.
2n k=0
Using the definition of the Fourier transform given in Sec. 7.4, we recognize that the last ex-
pression is F2n |ì.
Decomposition of C(Rk )
The circuit that provides the decomposition of C(Rk ) in terms of CNOT and 1-qubit gates Rk+1
is
• • • Rk+1
≡
† .
Rk Rk+1 Rk+1
This is not a decomposition in terms of universal gates because we need to write Rk+1 in terms
of a finite set of 1-qubit gates. In most cases, it is unnecessary to worry about this step, since
it is done automatically by the compiler of quantum computers. Rk can be implemented using
Rz , which is a rotation of the Bloch sphere about the z-axis. Note that if k is large, errors will
prevent these gates to work properly unless error-correcting codes are present.
Now we show that the decomposition of C(Rk ) in terms of CNOT and 1-qubit gates Rk+1
is correct. If the input to C(Rk ) is |ji|ì then the output is
•− Rk
|ji|ì −−−−−→ |jiRkj |ì.
If |jì is equal to |00i or |01i or |10i, the output is |jì because Rk |0i = |0i. The only nontrivial
output is obtained when the input is |11i. In this case, the output is exp(2πi/2k )|11i.
Let us analyze the decomposition (right-hand circuit) to check that the result is the same.
For the inputs |00i or |01i or |10i, either the CNOTs are not activated or they cancel out.
†
Besides, since Rk |0i = |0i and Rk+1 Rk+1 = I, we check that the output is equal to the input.
The missing case is input |11i. Let us follow each step at a time using that Rk+1 |0i = |0i and
Rk+1 |1i = exp(2πi/2k+1 )|1i:
†
I⊗Rk+1 k+1 •−⊕ k+1 I⊗Rk+1 k+1
|1i|1i −−−−−→ e2πi/2 |1i|1i −−−→ e2πi/2 |1i|0i −−−−−→ e2πi/2 |1i|0i → - - -
•−⊕ k+1 Rk+1 ⊗I k+1 k+1 k
- - - −−−→ e2πi/2 |1i|1i −−−−−→ e2πi/2 e2πi/2 |1i|1i = e2πi/2 |1i|1i.
Note that the output is exp(2πi/2k )|11i, which is equal to the output of the left-hand circuit.
72
7.9 Final remarks
In his original paper [54], Shor showed that the number of times we have to rerun the algorithm
until hitting a solution is O(log log r). It is possible to reduce the number of repetitions of the
quantum part by improving the classical post-processing [20].
73
Chapter 8
Shor’s Algorithm for Discrete

Logarithm
The full paper on Shor’s algorithms was published in 1997 [53] and has described not only an
algorithm for integer factoring but also an exponentially faster algorithm for discrete logarithm.
It is a remarkable and celebrated scientific contribution to quantum computing, but the algo-
rithm for discrete logarithm has not been described in many books [28]. Some papers apply this
algorithm to cryptography [27, 46].
8.1 Preliminaries on number theory

The discrete logarithm of b to base a, denoted by loga b, is the number of times that a multiplied
by itself is b, that is, as = b. Here we are assuming that a, b, and s are positive integer numbers.
For instance, log2 8 is 3 because 2 × 2 × 2 = 8. Since s = loga b, we write
aloga b = b.
Note that there are choices of a and b so that the discrete logarithm loga b does not exist, for
instance, log2 7 because there is no integer s that satisfies 2s = 7. Two important properties of
the logarithm to base a, when loga b and loga c exist, are
• loga (bc) = loga b + loga c;
• loga (bc ) = c loga b.
There are two drawbacks in the context of quantum algorithms when we use the set of positive
integers Z+ to define the notion of discrete logarithm: (1) Depending on the choice of a and
b, loga b may not exist, and (2) if loga b do exist, there are efficient methods to calculate loga b
using the fact that the discrete logarithm is equal to the standard definition of logarithm of real
numbers.
Instead of using Z+ , let us use a subset of the set ZN = {0, 1, ..., N − 1}, which has two mod-
ular operations—addition and multiplication. We are basically interested in the multiplication
modulo N , where N > 1. To pick up the desired subset of ZN , we need to know which numbers
of ZN have inverse. A number a ∈ ZN has an inverse a−1 (aa−1 ≡ 1 mod N ) if and only if the
greatest common divisor of a and N is 1. If a has an inverse, the set Ga = {a, a2 , ..., ar } is a cyclic
74
group, where r is the order of a modulo N (order(a) is the smallest positive integer so that ar ≡ 1
mod N ). The group’s product in this case is the multiplication modulo N . If we select any ele-
ment b in Ga , loga b exists. For example, for N = 34 we have G15 = {15, 21, 9, 33, 19, 13, 25, 1},
and log15 15 = 1, log15 21 = 2, log15 9 = 3, log15 33 = 4, and so on. In this context, nobody
knows a classical algorithm that calculates loga b efficiently (in polynomial time) for an arbitrary
b in Ga .
Shor’s factoring algorithm can calculate loga b efficiently if order(b) is a nontrivial factor of
order(a) because loga b =order(a)/order(b) in this case. Before trying to calculate loga b using
discrete-logarithm algorithms, we check whether the order of b divides the order of a. If not,
then we use a quantum algorithm that calculates loga b efficiently for an arbitrary b in Ga .
The main strategy in the part of Shor’s algorithm that calculates the order of a ∈ ZN is to
define the function
f : Z2m → ZN
x 7→ ax ,
where m ≈ 2n, n = dlog2 N e, and to exploit the fact that f is r-periodic. We know that this
function can be implemented in a quantum computer with around 3n qubits using the unitary
operator Uf defined by Uf |xi|yi = |xi|y ⊕ (ax mod N )i. It is possible to determine r efficiently
with one application of the inverse discrete Fourier transform F2†m to the first register after Uf
have been applied to a superposition of all states of the computational basis of the first register
(the second register is initially set to |0i). The method works because f has period r.
It is natural to ask whether the same strategy can be used to calculate s knowing that
as ≡ b mod N . The problem is that there is no s-periodic function f (x) with domain Z2m and
codomain ZN that uses parameters a and b only. The way out is to use a function f (x, y) with
two variables defined as
f : Zr × Zr → ZN
(x, y) 7→ ax by .
This definition has two main differences: (1) The domain of each variable is Zr instead of Z2m ,
and (2) f has two variables. Regarding (1), it is important to use the domain Zr ; otherwise, the
modular arithmetic would fail to provide the correct answer at the end of the algorithm. This
means that r must be calculated beforehand and if r is not a power of 2, the implementation of
the Fourier transform Fr requires extra labor. Regarding (2), f is periodic in the following way:
f (x, y) = f (x + kr − `s, y + `),
where k, ` ∈ Zr . In fact,
ax by ≡ ax (b)−` by b` ≡ ax akr (as )−` by b` ≡ ax+kr−`s by+` mod N,
where we have used that ar ≡ 1 and as ≡ b mod N . To better understand the periodicity of
f (x, y), we have to use the concept of lattices and sub-lattices in finite vector spaces.
8.2 Lattices in finite vector spaces

There are many kinds of lattices in mathematics. The ones we are interested in are the lattices
in finite vector spaces, in particular, two-dimensional lattices that are additive subgroups of the
75
vector space Zr × Zr (denoted by Z2r from now on). This vector space is the set of vectors with
two entries each one in Zr , whose canonical basis is {e0 , e1 }, where

1 0
e0 = and e1 = .
0 1
This means that any vector v in Z2r is a linear combination of e0 and e1 , that is, there are scalars
k, ` ∈ Zr so that v = ke0 + è1 .
A lattice L in Z2r is defined as a set
L = {kr + `s : k, ` ∈ Zr } ,
where r and s are two linearly independent vectors in Z2r . Set {r, s} is a basis of L. Different
bases can span the same lattice. Note that Z2r itself is a lattice in Z2r . Then, L is a sub-lattice
of Z2r .
Vectors can be represented by points. For instance, vectors

16 5
r= and s =
0 1
are shown in Fig. 8.1 as points (x, y) = (16, 0) and (5, 1). The figure also shows the lattice with
basis {(16, 0), (5, 1)} in Z216 .
Figure 8.1: Lattice with basis r = (16, 0) and s = (−11, 1) modulo 16. The borders are cyclic.
As an example of application to the calculation of discrete logarithms, consider again N = 34

but this time take a = 27 and
G27 = {27, 15, 31, 21, 23, 9, 5, 33, 7, 19, 3, 13, 11, 25, 29, 1}.
Suppose we want to calculate log27 3, whose answer we know that is s = 11. Since the order
of a = 27 is r = 16 modulo 34, the lattice L with basis r = (r, 0) and s = (−s, 1) shown
in Fig. 8.1 represents all points (x, y) in Z216 so that f (x, y) = f (0, 0), where f (x, y) = ax by
76
mod 34, a = 27, and b = 3. In fact, f (kr + `s) = f (kr − `s, `) = f (0, 0). So, f is a periodic
function where the period depends on both s and r. The lattice L is a subgroup of the additive
group Z216 . This means that we can partition Z216 into equivalence classes, the cosets of L in
Z216 , and we may select the following representatives: (0, 0), (1, 0), ..., (r − 1, 0). The image
of f on any point (x, y) ∈ Z2r is equal to the image f on one of the representatives. That is,
f (x − `s, `) = f (x, 0) for 0 ≤ ` < r, which corresponds to shift the lattice x unities to the right
with cyclic boundary conditions, that is, to add (x, 0) to each element of L modulo 16. When
we fix x and run ` from 0 to r − 1 we cover the coset with representative (x, 0).
8.3 Special case: The order of a is a power of 2

Let N , a, and b be known positive integers and let s be a positive integer such that as ≡ b
mod N and gcd(a, N ) = 1. Our goal is to find s given N , a, and b as input. Let r be the
order of a modulo N , which can be efficiently determined using Shor’s factoring algorithm. We
address in this Section the case r = 2m for some integer m, which means that r is a power of 2.
In this case, there is an efficient classical algorithm called Pohlig–Hellman algorithm that is able
to calculate s in polynomial time. We describe the quantum algorithm for this case because the
Fourier transform Fr can be implemented in a straightforward way in a qubit-based quantum
computer and the analysis of the algorithm is easier than the general case.
Let f be a two-variable function with domain Zr × Zr and codomain ZN defined as
f (x, y) = ax by mod N.
We have shown that f is periodic in the following way:
f (x, y) = f (x + kr − `s, y + `).
Using f , we define a 3-register unitary operator
Uf |xi|yi|zi = |xi|yi|z ⊕ f (x, y)i, (8.1)
where the first and second registers have m qubits each and the third register has n = dlog2 N e
qubits. The arithmetic with the variables of the first and second registers is performed modulo r.
The arithmetic to calculate the image f (x, y) is performed modulo N . The symbol ⊕ represents
the bitwise xor operation. The algorithm that calculates the discrete logarithm when r is a
power of 2 is described in Algorithm 8.1 and the circuit is depicted in Fig. 8.2.
Analysis of the algorithm

In step 2, we apply H ⊗m ⊗ H ⊗m ⊗ I to |0i|0i|0i obtaining
r−1
1 X
|ψ1 i = |xi|yi|0i.
r
x,y=0
In step 3, we apply Uf to |ψ1 i obtaining

r−1
1 X
|ψ2 i = |xi|yi|f (x, y)i.
r
x,y=0
77
Algorithm 8.1: Discrete logarithm algorithm when r is a power of 2
Input: N , a, b, and r (order of a).
Output: s = loga b with probability 1/2, where as ≡ b mod N
1 Prepare the initial state |0i⊗m |0i⊗m |0i⊗n , where m = log2 r and n = dlog2 N e;
2 Apply H ⊗m ⊗ H ⊗m to the first and second registers;
3 Apply Uf as defined in Eq. (8.1);
4 Measure the third register in the computational basis;
5 Apply Fr† ⊗ Fr† to the first and second registers;
6 Measure the first and second registers in the computational basis, where r1 and r2 are
the results;
7 If gcd(r1 , r) = 1, return s ≡ r2 /r1 mod r; otherwise, Fail.
|0i⊗m /m H ⊗m F2†m r1
|0i⊗m /m H ⊗m Uf F2†m r2
|ψ4 i
|0i⊗n /n
|ψ0 i |ψ1 i |ψ2 i |ψ3 i
Figure 8.2: Circuit of the discrete logarithm algorithm when r is a power of 2, where m = log2 r,
n = dlog2 N e, and Uf is defined by Eq. (8.1). If gcd(r1 , r) = 1, return s ≡ r2 /r1 mod r;
otherwise, Fail. The probability of returning the correct result is 1/2.
Before proceeding, we simplify |ψ2 i as most as possible using the periodicity of function f . Since
the image of f on all points (x − `s, `) for 0 ≤ ` < r is equal to the image of (x, 0), we can collect
the third register as
r−1 r−1
!
1X X
|ψ2 i = |x − `si|ì |f (x, 0)i.
r
x=0 `=0
In step 4, we measure the third register in the computational basis, obtaining an image of one
of the representatives of the cosets of L in Z2r as follows
r−1
!
1 X
|ψ3 i = √ |x0 − `si|ì |f (x0 , 0)i.
r
`=0
In step 5, we apply Fr† ⊗ Fr† to the first and second registers (we disregard the third register)
obtaining
r−1
1 X †
|ψ4 i = √ Fr |x0 − `si Fr† |ì .
r
`=0
78
We now simplify |ψ4 i as most as possible. Using the definition of the Fourier transform Fr† , we
obtain ! 
r−1 r−1 r−1
1 X 1 X 1 X
|ψ4 i = √ √ ω x(x0 −`s) |xi  √ ω y` |yi ,
r r x=0 r r y=0 r
`=0
P
where ωr = exp(2πi/r). Changing the order of the sums by pushing x,y to the left, and by
pushing ` and the term ωr−x`s to the second register, we obtain
P
r−1 r−1
1 X xx0 X
|ψ4 i = √ ωr |xi ωr`(−xs+y) |yi.
r r x,y=0
`=0
Using that
r−1
(
1 X −xs+y ` 1, if y = xs,
ωr =
r 0, otherwise,
`=0
we obtain
r−1
1 X xx0
|ψ4 i = √ ω |xi δy,xs |yi,
r x,y=0 r
and by simplifying the sum over y, we obtain

r−1
1 X xx0
|ψ4 i = √ ω |xi |xsi.
r x=0 r
In step 6, we measure the first and second registers in the computational basis and obtain
two results: r1 and r2 = r1 s, where r1 is chosen in the interval [0, r − 1] uniformly at random.
In step 7, if gcd(r1 , r) = 1, we calculate s ≡ r2 /r1 mod r. Since s ≤ r < N , s satisfies
as ≡ b mod N . For any odd r1 , gcd(r1 , r) = 1. Since half of the values of r1 is odd, the success
probability is 1/2.
79
Chapter 9
Grover’s Algorithm
Grover’s algorithm [21, 22] is a search algorithm initially developed for unstructured data. It
can also be described in terms of an oracle, which is a function with some promise or property
that can be evaluated as many times as we want, and our goal is to determine the property that
the function has. The algorithm was shown optimal in [4, 64]. Grover’s algorithm is described
in many books [2, 3, 5, 24, 26, 28, 30, 31, 35, 40, 44, 49, 58, 60].
9.1 Problem formulation in terms of an oracle

Let N be a power of 2 for some integer n, that is, N = 2n . Suppose that f : 0, . . . , N − 1 →

{0, 1} is a Boolean function such that f (x) = 1 if and only if x = x0 for some fixed value x0 ,
that is,
1, if x = x0 ,
f (x) =
0, otherwise.
Suppose that x0 is unknown to us. How can we find x0 by evaluating f ? From a computational
point of view, we want to evaluate f as few as possible.
Classically, the most efficient algorithm queries this function N times in the worst case, that
is, the complexity of the classical algorithm in terms of the number of queries is O(N ). How is
it done in practice? We ask someone else to generate a n-bit random number x0 . This person
hides x0 from us and makes a compiled subroutine of f . We can use the subroutine as many
times as we want, but we cannot hack the code in search for x0 . The classical algorithm that
solves this problem is an iteration that queries f (x) for x from 0 to 2n − 1. As soon as f (x) is
1, the program returns x0 . √
Quantumly, it is possible to improve the query complexity to O( N ). How is it done in
practice? We have to ask someone again to generate a n-bit random number x0 and define f .
This person, the oracle, implements f through a unitary matrix Uf in a quantum computer. We
can use Uf , but we cannot see the details of the implementation of Uf . Each time we use Uf ,
we add a unit to the count. We can use additional gates that obviously don’t depend on x0 .
We have the same problem that can be solved by algorithms executed in two different ma-
chines. In the first case, a classical computer with O(n) bits is used and the solution is found
after O(N ) steps. In the √ second case, a quantum computer with O(n) qubits is used and the
solution is found after O( N ) steps. This gain in complexity justifies the investment in quan-
tum hardware and the study of techniques to develop quantum algorithms, which necessarily
80
have to explore the superposition of states. In the case of Grover’s algorithm, Uf acting on a
superposition of states evaluates f at more than one point in the domain at the same time with
the same available resources. The classical computer needs an exponential number of processors
to perform this task with the same efficiency.
9.2 How to implement the oracle on a quantum computer

The first step in developing a quantum algorithm that solves Grover’s problem is the implemen-
tation of the function f . Since f is a Boolean function whose truth table has a single row with
output 1, f can be implemented with a multiqubit Toffoli gate activated by x0 , as described
in Sec. 2.7 on Page 23. This gate has an associated unitary matrix Uf , which is defined by its
action on the computational basis as
Uf |xi|ii = |xi|i ⊕ f (x)i,
where x is a n-bit string and i is a bit. The first register has n qubits and the second register
has one qubit. Note that if we take i = 0, the above equation reduces to

|x0 i|1i, if x = x0 ,
Uf |xi|0i =
|xi|0i, otherwise,
which describes the output of a multiqubit Toffoli gate activated by x0 (and only by x0 ) when
the input is |xi|0i. The result of the calculation of f (x) is stored in the second register while
the state of the first register remains unchanged.
For example, the circuit that implements Uf in the case N = 8 and x0 = 6, that is, f (110) = 1
and f (j) = 0 if j 6= 110, is
|1i • |1i
|1i • |1i
|0i |0i
|0i |1i.
The first register has three qubits. Note that the state of the second register changes from |0i
to |1i only if the input to the first register is |110i because 110 activates the three controls and
all other 3-bit strings do not.
In Grover’s algorithm, the second register is always in the state
|0i − |1i
|−i = √ .
2
Using linearity, the action of Uf is given by

−|x0 i|−i, if x = x0 ,
Uf |xi|−i =
|xi|−i, otherwise.
The same result is obtained from Proposition 3.1 on Page 29, which states that
Uf |xi|−i = (−1)f (x) |xi|−i.
81
9.3 The algorithm
Grover’s algorithm uses an additional operator defined as

G = 2 |dihd| − IN ⊗ I2 ,
where
N −1
1 X
|di = √ |ji.
N j=0
The notation “|dihd|” is called external product between the vector |di (a N × 1 matrix) and
the dual vector hd| (a 1 × N matrix). The external product is equivalent to the matrix product.
Multiplying a N × 1 matrix by a 1 × N matrix results in a N × N matrix. Therefore, |dihd| is
a N × N matrix, given by  
1 1 ··· 1
1 1 · · · 1
1  
|dihd| = . . . ,
N  .. .. . . . .. 
1 1 ··· 1
and  
(2 − N ) 2 ··· 2
1 
 2 (2 − N ) · ·· 2 
2 |dihd| − IN =

.. .. . .. ..
N
 
. . . 
2 2 ··· (2 − N )
which is called Grover matrix (or Grover operator).
Grover’s algorithm is described in Algorithm 9.1.
Algorithm 9.1: Grover’s algorithm

Input: An integer N and a function f : {0, ..., N − 1} → {0, 1} such that f (x) = 1 only
for one point x = x0 in the domain.
Output: x0 with probability equal to or greater than 1 − N1 .
1 Prepare the initial state |di|−i ;
j √ k
2 Apply (G Uf )t , where t = π4 N ;
9.4 Non-economical circuit of Grover’s algorithm

The goal of this Section is to find the circuit that implements the Grover operator using our
knowledge of implementing Boolean functions. The circuit uses more qubits than necessary, but
later we show how to obtain an economical version of the circuit.
To obtain the circuit, wehave to do an algebraic manipulation with the expression of the
Grover matrix 2 |dihd| − IN . Note that
|di = H ⊗n |0i
82
where |0i is in the decimal notation and H ⊗n = H ⊗ · · · ⊗ H. Transposing the above equation,
we obtain
hd| = h0|H ⊗n .
Using (H ⊗n ) · (H ⊗n ) = (H · H)⊗n = (I2 )⊗n = IN , we obtain
2 |dihd| − IN = H ⊗n 2 |0ih0| − IN H ⊗n ,

where  
1 0 ··· 0
 0 −1 · · · 0
2 |0ih0| − IN =  .
 
.. .. .. ..
 . . . .

0 0 ··· −1

Matrix 2 |0ih0| − IN acts only on the first register. However, it is simpler to implement this
matrix using both registers. Let us show that it is implemented by a multiqubit Toffoli gate
activated by 0. In fact, the action of 2 |0ih0| − IN on |xi, where x is a n-bit string, is

|0i, if x = 0,
2 |0ih0| − IN |xi =
−|xi, otherwise.

Therefore, the action of 2 |0ih0| − IN on the first register is the same as the action of (−Uf 0 )
on both registers when x0 = 0 (the state of the second register must be |−i), where

0 1, if x = 0,
f (x) =
0, otherwise.
The minus sign in (−Uf 0 ) changes neither the result of the algorithm nor the final probability.
That is, using G or −G in Grover’s algorithm does not change the final result.
Using these algebraic results, we conclude that a circuit to implement Grover’s algorithm is
√
repeat b π4 N c times
|0i1 H H H i1
.. .. .. .. .. ..
. . . . . .
Uf
|0in H H H in
|−i |−i,
where the outputs i1 , ..., in are the bits of x0 , that is x0 = (i1 . . . in )2 .
9.5 Economical circuit of Grover’s algorithm

The second register of Grover’s algorithm can be discarded since it is possible to make a more
economical implementation of the oracle [32]. Let us start by showing how to implement the
83

operator 2 |0ih0| − IN (modulo a global phase), which enables us to implement operator G
using only the first register. Let us show the equivalence of the following circuits:
|k1 i |k1 i |k1 i |k1 i

|k2 i |k2 i |k2 i |k2 i
.. .. .. ..
. . ≡ . .
|kn−1 i |kn−1 i |kn−1 i |kn−1 i
|kn i |kn i |kn i X H H X (−1)k̄1 ···k̄n |kn i.
|−i (−1)k̄1 ···k̄n |−i
The output of the left-hand circuit is the (n + 1)-qubit state

|k1 i ⊗ · · · ⊗ |kn i ⊗ (−1)k̄1 ···k̄n |−i ,
which is obtained from the definition of the multiqubit Toffoli gate activated only when qubits
k1 , . . . , kn are set to 0. The Kronecker product has the property
|v1 i ⊗ (a|v2 i) = (a|v1 i) ⊗ |v2 i = a(|v1 i ⊗ |v2 i),
for any vectors |v1 i, |v2 i and any scalar a. Then, we move the scalar term (−1)k̄1 ···k̄n to the first
register whose output is

|k1 i ⊗ · · · ⊗ |kn−1 i ⊗ (−1)k̄1 ···k̄n |kn i = (−1)k̄1 ···k̄n |k1 i ⊗ · · · ⊗ |kn i.
The state of the second register is |−i and this register will be discarded at the end of the
process.
Now let us show that the output of the right-hand circuit is the same. We use the fact that
XHXHX = −Z, therefore the right-hand circuit is equivalent to
|k1 i |k1 i
|k2 i |k2 i
.. ..
. .
|kn−1 i |kn−1 i
|kn i −Z (−1)k̄1 ···k̄n |kn i.
The output is obtained using that (−Z)|kn i = (−1)k̄n |kn i, and since −Z is activated only when
qubits k1 , . . . , kn−1 are set to 0, we obtain the overall output (−1)k̄1 ···k̄n−1 k̄n |k1 · · · kn−1 kn i. In
conclusion, the result of the first circuit (after the elimination of the second register) is the same
as the result of the second circuit. Since we are going to perform a measurement of the first
register only, this completes the proof that we can replace without any loss the first circuit with
the second in Grover’s algorithm. This replacement is not valid in all algorithms. So far we
have shown that G (modulo a global phase) can be implemented using only the first register.
Let us consider the oracle. If the oracle choose x0 = 0, the circuit of Uf is a multiqubit
Toffoli gate activated only when all qubits of the first register are set to 0. In this case, we
84
have already shown how to implement Uf using only the first register. The oracle would use
the right-hand circuit depicted in the beginning of this Section. If the oracle choose x0 = 1,
the circuit of Uf is a multiqubit Toffoli gate that is activated only when all qubits of the first
register are set to 0 except the n-th qubit, which is set to 1. In this case, we have the following
circuit equivalence:
|k1 i |k1 i |k1 i |k1 i
|k2 i |k2 i |k2 i |k2 i
.. .. .. ..
. . ≡ . .
|kn−1 i |kn−1 i |kn−1 i |kn−1 i
|kn i • |kn i |kn i H H (−1)k̄1 ···k̄n−1 kn |kn i.
|−i (−1)k̄1 ···k̄n−1 kn |−i
The equivalence check is similar to the previous case but now it is obtained using that HXH = Z
and Z|kn i = (−1)kn |kn i, and since Z is activated only when qubits k1 , k2 , . . . , kn−1 are set to
0, we obtain the overall output (−1)k̄1 k̄2 ···k̄n−1 kn |k1 k2 · · · kn i. This concludes the proof that the
oracle would use a n-qubit circuit if x0 = 1.
The remaining cases, x0 ≥ 2, can be obtained from the previous results. If the rightmost
(last) bit of x0 is 0, we use the circuit equivalence described at the beginning of this Section in
the following way: If any control (except the n-th) on the left-hand circuit changes from empty
to full circle, the same must happen to the controls on the right-hand circuit. If the rightmost
bit of x0 is 1, we use the second circuit equivalence of this Section, and in the same fashion if
any control (except the n-th) on the left-hand circuit changes from empty to full circle, the same
must happen to the controls on the right-hand circuit. The expression of the output changes
accordingly.
Thus, not only G but also Uf can be implemented with n qubits by eliminating the second
register and by introducing two Hadamard gates plus two Pauli X gates if the n-th qubit is
activated by 0, and by introducing only two Hadamard gates if the n-th qubit is activated by 1.
Economical circuit when N = 4

The circuit of Grover’s algorithm in the economical form when N = 4 and x0 = 11 is
|0i H • H H 1
|0i H H H H X H H X H 1.
The gates inside the first dashed box implement the oracle and the gates inside the second
dashed box implement 2 |0ih0| − IN (modulo a global phase). This circuit can be simplified
by substituting HXH in the second qubit with Z in two places.
The goal of Grover’s algorithm is to determine x0 by querying the oracle, that is, using the
first dashed box, without looking at its implementation details. We have to pretend that the
first dashed box is a black box. When N = 4, there are four possible black boxes, case x0 = 11
is one of them. The gates that implement the oracle when x0 = 00, x0 = 01, and x0 = 10 are
(X ⊗XH)CNOT(X ⊗HX), (X ⊗H)CNOT(X ⊗H), and (I ⊗XH)CNOT(I ⊗HX), respectively.
When N = 4, the output is the correct one with probability 1.
85
Economical circuit for arbitrary N
For an arbitrary N , the circuit of Grover’s algorithm with only n qubits is
√
repeat b π4 N c times
|0i1 H H H i1
.. .. .. .. .. ..
. . uf . . . .
|0in−1 H H H in−1
|0in H Z Z in ,
where the matrix uf is the economical version of Uf and the circuit for uf for an arbitrary
x0 = (i1 . . . in )2 is
X i1 X • X X i1
.. .. .. .. .. ..
. . ≡ . . . .
uf
X in−1 X • X X in−1
X 1−in H H X 1−in .
9.6 Analysis of Grover’s algorithm

Why does Grover’s algorithm work correctly? We answer this question using a geometric inter-
pretation of vector reflections. The goal of the algorithm is to find x0 , which is a n-bit string.
If the state of the qubits is |x0 i, the measurement of the qubits returns x0 . At the beginning of
the algorithm, the state of the qubits is |di. For large N , |di is almost orthogonal to |x0 i. To
measure the first register at this point has little chance of returning x0 .
|x0 i
|di
θ/2
Figure 9.1: Depiction of vector |x0 i and |di.
Fig. 9.1 shows the position of these vectors and angle θ/2 between |di and the horizontal
axis. Any other position of the vectors can be used in the analysis of the algorithm provided
86
that |di is almost orthogonal to |x0 i. The angle θ is very small for large N , and in this case,
θ/2 is a very good approximation of sin(θ/2). In addition, the sine of an angle is equal to the
cosine of the complement, that is,

θ θ π θ
≈ sin = cos − .
2 2 2 2
As (π − θ)/2 is the angle between |x0 i and |di, by definition of the inner product, cos (π − θ)/2
is the inner product of vectors |x0 i and |di, the result of which is

θ θ π θ
1
≈ sin = cos − = x0 d = √ .
2 2 2 2 N
Therefore,
2
θ≈√ .
N
We will disregard the state of the second register, as it remains fixed as |−i throughout the
algorithm. So let us define operator uf as

−|x0 i, if x = x0 ,
uf |xi =
|xi, otherwise,
which is a reduced version of Uf . We will use uf instead of Uf in the analysis of the algorithm.
The first step is to apply uf to |di. The action of uf on |di (written in the computational
basis) inverts the sign of the amplitude of |x0 i and does not change the other amplitudes. The
amplitude of |x0 i is the orthogonal projection of |di on the vertical axis—see Fig. 9.1, which is
inverted by the action of uf . Geometrically, the action of uf is represented by a reflection of
|di about the horizontal axis. The angle between the vectors |di and (uf |di) is θ, as shown in
Fig. 9.2.
|x0 i
|di
fx0 |di
Figure 9.2: Vector uf |di is a reflexion of |di about the horizontal axis.

The next step is to apply 2 |dihd| − IN . We call this matrix g, as it is a reduced version
with N dimensions of G, that is,
g = 2 |dihd| − IN .
Let us show that the action of g is a reflection about the axis defined by |di. This proof is done
in two parts. First, we show that |di is invariant under the action of g. Second, we show that
87
the action of g on d⊥ inverts the sign of d⊥ , where d⊥ is any vector orthogonal to |di. The

first step follows from

g|di = 2 |dihd| − IN |di = 2 |di dd − |di = |di,

because dd = 1. The second step follows from
E E D E E E
g d⊥ = 2 |dihd| − IN d⊥ = 2 |di dd⊥ − d⊥ = −d⊥ ,

because dd⊥ = 0.

|x0 i gfx0 |di
θ |di
θ
fx0 |di
Figure 9.3: Vector (g uf |di) is a reflexion of (uf |di) about |di.
Fig. 9.3 depicts (g · uf · |di) and shows that the action of (G · Uf ) rotates the initial state by
θ degrees towards |x0 i. Since θ is a small angle, this achievement is modest, but promising. It is
easy to see that the second action of (G · Uf ) repeats the process of rotating θ degrees towards
|x0 i. We want to know how many iterations r are needed such as rθ = π/2? The number of
iterations is j π k jπ √ k
r= = N .
2θ 4
|ψi
|x0 i
θ |di
2
a
θ/2
Figure 9.4: Vector |ψi is the final state before measurement and a is the projection of |x0 i on
|ψi. The angle between |ψi and |x0 i is less than or equal to θ/2.
It is still missing the calculation of the success probability. After r iterations, the state of
the quantum computer without considering the second register is
√
π
|ψi = (g uf ) 4 N |di.
88
Vector |ψi is almost orthogonal to |di at this point, as depicted in Fig. 9.4. The angle between
|ψi and |x0 i is less than or equal to θ/2. The success probability is greater than or equal to the
absolute square of the amplitude of |x0 i in the decomposition of |ψi in the computational basis.
This amplitude is a as shown in Fig. 9.4. The orthogonal projection of |x0 i on the final state is
at most cos(θ/2). Therefore, the success probability p = |a|2 satisfies
θ θ 1
p ≥ cos2 ≥ 1 − sin2 ≥ 1− .
2 2 N
√
The case N = 4 case is special because θ = 60◦ , since sin(θ/2) = 1/ N . With one application
of (g uf ), vector |di rotates 60◦ and coincides with |x0 i. In this case, the success probability is
exactly p = 1.
9.7 Final remarks

The same technique for implementing the oracle using n qubits analyzed in this Chapter can be
applied to the implementation of the Deutsch-Jozsa algorithm with only n qubits. Note that
the state of the second register in the Deutsch-Jozsa circuit before applying Uf is |−i. What we
have to do is to discard the second register and to replace Uf in the Deutsch-Jozsa circuit by
uf described at the end of Sec. 9.5.
89
Chapter 10
Phase Estimation and Applications
Kitaev published the phase estimation algorithm as a preprint in 1995 [29], and later as a
section in a book in Russian, which was translated into English [30]. Kitaev’s method is based
on a procedure for measuring an eigenvalue of a unitary operator, that is, given a unitary
operator U and one of its eigenvectors |ψi, the algorithm finds the eigenvalue exp(2πφ), so
that U |ψi = exp(2πφ)|ψi, where φ is the phase of the eigenvalue. This algorithm provides an
alternative way of factoring integers and calculating discrete logarithms. Not only that, it is
used in many applications such as quantum counting. This algorithm has been described in
many books [3, 5, 25, 28, 58].
10.1 Phase estimation algorithm

Suppose we have a n-qubit unitary operator U and we know one of its eigenvectors |ψi. We do
not know the eigenvalue associated with |ψi, but we know that its analytical expression is e2πiφ ,
where 0 ≤ φ < 1 (φ is unknown), because U is unitary. We assume for now that φ = 0.φ1 · · · φm
for some integer m, where φ1 , ..., φm are bits, that is, the phase of the eigenvalue e2πiφ is a
rational multiple of 2π. The goal of the phase estimation algorithm is to determine φ using U
as an oracle and |ψi as an input.
Basic block of phase estimation

The basic block of the circuit depends on an integer 0 ≤ j < m and is given by
j
|0i • |0i+e2πiφ2 |1i
H √
2
|ψi /n U2
j
|ψi.
To verify the correctness of the output of the basic block, we have to use
U |ψi = e2πiφ |ψi,
and also
j j
U 2 |ψi = e2πiφ 2 |ψi.
90
j
The output of the basic block is obtained by applying the controlled U 2 operator on (H|0i)⊗|ψi,
that is,
j j
|0i|ψi + |1iU 2 |ψi |0i + e2πiφ2 |1i

2j
|0i|ψi + |1i|ψi
C U √ = √ = √ ⊗ |ψi.
2 2 2
Here we see an example of the phase kickback process because the phase was produced by the
action of U on the second register but it appears as a relative phase of the first qubit after |ψi
has been collected.
j
There is an alternative way of writing the eigenvalue of U 2 associated with |ψi. Using that
φ = 0.φ1 · · · φm in binary, then
φ1 φ2 φm
φ = + 2 + ··· + m.
2 2 2
Multiplying by 2j , we obtain
φj+1 φm
φ 2j = 2j−1 φ1 + · · · + 2φj−1 + φj + + · · · + m−j .
2 2
It is straightforward to check that

j
φj+1 φm
exp 2πiφ 2 = exp 2πi + · · · + m−j
2 2
because exp 2πi 2j−1 φ1 = · · · = exp 2πiφj = 1. Then,

exp 2πiφ 2j = exp (2πi 0.φj+1 · · · φm ) .

Note that the first digits of φ were eliminated.

j
The implementation of U 2 is not necessarily performed by 2j applications of U . This method
is inefficient if m is large. The implementation depends on specific applications of the phase
estimation algorithm. For instance, if U performs modular arithmetic, the repeated squaring
method is employed.
Intermediate block
The intermediate block of the phase estimation circuit is depicted in Fig. 10.1. The circuit has
two registers: The first has m qubits with input |0i⊗m and the second n qubits with input |ψi.
The output is a direct consequence of each basic block, where j runs from 0 to m − 1. The order
of the controlled operations is irrelevant but j must be 0 for the m-th qubit, j must be 1 for the
(m − 1)-th qubit, and so on.
The output of the first register of the intermediate block is
m−1 m−2 0
|0i + e2πiφ2 |1i |0i + e2πiφ2 |1i |0i + e2πiφ2 |1i
√ ⊗ √ ⊗ ··· ⊗ √ .
2 2 2
This output can be simplified into a very neat expression. In order to do so, let us replace each
term with an equivalent term using a binary sum and collecting the denominators
1 1 m
1 X 2πiφ2m−1 `1 X m−2 `
X 0
√ e |`1 i ⊗ e2πiφ2 2
|`2 i ⊗ · · · ⊗ e2πiφ2 `m |`m i.
m
2 ` =0
1 ` =0 2 ` =0 1
91
m−1
|0i ··· • |0i+e2πiφ2 |1i
H √
2
m−2
|0i ··· • |0i+e2πiφ2 |1i
H √
2
.. .. ..
. . .
1
|0i • ··· |0i+e2πiφ2 |1i
H √
2
0
|0i • ··· |0i+e2πiφ2 |1i
H √
2
|ψi /n U2
0
U2
1
··· U2
m−2
U2
m−1
|ψi
Figure 10.1: Intermediate block of the phase estimation circuit.
Pushing all sums to the beginning of the expression and combining all exponentials, we obtain
1
1 X m−1 ` +···+20 ` )
√ e2πiφ(2 1 m
|`1 i ⊗ ... ⊗ |`m i.
2m `1 ,...,`m =0
Converting the binary into decimal, we obtain

2 −1 m
1 X 2πi φ `
√ e |ì.
2m `=0
This is the neat expression we were looking for. Let us summarize the intermediate block:
2m −1
!
⊗m intermediate 1 X 2πi φ `
|0i ⊗ |ψi −−−−−−−→ √ e |ì ⊗ |ψi,
block 2m `=0
where m is the number of qubits of the first register, |ψi is an eigenvector of U with eigenvalue
exp(2πi φ), and φ = 0.φ1 ...φm .
Full circuit of phase estimation

In the last Section, we have shown that the output of the first register of the intermediate block
is
2m −1
1 X 2πi φ `
√ e |ì.
2m `=0
On the other hand, the action of the Fourier transform F2m on a generic state |ji of the com-
putation basis is
2m −1
1 X 2πij`
F2m |ji = √ e 2m |ì.
2m `=0
Inverting the equation, taking j = (φ1 ...φm )2 = (2m φ)10 and |ji = |φ1 i ⊗ ... ⊗ |φm i, we obtain
2m −1
!
† 1 X 2πi φ `
F2m √ e |ì = |2m φi = |φ1 i ⊗ ... ⊗ |φm i.
2m `=0
92
We have shown that if we apply the inverse Fourier transform to the output of the intermediate
block, the result is a state |ji of the computational basis equal to |φ1 i ⊗ ... ⊗ |φm i. This means
that a measurement in the computational basis reveals with certainty each fractional bit of φ
because we are assuming that φ has been given with m bits. In the general case, the result of the
algorithm is a good m-bit estimation of φ, which we denote by φ̃. The full circuit of the phase
estimation algorithm is depicted in Fig. 10.2. The algorithm is described in Algorithm 10.1.
|0i H ··· • φ1
|0i H ··· • φ2
.. .. F2†m ..
. . .
|0i H • ··· φm−1
|0i H • ··· φm
|ψi /n U2
0
U2
1
··· U2
m−2
U2
m−1
|ψi
Figure 10.2: Full circuit of the phase estimation algorithm.
Algorithm 10.1: Phase estimation algorithm

Input: Eigenvector |ψi of U .
Output: Number 2m φ, where exp(2πiφ) is the eigenvalue of |ψi.
1 Prepare the initial state |0i⊗m ⊗ |ψi;
2 Apply H ⊗m to the first register;
`
3 For ` in the interval [0, m − 1] apply the controlled operation C m−` U 2 , where the
control qubit is m − ` and the target is the 2nd register;
4 Apply F2†m to the first register;
10.2 Application to order finding

Let N and x be positive integers so that 1 < x < N and gcd(x, N ) = 1. The order of x modulo
N is the smallest positive integer r that obeys
xr ≡ 1 mod N.
Given x and N , the order finding is the problem of calculating r. In this Section, we show how
to solve the order finding problem efficiently using the phase estimation algorithm providing an
alternative to Shor’s factoring algorithm.
93
The strategy is to replace |ψi in Algorithm 10.1 by |1i (the second vector of the computational
basis of the second register) and choose U as the unitary operator that multiplies the input by
x modulo N , that is,
U |yi = |xy mod N i.
The input is a vector |yi of the computational basis of the second register. We may think that
y is represented in the decimal system. The output is also a vector |y 0 i of the computational
basis of the second register, which is obtained by calculating xy ≡ y 0 modulo N . U is a unitary
operator because gcd(x, N ) = 1. U † is defined accordingly using x−1 modulo N , that is,
U † |yi = x−1 y mod N .

The motivation of using U here is that the repeated application of U produces successive powers
of x, in fact, U j |yi = xj y . The number of qubits n of the second register must be able to lodge
U , then we take n = dlog2 N e.
In order to understand the order finding as a phase estimation algorithm, let us find the
eigenvectors of U . It is straightforward to obtain a 1-eigenvector because set {x0 , x1 , ...xr−1 },
where r is the order of x modulo N , is invariant under the multiplication by x. Then, the
normalized vector
r−1
1 X ` E
|ψ0 i = √ x
r
`=0
is a 1-eigenvector of U . Since thisis the

first
vector
of the Fourier basis (Fr ) when we consider
the transformation of the subset x0 , x1 , ...xr−1 of the computational basis, let us define

the remaining ones:

r−1
1 X − 2πik` ` E
|ψk i = √ e r x . (10.1)
r
`=0
Now let us check that each |ψk i is an eigenvector of U . In fact,

r−1
1 X − 2πik` `+1 E
U |ψk i = √ e r x
r
`=0
r−1
1 X − 2πik(`−1) ` E
= √ e r x
r
`=0
2πik
= e r |ψk i.
We conclude that |ψk i is an eigenvector of U with eigenvalue exp(2πik/r) for 0 ≤ k < r. If we

are able to prepare the input to the second register of the phase estimation algorithm as |ψk i
for some k, we will obtain an approximation of k/r as the output. If the input to the second
register is |ψk i, the output of the intermediate block will be
2 −1 m
⊗m intermediate1 X 2πik`
|0i |ψk i −−−−−−−→ √ e r |ì|ψk i. (10.2)
block 2m `=0
We cannot prepare |ψk i as an input to the phase estimation algorithm but we can find a
known vector that is spanned by the set of vectors {|ψ0 i, ...|ψr i}. Using the inverse Fourier
94
transform, we have
r−1
E
` 1 X 2πik`
x = √ e r |ψk i.
r
k=0

The simplest candidate is x0 = |1i, which is given by
r−1
1 X
|1i = √ |ψk i.
r
k=0
Using transformation (10.2) for each k, the output of the intermediate block is
r−1 2 −1 m
⊗m intermediate1 X X 2πik`
|0i |1i −−−−−−−→ √ e r |ì|ψk i.
block r2m k=0 `=0
To simplify the output, we use Eq. (10.1)

m
r−1 2 −1 r−1
!
1 X X 2πik` 1 X − 2πik`0 `0 E
output = √ e r |ì √ e r x .
r2m k=0 `=0 r 0
` =0
Inverting the order of the sums and combining the exponents, we obtain
2m −1 r−1 r−1
!
1 X X 1 X 2πik(`−`0 ) 0E
output = √ e |ìx` .
r

2m 0
r
`=0 ` =0 k=0
The expression inside the parenthesis is 1 if ` = `0 and 0 otherwise. This means that the output
of the intermediate block is
2m −1
⊗m 1 X
intermediate
E
|0i |1i −−−−−−−→ √ |ìx` .

block 2m `=0
This is the same state as in the standard Shor’s factoring algorithm just before applying the
inverse Fourier transform F2m , that is, we consider the part of the algorithm where the input is
|0i⊗m |0i and then the Hadamard gate is applied on each qubit of the first register and then Uf ,
where f (x) = ax mod N :
2m −1
Uf ·(H ⊗m ⊗I)
1 X E
⊗m
|0i |0i −−−−−−−−→ √ |ìx` .

m
2 `=0
This means that the phase estimation version yields the same result and the analysis of the
success probability is exactly the same as in Shor’s factoring algorithm if we choose m so that
m ≈ 2dlog2 N e. The number of qubits of the first register must be close to twice the number of
qubits of the second register.
There is an interesting special case. If we know somehow that the order r is a power of 2,
we can take m = n = dlog2 N e. If r is a power of 2, the phase of the eigenvalue exp(2πik/r) is a
rational multiple of 2π, and the phase estimation algorithm returns an exact value 2n k/r when
m = n.
95
j
How do we implement U 2 efficiently for 0 ≤ j < m? Note that
j
j E
U 2 |yi = x2 y mod N .

j
Since x2 can be calculated efficiently in O(n2 ) steps using the repeated squaring method, instead
of applying U repeatedly 2j times, for each j we implement an operator Uj |yi = |zyi after
j
calculating z = x2 using the repeated squaring method. In this case, the intermediate block
can be computed in O(n3 ) steps.
10.3 Application to discrete logarithm

Let N , a, and b be known positive integers and let s be a positive integer such that as ≡ b mod N
and gcd(a, N ) = 1. Our goal is to find s given N , a, and b as input. This is the same problem
addressed in Sec. 8.3 on Page 77. Now we show how to solve the discrete logarithm problem
using the phase estimation algorithm providing an alternative version to Shor’s algorithm for
discrete logarithm.
The strategy we use here is the same one used in the order finding algorithm based on the
phase estimation. Recall that, when we described the order finding algorithm, the output of
the intermediate block is the same as in the original Shor’s factoring algorithm right before the
action of the inverse Fourier transform. Now, the state of the quantum computer right before
the action of Fr† ⊗ Fr† in Shor’s discrete logarithm algorithm described in Sec. 8.3 is
r−1
1 X
|xi|yi|ax by mod N i.
r
x,y=0
If we wish to produce this state using the phase estimation algorithm, we need to use three
registers and two unitary operators:
Ua |xi = |ax mod N i,
Ub |yi = |by mod N i.
Ub is a unitary operator because gcd(b, N ) = 1. In fact, the inverse of b is ar−s , where r is the
order of a modulo N . In the new algorithm, the action of Ua is controlled by the first register
and the action of Ub is controlled by the second register, as described in Fig. 10.3. Note that Ua
and Ub act on the same register.
Let us check that the vectors
r−1
1 X − 2πik` ` E
|ψk i = √ e r a ,
r
`=0
which are eigenvectors of Ua , are also eigenvectors of Ub . In fact, using that b = as , we have
r−1
1 X − 2πik` `+s E
Ub |ψk i = √ e r a
r
`=0
r−1
1 X − 2πik(`−s) ` E
= √ e r a
r
`=0
2πiks
= e r |ψk i.
96
|0i H • φ̃1
.. .. . F2†m ..
. . .. .
|0i H • φ̃m
|0i H • φ̃01
.. .. . F2†m ..
. . .. .
|0i H • φ̃0m
0 m−1 0 m−1
|ψk i /n Ua2 Ua2 Ub2 Ub2 |ψk i
Figure 10.3: Circuit of the discrete logarithm algorithm using phase estimation, where |ψk i is a
common eigenvector of Ua and Ub . Since we are not usually able to prepare |ψk i as the input
to the third register, we use |1i in its place, which can be represented as a linear combination
of |ψk i, for 0 ≤ k < r.
We conclude that |ψk i is an eigenvector of Ub with eigenvalue exp(2πiks/r) for 0 ≤ k < r.

Better yet, |ψk i is an eigenvector of Ua and Ub simultaneously. If we are able to prepare the
input to the third register of the circuit depicted in Fig. 10.3 as |ψk i for some k, we will obtain
an estimate of φ̃ ≈ k/r as the output of the first register and an estimate of φ̃0 ≈ ks/r as the
output of the second register modulo N . If the input to the third register is |ψk i, the output of
the intermediate block will be
2m −1 2m −1
! !
⊗m ⊗m intermediate 1 X 2πik` 1 X 2πiks`0 0
|0i |0i |ψk i −−−−−−−→ √ e r |ì √ e r ` |ψk i,
block 2m `=0 2m `0 =0
where m is the number of qubits of the first and second registers, and n = dlog2 N e of the third
register. The first term inside the parentheses is obtained by replacing φ by k/r and the second
term by replacing φ by ks/r in the output of the intermediate block of the phase estimation
algorithm. Simplifying the output, we have
 
m −1
2X 0)
intermediate 1 2πik(`+s`
|0i⊗m |0i⊗m |ψk i −−−−−−−→  m |ì`0  |ψk i.

e r
block 2 0 `,` =0
Usually, we are not able to prepare |ψk i as an input to the phase estimation algorithm but
we can use again
r−1
1 X
|1i = √ |ψk i.
r
k=0
In this case, the input to and output of the intermediate block are
 
r−1 m −1
2X 0)
intermediate 1  1
2πik(`+s`
|0i⊗m |0i⊗m |1i −−−−−−−→ √
X
|ì`0  |ψk i.

e r
block r 2m 0
k=0 `,` =0
97
We use the definition of |ψk i given by Eq. (10.1) to simplify the output. We start writing
 
r−1 m −1
2X r−1
!
0)
1 X 1 2πik(`+s` 1 X 2πikk0
0E
|ì`0  √ e− r ak

√  e r ,
r 2m 0 r 0
k=0 `,` =0 k =0
and by pushing the sums over `, `0 , k 0 to the left and combining the exponents, we obtain
2m −1 r−1 r−1
!
1 X X 1 X 2πik(`+s`0 −k0 ) 0 k0 E
e r |ì` a .
2m 0 0
r
`,` =0 k =0 k=0
The expression inside the parenthesis is 1 if ` + s`0 = k 0 and 0 otherwise. This means that the
action of the intermediate block is
2m −1
⊗m ⊗m 1 X
intermediate 0
E
|0i |0i |1i −−−−−−−→ m |ì`0 a`+s` .
block 2 0 `,` =0
Using that as = b mod N , ` → x, and `0 → y, we have

2 −1 m
⊗m ⊗m 1 X
intermediate
|0i |0i |1i −−−−−−−→ m |xi|yi|ax by i.
block 2
x,y=0
The output of the intermediate block is the same state as in the standard Shor’s discrete-
logarithm algorithm just before applying the inverse Fourier transforms F2m ⊗ F2m (state |ψ2 i
of Algorithm 8.1 on Page 78). This means that the phase estimation version yields the same
result and the analysis of the success probability is exactly the same as in Shor’s algorithm if
we choose m appropriately.
10.4 Application to quantum counting

In the context of Grover’s algorithm, we have an oracle f : {0, ..., N − 1} → {0, 1} that is a
Boolean function defined as
1, if x ∈ M ,
f (x) =
0, otherwise,
where M is a subset of the domain and N = 2n . We say that x is marked if x ∈ M p. The optimal
number of steps of Grover’s algorithm depends on |M |, in fact, it is given by π4 N/|M |. If the
cardinality of M is unknown, it is possible to find a marked element by repeatedly guessing the
runtime of Grover’s algorithm [9]. An alternative method is by solving the quantum counting
problem [10].
The quantum counting problem asks what is the cardinality of M given function f as an
oracle. A classical solution cannot perform better than Ω(N ) queries to the oracle because
p all
domain elements must be checked. The quantum algorithm can find the solution in O |M | N
queries to the oracle.
Before addressing the quantum counting problem, let us review some key points of Grover’s
algorithm. There is an economical version of the algorithm, which uses only one n-qubit register.
98
The initial state is the uniform superposition of the computational basis given by
N −1
1 X
|di = √ |ji,
N j=0
π
p
and the algorithm consists of 4 N/|M | applications of the evolution operator
U = G Uf ,
where
G = 2|dihd| − I
and
N
X −1
Uf = (−1)f (x) |xihx|.
x=0
The analysis of the algorithm is performed by using the e±iθ -eigenvectors of U , which are given
1
in of the superposition of marked states |M i and the superposition of unmarked states
terms
M ⊥ :
⊥
± |M i ± i M
ψ = √ , (10.3)
2
where r
θ |M |
sin = ,
2 N
and
1 X
|M i = p |xi,
|M | x∈M

⊥
E 1 X
M = p |xi.
N − |M | x6∈M
It is straightforward to check that M ⊥ M = 0, and

r r
|M | |M | ⊥ E
|di = |M i + 1 − M .
N N
Using the equation above, the definition of sin(θ/2), and Eq. (10.3), we obtain
iθ iθ
e 2 |ψ + i − e− 2 |ψ − i
|di = √ .
i 2
Now we come back to the quantum counting problem using p the phase estimation algorithm.
Since the eigenvalue of |ψ + i is exp(iθ), where sin(θ/2) = |M |/N , we would obtain an ap-
proximation for |M | if we use |ψ + i as the input to the second register of the phase estimation
algorithm with U = GUf . We do not know how to prepare |ψ + i, then the strategy is to replace
|ψi in Algorithm 10.1 by a known vector that belongs to the subspace spanned by |ψ + i and
Some references call |M i as “good state” and M ⊥ as “bad state”.
1

99
|ψ − i. The best candidate is |di, which can be easily prepared by applying H ⊗n to |0i⊗n . In
this case, the number of qubits of the second register must be n = log2 N and the output of the
intermediate block is
iθ 2X m
−1 iθ 2X−1 m
⊗m intermediatee2 2πiφ+ ` e− 2 −
e2πiφ ` |ìψ − ,
+
|0i |di −−−−−−−→ √ e |ì ψ − √

block i 2m+1 `=0 i 2m+1 `=0
where φ+ = θ/2π for the first term and φ− = (2π − θ)/2π for the second term. After applying
the inverse Fourier transform F2†m , we obtain the following output of the full circuit
iθ iθ
e 2 m + E e− 2 m − E
√ 2 φ̃ − √ 2 φ̃ ,
i 2 i 2
where φ̃ is a m bit estimate of φ. After a measurement in the computational basis, we learn

an estimate pof φ+ or φ− with equal chance. Let φ̃ be the measurement result. Using that that
sin(θ/2) = |M |/N , φ+ = θ/2π, and φ− = (2π − θ)/2π, the estimate of |M | is N sin2 (π φ̃)
f| = N sin2 (π φ̃+ ) and for the
because for the first case we obtain an estimate of |M | using |M
second case |M 2 − 2 −
f| = N sin (π − π φ̃ ) = N sin (π φ̃ ).
How many qubits has the first register? This is the tricky part. Note that m cannot be
equal to n because the number of applications of U would be 20 + · · · + 2n−1 = 2n − 1. Then,
the number√ of queries to f would be O(N ). If we choose m = n/2, the number of queries to f
would be N , but in this case we obtain an estimate |M f| so that
p
|M
f| − |M | = O |M | . (10.4)
This estimate is not good. For instance, suppose that |M | is around N/2. If √
we wish to know the
number of marked elements, and we obtain |M f| with an error as big as O( N ), we don’t have
a good result. To understand what is the problem here, which does not arise in the factoring
and discrete logarithm algorithms, we have to analyze carefully the range of values of angle θ
we are trying to estimate. √
p us assume that 0 < |M | N , or more formally, |M | = o( N ). The
For this analysis, let
expression sin(θ/2) = |M |/N can be written asymptotically as
p
2 |M | |M |
θ= √ +O .
N N
This means that θ/2π represented in terms of binary digits is of the form 0.0 · · · 01 · · · , where
the number of 0’s before the first 1 is around n/2 − log2 (|M |)/2. If we choose the size of the
first register so that m is less than n/2 − log2 (|M |)/2, it is likely that we obtain a 0 as the
output of the counting algorithm, which is wrong. If we choose m = n/2, we will obtain around
log2 (|M |)/2 correct significant bits of θ/2π. This is an imprecise estimate of |M | compatible
with Eq. (10.4). In fact, |M | has log2 (|M |) bits, and we need to know most of the significant
bits in order to have a good estimate of |M |.
100
Chapter 11
Final Remarks
Most quantum algorithms analyzed in this work can be cast into the oracle-based framework.
The query complexity of an algorithm based on an oracle or a black-box is the number of queries.
It does not matter how difficult is to implement the oracle unless we aim to solve a practical
problem. In practical problems, it is our task to implement the oracle, and then the cost of each
evaluation matters. Take Shor’s factoring algorithm as an example. The oracle in this case is a
r-periodic function and our goal is to find r. We have seen that the function in Shor’s algorithm
is the modular exponentiation, which can be implemented efficiently in terms of the input size
using the repeated squaring method.
Any classical deterministic algorithm is a n-input and m-output function f : {0, 1}n −→
{0, 1}m , which is a collection of m n-bit Boolean functions. Then, any classical algorithm can
be implemented on a quantum computer with two registers of sizes n and m using the operator
Uf |xi|yi = |xi|y ⊕ f (x)i.
To exploit quantum parallelism, we have to apply H ⊗n to the first register before applying
Uf . After applying Uf , we have a superposition state, which is not useful unless we perform
some quantum post-processing that would produce the desired output. Most of the quantum
algorithms we have analyzed can be cast into the following circuit:
|0i H i0
.. .. post .. ..
. . . .
processing
Uf
|0i H in−1
|0i⊗m /m j0 ...jm−1 .
For the Deutsch-Jozsa, Bernstein-Vazirani, Simon, and Shor (factoring) algorithms, the quantum
post-processing is either H ⊗n or the inverse Fourier transform. They have the structure outlined
above with a few adaptations.
Grover’s algorithm does not have the structure outlined above because Uf and the post-
processing are repeated many times before measurement. On the other hand, Grover’s has a
polynomial gain in contrast with the exponential gain of Simon’s and Shor’s algorithms. The
extension of the general structure that includes Grover’s algorithm is
101
repeat k times
|0i H i0
.. .. post .. ..
. . . .
processing
Uf
|0i H in−1
|0i⊗m /m j0 ...jm−1 .
The number
√ of repetitions k is 1 for the Deutsch-Jozsa, Bernstein-Vazirani, Simon, and Shor; and
k is bπ 2n /4c for Grover’s algorithm. The measurement of the second register is unnecessary.
It is there because it helps in the analysis of the algorithm.
The second register of the Deutsch-Jozsa, Bernstein-Vazirani, and Grover algorithms has
only one qubit, whose state during the computation is |−i. The oracle for those cases obeys
Uf |xi|−i = (−1)f (x) |xi|−i.
This means that the second register can be eliminated and there is an economical version of the
circuit with the following form:
repeat k times
|0i H i0
.. .. post .. ..
. . Uf . .
processing
|0i H in−1 .
√
As before, k = 1 for the Deutsch-Jozsa and Bernstein-Vazirani algorithms; and k = bπ 2n /4c
for Grover’s algorithm.
Shor’s algorithm for discrete logarithm shows us how to extend the structure of the circuit
when the function f has more than one variable. Suppose that f has two variables. Then, Uf
is defined as
Uf |x1 i|x2 i|yi = |x1 i|x2 i|y ⊕ f (x1 , x2 )i.
This means that we need a circuit with three registers and the general structure of the algorithm
is the same up to small changes, as follows:
repeat k times
|0i⊗n1 /n1 H ⊗n1 i0 ..in1 −1

post
processing
|0i⊗n2 /n2 H ⊗n2 Uf i00 ..i0n2 −1
|0i⊗n3 /n3 j0 ...jn3 −1 .
Those recipes are useful to understand advanced algorithms.
102
Bibliography
[1] S. Axler. Linear Algebra Done Right. Springer, New York, 1997.
[2] S. Barnett. Quantum Information. Oxford University Press, New York, 2009.
[3] G. Benenti, G. Casati, and G. Strini. Principles of Quantum Computation and Information:
Basic Tools and Special Topics. World Scientific Publishing, River Edge, 2007.
[4] C. H. Bennett, E. Bernstein, G. Brassard, and U. V. Vazirani. Strengths and weaknesses

of quantum computing. SIAM J. Comput., 26(5):1510–1523, 1997.
[5] J. A. Bergou and M. Hillery. Introduction to the Theory of Quantum Information Processing.
Springer, 2013.
[6] D. J. Bernstein. Detecting perfect powers in essentially linear time. Math. Comput.,
67(223):1253–1283, 1998.
[7] E. Bernstein and U. Vazirani. Quantum complexity theory. In Proc. of the 25th Annual
ACM Symposium on Theory of Computing, STOC ’93, page 11–20. ACM, New York, 1993.
[8] E. Bernstein and U. Vazirani. Quantum complexity theory. SIAM Journal on Computing,
26(5):1411–1473, 1997.
[9] M. Boyer, G. Brassard, P. Høyer, and A. Tapp. Tight bounds on quantum searching.
Forstschritte Der Physik, 4:820–831, 1998.
[10] G. Brassard, P. Høyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and
estimation. Quantum Computation and Quantum Information Science, AMS Contemporary
Mathematics Series, 305:53–74, 2002.
[11] Guangya Cai and Daowen Qiu. Optimal separation in exact query complexities for Simon’s
problem. Journal of Computer and System Sciences, 97:83–93, 2018.
[12] R. Cleve, A. Ekert, C. Macchiavello, and M. Mosca. Quantum algorithms revisited. Proc.
Royal Society London Ser. A, 454(1969):339–354, 1998.
[13] D. Coppersmith. An approximate Fourier transform useful in quantum factoring.

Arxiv:quant-ph/0201067, 2002.
[14] A. DasGupta. The matching, birthday and the strong birthday problem: a contemporary
review. Journal of Statistical Planning and Inference, 130(1):377–389, 2005.
103
[15] D. Deutsch. Quantum theory, the Church-Turing principle and the universal quantum
computer. Proc. Royal Society London Ser. A, pages 96–117, 1985.
[16] D. Deutsch. Quantum computational networks. Proc. Royal Society London Ser. A,
425(1868):73–90, 1989.
[17] D. Deutsch and R. Jozsa. Rapid solution of problems by quantum computation. Proc.
Royal Society London Ser. A, 439(1907):553–558, 1992.
[18] D. Dieks. Communication by EPR devices. Physics Letters A, 92(6):271 – 272, 1982.
[19] Jiangfeng Du, Mingjun Shi, Xianyi Zhou, Yangmei Fan, BangJiao Ye, Rongdian Han,
and Jihui Wu. Implementation of a quantum algorithm to solve the Bernstein-Vazirani
parity problem without entanglement on an ensemble quantum computer. Phys. Rev. A,
64:042306, 2001.
[20] M. Ekerå. On the success probability of quantum order finding. arXiv:2201.07791, 2022.
[21] L. K. Grover. A fast quantum mechanical algorithm for database search. In Proc. 28th
annual ACM symposium on theory of computing, STOC ’96, pages 212–219, ACM, New
York, 1996.
[22] L. K. Grover. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev.
Lett., 79(2):325–328, 1997.
[23] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford, 4th
edition, 1975.
[24] M. Hayashi, S. Ishizaka, A. Kawachi, G. Kimura, and T. Ogawa. Introduction to Quantum

Information Science. Springer, 2014.
[25] J. Hidary. Quantum Computing: An Applied Approach. Springer, 2019.
[26] M. Hirvensalo. Quantum Computing. Springer, 2010.
[27] Yan Huang, Zhaofeng Su, Fangguo Zhang, Yong Ding, and Rong Cheng. Quantum al-
gorithm for solving hyperelliptic curve discrete logarithm problem. Quantum Information
Processing, 19(2):62, 2020.
[28] P. Kaye, R. Laflamme, and M. Mosca. An Introduction to Quantum Computing. Oxford

University Press, New York, 2007.
[29] A. Yu. Kitaev. Quantum measurements and the Abelian stabilizer problem. Arxiv:quant-
ph/9511026, 1995.
[30] A. Yu. Kitaev, A. H. Shen, and M. N. Vyalyi. Classical and Quantum Computation.
American Mathematical Society, Boston, 2002.
[31] R. J. Lipton and K. W. Regan. Quantum Algorithms via Linear Algebra: A Primer. MIT
Press, 2014.
104
[32] A. Mandviwalla, K. Ohshiro, and B. Ji. Implementing Grover’s algorithm on the IBM
quantum computers. In 2018 IEEE International Conference on Big Data, pages 2531–
2537, 2018.
[33] D. C. Marinescu and G. M. Marinescu. Approaching Quantum Computing. Pear-

son/Prentice Hall, Michigan, 2005.
[34] F. L. Marquezino, R. Portugal, and F. D. Sasse. Obtaining the quantum Fourier transform
from the classical FFT with QR decomposition. Journal of Computational and Applied
Mathematics, 235(1):74–81, 2010.
[35] N. D. Mermin. Quantum Computer Science: An Introduction. Cambridge University Press,

New York, 2007.
[36] D. A. Meyer. Sophisticated quantum search without entanglement. Phys. Rev. Lett.,
85:2014–2017, 2000.
[37] Takashi Mihara and Shao Chin Sung. Deterministic polynomial-time quantum algorithms
for Simon’s problem. Computational Complexity, 12(3):162–175, 2003.
[38] A. Montanaro, R. Jozsa, and G. Mitchison. On exact quantum query complexity. Algorith-
mica, 71(4):775–796, 2015.
[39] M. Nakahara and T. Ohmi. Quantum Computing: From Linear Algebra to Physical Real-
izations. CRC Press, 2008.
[40] M. A. Nielsen and I. L. Chuang. Quantum computation and quantum information. Cam-
bridge University Press, New York, 2000.
[41] I. Niven, H. S. Zuckerman, and H. L. Montgomery. An Introduction to the Theory of

Numbers. Wiley, 5th edition, 1991.
[42] J. L. Park. The concept of transition in quantum mechanics. Foundations of Physics,

1(1):23–33, 1970.
[43] A. Pavlidis and D. Gizopoulos. Fast quantum modular exponentiation architecture for
Shor’s factoring algorithm. Quantum Info. Comput., 14(7&8):649–682, 2014.
[44] R. Portugal. Quantum Walks and Search Algorithms. Springer, Cham, 2th edition, 2018.
[45] R. Portugal and F. L. Marquezino. Introdução à Programação de Computadores Quânticos.

In Conference CSBC 2019 – 38o JAI, pages 1–51, Belém, Pará, 2019.
[46] J. Proos and C. Zalka. Shor’s discrete logarithm quantum algorithm for elliptic curves.
Quantum Information and Computation, 3(4):317–344, 2003.
[47] Daowen Qiu and Shenggen Zheng. Generalized Deutsch-Jozsa problem and the optimal
quantum algorithm. Phys. Rev. A, 97:062331, 2018.
[48] Daowen Qiu and Shenggen Zheng. Revisiting Deutsch-Jozsa algorithm. Information and
Computation, 275:104605, 2020.
105
[49] E. Rieffel and W. Polak. Quantum Computing, a Gentle Introduction. MIT Press, Cam-
bridge, 2011.
[50] J. B. Rosser and L. Schoenfeld. Approximate formulas for some functions of prime numbers.
Illinois Journal of Mathematics, 6(1):64 – 94, 1962.
[51] W. Scherer. Mathematics of Quantum Computing: An Introduction. Springer, 2019.
[52] P. W. Shor. Algorithms for quantum computation: discrete logarithms and factoring. In
Proc. 35th Annual Symposium on Foundations of Computer Science, pages 124 –134, 1994.
[53] P. W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on
a quantum computer. SIAM Journal on Computing, 26(5):1484–1509, 1997.
[54] P. W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on
a quantum computer. SIAM Review, 41(2):303–332, 1999.
[55] D. R. Simon. On the power of quantum computation. In Proc. 35th Annual Symposium on
Foundations of Computer Science, pages 116–123, 1994.
[56] D. R. Simon. On the power of quantum computation. SIAM Journal on Computing,

26(5):1474–1483, 1997.
[57] U. Skosana and M. Tame. Demonstration of Shor’s factoring algorithm for N = 21 on IBM
quantum processors. Scientific Reports, 11(1):16599, 2021.
[58] J. Stolze and D. Suter. Quantum Computing, Revised and Enlarged: A Short Course from
Theory to Experiment. Wiley-VCH, 2008.
[59] G. Strang. Linear Algebra and Its Applications. Brooks Cole, 1988.
[60] C. P. Williams. Explorations in Quantum Computing. Springer, 2008.
[61] W. K. Wootters and W. H. Zurek. A single quantum cannot be cloned. Nature, 299:802–803,
1982.
[62] N. S. Yanofsky and M. Mannucci. Quantum Computing for Computer Scientists. Cambridge
University Press, 2008.
[63] Zekun Ye, Yunqi Huang, Lvzhou Li, and Yuyi Wang. Query complexity of generalized
Simon’s problem. Information and Computation, 281:104790, 2021.
[64] C. Zalka. Grover’s quantum searching algorithm is optimal. Phys. Rev. A, 60:2746–2751,
1999.
106

Basic Quantum Algorithms: Renato Portugal

Uploaded by

Copyright:

Available Formats

Basic Quantum Algorithms: Renato Portugal

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Basic Quantum Algorithms: Renato Portugal

Uploaded by

Copyright:

Available Formats

arXiv:2201.

10574v3 [quant-ph] 23 Feb 2022

Basic Quantum Algorithms

Full Researcher at the

February 25, 2022

7 Shor’s Algorithm for Factoring Integers 56

8 Shor’s Algorithm for Discrete Logarithm 74

10 Phase Estimation and Applications 90

11 Final Remarks 101

2.1 Review of linear algebra using the Dirac notation

without any special denomination in the Dirac notation.

which can be written as

ψ1 ψ2 = α∗ h0| + β ∗ h1| · γ|0i + δ|1i = α∗ γ 0 0 + β ∗ δ 1 1 = α∗ γ + β ∗ δ.

The norm of vector |ψ1 i is denoted by k |ψ1 i k and defined as

where |α| is the absolute value of α, that is

2.2 Qubit and superposition

|ψi = α|0i + β|1i,

where α and β are complex numbers that obey the constraint

2.3 Single-qubit gates

|0i X 1 with probability 1.

cos 2θ −eiλ sin 2θ

After applying U (θ, 0, 0) on |0i, we obtain

known as Pauli matrices,

is equivalent to the algebraic expression

|0i + (−1)` |1i

With a little more effort, we show that

Implementing on IBM quantum computers.

2.4 Quantum states and entanglement

|ψi = a0 |0i + a1 |1i + a2 |2i + · · · + a2n −1 |2n − 1i,

|`i X k |`i = |` ⊕ ki,

(H ⊗ H)|00i = (H ⊗ H) · (|0i ⊗ |0i) = (H|0i) ⊗ (H|0i) = |+i ⊗ |+i.

(A ⊗ B) · (|ψ1 i ⊗ |ψ2 i) = (A|ψ1 i) ⊗ (B|ψ2 i),

|0i + |1i |0i + |1i |0i + |1i + |2i + |3i

where A, B, and C are 1-qubit gates and D is a 2-qubit gate, is

We can simplify this expression a little and write

|ψi = D (BA ⊗ C)|00i.

|`i X (1−k) |`i = |` ⊕ k ⊕ 1i.

The equivalence follows from

(I ⊗ H) CNOT (I ⊗ H) = C(H)C(X)C(H) = C(HXH) = C(Z),

(H ⊗ H) CNOT (H ⊗ H) = (H ⊗ I)(I ⊗ H) CNOT (I ⊗ H)(H ⊗ I) = (H ⊗ I)C(Z)(H ⊗ I).

The matrix representation is  

Implementing on IBM quantum computers.

2.6 Multiqubit gates

|`i X jk |`i = |` ⊕ (j AND k)i.

|`i X (1−j)(1−k) |`i.

C n (X)|q0 i|q1 i · · · |qn−1 i|qn i = |q0 i|q1 i · · · |qn−1 iX q0 q1 ···qn−1 |qn i,

and is represented by the circuit

|qn i X q0 q1 ···qn−1 |qn i,

2.7 Circuit of a Boolean function

2.8 Quantum parallelism

3.1 Problem formulation

x f0 (x) f1 (x) f2 (x) f3 (x)

The disjunctive normal forms are

respectively, where x̄ = NOT x. The Boolean expression of f3 can be simplified since x̄ ∨ x =

Uf |xi|ji = |xi|j ⊕ f (x)i,

Uf2 = (X ⊗ I) · CNOT · (X ⊗ I).

Uf3 = (X ⊗ I) · CNOT · (X ⊗ I) · CNOT.

Note that Uf is unitary in all cases.

3.2 The algorithm

ψ1 ψ2 = α∗ h0| + β ∗ h1| · γ|0i + δ|1i = α∗ γ 00 + β ∗ δ 11 = α∗ γ + β ∗ δ.

= δxx0 j 0 ⊕ f (x0 )j ⊕ f (x) .

6 0 only if x = x0 , we have δxx0 j 0 ⊕ f (x0 )j ⊕ f (x) = δxx0 j 0 j . Then