Quantum Computing Course
Quantum Information
and Computation
A Course on the Theory of Quantum Computing
arXiv:2507.11536v1 [quant-ph] 15 Jul 2025
John Watrous
Basics of
Quantum Information
1 Single Systems
  1.1 Classical information
  1.2 Quantum information
2 Multiple Systems
  2.1 Classical information
  2.2 Quantum information
3 Quantum Circuits
  3.1 Circuits
  3.2 Inner products and projections
  3.3 Limitations on quantum information
4 Entanglement in Action
  4.1 Quantum teleportation
  4.2 Superdense coding
  4.3 The CHSH game
Lesson 1
Single Systems
This lesson introduces the basic framework of quantum information, including the
description of quantum states as vectors with complex number entries, measure-
ments that allow classical information to be extracted from quantum states, and
operations on quantum states that are described by unitary matrices.
We will restrict our attention in this lesson to the comparatively simple setting
in which a single system is considered in isolation. In the next lesson, we’ll expand
our view to multiple systems, which can interact with one another and be correlated.
Some readers may already be familiar with the material to be discussed in this
section, while others may not — but the discussion is meant for both audiences. In
addition to highlighting the aspects of classical information that are most relevant to
an introduction to quantum information, this section introduces the Dirac notation,
which is often used to describe vectors and matrices in quantum information
and computation. As it turns out, the Dirac notation is not specific to quantum
information; it can equally well be used in the context of classical information, as
well as for many other settings in which vectors and matrices arise.
1. If X is a bit, then Σ = {0, 1}. In words, we refer to this set as the binary alphabet.
Pr(X = 0) = 3/4 and Pr(X = 1) = 1/4.
A more succinct way to represent this probabilistic state is by a column vector.
( 3/4 )
( 1/4 )
The probability of the bit being 0 is placed at the top of the vector and the probability
of the bit being 1 is placed at the bottom, because this is the conventional way to
order the set {0, 1}.
In general, we can represent a probabilistic state of a system having any classical
state set in the same way, as a vector of probabilities. The probabilities can be
ordered in any way we choose, but it is typical that there is a natural or default way
to do this. To be precise, we can represent any probabilistic state through a column
vector satisfying two properties:
1. All entries of the vector are nonnegative real numbers.
2. The sum of the entries is equal to 1.
Conversely, any column vector that satisfies these two properties can be taken as
a representation of a probabilistic state. Hereafter, we will refer to vectors of this
form as probability vectors.
Alongside the succinctness of this notation, identifying probabilistic states as
column vectors has the advantage that operations on probabilistic states are repre-
sented through matrix-vector multiplication, as will be discussed shortly.
Notice that any two-dimensional column vector can be expressed as a linear combination of the standard basis vectors |0⟩ and |1⟩. For example,
( 3/4 )
( 1/4 )  =  (3/4)|0⟩ + (1/4)|1⟩.
This fact naturally generalizes to any classical state set: any column vector can be
written as a linear combination of standard basis states. Quite often we express
vectors in precisely this way.
Classical operations
In the last part of this brief summary of classical information, we will consider the
sorts of operations that can be performed on a classical system.
Deterministic operations
First, there are deterministic operations, where each classical state a ∈ Σ is trans-
formed into f ( a) for some function f of the form f : Σ → Σ.
For example, if Σ = {0, 1}, there are four functions of this form, f1, f2, f3, and f4,
which can be represented by tables of values as follows:

  a  f1(a)     a  f2(a)     a  f3(a)     a  f4(a)
  0    0       0    0       0    1       0    1
  1    0       1    1       1    0       1    1
The first and last of these functions are constant: f1(a) = 0 and f4(a) = 1 for each
a ∈ Σ. The middle two are not constant; they are balanced: each of the two output
values occurs the same number of times (once, in this case) as we range over the
possible inputs. The function f2 is the identity function: f2(a) = a for each a ∈ Σ.
And f3 is the function f3(0) = 1 and f3(1) = 0, which is better known as the NOT
function.
The actions of deterministic operations on probabilistic states can be represented
by matrix-vector multiplication. Specifically, the matrix M that represents a given
function f : Σ → Σ is the one that satisfies
M|a⟩ = | f(a)⟩
for every a ∈ Σ. Such a matrix always exists and is uniquely determined by this
requirement. Matrices that represent deterministic operations always have exactly
one 1 in each column, and 0 for all other entries.
For instance, the matrices M1, . . . , M4 corresponding to the functions f1, . . . , f4
above are as follows:

M1 = ( 1 1 )   M2 = ( 1 0 )   M3 = ( 0 1 )   M4 = ( 0 0 )
     ( 0 0 )        ( 0 1 )        ( 1 0 )        ( 1 1 )
Here’s a quick verification showing that the first matrix is correct. The other three
can be checked similarly.
M1 |0⟩ = ( 1 1 ) ( 1 )  =  ( 1 )  =  |0⟩ = | f1(0)⟩
         ( 0 0 ) ( 0 )     ( 0 )

M1 |1⟩ = ( 1 1 ) ( 0 )  =  ( 1 )  =  |0⟩ = | f1(1)⟩
         ( 0 0 ) ( 1 )     ( 0 )
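The same verification can be carried out numerically. The following sketch uses Python with NumPy (an illustration, not part of the course itself) to encode the four matrices and confirm that M1 sends both standard basis vectors to |0⟩.

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

M1 = np.array([[1, 1], [0, 0]])  # constant 0 function
M2 = np.array([[1, 0], [0, 1]])  # identity function
M3 = np.array([[0, 1], [1, 0]])  # NOT function
M4 = np.array([[0, 0], [1, 1]])  # constant 1 function

# M|a> = |f(a)>: for M1 both basis states map to |0>.
out0 = M1 @ ket0
out1 = M1 @ ket1

# Every deterministic operation has exactly one 1 in each column.
column_sums_ok = all((M.sum(axis=0) == 1).all() for M in (M1, M2, M3, M4))
```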
A convenient way to represent matrices of these and other forms makes use
of an analogous notation for row vectors to the one for column vectors discussed
previously: we denote by ⟨ a| the row vector having a 1 in the entry corresponding
to a and zero for all other entries, for each a ∈ Σ. This vector is read as “bra a.”
For example, if Σ = {0, 1}, then
⟨0| = ( 1  0 ) and ⟨1| = ( 0  1 ).
For any classical state set Σ, we can view row vectors and column vectors as
matrices, and perform the matrix multiplication |b⟩⟨ a|. We obtain a square matrix
having a 1 in the entry corresponding to the pair (b, a), meaning that the row of
the entry corresponds to the classical state b and the column corresponds to the
classical state a, with 0 for all other entries. For example,
|0⟩⟨1| = ( 1 ) ( 0  1 ) = ( 0 1 )
         ( 0 )            ( 0 0 )
Using this notation, we may express the matrix M that corresponds to any given
function f : Σ → Σ as
M = ∑_{a∈Σ} | f(a)⟩⟨a|.
For example, consider the function f 4 above, for which Σ = {0, 1}. We obtain the
matrix
M4 = | f4(0)⟩⟨0| + | f4(1)⟩⟨1| = |1⟩⟨0| + |1⟩⟨1| = ( 0 0 ) + ( 0 0 ) = ( 0 0 )
                                                  ( 1 0 )   ( 0 1 )   ( 1 1 )
The reason why this works is as follows. If we again think about vectors as
matrices, and this time consider the multiplication ⟨ a||b⟩, we obtain a 1 × 1 matrix,
which we can think about as a scalar (i.e., a number). For the sake of tidiness, we
write this product as ⟨ a|b⟩ rather than ⟨ a||b⟩. This product satisfies the following
simple formula.
⟨a|b⟩ = 1 if a = b, and ⟨a|b⟩ = 0 if a ≠ b.
Using this observation, together with the fact that matrix multiplication is associa-
tive and linear, we obtain
M|b⟩ = ( ∑_{a∈Σ} | f(a)⟩⟨a| ) |b⟩ = ∑_{a∈Σ} | f(a)⟩⟨a|b⟩ = | f(b)⟩,
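The formula M = ∑ | f(a)⟩⟨a| translates directly into code: each term | f(a)⟩⟨a| is an outer product. The helper below is an illustrative sketch (Python with NumPy; the function name is our own), identifying classical states with indices 0, . . . , n−1.

```python
import numpy as np

def matrix_from_function(f, n):
    """Return M = sum over a of |f(a)><a| for f on states {0, ..., n-1}."""
    basis = np.eye(n)  # row a is the standard basis vector |a>
    return sum(np.outer(basis[f(a)], basis[a]) for a in range(n))

# The constant 1 function f4 from the text, on the binary alphabet:
M4 = matrix_from_function(lambda a: 1, 2)
```

Applying M4 to either basis vector returns |1⟩, exactly as the derivation M|b⟩ = | f(b)⟩ predicts.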
Stochastic matrices always map probability vectors to probability vectors, and any
matrix that always maps probability vectors to probability vectors must be a
stochastic matrix.
Finally, a different way to think about probabilistic operations is that they are
random choices of deterministic operations. For instance, we can think about the
operation in the example above as applying either the identity function or the
constant 0 function, each with probability 1/2. This is consistent with the equation
( 1  1/2 )  =  (1/2) ( 1 0 )  +  (1/2) ( 1 1 ).
( 0  1/2 )           ( 0 1 )           ( 0 0 )
Such an expression is always possible, for an arbitrary choice of a classical state set
and any stochastic matrix having rows and columns identified with it.
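This decomposition can be checked numerically. The following sketch (Python with NumPy, illustrative only) forms the mixture of the identity and the constant 0 operation and confirms that the result is the stochastic matrix from the example.

```python
import numpy as np

identity = np.array([[1.0, 0.0], [0.0, 1.0]])   # identity function
const0 = np.array([[1.0, 1.0], [0.0, 0.0]])     # constant 0 function

# Apply each deterministic operation with probability 1/2.
S = 0.5 * identity + 0.5 * const0

# Stochastic: nonnegative entries, and each column sums to 1.
is_stochastic = bool((S >= 0).all() and np.allclose(S.sum(axis=0), 1.0))
```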
Suppose that X is a system having classical state set Σ, and M1 , . . . , Mn are stochastic
matrices representing probabilistic operations on the system X.
If the first operation M1 is applied to the probabilistic state represented by a
probability vector u, the resulting probabilistic state is represented by the vector
M1 u. If we then apply the second probabilistic operation M2 to this new probability
vector, we obtain the probability vector
M2 ( M1 u) = ( M2 M1 )u.
The equality follows from the fact that matrix multiplication, including matrix-
vector multiplication as a special case, is an associative operation. Thus, the prob-
abilistic operation obtained by composing the first and second probabilistic oper-
ations, where we first apply M1 and then apply M2 , is represented by the matrix
M2 M1 , which is necessarily stochastic.
More generally, composing the probabilistic operations represented by the matri-
ces M1 , . . . , Mn in this order, meaning that M1 is applied first, M2 is applied second,
and so on, with Mn applied last, is represented by the matrix product
Mn · · · M1 .
Note that the ordering is important here: although matrix multiplication is associa-
tive, it is not a commutative operation. For example, if
M1 = ( 1 1 )  and  M2 = ( 0 1 ),
     ( 0 0 )            ( 1 0 )
then
M2 M1 = ( 0 0 )  and  M1 M2 = ( 1 1 ).
        ( 1 1 )               ( 0 0 )
That is, the order in which probabilistic operations are composed matters; changing
the order in which operations are applied in a composition can change the resulting
operation.
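The non-commutativity claim is a one-line check in code; this sketch (Python with NumPy, illustrative) multiplies the two matrices from the example in both orders.

```python
import numpy as np

M1 = np.array([[1, 1], [0, 0]])
M2 = np.array([[0, 1], [1, 0]])

first_then_second = M2 @ M1   # apply M1 first, then M2
second_then_first = M1 @ M2   # apply M2 first, then M1
```

The two products disagree, so the two compositions are genuinely different operations.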
The condition that the sum of the absolute values squared of a quantum state vector
equals 1 is therefore equivalent to that vector having Euclidean norm equal to 1.
That is, quantum state vectors are unit vectors with respect to the Euclidean norm.
The term qubit refers to a quantum system whose classical state set is {0, 1}. That is,
a qubit is really just a bit — but by using this name we explicitly recognize that this
bit can be in a quantum state.
These are examples of quantum states of a qubit:
( 1 )             ( 0 )
( 0 )  =  |0⟩ and ( 1 )  =  |1⟩,

( 1/√2 )
( 1/√2 )  =  (1/√2)|0⟩ + (1/√2)|1⟩,     (1.1)

( (1+2i)/3 )
(   −2/3   )  =  ((1+2i)/3)|0⟩ − (2/3)|1⟩.
The first two examples, |0⟩ and |1⟩, illustrate that standard basis elements
are valid quantum state vectors. Their entries are complex numbers, where the
imaginary part of these numbers all happen to be 0, and computing the sum of the
absolute values squared of the entries yields
|1|² + |0|² = 1,
as required. Similar to the classical setting, we associate the quantum state vectors
|0⟩ and |1⟩ with a qubit being in the classical state 0 and 1, respectively.
For the other two examples, we again have complex number entries, and com-
puting the sum of the absolute value squared of the entries yields
|1/√2|² + |1/√2|² = 1/2 + 1/2 = 1
and
|(1+2i)/3|² + |−2/3|² = 5/9 + 4/9 = 1.
These are therefore valid quantum state vectors. Note that they are linear
combinations of the standard basis states |0⟩ and |1⟩, and for this reason we often
say that they’re superpositions of the states 0 and 1. Within the context of quantum
states, superposition and linear combination are essentially synonymous.
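The unit-norm condition for all four example vectors can be confirmed numerically; in this sketch (Python with NumPy, illustrative) the states are stored as complex arrays.

```python
import numpy as np

states = {
    "ket0": np.array([1, 0], dtype=complex),
    "ket1": np.array([0, 1], dtype=complex),
    "plus": np.array([1, 1], dtype=complex) / np.sqrt(2),
    "psi":  np.array([(1 + 2j) / 3, -2 / 3], dtype=complex),
}

# Quantum state vectors are unit vectors in the Euclidean norm:
# the sum of absolute values squared of the entries equals 1.
norms = {name: float(np.linalg.norm(v)) for name, v in states.items()}
```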
The example (1.1) of a qubit state vector above is very commonly encountered —
it is called the plus state and is denoted as follows:
|+⟩ = (1/√2)|0⟩ + (1/√2)|1⟩.
We also use the notation
|−⟩ = (1/√2)|0⟩ − (1/√2)|1⟩
to refer to a related quantum state vector where the second entry is negative rather
than positive, and we call this state the minus state.
This sort of notation, where some symbol other than one referring to a classical
state appears inside of a ket, is common — we can use whatever name we wish
inside of a ket to name a vector. It is quite common to use the notation |ψ⟩, or a
different name in place of ψ, to refer to an arbitrary vector that may not necessarily
be a standard basis vector.
Notice that, if we have a vector |ψ⟩ whose indices correspond to some classical
state set Σ, and if a ∈ Σ is an element of this classical state set, then the matrix
product ⟨ a||ψ⟩ is equal to the entry of the vector |ψ⟩ whose index corresponds to a.
As we did when |ψ⟩ was a standard basis vector, we write ⟨ a|ψ⟩ rather than ⟨ a||ψ⟩
for the sake of readability.
For example, if Σ = {0, 1} and
|ψ⟩ = ((1+2i)/3)|0⟩ − (2/3)|1⟩ = ( (1+2i)/3 ),     (1.2)
                                 (   −2/3   )
then
⟨0|ψ⟩ = (1+2i)/3 and ⟨1|ψ⟩ = −2/3.
In general, when using the Dirac notation for arbitrary vectors, the notation ⟨ψ|
refers to the row vector obtained by taking the conjugate transpose of the column
vector |ψ⟩, where the vector is transposed from a column vector to a row vector and
each entry is replaced by its complex conjugate. For example, if |ψ⟩ is the vector
defined in (1.2) then
⟨ψ| = ((1−2i)/3)⟨0| − (2/3)⟨1| = ( (1−2i)/3   −2/3 ).
The reason for taking the complex conjugate, in addition to the transpose, will be
made more clear later on when we discuss inner products.
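In code, the bra ⟨ψ| is just the conjugate transpose of the array holding |ψ⟩. This sketch (Python with NumPy, illustrative) reproduces the entries computed above for the vector in (1.2).

```python
import numpy as np

psi = np.array([(1 + 2j) / 3, -2 / 3])   # |psi> from equation (1.2)
bra_psi = psi.conj()                      # <psi|: transpose and conjugate
# (for a 1-D array the transpose is trivial, so conj() alone suffices)

# <a|psi> picks out the entry of |psi> indexed by a.
amp0 = np.array([1, 0]) @ psi             # <0|psi> = (1 + 2i)/3
amp1 = np.array([0, 1]) @ psi             # <1|psi> = -2/3
```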
We can consider quantum states of systems having arbitrary classical state sets. For
example, here is a quantum state vector for an electrical fan switch:
(  1/2 )
(   0  )
( −i/2 )  =  (1/2)|high⟩ − (i/2)|low⟩ + (1/√2)|off⟩.
( 1/√2 )
The assumption in place here is that the classical states are ordered as high, medium,
low, off. There may be no particular reason why one would want to consider a
quantum state of an electrical fan switch, but it is possible in principle.
Here’s another example, this time of a quantum decimal digit whose classical
states are 0, 1, . . . , 9:
          (  1 )
          (  2 )
(1/√385)  (  3 )   =  (1/√385) ∑_{k=0}^{9} (k+1)|k⟩.
          (  ⋮ )
          ( 10 )
This example illustrates the convenience of writing state vectors using the Dirac
notation. For this particular example, the column vector representation is merely
cumbersome — but if there were significantly more classical states it would become
unwieldy.
This suggests that, as far as standard basis measurements are concerned, the plus
and minus states are no different. Why, then, would we care to make a distinc-
tion between them? The answer is that these two states behave differently when
operations are performed on them, as we will discuss in the next subsection below.
Of course, measuring the quantum state |0⟩ results in the classical state 0 with
certainty, and likewise measuring the quantum state |1⟩ results in the classical
state 1 with certainty. This is consistent with the identification of these quantum
states with the system being in the corresponding classical state, as was suggested
previously.
As a final example, measuring the state
|ψ⟩ = ((1+2i)/3)|0⟩ − (2/3)|1⟩
causes the two possible outcomes to appear with probabilities as follows:
Pr(outcome is 0) = |⟨0|ψ⟩|² = |(1+2i)/3|² = 5/9,
and
Pr(outcome is 1) = |⟨1|ψ⟩|² = |−2/3|² = 4/9.
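Measurement probabilities are obtained by squaring the absolute values of the entries; this sketch (Python with NumPy, illustrative) recovers the 5/9 and 4/9 just computed.

```python
import numpy as np

psi = np.array([(1 + 2j) / 3, -2 / 3])

# Standard basis measurement: Pr(outcome a) = |<a|psi>|^2.
probs = np.abs(psi) ** 2
```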
Unitary operations
Thus far, it may not be evident why quantum information is fundamentally dif-
ferent from classical information. That is, when a quantum state is measured, the
probability to obtain each classical state is given by the absolute value squared of
the corresponding vector entry — so why not simply record these probabilities in a
probability vector?
The answer, at least in part, is that the set of allowable operations that can be
performed on a quantum state is different than it is for classical information. Similar
to the probabilistic setting, operations on quantum states are linear mappings —
but rather than being represented by stochastic matrices, like in the classical case,
operations on quantum state vectors are represented by unitary matrices.
A square matrix U having complex number entries is unitary if it satisfies the
following two equations.
UU† = I and U†U = I.     (1.3)
Here, I is the identity matrix, and U † is the conjugate transpose of U, meaning the
matrix obtained by transposing U and taking the complex conjugate of each entry.
U† = Ūᵀ, where Ū denotes the entrywise complex conjugate of U.
If either of the two equalities numbered (1.3) above is true, then the other must also
be true. Both equalities are equivalent to U † being the inverse of U:
U −1 = U † .
All of the matrices just defined are unitary, and therefore represent quantum
operations on a single qubit. For example, here is a calculation that verifies that
the Hadamard matrix H is unitary:
H H† = ( 1/√2   1/√2 ) ( 1/√2   1/√2 )†  =  ( 1/√2   1/√2 ) ( 1/√2   1/√2 )
       ( 1/√2  −1/√2 ) ( 1/√2  −1/√2 )      ( 1/√2  −1/√2 ) ( 1/√2  −1/√2 )

     = ( 1/2 + 1/2   1/2 − 1/2 )  =  ( 1 0 ).
       ( 1/2 − 1/2   1/2 + 1/2 )     ( 0 1 )
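The same unitarity check can be done numerically (Python with NumPy, illustrative only):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard matrix

# Unitarity: U U-dagger = U-dagger U = I, equivalently U inverse = U-dagger.
HHd = H @ H.conj().T
HdH = H.conj().T @ H
```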
And here’s the action of the Hadamard operation on a few commonly encountered
qubit state vectors.
H|0⟩ = ( 1/√2   1/√2 ) ( 1 )  =  ( 1/√2 )  =  |+⟩
       ( 1/√2  −1/√2 ) ( 0 )     ( 1/√2 )

H|1⟩ = ( 1/√2   1/√2 ) ( 0 )  =  (  1/√2 )  =  |−⟩
       ( 1/√2  −1/√2 ) ( 1 )     ( −1/√2 )

H|+⟩ = ( 1/√2   1/√2 ) ( 1/√2 )  =  ( 1 )  =  |0⟩
       ( 1/√2  −1/√2 ) ( 1/√2 )     ( 0 )

H|−⟩ = ( 1/√2   1/√2 ) (  1/√2 )  =  ( 0 )  =  |1⟩
       ( 1/√2  −1/√2 ) ( −1/√2 )     ( 1 )
= ((−1+2i)/(3√2))|0⟩ + ((3+2i)/(3√2))|1⟩
Next, let’s consider the action of a T operation on a plus state.
T|+⟩ = T( (1/√2)|0⟩ + (1/√2)|1⟩ ) = (1/√2) T|0⟩ + (1/√2) T|1⟩ = (1/√2)|0⟩ + ((1+i)/2)|1⟩
Notice here that we did not bother to convert to the equivalent matrix/vector forms,
and instead used the linearity of matrix multiplication together with the formulas
T|0⟩ = |0⟩ and T|1⟩ = ((1+i)/√2)|1⟩.
Along similar lines, we may compute the result of applying a Hadamard operation
to the quantum state vector just obtained.
H( (1/√2)|0⟩ + ((1+i)/2)|1⟩ ) = (1/√2) H|0⟩ + ((1+i)/2) H|1⟩
                              = (1/√2)|+⟩ + ((1+i)/2)|−⟩
                              = (1/2)|0⟩ + (1/2)|1⟩ + ((1+i)/(2√2))|0⟩ − ((1+i)/(2√2))|1⟩
                              = ( 1/2 + (1+i)/(2√2) )|0⟩ + ( 1/2 − (1+i)/(2√2) )|1⟩
The two approaches — one where we explicitly convert to matrix representations
and the other where we use linearity and plug in the actions of an operation
on standard basis states — are equivalent. We can use whichever one is more
convenient in the case at hand.
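The equivalence of the two approaches can be confirmed numerically. This sketch (Python with NumPy, illustrative) computes HT|+⟩ by explicit matrix multiplication and compares it with the coefficients obtained by linearity; the matrix for T is taken to be diag(1, e^{iπ/4}), consistent with the formulas for T|0⟩ and T|1⟩.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])   # e^{i pi/4} = (1+i)/sqrt(2)
plus = np.array([1, 1]) / np.sqrt(2)

# Approach 1: explicit matrix/vector multiplication.
via_matrices = H @ (T @ plus)

# Approach 2: the coefficients obtained by linearity.
a = 1 / 2 + (1 + 1j) / (2 * np.sqrt(2))
b = 1 / 2 - (1 + 1j) / (2 * np.sqrt(2))
via_linearity = np.array([a, b])
```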
That is, R is a square root of NOT operation. Such a behavior, where the same
operation is applied twice to yield a NOT operation, is not possible for a classical
operation on a single bit.
( 0 0 1 )
( 1 0 0 )
( 0 1 0 )
Assuming that the classical states of the system are 0, 1, and 2, we can describe this
operation as addition modulo 3.
This matrix describes an operation known as the quantum Fourier transform, specif-
ically in the 4 × 4 case. The quantum Fourier transform can be defined more
generally, for any positive integer dimension, and plays a key role in quantum
algorithms.
Lesson 2
Multiple Systems
This lesson focuses on the basics of quantum information in the context of multiple
systems. This context arises both commonly and naturally in information process-
ing, classical and quantum; information-carrying systems are typically constructed
from collections of smaller systems, such as bits or qubits.
A simple, yet critically important idea to keep in mind going into this lesson is
that we can always choose to view multiple systems together as if they form a single,
compound system — to which the discussion in the previous lesson applies. Indeed,
this idea very directly leads to a description of how quantum states, measurements,
and operations work for multiple systems.
There is, however, more to understanding multiple quantum systems than
simply recognizing that they may be viewed collectively as single systems. For
instance, we may have multiple quantum systems that are collectively in a particular
quantum state, and then choose to measure some but not all of the individual
systems. In general, this will affect the state of the systems that were not measured,
and it is important to understand exactly how when analyzing quantum algorithms
and protocols. An understanding of the sorts of correlations among multiple systems
— and particularly a type of correlation known as entanglement — is also important
in quantum information and computation.
Σ × Γ = { (a, b) : a ∈ Σ and b ∈ Γ }.
In simple terms, the Cartesian product is precisely the mathematical notion that
captures the idea of viewing an element of one set and an element of a second set
together, as if they form a single element of a single set.
In the case at hand, to say that (X, Y ) is in the classical state ( a, b) ∈ Σ × Γ means
that X is in the classical state a ∈ Σ and Y is in the classical state b ∈ Γ; and if the
classical state of X is a ∈ Σ and the classical state of Y is b ∈ Γ, then the classical
state of the joint system (X, Y ) is ( a, b).
For more than two systems, the situation generalizes in a natural way. If we sup-
pose that X1 , . . . , Xn are systems having classical state sets Σ1 , . . . , Σn , respectively,
for any positive integer n, the classical state set of the n-tuple (X1 , . . . , Xn ), viewed
as a single joint system, is the Cartesian product
Σ1 × · · · × Σn = { (a1, . . . , an) : a1 ∈ Σ1, . . . , an ∈ Σn }.
Of course, we are free to use whatever names we wish for systems, and to order
them as we choose. In particular, if we have n systems like above, we could instead
choose to name them X0 , . . . , Xn−1 and arrange them from right to left, so that the
joint system becomes (Xn−1 , . . . , X0 ). Following the same pattern for naming the
associated classical states and classical state sets, we might then refer to a classical
state
(an−1, . . . , a0) ∈ Σn−1 × · · · × Σ0
of this compound system.
Indeed, this is the ordering convention used by Qiskit when naming multiple
qubits. We’ll come back to this convention and how it connects to quantum circuits
in the next lesson, but we’ll start using it now to help get used to it.
It is often convenient to write a classical state of the form ( an−1 , . . . , a0 ) as a string
an−1 · · · a0 for the sake of brevity, particularly in the very typical situation that the
classical state sets Σ0 , . . . , Σn−1 are associated with sets of symbols or characters. In
this context, the term alphabet is commonly used to refer to sets of symbols used to
form strings, but the mathematical definition of an alphabet is precisely the same as
the definition of a classical state set: it is a finite and nonempty set.
For example, suppose that X0 , . . . , X9 are bits, so that the classical state sets of
these systems are all the same.
Σ0 = Σ1 = · · · = Σ9 = {0, 1}
There are then 2¹⁰ = 1024 classical states of the joint system (X9, . . . , X0), which are
the elements of the set
Σ9 × Σ8 × · · · × Σ0 = {0, 1}¹⁰.
0000000000
0000000001
0000000010
0000000011
0000000100
..
.
1111111111
For the classical state 0000000110, for instance, we see that X1 and X2 are in the
state 1, while all other systems are in the state 0.
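The right-to-left indexing convention is easy to mirror in code. In this small Python sketch (illustrative; the variable names are our own), bit k of the string corresponds to the system Xk.

```python
# Classical state of ten bits, written as the string a9 ... a0.
state = "0000000110"

# System Xk corresponds to the k-th character from the right.
systems_in_state_1 = [k for k in range(len(state)) if state[-1 - k] == "1"]
```

For the state 0000000110 this picks out X1 and X2, matching the text.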
Probabilistic states
Recall from the previous lesson that a probabilistic state associates a probability with
each classical state of a system. Thus, a probabilistic state of multiple systems —
viewed collectively as a single system — associates a probability with each element
of the Cartesian product of the classical state sets of the individual systems.
For example, suppose that X and Y are both bits, so that their corresponding
classical state sets are Σ = {0, 1} and Γ = {0, 1}, respectively. Here is a probabilistic
state of the pair (X, Y ) :
Pr((X, Y) = (0, 0)) = 1/2
Pr((X, Y) = (0, 1)) = 0
Pr((X, Y) = (1, 0)) = 0
Pr((X, Y) = (1, 1)) = 1/2
This probabilistic state is one in which both X and Y are random bits — each is 0
with probability 1/2 and 1 with probability 1/2 — but the classical states of the two
bits always agree. This is an example of a correlation between these systems.
There is a simple convention that we follow for ordering the elements of Cartesian
products, which is to start with
whatever orderings are already in place for the individual classical state sets, and
then to order the elements of the Cartesian product alphabetically. Another way
to say this is that the entries in each n-tuple (or, equivalently, the symbols in each
string) are treated as though they have significance that decreases from left to right.
For example, according to this convention, the Cartesian product {1, 2, 3} × {0, 1}
is ordered like this:
(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1).
When n-tuples are written as strings and ordered in this way, we observe familiar
patterns, such as {0, 1} × {0, 1} being ordered as 00, 01, 10, 11, and the set {0, 1}¹⁰
being ordered as it was written earlier in the lesson. As another example, viewing
the set {0, 1, . . . , 9} × {0, 1, . . . , 9} as a set of strings, we obtain the two-digit numbers
00 through 99, ordered numerically. This is obviously not a coincidence; our decimal
number system uses precisely this sort of alphabetical ordering, where the word
alphabetical should be understood as having a broad meaning that includes numerals
in addition to letters.
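As an illustrative aside, this ordering convention is exactly what Python's itertools.product produces, which makes it convenient for enumerating classical state sets of compound systems.

```python
from itertools import product

# Alphabetical ordering of {1, 2, 3} x {0, 1}: the leftmost position is
# most significant, so the rightmost position varies fastest.
ordered = list(product([1, 2, 3], [0, 1]))

# Strings over {0, 1} x {0, 1} come out as 00, 01, 10, 11.
bit_strings = ["".join(map(str, t)) for t in product([0, 1], repeat=2)]
```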
Returning to the example of two bits from above, the probabilistic state described
previously is therefore represented by the following probability vector, where the
entries are labeled explicitly for the sake of clarity.
( 1/2 )  ← probability of being in the state 00
(  0  )  ← probability of being in the state 01
(  0  )  ← probability of being in the state 10          (2.1)
( 1/2 )  ← probability of being in the state 11
A special type of probabilistic state of two systems is one in which the systems
are independent. Intuitively speaking, two systems are independent if learning the
classical state of either system has no effect on the probabilities associated with the
other. That is, learning what classical state one of the systems is in provides no
information at all about the classical state of the other.
To define this notion precisely, let us suppose once again that X and Y are systems
having classical state sets Σ and Γ, respectively. With respect to a given probabilistic
state of these systems, they are said to be independent if it is the case that
Pr((X, Y) = (a, b)) = Pr(X = a) Pr(Y = b)     (2.2)
for every choice of a ∈ Σ and b ∈ Γ.
The condition (2.2) for independence is then equivalent to the existence of two
probability vectors
|ϕ⟩ = ∑_{a∈Σ} qa |a⟩ and |ψ⟩ = ∑_{b∈Γ} rb |b⟩,     (2.3)
representing the probabilities associated with the classical states of X and Y, respec-
tively, such that
p ab = q a rb (2.4)
for all a ∈ Σ and b ∈ Γ.
For example, the probabilistic state of a pair of bits (X, Y ) represented by the
vector
(1/6)|00⟩ + (1/12)|01⟩ + (1/2)|10⟩ + (1/4)|11⟩
is one in which X and Y are independent. Specifically, the condition required for
independence is true for the probability vectors
|ϕ⟩ = (1/4)|0⟩ + (3/4)|1⟩ and |ψ⟩ = (2/3)|0⟩ + (1/3)|1⟩.
For instance, to make the probabilities for the 00 state match, we need 1/6 = 1/4 × 2/3,
and indeed this is the case. Other entries can be verified in a similar manner.
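The condition pab = qa rb says that the joint probability vector is the flattened outer product of the two marginal vectors; this sketch (Python with NumPy, illustrative) verifies it for the example.

```python
import numpy as np

phi = np.array([1/4, 3/4])   # probabilities for X
psi = np.array([2/3, 1/3])   # probabilities for Y

# Independence: entry (a, b) of the joint state is q_a * r_b.
joint = np.outer(phi, psi).flatten()   # ordered 00, 01, 10, 11
```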
On the other hand, the probabilistic state (2.1), which we may write as
(1/2)|00⟩ + (1/2)|11⟩,     (2.5)
does not represent independence between the systems X and Y. A simple way to
argue this follows.
Suppose that there did exist probability vectors |ϕ⟩ and |ψ⟩, as in equation (2.3)
above, for which the condition (2.4) is satisfied for every choice of a and b. It would
then necessarily be that
q0 r1 = Pr((X, Y) = (0, 1)) = 0.
This implies that either q0 = 0 or r1 = 0, because if both were nonzero, the product
q0 r1 would also be nonzero. This leads to the conclusion that either q0 r0 = 0 (in case
q0 = 0) or q1 r1 = 0 (in case r1 = 0). We see, however, that neither of those equalities
can be true because we must have q0 r0 = 1/2 and q1 r1 = 1/2. Hence, there do not
exist vectors |ϕ⟩ and |ψ⟩ satisfying the property required for independence.
Having defined independence between two systems, we can now define what is
meant by correlation: it is a lack of independence. For example, because the two bits in
the probabilistic state represented by the vector (2.5) are not independent, they are,
by definition, correlated.
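There is also a quick numerical way to certify correlation: reshaping a joint probability vector over Σ × Γ into a |Σ| × |Γ| matrix turns product states into rank 1 matrices, so any larger rank rules out independence. A sketch (Python with NumPy, illustrative):

```python
import numpy as np

# The correlated state (2.5), as a vector over 00, 01, 10, 11.
v = np.array([1/2, 0, 0, 1/2])

# A product state reshapes to the rank 1 matrix outer(q, r);
# rank 2 certifies that the two bits are correlated.
rank = np.linalg.matrix_rank(v.reshape(2, 2))
```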
The entries of this new vector correspond to the elements of the Cartesian product
Σ × Γ, which are written as strings in the previous equation. Equivalently, the
vector |π⟩ = |ϕ⟩ ⊗ |ψ⟩ is defined by the equation
⟨ab|π⟩ = ⟨a|ϕ⟩⟨b|ψ⟩
for every a ∈ Σ and b ∈ Γ. A probabilistic state in which X and Y are independent
can therefore be written as a tensor product
|π⟩ = |ϕ⟩ ⊗ |ψ⟩
of probability vectors |ϕ⟩ and |ψ⟩ on each of the subsystems X and Y. In this
situation, |π⟩ is said to be a product state or product vector.
We often omit the symbol ⊗ when taking the tensor product of kets, such as
writing |ϕ⟩|ψ⟩ rather than |ϕ⟩ ⊗ |ψ⟩. This convention captures the idea that the
tensor product is, in this context, the most natural or default way to take the product
of two vectors. Although it is less common, the notation |ϕ ⊗ ψ⟩ is also sometimes
used.
When we use the alphabetical convention for ordering elements of Cartesian
products, we obtain the following specification for the tensor product of two column
vectors.
( α1 )   ( β1 )     ( α1 β1 )
( α2 )   ( β2 )     (   ⋮   )
(  ⋮ ) ⊗ (  ⋮ )  =  ( α1 βk )
( αm )   ( βk )     ( α2 β1 )
                    (   ⋮   )
                    ( α2 βk )
                    (   ⋮   )
                    ( αm β1 )
                    (   ⋮   )
                    ( αm βk )
As an important aside, notice the following expression for tensor products of
standard basis vectors:
| a⟩ ⊗ |b⟩ = | ab⟩.
We could alternatively write ( a, b) as an ordered pair, rather than a string, in which
case we obtain | a⟩ ⊗ |b⟩ = |( a, b)⟩. It is, however, more common to omit the
parentheses in this situation, instead writing | a⟩ ⊗ |b⟩ = | a, b⟩. This is typical in
mathematics more generally; parentheses that don’t add clarity or remove ambigu-
ity are often simply omitted.
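NumPy's kron implements exactly this tensor product, with the same alphabetical ordering, as this illustrative sketch checks for |1⟩ ⊗ |0⟩ = |10⟩.

```python
import numpy as np

def ket(a, n=2):
    """Standard basis vector |a> of an n-state system."""
    return np.eye(n)[a]

# |1> tensor |0> = |10>: the basis vector indexed by the string 10,
# i.e. position 2 under the alphabetical ordering 00, 01, 10, 11.
v = np.kron(ket(1), ket(0))
```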
The tensor product of two vectors has the important property that it is bilinear,
which means that it is linear in each of the two arguments separately, assuming
that the other argument is fixed. This property can be expressed through these
equations:

Linearity in the first argument:
( |ϕ1⟩ + |ϕ2⟩ ) ⊗ |ψ⟩ = |ϕ1⟩ ⊗ |ψ⟩ + |ϕ2⟩ ⊗ |ψ⟩
( α|ϕ⟩ ) ⊗ |ψ⟩ = α( |ϕ⟩ ⊗ |ψ⟩ )

Linearity in the second argument:
|ϕ⟩ ⊗ ( |ψ1⟩ + |ψ2⟩ ) = |ϕ⟩ ⊗ |ψ1⟩ + |ϕ⟩ ⊗ |ψ2⟩
|ϕ⟩ ⊗ ( α|ψ⟩ ) = α( |ϕ⟩ ⊗ |ψ⟩ )
Considering the second equation in each of these pairs of equations, we see that
scalars “float freely” within tensor products:
( α|ϕ⟩ ) ⊗ |ψ⟩ = |ϕ⟩ ⊗ ( α|ψ⟩ ) = α( |ϕ⟩ ⊗ |ψ⟩ ).
(Xn−1 , . . . , X0 )
Similar to the tensor product of just two vectors, the tensor product of three or
more vectors is linear in each of the arguments individually, assuming that all other
arguments are fixed. In this case it is said that the tensor product of three or more
vectors is multilinear.
Like in the case of two systems, we could say that the systems X0 , . . . , Xn−1 are
independent when they are in a product state, but the term mutually independent is
more precise. There happen to be other notions of independence for three or more
systems, such as pairwise independence, that are both interesting and important —
but not in the context of this course.
Generalizing the observation earlier concerning tensor products of standard
basis vectors, for any positive integer n and any classical states a0 , . . . , an−1 , we
have
|an−1⟩ ⊗ · · · ⊗ |a0⟩ = |an−1 · · · a0⟩.
To be precise, let’s suppose that X and Y are systems whose classical state sets are
Σ and Γ, respectively, and that the two systems together are in some probabilistic
state. We’ll consider what happens when we measure just X and do nothing to
Y. The situation where just Y is measured and nothing happens to X is handled
symmetrically.
First, we know that the probability to observe a particular classical state a ∈ Σ
when just X is measured must be consistent with the probabilities we would obtain
under the assumption that Y was also measured. That is, we must have
Pr(X = a) = ∑_{b∈Γ} Pr((X, Y) = (a, b)).
This is the formula for the so-called reduced (or marginal) probabilistic state of X
alone.
This formula makes perfect sense at an intuitive level, in the sense that something
very strange would have to happen for it to be wrong. If it were wrong, that would
mean that measuring Y could somehow influence the probabilities associated with
different outcomes of the measurement of X, irrespective of the actual outcome of
the measurement of Y. If Y happened to be in a distant location, such as somewhere
in another galaxy for instance, this would allow for faster-than-light signaling —
which we reject based on our understanding of physics.
Another way to understand this comes from the interpretation of probability
as reflecting a degree of belief. The mere fact that someone else might decide to
look at Y cannot change the classical state of X, so without any information about
what they did or didn’t see, one’s beliefs about the state of X should not change as a
result.
Now, given the assumption that only X is measured and Y is not, there may still
exist uncertainty about the classical state of Y. For this reason, rather than updating
our description of the probabilistic state of (X, Y ) to | ab⟩ for some selection of a ∈ Σ
and b ∈ Γ, we must update our description so that this uncertainty about Y is
properly reflected.
The following conditional probability formula reflects this uncertainty:

Pr(Y = b | X = a) = Pr((X, Y) = (a, b)) / Pr(X = a).

To express this in terms of probability vectors, suppose that the probabilistic state of (X, Y) is described by the vector

|ψ⟩ = ∑(a,b)∈Σ×Γ pab |ab⟩,

so that measuring X yields each outcome a with probability

Pr(X = a) = ∑c∈Γ pac.

Conditioned on the measurement of X yielding the outcome a, the probabilistic state of Y is then described by the vector

|πa⟩ = (∑b∈Γ pab |b⟩) / (∑c∈Γ pac).
In the event that the measurement of X resulted in the classical state a, we therefore
update our description of the probabilistic state of the joint system to | a⟩ ⊗ |π a ⟩.
One way to think about this definition of |π a ⟩ is to see it as a normalization of the
vector ∑b∈Γ p ab |b⟩, where we divide by the sum of the entries in this vector to obtain
a probability vector. This normalization effectively accounts for a conditioning on
the event that the measurement of X has resulted in the outcome a.
For a specific example, suppose that classical state set of X is Σ = {0, 1}, the
classical state set of Y is Γ = {1, 2, 3}, and the probabilistic state of (X, Y ) is
|ψ⟩ = (1/2)|0, 1⟩ + (1/12)|0, 3⟩ + (1/12)|1, 1⟩ + (1/6)|1, 2⟩ + (1/6)|1, 3⟩.
Our goal will be to determine the probabilities of the two possible outcomes (0
and 1), and to calculate what the resulting probabilistic state of Y is for the two
outcomes, assuming the system X is measured.
Using the bilinearity of the tensor product, and specifically the fact that it is
linear in the second argument, we may rewrite the vector |ψ⟩ as follows:
|ψ⟩ = |0⟩ ⊗ ((1/2)|1⟩ + (1/12)|3⟩) + |1⟩ ⊗ ((1/12)|1⟩ + (1/6)|2⟩ + (1/6)|3⟩).
In words, what we’ve done is to isolate the distinct standard basis vectors for the first
system (i.e., the one being measured), tensoring each with the linear combination of
standard basis vectors for the second system we get by picking out the entries of
the original vector that are consistent with the corresponding classical state of the
first system. A moment’s thought reveals that this is always possible, regardless of
what vector we started with.
Having expressed our probability vector in this way, the effects of measuring
the first system become easy to analyze. The probabilities of the two outcomes can
be obtained by summing the probabilities in parentheses.
Pr(X = 0) = 1/2 + 1/12 = 7/12
Pr(X = 1) = 1/12 + 1/6 + 1/6 = 5/12
These probabilities sum to one, as expected — but this is a useful check on our
calculations.
And now, the probabilistic state of Y conditioned on each possible outcome can
be inferred by normalizing the vectors in parentheses. That is, we divide these
vectors by the associated probabilities we just calculated, so that they become
probability vectors. Thus, conditioned on X being 0, the probabilistic state of Y
becomes
((1/2)|1⟩ + (1/12)|3⟩) / (7/12) = (6/7)|1⟩ + (1/7)|3⟩,
and conditioned on the measurement of X being 1, the probabilistic state of Y
becomes
((1/12)|1⟩ + (1/6)|2⟩ + (1/6)|3⟩) / (5/12) = (1/5)|1⟩ + (2/5)|2⟩ + (2/5)|3⟩.
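As a sanity check, the marginal and conditional distributions in this example can be computed with exact arithmetic. The following sketch is not part of the original text; it uses Python's fractions module:

```python
from fractions import Fraction as F

# Probabilistic state of (X, Y) from the example above: entries p[(a, b)].
p = {
    (0, 1): F(1, 2), (0, 3): F(1, 12),
    (1, 1): F(1, 12), (1, 2): F(1, 6), (1, 3): F(1, 6),
}

# Marginal probabilities: Pr(X = a) is the sum of p[(a, b)] over b.
marginal = {a: sum(q for (x, _), q in p.items() if x == a) for a in (0, 1)}

# Conditional state of Y given X = a: normalize the entries with first index a.
conditional = {
    a: {b: q / marginal[a] for (x, b), q in p.items() if x == a}
    for a in (0, 1)
}

print(marginal[0], marginal[1])  # 7/12 5/12
print(conditional[0])            # the conditional distribution computed above
```

Working with Fraction rather than floating-point numbers makes the agreement with the hand calculation exact rather than approximate.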
For example, the controlled-NOT operation on (X, Y), where X is the control bit and Y is the target bit, transforms standard basis states as follows:

|00⟩ ↦ |00⟩
|01⟩ ↦ |01⟩
|10⟩ ↦ |11⟩
|11⟩ ↦ |10⟩
If we were to exchange the roles of X and Y, taking Y to be the control bit and X to
be the target bit, then the matrix representation of the operation would become
1 0 0 0
0 0 0 1
0 0 1 0
0 1 0 0
Perform one of the following two operations, each with probability 1/2 :
1. Set Y to be equal to X.
2. Set X to be equal to Y.
A separate example is the operation on three bits that adds 1, modulo 8, to the number encoded by the three bits. One way to express this operation is in Dirac notation:

∑_{k=0}^{7} |(k + 1) mod 8⟩⟨k|,

assuming we've agreed that numbers from 0 to 7 inside of kets refer to the three-bit binary encodings of those numbers. A third option is to express this operation as a matrix.
0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
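This matrix can be constructed and checked programmatically. Here's a small sketch (not from the text), assuming numpy is available:

```python
import numpy as np

# Increment-mod-8 operation on three bits: column k of the matrix is the
# standard basis vector for (k + 1) mod 8, matching the 8x8 matrix above.
M = np.zeros((8, 8), dtype=int)
for k in range(8):
    M[(k + 1) % 8, k] = 1

# Acting on |5⟩ (the bits 101) yields |6⟩ (the bits 110).
e5 = np.zeros(8, dtype=int)
e5[5] = 1
assert (M @ e5)[6] == 1

# Every column is a standard basis vector, so M is deterministic and
# stochastic; it is in fact a permutation matrix, and hence also unitary.
assert (M.sum(axis=0) == 1).all()
assert np.array_equal(M.T @ M, np.eye(8, dtype=int))
```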
Independent operations
Now suppose that we have multiple systems, and we perform different operations on these systems independently.
For example, taking our usual set-up of two systems X and Y having classical
state sets Σ and Γ, respectively, let us suppose that we perform one operation on
X and, completely independently, another operation on Y. As we know from the
previous lesson, these operations are represented by stochastic matrices — and to
be precise, let us say that the operation on X is represented by the matrix M and
the operation on Y is represented by the matrix N. Thus, the rows and columns
of M have indices that are placed in correspondence with the elements of Σ and,
likewise, the rows and columns of N correspond to the elements of Γ.
A natural question to ask is this: if we view X and Y together as a single,
compound system (X, Y ), what is the matrix that represents the combined action of
the two operations on this compound system? To answer this question we must
first introduce tensor products of matrices, which are similar to tensor products of
vectors and are defined analogously.
The tensor product of the matrices

M = ∑a,b∈Σ αab |a⟩⟨b|

and

N = ∑c,d∈Γ βcd |c⟩⟨d|

is the matrix

M ⊗ N = ∑a,b∈Σ ∑c,d∈Γ αab βcd |ac⟩⟨bd|.
Equivalently, the tensor product of M and N is defined by the equation

(M ⊗ N)(|ϕ⟩ ⊗ |ψ⟩) = (M|ϕ⟩) ⊗ (N|ψ⟩)

for every possible choice of vectors |ϕ⟩ and |ψ⟩, assuming that the indices of |ϕ⟩ correspond to the elements of Σ and the indices of |ψ⟩ correspond to Γ.
Following the convention described previously for ordering the elements of
Cartesian products, we can also write the tensor product of two matrices explicitly
as follows.
If M is an m × m matrix with entries αij and N is a k × k matrix with entries βij, then M ⊗ N is the mk × mk block matrix

α11 N  · · ·  α1m N
  ·              ·
  ·              ·
αm1 N  · · ·  αmm N

in which each block αij N is the matrix N multiplied by the entry αij, so that the entry of M ⊗ N in row (a, c) and column (b, d) is αab βcd.
Tensor products of three or more matrices are defined in an analogous way. That is, if M0 , . . . , Mn−1 are matrices whose indices correspond to classical state sets Σ0 , . . . , Σn−1 , then the tensor product Mn−1 ⊗ · · · ⊗ M0 is defined by the condition that

(Mn−1 ⊗ · · · ⊗ M0)(|ψn−1⟩ ⊗ · · · ⊗ |ψ0⟩) = (Mn−1 |ψn−1⟩) ⊗ · · · ⊗ (M0 |ψ0⟩)

for every choice of vectors |ψ0⟩, . . . , |ψn−1⟩. Tensor products of matrices are also multiplicative, meaning that the equation

(Mn−1 ⊗ · · · ⊗ M0)(Nn−1 ⊗ · · · ⊗ N0) = (Mn−1 Nn−1) ⊗ · · · ⊗ (M0 N0)

is always true, for any choice of matrices M0 , . . . , Mn−1 and N0 , . . . , Nn−1 , provided that the products M0 N0 , . . . , Mn−1 Nn−1 make sense.
The tensor product of two or more stochastic matrices is always stochastic: every entry of such a tensor product is a product of nonnegative entries, and each column sums to 1 because the column sum factors into a product of column sums of the individual matrices.
A common situation that we encounter is one in which one operation is per-
formed on one system and nothing is done to another. In such a case, exactly the
same prescription is followed, bearing in mind that doing nothing is represented
by the identity matrix. For example, resetting the bit X to the 0 state and doing
nothing to Y yields the probabilistic (and in fact deterministic) operation on (X, Y )
represented by the matrix
1 1       1 0       1 0 1 0
0 0   ⊗   0 1   =   0 1 0 1
                    0 0 0 0
                    0 0 0 0
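The same operation can be built with numpy's kron function; the column sums confirm that the result is stochastic. This is a sketch, not part of the text:

```python
import numpy as np

reset = np.array([[1, 1],
                  [0, 0]])   # sets a bit to 0 regardless of its state
identity = np.eye(2, dtype=int)

# Reset X, do nothing to Y: the operation on (X, Y) is the tensor product.
op = np.kron(reset, identity)
print(op)

# Each column is a probability vector, so the tensor product is stochastic.
assert (op >= 0).all() and (op.sum(axis=0) == 1).all()
```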
Quantum states
Multiple systems can be viewed collectively as single, compound systems. We’ve
already observed this in the probabilistic setting, and the quantum setting is anal-
ogous. Quantum states of multiple systems are therefore represented by column
vectors having complex number entries and Euclidean norm equal to 1, just like
quantum states of single systems. In the multiple system case, the entries of these
vectors are placed in correspondence with the Cartesian product of the classical state
sets associated with each of the individual systems, because that’s the classical state
set of the compound system.
For instance, if X and Y are qubits, then the classical state set of the pair of qubits
(X, Y ), viewed collectively as a single system, is the Cartesian product {0, 1} × {0, 1}.
By representing pairs of binary values as binary strings of length two, we associate
this Cartesian product set with the set {00, 01, 10, 11}. The following vectors are
therefore all examples of quantum state vectors of the pair (X, Y ):
(1/√2)|00⟩ − (1/√6)|01⟩ + (i/√6)|10⟩ + (1/√6)|11⟩,    (3/5)|00⟩ − (4/5)|11⟩,    and    |01⟩.
There are variations on how quantum state vectors of multiple systems are
expressed, and we can choose whichever variation suits our preferences. Here are
some examples for the first quantum state vector above.
1. We may use the fact that | ab⟩ = | a⟩|b⟩ (for any classical states a and b) to
instead write
(1/√2)|0⟩|0⟩ − (1/√6)|0⟩|1⟩ + (i/√6)|1⟩|0⟩ + (1/√6)|1⟩|1⟩.
2. We may choose to write the tensor product symbol explicitly like this:
(1/√2)|0⟩ ⊗ |0⟩ − (1/√6)|0⟩ ⊗ |1⟩ + (i/√6)|1⟩ ⊗ |0⟩ + (1/√6)|1⟩ ⊗ |1⟩.
3. We may subscript the kets to indicate how they correspond to the systems
being considered, like this:
(1/√2)|0⟩X |0⟩Y − (1/√6)|0⟩X |1⟩Y + (i/√6)|1⟩X |0⟩Y + (1/√6)|1⟩X |1⟩Y .
Of course, we may also write quantum state vectors explicitly as column vectors:
1/√2
−1/√6
i/√6
1/√6
Depending upon the context in which it appears, one of these variations may be
preferred — but they are all equivalent in the sense that they describe the same
vector.
Similar to what we have for probability vectors, tensor products of quantum state
vectors are also quantum state vectors — and again they represent independence
among systems.
In greater detail, and beginning with the case of two systems, suppose that
|ϕ⟩ is a quantum state vector of a system X and |ψ⟩ is a quantum state vector of
a system Y. The tensor product |ϕ⟩ ⊗ |ψ⟩, which may alternatively be written as
|ϕ⟩|ψ⟩ or as |ϕ ⊗ ψ⟩, is then a quantum state vector of the joint system (X, Y ). Again
we refer to a state of this form as being a product state.
Intuitively speaking, when a pair of systems (X, Y ) is in a product state |ϕ⟩ ⊗ |ψ⟩,
we may interpret this as meaning that X is in the quantum state |ϕ⟩, Y is in the
quantum state |ψ⟩, and the states of the two systems have nothing to do with one
another.
The fact that the tensor product vector |ϕ⟩ ⊗ |ψ⟩ is indeed a quantum state vector
is consistent with the Euclidean norm being multiplicative with respect to tensor
products:

∥|ϕ⟩ ⊗ |ψ⟩∥ = √( ∑(a,b)∈Σ×Γ |⟨ab|ϕ ⊗ ψ⟩|² )
 = √( ∑a∈Σ ∑b∈Γ |⟨a|ϕ⟩⟨b|ψ⟩|² )
 = √( ( ∑a∈Σ |⟨a|ϕ⟩|² )( ∑b∈Γ |⟨b|ψ⟩|² ) )
 = ∥|ϕ⟩∥ ∥|ψ⟩∥.
Because |ϕ⟩ and |ψ⟩ are quantum state vectors, we have ∥|ϕ⟩∥ = 1 and ∥|ψ⟩∥ = 1,
and therefore ∥|ϕ⟩ ⊗ |ψ⟩∥ = 1, so |ϕ⟩ ⊗ |ψ⟩ is also a quantum state vector.
This generalizes to more than two systems. If |ψ0⟩, . . . , |ψn−1⟩ are quantum state vectors of systems X0 , . . . , Xn−1 , then |ψn−1⟩ ⊗ · · · ⊗ |ψ0⟩ is a quantum state vector representing a product state of the joint system (Xn−1 , . . . , X0 ). Again, we know that this is a quantum state vector because

∥|ψn−1⟩ ⊗ · · · ⊗ |ψ0⟩∥ = ∥|ψn−1⟩∥ · · · ∥|ψ0⟩∥ = 1.
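A quick numerical illustration of the multiplicativity of the norm; the two single-qubit states below are arbitrary choices with unit norm, not states from the text:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Two single-system quantum state vectors (each has Euclidean norm 1).
phi = (ket0 + ket1) / np.sqrt(2)
psi = (3 * ket0 + 4j * ket1) / 5

# Their tensor product is again a quantum state vector: norms multiply.
product = np.kron(phi, psi)
assert np.isclose(np.linalg.norm(product), 1.0)
```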
Entangled states
Not all quantum state vectors of multiple systems are product states. For example,
the quantum state vector
(1/√2)|00⟩ + (1/√2)|11⟩    (2.6)
of two qubits is not a product state. To see why, we may follow exactly the same
argument that we used in the previous section for a probabilistic state. That is, if
(2.6) were a product state, there would exist quantum state vectors |ϕ⟩ and |ψ⟩ for
which
|ϕ⟩ ⊗ |ψ⟩ = (1/√2)|00⟩ + (1/√2)|11⟩.
But then it would necessarily be the case that
⟨0|ϕ⟩⟨1|ψ⟩ = ⟨01|ϕ ⊗ ψ⟩ = 0
implying that ⟨0|ϕ⟩ = 0 or ⟨1|ψ⟩ = 0 (or both). That contradicts the fact that
⟨0|ϕ⟩⟨0|ψ⟩ = ⟨00|ϕ ⊗ ψ⟩ = 1/√2
and
⟨1|ϕ⟩⟨1|ψ⟩ = ⟨11|ϕ ⊗ ψ⟩ = 1/√2
are both nonzero. Thus, the quantum state vector (2.6) represents a correlation
between two systems, and specifically we say that the systems are entangled.
Notice that the specific value 1/√2 is not important to this argument — all that is important is that this value is nonzero. Thus, for instance, the quantum state
(3/5)|00⟩ + (4/5)|11⟩
is also not a product state, by the same argument.
Entanglement is a quintessential feature of quantum information that will be
discussed in greater detail in a later lesson. Entanglement can be complicated,
particularly for the sorts of noisy quantum states that can be described by density
matrices, which are discussed later in the course in Lesson 9 (Density Matrices). For
quantum state vectors, however, entanglement is equivalent to correlation: any
quantum state vector that is not a product state represents an entangled state.
In contrast, the quantum state vector

(1/2)|00⟩ + (i/2)|01⟩ − (1/2)|10⟩ − (i/2)|11⟩
 = ((1/√2)|0⟩ − (1/√2)|1⟩) ⊗ ((1/√2)|0⟩ + (i/√2)|1⟩)

is a product state. Hence, this state is not entangled.
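There is a standard linear-algebra test for whether a two-qubit state vector is a product state, based on reshaping it into a 2 × 2 matrix of amplitudes and checking whether that matrix has rank 1. This test is not described in the text above; it's included here as a sketch:

```python
import numpy as np

def is_product_state(psi, tol=1e-10):
    """Decide whether a two-qubit state vector is a product state.

    Reshaped into a 2x2 matrix of amplitudes, the state is a product state
    exactly when that matrix has rank 1 (its second singular value is zero).
    """
    matrix = np.asarray(psi, dtype=complex).reshape(2, 2)
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return bool(singular_values[1] < tol)

entangled = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00⟩ + |11⟩)/√2
product = np.array([1, 1j, -1, -1j]) / 2          # the product state above

print(is_product_state(entangled))  # False
print(is_product_state(product))    # True
```

The reshape-and-rank idea is the same one that underlies the Schmidt decomposition of bipartite states.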
Bell states
The following four two-qubit states are known as the Bell states:

|ϕ+⟩ = (1/√2)|00⟩ + (1/√2)|11⟩
|ϕ−⟩ = (1/√2)|00⟩ − (1/√2)|11⟩
|ψ+⟩ = (1/√2)|01⟩ + (1/√2)|10⟩
|ψ−⟩ = (1/√2)|01⟩ − (1/√2)|10⟩

The collection of all four Bell states

{|ϕ+⟩, |ϕ−⟩, |ψ+⟩, |ψ−⟩}

is known as the Bell basis. True to its name, this is a basis; any quantum state vector of two qubits, or indeed any complex vector at all having entries corresponding to the four classical states of two bits, can be expressed as a linear combination of the four Bell states. For example,
|00⟩ = (1/√2)|ϕ+⟩ + (1/√2)|ϕ−⟩.
Next we will consider two interesting examples of states of three qubits. The first
example is the GHZ state (so named in honor of Daniel Greenberger, Michael Horne,
and Anton Zeilinger, who first studied some of its properties):
(1/√2)|000⟩ + (1/√2)|111⟩.
The second example is the so-called W state:
(1/√3)|001⟩ + (1/√3)|010⟩ + (1/√3)|100⟩.
Neither of these states is a product state, meaning that they cannot be written as a
tensor product of three qubit quantum state vectors. We’ll examine both of these
states later when we discuss partial measurements of quantum states of multiple
systems.
Additional examples
The examples of quantum states of multiple systems we’ve seen so far are states of
two or three qubits, but we can also consider quantum states of multiple systems
having different classical state sets.
For example, here’s a quantum state of three systems, X, Y, and Z, where the
classical state set of X is the binary alphabet (so X is a qubit) and the classical state
set of Y and Z is {♣, ♢, ♡, ♠} :
(1/2)|0⟩|♡⟩|♡⟩ + (1/2)|1⟩|♠⟩|♡⟩ − (1/√2)|0⟩|♡⟩|♢⟩.
We can also consider quantum states of three systems X, Y, and Z that all share the same classical state set {0, 1, 2}. Systems having the classical state set {0, 1, 2} are often called trits or (assuming that they can be in a quantum state) qutrits. The term qudit refers to a system having classical state set {0, . . . , d − 1} for an arbitrary choice of d.
Measurements of quantum states

Standard basis measurements of quantum states of multiple systems work in the same way as for single systems: each outcome appears with probability equal to the absolute value squared of the corresponding entry of the quantum state vector. For example, if the quantum state of a pair of systems (X, Y) is

(3/5)|0⟩|♡⟩ − (4i/5)|1⟩|♠⟩,

then measuring both systems with standard basis measurements yields the outcome (0, ♡) with probability 9/25 and the outcome (1, ♠) with probability 16/25.
Partial measurements
Now let us consider the situation in which we have multiple systems in some
quantum state, and we measure a proper subset of the systems. As before, we will
begin with two systems X and Y having classical state sets Σ and Γ, respectively.
In general, a quantum state vector of (X, Y ) takes the form

|ψ⟩ = ∑(a,b)∈Σ×Γ αab |ab⟩,

where the complex number entries αab satisfy

∑(a,b)∈Σ×Γ |αab|² = 1.
We already know, from the discussion above, that if both X and Y are measured,
then each possible outcome ( a, b) ∈ Σ × Γ appears with probability
|⟨ab|ψ⟩|² = |αab|².
If we suppose instead that just the first system X is measured, the probability for
each outcome a ∈ Σ to appear must therefore be equal to
∑b∈Γ |⟨ab|ψ⟩|² = ∑b∈Γ |αab|².
This is consistent with what we already saw in the probabilistic setting, as well as
our current understanding of physics: the probability for each outcome to appear
when X is measured can’t possibly depend on whether or not Y was also measured,
as that would allow for faster-than-light communication.
Having obtained a particular outcome a ∈ Σ of a standard basis measurement of
X, we naturally expect that the quantum state of X changes so that it is equal to | a⟩,
just like we had for single systems. But what happens to the quantum state of Y?
To answer this question, we can first express the vector |ψ⟩ as
|ψ⟩ = ∑a∈Σ |a⟩ ⊗ |ϕa⟩,
where
|ϕa⟩ = ∑b∈Γ αab |b⟩
for each a ∈ Σ. Here we’re following the same methodology as in the probabilistic
case, of isolating the standard basis states of the system being measured. The
probability for the standard basis measurement of X to give each outcome a is then
as follows:

∑b∈Γ |αab|² = ∥|ϕa⟩∥².
And, as a result of the standard basis measurement of X giving the outcome a, the
quantum state of the pair (X, Y ) together becomes
|a⟩ ⊗ (|ϕa⟩ / ∥|ϕa⟩∥).
That is, the state “collapses” like in the single-system case, but only as far as is
required for the state to be consistent with the measurement of X having produced
the outcome a.
The same technique, used in a symmetric way, describes what happens if the
second system Y is measured rather than the first. This time we rewrite the vector
|ψ⟩ as
|ψ⟩ = ((1/√2)|0⟩ + (i/√6)|1⟩) ⊗ |0⟩ + (−(1/√6)|0⟩ + (1/√6)|1⟩) ⊗ |1⟩.
For quantum state vectors, there isn't an analogous way to define a reduced or marginal state of one system alone, as there was for probabilistic states. In particular, for a quantum state vector

|ψ⟩ = ∑(a,b)∈Σ×Γ αab |ab⟩,
the vector
∑(a,b)∈Σ×Γ αab |a⟩
is not a quantum state vector in general, and does not properly represent the concept
of a reduced or marginal state.
Density matrices do, in fact, provide us with a meaningful way to define reduced
quantum states in an analogous way to the probabilistic setting.
Partial measurements for three or more systems, where some proper subset of the
systems are measured, can be reduced to the case of two systems by dividing the
systems into two collections, those that are measured and those that are not.
Here is a specific example that illustrates how this can be done. It demonstrates
specifically how subscripting kets by the names of the systems they represent can
be useful — in this case because it gives us a simple way to describe permutations
of the systems.
For this example, consider a quantum state of a 5-tuple of systems (X4 , . . . , X0 ),
where all five of these systems share the same classical state set {♣, ♢, ♡, ♠} :
√(1/7) |♡⟩|♣⟩|♢⟩|♠⟩|♠⟩ + √(2/7) |♢⟩|♣⟩|♢⟩|♠⟩|♣⟩ + √(1/7) |♠⟩|♠⟩|♣⟩|♢⟩|♣⟩
 − i √(2/7) |♡⟩|♣⟩|♢⟩|♡⟩|♡⟩ − √(1/7) |♠⟩|♡⟩|♣⟩|♠⟩|♣⟩.
We’ll examine the situation in which the first and third systems are measured, and
the remaining systems are left alone.
Conceptually speaking, there’s no fundamental difference between this situation
and one in which one of two systems is measured. Unfortunately, because the
measured systems are interspersed with the unmeasured systems, we face a hurdle
in writing down the expressions needed to perform these calculations.
One way to proceed, as suggested above, is to subscript the kets to indicate which systems they refer to. This gives us a way to keep track of the systems as we permute the ordering of the kets, which makes the mathematics simpler. Collecting together the terms consistent with each possible outcome for the measured systems X4 and X2 , the state above can be rewritten as

|♡⟩4 |♢⟩2 ⊗ (√(1/7) |♣⟩3 |♠⟩1 |♠⟩0 − i √(2/7) |♣⟩3 |♡⟩1 |♡⟩0)
 + |♢⟩4 |♢⟩2 ⊗ √(2/7) |♣⟩3 |♠⟩1 |♣⟩0
 + |♠⟩4 |♣⟩2 ⊗ (√(1/7) |♠⟩3 |♢⟩1 |♣⟩0 − √(1/7) |♡⟩3 |♠⟩1 |♣⟩0).

We now see that, if the systems X4 and X2 are measured, the (nonzero) probabilities of the different outcomes are as follows:
• The measurement outcome (♡, ♢) occurs with probability
∥√(1/7) |♣⟩3 |♠⟩1 |♠⟩0 − i √(2/7) |♣⟩3 |♡⟩1 |♡⟩0∥² = 1/7 + 2/7 = 3/7
• The measurement outcome (♢, ♢) occurs with probability
∥√(2/7) |♣⟩3 |♠⟩1 |♣⟩0∥² = 2/7
• The measurement outcome (♠, ♣) occurs with probability
∥√(1/7) |♠⟩3 |♢⟩1 |♣⟩0 − √(1/7) |♡⟩3 |♠⟩1 |♣⟩0∥² = 1/7 + 1/7 = 2/7.
If the measurement outcome is (♡, ♢), for instance, the resulting state of our five
systems becomes
|♡⟩4 |♢⟩2 ⊗ ( (√(1/7) |♣⟩3 |♠⟩1 |♠⟩0 − i √(2/7) |♣⟩3 |♡⟩1 |♡⟩0) / √(3/7) )
 = √(1/3) |♡⟩4 |♣⟩3 |♢⟩2 |♠⟩1 |♠⟩0 − i √(2/3) |♡⟩4 |♣⟩3 |♢⟩2 |♡⟩1 |♡⟩0 .
Here, for the final answer, we’ve reverted back to our original ordering of the
systems, just to illustrate that we can do this. For the other possible measurement
outcomes, the state can be determined in a similar way.
Finally, here are two examples promised earlier, beginning with the GHZ state
(1/√2)|000⟩ + (1/√2)|111⟩.
If just the first system is measured, we obtain the outcome 0 with probability 1/2,
in which case the state of the three qubits becomes |000⟩; and we also obtain the
outcome 1 with probability 1/2, in which case the state of the three qubits becomes
|111⟩.
For a W state, on the other hand, assuming again that just the first system is
measured, we begin by writing this state like this:
(1/√3)|001⟩ + (1/√3)|010⟩ + (1/√3)|100⟩
 = |0⟩ ⊗ ((1/√3)|01⟩ + (1/√3)|10⟩) + |1⟩ ⊗ ((1/√3)|00⟩).
The probability that a measurement of the first qubit results in the outcome 0 is
therefore equal to
∥(1/√3)|01⟩ + (1/√3)|10⟩∥² = 2/3,
and conditioned upon the measurement producing this outcome, the quantum state
of the three qubits becomes
|0⟩ ⊗ ( ((1/√3)|01⟩ + (1/√3)|10⟩) / √(2/3) ) = |0⟩ ((1/√2)|01⟩ + (1/√2)|10⟩) = |0⟩|ψ+⟩.
The probability that the measurement outcome is 1 is 1/3, in which case the state of
the three qubits becomes |100⟩.
The W state is symmetric, in the sense that it does not change if we permute the
qubits. We therefore obtain a similar description for measuring the second or third
qubit rather than the first.
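These probabilities and post-measurement states can be checked numerically. The sketch below (not from the text) performs a standard basis measurement of the first system of a state vector of a pair (X, Y), exactly following the decomposition |ψ⟩ = ∑a |a⟩ ⊗ |ϕa⟩:

```python
import numpy as np

def measure_first(psi, dim_x, dim_y):
    """Standard basis measurement of X for a state vector of a pair (X, Y).

    Writes the state as a sum of |a⟩ ⊗ |ϕ_a⟩ terms and returns, for each
    outcome a with nonzero probability, the pair
    (∥|ϕ_a⟩∥², |ϕ_a⟩ / ∥|ϕ_a⟩∥).
    """
    amps = np.asarray(psi, dtype=complex).reshape(dim_x, dim_y)
    outcomes = {}
    for a in range(dim_x):
        prob = float(np.vdot(amps[a], amps[a]).real)  # ∥|ϕ_a⟩∥²
        if prob > 1e-12:
            outcomes[a] = (prob, amps[a] / np.sqrt(prob))
    return outcomes

# W state of three qubits; the first qubit is X, the remaining pair is Y.
w = np.zeros(8)
w[[1, 2, 4]] = 1 / np.sqrt(3)
result = measure_first(w, 2, 4)

print(result[0][0])  # probability 2/3; Y becomes (|01⟩ + |10⟩)/√2
print(result[1][0])  # probability 1/3; Y becomes |00⟩
```

Measuring a different qubit of the W state amounts to permuting the amplitudes before calling the same function, mirroring the ket-subscripting trick used above.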
Unitary operations
In principle, any unitary matrix whose rows and columns correspond to the classical
states of a system represents a valid quantum operation on that system. This, of
course, remains true for compound systems, whose classical state sets happen to be
Cartesian products of the classical state sets of the individual systems.
Focusing in on two systems, if X is a system having classical state set Σ, and Y is
a system having classical state set Γ, then the classical state set of the joint system
(X, Y ) is Σ × Γ. Therefore, quantum operations on this joint system are represented
by unitary matrices whose rows and columns are placed in correspondence with
the set Σ × Γ. The ordering of the rows and columns of these matrices is the same
as the ordering used for quantum state vectors of the system (X, Y ).
For example, let us suppose that Σ = {1, 2, 3} and Γ = {0, 1}, and recall that the
standard convention for ordering the elements of the Cartesian product {1, 2, 3} ×
{0, 1} is this:
(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1).
The following matrix, with rows and columns ordered in this way, is unitary, and therefore represents a quantum operation on the pair (X, Y):

U =
1/2     1/2     1/2      0       0      1/2
1/2     i/2    −1/2      0       0     −i/2
1/2    −1/2     1/2      0       0     −1/2
 0       0       0      1/√2    1/√2     0
1/2    −i/2    −1/2      0       0      i/2
 0       0       0     −1/√2    1/√2     0
This unitary matrix isn't special; it's just an example. To check that U is unitary, it
suffices to compute and check that U † U = I, for instance. Alternatively, we can
check that the rows (or the columns) are orthonormal, which is made simpler in
this case given the particular form of the matrix U.
The action of U on the standard basis vector |1, 1⟩, for instance, is
U |1, 1⟩ = (1/2)|1, 0⟩ + (i/2)|1, 1⟩ − (1/2)|2, 0⟩ − (i/2)|3, 0⟩,
which we can see by examining the second column of U, considering our ordering
of the set {1, 2, 3} × {0, 1}.
As with any matrix, it is possible to express U using Dirac notation, which
would require 20 terms for the 20 nonzero entries of U. If we did write down all of
these terms, however, rather than writing a 6 × 6 matrix, it would be messy and the
patterns that are evident from the matrix expression would not likely be as clear.
Simply put, Dirac notation is not always the best choice.
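Unitarity is also easy to verify numerically. The sketch below (not from the text) uses a 6 × 6 matrix transcribed from the example above, consistent with the stated action on |1, 1⟩:

```python
import numpy as np

s = 1 / np.sqrt(2)
# Rows and columns are ordered (1,0), (1,1), (2,0), (2,1), (3,0), (3,1).
U = np.array([
    [1/2,    1/2,   1/2,  0,  0,   1/2],
    [1/2,   1j/2,  -1/2,  0,  0, -1j/2],
    [1/2,   -1/2,   1/2,  0,  0,  -1/2],
    [0,        0,     0,  s,  s,     0],
    [1/2,  -1j/2,  -1/2,  0,  0,  1j/2],
    [0,        0,     0, -s,  s,     0],
])

# U is unitary: its conjugate transpose is its inverse.
assert np.allclose(U.conj().T @ U, np.eye(6))

# The action on |1,1⟩ is the second column of U.
print(U[:, 1])  # entries 1/2, i/2, -1/2, 0, -i/2, 0
```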
Unitary operations on three or more systems work in a similar way, with the
unitary matrices having rows and columns corresponding to the Cartesian product
of the classical state sets of the systems. We’ve already seen one example in this
lesson: the three-qubit operation
∑_{k=0}^{7} |(k + 1) mod 8⟩⟨k|,
where numbers in bras and kets mean their 3-bit binary encodings. In addition to
being a deterministic operation, this is also a unitary operation. Operations that are
both deterministic and unitary are sometimes called reversible operations. The conjugate transpose of this operation,

∑_{k=0}^{7} |k⟩⟨(k + 1) mod 8|,

represents the reverse, or in mathematical terms the inverse, of the original
operation — which is what we expect from the conjugate transpose of a unitary
matrix. We’ll see other examples of unitary operations on multiple systems as the
lesson continues.
The tensor product of conjugate transposes satisfies

(Mn−1 ⊗ · · · ⊗ M0)† = M†n−1 ⊗ · · · ⊗ M†0

for any chosen matrices M0 , . . . , Mn−1 . This can be checked by going back to the definition of the tensor product and of the conjugate transpose, and checking that each entry of the two sides of the equation are in agreement. Now suppose that U0 , . . . , Un−1 are unitary matrices, so that U†0 U0 = I0 , . . . , U†n−1 Un−1 = In−1 . Here we have written I0 , . . . , In−1 to refer to the matrices representing the identity operation on the systems X0 , . . . , Xn−1 , which is to say that these are identity matrices whose sizes agree with the number of classical states of X0 , . . . , Xn−1 .
Finally, the tensor product In−1 ⊗ · · · ⊗ I0 is equal to the identity matrix for
which we have a number of rows and columns that agrees with the product of
the number of rows and columns of the matrices In−1 , . . . , I0 . This larger identity
matrix represents the identity operation on the joint system (Xn−1 , . . . , X0 ).
In summary, we have the following sequence of equalities.
(Un−1 ⊗ · · · ⊗ U0)† (Un−1 ⊗ · · · ⊗ U0)
 = (U†n−1 ⊗ · · · ⊗ U†0)(Un−1 ⊗ · · · ⊗ U0)
 = (U†n−1 Un−1) ⊗ · · · ⊗ (U†0 U0)
 = In−1 ⊗ · · · ⊗ I0 = I
We therefore conclude that Un−1 ⊗ · · · ⊗ U0 is unitary.
An important situation that often arises is one in which a unitary operation is
applied to just one system — or a proper subset of systems — within a larger joint
system. For instance, suppose that X and Y are systems that we can view together
as forming a single, compound system (X, Y ), and we perform an operation just on
the system X. To be precise, let us suppose that U is a unitary matrix representing an
operation on X, so that its rows and columns have been placed in correspondence
with the classical states of X.
To say that we perform the operation represented by U just on the system X
implies that we do nothing to Y, meaning that we independently perform U on
X and the identity operation on Y. That is, “doing nothing” to Y is equivalent to
performing the identity operation on Y, which is represented by the identity matrix
IY . (Here, by the way, the subscript Y tells us that IY refers to the identity matrix
having a number of rows and columns in agreement with the classical state set
of Y.) The operation on (X, Y ) that is obtained when we perform U on X and do
nothing to Y is therefore represented by the unitary matrix U ⊗ IY .
For example, if X and Y are qubits, performing a Hadamard operation on X and doing nothing to Y is equivalent to performing the operation

H ⊗ IY =
1/√2     0      1/√2     0
 0      1/√2     0      1/√2
1/√2     0     −1/√2     0
 0      1/√2     0     −1/√2

on the pair (X, Y). Similarly, performing a Hadamard operation on Y and doing nothing to X is equivalent to performing the operation

IX ⊗ H =
1/√2    1/√2     0       0
1/√2   −1/√2     0       0
 0       0      1/√2    1/√2
 0       0      1/√2   −1/√2

on the pair (X, Y).
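Both matrices can be produced with numpy's kron function; the sketch below (not from the text) also shows their differing actions on |00⟩:

```python
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

HX = np.kron(H, I2)   # Hadamard on X, nothing on Y
HY = np.kron(I2, H)   # nothing on X, Hadamard on Y

# Applied to |00⟩, each operation puts a different qubit into superposition.
ket00 = np.array([1, 0, 0, 0])
print(HX @ ket00)  # (|00⟩ + |10⟩)/√2
print(HY @ ket00)  # (|00⟩ + |01⟩)/√2
```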
To conclude the lesson, let’s take a look at two classes of examples of unitary
operations on multiple systems, beginning with the swap operation.
Suppose that X and Y are systems that share the same classical state set Σ. The
swap operation on the pair (X, Y ) is the operation that exchanges the contents of the
two systems, but otherwise leaves the systems alone — so that X remains on the left
and Y remains on the right. We'll denote this operation as SWAP, and it operates like this for every choice of classical states a, b ∈ Σ:

SWAP |a⟩|b⟩ = |b⟩|a⟩.
One way to write the matrix associated with this operation using the Dirac notation
is as follows:
SWAP = ∑c,d∈Σ |c⟩⟨d| ⊗ |d⟩⟨c|.
It may not be immediately clear that this matrix represents SWAP, but we can check it satisfies the condition SWAP |a⟩|b⟩ = |b⟩|a⟩ for every choice of classical states a, b ∈ Σ.
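The Dirac-notation expression for SWAP translates directly into code. The sketch below (not from the text) builds the matrix for systems sharing a classical state set of size 3, which is an arbitrary choice, and verifies the defining condition:

```python
import numpy as np

dim = 3  # size of the shared classical state set (any choice works)
basis = np.eye(dim)

# SWAP = sum over c, d of |c⟩⟨d| ⊗ |d⟩⟨c|
SWAP = sum(
    np.kron(np.outer(basis[c], basis[d]), np.outer(basis[d], basis[c]))
    for c in range(dim)
    for d in range(dim)
)

# Verify SWAP |a⟩|b⟩ = |b⟩|a⟩ for every pair of classical states.
for a in range(dim):
    for b in range(dim):
        assert np.array_equal(SWAP @ np.kron(basis[a], basis[b]),
                              np.kron(basis[b], basis[a]))
```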
Controlled-unitary operations
Now let us suppose that Q is a qubit and R is an arbitrary system, having whatever
classical state set we wish. For every unitary operation U acting on the system R, a
controlled-U operation is a unitary operation on the pair (Q, R) defined as follows.
|0⟩⟨0| ⊗ IR + |1⟩⟨1| ⊗ U
For example, if R is also a qubit, and we consider the Pauli X operation on R,
then a controlled-X operation is given by
|0⟩⟨0| ⊗ IR + |1⟩⟨1| ⊗ X =
1 0 0 0
0 1 0 0
0 0 0 1
0 0 1 0
We already encountered this operation in the context of classical information and
probabilistic operations earlier in the lesson. Replacing the Pauli X operation on R
with a Z operation gives this operation:
|0⟩⟨0| ⊗ IR + |1⟩⟨1| ⊗ Z =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 −1
If instead we take R to be two qubits, and we take U to be the swap operation
between these two qubits, we obtain this operation:
CSWAP =
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1
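The definition |0⟩⟨0| ⊗ IR + |1⟩⟨1| ⊗ U translates directly into code, and reproduces both the controlled-NOT and CSWAP matrices above. This is a sketch, not part of the text:

```python
import numpy as np

def controlled(u):
    """The controlled-U operation |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ U on a pair (Q, R)."""
    dim = u.shape[0]
    ket0bra0 = np.array([[1, 0], [0, 0]])
    ket1bra1 = np.array([[0, 0], [0, 1]])
    return np.kron(ket0bra0, np.eye(dim)) + np.kron(ket1bra1, u)

x_gate = np.array([[0, 1], [1, 0]])    # Pauli X
swap = np.eye(4)[[0, 2, 1, 3]]         # SWAP on two qubits

print(controlled(x_gate).astype(int))  # the controlled-NOT matrix above
print(controlled(swap).astype(int))    # the CSWAP matrix above
```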
Quantum Circuits
This lesson introduces the quantum circuit model of computation, which provides a
standard way to describe quantum computations.
The lesson also introduces a few important mathematical concepts, including
inner products between vectors, the notions of orthogonality and orthonormality, and
projections and projective measurements, which generalize standard basis measure-
ments. Through these concepts, we’ll derive fundamental limitations on quantum
information, including the no-cloning theorem and the impossibility to perfectly
discriminate non-orthogonal quantum states.
3.1 Circuits
In computer science, circuits are models of computation in which information
is carried by wires through a network of gates, which represent operations on
the information carried by the wires. Quantum circuits are a specific model of
computation based on this more general concept.
Although the word “circuit” often refers to a circular path, circular paths aren’t
actually allowed in the circuit models of computation that are most commonly
studied. That is to say, we usually consider acyclic circuits when we’re thinking
about circuits as computational models. Quantum circuits follow this pattern;
a quantum circuit represents a finite sequence of operations that cannot contain
feedback loops.
63
Boolean circuits
Figure 3.1 shows an example of a (classical) Boolean circuit, where the wires carry
binary values and the gates represent Boolean logic operations.

Figure 3.1: A Boolean circuit for computing the exclusive-OR of two bits.

The flow of information along the wires goes from left to right: the wires on the left-hand side of the
figure labeled X and Y are input bits, which can be set to whatever binary values we
choose, and the wire on the right-hand side is the output. The intermediate wires
take values determined by the gates, which are evaluated from left to right.
The gates are AND gates (labeled ∧), OR gates (labeled ∨), and NOT gates
(labeled ¬). The functions computed by these gates will likely be familiar to many
readers, but here they are represented by tables of values:
a   ¬a
0   1
1   0

ab   a ∧ b
00   0
01   0
10   0
11   1

ab   a ∨ b
00   0
01   1
10   1
11   1
The two small, solid circles on the wires just to the right of the names X and
Y represent fan-out operations, which simply create a copy of whatever value is
carried on the wire on which they appear, allowing this value to be input into
multiple gates. Fan-out operations are not always considered to be gates in the
classical setting; sometimes they’re treated as if they’re “free” in some sense. When
Boolean circuits are converted into equivalent quantum circuits, however, we do
need to classify fan-out operations explicitly as gates to handle and account for
them correctly.
The same circuit is illustrated in Figure 3.2 using a style more common in
electrical engineering, which uses conventional symbols for the AND, OR, and
NOT gates. We won’t use this style or these particular gate symbols further, but
we will use different symbols to represent gates in quantum circuits, which we’ll
explain as we encounter them.
Figure 3.2: The same Boolean circuit as in Figure 3.1, expressed using standard electrical engineering symbols.
The particular circuit in this example computes the exclusive-OR (or XOR for
short), which is denoted by the symbol ⊕:
a b | a ⊕ b
0 0 |   0
0 1 |   1
1 0 |   1
1 1 |   0
Figure 3.3 illustrates the evaluation of our circuit on just one choice for the
inputs: X = 0 and Y = 1. Each wire is labeled by the value it carries so you can follow
the operations. The output value is 1 in this case, which is the correct value for the
XOR: 0 ⊕ 1 = 1. The other three possible input settings can be checked in a similar
way.
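The check just described can also be carried out by a short simulation. The sketch below assumes the standard exclusive-OR construction (¬X ∧ Y) ∨ (X ∧ ¬Y) suggested by the figure; the helper names are illustrative, not from the text.

```python
# Simulation of the Boolean circuit in Figure 3.1 (assumed wiring).
def NOT(a):
    return 1 - a

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def circuit(x, y):
    # Fan-out: each input bit is used by two gates.
    top = AND(NOT(x), y)       # AND gate fed by (not X) and Y
    bottom = AND(x, NOT(y))    # AND gate fed by X and (not Y)
    return OR(top, bottom)     # final OR gate produces the output

# Check every input setting against the XOR table.
for x in (0, 1):
    for y in (0, 1):
        assert circuit(x, y) == x ^ y
```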
Figure 3.3: The same Boolean circuit as in Figure 3.1 evaluated on the inputs X = 0
and Y = 1, with each wire labeled by the value it carries.
Circuit models are not limited to Boolean logic. In arithmetic circuits, for instance, the wires may carry integer values while
the gates represent arithmetic operations, such as addition and multiplication.
Figure 3.4 depicts an arithmetic circuit that takes two variable input values (x and y)
as well as a third input set to the value 1. The values carried by the wires, as
functions of the values x and y, are shown in the figure.
Figure 3.4: An arithmetic circuit taking the inputs x and y together with a constant
input set to 1. The wires carry the values x, x², y, y + 1, and x² + y, and the output
of the circuit is x²y + x² + y² + y.
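Assuming the reading of Figure 3.4 given above, the circuit can be evaluated with ordinary integer arithmetic; the function name and wire names below are illustrative.

```python
# Evaluation of the arithmetic circuit of Figure 3.4 (assumed wiring).
def arithmetic_circuit(x, y):
    x_squared = x * x      # multiplication gate: x * x
    y_plus_1 = y + 1       # addition gate fed by y and the constant 1
    mid = x_squared + y    # addition gate: x^2 + y
    return mid * y_plus_1  # final multiplication gate

# The output agrees with the expanded polynomial x^2*y + x^2 + y^2 + y.
for x in range(-3, 4):
    for y in range(-3, 4):
        assert arithmetic_circuit(x, y) == x * x * y + x * x + y * y + y
```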
We can also consider circuits that incorporate randomness, such as ones where
gates represent probabilistic operations.
Quantum circuits
In the quantum circuit model, wires represent qubits and gates represent operations
on these qubits. We’ll focus for now on operations we’ve encountered so far, namely
unitary operations and standard basis measurements. As we learn about other sorts of
quantum operations and measurements, we can enhance our model accordingly.
A simple example of a quantum circuit is shown in Figure 3.5. In this circuit,
we have a single qubit named X, which is represented by the horizontal line, and
a sequence of gates representing unitary operations on this qubit. Just like in the
examples above, the flow of information goes from left to right — so the first
operation performed is a Hadamard operation, the second is an S operation, the
third is another Hadamard operation, and the final operation is a T operation.
Applying the entire circuit therefore applies the composition of these operations,
THSH, to the qubit X.
Figure 3.5: A quantum circuit in which a Hadamard gate, an S gate, a second
Hadamard gate, and a T gate are applied (in that order) to a single qubit X.

Figure 3.6: The circuit from Figure 3.5 evaluated on the input |0⟩, producing the
output state ((1+i)/2) |0⟩ + (1/√2) |1⟩.
Quantum circuits often start out with all qubits initialized to |0⟩, as we have in this case, but
there are also situations where the input qubits are initially set to different states.
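The output state shown in Figure 3.6 can be confirmed by multiplying the gate matrices directly. A minimal NumPy check (not part of the original text) follows; the gate definitions are the standard ones.

```python
import numpy as np

# Standard matrix representations of the gates.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]])
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])

ket0 = np.array([1, 0], dtype=complex)

# The circuit applies H, then S, then H, then T, so the overall
# operation is the product T H S H (rightmost factor acts first).
state = T @ H @ S @ H @ ket0

expected = np.array([(1 + 1j) / 2, 1 / np.sqrt(2)])
assert np.allclose(state, expected)
```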
Another example of a quantum circuit, this time with two qubits, is shown in
Figure 3.7. As always, the gate labeled H refers to a Hadamard operation, while
the second gate is a controlled-NOT operation: the solid circle represents the control
qubit and the circle resembling the symbol ⊕ denotes the target qubit.
Figure 3.7: A quantum circuit on two qubits. A Hadamard gate is applied to Y,
followed by a controlled-NOT gate with Y the control and X the target.
Before examining this circuit in greater detail and explaining what it does, we
should first clarify how qubits are ordered in quantum circuits. This
connects with the convention that Qiskit uses for naming and ordering systems that
was mentioned briefly in the previous lesson.
In Qiskit, the topmost qubit in a circuit diagram has index 0 and corresponds
to the rightmost position in a tuple of qubits (or in a string, Cartesian product,
or tensor product corresponding to this tuple), the second-from-top qubit has
index 1 and corresponds to the position second-from-right in a tuple, and so on.
The bottommost qubit, which has the highest index, therefore corresponds to
the leftmost position in a tuple.
In particular, Qiskit’s default names for the qubits in an n-qubit circuit are
represented by the n-tuple (qn−1 , . . . , q0 ), with q0 being the qubit on the top
and qn−1 on the bottom in quantum circuit diagrams.
Please be aware that this is a reversal of a more common convention for ordering
qubits in circuits, and is a frequent source of confusion.
Although we sometimes deviate from the specific default names q0 , . . . , qn−1
used for qubits by Qiskit, we will always follow the ordering convention described
above when interpreting circuit diagrams throughout this course. Thus, our inter-
pretation of the circuit above is that it describes an operation on a pair of qubits
(X, Y ). If the input to the circuit is a quantum state |ψ⟩ ⊗ |ϕ⟩, for instance, then this
means that the lower qubit X starts in the state |ψ⟩ and the upper qubit Y starts in
the state |ϕ⟩.
Now, to understand what the circuit in Figure 3.7 does, we can go from left to
right through its operations.
1. The first operation is a Hadamard operation on Y. Its action on the pair (X, Y)
is described by the matrix

I ⊗ H = ( 1/√2    1/√2     0       0
          1/√2   −1/√2     0       0
           0       0      1/√2    1/√2
           0       0      1/√2   −1/√2 ).
Note that the identity matrix is on the left of the tensor product and H is on
the right, which is consistent with Qiskit’s ordering convention.
2. The second operation is the controlled-NOT operation, where Y is the control
and X is the target. The controlled-NOT gate’s action on standard basis states
is illustrated in Figure 3.8.
Given that we order the qubits as (X, Y ), with X being on the bottom and Y
being on the top of our circuit, the matrix representation of the controlled-NOT
gate is this:
( 1 0 0 0
  0 0 0 1
  0 0 1 0
  0 1 0 0 ).
The unitary operation implemented by the entire circuit, which we'll give the
name U, is the composition of the operations:

U = ( 1 0 0 0 )   ( 1/√2    1/√2     0       0    )   ( 1/√2    1/√2     0       0    )
    ( 0 0 0 1 )   ( 1/√2   −1/√2     0       0    ) = (  0       0      1/√2   −1/√2 )
    ( 0 0 1 0 )   (  0       0      1/√2    1/√2  )   (  0       0      1/√2    1/√2 )
    ( 0 1 0 0 )   (  0       0      1/√2   −1/√2  )   ( 1/√2   −1/√2     0       0    ).
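As a sanity check (not part of the original text), the composition above can be computed with NumPy, and one can verify that applying U to |00⟩ produces the Bell state |ϕ+⟩:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

# Controlled-NOT with Y (the rightmost tensor factor) as control and
# X (the leftmost tensor factor) as target, in the (X, Y) ordering.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0]])

U = CNOT @ np.kron(I2, H)   # H acts on Y, the right factor

expected_U = np.array([[1,  1, 0,  0],
                       [0,  0, 1, -1],
                       [0,  0, 1,  1],
                       [1, -1, 0,  0]]) / np.sqrt(2)
assert np.allclose(U, expected_U)

# Applying the circuit to |00> yields the Bell state |phi+>.
ket00 = np.array([1, 0, 0, 0])
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert np.allclose(U @ ket00, phi_plus)
```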
Figure 3.8: The action of the controlled-NOT gate on standard basis states: the
control qubit |b⟩ is unchanged, while the target qubit |a⟩ becomes |a ⊕ b⟩.

Figure 3.9: A quantum circuit including measurements and classical bit wires.
The qubits Y and X are measured, with the outcomes written to the classical bits
B and A.
The circuit in Figure 3.9 acts on two qubits, X and Y, just like in the previous example. We also have two classical
bits, A and B, as well as two measurement gates. The measurement gates represent
standard basis measurements: the qubits are changed into their post-measurement
states, while the measurement outcomes are overwritten onto the classical bits to
which the arrows point.
It’s often convenient to depict a measurement as a gate that takes a qubit as
input and outputs a classical bit (as opposed to outputting the qubit in its post-
measurement state and writing the result to a separate classical bit). This means the
measured qubit has been discarded and can safely be ignored thereafter, its state
having changed into |0⟩ or |1⟩ depending upon the measurement outcome. For
example, the circuit diagram in Figure 3.10 represents the same process as the one
in Figure 3.9, but where we disregard X and Y after measuring them.
As the course continues, we'll see more examples of quantum circuits, which are
usually more complicated than the simple examples above. Symbols commonly
used to denote gates in circuit diagrams include the Pauli gates X, Y, and Z, the
Hadamard gate H, the phase gates S and T, and controlled-NOT gates.
Inner products

Recall from Lesson 1 (Single Systems) that when we use the Dirac notation to refer
to an arbitrary column vector as a ket, such as

|ψ⟩ = ( α1
        α2
        ⋮
        αn ),

the corresponding bra vector is the conjugate-transpose row vector

⟨ψ| = ( ᾱ1  ᾱ2  · · ·  ᾱn ).   (3.1)
Alternatively, if we have some classical state set Σ in mind, and we express a column
vector as a ket such as

|ψ⟩ = ∑_{a∈Σ} α_a |a⟩,

then the corresponding row (or bra) vector is the conjugate transpose

⟨ψ| = ∑_{a∈Σ} ᾱ_a ⟨a|.   (3.2)
We also have that the product of a bra vector and a ket vector, viewed as matrices
either having a single row or a single column, results in a scalar. Specifically, if we
have two column vectors

|ψ⟩ = (α1, . . . , αn)ᵀ and |ϕ⟩ = (β1, . . . , βn)ᵀ,

so that the row vector ⟨ψ| is as in equation (3.1), then

⟨ψ|ϕ⟩ = ⟨ψ||ϕ⟩ = ᾱ1 β1 + · · · + ᾱn βn.
Alternatively, if we have two column vectors that we have written as

|ψ⟩ = ∑_{a∈Σ} α_a |a⟩ and |ϕ⟩ = ∑_{b∈Σ} β_b |b⟩,

then

⟨ψ|ϕ⟩ = ( ∑_{a∈Σ} ᾱ_a ⟨a| ) ( ∑_{b∈Σ} β_b |b⟩ ) = ∑_{a∈Σ} ∑_{b∈Σ} ᾱ_a β_b ⟨a|b⟩ = ∑_{a∈Σ} ᾱ_a β_a,

where the last equality follows from the observation that ⟨a|a⟩ = 1 and ⟨a|b⟩ = 0
for classical states a and b satisfying a ≠ b.
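These formulas translate directly into code. The brief NumPy illustration below uses arbitrarily chosen vectors (the specific vectors are not from the text); note that `np.vdot` conjugates its first argument, exactly as the bra-ket formula requires.

```python
import numpy as np

# Two unit vectors with complex entries (illustrative choices).
psi = np.array([1 + 1j, 2], dtype=complex) / np.sqrt(6)
phi = np.array([1, 1], dtype=complex) / np.sqrt(2)

# <psi|phi>: conjugate the entries of psi, then sum the products.
inner = np.vdot(psi, phi)

# The same value computed from the definition, term by term.
by_hand = sum(psi[a].conjugate() * phi[a] for a in range(2))
assert np.isclose(inner, by_hand)
```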
The value ⟨ψ|ϕ⟩ is called the inner product between the vectors |ψ⟩ and |ϕ⟩. Inner
products are critically important in quantum information and computation; we
would not get far in understanding quantum information at a mathematical level
without them. Some basic facts about inner products of vectors follow.
Relationship to the Euclidean norm. The inner product of any vector

|ψ⟩ = ∑_{a∈Σ} α_a |a⟩

with itself is

⟨ψ|ψ⟩ = ∑_{a∈Σ} ᾱ_a α_a = ∑_{a∈Σ} |α_a|² = ‖ |ψ⟩ ‖².

Thus, the Euclidean norm of a vector may alternatively be expressed as

‖ |ψ⟩ ‖ = √⟨ψ|ψ⟩.
Notice that the Euclidean norm of a vector must always be a nonnegative real
number. Moreover, the only way the Euclidean norm of a vector can be equal to
zero is if every one of the entries is equal to zero, which is to say that the vector is
the zero vector.
We can summarize these observations like this: for every vector |ψ⟩ we have
⟨ψ|ψ⟩ ≥ 0,
with ⟨ψ|ψ⟩ = 0 if and only if |ψ⟩ = 0. This property of the inner product is
sometimes referred to as positive definiteness.
Conjugate symmetry. For any two vectors

|ψ⟩ = ∑_{a∈Σ} α_a |a⟩ and |ϕ⟩ = ∑_{a∈Σ} β_a |a⟩,

we have

⟨ψ|ϕ⟩ = ∑_{a∈Σ} ᾱ_a β_a and ⟨ϕ|ψ⟩ = ∑_{a∈Σ} β̄_a α_a,

and therefore ⟨ψ|ϕ⟩ and ⟨ϕ|ψ⟩ are complex conjugates of one another.
Linearity in the second argument (and conjugate linearity in the first). Let us
suppose that |ψ⟩, |ϕ1 ⟩, and |ϕ2 ⟩ are vectors and α1 and α2 are complex numbers. If
we define a new vector
|ϕ⟩ = α1 |ϕ1 ⟩ + α2 |ϕ2 ⟩,
then
⟨ψ|ϕ⟩ = ⟨ψ| ( α1 |ϕ1⟩ + α2 |ϕ2⟩ ) = α1 ⟨ψ|ϕ1⟩ + α2 ⟨ψ|ϕ2⟩.
That is to say, the inner product is linear in the second argument. This can be verified
either through the formulas above or simply by noting that matrix multiplication is
linear in each argument (and specifically in the second argument).
Combining this fact with conjugate symmetry reveals that the inner product is
conjugate linear in the first argument. That is, if |ψ1 ⟩, |ψ2 ⟩, and |ϕ⟩ are vectors and α1
and α2 are complex numbers, and we define

|ψ⟩ = α1 |ψ1⟩ + α2 |ψ2⟩,

then

⟨ψ|ϕ⟩ = ( ᾱ1 ⟨ψ1| + ᾱ2 ⟨ψ2| ) |ϕ⟩ = ᾱ1 ⟨ψ1|ϕ⟩ + ᾱ2 ⟨ψ2|ϕ⟩.
The Cauchy–Schwarz inequality. For every choice of vectors |ϕ⟩ and |ψ⟩ having
the same number of entries, we have

|⟨ψ|ϕ⟩| ≤ ‖ |ψ⟩ ‖ ‖ |ϕ⟩ ‖.

This is an incredibly handy inequality that gets used quite extensively in quantum
information (and in many other topics of study).
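The inequality is easy to probe numerically. The sketch below (not part of the text) checks it on randomly chosen complex vectors, and also checks the equality case, which occurs when one vector is a scalar multiple of the other.

```python
import numpy as np

rng = np.random.default_rng(7)

# Check Cauchy-Schwarz on random complex vectors.
for _ in range(100):
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    phi = rng.normal(size=4) + 1j * rng.normal(size=4)
    lhs = abs(np.vdot(psi, phi))
    rhs = np.linalg.norm(psi) * np.linalg.norm(phi)
    assert lhs <= rhs + 1e-12

# Equality holds when phi is a scalar multiple of psi.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
phi = (2 - 3j) * psi
assert np.isclose(abs(np.vdot(psi, phi)),
                  np.linalg.norm(psi) * np.linalg.norm(phi))
```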
Orthogonal and orthonormal sets

Two vectors |ψ⟩ and |ϕ⟩ are said to be orthogonal if their inner product is zero:

⟨ψ|ϕ⟩ = 0.

More generally, a collection of vectors {|ψ1⟩, . . . , |ψm⟩} is an orthogonal set if
every vector in the collection is orthogonal to every other one:

⟨ψj|ψk⟩ = 0

for all j ≠ k. An orthonormal set is an orthogonal set in which every vector is also a
unit vector, and an orthonormal basis is an orthonormal set that forms a basis.
Familiar examples include the standard basis, the basis {|+⟩, |−⟩}
for the 2-dimensional space corresponding to a single qubit, and the Bell basis

|ϕ+⟩, |ϕ−⟩, |ψ+⟩, |ψ−⟩

for the 4-dimensional space corresponding to two qubits.
Suppose that |ψ1 ⟩, . . . , |ψm ⟩ are vectors that live in an n-dimensional space, and
assume moreover that {|ψ1 ⟩, . . . , |ψm ⟩} is an orthonormal set. Orthonormal sets
are always linearly independent sets, so these vectors necessarily span a subspace
of dimension m. From this we conclude that m ≤ n because the dimension of the
subspace spanned by these vectors cannot be larger than the dimension of the entire
space from which they’re drawn.
If it is the case that m < n, then it is always possible to choose an additional
n − m vectors |ψm+1 ⟩, . . . , |ψn ⟩ so that {|ψ1 ⟩, . . . , |ψn ⟩} forms an orthonormal basis.
A procedure known as the Gram–Schmidt orthogonalization process can be used to
construct these vectors.
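The Gram–Schmidt process admits a compact implementation. The sketch below (the function name is illustrative, not from the text) extends a given orthonormal list to an orthonormal basis by sweeping through the standard basis vectors, subtracting off components along the vectors already collected, and keeping whatever survives.

```python
import numpy as np

def extend_to_orthonormal_basis(vectors, n):
    """Gram-Schmidt sketch: extend an orthonormal list of n-dimensional
    vectors to an orthonormal basis of the whole space."""
    basis = [np.asarray(v, dtype=complex) for v in vectors]
    for e in np.eye(n):               # candidates: standard basis vectors
        v = e.astype(complex)
        for b in basis:               # remove components along the basis so far
            v = v - np.vdot(b, v) * b
        norm = np.linalg.norm(v)
        if norm > 1e-10:              # skip candidates already in the span
            basis.append(v / norm)
    return basis

# Extend the single unit vector |+> to an orthonormal basis of C^2.
plus = np.array([1, 1]) / np.sqrt(2)
basis = extend_to_orthonormal_basis([plus], 2)
assert len(basis) == 2

# The Gram matrix of the resulting basis is the identity.
G = np.array(basis)
assert np.allclose(G.conj() @ G.T, np.eye(2))
```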
Orthonormal bases are closely connected with unitary matrices. One way to express
this connection is to say that the following three statements are logically equivalent
(meaning that they are all true or all false) for any choice of a square matrix U:

1. The matrix U is unitary.
2. The rows of U form an orthonormal basis.
3. The columns of U form an orthonormal basis.

To see why the first and third statements are equivalent, consider the 3 × 3 case,
where U has columns |ψ1⟩, |ψ2⟩, and |ψ3⟩, and let α_{j,k} denote the entry of U in
row j and column k, so that

⟨ψj|ψk⟩ = ᾱ_{1,j} α_{1,k} + ᾱ_{2,j} α_{2,k} + ᾱ_{3,j} α_{3,k}.   (3.3)
Multiplying the two matrices, with the conjugate transpose on the left-hand side,
gives us this matrix:

( ᾱ1,1  ᾱ2,1  ᾱ3,1 ) ( α1,1  α1,2  α1,3 )
( ᾱ1,2  ᾱ2,2  ᾱ3,2 ) ( α2,1  α2,2  α2,3 )
( ᾱ1,3  ᾱ2,3  ᾱ3,3 ) ( α3,1  α3,2  α3,3 )

= ( ᾱ1,1 α1,1 + ᾱ2,1 α2,1 + ᾱ3,1 α3,1   ᾱ1,1 α1,2 + ᾱ2,1 α2,2 + ᾱ3,1 α3,2   ᾱ1,1 α1,3 + ᾱ2,1 α2,3 + ᾱ3,1 α3,3 )
  ( ᾱ1,2 α1,1 + ᾱ2,2 α2,1 + ᾱ3,2 α3,1   ᾱ1,2 α1,2 + ᾱ2,2 α2,2 + ᾱ3,2 α3,2   ᾱ1,2 α1,3 + ᾱ2,2 α2,3 + ᾱ3,2 α3,3 )
  ( ᾱ1,3 α1,1 + ᾱ2,3 α2,1 + ᾱ3,3 α3,1   ᾱ1,3 α1,2 + ᾱ2,3 α2,2 + ᾱ3,3 α3,2   ᾱ1,3 α1,3 + ᾱ2,3 α2,3 + ᾱ3,3 α3,3 ).
Referring to the equation (3.3), we see that this matrix is equal to the identity matrix
if and only if the set {|ψ1 ⟩, |ψ2 ⟩, |ψ3 ⟩} is orthonormal. This argument generalizes to
unitary matrices of any size.
The fact that the rows of a square matrix form an orthonormal basis if and only
if the matrix is unitary follows from the fact that a matrix is unitary if and only if its
transpose is unitary.
Given the equivalence described above, together with the fact that every or-
thonormal set can be extended to form an orthonormal basis, we conclude the
following useful fact: Given any orthonormal set of vectors {|ψ1 ⟩, . . . , |ψm ⟩} drawn
from an n-dimensional space, there exists a unitary matrix U whose first m columns
are the vectors |ψ1 ⟩, . . . , |ψm ⟩. Pictorially, we can always find a unitary matrix having
this form:
U = |ψ1 ⟩ |ψ2 ⟩ · · · |ψm ⟩ |ψm+1 ⟩ · · · |ψn ⟩ .
The last n − m columns can be filled in with any choice of vectors |ψm+1⟩, . . . , |ψn⟩
that make {|ψ1 ⟩, . . . , |ψn ⟩} an orthonormal basis.
Projections

A square matrix Π is called a projection if it satisfies two properties: it is Hermitian,
meaning Π† = Π, and it is idempotent, meaning Π² = Π. An example of a projection
is the matrix

Π = |ψ⟩⟨ψ|   (3.4)

for any unit vector |ψ⟩. We can see that this matrix is Hermitian as follows:

Π† = ( |ψ⟩⟨ψ| )† = ( ⟨ψ| )† ( |ψ⟩ )† = |ψ⟩⟨ψ| = Π.

Here we have used the formula

(AB)† = B† A†,

which is always true, for any two matrices A and B for which the product AB makes
sense.
To see that the matrix Π in (3.4) is idempotent, we can use the assumption that
|ψ⟩ is a unit vector, so that it satisfies ⟨ψ|ψ⟩ = 1. Thus, we have

Π² = ( |ψ⟩⟨ψ| )² = |ψ⟩⟨ψ|ψ⟩⟨ψ| = |ψ⟩⟨ψ| = Π.
More generally, if {|ψ1⟩, . . . , |ψm⟩} is any orthonormal set of vectors, then the
matrix

Π = ∑_{k=1}^{m} |ψk⟩⟨ψk|   (3.5)

is a projection. Specifically, we have

Π† = ( ∑_{k=1}^{m} |ψk⟩⟨ψk| )† = ∑_{k=1}^{m} ( |ψk⟩⟨ψk| )† = ∑_{k=1}^{m} |ψk⟩⟨ψk| = Π,

and

Π² = ( ∑_{j=1}^{m} |ψj⟩⟨ψj| ) ( ∑_{k=1}^{m} |ψk⟩⟨ψk| ) = ∑_{j=1}^{m} ∑_{k=1}^{m} |ψj⟩⟨ψj|ψk⟩⟨ψk| = ∑_{k=1}^{m} |ψk⟩⟨ψk| = Π,

where the final simplification uses the orthonormality of the set: ⟨ψj|ψk⟩ equals 1
when j = k and 0 otherwise.
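Both defining properties are easy to verify numerically. The sketch below (the specific vectors are illustrative) builds the projection of equation (3.5) from a two-element orthonormal set and checks that it is Hermitian and idempotent.

```python
import numpy as np

# An orthonormal set of two vectors in a 3-dimensional space.
psi1 = np.array([1, 1, 0]) / np.sqrt(2)
psi2 = np.array([0, 0, 1], dtype=float)

# Pi = |psi1><psi1| + |psi2><psi2|, as in equation (3.5).
Pi = np.outer(psi1, psi1.conj()) + np.outer(psi2, psi2.conj())

assert np.allclose(Pi, Pi.conj().T)   # Hermitian: Pi^dagger = Pi
assert np.allclose(Pi @ Pi, Pi)       # idempotent: Pi^2 = Pi
```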
Projective measurements
The notion of a measurement of a quantum system is more general than just stan-
dard basis measurements. Projective measurements are measurements that are de-
scribed by a collection of projections whose sum is equal to the identity matrix. In
symbols, a collection {Π0 , . . . , Πm−1 } of projection matrices describes a projective
measurement if
Π0 + · · · + Πm−1 = I.
When such a measurement is performed on a system X while it is in some state |ψ⟩,
two things happen:
1. For each k ∈ {0, . . . , m − 1}, the outcome of the measurement is k with probability equal to

Pr(outcome is k) = ‖ Πk |ψ⟩ ‖².

2. For whichever outcome k the measurement produces, the state of the system
becomes

Πk |ψ⟩ / ‖ Πk |ψ⟩ ‖.
We can also choose outcomes other than {0, . . . , m − 1} for projective measure-
ments if we wish. More generally, for any finite and nonempty set Σ, if we have a
collection of projection matrices {Π a : a ∈ Σ} that satisfies the condition
∑_{a∈Σ} Πa = I,

then this collection describes a projective measurement whose outcomes correspond
to the elements of Σ. The outcome a occurs with probability ‖ Πa |ψ⟩ ‖², and
conditioned on obtaining the outcome a, the state of the system becomes

Πa |ψ⟩ / ‖ Πa |ψ⟩ ‖.
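These rules can be implemented directly. The sketch below (the function name and the particular measurement are illustrative choices, not from the text) samples an outcome and computes the post-measurement state for a given collection of projections.

```python
import numpy as np

def projective_measurement(projections, psi, rng):
    """Sample an outcome and post-measurement state for the projective
    measurement described by a list of projection matrices."""
    probs = [np.linalg.norm(P @ psi) ** 2 for P in projections]
    assert np.isclose(sum(probs), 1.0)        # the projections sum to I
    k = rng.choice(len(projections), p=probs)
    post = projections[k] @ psi
    return k, post / np.linalg.norm(post)

# A two-outcome measurement on one qubit: {|+><+|, |-><-|}.
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
measurement = [np.outer(plus, plus), np.outer(minus, minus)]

rng = np.random.default_rng(0)
outcome, post = projective_measurement(measurement, np.array([1.0, 0.0]), rng)

# On |0>, each outcome occurs with probability 1/2, and the
# post-measurement state is |+> or |-> accordingly.
assert outcome in (0, 1)
assert np.allclose(post, plus if outcome == 0 else minus)
```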
If we have multiple systems that are jointly in some quantum state and a projec-
tive measurement is performed on just one of the systems, the action is similar to
what we had for standard basis measurements — and in fact we can now describe
this action in much simpler terms than we could before.
To be precise, let us suppose that we have two systems (X, Y ) in a quantum
state |ψ⟩, and a projective measurement described by a collection {Π a : a ∈ Σ} is
performed on the system X, while nothing is done to Y. Doing this is then equivalent
to performing the projective measurement described by the collection
Πa ⊗ I : a ∈ Σ
on the joint system (X, Y ). Each measurement outcome a results with probability
2
(Π a ⊗ I)|ψ⟩ ,
and conditioned on the result a appearing, the state of the joint system (X, Y )
becomes
(Π a ⊗ I)|ψ⟩
.
(Π a ⊗ I)|ψ⟩
Projective measurements can always be implemented using unitary operations
together with standard basis measurements. To see how this works, suppose that
{Π0, . . . , Πm−1} is a projective measurement on a system X having n classical states,
and consider the matrix

M = ( Π0      0  · · ·  0
      Π1      0  · · ·  0
      ⋮       ⋮        ⋮
      Πm−1    0  · · ·  0 ).
Here, each 0 represents an n × n matrix filled entirely with zeros, so that the entire
matrix M is an nm × nm matrix.
Now, M is certainly not a unitary matrix (unless m = 1, in which case Π0 = I,
giving M = I in this trivial case) because unitary matrices cannot have any columns
(or rows) that are entirely 0; unitary matrices have columns that form orthonormal
bases, and the all-zero vector is not a unit vector.
However, it is the case that the first n columns of M are orthonormal, and we
get this from the assumption that {Π0 , . . . , Πm−1 } is a measurement. To verify this
claim, notice that for each j ∈ {0, . . . , n − 1}, the vector formed by column number
j of M is as follows:
|ψj⟩ = M |0, j⟩ = ∑_{k=0}^{m−1} |k⟩ ⊗ Πk |j⟩.

Note that here we're numbering the columns starting from column 0. Taking the
inner product of column i with column j when i, j ∈ {0, . . . , n − 1} gives

⟨ψi|ψj⟩ = ( ∑_{k=0}^{m−1} |k⟩ ⊗ Πk |i⟩ )† ( ∑_{l=0}^{m−1} |l⟩ ⊗ Πl |j⟩ ) = ∑_{k=0}^{m−1} ∑_{l=0}^{m−1} ⟨k|l⟩ ⟨i| Πk Πl |j⟩

        = ∑_{k=0}^{m−1} ⟨i| Πk Πk |j⟩ = ∑_{k=0}^{m−1} ⟨i| Πk |j⟩ = ⟨i| I |j⟩ = { 1  if i = j
                                                                              { 0  if i ≠ j.
Because the first n columns of M are orthonormal, and every orthonormal set can
be extended to an orthonormal basis, there exists a unitary matrix U that agrees
with M on its first n columns:

U = ( Π0      ?  · · ·  ?
      Π1      ?  · · ·  ?
      ⋮       ⋮        ⋮
      Πm−1    ?  · · ·  ? ).
If we’re given the matrices Π0 , . . . , Πm−1 , we can compute suitable matrices to fill
in for the blocks marked ? — using the Gram–Schmidt process — but it does not
matter specifically what these matrices are for the sake of this discussion.
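The claim that the first n columns of M are orthonormal can be checked numerically for a small example; the measurement chosen below is illustrative.

```python
import numpy as np

# A projective measurement on one qubit (n = 2) with m = 2 outcomes.
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
Pis = [np.outer(plus, plus), np.outer(minus, minus)]

n, m = 2, 2

# Column j of M is sum_k |k> (tensor) Pi_k |j>.
columns = []
for j in range(n):
    ket_j = np.eye(n)[j]
    col = sum(np.kron(np.eye(m)[k], Pis[k] @ ket_j) for k in range(m))
    columns.append(col)

# The Gram matrix of these columns is the identity, so they are
# orthonormal, as claimed.
C = np.array(columns)
assert np.allclose(C.conj() @ C.T, np.eye(n))
```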
Finally we can describe the measurement process: we first perform U on the joint
system (Y, X) and then measure Y with respect to a standard basis measurement.
For an arbitrary state |ϕ⟩ of X, we obtain the state

U |0⟩|ϕ⟩ = M |0⟩|ϕ⟩ = ∑_{k=0}^{m−1} |k⟩ ⊗ Πk |ϕ⟩,
where the first equality follows from the fact that U and M agree on their first
n columns. When we then measure Y with respect to a standard basis measurement, we obtain each
outcome k with probability

‖ Πk |ϕ⟩ ‖²,

in which case the state of the pair (Y, X) becomes

|k⟩ ⊗ ( Πk |ϕ⟩ / ‖ Πk |ϕ⟩ ‖ ).
Global phases

Suppose that |ψ⟩ and |ϕ⟩ are unit vectors satisfying

|ϕ⟩ = α |ψ⟩

for some complex number α with |α| = 1, which is to say that the two states differ
by a global phase. Consider what happens when a standard basis measurement is
performed in each case. In the first case, in which the system is in the state |ψ⟩,
the probability of measuring any classical state a is |⟨a|ψ⟩|². In the second case, in
which the system is in the state |ϕ⟩, the probability of measuring any classical
state a is

|⟨a|ϕ⟩|² = |α ⟨a|ψ⟩|² = |α|² |⟨a|ψ⟩|² = |⟨a|ψ⟩|²,

because |α| = 1. That is, the probability of an outcome appearing is the same for
both states.
Now consider what happens when we apply an arbitrary unitary operation U
to both states. In the first case, in which the initial state is |ψ⟩, the state becomes
U | ψ ⟩,
and in the second case, in which the initial state is |ϕ⟩, it becomes
U |ϕ⟩ = αU |ψ⟩.
That is, the two resulting states still differ by the same global phase α.
Consequently, two quantum states |ψ⟩ and |ϕ⟩ that differ by a global phase are
completely indistinguishable; no matter what operation, or sequence of operations,
we apply to the two states, they will always differ by a global phase, and performing
a standard basis measurement will produce outcomes with precisely the same
probabilities as the other. For this reason, two quantum state vectors that differ by a
global phase are considered to be equivalent, and are effectively viewed as being
the same state.
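The equivalence of states differing by a global phase can be seen numerically as well; the particular state and phase below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random qubit state and the same state times a global phase.
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)
alpha = np.exp(1j * 0.7)            # any complex number with |alpha| = 1
phi = alpha * psi

# Standard basis measurement probabilities are identical...
assert np.allclose(np.abs(psi) ** 2, np.abs(phi) ** 2)

# ...and they remain identical after any unitary operation.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
assert np.allclose(np.abs(H @ psi) ** 2, np.abs(H @ phi) ** 2)
```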
For example, the quantum states

|−⟩ = (1/√2) |0⟩ − (1/√2) |1⟩ and −|−⟩ = −(1/√2) |0⟩ + (1/√2) |1⟩

differ by a global phase (which is −1 in this example), and are therefore considered
to be the same state.
On the other hand, the quantum states

|+⟩ = (1/√2) |0⟩ + (1/√2) |1⟩ and |−⟩ = (1/√2) |0⟩ − (1/√2) |1⟩
do not differ by a global phase. Although the only difference between the two states
is that a plus sign turns into a minus sign, this is not a global phase difference; it is
a relative phase difference, because it does not affect every vector entry, but only a
proper subset of the entries. This is consistent with what we observed previously,
namely that the states |+⟩ and |−⟩ can be discriminated perfectly. In particular,
performing a Hadamard operation and then measuring yields outcome
probabilities as follows:
|⟨0| H |+⟩|² = 1        |⟨0| H |−⟩|² = 0
|⟨1| H |+⟩|² = 0        |⟨1| H |−⟩|² = 1.
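A quick numerical confirmation of these four probabilities (not part of the original text):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

# H|+> = |0> and H|-> = |1>, so the measurement outcome identifies
# the state with certainty.
assert np.isclose(abs((H @ plus)[0]) ** 2, 1.0)
assert np.isclose(abs((H @ plus)[1]) ** 2, 0.0)
assert np.isclose(abs((H @ minus)[0]) ** 2, 0.0)
assert np.isclose(abs((H @ minus)[1]) ** 2, 1.0)
```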
No-cloning theorem
The no-cloning theorem shows that it is impossible to create a perfect copy of an
unknown quantum state.
No-cloning theorem
Let Σ be a classical state set having at least two elements, and let X and Y be
systems sharing the same classical state set Σ. There does not exist a quantum
state |ϕ⟩ of Y and a unitary operation U on the pair (X, Y ) such that
U ( |ψ⟩ ⊗ |ϕ⟩ ) = |ψ⟩ ⊗ |ψ⟩

for every quantum state |ψ⟩ of X.
That is, there is no way to initialize the system Y (to any state |ϕ⟩ whatsoever)
and perform a unitary operation U on the joint system (X, Y ) so that the effect is for
the state |ψ⟩ of X to be cloned — resulting in (X, Y ) being in the state |ψ⟩ ⊗ |ψ⟩.
The proof of this theorem is actually quite simple: it boils down to the observa-
tion that the mapping
|ψ⟩ ⊗ |ϕ⟩ 7→ |ψ⟩ ⊗ |ψ⟩
is not linear in |ψ⟩.
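The nonlinearity obstruction can be seen concretely in a numerical example (this illustration is not from the text): the controlled-NOT gate copies standard basis states, but, by linearity, it fails on superpositions.

```python
import numpy as np

# Controlled-NOT with the left tensor factor as the control qubit.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

# CNOT copies standard basis states: |a>|0> -> |a>|a> for a in {0, 1}.
assert np.allclose(CNOT @ np.kron(ket0, ket0), np.kron(ket0, ket0))
assert np.allclose(CNOT @ np.kron(ket1, ket0), np.kron(ket1, ket1))

# By linearity it cannot clone |+>: the output is an entangled
# state, not |+>|+>.
plus = (ket0 + ket1) / np.sqrt(2)
out = CNOT @ np.kron(plus, ket0)
assert not np.allclose(out, np.kron(plus, plus))
assert np.allclose(out, (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2))
```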
In detail, because Σ has at least two elements, we may choose a, b ∈ Σ with
a ̸= b. If there did exist a quantum state |ϕ⟩ of Y and a unitary operation U on the
pair (X, Y ) for which U |ψ⟩ ⊗ |ϕ⟩ = |ψ⟩ ⊗ |ψ⟩ for every quantum state |ψ⟩ of X,
then it would be the case that
U | a⟩ ⊗ |ϕ⟩ = | a⟩ ⊗ | a⟩ and U |b⟩ ⊗ |ϕ⟩ = |b⟩ ⊗ |b⟩.
By linearity, meaning specifically the linearity of the tensor product in the first
argument and the linearity of matrix-vector multiplication in the second (vector)
argument, it would then follow that

U ( ( (1/√2)(|a⟩ + |b⟩) ) ⊗ |ϕ⟩ ) = (1/√2) |a⟩ ⊗ |a⟩ + (1/√2) |b⟩ ⊗ |b⟩.

Cloning the superposition (1/√2)(|a⟩ + |b⟩) would instead require that

U ( ( (1/√2)(|a⟩ + |b⟩) ) ⊗ |ϕ⟩ ) = (1/2) ( |a⟩ ⊗ |a⟩ + |a⟩ ⊗ |b⟩ + |b⟩ ⊗ |a⟩ + |b⟩ ⊗ |b⟩ ),

and these two requirements are in conflict, so no such unitary operation U can exist.
Quantum states that are not orthogonal cannot be perfectly discriminated: if a
quantum circuit can correctly determine, without error, which of two states |ψ⟩ and
|ϕ⟩ it is given, then ⟨ψ|ϕ⟩ = 0. To see why, suppose that such a circuit exists, taking
the form illustrated in Figure 3.17: a unitary operation U followed by a standard
basis measurement of the top qubit.

Figure 3.17: A quantum circuit that perfectly discriminates the states |ψ⟩ and |ϕ⟩.

We shall assume that the circuit outputs 0
for |ψ⟩ and 1 for |ϕ⟩; the analysis would not differ fundamentally if these output
values were reversed.
Notice that, in addition to the qubits that initially store either |ψ⟩ or |ϕ⟩, the
circuit is free to make use of any number of additional workspace qubits. These
qubits are initially each set to the |0⟩ state — so their combined state is denoted
|0 · · · 0⟩ in the figures — and these qubits can be used by the circuit in any way that
might be beneficial. It is very common to make use of workspace qubits in quantum
circuits like this.
Now, consider what happens when we run our circuit on the state |ψ⟩ (along
with the initialized workspace qubits). The resulting state, immediately prior to the
measurement being performed, can be written as
U |0 · · · 0⟩|ψ⟩ = |γ0 ⟩|0⟩ + |γ1 ⟩|1⟩
for two vectors |γ0 ⟩ and |γ1 ⟩ that correspond to all of the qubits except the top
qubit. In general, for such a state the probabilities that a measurement of the top
qubit yields the outcomes 0 and 1 are as follows:

Pr(outcome is 0) = ‖ |γ0⟩ ‖² and Pr(outcome is 1) = ‖ |γ1⟩ ‖².
Because our circuit always outputs 0 for the state |ψ⟩, it must be that |γ1 ⟩ = 0, and
so
U |0 · · · 0⟩|ψ⟩ = |γ0 ⟩|0⟩.
Multiplying both sides of this equation by U† yields this equation:

|0 · · · 0⟩|ψ⟩ = U† |γ0⟩|0⟩.   (3.6)

Reasoning similarly for the state |ϕ⟩, for which the circuit always outputs 1, we
find that

U |0 · · · 0⟩|ϕ⟩ = |δ1⟩|1⟩

for some vector |δ1⟩, and therefore

|0 · · · 0⟩|ϕ⟩ = U† |δ1⟩|1⟩.   (3.7)
Now let us take the inner product of the vectors represented by the equations
(3.6) and (3.7), starting with the representations on the right-hand side of each
equation. We have
( U† |γ0⟩|0⟩ )† = ⟨γ0|⟨0| U,
so the inner product of the vector (3.6) with the vector (3.7) is
⟨γ0 |⟨0| UU † |δ1 ⟩|1⟩ = ⟨γ0 |⟨0| |δ1 ⟩|1⟩ = ⟨γ0 |δ1 ⟩⟨0|1⟩ = 0.
Here we have used the fact that UU † = I, as well as the fact that the inner product
of tensor products is the product of the inner products:
⟨u ⊗ v|w ⊗ x ⟩ = ⟨u|w⟩⟨v| x ⟩
for any choices of these vectors (assuming |u⟩ and |w⟩ have the same number of
entries and |v⟩ and | x ⟩ have the same number of entries, so that it makes sense to
form the inner products ⟨u|w⟩ and ⟨v| x ⟩). Notice that the value of the inner product
⟨γ0 |δ1 ⟩ is irrelevant because it is multiplied by ⟨0|1⟩ = 0.
Finally, taking the inner product of the vectors on the left-hand sides of the
equations (3.6) and (3.7) must result in the same zero value that we’ve already
calculated, so
0 = ( |0 · · · 0⟩|ψ⟩ )† ( |0 · · · 0⟩|ϕ⟩ ) = ⟨0 · · · 0|0 · · · 0⟩ ⟨ψ|ϕ⟩ = ⟨ψ|ϕ⟩.
We have therefore concluded what we wanted, which is that |ψ⟩ and |ϕ⟩ are orthog-
onal: ⟨ψ|ϕ⟩ = 0.
It is possible, by the way, to perfectly discriminate any two states that are
orthogonal, which is the converse to the statement we just proved. Suppose that
the two states to be discriminated are |ϕ⟩ and |ψ⟩, where ⟨ϕ|ψ⟩ = 0. We can
then perfectly discriminate these states by performing the projective measurement
described by these matrices, for instance:

{ |ϕ⟩⟨ϕ|, I − |ϕ⟩⟨ϕ| }.

For the state |ϕ⟩, the first outcome is always obtained:

‖ |ϕ⟩⟨ϕ| |ϕ⟩ ‖² = ‖ |ϕ⟩⟨ϕ|ϕ⟩ ‖² = ‖ |ϕ⟩ ‖² = 1,
‖ (I − |ϕ⟩⟨ϕ|) |ϕ⟩ ‖² = ‖ |ϕ⟩ − |ϕ⟩⟨ϕ|ϕ⟩ ‖² = ‖ 0 ‖² = 0.

And, for the state |ψ⟩, the second outcome is always obtained:

‖ |ϕ⟩⟨ϕ| |ψ⟩ ‖² = ‖ |ϕ⟩⟨ϕ|ψ⟩ ‖² = ‖ 0 ‖² = 0,
‖ (I − |ϕ⟩⟨ϕ|) |ψ⟩ ‖² = ‖ |ψ⟩ − |ϕ⟩⟨ϕ|ψ⟩ ‖² = ‖ |ψ⟩ ‖² = 1.
More generally, any orthogonal collection of quantum state vectors can be discrimi-
nated perfectly.
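This discrimination procedure for orthogonal states can be checked numerically; the particular orthogonal pair below is an illustrative choice.

```python
import numpy as np

# An orthogonal pair of qubit states (a rotated standard basis).
theta = 0.4
phi = np.array([np.cos(theta), np.sin(theta)])
psi = np.array([-np.sin(theta), np.cos(theta)])   # orthogonal to phi
assert np.isclose(np.vdot(phi, psi), 0.0)

# The projective measurement { |phi><phi|, I - |phi><phi| }.
P0 = np.outer(phi, phi)
P1 = np.eye(2) - P0

# For |phi> the first outcome is certain; for |psi> the second is.
assert np.isclose(np.linalg.norm(P0 @ phi) ** 2, 1.0)
assert np.isclose(np.linalg.norm(P1 @ phi) ** 2, 0.0)
assert np.isclose(np.linalg.norm(P0 @ psi) ** 2, 0.0)
assert np.isclose(np.linalg.norm(P1 @ psi) ** 2, 1.0)
```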
Lesson 4
Entanglement in Action
In this lesson we’ll take a look at three fundamentally important examples. The
first two are the quantum teleportation and superdense coding protocols, which are
principally concerned with the transmission of information from a sender to a
receiver. The third example is an abstract game, called the CHSH game, which
illustrates a phenomenon in quantum information that is sometimes referred to
as nonlocality. (The CHSH game is not always described as a game. It is often
described instead as an experiment — specifically, it is an example of a Bell test —
and is referred to as the CHSH inequality.)
Quantum teleportation, superdense coding, and the CHSH game are not merely
examples meant to illustrate how quantum information works, although they do
serve well in this regard. Rather, they are stones in the foundation of quantum
information. Entanglement plays a key role in all three examples, so this lesson
provides the first opportunity in this course to see entanglement in action, and
to begin to explore what it is that makes entanglement such an interesting and
important concept.
Before proceeding to the examples themselves, a few preliminary comments
that connect to all three examples are in order.
In the discussions that follow, Alice and Bob are names given to hypothetical agents
in various protocols and interactions. These names were first used in this way in
the 1970s in the context of cryptography, but the convention has become common
more broadly since then. The idea
is simply that these are common names (at least in some parts of the world) that
start with the letters A and B. It is also quite convenient to refer to Alice with the
pronoun her and Bob with the pronoun him for the sake of brevity.
By default, we imagine that Alice and Bob are in different locations. They
may have different goals and behaviors depending on the context in which they
arise. For example, in the context of communication, meaning the transmission of
information, we might decide to use the name Alice to refer to the sender and Bob
to refer to the receiver of whatever information is transmitted. In general, it may
be that Alice and Bob cooperate, which is typical of a wide range of settings — but
in other settings they may be in competition, or they may have different goals that
may or may not be consistent or harmonious. These things must be made clear in
the situation at hand.
We can also introduce additional characters, such as Charlie and Diane, as needed.
Other names that represent different personas, such as Eve for an eavesdropper or
Mallory for someone behaving maliciously, are also sometimes used.
Entanglement as a resource
Recall this example of an entangled quantum state of two qubits:
|ϕ+⟩ = (1/√2) |00⟩ + (1/√2) |11⟩.   (4.1)
It is one of the four Bell states, and is often viewed as the archetypal example of an
entangled quantum state.
We also previously encountered this example of a probabilistic state of two bits:
(1/2) |00⟩ + (1/2) |11⟩.   (4.2)
It is, in some sense, analogous to the entangled quantum state (4.1). It represents a
probabilistic state in which two bits are correlated, but it is not entangled. Entangle-
ment is a uniquely quantum phenomenon, essentially by definition: in simplified
terms, entanglement refers to non-classical quantum correlations.
Unfortunately, defining entanglement as non-classical quantum correlation
is somewhat unsatisfying at an intuitive level, because it’s a definition of what
entanglement is in terms of what it is not. This may be why it’s actually rather
challenging to explain precisely what entanglement is, and what makes it special,
in intuitive terms.
Typical explanations of entanglement often fail to distinguish the two states
(4.1) and (4.2) in a meaningful way. For example, it is sometimes said that if one
of two entangled qubits is measured, then the state of the other qubit is somehow
instantaneously affected; or that the state of the two qubits together cannot be
described separately; or that the two qubits somehow maintain a memory of each
other. These statements are not false, but why are they not also true for the (unen-
tangled) probabilistic state (4.2) above? The two bits represented by this state are
intimately connected: each one has a perfect memory of the other in a literal sense.
But the state is nevertheless not entangled.
One way to explain what makes entanglement special, and what makes the
quantum state (4.1) different from the probabilistic state (4.2), is to explain what can
be done with entanglement, or what we can see happening because of entanglement,
that goes beyond the decisions we make about how to represent our knowledge of
states using vectors. All three of the examples to be discussed in this lesson have
this nature, in that they illustrate things that can be done with the state (4.1) that
cannot be done with any classically correlated state, including the state (4.2).
Indeed, it is typical in the study of quantum information and computation
that entanglement is viewed as a resource through which different tasks can be
accomplished. When this is done, the state (4.1) is viewed as representing one
unit of entanglement, which we refer to as an e-bit. The “e” stands for “entangled”
or “entanglement.” While it is true that the state (4.1) is a state of two qubits, the
quantity of entanglement that it represents is one e-bit.
Incidentally, we can also view the probabilistic state (4.2) as a resource, which is
one bit of shared randomness. It can be very useful in cryptography, for instance, to
share a random bit with somebody (presuming that nobody else knows what the
bit is), so that it can be used as a private key, or part of a private key, for the sake of
encryption. But in this lesson the focus is on entanglement and a few things we can
do with it.
As a point of clarification regarding terminology, when we say that Alice and
Bob share an e-bit, what we mean is that Alice has a qubit named A, Bob has a qubit
named B, and together the pair (A, B) is in the quantum state (4.1). Different names
could, of course, be chosen for the qubits, but throughout this lesson we will stick
with these names in the interest of clarity.
Quantum teleportation

Quantum teleportation is a protocol through which Alice transmits the state of a
qubit Q to Bob by sending only classical information, at the cost of one shared e-bit.
At this point, one might ask whether it is possible for Alice and Bob to accomplish
their task without even needing to make use of a shared e-bit. In other words,
is there any way to transmit a qubit using classical communication alone?
The answer is no, it is not possible to transmit quantum information using
classical communication alone. This is not too difficult to prove mathematically
using basic quantum information theory, but we can alternatively rule out the
possibility of transmitting qubits using classical communication alone by thinking
about the no-cloning theorem.
Imagine that there was a way to send quantum information using classical com-
munication alone. Classical information can easily be copied and broadcast, which
means that any classical transmission from Alice to Bob might also be received by
a second receiver (Charlie, let us say). But if Charlie receives the same classical
communication that Bob received, then would he not also be able to obtain a copy
of the qubit Q? This would suggest that Q was cloned, which we already know is
impossible by the no-cloning theorem, and so we conclude that there is no way to
send quantum information using classical communication alone.
When the assumption that Alice and Bob share an e-bit is in place, however, it
is possible for Alice and Bob to accomplish their task. This is precisely what the
quantum teleportation protocol does.
Protocol
Figure 4.1 describes the teleportation protocol as a quantum circuit.

Figure 4.1: The teleportation protocol. Alice holds the qubit Q, initially in the
state |ψ⟩, along with the qubit A; Bob holds the qubit B; and the pair (A, B) starts
in the state |ϕ+⟩. Alice's measurement outcomes are sent to Bob as classical bits,
which control X and Z gates on B, leaving B in the state |ψ⟩.

The diagram
is slightly stylized in that it depicts the separation between Alice and Bob, with
two diagonal wires representing classical bits that are sent from Alice to Bob, but
otherwise it is an ordinary quantum circuit diagram. The qubit names are shown
above the wires rather than to the left so that the initial states can be shown as well
(which we will commonly do when it is convenient). It should also be noted that
the X and Z gates have classical controls, which simply means that each gate is
applied if its classical control bit is 1 and is not applied if that bit is 0.
In words, the teleportation protocol is as follows:
1. Alice performs a controlled-NOT operation on the pair (A, Q), with Q the
control and A the target, and then performs a Hadamard operation on Q.
2. Alice then measures both A and Q, with respect to a standard basis mea-
surement in both cases, and transmits the classical outcomes to Bob. Let us
refer to the outcome of the measurement of A as a and the outcome of the
measurement of Q as b.
3. Bob receives a and b from Alice, and depending on the values of these bits he
performs one of these operations: conditioned on ab being 00, 01, 10, or 11, Bob
performs the operation I, Z, X, or ZX, respectively, on the qubit B.
This is the complete description of the teleportation protocol. The analysis that
appears below reveals that when it is run, the qubit B will be in whatever state
Q was in prior to the protocol being executed, including whatever correlations it
had with any other systems — which is to say that the protocol has effectively
implemented a perfect qubit communication channel, where the state of Q has been
“teleported” into B.
Before proceeding to the analysis, notice that this protocol does not succeed
in cloning the state of Q, which we already know is impossible by the no-cloning
theorem. Rather, when the protocol is finished, the state of the qubit Q will have
changed from its original value to |b⟩ as a result of the measurement performed on
it. Also notice that the e-bit has effectively been “burned” in the process: the state
of A has changed to | a⟩ and is no longer entangled with B (or any other system).
This is the cost of teleportation.
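To complement the analysis that follows, the protocol can be checked numerically. The sketch below (a NumPy simulation written for this discussion, not part of the protocol itself) tracks the state vector of the three qubits in the ordering (B, A, Q), applies the circuit, and reports, for each of Alice's measurement outcomes (a, b), the outcome probability and Bob's qubit state after his classically controlled correction.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

def teleport_outcomes(alpha, beta):
    """Run the teleportation circuit on Q in the state alpha|0> + beta|1>.

    Returns a dict mapping each outcome (a, b) of Alice's measurements to
    (probability, Bob's qubit state after his correction).  Qubit ordering
    is (B, A, Q), matching the analysis in the text."""
    phi_plus = np.array([1., 0., 0., 1.]) / np.sqrt(2)     # state of (B, A)
    state = np.kron(phi_plus, np.array([alpha, beta]))     # |pi_0>

    # Controlled-NOT with Q as the control and A as the target.
    cnot = np.array([[1., 0., 0., 0.],
                     [0., 0., 0., 1.],
                     [0., 0., 1., 0.],
                     [0., 1., 0., 0.]])
    state = np.kron(I2, cnot) @ state                      # |pi_1>
    state = np.kron(np.eye(4), H) @ state                  # |pi_2>

    outcomes = {}
    for a in (0, 1):
        for b in (0, 1):
            # Unnormalized state of B when (A, Q) is measured as (a, b).
            bob = np.array([state[4 * B + 2 * a + b] for B in (0, 1)])
            prob = np.linalg.norm(bob) ** 2
            correction = (np.linalg.matrix_power(Z, b)
                          @ np.linalg.matrix_power(X, a))
            outcomes[(a, b)] = (prob, correction @ bob / np.linalg.norm(bob))
    return outcomes

for outcome, (prob, bob) in teleport_outcomes(0.6, 0.8).items():
    print(outcome, round(prob, 3), np.round(bob, 3))
```

Every outcome occurs with probability 1/4, and in all four cases Bob's corrected qubit finishes in the input state (here 0.6|0⟩ + 0.8|1⟩), in agreement with the analysis below.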
[Figure 4.2: The teleportation circuit, with three states |π0⟩, |π1⟩, and |π2⟩
(appearing at the marked points in the circuit) relevant to the analysis of the
teleportation protocol.]
Analysis
To analyze the teleportation protocol, we’ll examine the behavior of the circuit
described above, one step at a time, beginning with the situation in which Q is
initially in the state α|0⟩ + β|1⟩. This is not the most general situation, as it does not
capture the possibility that Q is entangled with other systems, but starting with this
simpler case will add clarity to the analysis. The more general case is addressed
below, following the analysis of the simpler case.
Consider the states of the qubits (B, A, Q) at the times suggested by Figure 4.2.
Under the assumption that the qubit Q begins the protocol in the state α|0⟩ + β|1⟩,
the state of the three qubits (B, A, Q) together at the start of the protocol is therefore
|π0⟩ = |ϕ+⟩ ⊗ (α|0⟩ + β|1⟩) = (α|000⟩ + α|110⟩ + β|001⟩ + β|111⟩)/√2.
The first gate that is performed is the controlled-NOT gate, which transforms the
state |π0 ⟩ into
|π1⟩ = (α|000⟩ + α|110⟩ + β|011⟩ + β|101⟩)/√2.
Then the Hadamard gate is applied, which transforms the state |π1 ⟩ into
|π2⟩ = (α|00⟩|+⟩ + α|11⟩|+⟩ + β|01⟩|−⟩ + β|10⟩|−⟩)/√2
     = (α|000⟩ + α|001⟩ + α|110⟩ + α|111⟩ + β|010⟩ − β|011⟩ + β|100⟩ − β|101⟩)/2.
Using the multilinearity of the tensor product, we may alternatively write this state
as follows.
|π2⟩ = (1/2)(α|0⟩ + β|1⟩)|00⟩
     + (1/2)(α|0⟩ − β|1⟩)|01⟩
     + (1/2)(α|1⟩ + β|0⟩)|10⟩
     + (1/2)(α|1⟩ − β|0⟩)|11⟩.
At first glance, it might look like something magical has happened, because
the leftmost qubit B now seems to depend on the numbers α and β, even though
there has not yet been any communication from Alice to Bob. This is an illusion.
Scalars float freely through tensor products, so α and β are neither more nor less
associated with the leftmost qubit than they are with the other qubits, and all we
have done is to use algebra to express the state in a way that facilitates an analysis
of the measurements.
Now let us consider the four possible outcomes of Alice’s standard basis mea-
surements, together with the actions that Bob performs as a result.
Possible outcomes

• The outcome of Alice's measurement is ab = 00 with probability

  ∥(1/2)(α|0⟩ + β|1⟩)∥² = (|α|² + |β|²)/4 = 1/4,

  in which case the state of (B, A, Q) becomes (α|0⟩ + β|1⟩)|00⟩.
  Bob does nothing in this case, and so this is the final state of these three qubits.

• The outcome of Alice's measurement is ab = 01 with probability

  ∥(1/2)(α|0⟩ − β|1⟩)∥² = (|α|² + |−β|²)/4 = 1/4,

  in which case the state of (B, A, Q) becomes (α|0⟩ − β|1⟩)|01⟩.
  In this case, Bob applies a Z gate to the qubit B, leaving (B, A, Q) in the state
  (α|0⟩ + β|1⟩)|01⟩.

• The outcome of Alice's measurement is ab = 10 with probability

  ∥(1/2)(α|1⟩ + β|0⟩)∥² = (|α|² + |β|²)/4 = 1/4,

  in which case the state of (B, A, Q) becomes (α|1⟩ + β|0⟩)|10⟩.
  In this case, Bob applies an X gate to the qubit B, leaving (B, A, Q) in the state
  (α|0⟩ + β|1⟩)|10⟩.

• The outcome of Alice's measurement is ab = 11 with probability

  ∥(1/2)(α|1⟩ − β|0⟩)∥² = (|α|² + |β|²)/4 = 1/4,

  in which case the state of (B, A, Q) becomes (α|1⟩ − β|0⟩)|11⟩.
  In this case, Bob performs the operation ZX on the qubit B, leaving (B, A, Q)
  in the state
  (α|0⟩ + β|1⟩)|11⟩.
We now see, in all four cases, that Bob’s qubit B is left in the state α|0⟩ + β|1⟩
at the end of the protocol, which is the initial state of the qubit Q. This is what we
wanted to show: the teleportation protocol has worked correctly.
We also see that the qubits A and Q are left in one of the four states |00⟩, |01⟩, |10⟩,
or |11⟩, each with probability 1/4, depending upon the measurement outcomes that
Alice obtained. Thus, as was already suggested above, at the end of the protocol
Alice no longer has the state α|0⟩ + β|1⟩, which is consistent with the no-cloning
theorem.
Notice that Alice’s measurements yield absolutely no information about the
state α|0⟩ + β|1⟩. That is, the probability for each of the four possible measurement
outcomes is 1/4, irrespective of α and β. This is also essential for teleportation to
work correctly. Extracting information from an unknown quantum state necessarily
disturbs it in general, but here Bob obtains the state without it being disturbed.
General case
Now let’s consider the more general situation in which the qubit Q is initially
entangled with another system, which we’ll name R. A similar analysis to the one
above reveals that the teleportation protocol functions correctly in this more general
case: at the end of the protocol, the qubit B held by Bob is entangled with R in the
same way that Q was at the start of the protocol, as if Alice had simply handed Q to
Bob.
To prove this, let us suppose that the state of the pair (Q, R) is initially given by
a quantum state vector of the form
α|0⟩Q |γ0 ⟩R + β|1⟩Q |γ1 ⟩R ,
where |γ0 ⟩ and |γ1 ⟩ are quantum state vectors for the system R and α and β are
complex numbers satisfying |α|2 + | β|2 = 1. Any quantum state vector of the pair
(Q, R) can be expressed in this way.
Figure 4.3 depicts the same circuit as before, with the addition of the system
R (represented by a collection of qubits on the top of the diagram that nothing
happens to).
[Figure 4.3: The teleportation circuit as before, together with the additional
system R, showing the three states |π0⟩, |π1⟩, and |π2⟩ in the general case where
the pair (Q, R) begins in the state α|0⟩|γ0⟩ + β|1⟩|γ1⟩.]
To analyze what happens when the teleportation protocol is run in this situation,
it is helpful to permute the systems, along the same lines as was described in the
previous lesson. Specifically, we’ll consider the state of the systems in the order
(B, R, A, Q) rather than (B, A, Q, R). The names of the various systems are included
as subscripts in the expressions that follow for clarity.
At the start of the protocol, the state of these systems is as follows:

|π0⟩ = (α|0⟩B |γ0⟩R |00⟩AQ + α|1⟩B |γ0⟩R |10⟩AQ + β|0⟩B |γ1⟩R |01⟩AQ + β|1⟩B |γ1⟩R |11⟩AQ)/√2.

The first gate to be performed is the controlled-NOT gate, which transforms this state into

|π1⟩ = (α|0⟩B |γ0⟩R |00⟩AQ + α|1⟩B |γ0⟩R |10⟩AQ + β|0⟩B |γ1⟩R |11⟩AQ + β|1⟩B |γ1⟩R |01⟩AQ)/√2.
Then the Hadamard gate is applied. After expanding and simplifying the resulting
state, along similar lines to the analysis of the simpler case above, we obtain this
expression of the resulting state:
|π2⟩ = (1/2)(α|0⟩B |γ0⟩R + β|1⟩B |γ1⟩R)|00⟩AQ
     + (1/2)(α|0⟩B |γ0⟩R − β|1⟩B |γ1⟩R)|01⟩AQ
     + (1/2)(α|1⟩B |γ0⟩R + β|0⟩B |γ1⟩R)|10⟩AQ
     + (1/2)(α|1⟩B |γ0⟩R − β|0⟩B |γ1⟩R)|11⟩AQ.
Proceeding exactly as before, where we consider the four different possible
outcomes of Alice’s measurements along with the corresponding actions performed
by Bob, we find that at the end of the protocol, the state of (B, R) is always
α|0⟩|γ0 ⟩ + β|1⟩|γ1 ⟩.
Informally speaking, the analysis does not change in a significant way as compared
with the simpler case above; |γ0 ⟩ and |γ1 ⟩ essentially just “come along for the
ride.” So, teleportation succeeds in creating a perfect quantum communication
channel, effectively transmitting the contents of the qubit Q into B and preserving
all correlations with other systems.
This is actually not surprising at all, given the analysis of the simpler case
above. As that analysis revealed, we have a physical process that acts like the
identity operation on a qubit in an arbitrary quantum state, and there’s only one
way that can happen: the operation implemented by the protocol must be the
identity operation. That is, once we know that teleportation works correctly for a
single qubit in isolation, we can conclude that the protocol effectively implements a
perfect, noiseless quantum channel, and so it must work correctly even if the input
qubit is entangled with another system.
Further discussion
Here are a few brief, concluding remarks on teleportation, beginning with the
clarification that teleportation is not an application of quantum information; rather,
it is a protocol for performing quantum communication. It is therefore useful only
insofar as quantum communication is useful.
Indeed, it is reasonable to speculate that teleportation could one day become a
standard way to communicate quantum information, perhaps through a process
known as entanglement distillation. This is a process that converts a larger number of
noisy (or imperfect) e-bits into a smaller number of high quality e-bits, that could
then be used for noiseless or near-noiseless teleportation. The idea is that the process
of entanglement distillation is not as delicate as direct quantum communication.
We could accept losses, for instance, and if the process doesn’t work out, we can just
try again. In contrast, the actual qubits we hope to communicate might be much
more precious.
Finally, it should be understood that the idea behind teleportation and the way
that it works is quite fundamental in quantum information and computation. It
really is a cornerstone of quantum information theory, and variations of it arise.
For example, quantum gates can be implemented through a closely related process
known as quantum gate teleportation, which uses teleportation to apply operations to
qubits rather than communicating them.
4.2 Superdense coding

Superdense coding is a protocol that, in some sense, achieves a complementary aim
to teleportation: it allows for the transmission of two classical bits at the cost of one
qubit of quantum communication and one e-bit of entanglement.

In greater detail, we have a sender (Alice) and a receiver (Bob) that share one
e-bit of entanglement. According to the conventions in place for the lesson, this
means that Alice holds a qubit A, Bob holds a qubit B, and together the pair (A, B)
is in the state |ϕ+⟩. Alice wishes to transmit two classical bits to Bob, which we'll
denote by c and d, and she will accomplish this by sending him one qubit.
It is reasonable to view this feat as being less interesting than the one that
teleportation accomplishes. Sending qubits is likely to be so much more difficult
than sending classical bits for the foreseeable future that trading one qubit of
quantum communication for two bits of classical communication, at the cost of an
e-bit no less, hardly seems worth it. However, this does not imply that superdense
coding is not interesting, for it most certainly is.
Fitting the theme of the lesson, one reason why superdense coding is interesting
is that it demonstrates a concrete and (in the context of information theory) rather
striking use of entanglement. A famous theorem in quantum information theory,
known as Holevo’s theorem, implies that without the use of a shared entangled
state, it is impossible to communicate more than one bit of classical information
by sending a single qubit. (Holevo’s theorem is more general than this. Its precise
statement is technical and requires explanation, but this is one consequence of it.)
So, through superdense coding, shared entanglement effectively allows for the
doubling of the classical information-carrying capacity of sending qubits.
Protocol
The superdense coding protocol is described as a quantum circuit in Figure 4.4. In
words, here is what Alice does:
1. If d = 1, Alice performs a Z gate on her qubit A (and if d = 0 she does not).
2. If c = 1, Alice performs an X gate on her qubit A (and if c = 0 she does not).
Alice then sends her qubit A to Bob.
What Bob does when he receives the qubit A is to first perform a controlled-
NOT gate, with A being the control and B being the target, and then he applies
a Hadamard gate to A. He then measures B to obtain c and A to obtain d, with
standard basis measurements in both cases.
[Figure 4.4: The superdense coding protocol. Alice applies Z (if d = 1) and X
(if c = 1) to her half of an e-bit and sends it to Bob, who applies a controlled-NOT
gate and a Hadamard gate and then measures to recover c and d.]
Analysis
The idea behind this protocol is pretty simple: Alice effectively chooses which Bell
state she would like to be sharing with Bob, she sends Bob her qubit, and Bob
measures to determine which Bell state Alice chose.
That is, they initially share |ϕ+ ⟩, and depending upon the bits c and d, Alice
either leaves this state alone or shifts it to one of the other Bell states by applying I,
X, Z, or XZ to her qubit A.
(I ⊗ I)|ϕ+ ⟩ = |ϕ+ ⟩
(I ⊗ Z )|ϕ+ ⟩ = |ϕ− ⟩
(I ⊗ X )|ϕ+ ⟩ = |ψ+ ⟩
(I ⊗ XZ )|ϕ+ ⟩ = |ψ− ⟩
Bob’s actions have the following effects on the four Bell states.
|ϕ+ ⟩ 7→ |00⟩
|ϕ− ⟩ 7→ |01⟩
|ψ+ ⟩ 7→ |10⟩
|ψ− ⟩ 7→ −|11⟩
This can be checked directly, by computing the results of Bob’s operations on these
states one at a time.
4.3. THE CHSH GAME 107
So, when Bob performs his measurements, he is able to determine which Bell
state Alice chose. To verify that the protocol works correctly is a matter of checking
each case:
• If cd = 00, then the state of (B, A) when Bob receives A is |ϕ+ ⟩. He transforms
this state into |00⟩ and obtains cd = 00.
• If cd = 01, then the state of (B, A) when Bob receives A is |ϕ− ⟩. He transforms
this state into |01⟩ and obtains cd = 01.
• If cd = 10, then the state of (B, A) when Bob receives A is |ψ+ ⟩. He transforms
this state into |10⟩ and obtains cd = 10.
• If cd = 11, then the state of (B, A) when Bob receives A is |ψ− ⟩. He transforms
this state into −|11⟩ and obtains cd = 11. (The negative-one phase factor has
no effect here.)
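The four cases can also be checked numerically. The following sketch (a small NumPy simulation written for this discussion, with qubit ordering (B, A) assumed) runs the superdense coding circuit for each bit pair and confirms that Bob's measurements recover (c, d).

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

def superdense(c, d):
    """Transmit the bits (c, d) by superdense coding; qubit ordering (B, A)."""
    state = np.array([1., 0., 0., 1.]) / np.sqrt(2)        # |phi+> on (B, A)

    # Alice: Z if d = 1, then X if c = 1, applied to her qubit A.
    alice = np.linalg.matrix_power(X, c) @ np.linalg.matrix_power(Z, d)
    state = np.kron(I2, alice) @ state

    # Bob: controlled-NOT with A as control and B as target, then H on A.
    cnot = np.array([[1., 0., 0., 0.],
                     [0., 0., 0., 1.],
                     [0., 0., 1., 0.],
                     [0., 1., 0., 0.]])
    state = np.kron(I2, H) @ (cnot @ state)

    # The result is a standard basis state |c>|d>, up to a global phase.
    idx = int(np.argmax(np.abs(state)))
    return idx // 2, idx % 2      # measuring B gives c, measuring A gives d

for c in (0, 1):
    for d in (0, 1):
        assert superdense(c, d) == (c, d)
print("all four bit pairs recovered")
```

The phase factor in the cd = 11 case disappears under `np.abs`, mirroring the remark that it has no effect on the measurement outcomes.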
Nonlocal games
A nonlocal game is a cooperative game where two players, Alice and Bob, work
together to achieve a particular outcome. The game is run by a referee, who behaves
as follows: the referee first asks each of Alice and Bob a question, sending a question
x to Alice and a question y to Bob; Alice responds with an answer a and Bob with
an answer b; and the referee then decides, based on the questions and the answers,
whether Alice and Bob win or lose. Critically, no communication between Alice
and Bob is permitted while the game is being played.

[Figure 4.5: The interactions between the referee and Alice and Bob in a nonlocal
game; no communication between Alice and Bob is allowed.]
We’ll take a look at the CHSH game momentarily, but before that let us briefly ac-
knowledge that it’s also interesting to consider other nonlocal games. It’s extremely
interesting, in fact, and there are some nonlocal games for which it’s currently not
known how well Alice and Bob can play using entanglement. The set-up is simple,
but there’s complexity at work — and for some games it can be impossibly difficult
to compute best or near-best strategies for Alice and Bob. This is the mind-blowing
nature of the non-local games model.
The CHSH game

In the CHSH game, the referee chooses the questions x, y ∈ {0, 1} uniformly at
random and sends x to Alice and y to Bob, who respond with answers a, b ∈ {0, 1}.
Alice and Bob win when a and b satisfy the condition in the following table: the
answers must disagree when x = y = 1, and agree otherwise.

(x, y)     win      lose
(0, 0)     a = b    a ≠ b
(0, 1)     a = b    a ≠ b
(1, 0)     a = b    a ≠ b
(1, 1)     a ≠ b    a = b
Deterministic strategies

Let us first consider deterministic strategies, in which Alice's answer a is a function
a(x) of the question x, and Bob's answer b is a function b(y) of y. For such a strategy
to win in the first three cases of the table, it must be that a(0) = b(0), a(0) = b(1),
and a(1) = b(0), and these three equalities together imply that a(1) = b(1). This
implies that the strategy loses in the final case (x, y) = (1, 1), for here winning
requires that a(1) ̸= b(1). Thus, there can be no deterministic strategy that wins
every time.
On the other hand, it is easy to find deterministic strategies that win in three of
the four cases, such as a(0) = a(1) = b(0) = b(1) = 0. From this we conclude that
the maximum probability for Alice and Bob to win using a deterministic strategy
is 3/4.
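Because Alice's answer depends only on x and Bob's only on y, there are just 4 × 4 = 16 deterministic strategies, so this bound can be confirmed by brute force. A short sketch, using the winning condition from the table (the answers must agree except when x = y = 1):

```python
from itertools import product

def wins(a, b, x, y):
    # Winning condition from the table: agree unless x = y = 1.
    return (a[x] == b[y]) != (x == 1 and y == 1)

# a = (a(0), a(1)) and b = (b(0), b(1)) range over all deterministic strategies.
best = max(
    sum(wins(a, b, x, y) for x in (0, 1) for y in (0, 1))
    for a in product((0, 1), repeat=2)
    for b in product((0, 1), repeat=2)
)
print(best, "out of 4 cases")   # the maximum is 3, i.e., probability 3/4
```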
Probabilistic strategies
As we just concluded, Alice and Bob cannot do better than winning the CHSH
game 75% of the time using a deterministic strategy. But what about a probabilistic
strategy? Could it help Alice and Bob to use randomness — including the possibility
of shared randomness, where their random choices are correlated?
It turns out that probabilistic strategies don’t help at all to increase the probability
that Alice and Bob win. This is because every probabilistic strategy can alternatively
be viewed as a random selection of a deterministic strategy, just like probabilistic
operations can be viewed as random selections of deterministic operations. The
average is never larger than the maximum, and so it follows that probabilistic
strategies don’t offer any advantage in terms of their overall winning probability.
Thus, winning with probability 3/4 is the best that Alice and Bob can do using
any classical strategy, whether deterministic or probabilistic.
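The averaging argument can be illustrated concretely: the winning probability of any mixture of deterministic strategies is a weighted average of their individual winning probabilities, which never exceeds the deterministic maximum of 3/4. A small sketch, with an arbitrarily chosen random mixture:

```python
import random
from itertools import product

def win_prob(a, b):
    # Winning probability of the deterministic strategy (a, b) over the
    # four equally likely question pairs.
    return sum((a[x] == b[y]) != (x == 1 and y == 1)
               for x in (0, 1) for y in (0, 1)) / 4

strategies = [(a, b) for a in product((0, 1), repeat=2)
                     for b in product((0, 1), repeat=2)]

# A probabilistic strategy is a distribution over deterministic strategies;
# its winning probability is the corresponding weighted average.
random.seed(0)
weights = [random.random() for _ in strategies]
total = sum(weights)
weights = [w / total for w in weights]

mixed = sum(w * win_prob(a, b) for w, (a, b) in zip(weights, strategies))
assert mixed <= max(win_prob(a, b) for a, b in strategies) == 0.75
print("mixed strategy wins with probability", round(mixed, 3))
```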
The first thing we need to do is to define a qubit state vector |ψθ⟩, for each real
number θ (which we'll think of as an angle measured in radians), as follows:

|ψθ⟩ = cos(θ)|0⟩ + sin(θ)|1⟩.

[Figure 4.6: A quantum strategy in which Alice and Bob make use of a shared
entangled state |ψ⟩.]
We also have the following examples, which arise in the analysis below.
|ψ−π/8⟩ = (√(2+√2)/2)|0⟩ − (√(2−√2)/2)|1⟩

|ψπ/8⟩ = (√(2+√2)/2)|0⟩ + (√(2−√2)/2)|1⟩

|ψ3π/8⟩ = (√(2−√2)/2)|0⟩ + (√(2+√2)/2)|1⟩

|ψ5π/8⟩ = −(√(2−√2)/2)|0⟩ + (√(2+√2)/2)|1⟩
2 2
Looking at the general form, we see that the inner product between any two of
these vectors has this formula:

⟨ψα|ψβ⟩ = cos(α)cos(β) + sin(α)sin(β) = cos(α − β).
In detail, there are only real number entries in these vectors, so there are no complex
conjugates to worry about: the inner product is the product of the cosines plus the
product of the sines. Using one of the angle addition formulas from trigonometry
leads to the simplification above. This formula reveals the geometric interpretation
of the inner product between real unit vectors as the cosine of the angle between
them.
If we compute the inner product of the tensor product of any two of these vectors
with the |ϕ+⟩ state, we obtain a similar expression, except that it has a √2 in the
denominator:

⟨ψα ⊗ ψβ|ϕ+⟩ = (cos(α)cos(β) + sin(α)sin(β))/√2 = cos(α − β)/√2.   (4.4)
Our interest in this particular inner product will become clear shortly, but for now
we’re simply observing this as a formula.
Next, define a unitary matrix Uθ for each angle θ as follows:

Uθ = |0⟩⟨ψθ| + |1⟩⟨ψθ+π/2|.

Intuitively speaking, this matrix transforms |ψθ⟩ into |0⟩ and |ψθ+π/2⟩ into |1⟩. To
check that this is a unitary matrix, a key observation is that the vectors |ψθ⟩ and
|ψθ+π/2⟩ are orthogonal for every angle θ:

⟨ψθ|ψθ+π/2⟩ = cos(π/2) = 0.
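These properties can be checked numerically as well. The sketch below assumes the matrix form Uθ = |0⟩⟨ψθ| + |1⟩⟨ψθ+π/2| with |ψθ⟩ = cos(θ)|0⟩ + sin(θ)|1⟩, so each row of Uθ is one of the two orthogonal unit vectors:

```python
import numpy as np

def psi(theta):
    # |psi_theta> = cos(theta)|0> + sin(theta)|1>  (as defined above).
    return np.array([np.cos(theta), np.sin(theta)])

def U(theta):
    # Assumed matrix form: the rows of U_theta are psi_theta and
    # psi_{theta + pi/2}, i.e., U_theta = |0><psi_theta| + |1><psi_{theta+pi/2}|.
    return np.vstack([psi(theta), psi(theta + np.pi / 2)])

for theta in np.linspace(-np.pi, np.pi, 25):
    u = U(theta)
    assert np.allclose(u @ u.T, np.eye(2))        # real orthogonal, hence unitary
    assert np.allclose(u @ psi(theta), [1., 0.])  # |psi_theta> -> |0>
    assert np.allclose(u @ psi(theta + np.pi / 2), [0., 1.])
print("U_theta is unitary at all sampled angles")
```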
Strategy description
Set-up: Alice and Bob start the game sharing an e-bit: Alice holds a qubit A, Bob
holds a qubit B, and together the two qubits (A, B) are in the |ϕ+⟩ state.
Alice’s actions:
• If Alice receives the question x = 0, she applies U0 to her qubit A.
• If Alice receives the question x = 1, she applies Uπ/4 to her qubit A.
The operation Alice performs on A may alternatively be described like this:

U0     if x = 0
Uπ/4   if x = 1.
After Alice applies this operation, she measures A with a standard basis measure-
ment and sets her answer a to be the measurement outcome.
Bob’s actions:
• If Bob receives the question y = 0, he applies Uπ/8 to his qubit B.
• If Bob receives the question y = 1, he applies U−π/8 to his qubit B.
Like we did for Alice, we can express Bob's operation on B like this:

Uπ/8    if y = 0
U−π/8   if y = 1.
After Bob applies this operation, he measures B with a standard basis measurement
and sets his answer b to be the measurement outcome.
Figure 4.7 describes this strategy as a quantum circuit diagram. In this diagram
we see two ordinary controlled gates, one for U−π/8 on the top and one for Uπ/4 on
the bottom. We also have two gates that look like controlled gates, one for Uπ/8 on
the top and one for U0 on the bottom, except that the circle representing the control
is not filled in. This denotes a different type of controlled gate where the gate is
performed if the control is set to 0 (rather than 1 like an ordinary controlled gate).
So, effectively, Bob performs Uπ/8 on his qubit if y = 0 and U−π/8 if y = 1; and
Alice performs U0 on her qubit if x = 0 and Uπ/4 if x = 1, which is consistent with
the description of the protocol in words above.
It remains to figure out how well this strategy for Alice and Bob works. We’ll do
this by going through the four possible question pairs individually.
[Figure 4.7: Alice and Bob's strategy for the CHSH game as a quantum circuit
diagram, with controlled versions of U0 and Uπ/4 for Alice and of Uπ/8 and U−π/8
for Bob.]
Case-by-case analysis
Case 1: ( x, y) = (0, 0). In this case Alice performs U0 on her qubit and Bob per-
forms Uπ/8 on his, so the state of the two qubits (A, B) after they perform their
operations is
(U0 ⊗ Uπ/8)|ϕ+⟩ = |00⟩⟨ψ0 ⊗ ψπ/8|ϕ+⟩ + |01⟩⟨ψ0 ⊗ ψ5π/8|ϕ+⟩
                + |10⟩⟨ψπ/2 ⊗ ψπ/8|ϕ+⟩ + |11⟩⟨ψπ/2 ⊗ ψ5π/8|ϕ+⟩.

Evaluating the inner products using equation (4.4), the probabilities for the four
possible answer pairs (a, b) are as follows.

Pr((a, b) = (0, 0)) = (1/2)cos²(π/8) = (2 + √2)/8
Pr((a, b) = (0, 1)) = (1/2)cos²(5π/8) = (2 − √2)/8
Pr((a, b) = (1, 0)) = (1/2)cos²(3π/8) = (2 − √2)/8
Pr((a, b) = (1, 1)) = (1/2)cos²(π/8) = (2 + √2)/8

For the question pair (0, 0), Alice and Bob win if a = b, and therefore they win in
this case with probability

Pr(a = b) = (2 + √2)/4.
Case 2: ( x, y) = (0, 1). In this case Alice performs U0 on her qubit and Bob per-
forms U−π/8 on his, so the state of the two qubits (A, B) after they perform their
operations is
(U0 ⊗ U−π/8)|ϕ+⟩ = |00⟩⟨ψ0 ⊗ ψ−π/8|ϕ+⟩ + |01⟩⟨ψ0 ⊗ ψ3π/8|ϕ+⟩
                 + |10⟩⟨ψπ/2 ⊗ ψ−π/8|ϕ+⟩ + |11⟩⟨ψπ/2 ⊗ ψ3π/8|ϕ+⟩.
The probabilities for the four possible answer pairs (a, b) are therefore as follows.

Pr((a, b) = (0, 0)) = (1/2)cos²(π/8) = (2 + √2)/8
Pr((a, b) = (0, 1)) = (1/2)cos²(−3π/8) = (2 − √2)/8
Pr((a, b) = (1, 0)) = (1/2)cos²(5π/8) = (2 − √2)/8
Pr((a, b) = (1, 1)) = (1/2)cos²(π/8) = (2 + √2)/8
We find, once again, that the probabilities that a = b and a ≠ b are as follows:

Pr(a = b) = (2 + √2)/4,    Pr(a ≠ b) = (2 − √2)/4.

For the question pair (0, 1), Alice and Bob win if a = b, so they win in this case with
probability (2 + √2)/4.

Case 3: (x, y) = (1, 0). In this case Alice performs Uπ/4 on her qubit and Bob
performs Uπ/8 on his. A computation similar to the previous two cases shows that
the answer probabilities are the same as before, so that

Pr(a = b) = (2 + √2)/4,    Pr(a ≠ b) = (2 − √2)/4.

For the question pair (1, 0), Alice and Bob win if a = b, so they win in this case with
probability (2 + √2)/4.
Case 4: ( x, y) = (1, 1). The last case is a little bit different, as we might expect
because the winning condition is different in this case. When x and y are both 1,
Alice and Bob win when a and b are different. In this case Alice performs Uπ/4 on
her qubit and Bob performs U−π/8 on his, so the state of the two qubits (A, B) after
they perform their operations is
(Uπ/4 ⊗ U−π/8)|ϕ+⟩ = |00⟩⟨ψπ/4 ⊗ ψ−π/8|ϕ+⟩ + |01⟩⟨ψπ/4 ⊗ ψ3π/8|ϕ+⟩
                   + |10⟩⟨ψ3π/4 ⊗ ψ−π/8|ϕ+⟩ + |11⟩⟨ψ3π/4 ⊗ ψ3π/8|ϕ+⟩

= (cos(3π/8)|00⟩ + cos(−π/8)|01⟩ + cos(7π/8)|10⟩ + cos(3π/8)|11⟩)/√2.
The probabilities for the four possible answer pairs (a, b) are therefore as follows.

Pr((a, b) = (0, 0)) = (1/2)cos²(3π/8) = (2 − √2)/8
Pr((a, b) = (0, 1)) = (1/2)cos²(−π/8) = (2 + √2)/8
Pr((a, b) = (1, 0)) = (1/2)cos²(7π/8) = (2 + √2)/8
Pr((a, b) = (1, 1)) = (1/2)cos²(3π/8) = (2 − √2)/8
The probabilities have effectively swapped places relative to the three other cases.
We obtain the probabilities that a = b and a ≠ b by summing:

Pr(a = b) = (2 − √2)/4,    Pr(a ≠ b) = (2 + √2)/4.
For the question pair (1, 1), Alice and Bob win if a ≠ b, and therefore they win in
this case with probability (2 + √2)/4.
They win in every case with the same probability:

(2 + √2)/4 ≈ 0.85.
This is therefore the probability that they win overall. That’s significantly better
than any classical strategy can do for this game; classical strategies have winning
probability bounded by 3/4. And that makes this a very interesting example.
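The entire case-by-case analysis can be compressed into a few lines of NumPy: for each question pair, apply Uα ⊗ Uβ to |ϕ+⟩, read off Pr(a = b) from the squared amplitudes, and average the winning probability over the four uniformly random question pairs. The sketch below assumes the definitions of |ψθ⟩ and Uθ given above, together with Alice and Bob's angle choices.

```python
import numpy as np

def psi(theta):
    return np.array([np.cos(theta), np.sin(theta)])

def U(theta):
    # Assumed matrix form of U_theta, as above.
    return np.vstack([psi(theta), psi(theta + np.pi / 2)])

phi_plus = np.array([1., 0., 0., 1.]) / np.sqrt(2)
alpha = {0: 0.0, 1: np.pi / 4}        # Alice's angle for question x
beta = {0: np.pi / 8, 1: -np.pi / 8}  # Bob's angle for question y

total = 0.0
for x in (0, 1):
    for y in (0, 1):
        state = np.kron(U(alpha[x]), U(beta[y])) @ phi_plus
        probs = state ** 2                # the amplitudes are real here
        p_equal = probs[0] + probs[3]     # Pr(a = b): outcomes 00 and 11
        p_win = 1 - p_equal if (x, y) == (1, 1) else p_equal
        total += p_win / 4                # questions are uniformly random

print(total, (2 + np.sqrt(2)) / 4)        # both approximately 0.8536
```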
This happens to be the optimal winning probability for quantum strategies. That
is, we can’t do any better than this, no matter what entangled state or measurements
we choose. This fact is known as Tsirelson’s inequality, named for Boris Tsirelson
who first proved it — and who first described the CHSH experiment as a game.
Geometric picture
It is possible to think about the strategy described above geometrically, which may
be helpful for understanding the relationships among the various angles chosen for
Alice and Bob’s operations.
What Alice effectively does is to choose an angle α, depending on her question x,
and then to apply Uα to her qubit and measure. Similarly, Bob chooses an angle β,
depending on y, and then he applies Uβ to his qubit and measures. We’ve chosen α
and β like so:

α = 0 if x = 0, and α = π/4 if x = 1;
β = π/8 if y = 0, and β = −π/8 if y = 1.
For the moment, though, let’s take α and β to be arbitrary. By choosing α, Alice
effectively defines an orthonormal basis of vectors as is shown in Figure 4.8. Bob
does likewise, except that his angle is β, as illustrated in Figure 4.9. The colors of
the vectors correspond to Alice and Bob’s answers: blue for 0 and red for 1.
[Figures 4.8 and 4.9: The orthonormal bases {|ψα⟩, |ψα+π/2⟩} and {|ψβ⟩, |ψβ+π/2⟩}
determined by Alice and Bob's choices of the angles α and β.]

The key formula for the analysis is

⟨ψα ⊗ ψβ|ϕ+⟩ = (1/√2)⟨ψα|ψβ⟩,
which works for all real numbers α and β.
[Figures 4.10–4.12: The bases chosen by Alice and Bob for the question pairs
(x, y) = (0, 0), (0, 1), and (1, 0), involving the vectors |ψ0⟩, |ψπ/8⟩, |ψπ/4⟩, |ψ3π/8⟩,
|ψπ/2⟩, |ψ5π/8⟩, and |ψ3π/4⟩; in each of these cases the angle between vectors of
the same color is π/8.]

Following the same sort of analysis that we went through above, but with α and
β being variables, we find this:

(Uα ⊗ Uβ)|ϕ+⟩ = (cos(α − β)|00⟩ + sin(α − β)|01⟩ − sin(α − β)|10⟩ + cos(α − β)|11⟩)/√2.

Consequently, the probability that Alice and Bob's answers agree is cos²(α − β),
and the probability that they disagree is sin²(α − β).

[Figure 4.13: The bases chosen by Alice and Bob for the question pair (x, y) = (1, 1),
involving the vectors |ψπ/4⟩, |ψ3π/4⟩, |ψ−π/8⟩, and |ψ3π/8⟩.]
When ( x, y) = (1, 1), Alice and Bob choose α = π/4 and β = −π/8. This
results in the bases shown in Figure 4.13, which reveals that something different has
happened. By the way the angles were chosen, this time the angle between vectors
having the same color is 3π/8 rather than π/8. The probability that Alice and Bob’s
outcomes agree is still the cosine-squared of this angle, but this time the value is
cos²(3π/8) = (2 − √2)/4.

The probability the outcomes disagree is the sine-squared of this angle, which in
this case is this:

sin²(3π/8) = (2 + √2)/4.
Remarks
The basic idea of an experiment like the CHSH game, where entanglement leads
to statistical results that are inconsistent with purely classical reasoning, is due to
John Bell, the namesake of the Bell states. For this reason, people often refer to
experiments of this sort as Bell tests. Sometimes people also refer to Bell’s theorem,
which can be formulated in different ways — but the essence of it is that quantum
mechanics is not compatible with so-called local hidden variable theories. The CHSH
game is a particularly clean and simple example of a Bell test, and can be viewed as
a proof, or demonstration, of Bell’s theorem.
The CHSH game offers a way to experimentally test the theory of quantum
information. Experiments can be performed that implement the CHSH game, and
test the sorts of strategies based on entanglement described above. This provides
us with a high degree of confidence that entanglement is real — and unlike the
sometimes vague or poetic ways that we come up with to explain entanglement, the
CHSH game gives us a concrete and testable way to observe entanglement. The 2022
Nobel Prize in Physics acknowledges the importance of this line of work: the prize
was awarded to Alain Aspect, John Clauser (the C in CHSH) and Anton Zeilinger
for observing entanglement through Bell tests on entangled photons.
Unit II
Fundamentals of
Quantum Algorithms
Quantum Query Algorithms
In this first lesson of the unit, we’ll formulate a simple algorithmic framework —
known as the query model — and explore the advantages that quantum computers
offer within this framework.
The query model of computation is like a Petri dish for quantum algorithmic
ideas. It’s rigid and unnatural in the sense that it doesn’t accurately represent the
sorts of computational problems we generally care about in practice, but it has never-
theless proved to be incredibly useful as a tool for developing quantum algorithmic
techniques. This includes the ones that power the most well-known quantum
algorithms, such as Shor’s algorithm for integer factorization. The query model
also happens to be a very useful framework for explaining quantum algorithmic
techniques.
After introducing the query model itself, we’ll discuss the very first quantum
algorithm that was discovered, which is Deutsch’s algorithm, along with an extension
of Deutsch’s algorithm known as the Deutsch–Jozsa algorithm. These algorithms
demonstrate quantifiable advantages of quantum over classical computers within
the context of the query model. We’ll then discuss a quantum algorithm known as
Simon’s algorithm, which offers a more robust and satisfying advantage of quantum
over classical computations, for reasons that will be explained when we get to it.
While it is true that the computers we use today continuously receive input
and produce output, essentially interacting with both us and with other computers
in a way not reflected by the figure, the intention is not to represent the ongoing
operation of computers. Rather, it is to create a simple abstraction of computation,
focusing on isolated computational tasks. For example, the input might encode
a number, a vector, a matrix, a graph, a description of a molecule, or something
more complicated, while the output encodes a solution to the computational task
we have in mind.
The key point is that the input is provided to the computation, usually in the
form of a binary string, with no part of it being hidden.
[Figure 5.2: In the query model, the input is made available in the form of a
function that the computation accesses by making queries.]

In the query model of computation, on the other hand, the entire input is not
provided to the computation up front; rather, the input is made available in the
form of a function, which the computation accesses by making queries. We'll be
concerned with a variety of computational problems, with some simple examples
described shortly, but for all
of them the input will be represented by a function taking the form
f : Σn → Σm
for two positive integers n and m. Naturally, we could choose a different name in
place of f , but we’ll stick with f throughout the lesson.
To say that a computation makes a query means that some string x ∈ Σn is
selected, and then the string f ( x ) ∈ Σm is made available to the computation by the
oracle. The precise way that this works for quantum algorithms will be discussed
shortly — we need to make sure that this is possible to do with a unitary quantum
operation allowing queries to be made in superposition — but for now we can think
about it intuitively at a high level.
Finally, the way that we’ll measure efficiency of query algorithms is simple:
we’ll count the number of queries they require. This is related to the time required
to perform a computation, but it’s not exactly the same because we’re ignoring the
time for operations other than the queries, and we’re also treating the queries as
if they each have unit cost. We can take the operations besides the queries into
account if we wish (and this is sometimes done), but restricting our attention just to
the number of queries helps to keep things simple.
Or: The input function takes the form f : Σn → Σ (so m = 1 for this problem). The
task is to output 1 if there exists a string x ∈ Σn for which f ( x ) = 1, and to output 0
if there is no such string. If we think about the function f as representing a sequence
of 2n bits to which we have random access, the problem is to compute the OR of
these bits.
Parity: The input function again takes the form f : Σn → Σ. The task is to determine
whether the number of strings x ∈ Σn for which f ( x ) = 1 is even or odd. To be
precise, the required output is 0 if the set { x ∈ Σn : f ( x ) = 1} has an even number
of elements and 1 if it has an odd number of elements. If we think about the
function f as representing a sequence of 2n bits to which we have random access,
the problem is to compute the parity (or exclusive-OR) of these bits.
Minimum: The input function takes the form f : Σn → Σm for any choices of
positive integers n and m. The required output is the string y ∈ { f ( x ) : x ∈ Σn }
that comes first in the lexicographic (i.e., dictionary) ordering of Σm . If we think
about the function f as representing a sequence of 2n integers encoded as strings
of length m in binary notation to which we have random access, the problem is to
compute the minimum of these integers.
Unique search: The input function takes the form f : Σn → Σ, and we are promised
that there is exactly one string z ∈ Σn for which f (z) = 1, with f ( x ) = 0 for all
strings x ̸= z. The task is to find this unique string z.
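For concreteness, here are straightforward classical implementations of the four example problems, with n-bit strings encoded as integers from 0 to 2ⁿ − 1 (so that the lexicographic order on fixed-length binary strings coincides with numeric order). Each one queries all 2ⁿ inputs, which is the baseline against which query algorithms are measured.

```python
# Exhaustive classical solutions; f maps integers in range(2**n) to bits
# (or, for Minimum, to integers representing length-m binary strings).
def or_problem(f, n):
    return int(any(f(x) for x in range(2 ** n)))

def parity_problem(f, n):
    return sum(f(x) for x in range(2 ** n)) % 2

def minimum_problem(f, n):
    return min(f(x) for x in range(2 ** n))

def unique_search(f, n):
    # Relies on the promise that exactly one string z satisfies f(z) = 1.
    return next(x for x in range(2 ** n) if f(x) == 1)

f = lambda x: int(x == 5)   # a function satisfying the unique search promise
print(or_problem(f, 3), parity_problem(f, 3), unique_search(f, 3))   # -> 1 1 5
```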
All four of the examples just described are natural, in the sense that they’re easy
to describe and we can imagine a variety of situations or contexts in which they
might arise. In contrast, some query problems aren’t “natural” like this at all. In
fact, in the study of the query model, we sometimes come up with very complicated
and highly contrived problems where it’s difficult to imagine that anyone would
ever actually want to solve them in practice. This doesn’t mean that the problems
aren’t interesting, though! Things that might seem contrived or unnatural at first
can provide unexpected clues or inspire new ideas. Shor’s quantum algorithm for
factoring, which was inspired by Simon’s algorithm, is a great example. It’s also an
important part of the study of the query model to look for extremes, which can shed
light on both the potential advantages and the limitations of quantum computing.
Query gates
When we’re describing computations with circuits, queries are made by special
gates called query gates.
The simplest way to define query gates for classical Boolean circuits is to allow
them to compute the input function f directly, as Figure 5.3 suggests.
Figure 5.3: A classical query gate, which maps an input x directly to f(x).
When a Boolean circuit is created for a query problem, the input function f is
accessed through these gates, and the number of queries that the circuit makes is
simply the number of query gates that appear in the circuit. The input wires of the
Boolean circuit itself are initialized to fixed values, which should be considered as
part of the algorithm (as opposed to being inputs to the problem).
For example, Figure 5.4 describes a Boolean circuit with classical query gates
that solves the parity problem described above for a function of the form f : Σ → Σ.
This algorithm makes two queries because there are two query gates. The way it
works is that the function f is queried on the two possible inputs, 0 and 1, and the
results are plugged into a Boolean circuit that computes the XOR. This particular
circuit appeared as an example of a Boolean circuit in Lesson 3 (Quantum Circuits).
Figure 5.4: A Boolean circuit that solves the parity problem for a function f : Σ → Σ.
For quantum circuits, this definition of query gates doesn’t work, because these
gates will be non-unitary for some choices of the function f . So, what we do instead
is to define unitary query gates that operate on standard basis states as shown in
Figure 5.5.
U_f |y⟩|x⟩ = |y ⊕ f(x)⟩|x⟩

Figure 5.5: The action of a unitary query gate U_f on standard basis inputs.

For every function f, the gate U_f defined in this way is a permutation matrix,
meaning a matrix whose entries are all 0 or 1, with exactly one 1 in each row and
each column. The action of a permutation matrix on any vector simply shuffles the
entries of that vector (hence the name permutation
matrix), and therefore does not change that vector's Euclidean norm — revealing
that permutation matrices are always unitary.
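To make this concrete, here is a small Python sketch (my own illustration, not part of the text) that builds the matrix of U_f for a one-bit function f, indexing the basis state |y⟩|x⟩ as 2y + x, and checks the permutation-matrix property:

```python
def query_matrix(f):
    """Matrix of U_f for f : Σ → Σ, acting as |y⟩|x⟩ → |y ⊕ f(x)⟩|x⟩."""
    U = [[0] * 4 for _ in range(4)]
    for x in range(2):
        for y in range(2):
            col = 2 * y + x              # input basis state |y⟩|x⟩
            row = 2 * (y ^ f(x)) + x     # output basis state |y ⊕ f(x)⟩|x⟩
            U[row][col] = 1
    return U

def is_permutation_matrix(U):
    """Exactly one 1 in every row and every column."""
    rows_ok = all(sum(row) == 1 for row in U)
    cols_ok = all(sum(U[i][j] for i in range(4)) == 1 for j in range(4))
    return rows_ok and cols_ok

# For f(x) = x, the gate U_f is a CNOT; for any f it is a permutation matrix.
print(is_permutation_matrix(query_matrix(lambda x: x)))   # True
```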
It should be highlighted that, when we analyze query algorithms by simply
counting the number of queries that a query algorithm makes, we’re completely
ignoring the difficulty of physically constructing the query gates — for both the
classical and quantum versions just described. Intuitively speaking, the construction
of the query gates is part of the preparation of the input, not part of finding a
solution.
That might seem unreasonable, but we must keep in mind that we’re not try-
ing to describe practical computing or fully account for the resources required.
Rather, we’re defining a theoretical model that helps to shed light on the potential
advantages of quantum computing. We’ll have more to say about this point in the
lesson following this one when we turn our attention to a more standard model of
computation where inputs are given explicitly to circuits as binary strings.
The four functions of the form f : Σ → Σ are as follows.

a | f1(a)     a | f2(a)     a | f3(a)     a | f4(a)
0 |   0       0 |   0       0 |   1      0 |   1
1 |   0       1 |   1       1 |   0      1 |   1
The first and last of these functions are constant and the middle two are balanced,
meaning that the two possible output values for the function occur the same number
of times as we range over the inputs. Deutsch’s problem is to determine which of
these two categories the input function belongs to: constant or balanced.
Deutsch's problem
Input: A function f : Σ → Σ.
Output: 0 if f is constant, 1 if f is balanced.

Each of the four possible input functions can be identified with the two-bit string f(0) f(1):

function | string f(0) f(1)
f1       | 00
f2       | 01
f3       | 10
f4       | 11
When viewed in this way, Deutsch’s problem is to compute the parity (or, equiva-
lently, the exclusive-OR) of the two bits.
Every classical query algorithm that correctly solves this problem must query
both bits: f (0) and f (1). If we learn that f (1) = 1, for instance, the answer could
still be 0 or 1, depending on whether f (0) = 1 or f (0) = 0, respectively. Every other
case is similar; knowing just one of two bits doesn’t provide any information at all
about their parity. So, the Boolean circuit described in the previous section is the
best we can do in terms of the number of queries required to solve this problem.
Figure 5.6: A quantum circuit for Deutsch's algorithm. The top qubit is initialized
to |0⟩ and the bottom qubit to |1⟩; Hadamard gates are applied before and after the
query gate U_f, and measuring the top qubit gives 0 if f is constant and 1 if f is
balanced.
Analysis
To analyze Deutsch’s algorithm, we will trace through the action of the circuit above
and identify the states of the qubits at the times suggested by Figure 5.7.
Figure 5.7: Three states |π1 ⟩, |π2 ⟩, and |π3 ⟩ considered in the analysis of Deutsch’s
algorithm.
The initial state is |1⟩|0⟩, and the two Hadamard operations on the left-hand
side of the circuit transform this state to
|π1⟩ = |−⟩|+⟩ = 1/2 (|0⟩ − |1⟩)|0⟩ + 1/2 (|0⟩ − |1⟩)|1⟩.
(As always, we’re following Qiskit’s qubit ordering convention, which puts the top
qubit to the right and the bottom qubit to the left.)
Next, the U f gate is performed. According to the definition of the U f gate, the
value of the function f for the classical state of the top/rightmost qubit is XORed
onto the bottom/leftmost qubit, which transforms |π1 ⟩ into the state
|π2⟩ = 1/2 (|0 ⊕ f(0)⟩ − |1 ⊕ f(0)⟩)|0⟩ + 1/2 (|0 ⊕ f(1)⟩ − |1 ⊕ f(1)⟩)|1⟩.
We can simplify this expression by observing that the formula

|0 ⊕ a⟩ − |1 ⊕ a⟩ = (−1)^a (|0⟩ − |1⟩)

works for both possible values a ∈ Σ. More explicitly, the two cases are as follows.

|0 ⊕ 0⟩ − |1 ⊕ 0⟩ = |0⟩ − |1⟩ = (−1)^0 (|0⟩ − |1⟩)
|0 ⊕ 1⟩ − |1 ⊕ 1⟩ = |1⟩ − |0⟩ = (−1)^1 (|0⟩ − |1⟩)

Thus, we can alternatively express |π2⟩ like this:

|π2⟩ = 1/2 (−1)^{f(0)} (|0⟩ − |1⟩)|0⟩ + 1/2 (−1)^{f(1)} (|0⟩ − |1⟩)|1⟩
     = |−⟩ ( (−1)^{f(0)} |0⟩ + (−1)^{f(1)} |1⟩ ) / √2.
Something interesting just happened! Although the action of the U f gate on
standard basis states leaves the top/rightmost qubit alone and XORs the function
value onto the bottom/leftmost qubit, here we see that the state of the top/rightmost
qubit has changed (in general) while the state of the bottom/leftmost qubit remains
the same — specifically being in the |−⟩ state before and after the U f gate is
performed. This phenomenon is known as the phase kickback, and we will have more
to say about it shortly.
With one final simplification, which is to pull the factor of (−1)^{f(0)} outside of
the sum, we obtain this expression of the state |π2⟩:

|π2⟩ = (−1)^{f(0)} |−⟩ ( |0⟩ + (−1)^{f(0)⊕f(1)} |1⟩ ) / √2.

Notice that in this expression, we have f(0) ⊕ f(1) in the exponent of −1 as opposed
to f(1) − f(0), which is what we might expect from a purely algebraic viewpoint,
but we obtain the same result either way. This is because the value (−1)^k for any
integer k depends only on whether k is even or odd.
Applying the final Hadamard gate to the top qubit leaves us with the state

|π3⟩ = (−1)^{f(0)} |−⟩|0⟩   if f(0) ⊕ f(1) = 0
       (−1)^{f(0)} |−⟩|1⟩   if f(0) ⊕ f(1) = 1,

which leads to the correct outcome with probability 1 when the top/rightmost qubit
is measured.
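The whole analysis can be double-checked numerically. The sketch below (my own illustration; the helper names are hypothetical) simulates Deutsch's circuit with plain 4-dimensional state vectors, indexing the basis state |y⟩|x⟩ as 2y + x:

```python
import math

def deutsch_output(f):
    """Simulate Deutsch's circuit for f : {0,1} -> {0,1}; return the
    measurement outcome of the top qubit (0 = constant, 1 = balanced)."""
    h = 1 / math.sqrt(2)

    def hadamard(state, qubit):   # qubit 0 = top/rightmost, 1 = bottom/leftmost
        out = [0.0] * 4
        for i, a in enumerate(state):
            bit = (i >> qubit) & 1
            out[i & ~(1 << qubit)] += h * a                  # |0> component
            out[i | (1 << qubit)] += h * a * (-1) ** bit     # |1> component
        return out

    s = [0.0, 0.0, 1.0, 0.0]      # initial state |1>|0> at index 2
    s = hadamard(s, 0)
    s = hadamard(s, 1)
    t = [0.0] * 4                  # query gate: |y>|x> -> |y XOR f(x)>|x>
    for i, a in enumerate(s):
        x, y = i & 1, (i >> 1) & 1
        t[2 * (y ^ f(x)) + x] += a
    s = hadamard(t, 0)
    return round(s[1] ** 2 + s[3] ** 2)   # probability the top qubit reads 1

# constant, balanced, balanced, constant
print([deutsch_output(f) for f in
       (lambda x: 0, lambda x: x, lambda x: 1 - x, lambda x: 1)])   # [0, 1, 1, 0]
```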
Let's now examine the phase kickback phenomenon more closely. First, notice that the following formula works for all choices of bits b, c ∈ Σ.
|b ⊕ c⟩ = X c |b⟩
This can be verified by checking it for the two possible values c = 0 and c = 1:
| b ⊕ 0⟩ = | b ⟩ = I| b ⟩ = X 0 | b ⟩
|b ⊕ 1⟩ = |¬b⟩ = X |b⟩ = X 1 |b⟩.
Using this formula, we see that

U_f (|b⟩|a⟩) = |b ⊕ f(a)⟩|a⟩ = (X^{f(a)} |b⟩)|a⟩

for every choice of bits a, b ∈ Σ. Because this formula is true for b = 0 and b = 1,
we see by linearity that

U_f (|ψ⟩|a⟩) = (X^{f(a)} |ψ⟩)|a⟩

for every qubit state vector |ψ⟩. In particular, taking |ψ⟩ = |−⟩ and using the
observation that

X |−⟩ = −|−⟩,

we find that U_f (|−⟩|a⟩) = (−1)^{f(a)} |−⟩|a⟩: the |−⟩ state of the bottom qubit is
unchanged, while the function value appears as a phase on the top qubit.
The Deutsch–Jozsa algorithm

Figure 5.8: The quantum circuit for the Deutsch–Jozsa algorithm. The top n qubits
are each initialized to |0⟩ and the bottom qubit to |1⟩; Hadamard gates are applied
to every qubit, the query gate U_f is performed, a second layer of Hadamard gates
is applied to the top n qubits, and those qubits are measured to produce a string
y ∈ Σ^n.
Notice that, when n is larger than 1, there are functions of the form f : Σ^n → Σ
that are neither constant nor balanced. For example, the function f : Σ^2 → Σ
defined as
f (00) = 0
f (01) = 0
f (10) = 0
f (11) = 1
falls into neither of these two categories. For the Deutsch–Jozsa problem, we simply
don’t worry about functions like this — they’re considered to be “don’t care” inputs.
That is, for this problem we have a promise that f is either constant or balanced.
The Deutsch–Jozsa algorithm, with its single query, solves this problem in the
following sense: if every one of the n measurement outcomes is 0, then the function
f is constant; and otherwise, if at least one of the measurement outcomes is 1, then
the function f is balanced. Another way to say this is that the circuit described above
is followed by a classical post-processing step in which the OR of the measurement
outcomes is computed to produce the output.
Algorithm analysis
Recall that the action of a Hadamard gate can be described by a 2 × 2 matrix, but
we can also express this operation in terms of its action on standard basis states:
H|0⟩ = (1/√2) |0⟩ + (1/√2) |1⟩
H|1⟩ = (1/√2) |0⟩ − (1/√2) |1⟩.

These two equations can be combined into a single formula,

H|a⟩ = (1/√2) |0⟩ + (1/√2) (−1)^a |1⟩ = (1/√2) Σ_{b∈{0,1}} (−1)^{ab} |b⟩,

which works for both choices of a ∈ Σ. When a Hadamard gate is applied to each of
n qubits, the combined operation H^{⊗n} therefore acts on standard basis states as
follows.
H^{⊗n} |x_{n−1} ··· x_1 x_0⟩
= H|x_{n−1}⟩ ⊗ ··· ⊗ H|x_0⟩
= ( (1/√2) Σ_{y_{n−1}∈Σ} (−1)^{x_{n−1} y_{n−1}} |y_{n−1}⟩ ) ⊗ ··· ⊗ ( (1/√2) Σ_{y_0∈Σ} (−1)^{x_0 y_0} |y_0⟩ )
= (1/√(2^n)) Σ_{y_{n−1}···y_0 ∈ Σ^n} (−1)^{x_{n−1} y_{n−1} + ··· + x_0 y_0} |y_{n−1} ··· y_0⟩.
Here, by the way, we’re writing binary strings of length n as xn−1 · · · x0 and
yn−1 · · · y0 , following Qiskit’s indexing convention. This formula provides us with
a useful tool for analyzing the quantum circuit above.
After the first layer of Hadamard gates is performed, the state of the n + 1 qubits
(including the leftmost/bottom qubit, which is treated separately from the rest) is

(H|1⟩) ⊗ (H^{⊗n} |0 ··· 0⟩) = |−⟩ ⊗ (1/√(2^n)) Σ_{x_{n−1}···x_0 ∈ Σ^n} |x_{n−1} ··· x_0⟩.

The query gate is then performed, which transforms the state into

|−⟩ ⊗ (1/√(2^n)) Σ_{x_{n−1}···x_0 ∈ Σ^n} (−1)^{f(x_{n−1}···x_0)} |x_{n−1} ··· x_0⟩
through exactly the same phase kickback phenomenon that we saw in the analysis
of Deutsch’s algorithm. Then the second layer of Hadamard gates is performed,
which (by the formula above) transforms this state into
|−⟩ ⊗ (1/2^n) Σ_{x_{n−1}···x_0 ∈ Σ^n} Σ_{y_{n−1}···y_0 ∈ Σ^n} (−1)^{f(x_{n−1}···x_0) + x_{n−1}y_{n−1} + ··· + x_0y_0} |y_{n−1} ··· y_0⟩.
This expression looks somewhat complicated, and not too much can be con-
cluded about the probabilities to obtain different measurement outcomes without
knowing more about the function f . Fortunately, all we need to know is the prob-
ability that every one of the measurement outcomes is 0 — because that’s the
probability that the algorithm determines that f is constant. This probability has a
simple formula.
| (1/2^n) Σ_{x_{n−1}···x_0 ∈ Σ^n} (−1)^{f(x_{n−1}···x_0)} |² = 1 if f is constant
                                                              0 if f is balanced
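This formula is easy to spot-check numerically; the snippet below (my own illustration) encodes n-bit strings as integers:

```python
def dj_prob_all_zero(f, n):
    """The probability that every measurement outcome is 0: the squared
    average of (-1)**f(x) over all n-bit inputs x."""
    return (sum((-1) ** f(x) for x in range(2 ** n)) / 2 ** n) ** 2

constant = lambda x: 1
balanced = lambda x: bin(x).count("1") % 2   # parity of bits: balanced for n >= 1
print(dj_prob_all_zero(constant, 4), dj_prob_all_zero(balanced, 4))   # 1.0 0.0
```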
Classical difficulty
The Deutsch–Jozsa algorithm works every time, always giving us the correct answer
when the promise is met, and requires a single query. How does this compare with
classical query algorithms for the Deutsch–Jozsa problem?
First, any deterministic classical algorithm that correctly solves the Deutsch–Jozsa
problem must make exponentially many queries: 2^{n−1} + 1 queries are required in
the worst case. The reasoning is that, if a deterministic algorithm queries f on 2^{n−1}
or fewer different strings, and obtains the same function value every time, then both
answers are still possible. The function might be constant, or it might be balanced
but through bad luck the queries all happen to return the same function value.
The second possibility might seem unlikely — but for deterministic algorithms
there’s no randomness or uncertainty, so they will fail systematically on certain
functions. We therefore have a significant advantage of quantum over classical
algorithms in this regard.
There is a catch, however, which is that probabilistic classical algorithms can solve
the Deutsch–Jozsa problem with very high probability using just a few queries. In
particular, if we simply choose a few different strings of length n randomly, and
query f on those strings, it’s unlikely that we’ll get the same function value for all
of them when f is balanced.
To be specific, if we choose k input strings x_1, …, x_k ∈ Σ^n uniformly at random,
evaluate f(x_1), …, f(x_k), and answer 0 if the function values are all the same, and 1
if not, then we'll always be correct when f is constant, and wrong in the case that f
is balanced with probability just 2^{−k+1}. If we take k = 11, for instance, this algorithm
will answer correctly with probability greater than 99.9%.
For this reason, we do still have a rather modest advantage of quantum over
classical algorithms — but it is nevertheless a quantifiable advantage representing
an improvement over Deutsch’s algorithm.
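The random-sampling strategy just described can be sketched in Python as follows (an illustration under the stated assumptions; the names are my own):

```python
import random

def classical_dj(f, n, k=11):
    """Answer 0 (constant) if k random queries all agree, else 1 (balanced).
    Wrong only when f is balanced yet all k values agree: probability 2**(1-k)."""
    samples = {f(random.randrange(2 ** n)) for _ in range(k)}
    return 0 if len(samples) == 1 else 1

balanced = lambda x: x & 1          # a balanced function on 4-bit inputs
trials = 10_000
errors = sum(classical_dj(balanced, 4) == 0 for _ in range(trials))
print(errors / trials)              # close to 2**(-10), about 0.001
```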
The Bernstein–Vazirani problem

Next, we'll turn to a problem first considered by Bernstein and Vazirani. To describe
it, we need a bit of notation: for binary strings x = x_{n−1} ··· x_0 and y = y_{n−1} ··· y_0
of length n, define

x · y = x_{n−1} y_{n−1} ⊕ ··· ⊕ x_0 y_0.

We'll refer to this operation as the binary dot product. An alternative way to define it
is like so.

x · y = 1 if x_{n−1} y_{n−1} + ··· + x_0 y_0 is odd
        0 if x_{n−1} y_{n−1} + ··· + x_0 y_0 is even
Notice that this is a symmetric operation, meaning that the result doesn’t change
if we swap x and y, so we’re free to do that whenever it’s convenient. Sometimes
it’s useful to think about the binary dot product x · y as being the parity of the bits
of x in positions where the string y has a 1, or equivalently, the parity of the bits of
y in positions where the string x has a 1.
With this notation in hand we can now define the Bernstein–Vazirani problem.
Bernstein–Vazirani problem
Input: A function f : Σ^n → Σ.
Promise: There exists a string s ∈ Σ^n such that f(x) = s · x for every x ∈ Σ^n.
Output: The string s.
We don’t actually need a new quantum algorithm for this problem; the Deutsch–
Jozsa algorithm solves it. In the interest of clarity, let’s refer to the quantum circuit
from above, which doesn’t include the classical post-processing step of computing
the OR, as the Deutsch–Jozsa circuit.
Algorithm analysis
To analyze how the Deutsch–Jozsa circuit works for a function satisfying the promise
for the Bernstein–Vazirani problem, we’ll begin with a quick observation. Using the
binary dot product, we can alternatively describe the action of n Hadamard gates
on the standard basis states of n qubits as follows.
H^{⊗n} |x⟩ = (1/√(2^n)) Σ_{y∈Σ^n} (−1)^{x·y} |y⟩
Similar to what we saw when analyzing Deutsch’s algorithm, this is because the
value (−1)k for any integer k depends only on whether k is even or odd.
Turning to the Deutsch–Jozsa circuit, after the first layer of Hadamard gates is
performed, the state of the n + 1 qubits is
|−⟩ ⊗ (1/√(2^n)) Σ_{x∈Σ^n} |x⟩.
The query gate is then performed, which (through the phase kickback phenomenon)
transforms the state into
|−⟩ ⊗ (1/√(2^n)) Σ_{x∈Σ^n} (−1)^{f(x)} |x⟩.
Using our formula for the action of a layer of Hadamard gates, we see that the
second layer of Hadamard gates then transforms this state into
|−⟩ ⊗ (1/2^n) Σ_{x∈Σ^n} Σ_{y∈Σ^n} (−1)^{f(x)+x·y} |y⟩.
Now we can make some simplifications, in the exponent of −1 inside the sum.
We’re promised that f ( x ) = s · x for some string s = sn−1 · · · s0 , so we can express
the state as
|−⟩ ⊗ (1/2^n) Σ_{x∈Σ^n} Σ_{y∈Σ^n} (−1)^{s·x + x·y} |y⟩.
Because s · x and x · y are binary values, we can replace the addition with the
exclusive-OR — again because the only thing that matters for an integer in the
exponent of −1 is whether it is even or odd. Making use of the symmetry of the
binary dot product, we obtain this expression for the state:
|−⟩ ⊗ (1/2^n) Σ_{x∈Σ^n} Σ_{y∈Σ^n} (−1)^{(s·x)⊕(y·x)} |y⟩.
Parentheses have been added for clarity, though they aren’t really necessary because
it’s conventional to treat the binary dot product as having higher precedence than
the exclusive-OR.
At this point we will make use of the following formula:

(s · x) ⊕ (y · x) = (s ⊕ y) · x.

One way to verify it is to use the fact that the exclusive-OR is associative and
commutative, together with an expansion of the binary dot product and bitwise
exclusive-OR.

(s · x) ⊕ (y · x) = (s_{n−1} x_{n−1}) ⊕ ··· ⊕ (s_0 x_0) ⊕ (y_{n−1} x_{n−1}) ⊕ ··· ⊕ (y_0 x_0)
                 = (s_{n−1} ⊕ y_{n−1}) x_{n−1} ⊕ ··· ⊕ (s_0 ⊕ y_0) x_0
                 = (s ⊕ y) · x
This allows us to express the state of the circuit immediately prior to the measure-
ments like this:
|−⟩ ⊗ (1/2^n) Σ_{x∈Σ^n} Σ_{y∈Σ^n} (−1)^{(s⊕y)·x} |y⟩.
The final step is to make use of yet another formula, which works for every
binary string z = zn−1 · · · z0 .
(1/2^n) Σ_{x∈Σ^n} (−1)^{z·x} = 1 if z = 0^n
                               0 if z ≠ 0^n
Here we're using a simple notation for strings that we'll use throughout the
remainder of the course: 0^n is the all-zero string of length n.
A simple way to argue that this formula works is to consider the two cases
separately. If z = 0n , then z · x = 0 for every string x ∈ Σn , so the value of each
term in the sum is 1, and we obtain 1 by summing and dividing by 2n . On the other
hand, if any one of the bits of z is equal to 1, then the binary dot product z · x is
equal to 0 for exactly half of the possible choices for x ∈ Σn and 1 for the other half
— because the value of the binary dot product z · x flips (from 0 to 1 or from 1 to 0) if
we flip any bit of x in a position where z has a 1.
If we now apply this formula to simplify the state of the circuit prior to the
measurements, we obtain
|−⟩ ⊗ (1/2^n) Σ_{x∈Σ^n} Σ_{y∈Σ^n} (−1)^{(s⊕y)·x} |y⟩ = |−⟩ ⊗ |s⟩,
owing to the fact that s ⊕ y = 0n if and only if y = s. Thus, the measurements reveal
precisely the string s we’re looking for.
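This conclusion is easy to verify numerically. The sketch below (my own illustration, with strings encoded as integers) computes the amplitudes (1/2^n) Σ_x (−1)^{f(x)+x·y} directly and confirms that only y = s has nonzero amplitude:

```python
def bdot(x, y):
    """Binary dot product of two strings encoded as integers."""
    return bin(x & y).count("1") % 2

def bv_amplitudes(f, n):
    """Amplitude of |y> just before measurement in the Deutsch-Jozsa circuit."""
    return [sum((-1) ** (f(x) ^ bdot(x, y)) for x in range(2 ** n)) / 2 ** n
            for y in range(2 ** n)]

s = 0b101
f = lambda x: bdot(s, x)            # a function satisfying the promise
amps = bv_amplitudes(f, 3)
print([y for y, a in enumerate(amps) if abs(a) > 1e-9])   # [5], i.e. the string s
```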
Classical difficulty
While the Deutsch–Jozsa circuit solves the Bernstein–Vazirani problem with a single
query, any classical query algorithm must make at least n queries to solve this
problem. This can be reasoned through a so-called information-theoretic argument, which
is very simple in this case: each classical query reveals a single bit of information
about the solution, and there are n bits of information that need to be uncovered, so
at least n queries are needed.
It is, in fact, possible to solve the Bernstein–Vazirani problem classically by
querying the function on each of the n strings having a single 1, in each possible
position, and 0 for all other bits, which reveals the bits of s one at a time. So, the
advantage of quantum over classical algorithms for this problem is 1 query versus
n queries.
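This classical strategy is simple to express in code; the sketch below (my own illustration, strings encoded as integers) recovers s with exactly n queries:

```python
def bdot(x, y):
    """Binary dot product of two strings encoded as integers."""
    return bin(x & y).count("1") % 2

def classical_bv(f, n):
    """Query the string with a single 1 in position i to learn bit s_i,
    since f(e_i) = s . e_i = s_i; repeat for each of the n positions."""
    s = 0
    for i in range(n):
        s |= f(1 << i) << i
    return s

secret = 0b10110
f = lambda x: bdot(secret, x)
print(classical_bv(f, 5) == secret)   # True
```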
Remark on nomenclature
What Bernstein and Vazirani did after showing that the Deutsch–Jozsa algorithm
solves the Bernstein–Vazirani problem (as it is stated above) was to define a much
more complicated problem, known as the recursive Fourier sampling problem. This is
a highly contrived problem where solutions to different instances of the problem
effectively unlock new levels of the problem arranged in a tree-like structure. The
Bernstein–Vazirani problem is essentially just the base case of this more complicated
problem.
The recursive Fourier sampling problem was the first known example of a query
problem where quantum algorithms have a so-called super-polynomial advantage
over probabilistic algorithms, thereby surpassing the advantage of quantum over
classical offered by the Deutsch–Jozsa algorithm. Intuitively speaking, the recursive
version of the problem amplifies the 1 versus n advantage of quantum algorithms to
something much larger. The most challenging aspect of the mathematical analysis
establishing this advantage is showing that classical query algorithms can’t solve the
problem without making lots of queries. This is quite typical; for many problems
it can be very difficult to rule out creative classical approaches that solve them
efficiently.
Simon’s problem, and the algorithm for it described in the next section, does
provide a much simpler example of a super-polynomial (and, in fact, exponential)
advantage of quantum over classical algorithms, and for this reason the recursive
Fourier sampling problem is less often discussed. It is, nevertheless, an interesting
computational problem in its own right.
Simon’s problem
The input function for Simon's problem takes the form

f : Σ^n → Σ^m

for positive integers n and m. We could restrict our attention to the case m = n in
the interest of simplicity, but there's little to be gained in making this assumption —
Simon's algorithm and its analysis are basically the same either way.

Simon's problem
Input: A function f : Σ^n → Σ^m.
Promise: There exists a string s ∈ Σ^n such that
[f(x) = f(y)] ⇔ [(x = y) ∨ (x ⊕ s = y)]
for all x, y ∈ Σ^n.
Output: The string s.
We’ll unpack the promise to better understand what it says momentarily, but first
let’s be clear that it requires that f has a very special structure — so most functions
won’t satisfy this promise. It’s also fitting to acknowledge that this problem isn’t
intended to have practical importance. Rather, it’s a somewhat artificial problem
tailor-made to be easy for quantum computers and hard for classical computers.
There are two main cases: the first case is that s is the all-zero string 0^n, and the
second case is that s is not the all-zero string.
Case 1: s = 0^n. If s is the all-zero string, then we can simplify the if and only if
statement in the promise so that it reads [f(x) = f(y)] ⇔ [x = y]. This is equivalent
to f being a one-to-one function.
Case 2: s ≠ 0^n. If s is not the all-zero string, then the promise being satisfied for
this string implies that f is two-to-one, meaning that for every possible output string
of f, there are exactly two input strings that cause f to output that string. Moreover,
these two input strings must take the form w and w ⊕ s for some string w.
It’s important to recognize that there can only be one string s that works if the
promise is met, so there’s always a unique correct answer for functions that satisfy
the promise.
For example, the following function f : Σ^3 → Σ^5 satisfies the promise for the
string s = 011.
f (000) = 10011
f (001) = 00101
f (010) = 00101
f (011) = 10011
f (100) = 11010
f (101) = 00001
f (110) = 00001
f (111) = 11010
There are 8 different input strings and 4 different output strings, each of which
occurs twice — so this is a two-to-one function. Moreover, for any two different
input strings that produce the same output string, we see that the bitwise XOR of
these two input strings is equal to 011, which is equivalent to saying that either one
of them equals the other XORed with s.
Notice that the only thing that matters about the actual output strings is whether
they’re the same or different for different choices of input strings. For instance,
in the example above, there are four strings (10011, 00101, 00001, and 11010) that
appear as outputs of f . We could replace these four strings with different strings, so
long as they’re all distinct, and the correct solution s = 011 would not change.
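Here is a small Python check (my own illustration) that the example function above satisfies the promise for s = 011 and fails it for other strings:

```python
# The example function from the text, with inputs and outputs as bit strings.
f = {"000": "10011", "001": "00101", "010": "00101", "011": "10011",
     "100": "11010", "101": "00001", "110": "00001", "111": "11010"}

def xor_strings(x, y):
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, y))

def satisfies_promise(f, s):
    """Check that f(x) = f(y) holds exactly when y = x or y = x XOR s."""
    return all((f[x] == f[y]) == (y in (x, xor_strings(x, s)))
               for x in f for y in f)

print(satisfies_promise(f, "011"), satisfies_promise(f, "101"))   # True False
```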
Algorithm description
Figure 5.9 describes the quantum circuit portion of Simon’s algorithm. To be clear,
there are n qubits on the top that are acted upon by Hadamard gates and m qubits
on the bottom that go directly into the query gate. It looks very similar to the
algorithms we’ve already discussed in the lesson, but this time there’s no phase
kickback; the bottom m qubits all go into the query gate in the state |0⟩.
To solve Simon’s problem using this circuit requires several independent runs
of it followed by a classical post-processing step, which will be described later after
the behavior of the circuit is analyzed.
Figure 5.9: The quantum circuit portion of Simon's algorithm. The top n qubits are
each initialized to |0⟩ and acted upon by Hadamard gates before and after the query
gate U_f; the bottom m qubits are initialized to |0⟩ and go directly into the query
gate; and the top n qubits are measured to produce a string y ∈ Σ^n.
Analysis
The analysis of Simon’s algorithm begins along similar lines to the Deutsch–Jozsa
algorithm. After the first layer of Hadamard gates is performed on the top n qubits,
the state becomes
(1/√(2^n)) Σ_{x∈Σ^n} |0^m⟩ |x⟩.
When the U_f gate is performed, the output of the function f is XORed onto the all-zero
state of the bottom m qubits, so the state becomes
(1/√(2^n)) Σ_{x∈Σ^n} |f(x)⟩ |x⟩.
When the second layer of Hadamard gates is performed, we obtain the following
state by using the same formula for the action of a layer of Hadamard gates as
before.
(1/2^n) Σ_{x∈Σ^n} Σ_{y∈Σ^n} (−1)^{x·y} |f(x)⟩ |y⟩
At this point, the analysis diverges from the ones for the previous algorithms
in this lesson. We’re interested in the probability for the measurements to result
in each possible string y ∈ Σn . Through the rules for analyzing measurements
described in Lesson 2 (Multiple Systems), we find that the probability p(y) to obtain
the string y is equal to
p(y) = ∥ (1/2^n) Σ_{x∈Σ^n} (−1)^{x·y} |f(x)⟩ ∥².
To get a better handle on these probabilities, we’ll need just a bit more notation
and terminology. First, the range of the function f is the set containing all of its
output strings.
range(f) = {f(x) : x ∈ Σ^n}
Second, for each string z ∈ range( f ), we can express the set of all input strings that
cause the function to evaluate to this output string z as f −1 ({z}).
f^{−1}({z}) = {x ∈ Σ^n : f(x) = z}
The set f −1 ({z}) is known as the preimage of {z} under f . We can define the
preimage under f of any set in place of {z} in an analogous way — it’s the set of all
elements that f maps to that set. (This notation should not be confused with the
inverse of the function f , which may not exist. The fact that the argument on the
left-hand side is the set {z} rather than the element z is the clue that allows us to
avoid this confusion.)
Using this notation, we can split up the sum in our expression for the probabili-
ties above to obtain
p(y) = ∥ (1/2^n) Σ_{z∈range(f)} ( Σ_{x∈f^{−1}({z})} (−1)^{x·y} ) |z⟩ ∥².
So, it turns out that the value of the inner sum, Σ_{x∈f^{−1}({z})} (−1)^{x·y}, is
independent of the specific choice of z ∈ range(f) in both of the cases s = 0^n and
s ≠ 0^n.
We can now finish off the analysis by looking at the two cases separately.
Case 1: s = 0^n. In this case f is one-to-one, and each string y ∈ Σ^n is obtained
with probability p(y) = 2^{−n}; the measurement outcomes are uniformly distributed
over Σ^n.
Case 2: s ≠ 0^n. In this case the measurement outcomes are uniformly distributed
over the set

{y ∈ Σ^n : y · s = 0},

which contains 2^{n−1} strings. This is because, when s ≠ 0^n, exactly half of the binary
strings of length n have binary dot product 1 with s and the other half have binary
dot product 0 with s, as we already observed in the analysis of the Deutsch–Jozsa
algorithm for the Bernstein–Vazirani problem.
Classical post-processing
We now know what the probabilities are for the possible measurement outcomes
when we run the quantum circuit for Simon’s algorithm. Is this enough information
to determine s?
The answer is yes, provided that we’re willing to repeat the process several
times and accept that it could fail with some probability, which we can make very
small by running the circuit enough times. The essential idea is that each execution
of the circuit provides us with statistical evidence concerning s, and we can use that
evidence to find s with very high probability if we run the circuit sufficiently many
times.
Let’s suppose that we run the circuit independently k times, for k = n + 10.
There’s nothing special about this particular number of iterations — we could take
k to be larger (or smaller) depending on the probability of failure we’re willing to
tolerate, as we will see. Choosing k = n + 10 will ensure that we have greater than
a 99.9% chance of recovering s.
By running the circuit k times, we obtain strings y^1, …, y^k ∈ Σ^n. To be clear, the
superscripts here are part of the names of these strings, not exponents or indexes to
their bits, so we have

y^1 = y^1_{n−1} ··· y^1_0
y^2 = y^2_{n−1} ··· y^2_0
⋮
y^k = y^k_{n−1} ··· y^k_0.
We then form a matrix M having k rows and n columns by taking the bits of these
strings as its entries, with the bits of y^i forming row i.
Now, we don’t know what s is at this point — our goal is to find this string. But
imagine for a moment that we do know the string s, and we form a column vector v
from the bits of the string s = sn−1 · · · s0 as follows.
v = (s_{n−1}, …, s_0)^T
Then, because y^i · s = 0 for every i ∈ {1, …, k}, the product Mv must be the all-zero
vector, meaning that v belongs to the null space of M when arithmetic is done
modulo 2. We can compute the null space of M using Gaussian elimination,
which works the same way when arithmetic is done modulo 2 as it does with real
or complex numbers. So long as the vectors corresponding to 0^n and s are alone in
the null space of M, which happens with high probability, we can deduce s from
the results of this computation.
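The post-processing step can be sketched as follows (my own illustration): Gaussian elimination over GF(2) computes a basis for the null space of M, and when that null space is {0^n, s}, the single basis vector is s.

```python
def null_space_gf2(M, n):
    """Basis of the null space of M over GF(2); M is a list of rows of n bits."""
    M = [row[:] for row in M]
    pivot_of_col = {}
    r = 0
    for c in range(n):
        pr = next((i for i in range(r, len(M)) if M[i][c]), None)
        if pr is None:
            continue                     # no pivot: c is a free column
        M[r], M[pr] = M[pr], M[r]
        for i in range(len(M)):          # clear column c in all other rows
            if i != r and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]
        pivot_of_col[c] = r
        r += 1
    basis = []
    for c in range(n):                   # one basis vector per free column
        if c in pivot_of_col:
            continue
        v = [0] * n
        v[c] = 1
        for pc, pr in pivot_of_col.items():
            v[pc] = M[pr][c]
        basis.append(v)
    return basis

# Outcomes y (bits y_{n-1} ... y_0) that could arise for the hidden string s = 110:
rows = [[0, 0, 1], [1, 1, 0], [1, 1, 1]]
print(null_space_gf2(rows, 3))   # [[1, 1, 0]] (recovers s = 110)
```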
Classical difficulty
How many queries does a classical query algorithm need to solve Simon’s problem?
The answer is: a lot, in general.
There are different precise statements that can be made about the classical
difficulty of this problem, and here’s just one of them. If we have any probabilistic
query algorithm, and that algorithm makes fewer than 2^{n/2−1} − 1 queries, which
is a number of queries that's exponential in n, then that algorithm will fail to solve
Simon's problem with probability at least 1/2.
Sometimes, proving impossibility results like this can be very challenging, but
this one isn’t too difficult to prove through an elementary probabilistic analysis.
Here, however, we’ll only briefly examine the basic intuition behind it.
We’re trying to find the hidden string s, but so long as we don’t query the
function on two strings having the same output value, we’ll get very limited
information about s. Intuitively speaking, all we’ll learn is that the hidden string
s is not the exclusive-OR of any two distinct strings we’ve queried. And if we
query fewer than 2^{n/2−1} − 1 strings, then there will still be a lot of choices for s that
we haven’t ruled out because there aren’t enough pairs of strings to cover all the
possibilities. This isn’t a formal proof, it’s just the basic idea.
So, in summary, Simon’s algorithm provides us with a striking advantage of
quantum over classical algorithms within the query model. In particular, Simon’s
algorithm solves Simon’s problem with a number of queries that’s linear in the
number of input bits n of our function, whereas any classical algorithm, even if it’s
probabilistic, needs to make a number of queries that’s exponential in n in order to
solve Simon’s problem with a reasonable probability of success.
Lesson 6
Quantum Algorithmic Foundations
Finally, we’ll turn to a critically important task, which is running classical com-
putations on quantum computers. The reason this task is important is not because
we hope to replace classical computers with quantum computers — which seems
extremely unlikely to happen any time soon, if ever — but rather because it opens
up many interesting possibilities for quantum algorithms. Specifically, classical
computations running on quantum computers become available as subroutines,
effectively leveraging decades of research and development on classical algorithms
in pursuit of quantum computational advantages.
Integer factorization
Input: An integer N ≥ 2.
Output: The prime factorization of N.
By the prime factorization of N we mean a list of the prime factors of N and the
powers to which they must be raised to obtain N by multiplication. For example,
the prime factors of 12 are 2 and 3, and to obtain 12 we must take the product of 2
to the power 2 and 3 to the power 1.
12 = 2^2 · 3
Up to the ordering of the prime factors, there is only one prime factorization for
each positive integer N ≥ 2, which is a fact known as the fundamental theorem of
arithmetic.
6.1. TWO EXAMPLES: FACTORING AND GCDS 157
A few simple code demonstrations in Python will be helpful for further ex-
plaining integer factorization and other concepts that relate to this discussion. The
following imports are needed for these demonstrations.
import math
from sympy.ntheory import factorint
The factorint function from the SymPy symbolic mathematics package for
Python solves the integer factorization problem for whatever input N we choose.
For example, we can obtain the prime factorization for 12, which naturally agrees
with the factorization above.
N = 12
print(factorint(N))
{2: 2, 3: 1}
Factoring small numbers like 12 is easy, but when the number N to be factored
gets larger, the problem becomes more difficult. For example, running factorint
on a significantly larger number causes a short but noticeable delay on a typical
personal computer.
N = 3402823669209384634633740743176823109843098343
print(factorint(N))
RSA1024 = 1350664108659952233496032162788059699388814756056670
27524485143851526510604859533833940287150571909441798207282164
47155137368041970396419174304649658927425623934102086438320211
03729587257623585096431105640735015081875106765946292055636855
29475213500852879416377328533906109750544334999811150056977236
890927563
Don't bother running factorint on RSA1024; it would not finish within our
lifetimes.
The fastest known algorithm for factoring large integers is known as the number
field sieve. As an example of this algorithm’s use, the RSA challenge number RSA250,
which has 250 decimal digits (or 829 bits when written in binary), was factored
using the number field sieve in 2020. The computation required thousands of CPU
core-years, distributed across tens of thousands of machines around the world.
Here we can appreciate this effort by checking the solution.
RSA250 = 21403246502407449612644230728393335630086147151447550
17797754920881418023447140136643345519095804679610992851872470
91458768739626192155736304745477052080511905649310668769159001
97594056934574522305893259766974716817380693648946998715784949
75937497937
p = 6413528947707158027879019017057738908482501474294344720811
68596320245323446302386235987526683477087376619255856946397988
53367
q = 3337202759497815655622601060535511422794076034476755466678
45209870238417292100370802574486732968818775657189862580369320
62711
print(RSA250 == p * q)
True
Next let’s consider a related but very different problem, which is computing the
greatest common divisor (or GCD) of two integers.
The greatest common divisor of two numbers is the largest integer that evenly
divides both of them.
This problem is easy to solve with a computer — it has roughly the same com-
putational cost as multiplying the two input numbers together. The gcd function
from the Python math module computes the greatest common divisor of numbers
that are considerably larger than RSA1024 in the blink of an eye. (In fact, RSA1024
is the GCD of the two numbers in this example.)
import math

N = 46367596901839183496822395732366866326363533197558184213936670649299873105923474607117677848824558899839615464916661299156284315499828936384642434938124879795303294608635320415882978859582729430211220339979335502464472368847388705760455371998148049202818903552756250507965268640930920068947447907397783768480565433243437829589959153923969889607
M = 50567148748048778642251648439777493747510213791760835404264613609456539672493064945458886213536132185180844149308466550664957674410105268868034583004403457829821275222122094894103154222854630576568097029496083685970129673211723258105198064872471952598180749180824162905137381558343419572545582781513855889903046221831745681679731211795853317707
print(math.gcd(N, M))
135066410865995223349603216278805969938881475605667027524485143851526510604859533833940287150571909441798207282164471551373680419703964191743046496589274256239341020864383202110372958725762358509643110564073501508187510676594629205563685529475213500852879416377328533906109750544334999811150056977236890927563
160 LESSON 6. QUANTUM ALGORITHMIC FOUNDATIONS
This is possible because we have very efficient algorithms for computing GCDs, the
most well-known of which is Euclid’s algorithm, discovered over 2,000 years ago.
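Euclid’s algorithm is short enough to sketch directly. Here is a minimal Python version (the function name euclid_gcd is chosen here for illustration):

```python
def euclid_gcd(a: int, b: int) -> int:
    """Greatest common divisor via Euclid's algorithm."""
    # Repeatedly replace (a, b) with (b, a mod b); the GCD is unchanged
    # at each step, and b strictly decreases until it reaches 0.
    while b != 0:
        a, b = b, a % b
    return a

print(euclid_gcd(12, 18))  # prints 6
```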
Could there be a fast algorithm for integer factorization that we just haven’t
discovered yet, allowing large numbers like RSA1024 to be factored in the blink of
an eye? The answer is yes. Although we might expect that an efficient algorithm for
factoring as simple and elegant as Euclid’s algorithm for computing GCDs would
have been discovered by now, there is nothing that rules out the existence of a very
fast classical algorithm for integer factorization, beyond the fact that we’ve failed
to find one thus far. One could be discovered tomorrow — but don’t hold your
breath. Generations of mathematicians and computer scientists have searched, and
factoring numbers like RSA1024 remains beyond our reach.
We will keep things simple for the purposes of this discussion by restricting our attention to binary
string inputs and outputs. Through binary strings, we can encode a variety of
interesting objects that the problems we’re interested in solving might concern, such
as numbers, vectors, matrices, and graphs, as well as lists of these and other objects.
For example, to encode nonnegative integers, we can use binary notation. The
following table lists the binary encoding of the first nine nonnegative integers, along
with the length (meaning the total number of bits) of each encoding.

    number    binary encoding    length
      0             0              1
      1             1              1
      2            10              2
      3            11              2
      4           100              3
      5           101              3
      6           110              3
      7           111              3
      8          1000              4
We can easily extend this encoding to handle both positive and negative integers
by appending a sign bit to the representations if we choose. Sometimes it’s also
convenient to allow binary representations of nonnegative integers to have leading
zeros, which don’t change the value being encoded but can allow representations
to fill up a string or word of a fixed size.
Using binary notation to represent nonnegative integers is both common and
efficient, but if we wanted to we could choose a different way to represent nonnega-
tive integers using binary strings, such as the ones suggested in the following table.
The specifics of these alternatives are not important to this discussion — the point
is only to clarify that we do have choices for the encodings we use.
(In this table, the symbol ε represents the empty string, which has no symbols in it
and length equal to zero. Naturally, to avoid an obvious source of confusion, we
use a special symbol such as ε to represent the empty string rather than literally
writing nothing.)
Other types of inputs, such as vectors and matrices, or more complicated objects
like descriptions of molecules, can also be encoded as binary strings. Just like
we have for nonnegative integers, a variety of different encoding schemes can be
selected or invented. For whatever scheme we come up with to encode inputs to a
given problem, we interpret the length of an input string as representing the size of
the problem instance being solved.
For example, the number of bits required to express a nonnegative integer N
in binary notation, which is sometimes denoted lg( N ), is given by the following
formula.
    lg( N ) = 1                     if N = 0
    lg( N ) = 1 + ⌊log₂( N )⌋       if N ≥ 1
Assuming that we use binary notation to encode the input to the integer factoring
problem, the input length for the number N is therefore lg( N ). Note, in particular,
that the length (or size) of the input N is not N itself; when N is large we don’t need
nearly this many bits to express N in binary notation.
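As a quick sanity check on this notion of input length, here is a small sketch (the helper name lg is ours) comparing the bit-count formula against Python’s built-in int.bit_length:

```python
from math import floor, log2

def lg(N: int) -> int:
    # Number of bits needed to write N in binary notation.
    return 1 if N == 0 else 1 + floor(log2(N))

# int.bit_length agrees for N >= 1; for N = 0 it returns 0, since
# Python treats zero as needing no bits, whereas we count one bit.
for N in [0, 1, 2, 7, 8, 255, 256]:
    print(N, lg(N))
```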
From a strictly formal viewpoint, whenever we consider a computational prob-
lem or task, it should be understood that a specific scheme has been selected for
encoding whatever objects are given as input or produced as output. This allows
computations that solve interesting problems to be viewed abstractly as transfor-
mations from binary string inputs to binary string outputs.
6.2. MEASURING COMPUTATIONAL COST 163
The details of how objects are encoded as binary strings must necessarily be
important to these computations at some level. Usually, though, we don’t worry
all that much about these details when we’re analyzing computational cost, so that
we can avoid getting into details of secondary importance. The basic reasoning
is that we expect the computational cost of converting back and forth between
“reasonable” encoding schemes to be insignificant compared with the cost of solving
the actual problem. In those situations in which this is not the case, the details can
(and should) be clarified.
For example, a very simple computation converts between the binary represen-
tation of a nonnegative integer and its lexicographic encoding (which we have not
explained in detail, but it can be inferred from the table above). For this reason, the
computational cost of integer factoring wouldn’t differ significantly if we decided
to switch from using one of these encodings to the other for the input N. On the
other hand, encoding nonnegative integers in unary notation incurs an exponential
blow-up in the total number of symbols required, and we would not consider it to
be a “reasonable” encoding scheme for this reason.
Elementary operations
Now let’s consider the computation itself, which is represented by the blue rectangle
in Figure 6.1. The way that we’ll measure computational cost is to count the number
of elementary operations that each computation requires. Intuitively speaking, an
elementary operation is one involving a small, fixed number of bits or qubits, that
can be performed quickly and easily — such as computing the AND of two bits. In
contrast, running the factorint function is not reasonably viewed as being an
elementary operation.
Formally speaking, there are different choices for what constitutes an elementary
operation depending on the computational model being used. Our focus will be on
circuit models, and specifically quantum and Boolean circuits.
For circuit-based models of computation, it’s typical that each gate is viewed as
an elementary operation. This leads to the question of precisely which gates we
permit in our circuits. Focusing for the moment on quantum circuits, we’ve seen
several gates thus far in this course, including X, Y, Z, H, S, and T gates, swap gates,
and controlled gates, such as controlled-NOT gates.
For Boolean circuits, we’ll take AND, OR, NOT, and FANOUT gates to be the
ones representing elementary operations. We don’t actually need both AND gates
and OR gates — we can use De Morgan’s laws to convert from either one to the other
by placing NOT gates on all three input/output wires — but nevertheless it is both
typical and convenient to allow both AND and OR gates. AND, OR, NOT, and
FANOUT gates form a universal set for deterministic computations, meaning that
any function from any fixed number of input bits to any fixed number of output
bits can be implemented with these gates.
Standard basis measurement gates can appear within quantum circuits, but some-
times it’s convenient to delay them until the end. This allows us to view quantum
computations as consisting of a unitary part (representing the computation itself),
followed by a simple read-out phase where qubits are measured and the results are
output. This can always be done, provided that we’re willing to add an additional
qubit for each standard basis measurement. Figure 6.2 illustrates how this can be
done.
Specifically, the classical bit in the circuit on the left is replaced by a qubit on the
right (initialized to the |0⟩ state), and the standard basis measurement is replaced by
a controlled-NOT gate, followed by a standard basis measurement on the bottom
qubit. The point is that the standard basis measurement in the right-hand circuit
can be pushed all the way to the end of the circuit. If the classical bit in the circuit
on the left is later used as a control bit, we can use the bottom qubit in the circuit
on the right as a control instead, and the overall effect will be the same. (We are
assuming that the classical bit in the circuit on the left doesn’t get overwritten after
Figure 6.2: The standard basis measurement on the left can be deferred through
the introduction of a workspace qubit and a controlled-NOT gate, as shown on the
right.
The total number of gates in a circuit is referred to as that circuit’s size. Thus,
presuming that the gates in our circuits represent elementary operations, a circuit’s
size represents the number of elementary operations it requires — or, in other words,
its computational cost. We write size(C ) to refer to the size of a given circuit C.
For example, consider the Boolean circuit for computing the XOR of two bits
shown in Figure 6.3, which we’ve now encountered a few times.

Figure 6.3: A Boolean circuit for computing the exclusive-OR of two bits.

The size of this circuit is 7 because there are 7 gates in total. (Fanout operations are not always
counted as being gates, but for the purposes of this lesson we will count them as
being gates.)
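The 7-gate tally can be mirrored in code. This sketch assumes the circuit computes XOR as (a ∧ ¬b) ∨ (¬a ∧ b), one standard construction; counting 2 FANOUT gates (to reuse a and b), 2 NOT, 2 AND, and 1 OR gives 7 gates:

```python
def NOT(a: int) -> int:
    return 1 - a

def AND(a: int, b: int) -> int:
    return a & b

def OR(a: int, b: int) -> int:
    return a | b

def XOR(a: int, b: int) -> int:
    # (a AND NOT b) OR (NOT a AND b); each input is used twice,
    # accounting for the two FANOUT gates in the circuit.
    return OR(AND(a, NOT(b)), AND(NOT(a), b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))
```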
One final note concerning circuit size and computational cost is that it is possible
to assign different costs to gates, rather than viewing every gate as contributing
equally to the total cost.
For example, as was already mentioned, FANOUT gates are often viewed as
being free for Boolean circuits — which is to say that we could choose that FANOUT
gates have zero cost. As another example, when we’re working in the query model
and we count the number of queries that a circuit makes to an input function (in the
form of a black box), we’re effectively assigning unit cost to query gates and zero
cost to other gates, such as Hadamard gates. A final example is that we sometimes
assign different costs to gates depending on how difficult they are to implement,
which could vary depending upon the hardware being considered.
While all of these options are sensible in different contexts, for this lesson we’ll
keep things simple and stick with circuit size as a representation of computational
cost.
Families of circuits
Because any single circuit has a fixed number of input wires, an algorithm that
works for inputs of arbitrary length corresponds, in circuit models, to a family of
circuits

{C1 , C2 , C3 , . . .},

where Cn solves whatever problem we’re talking about for n-bit inputs (or, more
generally, for inputs whose size is parameterized in some way by n). The
computational cost of the family is then described by the function

t(n) = size(Cn ).
For quantum circuits the situation is similar, where larger and larger circuits are
needed to accommodate longer and longer input strings.
To explain further, let’s take a moment to consider the problem of integer addition,
which is much simpler than integer factoring or even computing GCDs.
Integer addition

The input is a pair of nonnegative integers N and M, each represented by an n-bit
binary string, so that

0 ≤ N, M ≤ 2n − 1.

The output will be an (n + 1)-bit binary string representing the sum, which is the
maximum number of bits we need to express the result.
We begin with an algorithm — the standard algorithm for addition of binary
representations — which is the base 2 analogue to the way addition is taught in
elementary/primary schools around the world. This algorithm can be implemented
with Boolean circuits as follows.
Starting from the least significant bits, we can compute their XOR to determine
the least significant bit for the sum. Then we compute the carry bit, which is the
AND of the two least significant bits of N and M. Sometimes these two operations
together are known as a half adder.
Using the XOR circuit we’ve now seen a few times together with an AND gate
and two FANOUT gates, we can build a half adder with 10 gates. If for some reason
we changed our minds and decided to include XOR gates in our set of elementary
operations, we would need 1 AND gate, 1 XOR gate, and 2 FANOUT gates to build
a half adder.
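In Python, the logic of a half adder can be sketched using the built-in bit operations (this models the gate behavior, not the gate-level circuit):

```python
def half_adder(a: int, b: int):
    # Sum bit is the XOR of the inputs; carry bit is their AND.
    return a ^ b, a & b

print(half_adder(1, 1))  # prints (0, 1): sum bit 0, carry bit 1
```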
Figure 6.4: A Boolean circuit implementing a half adder using two FANOUT gates,
an XOR gate, and an AND gate.
Moving on to the more significant bits, we can use a similar procedure, but this
time including the carry bit from each previous position into our calculation. By
cascading two half adders and taking the OR of the carry bits they produce, we can
create what’s known as a full adder.

Figure 6.5: A full adder constructed from two half adders and an OR gate.

This construction requires 21 gates in total: 2 AND gates, 2 XOR gates (each requiring 7 gates to
implement), one OR gate, and 4 FANOUT gates.
Finally, by cascading a half adder along with however many full adders as
needed, we obtain a Boolean circuit for nonnegative integer addition. For example,
Figure 6.6 illustrates how this is done when computing the sum of two 4-bit integers.
Figure 6.6: Cascading a half adder and three full adders creates a Boolean circuit for
adding two 4-bit integers.
In general, adding two n-bit integers in this way requires one half adder and n − 1
full adders, for a total of

21(n − 1) + 10 = 21n − 11

gates. Had we decided to include XOR gates in our set of elementary operations,
we would need 2n − 1 AND gates, 2n − 1 XOR gates, n − 1 OR gates, and 4n − 2
FANOUT gates, for a total of 9n − 5 gates. If in addition we decide not to count
FANOUT gates, it’s 5n − 3 gates.
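The cascading construction of Figure 6.6 can be sketched as a ripple-carry adder in Python. This is an illustration of the same idea at the bit level, not gate-level code:

```python
def ripple_carry_add(N: int, M: int, n: int) -> int:
    """Add two n-bit integers by cascading a half adder and full adders."""
    carry = 0
    result = 0
    for i in range(n):
        a = (N >> i) & 1
        b = (M >> i) & 1
        # Full adder logic; at position 0 the carry is 0, so this
        # reduces to a half adder there.
        s = a ^ b ^ carry
        carry = (a & b) | ((a ^ b) & carry)
        result |= s << i
    result |= carry << n  # the (n + 1)-st output bit
    return result

print(ripple_carry_add(11, 6, 4))  # prints 17
```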
Asymptotic notation
On the one hand, it’s good to know precisely how many gates are needed to perform
various computations, like in the example of integer addition above. These details
are important for actually building the circuits.
On the other hand, if we perform analyses at this level of detail for all the
computations we’re interested in, including ones for tasks that are much more
complicated than addition, we’ll very quickly be buried in details. To keep things
manageable, and to intentionally suppress details of secondary importance, we
typically use Big-O notation when analyzing algorithms. Through this notation we
can make useful statements about the rate at which functions grow.
Formally speaking, if we have two functions g(n) and h(n), we write that
g(n) = O(h(n)) if there exists a positive real number c > 0 and a positive integer
n0 such that
g(n) ≤ c · h(n)
for all n ≥ n0 . Typically h(n) is chosen to be as simple an expression as possible, so
that the notation can be used to reveal the limiting behavior of a function in simple
terms. For example, 17n3 − 257n2 + 65537 = O(n3 ).
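For this particular example, one valid witness pair is c = 18 and n₀ = 41 (many other choices also work), which a quick check confirms:

```python
def g(n: int) -> int:
    return 17 * n**3 - 257 * n**2 + 65537

# 18*n^3 >= g(n)  <=>  n^3 + 257*n^2 >= 65537, which holds for all n >= 41.
print(all(g(n) <= 18 * n**3 for n in range(41, 10_000)))  # prints True
```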
This notation can be extended to functions having multiple arguments in a fairly
straightforward way. For instance, if we have two functions g(n, m) and h(n, m)
defined on positive integers n and m, we write that g(n, m) = O(h(n, m)) if there
exists a positive real number c > 0 and a positive integer k0 such that
g(n, m) ≤ c · h(n, m)
whenever n + m ≥ k0 .
Connecting this notation to the example of nonnegative integer addition, we
conclude that there exists a family of Boolean circuits {C1 , C2 , . . . , }, where Cn adds
two n-bit nonnegative integers together, such that size(Cn ) = O(n). This reveals
the most essential feature of how the cost of addition scales with the input size: it
scales linearly.
Notice also that it doesn’t depend on the specific detail of whether we consider
XOR gates to have unit cost or cost 7. In general, using Big-O notation allows us to
make statements about computational costs that aren’t sensitive to such low-level
details.
More examples
Here are a few more examples of problems from computational number theory,
beginning with multiplication.
Integer multiplication
Creating Boolean circuits for this problem is more difficult than creating circuits for
addition — but by thinking about the standard multiplication algorithm, we can come
up with circuits having size O(n2 ) for this problem (assuming N and M are both
represented by n-bit binary representations). More generally, if N has n bits and M
has m bits, there are Boolean circuits of size O(nm) for multiplying N and M.
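The standard algorithm underlying these O(nm) circuits can be sketched as shift-and-add multiplication:

```python
def gradeschool_multiply(N: int, M: int) -> int:
    # For each 1 bit of M, add a shifted copy of N. With an n-bit N and
    # an m-bit M this performs O(m) additions of O(n + m)-bit numbers.
    result = 0
    shift = 0
    while M > 0:
        if M & 1:
            result += N << shift
        M >>= 1
        shift += 1
    return result

print(gradeschool_multiply(12, 34))  # prints 408
```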
There are, in fact, other ways to multiply that scale better. For instance, the
Schönhage–Strassen multiplication algorithm can be used to create Boolean circuits
for multiplying two n-bit integers at cost O(n lg(n) lg(lg(n))). The intricacy of this
method causes a lot of overhead, however, making it practical only for numbers
having tens of thousands of bits or more.
Another basic problem is division, which we interpret to mean computing both
the quotient and remainder given an integer divisor and dividend.
Integer division
The cost of integer division is similar to multiplication: if N has n bits and M has
m bits, there are Boolean circuits of size O(nm) for solving this problem. And like
multiplication, asymptotically superior methods are known.
We can now compare known algorithms for computing GCDs with those for
addition and multiplication. Euclid’s algorithm for computing the GCD of an
n-bit number N and an m-bit number M requires Boolean circuits of size O(nm),
similar to the standard algorithms for multiplication and division. Also similar
to multiplication and division, there are asymptotically faster GCD algorithms —
including ones requiring O(n(lg(n))2 lg(lg(n))) elementary operations to compute
the GCD of two n-bit numbers.
A somewhat more expensive computation that arises in number theory is modular
exponentiation: given nonnegative integers N, K, and M (with M ≥ 1), compute
N^K mod M, meaning the remainder when N^K is divided by M.

If N has n bits, M has m bits, and K has k bits, this problem can be solved by
Boolean circuits having size O(km2 + nm). This is not at all obvious. The solution is
not to first compute N K and then take the remainder, which would necessitate using
exponentially many bits just to store the number N K . Rather, we can use the power
algorithm (known alternatively as the binary method and repeated squaring), which
makes use of the binary representation of K to perform the entire computation
modulo M. Assuming N, M, and K are all n-bit numbers, we obtain an O(n3 )
algorithm — or a cubic time algorithm. And once again, there are known algorithms
that are more complicated but asymptotically faster.
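The power algorithm is easy to sketch; Python’s built-in three-argument pow(N, K, M) implements the same idea:

```python
def power_mod(N: int, K: int, M: int) -> int:
    """Compute N^K mod M by repeated squaring."""
    result = 1
    base = N % M
    while K > 0:
        if K & 1:                     # current bit of K is 1
            result = (result * base) % M
        base = (base * base) % M      # square, staying reduced mod M
        K >>= 1
    return result

print(power_mod(3, 218, 1000) == pow(3, 218, 1000))  # prints True
```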
Integer factorization

In contrast to the algorithms just discussed, known algorithms for integer factorization
are much more expensive — as we might expect from the discussion earlier in
the lesson.
One simple approach to factoring is trial division, where an algorithm searches
through the list 2, . . . , ⌊√N⌋ to find a prime factor of an input number N. This requires
O(2n/2 ) iterations in the worst case when N is an n-bit number. Each iteration
requires a trial division, which means O(n2 ) elementary operations for each iteration
(using a standard algorithm for integer division). We end up with circuits of size
O(n2 2n/2 ), which is exponential in the input size n.
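A sketch of trial division, which finds the smallest prime factor of N (the function name is ours):

```python
def smallest_prime_factor(N: int) -> int:
    # Try divisors 2, 3, ..., floor(sqrt(N)); if none divides N,
    # then N itself is prime. Worst case: about sqrt(N) = 2^(n/2)
    # iterations for an n-bit number N.
    d = 2
    while d * d <= N:
        if N % d == 0:
            return d
        d += 1
    return N

print(smallest_prime_factor(91))  # prints 7
```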
There are algorithms for integer factorization having better scaling. The number
field sieve mentioned earlier, for instance, which is an algorithm that makes use of
randomness, is generally believed (but not rigorously proven) to require
2^(O(n^(1/3) (lg(n))^(2/3)))
elementary operations to factor n-bit integers with high probability. While it is quite
significant that n is raised to the power 1/3 in the exponent of this expression, the
fact it appears in the exponent is still a problem that causes poor scaling — and
explains in part why RSA1024 remains outside of its domain of applicability.
Summarizing the examples above: addition has linear cost, while the other
three problems have quadratic cost (or subquadratic cost using asymptotically fast
algorithms). Modular exponentiation is more expensive but can still be done pretty
efficiently, with cubic cost (or sub-cubic cost using asymptotically fast algorithms).
These are all examples of algorithms having polynomial cost, meaning that
they have cost O(nc ) for some choice of a fixed constant c > 0. As a rough, first-
order approximation, algorithms having polynomial cost are abstractly viewed as
representing efficient algorithms.
In contrast, known classical algorithms for integer factoring have exponential
cost. Sometimes the cost of the number field sieve is described as sub-exponential
because n is raised to the power 1/3 in the exponent, but in complexity theory it is
more typical to reserve this term for algorithms whose cost is
O(2^(n^ε))
for every ε > 0. The so-called NP-complete problems are a class of problems not
known to (and widely conjectured not to) have polynomial-cost algorithms. A
circuit-based formulation of the exponential-time hypothesis posits something even
stronger, which is that no NP-complete problem can have a sub-exponential cost
algorithm.
The association of polynomial-cost algorithms with efficient algorithms must
be understood as being a loose abstraction. Of course, if an algorithm’s cost scales
as n1000 or n1000000 for inputs of size n, then it’s a stretch to describe that algorithm
as being efficient. However, even an algorithm having cost that scales as n1000000
must be doing something clever to avoid having exponential cost, which is generally
what we expect of algorithms based in some way on “brute force” or “exhaustive
search.” Even the sophisticated refinements of the number field sieve, for instance,
fail to avoid this exponential scaling in cost. Polynomial-cost algorithms, on the
other hand, manage to take advantage of the problem structure in some way that
avoids an exponential scaling.
In practice, the identification of a polynomial-cost algorithm for a problem is just
a first step toward actual efficiency. Through algorithmic refinements, polynomial-
cost algorithms with large exponents can sometimes be improved dramatically,
lowering the cost to a more “reasonable” polynomial scaling. Sometimes things
become easier when they’re known to be possible — so the identification of a
polynomial-cost algorithm for a problem can also have the effect of inspiring new,
even more efficient algorithms.
There is one final issue that’s worth mentioning, although we will not concern
ourselves with it further in this course. There’s a “hidden” computational cost
when we’re working with circuits, and it concerns the specifications of the circuits
themselves. As inputs get longer and longer, larger and larger circuits are required
— but we need to get our hands on the descriptions of these circuits somehow if
we’re going to implement them.
For all of the examples we’ve discussed, or will discuss in subsequent lessons,
there’s an underlying algorithm from which the circuits are derived. Usually the
circuits in a family follow some basic pattern that’s easy to extrapolate to larger and
larger inputs, such as cascading full adders to create Boolean circuits for addition or
performing layers of Hadamard gates and other gates in some simple-to-describe
pattern.
But what happens if there are prohibitive computational costs associated with
the patterns in the circuits themselves? For instance, the description of each member
Cn in a circuit family could, in principle, be determined by some extremely difficult
to compute function of n.
The answer is that this is indeed a problem — and so we must place additional
restrictions on families of circuits beyond having polynomial cost in order for them
to truly represent efficient algorithms. The property of uniformity for circuits does
precisely this: roughly speaking, a circuit family is uniform if there is a computationally
efficient algorithm that, given the number n, outputs a description of the circuit Cn .
6.3. CLASSICAL COMPUTATIONS ON QUANTUM COMPUTERS 177
Toffoli gates

A Toffoli gate is a three-qubit gate whose action on standard basis states is

| a⟩|b⟩|c⟩ ↦ | a⟩|b⟩|c ⊕ ab⟩.
Bearing in mind that we’re using Qiskit’s ordering convention, where the qubits
are ordered in increasing significance from top to bottom, the matrix representation
of this gate is as follows.
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0
Another way to think about Toffoli gates is that they’re essentially query gates for
the AND function, in the sense that they follow the pattern we saw in the previous
lesson for unitary query gate implementations of arbitrary functions having binary
string inputs and outputs.
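On standard basis states a Toffoli gate behaves classically, so its action can be checked with a small sketch:

```python
def toffoli(a: int, b: int, c: int):
    # Flip the target c exactly when both controls a and b are 1.
    return a, b, c ^ (a & b)

# Check the truth table: the target becomes c XOR (a AND b).
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print((a, b, c), "->", toffoli(a, b, c))
```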
Toffoli gates are not included in the default gate set discussed earlier in the
lesson, but it is possible to construct a Toffoli gate from H, T, T † , and CNOT gates
as shown in Figure 6.8.
Figure 6.8: A Toffoli gate implemented using H, T, T† , and CNOT gates.
A single Toffoli gate, used in conjunction with a few NOT gates, can implement an
AND gate or an OR gate, and FANOUT gates can easily be implemented using controlled-
NOT gates, as Figure 6.9 illustrates.
Figure 6.9: Implementations of AND, OR, and FANOUT gates using Toffoli and
NOT gates along with an initialized workspace qubit.
In all three cases, the qubits that the AND, OR, and FANOUT gates act upon
come in from the left as inputs, and we also require one workspace qubit initialized
to the zero state for each one. These workspace qubits appear inside of the boxes
representing the gate implementations to suggest that they’re new, and therefore
part of the cost of these implementations.
For the AND and OR gates we also have two qubits left over, in addition to
the output qubit. For example, inside the box in the diagram representing the
simulation of an AND gate, the top two qubits are left in the states | a⟩ and |b⟩.
These qubits are illustrated as remaining inside of the boxes because they’re no
longer needed and are not part of the output. They can be ignored for now, though
we will turn our attention back to them shortly.
The remaining Boolean gate, the NOT gate, is included in our default set of
quantum gates, so we don’t require a simulation for this one.
Now suppose that C is an arbitrary Boolean circuit, composed of AND, OR, NOT,
and FANOUT gates, having n input bits and m output bits. Let t = size(C ) be the number of gates in C, and let’s give the name f to the
function that C computes, which takes the form

f : Σn → Σm .
Figure 6.10: For a given Boolean circuit C, a circuit R is obtained by replacing each
AND, OR, and FANOUT gate with its Toffoli gate simulation. The action of R on
standard basis states is as shown.
Here, k is the number of workspace qubits required — one for each AND, OR,
and FANOUT gate of C — and g is a function of the form g : Σn → Σn+k−m that
describes the states of the leftover qubits created by the gate simulations after R is
run. In the figure, the qubits corresponding to the output f ( x ) are on the top and
the remaining, leftover qubits storing g( x ) are on the bottom. We can force this
to happen if we wish by rearranging the qubits using SWAP gates, which can be
implemented with three controlled-NOT gates as shown in Figure 6.11. As we’ll
see in the next section, it’s not really essential to rearrange the output qubits like
this, but it’s easy enough to do it if we choose.
The function g that describes the classical states of the leftover qubits is deter-
mined by the circuit C, but we actually don’t need to worry all that much about
Figure 6.11: A swap gate implemented with three controlled-NOT gates.
it; we don’t care specifically what state these qubits are in when the computation
finishes. The letter g comes after f , so it’s a reasonable name for this function on
that account, but there’s a better reason to pick the name g — it’s short for garbage.
Figure 6.12: The circuit R is applied to the qubits initialized to |0k ⟩ and | x ⟩,
controlled-NOT gates then XOR the m output qubits onto m additional qubits
initialized to |y⟩, and finally R† is applied to return the workspace qubits to the
state |0k ⟩, leaving the state |y ⊕ f ( x )⟩|0k ⟩| x ⟩.

Figure 6.13: The circuit of Figure 6.12, viewed as a single circuit Q, transforms
|y⟩|0k ⟩| x ⟩ into |y ⊕ f ( x )⟩|0k ⟩| x ⟩.
The construction just described allows us to simulate any Boolean circuit with a
quantum circuit in a garbage-free manner. If C is a Boolean circuit implementing a
function f : Σn → Σm , then we obtain a quantum circuit Q that operates as follows
on standard basis states.
Q |y⟩|0k ⟩| x ⟩ = |y ⊕ f ( x )⟩|0k ⟩| x ⟩
The number k indicates how many workspace qubits are required in total.
It is possible to take this methodology one step further when the function f itself
is invertible. To be precise, suppose that the function f takes the form f : Σn → Σn ,
and also suppose that there exists a function f −1 such that f −1 ( f ( x )) = x for every
x ∈ Σn (which is necessarily unique when it exists). This means that the operation
that transforms | x ⟩ into | f ( x )⟩ for every x ∈ Σn is unitary, so we might hope to
build a quantum circuit that implements the unitary operation defined by
U | x ⟩ = | f ( x )⟩
for every x ∈ Σn .
To be clear, the fact that this is a unitary operation relies on f being invertible
— it’s not unitary when f isn’t invertible. Disregarding the workspace qubits, U
is different from the operation that the circuit Q implements because we’re not
keeping a copy of the input around and XORing it to an arbitrary string, we’re
replacing x by f ( x ).
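A small sketch of why invertibility makes this operation unitary: the matrix defined by U|x⟩ = |f(x)⟩ is a permutation matrix, and permutation matrices are unitary exactly when f is a bijection. The helper names below are ours, chosen for illustration:

```python
def matrix_from_function(f, n: int):
    """Matrix U with U|x> = |f(x)>, as a 2^n x 2^n list of lists."""
    N = 2 ** n
    # Column x has a single 1 in row f(x).
    return [[1 if row == f(col) else 0 for col in range(N)] for row in range(N)]

def is_identity(M) -> bool:
    return all(M[i][j] == (1 if i == j else 0)
               for i in range(len(M)) for j in range(len(M)))

# Example: f(x) = x + 1 mod 4 is invertible on 2-bit strings.
f = lambda x: (x + 1) % 4
U = matrix_from_function(f, 2)

# U is real, so U^dagger U = U^T U; check that it equals the identity.
UtU = [[sum(U[k][i] * U[k][j] for k in range(4)) for j in range(4)]
       for i in range(4)]
print(is_identity(UtU))  # prints True
```

If f were not invertible, two columns of U would coincide and the check would fail.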
The question is: when f is invertible, can we do this?
The answer is yes, provided that we’re allowed to use workspace qubits and,
in addition to having a Boolean circuit that computes f , we also have one that
computes f −1 . So, this isn’t a shortcut for computationally inverting functions
when we don’t already know how to do that! Figure 6.14 illustrates how it can
be done by composing two quantum circuits, Q f and Q f −1 , which are obtained
individually for the functions f and f −1 through the method described above, along
with n swap gates, taking k to be the maximum of the numbers of workspace qubits
required by Q f and Q f −1 .
Figure 6.14: The circuit Q f , followed by n swap gates and the circuit Q f −1 ,
transforms | x ⟩|0k ⟩|0n ⟩ into | f ( x )⟩|0k ⟩|0n ⟩, thereby implementing the unitary
operation U (up to the workspace qubits).
Phase Estimation and Factoring

In this lesson, we’ll discuss the phase estimation problem and how to solve it
with a quantum computer. We’ll then use this solution to obtain Shor’s algorithm
— an efficient quantum algorithm for the integer factorization problem. Along the
way, we’ll encounter the quantum Fourier transform, and we’ll see how it can be
implemented efficiently by a quantum circuit.
Spectral theorem
The spectral theorem is an important fact from linear algebra that states that matrices
of a certain type, called normal matrices, can be expressed in a simple and useful
way. We’ll only need this theorem for unitary matrices in this lesson, but later in
the course we’ll apply it to Hermitian matrices as well.
Normal matrices

A square matrix M is said to be normal if it commutes with its own conjugate
transpose: MM† = M† M. Every unitary matrix U is normal, because UU † = I = U † U.
186 LESSON 7. PHASE ESTIMATION AND FACTORING
Hermitian matrices, which are matrices that equal their own conjugate transpose,
are another important class of normal matrices. If M is a Hermitian matrix, then
MM† = M2 = M† M,
so M is normal.
Not every square matrix is normal. For instance, this matrix isn’t normal:

    ( 0 1 )
    ( 0 0 )

(This is a simple but great example of a matrix that’s often very helpful to consider.)
It isn’t normal because

    ( 0 1 ) ( 0 1 )†   ( 0 1 ) ( 0 0 )   ( 1 0 )
    ( 0 0 ) ( 0 0 )  = ( 0 0 ) ( 1 0 ) = ( 0 0 )

while

    ( 0 1 )† ( 0 1 )   ( 0 0 ) ( 0 1 )   ( 0 0 )
    ( 0 0 )  ( 0 0 ) = ( 1 0 ) ( 0 0 ) = ( 0 1 ) .
Theorem statement

Spectral theorem. Suppose M is a normal N × N complex matrix. There exist complex
numbers λ0 , . . . , λ N −1 and an orthonormal basis {|ψ0 ⟩, . . . , |ψN −1 ⟩} such that

M = λ0 |ψ0 ⟩⟨ψ0 | + · · · + λ N −1 |ψN −1 ⟩⟨ψN −1 |.    (7.1)

The numbers λ0 , . . . , λ N −1 are the eigenvalues of M, and each |ψj ⟩ is an eigenvector
of M whose eigenvalue is λ j :

M|ψj ⟩ = λ j |ψj ⟩
7.1. THE PHASE ESTIMATION PROBLEM 187
Example 1. Let

I = ( 1 0 )
    ( 0 1 ),
which is normal. The theorem implies that I can be written in the form (7.1)
for some choice of λ0 , λ1 , |ψ0 ⟩, and |ψ1 ⟩. There are multiple choices that work,
including
λ0 = 1, λ1 = 1, |ψ0 ⟩ = |0⟩, |ψ1 ⟩ = |1⟩.
Notice that the theorem does not say that the complex numbers λ0 , . . . , λ N −1
are distinct — we can have the same complex number repeated, which is
necessary for this example. These choices work because
I = |0⟩⟨0| + |1⟩⟨1|.
Indeed, we could choose {|ψ0 ⟩, |ψ1 ⟩} to be any orthonormal basis and the
equation will be true. For instance,
I = |+⟩⟨+| + |−⟩⟨−|.
Example 2. The Hadamard operation is both unitary and Hermitian, so it is normal. It can be expressed as

H = |ψ_{π/8}⟩⟨ψ_{π/8}| − |ψ_{5π/8}⟩⟨ψ_{5π/8}|,

where

|ψθ⟩ = cos(θ)|0⟩ + sin(θ)|1⟩.
188 LESSON 7. PHASE ESTIMATION AND FACTORING
More explicitly,

|ψ_{π/8}⟩ = (√(2+√2)/2) |0⟩ + (√(2−√2)/2) |1⟩,

|ψ_{5π/8}⟩ = −(√(2−√2)/2) |0⟩ + (√(2+√2)/2) |1⟩.

We can check that this decomposition is correct by performing the required calculations:

|ψ_{π/8}⟩⟨ψ_{π/8}| − |ψ_{5π/8}⟩⟨ψ_{5π/8}| = ( (2+√2)/4   √2/4    )   ( (2−√2)/4   −√2/4    )
                                            ( √2/4       (2−√2)/4 ) − ( −√2/4      (2+√2)/4 ) = H.
As the first example above reveals, there can be some freedom in how eigenvectors are selected. There is, however, no freedom at all in how the eigenvalues
are chosen, except for their ordering: the same N complex numbers λ0 , . . . , λ N −1 ,
which can include repetitions of the same complex number, will always occur in
the equation (7.1) for a given choice of a matrix M.
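As a quick numerical sanity check (an illustrative sketch in NumPy, not part of the original text), we can rebuild the Hadamard matrix from the spectral decomposition of H given above:

```python
import numpy as np

# Eigenvectors |psi_theta> = cos(theta)|0> + sin(theta)|1>
def psi(theta):
    return np.array([np.cos(theta), np.sin(theta)])

v0 = psi(np.pi / 8)      # eigenvalue +1
v1 = psi(5 * np.pi / 8)  # eigenvalue -1

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Rebuild H from its spectral decomposition: H = |v0><v0| - |v1><v1|
H_rebuilt = np.outer(v0, v0) - np.outer(v1, v1)
assert np.allclose(H, H_rebuilt)

# The eigenvalue equations H|v0> = |v0> and H|v1> = -|v1> also hold
assert np.allclose(H @ v0, v0)
assert np.allclose(H @ v1, -v1)
```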
Now let’s focus in on unitary matrices. Suppose U is unitary and we have a
complex number λ and a nonzero vector |ψ⟩ that satisfy the equation
U | ψ ⟩ = λ | ψ ⟩. (7.2)
Multiplying a vector by a unitary matrix does not change its Euclidean norm, so (7.2) implies that |λ| = 1. That is, λ lies on the complex unit circle

T = {α ∈ C : |α| = 1}.

(The symbol T is a common name for the complex unit circle. The name S¹ is also common.)
In the phase estimation problem, we’re given a unitary operation U together with an eigenvector |ψ⟩ of U. The corresponding eigenvalue lies on the unit circle, so it can be written as

λ = e^{2πiθ}

for a unique real number θ satisfying 0 ≤ θ < 1. The goal of the problem is to compute or approximate this real number θ.
Figure 7.1: A unitary operation U (viewed as a quantum gate) on the left and a
controlled-U operation on the right.
Figure 7.2: A quantum circuit for phase estimation with a single control qubit: a Hadamard gate is applied to a control qubit initialized to |0⟩, followed by a controlled-U gate acting on the target state |ψ⟩, a second Hadamard gate, and a standard basis measurement of the control qubit.
Figure 7.3: The states |π0 ⟩, . . . , |π3 ⟩ considered in the analysis of the single control
qubit phase estimation procedure.
|π2⟩ = (1/√2) |ψ⟩|0⟩ + (1/√2) U|ψ⟩|1⟩.

Because |ψ⟩ is an eigenvector of U with eigenvalue e^{2πiθ}, we have U|ψ⟩ = e^{2πiθ}|ψ⟩, and therefore

|π2⟩ = (1/√2) |ψ⟩|0⟩ + (e^{2πiθ}/√2) |ψ⟩|1⟩ = |ψ⟩ ⊗ ( (1/√2)|0⟩ + (e^{2πiθ}/√2)|1⟩ ).
Here we observe the phase kickback phenomenon. It is slightly different this time
than it was for Deutsch’s algorithm and the Deutsch–Jozsa algorithm because we’re
not working with a query gate — but the idea is similar.
Finally, the second Hadamard gate is performed. After just a bit of simplification, we obtain the following expression for the state:

|π3⟩ = |ψ⟩ ⊗ ( ((1 + e^{2πiθ})/2) |0⟩ + ((1 − e^{2πiθ})/2) |1⟩ )

The measurement therefore yields the outcomes 0 and 1 with these probabilities:

p0 = |(1 + e^{2πiθ})/2|² = cos²(πθ)

p1 = |(1 − e^{2πiθ})/2|² = sin²(πθ).
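These probability formulas are easy to verify numerically; the following sketch (illustrative, not part of the original text) checks the identities |(1 ± e^{2πiθ})/2|² = cos²(πθ), sin²(πθ):

```python
import numpy as np

for theta in np.linspace(0, 1, 13):
    amp0 = (1 + np.exp(2j * np.pi * theta)) / 2
    amp1 = (1 - np.exp(2j * np.pi * theta)) / 2
    p0, p1 = abs(amp0) ** 2, abs(amp1) ** 2
    assert np.isclose(p0, np.cos(np.pi * theta) ** 2)
    assert np.isclose(p1, np.sin(np.pi * theta) ** 2)
    assert np.isclose(p0 + p1, 1.0)  # the two probabilities always sum to 1
```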
7.2. PHASE ESTIMATION PROCEDURE 193
Figure 7.4: Output probabilities for phase estimation with a single control qubit.
Figure 7.4 shows a plot of the probabilities for the two possible outcomes, 0
and 1, as functions of θ. Naturally, the two probabilities always sum to 1. Notice
that when θ = 0, the measurement outcome is always 0, and when θ = 1/2, the
measurement outcome is always 1. So, although the measurement result doesn’t
reveal exactly what θ is, it does provide us with some information about it — and
if we were promised that either θ = 0 or θ = 1/2, we could learn from the circuit
which one is correct without error.
Intuitively speaking, we can think of the circuit’s measurement outcome as
being a guess for θ to “one bit of accuracy.” In other words, if we were to write θ in
binary notation and round it off to one bit, we’d have a number of the form 0.a, where

0.a = { 0     a = 0
      { 1/2   a = 1.
The measurement outcome can be viewed as a guess for the bit a. When θ is
neither 0 nor 1/2, there’s a nonzero probability that the guess will be wrong — but
the probability of making an error becomes smaller and smaller as we get closer to
0 or 1/2.
It’s natural to ask what role the two Hadamard gates play in this procedure:
• The first Hadamard gate sets the control qubit to a uniform superposition of
|0⟩ and |1⟩, so that when the phase kickback occurs, it happens for the |1⟩
state and not the |0⟩ state, creating a relative phase difference that affects the
measurement outcomes. If we didn’t do this and the phase kickback produced
a global phase, it would have no effect on the probabilities of obtaining different
measurement outcomes.
• The second Hadamard gate allows us to learn something about the number θ
through the phenomenon of interference. Prior to the second Hadamard gate,
the state of the top qubit is
(1/√2)|0⟩ + (e^{2πiθ}/√2)|1⟩,
and if we were to measure this state, we would obtain 0 and 1 each with probability 1/2, telling us nothing about θ. By performing the second Hadamard
gate, however, we cause the number θ to affect the output probabilities.
The circuit above uses the phase kickback phenomenon to approximate θ to a single
bit of accuracy. One bit of accuracy may be all we need in some situations — but for
factoring we’re going to need a lot more accuracy than that. A natural question is,
how can we learn more about θ?
One very simple thing we can do is to replace the controlled-U operation in our circuit with two copies of this operation, as in Figure 7.5. Two copies of a controlled-U gate applied in succession act as a controlled-U² gate, and |ψ⟩ is an eigenvector of U² with eigenvalue (e^{2πiθ})² = e^{2πi(2θ)}.

Figure 7.5: A modified version of the circuit in Figure 7.2 with two controlled-U gates in place of one.
So, if we run this version of the circuit, we’re effectively performing the same
computation as before, except that the number θ is replaced by 2θ. Figure 7.6 shows
a plot illustrating the output probabilities as θ ranges from 0 to 1.
Figure 7.6: Output probabilities for phase estimation with a single control qubit
and two controlled-unitary gates.
Doing this can indeed provide us with some additional information about θ. If
the binary representation of θ is
θ = 0.a1 a2 a3 · · ·
then doubling θ effectively shifts the binary point one position to the right.
2θ = a1 .a2 a3 · · ·
And because we’re equating θ = 1 with θ = 0 as we move around the unit circle,
we see that the bit a1 has no influence on our probabilities, and we’re effectively
obtaining a guess for the second bit after the binary point if we round θ to two bits.
For instance, if we knew in advance that θ was either 0 or 1/4, then we could fully
trust the measurement outcome to tell us which.
It’s not immediately clear, though, how this estimation should be reconciled
with what we learned from the original (non-doubled) phase kickback circuit to
give us the most accurate information possible about θ. So let’s take a step back and
consider how to proceed.
Rather than considering the two options described above separately, let’s combine them into a single circuit, as in Figure 7.7. The Hadamard gates after the controlled operations have been removed, and there are no measurements here yet. We’ll add more to the circuit as we consider our options for learning as much as we can about θ.

Figure 7.7: The initial portion of a quantum circuit for phase estimation with two control qubits.
If we run this circuit when |ψ⟩ is an eigenvector of U, the state of the bottom
qubits will remain |ψ⟩ throughout the entire circuit, and phases will be kicked into
the state of the top two qubits. Let’s analyze the circuit carefully by considering the
states indicated in Figure 7.8.
We can write the state |π1⟩ like this:

|π1⟩ = |ψ⟩ ⊗ (1/2) Σ_{a0=0}^{1} Σ_{a1=0}^{1} |a1 a0⟩.

The first controlled-U gate kicks the phase e^{2πiθ} onto the terms for which a0 = 1:

|π2⟩ = |ψ⟩ ⊗ (1/2) Σ_{a0=0}^{1} Σ_{a1=0}^{1} e^{2πi a0 θ} |a1 a0⟩.
The second and third controlled-U gates do something similar, except for a1 rather than a0, and with θ replaced by 2θ.

Figure 7.8: The states |π1⟩, |π2⟩, and |π3⟩ considered in the analysis of two-qubit phase estimation.

We can express the resulting state like this:

|π3⟩ = |ψ⟩ ⊗ (1/2) Σ_{a0=0}^{1} Σ_{a1=0}^{1} e^{2πi(2a1 + a0)θ} |a1 a0⟩.
Viewing |a1 a0⟩ as the binary encoding of x = 2a1 + a0, and supposing for the moment that θ = y/4 for some y ∈ {0, 1, 2, 3}, the state of the top two qubits becomes

|ϕy⟩ = (1/2) Σ_{x=0}^{3} e^{2πi x (y/4)} |x⟩ = (1/2) Σ_{x=0}^{3} e^{2πi xy/4} |x⟩
|ϕ0⟩ = (1/2)|0⟩ + (1/2)|1⟩ + (1/2)|2⟩ + (1/2)|3⟩

|ϕ1⟩ = (1/2)|0⟩ + (i/2)|1⟩ − (1/2)|2⟩ − (i/2)|3⟩

|ϕ2⟩ = (1/2)|0⟩ − (1/2)|1⟩ + (1/2)|2⟩ − (1/2)|3⟩

|ϕ3⟩ = (1/2)|0⟩ − (i/2)|1⟩ − (1/2)|2⟩ + (i/2)|3⟩
These vectors are orthogonal: if we choose any pair of them and compute their
inner product, we get 0. Each one is also a unit vector, so {|ϕ0 ⟩, |ϕ1 ⟩, |ϕ2 ⟩, |ϕ3 ⟩} is
an orthonormal basis. We therefore know right away that there is a measurement
that can discriminate them perfectly — meaning that, if we’re given one of them
but we don’t know which, then we can figure out which one it is without error.
To perform such a discrimination with a quantum circuit, we can first define a
unitary operation V that transforms standard basis states into the four states listed
above.
V |00⟩ = |ϕ0 ⟩
V |01⟩ = |ϕ1 ⟩
V |10⟩ = |ϕ2 ⟩
V |11⟩ = |ϕ3 ⟩
To write down V as a 4 × 4 matrix, it’s just a matter of taking the columns of V to
be the states |ϕ0 ⟩, . . . , |ϕ3 ⟩.
V = (1/2) ( 1   1   1   1 )
          ( 1   i  −1  −i )
          ( 1  −1   1  −1 )
          ( 1  −i  −1   i )
This is a special matrix, and it’s likely that some readers will have encountered
it before: it’s the matrix associated with the 4-dimensional discrete Fourier transform.
In light of this fact, let us call it by the name QFT4 rather than V. The name QFT is short for quantum Fourier transform — which is essentially just the discrete Fourier transform viewed as a unitary operation on a quantum system.
We can perform the inverse of this operation to go the other way, to transform the
states |ϕ0 ⟩, . . . , |ϕ3 ⟩ into the standard basis states |0⟩, . . . , |3⟩. If we do this, then we
can measure to learn which value y ∈ {0, 1, 2, 3} describes θ as θ = y/4. Figure 7.9
depicts a quantum circuit that does this.
Figure 7.9: The complete quantum circuit for phase estimation with two control
qubits.
To summarize, if we run this circuit when θ = y/4 for y ∈ {0, 1, 2, 3}, the state
immediately before the measurements take place will be |ψ⟩|y⟩ (for y encoded as a
two-bit binary string), so the measurements will reveal the value y without error.
This circuit is motivated by the special case that θ ∈ {0, 1/4, 1/2, 3/4} — but
we can run it for any choice of U and |ψ⟩, and hence any value of θ, that we wish.
Figure 7.10 shows a plot of the output probabilities the circuit produces for arbitrary
choices of θ.
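The two-control-qubit procedure can be simulated with a few lines of linear algebra. The sketch below (illustrative; not the course’s code) prepares the control-register state (1/2) Σ_x e^{2πixθ}|x⟩, applies the inverse of the quantum Fourier transform as a matrix, and returns the outcome probabilities:

```python
import numpy as np

def phase_estimation_probs(theta, m=2):
    """Outcome probabilities for phase estimation with m control qubits."""
    M = 2 ** m
    # State of the control register just before the inverse QFT
    state = np.exp(2j * np.pi * theta * np.arange(M)) / np.sqrt(M)
    # The QFT matrix has entries omega_M^(xy) / sqrt(M); dense is fine for small m
    x = np.arange(M)
    qft = np.exp(2j * np.pi * np.outer(x, x) / M) / np.sqrt(M)
    # Apply the inverse QFT and read off the measurement probabilities
    return np.abs(qft.conj().T @ state) ** 2

# When theta = y/4 exactly, outcome y occurs with certainty
assert np.allclose(phase_estimation_probs(0.25), [0, 1, 0, 0])

# For a generic theta, the most likely outcome appears with probability > 4/pi^2
probs = phase_estimation_probs(0.15)
assert probs.max() > 4 / np.pi ** 2
```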
This is a clear improvement over the single-qubit variant described earlier in
the lesson. It’s not perfect — it can give us the wrong answer — but the answer
is heavily skewed toward values of y for which y/4 is close to θ. In particular, the
Figure 7.10: Output probabilities for phase estimation with two control qubits.
most likely outcome always corresponds to the closest value of y/4 to θ (equating θ = 0 and θ = 1 as before), and from the plot it looks like this closest value of y always appears with probability just above 40%. When θ is exactly halfway between
two such values, like θ = 0.375 for instance, the two equally close values of y are
equally likely.
Given the improvement we’ve just obtained by using two control qubits rather
than one, in conjunction with the inverse of the 4-dimensional quantum Fourier
transform, it’s natural to consider generalizing it further — by adding more control
qubits. When we do this, we obtain the general phase estimation procedure. We’ll see
how this works shortly, but in order to describe it precisely we’re going to need to
discuss the quantum Fourier transform in greater generality, to see how it’s defined
for other dimensions and to see how we can implement it (or its inverse) with a
quantum circuit.
In the remainder of this section, we’ll examine the quantum Fourier transform in greater generality and see how it can be implemented with a quantum circuit on m qubits with cost O(m²) when N = 2^m.
The matrices that describe the quantum Fourier transform are derived from
an analogous operation on N-dimensional vectors known as the discrete Fourier
transform. This operation can be thought about in different ways. For instance, we
can think about the discrete Fourier transform in purely abstract, mathematical
terms as a linear mapping. Or we can think about it in computational terms, where
we’re given an N-dimensional vector of complex numbers (using binary notation
to encode the real and imaginary parts of the entries, let us suppose) and the goal
is to calculate the N-dimensional vector obtained by applying the discrete Fourier
transform. Our focus is on a third way: viewing this transformation as a unitary operation that can be performed on a quantum system.
There’s an efficient algorithm for computing the discrete Fourier transform on
a given input vector known as the fast Fourier transform. It has applications in
signal processing and many other areas, and is considered by many to be one of the
most important algorithms ever discovered. As it turns out, the implementation of
the quantum Fourier transform when N is a power of 2 that we’ll study is based
on precisely the same underlying structure that makes the fast Fourier transform
possible.
To define the quantum Fourier transform, we’ll first define a complex number ω N ,
for each positive integer N, like this:
ω_N = e^{2πi/N} = cos(2π/N) + i sin(2π/N).
This is the number on the complex unit circle we obtain if we start at 1 and move counter-clockwise by an angle of 2π/N radians, or a fraction of 1/N of the circumference of the circle. Here are a few examples.
ω_1 = 1
ω_2 = −1
ω_3 = −1/2 + (√3/2) i
ω_4 = i
ω_8 = (1 + i)/√2
ω_16 = √(2+√2)/2 + (√(2−√2)/2) i
ω_100 ≈ 0.998 + 0.063i
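These values can be confirmed numerically (an illustrative sketch; the helper name `omega` is ours):

```python
import cmath

def omega(N):
    """The N-th root of unity omega_N = e^(2*pi*i/N)."""
    return cmath.exp(2j * cmath.pi / N)

assert cmath.isclose(omega(1), 1)
assert cmath.isclose(omega(2), -1)
assert cmath.isclose(omega(4), 1j)
assert cmath.isclose(omega(8), (1 + 1j) / 2 ** 0.5)
# omega_N always lies on the unit circle, and omega_N ** N equals 1
for N in range(1, 20):
    assert cmath.isclose(abs(omega(N)), 1)
    assert cmath.isclose(omega(N) ** N, 1)
```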
The quantum Fourier transform QFT_N is the N × N matrix whose entry in row x and column y is ω_N^{xy}/√N:

QFT_N = (1/√N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} ω_N^{xy} |x⟩⟨y|.

As was already stated, this is the matrix associated with the N-dimensional discrete Fourier transform. Often the leading factor of 1/√N is not included in the definition of this matrix, but we need to include it to obtain a unitary matrix.
Here’s the quantum Fourier transform, written as a matrix, for some small
values of N.
QFT_1 = ( 1 )

QFT_2 = (1/√2) ( 1   1 )
               ( 1  −1 )

QFT_3 = (1/√3) ( 1   1            1          )
               ( 1   (−1+i√3)/2   (−1−i√3)/2 )
               ( 1   (−1−i√3)/2   (−1+i√3)/2 )

QFT_4 = (1/2) ( 1   1   1   1 )
              ( 1   i  −1  −i )
              ( 1  −1   1  −1 )
              ( 1  −i  −1   i )

QFT_8 = (1/(2√2)) ( 1   1          1    1          1    1          1    1         )
                  ( 1   (1+i)/√2   i    (−1+i)/√2  −1   (−1−i)/√2  −i   (1−i)/√2  )
                  ( 1   i          −1   −i         1    i          −1   −i        )
                  ( 1   (−1+i)/√2  −i   (1+i)/√2   −1   (1−i)/√2   i    (−1−i)/√2 )
                  ( 1   −1         1    −1         1    −1         1    −1        )
                  ( 1   (−1−i)/√2  i    (1−i)/√2   −1   (1+i)/√2   −i   (−1+i)/√2 )
                  ( 1   −i         −1   i          1    −i         −1   i         )
                  ( 1   (1−i)/√2   −i   (−1−i)/√2  −1   (−1+i)/√2  i    (1+i)/√2  )
Unitarity
Let’s check that QFT N is unitary, for any selection of N. One way to do this is to show
that its columns form an orthonormal basis. We can define a vector corresponding
to column number y, starting from y = 0 and going up to y = N − 1, like this:
|ϕy⟩ = (1/√N) Σ_{x=0}^{N−1} ω_N^{xy} |x⟩.
Taking the inner product between any two of these vectors gives us this expression:
⟨ϕz|ϕy⟩ = (1/N) Σ_{x=0}^{N−1} ω_N^{x(y−z)}
We can evaluate sums like this using the following formula for the sum of the
first N terms of a geometric series.
1 + α + α² + ··· + α^{N−1} = { (α^N − 1)/(α − 1)   if α ≠ 1
                              { N                   if α = 1
Specifically, we can use this formula when α = ω_N^{y−z}. When y = z, we have α = 1,
so using the formula and dividing by N gives
⟨ϕy |ϕy ⟩ = 1.
When y ≠ z, we have α ≠ 1 (because 0 < |y − z| < N), and the formula gives ⟨ϕz|ϕy⟩ = 0. This happens because ω_N^N = e^{2πi} = 1, so ω_N^{N(y−z)} = 1^{y−z} = 1, making the numerator α^N − 1 zero, while the denominator is nonzero because ω_N^{y−z} ≠ 1. Intuitively speaking,
what we’re doing is summing a bunch of points that are distributed around the unit
circle, and they cancel out and leave 0 when summed.
We have therefore established that {|ϕ0⟩, ..., |ϕ_{N−1}⟩} is an orthonormal set:

⟨ϕz|ϕy⟩ = { 1   y = z
          { 0   y ≠ z.

The columns of QFT_N therefore form an orthonormal basis, and so QFT_N is unitary.
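Unitarity is also easy to confirm numerically for small N (an illustrative sketch, not part of the text):

```python
import numpy as np

def qft_matrix(N):
    """The N-dimensional QFT matrix with entries omega_N^(xy) / sqrt(N)."""
    x = np.arange(N)
    return np.exp(2j * np.pi * np.outer(x, x) / N) / np.sqrt(N)

for N in [1, 2, 3, 4, 8, 16]:
    F = qft_matrix(N)
    # Columns are orthonormal, i.e. F^dagger F equals the identity
    assert np.allclose(F.conj().T @ F, np.eye(N))
```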
Controlled-phase gates
To implement the quantum Fourier transform with a quantum circuit, we’ll need
to make use of controlled-phase gates. Recall that a phase operation is a single-qubit
unitary operation of the form
Pα = ( 1   0      )
     ( 0   e^{iα} )
for any real number α. A controlled version of this gate has the following matrix.
( 1   0   0   0      )
( 0   1   0   0      )
( 0   0   1   0      )
( 0   0   0   e^{iα} )
For this controlled gate, it doesn’t actually matter which qubit is the control and
which is the target because the two possibilities are equivalent. We can use any of
the symbols shown in Figure 7.11 to represent this gate in quantum circuit diagrams.
For the third form, the number α is also sometimes placed on the side of the control
line or under the lower control when that’s convenient.
Figure 7.11: Symbols that can be used to represent a controlled-Pα gate in quantum circuit diagrams.

Using controlled-phase gates, we can perform the operation

|a⟩|y⟩ ↦ ω_{2^m}^{ay} |a⟩|y⟩   (7.3)

for a bit a and an integer y ∈ {0, ..., 2^{m−1} − 1} encoded in binary as y_{m−2} ··· y_0. Because ω_{2^m}^{ay} factors as a product over the bits of y, it suffices to apply a controlled-P_{π/2^{m−1−k}} gate between the qubit |a⟩ and the qubit |y_k⟩, for each k.

Figure 7.12: A quantum circuit for performing the operation (7.3) when m = 5.
Now we’ll see how we can implement the quantum Fourier transform with a circuit
when the dimension N = 2m is a power of 2. There are, in fact, multiple ways to
implement the quantum Fourier transform, but this is arguably the simplest method
known. Once we know how to implement the quantum Fourier transform with a
quantum circuit, it’s straightforward to implement its inverse: we can replace each
gate with its inverse (or, equivalently, conjugate transpose) and apply the gates in
the reverse order. Every quantum circuit composed of unitary gates alone can be
inverted in this way.
The implementation is recursive in nature, so that’s how it’s most naturally
described. The base case is m = 1, in which case the quantum Fourier transform is
a Hadamard operation.
To perform the quantum Fourier transform on m qubits when m ≥ 2, we can
perform the following steps, whose actions we’ll describe for standard basis states
of the form | x ⟩| a⟩, where x ∈ {0, . . . , 2m−1 − 1} is an integer encoded as m − 1 bits
using binary notation and a is a single bit.
1. First apply the 2m−1 -dimensional quantum Fourier transform to the bot-
tom/leftmost m − 1 qubits to obtain this state:
(QFT_{2^{m−1}} |x⟩)|a⟩ = (1/√(2^{m−1})) Σ_{y=0}^{2^{m−1}−1} ω_{2^{m−1}}^{xy} |y⟩|a⟩.
This is done by recursively applying the method being described for one fewer
qubit, using the Hadamard operation on a single qubit as the base case.
2. Use the top/rightmost qubit as a control to inject the phase ω_{2^m}^y for each standard basis state |y⟩ of the remaining m − 1 qubits (as is described above) to obtain this state:

(1/√(2^{m−1})) Σ_{y=0}^{2^{m−1}−1} ω_{2^{m−1}}^{xy} ω_{2^m}^{ay} |y⟩|a⟩.
3. Apply a Hadamard gate to the top/rightmost qubit to obtain this state:

(1/√(2^m)) Σ_{y=0}^{2^{m−1}−1} Σ_{b=0}^{1} (−1)^{ab} ω_{2^{m−1}}^{xy} ω_{2^m}^{ay} |y⟩|b⟩.
4. Permute the order of the qubits so that the least significant bit becomes the
most significant bit, with all others shifted up/right:
(1/√(2^m)) Σ_{y=0}^{2^{m−1}−1} Σ_{b=0}^{1} (−1)^{ab} ω_{2^{m−1}}^{xy} ω_{2^m}^{ay} |b⟩|y⟩.
For example, Figure 7.13 shows the circuit we obtain for N = 32 = 25 . In this
diagram, the qubits are given names that correspond to the standard basis vectors
| x ⟩| a⟩ (for the input) and |b⟩|y⟩ (for the output) for clarity.
Figure 7.13: A quantum circuit for QFT32 using an operation for QFT16 .
Analysis
The key formula we need to verify that the circuit just described implements the
2m -dimensional quantum Fourier transform is this one:
(−1)^{ab} ω_{2^{m−1}}^{xy} ω_{2^m}^{ay} = ω_{2^m}^{(2x + a)(2^{m−1} b + y)}.
This formula works for any choice of integers a, b, x, and y, but we’ll only need it
for a, b ∈ {0, 1} and x, y ∈ {0, . . . , 2m−1 − 1}. It can be checked by expanding the
product in the exponent on the right-hand side:

ω_{2^m}^{(2x+a)(2^{m−1}b+y)} = ω_{2^m}^{2^m xb} · ω_{2^m}^{2xy} · ω_{2^m}^{2^{m−1}ab} · ω_{2^m}^{ay} = (−1)^{ab} ω_{2^{m−1}}^{xy} ω_{2^m}^{ay},

using the observations that ω_{2^m}^{2^m xb} = 1, ω_{2^m}^{2xy} = ω_{2^{m−1}}^{xy}, and ω_{2^m}^{2^{m−1}ab} = e^{πi ab} = (−1)^{ab}.
The formula implies that

QFT_{2^m} |2x + a⟩ = (1/√(2^m)) Σ_{y=0}^{2^{m−1}−1} Σ_{b=0}^{1} ω_{2^m}^{(2x+a)(2^{m−1}b+y)} |2^{m−1}b + y⟩

= (1/√(2^m)) Σ_{y=0}^{2^{m−1}−1} Σ_{b=0}^{1} (−1)^{ab} ω_{2^{m−1}}^{xy} ω_{2^m}^{ay} |2^{m−1}b + y⟩.
Finally, by thinking about the standard basis states |x⟩|a⟩ and |b⟩|y⟩ as binary encodings of integers in the range {0, ..., 2^m − 1},

|x⟩|a⟩ = |2x + a⟩
|b⟩|y⟩ = |2^{m−1} b + y⟩,

we see that this is exactly the state produced by the four steps described above, so the circuit correctly implements QFT_{2^m}.
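The recursive construction can be checked against the direct definition of QFT_{2^m} by expressing the four steps as matrix operations. The sketch below is illustrative and assumes the qubit-ordering conventions stated above (the bottom/rightmost qubit is the least significant):

```python
import numpy as np

def qft(N):
    x = np.arange(N)
    return np.exp(2j * np.pi * np.outer(x, x) / N) / np.sqrt(N)

def qft_recursive(m):
    """Build the QFT on 2**m dimensions via the four recursive steps."""
    if m == 1:
        return np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard base case
    M, half = 2 ** m, 2 ** (m - 1)
    # Step 1: QFT on the top m-1 qubits; basis index is 2x + a
    step1 = np.kron(qft_recursive(m - 1), np.eye(2))
    # Step 2: inject phase omega_M^(a*y) on basis state |y>|a> (index 2y + a)
    idx = np.arange(M)
    y, a = idx // 2, idx % 2
    step2 = np.diag(np.exp(2j * np.pi * a * y / M))
    # Step 3: Hadamard on the bottom qubit
    step3 = np.kron(np.eye(half), np.array([[1, 1], [1, -1]]) / np.sqrt(2))
    # Step 4: permute |y>|b> (index 2y + b) to |b>|y> (index half*b + y)
    perm = np.zeros((M, M))
    for i in range(M):
        yy, b = i // 2, i % 2
        perm[half * b + yy, i] = 1
    return perm @ step3 @ step2 @ step1

for m in [1, 2, 3, 4]:
    assert np.allclose(qft_recursive(m), qft(2 ** m))
```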
Letting s_m denote the number of gates in the circuit for the quantum Fourier transform on m qubits, we have s_1 = 1, since for m = 1 the circuit is a single Hadamard gate. If m ≥ 2, then in the circuit above we need s_{m−1} gates for the quantum Fourier transform on m − 1 qubits, plus m − 1 controlled-phase gates, plus a Hadamard gate, plus m − 1 swap gates, so

s_m = s_{m−1} + 2m − 1.

Solving this recurrence yields s_m = m², so the circuit has size O(m²).
Figure 7.14: A quantum circuit for the general phase estimation procedure.
When the circuit of Figure 7.14 is run on an eigenvector |ψ⟩ of U with eigenvalue e^{2πiθ}, the state immediately before the inverse quantum Fourier transform is

(1/√(2^m)) Σ_{x=0}^{2^m−1} U^x |ψ⟩|x⟩ = |ψ⟩ ⊗ (1/√(2^m)) Σ_{x=0}^{2^m−1} e^{2πixθ} |x⟩.
A special case
Along similar lines to what we did in the m = 2 case, we’ll first consider the special
case that θ = y/2m for y ∈ {0, . . . , 2m − 1}. In this case the state prior to the inverse
quantum Fourier transform can alternatively be written like this:
|ψ⟩ ⊗ (1/√(2^m)) Σ_{x=0}^{2^m−1} e^{2πi xy/2^m} |x⟩ = |ψ⟩ ⊗ (1/√(2^m)) Σ_{x=0}^{2^m−1} ω_{2^m}^{xy} |x⟩ = |ψ⟩ ⊗ QFT_{2^m} |y⟩.
So, when the inverse quantum Fourier transform is applied, the state becomes |ψ⟩|y⟩, and the measurements reveal y without error.
For other values of θ, meaning ones that don’t take the form y/2m for an integer y,
the measurement outcomes won’t be certain, but we can prove bounds on the
probabilities for different outcomes. Going forward, let’s consider an arbitrary
choice of θ satisfying 0 ≤ θ < 1.
After the inverse quantum Fourier transform is performed, the state of the circuit
is this:

|ψ⟩ ⊗ (1/2^m) Σ_{y=0}^{2^m−1} Σ_{x=0}^{2^m−1} e^{2πix(θ − y/2^m)} |y⟩.
So, when the measurements on the top m qubits are performed, we see each outcome y with probability

p_y = | (1/2^m) Σ_{x=0}^{2^m−1} e^{2πix(θ − y/2^m)} |².
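The probability p_y can be evaluated either by summing this series term by term or via the geometric-series formula; the sketch below (illustrative, not part of the text) confirms the two agree:

```python
import numpy as np

def p_direct(theta, y, m):
    """p_y computed by summing the geometric series term by term."""
    M = 2 ** m
    s = sum(np.exp(2j * np.pi * x * (theta - y / M)) for x in range(M))
    return abs(s / M) ** 2

def p_closed(theta, y, m):
    """p_y from the closed-form ratio (valid when theta != y / 2**m)."""
    M = 2 ** m
    num = np.exp(2j * np.pi * (M * theta - y)) - 1
    den = np.exp(2j * np.pi * (theta - y / M)) - 1
    return abs(num / den) ** 2 / M ** 2

m, theta = 3, 0.3
for y in range(2 ** m):
    assert np.isclose(p_direct(theta, y, m), p_closed(theta, y, m))
```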
To get a better handle on these probabilities, we’ll make use of the same formula
that we saw before, for the sum of the initial portion of a geometric series.
1 + α + α² + ··· + α^{N−1} = { (α^N − 1)/(α − 1)   if α ≠ 1
                              { N                   if α = 1
We can simplify the sum appearing in the formula for p_y by taking α = e^{2πi(θ − y/2^m)}. Here’s what we obtain:

Σ_{x=0}^{2^m−1} e^{2πix(θ − y/2^m)} = { 2^m                                                θ = y/2^m
                                      { (e^{2πi(2^m θ − y)} − 1)/(e^{2πi(θ − y/2^m)} − 1)   θ ≠ y/2^m
So, in the case that θ = y/2^m, we find that p_y = 1 (as we already knew from considering this special case), and in the case that θ ≠ y/2^m, we find that

p_y = (1/2^{2m}) · | (e^{2πi(2^m θ − y)} − 1)/(e^{2πi(θ − y/2^m)} − 1) |².
Figure 7.15: Arc and chord lengths on the complex unit circle.
We can learn more about these probabilities by thinking about how arc lengths
and chord lengths on the unit circle are related. Figure 7.15 illustrates the relationships we need for any real number δ ∈ [−1/2, 1/2].
First, the chord length (drawn in blue) can’t possibly be larger than the arc length
(drawn in purple):
|e^{2πiδ} − 1| ≤ 2π|δ|.
Relating these lengths in the other direction, we see that the ratio of the arc length
to the chord length is greatest when δ = ±1/2, and in this case the ratio is half the
circumference of the circle divided by the diameter, which is π/2. Thus, we have
2π|δ| / |e^{2πiδ} − 1| ≤ π/2,
and so
|e^{2πiδ} − 1| ≥ 4|δ|.
An analysis based on these relations reveals the following two facts.

1. Suppose that y ∈ {0, ..., 2^m − 1} is the outcome for which |2^m θ − y| ≤ 1/2, meaning that y/2^m is a best m-bit approximation to θ. We’ll prove that p_y has to be pretty large in this case. By the assumption we’re considering, it follows that |2^m θ − y| ≤ 1/2, so we can use the second observation above relating arc and chord lengths to conclude that
|e^{2πi(2^m θ − y)} − 1| ≥ 4|2^m θ − y| = 4 · 2^m · |θ − y/2^m|.
We can also use the first observation about arc and chord lengths to conclude
|e^{2πi(θ − y/2^m)} − 1| ≤ 2π |θ − y/2^m|.
Putting these two inequalities to use on p_y reveals

p_y ≥ (1/2^{2m}) · (16 · 2^{2m} |θ − y/2^m|²)/(4π² |θ − y/2^m|²) = 4/π² ≈ 0.405.
This explains our observation that the best outcome occurs with probability
greater than 40% in the m = 2 version of phase estimation discussed earlier.
It’s not really 40%, it’s 4/π 2 , and this bound holds for every choice of m.
2. Now suppose that y ∈ {0, ..., 2^m − 1} satisfies |θ − y/2^m| > 2^{−m} (equating θ = 0 and θ = 1 as before), so that y/2^m is a poor approximation to θ. We’ll prove that p_y can’t be too large in this case. For the numerator, we have the trivial bound

|e^{2πi(2^m θ − y)} − 1| ≤ 2,

which follows from the fact that any two points on the unit circle can differ in absolute value by at most 2.
We can also use the second observation about arc and chord lengths from above, this time working with the denominator of p_y rather than the numerator, to conclude
|e^{2πi(θ − y/2^m)} − 1| ≥ 4|θ − y/2^m| ≥ 4 · 2^{−m}.
Putting the two inequalities together reveals

p_y ≤ (1/2^{2m}) · 4/(16 · 2^{−2m}) = 1/4.
Note that, while this bound is good enough for our purposes, it is fairly crude
— the probability is usually much lower than 1/4.
The important take-away from this analysis is that very close approximations to
θ are likely to occur — we’ll get a best m-bit approximation with probability greater
than 40% — whereas approximations off by more than 2−m are less likely to occur,
with probability upper bounded by 25%.
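Both bounds can be sanity-checked by brute force (an illustrative sketch; the small tolerances guard against floating-point error):

```python
import numpy as np

def probs(theta, m):
    """All 2**m outcome probabilities for phase estimation at angle theta."""
    M = 2 ** m
    x = np.arange(M)
    return np.array([abs(np.exp(2j * np.pi * x * (theta - y / M)).sum() / M) ** 2
                     for y in range(M)])

m = 4
M = 2 ** m
for theta in np.linspace(0.001, 0.999, 199):
    p = probs(theta, m)
    best = int(np.round(theta * M)) % M          # a best m-bit approximation
    assert p[best] >= 4 / np.pi ** 2 - 1e-9      # occurs with probability > 40%
    # Outcomes further than 2**-m from theta each occur with probability <= 1/4
    for y in range(M):
        d = min(abs(theta - y / M), 1 - abs(theta - y / M))
        if d > 1 / M:
            assert p[y] <= 0.25 + 1e-9
```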
Given these guarantees, it is possible to boost our confidence by repeating the
phase estimation procedure several times, to gather statistical evidence about θ. It is
important to note that the state |ψ⟩ of the bottom collection of qubits is unchanged
by the phase estimation procedure, so it can be used to run the procedure as many
times as we like. In particular, each time we run the circuit, we get a best m-bit
approximation to θ with probability greater than 40%, while the probability of being
off by more than 2−m is bounded by 25%. If we run the circuit several times and
take the most commonly appearing outcome of the runs, it’s therefore exceedingly
likely that the outcome that appears most commonly will not be one that occurs at
most 25% of the time. As a result, we’ll be very likely to obtain an approximation
y/2m that’s within 1/2m of the value θ. Indeed, the unlikely chance that we’re off
by more than 1/2m decreases exponentially in the number of times the procedure is
run.
Figures 7.16 and 7.17 show plots of the probabilities for three consecutive values
for y when m = 3 and m = 4 as functions of θ. (Only three outcomes are shown for
clarity. Probabilities for other outcomes are obtained by cyclically shifting the same
underlying function.)
0.8
probability
0.6 3
4
0.4 5
0.2
Figure 7.16: Output probabilities for the outcomes 3, 4, and 5 in the phase estimation
procedure using m = 3 control qubits.
0.8
probability
0.6 7
8
0.4 9
0.2
Figure 7.17: Output probabilities for the outcomes 7, 8, and 9 in the phase estimation
procedure using m = 4 control qubits.
This second part of Shor’s algorithm doesn’t make use of quantum computing at all; it’s completely classical. Quantum computing is only needed to solve order finding.
To explain the order finding problem and how it can be solved using phase estima-
tion, it will be helpful to begin with a couple of basic number theory concepts, and
to introduce some handy notation along the way.
To begin, for any given positive integer N, define the set Z N like this.
Z N = {0, 1, . . . , N − 1}
Addition and multiplication in Z_N are performed modulo N. For example, here are the addition and multiplication tables for Z_6:

+ | 0 1 2 3 4 5        · | 0 1 2 3 4 5
--+------------        --+------------
0 | 0 1 2 3 4 5        0 | 0 0 0 0 0 0
1 | 1 2 3 4 5 0        1 | 0 1 2 3 4 5
2 | 2 3 4 5 0 1        2 | 0 2 4 0 2 4
3 | 3 4 5 0 1 2        3 | 0 3 0 3 0 3
4 | 4 5 0 1 2 3        4 | 0 4 2 0 4 2
5 | 5 0 1 2 3 4        5 | 0 5 4 3 2 1
7.3. SHOR’S ALGORITHM 217
Z∗N = { a ∈ Z N : gcd( a, N ) = 1}
If we focus our attention on the operation of multiplication, the set Z∗N forms a
group — specifically an abelian group — which is another important type of object in
algebra. It’s a basic fact about these sets (and finite groups in general) that if we pick any element a ∈ Z*_N and repeatedly multiply a by itself, we’ll always eventually get the number 1.
For a first example, let’s take N = 6. We have that 5 ∈ Z6∗ because gcd(5, 6) = 1,
and if we multiply 5 to itself we get 1, as the table above confirms.
5^2 = 1 (working within Z_6)
For a second example, let’s take N = 21. The elements of Z*_21 are 1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, and 20. For each of these elements, it is possible to raise that number to a positive integer power to get 1. Here are the smallest powers for which this works:
1^1 = 1     8^2 = 1     16^3 = 1
2^6 = 1     10^6 = 1    17^6 = 1
4^3 = 1     11^6 = 1    19^6 = 1
5^6 = 1     13^2 = 1    20^2 = 1
Naturally we’re working within Z21 for all of these equations, which we haven’t
bothered to write — we take it to be implicit to avoid cluttering things up. We’ll
continue to do that throughout the rest of the lesson.
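This table of smallest powers can be reproduced with a short computation (an illustrative sketch; the helper name `order` is ours):

```python
from math import gcd

def order(a, N):
    """Smallest positive integer r with a**r = 1 (mod N); requires gcd(a, N) = 1."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

Z21_star = [a for a in range(1, 21) if gcd(a, 21) == 1]
assert Z21_star == [1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20]
assert {a: order(a, 21) for a in Z21_star} == {
    1: 1, 2: 6, 4: 3, 5: 6, 8: 2, 10: 6,
    11: 6, 13: 2, 16: 3, 17: 6, 19: 6, 20: 2,
}
```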
Order finding

Suppose we’re given a ∈ Z*_N. The order of a modulo N is the smallest positive integer r for which a^r = 1, and in the order-finding problem the goal is to compute this number r from a and N. To connect order finding with phase estimation, we define a unitary operation M_a on an N-dimensional system whose standard basis states correspond to the elements of Z_N:

M_a |x⟩ = |ax⟩ (for each x ∈ Z_N)
To be clear, we’re doing the multiplication in Z N , so it’s implicit that we’re taking
the product modulo N inside of the ket on the right-hand side of the equation.
For example, if we take N = 15 and a = 2, then the action of M_2 on the standard basis {|0⟩, ..., |14⟩} is as follows:

|0⟩ ↦ |0⟩    |5⟩ ↦ |10⟩    |10⟩ ↦ |5⟩
|1⟩ ↦ |2⟩    |6⟩ ↦ |12⟩    |11⟩ ↦ |7⟩
|2⟩ ↦ |4⟩    |7⟩ ↦ |14⟩    |12⟩ ↦ |9⟩
|3⟩ ↦ |6⟩    |8⟩ ↦ |1⟩     |13⟩ ↦ |11⟩
|4⟩ ↦ |8⟩    |9⟩ ↦ |3⟩     |14⟩ ↦ |13⟩
There’s another way to think about the inverse that doesn’t require any knowl-
edge of r (which, after all, is what we’re trying to compute). For every element
a ∈ Z∗N there’s always a unique element b ∈ Z∗N that satisfies ab = 1. We denote
this element b by a−1 , and it can be computed efficiently; an extension of Euclid’s
GCD algorithm does it with cost quadratic in lg( N ). And thus
Ma−1 Ma = Ma−1 a = M1 = I.
So, the operation Ma is both deterministic and invertible. That implies that it’s
described by a permutation matrix, and is therefore unitary.
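For example, we can construct the matrix for M_a directly and confirm these properties (an illustrative sketch, not part of the text):

```python
import numpy as np
from math import gcd

def M(a, N):
    """Matrix of the operation M_a |x> = |ax mod N> on an N-dimensional system."""
    mat = np.zeros((N, N), dtype=int)
    for x in range(N):
        mat[(a * x) % N, x] = 1
    return mat

N = 15
for a in range(1, N):
    if gcd(a, N) == 1:
        mat = M(a, N)
        # Permutation matrix: exactly one 1 per row and column, hence unitary
        assert (mat.sum(axis=0) == 1).all() and (mat.sum(axis=1) == 1).all()
        assert np.allclose(mat.T @ mat, np.eye(N))
        # M_{a^{-1}} is the inverse of M_a
        a_inv = pow(a, -1, N)
        assert np.allclose(M(a_inv, N) @ mat, np.eye(N))
```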
Now let’s think about the eigenvectors and eigenvalues of the operation Ma ,
assuming that a ∈ Z∗N . As was just argued, this assumption tells us that Ma is
unitary.
There are N eigenvalues of Ma , possibly including the same eigenvalue repeated
multiple times, and in general there’s some freedom in selecting corresponding
eigenvectors — but we won’t need to worry about all of the possibilities. Let’s start
simply and identify just one eigenvector of Ma .
|ψ0⟩ = (|1⟩ + |a⟩ + ··· + |a^{r−1}⟩) / √r
The number r is the order of a modulo N, here and throughout the remainder of the
lesson. The eigenvalue associated with this eigenvector is 1 because it isn’t changed
when we multiply by a.
M_a |ψ0⟩ = (|a⟩ + ··· + |a^{r−1}⟩ + |a^r⟩) / √r = (|a⟩ + ··· + |a^{r−1}⟩ + |1⟩) / √r = |ψ0⟩
This happens because a^r = 1, so each standard basis state |a^k⟩ gets shifted to |a^{k+1}⟩ for k ≤ r − 2, and |a^{r−1}⟩ gets shifted back to |1⟩. Informally speaking, it’s like we’re
slowly stirring |ψ0 ⟩, but it’s already completely stirred so nothing changes.
Here’s another example of an eigenvector of Ma . This one happens to be more
interesting in the context of order finding and phase estimation.
|ψ1⟩ = (|1⟩ + ω_r^{−1} |a⟩ + ··· + ω_r^{−(r−1)} |a^{r−1}⟩) / √r
Alternatively, we can write this vector using a summation as follows.
|ψ1⟩ = (1/√r) Σ_{k=0}^{r−1} ω_r^{−k} |a^k⟩
Here we’re seeing the complex number ωr = e2πi/r showing up naturally, due
to the way that multiplication by a works modulo N. This time the corresponding
eigenvalue is ω_r. To see this, we can first compute as follows:

M_a |ψ1⟩ = (1/√r) Σ_{k=0}^{r−1} ω_r^{−k} M_a |a^k⟩ = (1/√r) Σ_{k=0}^{r−1} ω_r^{−k} |a^{k+1}⟩ = (1/√r) Σ_{k=1}^{r} ω_r^{−(k−1)} |a^k⟩ = ω_r · (1/√r) Σ_{k=1}^{r} ω_r^{−k} |a^k⟩.

Because ω_r^{−r} |a^r⟩ = ω_r^{0} |a^0⟩ = |1⟩, the final sum is the same as the one defining |ψ1⟩, so M_a |ψ1⟩ = ω_r |ψ1⟩.
Using the same reasoning, we can identify additional eigenvector/eigenvalue
pairs for Ma . For any choice of j ∈ {0, . . . , r − 1} we have that
|ψj⟩ = (1/√r) Σ_{k=0}^{r−1} ω_r^{−jk} |a^k⟩

is an eigenvector of M_a whose corresponding eigenvalue is ω_r^j:

M_a |ψj⟩ = ω_r^j |ψj⟩
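These eigenvector/eigenvalue pairs can be verified numerically; the sketch below (illustrative, not the course’s code) uses N = 15 and a = 2, for which the order is r = 4:

```python
import numpy as np

N, a, r = 15, 2, 4   # 2 has order 4 modulo 15: 2, 4, 8, 1

# Matrix of M_a |x> = |ax mod N>
Ma = np.zeros((N, N))
for x in range(N):
    Ma[(a * x) % N, x] = 1

omega_r = np.exp(2j * np.pi / r)
for j in range(r):
    # |psi_j> = (1/sqrt(r)) sum_k omega_r^(-jk) |a^k mod N>
    psi = np.zeros(N, dtype=complex)
    for k in range(r):
        psi[pow(a, k, N)] += omega_r ** (-j * k)
    psi /= np.sqrt(r)
    # Eigenvalue equation M_a |psi_j> = omega_r^j |psi_j>
    assert np.allclose(Ma @ psi, omega_r ** j * psi)
```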
To build quantum circuits for M_a and its powers, we encode the elements of Z_N as binary strings of length n = ⌈lg(N)⌉. For N = 21, for instance, n = 5 and the encoding is as follows:

0 ↦ 00000
1 ↦ 00001
⋮
20 ↦ 10100
1. Build a circuit for the operation

|x⟩|y⟩ ↦ |x⟩|y ⊕ f_a(x)⟩

where

f_a(x) = { ax (mod N)   0 ≤ x < N
         { x            N ≤ x < 2^n
using the method described in the previous lesson. This gives us a circuit of
size O(n2 ).
2. Swap the two n-qubit systems qubit-by-qubit using n swap gates.
3. Along similar lines to the first step, build a circuit for the operation
|x⟩|y⟩ ↦ |x⟩|y ⊕ f_{a^{−1}}(x)⟩
The method requires workspace qubits, but they’re returned to their initialized state
at the end, which allows us to use these circuits for phase estimation. The total cost
of the circuit we obtain is O(n2 ).
To perform M2a , M4a , M8a , and so on, we can use exactly the same method, except
that we replace a with a2 , a4 , a8 , and so on, as elements of Z∗N . That is, for any power
k we choose, we can create a circuit for Mak not by iterating k times the circuit for
Ma , but instead by computing b = ak ∈ Z∗N and then using the circuit for Mb .
The computation of powers ak ∈ Z N is the modular exponentiation problem
mentioned in the previous lesson. This computation can be done classically, using
the power algorithm for modular exponentiation mentioned in the previous lesson.
In fact, we only require power-of-2 powers of a, in particular a^2, a^4, ..., a^{2^{m−1}} ∈ Z*_N,
and we can obtain these powers by iteratively squaring m − 1 times. Each squaring
can be performed by a Boolean circuit of size O(n2 ).
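As a minimal illustration of this classical step, the following sketch computes the power-of-2 powers of a by iterated squaring (the values a = 3 and N = 35 are illustrative, not taken from the text):

```python
def power_of_two_powers(a, N, m):
    """Return [a, a^2, a^4, ..., a^(2^(m-1))], each reduced modulo N."""
    powers = [a % N]
    for _ in range(m - 1):
        # Each new entry is the square of the previous one, modulo N.
        powers.append(powers[-1] * powers[-1] % N)
    return powers
```

Each entry is obtained from the previous one by a single squaring modulo N, so computing all m powers costs only m − 1 multiplications rather than up to 2^{m−1} of them.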
In essence, we're offloading the problem of iterating M_a as many as 2^{m−1} times
to an efficient classical computation. And it's good fortune that this is possible!
For an arbitrary choice of a quantum circuit in the phase estimation problem, this
is not likely to be possible, and in that case the resulting cost for phase estimation
grows exponentially in the number of control qubits m.
To understand how we can solve the order finding problem using phase estimation,
let’s start by supposing that we run the phase estimation procedure on the operation
Ma using the eigenvector |ψ1 ⟩. Getting our hands on this eigenvector isn’t easy, as
it turns out, so this won’t be the end of the story — but it’s helpful to start here.
The eigenvalue of M_a corresponding to the eigenvector |ψ_1⟩ is
$$\omega_r = e^{2\pi i \frac{1}{r}}.$$
That is, ω_r = e^{2πiθ} for θ = 1/r. So, if we run the phase estimation procedure on
M_a using the eigenvector |ψ_1⟩, we'll get an approximation to 1/r. By computing
the reciprocal we'll be able to learn r, provided that our approximation is good
enough.
In more detail, when we run the phase estimation procedure using m control
qubits, what we obtain is a number y ∈ {0, . . . , 2^m − 1}. We then take y/2^m as
a guess for θ, which is 1/r in the case at hand. To figure out what r is from
this approximation, the natural thing to do is to compute the reciprocal of our
approximation and round to the nearest integer.
$$\left\lfloor \frac{2^m}{y} + \frac{1}{2} \right\rfloor$$
Writing $y/2^m = 1/r + \varepsilon$, where ε denotes the error in our approximation, we see that
$$\frac{2^m}{y} = \frac{1}{\frac{1}{r} + \varepsilon} = \frac{r}{1 + \varepsilon r} = r - \frac{\varepsilon r^2}{1 + \varepsilon r}.$$
Provided that |ε| is small enough, the final term has absolute value less than 1/2.
We’re less than 1/2 away from r, so as expected we’ll get r when we round.
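Here's a small numeric sketch of the reciprocal-and-round step (the values r = 6 and m = 11 are illustrative, not from the text):

```python
def recover_order(y, m):
    """Round the reciprocal of the estimate y / 2^m to the nearest integer."""
    return round(2**m / y)

# An ideal phase-estimation outcome for theta = 1/r with r = 6 and m = 11
# control qubits: y is the integer nearest to 2^m / r.
m, r = 11, 6
y = round(2**m / r)  # y = 341, so y / 2^m closely approximates 1/6
```

With these values, recover_order(y, m) returns 6, recovering the order exactly.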
Unfortunately, because we don’t yet know what r is, we can’t use it to tell us
how much accuracy we need. What we can do instead is to use the fact that r must
be smaller than N to ensure that we use enough precision. In particular, if we use
enough accuracy to guarantee that the best approximation y/2^m to 1/r satisfies
$$\left| \frac{y}{2^m} - \frac{1}{r} \right| \le \frac{1}{2N^2},$$
then we’ll have enough precision to correctly determine r when we take the re-
ciprocal. Taking m = 2 lg( N ) + 1 ensures that we have a high chance to obtain
an estimation with this precision using the method described previously. (Taking
m = 2 lg( N ) is good enough if we’re comfortable with a lower bound of 40% on the
probability of success.)
General solution
Given an integer N ≥ 2 and a real number α ∈ (0, 1), there is at most one
choice of integers u, v ∈ {0, . . . , N − 1} with v ≠ 0 and gcd(u, v) = 1 satisfying
$$\left| \alpha - \frac{u}{v} \right| < \frac{1}{2N^2}.$$
Given α and N, the continued fraction algorithm finds u and v, or reports that they
don't exist. This algorithm can be implemented as a Boolean circuit having size
O((lg(N))^3).
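In Python, this computation is available through the standard library: fractions.Fraction.limit_denominator uses continued fractions to find the closest fraction with a bounded denominator. A sketch with illustrative values (N = 21, m = 11, and a measurement outcome y = 341 approximating k/r = 1/6):

```python
from fractions import Fraction

def best_approximation(y, m, N):
    """Closest fraction u/v to y / 2^m with denominator v < N."""
    return Fraction(y, 2**m).limit_denominator(N - 1)

# 341/2048 is within 1/(2 * 21^2) of 1/6, so the algorithm pins down 1/6.
guess = best_approximation(341, 11, 21)
```

The result is returned in lowest terms, matching the fact's guarantee that u and v satisfy gcd(u, v) = 1.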
If we have a very close approximation y/2^m to k/r, and we run the continued
fraction algorithm for N and α = y/2^m, we'll get u and v, as they're described in
the fact. An analysis of the fact allows us to conclude that
$$\frac{u}{v} = \frac{k}{r}.$$
Notice in particular that we don’t necessarily learn k and r, we only learn k/r in
lowest terms.
For example, and as we’ve already noticed, we’re not going to learn anything
from k = 0. But that’s the only value of k where that happens. When k is nonzero, it
might have common factors with r, but the number v we obtain from the continued
fraction algorithm must at least divide r.
It’s far from obvious, but it is true that if we have the ability to learn u and v for
u/v = k/r for k ∈ {0, . . . , r − 1} chosen uniformly at random, then we’re very likely
to be able to recover r after just a few samples. In particular, if our guess for r is
the least common multiple of all the values for the denominator v that we observe,
we’ll be right with high probability. Intuitively speaking, some values of k aren’t
good because they share common factors with r, and those common factors are
hidden from us when we learn u and v. But random choices of k aren’t likely to hide
factors of r for long, and the probability that we don’t guess r correctly by taking
the least common multiple of the denominators we observe drops exponentially in
the number of samples.
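This recovery procedure can be sketched directly (the order r = 12 and the sampled values of k are illustrative):

```python
from fractions import Fraction
from math import lcm

def guess_order(r, samples):
    """Guess r as the lcm of the denominators of k/r reduced to lowest terms."""
    return lcm(*(Fraction(k, r).denominator for k in samples))
```

For r = 12, the samples k = 8 and k = 9 give denominators 3 and 4, whose least common multiple already recovers 12. The unlucky samples k = 4 and k = 6 only reveal the divisor 6, illustrating how common factors between k and r can stay hidden.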
It remains to address the issue of how we get our hands on an eigenvector |ψk ⟩
of Ma on which to run the phase estimation procedure. As it turns out, we don’t
actually need to create them!
What we will do instead is to run the phase estimation procedure on the state
|1⟩, by which we mean the n-bit binary encoding of the number 1, in place of an
eigenvector |ψ⟩ of Ma . So far, we’ve only talked about running the phase estimation
procedure on a particular eigenvector, but nothing prevents us from running the
procedure on an input state that isn’t an eigenvector of Ma , and that’s what we’re
doing here with the state |1⟩. (This isn’t an eigenvector of Ma unless a = 1, which
isn’t a choice we’ll be interested in.)
The rationale for choosing the state |1⟩ in place of an eigenvector of Ma is that
the following equation is true.
$$|1\rangle = \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} |\psi_k\rangle$$
One way to verify this equation is to compare the inner products of the two sides
with each standard basis state, using formulas mentioned previously in the lesson
to help to evaluate the results for the right-hand side. As a consequence, we will
obtain precisely the same measurement results as if we had chosen k ∈ {0, . . . , r − 1}
uniformly at random and used |ψk ⟩ as an eigenvector.
In greater detail, let’s imagine that we run the phase estimation procedure with
the state |1⟩ in place of one of the eigenvectors |ψk ⟩. After the inverse quantum
Fourier transform is performed, this leaves us with the state
$$\frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} |\psi_k\rangle |\gamma_k\rangle,$$
where
$$|\gamma_k\rangle = \frac{1}{2^m} \sum_{y=0}^{2^m - 1} \sum_{x=0}^{2^m - 1} e^{2\pi i x (k/r - y/2^m)} |y\rangle.$$
The vector |γk ⟩ represents the state of the top m qubits after the inverse of the
quantum Fourier transform has been performed on them.
So, by virtue of the fact that {|ψ_0⟩, . . . , |ψ_{r−1}⟩} is an orthonormal set, we find
that a measurement of the top m qubits yields an approximation y/2^m to the value
k/r, for k ∈ {0, . . . , r − 1} chosen uniformly at random.
Total cost
The cost to implement each $M_{a^k}$, and hence each controlled version of these unitary
operations, is O(n^2). There are m controlled-unitary operations, and we have m =
O(n), so the total cost for the controlled-unitary operations is O(n^3). In addition,
we have m Hadamard gates (which contribute O(n) to the cost), and the inverse
quantum Fourier transform contributes O(n^2) to the cost. Thus, the cost of the
controlled-unitary operations dominates the cost of the entire procedure, which
is therefore O(n^3).
In addition to the quantum circuit itself, there are a few classical computations
that need to be performed along the way. This includes computing the powers a^k in
Z_N for k = 2, 4, 8, . . . , 2^{m−1}, which are needed to create the controlled-unitary gates,
as well as the continued fraction algorithm that converts approximations of θ into
fractions. These computations can be performed by Boolean circuits with a total
cost of O(n^3).
As is typical, all of these bounds can be improved using asymptotically fast algo-
rithms; these bounds assume we’re using standard algorithms for basic arithmetic
operations.
It's also easy to split perfect powers, meaning numbers of the form N = s^j
for integers s, j ≥ 2, just by approximating the roots N^{1/2}, N^{1/3}, N^{1/4}, etc., and
checking nearby integers as suspects for s. We don't need to go further than log(N)
steps into this sequence, because at that point the root drops below 2 and won't
reveal additional candidates.
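A sketch of this root-checking procedure (plain Python; checking the neighbors of each rounded root guards against floating-point error in the approximation):

```python
def split_perfect_power(N):
    """Return s if N = s^j for some integers s, j >= 2, and None otherwise."""
    j = 2
    while 2**j <= N:  # once 2^j exceeds N, the j-th root drops below 2
        s = round(N ** (1 / j))  # approximate the j-th root of N
        for candidate in (s - 1, s, s + 1):  # suspects near the approximate root
            if candidate >= 2 and candidate**j == N:
                return candidate
        j += 1
    return None
```

For example, split_perfect_power(243) finds s = 3 (since 243 = 3^5), while a non-power such as 21 yields None.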
It's good that we can do both of these things, because order finding won't help us
to factor even numbers or prime powers (perfect powers N = s^j for which the number
s happens to be prime). If N is odd and not a prime power, however, order finding
allows us to split N through the following algorithm.

1. Randomly choose a ∈ {2, . . . , N − 1}.
2. Compute d = gcd(a, N). If d > 1, then output d and stop.
3. Use order finding to compute the order r of a modulo N.
4. If r is even, compute d = gcd(a^{r/2} − 1, N). If d > 1, then output d and stop.
5. If the algorithm has not stopped, it has failed to find a factor.
A run of this algorithm may fail to find a factor of N. Specifically, this happens
in two situations:
• The order of a modulo N is odd.
• The order r of a modulo N is even and gcd(a^{r/2} − 1, N) = 1.
Using basic number theory it can be proved that, for a random choice of a, with
probability at least 1/2 neither of these events happen. In fact, the probability that
either event happens is at most 2^{−(m−1)} for m being the number of distinct prime
factors of N, which is why the assumption that N is not a prime power is needed.
(The assumption that N is odd is also required for this fact to be true.)
This means that each run has at least a 50% chance to split N. Therefore, if we
run the algorithm t times, randomly choosing a each time, we’ll succeed in splitting
N with probability at least 1 − 2−t .
The basic idea behind the algorithm is as follows. If we have a choice of a for
which the order r of a modulo N is even, then r/2 is an integer and we can consider
the numbers
$$a^{r/2} - 1 \ (\text{mod } N) \qquad \text{and} \qquad a^{r/2} + 1 \ (\text{mod } N).$$
Using the formula $Z^2 - 1 = (Z + 1)(Z - 1)$, we conclude that
$$\bigl(a^{r/2} - 1\bigr)\bigl(a^{r/2} + 1\bigr) = a^r - 1.$$
Because r is the order of a modulo N, the number N evenly divides a^r − 1. For this
to be true, every prime factor of N must also be a prime factor of a^{r/2} − 1 or of
a^{r/2} + 1 (or both). For a random selection of a, it turns out to be unlikely that all
of the prime factors of N will divide one of the two terms and none will divide the
other. So long as some of the prime factors of N divide the first term and some
divide the second term, we'll be able to find a non-trivial factor of N by
computing the GCD with the first term.
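Here's a classical sketch of this splitting step, with a brute-force loop standing in for the quantum order-finding subroutine (the values N = 21 and a = 2 are illustrative):

```python
from math import gcd

def order(a, N):
    """Smallest r >= 1 with a^r = 1 (mod N); a stand-in for quantum order finding."""
    r, x = 1, a % N
    while x != 1:
        x = x * a % N
        r += 1
    return r

def try_split(a, N):
    """Attempt to find a nontrivial factor of N from the order of a."""
    d = gcd(a, N)
    if d > 1:
        return d  # lucky case: a itself shares a factor with N
    r = order(a, N)
    if r % 2 == 1:
        return None  # odd order: this attempt fails
    d = gcd(pow(a, r // 2, N) - 1, N)
    return d if 1 < d < N else None
```

For N = 21 and a = 2 the order is r = 6, and gcd(2^3 − 1, 21) = gcd(7, 21) = 7 splits 21.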
Lesson 8
Grover’s Algorithm
When we're limited to evaluating f on chosen inputs, this is the best we can do with a deterministic
algorithm if we want to guarantee success.
With a probabilistic algorithm, we might hope to save time by randomly choos-
ing input strings to f , but we’ll still require O( N ) evaluations of f if we want this
method to succeed with high probability.
Grover's algorithm solves this search problem with high probability with just
O(√N) evaluations of f. To be clear, these function evaluations must happen
in superposition, similar to the query algorithms discussed in Lesson 5 (Quantum
Query Algorithms), including Deutsch's algorithm, the Deutsch–Jozsa algorithm,
and Simon's algorithm. Unlike those algorithms, Grover's algorithm takes an
iterative approach: it evaluates f on superpositions of input strings and intersperses
these evaluations with other operations that have the effect of creating interference
patterns, leading to a solution with high probability (if one exists) after O(√N)
iterations.
The standard query gate U_f operates as $U_f |x\rangle|a\rangle = |x\rangle|a \oplus f(x)\rangle$
for every x ∈ Σ^n and a ∈ Σ. This is the action of U_f on standard basis states, and its
action in general is determined by linearity.
As was discussed in Lesson 6 (Quantum Algorithmic Foundations), if we have a
Boolean circuit for computing f , we can transform that Boolean circuit description
into a quantum circuit implementing U f (using some number of workspace qubits
that start and end the computation in the |0⟩ state). So, although we’re using the
query model to formalize the problem that Grover’s algorithm solves, it is not
limited to this model; we can run Grover’s algorithm on any function f for which
we have a Boolean circuit.
Here’s a precise statement of the problem, which is named search because we’re
searching for a solution, meaning a string x that causes f to evaluate to 1.
Search
Input: A function f : Σn → Σ.
Output: A string x ∈ Σn satisfying f ( x ) = 1, or “no solution” if no such
string x exists.
Notice that this is not a promise problem — the function f is arbitrary. It will,
however, be helpful to consider the following promise variant of the problem,
where we’re guaranteed that there’s exactly one solution. This problem appeared
as an example of a promise problem in Lesson 5 (Quantum Query Algorithms).
Unique search
Input: A function f : Σ^n → Σ.
Promise: There is exactly one string z ∈ Σ^n for which f(z) = 1, with f(x) = 0
for all strings x ≠ z.
Output: The string z.
Also notice that the OR problem mentioned in the same lesson is closely related to
search. For that problem, the goal is simply to determine whether or not a solution
exists, as opposed to actually finding a solution.
Grover's algorithm makes use of the phase query gate Z_f for the function f, which operates on standard basis states as
$$Z_f |x\rangle = (-1)^{f(x)} |x\rangle.$$
8.2. DESCRIPTION OF GROVER’S ALGORITHM 235
The Z_f gate can be implemented with one query gate U_f through the phase kickback
phenomenon. The implementation requires that one workspace qubit, initialized to a |−⟩ state, is made available. This
qubit remains in the |−⟩ state after the implementation has completed, and can be
reused (to implement subsequent Z_f gates, for instance) or simply discarded.
In addition to the operation Z f , we will also make use of a phase query gate for
the n-bit OR function, which is defined as follows for each string x ∈ Σn .
$$\mathrm{OR}(x) = \begin{cases} 0 & x = 0^n \\ 1 & x \neq 0^n \end{cases}$$
Explicitly, the phase query gate for the n-bit OR function operates like this:
$$Z_{\mathrm{OR}} |x\rangle = \begin{cases} |x\rangle & x = 0^n \\ -|x\rangle & x \neq 0^n. \end{cases}$$
To be clear, this is how ZOR operates on standard basis states; its behavior on
arbitrary states is determined from this expression by linearity.
The operation Z_OR can be implemented as a quantum circuit by beginning with
a Boolean circuit for the OR function, then constructing a U_OR operation (i.e., a
standard query gate for the n-bit OR function) using the procedure described in
Lesson 6 (Quantum Algorithmic Foundations), and finally obtaining a Z_OR operation
through the phase kickback phenomenon as described above. Notice that the operation Z_OR has
no dependence on the function f and can therefore be implemented by a quantum
circuit having no query gates.
Grover's algorithm
1. Initialize an n-qubit register Q to the all-zero state |0^n⟩ and then apply a
Hadamard operation to each qubit of Q.
2. Apply t times the unitary operation G = H^{⊗n} Z_OR H^{⊗n} Z_f to the register Q.
3. Measure the qubits of Q with respect to standard basis measurements
and output the resulting string.
[Figure: A quantum circuit implementing the Grover operation G = H^{⊗n} Z_OR H^{⊗n} Z_f.]
[Figure 8.3: A quantum circuit for Grover's algorithm when n = 7 and t = 3.]
If we begin with a Boolean circuit for f and then convert it to a quantum circuit for Z_f, we can reasonably expect that the
resulting quantum circuit will be larger and more complicated than one for Z_OR.
Figure 8.3 shows a quantum circuit for the entire algorithm when n = 7 and
t = 3. For larger values of t we can simply insert additional instances of the Grover
operation immediately before the measurements.
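For small n, the entire algorithm is easy to simulate on a vector of 2^n amplitudes. This sketch (plain Python, with an illustrative marked string) uses the identity H^{⊗n} Z_OR H^{⊗n} = 2|u⟩⟨u| − I derived in the analysis below, which acts on real amplitudes as a reflection about their mean:

```python
import math

def grover_success_probability(n, marked, t):
    """Simulate t Grover iterations on 2^n amplitudes.

    Returns the probability that measuring the register yields `marked`.
    """
    N = 2**n
    amp = [1 / math.sqrt(N)] * N  # step 1: uniform superposition
    for _ in range(t):
        amp[marked] *= -1  # Z_f: phase flip on the solution
        mean = sum(amp) / N  # 2|u><u| - I reflects amplitudes about their mean
        amp = [2 * mean - a for a in amp]
    return amp[marked] ** 2

# Illustrative run: n = 7 qubits, one marked string, t close to pi/(4*theta).
theta = math.asin(math.sqrt(1 / 2**7))
t = math.floor(math.pi / (4 * theta))
p = grover_success_probability(7, marked=3, t=t)
```

With n = 7 this gives t = 8 iterations and a success probability above 99%, consistent with the analysis that follows.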
Application to search
Grover's algorithm can be applied to the search problem as follows: choose the
number of iterations t, run the algorithm to obtain a string x ∈ Σ^n, and check
whether x is a solution by computing f(x). If it is, output x; if not, either run the
algorithm again or output "no solution."
Once we've analyzed how Grover's algorithm works, we'll see that by taking
t = O(√N), we obtain a solution to our search problem (if one exists) with high
probability.
8.3 Analysis
Now we’ll analyze Grover’s algorithm to understand how it works. We’ll start
with what could be described as a symbolic analysis, where we calculate how the
Grover operation G acts on certain states, and then we’ll tie this symbolic analysis
to a geometric picture that’s helpful for visualizing how the algorithm works.
$$A_0 = \{ x \in \Sigma^n : f(x) = 0 \} \qquad\text{and}\qquad A_1 = \{ x \in \Sigma^n : f(x) = 1 \}$$
The set A1 contains all of the solutions to our search problem while A0 contains
the strings that aren’t solutions (which we can refer to as non-solutions when it’s
convenient). These two sets satisfy A0 ∩ A1 = ∅ and A0 ∪ A1 = Σn , which is to say
that this is a bipartition of Σn .
Next we’ll define two unit vectors representing uniform superpositions over the
sets of solutions and non-solutions.
$$|A_0\rangle = \frac{1}{\sqrt{|A_0|}} \sum_{x \in A_0} |x\rangle
\qquad\text{and}\qquad
|A_1\rangle = \frac{1}{\sqrt{|A_1|}} \sum_{x \in A_1} |x\rangle$$
Formally speaking, each of these vectors is only defined when its corresponding
set is nonempty, but hereafter we’re going to focus on the case that neither A0 nor
A1 is empty. The cases that A0 = ∅ and A1 = ∅ are easily handled separately, and
we’ll do that later.
As an aside, the notation being used here is common: any time we have a finite
and nonempty set S, we can write |S⟩ to denote the quantum state vector that’s
uniform over the elements of S.
Let’s also define |u⟩ to be a uniform quantum state over all n-bit strings:
$$|u\rangle = \frac{1}{\sqrt{N}} \sum_{x \in \Sigma^n} |x\rangle.$$
Notice that
$$|u\rangle = \sqrt{\frac{|A_0|}{N}}\, |A_0\rangle + \sqrt{\frac{|A_1|}{N}}\, |A_1\rangle.$$
We also have that |u⟩ = H ⊗n |0n ⟩, so |u⟩ represents the state of the register Q after
the initialization in step 1 of Grover’s algorithm.
This implies that just before the iterations of G happen in step 2, the state of Q
is contained in the two-dimensional vector space spanned by | A0 ⟩ and | A1 ⟩, and
moreover the coefficients of these vectors are real numbers. As we will see, the
state of Q will always have these properties — meaning that the state is a real linear
combination of | A0 ⟩ and | A1 ⟩ — after any number of iterations of the operation G
in step 2.
Recall that the Grover operation is
$$G = H^{\otimes n} Z_{\mathrm{OR}} H^{\otimes n} Z_f.$$
For the function g(x) = 1 ⊕ f(x), which exchanges the roles of solutions and non-solutions, notice that
$$(-1)^{g(x)} = (-1)^{1 \oplus f(x)} = -(-1)^{f(x)}$$
for every string x ∈ Σ^n, and therefore Z_g = −Z_f.
This means that if we were to substitute the function f with the function g, Grover’s
algorithm wouldn’t function any differently — because the states we obtain from
the algorithm in the two cases are necessarily equivalent up to a global phase.
This isn’t a problem! Intuitively speaking, the algorithm doesn’t care which
strings are solutions and which are non-solutions — it only needs to be able to
distinguish solutions and non-solutions to operate correctly.
The operation Z_f leaves |A_0⟩ unchanged and negates |A_1⟩:
$$Z_f |A_0\rangle = |A_0\rangle \qquad\text{and}\qquad Z_f |A_1\rangle = -|A_1\rangle.$$
The operation Z_OR negates |x⟩ for every nonzero string x ∈ Σ^n and leaves |0^n⟩
unchanged, and a convenient alternative way to express this
operation is like this:
$$Z_{\mathrm{OR}} = 2 |0^n\rangle\langle 0^n| - I.$$
A simple way to verify that this expression agrees with the definition of Z_OR is
to evaluate its action on standard basis states. The operation H^{⊗n} Z_OR H^{⊗n} can
therefore be written like this:
$$H^{\otimes n} Z_{\mathrm{OR}} H^{\otimes n} = 2 |u\rangle\langle u| - I,$$
using the same notation |u⟩ that we used above for the uniform superposition over
all n-bit strings.
And now we have what we need to compute the action of G on | A0 ⟩ and | A1 ⟩.
First let’s compute the action of G on | A0 ⟩.
$$\begin{aligned}
G |A_0\rangle &= \bigl(2|u\rangle\langle u| - I\bigr) Z_f |A_0\rangle \\
&= \bigl(2|u\rangle\langle u| - I\bigr) |A_0\rangle \\
&= 2\sqrt{\frac{|A_0|}{N}}\,|u\rangle - |A_0\rangle \\
&= 2\sqrt{\frac{|A_0|}{N}}\left(\sqrt{\frac{|A_0|}{N}}\,|A_0\rangle + \sqrt{\frac{|A_1|}{N}}\,|A_1\rangle\right) - |A_0\rangle \\
&= \left(\frac{2|A_0|}{N} - 1\right)|A_0\rangle + \frac{2\sqrt{|A_0|\cdot|A_1|}}{N}\,|A_1\rangle \\
&= \frac{|A_0| - |A_1|}{N}\,|A_0\rangle + \frac{2\sqrt{|A_0|\cdot|A_1|}}{N}\,|A_1\rangle
\end{aligned}$$
$$\begin{aligned}
G |A_1\rangle &= \bigl(2|u\rangle\langle u| - I\bigr) Z_f |A_1\rangle \\
&= -\bigl(2|u\rangle\langle u| - I\bigr) |A_1\rangle \\
&= -2\sqrt{\frac{|A_1|}{N}}\,|u\rangle + |A_1\rangle \\
&= -2\sqrt{\frac{|A_1|}{N}}\left(\sqrt{\frac{|A_0|}{N}}\,|A_0\rangle + \sqrt{\frac{|A_1|}{N}}\,|A_1\rangle\right) + |A_1\rangle \\
&= -\frac{2\sqrt{|A_1|\cdot|A_0|}}{N}\,|A_0\rangle + \left(1 - \frac{2|A_1|}{N}\right)|A_1\rangle \\
&= -\frac{2\sqrt{|A_1|\cdot|A_0|}}{N}\,|A_0\rangle + \frac{|A_0| - |A_1|}{N}\,|A_1\rangle
\end{aligned}$$
In both cases we're using the equation
$$|u\rangle = \sqrt{\frac{|A_0|}{N}}\,|A_0\rangle + \sqrt{\frac{|A_1|}{N}}\,|A_1\rangle$$
along with the expressions
$$\langle u | A_0\rangle = \sqrt{\frac{|A_0|}{N}} \qquad\text{and}\qquad \langle u | A_1\rangle = \sqrt{\frac{|A_1|}{N}}$$
that follow from it. In summary, we have
$$\begin{aligned}
G |A_0\rangle &= \frac{|A_0| - |A_1|}{N}\,|A_0\rangle + \frac{2\sqrt{|A_0|\cdot|A_1|}}{N}\,|A_1\rangle \\
G |A_1\rangle &= -\frac{2\sqrt{|A_1|\cdot|A_0|}}{N}\,|A_0\rangle + \frac{|A_0| - |A_1|}{N}\,|A_1\rangle.
\end{aligned}$$
As we already noted, the state of Q just prior to step 2 is contained in the two-
dimensional space spanned by | A0 ⟩ and | A1 ⟩, and we have just established that G
maps any vector in this space to another vector in the same space. This means that,
for the sake of the analysis, we can focus our attention exclusively on this subspace.
To better understand what’s happening within this two-dimensional space, let’s
express the action of G on this space as a matrix:
$$M = \begin{pmatrix} \frac{|A_0|-|A_1|}{N} & -\frac{2\sqrt{|A_1|\cdot|A_0|}}{N} \\[2mm] \frac{2\sqrt{|A_0|\cdot|A_1|}}{N} & \frac{|A_0|-|A_1|}{N} \end{pmatrix}.$$
The matrix
$$\begin{pmatrix} \sqrt{\frac{|A_0|}{N}} & -\sqrt{\frac{|A_1|}{N}} \\[2mm] \sqrt{\frac{|A_1|}{N}} & \sqrt{\frac{|A_0|}{N}} \end{pmatrix}$$
is a rotation matrix, which we can alternatively express as
$$\begin{pmatrix} \sqrt{\frac{|A_0|}{N}} & -\sqrt{\frac{|A_1|}{N}} \\[2mm] \sqrt{\frac{|A_1|}{N}} & \sqrt{\frac{|A_0|}{N}} \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}$$
for
$$\theta = \sin^{-1}\!\left(\sqrt{\frac{|A_1|}{N}}\right).$$
This angle θ is going to play a very important role in the analysis that follows, so
it’s worth stressing its importance here as we see it for the first time.
In light of this expression of the matrix, we observe that
$$M = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}^2 = \begin{pmatrix} \cos(2\theta) & -\sin(2\theta) \\ \sin(2\theta) & \cos(2\theta) \end{pmatrix}.$$
This is because rotating by the angle θ two times is equivalent to rotating by the
angle 2θ. Another way to see this is to make use of the alternative expression
$$\theta = \cos^{-1}\!\left(\sqrt{\frac{|A_0|}{N}}\right),$$
together with the double angle formulas from trigonometry:
$$\cos(2\theta) = \cos^2(\theta) - \sin^2(\theta) \qquad\text{and}\qquad \sin(2\theta) = 2\sin(\theta)\cos(\theta).$$
More generally, applying the operation G a total of t times is equivalent to rotating by the angle 2tθ:
$$M^t = \begin{pmatrix} \cos(2t\theta) & -\sin(2t\theta) \\ \sin(2t\theta) & \cos(2t\theta) \end{pmatrix}.$$
Geometric picture
Now let's connect the analysis we just went through to a geometric picture. The
idea is that the operation G is the product of two reflections, Z_f and H^{⊗n} Z_OR H^{⊗n}.
And the net effect of performing two reflections is to perform a rotation.
Let's start with Z_f. As we already observed previously, we have
$$Z_f |A_0\rangle = |A_0\rangle \qquad\text{and}\qquad Z_f |A_1\rangle = -|A_1\rangle.$$
[Figure 8.4: The action of Z_f, which reflects about the line L_1, on a vector |ψ⟩ that is
a real linear combination of |A_0⟩ and |A_1⟩.]
[Figure 8.5: The action of H^{⊗n} Z_OR H^{⊗n}, which reflects about the line L_2, on a
vector |ψ⟩ that is a real linear combination of |A_0⟩ and |A_1⟩.]
8.4. CHOOSING THE NUMBER OF ITERATIONS 245
[Figure 8.6: The Grover operation G is a composition of the reflections about the
lines L_1 and L_2. Its action on real linear combinations of |A_0⟩ and |A_1⟩ is to rotate
by twice the angle between L_1 and L_2.]
Unique search
First, let’s focus on the situation in which there’s a single string x such that f ( x ) = 1.
Another way to say this is that we’re considering an instance of the unique search
problem. In this case we have
$$\theta = \sin^{-1}\!\left(\sqrt{\frac{1}{N}}\right).$$
We'll write p(N, 1) to denote the probability that Grover's algorithm finds the
unique solution in this case.
The first argument, N, refers to the number of items we’re searching over, and the
second argument, which is 1 in this case, refers to the number of solutions. A bit
later we’ll use the same notation more generally, where there are multiple solutions.
Here’s a table of the probabilities of success for increasing values of N = 2n .
N p( N, 1) N p( N, 1)
2 0.5000000000 512 0.9994480262
4 1.0000000000 1024 0.9994612447
8 0.9453125000 2048 0.9999968478
16 0.9613189697 4096 0.9999453461
32 0.9991823155 8192 0.9999157752
64 0.9965856808 16384 0.9999997811
128 0.9956198657 32768 0.9999868295
256 0.9999470421 65536 0.9999882596
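These table entries follow from the rotation analysis: after t iterations the state makes an angle (2t + 1)θ with |A_0⟩, so the success probability is sin²((2t + 1)θ), with t = ⌊π/(4θ)⌋. A sketch reproducing a few entries:

```python
import math

def p_unique(N):
    """Success probability of Grover's algorithm with a unique solution among N."""
    theta = math.asin(math.sqrt(1 / N))  # each iteration rotates by 2*theta
    t = math.floor(math.pi / (4 * theta))  # recommended number of iterations
    return math.sin((2 * t + 1) * theta) ** 2
```

For instance, p_unique(8) evaluates to 0.9453125 (which is exactly 121/128), matching the table's entry for N = 8.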
Notice that these probabilities are not strictly increasing. In particular, we have an
interesting anomaly when N = 4, where we get a solution with certainty. It can,
however, be proved in general that
$$p(N, 1) \ge 1 - \frac{1}{N}$$
for all N, so the probability of success goes to 1 in the limit as N becomes large, as
the values above seem to suggest. This is good!
But notice, however, that even a weak bound such as p( N, 1) ≥ 1/2 establishes
the utility of Grover’s algorithm. For whatever measurement outcome x we obtain
from running the procedure, we can always check to see if f ( x ) = 1 using a single
query to f . And if we fail to obtain the unique string x for which f ( x ) = 1 with
probability at most 1/2 by running the procedure once, then after m independent
runs of the procedure we will have failed to obtain this unique string x with
√
probability at most 2−m . That is, using O(m N ) queries to f , we’ll obtain the unique
solution x with probability at least 1 − 2−m . Using the better bound p( N, 1) ≥
1 − 1/N reveals that the probability to find x ∈ A1 using this method is actually at
least 1 − N −m .
Multiple solutions
As the number of elements in A1 varies, so too does the angle θ, which can have a
significant effect on the algorithm’s probability of success. For the sake of brevity,
let’s write s = | A1 | to denote the number of solutions, and as before we’ll assume
that s ≥ 1.
As a motivating example, let’s imagine that we have s = 4 solutions rather than
a single solution, as we considered above. This means that
$$\theta = \sin^{-1}\!\left(\sqrt{\frac{4}{N}}\right),$$
which is roughly twice the angle we have when there is a unique solution. Suppose
we run the algorithm with the number of iterations t tuned for a single solution.
This time the probability of success goes to 0 as N goes to infinity. This happens
because we’re effectively rotating twice as fast as we did when there was a unique
solution, so we end up zooming past the target | A1 ⟩ and landing near −| A0 ⟩.
However, if instead we use the recommended choice of t, which is
$$t = \left\lfloor \frac{\pi}{4\theta} \right\rfloor$$
for
$$\theta = \sin^{-1}\!\left(\sqrt{\frac{s}{N}}\right),$$
then the performance will be better. To be more precise, using this choice of t leads
to success with high probability.
N p( N, 4) N p( N, 4)
4 1.0000000000 1024 0.9999470421
8 0.5000000000 2048 0.9994480262
16 1.0000000000 4096 0.9994612447
32 0.9453125000 8192 0.9999968478
64 0.9613189697 16384 0.9999453461
128 0.9991823155 32768 0.9999157752
256 0.9965856808 65536 0.9999997811
512 0.9956198657
where we’re using the notation suggested earlier: p( N, s) denotes the probability
that Grover’s algorithm run for t iterations reveals a solution when there are s
solutions in total out of N possibilities.
With the recommended choice of t, the probability of success satisfies p(N, s) ≥ 1 − s/N.
This lower bound is slightly peculiar in
that more solutions implies a worse lower bound; but under the assumption that
s is significantly smaller than N, we nevertheless conclude that the probability of
success is reasonably high. As before, the mere fact that p(N, s) is reasonably large
implies the algorithm's usefulness.
It also happens to be the case that
$$p(N, s) \ge \frac{s}{N}.$$
This lower bound describes the probability that a string x ∈ Σn selected uniformly
at random is a solution — so Grover’s algorithm always does at least as well as
random guessing. (In fact, when t = 0, Grover’s algorithm is random guessing.)
Now let’s take a look at the number of iterations (and hence the number of
queries)
$$t = \left\lfloor \frac{\pi}{4\theta} \right\rfloor$$
for
$$\theta = \sin^{-1}\!\left(\sqrt{\frac{s}{N}}\right).$$
For every α ∈ [0, 1], it is the case that sin^{−1}(α) ≥ α, and so
$$\theta = \sin^{-1}\!\left(\sqrt{\frac{s}{N}}\right) \ge \sqrt{\frac{s}{N}}.$$
This implies that
$$t \le \frac{\pi}{4\theta} \le \frac{\pi}{4}\sqrt{\frac{N}{s}},$$
which translates to a savings in the number of queries as s grows. In particular, the
number of queries required is
$$O\!\left(\sqrt{\frac{N}{s}}\right).$$
When the number of solutions s is unknown, one option is to choose the number of
iterations t uniformly at random from a suitably chosen range. Selecting t in this way always finds a solution (assuming one
exists) with probability greater than 40%, though this is not obvious and requires an
analysis that will not be included here. It does make sense, however, particularly
when we think about the geometric picture: rotating the state of Q a random number
of times like this is not unlike choosing a random unit vector in the space spanned
by | A0 ⟩ and | A1 ⟩, for which it is likely that the coefficient of | A1 ⟩ is reasonably
large. By repeating this procedure and checking the outcome in the same way as
described before, the probability to find a solution can be made very close to 1.
There is a refined method that finds a solution when one exists using O(√(N/s))
queries, even when the number of solutions s is not known, and requires O(√N)
queries to determine that there are no solutions when s = 0.
The basic idea is to choose t uniformly at random from the set {1, . . . , T } itera-
tively, for increasing values of T. In particular, we can start with T = 1 and increase
it exponentially, always terminating the process as soon as a solution is found
and capping T so as not to waste queries when there isn’t a solution. The process
takes advantage of the fact that fewer queries are required when more solutions
exist. Some care is required, however, to balance the rate of growth of T with the
probability of success for each iteration. (Taking T ← ⌈ 54 T ⌉ works, for instance, as
an analysis reveals. Doubling T, however, does not — this turns out to be too fast
of an increase.)
Finally, let's return to the trivial cases A_1 = ∅ and A_0 = ∅, where Z_f = I and
Z_f = −I, respectively. In these cases the Grover operation acts on |u⟩ as
$$G |u\rangle = \pm\bigl(2|u\rangle\langle u| - I\bigr)|u\rangle = \pm |u\rangle.$$
So, irrespective of the number of iterations t we perform in these cases, the mea-
surements will always reveal a uniformly random string x ∈ Σ^n.
General Formulation of
Quantum Information
Lesson 9
Density Matrices
• Density matrices can represent a broader class of quantum states than quan-
tum state vectors. This includes states that arise in practical settings, such
as states of quantum systems that have been subjected to noise, as well as
random choices of quantum states.
At first glance, it may seem peculiar that quantum states are represented by
matrices, which more typically represent actions or operations, as opposed to
states. For example, unitary matrices describe quantum operations in the simplified
formulation of quantum information and stochastic matrices describe probabilistic
operations in the context of classical information. In contrast, although density
matrices are indeed matrices, they represent states — not actions or operations.
Despite this, the fact that density matrices can (like all matrices) be associated
with linear mappings is a critically important aspect of them. For example, the
eigenvalues of density matrices describe the randomness or uncertainty inherent to
the states they represent.
Definition
Suppose that we have a quantum system named X, and let Σ be the (finite and
nonempty) classical state set of this system. Here we’re mirroring the naming
conventions used in Unit I, which we’ll continue to do when the opportunity arises.
In the general formulation of quantum information, a quantum state of the
system X is described by a density matrix ρ whose entries are complex numbers and
whose indices (for both its rows and columns) have been placed in correspondence
9.1. DENSITY MATRIX BASICS 257
with the classical state set Σ. The lowercase Greek letter ρ is a conventional first
choice for the name of a density matrix, although σ and ξ are also common choices.
Here are a few examples of density matrices that describe states of qubits:
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\[1mm] \frac{1}{2} & \frac{1}{2} \end{pmatrix}, \qquad
\begin{pmatrix} \frac{3}{4} & \frac{i}{8} \\[1mm] -\frac{i}{8} & \frac{1}{4} \end{pmatrix}, \qquad\text{and}\qquad
\begin{pmatrix} \frac{1}{2} & 0 \\[1mm] 0 & \frac{1}{2} \end{pmatrix}.$$
To say that ρ is a density matrix means that these two conditions, which will be
explained momentarily, are both satisfied:
1. Unit trace: Tr(ρ) = 1.
2. Positive semidefiniteness: ρ ≥ 0.
The first condition on density matrices refers to the trace of a matrix. This is a
function that is defined, for all square matrices, as the sum of the diagonal entries:
$$\mathrm{Tr}\begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{n-1,0} & \alpha_{n-1,1} & \cdots & \alpha_{n-1,n-1}
\end{pmatrix} = \alpha_{0,0} + \alpha_{1,1} + \cdots + \alpha_{n-1,n-1}.$$
The trace is a linear function: for any two square matrices A and B of the same
size, and any two complex numbers α and β, the following equation is always true.
$$\mathrm{Tr}(\alpha A + \beta B) = \alpha\,\mathrm{Tr}(A) + \beta\,\mathrm{Tr}(B)$$
The trace is an extremely important function and there’s a lot more that can be
said about it, but we’ll wait until the need arises to say more.
The second condition refers to the property of a matrix being positive semidefinite,
which is a fundamental concept in quantum information theory and in many other
subjects. A matrix P is positive semidefinite if there exists a matrix M such that
P = M† M.
258 LESSON 9. DENSITY MATRICES
Here we can either demand that M is a square matrix of the same size as P or allow
it to be non-square — we obtain the same class of matrices either way.
There are several alternative (but equivalent) ways to define this condition,
including these:
• A matrix P is positive semidefinite if and only if P is Hermitian (i.e., equal
to its own conjugate transpose) and all of its eigenvalues are nonnegative
real numbers. Checking that a matrix is Hermitian and all of its eigenvalues
are nonnegative is a simple computational way to verify that it’s positive
semidefinite.
• A matrix P is positive semidefinite if and only if ⟨ψ| P|ψ⟩ ≥ 0 for every
complex vector |ψ⟩ having the same indices as the rows and columns of P.
An intuitive way to think about positive semidefinite matrices is that they’re
like matrix analogues of nonnegative real numbers. That is, positive semidefinite
matrices are to complex square matrices as nonnegative real numbers are to complex
numbers. For example, a complex number α is a nonnegative real number if and
only if
$$\alpha = \overline{\beta}\,\beta$$
for some complex number β, which matches the definition of positive semidefiniteness when we replace matrices with scalars. While matrices are more complicated
objects than scalars in general, this is nevertheless a helpful way to think about
positive semidefinite matrices.
This also explains the common notation P ≥ 0, which indicates that P is positive
semidefinite. Notice in particular that P ≥ 0 does not mean that each entry of
P is nonnegative in this context; there are positive semidefinite matrices having
negative entries, as well as matrices whose entries are all positive that are not
positive semidefinite.
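These characterizations translate directly into a numerical test. The following NumPy sketch (not part of the course materials; the function name is our own) checks the Hermitian-plus-nonnegative-eigenvalues condition, and illustrates that a matrix whose entries are all positive can still fail to be positive semidefinite.

```python
import numpy as np

def is_positive_semidefinite(P, tol=1e-12):
    # A matrix is PSD iff it is Hermitian and all eigenvalues are nonnegative.
    if not np.allclose(P, P.conj().T):
        return False
    return bool(np.min(np.linalg.eigvalsh(P)) >= -tol)

# P = M† M is always positive semidefinite, even for a non-square M.
M = np.array([[1.0, 2.0, 0.5], [0.0, 1.0, -1.0]])
P = M.conj().T @ M

# A Hermitian matrix with all-positive entries but a negative eigenvalue.
Q = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues are 3 and -1

psd_P = is_positive_semidefinite(P)
psd_Q = is_positive_semidefinite(Q)
```

Here `eigvalsh` is used because it assumes a Hermitian input and returns real eigenvalues.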
At this point, the definition of density matrices may seem rather arbitrary and
abstract, as we have not yet associated any meaning with these matrices or their
entries. The way density matrices work and can be interpreted will be clarified as
the lesson continues, but for now it may be helpful to think about the entries of
density matrices in the following (somewhat informal) way.
9.1. DENSITY MATRIX BASICS 259
• The diagonal entries of a density matrix give us the probabilities for each
classical state to appear if we perform a standard basis measurement — so
we can think about these entries as describing the “weight” or “likelihood”
associated with each classical state.
• The off-diagonal entries of a density matrix describe the degree to which the two classical states corresponding to that entry (meaning the one corresponding to the row and the one corresponding to the column) are in quantum superposition, as well as the relative phase between them.
It is certainly not obvious a priori that quantum states should be represented by
density matrices. Indeed, there is a sense in which the choice to represent quantum
states by density matrices leads naturally to the entire mathematical description of
quantum information. Everything else about quantum information actually follows
pretty logically from this one choice!
\[
\rho = |\psi\rangle\langle\psi|
\]
For example, the state vector
\[
|{-i}\rangle = \frac{1}{\sqrt{2}}\,|0\rangle - \frac{i}{\sqrt{2}}\,|1\rangle
= \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\[1mm] -\tfrac{i}{\sqrt{2}} \end{pmatrix}
\]
has the density matrix representation
\[
|{-i}\rangle\langle{-i}|
= \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\[1mm] -\tfrac{i}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{i}{\sqrt{2}} \end{pmatrix}
= \begin{pmatrix} \tfrac{1}{2} & \tfrac{i}{2} \\[1mm] -\tfrac{i}{2} & \tfrac{1}{2} \end{pmatrix}.
\]
Here’s a table listing these states along with a few other basic examples: |0⟩, |1⟩,
|+⟩, and |−⟩. We’ll see these six states again later in the lesson.
For one more example, here’s a state from Lesson 1 (Single Systems), including
both its state vector and density matrix representations.
\[
|v\rangle = \frac{1+2i}{3}\,|0\rangle - \frac{2}{3}\,|1\rangle
\qquad
|v\rangle\langle v| =
\begin{pmatrix}
\tfrac{5}{9} & \tfrac{-2-4i}{9} \\[1mm]
\tfrac{-2+4i}{9} & \tfrac{4}{9}
\end{pmatrix}
\]
Density matrices that take the form ρ = |ψ⟩⟨ψ| for a quantum state vector |ψ⟩
are known as pure states. Not every density matrix can be written in this form; some
states are not pure.
As density matrices, pure states always have one eigenvalue equal to 1 and
all other eigenvalues equal to 0. This is consistent with the interpretation that the
eigenvalues of a density matrix describe the randomness or uncertainty inherent to
that state. In essence, there’s no uncertainty for a pure state ρ = |ψ⟩⟨ψ| — the state
is definitely |ψ⟩.
In general, for a quantum state vector
\[
|\psi\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{n-1} \end{pmatrix}
\]
for a system with n classical states, the density matrix representation of the same state is as follows.
\[
|\psi\rangle\langle\psi| =
\begin{pmatrix}
\alpha_0\overline{\alpha_0} & \alpha_0\overline{\alpha_1} & \cdots & \alpha_0\overline{\alpha_{n-1}} \\
\alpha_1\overline{\alpha_0} & \alpha_1\overline{\alpha_1} & \cdots & \alpha_1\overline{\alpha_{n-1}} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{n-1}\overline{\alpha_0} & \alpha_{n-1}\overline{\alpha_1} & \cdots & \alpha_{n-1}\overline{\alpha_{n-1}}
\end{pmatrix}
=
\begin{pmatrix}
|\alpha_0|^2 & \alpha_0\overline{\alpha_1} & \cdots & \alpha_0\overline{\alpha_{n-1}} \\
\alpha_1\overline{\alpha_0} & |\alpha_1|^2 & \cdots & \alpha_1\overline{\alpha_{n-1}} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{n-1}\overline{\alpha_0} & \alpha_{n-1}\overline{\alpha_1} & \cdots & |\alpha_{n-1}|^2
\end{pmatrix}
\]
So, for the special case of pure states, we can verify that the diagonal entries of a
density matrix describe the probabilities that a standard basis measurement would
output each possible classical state.
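As a quick numerical illustration (a NumPy sketch, not part of the course materials), we can build the density matrix of the state |v⟩ from earlier in the section and read the standard basis measurement probabilities off its diagonal.

```python
import numpy as np

# The state |v> = (1+2i)/3 |0> - 2/3 |1> from the text.
v = np.array([(1 + 2j) / 3, -2 / 3])

# Density matrix |v><v| as an outer product with the conjugate.
rho_v = np.outer(v, v.conj())

# Diagonal entries are the standard basis measurement probabilities.
probs = np.real(np.diag(rho_v))
```

The diagonal entries come out to 5/9 and 4/9, matching the matrix displayed above.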
A final remark about pure states is that density matrices eliminate the degeneracy
concerning global phases found for quantum state vectors. Suppose we have two
quantum state vectors that differ by a global phase: |ψ⟩ and |ϕ⟩ = eiθ |ψ⟩, for some
real number θ. Because they differ by a global phase, these vectors represent exactly
the same quantum state, despite the fact that the vectors may be different. The
density matrices that we obtain from these two state vectors, on the other hand, are
identical.
\[
|\phi\rangle\langle\phi| = \bigl(e^{i\theta}|\psi\rangle\bigr)\bigl(e^{i\theta}|\psi\rangle\bigr)^{\dagger}
= e^{i(\theta-\theta)}|\psi\rangle\langle\psi| = |\psi\rangle\langle\psi|
\]
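This invariance is easy to check numerically. Here is a small NumPy sketch (the state and phase are arbitrary choices of ours) confirming that |ψ⟩ and e^{iθ}|ψ⟩ yield identical density matrices.

```python
import numpy as np

rng = np.random.default_rng(7)

# An arbitrary normalized qubit state vector.
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

theta = 0.9  # an arbitrary global phase
phi = np.exp(1j * theta) * psi

# The two density matrices agree exactly: the phases cancel in the outer product.
rho_psi = np.outer(psi, psi.conj())
rho_phi = np.outer(phi, phi.conj())
```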
For example, if a qubit is prepared in the state |0⟩ with probability 1/2 and in
the state |+⟩ with probability 1/2, the density matrix representation of the state we
obtain is given by
\[
\frac12\,|0\rangle\langle 0| + \frac12\,|+\rangle\langle+|
= \frac12\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
+ \frac12\begin{pmatrix} \tfrac12 & \tfrac12 \\[1mm] \tfrac12 & \tfrac12 \end{pmatrix}
= \begin{pmatrix} \tfrac34 & \tfrac14 \\[1mm] \tfrac14 & \tfrac14 \end{pmatrix}.
\]
By contrast, averaging the state vectors themselves does not yield a valid quantum state vector in general, because the Euclidean norm of the average need not equal 1. A more extreme example that shows that this doesn't work for quantum state vectors is to fix any quantum state vector |ψ⟩ that we wish, and then take our state
to be |ψ⟩ with probability 1/2 and −|ψ⟩ with probability 1/2. These states differ by
a global phase, so they’re actually the same state — but averaging gives us the zero
vector, which is not a valid quantum state vector.
Averaging the two density matrices |0⟩⟨0| and |1⟩⟨1|, each appearing with probability 1/2, gives
\[
\frac12\,|0\rangle\langle 0| + \frac12\,|1\rangle\langle 1| = \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac12 \end{pmatrix} = \frac12\, I.
\]
(In this equation the symbol I denotes the 2 × 2 identity matrix.) This is a special state known as the completely mixed state. It represents complete uncertainty about the state of a qubit, similar to a uniform random bit in the probabilistic setting.
Now suppose that we change the procedure: in place of the states |0⟩ and |1⟩
we’ll use the states |+⟩ and |−⟩. We can compute the density matrix that describes
the resulting state in a similar way.
\[
\frac12\,|+\rangle\langle+| + \frac12\,|-\rangle\langle-|
= \frac12\begin{pmatrix} \tfrac12 & \tfrac12 \\[1mm] \tfrac12 & \tfrac12 \end{pmatrix}
+ \frac12\begin{pmatrix} \tfrac12 & -\tfrac12 \\[1mm] -\tfrac12 & \tfrac12 \end{pmatrix}
= \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac12 \end{pmatrix}
= \frac12\, I
\]
It’s the same density matrix as before, even though we changed the states. In
fact, we would again obtain the same result — the completely mixed state — by
substituting any two orthogonal qubit state vectors for |0⟩ and |1⟩.
This is a feature, not a bug! We do in fact obtain exactly the same state either
way. That is, there’s no way to distinguish the two procedures by measuring the
qubit they produce, even in a statistical sense. Our two different procedures are
simply different ways to prepare this state.
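The equality of the two preparations can be verified directly. The following NumPy sketch (ours, not from the course materials) computes both averages and confirms they are the same matrix.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

def proj(v):
    # The pure state density matrix |v><v|.
    return np.outer(v, v.conj())

# Uniform mixture over {|0>, |1>} and over {|+>, |->}.
rho_01 = 0.5 * proj(ket0) + 0.5 * proj(ket1)
rho_pm = 0.5 * proj(plus) + 0.5 * proj(minus)
```

Both mixtures equal I/2, the completely mixed state, so no measurement can tell the two procedures apart.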
We can verify that this makes sense by thinking about what we could hope to
learn given a random selection of a state from one of the two possible state sets
{|0⟩, |1⟩} and {|+⟩, |−⟩}. To keep things simple, let’s suppose that we perform a
unitary operation U on our qubit and then measure in the standard basis.
In the first scenario, the state of the qubit is chosen uniformly from the set {|0⟩, |1⟩}. If the state is |0⟩, we obtain the outcomes 0 and 1 with probabilities |⟨0|U|0⟩|² and |⟨1|U|0⟩|², respectively. If the state is |1⟩, we obtain the outcomes 0 and 1 with probabilities |⟨0|U|1⟩|² and |⟨1|U|1⟩|², respectively. Because the two possibilities each happen with probability 1/2, we obtain the outcome 0 with probability
\[
\frac12\,|\langle 0|U|0\rangle|^2 + \frac12\,|\langle 0|U|1\rangle|^2
\]
and the outcome 1 with probability
\[
\frac12\,|\langle 1|U|0\rangle|^2 + \frac12\,|\langle 1|U|1\rangle|^2.
\]
Both of these expressions are equal to 1/2. One way to argue this is to use a fact
from linear algebra that can be seen as a generalization of the Pythagorean theorem.
Parseval's identity. For every orthonormal basis {|ψ0⟩, …, |ψn−1⟩} of an n-dimensional space and every vector |ϕ⟩ in that space, we have
\[
\sum_{k=0}^{n-1} \bigl|\langle \psi_k|\phi\rangle\bigr|^2 = \bigl\| |\phi\rangle \bigr\|^2.
\]
We can apply this theorem to determine the probabilities as follows. The probability to get 0 is
\[
\begin{aligned}
\frac12\,|\langle 0|U|0\rangle|^2 + \frac12\,|\langle 0|U|1\rangle|^2
&= \frac12\Bigl(|\langle 0|U|0\rangle|^2 + |\langle 0|U|1\rangle|^2\Bigr) \\
&= \frac12\Bigl(|\langle 0|U^{\dagger}|0\rangle|^2 + |\langle 1|U^{\dagger}|0\rangle|^2\Bigr) \\
&= \frac12\,\bigl\|U^{\dagger}|0\rangle\bigr\|^2
\end{aligned}
\]
and the probability to get 1 is
\[
\begin{aligned}
\frac12\,|\langle 1|U|0\rangle|^2 + \frac12\,|\langle 1|U|1\rangle|^2
&= \frac12\Bigl(|\langle 1|U|0\rangle|^2 + |\langle 1|U|1\rangle|^2\Bigr) \\
&= \frac12\Bigl(|\langle 0|U^{\dagger}|1\rangle|^2 + |\langle 1|U^{\dagger}|1\rangle|^2\Bigr) \\
&= \frac12\,\bigl\|U^{\dagger}|1\rangle\bigr\|^2.
\end{aligned}
\]
Because U is unitary, we know that U † is unitary as well, implying that both U † |0⟩
and U † |1⟩ are unit vectors. Both probabilities are therefore equal to 1/2. This means
that no matter how we choose U, we’re just going to get a uniform random bit from
the measurement.
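The conclusion can be checked numerically for a randomly chosen unitary. This NumPy sketch (ours) draws a unitary via the QR decomposition of a complex Gaussian matrix, a standard way to generate random unitaries, and verifies that both outcome probabilities equal 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 2x2 unitary from the QR decomposition of a Gaussian matrix.
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(A)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Outcome probabilities when the input is uniform over {|0>, |1>}.
p0 = 0.5 * abs(ket0.conj() @ U @ ket0) ** 2 + 0.5 * abs(ket0.conj() @ U @ ket1) ** 2
p1 = 0.5 * abs(ket1.conj() @ U @ ket0) ** 2 + 0.5 * abs(ket1.conj() @ U @ ket1) ** 2
```

Each probability is half the squared norm of a row of U, and rows of a unitary matrix are unit vectors, so both come out to 1/2 regardless of U.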
We can perform a similar verification for any other pair of orthonormal states in
place of |0⟩ and |1⟩. For example, because {|+⟩, |−⟩} is an orthonormal basis, the
probability to obtain the measurement outcome 0 in the second procedure is
1 1 1 2 1
|⟨0|U |+⟩|2 + |⟨0|U |−⟩|2 = U † |0⟩ =
2 2 2 2
and the probability to get 1 is
1 1 1 2 1
|⟨1|U |+⟩|2 + |⟨1|U |−⟩|2 = U † |1⟩ = .
2 2 2 2
In particular, we obtain exactly the same output statistics as we did for the states
|0⟩ and |1⟩.
Probabilistic states
Classical states can be represented by density matrices. In particular, for each classical state a of a system X, the density matrix
\[
\rho = |a\rangle\langle a|
\]
represents X definitively being in the classical state a. More generally, a probabilistic state of X corresponds to the diagonal density matrix whose diagonal entries are the probabilities of the respective classical states.
Going in the other direction, any diagonal density matrix can naturally be
identified with the probabilistic state we obtain by simply reading the probability
vector off from the diagonal.
To be clear, when a density matrix is diagonal, it’s not necessarily the case that
we’re talking about a classical system, or that the system must have been prepared
through the random selection of a classical state, but rather that the state could have
been obtained through the random selection of a classical state.
The fact that probabilistic states are represented by diagonal density matrices is
consistent with the intuition suggested at the start of the lesson that off-diagonal
entries describe the degree to which the two classical states corresponding to the
row and column of that entry are in quantum superposition. Here, all of the off-
diagonal entries are zero, so we just have classical randomness and nothing is in
quantum superposition.
Spectral theorem. For every Hermitian matrix ρ acting on an n-dimensional space, there exist real numbers λ0, …, λn−1 and an orthonormal basis {|ψ0⟩, …, |ψn−1⟩} such that
\[
\rho = \lambda_0 |\psi_0\rangle\langle\psi_0| + \cdots + \lambda_{n-1} |\psi_{n-1}\rangle\langle\psi_{n-1}|.
\]
Applying the spectral theorem to a density matrix ρ yields an expression of this form for an orthonormal basis {|ψ0⟩, …, |ψn−1⟩}. It remains to verify that (λ0, …, λn−1) is a probability vector, which we can then rename to (p0, …, pn−1) if we wish.
The numbers λ0 , . . . , λn−1 are the eigenvalues of ρ, and because ρ is positive
semidefinite, these numbers must therefore be nonnegative real numbers. We can
conclude that λ0 + · · · + λn−1 = 1 from the fact that ρ has trace equal to 1. Going
through the details will give us an opportunity to point out the following important
and very useful property of the trace.
For any two matrices A and B that give us a square matrix AB by multiplying,
the equality Tr( AB) = Tr( BA) is true.
Note that this works even if A and B are not themselves square matrices. That is,
we may have that A is n × m and B is m × n, for some choice of positive integers n
and m, so that AB is an n × n square matrix and BA is m × m.
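A tiny NumPy sketch (ours) confirms the cyclic property with a non-square example, where AB and BA even have different sizes.

```python
import numpy as np

rng = np.random.default_rng(1)

# A is 3x2 and B is 2x3, so AB is 3x3 while BA is 2x2.
A = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 3))

t_ab = np.trace(A @ B)
t_ba = np.trace(B @ A)
```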
In particular, if we let A be a column vector |ϕ⟩ and let B be the row vector ⟨ϕ|,
then we see that
Tr |ϕ⟩⟨ϕ| = Tr ⟨ϕ|ϕ⟩ = ⟨ϕ|ϕ⟩.
The second equality follows from the fact that ⟨ϕ|ϕ⟩ is a scalar, which we can also
think of as a 1 × 1 matrix whose trace is its single entry. Using this fact, we can
conclude that λ0 + · · · + λn−1 = 1 by the linearity of the trace function.
\[
\begin{aligned}
1 = \operatorname{Tr}(\rho)
&= \operatorname{Tr}\bigl(\lambda_0|\psi_0\rangle\langle\psi_0| + \cdots + \lambda_{n-1}|\psi_{n-1}\rangle\langle\psi_{n-1}|\bigr) \\
&= \lambda_0\operatorname{Tr}\bigl(|\psi_0\rangle\langle\psi_0|\bigr) + \cdots + \lambda_{n-1}\operatorname{Tr}\bigl(|\psi_{n-1}\rangle\langle\psi_{n-1}|\bigr)
= \lambda_0 + \cdots + \lambda_{n-1}
\end{aligned}
\]
Alternatively, we can reach the same conclusion by using the fact that the trace of a
square matrix (even one that isn’t normal) is equal to the sum of its eigenvalues.
We have therefore concluded that any given density matrix ρ can be expressed as a convex combination of pure states. Moreover, the pure states can be taken to be orthogonal. This means, in particular, that we never need the number n to be larger than the size of the classical state set of X.
In general, it must be understood that there will be different ways to write a
density matrix as a convex combination of pure states, not just the ways that the
spectral theorem provides. A previous example illustrates this.
\[
\frac12\,|0\rangle\langle 0| + \frac12\,|+\rangle\langle+| = \begin{pmatrix} \tfrac34 & \tfrac14 \\[1mm] \tfrac14 & \tfrac14 \end{pmatrix}
\]
This is not a spectral decomposition of this matrix because |0⟩ and |+⟩ are not
orthogonal. Here’s a spectral decomposition:
\[
\begin{pmatrix} \tfrac34 & \tfrac14 \\[1mm] \tfrac14 & \tfrac14 \end{pmatrix}
= \cos^2(\pi/8)\,|\psi_{\pi/8}\rangle\langle\psi_{\pi/8}| + \sin^2(\pi/8)\,|\psi_{5\pi/8}\rangle\langle\psi_{5\pi/8}|,
\]
where |ψθ ⟩ = cos(θ )|0⟩ + sin(θ )|1⟩. The eigenvalues are numbers that will likely
look familiar:
\[
\cos^2(\pi/8) = \frac{2+\sqrt{2}}{4} \approx 0.85
\qquad\text{and}\qquad
\sin^2(\pi/8) = \frac{2-\sqrt{2}}{4} \approx 0.15.
\]
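This spectral decomposition can be checked with NumPy (a sketch of ours; `eigh` is the standard routine for Hermitian matrices and returns eigenvalues in ascending order).

```python
import numpy as np

rho = np.array([[0.75, 0.25], [0.25, 0.25]])

# Eigenvalues (ascending) and an orthonormal set of eigenvectors.
eigvals, eigvecs = np.linalg.eigh(rho)

# The eigenvalues should be sin^2(pi/8) and cos^2(pi/8).
expected = np.array([np.sin(np.pi / 8) ** 2, np.cos(np.pi / 8) ** 2])

# Reassemble rho from its spectral decomposition.
rebuilt = sum(
    eigvals[k] * np.outer(eigvecs[:, k], eigvecs[:, k].conj()) for k in range(2)
)
```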
9.3. BLOCH SPHERE 269
Suppose, for instance, that a qubit is prepared by selecting one of 100 quantum state vectors |ϕ0⟩, …, |ϕ99⟩ uniformly at random, yielding the density matrix
\[
\rho = \frac{1}{100}\sum_{k=0}^{99} |\phi_k\rangle\langle\phi_k|.
\]
Because we're talking about a qubit, the density matrix ρ is 2 × 2, so by the spectral theorem we could alternatively write
\[
\rho = p\,|\psi_0\rangle\langle\psi_0| + (1-p)\,|\psi_1\rangle\langle\psi_1|
\]
for some real number p ∈ [0, 1] and an orthonormal basis {|ψ0⟩, |ψ1⟩} — but
naturally the existence of this expression doesn’t prohibit us from writing ρ as an
average of 100 pure states if we choose to do that.
Any qubit quantum state vector can be written, up to a global phase, in the form
\[
|\psi\rangle = \cos(\theta/2)\,|0\rangle + e^{i\phi}\sin(\theta/2)\,|1\rangle
\]
for two real numbers θ ∈ [0, π] and ϕ ∈ [0, 2π). Here, we're allowing θ to range from 0 to π and dividing by 2 in the argument of sine and cosine because this is a conventional way to parameterize vectors of this sort, and it will make things simpler a bit later on.
Now, it isn’t quite the case that the numbers θ and ϕ are uniquely determined
by a given quantum state vector α|0⟩ + β|1⟩, but it is nearly so. In particular, if
β = 0, then θ = 0 and it doesn’t make any difference what value ϕ takes, so it can
be chosen arbitrarily. Similarly, if α = 0, then θ = π, and once again ϕ is irrelevant
(as our state is equivalent to eiϕ |1⟩ for any ϕ up to a global phase). If, however,
neither α nor β is zero, then there’s a unique choice for the pair (θ, ϕ) for which |ψ⟩
is equivalent to α|0⟩ + β|1⟩ up to a global phase.
Next, let's consider the density matrix representation of this state.
\[
|\psi\rangle\langle\psi| = \begin{pmatrix}
\cos^2(\theta/2) & e^{-i\phi}\cos(\theta/2)\sin(\theta/2) \\[1mm]
e^{i\phi}\cos(\theta/2)\sin(\theta/2) & \sin^2(\theta/2)
\end{pmatrix}
\]
This makes it easy to express this density matrix as a linear combination of the Pauli matrices
\[
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
\]
namely
\[
|\psi\rangle\langle\psi| = \frac{I + \sin(\theta)\cos(\phi)\,\sigma_x + \sin(\theta)\sin(\phi)\,\sigma_y + \cos(\theta)\,\sigma_z}{2}.
\]
The coefficients of σx, σy, and σz form the vector (sin(θ)cos(ϕ), sin(θ)sin(ϕ), cos(θ)). In fact, this is a unit vector. Using spherical coordinates it can be written as (1, θ, ϕ). The first coordinate, 1, represents the radius or radial distance (which is always 1 in this case), θ represents the polar angle, and ϕ represents the azimuthal angle.
In words, thinking about a sphere as the planet Earth, the polar angle θ is how
far we rotate south from the north pole to reach the point being described, from 0
to π = 180◦ , while the azimuthal angle ϕ is how far we rotate east from the prime
meridian, from 0 to 2π = 360◦ , as is illustrated in Figure 9.1. This assumes that we
define the prime meridian to be the curve on the surface of the sphere from one
pole to the other that passes through the positive x-axis.
Figure 9.1: Illustration of the Cartesian coordinates of a point on the unit 2-sphere
with polar angle θ and azimuthal angle ϕ.
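The correspondence between a state's angles and its Cartesian Bloch coordinates can be checked numerically, using the fact that the coordinates are the Pauli expectation values x = Tr(ρσx), y = Tr(ρσy), z = Tr(ρσz). This is a NumPy sketch of ours with arbitrarily chosen angles.

```python
import numpy as np

# Pauli matrices.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

theta, phi = 1.1, 2.3  # arbitrary polar and azimuthal angles

# The pure state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>.
psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
rho = np.outer(psi, psi.conj())

# Cartesian Bloch coordinates as Pauli expectation values.
x = np.trace(rho @ sx).real
y = np.trace(rho @ sy).real
z = np.trace(rho @ sz).real
```

The resulting point (x, y, z) lies on the unit 2-sphere, as expected for a pure state.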
Figure 9.2: The states |0⟩, |1⟩, |+⟩, |−⟩, |+i ⟩, and |−i ⟩ on the Bloch sphere.
Every point on the sphere can be described in this way — which is to say that the
points we obtain when we range over all possible pure states of a qubit correspond
precisely to a sphere in 3 real dimensions. (This sphere is typically called the unit
2-sphere because the surface of this sphere is two-dimensional.)
When we associate points on the unit 2-sphere with pure states of qubits, we obtain the Bloch sphere representation of these states.
The basis {|0⟩, |1⟩}. The density matrix for the state |0⟩ can be written as
\[
|0\rangle\langle 0| = \frac{I + \sigma_z}{2},
\]
so the corresponding Cartesian coordinates are (0, 0, 1), or (1, 0, ϕ) in spherical coordinates, which also works for any ϕ. Intuitively speaking, the polar angle θ is zero, so we're at the north pole of the Bloch sphere, where the azimuthal angle is irrelevant.
Along similar lines, the density matrix for the state |1⟩ can be written like so.
\[
|1\rangle\langle 1| = \frac{I - \sigma_z}{2}
\]
This time the Cartesian coordinates are (0, 0, −1). In spherical coordinates this point is (1, π, ϕ) where ϕ can be any angle. In this case the polar angle is all the way to π, so we're at the south pole where the azimuthal angle is again irrelevant.
The basis {|+⟩, |−⟩}. We have these expressions for the density matrices corresponding to these states.
\[
|+\rangle\langle+| = \frac{I + \sigma_x}{2}
\qquad
|-\rangle\langle-| = \frac{I - \sigma_x}{2}
\]
The corresponding points on the unit 2-sphere have Cartesian coordinates (1, 0, 0)
and (−1, 0, 0), and spherical coordinates (1, π/2, 0) and (1, π/2, π ), respectively.
In words, |+⟩ corresponds to the point where the positive x-axis intersects the
unit 2-sphere and |−⟩ corresponds to the point where the negative x-axis intersects
it. More intuitively, |+⟩ is on the equator of the Bloch sphere where it meets the
prime meridian, and |−⟩ is on the equator on the opposite side of the sphere.
The basis {|+i ⟩, |−i ⟩}. As we saw earlier in the lesson, these two states are defined
like this:
\[
|{+i}\rangle = \frac{1}{\sqrt{2}}\,|0\rangle + \frac{i}{\sqrt{2}}\,|1\rangle
\qquad
|{-i}\rangle = \frac{1}{\sqrt{2}}\,|0\rangle - \frac{i}{\sqrt{2}}\,|1\rangle.
\]
This time we have these expressions.
\[
|{+i}\rangle\langle{+i}| = \frac{I + \sigma_y}{2}
\qquad
|{-i}\rangle\langle{-i}| = \frac{I - \sigma_y}{2}
\]
The corresponding points on the unit 2-sphere have Cartesian coordinates (0, 1, 0)
and (0, −1, 0), while the spherical coordinates of these points are (1, π/2, π/2) and
(1, π/2, 3π/2), respectively.
In words, |+i ⟩ corresponds to the point where the positive y-axis intersects the
unit 2-sphere and |−i ⟩ to the point where the negative y-axis intersects it.
Figure 9.3: Qubit states of the form |ψα ⟩ = cos(α)|0⟩ + sin(α)|1⟩ on the Bloch
sphere.
Here's another class of quantum state vectors that has appeared from time to time throughout this course, including previously in this lesson: states of the form |ψα⟩ = cos(α)|0⟩ + sin(α)|1⟩ for a real number α. Figure 9.3 illustrates the corresponding points on the Bloch sphere for a few choices for α.
Points in the interior of the unit 2-sphere correspond to density matrices of states that are not pure. Sometimes we refer to the Bloch ball when we wish to be explicit about the inclusion of points inside of the Bloch sphere as representations of qubit density matrices.
For example, we've seen that the density matrix ½I, which represents the completely mixed state of a qubit, can be written in these two alternative ways:
\[
\frac12\, I = \frac12\,|0\rangle\langle 0| + \frac12\,|1\rangle\langle 1|
\qquad\text{and}\qquad
\frac12\, I = \frac12\,|+\rangle\langle+| + \frac12\,|-\rangle\langle-|.
\]
We also have
\[
\frac12\, I = \frac12\,|{+i}\rangle\langle{+i}| + \frac12\,|{-i}\rangle\langle{-i}|,
\]
and more generally we can use any two orthogonal qubit state vectors (which will
always correspond to two antipodal points on the Bloch sphere). If we average
the corresponding points on the Bloch sphere in a similar way, we obtain the same
point, which is at the center of the sphere. This is consistent with the observation
that
\[
\frac12\, I = \frac{I + 0\cdot\sigma_x + 0\cdot\sigma_y + 0\cdot\sigma_z}{2},
\]
giving us the Cartesian coordinates (0, 0, 0).
A different example concerning convex combinations of Bloch sphere points is
the one discussed in the previous subsection.
\[
\frac12\,|0\rangle\langle 0| + \frac12\,|+\rangle\langle+| = \begin{pmatrix} \tfrac34 & \tfrac14 \\[1mm] \tfrac14 & \tfrac14 \end{pmatrix}
\]
Figure 9.4 illustrates these two different ways of obtaining this density matrix as a
convex combination of pure states.
Figure 9.4: An illustration of the density matrix ½|0⟩⟨0| + ½|+⟩⟨+| inside the Bloch sphere.
Multiple systems
Density matrices can represent states of multiple systems in an analogous way to
state vectors in the simplified formulation of quantum information, following the
same basic idea that multiple systems can be viewed as if they’re single, compound
systems. In mathematical terms, the rows and columns of density matrices repre-
senting states of multiple systems are placed in correspondence with the Cartesian
product of the classical state sets of the individual systems.
For example, recall the state vector representations of the four Bell states.
\[
\begin{aligned}
|\phi^{+}\rangle &= \frac{1}{\sqrt{2}}\,|00\rangle + \frac{1}{\sqrt{2}}\,|11\rangle
&\qquad
|\phi^{-}\rangle &= \frac{1}{\sqrt{2}}\,|00\rangle - \frac{1}{\sqrt{2}}\,|11\rangle \\
|\psi^{+}\rangle &= \frac{1}{\sqrt{2}}\,|01\rangle + \frac{1}{\sqrt{2}}\,|10\rangle
&\qquad
|\psi^{-}\rangle &= \frac{1}{\sqrt{2}}\,|01\rangle - \frac{1}{\sqrt{2}}\,|10\rangle
\end{aligned}
\]
9.4. MULTIPLE SYSTEMS AND REDUCED STATES 277
Product states
Similar to what we had for state vectors, tensor products of density matrices represent independence between the states of multiple systems. For instance, if X is
prepared in the state represented by the density matrix ρ and Y is independently
prepared in the state represented by σ, then the density matrix describing the state
of (X, Y ) is the tensor product ρ ⊗ σ.
The same terminology is used here as in the simplified formulation of quantum
information: states of this form are referred to as product states.
Correlated classical states. For example, we can express the situation in which
Alice and Bob share a random bit like this:
\[
\frac12\,|0\rangle\langle 0| \otimes |0\rangle\langle 0| + \frac12\,|1\rangle\langle 1| \otimes |1\rangle\langle 1|
= \begin{pmatrix}
\tfrac12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & \tfrac12
\end{pmatrix}
\]
To be clear, this is the state of a pair (Y, X) where Y represents the classical
selection of k — so we’re assuming its classical state set is {0, . . . , m − 1}.
States of this form are sometimes called classical-quantum states.
Entangled states. Not all states of pairs of systems are separable. In the general
formulation of quantum information, this is how entanglement is defined:
states that are not separable are said to be entangled.
Note that this terminology is consistent with the terminology we used in
Lesson 4 (Entanglement in Action). There we said that quantum state vectors
that are not product states represent entangled states — and indeed, for any
quantum state vector |ψ⟩ that is not a product state, we find that the state
represented by the density matrix |ψ⟩⟨ψ| is not separable. Entanglement is
much more complicated than this for states that are not pure.
Suppose that we have a pair of qubits (A, B) that are together in the state
1 1
|ϕ+ ⟩ = √ |00⟩ + √ |11⟩.
2 2
We can imagine that Alice holds the qubit A and Bob holds B, which is to say that
together they share an e-bit. We’d like to have a density matrix description of Alice’s
qubit A in isolation, as if Bob decided to take his qubit and visit the stars, never to
be seen again.
First let’s think about what would happen if Bob decided somewhere on his
journey to measure his qubit with respect to a standard basis measurement. If he
did this, he would obtain the outcome 0 with probability
\[
\bigl\| \bigl(I_{\mathsf{A}} \otimes \langle 0|\bigr)\,|\phi^{+}\rangle \bigr\|^2
= \Bigl\| \tfrac{1}{\sqrt{2}}\,|0\rangle \Bigr\|^2 = \frac12,
\]
in which case the state of Alice's qubit becomes |0⟩; and he would obtain the outcome 1 with probability
\[
\bigl\| \bigl(I_{\mathsf{A}} \otimes \langle 1|\bigr)\,|\phi^{+}\rangle \bigr\|^2
= \Bigl\| \tfrac{1}{\sqrt{2}}\,|1\rangle \Bigr\|^2 = \frac12,
\]
in which case the state of Alice's qubit becomes |1⟩. Averaging over the two outcomes, the density matrix describing Alice's qubit is
\[
\frac12\,|0\rangle\langle 0| + \frac12\,|1\rangle\langle 1| = \frac12\, I.
\]
That is, Alice’s qubit is in the completely mixed state. To be clear, this description
of the state of Alice’s qubit doesn’t include Bob’s measurement outcome; we’re
ignoring Bob altogether.
Now, it might seem like the density matrix description of Alice’s qubit in isola-
tion that we’ve just obtained relies on the assumption that Bob has measured his
qubit, but this is not actually so. What we’ve done is to use the possibility that Bob
measures his qubit to argue that the completely mixed state arises as the state of
Alice’s qubit, based on what we’ve already learned. Of course, nothing says that
Bob must measure his qubit — but nothing says that he doesn’t. And if he’s light
years away, then nothing he does or doesn’t do can possibly influence the state of
Alice’s qubit viewed it in isolation. That is to say, the description we’ve obtained
for the state of Alice’s qubit is the only description consistent with the impossibility
of faster-than-light communication.
We can also consider the state of Bob’s qubit B, which happens to be the com-
pletely mixed state as well. Indeed, for all four Bell states we find that the reduced
state of both Alice’s qubit and Bob’s qubit is the completely mixed state.
Now let’s generalize the example just discussed to two arbitrary systems A and B,
not necessarily qubits in the state |ϕ+ ⟩. We’ll assume the classical state sets of A and
B are Σ and Γ, respectively. A density matrix ρ representing a state of the combined
system (A, B) therefore has row and column indices corresponding to the Cartesian
product Σ × Γ.
Suppose that the state of (A, B) is described by the quantum state vector |ψ⟩, so
the density matrix describing this state is ρ = |ψ⟩⟨ψ|. We’ll obtain a density matrix
description of the state of A in isolation, which is conventionally denoted ρA . (A
superscript is also sometimes used rather than a subscript.)
The state vector |ψ⟩ can be expressed in the form
\[
|\psi\rangle = \sum_{b\in\Gamma} |\phi_b\rangle \otimes |b\rangle
\]
for a uniquely determined collection of vectors {|ϕb⟩ : b ∈ Γ}, which need not be unit vectors. If a standard basis measurement were performed on B, each outcome b ∈ Γ would be obtained with probability ∥|ϕb⟩∥², in which case the state of A would become
\[
\frac{|\phi_b\rangle}{\bigl\||\phi_b\rangle\bigr\|}.
\]
As a density matrix, this state can be written as follows.
\[
\Biggl(\frac{|\phi_b\rangle}{\||\phi_b\rangle\|}\Biggr)
\Biggl(\frac{|\phi_b\rangle}{\||\phi_b\rangle\|}\Biggr)^{\dagger}
= \frac{|\phi_b\rangle\langle\phi_b|}{\||\phi_b\rangle\|^2}
\]
Averaging the different states according to the probabilities of the respective outcomes, we arrive at the density matrix
\[
\rho_{\mathsf{A}} = \sum_{b\in\Gamma} \||\phi_b\rangle\|^2\, \frac{|\phi_b\rangle\langle\phi_b|}{\||\phi_b\rangle\|^2}
= \sum_{b\in\Gamma} |\phi_b\rangle\langle\phi_b|
= \sum_{b\in\Gamma} \bigl(I_{\mathsf{A}} \otimes \langle b|\bigr)\,|\psi\rangle\langle\psi|\,\bigl(I_{\mathsf{A}} \otimes |b\rangle\bigr).
\]
The formula
\[
\rho_{\mathsf{A}} = \sum_{b\in\Gamma} \bigl(I_{\mathsf{A}} \otimes \langle b|\bigr)\,|\psi\rangle\langle\psi|\,\bigl(I_{\mathsf{A}} \otimes |b\rangle\bigr)
\]
leads us to the description of the reduced state of A for any density matrix ρ of the pair (A, B), not just a pure state.
\[
\rho_{\mathsf{A}} = \sum_{b\in\Gamma} \bigl(I_{\mathsf{A}} \otimes \langle b|\bigr)\,\rho\,\bigl(I_{\mathsf{A}} \otimes |b\rangle\bigr)
\]
This formula must work, simply by linearity together with the fact that every
density matrix can be written as a convex combination of pure states.
The operation being performed on ρ to obtain ρA in this equation is known as
the partial trace, and to be more precise we say that the partial trace is performed
on B, or that B is traced out. This operation is denoted TrB, so we can write
\[
\operatorname{Tr}_{\mathsf{B}}(\rho) = \sum_{b\in\Gamma} \bigl(I_{\mathsf{A}} \otimes \langle b|\bigr)\,\rho\,\bigl(I_{\mathsf{A}} \otimes |b\rangle\bigr).
\]
We can also define the partial trace on A, so it's the system A that gets traced out rather than B, like this.
\[
\operatorname{Tr}_{\mathsf{A}}(\rho) = \sum_{a\in\Sigma} \bigl(\langle a| \otimes I_{\mathsf{B}}\bigr)\,\rho\,\bigl(|a\rangle \otimes I_{\mathsf{B}}\bigr)
\]
This gives us the density matrix description ρB of the state of B in isolation rather
than A.
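For two qubits, these sums can be implemented in a few lines of NumPy (a sketch of ours; the function names are not from any library). Reshaping the 4 × 4 matrix into a tensor with one index per ket/bra of each qubit makes the partial trace a single `einsum` contraction.

```python
import numpy as np

def partial_trace_B(rho):
    # rho_A = sum_b (I ⊗ <b|) rho (I ⊗ |b>): contract the two B indices.
    r = rho.reshape(2, 2, 2, 2)  # indices (a, b, a', b')
    return np.einsum('abcb->ac', r)

def partial_trace_A(rho):
    # rho_B = sum_a (<a| ⊗ I) rho (|a> ⊗ I): contract the two A indices.
    r = rho.reshape(2, 2, 2, 2)
    return np.einsum('abac->bc', r)

# The Bell state |phi+> = (|00> + |11>)/sqrt(2).
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi_plus, phi_plus.conj())

rho_A = partial_trace_B(rho)
rho_B = partial_trace_A(rho)
```

Both reduced states come out to I/2, matching the earlier discussion of the Bell states.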
To recapitulate, if (A, B) is any pair of systems and we have a density matrix ρ describing a state of (A, B), the reduced states of the systems A and B are as follows.
\[
\rho_{\mathsf{A}} = \operatorname{Tr}_{\mathsf{B}}(\rho) = \sum_{b\in\Gamma} \bigl(I_{\mathsf{A}} \otimes \langle b|\bigr)\,\rho\,\bigl(I_{\mathsf{A}} \otimes |b\rangle\bigr)
\qquad
\rho_{\mathsf{B}} = \operatorname{Tr}_{\mathsf{A}}(\rho) = \sum_{a\in\Sigma} \bigl(\langle a| \otimes I_{\mathsf{B}}\bigr)\,\rho\,\bigl(|a\rangle \otimes I_{\mathsf{B}}\bigr)
\]
The same idea extends to three or more systems. For instance, if ρ describes a state of a triple (A, B, C), the reduced state of C is obtained by tracing out both A and B:
\[
\rho_{\mathsf{C}} = \operatorname{Tr}_{\mathsf{AB}}(\rho) = \sum_{a\in\Sigma}\sum_{b\in\Gamma} \bigl(\langle a| \otimes \langle b| \otimes I_{\mathsf{C}}\bigr)\,\rho\,\bigl(|a\rangle \otimes |b\rangle \otimes I_{\mathsf{C}}\bigr).
\]
An alternative way to describe the partial trace mappings TrA and TrB is that they
are the unique linear mappings that satisfy the formulas
\[
\operatorname{Tr}_{\mathsf{A}}(M \otimes N) = \operatorname{Tr}(M)\, N
\qquad
\operatorname{Tr}_{\mathsf{B}}(M \otimes N) = \operatorname{Tr}(N)\, M.
\]
In these formulas, N and M are square matrices of the appropriate sizes: the rows
and columns of M correspond to the classical states of A and the rows and columns
of N correspond to the classical states of B.
This characterization of the partial trace is not only fundamental from a mathe-
matical viewpoint, but can also allow for quick calculations in some situations. For
example, consider this state of a pair of qubits (A, B).
\[
\rho = \frac12\,|0\rangle\langle 0| \otimes |0\rangle\langle 0| + \frac12\,|1\rangle\langle 1| \otimes |+\rangle\langle+|
\]
To compute the reduced state ρA for instance, we can use linearity together with the
fact that |0⟩⟨0| and |+⟩⟨+| have unit trace.
\[
\rho_{\mathsf{A}} = \operatorname{Tr}_{\mathsf{B}}(\rho)
= \frac12\operatorname{Tr}\bigl(|0\rangle\langle 0|\bigr)\,|0\rangle\langle 0| + \frac12\operatorname{Tr}\bigl(|+\rangle\langle+|\bigr)\,|1\rangle\langle 1|
= \frac12\,|0\rangle\langle 0| + \frac12\,|1\rangle\langle 1|
\]
The reduced state ρB can be computed similarly.
\[
\rho_{\mathsf{B}} = \operatorname{Tr}_{\mathsf{A}}(\rho)
= \frac12\operatorname{Tr}\bigl(|0\rangle\langle 0|\bigr)\,|0\rangle\langle 0| + \frac12\operatorname{Tr}\bigl(|1\rangle\langle 1|\bigr)\,|+\rangle\langle+|
= \frac12\,|0\rangle\langle 0| + \frac12\,|+\rangle\langle+|
\]
The partial trace can also be described explicitly in terms of matrices. Here we’ll do
this just for two qubits, but this can also be generalized to larger systems. Assume
that we have two qubits (A, B), so that any density matrix describing a state of these
two qubits can be written as
\[
\rho = \begin{pmatrix}
\alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} \\
\alpha_{10} & \alpha_{11} & \alpha_{12} & \alpha_{13} \\
\alpha_{20} & \alpha_{21} & \alpha_{22} & \alpha_{23} \\
\alpha_{30} & \alpha_{31} & \alpha_{32} & \alpha_{33}
\end{pmatrix}.
\]
The reduced state ρB is then given by the formula
\[
\operatorname{Tr}_{\mathsf{A}}(\rho) = \begin{pmatrix}
\alpha_{00} + \alpha_{22} & \alpha_{01} + \alpha_{23} \\
\alpha_{10} + \alpha_{32} & \alpha_{11} + \alpha_{33}
\end{pmatrix}.
\]
One way to think about this formula begins by viewing 4 × 4 matrices as 2 × 2 block matrices, where each block is 2 × 2. That is,
\[
\rho = \begin{pmatrix} M_{0,0} & M_{0,1} \\ M_{1,0} & M_{1,1} \end{pmatrix}
\]
for
\[
M_{0,0} = \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix},
\quad
M_{0,1} = \begin{pmatrix} \alpha_{02} & \alpha_{03} \\ \alpha_{12} & \alpha_{13} \end{pmatrix},
\quad
M_{1,0} = \begin{pmatrix} \alpha_{20} & \alpha_{21} \\ \alpha_{30} & \alpha_{31} \end{pmatrix},
\quad
M_{1,1} = \begin{pmatrix} \alpha_{22} & \alpha_{23} \\ \alpha_{32} & \alpha_{33} \end{pmatrix}.
\]
We then have
\[
\operatorname{Tr}_{\mathsf{A}}\begin{pmatrix} M_{0,0} & M_{0,1} \\ M_{1,0} & M_{1,1} \end{pmatrix} = M_{0,0} + M_{1,1}.
\]
Here's the formula when the second system is traced out rather than the first.
\[
\operatorname{Tr}_{\mathsf{B}}(\rho) = \begin{pmatrix}
\operatorname{Tr}\begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix} &
\operatorname{Tr}\begin{pmatrix} \alpha_{02} & \alpha_{03} \\ \alpha_{12} & \alpha_{13} \end{pmatrix} \\[4mm]
\operatorname{Tr}\begin{pmatrix} \alpha_{20} & \alpha_{21} \\ \alpha_{30} & \alpha_{31} \end{pmatrix} &
\operatorname{Tr}\begin{pmatrix} \alpha_{22} & \alpha_{23} \\ \alpha_{32} & \alpha_{33} \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix}
\alpha_{00} + \alpha_{11} & \alpha_{02} + \alpha_{13} \\
\alpha_{20} + \alpha_{31} & \alpha_{22} + \alpha_{33}
\end{pmatrix}
\]
The block matrix descriptions of these functions can be extended to systems larger
than qubits in a natural and direct way.
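The block matrix description also translates directly into code. The following NumPy sketch (ours) slices a two-qubit density matrix into its 2 × 2 blocks and applies both rules: tracing out A sums the diagonal blocks, giving the reduced state of B, while tracing out B takes the trace of each block, giving the reduced state of A.

```python
import numpy as np

# The state (1/2)|0><0| ⊗ |0><0| + (1/2)|1><1| ⊗ |+><+| from the text.
rho = np.array([
    [0.5, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0.25, 0.25],
    [0, 0, 0.25, 0.25],
])

# View rho as a 2x2 block matrix with 2x2 blocks M[j][k].
M = [[rho[2*j:2*j+2, 2*k:2*k+2] for k in range(2)] for j in range(2)]

# Tr_A sums the diagonal blocks; Tr_B takes the trace of each block.
rho_B = M[0][0] + M[1][1]
rho_A = np.array([[np.trace(M[j][k]) for k in range(2)] for j in range(2)])
```

The results agree with the reduced states computed earlier in the lesson: ρA = ½|0⟩⟨0| + ½|1⟩⟨1| and ρB = ½|0⟩⟨0| + ½|+⟩⟨+|.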
To finish the lesson, let's apply these formulas to the same state we considered above.
\[
\rho = \frac12\,|0\rangle\langle 0| \otimes |0\rangle\langle 0| + \frac12\,|1\rangle\langle 1| \otimes |+\rangle\langle+|
= \begin{pmatrix}
\tfrac12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & \tfrac14 & \tfrac14 \\
0 & 0 & \tfrac14 & \tfrac14
\end{pmatrix}
\]
Applying the block matrix formulas recovers the reduced states computed earlier: TrB(ρ) = ½|0⟩⟨0| + ½|1⟩⟨1| and TrA(ρ) = ½|0⟩⟨0| + ½|+⟩⟨+|.
Lesson 10
Quantum Channels
On the other hand, we could think about the input state of the channel as being
represented by the weighted average pρ + (1 − p)σ, in which case the output is
Φ(pρ + (1 − p)σ). It's the same state regardless of how we choose to think about it, so we must have
\[
\Phi\bigl(p\rho + (1-p)\sigma\bigr) = p\,\Phi(\rho) + (1-p)\,\Phi(\sigma).
\]
Whenever we have a mapping that satisfies this condition for every choice of density matrices ρ and σ and scalars p ∈ [0, 1], there's always a unique way to extend that mapping to every matrix input (i.e., not just density matrix inputs) so that it's linear.
Suppose now that a pair of systems (Z, X) is in a state described by a density matrix ρ, where Z has classical state set {0, …, m − 1}. We can write
\[
\rho = \sum_{a,b=0}^{m-1} |a\rangle\langle b| \otimes \rho_{a,b} =
\begin{pmatrix}
\rho_{0,0} & \cdots & \rho_{0,m-1} \\
\vdots & \ddots & \vdots \\
\rho_{m-1,0} & \cdots & \rho_{m-1,m-1}
\end{pmatrix}.
\]
On the right-hand side of this equation we have a block matrix, which can alternatively be described using Dirac notation as we have in the middle expression.
Each matrix ρ a,b has rows and columns corresponding to the classical states of X,
and these matrices can be determined by a simple formula.
\[
\rho_{a,b} = \bigl(\langle a| \otimes I_{\mathsf{X}}\bigr)\,\rho\,\bigl(|b\rangle \otimes I_{\mathsf{X}}\bigr)
\]
Note that these are not density matrices in general — it’s only when they’re arranged
together to form ρ that we obtain a density matrix.
The following equation describes the state of (Z, Y) that is obtained when Φ is applied to X.
\[
\sum_{a,b=0}^{m-1} |a\rangle\langle b| \otimes \Phi(\rho_{a,b}) =
\begin{pmatrix}
\Phi(\rho_{0,0}) & \Phi(\rho_{0,1}) & \cdots & \Phi(\rho_{0,m-1}) \\
\Phi(\rho_{1,0}) & \Phi(\rho_{1,1}) & \cdots & \Phi(\rho_{1,m-1}) \\
\vdots & \vdots & \ddots & \vdots \\
\Phi(\rho_{m-1,0}) & \Phi(\rho_{m-1,1}) & \cdots & \Phi(\rho_{m-1,m-1})
\end{pmatrix}
\]
Notice that, in order to evaluate this expression for a given choice of Φ and ρ, we
must understand how Φ works as a linear mapping on non-density matrix inputs,
as each ρ a,b generally won’t be a density matrix on its own.
The previous equation is consistent with the expression (IdZ ⊗ Φ)(ρ), in which
IdZ denotes the identity channel on the system Z. This presumes that we’ve extended
the notion of a tensor product to linear mappings from matrices to matrices, which
is straightforward — but it isn’t really essential to the lesson and won’t be explained
further.
Reiterating a statement made above, in order for a linear mapping Φ to be a
valid channel it must be the case that, for every choice for Z and every density
matrix ρ of the pair (Z, X), we always obtain a density matrix when Φ is applied
to X. In mathematical terms, the properties a mapping must possess to be a channel
are that it must be trace-preserving — so that the matrix we obtain by applying the
channel has trace equal to one — as well as completely positive — so that the resulting
matrix is positive semidefinite. These are both important properties that can be
considered and studied separately, but it isn’t critical for the sake of this lesson to
consider them in isolation.
There are, in fact, linear mappings that always output a density matrix when
given a density matrix as input, but fail to map density matrices to density matrices
for compound systems, so we do eliminate some linear mappings from the class
of channels in this way. (The linear mapping given by matrix transposition is the
simplest example.)
10.1. QUANTUM CHANNEL BASICS 291
We have a formula analogous to the one above in the case that the two systems X and Z are swapped, so that Φ is applied to the system on the left rather than the right.
\[
\bigl(\Phi \otimes \operatorname{Id}_{\mathsf{Z}}\bigr)(\rho) = \sum_{a,b=0}^{m-1} \Phi(\rho_{a,b}) \otimes |a\rangle\langle b|
\]
This assumes that ρ is a state of (X, Z) rather than (Z, X). This time the block matrix description doesn't work because the matrices ρ_{a,b} don't fall into consecutive rows and columns in ρ, but it's the same underlying mathematical structure.
Any linear mapping that satisfies the requirement that it always transforms density matrices into density matrices, even when it's applied to just one part of a compound system, represents a valid channel. So, in an abstract sense, the notion
of a channel is determined by the notion of a density matrix, together with the
assumption that channels act linearly. In this regard, channels are analogous to
unitary operations in the simplified formulation of quantum information, which
are precisely the linear mappings that always transform quantum state vectors to
quantum state vectors for a given system; as well as to probabilistic operations
(represented by stochastic matrices) in the standard formulation of classical information, which are precisely the linear mappings that always transform probability
vectors into probability vectors.
This action, where we multiply by U on the left and U † on the right, is commonly
referred to as conjugation by the matrix U.
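Conjugation is a one-line operation in code. Here is a NumPy sketch of ours using the Hadamard operation as the unitary, applied to the pure state |0⟩⟨0|.

```python
import numpy as np

# The Hadamard unitary, defining the channel Phi(rho) = U rho U†.
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|

# Conjugation by U: multiply by U on the left and U† on the right.
out = U @ rho @ U.conj().T
```

Since H|0⟩ = |+⟩, the output is |+⟩⟨+|, whose entries are all equal to 1/2.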
This description is consistent with the fact that the density matrix that represents
a given quantum state vector |ψ⟩ is |ψ⟩⟨ψ|. In particular, if the unitary operation U
is performed on |ψ⟩, then the output state is represented by the vector U |ψ⟩, and so
the density matrix describing this state is equal to
|ψ⟩⟨ψ| 7→ U |ψ⟩⟨ψ|U †
on pure states, we can conclude by linearity that it must work as is specified by the
equation (10.1) for any density matrix ρ.
The particular channel we obtain when we take U = I is the identity channel
Id, which we can also give a subscript (such as IdZ , as we’ve already encountered)
when we wish to indicate explicitly what system this channel acts on. Its output
is always equal to its input: Id(ρ) = ρ. This might not seem like an interesting
channel, but it’s actually a very important one — and it’s fitting that this is our first
example. The identity channel is the perfect channel in some contexts, representing
an ideal memory or a perfect, noiseless transmission of information from a sender
to a receiver.
Every channel defined by a unitary operation in this way is indeed a valid
channel: conjugation by a matrix U gives us a linear map; and if ρ is a density
matrix of a system (Z, X) and U is unitary, then the result, which we can express as
(IZ ⊗ U )ρ(IZ ⊗ U † ),
is also a density matrix. Specifically, this matrix must be positive semidefinite, for if
ρ = M† M then
( IZ ⊗ U ) ρ ( IZ ⊗ U † ) = K † K
for K = M(IZ ⊗ U † ), and it must have unit trace by the cyclic property of the trace.
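As a quick numerical sanity check (a NumPy sketch of our own, not lesson code, with our own helper names), we can confirm that conjugating a randomly generated density matrix by a random unitary again yields a positive semidefinite matrix with unit trace:

```python
import numpy as np

# Sketch (not lesson code): conjugating a density matrix by a unitary
# yields another density matrix.
rng = np.random.default_rng(7)

def random_density_matrix(n):
    # rho = M^dagger M, normalized to unit trace, is a valid density matrix.
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = M.conj().T @ M
    return rho / np.trace(rho)

def random_unitary(n):
    # The Q factor of a QR decomposition of a random complex matrix is unitary.
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q

rho = random_density_matrix(4)
U = random_unitary(4)
sigma = U @ rho @ U.conj().T   # conjugation by U

assert np.isclose(np.trace(sigma).real, 1.0)        # unit trace
assert np.all(np.linalg.eigvalsh(sigma) > -1e-12)   # positive semidefinite
```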
Mixed unitary channels for which all of the unitary operations are Pauli matrices
(or tensor products of Pauli matrices) are called Pauli channels, and are commonly
encountered in quantum computing.
Our next example, the qubit reset channel Λ, does something very simple: it resets
a qubit to the |0⟩ state. As a linear mapping, this channel can be expressed as
follows for every qubit density matrix ρ.
Λ(ρ) = Tr(ρ)|0⟩⟨0|
Although the trace of every density matrix ρ is equal to 1, writing the channel in
this way makes it clear that it’s a linear mapping that could be applied to any 2 × 2
matrix, not just a density matrix. As we already observed, we need to understand
how channels work as linear mappings on non-density matrix inputs to describe
what happens when they’re applied to just one part of a compound system.
294 LESSON 10. QUANTUM CHANNELS
For example, suppose that A and B are qubits and together the pair (A, B) is in
the Bell state |ϕ+ ⟩. As a density matrix, this state is given by
|ϕ+ ⟩⟨ϕ+ | = ⎡ 1/2  0  0  1/2 ⎤
             ⎢  0   0  0   0  ⎥
             ⎢  0   0  0   0  ⎥
             ⎣ 1/2  0  0  1/2 ⎦
This channel, defined by ∆(ρ) = ⟨0|ρ|0⟩ |0⟩⟨0| + ⟨1|ρ|1⟩ |1⟩⟨1| for every qubit density matrix ρ, is called the completely dephasing channel, and it can be thought
of as representing an extreme form of the process known as decoherence — which
essentially ruins quantum superpositions and turns them into classical probabilistic
states.
Another way to think about this channel is that it describes a standard basis
measurement on a qubit, where an input qubit is measured and then discarded,
and where the output is a density matrix describing the measurement outcome.
Alternatively, but equivalently, we can imagine that the measurement outcome is
discarded, leaving the qubit in its post-measurement state.
Let us again consider an e-bit, and see what happens when ∆ is applied to just
one of the two qubits. Specifically, we have qubits A and B for which (A, B) is in the
state |ϕ+ ⟩, and this time let’s apply the channel to the second qubit. Here’s the state
we obtain.
1/2 |0⟩⟨0| ⊗ ∆(|0⟩⟨0|) + 1/2 |0⟩⟨1| ⊗ ∆(|0⟩⟨1|) + 1/2 |1⟩⟨0| ⊗ ∆(|1⟩⟨0|) + 1/2 |1⟩⟨1| ⊗ ∆(|1⟩⟨1|)

= 1/2 |0⟩⟨0| ⊗ |0⟩⟨0| + 1/2 |1⟩⟨1| ⊗ |1⟩⟨1|
Alternatively we can express this equation using block matrices.
⎡ ∆( ⎡ 1/2 0 ⎤ )   ∆( ⎡ 0 1/2 ⎤ ) ⎤
⎢    ⎣  0  0 ⎦        ⎣ 0  0  ⎦   ⎥       ⎡ 1/2 0 0  0  ⎤
⎢                                 ⎥   =   ⎢  0  0 0  0  ⎥
⎢ ∆( ⎡  0  0 ⎤ )   ∆( ⎡ 0  0  ⎤ ) ⎥       ⎢  0  0 0  0  ⎥
⎣    ⎣ 1/2 0 ⎦        ⎣ 0 1/2 ⎦   ⎦       ⎣  0  0 0 1/2 ⎦
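The same computation can be checked numerically. The following NumPy sketch (our own illustration, not lesson code) applies Id ⊗ ∆ to |ϕ+⟩⟨ϕ+|, using the fact, established later in the lesson, that ∆(σ) = σ/2 + σz σ σz /2:

```python
import numpy as np

# Sketch (our own NumPy check): apply the completely dephasing channel to
# the second (right-hand) qubit of an e-bit.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
phi = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho = np.outer(phi, phi)                 # |phi+><phi+|

Z = np.diag([1.0, -1.0])
IZ = np.kron(np.eye(2), Z)               # sigma_z on the right-hand qubit
out = rho / 2 + IZ @ rho @ IZ / 2        # (Id (x) Delta)(rho)

expected = np.zeros((4, 4))
expected[0, 0] = expected[3, 3] = 0.5    # (|00><00| + |11><11|)/2
assert np.allclose(out, expected)
```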
We can also consider a qubit channel that only slightly dephases a qubit, as
opposed to completely dephasing it, which is a less extreme form of decoherence
than what is represented by the completely dephasing channel. In particular,
suppose that ε ∈ (0, 1) is a small but nonzero real number. We can define a channel
∆ε = (1 − ε) Id + ε ∆.
That is, nothing happens with probability 1 − ε, and with probability ε, the qubit
dephases. In terms of matrices, this action can be expressed as follows, where the
diagonal entries are left alone and the off-diagonal entries are multiplied by 1 − ε.
ρ = ⎡ ⟨0|ρ|0⟩  ⟨0|ρ|1⟩ ⎤   ↦   ⎡ ⟨0|ρ|0⟩          (1 − ε)⟨0|ρ|1⟩ ⎤
    ⎣ ⟨1|ρ|0⟩  ⟨1|ρ|1⟩ ⎦       ⎣ (1 − ε)⟨1|ρ|0⟩   ⟨1|ρ|1⟩        ⎦
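A short numerical check of this action (a NumPy sketch of our own):

```python
import numpy as np

# Sketch: Delta_eps = (1 - eps) Id + eps Delta leaves diagonal entries
# alone and scales off-diagonal entries by (1 - eps).
eps = 0.1
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

sz = np.diag([1.0, -1.0])
delta_rho = 0.5 * rho + 0.5 * sz @ rho @ sz   # completely dephasing channel
out = (1 - eps) * rho + eps * delta_rho       # slightly dephasing channel

assert np.allclose(np.diag(out), np.diag(rho))        # diagonal unchanged
assert np.isclose(out[0, 1], (1 - eps) * rho[0, 1])   # off-diagonal scaled
```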
Next, consider the qubit channel Ω defined as follows.

Ω(ρ) = Tr(ρ) I/2
Here, I denotes the 2 × 2 identity matrix. In words, for any density matrix input ρ,
the channel Ω outputs the completely mixed state. It doesn’t get any noisier than
this! This channel is called the completely depolarizing channel, and like the completely
dephasing channel it can be generalized to arbitrary systems in place of qubits.
We can also consider a less extreme variant of this channel where depolarizing
happens with probability ε, similar to what we saw for the dephasing channel.
Recall that, in the simplified formulation of quantum information, every
unitary matrix represents a valid operation and every valid operation can be expressed as a unitary matrix. In essence, the question being asked is: How can we
do something analogous for channels?
To answer this question, we’ll require some additional mathematical machinery.
We’ll see that channels can, in fact, be described mathematically in a few different
ways, including representations named in honor of three individuals who played
key roles in their development: Stinespring, Kraus, and Choi. Together, these
different ways of describing channels offer different angles from which they can be
viewed and analyzed.
Stinespring representations
Stinespring representations are based on the idea that every channel can be im-
plemented in a standard way, where an input system is first combined with an
initialized workspace system, forming a compound system; then a unitary opera-
tion is performed on the compound system; and finally the workspace system is
discarded (or traced out), leaving the output of the channel.
Figure 10.1 depicts such an implementation, in the form of a circuit diagram, for
a channel whose input and output systems are the same system, X. In this diagram,
the wires represent arbitrary systems, as indicated by the labels above the wires, and
not necessarily single qubits. Also, the ground symbol commonly used in electrical
engineering indicates explicitly that W is discarded.
In words, the way the implementation works is as follows. The input system X
begins in some state ρ, while a workspace system W is initialized to the standard
basis state |0⟩. A unitary operation U is performed on the pair (W, X), and finally
the workspace system W is traced out, leaving X as the output.
Figure 10.1: An implementation of a channel Φ from X to X. The input system X, in state ρ, is joined with a workspace system W initialized to |0⟩; a unitary operation U is performed on the pair (W, X); and W is then discarded, leaving X in the state Φ(ρ).
As usual, we’re using Qiskit’s ordering convention: the system X is on top in the
diagram, and therefore corresponds to the right-hand tensor factor in the formula.
Note that we’re presuming that 0 is a classical state of W, and we choose it to
be the initialized state of this system, which will help to simplify the mathematics.
One could, however, choose any fixed pure state to represent the initialized state
of W without changing the basic properties of the representation.
In general, the input and output systems of a channel need not be the same.
Figure 10.2 shows an implementation of a channel Φ whose input system is X and
whose output system is Y. This time the unitary operation transforms (W, X) into a
pair (G, Y ), where G is a new “garbage” system that gets traced out, leaving Y as
the output system.
Figure 10.2: An implementation of a channel Φ whose input system is X and whose output system is Y. The unitary operation U transforms the pair (W, X) into the pair (G, Y), and the garbage system G is traced out, leaving Y in the state Φ(ρ).
In order for U to be unitary, it must be a square matrix. This requires that the pair
(G, Y ) has the same number of classical states as the pair (W, X), and so the systems
W and G must be chosen in a way that allows this. We obtain a mathematical
expression of the resulting channel, Φ, that is similar to what we had before:

Φ(ρ) = TrG( U (|0⟩⟨0|W ⊗ ρ) U† ).
It’s not at all obvious, but every channel does in fact have a Stinespring repre-
sentation, as we will see by the end of the lesson. We’ll also see that Stinespring
representations aren’t unique; there will always be different ways to implement the
same channel in the manner that’s been described.
Stinespring representations are sometimes expressed instead in the form

Φ(ρ) = TrG( A ρ A† )

for an isometry A, which is a matrix whose columns are orthonormal but that might
not be a square matrix. For Stinespring representations having the form that we've
adopted as a definition, we can obtain an expression of this other form by taking
A = U (|0⟩W ⊗ IX ).
Figure 10.3: An implementation of the completely dephasing channel ∆. A workspace qubit, initialized to |0⟩, is the target of a controlled-NOT gate whose control is the input qubit, and is then traced out.
To see that the effect that this circuit has on the input qubit is indeed described
by the completely dephasing channel, we can go through the circuit one step at a
time, using the explicit matrix representation of the partial trace discussed in the
previous lesson. We’ll refer to the top qubit as X — this is the input and output of
the channel — and we’ll assume that X starts in some arbitrary state ρ.
The first step is the introduction of a workspace qubit W. Prior to the controlled-
NOT gate being performed, the state of the pair (W, X) is represented by the following density matrix.

|0⟩⟨0| ⊗ ρ
As per Qiskit’s ordering convention, the top qubit X is on the right and the bottom
qubit W is on the left. We’re using density matrices rather than quantum state vec-
tors, but they’re tensored together in a similar way to what’s done in the simplified
formulation of quantum information.
The next step is to perform the controlled-NOT operation, where X is the control
and W is the target. Still keeping in mind the Qiskit ordering convention, the matrix
representation of this gate is as follows.
⎡ 1 0 0 0 ⎤
⎢ 0 0 0 1 ⎥
⎢ 0 0 1 0 ⎥
⎣ 0 1 0 0 ⎦
Finally, the partial trace is performed on W. Recalling the action of this operation
on 4 × 4 matrices, which was described in the previous lesson, we obtain the
density matrix ∆(ρ), in which the off-diagonal entries of ρ have been zeroed out.
10.2. CHANNEL REPRESENTATIONS 301
The circuit described above is not the only way to implement the completely de-
phasing channel. Figure 10.4 illustrates a different way to do it.
Here’s a quick analysis showing that this implementation works. After the
Hadamard gate is performed we have this two-qubit state as a density matrix:
|+⟩⟨+| ⊗ ρ = 1/2 ⎡ 1 1 ⎤ ⊗ ⎡ ⟨0|ρ|0⟩ ⟨0|ρ|1⟩ ⎤
                 ⎣ 1 1 ⎦   ⎣ ⟨1|ρ|0⟩ ⟨1|ρ|1⟩ ⎦

           = 1/2 ⎡ ⟨0|ρ|0⟩ ⟨0|ρ|1⟩ ⟨0|ρ|0⟩ ⟨0|ρ|1⟩ ⎤
                 ⎢ ⟨1|ρ|0⟩ ⟨1|ρ|1⟩ ⟨1|ρ|0⟩ ⟨1|ρ|1⟩ ⎥
                 ⎢ ⟨0|ρ|0⟩ ⟨0|ρ|1⟩ ⟨0|ρ|0⟩ ⟨0|ρ|1⟩ ⎥
                 ⎣ ⟨1|ρ|0⟩ ⟨1|ρ|1⟩ ⟨1|ρ|0⟩ ⟨1|ρ|1⟩ ⎦ .
Figure 10.4: Another implementation of the completely dephasing channel ∆. The workspace qubit is prepared in the |+⟩ state by a Hadamard gate and serves as the control of a controlled-Z gate applied to the input qubit, after which it is traced out.

Performing the controlled-Z gate and then tracing out the workspace qubit leaves
the input qubit in the state ρ or σz ρ σz , each with probability 1/2.
1/2 ρ + 1/2 σz ρ σz = 1/2 ⎡ ⟨0|ρ|0⟩ ⟨0|ρ|1⟩ ⎤ + 1/2 ⎡ ⟨0|ρ|0⟩  −⟨0|ρ|1⟩ ⎤
                          ⎣ ⟨1|ρ|0⟩ ⟨1|ρ|1⟩ ⎦       ⎣ −⟨1|ρ|0⟩  ⟨1|ρ|1⟩ ⎦

                    = ⎡ ⟨0|ρ|0⟩    0    ⎤
                      ⎣    0    ⟨1|ρ|1⟩ ⎦

                    = ∆(ρ)
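This verification of the Figure 10.4 implementation can also be carried out numerically. The sketch below (our own NumPy code, not from the lesson) prepares the workspace qubit in the |+⟩ state, applies a controlled-Z, traces out the workspace qubit, and confirms that the input qubit ends up in the state ∆(ρ):

```python
import numpy as np

# Sketch: Stinespring-style implementation of the completely dephasing
# channel: workspace in |+>, controlled-Z, then trace out the workspace.
rho = np.array([[0.6, 0.2 + 0.3j],
                [0.2 - 0.3j, 0.4]])

plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
state = np.kron(plus, rho)        # workspace on the left tensor factor

CZ = np.diag([1.0, 1.0, 1.0, -1.0])
state = CZ @ state @ CZ.conj().T

def partial_trace_left(M):
    # Trace out the left qubit of a 4x4 matrix: sum of its diagonal blocks.
    return M[:2, :2] + M[2:, 2:]

out = partial_trace_left(state)
assert np.allclose(out, np.diag(np.diag(rho)))   # equals Delta(rho)
```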
The qubit reset channel can be implemented as is illustrated in Figure 10.5. The
swap gate simply shifts the |0⟩ initialized state of the workspace qubit so that it gets
output, while the input state ρ gets moved to the bottom qubit and then traced out.
Figure 10.5: An implementation of the qubit reset channel. A swap gate exchanges the input qubit, in state ρ, with a workspace qubit initialized to |0⟩, and the workspace qubit is then traced out, leaving the output Tr(ρ)|0⟩⟨0|.
Alternatively, if we don’t demand that the output of the channel is left on top,
we can take the very simple circuit shown in Figure 10.6 as our representation. In
words, resetting a qubit to the |0⟩ state is equivalent to throwing the qubit in the
trash and getting a new one.
Figure 10.6: A simpler representation of the qubit reset channel: the input qubit is discarded and a fresh qubit initialized to |0⟩ is output in its place.
Kraus representations
Now we’ll discuss Kraus representations, which offer a convenient formulaic way
to express the action of a channel through matrix multiplication and addition. In
particular, a Kraus representation is a specification of a channel, Φ, in the following
form.
Φ(ρ) = ∑_{k=0}^{N−1} Ak ρ A†k
Here, A0 , . . . , A N −1 are matrices that all have the same dimensions: their columns
correspond to the classical states of the input system, X, and their rows correspond
to the classical states of the output system, whether it’s X or some other system Y.
In order for Φ to be a valid channel, these matrices must satisfy the following
condition.
∑_{k=0}^{N−1} A†k Ak = IX
This condition is equivalent to the condition that Φ preserves trace. The other
property required of a channel — which is complete positivity — follows from the
general form of the equation for Φ, as a sum of conjugations.
Sometimes it’s convenient to name the matrices A0 , . . . , A N −1 in a different way.
For instance, we could number them starting from 1, or we could use states in some
arbitrary classical state set Γ instead of numbers as subscripts.
These different ways of naming these matrices, which are called Kraus matrices, are
all common and can be convenient in different situations — but we’ll stick with the
names A0 , . . . , A N −1 in this lesson for the sake of simplicity.
The number N can be an arbitrary positive integer, but it never needs to be
too large: if the input system X has n classical states and the output system Y has
m classical states, then any given channel from X to Y will always have a Kraus
representation for which N is at most the product nm.
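In code, a channel given in Kraus form is easy both to apply and to validate. The following NumPy sketch (the function names are our own, not from the lesson) implements both operations and tests them on the completely dephasing channel:

```python
import numpy as np

# Sketch: apply a channel in Kraus form and check sum_k A_k^dagger A_k = I.
def apply_channel(kraus, rho):
    return sum(A @ rho @ A.conj().T for A in kraus)

def is_valid_kraus(kraus, n):
    total = sum(A.conj().T @ A for A in kraus)
    return np.allclose(total, np.eye(n))

# Example: the completely dephasing qubit channel.
sz = np.diag([1.0, -1.0])
kraus = [np.eye(2) / np.sqrt(2), sz / np.sqrt(2)]
rho = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|

assert is_valid_kraus(kraus, 2)
assert np.allclose(apply_channel(kraus, rho), np.diag([0.5, 0.5]))
```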
For example, the completely dephasing channel ∆ has a Kraus representation with
the Kraus matrices A0 = I/√2 and A1 = σz /√2, for which

∑_{k=0}^{1} Ak ρ A†k = 1/2 ρ + 1/2 σz ρ σz = ∆(ρ),
as was computed previously. This time the required condition can be verified as
follows.
∑_{k=0}^{1} A†k Ak = 1/2 I + 1/2 σz² = 1/2 I + 1/2 I = I
A Kraus representation of the qubit reset channel Λ is obtained by choosing the
Kraus matrices A0 = |0⟩⟨0| and A1 = |0⟩⟨1|, for which

∑_{k=0}^{1} Ak ρ A†k = |0⟩⟨0| ρ |0⟩⟨0| + |0⟩⟨1| ρ |1⟩⟨0| = ( ⟨0|ρ|0⟩ + ⟨1|ρ|1⟩ ) |0⟩⟨0| = Tr(ρ)|0⟩⟨0|

These matrices satisfy the required condition.

∑_{k=0}^{1} A†k Ak = |0⟩⟨0|0⟩⟨0| + |1⟩⟨0|0⟩⟨1| = |0⟩⟨0| + |1⟩⟨1| = I
One way to obtain a Kraus representation for the completely depolarizing channel
is to choose Kraus matrices A0 , . . . , A3 as follows.
A0 = |0⟩⟨0|/√2    A1 = |0⟩⟨1|/√2    A2 = |1⟩⟨0|/√2    A3 = |1⟩⟨1|/√2
For any qubit density matrix ρ we then have
∑_{k=0}^{3} Ak ρ A†k = 1/2 ( |0⟩⟨0|ρ|0⟩⟨0| + |0⟩⟨1|ρ|1⟩⟨0| + |1⟩⟨0|ρ|0⟩⟨1| + |1⟩⟨1|ρ|1⟩⟨1| )

= Tr(ρ) I/2

= Ω(ρ).
An alternative Kraus representation is obtained by choosing Kraus matrices like so.
A0 = I/2    A1 = σx /2    A2 = σy /2    A3 = σz /2
To verify that these Kraus matrices do in fact represent the completely depolarizing
channel, let’s first observe that conjugating an arbitrary 2 × 2 matrix by a Pauli
matrix works as follows.
σx ⎡ α0,0 α0,1 ⎤ σx = ⎡ α1,1 α1,0 ⎤
   ⎣ α1,0 α1,1 ⎦      ⎣ α0,1 α0,0 ⎦

σy ⎡ α0,0 α0,1 ⎤ σy = ⎡ α1,1  −α1,0 ⎤
   ⎣ α1,0 α1,1 ⎦      ⎣ −α0,1  α0,0 ⎦

σz ⎡ α0,0 α0,1 ⎤ σz = ⎡ α0,0  −α0,1 ⎤
   ⎣ α1,0 α1,1 ⎦      ⎣ −α1,0  α1,1 ⎦
This allows us to verify the correctness of our Kraus representation.
∑_{k=0}^{3} Ak ρ A†k = ( ρ + σx ρ σx + σy ρ σy + σz ρ σz ) / 4

= 1/4 ⎡ ⟨0|ρ|0⟩ + ⟨1|ρ|1⟩ + ⟨1|ρ|1⟩ + ⟨0|ρ|0⟩    ⟨0|ρ|1⟩ + ⟨1|ρ|0⟩ − ⟨1|ρ|0⟩ − ⟨0|ρ|1⟩ ⎤
      ⎣ ⟨1|ρ|0⟩ + ⟨0|ρ|1⟩ − ⟨0|ρ|1⟩ − ⟨1|ρ|0⟩    ⟨1|ρ|1⟩ + ⟨0|ρ|0⟩ + ⟨0|ρ|0⟩ + ⟨1|ρ|1⟩ ⎦

= Tr(ρ) I/2
This Kraus representation expresses an important idea, which is that the state of a
qubit can be completely randomized by applying to it one of the four Pauli matrices
(including the identity matrix) chosen uniformly at random. Thus, the completely
depolarizing channel is another example of a Pauli channel.
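This randomization property is easy to confirm numerically (a NumPy sketch of our own):

```python
import numpy as np

# Sketch: averaging over conjugation by the four Pauli matrices completely
# randomizes a qubit, reproducing the completely depolarizing channel.
I = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

rho = np.array([[0.8, 0.3 - 0.2j],
                [0.3 + 0.2j, 0.2]])

out = sum(P @ rho @ P.conj().T for P in (I, sx, sy, sz)) / 4
assert np.allclose(out, np.eye(2) / 2)   # Tr(rho) I/2, with Tr(rho) = 1
```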
It is not possible to find a Kraus representation for the completely depolarizing
channel Ω having three or fewer Kraus matrices; at least four are required for this
channel.
Unitary channels

A unitary channel Φ(ρ) = U ρ U† is already in Kraus form, with the single Kraus
matrix A0 = U. In this case the required condition on the Kraus matrices
takes the much simpler form U†U = IX , which we know is true because U is
unitary.
Choi representations
Now we’ll discuss a third way that channels can be described, through the Choi
representation. The way it works is that each channel is represented by a single
matrix known as its Choi matrix. If the input system has n classical states and the
output system has m classical states, then the Choi matrix of the channel will have
nm rows and nm columns.
Choi matrices provide a faithful representation of channels, meaning that two
channels are the same if and only if they have the same Choi matrix. One reason
why this is important is that it provides us with a way of determining whether two
different descriptions correspond to the same channel or to different channels: we
simply compute the Choi matrices and compare them to see if they’re equal. In
contrast, Stinespring and Kraus representations are not unique in this way, as we
have seen.
Choi matrices are also useful in other regards for uncovering various mathemat-
ical properties of channels.
Definition
Let Φ be a channel from a system X to a system Y, and assume that the classical
state set of the input system X is Σ. The Choi representation of Φ, which is denoted
J (Φ), is defined by the following equation.
J(Φ) = ∑_{a,b∈Σ} |a⟩⟨b| ⊗ Φ( |a⟩⟨b| )
That is, as a block matrix, the Choi matrix of a channel has one block Φ(| a⟩⟨b|) for
each pair ( a, b) of classical states of the input system, with the blocks arranged in a
natural way.
Notice that the set {| a⟩⟨b| : 0 ≤ a, b < n} forms a basis for the space of all n × n
matrices. Because Φ is linear, it follows that its action can be recovered from its
Choi matrix by taking linear combinations of the blocks.
Another way to think about the Choi matrix of a channel is that it’s a density matrix
if we divide by n = |Σ|. Let’s focus on the situation that Σ = {0, . . . , n − 1} for
simplicity, and imagine that we have two identical copies of X that are together in
the entangled state
|ψ⟩ = (1/√n) ∑_{a=0}^{n−1} |a⟩ ⊗ |a⟩.
As a density matrix, this state is as follows.
|ψ⟩⟨ψ| = (1/n) ∑_{a,b=0}^{n−1} |a⟩⟨b| ⊗ |a⟩⟨b|
If we apply Φ to the copy of X on the right-hand side, we obtain the Choi matrix
divided by n.
(Id ⊗ Φ)( |ψ⟩⟨ψ| ) = (1/n) ∑_{a,b=0}^{n−1} |a⟩⟨b| ⊗ Φ( |a⟩⟨b| ) = J(Φ)/n
Figure 10.7: Evaluating a channel on one-half of the maximally entangled state |ψ⟩
yields the normalized Choi matrix of the channel.
Because Φ preserves trace, tracing out the output system of the Choi matrix
yields the identity:

TrY( J(Φ) ) = ∑_{a,b∈Σ} Tr( Φ( |a⟩⟨b| ) ) |a⟩⟨b|
= ∑_{a,b∈Σ} Tr( |a⟩⟨b| ) |a⟩⟨b|
= ∑_{a∈Σ} |a⟩⟨a|
= IX .
In summary, the Choi representation J (Φ) for any channel Φ must be positive
semidefinite and must satisfy
TrY ( J (Φ)) = IX .
As we will see by the end of the lesson, these two conditions are not only necessary
but also sufficient, meaning that any linear mapping Φ from matrices to matrices
that satisfies these requirements must, in fact, be a channel.
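Both conditions are straightforward to check numerically. The sketch below (our own helper names, not from the lesson) computes the Choi matrix of a channel given as a Python function, and verifies positive semidefiniteness and the partial-trace condition for the completely dephasing channel:

```python
import numpy as np

# Sketch: J(Phi) = sum_{a,b} |a><b| (x) Phi(|a><b|), then check the two
# conditions: J(Phi) >= 0 and Tr_Y(J(Phi)) = I_X.
def choi(channel, n):
    J = np.zeros((n * n, n * n), dtype=complex)
    for a in range(n):
        for b in range(n):
            E = np.zeros((n, n), dtype=complex)
            E[a, b] = 1.0                      # |a><b|
            J += np.kron(E, channel(E))
    return J

def dephasing(M):
    return np.diag(np.diag(M))                 # completely dephasing channel

J = choi(dephasing, 2)

# Tr_Y traces out the right-hand (output) tensor factor, block by block.
TrY = np.array([[np.trace(J[2*a:2*a+2, 2*b:2*b+2]) for b in range(2)]
                for a in range(2)])

assert np.all(np.linalg.eigvalsh(J) > -1e-12)  # positive semidefinite
assert np.allclose(TrY, np.eye(2))             # Tr_Y(J(Phi)) = I_X
```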
Notice in particular that J (Id) is not the identity matrix. The Choi representa-
tion does not directly describe a channel’s action in the usual way that a matrix
represents a linear mapping.
10.3. EQUIVALENCE OF THE REPRESENTATIONS 311
The way the proof works is that a cycle of implications is proved: the first
statement in our list implies the second, the second implies the third, the third
implies the fourth, and the fourth statement implies the first. This establishes that
all four statements are equivalent — which is to say that they’re either all true or all
false for a given choice of Φ — because the implications can be followed transitively
from any one statement to any other.
This is a common strategy when proving that a collection of statements are
equivalent, and a useful trick to use in such a context is to set up the implications in
a way that makes them as easy to prove as possible. That is the case here — and in
fact we’ve already encountered two of the four implications.
TrY( J(Φ) ) = ∑_{a,b∈Σ} Tr( |a⟩⟨b| ) |a⟩⟨b|
= ∑_{a∈Σ} |a⟩⟨a|
= IX .
for some way of choosing the vectors |ψ0 ⟩, . . . , |ψN −1 ⟩. In general there will be
multiple ways to do this — and in fact this directly mirrors the freedom one has in
choosing a Kraus representation for Φ.
One way to obtain such an expression is to first use the spectral theorem to write

J(Φ) = ∑_{k=0}^{N−1} λk |γk⟩⟨γk| ,

and then to take |ψk⟩ = √λk |γk⟩ for each k, so that J(Φ) = ∑_{k=0}^{N−1} |ψk⟩⟨ψk|.
Expanding each of these vectors as

|ψk⟩ = ∑_{a∈Σ} |a⟩ ⊗ |ϕk,a⟩ ,

where the vectors {|ϕk,a ⟩} have entries corresponding to the classical states of Y
and can be explicitly determined by the equation

|ϕk,a⟩ = ( ⟨a| ⊗ IY ) |ψk⟩ ,

we define the corresponding Kraus matrices as follows.

Ak = ∑_{a∈Σ} |ϕk,a⟩⟨a|
We can think about this formula purely symbolically: | a⟩ effectively gets flipped
around to form ⟨ a| and moved to right-hand side, forming a matrix. For the
purposes of verifying the proof, the formula is all we need.
There is, however, a simple and intuitive relationship between the vector |ψk ⟩
and the matrix Ak , which is that by vectorizing Ak we get |ψk ⟩. What it means to
vectorize Ak is that we stack the columns on top of one another (with the leftmost
column on top proceeding to the rightmost on the bottom), in order to form a vector.
For instance, if X and Y are both qubits, and for some choice of k we have
|ψk⟩ = α00 |0⟩ ⊗ |0⟩ + α01 |0⟩ ⊗ |1⟩ + α10 |1⟩ ⊗ |0⟩ + α11 |1⟩ ⊗ |1⟩ = ⎡ α00 ⎤
                                                                      ⎢ α01 ⎥
                                                                      ⎢ α10 ⎥
                                                                      ⎣ α11 ⎦ ,

then

Ak = α00 |0⟩⟨0| + α01 |1⟩⟨0| + α10 |0⟩⟨1| + α11 |1⟩⟨1| = ⎡ α00 α10 ⎤
                                                         ⎣ α01 α11 ⎦ .
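This correspondence between the vectors |ψk⟩ and the Kraus matrices Ak can be exercised in code. The sketch below (our own function names, not lesson code) extracts Kraus matrices from a Choi matrix via the spectral decomposition and un-vectorization, and checks that they reproduce the completely dephasing channel:

```python
import numpy as np

# Sketch: recover Kraus matrices from a Choi matrix by spectral
# decomposition, un-vectorizing each scaled eigenvector |psi_k> into A_k.
def kraus_from_choi(J, n, m):
    vals, vecs = np.linalg.eigh(J)
    kraus = []
    for lam, v in zip(vals, vecs.T):
        if lam > 1e-12:
            psi = np.sqrt(lam) * v
            # psi[a*m + i] is entry i of |phi_{k,a}>, i.e. A_k[i, a];
            # reshaping and transposing recovers A_k.
            kraus.append(psi.reshape(n, m).T)
    return kraus

# Choi matrix of the completely dephasing qubit channel: diag(1, 0, 0, 1).
J = np.diag([1.0, 0.0, 0.0, 1.0])
kraus = kraus_from_choi(J, 2, 2)

rho = np.array([[0.5, 0.5], [0.5, 0.5]])
out = sum(A @ rho @ A.conj().T for A in kraus)
assert np.allclose(out, np.diag([0.5, 0.5]))   # acts as Delta
```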
= ∑_{a,b∈Σ} |a⟩⟨b| ⊗ ∑_{k=0}^{N−1} |ϕk,a⟩⟨ϕk,b|

= ∑_{k=0}^{N−1} ( ∑_{a∈Σ} |a⟩ ⊗ |ϕk,a⟩ ) ( ∑_{b∈Σ} ⟨b| ⊗ ⟨ϕk,b| )

= ∑_{k=0}^{N−1} |ψk⟩⟨ψk|

= J(Φ)
(in which we’re referring the matrix transpose on the left-hand side).
Starting on the left, we can first observe that
( ∑_{k=0}^{N−1} A†k Ak )ᵀ = ( ∑_{k=0}^{N−1} ∑_{a,b∈Σ} |b⟩⟨ϕk,b|ϕk,a⟩⟨a| )ᵀ

= ∑_{k=0}^{N−1} ∑_{a,b∈Σ} ⟨ϕk,b|ϕk,a⟩ |a⟩⟨b| .
The last equality follows from the fact that the transpose is linear and maps |b⟩⟨ a|
to | a⟩⟨b|.
Moving to the right-hand side of our equation, we have
J(Φ) = ∑_{k=0}^{N−1} |ψk⟩⟨ψk| = ∑_{k=0}^{N−1} ∑_{a,b∈Σ} |a⟩⟨b| ⊗ |ϕk,a⟩⟨ϕk,b|
and therefore
TrY( J(Φ) ) = ∑_{k=0}^{N−1} ∑_{a,b∈Σ} Tr( |ϕk,a⟩⟨ϕk,b| ) |a⟩⟨b|

= ∑_{k=0}^{N−1} ∑_{a,b∈Σ} ⟨ϕk,b|ϕk,a⟩ |a⟩⟨b| .
We’ve obtained the same result, and therefore the equation (10.3) has been
verified. It follows, by the assumption TrY ( J (Φ)) = IX , that
( ∑_{k=0}^{N−1} A†k Ak )ᵀ = IX
and therefore, because the identity matrix is its own transpose, the required condi-
tion is true.
∑_{k=0}^{N−1} A†k Ak = IX
To obtain a Stinespring representation, consider a unitary matrix U built as an
array of blocks Mk,j , where each matrix Mk,j has m rows and n columns, and in
particular we shall take Mk,0 = Ak for k = 0, . . . , N − 1.
This must be a unitary matrix, and the blocks labeled with a question mark, or
equivalently Mk,j for j > 0, must be selected with this in mind — but aside from
allowing U to be unitary, the blocks labeled with a question mark won’t have any
relevance to the proof.
Let’s momentarily disregard the concern that U is unitary and focus on the
expression
TrG( U (|0⟩⟨0|W ⊗ ρ) U† )
that describes the output state of Y given the input state ρ of X for our Stinespring
representation. We can alternatively write

U (|0⟩⟨0|W ⊗ ρ) U† = ∑_{j,k=0}^{N−1} |k⟩⟨j| ⊗ Ak ρ A†j ,

and so
TrG( U (|0⟩⟨0|W ⊗ ρ) U† ) = ∑_{j,k=0}^{N−1} Tr( |k⟩⟨j| ) Ak ρ A†j

= ∑_{k=0}^{N−1} Ak ρ A†k

= Φ(ρ).
We therefore have a correct representation for the mapping Φ, and it remains to
verify that we can choose U to be unitary. Consider the first n columns of U when
it’s selected according to the pattern above. Taking these columns alone, we have a
block matrix
⎡ A0 ⎤
⎢ A1 ⎥
⎢ ⋮  ⎥
⎣ AN−1 ⎦
There are n columns, one for each classical state of X, and as vectors let us name
the columns as |γa ⟩ for each a ∈ Σ. Here’s a formula for these vectors that can be
matched to the block matrix representation above.
|γa⟩ = ∑_{k=0}^{N−1} |k⟩ ⊗ Ak |a⟩
Now let’s compute the inner product between any two of these vectors, meaning
the ones corresponding to any choice of a, b ∈ Σ.
⟨γa|γb⟩ = ∑_{j,k=0}^{N−1} ⟨k|j⟩ ⟨a| A†k Aj |b⟩ = ⟨a| ( ∑_{k=0}^{N−1} A†k Ak ) |b⟩
By the assumption

∑_{k=0}^{N−1} A†k Ak = IX
we conclude that the n column vectors {|γa ⟩ : a ∈ Σ} form an orthonormal set:
⟨γa|γb⟩ = 1 if a = b, and ⟨γa|γb⟩ = 0 if a ≠ b,
for all a, b ∈ Σ.
This implies that it is possible to fill out the remaining columns of U so that
it becomes a unitary matrix. In particular, the Gram–Schmidt orthogonalization
process can be used to select the remaining columns, as discussed in Lesson 3
(Quantum Circuits).
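The key step of this construction, that stacking the Kraus matrices into a block column yields an isometry, can be checked directly (a NumPy sketch of our own, using the dephasing channel's Kraus matrices):

```python
import numpy as np

# Sketch: the block column V = [A0; A1] built from the Kraus matrices
# A0 = I/sqrt(2), A1 = sigma_z/sqrt(2) is an isometry, and tracing out
# the block (garbage) index of V rho V^dagger reproduces Delta(rho).
sz = np.diag([1.0, -1.0])
kraus = [np.eye(2) / np.sqrt(2), sz / np.sqrt(2)]

V = np.vstack(kraus)                            # 4x2 block column [A0; A1]
assert np.allclose(V.conj().T @ V, np.eye(2))   # columns are orthonormal

rho = np.array([[0.5, 0.5], [0.5, 0.5]])
big = V @ rho @ V.conj().T                      # state before discarding G
out = big[:2, :2] + big[2:, 2:]                 # partial trace over G
assert np.allclose(out, np.diag([0.5, 0.5]))    # equals Delta(rho)
```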
Finally, suppose that a mapping Φ is given by a Stinespring representation

Φ(ρ) = TrG( U (|0⟩⟨0|W ⊗ ρ) U† ),

and let us verify that Φ is a valid channel. From its form, it is evident that Φ is
linear, and it remains to verify that it always transforms density matrices into
density matrices. This is pretty straightforward, and we've already discussed the
key points.
In particular, if we start with a density matrix σ of a compound system (Z, X),
and then add on an additional workspace system W, we will certainly be left with a
density matrix. If we reorder the systems (W, Z, X) for convenience, we can write
this state as
|0⟩⟨0|W ⊗ σ.
We then apply the unitary operation U, and as we already discussed this is a valid
channel, and hence maps density matrices to density matrices. Finally, the partial
trace of a density matrix is another density matrix.
Another way to say this is to observe first that each of these things is a valid
channel:
1. Introducing an initialized workspace system.
2. Performing a unitary operation.
3. Tracing out a system.
And finally, any composition of channels is another channel — which is immediate
from the definition, but is also a fact worth observing in its own right.
This completes the proof of the final implication, and therefore we’ve established
the equivalence of the four statements listed at the start of the section.
Lesson 11
General Measurements
11.1. MATHEMATICAL FORMULATIONS OF MEASUREMENTS

Projective measurements

Recall that a projective measurement of a system X, with outcomes 0, . . . , m − 1, is
described by a collection of projection matrices

{ Π0 , . . . , Πm−1 }

that satisfies the condition

Π0 + · · · + Πm−1 = IX .
Here we’re using the cyclic property of the trace for the second equality, and for the
third equality we’re using the fact that each Π a is a projection matrix, and therefore
satisfies Π2a = Π a .
In general, if ρ is a convex combination
ρ = ∑_{k=0}^{N−1} pk |ψk⟩⟨ψk|
of pure states, then the expression Tr(Π a ρ) coincides with the average probability
for the outcome a, owing to the fact that this expression is linear in ρ.
Tr( Πa ρ ) = ∑_{k=0}^{N−1} pk Tr( Πa |ψk⟩⟨ψk| ) = ∑_{k=0}^{N−1} pk ∥ Πa |ψk⟩ ∥²
General measurements

In general, a measurement of X with outcomes 0, . . . , m − 1 is described by any
collection { P0 , . . . , Pm−1 } of positive semidefinite matrices satisfying
P0 + · · · + Pm−1 = IX . For example, consider the matrices

P0 = ⎡ 2/3 1/3 ⎤     P1 = ⎡ 1/3  −1/3 ⎤
     ⎣ 1/3 1/3 ⎦          ⎣ −1/3  2/3 ⎦ .

These are both positive semidefinite matrices: they're Hermitian, and in both cases
the eigenvalues happen to be 1/2 ± √5/6, which are both positive. We also have
that P0 + P1 = I, and therefore { P0 , P1 } describes a measurement.
If the state of X is described by a density matrix ρ and we perform this measure-
ment, then the probability of obtaining the outcome 0 is Tr( P0 ρ) and the probability
of obtaining the outcome 1 is Tr( P1 ρ). For instance, if ρ = |+⟩⟨+| then the probabilities for the two outcomes 0 and 1 are as follows.

Tr( P0 ρ ) = Tr( ⎡ 2/3 1/3 ⎤ ⎡ 1/2 1/2 ⎤ ) = 5/6
                 ⎣ 1/3 1/3 ⎦ ⎣ 1/2 1/2 ⎦

Tr( P1 ρ ) = Tr( ⎡ 1/3  −1/3 ⎤ ⎡ 1/2 1/2 ⎤ ) = 1/6
                 ⎣ −1/3  2/3 ⎦ ⎣ 1/2 1/2 ⎦
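These two probabilities can be reproduced with a few lines of NumPy (our own check, not lesson code):

```python
import numpy as np

# Sketch: outcome probabilities Tr(P_a rho) for the measurement {P0, P1}
# applied to rho = |+><+|.
P0 = np.array([[2/3, 1/3], [1/3, 1/3]])
P1 = np.array([[1/3, -1/3], [-1/3, 2/3]])
rho = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|

p0 = np.trace(P0 @ rho)
p1 = np.trace(P1 @ rho)

assert np.isclose(p0, 5/6)
assert np.isclose(p1, 1/6)
assert np.allclose(P0 + P1, np.eye(2))     # a valid measurement
```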
As another example, consider the following four single-qubit states.

|ϕ0⟩ = |0⟩
|ϕ1⟩ = (1/√3) |0⟩ + √(2/3) |1⟩
|ϕ2⟩ = (1/√3) |0⟩ + √(2/3) e^{2πi/3} |1⟩
|ϕ3⟩ = (1/√3) |0⟩ + √(2/3) e^{−2πi/3} |1⟩
These four states are sometimes known as tetrahedral states because they’re vertices
of a regular tetrahedron inscribed within the Bloch sphere, as illustrated in Figure 11.1.
The Cartesian coordinates of these four states on the Bloch sphere are

(0, 0, 1),   ( 2√2/3, 0, −1/3 ),   ( −√2/3, √(2/3), −1/3 ),   ( −√2/3, −√(2/3), −1/3 ).
Figure 11.1: The tetrahedral states form the vertices of a regular tetrahedron in-
scribed within the Bloch sphere.
These four states are perfectly spread out on the Bloch sphere, each one equidistant
from the other three and with the angles between any two of them always being
the same.
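These states can be turned into a measurement by scaling: the matrices Pa = |ϕa⟩⟨ϕa|/2 sum to the identity (the factor of 1/2 is our assumed normalization here, the standard choice for this construction). A NumPy check of our own:

```python
import numpy as np

# Sketch: the tetrahedral states, scaled by 1/2, form a valid measurement:
# the matrices P_a = |phi_a><phi_a| / 2 sum to the identity.
w = np.exp(2j * np.pi / 3)
states = [
    np.array([1.0, 0.0]),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3)]),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w]),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w.conjugate()]),
]

P = [np.outer(v, v.conj()) / 2 for v in states]
assert np.allclose(sum(P), np.eye(2))
```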
Measurements as channels
A second way to describe measurements in mathematical terms is as channels.
Classical information can be viewed as a special case of quantum information,
insofar as we can identify probabilistic states with diagonal density matrices. So,
in operational terms, we can think about measurements as being channels whose
inputs are matrices describing states of whatever system is being measured and
whose outputs are diagonal density matrices describing the resulting distribution of
measurement outcomes.
We’ll see shortly that any channel having this property can always be written
in a simple, canonical form that ties directly to the description of measurements
as collections of positive semidefinite matrices. Conversely, given an arbitrary
measurement as a collection of matrices, there’s always a valid channel having the
diagonal output property that describes the given measurement as suggested in
the previous paragraph. Putting these observations together, we find that the two
descriptions of general measurements are equivalent.
Before proceeding further, let’s be more precise about the measurement, how
we’re viewing it as a channel, and what assumptions we’re making about it. As
before, we’ll suppose that X is the system to be measured, and that the possible
outcomes of the measurement are the integers 0, . . . , m − 1 for some positive integer
m. We’ll let Y be the system that stores measurement outcomes, so its classical state
set is {0, . . . , m − 1}, and we represent the measurement as a channel named Φ
from X to Y.
Our assumption is that Y is classical — which is to say that no matter what state
we start with for X, the state of Y we obtain is represented by a diagonal density
matrix. We can express in mathematical terms that the output of Φ is always
diagonal in the following way. First define the completely dephasing channel ∆m
on Y.
∆m(σ) = ∑_{a=0}^{m−1} ⟨a|σ|a⟩ |a⟩⟨a|
This channel is analogous to the completely dephasing qubit channel ∆ from the
previous lesson. As a linear mapping, it zeros out all of the off-diagonal entries of
an input matrix and leaves the diagonal alone.
And now, a simple way to express that a given density matrix σ is diagonal is
by the equation σ = ∆m (σ). In words, zeroing out all of the off-diagonal entries of a
density matrix has no effect if and only if the off-diagonal entries were all zero to
begin with. The channel Φ therefore satisfies our assumption — that Y is classical —
if and only if
Φ(ρ) = ∆m (Φ(ρ))
for every density matrix ρ representing a state of X.
Like all channels, we can express Φ in Kraus form for some way of choosing
Kraus matrices A0 , . . . , A N −1 .
Φ(ρ) = ∑_{k=0}^{N−1} Ak ρ A†k
This provides us with an alternative expression for the diagonal entries of Φ(ρ):
⟨a| Φ(ρ) |a⟩ = ∑_{k=0}^{N−1} ⟨a| Ak ρ A†k |a⟩

= ∑_{k=0}^{N−1} Tr( A†k |a⟩⟨a| Ak ρ )

= Tr( Pa ρ )
for
Pa = ∑_{k=0}^{N−1} A†k |a⟩⟨a| Ak .
Thus, for these same matrices P0 , . . . , Pm−1 we can express the channel Φ as
follows.
Φ(ρ) = ∑_{a=0}^{m−1} Tr( Pa ρ ) |a⟩⟨a|
This expression is consistent with our description of general measurements in
terms of matrices, as we see each measurement outcome appearing with probability
Tr( Pa ρ).
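The channel form of a measurement is simple to implement (a NumPy sketch; the function name is our own):

```python
import numpy as np

# Sketch: a measurement {P_a} viewed as a channel that outputs the
# diagonal density matrix of outcome probabilities.
def measurement_channel(P_list, rho):
    m = len(P_list)
    out = np.zeros((m, m), dtype=complex)
    for a, P in enumerate(P_list):
        out[a, a] = np.trace(P @ rho)    # Tr(P_a rho) |a><a|
    return out

P0 = np.array([[2/3, 1/3], [1/3, 1/3]])
P1 = np.array([[1/3, -1/3], [-1/3, 2/3]])
rho = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|

sigma = measurement_channel([P0, P1], rho)
assert np.allclose(sigma, np.diag([5/6, 1/6]))
assert np.isclose(np.trace(sigma).real, 1.0)
```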
Now let’s observe that the two properties required of the collection of matrices
{ P0 , . . . , Pm−1 } to describe a general measurement are indeed satisfied. The first
property is that they’re all positive semidefinite matrices. One way to see this is
to observe that, for every vector |ψ⟩ having entries in correspondence with the
classical states of X, we have

⟨ψ| Pa |ψ⟩ = ∑_{k=0}^{N−1} ⟨ψ| A†k |a⟩⟨a| Ak |ψ⟩ = ∑_{k=0}^{N−1} | ⟨a| Ak |ψ⟩ |² ≥ 0.
The second property is that if we sum these matrices we get the identity matrix.
∑_{a=0}^{m−1} Pa = ∑_{a=0}^{m−1} ∑_{k=0}^{N−1} A†k |a⟩⟨a| Ak

= ∑_{k=0}^{N−1} A†k ( ∑_{a=0}^{m−1} |a⟩⟨a| ) Ak

= ∑_{k=0}^{N−1} A†k Ak

= IX
The last equality follows from the fact that Φ is a channel, so its Kraus matrices
must satisfy this condition.
Matrices to channels
Now let’s verify that for any collection { P0 , . . . , Pm−1 } of positive semidefinite
matrices satisfying P0 + · · · + Pm−1 = IX , the mapping defined by
m −1
Φ(ρ) = ∑ Tr( Pa ρ)| a⟩⟨ a|
a =0
This allows for the expressions |b⟩⟨b| and |c⟩⟨c| to appear, which simplify to the
identity matrix upon summing over b and c, respectively.
By the assumption that P0 , . . . , Pm−1 are positive semidefinite, so too are the
matrices P0ᵀ , . . . , Pm−1ᵀ . In particular, transposing a Hermitian matrix results in
another Hermitian matrix, and the eigenvalues of any square matrix and its transpose
always agree. It follows that J (Φ) is positive semidefinite. Tracing out the output
system Y (which is the system on the right) yields
TrY( J(Φ) ) = ∑_{a=0}^{m−1} Paᵀ = IXᵀ = IX ,

and so both of the required conditions hold, implying that Φ is indeed a channel.
Partial measurements
Suppose that we have multiple systems that are collectively in a quantum state, and
a general measurement is performed on one of the systems. This results in one of the
measurement outcomes, selected at random according to probabilities determined
by the measurement and the state of the system prior to the measurement. The
resulting state of the remaining systems will then, in general, depend on which
measurement outcome was obtained.
Let’s examine how this works for a pair of systems (X, Z) when the system X is
measured. (We’re naming the system on the right Z because we’ll take Y to be a
system representing the classical output of the measurement when we view it as a
channel.) We can then easily generalize to the situation in which the systems are
swapped as well as to three or more systems.
Suppose the state of (X, Z) prior to the measurement is described by a density
matrix ρ, which we can write as follows.
ρ = ∑_{b,c=0}^{n−1} |b⟩⟨c| ⊗ ρb,c
Outcome probabilities
The probability of each outcome a can be expressed in three equivalent ways.

Tr( Pa ρX ) = Tr( Pa TrZ(ρ) ) = Tr( ( Pa ⊗ IZ ) ρ )
The first expression naturally represents the probability to obtain the outcome a
based on what we already know about measurements of a single system. To get the
second expression we’re simply using the definition ρX = TrZ (ρ).
To get the third expression requires more thought — and learners are encouraged
to convince themselves that it is true. Here’s a hint: The equivalence between the
second and third expressions does not depend on ρ being a density matrix or on
each Pa being positive semidefinite. Try showing it first for tensor products of the
form ρ = M ⊗ N and then conclude that it must be true in general by linearity.
While the equivalence of the first and third expressions in the previous equation
may not be immediate, it does make sense. Starting from a measurement on X,
we’re effectively defining a measurement of (X, Z), where we simply throw away Z
and measure X. Like all measurements, this new measurement can be described by
a collection of matrices, and it’s not surprising that this measurement is described
by the collection
{ P0 ⊗ IZ , . . . , Pm−1 ⊗ IZ }.
If we want to determine not only the probabilities for the different outcomes but
also the resulting state of Z conditioned on each measurement outcome, we can
look to the channel description of the measurement. In particular, let’s examine the
state we get when we apply Φ to X and do nothing to Z.
(Φ ⊗ IdZ)(ρ) = ∑_{b,c=0}^{n−1} Φ(|b⟩⟨c|) ⊗ ρb,c

= ∑_{a=0}^{m−1} ∑_{b,c=0}^{n−1} Tr(Pa |b⟩⟨c|) |a⟩⟨a| ⊗ ρb,c

= ∑_{a=0}^{m−1} |a⟩⟨a| ⊗ ∑_{b,c=0}^{n−1} Tr(Pa |b⟩⟨c|) ρb,c

= ∑_{a=0}^{m−1} |a⟩⟨a| ⊗ ∑_{b,c=0}^{n−1} TrX( (Pa ⊗ IZ)(|b⟩⟨c| ⊗ ρb,c) )

= ∑_{a=0}^{m−1} |a⟩⟨a| ⊗ TrX( (Pa ⊗ IZ)ρ )
Note that this is a density matrix by virtue of the fact that Φ is a channel, so each
matrix TrX((Pa ⊗ IZ)ρ) is necessarily positive semidefinite.
One final step transforms this expression into one that reveals what we’re
looking for.
∑_{a=0}^{m−1} Tr((Pa ⊗ IZ)ρ) |a⟩⟨a| ⊗ [ TrX((Pa ⊗ IZ)ρ) / Tr((Pa ⊗ IZ)ρ) ]
11.1. MATHEMATICAL FORMULATIONS OF MEASUREMENTS 333
p(a) = Tr((Pa ⊗ IZ)ρ)

σa = TrX((Pa ⊗ IZ)ρ) / Tr((Pa ⊗ IZ)ρ)    (11.2)

The density matrix σa is obtained from TrX((Pa ⊗ IZ)ρ)
by dividing it by its trace. (Formally speaking, the state σa is only defined when
the probability p( a) is nonzero; when p( a) = 0 this state is irrelevant, for it refers
to a discrete event that occurs with probability zero.) Naturally, the outcome
probabilities are consistent with our previous observations.
In summary, this is what happens when the measurement { P0 , . . . , Pm−1 } is
performed on X when (X, Z) is in the state ρ.
1. Each outcome a appears with probability p(a) = Tr((Pa ⊗ IZ)ρ).
2. Conditioned on obtaining outcome a, the state of Z is then represented by
the density matrix σa shown in the equation (11.2), which is obtained by
normalizing TrX((Pa ⊗ IZ)ρ).
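The two-step summary above can be checked numerically. The following sketch (numpy assumed; the joint state and the projective measurement chosen here are arbitrary illustrations) computes the probabilities p(a) = Tr((Pa ⊗ IZ)ρ) and the conditional states σa of Z, and confirms that the σa are density matrices.

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 2, 3  # X has n classical states, Z has k

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

rho = rand_density(n * k)  # a joint state of (X, Z)

# A two-outcome measurement {P0, P1} of X alone (a projective one, for simplicity).
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

probs, cond_states = [], []
for Pa in P:
    M = np.kron(Pa, np.eye(k)) @ rho
    p = np.trace(M).real                        # p(a) = Tr((Pa (x) I)rho)
    sigma = np.einsum('aiaj->ij', M.reshape(n, k, n, k)) / p  # normalize Tr_X
    probs.append(p)
    cond_states.append(sigma)

assert np.isclose(sum(probs), 1.0)
for s in cond_states:  # each conditional state of Z is a density matrix
    assert np.isclose(np.trace(s).real, 1.0)
    assert np.min(np.linalg.eigvalsh(s)) > -1e-9
```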
Generalization
We can adapt this description to other situations, such as when the ordering of the
systems is reversed or when there are three or more systems. Conceptually it is
straightforward, although it can become cumbersome to write down the formulas.
In general, if we have r systems X1 , . . . , Xr , the state of the compound system
(X1 , . . . , Xr ) is ρ, and the measurement { P0 , . . . , Pm−1 } is performed on Xk , the
following happens.

1. Each outcome a appears with probability Tr((I ⊗ · · · ⊗ Pa ⊗ · · · ⊗ I)ρ), where
the matrix Pa appears in the k-th position of the tensor product.

2. Conditioned on obtaining the outcome a, the state of the remaining systems is
obtained by normalizing TrXk((I ⊗ · · · ⊗ Pa ⊗ · · · ⊗ I)ρ).
Naimark’s theorem
To be clear, the system X starts out in some arbitrary state ρ while Y is initialized
to the |0⟩ state. The unitary operation U is applied to (Y, X) and then the system
11.2. NAIMARK’S THEOREM 335
[Figure: a circuit diagram in which the system X, in the state ρ, and a workspace
system Y, initialized to the |0⟩ state, are input to the unitary operation U; the
system Y is then measured with a standard basis measurement, yielding the outcome a.]
One way to find the square root of a positive semidefinite matrix is to first compute
a spectral decomposition.
P = ∑_{k=0}^{n−1} λk |ψk⟩⟨ψk|
Because P is positive semidefinite, its eigenvalues must be nonnegative real num-
bers, and by replacing them with their square roots we obtain an expression for the
square root of P.
√P = ∑_{k=0}^{n−1} √λk |ψk⟩⟨ψk|
With this concept in hand, we’re ready to prove Naimark’s theorem. Under the
assumption that X has n classical states, a unitary operation U on the pair (Y, X)
can be represented by an nm × nm matrix, which we can view as an m × m block
matrix whose blocks are n × n. The key to the proof is to take U to be any unitary
matrix that matches the following pattern.
    ⎡ √P0     ?  · · ·  ? ⎤
    ⎢ √P1     ?  · · ·  ? ⎥
U = ⎢   ⋮     ⋮    ⋱    ⋮ ⎥
    ⎣ √Pm−1   ?  · · ·  ? ⎦
For it to be possible to fill in the blocks marked with a question mark so that U is
unitary, it’s both necessary and sufficient that the first n columns, which are formed
√ √
by the blocks P0 , . . . , Pm−1 , are orthonormal. We can then use the Gram–Schmidt
orthogonalization process to fill in the remaining columns.
The first n columns of U can be expressed as vectors in the following way, where
c = 0, . . . , n − 1 refers to the column number starting from 0.
|γc⟩ = ∑_{a=0}^{m−1} |a⟩ ⊗ √Pa |c⟩
We can compute the inner product between any two of them as follows.
⟨γc|γd⟩ = ∑_{a,b=0}^{m−1} ⟨a|b⟩ · ⟨c| √Pa √Pb |d⟩ = ⟨c| ( ∑_{a=0}^{m−1} Pa ) |d⟩ = ⟨c|d⟩
This shows that these columns are in fact orthonormal, so we can fill in the remain-
ing columns of U in a way that guarantees the entire matrix is unitary.
It remains to check that the measurement outcome probabilities for the simula-
tion are consistent with the original measurement. For a given initial state ρ of X,
the measurement described by the collection { P0 , . . . , Pm−1 } results in each outcome
a ∈ {0, . . . , m − 1} with probability Tr( Pa ρ).
To obtain the outcome probabilities for the simulation, let’s first give the name
σ to the state of (Y, X) after U has been performed. This state can be expressed as
follows.
σ = U (|0⟩⟨0| ⊗ ρ) U† = ∑_{a,b=0}^{m−1} |a⟩⟨b| ⊗ √Pa ρ √Pb
Notice that the entries of U falling into the blocks marked with a question mark
have no influence on the outcome by virtue of the fact that we’re conjugating a
matrix of the form |0⟩⟨0| ⊗ ρ — so the question mark entries are always multiplied
by zero entries of |0⟩⟨0| ⊗ ρ when the matrix product is computed.
Now we can analyze what happens when a standard basis measurement is
performed on Y. The probabilities of the possible outcomes are given by the diagonal
entries of the reduced state σY of Y.
σY = ∑_{a,b=0}^{m−1} Tr( √Pa ρ √Pb ) |a⟩⟨b|
In particular, using the cyclic property of the trace, we see that the probability to
obtain a given outcome a ∈ {0, . . . , m − 1} is as follows.
⟨a|σY|a⟩ = Tr( √Pa ρ √Pa ) = Tr(Pa ρ)
This matches with the original measurement, establishing the correctness of the
simulation.
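Here is a sketch of the proof's construction, assuming numpy (the random POVM and the helper functions are our own choices for illustration): the first block-column of U is filled with the square roots √P0 , . . . , √Pm−1 , the remaining columns are completed by a Gram–Schmidt process, and the simulation's outcome probabilities are compared against Tr(Pa ρ).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2, 3  # X has n classical states; m measurement outcomes

def psd_sqrt(P):
    # Square root via a spectral decomposition, as described in the text.
    vals, vecs = np.linalg.eigh(P)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

# A random POVM {P0,...,P_{m-1}}: PSD matrices summing to the identity.
G = [A.conj().T @ A for A in rng.normal(size=(m, n, n)) + 1j * rng.normal(size=(m, n, n))]
Sih = np.linalg.inv(psd_sqrt(sum(G)))
P = [Sih @ Ga @ Sih for Ga in G]

# First n columns of U: the blocks sqrt(Pa) stacked, so column c is
# |gamma_c> = sum_a |a> (x) sqrt(Pa)|c>.  These columns are orthonormal.
V = np.vstack([psd_sqrt(Pa) for Pa in P])
assert np.allclose(V.conj().T @ V, np.eye(n))

# Fill in the remaining columns by Gram-Schmidt against standard basis vectors.
cols = [V[:, c] for c in range(n)]
for e in np.eye(n * m, dtype=complex):
    v = e - sum(np.vdot(c, e) * c for c in cols)
    if np.linalg.norm(v) > 1e-8:
        cols.append(v / np.linalg.norm(v))
U = np.column_stack(cols)
assert np.allclose(U.conj().T @ U, np.eye(n * m))

# Simulate: apply U to |0><0| (x) rho on (Y, X), then measure Y.
rho = rand_density(n)
e0 = np.zeros((m, m)); e0[0, 0] = 1
sigma = U @ np.kron(e0, rho) @ U.conj().T
sigY = np.einsum('aibi->ab', sigma.reshape(m, n, m, n))
assert np.allclose(np.diag(sigY).real, [np.trace(Pa @ rho).real for Pa in P])
```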
Non-destructive measurements
So far in the lesson, we've concerned ourselves with destructive measurements,
where the output consists of the classical measurement result alone and there is
no specification of a post-measurement quantum state of the measured system.
The probabilities for the different classical outcomes to appear are the same as
before — they can’t change as a result of us deciding to ignore or not ignore X. That
is, we obtain each a ∈ {0, . . . , m − 1} with probability Tr( Pa ρ).
Conditioned upon having obtained a particular measurement outcome a, the
resulting state of X is given by this expression.
√Pa ρ √Pa / Tr(Pa ρ),
which is consistent with the expression we’ve obtained for the state of X conditioned
on each possible measurement outcome.
There are alternative selections for U in the context of Naimark’s theorem that
produce the same measurement outcome probabilities but give entirely different
output states of X.
For instance, one option is to substitute (IY ⊗ V )U for U, where V is any unitary
operation on X. The application of V to X commutes with the measurement of Y so
the classical outcome probabilities do not change, but now the state of X conditioned
on the outcome a becomes

V √Pa ρ √Pa V† / Tr(Pa ρ).
More generally, we could replace U by the unitary matrix
( ∑_{a=0}^{m−1} |a⟩⟨a| ⊗ Va ) U
for any choice of unitary operations V0 , . . . , Vm−1 on X. Again, the classical outcome
probabilities are unchanged, but now the state of X conditioned on the outcome a
becomes

Va √Pa ρ √Pa Va† / Tr(Pa ρ).
An equivalent way to express this freedom is connected with Kraus represen-
tations. That is, we can describe an m-outcome non-destructive measurement of a
system having n classical states by a selection of n × n Kraus matrices A0 , . . . , Am−1
satisfying the typical condition for Kraus matrices.
∑_{a=0}^{m−1} Aa† Aa = IX    (11.3)
Assuming that the initial state of X is ρ, the classical measurement outcome is a with
probability

Tr( Aa ρ Aa† ) = Tr( Aa† Aa ρ ),

and, conditioned on the outcome a, the post-measurement state of X becomes

Aa ρ Aa† / Tr( Aa† Aa ρ ).
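The Kraus description of a non-destructive measurement, and the freedom coming from the unitaries Va, can be sketched as follows (numpy assumed; the random POVM and the choice Aa = Va √Pa follow the discussion above): the outcome probabilities depend only on Pa = Aa† Aa, while the post-measurement states depend on the Va.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 3

def psd_sqrt(P):
    vals, vecs = np.linalg.eigh(P)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

def rand_unitary(d):
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return Q

# A POVM {Pa} and Kraus matrices A_a = V_a sqrt(P_a) for arbitrary unitaries V_a.
G = [B.conj().T @ B for B in rng.normal(size=(m, n, n)) + 1j * rng.normal(size=(m, n, n))]
Sih = np.linalg.inv(psd_sqrt(sum(G)))
P = [Sih @ Ga @ Sih for Ga in G]
A = [rand_unitary(n) @ psd_sqrt(Pa) for Pa in P]

# The condition (11.3) holds because A_a† A_a = P_a.
assert np.allclose(sum(Aa.conj().T @ Aa for Aa in A), np.eye(n))

rho = rand_density(n)
# Outcome probabilities depend only on {Pa}, not on the unitaries V_a...
probs = [np.trace(Aa @ rho @ Aa.conj().T).real for Aa in A]
assert np.allclose(probs, [np.trace(Pa @ rho).real for Pa in P])
# ...while the post-measurement states A_a rho A_a† / p(a) do depend on them.
post = [Aa @ rho @ Aa.conj().T / p for Aa, p in zip(A, probs)]
```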
Generalizations
There are even more general ways to formulate non-destructive measurements than
the ways we’ve discussed. The notion of a quantum instrument (which won’t be
described here) represents one way to do this.
11.3. QUANTUM STATE DISCRIMINATION AND TOMOGRAPHY 341
An optimal measurement

S0 = {k ∈ {0, . . . , n − 1} : λk ≥ 0}
S1 = {k ∈ {0, . . . , n − 1} : λk < 0}

(It doesn't actually matter in which set, S0 or S1, we include the values of k for which
λk = 0. Here we're choosing arbitrarily to include these values in S0 .) We can then
choose a projective measurement as follows.

Π0 = ∑_{k∈S0} |ψk⟩⟨ψk|
Π1 = ∑_{k∈S1} |ψk⟩⟨ψk|
This is an optimal measurement in the situation at hand that minimizes the proba-
bility of an incorrect determination of the selected state.
Correctness probability
Now we will determine the probability of correctness for the measurement {Π0 , Π1 }.
As we begin, we won’t really need to be concerned with the specific choice
we’ve made for Π0 and Π1 , though it may be helpful to keep it in mind. For any
measurement { P0 , P1 } (not necessarily projective) we can write the correctness
probability as follows.
p Tr( P0 ρ0 ) + (1 − p) Tr( P1 ρ1 )
(1/4) Tr(P0 |ϕ0⟩⟨ϕ0|) + (1/4) Tr(P1 |ϕ1⟩⟨ϕ1|) + (1/4) Tr(P2 |ϕ2⟩⟨ϕ2|) + (1/4) Tr(P3 |ϕ3⟩⟨ϕ3|) = 1/2.
This is optimal by the Holevo–Yuen–Kennedy–Lax condition, as a calculation
reveals that
Qa = (1/4)( I − |ϕa⟩⟨ϕa| ) ≥ 0
for a = 0, 1, 2, 3.
Suppose that we are given N systems X1 , . . . , X N , each of which has been independently prepared in the state ρ. Thus, the state of the
compound system (X1 , . . . , X N ) is
ρ⊗ N = ρ ⊗ ρ ⊗ · · · ⊗ ρ (N times)
We’ll now consider quantum state tomography in the simple case where ρ is a
qubit density matrix. We assume that we’re given qubits X1 , . . . , X N that are each
independently in the state ρ, and our goal is to compute an approximation ρ̃ that is
close to ρ.
Our strategy will be to divide the N qubits X1 , . . . , X N into three roughly equal-
size collections, one for each of the three Pauli matrices σx , σy , and σz . Each qubit is
then measured independently as follows.
To establish this formula, we can use the following equation for the absolute values
squared of inner products of tetrahedral states, which can be checked through direct
calculations.
|⟨ϕa|ϕb⟩|² = { 1     if a = b
               1/3   if a ≠ b.
The four matrices

|ϕ0⟩⟨ϕ0| = ( 1   0
             0   0 )

|ϕ1⟩⟨ϕ1| = ( 1/3     √2/3
             √2/3    2/3  )

|ϕ2⟩⟨ϕ2| = ( 1/3                  (√2/3) e^{−2πi/3}
             (√2/3) e^{2πi/3}     2/3 )

|ϕ3⟩⟨ϕ3| = ( 1/3                  (√2/3) e^{2πi/3}
             (√2/3) e^{−2πi/3}    2/3 )
are linearly independent, so it suffices to prove that the formula is true when
ρ = |ϕb ⟩⟨ϕb | for b = 0, 1, 2, 3. In particular,
3 Tr(Pa |ϕb⟩⟨ϕb|) − 1/2 = (3/2) |⟨ϕa|ϕb⟩|² − 1/2 = { 1   if a = b
                                                     0   if a ≠ b
and therefore
∑_{a=0}^{3} ( 3 Tr(Pa |ϕb⟩⟨ϕb|) − Tr(|ϕb⟩⟨ϕb|)/2 ) |ϕa⟩⟨ϕa| = |ϕb⟩⟨ϕb|.
We arrive at an approximation of ρ.
ρ̃ = ∑_{a=0}^{3} ( 3na/N − 1/2 ) |ϕa⟩⟨ϕa|,

where na denotes the number of times the outcome a appears among the N measurement outcomes.
This approximation will always be a Hermitian matrix having trace equal to one,
but it may fail to be positive semidefinite. In this case, the approximation must be
rounded to a density matrix, similar to the strategy involving Pauli measurements.
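The tetrahedral states and the reconstruction formula above can be verified with a short numpy sketch. Here we check the overlap formula and confirm that, if the empirical frequencies na/N are replaced by their exact values Tr(Pa ρ), the formula recovers ρ exactly (the sampling step is omitted, so this checks only the identity, not the statistics).

```python
import numpy as np

w = np.exp(2j * np.pi / 3)
# The four tetrahedral states, matching the projectors listed above.
phis = [np.array([1, 0]),
        np.array([1, np.sqrt(2)]) / np.sqrt(3),
        np.array([1, np.sqrt(2) * w]) / np.sqrt(3),
        np.array([1, np.sqrt(2) * w.conj()]) / np.sqrt(3)]
proj = [np.outer(p, p.conj()) for p in phis]
P = [pr / 2 for pr in proj]                 # the tetrahedral POVM

assert np.allclose(sum(P), np.eye(2))
for a in range(4):
    for b in range(4):
        target = 1.0 if a == b else 1 / 3
        assert np.isclose(abs(phis[a].conj() @ phis[b]) ** 2, target)

# With exact frequencies n_a/N = Tr(Pa rho), the estimate
# rho~ = sum_a (3 n_a/N - 1/2)|phi_a><phi_a| recovers rho exactly.
rng = np.random.default_rng(6)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = A @ A.conj().T
rho /= np.trace(rho).real
freqs = [np.trace(Pa @ rho).real for Pa in P]
rho_t = sum((3 * f - 0.5) * pr for f, pr in zip(freqs, proj))
assert np.allclose(rho_t, rho)
```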
Lesson 12
Purifications and Fidelity
12.1 Purifications
Definition of purifications
Let us begin with a precise mathematical definition for purifications.
350 LESSON 12. PURIFICATIONS AND FIDELITY
Purifications

Let X and Y be systems, let ρ be a density matrix representing a state of X, and
let |ψ⟩ be a quantum state vector of the pair (X, Y ). The vector |ψ⟩ is said to be a
purification of ρ if

ρ = TrY (|ψ⟩⟨ψ|).
The pure state |ψ⟩⟨ψ|, expressed as a density matrix rather than a quantum state
vector, is also commonly referred to as a purification of ρ when the equation in the
definition is true, but we’ll generally use the term to refer to a quantum state vector.
The term purification is also used more generally when the ordering of the
systems is reversed, when the names of the systems and states are different (of
course), and when there are more than two systems. For instance, if |ψ⟩ is a
quantum state vector representing a pure state of a compound system (A, B, C), and
the equation
ρ = TrB |ψ⟩⟨ψ|
is true for a density matrix ρ representing a state of the system (A, C), then |ψ⟩ is
still referred to as a purification of ρ.
For the purposes of this lesson, however, we’ll focus on the specific form de-
scribed in the definition. Properties and facts concerning purifications, according to
this definition, can typically be generalized to more than two systems by re-ordering
and partitioning the systems into two compound systems, one playing the role of X
and the other playing the role of Y.
Existence of purifications
Suppose that X and Y are any two systems and ρ is a given state of X. We will
prove that there exists a quantum state vector |ψ⟩ of (X, Y ) that purifies ρ — which
is another way of saying that |ψ⟩ is a purification of ρ — provided that the system
Y is large enough. In particular, if Y has at least as many classical states as X, then a
purification of this form necessarily exists for every state ρ. Fewer classical states
of Y are required for some states ρ; in general, rank(ρ) classical states of Y are
necessary and sufficient for the existence of a quantum state vector of (X, Y ) that
purifies ρ.
12.1. PURIFICATIONS 351
TrY (|ψ⟩⟨ψ|) = ∑_{a,b=0}^{n−1} √(pa pb) |ϕa⟩⟨ϕb| Tr(|a⟩⟨b|) = ∑_{a=0}^{n−1} pa |ϕa⟩⟨ϕa| = ρ
More generally, for any orthonormal set of vectors {|γ0 ⟩, . . . , |γn−1 ⟩}, the quan-
tum state vector
|ψ⟩ = ∑_{a=0}^{n−1} √pa |ϕa⟩ ⊗ |γa⟩
is a purification of ρ.
where |ψθ⟩ = cos(θ)|0⟩ + sin(θ)|1⟩. As another example, consider the qubit state

ρ = (1/2)|0⟩⟨0| + (1/2)|+⟩⟨+|.
This is a convex combination of pure states but not a spectral decomposition because
|0⟩ and |+⟩ are not orthogonal and 1/2 is not an eigenvalue of ρ. Nevertheless, the
quantum state vector
1 1
√ |0⟩ ⊗ |0⟩ + √ |+⟩ ⊗ |1⟩
2 2
is a purification of ρ.
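This example can be confirmed directly with numpy: build the vector, trace out the system on the right, and compare with ρ.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ketp = (ket0 + ket1) / np.sqrt(2)

# |psi> = (1/sqrt(2)) |0>|0> + (1/sqrt(2)) |+>|1>
psi = (np.kron(ket0, ket0) + np.kron(ketp, ket1)) / np.sqrt(2)
assert np.isclose(np.linalg.norm(psi), 1.0)

rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ketp, ketp)

# Tracing out the system on the right recovers rho.
M = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
TrY = np.einsum('aibi->ab', M)
assert np.allclose(TrY, rho)
```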
Schmidt decompositions
Next, we will discuss Schmidt decompositions, which are expressions of quantum
state vectors of pairs of systems that take a certain form. Schmidt decompositions
are closely connected with purifications, and they’re very useful in their own right.
Indeed, when reasoning about a given quantum state vector |ψ⟩ of a pair of systems,
the first step is often to identify or consider a Schmidt decomposition of this state.
Schmidt decompositions
Let |ψ⟩ be a given quantum state vector of a pair of systems (X, Y ). A Schmidt
decomposition of |ψ⟩ is an expression of the form
|ψ⟩ = ∑_{a=0}^{r−1} √pa |xa⟩ ⊗ |ya⟩,
where p0 , . . . , pr−1 are positive real numbers summing to 1 and both of the sets
{| x0 ⟩, . . . , | xr−1 ⟩} and {|y0 ⟩, . . . , |yr−1 ⟩} are orthonormal.
The values

√p0 , . . . , √pr−1

in a Schmidt decomposition of |ψ⟩ are known as its Schmidt coefficients, which are
uniquely determined (up to their ordering) — they're the only positive real numbers
that can appear in such an expression of |ψ⟩. The sets {| x0 ⟩, . . . , | xr−1 ⟩} and
{|y0 ⟩, . . . , |yr−1 ⟩}, on the other hand, are not uniquely determined, and the freedom one has in
choosing these sets of vectors will be clarified in the explanation that follows.
We’ll now verify that a given quantum state vector |ψ⟩ does indeed have a
Schmidt decomposition, and in the process, we’ll learn how to find one. Consider
first an arbitrary (not necessarily orthogonal) basis {| x0 ⟩, . . . , | xn−1 ⟩} of the vector
space corresponding to the system X. Because this is a basis, there will always exist
a uniquely determined selection of vectors |z0 ⟩, . . . , |zn−1 ⟩ for which the following
equation is true.
|ψ⟩ = ∑_{a=0}^{n−1} |xa⟩ ⊗ |za⟩    (12.1)
For example, suppose {| x0 ⟩, . . . , | xn−1 ⟩} is the standard basis associated with X.
Assuming the classical state set of X is {0, . . . , n − 1}, this means that | x a ⟩ = | a⟩ for
each a ∈ {0, . . . , n − 1}, and we find that
|ψ⟩ = ∑_{a=0}^{n−1} |a⟩ ⊗ |za⟩
when
|z a ⟩ = (⟨ a| ⊗ IY )|ψ⟩
for each a ∈ {0, . . . , n − 1}. We frequently consider expressions like this when
contemplating a standard basis measurement of X.
It’s important to note that the formula
|z a ⟩ = (⟨ a| ⊗ IY )|ψ⟩
for the vectors |z0 ⟩, . . . , |zn−1 ⟩ in this example only works because {|0⟩, . . . , |n − 1⟩}
is an orthonormal basis. In general, if {| x0 ⟩, . . . , | xn−1 ⟩} is a basis that is not necessar-
ily orthonormal, then the vectors |z0 ⟩, . . . , |zn−1 ⟩ are still uniquely determined by
the equation (12.1), but a different formula is needed. One way to find them is first
to identify vectors |w0 ⟩, . . . , |wn−1 ⟩ so that the equation

⟨wa | xb ⟩ = { 1   if a = b
               0   if a ≠ b

is true for all a, b ∈ {0, . . . , n − 1}, and then to set

|za ⟩ = (⟨wa | ⊗ IY )|ψ⟩.
To verify that, when {| x0 ⟩, . . . , | xn−1 ⟩} is chosen to be an orthonormal basis of
eigenvectors of ρ = TrY (|ψ⟩⟨ψ|), the selection of vectors |z0 ⟩, . . . , |zn−1 ⟩
for which the equation (12.1) is true is necessarily orthogonal, we can begin by
computing the partial trace.
TrY (|ψ⟩⟨ψ|) = ∑_{a,b=0}^{n−1} |xa⟩⟨xb| Tr(|za⟩⟨zb|) = ∑_{a,b=0}^{n−1} ⟨zb|za⟩ |xa⟩⟨xb|.
This expression must agree with the spectral decomposition of ρ. We conclude from
the fact that {| x0 ⟩, . . . , | xn−1 ⟩} is a basis that the set of matrices

{ |xa⟩⟨xb| : a, b ∈ {0, . . . , n − 1} }

is linearly independent.
We can then write |za⟩ = √pa |ya⟩ for a unit vector |ya⟩ for each of the remaining terms. A convenient way to do this
begins with the observation that we’re free to number the eigenvalue/eigenvector
pairs in a spectral decomposition of the reduced state ρ however we wish — so we
may assume that the eigenvalues are sorted in decreasing order:
p 0 ≥ p 1 ≥ · · · ≥ p n −1 .
|ya⟩ = |za⟩ / ∥ |za⟩ ∥ = |za⟩ / √pa ,
so that |za⟩ = √pa |ya⟩ for each a ∈ {0, . . . , r − 1}. The vectors {|z0 ⟩, . . . , |zr−1 ⟩} are
orthogonal and nonzero, so it follows that {|y0 ⟩, . . . , |yr−1 ⟩} is an orthonormal set,
and so we have obtained a Schmidt decomposition of |ψ⟩.
|ψ⟩ = ∑_{a=0}^{r−1} √pa |xa⟩ ⊗ |ya⟩
Concerning the choice of the vectors {| x0 ⟩, . . . , | xr−1 ⟩} and {|y0 ⟩, . . . , |yr−1 ⟩},
we can select {| x0 ⟩, . . . , | xr−1 ⟩} to be any orthonormal set of eigenvectors corre-
sponding to the nonzero eigenvalues of the reduced state TrY (|ψ⟩⟨ψ|) (as we have
done above), in which case the vectors {|y0 ⟩, . . . , |yr−1 ⟩} are uniquely determined.
The situation is symmetric between the two systems, so we can alternatively choose
{|y0 ⟩, . . . , |yr−1 ⟩} to be any orthonormal set of eigenvectors corresponding to the
nonzero eigenvalues of the reduced state TrX (|ψ⟩⟨ψ|), in which case the vectors
{| x0 ⟩, . . . , | xr−1 ⟩} will be uniquely determined.
Notice, however, that once one of the sets is selected, as a set of eigenvectors of
the corresponding reduced state as just described, the other is determined — so
they cannot be chosen independently.
Although it won’t come up again in this course, it is noteworthy that the nonzero
eigenvalues p0 , . . . , pr−1 of the reduced state TrX (|ψ⟩⟨ψ|) must always agree with
the nonzero eigenvalues of the reduced state TrY (|ψ⟩⟨ψ|) for any pure state |ψ⟩
of a pair of systems (X, Y ). Intuitively speaking, the reduced states of X and Y
have exactly the same amount of randomness in them when the pair (X, Y ) is in
a pure state. This fact is revealed by the Schmidt decomposition: in both cases
the eigenvalues of the reduced states must agree with the squares of the Schmidt
coefficients of the pure state.
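Numerically, a Schmidt decomposition can be read off from a singular value decomposition of the coefficient matrix of |ψ⟩. The sketch below (numpy assumed; encoding the state as an n × m matrix M with |ψ⟩ = ∑ M[x, y] |x⟩|y⟩ is our own convention) checks that the singular values are the Schmidt coefficients and that both reduced states have the squared Schmidt coefficients as their nonzero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 4

# A random state vector of (X, Y), stored as an n x m coefficient matrix M,
# so that |psi> = sum_{x,y} M[x, y] |x>|y>.
M = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
M /= np.linalg.norm(M)

X, s, Yh = np.linalg.svd(M)
# Schmidt coefficients = singular values; their squares sum to 1.
assert np.isclose(np.sum(s ** 2), 1.0)

# |psi> = sum_a s_a |x_a> (x) |y_a>, with |x_a> = X[:, a] and |y_a> = Yh[a, :].
r = int(np.sum(s > 1e-12))
psi = M.reshape(-1)
psi_schmidt = sum(s[a] * np.kron(X[:, a], Yh[a]) for a in range(r))
assert np.allclose(psi_schmidt, psi)

# Both reduced states have the squared Schmidt coefficients as eigenvalues.
rhoX = M @ M.conj().T
rhoY = M.T @ M.conj()
assert np.allclose(np.sort(np.linalg.eigvalsh(rhoX))[::-1][:r], s[:r] ** 2)
assert np.allclose(np.sort(np.linalg.eigvalsh(rhoY))[::-1][:r], s[:r] ** 2)
```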
Suppose that X and Y are systems, and |ψ⟩ and |ϕ⟩ are quantum state vectors
of (X, Y ) that both purify the same state of X. In symbols,

TrY (|ψ⟩⟨ψ|) = ρ = TrY (|ϕ⟩⟨ϕ|)

for some density matrix ρ representing a state of X. There must then exist a
unitary operation U on Y alone that transforms the first purification into the
second:
(IX ⊗ U )|ψ⟩ = |ϕ⟩.
We’ll discuss a few implications of this theorem as the lesson continues, but first
let’s see how it follows from our previous discussion of Schmidt decompositions.
Our assumption is that |ψ⟩ and |ϕ⟩ are quantum state vectors of a pair of systems
(X, Y ) that satisfy the equation

TrY (|ψ⟩⟨ψ|) = ρ = TrY (|ϕ⟩⟨ϕ|)

for some density matrix ρ. Choosing {| x0 ⟩, . . . , | xr−1 ⟩} to be an orthonormal set of
eigenvectors of ρ corresponding to its nonzero eigenvalues p0 , . . . , pr−1 , we obtain
Schmidt decompositions of the two vectors taking the form

|ψ⟩ = ∑_{a=0}^{r−1} √pa |xa⟩ ⊗ |ua⟩  and  |ϕ⟩ = ∑_{a=0}^{r−1} √pa |xa⟩ ⊗ |va⟩.

In these expressions r is the rank of ρ and {|u0 ⟩, . . . , |ur−1 ⟩} and {|v0 ⟩, . . . , |vr−1 ⟩}
are orthonormal sets of vectors in the space corresponding to Y.
For any two orthonormal sets in the same space that have the same number of el-
ements, there’s always a unitary matrix that transforms the first set into the second,
so we can choose a unitary matrix U so that U |u a ⟩ = |v a ⟩ for a = 0, . . . , r − 1. In
particular, to find such a matrix U we can first use the Gram–Schmidt orthogonaliza-
tion process to extend our orthonormal sets to orthonormal bases {|u0 ⟩, . . . , |um−1 ⟩}
and {|v0 ⟩, . . . , |vm−1 ⟩}, where m is the dimension of the space corresponding to Y,
and then take
U = ∑_{a=0}^{m−1} |va⟩⟨ua|.
We now find that
(IX ⊗ U )|ψ⟩ = ∑_{a=0}^{r−1} √pa |xa⟩ ⊗ U|ua⟩ = ∑_{a=0}^{r−1} √pa |xa⟩ ⊗ |va⟩ = |ϕ⟩,

as required.
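The construction in this proof can be sketched with numpy (the matrix encoding of purifications and the random choices are ours for illustration): two purifications of the same ρ are written as coefficient matrices M with M M† = ρ, the orthonormal sets {|ua⟩} and {|va⟩} are taken as rows of unitaries, and U = ∑ |va⟩⟨ua| is checked to transform one purification into the other; applying 1 ⊗ U corresponds to M ↦ M Uᵀ in this encoding.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 3

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

def rand_unitary(d):
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return Q

rho = rand_density(n)
p, x = np.linalg.eigh(rho)

# Two purifications of rho as n x n coefficient matrices M with M M† = rho.
# The rows of R1 and R2 play the roles of the sets {|u_a>} and {|v_a>}.
R1, R2 = rand_unitary(n), rand_unitary(n)
M_psi = x @ np.diag(np.sqrt(np.clip(p, 0, None))) @ R1
M_phi = x @ np.diag(np.sqrt(np.clip(p, 0, None))) @ R2
assert np.allclose(M_psi @ M_psi.conj().T, rho)
assert np.allclose(M_phi @ M_phi.conj().T, rho)

# U = sum_a |v_a><u_a| acts on Y alone; (1 (x) U)|psi> has matrix M_psi @ U.T.
U = R2.T @ R1.conj()
assert np.allclose(U @ U.conj().T, np.eye(n))
assert np.allclose(M_psi @ U.T, M_phi)
```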
Superdense coding
In the superdense coding protocol, Alice and Bob share an e-bit, meaning that Alice
holds a qubit A, Bob holds a qubit B, and together the pair (A, B) is in the |ϕ+ ⟩ Bell
state. The protocol describes how Alice can transform this shared state into any one
of the four Bell states, |ϕ+ ⟩, |ϕ− ⟩, |ψ+ ⟩, and |ψ− ⟩, by applying a unitary operation
to her qubit A. Once she has done this, she sends A to Bob, and then Bob performs a
measurement on the pair (A, B) to see which Bell state he holds.
For all four Bell states, the reduced state of Bob’s qubit B is completely mixed.
TrA (|ϕ+⟩⟨ϕ+|) = TrA (|ϕ−⟩⟨ϕ−|) = TrA (|ψ+⟩⟨ψ+|) = TrA (|ψ−⟩⟨ψ−|) = I/2
By the unitary equivalence of purifications, we immediately conclude that for
each Bell state there must exist a unitary operation on Alice’s qubit A alone that
transforms |ϕ+ ⟩ into the chosen Bell state. Although this does not reveal the precise
details of the protocol, the unitary equivalence of purifications does immediately
imply that superdense coding is possible.
We can also conclude that generalizations of superdense coding to larger systems
are always possible, provided that we replace the Bell states with any orthonormal
basis of purifications of the completely mixed state.
Cryptographic implications
However, because Alice and Bob have only used unitary operations, the state of
all of the systems involved in the protocol together after the commit phase must be
in a pure state. In particular, suppose that |ψ0 ⟩ is the pure state of all of the systems
involved in the protocol when Alice commits to 0, and |ψ1 ⟩ is the pure state of all of
the systems involved in the protocol when Alice commits to 1. If we write A and B
to denote Alice and Bob’s (possibly compound) systems, then
ρ0 = TrA (|ψ0 ⟩⟨ψ0 |)
ρ1 = TrA (|ψ1 ⟩⟨ψ1 |).
Given the requirement that ρ0 = ρ1 for a perfectly concealing protocol, we
find that |ψ0 ⟩ and |ψ1 ⟩ are purifications of the same state — and so, by the unitary
equivalence of purifications, there must exist a unitary operation U on A alone such
that
(U ⊗ IB )|ψ0 ⟩ = |ψ1 ⟩.
Alice is therefore free to change her commitment from 0 to 1 by applying U to A,
or from 1 to 0 by applying U † , and so the hypothetical protocol being considered
completely fails to be binding.
Hughston–Jozsa–Wootters theorem
The last implication of the unitary equivalence of purifications that we’ll discuss in
this lesson is a theorem known as the Hughston–Jozsa–Wootters theorem.
Hughston–Jozsa–Wootters theorem
Let X and Y be systems and let |ϕ⟩ be a quantum state vector of the pair (X, Y ).
Also let N be an arbitrary positive integer, let ( p0 , . . . , p N −1 ) be a probability
vector, and let |ψ0 ⟩, . . . , |ψN −1 ⟩ be quantum state vectors representing states of
X such that
TrY (|ϕ⟩⟨ϕ|) = ∑_{a=0}^{N−1} pa |ψa⟩⟨ψa|.

Then there exists a general measurement of Y such that, when it is performed on Y
with (X, Y ) in the state |ϕ⟩, each outcome a ∈ {0, . . . , N − 1} appears with
probability pa , and, conditioned on the outcome a, the state of X becomes |ψa⟩.
[Figure: a circuit diagram realizing this measurement — a workspace system Z,
initialized to the |0⟩ state, and the system Y are input to a unitary operation U,
after which Z is measured with a standard basis measurement, yielding each outcome a
with probability pa and leaving X in the state |ψa⟩.]
12.2 Fidelity
In this part of the lesson, we’ll discuss the fidelity between quantum states, which is
a measure of their similarity — or how much they “overlap.”
Given two quantum state vectors, the fidelity between the pure states associated
with these quantum state vectors equals the absolute value of the inner product
between the quantum state vectors. This provides a basic way to measure their
similarity: the result is a value between 0 and 1, with larger values indicating greater
similarity. In particular, the value is zero for orthogonal states (by definition), while
the value is 1 for states equivalent up to a global phase.
12.2. FIDELITY 363
Intuitively speaking, the fidelity can be seen as an extension of this basic measure
of similarity, from quantum state vectors to density matrices.
Definition of fidelity
It’s fitting to begin with a definition of fidelity. At first glance, the definition that
follows might look unusual or mysterious, and perhaps not easy to work with. The
function it defines, however, turns out to have many interesting properties and
multiple alternative formulations, making it much nicer to work with than it may
initially appear.
Fidelity
Let ρ and σ be density matrices representing quantum states of the same system.
The fidelity between ρ and σ is defined as
F(ρ, σ) = Tr √( √ρ σ √ρ ).
Remark. Although this is a common definition, it is also common that the fidelity
is defined as the square of the quantity defined here, which is then referred to as
the root-fidelity. Neither definition is right or wrong — it’s essentially a matter of
preference. Nevertheless, one must always be careful to understand or clarify which
definition is being used.
To make sense of the formula in the definition, notice first that √ρ σ √ρ is a
positive semidefinite matrix:

√ρ σ √ρ = M† M

for M = √σ √ρ. Like all positive semidefinite matrices, this positive semidefinite
matrix has a unique positive semidefinite square root, the trace of which is the
fidelity.
For every square matrix M, the eigenvalues of the two positive semidefinite
matrices M† M and M M† are always the same, and hence the same is true for the
square roots of these matrices. Choosing M = √σ √ρ and using the fact that the
trace of a square matrix is the sum of its eigenvalues, we find that

F(ρ, σ) = Tr √(√ρ σ √ρ) = Tr √(M† M) = Tr √(M M†) = Tr √(√σ ρ √σ) = F(σ, ρ).
So, although it is not immediate from the definition, the fidelity is symmetric in its
two arguments.
Here we see the trace norm, which we encountered in the previous lesson in the
context of state discrimination. The trace norm of a (not necessarily square) matrix
M can be defined as

∥M∥1 = Tr √(M† M),
and by applying this definition to the matrix √σ √ρ we obtain the formula in the
definition.
An alternative way to express the trace norm of a (square) matrix M is through
this formula.
∥M∥1 = max_{U unitary} |Tr(MU)|.
Here the maximum is over all unitary matrices U having the same number of rows
and columns as M. Applying this formula in the situation at hand reveals another
expression of the fidelity.
F(ρ, σ) = max_{U unitary} |Tr( √σ √ρ U )|
One last point on the definition of fidelity is that every pure state is (as a density
matrix) equal to its own square root, which allows the formula for the fidelity to be
simplified considerably when one or both of the states is pure. In particular, if one
of the two states is pure we have the following formula.
F( |ϕ⟩⟨ϕ|, σ ) = √( ⟨ϕ|σ|ϕ⟩ )
If both states are pure, the formula simplifies to the absolute value of the inner
product of the corresponding quantum state vectors, as was mentioned at the start
of the section.
F( |ϕ⟩⟨ϕ|, |ψ⟩⟨ψ| ) = |⟨ϕ|ψ⟩|
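The definition and the special cases just discussed can be sketched in a few lines of numpy (the helper `psd_sqrt` is our own; the square root is computed spectrally): the function below checks symmetry, F(ρ, ρ) = 1, and both pure-state formulas on random examples.

```python
import numpy as np

rng = np.random.default_rng(8)

def psd_sqrt(P):
    vals, vecs = np.linalg.eigh(P)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def fidelity(rho, sigma):
    # F(rho, sigma) = Tr sqrt( sqrt(rho) sigma sqrt(rho) )
    s = psd_sqrt(rho)
    return np.trace(psd_sqrt(s @ sigma @ s)).real

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

rho, sigma = rand_density(3), rand_density(3)
assert np.isclose(fidelity(rho, sigma), fidelity(sigma, rho))  # symmetry
assert np.isclose(fidelity(rho, rho), 1.0)

# Pure-state formulas.
phi = rng.normal(size=3) + 1j * rng.normal(size=3); phi /= np.linalg.norm(phi)
psi = rng.normal(size=3) + 1j * rng.normal(size=3); psi /= np.linalg.norm(psi)
Pphi = np.outer(phi, phi.conj())
assert np.isclose(fidelity(Pphi, sigma), np.sqrt((phi.conj() @ sigma @ phi).real))
assert np.isclose(fidelity(Pphi, np.outer(psi, psi.conj())), abs(phi.conj() @ psi))
```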
1. For any two density matrices ρ and σ having the same size, the fidelity F(ρ, σ )
lies between zero and one: 0 ≤ F(ρ, σ) ≤ 1. It is the case that F(ρ, σ ) = 0 if
and only if ρ and σ have orthogonal images (so they can be discriminated
without error), and F(ρ, σ ) = 1 if and only if ρ = σ.
2. The fidelity is multiplicative, meaning that the fidelity between two product
states is equal to the product of the individual fidelities:

F( ρ0 ⊗ ρ1 , σ0 ⊗ σ1 ) = F( ρ0 , σ0 ) F( ρ1 , σ1 ).
3. The fidelity between states is nondecreasing under the action of any channel.
That is, if ρ and σ are density matrices and Φ is a channel that can take these
two states as input, then it is necessarily the case that

F( Φ(ρ), Φ(σ) ) ≥ F( ρ, σ ).
4. The Fuchs–van de Graaf inequalities establish a close (though not exact)
relationship between fidelity and trace distance: for any two states ρ and σ we have

1 − (1/2)∥ρ − σ∥1 ≤ F(ρ, σ) ≤ √( 1 − (1/4)∥ρ − σ∥1² ).
The final property can be expressed graphically as shown in Figure 12.2. Specifically,
for any choice of states ρ and σ of the same system, the horizontal line that crosses
the y-axis at F(ρ, σ) and the vertical line that crosses the x-axis at 12 ∥ρ − σ∥1 (which
is sometimes called the trace distance between ρ and σ) must intersect within the
gray region bordered below by the line y = 1 − x and above by the unit circle.
The most interesting region of this figure from a practical viewpoint is the upper
left-hand corner of the gray region: if the fidelity between two states is close to one,
then their trace distance is close to zero, and vice versa.
Figure 12.2: The horizontal line corresponding to the fidelity and the vertical line
corresponding to the trace distance between two states must intersect inside the
shaded region.
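The Fuchs–van de Graaf inequalities can be tested numerically. The sketch below (numpy assumed; the helpers repeat the spectral square-root approach used earlier) draws random pairs of qubit states and checks both bounds.

```python
import numpy as np

rng = np.random.default_rng(2)

def psd_sqrt(P):
    vals, vecs = np.linalg.eigh(P)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def fidelity(rho, sigma):
    s = psd_sqrt(rho)
    return np.trace(psd_sqrt(s @ sigma @ s)).real

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

for _ in range(200):
    rho, sigma = rand_density(2), rand_density(2)
    F = fidelity(rho, sigma)
    # Trace distance: half the sum of absolute eigenvalues of rho - sigma.
    D = 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))
    assert 1 - D <= F + 1e-9
    assert F <= np.sqrt(max(1 - D ** 2, 0.0)) + 1e-9
```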
The gentle measurement lemma is a useful lemma that comes up from time to time, and it's also
noteworthy because the seemingly clunky definition of the fidelity actually makes
the lemma very easy to prove.
The set-up is as follows. Let X be a system in a state ρ and let { P0 , . . . , Pm−1 } be
a collection of positive semidefinite matrices representing a general measurement
of X. Suppose further that if this measurement is performed on the system X while
it’s in the state ρ, one of the outcomes is highly likely. To be concrete, let’s assume
that the likely measurement outcome is 0, and specifically let’s assume that
Tr(P0 ρ) > 1 − ε

for a small positive real number ε.
We’ll need a basic fact about measurements to prove this. The measurement
matrices P0 , . . . , Pm−1 are positive semidefinite and sum to the identity, which allows
us to conclude that all of the eigenvalues of P0 are real numbers between 0 and 1.
This follows from the fact that, for any unit vector |ψ⟩, the value ⟨ψ| Pa |ψ⟩ is a
nonnegative real number for each a ∈ {0, . . . , m − 1} (because each Pa is positive
semidefinite), together with these numbers summing to one.
∑_{a=0}^{m−1} ⟨ψ|Pa|ψ⟩ = ⟨ψ| ( ∑_{a=0}^{m−1} Pa ) |ψ⟩ = ⟨ψ|I|ψ⟩ = 1.
Hence ⟨ψ| P0 |ψ⟩ is always a real number between 0 and 1, and this implies that
every eigenvalue of P0 is a real number between 0 and 1 because we can choose |ψ⟩
specifically to be a unit eigenvector corresponding to whichever eigenvalue is of
interest.
From this observation we can conclude the following inequality for every density
matrix ρ.
Tr( √P0 ρ ) ≥ Tr( P0 ρ )
In greater detail, starting from a spectral decomposition
P0 = ∑_{k=0}^{n−1} λk |ψk⟩⟨ψk|
we conclude that
Tr( √P0 ρ ) = ∑_{k=0}^{n−1} √λk ⟨ψk|ρ|ψk⟩ ≥ ∑_{k=0}^{n−1} λk ⟨ψk|ρ|ψk⟩ = Tr( P0 ρ ).

The inequality follows from the fact that ⟨ψk|ρ|ψk⟩ is a nonnegative real number and √λk ≥ λk for each
k = 0, . . . , n − 1. (Squaring numbers between 0 and 1 can never make them larger.)
Now we can prove the gentle measurement lemma by evaluating the fidelity
and then using our inequality. First, let’s simplify the expression of interest.
F( ρ, √P0 ρ √P0 / Tr(P0 ρ) ) = Tr √( √ρ ( √P0 ρ √P0 ) √ρ / Tr(P0 ρ) )

= Tr √( ( √ρ √P0 √ρ )² / Tr(P0 ρ) )

= Tr( √ρ √P0 √ρ ) / √Tr(P0 ρ)

= Tr( √P0 ρ ) / √Tr(P0 ρ)
Notice that these are all equalities — we’ve not used our inequality (or any other
inequality) at this point, so we have an exact expression for the fidelity. We can now
use our inequality to conclude
F( ρ, √P0 ρ √P0 / Tr(P0 ρ) ) = Tr( √P0 ρ ) / √Tr(P0 ρ) ≥ Tr( P0 ρ ) / √Tr(P0 ρ) = √Tr(P0 ρ) > √(1 − ε).
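Both the exact expression for the fidelity and the resulting bound can be checked on a random example (numpy assumed; the construction of a "gentle" measurement operator P0 with eigenvalues in [0.9, 1] is our own illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3

def psd_sqrt(P):
    vals, vecs = np.linalg.eigh(P)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def fidelity(rho, sigma):
    s = psd_sqrt(rho)
    return np.trace(psd_sqrt(s @ sigma @ s)).real

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = A @ A.conj().T
    return R / np.trace(R).real

rho = rand_density(n)
# A measurement operator with eigenvalues in [0.9, 1], so outcome 0 is likely.
W = rand_density(n)
W /= np.linalg.eigvalsh(W).max()
P0 = np.eye(n) - 0.1 * W
p0 = np.trace(P0 @ rho).real

sq = psd_sqrt(P0)
post = sq @ rho @ sq / p0              # non-destructively measured state
F = fidelity(rho, post)

# Exact expression and bound derived in the text.
assert np.isclose(F, np.trace(sq @ rho).real / np.sqrt(p0))
assert F >= np.sqrt(p0) - 1e-9
```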
Uhlmann’s theorem
To conclude the lesson, we’ll take a look at Uhlmann’s theorem, which is a fundamen-
tal fact about the fidelity that connects it with the notion of a purification. What the
theorem says, in simple terms, is that the fidelity between any two quantum states
is equal to the maximum inner product (in absolute value) between two purifications
of those states.
12.2. FIDELITY 369
Uhlmann’s theorem. Let ρ and σ be density matrices of a system X, and let Y be a
system of the same size as X. Then

F(ρ, σ) = max |⟨ϕ|ψ⟩|,

where the maximum is over all quantum state vectors |ϕ⟩ and |ψ⟩ of (X, Y ) that
purify ρ and σ, respectively.
We can prove this theorem using the unitary equivalence of purifications — but it
isn’t completely straightforward and we’ll make use of a trick along the way.
To begin, consider spectral decompositions of the two density matrices ρ and σ.
ρ = ∑_{a=0}^{n−1} pa |ua⟩⟨ua|

σ = ∑_{b=0}^{n−1} qb |vb⟩⟨vb|
The two collections {|u0 ⟩, . . . , |un−1 ⟩} and {|v0 ⟩, . . . , |vn−1 ⟩} are orthonormal bases
of eigenvectors of ρ and σ, respectively, and p0 , . . . , pn−1 and q0 , . . . , qn−1 are the
corresponding eigenvalues.
We’ll also define |u̅0⟩, . . . , |u̅n−1⟩ and |v̅0⟩, . . . , |v̅n−1⟩ to be the vectors obtained
by taking the complex conjugate of each entry of |u0⟩, . . . , |un−1⟩ and |v0⟩, . . . , |vn−1⟩.
That is, for an arbitrary vector |w⟩ we can define |w̅⟩ according to the following
equation for each c ∈ {0, . . . , n − 1}.

⟨c|w̅⟩ = ⟨w|c⟩
Notice that for any two vectors |u⟩ and |v⟩ we have ⟨u̅|v̅⟩ = ⟨v|u⟩. More generally,
for any square matrix M we have the following formula.

⟨u̅|M|v̅⟩ = ⟨v|Mᵀ|u⟩
It follows that |u̅⟩ and |v̅⟩ are orthogonal if and only if |u⟩ and |v⟩ are orthogonal,
and therefore {|u̅0⟩, . . . , |u̅n−1⟩} and {|v̅0⟩, . . . , |v̅n−1⟩} are both orthonormal bases.
Now consider the following two vectors |ϕ⟩ and |ψ⟩, which are purifications of
ρ and σ, respectively.
|ϕ⟩ = ∑_{a=0}^{n−1} √pa |ua⟩ ⊗ |u̅a⟩

|ψ⟩ = ∑_{b=0}^{n−1} √qb |vb⟩ ⊗ |v̅b⟩
This is the trick referred to previously. Nothing indicates explicitly at this point that
it’s a good idea to make these particular choices for purifications of ρ and σ, but
they are valid purifications, and the complex conjugations will allow the algebra to
work out the way we need.
By the unitary equivalence of purifications, we know that every purification of
ρ for the pair of systems (X, Y ) must take the form (IX ⊗ U )|ϕ⟩ for some unitary
matrix U, and likewise every purification of σ for the pair (X, Y ) must take the form
(IX ⊗ V )|ψ⟩ for some unitary matrix V. The inner product of two such purifications
can be simplified as follows.

⟨ϕ|(IX ⊗ U)†(IX ⊗ V)|ψ⟩ = ⟨ϕ|(IX ⊗ U†V)|ψ⟩
  = ∑_{a=0}^{n−1} ∑_{b=0}^{n−1} √(pa qb) ⟨ua|vb⟩ ⟨u̅a|U†V|v̅b⟩
  = ∑_{a=0}^{n−1} ∑_{b=0}^{n−1} √(pa qb) ⟨ua|vb⟩ ⟨vb|(U†V)ᵀ|ua⟩
  = Tr( √ρ √σ (U†V)ᵀ )
As U and V range over all possible unitary matrices, the matrix (U † V ) T also
ranges over all possible unitary matrices. Thus, maximizing the absolute value of
the inner product of two purifications of ρ and σ yields the following equation.
max_{U,V unitary} |Tr( √ρ √σ (U†V)ᵀ )| = max_{W unitary} |Tr( √ρ √σ W )| = ‖√ρ √σ‖₁ = F(ρ, σ).
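The equality of the fidelity and the trace norm ‖√ρ √σ‖₁ used in the last step can be checked numerically. The following NumPy sketch is illustrative, with an inline helper sqrtm_psd that is not from the course; it compares the two expressions on random density matrices.

```python
import numpy as np

def sqrtm_psd(M):
    """Square root of a positive semidefinite Hermitian matrix (hypothetical helper)."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def random_density(n, rng):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(11)
rho, sigma = random_density(3, rng), random_density(3, rng)

# Definition: F(rho, sigma) = Tr sqrt( sqrt(rho) sigma sqrt(rho) ).
s = sqrtm_psd(rho)
F_def = np.real(np.trace(sqrtm_psd(s @ sigma @ s)))

# Trace-norm form appearing in the proof: F = || sqrt(rho) sqrt(sigma) ||_1,
# the sum of the singular values of sqrt(rho) sqrt(sigma).
M = sqrtm_psd(rho) @ sqrtm_psd(sigma)
F_norm = np.linalg.svd(M, compute_uv=False).sum()

assert abs(F_def - F_norm) < 1e-9
# Any particular unitary W (here the identity) gives a lower bound on the maximum.
assert abs(np.trace(M)) <= F_norm + 1e-9
```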
Unit IV

Foundations of Quantum Error Correction
This final unit of the course is on quantum error correction. It begins with an
explanation of what quantum error correcting codes are and how they work. It
then moves on to the stabilizer formalism for describing quantum error correcting
codes, CSS codes, and several key examples of quantum error correcting codes.
The unit concludes with fault-tolerant quantum computation, in which quantum
computations are performed on error-corrected quantum information.
374 LESSON 13. CORRECTING QUANTUM ERRORS
we’ll also discuss a foundational concept in quantum error correction known as the
discretization of errors.
0 ↦ 000
1 ↦ 111
If nothing goes wrong, we can obviously distinguish the two possibilities for the
original bit from their encodings. The point is that if there was an error and one of
the three bits flipped, meaning that a 0 changes into a 1 or a 1 changes to a 0, then
we can still figure out what the original bit was by determining which of the two
binary values appears twice. Equivalently, we can decode by computing the majority
value (i.e., the binary value that appears most frequently).
abc ↦ majority(a, b, c)
Of course, if 2 or 3 bits of the encoding flip, then the decoding won’t work
properly and the wrong bit will be recovered, but if at most 1 of the 3 bits flips,
the decoding will be correct. This is a typical property of error correcting codes
in general: they may allow for the correction of errors, but only if there aren’t too
many of them.
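A short Python sketch (illustrative only, not part of the course) confirms both the correction property and the decoding-error probability by brute-force enumeration.

```python
from itertools import product

def encode(bit):
    # 3-bit repetition code: 0 -> 000, 1 -> 111.
    return (bit, bit, bit)

def decode(a, b, c):
    # Majority vote recovers the original bit.
    return 1 if a + b + c >= 2 else 0

# Every pattern of at most one bit-flip is corrected; two or three
# flips fool the decoder.
for bit in (0, 1):
    codeword = encode(bit)
    for flips in product((0, 1), repeat=3):
        received = tuple(x ^ f for x, f in zip(codeword, flips))
        ok = decode(*received) == bit
        assert ok == (sum(flips) <= 1)

# Decoding-error probability for the binary symmetric channel with
# flip probability p: choose(3,2) p^2 (1-p) + p^3 = 3p^2 - 2p^3.
p = 0.1
exact = 3 * p**2 * (1 - p) + p**3
assert abs(exact - (3 * p**2 - 2 * p**3)) < 1e-12
```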
13.1. REPETITION CODES 375
Encoding qubits
The 3-bit repetition code is a classical error correcting code, but we can consider
what happens if we try to use it to protect qubits against errors. As we’ll see, it’s not
a very impressive quantum error correcting code, because it actually makes some
errors more likely. It is, however, the first step toward the Shor code, and will serve
us well from a pedagogical viewpoint.
Figure 13.1: The probability that two or three bits flip during transmission for the
binary symmetric channel, leading to a decoding error for the 3-bit repetition code,
is drawn in blue. (The plot shows the curve 3p² − 2p³ together with the line p,
against the error probability p.)
To be clear, when we refer to the 3-bit repetition code being used for qubits, we
have in mind an encoding of a qubit where standard basis states are repeated three
times, so that a single-qubit state vector is encoded as follows.

α|0⟩ + β|1⟩ ↦ α|000⟩ + β|111⟩
This encoding is easily implemented by the quantum circuit in Figure 13.2, which
makes use of two initialized workspace qubits and two controlled-NOT gates.
Figure 13.2: A circuit implementing the encoding α|0⟩ + β|1⟩ ↦ α|000⟩ + β|111⟩
using two initialized workspace qubits and two controlled-NOT gates.
Notice, in particular, that this encoding is not the same as repeating the quantum
state three times, as in a given qubit state vector being encoded as |ψ⟩ 7→ |ψ⟩|ψ⟩|ψ⟩.
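As an illustrative check, the encoding can be simulated directly on state vectors with NumPy. The cnot helper below is a hypothetical construction, and qubit 0 is taken here to be the leftmost bit of a basis string (conventions differ; Qiskit, for instance, numbers qubits from the right).

```python
import numpy as np

def cnot(n, control, target):
    """Matrix of a CNOT acting on an n-qubit register (qubit 0 = leftmost bit)."""
    U = np.zeros((2**n, 2**n))
    for x in range(2**n):
        bits = [(x >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        y = int("".join(map(str, bits)), 2)
        U[y, x] = 1
    return U

alpha, beta = 0.6, 0.8  # any amplitudes with |alpha|^2 + |beta|^2 = 1

# Start with (alpha|0> + beta|1>) on the first qubit and |00> workspace.
state = np.zeros(8)
state[0b000] = alpha
state[0b100] = beta

# Two CNOTs from the first qubit onto the workspace qubits.
state = cnot(3, 0, 2) @ cnot(3, 0, 1) @ state

expected = np.zeros(8)
expected[0b000] = alpha   # alpha|000>
expected[0b111] = beta    # beta|111>
assert np.allclose(state, expected)
```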
Bit-flip errors
Now suppose that an error takes place after the encoding has been performed.
Specifically, let’s suppose that an X gate, or in other words a bit-flip, occurs on one
of the qubits. For instance, if the middle qubit experiences a bit-flip, the state of the
three qubits is transformed into this state:
α|010⟩ + β|101⟩.
This isn’t the only sort of error that could occur — and it’s also reasonable to
question the assumption that an error takes the form of a perfect, unitary operation.
We’ll return to these issues in the last section of the lesson, and for now we can view
an error of this form as being just one possible type of error (albeit a fundamentally
important one).
We can see clearly from the mathematical expression for the state above that
the middle bit is the one that’s different inside of each ket. But suppose that we
had the three qubits in our possession and didn’t know their state. If we suspected
that a bit-flip may have occurred, one option to verify that a bit flipped would be to
perform a standard basis measurement, which, in the case at hand, would cause
us to see 010 or 101 with probabilities |α|2 and | β|2 , respectively. In either case, our
conclusion would be that the middle bit flipped — but, unfortunately, we would
lose the original quantum state α|0⟩ + β|1⟩. This is the state we’re trying to protect,
so measuring in the standard basis is an unsatisfactory option.
What we can do instead is to use the quantum circuit shown in Figure 13.3,
feeding the encoded state into the top three qubits. This circuit nondestructively
measures the parity of the standard basis states of the top two qubits as well as the
bottom two qubits of the three-qubit encoding.
Under the assumption that at most one bit flipped, one can easily deduce from
the measurement outcomes the location of the bit-flip (or the absence of one). In
particular, as Figure 13.5 illustrates, the three possible locations for a bit-flip error on
the encoded state are revealed by the measurement outcomes. If no bit-flips occur,
on the other hand, the measurement outcomes are 00, as shown in Figure 13.4.
Crucially, the state of the top three qubits does not collapse in any of the cases,
which allows us to correct a bit-flip error if one has occurred — by simply applying
Figure 13.3: An error detection circuit for the 3-bit repetition code.
Figure 13.4: If no errors occur, the error detection circuit results in the outcome 00
and the encoded state is unchanged.
the same bit-flip again with an X gate. The following table summarizes the states
we obtain from at most one bit-flip, the measurement outcomes (which are called
the syndrome in the context of error correction), and the correction needed to get
back to the original encoding.
State                   Syndrome    Correction
α|000⟩ + β|111⟩         00          (none)
α|001⟩ + β|110⟩         10          X on the rightmost qubit
α|010⟩ + β|101⟩         11          X on the middle qubit
α|100⟩ + β|011⟩         01          X on the leftmost qubit

Figure 13.5: A single bit-flip error is detected by the 3-bit repetition code, with the
measurement outcomes revealing which qubit was affected.
Once again, we’re only considering the possibility that at most one bit-flip occurred.
This wouldn’t work correctly if two or three bit-flips occurred, and we also haven’t
considered other possible errors besides bit-flips.
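The essential bookkeeping here can be sketched classically: both kets in the encoded superposition share the same parities, so the syndrome identifies the flipped position without ever distinguishing α from β. In the sketch below, the pairing of parity checks with syndrome bits is a hypothetical convention, not necessarily the one realized by the circuit in the figures.

```python
def syndrome(s):
    """Two parity bits computed from a 3-bit string s (given as a list of bits)."""
    return (s[1] ^ s[2], s[0] ^ s[1])

# Position (in the ket, left to right) of the flipped bit for each syndrome,
# assuming at most one bit-flip occurred; None means no flip.
correction = {(0, 0): None, (1, 0): 2, (1, 1): 1, (0, 1): 0}

alpha, beta = 0.6, 0.8  # carried along unchanged; never measured

for flipped in (None, 0, 1, 2):
    s = [0, 0, 0]
    if flipped is not None:
        s[flipped] ^= 1
    t = [b ^ 1 for b in s]  # the complementary ket, carrying amplitude beta
    # Both kets yield the same syndrome, so the superposition is not collapsed.
    assert syndrome(s) == syndrome(t)
    # The syndrome pinpoints the flipped position.
    assert correction[syndrome(s)] == flipped
```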
Phase-flip errors
In the quantum setting, bit-flip errors aren’t the only errors we need to worry about.
For instance, we also have to worry about phase-flip errors, which are described by Z
gates. Along the same lines as bit-flip errors, we can think about phase-flip errors
as representing just another possibility for an error that can affect a qubit.
However, as we will see in the last section of the lesson, which is on the so-
called discretization of errors for quantum error correcting codes, a focus on bit-flip
errors and phase-flip errors turns out to be well-justified. Specifically, the ability
to correct a bit-flip error, a phase-flip error, or both of these errors simultaneously
automatically implies the ability to correct an arbitrary quantum error on a single
qubit.
Unfortunately, the 3-bit repetition code doesn’t protect against phase-flips at all.
For instance, suppose that a qubit state α|0⟩ + β|1⟩ has been encoded using the 3-bit
repetition code, and a phase-flip error occurs on the middle qubit. This results in
the state
(I ⊗ Z ⊗ I)(α|000⟩ + β|111⟩) = α|000⟩ − β|111⟩,
which is exactly the state we would have obtained from encoding the qubit state
α|0⟩ − β|1⟩. Indeed, a phase-flip error on any one of the three qubits of the encoding
has this same effect, which is equivalent to a phase-flip error occurring on the
original qubit prior to encoding. Under the assumption that the original quantum
state is an unknown state, there’s therefore no way to detect that an error has
occurred, because the resulting state is a perfectly valid encoding of a different
qubit state. In particular, running the error detection circuit from before on the state
α|000⟩ − β|111⟩ is certain to result in the syndrome 00, which wrongly suggests that
no errors have occurred.
Meanwhile, there are now three qubits rather than one that could potentially ex-
perience phase-flip errors. So, in a situation in which phase-flip errors are assumed
to occur independently on each qubit with some nonzero probability p (similar to
a binary symmetric channel except for phase-flips rather than bit-flips), this code
actually increases the likelihood of a phase-flip error after decoding for small values
of p. To be more precise, we’ll get a phase-flip error on the original qubit after
decoding whenever there are an odd number of phase-flip errors on the three qubits
of the encoding, which happens with probability
3p(1 − p)² + p³.
This value is larger than p when 0 < p < 1/2, so the code increases the probability
of a phase-flip error for values of p in this range.
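This probability is easy to verify by enumerating error patterns (an illustrative sketch, not part of the course).

```python
from itertools import product

def odd_flip_probability(p):
    """Probability of an odd number of independent phase-flips on 3 qubits,
    each occurring with probability p (this causes a logical phase-flip
    after decoding)."""
    total = 0.0
    for flips in product((0, 1), repeat=3):
        if sum(flips) % 2 == 1:
            prob = 1.0
            for f in flips:
                prob *= p if f else 1 - p
            total += prob
    return total

p = 0.01
assert abs(odd_flip_probability(p) - (3 * p * (1 - p) ** 2 + p**3)) < 1e-12
# For small p the encoding makes things worse: the logical error
# probability exceeds p whenever 0 < p < 1/2.
assert odd_flip_probability(p) > p
```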
We’ve observed that the 3-bit repetition code is completely oblivious to phase-flip
errors, so it doesn’t seem to be very helpful for dealing with this sort of error. We
can, however, modify the 3-bit repetition code in a simple way so that it does detect
phase-flip errors. This modification will render the code oblivious to bit-flip errors
— but, as we’ll see in the next section, we can combine together the 3-bit repetition
code with this modified version to obtain the Shor code, which can correct both
bit-flip and phase-flip errors.
Figure 13.6 shows a modified version of the encoding circuit from above, which
will now be able to protect against phase-flip errors. The modification is very
simple: we simply apply a Hadamard gate to each qubit after performing the two
controlled-NOT gates.
Figure 13.6: An encoding circuit for the modified 3-bit repetition code for phase-flip
errors.
A Hadamard gate transforms a |0⟩ state into a |+⟩ state, and a |1⟩ state into a
|−⟩ state, so the net effect is that the single qubit state α|0⟩ + β|1⟩ is encoded as
α|+ + +⟩ + β|− − −⟩
Figure 13.7: An error detection circuit for the modified 3-bit repetition code for
phase-flip errors.
Figure 13.8: A simplification of the error detection circuit for the modified 3-bit
repetition code for phase-flip errors shown in Figure 13.7.
A phase-flip error, or equivalently a Z gate, flips between the states |+⟩ and
|−⟩, so this encoding will be useful for detecting (and correcting) phase-flip errors.
Specifically, the error-detection circuit from earlier can be modified as in Figure 13.7.
In words, we take the circuit from before and simply put Hadamard gates on the
top three qubits at both the beginning and the end. The idea is that the first three
Hadamard gates transform |+⟩ and |−⟩ states back into |0⟩ and |1⟩ states, the same
parity checks as before take place, and then the second layer of Hadamard gates
transforms the state back to |+⟩ and |−⟩ states so that we recover our encoding. For
future reference, let’s observe that this phase-flip detection circuit can be simplified
as is shown in Figure 13.8.
Figures 13.9 and 13.10 describe how our modified version of the 3-bit repetition
code, including the encoding step and the error detection step, functions when
at most one phase-flip error occurs. The behavior is similar to the ordinary 3-bit
repetition code for bit-flips.
Figure 13.9: If no errors occur, the error detection circuit results in the outcome 00
and the encoded state is unchanged.
Here’s a table analogous to the one from above, this time considering the
possibility of at most one phase-flip error.

State                   Syndrome    Correction
α|+ + +⟩ + β|− − −⟩     00          (none)
α|+ + −⟩ + β|− − +⟩     10          Z on the rightmost qubit
α|+ − +⟩ + β|− + −⟩     11          Z on the middle qubit
α|− + +⟩ + β|+ − −⟩     01          Z on the leftmost qubit
Unfortunately, this modified version of the 3-bit repetition code can now no
longer correct bit-flip errors. All is not lost, however. As suggested previously, we’ll
be able to combine the two codes we’ve just seen into one code — the 9-qubit Shor
code — that can correct both bit-flip and phase-flip errors, and indeed any error on
a single qubit.
Figure 13.10: A single phase-flip error is detected by the modified 3-bit repetition
code, with the measurement outcomes revealing which qubit was affected.
13.2. THE 9-QUBIT SHOR CODE 385
Code description
The 9-qubit Shor code is the code we obtain by concatenating the two codes from the
previous section. This means that we first apply one encoding, which encodes one
qubit into three, and then we apply the other encoding to each of the three qubits
used for the first encoding, resulting in nine qubits in total.
To be more precise, while we could apply the two codes in either order in this
particular case, we’ll make the choice to first apply the modified version of the 3-bit
repetition code (which detects phase-flip errors), and then we’ll encode each of the
resulting three qubits independently using the original 3-bit repetition code (which
detects bit-flip errors). Figure 13.11 shows a circuit diagram representation of this
encoding.
As the figure suggests, we’ll think about the nine qubits of the Shor code as
being grouped into three blocks of three qubits, where each block is obtained from
the second encoding step (which is the ordinary 3-bit repetition code). The ordinary
3-bit repetition code, which here is applied three times independently, is called
the inner code in this context, whereas the outer code is the code used for the first
encoding step, which is the modified version of the 3-bit repetition code that detects
phase-flip errors.
We can alternatively specify the code by describing how the two standard basis
states for our original qubit get encoded.
|0⟩ ↦ (1/(2√2)) (|000⟩ + |111⟩) ⊗ (|000⟩ + |111⟩) ⊗ (|000⟩ + |111⟩)

|1⟩ ↦ (1/(2√2)) (|000⟩ − |111⟩) ⊗ (|000⟩ − |111⟩) ⊗ (|000⟩ − |111⟩)
Once we know this, we can determine by linearity how an arbitrary qubit state
vector is encoded.
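As an illustrative check, the two codewords can be constructed directly as 512-dimensional vectors and seen to form an orthonormal pair (the helper kron_all is hypothetical).

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

def kron_all(vectors):
    """Tensor product of a list of vectors (hypothetical helper)."""
    out = np.array([1.0])
    for v in vectors:
        out = np.kron(out, v)
    return out

plus_block = kron_all([ket0] * 3) + kron_all([ket1] * 3)    # |000> + |111>
minus_block = kron_all([ket0] * 3) - kron_all([ket1] * 3)   # |000> - |111>

# The two logical codewords of the 9-qubit Shor code.
logical0 = kron_all([plus_block] * 3) / (2 * np.sqrt(2))
logical1 = kron_all([minus_block] * 3) / (2 * np.sqrt(2))

# They form an orthonormal pair, so by linearity an arbitrary qubit state
# alpha|0> + beta|1> encodes to a valid unit vector on 9 qubits.
assert abs(np.dot(logical0, logical0) - 1) < 1e-12
assert abs(np.dot(logical1, logical1) - 1) < 1e-12
assert abs(np.dot(logical0, logical1)) < 1e-12
```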
Figure 13.11: An encoding circuit for the 9-qubit Shor code. The nine qubits are
grouped into three blocks of three, labeled block 0, block 1, and block 2.
To analyze how X and Z errors affect encodings of qubits, both for the 9-qubit Shor
code as well as other codes, it will be helpful to observe a few simple relationships
between these errors and CNOT gates. As we begin to analyze the 9-qubit Shor
code, this is a reasonable moment to pause to do this.
Figure 13.12 illustrates three basic relationships among X gates and CNOT gates.
Specifically, applying an X gate to the target qubit prior to a CNOT is equivalent to
swapping the order and performing the CNOT first, but applying an X gate to the
control qubit prior to a CNOT is equivalent to applying X gates to both qubits after
the CNOT. Finally, applying X gates to both qubits prior to a CNOT is equivalent
to applying the CNOT first and then applying an X gate to the control qubit. These
relationships can be verified by performing the required matrix multiplications or
computing the effect of the circuits on standard basis states.
The situation is similar for Z gates, except that the roles of the control and
target qubits switch. In particular, we have the three relationships depicted by
Figure 13.13.
Figure 13.12: Three relationships between X gates and CNOT gates.

Figure 13.13: Three relationships between Z gates and CNOT gates, with the roles
of the control and target qubits reversed.
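All six relationships can be verified mechanically with NumPy. The following is an illustrative check, using the convention that the first tensor factor is the control.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
# CNOT with the first qubit as control, second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

kron = np.kron

# X on the target before the CNOT = X on the target after the CNOT.
assert np.allclose(CNOT @ kron(I2, X), kron(I2, X) @ CNOT)
# X on the control before = X on both qubits after.
assert np.allclose(CNOT @ kron(X, I2), kron(X, X) @ CNOT)
# X on both before = X on the control after.
assert np.allclose(CNOT @ kron(X, X), kron(X, I2) @ CNOT)

# For Z the roles of control and target are reversed.
assert np.allclose(CNOT @ kron(Z, I2), kron(Z, I2) @ CNOT)
assert np.allclose(CNOT @ kron(I2, Z), kron(Z, Z) @ CNOT)
assert np.allclose(CNOT @ kron(Z, Z), kron(I2, Z) @ CNOT)
```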
Now we’ll consider how errors can be detected and corrected using the 9-qubit
Shor code, starting with bit-flip errors — which we’ll generally refer to as X errors
hereafter for the sake of brevity.
To detect and correct X errors, we can simply treat each of the three blocks
in the encoding separately. Each block is an encoding of a qubit using the 3-bit
repetition code, which protects against X errors — so by performing the syndrome
measurements and X error corrections described previously to each block, we can
detect and correct up to one X error per block. In particular, if there is at most one
X error on the nine qubits of the encoding, this error will be detected and corrected
by this procedure. In short, correcting bit-flip errors is a simple matter for this code,
due to the fact that the inner code corrects bit-flip errors.
Next we’ll consider phase-flip errors, or Z errors for brevity. This time it’s not quite
as clear what we should do because the outer code is the one that detects Z errors,
but the inner code seems to be somehow “in the way,” making the detection and
correction of these errors slightly more difficult.
Suppose that a Z error occurs on one of the 9 qubits of the Shor code, such as
the one indicated in Figure 13.14. We’ve already observed what happens when a Z
error occurs when we’re using the 3-bit repetition code — it’s equivalent to a Z error
occurring prior to encoding. In the context of the 9-qubit Shor code, this means
that a Z error on any one of the three qubits within a block always has the same
effect, which is equivalent to a Z error occurring on the corresponding qubit prior
to the inner code being applied. For example, the error in Figure 13.14 is equivalent
to the one suggested in Figure 13.15. This can be reasoned using the relationships
between Z and CNOT gates described above, or by simply evaluating the circuits
on an arbitrary qubit state α|0⟩ + β|1⟩.
This suggests one option for detecting and correcting Z errors, which is to decode
the inner code, leaving us with the three qubits used for the outer encoding along
with six initialized workspace qubits. We can then check these three qubits of the
outer code for Z errors, and then finally we can re-encode using the inner code, to
bring us back to the 9-qubit encoding we get from the Shor code. If we do detect a
Z error, we can either correct it prior to re-encoding with the inner code, or we can
correct it after re-encoding, by applying a Z gate to any of the qubits in that block.
Figure 13.14: A phase-flip error on one of the qubits of the 9-qubit Shor code.
Figure 13.15: A phase-flip error within the middle block, such as the one indicated
in Figure 13.14, is equivalent to one on the middle qubit prior to the inner encoding.
Figure 13.16: To detect phase-flip errors, we can decode the inner code, run the
error detection circuit on the three qubits of the outer code, and then re-encode the
inner code.
Figure 13.16 is a circuit diagram that includes the encoding circuit and the error
suggested above together with the steps just described (but not the actual correction
step). In this particular example, the syndrome measurement is 11, which locates the
Z error as having occurred on one of the qubits in the middle block. An advantage
of correcting Z errors after the re-encoding step rather than before is that we can
simplify the circuit above. The circuit in Figure 13.17 is equivalent, but requires four
fewer CNOT gates. Again, the syndrome doesn’t indicate which qubit has been
affected by a Z error, but rather which block has experienced a Z error, with the
effect being the same regardless of which qubit within the block was affected. We
can then correct the error by applying a Z gate to any of the three qubits of the
affected block.
As an aside, here we see an example of degeneracy in a quantum error-correcting
code, where we’re able to correct certain errors without being able to identify them
uniquely.
Figure 13.17: A simplification of the circuit in Figure 13.16 using fewer CNOT gates.
We’ve now seen how both X and Z errors can be detected and corrected using the
9-qubit Shor code, and in particular how at most one X error or at most one Z
error can be detected and corrected. Now let’s suppose that both a bit-flip and a
phase-flip error occur, possibly on the same qubit. As it turns out, nothing different
needs to be done in this situation from what has already been discussed — the
code is able to detect and correct up to one X error and one Z error simultaneously,
without further modification.
To be more specific, X errors are detected by applying the ordinary 3-bit repeti-
tion code syndrome measurement, which is performed separately on each of the
three blocks of three qubits; and Z errors are detected through the procedure de-
scribed just above, which is equivalent to decoding the inner code, performing the
syndrome measurement for the modified 3-bit repetition code for phase-flips, and
then re-encoding. These two error detection steps — as well as the corresponding
corrections — can be performed completely independently of one another, and in
fact it doesn’t matter in which order they’re performed.
Figure 13.18: A bit-flip error and a phase-flip error on the same qubit in the 9-qubit
Shor code.
To see why this is, consider the example depicted in the circuit diagram in
Figure 13.18, where both an X and a Z error have affected the bottom qubit of the
middle block. Let’s first observe that the ordering of the errors doesn’t matter, in the
sense that reversing the position of the X and Z errors yields an equivalent circuit.
To be clear, X and Z do not commute, they anti-commute:
XZ = (0 1; 1 0)(1 0; 0 −1) = (0 −1; 1 0) = −(1 0; 0 −1)(0 1; 1 0) = −ZX.
This implies that changing the ordering leads to an irrelevant −1 global phase factor.
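This anticommutation relation is easily verified numerically (an illustrative check).

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# X and Z anti-commute: XZ = -ZX, so swapping the order of a bit-flip
# and a phase-flip changes the state only by a -1 global phase.
assert np.allclose(X @ Z, -(Z @ X))
```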
We can then move the Z error just like before to obtain another equivalent circuit,
shown in Figure 13.19, which is equivalent to the one above up to a global phase
factor.
At this point it’s evident that if the procedure to detect and correct X errors
is performed first, the X error will be corrected, after which the procedure for
detecting and correcting Z errors can be performed to eliminate the Z error as
before.
Figure 13.19: A circuit equivalent (up to a global phase) to the one in Figure 13.18,
where the Z error has been moved to act prior to the inner encoding.
Figure 13.20 illustrates, for very small values of p, that the code provides an advan-
tage, with the break-even point occurring at about 0.0323.
Figure 13.20: A plot illustrating the break-even point for the 9-qubit Shor code.
(The plot shows the curve 1 − (1 − p)⁹ − 9p(1 − p)⁸ together with the line p, against
the error probability p; they cross near p ≈ 0.0323.)
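The break-even point can be located numerically, for instance by bisection (an illustrative sketch, not part of the course).

```python
def excess(p):
    """Probability of two or more single-qubit errors, minus the unencoded
    error probability p. The break-even point is where this crosses zero."""
    return 1 - (1 - p) ** 9 - 9 * p * (1 - p) ** 8 - p

lo, hi = 1e-6, 0.5
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    if excess(mid) < 0:
        lo = mid  # the code still helps below the break-even point
    else:
        hi = mid

break_even = (lo + hi) / 2
assert abs(break_even - 0.0323) < 5e-4
```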
If p is smaller than this break-even point, then the code helps; at the break-even
point the probabilities are equal, so we’re just wasting our time along with 8 qubits
if we use the code; and beyond the break-even point we should absolutely not be
using this code because it’s increasing the chance of a logical error on Q.
Three and a quarter percent or so may not seem like a very good break-even
point, particularly when compared to 50%, which is the analogous break-even point
for the 3-bit repetition code for classical information. This difference is, in large part,
due to the fact that quantum information is more delicate and harder to protect
than classical information. But also — while recognizing that the 9-qubit Shor code
represents a brilliant discovery, as the world’s first quantum error correcting code —
it should be acknowledged that it isn’t actually a very good code in practical terms.
As we will see, when the error detection circuits are run, the measurements that give
us the syndrome bits effectively collapse the state of the encoding probabilistically
to one where an error (or lack of an error) represented by one of the four Pauli
matrices has taken place. (It follows from the fact that U is unitary that the numbers
α, β, γ, and δ must satisfy |α|2 + | β|2 + |γ|2 + |δ|2 = 1, and indeed, the values |α|2 ,
| β|2 , |γ|2 , and |δ|2 are the probabilities with which the encoded state collapses to
one for which the corresponding Pauli error has occurred.)
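The Pauli expansion and this normalization condition can be checked numerically. The sketch below expands a hypothetical small rotation error in the Pauli basis using the (normalized) Hilbert-Schmidt inner product.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_coefficients(U):
    """Coefficients (a, b, c, d) with U = a I + b X + c Y + d Z."""
    return [np.trace(P.conj().T @ U) / 2 for P in (I2, X, Y, Z)]

# A hypothetical small rotation error: U = exp(-i theta X / 2).
theta = 0.03
U = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

a, b, c, d = pauli_coefficients(U)
assert np.allclose(U, a * I2 + b * X + c * Y + d * Z)
# Unitarity forces |a|^2 + |b|^2 + |c|^2 + |d|^2 = 1.
assert abs(sum(abs(t) ** 2 for t in (a, b, c, d)) - 1) < 1e-12
```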
To explain how this works in greater detail, it will be convenient to use subscripts
to indicate which qubit a given qubit unitary operation acts upon. For example,
using Qiskit’s qubit numbering convention (Q8 , Q7 , . . . , Q0 ) to number the 9 qubits
used for the Shor code, we have these expressions for various unitary operations
on single qubits, where in each case we tensor the unitary matrix with the identity
matrix on every other qubit.
X0 = I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ X
Z4 = I ⊗ I ⊗ I ⊗ I ⊗ Z ⊗ I ⊗ I ⊗ I ⊗ I
U7 = I ⊗ U ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I
So, in particular, for a given qubit unitary operation U, we can specify the action of
U applied to qubit k by the following formula, in which U appears in the tensor
factor corresponding to qubit k and the identity matrix appears in every other
tensor factor.

Uk = I ⊗ · · · ⊗ I ⊗ U ⊗ I ⊗ · · · ⊗ I
Now suppose that |ψ⟩ is the 9-qubit encoding of a qubit state. If the error U
takes place on qubit k, we obtain the state Uk|ψ⟩, which can be expressed as a linear
combination of Pauli operations acting on |ψ⟩ as follows.

Uk|ψ⟩ = α|ψ⟩ + β Xk|ψ⟩ + iγ XkZk|ψ⟩ + δ Zk|ψ⟩

Running the error detection circuits then yields a syndrome consisting
of 8 bits. Just prior to the actual standard basis measurements that produce these
syndrome bits, the state has the following form.
α |I syndrome⟩ ⊗ |ψ⟩
+ β |Xk syndrome⟩ ⊗ Xk|ψ⟩
+ iγ |XkZk syndrome⟩ ⊗ XkZk|ψ⟩
+ δ |Zk syndrome⟩ ⊗ Zk|ψ⟩
To be clear, we have two systems at this point. The system on the left is the
8 qubits we’ll measure to get the syndrome, where |I syndrome⟩, | Xk syndrome⟩,
and so on, refer to whatever 8-qubit standard basis state is consistent with the
corresponding error (or non-error). The system on the right is the 9 qubits we’re
using for the encoding.
Notice that these two systems are now correlated (in general), and this is the key
to why this works. By measuring the syndrome, the state of the 9 qubits on the right
effectively collapses to one in which a Pauli error consistent with the measured
syndrome has been applied to one of the qubits. Moreover, the syndrome itself
provides enough information so that we can undo the error and recover the original
encoding |ψ⟩.
In particular, if the syndrome qubits are measured and the appropriate correc-
tions are made, we obtain a state that can be expressed as a density matrix,
ξ ⊗ |ψ⟩⟨ψ|,
where
ξ =|α|2 |I syndrome⟩⟨I syndrome|
+ | β|2 | Xk syndrome⟩⟨ Xk syndrome|
+ |γ|2 | Xk Zk syndrome⟩⟨ Xk Zk syndrome|
+ |δ|2 | Zk syndrome⟩⟨ Zk syndrome|.
Critically, this is a product state: we have our original, uncorrupted encoding as the
right-hand tensor factor, and on the left we have a density matrix ξ that describes a
random error syndrome. There is no longer any correlation with the system on the
right, which is the one we care about, because the errors have been corrected.
At this point we can throw the syndrome qubits away or reset them so we can
use them again. This is how the randomness — or entropy — created by errors is
removed from the system.
13.3. DISCRETIZATION OF ERRORS 399
This is the discretization of errors for the special case of unitary errors. In essence,
by measuring the syndrome, we effectively project the error onto an error that’s
described by a Pauli matrix.
At first glance it may seem too good to be true that we can correct for arbitrary
unitary errors like this, even errors that are tiny and hardly noticeable on their own.
But, what’s important to realize here is that this is a unitary error on a single qubit,
and by the design of the code, a single-qubit operation can’t change the state of the
logical qubit that’s been encoded. All it can possibly do is to move the state out of
the subspace of valid encodings, but then the error detections collapse the state and
the corrections bring it back to where it started.
More generally, an error need not be unitary: it can be described by a channel Φ,
specified by a collection of Kraus matrices.

Φ(σ) = ∑_j Aj σ Aj†

Each Kraus matrix can be expanded as a linear combination of Pauli matrices.

Aj = αj I + βj X + γj Y + δj Z
This allows us to express the action of the error Φ on a chosen qubit k in terms of
Pauli matrices as follows.
In short, we’ve simply expanded out all of our Kraus matrices as linear combinations
of Pauli matrices.
If we now compute and measure the error syndrome, and correct for any errors
that are revealed, we’ll obtain a similar sort of state to what we had in the case of a
unitary error:
ξ ⊗ |ψ⟩⟨ψ|,
where

ξ = ∑_j ( |αj|² |I syndrome⟩⟨I syndrome|
      + |βj|² |Xk syndrome⟩⟨Xk syndrome|
      + |γj|² |XkZk syndrome⟩⟨XkZk syndrome|
      + |δj|² |Zk syndrome⟩⟨Zk syndrome| ).
The details are a bit messier and are not shown here. Conceptually speaking, the
idea is identical to the unitary case.
Generalization
The discretization of errors generalizes to other quantum error-correcting codes,
including ones that can detect and correct errors on multiple qubits. In such cases,
errors on multiple qubits can be expressed as tensor products of Pauli matrices, and
correspondingly different syndromes specify Pauli operation corrections that might
be performed on multiple qubits rather than just one qubit.
Again, by measuring the syndrome, errors are effectively projected or collapsed
onto a discrete set of possibilities represented by tensor products of Pauli matrices,
and by correcting for those Pauli errors, we can recover the original encoded state.
Meanwhile, whatever randomness is generated in the process is moved into the
syndrome qubits, which are discarded or reset, thereby removing the randomness
generated in this process from the system that stores the encoding.
Lesson 14

The Stabilizer Formalism
In the previous lesson, we took a first look at quantum error correction, focusing
specifically on the 9-qubit Shor code. In this lesson, we’ll introduce the stabilizer
formalism, which is a mathematical framework through which a broad class of quan-
tum error correcting codes, known as stabilizer codes, can be specified and analyzed.
This includes the 9-qubit Shor code along with many other examples, including
codes that seem likely to be well-suited to real-world quantum devices. Not every
quantum error correcting code is a stabilizer code, but many are, including every
example that we’ll see in this course.
The lesson begins with a short discussion of Pauli matrices, and tensor products
of Pauli matrices more generally, which can represent not only operations on qubits,
but also measurements of qubits — in which case they’re typically referred to as
observables. We’ll then go back and take a second look at the repetition code and see
how it can be described in terms of Pauli matrix observables. This will both inform
and lead into a general discussion of stabilizer codes, including several examples,
basic properties of stabilizer codes, and how the fundamental tasks of encoding,
detecting errors, and correcting those errors can be performed.
14.1 Pauli operations and observables
All four of the Pauli matrices are both unitary and Hermitian. We used the names
σx , σy , and σz to refer to the non-identity Pauli matrices earlier in the course, but it
is conventional to instead use the capital letters X, Y, and Z in the context of error
correction. This convention was followed in the previous lesson, and we’ll continue
to do this for the remaining lessons.
Different non-identity Pauli matrices anti-commute with one another.
XY = −YX XZ = − ZX YZ = − ZY
These anti-commutation relations are simple and easy to verify by performing the
multiplications, but they’re critically important, in the stabilizer formalism and
elsewhere. As we will see, the minus signs that emerge when the ordering between
two different non-identity Pauli matrices is reversed in a matrix product correspond
precisely to the detection of errors in the stabilizer formalism.
We also have the multiplication rules listed here.
XX = YY = ZZ = I XY = iZ YZ = iX ZX = iY
That is, each Pauli matrix is its own inverse (which is always true for any matrix that
is both unitary and Hermitian), and multiplying two different non-identity Pauli
matrices together is always ±i times the remaining non-identity Pauli matrix. In
particular, up to a phase factor, Y is equivalent to XZ, which explains our focus on
X and Z errors and apparent lack of interest in Y errors in quantum error correction;
X represents a bit-flip, Z represents a phase-flip, and so (up to a global phase factor)
Y represents both of those errors occurring simultaneously on the same qubit.
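These anti-commutation and multiplication rules are quick to confirm numerically; the following NumPy sketch checks each of them, including the fact that Y = iXZ:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

# Distinct non-identity Pauli matrices anti-commute.
assert np.allclose(X @ Y, -(Y @ X))
assert np.allclose(X @ Z, -(Z @ X))
assert np.allclose(Y @ Z, -(Z @ Y))

# Multiplication rules.
for P in (X, Y, Z):
    assert np.allclose(P @ P, I2)  # each Pauli matrix is its own inverse
assert np.allclose(X @ Y, 1j * Z)
assert np.allclose(Y @ Z, 1j * X)
assert np.allclose(Z @ X, 1j * Y)

# Up to a global phase, Y is equivalent to XZ: specifically, Y = i XZ.
assert np.allclose(Y, 1j * (X @ Z))
```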
The four Pauli matrices all represent operations (which could be errors) on a single
qubit — and by tensoring them together we obtain operations on multiple qubits.
For example, taking products of the three non-identity Pauli matrices in every possible way yields the set

⟨X, Y, Z⟩ = {αP : α ∈ {1, i, −1, −i}, P ∈ {I, X, Y, Z}},

where the angle brackets denote the set of all matrices obtainable as products of the listed matrices. This can be reasoned through the multiplication rules listed earlier. There are 16 different matrices in this set, which is commonly called the Pauli group. For a second example, if we remove Y, we obtain half of the Pauli group:

⟨X, Z⟩ = {I, X, Z, iY, −I, −X, −Z, −iY}.
Here’s one final example (for now), where this time we have n = 2.
⟨ X ⊗ X, Z ⊗ Z ⟩ = {I ⊗ I, X ⊗ X, Z ⊗ Z, −Y ⊗ Y }
In this case we obtain just four elements, owing to the fact that X ⊗ X and Z ⊗ Z
commute:
( X ⊗ X )( Z ⊗ Z ) = ( XZ ) ⊗ ( XZ )
= (− ZX ) ⊗ (− ZX )
= ( ZX ) ⊗ ( ZX )
= ( Z ⊗ Z )( X ⊗ X ).
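The same computation can be checked numerically with Kronecker products:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

XX, ZZ, YY = np.kron(X, X), np.kron(Z, Z), np.kron(Y, Y)

# X⊗X and Z⊗Z commute, and their product is -Y⊗Y.
assert np.allclose(XX @ ZZ, ZZ @ XX)
assert np.allclose(XX @ ZZ, -YY)
```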
Pauli observables
Pauli matrices, and n-qubit Pauli operations more generally, are unitary, and therefore they describe unitary operations on qubits. But they're also Hermitian matrices, and for this reason they describe measurements: any Hermitian matrix defines a projective measurement whose possible outcomes are its eigenvalues, and whose effect is to project the state onto the eigenspace corresponding to the outcome.

Let's see what measurements of this sort look like for Pauli operations, starting with the three non-identity Pauli matrices. These matrices have spectral decompositions as follows.
X = |+⟩⟨+| − |−⟩⟨−|
Y = |+i ⟩⟨+i | − |−i ⟩⟨−i |
Z = |0⟩⟨0| − |1⟩⟨1|
In all three cases, the two possible measurement outcomes are the eigenvalues
+1 and −1. Such measurements are called X measurements, Y measurements,
and Z measurements. We encountered these measurements in Lesson 11 (General
Measurements), where they arose in the context of quantum state tomography.
Of course, a Z measurement is essentially just a standard basis measurement and
an X measurement is a measurement with respect to the plus/minus basis of a qubit
— but, as these measurements are described here, we’re taking the eigenvalues +1
and −1 to be the actual measurement outcomes.
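The three spectral decompositions can be verified numerically (notation for the eigenvectors follows the course: |±⟩ and |±i⟩):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus    = (ket0 + ket1) / np.sqrt(2)
minus   = (ket0 - ket1) / np.sqrt(2)
plus_i  = (ket0 + 1j * ket1) / np.sqrt(2)
minus_i = (ket0 - 1j * ket1) / np.sqrt(2)

proj = lambda v: np.outer(v, v.conj())  # the projection |v><v|

assert np.allclose(X, proj(plus) - proj(minus))
assert np.allclose(Y, proj(plus_i) - proj(minus_i))
assert np.allclose(Z, proj(ket0) - proj(ket1))
```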
The projections onto the +1 and −1 eigenspaces of Z ⊗ Z are

Π₊₁ = |00⟩⟨00| + |11⟩⟨11|   and   Π₋₁ = |01⟩⟨01| + |10⟩⟨10|,

so these are the two projections that define the measurement. If, for instance, we
were to measure a |ϕ+ ⟩ Bell state nondestructively using this measurement, then
we would be certain to obtain the outcome +1, and the state would be unchanged
as a result of the measurement. In particular, the state would not collapse to |00⟩
or |11⟩.
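This behavior is easy to confirm with the two projections Π₊₁ = (I + Z⊗Z)/2 and Π₋₁ = (I − Z⊗Z)/2:

```python
import numpy as np

Z = np.diag([1, -1]).astype(complex)
ZZ = np.kron(Z, Z)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # |phi+> Bell state

P_plus = (np.eye(4) + ZZ) / 2    # projection onto the +1 eigenspace of Z⊗Z
P_minus = (np.eye(4) - ZZ) / 2   # projection onto the -1 eigenspace

# The +1 outcome occurs with certainty, and the state is left unchanged.
assert np.isclose(np.linalg.norm(P_plus @ phi_plus) ** 2, 1.0)
assert np.allclose(P_plus @ phi_plus, phi_plus)
assert np.isclose(np.linalg.norm(P_minus @ phi_plus), 0.0)
```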
For any n-qubit Pauli operation, we can perform the measurement associated with
that observable nondestructively using phase estimation.
Figure 14.1 shows a circuit based on phase estimation that works for any Pauli
matrix P, where the measurement is being performed on the top qubit. The out-
comes 0 and 1 of the standard basis measurement in the circuit correspond to the
eigenvalues +1 and −1, just like we usually have for phase estimation with one
control qubit. Note that the control qubit is on the bottom in this diagram, whereas
in Lesson 7 (Phase Estimation and Factoring) the control qubits were drawn on the
top.
A similar method works for Pauli operations on multiple qubits. For example,
the circuit illustrated in Figure 14.2 performs a nondestructive measurement of the
3-qubit Pauli observable P2 ⊗ P1 ⊗ P0 , for any choice of P0 , P1 , P2 ∈ { X, Y, Z }. This
approach generalizes to n-qubit Pauli observables, for any n, in the natural way.
Of course, we only need to include controlled-unitary gates for non-identity tensor
factors of Pauli observables when implementing such measurements; controlled-
identity gates are simply identity gates and can therefore be omitted. This means
that lower weight Pauli observables require smaller circuits to be implemented
through this approach.
Notice that, irrespective of n, these phase estimation circuits have just a single
control qubit, which is consistent with the fact that there are just two possible mea-
surement outcomes for these measurements. Using more control qubits wouldn’t
reveal additional information because these measurements are already perfect using
a single control qubit. (One way to see this is directly from the general procedure
for phase estimation: the assumption U² = I renders any additional control qubits
beyond the first pointless.)
Figure 14.3 shows a specific example: a nondestructive implementation of
a Z ⊗ Z measurement, which is relevant to the description of the 3-bit repetition
code as a stabilizer code that we’ll see shortly. In this case, and for tensor products
of more than two Z observables more generally, the circuit can be simplified, as is
shown in Figure 14.4. Thus, this measurement is equivalent to nondestructively
measuring the parity (or XOR) of the standard basis states of two qubits.
14.2 Repetition code revisited
Any state of the form |ψ⟩ = α|000⟩ + β|111⟩ is a valid 3-qubit encoding of a qubit
state — but if we had a state that we weren't sure about, we could verify that we
have a valid encoding by checking the following two equations.
( Z ⊗ Z ⊗ I)|ψ⟩ = |ψ⟩
(I ⊗ Z ⊗ Z )|ψ⟩ = |ψ⟩
The first equation states that applying Z operations to the leftmost two qubits
of |ψ⟩ has no effect, which is to say that |ψ⟩ is an eigenvector of Z ⊗ Z ⊗ I with
eigenvalue 1. The second equation is similar except that Z operations are applied
to the rightmost two qubits. The idea is that, if we think about |ψ⟩ as a linear
combination of standard basis states, then the first equation implies that we can
only have nonzero coefficients for standard basis states where the leftmost two
bits have even parity (or, equivalently, are equal), and the second equation implies
that we can only have nonzero coefficients for standard basis states for which the
rightmost two bits have even parity.
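A quick numerical check of these two equations for an encoding α|000⟩ + β|111⟩ (the amplitudes 0.6 and 0.8 are an arbitrary choice):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.diag([1, -1]).astype(complex)
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def kron3(A, B, C):
    return np.kron(A, np.kron(B, C))  # also works for vectors

ket000 = kron3(ket0, ket0, ket0)
ket111 = kron3(ket1, ket1, ket1)

alpha, beta = 0.6, 0.8  # any amplitudes with |alpha|^2 + |beta|^2 = 1
psi = alpha * ket000 + beta * ket111

ZZI = kron3(Z, Z, I2)
IZZ = kron3(I2, Z, Z)

# psi is a +1 eigenvector of both observables.
assert np.allclose(ZZI @ psi, psi)
assert np.allclose(IZZ @ psi, psi)
```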
Equivalently, if we view the two Pauli operations Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z
as observables, and measure both using the circuits suggested at the end of the
previous section, then we would be certain to obtain measurement outcomes corre-
sponding to +1 eigenvalues, because |ψ⟩ is an eigenvector of both observables with
eigenvalue 1. But, the simplified version of the (combined) circuit for independently
measuring both observables, shown in Figure 14.5, is none other than the parity
check circuit for the 3-bit repetition code.
The two equations above therefore imply that the parity check circuit outputs
00, which is the syndrome that indicates that no errors have been detected.
For later reference, the set of all products of these two observables is

⟨Z ⊗ Z ⊗ I, I ⊗ Z ⊗ Z⟩ = {I ⊗ I ⊗ I, Z ⊗ Z ⊗ I, Z ⊗ I ⊗ Z, I ⊗ Z ⊗ Z}.
Error detection
Next, we’ll consider bit-flip detection for the 3-bit repetition code, with a focus on
the interactions and relationships among the Pauli operations that are involved: the
stabilizer generators and the errors themselves.
Suppose we’ve encoded a qubit using the 3-bit repetition code, and a bit-flip
error occurs on the leftmost qubit. This causes the state |ψ⟩ to be transformed
according to the action of an X operation (or X error).
|ψ⟩ 7→ ( X ⊗ I ⊗ I)|ψ⟩
This error can be detected by performing the parity checks for the 3-bit repetition
code, as discussed in the previous lesson, which is equivalent to nondestructively
measuring the stabilizer generators Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z as observables.
Let’s begin with the first stabilizer generator. The state |ψ⟩ has been affected
by an X error on the leftmost qubit, and our goal is to understand how the mea-
surement of this stabilizer generator, as an observable, is influenced by this error.
Because X and Z anti-commute, whereas every matrix commutes with the identity
matrix, it follows that Z ⊗ Z ⊗ I anti-commutes with X ⊗ I ⊗ I. Meanwhile, because
|ψ⟩ is a valid encoding of a qubit, Z ⊗ Z ⊗ I acts trivially on |ψ⟩.
( Z ⊗ Z ⊗ I)( X ⊗ I ⊗ I)|ψ⟩ = −( X ⊗ I ⊗ I)( Z ⊗ Z ⊗ I)|ψ⟩
= −( X ⊗ I ⊗ I)|ψ⟩
Therefore, ( X ⊗ I ⊗ I)|ψ⟩ is an eigenvector of Z ⊗ Z ⊗ I with eigenvalue −1.
When the measurement associated with the observable Z ⊗ Z ⊗ I is performed on
the state ( X ⊗ I ⊗ I)|ψ⟩, the outcome is therefore certain to be the one associated
with the eigenvalue −1.
Similar reasoning can be applied to the second stabilizer generator, but this time
the error commutes with the stabilizer generator rather than anti-commuting, and
so the outcome for this measurement is the one associated with the eigenvalue +1.
(I ⊗ Z ⊗ Z )( X ⊗ I ⊗ I)|ψ⟩ = ( X ⊗ I ⊗ I)(I ⊗ Z ⊗ Z )|ψ⟩
= ( X ⊗ I ⊗ I)|ψ⟩
What we find when considering these equations is that, regardless of our original
state |ψ⟩, the corrupted state is an eigenvector of both stabilizer generators, and
whether the eigenvalue is +1 or −1 is determined by whether the error commutes
or anti-commutes with each stabilizer generator. For errors represented by Pauli
operations, it will always be one or the other, because any two Pauli operations
either commute or anti-commute. Meanwhile, the actual state |ψ⟩ doesn’t play an
important role, except for the fact that the stabilizer generators act trivially on this
state.
For this reason, we really don’t need to concern ourselves in general with the
specific encoded state we’re working with. All that matters is whether the error
commutes or anti-commutes with each stabilizer generator. In particular, these are
the relevant equations with regard to this particular error for this code.
( Z ⊗ Z ⊗ I)( X ⊗ I ⊗ I) = −( X ⊗ I ⊗ I)( Z ⊗ Z ⊗ I)
(I ⊗ Z ⊗ Z )( X ⊗ I ⊗ I) = ( X ⊗ I ⊗ I)(I ⊗ Z ⊗ Z )
Here's a table with one row for each stabilizer generator and one column for
each error. The entry in the table is either +1 or −1 depending on whether the error
and the stabilizer generator commute or anti-commute. The table only includes
columns for the errors corresponding to a single bit-flip, as well as no error at all,
which is described by the identity tensored with itself three times. We could add
more columns for other errors, but for now our focus will be on just these errors.

              I ⊗ I ⊗ I    X ⊗ I ⊗ I    I ⊗ X ⊗ I    I ⊗ I ⊗ X
Z ⊗ Z ⊗ I        +1           −1           −1           +1
I ⊗ Z ⊗ Z        +1           +1           −1           −1
For each error in the table, the corresponding column therefore reveals how that
error transforms any given encoding into a +1 or −1 eigenvector of each stabilizer
generator. Equivalently, the columns describe the syndrome we would obtain from
the parity checks, which are equivalent to nondestructive measurements of the
stabilizer generators as observables.
Of course, the table has +1 and −1 entries rather than 0 and 1 entries — and it’s
common to think about a syndrome as being a binary string rather than column
of +1 and −1 entries — but we can equally well think about these vectors with
+1 and −1 entries as syndromes to connect them directly to the eigenvalues of the
stabilizer generators. In general, the syndromes tell us something about whatever
error took place, and if we know that one of the four possible errors listed in the
table occurred, the syndrome indicates which one it was.
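The table's entries can be generated programmatically, simply by testing commutation (the string labels for errors are ours):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

def kron3(A, B, C):
    return np.kron(A, np.kron(B, C))

def commutes(A, B):
    return np.allclose(A @ B, B @ A)

generators = [kron3(Z, Z, I2), kron3(I2, Z, Z)]
errors = {
    "III": kron3(I2, I2, I2),
    "XII": kron3(X, I2, I2),
    "IXI": kron3(I2, X, I2),
    "IIX": kron3(I2, I2, X),
}

# Syndrome entry: +1 if the error commutes with the generator, -1 otherwise.
syndrome = {
    name: tuple(1 if commutes(G, E) else -1 for G in generators)
    for name, E in errors.items()
}
assert syndrome == {
    "III": (1, 1), "XII": (-1, 1), "IXI": (-1, -1), "IIX": (1, -1)
}
```

Each single bit-flip error produces a distinct syndrome, which is exactly why it can be identified and corrected.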
Syndromes
Encodings for the 3-bit repetition code are 3-qubit states, so they’re unit vectors in
an 8-dimensional complex vector space. The four possible syndromes effectively
split this 8 dimensional space into four 2-dimensional subspaces, where quantum
state vectors in each subspace always result in the same syndrome. The diagram in
Figure 14.6 illustrates specifically how the 8-dimensional space is divided up by the
two stabilizer generators.
Each stabilizer generator splits the space into two subspaces of equal dimension,
namely the space of +1 eigenvectors and the space of −1 eigenvectors for that
observable. For example, the +1 eigenvectors of Z ⊗ Z ⊗ I are linear combinations
of standard basis states for which the leftmost two bits have even parity, and the −1
eigenvectors are linear combinations of standard basis states for which the leftmost
two bits have odd parity. The situation is similar for the other stabilizer generator,
except that for this one it's the rightmost two bits rather than the leftmost two bits.

Figure 14.6 can be summarized by the following table, which lists the standard
basis states spanning each of the four subspaces.

                     I ⊗ Z ⊗ Z : +1     I ⊗ Z ⊗ Z : −1
Z ⊗ Z ⊗ I : +1      |000⟩, |111⟩        |001⟩, |110⟩
Z ⊗ Z ⊗ I : −1      |100⟩, |011⟩        |010⟩, |101⟩
The four 2-dimensional subspaces corresponding to the four possible syndromes
are easy to describe in this case, owing to the fact that this is a very simple code.
In particular, the subspace corresponding to the syndrome (+1, +1) is the space
spanned by |000⟩ and |111⟩, which is the space of valid encodings (also known as
the code space), and in general the spaces are spanned by the standard basis shown
in the corresponding squares.
The syndromes also partition all of the 3-qubit Pauli operations into 4 equal-
size collections, depending upon which syndrome that operation (as an error)
would cause. For example, any Pauli operation that commutes with both stabilizer
generators results in the syndrome (+1, +1), and among the 64 possible 3-qubit
Pauli operations, there are exactly 16 of them in this category (including I ⊗ I ⊗ Z,
Z ⊗ Z ⊗ Z, and X ⊗ X ⊗ X for instance), and likewise for the other 3 syndromes.
Both of these properties — that the syndromes partition both the state space in
which encodings live and all of the Pauli operations on this space into equal-sized
collections — are true in general for stabilizer codes, which we’ll define precisely in
the next section.
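The count of 16 can be confirmed by brute force over all 64 tensor products of Pauli matrices:

```python
import numpy as np
from itertools import product
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)
paulis = {"I": I2, "X": X, "Y": Y, "Z": Z}

def op(label):
    """Tensor product of Pauli matrices named by a string such as 'ZZI'."""
    return reduce(np.kron, (paulis[c] for c in label))

generators = [op("ZZI"), op("IZZ")]

# Collect the 3-qubit Pauli operations that commute with both generators,
# i.e., those that would produce the trivial syndrome (+1, +1).
trivial = [
    "".join(w) for w in product("IXYZ", repeat=3)
    if all(np.allclose(op("".join(w)) @ G, G @ op("".join(w)))
           for G in generators)
]
assert len(trivial) == 16
assert "IIZ" in trivial and "ZZZ" in trivial and "XXX" in trivial
```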
Although it’s mainly an aside at this point, it’s worth mentioning that Pauli
operations that commute with both stabilizer generators, or equivalently Pauli
operations that result in the syndrome (+1, +1), but are not themselves proportional
to elements of the stabilizer, turn out to behave just like single-qubit Pauli operations
on the encoded qubit (i.e., the logical qubit) for this code. For example, X ⊗ X ⊗ X
commutes with both stabilizer generators, but is itself not proportional to any
element in the stabilizer, and indeed the effect of this operation on an encoding is
equivalent to an X gate on the logical qubit being encoded.
14.3 Stabilizer codes

We're now ready to define stabilizer codes in general. A stabilizer code on n qubits is specified by a list of n-qubit Pauli operations P1, . . . , Pr, called stabilizer generators, that satisfy three conditions.

1. The stabilizer generators all commute with one another.

Pj Pk = Pk Pj (for all j, k ∈ {1, . . . , r})

2. The stabilizer generators form a minimal generating set.

Pk ∉ ⟨P1, . . . , Pk−1, Pk+1, . . . , Pr⟩ (for all k ∈ {1, . . . , r})

3. At least one quantum state vector is fixed by all of the stabilizer generators.

−I⊗n ∉ ⟨P1, . . . , Pr⟩

(It's not obvious that the existence of a quantum state vector |ψ⟩ fixed by all of
the stabilizer generators, meaning P1|ψ⟩ = · · · = Pr|ψ⟩ = |ψ⟩, is equivalent to
−I⊗n ∉ ⟨P1, . . . , Pr⟩, but indeed this is the case, and we'll see why a bit later
in the lesson.)
Assuming that we have such a list P1 , . . . , Pr , the code space defined by these stabilizer
generators is the subspace C containing every n-qubit quantum state vector fixed
by all r of these stabilizer generators.
C = { |ψ⟩ : P1|ψ⟩ = · · · = Pr|ψ⟩ = |ψ⟩ }
Quantum state vectors in this subspace are precisely the ones that can be viewed as
valid encodings of quantum states. We’ll discuss the actual process of encoding later.
Finally, the stabilizer of the code defined by the stabilizer generators P1 , . . . , Pr is
the set generated by these operations:
⟨ P1 , . . . , Pr ⟩.
A natural way to think about a stabilizer code is to view the stabilizer generators
as observables, and to collectively interpret the outcomes of the measurements
associated with these observables as an error syndrome. Valid encodings are n-qubit
quantum state vectors for which the measurement outcomes, as eigenvalues, are
all guaranteed to be +1. Any other syndrome, where at least one −1 measurement
outcome occurs, signals that an error has been detected.
We’ll take a look at several examples shortly, but first just a few remarks about
the three conditions on stabilizer generators are in order.
The first condition is natural, in light of the interpretation of the stabilizer
generators as observables, for it implies that it doesn’t matter in what order the
measurements are performed: the observables commute, so the measurements
commute. This naturally imposes certain algebraic constraints on stabilizer codes
that are important to how they work.
The second condition requires that the stabilizer generators form a minimal
generating set, meaning that removing any one of them would result in a smaller
stabilizer. Strictly speaking, this condition isn’t really essential to the way stabilizer
codes work in an operational sense — and, as we’ll see in the next lesson, it does
sometimes make sense to think about sets of stabilizer generators for codes that
actually don’t satisfy this condition. For the sake of analyzing stabilizer codes and
explaining their properties, however, we will assume that this condition is in place.
In short, this condition guarantees that each observable that we measure to obtain
the error syndrome adds information about possible errors, as opposed to being
redundant and producing results that could be inferred from the other stabilizer
generator measurements.
The third condition requires that at least one nonzero vector is fixed by all
of the stabilizer generators, which is equivalent to −I⊗n not being contained in
the stabilizer. The need for this condition comes from the fact that it actually is
possible to choose a minimal generating set of n-qubit Pauli operations that all
commute with one another, and yet no nonzero vectors are fixed by every one of the
operations. We’re not interested in “codes” for which there are no valid encodings,
so we rule out this possibility by requiring this condition as a part of the definition.
Examples
Here are some examples of stabilizer codes for small values of n. We’ll see more
examples, including ones for which n can be much larger, in the next lesson.
The 3-bit repetition code is an example of a stabilizer code, where our stabilizer
generators are Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z.
We can easily check that these two stabilizer generators fulfill the required
conditions. First, the two stabilizer generators Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z commute
with one another.
( Z ⊗ Z ⊗ I)(I ⊗ Z ⊗ Z ) = Z ⊗ I ⊗ Z = (I ⊗ Z ⊗ Z )( Z ⊗ Z ⊗ I)
Second, they form a minimal generating set.

Z ⊗ Z ⊗ I ∉ ⟨I ⊗ Z ⊗ Z⟩ = {I ⊗ I ⊗ I, I ⊗ Z ⊗ Z}
I ⊗ Z ⊗ Z ∉ ⟨Z ⊗ Z ⊗ I⟩ = {I ⊗ I ⊗ I, Z ⊗ Z ⊗ I}
And third, we already know that |000⟩ and |111⟩, as well as any linear combination
of these vectors, are fixed by both Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z. Alternatively, we can
conclude this using the equivalent condition from the definition.
−I ⊗ I ⊗ I ∉ ⟨Z ⊗ Z ⊗ I, I ⊗ Z ⊗ Z⟩ = {I ⊗ I ⊗ I, Z ⊗ Z ⊗ I, Z ⊗ I ⊗ Z, I ⊗ Z ⊗ Z}
These conditions can be much more difficult to check for more complicated stabilizer
codes.
In the previous lesson, we saw that it’s possible to modify the 3-bit repetition code
so that it protects against phase-flip errors rather than bit-flip errors. As a stabilizer
code, this new code is easy to describe: its stabilizer generators are X ⊗ X ⊗ I and
I ⊗ X ⊗ X.
This time the stabilizer generators represent X ⊗ X observables rather than
Z ⊗ Z observables, so they’re essentially parity checks in the plus/minus basis
rather than the standard basis. The three required conditions on the stabilizer
generators are easily verified, along similar lines to the ordinary 3-bit repetition
code.
Here’s the 9-qubit Shor code, which is also a stabilizer code, expressed by stabilizer
generators.
Z⊗Z⊗I⊗I⊗I⊗I⊗I⊗I⊗I
I⊗Z⊗Z⊗I⊗I⊗I⊗I⊗I⊗I
I⊗I⊗I⊗Z⊗Z⊗I⊗I⊗I⊗I
I⊗I⊗I⊗I⊗Z⊗Z⊗I⊗I⊗I
I⊗I⊗I⊗I⊗I⊗I⊗Z⊗Z⊗I
I⊗I⊗I⊗I⊗I⊗I⊗I⊗Z⊗Z
X⊗X⊗X⊗X⊗X⊗X⊗I⊗I⊗I
I⊗I⊗I⊗X⊗X⊗X⊗X⊗X⊗X
In this case, we basically have three copies of the 3-bit repetition code, one for each
of the three blocks of three qubits, as well as the last two stabilizer generators, which
take a form reminiscent of the circuit for detecting phase-flips for this code. An
alternative way to think about the last two stabilizer generators is that they take the
same form as for the 3-bit repetition code for phase-flips, except that X ⊗ X ⊗ X is
substituted for X, which is consistent with the fact that X ⊗ X ⊗ X corresponds to
an X operation on logical qubits encoded using the 3-bit repetition code.
Before we move on to other examples, it should be noted that tensor product
symbols are often omitted when describing stabilizer codes by lists of stabilizer
generators, because it tends to make them easier to read and to see their patterns.
For example, the same stabilizer generators as above for the 9-qubit Shor code look
like this without the tensor product symbols being written explicitly.
Z Z I I I I I I I
I Z Z I I I I I I
I I I Z Z I I I I
I I I I Z Z I I I
I I I I I I Z Z I
I I I I I I I Z Z
X X X X X X I I I
I I I X X X X X X
Here’s another example of a stabilizer code, known as the 7-qubit Steane code. It
has some remarkable features, and we’ll come back to this code from time to time
throughout the remaining lessons of the course.
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
X X X X I I I
X X I I X X I
X I X I X I X
For now, let’s simply observe that this is a valid stabilizer code. The first three
stabilizer generators clearly commute with one another, because Z commutes with
itself and the identity commutes with everything, and the situation is similar for
the last three stabilizer generators. It remains to check that if we take one of the Z
stabilizer generators (i.e., one of the first three) and one of the X stabilizer generators
(i.e., one of the last three), then these two generators commute, and one can go
through the 9 possible pairings to check that. In all of these cases, an X and a Z
Pauli matrix always line up in the same position an even number of times, so the
two generators will commute, just like X ⊗ X and Z ⊗ Z commute. This is also a
minimal generating set, and it defines a nontrivial code space, which are facts left
to you to contemplate.
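The pairwise commutation checks described above (including the 9 mixed Z/X pairings) can be done by brute force:

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
paulis = {"I": I2, "X": X, "Z": Z}

def op(label):
    """Tensor product of Pauli matrices named by a string such as 'ZZZZIII'."""
    return reduce(np.kron, (paulis[c] for c in label))

steane = ["ZZZZIII", "ZZIIZZI", "ZIZIZIZ",
          "XXXXIII", "XXIIXXI", "XIXIXIX"]

# Every pair of stabilizer generators for the Steane code commutes.
for a, b in combinations(steane, 2):
    A, B = op(a), op(b)
    assert np.allclose(A @ B, B @ A)
```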
The 7-qubit Steane code is similar to the 9-qubit Shor code in that it encodes a
single qubit and allows for the correction of an arbitrary error on one qubit, but it
requires only 7 qubits rather than 9.
5-qubit code
Seven is not the fewest number of qubits required to encode one qubit and protect
it against an arbitrary error on one qubit — here’s a stabilizer code that does this
using just 5 qubits.
X Z Z X I
I X Z Z X
X I X Z Z
Z X I X Z
This code is typically called the 5-qubit code. This is the smallest number of qubits in
a quantum error correcting code that can allow for the correction of an arbitrary
single-qubit error.
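As with the Steane code, the required commutation of these four generators is easy to confirm numerically:

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
paulis = {"I": I2, "X": X, "Z": Z}

def op(label):
    """Tensor product of Pauli matrices named by a string such as 'XZZXI'."""
    return reduce(np.kron, (paulis[c] for c in label))

five_qubit = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

# All pairs of 5-qubit code stabilizer generators commute.
for a, b in combinations(five_qubit, 2):
    A, B = op(a), op(b)
    assert np.allclose(A @ B, B @ A)
```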
Here’s another example of a stabilizer code, though it doesn’t actually encode any
qubits: the code space is one-dimensional. It is, however, still a valid stabilizer code
by the definition.
Z Z
X X
Specifically, the code space is the one-dimensional space spanned by an e-bit |ϕ+ ⟩.
Here's a related example of a stabilizer code whose code space is the one-dimensional space spanned by a GHZ state (|000⟩ + |111⟩)/√2.
Z Z I
I Z Z
X X X
A natural question at this point is this: for a given choice of stabilizer generators, how many qubits can be encoded? This question has a simple answer. Assuming that the n-qubit stabilizer generators P1, . . . , Pr satisfy the three requirements of the definition (namely, that the stabilizer generators all commute with one another, that this is a minimal generating set, and that the code space contains at least one quantum state vector), it must then be that the code space for this stabilizer code has dimension 2^{n−r}, so n − r qubits can be encoded using this code.
Intuitively speaking, we have n qubits to use for this encoding, and each stabi-
lizer generator effectively “takes a qubit away” in terms of how many qubits we
can encode. Note that this is not about which or how many errors can be detected
or corrected, it is only a statement about the dimension of the code space.
For example, for both the 3-bit repetition code and the modified version of that
code for phase-flip errors, we have n = 3 qubits and r = 2 stabilizer generators,
and therefore these codes can each encode 1 qubit. For another example, consider
the 5-qubit code: we have 5 qubits and 4 stabilizer generators, so once again the
code space has dimension 2, meaning that one qubit can be encoded using this code.
For one final example, the code whose stabilizer generators are X ⊗ X and Z ⊗ Z
has a one-dimensional code space, spanned by the state |ϕ+ ⟩, which is consistent
with having n = 2 qubits and r = 2 stabilizer generators.
Now let’s see how this fact can be proved. The first step is to observe that,
because the stabilizer generators commute, and because every Pauli operation is its
own inverse, every element in the stabilizer can be expressed as a product
P1^a1 · · · Pr^ar

for some choice of a1, . . . , ar ∈ {0, 1}. Next, for each k ∈ {1, . . . , r}, define a projection

Πk = (I⊗n + Pk)/2,

which projects onto the space of +1 eigenvectors of Pk.
The code space C is the subspace of all vectors that are fixed by all r of the stabilizer
generators P1 , . . . , Pr , or equivalently, all r of the projections Π1 , . . . , Πr .
Given that the stabilizer generators all commute with one another, the projec-
tions Π1 , . . . , Πr must also commute. This allows us to use a fact from linear algebra,
which is that the product of these projections is the projection onto the intersection
of the subspaces corresponding to the individual projections. That is to say, the
product Π1 · · · Πr is the projection onto the code space C .
We can now expand out the product Π1 · · · Πr using the formulas for these
projections to obtain the following expression.
Π1 · · · Πr = ((I⊗n + P1)/2) · · · ((I⊗n + Pr)/2) = (1/2^r) ∑_{a1,...,ar ∈ {0,1}} P1^a1 · · · Pr^ar
In words, the projection onto the code space of a stabilizer code is equal, as a matrix,
to the average over all of the elements in the stabilizer of that code.
Finally, we can compute the dimension of the code space by using the fact
that the dimension of any subspace is equal to the trace of the projection onto
that subspace. Thus, the dimension of the code space C is given by the following
formula.
dim(C) = Tr(Π1 · · · Πr) = (1/2^r) ∑_{a1,...,ar ∈ {0,1}} Tr(P1^a1 · · · Pr^ar)
The traces in this sum can be evaluated by considering two cases.

• For (a1, . . . , ar) = (0, . . . , 0), the product P1^a1 · · · Pr^ar is equal to I⊗n, and therefore Tr(P1^a1 · · · Pr^ar) = 2^n.

• For (a1, . . . , ar) ≠ (0, . . . , 0), the product P1^a1 · · · Pr^ar must be ±1 times a Pauli operation — but we cannot obtain I⊗n because this would contradict the minimality of the set {P1, . . . , Pr}, and we cannot obtain −I⊗n because the third condition on the stabilizer generators forbids it. Therefore, because the trace of every non-identity Pauli operation is zero, we obtain Tr(P1^a1 · · · Pr^ar) = 0.

Only the term corresponding to (a1, . . . , ar) = (0, . . . , 0) contributes to the sum, and so dim(C) = 2^n/2^r = 2^{n−r}.
As an aside, we can now see that the assumption that −I⊗n is not contained in
the stabilizer implies that the code space must contain at least one quantum state
vector. This is because, as we've just verified, this assumption implies that the code
space has dimension 2^{n−r}, which cannot be zero. The converse implication happens
to be trivial: if −I⊗n is contained in the stabilizer, then the code space can’t possibly
contain any quantum state vectors, because no nonzero vectors are fixed by this
operation.
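The formula dim(C) = Tr(Π1 · · · Πr) gives a direct numerical check of the 2^{n−r} count for the examples above:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
paulis = {"I": I2, "X": X, "Z": Z}

def op(label):
    """Tensor product of Pauli matrices named by a string such as 'ZZI'."""
    return reduce(np.kron, (paulis[c] for c in label))

def code_dimension(generators):
    """dim(C) = Tr(Pi_1 ... Pi_r), where Pi_k = (I + P_k)/2."""
    n = len(generators[0])
    Pi = np.eye(2 ** n, dtype=complex)
    for label in generators:
        Pi = Pi @ (np.eye(2 ** n) + op(label)) / 2
    return round(np.trace(Pi).real)

assert code_dimension(["ZZI", "IZZ"]) == 2                        # 2^(3-2)
assert code_dimension(["ZZ", "XX"]) == 1                          # 2^(2-2)
assert code_dimension(["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]) == 2  # 2^(5-4)
```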
Clifford operations
Clifford operations are unitary operations, on any number of qubits, that can
be implemented by quantum circuits with a restricted set of gates:
• Hadamard gates
• S gates
• CNOT gates
Notice that T gates are not included in the list, nor are Toffoli gates and Fredkin
gates. Not only are those gates not included in the list, but in fact, it’s not possible
to implement those gates using the ones listed here; they’re not Clifford operations.
Pauli operations, on the other hand, are Clifford operations because they can be
implemented with sequences of Hadamard and S gates.
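For instance, Z = S², X = HZH, and Y = SXS† with S† = S³, so all three Pauli gates are products of Hadamard and S gates:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])

assert np.allclose(S @ S, Z)           # Z = S^2
assert np.allclose(H @ Z @ H, X)       # X = H Z H
S_dag = np.linalg.matrix_power(S, 3)   # S^4 = I, so S-dagger = S^3
assert np.allclose(S @ X @ S_dag, Y)   # Y = S X S-dagger
```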
That’s a simple way to define Clifford operations, but it doesn’t explain why
they’re defined like this or what’s special about this particular collection of gates.
The real reason Clifford operations are defined like this is that, up to global phase
factors, the Clifford operations are precisely the unitary operations that always
transform Pauli operations into Pauli operations by conjugation. To be more precise,
an n-qubit unitary operation U is equivalent to a Clifford operation up to a phase
factor if, and only if, for every n-qubit Pauli operation P, we have
UPU † = ± Q
for some n-qubit Pauli operation Q. (Note that it is not possible to have UPU † = αQ
for α ∈
/ {+1, −1} when U is unitary and P and Q are Pauli operations. This follows
from the fact that the matrix on the left-hand side of the equation in question is
both unitary and Hermitian, and +1 and −1 are the only choices for α that allow
the right-hand side to be unitary and Hermitian as well.)
It is straightforward to verify the conjugation property just described when U is
a Hadamard, S, or CNOT gate. In particular, this is easy for Hadamard gates,

HXH† = Z, HYH† = −Y, HZH† = X,

and S gates,

SXS† = Y, SYS† = −X, SZS† = Z.
For CNOT gates, there are 15 non-identity Pauli operations on two qubits to check.
Naturally, they can be checked individually — but the relationships between CNOT
gates and X and Z gates listed (in circuit form) in the previous lesson, together with
the multiplication rules for Pauli matrices, offer a shortcut to the same conclusion.
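The full check is also a few lines of brute force. This sketch of my own (helper names are hypothetical) confirms that conjugating any two-qubit Pauli operation by a CNOT gate yields ± another Pauli operation:

```python
import numpy as np

# Single-qubit Pauli matrices, including the identity.
paulis = {
    'I': np.eye(2),
    'X': np.array([[0, 1], [1, 0]]),
    'Y': np.array([[0, -1j], [1j, 0]]),
    'Z': np.array([[1, 0], [0, -1]]),
}

# CNOT is real and self-inverse, so its conjugate transpose is itself.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

def is_signed_pauli(M):
    """True if M equals +Q or -Q for some two-qubit Pauli operation Q."""
    return any(np.allclose(M, s * np.kron(A, B))
               for s in (1, -1)
               for A in paulis.values()
               for B in paulis.values())

ok = all(is_signed_pauli(CNOT @ np.kron(A, B) @ CNOT)
         for A in paulis.values()
         for B in paulis.values())
print(ok)  # True
```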
Once we know this conjugation property is true for Hadamard, S, and CNOT
gates, we can immediately conclude that it is true for circuits composed of these
gates — which is to say, all Clifford operations.
It is more difficult to prove that the relationship works in the other direction,
which is that if a given unitary operation U satisfies the conjugation property for
Pauli operations, then it must be possible to implement it (up to a global phase)
using just Hadamard, S, and CNOT gates. This won’t be explained in this lesson,
but it is true.
Clifford operations are not universal for quantum computation: unlike circuits over a universal gate set, circuits composed of Clifford gates cannot approximate arbitrary unitary operations to any desired level of accuracy. Indeed, for a given value
of n, there are only finitely many n-qubit Clifford operations (up to phase factors).
Performing Clifford operations on standard basis states followed by standard basis
measurements also can’t allow us to perform computations that are outside of the
reach of classical algorithms — because we can efficiently simulate computations of
this form classically. This fact is known as the Gottesman–Knill theorem.
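The essential idea behind such simulations can be glimpsed in miniature: rather than tracking state vectors, track how the circuit transforms stabilizer generators by conjugation. This sketch (my own; the helper `name_of` is hypothetical) applies the idea to the circuit that prepares a Bell state from |00⟩:

```python
import numpy as np

paulis = {
    'I': np.eye(2),
    'X': np.array([[0, 1], [1, 0]]),
    'Y': np.array([[0, -1j], [1j, 0]]),
    'Z': np.array([[1, 0], [0, -1]]),
}

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# The circuit: a Hadamard gate on the top qubit, followed by a CNOT gate.
U = CNOT @ np.kron(H, paulis['I'])

def name_of(M):
    """Identify M as a signed two-qubit Pauli operation, e.g. '+XX'."""
    for sign, tag in ((1, '+'), (-1, '-')):
        for a, A in paulis.items():
            for b, B in paulis.items():
                if np.allclose(M, sign * np.kron(A, B)):
                    return tag + a + b
    return None

# |00> is stabilized by Z(x)I and I(x)Z; conjugating by U yields the
# stabilizer generators of U|00>, which is the |phi+> Bell state.
ZI = np.kron(paulis['Z'], paulis['I'])
IZ = np.kron(paulis['I'], paulis['Z'])
print(name_of(U @ ZI @ U.conj().T))  # +XX
print(name_of(U @ IZ @ U.conj().T))  # +ZZ
```

Tracking a handful of Pauli strings in this way, rather than exponentially large state vectors, is the intuition behind efficient classical simulation of Clifford circuits.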
A stabilizer code defines a code space of a certain dimension, and we have the
freedom to use that code space however we choose — nothing forces us to encode
qubits into this code space in a specific way. It is always possible, however, to use a
Clifford operation as an encoder, if we choose to do that. To be more precise, for any
424 LESSON 14. THE STABILIZER FORMALISM
stabilizer code that allows m qubits to be encoded into n qubits, there’s an n-qubit
Clifford operation U such that, for any m-qubit quantum state vector |ϕ⟩, we have
that

|ψ⟩ = U(|0^{n−m}⟩ ⊗ |ϕ⟩)
is a quantum state vector in the code space of our code that we may interpret as an
encoding of |ϕ⟩.
This is good because Clifford operations are relatively simple, compared with
arbitrary unitary operations, and there are ways to optimize their implementation
using techniques similar to ones found in the proof of the Gottesman–Knill theorem.
As a result, circuits for encoding states using stabilizer codes never need to be too
large. In particular, it is always possible to perform an encoding for an n-qubit
stabilizer code using a Clifford operation that requires O(n^2/log(n)) gates. This is
because every Clifford operation on n qubits can be implemented by a circuit of this
size.
For example, Figure 14.7 shows an encoder for the 7-qubit Steane code. It is
indeed a Clifford operation, and as it turns out, this one doesn’t even need S gates.
[Figure 14.7: An encoding circuit for the 7-qubit Steane code. The input α|0⟩ + β|1⟩ enters on the top wire, the remaining six wires are initialized to |0⟩, and the circuit uses only Hadamard and CNOT gates.]
Detecting errors
For an n-qubit stabilizer code described by stabilizer generators P1 , . . . , Pr , error
detection works in the following way.
To detect errors, all of the stabilizer generators are measured as observables.
There are r stabilizer generators, and therefore r measurement outcomes, each one
being +1 or −1 (or a binary value if we choose to associate 0 with +1 and 1 with
−1, respectively). We interpret the r outcomes collectively, as a vector or string, as
a syndrome. The syndrome (+1, . . . , +1) indicates that no error has been detected,
while at least one −1 somewhere within the syndrome indicates an error has been
detected.
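Concretely, the syndrome depends only on which generators the error anticommutes with. A small sketch of my own (helper names are hypothetical):

```python
def anticommute(p, q):
    """True if the Pauli strings p and q anticommute: they differ,
    with both letters non-identity, in an odd number of positions."""
    diff = sum(1 for a, b in zip(p, q)
               if a != 'I' and b != 'I' and a != b)
    return diff % 2 == 1

def syndrome(error, generators):
    """The syndrome as a tuple of +1/-1 outcomes, one per generator."""
    return tuple(-1 if anticommute(error, g) else +1 for g in generators)

# Stabilizer generators of the 3-qubit repetition code.
generators = ["ZZI", "IZZ"]

print(syndrome("XII", generators))  # (-1, 1): an error is detected
print(syndrome("III", generators))  # (1, 1): nothing is detected
```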
Suppose, in particular, that E is an n-qubit Pauli operation, representing a
hypothetical error. (We’re only considering Pauli operations as errors, by the way,
because the discretization of errors works the same way for arbitrary stabilizer
codes as it does for the 9-qubit Shor code.) There are three cases that determine
whether or not E is detected as an error.
1. The operation E is proportional to a stabilizer element: E = ±Q for some Q ∈ ⟨P1, . . . , Pr⟩.
In this case, E must commute with every stabilizer generator, so we obtain the
syndrome (+1, . . . , +1). This means that E is not detected as an error.
2. The operation E is not proportional to an element in the stabilizer, but it
nevertheless commutes with every stabilizer generator.
This is an error that changes vectors in the code space in some nontrivial way.
But, because E commutes with every stabilizer generator, the syndrome is
(+1, . . . , +1), so E goes undetected by the code.
3. The operation E anti-commutes with at least one of the stabilizer generators.
The syndrome is different than (+1, . . . , +1), so the error E is detected by the
code.
In the first case, the error E is not a concern because this operation does nothing
to vectors in the code space, except to possibly inject an irrelevant global phase:
E|ψ⟩ = ±|ψ⟩ for every encoded state |ψ⟩. In essence, this is not actually an error —
whatever nontrivial action E may have happens outside of the code space — so it’s
good that E is not detected as an error, because nothing needs to be done about it.
The second case, intuitively speaking, is the bad case. It’s the anti-commutation
of an error with a stabilizer generator that causes a −1 to appear somewhere in
the syndrome, signaling an error, but that doesn’t happen in this case. So, we
have an error E that does change vectors in the code space in some nontrivial way,
but it goes undetected by the code. For example, for the 3-bit repetition code, the
operation E = X ⊗ X ⊗ X falls into this category.
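The three cases can be made concrete for the 3-bit repetition code. In this sketch (my own classifier; for simplicity it ignores ± phases when testing membership in the stabilizer):

```python
def anticommute(p, q):
    """True if the Pauli strings p and q anticommute."""
    diff = sum(1 for a, b in zip(p, q)
               if a != 'I' and b != 'I' and a != b)
    return diff % 2 == 1

# The 3-bit repetition code: generators ZZI and IZZ, whose products
# (ignoring signs) form the four-element stabilizer below.
generators = ["ZZI", "IZZ"]
stabilizer = {"III", "ZZI", "IZZ", "ZIZ"}

def classify(error):
    if any(anticommute(error, g) for g in generators):
        return 3  # detected: at least one -1 appears in the syndrome
    if error in stabilizer:
        return 1  # harmless: acts trivially on the code space
    return 2      # undetected logical error

print(classify("ZIZ"))  # 1
print(classify("XXX"))  # 2
print(classify("XII"))  # 3
```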
The fact that such an error E must change some vectors in the code space in a
non-trivial way can be argued as follows. By the assumption that E commutes with
P1 , . . . , Pr but is not proportional to a stabilizer element, we can conclude that we
would obtain a new, valid stabilizer code by including E as a stabilizer generator
along with P1 , . . . , Pr . The code space for this new code, however, has only half the
dimension of the original code space, from which we can conclude that the action
of E on the original code space cannot be proportional to the identity operation.
For the last of the three cases, which is that the error E anti-commutes with
at least one stabilizer generator, the syndrome has at least one −1 somewhere in
it, which indicates that something is wrong. As we have already discussed, the
syndrome won’t uniquely identify E in general, so it’s still necessary to choose
a correction operation for each syndrome, which might or might not correct the
error E. We’ll discuss this step shortly, in the last part of the lesson.
As an example, let’s consider the 7-qubit Steane code. Here are the stabilizer
generators for this code:
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
X X X X I I I
X X I I X X I
X I X I X I X
To see why the distance is 3, suppose that E is a Pauli operation having weight at most 2 that commutes with every stabilizer generator, and write E = P ⊗ Q ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I for Pauli matrices P and Q. (Other placements of the two possibly non-identity tensor factors can be handled in a symmetric way.) Consider the third and sixth stabilizer generators in the list:

Z⊗I⊗Z⊗I⊗Z⊗I⊗Z
X⊗I⊗X⊗I⊗X⊗I⊗X
The tensor factor Q in our error E lines up with the identity matrix in both of these
stabilizer generators (which is why they were selected). Given that we have identity
matrices in the rightmost 5 positions of E, we conclude that P must commute with
X and Z, for otherwise E would anti-commute with one of the two generators.
However, the only Pauli matrix that commutes with both X and Z is the identity
matrix, so P = I.
Now that we know this, we can choose two more stabilizer generators that have
an X and a Z in the second position from left, and we draw a similar conclusion:
Q = I. It is therefore the case that E is the identity operation.
So, there’s no way for an error having weight at most 2 to go undetected by this
code, unless the error is the identity operation (which is in the stabilizer and there-
fore not actually an error). On the other hand, there are weight 3 Pauli operations
that commute with all six of these stabilizer generators, but aren’t proportional to
stabilizer elements, such as I ⊗ I ⊗ I ⊗ I ⊗ X ⊗ X ⊗ X and I ⊗ I ⊗ I ⊗ I ⊗ Z ⊗ Z ⊗ Z.
This establishes that this code has distance 3, as claimed.
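This distance computation is small enough to check exhaustively. A brute-force sketch of my own, over all 210 Pauli operations of weight 1 or 2:

```python
from itertools import combinations, product

def anticommute(p, q):
    """True if the Pauli strings p and q anticommute."""
    diff = sum(1 for a, b in zip(p, q)
               if a != 'I' and b != 'I' and a != b)
    return diff % 2 == 1

generators = ["ZZZZIII", "ZZIIZZI", "ZIZIZIZ",
              "XXXXIII", "XXIIXXI", "XIXIXIX"]

def detected(error):
    return any(anticommute(error, g) for g in generators)

# Enumerate every non-identity Pauli operation of weight 1 or 2.
errors = []
for k in (1, 2):
    for positions in combinations(range(7), k):
        for letters in product("XYZ", repeat=k):
            e = ['I'] * 7
            for pos, letter in zip(positions, letters):
                e[pos] = letter
            errors.append(''.join(e))

print(len(errors), all(detected(e) for e in errors))  # 210 True
print(detected("IIIIXXX"), detected("IIIIZZZ"))       # False False
```

Every weight-1 or weight-2 error is detected, while the two weight-3 operations named above slip through, matching the distance-3 claim.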
Correcting errors
The last topic of discussion for this lesson is the correction of errors for stabilizer
codes. As usual, assume that we have a stabilizer code specified by n-qubit stabilizer
generators P1 , . . . , Pr .
The n-qubit Pauli operations, as errors that could affect states encoded using this
code, are partitioned into equal-sized collections according to which syndrome they
cause to appear. There are 2^r distinct syndromes and 4^n Pauli operations, which means there are 4^n/2^r Pauli operations causing each syndrome. Any one of these errors could be responsible for the corresponding syndrome.
However, among the 4^n/2^r Pauli operations that cause each syndrome, there are
some that should be considered as being equivalent. In particular, if the product
of two Pauli operations is proportional to a stabilizer element, then those two
operations are effectively equivalent as errors.
Another way to say this is that if we apply a correction operation C to attempt
to correct an error E, then this correction succeeds so long as the composition CE is
proportional to a stabilizer element. Given that there are 2^r elements in the stabilizer, it follows that each correction operation C corrects 2^r different Pauli errors. This leaves 4^{n−r} inequivalent classes of Pauli operations, considered as errors, that are
consistent with each possible syndrome.
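As a sanity check of this counting (a sketch; phases of Pauli operations are ignored, as in the text), here are the numbers for the 7-qubit Steane code, which has n = 7 and r = 6:

```python
# Syndrome and error counts for an n-qubit stabilizer code with r
# stabilizer generators, here the 7-qubit Steane code.
n, r = 7, 6

syndromes = 2 ** r                    # distinct syndromes
paulis = 4 ** n                       # n-qubit Pauli operations
per_syndrome = paulis // syndromes    # Pauli errors causing each syndrome
classes = per_syndrome // (2 ** r)    # inequivalent classes, 4**(n - r)

print(syndromes, per_syndrome, classes)  # 64 256 4
```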
This means that, unless n = r (in which case we have a trivial, one-dimensional
code space), we can’t possibly correct every error detected by a stabilizer code. What
we must do instead is to choose just one correction operation for each syndrome, in
the hopes of correcting just one class of equivalent errors that cause this syndrome.
One natural strategy for choosing which correction operation to perform for
each syndrome is to choose the lowest weight Pauli operation that, as an error, causes
that syndrome. There may in fact be multiple operations that tie for the lowest
weight error consistent with a given syndrome, in which case any one of them may
be selected. The idea is that lower-weight Pauli operations represent more likely
explanations for whatever syndrome has been measured. This might actually not
be the case for some noise models, and one alternative strategy is to compute the
most likely error that causes the given syndrome, based on the chosen noise model.
For this lesson, however, we’ll keep things simple and only consider lowest-weight
corrections.
For a distance d stabilizer code, this strategy of choosing the correction operation
to be a lowest weight Pauli operation consistent with the measured syndrome
always allows for the correction of errors having weight strictly less than half of d, or in other words, weight at most ⌊(d − 1)/2⌋. This shows, for instance, that
the 7-qubit Steane code can correct for any weight-one Pauli error, and by the
discretization of errors, this means that the Steane code can correct for an arbitrary
error on one qubit.
To see how this works, consider the diagram in Figure 14.8. The circle on the
left represents all of the Pauli operations that result in the syndrome (+1, . . . , +1),
which is the syndrome that suggests that no errors have occurred and nothing is
wrong. Among these operations we have elements that are proportional to elements
of the stabilizer, and we also have non-trivial errors that change the code space in
some way but aren’t detected by the code. By the definition of distance, every Pauli operation in this category must have weight at least d, because d is defined as the minimum weight of these operations.

[Figure 14.8: Two circles of Pauli operations. The left circle, for the syndrome (+1, . . . , +1), contains the ± stabilizer elements along with undetected errors having weight at least d; the right circle, for a syndrome s ≠ (+1, . . . , +1), contains the error E (weight < d/2) and the correction C, and the product CE (weight < d) lies in the left circle.]
The circle on the right represents the Pauli operations that result in a different
syndrome s ̸= (+1, . . . , +1), including an error E having weight strictly less than
d/2 that we will consider. The correction operation C chosen for the syndrome s is
the lowest weight Pauli operation in the collection represented by the circle on the
right in the diagram (or any one of them in case there’s a tie). So, it could be that
C = E, but not necessarily. What we can say for certain, however, is that C cannot
have weight larger than the weight of E, because C has minimal weight among the
operations in this collection — and therefore C has weight strictly less than d/2.
Now consider what happens when the correction operation C is applied to
whatever state we obtained after the error E takes place. Assuming that the original
encoding was |ψ⟩, we’re left with CE|ψ⟩. Our goal will be to show that CE is
proportional to an element in the stabilizer, implying that the correction is successful
and (up to a global phase) we’re left with the original encoded state |ψ⟩.
First, because E and C cause the same syndrome, the composition CE must
commute with every stabilizer generator. In particular, if Pk is any one of the stabilizer generators, then we must have

Pk E = α E Pk    and    Pk C = α C Pk

for the same value of α ∈ {+1, −1}, because this is the k-th entry in the syndrome s that both C and E generate. Hence, we have

Pk (CE) = α C Pk E = α^2 (CE) Pk = (CE) Pk ,

so Pk commutes with CE. We’ve therefore shown that CE belongs in the circle on the left in the diagram, because it generates the syndrome (+1, . . . , +1).
Second, the composition CE must have weight at most the sum of the weights
of C and E — which follows from a moment’s thought about products of Pauli
operations — and therefore the weight of CE is strictly less than d. This implies
that CE is proportional to an element in the stabilizer of our code, which is what
we wanted to show. By choosing our correction operations to be lowest-weight
representatives of the set of errors that generate each syndrome, we’re therefore
guaranteed to correct any Pauli errors having weight less than half of the distance
of the code.
There is one problem, however. For stabilizer codes in general, it’s a computa-
tionally difficult problem to compute the lowest weight Pauli operation causing a
given syndrome. (Indeed, this is true even for classical codes, which in this context
we can think of as stabilizer codes where we only have I and Z matrices appearing
as tensor factors within the stabilizer generators.) So, unlike the encoding step,
Clifford operations will not be coming to our rescue this time.
The solution is to choose specific codes for which good corrections can be computed efficiently, and there’s no simple recipe for that. Simply put, devising
stabilizer codes for which good correction operations can be computed efficiently is
part of the artistry of quantum code design. We’ll see this artistry on display in the
next lesson.
Lesson 15
We’ve seen a few examples of quantum error correcting codes in previous lessons of
this unit, including the 9-qubit Shor code, the 7-qubit Steane code, and the 5-qubit
code. These codes are undoubtedly interesting and represent a natural place to
begin an exploration of quantum error correction, but a problem with them is that
they can only tolerate a very low error rate. Correcting an error on one qubit out
of five, seven, or nine isn’t bad, but in all likelihood we’re going to need to be able
to tolerate a lot more errors than that to make large-scale quantum computing a
reality.
In this lesson, we’ll take a first look at some more sophisticated quantum error
correcting code constructions, including codes that can tolerate a much higher error
rate than the ones we’ve seen so far, and that are viewed as promising candidates
for practical quantum error correction.
We’ll begin with a class of quantum error correcting codes known as CSS codes,
named for Robert Calderbank, Peter Shor, and Andrew Steane, who first discovered
them. The CSS code construction allows one to take certain pairs of classical error
correcting codes and combine them into a single quantum error correcting code.
The second part of the lesson is on a code known as the toric code. This is a
fundamental (and truly beautiful) example of a quantum error correcting code that
can tolerate relatively high error rates. In fact, the toric code isn’t a single example
of a quantum error correcting code, but rather it’s an infinite family of codes, one
for each positive integer greater than one.
Finally, in the last part of the lesson, we’ll briefly discuss a couple of other
families of quantum codes, including surface codes (which are closely connected to
the toric code) and color codes.
433
434 LESSON 15. QUANTUM CODE CONSTRUCTIONS
A classical linear code is a set of binary strings, all having the same length, that is closed under the bitwise exclusive-OR (XOR) operation. For instance, the 3-bit repetition code {000, 111} is a classical linear code, as these calculations verify.

000 ⊕ 000 = 000, 000 ⊕ 111 = 111, 111 ⊕ 000 = 111, 111 ⊕ 111 = 000.
Here’s another example of a classical linear code called the [7, 4, 3]-Hamming code.
It was one of the very first classical error correcting codes ever discovered, and it
consists of the following 16 binary strings of length 7.

0000000, 0000111, 0011001, 0011110,
0101010, 0101101, 0110011, 0110100,
1001011, 1001100, 1010010, 1010101,
1100001, 1100110, 1111000, 1111111

(Sometimes the [7, 4, 3]-Hamming code is understood to mean the code with these strings reversed, but we’ll take it to be the code containing the strings shown here.)
There is very simple logic behind the selection of these strings, but it’s secondary to
the lesson and won’t be explained here. For now, it’s enough to observe that this is
a classical linear code: XORing any two of these strings together will always result
in another string in the code.
The notation [7, 4, 3] (in single square brackets) means something analogous to
the double square bracket notation for stabilizer codes mentioned in the previous
lesson, but here it’s for classical linear codes. In particular, codewords have 7 bits,
we can encode 4 bits using the code (because there are 16 = 2^4 codewords), and
it happens to be a distance 3 code, which means that any two distinct codewords
must differ in at least 3 positions — so at least 3 bits must be flipped to change one
codeword into another. The fact that this is a distance 3 code implies that it can
correct for up to one bit-flip error.
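These properties of the [7, 4, 3]-Hamming code can be verified by brute force. A sketch of my own, using one valid generating set for the code (the same four strings listed a bit later in this section):

```python
from itertools import combinations, product

def xor(u, v):
    """Bitwise XOR of two equal-length binary strings."""
    return ''.join('1' if a != b else '0' for a, b in zip(u, v))

generators = ["1111000", "0110100", "1010010", "1100001"]

# The code is the span of the generators: XOR together every subset.
code = set()
for bits in product([0, 1], repeat=len(generators)):
    word = "0000000"
    for bit, g in zip(bits, generators):
        if bit:
            word = xor(word, g)
    code.add(word)

print(len(code))                                                 # 16
print(all(xor(u, v) in code for u, v in combinations(code, 2)))  # True
# For a linear code the distance is the minimum nonzero weight.
print(min(w.count('1') for w in code if w != "0000000"))         # 3
```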
The examples just mentioned are very simple examples of classical linear codes, but
even the [7, 4, 3]-Hamming code looks somewhat mysterious when the codewords
are simply listed. There are better, more efficient ways to describe classical linear
codes, including the following two ways.
Generators. One way to describe a classical linear code is with a minimal list of
codewords that generates the code, meaning that by taking all of the possible subsets
of these codewords and XORing them together, we get the entire code.
That is, the strings u1, . . . , um ∈ Σ^n generate the classical linear code C if

C = {α1 u1 ⊕ · · · ⊕ αm um : α1, . . . , αm ∈ {0, 1}},

and this set of strings is minimal if removing any one of them results in a smaller code.

Parity checks. Another way to describe a classical linear code C is with a minimal list of parity check strings v1, . . . , vr ∈ Σ^n, where a string u belongs to C if and only if the binary dot product of u with each of the strings v1, . . . , vr is zero. (This time, the list is minimal if removing any one of the strings results in a larger code.) These are called parity check strings because u has binary dot product equal to zero with v if and only if the bits of u in positions where v has 1s have even parity. So, to determine if a string u is in the code C, it suffices to check the parity of certain subsets of the bits of u.
An important thing to notice here is that the binary dot product is not an inner
product in a formal sense. In particular, when two strings have binary dot product
equal to zero, it doesn’t mean that they’re orthogonal in the usual way we think
about orthogonality. For example, the binary dot product of the string 11 with itself
is zero — so it is possible that a parity check string for a classical linear code is itself
in the code.
Classical linear codes over the binary alphabet always include a number of
strings that’s a power of 2 — and for a single classical linear code specified in the two different ways just described, it will always be the case that n = m + r. In
15.1. CSS CODES 437
particular, if we have a minimal set of m generators, then the code encodes m bits and we’ll necessarily have 2^m codewords; and if we have a minimal set of r parity check strings, then we’ll have 2^{n−r} codewords. So, each generator doubles the size
of the code space while each parity check string halves the size of the code space.
For example, the 3-bit repetition code is a linear code, so it can be described in
both of these ways. In particular, there’s only one choice for a generator that works:
111. We can alternatively describe the code with two parity check strings, such as
110 and 011 — which should look familiar from our previous discussions of this
code — or we could instead take the parity check strings to be 110 and 101, or 101
and 011. (Generators and parity check strings are generally not unique for a given
classical linear code.)
For a second example, consider the [7, 4, 3]-Hamming code. Here’s one choice
for a list of generators that works.
1111000
0110100
1010010
1100001
And here’s a choice for a list of parity checks for this code.
1111000
1100110
1010101
Here, by the way, we see that all of our parity check strings are themselves in the
code.
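The two descriptions can be checked against each other programmatically; a sketch of my own (helper names are hypothetical):

```python
from itertools import product

def xor(u, v):
    """Bitwise XOR of two equal-length binary strings."""
    return ''.join('1' if a != b else '0' for a, b in zip(u, v))

def dot(u, v):
    """Binary dot product of two binary strings."""
    return sum(int(a) * int(b) for a, b in zip(u, v)) % 2

generators = ["1111000", "0110100", "1010010", "1100001"]
checks = ["1111000", "1100110", "1010101"]

# The code as the span of the generators...
from_generators = set()
for bits in product([0, 1], repeat=len(generators)):
    word = "0000000"
    for bit, g in zip(bits, generators):
        if bit:
            word = xor(word, g)
    from_generators.add(word)

# ...and as the strings passing every parity check.
from_checks = {''.join(w) for w in product("01", repeat=7)
               if all(dot(''.join(w), c) == 0 for c in checks)}

print(from_generators == from_checks)         # True
print(all(c in from_checks for c in checks))  # True: the checks are codewords
```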
One final remark about classical linear codes, which connects them to the stabi-
lizer formalism, is that parity check strings are equivalent to stabilizer generators
that only consist of Z and identity Pauli matrices. For instance, the parity check
strings 110 and 011 for the 3-bit repetition code correspond precisely to the stabilizer
generators Z ⊗ Z ⊗ I and I ⊗ Z ⊗ Z, which is consistent with the discussions of
Pauli observables from the previous lesson.
The CSS code construction doesn’t work for an arbitrary pair of classical linear codes — the two codes must have a certain relationship. Nevertheless, this construction
opens up many possibilities for quantum error correcting codes, based in part on
over 75 years of classical coding theory.
In the stabilizer formalism, stabilizer generators containing only Z and identity
Pauli matrices are equivalent to parity checks, as we just observed for the 3-bit
repetition code. For another example, consider the following parity check strings
for the [7, 4, 3]-Hamming code.
1111000
1100110
1010101
These parity check strings correspond to the following stabilizer generators (written
without tensor product symbols), which we obtain by replacing each 1 by a Z and
each 0 by an I. These are three of the six stabilizer generators for the 7-qubit Steane
code.
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
Let us give the name Z stabilizer generators to stabilizer generators like this, meaning
that they only have Pauli Z and identity tensor factors — so X and Y Pauli matrices
never occur in Z stabilizer generators.
We can also consider stabilizer generators where only X and identity Pauli
matrices appear as tensor factors. Stabilizer generators like this can be viewed as
being analogous to Z stabilizer generators, except that they describe parity checks
in the {|+⟩, |−⟩} basis rather than the standard basis. Stabilizer generators of this
form are called X stabilizer generators — so no Y or Z Pauli matrices are allowed this
time.
For example, consider the remaining three stabilizer generators from the 7-qubit
Steane code.
X X X X I I I
X X I I X X I
X I X I X I X
They follow exactly the same pattern from the [7, 4, 3]-Hamming code as the Z
stabilizer generators, except this time we substitute X for 1 rather than Z. What we
obtain from just these three stabilizer generators is a code that includes the 16 states
shown here, which we get by applying Hadamard operations to the standard basis
states that correspond to the strings in the [7, 4, 3]-Hamming code. (Of course, the
code space for this code also includes linear combinations of these states.)
|+ + + + + + +⟩ |− − + + + + −⟩ |− + − + + − +⟩ |+ − − + + − −⟩
|+ − − + − + +⟩ |− + − + − + −⟩ |− − + + − − +⟩ |+ + + + − − −⟩
|− − − − + + +⟩ |+ + − − + + −⟩ |+ − + − + − +⟩ |− + + − + − −⟩
|− + + − − + +⟩ |+ − + − − + −⟩ |+ + − − − − +⟩ |− − − − − − −⟩
CSS codes
A CSS code is a stabilizer code that can be expressed using only X and Z
stabilizer generators.
That is, CSS codes are stabilizer codes for which we have stabilizer generators in
which no Pauli Y matrices appear, and for which X and Z never appear in the same
stabilizer generator.
To be clear, by this definition, a CSS code is one for which it is possible to
choose just X and Z stabilizer generators — but we must keep in mind that there is
freedom in how we choose stabilizer generators for stabilizer codes. Thus, there
will generally be different choices for the stabilizer generators of a CSS code that
don’t happen to be X or Z stabilizer generators (in addition to at least one choice
for which they are).
Here’s a very simple example of a CSS code that includes both a Z stabilizer
generator and an X stabilizer generator:
Z Z
X X
It’s clear that this is a CSS code, because the first stabilizer generator is a Z stabilizer
generator and the second is an X stabilizer generator. Of course, a CSS code
must also be a valid stabilizer code — meaning that the stabilizer generators must
commute, form a minimal generating set, and fix at least one quantum state vector.
These requirements happen to be simple to observe for this code. As we noted
in the previous lesson, the code space for this code is the one-dimensional space
spanned by the |ϕ+⟩ Bell state, and it can be checked directly that both stabilizer generators fix this state.
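A short numerical check (my own sketch) that Z⊗Z and X⊗X commute and both fix |ϕ+⟩:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

ZZ = np.kron(Z, Z)
XX = np.kron(X, X)
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)  # the |phi+> Bell state

print(np.allclose(ZZ @ phi_plus, phi_plus))  # True: Z(x)Z fixes |phi+>
print(np.allclose(XX @ phi_plus, phi_plus))  # True: X(x)X fixes |phi+>
print(np.allclose(ZZ @ XX, XX @ ZZ))         # True: the generators commute
```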
The 7-qubit Steane code is another example of a CSS code; here are its stabilizer generators once again.
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
X X X X I I I
X X I I X X I
X I X I X I X
Here we have three Z stabilizer generators and three X stabilizer generators, and
we’ve already verified that this is a valid stabilizer code.
And the 9-qubit Shor code is another example.
Z Z I I I I I I I
I Z Z I I I I I I
I I I Z Z I I I I
I I I I Z Z I I I
I I I I I I Z Z I
I I I I I I I Z Z
X X X X X X I I I
I I I X X X X X X
This time we have six Z stabilizer generators and just two X stabilizer generators.
This is fine; there doesn’t need to be a balance or a symmetry between the two types
of generators (though there often is).
Once again, it is critical that CSS codes are valid stabilizer codes, and in particular
each Z stabilizer generator must commute with each X stabilizer generator. So, not
every collection of X and Z stabilizer generators defines a valid CSS code.
Consider once again the stabilizer generators for the 7-qubit Steane code.
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
X X X X I I I
X X I I X X I
X I X I X I X
The basic idea for this code is now apparent: it’s a [7, 4, 3]-Hamming code for bit-flip
errors and a [7, 4, 3]-Hamming code for phase-flip errors. The fact that the X and
Z stabilizer generators commute is perhaps good fortune, for this wouldn’t be a
valid stabilizer code if they didn’t. But there are, in fact, many known examples of
classical linear codes that yield a valid stabilizer code when used in a similar way.
In general, suppose we have a CSS code for which the Z stabilizer generators
allow for the correction of up to j bit-flip errors, and the X stabilizer generators
allow for the correction of up to k phase-flip errors. For example, j = 1 and k = 1
for the Steane code, given that the [7, 4, 3]-Hamming code can correct one bit-flip. It
then follows, by the discretization of errors, that this CSS code can correct for any
error on a number of qubits up to the minimum of j and k. This is because, when
the syndrome is measured, an arbitrary error on this number of qubits effectively
collapses probabilistically into some combination of X errors, Z errors, or both —
and then the X errors and Z errors are detected and corrected independently.
In summary, provided that we have two classical linear codes (or two copies of a
single classical linear code) that are compatible, in that they define X and Z stabilizer
generators that commute, the CSS code we obtain by combining them inherits the
error correction properties of those two codes, in the sense just described.
Notice that there is a price to be paid though, which is that we can’t encode as
many qubits as we could bits with the two classical codes. This is because the
total number of stabilizer generators for the CSS code is the sum of the number
of parity checks for the two classical linear codes, and each stabilizer generator
cuts the dimension of the code space in half. For example, the [7, 4, 3]-Hamming
code allows for the encoding of four classical bits, because we have just three parity
check strings for this code, whereas the 7-qubit Steane code only encodes one qubit,
because it has six stabilizer generators.
Now let’s examine the code spaces of CSS codes in greater detail. Let z1, . . . , zs ∈ Σ^n be the n-bit parity check strings corresponding to the Z stabilizer generators of a given CSS code, so that the classical linear code corresponding to these generators takes the following form.

C Z = {u ∈ Σ^n : u · z1 = · · · = u · zs = 0}
In words, the classical linear code C Z contains every string whose binary dot product
with every one of the parity check strings z1 , . . . , zs is zero.
Along similar lines, let us take x1, . . . , xt ∈ Σ^n to be the n-bit parity check strings
corresponding to the X stabilizer generators of our code. Thus, the classical linear
code corresponding to the X stabilizer generators takes this form.
C X = {u ∈ Σ^n : u · x1 = · · · = u · xt = 0}
The X stabilizer generators alone therefore describe a code that’s similar to this
code, but in the {|+⟩, |−⟩} basis rather than the standard basis.
Now we’ll introduce two new classical linear codes that are derived from the
same choices of strings, z1 , . . . , zs and x1 , . . . , xt , but where we take these strings as
generators rather than parity check strings. In particular, we obtain these two codes.
D Z = {α1 z1 ⊕ · · · ⊕ αs zs : α1, . . . , αs ∈ {0, 1}}
D X = {α1 x1 ⊕ · · · ⊕ αt xt : α1, . . . , αt ∈ {0, 1}}
These are known as the dual codes of the codes defined previously: D Z is the dual
code of C Z and D X is the dual code of C X . It may not be clear at this point why these
dual codes are relevant, but they turn out to be quite relevant for multiple reasons,
including the two reasons explained in the following paragraphs.
First, the conditions that must hold for two classical linear codes C Z and C X
to be compatible, in the sense that they can be paired together to form a CSS
code, can be described in simple terms by referring to the dual codes. Specifically,
it must be that D Z ⊆ C X , or equivalently, that D X ⊆ C Z . In words, the dual
code D Z includes the strings corresponding to Z stabilizer generators, and their
containment in C X is equivalent to the binary dot product of each of these strings
with the ones corresponding to the X stabilizer generators being zero. That, in
turn, is equivalent to each Z stabilizer generator commuting with each X stabilizer
generator. Alternatively, by reversing the roles of the X and Z stabilizer generators
and starting from the containment D X ⊆ C Z , we can reach the same conclusion.
Second, by referring to the dual codes, we can easily describe the code spaces
of a given CSS code. In particular, the code space is spanned by vectors of the
following form.
|u ⊕ D X⟩ = (1/√(2^t)) ∑_{v ∈ D X} |u ⊕ v⟩     (for all u ∈ C Z)
In words, these vectors are uniform superpositions over the strings in the dual code
D X of the code corresponding to the X stabilizer generators, shifted by (in other
words, bitwise XORed with) strings in the code C Z corresponding to the Z stabilizer
generators. To be clear, different choices for the shift — represented by the string u
in this expression — can result in the same vector. So, these states aren’t all distinct,
but collectively they span the entire code space.
Here’s an intuitive explanation for why such vectors are both in the code space
and span it. Consider the n-qubit standard basis state |u⟩, for some arbitrary n-bit
string u, and suppose that we project this state onto the code space. That is to say,
letting Π denote the projection onto the code space of our CSS code, consider the
vector Π|u⟩. There are two cases:
Case 1: u ∈ C Z . This implies that each Z stabilizer generator of our CSS code acts
trivially on |u⟩. The X stabilizer generators, on the other hand, each simply flip
some of the bits of |u⟩. In particular, for each generator v of D X , the X stabilizer
generator corresponding to v transforms |u⟩ into |u ⊕ v⟩. By characterizing the
projection Π as the average over the elements of the stabilizer (as we saw in the
previous lesson), we obtain this formula:
Π|u⟩ = (1/2^t) ∑_{v ∈ D X} |u ⊕ v⟩ = (1/√(2^t)) |u ⊕ D X⟩.
Case 2: u ∉ C Z. This implies that at least one of the parity checks corresponding to
the Z stabilizer generators fails, which is to say that |u⟩ must be a −1 eigenvector
of at least one of the Z stabilizer generators. The code space of the CSS code is
the intersection of the +1 eigenspaces of the stabilizer generators. So, as a −1
eigenvector of at least one of the Z stabilizer generators, |u⟩ is therefore orthogonal
to the code space:
Π|u⟩ = 0.
And now, as we range over all n-bit strings u, discard the ones for which
Π|u⟩ = 0, and normalize the remaining ones, we obtain the vectors described
previously, which demonstrates that they span the code space.
We can also use the symmetry between X and Z stabilizer generators to describe
the code space in a similar but different way. In particular, it is the space spanned
by vectors having the following form.
H^{⊗n} |u ⊕ D_Z⟩ = (1/√(2^s)) ∑_{v ∈ D_Z} H^{⊗n} |u ⊕ v⟩    (for u ∈ C_X)

Here, 2^s denotes the number of strings in D_Z.
In essence, X and Z have been swapped in each instance in which they appear —
but we must also swap the standard basis for the {|+⟩, |−⟩} basis, which is why
the Hadamard operations are included.
As an example, let us consider the 7-qubit Steane code. The parity check strings
for both the X and Z stabilizer generators are the same: 1111000, 1100110, and
1010101. The codes C X and C Z are therefore the same; both are equal to the [7, 4, 3]-
Hamming code.
C_X = C_Z = { 0000000, 0000111, 0011001, 0011110,
              0101010, 0101101, 0110011, 0110100,
              1001011, 1001100, 1010010, 1010101,
              1100001, 1100110, 1111000, 1111111 }
The dual codes D X and D Z are therefore also the same. We have three generators,
so we obtain eight strings.
D_X = D_Z = { 0000000, 0011110, 0101101, 0110011,
              1001011, 1010101, 1100110, 1111000 }
These strings are all contained in the [7, 4, 3]-Hamming code, and so the CSS condi-
tion is satisfied: D Z ⊆ C X , or equivalently, D X ⊆ C Z .
Given that D X contains half of all of the strings in C Z , there are only two different
vectors |u ⊕ D X ⟩ that can be obtained by choosing u ∈ C Z . This is expected, because
the 7-qubit Steane code has a two-dimensional code space. We can use the two
states we obtain in this way to encode the logical states |0⟩ and |1⟩ as follows.
|0⟩ ↦ (1/√8) (|0000000⟩ + |0011110⟩ + |0101101⟩ + |0110011⟩ + |1001011⟩ + |1010101⟩ + |1100110⟩ + |1111000⟩)

|1⟩ ↦ (1/√8) (|1111111⟩ + |1100001⟩ + |1010010⟩ + |1001100⟩ + |0110100⟩ + |0101010⟩ + |0011001⟩ + |0000111⟩)
As usual, this choice isn’t forced on us — we’re free to use the code space to
encode qubits however we choose. This encoding is, however, consistent with the
example of an encoding circuit for the 7-qubit Steane code in the previous lesson.
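This calculation can be verified directly. The following Python sketch, which assumes only the three parity-check strings quoted above, builds C_Z as the set of strings passing every check, builds D_X as the span of the checks, and confirms the CSS condition together with the coset count.

```python
from itertools import product

checks = ["1111000", "1100110", "1010101"]   # the parity-check strings above

def dot(a, b):
    """GF(2) inner product of two bit strings."""
    return sum(int(x) & int(y) for x, y in zip(a, b)) % 2

# C_Z: all 7-bit strings passing every parity check (the [7,4,3]-Hamming code).
CZ = {format(v, "07b") for v in range(2 ** 7)
      if all(dot(format(v, "07b"), h) == 0 for h in checks)}

# D_X: the span of the three check strings (eight strings in total).
DX = set()
for coeffs in product([0, 1], repeat=3):
    v = 0
    for c, h in zip(coeffs, checks):
        if c:
            v ^= int(h, 2)
    DX.add(format(v, "07b"))

assert len(CZ) == 16 and len(DX) == 8
assert DX <= CZ                              # the CSS condition: D_X ⊆ C_Z
cosets = {frozenset(format(int(u, 2) ^ int(w, 2), "07b") for w in DX) for u in CZ}
print(len(cosets))                           # 2: the code encodes a single qubit
```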
• The stabilizer generators have low weight, and in particular they all have
weight four. In coding theory parlance, the toric code is an example of a quantum low-density parity check code, or quantum LDPC code (where low means
4 in this case). This is nice because each stabilizer generator measurement
doesn’t need to involve too many qubits.
• The toric code has geometric locality. This means that not only do the stabi-
lizer generators have low weight, but it’s also possible to arrange the qubits
spatially so that each of the stabilizer generator measurements only involves
qubits that are close together. In principle, this makes these measurements
easier to implement than if they involved spatially distant qubits.
• Members of the toric code family have increasingly large distance and can
tolerate a relatively high error rate.
The way one can “move around” on a torus like this, between adjacent points
on the lattice, will likely be familiar to those who have played old-school video
games, where moving off the top of the screen causes you to emerge on the bottom,
and likewise for the left and right edges of the screen. This is how we will view this
lattice with periodic boundaries, as opposed to speaking specifically about a torus
in 3-dimensional space.
Next, qubits are arranged on the edges of this lattice, as illustrated in Figure 15.3,
where qubits are indicated by solid blue circles. Note that the qubits placed on the
dotted lines aren’t solid because they’re already represented on the topmost and
leftmost lines in the lattice. In total there are 2L² qubits: L² qubits on horizontal
lines and L² qubits on vertical lines.
To describe the toric code itself, it remains to describe the stabilizer generators:
• For each tile formed by the lines in the lattice there is one Z stabilizer generator,
obtained by tensoring Z matrices on the four qubits touching that tile along
with identity matrices on all other qubits.
• For each vertex formed by the lines in the lattice there is one X stabilizer
generator, obtained by tensoring X matrices on the four qubits adjacent to
that vertex along with identity matrices on all other qubits.
Figure 15.3: Qubits, indicated by blue circles, are placed on the edges of the lattice.
[Diagram: a Z stabilizer generator, with Z operations on the four qubits around a tile, and an X stabilizer generator, with X operations on the four qubits around a vertex.]
Figure 15.4: The two types of stabilizer generators for the toric code.
Figure 15.5: Examples of stabilizer generators of the two types are indicated by
thick lines. In total, there are L² stabilizer generators of each type.
The stabilizer generators must commute for this to be a valid stabilizer code.
As usual, the Z stabilizer generators all commute with one another, because Z
commutes with itself and the identity commutes with everything, and likewise
for the X stabilizer generators. The Z and X stabilizer generators clearly commute
when they act nontrivially on disjoint sets of qubits, like for the examples shown
in Figure 15.5. The remaining possibility is that a Z stabilizer generator and an X
stabilizer generator overlap on the qubits upon which they act nontrivially, and
whenever this happens the generators must always overlap on two qubits, as shown
in Figure 15.6. Consequently, two stabilizer generators like this commute, just like
Z ⊗ Z and X ⊗ X commute. The stabilizer generators therefore all commute with
one another.
The second required condition on the stabilizer generators for a stabilizer code
is that they form a minimal generating set. This condition is actually not satisfied by
this collection: if we multiply all of the Z stabilizer generators together, we obtain
the identity operation, and likewise for the X stabilizer generators. Thus, any one
of the Z stabilizer generators can be expressed as the product of all of the remaining
ones, and similarly, any one of the X stabilizer generators can be expressed as the
product of the remaining X stabilizer generators. If we remove any one of the Z
stabilizer generators and any one of the X stabilizer generators, however, we do
obtain a minimal generating set.
To be clear about this, we do in fact care equally about all of the stabilizer
generators, and in a strictly operational sense there isn’t any need to select one
stabilizer generator of each type to remove. But, for the sake of analyzing the code
— and counting the generators in particular — we can imagine that one stabilizer
generator of each type has been removed, so that we get a minimal generating set,
keeping in mind that we could always infer the results of these removed generators
(thinking of them as observables) from the results of all of the other stabilizer
generator observables of the same type.
This leaves L² − 1 stabilizer generators of each type, or 2L² − 2 in total, in a
(hypothetical) minimal generating set. Given that there are 2L² qubits in total, this
means that the toric code encodes 2L² − (2L² − 2) = 2 qubits.
The final condition required of stabilizer generators is that at least one quantum
state vector is fixed by all of the stabilizer generators. We will see that this is the
case as we proceed with the analysis of the code, but it’s also possible to reason that
there’s no way to generate −1 times the identity on all 2L² qubits from the stabilizer
generators.
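This counting argument can be checked numerically. The sketch below (an illustration, with the edge-indexing convention chosen arbitrarily) represents each generator's support as a bitmask, verifies that every tile generator commutes with every vertex generator (their supports overlap on an even number of qubits), and computes the GF(2) ranks of the two generating sets.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of integer bitmasks, by Gaussian elimination."""
    pivots, rank = {}, 0
    for row in rows:
        while row:
            p = row.bit_length() - 1
            if p not in pivots:
                pivots[p] = row
                rank += 1
                break
            row ^= pivots[p]
    return rank

L = 4                                  # lattice size (any L >= 2 works the same way)
nq = 2 * L * L                         # one qubit per edge of the periodic lattice

def h(r, c):                           # qubit on the horizontal edge at (r, c)
    return (r % L) * L + (c % L)

def v(r, c):                           # qubit on the vertical edge at (r, c)
    return L * L + (r % L) * L + (c % L)

def mask(qubits):
    m = 0
    for q in qubits:
        m |= 1 << q
    return m

# Z generator for each tile and X generator for each vertex, as support bitmasks.
tiles = [mask([h(r, c), h(r + 1, c), v(r, c), v(r, c + 1)])
         for r in range(L) for c in range(L)]
stars = [mask([h(r, c), h(r, c - 1), v(r, c), v(r - 1, c)])
         for r in range(L) for c in range(L)]

# Every tile and vertex generator overlap on an even number of qubits (commute).
assert all(bin(t & s).count("1") % 2 == 0 for t in tiles for s in stars)

rz, rx = gf2_rank(tiles), gf2_rank(stars)
print(rz, rx, nq - rz - rx)            # L² − 1 independent generators of each type,
                                       # and 2 encoded qubits
```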
Detecting errors
The toric code has a simple and elegant description, but its quantum error-correcting
properties may not be at all clear at first glance. As it turns out, it’s an amazing
code! To understand why and how it works, let’s begin by considering different
errors and the syndromes they generate.
The toric code is a CSS code, because all of our stabilizer generators are either Z
or X stabilizer generators. This means that X errors and Z errors can be detected
(and possibly corrected) separately. In fact, there’s a simple symmetry between
the Z and X stabilizer generators that allows us to analyze X errors and Z errors
in essentially the same way. So, we shall focus on X errors, which are possibly
detected by the Z stabilizer generators — but the entire discussion that follows can
be translated from X errors to Z errors, which are analogously detected by the X
stabilizer generators.
Figure 15.7 depicts the effect of an X error on a single qubit. Here, the assumption
is that our 2L² qubits were previously in a state contained in the code space of the
toric code, causing all of the stabilizer generator measurements to output +1. The
Z stabilizer generators detect X errors, and there is one such stabilizer generator
for each tile in the figure, so we can indicate the measurement outcome of the
corresponding stabilizer generator with the color of that tile: +1 outcomes are
indicated by white tiles and −1 outcomes are indicated by gray tiles. If a bit-
flip error occurs on one of the qubits, the effect is that the stabilizer generator
measurements corresponding to the two tiles touching the affected qubit now
output −1.
This is intuitive when we consider Z stabilizer generators and how they behave.
In essence, each Z stabilizer generator measures the parity of the four qubits that
touch the corresponding tile (with respect to the standard basis). So, a +1 outcome
doesn’t indicate that no X errors have occurred on these four qubits, but rather it
indicates that an even number of X errors have occurred on these qubits, whereas a
−1 outcome indicates that an odd number of X errors have occurred. A single X
(Figure legend: unaffected qubit; qubit affected by X error; +1 measurement outcome; −1 measurement outcome.)
Figure 15.7: The effect of a single X error on the Z stabilizer generator measurement
outcomes.
error therefore flips the parity of the four bits on both of the tiles it touches, causing
the stabilizer generator measurements to output −1.
Next let’s introduce multiple X errors to see what happens. In particular, we’ll
consider a chain of adjacent X errors, where two X errors are adjacent if they
affect qubits touching the same tile. As shown in Figure 15.8, the two Z stabilizer
generators at the endpoints of the chain both give the outcome −1 in this case,
because an odd number of X errors have occurred on those two corresponding tiles.
All of the other Z stabilizer generators, on the other hand, give the outcome +1,
including the ones touching the chain but not at the endpoints, because an even
number of X errors have occurred on the qubits touching the corresponding tiles.
Thus, as long as we have a chain of X errors that has endpoints, the toric code
will detect that errors have occurred, resulting in −1 measurement outcomes for the
Z stabilizer generators corresponding to the endpoints of the chain. Note that the
actual chain of errors is not revealed, only the endpoints! This is OK — in the next
subsection we’ll see that we don’t need to know exactly which qubits were affected
by X errors to correct them. (The toric code is an example of a highly degenerate
code, in the sense that it generally does not uniquely identify the errors it corrects.)
It is, however, possible for a chain of adjacent X errors not to have endpoints,
which is to say that a chain of errors could form a closed loop, like in Figure 15.9.
Figure 15.8: The effect of a chain of adjacent X errors on the Z stabilizer generator
measurement outcomes.
In such a case, an even number of X errors have occurred on every tile, so every
stabilizer generator measurement results in a +1 outcome. Closed loops of adjacent
X errors are therefore not detected by the code.
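The following Python sketch simulates this behavior (with an arbitrary indexing convention for the edges of a 4 × 4 periodic lattice): a chain of X errors produces −1 outcomes only at the tiles at its endpoints, while a closed loop, formed here as a product of two adjacent X stabilizer generators, produces a trivial syndrome.

```python
L = 4  # lattice size for a small illustration

def h(r, c):
    return ("h", r % L, c % L)      # qubit on the horizontal edge at (r, c)

def v(r, c):
    return ("v", r % L, c % L)      # qubit on the vertical edge at (r, c)

def tile_edges(r, c):
    """The four qubits on the boundary of the tile at (r, c), with wraparound."""
    return [h(r, c), h(r + 1, c), v(r, c), v(r, c + 1)]

def star_edges(r, c):
    """The four qubits on the edges meeting the vertex at (r, c)."""
    return [h(r, c), h(r, c - 1), v(r, c), v(r - 1, c)]

def syndrome(errors):
    """Tiles whose Z stabilizer measurement outputs −1: odd overlap with errors."""
    return {(r, c) for r in range(L) for c in range(L)
            if sum(e in errors for e in tile_edges(r, c)) % 2 == 1}

# A chain of three adjacent X errors: only the tiles at its endpoints fire.
chain = {v(1, 0), v(1, 1), v(1, 2)}
print(sorted(syndrome(chain)))       # [(1, 2), (1, 3)]

# A closed loop of X errors (the symmetric difference of the supports of two
# adjacent X stabilizer generators) produces a trivial syndrome.
loop = set(star_edges(0, 0)) ^ set(star_edges(0, 1))
print(sorted(syndrome(loop)))        # []
```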
This might seem disappointing, because we only need four X errors to form
a closed loop (and we’re hoping for better than a distance 4 code). However, a
closed loop of X errors of the form depicted in Figure 15.9 is not actually an error —
because it’s in the stabilizer! Recall that, in addition to the Z stabilizer generators,
we also have an X stabilizer generator for each vertex in the lattice. And if we
multiply adjacent X stabilizer generators together, the result is that we obtain closed
loops of X operations. For instance, the closed loop in Figure 15.9 can be obtained
by multiplying together the X stabilizer generators indicated in Figure 15.10.
This is, however, not the only type of closed loop of X errors that we can have —
and it is not the case that every closed loop of X errors is included in the stabilizer.
In particular, the different types of loops can be characterized as follows.
Figure 15.9: A closed loop of adjacent X errors goes undetected by the toric code.
Figure 15.10: The closed loop of adjacent X errors illustrated in Figure 15.9 is
generated by the X stabilizer generators within the loop.
The shortest that such a loop can be is L, and therefore this is the distance of the
toric code: any closed loop of X errors with length less than L must fall into the
first category, and is therefore contained in the stabilizer; and any chain of X errors
with endpoints is detected by the code. Given that the toric code uses 2L² qubits to
encode 2 qubits and has distance L, it follows that it’s a [[2L², 2, L]] stabilizer code.
Correcting errors
We’ve discussed error detection for the toric code, and now we’ll briefly discuss
how to correct errors. The toric code is a CSS code, so X errors and Z errors can be
detected and corrected independently. Keeping our focus on Z stabilizer generators,
which detect X errors, let us consider how a chain of X errors can be corrected. (Z
errors are corrected in a symmetric way.)
If a syndrome different from the (+1, . . . , +1) syndrome appears when the Z
stabilizer generators are measured, the −1 outcomes reveal the endpoints of one or
more chains of X errors. We can attempt to correct these errors by pairing together
the −1 outcomes and forming a chain of X corrections between them. When doing
this, it makes sense to choose shortest paths along which the corrections take place.
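As a sketch of this pairing step (illustrative only; practical decoders use efficient minimum-weight perfect matching algorithms, such as Edmonds' blossom algorithm, rather than brute force), the following Python code pairs up −1 outcomes so as to minimize the total length of the correction paths, using the wraparound Manhattan distance on the lattice.

```python
from itertools import permutations

L = 8  # lattice size (hypothetical example)

def torus_dist(a, b):
    """Manhattan distance between two tiles on an L x L lattice with wraparound."""
    (r1, c1), (r2, c2) = a, b
    return (min((r1 - r2) % L, (r2 - r1) % L)
            + min((c1 - c2) % L, (c2 - c1) % L))

def min_weight_pairing(defects):
    """Brute-force minimum-weight pairing of −1 outcomes (fine for small sets)."""
    best_pairs, best_cost = None, float("inf")
    for p in permutations(defects):
        pairs = [(p[i], p[i + 1]) for i in range(0, len(p), 2)]
        cost = sum(torus_dist(a, b) for a, b in pairs)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost
    return best_pairs, best_cost

defects = [(0, 0), (0, 3), (5, 5), (7, 6)]   # tiles with −1 outcomes
pairs, cost = min_weight_pairing(defects)
print(cost)   # 6: pair (0,0) with (0,3) and (5,5) with (7,6)
```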
For instance, consider the diagram in Figure 15.12, which depicts a syndrome
with two −1 outcomes, indicated by gray tiles, caused by a chain of X errors
illustrated by the magenta line and circles. As we have already remarked, the chain
itself is not revealed by the syndrome; only the endpoints are visible.
To attempt to correct this chain of errors, a shortest path between the −1 measurement outcomes is selected and X gates are applied as corrections to the qubits
along this path (indicated in yellow in the figure). While the corrections may not
match up with the actual chain of errors, the errors and corrections together form
a closed loop of X operations that is contained in the stabilizer of the code. The
correction is therefore successful in this situation, as the combined effect of the
errors and corrections is to do nothing to an encoded state.
This strategy won’t always be successful. For example, a different explanation
for the same syndrome as in the previous figure is shown in Figure 15.13. This
time, the same chain of corrections as before fails to correct for this chain of errors,
because the combined effect of the errors and corrections is that we obtain a closed
loop of X operations that wraps around the torus, and therefore has a nontrivial
effect on the code space. So, there’s no guarantee that the strategy just described,
of pairing the −1 outcomes and correcting along shortest paths between them, will
always succeed.
(Legend for the error-correction figures: unaffected qubit; qubit affected by X error; qubit corrected by X gate; +1 and −1 measurement outcomes.)
Surface codes
As it turns out, it isn’t actually necessary that the toric code has periodic boundaries.
That is to say, it’s possible to cut out just a portion of the toric code and lay it flat on
a two-dimensional surface, rather than a torus, to obtain a quantum error correcting
code — provided that the stabilizer generators on the edges are properly defined.
What we obtain is called a surface code.
For example, Figure 15.15 shows a diagram of a surface code, where the lattice is
cut with so-called rough edges at the top and bottom and smooth edges at the sides.
The edge cases for the stabilizer generators are defined in the natural way, which
is that Pauli operations on “missing” qubits are simply omitted. Surface codes of
[Diagram: the Z stabilizer generators (left) and the X stabilizer generators (right) of the surface code, including the lower-weight generators along the edges.]
Figure 15.15: A surface code with smooth edges on the sides and rough edges on
the top and bottom.
this form encode a single qubit, rather than two like the toric code. The stabilizer
generators happen to form a minimal generating set in this case, without the need
to remove one of each type as with the toric code. But, despite these differences,
the important characteristics of the toric code are inherited. In particular, nontrivial
undetected errors for this code correspond to chains of errors that either stretch
from the left edge to the right edge (for chains of X errors) or from top to bottom
(for chains of Z errors).
It’s also possible to cut the edges for a surface code diagonally to obtain what
are sometimes called rotated surface codes, which are so-named not because the
codes are rotated in a meaningful sense, but because the diagrams are rotated (by
45 degrees). For example, Figure 15.16 shows a diagram of a rotated surface code
having distance 5.
Figure 15.16: A diagram of a rotated surface code. Black faces denote X stabilizer
generators and white faces denote Z stabilizer generators.
For this type of diagram, black tiles (including the rounded ones on the edges)
indicate X stabilizer generators, where X operations are applied to the (two or four)
vertices of each tile, while white tiles represent Z stabilizer generators. Rotated
surface codes have similar properties to (non-rotated) surface codes, but are more
economical in terms of how many qubits are used.
Color codes
Color codes are another interesting class of codes, which also fall into the general
category of topological quantum codes. They will only briefly be described here.
One way to think about color codes is to view them as geometric generalizations
of the 7-qubit Steane code. With this in mind, let’s consider the 7-qubit Steane code
again, and suppose that the seven qubits are named and ordered using Qiskit’s numbering convention as (Q6, Q5, Q4, Q3, Q2, Q1, Q0). Recall that the stabilizer generators
for this code are as follows.
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
X X X X I I I
X X I I X X I
X I X I X I X
If we associate these seven qubits with the vertices of the graph shown in Fig-
ure 15.17, we find that the stabilizer generators match up precisely with the faces
formed by the edges of the graph.
[Figure 15.17: the qubits Q0 through Q6 placed at the vertices of a graph with three faces; each face is labeled by the ZZZZ and XXXX stabilizer generators acting on the four qubits at its corners.]
That is, for each face, there’s both a Z stabilizer generator and an X stabilizer
generator that act nontrivially on those qubits found at the vertices of that face. The
7-qubit Steane code therefore possesses geometric locality, so in principle it’s not
necessary to move qubits over large distances to measure the stabilizer generators.
The fact that the Z and X stabilizer generators always act nontrivially on exactly the
462 LESSON 15. QUANTUM CODE CONSTRUCTIONS
same sets of qubits is also nice for reasons connected with fault-tolerant quantum
computation, which is the topic for the next lesson.
Color codes are quantum error correcting codes (CSS codes to be more precise)
that generalize this basic pattern, except that the underlying graphs may be different.
For example, Figure 15.18 shows a graph with 19 vertices that has the required properties. It defines a
code that encodes one qubit into 19 qubits and has distance 5 (so it’s a [[19, 1, 5]]
stabilizer code). This can be done with many other graphs, including families of
graphs that grow in size and have interesting structures.
Color codes are so-named because one of the required conditions on the graphs
that define them is that the faces can be three-colored, meaning that the faces can
each be assigned one of three colors in such a way that no two faces of the same
color share an edge (as we have in the diagram above). The colors don’t actually
matter for the definition of the code itself — there are always Z and X stabilizer
generators for each face, regardless of its color — but the colors are important for
analyzing how the codes work.
Other codes
Quantum error correction is an active and rapidly advancing area of research.
Those interested in exploring further may wish to consult the Error Correction Zoo.
The gross code is a recently discovered [[144, 12, 12]] stabilizer code. It is
similar to the toric code, except each stabilizer generator acts nontrivially on
two additional qubits, slightly further away from the tile or vertex for that
generator (so each stabilizer generator has weight 6). The advantage of this
code is that it can encode 12 qubits, compared with just two for the toric code.
Lesson 16
Fault-Tolerant Quantum Computation
In the previous lessons of this unit, we’ve seen several examples of quantum error
correcting codes, which can detect and allow for the correction of errors — so long
as not too many qubits are affected. If we want to use error correction for quantum
computing, however, there are still many issues to be reckoned with. This includes
the reality that, not only is quantum information fragile and susceptible to noise,
but the quantum gates, measurements, and state initializations used to implement
quantum computations will themselves be imperfect.
For instance, if we wish to perform error correction on one or more qubits that
have been encoded using a quantum error correcting code, then this must be done
using gates and measurements that might not work correctly — which means not
only failing to detect or correct errors, but possibly introducing new errors.
In addition, the actual computations we’re interested in performing must be
implemented, again with gates that aren’t perfect. But, we certainly can’t risk decoding qubits for the sake of performing these computations, and then re-encoding
once we’re done, because errors might strike when the protection of a quantum
error correcting code is absent. This means that quantum gates must somehow be
performed on logical qubits that never go without the protection of a quantum error
correcting code.
This all presents a major challenge. But it is known that, as long as the level
of noise falls below a certain threshold value, it is possible in theory to perform
arbitrarily large quantum computations reliably using noisy hardware. We’ll discuss
this critically important fact, which is known as the threshold theorem, toward the
end of the lesson.
The lesson starts with a basic framework for fault-tolerant quantum computing,
including a short discussion of noise models and a general methodology for fault-
tolerant implementations of quantum circuits. We’ll then move on to the issue
of error propagation in fault-tolerant quantum circuits and how to control it. In
particular, we’ll discuss transversal implementations of gates, which offer a very
simple way to control error propagation — though there is a fundamental limitation
that prevents us from using this method exclusively — and we’ll also take a look at
a different methodology involving so-called magic states, which offers a different
path to controlling error propagation in fault-tolerant quantum circuits.
And finally, the lesson concludes with a high-level discussion of the threshold
theorem, which states that arbitrarily large quantum circuits can be implemented
reliably, so long as the error rate for all of the components involved falls below a
certain finite threshold value. This threshold value depends on the error correcting
code that is used, as well as the specific choices that are made for fault-tolerant
implementations of gates and measurements, but critically it does not depend on
the size of the quantum circuit being implemented.
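A rough sense of how such a threshold arises can be had from a standard back-of-the-envelope model for concatenated codes (illustrative only, and not the analysis given in this lesson), in which one level of encoding maps a physical error rate p to roughly c·p², so that the threshold is p_th = 1/c and error rates below it are suppressed doubly exponentially in the number of levels.

```python
c = 100.0  # hypothetical constant, corresponding to a threshold of p_th = 1/c = 1%

def logical_rate(p, levels):
    """Error rate after the given number of levels of concatenation, in the
    simple model where one level maps p to c * p**2."""
    for _ in range(levels):
        p = c * p * p
    return p

for p in (0.002, 0.02):   # one rate below the threshold, one above
    print(p, [logical_rate(p, k) for k in range(4)])
```

Below the threshold the logical rate shrinks rapidly with each level; above it, concatenation only makes things worse.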
[Diagram: an example quantum circuit on three qubits, with Hadamard, controlled-NOT, X, and Z gates.]
are perfect. For example, if we decide to use a surface code for error correction,
and a classical perfect matching algorithm is run to compute corrections, we really
don’t need to concern ourselves with the possibility that errors in this classical
computation will lead to a faulty solution. As another example, quantum computations often necessitate classical pre- and post-processing, and these classical
computations can safely be assumed to be perfect as well.
Noise models
To analyze fault-tolerant implementations of quantum circuits, we require a precise
mathematical model — a noise model — that associates probabilities with the various
things that can go wrong. Hypothetically speaking, one could attempt to come
up with a highly detailed, complicated noise model that aims to reflect the reality
of what happens in a particular device. But, if the noise model is too complicated
or difficult to reason about, it will likely be of limited use. For this reason, simpler
noise models are much more typically considered.
One example of a simple noise model is the independent stochastic noise model,
where errors or faults affecting different components at different moments in time
— or, in other words, different locations in a quantum circuit — are assumed to be
independent. For instance, each gate might fail with a certain probability, an error
might strike each stored qubit per unit time with a different probability, and so on,
with no correlations among the different possible errors.
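As a small illustration of this model (with hypothetical location counts and failure probabilities), the following Python sketch samples faults at independent locations and compares the simulated probability of at least one fault against the analytic value.

```python
import random

random.seed(7)  # for reproducibility

# Hypothetical circuit: 200 gate locations failing with probability 1e-3 each,
# plus 1000 idle-qubit locations failing with probability 1e-4 each, all
# independently (the independent stochastic noise model).
locations = [1e-3] * 200 + [1e-4] * 1000

def sample_faults(location_probs):
    """Indices of the locations at which a fault occurs in one run."""
    return [i for i, p in enumerate(location_probs) if random.random() < p]

# Probability that no location faults, by independence.
p_none = 1.0
for p in locations:
    p_none *= 1 - p

trials = 20000
hits = sum(1 for _ in range(trials) if sample_faults(locations))
print(f"P(at least one fault): analytic {1 - p_none:.3f}, "
      f"simulated {hits / trials:.3f}")
```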
Now, it is certainly reasonable to object to such a model, because there probably
will be correlations among errors in real physical devices. For instance, there might
be a small chance of a catastrophic error that wipes out all the qubits at once. Perhaps more likely, there could be errors that are localized but that nevertheless affect
multiple components in a quantum computer. Nobody suggests otherwise! Nev-
ertheless, the independent stochastic noise model does provide a simple baseline
that captures the idea that nature is unpredictable but not malicious, and it isn’t
intentionally trying to ruin quantum computations.
Other, less forgiving noise models are also commonly studied. For example,
a common relaxation of the assumption of independence among errors affecting
different locations in a quantum circuit is that just the locations of the errors are
independent, but the actual errors affecting these locations could be correlated.
Regardless of what noise model is chosen, it should be recognized that learning
about the errors that affect specific devices, and formulating new error models if the
old ones lead us astray, could potentially be an important part of the development
of fault-tolerant quantum computation.
[Diagram: the same example circuit with error-correction steps (EC) interleaved throughout, performed on each qubit before and after every gate.]
For this reason, sufficient care must be given to the way quantum computations
are performed in fault-tolerant implementations of circuits, to control error propagation. That is, an error on one qubit can potentially be propagated to multiple
qubits through the action of gates in a quantum circuit, which can cause the number
of errors to increase dramatically. This is a paramount concern, for if we don’t
manage to control error propagation, our error-correction efforts will quickly be
overwhelmed by errors. If, on the other hand, we’re able to keep the propagation
of errors under control, then error correction stands a fighting chance of keeping
up, allowing errors to be corrected at a high enough rate to allow the quantum
computation to function as intended.
The starting point for a technical discussion of this issue is the recognition that
two-qubit gates (or multiple-qubit gates more generally) can propagate errors, even
when they function perfectly. For instance, consider a controlled-NOT gate, and
suppose that an X error occurs on the control qubit just prior to the controlled-NOT
gate being performed. As we already observed in Lesson 13 (Correcting Quantum
Errors), this is equivalent to an X error occurring on both qubits after the controlled-
NOT is performed. And the situation is similar for a Z error acting on the target
rather than the control prior to the controlled-NOT gate being performed.
This is a propagation of errors, because the unfortunate location of an X or Z
error prior to the controlled-NOT gate effectively turns it into two errors after the
controlled-NOT gate. This happens even when the controlled-NOT gate is perfect,
and we must not forget that a given controlled-NOT gate may itself be noisy, which
can create correlated errors on two qubits.
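These two propagation identities are easy to verify as matrix equations. The following numpy sketch checks them directly, with the qubit ordering control ⊗ target.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
CNOT = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 0., 1., 0.]])

# An X error on the control before the CNOT equals X errors on both qubits
# after it; a Z error on the target before equals Z errors on both after.
assert np.allclose(CNOT @ np.kron(X, I2), np.kron(X, X) @ CNOT)
assert np.allclose(CNOT @ np.kron(I2, Z), np.kron(Z, Z) @ CNOT)
print("error propagation identities verified")
```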
Writing the control qubit as the first tensor factor, these observations can be expressed as the identities

CNOT (X ⊗ I) = (X ⊗ X) CNOT    and    CNOT (I ⊗ Z) = (Z ⊗ Z) CNOT.

[Diagram: circuit form of these identities, together with examples in which additional CNOT gates spread a single X or Z error across three qubits.]
Figure 16.4: Multiple CNOT gates can further propagate X and Z errors.
Adding to our concern is the fact that subsequent two-qubit gates might propagate these errors even further, as Figure 16.4 suggests. In some sense, we can never
get around this; so long as we use multiple-qubit gates, there will be a potential for
error propagation. However, as we’ll discuss in the subsections that follow, steps
can be taken to limit the damage this causes, allowing for propagated errors to be
managed.
[Diagram: the transversal implementation of a CNOT gate on two code blocks, as qubit-by-qubit CNOT gates between corresponding physical qubits.]
An X gate on the logical qubit encoded by this code can be implemented transversally by the 9-qubit Pauli operation
Z⊗I⊗I⊗Z⊗I⊗I⊗Z⊗I⊗I
while a Z gate on the logical qubit can be implemented transversally by the 9-qubit
Pauli operation
X ⊗ X ⊗ X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I.
Both of these Pauli operations have weight 3, which is the minimum weight required.
(The 9-qubit Shor code has distance 3, so any non-identity Pauli operation of weight
2 or less is detected as an error.)
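These claims can be verified numerically. The sketch below builds the Shor codewords as statevectors and checks that the two Pauli operations above act as logical X and Z.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])

def kron_all(ops):
    return reduce(np.kron, ops)

# One block of the Shor code: (|000> ± |111>) / sqrt(2).
block_plus = np.zeros(8)
block_plus[0] = block_plus[7] = 1 / np.sqrt(2)
block_minus = np.zeros(8)
block_minus[0], block_minus[7] = 1 / np.sqrt(2), -1 / np.sqrt(2)

zero_L = kron_all([block_plus] * 3)    # logical |0>
one_L = kron_all([block_minus] * 3)    # logical |1>

logical_X = kron_all([Z, I2, I2, Z, I2, I2, Z, I2, I2])
logical_Z = kron_all([X, X, X, I2, I2, I2, I2, I2, I2])

assert np.allclose(logical_X @ zero_L, one_L)    # flips |0>_L and |1>_L
assert np.allclose(logical_X @ one_L, zero_L)
assert np.allclose(logical_Z @ zero_L, zero_L)   # +1 on |0>_L, -1 on |1>_L
assert np.allclose(logical_Z @ one_L, -one_L)
print("logical X and Z verified on the Shor code")
```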
And, for a third example, the 7-qubit Steane code (and indeed every color code)
allows for a transversal implementation of all Clifford gates. We’ve already seen
how CNOT gates are implemented transversally for any CSS code, so it remains to
consider H and S gates. A Hadamard gate applied to all 7 qubits of the Steane code
is equivalent to H being applied to the logical qubit it encodes, while an S† gate (as
opposed to an S gate) applied to all 7 qubits is equivalent to a logical S gate.
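The claim about Hadamard gates can be checked numerically for the Steane code. The sketch below builds the logical basis states from the cosets listed earlier in the lesson, with the logical |1⟩ obtained by complementing every string, and verifies that H applied to all seven qubits acts as a Hadamard gate on the logical qubit.

```python
import numpy as np

D = ["0000000", "0011110", "0101101", "0110011",
     "1001011", "1010101", "1100110", "1111000"]

def ket(bits):
    """Standard basis vector |bits> on seven qubits."""
    v = np.zeros(2 ** 7)
    v[int(bits, 2)] = 1.0
    return v

def flip(bits):
    """Bitwise complement, i.e. XOR with 1111111."""
    return "".join("1" if b == "0" else "0" for b in bits)

zero_L = sum(ket(u) for u in D) / np.sqrt(8)          # logical |0>
one_L = sum(ket(flip(u)) for u in D) / np.sqrt(8)     # logical |1>

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H7 = H1
for _ in range(6):
    H7 = np.kron(H7, H1)                              # H on all seven qubits

assert np.allclose(H7 @ zero_L, (zero_L + one_L) / np.sqrt(2))
assert np.allclose(H7 @ one_L, (zero_L - one_L) / np.sqrt(2))
print("transversal H acts as a logical H on the Steane code")
```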
Now that we know what transversal implementations of gates are, let us discuss
their connection to error propagation.
For a transversal implementation of a single-qubit gate, we simply have a tensor
product of single-qubit gates in our gadget, which acts on a code block of physical
qubits for the chosen quantum error correcting code. Although any of these gates
could fail and introduce an error, there will be no propagation of errors because no
multiple-qubit gates are involved. Immediately after the gadget is applied, error
correction is performed; and if the number of errors introduced by the gadget
(or while the gadget is being performed) is sufficiently small, the errors will be
corrected. So, if the rate of errors introduced by faulty gates is sufficiently small,
error correction has a good chance to succeed.
For a transversal implementation of a two-qubit gate, on the other hand, there
is the potential for a propagation of errors — there is simply no way to avoid this,
as we have already observed. The essential point, however, is that a transversal
gadget can never cause a propagation of errors within a single code block.
For example, considering the transversal implementation of a CNOT gate for
a CSS code described above, an X error could occur on the top qubit of the top
474 LESSON 16. FAULT-TOLERANT QUANTUM COMPUTATION
code block right before the gadget is performed, and the first CNOT within the
gadget will propagate that error to the top qubit in the lower block. However, the
two resulting errors are now in separate code blocks. So, assuming our code can
correct an X error, the error correction steps that take place after the gadget will
correct the two X errors individually — because only a single error occurs within
each code block. In contrast, if error propagation were to happen inside of the same
code block, it could turn a low-weight error into a high-weight error that the code
cannot handle.
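The propagation just described is captured by two operator identities: an X error on the control qubit before a CNOT is equivalent to X errors on both qubits after it, and a Z error on the target similarly spreads to the control. A quick NumPy check (an illustration, not from the text):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# CNOT with the first qubit as control and the second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

# An X error on the control before the CNOT equals X errors on
# both qubits after the CNOT.
assert np.allclose(CNOT @ np.kron(X, I), np.kron(X, X) @ CNOT)

# Likewise, a Z error on the target propagates to the control.
assert np.allclose(CNOT @ np.kron(I, Z), np.kron(Z, Z) @ CNOT)
```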
For two different stabilizer codes, it may be that a particular gate can be imple-
mented transversally with one code but not the other. For example, while it is not
possible to implement a T gate transversally using the 7-qubit Steane code, there
are other codes for which this is possible.
Unfortunately, it is never possible, for any non-trivial quantum error correcting
code, to implement a universal set of gates transversally. This fact is known as the
Eastin–Knill theorem.
Eastin–Knill theorem
For any quantum error correcting code with distance at least 2, the set of logical
gates that can be implemented transversally generates a set of operations that
(up to a global phase) is discrete, and is therefore not universal.
The proof of this theorem will not be explained here. It is not a complicated
proof, but it does require a basic knowledge of Lie groups and Lie algebras, which
are not among the course prerequisites. The basic idea, however, can be conveyed in
intuitive terms: Infinite families of transversal operations can’t possibly stay within
the code space of a non-trivial code because minuscule differences in transversal
operations are well-approximated by low-weight Pauli operations, which the code
detects as errors.
In summary, transversal gadgets offer a simple and inherently fault-tolerant
implementation of gates — but for any reasonable choice of a quantum error
correcting code, there will never be a universal gate set that can be implemented in
this way, which necessitates the use of alternative gadgets.
16.2. CONTROLLING ERROR PROPAGATION 475
Magic states
Given that it is not possible, for any non-trivial choice for a quantum error correcting
code, to implement a universal set of quantum gates transversally, we must consider
other methods to implement gates fault-tolerantly. One well-known method is
based on the notion of magic states, which are quantum states of qubits that enable
fault-tolerant implementations of certain gates.
[Figure: a circuit implementing a T gate on |ψ⟩ using the magic state T|+⟩. A CNOT gate controlled by the qubit in state |ψ⟩ targets the magic-state qubit, which is then measured; an S gate is applied to the top qubit when the measurement outcome is 1, leaving that qubit in the state T|ψ⟩.]
To check that this circuit works correctly, we can first compute the action of the
CNOT gate on the input:

T|+⟩ ⊗ |ψ⟩ ↦ (1/√2) |0⟩ ⊗ T|ψ⟩ + ((1+i)/2) |1⟩ ⊗ T†|ψ⟩.
The measurement therefore gives the outcomes 0 and 1 with equal probability.
If the outcome is 0, the S gate is not performed, and the output state is T |ψ⟩; and if
the outcome is 1, the S gate is performed, and the output state is ST † |ψ⟩ = T |ψ⟩.
The state T |+⟩ is called a magic state in this context, although it’s not unique
in this regard: other states are also called magic states when they can be used
in a similar way (for possibly different gates and using different circuits). For
example, exchanging the state T |+⟩ for the state S|+⟩ and replacing the S gate in
the circuit above with a Z gate implements an S gate — which is potentially useful
for fault-tolerant quantum computation using a code for which S gates cannot be
implemented transversally.
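Both gadgets can be simulated directly on state vectors. The sketch below (an illustration assuming NumPy; qubit ordering is magic-state qubit first, matching the tensor product T|+⟩ ⊗ |ψ⟩ in the analysis above) checks that each measurement branch, after the classically controlled correction, leaves the data qubit in the expected state up to a global phase.

```python
import numpy as np

S = np.diag([1.0, 1.0j])                    # S gate
T = np.diag([1.0, np.exp(1j * np.pi / 4)])  # T gate
Z = np.diag([1.0, -1.0])                    # Z gate
plus = np.array([1.0, 1.0]) / np.sqrt(2)    # |+>

# CNOT whose control is the SECOND qubit (the data qubit) and whose
# target is the FIRST qubit (the magic-state qubit), in the ordering
# magic (x) data used by the equation in the text.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0]], dtype=complex)

def gadget(magic, correction, psi):
    """Run the magic-state gadget; return the normalized post-measurement
    data states for measurement outcomes 0 and 1."""
    state = CNOT @ np.kron(magic, psi)
    out0 = state[:2]               # magic-state qubit measured as 0
    out1 = correction @ state[2:]  # measured as 1; apply the correction
    return out0 / np.linalg.norm(out0), out1 / np.linalg.norm(out1)

def same_up_to_phase(u, v):
    return np.isclose(abs(np.vdot(u, v)), 1.0)

rng = np.random.default_rng(7)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

# T gadget: magic state T|+>, correction S; both branches give T|psi>.
out0, out1 = gadget(T @ plus, S, psi)
assert same_up_to_phase(out0, T @ psi)
assert same_up_to_phase(out1, T @ psi)

# S gadget: magic state S|+>, correction Z; both branches give S|psi>.
out0, out1 = gadget(S @ plus, Z, psi)
assert same_up_to_phase(out0, S @ psi)
assert same_up_to_phase(out1, S @ psi)
```

The outcome-1 branches work because S = T² and Z = S², so ST† = T and ZS† = S.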
It may not be clear that using magic states to implement gates is helpful for fault-
tolerance. For the T gate implementation described above, for instance, it appears
that we still need to apply a T gate to a |+⟩ state to obtain a magic state, which we
then use to implement a T gate. So what is the advantage of using this approach for
fault-tolerance? Here are three key points that provide an answer to this question.
1. The creation of magic states does not necessitate applying the gate we’re
attempting to implement to a particular state. For example, applying a T gate
to a |+⟩ state is not the only way to obtain a T |+⟩ state.
2. The creation of magic states can be done separately from the computation in
which they’re used. This means that errors that arise in the magic state creation
process will not propagate to the actual computation being performed.
3. If the individual gates in the circuit implementing a chosen gate using a magic
state can be implemented fault-tolerantly, and we assume the availability of
magic states, we obtain a fault-tolerant implementation of the chosen gate.
[Figure: the same circuit implemented at the logical level, where each wire
represents a code block and the input T|+⟩ is an encoded magic state.] The gates
in the original T-gate circuit are here replaced by gadgets, which we assume are
fault-tolerant.
This particular figure therefore suggests that we already have fault-tolerant
gadgets for CNOT gates and S gates. For a color code, these gadgets could be
transversal; for a surface code (or any other CSS code), the CNOT can be performed
transversally, while the S gate gadget might itself be implemented using magic
states, as was earlier suggested is possible. (The figure also suggests that we have a
fault-tolerant gadget for performing a standard basis measurement, which we’ve
ignored thus far. This could be challenging for some codes, but for a CSS code
it's a matter of measuring each physical qubit followed by
classical post-processing.)
The implementation is therefore fault-tolerant, assuming we have an encoding
of a magic state T |+⟩. But, we still haven’t addressed the issue of how we obtain
an encoding of this state. One way to obtain encoded magic states (or, perhaps
more accurately, to make them better) is through a process known as magic state
distillation. The diagram in Figure 16.8 illustrates what this process looks like at the
highest level.
In words, a collection of noisy encoded magic states is fed into a special type
of circuit known as a distiller. All but one of the output blocks is measured —
meaning that logical qubits are measured with standard basis measurements. If any
of the measurement outcomes is 1, the process has failed and must be restarted.
If, however, every measurement outcome is 0, the resulting state of the top code
block will be a less noisy encoded magic state. This state could then join four more
as inputs into another distiller, or be used to implement a T gate if it is deemed to
be sufficiently close to a true encoded magic state. Of course, the process must
begin somewhere, with one possibility being to prepare the initial noisy magic
states non-fault-tolerantly.
Figure 16.8: Magic state distillation. Noisy magic state encodings are fed into a
distiller; every output block except the top one is measured, and when all of the
outcomes are 0, the top block contains a less-noisy magic state encoding.
There are different known ways to build the distiller itself, but they will not be
explained or analyzed here. At a logical level, the typical approach — remarkably
and somewhat coincidentally — is to run an encoding circuit for a stabilizer code
in reverse! This could, in fact, be a different stabilizer code from the one used for
error correction. For example, one could potentially use a surface or color code for
error correction, but run an encoder for the 5-qubit code in reverse for the sake of
magic state distillation. Encoding circuits for stabilizer codes only require Clifford
gates, which simplifies the fault-tolerant implementation of a distiller. In actuality,
the specifics are dependent on the codes that are used.
In summary, this section has aimed to provide only a very high-level discussion
of magic states, with the intention being to convey just a basic idea of how the
method works.
It is sometimes claimed that the overhead for using magic states to implement
gates fault-tolerantly along these lines would be extremely high, with the vast
majority of the work going into the distillation process. However, this is actually
not so clear — there are many potential ways to optimize these processes. There
are, in addition, alternative approaches to building fault-tolerant gadgets for gates
that cannot be implemented transversally. For example, code deformation and code
switching are keywords associated with some of these schemes — and new ways
continue to be developed and refined.
Shor error correction performs syndrome measurements with the help of a cat
state of the form

(1/√2)|0ⁿ⟩ + (1/√2)|1ⁿ⟩,
where 0ⁿ and 1ⁿ refer to the all-zero and all-one strings of length n. For instance, this
is a |ϕ⁺⟩ state when n = 2 and a GHZ state when n = 3, but in general, Shor error
correction requires a state like this for n being the weight of the stabilizer generator
being measured. As an example, the circuit shown in Figure 16.9 measures a
stabilizer generator of the form P₂ ⊗ P₁ ⊗ P₀.
This necessitates the construction of the cat state itself, and to make it work
reliably in the presence of errors and potentially faulty gates, the method actually re-
quires repeatedly running circuits like this to make inferences about where different
errors may have occurred during the process.
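The idea behind a cat-state syndrome measurement can be seen in a small simulation. The sketch below (an illustration assuming NumPy, using the stabilizer generator Z ⊗ Z on two data qubits in place of the P₂ ⊗ P₁ ⊗ P₀ of Figure 16.9) couples a cat state (|00⟩ + |11⟩)/√2 to the data via controlled-Z gates, applies Hadamard gates to the cat-state qubits, and checks that the parity of their measurement outcomes reveals the eigenvalue.

```python
import numpy as np
from itertools import product

I = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])
P0 = np.diag([1.0, 0.0])  # |0><0|
P1 = np.diag([0.0, 1.0])  # |1><1|

def kron_list(factors):
    result = factors[0]
    for f in factors[1:]:
        result = np.kron(result, f)
    return result

def op_at(ops, n):
    """n-qubit operator with the given single-qubit ops at the
    specified positions and identities elsewhere."""
    return kron_list([ops.get(i, I) for i in range(n)])

n = 4  # qubits 0,1: cat state; qubits 2,3: data

# Controlled-Z gates: cat qubit i controls a Z on data qubit i+2.
CZ0 = op_at({0: P0}, n) + op_at({0: P1, 2: Z}, n)
CZ1 = op_at({1: P0}, n) + op_at({1: P1, 3: Z}, n)

hadamards = op_at({0: H, 1: H}, n)  # H on both cat-state qubits

cat = np.zeros(4); cat[0] = cat[3] = 1.0 / np.sqrt(2)  # (|00>+|11>)/sqrt(2)

def odd_parity_probability(data_state):
    """Probability that the measured cat-state bits have odd parity."""
    state = hadamards @ CZ1 @ CZ0 @ kron_list([cat, data_state])
    amps = state.reshape(2, 2, 4)  # indices: cat bit 0, cat bit 1, data
    odd = 0.0
    for a, b in product([0, 1], repeat=2):
        if a != b:
            odd += np.sum(np.abs(amps[a, b]) ** 2)
    return odd

ket00 = np.zeros(4); ket00[0] = 1.0  # Z (x) Z eigenvalue +1
ket01 = np.zeros(4); ket01[1] = 1.0  # Z (x) Z eigenvalue -1

assert np.isclose(odd_parity_probability(ket00), 0.0)  # even parity: +1
assert np.isclose(odd_parity_probability(ket01), 1.0)  # odd parity: -1
```

Measuring an eigenstate with eigenvalue −1 leaves the cat state in (|00⟩ − |11⟩)/√2, which the Hadamard gates map to odd-parity outcomes with certainty.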
An alternative method is known as Steane error correction. This method works
differently, and it only works for CSS codes. The idea is that we don’t actually
perform the syndrome measurements on the encoded quantum states in the circuit
we’re trying to run, but instead we intentionally propagate errors to a workspace
system, and then measure that system and classically detect errors. The circuit
diagrams in Figure 16.10 illustrate how this can be done for detecting X and Z
Figure 16.9: A circuit measuring a stabilizer generator of the form P₂ ⊗ P₁ ⊗ P₀. A
cat state (|000⟩ + |111⟩)/√2 is prepared, each of its qubits controls one of P₀, P₁,
and P₂ on the code block, Hadamard gates are applied to the cat-state qubits, and
the parity of the measurement outcomes is computed.
errors, respectively. A related method known as Knill error correction extends this
method to arbitrary stabilizer codes using teleportation.
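The way errors are intentionally propagated to a workspace can be seen in a toy example. The sketch below (an illustration assuming NumPy; it uses the 3-qubit repetition code rather than a genuine CSS code from the text) applies transversal CNOT gates from a data block carrying an X error onto an ancilla block prepared in the logical |+⟩ state (|000⟩ + |111⟩)/√2. Measuring the ancilla yields a codeword XORed with the error pattern, so classical decoding locates the error without disturbing the encoded data.

```python
import numpy as np
from itertools import product

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def kron_list(factors):
    result = factors[0]
    for f in factors[1:]:
        result = np.kron(result, f)
    return result

# Data block: an encoding alpha|000> + beta|111> of the 3-qubit
# repetition code, with an X error on the middle qubit.
alpha, beta = 0.6, 0.8
data = np.zeros(8); data[0] = alpha; data[7] = beta
data = kron_list([I, X, I]) @ data  # alpha|010> + beta|101>

# Ancilla block: the logical |+> state (|000> + |111>)/sqrt(2).
ancilla = np.zeros(8); ancilla[0] = ancilla[7] = 1.0 / np.sqrt(2)

# Transversal CNOTs from data qubit i to ancilla qubit i XOR the
# data basis string into the ancilla basis string.
state = np.zeros((8, 8), dtype=complex)  # (data string, ancilla string)
joint = np.outer(data, ancilla)
for s, a in product(range(8), repeat=2):
    state[s, a ^ s] += joint[s, a]

# Possible ancilla measurement outcomes:
probs = np.sum(np.abs(state) ** 2, axis=0)
outcomes = [m for m in range(8) if probs[m] > 1e-12]

def decode(m):
    """Classically decode: the error is m XOR the nearest codeword."""
    weight = bin(m).count("1")
    return m if weight <= 1 else m ^ 0b111

# Every possible outcome decodes to the actual error pattern 010.
assert all(decode(m) == 0b010 for m in outcomes)

# The data block is undisturbed: conditioned on any outcome, its
# state is still alpha|010> + beta|101>.
post = state[:, outcomes[0]]
post = post / np.linalg.norm(post)
assert np.allclose(post, data)
```

The ancilla factors out because XORing any data basis string into (|000⟩ + |111⟩)/√2 produces the same two-term superposition, so the measurement reveals only the error, never the encoded amplitudes.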
Threshold theorem
In simple terms, the threshold theorem says that if we have any quantum circuit having N gates, where
N can be as large as we like, then it’s possible to implement that circuit with high
accuracy using a noisy quantum circuit, provided that the level of noise is below a
certain threshold value that is independent of N. Moreover, it isn’t too expensive to
[Figure: in the first circuit, an encoding of |+⟩ is joined to the code block by
transversal CNOT gates and measured, allowing X errors to be detected classically;
in the second, an encoding of |0⟩, with Hadamard gates applied before the
measurements, is similarly used to detect Z errors classically.]
Figure 16.10: Circuits for detecting X and Z errors using Steane error correction.
do this, in the sense that the size of the noisy circuit required is on the order of N
times some constant power of the logarithm of N.
To state the theorem more formally requires being specific about the noise
model, which will not be done in this lesson. It can, for instance, be proved for the
independent stochastic noise model that was mentioned earlier, where errors occur
independently at each possible location in the circuit with some probability strictly
smaller than the threshold value, but it can also be proved for more general noise
models where there can be correlations among errors.
This is a theoretical result, and the most typical way it is proved doesn’t neces-
sarily translate to a practical approach, but it does nevertheless have great practical
importance. In particular, it establishes that there is no fundamental barrier to
performing quantum computations using noisy components; as long as the error
rate for these components is below the threshold value, they can be used to build
reliable quantum circuits of arbitrary size. An alternative way to state its importance
is to observe that, if the theorem wasn’t true, it would be hard to imagine large-scale
quantum computing ever becoming a reality.
There are many technical details involved in formal proofs of (formal statements
of) this theorem, and those details will not be communicated here — but the
essential ideas can nevertheless be explained at an intuitive level. To make this
explanation as simple as possible, let’s imagine that we use the 7-qubit Steane code
for error correction. This would be an impractical choice for an actual physical
implementation — as would be reflected by a minuscule threshold value pth — but
it works well to convey the main ideas. This explanation will also be rather cavalier
about the noise model, with the assumption being that an error strikes each location
in a fault-tolerant implementation independently with probability p.
Now, if the probability p is larger than the reciprocal of N, the size of the circuit
we aim to implement, chances are very good that an error will strike somewhere.
So, we can attempt to run a fault-tolerant implementation of this circuit, following
the prescription outlined in the lesson. We may then ask ourselves the question
suggested earlier: Is this making things better or worse?
If the probability p of an error at each location is too large, then our efforts will
not help and may even make things worse, just like the 9-qubit Shor code doesn’t
help if the error probability is above 3.23% or so. In particular, the fault-tolerant
implementation is considerably larger than our original circuit, so there are a lot
more locations where errors could strike.
However, if p is small enough, then we will succeed in reducing the error
probability for the logical computation we’re performing. (In a formal proof, we
would need to be very careful at this point: errors in the logical computation will
not necessarily be accurately described by the original noise model. This, in fact,
motivates less forgiving noise models where errors might not be independent —
but we will sweep this detail under the rug for the sake of this explanation.)
In greater detail, in order for a logical error to occur in the original circuit, at least
two errors must fall into the same code block in the fault-tolerant implementation,
given that the Steane code can correct any single error in a code block. Keeping in
mind there are many different ways to have two or more errors in the same code
block, it is possible to argue that the probability of a logical error at each location
in the original circuit is at most Cp2 for some fixed positive real number C that
depends on the code and the gadgets we use, but critically not on N, the size of the
original circuit. If p is smaller than 1/C, which is the number we can take as our
threshold value pth , this translates to a reduction in error.
However, this new error rate might still be too high to allow the entire circuit
to work correctly. A natural thing to do at this point is to choose a better code and
better gadgets to drive the error rate down to a point where the implementation is
likely to work. Theoretically speaking, a simple way to argue that this is possible is
to concatenate. That is to say, we can think of the fault-tolerant implementation of
the original circuit as if it were any other quantum circuit, and then implement this
new circuit fault-tolerantly, using the same scheme. We can then do this again and
again, as many times as we need to reduce the error rate to a level that allows the
original computation to work.
To get a rough idea for how the error rate decreases through this method, let’s
consider how it works for a few iterations. Note that a rigorous analysis would
need to account for various technical details we’re omitting here.
We start with the error probability p for locations in the original circuit. Presum-
ing that p < pth = 1/C, the logical error rate can be bounded by Cp² = (Cp)p after
the first iteration. By treating the fault-tolerant implementation as any other circuit,
and implementing it fault-tolerantly, we obtain a bound on the logical error rate of

C((Cp)p)² = (Cp)³p.
Continuing in this manner for a total of k iterations leads to a logical error rate (for
the original circuit) bounded by

(Cp)^(2^k − 1) p,

which decreases very rapidly with k, given that Cp < 1.
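This suppression is easy to tabulate numerically. The snippet below (an illustration only; the value C = 100, corresponding to a 1% threshold, is made up) iterates the recursion p ↦ Cp² and confirms that it matches the closed form (Cp)^(2^k − 1) p.

```python
import math

C = 100.0  # hypothetical constant; threshold p_th = 1/C = 1%
p = 0.001  # physical error rate, safely below the threshold

rate = p
for k in range(1, 6):
    rate = C * rate ** 2  # one more level of concatenation
    closed_form = (C * p) ** (2 ** k - 1) * p
    assert math.isclose(rate, closed_form, rel_tol=1e-9)

# After five levels the logical error rate is (0.1)**31 * 0.001 = 1e-34.
assert rate < 1e-30
```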
So, what is this threshold value in reality? The answer depends on the code and
the gadgets used. For the Steane code together with magic state distillation, it is
minuscule and probably unlikely to be achievable in practice. But, using surface
codes and state-of-the-art gadgets, the threshold has been estimated to be on the
order of 0.1% to 1%.
As new codes and methods are discovered, it is reasonable to expect the thresh-
old value to increase, while simultaneously the level of noise in actual physical
components will decrease. Reaching the point at which large-scale quantum com-
putations can be implemented fault-tolerantly will not be easy, and will not happen
overnight. But, this theorem, together with advances in quantum codes and quan-
tum hardware, provides us with optimism as we continue to push forward toward
the ultimate goal of building a large-scale, fault-tolerant quantum computer.
Bibliography
This bibliography includes numerous references that are relevant to this course,
including books, surveys, and research papers, divided into separate lists: back-
ground and prerequisite material (such as linear algebra, probability theory, and
basic theoretical computer science); general references that cover topics spanning
or relevant to multiple units; and unit-specific references.
Some of these references represent original research discoveries while others
are pedagogical in nature or are secondary sources that refine and/or simplify the
subject matter. Some are connected directly to facts or discoveries mentioned in the
text while others are merely relevant or offer further explorations of various topics.
In some cases, only small portions of these sources may be relevant to this course. I
have made no attempt to categorize them along these lines.
This bibliography should not be seen as a comprehensive list or a historical
record that aims to give proper attribution to discoveries and developments in the
field. Rather, it’s a list of suggestions for background or further reading. After over
30 years studying, researching, and teaching quantum information and computa-
tion, it would be extremely difficult for me to produce a comprehensive list — and
the truth of the matter is that this course was informed by many sources that are
not listed here, as well as talks, presentations, and personal conversations over the
years. I do regret any omissions, but this should be a good start for those wishing
to learn more.
Background and prerequisite references
Stephen Friedberg, Arnold Insel, and Lawrence Spence. Linear Algebra. Prentice
Hall, 4th edition, 2003.
Kenneth Hoffman and Ray Kunze. Linear Algebra. Prentice Hall, 2nd edition, 1971.
Roger Horn and Charles Johnson. Matrix Analysis. Cambridge University Press,
1985.
Sal Khan. Linear algebra. Khan Academy, 2025. Video series available at
https://www.khanacademy.org/math/linear-algebra.
General references
Richard Feynman. Simulating physics with computers. International Journal of
Theoretical Physics, 21(6/7):467–488, 1982.
Alexei Kitaev, Alexander Shen, and Mikhail Vyalyi. Classical and Quantum
Computation, volume 47 of Graduate Studies in Mathematics. American Mathematical
Society, 2002.
Michael Nielsen and Isaac Chuang. Quantum Computation and Quantum Information.
Cambridge University Press, 10th anniversary edition, 2010.
John Preskill. Lecture Notes for Physics 229: Quantum Information and Computation.
California Institute of Technology, 2020. Available at
https://www.preskill.caltech.edu/ph229/.
Unit I references
John Bell. On the Einstein Podolsky Rosen paradox. Physics Physique Fizika,
1(3):195–200, 1964.
Charles Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and
William Wootters. Teleporting an unknown quantum state via dual classical and
Einstein–Podolsky–Rosen channels. Physical Review Letters, 70(13):1895–1899, 1993.
John Clauser, Michael Horne, Abner Shimony, and Richard Holt. Proposed
experiment to test local hidden-variable theories. Physical Review Letters,
23(15):880–884, 1969.
Richard Cleve, Peter Høyer, Benjamin Toner, and John Watrous. Consequences and
limits of nonlocal strategies. In Proceedings of the 19th Annual IEEE Conference on
Computational Complexity, pages 236–249, 2004.
Paul Dirac. The Principles of Quantum Mechanics. Clarendon Press, 4th edition,
1958.
William Wootters and Wojciech Zurek. A single quantum cannot be cloned. Nature,
299(5886):802–803, 1982.
Unit II references
Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach.
Cambridge University Press, 2009.
Eric Bach and Jeffrey Shallit. Algorithmic Number Theory, Volume I: Efficient
Algorithms. MIT Press, 1996.
Charles Bennett, Ethan Bernstein, Gilles Brassard, and Umesh Vazirani. Strengths
and weaknesses of quantum computing. SIAM Journal on Computing,
26(5):1510–1523, 1997.
Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. SIAM Journal
on Computing, 26(5):1411–1473, 1997.
Michel Boyer, Gilles Brassard, Peter Høyer, and Alain Tapp. Tight bounds on
quantum searching. Fortschritte der Physik, 46(4-5):493–505, 1998.
Oscar Boykin, Tal Mor, Matthew Pulver, Vwani Roychowdhury, and Farrokh Vatan.
A new universal and fault-tolerant quantum basis. Information Processing Letters,
75(3):101–107, 2000.
Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude
amplification and estimation. Contemporary Mathematics, 305:53–74, 2002.
Richard Cleve, Artur Ekert, Chiara Macchiavello, and Michele Mosca. Quantum
algorithms revisited. Proceedings of the Royal Society of London A, 454(1969):339–354,
1998.
James Cooley and John Tukey. An algorithm for the machine calculation of
complex Fourier series. Mathematics of Computation, 19:297–301, 1965.
David Deutsch. Quantum theory, the Church-Turing principle and the universal
quantum computer. Proceedings of the Royal Society of London A, 400(1818):97–117,
1985.
Alexei Kitaev. Quantum measurements and the Abelian stabilizer problem, 1996.
arXiv:quant-ph/9511026.
Ronald Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining digital
signatures and public-key cryptosystems. Communications of the ACM,
21(2):120–126, 1978.
Andrew Yao. Quantum circuit complexity. In Proceedings of the 34th Annual IEEE
Symposium on Foundations of Computer Science, pages 352–361, 1993.
Unit III references
Richard Jozsa. Fidelity for mixed quantum states. Journal of Modern Optics,
41(12):2315–2323, 1994.
Karl Kraus. States, Effects, and Operations: Fundamental Notions of Quantum Theory.
Lecture Notes in Physics, volume 190, 1983.
Mark Wilde. Quantum Information Theory. Cambridge University Press, 2nd edition,
2017.
Andreas Winter. Coding theorem and strong converse for quantum channels. IEEE
Transactions on Information Theory, 45(7):2481–2485, 1999.
Unit IV references
Dorit Aharonov and Michael Ben-Or. Fault-tolerant quantum computation with
constant error. Proceedings of the 29th Annual ACM Symposium on Theory of
Computing, pages 176–188, 1997.
Sergey Bravyi and Alexei Kitaev. Quantum codes on a lattice with boundary, 1998.
arXiv:quant-ph/9811052.
Sergey Bravyi, Andrew Cross, Jay Gambetta, Dmitri Maslov, Patrick Rall, and
Theodore Yoder. High-threshold and low-overhead fault-tolerant quantum
memory. Nature, 627:778–782, 2024.
Robert Calderbank and Peter Shor. Good quantum error-correcting codes exist.
Physical Review A, 54(2):1098–1105, 1996.
Eric Dennis, Alexei Kitaev, Andrew Landahl, and John Preskill. Topological
quantum memory. Journal of Mathematical Physics, 43(9):4452–4505, 2002.
Bryan Eastin and Emanuel Knill. Restrictions on transversal encoded quantum gate
sets. Physical Review Letters, 102(11):110502, 2009.
Austin Fowler, Matteo Mariantoni, John Martinis, and Andrew Cleland. Surface
codes: Towards practical large-scale quantum computation. Physical Review A,
86(3):032324, 2012.
Daniel Gottesman. Stabilizer codes and quantum error correction. PhD thesis,
California Institute of Technology, 1997. arXiv:quant-ph/9705052.
Daniel Lidar and Todd Brun. Quantum Error Correction. Cambridge University
Press, 2013.
John Preskill. Reliable quantum computers. Proceedings of the Royal Society of London
A, 454(1969):385–410, 1998.
Andrew Steane. Error correcting codes in quantum theory. Physical Review Letters,
77(5):793–797, 1996.